Originally from: tweet, LinkedIn post.
I post a new PostgreSQL "howto" article every day. Join me in this journey – subscribe, provide feedback, share!
When connected to a primary (or standby leader in case of cascaded replication), one can use pg_stat_replication
:
nik=# \d pg_stat_replication
View "pg_catalog.pg_stat_replication"
Column | Type | Collation | Nullable | Default
------------------+--------------------------+-----------+----------+---------
pid | integer | | |
usesysid | oid | | |
usename | name | | |
application_name | text | | |
client_addr | inet | | |
client_hostname | text | | |
client_port | integer | | |
backend_start | timestamp with time zone | | |
backend_xmin | xid | | |
state | text | | |
sent_lsn | pg_lsn | | |
write_lsn | pg_lsn | | |
flush_lsn | pg_lsn | | |
replay_lsn | pg_lsn | | |
write_lag | interval | | |
flush_lag | interval | | |
replay_lag | interval | | |
sync_priority | integer | | |
sync_state | text | | |
reply_time | timestamp with time zone | | |
This view contains information for both physical (only those using streaming, not WAL shipping) and logical replicas.
The lag values here are measured in bytes (columns ***_lsn
) or in time intervals (columns ***_lag
), and multiple
steps can be observed for each replication stream.
To analyze the LSN
values in this system view, we need to subtract them from current LSN
values on the server we're
connected to:
- if it's primary (
pg_is_in_recovery()
returnsfalse
), then usepg_current_wal_lsn()
- otherwise (standby leader), use
pg_last_wal_replay_lsn()
To subtract, we can use the function pg_wal_lsn_diff()
or the -
operator.
Docs: Monitoring pg_stat_replication view.
Examples of queries to view lags:
When connected to a physical replica (standby), to get its lag as a time interval:
select now() - pg_last_xact_replay_timestamp();
In some cases, pg_last_xact_replay_timestamp()
may return NULL
:
- if the standby server has just started and hasn't replayed any transactions,
- if there are no recent transactions on the primary.
This behavior of pg_last_xact_replay_timestamp()
might lead to wrong conclusions that standby is lagging and that
replication is not healthy – this isn't uncommon in low-activity setups (e.g., non-production environments).
Docs: pg_last_xact_replay_timestamp.
The lag can be observed from the primary (in the contest of logical replication, it's usually called "publisher).
In addition to pg_stat_replication
already discussed above, pg_replication_slots
can also be used:
select
slot_name,
pg_current_wal_lsn() - confirmed_flush_lsn as lag_bytes
from pg_replication_slots;
In some cases, you might need to deal with a combination of logical and physical replication. For example, consider the case:
- Cluster A: a regular cluster of 3 nodes (primary + 2 physical standbys).
- Cluster B: also primary + 2 physical nodes, and the primary connected to the cluster A's primary via logical replication
This is a typical situation when a complex change (e.g., a major upgrade) is performed involving logical replication. At some point, you might want to redirect a portion of the read-only traffic from cluster A's standbys to cluster B's standbys.
In this case, if we need to understand how each of the nodes is lagging, and we want the code to work well with all nodes, it can be tricky.
Here is how it could be solved (credits: Dylan Griffith from GitLab), assuming we can get information from both the node we're analyzing and the cluster A's primary (keeping all the comments above in mind):
-
First, get
LSN
on the primary:select pg_current_wal_lsn() as primary_lsn;
-
Then obtain the
LSN
location of the observed node, and use it to calculate the lag value in bytes:with current_node as ( select case when exists (select from pg_replication_origin_status) then ( select remote_lsn from pg_replication_origin_status ) when pg_is_in_recovery() then pg_last_wal_replay_lsn() else pg_current_wal_lsn() end as lsn ) select lsn – {{primary_lsn}} as lag_bytes from current_node;