Monitor logical replication using LSN

All we need is an easy explanation of the problem, so here it is.

How do I monitor a logical replication in Postgresql 12 by seeing the lsn ?

What I have done :
Inspect some colomns on Publisher and Subscriber.

Publisher side :

select * from pg_stat_replication; — see the REPLAY_LSN

select * from pg_replication_slots; — see the CONFIRMED_FLUSH_LSN

Subscriber side :

select * from pg_catalog.pg_stat_subscription; — see the RECEIVED_LSN and LATEST_END_LSN

I make sure all the values are the same across those colomns.

Am I correct ? Or Are there any other way to see that the replication work by inspecting some parameters ?

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

I used replication on Postgres 12.

On the publisher side there are few things you can check:

pg_catalog.pg_publication;
pg_catalog.pg_publication_tables;
pg_current_wal_lsn();

I’ll create a publication "test_publication" with two tables: t_1 and t_2. I won’t cover the prerequisites (user, roles and so on).

test_logical_replication=# create publication test_publication for table t_1, t_2;  
CREATE PUBLICATION  
test_logical_replication=# select * from pg_catalog.pg_publication;  
pubname      | pubowner | puballtables | pubinsert | pubupdate | pubdelete  
-----------------+----------+--------------+-----------+-----------+-----------  
test_publication |       10 | f            | t         | t         | t  
(1 row) 

test_logical_replication=# select * from pg_publication_tables;    
    pubname      | schemaname | tablename  
    ------------------+------------+-----------  
     test_publication | public     | t_1  
     test_publication | public     | t_2  
    (2 rows)  

On the subscriber side:

test_logical_replication_subscriber=# create subscription test_subscription CONNECTION 'dbname=test_logical_replication host=XXX user=repuser' PUBLICATION test_publication;  
NOTICE:  created replication slot "test_subscription" on publisher   
CREATE SUBSCRIPTION 

Interesting information is in the table pg_catalog.pg_stat_subscription.
Here the important columns are:

  • received_lsn: Last write-ahead log location received .
  • last_msg_send_time: send time of last message received from the publisher.
  • last_msg_receipt_time: Receipt time of last message received from the publisher.
  • latest_end_lsn: Last write-ahead log location reported to the publisher.
  • latest_end_time: Time of last write-ahead log location reported to the publisher.

You have to check these columns to catch what is happening.
First, check if the two databases are in sync;

Publisher side:

test_logical_replication=> select pg_current_wal_lsn();  
 pg_current_wal_lsn  
--------------------  
 0/8EB83768     

This shows the location in the WAL file where we are now, before starting a new insert.

We can check on the subscriber that at this moment the two database are in sync, because the value returned by pg_current_wal_lsn() on the publisher matches the value in the columns received_lsn and latest_end_lsn on the subscriber:

test_logical_replication_subscriber=# select received_lsn, latest_end_lsn from pg_catalog.pg_stat_subscription;  

received_lsn    | latest_end_lsn  
----------------+------------------     
 0/8EB83768     | 0/8EB83768        

I’ll add 4000 rows to table t_1, and see what happens on the publisher:

test_logical_replication=> insert into t_1 select id+1, txt||'--BB' from t_1;  
INSERT 0 4000  


test_logical_replication=> select pg_current_wal_lsn();  
 pg_current_wal_lsn
--------------------
 0/8EC4B9D0             <<< this value in increasing
(1 row)

test_logical_replication=> select pg_current_wal_lsn();
 pg_current_wal_lsn
--------------------
 0/8EC4DE78             <<< this value in increasing
(1 row)

test_logical_replication=> select pg_current_wal_lsn();
 pg_current_wal_lsn
--------------------
 0/8EC4DEB0             <<< this value in increasing
(1 row) 

test_logical_replication=> select pg_current_wal_lsn();
 pg_current_wal_lsn
--------------------
 0/8EC4DEB0            <<< same value, WAL sending has finished
(1 row)

Let’s look how pg_catalog.pg_stat_subscription values change during replication on the subscriber:

test_logical_replication_subscriber=# select received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time from pg_catalog.pg_stat_subscription;
 received_lsn |      last_msg_send_time       |    last_msg_receipt_time     | latest_end_lsn |        latest_end_time  
--------------+-------------------------------+------------------------------+----------------+-------------------------------
 0/8EC4B9D0   | 2018-12-17 11:39:56.014564+01 | 2018-12-17 11:39:56.07322+01 | 0/8EC4B9D0     | 2018-12-17 11:39:56.014564+01
(1 row)  

test_logical_replication_subscriber=# select received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time from pg_catalog.pg_stat_subscription;
 received_lsn |      last_msg_send_time       |     last_msg_receipt_time     | latest_end_lsn |        latest_end_time
--------------+-------------------------------+-------------------------------+----------------+-------------------------------
 0/8EC4BA08   | 2018-12-17 11:39:56.737101+01 | 2018-12-17 11:39:56.736303+01 | 0/8EC4BA08     | 2018-12-17 11:39:56.737101+01  
(1 row)  

test_logical_replication_subscriber=# select received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time from pg_catalog.pg_stat_subscription;
 received_lsn |      last_msg_send_time       |     last_msg_receipt_time     | latest_end_lsn |        latest_end_time
--------------+-------------------------------+-------------------------------+----------------+-------------------------------
 0/8EC4DE78   | 2018-12-17 11:40:04.184765+01 | 2018-12-17 11:40:04.183937+01 | 0/8EC4DE78     | 2018-12-17 11:40:04.184765+01
(1 row)  

 test_logical_replication_subscriber=# select received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time from pg_catalog.pg_stat_subscription;
 received_lsn |      last_msg_send_time       |   last_msg_receipt_time    | latest_end_lsn |        latest_end_time
--------------+-------------------------------+----------------------------+----------------+-------------------------------
 0/8EC4DEB0   | 2018-12-17 11:40:17.153797+01 | 2018-12-17 11:40:17.153+01 | 0/8EC4DEB0     | 2018-12-17 11:40:17.153797+01
(1 row)

As you can see, on the subscriber the four columns shows how WAL is arriving from the publisher and how it is applied. The difference in time in the columns last_msg_send_time and last_msg_receipt_time can give information about the lag between the publisher and subscriber.
In this case, the two servers are on different subnets in the same data centre.

Take into account that the two servers I used are test servers and ARE NOT perfect in sync between them. (The subscriber server has not an NTP server configured at all).

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply