Replication lag 2 nodes cluster

Hello,
I have a setup of PXC with 2 nodes in aws, both in the save AZ with 0.168 ms lag.
My setup is Proxysql with 1 writer and 1 reader, with query routing.
When i insert 48k raw and run count right after i get 0 raw. if i add 430ms delay after the insert i can find the new raw in the reader node

Without sleep:

create table
fill members
filter members
Sunday 25th of February 2018 12:51:27 PM
no sleep
QUERY SELECT COUNT(member_id) count FROM tmp_auto_submit_1776
TOTAL FOUND 0

With 430ms sleep:

create table
fill members
filter members
Sunday 25th of February 2018 12:51:57 PM
430ms sleep
QUERY SELECT COUNT(member_id) count FROM tmp_auto_submit_1776
TOTAL FOUND 48064

wsrep_sync_wait value is 3 which improved the latency from 1 sec to 430ms.

Please advice

Thanks!

  1. Setting wsrep_sync_wait to 3 will place SELECT request after the active replication request.
  2. Say you are inserting all 48K rows as a single transaction using INSERT. This means once the transaction is complete it will start replicating.
  3. On the reader node when you fire SELECT with the sync_wait=3 packet for SELECT will be placed in a queue that will be served once the replicating transaction completes.
  4. From your description, it sounds like it takes 430 ms for the replication applier to complete.

I presume you need instant access to the data w/o the said latency.

  • You can play around with wsrep_sync_wait setting. Try setting it to 0 (DISABLED) and check the effect.
  • You can split loading transaction into 5K rows each (if application allow) so that this transaction gets replicated immediately and applied immediately reducing latency further (even with wsrep_sync_wait=3)
  • You can also try to increase wsrep_slave_threads so if one replication thread is busy applying the first batch of 5K transaction, 2nd applier thread can pick and proceed with the second batch.

Let us know how it goes and we would be happy to support and help you achieve the bigger setup and goal.

Hi, It’s possible to split the transaction into chunks but i’m trying to avoid it.
After changing the wsrep_sync_wait to 0 there’s no change in the lag of the replication. still takes 430 ms to replicate 48K raws.
wsrep_slave_threads is set to 32

That probably means all 48K rows are being loaded as a single transaction. When the rows are being applied you can enable (wsrep-debug=1) on replicating node and check how they are being applied. (Note: wsrep-debug shouldn’t be enabled during production). Alternative is to check using show processlist.