Hello,
I am writing to report an issue with delays in our Galera cluster. Our developers have noticed that when a record is written on a write node, it does not appear immediately, or shows outdated data, when read from a read node. Sometimes this delay can last several seconds. This even applies to UPDATEs of single records. The issue does not occur when reading from the write node.
The average replication delay is around 1 ms, but under high load, it can reach up to 3 ms. The maximum Galera replication latency is usually around 30 ms, but occasionally it spikes to 303 ms or even 400 ms in PMM.
Should I monitor any other indicators in PMM? I have increased wsrep_slave_threads from 16 to 24, but I haven’t observed any changes in the graphs. Is there any way to reduce this delay apart from decreasing the database load?
Hello @Silver3,
Yes, this is expected. It is not an issue/bug. PXC/Galera is not a synchronous-commit (e.g., 2-phase commit) style cluster. It is well documented that replication of writesets between nodes on COMMIT is synchronous, but the local apply is asynchronous. Thus, if you write to node1, commit, and then immediately try to read from node2, there is absolutely no guarantee that the write will have been applied before your read on node2. There is a guarantee it will eventually apply; that is the nature of Galera.
You can modify the session variable [wsrep_sync_wait](https://docs.percona.com/percona-xtradb-cluster/8.0/wsrep-system-index.html#wsrep_sync_wait) to force the node to apply all pending writesets before executing a read, but this will most likely add further execution delay.
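As a minimal sketch (assuming mysql-connector-python; host names, credentials, and the table are placeholders), enabling the causality check for just one session's reads looks like this:

```python
import mysql.connector  # assumes mysql-connector-python is installed

# Hypothetical connection details for a read node.
conn = mysql.connector.connect(host="node2", user="app", password="...",
                               database="test", autocommit=True)
cur = conn.cursor()

# wsrep_sync_wait=1 makes this session's reads wait until all writesets
# received before the read have been applied locally (causality check).
cur.execute("SET SESSION wsrep_sync_wait = 1")
cur.execute("SELECT * FROM orders WHERE id = %s", (42,))
print(cur.fetchall())
```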
Hmm. That is not typical. Using PMM, you should be able to determine where the delay is occurring: is it network lag? The receive queue? The apply queue? Is Flow Control turning on at any time?
Look at the ‘PXC/Galera Node Summary’ dashboard.
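If you want to spot-check outside of PMM, something like this (placeholder connection details, assuming mysql-connector-python) samples the relevant wsrep status counters directly:

```python
import mysql.connector  # assumes mysql-connector-python is installed

# Hypothetical connection details; point this at the node under load.
conn = mysql.connector.connect(host="node2", user="monitor", password="...",
                               autocommit=True)
cur = conn.cursor()

# Queue depths and flow-control time show where writesets are piling up.
cur.execute("SHOW GLOBAL STATUS WHERE Variable_name IN "
            "('wsrep_local_recv_queue', 'wsrep_local_send_queue', "
            "'wsrep_flow_control_paused')")
for name, value in cur:
    print(f"{name} = {value}")
```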
Which means that parallelism isn’t the issue. Put it back to 16.
Are the UPDATEs concentrated on a single table, or small set of tables? Do all tables have an integer-based PRIMARY KEY?
Create a dummy table and slam it with test inserts/updates, and observe. Do you see the same lag?
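For example, a rough harness along these lines (assuming mysql-connector-python; host names, credentials, and the table name are placeholders) writes on node1 and times how long each row takes to become visible on node2:

```python
import time
import mysql.connector  # assumes mysql-connector-python is installed

# Hypothetical connection details for the writer and a reader node.
writer = mysql.connector.connect(host="node1", user="test", password="...",
                                 database="test", autocommit=True)
reader = mysql.connector.connect(host="node2", user="test", password="...",
                                 database="test", autocommit=True)

wcur = writer.cursor()
rcur = reader.cursor(buffered=True)
wcur.execute("CREATE TABLE IF NOT EXISTS lag_test "
             "(id INT AUTO_INCREMENT PRIMARY KEY, created_at DOUBLE)")

for _ in range(100):
    start = time.time()
    wcur.execute("INSERT INTO lag_test (created_at) VALUES (%s)", (start,))
    row_id = wcur.lastrowid
    # Poll the reader until the new row becomes visible, then log the lag.
    while True:
        rcur.execute("SELECT 1 FROM lag_test WHERE id = %s", (row_id,))
        if rcur.fetchone():
            break
        time.sleep(0.001)
    print(f"row {row_id} visible after {(time.time() - start) * 1000:.1f} ms")
```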
Thank you very much for your advice.
I checked all the places you mentioned and unfortunately I didn’t find anything that looked like the source of the problem.
I performed tests with the table and observed strange behavior. I enter data via INSERT on the write node and read it via SELECT on another node. Everything is fine if the record has already been replicated. If the record has not been replicated before the SELECT is issued, I cannot read it despite subsequent attempts repeated every second, even though the record has by then been replicated there, because I can see it when I use the command line/Workbench. Only disconnecting from the database and reconnecting helps.
I did this first on the dummy table and then on the regular one; the effect is similar.
That tells me that whatever you are using to test this might be starting an explicit transaction and/or automatically disabling autocommit for the session. Disconnecting and reconnecting forces a new transaction, and thus generates a new “view” of the data, which explains why you can’t see the new data on another node until you reconnect.
Use two terminal windows and the mysql command line to connect to node1 and node2. On node2, verify that SELECT @@autocommit returns 1. On node1, also verify autocommit=1, then insert a row into a table. Then select on node2. You should see it immediately. If this test works, then what I said above holds true: something external to mysql is modifying the connection/transaction.
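To illustrate what I suspect is happening, here is a sketch (placeholder connection details again) of how an open transaction with autocommit off pins a stale snapshot:

```python
import mysql.connector

# Hypothetical reader connection; autocommit is left at the connector's
# default (OFF), so the first SELECT silently opens a transaction.
reader = mysql.connector.connect(host="node2", user="test", password="...",
                                 database="test")
cur = reader.cursor(buffered=True)

cur.execute("SELECT COUNT(*) FROM lag_test")  # opens a REPEATABLE READ snapshot
print(cur.fetchone())

# ...meanwhile, a row is inserted on node1 and replicated to node2...

cur.execute("SELECT COUNT(*) FROM lag_test")  # same snapshot: new row invisible
print(cur.fetchone())

reader.commit()  # ends the transaction (rollback() would too)
cur.execute("SELECT COUNT(*) FROM lag_test")  # fresh snapshot: new row visible
print(cur.fetchone())
```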
Thank you very much. Of course, you were right about autocommit being disabled; when I enabled it, my test script started working.
From the data collected in the table, it looks like the problem affects all nodes equally, except of course the node the write happens on. During the day there are hours with slightly more or fewer delayed records, but I can’t correlate it with the general traffic.
| hour_of_day | delayed_record |
|---|---|
| 0 | 13 |
| 1 | 12 |
| 2 | 11 |
| 3 | 13 |
| 4 | 12 |
| 5 | 9 |
| 6 | 10 |
| 7 | 10 |
| 8 | 18 |
| 9 | 12 |
| 10 | 10 |
| 11 | 13 |
| 12 | 10 |
| 13 | 9 |
| 14 | 18 |
| 15 | 16 |
| 16 | 13 |
| 17 | 2 |
| 18 | 10 |
| 19 | 13 |
| 20 | 13 |
| 21 | 12 |
| 22 | 17 |
| 23 | 11 |
However, what surprised me the most is that these latency values are much higher than what the Maximum Galera Replication Latency graph shows me.
| Up to 0.5s | Up to 1s | Up to 3s | Up to 4s | Up to 5s | Above 5s | ALL above 1s |
|---|---|---|---|---|---|---|
| 87890 | 88 | 181 | 66 | 28 | 4 | 279 |
Yes, that makes complete sense if you have autocommit disabled. No other node will see any new rows until COMMIT on the writer. Fix your application/tests so that autocommit remains enabled all the time; this is the default behavior. What you are experiencing is not “latency”. This is transactional lag caused by a delayed COMMIT, which is 100% the responsibility of your application.
“latency” in PXC is the duration it takes to A) receive a writeset via replication, B) add writeset to receive queue, C) certify transaction, D) add to apply queue, and E) apply transaction locally. Even in the busiest PXCs I’ve seen, processing 20,000 txns/sec, the latency of A->E is measured in milliseconds.
We have autocommit enabled. Sorry, I didn’t write that clearly enough. At first, I tried to run the tests on the table using Python, where, for some reason I don’t understand, autocommit is disabled by default. However, the results I sent were collected after applying your comment about autocommit. The application we actually use is written in another language, where autocommit is enabled by default (we checked this).
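In case it helps others: Python DB-API drivers such as mysql-connector-python keep autocommit off by default, so the fix in our test script was a single line (placeholder connection details):

```python
import mysql.connector

# Python DB-API drivers default to autocommit OFF; enable it explicitly.
# Hypothetical connection details.
conn = mysql.connector.connect(host="node1", user="test", password="...",
                               database="test", autocommit=True)

# Equivalently, after connecting:
# conn.autocommit = True
```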
These delays are also a surprise to me, especially since none of the graphs in PMM indicate a delay greater than a few milliseconds.
The delays/graphs you see measured in PMM are at the protocol level, not the application level. You are seeing delays at the application level, which shows that this issue is with your application and its behavior, not with PXC or its configuration. You said above that you fixed an issue in your test application and, after doing that, you no longer saw the problem. That is more evidence that your other application is doing something incorrect.