Hi,
I’m running a 3 node Percona XtraDB Cluster, version 5.7.35-38-57-log.
It’s used for HA, meaning that our application only write/read from a single node and only switches to another node if the first one fails.
We had a brief network timeout in the main node and noticed that at least one transaction was lost. It was accepted by the main node ( no errors reported from Mysql ) but the 3 records created by it were lost.
Other transactions were also running but Percona returned an error ( as expected ) and we knew that the records were not created.
Fortunately we had an audit log with all SQL commands in our application and were able to recreate the records.
Besides that, all nodes are consistent and the tables have the same number of records/data. The main node recovered automatically as expected.
What could cause this issue? I thought that Percona Cluster would only accept a transaction when all nodes confirmed that it was received. But in this case, it looks like the transaction was lost.
This is log of the main node when the issue happened:
2023-01-06T13:16:52.486794Z 25722799 [Note] Got an error reading communication packets
2023-01-06T13:16:52.486794Z 25722800 [Note] Got an error reading communication packets
2023-01-06T13:16:53.254133Z 25722801 [Note] Got an error reading communication packets
2023-01-06T13:16:53.826187Z 0 [Warning] WSREP: last inactive check more than PT1.5S (3evs.inactive_check_period) ago (PT1.92549S), skipping check
2023-01-06T13:16:55.214330Z 0 [Warning] WSREP: last inactive check more than PT1.5S (3evs.inactive_check_period) ago (PT2.16205S), skipping check
2023-01-06T13:16:55.894537Z 25722804 [Note] Got an error reading communication packets
2023-01-06T13:16:56.011289Z 25722805 [Note] Got an error reading communication packets
2023-01-06T13:16:56.211994Z 25722803 [Note] Got an error reading communication packets
2023-01-06T13:16:56.557882Z 25722806 [Note] Got an error reading communication packets
2023-01-06T13:16:56.668552Z 25722808 [Note] Got an error reading communication packets
2023-01-06T13:16:57.199672Z 25722811 [Note] Got an error reading communication packets
2023-01-06T13:16:58.006007Z 25722816 [Note] Got an error reading communication packets
2023-01-06T13:17:25.913475Z 0 [Warning] WSREP: Failed to report last committed 9483525, -110 (Connection timed out)
Node 2 log:
2023-01-06T13:16:57.411487Z 0 [Warning] WSREP: Failed to report last committed 9483680, -110 (Connection timed out)
2023-01-06T13:17:33.619112Z 0 [Warning] WSREP: Failed to report last committed 9483787, -110 (Connection timed out)
Node 3 log:
2023-01-06T13:16:57.403986Z 0 [Warning] WSREP: Failed to report last committed 9483680, -110 (Connection timed out)
2023-01-06T13:17:34.621377Z 0 [Warning] WSREP: Failed to report last committed 9483787, -110 (Connection timed out)