Problem getting IST to work

Gasper_Z · March 24, 2020, 8:07am

We have been giving Percona XtraDB Cluster a try and it has worked well so far, but we can't get IST (incremental state transfer) to work for rejoining nodes.

It is a 3-node multi-master cluster, with one Docker container on each node and a light load (let's say one write every few seconds).
Ports 3306, 4444, 4567 (UDP/TCP) and 4568 are all open.

When one node is shut down gracefully and started again after 15 seconds or so, I expect it to sync via IST, but instead it does a full SST (state snapshot transfer) every time.

From the joining node's log, the local state seqno always seems to be -1:

[Note] WSREP: State transfer required:
Group state: 691ae5d1-6d20-11ea-b27c-9a2c31b366d6:119
Local state: 691ae5d1-6d20-11ea-b27c-9a2c31b366d6:-1
.
.
.
[Note] WSREP: Check if state gap can be serviced using IST
[Note] WSREP: Local state seqno is undefined (-1)
[Note] WSREP: State gap can’t be serviced using IST. Switching to SST
[Note] WSREP: Failed to prepare for incremental state transfer: Local state seqno is undefined: 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():549. IST will be unavailable.
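The log shows why IST is refused: the joiner's local seqno is -1, so there is no starting point for an incremental transfer. As a simplified model (an assumption for illustration, not Galera's actual code in replicator_str.cpp), a rejoining node can only use IST when it has the same cluster UUID, a defined seqno, and the donor's gcache still holds every writeset after that seqno; otherwise it falls back to SST. The function and parameter names below are hypothetical.

```python
# Simplified model of the joiner's IST-vs-SST decision (an assumption, not
# Galera's actual implementation). All names here are hypothetical.
UNDEFINED_SEQNO = -1


def choose_state_transfer(group_uuid, group_seqno, local_uuid, local_seqno,
                          donor_gcache_first_seqno):
    """Return "none", "IST" or "SST" for a node rejoining the cluster."""
    if local_uuid != group_uuid:
        # Node has a different cluster history: a full copy is needed.
        return "SST"
    if local_seqno == UNDEFINED_SEQNO:
        # "Local state seqno is undefined (-1)": no starting point for IST.
        return "SST"
    if local_seqno >= group_seqno:
        # Already up to date; nothing to transfer.
        return "none"
    if local_seqno + 1 >= donor_gcache_first_seqno:
        # The donor's gcache still holds every missing writeset.
        return "IST"
    # The missing writesets were already evicted from the donor's gcache.
    return "SST"


if __name__ == "__main__":
    uuid = "691ae5d1-6d20-11ea-b27c-9a2c31b366d6"
    # The situation in the log above: local seqno -1 vs. group seqno 119.
    print(choose_state_transfer(uuid, 119, uuid, -1, 1))   # SST
    # Same node if grastate.dat had saved a (hypothetical) seqno of 110.
    print(choose_state_transfer(uuid, 119, uuid, 110, 1))  # IST
```

In this model, enlarging gcache.size only helps the last check; it cannot help while the local seqno is -1.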

Is there an option we are missing, or can anyone point me to the right resource?

Thanks

matthewb · March 24, 2020, 8:35am

Do you have the following configured on each node? A local state of -1 does seem odd. Are you sure the permissions are correct for all files in /var/lib/mysql/?

wsrep_provider_options = "gcache.size=1G;gcache.recover=yes"

Gasper_Z · March 24, 2020, 9:45am

Permissions for all files in /var/lib/mysql seem fine. I also tried setting 777 on everything in /var/lib/mysql.

I increased gcache.size to 1G and set gcache.recover=yes.
Still the same issue.

Is the local state of -1 supposed to come from the grastate.dat file?
If so, the seqno line does not seem to be updated on any of the nodes; it is always -1.

jrivera · March 29, 2020, 5:45am

Please send us a copy of your my.cnf, after removing any identifiable information such as IP addresses and user/password credentials.

Which Percona XtraDB Cluster version is running on each node? The output of dpkg -l | grep -i percona or rpm -qa | grep -i percona would help.

Permissions for all files in /var/lib/mysql seem fine. Tried it with 777 on everything in /var/lib/mysql as well.

We would advise reverting the permissions.

matthewb · March 29, 2020, 12:07pm

@Gasper_Z Do you have SELinux running? While a node is running, grastate.dat will always show -1 for seqno; the correct value is written on a clean shutdown. As @jrivera said, you should NOT have 777 everywhere; use the correct user/group ownership and permissions instead. For most files in the MySQL datadir, that is 660.
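Because seqno stays at -1 while the node runs and is only written on a clean shutdown, reading grastate.dat after stopping a node (and before starting it again) shows whether the shutdown actually saved the position. A minimal sketch, assuming the usual "key: value" layout of grastate.dat and the default datadir /var/lib/mysql; the helper names parse_grastate and saved_seqno are hypothetical, and the field list varies between Galera versions.

```python
import sys

# Illustrative grastate.dat contents; fields vary by Galera version.
SAMPLE = """# GALERA saved state
version: 2.1
uuid:    691ae5d1-6d20-11ea-b27c-9a2c31b366d6
seqno:   -1
safe_to_bootstrap: 0
"""


def parse_grastate(text):
    """Parse the 'key: value' lines of grastate.dat, skipping comments."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        state[key.strip()] = value.strip()
    return state


def saved_seqno(text):
    """Return the saved seqno as an int; -1 means no position was saved."""
    return int(parse_grastate(text).get("seqno", "-1"))


if __name__ == "__main__":
    # Pass a real path (e.g. /var/lib/mysql/grastate.dat) to check a node.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            text = f.read()
    else:
        text = SAMPLE
    seqno = saved_seqno(text)
    if seqno < 0:
        print("seqno is -1: node still running or not shut down cleanly")
    else:
        print(f"saved seqno={seqno}: IST possible if donor gcache covers it")
```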

You can attempt a recovery to fix the grastate.dat file by running "mysqld_safe --wsrep_recover". That will output the recovered Galera GTID (UUID:seqno). Edit grastate.dat to set seqno to the recovered value, then start MySQL normally. This should allow IST to complete. Note that this is only a workaround; the underlying problem is why grastate.dat was not updated when the node was shut down properly.
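The manual steps above (take the position reported by --wsrep_recover and copy its seqno into grastate.dat) can be sketched in Python. This assumes the recovery output contains a line of the form "WSREP: Recovered position: <uuid>:<seqno>" and that grastate.dat has "uuid:" and "seqno:" lines; the helper names are hypothetical. The sketch only transforms text, so back up grastate.dat and review the result before writing it back.

```python
import re

# Assumed format of the line logged during --wsrep_recover, e.g.
# "... [Note] WSREP: Recovered position: <uuid>:<seqno>"
RECOVERED_RE = re.compile(
    r"WSREP: Recovered position:\s*([0-9a-f-]{36}):(-?\d+)")


def find_recovered_position(log_text):
    """Return (uuid, seqno) from the last 'Recovered position' line, or None."""
    matches = RECOVERED_RE.findall(log_text)
    if not matches:
        return None
    uuid, seqno = matches[-1]
    return uuid, int(seqno)


def fix_grastate(grastate_text, uuid, seqno):
    """Return grastate.dat contents with seqno replaced, if the UUID matches."""
    out = []
    saved_uuid = None
    for line in grastate_text.splitlines():
        if line.startswith("uuid:"):
            saved_uuid = line.split(":", 1)[1].strip()
        if line.startswith("seqno:"):
            line = f"seqno:   {seqno}"
        out.append(line)
    # Refuse to patch a state file that belongs to a different cluster history.
    if saved_uuid != uuid:
        raise ValueError(f"UUID mismatch: file has {saved_uuid}, recovered {uuid}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Illustrative inputs only, not real output from the thread.
    log = ("[Note] WSREP: Recovered position: "
           "691ae5d1-6d20-11ea-b27c-9a2c31b366d6:119\n")
    grastate = ("# GALERA saved state\nversion: 2.1\n"
                "uuid:    691ae5d1-6d20-11ea-b27c-9a2c31b366d6\n"
                "seqno:   -1\nsafe_to_bootstrap: 0\n")
    position = find_recovered_position(log)
    if position is not None:
        print(fix_grastate(grastate, *position), end="")
```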

Gasper_Z · March 30, 2020, 2:19am

We managed to get this fixed.

The problem was with MySQL not shutting down correctly.

I would have posted sooner and saved you the trouble, but the forum was down.

Thanks, resolved.
