Problem getting IST to work

We have been trying out Percona XtraDB Cluster and it has been working well so far, but we cannot get IST to work for rejoining nodes.

It is a three-node multi-master cluster, with each node running in a Docker container, under light load (roughly one write every few seconds).
Ports 3306, 4444, 4567 (UDP/TCP) and 4568 are all open.

When one node is shut down gracefully and then started again after 15 seconds or so, I expect it to sync via IST, but instead it does a full SST every time.

From the log of the joining node, it seems the local state seqno is always -1:

[Note] WSREP: State transfer required:
Group state: 691ae5d1-6d20-11ea-b27c-9a2c31b366d6:119
Local state: 691ae5d1-6d20-11ea-b27c-9a2c31b366d6:-1
.
.
.
[Note] WSREP: Check if state gap can be serviced using IST
[Note] WSREP: Local state seqno is undefined (-1)
[Note] WSREP: State gap can't be serviced using IST. Switching to SST
[Note] WSREP: Failed to prepare for incremental state transfer: Local state seqno is undefined: 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():549. IST will be unavailable.
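To make the log excerpt above easier to reason about, here is a minimal Python sketch that pulls the group and local state out of such a log and checks the two preconditions visible in it: matching cluster UUID and a defined local seqno. The function names (parse_states, ist_precondition_met) are hypothetical helpers, not part of Galera or Percona tooling, and the check is only a rough approximation. It does not verify that the donor still holds the missing write-sets in its gcache.

```python
import re

# Matches lines such as "Group state: <uuid>:<seqno>" or "Local state: <uuid>:<seqno>".
STATE_RE = re.compile(r"(Group|Local) state:\s*([0-9a-f-]{36}):(-?\d+)")


def parse_states(log_text):
    """Return {'group': (uuid, seqno), 'local': (uuid, seqno)} from a log excerpt."""
    states = {}
    for kind, uuid, seqno in STATE_RE.findall(log_text):
        states[kind.lower()] = (uuid, int(seqno))
    return states


def ist_precondition_met(states):
    """Rough check only: same cluster UUID and a defined (non-negative) local seqno."""
    group_uuid, _ = states["group"]
    local_uuid, local_seqno = states["local"]
    return group_uuid == local_uuid and local_seqno >= 0


log_excerpt = """
[Note] WSREP: State transfer required:
Group state: 691ae5d1-6d20-11ea-b27c-9a2c31b366d6:119
Local state: 691ae5d1-6d20-11ea-b27c-9a2c31b366d6:-1
"""

states = parse_states(log_excerpt)
print(states["local"])               # local UUID and seqno as parsed
print(ist_precondition_met(states))  # False here, because the local seqno is -1
```

With the values from this thread the UUIDs match, so the only thing blocking IST is the undefined local seqno.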

Is there an option we are missing or can anyone point me to the right resource?

Thanks

Do you have the following configured on each node? A local state of -1 does seem odd. Are you sure the permissions are correct for all files in /var/lib/mysql/?

wsrep_provider_options = "gcache.size=1G;gcache.recover=yes"

Permissions for all files in /var/lib/mysql seem fine. Tried it with 777 on everything in /var/lib/mysql as well.

Set gcache.size=1G and gcache.recover=yes.
Still the same issue.

Is the local state of -1 supposed to come from the grastate.dat file?
If so, the seqno line does not seem to be updated on any of the nodes and is always -1.

Please send us a copy of your my.cnf. Remove any identifiable information, such as IP addresses, usernames and passwords, before uploading.

Which Percona XtraDB Cluster version is running on each node? The output of dpkg -l | grep -i percona or rpm -qa | grep -i percona would help.

Permissions for all files in /var/lib/mysql seem fine. Tried it with 777 on everything in /var/lib/mysql as well.

We would advise reverting the permissions.

@Gasper_Z Do you have SELinux running? While a node is running, the grastate.dat file will always read -1 for sequence. This is updated on clean shutdown to the correct value. As @jrivera said, you should NOT have 777 everywhere, and instead should use correct user/group permissions. For most of the MySQL datadir, this is 660.
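As a quick way to check this after stopping a node, the following Python sketch parses the contents of a grastate.dat file and reports whether a valid seqno was written. The sample text is an illustrative assumption based on the usual layout of the file, and the helper names (parse_grastate, shutdown_recorded_seqno) are hypothetical.

```python
def parse_grastate(text):
    """Parse 'key: value' lines from grastate.dat content, skipping comments and blanks."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def shutdown_recorded_seqno(fields):
    """True if the file holds a defined (non-negative) seqno, as expected after a clean stop."""
    return int(fields.get("seqno", "-1")) >= 0


# Illustrative content only; on a real node, read /var/lib/mysql/grastate.dat instead.
sample = """# GALERA saved state
version: 2.1
uuid:    691ae5d1-6d20-11ea-b27c-9a2c31b366d6
seqno:   -1
safe_to_bootstrap: 0
"""

fields = parse_grastate(sample)
print(fields["seqno"])                  # -1
print(shutdown_recorded_seqno(fields))  # False
```

If this still reports False after the node has been stopped, the shutdown did not complete cleanly, which matches the symptom described in this thread.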

You can attempt a recovery to fix the grastate.dat file by running "mysqld_safe --wsrep_recover". That will output a recovered Galera GTID sequence. Edit the grastate.dat file and fix the sequence number. Then start MySQL normally. This should allow the IST to complete. This is only a workaround for the underlying problem, which is why grastate.dat was not updated when the node was shut down properly.
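The manual edit described above can be sketched in Python as follows. This assumes the recovery run logs a line of the form "WSREP: Recovered position: <uuid>:<seqno>"; check your own error log for the exact wording. The helper names (recovered_position, set_seqno) and the sample strings are hypothetical. Stop the node and back up grastate.dat before changing it.

```python
import re

# Assumed format of the line logged by the recovery run.
RECOVERED_RE = re.compile(r"WSREP: Recovered position:\s*([0-9a-f-]{36}):(-?\d+)")


def recovered_position(log_text):
    """Return (uuid, seqno) from the last 'Recovered position' line, or None if absent."""
    matches = RECOVERED_RE.findall(log_text)
    if not matches:
        return None
    uuid, seqno = matches[-1]
    return uuid, int(seqno)


def set_seqno(grastate_text, seqno):
    """Return grastate.dat content with the seqno line replaced; other lines are unchanged."""
    return re.sub(r"(?m)^seqno:[ \t]*-?\d+[ \t]*$", f"seqno:   {seqno}", grastate_text)


# Illustrative inputs only.
log_text = "[Note] WSREP: Recovered position: 691ae5d1-6d20-11ea-b27c-9a2c31b366d6:119"
grastate = """# GALERA saved state
version: 2.1
uuid:    691ae5d1-6d20-11ea-b27c-9a2c31b366d6
seqno:   -1
safe_to_bootstrap: 0
"""

position = recovered_position(log_text)
if position is not None:
    _, seqno = position
    print(set_seqno(grastate, seqno))
```

The sketch only transforms strings, so writing the result back to /var/lib/mysql/grastate.dat (with the file's original owner and permissions) is left as a deliberate manual step.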

We managed to get this fixed.

The problem was that MySQL was not shutting down correctly.

I would have posted sooner and saved you guys the trouble, but the forum was down.

Thanks, resolved.