3-node cluster: one node stopped working after update

I need some help here. I was using 5.7.19-29.22.1.el7 and decided to update to try to resolve some issues that happen randomly, such as node shutdowns. I picked node1 and updated it with yum update. After that, this node can’t join the cluster anymore. Some info is below:

SELinux is disabled on all nodes, and firewalld has the right zone assigned to the eth interface. Nothing else has changed; only the yum update broke things.

[root@noc-db-01 pedro]# cat /etc/my.cnf
[client]
socket=/var/lib/mysql1/mysql.sock

[mysqld]
server-id=1
datadir=/var/lib/mysql1
user=mysql
socket=/var/lib/mysql1/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
log-bin
log_slave_updates
expire_logs_days=7
symbolic-links=0
wsrep_provider=/usr/lib64/galera3/libgalera_smm.so
wsrep_cluster_address=gcomm://192.168.100.1,192.168.100.2,192.168.100.3
binlog_format=ROW
default_storage_engine=InnoDB
wsrep_slave_threads= 8
wsrep_log_conflicts
wsrep_node_address=192.168.100.1
wsrep_cluster_name=noc-db
wsrep_node_name=noc-db-01
pxc_strict_mode=DISABLED
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth="sstuser:s3cret"

log_timestamps=SYSTEM
ignore-db-dir=lost+found
thread_cache_size = 50
max_connections = 512
query_cache_limit = 0
query_cache_size = 0
query_cache_type = 0
expire_logs_days = 10
max_binlog_size = 400M
max_allowed_packet = 128M
innodb_buffer_pool_size = 24G
innodb_buffer_pool_instances = 24
innodb_flush_log_at_trx_commit = 0
innodb_flush_method = O_DIRECT
innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode = 2
innodb_log_file_size = 400M
open_files_limit = 65535
table_open_cache = 16384
table-definition-cache = 4096
join_buffer_size = 512K
tmp-table-size = 32M
max-heap-table-size = 32M

[root@noc-db-01 pedro]# systemctl status mysql.service
● mysql.service - Percona XtraDB Cluster
Loaded: loaded (/usr/lib/systemd/system/mysql.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2017-11-14 18:34:42 -02; 1min 12s ago
Process: 12890 ExecStopPost=/usr/bin/mysql-systemd stop-post (code=exited, status=0/SUCCESS)
Process: 12860 ExecStop=/usr/bin/mysql-systemd stop (code=exited, status=2)
Process: 12120 ExecStartPost=/usr/bin/mysql-systemd start-post $MAINPID (code=exited, status=1/FAILURE)
Process: 12119 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)
Process: 12078 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
Main PID: 12119 (code=exited, status=0/SUCCESS)

Nov 14 18:34:42 noc-db-01.vetorial.net systemd[1]: mysql.service: control process exited, code=exited status=1
Nov 14 18:34:42 noc-db-01.vetorial.net mysql-systemd[12120]: ERROR! mysqld_safe with PID 12119 has already exited: FAILURE
Nov 14 18:34:42 noc-db-01.vetorial.net mysql-systemd[12860]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Nov 14 18:34:42 noc-db-01.vetorial.net mysql-systemd[12860]: ERROR! mysql already dead
Nov 14 18:34:42 noc-db-01.vetorial.net systemd[1]: mysql.service: control process exited, code=exited status=2
Nov 14 18:34:42 noc-db-01.vetorial.net mysql-systemd[12890]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Nov 14 18:34:42 noc-db-01.vetorial.net mysql-systemd[12890]: WARNING: mysql may be already dead
Nov 14 18:34:42 noc-db-01.vetorial.net systemd[1]: Failed to start Percona XtraDB Cluster.
Nov 14 18:34:42 noc-db-01.vetorial.net systemd[1]: Unit mysql.service entered failed state.
Nov 14 18:34:42 noc-db-01.vetorial.net systemd[1]: mysql.service failed.

Nov 14 18:34:32 noc-db-01.vetorial.net systemd[1]: Starting Percona XtraDB Cluster...
-- Subject: Unit mysql.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mysql.service has begun starting up.
Nov 14 18:34:32 noc-db-01.vetorial.net mysql-systemd[12120]: State transfer in progress, setting sleep higher
Nov 14 18:34:32 noc-db-01.vetorial.net mysqld_safe[12119]: mysqld_safe Adding '/usr/lib64/libjemalloc.so.1' to LD_PRELOAD for mysqld
Nov 14 18:34:32 noc-db-01.vetorial.net mysqld_safe[12119]: 2017-11-14T20:34:32.919982Z mysqld_safe Logging to '/var/log/mysqld.log'.
Nov 14 18:34:32 noc-db-01.vetorial.net mysqld_safe[12119]: 2017-11-14T20:34:32.923291Z mysqld_safe Logging to '/var/log/mysqld.log'.
Nov 14 18:34:32 noc-db-01.vetorial.net mysqld_safe[12119]: 2017-11-14T20:34:32.967654Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql1
Nov 14 18:34:32 noc-db-01.vetorial.net mysqld_safe[12119]: 2017-11-14T20:34:32.993454Z mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql1/wsrep_recovery.H90Nca' --pid-file='/var/lib/m
Nov 14 18:34:42 noc-db-01.vetorial.net mysql-systemd[12120]: /usr/bin/mysql-systemd: line 140: kill: (12119) - No such process
Nov 14 18:34:42 noc-db-01.vetorial.net systemd[1]: mysql.service: control process exited, code=exited status=1
Nov 14 18:34:42 noc-db-01.vetorial.net mysql-systemd[12120]: ERROR! mysqld_safe with PID 12119 has already exited: FAILURE
Nov 14 18:34:42 noc-db-01.vetorial.net mysql-systemd[12860]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Nov 14 18:34:42 noc-db-01.vetorial.net mysql-systemd[12860]: ERROR! mysql already dead
Nov 14 18:34:42 noc-db-01.vetorial.net systemd[1]: mysql.service: control process exited, code=exited status=2
Nov 14 18:34:42 noc-db-01.vetorial.net mysql-systemd[12890]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Nov 14 18:34:42 noc-db-01.vetorial.net mysql-systemd[12890]: WARNING: mysql may be already dead
Nov 14 18:34:42 noc-db-01.vetorial.net systemd[1]: Failed to start Percona XtraDB Cluster.
-- Subject: Unit mysql.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mysql.service has failed.
--
-- The result is failed.
Nov 14 18:34:42 noc-db-01.vetorial.net systemd[1]: Unit mysql.service entered failed state.
Nov 14 18:34:42 noc-db-01.vetorial.net systemd[1]: mysql.service failed.

Since I need the full cluster working, I removed 5.7.19-29.22.3.el7 and reinstalled 5.7.18-29.20.1.el7, and everything is working again.

Hi! This sounds very similar to an issue we’re seeing. I posted about it in another thread:

https://www.percona.com/forums/questions-discussions/percona-xtradb-cluster/50141-open-table-locks-incrementing-on-one-cluster-node-requires-mysqld-cycle-to-correct

We’re also running Percona-XtraDB-Cluster-server-57-5.7.19-29.22.1.el7.x86_64 and we’ll try downgrading to see if that helps, since we can recreate the problem.

Can you please attach the log file so we can understand exactly what failed on that node?
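
If the full log is too large to attach, something like the following could trim it down first. This is only a rough sketch: the `extract_errors` helper name is made up for illustration, the default path is the `log-error` value from the my.cnf posted above, and it assumes the usual 5.7 error log format, where failures are tagged `[ERROR]` and Galera messages contain `WSREP`.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent [ERROR] and WSREP lines
# from a MySQL error log. Defaults to the log-error path from the
# my.cnf in the first post; pass another path as the first argument.
extract_errors() {
    log_file="${1:-/var/log/mysqld.log}"
    # Keep only lines tagged [ERROR] or mentioning WSREP,
    # then limit the output to the last 200 matches.
    grep -E '\[ERROR\]|WSREP' "$log_file" | tail -n 200
}
```

For example, `extract_errors /var/log/mysqld.log > mysqld-errors.txt` would write the filtered lines to a file that can be attached here. The complete log from the failed start is still more useful if it can be shared.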