Cluster out-of-sync (possibly after running mysql_upgrade ?)

Hi!

First of all, I’m very sorry if this post seems dumb or unnecessary. I’m really a SQL Server kind of guy (had to take over a lot of Percona/MySQL databases due to a guy resigning), so my knowledge of these systems are limited.

We have a three-node cluster (two actual nodes, one Galera arbitrator), running RHEL 7 with Percona XtraDB Cluster 5.7.21-20-57-log.

We have had this cluster running since September this year, and it has been running quite well in fact. However, I noticed this in the error log today for one of the nodes (let’s call it NODE2):

2018-11-12T10:40:09.081278+02:00 2 [ERROR] Slave SQL: Error 'Duplicate column name 'state_'' on query. Default database: 'cos_ci2_admin'. Query: 'alter table Release_ add state_ integer', Error_code: 1060
2018-11-12T10:40:09.081365+02:00 2 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5569099
2018-11-12T10:40:09.081653+02:00 2 [Warning] WSREP: Ignoring error for TO isolated action: source: c3333863-e64d-11e8-b551-43db28e6f0e3 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 25868 trx_id: -1 seqnos (l: 2703, g: 5569099, s: 5569098, d: 5569098, ts: 433484436929401)
2018-11-12T10:43:08.464107+02:00 9 [ERROR] Slave SQL: Error 'Duplicate column name 'schemaVersion'' on query. Default database: 'cos_ci2_admin'. Query: 'alter table Release_ add schemaVersion varchar(75) null', Error_code: 1060
2018-11-12T10:43:08.464223+02:00 9 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5569621
2018-11-12T10:43:08.464562+02:00 9 [Warning] WSREP: Ignoring error for TO isolated action: source: c3333863-e64d-11e8-b551-43db28e6f0e3 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 25871 trx_id: -1 seqnos (l: 3237, g: 5569621, s: 5569620, d: 5569620, ts: 433663818863426)
2018-11-12T10:43:08.512980+02:00 2 [ERROR] Slave SQL: Error 'Duplicate column name 'state_'' on query. Default database: 'cos_ci2_admin'. Query: 'alter table Release_ add state_ integer', Error_code: 1060
2018-11-12T10:43:08.513089+02:00 2 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5569622
2018-11-12T10:43:08.513495+02:00 2 [Warning] WSREP: Ignoring error for TO isolated action: source: c3333863-e64d-11e8-b551-43db28e6f0e3 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 25872 trx_id: -1 seqnos (l: 3238, g: 5569622, s: 5569621, d: 5569621, ts: 433663868524152)

I looked up the error message, and it seems that the cluster is out of sync, if I understand it correctly, because no error message like this exist on NODE1. I also read that it might be caused because you run:

mysql_upgrade

…which will alter tables only on the node you run it on, causing the schema data to be out-of-sync. Is that correct? We ran a mysql_upgrade last week because of an update of Percona.

In either case, I need that error message fixed as soon as possible. I don’t want it to break out of my (albeit limited) control. How do I correct this error? Do I kick out NODE2 from the cluster, manually delete all database files, and re-join it? If so, how do I do that?

This is the my.cnf for NODE1 (the one without the error messages in the log):

[xtrabackup]
open-files-limit=2000000
target_dir=/var/backup/mysql/

[client]
### Certificates ###
ssl-ca=/etc/mysql/ssl-client/ca.pem
ssl-cert=/etc/mysql/ssl-client/client-cert.pem
ssl-key=/etc/mysql/ssl-client/client-key.pem

[sst]
### Encryption of SST (State Snapshot Transfer) ###
encrypt=4
ssl-ca=/etc/mysql/ssl-sst/ca.pem
ssl-cert=/etc/mysql/ssl-sst/server-cert.pem
ssl-key=/etc/mysql/ssl-sst/server-key.pem

[mysqld]
### Cluster Settings ###
# Path to Galera library
wsrep_provider=/usr/lib64/galera3/libgalera_smm.so

# Encrypting Replication/IST Traffic #
wsrep_provider_options="socket.ssl=yes;socket.ssl_ca=/etc/mysql/ssl-replication/ca.pem;socket.ssl_cert=/etc/mysql/ssl-replication/server-cert.pem;socket.ssl_key=/etc/mysql/ssl-replication/server-key.pem"

# Cluster name
wsrep_cluster_name=crdlstgpdb

# Cluster connection URL contains IPs of nodes
#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address=gcomm://10.29.128.10,10.29.128.11,10.29.128.4

# If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name=crdlstgpdb11

# Node IP address
wsrep_node_address=10.29.128.10

# SST method
wsrep_sst_method=xtrabackup-v2

# Authentication for SST method
wsrep_sst_auth="sstuser:****"

# pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=PERMISSIVE

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW
log-bin=crdlstgpdb11-bin
log-bin-index=crdlstgpdb11-bin.index

# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2

# Slave thread to use
wsrep_slave_threads= 8
wsrep_log_conflicts

### Plugins ###
plugin-load-add=validate_password.so
validate-password=FORCE_PLUS_PERMANENT

### Certificates ###
ssl-ca=/etc/mysql/ssl-server/ca.pem
ssl-cert=/etc/mysql/ssl-server/server-cert.pem
ssl-key=/etc/mysql/ssl-server/server-key.pem

### Security ###
tls_version=TLSv1.2
default_password_lifetime=90
validate_password_policy=STRONG
validate_password_dictionary_file=/etc/mysql/PasswordDictionary.txt
default_authentication_plugin=sha256_password
interactive_timeout=900
wait_timeout=900

### Error and General Logging ###
explicit_defaults_for_timestamp=1
log_timestamps=SYSTEM
long_query_time=2
slow_query_log=1
slow_query_log_file=/var/log/mysql/mysql-log-slow-queries.log

log-error=/var/log/mysql/mysql-log-error.log

# Only enable these for troubleshooting purposes
#general_log=1
#general_log_file=/var/log/mysql/mysql-log-general.log

### Audit Logging ###
audit_log_policy=NONE
audit_log_format=JSON
audit_log_file=/var/log/mysql/mysql-log-audit.log
audit_log_rotate_on_size=64M
audit_log_rotations=2

This is the my.cnf for NODE2 (the one with the error message in the log):

[xtrabackup]
open-files-limit=2000000
target_dir=/var/backup/mysql/

[client]
### Certificates ###
ssl-ca=/etc/mysql/ssl-client/ca.pem
ssl-cert=/etc/mysql/ssl-client/client-cert.pem
ssl-key=/etc/mysql/ssl-client/client-key.pem

[sst]
### Encryption of SST (State Snapshot Transfer) ###
encrypt=4
ssl-ca=/etc/mysql/ssl-sst/ca.pem
ssl-cert=/etc/mysql/ssl-sst/server-cert.pem
ssl-key=/etc/mysql/ssl-sst/server-key.pem

[mysqld]
### Cluster Settings ###
# Path to Galera library
wsrep_provider=/usr/lib64/galera3/libgalera_smm.so

# Encrypting Replication/IST Traffic #
wsrep_provider_options="socket.ssl=yes;socket.ssl_ca=/etc/mysql/ssl-replication/ca.pem;socket.ssl_cert=/etc/mysql/ssl-replication/server-cert.pem;socket.ssl_key=/etc/mysql/ssl-replication/server-key.pem"

# Cluster name
wsrep_cluster_name=crdlstgpdb

# Cluster connection URL contains IPs of nodes
#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address=gcomm://10.29.128.10,10.29.128.11,10.29.128.4

# If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name=crdlstgpdb21

# Node IP address
wsrep_node_address=10.29.128.11

# SST method
wsrep_sst_method=xtrabackup-v2

# Authentication for SST method
wsrep_sst_auth="sstuser:FyZxQifhG3yVHiZ7-tQl"

# pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=PERMISSIVE

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW
log-bin=crdlstgpdb21-bin
log-bin-index=crdlstgpdb21-bin.index

# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2

# Slave thread to use
wsrep_slave_threads= 8
wsrep_log_conflicts

### Plugins ###
plugin-load-add=validate_password.so
validate-password=FORCE_PLUS_PERMANENT

### Certificates ###
ssl-ca=/etc/mysql/ssl-server/ca.pem
ssl-cert=/etc/mysql/ssl-server/server-cert.pem
ssl-key=/etc/mysql/ssl-server/server-key.pem

### Security ###
tls_version=TLSv1.2
default_password_lifetime=90
validate_password_policy=STRONG
validate_password_dictionary_file=/etc/mysql/PasswordDictionary.txt
default_authentication_plugin=sha256_password
interactive_timeout=900
wait_timeout=900

### Error and General Logging ###
explicit_defaults_for_timestamp=1
log_timestamps=SYSTEM
long_query_time=2
slow_query_log=1
slow_query_log_file=/var/log/mysql/mysql-log-slow-queries.log

log-error=/var/log/mysql/mysql-log-error.log

# Only enable these for troubleshooting purposes
#general_log=1
#general_log_file=/var/log/mysql/mysql-log-general.log

### Audit Logging ###
audit_log_policy=NONE
audit_log_format=JSON
audit_log_file=/var/log/mysql/mysql-log-audit.log
audit_log_rotate_on_size=64M
audit_log_rotations=2

Garb config for arbitrator (just in case it’s needed):

# Copyright (C) 2012 Codership Oy
# This config file is to be sourced by garb service script.

# A comma-separated list of node addresses (address[:port]) in the cluster
GALERA_NODES="10.29.128.10:4567, 10.29.128.11:4567, 10.29.128.4:4567"

# Galera cluster name, should be the same as on the rest of the nodes.
GALERA_GROUP="crdlstgpdb"

# Optional Galera internal options string (e.g. SSL settings)
# see http://galeracluster.com/documentation-webpages/galeraparameters.html
# Regarding the double quotes, see https://github.com/codership/galera/issues/429
GALERA_OPTIONS="\"socket.ssl_key=/etc/mysql/ssl-replication/server-key.pem;socket.ssl_cert=/etc/mysql/ssl-replication/server-cert.pem;socket.ssl_ca=/etc/mysql/ssl-replication/ca.pem;socket.ssl_cipher=AES128-SHA\""

# Log file for garbd. Optional, by default logs to syslog
# Deprecated for CentOS7, use journalctl to query the log for garbd
LOG_FILE="/var/log/garbd.log"