Cannot join the cluster

Hi,

We have a Percona XtraDB Cluster in production (on CentOS 5.7). These are the installed packages:

Percona-XtraDB-Cluster-client-57.x86_64 5.7.23-31.31.2.el7 @percona-release-x86_64
Percona-XtraDB-Cluster-devel-57.x86_64 5.7.23-31.31.2.el7 @percona-release-x86_64
Percona-XtraDB-Cluster-full-57.x86_64 5.7.23-31.31.2.el7 @percona-release-x86_64
Percona-XtraDB-Cluster-garbd-57.x86_64 5.7.23-31.31.2.el7 @percona-release-x86_64
Percona-XtraDB-Cluster-server-57.x86_64 5.7.23-31.31.2.el7 @percona-release-x86_64
Percona-XtraDB-Cluster-shared-57.x86_64 5.7.23-31.31.2.el7 @percona-release-x86_64
Percona-XtraDB-Cluster-test-57.x86_64 5.7.23-31.31.2.el7 @percona-release-x86_64
percona-xtrabackup-24.x86_64 2.4.12-1.el7 @percona-release-x86_64

Of our 9 nodes, we shut down 4 about two weeks ago for maintenance. When we started one of those nodes today, we couldn’t get it to join the cluster (we tried a couple of times and got the same error each time).

On the donor, we got this in innobackup.backup.log:

191130 16:50:07 [01] Streaming ./ng_subscriptions/ar_internal_metadata.frm to
191130 16:50:07 [01] …done
191130 16:50:07 [01] Streaming ./ng_subscriptions/schema_migrations.frm to
191130 16:50:07 [01] …done
191130 16:50:07 [01] Streaming ./ng_subscriptions/users.frm to
191130 16:50:07 [01] …done
191130 16:50:07 Finished backing up non-InnoDB tables and files
191130 16:50:08 >> log scanned up to (2896176629241)
191130 16:50:08 Executing LOCK BINLOG FOR BACKUP…
191130 16:50:08 [00] Streaming
191130 16:50:08 [00] …done
191130 16:50:08 Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS…
xtrabackup: The latest check point (for incremental): ‘2896122173332’
xtrabackup: Stopping log copying thread.
.191130 16:50:08 >> log scanned up to (2896176681716)

191130 16:50:08 Executing UNLOCK BINLOG
191130 16:50:08 Executing UNLOCK TABLES
191130 16:50:08 All tables unlocked
191130 16:50:08 [00] Streaming ib_buffer_pool to
191130 16:50:08 [00] …done
191130 16:50:11 Backup created in directory ‘/tmp/pxc_sst_MhMv/donor_xb_6vke/’
MySQL binlog position: filename ‘pxcnode-2-bin.002875’, position ‘422058878’
191130 16:50:11 [00] Streaming
191130 16:50:11 [00] …done
Error: failed to execute query SELECT UUID(): Lock wait timeout exceeded; try restarting transaction

On the restarted node, we got this in mysql-error.log:

2019-11-30T16:45:54.093223Z 2 [Note] WSREP: State gap can’t be serviced using IST. Switching to SST
2019-11-30T16:45:54.093253Z 2 [Note] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (299db2e2-3605-11e9-aa1b-33635dc97917): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():538. IST will be unavailable.
2019-11-30T16:45:54.094703Z 0 [Note] WSREP: Member 5.0 (pxcnode-7) requested state transfer from ‘pxcnode-2’. Selected 1.0 (pxcnode-2)(SYNCED) as donor.
2019-11-30T16:45:54.094736Z 0 [Note] WSREP: Shifting PRIMARY → JOINER (TO: 34881053)
2019-11-30T16:45:54.094798Z 2 [Note] WSREP: Requesting state transfer: success, donor: 1
2019-11-30T16:45:54.094834Z 2 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 → 299db2e2-3605-11e9-aa1b-33635dc97917:34881032
2019-11-30T16:45:55.070150Z 0 [Note] WSREP: (de65cb77, ‘tcp://0.0.0.0:4567’) turning message relay requesting off
2019-11-30T16:45:57.240838Z WSREP_SST: [INFO] Proceeding with SST…
2019-11-30T16:45:57.416469Z WSREP_SST: [INFO] …Waiting for SST streaming to complete!
2019-11-30T16:48:01.489501Z 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000000 of size 134217728 bytes
2019-11-30T16:50:04.391910Z 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000001 of size 134217728 bytes
2019-11-30T16:50:41.178825Z WSREP_SST: [INFO] Preparing the backup at /var/lib/mysql//.sst
2019-11-30T16:50:41.318077Z 0 [Warning] WSREP: 1.0 (pxcnode-2): State transfer to 5.0 (pxcnode-7) failed: -22 (Invalid argument)
2019-11-30T16:50:41.318123Z 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():766: Will never receive state. Need to abort.
2019-11-30T16:50:41.318157Z 0 [Note] WSREP: gcomm: terminating thread
2019-11-30T16:50:41.318193Z 0 [Note] WSREP: gcomm: joining thread
2019-11-30T16:50:41.318547Z 0 [Note] WSREP: gcomm: closing backend
2019-11-30T16:50:41.319856Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,02d7a138,100)
memb {
de65cb77,0
}
joined {
}
left {
}
partitioned {
02d7a138,0
441e59e3,0
4d2f3d8d,0
82659ae6,0
cae84b4a,0
}
)
2019-11-30T16:50:41.319926Z 0 [Note] WSREP: Current view of cluster as seen by this node
view ((empty))
2019-11-30T16:50:41.320391Z 0 [Note] WSREP: gcomm: closed
2019-11-30T16:50:41.320421Z 0 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Terminated
2019-11-30T16:50:44.566545Z WSREP_SST: [ERROR] Removing /var/lib/mysql//.sst/xtrabackup_galera_info file due to signal
2019-11-30T16:50:44.575658Z WSREP_SST: [ERROR] Removing file due to signal
2019-11-30T16:50:44.583539Z WSREP_SST: [ERROR] Cleanup after exit with status:143

Any advice?

Steve