This is now happening on 2 of my 5 nodes. After we updated a mailer program and rebooted, the node can't rejoin the cluster. The only way I know to solve this is to blow away the machine and start from scratch.
Joiner node:
root@webnode2:/home/ken# /etc/init.d/mysql start
Starting mysql (via systemctl): mysql.serviceJob for mysql.service failed because the control process exited with error code.
See "systemctl status mysql.service" and "journalctl -xe" for details.
failed!
root@webnode2:/home/ken# systemctl status mysql.service
● mysql.service - LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon
Loaded: loaded (/etc/init.d/mysql; generated)
Active: failed (Result: exit-code) since Tue 2024-01-30 15:12:55 EST; 40s ago
Docs: man:systemd-sysv-generator(8)
Process: 3979495 ExecStart=/etc/init.d/mysql start (code=exited, status=1/FAILURE)
Jan 30 15:12:35 webnode2.long-mcquade.com systemd[1]: Starting LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon...
Jan 30 15:12:35 webnode2.long-mcquade.com mysql[3979495]: * Stale sst_in_progress file in datadir mysqld
Jan 30 15:12:35 webnode2.long-mcquade.com mysql[3979495]: * Starting MySQL (Percona XtraDB Cluster) database server mysqld
Jan 30 15:12:35 webnode2.long-mcquade.com mysql[3979495]: * State transfer in progress, setting sleep higher mysqld
Jan 30 15:12:55 webnode2.long-mcquade.com mysql[3979495]: * The server quit without updating PID file (/var/run/mysqld/mysqld.pid).
Jan 30 15:12:55 webnode2.long-mcquade.com mysql[3979495]: ...fail!
Jan 30 15:12:55 webnode2.long-mcquade.com systemd[1]: mysql.service: Control process exited, code=exited, status=1/FAILURE
Jan 30 15:12:55 webnode2.long-mcquade.com systemd[1]: mysql.service: Failed with result 'exit-code'.
Jan 30 15:12:55 webnode2.long-mcquade.com systemd[1]: Failed to start LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon.
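The only clue I can see on the joiner is that "Stale sst_in_progress file in datadir" warning. Before blowing the machine away I've started checking for leftovers from the aborted state transfer, roughly like this (just a sketch, and it assumes the default /var/lib/mysql datadir, so adjust the paths if yours differs):

# look for leftovers from the aborted state transfer
ls -la /var/lib/mysql/sst_in_progress /var/lib/mysql/grastate.dat

# clear the stale marker the init script is complaining about
rm -f /var/lib/mysql/sst_in_progress

# seqno: -1 here means the node will ask for a full SST on the next start
cat /var/lib/mysql/grastate.dat

/etc/init.d/mysql start

I'm not sure whether clearing that marker by hand is actually safe, which is part of why I'm asking.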
Donor node:
2024-01-30T20:12:41.941906Z 0 [Note] WSREP: Member 2.0 (pxc2) requested state transfer from 'any'. Selected 0.0 (pxc3)(SYNCED) as donor.
2024-01-30T20:12:43.332884Z 0 [Note] WSREP: (f23c4ba4, 'tcp://0.0.0.0:4567') turning message relay requesting off
2024-01-30T20:12:53.523896Z 0 [Warning] WSREP: 0.0 (pxc3): State transfer to 2.0 (pxc2) failed: -22 (Invalid argument)
2024-01-30T20:12:53.524856Z 0 [Note] WSREP: Member 0.0 (pxc3) synced with group.
2024-01-30T20:12:53.524909Z 0 [Note] WSREP: declaring 225a4946 at tcp://172.26.0.11:4567 stable
2024-01-30T20:12:53.524950Z 0 [Note] WSREP: declaring 38921ada at tcp://172.26.0.9:4567 stable
2024-01-30T20:12:53.524989Z 0 [Note] WSREP: forgetting eb189faa (tcp://172.26.0.12:4567)
2024-01-30T20:12:53.525615Z 0 [Note] WSREP: Node 225a4946 state primary
2024-01-30T20:12:53.526106Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(PRIM,225a4946,183)
memb {
    225a4946,0
    38921ada,0
    f23c4ba4,0
}
joined {
}
left {
}
partitioned {
    eb189faa,0
}
)
2024-01-30T20:12:53.526129Z 0 [Note] WSREP: Save the discovered primary-component to disk
2024-01-30T20:12:53.526690Z 0 [Note] WSREP: forgetting eb189faa (tcp://172.26.0.12:4567)
2024-01-30T20:12:53.526773Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2024-01-30T20:12:53.526812Z 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2024-01-30T20:12:53.527188Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: f32dbad8-bfab-11ee-b9bd-bbf4466ecd50
2024-01-30T20:12:53.527437Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: f32dbad8-bfab-11ee-b9bd-bbf4466ecd50 from 0 (pxc3)
2024-01-30T20:12:53.527453Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: f32dbad8-bfab-11ee-b9bd-bbf4466ecd50 from 1 (pxc5)
2024-01-30T20:12:53.527466Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: f32dbad8-bfab-11ee-b9bd-bbf4466ecd50 from 2 (pxc6)
2024-01-30T20:12:53.527483Z 0 [Note] WSREP: Quorum results:
    version    = 6,
    component  = PRIMARY,
    conf_id    = 171,
    members    = 3/3 (primary/total),
    act_id     = 7130896577,
    last_appl. = 7130896493,
    protocols  = 0/9/3 (gcs/repl/appl),
    group UUID = 3a0118b7-6c7b-11ee-935d-5fcc5d08be87
2024-01-30T20:12:53.527498Z 0 [Note] WSREP: Flow-control interval: [173, 173]
2024-01-30T20:12:53.527687Z 6 [Note] WSREP: REPL Protocols: 9 (4, 2)
2024-01-30T20:12:53.527725Z 6 [Note] WSREP: REPL Protocols: 9 (4, 2)
2024-01-30T20:12:53.527745Z 6 [Note] WSREP: New cluster view: global state: 3a0118b7-6c7b-11eb-935d-5fcc5d08be87:7130896577, view# 172: Primary, number of nodes: 3, my index: 2, protocol version 3
2024-01-30T20:12:53.527756Z 6 [Note] WSREP: Setting wsrep_ready to true
2024-01-30T20:12:53.527767Z 6 [Note] WSREP: Auto Increment Offset/Increment re-align with cluster membership change (Offset: 4 -> 3) (Increment: 4 -> 3)
2024-01-30T20:12:53.528755Z 6 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2024-01-30T20:12:53.531022Z 6 [Note] WSREP: Assign initial position for certification: 7130896577, protocol version: 4
2024-01-30T20:12:53.531133Z 0 [Note] WSREP: Service thread queue flushed.
2024-01-30T20:12:58.835552Z 0 [Note] WSREP: cleaning up eb189faa (tcp://172.26.0.12:4567)
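The only hard failure on the donor is that "-22 (Invalid argument)" on the state transfer, which doesn't tell me much by itself. Assuming these nodes are on the default xtrabackup-v2 SST method (I haven't confirmed wsrep_sst_method on these boxes, so treat this as a guess), my understanding is the donor keeps its own SST log that should have more detail:

# confirm which SST method is actually configured
mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_sst_method';"

# with xtrabackup-v2 the donor writes its SST log into the datadir
# (path assumes the default /var/lib/mysql datadir)
grep -iE 'error|fatal' /var/lib/mysql/innobackup.backup.log | tail -n 20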
Any thoughts on what I can do to resolve this?