I’ve been experimenting with a pretty stock Percona XtraDB Cluster (PXC) 8.0.30 install on Ubuntu 22.04, and have had a 3-node cluster up and running without much problem.
I’m interested in how much hand-holding PXC requires if things go bad in the server room, so I tried gracefully shutting down all three nodes and then bringing them back up.
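The shutdown itself was nothing fancy, just the normal service stop on each node in turn, waiting for each one to finish before stopping the next:

systemctl stop mysql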
I was hoping that once they could all talk to each other again the cluster would come back up on its own, but instead all three just sit there. Here’s the /var/log/mysql/error.log output from the first node:
2023-01-31T03:08:36.587890Z 0 [Note] [MY-000000] [WSREP] Starting replication
2023-01-31T03:08:36.587908Z 0 [Note] [MY-000000] [Galera] Connecting with bootstrap option: 0
2023-01-31T03:08:36.587925Z 0 [Note] [MY-000000] [Galera] Setting GCS initial position to 0fff4c84-a109-11ed-a165-0ec17c0a2f1e:11
2023-01-31T03:08:36.587985Z 0 [Note] [MY-000000] [Galera] protonet asio version 0
2023-01-31T03:08:36.594760Z 0 [Note] [MY-000000] [Galera] Using CRC-32C for message checksums.
2023-01-31T03:08:36.594800Z 0 [Note] [MY-000000] [Galera] backend: asio
2023-01-31T03:08:36.594885Z 0 [Note] [MY-000000] [Galera] gcomm thread scheduling priority set to other:0
2023-01-31T03:08:36.594990Z 0 [Note] [MY-000000] [Galera] Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2023-01-31T03:08:36.595009Z 0 [Note] [MY-000000] [Galera] Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2023-01-31T03:08:36.595167Z 0 [Note] [MY-000000] [Galera] GMCast version 0
2023-01-31T03:08:36.595334Z 0 [Note] [MY-000000] [Galera] (8da19da6-a9d8, 'ssl://0.0.0.0:4567') listening at ssl://0.0.0.0:4567
2023-01-31T03:08:36.595353Z 0 [Note] [MY-000000] [Galera] (8da19da6-a9d8, 'ssl://0.0.0.0:4567') multicast: , ttl: 1
2023-01-31T03:08:36.595604Z 0 [Note] [MY-000000] [Galera] EVS version 1
2023-01-31T03:08:36.595688Z 0 [Note] [MY-000000] [Galera] gcomm: connecting to group 'pxc-cluster', peer '10.66.0.111:,10.66.0.112:,10.66.0.113:'
2023-01-31T03:08:36.621682Z 0 [Note] [MY-000000] [Galera] (8da19da6-a9d8, 'ssl://0.0.0.0:4567') connection established to 8bf1e9c6-bb17 ssl://10.66.0.113:4567
2023-01-31T03:08:36.624408Z 0 [Note] [MY-000000] [Galera] (8da19da6-a9d8, 'ssl://0.0.0.0:4567') connection established to 8a096083-bd7d ssl://10.66.0.112:4567
2023-01-31T03:08:36.624587Z 0 [Note] [MY-000000] [Galera] (8da19da6-a9d8, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2023-01-31T03:08:36.624688Z 0 [Note] [MY-000000] [Galera] (8da19da6-a9d8, 'ssl://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address ssl://10.66.0.111:4567
2023-01-31T03:08:36.631984Z 0 [Note] [MY-000000] [Galera] (8da19da6-a9d8, 'ssl://0.0.0.0:4567') connection established to 8bf1e9c6-bb17 ssl://10.66.0.113:4567
2023-01-31T03:08:37.098090Z 0 [Note] [MY-000000] [Galera] EVS version upgrade 0 -> 1
2023-01-31T03:08:37.098177Z 0 [Note] [MY-000000] [Galera] declaring 8a096083-bd7d at ssl://10.66.0.112:4567 stable
2023-01-31T03:08:37.098195Z 0 [Note] [MY-000000] [Galera] declaring 8bf1e9c6-bb17 at ssl://10.66.0.113:4567 stable
2023-01-31T03:08:37.098231Z 0 [Note] [MY-000000] [Galera] PC protocol upgrade 0 -> 1
2023-01-31T03:08:37.099285Z 0 [Warning] [MY-000000] [Galera] no nodes coming from prim view, prim not possible
2023-01-31T03:08:37.099332Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(NON_PRIM,8a096083-bd7d,3)
memb {
8a096083-bd7d,0
8bf1e9c6-bb17,0
8da19da6-a9d8,0
}
joined {
}
left {
}
partitioned {
}
)
2023-01-31T03:08:40.097352Z 0 [Note] [MY-000000] [Galera] (8da19da6-a9d8, 'ssl://0.0.0.0:4567') turning message relay requesting off
The other two nodes have basically identical log messages, just differing in IP addresses.
The first node has this in /var/lib/mysql/grastate.dat:
# GALERA saved state
version: 2.1
uuid: 0fff4c84-a109-11ed-a165-0ec17c0a2f1e
seqno: 11
safe_to_bootstrap: 1
While #2 says:
# GALERA saved state
version: 2.1
uuid: 0fff4c84-a109-11ed-a165-0ec17c0a2f1e
seqno: 10
safe_to_bootstrap: 0
and #3 says:
# GALERA saved state
version: 2.1
uuid: 0fff4c84-a109-11ed-a165-0ec17c0a2f1e
seqno: 9
safe_to_bootstrap: 0
So it seems like the cluster should be bootstrapped from the first node (it has the highest seqno and safe_to_bootstrap: 1), and indeed if I manually run on that node:
systemctl stop mysql
systemctl start mysql@bootstrap
then they all become happy and I see a cluster size of 3 with all nodes reporting “Primary”.
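(For what it’s worth, I’m checking that on each node with something like:

mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'"
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'"
)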
My question is: is there anything that needs to be set up or enabled so that this happens automatically? That is, if a node has “safe_to_bootstrap: 1” when the OS boots, it should start “mysql@bootstrap” rather than plain “mysql”.
I noticed in systemd that mysql@bootstrap is listed as:
Loaded: loaded (/lib/systemd/system/mysql@.service; disabled; vendor preset: enabled)
I thought it was weird that it was disabled even though the vendor preset is enabled. I tried enabling it so that both it and “mysql” started at boot, but that didn’t really help.
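The only workaround I can come up with myself is disabling both units and having a small boot-time script pick the right one based on grastate.dat, something along these lines (just a sketch, the script name is made up and I haven’t actually deployed it):

#!/bin/sh
# pxc-pick-unit.sh (hypothetical): start mysql@bootstrap on the node that
# recorded safe_to_bootstrap: 1, and the plain mysql unit everywhere else.
if grep -q '^safe_to_bootstrap: *1' /var/lib/mysql/grastate.dat 2>/dev/null; then
    systemctl start mysql@bootstrap
else
    systemctl start mysql
fi

But that feels like re-implementing something the packaging probably already handles, so I’d rather not go that route if there’s a supported option.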
Thanks for any suggestions.