I’m testing a 3-node Percona cluster, and when a node is force-rebooted, mysql doesn’t start automatically. systemctl status mysql gives me:
mysql-systemd[618]: WARNING: Node has been rebooted, /var/lib/mysql/grastate.dat: seqno = -1, mysql service has not been started automatically
Looking at the systemd unit file, I see it calls /usr/bin/mysql-systemd check-grastate
Which does the following:
check_grastate_dat() {
    local seqno=-1
    local uptime=$(awk '{print int($1/60)}' /proc/uptime)
    if [ $uptime -lt 5 ]; then
        if [ -f $grastate_loc ]; then
            seqno=$(grep 'seqno:' $grastate_loc | cut -d: -f2 | tr -d ' ')
            if [ $seqno -eq -1 ]; then
                log_warning_msg "Node has been rebooted, $grastate_loc: seqno = $seqno, mysql service has not been started automatically"
                exit 1
            fi
        else
…
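For context, on a node that shut down cleanly, grastate.dat records the last committed seqno, while after a crash or forced reboot the seqno stays at -1. A typical post-crash file looks like this (the UUID below is illustrative):

```
# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0
```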
So if a node is force-rebooted and uptime is under 5 minutes, systemctl start mysql fails with an error. After 5 minutes, the same systemctl start mysql command starts the service with no problem (the warning is still logged).
Why? I assume the intention is that an operator can check everything is OK and then start it manually after 5 minutes. But if I want a completely automatic system (for example, after a forced reboot caused by a power failure), is it safe to remove the "exit 1" line so the node starts automatically?
This is a safety gate to prevent unsafe automatic startup in an invalid Galera state. It was added in PXC-2985, and startup is only blocked after an unclean shutdown.
It is not advisable to remove the exit 1 from the script.
I’ve read it and I have the same question. When a node is force-rebooted, "systemctl start mysql" fails for the first 5 minutes of uptime. After that, it starts with no errors. Nothing changes during those first 5 minutes, so why is it safe to start after 5 minutes but not before?
Hi, thanks for the reply. I understand, but that doesn’t cover the scenario of a power failure with no DBA doing anything. The idea is a fully 24x7 available system with no manual intervention; in this case, when a power failure happens, the node will not start mysqld automatically and will require a DBA to start it manually. Most (if not all) high-availability solutions cover this scenario with no manual intervention.
Right, for cases like this, I believe the solution is to migrate to the PXC Operator and let the Kubernetes controller and StatefulSet handle pod restarts.
For on-premises or self-managed clusters, it is not advisable: if all nodes restart abruptly at almost the same time while DMLs are active on any given node, the DBA needs to run each member with --wsrep_recover to repair grastate.dat, determine which node has the latest commit, and start that node as the bootstrap node. Automatic service startup could start a member that is behind, resulting in inconsistencies or data loss.
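The recovery procedure described above can be sketched roughly as follows. This is a hedged illustration, not an official Percona script; the log path and helper name are assumptions, and only the seqno-extraction step is shown as runnable code:

```shell
# 1. On every node, recover the last committed position without starting MySQL:
#      mysqld --user=mysql --wsrep-recover
#    This writes a line like the following into the error log:
#      WSREP: Recovered position: <cluster-uuid>:<seqno>

# 2. Extract the recovered seqno from a node's error log
#    ($1 = path to the error log; path varies per installation):
recovered_seqno() {
    grep 'Recovered position' "$1" | tail -n1 | sed 's/.*://'
}

# 3. Compare the seqnos from all nodes, then bootstrap the node with the
#    highest one and start the others normally, e.g.:
#      systemctl start mysql@bootstrap   # on the most advanced node
#      systemctl start mysql             # on the remaining nodes
```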
Moreover, editing the mysql-systemd script does not scale and would need to be redone after every upgrade. Your other option is to create a script that starts the service after an OS reboot only if the other nodes are up and running.
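A minimal sketch of such a wrapper script, assuming the peer addresses and port below are placeholders for your environment (4567 is the default Galera replication port). The idea is to start mysql only when another cluster member is already reachable, so this node joins via IST/SST rather than coming up alone on a possibly stale copy:

```shell
#!/bin/bash
# Peer nodes of this member (placeholders, adjust for your cluster):
PEERS="10.0.0.2 10.0.0.3"
PORT=4567   # default Galera replication port

# Succeeds if any peer accepts a TCP connection on $PORT.
peer_up() {
    local host
    for host in $PEERS; do
        if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$PORT" 2>/dev/null; then
            return 0
        fi
    done
    return 1
}

if peer_up; then
    systemctl start mysql
else
    echo "No cluster peer reachable; refusing to auto-start mysql" >&2
fi
```

You could run this from a oneshot systemd unit or a cron @reboot entry; it deliberately does nothing when no peer is reachable, leaving the full-cluster-outage case to the manual --wsrep_recover procedure.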