All nodes say safe_to_bootstrap: 0 in freshly deployed cluster

Hello, I am trying to deploy MySQL 8 using the Kubernetes Operator (v1.7.0), but I cannot get it to run properly. After I kubectl apply my cluster-cr.yaml, the first database pod (cluster1-pxc-0) spins up, and with cat /var/lib/mysql/grastate.dat inside the container I can see that safe_to_bootstrap is set to 1. After the first pod is successfully deployed, the second pod (cluster1-pxc-1) spins up and the safe_to_bootstrap value of cluster1-pxc-0 turns to 0. After the second pod is ready, a third pod spins up, and I end up with 3 nodes with safe_to_bootstrap: 0, which to my understanding means the cluster is not functional, as there is no replication between the nodes.
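
For reference, this is roughly how I check the file on each pod (the namespace percona-database is from my setup, and I am assuming here that the PXC container inside the pod is named pxc):

kubectl -n percona-database exec cluster1-pxc-0 -c pxc -- cat /var/lib/mysql/grastate.dat
kubectl -n percona-database exec cluster1-pxc-1 -c pxc -- cat /var/lib/mysql/grastate.dat
kubectl -n percona-database exec cluster1-pxc-2 -c pxc -- cat /var/lib/mysql/grastate.dat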

I have played this through several times with fresh nodes & volumes, and I always get the same result.

This also shows in the backups, which fail similarly to the issue described in this thread. In fact, I can also see that all 3 nodes are in the “Donor/Desynced” state.

When I describe my cluster CRD, I see:

My cr.yaml is pasted here.

If anyone can give me a hint as to what I might be doing wrong, I would be very happy!

Best regards,

Jonathan

The grastate.dat file does NOT represent the “current status” of the cluster, so stop examining this file. It is not updated while the cluster is running. The safe_to_bootstrap flag is set to 0 on startup of any node to prevent accidental bootstraps should that node die and restart. The flag is set back to 1 only on the last node to shut down cleanly.
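
For reference, after the whole cluster has been shut down cleanly, grastate.dat on the last node to stop looks roughly like this (the uuid and seqno below are placeholders and will differ on your system):

# GALERA saved state
version: 2.1
uuid:    f0e1d2c3-0000-0000-0000-000000000000
seqno:   1234
safe_to_bootstrap: 1

While mysqld is running, the same file simply shows seqno: -1 and safe_to_bootstrap: 0, which is exactly what you are observing.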

Thanks for the quick answer, Matthew!

So, putting the grastate file aside, the backup pods are still failing, logging the following:

+ peer-list -on-start=/usr/bin/get-pxc-state -service=cluster1-pxc
2021/04/06 16:20:19 Peer finder enter
2021/04/06 16:20:19 Determined Domain to be percona-database.svc.cluster.local
2021/04/06 16:20:19 Peer list updated
was [ ]
now [cluster1-pxc-0.cluster1-pxc.percona-database.svc.cluster.local cluster1-pxc-1.cluster1-pxc.percona-database.svc.cluster.local cluster1-pxc-2.cluster1-pxc.percona-database.svc.cluster.local]
2021/04/06 16:20:19 execing: /usr/bin/get-pxc-state with stdin: cluster1-pxc-0.cluster1-pxc.percona-database.svc.cluster.local
cluster1-pxc-1.cluster1-pxc.percona-database.svc.cluster.local
cluster1-pxc-2.cluster1-pxc.percona-database.svc.cluster.local
2021/04/06 16:20:19
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
node:cluster1-pxc-0.cluster1-pxc.percona-database.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Donor/Desynced:wsrep_cluster_status:Primary
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
node:cluster1-pxc-1.cluster1-pxc.percona-database.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Donor/Desynced:wsrep_cluster_status:Primary
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
node:cluster1-pxc-2.cluster1-pxc.percona-database.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Donor/Desynced:wsrep_cluster_status:Primary
2021/04/06 16:20:20 Peer finder exiting
[ERROR] Cannot find node for backup
+ echo '[ERROR] Cannot find node for backup'
+ exit 1

If I understand the thread that I linked in my original post correctly, the failure is caused by all DB instances being in the “Desynced” state (which I mistakenly thought was caused by grastate.dat). Unfortunately, the thread gives no clue how to proceed. This behaviour is reproducible for my specific setup, i.e. I can delete the percona-cluster-cr from my cluster, rebuild the nodes, redeploy the CR, and it will return to this state.
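
For completeness, this is how I am checking the live node state now instead of grastate.dat (same namespace and container-name assumptions as above; $ROOT_PASSWORD stands in for the root password from my secrets file):

kubectl -n percona-database exec cluster1-pxc-0 -c pxc -- \
  mysql -uroot -p"$ROOT_PASSWORD" -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"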

Do you have an idea how to troubleshoot this?

Unfortunately, I’m not well versed in K8s. It looks like something is missing from your secrets file. Did you create the right user and deploy that secret beforehand?

The “xtrabackup” entry is in my secrets file and I can also find the secret mounted to /etc/mysql/mysql-users-secret/xtrabackup in each of the database containers.

Following a suspicion, I thought that maybe the secrets were unreadable for the backup container, but they are mounted rw-r--r--, so any user should be able to read them.
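
For reference, these are roughly the checks I ran (my-cluster-secrets is just whatever spec.secretsName in my cr.yaml points to, and the container name pxc is assumed as above):

kubectl -n percona-database get secret my-cluster-secrets -o jsonpath='{.data.xtrabackup}' | base64 -d
kubectl -n percona-database exec cluster1-pxc-0 -c pxc -- ls -l /etc/mysql/mysql-users-secret/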
