Auto-recovery of a 3-VM Percona XtraDB Cluster after abrupt reboots

Hi Team,

I have a 3-node Percona XtraDB Cluster (MySQL 8.x), with each DB VM running on a different host. When I abruptly reboot two nodes (i.e., reboot 2 nodes in parallel), the cluster breaks and the database becomes unavailable. On checking, the mysql service on those two nodes is in a failed state.

Below is the my.cnf that is present on all the nodes.

root@cp-db1:/var/lib# cat  /etc/mysql/mysql.conf.d/mysqld.cnf
[client]
socket=/var/run/mysqld/mysqld.sock

[mysqld]
ssl-cert=/etc/mysql/ssl/tls.pem
ssl-key=/etc/mysql/ssl/key.pem
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
pid-file=/var/run/mysqld/mysqld.pid
bind-address=0.0.0.0
port=3306
binlog_expire_logs_seconds=604800
pxc-encrypt-cluster-traffic=OFF
wsrep_provider=/usr/lib/galera4/libgalera_smm.so

wsrep_cluster_address=gcomm://<ip1>,<ip2>,<ip3>
binlog_format=ROW
#skip-name-resolve
innodb_autoinc_lock_mode=2
wsrep_node_name=cp-db1
wsrep_node_address=<ip1>
wsrep_cluster_name=morpheus
default_storage_engine=InnoDB
wsrep_sync_wait=2
wsrep_provider_options="cert.optimistic_pa=NO"
wsrep_certification_rules="OPTIMIZED"
pxc_strict_mode=PERMISSIVE
wsrep_sst_method=xtrabackup-v2
default_time_zone="+00:00"
max_connections=3001
sql_generate_invisible_primary_key=ON
ssl_fips_mode=ON

The only way I can bring the DB cluster back up is by bootstrapping the DB nodes. Is there a way to implement auto-recovery by setting a configuration in my.cnf? Is auto-recovery recommended for a multi-master replication DB? Can it cause data corruption?

I'd really appreciate your quick recommendations on this.

Yes, that is 100% expected behavior. You removed 2 of the 3 nodes from the cluster, leaving only 1 node, which is less than the majority (more than 50%) needed for quorum.
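The arithmetic behind that: Galera keeps a Primary component only while a strict majority of the last known cluster size is still connected. A minimal sketch of that majority rule (my own illustration, not Percona tooling):

```shell
# Quorum arithmetic for a 3-node Galera/PXC cluster.
cluster_size=3
failed=2
remaining=$((cluster_size - failed))
majority=$((cluster_size / 2 + 1))   # strict majority: 2 of 3

if [ "$remaining" -ge "$majority" ]; then
  state=Primary
else
  state=non-Primary                  # what you observe after rebooting 2 nodes
fi
echo "remaining=$remaining majority=$majority state=$state"
```

With 2 of 3 nodes gone, the single survivor cannot reach the majority of 2, so it drops to non-Primary and stops serving queries.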

Your cluster is now in a non-Primary state and needs to be bootstrapped. Correct, there is no way to auto-recover from this.

Basically, don’t reboot the majority of your nodes at the same time. If you need to, perform a graceful MySQL shutdown first; a gracefully stopped node is removed from the quorum calculation.
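For the bootstrap itself, the usual approach is to find the most advanced node by inspecting grastate.dat and start only that node in bootstrap mode. A sketch with a sample grastate.dat embedded for illustration (on a real node you would read /var/lib/mysql/grastate.dat; the uuid/seqno values below are fabricated):

```shell
# Sketch: pick the bootstrap candidate from grastate.dat.
# Sample file embedded for illustration only; on a real node read
# /var/lib/mysql/grastate.dat instead.
cat > /tmp/grastate.sample <<'EOF'
# GALERA saved state
version: 2.1
uuid:    6e837d42-aaaa-11ee-8888-000000000000
seqno:   1234
safe_to_bootstrap: 1
EOF

safe=$(awk '/^safe_to_bootstrap:/ {print $2}' /tmp/grastate.sample)
seqno=$(awk '/^seqno:/ {print $2}' /tmp/grastate.sample)
echo "seqno=$seqno safe_to_bootstrap=$safe"

if [ "$safe" = "1" ]; then
  # Run only on this one node; the others then rejoin with a normal
  # `systemctl start mysql` (IST/SST as needed).
  echo "bootstrap here: systemctl start mysql@bootstrap.service"
fi
```

Compare seqno across all three nodes and bootstrap the one with the highest value (set its safe_to_bootstrap to 1 first if no node has it, which happens after a hard crash of everything).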

Thank you so much for the quick response. I agree that with 2 of the 3 nodes down, the majority is lost and hence the cluster breaks. However, the situation we hit was not user-triggered but system-triggered: the system rebooted both nodes on its own. During planned maintenance we do bring down the services gracefully.

I was looking for options or automated ways to recover after sudden reboots (like the operator does for pods; see build/pxc-entrypoint.sh in the percona/percona-xtradb-cluster-operator repository on GitHub at commit fc46e369c9cc1bfca4552fdc6b204f9b5b243227). I will also keep exploring ways to do that in the meantime.
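For context, scripts like that entrypoint typically work by running `mysqld --wsrep-recover` on each node and comparing the recovered positions; the node with the highest seqno is the one to bootstrap. A minimal sketch of just the parsing step, using a hard-coded sample log line (a real script would grep this "Recovered position" line out of the error log after the recovery run):

```shell
# Sketch: extract uuid:seqno from the "Recovered position" line that
# `mysqld --wsrep-recover` writes to the error log.
# The line below is a fabricated sample for illustration.
line='WSREP: Recovered position: 6e837d42-aaaa-11ee-8888-000000000000:1234'

seqno=${line##*:}                     # text after the last colon
pos=${line#*Recovered position: }     # "uuid:seqno"
uuid=${pos%:*}                        # strip the ":seqno" suffix

echo "uuid=$uuid seqno=$seqno"
# An automated recovery would collect seqno from every node and
# bootstrap the one with the highest value.
```

Automating the final bootstrap step is riskier outside Kubernetes: the operator can coordinate all pods centrally, whereas a per-node script that bootstraps on its own can split-brain the cluster if two nodes decide to bootstrap independently.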