MySQL cluster failed after catastrophic power failure and the power generator failed to feed the datacenter

First of all, I’m a newbie to Linux, MySQL, Percona XtraDB Cluster, and everything of their kind. I come from a pure Windows Servers & Networking background, so please bear with my stupid + long questions here; I’m trying hard to pick up Linux and MySQL skills now. :stuck_out_tongue:

Allow me to get to the point. The person who set up this MySQL cluster left months ago, and the cluster kept running without any issue. Unfortunately, last week the main power went down unexpectedly while a storm was hitting the city (our HQ campus), our UPS could not carry the full load for that long, and even worse, the power generator broke too…and bam! All infrastructure equipment shut down as if we had pulled out every plug.

We have 4 nodes (yes, 4 nodes) and one cluster interface server (int) running CentOS 7 + Percona XtraDB Cluster + HAProxy. The two nodes (n1 & n2) in our HQ campus went down, while the other two (n3 & n4) at the branch campus stayed up. When power was restored at HQ, all 3 servers (n1, n2, and int) refused to auto power on from the VMware vCenter setup. I didn’t notice until our automation script sent an alert that it could not reach the DB through the interface server on port 3306.

I’ve tried many methods found on the Percona forum as well as on Experts Exchange, Stack Overflow, and so on, with no luck. The cluster seems completely dead, and I am completely frustrated with no idea what to do. So I registered on this forum to seek help from the experts. Please kindly help me.

Any additional information needed, please do let me know.

I appreciate any support/advice you can provide.

That’s an unpleasant story.

To provide any response, I would need to see the log files from all nodes, and also to know what exactly you are doing and what errors you see.

Hello Vadimtk,

Thank you for reading my post and replying to it.

However, I have already found a way to rebuild and repair the cluster:

  1. Search for the highest seqno among all the nodes.
  2. Power off all nodes except the one with the highest seqno.
  3. Execute `systemctl start mysql@bootstrap.service` on that highest-seqno node.
  4. Wait…and cross your fingers, until the MySQL service on the bootstrap node is up.

*** My steps above may be wrong or not exactly how a Linux expert would investigate the same issue, but this is what I found, and it worked in my case. :slight_smile: ***
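Step 1 above can be sketched as a small script. This is a minimal, hypothetical example, assuming you have copied each node's `/var/lib/mysql/grastate.dat` into a local `grastate/` directory (one file per node; the node names and sample seqno values below are made up for illustration):

```shell
#!/bin/sh
# Compare the Galera seqno from each node's grastate.dat and pick
# the bootstrap candidate. Sample files stand in for real copies.
mkdir -p grastate
printf '# GALERA saved state\nversion: 2.1\nuuid: abc\nseqno: 1213\n' > grastate/n1.dat
printf '# GALERA saved state\nversion: 2.1\nuuid: abc\nseqno: 1217\n' > grastate/n2.dat

best_node=""
best_seqno=-2
for f in grastate/*.dat; do
    # Extract the number after "seqno:" in each file.
    seqno=$(awk '/^seqno:/ {print $2}' "$f")
    # Note: seqno -1 means the node crashed without writing a clean state.
    if [ "$seqno" -gt "$best_seqno" ]; then
        best_seqno=$seqno
        best_node=$f
    fi
done
echo "bootstrap candidate: $best_node (seqno $best_seqno)"
```

If every node shows `seqno: -1` (an unclean shutdown, as after a power cut), this comparison alone cannot decide, and you would need to recover the actual position from each node before choosing which one to bootstrap.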

Thank you


I would not power off the nodes; just shutting down mysqld is enough. Otherwise, your steps are how I would do it.

Hello Vadimtk,

Thank you so much for your kind advice, I will remember that and follow the proper procedure next time.

Have a nice day!