Database cluster hang problem

bthnklc · February 20, 2023, 1:21pm

Hello,

We have a 5-node Percona XtraDB Cluster, and all nodes work as master-master. We are using MySQL version 8.0.29. 3 nodes are located in dc1 and 2 nodes in dc2. I added a slave node to one of the dc2 nodes, using delayed replication with a 3-hour delay. About half an hour later, all the master servers were down.
Note: I installed the slave database server as Percona Server. In the attachments I have included the log of the slave node and the log of the master server it is connected to. I would like you to share your knowledge and experience about whether this setup can work and why the cluster went down.
dc2_error_20_158_slave.log (7.1 KB)
error_19_156_other.log (16.0 KB)
dc1_error_19_156_main.log (370.5 KB)
dc2_error_20_157.log (41.8 KB)
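For reference, a 3-hour delay like the one described above is configured on the replica itself. A minimal sketch, assuming MySQL 8.0.23 or later syntax (CHANGE REPLICATION SOURCE TO ... SOURCE_DELAY); delayed_replica_sql is a hypothetical helper name that only prints the statements to run on the replica:

```shell
#!/bin/sh
# Hypothetical helper: print the statements that put an existing replica on a
# fixed apply delay. SOURCE_DELAY is given in seconds (MySQL 8.0.23+ syntax;
# older releases use CHANGE MASTER TO ... MASTER_DELAY instead).
delayed_replica_sql() {  # $1 = delay in whole hours
  delay_seconds=$(( $1 * 3600 ))  # 3 hours -> 10800 seconds
  printf '%s\n' \
    "STOP REPLICA;" \
    "CHANGE REPLICATION SOURCE TO SOURCE_DELAY = ${delay_seconds};" \
    "START REPLICA;"
}

# Usage on the replica (assumes a working local mysql client login):
#   delayed_replica_sql 3 | mysql
```

SHOW REPLICA STATUS then reports the configured value in its SQL_Delay column.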

Pep_Pla · February 20, 2023, 9:41pm

Hi @bthnklc

I think there are several things in your configuration that can lead to problems:

A 5-node PXC cluster split across two DCs (if, for whatever reason, communication between them breaks, dc2 will go down).
PXC with multiple writing nodes and foreign keys.
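The dc2 point follows from Galera's quorum rule: after a partition, only the side holding more than half of the previous primary component stays Primary and keeps serving queries. A simplified sketch, assuming all five nodes keep the default weight (pc.weight=1) and none of them left gracefully; has_quorum is a hypothetical helper name:

```shell
#!/bin/sh
# Simplified Galera quorum check with equal node weights: a partition stays
# Primary only if it holds strictly more than half of the nodes that were in
# the previous primary component.
has_quorum() {  # $1 = nodes in this partition, $2 = nodes in previous primary component
  [ $(( 2 * $1 )) -gt "$2" ]
}

# The 3 + 2 layout from this thread, split along the DC boundary.
for dc in "dc1 3" "dc2 2"; do
  set -- $dc
  if has_quorum "$2" 5; then
    echo "$1: $2 of 5 nodes -> Primary"
  else
    echo "$1: $2 of 5 nodes -> non-Primary"
  fi
done
```

With this layout it prints "dc1: 3 of 5 nodes -> Primary" and "dc2: 2 of 5 nodes -> non-Primary", which is why the two dc2 nodes are the ones that stop serving when the link between the DCs fails.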

It looks like something happened to your cluster, and then there was a cascade of events that brought the whole cluster down.

I recommend that you try to simplify the architecture to find the root cause of the problem:

Do not write into multiple nodes.
Remove triggers (or use them only for integrity validation, not to propagate changes).

Pep

bthnklc · March 2, 2023, 12:15pm

There is no problem with the connection between DC1 and DC2; it has never failed. We use a 10 Gbit link. In the existing 5-node setup, applications are pointed at only one server via DNS. That raises a question: if a write is performed on one of the other nodes in this setup, does the cluster still work?

Pep_Pla · March 2, 2023, 9:02pm

2023-02-20T08:00:29.003926Z 0 [Note] [MY-000000] [Galera] (92d437c2-a685, 'ssl://0.0.0.0:4567') connection to peer f31f99f1-92e7 with addr ssl://172.19.0.157:4567 timed out, no messages seen in PT3S,

It looks like there were some timeouts between nodes. That is why I mentioned possible network issues.
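To see how often this happened and which peers were involved, the attached error logs can be scanned for the same message. A rough sketch; peer_timeouts is a hypothetical helper, and the sed pattern assumes the exact wording of the log line quoted above:

```shell
#!/bin/sh
# Count Galera "connection to peer ... timed out" messages per peer address
# in a mysqld error log, most frequent peer first.
peer_timeouts() {  # $1 = path to an error log
  sed -n 's/.*connection to peer [^ ]* with addr \([^ ]*\) timed out.*/\1/p' "$1" \
    | sort | uniq -c | sort -rn
}

# Usage:
#   peer_timeouts dc1_error_19_156_main.log
```

Each output line is a count followed by a peer address, so a single node that keeps appearing points at that host or its link rather than at the whole DC interconnect.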

The cluster should keep working if writes go to any node, but this can cause concurrency issues.
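In Galera, these concurrency issues show up as certification conflicts: when two nodes change the same rows at about the same time, one of the transactions is rolled back. A sketch for watching the relevant counters, wsrep_local_cert_failures and wsrep_local_bf_aborts; conflict_counters is a hypothetical helper, and the mysql invocation in the comment assumes a working client login on the node:

```shell
#!/bin/sh
# Filter "name<TAB>value" status lines down to the two Galera conflict
# counters. Input is the output of:
#   mysql -N -s -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_%'"
conflict_counters() {
  awk -F '\t' '$1 == "wsrep_local_cert_failures" || $1 == "wsrep_local_bf_aborts" { print $1 "=" $2 }'
}

# Usage on each node:
#   mysql -N -s -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_%'" | conflict_counters
```

If writes only ever reach one node, both counters should stay close to zero; values that grow over time suggest that some application is writing through another node.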

bthnklc · March 3, 2023, 6:48am

Thank you for the information. Separately from this issue, I have one more question. In a 3-node cluster, I bootstrap from one server and then start the other nodes in order. When restarts are required, I can restart the other nodes one by one with systemctl restart, but the node I bootstrapped from does not come back up after it is shut down. How should I proceed here?

Pep_Pla · March 14, 2023, 2:26pm

Do you mean the database does not start when you restart the node? Or that issuing systemctl restart mysql does nothing?

In the second case, try the following:

systemctl stop mysql@bootstrap # only if the other nodes are running!
systemctl start mysql

Then check if systemctl restart mysql works.
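The same steps, with a safety check in front, can be sketched as a small script. This assumes the PXC 8.0 unit names used above (mysql@bootstrap and mysql) and a local mysql client login; run, cluster_size and leave_bootstrap_mode are hypothetical helper names, and setting DRY_RUN=1 only prints the systemctl commands:

```shell
#!/bin/sh
# Execute a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "+ $*"; else "$@"; fi
}

# Number of nodes this node currently sees in the cluster.
cluster_size() {
  mysql -N -s -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'" | awk '{print $2}'
}

# Move the bootstrapped node back to the normal mysql unit, but only if
# other nodes are joined (otherwise stopping it takes the cluster down).
leave_bootstrap_mode() {  # $1 (optional) = cluster size, otherwise queried
  size=${1:-$(cluster_size)}
  if [ "${size:-0}" -le 1 ]; then
    echo "wsrep_cluster_size=${size:-unknown}: other nodes are not running, keeping the bootstrap node up" >&2
    return 1
  fi
  run systemctl stop mysql@bootstrap
  run systemctl start mysql
}
```

After leave_bootstrap_mode succeeds, systemctl restart mysql can be checked as described above. The guard only reads wsrep_cluster_size on this node, so it is a convenience check, not a full health check of the other nodes.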
