2nd node crash makes the first node refuse connections

Hi,

I have a 2-node cluster running this version: mysql Ver 8.0.27-18.1 for Linux on x86_64 (Percona XtraDB Cluster (GPL), Release rel18, Revision ac35177, WSREP version 26.4.3) on Ubuntu 20.04.

At the moment my app connects directly to node 1 instead of going through a load balancer, for reasons I can’t go into.
Sometimes the 2nd node crashes for a reason I can’t find, and that makes it impossible to connect to node 1. My app then crashes because it can’t connect to MySQL.

I can’t find any reason in the log for why this is happening.

Hi Simon,

If you are using PXC, you should have at least 3 nodes because of how the “Primary Component” works, as explained here: The Primary Component in Galera Cluster | Galera Cluster for MySQL

You should either add a 3rd node or add an arbitrator: Setting up Galera Arbitrator - Percona XtraDB Cluster

Regards

Hi @Simon_Karberg , welcome to the Percona forums!

As Carlos mentioned, 3 nodes are recommended for production usage. PXC can be used with 2 nodes or even just 1, but you don’t gain any high availability guarantees with fewer than 3 nodes.

It would be best if you shared with us the error logs from node1 and node2 so that we can help diagnose what caused node2 to crash, and why that put node1 into an offline state. Please either post them here after redacting any private information, or send the logs to me over DM. Best of luck!

I will suggest to my manager that we set up a 3rd node.
I really don’t know what happened with the error logs, but nothing was written to the log around the time of the crash.
So I need to update my logging config a little and clean up the log files I currently have (many are leftovers from MySQL 5.6 on Ubuntu 16.04).

So I suggest closing this thread; I will convince my manager to set up a 3rd node.
Maybe one last thing.
Currently it seems that wsrep_cluster_status is Primary on both my nodes. How do I get that changed?

Hi Simon,

Check memory usage. Maybe PXC gets killed (for example, by the Linux OOM killer) without a chance to write to the error.log.

And you cannot change wsrep_cluster_status yourself; it gets updated whenever there are topology changes. You can force a node from “Primary” to “non-Primary” by crashing some of the other instances in the topology on purpose (something you must never do in production).

Regards

Okay, so by adding a 3rd node it should technically automatically change?

No. Adding a 3rd node won’t change the “Primary” component status, but it will help guarantee cluster consistency.

Quorum elections work like this. Whenever there is an unexpected change in the topology (i.e. a node crashed or is unreachable), the remaining nodes hold an election to determine the current cluster status.
Quorum is reached (and the Primary component is maintained) when the remaining nodes hold more than 50% of the votes. Let’s say you have 2 nodes (by default 2 votes in total) and only 1 node is still voting (1/2 = 50%). That is not more than 50%, so if the 2nd node is down, the 1st node will go to “non-Primary” until you restart/reconnect node 2.
If you have 3 nodes and one of them crashes, the other 2 nodes (2 out of 3) will have about 66% of the voting power, which is more than the 50% required, so they will keep the component as Primary.

If the server is in non-Primary, connections won’t be able to execute queries until the component gets back to Primary.

If the 2nd node is shut down gracefully and/or the weights (voting power) of the nodes are different, the 1st node might not go to non-Primary but might continue as Primary and keep serving traffic.
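
To make the arithmetic above concrete, here is a minimal Python sketch of the quorum rule as I understand it from the Galera weighted quorum documentation: the surviving nodes keep Primary when their combined weight is strictly more than half of the previous Primary component's weight, not counting nodes that left gracefully. The function name, node names, and weights are made up for illustration; this is not PXC code.

```python
def has_quorum(previous_weights, current_members, left_gracefully=()):
    """Return True if the surviving nodes would stay in the Primary component.

    previous_weights: dict of node name -> weight for the last Primary component
    current_members:  names of nodes that can still talk to each other
    left_gracefully:  names of nodes that were shut down cleanly
    (All names here are hypothetical, chosen for this example.)
    """
    total = sum(previous_weights.values())
    left = sum(previous_weights[n] for n in left_gracefully)
    present = sum(previous_weights[n] for n in current_members)
    # Strict majority: holding exactly half of the votes is not enough.
    return present > (total - left) / 2


two_nodes = {"node1": 1, "node2": 1}
three_nodes = {"node1": 1, "node2": 1, "node3": 1}

print(has_quorum(two_nodes, ["node1"]))             # node2 crashed
print(has_quorum(three_nodes, ["node1", "node2"]))  # node3 crashed
print(has_quorum(two_nodes, ["node1"], ["node2"]))  # node2 shut down cleanly
```

The first call is the 2-node crash case (1 vote out of 2, no quorum), the second is the 3-node case (2 out of 3), and the third shows why a graceful shutdown of node 2 does not take node 1 out of Primary.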

After the 2nd node crashes, you should check whether the 1st node is in non-Primary, or whether it gets overloaded (too many connections), or whether something else is going on.
Based on what you said (not possible to execute queries on node1), I assumed non-Primary was the most likely reason with PXC, but you had better check both error.logs to get more information about the issue.

Regards

Hey @CTutte

I have now managed to add a 3rd node in the cluster and it’s fully synced.
But when I look at the status with show status like 'wsrep%'; wsrep_cluster_status is listed as Primary on all nodes. Shouldn’t it balance out?

“Primary” means “Primary component”, i.e. all nodes are interacting with each other. It is the healthy status for the cluster, so it is expected on every node.
While in “Primary”, each node works normally, but when in “non-Primary” the node will refuse operations because it cannot be sure it is in sync. Remember that PXC aims for data consistency at the cost of performance overhead.
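
As a rough illustration of how to read those values, here is a small Python sketch that takes the rows of show status like 'wsrep%'; as a dict and describes the node. It only looks at wsrep_cluster_status, wsrep_ready and wsrep_cluster_size; the function name and the sample values are assumptions for the example, and fetching the rows from the server is left out.

```python
def describe_node(status):
    """Summarise one node from its wsrep status variables.

    status: dict of variable name -> value, e.g. built from the rows of
            SHOW STATUS LIKE 'wsrep%'. (Hypothetical helper, not a PXC API.)
    """
    cluster_status = status.get("wsrep_cluster_status", "unknown")
    if cluster_status != "Primary":
        # non-Primary or Disconnected: the node protects consistency
        # by refusing queries.
        return f"not in the Primary component ({cluster_status}): queries will be refused"
    if status.get("wsrep_ready") != "ON":
        return "in the Primary component but not ready for queries"
    size = status.get("wsrep_cluster_size", "?")
    return f"healthy member of a {size}-node Primary component"


# Sample values, assumed for illustration only.
healthy = {"wsrep_cluster_status": "Primary", "wsrep_ready": "ON", "wsrep_cluster_size": "3"}
isolated = {"wsrep_cluster_status": "non-Primary", "wsrep_ready": "OFF", "wsrep_cluster_size": "1"}

print(describe_node(healthy))
print(describe_node(isolated))
```

On a healthy 3-node cluster every node should fall into the last branch, which is why seeing Primary on all three nodes is the expected result.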

If there is a split-brain scenario, quorum elections will be called and the nodes that can still communicate with each other will vote. Depending on the current weight compared with the previous weight, the vote will leave those nodes either in the “Primary” component or in “non-Primary”.

You can read more about this here: The Primary Component in Galera Cluster | Galera Cluster for MySQL
