I have a 5-node PXC cluster (2 nodes in DC1, 3 nodes in DC2). HAProxy is configured to point to one primary write node in the local DC, and the backend config lists the other nodes (local DC first, then remote DC).
Issue scenario: DC2 was active, DC1 passive. Apps were connected to Node1 in DC2. Max_used_connections reached the max_connections limit and all writes/reads stalled. The spike in connection count happened within 10 minutes, almost doubling the normal connection count.
I then stopped the Percona service on Node1 in DC2, expecting connections to be routed to Node2 in DC2 since quorum (3 out of the remaining 4 nodes) was satisfied, but this didn't help; writes still were not happening.
I then stopped the Percona service on two nodes in the passive DC (DC1), which brought the cluster back to normal.
At this point the first node in DC2 (active) was stopped, along with two nodes in DC1 (passive).
I understand the PXC considerations for synchronous replication. Below are a few questions that will help me investigate my case, on which I need your insight.
Reason for connection spike: Assuming the earlier connections to the DB were stalled and the apps didn't receive any response, so the clients started reconnecting, which doubled the connection count and blocked all reads/writes: how can I check this from the DB? Is there a way I can check from the DB or Grafana where I can see if my connections to the DB were stalled, left idle, or waiting?
I have log_error_verbosity=3, but I don't have any information on whether the previous connections were stalled or what caused the connection spike. I saw only "Too many connections" recorded in the error log.
Quorum was not satisfied, hence writes were blocked? When one node is down in the cluster, shouldn't the remaining 4 nodes have a majority and continue cluster functionality? Quorum should be maintained at the cluster level, not the DC level, correct?
Will having an arbitrator node solve the issue? 2 nodes in DC1, 2 nodes in DC2, 1 node in DC3: will this architecture help? When one node in the primary site is down, we again end up with 4 nodes; will that help satisfy quorum?
Write consistency was missing? Is it that the remaining 4 nodes in the cluster were not aware of the write-consistent state of the first node, which was stopped due to high connections, and hence the cluster blocked all reads and writes until a manual bootstrap of the node?
I only stopped the service on the passive nodes; I did not bootstrap any node to recover the cluster.
I have seen that a graceful shutdown of one node does not impact reads/writes in my cluster. But when a sudden event (e.g., a connection spike) or weird behavior puts a machine in a bad state, my cluster is not able to recover even with 4 healthy nodes. Is this expected behavior? Do I need any tuning?
Do you recommend running PXC in production with an active-active-active setup over WAN (2 nodes DC1, 3 nodes DC2, 2 nodes DC3)?
Reason for connection spike: Assuming the earlier connections to the DB were stalled and the apps didn't receive any response, so the clients started reconnecting, which doubled the connection count and blocked all reads/writes: how can I check this from the DB? Is there a way I can check from the DB or Grafana where I can see if my connections to the DB were stalled, left idle, or waiting?
I have log_error_verbosity=3, but I don't have any information on whether the previous connections were stalled or what caused the connection spike. I saw only "Too many connections" recorded in the error log.
For current monitoring or validation, you can check using the commands below.
mysql> show full processlist;
or
mysql> select * from information_schema.processlist;
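To distinguish idle connections from stalled ones, it can also help to aggregate the processlist by COMMAND and STATE (a sketch against information_schema; a COMMAND of 'Sleep' is an idle connection holding a slot, while a large TIME together with a non-empty STATE usually indicates a waiting or stalled query):

```sql
-- Summarize what all current sessions are doing.
SELECT COMMAND, STATE,
       COUNT(*)  AS sessions,
       MAX(TIME) AS max_seconds
FROM information_schema.PROCESSLIST
GROUP BY COMMAND, STATE
ORDER BY sessions DESC;
```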
For more granular information (transactions, semaphores, redo log, etc.), you can check the InnoDB status.
mysql> show engine innodb status\G
For historical insight into connections, it’s better to have some monitoring enabled, like Grafana or PMM.
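For a quick headroom check against the connection limit (and the high-water mark since the last restart), you can compare the counters below; if Threads_connected approaches max_connections while Threads_running is also high, queries are likely backing up rather than sitting idle:

```sql
-- Configured limit vs. current and peak usage.
SHOW GLOBAL VARIABLES LIKE 'max_connections';
SHOW GLOBAL STATUS WHERE Variable_name IN
  ('Threads_connected', 'Threads_running', 'Max_used_connections');
```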
Quorum was not satisfied, hence writes were blocked? When one node is down in the cluster, shouldn't the remaining 4 nodes have a majority and continue cluster functionality? Quorum should be maintained at the cluster level, not the DC level, correct?
Will having an arbitrator node solve the issue? 2 nodes in DC1, 2 nodes in DC2, 1 node in DC3: will this architecture help? When one node in the primary site is down, we again end up with 4 nodes; will that help satisfy quorum?
This also depends on how you configured your DR. Is the other side (DR) connected to the DC via asynchronous replication, or is it part of the Galera replication group directly?
Considering 3 nodes in a DC, if one node is down, the primary component should ideally still be available, as the majority of nodes (2 members) remain; however, there are other scenarios, like flow control, which can stall writes on the entire cluster when triggered. Please note that PXC/Galera uses virtually synchronous replication, so any node that is struggling or already has performance problems can degrade or impact the whole cluster's performance.
You can monitor, based on the status variables, which node triggered flow control [wsrep_flow_control_sent] and which other nodes received/were impacted by FC [wsrep_flow_control_recv].
node1 [localhost:29334] {root} ((none)) > show global status like 'wsrep_flow_control_%';
+----------------------------------+--------------+
| Variable_name | Value |
+----------------------------------+--------------+
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_flow_control_active | false |
| wsrep_flow_control_requested | false |
| wsrep_flow_control_interval | [ 173, 173 ] |
| wsrep_flow_control_interval_low | 173 |
| wsrep_flow_control_interval_high | 173 |
| wsrep_flow_control_status | OFF |
+----------------------------------+--------------+
10 rows in set (0.00 sec)
You can read more about flow control and related settings here -
Write consistency was missing? Is it that the remaining 4 nodes in the cluster were not aware of the write-consistent state of the first node, which was stopped due to high connections, and hence the cluster blocked all reads and writes until a manual bootstrap of the node?
I only stopped the service on the passive nodes; I did not bootstrap any node to recover the cluster.
Well, to know exactly what happened, you could share the MySQL error logs from all 5 nodes for the period during which the issue occurred. It was likely due to heavy connections or writes, some flow-control activity that impacted the whole cluster, or a performance issue on the specific node.
I have seen that a graceful shutdown of one node does not impact reads/writes in my cluster. But when a sudden event (e.g., a connection spike) or weird behavior puts a machine in a bad state, my cluster is not able to recover even with 4 healthy nodes. Is this expected behavior? Do I need any tuning?
Sudden write bursts can cause FC issues if they exceed the receive queue limit. Moreover, if max_connections is exhausted on the database, then even when one node is down, the others might not be able to clear the connections, or the limit may already have been reached there as well.
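If flow control turns out to be the trigger, the receive-queue threshold can be adjusted at runtime through wsrep_provider_options (the value below is purely illustrative; the right limit depends on your workload and hardware):

```sql
-- Raise the receive-queue length at which a node emits flow control.
SET GLOBAL wsrep_provider_options = 'gcs.fc_limit=500';
-- Confirm the active provider options:
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';
```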
What exactly were those connections doing? Were they mostly writing, or also reading?
Any unoptimized/slow query can also impact performance. There could also be contention (semaphores) or locking overhead.
To check real-time locking/waiting details, you can use the queries below.
SELECT
waiting_trx_id,
waiting_pid,
waiting_query,
blocking_trx_id,
blocking_pid,
blocking_query
FROM sys.innodb_lock_waits;
SELECT THREAD_ID FROM performance_schema.threads WHERE PROCESSLIST_ID = <blocking_pid>;
SELECT THREAD_ID, SQL_TEXT FROM performance_schema.events_statements_current
WHERE THREAD_ID = <THREAD_ID>;
Do you recommend running PXC in production with an active-active-active setup over WAN (2 nodes DC1, 3 nodes DC2, 2 nodes DC3)?
We don't encourage running multi-writer setups on PXC nodes; at any given time, only one node should be taking writes.
Writing to multiple nodes can cause data inconsistencies and certification conflicts/deadlocks.
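One way to enforce a single writer is at the proxy layer, as in your HAProxy setup: only one server is active in the write backend and the rest are marked backup, so failover promotes one node at a time. A minimal sketch (IPs and hostnames are placeholders; the port-9200 health check assumes the usual clustercheck script served via xinetd):

```
backend pxc_writes
    mode tcp
    option httpchk
    # node1 takes all writes; backups are promoted one at a time on failure
    server node1 10.0.1.11:3306 check port 9200
    server node2 10.0.1.12:3306 check port 9200 backup
    server node3 10.0.1.13:3306 check port 9200 backup
```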