PXC 8.0 on EKS with operator: writes hang when one node leaves the cluster

Antonio_Falzarano · October 22, 2024, 4:19pm

Hi, I have already opened issue PXC-4524 for what seems to be a serious bug that we found and reproduced in PXC 8.0.
All the steps to reproduce it are written in the issue.
I am also posting here to get help and feedback on it.

Thanks

matthewb · October 23, 2024, 12:09pm

There are no log files attached. It is hard to diagnose the problem without seeing the error logs from the PXC nodes.

Antonio_Falzarano · October 23, 2024, 1:36pm

Hi @matthewb, I have all the logs you need. On issue PXC-4524 I only included the SHOW PROCESSLIST output because I can’t find a way to upload files there and the logs exceed the comment size limit, but I can put them here.
show_global_status.txt (16.7 KB)
show_innodb_status.txt (37.2 KB)
show_process_list.txt (4.8 KB)

matthewb · October 23, 2024, 2:07pm

Hello @Antonio_Falzarano, we really need the logs. The 3 items you uploaded do not help diagnose the issue. Please remove log contents for dates prior to this issue, compress them, then upload to the ticket or here. Don’t put them as a comment.

Antonio_Falzarano · October 23, 2024, 4:15pm

Hi @matthewb, I am putting the logs here because I can’t find a way to upload them on Atlassian.
Also, the forum doesn’t allow compressed file extensions, so I renamed the file and added .txt.

db6-crash.tgz.txt (53.3 KB)

matthewb · October 24, 2024, 1:00pm

I see some issues:

Too many connections

You need to increase max_connections or reduce the max number of frontend connections.
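
As a rough sketch of how one might check this on a node, the statements below compare the configured limit with the number of connections actually in use. The value 500 is purely illustrative and not a recommendation from this thread; pick a value that fits the node's memory and workload.

```sql
-- Configured connection limit on this node.
SHOW GLOBAL VARIABLES LIKE 'max_connections';

-- Highest number of simultaneous connections seen since the server started.
SHOW GLOBAL STATUS LIKE 'Max_used_connections';

-- Connections open right now.
SHOW GLOBAL STATUS LIKE 'Threads_connected';

-- Raise the limit at runtime (illustrative value, assumption only).
-- A runtime change is lost on restart unless it is also put in the server configuration.
SET GLOBAL max_connections = 500;
```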

2024-10-09T08:41:05.090211Z 9 [ERROR] [MY-010584] [Repl] Replica SQL: Error ‘Table ‘sbtest3’ already exists’ on query. Default database: ‘sbtest’. Query: ‘CREATE TABLE sbtest3( id INTEGER NOT NULL AUTO_INCREMENT, k INTEGER DEFAULT ‘0’ NOT NULL, c CHAR(120) DEFAULT ‘’ NOT NULL, pad CHAR(60) DEFAULT ‘’ NOT NULL, PRIMARY KEY (id) ) /*! ENGINE = innodb */’, Error_code: MY-001050

This caused a cluster-wide revote of membership, and may have caused some nodes to drop out and re-SST. Try to use CREATE TABLE IF NOT EXISTS. The error says ‘Replica SQL’; are you using async replication somewhere? If so, it looks like replication was connected to another node, then to this node, which repeated the binlog contents. If you are using async replication, only one member of PXC should be handling this.
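
For illustration, this is roughly what the statement from the error log would look like with IF NOT EXISTS added. The column definitions are copied from the logged query; whether the tool that generates this DDL can be made to emit this form is an assumption.

```sql
-- Same table definition as in the logged query, but idempotent:
-- if sbtest3 already exists, the statement produces a warning instead of error 1050.
CREATE TABLE IF NOT EXISTS sbtest3 (
  id  INTEGER   NOT NULL AUTO_INCREMENT,
  k   INTEGER   DEFAULT '0' NOT NULL,
  c   CHAR(120) DEFAULT ''  NOT NULL,
  pad CHAR(60)  DEFAULT ''  NOT NULL,
  PRIMARY KEY (id)
) ENGINE = InnoDB;
```

Note that this only avoids the duplicate-table error; it does not address why the same binlog events were applied twice.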

Antonio_Falzarano · October 24, 2024, 1:52pm

Hi @matthewb, the too many connections error is a consequence of the bug: the writer PXC node hangs on the DDL queries, and they pile up until they reach the max connections limit.

About the replica: yes, we have one as well, but you can ignore it because I have also reproduced the bug without it. I gave you the test-case logs with the replica attached because that was the first case I found, but I also have logs without it. In fact, the configuration to reproduce the bug that I shared with you has no replica at all.

matthewb · October 24, 2024, 2:09pm

Yes, please provide all of the exact steps to reproduce the issue without async replication. Include in your steps setting up EKS, installing the operator, launching the cluster, pod status, etc, etc. We need to reproduce it exactly as you, so don’t leave out any details. Please put all these steps into the JIRA ticket, as JIRA and these forums are not linked.

Antonio_Falzarano · November 21, 2024, 11:44am

Hi, we solved our issue by setting innodb_thread_concurrency = 0, as suggested in PXC-4524.
I am also linking the bug here; it was already covered by MariaDB in https://jira.mariadb.org/browse/MDEV-25883, and I suggest Percona apply the same patch.
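
For anyone landing here with the same symptom, a minimal sketch of checking and applying the setting at runtime follows. It assumes an account with the privileges needed to change global variables; how the value is made permanent depends on how the cluster's configuration is managed.

```sql
-- Check the current value on the node.
SHOW GLOBAL VARIABLES LIKE 'innodb_thread_concurrency';

-- 0 means InnoDB places no limit on the number of concurrently running threads.
-- The variable is dynamic, so this takes effect without a restart.
SET GLOBAL innodb_thread_concurrency = 0;
```

A runtime change like this does not survive a restart, so the same value would also need to go into the server configuration.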

Thanks
