Increased number of deadlocks with Percona XtraDB Cluster with the PXC Operator & ProxySQL

Description:

We are encountering frequent deadlocks in our Percona XtraDB Cluster (PXC) setup after migrating from a single-instance MySQL 8 deployment. Our cluster runs on AWS EKS using the latest version of Percona XtraDB Cluster Operator with 3 nodes (pxc-0, pxc-1, and pxc-2), and we are using ProxySQL to route connections to the cluster.

We suspect that write traffic is intermittently being routed to nodes other than pxc-0 (i.e., pxc-1 or pxc-2). This seems to be causing transaction inconsistencies and deadlocks, especially under concurrent load from our application (PHP Symfony with Doctrine ORM).

We would appreciate any advice or diagnostic steps to confirm whether ProxySQL is indeed violating the single-writer configuration, and how best to enforce strict write routing in our environment.

We are using the default ProxySQL configuration that ships with the Operator; the only change we made was increasing default_query_timeout.

The deadlocks occur only on high-load tables during peak traffic; everything else performs well and we see no issues there.

Version:

percona-xtradb-cluster-operator:1.17.0
percona-xtradb-cluster:8.0.41-32.1
ProxySQL version 2.7.1-percona-1.2

Logs:

This log is from pxc-1

2025-06-12T09:25:08.618349Z 19 [Note] [MY-000000] [WSREP] Victim thread: 
   THD: 198, mode: local, state: exec, conflict: certifying, seqno: -1
   SQL: INSERT INTO <redacted>
TRANSACTION 2336874677, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT , undo log entries 1
MySQL thread id 20, OS thread handle 139774872467008, query id 3463480 wsrep: writing rows
TRANSACTION 2336874678, ACTIVE 0 sec
mysql tables in use 1, locked 1
, undo log entries 1
MySQL thread id 198, OS thread handle 139747376543296, query id 3463478 10.198.27.70 app update
INSERT INTO <redacted>

Expected Result:

  • All write operations are routed strictly to pxc-0.
  • No transactional deadlocks.

Actual Result:

  • Deadlocks occur frequently under concurrent load.
  • PMM and ProxySQL statistics suggest that write traffic may be reaching pxc-1 or pxc-2 unexpectedly.
  • Application experiences transaction failures due to deadlocks.

Additional Information:

  • We are running the database on AWS EKS with EBS gp3 volumes provisioned at the maximum 16,000 IOPS and 1,000 MiB/s throughput
  • Our PXC pods have a 6 CPU request and a 52Gi memory request
  • We are using a custom MySQL configuration:
      [mysqld]
      local_infile=1
      sql_mode="NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION"
      pxc_strict_mode=permissive
      default_time_zone="Europe/Berlin"
      wsrep_auto_increment_control=off
      join_buffer_size=64M
      key_buffer_size=16M
      sort_buffer_size=2M
      skip-name-resolve=on
      innodb_redo_log_capacity=4G
      innodb_buffer_pool_instances=18
      innodb_log_buffer_size=16M
      innodb_buffer_pool_size=21G
      max_connections=300
      wsrep_applier_threads=12
      wsrep_retry_autocommit=5
      wsrep_log_conflicts=on
      wsrep_provider_options = "gcache.size=5G; gcache.page_size=5G; gcs.fc_limit = 256; gcs.fc_factor = 0.99; gcs.fc_single_primary = yes"
      binlog_row_metadata=FULL

Thanks for your help!

Hey @kevinrudde,
Log in to the admin interface of ProxySQL and check the mysql_galera_hostgroups table. Make sure max_writers is set to 1. If you change this, be sure to run SAVE MYSQL SERVERS TO DISK and LOAD MYSQL SERVERS TO RUNTIME for the change to take effect. Check the `runtime_mysql_servers` table too and ensure there's only 1 server in the writer hostgroup.
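
For reference, the checks above can be run from the ProxySQL admin interface roughly like this (a sketch; the writer hostgroup ID 11 matches the output later in this thread but depends on your setup):

```sql
-- Verify that at most one writer is allowed per Galera hostgroup
SELECT writer_hostgroup, max_writers, writer_is_also_reader
FROM mysql_galera_hostgroups;

-- If max_writers needs changing, update it, then apply and persist
UPDATE mysql_galera_hostgroups SET max_writers = 1;
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

-- Confirm only one ONLINE server sits in the writer hostgroup
SELECT hostgroup_id, hostname, status, weight
FROM runtime_mysql_servers
WHERE hostgroup_id = 11;
```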

If all that checks out, then we need to look further. You can pull down ProxySQL's logs to verify whether the writer is changing.
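
Besides the logs, one way to see where traffic is actually landing is ProxySQL's connection-pool statistics (again a sketch; hostgroup IDs depend on your configuration):

```sql
-- Per-server traffic counters from the ProxySQL admin interface.
-- Non-zero Queries in the writer hostgroup for pxc-1/pxc-2 would
-- indicate that writes are reaching those nodes.
SELECT hostgroup, srv_host, status, ConnUsed, Queries
FROM stats_mysql_connection_pool
ORDER BY hostgroup, srv_host;
```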

Hey @matthewb,

max_writers is set to 1.

mysql> select * from mysql_galera_hostgroups;
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
| writer_hostgroup | backup_writer_hostgroup | reader_hostgroup | offline_hostgroup | active | max_writers | writer_is_also_reader | max_transactions_behind | comment |
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
| 11               | 12                      | 10               | 13                | 1      | 1           | 1                     | 100                     | NULL    |
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
1 row in set (0.036 sec)

However, there are multiple servers in the writer hostgroup, with different weights. I believe the PXC Operator adds them; I did not configure anything in ProxySQL myself.

mysql> select * from mysql_servers;
+--------------+---------------------------------------------------------------+------+-----------+--------+---------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname                                                      | port | gtid_port | status | weight  | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------------------------------------------------------+------+-----------+--------+---------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | core-pxc-pxc-0.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0         | ONLINE | 1000000 | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | core-pxc-pxc-1.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0         | ONLINE | 1000    | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | core-pxc-pxc-2.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0         | ONLINE | 1000    | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | core-pxc-pxc-0.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0         | ONLINE | 1000000 | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | core-pxc-pxc-1.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0         | ONLINE | 1000    | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | core-pxc-pxc-2.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0         | ONLINE | 1000    | 0           | 1000            | 0                   | 0       | 0              |         |
| 12           | core-pxc-pxc-1.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0         | ONLINE | 1000    | 0           | 1000            | 0                   | 0       | 0              |         |
| 12           | core-pxc-pxc-2.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0         | ONLINE | 1000    | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+---------------------------------------------------------------+------+-----------+--------+---------+-------------+-----------------+---------------------+---------+----------------+---------+
8 rows in set (0.031 sec)
mysql> select * from runtime_mysql_servers;
+--------------+---------------------------------------------------------------+------+-----------+---------+---------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname                                                      | port | gtid_port | status  | weight  | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------------------------------------------------------+------+-----------+---------+---------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | core-pxc-pxc-0.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0         | ONLINE  | 1000000 | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | core-pxc-pxc-1.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0         | ONLINE  | 1000    | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | core-pxc-pxc-2.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0         | ONLINE  | 1000    | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | core-pxc-pxc-0.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0         | ONLINE  | 1000000 | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | core-pxc-pxc-1.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0         | SHUNNED | 1000    | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | core-pxc-pxc-2.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0         | SHUNNED | 1000    | 0           | 1000            | 0                   | 0       | 0              |         |
| 12           | core-pxc-pxc-1.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0         | ONLINE  | 1000    | 0           | 1000            | 0                   | 0       | 0              |         |
| 12           | core-pxc-pxc-2.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0         | ONLINE  | 1000    | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+---------------------------------------------------------------+------+-----------+---------+---------+-------------+-----------------+---------------------+---------+----------------+---------+
8 rows in set (0.024 sec)

I can’t find anything useful in the ProxySQL logs, only that the operator removes all servers and re-adds them.

HID: 11 , address: core-pxc-pxc-0.core-pxc-pxc.core-production.svc.cluster.local , port: 3306 , gtid_port: 0 , weight: 1000000 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: 
HID: 11 , address: core-pxc-pxc-1.core-pxc-pxc.core-production.svc.cluster.local , port: 3306 , gtid_port: 0 , weight: 1000 , status: SHUNNED , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: 
HID: 11 , address: core-pxc-pxc-2.core-pxc-pxc.core-production.svc.cluster.local , port: 3306 , gtid_port: 0 , weight: 1000 , status: SHUNNED , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: 
HID: 10 , address: core-pxc-pxc-0.core-pxc-pxc.core-production.svc.cluster.local , port: 3306 , gtid_port: 0 , weight: 1000000 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: 
HID: 10 , address: core-pxc-pxc-1.core-pxc-pxc.core-production.svc.cluster.local , port: 3306 , gtid_port: 0 , weight: 1000 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: 
HID: 10 , address: core-pxc-pxc-2.core-pxc-pxc.core-production.svc.cluster.local , port: 3306 , gtid_port: 0 , weight: 1000 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: 
HID: 12 , address: core-pxc-pxc-1.core-pxc-pxc.core-production.svc.cluster.local , port: 3306 , gtid_port: 0 , weight: 1000 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: 
HID: 12 , address: core-pxc-pxc-2.core-pxc-pxc.core-production.svc.cluster.local , port: 3306 , gtid_port: 0 , weight: 1000 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: 
2025-06-16 18:54:47 [INFO] Dumping mysql_servers: ALL
+-----+---------------------------------------------------------------+------+------+---------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
| hid | hostname                                                      | port | gtid | weight  | status | cmp | max_conns | max_lag | ssl | max_lat | comment | mem_pointer     |
+-----+---------------------------------------------------------------+------+------+---------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
| 11  | core-pxc-pxc-0.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0    | 1000000 | 0      | 0   | 1000      | 0       | 0   | 0       |         | 140207130166912 |
| 12  | core-pxc-pxc-2.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0    | 1000    | 0      | 0   | 1000      | 0       | 0   | 0       |         | 140206822589952 |
| 12  | core-pxc-pxc-1.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0    | 1000    | 0      | 0   | 1000      | 0       | 0   | 0       |         | 140206822588992 |
| 10  | core-pxc-pxc-2.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0    | 1000    | 0      | 0   | 1000      | 0       | 0   | 0       |         | 140206822589760 |
| 10  | core-pxc-pxc-1.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0    | 1000    | 0      | 0   | 1000      | 0       | 0   | 0       |         | 140206822588800 |
| 10  | core-pxc-pxc-0.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0    | 1000000 | 0      | 0   | 1000      | 0       | 0   | 0       |         | 140206822590528 |
| 11  | core-pxc-pxc-2.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0    | 1000    | 1      | 0   | 1000      | 0       | 0   | 0       |         | 140206822589568 |
| 11  | core-pxc-pxc-1.core-pxc-pxc.core-production.svc.cluster.local | 3306 | 0    | 1000    | 1      | 0   | 1000      | 0       | 0   | 0       |         | 140207130167104 |
+-----+---------------------------------------------------------------+------+------+---------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
2025-06-16 18:54:47 [INFO] Received SAVE MYSQL QUERY RULES FROM RUNTIME command
2025-06-16 18:54:47 [INFO] Received SAVE MYSQL USERS FROM RUNTIME command
2025-06-16 18:54:47 [INFO] Received SAVE MYSQL VARIABLES FROM RUNTIME command

That is strange behavior from the operator. When you use ProxySQL by itself, it does not do this; it keeps only one server in the writer hostgroup.

The weights are indeed causing your issue. Weight is probabilistic, not a hard guarantee.

You can remove the other two servers from that hostgroup (provided the operator does not re-add them after some time), or change the weights of the other two to 1. Either will significantly lower the probability that they are selected for query execution.
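
To put numbers on it: with weights 1,000,000 / 1,000 / 1,000, each secondary is picked with probability roughly 1,000 / 1,002,000 ≈ 0.1% per connection; at weight 1 that drops to about 1 / 1,000,002 ≈ 0.0001%. The weight change could be applied like this (a sketch using the hostgroup and hostnames from your output; note the operator may overwrite these values on its next sync):

```sql
-- Lower the weight of the two secondaries in the writer hostgroup
UPDATE mysql_servers SET weight = 1
WHERE hostgroup_id = 11
  AND hostname IN (
    'core-pxc-pxc-1.core-pxc-pxc.core-production.svc.cluster.local',
    'core-pxc-pxc-2.core-pxc-pxc.core-production.svc.cluster.local'
  );
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```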

You might also open a JIRA ticket at https://jira.percona.com/ for the operator on this to see what our developers have to say. Perhaps there is a hidden/tech-preview feature to disable this behavior.
