I am running 5.6.15-56-log Percona XtraDB Cluster (GPL), Release 25.5, Revision 759, wsrep_25.5.r4061 on Fedora 20. The three nodes in the cluster are happily doing their thing for the most part, but when we start experiencing high traffic the cluster will start locking up.
The connection limit is reached quickly, with most processes in the list showing 'wsrep in pre-commit stage'. The queries are all INSERTs and UPDATEs on the same table (which has a primary key).
The logs don't show anything of interest other than '[Warning] Too many connections'.
I have set gcs.fc_limit to 1000, which has helped reduce how often the cluster locks up, but I cannot eliminate the problem completely.
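For anyone who wants to try the same thing, the setting goes into the wsrep_provider_options string in my.cnf on each node (a restart is needed to pick it up); roughly like this, with 1000 being the value I mentioned above:

    [mysqld]
    # semicolon-separated Galera provider options; gcs.fc_limit is the number of
    # queued writesets a node tolerates before it asks the cluster to pause replication
    wsrep_provider_options = "gcs.fc_limit=1000"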
Other threads have suggested checking:
SHOW STATUS LIKE 'Threads%';
SELECT SUBSTRING_INDEX(host, ':', 1) AS host_name, state, COUNT(*) FROM information_schema.processlist GROUP BY state, host_name;
Unfortunately I haven't been able to execute them while the problem is occurring yet.
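A rough capture loop like the one below (assuming credentials in ~/.my.cnf; the log path and interval are arbitrary) could record that data automatically so it is there the next time the cluster locks up. Note that once 'Too many connections' is hit, even this client may be locked out, so it mainly captures the lead-up.

    # capture loop: log thread counts and process states every 5 seconds
    while true; do
        {
            date
            mysql -e "SHOW STATUS LIKE 'Threads%';"
            mysql -e "SELECT SUBSTRING_INDEX(host, ':', 1) AS host_name, state, COUNT(*) AS cnt
                      FROM information_schema.processlist GROUP BY state, host_name;"
        } >> /tmp/wsrep_diag.log 2>&1
        sleep 5
    done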
Before the cluster we had a Master-Slave setup in place rather than Master-Master. If this problem cannot be addressed, is there an easy way to revert to a Master-Slave setup?
Hi sjregan, did you find any resolution to this? I am getting the same error on my production XtraDB cluster when we try to alter a table with 500K rows.
Here are the package versions:
||/ Name Version Description
un percona-server-client-5.1 (no description available)
un percona-server-client-5.5 (no description available)
un percona-server-common-5.1 (no description available)
un percona-server-common-5.5 (no description available)
un percona-server-server-5.1 (no description available)
un percona-server-server-5.5 (no description available)
ii percona-toolkit 2.2.7 Advanced MySQL and system command-line tools
ii percona-xtrabackup 2.1.8-733-1.precise Open source backup tool for InnoDB and XtraDB
un percona-xtradb-client-5.0 (no description available)
ii percona-xtradb-cluster-client-5.5 5.5.34-25.9-607.precise Percona Server database client binaries
ii percona-xtradb-cluster-common-5.5 5.5.34-25.9-607.precise Percona Server database common files (e.g. /etc/mysql/my.cnf)
un percona-xtradb-cluster-galera (no description available)
ii percona-xtradb-cluster-galera-2.x 163.precise Galera components of Percona XtraDB Cluster
un percona-xtradb-cluster-galera-25 (no description available)
ii percona-xtradb-cluster-server-5.5 5.5.34-25.9-607.precise Percona Server database server binaries
un percona-xtradb-server-5.0 (no description available)
Increasing gcs.fc_limit is the correct workaround, but setting it to 1000 seems to be too much; its default is 16. You should also check disk IO latency and review hardware settings that might need to be tuned for better performance.
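For example, running something like the following on each node during the stalls should show whether disk latency is part of the problem (iostat comes from the sysstat package; the high-latency sign is large values in the await column, and pt-diskstats is an option since percona-toolkit is already installed):

    iostat -dx 5            # extended per-device stats every 5 seconds; watch the await column
    pt-diskstats --interval 5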
Hi jrivera, thanks for the response, but these are machines in the cloud, so I am not sure which hardware settings I can change or how. I am attaching plots from our Nagios graphs showing CPU usage, disk IO, and memory consumption respectively. Do you see anything standing out?
Well, the photo doesn't seem to upload at the right size, and I'm not sure how else to send you the image. All the stats seem quite low: disk IO averages about 150, CPU idle is quite high, memory used for active data is about 67%, and total memory used is about 80%, so I am not sure what could be contributing to this slowness in the cluster.
I am seeing the exact same issue as the OP: cluster replication becomes paused, usually with 4 or 5 MySQL processes in the "wsrep in pre-commit stage" state. While stuck like this, all writes to the cluster are blocked and connections build up until the limit is reached.
OS is CentOS 7.1 with current updates.
I currently have the following Percona RPMs installed: [INDENT]Percona-XtraDB-Cluster-client-56-5.6.26-25.12.1.el7.x86_64
percona-xtrabackup-2.3.2-1.el7.x86_64
Percona-XtraDB-Cluster-garbd-3-3.12.2-1.rhel7.x86_64
Percona-XtraDB-Cluster-full-56-5.6.26-25.12.1.el7.x86_64
percona-toolkit-2.2.11-1.noarch
Percona-XtraDB-Cluster-shared-56-5.6.26-25.12.1.el7.x86_64
Percona-XtraDB-Cluster-galera-3-3.12.2-1.rhel7.x86_64
Percona-XtraDB-Cluster-galera-3-debuginfo-3.12.2-1.rhel7.x86_64
Percona-XtraDB-Cluster-server-56-5.6.26-25.12.1.el7.x86_64
Percona-XtraDB-Cluster-56-debuginfo-5.6.26-25.12.1.el7.x86_64
Percona-XtraDB-Cluster-test-56-5.6.26-25.12.1.el7.x86_64
Percona-XtraDB-Cluster-devel-56-5.6.26-25.12.1.el7.x86_64[/INDENT] I had zero problems with the setup until yesterday, when this started occurring out of the blue with no known changes to the setup. I generally use the cluster in a way that one particular node always receives the traffic from clients, and the other two nodes are backups. I have noticed that while the primary is hung with the "wsrep in pre-commit stage" processes, one of the other nodes will have one of its CPUs pinned at 100%. That part makes sense: the busy node is falling behind on replication, so it triggers flow control and pauses the flow. What I can't figure out is what exactly this node is doing that has the CPU pinned, as nothing talks directly to the backup nodes. They should be doing nothing but applying what they receive via replication.
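If it helps anyone debugging the same thing, these checks on the backup node while the primary is stuck should show whether it is the one triggering the pauses, and whether a single applier thread is the bottleneck (wsrep_slave_threads defaults to 1, which would be consistent with one pinned core):

    SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue%';  -- writesets queued waiting to be applied
    SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%';      -- time paused and FC messages sent/received
    SHOW GLOBAL VARIABLES LIKE 'wsrep_slave_threads';   -- number of parallel applier threads
    SHOW PROCESSLIST;                                   -- what the 'system user' applier threads are doing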
We were going to move the MySQL VMs to some new hosts with SSD disks, so this issue accelerated that plan, but it did not help this problem. Reducing the cluster to 2 nodes helped a lot; the blocking still occurs, but it is limited to a maximum of around 16 seconds at a time, and usually less than that. Here's some vmstat output from the backup server during a locked-up period, at 1-second intervals:
The server has 4 cores; you can see about a quarter of the way down where CPU usage jumps to ~25% (1 pinned core). During that time the cluster is essentially locked for writes.
I have tried tweaking the gcs.fc_limit value to something much higher than 16, to no avail, although I have only been adjusting it dynamically.
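For reference, the runtime change I've been making looks like this (the value shown is just an example):

    SET GLOBAL wsrep_provider_options = 'gcs.fc_limit=512';
    SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';  -- gcs.fc_limit should reflect the change

Since SET GLOBAL does not survive a restart, the wsrep_provider_options entry in my.cnf would still need updating for a value to stick.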