wsrep in pre-commit stage :-( :-( :-(

Hello,

I’m trying to fix an error that we get a few times per day: “wsrep in pre-commit stage”.

We are running a 4-node cluster (Percona XtraDB Cluster (GPL), Release rel70.1, Revision 938, WSREP version 25.8, wsrep_25.8.r4150; the “version” variable reports “5.6.21-70.1-56-log”). Only 1 server receives the writes, and the others serve reads through a load balancer. The servers run Debian 7.8.

Running “SHOW STATUS LIKE 'ws%'” returns the table below.

CPU usage is near zero, memory is only about 50% used, and disk utilization is below 1%. The number of requests per second is low.

Any idea what I should look into?

+------------------------------+----------------------------------------------------------------------------+
| Variable_name                | Value                                                                      |
+------------------------------+----------------------------------------------------------------------------+
| wsrep_local_state_uuid       | 700ede00-9a5c-11e4-8fdd-8e7fb515d63a                                       |
| wsrep_protocol_version       | 5                                                                          |
| wsrep_last_committed         | 429286425                                                                  |
| wsrep_replicated             | 1458702                                                                    |
| wsrep_replicated_bytes       | 750830711                                                                  |
| wsrep_repl_keys              | 4772765                                                                    |
| wsrep_repl_keys_bytes        | 71738407                                                                   |
| wsrep_repl_data_bytes        | 546080166                                                                  |
| wsrep_repl_other_bytes       | 0                                                                          |
| wsrep_received               | 284599                                                                     |
| wsrep_received_bytes         | 138352979                                                                  |
| wsrep_local_commits          | 1457308                                                                    |
| wsrep_local_cert_failures    | 429                                                                        |
| wsrep_local_replays          | 49                                                                         |
| wsrep_local_send_queue       | 71                                                                         |
| wsrep_local_send_queue_avg   | 0.110698                                                                   |
| wsrep_local_recv_queue       | 0                                                                          |
| wsrep_local_recv_queue_avg   | 140.119863                                                                 |
| wsrep_local_cached_downto    | 429079945                                                                  |
| wsrep_flow_control_paused_ns | 8752242215169                                                              |
| wsrep_flow_control_paused    | 0.097632                                                                   |
| wsrep_flow_control_sent      | 5779                                                                       |
| wsrep_flow_control_recv      | 6333                                                                       |
| wsrep_cert_deps_distance     | 61.608880                                                                  |
| wsrep_apply_oooe             | 0.068266                                                                   |
| wsrep_apply_oool             | 0.001023                                                                   |
| wsrep_apply_window           | 1.270167                                                                   |
| wsrep_commit_oooe            | 0.000000                                                                   |
| wsrep_commit_oool            | 0.001011                                                                   |
| wsrep_commit_window          | 1.181970                                                                   |
| wsrep_local_state            | 4                                                                          |
| wsrep_local_state_comment    | Synced                                                                     |
| wsrep_cert_index_size        | 285                                                                        |
| wsrep_causal_reads           | 0                                                                          |
| wsrep_cert_interval          | 0.399305                                                                   |
| wsrep_incoming_addresses     | 10.1.0.63:3306,10.1.0.62:3306,10.1.0.60:3306,10.1.0.68:3306,10.1.0.66:3306 |
| wsrep_cluster_conf_id        | 318                                                                        |
| wsrep_cluster_size           | 5                                                                          |
| wsrep_cluster_state_uuid     | 700ede00-9a5c-11e4-8fdd-8e7fb515d63a                                       |
| wsrep_cluster_status         | Primary                                                                    |
| wsrep_connected              | ON                                                                         |
| wsrep_local_bf_aborts        | 20                                                                         |
| wsrep_local_index            | 2                                                                          |
| wsrep_provider_name          | Galera                                                                     |
| wsrep_provider_vendor        | Codership Oy <info@codership.com>                                          |
| wsrep_provider_version       | 3.5(r178)                                                                  |
| wsrep_ready                  | ON                                                                         |
+------------------------------+----------------------------------------------------------------------------+
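To put a few of these counters in proportion, here is a minimal Python sketch that parses rows copied from the status table and computes some ratios. The helpers `parse_status` and `summarize` are hypothetical (not part of Percona or Galera tooling), and the comments paraphrase the Galera status-variable documentation as an assumption; the numbers alone do not explain the pre-commit stalls.

```python
import re

# Hypothetical helpers (not part of Percona or Galera tooling): parse rows
# copied from the SHOW STATUS table and derive a few ratios from them.


def parse_status(text: str) -> dict:
    """Return {Variable_name: Value} for every '| wsrep_... | ... |' row."""
    status = {}
    for line in text.splitlines():
        match = re.match(r"^\|\s*(wsrep_\w+)\s*\|\s*(.*?)\s*\|$", line.strip())
        if match:
            status[match.group(1)] = match.group(2)
    return status


def summarize(status: dict) -> dict:
    commits = int(status["wsrep_local_commits"])
    addresses = [a for a in status["wsrep_incoming_addresses"].split(",") if a]
    return {
        # Certification failures (write-set conflicts) per local commit.
        "cert_failure_ratio": int(status["wsrep_local_cert_failures"]) / commits,
        # Local transactions aborted by replicated write-sets, per local commit.
        "bf_abort_ratio": int(status["wsrep_local_bf_aborts"]) / commits,
        # Fraction of time replication was paused by flow control
        # since the last FLUSH STATUS.
        "flow_control_paused": float(status["wsrep_flow_control_paused"]),
        # Average receive-queue length; well above 0 suggests write-sets
        # arrive faster than this node applies them.
        "recv_queue_avg": float(status["wsrep_local_recv_queue_avg"]),
        "cluster_size": int(status["wsrep_cluster_size"]),
        "incoming_addresses": len(addresses),
    }


# Rows copied from the table above.
sample = """
| wsrep_local_commits          | 1457308 |
| wsrep_local_cert_failures    | 429 |
| wsrep_local_bf_aborts        | 20 |
| wsrep_flow_control_paused    | 0.097632 |
| wsrep_local_recv_queue_avg   | 140.119863 |
| wsrep_cluster_size           | 5 |
| wsrep_incoming_addresses     | 10.1.0.63:3306,10.1.0.62:3306,10.1.0.60:3306,10.1.0.68:3306,10.1.0.66:3306 |
"""

summary = summarize(parse_status(sample))
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these rows, both `cluster_size` and `incoming_addresses` come out as 5, while the setup is described as 4 nodes, so the node count may be worth double-checking. A `flow_control_paused` of about 0.1 means replication was paused roughly 10% of the time since the last FLUSH STATUS.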

Quick update: we have upgraded to the latest version and are monitoring these servers closely.