Master hanging with slave stuck on state init, info SAVEPOINT

It happened again, and this time I captured the output below.

Main DB:

While down -

mysql> show status like '%wsrep%';
+------------------------------+--------------------------------------+
| Variable_name | Value |
+------------------------------+--------------------------------------+
| wsrep_local_state_uuid | 2f59a082-a179-11e2-0800-b01a353c36fc |
| wsrep_protocol_version | 5 |
| wsrep_last_committed | 231400086 |
| wsrep_replicated | 6140774 |
| wsrep_replicated_bytes | 15650843998 |
| wsrep_repl_keys | 58701660 |
| wsrep_repl_keys_bytes | 611089291 |
| wsrep_repl_data_bytes | 10476034924 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 49991 |
| wsrep_received_bytes | 400270 |
| wsrep_local_commits | 6140774 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 190 |
| wsrep_local_send_queue_avg | 0.009599 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.005021 |
| wsrep_local_cached_downto | 231367756 |
| wsrep_flow_control_paused_ns | 601363003898 |
| wsrep_flow_control_paused | 0.001168 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 33 |
| wsrep_cert_deps_distance | 33.923281 |
| wsrep_apply_oooe | 0.009462 |
| wsrep_apply_oool | 0.000005 |
| wsrep_apply_window | 1.013666 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.004106 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 184182 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0.025854 |
| wsrep_incoming_addresses | ,10.0.11.32:3306 |
| wsrep_cluster_conf_id | 2 |
| wsrep_cluster_size | 2 |
| wsrep_cluster_state_uuid | 2f59a082-a179-11e2-0800-b01a353c36fc |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 0 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 3.5(r178) |
| wsrep_ready | ON |
+------------------------------+--------------------------------------+
47 rows in set (0.00 sec)

Some of the 200+ queries stuck in this state -

| 9476801 | dbuser | 10.0.11.24:56432 | dbname | Query | 179 | wsrep in pre-commit stage | commit | 0 | 0 |
| 9484122 | dbuser | 10.0.11.27:35341 | dbname | Query | 179 | wsrep in pre-commit stage | commit | 0 | 0 |
| 9484123 | dbuser | 10.0.11.27:35344 | dbname | Query | 179 | wsrep in pre-commit stage | commit | 0 | 0 |
| 9484124 | dbuser | 10.0.11.26:49171 | dbname | Query | 179 | wsrep in pre-commit stage | commit | 0 | 0 |
| 9484125 | dbuser | 10.0.11.25:39560 | dbname | Query | 179 | wsrep in pre-commit stage | commit | 0 | 0 |
| 9484131 | dbuser | 10.0.11.24:33790 | dbname | Query | 178 | wsrep in pre-commit stage | INSERT INTO tb1 (g_id, q, r, r_short) VALUES (28592235, 'xe0t | 0 | 0 |
| 9484134 | dbuser | 10.0.11.24:33799 | dbname | Query | 178 | wsrep in pre-commit stage | commit | 0 | 0 |
| 9484140 | dbuser | 10.0.11.26:49184 | dbname | Query | 178 | wsrep in pre-commit stage | INSERT INTO tb1 (g_id, q, r, r_short) VALUES (28598937, 'ik9v | 0 | 0 |
| 9484143 | dbuser | 10.0.11.24:33806 | dbname | Query | 178 | wsrep in pre-commit stage | commit | 0 | 0 |
| 9484148 | dbuser | 10.0.11.26:49189 | dbname | Query | 178 | wsrep in pre-commit stage | commit | 0 | 0 |
| 9484149 | dbuser | 10.0.11.27:35363 | dbname | Query | 178 | wsrep in pre-commit stage | commit | 0 | 0 |
| 9484151 | dbuser | 10.0.11.25:39572 | dbname | Query | 178 | wsrep in pre-commit stage | commit | 0 | 0 |
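
For anyone who wants a count instead of scrolling through the full process list, a query along these lines should summarize the stuck threads. This is only a sketch: it assumes information_schema.PROCESSLIST is available with the standard STATE and TIME columns, and the column aliases are just illustrative names.

```sql
-- Group current threads by their wsrep state and show how long
-- the oldest one in each state has been waiting (TIME is in seconds).
SELECT STATE,
       COUNT(*)  AS threads,
       MAX(TIME) AS longest_wait_s
FROM information_schema.PROCESSLIST
WHERE STATE LIKE 'wsrep%'
GROUP BY STATE
ORDER BY threads DESC;
```

Run during the hang, this would be expected to show the bulk of the connections under 'wsrep in pre-commit stage', matching the rows listed above.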

Once back up -

mysql> show status like '%wsrep%';
+------------------------------+--------------------------------------+
| Variable_name | Value |
+------------------------------+--------------------------------------+
| wsrep_local_state_uuid | 2f59a082-a179-11e2-0800-b01a353c36fc |
| wsrep_protocol_version | 5 |
| wsrep_last_committed | 231403297 |
| wsrep_replicated | 6143985 |
| wsrep_replicated_bytes | 15655416221 |
| wsrep_repl_keys | 58733820 |
| wsrep_repl_keys_bytes | 611420652 |
| wsrep_repl_data_bytes | 10480070282 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 50023 |
| wsrep_received_bytes | 400636 |
| wsrep_local_commits | 6143985 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.009632 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.005118 |
| wsrep_local_cached_downto | 231367901 |
| wsrep_flow_control_paused_ns | 611924764021 |
| wsrep_flow_control_paused | 0.001188 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 33 |
| wsrep_cert_deps_distance | 33.932388 |
| wsrep_apply_oooe | 0.009558 |
| wsrep_apply_oool | 0.000006 |
| wsrep_apply_window | 1.017967 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.008289 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 353 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0.030719 |
| wsrep_incoming_addresses | |
| wsrep_cluster_conf_id | 3 |
| wsrep_cluster_size | 1 |
| wsrep_cluster_state_uuid | 2f59a082-a179-11e2-0800-b01a353c36fc |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 0 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 3.5(r178) |
| wsrep_ready | ON |
+------------------------------+--------------------------------------+
47 rows in set (0.00 sec)
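
Comparing the two dumps, the values that differ most between "while down" and "once back up" are wsrep_local_send_queue (190 vs 0), wsrep_cert_index_size (184182 vs 353), wsrep_incoming_addresses, wsrep_cluster_conf_id (2 vs 3) and wsrep_cluster_size (2 vs 1). To watch just those next time rather than all 47 rows, something like the following might be easier to read. The variable list is my own selection from the output above, not an official set of health indicators.

```sql
-- Show only the wsrep status variables that changed between the two dumps.
SHOW GLOBAL STATUS
WHERE Variable_name IN (
    'wsrep_local_send_queue',
    'wsrep_cert_index_size',
    'wsrep_incoming_addresses',
    'wsrep_cluster_conf_id',
    'wsrep_cluster_size'
);
```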