Is my cluster back in sync?

Hello,

I have a 2-node percona-xtradb-cluster-56 setup in multi-master mode. Both nodes crashed and the cluster was then bootstrapped, possibly in the wrong order, because the users noticed that some data wasn’t up to date. That part aside, what I would like to know is whether the cluster is back in a normally synchronized state now, or still in some flaky, unpredictable state.

Here are the variables for NODE1:

+------------------------------+----------------------------------------------------+
| Variable_name                | Value                                              |
+------------------------------+----------------------------------------------------+
| wsrep_local_state_uuid | 53ef5e93-33de-11e5-adb5-1b37c6b643bd |
| wsrep_protocol_version | 6 |
| wsrep_last_committed | 912789 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 776 |
| wsrep_received_bytes | 56548807 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 2 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0.111111 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 151 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 15.442010 |
| wsrep_local_cached_downto | 912023 |
| wsrep_flow_control_paused_ns | 458 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 1 |
| wsrep_flow_control_recv | 1 |
| wsrep_cert_deps_distance | 24.422425 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 1.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 7 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0.234681 |
| wsrep_incoming_addresses | |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0.000237055/0.000330571/0.00051501/8.63108e-05/13 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | fa66fe2c-34f5-11e5-8c84-035250e696ad |
| wsrep_cluster_conf_id | 4 |
| wsrep_cluster_size | 2 |
| wsrep_cluster_state_uuid | 53ef5e93-33de-11e5-adb5-1b37c6b643bd |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 1 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 3.8(rf6147dd) |
| wsrep_ready | ON |
+------------------------------+----------------------------------------------------+

Here are the variables for NODE2:

+------------------------------+----------------------------------------------------+
| Variable_name                | Value                                              |
+------------------------------+----------------------------------------------------+
| wsrep_local_state_uuid | 53ef5e93-33de-11e5-adb5-1b37c6b643bd |
| wsrep_protocol_version | 6 |
| wsrep_last_committed | 922781 |
| wsrep_replicated | 922780 |
| wsrep_replicated_bytes | 23855221085 |
| wsrep_repl_keys | 3071856 |
| wsrep_repl_keys_bytes | 45801377 |
| wsrep_repl_data_bytes | 6849937076 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 7261 |
| wsrep_received_bytes | 60045 |
| wsrep_local_commits | 922777 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 10 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0.002493 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 2 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0.003443 |
| wsrep_local_cached_downto | 921845 |
| wsrep_flow_control_paused_ns | 43461847996 |
| wsrep_flow_control_paused | 0.000357 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 72 |
| wsrep_cert_deps_distance | 19.291316 |
| wsrep_apply_oooe | 0.057334 |
| wsrep_apply_oool | 0.000002 |
| wsrep_apply_window | 1.086623 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.029616 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 29 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0.093278 |
| wsrep_incoming_addresses | |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0.000275245/0.00125926/0.00935097/0.000899017/1104 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | 53eef647-33de-11e5-9145-eade2fa688ff |
| wsrep_cluster_conf_id | 4 |
| wsrep_cluster_size | 2 |
| wsrep_cluster_state_uuid | 53ef5e93-33de-11e5-adb5-1b37c6b643bd |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 0 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 3.8(rf6147dd) |
| wsrep_ready | ON |
+------------------------------+----------------------------------------------------+

wsrep_local_state_comment says it is Synced, but it says that even when I shut one node down. When I create a database and load data into it on either side, it gets replicated correctly to the other side. What worries me is the missing data the users reported, and what I found in the documentation under “How to recover a PXC cluster, Scenario 6”.
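As far as I understand, the usual way to verify that the nodes really hold the same data is pt-table-checksum from Percona Toolkit. Here is a sketch of what I mean; the host and user are placeholders, and the commands are only printed for review, not executed:

```shell
# Sketch: verify data consistency between the nodes with pt-table-checksum
# (Percona Toolkit). Commands are stored and printed rather than executed,
# so they can be reviewed first; host/user are placeholders.
checks=$(cat <<'EOF'
# Run on one node; --recursion-method=cluster discovers the other node
# via wsrep_incoming_addresses:
pt-table-checksum --host=node1 --user=root --ask-pass --recursion-method=cluster

# Then look for chunks where the row counts or checksums differ:
mysql -e "SELECT db, tbl, chunk, this_cnt, master_cnt
          FROM percona.checksums
          WHERE this_cnt <> master_cnt OR this_crc <> master_crc"
EOF
)
printf '%s\n' "$checks"
```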

If I look at grastate.dat on the nodes:

GALERA saved state

version: 2.1
uuid: 53ef5e93-33de-11e5-adb5-1b37c6b643bd
seqno: -1
cert_index:

The seqno is -1 instead of the last valid sequence number; I don’t know whether this is normal.
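From what I can tell, seqno: -1 is what grastate.dat contains while mysqld is running or after an unclean shutdown; the last committed seqno is only written out on a clean shutdown, and mysqld_safe --wsrep-recover can recover the real position from InnoDB after a crash. A small sketch that just interprets the field (the sample path is a placeholder):

```shell
# Sketch: interpret the seqno field of a grastate.dat file.
# Prints "unclean" for seqno -1 (crash, or server still running),
# otherwise the position recorded at the last clean shutdown.
check_grastate() {
    seqno=$(awk -F': *' '$1 == "seqno" { print $2 }' "$1")
    if [ "$seqno" = "-1" ]; then
        echo "unclean"
    else
        echo "clean at $seqno"
    fi
}

# Demonstrate against a copy of the file from this post:
cat > /tmp/grastate.sample <<'EOF'
# GALERA saved state
version: 2.1
uuid:    53ef5e93-33de-11e5-adb5-1b37c6b643bd
seqno:   -1
cert_index:
EOF
check_grastate /tmp/grastate.sample    # prints "unclean"
```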

Also, the logs were full of warnings like this one (on node2):

2015-07-28 06:25:02 30682 [Warning] InnoDB: Cannot open table mydb/field_revision_field_decision_nl_mydb_one from the internal data dictionary of InnoDB though the .frm file for the table exists. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html for how you can resolve the problem.
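For what it’s worth, that warning means the table’s .frm file exists but InnoDB’s internal data dictionary has no matching entry, commonly because the .ibd file is missing. Here is a rough heuristic sketch for spotting such orphaned .frm files; it assumes innodb_file_per_table is enabled and is only an approximation:

```shell
# Heuristic sketch: list .frm files that have neither a matching .ibd
# (InnoDB, with innodb_file_per_table) nor .MYD (MyISAM) file. Such
# orphans are one common cause of the "Cannot open table ... from the
# internal data dictionary" warning.
find_orphan_frm() {
    datadir=$1
    for frm in "$datadir"/*/*.frm; do
        [ -e "$frm" ] || continue                          # no .frm files at all
        if [ -e "${frm%.frm}.MYD" ]; then continue; fi     # MyISAM table, skip
        if [ -e "${frm%.frm}.ibd" ]; then continue; fi     # healthy InnoDB table
        echo "$frm"
    done
}

# Usage (datadir path is a placeholder):
# find_orphan_frm /var/lib/mysql
```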

I would really just like to know whether the cluster is back in a normal state so we can continue working.

Can someone help?

Thanks

You can clean up node2’s datadir and then restart node2 so it performs a full SST from node1; that guarantees the two nodes hold consistent data. Also check that you are not using any MyISAM tables, and if you are, convert them to InnoDB, since Galera only replicates InnoDB tables. Finally, add a garbd (Galera Arbitrator) node so the cluster has an odd number of voting members and avoids split-brain when one node can’t communicate with the other.
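The steps above can be sketched like this. The commands are printed for review rather than executed, and the service name, paths, and cluster name are placeholders that depend on your setup:

```shell
# Sketch of the recovery steps: rebuild node2 via SST, find MyISAM
# tables, and add a Galera Arbitrator. Printed only, for review.
steps=$(cat <<'EOF'
## 1. On node2: stop MySQL and move the datadir aside; an empty datadir
##    makes the node request a full SST from node1 on the next start.
service mysql stop
mv /var/lib/mysql /var/lib/mysql.bak
mkdir /var/lib/mysql && chown mysql:mysql /var/lib/mysql
service mysql start

## 2. On either node: list MyISAM tables that should become InnoDB,
##    since Galera only replicates InnoDB tables.
mysql -e "SELECT table_schema, table_name
          FROM information_schema.tables
          WHERE engine = 'MyISAM'
            AND table_schema NOT IN
              ('mysql', 'information_schema', 'performance_schema')"

## 3. On a third machine: run garbd so there are three voting members
##    and a surviving node can still form a Primary component.
garbd --address gcomm://node1,node2 --group my_cluster_name --daemon
EOF
)
printf '%s\n' "$steps"
```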