Whole cluster crashed because a table was not synced

Hi all,
We have a 7-node cluster spanning three datacenters, with 2/2/3 nodes per datacenter.
We ran into a problem: the whole cluster stopped functioning because a table was not synced to the other nodes, with an 11-minute delay…

The table was first created on node 1 at 2015-02-25 23:59:50, but it was not synced to the other nodes immediately, and the other nodes failed at 0:11:00 when an operation was performed on the table. This left node 1 as a standalone node.

We want to understand why the table was not replicated. Is there any monitoring metric that could alert on such a case?
Is there a known bug for this case?

Logs from the other 6 nodes:

150226 0:11:00 [Warning] WSREP: BF applier failed to open_and_lock_tables: 1146, fatal: 0 wsrep = (exec_mode: 1 conflict_state: 0 seqno: 4190686)
150226 0:11:00 [ERROR] Slave SQL: Error executing row event: 'Table 'keystone.credential' doesn't exist', Error_code: 1146
150226 0:11:00 [Warning] WSREP: RBR event 4963 Write_rows apply warning: 1146, 4190686
150226 0:11:00 [Warning] WSREP: Failed to apply app buffer: seqno: 4190686, status: 1
at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 2th time


150226 0:11:04 [Warning] WSREP: BF applier failed to open_and_lock_tables: 1146, fatal: 0 wsrep = (exec_mode: 1 conflict_state: 0 seqno: 4190686)
150226 0:11:04 [ERROR] Slave SQL: Error executing row event: 'Table 'keystone.credential' doesn't exist', Error_code: 1146
150226 0:11:04 [Warning] WSREP: RBR event 4963 Write_rows apply warning: 1146, 4190686
150226 0:11:04 [ERROR] WSREP: Failed to apply trx: source: 8b8e06eb-594e-11e4-99d3-822b0d796705 version: 2 local: 0 state: APPLYING flags: 1 conn_id: 15113818 trx_id: 4453338 seqnos (l: 4198354, g: 4190686, s: 4190685, d: 4190660, ts: 1424934617949390287)
150226 0:11:04 [ERROR] WSREP: Failed to apply trx 4190686 10 times
150226 0:11:04 [ERROR] WSREP: Node consistency compromized, aborting…
150226 0:11:04 [Note] WSREP: Closing send monitor…
150226 0:11:04 [Note] WSREP: Closed send monitor.
150226 0:11:04 [Note] WSREP: gcomm: terminating thread
150226 0:11:04 [Note] WSREP: gcomm: joining thread
150226 0:11:04 [Note] WSREP: gcomm: closing backend
150226 0:11:04 [Note] WSREP: view(view_id(NON_PRIM,3630e80f-594f-11e4-9fd8-b2a0ad1b9db3,33) memb {
adb89b17-5a25-11e4-833d-c63263a4e8ac,
} joined {
} left {
} partitioned {
3630e80f-594f-11e4-9fd8-b2a0ad1b9db3,
8b8e06eb-594e-11e4-99d3-822b0d796705,
9fd32195-594f-11e4-937f-2b0004fbf107,
cd7de3db-594f-11e4-bc4d-c70bf0d0977b,
f19c53ac-9c38-11e4-a5cb-df74780de7f1,
f814b175-594e-11e4-b14e-becb87dc9620,
})
150226 0:11:04 [Note] WSREP: view((empty))
150226 0:11:04 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
150226 0:11:04 [Note] WSREP: gcomm: closed
150226 0:11:04 [Note] WSREP: Flow-control interval: [16, 16]
150226 0:11:04 [Note] WSREP: Received NON-PRIMARY.
150226 0:11:04 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 4190686)
150226 0:11:04 [Note] WSREP: Received self-leave message.
150226 0:11:04 [Note] WSREP: Flow-control interval: [0, 0]
150226 0:11:04 [Note] WSREP: Received SELF-LEAVE. Closing connection.
150226 0:11:04 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 4190686)
150226 0:11:04 [Note] WSREP: RECV thread exiting 0: Success
150226 0:11:04 [Note] WSREP: recv_thread() joined.
150226 0:11:04 [Note] WSREP: Closing replication queue.
150226 0:11:04 [Note] WSREP: Closing slave action queue.
150226 0:11:04 [Note] WSREP: /mysql/home/products/mysql/bin/mysqld: Terminated.

Log from node 1, which shows it can't connect to any of the other nodes after 0:11:04:

150226 0:11:04 [Note] WSREP: (8b8e06eb-594e-11e4-99d3-822b0d796705, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.126.52.43:4567
150226 0:11:05 [Note] WSREP: (8b8e06eb-594e-11e4-99d3-822b0d796705, 'tcp://0.0.0.0:4567') reconnecting to adb89b17-5a25-11e4-833d-c63263a4e8ac (tcp://10.126.52.43:4567), attempt 0
150226 0:11:05 [Note] WSREP: declaring 3630e80f-594f-11e4-9fd8-b2a0ad1b9db3 stable
150226 0:11:05 [Note] WSREP: declaring 9fd32195-594f-11e4-937f-2b0004fbf107 stable
150226 0:11:05 [Note] WSREP: declaring cd7de3db-594f-11e4-bc4d-c70bf0d0977b stable
150226 0:11:05 [Note] WSREP: declaring f19c53ac-9c38-11e4-a5cb-df74780de7f1 stable
150226 0:11:05 [Note] WSREP: declaring f814b175-594e-11e4-b14e-becb87dc9620 stable
150226 0:11:05 [Note] WSREP: Node 3630e80f-594f-11e4-9fd8-b2a0ad1b9db3 state prim
150226 0:11:05 [Note] WSREP: declaring cd7de3db-594f-11e4-bc4d-c70bf0d0977b stable
150226 0:11:05 [Note] WSREP: declaring f19c53ac-9c38-11e4-a5cb-df74780de7f1 stable
150226 0:11:05 [Note] WSREP: Node 8b8e06eb-594e-11e4-99d3-822b0d796705 state prim
150226 0:11:05 [Warning] WSREP: 8b8e06eb-594e-11e4-99d3-822b0d796705 sending install message failed: Resource temporarily unavailable

Thanks

Are you sure this table was using the InnoDB engine? The most likely cause would be either a MyISAM table or some replication filters in the config file.
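
If it helps, here is a rough sketch of how both of those checks could be scripted. It is only an illustration, not something taken from your setup: the helper names (non_innodb_tables, find_replication_filters) are made up for this example, and it assumes you run ENGINE_QUERY yourself on each node with whatever client you normally use and pass the resulting rows in. The option names in FILTER_OPTIONS are the standard MySQL replication/binlog filter options; your config may of course use none of them.

```python
import re

# Query to run on each node; it lists every user table with its storage engine.
# information_schema.tables is standard MySQL; how you execute it is up to you.
ENGINE_QUERY = """
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
"""

# Standard MySQL filter options that can stop changes from being replicated.
FILTER_OPTIONS = (
    "replicate-do-db", "replicate-ignore-db",
    "replicate-do-table", "replicate-ignore-table",
    "replicate-wild-do-table", "replicate-wild-ignore-table",
    "binlog-do-db", "binlog-ignore-db",
)


def non_innodb_tables(rows):
    """Return 'schema.table (engine)' for each row whose engine is not InnoDB.

    rows is an iterable of (table_schema, table_name, engine) tuples, i.e.
    the result of ENGINE_QUERY. Rows without an engine are skipped.
    """
    return [
        f"{schema}.{table} ({engine})"
        for schema, table, engine in rows
        if engine and engine.lower() != "innodb"
    ]


def find_replication_filters(cnf_text):
    """Return the lines of a my.cnf-style text that set a replication filter."""
    found = []
    for line in cnf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments and [section] headers.
        if not stripped or stripped.startswith(("#", ";", "[")):
            continue
        # Option name is everything before the first '=' or whitespace;
        # MySQL accepts both '_' and '-' in option names.
        name = re.split(r"[=\s]", stripped, maxsplit=1)[0]
        if name.replace("_", "-").lower() in FILTER_OPTIONS:
            found.append(stripped)
    return found


if __name__ == "__main__":
    # Made-up sample data for illustration only, not from the cluster above.
    sample_rows = [("appdb", "t1", "MyISAM"), ("appdb", "t2", "InnoDB")]
    sample_cnf = "[mysqld]\n# filters\nbinlog_do_db = appdb\nwsrep_on=ON\n"
    print(non_innodb_tables(sample_rows))
    print(find_replication_filters(sample_cnf))
```

Anything reported by the first function is a table worth looking at on every node, and anything reported by the second is a config line to compare between node 1 and the others.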