
Whole cluster crashed due to table not being synced

andymao (Entrant)
Hi All,
We have a 7-node cluster spanning three datacenters, with a 2/2/3 split of nodes across them.
The whole cluster stopped functioning because a table was not synced; replication of it was delayed by about 11 minutes.

The table was first created on node 1 at 2015-02-25 23:59:50, but it was not synced to the other nodes immediately. The other nodes failed at 0:11:00 when an operation hit the table, which left node 1 as a standalone node.

We want to understand why the CREATE TABLE was not replicated. Is there any monitoring metric that could alert us in such a case?
Is this a known bug?
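On the monitoring question: Galera exposes its health through standard `SHOW GLOBAL STATUS LIKE 'wsrep_%'` variables, notably `wsrep_ready`, `wsrep_cluster_status`, `wsrep_local_state_comment`, and `wsrep_cluster_size`. A minimal sketch of an alerting check, assuming the values have already been fetched into a dict (the function name and the expected cluster size of 7 are illustrative, not part of any standard tool):

```python
# Sketch of a Galera health check. The values would normally come from
# `SHOW GLOBAL STATUS LIKE 'wsrep_%'`; here they are passed in as a dict.
# `expected_size` (7 for the cluster in this post) is an assumption.

def check_wsrep(status, expected_size=7):
    alerts = []
    if status.get("wsrep_ready") != "ON":
        alerts.append("node is not ready to accept queries")
    if status.get("wsrep_cluster_status") != "Primary":
        alerts.append("node is in a non-primary component")
    if status.get("wsrep_local_state_comment") != "Synced":
        alerts.append("node state is %s, not Synced"
                      % status.get("wsrep_local_state_comment"))
    if int(status.get("wsrep_cluster_size", 0)) < expected_size:
        alerts.append("cluster size dropped to %s"
                      % status.get("wsrep_cluster_size", 0))
    return alerts

# Roughly the state the six surviving nodes would report after aborting:
sample = {
    "wsrep_ready": "OFF",
    "wsrep_cluster_status": "non-Primary",
    "wsrep_local_state_comment": "Initialized",
    "wsrep_cluster_size": "1",
}
print(check_wsrep(sample))
```

Any of these conditions firing (especially `wsrep_cluster_size` dropping below the expected node count) would have flagged the partition described below well before manual inspection.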

Logs from the other 6 nodes:

150226 0:11:00 [Warning] WSREP: BF applier failed to open_and_lock_tables: 1146, fatal: 0 wsrep = (exec_mode: 1 conflict_state: 0 seqno: 4190686)
150226 0:11:00 [ERROR] Slave SQL: Error executing row event: 'Table 'keystone.credential' doesn't exist', Error_code: 1146
150226 0:11:00 [Warning] WSREP: RBR event 4963 Write_rows apply warning: 1146, 4190686
150226 0:11:00 [Warning] WSREP: Failed to apply app buffer: seqno: 4190686, status: 1
at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 2th time
..........
..........
150226 0:11:04 [Warning] WSREP: BF applier failed to open_and_lock_tables: 1146, fatal: 0 wsrep = (exec_mode: 1 conflict_state: 0 seqno: 4190686)
150226 0:11:04 [ERROR] Slave SQL: Error executing row event: 'Table 'keystone.credential' doesn't exist', Error_code: 1146
150226 0:11:04 [Warning] WSREP: RBR event 4963 Write_rows apply warning: 1146, 4190686
150226 0:11:04 [ERROR] WSREP: Failed to apply trx: source: 8b8e06eb-594e-11e4-99d3-822b0d796705 version: 2 local: 0 state: APPLYING flags: 1 conn_id: 15113818 trx_id: 4453338 seqnos (l: 4198354, g: 4190686, s: 4190685, d: 4190660, ts: 1424934617949390287)
150226 0:11:04 [ERROR] WSREP: Failed to apply trx 4190686 10 times
150226 0:11:04 [ERROR] WSREP: Node consistency compromized, aborting...
150226 0:11:04 [Note] WSREP: Closing send monitor...
150226 0:11:04 [Note] WSREP: Closed send monitor.
150226 0:11:04 [Note] WSREP: gcomm: terminating thread
150226 0:11:04 [Note] WSREP: gcomm: joining thread
150226 0:11:04 [Note] WSREP: gcomm: closing backend
150226 0:11:04 [Note] WSREP: view(view_id(NON_PRIM,3630e80f-594f-11e4-9fd8-b2a0ad1b9db3,33) memb {
adb89b17-5a25-11e4-833d-c63263a4e8ac,
} joined {
} left {
} partitioned {
3630e80f-594f-11e4-9fd8-b2a0ad1b9db3,
8b8e06eb-594e-11e4-99d3-822b0d796705,
9fd32195-594f-11e4-937f-2b0004fbf107,
cd7de3db-594f-11e4-bc4d-c70bf0d0977b,
f19c53ac-9c38-11e4-a5cb-df74780de7f1,
f814b175-594e-11e4-b14e-becb87dc9620,
})
150226 0:11:04 [Note] WSREP: view((empty))
150226 0:11:04 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
150226 0:11:04 [Note] WSREP: gcomm: closed
150226 0:11:04 [Note] WSREP: Flow-control interval: [16, 16]
150226 0:11:04 [Note] WSREP: Received NON-PRIMARY.
150226 0:11:04 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 4190686)
150226 0:11:04 [Note] WSREP: Received self-leave message.
150226 0:11:04 [Note] WSREP: Flow-control interval: [0, 0]
150226 0:11:04 [Note] WSREP: Received SELF-LEAVE. Closing connection.
150226 0:11:04 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 4190686)
150226 0:11:04 [Note] WSREP: RECV thread exiting 0: Success
150226 0:11:04 [Note] WSREP: recv_thread() joined.
150226 0:11:04 [Note] WSREP: Closing replication queue.
150226 0:11:04 [Note] WSREP: Closing slave action queue.
150226 0:11:04 [Note] WSREP: /mysql/home/products/mysql/bin/mysqld: Terminated.

Log from node 1, which shows it could not connect to any other node after 0:11:04:
150226 0:11:04 [Note] WSREP: (8b8e06eb-594e-11e4-99d3-822b0d796705, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.126.52.43:4567
150226 0:11:05 [Note] WSREP: (8b8e06eb-594e-11e4-99d3-822b0d796705, 'tcp://0.0.0.0:4567') reconnecting to adb89b17-5a25-11e4-833d-c63263a4e8ac (tcp://10.126.52.43:4567), attempt 0
150226 0:11:05 [Note] WSREP: declaring 3630e80f-594f-11e4-9fd8-b2a0ad1b9db3 stable
150226 0:11:05 [Note] WSREP: declaring 9fd32195-594f-11e4-937f-2b0004fbf107 stable
150226 0:11:05 [Note] WSREP: declaring cd7de3db-594f-11e4-bc4d-c70bf0d0977b stable
150226 0:11:05 [Note] WSREP: declaring f19c53ac-9c38-11e4-a5cb-df74780de7f1 stable
150226 0:11:05 [Note] WSREP: declaring f814b175-594e-11e4-b14e-becb87dc9620 stable
150226 0:11:05 [Note] WSREP: Node 3630e80f-594f-11e4-9fd8-b2a0ad1b9db3 state prim
150226 0:11:05 [Note] WSREP: declaring cd7de3db-594f-11e4-bc4d-c70bf0d0977b stable
150226 0:11:05 [Note] WSREP: declaring f19c53ac-9c38-11e4-a5cb-df74780de7f1 stable
150226 0:11:05 [Note] WSREP: Node 8b8e06eb-594e-11e4-99d3-822b0d796705 state prim
150226 0:11:05 [Warning] WSREP: 8b8e06eb-594e-11e4-99d3-822b0d796705 sending install message failed: Resource temporarily unavailable

Thanks

Comments

  • przemek (Percona Support Engineer)
    Are you sure this table was using the InnoDB engine? The most likely cause is either a MyISAM table or replication filters in the config file.
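    One way to verify this suggestion is to list every table not using InnoDB, since by default Galera only replicates InnoDB DML. A sketch follows: the query is standard `information_schema` SQL, while the Python wrapper and sample rows are purely illustrative.

```python
# Standard information_schema query to find tables Galera will not replicate
# by default (everything except InnoDB, ignoring system schemas).
NON_INNODB_QUERY = """
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE engine <> 'InnoDB'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');
"""

def non_innodb_tables(rows):
    """Filter (schema, table, engine) rows as the query above would return them."""
    return [(s, t, e) for s, t, e in rows if e != "InnoDB"]

# Hypothetical result set for the cluster in this post:
sample = [("keystone", "credential", "MyISAM"),
          ("keystone", "token", "InnoDB")]
print(non_innodb_tables(sample))
```

    If `keystone.credential` showed up here as MyISAM, that would explain why the CREATE TABLE reached node 1 but its rows were never applied on the other nodes.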

Copyright ©2005 - 2020 Percona LLC. All rights reserved.