Hello,
I have a 3-node cluster. One node (NodeB) keeps getting evicted and terminated minutes or hours after SST completes. Please advise, as I have been unable to investigate the cause any further or fix it. I have tried a few suggestions, including adding it as a new node.
Problem:
Unexpected eviction of a cluster node due to an error executing a row event?
Analysis:
I frequently see many InnoDB [Warning] messages stating "Cannot open table from internal data dictionary though .frm file exists",
followed by errors executing row events, as below:
[ERROR] Slave SQL: Error executing row event: ‘Table ‘’.’ doesn’t exist’, Error_code: 1146
…
[ERROR] WSREP: Failed to apply trx: source: …
[ERROR] WSREP: Failed to apply trx 10703962 4 times
[ERROR] WSREP: Node consistency compromised, aborting…
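To check whether every eviction is preceded by the same chain of messages, it can help to pull just those lines out of the error log and count them. The sketch below is a hypothetical helper (the name `find_apply_failures` is illustrative, not part of any tool). It assumes the log wording matches the fragments quoted above, which may vary between server versions.

```python
import re
import sys

# Message fragments quoted in the analysis above. These patterns are
# assumptions based on the post; adjust them if your log wording differs.
PATTERNS = {
    "dict_mismatch": re.compile(r"Cannot open table\b.*\bthough\b.*\.frm file"),
    "missing_table": re.compile(r"Error_code: 1146"),
    "apply_failed": re.compile(r"WSREP: Failed to apply trx (\d+) (\d+) times"),
    "consistency_abort": re.compile(r"Node consistency compromised"),
}


def find_apply_failures(lines):
    """Count each symptom and collect the seqnos of transactions that failed to apply."""
    counts = {name: 0 for name in PATTERNS}
    failed_seqnos = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[name] += 1
                if name == "apply_failed":
                    failed_seqnos.append(int(match.group(1)))
    return counts, failed_seqnos


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 find_apply_failures.py /path/to/error.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        counts, seqnos = find_apply_failures(log)
    for name, count in counts.items():
        print(f"{name}: {count}")
    print("failed trx seqnos:", seqnos)
```

If the failed seqnos are always the first writes to tables that report the data dictionary warning, that points at the tables themselves on NodeB rather than at the network.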
This then leads to complete eviction and termination, as follows:
2019-05-13T19:24:45.626814-05:00 2 [Note] WSREP: turning isolation on
2019-05-13T19:24:45.626926-05:00 2 [Note] WSREP: Closing send monitor…
2019-05-13T19:24:45.626968-05:00 2 [Note] WSREP: Closed send monitor.
2019-05-13T19:24:45.627056-05:00 2 [Note] WSREP: gcomm: terminating thread
2019-05-13T19:24:45.627301-05:00 2 [Note] WSREP: gcomm: joining thread
2019-05-13T19:24:45.627344-05:00 2 [Note] WSREP: gcomm: closing backend
2019-05-13T19:24:46.056813-05:00 2 [Note] WSREP: (111f9965, ‘tcp://0.0.0.0:4567’) turning message relay requesting on, nonlive peers: tcp://:4567 tcp://:4567
2019-05-13T19:24:47.326144-05:00 0 [Note] InnoDB: Buffer pool(s) load completed at 190513 19:24:47
2019-05-13T19:24:50.627502-05:00 2 [Note] WSREP: declaring node with index 0 suspected, timeout PT5S (evs.suspect_timeout)
2019-05-13T19:24:50.627538-05:00 2 [Note] WSREP: declaring node with index 2 suspected, timeout PT5S (evs.suspect_timeout)
2019-05-13T19:24:50.627548-05:00 2 [Note] WSREP: evs::proto(111f9965, LEAVING, view_id(REG,02457e1e,240)) suspecting node: 02457e1e
2019-05-13T19:24:50.627553-05:00 2 [Note] WSREP: evs::proto(111f9965, LEAVING, view_id(REG,02457e1e,240)) suspected node without join message, declaring inactive
2019-05-13T19:24:50.627559-05:00 2 [Note] WSREP: evs::proto(111f9965, LEAVING, view_id(REG,02457e1e,240)) suspecting node: fc467d1a
2019-05-13T19:24:50.627563-05:00 2 [Note] WSREP: evs::proto(111f9965, LEAVING, view_id(REG,02457e1e,240)) suspected node without join message, declaring inactive
2019-05-13T19:24:50.627581-05:00 2 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,02457e1e,240)
memb {
111f9965,0
}
joined {
}
left {
}
partitioned {
02457e1e,0
fc467d1a,0
}
)
2019-05-13T19:24:50.627630-05:00 2 [Note] WSREP: Current view of cluster as seen by this node
view ((empty))
2019-05-13T19:24:50.627789-05:00 2 [Note] WSREP: gcomm: closed
2019-05-13T19:24:50.627866-05:00 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2019-05-13T19:24:50.627883-05:00 0 [Note] WSREP: Flow-control interval: [100, 100]
2019-05-13T19:24:50.627887-05:00 0 [Note] WSREP: Trying to continue unpaused monitor
2019-05-13T19:24:50.627890-05:00 0 [Note] WSREP: Received NON-PRIMARY.
2019-05-13T19:24:50.627893-05:00 0 [Note] WSREP: Shifting SYNCED → OPEN (TO: 10703962)
2019-05-13T19:24:50.627900-05:00 0 [Note] WSREP: Received self-leave message.
2019-05-13T19:24:50.627935-05:00 0 [Note] WSREP: Flow-control interval: [0, 0]
2019-05-13T19:24:50.627938-05:00 0 [Note] WSREP: Trying to continue unpaused monitor
2019-05-13T19:24:50.627940-05:00 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2019-05-13T19:24:50.627942-05:00 0 [Note] WSREP: Shifting OPEN → CLOSED (TO: 10703962)
2019-05-13T19:24:50.627948-05:00 0 [Note] WSREP: RECV thread exiting 0: Success
2019-05-13T19:24:50.627979-05:00 2 [Note] WSREP: recv_thread() joined.
2019-05-13T19:24:50.627985-05:00 2 [Note] WSREP: Closing replication queue.
2019-05-13T19:24:50.627992-05:00 2 [Note] WSREP: Closing slave action queue.
2019-05-13T19:24:50.627999-05:00 2 [Note] WSREP: /usr/sbin/mysqld: Terminated.
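The shutdown sequence above shows the node shifting SYNCED → OPEN → CLOSED at seqno 10703962 and receiving a SELF-LEAVE message. When comparing several eviction incidents, a small sketch like the one below can extract those transitions. The function name `summarize_shutdown` is hypothetical. It accepts both `->` and `→`, on the assumption that the forum may have converted the arrow in the pasted log.

```python
import re

# Matches lines such as "WSREP: Shifting SYNCED → OPEN (TO: 10703962)".
SHIFT = re.compile(r"WSREP: Shifting (\w+) (?:->|→) (\w+) \(TO: (\d+)\)")


def summarize_shutdown(lines):
    """Collect WSREP state shifts in log order and note whether a SELF-LEAVE was received."""
    transitions = []
    self_leave = False
    for line in lines:
        match = SHIFT.search(line)
        if match:
            transitions.append((match.group(1), match.group(2), int(match.group(3))))
        if "Received SELF-LEAVE" in line:
            self_leave = True
    return {"transitions": transitions, "self_leave": self_leave}


if __name__ == "__main__":
    # Excerpt taken from the log in this post.
    excerpt = [
        "[Note] WSREP: Shifting SYNCED → OPEN (TO: 10703962)",
        "[Note] WSREP: Received SELF-LEAVE. Closing connection.",
        "[Note] WSREP: Shifting OPEN → CLOSED (TO: 10703962)",
    ]
    print(summarize_shutdown(excerpt))
```

If the TO seqno in the shift lines matches the seqno from "Failed to apply trx", the shutdown is the direct result of that failed apply rather than a separate network problem.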