Hi all, for just over a year we have been running an XtraDB cluster with three nodes. The solution worked like a charm up until last week.
We are running Percona XtraDB 5.6.32-78.1 on Ubuntu 14.04 LTS boxes.
The second and third nodes shut down after a period of time (usually two hours or so), with the following messages in the error log:
2017-10-17 16:14:02 14110 [ERROR] Slave SQL: Could not execute Update_rows event on table .; Can’t find record in ‘’, Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event’s master log FIRST, end_log_pos 486, Error_code: 1032
2017-10-17 16:14:02 14110 [Warning] WSREP: RBR event 3 Update_rows apply warning: 120, 382793
2017-10-17 16:14:02 14110 [ERROR] WSREP: Failed to apply trx: source: b909f93e-b320-11e7-862e-633a9edd55f6 version: 3 local: 0 state: APPLYING flags: 1 conn_id: 86234 trx_id: 12486901096 seqnos (l: 146732, g: 382793, s: 382792, d: 382780, ts: 10986037136250952)
2017-10-17 16:14:02 14110 [ERROR] WSREP: Failed to apply trx 382793 4 times
2017-10-17 16:14:02 14110 [ERROR] WSREP: Node consistency compromized, aborting…
After this, the node rightfully stops the mysql process.
If we restart the process, it fetches a full SST from the still-running ‘good’ primary node(s) (multiple databases, some 60GB in total, so nothing major). The second and third nodes then recover and rejoin the cluster. This happens multiple times a day.
When we try to open the GRA_1_382793.log file with mysqlbinlog, it complains that:
ERROR: Error in Log_event::read_log_event(): ‘Found invalid event in binary log’, data_len: 220, event_type: 30
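In the log above, the number in "Failed to apply trx 382793" is the same as the g: value in the seqnos group and the number in the GRA file name. For anyone who wants to collect these from a longer error log, here is a minimal Python sketch. The function names (failed_trx_seqnos, gra_glob) are hypothetical, and the GRA_<n>_<seqno>.log naming is only an assumption based on the single file name quoted in this post.

```python
import re

# Matches the WSREP "Failed to apply trx: source: ..." line shown in the post
# and captures the source UUID plus the l/g/s/d sequence numbers.
TRX_RE = re.compile(
    r"\[ERROR\] WSREP: Failed to apply trx: source: (?P<source>[0-9a-f-]+) .*?"
    r"seqnos \(l: (?P<l>-?\d+), g: (?P<g>-?\d+), s: (?P<s>-?\d+), d: (?P<d>-?\d+)"
)

# Sample line copied from the error log in this post.
SAMPLE = (
    "2017-10-17 16:14:02 14110 [ERROR] WSREP: Failed to apply trx: "
    "source: b909f93e-b320-11e7-862e-633a9edd55f6 version: 3 local: 0 "
    "state: APPLYING flags: 1 conn_id: 86234 trx_id: 12486901096 "
    "seqnos (l: 146732, g: 382793, s: 382792, d: 382780, ts: 10986037136250952)"
)


def failed_trx_seqnos(log_lines):
    """Return (source_uuid, global_seqno) for every failed-apply line found."""
    found = []
    for line in log_lines:
        match = TRX_RE.search(line)
        if match:
            found.append((match.group("source"), int(match.group("g"))))
    return found


def gra_glob(global_seqno):
    """Glob pattern for the GRA file of a seqno (assumed naming, see lead-in)."""
    return f"GRA_*_{global_seqno}.log"


if __name__ == "__main__":
    for source, seqno in failed_trx_seqnos([SAMPLE]):
        print(source, seqno, gra_glob(seqno))
```

Running this over the error log of each failing node would show whether the failures always come from the same source UUID, which may help narrow down where the problem originates.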
Am I correct in assuming that the binlog sent from the master is corrupt? If so, what are the possible resolutions? The solution worked flawlessly for well over a year and has only just started playing up. Any help is much appreciated.