Restarting servers causes cluster to die with duplicate keys

edwhedwh ContributorActive Member Participant
I've noticed this a couple of times, but caught a good example today.

I wanted to apply patches to the 3 machines in my cluster. So I then stopped mysql. I watched the log to see that it terminated gracefully, then rebooted the machines and watched it re-enter the cluster. Then repeated on the 2 other machines. The restart was quick, so the node caught up with an IST.

A few hours later, I saw a termination with logs like this.

[HTML]2019-12-02T19:49:11.916587Z 19 [ERROR] Slave SQL: Could not execute Write_rows event on table iznik.users_emails; Duplicate entry 'redacted' for key 'email', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 327, Error_code: 1062
2019-12-02T19:49:11.916607Z 19 [Warning] WSREP: RBR event 3 Write_rows apply warning: 121, 3114871165
2019-12-02T19:49:11.916707Z 19 [Note] WSREP: Applier statement rollback needed
2019-12-02T19:49:11.916749Z 19 [Warning] WSREP: Failed to apply app buffer: seqno: 3114871165, status: 1
at galera/src/trx_handle.cpp:apply():353
Retrying 2th time
2019-12-02T19:49:11.917039Z 19 [ERROR] Slave SQL: Could not execute Write_rows event on table iznik.users_emails; Duplicate entry 'redacted' for key 'email', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 327, Error_code: 1062
2019-12-02T19:49:11.917054Z 19 [Warning] WSREP: RBR event 3 Write_rows apply warning: 121, 3114871165
2019-12-02T19:49:11.917175Z 19 [Note] WSREP: Applier statement rollback needed
2019-12-02T19:49:11.917222Z 19 [Warning] WSREP: Failed to apply app buffer: seqno: 3114871165, status: 1
at galera/src/trx_handle.cpp:apply():353
Retrying 3th time
2019-12-02T19:49:11.917488Z 19 [ERROR] Slave SQL: Could not execute Write_rows event on table iznik.users_emails; Duplicate entry 'redacted' for key 'email', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 327, Error_code: 1062
2019-12-02T19:49:11.917503Z 19 [Warning] WSREP: RBR event 3 Write_rows apply warning: 121, 3114871165
2019-12-02T19:49:11.917611Z 19 [Note] WSREP: Applier statement rollback needed
2019-12-02T19:49:11.917656Z 19 [Warning] WSREP: Failed to apply app buffer: seqno: 3114871165, status: 1
at galera/src/trx_handle.cpp:apply():353
Retrying 4th time
2019-12-02T19:49:11.917843Z 19 [ERROR] Slave SQL: Could not execute Write_rows event on table iznik.users_emails; Duplicate entry 'redacted' for key 'email', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 327, Error_code: 1062
2019-12-02T19:49:11.917857Z 19 [Warning] WSREP: RBR event 3 Write_rows apply warning: 121, 3114871165
2019-12-02T19:49:11.917940Z 19 [Note] WSREP: Applier statement rollback needed
2019-12-02T19:49:11.930674Z 0 [Note] WSREP: (67df1a46, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://89.16.163.155:4567
2019-12-02T19:49:11.941964Z 19 [ERROR] WSREP: Failed to apply trx: source: 69719331-151a-11ea-883e-5aafee641567 version: 4 local: 0 state: APPLYING flags: 1 conn_id: 1623989 trx_id: 21950552 seqnos (l: 352675, g: 3114871165, s: 3114871164, d: 3114871134, ts: 14780336992255)
2019-12-02T19:49:11.941989Z 19 [ERROR] WSREP: Failed to apply trx 3114871165 4 times
2019-12-02T19:49:11.941995Z 19 [ERROR] WSREP: Node consistency compromised, aborting...
[/HTML]

All three nodes terminated and I had to bootstrap the cluster again and force an SST on each of the other two nodes to ensure I had databases that were in sync.

Has anyone seen anything like that?

Comments

Sign In or Register to comment.

MySQL, InnoDB, MariaDB and MongoDB are trademarks of their respective owners.
Copyright ©2005 - 2020 Percona LLC. All rights reserved.