Wan implementation

matthiaz · October 23, 2017, 7:30am

I wanted to chime in because I had a similar error and have some extra debugging information that might help narrow down the cause. I believe we are hitting the same bug in MySQL/Percona.

Like Jorang, I have one DC with 2 nodes (node A and node B) and a second DC with 1 node (node C), so 3 nodes in total.

No local scripts were running on the nodes. That would explain the inconsistency, but as far as I know it is not happening. And even if it were, the sync should have worked in the other direction as well (this is a multi-master setup where every node accepts writes, right?).
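To illustrate why a row that exists on some nodes but not others would produce exactly this failure pattern, here is a toy model. It is a hypothetical simplification, not how Galera actually applies writesets: `Node`, `replicate_insert` and the sample key are made up for illustration, and the set stands in for the two-column unique index.

```python
# Toy model, NOT Galera internals: each node keeps the set of (col1, col2)
# pairs covered by a hypothetical two-column unique key.


class DuplicateKeyError(Exception):
    """Stands in for MySQL error 1062 (ER_DUP_ENTRY)."""


class Node:
    def __init__(self, name):
        self.name = name
        self.unique_keys = set()  # values currently in the unique index

    def apply_insert(self, key):
        # A replicated row insert fails if this node already holds the key.
        if key in self.unique_keys:
            raise DuplicateKeyError(
                f"{self.name}: Duplicate entry '{key[0]}-{key[1]}'"
            )
        self.unique_keys.add(key)


def replicate_insert(nodes, key):
    """Apply the same insert on every node; return the names of nodes that fail."""
    failed = []
    for node in nodes:
        try:
            node.apply_insert(key)
        except DuplicateKeyError:
            failed.append(node.name)
    return failed


if __name__ == "__main__":
    a, b, c = Node("A"), Node("B"), Node("C")
    # Hypothetical: the row already exists on A and C but not on B.
    for node in (a, c):
        node.unique_keys.add((224, 560744))
    print(replicate_insert([a, b, c], (224, 560744)))  # ['A', 'C']
```

In this toy scenario node B applies the insert and survives while A and C hit the duplicate key, which matches what we saw, but it only shows that such a divergence would be sufficient, not that it is what happened here.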
- All nodes are running:
  Server version: 5.7.18-15-57 Percona XtraDB Cluster (GPL), Release rel15, Revision 7693d6e, WSREP version 29.20, wsrep_29.20
  I have checked the release notes of version 5.7.19 but I can’t see any mention of this error.
  Jorang: could you paste your version as well?
- All nodes run Ubuntu 16.04.
- Only node B remained online.
- Node A and node C were shut down because of a node consistency problem.
- Upon restarting node A or node C, a full SST was done (according to the log, it couldn’t do an IST because of an unexpected shutdown).
- I got the same HA_ERR_FOUND_DUPP_KEY error, but in my case it is clearly a unique index on two columns. Jorang, do you know what that “sequence” key is? Is it a simple unique key, or a unique key on two columns?
- Node A and node C had the same error in /var/log/mysql/error.log (similar to Jorang’s error):

2017-10-20T08:06:43.877112Z 6 [ERROR] Slave SQL: Could not execute Write_rows event on table himalaya_tdv_renault.inspection; Duplicate entry ‘224-560744’ for key ‘un_inspection_repairorder’, Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event’s master log FIRST, end_log_pos 490, Error_code: 1062
2017-10-20T08:06:43.877135Z 6 [Warning] WSREP: RBR event 5 Write_rows apply warning: 121, 32518800
2017-10-20T08:06:43.877598Z 6 [Warning] WSREP: Failed to apply app buffer: seqno: 32518800, status: 1
at galera/src/trx_handle.cpp:apply():351
Retrying 2th time
…
Retrying 3th time
…
Retrying 4th time
…
2017-10-20T08:06:43.878931Z 6 [ERROR] WSREP: Failed to apply trx 32518800 4 times
2017-10-20T08:06:43.878936Z 6 [ERROR] WSREP: Node consistency compromized, aborting…
2017-10-20T08:06:43.878943Z 6 [Note] WSREP: Closing send monitor…
2017-10-20T08:06:43.878947Z 6 [Note] WSREP: Closed send monitor.
2017-10-20T08:06:43.878953Z 6 [Note] WSREP: gcomm: terminating thread
2017-10-20T08:06:43.878969Z 6 [Note] WSREP: gcomm: joining thread
2017-10-20T08:06:43.879100Z 6 [Note] WSREP: gcomm: closing backend
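To compare what each node choked on, a small script can pull the table, key, duplicate entry and failed writeset seqno out of the error log. This is a sketch under assumptions: the regular expressions are modelled only on the lines quoted above and may not match other MySQL/Galera versions, and `summarize_error_log` is a hypothetical helper, not part of any Percona tool.

```python
import re

# Patterns modelled on the error-log lines quoted above; other MySQL/Galera
# versions may phrase these messages differently. Both straight and curly
# quotes are accepted because the quotes in the post are typographic.
DUP_RE = re.compile(
    r"Could not execute (?P<event>\w+) event on table (?P<table>[\w.]+); "
    r"Duplicate entry [‘'](?P<entry>[^’']*)[’'] for key [‘'](?P<key>[^’']+)[’']"
)
FAILED_TRX_RE = re.compile(r"Failed to apply trx (?P<seqno>\d+) (?P<attempts>\d+) times")


def summarize_error_log(log_text):
    """Extract duplicate-key apply failures and the seqno of the failed writeset."""
    duplicates = []
    for m in DUP_RE.finditer(log_text):
        info = m.groupdict()
        # MySQL joins the values of a multi-column key with '-'. The split is
        # ambiguous if a value itself contains '-', so treat it as a hint only.
        info["key_parts"] = info["entry"].split("-")
        duplicates.append(info)
    failed_seqnos = [int(m.group("seqno")) for m in FAILED_TRX_RE.finditer(log_text)]
    return {"duplicates": duplicates, "failed_seqnos": failed_seqnos}
```

Running it on the error logs of node A and node C (for example `summarize_error_log(open("/var/log/mysql/error.log").read())`) and comparing `failed_seqnos` would show whether both nodes rejected the same writeset and the same key values.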

I’m guessing the developers would like a reproducible test case, but I don’t have one at the moment and don’t know where to begin. Any ideas on how we can narrow this error down?
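One possible starting point, assuming you can export the values of the two columns behind `un_inspection_repairorder` from each node, is to diff them and look for keys that exist on only some nodes. This is a generic set comparison, not a Percona tool; `diverging_keys` and the sample data are hypothetical. It would have to run before a node is rebuilt by SST, since the SST copies the donor's data and hides any earlier divergence.

```python
from collections import defaultdict


def diverging_keys(rows_by_node):
    """Return {key: sorted list of nodes holding it} for keys missing on some node.

    rows_by_node maps a node name to an iterable of composite-key tuples, e.g.
    the two unique-key columns exported from each node (how you export them
    is up to you; this sketch only compares the results).
    """
    holders = defaultdict(set)
    for node, rows in rows_by_node.items():
        for key in rows:
            holders[tuple(key)].add(node)
    all_nodes = set(rows_by_node)
    # Keep only keys that are not present on every node.
    return {key: sorted(nodes) for key, nodes in holders.items() if nodes != all_nodes}


if __name__ == "__main__":
    # Hypothetical exports; the real column values are not known.
    exports = {
        "A": [(224, 560744), (1, 2)],
        "B": [(1, 2)],
        "C": [(224, 560744), (1, 2)],
    }
    print(diverging_keys(exports))  # {(224, 560744): ['A', 'C']}
```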

As with Jorang, our system is running again, but since I don’t know exactly what caused it, I’m pretty sure it can happen again.
