We just upgraded a 3-node Percona XtraDB Cluster from 5.7 to 8.0.35 (GTID=ON) running on Debian 11.
We have an extra host with Percona Server 8.0.35 as a delayed replica (Replication with Global Transaction Identifiers). Everything was working fine until the master upgrade. Now the delayed replica keeps failing with:
2024-02-20T20:49:06.058500Z 25 [ERROR] [MY-010584] [Repl] Replica SQL for channel '': Worker 1 failed executing transaction '200be2d2-10da-11e4-b7d3-3bbbce82e394:10788071' at source log binlog.000002, end_log_pos 16805332; Could not execute Update_rows event on table live.events; Can't find record in 'events', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's source log binlog.000002, end_log_pos 16805332, Error_code: MY-001032
This transaction is an UPDATE of a field in the table live.events.
After analysis we found that the transaction containing the INSERT (which should be applied before the UPDATE transaction) was simply skipped (never committed on the delayed replica), so the transaction above has no row to update.
If we don't use a delay on the replication, it works fine.
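A minimal sketch of how the missing INSERT can be checked against the replica's executed GTID set. The GTID assigned to `@insert_gtid` is a hypothetical placeholder (it reuses the source UUID from the error above with a made-up sequence number); the real value would have to be read from the source's binary log.

```sql
-- Run on the delayed replica.
-- Hypothetical GTID of the INSERT transaction; replace it with the real one
-- taken from the source's binary log.
SET @insert_gtid = '200be2d2-10da-11e4-b7d3-3bbbce82e394:10788070';

-- 1 = this GTID is already in the replica's executed set, 0 = it is not.
SELECT GTID_SUBSET(@insert_gtid, @@GLOBAL.gtid_executed) AS insert_applied;

-- Full executed set; gaps appear as several intervals for the same UUID.
SELECT @@GLOBAL.gtid_executed;
```

If this returns 1 while the row is still missing from live.events, that would suggest the GTID was recorded on the replica without the row change being applied, which is a different problem from the transaction never arriving at all.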
Sorry for the late answer; I was testing different setups. Yes, it looks like the replica is not committing in the right order.
We tested two delayed replicas, replicating from the same master, with different settings:
slave_parallel_workers | 4
replica_preserve_commit_order | ON
and
slave_parallel_workers | 1
replica_preserve_commit_order | ON
Both replicas fail with the same error:
Replica_IO_Running: Yes
Replica_SQL_Running: No
Last_Errno: 1032
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '200be2d2-10da-11e4-b7d3-3bbbce82e394:18845523' at source log binlog.006029, end_log_pos 152871615. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
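The error points at performance_schema.replication_applier_status_by_worker, so here is a sketch of the queries that could be run on a failing replica to collect the per-worker details, the configured delay, and the parallel-applier settings in one go. It assumes the default (unnamed) replication channel and the 8.0 variable names (`replica_parallel_workers` is the newer name for `slave_parallel_workers`).

```sql
-- Run on the failing replica: per-worker applier state and last error.
SELECT CHANNEL_NAME,
       WORKER_ID,
       SERVICE_STATE,
       LAST_ERROR_NUMBER,
       LAST_ERROR_MESSAGE,
       LAST_ERROR_TIMESTAMP,
       LAST_APPLIED_TRANSACTION,
       APPLYING_TRANSACTION
FROM performance_schema.replication_applier_status_by_worker;

-- Configured delay (in seconds) for each replication channel.
SELECT CHANNEL_NAME, DESIRED_DELAY
FROM performance_schema.replication_applier_configuration;

-- Parallel-applier settings currently in effect.
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('replica_parallel_workers',
                        'replica_parallel_type',
                        'replica_preserve_commit_order');
```

Comparing LAST_APPLIED_TRANSACTION across workers with the GTID in the error should show whether an earlier transaction was passed over before the failing UPDATE.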