Percona delayed replica problem

Hi,

We just upgraded a 3-node Percona XtraDB Cluster from 5.7 to 8.0.35 (GTID=ON), running on Debian 11.

We have an extra host with Percona Server 8.0.35 as a delayed replica (Replication with Global Transaction Identifiers). Everything was working fine until the master upgrade. Now the delayed replica keeps failing with:

2024-02-20T20:49:06.058500Z 25 [ERROR] [MY-010584] [Repl] Replica SQL for channel ‘’: Worker 1 failed executing transaction ‘200be2d2-10da-11e4-b7d3-3bbbce82e394:10788071’ at source log binlog.000002, end_log_pos 16805332; Could not execute Update_rows event on table live.events; Can’t find record in ‘events’, Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event’s source log binlog.000002, end_log_pos 16805332, Error_code: MY-001032

This transaction is an UPDATE on a field in the table live.events.

After analysis we found that the transaction containing the INSERT (which should have been applied before the UPDATE transaction) was skipped, i.e. never committed on the delayed replica, so the transaction above cannot find the row it is supposed to update.
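
One way to check this on the replica is to ask whether a given GTID is already part of gtid_executed. This is only a sketch: the GTID below is the failing UPDATE transaction from the error log, and the GTID of the earlier INSERT is not shown in this thread, so it would have to be looked up in the source's binlog and substituted.

```sql
-- Returns 1 if the transaction is recorded as applied on this replica, 0 otherwise.
-- The GTID here is the failing UPDATE from the error log; substitute the
-- INSERT's GTID (not given in this thread) to check the skipped transaction.
SELECT GTID_SUBSET('200be2d2-10da-11e4-b7d3-3bbbce82e394:10788071',
                   @@GLOBAL.gtid_executed) AS already_applied;

-- Full set of transactions recorded as executed on this replica,
-- for comparison with the same value on the source.
SELECT @@GLOBAL.gtid_executed;
```

If the INSERT's GTID comes back as 1 while the row is missing from live.events, the replica considers the transaction executed even though its changes are not there.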

If we don’t use delay on replication it works fine.

Any idea where the problem is?

Hi Sven,
Do you mean that the replica is committing in the wrong order?
What is the value of replica_preserve_commit_order on the replica?
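
For reference, a possible way to read these settings on the replica. The two parallel-applier variables are added here on the assumption that they are relevant to commit ordering; only replica_preserve_commit_order was asked about.

```sql
-- Show the commit-order setting together with the parallel applier settings.
-- replica_parallel_workers is the 8.0.26+ name of slave_parallel_workers.
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('replica_preserve_commit_order',
                        'replica_parallel_workers',
                        'replica_parallel_type');
```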

Hey Yunus,

Sorry for the late answer, I was testing different setups. Yes, it looks like the replica is not committing in the right order.

We tested 2 delayed replicas, replicating from the same master with different settings:

slave_parallel_workers | 4
replica_preserve_commit_order | ON

and

slave_parallel_workers | 1
replica_preserve_commit_order | ON

Both replicas fail with the same error.

Replica_IO_Running: Yes
Replica_SQL_Running: No

Last_Errno: 1032
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction ‘200be2d2-10da-11e4-b7d3-3bbbce82e394:18845523’ at source log binlog.006029, end_log_pos 152871615. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
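
The table mentioned in that message can be queried directly for the per-worker details. A minimal sketch, assuming the default (unnamed) channel seen in the log:

```sql
-- List only the applier workers that have recorded an error,
-- with the transaction each one was applying when it failed.
SELECT CHANNEL_NAME,
       WORKER_ID,
       APPLYING_TRANSACTION,
       LAST_ERROR_NUMBER,
       LAST_ERROR_MESSAGE,
       LAST_ERROR_TIMESTAMP
FROM performance_schema.replication_applier_status_by_worker
WHERE LAST_ERROR_NUMBER <> 0;
```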

I fixed the problem. I can’t tell which option in the MySQL config was causing it, but after changing the config the delayed replica runs without problems. The working config:
[mysqld@db4h]
socket = /run/mysqld_db4h/mysqld.sock
pid-file = /run/mysqld_db4h/mysqld.pid
datadir = /data/db4h/mysql-data
tmpdir = /data/db4h/mysql-tmp
log-error = /var/log/mysql/db4h.err
general-log-file = /var/log/mysql/db4h.log
bind-address = 0.0.0.0
port = 3306
character-set-server = utf8mb3
collation-server = utf8_unicode_ci
mysqlx = 0
default-authentication-plugin = mysql_native_password
sql-mode = NO_ENGINE_SUBSTITUTION
log_error_verbosity = 2
super_read_only = 1
lc-messages-dir = /usr/share/mysql
skip-external-locking
max_allowed_packet = 1G
thread_stack = 192K
thread_cache_size = 8
innodb_redo_log_capacity = 500M
innodb_log_buffer_size = 16M
innodb_buffer_pool_size = 4G
max-heap-table-size = 16G
log_bin = binlog
relay_log = relay-bin
max_relay_log_size = 10M
max_binlog_size = 1G
log_replica_updates = 1
gtid_mode = ON
enforce_gtid_consistency
relay_log_recovery = ON
binlog_expire_logs_seconds = 864000
server-id = 100
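
Since it is unclear which option made the difference, one possible way to narrow it down is to dump every variable that is not at its compiled-in default on both a failing and a working replica and compare the two outputs. This is a sketch of that approach, not something that was done in this thread:

```sql
-- List all global variables whose value comes from somewhere other than
-- the compiled-in default (config file, command line, SET PERSIST, runtime SET),
-- together with the file that set them, if any.
SELECT vi.VARIABLE_NAME,
       gv.VARIABLE_VALUE,
       vi.VARIABLE_SOURCE,
       vi.VARIABLE_PATH
FROM performance_schema.variables_info AS vi
JOIN performance_schema.global_variables AS gv
  ON gv.VARIABLE_NAME = vi.VARIABLE_NAME
WHERE vi.VARIABLE_SOURCE <> 'COMPILED'
ORDER BY vi.VARIABLE_NAME;
```

Saving the result from each instance and diffing the two files should show which options differ between the failing and the working setup.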