We have a containerized three node Galera cluster running on a Percona XtraDB cluster ( 8.0.32-24.1 Percona XtraDB Cluster (GPL), Release rel24, Revision 793b5d9, WSREP version 26.1.4.3). Each node is running in a container managed by OpenShift. The primary cluster is then using MySQL asynchronous replication to replicate to another three node Galera cluster using the same version as the primary.
We ran into an issue in which we needed to restore a backup of the primary instance on the downstream replicas. However, after the restore, replication remains in a broken state. The show replica error notes that replication was stopped due to a transaction failure. In reviewing the replication_applier_status_by_worker table in the performance schema, the error message indicates the error occurred because a row was to be deleted from a table in the replica, but that row doesn’t exist. The DBA attempted to bypass the issue by inserting an empty transaction for the affected gtid’s; however, there are numerous errors. We tried several times to restore a backup of the primary onto the replica, but we keep experiencing the same problem. We’re using the XtraBackup operator to take a physical backup of the database and the backup is being stored in an AWS S3 bucket. We need help in figuring out how to resolve this issue.
mysql> show replica status\G
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Source_Host: a6bd61dd0a27847d58378a12ad3f02f5-6e281c2d79690133.elb.us-west-2.amazonaws.com
Source_User: replication
Source_Port: 3306
Connect_Retry: 60
Source_Log_File: binlog.036032
Read_Source_Log_Pos: 64708767
Relay_Log_File: wordpress-db-pxc-0-relay-bin-awspxc1_to_bmhpxc1.000002
Relay_Log_Pos: 774790
Relay_Source_Log_File: binlog.036030
Replica_IO_Running: Yes
Replica_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction ‘5b000228-6cab-11ee-8464-43850378d433:14108214’ at master log binlog.036030, end_log_pos 35473386. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Skip_Counter: 0
Exec_Source_Log_Pos: 35443152
Relay_Log_Space: 127190849
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Source_SSL_Allowed: No
Source_SSL_CA_File:
Source_SSL_CA_Path:
Source_SSL_Cert:
Source_SSL_Cipher:
Source_SSL_Key:
Seconds_Behind_Source: NULL
Source_SSL_Verify_Server_Cert: Yes
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1032
Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction ‘5b000228-6cab-11ee-8464-43850378d433:14108214’ at master log binlog.036030, end_log_pos 35473386. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Replicate_Ignore_Server_Ids:
Source_Server_Id: 41819022
Source_UUID: b60c9819-6cab-11ee-9a78-0a580a21032d
Source_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Replica_SQL_Running_State:
Source_Retry_Count: 5
Source_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 240124 22:04:40
Source_SSL_Crl:
Source_SSL_Crlpath:
Retrieved_Gtid_Set: 5b000228-6cab-11ee-8464-43850378d433:14106199-14356732
Executed_Gtid_Set: 564d3a9b-bad7-11ee-a81e-0a580a8005f6:1-9,
5b000228-6cab-11ee-8464-43850378d433:1-14108213,
5e454f04-bad7-11ee-8945-e255bccf8478:1-19869,
6235c032-b942-11ed-863e-865d3b297558:1-10076905
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name: awspxc1_to_bmhpxc1
Source_TLS_Version:
Source_public_key_path:
Get_Source_public_key: 0
Network_Namespace:
1 row in set (0.00 sec)
mysql> select * from performance_schema.replication_applier_status_by_worker\G
*************************** 1. row ***************************
CHANNEL_NAME: awspxc1_to_bmhpxc1
WORKER_ID: 1
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_ERROR_NUMBER: 1032
LAST_ERROR_MESSAGE: Worker 1 failed executing transaction ‘5b000228-6cab-11ee-8464-43850378d433:14108214’ at master log binlog.036030, end_log_pos 35473386; Could not execute Delete_rows event on table gaia_wordpress_stage.wp_gaia_queue; Can’t find record in ‘wp_gaia_queue’, Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event’s master log binlog.036030, end_log_pos 35473386
LAST_ERROR_TIMESTAMP: 2024-01-24 22:04:40.510076
LAST_APPLIED_TRANSACTION: 5b000228-6cab-11ee-8464-43850378d433:14108213
LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2024-01-24 16:48:10.701958
LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2024-01-24 16:48:10.701958
LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 2024-01-24 22:04:40.508267
LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 2024-01-24 22:04:40.509862
APPLYING_TRANSACTION: 5b000228-6cab-11ee-8464-43850378d433:14108214
APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2024-01-24 16:48:10.708629
APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2024-01-24 16:48:10.708629
APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 2024-01-24 22:04:40.509889
LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_RETRIES_COUNT: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
*************************** 2. row ***************************
CHANNEL_NAME: awspxc1_to_bmhpxc1
WORKER_ID: 2
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION:
LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION:
APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_RETRIES_COUNT: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
*************************** 3. row ***************************
CHANNEL_NAME: awspxc1_to_bmhpxc1
WORKER_ID: 3
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION:
LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION:
APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_RETRIES_COUNT: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
*************************** 4. row ***************************
CHANNEL_NAME: awspxc1_to_bmhpxc1
WORKER_ID: 4
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION:
LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION:
APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_RETRIES_COUNT: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
4 rows in set (0.00 sec)