Last_Errno: 1062, Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'ANONYMOUS'

Hello, I’m getting following error on replica after replica server reboot.
How do I fix this issue?

*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: x.x.x.x
                  Master_User: repl
                  Master_Port: yyyy
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000123
          Read_Master_Log_Pos: 568689574
               Relay_Log_File: mysql-relay-bin.000170
                Relay_Log_Pos: 12207
        Relay_Master_Log_File: mysql-bin.000115
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1062
                   Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'ANONYMOUS' at source log mysql-bin.000115, end_log_pos 468117420. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 468117043
              Relay_Log_Space: 8424781894
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1062
               Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'ANONYMOUS' at source log mysql-bin.000115, end_log_pos 468117420. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
                  Master_UUID: 421740
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 241203 19:43:27
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
       Master_public_key_path: 
        Get_master_public_key: 0
            Network_Namespace: 
1 row in set, 1 warning (0.01 sec)

You should try to check this table for more detailed error.
It is duplicate key error. Most probably your data is not consistent due to which this error comes in.
You should use pt-table-checksum to find the differences and pt-table-sync to sync replica if there are any differences found.

If nothing has been touched anywhere manually and the only thing which happened was that replica server was rebooted then shouldn’t MySQL be able to automatically take care of these things?

Almost like every-time something happens the replication breaks. What is the point of replication if it keeps breaking every now and then?

Is there any other way to replicate mysql which just works and we don’t have to manually fix it all the time?

@roger,

How do I fix this issue?

MySQL reports an error because it saw that the binlog position has the entry that is already present and hence the duplicate key error.
You need to understand what exactly is that table / values reporting duplicate key error.
This is how the debugging starts and Yunus mentioned the table you need to look for exact error.
select * from performance_schema.replication_applier_status_by_worker;

If nothing has been touched anywhere manually and the only thing which happened was that replica server was rebooted then shouldn’t MySQL be able to automatically take care of these things?

If nothing has been touched then it shouldn’t go haywire. That said it did. We need to figure out why! Was the shutdown of MySQL clean? What’s sync_binlog setting? Was it a database crash?

Almost like every-time something happens the replication breaks. What is the point of replication if it keeps breaking every now and then?

This is semi-sync replication and things break when somethings are not right. It reported error as it saw inconsistency!

Is there any other way to replicate mysql which just works and we don’t have to manually fix it all the time?

I can close my eyes and tell you to use PXC (Percona XtraDB Cluster / Galera) which is a synchronous replication. But that’s not how you choose your technology and there are a lot of things to consider. You better answer the problem you face, dig deeper and identify the root cause.

2 Likes

Adding to what Kedar mentioned,
You can also take a look at these settings mentioned to build a crash safe replication,

https://dev.mysql.com/doc/refman/8.0/en/replication-solutions-unexpected-replica-halt.html

I did a fresh sync setup and encountered the same error. So what’s next to check?