MySQL 8 Replication Troubles

I have a 3-node PXC cluster, which was just upgraded from MySQL 5.7 to 8.0.29.

I also have an asynchronous slave running separately, also on MySQL 8.0.29.

The slave has been running 8.0.29 for a few weeks.

After the upgrade of the PXC cluster, I needed to do a resync of the slave, which I did using xtrabackup.

After preparing the backup, I started the slave and it came up fine, but it does not receive any updates from the master.

SHOW SLAVE STATUS shows no errors:
Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
Slave_IO_State: Waiting for source to send event
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

I used auto_position to set the master’s position. I also tried setting the coordinates manually. Neither worked.
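
For reference, this is roughly what I ran (a sketch; credentials elided, host and user as in the status output below):

CHANGE REPLICATION SOURCE TO
  SOURCE_HOST='209.222.9.148',
  SOURCE_USER='slave_user',    -- SOURCE_PASSWORD omitted here
  SOURCE_AUTO_POSITION=1;
START REPLICA;

and the manual variant, with the coordinates taken from the backup's xtrabackup_binlog_info:

CHANGE REPLICATION SOURCE TO
  SOURCE_AUTO_POSITION=0,
  SOURCE_LOG_FILE='binlog.000062',   -- illustrative coordinates
  SOURCE_LOG_POS=89134941;
START REPLICA;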

There are no errors on the master or the slave, and no messages in the mysqld.log file.

The firewall ports are open.
This exact process was run before the upgrade with no issues.

Any help would be greatly appreciated.

Hi @MeirW2, from the status you shared, it seems that replication IS running, but it's just not getting anything new from the primary. You may want to confirm that the replica is pointed at the right primary IP, and also confirm that there are actually writes to be replicated.
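
For instance, a quick way to confirm new events are flowing (a sketch; the repl_test schema is just a throwaway example):

-- on the primary: write something and watch the position advance
SHOW MASTER STATUS;                        -- note Position / Executed_Gtid_Set
CREATE DATABASE IF NOT EXISTS repl_test;   -- throwaway test schema
CREATE TABLE IF NOT EXISTS repl_test.ping (id INT AUTO_INCREMENT PRIMARY KEY);
INSERT INTO repl_test.ping VALUES ();
SHOW MASTER STATUS;                        -- both values should have moved
-- on the replica, a moment later: the row should show up
SELECT COUNT(*) FROM repl_test.ping;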

If that seems OK, please share the complete output of SHOW SLAVE STATUS\G so we have more info to troubleshoot. Also, are you using GTIDs?
Best,
Mauricio.

Yes, we're using GTIDs, and the ports are open in both directions between the machines.
This is the SLAVE STATUS:

*************************** 1. row ***************************
             Replica_IO_State: Waiting for source to send event
                  Source_Host: 209.222.9.148
                  Source_User: slave_user
                  Source_Port: 3306
                Connect_Retry: 60
              Source_Log_File: binlog.000062
          Read_Source_Log_Pos: 89134941
               Relay_Log_File: CentOS-74-64-minimal-relay-bin.000007
                Relay_Log_Pos: 8927131
        Relay_Source_Log_File: binlog.000062
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Source_Log_Pos: 89134941
              Relay_Log_Space: 8927400
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Source_SSL_Allowed: No
           Source_SSL_CA_File: 
           Source_SSL_CA_Path: 
              Source_SSL_Cert: 
            Source_SSL_Cipher: 
               Source_SSL_Key: 
        Seconds_Behind_Source: 0
Source_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
 Replicate_Ignore_Server_Ids: 
             Source_Server_Id: 4
                  Source_UUID: f07b9880-5ea6-11ed-97a6-3cecef6a3314
             Source_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
    Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
           Source_Retry_Count: 86400
                  Source_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Source_SSL_Crl: 
           Source_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 1bb74cdf-588a-11ed-bbe9-3cecefb9c2c0:1-5,
2ea7d4d5-db8c-11e7-a395-0cc47aa8ab3e:1-96,
4a545cd1-29bc-11ec-9bdc-3cecef46a4b8:1-4415196,
4d154a62-29bc-11ec-b634-3cecef6b0a8a:1-818469,
52bf7592-5a6d-11ed-9f81-001e67c47b04:1,
59ce7551-b19c-11ec-8d08-001e67c47b04:1-13,
772fd0ef-db6b-11e7-8394-de7434f4bcd0:1-91824198,
88d02f10-2494-ee18-7c6b-218bcb0b432f:1-3873719826,
b8213878-b0eb-11ec-8bba-3cecef46a4b8:1-19,
c7063da6-4f9c-11e9-924d-001e67c47b04:1-4,
fc9c4ee0-24cc-ee18-7862-69b0adc2fe43:1-1377216
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Source_TLS_Version: 
       Source_public_key_path: 
        Get_Source_public_key: 0
            Network_Namespace: 
1 row in set (0.00 sec)

This is the show master status (on the master):

mysql> show master status\G
*************************** 1. row ***************************
             File: binlog.000062
         Position: 110631339
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 1bb74cdf-588a-11ed-bbe9-3cecefb9c2c0:1-5,
2ea7d4d5-db8c-11e7-a395-0cc47aa8ab3e:1-96,
4a545cd1-29bc-11ec-9bdc-3cecef46a4b8:1-4415196,
4d154a62-29bc-11ec-b634-3cecef6b0a8a:1-818469,
772fd0ef-db6b-11e7-8394-de7434f4bcd0:1-62636948,
88d02f10-2494-ee18-7c6b-218bcb0b432f:1-3873719826,
b8213878-b0eb-11ec-8bba-3cecef46a4b8:1-19,
fc9c4ee0-24cc-ee18-7862-69b0adc2fe43:1-1377216
1 row in set (0.00 sec)
mysql> select * from performance_schema.replication_connection_status\G
*************************** 1. row ***************************
                                      CHANNEL_NAME: 
                                        GROUP_NAME: 
                                       SOURCE_UUID: f07b9880-5ea6-11ed-97a6-3cecef6a3314
                                         THREAD_ID: 300
                                     SERVICE_STATE: ON
                         COUNT_RECEIVED_HEARTBEATS: 1789621
                          LAST_HEARTBEAT_TIMESTAMP: 2022-11-13 12:15:26.252162
                          RECEIVED_TRANSACTION_SET: 
                                 LAST_ERROR_NUMBER: 0
                                LAST_ERROR_MESSAGE: 
                              LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                           LAST_QUEUED_TRANSACTION: 
 LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_QUEUED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
     LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP: 0000-00-00 00:00:00.000000
       LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP: 0000-00-00 00:00:00.000000
                              QUEUEING_TRANSACTION: 
    QUEUEING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
   QUEUEING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
        QUEUEING_TRANSACTION_START_QUEUE_TIMESTAMP: 0000-00-00 00:00:00.000000
1 row in set (0.00 sec)

Here’s the mysqld.cnf file on the master:

[mysqld]
pid-file=/var/run/mysqld/mysqld.pid
gtid_mode=ON
log_bin=/var/log/mysql/binlog
log_slave_updates=1
binlog_row_image=minimal
enforce_gtid_consistency
expire_logs_days=4
server_id=4         
skip-name-resolve=true
binlog_format                  = ROW
open_files_limit=150000
innodb_buffer_pool_size        = 55G
innodb_flush_log_at_trx_commit = 0
innodb_flush_method            = O_DIRECT
innodb_log_files_in_group      = 2
innodb_log_file_size           = 2G
innodb_file_per_table          = 1
datadir                        = /var/lib/mysql
max_allowed_packet=1073741824
character_set_server=latin1
collation_server=latin1_swedish_ci

wsrep_cluster_address=gcomm://192.168.0.143,192.168.0.148,192.168.0.236
wsrep_provider =/usr/lib/galera4/libgalera_smm.so
wsrep_node_address=192.168.0.148
wsrep_node_name=db11.example.com
wsrep_slave_threads            = 16

innodb_autoinc_lock_mode       = 2
transaction-isolation=READ-COMMITTED
wsrep_sst_method=xtrabackup-v2
pxc_encrypt_cluster_traffic=off
wsrep_cluster_name=centos_cluster
wsrep_provider_options="gcache.size=10G"
thread_cache_size=50
max_connections = 500
sql_mode=IGNORE_SPACE,ERROR_FOR_DIVISION_BY_ZERO
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

[sst]
compressor='zstd -2 -T6'
decompressor='zstd -d -T6'
backup_threads=6
[xtrabackup]
parallel=6

Thanks for this.
Based on the SHOW MASTER STATUS output and the Executed_Gtid_Set on the replica, you can see there’s nothing “new” on the primary that needs to be replicated.

Brief summary on GTIDs and auto_position:
The replica retrieves the Executed_Gtid_Set from the primary and compares it with its own; if there are GTIDs on the primary that are missing on the replica, it asks for them.

You can manually compare the Executed_Gtid_Set values (from SHOW MASTER STATUS on the primary and SHOW SLAVE STATUS on the replica) and you’ll see there’s nothing missing; in fact, the replica has more GTIDs executed.
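
Rather than eyeballing the sets, you can let the server compare them with GTID_SUBTRACT() (a sketch using just one of your UUIDs; in practice paste the complete set strings):

-- on the replica: GTIDs the primary has that the replica lacks
SELECT GTID_SUBTRACT('772fd0ef-db6b-11e7-8394-de7434f4bcd0:1-62636948',
                     @@GLOBAL.gtid_executed) AS missing_on_replica;
-- an empty result means there is nothing left to fetch;
-- the reverse shows what the replica has that the primary doesn't
SELECT GTID_SUBTRACT(@@GLOBAL.gtid_executed,
                     '772fd0ef-db6b-11e7-8394-de7434f4bcd0:1-62636948') AS extra_on_replica;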

Take this for example:
772fd0ef-db6b-11e7-8394-de7434f4bcd0
In the primary, the GTIDs executed are: 1-62636948
In the replica, the GTIDs executed are: 1-91824198.

This means the replica is ahead of its primary; your replica also has these GTID sets executed that aren’t on your primary:

52bf7592-5a6d-11ed-9f81-001e67c47b04:1,
59ce7551-b19c-11ec-8d08-001e67c47b04:1-13,
c7063da6-4f9c-11e9-924d-001e67c47b04:1-4,

Bottom line: replication is working, but your data may not be what you think it is. This may be a backup issue, or maybe your primary is not replicating properly?

The master is a very busy machine, with very frequent updates.
The primary is working properly.

I tried the backup via xtrabackup several times. I also tried doing a full stop of the master and rsyncing over the files (took forever…), and got the same result.

I think the issue must be the primary not sending over the events, but why? The config, as you can see above, is pretty vanilla.

The binlogs are being written on the master with seemingly no issue.
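
(Checked roughly like this; the file name and position are the ones from the status output above:)

SHOW BINARY LOGS;
SHOW BINLOG EVENTS IN 'binlog.000062' FROM 89134941 LIMIT 10;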

Hi MeirW2,

Do you have “log_replica_updates” enabled on every node of the PXC?
Do you have a different server_id on every node in the topology? (i.e., all PXC nodes and all replicas must each have a unique server_id.)
Please check the above and confirm.
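
Both can be checked in one statement on each node, e.g. (a sketch):

-- run on every PXC node and on the replica;
-- server_id and server_uuid must be unique, log_replica_updates should be 1
SELECT @@server_id, @@server_uuid, @@log_replica_updates;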

Yes, each of the servers has a separate server_id, and log_replica_updates is enabled.

A development: I spun up a brand-new server, added it to the cluster, and it was able to serve as a master for replication!
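
Repointing the replica at the new node was just a matter of (a sketch; the host address is hypothetical):

STOP REPLICA;
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST='192.168.0.150',   -- hypothetical address of the new node
  SOURCE_AUTO_POSITION=1;
START REPLICA;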

So something in the upgrade process from 5.7 to 8 seems to have broken something, maybe a missing library?

It is Ubuntu 20.04, and all the mysql/percona packages are the same on the two servers.

After another round of apt updates and a reboot, it seems that replication is now working as expected. I assume something was stuck in limbo and needed a reboot to settle.
