I have a 3-node PXC cluster that was just upgraded from MySQL 5.7 to 8.0.29.
I also have a separate asynchronous slave, which has been running MySQL 8.0.29 for a few weeks.
After upgrading the PXC cluster, I needed to resync the slave, which I did using xtrabackup.
After preparing the backup, I start the slave and it comes up, but it does not receive any updates from the master.
SHOW SLAVE STATUS shows no errors:
Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
Slave_IO_State: Waiting for source to send event
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
I used auto_position to set the master's position, and I also tried setting the binlog coordinates manually. Neither worked.
There are no errors on the master or the slave, and no messages in the mysqld.log file.
The firewall ports are open.
I ran this exact process before the upgrade with no issues.
Hi @MeirW2, from the status you shared, it seems that replication IS running; it's just not getting anything new from the primary. Please confirm that the replica is pointed at the right primary IP, and also that there are writes to be replicated.
If that seems OK, please share the complete output of SHOW SLAVE STATUS\G so we have more info to troubleshoot. Also, are you using GTIDs?
Best,
Mauricio.
Thanks for this.
Based on SHOW MASTER STATUS on the primary and Executed_Gtid_Set on the replica, you can see there's nothing "new" in the primary that needs to be replicated.
A brief summary of GTIDs and auto_position:
The replica sends the primary its own GTID set (the transactions it has already received or executed), and the primary compares it with its own executed GTIDs. If there are GTIDs in the primary that are missing from the replica, the primary sends those transactions.
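As a rough illustration of that comparison, here is a minimal Python sketch that parses GTID set strings in the uuid:start-end[:start-end...] form shown by SHOW MASTER STATUS and SHOW SLAVE STATUS, and subtracts one set from another. The function names are hypothetical helpers, not part of any MySQL tool (on the server itself, the built-in GTID_SUBTRACT() function does the same job), and the sketch assumes plain untagged GTIDs as used in 8.0.29.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]
GtidSet = Dict[str, List[Interval]]


def _merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge overlapping or adjacent ones."""
    merged: List[Interval] = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def parse_gtid_set(text: str) -> GtidSet:
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: [(1, 5), (7, 7)], ...}."""
    result: Dict[str, List[Interval]] = {}
    # SHOW ... STATUS output may wrap long sets with newlines after commas.
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        intervals = result.setdefault(uuid.strip().lower(), [])
        for r in ranges:
            lo, _, hi = r.partition("-")  # a single GTID like '7' has no '-'
            intervals.append((int(lo), int(hi or lo)))
    return {uuid: _merge(iv) for uuid, iv in result.items()}


def subtract(a: GtidSet, b: GtidSet) -> GtidSet:
    """Return the GTIDs that are in set a but not in set b."""
    result: GtidSet = {}
    for uuid, intervals in a.items():
        remaining = list(intervals)
        for b_lo, b_hi in b.get(uuid, []):
            next_remaining: List[Interval] = []
            for lo, hi in remaining:
                if b_hi < lo or b_lo > hi:  # no overlap: keep as-is
                    next_remaining.append((lo, hi))
                    continue
                if lo < b_lo:  # part before the removed range survives
                    next_remaining.append((lo, b_lo - 1))
                if hi > b_hi:  # part after the removed range survives
                    next_remaining.append((b_hi + 1, hi))
            remaining = next_remaining
        if remaining:
            result[uuid] = remaining
    return result


def format_gtid_set(s: GtidSet) -> str:
    """Render back to MySQL's text form, e.g. 'uuid:1-5:7'."""
    parts = []
    for uuid in sorted(s):
        ranges = ":".join(str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in s[uuid])
        parts.append(f"{uuid}:{ranges}")
    return ",".join(parts)


if __name__ == "__main__":
    uuid = "772fd0ef-db6b-11e7-8394-de7434f4bcd0"
    primary = parse_gtid_set(f"{uuid}:1-62636948")
    replica = parse_gtid_set(f"{uuid}:1-91824198")
    print("missing on replica:", format_gtid_set(subtract(primary, replica)) or "(none)")
    print("only on replica:", format_gtid_set(subtract(replica, primary)))
```

Subtracting the replica's set from the primary's gives what the primary would still have to send; subtracting the other way round lists GTIDs that exist only on the replica. With the ranges quoted in this thread for 772fd0ef-db6b-11e7-8394-de7434f4bcd0, the first result is empty and the second is 62636949-91824198.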
You can manually compare the Executed_Gtid_Set values (from SHOW MASTER STATUS on the primary and SHOW SLAVE STATUS on the replica) and you'll see nothing is missing; in fact, the replica has more GTIDs executed.
Take this UUID, for example: 772fd0ef-db6b-11e7-8394-de7434f4bcd0
On the primary, the executed GTIDs are: 1-62636948
On the replica, the executed GTIDs are: 1-91824198
This means the replica is ahead of its primary. Your replica also has these GTID sets executed that aren't in your primary:
Bottom line: replication is working, but your data may not be what you think it is. This may be a backup issue, or maybe your primary is not replicating properly?
The master is a very busy machine, with very frequent updates.
Primary is working properly.
I tried the backup with xtrabackup (several times). I also tried doing a full stop of the master and rsyncing the files over (took forever…), and got the same result.
I think the issue must be the primary not sending the events, but why? As you can see above, the config is pretty vanilla.
The binlogs are being written to the master server with seemingly no issue.
Do you have “log_replica_updates” enabled on every node of the PXC cluster?
Do you have a different server_id on every node in the topology? (i.e., all PXC nodes and all replicas must each have a different server_id.)
Please check the above and confirm.
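The two checks suggested above (a unique server_id everywhere, and log_replica_updates on every PXC node) can be sketched as a small Python function over settings collected from each server. The host names, the role field and the NodeSettings shape are hypothetical; the real values would come from running SELECT @@server_id, @@log_replica_updates; on each node (log_replica_updates is the 8.0.26+ name for log_slave_updates).

```python
from collections import defaultdict
from typing import Dict, List, TypedDict


class NodeSettings(TypedDict):
    role: str                  # hypothetical: "pxc" for cluster nodes, "replica" for async replicas
    server_id: int             # value of @@server_id on that node
    log_replica_updates: bool  # value of @@log_replica_updates on that node


def check_topology(nodes: Dict[str, NodeSettings]) -> List[str]:
    """Report duplicate server_ids anywhere and PXC nodes with log_replica_updates OFF."""
    problems: List[str] = []
    hosts_by_id: Dict[int, List[str]] = defaultdict(list)
    for host, s in nodes.items():
        hosts_by_id[s["server_id"]].append(host)
    for server_id, hosts in sorted(hosts_by_id.items()):
        if len(hosts) > 1:
            problems.append(f"server_id {server_id} is shared by {', '.join(sorted(hosts))}")
    for host, s in sorted(nodes.items()):
        # Only cluster nodes are checked here; the async replica is not required to log updates.
        if s["role"] == "pxc" and not s["log_replica_updates"]:
            problems.append(f"{host}: log_replica_updates is OFF")
    return problems


if __name__ == "__main__":
    # Hypothetical example values, not taken from the thread.
    example: Dict[str, NodeSettings] = {
        "pxc1": {"role": "pxc", "server_id": 1, "log_replica_updates": True},
        "pxc2": {"role": "pxc", "server_id": 2, "log_replica_updates": True},
        "pxc3": {"role": "pxc", "server_id": 3, "log_replica_updates": True},
        "replica1": {"role": "replica", "server_id": 4, "log_replica_updates": False},
    }
    for problem in check_topology(example) or ["no problems found"]:
        print(problem)
```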
After another round of apt updates and a reboot, replication now seems to be working as expected. I assume something was stuck in limbo and needed a reboot to settle.