Replication Stuck Again - No Error Happening

I have a replication node that keeps getting stuck. There is no error, it just gets stuck. It is now 2 days behind. There are other replication nodes replicating from the same server and they are not getting stuck.

Here is the output:

show slave status: → Exec_Master_Log_Pos: 277989746

Binlogs:

at 277989715

#231017 17:58:05 server id xxx end_log_pos 277989746 CRC32 0x03570857 Xid = 30752073692
COMMIT/!/;
→ STUCK HERE # at 277989746
#231017 17:58:05 server id xxx end_log_pos 277989811 CRC32 0x326ec1a7 Anonymous_GTID last_committed=302991 sequence_number=302992 rbr_only=yes
/!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED//!/;
SET @@SESSION.GTID_NEXT= ‘ANONYMOUS’/!/;

at 277989811

#231017 17:58:05 server id xxx end_log_pos 277989885 CRC32 0x6607991e Query thread_id=28202941 exec_time=0 error_code=0
SET TIMESTAMP=1697579885/!/;
BEGIN
/!/;

at 277989885…

All that is happening for this specific bin log is:
/!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED//!/;
SET @@SESSION.GTID_NEXT= ‘ANONYMOUS’/!/;

Any ideas/thoughts/solutions welcome please.

Thanks.

Hi Chris,

Thanks for writing to us.

All that is happening for this specific bin log is:
/!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED //! /;
SET @@SESSION.GTID_NEXT= ‘ANONYMOUS’/! /;

It appears to be the empty event and just happens at the beginning of the transaction. Ideally this should not trigger the issue.

Exactly what database version you are running with this node ? Are you running the same database version on other slave where the lag is not observed ?

If feasible, can you please attach the below information from that particular bin log file ?

mysql> SHOW BINLOG EVENTS IN 'xxx-mysql-bin.log'

Also, if you still facing the lag or if captured previously you can paste the below details as well for a review ?

SHOW ENGINE INNODB STATUS\G;
SHOW FULL PROCESSLIST;
SHOW SLAVE STATUS\G; 
SELECT * FROM PERFORMANCE_SCHEMA.replication_applier_status_by_worker\G;

And below details in order to check if running tables without primary keys or not ?

SELECT 
a.TABLE_SCHEMA,
a.TABLE_NAME,
a.TABLE_ROWS
FROM information_schema.tables a
LEFT JOIN (SELECT TABLE_SCHEMA,TABLE_NAME FROM information_schema.statistics GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
HAVING SUM(CASE WHEN non_unique = 0 AND nullable != 'YES' THEN 1 ELSE 0 end) = COUNT(*)) b
ON a.TABLE_SCHEMA=b.TABLE_SCHEMA AND a.TABLE_NAME=b.TABLE_NAME
WHERE b.TABLE_NAME IS NULL
AND a.TABLE_TYPE='BASE TABLE'
AND a.TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema','sys')

For a temporal workaround did you tried to relax some acid properties ?

sync_relay_log_info = 10000
sync_relay_log = 10000
sync_master_info = 10000
sync_binlog = 0
innodb_flush_log_at_trx_commit = 2
innodb_change_buffering = none

Lastly, please paste the database/mysql error logs as well for a quick look over.

This ended up being an issue with a team querying this instance with insert/select. Unfortunately it held up replication. Thanks.

This ended up being an issue with a team querying this instance with insert/select. Unfortunately it held up replication.

Sorry but I didn’t get this. Is that some INSERT queries were also running in this replication instance ? Could you please elaborate the current situation/Issue ?

Did you tried the temporal workaround in order to release some replication lag ?

Still, If you facing any issues, you can share the previous requested details along with mysql error log files for a review.

Regards,
Anil