MySQL 8 - Replica - Doing but Not?!

So, I’m asking this one to see if anyone else has seen something similar in the past or has any suggestions for getting it working again. Currently running on MySQL 8.0.34.

I’ve had a MySQL source/replica pair working happily for years. I came to it today and the replica says it is running but isn’t actually doing anything: Slave_IO_Running is Yes and Slave_SQL_Running is Yes, yet replication isn’t progressing. It’s lagging, and the time behind is constantly increasing. Load on the replica is negligible.

STOP REPLICA never finishes, and restarting MySQL hangs because the server never shuts down. The only way to restart the server now is to reboot the machine.

Nothing abnormal is shown in the logs. The server comes back up and resumes replication, but again without actually doing anything.

Hi @TimIgoe ,
There are several possible reasons for that:
1- Replication Thread Hang: Threads that hang internally in MySQL can make the replication threads appear to be running without making any progress, especially if you are using multi-threaded replication (MTS) and one of the worker threads has hung because of the data or heavy load.
2- Long-Running Query on Replica: A query might be stuck in execution due to resource contention or an issue with the query itself, blocking replication progress.
3- Disk or I/O Subsystem Issues: If the disk hosting the relay logs or data directory is facing issues, replication can get stuck.

You might check SHOW ENGINE INNODB STATUS when the issue is happening.

Also, further reading about MTS: A Dive Into MySQL Multi-Threaded Replication - Percona Database Performance Blog
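
For points 1 and 2 above, the Performance Schema can show what each applier worker and each other session is doing. This is a minimal sketch, assuming MySQL 8.0 with the Performance Schema enabled; it only reads status tables. Without GTIDs the transaction columns show ANONYMOUS rather than a GTID, so compare the timestamps instead.

```sql
-- Point 1: one row per applier worker. A worker whose APPLYING_TRANSACTION
-- and start timestamp stay the same across repeated runs is a candidate
-- for the hung thread.
SELECT w.CHANNEL_NAME,
       w.WORKER_ID,
       w.SERVICE_STATE,
       w.APPLYING_TRANSACTION,
       w.APPLYING_TRANSACTION_START_APPLY_TIMESTAMP,
       t.PROCESSLIST_STATE,
       t.PROCESSLIST_TIME,
       w.LAST_ERROR_NUMBER,
       w.LAST_ERROR_MESSAGE
FROM performance_schema.replication_applier_status_by_worker AS w
LEFT JOIN performance_schema.threads AS t
       ON t.THREAD_ID = w.THREAD_ID   -- THREAD_ID is NULL for a stopped worker
ORDER BY w.CHANNEL_NAME, w.WORKER_ID;

-- Point 2: every non-idle session on the replica, longest-running first.
SELECT PROCESSLIST_ID,
       PROCESSLIST_USER,
       PROCESSLIST_COMMAND,
       PROCESSLIST_TIME,
       PROCESSLIST_STATE,
       PROCESSLIST_INFO
FROM performance_schema.threads
WHERE PROCESSLIST_ID IS NOT NULL
  AND PROCESSLIST_COMMAND <> 'Sleep'
ORDER BY PROCESSLIST_TIME DESC;
```

Running the first query two or three times a minute apart is the useful part: a healthy worker moves on to new transactions, a stuck one does not.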

  1. Hangs even after a full server reboot?
  2. Nope, replica is doing nothing other than replicating atm
  3. No logs showing any drive/disk errors

InnoDB status:

=====================================
2024-12-12 14:14:03 140025803597568 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 28 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 3106 srv_active, 0 srv_shutdown, 8081 srv_idle
srv_master_thread log flush and writes: 0
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 155
OS WAIT ARRAY INFO: signal count 155
RW-shared spins 0, rounds 0, OS waits 0
RW-excl spins 0, rounds 0, OS waits 0
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 0.00 RW-shared, 0.00 RW-excl, 0.00 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 106541269504
Purge done for trx's n:o < 106541269504 undo n:o < 0 state: running but idle
History list length 1
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421523873735408, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421523873734600, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421523873733792, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421523873732984, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421523873732176, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421523873730560, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421523873731368, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421523873728944, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421523873729752, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421523873728136, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests ((null))
I/O thread 1 state: waiting for completed aio requests (insert buffer thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (read thread)
I/O thread 7 state: waiting for completed aio requests (read thread)
I/O thread 8 state: waiting for completed aio requests (read thread)
I/O thread 9 state: waiting for completed aio requests (read thread)
I/O thread 10 state: waiting for completed aio requests (write thread)
I/O thread 11 state: waiting for completed aio requests (write thread)
I/O thread 12 state: waiting for completed aio requests (write thread)
I/O thread 13 state: waiting for completed aio requests (write thread)
I/O thread 14 state: waiting for completed aio requests (write thread)
I/O thread 15 state: waiting for completed aio requests (write thread)
I/O thread 16 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0, 0, 0, 0, 0] , aio writes: [0, 0, 0, 0, 0, 0, 0, 0] ,
 ibuf aio reads:
Pending flushes (fsync) log: 0; buffer pool: 0
184947 OS file reads, 45664 OS file writes, 23592 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 6.20 writes/s, 3.35 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 335, free list len 2406746, seg size 2407082, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 35401603, node heap has 0 buffer(s)
Hash table size 35401603, node heap has 0 buffer(s)
Hash table size 35401603, node heap has 0 buffer(s)
Hash table size 35401603, node heap has 3 buffer(s)
Hash table size 35401603, node heap has 4 buffer(s)
Hash table size 35401603, node heap has 0 buffer(s)
Hash table size 35401603, node heap has 0 buffer(s)
Hash table size 35401603, node heap has 1 buffer(s)
0.43 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number          76392899384621
Log buffer assigned up to    76392899384621
Log buffer completed up to   76392899384621
Log written up to            76392899384621
Log flushed up to            76392899384611
Added dirty pages up to      76392899384621
Pages flushed up to          76392899382837
Last checkpoint at           76392899382837
Log minimum file id is       203813
Log maximum file id is       203813
9852 log i/o's done, 1.17 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 0
Dictionary memory allocated 489041
Buffer pool size   8387852
Free buffers       8203075
Database pages     184769
Old database pages 69421
Modified db pages  21
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 184531, created 238, written 19306
0.00 reads/s, 0.00 creates/s, 2.71 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 184769, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
----------------------
INDIVIDUAL BUFFER POOL INFO
----------------------
---BUFFER POOL 0
Buffer pool size   131062
Free buffers       128236
Database pages     2826
Old database pages 1063
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 2826, created 0, written 0
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2826, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 1
Buffer pool size   131060
Free buffers       128200
Database pages     2859
Old database pages 1075
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 2859, created 0, written 0
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2859, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 2
Buffer pool size   131060
Free buffers       128211
Database pages     2848
Old database pages 1071
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 2848, created 0, written 0
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2848, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 3
Buffer pool size   131057
Free buffers       128196
Database pages     2860
Old database pages 1075
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 2860, created 0, written 0
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2860, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 4
<snip... all the rest look similar>
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=7170, Main thread ID=140026366240512 , state=sleeping
Number of rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
Number of system rows inserted 24, updated 3899, deleted 24, read 12966
0.00 inserts/s, 0.43 updates/s, 0.00 deletes/s, 0.43 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

What do you see in the replica status other than the IO thread and SQL thread states?

              Master_Log_File:
          Read_Master_Log_Pos: 
               Relay_Log_File: 
                Relay_Log_Pos: 
        Relay_Master_Log_File:

Are the positions moving?
Using the relay log file name and position, it would be good to check which event is next.
For example:

SHOW RELAYLOG EVENTS IN 'file_name' FROM pos LIMIT 20;

It can also be checked from the binary logs it is reading from.
Does any process, such as a backup, run when the replica freezes? Or is any other job in progress at that time?
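
The two other checks mentioned here, the source's binary log and a backup or job holding things up, could look roughly like this. It is only a sketch: 'binlog_file' and the position 4 are placeholders to replace with Relay_Master_Log_File and Exec_Master_Log_Pos from SHOW SLAVE STATUS (Relay_Source_Log_File and Exec_Source_Log_Pos in SHOW REPLICA STATUS), and the second query assumes the metadata lock instrumentation is enabled in the Performance Schema.

```sql
-- On the source: list the events the replica's applier should execute next.
-- 'binlog_file' and 4 are placeholders, not values from this thread.
SHOW BINLOG EVENTS IN 'binlog_file' FROM 4 LIMIT 20;

-- On the replica: metadata lock requests that are still waiting, for
-- example behind a lock taken by a backup tool or a long-running job.
SELECT OBJECT_TYPE,
       OBJECT_SCHEMA,
       OBJECT_NAME,
       LOCK_TYPE,
       LOCK_STATUS,
       OWNER_THREAD_ID
FROM performance_schema.metadata_locks
WHERE LOCK_STATUS = 'PENDING';
```

An empty result from the second query suggests nothing is waiting on a metadata lock, which would point away from a backup or DDL job as the cause.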

Thanks for the reply. Master_Log_File and Read_Master_Log_Pos are both updating.
Relay_Log_File and Relay_Log_Pos are static.

SHOW RELAYLOG EVENTS for the file name and position in question gives me:

db2-relay-bin.033381 543 Rotate 0 0 binlog.016668;pos=1073786479
db2-relay-bin.033381 583 Rotate 2 630 db2-relay-bin.033382;pos=4

Whatever MySQL is doing, I can’t issue a STOP REPLICA (or STOP SLAVE): it just appears to be stuck in an infinite loop internally and never completes.

systemctl stop mysqld hangs in the same way.

I did break replication by moving that relay log file away; putting it back and restarting brought the same issue straight back.

Have you run any big transactions on your source node?
Are you using multi-threaded replication? What are your replication settings? If you are using multiple threads, you can try switching to a single thread.
As a temporary measure, you can relax durability (innodb_flush_log_at_trx_commit = 2) and watch whether the positions move.
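
A sketch of those two suggestions, assuming MySQL 8.0.26 or later, where the variables use the replica_ prefix (older releases use slave_parallel_workers and so on). A changed worker count only applies once the applier threads restart, so with STOP REPLICA hanging it would realistically take effect at the next server start.

```sql
-- Current parallel-applier settings (workers, type, commit order).
SHOW GLOBAL VARIABLES LIKE 'replica_p%';

-- Use a single applier worker. SET PERSIST also writes the value to
-- mysqld-auto.cnf so it survives the restart needed to apply it.
SET PERSIST replica_parallel_workers = 1;

-- Temporarily relax durability: the redo log is written at each commit
-- but flushed to disk only about once per second.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- Revert once testing is done (1 is the default).
-- SET GLOBAL innodb_flush_log_at_trx_commit = 1;
```

Both changes are diagnostic only: if the positions still do not move with one worker and relaxed flushing, parallel apply and commit-time I/O are unlikely to be the cause.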

  1. No more than normal
  2. Yes, mostly defaults; I changed the binlog expiry days but that’s about it
  3. Made no difference