MySQL slave stuck re-reading the same binlog repeatedly

I’ve got a weird one.

So in part of our system we store files in MySQL as BLOBs, broken up into segments. My issue is that someone uploaded a 600 MB video, which worked fine on all but one slave. For some reason that slave had been built with only 1 GB of /tmp space, and the server ran out of /tmp space.
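For context, the segmented-BLOB scheme described above might look roughly like this. This is a hypothetical sketch, not our actual code; the 8 MB segment size is a guess based on 600 MB / 75 segments.

```python
# Hypothetical sketch of storing a file as fixed-size BLOB segments,
# one row per segment. 8 MB is a guess from 600 MB / 75 segments.
SEGMENT_SIZE = 8 * 1024 * 1024

def split_into_segments(data: bytes, size: int = SEGMENT_SIZE):
    """Yield (segment_number, chunk) pairs, one per INSERTed row."""
    for i in range(0, len(data), size):
        yield i // size, data[i:i + size]
```

On the primary, each chunk becomes its own INSERT, but they are all part of one transaction, which is why the replicated transaction is so large.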

We’ve fixed that, but now when I start the slave it only reads the primary’s binlog up to a certain position (which equates to the end of the INSERT for segment 63 of 75) and then appears to stop.

I’ve watched SHOW ENGINE INNODB STATUS, and I can see enough of the transaction to tell which segment it’s on: it slowly grows to segment 63, then resets back to 0 and starts to grow again???

Looking at the relay logs, log 1 starts at the beginning of the large transaction with segment 0, ends at segment 63, and is 1082041979 bytes in size. Log 2, on the other hand, is only 360 bytes and appears to contain nothing. Then log 3 is 1082041979 bytes again and, as you might expect, starts over at the beginning of the large transaction with segment 0.

This pattern repeats over and over.

The binlog that this slave is reading from is 1455431416 bytes, and the transaction’s starting point is binlog position 173467992. So it’s not all one transaction, but most of it is.
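To put those numbers in perspective, a bit of arithmetic (using only the positions quoted above) gives an upper bound on how much of the binlog sits from the transaction start onward:

```python
# Figures quoted above.
binlog_size = 1455431416   # total size of the binlog being read
txn_start = 173467992      # binlog position where the big transaction begins

txn_span = binlog_size - txn_start      # bytes from txn start to binlog end
fraction = txn_span / binlog_size       # at most ~88% of the binlog
```

So roughly 1.28 GB of the 1.45 GB binlog lies after the transaction’s starting position; the big transaction is most, but not all, of that.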

I’ve looked at the binlog with mysqlbinlog, and there doesn’t appear to be anything unusual between segments 63 and 64. The binlog appears to finish the transaction, and the other slaves read and executed it without issue.

I’ve run RESET SLAVE, with no effect. There is max_binlog_size (set to 1 GB), but 1 GB is its maximum allowed value anyway, and a log ‘should’ automatically grow past that limit to fit whatever a single transaction needs.
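As a sanity check on that, comparing the 1 GiB cap against the relay log size quoted earlier does suggest the relay log grew past the limit to hold the transaction, consistent with the documented behavior that a single transaction is never split across logs:

```python
# Figures quoted earlier in the thread.
one_gib = 1024 ** 3          # max_binlog_size's 1 GiB cap
relay_log_1 = 1082041979     # size of relay log 1

overshoot = relay_log_1 - one_gib   # ~8.3 MB past the 1 GiB cap
```

So the relay log did exceed the configured limit, which is expected; the question is why rotation then restarts the transaction instead of continuing past segment 63.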

I’m stumped as to what is blocking the slave from reading past that one binlog position.

Any ideas?

I should have mentioned this is on Percona Server for MySQL 5.6.


‘SHOW SLAVE STATUS’ shows both IO and SQL threads running but no progression of Exec_Master_Log_Pos? I take it you have restarted MySQL after increasing the /tmp space?

Also, 5.6 is dead. You should upgrade to at least 5.7 or 8.0 to be current. It’s possible this issue was fixed in 5.7/8.0.


Yes, both the IO and SQL threads are running. Exec_Master_Log_Pos NEVER increases; it stays at 173467992. The one you didn’t ask about is Read_Master_Log_Pos: it increases until it reaches 1255509519, but it does NOT read past that point.
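For what it’s worth, plugging those positions into a quick back-of-the-envelope check shows the IO thread stops well short of the end of the binlog, and almost exactly one relay log’s worth past the stuck position:

```python
# Figures quoted in this thread.
binlog_size = 1455431416   # total binlog size
txn_start = 173467992      # Exec_Master_Log_Pos, where it is stuck
read_stop = 1255509519     # Read_Master_Log_Pos, where reading halts
relay_log_1 = 1082041979   # size of relay log 1

# The IO thread stops ~190 MiB before the end of the binlog.
shortfall = binlog_size - read_stop

# It reads almost exactly one relay log's worth past the stuck
# position (relay logs carry a few extra header events, so the
# relay log file is slightly larger than the bytes read).
read_span = read_stop - txn_start
```

So the loop point lines up with the relay log boundary, not with anything in the binlog content itself.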

I thought replication was stuck, until I started watching the engine status and relay logs when I realized it wasn’t stuck, it was endlessly looping.

As for 5.6, you’re preaching to the choir. I’ve been trying to convince the higher ups to move to 5.7 for 4 years now.
