Replication broken on all slaves after version update

We have a basic master/slave setup: one master and three slaves. Replication had been working flawlessly for months.
They were all on version 5.6.14-56-log Percona Server (GPL).

We added a new slave on a VM a few days ago. The server was built from scratch and the latest version of MySQL was installed, 5.6.21-70.0-log Percona Server (GPL).
To set up the new slave I used xtrabackup taken from an existing slave. Before doing that I updated the two existing slaves to 5.6.21-70.0-log Percona Server (GPL), so all the slaves were on the
same version while the master stayed on 5.6.14. We didn't want to run any updates on the live production server.
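
For what it's worth, the clone procedure was the standard xtrabackup one; roughly the steps below, where the host names, paths, and credentials are placeholders rather than our real values.

# On the donor slave: take and prepare a consistent backup
innobackupex --user=backup --password=<pwd> --slave-info /backups/
innobackupex --apply-log /backups/<timestamped-dir>/

# Ship the prepared backup to the new slave's datadir and start mysqld
rsync -av /backups/<timestamped-dir>/ newslave:/var/lib/mysql/
ssh newslave 'chown -R mysql:mysql /var/lib/mysql && service mysql start'

# On the new slave: point it at the master using the coordinates recorded
# in xtrabackup_slave_info, then start replication
mysql -e "CHANGE MASTER TO MASTER_HOST='master', MASTER_USER='repl', MASTER_PASSWORD='<pwd>',
          MASTER_LOG_FILE='<file from xtrabackup_slave_info>', MASTER_LOG_POS=<pos>;"
mysql -e "START SLAVE;"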

All servers replicated fine for three days until today, when all three slaves suddenly stopped with no errors. Both the IO and SQL threads died, but there was no replication error. Issuing START SLAVE did nothing and did not generate a Last_Error. The master shows no slave threads, there are no errors in its log, and it's running fine.
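
This is roughly how I confirmed the state on the slaves and the master (nothing special, just for completeness; the error log path is from our install):

# On each slave: both threads report "No" and Last_Errno/Last_Error stay empty
mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_(IO|SQL)_Running|Last_Err'

# Restart attempt, then check the error log for anything new
mysql -e "START SLAVE;"
tail -n 100 /var/log/mysqld.log

# On the master: no binlog dump threads connected for the slaves
mysql -e "SHOW PROCESSLIST;" | grep -i 'binlog dump'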

Looking through mysqld.log, all the slaves logged this when START SLAVE was issued:

2014-11-14 20:50:28 15361 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000244' at position 775127000, relay log '/var/lib/mysql/relay-bin.000007' position: 775127163
04:50:29 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at [url]System Dashboard - Percona JIRA[/url]

Now I don't know what's wrong. Signal 11 (a segmentation fault) is pretty ambiguous to search for.

Did replication finally hit a snag over the difference in versions? Is there a problem with the master?
Is it safe to just run an update on the master and try to start the slaves again? Does replication need to be reset on all the servers?

Thanks for any help.

I can confirm I am seeing this too on version 5.6.21-70.0-log. I moved data to a new server via innobackupex (from a server running 5.6.21-69.0-log). The mysqld instance starts and runs fine until I issue START SLAVE, at which point I immediately get 'MySQL server has gone away' in the client session. If I tail -f the error log in another window, I can see that mysqld crashes with signal 11 (I have also seen signal 6) and restarts right away without problems. Could really do with a solution!
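
In case it helps anyone reproduce it, this is the two-window check I use (the error log path will depend on your config):

# Window 1: watch the error log; the signal 11 (or 6) crash and the
# automatic restart show up here
tail -f /var/log/mysqld.log

# Window 2: trigger it; the client session drops as soon as the slave threads start
mysql -e "START SLAVE;"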

Interesting, as I get that as well. After issuing START SLAVE, running SHOW SLAVE STATUS causes the client to disconnect, attempt to reconnect, and then show the status.

I downgraded the slaves version by version until replication started again. The first version that worked was 5.6.19-67.0-log. After issuing START SLAVE the slave started correctly and began catching up. One has fully caught up; the other two are still catching up. When they are all caught up I'll run pt-table-checksum to make sure they are in sync, but it looks good so far. It certainly looks like some bug between versions.
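
For the consistency check, the plan is something along the lines below, run from the master; the checksum user, database list, and host name are placeholders for whatever you use.

# Checksum the databases on the master; the checksum queries replicate to the
# slaves so per-table differences can be detected
pt-table-checksum --replicate=percona.checksums --databases=mydb \
  h=master.example.com,u=checksum,p=<pwd>

# Report only the chunks where a slave diverged from the master
pt-table-checksum --replicate=percona.checksums --replicate-check-only \
  h=master.example.com,u=checksum,p=<pwd>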

Hi,

I think you might be hitting these bugs. Can you provide the full error log so that we can at least compare the stack traces?
[url]https://bugs.launchpad.net/percona-server/+bug/1384583[/url]
[url]https://bugs.launchpad.net/percona-server/+bug/1384568[/url]

Or you can update those bugs directly with your details so the devs know that more people are affected.
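
If it helps, something like this should pull the crash report out of the log so you can paste it into the bug (adjust the path to wherever your error log lives):

# Print the stack trace section that follows the "got signal 11" line
grep -A 60 "got signal 11" /var/log/mysqld.log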