Slave Replication breaks


We have migrated our application recently to Percona Server(Percona-Server-server-56-5.6.27-rel75.0.el7.x86_64) on Centos OS 7 (CentOS Linux release 7.1.1503 (Core)) from mysql.

The parameters are tuned as per the guidelines provided by Percona team during their audit activity to our prod environment.

Our application was working very well for a more than a month, until now, where in we are facing replication issues on slave.

The architecture is of single master and 3 slaves(all identical to master) and with a fourth additional slave(this slave is running on Ubuntu OS with ubuntu percona binaries and default my.cnf params)

Replication on all 3 slaves fails with “Last_SQL_Error: Error ‘Duplicate entry ‘*********’ for key ‘PRIMARY’’ on query.” error.

Interestingly, replication does not break on the fourth server. The error is noticed randomly for insert/update query and not to a particular one.

When we try to restore from a complete dump of master, causes the same issue.

we tried the following steps:

  1. Stop the master for any writes
  2. Created a full db dump using mysqldump command
  3. Restore on slave
  4. set the POS and bin log position.
  5. start slave.

it is observed that the entry is not duplicate in the master binlogs. We tried with a skip error option wherein the record value actually differs from the master giving stale data for reads.

Also, we host these server on AWS with IOPS enabled.

Below are some mysql variables would like to bring to the notice:
innodb_flush_log_at_trx_commit = 2
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_doublewrite = OFF

We have a similar setup for our stage environment with a difference of no IOPS for it. Tried replicating the issue with multiple load tests/updates without any luck.

Any pointers in this regard will be appreciated.