innodb_autoinc_lock_mode with SBR and multi-row inserts

I’ve been trying to hunt down this issue for a few days and believe that I have found an issue. Here is my setup for the particular cluster I have been having issues with:

4 Total Servers

  • 2 Masters in Active/Passive M-M
  • 2 slaves (each off of 1 master)
  • All running Percona MySQL 5.1.54
  • Statement Based Replication
  • All writes go to single master - replicated to passive master and primary slave

Documentation says that when using SBR and you know the number of rows to be inserted and those are all the same type, innodb_autoinc_lock_mode 0 OR 1 should be safe. I have been using mode 1 (default) and have seen my slaves get out of sync rapidly (when I define a few skip-slaves) or die on duplicate primary key. I verified that the only user connecting was the mysql system user (replication thread) and found that the PKs were different on the master and the slave.

The query that is causing the issue is a multi row insert, where no PKs are specified, so I would assume it is not multi-type (this could be the problem)?

Here is some sample data (with column names changed):
Master Data

pk, fk_1, fk_2, fk_3, att_1, att_2, fk_4, att_3, att_4
885357437, 105516509, 186757, 230399771, ‘0’, ‘new’, 0, 2011-02-21 11:57:15, 1298311035
885357431, 2453251, 186757, 230399771, ‘1’, ‘sent’, 0, 2011-02-21 11:57:15, 1298311035

Master Binlog Events:

‘mysql-bin.000047’, 214491072, ‘Query’, 5001, 214491155, ‘BEGIN’
‘mysql-bin.000047’, 214491155, ‘Intvar’, 5001, 214491183, ‘INSERT_ID=885357431’
‘mysql-bin.000047’, 214491183, ‘Query’, 5001, 214491470, ‘use my_db; INSERT INTO table_1 (fk_1, fk_2, fk_3, att_1, att_2, att_4) VALUES (‘2453251’, 186757, 230399771, 1, ‘sent’, 1298311035),(‘105516509’, 186757, 230399771, 0, ‘new’, 1298311035)’
‘mysql-bin.000047’, 214491470, ‘Xid’, 5001, 214491497, ‘COMMIT /* xid=12719492 */’

‘mysql-bin.000047’, 214494305, ‘Query’, 5001, 214494388, ‘BEGIN’
‘mysql-bin.000047’, 214494388, ‘Intvar’, 5001, 214494416, ‘INSERT_ID=885357433’
‘mysql-bin.000047’, 214494416, ‘Query’, 5001, 214494704, ‘use my_db; INSERT INTO table_1 (fk_1, fk_2, fk_3, att_1, att_2, att_4) VALUES (‘105620951’, 186757, 230399773, 1, ‘sent’, 1298311040),(‘38049876’, 186757, 230399773, 0, ‘new’, 1298311040)’
‘mysql-bin.000047’, 214494704, ‘Xid’, 5001, 214494731, ‘COMMIT /* xid=12719926 */’

Slave Data:

885357433, 105516509, 186757, 230399771, ‘0’, ‘new’, 0, 2011-02-21 11:57:15, 1298311035
885357431, 2453251, 186757, 230399771, ‘1’, ‘sent’, 0, 2011-02-21 11:57:15, 1298311035

As you can see, the slave got the event with 2 rows and did the inserts with 2, sequential IDs (as expected). However, the master did not and then later tried to do another insert with the already used PK and killed replication.

After changing the mode from 1 to 0 (traditional using table-level AUTO-INC locks), this issue has vanished. I’m wondering if this is a Percona/MySQL/XtraDB bug or if my inserts are technically considered multi-type (in which case documentation is correct in saying it is not safe for mode 1).

Any thoughts or comments would be appreciated!

Thanks

Hi,

Do you have some repeatable test case which can be tried easily ?

We have not changed replication code which should cause such bug to appear in Percona Server but everything is possible when it comes to software engineering. I suggest you to get the test case and see if this is Percona Server issue if it is we’ll fix it

Peter,

Thanks for the quick reply. Currently, I don’t have any particular SQL that can replicate the exact issue. The problem generally appears with multiple threads writing to the same table at the same time. In our production environment, the table in question contains roughly 100 million rows and we see an average of 10-15 writes per second with spikes approaching 1000 per second.

I’ll try to put together a little script/app that can simulate multiple threads writing to the same table structure at random, but overlapping times with both values for that option and post it here.

Yup. I know such race condition bugs can be pain to catch. This is especially why creating a test case is important as if you can’t easily repeat the problem
so it is hard for developers.