innobackupex locking all databases


A month ago, I have used innobackupex and this protocol
[URL=“How to setup a slave for replication in 6 simple steps with Percona XtraBackup”][/URL]
to set up two slaves.

Today I tried to reset one of the slaves from scratch using exactly the same protocol. The innobackupex program took considerable time (the program was printing something like “log position …”. Meanwile the server started to run out of available connections. I hade to kill innobackupex and stop/start mysql when there were 500 waiting connections in mysql.

I bellieve that innobackupex should lock/unlock the databases one by one, but the waiting connections were from many different databases. This is a production server with SSD disks and enough RAM.

mysql> show variables like “%version%”;
| Variable_name | Value |
| innodb_version | 5.5.37 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 5.5.37-0+wheezy1-log |
| version_comment | (Debian) |
| version_compile_machine | x86_64 |
| version_compile_os | debian-linux-gnu |

I am not sure if this was a locking problem. What I know for sure is:

  1. after starting innobackupex the number of connections in show_process_list increased from standard 10-20 to 512
  2. afreaching 500 connections, I killed innobackupex, but the number of connections did not decrease - still 512connections
  3. after several minutes I decided to stop/start the mysql server

this is the reason why I suppose that innodbbackup locked all databases. If not, why the number of connections stayed on 512 even after killing the program?

Any help greatly appreciated.

Innobackupex cannot lock/unlock databases one by one, because that would result in an inconsistent backup.

The progress messages about “log position” occur as xtrabackup is reading the InnoDB tablespaces, and meanwhile changes are going on, so it has to monitor how many new changes have been written to the redo log. The larger your database, the longer this takes. Xtrabackup shouldn’t be locking anything while this is going on, but naturally reading all your tablespaces causes some extra load to your disks, so it may cause other other database work to slow down.

Saving the output of the backup can also cause even more load, if you write it to the same disk device that you’re reading. The best practice is to write the backup to a different disk device, or else stream it over the network to another server.

Once it is done reading the InnoDB tablespaces, innobackupex also acquires a global read lock on all databases, while it reads non-InnoDB tables and notes the binary log position. Hopefully, you have minimal data in MyISAM and other storage engines, and this locked period is brief.

But if the locked period is too long, and your application traffic continues to make new connections to the server, they could be blocked from finishing their work, and so they will accumulate. Even after the backup is finished, it could take some time to work through the backlog of unfinished work. Several minutes seems unusually long, but it depends on the nature of the work they expect to do.

For example, suppose they each need to do 1.5 seconds worth of work, and they are all working on overlapping table rows. So they would queue up for serial access, and take at least 512 * 1.5 = 768 seconds, or 12 minutes 48 seconds. Meanwhile, yet more requests are coming in, adding to the contention and further delaying the time when the number of connections gets back to normal.

One alternative for initializing a new slave is to back up from one of the existing slaves, and use the --slave-info option so that the binary log coordinate to use for the new slave is the source slave’s master log position. See “Adding more slaves to the master” at [url]Percona XtraBackup

Bill Karwin

First of all, thank you for the detailed reply. The tables are almost completelly InnoDB. Therefore, if I understood your answer correctly, the cause should be general load, not locking the tables. Thus I can probably try “throttle” parameter to keep the load at reasonable level