High I/O utilization when running backups

Hi Experts,

I've been using XtraBackup in my environment for many years now (the backup runs on a slave). My data has grown to roughly 450 GB. When I first set up my backup script, I used the following steps:

1st step: innobackupex --slave-info --user=$USER --password=$PASS --defaults-file=/etc/my.cnf --use-memory=2G --parallel=3 /mnt/Volumn/backups/ > /tmp/backups.log 2>&1 (to the same slave machine)
2nd step: rsync the generated backup data from slave to a remote backup server

Nowadays this uses a lot of I/O, which causes high I/O wait, and replication lags considerably while the backup runs because the slave cannot get enough I/O capacity to keep up with the master.

Is there a better way to set up the backup procedure so that it uses less I/O?

Thanks

You could do a few things:

  1. Back up to a separate drive
  2. Change the time you perform backups so it happens at a time with a lighter workload
  3. Reduce your --parallel number
  4. Try using the --throttle option

Option #1 is your best bet all around, but likely will require the most work and may not be practical depending on your hardware setup. Option #2 might help a lot depending on your workload and when you are running the backup currently. Options #3 and #4 are easy to implement and will likely help, but the backups will take longer due to the throttling.
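
To make options #3 and #4 concrete, here is a rough sketch of a gentler variant of the original command. The values 1 and 100 are only illustrative starting points (assumptions, not recommendations), and the helper name build_backup_cmd is made up for this example. It only prints the command so you can review it before running anything. Check the option reference for how your version interprets --throttle.

```shell
#!/usr/bin/env bash
# Sketch: build an innobackupex command line with fewer copy threads and
# throttled I/O. The function name and default values are hypothetical.

build_backup_cmd() {
    local parallel="${1:-1}"     # fewer copy threads than the original 3
    local throttle="${2:-100}"   # cap on copy I/O rate; tune while watching iowait
    local target="${3:-/mnt/Volumn/backups/}"
    # --defaults-file is placed first, which is where newer builds expect it.
    # Credentials are left out here; supply them as in the original script.
    printf 'innobackupex --defaults-file=/etc/my.cnf --slave-info --parallel=%s --throttle=%s %s\n' \
        "$parallel" "$throttle" "$target"
}

# Print the command for review instead of executing it.
build_backup_cmd 1 100
```

Start with a conservative throttle, watch I/O wait and replication lag during a run, and raise the value until the backup window is acceptable.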

Reference: The innobackupex Option Reference

scott.nemes Thanks for the reply. What about using --stream=xbstream to a remote server? Would streaming the backup help reduce I/O usage? Also, how does locking work when streaming with --stream=xbstream to a remote server? Basically, I'm looking for minimal locking, like what we get when backing up to local disk.
Thanks again

As far as I am aware, streaming does not change any locking behavior. If you are using InnoDB tables, then your locking should be minimal to start with. If locking is an issue, then you should try backing up a slave DB server instead, where any locks that do occur should have less impact.

As for streaming using less I/O, I’ve never done any testing to see. Basically streaming just sends the backup output to STDOUT, and then you process it (compress it, SSH it somewhere else, etc). This is something you should be able to test yourself with your specific workload, just by running it both ways and monitoring the I/O.
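
For the comparison, a streamed run could look roughly like the sketch below. The remote account backup@backuphost, the directory /data/backups/ and the helper name stream_backup_cmd are placeholders I made up, so substitute your own. Streaming would be expected to avoid writing the backup copy to the slave's disk and reading it back for rsync, but the data files still have to be read, so measure rather than assume.

```shell
#!/usr/bin/env bash
# Sketch: stream the backup straight to a remote host instead of writing it
# locally and rsyncing afterwards. Host, directory and function name are
# hypothetical placeholders.

stream_backup_cmd() {
    local remote="${1:-backup@backuphost}"
    local remote_dir="${2:-/data/backups/}"
    # The trailing /tmp is the temporary directory used in stream mode.
    # Credentials are left out here; supply them as in the original script.
    printf 'innobackupex --defaults-file=/etc/my.cnf --slave-info --stream=xbstream /tmp | ssh %s "xbstream -x -C %s"\n' \
        "$remote" "$remote_dir"
}

# Print the pipeline for review instead of executing it.
stream_backup_cmd

# While each variant runs, sample extended device statistics every
# 5 seconds in a second terminal and compare %util and await:
#   iostat -dx 5
```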

scott.nemes The backup already runs on a slave. My worry is that the stream goes to a remote server and so depends on network speed: when FTWRL (FLUSH TABLES WITH READ LOCK) is issued while the .frm files are copied, a slow network would keep the lock held longer than it would be with a local disk. Please correct me if I'm wrong.

I doubt the length of time tables remain locked will be noticeably longer when streaming the backup vs saving it locally, but that is again something you’d have to test in your local environment to be sure.

Are you using your slave for something critical in nature where the replication lag created by the backup is an issue? One of the big benefits of having slaves is to perform maintenance like this, so having to worry about the replication lag kind of defeats the purpose of using a slave. If you are using the slave for something critical (besides the backup), then I would suggest setting up a second slave that you use only for backups, and then do not worry if it gets behind temporarily while the backup is running. =)
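
If you want numbers on how far the slave falls behind during a backup, a small helper like this can be run in a loop or from cron. It is only a sketch: the function name parse_lag is made up, and it assumes the usual \G-formatted output of SHOW SLAVE STATUS.

```shell
#!/usr/bin/env bash
# Sketch: extract Seconds_Behind_Master from SHOW SLAVE STATUS\G output.
# The function name is hypothetical.

parse_lag() {
    # Reads the \G-formatted status on stdin and prints the lag value
    # (a number of seconds, or NULL if replication is not running).
    awk -F': ' '/Seconds_Behind_Master/ {print $2}'
}

# Typical use on the backup slave (connection options omitted):
#   mysql -e 'SHOW SLAVE STATUS\G' | parse_lag
```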