Storm on Demand + InnoDB Issues

Hey all,

I have a pretty puzzling issue with InnoDB and LiquidWeb’s Storm on Demand service. Basically, the issue revolves around backups. They said they can’t do anything to change their backup system, so I thought maybe there were some MySQL settings that could help with the issue.

From what I can understand, Storm uses a Xen-based virtualization infrastructure. It does daily point-in-time snapshot backups. How exactly it does this, they won’t disclose. All I do know is that at exactly the time the backup starts, the MySQL database becomes completely unresponsive for around 2 minutes. Even simple selects go on hold. If I do “show processlist”, a ton of processes are in “statistics” or “sorting results” or the like. And these are all very simple, index-based queries. My entire dataset easily fits in the buffer pool as well.

So, here’s what I’m thinking is happening. The DB does around 10-15 updates, deletes or inserts per second. In order to take a point-in-time backup, the Xen system freezes the state of the hard disks for a brief period of time. Nothing can be written to disk. Since the database is unable to write anything to disk, it can’t commit its inserts, updates or deletes, and everything is placed on hold. BTW, I have these settings:

innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT

So, does this sound about right to you guys? If so, is there any way you know of to stop the database from becoming unresponsive?

Thanks!
Justin

P.S. I just changed innodb_flush_log_at_trx_commit to 0 as my data is not mission critical. We’ll see if that makes any difference.
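
For reference, the three values of innodb_flush_log_at_trx_commit differ in when the redo log is written to the log file and when that file is flushed to disk. The sketch below is in Python and purely illustrative: the `LogPolicy` type and the `touches_log_file_on_commit` helper are made-up names, not part of MySQL. It only encodes the documented behavior of each value. Whether a write to the log file actually blocks while the snapshot runs is the hypothesis in this post, not something the sketch demonstrates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogPolicy:
    """When InnoDB writes and flushes the redo log for one setting value."""
    write_at_commit: bool  # log buffer written to the log file on every commit
    flush_at_commit: bool  # log file flushed (fsync) to disk on every commit


# Documented behavior of innodb_flush_log_at_trx_commit.
# With 0 and 2, the work not done at commit happens about once per second.
POLICIES = {
    0: LogPolicy(write_at_commit=False, flush_at_commit=False),
    1: LogPolicy(write_at_commit=True, flush_at_commit=True),
    2: LogPolicy(write_at_commit=True, flush_at_commit=False),
}


def touches_log_file_on_commit(setting: int) -> bool:
    """Hypothetical helper: does the commit itself issue any log I/O?"""
    policy = POLICIES[setting]
    return policy.write_at_commit or policy.flush_at_commit


for value in sorted(POLICIES):
    print(value, touches_log_file_on_commit(value))
```

Under this simplified view, only the value 0 keeps log I/O out of the commit path entirely, at the cost of possibly losing about a second of transactions if mysqld crashes.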

Hey,

Just an FYI to all: setting innodb_flush_log_at_trx_commit to 0 seems to have solved the issue. If anyone wants to jump in as to why this is, I’d appreciate it. My thought is that with that value, if the disk is unresponsive, InnoDB can wait for a few minutes to write until it is responsive again. With =2, it can’t. It still has to do some writing.

Justin

Setting it to 0 just delays the flush operation. With this new setting, the number of dirty pages in memory increases. That may cause freezing at a later point, if those pages have not been flushed and memory is needed for new pages.
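
One way to watch for the dirty-page buildup mentioned in this reply is to compare two InnoDB status counters, `Innodb_buffer_pool_pages_dirty` and `Innodb_buffer_pool_pages_total`, before and after the backup window. The Python sketch below is only a rough illustration: the `dirty_page_pct` helper and the sample numbers are made up, and fetching the counters from the server (for example with SHOW GLOBAL STATUS) is left out.

```python
def dirty_page_pct(status: dict) -> float:
    """Percentage of buffer pool pages that are currently dirty."""
    # Status values usually arrive as strings, so convert them first.
    dirty = int(status["Innodb_buffer_pool_pages_dirty"])
    total = int(status["Innodb_buffer_pool_pages_total"])
    if total == 0:
        return 0.0
    return 100.0 * dirty / total


# Made-up sample values standing in for the output of
# SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%'
sample = {
    "Innodb_buffer_pool_pages_dirty": "1200",
    "Innodb_buffer_pool_pages_total": "8192",
}

print(f"{dirty_page_pct(sample):.1f}% of buffer pool pages are dirty")
```

If the percentage keeps climbing after the snapshot finishes instead of dropping back, that would point to the delayed-flush concern raised here.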