Percona 5.5 Deadlocks and Crashes

I’m still digging in, but figured I would post what’s happening in case anyone knows of an issue we might be running into.

We’re running all-InnoDB on Percona 5.5.16 on CentOS 6.0. We’ve got 14 “shards” running this same Percona/CentOS combination on very large hardware.

We handle a lot of queries so this situation isn’t exactly common, but it’s happened twice in the last week on two different machines.

Here’s what it looks like:
  • Connections start piling up, primarily simple inserts, with State “Waiting for table level lock” against a particular table.
  • The longest running query against this table, which is not blocked, is an insert with Command “Query” and State “update”.
  • Killing that long running query, which seems to be blocking everything, does not actually kill it: the thread and query remain in “show processlist” but the Command changes to “killed”. I waited 15 minutes and it did not clear out.
  • The database eventually stops accepting connections.
  • Once in this state, we are not able to stop or kill the DB at all, and see “Can’t create thread to kill server” in the log. If I strace the process, it’s stuck on “futex(0x1046224, FUTEX_WAIT_PRIVATE, 99, NULL”.
The only way we’ve managed to resolve these is to “kill -9” mysql, restart it and let recovery happen (has happened cleanly both times thankfully), and then put it back under load.

Are there deadlock issues around inserts on 5.5? There is an auto increment column on the involved table. We are running the innodb buffer pool at 50% of system RAM so I don’t believe we are running out of memory.

What would cause the database to just go completely unresponsive like it has?

These are DBs we recently migrated from MySQL 5.0 on CentOS 5, and we never encountered this problem on that older config.
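For anyone trying to spot the same pattern, here is a minimal Python sketch that groups processlist rows by State and picks out the longest-running thread that is not itself waiting on a lock. The rows are made-up sample data shaped like SHOW PROCESSLIST output (the table name `t`, the thread ids, and the times are hypothetical), and `summarize` is just an illustrative helper, not something from Percona or MySQL.

```python
from collections import Counter

# Hypothetical rows shaped like SHOW PROCESSLIST output. The ids, times and
# the table name `t` are invented to mirror the symptoms described above.
SAMPLE_ROWS = [
    {"Id": 101, "Command": "Query", "Time": 930, "State": "update",
     "Info": "INSERT INTO t VALUES (1, 2, 3, 4)"},
    {"Id": 102, "Command": "Query", "Time": 925,
     "State": "Waiting for table level lock",
     "Info": "INSERT INTO t VALUES (5, 6, 7, 8)"},
    {"Id": 103, "Command": "Query", "Time": 910,
     "State": "Waiting for table level lock",
     "Info": "INSERT INTO t VALUES (9, 10, 11, 12)"},
    {"Id": 104, "Command": "Sleep", "Time": 2000, "State": "", "Info": None},
]


def summarize(rows):
    """Return (threads per State, longest-running thread not waiting on a lock)."""
    by_state = Counter((row["State"] or "") for row in rows)
    # Ignore idle connections and threads that are themselves blocked.
    running = [
        row for row in rows
        if row["Command"] != "Sleep"
        and not (row["State"] or "").startswith("Waiting for")
    ]
    suspect = max(running, key=lambda row: row["Time"], default=None)
    return by_state, suspect


if __name__ == "__main__":
    states, suspect = summarize(SAMPLE_ROWS)
    for state, count in states.most_common():
        print(f"{count:4d}  {state or '(none)'}")
    if suspect is not None:
        print("possible blocker:", suspect["Id"], suspect["Info"])
```

On a live server the rows would come from SHOW PROCESSLIST or information_schema.PROCESSLIST instead of the sample list. The "possible blocker" is only a heuristic: it is the oldest active thread, which is not proof that it holds the lock.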

How many rows does that insert query insert? Is your I/O system busy at those moments?

That’s the odd part that makes me think we’ve hit a bug. The insert that was jamming everything up was trivial - an insert of 1 row into a table that contains 4 int columns.

It hasn’t happened again since posting this, but we’ve only hit it twice in the last two weeks. If we can figure it out, I will update here.

If it happens to show up again you should try getting a pt-pmp dump of the process as well as the output of:

  • SHOW GLOBAL STATUS
  • SHOW GLOBAL VARIABLES
  • SHOW ENGINE INNODB STATUS
  • SHOW PROCESSLIST
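Since the server tends to become unreachable quickly, it may help to have the collection scripted ahead of time. Below is a rough Python sketch that captures the items listed above into one directory. It assumes the `mysql` client and `pt-pmp` are on the PATH, that credentials come from something like `~/.my.cnf`, and that you pass in the mysqld process id yourself; `build_commands` and `collect` are illustrative names, not part of any toolkit.

```python
import subprocess
from pathlib import Path

# Statements from the list above. \G gives vertical output in the mysql client.
STATEMENTS = {
    "global_status": "SHOW GLOBAL STATUS",
    "global_variables": "SHOW GLOBAL VARIABLES",
    "innodb_status": "SHOW ENGINE INNODB STATUS\\G",
    "processlist": "SHOW PROCESSLIST",
}


def build_commands(mysqld_pid):
    """Map an output name to the argv list that produces it."""
    # Assumption: login details are picked up from the client's option file.
    commands = {name: ["mysql", "-e", sql] for name, sql in STATEMENTS.items()}
    # pt-pmp attaches a debugger to the process, so it usually needs root.
    commands["pt_pmp"] = ["pt-pmp", "--pid", str(mysqld_pid)]
    return commands


def collect(out_dir, mysqld_pid, timeout=60):
    """Run each command and write its output (or the failure) to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, argv in build_commands(mysqld_pid).items():
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            text = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A hung server may refuse connections; record that instead of aborting.
            text = f"failed to run {argv[0]}: {exc}\n"
        (out / f"{name}.txt").write_text(text)
```

Usage would look like `collect("/tmp/mysql-hang", 1234)`, where the directory and pid are placeholders. If the database has already stopped accepting connections, the four SHOW statements will fail, but the pt-pmp stack dump may still be obtainable and is probably the most useful piece at that point.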

Hi, I ran into this kind of behaviour, and in my case I think it is the new metadata locking in MySQL 5.5. Can you post a processlist and a “SHOW ENGINE INNODB STATUS\G”?

You can also “CREATE TABLE innodb_lock_monitor(a int) ENGINE=INNODB;” and use innotop to figure out what’s happening.

http://www.xaprb.com/blog/2007/09/18/how-to-debug-innodb-lock-waits/

(In my case the suspect is the metadata lock, which is not yet properly instrumented and is poorly documented, but you might be luckier if you only use the InnoDB engine.)