I’m still digging in, but figured I would post what’s happening in case anyone knows of an issue we might be running up against.
We’re running all-InnoDB on Percona 5.5.16 on CentOS 6.0. We’ve got 14 “shards” running this same Percona/CentOS combination on very large hardware.
We handle a lot of queries so this situation isn’t exactly common, but it’s happened twice in the last week on two different machines.
Here’s what it looks like:
[LIST]
[*] Connections start piling up, primarily simple inserts, with State “Waiting for table level lock” against a particular table.
[*] The longest-running query against this table, which is not blocked, is an insert with Command “Query” and State “update”.
[*] Killing that long-running query, which seems to be blocking everything, does not actually end it: the thread and query remain in “show processlist”, but the Command changes to “killed”. I waited 15 minutes and it did not clear out.
[*] The database eventually stops accepting connections.
[*] Once in this state, we are not able to stop or kill the DB at all, and we see “Can’t create thread to kill server” in the log. If I strace the process, it’s stuck on “futex(0x1046224, FUTEX_WAIT_PRIVATE, 99, NULL”.
[/LIST]
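For anyone trying to spot the same pattern, here is a minimal sketch of how it could be picked out of processlist output. It assumes the rows have already been fetched as dicts keyed by the usual “show full processlist” columns (Id, Command, Time, State, Info). The function name, the sample rows, and the table name `t` are hypothetical and only there for illustration.

```python
from collections import Counter

WAITING_STATE = "Waiting for table level lock"


def summarize_processlist(rows):
    """Count threads per State and pick the longest-running thread that is
    executing (or has been killed) but is not itself waiting on the lock."""
    state_counts = Counter(row["State"] or "" for row in rows)
    candidates = [
        row
        for row in rows
        if row["Command"] in ("Query", "Killed") and row["State"] != WAITING_STATE
    ]
    # The candidate with the largest Time is the most likely blocker.
    blocker = max(candidates, key=lambda row: row["Time"], default=None)
    return state_counts, blocker


# Hypothetical rows shaped like the symptoms described above.
sample_rows = [
    {"Id": 101, "Command": "Query", "Time": 900, "State": "update", "Info": "INSERT INTO t ..."},
    {"Id": 102, "Command": "Query", "Time": 850, "State": WAITING_STATE, "Info": "INSERT INTO t ..."},
    {"Id": 103, "Command": "Query", "Time": 840, "State": WAITING_STATE, "Info": "INSERT INTO t ..."},
    {"Id": 104, "Command": "Sleep", "Time": 5, "State": "", "Info": None},
]

state_counts, blocker = summarize_processlist(sample_rows)
print("waiting threads:", state_counts[WAITING_STATE])
print("suspected blocker id:", blocker["Id"] if blocker else None)
```

This only summarizes what the processlist already shows; it does not explain why the blocking insert is stuck.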
The only way we’ve managed to resolve these is to “kill -9” mysql, restart it and let recovery happen (has happened cleanly both times thankfully), and then put it back under load.
Are there deadlock issues around inserts on 5.5? There is an auto-increment column on the involved table. We are running the InnoDB buffer pool at 50% of system RAM, so I don’t believe we are running out of memory.
What would cause the database to just go completely unresponsive like it has?
These are DBs we recently migrated from MySQL 5.0 on CentOS 5, and we never encountered this problem on that older config.