Seemingly Random Slowdowns in CPU-bound Workload

Server specs:

24 cores (E5670)
128GB mem
10 x X25E RAID 10 for data
2 x X25E RAID 1 for OS/logs

According to our monitoring, when things slow down, they slow down across the board: all queries take longer. The slowdowns come in clusters, as you can see here: 9.3-new-relic

Edit: I should add that disk array utilization and await remain constant during the slowdowns, as does the buffer pool hit rate.

Also, we’re running MySQL version: “Server version: 5.0.91-50-log Percona SQL Server, Revision 73 (GPL)”

My mind immediately jumped to mutex contention, but I don’t see much in the SEMAPHORES section of SHOW INNODB STATUS. Example output here:
[gist 176319fda167e411f258 on GitHub]

I’m not graphing OS waits yet, but I’m working on getting that into Ganglia now. Any other thoughts on where to look?
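For anyone wanting to graph OS waits, here is a minimal sketch of pulling the counters out of the SEMAPHORES section and turning two samples into a rate. It assumes the 5.0-style line layout shown in SAMPLE_STATUS (the numbers there are made up), and the function names parse_os_waits and per_second are hypothetical helpers, not part of any tool. Feeding the result to Ganglia is left out.

```python
import re

# Text in the style of the SEMAPHORES section of SHOW INNODB STATUS.
# The layout is an assumption based on 5.0-era output; values are invented.
SAMPLE_STATUS = """\
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1200, signal count 1100
Mutex spin waits 0, rounds 5000, OS waits 300
RW-shared spins 400, OS waits 150; RW-excl spins 90, OS waits 45
"""

MUTEX_RE = re.compile(r"Mutex spin waits \d+, rounds \d+, OS waits (\d+)")
RW_RE = re.compile(
    r"RW-shared spins \d+, OS waits (\d+); RW-excl spins \d+, OS waits (\d+)"
)


def parse_os_waits(status_text):
    """Return the cumulative OS-wait counters found in the status text."""
    counters = {}
    mutex = MUTEX_RE.search(status_text)
    if mutex:
        counters["mutex_os_waits"] = int(mutex.group(1))
    rw = RW_RE.search(status_text)
    if rw:
        counters["rw_shared_os_waits"] = int(rw.group(1))
        counters["rw_excl_os_waits"] = int(rw.group(2))
    return counters


def per_second(prev, curr, interval_seconds):
    """Turn two cumulative samples into per-second rates."""
    return {
        name: (curr[name] - prev[name]) / interval_seconds
        for name in curr
        if name in prev
    }
```

The counters are cumulative since server start, so the rate between two samples is what is worth graphing next to the slowdown windows.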

I don’t think you have enough information to diagnose the problem. I’d use pt-stalk to gather more information when it happens.
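To illustrate the idea, pt-stalk watches a status variable and starts collecting once it stays above a threshold for several consecutive checks (its --threshold and --cycles options). The sketch below is only a model of that trigger rule, not pt-stalk's own code; should_trigger and the sample values are hypothetical.

```python
def should_trigger(samples, threshold, cycles):
    """Return True once the most recent `cycles` samples all exceed `threshold`.

    `samples` is a list of observed values (for example Threads_running),
    oldest first. A single spike does not fire; the condition has to hold
    for `cycles` consecutive observations.
    """
    if cycles <= 0 or len(samples) < cycles:
        return False
    return all(value > threshold for value in samples[-cycles:])


# Invented readings: one early spike, then a sustained run above 25.
readings = [3, 30, 4, 31, 40, 45, 50, 52]
fired = should_trigger(readings, threshold=25, cycles=5)
```

Requiring several consecutive cycles keeps a brief spike from triggering a collection, which matters when the slowdowns come in groups.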

I used pmp to get the following when it happens: [URL][/URL]

Note that this is after I disabled the query cache, since the first few profiles showed some contention on that.

Is it normal to see this much waiting in open_and_lock_tables? I’m not very familiar with the MySQL codebase.
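For context on how to read that kind of profile: pmp takes a stack trace of every thread and counts how many threads share the same stack, so a function that appears in many threads is where they are piling up. The following is a rough sketch of that aggregation step only, assuming gdb-style "thread apply all bt" text as input. SAMPLE_BACKTRACE is invented for illustration (it is not the output from this server), and aggregate_stacks is a hypothetical name.

```python
import re
from collections import Counter

# Invented gdb-style backtrace text; addresses and frames are illustrative.
SAMPLE_BACKTRACE = """\
Thread 3 (Thread 0x41 (LWP 103)):
#0  0x00000001 in pthread_cond_wait () from /lib64/libpthread.so.0
#1  0x00000002 in open_and_lock_tables ()
Thread 2 (Thread 0x42 (LWP 102)):
#0  0x00000001 in pthread_cond_wait () from /lib64/libpthread.so.0
#1  0x00000002 in open_and_lock_tables ()
Thread 1 (Thread 0x43 (LWP 101)):
#0  0x00000003 in select () from /lib64/libc.so.6
"""

# Matches a frame line and captures the function name, with or without
# a leading "0x... in " address.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\S+)")


def aggregate_stacks(gdb_output):
    """Collapse each thread's frames into one string and count duplicates."""
    stacks = Counter()
    current = []
    for line in gdb_output.splitlines():
        if line.startswith("Thread "):
            # A new thread header closes the previous thread's stack.
            if current:
                stacks[",".join(current)] += 1
            current = []
            continue
        frame = FRAME_RE.match(line)
        if frame:
            current.append(frame.group(1))
    if current:
        stacks[",".join(current)] += 1
    # Most common stacks first, like sorting by thread count.
    return stacks.most_common()


summary = aggregate_stacks(SAMPLE_BACKTRACE)
```

In this made-up sample, two of the three threads share a stack ending in open_and_lock_tables, which is the pattern the question is asking about.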