24 cores (E5670)
10 x X25E RAID 10 for data
2 x X25E RAID 1 for OS/logs
According to our monitoring, it seems like when things slow down, they slow down across the board. All queries seem to take longer. The slowdowns seem to come in groups as you can see here: https://skitch.com/jamesgolick/8djnw/fetlife-rails3-rubby-1.9.3-new-relic
Edit: I should add that disk array utilization and await remain constant during the slowdowns, as does the buffer pool hit rate.
Also, we’re running MySQL version: “Server version: 5.0.91-50-log Percona SQL Server, Revision 73 (GPL)”
My mind immediately jumped to this being a mutex contention issue, but I don’t see much in the SEMAPHORES section of SHOW INNODB STATUS. Example InnoDB status here:
I’m not graphing OS waits yet, but I’m working on getting that into Ganglia now. Any other thoughts on where to look?
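For anyone wanting to graph the same thing, here is a rough sketch of pulling the OS wait counters out of the SEMAPHORES section. It assumes the line format printed by 5.0-era InnoDB ("Mutex spin waits ..., rounds ..., OS waits ..." and "RW-shared spins ...; RW-excl spins ..."). The function names and the sample text are made up for illustration, and the counters are cumulative since server start, so what you would actually graph is the difference between two samples.

```python
import re

# Counter lines as printed in the SEMAPHORES section by 5.0-era InnoDB.
# Assumption: newer versions word these lines differently and would need other patterns.
MUTEX_RE = re.compile(r"Mutex spin waits (\d+), rounds (\d+), OS waits (\d+)")
RW_RE = re.compile(
    r"RW-shared spins (\d+), OS waits (\d+); RW-excl spins (\d+), OS waits (\d+)"
)


def parse_os_waits(status_text):
    """Return the cumulative semaphore counters found in the status text."""
    counters = {}
    m = MUTEX_RE.search(status_text)
    if m:
        counters["mutex_spin_waits"] = int(m.group(1))
        counters["mutex_spin_rounds"] = int(m.group(2))
        counters["mutex_os_waits"] = int(m.group(3))
    m = RW_RE.search(status_text)
    if m:
        counters["rw_shared_spins"] = int(m.group(1))
        counters["rw_shared_os_waits"] = int(m.group(2))
        counters["rw_excl_spins"] = int(m.group(3))
        counters["rw_excl_os_waits"] = int(m.group(4))
    return counters


def counter_deltas(previous, current):
    """Difference between two samples; this is the value worth graphing."""
    return {k: current[k] - previous[k] for k in current if k in previous}


# Illustrative text only, not output from the server described above.
SAMPLE = """
----------
SEMAPHORES
----------
Mutex spin waits 0, rounds 1200, OS waits 40
RW-shared spins 90, OS waits 30; RW-excl spins 15, OS waits 5
"""

if __name__ == "__main__":
    print(parse_os_waits(SAMPLE))
```

Each delta could then be handed to whatever feeds Ganglia on a fixed interval. A sudden jump in the OS wait deltas that lines up with the slow periods would point toward contention, while flat deltas would suggest looking elsewhere.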