We have a set of MySQL servers that host several client systems. These are fairly recent, large Dell boxes with 32 GB of RAM each and 4 to 6 physical cores each, doubled to 8 to 12 logical cores via Hyper-Threading.
We have around 20 boxes like this, each hosting the database instance for 60 - 100 client systems.
We recently increased the RAM in each box from 16 GB to the current 32 GB and, at the same time, moved from MySQL 5.0 to Percona 5.1. We did this to allow each box to host more database instances.
Generally, the move has been successful - we have not had any downtime or data corruption/loss issues surrounding the move.
However, since the change, we have seen a strange occurrence, seemingly random, across servers/client instances.
We use a home-brewed query monitor to watch the MySQL processlist and alert on problem queries. In this monitor, we have started seeing a thread in the “Killed” state, which we know we did not kill ourselves, nor did any of our systems kill it. There are very few cases in which our applications would actually kill a thread, and based on our diagnosis none of those cases is to blame here.
Furthermore, the Killed thread does not have an actual query associated with it. The only information that ‘mysqladmin -v processlist’ shows for these threads is a thread id, a user, an originating client host/port, the database instance, and the Killed state.
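To make the symptom concrete, here is a minimal Python sketch of how such threads can be picked out of the ‘mysqladmin -v processlist’ output. It is an illustration only, not our actual monitor: the function names and the sample rows are made up, and it assumes the usual ASCII table with Id, User, Host, db, Command, Time, State and Info columns. It looks for the word Killed in both the Command and State columns, since we have not pinned down which column carries it in every version.

```python
# Hypothetical sample of `mysqladmin -v processlist` output (fabricated rows).
SAMPLE = """\
+------+------+----------------+---------+---------+------+------------------------------+---------------------+
| Id   | User | Host           | db      | Command | Time | State                        | Info                |
+------+------+----------------+---------+---------+------+------------------------------+---------------------+
| 4311 | app  | 10.0.0.5:51234 | client1 | Query   | 1    | Waiting on query cache mutex | SELECT a | b FROM t |
| 4312 | app  | 10.0.0.6:40022 | client2 | Killed  | 2    |                              |                     |
+------+------+----------------+---------+---------+------+------------------------------+---------------------+
"""


def parse_processlist(text):
    """Turn the ASCII table into a list of dicts keyed by column name."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and blanks
        body = line[1:-1] if line.endswith("|") else line[1:]
        if header is None:
            header = [cell.strip() for cell in body.split("|")]
            continue
        # Limit the split so a "|" inside the query text stays in the last column.
        cells = [cell.strip() for cell in body.split("|", len(header) - 1)]
        if len(cells) != len(header):
            continue  # malformed row; ignore rather than guess
        rows.append(dict(zip(header, cells)))
    return rows


def killed_without_query(rows):
    """Return rows marked Killed (in Command or State) that have no query text."""
    hits = []
    for row in rows:
        status = {row.get("Command", "").lower(), row.get("State", "").lower()}
        if "killed" in status and not row.get("Info"):
            hits.append(row)
    return hits


if __name__ == "__main__":
    for row in killed_without_query(parse_processlist(SAMPLE)):
        print(row["Id"], row["User"], row["Host"], row["db"])
```

Running a check like this every second or so and logging the hits with a timestamp would show how often the Killed threads appear and which client hosts they come from.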
While this Killed thread is showing in the processlist, usually for up to 2 seconds at a time, it seems to slow the entire server down. Other queries stack up until the Killed thread goes away, at which point the stacked-up queries complete and things generally go back to normal.
This Killed thread shows up at least once per minute, so there is a definite slowdown in overall database throughput, and therefore in application throughput.
The only clue we have so far is that, usually, after the Killed thread goes away, a series of other threads shows the “Waiting on query cache mutex” state, a query cache related state that is new to us in the Percona 5.1 release.
As a test, we have tried disabling the query cache altogether with “SET GLOBAL query_cache_type=OFF”, but the random Killed threads still appear, and the server still backs up behind them. The only difference is that we no longer see threads in the “Waiting on query cache mutex” state.
Can anyone provide any hints as to what’s happening here and why?
Thanks