Until 20:16:02 CET slow-query log is almost empty. Then suddenly, query which normally taking up to 2 seconds taking 8 seconds. In next 9 minutes slow-query.log is filled with queries taking much much more they are normally do. Again same query which taking up to 2 seconds taking 20 seconds (!) at 20:16:22.
Time: 100129 20:22:12
Query_time: 205 Lock_time: 102
Query_time: 307 Lock_time: 199
Query_time: 181 Lock_time: 0
No always same time, so this problem random appearing so not for example some cron-job. Anyone have idea what to check, what this could be?
Ok today i gathered more diagnostics.
Web server restart were at:
which indicates time of problem. Then i looked at gathered diagnostics from this times.
On 11:09 (first restart) usage of /dev/sdb which got 3 most used mysql tables were 100% for some times. Then again on 12:49 and 13:36 on next two restarts same problem 100% usage whole HDD by 3 tables which normally takes up to 20%.
“how to fix it” is not a valid question until you find out what the problem is. Look again at vmstat and think more about the relationship between iowait in the CPUs and disk IO. The CPUs are not causing the disks to “block.” The disks aren’t blocking. Look – they are reading tons of data. The CPUs are blocked waiting for that to complete.
Something is asking the disks to read data. It could be mysqld or something else. Is there a backup running? Is there a long-running query? You need to write a script that gets ‘ps’ and SHOW PROCESSLIST to answer these questions.
We were forced to cut page viewing from for example 3000 pages to first 25 pages because of degradatin performance with every next page.
I will look into this sphinx pagination thing but sphinx got issues itself (returning false matches sometimes etc.) so im afraid it will not be 100% exact in pagination as it should be.
Another thing yesterday (07.02.2010) problem was minimalized maybe from not biggest traffic that day or changes that were made (packet size from 16 back to 1 and threads cache size from 190 to 150 with restarts of both mysqld’s).
Web server didn’t break whole day, and choke’s was for 95% and maximum 97% of disk capabilities.
17:59 - only for 4 seconds disk usage was 60%, 90%, 71%, 76%
20:26 - minimum 4 seconds 75%, 95%, 97%, 79% dont know more…
only 1 query from that times in slow query log )
17:59 similiar like query described above LIMIT 784, 16;
20:26 similiar like query described above LIMIT 784, 16;
Here we go our problem is probably disk seek times because of big LIMIT’s.