Unusual long time spent on storing result in query cache

I have a quite complex query, using seven joins over three tables, that sometimes takes an unusual time to execute.

Normally the query finishes within seconds, but once in a while the queries pile up and the processlist starts to fill up with this specific query, causing them to take about 300-400 seconds per query. This is usually the moment I’m getting calls from our users and start killing away these queries.

Weird thing is that if I profile such a query at the moment they are hanging, all time is spent on storing the result in the query cache, e.g.:

±---------±----±-------------------------------±-------- —±---------±-----------±------------------±----------- ---------±-------------±--------------±--------------±-- ----------------±------------------±------------------±-- ----±----------------------±--------------±------------+
| QUERY_ID | SEQ | STATE | DURATION | CPU_USER | CPU_SYSTEM | CONTEXT_VOLUNTARY | CONTEXT_INVOLUNTARY | BLOCK_OPS_IN | BLOCK_OPS_OUT | MESSAGES_SENT | MESSAGES_RECEIVED | PAGE_FAULTS_MAJOR | PAGE_FAULTS_MINOR | SWAPS | SOURCE_FUNCTION | SOURCE_FILE | SOURCE_LINE |
±---------±----±-------------------------------±-------- —±---------±-----------±------------------±----------- ---------±-------------±--------------±--------------±-- ----------------±------------------±------------------±-- ----±----------------------±--------------±------------+
| 1 | 1 | starting | 0.000066 | 0.000000 | 0.000194 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NULL | NULL | NULL |
| 1 | 2 | checking query cache for query | 0.000249 | 0.000000 | 0.000748 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | send_result_to_client | sql_cache.cc | 1143 |
| 1 | 3 | Opening tables | 0.000097 | 0.000000 | 0.000349 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | open_tables | sql_base.cc | 2664 |
| 1 | 4 | System lock | 0.000024 | 0.000000 | 0.000060 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_lock_tables | lock.cc | 153 |
| 1 | 5 | Table lock | 0.000085 | 0.000000 | 0.000358 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_lock_tables | lock.cc | 163 |
| 1 | 6 | init | 0.000115 | 0.000000 | 0.000477 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_select | sql_select.cc | 2275 |
| 1 | 7 | optimizing | 0.000051 | 0.000000 | 0.000229 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_select.cc | 759 |
| 1 | 8 | statistics | 0.023570 | 0.000000 | 0.066565 | 1678 | 18 | 0 | 0 | 10 | 20 | 0 | 0 | 0 | optimize | sql_select.cc | 912 |
| 1 | 9 | preparing | 0.000072 | 0.000000 | 0.000190 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_select.cc | 922 |
| 1 | 10 | executing | 0.000031 | 0.000000 | 0.000108 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | exec | sql_select.cc | 1591 |
| 1 | 11 | Sorting result | 0.000021 | 0.000000 | 0.000047 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | exec | sql_select.cc | 1991 |
| 1 | 12 | Sending data | 0.013143 | 0.000000 | 0.041227 | 952 | 58 | 0 | 0 | 18 | 32 | 0 | 0 | 0 | exec | sql_select.cc | 2115 |
| 1 | 13 | storing result in query cache | 232.595520 | 0.000000 | 488.715455 | 18629139 | 361433 | 22 | 4067 | 40458 | 75736 | 0 | 9232 | 0 | query_cache_abort | sql_cache.cc | 691 |
| 1 | 14 | end | 0.000019 | 0.000000 | 0.000067 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_select | sql_select.cc | 2320 |
| 1 | 15 | end | 0.000007 | 0.000000 | 0.000028 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_select | sql_select.cc | 2322 |
| 1 | 16 | query end | 0.000008 | 0.000000 | 0.000031 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_execute_command | sql_parse.cc | 5329 |
| 1 | 17 | freeing items | 0.000041 | 0.000000 | 0.000165 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_parse | sql_parse.cc | 6340 |
| 1 | 18 | closing tables | 0.000010 | 0.000000 | 0.000041 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | dispatch_command | sql_parse.cc | 2306 |
| 1 | 19 | logging slow query | 0.000006 | 0.000000 | 0.000024 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | log_slow_statement | sql_parse.cc | 2364 |
| 1 | 20 | logging slow query | 0.000040 | 0.000000 | 0.000161 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | log_slow_statement | sql_parse.cc | 2374 |
| 1 | 21 | cleaning up | 0.000019 | 0.000000 | 0.000076 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | dispatch_command | sql_parse.cc | 2329 |
±---------±----±-------------------------------±-------- —±---------±-----------±------------------±----------- ---------±-------------±--------------±--------------±-- ----------------±------------------±------------------±-- ----±----------------------±--------------±------------+

The system is running FreeBSD 6.4, MySQL 5.0.77 and all tables are InnoDB. Cache parameters:

| query_cache_limit | 1048576 |
| query_cache_min_res_unit | 4096 |
| query_cache_size | 16777216 |
| query_cache_type | ON |
| query_cache_wlock_invalidate | OFF |

Has somebody seen this kind of behavior before? Please let me know if you need more information, parameters, schema’s, queries, etc.

And possibly of interest:

±------------------------±----------+
| Variable_name | Value |
±------------------------±----------+
| Qcache_free_blocks | 1921 |
| Qcache_free_memory | 5580504 |
| Qcache_hits | 848105931 |
| Qcache_inserts | 54312258 |
| Qcache_lowmem_prunes | 10197256 |
| Qcache_not_cached | 288569871 |
| Qcache_queries_in_cache | 4528 |
| Qcache_total_blocks | 11184 |
±------------------------±----------+

When the queries all pile up, what status are the other queries in?

How many CPUs do you have and about how many queries per second does your server typically run? 100, 1000, 10000?

The reason that I ask is that the query cache doesn’t scale on SMP hardware, period. It is becoming increasingly common for us to find cases where it is intermittently locking the whole server, and the solution is to turn it off.

If you can catch the server during one of the freezes, use poor man’s profiler http://poormansprofiler.org/ to stack-dump it and let’s see what it is doing. There are tools in http://code.google.com/p/aspersa/ (stalk and collect) that can help with this.

Here’s an update of my findings:

First of all, the 53 threads that are in a ‘read’-call are not active queries, but sleeping connections. If I simultaneously dump the stacktraces and the processlist on an idle system this becomes apparent:
#./poormansprofiler.sh && mysqladmin processlist
47 read,read,vio_read,select_max_min_finder_subselect::cmp_str, select_max_min_finder_subselect::cmp_str,handle_one_connecti on,pthread_mutexattr_init,??
(…)

±---------±-----±----------±-----------±--------±----- —±------±-----------------+
| Id | User | Host | db | Command | Time | State | Info |
±---------±-----±----------±-----------±--------±----- —±------±-----------------+
| 447 | cms | localhost | | Sleep | 3522200 | | |
| 448 | cms | localhost | | Sleep | 3522200 | | |
| 509 | cms | localhost | | Sleep | 0 | | |

| 15624091 | cms | localhost | | Query | 0 | | show processlist |
±---------±-----±----------±-----------±--------±----- —±------±-----------------+

I did capture io and vm activity, but don’t see anything unusual going on. This could also be because I’m not sure if my interpretation of vmstat is ok.

Then, the most interesting finding, is that it might nog be a query cache problem after all. If I profile the query under normal circumstances it takes most time in ‘storing result in query cache’:

| 1 | 1 | starting | 0.000049 | 0.000000 | 0.000044 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | |
| 1 | 2 | checking query cache for query | 0.009305 | 0.000000 | 0.000351 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | send_result_to_client | sql_cache.cc | 1143
| 1 | 3 | Opening tables | 0.000077 | 0.000000 | 0.000072 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | open_tables | sql_base.cc | 2664
| 1 | 4 | System lock | 0.000013 | 0.000000 | 0.000013 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_lock_tables | lock.cc | 153
| 1 | 5 | Table lock | 0.000101 | 0.000000 | 0.000100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_lock_tables | lock.cc | 163
| 1 | 6 | init | 0.000114 | 0.000000 | 0.000114 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_select | sql_select.cc | 2275
| 1 | 7 | optimizing | 0.000051 | 0.000000 | 0.000051 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_select.cc | 759
| 1 | 8 | statistics | 0.000258 | 0.000000 | 0.000257 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_select.cc | 912
| 1 | 9 | preparing | 0.000053 | 0.000000 | 0.000053 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_select.cc | 922
| 1 | 10 | executing | 0.000020 | 0.000000 | 0.000020 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | exec | sql_select.cc | 1591
| 1 | 11 | Sorting result | 0.000009 | 0.000000 | 0.000009 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | exec | sql_select.cc | 1991
| 1 | 12 | Sending data | 0.000059 | 0.000000 | 0.000059 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | exec | sql_select.cc | 2115
| 1 | 13 | storing result in query cache | 6.269024 | 5.506759 | 0.111151 | 5879 | 1298 | 1 | 0 | 2427 | 4856 | 0 | 4 | 0 | query_cache_abort | sql_cache.cc | 691
| 1 | 14 | end | 0.000020 | 0.000016 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_select | sql_select.cc | 2320
| 1 | 15 | end | 0.000009 | 0.000009 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_select | sql_select.cc | 2322
| 1 | 16 | query end | 0.000009 | 0.000009 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_execute_command | sql_parse.cc | 5329
| 1 | 17 | freeing items | 0.000040 | 0.000041 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_parse | sql_parse.cc | 6340
| 1 | 18 | closing tables | 0.000018 | 0.000017 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | dispatch_command | sql_parse.cc | 2306
| 1 | 19 | logging slow query | 0.000008 | 0.000007 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | log_slow_statement | sql_parse.cc | 2364
| 1 | 20 | logging slow query | 0.000057 | 0.000057 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | log_slow_statement | sql_parse.cc | 2374
| 1 | 21 | cleaning up | 0.000016 | 0.000016 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | dispatch_command | sql_parse.cc | 2329

But when bypassing the cache (using SQL_NO_CACHE) the same time is spent in ‘Sending data’:

| 1 | 1 | starting | 0.000029 | 0.000022 | 0.000003 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | |
| 1 | 2 | checking query cache for query | 0.000232 | 0.000208 | 0.000025 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | send_result_to_client | sql_cache.cc | 1143
| 1 | 3 | Opening tables | 0.000069 | 0.000061 | 0.000008 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | open_tables | sql_base.cc | 2664
| 1 | 4 | System lock | 0.000013 | 0.000011 | 0.000001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_lock_tables | lock.cc | 153
| 1 | 5 | Table lock | 0.000020 | 0.000018 | 0.000002 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_lock_tables | lock.cc | 163
| 1 | 6 | init | 0.000106 | 0.000094 | 0.000012 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_select | sql_select.cc | 2275
| 1 | 7 | optimizing | 0.000049 | 0.000044 | 0.000005 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_select.cc | 759
| 1 | 8 | statistics | 0.000245 | 0.000218 | 0.000027 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_select.cc | 912
| 1 | 9 | preparing | 0.000052 | 0.000046 | 0.000006 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_select.cc | 922
| 1 | 10 | executing | 0.000021 | 0.000018 | 0.000002 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | exec | sql_select.cc | 1591
| 1 | 11 | Sorting result | 0.000013 | 0.000012 | 0.000001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | exec | sql_select.cc | 1991
| 1 | 12 | Sending data | 5.248548 | 5.575107 | 0.105959 | 6745 | 987 | 2 | 0 | 1960 | 3920 | 0 | 8 | 0 | exec | sql_select.cc | 2115
| 1 | 13 | end | 0.000018 | 0.000015 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_select | sql_select.cc | 2320
| 1 | 14 | end | 0.000010 | 0.000009 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_select | sql_select.cc | 2322
| 1 | 15 | query end | 0.000009 | 0.000009 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_execute_command | sql_parse.cc | 5329
| 1 | 16 | freeing items | 0.000040 | 0.000039 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_parse | sql_parse.cc | 6340
| 1 | 17 | closing tables | 0.000015 | 0.000014 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | dispatch_command | sql_parse.cc | 2306
| 1 | 18 | logging slow query | 0.000007 | 0.000008 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | log_slow_statement | sql_parse.cc | 2364
| 1 | 19 | logging slow query | 0.000138 | 0.000138 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | log_slow_statement | sql_parse.cc | 2374
| 1 | 20 | cleaning up | 0.000016 | 0.000015 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | dispatch_command | sql_parse.cc | 2329

I’m pretty convinced that the problem situation, where the query suddenly takes more than 200 secs, could also occur in the SQL_NO_CACHE query, and the time would be accounted to ‘Sending data’.

So, here’s my guess on the series of events:
thd_proc_info(thd, “Sending data”) in exec() in sql_select.cc
-> …
-> optionally thd_proc_info(thd, “storing result in query cache”) in query_cache_abort() in sql_cache.cc
-> …
-> something that sometimes takes forever
-> …
-> thd_proc_info(thd, “end”) in mysql_select() in sql_select.cc

Why query_cache_abort()? I don’t know.

Maybe under certain loads the optimizer tries to use an execution plan that takes more than 200 secs? I’ll keep investigating…

This,

47 read,read,vio_read,select_max_min_finder_subselect::cmp_str, select_max_min_finder_subselect::cmp_str,handle_one_connecti on,pthread_mutexattr_init,??

Is definitely not a sleeping connection ) There is a race condition between the stacktraces and the SHOW PROCESSLIST.

I would use oprofile at this point to find out where the time is being consumed.