hi all,
i noticed some strange behaviour in the use of mysql. currently using 5.1.37-enterprise-gpl-advanced.
It started with the following query:
select d.Date, sum(r.FctA) a, sum(r.FctB) b
from rpo_fct_x x
join rpo_dim_date d on x.SID_DateId = d.SID_DateId
where d.Date > adddate(sysdate(), interval -2 day)
and d.Date < sysdate()
group by d.Date
explain:
±—±------------±------±-------±---------------------- --±--------±--------±--------------------------------±-- -------±--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
±—±------------±------±-------±---------------------- --±--------±--------±--------------------------------±-- -------±--------------------------------+
| 1 | SIMPLE | r | ALL | date_country,SID_DateId | NULL | NULL | NULL | 13860580 | Using temporary; Using filesort |
| 1 | SIMPLE | d | eq_ref | PRIMARY | PRIMARY | 4 | dwh_rep_olap.r.SID_DateId | 1 | Using where |
±—±------------±------±-------±---------------------- --±--------±--------±--------------------------------±-- -------±--------------------------------+
2 rows in set (0.00 sec)
mysql> show keys from rpo_fct_registrations;
±----------------------±-----------±--------------------- --------------------------±-------------±----------------- -------±----------±------------±---------±-------±----- ±-----------±--------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
±------±-----------±------------------------------------- ----------±-------------±------------------------±------- —±------------±---------±-------±-----±-----------±- -------+
| r | 1 | date_country | 1 | SID_DateId | A | 2047 | NULL | NULL | | BTREE | |
| r | 1 | date_country | 2 | SID_CountryId | A | 28876 | NULL | NULL | | BTREE | |
| r | 1 | SID_DateId | 1 | SID_DateId | A | 2047 | NULL | NULL | | BTREE | |
±----------------------±-----------±--------------------- --------------------------±-------------±----------------- -------±----------±------------±---------±-------±----- ±-----------±--------+
7 rows in set (0.00 sec)
I tried then to play around with the various parts of the query:
mysql> select count() from r where SID_DateId in (select SID_DateId from d where Date>adddate(sysdate() , interval -2 day) and Date < sysdate());
±---------+
| count() |
±---------+
| 13858 |
±---------+
1 row in set (1 min 24.29 sec)
far too long for this query…
±—±-------------------±---------±----------------±— -----------±-----------±--------±-----±---------±------ -------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
±—±-------------------±---------±----------------±— -----------±-----------±--------±-----±---------±------ -------------------+
| 1 | PRIMARY | r | index | NULL | SID_DateId | 4 | NULL | 13860580 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | d | unique_subquery | PRIMARY | PRIMARY | 4 | func | 1 | Using where |
±—±-------------------±---------±----------------±— -----------±-----------±--------±-----±---------±------ -------------------+
then i put the two values into the in clause which would be the result of the in:
mysql> select count() from r where SID_DateId in (2091,2090);
±---------+
| count() |
±---------+
| 13858 |
±---------+
1 row in set (0.00 sec)
±—±------------±-------±------±---------------------- --±-------------±--------±-----±------±---------------- ---------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
±—±------------±-------±------±---------------------- --±-------------±--------±-----±------±---------------- ---------+
| 1 | SIMPLE | r | range | date_country,SID_DateId | date_country | 4 | NULL | 24306 | Using where; Using index |
±—±------------±-------±------±---------------------- --±-------------±--------±-----±------±---------------- ---------+
Why does the subquery takes that much time against the two values in the in clause?
i’ve rewritten then the query to use a derived table:
select count(*) from r
join (select SID_DateId from d where Date>adddate(sysdate(), interval -2 day ) and Date < sysdate()) d on d.SID_DateId = r.SID_DateId
±—±------------±-------------±-----±----------------- -------±-----------±--------±-------------±------±----- -------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
±—±------------±-------------±-----±----------------- -------±-----------±--------±-------------±------±----- -------+
| 1 | PRIMARY | | ALL | NULL | NULL | NULL | NULL | 2 | |
| 1 | PRIMARY | r | ref | date_country,SID_DateId | SID_DateId | 4 | d.SID_DateId | 6768 | Using index |
| 2 | DERIVED | d | ALL | NULL | NULL | NULL | NULL | 13150 | Using where |
±—±------------±-------------±-----±----------------- -------±-----------±--------±-------------±------±----- -------+
±---------+
| count(*) |
±---------+
| 13858 |
±---------+
1 row in set (0.04 sec)
Again an acceptable performance.
Does anyone has a tip what went wrong on the initial query and how to speed it up? Problem is I can’t work with much query changes, as the queries are build automatically by an reporting tool where the influence possibilities on the queries are very less… we’re using mysql in a data warehouse environement. Storage engine is MyISAM
Thanks
Torsten