Index usage with big IN()

Hi there,

I have some troubles understanding what happens with one of my queries.

Here is the table structures.
There is a tiny dataset within (less than 5000 entries in each).

CREATE TABLE tracker_artifact ( id int(11) NOT NULL auto_increment, tracker_id int(11) NOT NULL, last_changeset_id int(11) NOT NULL, submitted_by int(11) NOT NULL, submitted_on int(11) NOT NULL, use_artifact_permissions tinyint(1) NOT NULL default ‘0’, PRIMARY KEY (id), KEY idx_tracker_id (tracker_id), KEY idx_my (submitted_by,tracker_id,last_changeset_id), KEY idx_last_changeset_id (last_changeset_id,id)) ENGINE=MyISAM AUTO_INCREMENT=2842 DEFAULT CHARSET=utf8;CREATE TABLE tracker_changeset ( id int(11) NOT NULL auto_increment, artifact_id int(11) NOT NULL, submitted_by int(11) default NULL, submitted_on int(11) NOT NULL, email varchar(255) default NULL, PRIMARY KEY (id), KEY artifact_id (artifact_id,id)) ENGINE=MyISAM AUTO_INCREMENT=3342 DEFAULT CHARSET=utf8;

Then I have a query that looks like that:

SELECT c.idFROM tracker_changeset AS c INNER JOIN tracker_artifact AS a ON (a.id = c.artifact_id)WHERE a.id IN (928,933,934,…) AND c.id IN (1398,1432,…);

But the 2 ranges are quite big (~1900 values in each) and the query is taking a lot of time (8s) but I don’t undestand why.
Even explain is taking ages:

±—±------------±------±------±--------------------±------------±--------±-------------±-----±-------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------±------±--------------------±------------±--------±-------------±-----±-------------------------+| 1 | SIMPLE | a | range | PRIMARY | PRIMARY | 4 | NULL | 1909 | Using where; Using index | | 1 | SIMPLE | c | ref | PRIMARY,artifact_id | artifact_id | 4 | a.id | 1 | Using where; Using index | ±—±------------±------±------±--------------------±------------±--------±-------------±-----±-------------------------+2 rows in set (3.33 sec)

2 questions arise:
→ What I’m doing wrong there
→ What is the impact of huge IN() statements?

Hi,

Query plan looks perfect but it’s kinda estimation so we need to know what exactly the query is doing.
Please provide output of below commands and some more information like MySQL version, my.cnf parameters etc.

FLUSH STATUS;
SELECT c.id FROM tracker_changeset AS c…
SHOW STATUS;

Profiling the query can also be helpful.

SET PROFILING=1;
SELECT c.id FROM tracker_changeset AS c…
SHOW PROFILE FOR query 1;

Hi Niljoshi, thanks for your answer
here is the status (profiling not available on my old version of mysql)

mysql> SHOW STATUS like ‘handler_read%’;±----------------------±------+| Variable_name | Value |±----------------------±------+| Handler_read_first | 0 || Handler_read_key | 22435 || Handler_read_next | 14418 || Handler_read_prev | 0 || Handler_read_rnd | 0 || Handler_read_rnd_next | 0 |±----------------------±------+6 rows in set (0.00 sec)

BTW, in the meantime I ran an “optimize” on all tables involved and it basically solved the perf issue.

Thus, I had to modify one of the index on tracker_changeset table (artifact_idx) that where on (artifact_id, id), id being the PK.
Now the index is only on ‘artifact_id’ and the query is fast.

I’m a bit confused because:

  • I through it was a good practice to add the PK as second index member
  • If I convert my tables to InnoDB, the query is fast.