can someone EXPLAIN this :)

explain SELECT s.*, d.NAME as DOMAIN_NAME, d.ID as DOMAIN_ID, d.CANONICAL as CANONICAL, COUNT(d.CANONICAL) as SUMMA, l.TWO_LETTER_CODE as CODE, c.ATTEMPTED_AT, TIME_TO_SEC(TIMEDIFF(NOW(),c.ATTEMPTED_AT)) AS DIFFA FROM SEEDS s LEFT JOIN DOMAINS d ON s.domain_id = d.idLEFT JOIN DOMAINS c ON c.id = d.canonicalLEFT JOIN LANGUAGES l ON s.language_id = l.id WHERE s.STATUS = -6 AND s.ID NOT IN (SELECT p.SEED_ID FROM PAGES p) AND ( (c.ATTEMPTED_AT = 0) OR (TIME_TO_SEC(TIMEDIFF(NOW(),c.ATTEMPTED_AT))>5) ) GROUP BY d.CANONICAL ORDER BY c.ATTEMPTED_ATLIMIT 100;

It’s not very slow (about 6/7 seconds) but I would like it less than 1 second.

±-±-----------------±----±-------------±-----------------±---------±------±--------------------------±—±------------------------------+|id|select_type…|table|type…|possible_keys…|key…|key_len|ref…|rows|Extra…|±-±-----------------±----±-------------±-----------------±---------±------±--------------------------±—±------------------------------+|.1|PRIMARY…|.d…|index…|PRIMARY,CANONICAL.|CANONICAL.|.5…|NULL…|.22.|Using.temporary;.Using.filesort||.1|PRIMARY…|.c…|eq_ref…|PRIMARY…|PRIMARY…|.4…|social_search.d.CANONICAL…|…1.|Using.where…||.1|PRIMARY…|.s…|ref…|URL_IDX,DOMAIN_IDX|DOMAIN_IDX|.4…|social_search.d.ID…|418.|Using.where…||.1|PRIMARY…|.l…|eq_ref…|PRIMARY…|PRIMARY…|.4…|social_search.s.LANGUAGE_ID|…1.|…||.2|DEPENDENT.SUBQUERY|.p…|index_subquery|SEED_IDX…|SEED_IDX…|.4…|func…|…1.|Using.index…|±-±-----------------±----±-------------±-----------------±---------±------±--------------------------±—±------------------------------+

Thanks in advance!!

The last line of your EXPLAIN results, the DEPENDENT SUBQUERY, is most likely your problem. The NOT IN sub-query is referencing tables which are outside the sub-query, which causes the sub-query to be re-executed for every row examined by your main select. So while you think you’re executing one query, you’re actually running about a thousand. You can probably convert this to a left join of the form:

SELECT A.*
FROM A
LEFT JOIN B
ON A.foo = B.foo
WHERE
B.foo IS NULL

Which captures rows from A which don’t match B (basically, a NOT IN).

Hi there,

I removed the NOT IN … and it sped it up a small bit.

I am worried about the temporary tables (Using temp; using filesort) created because of the ORDER BY but I can’t seem them optimize them away.

It’d be good to speed this
(TIME_TO_SEC(TIMEDIFF(NOW(),c.ATTEMPTED_AT))>5)
up maybe?

I’m out of ideas…