I am working on a tough mySQL problem, and need some help optimizing an ORDER BY query. Here is a table description:
CREATE TABLE urls
(
urls_id
bigint(32) NOT NULL auto_increment,
urls_url
varchar(96) collate utf8_unicode_ci NOT NULL default ‘’,
urls_last_updated
datetime NOT NULL default ‘0000-00-00 00:00:00’,
urls_added
datetime NOT NULL default ‘0000-00-00 00:00:00’,
urls_active
tinyint(1) NOT NULL default ‘0’,
urls_type
enum(‘MAIN’,‘CATEGORY’,‘LISTING’) collate utf8_unicode_ci NOT NULL default ‘LISTING’,
urls_name
varchar(255) collate utf8_unicode_ci NOT NULL default ‘’,
urls_update_count
int(16) NOT NULL default ‘0’,
urls_hold_until
datetime NOT NULL default ‘0000-00-00 00:00:00’,
urls_last_discovered
datetime NOT NULL default ‘0000-00-00 00:00:00’,
urls_dropdown_name
varchar(64) collate utf8_unicode_ci NOT NULL default ‘’,
urls_flag
enum(‘NONE’,‘EXPIRED’,‘FLAGGED’,‘REMOVED’) collate utf8_unicode_ci NOT NULL default ‘NONE’,
PRIMARY KEY (urls_id
),
UNIQUE KEY urls_unique
(urls_url
),
KEY get_urls
(urls_active
,urls_hold_until
,urls_flag
,urls_type
),
KEY last_updated
(urls_last_updated
),
KEY get_urls2
(urls_active
,urls_hold_until
,urls_flag
,urls_type
,ur ls_last_updated
)
) ENGINE=InnoDB AUTO_INCREMENT=7621125 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
select count(*) from urls;
±---------+
| count(*) |
±---------+
| 7620415 |
±---------+
1 row in set (28.24 sec)
The indexes are experimental to do the query I am trying to optimize below, and are not permanent to the schema.
here is the query I am trying to optimize:
SELECT urls_id, urls_url, urls_type, urls_last_updated FROM urls WHERE urls_active = ‘1’ and urls_flag = ‘NONE’ and urls_hold_until < now() and urls_type = ‘CATEGORY’ ORDER BY urls_last_updated LIMIT 100;
Explain on this query:
±—±------------±------±-----±-------------------±— ------±--------±------±-------±------------------------- —+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
±—±------------±------±-----±-------------------±— ------±--------±------±-------±------------------------- —+
| 1 | SIMPLE | urls | ref | get_urls,get_urls2 | get_urls | 1 | const | 418179 | Using where; Using filesort |
±—±------------±------±-----±-------------------±— ------±--------±------±-------±------------------------- —+
Filesort is a big problem, so if I remove ORDER BY the “Using filesort” goes away. Here is the explain on this query (dropping the ORDER BY)
SELECT urls_id, urls_url, urls_type, urls_last_updated FROM urls WHERE urls_active = ‘1’ and urls_flag = ‘NONE’ and urls_hold_until < now() and urls_type = ‘CATEGORY’ LIMIT 100;
±—±------------±------±------±--------------±------- —±--------±-----±--------±------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
±—±------------±------±------±--------------±------- —±--------±-----±--------±------------+
| 1 | SIMPLE | urls | range | get_urls2 | get_urls2 | 9 | NULL | 3815726 | Using where |
±—±------------±------±------±--------------±------- —±--------±-----±--------±------------+
1 row in set (0.00 sec)
The first query with the ORDER BY takes 1 minute 38 seconds over 7 million rows. The second query takes 0.01 seconds (with query cache turned off)
How do I get this query optimized, understanding that I MUST have the order by in there, or the equivalent.
Building another table, running periodic summary queries into another table are not acceptable solutions, unfortunately (