How to speed up a simple query?

sql_er · October 30, 2009, 2:15pm

Hello,

We have the following 2 tables (and indexes) in our database:

Table 1

CREATE TABLE linkentityarticle(id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, article_id INT UNSIGNED NOT NULL, entity_id INT UNSIGNED NOT NULL, score TINYINT UNSIGNED NOT NULL) Engine = InnoDB;

Indexes

entity_id
entity_id + article_id
article_id + entity_id
score

Total # rows in table: 10 million
Total space used by table: 1.5 GB

Table 2

CREATE TABLE entity(id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, node VARCHAR(255)) Engine = InnoDB;

Indexes

node

Total # rows in table: 600,000
Total space used by table: 88 MB

When I run the following query against these tables, it runs for over 30 seconds:

SELECT DISTINCT article_id

FROM linkentityarticle d
INNER JOIN
(
SELECT id FROM entity
WHERE node IN(‘sports’)

) AS T1
ON T1.id = d.entity_id
ORDER BY d.article_id DESC
LIMIT 50;

If I remove ORDER BY d.article_id DESC, it takes 0.3 seconds (i.e. 100 fold improvement).

Below is the EXPLAIN PLAN WITH ORDER BY

id: 1
select_type: PRIMARY
table:
type: system
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
Extra: Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: d
type: ref
possible_keys: linkentityarticle_entity_id,index_linkentityarticle_entity_i d_article_id
key: linkentityarticle_entity_id
key_len: 4
ref: const
rows: 6708
Extra: Using where
*************************** 3. row ***************************
id: 2
select_type: DERIVED
table: drift_entity
type: const
possible_keys: unique_entity_node
key: unique_entity_node
key_len: 257
ref:
rows: 1
Extra: Using index

Below is the EXPLAIN WITHOUT ORDER BY:

id: 1
select_type: PRIMARY
table:
type: system
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
Extra: Using temporary
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: d
type: ref
possible_keys: linkentityarticle_entity_id,index_linkentityarticle_entity_i d_article_id
key: linkentityarticle_entity_id
key_len: 4
ref: const
rows: 6708
Extra:
*************************** 3. row ***************************
id: 2
select_type: DERIVED
table: drift_entity
type: const
possible_keys: unique_entity_node
key: unique_entity_node
key_len: 257
ref:
rows: 1
Extra: Using index

We are using MySQL version 5.0.44 and our InnoDB Buffer Pool is set to 2GB.

Is there anything that can be done to speed up this query when ORDER BY is used (as we need to use it to get the latest data).

Thank you in advance!

vgatto · October 30, 2009, 6:48pm

If you omit the sort and limit, how many rows are in the result set? It should be quite a few if sorting takes that long. You should also be able to re-write that query to remove the inner select in favor of an inner join, though I’m not sure if that will help any, but you should get a different query plan out of EXPLAIN.

Before doing anything else, seriously consider switching that BIGINT to INT. It will shrink the size of the table and all the indexes considerably, and make everything you do with that table faster.

gmouse · October 31, 2009, 1:43pm

LIMIT makes your second query fast. Your first index is redundant since it is contained in your second index. I would change the lay-out of your first table to make use of a clustered index:

Table 1

CREATE TABLE linkentityarticle(
article_id INT UNSIGNED NOT NULL,
entity_id INT UNSIGNED NOT NULL,
score TINYINT UNSIGNED NOT NULL)
Engine = InnoDB;

Indexes

UNIQUE article_id+entity_id
entity_id + article_id
score

Table 2

CREATE TABLE entity(
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
node VARCHAR(255)
) Engine = InnoDB;

Indexes

node

SELECT DISTINCT article_id
FROM linkentityarticle d
WHERE id IN(SELECT id FROM entity e WHERE node IN(‘sports’) AND e.id = d.entity_id)
ORDER BY d.article_id DESC
LIMIT 50;

To check whether this helps, try this query first:

SELECT DISTINCT article_id
FROM linkentityarticle d FORCE INDEX(name of article_id + entity_id index)
WHERE id IN(SELECT id FROM entity e WHERE node IN(‘sports’) AND e.id = d.entity_id)
ORDER BY d.article_id DESC
LIMIT 50;

sql_er · November 4, 2009, 12:26pm

vgatto,

Ommitting LIMIT I get 94383 rows as a result
Rewriting the query to use the INNER JOIN I do get a slightly different explain plan, but the performance of the query did not improve
I don’t prefer to change BIGINT to INT, as we might run out of integer space. We had it as INT in another system, and we ran out of space in ~ 2 years, and had to change to BIGINT. So although we could keep it as INT for some time, I prefer to come up with some permanent solution to this query optimization (i.e. not something which would only last until we run out of INT space)

gmouse,

You are right. First index was redundant. I dropped it, but it did not help the speed of the query
I cannot change the layout of the table, as there are many stored procedures which use the id field we have in this table

It seems to me that filesort is slowing down the query. Is there anything that can be done to make this sorting faster?

Thanks a lot!

gmouse · November 4, 2009, 1:09pm

What about

SELECT DISTINCT article_id
FROM linkentityarticle d FORCE INDEX(name of article_id + entity_id index)
WHERE id IN(SELECT id FROM entity e WHERE node IN(‘sports’) AND e.id = d.entity_id)
ORDER BY d.article_id DESC
LIMIT 50;

+explain

sql_er · November 4, 2009, 1:24pm

gmouse,

I tried the suggestion, and performance got worse. It now ran for 1 min and 12 seconds. Rows field shows that it searched through the whole table now [Interestingly, it did not use filesort now]. Below is the explain:

*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: d
type: index
possible_keys: NULL
key: unique_linkentityarticle_article_id_entity_id
key_len: 8
ref: NULL
rows: 9913948
Extra: Using where; Using index
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: e
type: const
possible_keys: PRIMARY,unique_entity_node
key: unique_entity_node
key_len: 257
ref: const
rows: 1
Extra: Using index

gmouse · November 4, 2009, 2:24pm

I see. Actually it stops after 50 rows are found but appearantly this can take a very long time for seldomly used node names. The only way to improve running time is by having an index on (node,article_id), but then you must have a table that contains both these fields.

sql_er · November 4, 2009, 3:29pm

gmouse,

Thank you for the suggestions.

I am currently creating such a table - just to test things out - if results are good I’ll consider implementing it in production.

Topic		Replies	Views
ORDER BY ..LIMIT using indexes still slow Other MySQL® Questions	4	587	April 14, 2008
Optimizing simple order by query Other MySQL® Questions	1	337	December 25, 2009
MySQL: Avoid filesort when using ORDER BY LIMIT Other MySQL® Questions	3	4033	April 15, 2015
SELECT * FROM thistable ORDER BY date Other MySQL® Questions	3	449	September 5, 2007
How MySql optimises LIMIT 1 with ORDER BY by expression Other MySQL® Questions	2	599	September 3, 2007

How to speed up a simple query?

Table 1

Indexes

Table 2

Indexes

Table 1

Indexes

Table 2

Indexes

Related topics