Query takes 14 seconds, filesort seems to be the problem

Hi,

I’ve a really simple database, with only one table, which contains 2.24 million rows of image information, and who took the picture etc.

keyData:
±---------±-----------------±-----±----±--------±------+| Field | Type | Null | Key | Default | Extra |±---------±-----------------±-----±----±--------±------+| picID | int(10) unsigned | NO | PRI | | || keywords | varchar(4300) | YES | MUL | NULL | || phCode | varchar(100) | YES | MUL | NULL | || phName | varchar(100) | YES | MUL | NULL | || trCode | varchar(1000) | YES | MUL | NULL | |±---------±-----------------±-----±----±--------±------+

The indexes are setup as follows; with a btree index on the primary key (picID) and fulltext indexes on all the others.

±--------±-----------±---------±-------------±------------±----------±------------±---------±-------±-----±-----------±--------+| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |±--------±-----------±---------±-------------±------------±----------±------------±---------±-------±-----±-----------±--------+| keyData | 0 | PRIMARY | 1 | picID | A | 2246297 | NULL | NULL | | BTREE | || keyData | 1 | kWord | 1 | keywords | NULL | 1 | NULL | NULL | YES | FULLTEXT | || keyData | 1 | pCode | 1 | phCode | NULL | 1 | NULL | NULL | YES | FULLTEXT | || keyData | 1 | pName | 1 | phName | NULL | 1 | NULL | NULL | YES | FULLTEXT | || keyData | 1 | tCode | 1 | trCode | NULL | 1 | NULL | NULL | YES | FULLTEXT | |±--------±-----------±---------±-------------±------------±----------±------------±---------±-------±-----±-----------±--------+

Now the select I’m doing is as follows:

select picID from keyData where match(keywords) against(‘+new +york’ in boolean mode) order by picID desc limit 14;

Now this can take anything from 6 seconds to 17, depending on whether it’s cached etc.

I’ve tried increasing the myisam_sort_buffer_size and the key cache and query cache, before realising the bottle neck appears to be the sorting (yeah; slow on the up-take)

Now an explain of this query offers the following

±—±------------±--------±---------±--------------±------±--------±-----±-----±----------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±--------±---------±--------------±------±--------±-----±-----±----------------------------+| 1 | SIMPLE | keyData | fulltext | kWord | kWord | 0 | | 1 | Using where; Using filesort |±—±------------±--------±---------±--------------±------±--------±-----±-----±----------------------------+

Having read what the filesort algorithm actually does, I’m not surprised it’s slow.

Is there a way of forcing this to not use this approach? I’ve done some searching on this site of similar problems, and it seems combining the primary key with another key would help, but I don’t have another numeric key to combine with.

Any help would be fantastic, thanks in advance.

How many rows it returns ?

How many rows match to each of the keywords you’re using ?

In this case as you’re using order by LIMIT will not help as all result set will need to be generated from full text search index which is rather slow on bigger data volumes.

I’ve give Sphinx a try it usually works 10-100 times faster for queries like this (because it internally optimizes sorting and limit among other things)

[B]Peter wrote on Tue, 06 March 2007 14:45[/B]
How many rows it returns ?

How many rows match to each of the keywords you’re using ?

In this case as you’re using order by LIMIT will not help as all result set will need to be generated from full text search index which is rather slow on bigger data volumes.

I’ve give Sphinx a try it usually works 10-100 times faster for queries like this (because it internally optimizes sorting and limit among other things)

It returns 274396 rows.

If I understand your question correctly, 331678 matches ‘new’ and 278870 matches ‘york’.

What is this Sphinx you speak of? Does it work with MySQL?

Thanks for your help and speedy response.