Count .... Group By... any other way?

Hello all!

I am glad i found this site and i have to say it is awesome! Helped me in a lot of ways using mysql.

Here is my problem. I have two tables in which i store and index keywords for search engine purpose.

Tables are:

  • keyword2 (230.000 rows)

CREATE TABLE keyword2 ( id int(10) unsigned NOT NULL auto_increment, keyword varchar(50) NOT NULL default ‘’, PRIMARY KEY (id), UNIQUE KEY keyword (keyword)) ENGINE=MyISAM;

  • keyword2_index (almost 5.000.000 rows)

CREATE TABLE keyword2_index ( item_id int(10) unsigned NOT NULL default ‘0’, keyword_id int(10) unsigned NOT NULL default ‘0’, cat enum(‘a’,‘s’,‘l’) NOT NULL default ‘a’, type enum(‘a’,‘t’) NOT NULL default ‘a’, PRIMARY KEY (item_id,keyword_id,cat), KEY keyword_id (keyword_id)) ENGINE=MyISAM;

in table keyword2 are unque keywords for songs, albums, lyrics
in table keyword2_index are keyword id, song/lyric/album id, category and type of keyword (belonging to artist or title).

now to the query…
for instance i want to search for lyric with following keywords:
Shakira feat. Wyclef Jean Hips Don’t Lie

I strip string with php so i get clean keywords:
shakira feat wyclef jean hips don t lie

query:

SELECT keyword2_index.item_id AS id, COUNT(keyword2_index.item_id) AS numFROM keyword2_index,keyword2WHERE keyword2.id=keyword2_index.keyword_id AND keyword2_index.cat=‘l’ AND ( keyword2.keyword=‘shakira’ OR keyword2.keyword=‘feat’ OR keyword2.keyword=‘wyclef’ OR keyword2.keyword=‘jean’ OR keyword2.keyword=‘hips’ OR keyword2.keyword=‘don’ OR keyword2.keyword=‘t’ OR keyword2.keyword=‘lie’)GROUP BY keyword2_index.item_id ORDER BY num DESC;

this takes up to 50 seconds to execute. If i do it without COUNT … GROUP BY it takes less then a second.
I need the results to be grouped like that so i get the relevance of the search… from items with most matching keywords to items with less keywords (usualy i don’t display items with only one matching keyword)

EXPLAIN query…

id select_type table type possible_keys key key_len ref rows Extra1 SIMPLE keyword2 range PRIMARY,keyword keyword 150 NULL 8 Using where; Using temporary; Using filesort1 SIMPLE keyword2_index ref keyword_id keyword_id 4 keyword2.id 21 Using where

i am using server with 1.5 Ghz and 512 RAM with Mysql 4.1.14

so… is there any other solution for this?

I hope you can help me.
Thank you in advance!

It would be helpful if you could provide both EXPLAIN statements and queries without COUNT and with count.

I expect in your case there is large amount of matches (so it is not really 8x21 rows you could see in the stats) so temporary table is needed etc.

I also should ask you why are you trying to implement search engine in SQL ? It is going to be slow as you will have to traverse very many rows for common keywords.

[B]Quote:[/B]
I also should ask you why are you trying to implement search engine in SQL ? It is going to be slow as you will have to traverse very many rows for common keywords.
What should i use then?

Here is EXPLAIN with COUNT … GROUP BY

id select_type table type possible_keys key key_len ref rows Extra1 SIMPLE keyword2 range PRIMARY,keyword keyword 150 NULL 8 Using where; Using temporary; Using filesort1 SIMPLE keyword2_index ref keyword_id keyword_id 4 keyword2.id 21 Using where

Here is EXPLAIN without COUNT … GROUP BY

id select_type table type possible_keys key key_len ref rows Extra1 SIMPLE keywordd range PRIMARY,keyword keyword 150 NULL 8 Using where1 SIMPLE keyword2_index ref keyword_id keyword_id 4 keyword2.id 21 Using where

Thank you!

Matej

And if you do repeated runs for these queries they take 50 and 1 sec appropriately ?

repeated is important as first run could have been uncached and so very different.

You can use ether MySQL Full Text search (which is also slow but should work for your sizes), Sphinx search (http://www.sphinxsearch.com), lucene or other solutions.

yes repeated run is faster but still slow … 1-10sec … i think it depends on number of results.
Problem is also that table will grow… i expect it will grow up to 10 mil by the end of the year.

How would Full Text version look like then… table desgin and query?

Matej

If you’re using full text search you will simply create FULLTEXT index on columns you want to search.

Check this page for details:

[URL]http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html[/URL]

Thank you for all your help!

I have treid Mysql FULL TEXT index and i have to say i was impressed by speed! Queries that took 50 sec before now took only max 3 sec.
But then i installed Sphinx and i think i will stick to it.

It feels so great to make query load from almost a minute down to less then a second… and it’s ordered by relevance too )

Matej

Good to hear you got your problem solved

On medium data sizes both sphinx and MySQL FT Search work well, sphinx however scales much better