bad optimization or bad structure?


I’m working a database that will contain a bunch of releases (movie-titles if you will) and each release (movie) will get a few tags depending on what type of movie it is, eg. english, pal, ntsc etc…

When i list these it takes about 2s (100 rows) and as EXPLAIN has explained, it uses filesort and im guessing thats why its slow.

I’ve tried to use indexes as far as i know but i haven’t really understood how to use them just yet i guess.

Anyway, here is the structure:

CREATE TABLE releases ( r_id int(11) NOT NULL auto_increment, rel_name text NOT NULL, rel_title text NOT NULL, rel_date int(11) NOT NULL default ‘0’, add_date int(11) NOT NULL default ‘0’, imdb text NOT NULL, group_id int(11) NOT NULL default ‘0’, section_id int(11) NOT NULL default ‘0’, add_nickhost text NOT NULL, PRIMARY KEY (r_id), KEY section_id (section_id), KEY group_id (group_id), KEY rel_date (rel_date)) ENGINE=MyISAM DEFAULT CHARSET=latin1;CREATE TABLE sections ( s_id tinyint(4) NOT NULL auto_increment, section text NOT NULL, PRIMARY KEY (s_id), KEY section (section(8))) ENGINE=MyISAM DEFAULT CHARSET=latin1;CREATE TABLE sections ( s_id tinyint(4) NOT NULL auto_increment, section text NOT NULL, PRIMARY KEY (s_id), KEY section (section(8))) ENGINE=MyISAM DEFAULT CHARSET=latin1;CREATE TABLE groups ( g_id int(11) NOT NULL auto_increment, group_name text NOT NULL, PRIMARY KEY (g_id)) ENGINE=MyISAM DEFAULT CHARSET=latin1;

releases has about 80000 rows at this point and tags about 10000.

query i use is:

SELECT r_id, rel_date, rel_name, rel_title, group_name , imdb, section, GROUP_CONCAT( DISTINCT tag SEPARATOR ’ ’ ) as tags FROM tags INNER JOIN releases ON releases.r_id = tags.rel_id INNER JOIN groups ON releases.group_id = groups.g_id INNER JOIN sections ON releases.section_id = sections.s_id WHERE section = ‘dvdr’ GROUP BY rel_id ORDER BY rel_date DESC LIMIT 0,100

I’m pretty much stuck here and i hope you can at least point me in the right direction.


Best Regards, Patrik.

You have not posted EXPLAIN for the query so it is hard to tell exactly.

I think it is analyzes much more than 100 rows, looking at amount of joins you have and 100 rows may apply just to one of the tables.

This query is complex and I would not do such queries in interractive application.

oh, sorry. didn’t think of EXPLAIN.

here it is:

±—±------------±---------±-------±----------------------------±--------±--------±----------------------------±-------±--------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±---------±-------±----------------------------±--------±--------±----------------------------±-------±--------------------------------+| 1 | SIMPLE | tags | ALL | NULL | NULL | NULL | NULL | 105208 | Using temporary; Using filesort || 1 | SIMPLE | releases | eq_ref | PRIMARY,section_id,group_id | PRIMARY | 4 | dupenew.tags.rel_id | 1 | || 1 | SIMPLE | groups | eq_ref | PRIMARY | PRIMARY | 4 | dupenew.releases.group_id | 1 | || 1 | SIMPLE | sections | eq_ref | PRIMARY,section | PRIMARY | 1 | dupenew.releases.section_id | 1 | Using where |±—±------------±---------±-------±----------------------------±--------±--------±----------------------------±-------±--------------------------------+

As you said it doesn’t just check 100 rows.

I don’t know what you mean with an interactive application but its for a web page, but still I’d like it to be fast and i need all the info this query returns.

Is it possible to change the structure of the database to be able to fetch it faster? but still use relations.

Best Regards, Patrik

I’d suggest you to ether

  • pre-generate results (ie have cron job which updates data each 5 minutes)

  • think about data structure which can be updated in real time but provide this info quickly.

I guess I will overlook the structure then, cause to pre-generate is not an option. since its also possible to search and exclude some types of rows (thats what ‘tags’ are for), i don’t want this to be much slower then the “default view”.

thanks for looking over this for me.

Best Regards, Patrik

Well if you can’t pre-generate you will need to build it so you can retrieve quickly the info you need building dynamic summary tables.

It can be often done with tags too.

If it needs to be fully dynamic you can try denormalizing it and then making parallel on number of CPUs