Joining multiple tables turns an indexed query into a temp/filesort

– This first explain without the two feed tables work fine:EXPLAINSELECT raw_score.match_id AS id , raw_score.score AS raw_score FROM match_scores AS raw_score JOIN matches ON raw_score.match_id = matches.id/* LEFT OUTER JOIN feeds_a_source AS source_feed ON matches.source_item = source_feed.item_id LEFT OUTER JOIN feeds_a_target AS target_feed ON matches.target_item = target_feed.item_id*/ WHERE raw_score.scoring_section = ‘raw’ AND raw_score.run_id = 5 GROUP BY raw_score.match_id , raw_score.run_id , raw_score.score±—±------------±----------±-------±-----------------------------------------------------------------------------------------------±-------------------------------------------±--------±--------------------------------------±-----±-------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±----------±-------±-----------------------------------------------------------------------------------------------±-------------------------------------------±--------±--------------------------------------±-----±-------------------------+| 1 | SIMPLE | raw_score | ref | index_runid_scoring_section_match_id_score,index_match_scores_on_match_id_and_run_id_and_score | index_runid_scoring_section_match_id_score | 56 | const,const | 1776 | Using where; Using index || 1 | SIMPLE | matches | eq_ref | PRIMARY | PRIMARY | 4 | text_matching_tool.raw_score.match_id | 1 | Using index |±—±------------±----------±-------±-----------------------------------------------------------------------------------------------±-------------------------------------------±--------±--------------------------------------±-----±-------------------------±- But this query with the feed tables joined in causes it to do a temp/filesort on matches instead. What can I do to avoid this?EXPLAINSELECT raw_score.match_id AS id , raw_score.score AS raw_score FROM match_scores AS raw_score JOIN matches ON raw_score.match_id = matches.id LEFT OUTER JOIN feeds_a_source AS source_feed ON matches.source_item = source_feed.item_id LEFT OUTER JOIN feeds_a_target AS target_feed ON matches.target_item = target_feed.item_id WHERE raw_score.scoring_section = ‘raw’ AND raw_score.run_id = 5 GROUP BY raw_score.match_id , raw_score.run_id , raw_score.score±—±------------±------------±-----±-----------------------------------------------------------------------------------------------±----------------------------------------------------±--------±---------------------------------------±-----±--------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------------±-----±-----------------------------------------------------------------------------------------------±----------------------------------------------------±--------±---------------------------------------±-----±--------------------------------+| 1 | SIMPLE | matches | ALL | PRIMARY | NULL | NULL | NULL | 1012 | Using temporary; Using filesort || 1 | SIMPLE | raw_score | ref | index_runid_scoring_section_match_id_score,index_match_scores_on_match_id_and_run_id_and_score | index_match_scores_on_match_id_and_run_id_and_score | 8 | text_matching_tool.matches.id,const | 1 | Using where || 1 | SIMPLE | source_feed | ref | index_on_id | index_on_id | 13 | text_matching_tool.matches.source_item | 1 | Using index || 1 | SIMPLE | target_feed | ref | index_on_id | index_on_id | 13 | text_matching_tool.matches.target_item | 1 | Using index |±—±------------±------------±-----±-----------------------------------------------------------------------------------------------±----------------------------------------------------±--------±---------------------------------------±-----±--------------------------------+

So I found this morning that by moving the WHERE clause constraints on raw_scores into the matches table join, I could eliminate temp/filesort for this set of joins.

Unfortunately adding the next join just brings me right back to where I was.

Here is the full query with pieces commented out:

EXPLAINSELECT raw_score.match_id AS id , raw_score.score AS raw_score-- , weighted_score.score AS weighted_score/* , title_score.score AS title_score , director_score.score AS director_score , director_norm_score.score AS director_norm_score , year_score.score AS year_score*/ FROM match_scores AS raw_score JOIN matches ON raw_score.match_id = matches.id AND raw_score.run_id = 5 AND raw_score.scoring_section = ‘raw’ LEFT OUTER JOIN feeds_disney AS source_feed ON matches.source_item = source_feed.item_id LEFT OUTER JOIN feeds_amg AS target_feed ON matches.target_item = target_feed.v_id/* LEFT OUTER JOIN match_scores AS weighted_score ON matches.id = weighted_score.match_id AND weighted_score.scoring_section = ‘weighted’ AND weighted_score.run_id = raw_score.run_id LEFT OUTER JOIN match_scores AS title_score ON raw_score.match_id = title_score.match_id AND title_score.scoring_section = ‘title’ AND title_score.run_id = raw_score.run_id LEFT OUTER JOIN match_scores AS director_score ON raw_score.match_id = director_score.match_id AND director_score.scoring_section = ‘director’ AND director_score.run_id = raw_score.run_id LEFT OUTER JOIN match_scores AS director_norm_score ON raw_score.match_id = director_norm_score.match_id AND director_norm_score.scoring_section = ‘director_norm’ AND director_norm_score.run_id = raw_score.run_id LEFT OUTER JOIN match_scores AS year_score ON raw_score.match_id = year_score.match_id AND year_score.scoring_section = ‘year’ AND year_score.run_id = raw_score.run_id*/ GROUP BY raw_score.match_id, raw_score.run_id, raw_score.score-- , weighted_score.score/* , title_score.score , director_score.score , director_norm_score.score , year_score.score*/

And the explain:

±—±------------±------------±-------±--------------------------------------------------------------------------------------------------------------------------------±-------------------------------------------±--------±---------------------------------------±-----±------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------------±-------±--------------------------------------------------------------------------------------------------------------------------------±-------------------------------------------±--------±---------------------------------------±-----±------------+| 1 | SIMPLE | raw_score | ref | index_runid_scoring_section_match_id_score,index_match_scores_on_match_id_and_run_id_and_score,index_scoring_section_and_run_id | index_runid_scoring_section_match_id_score | 56 | const,const | 550 | Using where || 1 | SIMPLE | matches | eq_ref | PRIMARY | PRIMARY | 4 | text_matching_tool.raw_score.match_id | 1 | || 1 | SIMPLE | source_feed | ref | index_on_id | index_on_id | 13 | text_matching_tool.matches.source_item | 1 | Using index || 1 | SIMPLE | target_feed | ref | index_on_id | index_on_id | 13 | text_matching_tool.matches.target_item | 1 | Using index |±—±------------±------------±-------±--------------------------------------------------------------------------------------------------------------------------------±-------------------------------------------±--------±---------------------------------------±-----±------------+

If I uncomment the weighted_score join:

±—±------------±---------------±-------±--------------------------------------------------------------------------------------------------------------------------------±-------------------------------------------±--------±--------------------------------------------------±-----±---------------------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±---------------±-------±--------------------------------------------------------------------------------------------------------------------------------±-------------------------------------------±--------±--------------------------------------------------±-----±---------------------------------------------+| 1 | SIMPLE | raw_score | ref | index_runid_scoring_section_match_id_score,index_match_scores_on_match_id_and_run_id_and_score,index_scoring_section_and_run_id | index_scoring_section_and_run_id | 56 | const,const | 550 | Using where; Using temporary; Using filesort || 1 | SIMPLE | matches | eq_ref | PRIMARY | PRIMARY | 4 | text_matching_tool.raw_score.match_id | 1 | || 1 | SIMPLE | source_feed | ref | index_on_id | index_on_id | 13 | text_matching_tool.matches.source_item | 1 | Using index || 1 | SIMPLE | target_feed | ref | index_on_id | index_on_id | 13 | text_matching_tool.matches.target_item | 1 | Using index || 1 | SIMPLE | weighted_score | ref | index_runid_scoring_section_match_id_score,index_match_scores_on_match_id_and_run_id_and_score,index_scoring_section_and_run_id | index_runid_scoring_section_match_id_score | 60 | const,const,text_matching_tool.raw_score.match_id | 1 | Using index |±—±------------±---------------±-------±--------------------------------------------------------------------------------------------------------------------------------±-------------------------------------------±--------±--------------------------------------------------±-----±---------------------------------------------+

As you can see, I have several more joins after this one. The number of self joins for match_scores depend on how many scoring_sections the user has configured.

Do you think it is likely I’ll be able to find a combination of query structure and indices that will allow me to avoid temp/filesort?

If not, could I have some advice on the best my.cnf settings to make that temp/filesort as painless as possible? As it is, I save the results of this query into a “review” table that is used by the application for dynamic display. This way, resorting, filtering, and pagination don’t impact this big nasty query.
If I could get this query down to just a few seconds, I’d be okay with it, but right now, it is taking minutes.

Try adding ORDER BY NULL to that query, as you are not implicitly ordering the results, I think mysql server is sorting based on the group by columns.

I didn’t post so, but I actually did try ORDER BY NULL just to make sure.

As near as I can tell after reading the MySQL docs on how GROUP BY impacts query optimization, it doesn’t look like it would be possible for my query to avoid at least the filesort and in many cases, the temporary table. This seems to be because I am grouping by columns in multiple tables.

The good news is that by adjusting the temp table size variables in my config, I was able to keep these queries solely in memory which provides enough of a performance increase to satisfy my needs.

No I don’t think that you will be able to get rid of the filesort with a GROUP BY like that.

GROUP BY means that MySQL will need that the rows are in order according to the columns involved in the GROUP BY.
And since your group by columns come from different tables you can’t have an index that solves this since indexes can’t span over several tables.

If the GROUP BY columns are all part of one table and you have an combined index for them then MySQL can choose to use that index as you have seen.

Add this to my.cnf to raise the limit for in memory temp tables to 10MB:

tmp_table_size=10M

But does the query actually take minutes?
Because it looks like you don’t have that many rows.

How much memory are you allowing for index cache?
The key_buffer_size variable in my.cnf.
If the problem is that this cache is to small and mysql has to read the indexes from disk which can slow down things a lot.