baffled by LEFT JOIN versus INNER JOIN speed

I’ve got a two-table setup (well, actually I have 7 tables, with 5 more Authorfunction* tables like the one below, but these 5 don’t make much difference in the problem):

CREATE TABLE authorfunctionA (
ID mediumint(9) NOT NULL default ‘0’,
value varchar(255) NOT NULL default ‘’,
KEY ID (ID),
KEY value (value),
FULLTEXT KEY val (value)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

and

CREATE TABLE stories_full (
ID mediumint(9) NOT NULL default ‘0’,
title varchar(255) NOT NULL default ‘’,
displaytitle varchar(255) NOT NULL default ‘’,
story mediumtext NOT NULL,
PRIMARY KEY (ID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

the first table has 165.000 records, the 2nd table has 40.000 records. If I run the query below, mysql needs a full 2 seconds to compute :

SELECT stories_full.* FROM stories_full LEFT JOIN authorfunctionA AS field_0 ON field_0.id=stories_full.id WHERE (MATCH(field_0.value) AGAINST (‘+Roc*’ IN BOOLEAN MODE))

now if I run the same query with a normal join :

SELECT stories_full.* FROM stories_full JOIN authorfunctionA AS field_0 ON field_0.id=stories_full.id WHERE (MATCH(field_0.value) AGAINST (‘+Roc*’ IN BOOLEAN MODE)), the result is done in 0.002 seconds

(this query is still relatively simple : in real life, i’m running 4 joins, and the queries takes up to 15 seconds sometimes)

here are the explains for the 2 queries :

±—±------------±-------------±-----±--------------±-----±--------±-------------------------±-------±------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±-------------±-----±--------------±-----±--------±-------------------------±-------±------------+| 1 | SIMPLE | stories_full | ALL | NULL | NULL | NULL | NULL | 164406 | | | 1 | SIMPLE | field_0 | ref | ID | ID | 3 | ipbfull2.stories_full.ID | 1 | Using where | ±—±------------±-------------±-----±--------------±-----±--------±-------------------------±-------±------------+

and

±—±------------±-------------±---------±--------------±--------±--------±--------------------±-----±------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±-------------±---------±--------------±--------±--------±--------------------±-----±------------+| 1 | SIMPLE | field_0 | fulltext | ID,val | val | 0 | | 1 | Using where | | 1 | SIMPLE | stories_full | eq_ref | PRIMARY | PRIMARY | 3 | ipbfull2.field_0.ID | 1 | | ±—±------------±-------------±---------±--------------±--------±--------±--------------------±-----±------------+

so apparently, the LEFT JOIN is simply not using the index !!!

I’ve tested this on mysql 4.1.14 and 5.0.45 with the same results

can anyone shed some light on this matter ? I’m totally in the dark on how to optimize this (

maaaaaany many thanks in advance !

ah, a PS : i realised that you might think the query execution order could be in play : running a LEFT join, followed by an inner join might have some query cached. So I ran a dozen more queries in mixed order (first the inner join, then the left join), and every time, the left join was substantially slower

I think that the issue at play here is a misunderstanding of how a LEFT JOIN works.

An INNER JOIN selects only those records which are common to both tables. Thus you notice that the number of records operated on is only 1, and both indexes are used.

A LEFT JOIN selects all records from the left side, and whatever records from the right side that match up, filling in NULL values for the right side where there’s no match. That’s why it did a full scan of the stories_full table, but used the index on the field_0 table.

Unless there’s a real need for a LEFT JOIN, don’t use it. In this case, you really don’t need it, since you’re only returning values from the stories_full table, and only using the other table to narrow down the results you’re going to return.

-Doug

EDIT: Corrected FULL to INNER JOIN.