What's the difference in performance (why)?

Hi All,

The 2 queries below are the same except the “ON” clause is on 2 different sets of fields.

The first ON clause is based on the “email” field in both tables. In both tables the email field is a “unique” index.

The second ON clause is based on the “user_id” field which is a primary key in the users table (id) and only indexed in the subscribers table.

Why would the “email” query take such a long time vs. the “id” query when both sets of fields are indexed? Is it the text comparison vs. the integer comparison?

This query takes a really long time.

SELECT M.* FROM users AS M
LEFT JOIN subscribers AS N
ON M.email = N.email
WHERE
N.id IS NULL AND
M.block=0 AND
M.registerDate<>‘0000-00-00 00:00:00’

This query takes a very short time.

SELECT M.* FROM users AS M
LEFT JOIN subscribers AS N
ON M.id = N.user_id
WHERE
N.id IS NULL AND
M.block=0 AND
M.registerDate<>‘0000-00-00 00:00:00’

What does EXPLAIN report? Storage engine? Time taken per query (really long / very short is not very descriptive)? Any multi-column indices?

Very wild guess: the index on e-mail is too large to be fully kept in memory.

Thanks for the response.

Here are additional details:

  1. Explain on the “fast” query comes up with 2 rows:
    On the users table (M), it ends up using a “where” clause and looking at 153,000 records. On the subscribers table (N), it ends up looking at 1 row using “where, not exists” but also relying on the “user_id” index of “N”.

  2. Explain on the “slow” query comes up with 2 rows:
    On the users table (M), it ends up using a “where” clause and looking at 153,000 records. On the subscribers table (N), it ends up using a “where, not exists” clause and looking at 152,000 records without using an index, this is probably a big part of the problem.

  3. Storage engine for both tables is: MyISAM.

  4. The fast query takes about 3 to 4 seconds, the slow query is over 50 seconds and essentially holds up the dedicated server we are using.

  5. There are no “multi-field” indexes in either table.

Any more feedback you could provide would be really appreciated.

Thanks

The problem is that it is not using an index to look up the emailaddrsss in the N table.

What’s the data type of the e-mail column? And what’s the length of the email index?

If M.block=0 AND M.registerDate<>‘0000-00-00 00:00:00’ is very restrictive (filters at least 95%), add a multi-column index on (block,registerDate).

Thanks for the quick response.

Each table has an email address unique index, field type is varchar(100). I’m including the table information for both below. As I’m writing this, I noticed that the charset of users is set as utf8 and the charset of subscribers is set as latin1. Would that matter?

CREATE TABLE users (
id int(11) NOT NULL auto_increment,
name varchar(255) NOT NULL default ‘’,
username varchar(150) NOT NULL default ‘’,
email varchar(100) NOT NULL default ‘’,
password varchar(100) NOT NULL default ‘’,
usertype varchar(25) NOT NULL default ‘’,
block tinyint(4) NOT NULL default ‘0’,
sendEmail tinyint(4) default ‘0’,
gid tinyint(3) unsigned NOT NULL default ‘1’,
registerDate datetime NOT NULL default ‘0000-00-00 00:00:00’,
lastvisitDate datetime NOT NULL default ‘0000-00-00 00:00:00’,
activation varchar(100) NOT NULL default ‘’,
params text NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY email (email),
UNIQUE KEY username (username),
KEY usertype (usertype),
KEY idx_name (name),
KEY gid_block (gid,block)
) ENGINE=MyISAM AUTO_INCREMENT=155071 DEFAULT CHARSET=utf8;

CREATE TABLE subscribers (
id int(11) NOT NULL auto_increment,
user_id int(11) NOT NULL default ‘0’,
name varchar(64) character set latin1 collate latin1_general_ci NOT NULL default ‘’,
email varchar(100) character set latin1 collate latin1_general_ci NOT NULL default ‘’,
receive_html tinyint(1) NOT NULL default ‘1’,
confirmed tinyint(1) NOT NULL default ‘0’,
blacklist tinyint(1) NOT NULL default ‘0’,
timezone time NOT NULL default ‘00:00:00’,
language_iso varchar(10) character set latin1 collate latin1_general_ci NOT NULL default ‘eng’,
subscribe_date datetime NOT NULL default ‘0000-00-00 00:00:00’,
params text character set latin1 collate latin1_general_ci,
PRIMARY KEY (id),
UNIQUE KEY email (email),
KEY subscribe_date (subscribe_date),
KEY user_id (user_id),
KEY name (name)
) ENGINE=MyISAM AUTO_INCREMENT=171388 DEFAULT CHARSET=latin1;

Yes, character set is your problem. After you alter it, the second query should be as fast.

Please read [URL=“http&#58;&#47;&#47;www.xaprb.com/blog/tag/vadim-tkachenko/”]http://www.xaprb.com/blog/tag/vadim-tkachenko/[/URL] after “Generosity can be Unwise”.

Usertype could probably an enum-field, and I doubt an index helps much for that field.

Thanks!

I actually changed the character set of both email fields to “ascii”. That took care of the problem.

Thanks!