Apparently bizarre query plan joins all rows of huge table to tiny table

Hello,

I would love some help understanding what I believe to be a bizarre query plan, and how I can avoid it. The table alerts_ad_staging1 is huge (~420M rows), and the table ads_page_group is joined in, but it only has 4 rows. If I remove the join, the query uses the primary key against the alerts_ad_staging1 table, as I would expect. This query should select about 1.8M rows, so the estimated 47807 rows in the second step of the query plan makes no sense to me.

mysql version: mysql Ver 14.14 Distrib 5.1.31, for debian-linux-gnu (x86_64) using EditLine wrapperquery:explain extended select aa.id, ads_page_group.name from alerts_ad_staging1 aa join ads_page_group on aa.page_group_id = ads_page_group.id where aa.id between 418536945 and 420368655\Gquery plan:*************************** 1. row *************************** id: 1 select_type: SIMPLE table: ads_page_group type: ALLpossible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 4 filtered: 100.00 Extra:*************************** 2. row *************************** id: 1 select_type: SIMPLE table: aa type: refpossible_keys: PRIMARY,alerts_ad_page_group_id key: alerts_ad_page_group_id key_len: 1 ref: bv.ads_page_group.id rows: 47807 filtered: 100.00 Extra: Using where; Using index*************************** 1. row table definitions:mysql> show create table ads_page_group\G 1. row *************************** Table: ads_page_groupCreate Table: CREATE TABLE ads_page_group ( id tinyint(3) unsigned NOT NULL AUTO_INCREMENT, name varchar(6) NOT NULL, PRIMARY KEY (id)) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin11 row in set (0.00 sec)mysql> select * from ads_page_group;±—±-------+| id | name |±—±-------+| 1 | Top | | 2 | Right | | 3 | Left | | 4 | Bottom | ±—±-------+4 rows in set (0.00 sec)CREATE TABLE alerts_ad_staging1 ( id int(11) NOT NULL AUTO_INCREMENT,…other columns not referenced by this query… page_group_id tinyint(3) unsigned NOT NULL, PRIMARY KEY (id,date),…other keys not referenced by this query… KEY alerts_ad_page_group_id (page_group_id),) ENGINE=InnoDB AUTO_INCREMENT=426916572 DEFAULT CHARSET=utf8 COLLATE=utf8_bin/*!50100 PARTITION BY RANGE (TO_DAYS(date))(PARTITION All2008 VALUES LESS THAN (733773) ENGINE = InnoDB, PARTITION Jan09 VALUES LESS THAN (733804) ENGINE = InnoDB, PARTITION Feb09 VALUES LESS THAN (733832) ENGINE = InnoDB, PARTITION Mar09 VALUES LESS THAN (733863) ENGINE = InnoDB, PARTITION Apr09 VALUES LESS THAN (733893) ENGINE = InnoDB, PARTITION May09 VALUES LESS THAN (733924) ENGINE = InnoDB, PARTITION Jun09 VALUES LESS THAN (733954) ENGINE = InnoDB, PARTITION Jul09 VALUES LESS THAN (733985) ENGINE = InnoDB, PARTITION Aug09 VALUES LESS THAN (734016) ENGINE = InnoDB, PARTITION Sep09 VALUES LESS THAN (734046) ENGINE = InnoDB, PARTITION Oct09 VALUES LESS THAN (734077) ENGINE = InnoDB, PARTITION Nov09 VALUES LESS THAN (734107) ENGINE = InnoDB, PARTITION Dec09 VALUES LESS THAN (734138) ENGINE = InnoDB, PARTITION Jan10 VALUES LESS THAN (734169) ENGINE = InnoDB, PARTITION Feb10 VALUES LESS THAN (734197) ENGINE = InnoDB, PARTITION Mar10 VALUES LESS THAN (734228) ENGINE = InnoDB, PARTITION Apr10 VALUES LESS THAN (734258) ENGINE = InnoDB, PARTITION May10 VALUES LESS THAN (734289) ENGINE = InnoDB, PARTITION Jun10 VALUES LESS THAN (734319) ENGINE = InnoDB, PARTITION Jul10 VALUES LESS THAN (734350) ENGINE = InnoDB, PARTITION Aug10 VALUES LESS THAN (734381) ENGINE = InnoDB, PARTITION Sep10 VALUES LESS THAN (734411) ENGINE = InnoDB, PARTITION Oct10 VALUES LESS THAN (734442) ENGINE = InnoDB, PARTITION Nov10 VALUES LESS THAN (734472) ENGINE = InnoDB, PARTITION Dec10 VALUES LESS THAN (734503) ENGINE = InnoDB, PARTITION After10 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */

Thanks in advance for any help. I’m more than happy to provide other information that is helpful.

Andy

What are the execution times?

Suppose the query would return only ten rows, then scanning all rows from the table aa is too expensive. In that case the current plan could be much better, given that this index is present.
ALTER TABLE aa ADD KEY alerts_ad_page_group_id_id (page_group_id, id)

If all or almost all rows in aa will be returned, as is the case in your situation, then indeed the execution plan is bad. This query would run better:

select aa.id, ads_page_group.name
from alerts_ad_staging1 aa
straight_join ads_page_group on aa.page_group_id = ads_page_group.id
where aa.id between 418536945 and 420368655

good questions,

The query should return ~1.8M rows, which is about 1/233rd of the table. All of these rows should be contiguous on disk.

execution times:
(edited to reflect recent test)
query with no join: 4 seconds
query with straight_join: 4 seconds
query with join: 58 minutes (wow - big difference vs straight_join!)

Indeed, using straight_join produces the query plan I expected to see in the first place (below). But I am dying to know why the optimizer would choose the plan it did?

mysql> explain extended select aa.id, ads_page_group.name from alerts_ad_staging1 aa straight_join ads_page_group on aa.page_group_id = ads_page_group.id where aa.id between 418536945 and 420368655\G*************************** 1. row *************************** id: 1 select_type: SIMPLE table: aa type: rangepossible_keys: PRIMARY,alerts_ad_page_group_id key: PRIMARY key_len: 4 ref: NULL rows: 5123229 filtered: 100.00 Extra: Using where*************************** 2. row *************************** id: 1 select_type: SIMPLE table: ads_page_group type: ALLpossible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 4 filtered: 75.00 Extra: Using where; Using join buffer2 rows in set, 1 warning (0.01 sec)

Thanks,

Andy

That makes sense as well Baron. Thanks again.