Hi,
I have a slow query that I could really do with some help optimising please.
It involves two tables:
CREATE TABLE IF NOT EXISTS data ( id bigint(cool: NOT NULL auto_increment, brand_id bigint(cool: NOT NULL default ‘0’, pub_date date NOT NULL default ‘0000-00-00’, concept_id bigint(cool: NOT NULL default ‘0’,
PRIMARY KEY (id),
KEY brand_id (brand_id),
KEY pub_date_2 (pub_date),
KEY brand_id_2 (brand_id,pub_date)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=8579258 ;
CREATE TABLE IF NOT EXISTS concept ( id bigint(cool: NOT NULL default ‘0’, concept varchar(255) NOT NULL default ‘’,
KEY concept (concept),
KEY id (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Select c.concept,count(concept) concept1 from concept c,data d where d.pub_date >=‘2008-12-01’ AND d.pub_date <= ‘2008-12-31’ and d.brand_id =32 and d.id = c.id GROUP BY concept ORDER BY concept1 DESC
So what we are trying to do is find the most frequently occuring concepts from the concept table, for the rows selected in the data table then order and group them.
The concept table can have many different rows where the ID links back to a single row in the data tables ID.
So if there is 30000 rows in the data table it could link to 300,000 rows in the concept table. It appears to the group by/order by that is slowing things down.
These tables are going to continue to grow so I need to find a more scalable option to this.
Any advice or query/schema changes to speed this up and help the application scale would be great.
First thing: KEY brand_id_2 (brand_id,pub_date), so where should be:
[B]Quote:[/B]
where d.brand_id = 32 and (d.pub_date >=‘2008-12-01’ AND d.pub_date <= ‘2008-12-31’)
I could be wrong, so You could check it with explain
[B]Quote:[/B]
explain Select c.concept,count(concept) concept1 from concept c,data d where d.pub_date >=‘2008-12-01’ AND d.pub_date <= ‘2008-12-31’ and d.brand_id =32 and d.id = c.id GROUP BY concept ORDER BY concept1 DESC
explain Select c.concept,count(concept) concept1 from concept c,data d where d.brand_id =32 and d.pub_date >=‘2008-12-01’ AND d.pub_date <= ‘2008-12-31’ and d.id = c.id GROUP BY concept ORDER BY concept1 DESC
(please, post both results)
edit:
You could add table “counter” with columns: year, month, concept and couter.
explain Select c.concept,count(concept) concept1 from concept c,data d where d.pub_date >=‘2009-01-20’ AND d.pub_date <= ‘2009-01-27’ and d.brand_id =273 and c.concept !=‘“.$brand.”’ and d.id = c.id GROUP BY concept ORDER BY concept1 DESC
1 SIMPLE d range PRIMARY,brand_id,pub_date_2,brand_id_2 brand_id_2 11 NULL 17561 Using where; Using temporary; Using filesort
1 SIMPLE c ref concept,id id 8 sentimen_sentimen.d.id 19 Using where
explain Select c.concept,count(concept) concept1 from concept c,data d where d.brand_id =273 and d.pub_date >=‘2009-01-20’ AND d.pub_date <= ‘2009-01-27’ and c.concept !=‘“.$brand.”’ and d.id = c.id GROUP BY concept ORDER BY concept1 DESC
1 SIMPLE d range PRIMARY,brand_id,pub_date_2,brand_id_2 brand_id_2 11 NULL 17561 Using where; Using temporary; Using filesort
1 SIMPLE c ref concept,id id 8 sentimen_sentimen.d.id 19 Using where
hmmm …
Explain shows that it will fetch 17561 rows.
[B]Quote:[/B]
Select count(*) from concept c,data d where d.pub_date >=‘2009-01-20’ AND d.pub_date <= ‘2009-01-27’ and d.brand_id =273 and c.concept !=‘“.$brand.”’ and d.id = c.id