Query optimisation help please

Hi,
I have a slow query that I could really do with some help optimising please.

It involves two tables:

CREATE TABLE IF NOT EXISTS data (
id bigint(cool: NOT NULL auto_increment,
brand_id bigint(cool: NOT NULL default ‘0’,
pub_date date NOT NULL default ‘0000-00-00’,
concept_id bigint(cool: NOT NULL default ‘0’,
PRIMARY KEY (id),
KEY brand_id (brand_id),
KEY pub_date_2 (pub_date),
KEY brand_id_2 (brand_id,pub_date)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=8579258 ;

CREATE TABLE IF NOT EXISTS concept (
id bigint(cool: NOT NULL default ‘0’,
concept varchar(255) NOT NULL default ‘’,
KEY concept (concept),
KEY id (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Select c.concept,count(concept) concept1 from concept c,data d where d.pub_date >=‘2008-12-01’ AND d.pub_date <= ‘2008-12-31’ and d.brand_id =32 and d.id = c.id GROUP BY concept ORDER BY concept1 DESC

So what we are trying to do is find the most frequently occuring concepts from the concept table, for the rows selected in the data table then order and group them.

The concept table can have many different rows where the ID links back to a single row in the data tables ID.

So if there is 30000 rows in the data table it could link to 300,000 rows in the concept table. It appears to the group by/order by that is slowing things down.

These tables are going to continue to grow so I need to find a more scalable option to this.

Any advice or query/schema changes to speed this up and help the application scale would be great.

Thanks very much

First thing: KEY brand_id_2 (brand_id,pub_date), so where should be:

[B]Quote:[/B]

where d.brand_id = 32 and (d.pub_date >=‘2008-12-01’ AND d.pub_date <= ‘2008-12-31’)

I could be wrong, so You could check it with explain
[B]Quote:[/B]

explain Select c.concept,count(concept) concept1 from concept c,data d where d.pub_date >=‘2008-12-01’ AND d.pub_date <= ‘2008-12-31’ and d.brand_id =32 and d.id = c.id GROUP BY concept ORDER BY concept1 DESC

explain Select c.concept,count(concept) concept1 from concept c,data d where d.brand_id =32 and d.pub_date >=‘2008-12-01’ AND d.pub_date <= ‘2008-12-31’ and d.id = c.id GROUP BY concept ORDER BY concept1 DESC

(please, post both results)

edit:
You could add table “counter” with columns: year, month, concept and couter.

Thanks for your feedback here are the results:

explain Select c.concept,count(concept) concept1 from concept c,data d where d.pub_date >=‘2009-01-20’ AND d.pub_date <= ‘2009-01-27’ and d.brand_id =273 and c.concept !=‘“.$brand.”’ and d.id = c.id GROUP BY concept ORDER BY concept1 DESC

1 SIMPLE d range PRIMARY,brand_id,pub_date_2,brand_id_2 brand_id_2 11 NULL 17561 Using where; Using temporary; Using filesort
1 SIMPLE c ref concept,id id 8 sentimen_sentimen.d.id 19 Using where

explain Select c.concept,count(concept) concept1 from concept c,data d where d.brand_id =273 and d.pub_date >=‘2009-01-20’ AND d.pub_date <= ‘2009-01-27’ and c.concept !=‘“.$brand.”’ and d.id = c.id GROUP BY concept ORDER BY concept1 DESC

1 SIMPLE d range PRIMARY,brand_id,pub_date_2,brand_id_2 brand_id_2 11 NULL 17561 Using where; Using temporary; Using filesort
1 SIMPLE c ref concept,id id 8 sentimen_sentimen.d.id 19 Using where

Any ideas how to better optimise this?

Thanks in advance

hmmm …
Explain shows that it will fetch 17561 rows.

[B]Quote:[/B]

Select count(*) from concept c,data d where d.pub_date >=‘2009-01-20’ AND d.pub_date <= ‘2009-01-27’ and d.brand_id =273 and c.concept !=‘“.$brand.”’ and d.id = c.id

Does this query return the same value ?

Yes it does return the same amount of rows.

What I am looking at doing now is just getting the data in mysql
using

SELECT c.concept FROM concept c, data d WHERE d.brand_id =273 AND d.pub_date >= ‘2008-10-20’ AND d.pub_date <= ‘2009-01-27’ AND d.id = c.id

which takes 0.0008ms

and then using php to format the data using:

$data = array_count_values($results);
$arsort($data);

Seems to be much faster and won’t hog mysql resources.

Any idea if using php in the application to group is generally a better idea than group by in mysql?

Thanks