Join and speed issues

hi,

i have certain tables for products, time and the market data the structures of which are shown below.

CREATE TABLE product ( id mediumint( 11 ) NOT NULL auto_increment ,
code int( 11 ) NOT NULL ,
name varchar( 150 ) NOT NULL ,
brand_name varchar( 200 ) NOT NULL ,
start_date date default NULL ,
end_date date default NULL ,
code1 varchar( 45 ) NOT NULL ,
brand_name2 varchar( 45 ) NOT NULL ,
PRIMARY KEY ( id ) ,
KEY brand_name ( brand_name ) ) ENGINE = MyISAM DEFAULT CHARSET = latin1 ROW_FORMAT = COMPACT;

CREATE TABLE time (
id int( 11 ) NOT NULL AUTO_INCREMENT ,
date date NOT NULL ,
month int( 9 ) default NULL ,
year int( 4 ) default NULL ,
PRIMARY KEY ( id ) ,
KEY month ( month ) ,
KEY year ( year )
) ENGINE = MYISAM DEFAULT CHARSET = latin1 ROW_FORMAT = COMPACT;

CREATE TABLE mkt_data ( id int( 11 ) NOT NULL auto_increment ,
time_id int( 11 ) NOT NULL ,
brand_id int( 11 ) NOT NULL ,
buy_qty bigint( 20 ) NOT NULL ,
buy_value bigint( 20 ) NOT NULL ,
sold_qty bigint( 20 ) NOT NULL ,
sold_value bigint( 20 ) NOT NULL ,
PRIMARY KEY ( id ) ,
KEY time_id ( time_id , brand_id ) ,
KEY time_id_2 ( time_id ) ,
KEY brand_id ( brand_id ) ,
) ENGINE = MyISAM DEFAULT CHARSET = latin1 ROW_FORMAT = COMPACT;

Now i have performance issues when i run a query which joins all these 3 tables.

SELECT product.code, SUM(mkt_data.buy_value) FROM mkt_data, time, product WHERE time.id=mkt_data.time_id AND product.id=mkt_data.brand_id AND (time.year=2009 AND time.month<=8) GROUP BY product.code ORDER BY product.code

The problem with this query is that its very slow.
it takes almost 7 s to execute. Also it causes the use of filesort and temporary.

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE product ALL PRIMARY NULL NULL NULL 102 Using temporary; Using filesort

could anyone pls help me out as to how i can reduce the timing since the table size is also very less ( in a few thousands).
its really very important to me as i am not able to figure out how to move ahead with this?

You could check:

SELECT product.code, SUM(mkt_data.buy_value) FROM time
left join mkt_data on time.id=mkt_data.time_id
left join product on product.id=mkt_data.brand_id
where (time.year=2009 AND time.month<=8)
GROUP BY product.code
ORDER BY product.code

There should be new index on the (time.year,time.month) and maybe on the product.code.

Edit: As for filesort - You are using order by and mysql docs says: “In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following: * The key used to fetch the rows is not the same as the one used in the ORDER BY:”

Thanks januzi.

It worked. But i still have the problem of having “using Filesort” and “Using temporary” in the explain statement.

And could you please let me know why you chose left join and why did u take that sequence of having the time table first, then the market data and then product table.

That sequence could be different, for example: product, time, mkt_data. Mysql finds it own sequence that looks like optimal. As for left join (right join) - it makes where condition smaller and leaves only those variables that are needed time.year and time.month. Did You check it with explain ? How many rows will be fetched ?

Filesort is used even with index on product.code. Thats because You have got index on (time.y,time.m) and You try to sort column by product.code. Don’t worry about filesort, it is quicksort, so it is rather quick.

As “for using temporary”: http://www.mysqlperformanceblog.com/2007/08/16/how-much-over head-is-caused-by-on-disk-temporary-tables/
I think that temporary table is created because You are sorting and grouping rows.

Those are statistics, right ? You could create new table with year, month, product code and sum_value. First query (slower) that fetches rows and puts them into new table (You could put it into crontab; that query should fetch only rows from time.month = date(‘m’)-1 ). Second query (fast) fetches rows from new table (no group by).

It surprises me that januzi’s query works. It looks horrible.

This is the same query you had, but a little restructured for readability. Try to make your own queries readable next time.

SELECT product.code, SUM(mkt_data.buy_value)
FROM product
JOIN mkt_data ON (product.id=mkt_data.brand_id)
JOIN time ON (time.id=mkt_data.time_id AND time.year=2009 AND time.month<=8)
GROUP BY product.code
ORDER BY product.code

mkt_data should have an index on (brand_id)
time should have an index on (id,year,month) (that is ONE index, in THIS order)
product should have an index on (code)

And make sure MySQL uses those indices, use FORCE INDEX if necessary. Then check query speed )

If you really want a fast query at the cost of slower inserts, use covering indices:
mkt_data should have an index on (brand_id,time_id,buy_value)
time should have an index on (id,year,month)
product should have an index on (code,id)

Thanks gmouse.

There are certain questions that i had and would be glad if u could help me out. I am kinda new to Mysql and would appreciate some help.

  1. why in the indexes do we add even the “id” column? the one that i have defined as the primary key?

  2. The query on execution still shows me “using Filesort” and “Using Temporary”. My Explain result shows me that for the product table the key is not being used.
    id select_type table type possible_keys key key_len ref rows Extra
    1 SIMPLE product ALL PRIMARY NULL NULL NULL 102 Using temporary; Using filesort

I have noticed in the show profile table that maximum time is taken in creating the temporary table. Can u let me know how i can avoid its creation. Im new to Mysql and have read on “temporary table” but i still haven’t managed to understand it.

  1. Is there some significance of Cardinality in MYSQL’s decision to use indices.

  2. How can using filesort be avoided?

  3. In the composite indexes that i create, is the arrangement of the columns a matter of concern for proper indexing as u suggested me to have the time index in the form of (id,year,month) in the same order.why is it so?

  4. Can creation of too many indexes lead to some performance problems?

  5. Can using foreign keys solve the problem of slower queries. Also MYISAM does not support this feature of foreign keys, so is it advisable to move on to INNODB. I have herd that it does have its own problems.

8 ) How can we establish which table needs to be selected first and so on. i tried using Straight join and it did. Is there any other way?

Thanks in advance. )

  1. MySQL cannot combine seperate indices in this case in an efficient manner. Actually, in most cases it cannot, so just try to stick to using one index per table per query.

  2. change FROM product to FROM product FORCE INDEX(code)

  3. yes, it uses the cardinality to determine how selective a criterium in the WHERE clause is.

  4. in general: indices

  5. the order is very important; [URL]http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html[/URL]

  6. inserts will be slower and key_cache (MyISAM) or innodb buffer pool (InnoDB) might become too small

  7. no

  8. straight_join is the only way for outter joins, else MySQL will determine it for you. For good performance, the first table should contain all the rows you group by/sort on, and if you use a restrictive WHERE clause, some columns that will be constant for one result set.