Optimisation of SELECT on a very large table

FatalFlaw · March 1, 2007, 5:56am

Hi Everyone

I have a large table (40M+ records):

```sql
CREATE TABLE Stats_Web_Access_Raw (
  Time timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  Web_Group_ID smallint(5) unsigned default NULL,
  Web_User_ID mediumint(8) unsigned default NULL,
  Secure enum('Y','N') NOT NULL default 'N',
  URL varchar(200) default NULL,
  Category int(11) default NULL,
  filelocation int(11) default '0',
  KEY Time (Time,Web_Group_ID,Web_User_ID),
  KEY Time_2 (Time,Web_User_ID)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
```

The queries I need to run have the following properties:

- Ordering will always be by Time or Time DESC.
- Groups (identified by Web_Group_ID) contain users (identified by Web_User_ID).
- I need to be able to query one or more Web_Group_IDs or Web_User_IDs, e.g. ‘Web_Group_ID IN (1,32,97,101)’.
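For concreteness, a query of the kind described might look like the sketch below. The exact query was never posted, so the column list, the date bound (Peter's reply about “<” suggests there was a range condition on Time) and the LIMIT 10 (matching the "first 10 records" mentioned further down) are all assumptions.

```sql
-- Hypothetical example: latest 10 hits for a set of groups since a given date.
SELECT Time, Web_Group_ID, Web_User_ID, URL
FROM Stats_Web_Access_Raw
WHERE Time >= '2007-02-01 00:00:00'
  AND Web_Group_ID IN (1, 32, 97, 101)
ORDER BY Time DESC
LIMIT 10;

-- The per-user variant has the same shape (user IDs are placeholders).
SELECT Time, Web_Group_ID, Web_User_ID, URL
FROM Stats_Web_Access_Raw
WHERE Time >= '2007-02-01 00:00:00'
  AND Web_User_ID IN (5, 6, 7)
ORDER BY Time DESC
LIMIT 10;
```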

My problem is that only the Time part of the index is ever used. Here is a typical EXPLAIN:

Note the ‘key_len=4’ value, indicating that Time is the only index column being used. The result is a slow query (several tens of seconds) if the first 10 matching records are not well distributed (i.e. they are all at one end of the table!). I tried a range of different values for max_seeks_for_key, but it made no difference.
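The key_len figure can be decoded by adding up the byte widths of the index columns that are actually used. A sketch of that arithmetic for KEY Time, assuming the usual MySQL storage sizes (TIMESTAMP 4 bytes, SMALLINT 2, MEDIUMINT 3) plus 1 extra byte for each nullable column; the query itself is a hypothetical stand-in for the one that was not posted:

```sql
EXPLAIN SELECT Time, Web_Group_ID, Web_User_ID, URL
FROM Stats_Web_Access_Raw
WHERE Time >= '2007-02-01 00:00:00'
  AND Web_Group_ID IN (1, 32, 97, 101)
ORDER BY Time DESC
LIMIT 10;

-- Reading key_len for KEY Time (Time, Web_Group_ID, Web_User_ID):
--   Time          TIMESTAMP NOT NULL   -> 4 bytes
--   Web_Group_ID  SMALLINT, nullable   -> 2 + 1 = 3 bytes
--   Web_User_ID   MEDIUMINT, nullable  -> 3 + 1 = 4 bytes
--
--   key_len = 4  -> only Time bounds the index lookup
--   key_len = 7  -> Time and Web_Group_ID are used
--   key_len = 11 -> all three columns are used
```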

How do I get MySQL to use the additional columns in the index?

Any comments/tips/suggestions very gratefully received.
Jim

Peter · March 1, 2007, 6:04am

You need Time to come after Web_User_ID in the index so that both columns are used.
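A sketch of what that reordering could look like, with the equality-filtered column first and Time second. The index names Group_Time and User_Time are made up for illustration, and building new indexes on a 40M-row table will take a while.

```sql
-- Hypothetical index names: equality column first, then the sort/range column.
ALTER TABLE Stats_Web_Access_Raw
  ADD INDEX Group_Time (Web_Group_ID, Time),
  ADD INDEX User_Time  (Web_User_ID, Time);

-- With a single group, MySQL can seek to Web_Group_ID = 32 and read that
-- slice of Group_Time backwards, already in Time order, so no filesort.
SELECT Time, Web_Group_ID, Web_User_ID, URL
FROM Stats_Web_Access_Raw
WHERE Web_Group_ID = 32
ORDER BY Time DESC
LIMIT 10;
```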

If an index column is used with “<” (or any other range condition), none of the columns that follow it in the index can be used.
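The rule can be seen by running the same two conditions against both column orders. This sketch assumes a second index on (Web_User_ID, Time) has been added under the hypothetical name User_Time; the date and user ID are arbitrary placeholders, and the key_len values in the comments are what the byte arithmetic predicts rather than captured output.

```sql
-- KEY Time_2 (Time, Web_User_ID): the range on Time is on the first index part,
-- so Web_User_ID cannot narrow the lookup (EXPLAIN should show key_len = 4).
EXPLAIN SELECT * FROM Stats_Web_Access_Raw FORCE INDEX (Time_2)
WHERE Time >= '2007-02-01 00:00:00' AND Web_User_ID = 12345;

-- Hypothetical User_Time (Web_User_ID, Time): equality on the first part and
-- the range on the last part, so both bound the lookup
-- (expected key_len = 4 for nullable MEDIUMINT + 4 for TIMESTAMP = 8).
EXPLAIN SELECT * FROM Stats_Web_Access_Raw FORCE INDEX (User_Time)
WHERE Web_User_ID = 12345 AND Time >= '2007-02-01 00:00:00';
```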

FatalFlaw · March 1, 2007, 6:20am

Hi Peter,

Thanks very much for getting back to me so quickly.

Peter said: “If you have column with “<” etc, all following columns in index can’t be used.”

Is there anywhere in the docs where these rules are explained fully? I don’t recall seeing anything covering this particular issue before, and I thought I had read the docs pretty thoroughly. If not, which operators are OK? =, <=>, <>? Does changing the storage engine change these rules?

OK, so I have already tried reversing the column order in the indexes as you suggest. They work brilliantly when there is one group or user in the query. As soon as we query on several (e.g. ‘Web_Group_ID in (1,2,3,4)’), the DB engine does a filesort to get the ordering correct. If a large number of records are returned, the query can take minutes or even hours to complete.
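The filesort follows from how a (Web_Group_ID, Time) index is laid out: rows are in Time order only within each group, so an IN list produces several separately sorted runs that still need a global sort. One commonly used workaround (not something proposed in this thread) is to fetch the top N rows per group and sort only that small union. A sketch, assuming an index on (Web_Group_ID, Time) exists and N = 10:

```sql
-- Each branch reads at most 10 rows from the (Web_Group_ID, Time) index,
-- newest first; the outer ORDER BY then sorts at most 4 x 10 = 40 rows.
(SELECT Time, Web_Group_ID, Web_User_ID, URL
   FROM Stats_Web_Access_Raw
  WHERE Web_Group_ID = 1
  ORDER BY Time DESC LIMIT 10)
UNION ALL
(SELECT Time, Web_Group_ID, Web_User_ID, URL
   FROM Stats_Web_Access_Raw
  WHERE Web_Group_ID = 2
  ORDER BY Time DESC LIMIT 10)
UNION ALL
(SELECT Time, Web_Group_ID, Web_User_ID, URL
   FROM Stats_Web_Access_Raw
  WHERE Web_Group_ID = 3
  ORDER BY Time DESC LIMIT 10)
UNION ALL
(SELECT Time, Web_Group_ID, Web_User_ID, URL
   FROM Stats_Web_Access_Raw
  WHERE Web_Group_ID = 4
  ORDER BY Time DESC LIMIT 10)
ORDER BY Time DESC
LIMIT 10;
```

The statement has to be generated from the list of IDs for each request, but the cost of the final sort depends on the number of groups times N, not on how many rows match overall.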

Is there any way to harness the fact that InnoDB clusters rows on the primary key? I believe in my case this is a monotonic 6-byte row ID allocated by InnoDB (as I don’t specify a primary key in the CREATE TABLE), which essentially delivers an ordering by date, and would let me drop the Time column from the index …
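A sketch of that idea, assuming rows are inserted in time order. The hidden 6-byte row ID cannot be referenced in a query, so this adds an explicit AUTO_INCREMENT key instead; the column name id and the index names are hypothetical. InnoDB stores the primary key in every secondary index entry anyway, but listing id explicitly makes the intended ordering independent of whether a given MySQL version's optimizer exploits that. Adding a primary key rebuilds the whole table, which is slow at 40M rows. Also note that Time is declared ON UPDATE CURRENT_TIMESTAMP, so a row that is later updated gets a new Time but keeps its id.

```sql
-- Hypothetical: explicit clustered key that follows insertion order,
-- plus secondary indexes ordered by that key within each group/user.
ALTER TABLE Stats_Web_Access_Raw
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ADD INDEX Group_RowId (Web_Group_ID, id),
  ADD INDEX User_RowId  (Web_User_ID, id);

-- Newest 10 rows for one group, using id as a stand-in for Time order.
SELECT Time, Web_Group_ID, Web_User_ID, URL
FROM Stats_Web_Access_Raw
WHERE Web_Group_ID = 32
ORDER BY id DESC
LIMIT 10;
```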

Otherwise I don’t see how I can get this working any better, unless there is a solution that merges multiple tables, say one table per Web_Group_ID … ?

Thanks again
Jim
