Optimisation of SELECT on a very large table

Hi Everyone

I have a large table (40m + records):

CREATE TABLE Stats_Web_Access_Raw (
Time timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
Web_Group_ID smallint(5) unsigned default NULL,
Web_User_ID mediumint(cool: unsigned default NULL,
Secure enum(‘Y’,‘N’) NOT NULL default ‘N’,
URL varchar(200) default NULL,
Category int(11) default NULL,
filelocation int(11) default ‘0’,
KEY Time (Time,Web_Group_ID,Web_User_ID),
KEY Time_2 (Time,Web_User_ID)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The queries I need to run have the following qualities:

  1. order will always be by Time or Time DESC.
  2. Groups (which are identified by Web_Group_ID) contain users (identified by Web_User_ID).
  3. I need to be able to query 1 or more Web_Group_IDs or Web_User_IDs, eg: ‘Web_Group_ID in (1,32,97,101)’

My problem is that the Time part of the index is the only part ever to be used. Here is a typical explain:

mysql> explain select * from Stats_Web_Access_Raw where Time > “07-02-20 10:48:57” AND Web_User_ID = 37 order by Time desc limit 10;
±—±------------±---------------------±------±-------- ------±-------±--------±-----±--------±------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
±—±------------±---------------------±------±-------- ------±-------±--------±-----±--------±------------+
| 1 | SIMPLE | Stats_Web_Access_Raw | range | Time,Time_2 | Time_2 | 4 | NULL | 6019062 | Using where |
±—±------------±---------------------±------±-------- ------±-------±--------±-----±--------±------------+

Note the ‘key_len=4’ value, indicating that the Time column is the only index part to be used. The result is a slow (several 10s of seconds) query, if the first 10 records are not well distributed (ie all at one end of the table!). I tried playing with a range of different values for the max_seeks_for_key value but it made no difference.

How do I get mysql to use the additional fields in the index?

Any comments/tips/suggestions very gratefully received.
Jim

You need Time to go after WebUserId so both columns are used

If you have column with “<” etc, all following columns in index can’t be used.

Hi Peter,

thanks very much for getting back to me so quickly.

Peter said: “If you have column with “<” etc, all following columns in index can’t be used.”

Is there anywhere in the docs where these rules are explained fully? I don’t recall seeing anothing covering this particular issue before and I thought I had read the docs pretty thoroughly. If not, what operators are ok? =, <=>, <>? Does changing the engine change these rules?

Ok so I have already tried reversing the indexes as you suggest. They work brilliantly where we have 1 group or user in the query. As soon as we query on several (eg ‘Web_Group_ID in (1,2,3,4)’) the DB engine does a filesort to get the ordering correct. If there are a large number of records reterned, the query can take minutes/hours to complete.

Is there any way to harness the fact that InnoDB clusters the primary key? I believe in my case is a monotonic 6-byte ID allocated by InnoDB (as I don’t specify I primary key in the create table), which essentially delivers and ordering by date, and could mean I could lose the Time field from the index …

Otherwise I don’t see how I can get this working any better, unless there is a solution using multiple table merging, say 1 table per web_group_id … ?

Thanks again
Jim