optimization for inner join.. order by.. limit with large order by data

Trying to optimize the following query in MySQL 5.0.45

SELECT * FROM artdistinct
INNER JOIN artartstyles USING (pid)
WHERE CategoryID = ‘7778’
ORDER BY ArtistLN,ArtistFN
LIMIT 90, 45;

The artdistinct and artartstyles tables each have about 150K records.

This takes about 60 seconds to execute. The order by uses two columns that are each a varchar(50). If I change the order by column to pID, the select takes about 3 seconds to execute. Any suggestions on optimization for this? Thanks.

CREATE TABLE artdistinct (
pID int(11) NOT NULL,
PublisherCode char(3) NOT NULL DEFAULT ‘’,
ItemNo varchar(50) NOT NULL,
artType int(11) DEFAULT NULL,
IsAvailable int(11) DEFAULT NULL,
CanCanvas int(11) DEFAULT NULL,
CanFrame int(11) DEFAULT NULL,
ArtistLN varchar(50) DEFAULT NULL,
ArtistFN varchar(50) DEFAULT NULL,
Title varchar(200) DEFAULT NULL,
PaperWidth decimal(10,3) DEFAULT NULL,
PaperHeight decimal(10,3) DEFAULT NULL,
ImageWidth decimal(10,3) DEFAULT NULL,
ImageHeight decimal(10,3) DEFAULT NULL,
RetailPrice decimal(10,2) DEFAULT NULL,
count int(11) DEFAULT NULL,
PRIMARY KEY (pID),
KEY NewIndex1 (ArtistLN,ArtistFN,Title),
KEY ArtistLN (ArtistLN,ArtistFN)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Index Information
Indexes Columns Index_Type
PRIMARY pID Unique
NewIndex1 ArtistLN, ArtistFN, Title
ArtistLN ArtistLN, ArtistFN

CREATE TABLE artartstyles (
pID int(11) NOT NULL,
CategoryID int(11) NOT NULL,
PRIMARY KEY (pID,CategoryID)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC

Index Information
Indexes Columns Index_Type
PRIMARY pID, CategoryID Unique

SHOW PROFILE Result (ordered by duration)

state duration (summed) in sec percentage
Sending data 57.77979 99.99960
freeing items 0.00007 0.00012
starting 0.00005 0.00009
init 0.00003 0.00005
statistics 0.00002 0.00003
end 0.00001 0.00002
preparing 0.00001 0.00002
optimizing 0.00001 0.00002
Table lock 0.00001 0.00002
cleaning up 0.00001 0.00002
Opening tables 0.00001 0.00002
Total 57.78002 100.00000

Change Of STATUS VARIABLES Due To Execution Of Query

variable value description
Bytes_received 137 Bytes sent from the client to the server
Bytes_sent 4469 Bytes sent from the server to the client
Com_select 1 Number of SELECT statements that have been executed
Handler_commit 1 Number of internal commit statements
Handler_read_first 1 Number of times the first entry was read from an index. A high value indicates that full index scans were done
Handler_read_key 152038 Number of requests to read a row based on a key
Handler_read_next 152035 Number of index columns read with a range constraint or an index scan
Innodb_buffer_pool_pages_data* 501 The number of pages containing data - both dirty or clean
Innodb_buffer_pool_pages_misc# 10 The number of buffer pool pages allocated for administrative overhead
Innodb_buffer_pool_read_ahead_rnd# 41 The number of random reads by Innodb. The query scanned a large portion of a table in random order
Innodb_buffer_pool_read_ahead_seq# 969 The number of sequential reads by Innodb.
Innodb_buffer_pool_read_requests# 898039 The number of logical read requests Innodb has done
Innodb_buffer_pool_reads# 24807 The number of logical reads that were not satisfied from the buffer pool and were read from the disk
Innodb_data_read# 586055680 The amount of data read by Innodb
Innodb_data_reads# 35770 The total number of data reads by Innodb
Innodb_rows_read# 152157 The number of rows read from Innodb tables
Last_query_cost* 182953 The total cost of this query as computed by the query optimizer
Questions 1 Number of statements executed by the server
Select_scan 1 Number of full table scans of the first table in the query
Slow_queries 1 This query took more than the value specified in long_query_time. It probably requires optimization
Table_locks_immediate 2 The number of requests for table locks that could be granted immediately

  • Actual values after query execution (not changes due to query)

Global values (affected by operations by all clients connected)

EXPLAIN Result

id select_type table type possible_keys key key_len refrows filtered Extra
1 SIMPLE artdistinct index PRIMARY ArtistLN 106 (NULL) 135 112201.48
1 SIMPLE artartstyle seq_ref PRIMARY PRIMARY 8 qaaf.artdistinct.pID,const 1100.00 Using index

EXPLAIN EXTENDED Information

Level:NoteCode:1003
Message
select
qaaf.artdistinct.pID AS pID,
qaaf.artdistinct.PublisherCode AS PublisherCode,
qaaf.artdistinct.ItemNo AS ItemNo,
qaaf.artdistinct.artType AS artType,
qaaf.artdistinct.IsAvailable AS IsAvailable,
qaaf.artdistinct.CanCanvas AS CanCanvas,
qaaf.artdistinct.CanFrame AS CanFrame,
qaaf.artdistinct.ArtistLN AS ArtistLN,
qaaf.artdistinct.ArtistFN AS ArtistFN,
qaaf.artdistinct.Title AS Title,
qaaf.artdistinct.PaperWidth AS PaperWidth,
qaaf.artdistinct.PaperHeight AS PaperHeight,
qaaf.artdistinct.ImageWidth AS ImageWidth,
qaaf.artdistinct.ImageHeight AS ImageHeight,
qaaf.artdistinct.RetailPrice AS RetailPrice,
qaaf.artdistinct.count AS count,
qaaf.artartstyles.CategoryID AS CategoryID
from qaaf.artdistinct
join qaaf.artartstyles
where ((qaaf.artartstyles.pID = qaaf.artdistinct.pID)
and (qaaf.artartstyles.CategoryID = ‘7778’))
order by qaaf.artdistinct.ArtistLN,qaaf.artdistinct.Artist FN
limit 90,45

How many rows match the following condition?

WHERE CategoryID = ‘7778’

only 122, but the result range of rows that this query can be used for range from 2 to 11178.

I just reran the query using the largest Category and suddenly some of my queries are about 1 second each. Others are 3 to 6 seconds and some occasionally are still about 60 seconds but most are acceptable. I’m really confused now. I have run this many times now with no consistency.

Yesterday while I was trying to figure this out I added an index for ArtistLN, ArtistFN to the artdistinct table but the performance didn’t change. Since yesterday, the only other change I have made is to reboot.

I don’t know if this makes a difference, but the tables that are being used in the query are largely static. They only change when I make an occasional update to the tables when new art is available. During normal use they don’t change.

I am thinking you need different indexes. The output you pasted is hard for me to read because some numbers are joined together in EXPLAIN, but I think you might try an index on (CategoryID,ArtistLN,ArtistFN).

This general type of query looks like a suboptimal design anyway, and you might consider other ways. Looks like a typical pagination query – search our blog for “pagination” for some ideas.

It is a pagination query so I will take a look your blog for some ideas.

I thought about using an index for (CategoryID,ArtistLN,ArtistFN but since CategoryID is in a different table than ArtistLN, ArtistFN, I didn’t think that this could be done on a join. Am I wrong on this?

I also thought about de-normalizing this into a single table but I in addition to this query for Art Style, I also have similar queries and paginations for Primary Color, Subjects and Artists so de-normalization is not an option.