Query Performance Advice

Hi,

I have the following query:

select messages.sid, messages.mid, messages.sender_addr, recipients.rcpt_addr, messages.subject,
date_format(from_unixtime(messages.timestamp),"%M %d, %Y %H:%i:%s") as sent_date, ip_system.hostname
from messages, recipients,ip_system
where messages.mid = recipients.mid
AND messages.sid = recipients.sid
AND messages.sid = ip_system.sid
AND sender_domain = “sendingcompany.com
AND subject like “Some Subject%”
AND recipients.rcpt_domain = “receivingcompany.com
AND messages.timestamp >= unix_timestamp(“2007-03-08 20:00:00”)
group by messages.mid
order by sent_date LIMIT 0,10

Using InnoDB tables on a Windows 2003 Server (not my choice) with 3.5GB of RAM. Dual Xeon 3.6GHz.

messages table has about 45M rows
recipients table has about 70M rows
ip_system table has 4 rows (2 are logically deleted)

Explain Extended:
1,SIMPLE,messages,range,PRIMARY,msg_complete_idx,msg_sender_ domain_idx,msg_timestamp_idx,msg_sid_domain_timestamp_idx,ms g_sender_domain_idx,14,430,Using where; Using temporary; Using filesort
1,SIMPLE,recipients,ref,PRIMARY,rcpt_domain_idx,rcpt_sid_mid _rid_domain,rcpt_sid_mid_rid_domain,6,message_tracker.messag es.sid,message_tracker.messages.mid,1,Using where
1,SIMPLE,ip_system,ALL,PRIMARY,4,Using where

Typically the query uses the index (sender_domain[8],timestamp) on the messages table. sender_domain is unique for these set of companies in 8 characters or less.

Query times vary based on timeframe (the longer the worse) specified and the message volume of “sendingcompany.com”. But can easily take 10-15 minutes. For small volume/timeframes it will be less than a minute.

I am thinking the group by on the computed field could be changed to the source column.

Can a sortfile be prevented by using both columns in the index? i.e. sender_domain, timestamp?

The primary index on the recipients table is mid,sid,rid. This should facilitate the joining of rows from the messages table.

I believe the ip_system table isn’t a big player since it logically has only 2 rows.

INNODB STATUS OUTPUT
Per second averages calculated from the last 0 seconds

SEMAPHORES

OS WAIT ARRAY INFO: reservation count 566606, signal count 564804
Mutex spin waits 15390019, rounds 21880070, OS waits 217206
RW-shared spins 438734, OS waits 182672; RW-excl spins 448303, OS waits 125671

TRANSACTIONS

FILE I/O

I/O thread 0 state: wait Windows aio (insert buffer thread)
I/O thread 1 state: wait Windows aio (log thread)
I/O thread 2 state: wait Windows aio (read thread)
I/O thread 3 state: wait Windows aio (write thread)
Pending normal aio reads: 0, aio writes: 0,
ibuf aio reads: 0, log i/o’s: 0, sync i/o’s: 0
Pending flushes (fsync) log: 0; buffer pool: 0
9601252 OS file reads, 8643216 OS file writes, 371528 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s

INSERT BUFFER AND ADAPTIVE HASH INDEX

Ibuf for space 0: size 1, free list len 74, seg size 76, is empty
Ibuf for space 0: size 1, free list len 74, seg size 76,
5461531 inserts, 5461531 merged recs, 845038 merges
Hash table size 6607487, used cells 946582, node heap has 1003 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s

LOG

Log sequence number 404 3161598899
Log flushed up to 404 3161598899
Last checkpoint at 404 3161598899
0 pending log writes, 0 pending chkp writes
153464 log i/o’s done, 0.00 log i/o’s/second

BUFFER POOL AND MEMORY

Total memory allocated 1771661386; in additional pool allocated 490240
Buffer pool size 101888
Free buffers 0
Database pages 100885
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 9601242, created 2974457, written 8324174
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout

ROW OPERATIONS

0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread id 2760, state: waiting for server activity
Number of rows inserted 201389704, updated 426, deleted 113685, read 314844469
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s

Any help advice would be appreciated.
David

Yes the sort on the date is bad from a performance point of view.

Because of this sort the DB needs to first convert each date to the new format and then it has to compare them.

Imagine if you have 2,000,000 messages that match your timeframe/company name and each date has to be converted and then you have to sort these 2,000,000 rows.

But you have a lot of inconcistency in your query:
1.
You group on message.mid but you do not use group functions in the SELECT part.
Which means that you don’t know which values you are going to get.
So which values do you want to get?

I also find the sort a bit strange since the output you are going to get is:
April 4
December 2
January 30

Which means that it is not sorted in date order but in string order which I feel is very strange when I look at a list of dates.
If you drop this sort we can start to tweak your query.