Need help with optimization of queries

My queries have been taking really really long to load these last few weeks, and I’m guessing it could be because of more users and increased table size. Anyway the table that’s involved is as follows:

CREATE TABLE IF NOT EXISTS dsitesubmissions (
submit_id int(11) NOT NULL auto_increment,
siteid int(11) NOT NULL default ‘0’,
did int(11) NOT NULL default ‘0’,
transactionid int(11) NOT NULL default ‘0’,
registration_date datetime NOT NULL default ‘0000-00-00 00:00:00’,
submission_date datetime default NULL,
approved enum(‘Unknown’,‘Rejected’,‘Approved’) default ‘Unknown’,
link_location varchar(255) default NULL,
tdid int(11) default NULL,
payment_status enum(‘Paid’,‘Unpaid’,‘Previous Submission’,‘Custom’) NOT NULL default ‘Unpaid’,
operator_id int(10) NOT NULL default ‘0’,
time_taken decimal(10,2) NOT NULL default ‘0.00’,
trylater int(11) NOT NULL default ‘0’,
skipped int(11) NOT NULL default ‘0’,
url_error int(11) NOT NULL default ‘0’,
link_urls varchar(255) NOT NULL default ‘none’,
link_text varchar(255) NOT NULL default ‘none’,
category_text varchar(255) NOT NULL default ‘none’,
category_index int(10) NOT NULL default ‘0’,
PRIMARY KEY (submit_id),
UNIQUE KEY sd (siteid,did),
KEY registration_date (registration_date),
KEY submission_date (submission_date),
KEY transactionid (transactionid),
KEY siteid (siteid),
KEY did (did),
KEY tdid (tdid)
) ENGINE=MyISAM

It has 4.5 million records and the query that’s been taking long is shown below:

SELECT submit_id
FROM dsitesubmissions AS a, dsitedetails c
WHERE (submission_date IS NULL OR submission_date=‘0000-00-00 00:00:00’
)
AND payment_status = ‘Paid’
AND c.status = ‘Active’
AND DATE_FORMAT( a.registration_date, ‘%Y-%m-%d’ ) <= CURDATE( )
AND c.siteid = a.siteid
AND selection_needed = ‘No’

When I do an explain it shows:

select_type table type possible_keys key ref rows Extra

SIMPLE a ref_or_null sd,submission_date,siteid submission_date const 67587 Using where

SIMPLE c eq_ref PRIMARY PRIMARY inksmax_main.a.siteid 1 Using where

While I think 67000 is a relatively small number of rows out of the 4.6 million rows, I dont know why the query’s been taking so long…could it be a problem that there are inserts/updates also taking place on the same table every few seconds concurrently while the selects are being run? If so, that’s a situation I cannot avoid.

I need a solution fast.

If you’re updating and reading very often, you should consider switching to InnoDB. MyISAM will be using table-level locks to ensure consistency for your updates, whereas InnoDB has row-level locking. If this really is your problem, then you should be able to verify it by doing

SHOW GLOBAL STATUS LIKE ‘Table_locks%’;

If Table_locks_waited is very high, chances are switching to InnoDB will help you out.

Also, MySQL’s query optimizer and the output of explain are only as good as the index statistics kept by MySQL. If those become stale, MySQL starts making really bad decisions about how to execute a query. If the numbers in explain stop making sense, you can run

ANALYZE TABLE

to make sure MySQL’s index statistics are up to date. You should read more about it in the manual, because it may take a little bit of time given your table size and the number of indexes you have defined.