Optimizing a legacy application's query

I’ve been assigned to extend a web based legacy application originally written 5 years ago by someone with less skill than I (still learning here). The system runs on RHEL5, with MySQL 5.0.45. Due to budget constraints the powers that be will not permit the system to be scrapped and/or redesigned (please don’t shoot the messenger).

app details
Test results are automatically imported nightly (avg 50/night). Each test contains over 200 markers, and the genius who put the system together years ago didn’t anticipate growing from 10 markers to 200+. As you guessed, all markers are stored in a single table, with one column per marker. The table uses the INNODB engine.
/app details

It’s a mess.

Nonetheless, I have been charged with implementing new disease detection algorithms. Currently, one or two fields are examined per disease, regardless of patient age or sex.

The new algorithms are based on age and sex - so the number algorithms per disease have increased from 1 to 12.

I’m stuck on a report that counts the number of elevated markers within a given time frame. Since each patient has to be looked at individually (for age & sex), I can no longer SELECT COUNT(*) WHERE marker_1>123 AND marker_2<321; I have to first determine the age & sex, then apply the correct cutoff levels to determine elevation. My first attempt involved a whole lot of AND’s and OR’s in the WHERE clause - it works, but is dreadfully slow.


date_of_test>‘2008-01-01’ AND
date_of_test<‘2008-04-01’ AND

( age_rage=‘0-9’ AND sex=‘m’ AND ( marker_1 < 876 AND marker_2 > 345) ) OR
( age_rage=‘10-19’ AND sex=‘m’ AND ( marker_1 < 824 AND marker_2 > 312) ) OR
( age_rage=‘20-29’ AND sex=‘m’ AND ( marker_1 < 798 AND marker_2 > 311) )

) OR (

( age_rage=‘0-9’ AND sex=‘f’ AND ( marker_1 < 987 AND marker_2 > 465) ) OR
( age_rage=‘10-19’ AND sex=‘f’ AND ( marker_1 < 813 AND marker_2 > 404) ) OR
( age_rage=‘20-29’ AND sex=‘f’ AND ( marker_1 < 701 AND marker_2 > 209) )


This is simplified for example - there are actually 12 variations to check against.

Each column is indexed separately, but since I am basing queries on date_of_test, age_range, sex and marker_xyz, wouldn’t I need multi-column indexes for each possible combination?. If so, I do not have room for that many indexes, and am hence stuck.

Sorry to have written a book here… it helps me to have background when approaching complex problems.

I am glad to provide more details/code… please let me know.

What does explain say ?
Is there an index on the date_of_test ?

this is terrible database design.
i´d start by normalizing the existing database schema.
planning a migration from the old to the new database.
rewrite queries for the new database schema.
that is already a huge improvement.

I seriously believe the cost of redesign will be lower than actually trying to continue developing with this… aberration of a database.
sorry mate.

by the way. use the BETWEEN syntax, its easier to read and avoids problems.

you might consider stored procedures, to hide the complex algorithms youre implementing?