I’ve been assigned to extend a web based legacy application originally written 5 years ago by someone with less skill than I (still learning here). The system runs on RHEL5, with MySQL 5.0.45. Due to budget constraints the powers that be will not permit the system to be scrapped and/or redesigned (please don’t shoot the messenger).
app details
Test results are automatically imported nightly (avg 50/night). Each test contains over 200 markers, and the genius who put the system together years ago didn’t anticipate growing from 10 markers to 200+. As you guessed, all markers are stored in a single table, with one column per marker. The table uses the INNODB engine.
/app details
It’s a mess.
Nonetheless, I have been charged with implementing new disease detection algorithms. Currently, one or two fields are examined per disease, regardless of patient age or sex.
The new algorithms are based on age and sex - so the number algorithms per disease have increased from 1 to 12.
I’m stuck on a report that counts the number of elevated markers within a given time frame. Since each patient has to be looked at individually (for age & sex), I can no longer SELECT COUNT(*) WHERE marker_1>123 AND marker_2<321; I have to first determine the age & sex, then apply the correct cutoff levels to determine elevation. My first attempt involved a whole lot of AND’s and OR’s in the WHERE clause - it works, but is dreadfully slow.
(ie:
WHERE
date_of_test>‘2008-01-01’ AND
date_of_test<‘2008-04-01’ AND
(
( age_rage=‘0-9’ AND sex=‘m’ AND ( marker_1 < 876 AND marker_2 > 345) ) OR
( age_rage=‘10-19’ AND sex=‘m’ AND ( marker_1 < 824 AND marker_2 > 312) ) OR
( age_rage=‘20-29’ AND sex=‘m’ AND ( marker_1 < 798 AND marker_2 > 311) )
) OR (
( age_rage=‘0-9’ AND sex=‘f’ AND ( marker_1 < 987 AND marker_2 > 465) ) OR
( age_rage=‘10-19’ AND sex=‘f’ AND ( marker_1 < 813 AND marker_2 > 404) ) OR
( age_rage=‘20-29’ AND sex=‘f’ AND ( marker_1 < 701 AND marker_2 > 209) )
)
This is simplified for example - there are actually 12 variations to check against.
Each column is indexed separately, but since I am basing queries on date_of_test, age_range, sex and marker_xyz, wouldn’t I need multi-column indexes for each possible combination?. If so, I do not have room for that many indexes, and am hence stuck.
Sorry to have written a book here… it helps me to have background when approaching complex problems.
I am glad to provide more details/code… please let me know.