i have a table with 25+ million rows and i have a query that needs to get the top n ids for varying criteria,
the basic problem is optimizing something like:
SELECT id, SUM(x) xsum
WHERE …
GROUP BY id
ORDER BY xsum
LIMIT n
i have a clustered index on id and the fields in the where clause
without the ORDER BY, this is very fast, due to id being the first index in the cluster (i think), the problem is when the ORDER BY is added and the result is quite large (1-2 million), ordering such a set takes ~10-15 seconds
ive tried partitioning the query by turning id into MD5(id) and then doing
SELECT id, SUM(x) xsum FROM (
(SELECT id, SUM(x) xsum
WHERE …
AND id LIKE ‘0%’
GROUP BY id
order by xsum limit n)
UNION ALL
(SELECT id, SUM(x) xsum
WHERE …
AND id LIKE ‘1%’
GROUP BY id
order by xsum limit n)
UNION ALL
.
.
.
)
ORDER BY xsum
LIMIT n
but though each query in the union is much faster, once i union all 16 (0-9a-f) it still takes about the same time
is there anyway to optimize such a query given i only need the top n?