Optimizing top N from large aggregates

joe_mpb · January 27, 2008, 2:23pm

I have a table with 25+ million rows and a query that needs to get the top n ids for varying criteria.
The basic problem is optimizing something like:

SELECT id, SUM(x) xsum
FROM …
WHERE …
GROUP BY id
ORDER BY xsum
LIMIT n

I have a clustered index on id and the fields in the WHERE clause.

Without the ORDER BY this is very fast, due to id being the first column in the clustered index (I think). The problem is when the ORDER BY is added and the result is quite large (1-2 million rows): ordering such a set takes ~10-15 seconds.
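To make the cost concrete: picking the n smallest of m aggregated sums does not in principle need a full sort of all m. The sketch below is a client-side illustration only, not a description of what MySQL does internally. The function name top_n_by_sum and the sample rows are hypothetical.

```python
import heapq
from collections import defaultdict

def top_n_by_sum(rows, n):
    """Return the n (id, xsum) pairs with the smallest xsum, in ascending order."""
    sums = defaultdict(int)
    for row_id, x in rows:
        sums[row_id] += x  # equivalent of GROUP BY id with SUM(x)
    # nsmallest keeps at most n candidates while scanning the groups,
    # which avoids fully sorting every (id, xsum) pair.
    return heapq.nsmallest(n, sums.items(), key=lambda kv: kv[1])

# Hypothetical sample data: (id, x) pairs.
rows = [("a", 5), ("b", 1), ("a", 2), ("c", 4), ("b", 1), ("d", 9)]
result = top_n_by_sum(rows, 2)
print(result)
```

This mirrors ORDER BY xsum LIMIT n in ascending order; for the largest sums, heapq.nlargest would be the counterpart.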

I've tried partitioning the query by turning id into MD5(id) and then doing:

SELECT id, xsum FROM (

(SELECT id, SUM(x) xsum
FROM …
WHERE …
AND id LIKE '0%'
GROUP BY id
ORDER BY xsum LIMIT n)
UNION ALL
(SELECT id, SUM(x) xsum
FROM …
WHERE …
AND id LIKE '1%'
GROUP BY id
ORDER BY xsum LIMIT n)
UNION ALL
.
.
.
) t
ORDER BY xsum
LIMIT n

But though each query in the union is much faster, once I UNION ALL the 16 of them (0-9, a-f) it still takes about the same time.
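For reference, here is a small sketch of the partitioned approach that checks it returns the same rows as the single query. It uses Python's sqlite3 module as a stand-in for MySQL, with a hypothetical table t(id, x) and made-up data, and adds id as a tie-breaker so both results are deterministic. It says nothing about MySQL timings; it only shows that each id lands in exactly one partition and every row is still aggregated once, which seems consistent with the total time staying about the same.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, x INTEGER)")  # hypothetical table

# Made-up data: 200 MD5-hashed ids with 3 rows each.
rows = []
for i in range(200):
    hashed_id = hashlib.md5(str(i).encode()).hexdigest()
    for k in range(3):
        rows.append((hashed_id, (i * 37 + k * 11) % 101))
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

n = 5

# Single query: aggregate everything, then take the top n.
global_top = conn.execute(
    "SELECT id, SUM(x) AS xsum FROM t "
    "GROUP BY id ORDER BY xsum, id LIMIT ?",
    (n,),
).fetchall()

# Partitioned version: top n per hex prefix, then merge the partial results.
partial = []
for prefix in "0123456789abcdef":
    partial += conn.execute(
        "SELECT id, SUM(x) AS xsum FROM t WHERE id LIKE ? "
        "GROUP BY id ORDER BY xsum, id LIMIT ?",
        (prefix + "%", n),
    ).fetchall()
merged_top = sorted(partial, key=lambda r: (r[1], r[0]))[:n]

print(merged_top == global_top)
```

The merge step only ever sorts at most 16 * n rows, so any remaining cost sits in the per-partition aggregation and ordering rather than in the final ORDER BY.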

Is there any way to optimize such a query, given I only need the top n?

MarkRose · January 28, 2008, 5:10am

Have you read the comments on http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html ?

Some of the techniques may apply to the WHERE clauses you are using (whatever they may be).
