the query contains LIMIT x with x relatively small compared to the full result set
I think this is a very common situation.
I’ve tried this on a fairly recent version, and DISTINCT is still performed by first sorting the results, then removing duplicates, then sorting on the columns specified after ORDER BY, and finally returning the first x rows.
It would make more sense to fetch the rows in the order specified by ORDER BY when an index is available, and check each row against the rows that have already been returned. Especially when x is small, say 10 or 50, and the full result set is in the millions, the performance gain would be substantial. The uniqueness check could be based on a full row comparison or on a checksum.
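For illustration, the kind of query I have in mind looks roughly like this (the table and column names are just made up):

SELECT DISTINCT customer_id, order_date
FROM orders                    -- hypothetical table with millions of rows
ORDER BY order_date DESC       -- an index on order_date exists
LIMIT 10;

With an index on order_date, walking the index in order and skipping rows already returned would touch far fewer rows than sorting and deduplicating the whole result set.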
Sorry, I didn’t really understand that it was a question the first time I read the post.
Do you have a join in the query, or is it on only one table?
The first thought off the top of my head is to create a derived table with, say, 2 times the number of rows you need, so that your DISTINCT should still yield the LIMIT amount of rows.
SELECT DISTINCT …
FROM (
    SELECT …
    FROM yourBigTable
    ORDER BY x
    LIMIT y
) AS temp
ORDER BY z
LIMIT n
That way you can use the ORDER BY ... LIMIT index optimization on the inner query, and only have to sort/condense the smaller derived table.
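For example, filled in with made-up table and column names (the inner LIMIT of 100 is roughly 2x the outer LIMIT of 50, to leave room for duplicates that get condensed):

SELECT DISTINCT customer_id, order_date
FROM (
    SELECT customer_id, order_date
    FROM orders                    -- hypothetical table
    ORDER BY order_date DESC       -- can use an index on order_date
    LIMIT 100                      -- roughly 2x the outer LIMIT
) AS temp
ORDER BY order_date DESC
LIMIT 50;

If the inner 100 rows condense to fewer than 50 distinct rows you would have to raise the inner LIMIT, so the right factor depends on how many duplicates you expect.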
That’s a nice workaround, but it still causes disk I/O if the result set contains TEXT columns (the temporary table then has to go to disk). I was hoping to hear either that MySQL has this functionality and how it can be triggered, or why it does not have it.