the query contains LIMIT x with x relatively small compared to the full result set
I think this is a very common situation.
I’ve tried this on a fairly recent version, and DISTINCT is still performed by first sorting the results, then removing duplicates, then sorting on the columns specified after ORDER BY, and finally returning the first x rows.
It would make more sense to fetch the rows in the order specified by ORDER BY when an index is available, and check each row against the rows that have already been returned. Especially when x is small, say 10 or 50, and the full result set is in the millions, the performance gain would be substantial. The uniqueness check could be based on a full row comparison or on a checksum.
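For illustration, the kind of query I have in mind looks roughly like this (the table and column names are just made up):

SELECT DISTINCT customer_id, order_date
FROM orders                    -- hypothetical table with millions of rows
ORDER BY order_date DESC       -- an index on order_date exists
LIMIT 10;

With an index on order_date, walking the index in order and skipping rows already returned would touch far fewer rows than sorting and deduplicating the whole result set.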
Sorry, I didn’t really understand that it was a question the first time I read the post.
Do you have a join in the query, or is it on only one table?
The first thought off the top of my head is to create a derived table with, say, 2 times the number of rows you need, so that your DISTINCT should still yield the LIMIT amount of rows.
SELECT DISTINCT …
FROM (
    SELECT …
    FROM yourBigTable
    ORDER BY x
    LIMIT y
) AS temp
ORDER BY z
LIMIT n
That way you can use the ORDER BY ... LIMIT index optimization on the inner query, and only have to sort/condense the smaller derived table.
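For example, filled in with made-up table and column names (the inner LIMIT of 100 is roughly 2x the outer LIMIT of 50, to leave room for duplicates that get condensed):

SELECT DISTINCT customer_id, order_date
FROM (
    SELECT customer_id, order_date
    FROM orders                    -- hypothetical table
    ORDER BY order_date DESC       -- can use an index on order_date
    LIMIT 100                      -- roughly 2x the outer LIMIT
) AS temp
ORDER BY order_date DESC
LIMIT 50;

If the inner 100 rows condense to fewer than 50 distinct rows you would have to raise the inner LIMIT, so the right factor depends on how many duplicates you expect.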
That’s a nice workaround, but it still causes disk I/O if the result set contains TEXT columns (the temporary table then has to go to disk). I was hoping to hear either that MySQL has this functionality and how it can be triggered, or why it does not have it.