ORDER BY … LIMIT Performance Optimization

refer to the article:

ORDER BY … LIMIT Performance Optimization

http://www.mysqlperformanceblog.com/2006/09/01/order-by-limi t-performance-optimization/

I am facing this problem. I am using the correct index, but the problem is on “large LIMIT”, e.g.

ORDER BY … LIMIT 10000,10

we can’t avoid this problem in our application, but are they workaround?

can data partitioning help?


If you have LIMIT 10000,X means MySQL will have to generate and throw away first 10000 rows which is not overly efficient and partitioning is not directly helpful here.

I’d ask the question describing task you’re trying to solve rather than judging it way you can do things… there are usually many ways.

I’m facing a similar problem to the original poster’s, perhaps you can help.

The problim is: I have items that belong to a certain category. You can select which categories to view, and browse through these items by page.
For example, you might view page 1000 of categories 1, 2, 10, and 30.

Sample table:

CREATE TABLE test ( id int(10) unsigned NOT NULL auto_increment, category tinyint(3) unsigned NOT NULL, PRIMARY KEY (id), KEY cat (category)) ENGINE=MyISAM

Sample query to view page 1000 of cateogies 1,2,10,30:

SELECT id FROM test WHERE category IN(1,2,10,30) ORDER BY id DESC LIMIT 10000,10

I have the ORDER BY id DESC because the latest items are added at the end, so page 1 would be the last rows of the table.

I don’t know how to optimize this query - as you said, this kind of query is generating 10,010 rows and throwing away all but 10 of them.

There will ultimately be millions of items, but not hundreds of millions (but there will potentially be 100,000 added daily). There will be less than 200 categories, and the user will need to be able to select any categories they want and browse through the pages.

Any ideas?

For millions (not hundreds of millions) “brute force” approach may work. With 7-byte key length occasional 100k rows index scan isn’t a big deal.
But this of course should be tested with real workloads in mind.

I’d look if this causes performance problems for you.

Alexey is a bit wrong about index scan - LIMIT will actually perform row read and then throw data away, unless it is fully index covered query.

My first suggestion is to limit how far you can go - you can’t go to page number 1000 even in Google. Restricting number of pages you really show works for many cases.

If you absolutely need to show all pages and you only update data once per day - you can pre-generate positions within each of category.