Slower than expected primary key based query

sergiu.hlihor · May 9, 2019, 5:52pm

Hello,
I have a simple query like:
SELECT * FROM database.table ORDER BY primaryKeyColumn ASC LIMIT 5000000,1;

From my understanding, primary keys are stored as clustered indexes and are kept sorted, which should allow very fast lookups. I would therefore have expected an almost instant execution time. However, I have found that execution time grows exponentially with the LIMIT offset (by a factor of about 1.3 once the offset reaches the millions range). On my machine, for example, I get:
~ 1 second for LIMIT 5000000,1
~ 12.8 seconds for LIMIT 50000000,1
~ 28 seconds for LIMIT 100000000,1

Is this a performance bug, or are there underlying limits that prevent the engine from really using a sorted primary key? Note that execution times were, surprisingly, almost identical with both TokuDB and InnoDB. This is on Percona MySQL 8.0.15.5, the storage medium is an Intel Optane SSD, and the table size is about 120M.

Also, interestingly, MySQL 5.6 appears to be about 20% faster for this specific query.

vadimtk · May 10, 2019, 7:58am

Primary key lookups are very fast, but your queries are scans, not lookups.
Your first query basically scans 5000000 records to retrieve 1 record, and it takes 1 sec.

Your second query scans 50000000 records (10 times more) and, unsurprisingly, takes roughly 10 times longer to execute: 12.8 sec.
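
As a rough check of the scan explanation, the timings reported in the first post can be turned into rows skipped per second. This is only arithmetic on the three numbers quoted above; the function name `rows_per_second` is made up for the illustration.

```python
# Timings reported in the original post: (rows skipped by the LIMIT offset, seconds).
timings = [(5_000_000, 1.0), (50_000_000, 12.8), (100_000_000, 28.0)]

def rows_per_second(offset, seconds):
    """Average number of rows skipped per second for one measurement."""
    return offset / seconds

rates = [rows_per_second(offset, seconds) for offset, seconds in timings]

for (offset, seconds), rate in zip(timings, rates):
    print(f"offset {offset:>11,}: {seconds:5.1f} s, {rate / 1e6:.2f} M rows/s")
```

The rate stays in the same range (about 5.0, 3.9 and 3.6 million rows per second), which is what a scan whose cost is roughly proportional to the offset would look like, with some slowdown at larger offsets, rather than exponential growth.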

sergiu.hlihor · May 10, 2019, 8:25am

Maybe my understanding was wrong, but if primary keys are sorted and clustered, then the complexity of finding position 5000000 is O(log N) (binary search), since the keys are already sorted. If they are clustered into clusters of fixed size, it is easier still. So such a query performs a linear search rather than a binary search, even though the keys are sorted?

vadimtk · May 10, 2019, 8:34am

You have to use a WHERE PK= condition to get the fast lookup. Finding a row by position does not work this way.
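
The difference between access by position and access by key can be sketched with a small in-memory table. This uses Python's built-in `sqlite3` module purely as a stand-in, not MySQL, InnoDB or TokuDB; the table `t`, its columns and the three helper functions are hypothetical. The last helper shows a common workaround that is not mentioned in the thread: remembering the last key seen and continuing from it with a WHERE condition on the primary key.

```python
import sqlite3

# SQLite in memory is only a stand-in here; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
# Keys have gaps (10, 20, 30, ...), so a row's position cannot be derived from its key.
conn.executemany(
    "INSERT INTO t (id, payload) VALUES (?, ?)",
    ((i * 10, f"row-{i}") for i in range(1, 10_001)),
)

def by_position(offset):
    # Positional access: the engine walks past `offset` rows before returning one.
    return conn.execute(
        "SELECT id, payload FROM t ORDER BY id ASC LIMIT 1 OFFSET ?", (offset,)
    ).fetchone()

def by_key(key):
    # Key access: an index descent straight to the row with this primary key.
    return conn.execute(
        "SELECT id, payload FROM t WHERE id = ?", (key,)
    ).fetchone()

def next_after(last_key):
    # Continue from a known key instead of counting rows from the start.
    return conn.execute(
        "SELECT id, payload FROM t WHERE id > ? ORDER BY id ASC LIMIT 1", (last_key,)
    ).fetchone()
```

All three calls `by_position(5000)`, `by_key(50010)` and `next_after(50000)` return the same row, but only the first has to step over 5000 rows to find it. The key-based forms require knowing a key value, which is exactly what a pure position does not provide.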

sergiu.hlihor · May 11, 2019, 12:12am

Still, for such queries I would have expected a PK optimization. Since each key's relative position is indeed expensive to maintain, walking through the clusters of keys and just counting until the desired range is reached would have been much faster. This would reduce the complexity to N/keysPerBlock, which should be much faster than 5M/s for my table, where the row size is 64 bytes, so there is room for improvement. Thanks for clarifying.
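
The N/keysPerBlock idea from this post can be illustrated with a toy model. This is a sketch of the proposal only, not of how InnoDB or TokuDB actually store or scan data: leaf pages are plain Python lists, every helper name is hypothetical, and the figure of 256 keys per page assumes 64-byte rows in a 16 KiB page with all page and row overhead ignored.

```python
def build_pages(keys, keys_per_page):
    """Split sorted keys into fixed-capacity leaf pages (toy clustered index)."""
    return [keys[i:i + keys_per_page] for i in range(0, len(keys), keys_per_page)]

def seek_row_by_row(pages, offset):
    """Skip `offset` rows one at a time; returns (key, rows_examined)."""
    examined = 0
    for page in pages:
        for key in page:
            if examined == offset:
                return key, examined
            examined += 1
    raise IndexError("offset beyond end of table")

def seek_page_by_page(pages, offset):
    """Skip whole pages using their row counts; returns (key, pages_visited)."""
    remaining = offset
    for visited, page in enumerate(pages, start=1):
        if remaining < len(page):
            return page[remaining], visited
        remaining -= len(page)
    raise IndexError("offset beyond end of table")

keys = list(range(0, 200_000, 2))             # 100,000 sorted keys with gaps
pages = build_pages(keys, keys_per_page=256)  # assumed: 16 KiB page / 64-byte rows
```

For an offset of 50,000, `seek_row_by_row` examines 50,000 rows while `seek_page_by_page` visits 196 pages, and both arrive at the same key. The model assumes each page's row count can be trusted as the number of rows the query should skip, which a real storage engine may not be able to assume.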
