Not the answer you need?
Register and ask your own question!

Slower than expected primary key based query

sergiu.hlihorsergiu.hlihor ContributorCurrent User Role Novice
Hello
I have a simple query like
SELECT * FROM `database`.`table` ORDER by `primaryKeyColumn` ASC LIMIT 5000000,1;

From my understanding the primary keys are stored as clustered indexes and should be sorted, therefore allowing very fast lookup. Therefore I would have expected an almost instant execution time. However what I have found is that execution time grows exponentially with limit size (factor about 1.3 once limit gets in millions range). For my machine for example I have:
~ 1 second for LIMIT 5000000,1
~ 12.8 seconds for LIMIT 50000000,1
~ 28 seconds for LIMIT 100000000,1

Is this a performance bug? or there are some underlying limits that prevents the engine to really use a sorted primary key. To be mentioned that execution times, surprisingly were almost identical when used both TokuDB and InnoDB, this on Percona MySQL 8.0.15.5, storage medium being an Intel Optane SSD, table size, about 120M.

Also, interesting, MySQL 5.6 appears to be faster by about 20% for this specific query

Comments

  • vadimtkvadimtk Contributor Percona Staff Role
    Primary Key lookups are very fast, but your queries are not lookups, but scans.
    So your first query basically scans 5000000 records to retrieve 1 record, and it takes 1 sec.

    Your second query scans 50000000 records (10 times more), and unsurprisingly it takes 10 times more to execute - 12.8 sec
  • sergiu.hlihorsergiu.hlihor Contributor Current User Role Novice
    Maybe my understanding was wrong, but if primary keys are sorted and clustered, then the complexity for finding position 5000000 is logN (binary search), since keys are sorted already. If clustered and having clusters with fixed sizes, then it's again easier. So such a query does not perform binary search but linear search even though the keys are sorted?
  • vadimtkvadimtk Contributor Percona Staff Role
    You have to use the condition WHERE PK=<value> in order for the fast lookup. Finding by position does not work this way
  • sergiu.hlihorsergiu.hlihor Contributor Current User Role Novice
    Still, for such queries I would have expected a PK optimization. Since key relative position is indeed expensive to keep, a walk through clusters of keys and just counting until reaching the desired range would have been way faster. This would reduce the complexity to N/keysPerBlock, which should have been way faster than 5M/s for my table where row size is 64 bytes, so there is room for improvement. Thanks for clarifying.
Sign In or Register to comment.

MySQL, InnoDB, MariaDB and MongoDB are trademarks of their respective owners.
Copyright ©2005 - 2020 Percona LLC. All rights reserved.