SELECT FOR UPDATE using secondary index => 50x times slower than forcing full table scan

Hi all! (Please be kind with my poor english…)

THANK YOU for this blog! :o)

I am developing on my spare time a little CMS storing data in a hierarchical manner. After googling and testing a while, I end up using the “materialized path” solution despite its unefficient storage requirement.

Here is the (simplified) tree table:

create table site_node ( id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT, # — path VARBINARY(32) NOT NULL, children SMALLINT UNSIGNED NOT NULL, parent MEDIUMINT UNSIGNED NOT NULL, alias VARBINARY(32) NOT NULL, # — PRIMARY KEY (id), UNIQUE INDEX ak_path (path), UNIQUE INDEX ak_parent (parent, alias), FOREIGN KEY fk_parent (parent) REFERENCES site_node (id) ON UPDATE CASCADE ON DELETE RESTRICT)ENGINE=InnoDB;

“parent” field is used to select direct children (faster than using path index).

“children” field contains the number of direct children (0 => leaf node).

“path” field contains the node path encoded in hexadecimal, each level is 4 bytes long:

mysql> SELECT id, parent, children, path FROM site_node WHERE path >= “0001” AND path < “0002” ORDER BY path LIMIT 20;±-------±-------±---------±-----------------+| id | parent | children | path |±-------±-------±---------±-----------------+| 2 | 1 | 521 | 0001 || 102 | 2 | 348 | 00010001 || 5128 | 102 | 133 | 000100010001 || 258090 | 5128 | 0 | 0001000100010001 || 261033 | 5128 | 0 | 0001000100010002 || 263014 | 5128 | 0 | 0001000100010003 || 269425 | 5128 | 0 | 0001000100010004 || 272585 | 5128 | 0 | 0001000100010005 || 283558 | 5128 | 0 | 0001000100010006 || 284133 | 5128 | 0 | 0001000100010007 || 284925 | 5128 | 0 | 0001000100010008 || 293219 | 5128 | 0 | 0001000100010009 || 309769 | 5128 | 0 | 000100010001000A || 311042 | 5128 | 0 | 000100010001000B || 316655 | 5128 | 0 | 000100010001000C || 337223 | 5128 | 0 | 000100010001000D || 339034 | 5128 | 0 | 000100010001000E || 346686 | 5128 | 0 | 000100010001000F || 348222 | 5128 | 0 | 0001000100010010 || 363623 | 5128 | 0 | 0001000100010011 |±-------±-------±---------±-----------------+20 rows in set (0.00 sec)

As expected, retrieving or counting all children from a node is trivial:

Cold:mysql> SELECT COUNT() FROM site_node WHERE path >= “0001” AND path < “0002” ORDER BY path;±---------+| COUNT() |±---------+| 601591 |±---------+1 row in set (13.07 sec)Warm:mysql> SELECT COUNT() FROM site_node WHERE path >= “0001” AND path < “0002” ORDER BY path;±---------+| COUNT() |±---------+| 601591 |±---------+1 row in set (0.55 sec)

Here is my problem:

when I need to move or shift a node, I must lock all affected nodes before updating them. For instance, before shifting node id #2 from rank 1 to rank 2 and updating 601591 rows, I lock the whole subtree :

Note: should be path < “0003” but you get the idea.

mysql> EXPLAIN SELECT COUNT(*) FROM site_node WHERE path >= “0001” AND path < “0002” ORDER BY path FOR UPDATE;±—±------------±----------±------±--------------±--------±--------±-----±-------±-------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±----------±------±--------------±--------±--------±-----±-------±-------------------------+| 1 | SIMPLE | site_node | range | ak_path | ak_path | 32 | NULL | 902702 | Using where; Using index |±—±------------±----------±------±--------------±--------±--------±-----±-------±-------------------------+1 row in set (0.00 sec)

Index range scan should be fast (Handler_read_next going up) but it is not :

SELECT FOR UPDATE:mysql> SELECT COUNT() FROM site_node WHERE path >= “0001” AND path < “0002” ORDER BY path FOR UPDATE;±---------+| COUNT() |±---------+| 601591 |±---------+1 row in set (20 min 41.33 sec)Note: using SELECT COUNT(id) does not improve anything.Innodb output:LIST OF TRANSACTIONS FOR EACH SESSION:—TRANSACTION 0 36156404, ACTIVE 1208 sec, process no 20386, OS thread id 1448221616 fetching rows, thread declared inside InnoDB 136mysql tables in use 1, locked 126110 lock struct(s), heap size 1944896MySQL thread id 629, query id 30002552 localhost root Sending dataSELECT COUNT(*) FROM site_node WHERE path >= “0001” AND path < “0002” ORDER BY path FOR UPDATEBUFFER POOL AND MEMORY----------------------Total memory allocated 305426468; in additional pool allocated 1682688Buffer pool size 16384Free buffers 0Database pages 16289Modified db pages 1Pending reads 1Pending writes: LRU 0, flush list 0, single page 0Pages read 1367173, created 54631, written 605910228.55 reads/s, 0.00 creates/s, 0.00 writes/sBuffer pool hit rate 883 / 1000ROW OPERATIONS--------------1 queries inside InnoDB, 0 queries in queueMain thread process no. 20386, id 1439226800, state: waiting for server activityNumber of rows inserted 5000000, updated 10000000, deleted 0, read 725714210.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 598.03 reads/s

What I do not understand is that using a read lock is much faster:

SELECT LOCK IN SHARE MODE:mysql> SELECT COUNT() FROM site_node WHERE path >= “0001” AND path < “0002” ORDER BY path LOCK IN SHARE MODE;±---------+| COUNT() |±---------+| 601591 |±---------+1 row in set (13.57 sec)

After scratching my head a while, I tried this:

Full index scan:mysql> EXPLAIN SELECT COUNT() FROM site_node USE INDEX(PRIMARY) WHERE id != 0 AND path >= “0001” AND path < “0002” ORDER BY path FOR UPDATE;±—±------------±----------±------±--------------±--------±--------±-----±--------±------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±----------±------±--------------±--------±--------±-----±--------±------------+| 1 | SIMPLE | site_node | range | PRIMARY | PRIMARY | 3 | NULL | 2501352 | Using where |±—±------------±----------±------±--------------±--------±--------±-----±--------±------------+1 row in set (0.04 sec)mysql> SELECT COUNT() FROM site_node USE INDEX(PRIMARY) WHERE id != 0 AND path >= “0001” AND path < “0002” ORDER BY path FOR UPDATE;±---------+| COUNT(*) |±---------+| 601591 |±---------+1 row in set (23.97 sec)Yep! Much better. :o|

And this:

Full table scan:mysql> EXPLAIN SELECT COUNT() FROM site_node USE INDEX(PRIMARY) WHERE path >= “0001” AND path < “0002” ORDER BY path FOR UPDATE;±—±------------±----------±-----±--------------±-----±--------±-----±--------±------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±----------±-----±--------------±-----±--------±-----±--------±------------+| 1 | SIMPLE | site_node | ALL | NULL | NULL | NULL | NULL | 5002703 | Using where |±—±------------±----------±-----±--------------±-----±--------±-----±--------±------------+1 row in set (0.00 sec)mysql> SELECT COUNT() FROM site_node USE INDEX(PRIMARY) WHERE path >= “0001” AND path < “0002” ORDER BY path FOR UPDATE;±---------+| COUNT(*) |±---------+| 601591 |±---------+1 row in set (20.21 sec)Sigh… Better again…

The “site_node” table contains 5 millions rows and is arround 1Go (Data 396,288 Mo, Index 525,392 Mo, Total 921,680 Mo).

MySQL is running on a tiny box:

  • CPU 3Ghz
  • RAM 512 Mo
  • HDD 80Go (IDE, 55 Mo/s)

skip-networkingtable_cache = 256innodb_file_per_table = 1innodb_buffer_pool_size = 256M (I cannot raise it up…)innodb_additional_mem_pool_size = 8Minnodb_log_file_size = 256Minnodb_log_buffer_size = 8Minnodb_flush_log_at_trx_commit = 2

Could it be because ak_path index does not fit in buffer pool?

Note: update speed is quite normal (15 000 - 20 000 rows/s).

I have also tried to use “path” field as PRIMARY KEY hoping that it would be faster using a clustered index but in this case total index size grows up and update speed is slower because of B-TREE rebalancing.

Thank you for your help!

PS: No, I do not plan to use nested sets… ;o) (not yet!)

Interesting.

Take a look at handler stats when you run this query with LOCK IN SHARE MODE and SELECT FOR UPDATE

I do not see any difference in how locking is done - it should be about same in both cases.

Hi again!

Watching “mysqladmin extended-status -ri 10” did not help:

  • Handler_read_next goes up when using index (and index scan)
  • Handler_read_rnd_next goes up when forcing table scan

The only difference is that SELECT … FOR UPDATE when using ak_path index is very slow while SELECT … LOCK IN SHARE MODE runs “normaly”.

The buffer pool is small so these three queries are all IO bounded: the HDD led confirms that the hard drive is really having a bad time… ;o)

Running “iostat -xk 10”:SELECT … LOCK IN SHARE MODEDevice: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %utilhda 0.00 3.50 119.68 0.80 3931.47 12.19 1965.73 6.09 32.73 1.16 9.19 7.67 92.37hda 0.00 2.00 117.12 1.30 3781.38 10.81 1890.69 5.41 32.02 1.22 10.71 7.87 93.17hda 0.00 4.09 120.96 1.50 3877.05 16.57 1938.52 8.28 31.80 1.39 11.39 7.56 92.58hda 0.00 3.60 136.40 1.30 4444.80 15.20 2222.40 7.60 32.39 1.24 8.98 6.70 92.28hda 0.00 3.60 129.90 1.20 4236.80 14.40 2118.40 7.20 32.43 1.19 9.08 6.88 90.20InnoDB output (50 000 reads/s) <= index scan, sounds good1 pending preads, 0 pending pwrites120.93 reads/s, 16858 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s118.93 reads/s, 17029 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s126.62 reads/s, 17694 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s# -----SELECT … FOR UPDATEDevice: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %utilhda 0.00 3.70 240.54 1.40 8760.76 16.82 4380.38 8.41 36.28 1.75 7.22 4.07 98.47hda 0.00 3.60 199.70 1.20 6818.78 14.39 3409.39 7.19 34.01 1.34 6.66 4.89 98.16hda 0.00 3.60 234.43 1.30 8798.40 14.61 4399.20 7.31 37.39 1.53 6.48 4.18 98.51hda 0.00 3.60 238.36 1.40 9356.24 15.38 4678.12 7.69 39.09 1.58 6.59 4.10 98.31hda 0.00 3.60 237.10 1.20 8820.80 15.20 4410.40 7.60 37.08 1.63 6.68 4.11 98.01InnoDB output (500 reads/s) <= index scan that sucks, but why?1 pending preads, 0 pending pwrites215.17 reads/s, 20876 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s179.11 reads/s, 18670 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s217.67 reads/s, 21741 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s235.24 reads/s, 20179 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s# -----SELECT [INDEX & TABLE SCAN] … FOR UPDATE => InnoDB output: 250 000 reads/sDevice: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %utilhda 7.69 3.60 185.51 1.70 43442.96 17.18 21721.48 8.59 232.14 0.61 3.23 2.96 55.34hda 6.80 3.60 177.20 1.20 42112.00 14.40 21056.00 7.20 236.13 0.57 3.19 2.87 51.20hda 6.41 3.60 171.57 1.70 39933.53 16.02 19966.77 8.01 230.56 0.54 3.09 2.87 49.78hda 8.30 3.60 190.70 1.40 44837.40 16.00 22418.70 8.00 233.49 0.65 3.38 2.94 56.53hda 6.09 3.60 160.64 1.40 38005.19 15.98 19002.60 7.99 234.64 0.51 3.18 2.84 45.97InnoDB output (500 000 reads/s) <= at least a good news, table scan rocks… ;o)(I forgot to copy and paste the data…)# -----hdparm /dev/hda: multcount = 16 (on) IO_support = 3 (32-bit w/sync) unmaskirq = 0 (off) using_dma = 1 (on) keepsettings = 0 (off) readonly = 0 (off) readahead = 256 (on) geometry = 65535/16/63, sectors = 156301488, start = 0hdparm -Tt /dev/hda: Timing cached reads: 2504 MB in 2.00 seconds = 1249.69 MB/sec Timing buffered disk reads: 158 MB in 3.00 seconds = 52.60 MB/sec Timing cached reads: 2524 MB in 2.00 seconds = 1261.56 MB/sec Timing buffered disk reads: 166 MB in 3.02 seconds = 54.96 MB/sec Timing cached reads: 2516 MB in 2.00 seconds = 1257.56 MB/sec Timing buffered disk reads: 154 MB in 3.02 seconds = 51.02 MB/sec

The query cache is disabled and I ran each query after restarting MySQL. I will try ASAP using ak_path as PRIMARY KEY or/and with less rows in order to get the whole table in the buffer pool.

There is no speed difference between PRIMARY KEY Index scan and full table scan. I think this is because rows are physically ordered by id.

Is there some pages in MySQL documention that explains how InnoDB row locking internally works? Does it need to read the whole row or just the index?

Did I miss something or is it just “normal”?

I asked Heikki about it and he filed the bug report

http://bugs.mysql.com/bug.php?id=25914

Basically SELECT FOR UPDATE can’t be “using index” because it needs to locks rows as well. Standard select as well as LOCK IN SHARE MODE can.

Grmph… :o(

Note: if it helps, I use MySQL 4.1 (stable) on Debian (Sarge).

This is getting a little “technical” for me but I would like to be sure I really understand what is going on.

The optimizer says it will use an index (PRIMARY, secondary, whatever) when SELECT … FOR UPDATE locking is requeted but in reality, MySQL do not use it. Right?

Suggested fix:
The optimizer should be aware that a SELECT … FOR UPDATE query does not use a ‘key read’.

Sorry for the stupid question but what does this means “in real life”? I mean, what is doing MySQL if it does not use an index?

Does the ORDER BY clause still works or are rows locked in another/unknown order?


If SELECT … FOR UPDATE is “unusable”, when shifting/moving a node, can/should I try this instead:

  1. BEGIN2) SELECT COUNT(*) WHERE ORDER BY path LOCK IN SHARE MODE3) do some PHP stuff (but do it fast…)4) UPDATE SET path = XXX WHERE 5) COMMIT

Is it “cheap” to upgrade from a READ lock to a WRITE lock? Is there some gotchas to watch for when not using directly a WRITE lock?

Rows are always locked in the same order (ORDER BY path ASC) but is it the right way to avoid/minimize deadlocks?

Sorry for all these questions and many thanks for taking the time to help me! :o)

Sorry. It should be “using index”

It does uses the index but with other selects it only can use index without touching the row data - this is why it is fast.

With SELECT FOR UPDATE it has to read the row data as well this is what slows it down.

At this stage full table scan becomes faster because it is sequential IO.

Great! Now I can die in peace… :o)

Many, many thanks for your help!

[Going back “playing” with this awesome “toy”…]