MYISAM on SSD and Fragmentation

My application mainly queries a handful of tables which are updated (replaced actually) once per week. These are effectively WORM (write once read many) tables. Naturally I thought they would be perfect candidates for SSD storage as well as the MYISAM storage engine. The main advantage of SSD is blisteringly fast reads thanks to 0 seek time. The main disadvantages are that the drives can only handle a limited number of writes per lifetime of the drive, and that lots of small random writes tend to perform poorly (compared to reads). In addition, SSD drives are still pretty expensive per GB of space, luckily my tables total < 40GB or so. So WORM tables easily leverage all of the advantages and (mostly) avoid all the disadvantages of SSD. Likewise the low overhead of MYISAM tables when there are absolutely no writes/transactions/locks/etc occurring makes that table engine a great choice as well. Also note that no writes means no fragmentation once the table is written to disk.

Situation:

Large MYISAM table (>200M records): “roTable” for this example

Fixed width records (65 byte rows)

Primary (unique) key on one column: “col1” for this example

Non-unique key on one other column: “col2” for this example

Write the entire table once (weekly), high-volume read-only thereafter

MYISAM files on a SSD drive: ext3, “noatime” set as well as “noop” scheduler

MySQL Query Cache disabled/off

Queries (%100 selects) fit one of these two situations:

  1. Get a single record using primary key.

SELECT * FROM roTable WHERE col1 = ‘blah’

These obviously perform great.

  1. Get a set of records using a range of non-unique key values.

SELECT * FROM roTable WHERE col2 like ‘ABC%’

Mostly performs fine… However, some of these queries (surprisingly not the ones that retrieve the most records) would take many seconds(!) to complete. The offending queries seemed random, but were reproducibly slow. Then, I tried those same queries getting “count()" instead of "”.

SELECT count(*) FROM roTable WHERE col2 like ‘ABC%’

These would return instantly. So, reading the index (MYI file) was working fine, it was the act of getting data from the disk (MYD file) that was taking so long. Even though it is on SSD!

Solution:

When building the table (which I do weekly) I explicitly ordered the records by col2 (the only column used to obtain multiple records), like this:

INSERT INTO roTable SELECT * FROM sourceTable ORDER BY col2

Now, range queries on the non-unique key that were taking over 10 seconds complete in about 1/100th the time:

45992 rows in set (0.09 sec)

Note: record retrieval by primary key is unaffected by this sorting (still super fast).

Conclusion:

WORM data in MYISAM tables on SSD works great, but if you do queries for multiple records, it may be worth sorting your data (if you can) to get that sub-second response time you’d expect. It seems like fragmentation of data on disk has a large impact on responsiveness of (multi-row) queries, even when using SSD.