What to do when you are IO Bound?

I have an InnoDB type instance with many million rows:

+-----------------+--------------+------+-----+---------------------+----------------+
| Field           | Type         | Null | Key | Default             | Extra          |
+-----------------+--------------+------+-----+---------------------+----------------+
| id              | int(11)      | NO   | PRI | NULL                | auto_increment |
| title           | varchar(255) | YES  |     | NULL                |                |
| description     | text         | YES  |     | NULL                |                |
| full_story      | text         | YES  |     | NULL                |                |
| uri             | varchar(255) | YES  | MUL | NULL                |                |
| pubdate         | datetime     | YES  |     | 0000-00-00 00:00:00 |                |
| pubdate_reverse | int(9)       | YES  |     | NULL                |                |
| feed_id         | int(5)       | YES  | MUL | NULL                |                |
+-----------------+--------------+------+-----+---------------------+----------------+

I’m doing an ETL on this data to externalize some of it, but the extract process is taking forever and I’ve determined it’s IO bound.

What can I do to speed things up? I really want to dump the entire table contents to files. But I wanted to do it in individual files that made the transform and load easy - so I was querying and generating a file - in a loop.

Here is an example query:

Select id, title, uri, pubdate from entries where feed_id = 151 order by pubdate_reverse limit 1000;

1000 rows in set (3 min 34.11 sec)

Yes, that’s 3+ minutes for 1000 rows. I’m iterating on the value of feed_id, increasing it for each query.

Explain the query:

+----+-------------+---------+------+-------------------+-------------------+---------+-------+-------+-------------+
| id | select_type | table   | type | possible_keys     | key               | key_len | ref   | rows  | Extra       |
+----+-------------+---------+------+-------------------+-------------------+---------+-------+-------+-------------+
|  1 | SIMPLE      | entries | ref  | index_feed_recent | index_feed_recent | 5       | const | 18032 | Using where |
+----+-------------+---------+------+-------------------+-------------------+---------+-------+-------+-------------+

Seems fine.

Looking at iostat while the query is running:

avg-cpu: %user %nice %system %iowait %steal %idle
0.51 0.00 0.20 54.55 0.00 44.73

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdb 255.00 221.80 70.60 49.00 2580.80 2159.00 39.63 3.88 32.34 7.66 91.60
sdc 235.20 216.80 65.40 47.80 2360.00 2104.60 39.44 3.82 33.53 7.23 81.80
dm-0 0.00 0.00 628.00 535.20 5024.00 4263.60 7.98 39.04 33.41 0.85 99.20

I see 91-99% disk utilization which makes me think the process is IO bound.

Question is: what now?

Jim

If you can, get rid of the ORDER BY. It doesn’t really matter that you’re limiting the output result set to 1000 rows, since MySQL may still need to sort the entire matching set before it can apply the limit. Your EXPLAIN plan should show “Using filesort” in the Extra column if this is the case. If you must keep the ordering, try creating one of the situations described in the MySQL docs under LIMIT optimization.
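As a sketch, using the table and column names from the post above: drop the ORDER BY so rows can stream straight off the feed_id index, and check the ordered query’s plan for a filesort before assuming the sort is free:

```sql
-- Without ORDER BY, MySQL can read rows off the feed_id index
-- as it finds them, with no sort pass:
SELECT id, title, uri, pubdate
FROM entries
WHERE feed_id = 151
LIMIT 1000;

-- Check the original query's plan; "Using filesort" in the Extra
-- column means the whole matching set is sorted before the LIMIT:
EXPLAIN SELECT id, title, uri, pubdate
FROM entries
WHERE feed_id = 151
ORDER BY pubdate_reverse
LIMIT 1000;
```

Note that your posted EXPLAIN shows only “Using where”, so index_feed_recent may already be satisfying the sort; if so, the filesort isn’t your bottleneck and the time is going into reading the row data itself.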

Also, you may just want to dump all the data at once rather than trying to page through it. Just SELECT ... INTO OUTFILE with your ORDER BY; it shouldn’t take much longer than three minutes to dump all the data if I’m right about it doing a full sort before applying the limit.
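Something like the following (the output path is just an example; the file is written by the MySQL server process, so the directory must be writable by mysqld):

```sql
-- One-pass dump of the whole table, ordered for the load step.
-- '/tmp/entries.tsv' is a placeholder path.
SELECT id, title, uri, pubdate, feed_id
INTO OUTFILE '/tmp/entries.tsv'
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
FROM entries
ORDER BY feed_id, pubdate_reverse;
```

You can then split the single file into per-feed files locally (awk or a short script keyed on the feed_id column), which moves the partitioning work off the database entirely.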

You should also remove any keys you don’t need, such as on your text columns. Indices over large columns are massive and add a lot of overhead if you never query on them. I’d remove them from the two varchar columns as well, unless something actually searches on those.
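For instance (the index name below is an assumption; your schema only shows which columns are keyed, so list the real names first):

```sql
-- See what indexes actually exist and what they cover:
SHOW INDEX FROM entries;

-- Drop one you don't query on; 'uri' here is an assumed index name
-- taken from the MUL key on that column:
ALTER TABLE entries DROP INDEX uri;
```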

Also, as you haven’t mentioned it, have you set innodb_buffer_pool_size to something appropriate?
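For reference, this is a my.cnf setting; the 2G figure below is only an example — a common rule of thumb is 50–80% of RAM on a machine dedicated to InnoDB:

```ini
# my.cnf -- example value; size this to your machine's RAM
[mysqld]
innodb_buffer_pool_size = 2G
```

If the working set fits in the buffer pool, repeated queries like yours stop hitting disk at all, which is exactly what your iostat numbers say you need.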