I asked this question on IRC, but the best anyone could tell me was “read the source”, which to me is read as “nobody here knows”. That’s unusual, so I thought I’d try here (being that my pointer math and such is sure to be rusty).
The question is mostly academic at this point, as I’ve begun scripting some test scenarios - but it would be nice to have this knowledge so I can know what I’m working with.
I have a database which contains several large tables. One table has 20 million rows. All of the tables are MyISAM tables, and they are mostly poorly indexed (I inherited this database, btw). I came up with a solution whereby all queries on the very big table are at least using indexes, and some of the queries are index-covered. In the end, I’ve reached a balance of decent performance without adding too much index overhead. The biggest win is that the new indexes will fit just about entirely in memory, so there’s much less I/O (this db used to cause %iowait numbers in the 90’s).
The next step in a larger migration to a more sane database strategy is dropping indexes that aren’t (nay, can’t) be used, and adding indexes that can/will be used. Although the queries I’ve run in testing prove that SELECTS will be much, much faster, I have not done testing using concurrent INSERT statements. My question at a higher level is:
“How do concurrent inserts affect my stellar select performance, given that that relies on the indexes mostly staying in memory?”
At a lower level, the question is:
“How does MySQL handle updating indexes that are in memory?”
The particular index in question is very large itself, and is expensive to load or do disk-based scans on, so if it has to flush the index to disk to update it and then reload it back into memory, that’s going to cause so much pain that I’ll have to take more drastic measures. I’ve never had to write software that wrote to disk and also updating something live while it was in memory to be used by some engine, but it sounds like it might be an unreliable, error-prone, fault-intolerant way to do things. Again, I profess no deep knowledge of how something like this might work, but it would help me understand how performance will change when I go from testing selects to testing selects with concurrent inserts, and from there moving into production, where there’s more of both )