I am running data mining process that updates/inserts rows to the table (i.e. I do multifield select on indexed fields, and if row is found, I update the data, if not I insert new row). Up to about 15,000,000 rows (1.4GB of data) the procedure was quite fast (500-1000 rows per second), and then it started to slow down. Now I have about 75,000,000 rows (7GB of data) and I am getting about 30-40 rows per second.
If index does not fit in memory you may expect read+write for each index modified + same for data. So for table with 3 indexes you can have 8 IOs needed which can bring you down to 15 rows/sec for completely IO bound workload assuming drive can do 120 IO/sec
In practice it is a bit better a you probably have some cache hit rates (both OS and MySQL) plus writes can be paralelized if you have multiple hard drives.
This is very helpful information. So, the rule of thumb is that index should fit into memory for queries to be fast. Is that correct? And if I throw more memory to the system, would MySQL take advantage of this? For example, if I switch to 64-bit OS with 16GB of memory, can I have 12GB index and still be fast?
About buffer sizes, what values would you recommend?
Also. do you have any recommendations for the hardware, and if there is advantage to upgrade to MySQL 5.1? My main concern is speed.
The database structure if flat, no foreign keys, I don’t use joins, or transactions. Table format is MyISAM. The total size of data in the database is currently 38.5GB, and size of indices is 26GB with the largest one being 10.8GB. The database grows daily by 500,000 rows (or about 80MB).
Yes. MySQL can use more memory for caching, especially on 64bit OS. Having your data well cached in memory is very important for performance, at least having data which you access intensively.
Regarfing hardware - I guess 2CPU boxes with 16GB ram is where sweetspot currently is. If you can build your application to be able to use multiple boxes effectively this is what you can try to stick to. If not you might be forced to purchase more powerful hardware.
Ie number of customers were using boxes with 64GB of RAM because they did not want to rewrite their application.
MySQL 5.1 gives partitioning which might be helpful for your application but I would wait for production release before using it for production.
Flat table is good. You however might wish to split table.