My project is going to update a large table (2GB, >10,000,000 rows) every night. The content is built from a aggregated set of data, also in the same database. As you may realize, the selecting and inserting process are time consuming.
My strategy is:
- Dump the data using SELECT INTO OUTFILE to prevent full table locking
(Thanks to the suggestion from this blog ) )
- Create a new table, say, “indexinfo_071010”.
(This is to prevent the importing task affecting current search. During the import, other users should be searching/selecting against the previous large table, indexinfo_071009)
Import the data with LOAD DATA FILE.
Update the configuration table to instruct future users to search on indexinfo_071010. Then drop the old large table indexinfo_071009.
With this method the transition from the old large table to the new one should be seamless. But now there’s one problem:
LOAD DATA FILE seems to be so aggressive that it starts to affect the performance of other connections!
As said, “indexinfo_071010” should be exclusively used by LOAD DATA FILE during the import. No other user should attempt to use this table. But I found it actually affecting the select and update statement, where for example a simple UPDATE costs more than a minute.
Since the import process takes more than an hour this is not acceptable. The LOW PRIORITY is for other users accessing the same table, so I think it doesn’t help.
I would like to ask if there’s a way to make LOAD DATA FILE not so greedy? Or it actually shouldn’t behave like that, and there’s something worth checking elsewhere?
Thanks in advance.
[EDIT: Forgot to mention that the large table is in MyISAM. It’s only inserted bulkly in first place, afterwards it should be SELECT only, with limited INSERTS. No UPDATE will be taken.]