My project is going to update a large table (2GB, >10,000,000 rows) every night. The content is built from an aggregated set of data in the same database. As you may realize, the select-and-insert process is time-consuming.
My strategy is:
1. Dump the data using `SELECT ... INTO OUTFILE` to avoid locking the full table. (Thanks to the suggestion from this blog.)
2. Create a new table, say, `indexinfo_071010`. (This prevents the import from affecting current searches: during the import, other users keep searching/selecting against the previous large table, `indexinfo_071009`.)
3. Import the data with `LOAD DATA INFILE`.
4. Update the configuration table to point future users at `indexinfo_071010`, then drop the old large table `indexinfo_071009`.
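In MySQL terms, the four steps above might look roughly like this. (`aggregated_source`, `config`, and `active_table` are placeholder names; the real schema isn't shown in the post.)

```sql
-- 1. Dump the aggregated data to a flat file instead of inserting
--    directly into the live table.
SELECT * FROM aggregated_source
  INTO OUTFILE '/tmp/indexinfo_071010.txt';

-- 2. Build tomorrow's table with the same structure as today's.
CREATE TABLE indexinfo_071010 LIKE indexinfo_071009;

-- 3. Bulk-load into the new table, which no user touches yet.
LOAD DATA INFILE '/tmp/indexinfo_071010.txt'
  INTO TABLE indexinfo_071010;

-- 4. Point readers at the new table, then drop the old one.
UPDATE config SET active_table = 'indexinfo_071010';
DROP TABLE indexinfo_071009;
```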
With this method the transition from the old large table to the new one should be seamless. But now there’s one problem:
`LOAD DATA INFILE` seems to be so aggressive that it starts to affect the performance of other connections!
As described above, `indexinfo_071010` should be used exclusively by `LOAD DATA INFILE` during the import; no other user touches this table. But I found it actually affects SELECT and UPDATE statements elsewhere — for example, a simple UPDATE takes more than a minute.
Since the import process takes more than an hour, this is not acceptable. LOW_PRIORITY only yields to other users accessing the same table, so I think it doesn't help here.
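For reference, the clause in question is part of MySQL's `LOAD DATA` syntax, and it only delays the load while other clients read the *same* table:

```sql
-- LOW_PRIORITY makes the load wait until no other client is reading
-- indexinfo_071010; it does nothing for connections that use other
-- tables but share the same disk.
LOAD DATA LOW_PRIORITY INFILE '/tmp/indexinfo_071010.txt'
  INTO TABLE indexinfo_071010;
```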
I would like to ask if there's a way to make `LOAD DATA INFILE` less greedy. Or should it not behave like that in the first place, meaning there's something worth checking elsewhere?
Thanks in advance.
[EDIT: Forgot to mention that the large table is MyISAM. It is only bulk-inserted in the first place; afterwards it should be SELECT-only, with limited INSERTs and no UPDATEs.]
You can check CPU usage with, for example, `top`, a command you can run from the shell.
On the third row from the top you get a summary of CPU usage on the system. Compare the "wa" value with "us" and "sy": if "wa" is high, it means the OS is waiting for the disk.
And in your case, since the dump file and the database are on the same disk, my guess is that this is the problem: they are competing for disk reads/writes.
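A quick way to read those numbers non-interactively (assumes Linux, where `/proc/stat` exposes the same counters `top` displays):

```shell
# The "cpu" line in /proc/stat lists cumulative ticks in the order:
# user nice system idle iowait irq softirq ...
# A large, fast-growing iowait relative to user/system means the
# machine is stuck waiting on the disk.
awk '/^cpu /{printf "user=%s system=%s iowait=%s (ticks)\n", $2, $4, $6}' /proc/stat
```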
Unfortunately, it is pretty much impossible to slow down a LOAD DATA INFILE itself.
Try getting a second disk.
If you want to slow it down so that normal operations can continue, you could try running the LOAD DATA with LOCAL from another server; the network latency should slow it down a bit.
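A sketch of that remote approach, run from the mysql client on a separate machine (the file path is local to that client):

```sql
-- With LOCAL, the client reads the file and streams it to the server
-- over the network, so the load is throttled to network speed and the
-- server's disk is no longer reading the dump file and writing the
-- table at the same time.
LOAD DATA LOCAL INFILE '/path/on/client/indexinfo_071010.txt'
  INTO TABLE indexinfo_071010;
```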
The only other suggestion I have is to ditch LOAD DATA INFILE and use another application that imports the data by generating and sending SQL commands, pausing once in a while so that it doesn't load the server flat out.
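That throttled importer could be sketched like this in Python. The batching logic is shown with a generic `execute` callback standing in for whatever sends a statement to MySQL (e.g. a DB-API cursor's `execute`); the table name and the tuning numbers are assumptions, not from the post:

```python
import time

def import_in_batches(rows, execute, batch_size=1000, pause_s=0.5):
    """Send multi-row INSERTs in batches, sleeping between batches so
    other connections get disk and CPU time.

    rows       -- iterable of value tuples to insert
    execute    -- callable that sends one SQL statement to the server
    batch_size -- rows per INSERT statement
    pause_s    -- sleep between batches; raise it to throttle harder
    """
    batch = []
    for row in rows:
        batch.append("(%s)" % ", ".join(repr(v) for v in row))
        if len(batch) >= batch_size:
            execute("INSERT INTO indexinfo_071010 VALUES " + ", ".join(batch))
            batch = []
            time.sleep(pause_s)  # yield the server to other clients
    if batch:  # flush the final partial batch
        execute("INSERT INTO indexinfo_071010 VALUES " + ", ".join(batch))
```

The trade-off is that the import takes even longer overall, but the pauses cap how hard the server is hit at any moment.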