LOAD DATA INFILE being too resource intensive

Hello all,

My project is going to update a large table (2 GB, >10,000,000 rows) every night. The content is built from an aggregated set of data that lives in the same database. As you might expect, both the selecting and the inserting are time-consuming.

My strategy (sketched in SQL after the list) is:

  1. Dump the data using SELECT ... INTO OUTFILE to prevent full table locking.

(Thanks to the suggestion from this blog.)

  2. Create a new table, say, “indexinfo_071010”.

(This is to prevent the import from affecting current searches. During the import, other users keep searching/selecting against the previous large table, indexinfo_071009.)

  3. Import the data with LOAD DATA INFILE.

  4. Update the configuration table to instruct future users to search on indexinfo_071010, then drop the old large table indexinfo_071009.
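
To make the cycle concrete, here is a rough SQL sketch of what I mean. The file path, column names and the configuration table are just placeholders, not my real schema:

    -- 1. Dump the aggregated data to a flat file
    SELECT keyword, doc_id, score
      INTO OUTFILE '/tmp/indexinfo_071010.txt'
      FROM aggregated_source;

    -- 2. Build tonight's table next to the live one
    CREATE TABLE indexinfo_071010 LIKE indexinfo_071009;

    -- 3. Bulk-load the dump into the new, not-yet-used table
    LOAD DATA INFILE '/tmp/indexinfo_071010.txt' INTO TABLE indexinfo_071010;

    -- 4. Point future queries at the new table, then drop the old one
    UPDATE config SET active_table = 'indexinfo_071010' WHERE name = 'indexinfo';
    DROP TABLE indexinfo_071009;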

With this method the transition from the old large table to the new one should be seamless. But now there’s one problem:

LOAD DATA INFILE seems to be so aggressive that it affects the performance of other connections!

As said, “indexinfo_071010” is used exclusively by LOAD DATA INFILE during the import; no other user touches this table. Yet I found the import slowing down SELECT and UPDATE statements from other connections, where, for example, a simple UPDATE takes more than a minute.

Since the import process takes more than an hour, this is not acceptable. As far as I know, the LOW_PRIORITY modifier only helps when other users access the same table, so I don't think it helps here.
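
For reference, this is the modifier I mean (the file path is a placeholder). As I understand it, it only delays the load while other clients are reading the target table, and nobody else reads indexinfo_071010 during the import anyway:

    LOAD DATA LOW_PRIORITY INFILE '/tmp/indexinfo_071010.txt' INTO TABLE indexinfo_071010;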

I would like to ask if there is a way to make LOAD DATA INFILE less greedy. Or should it not behave like this at all, in which case there is something else worth checking?

Thanks in advance.

[EDIT: Forgot to mention that the large table is MyISAM. It is only bulk-inserted once, up front; afterwards it is SELECT-only, with limited INSERTs and no UPDATEs.]

Regards,
Tamcy

It is supposed to do that: it is designed to import as much data as fast as possible, which means a heavy load on the system.

Questions:

1. I'm assuming that the import file is placed on the server and that you are using LOAD DATA INFILE without the LOCAL keyword?

2. Are the import file and the database files on the same disk?

3. When importing, is the disk the bottleneck, or the CPU?

4. How many MB are the import file and the database?

5. How much RAM do you have on the machine?

Hello,

Here are the answers :)

Quote: I'm assuming that the import file is placed on the server and that you are using LOAD DATA INFILE without the LOCAL keyword?

Yes, without LOCAL.

Quote: Are the import file and the database files on the same disk?

Yes, there is only one disk available.

Quote: When importing, is the disk the bottleneck, or the CPU?

I'm not sure. How can I check this?

Quote: How many MB are the import file and the database?

Database size: 2-3 GB. Import file size: 1-2 GB.

Quote: How much RAM do you have on the machine?

2 GB of RAM, 32-bit Linux.

Thank you!

You can check CPU usage with, for example, “top”, a command you run from the shell.

The third row from the top shows a summary of CPU usage on the system. Compare “wa” (I/O wait) with “us” (user) and “sy” (system): if “wa” is high, the OS is waiting for the disk.

In your case, since the import file and the database are on the same disk, I'm guessing that is the problem: they are competing for disk reads and writes.

Unfortunately there is practically no way to slow down a LOAD DATA INFILE itself.

Suggestions:

1. Try getting a second disk.

2. If you want to slow the import down so that normal operations can still be performed, you could try running LOAD DATA with LOCAL from another server; the network latency should slow it down a bit (see the first sketch below).

3. The only other suggestion I have is to ditch LOAD DATA INFILE and use an application that imports the data by building and sending ordinary SQL statements, and have that application pause once in a while so it doesn't load the server flat out (see the second sketch below).
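
For suggestion 2, roughly something like this, run from a client on another machine (host name and file path are of course made up); with LOCAL the client reads the file and streams it over the network, which adds latency:

    -- connect from the other machine, e.g. mysql -h db-server -u importer -p
    LOAD DATA LOCAL INFILE '/home/importer/indexinfo_071010.txt'
      INTO TABLE indexinfo_071010;

And a minimal sketch of suggestion 3 (column names are made up): the importing application sends ordinary batched INSERTs and pauses between batches so that other connections get a chance at the disk:

    -- one batch of a few thousand rows (only three shown here)
    INSERT INTO indexinfo_071010 (keyword, doc_id, score) VALUES
      ('foo', 1, 0.5),
      ('bar', 2, 0.7),
      ('baz', 3, 0.2);
    -- pause before sending the next batch (the application could also sleep on its own side)
    SELECT SLEEP(2);
    -- ... next batch ...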

Hi sterin,

Just checked the database, and it is obviously a disk issue: “wa” always reaches 60% or more. I will consider adding another disk or more RAM.

Thank you for your assistance :)

Tamcy