importing data from file with --replace vs. --ignore

udin · December 15, 2011, 5:51pm

For the following schema
table t1 (
c1 int not null,
c2 int not null,
primary key (c1,c2)
);

would it be better (i.e. faster) to import data from file using --replace or --ignore?

let’s assume that the vast majority (99%) of rows to be imported are new rows and the table has billions of rows to start with (before the import).

i am open to any storage engine if it makes a big difference.

Thanks.

sterin71 · December 18, 2011, 6:54pm

Well --replace would require a write in addition to the check for unique value. But since you said that only 1% would be duplicates that time would probably not be noticeable.
Basically if you choose --replace or --ignore will probably not matter for the performance.

As for storage engine, choosing InnoDB with that table structure would mean that you don’t get any duplicate storage of data (for the index) since the two columns are the primary key and innodb tables are clustered on the primary key.
But on the other hand InnoDB has a bunch of hidden “columns” in the table and since you only have two int’s in the table the overhead for these hidden ones will be pretty high.

A MyISAM on the other hand would store the two columns twice, once in the table and once in the primary key together with a pointer to the position in the table. Drawback with MyISAM is that a repair in case of a corrupt table could take a very long time.

In any event the only real way to speed this up is to have so much RAM that the entire table is more or less available in RAM to avoid random reads from the disks.

Topic		Replies	Views
How to import large datasets Other MySQL® Questions	2	1062	June 30, 2014
How to import large datasets Other MySQL® Questions	5	812	December 3, 2006
using index puzzle Other MySQL® Questions	3	576	August 4, 2011
varchar index killing csv import Other MySQL® Questions	3	539	February 25, 2007
InnoDB DELETE/INSERT vs. UPDATE in transaction Other MySQL® Questions	4	2772	May 29, 2007

importing data from file with --replace vs. --ignore

Related topics