I need to import 100 million records from a file into a table (schema given below). The table uses the InnoDB engine, and I am running MySQL Server 5.0.45.
I tried using LOAD DATA INFILE to import the data; however, its performance deteriorates as more rows are inserted. Is there any trick that can be used to complete this import in less than a couple of hours? I need a response urgently. Thanks.
Table schema:
CREATE TABLE t (
  id int(11) NOT NULL auto_increment,
  pid int(11) NOT NULL,
  cname varchar(255) NOT NULL,
  dname varchar(255) default NULL,
  PRIMARY KEY (id),
  UNIQUE KEY index_t_on_pid_and_cname (pid, cname),
  KEY index_tags_on_cname (cname)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
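The question does not show the exact statement being run; assuming a comma-separated file whose columns map to (pid, cname, dname), with id left to auto-increment, the load would look roughly like this (the file path is a placeholder):

LOAD DATA INFILE '/path/to/data.csv'
INTO TABLE t
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(pid, cname, dname);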
About half way down is a program to split a CSV file up. I found that splitting the file into chunks of 500,000 rows gave the best trade-off between chunk size and import performance. I then used Navicat to do a batch import, opening all of the files at once and importing them all into the same table.
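The same approach works without Navicat: once the file is split, run one LOAD DATA INFILE per chunk, all into the same table. A sketch with hypothetical chunk file names and delimiters:

LOAD DATA INFILE '/import/chunk_001.csv' INTO TABLE t
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
  (pid, cname, dname);
LOAD DATA INFILE '/import/chunk_002.csv' INTO TABLE t
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
  (pid, cname, dname);
-- ...and so on for the remaining chunks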
What I would advise is to try an ETL tool; it is one of the best ways to load data into InnoDB.
Talend Open Studio is an open source ETL tool for data integration and migration experts. It’s easy to learn for a non-technical user. What distinguishes Talend, when it comes to business users, is the tMap component. It allows the user to get a graphical and functional view of integration processes.
One CSV data file with all 100 million rows, sorted in primary key order (the order is important for bulk loading times: remember that InnoDB clusters rows on the primary key).
truncate table t;
set autocommit = 0;
load data infile '/path/to/data.csv' into table t  -- file path is a placeholder
…
commit;
Runtime: roughly 15 minutes, depending on your hardware and MySQL configuration.
Typical import stats I have observed during bulk loads:
3.5 - 6.5 million rows imported per min
210 - 400 million rows per hour
Other optimisations mentioned above also apply where relevant:
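For example, constraint checking can be switched off for the duration of the load and switched back on afterwards. Exactly which settings apply depends on your setup, so treat the following as a sketch of the usual session-level toggles rather than a definitive script; in particular, only disable unique_checks if the file cannot contain duplicate (pid, cname) pairs.

set unique_checks = 0;       -- skip uniqueness checks on secondary indexes during the load
set foreign_key_checks = 0;  -- this schema has no foreign keys, but disabling is harmless
set sql_log_bin = 0;         -- only if binary logging is on and the rows need not be replicated

-- run the truncate / autocommit / load data infile / commit sequence shown above

set unique_checks = 1;
set foreign_key_checks = 1;
set sql_log_bin = 1;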