Load large data from file into InnoDB table

aba1979 (Entrant, Inactive User, Role: Participant)
I need to import 100 million records from a file into a table (schema given below). The table uses the InnoDB engine, and I am running MySQL Server 5.0.45.
I tried using 'LOAD DATA INFILE' to import the data; however, its performance deteriorates as more rows are inserted. Is there any trick that can be used to complete this import in less than a couple of hours? I need a response urgently. Thanks.

Table schema:

CREATE TABLE `t` (
`id` int(11) NOT NULL auto_increment,
`pid` int(11) NOT NULL,
`cname` varchar(255) NOT NULL,
`dname` varchar(255) default NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_t_on_pid_and_cname` (`pid`,`cname`),
KEY `index_tags_on_cname` (`cname`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
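
For reference, a LOAD DATA INFILE statement against this schema would look roughly like the sketch below; the file path, delimiter, and column list are placeholders and have to match the actual data file.

LOAD DATA INFILE '/path/to/data.csv'      -- placeholder path
INTO TABLE `t`
FIELDS TERMINATED BY ','                  -- adjust to the file's delimiter
LINES TERMINATED BY '\n'
(`pid`, `cname`, `dname`);                -- `id` is auto_increment; add it to the list if the file contains it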

Comments

  • drjay (Entrant, Current User, Role: Participant)
    Yeah, I had the same problem but with about a billion rows. You can imagine my frustration waiting for it!

    I found out you can make it go hundreds of times faster by splitting the file up. Go here:
    http://www.fxfisherman.com/forums/forex-metatrader/tools-utilities/75-csv-splitter-divide-large-csv-files.html

    and about half way down is a program to split a CSV file. I found that splitting it into chunks of 500,000 rows gives the best trade-off between file size and import performance. I then used Navicat to do a batch import, opening all of the files at once and importing them all into the same table.
  • Tgratle (Entrant, Inactive User, Role: Poster)
    Hello,

    What I would advise is to try an ETL tool. It is one of the best ways to load data into InnoDB.

    Talend Open Studio is an open source ETL tool for data integration and migration experts. It's easy to learn for a non-technical user. What distinguishes Talend, when it comes to business users, is the tMap component. It allows the user to get a graphical and functional view of integration processes.

    For more information: http://www.talend.com/
  • f00bar (Entrant, Current User, Role: Participant)
    engine=innodb;

    one CSV data file of 100 million rows, sorted in primary key order (order is important to improve bulk loading times; remember InnoDB's clustered primary key)



    truncate <table>;

    set autocommit = 0;

    load data infile <path> into table <table>...

    commit;

    Runtime: about 15 mins, depending on your hardware and MySQL config.



    Typical import stats I have observed during bulk loads:

    3.5 - 6.5 million rows imported per min
    210 - 400 million rows per hour

    other optimisations, as mentioned above, if applicable:

    set unique_checks = 0;
    set foreign_key_checks = 0;

    split the file into chunks
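
    Putting these steps together, a minimal sketch of the whole session for the table above might look like this; the file path and delimiter are placeholders to adjust for your file, and the column list assumes the `id` values are present in the file (since it is sorted in primary key order):

    set unique_checks = 0;
    set foreign_key_checks = 0;
    set autocommit = 0;

    truncate table t;

    load data infile '/path/to/data.csv'    -- placeholder path
    into table t
    fields terminated by ','                -- match the file's delimiter
    lines terminated by '\n'
    (id, pid, cname, dname);                -- file assumed sorted by id (primary key order)

    commit;

    set unique_checks = 1;
    set foreign_key_checks = 1;
    set autocommit = 1;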