I have been given the task of loading 6.5 billion records (yes, that is not a typo: six and a half billion records) into a MySQL database running on Ubuntu 8.04.
I have successfully done so using:
LOAD DATA LOCAL INFILE 'FILENAME' INTO TABLE TABLENAME FIELDS TERMINATED BY ',';
This took around 10 hours.
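For context, the table is MyISAM (hence the .MYD/.MYI files mentioned below) and was created with no indexes before the load; the definition looked something like this (the columns shown are placeholders, not the real schema):

CREATE TABLE TABLENAME (
  id BIGINT NOT NULL,        -- placeholder column
  value VARCHAR(255)         -- placeholder column
) ENGINE=MyISAM;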
I used the following procedure to create the indexes (a concrete sketch of the full command sequence follows the list):
Create the table and load the data without indexes
Move (rename) the table's .MYD file to a backup
Create an empty .MYD file using TRUNCATE TABLE
Add the indexes (fast while the table is empty)
Move the backed-up .MYD file back to table.MYD
Run myisamchk to rebuild the indexes from the data:
myisamchk -q -r -O key_buffer=4096M -O sort_buffer=2048M -O read_buffer=640M -O write_buffer=640M TABLENAME.MYI
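For concreteness, the full sequence might look like the sketch below. The database name mydb, the data directory path, and the indexed column are assumptions, and mysqld must not have the table open while the files are moved:

cd /var/lib/mysql/mydb                        # assumed datadir/database path
mysql mydb -e "FLUSH TABLES;"                 # make sure mysqld has closed the table
mv TABLENAME.MYD TABLENAME.MYD.bak            # set the loaded data aside
mysql mydb -e "TRUNCATE TABLE TABLENAME;"     # recreates an empty .MYD
mysql mydb -e "ALTER TABLE TABLENAME ADD INDEX idx_id (id);"   # hypothetical index; instant on an empty table
mysql mydb -e "FLUSH TABLES;"                 # close the table again before swapping files
mv TABLENAME.MYD.bak TABLENAME.MYD            # put the real data back
myisamchk -q -r -O key_buffer=4096M -O sort_buffer=2048M \
    -O read_buffer=640M -O write_buffer=640M TABLENAME.MYI     # rebuild the indexes from the data
mysql mydb -e "FLUSH TABLES;"                 # or restart mysqld before using the table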
This procedure took around 15 hours.
Once complete, the file sizes were:
MYD → 410 GB
MYI → 200 GB
Server spec:
Dell PowerEdge 2950, dual quad-core 2.5 GHz
32 GB RAM
Data on 7 x 7200 RPM striped, iSCSI-connected disks
At the end of the project I will have ten times this amount of data to load.
I would be interested to hear comments/ideas on ways to improve the load and index times, or any other suggestions for working with this quantity of data.
Thanks
Laurence