HUGE Tables and Indexes

laurence · May 17, 2009, 9:57am

I have been given the task of loading 6.5 billion records (yes, that is not a typo: six and a half billion records) into a MySQL database running on Ubuntu 8.04.

I have successfully done so using:
LOAD DATA LOCAL INFILE 'FILENAME' INTO TABLE TABLENAME FIELDS TERMINATED BY ',';

This took around 10 hours.

I used the following procedure to create the INDEXES:

1. Load the data without indexes
2. Move (rename) the table.MYD file to a backup
3. Create an empty MYD file using TRUNCATE TABLE
4. Add indexes
5. Move the backed-up MYD file back to table.MYD
6. Run myisamchk to rebuild the indexes:
myisamchk -q -r -O key_buffer=4096M -O sort_buffer=2048M -O read_buffer=640M -O write_buffer=640M

This procedure took around 15 HOURS

File sizes once complete:
MYD → 410 GB
MYI → 200 GB

Server SPEC:
Dell 2950, dual quad-core 2.5 GHz
32 GB RAM
Data on 7 x 7,200 RPM (striped) iSCSI-connected disks.

At the end of the project I will have 10 times the amount of data to load.
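
For a rough sense of scale, the figures above can be extrapolated to the final data volume. The sketch below assumes that load time, index time, and file sizes all grow linearly with the row count. That is an assumption, and probably an optimistic one for the index build. The function and variable names are illustrative only.

```python
# Figures quoted in this post.
ROWS_LOADED = 6_500_000_000
LOAD_HOURS = 10
INDEX_HOURS = 15
MYD_GB = 410
MYI_GB = 200
GROWTH = 10  # "10 times the amount of data"


def project(rows, load_hours, index_hours, myd_gb, myi_gb, growth):
    """Scale the measured figures linearly by `growth` (an assumption)."""
    return {
        "rows_per_second": rows / (load_hours * 3600),
        "load_hours": load_hours * growth,
        "index_hours": index_hours * growth,
        "data_gb": myd_gb * growth,
        "index_gb": myi_gb * growth,
    }


projection = project(ROWS_LOADED, LOAD_HOURS, INDEX_HOURS, MYD_GB, MYI_GB, GROWTH)

if __name__ == "__main__":
    for name, value in projection.items():
        print(f"{name}: {value:,.0f}")
```

Under that linear assumption the current load rate is roughly 180,000 rows per second, and the full data set would need on the order of 100 hours to load and 150 hours to index.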

I would be interested to hear comments or ideas on ways to improve the load and index time, or any other suggestions for working with this quantity of data.

Thanks

Laurence

arya · May 17, 2009, 12:50pm

It may be possible, depending on the nature of the data, to shard it across multiple tables (and servers). Doing that should decrease indexing time, and perhaps insertion time slightly.

Is there a reason you’re using MyISAM? Or would you be able to look into other storage engines?
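
As one possible illustration of the sharding idea, the input file could be split into one file per shard before loading, so that each file goes into its own table (or server) and each index is built over a fraction of the rows. This is only a sketch: it assumes the first CSV column is a suitable shard key, and the names `shard_for` and `split_csv`, the shard count, and the `shard_N.csv` file naming are all hypothetical.

```python
import csv
import zlib
from pathlib import Path


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard number using a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def split_csv(source: Path, out_dir: Path, shard_count: int, key_column: int = 0):
    """Stream `source` once and write each row to the file for its shard."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"shard_{n}.csv" for n in range(shard_count)]
    handles = [path.open("w", newline="") for path in paths]
    try:
        # Use "\n" line endings so each output file keeps a plain CSV layout.
        writers = [csv.writer(handle, lineterminator="\n") for handle in handles]
        with source.open(newline="") as src:
            for row in csv.reader(src):
                if not row:  # skip blank lines
                    continue
                shard = shard_for(row[key_column], shard_count)
                writers[shard].writerow(row)
    finally:
        for handle in handles:
            handle.close()
    return paths
```

Each resulting file could then be loaded with its own LOAD DATA statement. Whether this helps depends on whether queries can be routed to a single shard by the same key.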

laurence · May 17, 2009, 2:55pm

Yes, I could consider using another storage engine.
What would you suggest?
Can you elaborate a little on the sharding?

januzi · May 17, 2009, 5:14pm

So, you need to import both data and indexes. You could create a temporary table without any indexes and a main table with indexes, then import the rows into the temporary table. This may take a while. Next, write a script that fetches some rows from the temporary table and puts them into the main table. You could run that script manually, but cron should do the trick. For example:

* * * * * /path/to/the/php /path/to/the/script

will run the import script every minute. This should be faster than playing with files and fixing indexes.
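
The batch-copy step described above might look roughly like the following. This is a sketch only: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own, and the table names, column names, and batch size are hypothetical. It does not show whether this approach is actually faster than the bulk index rebuild.

```python
import sqlite3

BATCH_SIZE = 1000  # hypothetical; tune to what one run can handle


def make_demo_db(row_count):
    """Build an in-memory database with an unindexed staging table
    and an indexed main table (stand-ins for the real MySQL tables)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (a INTEGER, b TEXT)")
    conn.execute("CREATE TABLE main_table (a INTEGER, b TEXT)")
    conn.execute("CREATE INDEX main_a ON main_table (a)")
    conn.executemany(
        "INSERT INTO staging VALUES (?, ?)",
        ((i, f"v{i}") for i in range(row_count)),
    )
    conn.commit()
    return conn


def move_batch(conn, batch_size=BATCH_SIZE):
    """Move up to `batch_size` rows from staging to main_table.

    Returns the number of rows moved; 0 means staging is empty.
    """
    with conn:  # one transaction: commit on success, roll back on error
        rows = conn.execute(
            "SELECT rowid, a, b FROM staging ORDER BY rowid LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return 0
        conn.executemany(
            "INSERT INTO main_table (a, b) VALUES (?, ?)",
            [(a, b) for _, a, b in rows],
        )
        # The batch holds the lowest rowids, so this removes exactly those rows.
        conn.execute("DELETE FROM staging WHERE rowid <= ?", (rows[-1][0],))
    return len(rows)
```

A scheduled job would call `move_batch` once per run and could be disabled once it returns 0.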

laurence · May 18, 2009, 12:56am

Januzi,

Why would that be faster?
Can you explain the import script?
Why would I need to cron it? Would this not be a one-time action?

tute666 · June 9, 2009, 6:29pm

Will this data be modified?
If not, consider the Archive engine.
