Hi all,
As suggested in many post and for instance also in the book ‘High Performance MySQL’ it is a good idea to restore different tables of a database in parallel. But, it seems like this is not giving a significant improvement in restore time at all.
For instance. I first tried the following. Using an awk script, I split up one sequential mysql dump file (SQL format) into separate SQLs, taking into account the bits at the start and the end of the files. Then I imported those files in parallel using xargs with the -P option. In this I made sure to import the biggest files first to minimize total import time.
On one system where I tried this (windows, 1 quad core with mysql 5.0.82-community-nt-log), using 4 parallel import, I did not achieve any increase in restore time at all. The total time was practically the same. Yet what I saw in ‘show processlist’ was that it was executing SQL statements for different tables in parallel.
Then I though it could be related to windows and/or this old MySQL version so I tried it on one of the newer systems. This is a dual 6-core processor machine running Red Hat EL 6.2 and mysql 5.1.61. This time, after googling around some more I dumped the database in mysqlimport format with one txt file per table. Then I used mysqlimport with the --use-threads=4 option.
In the latter case, I am seeing a performance improvement, going from 2 hours 45 minutes with a single SQL file imported in a single thread to 2 hours with 4 threads using mysqlimport.
I am willing to ignore the results of the windows system (as that will be replaced anyway in the near future), but on the linux system, I would have expected a much higher performance increase.
Any ideas as to what could be the problem? The mysql configuration on the windows and linux systems is similar (the linux my.cnf is based on the windows my.ini file). It looks like there is some bottleneck in the database which is preventing truly parallel insertion.
Also, it is perhaps good to mention that we are using a single innodb data file and not file per table. I experimented also with using file per table on my laptop but got a much lower performance.
Any ideas as to what could be going on?
Cheers
Erik