In cleaning up a legacy database 0.5M rows, 100+ fields, I have written an application to open simultaneously read a result set row by row and write the processed result set to a second table within the same database.
Along the way I use a third connection to read from one of four or five other indexed tables to clean the data or fill in missing fields.
This is not fast!
If I do a simple read row then insert row, it takes about 100 minutes. If I do three other table reads as well, it takes 1000 minutes.
When I watch the windows process manager I see that mysqld is only ever reaching 10% CPU maximum. My application is less than 5% CPU, so I am guessing the slow computer response is due to the shuffling of data in memory?
Why is CPU so under utilized, what tricks can I use to get it more involved?
Would it be faster to use a separate database for the INSERTs?
Woould it be faster to use a secondary database engine for the indexed tables that fill in missing data like say SQLite?
The destination table is type MYISAM not inno. From what I have read this is optimal.
The format of the detination table is all BIGINT and CHAR() with a PRIMARY KEY, TIMESTAMP autoupdate and a few DATE fields… ie a fixed record length (no VARCHR()). This should be optimal. Other than the primary key and TIMESTAMP there are no indexes.
With a the above factors I reduced the simple inserts a little, but introducing the third table lookups slows the process by an order of magnitude again.
MySQL server v 5.1
--------- Loading 621212 TotRows
Row= 9999 Mins= 0.6 Rows/Min= 15146
Row= 19998 Mins= 0.8 Rows/Min= 24510
Row= 29997 Mins= 0.9 Rows/Min= 30782
Row= 39996 Mins= 1.1 Rows/Min= 34897
Row= 49995 Mins= 1.3 Rows/Min= 36230
Row= 59994 Mins= 1.7 Rows/Min= 34236
Row= 69993 Mins= 2.3 Rows/Min= 29863
Row= 79992 Mins= 3.0 Rows/Min= 25914
Row= 89991 Mins= 3.9 Rows/Min= 22661
Row= 99990 Mins= 5.3 Rows/Min= 18729
Row= 109989 Mins= 6.3 Rows/Min= 17277
Row= 119988 Mins= 7.5 Rows/Min= 15805
Row= 129987 Mins= 8.9 Rows/Min= 14581
Row= 139986 Mins= 10.7 Rows/Min= 13049
Row= 149985 Mins= 11.8 Rows/Min= 12633
Row= 159984 Mins= 13.1 Rows/Min= 12189
Row= 169983 Mins= 14.6 Rows/Min= 11630
Row= 179982 Mins= 16.2 Rows/Min= 11080
Row= 189981 Mins= 17.9 Rows/Min= 10604
Row= 199980 Mins= 19.6 Rows/Min= 10185
Row= 209979 Mins= 21.6 Rows/Min= 9704
Row= 219978 Mins= 23.4 Rows/Min= 9376
Row= 229977 Mins= 25.3 Rows/Min= 9071
Row= 239976 Mins= 27.4 Rows/Min= 8743
Row= 249975 Mins= 29.3 Rows/Min= 8503
Row= 259974 Mins= 31.5 Rows/Min= 8236
Row= 269973 Mins= 31.9 Rows/Min= 8455
Row= 279972 Mins= 32.1 Rows/Min= 8704
Row= 289971 Mins= 32.4 Rows/Min= 8949
Row= 299970 Mins= 32.6 Rows/Min= 9184
Row= 309969 Mins= 32.9 Rows/Min= 9412
Row= 319968 Mins= 33.2 Rows/Min= 9627
Row= 329967 Mins= 33.5 Rows/Min= 9834
Row= 339966 Mins= 33.9 Rows/Min= 10002
Row= 349965 Mins= 34.4 Rows/Min= 10155
Row= 359964 Mins= 35.0 Rows/Min= 10266
Row= 369963 Mins= 35.6 Rows/Min= 10365
Row= 379962 Mins= 36.3 Rows/Min= 10462
Row= 389961 Mins= 37.1 Rows/Min= 10482
Row= 399960 Mins= 37.9 Rows/Min= 10541
Row= 409959 Mins= 38.7 Rows/Min= 10566
Row= 419958 Mins= 39.8 Rows/Min= 10530
Row= 429957 Mins= 41.3 Rows/Min= 10385
Row= 439956 Mins= 42.9 Rows/Min= 10254
Row= 449955 Mins= 43.5 Rows/Min= 10324
Row= 459954 Mins= 44.4 Rows/Min= 10356
Row= 469953 Mins= 45.2 Rows/Min= 10390
Row= 479952 Mins= 46.0 Rows/Min= 10415
Row= 489951 Mins= 47.1 Rows/Min= 10387
Row= 499950 Mins= 48.4 Rows/Min= 10322
Row= 509949 Mins= 49.4 Rows/Min= 10320
Row= 519948 Mins= 50.6 Rows/Min= 10268
Row= 529947 Mins= 54.5 Rows/Min= 9723
Row= 539946 Mins= 57.8 Rows/Min= 9338
Row= 549945 Mins= 60.6 Rows/Min= 9071
Row= 559944 Mins= 63.3 Rows/Min= 8840
Row= 569943 Mins= 65.9 Rows/Min= 8646
Row= 579942 Mins= 68.7 Rows/Min= 8436
Row= 589941 Mins= 71.2 Rows/Min= 8274
Row= 599940 Mins= 73.9 Rows/Min= 8112
Row= 609939 Mins= 76.4 Rows/Min= 7980
Row= 619938 Mins= 79.1 Rows/Min= 7827
localhost via TCP/IP
5.1.26-rc-community-log
Suggestions?