Help optimizing UPDATE statement

Could someone see any way to make this query faster ?
Any help or ideas are welcome

UPDATE customscan_track_attributes INNER JOIN tracks on tracks.url=customscan_track_attributes.url and customscan_track_attributes.musicbrainz_id is null set customscan_track_attributes.track=tracks.id where customscan_track_attributes.track!=tracks.id

I’ve tried this version also, but it is a little bit slower:

UPDATE tracks,customscan_track_attributes SET customscan_track_attributes.track=tracks.id where customscan_track_attributes.musicbrainz_id is null and tracks.url=customscan_track_attributes.url and customscan_track_attributes.track!=tracks.id

The tables look like this:

CREATE TABLE tracks ( id int(10) unsigned NOT NULL auto_increment, url text collate utf8_unicode_ci NOT NULL, title blob, titlesort text collate utf8_unicode_ci, titlesearch text collate utf8_unicode_ci, customsearch text collate utf8_unicode_ci, album int(10) unsigned default NULL, tracknum int(10) unsigned default NULL, content_type varchar(255) collate utf8_unicode_ci default NULL, timestamp int(10) unsigned default NULL, filesize int(10) unsigned default NULL, audio_size int(10) unsigned default NULL, audio_offset int(10) unsigned default NULL, year smallint(5) unsigned default NULL, secs float unsigned default NULL, cover blob, thumb blob, vbr_scale varchar(255) collate utf8_unicode_ci default NULL, bitrate float unsigned default NULL, samplerate int(10) unsigned default NULL, samplesize int(10) unsigned default NULL, channels tinyint(1) unsigned default NULL, block_alignment int(10) unsigned default NULL, endian tinyint(1) default NULL, bpm smallint(5) unsigned default NULL, tagversion varchar(255) collate utf8_unicode_ci default NULL, drm tinyint(1) default NULL, rating tinyint(1) unsigned default NULL, disc tinyint(1) unsigned default NULL, playCount int(10) unsigned default NULL, lastPlayed int(10) unsigned default NULL, audio tinyint(1) default NULL, remote tinyint(1) default NULL, lossless tinyint(1) default NULL, lyrics text collate utf8_unicode_ci, moodlogic_id int(10) unsigned default NULL, moodlogic_mixable tinyint(1) default NULL, musicbrainz_id varchar(40) collate utf8_unicode_ci default NULL, musicmagic_mixable tinyint(1) default NULL, replay_gain float default NULL, replay_peak float default NULL, PRIMARY KEY (id), KEY trackTitleIndex (title(255)), KEY trackAlbumIndex (album), KEY ctSortIndex (content_type), KEY trackSortIndex (titlesort(255)), KEY trackSearchIndex (titlesearch(255)), KEY trackCustomSearchIndex (customsearch(255)), KEY trackBitrateIndex (bitrate), KEY trackDiscIndex (disc), KEY trackFilesizeIndex (filesize), KEY trackTimestampIndex (timestamp), KEY trackTracknumIndex (tracknum), KEY trackRatingIndex (rating), KEY trackPlayCountIndex (playCount), KEY trackLastPlayedIndex (lastPlayed), KEY trackAudioIndex (audio), KEY trackLyricsIndex (lyrics(255)), KEY trackRemoteIndex (remote), KEY trackLosslessIndex (lossless), KEY urlIndex (url(255)), KEY trackStatMBIndex (musicbrainz_id), CONSTRAINT tracks_ibfk_1 FOREIGN KEY (album) REFERENCES albums (id) ON DELETE CASCADE) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

CREATE TABLE customscan_track_attributes ( id bigint(20) unsigned NOT NULL auto_increment, track int(10) default NULL, url varchar(511) collate utf8_unicode_ci NOT NULL, musicbrainz_id varchar(40) collate utf8_unicode_ci default NULL, module varchar(40) collate utf8_unicode_ci NOT NULL, attr varchar(255) collate utf8_unicode_ci NOT NULL, value varchar(255) collate utf8_unicode_ci default NULL, valuesort varchar(255) collate utf8_unicode_ci default NULL, extravalue varchar(255) collate utf8_unicode_ci default NULL, valuetype varchar(255) collate utf8_unicode_ci default NULL, PRIMARY KEY (id), UNIQUE KEY id (id), KEY track_attr_idx (track,module,attr,id), KEY musicbrainzIndex (musicbrainz_id), KEY urlIndex (url(255)), KEY module_attr_value_idx (module,attr,value), KEY attr_module_idx (attr,module), KEY extravalue_attr_module_track_idx (extravalue,attr,module,track), KEY track_module_attr_extravalue_idx (track,module,attr,extravalue), KEY module_attr_extravalue_idx (module,attr,extravalue)) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

When the tracks table contains about 40000 rows and the customscan_track_attributes table contains about 160000 rows the UPDATE statements takes about 5 seconds to execute on Linux, but on Windows it takes about 15 seconds. A user has also reported that it takes several minutes when the customscan_track_attributes table is larger. This is the case even if the update doesn’t actually update any rows.

I tried to make a SELECT first to check if the update was required, like this:

SELECT count(*) FROM customscan_track_attributes INNER JOIN tracks on tracks.url=customscan_track_attributes.url and customscan_track_attributes.musicbrainz_id is null where customscan_track_attributes.track!=tracks.id

When running this select with explain I get the following result:

id select_type table type possible_keys key key_len ref rows Extra1 SIMPLE tracks ALL urlIndex (null) (null) (null) 40541 1 SIMPLE customscan_track_attributes ref musicbrainzIndex,urlIndex urlIndex 767 slimserver65_large.tracks.url 1 Using where

This SELECT statement takes approximately the same time to execute as the UPDATE statement.

I have full control of the customscan_track_attributes table, so if some changes in column types or indexes would make it better that is possible. The tracks table is from an external product, I can add index to this but I can’t change any column definitions.

Some pointers:
1.
Your inner join and where should be written correctly as:

SELECT COUNT(*)FROM customscan_track_attributes INNER JOIN tracks ON tracks.url = customscan_track_attributes.url AND tracks.id != customscan_track_attributes.trackWHERE customscan_track_attributes.musicbrainz_id IS NULL

Joining based on a large string like your url is not recommended since it degrades performance.

Especially since your url columns are of different types. That requires type conversion during the join which is also degrading performance.

tracks table:
url text collate utf8_unicode_ci NOT NULL

customscan_track_attributes table:
url varchar(511) collate utf8_unicode_ci NOT NULL

Since I’m guessing that you don’t really want to make any larger changes in the table structure I suggest that you start by change the tracks.url to VARCHAR instead.

Then create a combined index on tracks(url,id) and possibly on customscan_track_attributes(url, track) that way it can perform the entire join using one index.

Also what is your innodb_buffer_pool_size set to?
Because when the DB starts to grow this gets very important to keep performance up.

[B]sterin wrote on Sat, 07 July 2007 11:43[/B]
Some pointers: 2. Joining based on a large string like your url is not recommended since it degrades performance.

Especially since your url columns are of different types. That requires type conversion during the join which is also degrading performance.

tracks table:
url text collate utf8_unicode_ci NOT NULL

customscan_track_attributes table:
url varchar(511) collate utf8_unicode_ci NOT NULL

Since I’m guessing that you don’t really want to make any larger changes in the table structure I suggest that you start by change the tracks.url to VARCHAR instead.

Then create a combined index on tracks(url,id) and possibly on customscan_track_attributes(url, track) that way it can perform the entire join using one index.

Unfortunately I don't have a choice. The 'tracks' table is defined by an external product (SlimServer) which I can't change. The additional data I store is stored in my own table 'customscan_track_attributes'. So I cannot change the 'tracks' table and the only identifier that can be used is the 'url'.

I’ve tried to define customscan_track_attributes.url as a ‘text’, but that makes it even worse, on Windows MySQL locks the whole computer when joining two text columns in large tables so you need to use the power switch to get back the control.

However, I tried to create a temporary table where I defined the tracks.url as a varchar(511) and it didn’t make it any faster. It looked like this:

create temporary table tmp_tracks ( id int(10), url varchar(511)) select id,url from tracks;UPDATE customscan_track_attributes INNER JOIN tmp_tracks on tmp_tracks.url=customscan_track_attributes.url and customscan_track_attributes.track!=tmp_tracks.id set customscan_track_attributes.track=tmp_tracks.id where customscan_track_attributes.musicbrainz_id is null;DROP temporary table tmp_tracks;

[B]sterin wrote on Sat, 07 July 2007 11:43[/B]

Also what is your innodb_buffer_pool_size set to?
Because when the DB starts to grow this gets very important to keep performance up.

innodb_buffer_pool_size was set to 15M I tried changing it to 128M and this made it faster, the UPDATE took about 12 seconds with 15M and with 128M it takes 8 seconds the first time it is executed and about 3 seconds the second and third time it is executed.

When innodb_buffer_pool_size was set to 15M, there is always a lot of disk activity when I run the UPDATE. When changing the pool to 128M, there is disk activity the first time the update is run but the second time there doesn’t seem to be any disc activity at all.

I you have any more ideas I really appreciate them.

Although the track table is defined by an external product you should still be able to run a alter table statement on it to change the column to VARCHAR.

And combine it with the combined index on (url, id) and (url,tracks) I suggested.

And another setting that you also want to do to speed up insert/updates.
innodb_flush_log_at_trx_commit = 0
this gives you less security against corruption but you will very often gain a lot in speed.

[B]sterin wrote on Sun, 08 July 2007 12:19[/B]
Although the track table is defined by an external product you should still be able to run a alter table statement on it to change the column to VARCHAR.

And combine it with the combined index on (url, id) and (url,tracks) I suggested.

I'm sorry if I didn't give you the whole picture. The external product is SlimServer and I'm making a plugin for it. This plugin is installed by other users in their SlimServer installation. The reason I don't like to change the 'tracks' table is that this might change the behaviour of other things in the external product since it might depend on that the url column is defined as 'TEXT'.

However, I tested this:

create table tmp_tracks ( id int(10), url varchar(511), primary key (id), index urlIndex (url,id)) type=innodb select id,url from tracks;

The result is that I have a tmp_tracks table that has the exact same type of its (id,url) columns as I have for the (track,url) columns in customscan_track_attributes. It also has an index for (url,id). I also added an (url,track) index to the customscan_track_attributes table.

I then changed the UPDATE to run towards the tmp_tracks table instead of the tracks table. The result is that the speed is exactly the same as when I used the standard tracks table with the ‘TEXT’ url column. So the datatype doesn’t seem to matter, at least there isn’t any difference between a TEXT and VARCHAR(511).

I also tested to remove the != directive in the UPDATE statement, but this made the query even slower.

Finally I also tested to move the != statement inside the JOIN and move out the musicbrainz_id part to the where statement like like below, but this also made the query a bit slower:

UPDATE customscan_track_attributes JOIN tmp_tracks on tmp_tracks.url=customscan_track_attributes.url and customscan_track_attributes.track!=tmp_tracks.id set customscan_track_attributes.track=tmp_tracks.id where customscan_track_attributes.musicbrainz_id is null;

Thanks for all the ideas