Optimizing table... Copy to tmp table slow down

lavium · January 3, 2010, 8:49am

Hi,

I’m working on a project where I need to find out what number of items one user has in common with another user.

I’ve simplified my tables while I’m working on optimizing these queries. Here they are:

CREATE TABLE users ( id int(10) unsigned NOT NULL auto_increment, PRIMARY KEY (id)) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;CREATE TABLE items ( id int(10) unsigned NOT NULL auto_increment, PRIMARY KEY (id)) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;CREATE TABLE items_to_users( id int(10) unsigned NOT NULL auto_increment, item_id int(10) unsigned NOT NULL, user_id int(10) unsigned NOT NULL, PRIMARY KEY (id), UNIQUE KEY item_id_2(item_id,user_id)) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

There may be millions of users, millions of items, and, of course, millions of rows in items_to_users. For now, I generated a temporary dataset of 10000 users, 10000 items, and 2 million items_to_users.

Here are the queries I’m using to find users (the top 100) with similar items:

CREATE TEMPORARY TABLE temp_items as SELECT items_to_users.id,item_id,user_id FROM items_to_users WHERE user_id = SOME_USER_ID;SELECT m1.user_id, m2.user_id, count(*) AS in_common FROM temp_items AS m1 INNER JOIN items_to_users AS m2 ON m1.item_id = m2.item_id AND m1.user_id < m2.user_id GROUP BY m1.user_id, m2.user_id HAVING in_common > 0 ORDER BY in_common DESC limit 100;

An explain of the second query shows that it makes a temporary table (I guess due to the group by coming from different tables?) and that step (Copy to tmp) takes a really long time compared to the rest of the query. Is there any way I can cut this time down? Or optimize it?

Thanks for your help! )
Tom

xaprb · January 3, 2010, 6:13pm

This is a difficult problem. It’s basically an N-dimensional proximity search. I can’t think of a really efficient way to do this in a database – any database – but you might try some kind of fuzzy matching to get you closer, such as creating a signature of a person’s preferences. For example, are the items categorizable? Maybe you can assign each user a string of bits that reflects whether they have an item in a given category, and then do an initial match on this string. Within the results, do an exact match (if needed) to get the final set of users. If I think of anything else I’ll post more.

Topic		Replies	Views
Need Faster query results Other MySQL® Questions	5	486	July 17, 2009
Join and speed issues Other MySQL® Questions	6	658	August 27, 2009
Query optimisation help please Other MySQL® Questions	4	430	January 29, 2009
very simple jointure, how to avoid filesort ? Other MySQL® Questions	1	513	September 7, 2010
mysql not using index Other MySQL® Questions	1	354	February 1, 2007

Optimizing table... Copy to tmp table slow down

Related topics