find some random recently added items in a large table

barryhunter · July 12, 2007, 4:54pm

This is an easy problem to solve, but seems to be harder to do ‘nicely’, we cant seem to find a nice solution yet.

The Problem
Select a small number of random rows from those recently added to a table. (this is for a photo upload site, and want to showcase a selection of recently submitted photos)

Our Solution
We maintain a second ‘recent’ table, which holds just the id’s for the most recent 250 images (maintained by a cronjob - quite well we think).

This is the select statement we come up with so far to pick some recent images:

<?php$limit="limit ".rand(0,245).",5";$sql="select * from gridimage inner join (select gridimage_id from recent_gridimage $limit) as t2 using (gridimage_id)";?>

(I havnt included the schema but it should be clear on the columns, gridimage_id is a primary key of both tables)

This of results in a full table scan (almost) of recent_gridimage but it should never contain more than 250 rows anyway.

Other attempts

A quick select count(*) and then limit 486643,5; but of course that is very slow too.

select * from recent_gridimage inner join gridimage_search using (gridimage_id) order by rand() limit 5

Both of these do quite well (the second slightly better)

select * from gridimage where gridimage_id between 486643 and 486647 limit 5select * from gridimage where gridimage_id in (486643,486644,486645,486646,486647) limit 5

the problem beeing that not all ids are used, (about 1%), so could jsut select 6 to be safe, I don’t think it critical that we would occaisaonly only get 4.

… so any neat tricks to tackle this?
(the last one actully sounds quite good, and only thought of as writing this, but I’ll post anyway incase it helps others!)

Peter · August 16, 2007, 9:45am

Create dedicated table like recent_items which has auto-increment key with no holes and just pick rows where generating random IDs in your application. This will make it very fast.

barryhunter · September 5, 2007, 4:55am

Thanks for the reply,

I guess you meaning to store all columns in the ‘recent table’ as opposed to just the primary key column (like we did, but that had sub-optimal performance as it had to join with the large table)

In the end have gone with a count(*) then pick (in php) somewhere in the last few rows, and then do a “id in(x,x,x)” which from some crude testing gives marginally faster than other methods (apart from your suggestion which we haven’t tried).

Topic		Replies	Views
Fastest way to Select 1 random result Other MySQL® Questions	1	414	April 28, 2009
Clustered index decision in a large table Other MySQL® Questions	3	500	November 16, 2007
Help order by rand before group by Other MySQL® Questions	6	1402	November 10, 2009
Improve a query to select random rows, but without repeating records Other MySQL® Questions	3	5685	June 5, 2008
RAND() Efficiency on well indexed columns Other MySQL® Questions	2	420	August 10, 2008

find some random recently added items in a large table

Related topics