Dear Srs,
I have a table with near 1.000.000 rows, and I want to select random rows, but without repeating records.
If I don’t prevent repeating records MySQL works fine, this is a PHP code fragment:
$result = mysql_query(“SELECT MIN(id) AS min , MAX(id) AS max FROM tabla”);$rango = mysql_fetch_object($result);$rand_id = mt_rand($rango->min, $rango->max);$result = mysql_query(“SELECT * FROM tabla WHERE id >= $rand_id LIMIT 0,1”);
This is soo fast, MySQL rocks
For preventing repeating rows, I’m making this:
-
There’s another column to mark if a record was selected previosly or not
-
Select rows not market at selected
-
Get the record and mark it as selected
-
Go to 1)
The step 2) has a WHERE statement that slows de query to more than 10 seconds…
This is the table definition:
mysql> SHOW CREATE TABLE ip_assigned \G*************************** 1. row *************************** Table: ip_assignedCreate Table: CREATE TABLE ip_assigned
( id
int(11) NOT NULL auto_increment, ip
varchar(256) NOT NULL, assigned_time
timestamp NOT NULL default ‘0000-00-00 00:00:00’, crawler
int(11) NOT NULL, PRIMARY KEY USING BTREE (id
), KEY index_ip
(ip
(16)), KEY index_crawler
(crawler
,ip
)) ENGINE=MyISAM AUTO_INCREMENT=871539 DEFAULT CHARSET=utf81 row in set (0.00 sec)mysql>
This is the query to get not market records:
// Get COUNT(*) FROM cache, cache->total_ips$sql = “SELECT val AS total_ips FROM cache WHERE param = ‘total_ips’”;$result = mysql_query($sql);$cache = mysql_fetch_object($result);// Calculate a random offset for the LIMIT$offset = mt_rand(1, $cache->total_ips);// Get $total_requested_ips random IPs$sql = “SELECT ip FROM ip_assigned WHERE crawler = 0 LIMIT $offset,$total_requested_ips”;$result = mysql_query($sql);
This last query is very slow, because MySQL needs to scan a lot of rows, this is the output of EXPLAIN command:
mysql> DESCRIBE SELECT ip FROM ip_assigned WHERE crawler = 0 LIMIT 705607,10240 ;±—±------------±------------±-----±--------------±--------------±--------±------±-------±------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------------±-----±--------------±--------------±--------±------±-------±------------+| 1 | SIMPLE | ip_assigned | ref | index_crawler | index_crawler | 4 | const | 813308 | Using index | ±—±------------±------------±-----±--------------±--------------±--------±------±-------±------------+1 row in set (0.00 sec)
What can I do to improve this query? perhaps another method/idea to prevent repeating random registers? Thanks!
Regards,