So, I’m attempting to use pt-archiver to archive and purge old data from a table. The source table is unfortunately using the MyISAM engine, which causes the major part of the issue.
pt-archiver --ask-pass --file archive-%Y-%m-%d.csv --limit 100 --output-format csv --progress 10000 --sleep 1 --source h=db.example.com,u=user,D=mydb,t=mytable --where "dldate<'2014-02-02'"
The above works OK until the archiving gets to a point where there are fewer than 100 rows with “dldate<‘2014-02-20’”. At that point a SQL statement gets issues which causes a full table scan (on a 70 million row table), which locks said table on the master database server - causing big issues.
The statement pt-archiver ends up using looks like the following, I’ve anonymised the statement somewhat:
SELECT /*!40001 SQL_NO_CACHE */ `dlid`,`other_id`,`dldate`,`dltime`,`other_col` FROM `mydb`.`mytable` FORCE INDEX(`PRIMARY`) WHERE (dldate<'2014-02-02') AND (`dlid` < '123226848') AND ((`dlid` >= '53129354')) ORDER BY `dlid` LIMIT 100`
Now, what I discovered is that this query works fine if the FORCE INDEX(
PRIMARY)
is removed.
mysql> SELECT /*!40001 SQL_NO_CACHE */ COUNT(*) FROM `mydb`.`mydb` WHERE (dldate<'2014-02-02') AND (`dlid` < '123226848') AND ((`dlid` >= '53129354')) ORDER BY `dlid` LIMIT 100; +----------+
| COUNT(*) |
+----------+
| 64 |
+----------+
1 row in set (0.08 sec)