Pt-archiver leads to full table scan (and therefore lock on the source MyISAM table)

So, I’m attempting to use pt-archiver to archive and purge old data from a table. The source table is unfortunately using the MyISAM engine, which causes the major part of the issue.

pt-archiver --ask-pass --file archive-%Y-%m-%d.csv --limit 100 --output-format csv --progress 10000 --sleep 1 --source h=db.example.com,u=user,D=mydb,t=mytable --where "dldate<'2014-02-02'"

The above works OK until the archiving gets to a point where there are fewer than 100 rows with “dldate<‘2014-02-20’”. At that point a SQL statement gets issues which causes a full table scan (on a 70 million row table), which locks said table on the master database server - causing big issues.

The statement pt-archiver ends up using looks like the following, I’ve anonymised the statement somewhat:

SELECT /*!40001 SQL_NO_CACHE */ `dlid`,`other_id`,`dldate`,`dltime`,`other_col` FROM `mydb`.`mytable` FORCE INDEX(`PRIMARY`) WHERE (dldate<'2014-02-02') AND (`dlid` < '123226848') AND ((`dlid` >= '53129354')) ORDER BY `dlid` LIMIT 100`

Now, what I discovered is that this query works fine if the FORCE INDEX(PRIMARY) is removed.

mysql> SELECT /*!40001 SQL_NO_CACHE */ COUNT(*) FROM `mydb`.`mydb` WHERE (dldate<'2014-02-02') AND (`dlid` < '123226848') AND ((`dlid` >= '53129354')) ORDER BY `dlid` LIMIT 100;                                                                                    +----------+
| COUNT(*) |
+----------+
|       64 |
+----------+
1 row in set (0.08 sec)
2 Likes

Hi John,

MyISAM has table level locking instead of row level locking, so this kind of operations are expensive.

2nd, does the table has an index on the column you are purging? (dldate)
Pt-archiver will search for the matching condition to delete in chunks of ids. So maybe the rows you intended to delete had the lowest ids where pt-archiver began checking, but if there is no index on dldate, pt-archiver will need to do the entire full table scan to make sure there are no other rows matching the purge condition.

Regards

The table does have an index for dldate, but the “FORCE INDEX(PRIMARY)” appears to stop the index being used:

  PRIMARY KEY  (`dlid`),
  KEY `chrono_idx` (`dldate`,`dltime`,`dlid`),
  KEY `espid` (`espid`),
  KEY `dldate_idx` (`dldate`),
  KEY `domain_id_idx` (`domain_id`)