Does pt-archiver optimize table afterwards when using the --run-time and --optimize flags together?

Pretty much the title, we are using pt-archiver on a large table with a time limit (using the run-time flag)
The docs are unclear about whether optimize will be run after the time limit is reached and archival stops.

Hi @Atharv_Kirtikar , thanks for posting in the forum.

The sort answer is yes, it does.

I tested in my test environment to confirm, I loaded the sakila DB and added more data to it, and enabled slow query log to check all the commands; then I executed the following pt-archiver:

[centos@ip-172-31-23-46 ~]$ pt-archiver --source h=localhost,D=sakila,t=actor --where "actor_id>104000" --retries=5 --commit-each --limit=1000 --primary-key-only --purge --bulk-delete --no-check-charset --run-time=15 --why-quit --optimize=s
Exiting because time exceeded.

With the --why-quit flag you can see the pt-archiver finishes due to run-time was reached before finishing, but from the slow query log:

# Time: 2022-10-20T15:30:17.312126Z
# User@Host: root2[root2] @ localhost []  Id:    24
# Query_time: 0.011015  Lock_time: 0.000003 Rows_sent: 0  Rows_examined: 1000
SET timestamp=1666279817;
DELETE FROM `sakila`.`actor` WHERE (((`actor_id` >= '2585246'))) AND (((`actor_id` <= '2586245'))) AND (actor_id>104000) LIMIT 1000;
# Time: 2022-10-20T15:30:17.357019Z
# User@Host: root2[root2] @ localhost []  Id:    24
# Query_time: 0.044499  Lock_time: 0.000000 Rows_sent: 0  Rows_examined: 0
SET timestamp=1666279817;
commit;
# Time: 2022-10-20T15:30:17.358211Z
# User@Host: root2[root2] @ localhost []  Id:    24
# Query_time: 0.000792  Lock_time: 0.000004 Rows_sent: 1000  Rows_examined: 1000
SET timestamp=1666279817;
SELECT /*!40001 SQL_NO_CACHE */ `actor_id` FROM `sakila`.`actor` FORCE INDEX(`PRIMARY`) WHERE (actor_id>104000) AND (`actor_id` < '3473317') AND ((`actor_id` >= '2586245')) ORDER BY `actor_id` LIMIT 1000;
# Time: 2022-10-20T15:30:17.358689Z
# User@Host: root2[root2] @ localhost []  Id:    24
# Query_time: 0.000033  Lock_time: 0.000000 Rows_sent: 0  Rows_examined: 0
SET timestamp=1666279817;
commit;


# Time: 2022-10-20T15:30:33.137903Z
# User@Host: root2[root2] @ localhost []  Id:    24
# Query_time: 15.779057  Lock_time: 0.000080 Rows_sent: 0  Rows_examined: 0
SET timestamp=1666279817;
OPTIMIZE TABLE `sakila`.`actor`;


# Time: 2022-10-20T15:30:33.138654Z
# User@Host: root2[root2] @ localhost []  Id:    24
# Query_time: 0.000005  Lock_time: 0.000000 Rows_sent: 0  Rows_examined: 0
SET timestamp=1666279833;
# administrator command: Quit;

If you see the second to last command was OPTIMIZE TABLE, so this is something you should take in consideration, because pt-archiver will stop at run-time, but you need to add the time MySQL takes to optimize your table (from the output above, you can see my optimize took 15 seconds, so in total pt-archiver “ran” more than the run-time).

I hope this helps you.
Best regards,
Mauricio.

2 Likes