Could pt-archiver use a non-unique key as chunk index?

Luke03011 · November 17, 2022, 6:33am

Assume I have a table defined as follows:

create table t(id int primary key, c1 int,c2 int, idx_c2(c2))engine=innodb

I want to archive data satisfying the following condition:

c2>=value

When I use --dry-run to run pt-archiver, it always picks up the primary key as the chunk index unless specifying the i part in the --source option.

SELECT /*!40001 SQL_NO_CACHE */ `id`,`c1`,`c2` FROM `test`.`t` FORCE INDEX(`idx_c2`) WHERE (c2>=6350) ORDER BY `c2` LIMIT 1000

Can this guarantee no duplicate rows to be archived? Because the SELECT generated has no guarantee it will not produce duplicate rows.

In fact, I expect the generated SELECT like the following:

SELECT /*!40001 SQL_NO_CACHE */ `id`,`c1`,`c2` FROM `test`.`t` FORCE INDEX(`idx_c2`) WHERE  ((c2=? and id>? ) or c2>?) ORDER BY `c2`,`id` LIMIT 1000

and the ? is the last value for the last iteration.

matthewb · November 17, 2022, 2:18pm

pt-archiver needs a unique index in order to “traverse” the table and to ensure that only 1 row is deleted after being inserted into the destination. If you tell pt-archiver to use a non-unique index, then there is no guarantee that you won’t have duplicates or delete the wrong data.

You might try creating an index on (PK, c2), which won’t give you direct uniqueness, but indirect since it uses the PK as a component.

Luke03011 · January 12, 2024, 2:58am

Sorry it’s a long time when I post this ticket. But as the idx_c2 actually contains the primary key id internally(In fact every secondary index will contain the primary key columns to make the indexed value is unique in the secondary index tree). So I am curious why pt-archiver does not use that part to guarantee the correctness.

Topic		Replies	Views
Is this a BUG or an intended behavior, pt-archiver misses rows when archiving with a secondary index and option --no-delete Percona Toolkit	2	494	January 15, 2024
pt-archiver use of indexes Percona Toolkit	0	707	April 18, 2016
pt-archiver errors with unique key violation Percona Toolkit	0	547	February 13, 2013
pt-archiver SELECT query fails because of primary key Percona Toolkit	6	1714	May 29, 2017
Pt-archiver leads to full table scan (and therefore lock on the source MyISAM table) Percona Toolkit	2	1426	July 7, 2021

Could pt-archiver use a non-unique key as chunk index?

Related topics