Is this a BUG or an intended behavior, pt-archiver misses rows when archiving with a secondary index and option --no-delete

Luke03011 · January 12, 2024, 7:57am

pt-archiver version is 3.5.4 and OS is Centos 7.9

When given a non-primary index to pt-archiver, it archives the wrong data set.

Steps to repeat:

create test tables

create table t(id int not null primary key,c1 int,c2 int, index idx_c2(c2));
mysql> select * from t;
+-----+------+------+
| id  | c1   | c2   |
+-----+------+------+
| 141 |  142 |    2 |
| 142 |  143 |    2 |
| 143 |  144 |    2 |
| 144 |  145 |    2 |
| 145 |  146 |    2 |
| 146 |  147 |    2 |
| 147 |  148 |    2 |
| 148 |  149 |    2 |
| 149 |  150 |    2 |
| 150 |  151 |    2 |
| 151 |  152 |    2 |
| 152 |  153 |    2 |
| 153 |  154 |    2 |
| 154 |  155 |    2 |
| 155 |  156 |    2 |
| 156 |  157 |    2 |
| 157 |  158 |    2 |
| 158 |  159 |    2 |
| 159 |  160 |    2 |
| 160 |  161 |    2 |
| 161 |  162 |    2 |
| 162 |  163 |    2 |
| 163 |  164 |    2 |
| 164 |  165 |    2 |
| 165 |  166 |    2 |
| 166 |  167 |    2 |
| 167 |  168 |    2 |
| 168 |  169 |    2 |
| 169 |  170 |    2 |
| 170 |  171 |    2 |
| 171 |  172 |    2 |
| 172 |  173 |    2 |
| 173 |  174 |    2 |
| 174 |  175 |    2 |
| 175 |  176 |    2 |
| 176 |  177 |    2 |
| 177 |  178 |    2 |
| 178 |  179 |    2 |
| 179 |  180 |    2 |
| 180 |  181 |    2 |
| 181 |  182 |    2 |
| 182 |  183 |    2 |
| 183 |  184 |   10 |
| 184 |  185 |   10 |
| 185 |  186 |   10 |
| 186 |  187 |   10 |
| 187 |  188 |   10 |
| 188 |  189 |   10 |
| 189 |  190 |   10 |
| 190 |  191 |   10 |
| 191 |  192 |   10 |
| 192 |  193 |   10 |
| 193 |  194 |   10 |
| 194 |  195 |   10 |
| 195 |  196 |   10 |
| 196 |  197 |   10 |
| 197 |  198 |   10 |
| 198 |  199 |   10 |
| 199 |  200 |   10 |
| 200 |  201 |   10 |
| 201 |  202 |   10 |
| 202 |  203 |   10 |
| 203 |  204 |   30 |
| 204 |  205 |   30 |
| 205 |  206 |   30 |
| 206 |  207 |   30 |
| 207 |  208 |   30 |
| 208 |  209 |   30 |
| 209 |  210 |   30 |
| 210 |  211 |   30 |
| 211 |  212 |   30 |
| 212 |  213 |   30 |
| 213 |  214 |   30 |
| 214 |  215 |   30 |
| 215 |  216 |   30 |
| 216 |  217 |   30 |
| 217 |  218 |   30 |
| 218 |  219 |   30 |
| 219 |  220 |   30 |
| 220 |  221 |   30 |
| 221 |  222 |   30 |
| 222 |  223 |   30 |
| 223 |  224 |   30 |
| 224 |  225 |   30 |
| 225 |  226 |   30 |
| 226 |  227 |   30 |
| 227 |  228 |   30 |
| 228 |  229 |   30 |
| 229 |  230 |   30 |
| 230 |  231 |   30 |
| 231 |  232 |   30 |
| 232 |  233 |   30 |
+-----+------+------+
92 rows in set (0.20 sec)

create table t2 like t;

archive rows with c2>11 to table t2 using the following command:

pt-archiver --source h=host,u=user,D=test,t=t,i=idx_c2 --dest t=t2 --no-delete --limit 8 --commit-each --ask-pass --where "c2>11"

check the target table t2:

mysql> select * from t2;
+-----+------+------+
| id  | c1   | c2   |
+-----+------+------+
| 203 |  204 |   30 |
| 204 |  205 |   30 |
| 205 |  206 |   30 |
| 206 |  207 |   30 |
| 207 |  208 |   30 |
| 208 |  209 |   30 |
| 209 |  210 |   30 |
| 210 |  211 |   30 |
+-----+------+------+
8 rows in set (0.19 sec)

It only archive 8 rows, and this is obviously wrong, there are 30 rows matching c2>11.

I enabled the general log and these are the SELECTS pt-archiver generated:

SELECT /*!40001 SQL_NO_CACHE */ `id`,`c1`,`c2` FROM `test`.`t` FORCE INDEX(`idx_c2`) WHERE (c2>11) ORDER BY `c2` LIMIT 8

SELECT /*!40001 SQL_NO_CACHE */ `id`,`c1`,`c2` FROM `test`.`t` FORCE INDEX(`idx_c2`) WHERE (c2>11) AND (((('30' IS NULL AND `c2` IS NOT NULL) OR (`c2` > '30')))) ORDER BY `c2` LIMIT 8

And the second generated SQL is definitely wrong. I raised a related ticket at Could pt-archiver use a non-unique key as chunk index? - #3 by Luke03011

Does this mean pt-archiver can not work on non-primary(unique) indexes(with --no-delete option)?

And when I remove the option --no-delete, it can archive correctly. And it generated the following SELECTS:

SELECT /*!40001 SQL_NO_CACHE */ `id`,`c1`,`c2` FROM `test`.`t` FORCE INDEX(`idx_c2`) WHERE (c2>11) ORDER BY `c2` LIMIT 8
... deletion omitted
SELECT /*!40001 SQL_NO_CACHE */ `id`,`c1`,`c2` FROM `test`.`t` FORCE INDEX(`idx_c2`) WHERE (c2>11) AND ((('30' IS NULL OR `c2` >= '30'))) ORDER BY `c2` LIMIT 8
..
SELECT /*!40001 SQL_NO_CACHE */ `id`,`c1`,`c2` FROM `test`.`t` FORCE INDEX(`idx_c2`) WHERE (c2>11) AND ((('30' IS NULL OR `c2` >= '30'))) ORDER BY `c2` LIMIT 8
...
SELECT /*!40001 SQL_NO_CACHE */ `id`,`c1`,`c2` FROM `test`.`t` FORCE INDEX(`idx_c2`) WHERE (c2>11) AND ((('30' IS NULL OR `c2` >= '30'))) ORDER BY `c2` LIMIT 8
...
SELECT /*!40001 SQL_NO_CACHE */ `id`,`c1`,`c2` FROM `test`.`t` FORCE INDEX(`idx_c2`) WHERE (c2>11) AND ((('30' IS NULL OR `c2` >= '30'))) ORDER BY `c2` LIMIT 8

I am curious why --no-delete affects the generated SELECTs? In my understanding, they should be exactly the same.

If it’s intended, please update the doc at pt-archiver — Percona Toolkit Documentation to warn users it may cause incorrect archive - missing data or duplicated data in the destination.

matthewb · January 12, 2024, 3:37pm

Hello @Luke03011,

No, it is not wrong. Look at your command:

pt-archiver ... --limit 8

You literally told pt-archiver to only archive 8 rows. The SQL from the general log also confirmed that, LIMIT 8

Luke03011 · January 15, 2024, 2:09am

Are you sure? I think the --limit option is to tell pt-archiver to fetch this many rows per statement but not the total rows to archive. And if this specifies the total rows, why the the second command without --no-delete archives 30 rows?

This is from the pt-archiver doc:

–limit

type: int; default: 1

Number of rows to fetch and archive per statement.

Limits the number of rows returned by the SELECT statements that retrieve rows to archive. Default is one row. It may be more efficient to increase the limit, but be careful if you are archiving sparsely, skipping over many rows; this can potentially cause more contention with other queries, depending on the storage engine, transaction isolation level, and options such as --for-update.

Topic		Replies	Views
pt-archiver use of indexes Percona Toolkit	0	707	April 18, 2016
Could pt-archiver use a non-unique key as chunk index? Percona Toolkit	2	1293	January 12, 2024
How to speed up pt-archiver	1	1545	June 16, 2022
[Closed] Optimizing pt-archiver for row deletion ( 5.7.44-48 Percona Server) Percona Server for MySQL 5.7	5	135	January 16, 2025
Pt-archiver not using --columns listed when index is defined manually Percona Toolkit	2	1237	May 24, 2022

Is this a BUG or an intended behavior, pt-archiver misses rows when archiving with a secondary index and option --no-delete

Related topics