pt-archiver version is 3.5.4 and OS is Centos 7.9
When given a non-primary index to pt-archiver
, it archives the wrong data set.
Steps to repeat:
- create test tables
create table t(id int not null primary key,c1 int,c2 int, index idx_c2(c2));
mysql> select * from t;
+-----+------+------+
| id | c1 | c2 |
+-----+------+------+
| 141 | 142 | 2 |
| 142 | 143 | 2 |
| 143 | 144 | 2 |
| 144 | 145 | 2 |
| 145 | 146 | 2 |
| 146 | 147 | 2 |
| 147 | 148 | 2 |
| 148 | 149 | 2 |
| 149 | 150 | 2 |
| 150 | 151 | 2 |
| 151 | 152 | 2 |
| 152 | 153 | 2 |
| 153 | 154 | 2 |
| 154 | 155 | 2 |
| 155 | 156 | 2 |
| 156 | 157 | 2 |
| 157 | 158 | 2 |
| 158 | 159 | 2 |
| 159 | 160 | 2 |
| 160 | 161 | 2 |
| 161 | 162 | 2 |
| 162 | 163 | 2 |
| 163 | 164 | 2 |
| 164 | 165 | 2 |
| 165 | 166 | 2 |
| 166 | 167 | 2 |
| 167 | 168 | 2 |
| 168 | 169 | 2 |
| 169 | 170 | 2 |
| 170 | 171 | 2 |
| 171 | 172 | 2 |
| 172 | 173 | 2 |
| 173 | 174 | 2 |
| 174 | 175 | 2 |
| 175 | 176 | 2 |
| 176 | 177 | 2 |
| 177 | 178 | 2 |
| 178 | 179 | 2 |
| 179 | 180 | 2 |
| 180 | 181 | 2 |
| 181 | 182 | 2 |
| 182 | 183 | 2 |
| 183 | 184 | 10 |
| 184 | 185 | 10 |
| 185 | 186 | 10 |
| 186 | 187 | 10 |
| 187 | 188 | 10 |
| 188 | 189 | 10 |
| 189 | 190 | 10 |
| 190 | 191 | 10 |
| 191 | 192 | 10 |
| 192 | 193 | 10 |
| 193 | 194 | 10 |
| 194 | 195 | 10 |
| 195 | 196 | 10 |
| 196 | 197 | 10 |
| 197 | 198 | 10 |
| 198 | 199 | 10 |
| 199 | 200 | 10 |
| 200 | 201 | 10 |
| 201 | 202 | 10 |
| 202 | 203 | 10 |
| 203 | 204 | 30 |
| 204 | 205 | 30 |
| 205 | 206 | 30 |
| 206 | 207 | 30 |
| 207 | 208 | 30 |
| 208 | 209 | 30 |
| 209 | 210 | 30 |
| 210 | 211 | 30 |
| 211 | 212 | 30 |
| 212 | 213 | 30 |
| 213 | 214 | 30 |
| 214 | 215 | 30 |
| 215 | 216 | 30 |
| 216 | 217 | 30 |
| 217 | 218 | 30 |
| 218 | 219 | 30 |
| 219 | 220 | 30 |
| 220 | 221 | 30 |
| 221 | 222 | 30 |
| 222 | 223 | 30 |
| 223 | 224 | 30 |
| 224 | 225 | 30 |
| 225 | 226 | 30 |
| 226 | 227 | 30 |
| 227 | 228 | 30 |
| 228 | 229 | 30 |
| 229 | 230 | 30 |
| 230 | 231 | 30 |
| 231 | 232 | 30 |
| 232 | 233 | 30 |
+-----+------+------+
92 rows in set (0.20 sec)
create table t2 like t;
- archive rows with
c2>11
to tablet2
using the following command:
pt-archiver --source h=host,u=user,D=test,t=t,i=idx_c2 --dest t=t2 --no-delete --limit 8 --commit-each --ask-pass --where "c2>11"
- check the target table
t2
:
mysql> select * from t2;
+-----+------+------+
| id | c1 | c2 |
+-----+------+------+
| 203 | 204 | 30 |
| 204 | 205 | 30 |
| 205 | 206 | 30 |
| 206 | 207 | 30 |
| 207 | 208 | 30 |
| 208 | 209 | 30 |
| 209 | 210 | 30 |
| 210 | 211 | 30 |
+-----+------+------+
8 rows in set (0.19 sec)
It only archive 8
rows, and this is obviously wrong, there are 30
rows matching c2>11
.
I enabled the general log and these are the SELECTS pt-archiver
generated:
SELECT /*!40001 SQL_NO_CACHE */ `id`,`c1`,`c2` FROM `test`.`t` FORCE INDEX(`idx_c2`) WHERE (c2>11) ORDER BY `c2` LIMIT 8
SELECT /*!40001 SQL_NO_CACHE */ `id`,`c1`,`c2` FROM `test`.`t` FORCE INDEX(`idx_c2`) WHERE (c2>11) AND (((('30' IS NULL AND `c2` IS NOT NULL) OR (`c2` > '30')))) ORDER BY `c2` LIMIT 8
And the second generated SQL is definitely wrong. I raised a related ticket at Could pt-archiver use a non-unique key as chunk index? - #3 by Luke03011
Does this mean pt-archiver
can not work on non-primary(unique) indexes(with --no-delete
option)?
And when I remove the option --no-delete
, it can archive correctly. And it generated the following SELECTS:
SELECT /*!40001 SQL_NO_CACHE */ `id`,`c1`,`c2` FROM `test`.`t` FORCE INDEX(`idx_c2`) WHERE (c2>11) ORDER BY `c2` LIMIT 8
... deletion omitted
SELECT /*!40001 SQL_NO_CACHE */ `id`,`c1`,`c2` FROM `test`.`t` FORCE INDEX(`idx_c2`) WHERE (c2>11) AND ((('30' IS NULL OR `c2` >= '30'))) ORDER BY `c2` LIMIT 8
..
SELECT /*!40001 SQL_NO_CACHE */ `id`,`c1`,`c2` FROM `test`.`t` FORCE INDEX(`idx_c2`) WHERE (c2>11) AND ((('30' IS NULL OR `c2` >= '30'))) ORDER BY `c2` LIMIT 8
...
SELECT /*!40001 SQL_NO_CACHE */ `id`,`c1`,`c2` FROM `test`.`t` FORCE INDEX(`idx_c2`) WHERE (c2>11) AND ((('30' IS NULL OR `c2` >= '30'))) ORDER BY `c2` LIMIT 8
...
SELECT /*!40001 SQL_NO_CACHE */ `id`,`c1`,`c2` FROM `test`.`t` FORCE INDEX(`idx_c2`) WHERE (c2>11) AND ((('30' IS NULL OR `c2` >= '30'))) ORDER BY `c2` LIMIT 8
I am curious why --no-delete
affects the generated SELECTs? In my understanding, they should be exactly the same.
If it’s intended, please update the doc at pt-archiver — Percona Toolkit Documentation to warn users it may cause incorrect archive - missing data or duplicated data in the destination.