Hi all. I have a pt-archiver job that is meant to run all day, but it exits after about 30 minutes with the message "Exiting because retries exceeded." I can’t see any reason why the inserts would fail, and the rest of the output doesn’t give me any clues.
The output looks like this:
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: TIME ELAPSED COUNT
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: 2021-05-19T21:00:02 0 0
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: 2021-05-19T21:03:46 224 10000
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: 2021-05-19T21:07:09 427 20000
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: 2021-05-19T21:10:35 632 30000
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: 2021-05-19T21:14:15 853 40000
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: 2021-05-19T21:17:42 1059 50000
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: 2021-05-19T21:21:11 1269 60000
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: 2021-05-19T21:24:39 1477 70000
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: 2021-05-19T21:28:04 1682 80000
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: 2021-05-19T21:31:32 1889 90000
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: 2021-05-19T21:34:56 2094 100000
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: 2021-05-19T21:38:21 2299 110000
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: 2021-05-19T21:41:48 2506 120000
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: 2021-05-19T21:44:38 2676 128267
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: Started at 2021-05-19T21:00:02, ended at 2021-05-19T21:48:02
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: Source: D=redacted
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: Dest: D=redacted
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: SELECT 128300
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: INSERT 128267
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: DELETE 128250
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: Action Count Time Pct
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: sleep 2565 2565.3479 89.05
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: inserting 128267 45.6143 1.58
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: bulk_deleting 2565 27.3536 0.95
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: select 2567 23.6883 0.82
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: commit 256536 9.8213 0.34
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: rollback 2 0.0001 0.00
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: other 0 208.9058 7.25
May 19 21:48:02 ip-172-22-81-79 do_purge.sh[32735] O: Exiting because retries exceeded.
Any clues on how to investigate would be much appreciated.
FWIW, nothing else is writing to the dest table, which is just a simple table that receives the primary key of each row archived from the old table. However, I do have two other jobs running that read from this table to remove orphaned records in a couple of child tables.
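
In case it helps narrow things down, the counters in the summary can be cross-checked with a little arithmetic. This is only a sketch: the numbers are copied from the output above, and it assumes each bulk_deleting action removed one full batch of rows, which the output does not confirm. The variable names are made up for illustration.

```python
from datetime import datetime

# Counters copied from the summary in the output above.
selected = 128300      # SELECT
inserted = 128267      # INSERT
deleted = 128250       # DELETE
bulk_deletes = 2565    # count for the "bulk_deleting" action

# Assumption: every bulk delete removed one full batch of rows.
batch_size, remainder = divmod(deleted, bulk_deletes)

# Rows that were inserted into dest but not yet deleted from source.
inserted_not_deleted = inserted - deleted

# Rows that were selected but never inserted.
selected_not_inserted = selected - inserted

# Time between the last progress line and the reported end time.
last_progress = datetime.fromisoformat("2021-05-19T21:44:38")
ended = datetime.fromisoformat("2021-05-19T21:48:02")
stall_seconds = int((ended - last_progress).total_seconds())

print("rows per bulk delete:", batch_size, "remainder:", remainder)
print("inserted but not deleted:", inserted_not_deleted)
print("selected but not inserted:", selected_not_inserted)
print("seconds between last progress line and exit:", stall_seconds)
```

If that assumption holds, the job stopped partway through a batch: some rows of the last batch were inserted but not deleted, the rest were selected but never inserted, and there was a gap of a few minutes between the last progress line and the exit.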