Pt-online-schema-change failed at 23% on 501M row table — Is it possible to resume from where it left off?

Hello everyone

Environment

  • Percona XtraDB Cluster 5.7.44-48 (wsrep 31.65)
  • 4-node Galera cluster
  • pt-online-schema-change (latest on RHEL)
  • Table: Attachments (~501 million rows)

What I Was Doing

Running a charset conversion on the Attachments table as part of an RT (Request Tracker) upgrade from 4.4.x to 5.0.x:

pt-online-schema-change \
  --execute \
  --no-drop-old-table \
  --progress=time,30 \
  --chunk-size=500 \
  --max-load='Threads_running:30' \
  --critical-load='Threads_running:60' \
  --sleep=0.5 \
  --max-lag=5 \
  --alter-foreign-keys-method=auto \
  --recursion-method=none \
  D=mydb,t=Attachments \
  --alter "CONVERT TO CHARACTER SET utf8mb4, ROW_FORMAT=DYNAMIC"

Timeline

Started  : May 19, 17:00
Failed   : May 21, ~17:00
Runtime  : approximately 48 hours
Progress : 23% complete at time of failure
Remaining: 5 days + 5 hours + 33 minutes (as shown by pt-osc)

The last output before failure was:

Copying `mydb`.`Attachments`:  23% 5+05:33:08 remain

What Was Left Behind

After the failure, the following objects exist on all 4 nodes (replicated via TOI):

  • Shadow table mydb._Attachments_new — partially filled with approximately 115 million rows (23% of 501M)
  • Three pt-osc triggers still active on the original Attachments table:
    • pt_osc_mydb_Attachments_ins
    • pt_osc_mydb_Attachments_upd
    • pt_osc_mydb_Attachments_del

The original Attachments table is completely untouched.


My Question

I understand that pt-osc has no built-in resume functionality and a normal restart would drop the shadow table and start over from row 1.

However, given that:

  1. The shadow table _Attachments_new already has ~115 million rows correctly copied
  2. The three triggers are still active and have been continuously capturing all INSERT / UPDATE / DELETE changes on the original table since the run started on May 19th
  3. The shadow table already has the correct structure (utf8mb4, ROW_FORMAT=DYNAMIC)

Is there any safe way to resume or continue the copy from where it left off, rather than starting over from scratch?

For example:

  • Is it possible to manually run the chunk copy INSERT statements starting from the last successfully copied primary key value, and then manually trigger the ANALYZE + RENAME steps that pt-osc would normally do?
  • Or is the risk of data inconsistency between the shadow table and original table too high to attempt a manual resume?
  • Is there a way to point pt-osc at the existing shadow table and tell it to continue?

If a full restart is the only safe option, I completely understand — I just want to be certain before dropping 48 hours of copy work and starting over.


Additional Context

The likely reason for the repeated failures is that chunk-size=500 with sleep=0.5 generates approximately 1 million chunks with 0.5s sleep each — totalling ~5.8 days of sleep time alone for a 501M row table. I plan to retry with chunk-size=10000 and sleep=0.05 which should bring the total runtime down to approximately 3-5 hours.

One node in the cluster was also under flow control pressure at the time (wsrep_local_recv_queue_avg=26, wsrep_flow_control_sent=32,071) which likely contributed to the node dropping out of Synced state and causing pt-osc to lose its connection.

Thank you in advance for any guidance.

Best Regards

Srikanth Lingala

There is a resume functionality added from v3.6.0

You can refer to this blog,

However to use this resume functionality you need to start the pt-online-schema-change using history option,

If this is percona xtradb cluster hope that you are adding this option --max-flow-ctl=0 so if it finds there is a flow control, it will pause its operation until it gets normal. This will not let cluster stall or pull out the node.

pt-online-schema-change has good capability to handle these failures. You can also manually handle by pausing operations if you feel cluster is under high load using --pause option.

Hope these option helps you to alter the table.

Please remove the --chunk-size flag. PTOSC automatically adjusts the chunk-size based on --chunk-time to maximize performance and speed. The other safety limits you have in place will handle if the server gets too busy, including the max-flow-ctl that Yunus recommended.