Atomic table rename with pt-online-schema-change and Aurora replica

Brian_Hill · February 13, 2024, 11:22pm

Hi,

I have hopefully a quick question: I used pt-online-schema-change to rename a table like this:

pt-online-schema-change --execute --alter "MODIFY value VARCHAR(4096)" \
  --ask-pass --host [hostname] --user [username] D=[database],t=[table]

I ran this on an Aurora cluster running Aurora 2.11.0. The modifications to the table worked exactly as expected with no issues. However, at the moment of the atomic table swap, on the Aurora replicas, we had some queries fail with an error table [table] doesn't exist.

I’ve engaged the Aurora support team on this to get clarification, but thought I’d ask here for the benefit of community awareness what exactly it means that the table swap is atomic. I was expecting that, from the point of view of another query, the table would always exist (i.e. the entire swap is atomic, not each individual table rename). Does it work that way in Community MySQL? And is it known that Aurora behaves differently?

Ivan_Groenewold · February 16, 2024, 3:10pm

Hey Brian, thanks for posting here. Does the table have any foreign keys? that can cause the swap to not be atomic as per pt-online-schema-change — Percona Toolkit Documentation

Brian_Hill · February 16, 2024, 3:49pm

Ah yes, I saw this in the docs. The table does have one foreign key, but there are no other tables that reference the one being renamed using a foreign key. Based on how I was reading the docs, I think that shouldn’t require using the foreign key functionality in pt-osc.

I appreciate you replying – I also heard back yesterday from AWS support, who was able to confirm that the non-atomic rename behavior is expected based on the design of Aurora’s replication system. They suggested modifying queries of the table to retry in case they receive this error, but we are hesitant to implement that to work around this issue.

Another approach we’ve considered is temporarily re-routing read traffic that uses this table to the writer temporarily while we perform the migration, because the table rename does appear to be atomic with respect to the writer instance. This doesn’t give us a great feeling, either, but doesn’t require that the application be changed.

Ivan_Groenewold · February 16, 2024, 4:23pm

Thanks for letting us know about this. AFAIK Aurora read replicas actually use the same underlying storage, so it’s likely they changed the source code to somehow invalidate the table on the replicas when it is changed on the writer node. Unfortunately the source is not public.
Routing all traffic to the writer could be risky depending on the load but I don’t see any other options except that or changing the app to retry.

Topic		Replies	Views
pt-online-schema swapping tables Percona Toolkit	3	982	November 19, 2018
Atomic Rename Updating one Table to Another MySQL & MariaDB mysql , percona	4	710	July 15, 2022
pt-online-schema-change concepts Percona Toolkit	1	676	April 2, 2015
Pt-online-schema-change breaks table replication in AWS DMS Other MySQL® Questions mysql , percona	5	1392	March 1, 2021
Can pt-online-schema-change be run separately to just handle the table swap? Percona Toolkit	3	1678	February 21, 2023

Atomic table rename with pt-online-schema-change and Aurora replica

Related topics