Best practices for pt-table-sync with composite primary keys

I’m curious about the best practices for using pt-table-sync on tables with composite primary keys.

I have several fairly large tables (~5-10M rows) that have composite primary keys.

Running pt-table-checksum with the newer options that limit how many columns of the key are used for chunking lets that tool finish across the database in question in under 5 minutes.
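For reference, this is roughly the shape of the checksum run I mean (host, user, and schema names are just placeholders, and the option values are examples rather than my exact command):

```
# Checksum one schema; cap chunking at the left-most columns of the PK
# so chunk boundaries stay cheap on a wide composite key.
# Host, user, and database names below are placeholders.
pt-table-checksum h=master.example.com,u=checksum_user,p=... \
  --databases=mydb \
  --chunk-index=PRIMARY \
  --chunk-index-columns=2 \
  --replicate=percona.checksums
```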

However, when I then use pt-table-sync to find the actual differences, it has been running for hours and hours and I am left with little insight into how long it will take. Looking at the actual queries coming across, I can see the X/Y chunk comments, but once it reaches Y/Y its numbering simply resets on a new batch for the same table.

I’m running with the --replicate option in the hope that it would use that information to re-examine only the chunks where a difference has already been detected. To further complicate matters, it seems to be hitting some kind of bug where it makes no progress and periodically just issues another REPLACE for the same specific row, over and over.
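For what it’s worth, this is the general pattern I’m running (DSNs and credentials are placeholders), so the rows in percona.checksums should be driving which chunks get re-examined:

```
# Sync only the chunks pt-table-checksum recorded as different.
# Using --print first to inspect the statements before --execute.
# Replica host and user below are placeholders.
pt-table-sync --replicate=percona.checksums \
  --sync-to-master h=replica.example.com,u=sync_user,p=... \
  --print --verbose
```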

Has anyone run into a similar situation and found a good way to make pt-table-sync useful for tables like these?

I have seen pt-table-sync get into a loop, repeating the same query, in some cases with compound primary keys, most often when the key includes an ENUM column.

I have no solution for this other than forcing --algorithms=GroupBy, which may or may not be too slow on a production server. A sketch of what I mean is below.
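Something along these lines (DSN, schema, and table names are placeholders); forcing GroupBy sidesteps the chunk/nibble logic that seems to get stuck, at the cost of heavier queries:

```
# Prefer the GroupBy algorithm over Chunk/Nibble for the affected table.
# Review with --print before switching to --execute on production.
# Replica host, user, database, and table names are placeholders.
pt-table-sync --algorithms=GroupBy \
  --databases=mydb --tables=big_table \
  --sync-to-master h=replica.example.com,u=sync_user,p=... \
  --print
```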
