Table-checksum and table-sync

pt-table-sync feels as though it could be used more efficiently if it were given --where parameters based on the results from pt-table-checksum. It doesn’t seem to do so when running with --sync-to-master, though. It feels inefficient to re-process an entire large table.

I can see which tables need attention from the CRC and CNT values in the checksums table. Does anyone know how to compose a WHERE clause based on the content of the checksums table, or perhaps have a different workflow to suggest?


When you run pt-table-sync, you need to pass the --replicate parameter and specify the database.table where the checksum data is stored. This makes pt-t-s read the checksum results first and look only at those tables with differences.


Thank you, yes. I’m using --replicate and it is applying only to those tables, but it doesn’t seem to process only part of the table. It may be that some of our tables are in worse shape than I knew, but based upon the structure of the checksums table it seemed as though targeted processing might be available rather than going through the whole table a second time.


Hi @mark-gci,
If you have evidence that pt-t-s, while using --replicate, is checking chunks other than those where this_crc != master_crc, please open a bug report at https://jira.percona.com/, as that would indeed be a performance issue. Keep in mind that pt-t-s operates at the chunk level, so if you have a chunk of 1M rows, pt-t-s will check every row in that chunk. If you find this is the case, you can configure pt-t-c to use a static chunk size (--chunk-size) so that chunks stay small.


I’m not sure I could provide that evidence. It may not be happening. The whole point of this discussion is to understand more about how to determine which part of a table relates to a given checksum+crc.


That information can be found in the percona.checksums table, which holds the results of the pt-t-c process. For each chunk you can see the index that was used (chunk_index) and the lower and upper index boundaries (lower_boundary and upper_boundary).
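As a rough illustration of how those boundaries could be turned into a WHERE clause, here is a minimal Python sketch. It is not part of Percona Toolkit. The helper name where_for_chunk and the key column name `id` are hypothetical, and the sketch assumes the chunk index is a single-column integer key. Note that chunk_index stores the index name (for example PRIMARY), not the column name, so the column has to be supplied by hand. Rows with a NULL boundary are skipped, because interpreting them is outside what this sketch covers.

```python
# Query to list the chunks whose checksums differ; run it on the replica.
# The table name percona.checksums is taken from the thread above.
DIFF_CHUNKS_SQL = """
SELECT db, tbl, chunk, chunk_index, lower_boundary, upper_boundary
FROM percona.checksums
WHERE master_cnt <> this_cnt
   OR master_crc <> this_crc
   OR ISNULL(master_crc) <> ISNULL(this_crc)
ORDER BY db, tbl, chunk
"""


def where_for_chunk(column, lower, upper):
    """Build a WHERE clause covering one chunk.

    Assumption: the chunk index is a single-column integer key named
    `column`. Returns None when either boundary is NULL, since this
    sketch does not try to interpret those rows.
    """
    if lower is None or upper is None:
        return None
    # Boundaries are stored as text in the checksums table, so convert them.
    return f"`{column}` >= {int(lower)} AND `{column}` <= {int(upper)}"


if __name__ == "__main__":
    # Example row values as they might come back from DIFF_CHUNKS_SQL.
    print(where_for_chunk("id", "1001", "2000"))
```

The resulting string could then be tried with the --where option of pt-table-sync, ideally together with --print first so that the proposed changes can be reviewed before anything is executed.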
