pt-table-checksum questions about algorithm

neuen · March 22, 2014, 3:39pm

Hi,

Please excuse my poor Perl knowledge: I was not able to read and understand the pt-table-checksum code thoroughly, so my questions may be quite simple. Here they are.

It issues the $ub_sql query to get the {upper} boundary for the current chunk and, if available, the {next_lower} boundary for the next chunk. Does it check that replicas have no data between {upper} and {next_lower}?

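
To make the first question concrete, here is a small Python model, assuming a sorted, unique integer key. It is a hypothetical sketch, not pt-table-checksum's code: master_boundaries picks {upper} as the key chunk_size - 1 rows after the chunk's lower boundary and uses the following key ({next_lower}) as the next chunk's lower boundary. keys_in_gaps then lists replica keys strictly between one chunk's {upper} and the next chunk's {next_lower}, i.e. rows that no inclusive [lower, upper] chunk range would contain.

```python
from typing import List, Tuple

Bounds = List[Tuple[int, int]]  # inclusive (lower, upper) per chunk


def master_boundaries(keys: List[int], chunk_size: int) -> Bounds:
    """Hypothetical model of the boundary lookup on the master: upper is the key
    chunk_size - 1 rows after lower, next_lower is the key right after upper."""
    bounds: Bounds = []
    pos = 0
    while pos < len(keys):
        upper_pos = min(pos + chunk_size - 1, len(keys) - 1)
        bounds.append((keys[pos], keys[upper_pos]))
        pos = upper_pos + 1  # position of next_lower
    return bounds


def keys_in_gaps(replica_keys: List[int], bounds: Bounds) -> List[int]:
    """Replica keys strictly between one chunk's upper and the next chunk's lower:
    no inclusive [lower, upper] chunk range would contain them."""
    gaps = [(bounds[i][1], bounds[i + 1][0]) for i in range(len(bounds) - 1)]
    return [k for k in replica_keys if any(upper < k < next_lower for upper, next_lower in gaps)]
```

For example, with master keys [10, 20, 30, 40, 50, 60] and chunk_size 2, a replica holding extra rows 25 and 45 would have exactly those two rows in the gaps. Whether the real tool checks for such rows is what the question asks; the sketch only shows which rows would be at stake.
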
Issuing $ub_sql and then $checksum_dml partially reads the same data twice (a LIMIT with an offset has to scan past that many rows). Since the master it runs on is often the main production server and is quite I/O-busy or even I/O-bound, this is resource-intensive, time-consuming, or both. Why can't it issue only $checksum_dml, using the $ub_sql LIMIT instead of WHERE boundaries, log the maximum boundary to the percona table, and check that it is the same on all replicas? That would replace the O(N) cost of running $ub_sql with the O(1) cost of a one-row SELECT from the percona table.
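
A rough Python sketch of the proposal in the question above, assuming a sorted, unique single-column key and statement-based replication (so each replica re-executes the same statement, with the same literal lower boundary, against its own data). All function names are hypothetical, and only the boundary bookkeeping is modeled: each chunk takes the first chunk_size keys >= the previous upper boundary and records MAX(key); a replica whose MAX differs for a chunk has checksummed a different key range.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Chunk = Tuple[Optional[int], int, int]  # (lower boundary or None, upper boundary, row count)


def single_pass_chunks(keys: List[int], chunk_size: int) -> List[Chunk]:
    """Model the proposed statement on the master: WHERE key >= lower
    ORDER BY key LIMIT chunk_size, recording MAX(key) as the upper boundary.
    `keys` must be sorted and unique (a simplifying assumption)."""
    if chunk_size < 2:
        # With >= and unique keys, a 1-row chunk would re-read its own boundary forever.
        raise ValueError("chunk_size must be at least 2")
    chunks: List[Chunk] = []
    lower: Optional[int] = None  # the first chunk has no lower boundary
    while True:
        start = 0 if lower is None else bisect_left(keys, lower)
        rows = keys[start:start + chunk_size]
        if not rows:
            break
        upper = max(rows)
        chunks.append((lower, upper, len(rows)))
        if start + len(rows) >= len(keys):
            break  # reached the end of the table
        lower = upper  # next chunk starts at (and re-reads) this boundary
    return chunks


def replica_upper_boundaries(replica_keys: List[int], master_chunks: List[Chunk],
                             chunk_size: int) -> List[Optional[int]]:
    """Re-run each chunk's statement on the replica's keys with the master's lower boundary."""
    uppers: List[Optional[int]] = []
    for lower, _, _ in master_chunks:
        start = 0 if lower is None else bisect_left(replica_keys, lower)
        rows = replica_keys[start:start + chunk_size]
        uppers.append(max(rows) if rows else None)
    return uppers


def mismatched_chunks(master_keys: List[int], replica_keys: List[int], chunk_size: int) -> List[int]:
    """Indexes of chunks whose recorded upper boundary differs between master and replica."""
    chunks = single_pass_chunks(master_keys, chunk_size)
    replica_uppers = replica_upper_boundaries(replica_keys, chunks, chunk_size)
    return [i for i, ((_, upper, _), r_upper) in enumerate(zip(chunks, replica_uppers))
            if upper != r_upper]
```

With master keys 0..9 and chunk_size 4, a replica missing key 5 yields a different MAX only for the middle chunk, so only that chunk's boundary would disagree in the percona table.
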

I mean a query like this (for simplicity, I assume a single-column index):

REPLACE INTO percona.checksums (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc)
SELECT 'schema', 'table', 'chunk', 'chunk_index', 'lower_boundary',
MAX(index_field) AS upper_boundary,
COUNT(*) AS cnt,
COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', field1, …, fieldN, CONCAT(ISNULL(field1), …, ISNULL(fieldN)))) AS UNSIGNED)), 10, 16)), 0) AS crc
FROM (
  -- LIMIT goes in a derived table so it bounds the rows being checksummed;
  -- on the outer aggregate query it would only limit the single result row
  SELECT *
  FROM schema.table FORCE INDEX(chunk_index)
  WHERE index_field >= :upper_boundary_of_previous_chunk:
  ORDER BY index_field
  LIMIT ?
) AS chunk_rows
/*checksum chunk*/
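
For readers following the checksum expression in the query above, this Python sketch models what it computes for one chunk. It assumes the CONCAT_WS separator is '#', that each non-NULL column value is already the string MySQL would produce for it, and that the connection character set is UTF-8; zlib.crc32 computes the standard CRC-32, which should match MySQL's CRC32(). Because BIT_XOR is order-independent, the result does not depend on the order in which rows are read.

```python
import zlib
from typing import Optional, Sequence, Tuple


def row_crc(values: Sequence[Optional[str]]) -> int:
    """Model of CRC32(CONCAT_WS('#', f1, ..., fN, CONCAT(ISNULL(f1), ..., ISNULL(fN)))).
    CONCAT_WS skips NULL arguments, so the trailing ISNULL flags keep NULL and '' distinct."""
    null_flags = "".join("1" if v is None else "0" for v in values)
    parts = [v for v in values if v is not None] + [null_flags]
    return zlib.crc32("#".join(parts).encode("utf-8"))


def chunk_checksum(rows: Sequence[Sequence[Optional[str]]]) -> Tuple[int, str]:
    """Model of COUNT(*) and LOWER(CONV(BIT_XOR(...), 10, 16)) over one chunk."""
    acc = 0
    for row in rows:
        acc ^= row_crc(row)  # BIT_XOR: row order does not matter
    return len(rows), format(acc, "x")  # lowercase hex, "0" for an empty chunk
```
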

At least one row that satisfies WHERE index_field = :upper_boundary_of_previous_chunk: will be read twice this way. This won't spoil the checksum results, but it is unnecessary overhead. I'm quite sure this can be avoided with something like MAX(CASE … END), but I don't want to go that far now.
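
The trade-off in the paragraph above can be checked with a small Python model, assuming a sorted single-column key that may contain duplicates. The functions are hypothetical and only track row positions: with >= (inclusive=True) every row is covered, but rows equal to the previous chunk's MAX are read again; a strict > avoids re-reads but can skip duplicates of the boundary value that the LIMIT cut off. The model raises an error when one key value fills an entire chunk, since the >= version cannot advance past it.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def chunk_ranges(keys: List[int], chunk_size: int, inclusive: bool) -> List[Tuple[int, int]]:
    """Hypothetical model: each chunk is the next chunk_size rows whose key is
    >= (inclusive) or > (strict) the previous chunk's MAX(key).
    Returns half-open [start, end) positions into the sorted `keys`."""
    ranges: List[Tuple[int, int]] = []
    start = 0
    while start < len(keys):
        end = min(start + chunk_size, len(keys))
        ranges.append((start, end))
        if end == len(keys):
            break
        upper = keys[end - 1]  # MAX(key) of this chunk
        nxt = bisect_left(keys, upper) if inclusive else bisect_right(keys, upper)
        if nxt <= start:
            raise ValueError("one key value fills a whole chunk; >= cannot advance")
        start = nxt
    return ranges


def coverage(keys: List[int], chunk_size: int, inclusive: bool) -> Tuple[int, int]:
    """Return (rows never checksummed, extra reads of already-checksummed rows)."""
    reads = [0] * len(keys)
    for start, end in chunk_ranges(keys, chunk_size, inclusive):
        for i in range(start, end):
            reads[i] += 1
    skipped = sum(1 for r in reads if r == 0)
    re_reads = sum(r - 1 for r in reads if r > 1)
    return skipped, re_reads
```

With unique keys 1..10 and chunk_size 4, the >= version re-reads exactly one boundary row per chunk after the first, which matches the "at least one row" in the post; with duplicate keys it may re-read more.
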

Thanks in advance for your answers.
