Not the answer you need?
Register and ask your own question!

pt-table-checksum questions about algorithm

neuenneuen EntrantInactive User Role Beginner

excuse me for my bad Perl knowledge, I was not able to read and understand pt-table-checksum code thoroughly so my questions might be quite simple. Here they are.

1. It issues $ub_sql query to get {upper} boundary for current chunk and {next_lower} boundary for next chunk if available. Does it check replicas have no data between {upper} and {next_lower}?

2. Issuing $ub_sql and then $checksum_dml partially reads same data twice (LIMIT with offset has to pass that many rows). Considering master server it is run on is often the main production server and is quite IO busy/bound it is either resource intensive or time consuming (or both). Why cannot it issue $checksum_dml only but with $ub_sql LIMIT instead of WHERE boundaries, log max boundary to percona table and check it is the same across all replicas? This will change O(N) complexity for running $ub_sql to O(1) complexity for running one row select from percona table.

I mean query like this (for simplicity I assume index single column):

REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc)
SELECT 'schema', 'table', 'chunk', 'chunk_index', 'lower_boundary',
MAX(`index_field`) AS upper_boundary,
COUNT(*) AS cnt,
COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS(' #', `field1`, ..., `fieldN`, CONCAT(ISNULL(`field1`), ..., ISNULL(`fieldN`)))) AS UNSIGNED)), 10, 16)), 0) AS crc
FROM `schema`.`table` FORCE INDEX(`chunk_index`)
WHERE `index_field`>=:upper_boundary_of_previous_chunk:
ORDER BY `index_field`
/*checksum chunk*/

At least one row which satisfies WHERE `index_field`=:upper_boundary_of_previous_chunk: will be read twice this way. This won't spoil checksum results but is unnecessary overhead. I'm quite sure this can be avoided with something like MAX(CASE ... END) however I don't want to go that far now.

Thanks for the answers in advance.
Sign In or Register to comment.

MySQL, InnoDB, MariaDB and MongoDB are trademarks of their respective owners.
Copyright ©2005 - 2020 Percona LLC. All rights reserved.