pt-table-checksum not showing diffs and reporting success

Hello,

I’ve used pt-table-checksum in the past to great success. However I’m having some trouble using it with a Percona Cluster setup. Currently my setup is a 3 node cluster, and two of these nodes each have their own standard replication slave, for a total of 5 servers.

I’m using the DSN method to define connections to each cluster node as well as the slave instances. One of the slaves is physically located near the cluster so no SSH tunnel is necessary. When running the tool here, it successfully discovers diffs on the slave node.

However, the 2nd slave server is physically far away and can only be connected to via SSH tunnel. I have set up a tunnel, and entered the information into the DSN table. I’ve verified I can connect to the percona server on the slave over the tunnel. When running pt-table-checksum for this slave, it reports 0 diffs found when I am certain there is a difference on a specific table. There are no error message that I can see.

Any ideas?

Thanks!

Hello,
Could you run pt-table-checksum enabling debug info?
PTDEBUG=1 pt-table-checksum

Thanks

Thanks for the reply. Here is the debug output (at least the last bit of it where it is checking the slave in question). I’ve replaced server names, etc.:

pt_table_checksum:10925 40168 Table on [brokenSlaveServer] has 242 rows

pt_table_checksum:12118 40168 use_repl_db

pt_table_checksum:12139 40168 USE percona

pt_table_checksum:10991 40168 DELETE FROM percona.checksums WHERE db = ? AND tbl = ?

pt_table_checksum:12118 40168 use_repl_db

pt_table_checksum:12139 40168 USE ftwill

NibbleIterator:6508 40168 init callback returned 1

NibbleIterator:6544 40168 Nibble: REPLACE INTO percona.checksums (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /!99997/ ?, ?, ?, ?, ?, ?, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS(’#’, CRC32(description), CRC32(type), CRC32(sorder), CRC32(document), CRC32(other), CRC32(pages), aid, CONCAT(ISNULL(description)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM ftwill.5500Type /checksum table/ params:

pt_table_checksum:12204 40168 EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS(’#’, CRC32(description), CRC32(type), CRC32(sorder), CRC32(document), CRC32(other), CRC32(pages), aid, CONCAT(ISNULL(description)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM ftwill.5500Type /explain checksum table/ params:

pt_table_checksum:12214 40168 EXPLAIN plan: $VAR1 = {

extra => ‘’,

id => ‘1’,

key => undef,

key_len => undef,

possible_keys => undef,

ref => undef,

rows => ‘242’,

select_type => ‘SIMPLE’,

table => ‘5500Type’,

type => ‘ALL’

};

Quoter:4123 40168 Serializing $VAR1 = [];

Quoter:4123 40168 Serializing $VAR1 = [];

Retry:7976 40168 Try 1 of 2

pt_table_checksum:11584 40168 REPLACE INTO percona.checksums (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /!99997/ ?, ?, ?, ?, ?, ?, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS(’#’, CRC32(description), CRC32(type), CRC32(sorder), CRC32(document), CRC32(other), CRC32(pages), aid, CONCAT(ISNULL(description)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM ftwill.5500Type /checksum table/ lower boundary: upper boundary:

pt_table_checksum:11605 40168 SHOW WARNINGS

Retry:7993 40168 Try code succeeded

pt_table_checksum:11106 40168 Nibble time: 0.00342392921447754

NibbleIterator:6552 40168 0 rows in nibble 1

NibbleIterator:6564 40168 No rows in nibble or nibble skipped

pt_table_checksum:11170 40168 Total avg rate: 70679

WeightedAvgRate:9500 40168 Master op time: 242 n / 0.00342392921447754 s

WeightedAvgRate:9512 40168 Initial avg rate: 70679.0312652322 n/s

WeightedAvgRate:9516 40168 Adjust n to 35339

ReplicaLagWaiter:8592 40168 Checking slave lag

MasterSlave:5435 40168 DBI::db=HASH(0x2f85210) SHOW SLAVE STATUS

ReplicaLagWaiter:8595 40168 [workingSlaveServer] slave lag: 0

MasterSlave:5435 40168 DBI::db=HASH(0x2f39dc0) SHOW SLAVE STATUS

ReplicaLagWaiter:8595 40168 [brokenSlaveServer] slave lag: 0

ReplicaLagWaiter:8628 40168 All slaves caught up

MySQLStatusWaiter:9389 40168 Checking status variables

pt_table_checksum:10675 40168 SHOW GLOBAL STATUS LIKE ? Threads_running

MySQLStatusWaiter:9392 40168 Threads_running = 2

MySQLStatusWaiter:9419 40168 All var vals are low enough

OobNibbleIterator:7175 40168 Done nibbling past boundaries

NibbleIterator:6573 40168 Done nibbling

pt_table_checksum:11229 40168 Checking slave diffs

pt_table_checksum:12298 40168 SELECT MAX(chunk) FROM percona.checksums WHERE db=‘ftwill’ AND tbl=‘5500Type’ AND master_crc IS NOT NULL

pt_table_checksum:12316 40168 Getting last checksum on [clusterNode1]

pt_table_checksum:12319 40168 [clusterNode1] max chunk: 1

pt_table_checksum:12316 40168 Getting last checksum on [workingSlaveServer]

pt_table_checksum:12319 40168 [workingSlaveServer] max chunk: 1

pt_table_checksum:12316 40168 Getting last checksum on [clusterNode3]

pt_table_checksum:12319 40168 [clusterNode3] max chunk: 1

pt_table_checksum:12316 40168 Getting last checksum on [brokenSlaveServer]

pt_table_checksum:12319 40168 [brokenSlaveServer] max chunk: 1

RowChecksum:6256 40168 SELECT CONCAT(db, ‘.’, tbl) AS table, chunk, chunk_index, lower_boundary, upper_boundary, COALESCE(this_cnt-master_cnt, 0) AS cnt_diff, COALESCE(this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc), 0) AS crc_diff, this_cnt, master_cnt, this_crc, master_crc FROM percona.checksums WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) AND (db=‘ftwill’ AND tbl=‘5500Type’)

pt_table_checksum:11261 40168 0 checksum diffs on [clusterNode1]

RowChecksum:6256 40168 SELECT CONCAT(db, ‘.’, tbl) AS table, chunk, chunk_index, lower_boundary, upper_boundary, COALESCE(this_cnt-master_cnt, 0) AS cnt_diff, COALESCE(this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc), 0) AS crc_diff, this_cnt, master_cnt, this_crc, master_crc FROM percona.checksums WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) AND (db=‘ftwill’ AND tbl=‘5500Type’)

pt_table_checksum:11261 40168 0 checksum diffs on [workingSlaveServer]

RowChecksum:6256 40168 SELECT CONCAT(db, ‘.’, tbl) AS table, chunk, chunk_index, lower_boundary, upper_boundary, COALESCE(this_cnt-master_cnt, 0) AS cnt_diff, COALESCE(this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc), 0) AS crc_diff, this_cnt, master_cnt, this_crc, master_crc FROM percona.checksums WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) AND (db=‘ftwill’ AND tbl=‘5500Type’)

pt_table_checksum:11261 40168 0 checksum diffs on [clusterNode3]

RowChecksum:6256 40168 SELECT CONCAT(db, ‘.’, tbl) AS table, chunk, chunk_index, lower_boundary, upper_boundary, COALESCE(this_cnt-master_cnt, 0) AS cnt_diff, COALESCE(this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc), 0) AS crc_diff, this_cnt, master_cnt, this_crc, master_crc FROM percona.checksums WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) AND (db=‘ftwill’ AND tbl=‘5500Type’)

pt_table_checksum:11261 40168 0 checksum diffs on [brokenSlaveServer]

OobNibbleIterator:7183 40168 Finish nibble_sth

OobNibbleIterator:7183 40168 Finish explain_nibble_sth

SchemaIterator:7680 40168 Getting next schema object from dbh DBI::db=HASH(0x2f2c9c0)

SchemaIterator:7751 40168 No more tables in database ftwill

SchemaIterator:7756 40168 No more databases

pt_table_checksum:11456 40168 Exit status 0 oktorun 1 have time 1

Cxn:3806 40168 Destroying cxn

Cxn:3815 40168 DBI::db=HASH(0x2f39dc0) Disconnecting dbh on [brokenSlaveServer] h=127.0.0.1,P=4306

Cxn:3806 40168 Destroying cxn

Cxn:3815 40168 DBI::db=HASH(0x2f3da10) Disconnecting dbh on [clusterNode3] h=10...*,P=3306

Cxn:3806 40168 Destroying cxn

Cxn:3815 40168 DBI::db=HASH(0x2f85210) Disconnecting dbh on [workingSlaveServer] h=10...*,P=3306

Cxn:3806 40168 Destroying cxn

Cxn:3815 40168 DBI::db=HASH(0x2f85258) Disconnecting dbh on [clusterNode1] h=10...*,P=3306

Cxn:3806 40168 Destroying cxn

Cxn:3815 40168 DBI::db=HASH(0x2f2c9c0) Disconnecting dbh on [clusterNode2] h=10...*,P=3306

Sadly, I cannot see any problem from that log. I’ve never tried using a MySQL connection from Perl, using and ssh tunnel. That might be the problem but I cannot tell you for sure.
Could you share the data in the dsns table (stripping user/pass) ?
Perl DBI should be able to connect over an ssh tunnel. The only thing I can think about at this moment is that in the dsns table, use 127.0.0.1 and not localhost to force a tcp connection.

Regards

Sure, screenshot of DSNs table attached. It is currently set to 127.0.0.1 for the tunnel connection.

If you are using an ssh tunnel, how can all the instances can be using the 3306 port?
Shouldn’t they be mapped to the tunneled port? I mean, remotely they are using the 3306 port but, locally, you has had to map the 3306 to a different local port, right?

Hi - the 4 connections via 3306 from the DSN table are accessible directly without the need for a tunnel. Each server has its own IP as they are open to each other.

The one connecting via 4306 is using the ssh tunnel. Local tunnel listening to 4306 and forwarding to the remote host at 3306.