pt-table-checksum - miscellaneous questions

Taking a look at pt-table-checksum, a couple of questions came up. One of which I feel may be an omission from the docs:

  1. What are the minimum user privileges required for pt-table-checksum to function? Of course you need a user/password to connect as, but what privileges does that user require? I have only seen reference to “all privileges” in a couple of forum posts, and was wondering if there isn’t a smaller/safer set?

  2. I am seeing weird output in my dev system (testing before running this in production):

Does this mean I need to do some kind of (index) repair on zarafa.tproperties?

  1. How do you actually “fix” diffs? For example:

There is one diff here. How can I see what the diff is or simply just fix the one diff? I am assuming the answer is “pt-table-sync”?

When I run pt-table-sync with some options:

And then actually execute, if I run checksum again:

If I run this replace manually on the slave server:

The timestamp is basically “now”. I’m wondering if I’m catching an item that is changing while the checksum is running, that hasn’t been picked up during replication yet by the slave?

Thanks all!

  1. I also see this error:

I do have perl-DBD-MySQL installed:
perl-DBD-MySQL.x86_64 3.0007-2.el5

This is MySQL 5.0 on EL5:
mysql.x86_64 5.0.95-5.el5_9

Any thoughts here?

  1. https://blueprints.launchpad.net/percona-toolkit/+spec/document-pt-table-checksum-privs
  2. What version of pt-table-checksum you are running ? I noticed couple of bugs reported. Can you please try with latest version.
  3. Diffs should be fixed with pt-table-sync tool. You can inspect the queries which will execute to fix the differences on slave via pt-table-sync --print option. Running diff queries manually on slave is not recommended. pt-table-sync executes REPLACE queries on master so this not change any data on master and changes will be propagated to slave via replication which is safe way to do it. Not sure, what’s your question regarding to NOW() function this may help you on this http://www.mysqlperformanceblog.com/2012/09/25/timezone-and-pt-table-checksum/
  4. innodb_lock_wait_timeout is not dynamic. pt-table-checksum sets its session-level innodb_lock_wait_timeout to 1 second. You can override the default value with --set-vars option. Check here for details http://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html#cmdoption-pt-table-checksum–set-vars
  1. Great info! This should probably be somewhere in the pt-table-checksum docs?

  2. pt-table-checksum 2.2.6, which is the latest from the website.

  3. Yeah, I read the docs a little later and noticed not to run them on the slave. I guess the question is – I’m running the REPLACE, but if I then run pt-table-checksum after running the REPLACE, it still says there’s a diff. If I run the table-sync --print again, it shows the same exact REPLACE statement as before. It’s as if nothing changed on the slave. But the slave is properly running (seconds behind = 0, IO running, etc).

What I mean by “now” is not NOW() the MySQL command, but rather if you look at the output of the replace statement, it has a timestamp on it. For example:
‘2014-01-05 16:26:54’

Here is some output:

You can see that the timestamp of just before we ran pt-table-sync and the timestamp of the thing that is not sync’d is almost identical (within seconds). So this makes me wonder if the pt-table-sync is “seeing” out of sync before the slave table actually gets updated by the original insert?

Every time I run the above command (date && pt-table-sync…) there is only one result, and it’s within seconds of the sync being run, and always a new sync_time value.

It doesn’t appear that with mysql Ver 14.12 Distrib 5.0.95, for redhat-linux-gnu (x86_64) using readline 5.1 this works. Is a timeout of 50 a bad thing? Is 1 a sane level for normal operations? Should this work with --set-vars against 5.0.95?

  1. It seems one of my original questions is still unanswered:
    I am seeing weird output in my dev system (testing before running this in production):

Does this mean I need to do some kind of (index) repair on zarafa.tproperties? If I run pt-table-checksum with --nocheck-plan it will check this table, but the description for --nocheck-plan makes it sound “dangerous”:
http://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html#cmdoption-pt-table-checksum–[no]check-plan

Thoughts here?

Thanks!

Is this bug perhaps related?

https://bugs.launchpad.net/percona-toolkit/+bug/1130498

@thoraxe

For #2 - what is the structure of you zarafa.tproperties? I would imagine the PRIMARY key to be composite of at least 2 columns. The warning means that an EXPLAIN output executed by pt-table-checksum (to make sure chunk-size*chunk-size-limit is not exceeded) says that not all columns are used on the index. This is usually not a problem - but if it does, you can tune --chunk-size and --chunk-size-limit in case the chunks becomes oversized and of course the --no-check-plan option (but I do not recommend that).

If I don’t use --no-check-plan it doesn’t appear to check those tables at all.

Here is the describe for tproperties:


mysql> describe tproperties;
+-------------+----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+----------------------+------+-----+---------+-------+
| folderid | int(11) unsigned | NO | PRI | 0 | |
| hierarchyid | int(11) unsigned | NO | PRI | 0 | |
| tag | smallint(6) unsigned | NO | PRI | 0 | |
| type | smallint(6) unsigned | NO | PRI | NULL | |
| val_ulong | int(11) unsigned | YES | | NULL | |
| val_string | longtext | YES | | NULL | |
| val_binary | longblob | YES | | NULL | |
| val_double | double | YES | | NULL | |
| val_longint | bigint(20) | YES | | NULL | |
| val_hi | int(11) | YES | | NULL | |
| val_lo | int(11) unsigned | YES | | NULL | |
+-------------+----------------------+------+-----+---------+-------+
11 rows in set (0.00 sec)