Help understanding the doc and the process! pt-table-checksum/pt-table-sync, etc

Hello everyone,

For some time I have wanted to learn to use the pt-toolkit to help repair replication issues, but I’ve always had a terrible time understanding the documents. For example, do I run the tool on the source or on the replica? What incantations are required? Here is a simple scenario that I run into all the time:

Our source server, A, has two replicas, B and C. C is a read-only replica that never gets into trouble, so we can ignore him. (All servers use ROW replication, but I ain’t married to it.) However, B has replication filters; users are allowed to write to non-replicated data, but you can imagine how this goes.

So my situation is this:

  • I trust source A
  • I do NOT trust the replicated data on B (I don’t not care about unreplicated data)

I wish to run a tool on 100% of source A’s data, find the differences in replicated data in B, and then fix those errors. Is this possible? For the sake of clarity, let’s say that:

I must admit up-front that you may be reading this and thinking, “This man want’s us to show him exactly how to make this happen!” I admit that does seem lazy, but I have read the instructions over and over again and always get confused, especially when it comes to “run this tool on X server BUT NOT Y.” So a simple example would be very, very much appreciated. Thank you again for your time.

Thank you,

Josh

Hello @joshlawrence,

I think we all are lazy in a away and hence, have no regrets :slight_smile:
With above fact in mind I have two blogs noted below for you, which I suppose have examples that you seek. Instead of doc, go through them and see if you could resolve your issue. If not, come back with your attempted commands and errors / issues.

  1. Percona Database Performance Blog
  2. MySQL replication primer with pt-table-checksum / pt-table-sync, part 2

And we also have a tutorial video for verifying replication inconsistencies using percona toolkit.

Cheers,
K

Ah, thank you so much. It’s not so much that I’m lazy, but reading can be hard for me. I was hoping for some clearer explanations, but the video may do the trick.

Please do not assume every question on here is because of laziness. And I do have regrets.

@joshlawrence,
pt-table-checksum is always ran on the source. Since you have filters on B, you must run pt-t-c on A and use the --tables, or --databases flag to limit which tables are checksummed. Failure to do this will break replication on both B and C.

After the tool finishes, go to B and SELECT DISTINCT db, tbl FROM percona.checksums WHERE this_crc != master_crc to find out which tables on B differ from A. Then you can run pt-table-sync (on B, use --sync-to-master), specifying those tables and pt-table-sync will figure out what to do.

Ah-ha, this is starting to make sense. Let me repeat back to you what I think you said:

  • Run pt-t-c on the SOURCE, but because some servers down the line (server B) have filters, I must use --tables and --databases to make sure I am only checksumming tables that exists EVERYWHERE. is this correct?
  • Tool is now finished, moved to server B and run the query you mention. Uh oh, we have errors, so
  • Run pt-t-s on B using --sync-to-master.

Is this correct?

You got it! Actually, you should use --replicate percona.checksum instead of --sync-to-master. Using the former option makes pt-t-s only check those with differences. You will also need either --print or --execute to view, or apply the changes to the source. Changes ALWAYS happen to the source, even if B has different data, an UPDATE will be executed on the source to fix.

Thank you so, so much. So if I want to check the consistency of a database called my_awesome_db is the same on source as on replica B. So I will run this statement on the source:

pt-table-checksum -h 127.0.0.1 -d my_awesome_db --replicate percona.checksums --recursion-method=dsn=h=[slaves IP address],D=percona,t=dsns

I haven’t really done anything with percona.dsns, should I?

You shouldn’t need to use the --recursion-method because you don’t have complex replication topology. Otherwise, yes, that’s good. Make sure the MySQL username you are using to execute pt-t-c exists on B and has permissions to connect from A. That is how pt-tc will monitor replication.