Hello everyone,
For some time I have wanted to learn to use the pt-toolkit to help repair replication issues, but I’ve always had a terrible time understanding the documents. For example, do I run the tool on the source or on the replica? What incantations are required? Here is a simple scenario that I run into all the time:
Our source server, A, has two replicas, B and C. C is a read-only replica that never gets into trouble, so we can ignore him. (All servers use ROW replication, but I ain’t married to it.) However, B has replication filters; users are allowed to write to non-replicated data, but you can imagine how this goes.
So my situation is this:
- I trust source A
- I do NOT trust the replicated data on B (I don’t not care about unreplicated data)
I wish to run a tool on 100% of source A’s data, find the differences in replicated data in B, and then fix those errors. Is this possible? For the sake of clarity, let’s say that:
I must admit up-front that you may be reading this and thinking, “This man want’s us to show him exactly how to make this happen!” I admit that does seem lazy, but I have read the instructions over and over again and always get confused, especially when it comes to “run this tool on X server BUT NOT Y.” So a simple example would be very, very much appreciated. Thank you again for your time.
Thank you,
Josh
Hello @joshlawrence,
I think we all are lazy in a away and hence, have no regrets
With above fact in mind I have two blogs noted below for you, which I suppose have examples that you seek. Instead of doc, go through them and see if you could resolve your issue. If not, come back with your attempted commands and errors / issues.
- Percona Database Performance Blog
- MySQL replication primer with pt-table-checksum / pt-table-sync, part 2
And we also have a tutorial video for verifying replication inconsistencies using percona toolkit.
Cheers,
K
Ah, thank you so much. It’s not so much that I’m lazy, but reading can be hard for me. I was hoping for some clearer explanations, but the video may do the trick.
Please do not assume every question on here is because of laziness. And I do have regrets.
@joshlawrence,
pt-table-checksum is always ran on the source. Since you have filters on B, you must run pt-t-c on A and use the --tables
, or --databases
flag to limit which tables are checksummed. Failure to do this will break replication on both B and C.
After the tool finishes, go to B and SELECT DISTINCT db, tbl FROM percona.checksums WHERE this_crc != master_crc
to find out which tables on B differ from A. Then you can run pt-table-sync (on B, use --sync-to-master), specifying those tables and pt-table-sync will figure out what to do.
Ah-ha, this is starting to make sense. Let me repeat back to you what I think you said:
- Run pt-t-c on the SOURCE, but because some servers down the line (server B) have filters, I must use --tables and --databases to make sure I am only checksumming tables that exists EVERYWHERE. is this correct?
- Tool is now finished, moved to server B and run the query you mention. Uh oh, we have errors, so
- Run pt-t-s on B using --sync-to-master.
Is this correct?
You got it! Actually, you should use --replicate percona.checksum
instead of --sync-to-master
. Using the former option makes pt-t-s only check those with differences. You will also need either --print
or --execute
to view, or apply the changes to the source. Changes ALWAYS happen to the source, even if B has different data, an UPDATE will be executed on the source to fix.
Thank you so, so much. So if I want to check the consistency of a database called my_awesome_db
is the same on source as on replica B. So I will run this statement on the source:
pt-table-checksum -h 127.0.0.1 -d my_awesome_db --replicate percona.checksums --recursion-method=dsn=h=[slaves IP address],D=percona,t=dsns
I haven’t really done anything with percona.dsns, should I?
You shouldn’t need to use the --recursion-method because you don’t have complex replication topology. Otherwise, yes, that’s good. Make sure the MySQL username you are using to execute pt-t-c exists on B and has permissions to connect from A. That is how pt-tc will monitor replication.