How to verify data consistency for replication(master:window OS, slave:Ubuntu OS)?

Hi gpuser;

Yes the first command looks like it connected that time, however you have replication filters setup, so it exited before doing anything. You can run it if you force it to ignore the replication filters, but as the manual suggests, that is dangerous and can break replication.

[url]pt-table-checksum — Percona Toolkit Documentation

Since you seem to be fairly new to this area, I would not recommend forcing it, because it could cause you a lot of headache if it does break replication.

-Scott

Hi Scott,

So what should i do now since it exited before doing anything due to replication filters and it is not recommended to force it run?

Hi gpuser;

You could:

  1. Get rid of your replication filters. This is always advised, but requires knowledge as to why they are there to start with and what would need to be done to get rid of them.
  2. Run pt-table-checksum on a single database at a time using the --databases option and --no-check-replication-filters and see what happens.
  3. Not run the tool
  4. Pay for support from Percona or another professional organization/dba

Unfortunately what I can help you with is limited to general advice, but it sounds like you may need more involved hands-on help.

-Scott

Hi Scott,

Thanks for your advice.

If I run the pt-table-checksum with no-check-replication-filters, the worst case will stop the replication only? it wont affect my master database?

Hi Scott,

I tried to run pt-table-checksum on a single database at a time using the --databases option and --no-check-replication-filters

command:
replication@replication:/$ pt-table-checksum --databases=SUBS_270315_9NKD3 --no-check-replication-filters --host=192.168.104.197 --port=3306 --user=root --password= --recursion-method dsn=D=percona,t=dsns
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…

it is correct? It wont show the process like below output?

TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
06-04T15:46:35 0 0 0 1 0 0.062 mysql.columns_priv
06-04T15:46:35 0 0 47 1 0 0.058 mysql.db
06-04T15:46:35 0 0 0 1 0 0.051 mysql.event
06-04T15:46:35 0 0 0 1 0 0.050 mysql

Hi gpuser;

It should show you the output with a list of tables it checked after it is finished. When you wrote the above was the tool still running? You could add the --progress=time,300 to see if it’s still working on something. It can take quite a while for large tables.

-Scott

Hi Scott,

Yes, when I type command:
pt-table-checksum --progress=time,300 --databases=subs_270315_9nkd3 --no-check-replication-filters --host=192.168.104.197 --port=3306 --user=root --password= --recursion-method dsn=D=percona,t=dsns

The output is :
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…
Waiting for the --replicate table to replicate to replication…

I ran it for whole day still the same output.
any advice?

Hi gpuser;

Depending on your table sizes and number of tables, it could take a long time to run. I have not seen your specific output message before, so not sure if it is stuck waiting on that indefinitely, or if it is just outputting that off and on as it is doing other work. To get a better idea of what it is doing, you could try it in debug mode, but that will output a lot of data, so be careful:

PTDEBUG=1 pt-table-checksum --progress=time,300 --databases=subs_270315_9nkd3 --no-check-replication-filters --host=192.168.104.197 --port=3306 --user=root --password= --recursion-method dsn=D=percona,t=dsns > /tmp/pt.log 2>&1

Other than that, I would check the processlist on the master and slave to see if it is actually doing work still (if you decide not to try debug).

-Scott

Hi Scott,

I run the debug command and store in google drive.
[url]https://drive.google.com/file/d/0B_1idIUKpJq_bVI0U0dqNlZvRDQ/view?usp=sharing[/url]
Please assist me have a look into it.

What I have seen is it keeps looping below error

TableParser:4441 18067 Checking percona.checksums

TableParser:4445 18067 SHOW TABLES FROM percona LIKE ‘checksums’

TableParser:4451 18067 DBD::mysql::db selectrow_arrayref failed: Unknown database ‘percona’ [for Statement “SHOW TABLES FROM percona LIKE ‘checksums’”] at /usr/bin/pt-table-checksum line 4448.

Thank you very much.

Hi gpuser;

It looks like it is complaining about a missing percona database. The issue is probably that your percona database is not getting replicated to the slave due to your replication filters. You could try to manually create the percona database on your slave, and then run it again and see if that works. If that does not work, then what you’ll likely need to do is add “percona” to the list of databases to replicate with the replicate-do-db command in your my.cnf configuration file.

-Scott

Hi Scott,

From Slave dsns table IP, I should put master IP? Because from master dsns I put slave IP

Hi Scott,

I able to run it with manually create the percona database on your slave (dsns put master IP) and add “percona” to the list of databases to replicate with the replicate-do-db command in my.cnf configuration file

But here return me some errors:
Below link is the txt file that I had save.
[url]https://drive.google.com/file/d/0B_1idIUKpJq_cHByeFd2VUU2Y3M/view?usp=sharing[/url]

Below is 2 of the error return:

06-09T19:24:50 Skipping table subs_270315_9nkd3.gl_gstreturntrack because it has problems on these replicas:
Table subs_270315_9nkd3.gl_gstreturntrack does not exist on replica replication
This can break replication. If you understand the risks, specify --no-check-slave-tables to disable this check.
06-09T19:24:50 Error checksumming table subs_270315_9nkd3.gl_gstreturntrack: DBD::mysql::db selectrow_hashref failed: Table ‘subs_270315_9nkd3.gl_gstreturntrack’ doesn’t exist [for Statement “EXPLAIN SELECT * FROM subs_270315_9nkd3.gl_gstreturntrack WHERE 1=1”] at /usr/bin/pt-table-checksum line 6521.

06-09T19:24:59 Skipping table subs_270315_9nkd3.imp_gl_doubleentry because it has problems on these replicas:
Table subs_270315_9nkd3.imp_gl_doubleentry on replica replication is missing these columns: projectcode, docdate
This can break replication. If you understand the risks, specify --no-check-slave-tables to disable this check.
06-09T19:24:59 Skipping table subs_270315_9nkd3.imp_gl_doubleentrykoff because it has problems on these replicas:
Table subs_270315_9nkd3.imp_gl_doubleentrykoff does not exist on replica replication
This can break replication. If you understand the risks, specify --no-check-slave-tables to disable this check.
06-09T19:24:59 Error checksumming table subs_270315_9nkd3.imp_gl_doubleentrykoff: DBD::mysql::db selectrow_hashref failed: Table ‘subs_270315_9nkd3.imp_gl_doubleentrykoff’ doesn’t exist [for Statement “EXPLAIN SELECT * FROM subs_270315_9nkd3.imp_gl_doubleentrykoff WHERE 1=1”] at /usr/bin/pt-table-checksum line 6521.

Hi gpuser;

Do those tables actually exist on the slave? Looks like it is saying one table has a different structure and the other just does not exist at all.

-Scott

Hi Scott,

I checked the table is in Slave.
From Slave dsns table IP, I should put master IP? Because from master dsns I put slave IP

Any advise or how should I troubleshoot it?

Hi Scott,

from the slave percona DB, checksum table do not record the table. But in the actual DB, the table is existed.

Hi gpuser;

The dsns table on the master should have the connection information for the slave. The dsns table on the slave can be the same as the master since normally it would be replicated.

The tool is saying that those tables do no exist / have a different structure. So if you think that is wrong, you can try the --no-check-slave-tables option it mentions. But as it tells you, that will break replication if the tables are actually no there / have the wrong structure. So use that at your own risk.

I would verify your connection information in the master dsns table and make sure that the IP and username/password actually work to connect to the slave from the host you are running pt-table-checksum from as well.

-Scott

Hi Scott,

I tried to run on one DB only it everything look fine without error.

Command:
:~$pt-table-checksum --databases=SUBS_270315_9NKD3 --no-check-replication-filters --host=192.168.104.197 --port=3306 --user=root --password= --recursion-method dsn=D=percona,t=dsns

TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
06-24T17:43:39 0 0 0 1 0 0.093 subs_270315_9nkd3.ao_ccpayment
06-24T17:43:39 0 0 0 1 0 0.082 subs_270315_9nkd3.ao_ccpaymentchangeset
06-24T17:43:39 0 0 1 1 0 0.083 subs_270315_9nkd3.ao_ccterminal

When i tried to check all database and it return below output:
:~$pt-table-checksum --no-check-replication-filters --host=192.168.104.197 --port=3306 --user=root --password= --recursion-method dsn=D=percona,t=dsns


Waiting to check replicas for differences: 0% 00:00 remain
Waiting to check replicas for differences: 0% 00:00 remain
Waiting to check replicas for differences: 0% 00:00 remain

I tried to debug it with below command:
:~$PTDEBUG=1 pt-table-checksum --no-check-replication-filters --host=192.168.104.197 --port=3306 --user=root --password= --recursion-method dsn=D=percona,t=dsns > /tmp/pt.log 2>&1


Waiting to check replicas for differences: 0% 00:00 remain

pt_table_checksum:11144 5583 Sleep 1.25 waiting for chunks

pt_table_checksum:11111 5583 replication max chunk: undef

pt_table_checksum:11144 5583 Sleep 1.25 waiting for chunks

pt_table_checksum:11111 5583 replication max chunk: undef


Should I wait longer time to let it finish load the waiting to check replicas for differences?

Hi gpuser;

Are your replication filters preventing some of the databases from being replicated still? If it works on one of them, then it likely is due to your replication configuration.

-Scott

Hi Scott,

When I tried to checksum 2 DB, it run as normal but return me some error output in regards chunk size limit

replication@replication:~$ pt-table-checksum --databases=subs_300115_0yh4k,subs_270315_dbpxq --no-check-replication-filters --host=192.168.104.197 --port=3306 --user=root --password=–recursion-method dsn=D=percona,t=dsns > /home/replication/2506.txt

06-25T18:19:31 Skipping table subs_270315_dbpxq.mc_postcode because on the master it would be checksummed in one chunk but on these replicas it has too many rows:
2819 rows on replication
The current chunk size limit is 2794 rows (chunk size=1397 * chunk size limit=2.0).

how to resolve this?

Hi gpuser;

Try upping your --chunk-size-limit to 3 and give it another go. Take a look at the below link prior so you know what is happening and the warning about it potentially causing additional load on your system:

[url]pt-table-checksum — Percona Toolkit Documentation

-Scott