Waiting for the --replicate table to replicate - nothing else happens

Hello,

I’m really new to using the percona tools, and I’m trying to learn how to use them correctly.
I’m having a problem with replication drift in my 4 node mysql setup. I’m running a wiki on this setup. And when I create a new page or edit text in the wiki, the changes will only get saved to one of the 4 db nodes and not get replicated to the others,

I’m running MariaDB-server-10.0.21-1​ on all 4 database nodes.

I’ll then have to go into the wiki config and point the wiki to one DB at a time until I find the missing content. And then I’ll have to replicate the database by hand to the other 4 nodes.
This will happen even tho all indicators of replication seem fine on all 4 db nodes.

For example this is what I see on all of them as demonstrated on the 1st node:


[root@db1:~] #mysql -e "show slave status \G" | egrep "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master"
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0

It’s a real pain! So I’m hoping to solve this problem with some help from the percona toolkit.

I just ran a pt-table-checksum command on my 4 node database cluster, and I got this response back:


[root@db1:~] #pt-table-checksum h=localhost -d jfwiki -t text -u admin -p secret --no-check-binlog-format --no-version-check
Waiting for the --replicate table to replicate to db4...
Waiting for the --replicate table to replicate to db4...
Waiting for the --replicate table to replicate to db4...
Waiting for the --replicate table to replicate to db4...
Waiting for the --replicate table to replicate to db4...
Waiting for the --replicate table to replicate to db4...
Waiting for the --replicate table to replicate to db4...
Waiting for the --replicate table to replicate to db4...
Waiting for the --replicate table to replicate to db4...
Waiting for the --replicate table to replicate to db4...
Waiting for the --replicate table to replicate to db4...
Waiting for the --replicate table to replicate to db4...

And it just keeps repeating that message, and nothing else seems to happen.

What do these messages mean? And how can I get past this so that I can really check the state of replication with this tool?

Here’s the full output of the show slave status command in case that provides some more context.


MariaDB [(none)]> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: db2.example.com
Master_User: jf_slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000082
Read_Master_Log_Pos: 376
Relay_Log_File: db4-relay-bin.000062
Relay_Log_Pos: 537
Relay_Master_Log_File: mariadb-bin.000082
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 376
Relay_Log_Space: 832
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: Yes
Master_SSL_CA_File: /opt/mysql/ca.crt
Master_SSL_CA_Path:
Master_SSL_Cert: /opt/mysql/db4.example.com.crt
Master_SSL_Cipher:
Master_SSL_Key: /opt/mysql/db4.example.com.key
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_SSL_Crl: /opt/mysql/ca.crt
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
1 row in set (0.00 sec)

Thanks!

Hi bluethundr;

Sounds like you have an issue with your replication config. The fact that your normal setup will not replicate changes means there is a problem, and that is likely the same thing that is stopping the pt-table-checksum tool from working properly.

How are your four servers hooked together replication-wise? Do you always write to the same server, or do you write to all four servers? That aside, the first thing I would check is to make sure all four of your servers have unique server-ids set in their MySQL configs.

-Scott

Hi Scott,

Thanks for your reply! Not sure what could be going in my config that could be causing the replication to silently drift out of sync. But I’ll be happy to show you what I’ve got!

And the reason that the pt-table-checksum wasn’t working is because the percona database hadn’t successfully replicated to db4. Once I replicated the percona db from one of the masters to db4, all the percona tools started working.

However, with that said, I am extremely novice in using the percona tools. Although I have been around mysql for a while it was mostly installing and configuring it on the sysadmin side, and I’ve never held the role of DBA. And I don’t have a DBA to rely on in this case! Although I am willing and do consider myself able to learn how to work more with MySQL than I currently do.

So I’m hoping to fix this problem I’m having with replication where changes written to the database don’t automatically get replicated to all nodes.

As far as how replication hooks together, it goes like this. I have HA/Proxy sitting in front of two masters that are slaving off each other. Then each master has it’s own slave, where the slave’s role is to backup the database from its master onto Amazon S3 using S3Fs (s3 file system). And the slave only nodes are set to read-only. The wiki config is set to write to the two load balanced database nodes, and read from the two slave.

To summarize replication is set like this:


DB1 -> DB2
DB2 -> DB1
DB1 -> DB3
DB2 -> DB4

And her’s my mysql configuration on each node:

checking mysql config on host: db1

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
ssl
ssl-ca=/opt/mysql/ca.crt
ssl-cert=/opt/mysql/db1.example.com.crt
ssl-key=/opt/mysql/db1.example.com.key
ssl-cipher=AES256-SHA:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:!aNULL
server-id=1
log_bin=/var/log/mariadb/mariadb-bin.log
expire_logs_days=10
max_binlog_size=100M
sync_master_info = 1
sync_relay_log = 1
sync_relay_log_info = 1


[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
master-connect-retry=60


#
# include all files from the config directory
#
!includedir /etc/my.cnf.d


[client]
ssl-ca=/opt/mysql/ca.crt
ssl-cert=/opt/mysql/db1.example.com.crt
ssl-key=/opt/mysql/db1.example.com.key

checking mysql config on host: db2

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
ssl
ssl-ca=/opt/mysql/ca.crt
ssl-cert=/opt/mysql/db2.example.com.crt
ssl-key=/opt/mysql/db2.example.com.key
ssl-cipher=AES256-SHA:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:!aNULL
server-id=2
log_bin=/var/log/mariadb/mariadb-bin.log
expire_logs_days=10
max_binlog_size=100M
sync_master_info = 1
sync_relay_log = 1
sync_relay_log_info = 1


[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
master-connect-retry=60


#
# include all files from the config directory
#
!includedir /etc/my.cnf.d


[client]
ssl-ca=/opt/mysql/ca.crt
ssl-cert=/opt/mysql/db2.example.com.crt
ssl-key=/opt/mysql/db2.example.com.key

checking mysql config on host: db3

mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
read-only=1
ssl
ssl-ca=/opt/mysql/ca.crt
ssl-cert=/opt/mysql/db3.example.com.crt
ssl-key=/opt/mysql/db3.example.com.key
ssl-cipher=AES256-SHA:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:!aNULL
server-id=3
log_bin=/var/log/mariadb/mariadb-bin.log
expire_logs_days=10
max_binlog_size=100M
sync_master_info = 1
sync_relay_log = 1
sync_relay_log_info = 1


[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
master-connect-retry=60


#
# include all files from the config directory
#
!includedir /etc/my.cnf.d


[client]
ssl-ca=/opt/mysql/ca.crt
ssl-cert=/opt/mysql/db3.example.com.crt
ssl-key=/opt/mysql/db3.example.com.key

checking mysql config on host: db4

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
read-only=1
ssl
ssl-ca=/opt/mysql/ca.crt
ssl-cert=/opt/mysql/db4.example.com.crt
ssl-key=/opt/mysql/db4.example.com.key
ssl-cipher=AES256-SHA:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:!aNULL
server-id=4
log_bin=/var/log/mariadb/mariadb-bin.log
expire_logs_days=10
max_binlog_size=100M
sync_master_info = 1
sync_relay_log = 1
sync_relay_log_info = 1


[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
master-connect-retry=60


#
# include all files from the config directory
#
!includedir /etc/my.cnf.d


[client]
ssl-ca=/opt/mysql/ca.crt
ssl-cert=/opt/mysql/db4.example.com.crt
ssl-key=/opt/mysql/db4.example.com.key

Earlier in the config before I tried using the percona tools, I had which databases to replicate set with binlog_do_db commands in the config. But when I saw that the percona tools wouldn’t work that way I learned that I could perform replication without specifying the binlog_do_db command in the config.

I really appreciate any help and advice you can give on how to solve this problem!

Thanks

Hi bluethundr;

Sounds like you are learning a lot, so that is always good!

Looking at your configs, I’m guessing your problem is log_slave_updates (or lack there of):
[url]http://dev.mysql.com/doc/refman/5.5/en/replication-options-slave.html#option_mysqld_log-slave-updates[/url]

Since you have slaves replicating from slaves, updates will not get passed down the chain past the first slave without log_slave_updates enabled on each of the slaves.

For example you seem to have DB1 > DB2 > DB4 as one possible chain. If you perform an update on DB1, it will replicate to DB2. However it will not replicate to DB4 unless DB2 has log_slave_updates enabled. This would be the same case for DB2 > DB1 > DB3.

So based on your setup, at a minimum, I would enable log_slave_updates on DB1 and DB2. That should allow replication to work as you would expect on all hosts.

-Scott

Hey Scott,

Yeah thanks! I really do enjoy learning about this stuff! And thanks for the clear explanation. I feel like I understand the problem a little bit better at this point! I’ve tried the config change that you’ve suggested. I’m looking forward to seeing if this solves it!

Thanks,
Tim