SST step fails when the 2nd node tries to join a Percona XtraDB Cluster 5.7

Hi,

I really need help here. I have 3 nodes. The 1st node was started with bootstrap, and the cluster is up and running with 1 node. Now, when the 2nd node tries to join the cluster, the SST step fails. Below is the error on the joiner. I have also attached the log files from the joiner and the donor.

2018-12-21T13:51:14.606655Z 1 [Warning] WSREP: Gap in state sequence. Need state transfer.
2018-12-21T13:51:14.606665Z 1 [Note] WSREP: Setting wsrep_ready to false
2018-12-21T13:51:14.606900Z 0 [Note] WSREP: Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role ‘joiner’ --address ‘192.168.50.84’ --datadir ‘/var/lib/mysql/’ --defaults-file ‘/etc/my.cnf’ --defaults-group-suffix ‘’ --parent ‘3695’ --mysqld-version ‘5.7.23-23-57’ --binlog ‘/var/lib/mysql/mysql-bin’ )
2018-12-21T13:51:14.607720Z 0 [ERROR] WSREP: Failed to read 'ready ’ from: wsrep_sst_xtrabackup-v2 --role ‘joiner’ --address ‘192.168.50.84’ --datadir ‘/var/lib/mysql/’ --defaults-file ‘/etc/my.cnf’ --defaults-group-suffix ‘’ --parent ‘3695’ --mysqld-version ‘5.7.23-23-57’ --binlog ‘/var/lib/mysql/mysql-bin’
Read: ‘(null)’
2018-12-21T13:51:14.607838Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role ‘joiner’ --address ‘192.168.50.84’ --datadir ‘/var/lib/mysql/’ --defaults-file ‘/etc/my.cnf’ --defaults-group-suffix ‘’ --parent ‘3695’ --mysqld-version ‘5.7.23-23-57’ --binlog ‘/var/lib/mysql/mysql-bin’ : 2 (No such file or directory)
2018-12-21T13:51:14.608109Z 1 [ERROR] WSREP: Failed to prepare for ‘xtrabackup-v2’ SST. Unrecoverable.
2018-12-21T13:51:14.608143Z 1 [ERROR] Aborting

Note: All the ports are open in the firewall. I am using the latest version of Percona XtraDB Cluster.

Log.zip (84.8 KB)

Could you post your backup logs from all nodes, please?
Also the my.cnf from the PRIM node and from the node that is failing.

No backup log was generated in the data file location. I have attached the .cnf files from both the donor and joiner nodes.

my.cnf.zip (5.85 KB)

Hi,

Please make sure that, if you are using SELinux, the settings are correct or it is disabled.

SELinux:

Check if your firewall allows communication between the nodes on these ports:

1. Regular MySQL port (default is 3306).
2. Port for group communication (default is 4567).
3. Port for State Snapshot Transfer (default is 4444).
4. Port for Incremental State Transfer (default is the group communication port + 1, i.e. 4568).

It is possible to identify that the . Another point: check whether you have any files in the datadir on the joiner and remove them. Another theory is that socat might have a problem:

Check if both are using the same versions.
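The port list above can be checked from one node against the other with a small script. This is only a minimal sketch: `check_port` is a hypothetical helper (not part of PXC), and `PEER` is a placeholder for the other node's address. As far as I know, 4444 and 4568 are only listened on while an SST or IST is actually in progress, so "closed" for those two outside a transfer does not by itself indicate a firewall problem.

```shell
#!/usr/bin/env bash
# Sketch only: PEER is a placeholder for the other node's address.
# check_port is a hypothetical helper, not something shipped with PXC.
check_port() {
  local host=$1 port=$2
  # bash's /dev/tcp pseudo-device attempts a TCP connect; timeout caps the wait.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}

# Only probe when a peer address was supplied, e.g. PEER=<other node> ./check.sh
if [ -n "${PEER:-}" ]; then
  for port in 3306 4567 4444 4568; do
    echo "${PEER}:${port} $(check_port "$PEER" "$port")"
  done
fi
```

Run it on the joiner with `PEER` set to the donor, and again on the donor with `PEER` set to the joiner, since the SST stream flows from donor to joiner.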

OS:
Donor:
[root@qtsxtradb03 ~]# cat /etc/os-release
NAME=“Red Hat Enterprise Linux Server”
VERSION=“7.5 (Maipo)”
ID=“rhel”
ID_LIKE=“fedora”
VARIANT=“Server”
VARIANT_ID=“server”
VERSION_ID=“7.5”
PRETTY_NAME=“Red Hat Enterprise Linux Server 7.5 (Maipo)”
ANSI_COLOR=“0;31”
CPE_NAME=“cpe:/o:redhat:enterprise_linux:7.5:GA:server”
HOME_URL=“https://www.redhat.com/
BUG_REPORT_URL=“https://bugzilla.redhat.com/

REDHAT_BUGZILLA_PRODUCT=“Red Hat Enterprise Linux 7”
REDHAT_BUGZILLA_PRODUCT_VERSION=7.5
REDHAT_SUPPORT_PRODUCT=“Red Hat Enterprise Linux”
REDHAT_SUPPORT_PRODUCT_VERSION=“7.5”

Joiner:

[root@qtsxtradb02 ~]# cat /etc/os-release
NAME=“Red Hat Enterprise Linux Server”
VERSION=“7.5 (Maipo)”
ID=“rhel”
ID_LIKE=“fedora”
VARIANT=“Server”
VARIANT_ID=“server”
VERSION_ID=“7.5”
PRETTY_NAME=“Red Hat Enterprise Linux Server 7.5 (Maipo)”
ANSI_COLOR=“0;31”
CPE_NAME=“cpe:/o:redhat:enterprise_linux:7.5:GA:server”
HOME_URL=“https://www.redhat.com/
BUG_REPORT_URL=“https://bugzilla.redhat.com/

REDHAT_BUGZILLA_PRODUCT=“Red Hat Enterprise Linux 7”
REDHAT_BUGZILLA_PRODUCT_VERSION=7.5
REDHAT_SUPPORT_PRODUCT=“Red Hat Enterprise Linux”
REDHAT_SUPPORT_PRODUCT_VERSION=“7.5”

As mentioned in my earlier post, all the ports are open, and I have tested that the nodes are listening on the ports mentioned.

As the error logs I sent show, the PRIMARY node appears to be receiving the request from the joiner node.

socat version:

Donor:
[root@qtsxtradb03 ~]# rpm -qa |grep -i socat
socat-1.7.3.2-2.el7.x86_64

Joiner:
[root@qtsxtradb02 ~]# rpm -qa |grep -i socat
socat-1.7.3.2-2.el7.x86_64

Yes, I have the following files in the datadir on the joiner node.

[root@qtsxtradb02 mysql]# ll
total 733540
-rw-rw----. 1 mysql mysql 56 Sep 28 05:36 auto.cnf
drwxr-x---. 2 mysql mysql 20 Nov 30 03:16 db1
-rw-r-----. 1 mysql mysql 134219048 Dec 21 05:51 galera.cache
-rw-r-----. 1 mysql mysql 0 Dec 21 05:13 grastate.dat
-rw-r-----. 1 mysql mysql 463 Nov 30 04:28 ib_buffer_pool
-rw-rw----. 1 mysql mysql 79691776 Dec 21 05:51 ibdata1
-rw-r-----. 1 mysql mysql 268435456 Dec 21 05:51 ib_logfile0
-rw-r-----. 1 mysql mysql 268435456 Nov 29 05:40 ib_logfile1
drwx------. 2 mysql mysql 4096 Sep 28 05:21 mysql
-rw-r-----. 1 mysql mysql 0 Dec 21 05:51 mysql-bin.index
-rw-rw----. 1 root root 5 Dec 21 05:51 mysqld_safe.pid
-rw-r-----. 1 mysql mysql 315228 Dec 21 05:51 mysql-error.log
-rw-rw----. 1 mysql mysql 3019 Dec 21 05:51 mysql-slow.log
drwx------. 2 mysql mysql 4096 Sep 28 05:21 performance_schema
-rw-rw----. 1 mysql mysql 167 Nov 13 04:03 relay-bin.000004
-rw-rw----. 1 mysql mysql 1257 Nov 16 03:09 relay-bin.000005
-rw-rw----. 1 mysql mysql 64 Nov 13 04:03 relay-bin.index
-rw-r--r--. 1 mysql mysql 117 Sep 28 05:21 RPM_UPGRADE_HISTORY
-rw-r--r--. 1 mysql mysql 117 Sep 28 05:21 RPM_UPGRADE_MARKER-LAST
drwx------. 2 mysql mysql 60 Nov 13 04:10 Test
[root@qtsxtradb02 mysql]# pwd
/var/lib/mysql

Do I have to delete everything?
In the demo video on Percona XtraDB Cluster, the files in the joiner node's datadir were not removed.

Please specify which files I have to remove from the joiner's datadir.

Hi,

Did you check for SELinux?

Check if SELinux is disabled:
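One way to check this, as a minimal sketch: `getenforce` is the standard SELinux tool on RHEL, while `selinux_mode` is just a hypothetical wrapper so the snippet also runs on hosts without the SELinux utilities.

```shell
# selinux_mode: print the current SELinux mode, or "unavailable" if the
# SELinux utilities are not installed (hypothetical helper for illustration).
selinux_mode() {
  if command -v getenforce >/dev/null 2>&1; then
    getenforce            # prints Enforcing, Permissive, or Disabled
  else
    echo "unavailable"
  fi
}

echo "SELinux mode: $(selinux_mode)"

# As root, "setenforce 0" switches to Permissive until the next reboot.
# A persistent change is made via the SELINUX= line in /etc/selinux/config.
```

Run it on both the donor and the joiner, since either side can block the SST script.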

It is necessary only to remove file.

The donor node is not starting the SST script (there’s no SST logging output in the donor error logs).

In the wsrep.cnf for the donor node, the wsrep_sst_method is commented out

# SST method

#wsrep_sst_method=xtrabackup-v2

This causes the donor node to not start the SST script for the donor side (thus the joiner node fails).
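To see quickly whether a setting is active or commented out in a config file, something like the following can help. It is a rough sketch: `active_setting` is a hypothetical helper, the `CNF` path is an assumption (point it at whichever file holds your wsrep settings), and it does not follow `!include` directives or handle the dash spelling of option names.

```shell
# active_setting FILE KEY: print the last uncommented value of KEY in FILE.
# Prints nothing when KEY is absent or only appears behind a leading '#'.
active_setting() {
  local file=$1 key=$2
  sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}

# CNF is an assumed location; adjust to the file that holds the wsrep options.
CNF=${CNF:-/etc/my.cnf}
if [ -r "$CNF" ]; then
  value=$(active_setting "$CNF" wsrep_sst_method)
  echo "wsrep_sst_method: ${value:-<not set, server default applies>}"
fi
```

An empty result only means the file does not set the option explicitly; the server then falls back to its built-in default.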

For additional SST-only error logging, you can set wsrep_debug in the [sst] section.
[sst]
wsrep_debug=ON


Actually, PXC defaults to xtrabackup-v2, so it should still work. I would suggest enabling the SST wsrep_debug to ON on both sides and seeing what happens. For some reason the SST process is not starting up on the donor side…

Hi,

I have disabled SELinux on both servers.

I enabled wsrep_sst_method=xtrabackup-v2.

I added the lines below to my.cnf on both servers:
[sst]
wsrep_debug=ON

I have attached the new logs from both the donor and joiner servers.

Everything in the joiner's datadir has now been deleted:
[root@qtsxtradb02 mysql]# ll
total 348
-rw-r-----. 1 mysql mysql 346357 Jan 3 03:48 mysql-error.log
-rw-rw----. 1 mysql mysql 3275 Dec 24 00:41 mysql-slow.log
-rw-r--r--. 1 mysql mysql 117 Sep 28 05:21 RPM_UPGRADE_HISTORY
[root@qtsxtradb02 mysql]# pwd
/var/lib/mysql

The innobackup log contains the following:

xtrabackup: [ERROR] Could not open required defaults file: /etc/my.cnf
xtrabackup: [ERROR] Fatal error in defaults handling. Program aborted!

So I gave the mysql user read permission on the file:
-rw-r-----. 1 root mysql 2043 Dec 21 04:57 my.cnf
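For anyone hitting the same "Could not open required defaults file" error, here is a small sketch for inspecting the file's permissions. `show_perms` is a hypothetical helper, the `CNF` default is taken from the error message above, and `stat -c` assumes GNU coreutils (as on RHEL).

```shell
# show_perms FILE: print octal mode, owner:group and file name (GNU stat).
show_perms() {
  stat -c '%a %U:%G %n' "$1"
}

CNF=${CNF:-/etc/my.cnf}
if [ -e "$CNF" ]; then
  show_perms "$CNF"
fi

# To reproduce the listing above (owner root, group mysql, mode 640), as root:
#   chown root:mysql /etc/my.cnf
#   chmod 640 /etc/my.cnf
# To confirm the mysql user can actually read it:
#   sudo -u mysql test -r /etc/my.cnf && echo readable
```

Mode 640 with group mysql lets the SST script, which runs as the mysql user, read the file without making it world-readable.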

After all this, when I start mysql on the joiner it does nothing, and nothing is written to the logs.

[root@qtsxtradb02 mysql]# systemctl start mysql
Job for mysql.service failed because the control process exited with error code. See “systemctl status mysql.service” and “journalctl -xe” for details.
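When the service fails like this, it can help to pull just the error lines from the MySQL error log and compare them with what systemd recorded. A minimal sketch follows; `last_errors` is a hypothetical helper, and the `LOG` default is assumed from the datadir listing earlier in this thread.

```shell
# last_errors FILE [N]: print the last N lines tagged [ERROR] in FILE.
last_errors() {
  local file=$1 count=${2:-20}
  grep -F '[ERROR]' "$file" | tail -n "$count"
}

# LOG is an assumed path based on the datadir listing; adjust as needed.
LOG=${LOG:-/var/lib/mysql/mysql-error.log}
if [ -r "$LOG" ]; then
  last_errors "$LOG" 20
fi

# systemd's own view of the failed start:
#   systemctl status mysql.service
#   journalctl -u mysql.service
```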

Log.zip (49.9 KB)

I have now deleted all the files from the joiner's datadir, and mysql started on the joiner.
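For anyone repeating this, a more cautious variant is to move the datadir contents aside instead of deleting them, so they can be restored if the SST fails again. This is only a sketch: `move_aside` is a hypothetical helper, `mv -t` assumes GNU coreutils, and mysql should be stopped on the joiner first.

```shell
# move_aside DIR: move every top-level entry of DIR (dotfiles included)
# into a new timestamped sibling directory and print that directory's path.
move_aside() {
  local dir=${1%/}
  local backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
  mkdir -p "$backup" || return 1
  # -mindepth/-maxdepth restrict find to the direct children of DIR.
  find "$dir" -mindepth 1 -maxdepth 1 -exec mv -t "$backup" {} +
  echo "$backup"
}

# Intended use on the joiner, as root (commented out on purpose):
#   systemctl stop mysql
#   move_aside /var/lib/mysql
#   systemctl start mysql
```

The datadir itself stays in place with its original ownership, so the SST can write into it when mysql starts.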

Thanks for all the help.