Second node won't join cluster/SST fails

Hello,

I’ve had a 3-node cluster online for a few days, and I tried to take the second node down to change the tmpdir in my.cnf (disk was getting full). When I start MySQL now, the node will not recieve an SST and fails with some frustrating error messages.

Packages (same on both nodes)
ii percona-xtradb-cluster-server-5.6 5.6.21-25.8-938.trusty
ii percona-xtrabackup 2.2.9-5067-1.trusty

my.cnf

# Path to Galera library
wsrep_provider=/usr/lib/libgalera_smm.so
# Cluster connection URL
wsrep_cluster_address=gcomm://1.1.1.1,2.2.2.2,3.3.3.3
#wsrep_cluster_address=gcomm://
# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW
# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB
# This changes how |InnoDB| autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2
# Authentication for SST method
wsrep_sst_auth="sstuser:passwordhere"
# Node #1 address
wsrep_node_address=2.2.2.2
# SST method
wsrep_sst_method=xtrabackup-v2
# Cluster name
wsrep_cluster_name=brp
#innodb_buffer_pool_size=145774M
innodb_flush_log_at_trx_commit=2
innodb_file_per_table=1
innodb_data_file_path = ibdata1:100M:autoextend
## You may want to tune the below depending on number of cores and disk sub
innodb_read_io_threads=4
innodb_write_io_threads=4
innodb_io_capacity=200
innodb_doublewrite=1
innodb_log_file_size=1024M
innodb_log_buffer_size=96M
innodb_buffer_pool_instances=8
innodb_log_files_in_group=2
innodb_thread_concurrency=64
#innodb_file_format=barracuda
innodb_flush_method = O_DIRECT
innodb_autoinc_lock_mode=2
## avoid statistics update when doing e.g show tables
innodb_stats_on_metadata=0
innodb_data_home_dir=/var/lib/mysql
innodb_log_group_home_dir=/var/lib/mysql

innobackup.prepare.log (JOINER)


InnoDB Backup Utility v1.5.1-xtrabackup; Copyright 2003, 2009 Innobase Oy
and Percona LLC and/or its affiliates 2009-2013. All Rights Reserved.

This software is published under
the GNU GENERAL PUBLIC LICENSE Version 2, June 1991.

Get the latest version of Percona XtraBackup, documentation, and help resources:
http://www.percona.com/xb/p

150324 12:01:13 innobackupex: Starting the apply-log operation

IMPORTANT: Please check that the apply-log run completes successfully.
At the end of a successful apply-log run innobackupex
prints "completed OK!".


150324 12:01:13 innobackupex: Starting ibbackup with command: xtrabackup --defaults-file="/var/lib/mysql/.sst/backup-my.cnf" --defaults-group="mysqld" --prepare --target-dir=/var/lib/mysql/.sst

xtrabackup version 2.2.9 based on MySQL server 5.6.22 Linux (x86_64) (revision id: )
xtrabackup: cd to /var/lib/mysql/.sst
xtrabackup: Error: cannot open ./xtrabackup_checkpoints
xtrabackup: error: xtrabackup_read_metadata()
xtrabackup: This target seems not to have correct metadata...
2015-03-24 12:01:13 7fc3e1ed3780 InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
xtrabackup: Warning: cannot open ./xtrabackup_logfile. will try to find.
2015-03-24 12:01:13 7fc3e1ed3780 InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
xtrabackup: Fatal error: cannot find ./xtrabackup_logfile.
xtrabackup: Error: xtrabackup_init_temp_log() failed.
innobackupex: got a fatal error with the following stacktrace: at /usr//bin/innobackupex line 2642.
main::apply_log() called at /usr//bin/innobackupex line 1570
innobackupex: Error:
innobackupex: ibbackup failed at /usr//bin/innobackupex line 2642.

innobackup.backup.log (DONOR)


InnoDB Backup Utility v1.5.1-xtrabackup; Copyright 2003, 2009 Innobase Oy
and Percona LLC and/or its affiliates 2009-2013. All Rights Reserved.

This software is published under
the GNU GENERAL PUBLIC LICENSE Version 2, June 1991.

Get the latest version of Percona XtraBackup, documentation, and help resources:
http://www.percona.com/xb/p

150324 11:58:24 innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/mysql/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/run/mysqld/mysqld.sock' as 'sstuser' (using password: YES).
150324 11:58:24 innobackupex: Connected to MySQL server
150324 11:58:24 innobackupex: Starting the backup operation

IMPORTANT: Please check that the backup run completes successfully.
At the end of a successful backup run innobackupex
prints "completed OK!".

innobackupex: Using server version 5.6.21-70.1-56

innobackupex: Created backup directory /tmp/tmp.EZdqYbIZoL

150324 11:58:24 innobackupex: Starting ibbackup with command: xtrabackup --defaults-file="/etc/mysql/my.cnf" --defaults-group="mysqld" --backup --suspend-at-end --target-dir=/tmp --tmpdir=/tmp --extra-lsndir='/tmp' --stream=xbstream
innobackupex: Waiting for ibbackup (pid=20892) to suspend
innobackupex: Suspend file '/tmp/xtrabackup_suspended_2'

xtrabackup version 2.2.9 based on MySQL server 5.6.22 Linux (x86_64) (revision id: )
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /var/lib/mysql
xtrabackup: open files limit requested 1024000, set to 1024000
xtrabackup: using the following InnoDB configuration:
xtrabackup: innodb_data_home_dir = /var/lib/mysql
xtrabackup: innodb_data_file_path = ibdata1:100M:autoextend
xtrabackup: innodb_log_group_home_dir = /var/lib/mysql
xtrabackup: innodb_log_files_in_group = 2
xtrabackup: innodb_log_file_size = 1073741824
xtrabackup: using O_DIRECT
>> log scanned up to (1374935811)
xtrabackup: Generating a list of tablespaces
2015-03-24 11:58:24 7f9248add780 InnoDB: Operating system error number 13 in a file operation.
InnoDB: The error means mysqld does not have the access rights to
InnoDB: the directory.
innobackupex: got a fatal error with the following stacktrace: at /usr//bin/innobackupex line 2704.
main::wait_for_ibbackup_file_create('/tmp/xtrabackup_suspended_2') called at /usr//bin/innobackupex line 2724
main::wait_for_ibbackup_suspend('/tmp/xtrabackup_suspended_2') called at /usr//bin/innobackupex line 1977
main::backup() called at /usr//bin/innobackupex line 1601
innobackupex: Error: The xtrabackup child process has died at /usr//bin/innobackupex line 2704.

Now, there are 2 errors that look like permissions/OS errors. OS error 13 Permission Denied, and OS error 2, file not found. The file not found relates to not finding the xtrabackup-checkpoints I think, but I have no idea if I have to fix that or not.

The Permission Denied errors, make no sense to me, here is some permissions from my setup.

(JOINER)
-rw-r–r-- 1 mysql root 5164 Mar 24 11:56 /etc/mysql/my.cnf <— my.cnf
drwxrwxrwt 5 root root 4096 Mar 24 12:20 tmp/ <----tmpdir
drwxr-xr-x 3 mysql mysql 12288 Mar 24 12:08 mysql/ <— datadir

(DONER)
-rw-r–r-- 1 mysql root 5164 Mar 24 11:56 my.cnf <— my.cnf
drwxrwxrwt 4 root root 4096 Mar 24 12:20 tmp/ ← tmpdir
drwxr-xr-x 20 mysql mysql 12288 Mar 24 12:09 mysql/ <----datadir

Any help is appreciated. I’ve gone round in circles for hours and hours checking the basic config and it all seems OK.

I’m not sure how much use it is but here is the mysql error.log from Joiner as well:


2015-03-24 12:01:00 6622 [Note] WSREP: Prepared SST request: xtrabackup-v2|10.22.11.19:4444/xtrabackup_sst//1
2015-03-24 12:01:00 6622 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2015-03-24 12:01:00 6622 [Note] WSREP: REPL Protocols: 6 (3, 2)
2015-03-24 12:01:00 6622 [Note] WSREP: Service thread queue flushed.
2015-03-24 12:01:00 6622 [Note] WSREP: Assign initial position for certification: 14211, protocol version: 3
2015-03-24 12:01:00 6622 [Note] WSREP: Service thread queue flushed.
2015-03-24 12:01:00 6622 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (8e461731-bc35-11e4-8a30-32847f70120f): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():456. IST will be unavailable.
2015-03-24 12:01:00 6622 [Note] WSREP: Member 0.0 (brpdbsl03) requested state transfer from '*any*'. Selected 1.0 (brpdbsl04)(SYNCED) as donor.
2015-03-24 12:01:00 6622 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 14211)
2015-03-24 12:01:00 6622 [Note] WSREP: Requesting state transfer: success, donor: 1
WSREP_SST: [INFO] Proceeding with SST (20150324 12:01:01.470)
WSREP_SST: [INFO] WARNING: Stale temporary SST directory: /var/lib/mysql//.sst from previous SST (20150324 12:01:01.473)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20150324 12:01:01.477)
removed ‘/var/lib/mysql/ibdata1’
removed ‘/var/lib/mysql/ib_logfile0’
removed ‘/var/lib/mysql/auto.cnf’
removed ‘/var/lib/mysql/.sst/backup-my.cnf’
removed ‘/var/lib/mysql/.sst/innobackup.prepare.log’
removed directory: ‘/var/lib/mysql/.sst’
removed ‘/var/lib/mysql/ib_logfile1’
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[&#64;]} ) (20150324 12:01:02.041)
2015-03-24 12:01:02 6622 [Note] WSREP: (6f3dc360, 'tcp://0.0.0.0:4567') turning message relay requesting off
grep: /var/lib/mysql//.sst/xtrabackup_checkpoints: No such file or directory
WSREP_SST: [INFO] Preparing the backup at /var/lib/mysql//.sst (20150324 12:01:12.933)
WSREP_SST: [INFO] Evaluating innobackupex --no-version-check --apply-log $rebuildcmd ${DATA} &>${DATA}/innobackup.prepare.log (20150324 12:01:12.936)
2015-03-24 12:01:12 6622 [Warning] WSREP: 1.0 (brpdbsl04): State transfer to 0.0 (brpdbsl03) failed: -22 (Invalid argument)
2015-03-24 12:01:12 6622 [ERROR] WSREP: gcs/src/gcs_group.cpp:int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():722: Will never receive state. Need to abort.
2015-03-24 12:01:12 6622 [Note] WSREP: gcomm: terminating thread
2015-03-24 12:01:12 6622 [Note] WSREP: gcomm: joining thread
2015-03-24 12:01:12 6622 [Note] WSREP: gcomm: closing backend
WSREP_SST: [ERROR] Cleanup after exit with status:1 (20150324 12:01:13.202)
2015-03-24 12:01:13 6622 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.22.11.19' --auth 'sstuser:nqNnnCQwHsrJ' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '6622' '' : 1 (Operation not permitted)
2015-03-24 12:01:13 6622 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-03-24 12:01:13 6622 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2015-03-24 12:01:13 6622 [ERROR] Aborting

Fixed.

The problem was that the datadir was mounted on a local disk, and at some point a lost+found directory had been created. This was preventing the backup from working as it would be unable to read the directory and just quit.

Annoyingly, the error messages did not point to this at all and were more a hinderance than a help.

Thanks,

Jamie

Thanks robothands … we spent a couple of days on this issue and finally stumbled across this.

I agree that the error message really sends you on the wrong path.

I’m hoping this has been rectified in newer versions :slight_smile: