Error with xtrabackup

Hello,

I recently tried to switch from rsync to xtrabackup-v2 on an existing XtraDB cluster on Ubuntu 14.04, but the second and third nodes can’t join the cluster.
From the logs it seems I’m having an issue with some sort of authentication or permissions.
The first node starts without any error, but I’m not able to join any of the other nodes to it. When I change the SST method back to rsync, it works.

Error log


(...)
2014-12-30 03:55:30 20657 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
2014-12-30 03:55:30 20657 [Note] WSREP: Passing config to GCS: base_host = 192.168.11.103; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /bd/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /bd/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; p
2014-12-30 03:55:30 20657 [Note] WSREP: Service thread queue flushed.
2014-12-30 03:55:30 20657 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2014-12-30 03:55:30 20657 [Note] WSREP: wsrep_sst_grab()
2014-12-30 03:55:30 20657 [Note] WSREP: Start replication
2014-12-30 03:55:30 20657 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2014-12-30 03:55:30 20657 [Note] WSREP: protonet asio version 0
2014-12-30 03:55:30 20657 [Note] WSREP: Using CRC-32C for message checksums.
2014-12-30 03:55:30 20657 [Note] WSREP: backend: asio
2014-12-30 03:55:30 20657 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)
2014-12-30 03:55:30 20657 [Note] WSREP: restore pc from disk failed
2014-12-30 03:55:30 20657 [Note] WSREP: GMCast version 0
2014-12-30 03:55:30 20657 [Note] WSREP: (b2bdfec8, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2014-12-30 03:55:30 20657 [Note] WSREP: (b2bdfec8, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2014-12-30 03:55:30 20657 [Note] WSREP: EVS version 0
2014-12-30 03:55:30 20657 [Note] WSREP: gcomm: connecting to group 'nvcluster', peer '192.168.11.101:,192.168.11.102:'
2014-12-30 03:55:30 20657 [Note] WSREP: (b2bdfec8, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2014-12-30 03:55:31 20657 [Note] WSREP: declaring b826dbf4 at tcp://192.168.11.101:4567 stable
2014-12-30 03:55:31 20657 [Note] WSREP: Node b826dbf4 state prim
2014-12-30 03:55:31 20657 [Note] WSREP: view(view_id(PRIM,b2bdfec8,30) memb {
b2bdfec8,0
b826dbf4,0
} joined {
} left {
} partitioned {
})
2014-12-30 03:55:31 20657 [Note] WSREP: save pc into disk
2014-12-30 03:55:31 20657 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.11.102:4567
2014-12-30 03:55:31 20657 [Note] WSREP: gcomm: connected
2014-12-30 03:55:31 20657 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2014-12-30 03:55:31 20657 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2014-12-30 03:55:31 20657 [Note] WSREP: Opened channel 'nvcluster'
2014-12-30 03:55:31 20657 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2014-12-30 03:55:31 20657 [Note] WSREP: Waiting for SST to complete.
2014-12-30 03:55:31 20657 [Note] WSREP: STATE_EXCHANGE: sent state UUID: b30afe4c-8fd7-11e4-8df5-4f19a22bc0aa
2014-12-30 03:55:31 20657 [Note] WSREP: STATE EXCHANGE: sent state msg: b30afe4c-8fd7-11e4-8df5-4f19a22bc0aa
2014-12-30 03:55:31 20657 [Note] WSREP: STATE EXCHANGE: got state msg: b30afe4c-8fd7-11e4-8df5-4f19a22bc0aa from 0 (node03)
2014-12-30 03:55:31 20657 [Note] WSREP: STATE EXCHANGE: got state msg: b30afe4c-8fd7-11e4-8df5-4f19a22bc0aa from 1 (node01)
2014-12-30 03:55:31 20657 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 29,
members = 1/2 (joined/total),
act_id = 161486336,
last_appl. = -1,
protocols = 0/5/3 (gcs/repl/appl),
group UUID = 51143ec4-6c99-11e4-ab3f-87c51f793d2d
2014-12-30 03:55:31 20657 [Note] WSREP: Flow-control interval: [23, 23]
2014-12-30 03:55:31 20657 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 161486336)
2014-12-30 03:55:31 20657 [Note] WSREP: State transfer required:
Group state: 51143ec4-6c99-11e4-ab3f-87c51f793d2d:161486336
Local state: 00000000-0000-0000-0000-000000000000:-1
2014-12-30 03:55:31 20657 [Note] WSREP: New cluster view: global state: 51143ec4-6c99-11e4-ab3f-87c51f793d2d:161486336, view# 30: Primary, number of nodes: 2, my index: 0, protocol version 3
2014-12-30 03:55:31 20657 [Warning] WSREP: Gap in state sequence. Need state transfer.
2014-12-30 03:55:31 20657 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.11.103' --auth 'sstuser:ssts3cret' --datadir '/bd/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '20657' '' '
WSREP_SST: [INFO] Streaming with xbstream (20141230 03:55:31.456)
WSREP_SST: [INFO] Using socat as streamer (20141230 03:55:31.459)
WSREP_SST: [INFO] Stale sst_in_progress file: /bd/mysql//sst_in_progress (20141230 03:55:31.929)
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20141230 03:55:31.973)
2014-12-30 03:55:32 20657 [Note] WSREP: Prepared SST request: xtrabackup-v2|192.168.11.103:4444/xtrabackup_sst//1
2014-12-30 03:55:32 20657 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-12-30 03:55:32 20657 [Note] WSREP: REPL Protocols: 5 (3, 1)
2014-12-30 03:55:32 20657 [Note] WSREP: Service thread queue flushed.
2014-12-30 03:55:32 20657 [Note] WSREP: Assign initial position for certification: 161486336, protocol version: 3
2014-12-30 03:55:32 20657 [Note] WSREP: Service thread queue flushed.
2014-12-30 03:55:32 20657 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (51143ec4-6c99-11e4-ab3f-87c51f793d2d): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():456. IST will be unavailable.
2014-12-30 03:55:32 20657 [Note] WSREP: Member 0.0 (node03) requested state transfer from '*any*'. Selected 1.0 (node01)(SYNCED) as donor.
2014-12-30 03:55:32 20657 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 161486336)
2014-12-30 03:55:32 20657 [Note] WSREP: Requesting state transfer: success, donor: 1
WSREP_SST: [INFO] Proceeding with SST (20141230 03:55:32.824)
WSREP_SST: [INFO] Cleaning the existing datadir (20141230 03:55:32.828)
removed ‘/bd/mysql/ib_logfile0’
removed ‘/bd/mysql/ib_logfile1’
removed ‘/bd/mysql/backup-my.cnf’
removed ‘/bd/mysql/ibdata1’
WSREP_SST: [INFO] Cleaning the binlog directory /bd as well (20141230 03:55:33.264)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20141230 03:55:33.273)
2014-12-30 03:55:34 20657 [Note] WSREP: (b2bdfec8, 'tcp://0.0.0.0:4567') turning message relay requesting off
grep: /bd/mysql//xtrabackup_checkpoints: No such file or directory
2014-12-30 03:55:43 20657 [Warning] WSREP: 1.0 (node01): State transfer to 0.0 (node03) failed: -22 (Invalid argument)
2014-12-30 03:55:43 20657 [ERROR] WSREP: gcs/src/gcs_group.cpp:int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():722: Will never receive state. Need to abort.
2014-12-30 03:55:43 20657 [Note] WSREP: gcomm: terminating thread
2014-12-30 03:55:43 20657 [Note] WSREP: gcomm: joining thread
2014-12-30 03:55:43 20657 [Note] WSREP: gcomm: closing backend
WSREP_SST: [INFO] Preparing the backup at /bd/mysql/ (20141230 03:55:43.733)
WSREP_SST: [INFO] Evaluating innobackupex --no-version-check --apply-log $rebuildcmd ${DATA} &>${DATA}/innobackup.prepare.log (20141230 03:55:43.737)
WSREP_SST: [ERROR] Cleanup after exit with status:1 (20141230 03:55:44.093)
2014-12-30 03:55:44 20657 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.11.103' --auth 'sstuser:ssts3cret' --datadir '/bd/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '20657' '' : 1 (Operation not permitted)
2014-12-30 03:55:44 20657 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2014-12-30 03:55:44 20657 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2014-12-30 03:55:44 20657 [ERROR] Aborting

2014-12-30 03:55:44 20657 [Note] WSREP: view(view_id(NON_PRIM,b2bdfec8,30) memb {
b2bdfec8,0
} joined {
} left {
} partitioned {
b826dbf4,0
})
2014-12-30 03:55:44 20657 [Note] WSREP: view((empty))
2014-12-30 03:55:44 20657 [Note] WSREP: gcomm: closed
2014-12-30 03:55:44 20657 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Aborted (core dumped)
141230 03:55:45 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended

The nodes are running XtraDB Cluster version 5.6.

Can anyone help me, please?

Thank you.

The config of each node looks like this:


[mysqld]
server_id=1
binlog_format=ROW
log_bin=/bd/logs
wsrep_cluster_address=gcomm://192.168.11.101,192.168.11.102,192.168.11.102
wsrep_provider=/usr/lib/libgalera_smm.so
datadir=/var/lib/mysql
tempdir=/tmp


expire_logs_days = 10
max_binlog_size = 100M

max_connections = 16000
open_files_limit = 55000
innodb_thread_concurrency = 32
innodb_buffer_pool_size = 7G
innodb_lock_wait_timeout = 100
innodb_additional_mem_pool_size = 20M
innodb_autoinc_lock_mode = 0
skip-name-resolve
#testes
#engine_condition_pushdown = 0
key_buffer = 1500M
max_allowed_packet = 1024M
thread_stack = 256K
thread_cache_size = 128
#table_cache = 6000
table_definition_cache = 6000
thread_concurrency = 16
join_buffer_size = 262144
query_cache_size = 256M
query_cache_min_res_unit = 4K
concurrent_insert = 2
range_alloc_block_size = 2M
max_heap_table_size =512M
tmp_table_size = 512M
log_bin_trust_function_creators=TRUE
wsrep_slave_threads=4
wsrep_cluster_name=nvcluster
#wsrep_sst_method=xtrabackup
wsrep_sst_method=rsync
wsrep_node_name=node02
wsrep_node_address=192.168.11.101
expire_logs_days = 5
log_slave_updates
wait_timeout=180
innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
#pedro 14112014
innodb_file_per_table = 1

[mysql]
prompt="node01 mysql> "

And the innobackup.prepare.log gives me this error:

InnoDB Backup Utility v1.5.1-xtrabackup; Copyright 2003, 2009 Innobase Oy
and Percona LLC and/or its affiliates 2009-2013. All Rights Reserved.

This software is published under
the GNU GENERAL PUBLIC LICENSE Version 2, June 1991.

Get the latest version of Percona XtraBackup, documentation, and help resources:
http://www.percona.com/xb/p

141230 02:52:28 innobackupex: Starting the apply-log operation

IMPORTANT: Please check that the apply-log run completes successfully.
At the end of a successful apply-log run innobackupex
prints "completed OK!".


141230 02:52:28 innobackupex: Starting ibbackup with command: xtrabackup --defaults-file="/var/lib/mysql/backup-my.cnf" --defaults-group="mysqld" --prepare --target-dir=/var/lib/mysql

xtrabackup version 2.2.7 based on MySQL server 5.6.21 Linux (x86_64) (revision id: )
xtrabackup: cd to /var/lib/mysql
xtrabackup: Error: cannot open ./xtrabackup_checkpoints
xtrabackup: error: xtrabackup_read_metadata()
xtrabackup: This target seems not to have correct metadata...
2014-12-30 02:52:28 7f6494c57780 InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
xtrabackup: Warning: cannot open ./xtrabackup_logfile. will try to find.
2014-12-30 02:52:28 7f6494c57780 InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
xtrabackup: Fatal error: cannot find ./xtrabackup_logfile.
xtrabackup: Error: xtrabackup_init_temp_log() failed.
innobackupex: got a fatal error with the following stacktrace: at /usr//bin/innobackupex line 2633.
main::apply_log() called at /usr//bin/innobackupex line 1561
innobackupex: Error:
innobackupex: ibbackup failed at /usr//bin/innobackupex line 2633.

Thank you once again.

After changing the SST method from rsync to xtrabackup-v2, did you add the sstuser user with the ssts3cret password to the donor? I don’t see the variable wsrep_sst_auth=sstuser:ssts3cret in the my.cnf content you’ve just posted.
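
For reference, here is a minimal sketch of what creating that SST user on the donor could look like. The sstuser:ssts3cret credentials are the ones from your log, and the privilege list is an assumption based on what xtrabackup-based SST typically needs:

# Run on the donor node; the credentials must match wsrep_sst_auth on the joiner.
mysql -u root -p -e "
  CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 'ssts3cret';
  GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT, PROCESS ON *.* TO 'sstuser'@'localhost';
  FLUSH PRIVILEGES;"

# Quick check that the credentials actually work locally on the donor:
mysql -u sstuser -p'ssts3cret' -e "SHOW GRANTS;"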

Hi, wagnerbianchi

Thank you for your reply.

Yes, I’ve added a user to the wsrep_sst_auth variable.
It’s not that exact user and password, but a specific user/password with the required permissions.
I’ve also tried with the root user and it didn’t work.

OK, but have you tried with…?

wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=user:pass

Additionally, share the current my.cnf and error logs to check what’s really going on there…

Yes, I’ve tried with both variables, and the logs I pasted in my first two posts were produced with both of them set.

OK, so let’s go through the details posted so far…

Based on the configuration file content you posted in your second post, it seems the tmpdir variable is misspelled as tempdir; you must fix that - check more on that here.

As I’m not seeing any configuration for the InnoDB transaction logs, I’m assuming you’re running MySQL with its defaults; double-check that the two ib_logfileX files are present in /var/lib/mysql - it’s good to have this explicitly configured in my.cnf.

Additionally, what is the path that appears after the --datadir flag below?
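
A quick way to confirm the effective paths on each node could be something like the sketch below (adjust credentials and paths to your setup; the /var/lib/mysql path is an assumption):

# What datadir / tmpdir is mysqld actually using, and what redo log size is configured?
mysql -u root -p -e "SELECT @@datadir, @@tmpdir, @@innodb_log_file_size;"

# Are both InnoDB redo log files present in that datadir?
ls -lh /var/lib/mysql/ib_logfile*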

Some points for troubleshooting when going from rsync to xtrabackup-v2 (see the shell sketch after this list):

- Check that the user configured in wsrep_sst_auth really exists in the donor’s MySQL;
- Check that all nodes have the same configs and the same version of the percona-xtrabackup package - this determines the correct value of wsrep_sst_method;
- Check that nothing is blocking connections between the nodes;
- Check innobackup.backup.log with tail -f to keep track of the backup phase of the SST execution;
- Check the error log on both donor and joiner to track messages and detect anything odd happening behind the scenes.
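
A rough shell sketch of those checks (the IP and ports are the ones from this thread or PXC defaults; the log paths are assumptions that depend on your datadir and error-log settings):

# On the donor: does the SST user exist?
mysql -u root -p -e "SELECT user, host FROM mysql.user WHERE user = 'sstuser';"

# From the joiner: can it reach the donor on the Galera/SST ports
# (4567 group communication, 4568 IST, 4444 SST by default)?
for port in 4567 4568 4444; do
  nc -z -w 2 192.168.11.101 "$port" && echo "port $port open" || echo "port $port closed/blocked"
done

# On the donor, during the SST: follow the backup phase (file is written to the donor's datadir).
tail -f /var/lib/mysql/innobackup.backup.log

# On both nodes: follow the MySQL error log (path depends on your distribution/config).
tail -f /var/log/mysql/error.log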
To finish, let us know the output of:

- $ mysqld --version
- $ xtrabackup --version

Just to give an example, my latest Percona XtraDB Cluster setup has the following package versions:

[vagrant@node01 mysql]$ xtrabackup --version
xtrabackup version 2.2.7 based on MySQL server 5.6.21 Linux (x86_64) (revision id: )
[vagrant@node01 mysql]$ mysqld --version
mysqld Ver 5.6.21-70.1-56 for Linux on x86_64 (Percona XtraDB Cluster (GPL), Release rel70.1, Revision 938, WSREP version 25.8, wsrep_25.8.r4150)

And its cluster status:

mysql> show global status like 'wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id | 6 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | f9cdb70c-8483-11e4-86af-4b70fd8e7a88 |
| wsrep_cluster_status | Primary |
+--------------------------+--------------------------------------+
4 rows in set (0.00 sec)

Let us know where you stand on it. Cheers and happy New Year’s Eve!

Hi, wagnerbianchi

The tempdir setting was already corrected to tmpdir. It was present on node 3 only, not on the other two nodes.

The /var/lib/mysql directory contains the two ib_logfile files, ib_logfile0 and ib_logfile1.

The --datadir used while executing the SST with xtrabackup-v2 is the correct one, pointing to /var/lib/mysql.

Here are the versions you asked for:


root@node01:~# mysqld --version
mysqld Ver 5.6.21-70.1-56 for debian-linux-gnu on x86_64 (Percona XtraDB Cluster (GPL), Release rel70.1, Revision 938, WSREP version 25.8, wsrep_25.8.r4150)
root@node01:~# xtrabackup --version
xtrabackup version 2.2.7 based on MySQL server 5.6.21 Linux (x86_64) (revision id: )

And the wsrep query:


mysql> show global status like 'wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id | 6 |
| wsrep_cluster_size | 2 |
| wsrep_cluster_state_uuid | 51143ec4-6c99-11e4-ab3f-87c51f793d2d |
| wsrep_cluster_status | Primary |
+--------------------------+--------------------------------------+

One thing I’ve noticed recently is that node1 and node2 have different Galera versions.
Node1


mysql> show global status like 'wsrep_provider%';
+------------------------+-----------------------------------+
| Variable_name | Value |
+------------------------+-----------------------------------+
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 3.2(r170) |
+------------------------+-----------------------------------+

Node2


show global status like 'wsrep_provider%';
+------------------------+-----------------------------------+
| Variable_name | Value |
+------------------------+-----------------------------------+
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 3.8(rf6147dd) |
+------------------------+-----------------------------------+

Do you think this could be the issue?

Thank you once again and we wish you a great 2015!

Thanks for the wishes! BTW, I haven’t yet seen a case where two nodes belonging to the same cluster have different Galera provider versions. Perhaps you should give it a try and check whether the differing provider versions are generating this kind of issue.

Can you upgrade Node1 and check whether the problem remains?
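
One way to check and then align the provider packages on Ubuntu might be the sketch below; the exact package names are an assumption (on PXC 5.6 the provider usually ships in a percona-xtradb-cluster-galera package), so take the real names from the dpkg output:

# On each node: list the installed Galera provider / PXC packages and their versions.
dpkg -l | grep -i -E 'galera|percona-xtradb-cluster'

# Confirm the provider version each running node actually loaded:
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_provider_version';"

# To align them, upgrade the outdated node using the package name shown by dpkg, e.g.:
#   sudo apt-get update && sudo apt-get install --only-upgrade percona-xtradb-cluster-galera-3.x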

I’ll try to replicate it in a testing environment and will come back to you with feedback.
Thanks once more!