SST not working with xtrabackup-v2, only with rsync. IPv6 issue ?

Hello.
I have 3 nodes working great for testing purposes. All nodes working only in IPv6 and SELINUX/FIREWALLD disabled.

Problem: the nodes only sync using rsync (wsrep_sst_method=rsync) and I don’t know why xtrabackup-v2 not works. I can use normally xtrabackup for backup.

I think may be some issue with IPv6, but not sure. TCPDUMP in JOINER show traffic in port 4444 coming from the DONOR.

Above the wsrep.cnf and the logs from JOINER (debug mode) and from DONOR. The first error can be see at ‘2018-01-19T11:03:34.087779Z’

I appreciate any help about this issue. Thanks.

[mysqld]
wsrep_provider=/usr/lib64/galera3/libgalera_smm.so
wsrep_cluster_name=pxc-cluster
wsrep_cluster_address=gcomm://[fd00:faca:0:b0::164],[fd00:faca:0:b0::177],[fd00:faca:0:b0::192]
wsrep_provider_options = “gmcast.listen_addr=tcp://[::]:4567;debug=yes”
wsrep_debug=ON
wsrep_node_name=alfresco2
wsrep_node_address=[fd00:faca:0:b0::192]
#wsrep_sst_method=rsync
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sstuser:passw0rd
pxc_strict_mode=PERMISSIVE
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_slave_threads= 8
wsrep_log_conflicts

– JOINER LOG – (see attached file)

– DONOR LOG –
2018-01-19T11:03:31.575884Z 0 [Note] WSREP: Save the discovered primary-component to disk
2018-01-19T11:03:31.576195Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
2018-01-19T11:03:31.576220Z 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2018-01-19T11:03:32.074764Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 6303f372-fd08-11e7-855e-5b19cc198634
2018-01-19T11:03:32.075214Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 6303f372-fd08-11e7-855e-5b19cc198634 from 0 (alfresco2)
2018-01-19T11:03:32.075261Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 6303f372-fd08-11e7-855e-5b19cc198634 from 1 (divisa)
2018-01-19T11:03:32.075278Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 6303f372-fd08-11e7-855e-5b19cc198634 from 2 (sinos)
2018-01-19T11:03:32.075290Z 0 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 8,
members = 2/3 (primary/total),
act_id = 11336,
last_appl. = 0,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = 5067efe8-f7b8-11e7-99f1-a656309a571f
2018-01-19T11:03:32.075301Z 0 [Note] WSREP: Flow-control interval: [173, 173]
2018-01-19T11:03:32.075321Z 0 [Note] WSREP: Trying to continue unpaused monitor
2018-01-19T11:03:32.075385Z 4 [Note] WSREP: New cluster view: global state: 5067efe8-f7b8-11e7-99f1-a656309a571f:11336, view# 9: Primary, number of nodes: 3, my index: 1, protocol version 3
2018-01-19T11:03:32.075406Z 4 [Note] WSREP: Setting wsrep_ready to true
2018-01-19T11:03:32.075416Z 4 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-01-19T11:03:32.075437Z 4 [Note] WSREP: REPL Protocols: 7 (3, 2)
2018-01-19T11:03:32.075449Z 4 [Note] WSREP: Assign initial position for certification: 11336, protocol version: 3
2018-01-19T11:03:32.075493Z 0 [Note] WSREP: Service thread queue flushed.
2018-01-19T11:03:33.017260Z 0 [Note] WSREP: Member 0.0 (alfresco2) requested state transfer from ‘any’. Selected 1.0 (divisa)(SYNCED) as donor.
2018-01-19T11:03:33.017300Z 0 [Note] WSREP: Shifting SYNCED → DONOR/DESYNCED (TO: 11336)
2018-01-19T11:03:33.017395Z 4 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-01-19T11:03:33.017628Z 0 [Note] WSREP: Initiating SST/IST transfer on DONOR side (wsrep_sst_xtrabackup-v2 --role ‘donor’ --address ‘[fd00:faca:0:b0::192]:4444/xtrabackup_sst//1’ --socket ‘/var/lib/mysql/mysql.sock’ --datadir ‘/var/lib/mysql/’ --defaults-file ‘/etc/my.cnf’ --defaults-group-suffix ‘’ ‘’ --gtid ‘5067efe8-f7b8-11e7-99f1-a656309a571f:11336’)
2018-01-19T11:03:33.022015Z 4 [Note] WSREP: DONOR thread signaled with 0
2018-01-19T11:03:34.455733Z 0 [Note] WSREP: (6e3010ed, ‘tcp://[::]:4567’) turning message relay requesting off
2018/01/19 09:04:03 socat[14997] E connect(5, AF=10 [fd00:faca:0000:00b0:0000:0000:0000:0192]:4444, 28): Connection refused
2018-01-19T11:04:03.715649Z WSREP_SST: [ERROR] ******************* FATAL ERROR **********************
2018-01-19T11:04:03.717183Z WSREP_SST: [ERROR] Error while sending data to joiner node: exit codes: 0 1
2018-01-19T11:04:03.718670Z WSREP_SST: [ERROR] ******************************************************
2018-01-19T11:04:03.720328Z WSREP_SST: [ERROR] Cleanup after exit with status:32
2018-01-19T11:04:03.733916Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role ‘donor’ --address ‘[fd00:faca:0:b0::192]:4444/xtrabackup_sst//1’ --socket ‘/var/lib/mysql/mysql.sock’ --datadir ‘/var/lib/mysql/’ --defaults-file ‘/etc/my.cnf’ --defaults-group-suffix ‘’ ‘’ --gtid ‘5067efe8-f7b8-11e7-99f1-a656309a571f:11336’: 32 (Broken pipe)
2018-01-19T11:04:03.733994Z 0 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role ‘donor’ --address ‘[fd00:faca:0:b0::192]:4444/xtrabackup_sst//1’ --socket ‘/var/lib/mysql/mysql.sock’ --datadir ‘/var/lib/mysql/’ --defaults-file ‘/etc/my.cnf’ --defaults-group-suffix ‘’ ‘’ --gtid ‘5067efe8-f7b8-11e7-99f1-a656309a571f:11336’
2018-01-19T11:04:03.734784Z 0 [Warning] WSREP: 1.0 (divisa): State transfer to 0.0 (alfresco2) failed: -32 (Broken pipe)
2018-01-19T11:04:03.734820Z 0 [Note] WSREP: Shifting DONOR/DESYNCED → JOINED (TO: 11336)
2018-01-19T11:04:03.735809Z 0 [Note] WSREP: declaring 82217999 at tcp://[fd00:faca:0:b0::177]:4567 stable
2018-01-19T11:04:03.735849Z 0 [Note] WSREP: forgetting 626ab09e (tcp://[fd00:faca:0:b0::192]:4567)
2018-01-19T11:04:03.736015Z 0 [Note] WSREP: Member 1.0 (divisa) synced with group.
2018-01-19T11:04:03.736029Z 0 [Note] WSREP: Shifting JOINED → SYNCED (TO: 11336)
2018-01-19T11:04:03.736049Z 2 [Note] WSREP: Synchronized with group, ready for connections
2018-01-19T11:04:03.736057Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-01-19T11:04:03.736474Z 0 [Note] WSREP: Node 6e3010ed state primary
2018-01-19T11:04:03.736686Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(PRIM,6e3010ed,10)
memb {
6e3010ed,0
82217999,0
}
joined {
}
left {
}
partitioned {
626ab09e,0
}
)
2018-01-19T11:04:03.736724Z 0 [Note] WSREP: Save the discovered primary-component to disk
2018-01-19T11:04:03.736930Z 0 [Note] WSREP: forgetting 626ab09e (tcp://[fd00:faca:0:b0::192]:4567)
2018-01-19T11:04:03.737018Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2018-01-19T11:04:03.737138Z 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 75e189d9-fd08-11e7-9f92-5f8e89157047
2018-01-19T11:04:03.737564Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 75e189d9-fd08-11e7-9f92-5f8e89157047
2018-01-19T11:04:03.737817Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 75e189d9-fd08-11e7-9f92-5f8e89157047 from 0 (divisa)
2018-01-19T11:04:03.737834Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 75e189d9-fd08-11e7-9f92-5f8e89157047 from 1 (sinos)
2018-01-19T11:04:03.737845Z 0 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 9,
members = 2/2 (primary/total),
act_id = 11336,
last_appl. = 0,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = 5067efe8-f7b8-11e7-99f1-a656309a571f
2018-01-19T11:04:03.737854Z 0 [Note] WSREP: Flow-control interval: [141, 141]
2018-01-19T11:04:03.737894Z 0 [Note] WSREP: Trying to continue unpaused monitor
2018-01-19T11:04:03.737945Z 9 [Note] WSREP: New cluster view: global state: 5067efe8-f7b8-11e7-99f1-a656309a571f:11336, view# 10: Primary, number of nodes: 2, my index: 0, protocol version 3
2018-01-19T11:04:03.737957Z 9 [Note] WSREP: Setting wsrep_ready to true
2018-01-19T11:04:03.737970Z 9 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-01-19T11:04:03.737984Z 9 [Note] WSREP: REPL Protocols: 7 (3, 2)
2018-01-19T11:04:03.737996Z 9 [Note] WSREP: Assign initial position for certification: 11336, protocol version: 3
2018-01-19T11:04:03.738054Z 0 [Note] WSREP: Service thread queue flushed.
2018-01-19T11:04:08.967742Z 0 [Note] WSREP: cleaning up 626ab09e (tcp://[fd00:faca:0:b0::192]:4567)

JOINER-LOG.txt (57.9 KB)

Can you provide the debug output for the donor side also? (just for the SST script). You can do this by adding the following to your .cnf file

[sst]
wsrep_debug=1

and retry the SST. There’s no need to restart the donor, but this will make the wsrep_sst_xtrabackup-v2 script log all the commands being run.

Thanks,

Hi Kenn!

Let me add a comment: my data folder have 1.3GB and when using the xtrabackup-v2 only ± 314 MB are transfered; using rsync all 1.3GB are transfered to joiner.
Is there any configuration of time or size for the transfer to complete ?

BR.

Attached donor log using xtrabackup-v2 in debug mode.

donor-log.txt (12.9 KB)

Hello.
I have found two solutions and I’m using the 2nd.

1st solution - (not good, need edit the script):
edit /usr/bin/wsrep_sst_xtrabackup-v2, line 532:
replace
tcmd=“socat -u TCP-LISTEN:${TSST_PORT},reuseaddr${sockopt} stdio”
to
tcmd=“socat -u TCP6-LISTEN:${TSST_PORT},reuseaddr${sockopt} stdio”

restart mysql and sst will work using xtrabackup-v2.

2nd solution: (good, edit the config file):
before any changes, check joiner log:
2018-01-25T15:23:38.623734Z WSREP_SST: [DEBUG] The xtrabackup version is 2.4.9
2018-01-25T15:23:38.941069Z WSREP_SST: [DEBUG] Streaming with xbstream
2018-01-25T15:23:38.942338Z WSREP_SST: [DEBUG] Using socat as streamer
2018-01-25T15:23:39.006472Z WSREP_SST: [DEBUG] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr,retry=30 stdio | pv -f -i 10 -N joiner -F ‘%N => Rate:%r Avg:%a Elapsed:%t %e Bytes: %b %p’ | xbstream $xbstreameopts -x; RC=( ${PIPESTATUS[@]} )

edit my.cnf and add parameter:
[sst]
sockopt=“pf=ip6”

save the file, restart mysql and sst will work using xtrabackup-v2.

after restart I see in joiner log:
2018-01-25T15:25:42.423590Z WSREP_SST: [DEBUG] The xtrabackup version is 2.4.9
2018-01-25T15:25:42.767761Z WSREP_SST: [DEBUG] Streaming with xbstream
2018-01-25T15:25:42.769007Z WSREP_SST: [DEBUG] Using socat as streamer
2018-01-25T15:25:42.836480Z WSREP_SST: [DEBUG] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr,pf=ip6,retry=30 stdio | pv -f -i 10 -N joiner -F ‘%N => Rate:%r Avg:%a Elapsed:%t %e Bytes: %b %p’ | xbstream $xbstreameopts -x; RC=( ${PIPESTATUS[@]} )

I found this parameter in socat docs, but nothing in the xtrabackup docs.
I believe socat is only accepting IPv4 connections by default, after adding the parameter sockopt=“pf=ip6” all works fine.
Now, I can stop mysql, erase the data folder and sst transfer all data from any donor.

If this solution is okay for the community, I will mark it as solved.
BR