MariaDB Galera cluster with xtrabackup issue

Hello!

I have recently installed Galera cluster, but I cannot get up running mariadb on the second node.

NODE1 /etc/my.cnf.d/server.cnf:

[mysqld]

[embedded]

[mysqld-5.5]

[mariadb]

datadir=/var/lib/mysql
tmpdir=/tmp/zabbix
[mariadb-10.1]
optimizer_switch = ‘index_condition_pushdown=off’
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=“gcomm://10.10.16.1,10.10.16.2”
wsrep_cluster_name=‘zabbix-cluster’
wsrep_node_address=‘10.10.16.1’
wsrep_node_name=‘zabbix-db-node1’
wsrep_sst_method=xtrabackup
binlog_format=row
innodb_autoinc_lock_mode=2
innodb_doublewrite=1
query_cache_size = 0
query_cache_type = 0
default_storage_engine=InnoDB
wsrep_sst_auth=“zabbix_db_node1:zabbix_db_node1”
innodb_flush_log_at_trx_commit = 2
wsrep_node_incoming_address=‘10.10.16.1’
wsrep_sst_receive_address=‘10.10.16.1’
innodb_file_per_table
innodb_buffer_pool_size = 10G
innodb_flush_method = O_DIRECT
innodb_log_buffer_size = 128M
innodb_lock_wait_timeout = 500
innodb_log_file_size = 100M

[mariadb-5.5]

NODE2 /etc/my.cnf.d/server.cnf:

[mysqld]

[embedded]

[mysqld-5.5]

[mariadb]
datadir=/var/lib/mysql
tmpdir=/tmp/zabbix
[mariadb-10.1]
optimizer_switch = ‘index_condition_pushdown=off’
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=“gcomm://10.10.16.1,10.10.16.2”
wsrep_cluster_name=‘zabbix-cluster’
wsrep_node_address=‘10.10.16.2’
wsrep_node_name=‘zabbix-db-node2’
wsrep_sst_method=xtrabackup
binlog_format=row
innodb_doublewrite=1
query_cache_size = 0
query_cache_type = 0
innodb_autoinc_lock_mode=2
default_storage_engine=InnoDB
wsrep_sst_auth=“zabbix_db_node2:zabbix_db_node2”
innodb_flush_log_at_trx_commit = 2
wsrep_node_incoming_address=‘10.10.16.2’
wsrep_sst_receive_address=‘10.10.16.2’
innodb_file_per_table
innodb_buffer_pool_size = 10G
innodb_flush_method = O_DIRECT
innodb_log_buffer_size = 128M
innodb_lock_wait_timeout = 500
innodb_log_file_size = 100M

[mariadb-5.5]

1. I start galera_new_cluster on NODE1. Everything seems to be OK:

systemctl status mariadb

● mariadb.service - MariaDB 10.1.31 database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: active (running) since Wed 2018-02-14 05:31:09 EST; 8min ago
Docs: man:mysqld(8)
[url]https://mariadb.com/kb/en/library/systemd/[/url]
Process: 10744 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 10694 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=/usr/bin/galera_recovery; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
Process: 10691 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 10711 (mysqld)
Status: “Taking your SQL requests now…”
CGroup: /system.slice/mariadb.service
└─10711 /usr/sbin/mysqld --wsrep-new-cluster

Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:19 139659971651328 [Note] WSREP: Flow-control interval: [16, 16]
Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:19 139659971651328 [Note] WSREP: Trying to continue unpaused monitor
Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:19 139660283120384 [Note] WSREP: New cluster view: global state: ad8e3ac3-116d-11e8-859f-eadc403c6207:1780, view# 3: P…ol version 3
Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:19 139660283120384 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:19 139660283120384 [Note] WSREP: REPL Protocols: 7 (3, 2)
Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:19 139660283120384 [Note] WSREP: Assign initial position for certification: 1780, protocol version: 3
Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:19 139660030375680 [Note] WSREP: Service thread queue flushed.
Feb 14 05:31:23 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:23 139659980044032 [Note] WSREP: (2b3ca384, ‘tcp://0.0.0.0:4567’) connection established to 3324f771 tcp://10.10.16.2:4567
Feb 14 05:31:23 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:23 139659980044032 [Warning] WSREP: discarding established (time wait) 3324f771 (tcp://10.10.16.2:4567)
Feb 14 05:31:24 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:24 139659980044032 [Note] WSREP: cleaning up 3324f771 (tcp://10.10.16.2:4567)
Hint: Some lines were ellipsized, use -l to show in full.

2. I systemctl start mariadb on NODE2, wait some 30 seconds and get:

3. Then I look into the journal:

: Starting MariaDB 10.1.31 database server…
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] /usr/sbin/mysqld (mysqld 10.1.31-MariaDB) starting as process 3150 …
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: Read nil XID from storage engines, skipping position init
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: wsrep_load(): loading provider library ‘/usr/lib64/galera/libgalera_smm.so’
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: wsrep_load(): Galera 25.3.22(r3764) by Codership Oy <info@codership.com> loaded successfully.



Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: gcomm thread scheduling priority set to other:0
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: restore pc from disk failed
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: GMCast version 0
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: (887aad3d, ‘tcp://0.0.0.0:4567’) listening at tcp://0.0.0.0:4567
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: (887aad3d, ‘tcp://0.0.0.0:4567’) multicast: , ttl: 1
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: EVS version 0
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: gcomm: connecting to group ‘zabbix-cluster’, peer ‘10.10.16.1:,10.10.16.2:’
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: (887aad3d, ‘tcp://0.0.0.0:4567’) connection established to 887aad3d tcp://10.10.16.2:4567
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Warning] WSREP: (887aad3d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.10.16.2:4567’ points to own listening address,



Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: Flow-control interval: [23, 23]
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: Trying to continue unpaused monitor
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: Shifting OPEN → PRIMARY (TO: 2321)
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100187904 [Note] WSREP: State transfer required:
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: Group state: ad8e3ac3-116d-11e8-859f-eadc403c6207:2321
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: Local state: ad8e3ac3-116d-11e8-859f-eadc403c6207:-1
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100187904 [Note] WSREP: New cluster view: global state: ad8e3ac3-116d-11e8-859f-eadc403c6207:2321, view# 4: Primary, number o
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100187904 [Warning] WSREP: Gap in state sequence. Need state transfer.
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645769742080 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role ‘joiner’ --address ‘10.10.16.2’ --datadir ‘/var/lib/mysql/’ -
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: /usr//bin/wsrep_sst_xtrabackup: line 397: setup_ports: command not found
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645769742080 [ERROR] WSREP: Failed to read ‘ready ’ from: wsrep_sst_xtrabackup --role ‘joiner’ --address ‘10.10.16.2’ --da
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: Read: ‘(null)’
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645769742080 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role ‘joiner’ --address ‘10.10.16.2’ --datadir
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100187904 [ERROR] WSREP: Failed to prepare for ‘xtrabackup’ SST. Unrecoverable.
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100187904 [ERROR] Aborting
Feb 14 12:48:18 zabbix-db-node2 mysqld[3150]: Error in my_thread_global_end(): 1 threads didn’t exit
Feb 14 12:48:18 zabbix-db-node2 systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE
Feb 14 12:48:18 zabbix-db-node2 systemd[1]: Failed to start MariaDB 10.1.31 database server.
Feb 14 12:48:18 zabbix-db-node2 systemd[1]: Unit mariadb.service entered failed state.
Feb 14 12:48:18 zabbix-db-node2 systemd[1]: mariadb.service failed.

What is wrong?
Thank you!

I’ve just hit the same problem.
It appears to be a problem of interaction of the latest version 10.1.31 with innobackupex. We’ve switched some systems back to 10.1.30, which fixes it, and others by switching to rsync, which fixes it, but either solution is not long term.

I’m looking for a bug report now. Will update if I find it.

[url][MDEV-15254] 10.1.31 does not join an existing cluster with SST xtrabackup-v2 - Jira