Error trying to create a backup

marlovil · September 10, 2020, 9:11am

Hi guys!
We are configuring a backup policy in our cluster. We’ve defined two storage types (s3 and pvc). When we run an on-demand backup, we get this error in the pod logs:

+ peer-list -on-start=/usr/bin/get-pxc-state -service=cluster1-pxc
2020/09/10 14:37:27 Peer finder enter
2020/09/10 14:37:27 Determined Domain to be pxc.svc.cluster.local
2020/09/10 14:37:27 Peer list updated
was []
now [cluster1-pxc-0.cluster1-pxc.pxc.svc.cluster.local cluster1-pxc-1.cluster1-pxc.pxc.svc.cluster.local cluster1-pxc-2.cluster1-pxc.pxc.svc.cluster.local]
2020/09/10 14:37:27 execing: /usr/bin/get-pxc-state with stdin: cluster1-pxc-0.cluster1-pxc.pxc.svc.cluster.local
cluster1-pxc-1.cluster1-pxc.pxc.svc.cluster.local
cluster1-pxc-2.cluster1-pxc.pxc.svc.cluster.local
2020/09/10 14:37:27 
node:cluster1-pxc-0.cluster1-pxc.pxc.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Donor/Desynced:wsrep_cluster_status:Primary
node:cluster1-pxc-1.cluster1-pxc.pxc.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Donor/Desynced:wsrep_cluster_status:Primary
node:cluster1-pxc-2.cluster1-pxc.pxc.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Donor/Desynced:wsrep_cluster_status:Primary
2020/09/10 14:37:28 Peer finder exiting
[ERROR] Cannot find node for backup
+ echo '[ERROR] Cannot find node for backup'
+ exit 1

Do you have any clue about what is happening?
Many thanks in advance!
Marc

vadimtk · October 2, 2020, 5:57am

It seems ALL your nodes are in the “Donor/Desynced” state, which is not normal, but it is impossible to say from this log why it happened.

For a backup to start, it needs to find a node in “Synced” status.
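
To check this quickly, the per-node state can be pulled out of the get-pxc-state output. The following is a minimal sketch, assuming the colon-separated "node:..." line format shown in the log above; the function names node_states and has_synced_node are made up for illustration and are not part of the operator.

```shell
#!/usr/bin/env bash
# Reads get-pxc-state output on stdin and prints "<pod> <state>" per node.
# Field positions (2 = host, 8 = wsrep_local_state_comment value) are taken
# from the "node:..." lines in the log; they are an assumption about the format.
node_states() {
    awk -F: '/^node:/ { split($2, host, "."); print host[1], $8 }'
}

# Succeeds only if at least one node on stdin reports the Synced state.
has_synced_node() {
    node_states | grep -q ' Synced$'
}
```

Fed the three node lines from the log above, has_synced_node fails, because every node reports Donor/Desynced.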

mygov · May 12, 2021, 7:44am

I’m getting the same issue with the latest Operator version, v1.8.0.

+ GARBD_OPTS=
+ check_ssl
+ CA=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
+ '[' -f /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt ']'
+ CA=/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
+ SSL_DIR=/etc/mysql/ssl
+ '[' -f /etc/mysql/ssl/ca.crt ']'
+ CA=/etc/mysql/ssl/ca.crt
+ SSL_INTERNAL_DIR=/etc/mysql/ssl-internal
+ '[' -f /etc/mysql/ssl-internal/ca.crt ']'
+ CA=/etc/mysql/ssl-internal/ca.crt
+ KEY=/etc/mysql/ssl/tls.key
+ CERT=/etc/mysql/ssl/tls.crt
+ '[' -f /etc/mysql/ssl-internal/tls.key -a -f /etc/mysql/ssl-internal/tls.crt ']'
+ KEY=/etc/mysql/ssl-internal/tls.key
+ CERT=/etc/mysql/ssl-internal/tls.crt
+ '[' -f /etc/mysql/ssl-internal/ca.crt -a -f /etc/mysql/ssl-internal/tls.key -a -f /etc/mysql/ssl-internal/tls.crt ']'
+ GARBD_OPTS='socket.ssl_ca=/etc/mysql/ssl-internal/ca.crt;socket.ssl_cert=/etc/mysql/ssl-internal/tls.crt;socket.ssl_key=/etc/mysql/ssl-internal/tls.key;socket.ssl_cipher=;pc.weight=0;'
+ request_streaming
++ hostname -i
+ local LOCAL_IP=10.225.0.79
++ get_backup_source
+++ peer-list -on-start=/usr/bin/get-pxc-state -service=cluster1-pxc
+++ grep wsrep_cluster_size
+++ sort
+++ tail -1
+++ cut -d : -f 12
++ CLUSTER_SIZE=3
+++ peer-list -on-start=/usr/bin/get-pxc-state -service=cluster1-pxc
+++ grep wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Synced:wsrep_cluster_status:Primary
+++ sort -r
+++ tail -1
+++ cut -d : -f 2
+++ cut -d . -f 1
++ FIRST_NODE=cluster1-pxc-0
++ SKIP_FIRST_POD='|'
++ ((  3 > 1  ))
++ SKIP_FIRST_POD=cluster1-pxc-0
++ peer-list -on-start=/usr/bin/get-pxc-state -service=cluster1-pxc
++ grep wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Synced:wsrep_cluster_status:Primary
++ grep -v cluster1-pxc-0
++ sort
++ tail -1
++ cut -d : -f 2
++ cut -d . -f 1
+ local NODE_NAME=
+ '[' -z '' ']'
+ peer-list -on-start=/usr/bin/get-pxc-state -service=cluster1-pxc
2021/05/12 07:37:57 Peer finder enter
2021/05/12 07:37:57 Determined Domain to be pxc-new.svc.cluster.local
2021/05/12 07:37:57 Peer list updated
was []
now [cluster1-pxc-0.cluster1-pxc.pxc-new.svc.cluster.local cluster1-pxc-1.cluster1-pxc.pxc-new.svc.cluster.local cluster1-pxc-2.cluster1-pxc.pxc-new.svc.cluster.local]
2021/05/12 07:37:57 execing: /usr/bin/get-pxc-state with stdin: cluster1-pxc-0.cluster1-pxc.pxc-new.svc.cluster.local
cluster1-pxc-1.cluster1-pxc.pxc-new.svc.cluster.local
cluster1-pxc-2.cluster1-pxc.pxc-new.svc.cluster.local
2021/05/12 07:37:57
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
node:cluster1-pxc-0.cluster1-pxc.pxc-new.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Synced:wsrep_cluster_status:Primary:wsrep_cluster_size:3
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
node:cluster1-pxc-1.cluster1-pxc.pxc-new.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Donor/Desynced:wsrep_cluster_status:Primary:wsrep_cluster_size:3
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
node:cluster1-pxc-2.cluster1-pxc.pxc-new.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Donor/Desynced:wsrep_cluster_status:Primary:wsrep_cluster_size:3
2021/05/12 07:37:58 Peer finder exiting
[ERROR] Cannot find node for backup
+ echo '[ERROR] Cannot find node for backup'
+ exit 1

I simply started a backup with Create PerconaXtraDBClusterBackup > YAML:

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterBackup
metadata:
  name: backup1
  namespace: pxc-new
spec:
  pxcCluster: cluster1
  storageName: s3-us-west

I also see the same error with storageName: fs-pvc. The database in the instance is still empty, so I don’t think it should take this long to reach full sync status.
In any case, I can’t find the synced or unsynced status of the cluster anywhere.

Slava_Sarzhan · May 12, 2021, 5:39pm

@mygov As you can see from the log, you have three PXC nodes, and two of them are in the ‘Donor’ state (you have backups in the running state). A donor cannot be chosen for a backup. The first node, pxc-0, is in the ‘Synced’ state, but it is the first ‘primary’ pod, which accepts the writes, so it can’t be used for a backup either. Did you start several backups at the same time?
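
The selection described here appears to correspond to the get_backup_source steps in the trace posted earlier in this thread. Below is a rough reconstruction of those steps as a standalone function, a sketch only: it reads saved "node:..." lines from stdin instead of calling peer-list, the name pick_backup_node is hypothetical, and the real script may differ in details.

```shell
#!/usr/bin/env bash
# Sketch of the node selection seen in the backup pod trace.
# Input: get-pxc-state output on stdin. Output: pod name, or nothing.
pick_backup_node() {
    local states synced size first skip
    states=$(grep '^node:')
    # Only nodes that are ready, connected, Synced and in the Primary component.
    synced=$(grep 'wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Synced:wsrep_cluster_status:Primary' <<<"$states")
    # Field 12 holds the wsrep_cluster_size value in the log format shown above.
    size=$(grep wsrep_cluster_size <<<"$states" | sort | tail -1 | cut -d : -f 12)
    # Lowest-numbered Synced pod; it is skipped when the cluster has more than one node.
    first=$(sort -r <<<"$synced" | tail -1 | cut -d : -f 2 | cut -d . -f 1)
    skip='|'
    if (( ${size:-0} > 1 )); then skip=$first; fi
    grep -v -- "$skip" <<<"$synced" | sort | tail -1 | cut -d : -f 2 | cut -d . -f 1
}
```

With the three node lines from the log (pxc-0 Synced, pxc-1 and pxc-2 Donor/Desynced) the function prints nothing, which matches the empty NODE_NAME in the trace and the resulting "Cannot find node for backup" error.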

mygov · May 12, 2021, 5:57pm

I deleted the manual backup jobs before I started a new one, so I don’t see any active jobs or running pods that would indicate something is still going on. It would be great to see some status somewhere that shows when the cluster is ready to proceed with another backup job.

When I tried to start a backup again after about 10-20 minutes, it ran successfully.

The reason for starting the backup jobs one after the other was an invalid S3 bucket configuration and a server timeout.

Slava_Sarzhan · May 28, 2021, 2:03pm

@mygov Without the logs from all PXC pods and from the backup pods, I cannot say what exactly happened with your backups/cluster.

Oleksandr_Slubskyi · November 23, 2021, 12:41pm

@mygov Try setting ‘endpointUrl’ in the backup storage options. In my case it was empty, and for some reason the backup script used ‘https://s3.amazonaws.com’

INFO: [SST script] + mc -C /tmp/mc config host add dest https://s3.amazonaws.com ACCESS_KEY_ID SECRET_ACCESS_KEY

which is wrong. So I manually set endpointUrl: "https://s3.eu-central-1.amazonaws.com" in the config.
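
For anyone who wants to apply the same change without editing the file by hand, here is a small sketch that builds the regional endpoint and a JSON merge patch for it. The helper names are made up, and the field path spec.backup.storages.<name>.s3.endpointUrl is an assumption based on the storage options mentioned in this thread; check it against your operator version before using it.

```shell
#!/usr/bin/env bash
# Regional S3 endpoint URL for an AWS region such as eu-central-1.
s3_regional_endpoint() {
    printf 'https://s3.%s.amazonaws.com\n' "$1"
}

# Prints a JSON merge patch that sets endpointUrl for one backup storage.
# ASSUMPTION: the option lives at spec.backup.storages.<name>.s3.endpointUrl.
endpoint_patch() {
    local storage=$1 region=$2
    printf '{"spec":{"backup":{"storages":{"%s":{"s3":{"endpointUrl":"%s"}}}}}}\n' \
        "$storage" "$(s3_regional_endpoint "$region")"
}
```

The output could then be passed to something like kubectl patch pxc cluster1 -n pxc-new --type=merge -p "$(endpoint_patch s3-us-west us-west-2)", where the cluster, namespace and storage names come from the YAML earlier in the thread and the region is only an illustrative value.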

ajay · January 27, 2022, 3:16am

Hi, I am having the same issue. I changed the endpointUrl values, but the error is still the same, and the S3 credentials work for other applications’ backup jobs. I am using the latest PXC, 1.10.0.

2022/01/27 03:13:22
node:cluster1-pxc-db-pxc-0.cluster1-pxc-db-pxc.pxo.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Synced:wsrep_cluster_status:Primary:wsrep_cluster_size:3
node:cluster1-pxc-db-pxc-1.cluster1-pxc-db-pxc.pxo.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Donor/Desynced:wsrep_cluster_status:Primary:wsrep_cluster_size:3
node:cluster1-pxc-db-pxc-2.cluster1-pxc-db-pxc.pxo.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Donor/Desynced:wsrep_cluster_status:Primary:wsrep_cluster_size:3
2022/01/27 03:13:23 Peer finder exiting

echo '[ERROR] Cannot find node for backup'
exit 1
[ERROR] Cannot find node for backup

Please help.

Thank you.
