Error trying to create a backup

Hi guys!
We are configuring a backup policy in our cluster. We’ve defined two storage types (s3 and pvc). When we run an on-demand backup, we get this error in the pod logs:

+ peer-list -on-start=/usr/bin/get-pxc-state -service=cluster1-pxc
2020/09/10 14:37:27 Peer finder enter
2020/09/10 14:37:27 Determined Domain to be pxc.svc.cluster.local
2020/09/10 14:37:27 Peer list updated
was []
now [cluster1-pxc-0.cluster1-pxc.pxc.svc.cluster.local cluster1-pxc-1.cluster1-pxc.pxc.svc.cluster.local cluster1-pxc-2.cluster1-pxc.pxc.svc.cluster.local]
2020/09/10 14:37:27 execing: /usr/bin/get-pxc-state with stdin: cluster1-pxc-0.cluster1-pxc.pxc.svc.cluster.local
cluster1-pxc-1.cluster1-pxc.pxc.svc.cluster.local
cluster1-pxc-2.cluster1-pxc.pxc.svc.cluster.local
2020/09/10 14:37:27 
node:cluster1-pxc-0.cluster1-pxc.pxc.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Donor/Desynced:wsrep_cluster_status:Primary
node:cluster1-pxc-1.cluster1-pxc.pxc.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Donor/Desynced:wsrep_cluster_status:Primary
node:cluster1-pxc-2.cluster1-pxc.pxc.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Donor/Desynced:wsrep_cluster_status:Primary
2020/09/10 14:37:28 Peer finder exiting
[ERROR] Cannot find node for backup
+ echo '[ERROR] Cannot find node for backup'
+ exit 1

Do you have any clue about what is happening?
Many thanks in advance!
Marc

It seems ALL your nodes are in the “Donor/Desynced” state, which is not normal, but it is impossible to say from this log alone why it happened.

For a backup to start, it needs to find a node in the “Synced” state.
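
To see at a glance which pods are in which state, the `node:` lines printed by get-pxc-state can be summarized per pod. This is a minimal sketch, assuming the colon-separated layout shown in the log above; the function names `summarize_states` and `has_synced_node` are hypothetical and not part of the Operator.

```shell
#!/usr/bin/env bash
# Sketch only: summarize per-pod Galera state from `node:` lines on stdin.
# Assumed layout (copied from the log above), colon-separated:
#   node:<fqdn>:wsrep_ready:<v>:wsrep_connected:<v>:wsrep_local_state_comment:<state>:...
# Function names are hypothetical; they are not part of the Operator.

# Print "<pod> <state>" for every node line; other log lines are ignored.
summarize_states() {
  awk -F: '/^node:/ { split($2, fqdn, "."); print fqdn[1], $8 }'
}

# Succeed only if at least one pod reports the Synced state.
has_synced_node() {
  summarize_states | grep -q ' Synced$'
}

# Example: the three node lines from the first log in this thread.
domain='cluster1-pxc.pxc.svc.cluster.local'
flags='wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Donor/Desynced:wsrep_cluster_status:Primary'
for i in 0 1 2; do
  printf 'node:cluster1-pxc-%s.%s:%s\n' "$i" "$domain" "$flags"
done | summarize_states
```

With the first log in this thread, every pod is listed as Donor/Desynced, so `has_synced_node` would fail, which matches the "Cannot find node for backup" error.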

I’m also getting the same issue with the latest Operator version, v1.8.0.

+ GARBD_OPTS=
+ check_ssl
+ CA=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
+ '[' -f /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt ']'
+ CA=/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
+ SSL_DIR=/etc/mysql/ssl
+ '[' -f /etc/mysql/ssl/ca.crt ']'
+ CA=/etc/mysql/ssl/ca.crt
+ SSL_INTERNAL_DIR=/etc/mysql/ssl-internal
+ '[' -f /etc/mysql/ssl-internal/ca.crt ']'
+ CA=/etc/mysql/ssl-internal/ca.crt
+ KEY=/etc/mysql/ssl/tls.key
+ CERT=/etc/mysql/ssl/tls.crt
+ '[' -f /etc/mysql/ssl-internal/tls.key -a -f /etc/mysql/ssl-internal/tls.crt ']'
+ KEY=/etc/mysql/ssl-internal/tls.key
+ CERT=/etc/mysql/ssl-internal/tls.crt
+ '[' -f /etc/mysql/ssl-internal/ca.crt -a -f /etc/mysql/ssl-internal/tls.key -a -f /etc/mysql/ssl-internal/tls.crt ']'
+ GARBD_OPTS='socket.ssl_ca=/etc/mysql/ssl-internal/ca.crt;socket.ssl_cert=/etc/mysql/ssl-internal/tls.crt;socket.ssl_key=/etc/mysql/ssl-internal/tls.key;socket.ssl_cipher=;pc.weight=0;'
+ request_streaming
++ hostname -i
+ local LOCAL_IP=10.225.0.79
++ get_backup_source
+++ peer-list -on-start=/usr/bin/get-pxc-state -service=cluster1-pxc
+++ grep wsrep_cluster_size
+++ sort
+++ tail -1
+++ cut -d : -f 12
++ CLUSTER_SIZE=3
+++ peer-list -on-start=/usr/bin/get-pxc-state -service=cluster1-pxc
+++ grep wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Synced:wsrep_cluster_status:Primary
+++ sort -r
+++ tail -1
+++ cut -d : -f 2
+++ cut -d . -f 1
++ FIRST_NODE=cluster1-pxc-0
++ SKIP_FIRST_POD='|'
++ ((  3 > 1  ))
++ SKIP_FIRST_POD=cluster1-pxc-0
++ peer-list -on-start=/usr/bin/get-pxc-state -service=cluster1-pxc
++ grep wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Synced:wsrep_cluster_status:Primary
++ grep -v cluster1-pxc-0
++ sort
++ tail -1
++ cut -d : -f 2
++ cut -d . -f 1
+ local NODE_NAME=
+ '[' -z '' ']'
+ peer-list -on-start=/usr/bin/get-pxc-state -service=cluster1-pxc
2021/05/12 07:37:57 Peer finder enter
2021/05/12 07:37:57 Determined Domain to be pxc-new.svc.cluster.local
2021/05/12 07:37:57 Peer list updated
was []
now [cluster1-pxc-0.cluster1-pxc.pxc-new.svc.cluster.local cluster1-pxc-1.cluster1-pxc.pxc-new.svc.cluster.local cluster1-pxc-2.cluster1-pxc.pxc-new.svc.cluster.local]
2021/05/12 07:37:57 execing: /usr/bin/get-pxc-state with stdin: cluster1-pxc-0.cluster1-pxc.pxc-new.svc.cluster.local
cluster1-pxc-1.cluster1-pxc.pxc-new.svc.cluster.local
cluster1-pxc-2.cluster1-pxc.pxc-new.svc.cluster.local
2021/05/12 07:37:57
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
node:cluster1-pxc-0.cluster1-pxc.pxc-new.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Synced:wsrep_cluster_status:Primary:wsrep_cluster_size:3
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
node:cluster1-pxc-1.cluster1-pxc.pxc-new.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Donor/Desynced:wsrep_cluster_status:Primary:wsrep_cluster_size:3
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
cat: /etc/mysql/mysql-users-secret/xtrabackup: No such file or directory
node:cluster1-pxc-2.cluster1-pxc.pxc-new.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Donor/Desynced:wsrep_cluster_status:Primary:wsrep_cluster_size:3
2021/05/12 07:37:58 Peer finder exiting
[ERROR] Cannot find node for backup
+ echo '[ERROR] Cannot find node for backup'
+ exit 1

I simply started a backup via Create PerconaXtraDBClusterBackup > YAML:

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterBackup
metadata:
  name: backup1
  namespace: pxc-new
spec:
  pxcCluster: cluster1
  storageName: s3-us-west

I see the same error with storageName: fs-pvc as well. The database in the instance is still empty, so I don’t think it should take this long to reach full sync status.
But anyway, I can’t see the synced or unsynced cluster status anywhere.

@mygov As you can see from the log, you have three PXC nodes and two of them are in the ‘Donor’ state (you have backups in the running state). A donor cannot be chosen for a backup. The first node, pxc-0, is in the ‘Synced’ state, but it is the first ‘primary’ pod, which accepts the writes, so it can’t be used for a backup either. Did you start several backups at the same time?
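
The selection steps visible in the trace above can be sketched as follows. This is a reconstruction from the `set -x` output only, not the Operator's actual script. The names `pick_backup_node` and `state_line` are hypothetical, and the function reads the `node:` lines from stdin instead of calling peer-list so that it can run without a cluster.

```shell
#!/usr/bin/env bash
# Sketch only: node selection as it appears in the xtrace output above.
# Reads `node:` state lines on stdin; prints the chosen pod name, or nothing.
pick_backup_node() {
  local states synced cluster_size first_node skip_first_pod
  states="$(cat)"
  synced='wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Synced:wsrep_cluster_status:Primary'

  # Reported cluster size is field 12 of a node line.
  cluster_size="$(printf '%s\n' "$states" | grep wsrep_cluster_size | sort | tail -1 | cut -d : -f 12)"

  # First Synced pod by name; skipped when the cluster has more than one node.
  first_node="$(printf '%s\n' "$states" | grep "$synced" | sort -r | tail -1 | cut -d : -f 2 | cut -d . -f 1)"
  skip_first_pod='|'
  if (( ${cluster_size:-0} > 1 )); then
    skip_first_pod="$first_node"
  fi

  # Last remaining Synced pod; Donor/Desynced pods never match the pattern.
  printf '%s\n' "$states" | grep "$synced" | grep -v "$skip_first_pod" | sort | tail -1 | cut -d : -f 2 | cut -d . -f 1
}

# Hypothetical helper that builds one state line in the format of the log above.
state_line() {
  printf 'node:cluster1-pxc-%s.cluster1-pxc.pxc-new.svc.cluster.local:wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:%s:wsrep_cluster_status:Primary:wsrep_cluster_size:3\n' "$1" "$2"
}

# Example: the situation from the log (pxc-0 Synced, pxc-1 and pxc-2 donors).
node="$( { state_line 0 Synced; state_line 1 Donor/Desynced; state_line 2 Donor/Desynced; } | pick_backup_node)"
if [ -z "$node" ]; then
  echo '[ERROR] Cannot find node for backup'
fi
```

In this sketch the only Synced pod is also the one that gets skipped, so nothing is left to choose from until one of the donors returns to Synced.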

I deleted the manual backup jobs before I started a new one, so I don’t see any active jobs or running pods that would indicate something is still going on. It would be great to see some status somewhere showing when the cluster is ready to proceed with another backup job.

When I tried to start a backup again about 10-20 minutes later, it ran successfully.

The reason for starting the backup jobs one after another was an invalid s3 bucket configuration and a server timeout.

@mygov Without the logs from all PXC pods and from the backup pods, I cannot say what exactly happened with your backups/cluster.

@mygov Try setting ‘endpointUrl’ in the backup storage options. In my case it was empty, and for some reason the backup script used ‘https://s3.amazonaws.com’:

INFO: [SST script] + mc -C /tmp/mc config host add dest https://s3.amazonaws.com ACCESS_KEY_ID SECRET_ACCESS_KEY

which is wrong. So I manually set endpointUrl: "https://s3.eu-central-1.amazonaws.com" in the config.
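
To check which endpoint a backup actually used, the URL can be pulled out of the `mc ... config host add dest` log line and compared with the regional one. This is a rough sketch: both function names are hypothetical, and the URL pattern assumes a bucket in a standard AWS region (other S3-compatible services use different endpoints).

```shell
#!/usr/bin/env bash
# Sketch only: compare the endpoint seen in the backup log with the regional one.
# Both function names are hypothetical helpers, not part of the Operator.

# Print the URL that follows "dest" in an `mc ... config host add dest <url> ...` line.
endpoint_from_log() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dest") { print $(i + 1); exit } }'
}

# Regional S3 endpoint for a standard AWS region, e.g. eu-central-1.
s3_regional_endpoint() {
  printf 'https://s3.%s.amazonaws.com\n' "$1"
}

# Example with the log line quoted above and the region from this post.
log_line='INFO: [SST script] + mc -C /tmp/mc config host add dest https://s3.amazonaws.com ACCESS_KEY_ID SECRET_ACCESS_KEY'
used="$(printf '%s\n' "$log_line" | endpoint_from_log)"
wanted="$(s3_regional_endpoint eu-central-1)"
if [ "$used" != "$wanted" ]; then
  echo "backup used $used, expected $wanted"
fi
```

If the two differ, the value printed as expected is what to put in endpointUrl for that storage.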
