Unable to Restore PXC - shutdown pods: exceeded wait limit

Hi,

I tried to restore my PXC with a backup. The cluster I want to restore is the one from which the backup was made.

The running Percona MySQL operator version is 1.11.0.

The PXC was installed with a Helm chart that has the following dependency:

dependencies:
- name: pxc-db 
  version: 1.11.5
  repository: https://percona.github.io/percona-helm-charts

with the following values:

pxc-db:
  finalizers:
    - delete-pxc-pods-in-order
    - delete-proxysql-pvc
    - delete-pxc-pvc
  fullnameOverride: testcluster
  pxc:
    expose:
      enabled: false
    persistence:
      enabled: true
      size: 10Gi
      storageClass: hcloud-volumes
    disableTLS: false
  resources:
    limits:
      memory: 1G
      cpu: 600m
  backup:
    enabled: true
    pitr:
      enabled: true
      storageName: devscr-s3-pitr
      timeBetweenUploads: 60
    storages:
      devscr-s3:
        type: s3
        s3:
          credentialsSecret: s3-backup-creds
          region: ''
          bucket: percona
          endpointUrl: https://$DOMAIN:443
      devscr-s3-pitr:
        type: s3
        s3:
          credentialsSecret: s3-backup-creds
          region: ''
          bucket: percona-pitr
          endpointUrl: https:/$DOMAIN:443
  haproxy:
    enabled: true
    size: 2

The backup was created with the following CR:

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterBackup
metadata:
  name: manual-backup-20221114-1345
spec:
  pxcCluster: testcluster
  storageName: devscr-s3

The backup was created without any problems and is listed in the output of kubectl get pxc-backup:

NAME                                        CLUSTER       STORAGE     DESTINATION                                         STATUS      COMPLETED   AGE
manual-backup-20221110-1550                 testcluster   devscr-s3   s3://percona/testcluster-2022-11-10-14:50:01-full   Succeeded   4d1h        4d1h
manual-backup-20221114-1345                 testcluster   devscr-s3   s3://percona/testcluster-2022-11-14-12:44:36-full   Succeeded   4h3m        4h3m

My next step was triggering the restore with the following:

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: rollback-cluster-to-specific-timestamp
spec:
  pxcCluster: testcluster
  backupName: manual-backup-20221114-1345 

So I referenced the backup created a few minutes earlier.

The object was created successfully, but from that point on I see at least two kinds of problems:
1st: sometimes I see the same behavior as reported in "Unable to Restore PXC": the restore process hangs and no status is reported. After killing the operator pod, the status changed to ‘stopping cluster’.

2nd: after the status changed to ‘stopping cluster’, almost nothing happened. There is no message in the operator log that hints at the problem. The only information I got is the status in the pxc-restore object:

status:
  comments: 'stop cluster testcluster: shutdown pods: exceeded wait limit'
  state: Failed
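
The same two fields can be read directly with a jsonpath query instead of dumping the whole object. This is a minimal sketch, assuming the restore object name from above; the function name restore_status and the KUBECTL override variable are illustrative and not part of the operator:

```shell
# Print "<state><TAB><comments>" for one restore object.
# KUBECTL may be set to another command name; it defaults to kubectl.
restore_status() {
  if [ -z "$1" ]; then
    echo "usage: restore_status <pxc-restore-name>" >&2
    return 2
  fi
  # .status.state and .status.comments are the fields shown in the status above.
  "${KUBECTL:-kubectl}" get pxc-restore "$1" \
    -o jsonpath='{.status.state}{"\t"}{.status.comments}{"\n"}'
}
```

Calling restore_status rollback-cluster-to-specific-timestamp repeatedly during the restore shows when the state changes.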

I have attached the operator log for reference.
percona_mysql_op.log (9.7 KB)

My questions:
What can I do to get more information for debugging?
Can I enable debug log in operator pod?
Is there a mistake in my configuration?
What can cause this behavior?

Thanks in advance for any additional information.

Kind regards

fgo


Hey @fgo,

Just to be clear: I cannot reproduce it.

Can you please share anything specific about your setup? Storage type, size, etc.?

Also, if you can, please show the output of the following while this is happening:

kubectl get pxc-restore -o yaml
kubectl get pods
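
To capture both outputs at the moment the restore is stuck, something like the following could be used. It is only a sketch: the function name collect_restore_diag, the default directory name, and the KUBECTL override variable are made up for illustration, and it runs exactly the two commands above against the current namespace:

```shell
# Save the two requested outputs to files so they can be attached to a reply.
# KUBECTL may be set to another command name; it defaults to kubectl.
collect_restore_diag() {
  out_dir="${1:-pxc-restore-diag}"
  kubectl_bin="${KUBECTL:-kubectl}"
  mkdir -p "$out_dir" || return 1
  # Full restore objects, including their status section.
  "$kubectl_bin" get pxc-restore -o yaml > "$out_dir/pxc-restore.yaml" || return 1
  # Pod list at the same moment, to see which pods are still running.
  "$kubectl_bin" get pods > "$out_dir/pods.txt" || return 1
}
```

Running collect_restore_diag ./diag while the restore reports ‘stopping cluster’ leaves pxc-restore.yaml and pods.txt in ./diag.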
