Percona MongoDB operator backup failure

Hi all,

I’m using the psmdb-operator and psmdb-db Helm charts from the percona/percona-helm-charts repository on GitHub (percona-helm-charts/charts at main) to deploy a sharded MongoDB cluster.
I started on version 1.11.0 and have since upgraded to 1.12.0, but I can’t get a successful backup on either version.

The backup image is pulled from AWS ECR.
The secret mongodb-backup-s3 is configured with the AWS access key ID and secret key, and the CRD was upgraded using https://github.com/percona/percona-server-mongodb-operator/blob/v1.12.0/deploy/crd.yaml.

The backup section is configured like this:

backup:
  enabled: true
  image:
    repository: xxxxx.dkr.ecr.cn-northwest-1.amazonaws.com.cn/percona/percona-backup-mongodb
    tag: 1.8.1
  serviceAccountName: xxxx-mongodb-operator-psmdb-operator
  resources:
    limits:
      cpu: "300m"
      memory: "1G"
    requests:
      cpu: "100m"
      memory: "128Mi"
  storages:
    s3-cn-northwest:
      type: s3
      s3:
        bucket: mongodb-backup
        credentialsSecret: mongodb-backup-s3
#        endpointUrl: s3.cn-northwest-1.amazonaws.com.cn
        region: cn-northwest-1
        prefix: ""
  pitr:
    enabled: false
    # oplogSpanMin: 10
  tasks:
  - name: daily-s3
    enabled: true
    schedule: "0 0 * * *"
    keep: 30
    storageName: s3-cn-northwest
    compressionType: gzip

The operator creates the cronjob and triggers the psmdb-backup as expected, but the backup object reports little beyond the following:

$ kubectl describe psmdb-backup xxx
Spec:
  Cluster Name:      xxx-mongodb-psmdb-db
  Compression Type:  gzip
  Storage Name:      s3-cn-northwest
Status:
  Azure:
    Credentials Secret:
  Destination:           2022-09-08T02:58:59Z
  Error:                 starting deadline exceeded
  Last Transition:       2022-09-08T02:58:59Z
  Pbm Name:              2022-09-08T02:58:59Z
  s3:
    Bucket:             mongodb-backup
    Credentials Secret:  mongodb-backup-s3
    Endpoint URL:        https://s3.cn-northwest-1.amazonaws.com.cn
    Region:              cn-northwest-1
  Start:                 2022-09-08T02:58:59Z
  State:                 error
  Storage Name:          s3-cn-northwest
Events:                  <none>

The only hint I can see is Error: starting deadline exceeded. What does this mean? I tried setting storages.s3-cn-northwest.s3.endpointUrl explicitly to the AWS China endpoint, but the backup still failed.
Is there anything I can use to troubleshoot this?

Thanks!

Hi @xiaoguang_zhang !
It seems that you are hitting this issue: [K8SPSMDB-660] backup error - starting deadline exceeded - Percona JIRA
Could you please upgrade to 1.13.0 and see if you still get the same error?
1.13.0 was released last week, and I expect the Helm chart for this version to be available in a week or so. If you only use the Helm chart, you may need to wait a few days.
Please let us know whether it solves your issue.
Thanks!

Hi @Tomislav_Plavcic !
Thanks for your reply! I’ve tried the latest chart, and the error still exists.
First I manually upgraded the CRD and the release, but the backup task scheduler failed to update the cronjob. Then I reinstalled both the operator and the db charts, and no cronjob was created at all.
The interesting part is that the operator prints the log below after I modify the tasks section, yet the cronjob is neither created nor updated.

2022-09-28T06:43:45.090Z        INFO    controller_psmdb        deleting outdated backup job    {"name": "daily"}
2022-09-28T06:47:36.276Z        INFO    controller_psmdb        Creating or updating backup job {"name": "daily-s3", "schedule": "0 3 * * *"}

Then I ran an on-demand backup with the manifest below, and it shows the same Error: starting deadline exceeded

cat <<EOF | kubectl apply -f-
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  name: test
spec:
  clusterName: xxx-psmdb-db
  storageName: s3-cn-northwest
EOF
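
After applying a manifest like the one above, the state of the backup object can be polled instead of re-running kubectl describe by hand. The following is only a sketch: the function names and the POLL_SECONDS variable are made up for illustration, the .status.state field is inferred from the "State: error" line in the describe output, and "ready" as the success state is an assumption.

```shell
# Read the state of a psmdb-backup object (the thread shows "error";
# "ready" as the success value is an assumption).
backup_state() {
  kubectl get psmdb-backup "$1" -o jsonpath='{.status.state}'
}

# Poll until the backup reaches a final state, then print that state.
# $1 = backup name, $2 = max attempts (default 30).
# POLL_SECONDS (hypothetical, default 10) is the delay between attempts.
wait_backup() {
  name="$1"
  tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    state="$(backup_state "$name")"
    case "$state" in
      ready|error) echo "$state"; return 0 ;;
    esac
    tries=$((tries - 1))
    sleep "${POLL_SECONDS:-10}"
  done
  echo "timeout"
  return 1
}
```

For the manifest above this would be called as: wait_backup test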

I see that in 1.13.0 the operator itself schedules the backup tasks instead of a k8s cronjob; a psmdb-backup was created at the correct scheduled time yesterday.
So only the timeout error remains. Is there anything I missed, or is there a way to print more verbose logs? I’ve set LOG_LEVEL=DEBUG in the operator’s env, but the logs are still not informative. Thanks!
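
One place that may hold more detail than the operator log is the Percona Backup for MongoDB agent that runs next to each mongod. The sketch below assumes the agent runs as a sidecar container named backup-agent, and the namespace and pod name are hypothetical placeholders; none of this is confirmed in the thread. The grep pattern is only a rough filter, not an exhaustive list of failure messages.

```shell
# Hypothetical names: adjust the namespace and pod to match your release.
NAMESPACE="${NAMESPACE:-default}"
POD="${POD:-xxx-mongodb-psmdb-db-rs0-0}"

# Print recent log lines from the backup agent sidecar of one mongod pod.
# Assumes the sidecar container is named "backup-agent".
agent_logs() {
  kubectl logs -n "$NAMESPACE" "$POD" -c backup-agent --tail=200
}

# Keep only lines that look like storage or credential problems.
agent_errors() {
  agent_logs | grep -iE 'error|denied|forbidden|credential' || true
}
```

Running agent_errors against each replica set and config server pod in turn should show whether the agents can reach the S3 bucket at all.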

New update.

Sorry, I had misconfigured the S3 secret for the backup. Now it’s all good. Thanks!
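
The thread does not say what exactly was wrong with the secret. For comparison, here is a minimal sketch of a secret in the shape the operator's example deploy/backup-s3.yaml uses, with the keys AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The function name and the AWS_ID / AWS_KEY variables are hypothetical, and the values are placeholders. Using stringData avoids hand-encoding the values in base64, which is one possible source of mistakes.

```shell
# Placeholder values: substitute the real access key ID and secret key.
AWS_ID="${AWS_ID:-REPLACE-WITH-ACCESS-KEY-ID}"
AWS_KEY="${AWS_KEY:-REPLACE-WITH-SECRET-ACCESS-KEY}"

# Render the Secret manifest referenced by credentialsSecret.
# stringData lets Kubernetes do the base64 encoding.
render_backup_secret() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: mongodb-backup-s3
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: "${AWS_ID}"
  AWS_SECRET_ACCESS_KEY: "${AWS_KEY}"
EOF
}
```

It could then be applied in the database namespace with: render_backup_secret | kubectl apply -f -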
