Error while creating backup: failed to find CERTIFICATE

While creating a normal logical backup of a 1.3TB replica, the backup’s state changes to error after running for about 16 hours, with the following message:

check for concurrent jobs: getting pbm object: create PBM connection to mongo-rs0-2.mongo-rs0.mongo.svc.cluster.local:27017,mongo-rs0-0.mongo-rs0.mongo.svc.cluster.local:27017,mongo-rs0-1.mong-rs0.mongo.svc.cluster.local:27017: create mongo connection: create mongo client: failed to find CERTIFICATE

The backup job nevertheless keeps running and completes, but I’m unsure whether the backup it creates is valid, and I doubt it’s possible to restore from it without manually changing its status to success.

Operator 1.13.0

+1, same issue, but with a much smaller database (1GB)

create pbm object: create PBM connection to prod-mongodb-rs0-0.prod-mongodb-rs0.mongo.svc.cluster.local:27017,prod-mongodb-rs0-1.prod-mongodb-rs0.mongo.svc.cluster.local:27017,prod-mongodb-rs0-2.prod-mongodb-rs0.mongo.svc.cluster.local:27017: create mongo connection: create mongo client: failed to find CERTIFICATE

PSMDB Operator 1.14.0

If you need any additional info, please ask.

I think it’s possible for it to be related to this, and as @Sergey_Pronin says, the backups are good but report an incorrect failed status. I haven’t tried restoring from those failed ones though.

Hi there,

Same problem here, and in my case it’s not related to this. I think it is a problem with how the certificates are deployed. I have two environments and upgraded the operator from 1.9.0 to 1.14.0. In development it works as expected and I get backups in S3, but in production I get this error. Comparing the two environments, the only difference I’ve seen is that the certificates weren’t upgraded properly in production the way they were in development. I say this because in ArgoCD the objects of each app are different: in development I see two secret objects for certificates, but they are missing in production, so I’m guessing something is going wrong there.

Cheers.

@Semantic @aporrinali is it happening to any backup resource?

Hello @Sergey_Pronin!

What do you mean by a resource in this case?

@Semantic I’m talking about psmdb-backup. In other words, if you try a manual backup by creating a psmdb-backup resource, does it always error out, or is there a chance that it goes through?

Correct, my backups are only done manually at the moment. Some clusters work and some don’t, even though I retried them a couple of times.

The problem persists with operator 1.15.0 and pbm 2.3.0 - at least for manual backups.
The backup container completed the backup, but the backup object is in a failed state.

Trying cron backups next.

Hi all, I also got this error on the psmdb-backup resource, but judging by the mongod backup process, the backup itself seems to be OK. I run operator 1.15 and mongodb backup 2.3.0.

Hi,
Same issue in my case: after two hours the backup was successfully created and saved to S3 storage, but the Backup resource switched to the error state after ~30 minutes.

Operator image: percona/percona-server-mongodb-operator:1.15.0
Backup image: perconalab/percona-server-mongodb-operator:main-backup

Backup resource:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  creationTimestamp: '2024-04-16T00:00:00Z'
  finalizers:
    - delete-backup
  generateName: cron-mongodb-20240416000000-
  generation: 1
  labels:
    ancestor: daily-minio
    cluster: mongodb
    type: cron
  name: cron-mongodb-20240416000000-kt5d7
  namespace: mongodb
  resourceVersion: '500059694'
  uid: ef16e0a0-91c5-4e3e-a363-267446d37328
spec:
  clusterName: mongodb
  compressionLevel: 6
  compressionType: gzip
  storageName: minio
  type: logical
status:
  destination: s3://mongodb-backups/2024-04-16T00:00:21Z
  error: >-
    create pbm object: create PBM connection to
    mongodb-rs-0.mongodb-rs.mongodb.svc.cluster.local:27017,mongodb-rs-1.mongodb-rs.mongodb.svc.cluster.local:27017,mongodb-rs-2.mongodb-rs.mongodb.svc.cluster.local:27017:
    create mongo connection: create mongo client: failed to find CERTIFICATE
  lastTransition: '2024-04-16T00:00:23Z'
  pbmName: '2024-04-16T00:00:21Z'
  pbmPod: mongodb-rs-2
  replsetNames:
    - rs
  s3:
    bucket: mongodb-backups
    credentialsSecret: mongodb-backup-s3
    endpointUrl: https://s3-endpoint:9000
    serverSideEncryption: {}
  start: '2024-04-16T00:00:21Z'
  state: error
  storageName: minio
  type: logical

As mentioned previously, those backups actually succeed but are incorrectly marked with an error state. Since a real solution would involve codebase changes to prevent this certificate error from surfacing in the first place, here’s a not-so-ideal workaround that you can use right now:

You should install the edit-status kube plugin and edit the state of whatever backup you’d like to use.

kubectl edit-status psmdb-backup NAME

Update the status.state field from error to ready, which would allow you to use it for a restore.
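If installing a plugin is not an option, the same status change might be scriptable with plain kubectl. The following is only a sketch and is not verified against this operator: it assumes a kubectl version that supports the `--subresource` flag on `patch` (1.24 or newer) and that the backup CRD exposes a status subresource. The helper name `status_patch_command` is hypothetical; it only builds the command and does not run it.

```python
import json


def status_patch_command(backup_name: str, namespace: str, state: str = "ready") -> list:
    """Build a kubectl command that sets status.state on a psmdb-backup object.

    Assumption: kubectl >= 1.24 (--subresource flag) and a CRD with the
    status subresource enabled. Nothing is executed here.
    """
    # Merge patch that touches only the status.state field.
    patch = json.dumps({"status": {"state": state}})
    return [
        "kubectl", "patch", "psmdb-backup", backup_name,
        "-n", namespace,
        "--subresource=status",
        "--type=merge",
        "-p", patch,
    ]


cmd = status_patch_command("cron-mongodb-20240416000000-kt5d7", "mongodb")
print(" ".join(cmd))
```

To actually apply it, the list could be passed to `subprocess.run(cmd, check=True)`. As with the plugin approach, only do this for backups you have confirmed as completed on the PBM side.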

Hi, since the certificate issue still persists, is there any hope of getting this topic onto the roadmap for one of the upcoming releases?

Bump, is anyone looking into this? Backups are unusable in their current state.

I had to write a Python service that extracts backup statuses directly from the pbm container via the command pbm status -o json and, based on that, updates the states of the Backup resources in Kubernetes. Tricky, but it has been working for half a year already.
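For anyone who wants to try the same approach, here is a minimal sketch of the parsing half only. It is not the service described above. It assumes the JSON from `pbm status -o json` lists snapshots under `backups.snapshot` with `name` and `status` fields, which you should check against the output of your PBM version. The function name `backup_states` and the status mapping are hypothetical.

```python
import json

# Assumed mapping from a PBM snapshot status to the state the Backup
# resource should carry; anything else (e.g. still running) is skipped.
PBM_TO_K8S_STATE = {"done": "ready", "error": "error"}


def backup_states(pbm_status_json: str) -> dict:
    """Map each PBM snapshot name to the desired Backup resource state."""
    doc = json.loads(pbm_status_json)
    # Assumed layout: {"backups": {"snapshot": [{"name": ..., "status": ...}]}}
    snapshots = (doc.get("backups") or {}).get("snapshot") or []
    states = {}
    for snap in snapshots:
        name = snap.get("name")
        status = snap.get("status")
        if name and status in PBM_TO_K8S_STATE:
            states[name] = PBM_TO_K8S_STATE[status]
    return states


# Hand-written sample in the assumed shape, not real pbm output.
sample = json.dumps({"backups": {"snapshot": [
    {"name": "2024-04-16T00:00:21Z", "status": "done"},
    {"name": "2024-04-17T00:00:21Z", "status": "running"},
]}})
print(backup_states(sample))
```

The snapshot name corresponds to the status.pbmName field of the Backup resource shown earlier in the thread, so that field can be used to match PBM snapshots to Kubernetes objects before updating their state.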