Error while creating backup: failed to find CERTIFICATE

Semantic · June 26, 2023, 2:10am

While creating a normal logical backup of a 1.3TB replica, its state changes to errored after running for about 16 hours, with the following error:

check for concurrent jobs: getting pbm object: create PBM connection to mongo-rs0-2.mongo-rs0.mongo.svc.cluster.local:27017,mongo-rs0-0.mongo-rs0.mongo.svc.cluster.local:27017,mongo-rs0-1.mong-rs0.mongo.svc.cluster.local:27017: create mongo connection: create mongo client: failed to find CERTIFICATE

The backup job still progresses and completes, but I’m unsure whether the backup it creates is valid, and I doubt it’s possible to restore from it without manually changing its status to success.

Operator 1.13.0

aporrinali · July 13, 2023, 10:34am

+1, same issue, but with a much smaller database (1GB)

create pbm object: create PBM connection to prod-mongodb-rs0-0.prod-mongodb-rs0.mongo.svc.cluster.local:27017,prod-mongodb-rs0-1.prod-mongodb-rs0.mongo.svc.cluster.local:27017,prod-mongodb-rs0-2.prod-mongodb-rs0.mongo.svc.cluster.local:27017: create mongo connection: create mongo client: failed to find CERTIFICATE

PSMDB Operator 1.14.0

If you need any additional info, please ask.

Semantic · July 14, 2023, 1:46am

I think it may be related to this, and as @Sergey_Pronin says, the backups are good but report an incorrect failed status. I haven’t tried restoring from those failed ones, though.

ki0 · July 26, 2023, 11:10am

Hi there,

Same problem here, and in my case it’s not related to this. I think it is a problem with how the certificates are deployed. I have two environments and upgraded the operator from 1.9.0 to 1.14.0. In development it works as expected and I get backups in S3, but in production I get this error. Comparing both environments, the only difference I’ve seen is that the certificates weren’t upgraded in production the way they were in development: in ArgoCD the objects of each app are different. In development I see two secret objects for certificates, but they don’t exist in production, so I’m guessing something is going wrong there.

Cheers.

Sergey_Pronin · August 3, 2023, 8:28am

@Semantic @aporrinali is it happening to any backup resource?

Semantic · August 5, 2023, 4:31pm

Hello @Sergey_Pronin!

What do you mean by a resource in this case?

Sergey_Pronin · August 7, 2023, 7:25am

@Semantic I’m talking about psmdb-backup. In other words: if you try a manual backup by creating a psmdb-backup resource, does it always error out, or is there a chance that it goes through?

Semantic · August 7, 2023, 5:53pm

Correct, my backups are only done manually at the moment. Some clusters happen to work and some don’t, even though I retried them a couple of times.

Nils_Dehn · November 2, 2023, 9:35pm

The problem persists with operator 1.15.0 and pbm 2.3.0 - at least for manual backups.
The backup container completed the backup, but the backup object is in a failed state.

Trying cron backups next.

Claudio_Bisegni · November 9, 2023, 5:10pm

Hi all, I also got this error on the PSMDB-backup resource, but judging by the mongod backup process, the backup itself seems to be OK. I’m running operator 1.15 and MongoDB backup 2.3.0.

m.vasilenko · April 16, 2024, 6:34am

Hi,
Same issue in my case: after two hours the backup was successfully created and saved to S3 storage, but the Backup resource was switched to the error state after ~30 minutes.

Operator image: percona/percona-server-mongodb-operator:1.15.0
Backup image: perconalab/percona-server-mongodb-operator:main-backup

Backup resource:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  creationTimestamp: '2024-04-16T00:00:00Z'
  finalizers:
    - delete-backup
  generateName: cron-mongodb-20240416000000-
  generation: 1
  labels:
    ancestor: daily-minio
    cluster: mongodb
    type: cron
  name: cron-mongodb-20240416000000-kt5d7
  namespace: mongodb
  resourceVersion: '500059694'
  uid: ef16e0a0-91c5-4e3e-a363-267446d37328
spec:
  clusterName: mongodb
  compressionLevel: 6
  compressionType: gzip
  storageName: minio
  type: logical
status:
  destination: s3://mongodb-backups/2024-04-16T00:00:21Z
  error: >-
    create pbm object: create PBM connection to
    mongodb-rs-0.mongodb-rs.mongodb.svc.cluster.local:27017,mongodb-rs-1.mongodb-rs.mongodb.svc.cluster.local:27017,mongodb-rs-2.mongodb-rs.mongodb.svc.cluster.local:27017:
    create mongo connection: create mongo client: failed to find CERTIFICATE
  lastTransition: '2024-04-16T00:00:23Z'
  pbmName: '2024-04-16T00:00:21Z'
  pbmPod: mongodb-rs-2
  replsetNames:
    - rs
  s3:
    bucket: mongodb-backups
    credentialsSecret: mongodb-backup-s3
    endpointUrl: https://s3-endpoint:9000
    serverSideEncryption: {}
  start: '2024-04-16T00:00:21Z'
  state: error
  storageName: minio
  type: logical

Semantic · May 20, 2024, 5:17pm

As mentioned previously, those backups actually succeed but are marked with a false-positive error state. I believe a real solution would involve codebase changes to prevent this certificate error from surfacing in the first place, so here’s a not-so-ideal one that you can use right now:

You should install the edit-status kube plugin and edit the state of whatever backup you’d like to use.

kubectl edit-status psmdb-backup NAME

Update the status.state field from error to ready, which would allow you to use it for a restore.
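For anyone who would rather script this than edit interactively, here is a minimal Python sketch of the same status change. It is only an illustration under assumptions that are not confirmed in this thread: that your kubectl supports the `--subresource` flag on `patch`, and that the psmdb-backup CRD exposes a status subresource. The function names are hypothetical.

```python
import json
import subprocess


def build_status_patch_cmd(backup_name, namespace, new_state="ready"):
    """Return the kubectl argv that merge-patches status.state on a psmdb-backup."""
    # Same change as the manual edit: set status.state (error -> ready).
    patch = json.dumps({"status": {"state": new_state}})
    return [
        "kubectl", "patch", "psmdb-backup", backup_name,
        "--namespace", namespace,
        # Assumption: kubectl is recent enough to support --subresource
        # and the CRD has a status subresource.
        "--subresource=status",
        "--type=merge",
        "--patch", patch,
    ]


def mark_backup_ready(backup_name, namespace):
    """Run the patch; raises CalledProcessError if kubectl fails."""
    subprocess.run(build_status_patch_cmd(backup_name, namespace), check=True)
```

Calling `mark_backup_ready("NAME", "NAMESPACE")` only changes what the Kubernetes object reports. It does not verify that the backup data in storage is complete, so check that first.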

Bartosz_Dabrowski · June 27, 2024, 6:09am

Hi, as the certificate issue is still persistent, are there any hopes of having this topic on the roadmap for any of the upcoming releases?

Bartosz_Dabrowski · November 28, 2024, 8:29am

Bump. Is anyone looking into this? Backups are unusable in the current state.

m.vasilenko · November 28, 2024, 8:49am

I had to write a Python service that extracts backup statuses directly from the pbm container via the command pbm status -o json and, based on that, updates the states of the Backup resources in Kubernetes. Tricky, but it has been working for half a year already.
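A rough sketch of what the core of such a service could look like. This is not the actual service described above, only an illustration. It assumes the JSON from pbm status -o json lists snapshots under `backups.snapshot` with `name` and `status` fields, that a PBM status of `done` corresponds to the `ready` state, and that the PBM container is named `backup-agent`. All function names are hypothetical.

```python
import json
import subprocess

# Assumed mapping from PBM snapshot status to the operator's backup state.
PBM_TO_K8S_STATE = {"done": "ready", "error": "error"}


def snapshot_states(pbm_status):
    """Map each PBM snapshot name to a Kubernetes backup state.

    Snapshots whose PBM status is not in the mapping (for example ones
    still in progress) are skipped.
    """
    # Assumed layout: {"backups": {"snapshot": [{"name": ..., "status": ...}]}}
    snapshots = pbm_status.get("backups", {}).get("snapshot") or []
    states = {}
    for snap in snapshots:
        state = PBM_TO_K8S_STATE.get(snap.get("status"))
        if state is not None:
            states[snap["name"]] = state
    return states


def read_pbm_status(pod, namespace, container="backup-agent"):
    """Run `pbm status -o json` inside the pod and return the parsed JSON."""
    # Assumption: the PBM container is called "backup-agent".
    result = subprocess.run(
        ["kubectl", "exec", pod, "--namespace", namespace,
         "--container", container, "--", "pbm", "status", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

The snapshot names returned by `snapshot_states` can then be matched against the status.pbmName field of each Backup resource (for example '2024-04-16T00:00:21Z' in the resource posted earlier) to decide which resources need their state corrected.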

radoslaw.szulgo · January 20, 2025, 9:06am

Hey all,
thank you for your patience. I’ve submitted a Jira ticket for that issue:
https://perconadev.atlassian.net/browse/K8SPSMDB-1263

Hope our team can address that in the next version.

Adhiraj_Singh · January 22, 2025, 10:45am

Got the same issue, preventing us from rolling Percona MongoDB into production. Is there a temporary workaround? (Using the latest 1.19.0 operator)

This doesn’t seem to work, because the backup metadata isn’t uploaded to S3

radoslaw.szulgo · January 29, 2025, 2:37pm

We plan to fix it in version 1.20 of the operator.

Slava_Sarzhan · February 27, 2025, 8:56pm

@Adhiraj_Singh @Semantic @m.vasilenko @Bartosz_Dabrowski @Nils_Dehn @ki0

We are working on fixing this issue: PR. The main challenge is that we haven’t been able to reproduce it on our end.

Could you help us test our fix? You can use the main branch for testing, but please avoid using it for production traffic, only for testing purposes.

Let us know if you encounter any issues or need assistance. Thanks!
