Error in backup-agent container: failed to refresh cached credentials

Hello,
We have a problem making backups in our MongoDB cluster.
We updated the operator from version 1.15 to 1.21.1 and now we are getting an error every 5 seconds in the backup-agent container (Percona Server for MongoDB 8.0.12-4).
We are using Percona Backup for MongoDB 2.11.0 and run our MongoDB cluster in Kubernetes.

This is the error that we get:

2025-11-05T10:02:44.000+0000 E [agentCheckup] check storage connection: storage check failed with: file stat: list with parts for mw file stat op: fetching pbm file parts base: get S3 object header: operation error S3: HeadObject, get identity: get credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded

Does anybody have an idea what the problem could be?
Let me know if I forgot any information about the problem.
I added the backup section of our cr.yaml and the externalsecrets.yaml below.

---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: **********************   
spec:
  refreshInterval: 60s
  secretStoreRef:
    kind: SecretStore
    name: mongodb
  target:
    name: ********************** 
  data:
  - secretKey: AWS_ACCESS_KEY_ID
    remoteRef:
      key: ********************** 
      property: **********************
  - secretKey: AWS_ACCESS_SECRET_KEY
    remoteRef:
      key: ********************** 
      property: **********************

And this is the backup section from our cr.yaml:

    storages:
      s3-***:                                                              
        type: s3
        s3:
          bucket: ****                                                  
          region: ****                                                 
          prefix: ****/****/****   
          credentialsSecret: **********************              
          endpointUrl: **********************                 
          insecureSkipTLSVerify: true
    pitr:
      enabled: true
      oplogOnly: false
      compressionType: gzip
      compressionLevel: 1
    tasks:                                                                 
    - name: daily-backup-mon-sat
      enabled: true
      schedule: "0 22 * * 1-6"
      retention:
        count: 12
        type: count
        deleteFromStorage: true
      storageName: ****
      compressionType: gzip
      compressionLevel: 1
    - name: weekly-backup-sun
      enabled: true
      schedule: "0 22 * * 0"
      retention:
        count: 3
        type: count
      storageName: ****
      compressionType: gzip
      compressionLevel: 1


Hi, does anyone have an idea what the problem could be? :slightly_smiling_face:

We are encountering the same issue. Is this a bug in the new operator version 1.21.1, or a configuration error on our side?

Other backup tools have no problem accessing our on-premises S3 storage. We also didn’t have such errors when we were running MongoDB operator v1.15.

Hey folks, may I ask if you are using actual Amazon S3 or an S3-compatible storage?

Hey Ege_Gunes, thanks for reaching out!

I use S3-compatible storage (an on-premises NetApp storage system), not actual Amazon S3.

I asked because PBM v2.12.0 added a new storage type called minio to work better with S3-compatible storages: Percona Backup for MongoDB 2.12.0 (2025-11-04) - Percona Backup for MongoDB

Unfortunately Operator v1.21.1 doesn’t support PBM v2.12.0.

If you have a test environment, you can try downgrading PBM to v2.9.1 instead to see if it solves the issue. Since we didn’t test this downgrade, I don’t recommend doing it on a production cluster.
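
In the cr.yaml, pinning the PBM image would look roughly like this (a minimal sketch; the tag is just an example and the rest of your backup section stays as it is):

spec:
  backup:
    enabled: true
    # pin the PBM image explicitly instead of using the default one shipped with the operator
    image: percona/percona-backup-mongodb:2.9.1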

Thanks for your reply! My colleague is back from holiday next week; we will then test and investigate this further.

Thanks for your suggestion Ege_Gunes! Downgrading from PBM 2.11.0 to PBM 2.9.1 did in fact resolve this issue! We can now back up our MongoDB data as usual, just like before :slight_smile:

We will await the release of the new MongoDB operator version with PBM 2.12.0. We will then test with the new minio options for configuring the backups.

Thank you for this solution, it worked for us as well. We will wait for the new operator version and then test our backup configuration again.

Hi all,

I wanted to share a related finding that may help others troubleshooting backup-agent credential issues after rotating backup storage credentials.

In our case, we are using:

  • psmdb-db Helm chart: 1.21.2
  • psmdb-operator Helm chart: 1.21.3
  • PBM image: percona/percona-backup-mongodb:2.11.0
  • Backup storage: GCS
  • Credentials provided via Kubernetes Secret using:
    • GCS_CLIENT_EMAIL
    • GCS_PRIVATE_KEY
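
For context, the credentials Secret we reference from the backup storage configuration looks roughly like this (the Secret name and all values are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: gcs-backup-credentials   # placeholder name
type: Opaque
stringData:
  GCS_CLIENT_EMAIL: backup-sa@<project>.iam.gserviceaccount.com
  GCS_PRIVATE_KEY: |
    -----BEGIN PRIVATE KEY-----
    ...
    -----END PRIVATE KEY-----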

The setup had been working fine and backups were succeeding daily. The issue started after rotating the Google service account key used for GCS backups. We updated the Kubernetes Secret with the new private key and deleted the old key in GCP. After that, all PBM agents started failing with:

2026-05-12T08:19:59.000+0000 E [agentCheckup] check storage connection: storage check failed with: file stat: list with parts for mw file stat op: fetching pbm file parts base: get properties: Get "https://storage.googleapis.com/storage/v1/b/<gcs-backup-bucket>/o/mongodb%2F.pbm.init?alt=json&prettyPrint=false&projection=full": auth: cannot fetch token: 400
Response: {"error":"invalid_grant","error_description":"Invalid JWT Signature."}

At first we validated the Kubernetes Secret and confirmed the new private key was syntactically valid using openssl. We also restarted the MongoDB StatefulSet, but the backup-agent continued to fail.

The important finding was that PBM was not actually using the new key from the Kubernetes Secret. The Kubernetes Secret had the updated key, but the active PBM configuration stored in MongoDB still contained the old private key.

We confirmed this by checking the PBM config document in MongoDB. In our environment, the config was stored in:

admin.pbmConfig

and the relevant path was:

storage.gcs.credentials.privateKey

The PBM config fingerprint matched the old key, while the Kubernetes Secret matched the new key. That explained why Google kept returning “Invalid JWT Signature”: PBM was still signing JWTs using the old private key, which had already been deleted in GCP.

The temporary fix was to explicitly re-apply the PBM storage config using pbm config --file, generating the config from the current Kubernetes Secret values. After doing that, pbm status recovered and the backup-agent stopped failing.
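
The config file we applied looked roughly like this, generated from the current Secret values (bucket and prefix are placeholders matching the error above; the privateKey path is the one quoted earlier, while the clientEmail field name is our assumption):

storage:
  type: gcs
  gcs:
    bucket: <gcs-backup-bucket>
    prefix: mongodb
    credentials:
      # values taken from the updated Kubernetes Secret
      clientEmail: backup-sa@<project>.iam.gserviceaccount.com
      privateKey: |
        -----BEGIN PRIVATE KEY-----
        ...
        -----END PRIVATE KEY-----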

This is a temporary/manual workaround, but it helped us restore backups without changing the PBM image version.