When using PBM sidecars and the operator, which pod attempts backup restores?

Description:

I have a replica set deployed into Kubernetes 1.26 using the operator and the PerconaServerMongoDB CR. The replica set is healthy and the CR is configured to enable backups to S3. Each of the replica set pods is running its backup-agent container without errors. Backups run and are stored in S3 as scheduled.

S3 authentication is via a service account: I have Kyverno automations that attach to the replica set pods a service account linked to an IAM role, and that add the same role to the service account automatically created by the operator (mongodb-operator-psmbd-operator).
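To be concrete, the service account Kyverno attaches looks roughly like this (assuming EKS-style IRSA; the role ARN and account ID below are placeholders, not my real values):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account
  annotations:
    # EKS IRSA annotation associating the service account with an IAM role
    eks.amazonaws.com/role-arn: "arn:aws:iam::111111111111:role/my-mongodb-backup-role"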

When I trigger a backup restore using a PerconaServerMongoDBRestore CR, it fails. I don’t see any new pods being created. If I inspect the logs of the operator pod as well as of the backup-agent container on each of the replica set pods, the only relevant output is from the operator:

set resync backup list from the store: init storage: get S3 object header: Forbidden: Forbidden\n\tstatus code: 403

full log: https://pastebin.com/suBY3a8k

I am completely at a loss as to what exactly is attempting to contact S3. As I said, I’m inspecting the logs of all of the pods, and the only output is from the operator. It feels as if the operator is the one making the request and failing, but as I said, it has the correct service account. I can confirm in CloudTrail that the S3 request is being made authenticated as the instance profile rather than as the role associated with the service account of all of the pods.
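For reference, this is roughly how I’m checking the logs (the operator pod name is a placeholder; the replica set pod names follow the CR below and may differ slightly in reality):

kubectl logs <operator-pod-name>
kubectl logs mongodb-replicaset-my-replicaset-0 -c backup-agent
kubectl logs mongodb-replicaset-my-replicaset-1 -c backup-agent
kubectl logs mongodb-replicaset-my-replicaset-2 -c backup-agent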

So, since I don’t know which pod is attempting to talk to S3, I can’t really debug, let alone fix, this access problem. Can you shed some light here?

Steps to Reproduce:

replicaset:

---
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: mongodb-replicaset
  finalizers:
    - delete-psmdb-pods-in-order

spec:
  image: "percona/percona-server-mongodb:4.2"
  imagePullPolicy: "Always"

  secrets:
    users: admin-users

  updateStrategy: SmartUpdate

  replsets:
    - name: my-replicaset
      size: 3

  backup:
    enabled: true
    image: "percona/percona-backup-mongodb:2.0.4"

    serviceAccountName: my-service-account

    tasks:
      - name: hourly
        enabled: true
        schedule: "3 * * * *"
        storageName: s3-london

    storages:
      s3-london:
        type: s3
        s3:
          region: eu-west-2
          bucket: my-bucket
          prefix: mongo_backups
          endpointUrl: s3.eu-west-2.amazonaws.com

backup restore:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: restore1
spec:
  clusterName: mongodb-replicaset
  backupName: cron-mongodb-replicas-xxxxxxx
  storageName: s3-london
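
The restore is then triggered by applying that CR (the filename is illustrative):

kubectl apply -f restore1.yaml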

Version:

Operator & chart 1.14
Mongo 4.2
PBM 2.0.4

Logs:

https://pastebin.com/suBY3a8k

Expected Result:

Backup restores should work. At the very least, I should get an indication of which pod is trying to perform the restore and failing.

Actual Result:

Backup restore fails.

Additional Information:


Hi @Luis_Pabon !
If I’m not mistaken, logical backups are taken from a secondary pod and the restore is done on the primary pod of the replica set.
Are you sure that your storageName and the storage actually used for the backup are the same, and that your storage credentials are correct?
Also, maybe try removing the storageName, just as a test (see the sketch below).
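For example, something like this, reusing the names from your CRs above with the storageName field simply removed:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: restore1
spec:
  clusterName: mongodb-replicaset
  backupName: cron-mongodb-replicas-xxxxxxx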

Thank you Tomislav. Yes, I’m positive the storage name is right; the two CRs in my example above are pretty much verbatim, with only a few edits to anonymise my customer.

If I open a shell into the backup-agent container on any of the pods, I can successfully use the pbm CLI to list and restore backups, and I can see the restore happening in the primary’s logs.
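For example, a sequence like this works fine (the pod name follows my CR above and the backup name is a placeholder):

kubectl exec -it mongodb-replicaset-my-replicaset-0 -c backup-agent -- bash
pbm list
pbm restore 2023-04-01T03:03:00Z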

It’s only when using the restore CR that it doesn’t work, and it’s not clear which component is trying to contact S3.