When using pbm sidecars and operator, what pod attempts backup restores?

Description:

I have a replicaset deployed into Kubernetes 1.26 using the operator and the PerconaServerMongoDB CR. The replicaset is healthy and the CR is configured to enable backups into S3. Each of the replicaset pods runs its backup-agent container without errors, and backups run and are stored in S3 on schedule.

S3 authentication is via a service account: I have Kyverno automations that associate the replicaset pods with a service account linked to an IAM role, and that add this same role to the service account automatically created by the operator (mongodb-operator-psmbd-operator).
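
For reference, the service account the pods get linked to looks roughly like this (the name and role ARN below are placeholders for the anonymised values in my cluster):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account
  annotations:
    # IRSA annotation: pods running under this service account should receive
    # web-identity credentials for this role instead of the instance profile
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-mongodb-backup-role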

When I trigger a backup restore using a PerconaServerMongoDBRestore CR, it fails. I don't see any new pods being created. If I inspect the logs of the operator pod as well as the backup-agent containers on each of the replicaset pods, the only relevant output is from the operator:

set resync backup list from the store: init storage: get S3 object header: Forbidden: Forbidden\n\tstatus code: 403

full log: https://pastebin.com/suBY3a8k

I am completely at a loss as to what exactly is attempting to contact S3 - as I said, I'm inspecting the logs of all of the pods and the only output is that of the operator. It feels as if it's the operator doing so and failing, but as I said it has the correct service account. I can confirm in CloudTrail that the S3 request is being made authenticated as the instance profile rather than the role associated with the service account of all of the pods.

So currently, since I don't know which pod is trying to talk to S3, I can't really debug, let alone fix, this access problem. Can you shed some light here?

Steps to Reproduce:

replicaset:

---
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: mongodb-replicaset
  finalizers:
    - delete-psmdb-pods-in-order

spec:
  image: "percona/percona-server-mongodb:4.2"
  imagePullPolicy: "Always"

  secrets:
    users: admin-users

  updateStrategy: SmartUpdate

  replsets:
    - name: my-replicaset
      size: 3

  backup:
    enabled: true
    image: "percona/percona-backup-mongodb:2.0.4"

    serviceAccountName: my-service-account

    tasks:
      - name: hourly
        enabled: true
        schedule: "3 * * * *"
        storageName: s3-london

    storages:
      s3-london:
        type: s3
        s3:
          region: eu-west-2
          bucket: my-bucket
          prefix: mongo_backups
          endpointUrl: s3.eu-west-2.amazonaws.com

backup restore:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: restore1
spec:
  clusterName: mongodb-replicaset
  backupName: cron-mongodb-replicas-xxxxxxx
  storageName: s3-london

Version:

Operator & chart 1.14
Mongo 4.2
PBM 2.0.4

Logs:

https://pastebin.com/suBY3a8k

Expected Result:

The backup restore should work, or I should at least get an indication of which pod is attempting the restore and failing.

Actual Result:

Backup restore fails.

Additional Information:

~

Hi @Luis_Pabon!
If I'm not mistaken, logical backups should be taken from secondary pods and the restore should be done on the primary pod of the replica set.
Are you sure that your storageName matches the storage actually used for the backup, and that your storage credentials are correct?
Also, maybe just try removing the storageName for the test.
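
For example, a test restore without storageName would look roughly like this (the restore name is arbitrary; clusterName and backupName are taken from your CRs above):

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: restore1-test
spec:
  clusterName: mongodb-replicaset
  # storageName omitted for the test
  backupName: cron-mongodb-replicas-xxxxxxx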

Thank you Tomislav. Yes, I'm positive the storage name is right - my example above of the two CRs is pretty much verbatim, with only a few edits to anonymise my customer's details.

If I open a shell into the backup-agent container on any of the pods, I can successfully use the pbm CLI to list and restore backups, and I can see the restore happening in the primary's logs.

It's only when using the restore CR that it doesn't work, and it's not clear who is trying to contact S3.

I'm running into this issue too: backups work fine, but when I try to restore through the CR resource I get
init storage: get S3 object header: Forbidden: Forbidden. I made sure to have all the required permissions from the document below. The issue is happening with psmdb crVersion 1.14 & 1.15. Any help will be appreciated.

Hi all,

I've looked into this a bit more. The backup is using a k8s service account with the proper AWS permissions, and mongodb is set with requireTLS. The backup works fine, but the restore doesn't seem to be using the AWS token from the backup agent. I can't find any open issue for this. What am I missing here?

Hi @tra_for,

I have created an EKS cluster and attached the following policies:

      iam:
        attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
        - arn:aws:iam::aws:policy/AmazonS3FullAccess

And I can't reproduce the issue you described :frowning:

Hi @Slava_Sarzhan ,
do you know if these policies are documented somewhere? I can't find them anywhere. Note the backup.serviceAccountName: my-service-account - I expect that account to be used for restores. I'm using the permissions from Overview - Percona Backup for MongoDB. Adding full S3 access to the EKS cluster would not pass our security review.

You are trying to use IRSA. The PSMDB operator does not support it for now; we have a task for it: IAM Roles for Service Accounts. For now you can only use IAM: you need to attach the IAM policy to the EKS cluster.
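
For example, in an eksctl nodegroup config this can look like the sketch below (the last ARN is a placeholder for a customer-managed policy scoped to your backup bucket, as an alternative to AmazonS3FullAccess; see the PBM documentation for the exact S3 actions it needs):

      iam:
        attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
        # scoped S3 policy for the backup bucket instead of AmazonS3FullAccess
        - arn:aws:iam::111122223333:policy/mongodb-backup-s3-access

This relies on the node instance role (instance profile), which is consistent with Luis's CloudTrail observation above.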


As previously stated by @Luis_Pabon, restores can be done through the backup-agent CLI until the mentioned issue gets resolved. Thanks @Slava_Sarzhan

Do we have an ETA for this? [K8SPSMDB-921] - Percona JIRA