When using PBM sidecars and the operator, which pod attempts backup restores?

Description:

I have a replica set deployed into Kubernetes 1.26 using the operator and the PerconaServerMongoDB CR. The replica set is healthy and the CR is configured to enable backups to S3. Each of the replica set pods is running its backup-agent container without errors. Backups run and are stored in S3 as scheduled.

S3 authentication is via a service account: I have Kyverno automations that attach to the replica set pods a service account linked to an IAM role, and that add the same role to the service account automatically created by the operator (mongodb-operator-psmbd-operator).
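To be concrete, the service account Kyverno attaches looks roughly like this (assuming EKS-style IRSA; the role ARN and account ID below are placeholders, not my real values):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account
  annotations:
    # EKS IRSA annotation associating the service account with an IAM role
    eks.amazonaws.com/role-arn: "arn:aws:iam::111111111111:role/my-mongodb-backup-role"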

When I trigger a backup restore using a PerconaServerMongoDBRestore CR, it fails. I don’t see any new pods being created. If I inspect the logs of the operator pod as well as of the backup-agent container on each of the replica set pods, the only relevant output is from the operator:

set resync backup list from the store: init storage: get S3 object header: Forbidden: Forbidden\n\tstatus code: 403

full log: https://pastebin.com/suBY3a8k

I am completely at a loss as to what exactly is attempting to contact S3. As I said, I’m inspecting the logs of all of the pods, and the only output is from the operator. It feels as if the operator is the one making the request and failing, but as I said, it has the correct service account. I can confirm in CloudTrail that the S3 request is being made authenticated as the instance profile rather than as the role associated with the service account of all of the pods.
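For reference, this is roughly how I’m checking the logs (the operator pod name is a placeholder; the replica set pod names follow the CR below and may differ slightly in reality):

kubectl logs <operator-pod-name>
kubectl logs mongodb-replicaset-my-replicaset-0 -c backup-agent
kubectl logs mongodb-replicaset-my-replicaset-1 -c backup-agent
kubectl logs mongodb-replicaset-my-replicaset-2 -c backup-agent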

So, since I don’t know which pod is attempting to talk to S3, I can’t really debug, let alone fix, this access problem. Can you shed some light here?

Steps to Reproduce:

replicaset:

---
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: mongodb-replicaset
  finalizers:
    - delete-psmdb-pods-in-order

spec:
  image: "percona/percona-server-mongodb:4.2"
  imagePullPolicy: "Always"

  secrets:
    users: admin-users

  updateStrategy: SmartUpdate

  replsets:
    - name: my-replicaset
      size: 3

  backup:
    enabled: true
    image: "percona/percona-backup-mongodb:2.0.4"

    serviceAccountName: my-service-account

    tasks:
      - name: hourly
        enabled: true
        schedule: "3 * * * *"
        storageName: s3-london

    storages:
      s3-london:
        type: s3
        s3:
          region: eu-west-2
          bucket: my-bucket
          prefix: mongo_backups
          endpointUrl: s3.eu-west-2.amazonaws.com

backup restore:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: restore1
spec:
  clusterName: mongodb-replicaset
  backupName: cron-mongodb-replicas-xxxxxxx
  storageName: s3-london
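
The restore is then triggered by applying that CR (the filename is illustrative):

kubectl apply -f restore1.yaml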

Version:

Operator & chart 1.14
Mongo 4.2
PBM 2.0.4

Logs:

https://pastebin.com/suBY3a8k

Expected Result:

Backup restores should work. At the very least, I should get an indication of which pod is trying to perform the restore and failing.

Actual Result:

Backup restore fails.

Additional Information:


Hi @Luis_Pabon !
If I’m not mistaken, logical backups are taken from a secondary pod and the restore is done on the primary pod of the replica set.
Are you sure that your storageName and the storage actually used for the backup are the same, and that your storage credentials are correct?
Also, maybe try removing the storageName, just as a test (see the sketch below).
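For example, something like this, reusing the names from your CRs above with the storageName field simply removed:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: restore1
spec:
  clusterName: mongodb-replicaset
  backupName: cron-mongodb-replicas-xxxxxxx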

Thank you Tomislav. Yes, I’m positive the storage name is right; the two CRs in my example above are pretty much verbatim, with only a few edits to anonymise my customer.

If I open a shell into the backup-agent container on any of the pods, I can successfully use the pbm CLI to list and restore backups, and I can see the restore happening in the primary’s logs.
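For example, a sequence like this works fine (the pod name follows my CR above and the backup name is a placeholder):

kubectl exec -it mongodb-replicaset-my-replicaset-0 -c backup-agent -- bash
pbm list
pbm restore 2023-04-01T03:03:00Z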

It’s only when using the restore CR that it doesn’t work, and it’s not clear which component is trying to contact S3.