Error when trying to back up with IRSA in AWS EKS in 1.19

Hello,

Following the instructions in Configure storage for backups - Percona Operator for MongoDB, I'm failing to back up the cluster to an S3 bucket.

As explained in the documentation, I configured the IAM role's trust relationship with the EKS cluster, allowed s3:* on the bucket, and annotated both the PSMDB operator's and the cluster's service accounts (annotation shown below).
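
For reference, the IRSA annotation on both service accounts looks like this (role ARN redacted, service account name from my setup):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: psmdb-operator
  annotations:
    # IRSA: makes the EKS pod identity webhook inject AWS_ROLE_ARN
    # and AWS_WEB_IDENTITY_TOKEN_FILE into the pods
    eks.amazonaws.com/role-arn: arn:aws:iam::<redacted>:role/s3-backup-role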

The related section of the psmdb object’s spec looks like this:

Spec:
  Backup:
    Enabled:  true
    Image:    percona/percona-backup-mongodb:2.8.0-multi
    Pitr:
      Compression Type:  gzip
      Enabled:           false
      Oplog Only:        true
      Oplog Span Min:    10
    Storages:
      aws-s3:
        s3:
          Bucket:  <redacted bucket>
          Region:  eu-west-1
        Type:      s3
    Tasks:
      Compression Type:  gzip
      Enabled:           true
      Keep:              3
      Name:              daily-s3
      Schedule:          20 15 * * *
      Storage Name:      aws-s3
      Type:              physical
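
(The output above is from kubectl describe, which capitalizes the keys; for clarity, the corresponding section of my cr.yaml is:)

backup:
  enabled: true
  image: percona/percona-backup-mongodb:2.8.0-multi
  pitr:
    enabled: false
    oplogOnly: true
    oplogSpanMin: 10
    compressionType: gzip
  storages:
    aws-s3:
      type: s3
      s3:
        bucket: <redacted bucket>
        region: eu-west-1
  tasks:
    - name: daily-s3
      enabled: true
      schedule: "20 15 * * *"
      keep: 3
      type: physical
      storageName: aws-s3
      compressionType: gzip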

When trying to back up, I get this message in the backup-agent container log:

2025-02-04T15:20:52.000+0000 E [agentCheckup] check storage connection: storage check failed with: file stat: get S3 object header: NoCredentialProviders: no valid providers in chain. Deprecated.
	For verbose messaging see aws.Config.CredentialsChainVerboseErrors

As you can see here, the correct environment variables are set:

$ k exec -it test-mongodb-shard01-2 -c backup-agent -- sh
sh-4.4$ printenv | grep AWS_
AWS_DEFAULT_REGION=eu-west-1
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
AWS_REGION=eu-west-1
AWS_ROLE_ARN=arn:aws:iam::<redacted>:role/s3-backup-role
AWS_STS_REGIONAL_ENDPOINTS=regional

The JWT in AWS_WEB_IDENTITY_TOKEN_FILE seems to be correct, and if I run a debug pod with the AWS CLI and the same service account, it reaches the bucket correctly (see the commands further down).
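
(One quick way to sanity-check the token's claims — issuer, audience, expiry — is decoding the JWT payload in place, assuming coreutils in the container:)

$ k exec test-mongodb-shard01-2 -c backup-agent -- \
    sh -c 'cut -d. -f2 "$AWS_WEB_IDENTITY_TOKEN_FILE" | tr "_-" "/+" | base64 -d 2>/dev/null'
# base64 may complain about the missing padding of the base64url
# payload; the JSON claims are still printed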

Thanks!

Hello, Hipos! Thank you for your report.
Could you also provide the env variables in the operator pod? Do you have AWS_ROLE_ARN there?
Could you please share the operator logs?

Hello Natalia,

Thank you.

I can confirm it has the same environment variables:

$ k -n psmdb exec -it psmdb-operator-7bd5fbf54d-vzggf -- sh
sh-5.1$ printenv | grep AWS_
AWS_DEFAULT_REGION=eu-west-1
AWS_REGION=eu-west-1
AWS_ROLE_ARN=arn:aws:iam::<redacted>:role/s3-backup-role
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
AWS_STS_REGIONAL_ENDPOINTS=regional

Here are the relevant PSMDB operator logs I get when I set the task's cron schedule to run at the next minute:

2025-02-04T16:20:00.196Z	DEBUG	checking if backup is allowed	{"controller": "psmdbbackup-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBBackup", "PerconaServerMongoDBBackup": {"name":"cron-test-mongodb-20250204162000-9qlvk","namespace":"test-dev"}, "namespace": "test-dev", "name": "cron-test-mongodb-20250204162000-9qlvk", "reconcileID": "38d44d4d-7359-40c0-8853-7fe2519081e8", "cluster": "test-mongodb", "namespace": "test-dev"}
2025-02-04T16:20:00.196Z	DEBUG	Checking for active jobs	{"controller": "psmdbbackup-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBBackup", "PerconaServerMongoDBBackup": {"name":"cron-test-mongodb-20250204162000-9qlvk","namespace":"test-dev"}, "namespace": "test-dev", "name": "cron-test-mongodb-20250204162000-9qlvk", "reconcileID": "38d44d4d-7359-40c0-8853-7fe2519081e8", "currentJob": {"Name":"cron-test-mongodb-20250204162000-9qlvk","Type":0}}
2025-02-04T16:20:10.301Z	INFO	Starting backup	{"controller": "psmdbbackup-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBBackup", "PerconaServerMongoDBBackup": {"name":"cron-test-mongodb-20250204162000-9qlvk","namespace":"test-dev"}, "namespace": "test-dev", "name": "cron-test-mongodb-20250204162000-9qlvk", "reconcileID": "38d44d4d-7359-40c0-8853-7fe2519081e8", "backup": "cron-test-mongodb-20250204162000-9qlvk", "storage": "s3"}
2025-02-04T16:20:10.301Z	INFO	Setting PBM config	{"controller": "psmdbbackup-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBBackup", "PerconaServerMongoDBBackup": {"name":"cron-test-mongodb-20250204162000-9qlvk","namespace":"test-dev"}, "namespace": "test-dev", "name": "cron-test-mongodb-20250204162000-9qlvk", "reconcileID": "38d44d4d-7359-40c0-8853-7fe2519081e8", "cluster": "test-mongodb"}
2025-02-04T16:20:21.311Z	INFO	Sending backup command	{"controller": "psmdbbackup-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBBackup", "PerconaServerMongoDBBackup": {"name":"cron-test-mongodb-20250204162000-9qlvk","namespace":"test-dev"}, "namespace": "test-dev", "name": "cron-test-mongodb-20250204162000-9qlvk", "reconcileID": "38d44d4d-7359-40c0-8853-7fe2519081e8", "backup": "cron-test-mongodb-20250204162000-9qlvk", "storage": "s3", "backupCmd": "backup [name: 2025-02-04T16:20:21Z, compression: gzip (level: default)] <ts: 0>"}
2025-02-04T16:20:21.326Z	INFO	Backup state changed	{"controller": "psmdbbackup-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBBackup", "PerconaServerMongoDBBackup": {"name":"cron-test-mongodb-20250204162000-9qlvk","namespace":"test-dev"}, "namespace": "test-dev", "name": "cron-test-mongodb-20250204162000-9qlvk", "reconcileID": "38d44d4d-7359-40c0-8853-7fe2519081e8", "previous": "", "current": "requested"}
2025-02-04T16:20:21.449Z	DEBUG	checking if backup is allowed	{"controller": "psmdbbackup-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBBackup", "PerconaServerMongoDBBackup": {"name":"cron-test-mongodb-20250204162000-9qlvk","namespace":"test-dev"}, "namespace": "test-dev", "name": "cron-test-mongodb-20250204162000-9qlvk", "reconcileID": "88b2377e-c389-45f4-8b1b-ff50d0bdd2ad", "cluster": "test-mongodb", "namespace": "test-dev"}
2025-02-04T16:20:21.449Z	DEBUG	Checking for active jobs	{"controller": "psmdbbackup-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBBackup", "PerconaServerMongoDBBackup": {"name":"cron-test-mongodb-20250204162000-9qlvk","namespace":"test-dev"}, "namespace": "test-dev", "name": "cron-test-mongodb-20250204162000-9qlvk", "reconcileID": "88b2377e-c389-45f4-8b1b-ff50d0bdd2ad", "currentJob": {"Name":"cron-test-mongodb-20250204162000-9qlvk","Type":0}}
2025-02-04T16:20:26.702Z	DEBUG	backupStatus	Got backup meta	{"controller": "psmdbbackup-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBBackup", "PerconaServerMongoDBBackup": {"name":"cron-test-mongodb-20250204162000-9qlvk","namespace":"test-dev"}, "namespace": "test-dev", "name": "cron-test-mongodb-20250204162000-9qlvk", "reconcileID": "88b2377e-c389-45f4-8b1b-ff50d0bdd2ad", "backup": "cron-test-mongodb-20250204162000-9qlvk", "pbmName": "2025-02-04T16:20:21Z", "meta": {"type":"physical","opid":"67a23e450024dc57b17fdd61","name":"2025-02-04T16:20:21Z","shardRemap":{"cfg":"config"},"replsets":[],"compression":"gzip","store":{"type":"s3","s3":{"region":"eu-west-1","endpointUrl":"","forcePathStyle":true,"bucket":"<redacted>","prefix":"percona","maxUploadParts":10000,"storageClass":"STANDARD","insecureSkipTLSVerify":false}},"size":0,"mongodb_version":"6.0.19-16","fcv":"6.0","start_ts":1738686021,"last_transition_ts":1738686021,"first_write_ts":{"T":1,"I":1},"last_write_ts":{"T":1,"I":1},"hb":{"T":1738686021,"I":10},"status":"error","conditions":[{"timestamp":1738686021,"status":"starting"},{"timestamp":1738686021,"status":"error","error":"no available agent(s) on replsets: cfg, shard01"}],"n":[],"error":"no available agent(s) on replsets: cfg, shard01","pbm_version":"2.8.0","balancer":"full"}}
2025-02-04T16:20:26.706Z	INFO	Backup state changed	{"controller": "psmdbbackup-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBBackup", "PerconaServerMongoDBBackup": {"name":"cron-test-mongodb-20250204162000-9qlvk","namespace":"test-dev"}, "namespace": "test-dev", "name": "cron-test-mongodb-20250204162000-9qlvk", "reconcileID": "88b2377e-c389-45f4-8b1b-ff50d0bdd2ad", "previous": "requested", "current": "error"}
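
So the actual failure in the backup meta is "no available agent(s) on replsets: cfg, shard01", which matches the storage-check error the backup-agent containers log. (For anyone debugging the same thing, agent availability can also be checked directly with the PBM CLI from any backup-agent container, e.g.:)

$ k exec -it test-mongodb-shard01-2 -c backup-agent -- pbm status
# lists the agents per replica set; agents failing the storage
# check are reported as unavailable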

Just for the record, if I try this, I can access the bucket:

$ k run -it debug --image=amazon/aws-cli --overrides='{ "spec": { "serviceAccount": "test-backup" }  }' s3 ls s3://<bucket>/
If you don't see a command prompt, try pressing enter.
warning: couldn't attach to pod/debug, falling back to streaming logs: Internal error occurred: unable to upgrade connection: container debug not found in pod debug_test-dev
                           PRE mongodb/
                           PRE percona/
                           PRE velero/

$ k -n psmdb run -it debug --image=amazon/aws-cli --overrides='{ "spec": { "serviceAccount": "psmdb-operator" }  }' s3 ls s3://<bucket>/
If you don't see a command prompt, try pressing enter.
warning: couldn't attach to pod/debug, falling back to streaming logs: Internal error occurred: unable to upgrade connection: container debug not found in pod debug_psmdb
                           PRE mongodb/
                           PRE percona/
                           PRE velero/

I will recheck the issue and get back to you.

Hi, Hipos!
Do you use a sharded cluster? Could you confirm that you have the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE variables in the config server pods in this case too?
If yes, and you run an on-demand backup, could you please share the backup.yaml configuration that you use? (A minimal example is below.)
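
For reference, a minimal on-demand backup.yaml looks roughly like this (names taken from your cluster spec):

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  name: on-demand-backup      # any name
  namespace: test-dev
spec:
  clusterName: test-mongodb   # name of the PerconaServerMongoDB resource
  storageName: aws-s3         # entry from spec.backup.storages
  type: physical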

Hi Natalia,

I have to apologize: it wasn't a Percona issue. There was a Kubernetes NetworkPolicy blocking the pods from accessing the Internet. Once it was removed, backups worked nicely. I found it by capturing the pods' traffic with tcpdump.
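
For anyone hitting the same symptom: the agents need HTTPS egress to the S3 and STS endpoints. A minimal sketch of the kind of egress rule that was missing (policy name and selectors are hypothetical):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backup-egress   # hypothetical name
  namespace: test-dev
spec:
  podSelector: {}             # all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:                     # DNS, needed to resolve the S3/STS endpoints
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0   # broad; can be narrowed to the AWS IP ranges
      ports:
        - protocol: TCP
          port: 443           # HTTPS to s3.eu-west-1.amazonaws.com and STS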

Thanks for your time and patience.

@hipos glad you figured it out! Let me know if there is anything I can help with.