Operator doesn't apply the provided service account to the backup containers

Description:

  • Operator installed via the official Helm charts
  • Replicaset configuration provided via Helm values, including the backup config

When inspecting the actual replicaset pods, the serviceAccountName property is set to default.

Furthermore, it’s unclear how this is supposed to work at all, since in the config the property is backup.serviceAccountName, yet service accounts are a pod-level setting, not a container-level one.
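
For context, Kubernetes only knows about service accounts at the pod level; a plain pod spec carries the account like this (illustrative snippet, not taken from the charts):

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  serviceAccountName: s3-rw   # pod-level setting; individual containers cannot override it
  containers:
    - name: backup-agent
      image: percona/percona-backup-mongodb:2.0.4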

Steps to Reproduce:

Install the Helm charts above. For psmdb-db, these are the backup values (the service account they reference is sketched right after this block):

backup:
  enabled: true
  serviceAccountName: s3-rw

  pitr:
    enabled: false
  
  tasks:
    - name: once_daily
      enabled: true
      schedule: "17 22 * * *"
      storageName: s3-london
  
  storages:
    s3-london:
      type: s3
      s3:
        provider: s3
        region: eu-west-2
        bucket: my_bucket
        prefix: mongodb_eks
        endpointUrl: s3.eu-west-2.amazonaws.com
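
For completeness, s3-rw is meant to be an EKS IRSA service account along these lines (the namespace and role ARN are placeholders for my environment; the real role grants read/write on the backup bucket):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-rw
  namespace: mongodb   # placeholder: same namespace as the psmdb-db release
  annotations:
    # placeholder ARN; the actual role allows the S3 read/write operations PBM needs on the bucket
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/s3-rw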

Version:

  • psmdb-operator chart 1.14.2
  • psmdb-db chart 1.14.3

Logs:

The charts generate the following PerconaServerMongoDB custom resource:

# Source: mongodb/charts/replicaset/templates/cluster.yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"psmdb.percona.com/v1","kind":"PerconaServerMongoDB"}
  name: mongodb-replicaset
  labels:
    app.kubernetes.io/name: replicaset
    helm.sh/chart: replicaset-1.14.3
    app.kubernetes.io/instance: mongodb
    app.kubernetes.io/version: "1.14.0"
    app.kubernetes.io/managed-by: Helm
  finalizers:
    - delete-psmdb-pods-in-order
spec:
  crVersion: 1.14.0
  pause: false
  unmanaged: false
  image: "percona/percona-server-mongodb:4.2"
  imagePullPolicy: "Always"
  multiCluster:
    enabled: false
  secrets:
    users: admin-users
  updateStrategy: SmartUpdate
  upgradeOptions:
    versionServiceEndpoint: https://check.percona.com
    apply: disabled
    schedule: 0 2 * * *
    setFCV: false
  pmm:
    enabled: false
    image: "percona/pmm-client:2.35.0"
    serverHost: monitoring-service
  replsets:
  - name: my-rs
    size: 3
    affinity:
      advanced:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/replset: my-rs
            topologyKey: topology.kubernetes.io/zone
          - labelSelector:
              matchLabels:
                app.kubernetes.io/component: mongod
                app.kubernetes.io/instance: mongodb-replicaset
                app.kubernetes.io/managed-by: percona-server-mongodb-operator
                app.kubernetes.io/name: percona-server-mongodb
                app.kubernetes.io/part-of: percona-server-mongodb
                app.kubernetes.io/replset: my-rs
            topologyKey: kubernetes.io/hostname
    nodeSelector:
      my: selector
    resources:
      null
    volumeSpec:
      persistentVolumeClaim:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi

  sharding:
    enabled: false
    # edited the rest of the sharding config out


  backup:
    enabled: true
    image: "percona/percona-backup-mongodb:2.0.4"
    serviceAccountName: s3-rw
    storages:
      s3-london:
        s3:
          bucket: my_bucket
          endpointUrl: s3.eu-west-2.amazonaws.com
          prefix: mongodb_eks
          provider: s3
          region: eu-west-2
        type: s3
    pitr:
      enabled: false
    tasks:
      - enabled: true
        name: once_daily
        schedule: "17 22 * * *"
        storageName: s3-london

Expected Result:

Not exactly sure what to expect here, since the backup-agent is a container within the replicaset pods, and the service account name is a pod-level (not container-level) property.
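
That said, my assumption of the intended behaviour is that the operator propagates backup.serviceAccountName down to the pods that run the backup-agent container, so the generated StatefulSet would render roughly like this (a sketch, not actual operator output):

spec:
  template:
    spec:
      serviceAccountName: s3-rw   # what I would expect backup.serviceAccountName to end up as
      containers:
        - name: mongod
          # ...
        - name: backup-agent
          image: percona/percona-backup-mongodb:2.0.4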

Actual Result:

On the replicaset pod, the backup-agent container logs the following a few times, then dies, restarting the whole pod with it:

2023-08-08T12:56:40.000+0000 E [agentCheckup] check storage connection: storage check failed with: get S3 object header: Forbidden: Forbidden
	status code: 403, request id: XXXXXXXXXXXXX, host id: YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY

I can see in CloudTrail the denied operations the backup agent is attempting, and indeed it is falling back to the EC2 instance profile role instead of using the service account we provided in the config.

Inspecting the replicaset pods confirms they have serviceAccountName: default set, which is consistent with this behaviour.
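
In other words, the pod template the operator actually produces is effectively equivalent to this (trimmed):

spec:
  template:
    spec:
      serviceAccountName: default   # falls back to the namespace default account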

For the benefit of anyone stumbling upon this thread, my current workaround is a Kyverno ClusterPolicy. The Helm hook ensures it is installed before anything else, so that Kyverno is ready by the time the operator spawns the StatefulSet.

---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: s3-access-service-account-stateful-set
  annotations:
    policies.kyverno.io/title: Add s3-access service account to operator StatefulSet
    policies.kyverno.io/subject: StatefulSet
    policies.kyverno.io/minversion: 1.6.0
    policies.kyverno.io/description: >-
      Due to an issue with the Percona operator, we can't set the correct service account on the
      pods of the StatefulSet that makes up the replica set in order to get S3 access.
      This automation mutates the operator's StatefulSets.
    "helm.sh/hook": pre-install,pre-upgrade
spec:
  rules:
    - name: percona-operator-stateful-sets
      match:
        any:
          - resources:
              kinds:
                - StatefulSet
              selector:
                matchLabels:
                  app.kubernetes.io/managed-by: percona-server-mongodb-operator
      mutate:
        patchStrategicMerge:
          spec:
            template:
              spec:
                serviceAccount: my-service-account
                serviceAccountName: my-service-account