Percona PostgreSQL values.yaml configuration for pgBackRest backup to an AWS S3 bucket

Dear Community,

My goal is to create a full backup in an AWS S3 bucket when the database is initialized. I wrote a values.yaml with which I was able to deploy the chart with Helm successfully; however, the default pgBackRest backup created a differential backup in the S3 bucket configured as repo2, and I cannot restore the database from that. (I know this because I later synced repo1 to the bucket, after which the restore was successful.)

Please find my values.yaml configuration below:

pg-db:
  enabled: true
  finalizers:
  # Set this if you want the operator to delete the PVCs on cluster deletion
    - percona.com/delete-pvc
  # Set this if you want the operator to delete the SSL objects on cluster deletion
  #  - percona.com/delete-ssl
  crVersion: 2.2.0
  repository: percona/percona-postgresql-operator
  image: "" # perconalab/percona-postgresql-operator:main-ppg16-postgres
  imagePullPolicy: Always
  postgresVersion: 15
  # port: 5432
  pause: false
  unmanaged: false
  standby:
    enabled: false
    # host: "<primary-ip>"
    # port: "<primary-port>"
    # repoName: repo1
  # customTLSSecret:
  #   name: ""
  customReplicationTLSSecret:
    name: ""
  openshift: false
  users:
    - name: postgres
      databases:
        - postgres
      options: "SUPERUSER"
      password:
        type: ASCII
      secretName: "pg-db-postgres-secret"
  # databaseInitSQL:
  #   key: init.sql
  #   name: cluster1-init-sql
  # dataSource:
  #   postgresCluster:
  #     clusterName: test-pg-db
  #     repoName: repo2
  #     # options:
  #     # - --type=time
  #     # - --target="2024-03-04 14:15:11-04"
  #   pgbackrest:
  #     stanza: db
  #     configuration:
  #       secret:
  #         name: pg-db-pgbackrest-secret
  #     # global:
  #     #   repo2-path: /pgbackrest/postgres-operator/hippo/repo2
  #   repo:
  #     name: repo2
  #     s3:
  #       bucket: "test-db-backup-bucket"
  #       endpoint: "s3.ca-central-1.amazonaws.com"
  #       region: "ca-central-1"
  expose:
  #   annotations:
  #     my-annotation: value1
  #   labels:
  #     my-label: value2
    type: NodePort # LoadBalancer
  #   loadBalancerSourceRanges:
  #     - 10.0.0.0/8
  instances:
  - name: test-pg
    replicas: 2
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          podAffinityTerm:
            labelSelector:
              matchLabels:
                postgres-operator.crunchydata.com/data: postgres
            topologyKey: kubernetes.io/hostname
    # resources:
    #   limits:
    #     cpu: 2
    #     memory: 4Gi
    # sidecars:
    # - name: testcontainer
    #   image: mycontainer1:latest
    # - name: testcontainer2
    #   image: mycontainer1:latest
  #  topologySpreadConstraints:
  #    - maxSkew: 1
  #      topologyKey: my-node-label
  #      whenUnsatisfiable: DoNotSchedule
  #      labelSelector:
  #        matchLabels:
  #          postgres-operator.crunchydata.com/instance-set: instance1
  #  tolerations:
  #  - effect: NoSchedule
  #    key: role
  #    operator: Equal
  #    value: connection-poolers
  #  priorityClassName: high-priority
  #  walVolumeClaimSpec:
  #    accessModes:
  #    - "ReadWriteOnce"
  #    resources:
  #      requests:
  #        storage: 1Gi
    dataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
  proxy:
    pgBouncer:
      replicas: 1
      image: "" # perconalab/percona-postgresql-operator:main-ppg16-pgbouncer
  #    exposeSuperusers: true
  #    resources:
  #      limits:
  #        cpu: 200m
  #        memory: 128Mi
  #      expose:
  #        annotations:
  #          my-annotation: value1
  #        labels:
  #          my-label: value2
  #        type: LoadBalancer
  #        loadBalancerSourceRanges:
  #          - 10.0.0.0/8
      # affinity:
      #   podAntiAffinity:
      #     preferredDuringSchedulingIgnoredDuringExecution:
      #     - weight: 1
      #       podAffinityTerm:
      #         labelSelector:
      #           matchLabels:
      #             postgres-operator.crunchydata.com/role: pgbouncer
      #         topologyKey: kubernetes.io/hostname
  #      tolerations:
  #      - effect: NoSchedule
  #        key: role
  #        operator: Equal
  #        value: connection-poolers
  #      topologySpreadConstraints:
  #        - maxSkew: 1
  #          topologyKey: my-node-label
  #          whenUnsatisfiable: ScheduleAnyway
  #          labelSelector:
  #            matchLabels:
  #              postgres-operator.crunchydata.com/role: pgbouncer
  #      sidecars:
  #      - name: bouncertestcontainer1
  #        image: mycontainer1:latest
  #      customTLSSecret:
  #        name: keycloakdb-pgbouncer.tls
  #      config:
  #        global:
  #          pool_mode: transaction
  backups:
    pgbackrest:
  #    metadata:
  #    labels:
      image: ""
      configuration:
        secret:
          name: pg-db-pgbackrest-secret
          s3conf: |
            [global]
            repo2-s3-key=*******************
            repo2-s3-key-secret=**********************
            repo2-storage-verify-tls=y
            # repo2-retention-full=2
            # repo2-retention-full-type=count
  #    jobs:
  #      priorityClassName: high-priority
  #      resources:
  #        limits:
  #          cpu: 200m
  #          memory: 128Mi
  #      tolerations:
  #      - effect: NoSchedule
  #        key: role
  #        operator: Equal
  #        value: connection-poolers
  #     global:
          # repo1-retention-full: "14"
          # repo1-retention-full-type: time
          # repo2-retention-full: "14"
          # repo2-retention-full-type: time
  #      repo1-path: /pgbackrest/postgres-operator/cluster1/repo1
  #      repo1-cipher-type: aes-256-cbc
  #      repo1-s3-uri-style: path
  #      repo2-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo2
  #      repo3-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo3
  #      repo4-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo4
      # repoHost:
      #   affinity:
      #     podAntiAffinity:
      #       preferredDuringSchedulingIgnoredDuringExecution:
      #       - weight: 1
      #         podAffinityTerm:
      #           labelSelector:
      #             matchLabels:
      #               postgres-operator.crunchydata.com/data: pgbackrest
      #           topologyKey: kubernetes.io/hostname
  #      priorityClassName: high-priority
  #      topologySpreadConstraints:
  #      - maxSkew: 1
  #        topologyKey: my-node-label
  #        whenUnsatisfiable: ScheduleAnyway
  #        labelSelector:
  #          matchLabels:
  #            postgres-operator.crunchydata.com/pgbackrest: ""
      manual:
        repoName: repo1
        options:
        - --type=full
      repos:
      - name: repo1
        schedules:
          full: "0 0 * * *"
  #        differential: "0 1 * * 1-6"
        volume:
          volumeClaimSpec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 10Gi
      - name: repo2
        s3:
          bucket: "bucket"
          endpoint: "s3.ca-central-1.amazonaws.com"
          region: "ca-central-1"
  #    - name: repo3
  #      gcs:
  #        bucket: "<YOUR_GCS_BUCKET_NAME>"
  #    - name: repo4
  #      azure:
  #        container: "<YOUR_AZURE_CONTAINER>"
  pmm:
    enabled: true
    image:
      repository: percona/pmm-client
      tag: 2.40.0
  #  imagePullPolicy: IfNotPresent
    secret: pg-db-pmm-secret
    serverHost: monitoring-service
  #  resources:
  #    requests:
  #      memory: 200M
  #      cpu: 500m
  # patroni:
  #   dynamicConfiguration:
  #     postgresql:
  #       parameters:
  #         max_parallel_workers: 2
  #         max_worker_processes: 2
  #         shared_buffers: 1GB
  #         work_mem: 2MB
  #       pg_hba:
  #       - host    all mytest 123.123.123.123/32 reject
  # extensions:
  #   image: percona/percona-postgresql-operator:2.3.1
  #   imagePullPolicy: Always
  #   storage:
  #     type: s3
  #     bucket: pg-extensions
  #     region: ca-central-1
  #     secret:
  #       name: cluster1-extensions-secret
  #   builtin:
  #     pg_stat_monitor: true
  #     pg_audit: true
  #   custom:
  #   - name: pg_cron
  #     version: 1.6.1
  secrets:
    name: 
    # replication user password
    primaryuser:
    # superuser password
    postgres: "pg-db-postgres-secret"
    # pgbouncer user password
    pgbouncer:
    # pguser user password
    pguser:

The combinations I have already tried:

  • Under backups - pgbackrest - configuration - global, I uncommented the repo2 configuration details, but it did not work:
  backups:
    pgbackrest:
      image: ""
      configuration:
        global:
          repo2-retention-full: "14"
          repo2-retention-full-type: time
  • Under backups - pgbackrest - configuration - secret - s3conf (the purpose of which is to pass these values into a self-made pg-db-pgbackrest-secret), I uncommented the two lines regarding repo2-retention-full:
  backups:
    pgbackrest:
      image: ""
      configuration:
        secret:
          name: pg-db-pgbackrest-secret
          s3conf: |
            [global]
            repo2-s3-key=*******************
            repo2-s3-key-secret=**********************
            repo2-storage-verify-tls=y
            repo2-retention-full=2
            repo2-retention-full-type=count

So my questions are:

  • What is the correct configuration if I want to create a full backup to the repo2 S3 bucket that can be restored in the future? (My untested guess is sketched right after these questions.)
  • Is there a way to create a config that performs an initial full backup to the repo2 S3 bucket after deployment and then switches to differential backups?
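
My untested guess for the first question, based on the manual section already in my values file, would be to point the manual backup at repo2 instead of repo1 (just a sketch, I have not verified it):

  backups:
    pgbackrest:
      manual:
        repoName: repo2
        options:
        - --type=full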

Many thanks,
Mate

Hello Mate,

Hope you are doing great!
I have reviewed your case completely and tried to reproduce the scenario. It seems you need to add the following parameters for repo2:

repo2-type: s3
repo2-s3-endpoint: s3.amazonaws.com
repo2-s3-bucket: bucket-name
repo2-path: backup-path
repo2-s3-region: region
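
For instance, in your values file these could go into the s3conf block you already have (only a placement sketch; the endpoint, region, and bucket below are copied from your file, and the path is a placeholder):

          s3conf: |
            [global]
            repo2-type=s3
            repo2-s3-endpoint=s3.ca-central-1.amazonaws.com
            repo2-s3-bucket=bucket
            repo2-s3-region=ca-central-1
            repo2-path=/pgbackrest/postgres-operator/cluster1-multi-repo/repo2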

We need more clarification on the second question.

Do you want a setup where only differential backups happen after the first full backup, or do you want differential backups to happen between two full backups?

Please feel free to get back to us in case of any concerns.

Thanks,
Ninad

Hey Ninad,

Thank you for your answer; I hope you are doing well too.

Unfortunately, my understanding of the backup mechanism of Percona PostgreSQL pgBackRest is not complete. The backup works well with the configuration above; however, the backup created in repo2 (the S3 bucket) is not a full backup by default.

I noticed that the Percona operator generates a default global config (named pgbackrest_instance.conf) which I cannot modify, so I cannot make it create a full backup to the S3 bucket after initialization, the way it does for repo1.

The S3 bucket receives its configuration from two places in the values file above. The following section already contains part of your suggestions:

      - name: repo2
        s3:
          bucket: "bucket"
          endpoint: "s3.ca-central-1.amazonaws.com"
          region: "ca-central-1"

and the following section mainly contains the credentials:

          s3conf: |
            [global]
            repo2-s3-key=*******************
            repo2-s3-key-secret=**********************
            repo2-storage-verify-tls=y
            repo2-retention-full=2
            repo2-retention-full-type=count

As I understand it, this section in the values can be used to set the global configuration for the repos, which overrides the default pgBackRest config:

  #     global:
          # repo1-retention-full: "14"
          # repo1-retention-full-type: time
          # repo2-retention-full: "14"
          # repo2-retention-full-type: time
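
If I read the chart correctly, enabling it for repo2 would look like the following (note that global here sits next to configuration under pgbackrest rather than inside it, unlike my first attempt above; this is only my untested reading):

  backups:
    pgbackrest:
      global:
        repo2-retention-full: "14"
        repo2-retention-full-type: time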

In summary, I have a working backup with the default repo1, which performs a full backup to local storage; however, in the remote S3 storage (repo2) it only creates a differential backup (and I cannot override the global pgBackRest settings).

My goal is to set up full backups to the local and remote storage at the same time (or using only one of them): a full backup at first deploy and every n days afterwards, with differential backups on the days in between; something like the sketch below.
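
In values terms, I imagine something like the following for repo2, reusing the schedules pattern from the repos section above (the cron expressions are only an illustration, here for n = 7):

      repos:
      - name: repo2
        schedules:
          full: "0 0 * * 0"
          differential: "0 1 * * 1-6"
        s3:
          bucket: "bucket"
          endpoint: "s3.ca-central-1.amazonaws.com"
          region: "ca-central-1"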

Do you have any recommendation in this case?

Many thanks,
Mate

Hello Mate,

Thank you for the detailed explanation. This describes the situation much more clearly.

We will look into this and get back to you.

Thanks,
Ninad


Hey Mate,

Would you please share the steps you followed to perform the setup?

Thanks,
Ninad