Backup fails after helm upgrade

I had a pgBackRest setup with two repos: repo1 backed up to a local PV and repo2 backed up to S3. I felt that keeping incremental backups on both repo1 and repo2 was too much, so I ran a helm upgrade on the Postgres cluster to remove repo1. After the upgrade, all backups fail with the following message:

Waiting for 10 seconds
Running pgbackrest
time="2024-11-04T13:51:52Z" level=info msg="crunchy-pgbackrest starts"
time="2024-11-04T13:51:52Z" level=info msg="debug flag set to false"
time="2024-11-04T13:51:52Z" level=info msg="backrest backup command requested"
time="2024-11-04T13:51:52Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=2 --annotation=\"percona.com/backup-job-type\"=\"replica-create\"]"
time="2024-11-04T13:51:55Z" level=info msg="output=[]"
time="2024-11-04T13:51:55Z" level=info msg="stderr=[WARN: a timeline switch has occurred since the 20241104-111254F_20241104-125203I backup, enabling delta checksum\n      HINT: this is normal after restoring from backup or promoting a standby.\nERROR: [032]: local-1 process terminated unexpectedly [32]: ERROR: [032]: key '2' is not valid for 'repo' option\n]"
time="2024-11-04T13:51:55Z" level=fatal msg="command terminated with exit code 32"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    3    0     3    0     0   1000      0 --:--:-- --:--:-- --:--:--  1000
OK
Failed to run pgbackrest

This is a brand new Percona PostgreSQL test cluster, with no previous restores or any other operations.
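
From the job log alone I can't tell whether the configuration the operator renders for pgBackRest still contained a repo2 section at that point, or whether only one repo was left while the job was still being started with --repo=2. A quick way to check what pgBackRest actually sees on the repo host is something like the following (adjust the pod/container names if yours differ; the conf.d path is only where I'd expect the operator to mount the rendered config):

kubectl -n cao-postgres exec service-1-repo-host-0 -c pgbackrest -- \
  sh -c 'cat /etc/pgbackrest/conf.d/*'   # dump the rendered pgBackRest config (path assumed)
kubectl -n cao-postgres exec service-1-repo-host-0 -c pgbackrest -- \
  pgbackrest info --stanza=db            # list the repos pgBackRest knows about and their status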

Afterwards I made another upgrade to add repo1 back, since its removal seemed to be the issue, but backups kept failing, now with a different error:

Pod list for namespace cao-postgres at the time (24 items, cluster shoot--barmer--dev):

NAME                                              OWNER        CPU    MEMORY    NODE                                   QOS        AGE    STATUS
service-1-repo-host-0                             StatefulSet  0.007  80.2MiB   shoot--barmer--dev-idp-z3-5f9d4-t4hw5  Burstable  56m    Running
service-1-instance1-nqbc-0                        StatefulSet  0.020  172.2MiB  shoot--barmer--dev-idp-z1-85f5b-q56sc  Burstable  55m    Running
service-1-instance1-7h8t-0                        StatefulSet  0.032  157.9MiB  shoot--barmer--dev-idp-z2-95b7b-wrq48  Burstable  55m    Running
service-1-instance1-64st-0                        StatefulSet  0.029  191.2MiB  shoot--barmer--dev-idp-z3-5f9d4-4ss2s  Burstable  56m    Running
service-1-backup-wwnl-kjs8q                       Job          0.000  0         shoot--barmer--dev-idp-z2-95b7b-z2fd5  Burstable  4h2m   Succeeded
service-1-backup-vhgt-6tlzd                       Job          0.000  0         shoot--barmer--dev-idp-z2-95b7b-z2fd5  Burstable  160m   Succeeded
service-1-backup-szfg-mmb26                       Job          0.000  0         shoot--barmer--dev-idp-z2-95b7b-z2fd5  Burstable  176m   Succeeded
service-1-backup-rwqm-48qsz                       Job          0.000  0         shoot--barmer--dev-idp-z2-95b7b-z2fd5  Burstable  143m   Succeeded
service-1-backup-r68w-rftrw                       Job          0.000  0         shoot--barmer--dev-idp-z2-95b7b-z2fd5  Burstable  3h45m  Succeeded
service-1-backup-pt29-g9zjj                       Job          0.000  0         shoot--barmer--dev-idp-z2-95b7b-z2fd5  Burstable  126m   Succeeded
service-1-backup-p84l-m5gck                       Job          0.000  0         shoot--barmer--dev-idp-z2-95b7b-z2fd5  Burstable  3h29m  Succeeded
service-1-backup-nsp2-8wjpj                       Job          0.000  0         shoot--barmer--dev-idp-z2-95b7b-z2fd5  Burstable  3h13m  Succeeded
service-1-backup-ld6l-xc2ks                       Job          0.000  0         shoot--barmer--dev-idp-z2-95b7b-z2fd5  Burstable  4h34m  Succeeded
service-1-backup-dqh7-z6p4n                       Job          0.000  0         shoot--barmer--dev-idp-z2-95b7b-z2fd5  Burstable  5m20s  Failed
service-1-backup-dqh7-slv9m                       Job          0.000  0         shoot--barmer--dev-idp-z2-95b7b-z2fd5  Burstable  4m37s  Failed
service-1-backup-dqh7-krqqn                       Job          0.000  0         shoot--barmer--dev-idp-z2-95b7b-z2fd5  Burstable  89s    Failed
service-1-backup-dqh7-htflx                       Job          0.000  0         shoot--barmer--dev-idp-z2-95b7b-z2fd5  Burstable  4m1s   Failed
service-1-backup-dqh7-b8k2t                       Job          0.000  0         shoot--barmer--dev-idp-z2-95b7b-z2fd5  Burstable  3m5s   Failed
service-1-backup-dqh7-7rw5f                       Job          0.000  0         shoot--barmer--dev-idp-z2-95b7b-z2fd5  Burstable  5m3s   Failed
service-1-backup-8x26-w88ql                       Job          0.000  0         shoot--barmer--dev-idp-z2-95b7b-z2fd5  Burstable  4h19m  Succeeded
pgo-pg-operator-7547f49f46-nxnqh                  ReplicaSet   0.021  313.7MiB  shoot--barmer--dev-idp-z1-85f5b-q56sc  Burstable  3d22h  Running
pg-prometheus-postgres-exporter-67cc44d89c-tltsg  ReplicaSet   0.006  134.5MiB  shoot--barmer--dev-idp-z3-5f9d4-4ss2s  Burstable  2d23h  Running
pg-pgbackup-clean-546fc6f88c-fw4r8                ReplicaSet   0.023  85.9MiB   shoot--barmer--dev-idp-z1-85f5b-q56sc  Burstable  2d23h  Running
pg                                                -            0.004  86.9MiB   shoot--barmer--dev-idp-z3-5f9d4-t4hw5  Burstable  17d    Running

Logs of the pgbackrest container in pod service-1-backup-dqh7-krqqn (Job service-1-backup-dqh7):
Waiting for 10 seconds
Running pgbackrest
time="2024-11-04T14:57:54Z" level=info msg="crunchy-pgbackrest starts"
time="2024-11-04T14:57:54Z" level=info msg="debug flag set to false"
time="2024-11-04T14:57:54Z" level=info msg="backrest backup command requested"
time="2024-11-04T14:57:54Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=1 --annotation=\"percona.com/backup-job-type\"=\"replica-create\"]"
time="2024-11-04T14:57:54Z" level=info msg="output=[]"
time="2024-11-04T14:57:54Z" level=info msg="stderr=[ERROR: [055]: unable to load info file '/pgbackrest/repo1/backup/db/backup.info' or '/pgbackrest/repo1/backup/db/backup.info.copy':\n       FileMissingError: unable to open missing file '/pgbackrest/repo1/backup/db/backup.info' for read\n       FileMissingError: unable to open missing file '/pgbackrest/repo1/backup/db/backup.info.copy' for read\n       HINT: backup.info cannot be opened and is required to perform a backup.\n       HINT: has a stanza-create been performed?\n]"
time="2024-11-04T14:57:54Z" level=fatal msg="command terminated with exit code 55"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    3    0     3    0     0    750      0 --:--:-- --:--:-- --:--:--   750
OK
Failed to run pgbackrest
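
This second error at least makes sense: the re-added repo1 is a brand new PV, so there is no backup.info on it yet and, per the HINT, no stanza has been created for it. As far as I understand the operator normally runs stanza-create itself, but to test the hint manually one could run it from the repo host (same pod/container names as above; stanza-create acts on all configured repos and should leave the already-valid repo2 alone):

kubectl -n cao-postgres exec service-1-repo-host-0 -c pgbackrest -- \
  pgbackrest stanza-create --stanza=db   # initialize the stanza on the new repo1 volume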

Initial backup values:

backups:
  pgbackrest:
    image: mboncalo/percona-postgresql-operator:2.4.1-ppg16.3-pgbackrest2.51-1

    configuration:
      - secret:
          name: cluster1-pgbackrest-secrets

    repoHost:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/data: pgbackrest
              topologyKey: kubernetes.io/hostname



    manual:
      repoName: repo2
      options:
      - --type=full
    repos:
    - name: repo1
      schedules:
        full: "0 4 * * *"
        incremental: "*/43 * * * *"
      volume:
        volumeClaimSpec:
          storageClassName: "default"
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 200Gi
    - name: repo2
      s3:
        bucket: "29b03b4896302b9297c50160c9a3453c28fbec2d31b3203f98326c82"
        endpoint: "minio.apps.cluster.local
        region: "default"
      schedules:
        full: "0 3 * * *"
        incremental: "*/17 * * * *"

Backup values for the first upgrade:

backups:
  pgbackrest:
    image: mboncalo/percona-postgresql-operator:2.4.1-ppg16.3-pgbackrest2.51-1

    configuration:
      - secret:
          name: cluster1-pgbackrest-secrets

    repoHost:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/data: pgbackrest
              topologyKey: kubernetes.io/hostname



    manual:
      repoName: repo2
      options:
      - --type=full
    repos:
    # - name: repo1
    #   schedules:
    #     full: "0 4 * * *"
    #     incremental: "*/43 * * * *"
    #   volume:
    #     volumeClaimSpec:
    #       storageClassName: "default"
    #       accessModes:
    #       - ReadWriteOnce
    #       resources:
    #         requests:
    #           storage: 200Gi
    - name: repo2
      s3:
        bucket: "29b03b4896302b9297c50160c9a3453c28fbec2d31b3203f98326c82"
        endpoint: "minio.apps.cluster.local
        region: "default"
      schedules:
        full: "0 3 * * *"
        incremental: "*/17 * * * *"

And the last backup values which were supposed to fix everything:

backups:
  pgbackrest:
    image: mboncalo/percona-postgresql-operator:2.4.1-ppg16.3-pgbackrest2.51-1

    configuration:
      - secret:
          name: cluster1-pgbackrest-secrets

    repoHost:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/data: pgbackrest
              topologyKey: kubernetes.io/hostname



    manual:
      repoName: repo2
      options:
      - --type=full
    repos:
    - name: repo1
      schedules:
        full: "0 4 * * 6"
        # incremental: "*/43 * * * *"
      volume:
        volumeClaimSpec:
          storageClassName: "default"
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 200Gi
    - name: repo2
      s3:
        bucket: "29b03b4896302b9297c50160c9a3453c28fbec2d31b3203f98326c82"
        endpoint: "minio.apps.cluster.local
        region: "default"
      schedules:
        full: "0 3 * * *"
        incremental: "*/17 * * * *"

Pgbackrest info:

stanza: db
    status: mixed
        repo1: error (missing stanza path)
        repo2: ok
    cipher: none

    db (current)
        wal archive min/max (15): 000000010000000000000004/000000040000001200000028

        full backup: 20241101-151723F
ERROR: [064]: unable to write to fd

I continued testing, and the upgrade caused no issues if I renamed repo2 to repo1 after removing the old repo1 from the values file. I guess you can't have a repo2 without a repo1.
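
In practice that means the repos list ends up with a single entry named repo1 carrying the same s3 block that repo2 had before (and I'd expect manual.repoName to need the same rename if you use on-demand backups). Applied through helm and verified roughly like this, with placeholder release/chart/values-file names:

helm upgrade service-1 percona/pg-db -n cao-postgres -f values-single-s3-repo.yaml   # placeholder names

kubectl -n cao-postgres exec service-1-repo-host-0 -c pgbackrest -- \
  pgbackrest info --stanza=db   # should now report only repo1 (the S3 one) with status ok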

That is interesting! I will check and work with our eng team on it. Thanks for sharing!

I forgot to mention something very important: Percona PostgreSQL is running in an Istio-injected namespace (Istio v1.17).
I haven't seen any issues in a non-Istio-injected namespace.
I will do more testing.