I had a pgBackRest setup with two repos: repo1 backing up to a local PV and repo2 backing up to S3. Having incremental backups on both repo1 and repo2 felt like too much, so I upgraded the Postgres cluster to remove repo1. After the upgrade, all backups fail with the following message:
Waiting for 10 seconds
Running pgbackrest
time="2024-11-04T13:51:52Z" level=info msg="crunchy-pgbackrest starts"
time="2024-11-04T13:51:52Z" level=info msg="debug flag set to false"
time="2024-11-04T13:51:52Z" level=info msg="backrest backup command requested"
time="2024-11-04T13:51:52Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=2 --annotation=\"percona.com/backup-job-type\"=\"replica-create\"]"
time="2024-11-04T13:51:55Z" level=info msg="output=[]"
time="2024-11-04T13:51:55Z" level=info msg="stderr=[WARN: a timeline switch has occurred since the 20241104-111254F_20241104-125203I backup, enabling delta checksum\n HINT: this is normal after restoring from backup or promoting a standby.\nERROR: [032]: local-1 process terminated unexpectedly [32]: ERROR: [032]: key '2' is not valid for 'repo' option\n]"
time="2024-11-04T13:51:55Z" level=fatal msg="command terminated with exit code 32"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
3 0 3 0 0 1000 0 --:--:-- --:--:-- --:--:-- 1000
OK
Failed to run pgbackrest
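The error suggests the generated pgBackRest configuration no longer contains a repo2 section when the backup job asks for --repo=2. A rough way to check which repoN-* keys the operator actually rendered is to look at the config on the repo host and on an instance pod, for example (the container names -c pgbackrest / -c database and the conf.d path are assumptions and may differ between operator versions):

# Config rendered for the repo host (container name and path assumed)
kubectl -n cao-postgres exec service-1-repo-host-0 -c pgbackrest -- \
  sh -c 'cat /etc/pgbackrest/conf.d/*.conf'

# Config rendered for a Postgres instance pod (container name and path assumed)
kubectl -n cao-postgres exec service-1-instance1-nqbc-0 -c database -- \
  sh -c 'cat /etc/pgbackrest/conf.d/*.conf'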
This is a brand new Percona Postgres test cluster, with no previous restores or other operations. Afterwards I made another upgrade to add repo1 back, since its removal seemed to be the issue, but backups kept failing, now with a different error.
Pods in namespace cao-postgres at that point (24 items, cluster shoot--barmer--dev):

Name | Namespace | Restarts | CPU | Controlled By | Memory | Node | QoS | Age | Status
service-1-repo-host-0 | cao-postgres | 0 | 0.007 | StatefulSet | 80.2MiB | shoot--barmer--dev-idp-z3-5f9d4-t4hw5 | Burstable | 56m | Running
service-1-instance1-nqbc-0 | cao-postgres | 0 | 0.020 | StatefulSet | 172.2MiB | shoot--barmer--dev-idp-z1-85f5b-q56sc | Burstable | 55m | Running
service-1-instance1-7h8t-0 | cao-postgres | 0 | 0.032 | StatefulSet | 157.9MiB | shoot--barmer--dev-idp-z2-95b7b-wrq48 | Burstable | 55m | Running
service-1-instance1-64st-0 | cao-postgres | 0 | 0.029 | StatefulSet | 191.2MiB | shoot--barmer--dev-idp-z3-5f9d4-4ss2s | Burstable | 56m | Running
service-1-backup-wwnl-kjs8q | cao-postgres | 0 | 0.000 | Job | 0 | shoot--barmer--dev-idp-z2-95b7b-z2fd5 | Burstable | 4h2m | Succeeded
service-1-backup-vhgt-6tlzd | cao-postgres | 0 | 0.000 | Job | 0 | shoot--barmer--dev-idp-z2-95b7b-z2fd5 | Burstable | 160m | Succeeded
service-1-backup-szfg-mmb26 | cao-postgres | 0 | 0.000 | Job | 0 | shoot--barmer--dev-idp-z2-95b7b-z2fd5 | Burstable | 176m | Succeeded
service-1-backup-rwqm-48qsz | cao-postgres | 0 | 0.000 | Job | 0 | shoot--barmer--dev-idp-z2-95b7b-z2fd5 | Burstable | 143m | Succeeded
service-1-backup-r68w-rftrw | cao-postgres | 0 | 0.000 | Job | 0 | shoot--barmer--dev-idp-z2-95b7b-z2fd5 | Burstable | 3h45m | Succeeded
service-1-backup-pt29-g9zjj | cao-postgres | 0 | 0.000 | Job | 0 | shoot--barmer--dev-idp-z2-95b7b-z2fd5 | Burstable | 126m | Succeeded
service-1-backup-p84l-m5gck | cao-postgres | 0 | 0.000 | Job | 0 | shoot--barmer--dev-idp-z2-95b7b-z2fd5 | Burstable | 3h29m | Succeeded
service-1-backup-nsp2-8wjpj | cao-postgres | 0 | 0.000 | Job | 0 | shoot--barmer--dev-idp-z2-95b7b-z2fd5 | Burstable | 3h13m | Succeeded
service-1-backup-ld6l-xc2ks | cao-postgres | 0 | 0.000 | Job | 0 | shoot--barmer--dev-idp-z2-95b7b-z2fd5 | Burstable | 4h34m | Succeeded
service-1-backup-dqh7-z6p4n | cao-postgres | 0 | 0.000 | Job | 0 | shoot--barmer--dev-idp-z2-95b7b-z2fd5 | Burstable | 5m20s | Failed
service-1-backup-dqh7-slv9m | cao-postgres | 0 | 0.000 | Job | 0 | shoot--barmer--dev-idp-z2-95b7b-z2fd5 | Burstable | 4m37s | Failed
service-1-backup-dqh7-krqqn | cao-postgres | 0 | 0.000 | Job | 0 | shoot--barmer--dev-idp-z2-95b7b-z2fd5 | Burstable | 89s | Failed
service-1-backup-dqh7-htflx | cao-postgres | 0 | 0.000 | Job | 0 | shoot--barmer--dev-idp-z2-95b7b-z2fd5 | Burstable | 4m1s | Failed
service-1-backup-dqh7-b8k2t | cao-postgres | 0 | 0.000 | Job | 0 | shoot--barmer--dev-idp-z2-95b7b-z2fd5 | Burstable | 3m5s | Failed
service-1-backup-dqh7-7rw5f | cao-postgres | 0 | 0.000 | Job | 0 | shoot--barmer--dev-idp-z2-95b7b-z2fd5 | Burstable | 5m3s | Failed
service-1-backup-8x26-w88ql | cao-postgres | 0 | 0.000 | Job | 0 | shoot--barmer--dev-idp-z2-95b7b-z2fd5 | Burstable | 4h19m | Succeeded
pgo-pg-operator-7547f49f46-nxnqh | cao-postgres | 0 | 0.021 | ReplicaSet | 313.7MiB | shoot--barmer--dev-idp-z1-85f5b-q56sc | Burstable | 3d22h | Running
pg-prometheus-postgres-exporter-67cc44d89c-tltsg | cao-postgres | 0 | 0.006 | ReplicaSet | 134.5MiB | shoot--barmer--dev-idp-z3-5f9d4-4ss2s | Burstable | 2d23h | Running
pg-pgbackup-clean-546fc6f88c-fw4r8 | cao-postgres | 0 | 0.023 | ReplicaSet | 85.9MiB | shoot--barmer--dev-idp-z1-85f5b-q56sc | Burstable | 2d23h | Running
pg | cao-postgres | 0 | 0.004 | - | 86.9MiB | shoot--barmer--dev-idp-z3-5f9d4-t4hw5 | Burstable | 17d | Running
Logs from the failed backup job (namespace cao-postgres, Job service-1-backup-dqh7, pod service-1-backup-dqh7-krqqn, container pgbackrest):
Waiting for 10 seconds
Running pgbackrest
time="2024-11-04T14:57:54Z" level=info msg="crunchy-pgbackrest starts"
time="2024-11-04T14:57:54Z" level=info msg="debug flag set to false"
time="2024-11-04T14:57:54Z" level=info msg="backrest backup command requested"
time="2024-11-04T14:57:54Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=1 --annotation=\"percona.com/backup-job-type\"=\"replica-create\"]"
time="2024-11-04T14:57:54Z" level=info msg="output=[]"
time="2024-11-04T14:57:54Z" level=info msg="stderr=[ERROR: [055]: unable to load info file '/pgbackrest/repo1/backup/db/backup.info' or '/pgbackrest/repo1/backup/db/backup.info.copy':\n FileMissingError: unable to open missing file '/pgbackrest/repo1/backup/db/backup.info' for read\n FileMissingError: unable to open missing file '/pgbackrest/repo1/backup/db/backup.info.copy' for read\n HINT: backup.info cannot be opened and is required to perform a backup.\n HINT: has a stanza-create been performed?\n]"
time="2024-11-04T14:57:54Z" level=fatal msg="command terminated with exit code 55"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
3 0 3 0 0 750 0 --:--:-- --:--:-- --:--:-- 750
OK
Failed to run pgbackrest
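The path in that error comes straight from the repo host volume, so it can be checked directly. A sketch, using the repo-host pod from the list above (the container name -c pgbackrest is an assumption):

# Check whether the repo1 stanza directory and backup.info actually exist on the repo host volume
kubectl -n cao-postgres exec service-1-repo-host-0 -c pgbackrest -- \
  ls -la /pgbackrest/repo1/backup/db/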
Initial backup values:
backups:
  pgbackrest:
    image: mboncalo/percona-postgresql-operator:2.4.1-ppg16.3-pgbackrest2.51-1
    configuration:
      - secret:
          name: cluster1-pgbackrest-secrets
    repoHost:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    postgres-operator.crunchydata.com/data: pgbackrest
                topologyKey: kubernetes.io/hostname
    manual:
      repoName: repo2
      options:
        - --type=full
    repos:
      - name: repo1
        schedules:
          full: "0 4 * * *"
          incremental: "*/43 * * * *"
        volume:
          volumeClaimSpec:
            storageClassName: "default"
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 200Gi
      - name: repo2
        s3:
          bucket: "29b03b4896302b9297c50160c9a3453c28fbec2d31b3203f98326c82"
          endpoint: "minio.apps.cluster.local"
          region: "default"
        schedules:
          full: "0 3 * * *"
          incremental: "*/17 * * * *"
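For context, the manual and schedules entries above end up as plain pgbackrest invocations inside the backup job pods, as visible in the job logs. The exact flags are generated by the operator; this is just the rough shape:

# repo2 full backup, as triggered by the manual section or the "0 3 * * *" schedule
pgbackrest backup --stanza=db --repo=2 --type=full

# repo2 incremental backup, as triggered by the "*/17 * * * *" schedule
pgbackrest backup --stanza=db --repo=2 --type=incr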
Backup values for the first upgrade:
backups:
  pgbackrest:
    image: mboncalo/percona-postgresql-operator:2.4.1-ppg16.3-pgbackrest2.51-1
    configuration:
      - secret:
          name: cluster1-pgbackrest-secrets
    repoHost:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    postgres-operator.crunchydata.com/data: pgbackrest
                topologyKey: kubernetes.io/hostname
    manual:
      repoName: repo2
      options:
        - --type=full
    repos:
      # - name: repo1
      #   schedules:
      #     full: "0 4 * * *"
      #     incremental: "*/43 * * * *"
      #   volume:
      #     volumeClaimSpec:
      #       storageClassName: "default"
      #       accessModes:
      #         - ReadWriteOnce
      #       resources:
      #         requests:
      #           storage: 200Gi
      - name: repo2
        s3:
          bucket: "29b03b4896302b9297c50160c9a3453c28fbec2d31b3203f98326c82"
          endpoint: "minio.apps.cluster.local"
          region: "default"
        schedules:
          full: "0 3 * * *"
          incremental: "*/17 * * * *"
And the last backup values which were supposed to fix everything:
backups:
  pgbackrest:
    image: mboncalo/percona-postgresql-operator:2.4.1-ppg16.3-pgbackrest2.51-1
    configuration:
      - secret:
          name: cluster1-pgbackrest-secrets
    repoHost:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    postgres-operator.crunchydata.com/data: pgbackrest
                topologyKey: kubernetes.io/hostname
    manual:
      repoName: repo2
      options:
        - --type=full
    repos:
      - name: repo1
        schedules:
          full: "0 4 * * 6"
          # incremental: "*/43 * * * *"
        volume:
          volumeClaimSpec:
            storageClassName: "default"
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 200Gi
      - name: repo2
        s3:
          bucket: "29b03b4896302b9297c50160c9a3453c28fbec2d31b3203f98326c82"
          endpoint: "minio.apps.cluster.local"
          region: "default"
        schedules:
          full: "0 3 * * *"
          incremental: "*/17 * * * *"
Pgbackrest info:

stanza: db
    status: mixed
            repo1: error (missing stanza path)
            repo2: ok
    cipher: none

    db (current)
        wal archive min/max (15): 000000010000000000000004/000000040000001200000028

        full backup: 20241101-151723F
ERROR: [064]: unable to write to fd
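Given the "missing stanza path" status on repo1 and the stanza-create hint in the job log, re-creating the stanza manually from the repo host would look roughly like the sketch below. The operator normally runs stanza-create itself, and the container name is an assumption, so I am not sure this is the intended fix:

# Re-create the stanza (acts on all configured repos unless limited with --repo)
kubectl -n cao-postgres exec service-1-repo-host-0 -c pgbackrest -- \
  pgbackrest stanza-create --stanza=db

# Verify the repo status afterwards
kubectl -n cao-postgres exec service-1-repo-host-0 -c pgbackrest -- \
  pgbackrest info --stanza=db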