Percona PostgreSQL Operator: Restore Fails with ‘No Completed Backups Found’ Error Despite Successful Backups in AWS S3
Hi Everyone,
I’m reaching out after spending weeks troubleshooting this issue and closely following the documentation. I’m hoping someone here might have encountered a similar problem and can help.
Goal:
I want to restore a full backup from an existing running PostgreSQL cluster to a new cluster.
Infrastructure and Configuration:
- I have an EKS cluster running with the following namespaces and components:
- pg-op namespace
- pg-db1 namespace
- pg-db2 namespace
- pg-op: Contains the PostgreSQL Operator and Percona Monitoring and Management (PMM), both deployed using their respective Helm charts:
- Helm Chart: Percona Operator for PostgreSQL
- Helm Chart: Percona Monitoring and Management (PMM)
$ helm install pmm percona/pmm -n pg-op
$ helm install pg-operator percona/pg-operator -f pg-operator.yml -n pg-op
Note: for reference, I’m attaching the pg-operator.yml file below.
// pg-operator.yml
replicaCount: 2
operatorImageRepository: percona/percona-postgresql-operator
imagePullPolicy: IfNotPresent
image: ""
watchAllNamespaces: true
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
resources:
  limits:
    cpu: 200m
    memory: 500Mi
  requests:
    cpu: 100m
    memory: 250Mi
nodeSelector: {}
tolerations: []
affinity: {}
podAnnotations: {}
disableTelemetry: false
logStructured: true
logLevel: "INFO"
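Both Helm releases come up cleanly. As a basic sanity check (a rough sketch of the kind of commands I run, not exact output), I verify the operator pods and that the operator can see clusters in other namespaces, since watchAllNamespaces is set to true:
$ kubectl get pods -n pg-op
$ kubectl get perconapgcluster --all-namespaces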
- pg-db1: Contains a PostgreSQL cluster deployed with the following configuration:
- Helm Chart: Percona Distribution for PostgreSQL
- Cluster Name: cluster1
$ helm install pg-db percona/pg-db -f pg-db1.yml -n pg-db1
Note: for reference, I’m attaching the pg-db1.yml file below.
// pg-db1.yml
finalizers:
fullnameOverride: cluster1
crVersion: 2.5.0
repository: percona/percona-postgresql-operator
image: percona/percona-postgresql-operator:2.5.0-ppg13.16-postgres
imagePullPolicy: Always
postgresVersion: 13
# port: 5432
pause: false
unmanaged: false
standby:
  enabled: false
  # host: "<primary-ip>"
  # port: "<primary-port>"
  # repoName: repo1
customTLSSecret:
  name: ""
customReplicationTLSSecret:
  name: ""
instances:
  - name: instance1
    replicas: 3
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/data: postgres
              topologyKey: kubernetes.io/hostname
    resources:
      requests:
        cpu: 2.0
        memory: 4Gi
      limits:
        cpu: 2.0
        memory: 4Gi
    dataVolumeClaimSpec:
      storageClassName: "gp3-storage"
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 50Gi
proxy:
  pgBouncer:
    replicas: 3
    image: percona/percona-postgresql-operator:2.5.0-ppg16.4-pgbouncer1.23.1
    exposeSuperusers: true
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 2Gi
    expose:
      type: LoadBalancer
      loadBalancerSourceRanges:
        - 0.0.0.0/0 # Restrict to specific IP ranges in production
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: nlb
        service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
        service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/role: pgbouncer
              topologyKey: kubernetes.io/hostname
backups:
  pgbackrest:
    configuration:
      - secret:
          name: s3-pgbackrest-secrets
    global:
      # repo1-retention-full: "2"
      # repo1-retention-full-type: count
      repo2-retention-full: "2"
      repo2-retention-full-type: time
      repo2-retention-diff: "7"
      # repo1-retention-diff-type: count
      # repo2-retention-full: "14"
      # repo2-retention-diff: "5"
      # repo2-retention-full-type: time
      # repo1-path: /pgbackrest/postgres-operator/cluster1/repo1
      # repo1-cipher-type: aes-256-cbc
      # repo1-s3-uri-style: path
      # repo2-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo2
      # repo3-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo3
      # repo4-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo4
    repoHost:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    postgres-operator.crunchydata.com/data: pgbackrest
                topologyKey: kubernetes.io/hostname
    manual:
      repoName: repo2
      options:
        - --type=full
    repos:
      - name: repo2
        s3:
          bucket: pgbackrest-backup
          endpoint: s3.us-east-1.amazonaws.com
          region: us-east-1
        schedules:
          full: "0 */2 * * *" # Full backup every 2 hours
          differential: "0 * * * *" # Differential backup every hour
pmm:
  enabled: true
  image:
    repository: percona/pmm-client
    tag: 2.43.2
  imagePullPolicy: IfNotPresent
  secret: pmm-secret
  serverHost: monitoring-service.pg-op.svc
  querySource: pgstatmonitor
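To double-check the backup configuration that actually gets applied for cluster1, I look at the CR and at pgBackRest on the repo host (a sketch of the checks; the repo-host pod name below is how it appears in my namespace, and db is, as far as I understand, the default stanza name used by the operator):
$ kubectl get perconapgcluster cluster1 -n pg-db1 -o yaml
$ kubectl exec -it cluster1-repo-host-0 -n pg-db1 -- pgbackrest info --stanza=db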
- pg-db2: Target namespace where I want to restore the backup into a new PostgreSQL cluster, also deployed using the same Helm chart:
- Helm Chart: Percona Distribution for PostgreSQL
$ helm install pg-db percona/pg-db -f pg-db2.yml -n pg-db2
Note: for reference, I’m attaching the pg-db2.yml file below.
// pg-db2.yml
finalizers:
fullnameOverride: cluster2
crVersion: 2.5.0
repository: percona/percona-postgresql-operator
image: percona/percona-postgresql-operator:2.5.0-ppg13.16-postgres
imagePullPolicy: Always
postgresVersion: 13
# port: 5432
pause: false
unmanaged: false
standby:
  enabled: false
  # host: "<primary-ip>"
  # port: "<primary-port>"
  # repoName: repo1
customTLSSecret:
  name: ""
customReplicationTLSSecret:
  name: ""
dataSource:
  postgresCluster:
    clusterName: "cluster1"
    clusterNamespace: "pg-db1"
    repoName: "repo2"
    options:
      - --type=time
      - --target="2024-12-08 08:35:00+00"
  pgbackrest:
    stanza: db
    configuration:
      - secret:
          name: s3-pgbackrest-secrets
    # global:
    #   repo1-path: /pgbackrest/postgres-operator/hippo/repo1
    repo:
      name: "repo2"
      s3:
        bucket: "pgbackrest-backup"
        endpoint: "s3.us-east-1.amazonaws.com"
        region: "us-east-1"
instances:
  - name: instance1
    replicas: 3
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/data: postgres
              topologyKey: kubernetes.io/hostname
    resources:
      requests:
        cpu: 2.0
        memory: 4Gi
      limits:
        cpu: 2.0
        memory: 4Gi
    dataVolumeClaimSpec:
      storageClassName: "gp3-storage"
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 50Gi
proxy:
  pgBouncer:
    replicas: 3
    image: percona/percona-postgresql-operator:2.5.0-ppg16.4-pgbouncer1.23.1
    exposeSuperusers: true
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 2Gi
    expose:
      type: LoadBalancer
      loadBalancerSourceRanges:
        - 0.0.0.0/0 # Restrict to specific IP ranges in production
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: nlb
        service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
        service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/role: pgbouncer
              topologyKey: kubernetes.io/hostname
backups:
  trackLatestRestorableTime: true
  pgbackrest:
    # metadata:
    #   labels:
    image: percona/percona-postgresql-operator:2.5.0-ppg16.4-pgbackrest2.53-1
    configuration:
      - secret:
          name: s3-pgbackrest-secrets
    global:
      # repo1-retention-full: "2"
      # repo1-retention-full-type: count
      repo2-retention-full: "2"
      repo2-retention-full-type: time
      repo2-retention-diff: "7"
      # repo2-retention-full: "2"
      # repo2-retention-full-type: count
      # repo2-retention-diff: "4"
      # repo1-retention-diff-type: count
      # repo2-retention-full: "14"
      # repo2-retention-diff: "5"
      # repo2-retention-full-type: time
      # repo1-path: /pgbackrest/postgres-operator/cluster1/repo1
      # repo1-cipher-type: aes-256-cbc
      # repo1-s3-uri-style: path
      # repo2-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo2
      # repo3-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo3
      # repo4-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo4
    repoHost:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    postgres-operator.crunchydata.com/data: pgbackrest
                topologyKey: kubernetes.io/hostname
    manual:
      repoName: repo2
      options:
        - --type=full
    repos:
      - name: repo1
        schedules:
          full: "0 0 * * 6"
        volume:
          volumeClaimSpec:
            storageClassName: "gp3-storage"
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
pmm:
  enabled: true
  image:
    repository: percona/pmm-client
    tag: 2.43.2
  imagePullPolicy: IfNotPresent
  secret: pmm-secret
  serverHost: monitoring-service.pg-op.svc
  querySource: pgstatmonitor
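Because the restore’s pgbackrest block points at the same s3-pgbackrest-secrets secret as the source cluster, I also make sure that secret exists in both namespaces before deploying (simple sketch):
$ kubectl get secret s3-pgbackrest-secrets -n pg-db1
$ kubectl get secret s3-pgbackrest-secrets -n pg-db2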
Backup Details:
In the pg-db1 namespace, I have successfully set up scheduled backups to an AWS S3 bucket. The backups are created and stored without issues.
- I have validated the backups; they appear complete, and I can see them in the S3 bucket.
- The PostgreSQL cluster (cluster1) is functioning properly; I’ve inserted data and verified connectivity from my local machine.
Note: see the attached screenshot showing the completed backups.
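For example, the backup sets show up both in S3 and as backup objects in the source namespace (rough sketch of the checks; the bucket name is the one from my config above):
$ aws s3 ls s3://pgbackrest-backup/ --recursive --human-readable
$ kubectl get perconapgbackup -n pg-db1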
Restore Process and Issue:
I followed the documentation to restore the backup to a new PostgreSQL cluster in the pg-db2 namespace. After deploying the new cluster, no pods are starting in the pg-db2 namespace.
Upon investigating the logs of the PostgreSQL Operator, I found the following error:
// ERRORS
{"level":"error","ts":1733931658.536926,"logger":"WALWatcher","msg":"get latest backup","controller":"perconapgcluster","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGCluster","PerconaPGCluster":{"name":"cluster2","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2","reconcileID":"929a3632-1807-46e5-a082-06403dfe52c3","error":"no completed backups found","errorVerbose":"no completed backups found\ngithub.com/percona/percona-postgresql-operator/percona/watcher.getLatestBackup\n\t/go/src/github.com/percona/percona-postgresql-operator/percona/watcher/wal.go:129\ngithub.com/percona/percona-postgresql-operator/percona/watcher.WatchCommitTimestamps\n\t/go/src/github.com/percona/percona-postgresql-operator/percona/watcher/wal.go:65\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695","stacktrace":"runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"}
{"level":"info","ts":1733931659.7483046,"msg":"Waiting for restore to start","controller":"perconapgrestore","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGRestore","PerconaPGRestore":{"name":"cluster2-bootstrap","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2-bootstrap","reconcileID":"83799672-1e80-4882-bfc9-5c08ae6924f5","request":{"name":"cluster2-bootstrap","namespace":"pg-db2"}}
{"level":"info","ts":1733931664.7487805,"msg":"Waiting for restore to start","controller":"perconapgrestore","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGRestore","PerconaPGRestore":{"name":"cluster2-bootstrap","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2-bootstrap","reconcileID":"d61b09cf-8470-41de-a53a-62c80cee047e","request":{"name":"cluster2-bootstrap","namespace":"pg-db2"}}
{"level":"error","ts":1733931668.5370939,"logger":"WALWatcher","msg":"get latest backup","controller":"perconapgcluster","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGCluster","PerconaPGCluster":{"name":"cluster2","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2","reconcileID":"929a3632-1807-46e5-a082-06403dfe52c3","error":"no completed backups found","errorVerbose":"no completed backups found\ngithub.com/percona/percona-postgresql-operator/percona/watcher.getLatestBackup\n\t/go/src/github.com/percona/percona-postgresql-operator/percona/watcher/wal.go:129\ngithub.com/percona/percona-postgresql-operator/percona/watcher.WatchCommitTimestamps\n\t/go/src/github.com/percona/percona-postgresql-operator/percona/watcher/wal.go:65\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695","stacktrace":"runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"}
{"level":"info","ts":1733931669.7504153,"msg":"Waiting for restore to start","controller":"perconapgrestore","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGRestore","PerconaPGRestore":{"name":"cluster2-bootstrap","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2-bootstrap","reconcileID":"02813d8d-e52c-4125-a5ba-70fea00434d0","request":{"name":"cluster2-bootstrap","namespace":"pg-db2"}}
The error clearly states: “No completed backups found”, even though multiple completed backups exist in the S3 bucket.
What I’ve Tried:
- Verified that the backups in the S3 bucket are complete and accessible.
- Double-checked the restore configuration in my pg-db2.yml values file.
- Followed the official Percona documentation step-by-step.
Despite this, the restore process fails to start, and the pods in the target namespace (pg-db2) don’t come up.
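For completeness, these are roughly the commands I keep re-running while the restore is stuck (output trimmed; I’m using the full CRD kinds in case short names differ between versions):
$ kubectl get pods -n pg-db2
$ kubectl get perconapgrestore -n pg-db2
$ kubectl describe perconapgcluster cluster2 -n pg-db2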
Request for Help:
Has anyone encountered this issue?
- Is there something specific I need to configure for the operator to recognize the backups in S3?
- Could this be a permissions issue with S3 or something misconfigured in the operator’s restore logic?
I’m attaching screenshots of the completed backups, the target cluster configuration, and the error logs for reference.
Any insights or suggestions would be highly appreciated. Thank you!