Percona PostgreSQL Operator: Restore Fails with 'No Completed Backups Found' Error Despite Successful Backups in AWS S3

:raising_hand_man: Hi everyone,

I’m reaching out after spending weeks troubleshooting this issue and closely following the documentation. I’m hoping someone here might have encountered a similar problem and can help.

Goal:
I want to restore a full backup from an existing running PostgreSQL cluster to a new cluster.

Infrastructure and Configuration:

  • I have an EKS cluster running with three namespaces: pg-op, pg-db1, and pg-db2.
  1. pg-op: Contains the PostgreSQL Operator and Percona Monitoring and Management (PMM), both deployed using their respective Helm charts:
  • Helm Chart: Percona Operator for PostgreSQL
  • Helm Chart: Percona Monitoring and Management (PMM)
$ helm install pmm percona/pmm -n pg-op

$ helm install pg-operator percona/pg-operator -f pg-operator.yml -n pg-op
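
Both releases install cleanly; I confirm the operator and PMM pods are running with:

$ kubectl get pods -n pg-op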

Note: for your reference, I’m attaching the pg-operator.yml file below

// pg-operator.yml


replicaCount: 2

operatorImageRepository: percona/percona-postgresql-operator
imagePullPolicy: IfNotPresent
image: ""

watchAllNamespaces: true

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

resources:
  limits:
    cpu: 200m
    memory: 500Mi
  requests:
    cpu: 100m
    memory: 250Mi

nodeSelector: {}

tolerations: []

affinity: {}

podAnnotations: {}

disableTelemetry: false

logStructured: true
logLevel: "INFO"

  2. pg-db1: Contains a PostgreSQL cluster deployed with the following configuration:
  • Helm Chart: Percona Distribution for PostgreSQL
  • Cluster Name: cluster1
$ helm install pg-db percona/pg-db -f pg-db1.yml -n pg-db1
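
I then wait for the pods to come up (three postgres instance pods plus pgBouncer, per the replica counts below):

$ kubectl get pods -n pg-db1 --watch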

Note: for your reference, I’m attaching the pg-db1.yml file below

// pg-db1.yml


finalizers:

fullnameOverride: cluster1

crVersion: 2.5.0
repository: percona/percona-postgresql-operator
image: percona/percona-postgresql-operator:2.5.0-ppg13.16-postgres
imagePullPolicy: Always
postgresVersion: 13
# port: 5432
pause: false
unmanaged: false
standby:
  enabled: false
  # host: "<primary-ip>"
  # port: "<primary-port>"
  # repoName: repo1

customTLSSecret:
  name: ""
customReplicationTLSSecret:
  name: ""

instances:
- name: instance1
  replicas: 3

  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchLabels:
              postgres-operator.crunchydata.com/data: postgres
          topologyKey: kubernetes.io/hostname

  resources:
    requests:
      cpu: 2.0
      memory: 4Gi
    limits:
      cpu: 2.0
      memory: 4Gi
  dataVolumeClaimSpec:
    storageClassName: "gp3-storage"
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Gi

proxy:
  pgBouncer:
    replicas: 3
    image: percona/percona-postgresql-operator:2.5.0-ppg16.4-pgbouncer1.23.1
    exposeSuperusers: true
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 2Gi
    expose:
      type: LoadBalancer
      loadBalancerSourceRanges:
        - 0.0.0.0/0  # Restrict to specific IP ranges in production
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: nlb
        service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
        service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"

    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          podAffinityTerm:
            labelSelector:
              matchLabels:
                postgres-operator.crunchydata.com/role: pgbouncer
            topologyKey: kubernetes.io/hostname

backups:
  pgbackrest:
    configuration:
      - secret:
          name: s3-pgbackrest-secrets
    global:
      # repo1-retention-full: "2"
      # repo1-retention-full-type: count
      repo2-retention-full: "2"
      repo2-retention-full-type: time
      repo2-retention-diff: "7"
      # repo1-retention-diff-type: count
    #   repo2-retention-full: "14"
    #   repo2-retention-diff: "5"
    #   repo2-retention-full-type: time
#      repo1-path: /pgbackrest/postgres-operator/cluster1/repo1
#      repo1-cipher-type: aes-256-cbc
#      repo1-s3-uri-style: path
#      repo2-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo2
#      repo3-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo3
#      repo4-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo4

    repoHost:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/data: pgbackrest
              topologyKey: kubernetes.io/hostname


    manual:
      repoName: repo2
      options:
      - --type=full
    repos:
    - name: repo2
      s3:
        bucket: pgbackrest-backup
        endpoint: s3.us-east-1.amazonaws.com
        region: us-east-1
      schedules:
        full: "0 */2 * * *"          # Full backup every Sunday at midnight
        differential: "0 * * * *" # Differential backup every day except Sunday

pmm:
  enabled: true
  image:
    repository: percona/pmm-client
    tag: 2.43.2
  imagePullPolicy: IfNotPresent
  secret: pmm-secret
  serverHost: monitoring-service.pg-op.svc
  querySource: pgstatmonitor
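
Note: the s3-pgbackrest-secrets secret referenced under configuration above holds the pgBackRest S3 credentials in the ini format from the Percona docs. Mine was created roughly like this (keys redacted; the repo2-* option names match the repo name used above), and the same secret also exists in pg-db2 for the restore:

// s3.conf

[global]
repo2-s3-key=<AWS_ACCESS_KEY_ID>
repo2-s3-key-secret=<AWS_SECRET_ACCESS_KEY>

$ kubectl create secret generic s3-pgbackrest-secrets --from-file=s3.conf -n pg-db1
$ kubectl create secret generic s3-pgbackrest-secrets --from-file=s3.conf -n pg-db2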

  3. pg-db2: Target namespace where I want to restore the backup into a new PostgreSQL cluster, also deployed using the same Helm chart:
  • Helm Chart: Percona Distribution for PostgreSQL
$ helm install pg-db percona/pg-db -f pg-db2.yml -n pg-db2

Note: for your reference, I’m attaching the pg-db2.yml file below

// pg-db2.yml


finalizers:

fullnameOverride: cluster2

crVersion: 2.5.0
repository: percona/percona-postgresql-operator
image: percona/percona-postgresql-operator:2.5.0-ppg13.16-postgres
imagePullPolicy: Always
postgresVersion: 13
# port: 5432
pause: false
unmanaged: false
standby:
  enabled: false
  # host: "<primary-ip>"
  # port: "<primary-port>"
  # repoName: repo1

customTLSSecret:
  name: ""
customReplicationTLSSecret:
  name: ""


dataSource:
  postgresCluster:
    clusterName: "cluster1"
    clusterNamespace: "pg-db1"
    repoName: "repo2"
    options:
    - --type=time
    - --target="2024-12-08 08:35:00+00"
  pgbackrest:
    stanza: db
    configuration:
    - secret:
        name: s3-pgbackrest-secrets
    # global:
    #   repo1-path: /pgbackrest/postgres-operator/hippo/repo1
    repo:
      name: "repo2"
      s3:
        bucket: "pgbackrest-backup"
        endpoint: "s3.us-east-1.amazonaws.com"
        region: "us-east-1"

instances:
- name: instance1
  replicas: 3

  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchLabels:
              postgres-operator.crunchydata.com/data: postgres
          topologyKey: kubernetes.io/hostname

  resources:
    requests:
      cpu: 2.0
      memory: 4Gi
    limits:
      cpu: 2.0
      memory: 4Gi
  dataVolumeClaimSpec:
    storageClassName: "gp3-storage"
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Gi

proxy:
  pgBouncer:
    replicas: 3
    image: percona/percona-postgresql-operator:2.5.0-ppg16.4-pgbouncer1.23.1
    exposeSuperusers: true
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 2Gi
    expose:
      type: LoadBalancer
      loadBalancerSourceRanges:
        - 0.0.0.0/0  # Restrict to specific IP ranges in production
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: nlb
        service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
        service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"

    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          podAffinityTerm:
            labelSelector:
              matchLabels:
                postgres-operator.crunchydata.com/role: pgbouncer
            topologyKey: kubernetes.io/hostname

backups:
  trackLatestRestorableTime: true
  pgbackrest:
#    metadata:
#    labels:
    image: percona/percona-postgresql-operator:2.5.0-ppg16.4-pgbackrest2.53-1
    configuration:
      - secret:
          name: s3-pgbackrest-secrets
    global:
      # repo1-retention-full: "2"
      # repo1-retention-full-type: count
      repo2-retention-full: "2"
      repo2-retention-full-type: time
      repo2-retention-diff: "7"
      # repo2-retention-full: "2"
      # repo2-retention-full-type: count
      # repo2-retention-diff: "4"
      # repo1-retention-diff-type: count
    #   repo2-retention-full: "14"
    #   repo2-retention-diff: "5"
    #   repo2-retention-full-type: time
#      repo1-path: /pgbackrest/postgres-operator/cluster1/repo1
#      repo1-cipher-type: aes-256-cbc
#      repo1-s3-uri-style: path
#      repo2-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo2
#      repo3-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo3
#      repo4-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo4

    repoHost:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/data: pgbackrest
              topologyKey: kubernetes.io/hostname


    manual:
      repoName: repo2
      options:
      - --type=full
    repos:
    - name: repo1
      schedules:
        full: "0 0 * * 6"
      volume:
        volumeClaimSpec:
          storageClassName: "gp3-storage"
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 20Gi
pmm:
  enabled: true
  image:
    repository: percona/pmm-client
    tag: 2.43.2
  imagePullPolicy: IfNotPresent
  secret: pmm-secret
  serverHost: monitoring-service.pg-op.svc
  querySource: pgstatmonitor


:partly_sunny: Backup Details:

In the pg-db1 namespace, I have successfully set up scheduled backups to an AWS S3 bucket. The backups are created and stored without issues.

  • I have validated the backups; they appear complete, and I can see them in the S3 bucket.
  • The PostgreSQL cluster (cluster1) is functioning properly; I’ve inserted data and verified connectivity from my local machine.

Note: see the screenshot below verifying the backups
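
Besides the S3 console, the cluster and its backup objects also report healthy on the Kubernetes side (if I recall correctly, pg and pg-backup are the short names for the perconapgcluster and perconapgbackup resources):

$ kubectl get pg -n pg-db1
$ kubectl get pg-backup -n pg-db1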

:no_entry_sign: Restore Process and Issue:

I followed the documentation to restore the backup to a new PostgreSQL cluster in the pg-db2 namespace. After deploying the new cluster, however, no pods come up there.

Upon investigating the logs of the PostgreSQL Operator, I found the following error:

// ERRORS

{"level":"error","ts":1733931658.536926,"logger":"WALWatcher","msg":"get latest backup","controller":"perconapgcluster","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGCluster","PerconaPGCluster":{"name":"cluster2","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2","reconcileID":"929a3632-1807-46e5-a082-06403dfe52c3","error":"no completed backups found","errorVerbose":"no completed backups found\ngithub.com/percona/percona-postgresql-operator/percona/watcher.getLatestBackup\n\t/go/src/github.com/percona/percona-postgresql-operator/percona/watcher/wal.go:129\ngithub.com/percona/percona-postgresql-operator/percona/watcher.WatchCommitTimestamps\n\t/go/src/github.com/percona/percona-postgresql-operator/percona/watcher/wal.go:65\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695","stacktrace":"runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"}
{"level":"info","ts":1733931659.7483046,"msg":"Waiting for restore to start","controller":"perconapgrestore","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGRestore","PerconaPGRestore":{"name":"cluster2-bootstrap","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2-bootstrap","reconcileID":"83799672-1e80-4882-bfc9-5c08ae6924f5","request":{"name":"cluster2-bootstrap","namespace":"pg-db2"}}
{"level":"info","ts":1733931664.7487805,"msg":"Waiting for restore to start","controller":"perconapgrestore","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGRestore","PerconaPGRestore":{"name":"cluster2-bootstrap","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2-bootstrap","reconcileID":"d61b09cf-8470-41de-a53a-62c80cee047e","request":{"name":"cluster2-bootstrap","namespace":"pg-db2"}}
{"level":"error","ts":1733931668.5370939,"logger":"WALWatcher","msg":"get latest backup","controller":"perconapgcluster","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGCluster","PerconaPGCluster":{"name":"cluster2","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2","reconcileID":"929a3632-1807-46e5-a082-06403dfe52c3","error":"no completed backups found","errorVerbose":"no completed backups found\ngithub.com/percona/percona-postgresql-operator/percona/watcher.getLatestBackup\n\t/go/src/github.com/percona/percona-postgresql-operator/percona/watcher/wal.go:129\ngithub.com/percona/percona-postgresql-operator/percona/watcher.WatchCommitTimestamps\n\t/go/src/github.com/percona/percona-postgresql-operator/percona/watcher/wal.go:65\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695","stacktrace":"runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"}
{"level":"info","ts":1733931669.7504153,"msg":"Waiting for restore to start","controller":"perconapgrestore","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGRestore","PerconaPGRestore":{"name":"cluster2-bootstrap","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2-bootstrap","reconcileID":"02813d8d-e52c-4125-a5ba-70fea00434d0","request":{"name":"cluster2-bootstrap","namespace":"pg-db2"}}


The error clearly states: “No completed backups found”, even though multiple completed backups exist in the S3 bucket.

:pensive: What I’ve Tried:

  1. Verified that the backups in the S3 bucket are complete and accessible.
  2. Double-checked the restore configuration in my values.yml file.
  3. Followed the official Percona documentation step-by-step.

Despite this, the restore process fails to start, and the pods in the target namespace (pg-db2) don’t come up.
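
If it helps with debugging, the repository can also be inspected from the source cluster’s repo host; the pod and container names below assume the operator’s default <cluster>-repo-host-0 naming:

$ kubectl exec -n pg-db1 cluster1-repo-host-0 -c pgbackrest -- pgbackrest info --stanza=db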

:pray: Request for Help:

Has anyone encountered this issue?

  • Is there something specific I need to configure for the operator to recognize the backups in S3?
  • Could this be a permissions issue with S3 or something misconfigured in the operator’s restore logic?

I’m attaching screenshots of the completed backups, the target cluster configuration, and the error logs for reference.

Any insights or suggestions would be highly appreciated. Thank you!

Hi @ankitjodhani,
I see your use case, and we even have a test case for it:

You can see which options were set when we applied the new CR.

Hi @Slava_Sarzhan,

Thank you so much for your response! I truly appreciate your guidance and the time you took to assist me.

I’ve made the suggested changes based on your recommendations and the official documentation. Below is the updated dataSource block from my pg-db2.yml (the values file used for the Helm chart of the target cluster):



dataSource:
  postgresCluster:
    clusterName: "cluster1"
    clusterNamespace: "pg-db1"
    repoName: "repo2"
    options:
    - --type=immediate
    - --set=20241212-055801F
  pgbackrest:
    stanza: db
    configuration:
    - secret:
        name: s3-pgbackrest-secrets
    global:
      repo2-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo2
    repo:
      name: "repo2"
      s3:
        bucket: pgbackrest-backup
        endpoint: s3.us-east-1.amazonaws.com
        region: us-east-1

  • Despite implementing these changes, the issue unfortunately persists.

  • The logs still indicate that no completed backups are found, even though I can confirm the backups are present in the S3 bucket and appear complete.

  • I’ve also attached the latest logs and screenshots showing the backups, along with the relevant configurations for additional context.

{"level":"info","ts":1733985295.9201462,"msg":"Waiting for restore to start","controller":"perconapgrestore","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGRestore","PerconaPGRestore":{"name":"cluster2-bootstrap","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2-bootstrap","reconcileID":"2ca4d059-94bb-4f6a-92dd-6ba3ad391fa7","request":{"name":"cluster2-bootstrap","namespace":"pg-db2"}}
{"level":"error","ts":1733985300.5550013,"logger":"WALWatcher","msg":"get latest backup","controller":"perconapgcluster","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGCluster","PerconaPGCluster":{"name":"cluster2","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2","reconcileID":"5b23067c-772d-4400-b4e7-0e5f06f3188a","error":"no completed backups found","errorVerbose":"no completed backups found\ngithub.com/percona/percona-postgresql-operator/percona/watcher.getLatestBackup\n\t/go/src/github.com/percona/percona-postgresql-operator/percona/watcher/wal.go:129\ngithub.com/percona/percona-postgresql-operator/percona/watcher.WatchCommitTimestamps\n\t/go/src/github.com/percona/percona-postgresql-operator/percona/watcher/wal.go:65\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695","stacktrace":"runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"}
{"level":"info","ts":1733985300.9204364,"msg":"Waiting for restore to start","controller":"perconapgrestore","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGRestore","PerconaPGRestore":{"name":"cluster2-bootstrap","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2-bootstrap","reconcileID":"96001b13-efd9-4c0f-9a08-4a747394cc2b","request":{"name":"cluster2-bootstrap","namespace":"pg-db2"}}
{"level":"info","ts":1733985305.9217515,"msg":"Waiting for restore to start","controller":"perconapgrestore","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGRestore","PerconaPGRestore":{"name":"cluster2-bootstrap","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2-bootstrap","reconcileID":"3dd27462-3f8b-4e31-93d9-c00a7989581a","request":{"name":"cluster2-bootstrap","namespace":"pg-db2"}}
{"level":"error","ts":1733985310.554757,"logger":"WALWatcher","msg":"get latest backup","controller":"perconapgcluster","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGCluster","PerconaPGCluster":{"name":"cluster2","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2","reconcileID":"5b23067c-772d-4400-b4e7-0e5f06f3188a","error":"no completed backups found","errorVerbose":"no completed backups found\ngithub.com/percona/percona-postgresql-operator/percona/watcher.getLatestBackup\n\t/go/src/github.com/percona/percona-postgresql-operator/percona/watcher/wal.go:129\ngithub.com/percona/percona-postgresql-operator/percona/watcher.WatchCommitTimestamps\n\t/go/src/github.com/percona/percona-postgresql-operator/percona/watcher/wal.go:65\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695","stacktrace":"runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"}
{"level":"info","ts":1733985310.9222345,"msg":"Waiting for restore to start","controller":"perconapgrestore","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGRestore","PerconaPGRestore":{"name":"cluster2-bootstrap","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2-bootstrap","reconcileID":"9090e3f3-3a67-467f-b17e-e92607ee858b","request":{"name":"cluster2-bootstrap","namespace":"pg-db2"}}
{"level":"info","ts":1733985315.9230106,"msg":"Waiting for restore to start","controller":"perconapgrestore","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGRestore","PerconaPGRestore":{"name":"cluster2-bootstrap","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2-bootstrap","reconcileID":"93103aea-8f3e-4a8d-a4e7-edcee4218ceb","request":{"name":"cluster2-bootstrap","namespace":"pg-db2"}}
{"level":"error","ts":1733985320.5549712,"logger":"WALWatcher","msg":"get latest backup","controller":"perconapgcluster","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGCluster","PerconaPGCluster":{"name":"cluster2","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2","reconcileID":"5b23067c-772d-4400-b4e7-0e5f06f3188a","error":"no completed backups found","errorVerbose":"no completed backups found\ngithub.com/percona/percona-postgresql-operator/percona/watcher.getLatestBackup\n\t/go/src/github.com/percona/percona-postgresql-operator/percona/watcher/wal.go:129\ngithub.com/percona/percona-postgresql-operator/percona/watcher.WatchCommitTimestamps\n\t/go/src/github.com/percona/percona-postgresql-operator/percona/watcher/wal.go:65\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695","stacktrace":"runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"}
{"level":"info","ts":1733985320.9234273,"msg":"Waiting for restore to start","controller":"perconapgrestore","controllerGroup":"pgv2.percona.com","controllerKind":"PerconaPGRestore","PerconaPGRestore":{"name":"cluster2-bootstrap","namespace":"pg-db2"},"namespace":"pg-db2","name":"cluster2-bootstrap","reconcileID":"2312b5cf-3dd7-4307-880e-db95a1185dde","request":{"name":"cluster2-bootstrap","namespace":"pg-db2"}}

I would be grateful if you could review the updates and provide further insights into what might be causing this issue. Let me know if any additional details or debugging information would help.

Thanks again for your continued support!

@ankitjodhani
Do you want to use the existing cluster as a source, or do you want to use the existing backup as a source when you create a new cluster? As far as I can see from your CR, you have mixed these two types and have both.

Hi @Slava_Sarzhan

:pray: Thank you for pointing that out! I appreciate your observation.

To clarify, my goal is to create a new PostgreSQL cluster that can run alongside the existing one. I’m open to using either method to restore the database:

  1. Using the existing cluster as the source.
  2. Using the existing backup as the source.

I don’t have a preference between the two approaches; whichever one achieves a successful restore works for me.

Given this, I’d appreciate your insights on where I might be going wrong. Could the issue be related to how I’ve mixed the configurations for these two methods? If so, could you guide me on how to correctly configure the values.yml file to align with a single method?
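
For example, if restoring from the backup alone is the correct method, I assume the dataSource block should be trimmed to just its pgbackrest part, something like this (my untested reading of the docs, reusing the values from my current file):

// pg-db2.yml (dataSource only)

dataSource:
  pgbackrest:
    stanza: db
    configuration:
    - secret:
        name: s3-pgbackrest-secrets
    global:
      repo2-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo2
    repo:
      name: "repo2"
      s3:
        bucket: "pgbackrest-backup"
        endpoint: "s3.us-east-1.amazonaws.com"
        region: "us-east-1"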

Thank you again for your patience and support. Let me know if you need more details or additional logs for troubleshooting!

@Slava_Sarzhan Could you please share a solution, or links where I can find one?

Thank you