Backup jobs not created anymore

Hi,
Using PG Operator 2.3.1, PostgreSQL 15.
I was testing a backup cleanup script when the operator suddenly stopped creating backup jobs for new PerconaPGBackup resources.
I don't know exactly when the issue started, but I suspect it happened when the script mistakenly deleted some PerconaPGBackups that had not yet completed.
There is no point in posting the script code, because I don't know which change triggered the problem.
Every new PerconaPGBackup now looks like this, with no job created:

apiVersion: pgv2.percona.com/v2
kind: PerconaPGBackup
metadata:
  creationTimestamp: '2024-08-28T06:49:25Z'
  generateName: service-1-pg-db-repo2-incr-
  generation: 1
  managedFields:
    - apiVersion: pgv2.percona.com/v2
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:generateName: {}
        f:spec:
          .: {}
          f:options: {}
          f:pgCluster: {}
          f:repoName: {}
      manager: postgres-operator
      operation: Update
      time: '2024-08-28T06:49:25Z'
  name: service-1-pg-db-repo2-incr-pdsn9
  namespace: postgres
  resourceVersion: '569677870'
  uid: fddf2d7c-b598-4471-a92b-cff304e42051
  selfLink: >-
    /apis/pgv2.percona.com/v2/namespaces/postgres/perconapgbackups/service-1-pg-db-repo2-incr-pdsn9
spec:
  options:
    - '--type=incr'
  pgCluster: service-1-pg-db
  repoName: repo2

Hi,
I continued testing and realised that the backups stop working because I am removing PerconaPGBackup CRs that match a certain pattern.
The script removes PerconaPGBackups, keeping only the 11 most recent ones; in spirit it does something like the sketch below.
In the operator logs it complains that it is waiting for a batch of 11 PerconaPGBackups to complete before starting a new backup.
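
A hedged sketch of that cleanup pattern (not the actual script; the namespace and the count of 11 are taken from this thread):

# List PerconaPGBackups oldest-first, keep the 11 newest, delete the rest
kubectl get perconapgbackup -n postgres \
  --sort-by=.metadata.creationTimestamp \
  -o name | head -n -11 | xargs -r kubectl delete -n postgres

Note that this selects purely by creation time, so an older backup that is still in progress gets deleted too, which matches the mistake described in the first post.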

Hello @mboncalo

I was testing a backup cleanup script when the operator suddenly stopped creating backup jobs for new PerconaPGBackup resources.

Check the Operator logs for any errors related to backup jobs.

Also, check the following for the backup Pod/Job:
List backups:
kubectl get pg-backup

Get the backup pods and check their logs to see why the backup is failing:

kubectl get pods

kubectl logs <failed backup pod>

If you are testing on-demand backups, you can also try taking a simple on-demand backup as described here and see how it works.
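
For reference, a minimal on-demand backup manifest looks roughly like this (a sketch only: the name is arbitrary, and pgCluster/repoName are taken from the YAML in your first post):

apiVersion: pgv2.percona.com/v2
kind: PerconaPGBackup
metadata:
  name: manual-backup-1
  namespace: postgres
spec:
  pgCluster: service-1-pg-db
  repoName: repo1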

I posted an image where you can see how the backups stopped working.
In the first post I included the YAML of the new PerconaPGBackups, which no longer create any backup jobs.
There are no failed pods, because no pods are created at all.

Do you see any backup-job-related error/message in the Operator log?

No backup-related errors.

Hi @mboncalo, I need to see your CR. Could you please provide it for us?

Hi @Slava_Sarzhan,
Sorry for the late reply.
The database is running in an Istio-injected namespace, with Istio version 1.23.3.

You can see that the PostgreSQL database image and the pgBackRest image come from my Docker Hub: I added some libraries to the PostgreSQL image, and in the pgBackRest image I added a curl call that terminates the Istio sidecar container so the backup pod can reach the Completed state (roughly as sketched below).
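
The idea is the usual workaround for Jobs in Istio-injected namespaces: once pgBackRest finishes, the sidecar's pilot-agent is asked to exit so the pod can complete. A minimal sketch, assuming the default pilot-agent status port 15020:

# Ask the istio-proxy sidecar to shut down once the backup command is done
curl -fsS -X POST http://localhost:15020/quitquitquit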

What I am trying to figure out, for more effective troubleshooting, is how the operator stores its metadata. For example, it remembers and complains about a PerconaPGBackup or PerconaPGRestore even after they have been removed. Is there a way to clear that metadata?
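
For context, all I can inspect myself is what is visible on the cluster side (a hedged sketch, plain kubectl, nothing operator-specific):

# Look for leftover backup CRs and any finalizers still set on them
kubectl get perconapgbackup -n postgres -o yaml | grep -B2 -A3 finalizers
# Check the annotations kept on the cluster object itself
kubectl get perconapgcluster service-1 -n postgres -o jsonpath='{.metadata.annotations}'

Here is the CR: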

apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  annotations:
    current-primary: service-1
    meta.helm.sh/release-name: pg
    meta.helm.sh/release-namespace: postgres
  finalizers:
    - percona.com/stop-watchers
  generation: 582
  labels:
    app.kubernetes.io/instance: pg
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: pg-db
    app.kubernetes.io/version: 2.4.1
    crunchy-pgha-scope: service-1
    deployment-name: service-1
    helm.sh/chart: pg-db-2.4.3
    name: service-1
    pg-cluster: service-1
    pgo-version: 2.4.1
    pgouser: admin
  name: service-1
  namespace: postgres
status:
  host: service-1-pgbouncer.cao-postgres.svc
  pgbouncer: {}
  postgres:
    instances:
      - name: instance1
        ready: 3
        size: 3
    ready: 3
    size: 3
  state: ready
spec:
  backups:
    pgbackrest:
      configuration:
        - secret:
            name: cluster1-pgbackrest-secrets
      image: mboncalo/percona-postgresql-operator:2.4.1-ppg16.3-pgbackrest2.51-1
      manual:
        options:
          - '--type=incr'
          - '--annotation="percona.com/backup-name"="service-1-repo1-incr-x4ctt"'
        repoName: repo1
      metadata:
        labels:
          pgv2.percona.com/version: 2.4.1
      repoHost:
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
              - podAffinityTerm:
                  labelSelector:
                    matchLabels:
                      postgres-operator.crunchydata.com/data: pgbackrest
                  topologyKey: kubernetes.io/hostname
                weight: 1
        resources: {}
      repos:
        - name: repo1
          schedules:
            full: 0 4 * * *
            incremental: '*/43 * * * *'
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 200Gi
              storageClassName: default
        - name: repo2
          s3:
            bucket: ***
            endpoint: minio.apps.sw.live
            region: default
          schedules:
            full: 0 3 * * *
            incremental: '*/17 * * * *'
  crVersion: 2.4.1
  extensions:
    builtin:
      pg_audit: false
      pg_stat_monitor: false
    custom:
      - name: pg_stat_statements
      - name: pg_auth_mon
      - name: timescaledb
      - name: pg_stat_kcache
      - name: pg_cron
    image: percona/percona-postgresql-operator:2.4.1
    imagePullPolicy: Always
    storage: {}
  image: mboncalo/percona-postgresql-operator:2.4.1-ppg15.7-postgres
  imagePullPolicy: Always
  instances:
    - affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: postgres-operator.crunchydata.com/cluster
                      operator: In
                      values:
                        - service-1
                topologyKey: failure-domain.beta.kubernetes.io/zone
              weight: 10
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: postgres-operator.crunchydata.com/cluster
                      operator: In
                      values:
                        - service-1
                topologyKey: kubernetes.io/hostname
              weight: 10
      dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 200Gi
      metadata:
        labels:
          pgv2.percona.com/version: 2.4.1
      name: instance1
      replicas: 3
      resources:
        limits:
          cpu: '8'
          memory: 16Gi
        requests:
          cpu: '8'
          memory: 16Gi
  openshift: false
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          checkpoint_completion_target: '0.9'
          default_statistics_target: '100'
          effective_cache_size: 10752MB
          effective_io_concurrency: '200'
          maintenance_work_mem: 896MB
          max_connections: '1024'
          max_parallel_maintenance_workers: '2'
          max_parallel_workers: '4'
          max_parallel_workers_per_gather: '2'
          max_wal_size: 4GB
          max_worker_processes: '4'
          min_wal_size: 1GB
          password_encryption: scram-sha-256
          random_page_cost: '1.1'
          shared_buffers: 3584MB
          shared_preload_libraries: >-
            pg_stat_statements,pg_auth_mon,set_user,timescaledb,pg_cron,pg_stat_kcache
          wal_buffers: 16MB
          work_mem: 1792kB
    leaderLeaseDurationSeconds: 30
    port: 8008
    syncPeriodSeconds: 10
  pause: false
  pmm:
    enabled: false
    image: percona/pmm-client:2.42.0
    resources: {}
    secret: service-1-pmm-secret
    serverHost: monitoring-service
  port: 5432
  postgresVersion: 15
  proxy:
    pgBouncer:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchLabels:
                    postgres-operator.crunchydata.com/role: pgbouncer
                topologyKey: kubernetes.io/hostname
              weight: 1
      config: {}
      exposeSuperusers: true
      image: percona/percona-postgresql-operator:2.4.1-ppg15.7-pgbouncer1.22.1
      metadata:
        labels:
          pgv2.percona.com/version: 2.4.1
      port: 5432
      replicas: 0
      resources: {}
  secrets: {}
  standby:
    enabled: false
  unmanaged: false

And the pgBackRest secret:

[global]
repo1-retention-full=10
repo1-retention-full-type=time
repo1-s3-uri-style=path
repo2-s3-key=***
repo2-s3-key-secret=***
repo2-storage-verify-tls=y
repo2-retention-full=20
repo2-retention-full-type=time
repo2-s3-uri-style=path
archive-timeout=10ms