Hi,
Using PG Operator 2.3.1 with PostgreSQL 15.
I was testing a backup cleanup script when the PerconaPGBackup resources suddenly stopped creating backup jobs.
I don't know exactly when the issue started, but I suspect it happened when the script mistakenly deleted some PerconaPGBackups that had not completed yet.
Posting the script itself would not help much, because I can't tell which change triggered the problem.
Every PerconaPGBackup now looks like this, with no job created:
apiVersion: pgv2.percona.com/v2
kind: PerconaPGBackup
metadata:
  creationTimestamp: '2024-08-28T06:49:25Z'
  generateName: service-1-pg-db-repo2-incr-
  generation: 1
  managedFields:
    - apiVersion: pgv2.percona.com/v2
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:generateName: {}
        f:spec:
          .: {}
          f:options: {}
          f:pgCluster: {}
          f:repoName: {}
      manager: postgres-operator
      operation: Update
      time: '2024-08-28T06:49:25Z'
  name: service-1-pg-db-repo2-incr-pdsn9
  namespace: postgres
  resourceVersion: '569677870'
  uid: fddf2d7c-b598-4471-a92b-cff304e42051
  selfLink: >-
    /apis/pgv2.percona.com/v2/namespaces/postgres/perconapgbackups/service-1-pg-db-repo2-incr-pdsn9
spec:
  options:
    - '--type=incr'
  pgCluster: service-1-pg-db
  repoName: repo2
Hi,
I continued testing and realised that backups stop working because I am removing PerconaPGBackup CRs that match a certain pattern.
The script deletes PerconaPGBackups and keeps only the most recent 11, roughly following the pattern sketched below.
In the operator logs it complains that it is waiting for a batch of 11 PerconaPGBackups to complete before starting a new backup.
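To be concrete, the deletion pattern is roughly this (a simplified sketch with an assumed namespace and a hard-coded retention of 11, not the actual script):

# Keep only the 11 newest PerconaPGBackups and delete everything older.
kubectl get perconapgbackup -n postgres \
  --sort-by=.metadata.creationTimestamp -o name \
  | head -n -11 \
  | xargs -r kubectl delete -n postgres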
Hello @mboncalo
I was testing a backup cleaning script during which the PerconaPGBackup suddenly stopped creating backup jobs.
Check the Operator logs for any errors related to backup jobs.
Also, check the following for the backup Pod/Job.
List the backups:
kubectl get pg-backup
Get the backup pods and check their logs to see why they are failing:
kubectl get pods
kubectl logs <failed backup pod>
If you are testing on-demand backups, you can also try taking a simple on-demand backup as described here and see how it works.
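For reference, a minimal on-demand PerconaPGBackup manifest looks roughly like this (the name, namespace, cluster and repo below are placeholders, adjust them to your setup):

apiVersion: pgv2.percona.com/v2
kind: PerconaPGBackup
metadata:
  name: backup1
  namespace: postgres
spec:
  pgCluster: service-1-pg-db
  repoName: repo1

Apply it with kubectl apply -f backup1.yaml and watch its state with kubectl get pg-backup.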
I posted an image where you can see how the backups stopped working.
In the first post I included the YAML of the new PerconaPGBackups, which no longer create any backup jobs.
There are no failed pods, because no pods are created at all.
Do you see any backup-job-related errors or messages in the Operator log?
Hi @mboncalo, I need to see your CR. Could you please provide it for us?
Hi @Slava_Sarzhan,
Sorry for the late answer.
The database is running in an Istio-injected namespace, with Istio version 1.23.3.
You will see that the PostgreSQL database image and the pgBackRest image come from my Docker Hub: I added some libraries to the PostgreSQL image, and in the pgBackRest image I added a curl call that terminates the Istio sidecar so the backup pod can complete (roughly as sketched below).
What I am trying to figure out, for more effective troubleshooting, is how the operator stores its metadata. For example, it remembers and complains about a PerconaPGBackup or PerconaPGRestore even after the object has been removed. Is there a way to clear that metadata?
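The sidecar termination step I added to the pgBackRest image is roughly this (a sketch; it assumes the default Istio sidecar agent port 15020, so check the port used in your mesh):

# Ask the istio-proxy sidecar to exit once pgBackRest has finished,
# so the backup Job pod can reach the Completed state.
curl -fsS -X POST http://127.0.0.1:15020/quitquitquit

And here is the CR: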
apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  annotations:
    current-primary: service-1
    meta.helm.sh/release-name: pg
    meta.helm.sh/release-namespace: postgres
  finalizers:
    - percona.com/stop-watchers
  generation: 582
  labels:
    app.kubernetes.io/instance: pg
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: pg-db
    app.kubernetes.io/version: 2.4.1
    crunchy-pgha-scope: service-1
    deployment-name: service-1
    helm.sh/chart: pg-db-2.4.3
    name: service-1
    pg-cluster: service-1
    pgo-version: 2.4.1
    pgouser: admin
  name: service-1
  namespace: postgres
status:
  host: service-1-pgbouncer.cao-postgres.svc
  pgbouncer: {}
  postgres:
    instances:
      - name: instance1
        ready: 3
        size: 3
    ready: 3
    size: 3
  state: ready
spec:
  backups:
    pgbackrest:
      configuration:
        - secret:
            name: cluster1-pgbackrest-secrets
      image: mboncalo/percona-postgresql-operator:2.4.1-ppg16.3-pgbackrest2.51-1
      manual:
        options:
          - '--type=incr'
          - '--annotation="percona.com/backup-name"="service-1-repo1-incr-x4ctt"'
        repoName: repo1
      metadata:
        labels:
          pgv2.percona.com/version: 2.4.1
      repoHost:
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
              - podAffinityTerm:
                  labelSelector:
                    matchLabels:
                      postgres-operator.crunchydata.com/data: pgbackrest
                  topologyKey: kubernetes.io/hostname
                weight: 1
        resources: {}
      repos:
        - name: repo1
          schedules:
            full: 0 4 * * *
            incremental: '*/43 * * * *'
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 200Gi
              storageClassName: default
        - name: repo2
          s3:
            bucket: ***
            endpoint: minio.apps.sw.live
            region: default
          schedules:
            full: 0 3 * * *
            incremental: '*/17 * * * *'
  crVersion: 2.4.1
  extensions:
    builtin:
      pg_audit: false
      pg_stat_monitor: false
    custom:
      - name: pg_stat_statements
      - name: pg_auth_mon
      - name: timescaledb
      - name: pg_stat_kcache
      - name: pg_cron
    image: percona/percona-postgresql-operator:2.4.1
    imagePullPolicy: Always
    storage: {}
  image: mboncalo/percona-postgresql-operator:2.4.1-ppg15.7-postgres
  imagePullPolicy: Always
  instances:
    - affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: postgres-operator.crunchydata.com/cluster
                      operator: In
                      values:
                        - service-1
                topologyKey: failure-domain.beta.kubernetes.io/zone
              weight: 10
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: postgres-operator.crunchydata.com/cluster
                      operator: In
                      values:
                        - service-1
                topologyKey: kubernetes.io/hostname
              weight: 10
      dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 200Gi
      metadata:
        labels:
          pgv2.percona.com/version: 2.4.1
      name: instance1
      replicas: 3
      resources:
        limits:
          cpu: '8'
          memory: 16Gi
        requests:
          cpu: '8'
          memory: 16Gi
  openshift: false
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          checkpoint_completion_target: '0.9'
          default_statistics_target: '100'
          effective_cache_size: 10752MB
          effective_io_concurrency: '200'
          maintenance_work_mem: 896MB
          max_connections: '1024'
          max_parallel_maintenance_workers: '2'
          max_parallel_workers: '4'
          max_parallel_workers_per_gather: '2'
          max_wal_size: 4GB
          max_worker_processes: '4'
          min_wal_size: 1GB
          password_encryption: scram-sha-256
          random_page_cost: '1.1'
          shared_buffers: 3584MB
          shared_preload_libraries: >-
            pg_stat_statements,pg_auth_mon,set_user,timescaledb,pg_cron,pg_stat_kcache
          wal_buffers: 16MB
          work_mem: 1792kB
    leaderLeaseDurationSeconds: 30
    port: 8008
    syncPeriodSeconds: 10
  pause: false
  pmm:
    enabled: false
    image: percona/pmm-client:2.42.0
    resources: {}
    secret: service-1-pmm-secret
    serverHost: monitoring-service
  port: 5432
  postgresVersion: 15
  proxy:
    pgBouncer:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchLabels:
                    postgres-operator.crunchydata.com/role: pgbouncer
                topologyKey: kubernetes.io/hostname
              weight: 1
      config: {}
      exposeSuperusers: true
      image: percona/percona-postgresql-operator:2.4.1-ppg15.7-pgbouncer1.22.1
      metadata:
        labels:
          pgv2.percona.com/version: 2.4.1
      port: 5432
      replicas: 0
      resources: {}
  secrets: {}
  standby:
    enabled: false
  unmanaged: false
The pgBackRest secret:
[global]
repo1-retention-full=10
repo1-retention-full-type=time
repo1-s3-uri-style=path
repo2-s3-key=***
repo2-s3-key-secret=***
repo2-storage-verify-tls=y
repo2-retention-full=20
repo2-retention-full-type=time
repo2-s3-uri-style=path
archive-timeout=10ms
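In the meantime, this is how I am inspecting what the operator still tracks (read-only, nothing is cleared by these commands; the operator namespace and deployment name below are placeholders):

kubectl get perconapgbackup -n postgres
kubectl get perconapgcluster service-1 -n postgres -o yaml
kubectl logs -n <operator-namespace> deploy/<operator-deployment> --tail=200 | grep -i backup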