Unable to open log file '/pgdata/pgbackrest/log/db-restore.log'

Description:

I’m launching a new Percona PG cluster with a data source for restoring a backup from S3.

The pod starts with the following log messages:

Defaulted container "pgbackrest-restore" out of: pgbackrest-restore, nss-wrapper-init (init)
+ pgbackrest restore --stanza=db --pg1-path=/pgdata/pg15 --repo=1 --delta --link-map=pg_wal=/pgdata/pg15_wal
WARN: unable to open log file '/pgdata/pgbackrest/log/db-restore.log': No such file or directory
      NOTE: process will continue without log file.
WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/pgdata/pg15' to confirm that this is a valid $PGDATA directory.  --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.

The pod then sits, apparently idle (no logs), for about 40 minutes, after which the restore starts normally and completes successfully.

Maybe the pod is doing work and I’m just not seeing anything because of the apparently disabled file logging. How can I fix that?

Steps to Reproduce:

Here’s the manifest of the cluster:

apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  annotations:
  finalizers:
    - percona.com/delete-pvc
    - percona.com/delete-ssl
  name: cluster1
  namespace: giba
spec:
  backups:
    pgbackrest:
      configuration:
        - secret:
            name: pg-s3-backup-creds
      global:
        repo1-path: >-
          /giba-cloudstack-l9vvuk2p/pgbackrest/giba-sample-apps-prod-cluster1-backup
        repo1-retention-diff: '7'
        repo1-retention-full: '4'
        repo1-retention-full-type: count
      image: 'percona/percona-postgresql-operator:2.2.0-ppg15-pgbackrest'
      jobs:
        resources:
          limits:
            cpu: '2'
            memory: 512Mi
          requests:
            cpu: 40m
            memory: 512Mi
      manual:
        options:
          - '--type=full'
        repoName: repo1
      repos:
        - name: repo1
          s3:
            bucket: giba-e2-bucket
            endpoint: 'https://y0g4.mi22.idrivee2-7.com'
            region: us-east-1
          schedules:
            differential: 17 2 * * *
            full: 17 1 * * 6
  crVersion: 2.2.0
  dataSource:
    pgbackrest:
      configuration:
        - secret:
            name: pg-s3-backup-creds
      global:
        repo1-path: /eks-conbkhho/pgbackrest/giba-sample-apps-prod-cluster1-backup
      repo:
        name: repo1
        s3:
          bucket: giba-e2-bucket
          endpoint: 'https://y0g4.mi22.idrivee2-7.com'
          region: us-east-1
      stanza: db
  databaseInitSQL:
    key: init.sql
    name: odoo-init-sql
  image: 'percona/percona-postgresql-operator:2.2.0-ppg15-postgres'
  imagePullPolicy: Always
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: local-path
      name: cluster1-instance1
      replicas: 3
      resources:
        limits:
          cpu: '2'
          memory: 4Gi
        requests:
          cpu: 300m
          memory: 4Gi
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              postgres-operator.crunchydata.com/instance-set: cluster1-instance1
          maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
  pmm:
    enabled: false
    image: 'percona/pmm-client:2.37.0'
    secret: cluster1-pmm-secret
    serverHost: monitoring-service
  port: 5432
  postgresVersion: 15
  proxy:
    pgBouncer:
      config:
        global:
          client_tls_sslmode: disable
      image: 'percona/percona-postgresql-operator:2.2.0-ppg15-pgbouncer'
      port: 5432
      replicas: 3
      resources:
        limits:
          cpu: '2'
          memory: 512Mi
        requests:
          cpu: 40m
          memory: 512Mi
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              postgres-operator.crunchydata.com/cluster: cluster1
              postgres-operator.crunchydata.com/role: pgbouncer
          maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
  users:
    - databases:
        - odoo
      name: odoo
      options: CREATEDB CREATEROLE
      secretName: cluster1-odoo-pg
status:
  host: cluster1-pgbouncer.giba.svc
  pgbouncer: {}
  postgres:
    instances:
      - name: cluster1-instance1
        ready: 1
        size: 3
    ready: 1
    size: 3
  state: initializing

Version:

I’m specifying 2.2.0 as the crVersion and installing version 2.2.2 of the pg-operator chart from the Percona Helm Charts repository (percona-helm-charts).

The CRDs were downloaded from https://raw.githubusercontent.com/percona/percona-helm-charts/pg-operator-2.2.2/charts/pg-operator/crds/crd.yaml
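
For completeness, the install went roughly like this (the release name, namespace, and the server-side apply flag are just what I used for illustration; adjust as needed):

# Download and apply the CRDs referenced above
curl -fsSL -o crd.yaml https://raw.githubusercontent.com/percona/percona-helm-charts/pg-operator-2.2.2/charts/pg-operator/crds/crd.yaml
kubectl apply --server-side -f crd.yaml

# Install the operator chart from the Percona Helm repository
helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update
helm install pg-operator percona/pg-operator --version 2.2.2 -n giba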

Expected Result:

I expected some logs to be displayed during the first 40 minutes of execution of the pod launched by the restore job.

Actual Result:

Apart from the 40-minute silence, the restore seemed to work fine.

Hi @gmautner,

While the restore was ongoing, were you able to log in to the pod and check whether the directory from the warning existed and had the correct ownership? What do you get from the following inside that pod’s pgbackrest-restore container?

ls -l /pgdata/pgbackrest/log/
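
If you don’t have a shell open in the pod, something along these lines should work from outside (the pod name below is a placeholder for the actual pod created by the restore job):

# Check that the pgBackRest log directory exists and who owns it
# (<restore-pod> is a placeholder; use the pod created by the restore job)
kubectl -n giba exec <restore-pod> -c pgbackrest-restore -- ls -ld /pgdata/pgbackrest /pgdata/pgbackrest/log
kubectl -n giba exec <restore-pod> -c pgbackrest-restore -- ls -l /pgdata/pgbackrest/log/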

This could be a new bug, in which case it should be reported to https://jira.percona.com/projects/K8SPG.

The pod then sits, apparently idle (no logs), for about 40 minutes, after which the restore starts normally and completes successfully.

The fact that we can’t see logs (due to the WARN about the missing log directory) doesn’t mean the restore procedure is idle.

Maybe the pod is doing work and I’m just not seeing anything because of the apparently disabled file logging. How can I fix that?

You could also check the ps aux output inside that container and run kubectl logs pod_name to see if any additional information is printed there.
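
For example (again, the pod name is a placeholder, and this assumes ps is available in the image):

# See whether pgbackrest is actually busy inside the restore container
kubectl -n giba exec <restore-pod> -c pgbackrest-restore -- ps aux

# Check what Kubernetes has captured from the container's stdout/stderr
kubectl -n giba logs <restore-pod> -c pgbackrest-restore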

You can use kubectl logs -f pod_name to follow the output as it is produced (like tail -f would).
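
As for getting more output during the restore itself, one thing you could try (a sketch only; I haven’t verified that the restore job picks these settings up) is adding pgBackRest’s log options to the dataSource global section, e.g. log-level-console to print progress to stdout and log-path to point the log file at a directory that exists:

  dataSource:
    pgbackrest:
      global:
        repo1-path: /eks-conbkhho/pgbackrest/giba-sample-apps-prod-cluster1-backup
        # Print progress to stdout so it shows up in kubectl logs
        log-level-console: info
        # Write the pgBackRest log file to a directory that exists and is writable
        log-path: /tmp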

Let us know how this goes.