Percona Operator Backup Fails

Firstly, thank you to the devs for an excellent operator - it has been a pleasure setting it up over the last couple of days.

However, I cannot get backups to the local filesystem working. I am using Helm (via Ansible) to run the operator on a three-node bare-metal test K3s cluster, and I am testing local backups at this stage. At the scheduled backup time, the db-pxc-2 pod enters a crash loop, erroring in stage 3.

I just tried to add attachments, but as a new user I'm not permitted, so I'll try to add them in a second post.
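
In the meantime, here is roughly how I am pulling the logs I will paste below. Two assumptions on my part: vaultwarden is simply the namespace I installed the chart into, and I believe the log collector sidecar container is named logs:

# list the cluster pods (namespace is from my install)
kubectl -n vaultwarden get pods

# tail the error log via the log collector sidecar of the crashing pod
kubectl -n vaultwarden logs vaultwarden-db-pxc-db-pxc-2 -c logs --tail=200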

It looks like I still cannot add attachments, so here at least is the section of the log from the db-pxc-2 pod showing the errors:

{"log":"2022-11-19T09:53:56.861821Z 0 [ERROR] [MY-000000] [Galera] failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)\n\t at gcomm/src/pc.cpp:connect():161\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:56.861884Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_core.cpp:gcs_core_open():219: Failed to open backend connection: -110 (Connection timed out)\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:57.862114Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:57.862199Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:57.862365Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs.cpp:gcs_open():1758: Failed to open channel 'vaultwarden-db-pxc-db-pxc' at 'gcomm://vaultwarden-db-pxc-db-pxc-0.vaultwarden-db-pxc-db-pxc,vaultwarden-db-pxc-db-pxc-1.vaultwarden-db-pxc-db-pxc': -110 (Connection timed out)\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:57.862411Z 0 [ERROR] [MY-000000] [Galera] gcs connect failed: Connection timed out\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:57.862445Z 0 [ERROR] [MY-000000] [WSREP] Provider/Node (gcomm://vaultwarden-db-pxc-db-pxc-0.vaultwarden-db-pxc-db-pxc,vaultwarden-db-pxc-db-pxc-1.vaultwarden-db-pxc-db-pxc) failed to establish connection with cluster (reason: 7)\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:57.862479Z 0 [ERROR] [MY-010119] [Server] Aborting\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:57.862869Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.27-18.1) Percona XtraDB Cluster (GPL), Release rel18, Revision ac35177, WSREP version 26.4.3.\n","file":"/var/lib/mysql/mysqld-error.log"}
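
If it helps, I can also check whether the pods can reach each other on the Galera group-communication port (4567). A rough check along these lines, using the pod and service names from the gcomm URL above (the pxc container name and the bash /dev/tcp trick are assumptions about the image):

# from a healthy pod, test the TCP path to the crashing pod on 4567
kubectl exec vaultwarden-db-pxc-db-pxc-0 -c pxc -- \
  bash -c 'timeout 3 bash -c "exec 3<>/dev/tcp/vaultwarden-db-pxc-db-pxc-2.vaultwarden-db-pxc-db-pxc/4567" && echo reachable || echo unreachable'
# (the reverse direction, from -2 toward -0 and -1, is the one that matters
# most here, since -2 is the node failing to join)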

And here is the kubectl describe output for the pxc resource (the full Spec, including the backup section):

Spec:
  Backup:
    Image:  percona/percona-xtradb-cluster-operator:1.11.0-pxc8.0-backup
    Pitr:
      Enabled:  false
    Schedule:
      Keep:          3
      Name:          daily-backup
      Schedule:      09 53 * * *
      Storage Name:  backup-vaultwarden-db
    Storages:
      Backup - Vaultwarden - Db:
        Type:  filesystem
        Volume:
          Persistent Volume Claim:
            Access Modes:
              ReadWriteOnce
            Resources:
              Requests:
                Storage:  4Gi
      Fs - Pvc:
        Type:  filesystem
        Volume:
          Persistent Volume Claim:
            Access Modes:
              ReadWriteOnce
            Resources:
              Requests:
                Storage:         6Gi
  Cr Version:                    1.11.0
  Enable CR Validation Webhook:  false
  Haproxy:
    Affinity:
      Anti Affinity Topology Key:  kubernetes.io/hostname
    Annotations:
    Enabled:       true
    Grace Period:  30
    Image:         percona/percona-xtradb-cluster-operator:1.11.0-haproxy
    Labels:
    Liveness Delay Sec:  300
    Liveness Probes:
      Failure Threshold:      4
      Initial Delay Seconds:  60
      Period Seconds:         30
      Success Threshold:      1
      Timeout Seconds:        5
    Node Selector:
    Pod Disruption Budget:
      Max Unavailable:    1
    Readiness Delay Sec:  15
    Readiness Probes:
      Failure Threshold:       3
      Initial Delay Seconds:   15
      Period Seconds:          5
      Success Threshold:       1
      Timeout Seconds:         1
    Replicas Service Enabled:  true
    Resources:
      Limits:
        Cpu:     1
        Memory:  1Gi
      Requests:
        Cpu:     500m
        Memory:  0.5Gi
    Sidecar PV Cs:
    Sidecar Resources:
      Limits:
      Requests:
    Sidecar Volumes:
    Sidecars:
    Size:  3
    Tolerations:
    Volume Spec:
      Empty Dir:
  Init Image:                 percona/percona-xtradb-cluster-operator:1.11.0
  Log Collector Secret Name:  vaultwarden-db-pxc-db-log-collector
  Logcollector:
    Enabled:  true
    Image:    percona/percona-xtradb-cluster-operator:1.11.0-logcollector
    Resources:
      Limits:
      Requests:
        Cpu:     200m
        Memory:  100M
  Pause:         false
  Pmm:
    Enabled:  false
  Proxysql:
    Enabled:  false
  Pxc:
    Affinity:
      Anti Affinity Topology Key:  kubernetes.io/hostname
    Annotations:
    Auto Recovery:  true
    Grace Period:   600
    Image:          percona/percona-xtradb-cluster:8.0.27-18.1
    Labels:
    Liveness Delay Sec:  300
    Liveness Probes:
      Failure Threshold:      3
      Initial Delay Seconds:  300
      Period Seconds:         10
      Success Threshold:      1
      Timeout Seconds:        5
    Node Selector:
    Pod Disruption Budget:
      Max Unavailable:    1
    Readiness Delay Sec:  15
    Readiness Probes:
      Failure Threshold:      5
      Initial Delay Seconds:  15
      Period Seconds:         30
      Success Threshold:      1
      Timeout Seconds:        15
    Resources:
      Limits:
        Cpu:     1
        Memory:  1Gi
      Requests:
        Cpu:     500m
        Memory:  0.5Gi
    Sidecar PV Cs:
    Sidecar Resources:
      Limits:
      Requests:
    Sidecar Volumes:
    Sidecars:
    Size:  3
    Tolerations:
    Volume Spec:
      Persistent Volume Claim:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:         2Gi
        Storage Class Name:  local-path
  Secrets Name:              vaultwarden-db-pxc-db
  Ssl Internal Secret Name:  vaultwarden-db-pxc-db-ssl-internal
  Ssl Secret Name:           vaultwarden-db-pxc-db-ssl
  Update Strategy:           SmartUpdate
  Upgrade Options:
    Apply:                     8.0-recommended
    Schedule:                  0 4 * * *
    Version Service Endpoint:  https://check.percona.com
  Vault Secret Name:           vaultwarden-db-pxc-db-vault
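
One aside I noticed while pasting this: standard cron fields are minute-first (minute, hour, day-of-month, month, day-of-week), so a daily 09:53 backup would normally be written as 53 09 * * * rather than 09 53 * * *. I am not sure this is related to the crash loop, but I mention it in case it matters.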

I just tried explicitly adding backup.storages.volume.persistentVolumeClaim.storageClassName, since this was missing from the Helm chart values, but I still got the same crash loop in the db-pxc-2 pod.

Hey @TimM ,
Can you please share the values.yaml that you use to deploy the cluster?
What do you use for storage?

Hi Sergey, thank you for your response.

Here is the values.yaml. I am currently testing with in-cluster storage but will ultimately use S3. In the last attempt I was using Longhorn, but I have also tried the (Rancher) local-path storageClass. Have I missed or misconfigured something?

pxc:
  size: 3
  resources:
    requests:
      cpu: 250m
      memory: 0.5Gi
    limits:
      cpu: 1
      memory: 1Gi                           
  persistence:
    enabled: true
    storageClass: local-path
    accessMode: ReadWriteOnce
    size: 2Gi
haproxy:
  size: 3
  resources:
    requests:
      cpu: 100m
      memory: 0.5Gi
    limits:
      cpu: 1
      memory: 1Gi
backup:
  enabled: true
  image: "percona/percona-xtradb-cluster-operator:1.11.0-pxc8.0-backup"
  schedule:
    - name: "daily-backup"
      schedule: "12 15 * * *"
      keep: 3
      storageName: backup-vaultwarden-db
  storages:
    backup-vaultwarden-db:
      type: filesystem
      volume:
        persistentVolumeClaim:
          storageClassName: longhorn
          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 4Gi
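
For quicker iteration I have also been thinking of triggering an on-demand backup instead of waiting for the schedule — something along these lines, if I have the backup CR right (the cluster and storage names are taken from my config above; manual-backup-test is just a name I made up):

# create a one-off backup against the same storage
kubectl apply -f - <<EOF
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterBackup
metadata:
  name: manual-backup-test
spec:
  pxcCluster: vaultwarden-db-pxc-db
  storageName: backup-vaultwarden-db
EOF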

Hello @TimM ,

I cannot see any issues with your config.

For every backup, dedicated backup pods are created. Can you please show the log from one of them?
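
Something along these lines should find them once the scheduled job fires (the xb- name prefix is from memory, so adjust the grep if your pods are named differently):

# backup jobs spawn short-lived pods, usually prefixed with xb-
kubectl get pods | grep xb-
kubectl logs <backup-pod-name>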
