TimM
November 19, 2022, 10:17am
Firstly, thank you to the devs for an excellent operator - it has been a pleasure setting it up over the last couple of days.
However, I cannot get backups to a local filesystem working. I am using Helm (via Ansible) to run the operator on a 3-node bare-metal K3S test cluster, and I am testing local backup at this stage. At the scheduled backup time, the db-pxc-2 pod enters a crash loop, erroring in stage 3.
I just tried to add attachments, but as a new user I'm not permitted to, so I'll try to add them in a second post.
TimM
November 19, 2022, 10:21am
It looks like I still cannot add attachments, so here at least is the relevant section of the log for the db-pxc-2 pod showing the errors:
{"log":"2022-11-19T09:53:56.861821Z 0 [ERROR] [MY-000000] [Galera] failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)\n\t at gcomm/src/pc.cpp:connect():161\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:56.861884Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_core.cpp:gcs_core_open():219: Failed to open backend connection: -110 (Connection timed out)\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:57.862114Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:57.862199Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:57.862365Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs.cpp:gcs_open():1758: Failed to open channel 'vaultwarden-db-pxc-db-pxc' at 'gcomm://vaultwarden-db-pxc-db-pxc-0.vaultwarden-db-pxc-db-pxc,vaultwarden-db-pxc-db-pxc-1.vaultwarden-db-pxc-db-pxc': -110 (Connection timed out)\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:57.862411Z 0 [ERROR] [MY-000000] [Galera] gcs connect failed: Connection timed out\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:57.862445Z 0 [ERROR] [MY-000000] [WSREP] Provider/Node (gcomm://vaultwarden-db-pxc-db-pxc-0.vaultwarden-db-pxc-db-pxc,vaultwarden-db-pxc-db-pxc-1.vaultwarden-db-pxc-db-pxc) failed to establish connection with cluster (reason: 7)\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:57.862479Z 0 [ERROR] [MY-010119] [Server] Aborting\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:57.862869Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.27-18.1) Percona XtraDB Cluster (GPL), Release rel18, Revision ac35177, WSREP version 26.4.3.\n","file":"/var/lib/mysql/mysqld-error.log"}
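(As an aside, each mysqld-error.log line above is wrapped in a JSON object by the log collector. A minimal Python sketch for filtering such a capture down to just the [ERROR] entries, using two abbreviated sample lines from above:)

```python
import json

# Each mysqld-error.log line is wrapped in a JSON object with "log" and
# "file" keys; keep only the entries whose message contains [ERROR].
raw = '''{"log":"2022-11-19T09:53:57.862114Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread\\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2022-11-19T09:53:57.862411Z 0 [ERROR] [MY-000000] [Galera] gcs connect failed: Connection timed out\\n","file":"/var/lib/mysql/mysqld-error.log"}'''

errors = [json.loads(line)["log"].strip()
          for line in raw.splitlines()
          if "[ERROR]" in json.loads(line)["log"]]
for e in errors:
    print(e)
```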
TimM
November 19, 2022, 10:25am
And here is the backup section from kubectl describe of the pxc resource:
Spec:
  Backup:
    Image: percona/percona-xtradb-cluster-operator:1.11.0-pxc8.0-backup
    Pitr:
      Enabled: false
    Schedule:
      Keep: 3
      Name: daily-backup
      Schedule: 09 53 * * *
      Storage Name: backup-vaultwarden-db
    Storages:
      Backup - Vaultwarden - Db:
        Type: filesystem
        Volume:
          Persistent Volume Claim:
            Access Modes:
              ReadWriteOnce
            Resources:
              Requests:
                Storage: 4Gi
      Fs - Pvc:
        Type: filesystem
        Volume:
          Persistent Volume Claim:
            Access Modes:
              ReadWriteOnce
            Resources:
              Requests:
                Storage: 6Gi
  Cr Version: 1.11.0
  Enable CR Validation Webhook: false
  Haproxy:
    Affinity:
      Anti Affinity Topology Key: kubernetes.io/hostname
    Annotations:
    Enabled: true
    Grace Period: 30
    Image: percona/percona-xtradb-cluster-operator:1.11.0-haproxy
    Labels:
    Liveness Delay Sec: 300
    Liveness Probes:
      Failure Threshold: 4
      Initial Delay Seconds: 60
      Period Seconds: 30
      Success Threshold: 1
      Timeout Seconds: 5
    Node Selector:
    Pod Disruption Budget:
      Max Unavailable: 1
    Readiness Delay Sec: 15
    Readiness Probes:
      Failure Threshold: 3
      Initial Delay Seconds: 15
      Period Seconds: 5
      Success Threshold: 1
      Timeout Seconds: 1
    Replicas Service Enabled: true
    Resources:
      Limits:
        Cpu: 1
        Memory: 1Gi
      Requests:
        Cpu: 500m
        Memory: 0.5Gi
    Sidecar PV Cs:
    Sidecar Resources:
      Limits:
      Requests:
    Sidecar Volumes:
    Sidecars:
    Size: 3
    Tolerations:
    Volume Spec:
      Empty Dir:
  Init Image: percona/percona-xtradb-cluster-operator:1.11.0
  Log Collector Secret Name: vaultwarden-db-pxc-db-log-collector
  Logcollector:
    Enabled: true
    Image: percona/percona-xtradb-cluster-operator:1.11.0-logcollector
    Resources:
      Limits:
      Requests:
        Cpu: 200m
        Memory: 100M
  Pause: false
  Pmm:
    Enabled: false
  Proxysql:
    Enabled: false
  Pxc:
    Affinity:
      Anti Affinity Topology Key: kubernetes.io/hostname
    Annotations:
    Auto Recovery: true
    Grace Period: 600
    Image: percona/percona-xtradb-cluster:8.0.27-18.1
    Labels:
    Liveness Delay Sec: 300
    Liveness Probes:
      Failure Threshold: 3
      Initial Delay Seconds: 300
      Period Seconds: 10
      Success Threshold: 1
      Timeout Seconds: 5
    Node Selector:
    Pod Disruption Budget:
      Max Unavailable: 1
    Readiness Delay Sec: 15
    Readiness Probes:
      Failure Threshold: 5
      Initial Delay Seconds: 15
      Period Seconds: 30
      Success Threshold: 1
      Timeout Seconds: 15
    Resources:
      Limits:
        Cpu: 1
        Memory: 1Gi
      Requests:
        Cpu: 500m
        Memory: 0.5Gi
    Sidecar PV Cs:
    Sidecar Resources:
      Limits:
      Requests:
    Sidecar Volumes:
    Sidecars:
    Size: 3
    Tolerations:
    Volume Spec:
      Persistent Volume Claim:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage: 2Gi
        Storage Class Name: local-path
  Secrets Name: vaultwarden-db-pxc-db
  Ssl Internal Secret Name: vaultwarden-db-pxc-db-ssl-internal
  Ssl Secret Name: vaultwarden-db-pxc-db-ssl
  Update Strategy: SmartUpdate
  Upgrade Options:
    Apply: 8.0-recommended
    Schedule: 0 4 * * *
    Version Service Endpoint: https://check.percona.com
  Vault Secret Name: vaultwarden-db-pxc-db-vault
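(For reference, a standard 5-field cron expression is ordered minute, hour, day-of-month, month, day-of-week, so a daily 09:53 run would normally be written 53 09 * * *. A minimal, generic field-range sanity check, not the operator's own parser, and ignoring range/step syntax:)

```python
# Minimal sanity check for a standard 5-field cron expression
# (minute hour day-of-month month day-of-week). Generic sketch:
# it handles only "*" and comma-separated integers.
def cron_field_ok(field, lo, hi):
    if field == "*":
        return True
    try:
        return all(lo <= int(p) <= hi for p in field.split(","))
    except ValueError:
        return False  # ranges/steps not handled in this sketch

def cron_ok(expr):
    parts = expr.split()
    if len(parts) != 5:
        return False
    bounds = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]
    return all(cron_field_ok(f, lo, hi) for f, (lo, hi) in zip(parts, bounds))

print(cron_ok("53 09 * * *"))  # True: fires daily at 09:53
print(cron_ok("09 53 * * *"))  # False: 53 is not a valid hour
```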
TimM
November 19, 2022, 11:04am
I just tried explicitly adding backup.storages.volume.persistentVolumeClaim.storageClassName, as it was missing from the Helm chart, but I still get the same crash loop in the db-pxc-2 pod.
Hey @TimM ,
Can you please share the values.yaml that you use to deploy the cluster?
What do you use for storage?
TimM
November 28, 2022, 10:02am
Hi Sergey, thank you for your response.
Here is the values.yaml. I am currently testing with in-cluster storage but will ultimately use S3. On the last attempt I was using Longhorn, but I also tried with the (Rancher) local-path storageClass. Have I missed or misconfigured something?
pxc:
  size: 3
  resources:
    requests:
      cpu: 250m
      memory: 0.5Gi
    limits:
      cpu: 1
      memory: 1Gi
  persistence:
    enabled: true
    storageClass: local-path
    accessMode: ReadWriteOnce
    size: 2Gi
haproxy:
  size: 3
  resources:
    requests:
      cpu: 100m
      memory: 0.5Gi
    limits:
      cpu: 1
      memory: 1Gi
backup:
  enabled: true
  image: "percona/percona-xtradb-cluster-operator:1.11.0-pxc8.0-backup"
  schedule:
    - name: "daily-backup"
      schedule: "12 15 * * *"
      keep: 3
      storageName: backup-vaultwarden-db
  storages:
    backup-vaultwarden-db:
      type: filesystem
      volume:
        persistentVolumeClaim:
          storageClassName: longhorn
          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 4Gi
Hello @TimM ,
I cannot see any issues with your config.
The operator creates backup pods for every backup run. Can you please show the log from one of them?