Existing database import almost impossible (PXC 8.0.27)

Hello,

I’m trying to import a 4 GB gzipped SQL dump (100+ tables) into PXC 8.0.27 running on on-prem Kubernetes with iSCSI-based PVCs. I have 3 PXC and 3 haproxy pods, i.e. 1 PXC and 1 haproxy pod per node. The dump comes from a single MySQL 8 Docker container which is currently used in production.

I have created a script which imports each table separately. Small tables (under 100k rows) are processed instantly, while bigger tables (over 1M rows) cause huge stability issues in PXC. As soon as the first bigger table has been imported successfully (up to that point there is no issue at all), the next import via the mysql client fails with: WSREP has not yet prepared node for application use. To overcome this, my script backs off for 2 minutes and retries, which in most cases gets me to the end of the next table import (the WSREP error then shows up again and is again resolved by waiting 2 minutes, so the next table can be imported). After approximately five bigger tables have been imported this way, at least one or two pxc containers inside the PXC pods end up in an errored state every time:

Readiness probe failed: + [[ Primary == \P\r\i\m\a\r\y ]] + [[ 3 -eq 4 ]] + [[ 3 -eq 2 ]] + exit 1
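For reference, the import loop is roughly the following (a minimal sketch of my script; the host, credentials, database name and per-table dump files are placeholders):

#!/bin/bash
# Import each table dump separately; on the WSREP "not prepared" error,
# back off for 2 minutes and retry the same table.
for dump in dumps/*.sql.gz; do
  while true; do
    if zcat "$dump" | mysql -h "$HOST" -u "$USER" -p"$PASS" "$DB" 2>import.err; then
      echo "$(date) imported $dump"
      break
    elif grep -q "WSREP has not yet prepared node" import.err; then
      echo "$(date) node not ready, retrying $dump in 120 s"
      sleep 120
    else
      cat import.err
      exit 1
    fi
  done
done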

Later on I get this error from the mysql client doing the import (the VM running it is on the same 10G network as the Kubernetes cluster): ERROR 2013 (HY000) at line 21: Lost connection to MySQL server during query

The line number here differs depending on the table, but it usually points at the beginning of a huge INSERT statement.

The only custom configuration I have set for PXC is the sql_mode override to get around a problem with GROUP BY statements. The complete Kubernetes YAML:

apiVersion: pxc.percona.com/v1-11-0
kind: PerconaXtraDBCluster
metadata:
  name: pxc-cluster
spec:
  crVersion: 1.11.0
  secretsName: pxc-cluster-secrets

  upgradeOptions:
    apply: 8.0-recommended
    schedule: '0 4 * * *'
  pxc:
    size: 3
    image: percona/percona-xtradb-cluster:8.0.27
    autoRecovery: true
    configuration: |
      [mysqld]
      sql_mode = STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
    resources:
      requests:
        memory: 12G
        cpu: 2000m
      limits:
        memory: 16G
        cpu: 6000m
    affinity:
      topologyKey: kubernetes.io/hostname
    podDisruptionBudget:
      maxUnavailable: 1
    volumeSpec:
      persistentVolumeClaim:
        storageClassName: iscsi-storage
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 200G
    gracePeriod: 600
  haproxy:
    enabled: true
    size: 3
    image: percona/percona-xtradb-cluster-operator:1.11.0-haproxy
    resources:
      requests:
        memory: 1G
        cpu: 600m
      limits:
        memory: 2G
        cpu: 1000m
    podDisruptionBudget:
      maxUnavailable: 1
  logcollector:
    enabled: true
    image: percona/percona-xtradb-cluster-operator:1.11.0-logcollector

I would really like to get PXC working with this data, as it is part of migrating that application to Kubernetes. I’m also a little worried that such errors could appear under heavier load on the DB, which would then cause issues in production. I would really appreciate any advice on how to overcome the described import issue.

Thank you in advance!


I’m hoping you understand the synchronous nature of PXC. When you commit a txn on node1, that node waits for the other nodes to acknowledge that they have received the txn. If another node queues up too many txns, flow control kicks in and forces all other nodes to slow down. This is probably what is affecting you. Also, giant transactions are suboptimal in PXC. Check out PXC Streaming Replication to optimize large batch imports.
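For example, streaming replication can be turned on just for the import session, so a huge INSERT is replicated in smaller fragments instead of one giant write set. Something like this (a sketch; the fragment size is only an illustrative value and the dump file and connection details are placeholders):

# Enable PXC 8.0 streaming replication for this one client session,
# then replay the dump over the same connection.
{
  echo "SET SESSION wsrep_trx_fragment_unit = 'rows';"
  echo "SET SESSION wsrep_trx_fragment_size = 10000;"  # replicate every 10k rows
  zcat dump.sql.gz
} | mysql -h <haproxy-host> -u root -p mydb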

Lastly, if this 4 GB dump only needs to be imported once, I would stop node2 and node3 and import the dump into node1 alone. Then bring node2 online and let it SST all of the data at once from node1. Then repeat for node3. That will probably result in a faster import.


Thanks for the reply. I understand that a cluster works differently from a single node, but I have never imported an existing DB into a cluster before, so I’m having a bit of a hard time with this now.

The 4 GB dump will only be imported once, so I would follow your tip, but there is one more problem… I can’t scale the PXC pods from 3 to 1, as this is apparently not allowed (even when size is set to 1 in the deployment, all 3 pods are created), and if I delete the second and third pods, they come back because of the K8s scheduler and its rules. Any idea how to work around this?


You can set allowUnsafeConfigurations: true in the cr.yml to allow a single node to be deployed. Deploy that and scale down to 1. Do the import. Then scale back up to 3. This should force an SST to each new node.
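Roughly like this against the CR above (a sketch; adjust the CR name and namespace to your deployment):

# Allow an unsafe (single-node) topology and scale PXC down to 1
kubectl patch perconaxtradbcluster pxc-cluster --type=merge \
  -p '{"spec":{"allowUnsafeConfigurations":true,"pxc":{"size":1}}}'

# ... run the import against the single remaining node ...

# Scale back up; each new pod should join via SST
kubectl patch perconaxtradbcluster pxc-cluster --type=merge \
  -p '{"spec":{"pxc":{"size":3}}}'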


Thank you! This completely resolved my problem.

If anyone else follows the same import approach, just make sure that you really start with three nodes, then scale down to 1 and do the import. After the import was done in my case, I scaled back up “step by step” as Matthew wrote in his initial reply: first to 2 nodes and, once the sync was done, to 3 nodes. The sync progress is visible in the logs container, where you can also see when it is done.
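For anyone repeating this, the scale-up and sync check looked roughly like this in my case (a sketch; the pod and container names are the operator defaults for the CR above and may differ in your setup):

# Scale to 2 and follow the joiner's logs until it reports Synced
kubectl patch perconaxtradbcluster pxc-cluster --type=merge \
  -p '{"spec":{"pxc":{"size":2}}}'
kubectl logs -f pxc-cluster-pxc-1 -c logs

# Once the SST/IST has finished, repeat with size 3 for the last node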


I’m experiencing something similar; however, scaling the cluster down to 1 replica does not help. No matter what I do, the liveness probe fails and the import fails with it. It’s an InnoDB ~30 GB SQL file that imports just fine on a single Percona Server (similar version).