Fresh instance with Percona XtraDB Cluster Operator v1.8.0 not starting completely under OKD

Thanks for the reply @Slava_Sarzhan. I do not have any communication issues as far as I can see. Flannel and Calico are both up and working, and my Ceph cluster is detecting heartbeats from all nodes. Other pods are also working correctly.

Do I need to configure emptyDir on HAProxy? Could that be it?


I don't believe emptyDir has anything to do with it here. HAProxy pods are stateless, so it should not be an issue.

Is there anything else specific about your k8s cluster or Operator configuration?


Hello, I have the same issue; the instance was installed following this page: Install Percona XtraDB Cluster on Kubernetes

In the YAML files I edited only the storageClass name.

The pods can't all reach a fully ready state:
NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running   1          16m
cluster1-haproxy-1                                 2/2     Running   0          11m
cluster1-haproxy-2                                 2/2     Running   0          11m
cluster1-pxc-0                                     3/3     Running   0          16m
cluster1-pxc-1                                     2/3     Running   0          11m
percona-xtradb-cluster-operator-77bfd8cdc5-5c9xm   1/1     Running   0          17m

12m Warning Unhealthy pod/cluster1-pxc-1 Readiness probe failed: ERROR 2003 (HY000): Can't connect to MySQL server on 'cluster1-pxc-1' (111)

I already tried installing in a different namespace, with and without creating the secrets…

Kubernetes v1.20.6 on Rancher v2.5.7


@MarcoFan anything in the logs of the Operator and Pods?
Is the 3rd PXC pod starting at all?


I have the same error if I install Calico:

minikube start --driver=virtualbox --disable-driver-mounts --cpus=12 --memory=16096  --network-plugin=cni --cni=calico --nodes=3
Readiness probe failed: ERROR 2003 (HY000): Can't connect to MySQL server on 'my-db-pxc-db-pxc-0' (111)
+ [[ '' == \P\r\i\m\a\r\y ]]
+ exit 1

logs-from-logs-in-cluster1-pxc-0.txt (929 Bytes)
logs-from-haproxy-in-cluster1-haproxy-0.txt (647 Bytes)

Readiness probe failed: ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 2
Back-off restarting failed container

Without Calico, percona-xtradb-cluster works.


Hi @andrey

I can reproduce it using the command you provided. The root of the issue is that the cluster1-pxc-0/cluster1-haproxy-0 pods can't resolve services such as cluster1-pxc-unready, which is why the Operator can't configure the cluster properly. It is a Calico issue: minikube ships Calico v3.14.1, which was released more than a year ago. I installed the latest Calico, v3.19.1, following the official documentation (Quickstart for Calico on minikube, using the manifest) and the issue is gone:

> kubectl get pods -l k8s-app=calico-node -n kube-system
NAME                READY   STATUS    RESTARTS   AGE
calico-node-fkwnn   1/1     Running   0          20m
calico-node-mk8dx   1/1     Running   0          19m
calico-node-z29f5   1/1     Running   0          18m

> kubectl get pods
NAME                                            READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                              2/2     Running   0          5m32s
cluster1-haproxy-1                              2/2     Running   0          3m30s
cluster1-haproxy-2                              2/2     Running   0          3m4s
cluster1-pxc-0                                  3/3     Running   0          5m32s
cluster1-pxc-1                                  3/3     Running   0          3m29s
cluster1-pxc-2                                  3/3     Running   0          117s
percona-xtradb-cluster-operator-d99c748-jhv4x   1/1     Running   0          6m16s

Also, I have tested it on a Scaleway k8s cluster with the Calico CNI and it works there as well. Try the latest version of Calico and let me know the results.
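
For reference, a quick way to check the symptom and apply the upgrade (a sketch: the dnstest pod name is arbitrary, and the manifest URL is the one from the Calico quickstart at the time, so verify it against the current docs):

# quick check that the unready service resolves from inside the cluster
kubectl run -it --rm dnstest --image=busybox --restart=Never -- nslookup cluster1-pxc-unready

# upgrade Calico on minikube by applying the quickstart manifest
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml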


@Slava_Sarzhan so I managed to get this working. The issue originally was that I was trying to define the secrets myself instead of letting the Operator create them. Things worked and I moved on. I came back today to do some maintenance and noticed that the issue had come back.

NAME                                                     READY   STATUS                       RESTARTS   AGE
68e50-daily-backup-1627084800-ps25h                      0/1     Completed                    0          12h
68e50-sat-night-backup-1627084800-q2wx6                  0/1     Completed                    0          12h
cluster1-haproxy-0                                       1/2     Running                      6          19m
cluster1-haproxy-1                                       1/2     CrashLoopBackOff             11903      53d
cluster1-haproxy-2                                       1/2     CrashLoopBackOff             11903      53d
cluster1-pxc-0                                           3/3     Running                      19         53d
cluster1-pxc-1                                           2/3     CrashLoopBackOff             7          20m
percona-xtradb-cluster-operator-77bfd8cdc5-9r6vr         1/1     Running                      1          53d
xb-cron-cluster1-s3-us-west-20210605000008-3d2dv-hxwjd   0/1     CreateContainerConfigError   0          45d

I am using Calico v3.19.1, as shown below:
kubectl calico version
Client Version: v3.19.1
Git commit: 6fc0db96
Unable to retrieve Cluster Version or Type: resource does not exist: ClusterInformation(default) with error: the server could not find the requested resource (get ClusterInformations.crd.projectcalico.org default)
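
For what it is worth, the client version alone does not show what is actually running in the cluster; one way to check the server side (a sketch, assuming the standard calico-node DaemonSet in kube-system) is to read its image tag:

kubectl -n kube-system get daemonset calico-node -o jsonpath='{.spec.template.spec.containers[0].image}'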

I did some more digging in the logs and found the following. It looks like there was an attempt by Galera to open a connection, and it failed.

[0] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130432.413401552, {"log"=>"2021-07-24T12:40:32.412880Z 0 [Warning] [MY-000000] [Galera] last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3.50417S), skipping check"}]
[0] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130461.921485610, {"log"=>"2021-07-24T12:41:01.920841Z 0 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0"}]
[1] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130461.921909299, {"log"=>"2021-07-24T12:41:01.921460Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node"}]
[2] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130461.921911826, {"log"=>"view ((empty))"}]
[3] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130461.922410405, {"log"=>"2021-07-24T12:41:01.922374Z 0 [ERROR] [MY-000000] [Galera] failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)"}]
[4] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130461.922412800, {"log"=>" at gcomm/src/pc.cpp:connect():161"}]
[5] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130461.922487606, {"log"=>"2021-07-24T12:41:01.922428Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_core.cpp:gcs_core_open():220: Failed to open backend connection: -110 (Connection timed out)"}]
[0] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.922868257, {"log"=>"2021-07-24T12:41:02.922714Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread"}]
[1] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.922874019, {"log"=>"2021-07-24T12:41:02.922822Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread"}]
[2] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.923158562, {"log"=>"2021-07-24T12:41:02.923073Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs.cpp:gcs_open():1754: Failed to open channel 'cluster1-pxc' at 'gcomm://10.1.86.126': -110 (Connection timed out)"}]
[3] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.923258685, {"log"=>"2021-07-24T12:41:02.923175Z 0 [ERROR] [MY-000000] [Galera] gcs connect failed: Connection timed out"}]
[4] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.923260499, {"log"=>"2021-07-24T12:41:02.923219Z 0 [ERROR] [MY-000000] [WSREP] Provider/Node (gcomm://10.1.86.126) failed to establish connection with cluster (reason: 7)"}]
[5] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.923345885, {"log"=>"2021-07-24T12:41:02.923255Z 0 [ERROR] [MY-010119] [Server] Aborting"}]
[6] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.923724901, {"log"=>"2021-07-24T12:41:02.923666Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.22-13.1) Percona XtraDB Cluster (GPL), Release rel13, Revision a48e6d5, WSREP version 26.4.3."}]
[7] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.924285826, {"log"=>"2021-07-24T12:41:02.924248Z 0 [Note] [MY-000000] [Galera] dtor state: CLOSED"}]
[8] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.924356763, {"log"=>"2021-07-24T12:41:02.924329Z 0 [Note] [MY-000000] [Galera] MemPool(TrxHandleSlave): hit ratio: 0, misses: 0, in use: 0, in pool: 0"}]


Thanks for the quick response. The problem was with CoreDNS on minikube. I created a simple Pod to use as a test environment:

apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
  nodeName: minikube-m02
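
A minimal way to apply and use it (the dnsutils.yaml filename is assumed; the lookups mirror the checks described below):

kubectl apply -f dnsutils.yaml
kubectl exec -i -t dnsutils -- nslookup kubernetes.default
kubectl exec -i -t dnsutils -- nslookup cluster1-pxc-0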

I installed it on minikube-m02 to check DNS and found out that DNS does not work on minikube-m02: neither kubernetes.default nor cluster1-pxc-0 could be resolved.

I just reloaded CoreDNS on minikube

└$► kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                      READY   STATUS    RESTARTS   AGE
coredns-74ff55c5b-hfklv   1/1     Running   0          38m
└$► kubectl delete pod coredns-74ff55c5b-hfklv -n kube-system

Percona XtraDB Cluster on Minikube (minikube start --driver=virtualbox --disable-driver-mounts --cpus=12 --memory=16096 --network-plugin=cni --cni=calico --nodes=3) is working

Just restart CoreDNS on minikube
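
An equivalent way to restart it instead of deleting the Pod by hand (a sketch; kubectl rollout restart works on any Deployment, and CoreDNS runs as the coredns Deployment in kube-system):

kubectl -n kube-system rollout restart deployment coredns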

It makes no difference whether you start minikube with three nodes at once or add them one at a time: CoreDNS works only on minikube-m00.


@Andrey just to give more insight: I am using Ubuntu with Charmed Kubernetes. I checked and my DNS server is working as expected. The DNS test is as follows:

kubectl exec -i -t dnsutils -- nslookup cluster1-pxc.pxcluster
Server: 10.152.183.14
Address: 10.152.183.14#53

Name: cluster1-pxc.pxcluster.svc.cluster.local
Address: 10.1.86.126


I don’t have much experience, but it seems to me that you still have a problem with DNS.

└$► kubectl exec -i -t dnsutils -- nslookup cluster1-pxc
Server:		10.96.0.10
Address:	10.96.0.10#53

Name:	cluster1-pxc.default.svc.cluster.local
Address: 10.244.205.194
Name:	cluster1-pxc.default.svc.cluster.local
Address: 10.244.151.3
Name:	cluster1-pxc.default.svc.cluster.local
Address: 10.244.120.66

Not sure I see the issue. Normally you have to specify the namespace to resolve a service from another namespace, and in this case dnsutils is running in the default namespace while the Percona cluster is running in the pxcluster namespace.
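
For example, from the default namespace the short name alone does not resolve, while the namespace-qualified name does (the same lookup shown above; names taken from this thread):

kubectl exec -i -t dnsutils -- nslookup cluster1-pxc             # NXDOMAIN from the default namespace
kubectl exec -i -t dnsutils -- nslookup cluster1-pxc.pxcluster   # resolves, because the namespace is given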


Is there any further suggestion on this that I can try?


I have the same problem:

  • vanilla Scaleway Kapsule cluster v1.21.4
  • CoreDNS 1.8.4
  • Operator and DB installed using Helm charts
  • Operator running in the db namespace:
❯ k -n db get deployments.apps pxc-operator
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
pxc-operator   1/1     1            1           6m12s
❯ k -n db get deployments.apps pxc-operator -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: pxc-operator
    meta.helm.sh/release-namespace: db
  creationTimestamp: "2021-10-11T13:37:59Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: pxc-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: pxc-operator
    app.kubernetes.io/version: 1.9.0
    helm.sh/chart: pxc-operator-1.9.1
  name: pxc-operator
  namespace: db
  resourceVersion: "46352088"
  uid: b8c837b2-778f-466c-8d50-7967368ec120
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: operator
      app.kubernetes.io/instance: pxc-operator
      app.kubernetes.io/name: pxc-operator
      app.kubernetes.io/part-of: pxc-operator
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: operator
        app.kubernetes.io/instance: pxc-operator
        app.kubernetes.io/name: pxc-operator
        app.kubernetes.io/part-of: pxc-operator
    spec:
      containers:
      - command:
        - percona-xtradb-cluster-operator
        env:
        - name: WATCH_NAMESPACE
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: OPERATOR_NAME
          value: pxc-operator
        image: percona/percona-xtradb-cluster-operator:1.9.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /metrics
            port: metrics
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: pxc-operator
        ports:
        - containerPort: 8080
          name: metrics
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: pxc-operator
      serviceAccountName: pxc-operator
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-10-11T13:37:59Z"
    lastUpdateTime: "2021-10-11T13:37:59Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-10-11T13:37:59Z"
    lastUpdateTime: "2021-10-11T13:38:10Z"
    message: ReplicaSet "pxc-operator-5998c9b5cb" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
  • PXC in the men namespace:
❯ k -n men get pxc -o yaml
apiVersion: v1
items:
- apiVersion: pxc.percona.com/v1
  kind: PerconaXtraDBCluster
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"pxc.percona.com/v1-9-0","kind":"PerconaXtraDBCluster"}
      meta.helm.sh/release-name: db
      meta.helm.sh/release-namespace: men
    creationTimestamp: "2021-10-11T13:38:47Z"
    finalizers:
    - delete-pxc-pods-in-order
    - delete-proxysql-pvc
    - delete-pxc-pvc
    generation: 2
    labels:
      app.kubernetes.io/instance: db
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: pxc-db
      app.kubernetes.io/version: 1.9.0
      helm.sh/chart: pxc-db-1.9.1
    name: db-pxc-db
    namespace: men
    resourceVersion: "46354859"
    uid: 36eae74f-f7c4-4df0-8dff-1ccc2491eb68
  spec:
    backup:
      image: percona/percona-xtradb-cluster-operator:1.9.0-pxc8.0-backup
      imagePullPolicy: Always
      pitr:
        enabled: false
        storageName: ""
      schedule:
      - keep: 5
        name: daily-backup
        schedule: 0 0 * * *
        storageName: fs-pvc
      storages:
        fs-pvc:
          podSecurityContext:
            fsGroup: 1001
            supplementalGroups:
            - 1001
          s3:
            bucket: ""
            credentialsSecret: ""
          type: filesystem
          volume:
            persistentVolumeClaim:
              accessModes:
              - ReadWriteOnce
              resources:
                requests:
                  storage: 6Gi
    crVersion: 1.9.0
    enableCRValidationWebhook: false
    haproxy:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      enabled: true
      envVarsSecret: db-pxc-db-env-vars-haproxy
      gracePeriod: 30
      image: percona/percona-xtradb-cluster-operator:1.9.0-haproxy
      imagePullPolicy: Always
      livenessProbes:
        failureThreshold: 4
        initialDelaySeconds: 60
        periodSeconds: 30
        successThreshold: 1
        timeoutSeconds: 5
      podDisruptionBudget:
        maxUnavailable: 1
      readinessProbes:
        failureThreshold: 3
        initialDelaySeconds: 15
        periodSeconds: 5
        successThreshold: 1
        timeoutSeconds: 1
      resources:
        limits: {}
        requests:
          cpu: 600m
          memory: 1G
      serviceAccountName: default
      sidecarResources:
        limits: {}
        requests: {}
      size: 3
      volumeSpec:
        emptyDir: {}
    logCollectorSecretName: db-pxc-db-log-collector
    logcollector:
      enabled: true
      image: percona/percona-xtradb-cluster-operator:1.9.0-logcollector
      imagePullPolicy: Always
      resources:
        limits: {}
        requests: {}
    platform: kubernetes
    pmm:
      resources:
        limits: {}
        requests:
          cpu: 600m
          memory: 1G
    proxysql:
      livenessProbes: {}
      podSecurityContext:
        fsGroup: 1001
        supplementalGroups:
        - 1001
      readinessProbes: {}
    pxc:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      autoRecovery: true
      envVarsSecret: db-pxc-db-env-vars-pxc
      expose: {}
      gracePeriod: 600
      image: percona/percona-xtradb-cluster:8.0.23-14.1
      imagePullPolicy: Always
      livenessDelaySec: 300
      livenessProbes:
        failureThreshold: 3
        initialDelaySeconds: 300
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      podDisruptionBudget:
        maxUnavailable: 1
      podSecurityContext:
        fsGroup: 1001
        supplementalGroups:
        - 1001
      readinessDelaySec: 15
      readinessProbes:
        failureThreshold: 5
        initialDelaySeconds: 15
        periodSeconds: 30
        successThreshold: 1
        timeoutSeconds: 15
      resources:
        limits: {}
        requests:
          cpu: 600m
          memory: 1G
      serviceAccountName: default
      sidecarResources:
        limits: {}
        requests: {}
      size: 3
      sslInternalSecretName: db-pxc-db-ssl-internal
      sslSecretName: db-pxc-db-ssl
      vaultSecretName: db-pxc-db-vault
      volumeSpec:
        emptyDir: {}
    secretsName: db-pxc-db
    sslInternalSecretName: db-pxc-db-ssl-internal
    sslSecretName: db-pxc-db-ssl
    updateStrategy: SmartUpdate
    upgradeOptions:
      apply: 8.0-recommended
      schedule: 0 4 * * *
      versionServiceEndpoint: https://check.percona.com
    vaultSecretName: db-pxc-db-vault
  status:
    backup:
      version: 8.0.23
    conditions:
    - lastTransitionTime: "2021-10-11T13:38:52Z"
      status: "True"
      type: initializing
    haproxy:
      labelSelectorPath: app.kubernetes.io/component=haproxy,app.kubernetes.io/instance=db-pxc-db,app.kubernetes.io/managed-by=percona-xtradb-cluster-operator,app.kubernetes.io/name=percona-xtradb-cluster,app.kubernetes.io/part-of=percona-xtradb-cluster
      size: 3
      status: initializing
    host: db-pxc-db-haproxy.men
    logcollector:
      version: 1.9.0
    observedGeneration: 2
    pmm:
      version: 2.18.0
    proxysql: {}
    pxc:
      image: percona/percona-xtradb-cluster:8.0.23-14.1
      labelSelectorPath: app.kubernetes.io/component=pxc,app.kubernetes.io/instance=db-pxc-db,app.kubernetes.io/managed-by=percona-xtradb-cluster-operator,app.kubernetes.io/name=percona-xtradb-cluster,app.kubernetes.io/part-of=percona-xtradb-cluster
      size: 3
      status: initializing
      version: 8.0.23-14.1
    ready: 0
    size: 6
    state: initializing
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
  • the cluster is not starting due to DNS resolution failures:
❯ k -n men get pods
NAME                  READY   STATUS    RESTARTS   AGE
db-pxc-db-haproxy-0   1/2     Running   13         55m
db-pxc-db-pxc-0       2/3     Running   3          55m
❯ k -n men exec -ti dnsutils -- nslookup db-pxc-db-pxc
Server:         10.32.0.10
Address:        10.32.0.10#53

** server can't find db-pxc-db-pxc: NXDOMAIN

command terminated with exit code 1
❯ k -n men exec -ti dnsutils -- nslookup db-pxc-db-pxc-unready
Server:         10.32.0.10
Address:        10.32.0.10#53

** server can't find db-pxc-db-pxc-unready: NXDOMAIN

command terminated with exit code 1
❯ k logs db-pxc-db-pxc-0 -c pxc | tail
2021/10/11 14:35:54 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:35:55 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:35:56 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:35:57 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:35:58 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:35:59 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:36:00 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:36:01 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:36:02 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:36:03 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
❯ k -n men describe pod db-pxc-db-pxc-0 | tail -20
Events:
  Type     Reason     Age   From     Message
  ----     ------     ----  ----     -------
  Normal   Pulled     60m   kubelet  Successfully pulled image "percona/percona-xtradb-cluster-operator:1.9.0-logcollector" in 24.784727162s
  Normal   Created    60m   kubelet  Created container logs
  Normal   Started    60m   kubelet  Started container logs
  Normal   Pulling    60m   kubelet  Pulling image "percona/percona-xtradb-cluster-operator:1.9.0-logcollector"
  Normal   Pulled     60m   kubelet  Successfully pulled image "percona/percona-xtradb-cluster-operator:1.9.0-logcollector" in 1.942397197s
  Normal   Created    60m   kubelet  Created container logrotate
  Normal   Pulling    60m   kubelet  Pulling image "percona/percona-xtradb-cluster:8.0.23-14.1"
  Normal   Started    60m   kubelet  Started container logrotate
  Normal   Pulled     60m   kubelet  Successfully pulled image "percona/percona-xtradb-cluster:8.0.23-14.1" in 23.794414542s
  Normal   Created    60m   kubelet  Created container pxc
  Normal   Started    60m   kubelet  Started container pxc
  Warning  Unhealthy  55m   kubelet  Liveness probe failed: ERROR 2003 (HY000): Can't connect to MySQL server on 'db-pxc-db-pxc-0' (111)
+ [[ -n '' ]]
+ exit 1
  Warning  Unhealthy  40s (x116 over 59m)  kubelet  Readiness probe failed: ERROR 2003 (HY000): Can't connect to MySQL server on 'db-pxc-db-pxc-0' (111)
+ [[ '' == \P\r\i\m\a\r\y ]]
+ exit 1

Cluster DNS resolution otherwise seems fine, and the endpoints are populated correctly:

❯ k -n men get endpoints db-pxc-db-pxc-unready
NAME                    ENDPOINTS                                                    AGE
db-pxc-db-pxc-unready   100.64.142.70:33062,100.64.142.70:33060,100.64.142.70:3306   64m

What I think might be the problem is that the PXC StatefulSet's serviceName field does not match the "unready" service name:

❯ k -n men get statefulset db-pxc-db-pxc -o jsonpath='{.spec.serviceName}'
db-pxc-db-pxc 

A similar issue is described here:

and that is how it is done for the ProxySQL StatefulSet, which in my tests starts just fine.

Any help appreciated.

Regards


@pavloos Please have a look at this commit K8SPXC-876 use PublishNotReadyAddresses instead of annotation · percona/percona-xtradb-cluster-operator@808ce22 · GitHub

This issue was fixed in 1.10.0. As a workaround, you can manually edit the service (kubectl edit svc/${clustername}-pxc-unready) and add publishNotReadyAddresses: true to the spec.
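
For example, the same change as a one-line patch (adjust the service name to your cluster; cluster1-pxc-unready is the default naming):

kubectl patch svc cluster1-pxc-unready --type merge -p '{"spec":{"publishNotReadyAddresses":true}}'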


Thanks @Slava_Sarzhan. That’s great news. When is 1.10.0 planned to be released?

Cheers


@pavloos We have a plan to release it at the beginning of November.


Hello

I'm using percona-xtradb-cluster-operator:1.11.0, but after a database restoration the cluster freezes in the initializing status.

Only 1 of 3 PXC pods is created; the next one is in CrashLoopBackOff status and can't connect to the pxc-unready service.

I can fix the issue and start the cluster only if I delete the backup information from the cluster manifest.

Could you tell me please, what am I doing wrong?

@Vsevosemnog Can you reproduce this issue using the latest PXC operator, v1.12.0?

@Vsevosemnog, we need to know more about your k8s deployment and also need to have your CR.