Fresh instance with Percona XtraDB Cluster Operator v1.8.0 not starting completely under OKD

I have the same problem:

  • vanilla Scaleway Kapsule cluster v1.21.4
  • CoreDNS 1.8.4
  • operator and DB installed using Helm charts
  • operator running in the db namespace
❯ k -n db get deployments.apps pxc-operator
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
pxc-operator   1/1     1            1           6m12s
❯ k -n db get deployments.apps pxc-operator -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: pxc-operator
    meta.helm.sh/release-namespace: db
  creationTimestamp: "2021-10-11T13:37:59Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: pxc-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: pxc-operator
    app.kubernetes.io/version: 1.9.0
    helm.sh/chart: pxc-operator-1.9.1
  name: pxc-operator
  namespace: db
  resourceVersion: "46352088"
  uid: b8c837b2-778f-466c-8d50-7967368ec120
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: operator
      app.kubernetes.io/instance: pxc-operator
      app.kubernetes.io/name: pxc-operator
      app.kubernetes.io/part-of: pxc-operator
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: operator
        app.kubernetes.io/instance: pxc-operator
        app.kubernetes.io/name: pxc-operator
        app.kubernetes.io/part-of: pxc-operator
    spec:
      containers:
      - command:
        - percona-xtradb-cluster-operator
        env:
        - name: WATCH_NAMESPACE
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: OPERATOR_NAME
          value: pxc-operator
        image: percona/percona-xtradb-cluster-operator:1.9.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /metrics
            port: metrics
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: pxc-operator
        ports:
        - containerPort: 8080
          name: metrics
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: pxc-operator
      serviceAccountName: pxc-operator
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-10-11T13:37:59Z"
    lastUpdateTime: "2021-10-11T13:37:59Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-10-11T13:37:59Z"
    lastUpdateTime: "2021-10-11T13:38:10Z"
    message: ReplicaSet "pxc-operator-5998c9b5cb" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
  • PXC in the men namespace
❯ k -n men get pxc -o yaml
apiVersion: v1
items:
- apiVersion: pxc.percona.com/v1
  kind: PerconaXtraDBCluster
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"pxc.percona.com/v1-9-0","kind":"PerconaXtraDBCluster"}
      meta.helm.sh/release-name: db
      meta.helm.sh/release-namespace: men
    creationTimestamp: "2021-10-11T13:38:47Z"
    finalizers:
    - delete-pxc-pods-in-order
    - delete-proxysql-pvc
    - delete-pxc-pvc
    generation: 2
    labels:
      app.kubernetes.io/instance: db
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: pxc-db
      app.kubernetes.io/version: 1.9.0
      helm.sh/chart: pxc-db-1.9.1
    name: db-pxc-db
    namespace: men
    resourceVersion: "46354859"
    uid: 36eae74f-f7c4-4df0-8dff-1ccc2491eb68
  spec:
    backup:
      image: percona/percona-xtradb-cluster-operator:1.9.0-pxc8.0-backup
      imagePullPolicy: Always
      pitr:
        enabled: false
        storageName: ""
      schedule:
      - keep: 5
        name: daily-backup
        schedule: 0 0 * * *
        storageName: fs-pvc
      storages:
        fs-pvc:
          podSecurityContext:
            fsGroup: 1001
            supplementalGroups:
            - 1001
          s3:
            bucket: ""
            credentialsSecret: ""
          type: filesystem
          volume:
            persistentVolumeClaim:
              accessModes:
              - ReadWriteOnce
              resources:
                requests:
                  storage: 6Gi
    crVersion: 1.9.0
    enableCRValidationWebhook: false
    haproxy:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      enabled: true
      envVarsSecret: db-pxc-db-env-vars-haproxy
      gracePeriod: 30
      image: percona/percona-xtradb-cluster-operator:1.9.0-haproxy
      imagePullPolicy: Always
      livenessProbes:
        failureThreshold: 4
        initialDelaySeconds: 60
        periodSeconds: 30
        successThreshold: 1
        timeoutSeconds: 5
      podDisruptionBudget:
        maxUnavailable: 1
      readinessProbes:
        failureThreshold: 3
        initialDelaySeconds: 15
        periodSeconds: 5
        successThreshold: 1
        timeoutSeconds: 1
      resources:
        limits: {}
        requests:
          cpu: 600m
          memory: 1G
      serviceAccountName: default
      sidecarResources:
        limits: {}
        requests: {}
      size: 3
      volumeSpec:
        emptyDir: {}
    logCollectorSecretName: db-pxc-db-log-collector
    logcollector:
      enabled: true
      image: percona/percona-xtradb-cluster-operator:1.9.0-logcollector
      imagePullPolicy: Always
      resources:
        limits: {}
        requests: {}
    platform: kubernetes
    pmm:
      resources:
        limits: {}
        requests:
          cpu: 600m
          memory: 1G
    proxysql:
      livenessProbes: {}
      podSecurityContext:
        fsGroup: 1001
        supplementalGroups:
        - 1001
      readinessProbes: {}
    pxc:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      autoRecovery: true
      envVarsSecret: db-pxc-db-env-vars-pxc
      expose: {}
      gracePeriod: 600
      image: percona/percona-xtradb-cluster:8.0.23-14.1
      imagePullPolicy: Always
      livenessDelaySec: 300
      livenessProbes:
        failureThreshold: 3
        initialDelaySeconds: 300
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      podDisruptionBudget:
        maxUnavailable: 1
      podSecurityContext:
        fsGroup: 1001
        supplementalGroups:
        - 1001
      readinessDelaySec: 15
      readinessProbes:
        failureThreshold: 5
        initialDelaySeconds: 15
        periodSeconds: 30
        successThreshold: 1
        timeoutSeconds: 15
      resources:
        limits: {}
        requests:
          cpu: 600m
          memory: 1G
      serviceAccountName: default
      sidecarResources:
        limits: {}
        requests: {}
      size: 3
      sslInternalSecretName: db-pxc-db-ssl-internal
      sslSecretName: db-pxc-db-ssl
      vaultSecretName: db-pxc-db-vault
      volumeSpec:
        emptyDir: {}
    secretsName: db-pxc-db
    sslInternalSecretName: db-pxc-db-ssl-internal
    sslSecretName: db-pxc-db-ssl
    updateStrategy: SmartUpdate
    upgradeOptions:
      apply: 8.0-recommended
      schedule: 0 4 * * *
      versionServiceEndpoint: https://check.percona.com
    vaultSecretName: db-pxc-db-vault
  status:
    backup:
      version: 8.0.23
    conditions:
    - lastTransitionTime: "2021-10-11T13:38:52Z"
      status: "True"
      type: initializing
    haproxy:
      labelSelectorPath: app.kubernetes.io/component=haproxy,app.kubernetes.io/instance=db-pxc-db,app.kubernetes.io/managed-by=percona-xtradb-cluster-operator,app.kubernetes.io/name=percona-xtradb-cluster,app.kubernetes.io/part-of=percona-xtradb-cluster
      size: 3
      status: initializing
    host: db-pxc-db-haproxy.men
    logcollector:
      version: 1.9.0
    observedGeneration: 2
    pmm:
      version: 2.18.0
    proxysql: {}
    pxc:
      image: percona/percona-xtradb-cluster:8.0.23-14.1
      labelSelectorPath: app.kubernetes.io/component=pxc,app.kubernetes.io/instance=db-pxc-db,app.kubernetes.io/managed-by=percona-xtradb-cluster-operator,app.kubernetes.io/name=percona-xtradb-cluster,app.kubernetes.io/part-of=percona-xtradb-cluster
      size: 3
      status: initializing
      version: 8.0.23-14.1
    ready: 0
    size: 6
    state: initializing
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
  • the cluster is not starting due to DNS resolution failures
❯ k -n men get pods
NAME                  READY   STATUS    RESTARTS   AGE
db-pxc-db-haproxy-0   1/2     Running   13         55m
db-pxc-db-pxc-0       2/3     Running   3          55m
❯ k -n men exec -ti dnsutils -- nslookup db-pxc-db-pxc
Server:         10.32.0.10
Address:        10.32.0.10#53

** server can't find db-pxc-db-pxc: NXDOMAIN

command terminated with exit code 1
❯ k -n men exec -ti dnsutils -- nslookup db-pxc-db-pxc-unready
Server:         10.32.0.10
Address:        10.32.0.10#53

** server can't find db-pxc-db-pxc-unready: NXDOMAIN

command terminated with exit code 1
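
The fully qualified name is also worth querying, to rule out a resolv.conf search-path problem (assuming the default cluster.local domain; dnsutils is the same debug pod as above):

❯ k -n men exec -ti dnsutils -- nslookup db-pxc-db-pxc-unready.men.svc.cluster.local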
❯ k logs db-pxc-db-pxc-0 -c pxc | tail
2021/10/11 14:35:54 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:35:55 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:35:56 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:35:57 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:35:58 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:35:59 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:36:00 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:36:01 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:36:02 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:36:03 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
❯ k -n men describe pod db-pxc-db-pxc-0 | tail -20
Events:
  Type     Reason     Age   From     Message
  ----     ------     ----  ----     -------
  Normal   Pulled     60m   kubelet  Successfully pulled image "percona/percona-xtradb-cluster-operator:1.9.0-logcollector" in 24.784727162s
  Normal   Created    60m   kubelet  Created container logs
  Normal   Started    60m   kubelet  Started container logs
  Normal   Pulling    60m   kubelet  Pulling image "percona/percona-xtradb-cluster-operator:1.9.0-logcollector"
  Normal   Pulled     60m   kubelet  Successfully pulled image "percona/percona-xtradb-cluster-operator:1.9.0-logcollector" in 1.942397197s
  Normal   Created    60m   kubelet  Created container logrotate
  Normal   Pulling    60m   kubelet  Pulling image "percona/percona-xtradb-cluster:8.0.23-14.1"
  Normal   Started    60m   kubelet  Started container logrotate
  Normal   Pulled     60m   kubelet  Successfully pulled image "percona/percona-xtradb-cluster:8.0.23-14.1" in 23.794414542s
  Normal   Created    60m   kubelet  Created container pxc
  Normal   Started    60m   kubelet  Started container pxc
  Warning  Unhealthy  55m   kubelet  Liveness probe failed: ERROR 2003 (HY000): Can't connect to MySQL server on 'db-pxc-db-pxc-0' (111)
+ [[ -n '' ]]
+ exit 1
  Warning  Unhealthy  40s (x116 over 59m)  kubelet  Readiness probe failed: ERROR 2003 (HY000): Can't connect to MySQL server on 'db-pxc-db-pxc-0' (111)
+ [[ '' == \P\r\i\m\a\r\y ]]
+ exit 1

Cluster DNS resolution seems fine otherwise, and the endpoints are populated:

❯ k -n men get endpoints db-pxc-db-pxc-unready
NAME                    ENDPOINTS                                                    AGE
db-pxc-db-pxc-unready   100.64.142.70:33062,100.64.142.70:33060,100.64.142.70:3306   64m
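
One more check that might be relevant: as far as I understand headless-service DNS, CoreDNS only serves records for pods that are not ready if the Service sets spec.publishNotReadyAddresses (a standard core/v1 Service field), so its value is worth inspecting:

❯ k -n men get svc db-pxc-db-pxc-unready -o jsonpath='{.spec.publishNotReadyAddresses}'

An empty or false result here would be consistent with the NXDOMAIN answers above, since the PXC pods cannot become ready before they can resolve each other.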

What I think might be the problem is that the PXC StatefulSet’s serviceName field does not match the name of the “unready” service the pods are trying to resolve:

❯ k -n men get statefulset db-pxc-db-pxc -o jsonpath='{.spec.serviceName}'
db-pxc-db-pxc 
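
For context, a StatefulSet’s per-pod DNS records are only created under the headless Service named in its spec.serviceName, so with the value above each pod gets a name like (a sketch, assuming the default cluster.local domain):

db-pxc-db-pxc-0.db-pxc-db-pxc.men.svc.cluster.local

Nothing registers the pods under db-pxc-db-pxc-unready unless that Service itself selects them and publishes their addresses while they are unready.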

A similar issue is described here, and that is how it’s done for the ProxySQL StatefulSet, which in my tests starts just fine.

Any help appreciated.

Regards

@pavloos Please have a look at this commit: K8SPXC-876 use PublishNotReadyAddresses instead of annotation · percona/percona-xtradb-cluster-operator@808ce22 · GitHub

This issue was fixed in 1.10.0. As a workaround, you can edit the service manually (kubectl edit svc/${clustername}-pxc-unready) and add publishNotReadyAddresses: true to the spec.
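
For example, the same change can be applied non-interactively with a merge patch (namespace and service name taken from the cluster above; adjust for your deployment):

❯ kubectl -n men patch svc db-pxc-db-pxc-unready --type merge -p '{"spec":{"publishNotReadyAddresses":true}}'

After that, the Service spec should contain publishNotReadyAddresses: true.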

Thanks @Slava_Sarzhan. That’s great news. When is 1.10.0 planned to be released?

Cheers

@pavloos We plan to release it at the beginning of November.

Hello

I'm using percona-xtradb-cluster-operator:1.11.0, but after a database restoration the cluster freezes in the initializing status.

Only 1 of 3 PXC pods is created; the next one is in CrashLoopBackOff status and can't connect to the pxc-unready svc.

I can fix this issue and start the cluster only if I delete the backup information from the cluster manifest.

Could you please tell me what I am doing wrong?

@Vsevosemnog Can you reproduce this issue using the latest PXC operator, v1.12.0?

@Vsevosemnog, we need to know more about your k8s deployment, and we also need to see your CR.