Fresh instance with Percona XtraDB Cluster Operator v1.8.0 not starting completly under OKD

mygov · May 11, 2021, 9:44am

I have tried multiple times to install a fresh and basic Percona XtraDB Cluster Instance with the Operator provided from the operatorhub. Unfortunately without success.

OKD Version: 4.7.0-0.okd-2021-04-24-103438 with OpenShift Container Storage v4.6.4 (latest)
Percona XtraDB Cluster Operator Version: 1.8.0 from operatorhub.io (latest)

Steps to reproduce:

oc create namespace pxc
(
cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-secrets
  namespace: pxc
type: Opaque
data:
  root: dHU3ZWl6aWVuNEVldGg3ZGFlbjZtaWV5aQo=
  xtrabackup: Ym9vTDN2YWhkMG5pRjVBZVBoMGVlamFWaQo=
  monitor: QWVwaGVlMGVleDJRdW9oc2hlaWM5U2VlNAo=
  clustercheck: b293aWV4YTZldTBlb2owYWNvb2NhM0VjaAo=
  proxyadmin: SWUzU2hhaDV0aGlleGVpRDFzaGllaDRBaQo=
  pmmserver: RmFpTmdpazFvaG43YWVuaWVmZXVwYWhzaAo=
  operator: WWVuYWlsYWlQb1dpdUhlaWdlZTFvZXZpZQo=
EOF
 ) | oc create -f -

OperatorHub > Install Percona XtraDB Cluster Operator (v1.8.0)

Installed Operators > Percona XtraDB Cluster Operator → PerconaXtraDBCluster > Create PerconaXtraDBCluster > Name: cluster1 > Create

After creating the Instance, the pods cluster1-haproxy-0 and cluster1-pxc-0 will not start completely:

oc -n pxc get pods
NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 1/2     Running   0          112s
cluster1-pxc-0                                     2/3     Running   0          112s
percona-xtradb-cluster-operator-598bf796f7-5k6jt   1/1     Running   0          19h

oc -n pxc logs cluster1-haproxy-0 pxc-monit
+ '[' /usr/bin/peer-list = haproxy ']'
+ exec /usr/bin/peer-list -on-change=/usr/bin/add_pxc_nodes.sh -service=cluster1-pxc
2021/05/11 09:26:54 Peer finder enter
2021/05/11 09:26:54 Determined Domain to be pxc.svc.cluster.local
2021/05/11 09:26:54 No on-start supplied, on-change /usr/bin/add_pxc_nodes.sh will be applied on start.
2021/05/11 09:26:54 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:26:55 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:26:56 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:26:57 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:26:58 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:26:59 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:27:00 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:27:01 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:27:02 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:27:03 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:27:05 lookup cluster1-pxc on 10.30.0.10:53: no such host

c -n pxc get events
LAST SEEN   TYPE      REASON                   OBJECT                                 MESSAGE
23m         Warning   Unhealthy                pod/cluster1-haproxy-0                 Readiness probe failed: ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 2
27m         Warning   Unhealthy                pod/cluster1-haproxy-0                 Liveness probe failed: ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 2
18m         Warning   Unhealthy                pod/cluster1-haproxy-0                 Readiness probe errored: rpc error: code = NotFound desc = container is not created or running: checking if PID of ba67b440dbfab358bf8c4ca5015898b0c3b113d0b2bd652affa59ff5040860d4 is running failed: container process not found
27m         Warning   Unhealthy                pod/cluster1-pxc-0                     Readiness probe failed: ERROR 2003 (HY000): Can't connect to MySQL server on 'cluster1-pxc-0' (111)
+ [[ '' == \P\r\i\m\a\r\y ]]
+ exit 1
23m         Warning   Unhealthy                pod/cluster1-pxc-0                     Readiness probe failed: ERROR 1045 (28000): Access denied for user 'monitor'@'cluster1-pxc-0.cluster1-pxc.pxc.svc.cluster.local' (using password: YES)
+ [[ '' == \P\r\i\m\a\r\y ]]
+ exit 1
22m         Warning   Unhealthy                pod/cluster1-pxc-0                     Liveness probe failed: ERROR 1045 (28000): Access denied for user 'monitor'@'cluster1-pxc-0.cluster1-pxc.pxc.svc.cluster.local' (using password: YES)
+ [[ -n '' ]]
+ exit 1
18m         Warning   FailedToUpdateEndpoint            endpoints/cluster1-pxc-unready                 Failed to update endpoint pxc/cluster1-pxc-unready: Operation cannot be fulfilled on endpoints "cluster1-pxc-unready": the object has been modified; please apply your changes to the latest version and try again

Do you have any ideas to resolve this issue?

Ege_Gunes · May 11, 2021, 10:37am

Hi,

Can you try without creating the my-cluster-secrets on a new namespace? Operator should automatically create secrets for you.

mygov · May 11, 2021, 10:50am

I deleted the my-cluster-secrets (the creation of the secrets is shown in the readme at OperatorHub.io | The registry for Kubernetes Operators) and reinstalled the operator.

After create a new instance I see the same result.

oc -n pxc get pods
NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 1/2     Running   0          106s
cluster1-pxc-0                                     2/3     Running   0          106s
percona-xtradb-cluster-operator-669c94886f-g44cs   1/1     Running   0          2m43s

Instance status in PerconaXtraDBClusters is still “State: initializing”.

The pxc-monit container shows again much of those lines:
2021/05/11 10:49:12 lookup cluster1-pxc on 10.30.0.10:53: no such host

So, no changes at all without the my-cluster-secrets secret.

Ege_Gunes · May 11, 2021, 11:12am

Yes, it seems something is wrong. Could you please try again on a clean namespace?

oc create namespace pxc-new

mygov · May 11, 2021, 11:37am

I removed the old pxc namespace and also the Operator

After I created a new namespace:
oc create namespace pxc-new
namespace/pxc-new created
and used this new namespace for the fresh Operator installation, it seems to work:

oc -n pxc-new get pods
NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running   0          6m27s
cluster1-haproxy-1                                 2/2     Running   0          4m45s
cluster1-haproxy-2                                 2/2     Running   0          4m11s
cluster1-pxc-0                                     3/3     Running   0          6m27s
cluster1-pxc-1                                     3/3     Running   0          4m45s
cluster1-pxc-2                                     3/3     Running   0          3m21s
percona-xtradb-cluster-operator-6ff787986b-gdcdl   1/1     Running   0          7m17s

Now the instance state is “State: ready”.

Thanks!
The issue seems to be resolved.

Seedy_Bensouda · May 26, 2021, 6:28pm

I am running into the same issue. We evaluating the operator and are doing a vanila deployment on a charmed kubernetes cluster with rook-ceph. Initially we had modified secrets.yaml as per the document and then did a kubectl -n pvx apply -f secrets.yaml but haproxy did not start with Warning Unhealthy 5m3s kubelet Readiness probe failed: ERROR 2003 (HY000): Can’t connect to MySQL server on ‘cluster1-pxc-0’ (111)

We delete everything including the namspace and created a new one called pxcluster and then ran everything again with out applying the secrets.yaml file.

However when we run kubectl -n pxcluster get pods we get:
NAME READY STATUS RESTARTS AGE
cluster1-haproxy-0 2/2 Running 0 10m
cluster1-haproxy-1 1/2 Running 3 9m22s
cluster1-pxc-0 3/3 Running 0 10m
cluster1-pxc-1 2/3 Running 1 9m28s
percona-xtradb-cluster-operator-77bfd8cdc5-psrpb 1/1 Running 0 11m

When we describe the haproxy and cluster node we see the following:
ype Reason Age From Message

Warning FailedScheduling Normal Scheduled 5m50s Normal SuccessfulAttachVolume 5m50s Normal Pulling 5m47s Normal Started 5m46s Normal Created 5m46s Normal Pulled 5m46s Normal Pulling 5m45s Normal Pulling 5m44s Normal Pulled 5m44s Normal Created 5m44s Normal Started 5m44s Normal Pulled 5m42s Normal Created 5m42s Normal Started 5m42s Normal Pulling 5m42s Normal Pulled 5m41s Normal Created 5m41s Normal Started 5m41s Warning Unhealthy 5m3s 5m53s (x2 over 5m53s) default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
default-scheduler Successfully assigned pxcluster/cluster1-pxc-0 to k8s-node-3
attachdetach-controller AttachVolume.Attach succeeded for volume “pvc-8617f9c4-d5d5-43f4-af54-54e685b17bac”
kubelet Pulling image “percona/percona-xtradb-cluster-operator:1.8.0”
kubelet Started container pxc-init
kubelet Created container pxc-init
kubelet Successfully pulled image “percona/percona-xtradb-cluster-operator:1.8.0” in 1.298818047s
kubelet Pulling image “percona/percona-xtradb-cluster-operator:1.8.0-logcollector”
kubelet Pulling image “percona/percona-xtradb-cluster-operator:1.8.0-logcollector”
kubelet Successfully pulled image “percona/percona-xtradb-cluster-operator:1.8.0-logcollector” in 1.294141746s
kubelet Created container logs
kubelet Started container logs
kubelet Successfully pulled image “percona/percona-xtradb-cluster-operator:1.8.0-logcollector” in 1.351197875s
kubelet Created container logrotate
kubelet Started container logrotate
kubelet Pulling image “percona/percona-xtradb-cluster:8.0.22-13.1”
kubelet Successfully pulled image “percona/percona-xtradb-cluster:8.0.22-13.1” in 1.318286907s
kubelet Created container pxc
kubelet Started container pxc
kubelet Readiness probe failed: ERROR 2003 (HY000): Can’t connect to MySQL server on ‘cluster1-pxc-0’ (111)

[[ ‘’ == \P\r\i\m\a\r\y ]]
exit 1

Slava_Sarzhan · May 28, 2021, 1:32pm

Hi @Seedy_Bensouda ,

As I can see you have two issues there. One issue is that your HAProxy ‘cluster1-haproxy-1’ pod can’t connect to cluster1-pxc-0 and another one is that pxc container on pod cluster1-pxc-1 can not start (join to the cluster) . Please make sure that you don’t have any communication/network (all needed ports are opened, IPs are reachable and so on ) issues between k8s nodes.

Seedy_Bensouda · May 29, 2021, 10:42am

Thanks for the reploy @Slava_Sarzhan . I do not have any communication issues as far as I can see. Flannel with calico are up and both working and my ceph cluster is detecting heart beats from all nodes. Also other pods are working right.

Do I need configure EmptyDir on haproxy? Could that be it?

Sergey_Pronin · June 7, 2021, 8:14am

I don’t believe EmptyDir has anything to do here. HAProxy pods are stateless, so it should not be an issue.

Is there anything else specific about your k8s cluster or Operator configuration?

MarcoFan · June 12, 2021, 10:11am

Hello, i have the same issue, instance installed following this page: Install Percona XtraDB Cluster on Kubernetes

in the yaml files i edited only storageClass name

pod can’t complete running status
NAME READY STATUS RESTARTS AGE
cluster1-haproxy-0 2/2 Running 1 16m
cluster1-haproxy-1 2/2 Running 0 11m
cluster1-haproxy-2 2/2 Running 0 11m
cluster1-pxc-0 3/3 Running 0 16m
cluster1-pxc-1 2/3 Running 0 11m
percona-xtradb-cluster-operator-77bfd8cdc5-5c9xm 1/1 Running 0 17m

12m Warning Unhealthy pod/cluster1-pxc-1 Readiness probe failed: ERROR 2003 (HY000): Can’t connect to MySQL server on ‘cluster1-pxc-1’ (111)

already tried to install changing namespace, with and without create secrets…

kubernetes v1.20.6 on rancher v2.5.7

Sergey_Pronin · June 13, 2021, 11:33am

@MarcoFan anything in the logs of the Operator and Pods?
Is the 3rd PXC pod starting at all?

andrey · June 24, 2021, 2:09pm

i have the same error if i install calico

minikube start --driver=virtualbox --disable-driver-mounts --cpus=12 --memory=16096  --network-plugin=cni --cni=calico --nodes=3

Readiness probe failed: ERROR 2003 (HY000): Can't connect to MySQL server on 'my-db-pxc-db-pxc-0' (111) + [[ '' == \P\r\i\m\a\r\y ]] + exit 1

logs-from-logs-in-cluster1-pxc-0.txt (929 Bytes)
logs-from-haproxy-in-cluster1-haproxy-0.txt (647 Bytes)

 Readiness probe failed: ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 2
Back-off restarting failed container

andrey · June 24, 2021, 2:33pm

if no calico
percona-xtradb-cluster works

Slava_Sarzhan · July 8, 2021, 2:19pm

Hi @andrey

I can reproduce it using command provided by you. The root of the issue is that cluster1-pxc-0/cluster1-haproxy-0 pods can’t resolve services like cluster1-pxc-unready. That is why operator can’t configure the cluster in a proper way. It is calico issue. As I can see calico v3.14.1 is used by minikube and it was released more than one year ago. I have installed the latest v3.19.1 calico using official documentation Quickstart for Calico on minikube (using Manifest ) and issue has gone:

>kubectl get pods -l k8s-app=calico-node -n kube-system
NAME                READY   STATUS    RESTARTS   AGE
calico-node-fkwnn   1/1     Running   0          20m
calico-node-mk8dx   1/1     Running   0          19m
calico-node-z29f5   1/1     Running   0          18m

> kubectl get pods
NAME                                            READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                              2/2     Running   0          5m32s
cluster1-haproxy-1                              2/2     Running   0          3m30s
cluster1-haproxy-2                              2/2     Running   0          3m4s
cluster1-pxc-0                                  3/3     Running   0          5m32s
cluster1-pxc-1                                  3/3     Running   0          3m29s
cluster1-pxc-2                                  3/3     Running   0          117s
percona-xtradb-cluster-operator-d99c748-jhv4x   1/1     Running   0          6m16s

Also, I have tested it on scaleway k8s cluster with CNI calico and it also works. Try to use the latest version of calico and inform me about the results.

Seedy_Bensouda · July 24, 2021, 12:59pm

@Slava_Sarzhan so i managed to get this working. The issue originally was that I was trying to define secrets instead of allowing percona to create them for itself. Things worked and I moved on. I came back today to do some maintenance and noticed that issue had come back.

NAME READY STATUS RESTARTS AGE
68e50-daily-backup-1627084800-ps25h 0/1 Completed 0 12h
68e50-sat-night-backup-1627084800-q2wx6 0/1 Completed 0 12h
cluster1-haproxy-0 1/2 Running 6 19m
cluster1-haproxy-1 1/2 CrashLoopBackOff 11903 53d
cluster1-haproxy-2 1/2 CrashLoopBackOff 11903 53d
cluster1-pxc-0 3/3 Running 19 53d
cluster1-pxc-1 2/3 CrashLoopBackOff 7 20m
percona-xtradb-cluster-operator-77bfd8cdc5-9r6vr 1/1 Running 1 53d
xb-cron-cluster1-s3-us-west-20210605000008-3d2dv-hxwjd 0/1 CreateContainerConfigError 0 45d

I am using calico v3.19.1 as shown below.
kubectl calico version
Client Version: v3.19.1
Git commit: 6fc0db96
Unable to retrieve Cluster Version or Type: resource does not exist: ClusterInformation(default) with error: the server could not find the requested resource (get ClusterInformations.crd.projectcalico.org default)

I did some more digging in the logs and found the following. It looks likes there was an attempt by galera to open a connection and that failed.

[0] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130432.413401552, {“log”=>“2021-07-24T12:40:32.412880Z 0 [Warning] [MY-000000] [Galera] last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3.50417S), skipping check”}]
[0] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130461.921485610, {“log”=>“2021-07-24T12:41:01.920841Z 0 [Note] [MY-000000] [Galera] PC protocol downgrade 1 → 0”}]
[1] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130461.921909299, {“log”=>“2021-07-24T12:41:01.921460Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node”}]
[2] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130461.921911826, {“log”=>“view ((empty))”}]
[3] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130461.922410405, {“log”=>“2021-07-24T12:41:01.922374Z 0 [ERROR] [MY-000000] [Galera] failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)”}]
[4] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130461.922412800, {“log”=>" at gcomm/src/pc.cpp:connect():161"}]
[5] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130461.922487606, {“log”=>“2021-07-24T12:41:01.922428Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_core.cpp:gcs_core_open():220: Failed to open backend connection: -110 (Connection timed out)”}]
[0] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.922868257, {“log”=>“2021-07-24T12:41:02.922714Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread”}]
[1] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.922874019, {“log”=>“2021-07-24T12:41:02.922822Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread”}]
[2] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.923158562, {“log”=>“2021-07-24T12:41:02.923073Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs.cpp:gcs_open():1754: Failed to open channel ‘cluster1-pxc’ at ‘gcomm://10.1.86.126’: -110 (Connection timed out)”}]
[3] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.923258685, {“log”=>“2021-07-24T12:41:02.923175Z 0 [ERROR] [MY-000000] [Galera] gcs connect failed: Connection timed out”}]
[4] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.923260499, {“log”=>“2021-07-24T12:41:02.923219Z 0 [ERROR] [MY-000000] [WSREP] Provider/Node (gcomm://10.1.86.126) failed to establish connection with cluster (reason: 7)”}]
[5] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.923345885, {“log”=>“2021-07-24T12:41:02.923255Z 0 [ERROR] [MY-010119] [Server] Aborting”}]
[6] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.923724901, {“log”=>“2021-07-24T12:41:02.923666Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.22-13.1) Percona XtraDB Cluster (GPL), Release rel13, Revision a48e6d5, WSREP version 26.4.3.”}]
[7] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.924285826, {“log”=>“2021-07-24T12:41:02.924248Z 0 [Note] [MY-000000] [Galera] dtor state: CLOSED”}]
[8] pxcluster.cluster1-pxc-1.mysqld-error.log: [1627130462.924356763, {“log”=>“2021-07-24T12:41:02.924329Z 0 [Note] [MY-000000] [Galera] MemPool(TrxHandleSlave): hit ratio: 0, misses: 0, in use: 0, in pool: 0”}]

andrey · July 25, 2021, 6:22pm

Thanks for the quick response. The problem was with coredns on minikube. I created a simple Pod to use as a test environment

apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
  nodeName: minikube-m02

Installed it on minikube-m02 to check DNS and I found out that dns does not work on minikube-m02
no found kubernetes.default

no found cluster1-pxc-0

I just reloaded CoreDNS on minikube

└$► kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                      READY   STATUS    RESTARTS   AGE
coredns-74ff55c5b-hfklv   1/1     Running   0          38m
└$► kubectl delete coredns-74ff55c5b-hfklv -n kube-system

Percona XtraDB Cluster on Minikube (minikube start --driver=virtualbox --disable-driver-mounts --cpus=12 --memory=16096 --network-plugin=cni --cni=calico --nodes=3) is working

Just restart CoreDNS on minikube

It makes no difference to run the minicube on three nodes at once or add one at a time. CoreDNS work only minikube-m00

Seedy_Bensouda · July 27, 2021, 8:26pm

@Andrey just to give more insite. I am using ubuntu with charmed kubernetes. I check and my DNS server is working as expected. dns test is as follows:

kubectl exec -i -t dnsutils – nslookup cluster1-pxc.pxcluster
Server: 10.152.183.14
Address: 10.152.183.14#53

Name: cluster1-pxc.pxcluster.svc.cluster.local
Address: 10.1.86.126

andrey · July 28, 2021, 9:35am

I don’t have much experience, but it seems to me that you still have a problem with DNS.

└$► kubectl exec -i -t dnsutils -- nslookup cluster1-pxc
Server:		10.96.0.10
Address:	10.96.0.10#53

Name:	cluster1-pxc.default.svc.cluster.local
Address: 10.244.205.194
Name:	cluster1-pxc.default.svc.cluster.local
Address: 10.244.151.3
Name:	cluster1-pxc.default.svc.cluster.local
Address: 10.244.120.66

Seedy_Bensouda · July 29, 2021, 6:58pm

Not sure I see the issue. Normally you have to specify the name space of the pod to resolve it and in this case dnsutils us running in the default name space while percona cluster is running in pxcluster name space.

Seedy_Bensouda · August 5, 2021, 8:53am

is there any further suggestion on this that i can try?

Topic		Replies	Views
Kubernetes: percona-xtradb-cluster-operator fails to initialize - readiness probe failed Percona Operator for MySQL	15	1909	February 16, 2023
PXC cluster not starting in kind Percona Operator for MySQL	10	2391	February 28, 2023
Cluster status and backups not working Percona Operator for MySQL	5	1072	October 28, 2021
Create PXC Cluster: sed: -e expression Percona Operator for MySQL percona	21	1835	May 8, 2021
Can you help me setup pxc cluster on k8s Percona Operator for MySQL	11	1120	August 25, 2021

Fresh instance with Percona XtraDB Cluster Operator v1.8.0 not starting completly under OKD

Related topics