Fresh instance with Percona XtraDB Cluster Operator v1.8.0 not starting completely under OKD

I have tried multiple times to install a fresh, basic Percona XtraDB Cluster instance with the Operator provided by OperatorHub, unfortunately without success.

OKD Version: 4.7.0-0.okd-2021-04-24-103438 with OpenShift Container Storage v4.6.4 (latest)
Percona XtraDB Cluster Operator Version: 1.8.0 from operatorhub.io (latest)

Steps to reproduce:

oc create namespace pxc
(
cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-secrets
  namespace: pxc
type: Opaque
data:
  root: dHU3ZWl6aWVuNEVldGg3ZGFlbjZtaWV5aQo=
  xtrabackup: Ym9vTDN2YWhkMG5pRjVBZVBoMGVlamFWaQo=
  monitor: QWVwaGVlMGVleDJRdW9oc2hlaWM5U2VlNAo=
  clustercheck: b293aWV4YTZldTBlb2owYWNvb2NhM0VjaAo=
  proxyadmin: SWUzU2hhaDV0aGlleGVpRDFzaGllaDRBaQo=
  pmmserver: RmFpTmdpazFvaG43YWVuaWVmZXVwYWhzaAo=
  operator: WWVuYWlsYWlQb1dpdUhlaWdlZTFvZXZpZQo=
EOF
 ) | oc create -f -

OperatorHub > Install Percona XtraDB Cluster Operator (v1.8.0)

Installed Operators > Percona XtraDB Cluster Operator → PerconaXtraDBCluster > Create PerconaXtraDBCluster > Name: cluster1 > Create

After creating the Instance, the pods cluster1-haproxy-0 and cluster1-pxc-0 will not start completely:

oc -n pxc get pods
NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 1/2     Running   0          112s
cluster1-pxc-0                                     2/3     Running   0          112s
percona-xtradb-cluster-operator-598bf796f7-5k6jt   1/1     Running   0          19h
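
To pick out the pods whose containers are not all ready from output like the above, a small filter can help. This is only a sketch: the function name not_ready_pods is made up for illustration, and it assumes the default column layout of `get pods`, with READY as the second column.

```shell
# Reads `oc get pods` / `kubectl get pods` output on stdin and prints the
# names of pods whose READY column (e.g. 1/2) shows fewer ready containers
# than the pod has in total. The header line is skipped.
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}
```

Usage would be along the lines of `oc -n pxc get pods | not_ready_pods`.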

oc -n pxc logs cluster1-haproxy-0 pxc-monit
+ '[' /usr/bin/peer-list = haproxy ']'
+ exec /usr/bin/peer-list -on-change=/usr/bin/add_pxc_nodes.sh -service=cluster1-pxc
2021/05/11 09:26:54 Peer finder enter
2021/05/11 09:26:54 Determined Domain to be pxc.svc.cluster.local
2021/05/11 09:26:54 No on-start supplied, on-change /usr/bin/add_pxc_nodes.sh will be applied on start.
2021/05/11 09:26:54 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:26:55 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:26:56 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:26:57 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:26:58 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:26:59 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:27:00 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:27:01 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:27:02 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:27:03 lookup cluster1-pxc on 10.30.0.10:53: no such host
2021/05/11 09:27:05 lookup cluster1-pxc on 10.30.0.10:53: no such host

oc -n pxc get events
LAST SEEN   TYPE      REASON                   OBJECT                                 MESSAGE
23m         Warning   Unhealthy                pod/cluster1-haproxy-0                 Readiness probe failed: ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 2
27m         Warning   Unhealthy                pod/cluster1-haproxy-0                 Liveness probe failed: ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 2
18m         Warning   Unhealthy                pod/cluster1-haproxy-0                 Readiness probe errored: rpc error: code = NotFound desc = container is not created or running: checking if PID of ba67b440dbfab358bf8c4ca5015898b0c3b113d0b2bd652affa59ff5040860d4 is running failed: container process not found
27m         Warning   Unhealthy                pod/cluster1-pxc-0                     Readiness probe failed: ERROR 2003 (HY000): Can't connect to MySQL server on 'cluster1-pxc-0' (111)
+ [[ '' == \P\r\i\m\a\r\y ]]
+ exit 1
23m         Warning   Unhealthy                pod/cluster1-pxc-0                     Readiness probe failed: ERROR 1045 (28000): Access denied for user 'monitor'@'cluster1-pxc-0.cluster1-pxc.pxc.svc.cluster.local' (using password: YES)
+ [[ '' == \P\r\i\m\a\r\y ]]
+ exit 1
22m         Warning   Unhealthy                pod/cluster1-pxc-0                     Liveness probe failed: ERROR 1045 (28000): Access denied for user 'monitor'@'cluster1-pxc-0.cluster1-pxc.pxc.svc.cluster.local' (using password: YES)
+ [[ -n '' ]]
+ exit 1
18m         Warning   FailedToUpdateEndpoint            endpoints/cluster1-pxc-unready                 Failed to update endpoint pxc/cluster1-pxc-unready: Operation cannot be fulfilled on endpoints "cluster1-pxc-unready": the object has been modified; please apply your changes to the latest version and try again

Do you have any ideas to resolve this issue?

Hi,

Can you try without creating the my-cluster-secrets on a new namespace? Operator should automatically create secrets for you.
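
If a hand-written secret is kept, one detail of the manifest above may be worth checking: every base64 value in it decodes to a string that ends with a newline, which typically happens when a password is encoded with echo rather than echo -n or printf. Whether that contributed to the "Access denied" probe failures is not established in this thread. A minimal sketch of such a check, assuming a base64 that accepts -d (as GNU coreutils does); the function name ends_with_newline is made up for illustration:

```shell
# Succeeds (exit status 0) if the base64 string given as $1 decodes to
# data whose last byte is a newline (0x0a).
ends_with_newline() {
  last_byte=$(printf '%s' "$1" | base64 -d | tail -c 1 | od -An -tx1 | tr -d ' \n')
  [ "$last_byte" = "0a" ]
}

# Example with the "root" value from the manifest above.
if ends_with_newline "dHU3ZWl6aWVuNEVldGg3ZGFlbjZtaWV5aQo="; then
  echo "root: decoded value ends with a newline"
fi
```

Encoding with `printf '%s' 'password' | base64` avoids the trailing newline.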


I deleted my-cluster-secrets (creating the secrets is shown in the readme on OperatorHub.io) and reinstalled the operator.

After creating a new instance I see the same result.

oc -n pxc get pods
NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 1/2     Running   0          106s
cluster1-pxc-0                                     2/3     Running   0          106s
percona-xtradb-cluster-operator-669c94886f-g44cs   1/1     Running   0          2m43s

Instance status in PerconaXtraDBClusters is still “State: initializing”.

The pxc-monit container again shows many of these lines:
2021/05/11 10:49:12 lookup cluster1-pxc on 10.30.0.10:53: no such host

So, no changes at all without the my-cluster-secrets secret.


Yes, it seems something is wrong. Could you please try again on a clean namespace?

oc create namespace pxc-new

I removed the old pxc namespace and also the Operator.

After I created a new namespace:
oc create namespace pxc-new
namespace/pxc-new created
and used this new namespace for the fresh Operator installation, it seems to work:

oc -n pxc-new get pods
NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running   0          6m27s
cluster1-haproxy-1                                 2/2     Running   0          4m45s
cluster1-haproxy-2                                 2/2     Running   0          4m11s
cluster1-pxc-0                                     3/3     Running   0          6m27s
cluster1-pxc-1                                     3/3     Running   0          4m45s
cluster1-pxc-2                                     3/3     Running   0          3m21s
percona-xtradb-cluster-operator-6ff787986b-gdcdl   1/1     Running   0          7m17s

Now the instance state is “State: ready”.

Thanks!
The issue seems to be resolved.


I am running into the same issue. We are evaluating the operator and doing a vanilla deployment on a Charmed Kubernetes cluster with rook-ceph. Initially we modified secrets.yaml as per the documentation and then ran kubectl -n pvx apply -f secrets.yaml, but haproxy did not start, with: Warning Unhealthy 5m3s kubelet Readiness probe failed: ERROR 2003 (HY000): Can't connect to MySQL server on 'cluster1-pxc-0' (111)

We deleted everything, including the namespace, created a new one called pxcluster, and then ran everything again without applying the secrets.yaml file.

However when we run kubectl -n pxcluster get pods we get:
NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running   0          10m
cluster1-haproxy-1                                 1/2     Running   3          9m22s
cluster1-pxc-0                                     3/3     Running   0          10m
cluster1-pxc-1                                     2/3     Running   1          9m28s
percona-xtradb-cluster-operator-77bfd8cdc5-psrpb   1/1     Running   0          11m

When we describe the haproxy and cluster node we see the following:
Type Reason Age From Message


Warning FailedScheduling 5m53s (x2 over 5m53s) default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 5m50s default-scheduler Successfully assigned pxcluster/cluster1-pxc-0 to k8s-node-3
Normal SuccessfulAttachVolume 5m50s attachdetach-controller AttachVolume.Attach succeeded for volume “pvc-8617f9c4-d5d5-43f4-af54-54e685b17bac”
Normal Pulling 5m47s kubelet Pulling image “percona/percona-xtradb-cluster-operator:1.8.0”
Normal Started 5m46s kubelet Started container pxc-init
Normal Created 5m46s kubelet Created container pxc-init
Normal Pulled 5m46s kubelet Successfully pulled image “percona/percona-xtradb-cluster-operator:1.8.0” in 1.298818047s
Normal Pulling 5m45s kubelet Pulling image “percona/percona-xtradb-cluster-operator:1.8.0-logcollector”
Normal Pulling 5m44s kubelet Pulling image “percona/percona-xtradb-cluster-operator:1.8.0-logcollector”
Normal Pulled 5m44s kubelet Successfully pulled image “percona/percona-xtradb-cluster-operator:1.8.0-logcollector” in 1.294141746s
Normal Created 5m44s kubelet Created container logs
Normal Started 5m44s kubelet Started container logs
Normal Pulled 5m42s kubelet Successfully pulled image “percona/percona-xtradb-cluster-operator:1.8.0-logcollector” in 1.351197875s
Normal Created 5m42s kubelet Created container logrotate
Normal Started 5m42s kubelet Started container logrotate
Normal Pulling 5m42s kubelet Pulling image “percona/percona-xtradb-cluster:8.0.22-13.1”
Normal Pulled 5m41s kubelet Successfully pulled image “percona/percona-xtradb-cluster:8.0.22-13.1” in 1.318286907s
Normal Created 5m41s kubelet Created container pxc
Normal Started 5m41s kubelet Started container pxc
Warning Unhealthy 5m3s kubelet Readiness probe failed: ERROR 2003 (HY000): Can’t connect to MySQL server on ‘cluster1-pxc-0’ (111)

+ [[ '' == \P\r\i\m\a\r\y ]]
+ exit 1

Hi @Seedy_Bensouda,

As far as I can see, you have two issues. One is that your HAProxy pod 'cluster1-haproxy-1' can't connect to cluster1-pxc-0; the other is that the pxc container in pod cluster1-pxc-1 cannot start (join the cluster). Please make sure that you don't have any communication/network issues between k8s nodes (all needed ports are open, IPs are reachable, and so on).


Thanks for the reply @SlavaSarzhan. I do not have any communication issues as far as I can see. Flannel with Calico are both up and working, and my Ceph cluster is detecting heartbeats from all nodes. Other pods are also working correctly.

Do I need to configure EmptyDir on haproxy? Could that be it?


I don't believe EmptyDir has anything to do with this. HAProxy pods are stateless, so it should not be an issue.

Is there anything else specific about your k8s cluster or Operator configuration?


Hello, I have the same issue. The instance was installed following this page: Install Percona XtraDB Cluster on Kubernetes

In the YAML files I edited only the storageClass name.

The pods do not all reach a fully ready state:
NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running   1          16m
cluster1-haproxy-1                                 2/2     Running   0          11m
cluster1-haproxy-2                                 2/2     Running   0          11m
cluster1-pxc-0                                     3/3     Running   0          16m
cluster1-pxc-1                                     2/3     Running   0          11m
percona-xtradb-cluster-operator-77bfd8cdc5-5c9xm   1/1     Running   0          17m

12m Warning Unhealthy pod/cluster1-pxc-1 Readiness probe failed: ERROR 2003 (HY000): Can’t connect to MySQL server on ‘cluster1-pxc-1’ (111)

I have already tried installing in a different namespace, with and without creating the secrets.

Kubernetes v1.20.6 on Rancher v2.5.7


@MarcoFan anything in the logs of the Operator and Pods?
Is the 3rd PXC pod starting at all?
