Hi,
I am running into the exact same issue right now. I already have a working setup of the operator on my laptop, which I installed with helm. Following the helm steps from the docs now leads me to this issue, so something has changed since Jan 6 2023 when I last set it up. I see there has been a new helm chart version since that date, v1.12.1; my running setup is on v1.12.0.
I tried using that older version just in case, but the result is still the same. I also took the exact same cr.yaml file from git that I used for the successful deployment and applied it against both v1.12.0 and v1.12.1, but no luck.
Something weird is happening: according to my version control I am doing exactly the same thing, with exactly the same versions of everything, as I did on Jan 6 2023, yet I am getting a different result. Could there have been a change in the helm chart without bumping the version (a backport of something, maybe)?
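One way I plan to check whether the chart content changed without a version bump is to pull both chart versions locally and diff them (just a sketch; the untar directories are arbitrary names I picked):
helm pull percona/pxc-db --version 1.12.0 --untar --untardir pxc-db-1.12.0
helm pull percona/pxc-db --version 1.12.1 --untar --untardir pxc-db-1.12.1
diff -r pxc-db-1.12.0 pxc-db-1.12.1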
System:
# uname -a
Linux pop-os 6.0.12-76060006-generic #202212290932~1674139725~22.04~ca93ccf SMP PREEMPT_DYNAMIC Thu J x86_64 x86_64 x86_64 GNU/Linux
Steps to reproduce:
With helm
kind create cluster --config kind-config.yaml
# kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: redstone
nodes:
- role: control-plane
image: kindest/node:v1.23.13
- role: worker
image: kindest/node:v1.23.13
- role: worker
image: kindest/node:v1.23.13
- role: worker
image: kindest/node:v1.23.13
# I have also tested with v1.22.15 -> same issue
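Just to rule out the cluster itself, I verify the nodes before moving on (nothing Percona-specific here):
kubectl get nodes
# all four nodes report Ready before I continue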
I have already added the percona helm repo and ran helm repo update.
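For completeness, the repo setup I used looks roughly like this (the chart repo URL is the one from the Percona docs as I remember it, so double-check it):
helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update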
helm install my-xtdb-op percona/pxc-operator
which gets the operator up and running and installs CRDs:
NAME READY STATUS RESTARTS AGE
my-xtdb-op-pxc-operator-85c45b4549-qvn2d 1/1 Running 0 6s
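To double-check that the CRDs actually landed:
kubectl get crd | grep percona
# expecting perconaxtradbclusters.pxc.percona.com among others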
and then:
helm install my-db percona/pxc-db \
--set haproxy.enabled=false \
--set proxysql.enabled=true \
--set logcollector.enabled=false
# I am using proxysql but I also got the exact same issue without setting anything custom and using haproxy.
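To make sure I am really testing against a specific chart version (and to see what the repo currently serves), I can also do:
helm search repo percona/pxc-db --versions
# and pin explicitly with e.g. --version 1.12.0 on the helm install above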
This gets us here:
❯ kubectl get pods,pxc
NAME READY STATUS RESTARTS AGE
pod/my-db-pxc-db-proxysql-0 3/3 Running 0 115s
pod/my-db-pxc-db-proxysql-1 3/3 Running 0 107s
pod/my-db-pxc-db-proxysql-2 3/3 Running 0 97s
pod/my-db-pxc-db-pxc-0 0/1 Running 0 115s
pod/my-xtdb-op-pxc-operator-85c45b4549-qvn2d 1/1 Running 0 4m33s
NAME ENDPOINT STATUS PXC PROXYSQL HAPROXY AGE
perconaxtradbcluster.pxc.percona.com/my-db-pxc-db my-db-pxc-db-proxysql-unready.ivo initializing 3 116s
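The custom resource itself gives a bit more detail on why it stays in initializing:
kubectl describe pxc my-db-pxc-db
# the Status and Events sections show what the operator is still waiting for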
and the my-db-pxc-db-pxc-0 pod is spamming:
+ '[' '' = Synced ']'
+ echo 'MySQL init process in progress...'
+ sleep 1
MySQL init process in progress...
+ for i in {120..0}
++ echo 'SELECT variable_value FROM performance_schema.global_status WHERE variable_name='\''wsrep_local_state_comment'\'''
++ mysql --protocol=socket -uroot -hlocalhost --socket=/var/lib/mysql/mysql.sock --password= -s
+ wsrep_local_state=
MySQL init process in progress...
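While it is looping like this, the first attempt's logs and the state of the data dir can be inspected with something like this (--previous only works once the container has restarted at least once):
kubectl logs my-db-pxc-db-pxc-0 -c pxc --previous
kubectl exec my-db-pxc-db-pxc-0 -c pxc -- ls -l /var/lib/mysql/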
After some time (or with a manual kill) pod my-db-pxc-db-pxc-0 gets restarted and the logs change. After the first restart the pxc-0 pod is sitting at:
2023-02-28T06:54:37.828344Z 0 [Note] [MY-000000] [Galera] wsrep_load(): loading provider library 'none'
2023-02-28T06:54:37.830008Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/lib/mysql/mysqlx.sock
2023-02-28T06:54:37.830076Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.29-21.1' socket: '/var/lib/mysql/mysql.sock'
but Kubernetes is not marking the pod as ready because the readiness probe fails, which looks like this:
Events:
Type     Reason     Age                    From               Message
----     ------     ----                   ----               -------
Normal   Scheduled  4m34s                  default-scheduler  Successfully assigned ivo/my-db-pxc-db-pxc-0 to redstone-worker3
Normal   Pulling    4m34s                  kubelet            Pulling image "percona/percona-xtradb-cluster-operator:1.12.0"
Normal   Pulled     4m33s                  kubelet            Successfully pulled image "percona/percona-xtradb-cluster-operator:1.12.0" in 800.88027ms
Normal   Created    4m33s                  kubelet            Created container pxc-init
Normal   Started    4m33s                  kubelet            Started container pxc-init
Normal   Pulled     4m31s                  kubelet            Successfully pulled image "percona/percona-xtradb-cluster:8.0.29-21.1" in 760.944329ms
Normal   Pulling    2m11s (x2 over 4m31s)  kubelet            Pulling image "percona/percona-xtradb-cluster:8.0.29-21.1"
Normal   Pulled     2m11s                  kubelet            Successfully pulled image "percona/percona-xtradb-cluster:8.0.29-21.1" in 758.547316ms
Normal   Created    2m10s (x2 over 4m31s)  kubelet            Created container pxc
Normal   Started    2m10s (x2 over 4m30s)  kubelet            Started container pxc
Warning  Unhealthy  4s (x10 over 4m4s)     kubelet            Readiness probe failed: ERROR 2003 (HY000): Can't connect to MySQL server on '10.244.1.11:33062' (111)
+ [[ '' == \P\r\i\m\a\r\y ]]
+ exit 1
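To confirm that nothing is listening on the admin port the probe is hitting, I can check from inside the pod (assuming ss is available in the image; otherwise the mysql error above already tells the same story):
kubectl exec my-db-pxc-db-pxc-0 -c pxc -- ss -tln
# 3306 and 33060 show up, 33062 does not
kubectl describe pod my-db-pxc-db-pxc-0 | grep -i readiness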
No luck
EDIT 2:
The first (and only) pxc pod gets stuck right before it needs to open the admin interface, which is on port 33062. The logs on a normally running instance look something like this:
....
2023-02-28T06:54:37.828344Z 0 [Note] [MY-000000] [Galera] wsrep_load(): loading provider library 'none'
2023-02-28T06:54:37.830008Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/lib/mysql/mysqlx.sock
2023-02-28T06:54:37.830076Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.29-21.1' socket: '/var/lib/mysql/mysql.sock'
# This is the key line that is missing. Of course, this log line is from a completely different, healthy instance that I am scared to touch at the moment.
2023-02-28T12:39:11.383051Z 0 [System] [MY-013292] [Server] Admin interface ready for connections, address: '10.244.2.8' port: 33062
....
Because the PXC pod does not open its admin interface on port 33062, the readiness check cannot pass, since it queries that interface. PXC pods are spawned sequentially, so if the first one never becomes ready, the rest never get created.
I have the feeling the following is happening, but I don't know why at the moment:
- The first PXC pod loops on MySQL init process in progress... and eventually gets killed by k8s (just in case they are related: https://forums.percona.com/t/bug-in-entry-entrypoint-sh-in-docker-image-based-setup/19585/2); see the manual check sketched below
- On the first restart of the pod, because the init process did not finish cleanly, it does not open the admin interface
- No admin interface on port 33062 → no passing readiness check → no cluster
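A minimal sketch of the check the init script seems to be looping on, run by hand (the secret name is my guess based on the release name, so adjust it to whatever kubectl get secrets shows):
ROOT_PASSWORD=$(kubectl get secret my-db-pxc-db-secrets -o jsonpath='{.data.root}' | base64 -d)
kubectl exec my-db-pxc-db-pxc-0 -c pxc -- \
  mysql -uroot -p"$ROOT_PASSWORD" --socket=/var/lib/mysql/mysql.sock -s \
  -e "SELECT variable_value FROM performance_schema.global_status WHERE variable_name='wsrep_local_state_comment'"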
After this, I installed directly from the source, straight from the repo at tag v1.12.0, just as mentioned in the docs.
git clone -b v1.12.0 https://github.com/percona/percona-xtradb-cluster-operator
kubectl apply -f deploy/bundle.yaml
kubectl apply -f deploy/cr.yaml
Unfortunately, this also leads to the same behaviour: at first the pod is spamming MySQL init process in progress, and after a restart it becomes ready for connections, but the Kubernetes readiness check does not go through → the pod never gets ready, etc.
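The operator logs might say why the cluster never progresses; with the bundle.yaml install I would expect something like this (deployment name as I remember it from the bundle, double-check with kubectl get deploy):
kubectl logs deploy/percona-xtradb-cluster-operator --tail=100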
I am a bit at a loss here
EDIT 1:
I just tested with minikube instead of kind and I get the same issue. I will now test on a fully fledged Kubernetes cluster on GCP (Google) and will keep you posted.