I have the same problem:
- vanilla Scaleway Kapsule cluster v1.21.4
- CoreDNS 1.8.4
- operator and db installed using Helm charts
- operator running in the db namespace:
❯ k -n db get deployments.apps pxc-operator
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
pxc-operator   1/1     1            1           6m12s
❯ k -n db get deployments.apps pxc-operator -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: pxc-operator
    meta.helm.sh/release-namespace: db
  creationTimestamp: "2021-10-11T13:37:59Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: pxc-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: pxc-operator
    app.kubernetes.io/version: 1.9.0
    helm.sh/chart: pxc-operator-1.9.1
  name: pxc-operator
  namespace: db
  resourceVersion: "46352088"
  uid: b8c837b2-778f-466c-8d50-7967368ec120
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: operator
      app.kubernetes.io/instance: pxc-operator
      app.kubernetes.io/name: pxc-operator
      app.kubernetes.io/part-of: pxc-operator
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: operator
        app.kubernetes.io/instance: pxc-operator
        app.kubernetes.io/name: pxc-operator
        app.kubernetes.io/part-of: pxc-operator
    spec:
      containers:
      - command:
        - percona-xtradb-cluster-operator
        env:
        - name: WATCH_NAMESPACE
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: OPERATOR_NAME
          value: pxc-operator
        image: percona/percona-xtradb-cluster-operator:1.9.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /metrics
            port: metrics
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: pxc-operator
        ports:
        - containerPort: 8080
          name: metrics
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: pxc-operator
      serviceAccountName: pxc-operator
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-10-11T13:37:59Z"
    lastUpdateTime: "2021-10-11T13:37:59Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-10-11T13:37:59Z"
    lastUpdateTime: "2021-10-11T13:38:10Z"
    message: ReplicaSet "pxc-operator-5998c9b5cb" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
- pxc in the men namespace:
❯ k -n men get pxc -o yaml
apiVersion: v1
items:
- apiVersion: pxc.percona.com/v1
  kind: PerconaXtraDBCluster
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"pxc.percona.com/v1-9-0","kind":"PerconaXtraDBCluster"}
      meta.helm.sh/release-name: db
      meta.helm.sh/release-namespace: men
    creationTimestamp: "2021-10-11T13:38:47Z"
    finalizers:
    - delete-pxc-pods-in-order
    - delete-proxysql-pvc
    - delete-pxc-pvc
    generation: 2
    labels:
      app.kubernetes.io/instance: db
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: pxc-db
      app.kubernetes.io/version: 1.9.0
      helm.sh/chart: pxc-db-1.9.1
    name: db-pxc-db
    namespace: men
    resourceVersion: "46354859"
    uid: 36eae74f-f7c4-4df0-8dff-1ccc2491eb68
  spec:
    backup:
      image: percona/percona-xtradb-cluster-operator:1.9.0-pxc8.0-backup
      imagePullPolicy: Always
      pitr:
        enabled: false
        storageName: ""
      schedule:
      - keep: 5
        name: daily-backup
        schedule: 0 0 * * *
        storageName: fs-pvc
      storages:
        fs-pvc:
          podSecurityContext:
            fsGroup: 1001
            supplementalGroups:
            - 1001
          s3:
            bucket: ""
            credentialsSecret: ""
          type: filesystem
          volume:
            persistentVolumeClaim:
              accessModes:
              - ReadWriteOnce
              resources:
                requests:
                  storage: 6Gi
    crVersion: 1.9.0
    enableCRValidationWebhook: false
    haproxy:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      enabled: true
      envVarsSecret: db-pxc-db-env-vars-haproxy
      gracePeriod: 30
      image: percona/percona-xtradb-cluster-operator:1.9.0-haproxy
      imagePullPolicy: Always
      livenessProbes:
        failureThreshold: 4
        initialDelaySeconds: 60
        periodSeconds: 30
        successThreshold: 1
        timeoutSeconds: 5
      podDisruptionBudget:
        maxUnavailable: 1
      readinessProbes:
        failureThreshold: 3
        initialDelaySeconds: 15
        periodSeconds: 5
        successThreshold: 1
        timeoutSeconds: 1
      resources:
        limits: {}
        requests:
          cpu: 600m
          memory: 1G
      serviceAccountName: default
      sidecarResources:
        limits: {}
        requests: {}
      size: 3
      volumeSpec:
        emptyDir: {}
    logCollectorSecretName: db-pxc-db-log-collector
    logcollector:
      enabled: true
      image: percona/percona-xtradb-cluster-operator:1.9.0-logcollector
      imagePullPolicy: Always
      resources:
        limits: {}
        requests: {}
    platform: kubernetes
    pmm:
      resources:
        limits: {}
        requests:
          cpu: 600m
          memory: 1G
    proxysql:
      livenessProbes: {}
      podSecurityContext:
        fsGroup: 1001
        supplementalGroups:
        - 1001
      readinessProbes: {}
    pxc:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      autoRecovery: true
      envVarsSecret: db-pxc-db-env-vars-pxc
      expose: {}
      gracePeriod: 600
      image: percona/percona-xtradb-cluster:8.0.23-14.1
      imagePullPolicy: Always
      livenessDelaySec: 300
      livenessProbes:
        failureThreshold: 3
        initialDelaySeconds: 300
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      podDisruptionBudget:
        maxUnavailable: 1
      podSecurityContext:
        fsGroup: 1001
        supplementalGroups:
        - 1001
      readinessDelaySec: 15
      readinessProbes:
        failureThreshold: 5
        initialDelaySeconds: 15
        periodSeconds: 30
        successThreshold: 1
        timeoutSeconds: 15
      resources:
        limits: {}
        requests:
          cpu: 600m
          memory: 1G
      serviceAccountName: default
      sidecarResources:
        limits: {}
        requests: {}
      size: 3
      sslInternalSecretName: db-pxc-db-ssl-internal
      sslSecretName: db-pxc-db-ssl
      vaultSecretName: db-pxc-db-vault
      volumeSpec:
        emptyDir: {}
    secretsName: db-pxc-db
    sslInternalSecretName: db-pxc-db-ssl-internal
    sslSecretName: db-pxc-db-ssl
    updateStrategy: SmartUpdate
    upgradeOptions:
      apply: 8.0-recommended
      schedule: 0 4 * * *
      versionServiceEndpoint: https://check.percona.com
    vaultSecretName: db-pxc-db-vault
  status:
    backup:
      version: 8.0.23
    conditions:
    - lastTransitionTime: "2021-10-11T13:38:52Z"
      status: "True"
      type: initializing
    haproxy:
      labelSelectorPath: app.kubernetes.io/component=haproxy,app.kubernetes.io/instance=db-pxc-db,app.kubernetes.io/managed-by=percona-xtradb-cluster-operator,app.kubernetes.io/name=percona-xtradb-cluster,app.kubernetes.io/part-of=percona-xtradb-cluster
      size: 3
      status: initializing
    host: db-pxc-db-haproxy.men
    logcollector:
      version: 1.9.0
    observedGeneration: 2
    pmm:
      version: 2.18.0
    proxysql: {}
    pxc:
      image: percona/percona-xtradb-cluster:8.0.23-14.1
      labelSelectorPath: app.kubernetes.io/component=pxc,app.kubernetes.io/instance=db-pxc-db,app.kubernetes.io/managed-by=percona-xtradb-cluster-operator,app.kubernetes.io/name=percona-xtradb-cluster,app.kubernetes.io/part-of=percona-xtradb-cluster
      size: 3
      status: initializing
      version: 8.0.23-14.1
    ready: 0
    size: 6
    state: initializing
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
- the cluster is not starting because DNS resolution of the PXC service names fails:
❯ k -n men get pods
NAME                  READY   STATUS    RESTARTS   AGE
db-pxc-db-haproxy-0   1/2     Running   13         55m
db-pxc-db-pxc-0       2/3     Running   3          55m
❯ k -n men exec -ti dnsutils -- nslookup db-pxc-db-pxc
Server: 10.32.0.10
Address: 10.32.0.10#53
** server can't find db-pxc-db-pxc: NXDOMAIN
command terminated with exit code 1
❯ k -n men exec -ti dnsutils -- nslookup db-pxc-db-pxc-unready
Server: 10.32.0.10
Address: 10.32.0.10#53
** server can't find db-pxc-db-pxc-unready: NXDOMAIN
command terminated with exit code 1
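As a further check (not run above), the same lookups could be repeated with fully qualified names to rule out a search-path problem in the pod's resolv.conf; the FQDNs below assume the default cluster.local cluster domain:
# assumes the default cluster domain cluster.local
k -n men exec -ti dnsutils -- cat /etc/resolv.conf
k -n men exec -ti dnsutils -- nslookup db-pxc-db-pxc-unready.men.svc.cluster.local
k -n men exec -ti dnsutils -- dig +short db-pxc-db-pxc-unready.men.svc.cluster.local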
❯ k logs db-pxc-db-pxc-0 -c pxc | tail
2021/10/11 14:35:54 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:35:55 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:35:56 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:35:57 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:35:58 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:35:59 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:36:00 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:36:01 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:36:02 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
2021/10/11 14:36:03 lookup db-pxc-db-pxc-unready on 10.32.0.10:53: no such host
❯ k -n men describe pod db-pxc-db-pxc-0 | tail -20
Events:
Type     Reason     Age                  From     Message
----     ------     ----                 ----     -------
Normal   Pulled     60m                  kubelet  Successfully pulled image "percona/percona-xtradb-cluster-operator:1.9.0-logcollector" in 24.784727162s
Normal   Created    60m                  kubelet  Created container logs
Normal   Started    60m                  kubelet  Started container logs
Normal   Pulling    60m                  kubelet  Pulling image "percona/percona-xtradb-cluster-operator:1.9.0-logcollector"
Normal   Pulled     60m                  kubelet  Successfully pulled image "percona/percona-xtradb-cluster-operator:1.9.0-logcollector" in 1.942397197s
Normal   Created    60m                  kubelet  Created container logrotate
Normal   Pulling    60m                  kubelet  Pulling image "percona/percona-xtradb-cluster:8.0.23-14.1"
Normal   Started    60m                  kubelet  Started container logrotate
Normal   Pulled     60m                  kubelet  Successfully pulled image "percona/percona-xtradb-cluster:8.0.23-14.1" in 23.794414542s
Normal   Created    60m                  kubelet  Created container pxc
Normal   Started    60m                  kubelet  Started container pxc
Warning  Unhealthy  55m                  kubelet  Liveness probe failed: ERROR 2003 (HY000): Can't connect to MySQL server on 'db-pxc-db-pxc-0' (111)
+ [[ -n '' ]]
+ exit 1
Warning  Unhealthy  40s (x116 over 59m)  kubelet  Readiness probe failed: ERROR 2003 (HY000): Can't connect to MySQL server on 'db-pxc-db-pxc-0' (111)
+ [[ '' == \P\r\i\m\a\r\y ]]
+ exit 1
Cluster DNS resolution otherwise seems fine, and the endpoints are populated correctly:
❯ k -n men get endpoints db-pxc-db-pxc-unready
NAME                    ENDPOINTS                                                     AGE
db-pxc-db-pxc-unready   100.64.142.70:33062,100.64.142.70:33060,100.64.142.70:3306   64m
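To double-check the Service objects themselves (output not captured here), something like this can be used to confirm that the unready service exists, is headless, and publishes addresses for pods that are not yet Ready:
# service names taken from the endpoints/statefulset output in this post
k -n men get svc db-pxc-db-pxc db-pxc-db-pxc-unready
k -n men get svc db-pxc-db-pxc-unready -o yaml | grep -E 'clusterIP|publishNotReadyAddresses'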
What I think might be the problem is that the PXC StatefulSet's serviceName field does not match the "unready" service name:
❯ k -n men get statefulset db-pxc-db-pxc -o jsonpath='{.spec.serviceName}'
db-pxc-db-pxc
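For illustration only, this is the shape I would expect if the per-pod DNS records were meant to come from the unready service; a minimal sketch with assumed names, ports and a subset of the labels from the labelSelectorPath above, not the chart's actual manifests:
# Hypothetical sketch, not the operator's real manifests: a headless Service that
# also publishes not-ready pods, and a StatefulSet whose serviceName points at it,
# so that db-pxc-db-pxc-0.db-pxc-db-pxc-unready.men.svc.cluster.local would resolve
# before the pod passes its readiness probe.
apiVersion: v1
kind: Service
metadata:
  name: db-pxc-db-pxc-unready
  namespace: men
spec:
  clusterIP: None                  # headless: DNS answers with the pod IPs
  publishNotReadyAddresses: true   # include pods that are not yet Ready
  selector:                        # assumed: subset of the labels shown in the CR status
    app.kubernetes.io/component: pxc
    app.kubernetes.io/instance: db-pxc-db
  ports:
  - name: mysql
    port: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db-pxc-db-pxc
  namespace: men
spec:
  serviceName: db-pxc-db-pxc-unready   # governing service; per-pod DNS records are created under it
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/component: pxc
      app.kubernetes.io/instance: db-pxc-db
  template:
    metadata:
      labels:
        app.kubernetes.io/component: pxc
        app.kubernetes.io/instance: db-pxc-db
    spec:
      containers:
      - name: pxc
        image: percona/percona-xtradb-cluster:8.0.23-14.1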
A similar issue is described here, and that is how it is done for the proxysql StatefulSet, which in my tests starts just fine.
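(For comparison, in a deployment where proxysql is enabled instead of haproxy, the equivalent check would be something like the following; the StatefulSet name is assumed:)
# assumed StatefulSet name for a proxysql-based deployment
k -n men get statefulset db-pxc-db-proxysql -o jsonpath='{.spec.serviceName}'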
Any help appreciated.
Regards