Newly created MongoDB fails liveness check

"Member failed Kubernetes liveness check: get replsetGetStatus response: (Unauthorized) command replSetGetStatus requires authentication"

In a nutshell, the cluster creates the certificates, and although the liveness check appears to use them, it fails. The result is the same if cert-manager is installed, whether the version shown in the Percona docs or the current one.

  • exec mongod --bind_ip_all --auth --dbpath=/data/db --port=27017 --replSet=rs0 --storageEngine=wiredTiger --relaxPermChecks --clusterAuthMode=x509 --enableEncryption --encryptionKeyFile=/etc/mongodb-encryption/encryption-key --wiredTigerIndexPrefixCompression=true --tlsMode preferTLS --tlsCertificateKeyFile /tmp/tls.pem --tlsAllowInvalidCertificates --tlsClusterFile /tmp/tls-internal.pem --tlsCAFile /etc/mongodb-ssl/ca.crt --tlsClusterCAFile /etc/mongodb-ssl-internal/ca.crt
    {"t":{"$date":"2023-04-13T10:11:23.918+00:00"},"s":"I", "c":"NETWORK", "id":4915701, "ctx":"-","msg":"Initialized wire specification","attr":{"spec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":17},"incomingInternalClient":{"minWireVersion":0,"maxWireVersion":17},"outgoing":{"minWireVersion":6,"maxWireVersion":17},"isInternalClient":true}}}
    {"t":{"$date":"2023-04-13T10:11:23.920+00:00"},"s":"I", "c":"NETWORK", "id":4913010, "ctx":"-","msg":"Certificate information","attr":{"subject":"O=PSMDB","issuer":"O=Root CA","thumbprint":"384C6E35CF35FDC7C55CFBCE501D4077615888C4","notValidBefore":{"$date":"2023-04-13T09:50:16.000Z"},"notValidAfter":{"$date":{"$numberLong":"253402300799000"}},"keyFile":"/tmp/tls.pem","type":"Server"}}
    {"t":{"$date":"2023-04-13T10:11:23.920+00:00"},"s":"I", "c":"NETWORK", "id":4913011, "ctx":"-","msg":"Certificate information","attr":{"subject":"O=PSMDB","issuer":"O=Root CA","thumbprint":"7B057B1008A9A6CE4B4D6D042A6A8A3503EC7A03","notValidBefore":{"$date":"2023-04-13T09:50:17.000Z"},"notValidAfter":{"$date":{"$numberLong":"253402300799000"}},"keyFile":"/tmp/tls-internal.pem","type":"Cluster"}}
    {"t":{"$date":"2023-04-13T10:11:23.920+00:00"},"s":"I", "c":"CONTROL", "id":23285, "ctx":"-","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}

{"t":{"$date":"2023-04-13T10:19:38.251+00:00"},"s":"I", "c":"CONTROL", "id":23285, "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
{"t":{"$date":"2023-04-13T10:19:38.253+00:00"},"s":"I", "c":"CONTROL", "id":23403, "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"6.0.4-3","gitVersion":"6c7d07d27d493392d5e4933b1173960fe97a5381","openSSLVersion":"OpenSSL 1.1.1k FIPS 25 Mar 2021","modules":[],"allocator":"tcmalloc","environment":{"distarch":"x86_64","target_arch":"x86_64"}}}}
{"t":{"$date":"2023-04-13T10:19:38.254+00:00"},"s":"I", "c":"CONTROL", "id":21951, "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"net":{"bindIp":"*","port":27017,"tls":{"CAFile":"/etc/mongodb-ssl/ca.crt","allowInvalidCertificates":true,"certificateKeyFile":"/tmp/tls.pem","clusterCAFile":"/etc/mongodb-ssl-internal/ca.crt","clusterFile":"/tmp/tls-internal.pem","mode":"preferTLS"}},"replication":{"replSet":"rs0"},"security":{"authorization":"enabled","clusterAuthMode":"x509","enableEncryption":true,"encryptionKeyFile":"/etc/mongodb-encryption/encryption-key","relaxPermChecks":true},"storage":{"dbPath":"/data/db","engine":"wiredTiger","wiredTiger":{"indexConfig":{"prefixCompression":true}}}}}}
{"t":{"$date":"2023-04-13T10:19:38.957+00:00"},"s":"I", "c":"NETWORK", "id":23016, "ctx":"listener","msg":"Waiting for connections","attr":{"port":27017,"ssl":"on"}}
{"t":{"$date":"2023-04-13T10:20:07.265+00:00"},"s":"W", "c":"NETWORK", "id":23235, "ctx":"conn14","msg":"SSL peer certificate validation failed","attr":{"reason":"certificate signature failure"}}
{"t":{"$date":"2023-04-13T10:20:07.294+00:00"},"s":"W", "c":"NETWORK", "id":23235, "ctx":"conn16","msg":"SSL peer certificate validation failed","attr":{"reason":"certificate signature failure"}}

Liveness:       exec [/opt/percona/mongodb-healthcheck k8s liveness --ssl --sslInsecure --sslCAFile /etc/mongodb-ssl/ca.crt --sslPEMKeyFile /tmp/tls.pem --startupDelaySeconds 7200] delay=60s timeout=10s period=30s #success=1 #failure=4
Readiness:      tcp-socket :27017 delay=10s timeout=2s period=3s #success=1 #failure=8

Warning Unhealthy 101s (x19 over 14m) kubelet (combined from similar events): Liveness probe failed: {"level":"info","msg":"Running Kubernetes liveness check for mongod","time":"2023-04-13T10:10:52Z"}
{"level":"error","msg":"Member failed Kubernetes liveness check: get replsetGetStatus response: (Unauthorized) command replSetGetStatus requires authentication","time":"2023-04-13T10:10:52Z"}
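
For what it's worth, the probe can be reproduced by hand inside the container, using the same binary and flags as the probe definition above; if the problem is with the healthcheck client itself, this should show the same Unauthorized error:

  # run the same check the kubelet runs, from inside the mongod container
  kubectl -n mongodb exec -it mongodb-clu1-psmdb-db-rs0-0 -c mongod -- \
    /opt/percona/mongodb-healthcheck k8s liveness --ssl --sslInsecure \
    --sslCAFile /etc/mongodb-ssl/ca.crt --sslPEMKeyFile /tmp/tls.pem \
    --startupDelaySeconds 7200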

Log, pod and cluster creation info are below.

Any suggestions as to the issue with the certificate, please? I think I've seen something about changing the file location from /tmp, but I'm not sure why that would help.

I also see this form of unauthorised error:

Warning Unhealthy 25m kubelet Liveness probe failed: {"level":"info","msg":"Running Kubernetes liveness check for mongod","time":"2023-04-13T10:21:30Z"}
{"level":"error","msg":"Member failed Kubernetes liveness check: get oplog.rs info: (Unauthorized) not authorized on local to execute command { collStats: \"oplog.rs\", scale: 1073741824, lsid: { id: UUID(\"eee0da00-2514-445c-9509-16440a8ef39f\") }, $clusterTime: { clusterTime: Timestamp(1681381289, 1), signature: { hash: BinData(0, 62317B8DF70AF7BA8AFC806AD835DB071A387B6D), keyId: 7221476003388850183 } }, $db: \"local\", $readPreference: { mode: \"primaryPreferred\" } }","time":"2023-04-13T10:21:30Z"}

Finally, if I connect to the DB when it's initially created:

root@kube-1:~# kubectl run -i --rm --tty percona-client --image=percona/percona-server-mongodb:5.0 --restart=Never -- mongo "mongodb+srv://${ADMIN_USER}:${ADMIN_PASSWORD}@mongodb-clu1-psmdb-db-rs0.mongodb.svc.cluster.local/admin?replicaSet=rs0&ssl=false"
If you don’t see a command prompt, try pressing enter.

rs0:PRIMARY> rs.status()
{
    "ok" : 0,
    "errmsg" : "not authorized on admin to execute command { replSetGetStatus: 1.0, lsid: { id: UUID(\"0751e5db-977c-4765-95f4-420f9e51e242\") }, $clusterTime: { clusterTime: Timestamp(1681383768, 1), signature: { hash: BinData(0, 05DD5220FD7EC1609700D9A9454750F857FA7E1C), keyId: 7221469835815813126 } }, $db: \"admin\" }",
    "code" : 13,
    "codeName" : "Unauthorized",
    "$clusterTime" : {
        "clusterTime" : Timestamp(1681383788, 1),
        "signature" : {
            "hash" : BinData(0,"Zp/MOY9jPpPAJ7oMqp0Ix/JYMBY="),
            "keyId" : NumberLong("7221469835815813126")
        }
    },
    "operationTime" : Timestamp(1681383788, 1)
}
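
As far as I can tell, replSetGetStatus needs the clusterMonitor (or clusterAdmin) role, which the default userAdmin account doesn't have, so this last failure may simply be down to which user I'm logged in as. One way to retry with the cluster-admin account is to pull its credentials out of the operator's internal users Secret (the Secret name is taken from the pod spec further down; the key names are my assumption) and reuse them in the same kubectl run command as above:

  # extract the cluster-admin credentials (key names assumed - adjust if they differ)
  kubectl -n mongodb get secret internal-mongodb-clu1-psmdb-db-users \
    -o jsonpath='{.data.MONGODB_CLUSTER_ADMIN_USER}' | base64 -d; echo
  kubectl -n mongodb get secret internal-mongodb-clu1-psmdb-db-users \
    -o jsonpath='{.data.MONGODB_CLUSTER_ADMIN_PASSWORD}' | base64 -d; echo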

Thanks,

Mike

Cluster details: 3 Debian 11 nodes, created via Kubespray, Kubernetes v1.25.

Same config as was used to create a working production instance on Hetzner dedicated servers; only the inventory changed.

Tested in Hetzner as 3 combined control-plane/worker nodes.
Tested in AWS as a single control plane and 3 worker nodes.

Same result.

MongoDB was created by Helm using the Percona operator:

helm install mongodb-clu1 percona/psmdb-db --version 1.14.0 --namespace mongodb \
  --set "replsets[0].volumeSpec.pvc.storageClassName=local-hostpath-mongo-prod-sc" \
  --set "replsets[0].name=rs0" \
  --set "replsets[0].size=3" \
  --set "replsets[0].volumeSpec.pvc.resources.requests.storage=15Gi" \
  --set backup.enabled=false \
  --set sharding.enabled=false \
  --set pmm.enabled=true

PVC: OpenEBS local volume.
I also tried it, mostly with backup.enabled=true.
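
For reference, the operator's own view of the cluster can also be checked through the custom resource (psmdb should be the short name the operator registers; the CR name below is inferred from the pod name prefix, so adjust if it differs):

  # what state does the operator report for the cluster?
  kubectl -n mongodb get psmdb
  kubectl -n mongodb describe psmdb mongodb-clu1-psmdb-db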

Note on log times: these servers are in Finland (UTC+3), 2 hrs ahead of the UK, which is itself 1 hr ahead of UTC.

So the kubectl describe pod times are the current time in Finland,

but the kubectl logs times appear to be in UTC, and I'm guessing they reset with each restart.

root@kube-1:~# kubectl get pods -n mongodb
NAME READY STATUS RESTARTS AGE
mongodb-clu1-psmdb-db-rs0-0 0/1 Running 7 (9s ago) 21m
mongodb-clu1-psmdb-db-rs0-1 1/1 Running 6 (2m42s ago) 20m
mongodb-clu1-psmdb-db-rs0-2 1/1 Running 6 (2m21s ago) 20m
perc-mongo-op-psmdb-operator-db9bfd78b-zj78z 1/1 Running 0 20h

The cluster will shortly go into CrashLoopBackOff.

Pod description:

root@kube-1:~# kubectl describe pod -n mongodb mongodb-clu1-psmdb-db-rs0-0
Name: mongodb-clu1-psmdb-db-rs0-0
Namespace: mongodb
Priority: 0
Service Account: default
Node: kube-3/65.109.193.4
Start Time: Thu, 13 Apr 2023 12:50:21 +0300
Labels: app.kubernetes.io/component=mongod
app.kubernetes.io/instance=mongodb-clu1-psmdb-db
app.kubernetes.io/managed-by=percona-server-mongodb-operator
app.kubernetes.io/name=percona-server-mongodb
app.kubernetes.io/part-of=percona-server-mongodb
app.kubernetes.io/replset=rs0
controller-revision-hash=mongodb-clu1-psmdb-db-rs0-7d45d6fbcc
statefulset.kubernetes.io/pod-name=mongodb-clu1-psmdb-db-rs0-0
Annotations: cni.projectcalico.org/containerID: 3ce30323a2808ca216e39ec751c0144bb01c5e4a5e2dc650e58fef1f9a10e485
cni.projectcalico.org/podIP: 10.233.99.101/32
cni.projectcalico.org/podIPs: 10.233.99.101/32
percona.com/ssl-hash: b03ba102536396d3482238f36afd0a23
percona.com/ssl-internal-hash: 499cefd0eba6eca3615247fef4ea833b
Status: Running
IP: 10.233.99.101
IPs:
IP: 10.233.99.101
Controlled By: StatefulSet/mongodb-clu1-psmdb-db-rs0
Init Containers:
mongo-init:
Container ID: containerd://dcfbc4f9eab4def0cf83ad20b80406a3fe5369fdb3ab3131911488edde10e2ef
Image: percona/percona-server-mongodb-operator:1.14.0
Image ID: docker.io/percona/percona-server-mongodb-operator@sha256:b5db0eae838e338f43633163a43579f57468b05144bde1fa161825a132b29bd2
Port:
Host Port:
Command:
/init-entrypoint.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 13 Apr 2023 12:50:23 +0300
Finished: Thu, 13 Apr 2023 12:50:23 +0300
Ready: True
Restart Count: 0
Environment:
Mounts:
/data/db from mongod-data (rw)
/opt/percona from bin (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fgt8c (ro)
Containers:
mongod:
Container ID: containerd://add1c63bb5d580bec14894863df1e10d5993495dd5d84bb0597ce926d518de64
Image: percona/percona-server-mongodb:6.0.4-3
Image ID: docker.io/percona/percona-server-mongodb@sha256:df46c596e6f7339badec3b36f7f209689c3f31e5391ef714be0701deef555570
Port: 27017/TCP
Host Port: 0/TCP
Command:
/opt/percona/ps-entry.sh
Args:
--bind_ip_all
--auth
--dbpath=/data/db
--port=27017
--replSet=rs0
--storageEngine=wiredTiger
--relaxPermChecks
--sslAllowInvalidCertificates
--clusterAuthMode=x509
--enableEncryption
--encryptionKeyFile=/etc/mongodb-encryption/encryption-key
--wiredTigerIndexPrefixCompression=true
State: Running
Started: Thu, 13 Apr 2023 13:11:23 +0300
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 13 Apr 2023 13:08:23 +0300
Finished: Thu, 13 Apr 2023 13:11:22 +0300
Ready: True
Restart Count: 7
Liveness: exec [/opt/percona/mongodb-healthcheck k8s liveness --ssl --sslInsecure --sslCAFile /etc/mongodb-ssl/ca.crt --sslPEMKeyFile /tmp/tls.pem --startupDelaySeconds 7200] delay=60s timeout=10s period=30s #success=1 #failure=4
Readiness: tcp-socket :27017 delay=10s timeout=2s period=3s #success=1 #failure=8
Environment Variables from:
internal-mongodb-clu1-psmdb-db-users Secret Optional: false
Environment:
SERVICE_NAME: mongodb-clu1-psmdb-db
NAMESPACE: mongodb
MONGODB_PORT: 27017
MONGODB_REPLSET: rs0
Mounts:
/data/db from mongod-data (rw)
/etc/mongodb-encryption from mongodb-clu1-psmdb-db-mongodb-encryption-key (ro)
/etc/mongodb-secrets from mongodb-clu1-psmdb-db-mongodb-keyfile (ro)
/etc/mongodb-ssl from ssl (ro)
/etc/mongodb-ssl-internal from ssl-internal (ro)
/etc/users-secret from users-secret-file (rw)
/opt/percona from bin (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fgt8c (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
mongod-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: mongod-data-mongodb-clu1-psmdb-db-rs0-0
ReadOnly: false
mongodb-clu1-psmdb-db-mongodb-keyfile:
Type: Secret (a volume populated by a Secret)
SecretName: mongodb-clu1-psmdb-db-mongodb-keyfile
Optional: false
bin:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
mongodb-clu1-psmdb-db-mongodb-encryption-key:
Type: Secret (a volume populated by a Secret)
SecretName: mongodb-clu1-psmdb-db-mongodb-encryption-key
Optional: false
ssl:
Type: Secret (a volume populated by a Secret)
SecretName: mongodb-clu1-psmdb-db-ssl
Optional: false
ssl-internal:
Type: Secret (a volume populated by a Secret)
SecretName: mongodb-clu1-psmdb-db-ssl-internal
Optional: true
users-secret-file:
Type: Secret (a volume populated by a Secret)
SecretName: internal-mongodb-clu1-psmdb-db-users
Optional: false
kube-api-access-fgt8c:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message


Normal Scheduled 22m default-scheduler Successfully assigned mongodb/mongodb-clu1-psmdb-db-rs0-0 to kube-3
Normal Pulling 22m kubelet Pulling image "percona/percona-server-mongodb-operator:1.14.0"
Normal Pulled 22m kubelet Successfully pulled image "percona/percona-server-mongodb-operator:1.14.0" in 806.276416ms
Normal Created 22m kubelet Created container mongo-init
Normal Started 22m kubelet Started container mongo-init
Normal Pulled 22m kubelet Successfully pulled image "percona/percona-server-mongodb:6.0.4-3" in 852.256752ms
Warning Unhealthy 20m kubelet Liveness probe failed: {"level":"info","msg":"Running Kubernetes liveness check for mongod","time":"2023-04-13T09:51:52Z"}
{"level":"error","msg":"Member failed Kubernetes liveness check: get replsetGetStatus response: (Unauthorized) command replSetGetStatus requires authentication","time":"2023-04-13T09:51:52Z"}
Warning Unhealthy 20m kubelet Liveness probe failed: {"level":"info","msg":"Running Kubernetes liveness check for mongod","time":"2023-04-13T09:52:22Z"}
{"level":"error","msg":"Member failed Kubernetes liveness check: get replsetGetStatus response: (Unauthorized) command replSetGetStatus requires authentication","time":"2023-04-13T09:52:22Z"}
Warning Unhealthy 19m kubelet Liveness probe failed: {"level":"info","msg":"Running Kubernetes liveness check for mongod","time":"2023-04-13T09:52:52Z"}
{"level":"error","msg":"Member failed Kubernetes liveness check: get replsetGetStatus response: (Unauthorized) command replSetGetStatus requires authentication","time":"2023-04-13T09:52:52Z"}
Warning Unhealthy 19m kubelet Liveness probe failed: {"level":"info","msg":"Running Kubernetes liveness check for mongod","time":"2023-04-13T09:53:22Z"}
{"level":"error","msg":"Member failed Kubernetes liveness check: get replsetGetStatus response: (Unauthorized) command replSetGetStatus requires authentication","time":"2023-04-13T09:53:22Z"}
Normal Pulled 19m kubelet Successfully pulled image "percona/percona-server-mongodb:6.0.4-3" in 793.004073ms
Warning Unhealthy 17m kubelet Liveness probe failed: {"level":"info","msg":"Running Kubernetes liveness check for mongod","time":"2023-04-13T09:54:52Z"}
{"level":"error","msg":"Member failed Kubernetes liveness check: get replsetGetStatus response: (Unauthorized) command replSetGetStatus requires authentication","time":"2023-04-13T09:54:52Z"}
Warning Unhealthy 17m kubelet Liveness probe failed: {"level":"info","msg":"Running Kubernetes liveness check for mongod","time":"2023-04-13T09:55:22Z"}
{"level":"error","msg":"Member failed Kubernetes liveness check: get replsetGetStatus response: (Unauthorized) command replSetGetStatus requires authentication","time":"2023-04-13T09:55:22Z"}
Warning Unhealthy 16m kubelet Liveness probe failed: {"level":"info","msg":"Running Kubernetes liveness check for mongod","time":"2023-04-13T09:55:52Z"}
{"level":"error","msg":"Member failed Kubernetes liveness check: get replsetGetStatus response: (Unauthorized) command replSetGetStatus requires authentication","time":"2023-04-13T09:55:52Z"}
Normal Killing 16m (x2 over 19m) kubelet Container mongod failed liveness probe, will be restarted
Warning Unhealthy 16m kubelet Liveness probe failed: {"level":"info","msg":"Running Kubernetes liveness check for mongod","time":"2023-04-13T09:56:22Z"}
{"level":"error","msg":"Member failed Kubernetes liveness check: get replsetGetStatus response: (Unauthorized) command replSetGetStatus requires authentication","time":"2023-04-13T09:56:22Z"}
Normal Started 16m (x3 over 22m) kubelet Started container mongod
Normal Created 16m (x3 over 22m) kubelet Created container mongod
Normal Pulled 16m kubelet Successfully pulled image "percona/percona-server-mongodb:6.0.4-3" in 819.327979ms
Normal Pulling 7m11s (x6 over 22m) kubelet Pulling image "percona/percona-server-mongodb:6.0.4-3"
Warning Unhealthy 101s (x19 over 14m) kubelet (combined from similar events): Liveness probe failed: {"level":"info","msg":"Running Kubernetes liveness check for mongod","time":"2023-04-13T10:10:52Z"}
{"level":"error","msg":"Member failed Kubernetes liveness check: get replsetGetStatus response: (Unauthorized) command replSetGetStatus requires authentication","time":"2023-04-13T10:10:52Z"}

Update

I've managed to reduce the damage a little by adding some read rights to userAdmin, so one out of the 3 pods no longer restarts. However, comparing this DB to our production instance, it becomes apparent that this is the only defined user, while at least 4 more users seem to be missing (backup, clusterMonitor, clusterAdmin, databaseAdmin). I'm assuming the operator should be defining these. Any suggestions on recovery, or would it just be best to start again from scratch?

There is a secret defining all of these users (secrets.yaml) with default values.
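
In case it helps anyone hitting the same thing, the sort of grant I mean is along these lines, run as a user with userAdmin rights on the admin database (a sketch only: the placeholder credentials, the "userAdmin" user name and the choice of the clusterMonitor role are my assumptions):

  # grant the monitoring role that replSetGetStatus / collStats on local need
  # (<admin-user>/<admin-password> are placeholders for the userAdmin credentials)
  mongo "mongodb+srv://<admin-user>:<admin-password>@mongodb-clu1-psmdb-db-rs0.mongodb.svc.cluster.local/admin?replicaSet=rs0&ssl=false" \
    --eval 'db.grantRolesToUser("userAdmin", [{ role: "clusterMonitor", db: "admin" }])'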

Well, this was fun.

A number of things were going wrong here, only one of which now remains. It has been reported elsewhere on these forums, namely "SSL peer certificate validation failed" with "reason":"certificate signature failure", but it doesn't seem to stop the database from working.
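
One thing still worth checking for that remaining error is whether the server certificate in the generated ssl Secret actually verifies against the CA in the same Secret; roughly this (the Secret name comes from the pod spec above, and the ca.crt/tls.crt key names are the usual ones for TLS Secrets, so treat them as assumptions):

  # does the generated server certificate verify against the generated CA?
  kubectl -n mongodb get secret mongodb-clu1-psmdb-db-ssl -o jsonpath='{.data.ca\.crt}' | base64 -d > /tmp/psmdb-ca.crt
  kubectl -n mongodb get secret mongodb-clu1-psmdb-db-ssl -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/psmdb-tls.crt
  openssl verify -CAfile /tmp/psmdb-ca.crt /tmp/psmdb-tls.crt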

I've also seen mention of situations where one pod seems to work while the others go into CrashLoopBackOff and constantly restart. I reviewed a lot of articles relating to Hetzner, Kubespray and Kubernetes, and came to the conclusion that it would probably be a good idea to try this cluster setup with an additional internal network.

So the whole cluster was rebuilt with an additional 192.168.x.x internal network, and, surprise surprise, everything except the SSL cert issue now seems to be working. I can only conclude that either an internal iptables rule was missing, or that the applied firewall rules were insufficient and one or more ports needed to be open. As everything else seemed to be in place, that is the only logical explanation.

The operator had managed to launch the pods, but there were issues creating accounts, which then prevented the health and liveness checks from working. It would be good to have a list of required ports, just to work out what was missing.
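
In the absence of an official list, the kind of node-to-node checks I'd run are along these lines; the ports are the usual kubeadm/Calico ones rather than anything Percona-specific, and <other-node-ip> is a placeholder, so treat it as a starting point rather than a definitive list:

  # quick reachability checks between nodes on the internal network
  nc -zv <other-node-ip> 6443    # kube-apiserver
  nc -zv <other-node-ip> 10250   # kubelet API
  nc -zv <other-node-ip> 2379    # etcd (control-plane nodes only)
  nc -zv <other-node-ip> 179     # Calico BGP
  nc -zvu <other-node-ip> 4789   # VXLAN, if Calico runs in VXLAN mode (UDP checks with nc are unreliable)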

Thanks,

Mike

We are using Kubernetes on GCP/GKE and have zero issues with networking. I wonder where you run the cluster if you have issues with the firewall rules?!

SSL peer cert: either you supply your own certs or you live with this error. There is no side effect other than the increased logging cost.