Getting CrashLoopBackOff in rs0 pods when installing on vanilla k8s

Hi! Having got the operator working in minikube and OpenShift, I moved on to vanilla Kubernetes, but unfortunately I can't get a working database up yet. I followed the tutorial and everything seems to be created correctly, but the pods start failing after a while.
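For reference, the steps I followed are essentially the standard ones from the tutorial, roughly like this (file names assumed from the deploy/ directory of the 1.4.0 operator repo, so they may differ slightly):

$ kubectl create namespace psmdb
$ kubectl config set-context $(kubectl config current-context) --namespace=psmdb
$ kubectl apply -f deploy/crd.yaml
$ kubectl apply -f deploy/rbac.yaml
$ kubectl apply -f deploy/operator.yaml
$ kubectl apply -f deploy/secrets.yaml
$ kubectl apply -f deploy/cr.yaml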

$ kubectl get pods
NAME                                               READY   STATUS             RESTARTS   AGE
my-cluster-name-rs0-0                              0/1     CrashLoopBackOff   9          34m
my-cluster-name-rs0-1                              1/1     Running            9          34m
my-cluster-name-rs0-2                              1/1     Running            9          34m
percona-server-mongodb-operator-568f85969c-fl8jh   1/1     Running            0          35m
$ kubectl get pods
NAME                                               READY   STATUS             RESTARTS   AGE
my-cluster-name-rs0-0                              0/1     CrashLoopBackOff   9          37m
my-cluster-name-rs0-1                              0/1     CrashLoopBackOff   9          36m
my-cluster-name-rs0-2                              0/1     CrashLoopBackOff   9          36m
percona-server-mongodb-operator-568f85969c-fl8jh   1/1     Running            0          37m
This is on k8s v1.17 as per:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.4", GitCommit:"67d2fcf276fcd9cf743ad4be9a9ef5828adc082f", GitTreeState:"clean", BuildDate:"2019-09-18T14:51:13Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:22:30Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
I am seeing errors in the operator pod:

$ kubectl logs percona-server-mongodb-operator-568f85969c-fl8jh
{"level":"info","ts":1587043326.966063,"logger":"cmd","msg":"Git commit: 44e3cb883501c2adb1614df762317911d7bb16eb Git branch: master"}
{"level":"info","ts":1587043326.9661248,"logger":"cmd","msg":"Go Version: go1.12.17"}
{"level":"info","ts":1587043326.9661362,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1587043326.966145,"logger":"cmd","msg":"operator-sdk Version: v0.3.0"}
{"level":"info","ts":1587043326.966367,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1587043327.1066792,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1587043327.112258,"logger":"controller_psmdb","msg":"server version","platform":"kubernetes","version":"v1.17.2"}
{"level":"info","ts":1587043327.1129541,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"psmdb-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1587043327.1132038,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"perconaservermongodbbackup-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1587043327.1134188,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"perconaservermongodbbackup-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1587043327.1136668,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"perconaservermongodbrestore-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1587043327.1138349,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"perconaservermongodbrestore-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1587043327.1138775,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1587043327.214392,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"perconaservermongodbrestore-controller"}
{"level":"info","ts":1587043327.2144375,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"perconaservermongodbbackup-controller"}
{"level":"info","ts":1587043327.2143924,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"psmdb-controller"}
{"level":"info","ts":1587043327.3160355,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"perconaservermongodbrestore-controller","worker count":1}
{"level":"info","ts":1587043327.3161073,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"perconaservermongodbbackup-controller","worker count":1}
{"level":"info","ts":1587043327.316146,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"psmdb-controller","worker count":1}
{"level":"info","ts":1587043352.4551826,"logger":"controller_psmdb","msg":"Created a new mongo key","Request.Namespace":"psmdb","Request.Name":"my-cluster-name","KeyName":"my-cluster-name-mongodb-keyfile"}
{"level":"info","ts":1587043352.4619968,"logger":"controller_psmdb","msg":"Created a new mongo key","Request.Namespace":"psmdb","Request.Name":"my-cluster-name","KeyName":"my-cluster-name-mongodb-encryption-key"}
{"level":"error","ts":1587043352.7108507,"logger":"controller_psmdb",
  "msg":"failed to reconcile cluster",
  "Request.Namespace":"psmdb",
  "Request.Name":"my-cluster-name",
  "replset":"rs0",
  "error":"handleReplsetInit:: no mongod containers in running state",
  "errorVerbose":"no mongod containers in running state ...}
{"level":"error","ts":1587043352.8449605,"logger":"kubebuilder.controller",
  "msg":"Reconciler error","controller":"psmdb-controller",
  "request":"psmdb/my-cluster-name",
  "error":"reconcile StatefulSet for rs0: update StatefulSet my-cluster-name-rs0: StatefulSet.apps \"my-cluster-name-rs0\" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden",
...}
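In case it is useful for anyone debugging the same thing, the mongod container's own logs can be pulled directly like this (a sketch; the container name mongod and namespace psmdb are the ones the operator creates):

$ kubectl -n psmdb logs my-cluster-name-rs0-0 -c mongod
$ kubectl -n psmdb logs my-cluster-name-rs0-0 -c mongod --previous   # output from the last crashed run
$ kubectl -n psmdb get events --sort-by=.metadata.creationTimestamp | grep rs0-0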
Connecting to mongo *while the pods are up* does work, but only without credentials. Using userAdmin/userAdmin123456 results in "Authentication Denied". The secrets from deploy/secrets.yaml are available in the mongo pods as environment variables, so they do appear to be picked up. Could it be that mongodb-healthcheck can't connect because the mongo user credentials aren't being set, and that this is what's causing the pods to fail?
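This is roughly how I test the connection and check the env vars (a sketch; the user and password are the defaults from deploy/secrets.yaml, the variable names are assumed from that file, and the mongo client is assumed to be available in the mongod container):

$ kubectl -n psmdb exec -it my-cluster-name-rs0-0 -c mongod -- \
    mongo admin -u userAdmin -p userAdmin123456 --authenticationDatabase admin
$ kubectl -n psmdb exec -it my-cluster-name-rs0-0 -c mongod -- env | grep MONGODB_USER_ADMIN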

Answers

  • jameskhedley
    edited April 17
    Some more diagnostics:
    $ kubectl describe pod/my-cluster-name-rs0-0
    Name:           my-cluster-name-rs0-0
    Namespace:      psmdb
    Priority:       0
    Node:           jkh-test-k8s-worker-2.fyre.ibm.com/10.51.4.169
    Start Time:     Thu, 16 Apr 2020 15:20:10 +0100
    Labels:         app.kubernetes.io/component=mongod
                    app.kubernetes.io/instance=my-cluster-name
                    app.kubernetes.io/managed-by=percona-server-mongodb-operator
                    app.kubernetes.io/name=percona-server-mongodb
                    app.kubernetes.io/part-of=percona-server-mongodb
                    app.kubernetes.io/replset=rs0
                    controller-revision-hash=my-cluster-name-rs0-78fddd4ffd
                    statefulset.kubernetes.io/pod-name=my-cluster-name-rs0-0
    Annotations:    percona.com/ssl-hash: 
                    percona.com/ssl-internal-hash: 
    Status:         Running
    IP:             10.36.0.4
    Controlled By:  StatefulSet/my-cluster-name-rs0
    Containers:
      mongod:
        Container ID:  docker://4d1de9a2666e35bd547dad1a6c922874b0f7256309f3f13a59a647585d956848
        Image:         percona/percona-server-mongodb-operator:1.4.0-mongod4.2
        Image ID:      docker-pullable://percona/[email protected]:d79a68524efb48d06e79e84b50870d1673cdfecc92b043d811e3a76cb0ae05ab
        Port:          27017/TCP
        Host Port:     0/TCP
        Args:
          --bind_ip_all
          --auth
          --dbpath=/data/db
          --port=27017
          --replSet=rs0
          --storageEngine=wiredTiger
          --relaxPermChecks
          --sslAllowInvalidCertificates
          --clusterAuthMode=keyFile
          --keyFile=/etc/mongodb-secrets/mongodb-key
          --slowms=100
          --profile=1
          --rateLimit=100
          --enableEncryption
          --encryptionKeyFile=/etc/mongodb-encryption/encryption-key
          --encryptionCipherMode=AES256-CBC
          --wiredTigerCacheSizeGB=0.25
          --wiredTigerCollectionBlockCompressor=snappy
          --wiredTigerJournalCompressor=snappy
          --wiredTigerIndexPrefixCompression=true
          --setParameter
          ttlMonitorSleepSecs=60
          --setParameter
          wiredTigerConcurrentReadTransactions=128
          --setParameter
          wiredTigerConcurrentWriteTransactions=128
        State:          Waiting
          Reason:       CrashLoopBackOff
        Last State:     Terminated
          Reason:       Completed
          Exit Code:    0
          Started:      Fri, 17 Apr 2020 00:39:56 +0100
          Finished:     Fri, 17 Apr 2020 00:42:54 +0100
        Ready:          False
        Restart Count:  105
        Limits:
          cpu:     300m
          memory:  500M
        Requests:
          cpu:      300m
          memory:   500M
        Liveness:   exec [mongodb-healthcheck k8s liveness --startupDelaySeconds 7200] delay=60s timeout=5s period=30s #success=1 #failure=4
        Readiness:  tcp-socket :27017 delay=10s timeout=2s period=3s #success=1 #failure=8
        Environment Variables from:
          my-cluster-name-secrets  Secret  Optional: false
        Environment:
          SERVICE_NAME:     my-cluster-name
          NAMESPACE:        psmdb
          MONGODB_PORT:     27017
          MONGODB_REPLSET:  rs0
        Mounts:
          /data/db from mongod-data (rw)
          /etc/mongodb-encryption from my-cluster-name-mongodb-encryption-key (ro)
          /etc/mongodb-secrets from my-cluster-name-mongodb-keyfile (ro)
          /etc/mongodb-ssl from ssl (ro)
          /etc/mongodb-ssl-internal from ssl-internal (ro)
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-m9pt6 (ro)
    Conditions:
      Type              Status
      Initialized       True 
      Ready             False 
      ContainersReady   False 
      PodScheduled      True 
    Volumes:
      mongod-data:
        Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
        ClaimName:  mongod-data-my-cluster-name-rs0-0
        ReadOnly:   false
      my-cluster-name-mongodb-keyfile:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  my-cluster-name-mongodb-keyfile
        Optional:    false
      my-cluster-name-mongodb-encryption-key:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  my-cluster-name-mongodb-encryption-key
        Optional:    false
      ssl:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  my-cluster-name-ssl
        Optional:    true
      ssl-internal:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  my-cluster-name-ssl-internal
        Optional:    true
      default-token-m9pt6:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-m9pt6
        Optional:    false
    QoS Class:       Guaranteed
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s
    Events:
      Type     Reason     Age                   From                                         Message
      ----     ------     ----                  ----                                         -------
      Warning  Unhealthy  6m29s (x412 over 9h)  kubelet, jkh-test-k8s-worker-2.fyre.ibm.com  (combined from similar events): Liveness probe failed: 2020-04-16 23:41:22.538 main.go:74       INFO   ssl connection error: no reachable servers 
    2020-04-16 23:41:22.539 main.go:81       FATAL  Error connecting to mongodb: no reachable servers
      Warning  BackOff  2m34s (x1212 over 9h)  kubelet, jkh-test-k8s-worker-2.fyre.ibm.com  Back-off restarting failed container
    
  • Hi,
    Could you please provide us with the CR in YAML format? We need it for testing.
  • jameskhedley
    Hi Ivan, attached. I basically turned off backups, commented out the resource requests, and allowed unsafe configurations.
  • jameskhedley
    Also, I tested with k8s v1.15.11 and it works fine, so is this potentially an issue with v1.17?
  • Hi James,
    We've tried to reproduce the case on GKE 1.17 and also saw unexpected operator container behavior there; minikube 1.17, however, works OK.
    At the moment 1.17 is not on the list of supported platforms. Please use one of the supported platforms if possible, since 1.17 is not stable.
