Mongo pods constantly restart due to failed liveness probe

Hello,

When I create a PerconaServerMongoDB in replicaset mode, my pods restart constantly due to failed liveness probes with this error: “connection error: no reachable servers”.

I can connect to the pod and run the same command as in the liveness probe definition:
/data/db/mongodb-healthcheck k8s liveness --ssl --sslInsecure --sslCAFile /etc/mongodb-ssl/ca.crt --sslPEMKeyFile /tmp/tls.pem

It does indeed fail with the same connection error.

I am running on RKE 1.18.6. My CR is in the comment below.

Thanks for your help!


I cannot upload a YAML file here, so I will just paste the raw file. :confused:

apiVersion: psmdb.percona.com/v1-9-0
kind: PerconaServerMongoDB
metadata:
  name: mongodb
  finalizers:
    - delete-psmdb-pvc
spec:
  crVersion: 1.9.0
  image: "percona/percona-server-mongodb:4.4.6-8"
  imagePullPolicy: Always
  allowUnsafeConfigurations: true
  updateStrategy: SmartUpdate
  upgradeOptions:
    versionServiceEndpoint: https://check.percona.com
    apply: 4.4-recommended
    schedule: 0 2 * * *
    setFCV: false
  secrets:
    users: 
  pmm:
    enabled: false
    image: percona/pmm-client:2.18.0
    serverHost: monitoring-service
  replsets:
    - name: rs0
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      #      advanced:
      #        nodeAffinity:
      #          requiredDuringSchedulingIgnoredDuringExecution:
      #            nodeSelectorTerms:
      #            - matchExpressions:
      #              - key: kubernetes.io/e2e-az-name
      #                operator: In
      #                values:
      #                - e2e-az1
      #                - e2e-az2
      #    tolerations:
      #    - key: "node.alpha.kubernetes.io/unreachable"
      #      operator: "Exists"
      #      effect: "NoExecute"
      #      tolerationSeconds: 6000
      #    priorityClassName: high-priority
      #    annotations:
      #      iam.amazonaws.com/role: role-arn
      #    labels:
      #      rack: rack-22
      #    nodeSelector:
      #      disktype: ssd
      livenessProbe:
        failureThreshold: 4
        initialDelaySeconds: 60
        periodSeconds: 30
        successThreshold: 1
        timeoutSeconds: 10
        startupDelaySeconds: 7200
      #    runtimeClassName: image-rc
      #    sidecars:
      #    - image: busybox
      #      command: ["/bin/sh"]
      #      args: ["-c", "while true; do echo echo $(date -u) 'test' >> /dev/null; sleep 5;done"]
      #      name: rs-sidecar-1
      podDisruptionBudget:
        maxUnavailable: 1
        minAvailable: 3
      expose:
        enabled: true
        exposeType: LoadBalancer
        serviceAnnotations:
          service.beta.kubernetes.io/openstack-internal-load-balancer: "true"
      arbiter:
        enabled: false
      resources:
        limits:
          cpu: "2"
          memory: 8Gi
          requests:
            cpu: "2"
            memory: 8Gi
      volumeSpec:
        #      emptyDir: {}
        #      hostPath:
        #        path: /data
        #        type: Directory
        persistentVolumeClaim:
          storageClassName: percona-mongodb
          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 10Gi

  sharding:
    enabled: false

  mongod:
    net:
      hostPort: 0


  backup:
    enabled: false
    restartOnFailure: true
    image: percona/percona-server-mongodb-operator:1.9.0-backup
    serviceAccountName: percona-server-mongodb-operator
    resources:

    storages:
    #      s3-us-west:
    #        type: s3
    #        s3:
    #          bucket: S3-BACKUP-BUCKET-NAME-HERE
    #          credentialsSecret: my-cluster-name-backup-s3
    #          region: us-west-2
    #      minio:
    #        type: s3
    #        s3:
    #          bucket: MINIO-BACKUP-BUCKET-NAME-HERE
    #          region: us-east-1
    #          credentialsSecret: my-cluster-name-backup-minio
    #          endpointUrl: http://minio.psmdb.svc.cluster.local:9000/minio/
    pitr:
      enabled: true
    tasks:
#      - name: daily-s3-us-west
#        enabled: true
#        schedule: "0 0 * * *"
#        keep: 3
#        storageName: s3-us-west
#        compressionType: gzip
#      - name: weekly-s3-us-west
#        enabled: false
#        schedule: "0 0 * * 0"
#        keep: 5
#        storageName: s3-us-west
#        compressionType: gzip


Maybe add some users?


Oh yeah, I commented that out before pasting it here and forgot to put the user secrets back. But even with the user secrets in place I still get the same error. :frowning:
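
For completeness, this is roughly what the blanked-out field looks like once restored. It is only a sketch: the secret name below is a placeholder, not the one from my cluster.

```yaml
# Fragment of the PerconaServerMongoDB CR; only spec.secrets is shown.
spec:
  secrets:
    # "mongodb-secrets" is a hypothetical name. Use the name of the Secret
    # that holds the MongoDB system users in the same namespace as the CR.
    users: mongodb-secrets
```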


Maybe you should run something like

kubectl get pods

so we can see whether your installation is correct.

Your cr.yaml does not look correct. Load the original file in an editor where you can clearly see what belongs to what, and tell us what you actually want (only a replica set? With sharding?).


I did load the original file from GitHub and modified it to meet my needs, so it looks a bit different from the original one. Can you point out which part of my cr.yaml doesn’t look correct, please?

Just to make sure, I tried applying the original file and got the same error. I hope it isn’t a platform-specific issue. I made it work a few weeks ago on GKE, and now I’m trying to make it work on an on-premises RKE cluster.


“made it work a few weeks ago with GKE”

So I guess this is not a Percona Operator issue; it is more likely that you don’t know RKE.


Thanks, very useful. =.=!

Actually, it turns out that the problem only happens when I expose the replica set with LoadBalancer or NodePort.
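
In CR terms, the difference comes down to the expose block of the replset. A minimal sketch of the variant that should not trigger the restarts, assuming the non-failing setup is simply exposure switched off (everything else stays as in the CR I posted):

```yaml
# Fragment of the PerconaServerMongoDB CR posted earlier in this thread;
# only the expose block differs.
spec:
  replsets:
    - name: rs0
      size: 3
      expose:
        # Assumption: with exposure disabled the liveness probes pass.
        # With exposeType LoadBalancer or NodePort they fail as described.
        enabled: false
```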

When I use NodePort, connect to one of the pods, and run:
mongo --host <node_ip> --port <node_port>

It returns connection refused.

Out of curiosity, I tried spinning up a single pod running the percona/percona-server-mongodb:4.4.6-8 image, started mongod --bind_ip_all --port 27017, and then exposed the port as a NodePort service. I can connect to this test mongo instance normally.
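
A setup along these lines reproduces that test. Treat it as a sketch rather than the exact manifests I applied: the resource names and labels are made up for illustration, and only the image and the mongod flags come from the description above.

```yaml
# Standalone test pod, not managed by the operator.
# "mongo-nodeport-test" is a hypothetical name used for both objects.
apiVersion: v1
kind: Pod
metadata:
  name: mongo-nodeport-test
  labels:
    app: mongo-nodeport-test
spec:
  containers:
    - name: mongod
      image: percona/percona-server-mongodb:4.4.6-8
      # Overrides the image entrypoint; no auth, no TLS, no replica set.
      command: ["mongod", "--bind_ip_all", "--port", "27017"]
      ports:
        - containerPort: 27017
---
apiVersion: v1
kind: Service
metadata:
  name: mongo-nodeport-test
spec:
  type: NodePort
  selector:
    app: mongo-nodeport-test
  ports:
    - port: 27017
      targetPort: 27017
      # nodePort is left unset so Kubernetes picks one from its NodePort range.
```

Since this plain pod is reachable through its NodePort on the same cluster, the node-level networking itself seems fine, which points toward how the operator-managed replica set behaves when exposed.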
