Can't get simple deployment to work without errors

Hi,

I’m trying to set up an HA Percona Server for MongoDB cluster on a k8s cluster with the Percona Operator for MongoDB.
I followed the generic ‘Install on Kubernetes’ tutorial, as I’m on DigitalOcean.

After setup, everything boots up, but I get recurring errors in the logs and pod restarts.

On the operator pod I get this every few seconds/minutes:

"level":"info","ts":1624551290.0543494,"logger":"controller_psmdb","msg":"adding rs to shard","rs":"rs0"}
{"level":"error","ts":1624551310.0939054,"logger":"controller_psmdb","msg":"failed to reconcile cluster","Request.Namespace":"mongodbnamespace","Request.Name":"test","replset":"rs0","error":"add shard: failed to add shard: add shard: (FailedToSatisfyReadPreference) Could not find host matching read preference { mode: \"primary\" } for set rs0","errorVerbose":"(FailedToSatisfyReadPreference) Could not find host matching read preference { mode: \"primary\" } for set rs0\nadd shard\ngithub...........

On the rs0-X pods I get:

{"t":{"$date":"2021-06-24T16:29:20.805+00:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"127.0.0.1:58856","connectionId":964,"connectionCount":5}}
{"t":{"$date":"2021-06-24T16:29:20.816+00:00"},"s":"W",  "c":"NETWORK",  "id":23235,   "ctx":"conn964","msg":"SSL peer certificate validation failed","attr":{"reason":"certificate signature failure"}}

and on the cfg pods I get:

{"t":{"$date":"2021-06-24T16:30:09.894+00:00"},"s":"W",  "c":"NETWORK",  "id":23235,   "ctx":"conn986","msg":"SSL peer certificate validation failed","attr":{"reason":"certificate signature failure"}}

Here are my steps to reproduce:

kubectl apply -f crd.yaml
kubectl create namespace mongodbnamespace
kubectl config set-context $(kubectl config current-context) --namespace=mongodbnamespace
kubectl apply -f rbac.yaml
kubectl apply -f secrets.yaml
kubectl apply -f operator.yaml
kubectl apply -f cr.yaml
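After applying cr.yaml, these read-only checks (assuming the namespace above, and the default operator deployment name from operator.yaml) help confirm what state the cluster is actually in before digging into logs:

```shell
# High-level cluster state as reported by the operator (psmdb custom resource)
kubectl get psmdb -n mongodbnamespace
# Pod status and restart counts for rs0, cfg, and mongos pods
kubectl get pods -n mongodbnamespace
# Per-pod services (relevant here because rs0 is exposed via LoadBalancer)
kubectl get svc -n mongodbnamespace
# Recent operator logs
kubectl logs deploy/percona-server-mongodb-operator -n mongodbnamespace --tail=50
```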

cr.yaml:

apiVersion: psmdb.percona.com/v1-8-0
kind: PerconaServerMongoDB
metadata:
  name: test
spec:
  crVersion: 1.8.0
  image: percona/percona-server-mongodb:4.4.5-7
  imagePullPolicy: Always
  allowUnsafeConfigurations: false
  updateStrategy: SmartUpdate
  upgradeOptions:
    versionServiceEndpoint: https://check.percona.com
    apply: 4.4-recommended
    schedule: "0 2 * * *"
    setFCV: false
  secrets:
    users: test-db-secrets
  pmm:
    enabled: false
    image: percona/pmm-client:2.12.0
    serverHost: monitoring-service
  replsets:

  - name: rs0
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: true
      exposeType: LoadBalancer
    arbiter:
      enabled: false
      size: 1
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 10Gi

  sharding:
    enabled: true

    configsvrReplSet:
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 10Gi

    mongos:
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      expose:
        exposeType: ClusterIP

  mongod:
    net:
      port: 27017
      hostPort: 0
    security:
      redactClientLogData: false
      enableEncryption: true
      encryptionKeySecret: test-mongodb-encryption-key
      encryptionCipherMode: AES256-CBC
    setParameter:
      ttlMonitorSleepSecs: 60
      wiredTigerConcurrentReadTransactions: 128
      wiredTigerConcurrentWriteTransactions: 128
    storage:
      engine: wiredTiger
      inMemory:
        engineConfig:
          inMemorySizeRatio: 0.9
      wiredTiger:
        engineConfig:
          cacheSizeRatio: 0.5
          directoryForIndexes: false
          journalCompressor: snappy
        collectionConfig:
          blockCompressor: snappy
        indexConfig:
          prefixCompression: true
    operationProfiling:
      mode: slowOp
      slowOpThresholdMs: 100
      rateLimit: 100

  backup:
    enabled: false

What am I missing?

Thanks a lot.

Hey @hmenzagh ,

Do you have similar issues on other k8s flavors/services?
Do you use managed Kubernetes on DigitalOcean? Which CNI do they use? I know there are some issues with old Calico versions.

Don’t you need the sidecar for the cfgs? It’s commented out in the cr.yaml, but it’s necessary to get it running.

  sidecars:
  - args:
    - -c
    - while true; do echo $(date -u) 'test' >> /dev/null; sleep 5; done
    command:
    - /bin/sh
    image: busybox
    name: rs-sidecar-1

You don’t need a sidecar for cfgs.

I found this bug though which looks similar to what you have: https://jira.percona.com/browse/K8SPSMDB-504

If replicaset nodes are exposed with LoadBalancer (service per pod) then sometimes the cluster might get into the crash loop.

The fix is already merged and we are going to release it in 1.10.0.
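Until 1.10.0 is out, a possible workaround (a sketch based on the cr.yaml above — untested, and it trades away external reachability of individual replset members) is to stop exposing them through per-pod LoadBalancer services:

```yaml
  replsets:
  - name: rs0
    expose:
      enabled: false        # or keep enabled: true with exposeType: ClusterIP
```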

I did a test on Digital Ocean and can confirm this issue. It is related to https://jira.percona.com/browse/K8SPSMDB-504 and will be fixed in the next release.
