MongoServerSelectionError: ReplicaSetNoPrimary

Hi friends,

I have created a MongoDB instance and exposed it externally via a LoadBalancer, and I'm attempting to connect to it from a Node.js application using the official MongoDB driver, although this problem is not specific to Node.js; it also occurs in the official GUI, MongoDB Compass.

If I add the option directConnection: true when connecting, it works.

With my current setup, why does this occur? Have I misconfigured something that is forcing me to use directConnection: true?
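
For reference, this is roughly how the app connects (a minimal sketch; credentials and host are placeholders):

const { MongoClient } = require('mongodb');

// Placeholder credentials and host (the external LoadBalancer address).
const uri = 'mongodb://user:pass@<load-balancer-host>:27017/';

const client = new MongoClient(uri, {
  serverSelectionTimeoutMS: 5000,
  directConnection: true, // the connection only succeeds with this set
});

async function main() {
  await client.connect();
  console.log(await client.db('admin').command({ ping: 1 }));
  await client.close();
}

main().catch(console.error);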

error:

MongoServerSelectionError: getaddrinfo EAI_AGAIN psmdb-db-rs0-0.psmdb-db-rs0.mongodb.svc.cluster.local
    at Timeout._onTimeout (/home/kay/checkpoint/example/example-api/node_modules/mongodb/src/sdam/topology.ts:564:30)
    at listOnTimeout (node:internal/timers:564:17)
    at processTimers (node:internal/timers:507:7) {
  reason: TopologyDescription {
    type: 'ReplicaSetNoPrimary',
    servers: Map(1) {
      'psmdb-db-rs0-0.psmdb-db-rs0.mongodb.svc.cluster.local:27017' => [ServerDescription]
    },
    stale: false,
    compatible: true,
    heartbeatFrequencyMS: 10000,
    localThresholdMS: 15,
    setName: 'rs0',
    maxElectionId: new ObjectId("7fffffff0000000000000001"),
    maxSetVersion: 3,
    commonWireVersion: 0,
    logicalSessionTimeoutMinutes: null
  },
  code: undefined,
  [Symbol(errorLabels)]: Set(0) {}
}

rs.status()

rs0:PRIMARY> rs.status()
{
        "set" : "rs0",
        "date" : ISODate("2023-04-27T10:21:01.518Z"),
        "myState" : 1,
        "term" : NumberLong(3),
        "syncSourceHost" : "",
        "syncSourceId" : -1,
        "heartbeatIntervalMillis" : NumberLong(2000),
        "majorityVoteCount" : 1,
        "writeMajorityCount" : 1,
        "votingMembersCount" : 1,
        "writableVotingMembersCount" : 1,
        "optimes" : {
                "lastCommittedOpTime" : {
                        "ts" : Timestamp(1682590858, 1),
                        "t" : NumberLong(3)
                },
                "lastCommittedWallTime" : ISODate("2023-04-27T10:20:58.192Z"),
                "readConcernMajorityOpTime" : {
                        "ts" : Timestamp(1682590858, 1),
                        "t" : NumberLong(3)
                },
                "appliedOpTime" : {
                        "ts" : Timestamp(1682590858, 1),
                        "t" : NumberLong(3)
                },
                "durableOpTime" : {
                        "ts" : Timestamp(1682590858, 1),
                        "t" : NumberLong(3)
                },
                "lastAppliedWallTime" : ISODate("2023-04-27T10:20:58.192Z"),
                "lastDurableWallTime" : ISODate("2023-04-27T10:20:58.192Z")
        },
        "lastStableRecoveryTimestamp" : Timestamp(1682590823, 1),
        "electionCandidateMetrics" : {
                "lastElectionReason" : "electionTimeout",
                "lastElectionDate" : ISODate("2023-04-27T10:09:27.983Z"),
                "electionTerm" : NumberLong(3),
                "lastCommittedOpTimeAtElection" : {
                        "ts" : Timestamp(0, 0),
                        "t" : NumberLong(-1)
                },
                "lastSeenOpTimeAtElection" : {
                        "ts" : Timestamp(1682590146, 1),
                        "t" : NumberLong(2)
                },
                "numVotesNeeded" : 1,
                "priorityAtElection" : 2,
                "electionTimeoutMillis" : NumberLong(10000),
                "newTermStartDate" : ISODate("2023-04-27T10:09:27.986Z"),
                "wMajorityWriteAvailabilityDate" : ISODate("2023-04-27T10:09:27.988Z")
        },
        "members" : [
                {
                        "_id" : 0,
                        "name" : "psmdb-db-rs0-0.psmdb-db-rs0.mongodb.svc.cluster.local:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 697,
                        "optime" : {
                                "ts" : Timestamp(1682590858, 1),
                                "t" : NumberLong(3)
                        },
                        "optimeDate" : ISODate("2023-04-27T10:20:58Z"),
                        "lastAppliedWallTime" : ISODate("2023-04-27T10:20:58.192Z"),
                        "lastDurableWallTime" : ISODate("2023-04-27T10:20:58.192Z"),
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "electionTime" : Timestamp(1682590167, 1),
                        "electionDate" : ISODate("2023-04-27T10:09:27Z"),
                        "configVersion" : 5,
                        "configTerm" : 3,
                        "self" : true,
                        "lastHeartbeatMessage" : ""
                }
        ],
        "ok" : 1,
        "$clusterTime" : {
                "clusterTime" : Timestamp(1682590858, 1),
                "signature" : {
                        "hash" : BinData(0,"21SiIhjRT7Epx+N06OuODKfaV1k="),
                        "keyId" : NumberLong("7226631497447374854")
                }
        },
        "operationTime" : Timestamp(1682590858, 1)
}

values.yaml:

# Default values for psmdb-cluster.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

# Platform type: kubernetes, openshift
# platform: kubernetes

# Cluster DNS Suffix
# clusterServiceDNSSuffix: svc.cluster.local
# clusterServiceDNSMode: "Internal"

finalizers:
## Set this if you want the operator to delete the primary pod last
  - delete-psmdb-pods-in-order
## Set this if you want to delete database persistent volumes on cluster deletion
#  - delete-psmdb-pvc

nameOverride: ""
fullnameOverride: ""

env:
  - name: LOG_STRUCTURED
    value: 'false'
  - name: LOG_LEVEL
    value: DEBUG


crVersion: 1.14.0
pause: false
unmanaged: false
allowUnsafeConfigurations: true
# ignoreAnnotations:
#   - service.beta.kubernetes.io/aws-load-balancer-backend-protocol
# ignoreLabels:
#   - rack
multiCluster:
  enabled: false
  # DNSSuffix: svc.clusterset.local
updateStrategy: SmartUpdate
upgradeOptions:
  versionServiceEndpoint: https://check.percona.com
  apply: disabled
  schedule: "0 2 * * *"
  setFCV: false

image:
  repository: percona/percona-server-mongodb
  tag: 6.0.4-3

imagePullPolicy: Always
# imagePullSecrets: []
# initImage:
#   repository: percona/percona-server-mongodb-operator
#   tag: 1.14.0
# initContainerSecurityContext: {}
# tls:
#   # 90 days in hours
#   certValidityDuration: 2160h
secrets: {}
  # If you set the users secret here, the operator will use the existing one or generate random values
  # If not set, the operator generates the default secret with the name <cluster_name>-secrets
  # users: my-cluster-name-secrets
  # encryptionKey: my-cluster-name-mongodb-encryption-key

pmm:
  enabled: false
  image:
    repository: percona/pmm-client
    tag: 2.35.0
  serverHost: monitoring-service

replsets:
  - name: rs0
    size: 1
    # externalNodes:
    # - host: 34.124.76.90
    # - host: 34.124.76.91
    #   port: 27017
    #   votes: 0
    #   priority: 0
    # - host: 34.124.76.92
    # configuration: |
    #   operationProfiling:
    #     mode: slowOp
    #   systemLog:
    #     verbosity: 1
    antiAffinityTopologyKey: "kubernetes.io/hostname"
    # tolerations: []
    # priorityClass: ""
    # annotations: {}
    # labels: {}
    nodeSelector:
      acme/node-type: "ops"
    # livenessProbe:
    #   failureThreshold: 4
    #   initialDelaySeconds: 60
    #   periodSeconds: 30
    #   timeoutSeconds: 10
    #   startupDelaySeconds: 7200
    # readinessProbe:
    #   failureThreshold: 8
    #   initialDelaySeconds: 10
    #   periodSeconds: 3
    #   successThreshold: 1
    #   timeoutSeconds: 2
    # runtimeClassName: image-rc
    # storage:
    #   engine: wiredTiger
    #   wiredTiger:
    #     engineConfig:
    #       cacheSizeRatio: 0.5
    #       directoryForIndexes: false
    #       journalCompressor: snappy
    #     collectionConfig:
    #       blockCompressor: snappy
    #     indexConfig:
    #       prefixCompression: true
    #   inMemory:
    #     engineConfig:
    #        inMemorySizeRatio: 0.5
    sidecars:
    - image: percona/mongodb_exporter:0.36
      env:
      - name: EXPORTER_USER
        valueFrom:
          secretKeyRef:
            name: psmdb-db-secrets
            key: MONGODB_CLUSTER_MONITOR_USER
      - name: EXPORTER_PASS
        valueFrom:
          secretKeyRef:
            name: psmdb-db-secrets
            key: MONGODB_CLUSTER_MONITOR_PASSWORD
      - name: POD_IP
        valueFrom:
          fieldRef:
            fieldPath: status.podIP
      - name: MONGODB_URI
        value: "mongodb://$(EXPORTER_USER):$(EXPORTER_PASS)@$(POD_IP):27017"
      args: ["--discovering-mode", "--compatible-mode", "--collect-all", "--mongodb.uri=$(MONGODB_URI)"]
      name: metrics
    #   volumeMounts:
    #     - mountPath: /volume1
    #       name: sidecar-volume-claim
    #     - mountPath: /secret
    #       name: sidecar-secret
    #     - mountPath: /configmap
    #       name: sidecar-config
    # sidecarVolumes:
    # - name: sidecar-secret
    #   secret:
    #     secretName: mysecret
    # - name: sidecar-config
    #   configMap:
    #     name: myconfigmap
    # sidecarPVCs:
    # - apiVersion: v1
    #   kind: PersistentVolumeClaim
    #   metadata:
    #     name: sidecar-volume-claim
    #   spec:
    #     resources:
    #       requests:
    #         storage: 1Gi
    #     volumeMode: Filesystem
    #     accessModes:
    #       - ReadWriteOnce
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: true
      exposeType: LoadBalancer
      # loadBalancerSourceRanges:
      #   - 10.0.0.0/8
      # serviceAnnotations:
      #   service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
      # serviceLabels: 
      #   some-label: some-key
    nonvoting:
      enabled: false
      # podSecurityContext: {}
      # containerSecurityContext: {}
      size: 3
      # configuration: |
      #   operationProfiling:
      #     mode: slowOp
      #   systemLog:
      #     verbosity: 1
      antiAffinityTopologyKey: "kubernetes.io/hostname"
      # tolerations: []
      # priorityClass: ""
      # annotations: {}
      # labels: {}
      # nodeSelector: {}
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
        # emptyDir: {}
        # hostPath:
        #   path: /data
        pvc:
          # annotations:
          #   volume.beta.kubernetes.io/storage-class: example-hostpath
          # labels:
          #   rack: rack-22
          # storageClassName: standard
          # accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 3Gi
    arbiter:
      enabled: false
      size: 1
      antiAffinityTopologyKey: "kubernetes.io/hostname"
      # tolerations: []
      # priorityClass: ""
      # annotations: {}
      # labels: {}
      # nodeSelector: {}
    # schedulerName: ""
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      # emptyDir: {}
      # hostPath:
      #   path: /data
      pvc:
        # annotations:
        #   volume.beta.kubernetes.io/storage-class: example-hostpath
        # labels:
        #   rack: rack-22
        storageClassName: mongodb
        # accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 250Gi

sharding:
  enabled: false 

backup:
  enabled: true
  image:
    repository: percona/percona-backup-mongodb
    tag: 2.0.5
  serviceAccountName: percona-server-mongodb-operator
  #  annotations:
  #  iam.amazonaws.com/role: 
  # resources:
  #   limits:
  #     cpu: "300m"
  #     memory: "0.5G"
  #   requests:
  #     cpu: "300m"
  #     memory: "0.5G"
  storages:
    s3-eu-west:
      type: s3
      s3:
        bucket: acme-test-mongodb-backup
        credentialsSecret: prod-aws-mongodb
        region: eu-west-2
        prefix: ""
        uploadPartSize: 10485760
        maxUploadParts: 10000
        storageClass: STANDARD
        insecureSkipTLSVerify: false
    # minio:
    #   type: s3
    #   s3:
    #     bucket: MINIO-BACKUP-BUCKET-NAME-HERE
    #     region: us-east-1
    #     credentialsSecret: my-cluster-name-backup-minio
    #     endpointUrl: http://minio.psmdb.svc.cluster.local:9000/minio/
    #     prefix: ""
    #   azure-blob:
    #     type: azure
    #     azure:
    #       container: CONTAINER-NAME
    #       prefix: PREFIX-NAME
    #       credentialsSecret: SECRET-NAME
  pitr:
    enabled: false
    # oplogSpanMin: 10
    # compressionType: gzip
    # compressionLevel: 6
  tasks:
   - name: "every-hour-backup"
     enabled: true
     schedule: "0 * * * *"
     keep: 3
     type: logical
     storageName: s3-eu-west

  # - name: daily-s3-us-west
  #   enabled: true
  #   schedule: "0 0 * * *"
  #   keep: 3
  #   storageName: s3-us-west
  #   compressionType: gzip
  # - name: weekly-s3-us-west
  #   enabled: false
  #   schedule: "0 0 * * 0"
  #   keep: 5
  #   storageName: s3-us-west
  #   compressionType: gzip
  # - name: weekly-s3-us-west-physical
  #   enabled: false
  #   schedule: "0 5 * * 0"
  #   keep: 5
  #   type: physical
  #   storageName: s3-us-west
  #   compressionType: gzip
  #   compressionLevel: 6

# If you set users here the secret will be constructed by helm with these values
# users:
#   MONGODB_BACKUP_USER: backup
#   MONGODB_BACKUP_PASSWORD: backup123456
#   MONGODB_DATABASE_ADMIN_USER: databaseAdmin
#   MONGODB_DATABASE_ADMIN_PASSWORD: databaseAdmin123456
#   MONGODB_CLUSTER_ADMIN_USER: clusterAdmin
#   MONGODB_CLUSTER_ADMIN_PASSWORD: clusterAdmin123456
#   MONGODB_CLUSTER_MONITOR_USER: clusterMonitor
#   MONGODB_CLUSTER_MONITOR_PASSWORD: clusterMonitor123456
#   MONGODB_USER_ADMIN_USER: userAdmin
#   MONGODB_USER_ADMIN_PASSWORD: userAdmin123456
#   PMM_SERVER_API_KEY: apikey
#   # PMM_SERVER_USER: admin
#   # PMM_SERVER_PASSWORD: admin

Hello @Kay_Khan,

It seems you are doing everything correctly.
This is how MongoDB works: https://www.mongodb.com/docs/mongodb-shell/connect/

You have just one node that you are connecting to, and this calls for the directConnection parameter. Once you connect to multiple nodes and set the ?replicaSet flag in the URI, there should be no need for the directConnection parameter.
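
For example (credentials and hosts are placeholders):

# Single node: connect to it directly, skipping replica set discovery
mongodb://user:pass@host1:27017/?directConnection=true

# Multiple nodes: the driver discovers the set from the listed members
mongodb://user:pass@host1:27017,host2:27017,host3:27017/?replicaSet=rs0
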
At the same time, I’m a bit confused why, if you use a LoadBalancer, you connect to mongodb.svc.cluster.local, which is a ClusterIP entity.

Hi, thank you for replying. Is there some documentation you can point me to that says we must use directConnection with a single-node cluster?

At the same time, I’m a bit confused why, if you use a LoadBalancer, you connect to mongodb.svc.cluster.local, which is a ClusterIP entity.

I’m not sure; this is how it was set up by the operator.


I’m also not sure why, but I changed clusterServiceDNSMode to the External value. This allows me to connect to the same cluster without the directConnection option. The difference I can see between the original and the new version is the following:

  1. members[0].name now points directly to the AWS load balancer that was created

clusterServiceDNSMode: External

{
  set: 'rs0',
  date: ISODate("2023-04-28T12:14:17.160Z"),
  myState: 1,
  term: Long("14"),
  syncSourceHost: '',
  syncSourceId: -1,
  heartbeatIntervalMillis: Long("2000"),
  majorityVoteCount: 1,
  writeMajorityCount: 1,
  votingMembersCount: 1,
  writableVotingMembersCount: 1,
  optimes: {
    lastCommittedOpTime: { ts: Timestamp({ t: 1682684054, i: 1 }), t: Long("14") },
    lastCommittedWallTime: ISODate("2023-04-28T12:14:14.204Z"),
    readConcernMajorityOpTime: { ts: Timestamp({ t: 1682684054, i: 1 }), t: Long("14") },
    appliedOpTime: { ts: Timestamp({ t: 1682684054, i: 1 }), t: Long("14") },
    durableOpTime: { ts: Timestamp({ t: 1682684054, i: 1 }), t: Long("14") },
    lastAppliedWallTime: ISODate("2023-04-28T12:14:14.204Z"),
    lastDurableWallTime: ISODate("2023-04-28T12:14:14.204Z")
  },
  lastStableRecoveryTimestamp: Timestamp({ t: 1682684012, i: 1 }),
  electionCandidateMetrics: {
    lastElectionReason: 'electionTimeout',
    lastElectionDate: ISODate("2023-04-27T15:42:37.470Z"),
    electionTerm: Long("14"),
    lastCommittedOpTimeAtElection: { ts: Timestamp({ t: 0, i: 0 }), t: Long("-1") },
    lastSeenOpTimeAtElection: { ts: Timestamp({ t: 1682610067, i: 1 }), t: Long("13") },
    numVotesNeeded: 1,
    priorityAtElection: 2,
    electionTimeoutMillis: Long("10000"),
    newTermStartDate: ISODate("2023-04-27T15:42:37.474Z"),
    wMajorityWriteAvailabilityDate: ISODate("2023-04-27T15:42:37.476Z")
  },
  members: [
    {
      _id: 0,
      name: '<redacted>-1771644787.eu-west-2.elb.amazonaws.com:27017',
      health: 1,
      state: 1,
      stateStr: 'PRIMARY',
      uptime: 73914,
      optime: { ts: Timestamp({ t: 1682684054, i: 1 }), t: Long("14") },
      optimeDate: ISODate("2023-04-28T12:14:14.000Z"),
      lastAppliedWallTime: ISODate("2023-04-28T12:14:14.204Z"),
      lastDurableWallTime: ISODate("2023-04-28T12:14:14.204Z"),
      syncSourceHost: '',
      syncSourceId: -1,
      infoMessage: '',
      electionTime: Timestamp({ t: 1682610157, i: 1 }),
      electionDate: ISODate("2023-04-27T15:42:37.000Z"),
      configVersion: 6,
      configTerm: 14,
      self: true,
      lastHeartbeatMessage: ''
    }
  ],
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1682684054, i: 1 }),
    signature: {
      hash: Binary(Buffer.from("535fbf62d30588efa00587e3fd4f1a56f9f41eba", "hex"), 0),
      keyId: Long("7226631497447374854")
    }
  },
  operationTime: Timestamp({ t: 1682684054, i: 1 })
}

I would really love help understanding what is happening here.

My concern is that we are pivoting away from Bitnami MongoDB to the Percona operator, and we don’t want to have to update a bunch of applications that already connect to MongoDB just to add this extra directConnection option if we don’t have to. It seems like I should not need it (open to being wrong about this), since there is apparently a way to make the member name point directly to the load balancer via clusterServiceDNSMode=External.
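
For what it’s worth, logging the driver’s topology events makes the failure mode visible; a rough sketch (placeholder URI), assuming the default clusterServiceDNSMode:

const { MongoClient } = require('mongodb');

// Placeholder URI pointing at the external load balancer.
const client = new MongoClient('mongodb://user:pass@<load-balancer-host>:27017/?replicaSet=rs0');

// The driver replaces the seed host with the member names reported in the
// replica set config; with the default clusterServiceDNSMode those are
// *.svc.cluster.local names, which my machine cannot resolve (EAI_AGAIN).
client.on('serverDescriptionChanged', (event) => {
  console.log(event.address, '->', event.newDescription.type);
});

client.connect().catch((err) => console.error(err.message));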

You can see from the config in my OP (under values.yaml) that it creates both a ClusterIP and a LoadBalancer. Is that expected?

NAME                                  READY   STATUS    RESTARTS   AGE
pod/psmdb-db-rs0-0                    3/3     Running   0          3m47s
pod/psmdb-operator-584fc857f8-qxsj9   1/1     Running   0          28h

NAME                     TYPE           CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)           AGE
service/psmdb-db-rs0     ClusterIP      None             <none>                                                                    27017/TCP         28h
service/psmdb-db-rs0-0   LoadBalancer   172redacted138   <redacted>-redacted.eu-west-2.elb.amazonaws.com   27017:30608/TCP   28h
service/psmdb-metrics    ClusterIP      172.20.130.58    <none>                                                                    9216/TCP          4d

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/psmdb-operator   1/1     1            1           28h

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/psmdb-operator-584fc857f8   1         1         1       28h

NAME                            READY   AGE
statefulset.apps/psmdb-db-rs0   1/1     22h

Yeah, it is okay. The ClusterIP service is headless. It is always there.

Okay, I think we have decided to update our apps to include directConnection: true, as we’ve seen this mentioned elsewhere. This means we’ve also reverted to the default clusterServiceDNSMode, which is probably good, as we were unsure in the first place what side effects changing it would have; it’s briefly mentioned in the documentation here: Exposing the cluster - Percona Operator for MongoDB

You should be careful with the clusterServiceDNSMode=External variant. Using IP addresses instead of DNS hostnames is discouraged in MongoDB. IP addresses make configuration changes and recovery more complicated. Also, they are particularly problematic in scenarios where IP addresses change (i.e., deleting and recreating the cluster).

So finally, I think we are just confused about whether we ought to take any action given your comment below:

At the same time, I’m a bit confused why, if you use a LoadBalancer, you connect to mongodb.svc.cluster.local, which is a ClusterIP entity.

Hey!
Going deeper into the ClusterIP comment that I made:
If you expose the database through the LoadBalancer and then connect to it through the ClusterIP, I’m not sure you really need the LoadBalancer.

Okay,

Just to clarify, as I’m a little confused: the Kubernetes resources and the MongoDB configuration were set up from the values.yaml above via the Percona MongoDB operator. I have not made any extra configuration changes or added any additional resources (other than service/psmdb-metrics).

Most likely you are using the Operator in non-sharding (aka StatefulSet) mode.

Then you have to provide the name of the replica set in the connection string, and the StatefulSet handles which pod the connection should go to.

You never connect directly to a pod of the StatefulSet.

Sorry, how should I reconfigure my setup so that I don’t need to connect directly to a pod (i.e., don’t need to use directConnection=true)?

@Kay_Khan something like this:

mongo "mongodb+srv://databaseAdmin:databaseAdminPassword@my-cluster-name-rs0.<namespace name>.svc.cluster.local/admin?replicaSet=rs0&ssl=false"

Note the +srv piece.

Also, you can just list the endpoints of all your replica set nodes in the connection string:

mongodb://[username:password@]host1[:port1][,...hostN[:portN]][/[defaultauthdb][?options]]
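
For example, with every member exposed (placeholder hosts):

mongodb://databaseAdmin:databaseAdminPassword@node-0.example.com:27017,node-1.example.com:27017,node-2.example.com:27017/admin?replicaSet=rs0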

This does not work when I substitute my-cluster-name-rs0.<namespace name>.svc.cluster.local with my internal NLB address.

In MongoDB Compass:

mongodb+srv://removed:removed@k8s-mongodb-psmdbdbi-1e0cbbc5c7-.elb.us-east-2.amazonaws.com/admin?replicaSet=rs0&ssl=false

I am trying to connect to an internal MongoDB instance from my local machine via VPN:

Local Machine → VPN Into VPC → NLB → Kubernetes Cluster → Percona Mongodb.
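
To confirm where discovery breaks from my machine, a quick sketch using the member hostname from rs.status() above:

const dns = require('node:dns/promises');

// mongodb+srv:// additionally requires an SRV record (_mongodb._tcp.<host>),
// which a bare NLB hostname is unlikely to have; a plain lookup of the member
// name shows whether discovery can resolve it from here at all.
dns.lookup('psmdb-db-rs0-0.psmdb-db-rs0.mongodb.svc.cluster.local')
  .then((addr) => console.log('resolves to', addr.address))
  .catch((err) => console.error('lookup failed:', err.code)); // e.g. ENOTFOUND / EAI_AGAIN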