Is exposing a mongo replica set with NodePort working?

Because of an IP limitation, I cannot expose the replica set with LoadBalancer, so I decided to go with NodePort. I'm running on GKE 1.18 with Percona operator 1.9.0.

I got the CR here:

And exposed only the replica set, as follows (sharding disabled):

replsets:
  - name: rs0
    expose:
      enabled: true
      exposeType: NodePort

sharding:
  enabled: false
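
For reference, with this config the operator creates one NodePort service per replset member. Something like this lists them and shows the allocated ports (namespace and labels as the operator sets them in my cluster):

# Per-pod NodePort services the operator creates for rs0
kubectl -n percona-mongodb get svc \
  -l app.kubernetes.io/replset=rs0,app.kubernetes.io/component=external-service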

Observations:

  • 3 pods are created but restart constantly with a failed liveness check: unreachable server error.
  • Operator logs show "msg":"failed to reconcile cluster","Request.Namespace":"percona-mongodb","Request.Name":"my-cluster-name","replset":"rs0","error":"dial:: failed to ping mongo: context deadline exceeded".
  • The operator seems to use <node_internal_ip>:<exposed_node_port> as the host to connect to the mongo instances, which is somewhat expected, although I'd expect it to use the node's external IP.

I tried to connect to one of the mongo pods with:
kubectl exec --stdin --tty my-cluster-name-rs0-0 -- /bin/bash

Then tried to connect to the mongod running on the same pod with:

  • mongo --host localhost --port 27017 -u <username> -p <password> → This worked.
  • mongo --host <service_internal_ip> --port 27017 -u <username> -p <password> → This also worked.
  • mongo --host <cluster_internal_ip> --port <exposed_node_port> -u <username> -p <password> → This DIDN'T work. I got a connection refused error.

The first two connections worked as expected, but the last one didn't. To be clear, <cluster_internal_ip> is the pod's host IP (the same as shown by kubectl get nodes -o wide) and <exposed_node_port> is the nodePort field of the pod's service.
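
Concretely, this is roughly how I pulled those two values before running the last command (pod/service names as created by the operator):

# Host (node) IP the pod is running on, same value as in kubectl get nodes -o wide
kubectl -n percona-mongodb get pod my-cluster-name-rs0-0 -o jsonpath='{.status.hostIP}'

# NodePort allocated on that member's service
kubectl -n percona-mongodb get svc my-cluster-name-rs0-0 -o jsonpath='{.spec.ports[0].nodePort}'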

Out of curiosity, I tried to run a single Percona server:

kubectl run --stdin --tty test-mongo --image percona/percona-server-mongodb:4.4.6-8 --port 27017 -- /bin/bash

# Inside test-mongo pod
mongod --bind_ip_all --port 27017

kubectl expose pod test-mongo --type="NodePort" --port 27017

Then I tried all the connections above again. This time all of them worked as expected, including the one using the node's IP and node port.

So I don't know why the mongo instances in the replica set refuse the NodePort connection while a standalone mongo instance does not. Anyway, because of the connectivity issue, the cluster fails to reconcile.
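
For reference, the failure is visible from outside the pods too; roughly like this (operator deployment name as in the default bundle):

# Cluster status as reported by the operator
kubectl -n percona-mongodb get psmdb my-cluster-name

# Operator log containing the "failed to reconcile cluster" / "failed to ping mongo" errors
kubectl -n percona-mongodb logs deploy/percona-server-mongodb-operator --tail=50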

Have you ever experienced anything similar? What could be the cause of this?

1 Like

Some more information here: if I manually expose one of the replset member pods like this, the connection works:

k expose pod my-cluster-name-rs0-0 --type=NodePort --name=xtra-svc-0 --target-port 27017

But using the Percona-created service, the connection fails, even though there isn't any apparent difference between these services. Mind-blowing.
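
For what it's worth, this is roughly how I compared the two services (service names as above; the diff syntax needs bash):

# Diff the operator-created per-pod service against the manually exposed one
diff \
  <(kubectl -n percona-mongodb get svc my-cluster-name-rs0-0 -o yaml) \
  <(kubectl -n percona-mongodb get svc xtra-svc-0 -o yaml)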

1 Like

I believe you are hitting this bug: [K8SPSMDB-511] NodePort Port changes every 20 seconds when exposeType to NodePort and enabled set to true - Percona JIRA

Seems to be the same error?

2 Likes

It is the same error message.

I also noticed that the bug you mentioned had this PR merged. However, I am using the operator built from the main branch, not the earlier 1.9.0 tag, so that PR is actually present in my deployment.

But the ticket is still in progress, right? I can mention this post in the ticket; there is some extra information here.

1 Like

Oh the PR is kinda unrelated

1 Like

Ah okay, it's just like in the JIRA ticket: the NodePort changes so quickly that I cannot use any of them.

1 Like

I found something here: in the reconcile loop, the operator compares the last config hash and the service's metadata with the current spec to decide whether it should update the service.

However, in some cases the NodePort service, after being created, gets populated with additional annotations. In my case it's field.cattle.io/publicEndpoints, which contains the node's public IP and the nodePort. As a consequence, the service's metadata no longer matches the spec, so the service gets updated (while it shouldn't be), and the port changes every time the operator fires a reconcile loop.

And because of the constantly changing port, connection errors are inevitable.
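
This is easy to observe on the service itself. Something like the commands below shows the annotation Rancher injects, the operator's stored config (the percona.com/last-config-hash annotation appears to be base64-encoded JSON of the spec it rendered), and the port jumping between reconciles:

# Annotation injected by Rancher after the service is created
kubectl -n percona-mongodb get svc my-cluster-name-rs0-0 \
  -o jsonpath='{.metadata.annotations.field\.cattle\.io/publicEndpoints}'

# Spec the operator rendered last time, stored base64-encoded in its own annotation
kubectl -n percona-mongodb get svc my-cluster-name-rs0-0 \
  -o jsonpath='{.metadata.annotations.percona\.com/last-config-hash}' | base64 -d

# Watch the nodePort change on each reconcile
kubectl -n percona-mongodb get svc my-cluster-name-rs0-0 -w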

1 Like

Wow @vhphan, thank you for taking the initiative and investigating the issue. AFAIU, that annotation is from Rancher, but you're using GKE. Any idea why that annotation is appended to the service on GKE?

BTW, that’s probably why I couldn’t reproduce the issue. I was trying on GKE.

2 Likes

I believe it's because my GKE cluster was provisioned by a third-party job in my company, and they somehow mixed some Rancher scheduler in there. But anyway, I think comparing annotations like that leaves a potential bug for the future, because extra annotations added by a scheduler aren't rare.

1 Like

May I offer a discussion about a fix?

1 Like

@vhphan if you add this annotation to the CR and apply it, will it work normally?

I assume it should be under spec.replsets.[].expose.serviceAnnotations

This is the expected behavior of a k8s control loop: it monitors the objects and changes them if they don't reflect the desired state.

1 Like

@Sergey_Pronin No, it won't, firstly because of this: [K8SPSMDB-470] ServiceAnnotation and LoadBalancerSourceRanges fields don't propagate to k8s service - Percona JIRA

Secondly, that annotation is calculated and dynamically injected; I cannot know beforehand what the host IP and port will be, so whatever I put in the CR will be replaced.

BTW, the field looks like this:

field.cattle.io/publicEndpoints: '[{"addresses":["10.153.146.15"],"port":32162,"protocol":"TCP","serviceName":"test-percona-operator:my-cluster-name-rs0-0","allNodes":true}]'
1 Like

Ah, it is dynamic. What do you have in mind? We can discuss here or jump into a call.

What we want to watch is essentially what users put in the CR, right? So I suggest, at least for reconciling the service, that we compare against the annotation fields from the CR. I suppose it could be done by comparing the current value with the old value in kubectl.kubernetes.io/last-applied-configuration. :thinking:
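
For example, assuming the CR was created with kubectl apply, the previously applied spec (including annotations) is already stored on the object and can be read back with something like:

# Spec as last applied by the user; only present if the CR was created with kubectl apply
kubectl -n percona-mongodb get psmdb my-cluster-name \
  -o jsonpath='{.metadata.annotations.kubectl\.kubernetes\.io/last-applied-configuration}'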

1 Like

Are you injecting the annotation into the Service? If I add the annotation to the service, it is not deleted.

1 Like

No, I don't add any annotations to the service myself. What are you suggesting?

1 Like

Nice find, vhphan!

I would guess that this is the same issue I am experiencing, as I use Rancher to manage my local private cloud servers.

I would imagine other vendors might also add dynamic annotations.

2 Likes

@sw34 @vhphan could you please tell me where these annotations are added? Is it a service annotation? Are they added to the CR?

1 Like

@Sergey_Pronin The annotation is added to the service's annotations, like so:

metadata:
  annotations:
    field.cattle.io/publicEndpoints: '[{"addresses":["x.x.x.x"],"port":30946,"protocol":"TCP","serviceName":"percona-mongodb:my-cluster-name-rs0-0","allNodes":true}]'
    percona.com/last-config-hash: eyJwb3J0cyI6W3sibmFtZSI6Im1vbmdvZGIiLCJwb3J0IjoyNzAxNywidGFyZ2V0UG9ydCI6MjcwMTd9XSwic2VsZWN0b3IiOnsic3RhdGVmdWxzZXQua3ViZXJuZXRlcy5pby9wb2QtbmFtZSI6Im15LWNsdXN0ZXItbmFtZS1yczAtMCJ9LCJ0eXBlIjoiTm9kZVBvcnQiLCJleHRlcm5hbFRyYWZmaWNQb2xpY3kiOiJMb2NhbCJ9
  creationTimestamp: "2021-08-19T11:26:12Z"
  labels:
    app.kubernetes.io/component: external-service
    app.kubernetes.io/instance: my-cluster-name
    app.kubernetes.io/managed-by: percona-server-mongodb-operator
    app.kubernetes.io/name: percona-server-mongodb
    app.kubernetes.io/part-of: percona-server-mongodb
    app.kubernetes.io/replset: rs0
  name: my-cluster-name-rs0-0
  namespace: percona-mongodb
  ownerReferences:
  - apiVersion: psmdb.percona.com/v1-10-0
    controller: true
    kind: PerconaServerMongoDB
    name: my-cluster-name
    uid: 1011931e-ea87-4693-9e8f-f944e2bd3e3c
  resourceVersion: "1041660"
  selfLink: /api/v1/namespaces/percona-mongodb/services/my-cluster-name-rs0-0
  uid: 6e33efad-840c-469a-920d-e44b3bf28cc0
spec:
  clusterIP: 10.43.8.45
  externalTrafficPolicy: Local
  ports:
  - name: mongodb
    nodePort: 30946
    port: 27017
    protocol: TCP
    targetPort: 27017
  selector:
    statefulset.kubernetes.io/pod-name: my-cluster-name-rs0-0
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}
1 Like

And here is one straight from Rancher. I used your default cr.yaml and just changed mongos from ClusterIP to NodePort.

apiVersion: v1
kind: Service
metadata:
  annotations:
    field.cattle.io/publicEndpoints: '[{"addresses":["192.168.1.85"],"port":32150,"protocol":"TCP","serviceName":"mongo-test:my-cluster-name-mongos","allNodes":true}]'
    percona.com/last-config-hash: eyJwb3J0cyI6W3sibmFtZSI6Im1vbmdvcyIsInBvcnQiOjI3MDE3LCJ0YXJnZXRQb3J0IjoyNzAxN31dLCJzZWxlY3RvciI6eyJhcHAua3ViZXJuZXRlcy5pby9jb21wb25lbnQiOiJtb25nb3MiLCJhcHAua3ViZXJuZXRlcy5pby9pbnN0YW5jZSI6Im15LWNsdXN0ZXItbmFtZSIsImFwcC5rdWJlcm5ldGVzLmlvL21hbmFnZWQtYnkiOiJwZXJjb25hLXNlcnZlci1tb25nb2RiLW9wZXJhdG9yIiwiYXBwLmt1YmVybmV0ZXMuaW8vbmFtZSI6InBlcmNvbmEtc2VydmVyLW1vbmdvZGIiLCJhcHAua3ViZXJuZXRlcy5pby9wYXJ0LW9mIjoicGVyY29uYS1zZXJ2ZXItbW9uZ29kYiJ9LCJ0eXBlIjoiTm9kZVBvcnQiLCJleHRlcm5hbFRyYWZmaWNQb2xpY3kiOiJMb2NhbCJ9
  creationTimestamp: "2021-08-19T12:32:05Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:percona.com/last-config-hash: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"1b10af79-c392-4dd2-b94b-c1c6eb368669"}:
            .: {}
            f:apiVersion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:externalTrafficPolicy: {}
        f:ports:
          .: {}
          k:{"port":27017,"protocol":"TCP"}:
            .: {}
            f:name: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
        f:selector:
          .: {}
          f:app.kubernetes.io/component: {}
          f:app.kubernetes.io/instance: {}
          f:app.kubernetes.io/managed-by: {}
          f:app.kubernetes.io/name: {}
          f:app.kubernetes.io/part-of: {}
        f:sessionAffinity: {}
        f:type: {}
    manager: percona-server-mongodb-operator
    operation: Update
    time: "2021-08-19T12:32:05Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:field.cattle.io/publicEndpoints: {}
    manager: rancher
    operation: Update
    time: "2021-08-19T12:33:09Z"
  name: my-cluster-name-mongos
  namespace: mongo-test
  ownerReferences:
  - apiVersion: psmdb.percona.com/v1-9-0
    controller: true
    kind: PerconaServerMongoDB
    name: my-cluster-name
    uid: 1b10af79-c392-4dd2-b94b-c1c6eb368669
  resourceVersion: "43197086"
  uid: 96445f17-d656-42d7-82d4-49ba89a9e294
spec:
  clusterIP: 10.43.92.5
  clusterIPs:
  - 10.43.92.5
  externalTrafficPolicy: Local
  ports:
  - name: mongos
    nodePort: 32150
    port: 27017
    protocol: TCP
    targetPort: 27017
  selector:
    app.kubernetes.io/component: mongos
    app.kubernetes.io/instance: my-cluster-name
    app.kubernetes.io/managed-by: percona-server-mongodb-operator
    app.kubernetes.io/name: percona-server-mongodb
    app.kubernetes.io/part-of: percona-server-mongodb
  sessionAffinity: None
  type: NodePort
1 Like