Primary replica set pod constantly restarts

Hi everyone, I am having an issue with a Percona Server for MongoDB replica set deployed by the Percona Server for MongoDB Operator. The provisioning completed, but it looks like the primary pod constantly restarts.

kubectl get po                    
NAME                                               READY   STATUS             RESTARTS   AGE
my-cluster-name-rs0-0                              0/1     CrashLoopBackOff   13         57m
my-cluster-name-rs0-1                              1/1     Running            0          56m
my-cluster-name-rs0-2                              1/1     Running            0          56m
percona-server-mongodb-operator-7ff49bcf85-vvv7l   1/1     Running            0          59m

The pod’s logs are full of this:

{"t":{"$date":"2021-07-09T14:49:35.801+00:00"},"s":"I",  "c":"REPL_HB",  "id":23974,   "ctx":"ReplCoord-27","msg":"Heartbeat failed after max retries","attr":{"target":"x.x.x.x:27017","maxHeartbeatRetries":2,"error":{"code":93,"codeName":"InvalidReplicaSetConfig","errmsg":"replica set IDs do not match, ours: 60e85504878f18f4cc284693; remote node's: 60e855651d196b68b3d5a990"}}}

I also see some failed liveness probes on my-cluster-name-rs0-0:

Warning  Unhealthy  59m  kubelet  Liveness probe failed: {"level":"info","msg":"Running Kubernetes liveness check for mongod","time":"2021-07-09T13:55:05Z"}
{"level":"error","msg":"replSetGetStatus returned error command replSetGetStatus requires authentication","time":"2021-07-09T13:55:05Z"}

I followed the standard deployment described in Install Percona Server for MongoDB on Kubernetes and ran it on a GKE cluster. Hope someone can help me out, thanks.
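For completeness, the steps I ran were essentially the ones from that page, roughly this (paths are relative to the operator repository checkout, so treat this as an outline rather than my exact history):

# apply the operator bundle (CRDs, RBAC, operator deployment)
git clone https://github.com/percona/percona-server-mongodb-operator.git
cd percona-server-mongodb-operator
kubectl apply -f deploy/bundle.yaml

# then create the cluster from the custom resource
kubectl apply -f deploy/cr.yaml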


Hello @vhphan ,

I tried to reproduce the issue by setting up a GKE cluster and following this manual, but I could not reproduce it: all pods are healthy and no probes are failing.

Is there anything specific about your cluster or DB custom resource?


Hi, sorry, I forgot to mention that there are some custom changes compared to the original CR:

  • Sharding disabled
  • Replica set exposed via LoadBalancer
  • Node anti-affinity disabled
spec:
  allowUnsafeConfigurations: true
  replsets:
    - name: rs0
      affinity:
        antiAffinityTopologyKey: none
      expose:
        enabled: true
        exposeType: LoadBalancer
  sharding:
    enabled: false

Tried with these changes - it still works for me. What node instance type do you have in GKE?
I suspect it might be something to do with resources.
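If you get a chance, could you also check what the nodes actually report as allocatable and in use? Something along these lines:

# quick look at node usage and allocatable capacity
kubectl top nodes
kubectl describe nodes | grep -A 6 Allocatable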


It’s e2-standard-4 with COS. But I tried to redeploy from scratch several times, and the issue only appears once in a while. Quite rare, actually.

Now I’m thinking it might be a network problem related to my VPC settings. The cluster is in a private corporate network, but the LB is public. From what I see, if I expose the replica set, the members’ addresses will be the LBs’ addresses, right? The operator hasn’t implemented split-horizon DNS yet, so communication between replica set members will go outside of the cluster network?
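I’ll try to confirm that by looking at the per-pod services the operator created and at the hosts the members advertise, roughly like this (just a sketch; the user and secret names are from the default manifests):

# per-pod LoadBalancer services created when expose is enabled
kubectl get svc | grep my-cluster-name-rs0

# host:port each member advertises in the replica set config
kubectl exec my-cluster-name-rs0-1 -- mongo -u clusterAdmin -p "$CLUSTER_ADMIN_PASSWORD" \
  --authenticationDatabase admin --quiet --eval 'rs.conf().members.forEach(function(m){ print(m.host) })'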
