Percona MongoDB instances don't rejoin the replica set after downtime

dung-tien-nguyen · August 7, 2023, 3:59am

Description:

I am running Percona Server for MongoDB in k8s using the Operator.
After an unexpected full node outage, the pods come back up and are running, but they do not rejoin the replica set and stay in a ghost state.
Does Percona have a solution for this? If not, what is the manual fix? Please help me…

Steps to Reproduce:

Shut down the k8s node, or force-delete every pod in the replica set with kubectl delete --force all_pod_in_rs

Version:

I’m using Operator 1.14.0, MongoDB server 6.0.4-3

Logs:

After exec'ing into a pod and running rs.status(), the output is: MongoServerError: Our replica set config is invalid or we are not a member of it

Expected Result:

An automated solution, or a manual workaround while waiting for a fix on the roadmap

Actual Result:

Additional Information:

Sergey_Pronin · September 1, 2023, 7:54am

Hello @dung-tien-nguyen ,

I was not able to reproduce it with the same versions. I killed the Pods multiple times, and I can still connect and the cluster is healthy.

It is also important to connect to the replica set in the right way, for example:

mongosh "mongodb+srv://clusterAdmin:clusterAdmin123456@my-cluster-name-rs0.default.svc.cluster.local/admin?replicaSet=rs0&ssl=false"
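For anyone assembling this connection string in a script, here is a minimal Python sketch of the same URI format. The helper name `build_srv_uri` and its parameters are hypothetical, not part of any Percona or MongoDB tooling. The values are the ones from the command above, and the function only builds the string without connecting to anything.

```python
from urllib.parse import quote_plus, urlencode


def build_srv_uri(user, password, service_host, replica_set,
                  database="admin", tls=False):
    """Build a mongodb+srv URI that addresses the replica set as a whole.

    Hypothetical helper for illustration; it mirrors the URI shape shown
    in the mongosh command above.
    """
    # Percent-encode credentials so characters such as '@' or ':'
    # cannot break the URI structure.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    # replicaSet names the set the client expects to find;
    # ssl mirrors the ssl=false option used in the thread.
    options = urlencode({
        "replicaSet": replica_set,
        "ssl": "true" if tls else "false",
    })
    return f"mongodb+srv://{credentials}@{service_host}/{database}?{options}"


uri = build_srv_uri(
    "clusterAdmin",
    "clusterAdmin123456",
    "my-cluster-name-rs0.default.svc.cluster.local",
    "rs0",
)
print(uri)
```

The printed string can be passed to mongosh in quotes, as in the command above.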

What do the psmdb object and the Pods show?

kubectl get pods
kubectl get psmdb

dung-tien-nguyen · September 1, 2023, 3:24pm

This error occurs when clusterServiceDNSMode=External is set and the replica set is exposed through a NodePort or LoadBalancer service.
With clusterServiceDNSMode=Internal the problem does not occur. @Sergey_Pronin
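A rough way to check whether a cluster uses the combination described above is sketched below in Python. It inspects a custom resource that has already been loaded as a dict. The function name `exposure_risk` is hypothetical. The field names (`spec.clusterServiceDNSMode`, `spec.replsets[].expose.enabled`, `spec.replsets[].expose.exposeType`) and the `Internal` default are assumptions based on the Operator's cr.yaml layout, so verify them against your Operator version.

```python
def exposure_risk(cr):
    """Return the names of replsets that combine External DNS mode
    with a NodePort or LoadBalancer exposure.

    Hypothetical helper; the field names are assumptions about the
    psmdb custom resource and may differ between Operator versions.
    """
    spec = cr.get("spec", {})
    # Assumption: an unset clusterServiceDNSMode behaves like "Internal".
    if spec.get("clusterServiceDNSMode", "Internal") != "External":
        return []
    risky = []
    for rs in spec.get("replsets", []):
        expose = rs.get("expose", {})
        if expose.get("enabled") and expose.get("exposeType") in (
            "NodePort",
            "LoadBalancer",
        ):
            risky.append(rs.get("name", "<unnamed>"))
    return risky


# Example custom resource trimmed to the fields the check reads.
example_cr = {
    "spec": {
        "clusterServiceDNSMode": "External",
        "replsets": [
            {"name": "rs0",
             "expose": {"enabled": True, "exposeType": "LoadBalancer"}},
        ],
    }
}
print(exposure_risk(example_cr))
```

The dict could come from parsing the output of kubectl get psmdb with -o json using json.loads. An empty list only means this particular combination was not found, not that the cluster is healthy.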

Sergey_Pronin · September 4, 2023, 10:23am

@dung-tien-nguyen you are right, it does not work that way.
The next release of the operator (coming in September) will add a split-horizon feature, so you can keep clusterServiceDNSMode=Internal and still use load balancers. This should address it.

dung-tien-nguyen · September 5, 2023, 3:21am

But with clusterServiceDNSMode=Internal, users accessing from outside the k8s cluster can only connect to individual MongoDB instances, right?
