PSMDB 7.0.4 sometimes starts without a replica set when a liveness probe is configured in OpenShift 3 and 4

Hi,

I’m trying to add a liveness probe to my MongoDB deployment in OpenShift. Without the probe, Mongo deploys perfectly every single time. I had to create a single-node replica set because pbm-agent requires one, so I need to run Mongo with a replica set.
I create the replica set like this:

"rs.initiate(
 {
 _id: 'rs0',
 members: [
 { _id: 0, host: '(openshift-service-name):27017'},
 ]
 });"
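
For anyone reproducing this, here is a minimal sketch of running the same initiation so that it is safe to repeat on every pod start. It is not what the chart above does. The helper name `singleNodeConfig` is hypothetical, and relying on `err.codeName` being `NotYetInitialized` is an assumption about the error mongosh throws when no config exists yet.

```javascript
// Build the single-member config used in rs.initiate() above.
function singleNodeConfig(setName, host) {
  return { _id: setName, members: [{ _id: 0, host: host }] };
}

// Runs only inside mongosh, where the `rs` helper exists.
if (typeof rs !== 'undefined') {
  try {
    rs.status(); // throws if the set was never initiated
    print('replica set already initiated, nothing to do');
  } catch (err) {
    // Assumption: an uninitiated node reports codeName "NotYetInitialized".
    if (err.codeName !== 'NotYetInitialized') throw err;
    rs.initiate(singleNodeConfig('rs0', '(openshift-service-name):27017'));
  }
}
```

Any other error (for example, a node that has a config but is not a member of it) is re-thrown on purpose, so the script never overwrites an existing configuration.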

Then I add the liveness probe in the Helm chart:

           livenessProbe:
             exec:
               command: [ "mongosh", "--host",  "localhost", "--port", "27017", "-u", "admin", "-p", "'admin'", "--authenticationDatabase", "'admin'", "--eval", "'db.getSiblingDB(\"admin\").runCommand({ replSetGetStatus: 1 }).ok ? 0 : 2'" ]
             initialDelaySeconds: 70         
             periodSeconds: 60
             failureThreshold: 3
             timeoutSeconds: 20 
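
One thing worth checking in the probe above: an `exec` probe runs the command directly, without a shell, so the inner single quotes reach mongosh as literal characters, and the value of an `--eval` expression does not by itself become the process exit code. Below is a minimal sketch of the same check as a script that sets the exit code explicitly. The file path `/scripts/liveness.js` and the exit code 2 are assumptions for illustration, not part of the original chart.

```javascript
// Map a replSetGetStatus reply (or null when the command failed) to an exit code.
function probeExitCode(status) {
  return status && status.ok === 1 ? 0 : 2;
}

// Runs only inside mongosh, where the global `db` exists.
if (typeof db !== 'undefined') {
  let status = null;
  try {
    status = db.getSiblingDB('admin').runCommand({ replSetGetStatus: 1 });
  } catch (err) {
    // e.g. replication not configured yet, or the node is in REMOVED state
    print('replSetGetStatus failed: ' + err.message);
  }
  quit(probeExitCode(status)); // a non-zero exit marks the probe as failed
}
```

Assuming the script is mounted at that path, the probe command would become something like `[ "mongosh", "--quiet", "--host", "localhost", "--port", "27017", "-u", "admin", "-p", "admin", "--authenticationDatabase", "admin", "--file", "/scripts/liveness.js" ]`, with no nested quoting.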

Interestingly, the liveness check with the configuration above also “corrupts” the database, although the first run always goes normally. However, if I kill the pod, the replica set is gone when it comes back. It’s also confusing that when the pod is killed by the StatefulSet because the liveness probe reached its failure threshold, it starts normally almost every time.

On the other hand, I cannot use a readiness probe at all: the container stays permanently in the not-ready state, no matter what command I provide. Even a plain echo does not work. I’ve tried removing the initial delay, setting it to 10 seconds, widening the timeout, etc.

I believe these are the most important logs:

{"t":{"$date":"2024-01-18T06:15:34.306+00:00"},"s":"I",  "c":"CONTROL",  "id":20711,   "ctx":"LogicalSessionCacheReap","msg":"Failed to reap transaction table","attr":{"error":"NotYetInitialized: Replication has not yet been configured"}}
{"t":{"$date":"2024-01-18T06:15:34.306+00:00"},"s":"I",  "c":"SHARDING", "id":7012500, "ctx":"QueryAnalysisConfigurationsRefresher","msg":"Failed to refresh query analysis configurations, will try again at the next interval","attr":{"error":"PrimarySteppedDown: No primary exists currently"}}
{"t":{"$date":"2024-01-18T06:15:34.307+00:00"},"s":"I",  "c":"NETWORK",  "id":5693100, "ctx":"ReplCoord-0","msg":"Asio socket.set_option failed with std::system_error","attr":{"note":"connect (sync) TCP fast open","option":{"level":6,"name":30,"data":"01 00 00 00"},"error":{"what":"set_option: Protocol not available","message":"Protocol not available","category":"asio.system","value":92}}}
{"t":{"$date":"2024-01-18T06:15:34.466+00:00"},"s":"I",  "c":"-",        "id":4939300, "ctx":"monitoring-keys-for-HMAC","msg":"Failed to refresh key cache","attr":{"error":"ReadConcernMajorityNotAvailableYet: Read concern majority reads are currently not possible.","nextWakeupMillis":400}}
{"t":{"$date":"2024-01-18T06:15:36.800+00:00"},"s":"I",  "c":"REPL",     "id":21394,   "ctx":"ReplCoord-0","msg":"This node is not a member of the config"}
{"t":{"$date":"2024-01-18T06:15:36.800+00:00"},"s":"I",  "c":"REPL",     "id":21358,   "ctx":"ReplCoord-0","msg":"Replica set state transition","attr":{"newState":"REMOVED","oldState":"STARTUP"}}
{"t":{"$date":"2024-01-18T06:16:14.306+00:00"},"s":"I",  "c":"SHARDING", "id":7012500, "ctx":"QueryAnalysisConfigurationsRefresher","msg":"Failed to refresh query analysis configurations, will try again at the next interval","attr":{"error":"PrimarySteppedDown: No primary exists currently"}}
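
To make the state changes easier to spot in a longer log, here is a small sketch that pulls only the “Replica set state transition” entries (log id 21358, as in the lines above) out of the structured mongod output. The function name `replTransitions` is hypothetical, and it assumes one JSON document per line.

```javascript
// Extract replica set state transitions from structured (JSON) mongod log text.
function replTransitions(logText) {
  const transitions = [];
  for (const line of logText.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('{')) continue; // skip non-JSON lines
    let entry;
    try {
      entry = JSON.parse(trimmed);
    } catch {
      continue; // skip truncated or malformed lines
    }
    // 21358 is the id of "Replica set state transition" in the log above.
    if (entry.c === 'REPL' && entry.id === 21358 && entry.attr) {
      transitions.push({
        at: entry.t.$date,
        from: entry.attr.oldState,
        to: entry.attr.newState,
      });
    }
  }
  return transitions;
}
```

Fed the log excerpt above, it would report a single transition from STARTUP to REMOVED, which is the point where the node decides it is not a member of the config.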

I consider this a bug, as it’s really random whether Mongo starts properly.

@Iliterallyneedhelp, the PSMDB operator does not support PSMDB 7 at all. We plan to add it in the next PSMDB operator release.

Hi,
Thanks for your response. So deploying PSMDB on OpenShift/K8s by hand is not supported at all? Do I need to use the PSMDB operator instead?

Anyway, that still doesn’t solve my issue. Could you please advise how I should troubleshoot it?

We support OpenShift, but you can’t use PSMDB v7 with our operator. You can use v6, v5 and v4.4, but not v7. We have official documentation on how to deploy the operator on OpenShift: “Install on OpenShift - Percona Operator for MongoDB”. Try a supported version of PSMDB and let us know the results.

I have some requirements that need to be fulfilled:

  • MongoDB 7
  • LDAP connected with AD
  • Incremental backups (as far as I checked, they are not available in the operator)

I achieved that with two separate containers (PSMDB and PBM agent), modified in Docker and tested locally. Everything works perfectly in Docker except incremental backup restore: it does not work because mongod is not reachable for the PBM agent, which needs to stop the daemon to perform the operation. If you have an idea how I can fix that, I’d appreciate it. Unfortunately, after deployment in OpenShift, the server eventually becomes corrupted until the next reboot.

I do not think I can use the PSMDB operator instead of the modified containers in OpenShift at the moment.

Do we have a timeline for when the new PSMDB operator release that supports version 7 will be out?

Hi @Sumeet_Chaudhari, we will include PSMDB 7 support in the next operator release. I hope we will have it within a month.

Do we know when version 7 will be supported?

Hi @Sumeet_Chaudhari!
I think the release should happen this week or next, so keep an eye on the release notes and announcements.

@Sumeet_Chaudhari please try Operator 1.16, it now supports MongoDB version 7.

Release notes: Percona Operator for MongoDB 1.16.0 (2024-05-24) - Percona Operator for MongoDB

Nice, is there an upgrade guide for both the operator and MongoDB?

Have a look here: Upgrade MongoDB and the Operator - Percona Operator for MongoDB