Replica set not available, on and off

Hi all,

I have a problem that I have been facing for about 2 months: my customized MongoDB server SOMETIMES starts without the replica set while hosted in OpenShift (both 3.11 and 4). From the logs it looks like the server is looking for the replica set and attempts to set it up again, but it fails. These are the most important log entries:

{"t":{"$date":"2024-01-30T08:52:25.330+00:00"},"s":"I",  "c":"REPL",     "id":6005300, "ctx":"initandlisten","msg":"Starting up replica set aware services"}
{"t":{"$date":"2024-01-30T08:52:25.331+00:00"},"s":"I",  "c":"REPL",     "id":4280500, "ctx":"initandlisten","msg":"Attempting to create internal replication collections"}
{"t":{"$date":"2024-01-30T08:52:25.331+00:00"},"s":"I",  "c":"-",        "id":4939300, "ctx":"monitoring-keys-for-HMAC","msg":"Failed to refresh key cache","attr":{"error":"ReadConcernMajorityNotAvailableYet: Read concern majority reads are currently not possible.","nextWakeupMillis":200}}
{"t":{"$date":"2024-01-30T08:52:25.334+00:00"},"s":"I",  "c":"REPL",     "id":4280501, "ctx":"initandlisten","msg":"Attempting to load local voted for document"}
{"t":{"$date":"2024-01-30T08:52:25.335+00:00"},"s":"I",  "c":"REPL",     "id":4280502, "ctx":"initandlisten","msg":"Searching for local Rollback ID document"}
{"t":{"$date":"2024-01-30T08:52:25.336+00:00"},"s":"I",  "c":"REPL",     "id":21529,   "ctx":"initandlisten","msg":"Initializing rollback ID","attr":{"rbid":14}}
{"t":{"$date":"2024-01-30T08:52:25.336+00:00"},"s":"I",  "c":"REPL",     "id":4280504, "ctx":"initandlisten","msg":"Cleaning up any partially applied oplog batches & reading last op from oplog"}
{"t":{"$date":"2024-01-30T08:52:25.337+00:00"},"s":"I",  "c":"CONTROL",  "id":6608200, "ctx":"initandlisten","msg":"Initializing cluster server parameters from disk"}
{"t":{"$date":"2024-01-30T08:52:25.338+00:00"},"s":"I",  "c":"REPL",     "id":21544,   "ctx":"initandlisten","msg":"Recovering from stable timestamp","attr":{"stableTimestamp":{"$timestamp":{"t":1706604728,"i":2}},"topOfOplog":{"ts":{"$timestamp":{"t":1706604728,"i":2}},"t":41},"appliedThrough":{"ts":{"$timestamp":{"t":0,"i":0}},"t":-1}}}
{"t":{"$date":"2024-01-30T08:52:25.338+00:00"},"s":"I",  "c":"REPL",     "id":21545,   "ctx":"initandlisten","msg":"Starting recovery oplog application at the stable timestamp","attr":{"stableTimestamp":{"$timestamp":{"t":1706604728,"i":2}}}}
{"t":{"$date":"2024-01-30T08:52:25.338+00:00"},"s":"I",  "c":"REPL",     "id":5466604, "ctx":"initandlisten","msg":"Start point for recovery oplog application exists in oplog. No adjustment necessary","attr":{"startPoint":{"$timestamp":{"t":1706604728,"i":2}}}}
{"t":{"$date":"2024-01-30T08:52:25.338+00:00"},"s":"I",  "c":"REPL",     "id":21549,   "ctx":"initandlisten","msg":"No oplog entries to apply for recovery. Start point is at the top of the oplog"}
{"t":{"$date":"2024-01-30T08:52:25.338+00:00"},"s":"I",  "c":"REPL",     "id":4280505, "ctx":"initandlisten","msg":"Creating any necessary TenantMigrationAccessBlockers for unfinished migrations"}
{"t":{"$date":"2024-01-30T08:52:25.339+00:00"},"s":"I",  "c":"TENANT_M", "id":6531504, "ctx":"initandlisten","msg":"Clearing serverless operation lock registry on shutdown","attr":{"ns":null}}
{"t":{"$date":"2024-01-30T08:52:25.340+00:00"},"s":"I",  "c":"REPL",     "id":4280506, "ctx":"initandlisten","msg":"Reconstructing prepared transactions"}
{"t":{"$date":"2024-01-30T08:52:25.341+00:00"},"s":"I",  "c":"REPL",     "id":4280507, "ctx":"initandlisten","msg":"Loaded replica set config, scheduled callback to set local config"}
{"t":{"$date":"2024-01-30T08:52:25.341+00:00"},"s":"I",  "c":"REPL",     "id":4280508, "ctx":"ReplCoord-0","msg":"Attempting to set local replica set config; validating config for startup"}
{"t":{"$date":"2024-01-30T08:52:25.341+00:00"},"s":"I",  "c":"STORAGE",  "id":22262,   "ctx":"initandlisten","msg":"Timestamp monitor starting"}
{"t":{"$date":"2024-01-30T08:52:25.342+00:00"},"s":"I",  "c":"QUERY",    "id":7080100, "ctx":"ChangeStreamExpiredPreImagesRemover","msg":"Starting Change Stream Expired Pre-images Remover thread"}
{"t":{"$date":"2024-01-30T08:52:25.343+00:00"},"s":"I",  "c":"CONTROL",  "id":20710,   "ctx":"LogicalSessionCacheRefresh","msg":"Failed to refresh session cache, will try again at the next refresh interval","attr":{"error":"NotYetInitialized: Replication has not yet been configured"}}
{"t":{"$date":"2024-01-30T08:52:25.343+00:00"},"s":"I",  "c":"CONTROL",  "id":20711,   "ctx":"LogicalSessionCacheReap","msg":"Failed to reap transaction table","attr":{"error":"NotYetInitialized: Replication has not yet been configured"}}
{"t":{"$date":"2024-01-30T08:52:25.343+00:00"},"s":"I",  "c":"SHARDING", "id":7012500, "ctx":"QueryAnalysisConfigurationsRefresher","msg":"Failed to refresh query analysis configurations, will try again at the next interval","attr":{"error":"PrimarySteppedDown: No primary exists currently"}}
"t":{"$date":"2024-01-30T08:52:25.532+00:00"},"s":"I",  "c":"-",        "id":4939300, "ctx":"monitoring-keys-for-HMAC","msg":"Failed to refresh key cache","attr":{"error":"ReadConcernMajorityNotAvailableYet: Read concern majority reads are currently not possible.","nextWakeupMillis":400}}

The funny thing is that if I kill the pod a few times, it eventually starts properly. I thought it might be caused by a difference in the signals OpenShift sends to the pod while killing it, but from my observations it does not matter whether I send SIGKILL or SIGTERM, so for me that is a dead end.
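
For reference, the difference between a good and a bad start can be seen from inside the pod with standard shell helpers (just a rough sketch, nothing custom):

mongosh --quiet --eval "db.hello().isWritablePrimary"   # true on a healthy pod, false on the broken ones
mongosh --quiet --eval "rs.status().myState"            # 1 (PRIMARY) on a healthy pod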

Please see my full configuration:
Dockerfile:

FROM docker.io/percona/percona-server-mongodb:7.0.4

EXPOSE 27017

USER root

RUN yum update -y && yum upgrade -y
RUN yum install -y cyrus-sasl cyrus-sasl-plain
RUN yum install openldap-clients -y 

RUN rm -f etc/mongod.conf
ENV MONGO_BIND_IP="0.0.0.0"
ENV MONGO_INIT_ENABLED="true"

COPY entrypoint_rs.sh / 
COPY mongodb_keyfile /etc/mongodb/keys/
COPY script.sh . 
COPY file.js . 
COPY admessage.js . 
COPY saslauthd.conf /etc 
COPY mongod.conf /etc 
COPY mongodb.conf /etc/sasl2
COPY custommongod /usr/local/bin

RUN chmod +rx  /entrypoint_rs.sh && chown -R mongodb:root /entrypoint_rs.sh
RUN chmod +x /entrypoint_rs.sh && mkdir -p /etc/mongodb/keys/

RUN chmod 0400 /etc/mongodb/keys/mongodb_keyfile  && chown -R mongodb:root /etc/mongodb/keys/mongodb_keyfile

RUN chmod +x script.sh \
    && chown -R mongodb:root data 

RUN chmod +x file.js
RUN chmod +x admessage.js

RUN sed -i -e s/^MECH=pam/MECH=ldap/g /etc/sysconfig/saslauthd 

RUN chmod 644 /etc/saslauthd.conf && chown mongodb:root /etc/saslauthd.conf

RUN chmod 755 run && chown mongodb:root run \
&& chmod 755 etc && chown mongodb:root etc


RUN touch /var/log/audit.json \
    && chmod 777 /var/log/audit.json && chown mongodb:root /var/log/audit.json

RUN chmod +x addusers.sh
RUN chmod +x newjoiner.sh

RUN chmod 755 /usr/local/bin/custommongod
RUN chown mongodb:root /usr/local/bin/custommongod

RUN touch /var/log/forklog \
    && chmod 777 /var/log/forklog && chown mongodb:root /var/log/forklog

USER mongodb

ENTRYPOINT ["/entrypoint_rs.sh"]
CMD ["custommongod"]

Entrypoint:

#!/bin/bash

echo "🔥entrypoint_rs.sh"
FILE=/data/db/inittialized

if [ "$MONGO_INIT_ENABLED" = "false" ] ||  [ -f "$FILE" ]; then
  echo $MONGO_INIT_ENABLED
else
  echo "🔥🔥🔥init enabled, initializing"
  mongod --wiredTigerCacheSizeGB 0.25 --replSet rs0 --bind_ip  "$MONGO_BIND_IP" &
  #--config /etc/mongod.conf &
  bash /script.sh 
  mongosh admin file.js
  mongod --shutdown;
  echo "Second run🔥🔥🔥🔥🔥🔥🔥🔥"
  mongod --wiredTigerCacheSizeGB 0.25 --replSet rs0 --bind_ip  "$MONGO_BIND_IP" --auth --keyFile /etc/mongodb/keys/mongodb_keyfile &
  sleep 10
  mongosh percona admessage.js -u admin -p admin --authenticationDatabase admin
  mongod --shutdown;
  echo "Third run🔥🔥"
  mongod --wiredTigerCacheSizeGB 0.25 --replSet rs0 --bind_ip  "$MONGO_BIND_IP" --setParameter authenticationMechanisms=PLAIN,SCRAM-SHA-1 --auth --keyFile /etc/mongodb/keys/mongodb_keyfile &
  sleep 10
  mongod --shutdown;
  export MONGO_INIT_ENABLED=false
  touch /data/db/inittialized
  echo "🔥🔥🔥🔥Setup completed"
fi

echo "🔥🔥Running MongoDB"
saslauthd -a ldap 
# mongod --wiredTigerCacheSizeGB 0.25 --replSet rs0 --bind_ip  "$MONGO_BIND_IP" --setParameter authenticationMechanisms=PLAIN,SCRAM-SHA-1 --auth --keyFile /etc/mongodb/keys/mongodb_keyfile --dbpath data/db --auditDestination file --auditFormat JSON --auditPath /var/log/audit.json

exec "$@"

script.sh

#!/bin/bash

echo "EXEC: mu script. 🚀🚀🚀🚀🚀 "
check_connection() {
    mongosh --eval "print('Connected')" 2>/dev/null
    return $?
}

while true; do
    echo "Awaiting 5s to ensure mongod is up"
    sleep 5
    
    if check_connection; then
        echo "Connected🔥🔥🔥🔥"
        mongosh --eval "rs.initiate(
 {
 _id: 'rs0',
 members: [
 { _id: 0, host: 'mongodb-server:27017'},
 ]
 });"
        mongosh --eval "db.getSiblingDB('admin').createRole({ 'role': 'pbmAnyAction',
      'privileges': [
         { 'resource': { 'anyResource': true },
           'actions': [ 'anyAction' ]
         }
      ],
      'roles': []
   });"
        mongosh --eval "db.getSiblingDB('admin').createUser({user: 'pbmuser',
       'pwd': 'secretpwd',
       'roles' : [
          { 'db' : 'admin', 'role' : 'readWrite', 'collection': '' },
          { 'db' : 'admin', 'role' : 'backup' },
          { 'db' : 'admin', 'role' : 'clusterMonitor' },
          { 'db' : 'admin', 'role' : 'restore' },
          { 'db' : 'admin', 'role' : 'pbmAnyAction' }
       ]
    });"
        
        break
    else
        echo "Awaiting 5s"
    fi
done
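
Note that the config created above has a single member, mongodb-server:27017, so on every startup mongod must be able to resolve that name to itself in order to find itself in the config and step up as primary. A quick sanity check from inside the pod (a sketch; mongodb-server is simply the host name used in rs.initiate() above):

getent hosts mongodb-server
mongosh --quiet --eval "rs.conf().members.map(m => m.host)"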

file.js

db.createUser({'user': 'admin', 'pwd': 'admin', 'roles': ['root']})

admessage.js

db.test.insert({"message": "Active Directory user success!"})

I read multiple threads on this matter, and the only possible solution I found was disabling fork in processManagement, but it is disabled by default.
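
For completeness, this is what that setting looks like in mongod.conf (fork already defaults to false, so setting it explicitly should change nothing):

processManagement:
  fork: false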

The custom custommongod command is:

exec mongod --wiredTigerCacheSizeGB 0.25 --replSet rs0 --bind_ip  "$MONGO_BIND_IP" --setParameter authenticationMechanisms=PLAIN,SCRAM-SHA-1 --auth --keyFile /etc/mongodb/keys/mongodb_keyfile --dbpath data/db --auditDestination file --auditFormat JSON --auditPath /var/log/audit.json

EDIT 1
I think it is important to mention that whenever I try to initiate the replica set while it is in this broken state (replica set not available), I get this message:

test> rs.initiate()
MongoServerError: already initialized
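
Presumably that means the old replica set config document is still stored in the local database; it can be inspected with standard commands, e.g.:

test> db.getSiblingDB('local').system.replset.findOne()
test> rs.status()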

Hi @Iliterallyneedhelp,
Thank you for the detailed description of the issue, but I would like to ask a few more questions:
The log snippet you provided does not contain any critical error messages. There are some messages like “Replication has not yet been configured” or “No primary exists currently”, but they also promise to retry: “will try again at the next refresh interval”.

  • Are those the last records in the log, or do they keep repeating?
  • It is not clear what you mean by “replica set is not available”. How do you check whether it is available?
  • From the log messages it looks like the node probably cannot connect to the other members of the replica set. Have you checked whether connectivity issues between replica set members could be the cause? (A quick check is sketched below.)
  • You said you run a customized MongoDB server. What kind of customization is it? Have you tried to do the same with the original (non-customized) version?
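
For example, a basic connectivity check from inside the pod could look like this (just a sketch; adjust the host name to whatever your replica set configuration uses):

mongosh "mongodb://mongodb-server:27017" --quiet --eval "db.adminCommand({ ping: 1 })"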