Description:
We are trying cross-cluster replication in our environment with two different clusters. Because of internal restrictions we used the Istio mesh instead of a load balancer to expose the replica set members.
Since the replication is cross-cluster, we enabled TLS and provisioned the CA certificates manually. We configured a ServiceEntry and the transport-layer settings so the replica set members can reach each other; a rough sketch of what we mean is below.
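For context, the kind of ServiceEntry we configured looks roughly like this. The name, namespace, and hostname below are placeholders rather than our real values, and the sketch assumes the sidecar passes the TLS connection through so that mongod itself terminates TLS with the manually provisioned CA:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: mongodb-remote-member          # placeholder name
  namespace: mongodb                   # placeholder namespace
spec:
  hosts:
    - mongodb-0.example.internal       # placeholder FQDN of the remote replica set member
  location: MESH_EXTERNAL              # the remote member lives outside this cluster
  resolution: DNS
  ports:
    - number: 27015                    # port from the replica set configuration
      name: tls-mongodb
      protocol: TLS                    # passthrough; mongod terminates TLS with the manual CA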
After applying all the changes step by step, the replica set pod in the secondary cluster keeps crashing (only one replica set member is configured in the secondary cluster, while the primary cluster has three).
Steps to Reproduce:
The issue can be reproduced by deploying the operator again.
Version:
mongo: 1.18.0
Logs: kubectl logs mongopod -n --tail=50
{"t":{"$date":"2025-03-18T06:28:26.514+00:00"},"s":"I", "c":"-", "id":4939300, "ctx":"monitoring-keys-for-HMAC","msg":"Failed to refresh key cache","attr":{"error":"ReadConcernMajorityNotAvailableYet: Read concern majority reads are currently not possible.","nextWakeupMillis":7400}}
{"t":{"$date":"2025-03-18T06:28:26.815+00:00"},"s":"I", "c":"NETWORK", "id":51800, "ctx":"conn92","msg":"client metadata","attr":{"remote":"127.0.0.6:33885","client":"conn92","negotiatedCompressors":[],"doc":{"driver":{"name":"mongo-go-driver","version":"1.17.1"},"os":{"type":"linux","architecture":"amd64"},"platform":"go1.22.8","env":{"container":{"orchestrator":"kubernetes"}}}}}
{"t":{"$date":"2025-03-18T06:28:26.826+00:00"},"s":"I", "c":"NETWORK", "id":51800, "ctx":"conn93","msg":"client metadata","attr":{"remote":"127.0.0.6:49163","client":"conn93","negotiatedCompressors":[],"doc":{"driver":{"name":"mongo-go-driver","version":"1.17.1"},"os":{"type":"linux","architecture":"amd64"},"platform":"go1.22.8","env":{"container":{"orchestrator":"kubernetes"}}}}}
{"t":{"$date":"2025-03-18T06:28:27.003+00:00"},"s":"W", "c":"QUERY", "id":23799, "ctx":"ftdc","msg":"Aggregate command executor error","attr":{"error":{"code":26,"codeName":"NamespaceNotFound","errmsg":"Unable to retrieve storageStats in $collStats stage :: caused by :: Collection [local.oplog.rs] not found."},"stats":{},"cmd":{"aggregate":"oplog.rs","cursor":{},"pipeline":[{"$collStats":{"storageStats":{"waitForLock":false,"numericOnly":true}}}],"$db":"local"}}}
Expected Result:
The primary and secondary clusters should be in sync. In the primary's replica set status, the replica pod from the secondary cluster should appear as SECONDARY and healthy.
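For reference, the entry for that member in rs.status() on the primary should look roughly like this (values are illustrative, not taken from our cluster):

  {
    _id: 10,
    name: 'fqdn:27015',
    health: 1,
    state: 2,
    stateStr: 'SECONDARY',
    ...
  }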
Actual Result:
rs.status() on the primary (output truncated):
    {
      _id: 10,
      name: 'fqdn:27015',
      health: 0,
      state: 8,
      stateStr: '(not reachable/healthy)',
      uptime: 0,
      optime: { ts: Timestamp({ t: 0, i: 0 }), t: Long('-1') },
      optimeDurable: { ts: Timestamp({ t: 0, i: 0 }), t: Long('-1') },
      optimeDate: ISODate('1970-01-01T00:00:00.000Z'),
      optimeDurableDate: ISODate('1970-01-01T00:00:00.000Z'),
      lastAppliedWallTime: ISODate('1970-01-01T00:00:00.000Z'),
      lastDurableWallTime: ISODate('1970-01-01T00:00:00.000Z'),
      lastHeartbeat: ISODate('2025-03-18T06:32:07.834Z'),
      lastHeartbeatRecv: ISODate('1970-01-01T00:00:00.000Z'),
      pingMs: Long('0'),
      lastHeartbeatMessage: "Couldn't get a connection within the time limit",
      syncSourceHost: '',
      syncSourceId: -1,
      infoMessage: '',
      configVersion: -1,
      configTerm: -1
    }
  ],
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1742279523, i: 2 }),
    signature: {
      hash: Binary.createFromBase64('tWV/cYSENh+gC788OopzcvEkcQc=', 0),
      keyId: Long('7480054805797273607')