The operator does not update the config until the PVC is removed and recreated

Description:

I have multiple sharded psmdb clusters for staging and production that need to be replicated to a DR site. I started with the staging clusters, but I ran into a strange issue that caused data loss: when I update the config on the main site with the externalNodes IPs, the operator does not push the new config to the pods, and I keep seeing log entries for old IPs that no longer exist (or existed previously but were removed after I applied the new configuration). The only way to get the new config applied is to delete the PVCs and the pods; the new pods then come up with the new config, but removing the PVCs also removes the DB.

Steps to Reproduce:


Deploy a normal psmdb cluster and add some DBs. Then update the config: set clusterServiceDNSMode: External, add the externalNodes IPs for the DR site, and apply it. If you manage to connect both clusters :smiley: then change the DR IPs and re-apply the config again.
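For reference, a minimal sketch of the change that triggers it for me (the IP below is a placeholder, not one of my real addresses; the full configs are further down):

clusterServiceDNSMode: External
replsets:
  rs0:
    externalNodes:
      - host: 10.0.0.1      # placeholder DR node IP; later replaced with a new address
        priority: 0
        votes: 0

After the DR services get new IPs and I re-apply the manifest with the updated host values, the pods keep using the old list.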

Version:

percona-server-mongodb:4.4.16-16
crVersion: 1.14.0

Logs:

The IPs below do not exist in the cluster anymore; they did exist, but the service IPs have changed, and mongod is still trying to connect to the old IPs instead of picking up the new ones.

{"t":{"$date":"2023-07-12T22:00:13.642+00:00"},"s":"D2", "c":"ASIO", "id":4646302, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Finished request","attr":{"requestId":32,"status":{"code":6,"codeName":"HostUnreachable","errmsg":"Error connecting to 10.10.0.181:27017 :: caused by :: No route to host"}}}
{"t":{"$date":"2023-07-12T22:00:13.642+00:00"},"s":"D2", "c":"ASIO", "id":22597, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Request finished with response","attr":{"requestId":32,"isOK":false,"response":"HostUnreachable: Error connecting to 10.10.0.181:27017 :: caused by :: No route to host"}}
{"t":{"$date":"2023-07-12T22:00:13.642+00:00"},"s":"D3", "c":"EXECUTOR", "id":22608, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Received remote response","attr":{"response":"HostUnreachable: Error connecting to 10.10.0.181:27017 :: caused by :: No route to host"}}
{"t":{"$date":"2023-07-12T22:00:13.642+00:00"},"s":"I", "c":"-", "id":4333222, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"RSM received failed isMaster","attr":{"host":"10.10.0.181:27017","error":"HostUnreachable: Error connecting to 10.10.0.181:27017 :: caused by :: No route to host","replicaSet":"cfg","isMasterReply":"{}"}}
Expected Result:

The config gets updated with the changes I made.

Actual Result:

The config is cached: the old IPs are still there and do not get updated unless we delete the PVCs, after which we see the new configuration.

main site config:

clusterServiceDNSMode: External
replsets:
  rs0:
    externalNodes:
      - host: 10.11.10.225
        priority: 0
        votes: 0
      - host: 10.11.10.227
        priority: 0
        votes: 0
      - host: 10.11.10.229
        priority: 0
        votes: 0
    configuration: |
      systemLog:
        verbosity: 100
    expose:
      enabled: true
      exposeType: LoadBalancer
      serviceAnnotations:
        service.beta.kubernetes.io/oci-load-balancer-shape: flexible
        service.beta.kubernetes.io/oci-load-balancer-shape-flex-min: "10"
        service.beta.kubernetes.io/oci-load-balancer-shape-flex-max: "10"
        service.beta.kubernetes.io/oci-load-balancer-internal: "true"
        service.beta.kubernetes.io/oci-load-balancer-security-list-management-mode: None
        service.beta.kubernetes.io/oci-load-balancer-subnet1: x.x.x.x…x.x.x.
        oci.oraclecloud.com/oci-network-security-groups: x.x.x.x.x.x.x.

sharding:
  configrs:
    externalNodes:
      - host: 10.11.10.226
        priority: 0
        votes: 0
      - host: 10.11.10.228
        priority: 0
        votes: 0
      - host: 10.11.10.230
        priority: 0
        votes: 0
    configuration: |
      systemLog:
        verbosity: 100
    expose:
      enabled: true
      exposeType: LoadBalancer
      serviceAnnotations:
        service.beta.kubernetes.io/oci-load-balancer-shape: flexible
        service.beta.kubernetes.io/oci-load-balancer-shape-flex-min: "10"
        service.beta.kubernetes.io/oci-load-balancer-shape-flex-max: "10"
        service.beta.kubernetes.io/oci-load-balancer-internal: "true"
        service.beta.kubernetes.io/oci-load-balancer-security-list-management-mode: None
        service.beta.kubernetes.io/oci-load-balancer-subnet1: x.x.x.xx.x.xx
        oci.oraclecloud.com/oci-network-security-groups: x.x.x.x.x.x

DR site config:

unmanaged: true
updateStrategy: OnDelete
clusterServiceDNSMode: External
backup:
  enabled: false
replsets:
  rs0:
    externalNodes:
    volumeSpec:
      pvc:
        resources:
          requests:
            storage: 600Gi
    configuration: |
      systemLog:
        verbosity: 100
    nonvoting:
      volumeSpec:
        pvc:
          resources:
            requests:
              storage: 60Gi
    expose:
      enabled: true
      exposeType: LoadBalancer
      serviceAnnotations:
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-name: x.x.x.x
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-address-type: intranet
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-instance-charge-type: PayByCLCU
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-resource-group-id: x.x.x.x
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-vswitch-id: x.x.x.x
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-security-group-ids: x.x.x.x.x
sharding:
  configrs:
    volumeSpec:
      pvc:
        resources:
          requests:
            storage: 60Gi
    configuration: |
      systemLog:
        verbosity: 100
    expose:
      enabled: true
      exposeType: LoadBalancer
      serviceAnnotations:
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-name: x.x.x.x.x
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-address-type: intranet
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-instance-charge-type: PayByCLCU
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-resource-group-id: rx.x.x.x.x.x.
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-vswitch-id: x.x.x.x
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-security-group-ids: x.x.x.x.x.x

So the question here: how can we update the config without deleting the PVCs, please?

Hey @Ahmed_Asim ,

seems your case requires a deeper dive. Would you be open to jumping on a quick call with me to understand your problem better? If you can prepare the envs beforehand and show me what is happening live, that would be awesome.

If you are open to it, please book me here: Calendly - Sergey Pronin

@Sergey_Pronin thanks for your reply and for the meeting. Miraculously, I managed to make it work without deleting the data for two clusters, but I have an interesting issue with the other 3 clusters: during the data sync from the main site to the DR, I get a log message in the replica set saying that some indexes failed to build in the background and are now being built in the foreground. It then starts scanning some collections that hold a massive number of records, and at some point the pod gets OOM killed. I increased the RAM and maxIndexBuildMemoryUsageMegabytes, going up to 265G of RAM, and the index build still fails; the pod consumes the whole host memory and then gets killed. I have tried a lot to overcome this but with no luck, and I also tried to use --noIndexBuildRetry, but it has been removed in MongoDB 4.4.
Do you have any ideas about this issue, please? I appreciate your support and efforts :pray:

Thanks
Ahmed

Also, I'm not sure if I should increase the RAM beyond 256G. Can the index build really consume that much memory? It does not make any sense to me. I will be waiting for your feedback :pray:

Here is where it fails:

ing keys from external sorter into index","done":323857700,"total":406210899,"percent":79}}
{"t":{"$date":"2023-07-22T21:05:05.001+00:00"},"s":"I",  "c":"-",        "id":51773,   "ctx":"initandlisten","msg":"progress meter","attr":{"name":"Index Build: inserting keys from external sorter into index","done":332547000,"total":406210899,"percent":81}}
{"t":{"$date":"2023-07-22T21:05:08.001+00:00"},"s":"I",  "c":"-",        "id":51773,   "ctx":"initandlisten","msg":"progress meter","attr":{"name":"Index Build: inserting keys from external sorter into index","done":341218800,"total":406210899,"percent":84}}
{"t":{"$date":"2023-07-22T21:05:11.001+00:00"},"s":"I",  "c":"-",        "id":51773,   "ctx":"initandlisten","msg":"progress meter","attr":{"name":"Index Build: inserting keys from external sorter into index","done":349945400,"total":406210899,"percent":86}}
{"t":{"$date":"2023-07-22T21:05:14.001+00:00"},"s":"I",  "c":"-",        "id":51773,   "ctx":"initandlisten","msg":"progress meter","attr":{"name":"Index Build: inserting keys from external sorter into index","done":358538200,"total":406210899,"percent":88}}
{"t":{"$date":"2023-07-22T21:05:17.001+00:00"},"s":"I",  "c":"-",        "id":51773,   "ctx":"initandlisten","msg":"progress meter","attr":{"name":"Index Build: inserting keys from external sorter into index","done":367027300,"total":406210899,"percent":90}}
{"t":{"$date":"2023-07-22T21:05:20.001+00:00"},"s":"I",  "c":"-",        "id":51773,   "ctx":"initandlisten","msg":"progress meter","attr":{"name":"Index Build: inserting keys from external sorter into index","done":375286000,"total":406210899,"percent":92}}
{"t":{"$date":"2023-07-22T21:05:23.001+00:00"},"s":"I",  "c":"-",        "id":51773,   "ctx":"initandlisten","msg":"progress meter","attr":{"name":"Index Build: inserting keys from external sorter into index","done":383745700,"total":406210899,"percent":94}}
{"t":{"$date":"2023-07-22T21:05:26.001+00:00"},"s":"I",  "c":"-",        "id":51773,   "ctx":"initandlisten","msg":"progress meter","attr":{"name":"Index Build: inserting keys from external sorter into index","done":392150700,"total":406210899,"percent":96}}
{"t":{"$date":"2023-07-22T21:05:29.001+00:00"},"s":"I",  "c":"-",        "id":51773,   "ctx":"initandlisten","msg":"progress meter","attr":{"name":"Index Build: inserting keys from external sorter into index","done":400487500,"total":406210899,"percent":98}}
{"t":{"$date":"2023-07-22T21:05:30.409+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"WTCheckpointThread","msg":"WiredTiger message","attr":{"message":"[1690059930:409938][8:0x7f3d0937e700], WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 120, snapshot max: 120 snapshot count: 0, oldest timestamp: (0, 0) , meta checkpoint timestamp: (0, 0) base write gen: 4245194"}}
{"t":{"$date":"2023-07-22T21:05:31.055+00:00"},"s":"I",  "c":"INDEX",    "id":20685,   "ctx":"initandlisten","msg":"Index build: inserted keys from external sorter into index","attr":{"namespace":"x.x","index":"TrxRef","keysInserted":406210899,"durationMillis":595000}}
tcmalloc: large alloc 12998750208 bytes == 0x56776994c000 @
/bin/bash: line 18:     8 Killed                  mongod --bind_ip_all --auth --dbpath=/data/db --port=27017 --replSet=rs0 --storageEngine=wiredTiger --relaxPermChecks --sslAllowInvalidCertificates --clusterAuthMode=keyFile --keyFile=/etc/mongodb-secrets/mongodb-key --shardsvr --enableEncryption --encryptionKeyFile=/etc/mongodb-encryption/encryption-key --wiredTigerIndexPrefixCompression=true --setParameter maxIndexBuildMemoryUsageMegabytes=250000
Stream closed EOF for psmdb-db-support/psmdb-db-support-rs0-0 (mongod)

Hi Ahmed,

what are the memory request and limit set for the mongod container?

I would also suggest specifying the WiredTiger storage engine cache size in the config.
It may happen that during the index build some other operation also takes place that requires memory, and that may have led to the OOM.

Also, on a 256GB system, allocating 250GB for the index build means mongod will consume all that memory, leaving very little for the system.
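As a sketch only, something along these lines in the replset configuration block (the cacheSizeGB value here is an arbitrary example; size it well below the container memory limit):

configuration: |
  storage:
    wiredTiger:
      engineConfig:
        cacheSizeGB: 16   # example value, tune for your memory limit and workload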

@Ahmed_Asim - does Santosh’s answer help? Have you tried it?

Hi @Sergey_Pronin @Santosh_Varma, thanks for your replies. The fix for the index build was to increase maxIndexBuildMemoryUsageMegabytes and allocate more memory to the pod so the index build finishes quickly. The most important thing is to REMOVE the liveness probe, because the index build takes a while, the pod stays unready during that time, and it ends up being killed in the middle of the build.
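Roughly the shape of what I ended up with on the affected replset (the values below are illustrative, not my exact numbers, and I am assuming the CR's per-replset livenessProbe fields; in practice I effectively relaxed the probe rather than tuning it precisely):

replsets:
  rs0:
    resources:
      limits:
        memory: 300G                     # example: leave headroom above the index build limit
    configuration: |
      setParameter:
        maxIndexBuildMemoryUsageMegabytes: 250000
    livenessProbe:
      initialDelaySeconds: 300           # relaxed so the pod is not restarted mid-build
      failureThreshold: 30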

For the main issue, which I believe is a very serious and critical one ("config is getting cached" whenever the external node IPs change): it has happened to me twice so far, and I really don't know under which circumstances it happens. The only way to make the cluster work in this case is to delete all the PVCs for all the replica sets and start fresh, which is a disaster if you ask me.

Thanks
Ahmed