Description:
I have multiple sharded PSMDB clusters for staging and production that need to be replicated to a DR site. I started with the staging clusters and hit a strange issue that ends in data loss: when I update the main-site config with new externalNodes IPs, the operator does not update the pods with the new config, and I keep seeing logs for old IPs that no longer exist (or existed but were removed after I applied the new configuration). The only way to get the new config applied is to delete the PVCs and the pods; the new pods then come up with the new config, but removing the PVCs wipes the databases.
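For reference, this is roughly the update cycle I go through (a minimal sketch; the manifest file name and the cluster name my-cluster-name are placeholders for my actual values):

  # Edit the externalNodes IPs in the main-site CR and re-apply it
  kubectl apply -f cr-main-site.yaml

  # The CR itself shows the new IPs...
  kubectl get psmdb my-cluster-name -o yaml | grep -B 1 -A 2 'host:'

  # ...but the mongod pods are never reconfigured or restarted with them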
Steps to Reproduce:
Deploy a normal PSMDB cluster and add some databases. Then update the config: set clusterServiceDNSMode: External, add the externalNodes IPs for the DR site, and apply it. Once you have managed to connect both clusters, change the DR IPs and re-apply the config again (roughly as sketched below).
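Put as commands, the reproduction looks roughly like this (the file name is a placeholder for my CR manifest, which I edit between steps):

  # 1. Deploy a normal cluster and load some databases
  kubectl apply -f cr-main-site.yaml

  # 2. Edit the CR: set clusterServiceDNSMode: External and add the DR externalNodes IPs, then re-apply
  kubectl apply -f cr-main-site.yaml

  # 3. Once both sites are connected, change the DR IPs in the CR and re-apply again
  kubectl apply -f cr-main-site.yaml
  # -> the pods keep the old IPs in their replica set config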
Version:
percona-server-mongodb:4.4.16-16
crVersion: 1.14.0
Logs:
The IPs below no longer exist in the cluster; they did exist at one point, but the service IPs have since changed. mongod still tries to connect to the old IPs and never picks up the new ones.
{"t":{"$date":"2023-07-12T22:00:13.642+00:00"},"s":"D2", "c":"ASIO", "id":4646302, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Finished request","attr":{"requestId":32,"status":{"code":6,"codeName":"HostUnreachable","errmsg":"Error connecting to 10.10.0.181:27017 :: caused by :: No route to host"}}}
{"t":{"$date":"2023-07-12T22:00:13.642+00:00"},"s":"D2", "c":"ASIO", "id":22597, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Request finished with response","attr":{"requestId":32,"isOK":false,"response":"HostUnreachable: Error connecting to 10.10.0.181:27017 :: caused by :: No route to host"}}
{"t":{"$date":"2023-07-12T22:00:13.642+00:00"},"s":"D3", "c":"EXECUTOR", "id":22608, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Received remote response","attr":{"response":"HostUnreachable: Error connecting to 10.10.0.181:27017 :: caused by :: No route to host"}}
{"t":{"$date":"2023-07-12T22:00:13.642+00:00"},"s":"I", "c":"-", "id":4333222, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"RSM received failed isMaster","attr":{"host":"10.10.0.181:27017","error":"HostUnreachable: Error connecting to 10.10.0.181:27017 :: caused by :: No route to host","replicaSet":"cfg","isMasterReply":"{}"}}
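To confirm that the stale members are still in the live replica set config (not just in the logs), I check rs.conf() from inside a cfg pod. A sketch, with the pod name, user and password as placeholders:

  kubectl exec -it my-cluster-name-cfg-0 -c mongod -- \
    mongo "mongodb://clusterAdmin:<password>@localhost:27017/admin" \
    --eval 'rs.conf().members.forEach(function(m) { print(m.host) })'
  # the member list still shows the old IPs instead of the new externalNodes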
Expected Result:
The config gets updated with the changes I made.
Actual Result:
The cached config with the old IPs is still there and does not get updated unless we delete the PVCs; only then do we see the new configuration.
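For completeness, this is the only sequence that currently makes a pod pick up the new IPs, and it is destructive (names follow the operator's default naming for a cluster called my-cluster-name; adjust to your cluster):

  # Deleting the PVC and the pod forces a fresh pod with the new config,
  # but the data on that volume is lost:
  kubectl delete pvc mongod-data-my-cluster-name-cfg-0
  kubectl delete pod my-cluster-name-cfg-0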
Main site config:

clusterServiceDNSMode: External
replsets:
  rs0:
    externalNodes:
      - host: 10.11.10.225
        priority: 0
        votes: 0
      - host: 10.11.10.227
        priority: 0
        votes: 0
      - host: 10.11.10.229
        priority: 0
        votes: 0
    configuration: |
      systemLog:
        verbosity: 100
    expose:
      enabled: true
      exposeType: LoadBalancer
      serviceAnnotations:
        service.beta.kubernetes.io/oci-load-balancer-shape: flexible
        service.beta.kubernetes.io/oci-load-balancer-shape-flex-min: "10"
        service.beta.kubernetes.io/oci-load-balancer-shape-flex-max: "10"
        service.beta.kubernetes.io/oci-load-balancer-internal: "true"
        service.beta.kubernetes.io/oci-load-balancer-security-list-management-mode: None
        service.beta.kubernetes.io/oci-load-balancer-subnet1: x.x.x.x…x.x.x.
        oci.oraclecloud.com/oci-network-security-groups: x.x.x.x.x.x.x.
sharding:
  configrs:
    externalNodes:
      - host: 10.11.10.226
        priority: 0
        votes: 0
      - host: 10.11.10.228
        priority: 0
        votes: 0
      - host: 10.11.10.230
        priority: 0
        votes: 0
    configuration: |
      systemLog:
        verbosity: 100
    expose:
      enabled: true
      exposeType: LoadBalancer
      serviceAnnotations:
        service.beta.kubernetes.io/oci-load-balancer-shape: flexible
        service.beta.kubernetes.io/oci-load-balancer-shape-flex-min: "10"
        service.beta.kubernetes.io/oci-load-balancer-shape-flex-max: "10"
        service.beta.kubernetes.io/oci-load-balancer-internal: "true"
        service.beta.kubernetes.io/oci-load-balancer-security-list-management-mode: None
        service.beta.kubernetes.io/oci-load-balancer-subnet1: x.x.x.xx.x.xx
        oci.oraclecloud.com/oci-network-security-groups: x.x.x.x.x.x
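Once this config is applied, what I expect to end up in the rs0 member list is the DR externalNodes added as priority-0, votes-0 members. An illustration of the intent (same rs.conf() check as above; not actual output, and the default port 27017 is an assumption):

  kubectl exec -it my-cluster-name-rs0-0 -c mongod -- \
    mongo "mongodb://clusterAdmin:<password>@localhost:27017/admin" \
    --eval 'rs.conf().members.forEach(function(m) { print(m.host, "priority:", m.priority, "votes:", m.votes) })'
  # expected to include the DR members from externalNodes, e.g.:
  #   10.11.10.225:27017 priority: 0 votes: 0
  #   10.11.10.227:27017 priority: 0 votes: 0
  #   10.11.10.229:27017 priority: 0 votes: 0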
DR site config:

unmanaged: true
updateStrategy: OnDelete
clusterServiceDNSMode: External
backup:
  enabled: false
replsets:
  rs0:
    externalNodes:
    volumeSpec:
      pvc:
        resources:
          requests:
            storage: 600Gi
    configuration: |
      systemLog:
        verbosity: 100
    nonvoting:
      volumeSpec:
        pvc:
          resources:
            requests:
              storage: 60Gi
    expose:
      enabled: true
      exposeType: LoadBalancer
      serviceAnnotations:
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-name: x.x.x.x
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-address-type: intranet
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-instance-charge-type: PayByCLCU
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-resource-group-id: x.x.x.x
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-vswitch-id: x.x.x.x
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-security-group-ids: x.x.x.x.x
sharding:
  configrs:
    volumeSpec:
      pvc:
        resources:
          requests:
            storage: 60Gi
    configuration: |
      systemLog:
        verbosity: 100
    expose:
      enabled: true
      exposeType: LoadBalancer
      serviceAnnotations:
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-name: x.x.x.x.x
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-address-type: intranet
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-instance-charge-type: PayByCLCU
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-resource-group-id: rx.x.x.x.x.x.
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-vswitch-id: x.x.x.x
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-security-group-ids: x.x.x.x.x.x
So the question here is: how can we update the config without deleting the PVCs?