Replica set resync advice

Hi,

I hope this is the best section to ask this question.

We have a mostly default 3/3/3 Kubernetes install of MongoDB on older versions of Kubernetes (1.22) and MongoDB (5.0.7-6). It runs on a mixed test cluster that is edging towards production (its replacement runs newer versions).

The issue doesn't seem to be related to the versions, though. We were looking at bringing this system fully up to date, resolving issues such as the use of GlusterFS and moving to a consistent physical environment.

The Kubernetes environment is mostly KVM-based, running on two Hetzner servers. The node in question, however, runs directly on its own machine. All three servers have similar physical characteristics: 128 GB of memory, a similar number of cores, multiple Ethernet interfaces, etc.

Each DB instance has a 5 TB local volume assigned to it, and we discovered the issue while reviewing how the volumes were built.

Basically, the member running directly on the bare hardware is about 1 TB short of data: it shows 1.4 TB used, compared with 2.4 TB on each of the other two. My limited efforts to check the sync status suggest that the replica set as a whole believes it is in sync.
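For reference, one way to check whether members consider themselves in sync is to compare each member's `optimeDate` in the output of the `replSetGetStatus` command (`rs.status()` in mongosh). The sketch below assumes that output has already been fetched as a Python dict (for example via pymongo's `client.admin.command("replSetGetStatus")`, which decodes BSON dates to `datetime`); the member names and timestamps in the sample are hypothetical.

```python
from datetime import datetime, timedelta


def replication_lag(status):
    """Return {member name: seconds behind the primary} for each SECONDARY.

    `status` is assumed to be shaped like replSetGetStatus output; only the
    members[].name, members[].stateStr and members[].optimeDate fields are used.
    """
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical sample: one secondary is 3 seconds behind the primary.
t = datetime(2024, 1, 1, 12, 0, 0)
sample = {
    "members": [
        {"name": "mongo-rs0-0:27017", "stateStr": "PRIMARY", "optimeDate": t},
        {"name": "mongo-rs0-1:27017", "stateStr": "SECONDARY", "optimeDate": t},
        {"name": "mongo-rs0-2:27017", "stateStr": "SECONDARY",
         "optimeDate": t - timedelta(seconds=3)},
    ]
}
print(replication_lag(sample))
```

A near-zero lag only shows that a member has applied the same oplog entries as the primary; it does not by itself prove the data files match, which may be why the set reports itself in sync despite the size difference.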

There was an issue last week that may be the underlying cause: we had seen increasing problems with the internal 10GbE network interface until the port/switch was replaced (it's a little unclear which, but I think it was the switch in the end).

I also think the oplog, currently 50 GB, is fairly full, but if I'm reading this right, the entries it holds go back about a year.
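The oplog window matters if a member has to be resynced, because a common rule of thumb is that the sync source's oplog should cover at least as long as the initial data copy takes. The sketch below does that comparison as plain arithmetic. The first/last oplog timestamps would come from something like `rs.printReplicationInfo()`; the dates, the 2.4 TB figure and the 100 MB/s sustained copy rate are all assumptions for illustration, not measurements.

```python
from datetime import datetime


def oplog_window_hours(first_ts, last_ts):
    """Hours between the oldest and newest oplog entries."""
    return (last_ts - first_ts).total_seconds() / 3600


def estimated_copy_hours(data_bytes, bytes_per_second):
    """Rough time to copy data_bytes at an assumed sustained rate."""
    return data_bytes / bytes_per_second / 3600


# Hypothetical values: an oplog spanning about a year, 2.4 TB of data,
# and an assumed sustained sync rate of 100 MB/s over the network.
window = oplog_window_hours(datetime(2023, 6, 1), datetime(2024, 6, 1))
copy = estimated_copy_hours(2.4e12, 100e6)
print(f"oplog window ~{window:.0f} h, estimated copy ~{copy:.1f} h")
print("oplog window covers the copy" if window > copy else "oplog window too short")
```

With a window measured in months, even a much slower copy rate than assumed here would likely still fit.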

And no, this isn't currently backed up. We have just worked out how to do that on our newer installs, but need to switch back to kubectl rather than Helm to work on this one.

So the question is: what is the best way to get the system back in sync? I'm assuming we could scale down the number of instances and then scale back up, but is that the right thing to do, and how do we select the particular node to remove?
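On the node-selection point: assuming the MongoDB pods are managed by a Kubernetes StatefulSet (as most charts and operators do), a scale-down always terminates the highest-ordinal pods first, so a plain scale-down cannot target a chosen member unless it happens to be the highest ordinal. The hypothetical helper below just spells out that ordering; the StatefulSet name `mongo-rs0` is an assumption.

```python
def pods_removed_on_scale_down(statefulset, current_replicas, target_replicas):
    """Pod names a StatefulSet scale-down would terminate, highest ordinal first.

    Hypothetical helper: StatefulSet pods are named <name>-<ordinal>, and
    scaling down removes ordinals from the top.
    """
    if not 0 <= target_replicas <= current_replicas:
        raise ValueError("target must be between 0 and the current replica count")
    return [
        f"{statefulset}-{i}"
        for i in range(current_replicas - 1, target_replicas - 1, -1)
    ]


print(pods_removed_on_scale_down("mongo-rs0", 3, 2))
```

Note also that, by default, the PersistentVolumeClaim created from a StatefulSet's `volumeClaimTemplates` is kept on scale-down, so scaling back up would likely reattach the same volume rather than start the member from an empty data directory.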

Any advice would as always be most appreciated.

Mike