I would like to have your opinion on the feasibility of running a 3-node, sharded MongoDB cluster on Kubernetes.
The use case is to run a MongoDB database whose storage requirements exceed what any single server I have can meet.
This is a hobby project, so the budget is limited — hence the 3-node constraint.
I was thinking of the following setup:
- 3 replicasets, each composed of 2 data-bearing mongod members and 1 arbiter
- sharding activated, with 3 config server pods
- 1 mongos on each node
In practice, a deployment would look something like this (P = primary, S = secondary):
- node A would run: rs0-0 (P), rs1-arbiter-0, rs2-1 (S), cfg-0, mongos
- node B would run: rs0-1 (S), rs1-0 (P), rs2-arbiter-0, cfg-1, mongos
- node C would run: rs0-arbiter-0, rs1-1 (S), rs2-0 (P), cfg-2, mongos
Each node would thus host:
- a primary of a first replicaset;
- a secondary of a second replicaset;
- and the arbiter of a third replicaset.
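For concreteness, the topology above could be expressed roughly as the following custom resource. This is only a sketch: I'm assuming the Percona Operator for MongoDB here (it's the one with the `allowUnsafeConfigurations` flag), and field names vary between operator versions, so check the CRD reference for yours.

```yaml
# Sketch of a 3-shard cluster CR, assuming the Percona Operator for MongoDB.
# Not copy-paste ready: field names depend on the operator version.
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: hobby-cluster
spec:
  allowUnsafeConfigurations: true   # required for 2-member + arbiter replsets
  replsets:
    - name: rs0
      size: 2                       # two data-bearing mongod members
      arbiter:
        enabled: true
        size: 1
    - name: rs1
      size: 2
      arbiter:
        enabled: true
        size: 1
    - name: rs2
      size: 2
      arbiter:
        enabled: true
        size: 1
  sharding:
    enabled: true
    configsvrReplSet:
      size: 3                       # cfg-0 .. cfg-2, one per node
    mongos:
      size: 3                       # one mongos per node
```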
Let’s say that each node has 1TB disk capacity and documents are perfectly distributed across the shards.
That would mean that each shard could grow to 0.5 TB — each node stores two full shard copies (one primary, one secondary), i.e. 0.5 + 0.5 = 1 TB per node — and the total logical capacity of the cluster would be 0.5 × 3 = 1.5 TB.
With the added benefit of redundancy: any one of the 3 nodes can go down without losing data.
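As a sanity check on that arithmetic, here is a back-of-the-envelope calculation (assuming perfectly even chunk distribution and ignoring config server and WiredTiger overhead):

```python
# Capacity check for the proposed layout: 3 shards, replication factor 2
# (arbiters hold no data), each node hosting exactly one primary and one
# secondary of different shards.

NODE_DISK_TB = 1.0
NUM_SHARDS = 3
DATA_COPIES_PER_SHARD = 2  # primary + secondary; the arbiter stores no data

# Each node holds two full shard copies, so the budget per copy is:
max_shard_size_tb = NODE_DISK_TB / 2

# Logical capacity seen by the application:
total_logical_capacity_tb = max_shard_size_tb * NUM_SHARDS

# Raw disk consumed across the cluster at full capacity:
total_raw_usage_tb = max_shard_size_tb * NUM_SHARDS * DATA_COPIES_PER_SHARD

print(max_shard_size_tb, total_logical_capacity_tb, total_raw_usage_tb)
```

So at best each shard tops out at 0.5 TB, for 1.5 TB of logical capacity backed by 3 TB of raw disk.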
Am I correct?
I’ve already experimented with the operator and I’ve managed to get something running (with “allowUnsafeConfigurations=true”).
Although I haven’t succeeded in making Kubernetes schedule the pods exactly as I wanted.
With podAntiAffinity rules, I could ensure that no node hosts more than one pod of each replicaset.
But I couldn’t figure out how to enforce a “perfect” spread, i.e. ruling out the possibility of rs0-X, rs1-X and rs2-X all landing on a single node.
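One way this could work, if the operator lets you inject scheduling constraints into the pod templates, is to combine the per-replicaset podAntiAffinity with a cluster-wide topologySpreadConstraint on a label shared by all data-bearing pods. With 6 data pods on 3 nodes, maxSkew: 1 forces exactly 2 data pods per node, and the anti-affinity forbids those 2 being from the same replicaset — together that yields the layout drawn above. The `role: mongod-data` label is invented here for illustration; whatever label you use must actually be present on the pods.

```yaml
# Sketch of scheduling constraints for a "perfect" spread.
# Assumes every data-bearing mongod pod carries a common label
# (role: mongod-data, invented here) plus a per-replicaset label.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        role: mongod-data          # 6 data pods / 3 nodes => exactly 2 per node
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            replset: rs0           # one such rule per replicaset: no two
                                   # members of the same rs on one node
```

Arbiters then only need anti-affinity against their own replicaset’s data pods, since the two data pods of their replicaset already occupy the other two nodes.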
What do you think of this architecture? Will it work? Is it sensible?