MongoDB uses a file for each collection and each index, so 100’s of thousands of those per shard becomes a bottleneck at checkpoint time when filehandles are iterated as a matter of normal processing. It definitely hurts performance.
A long time ago the WiredTiger team worked on a ‘million collections’ project where they added a ‘grouped collection’ API in the KVRecordStore class etc. Many collections would put into one file by adding a prefix for the collection to every key. This was added around v3.4 but never fully surfaced in the code, and just the other day I saw it’s been removed from the v5.0 code.
So there’s been no good way, and apparently won’t ever be a good way, to have 100’s of thousands of collections and indexes per shard.
But you can have lots of shards so there’s not too many collections per shard. That works.
By default no collection is sharded. When you create a new database namespace (eg. tenantABC, tenantDEF, etc.) implicity as you create the first collection there it will have its primary shard chosen in a round-robin fashion. All the collections of that database namespace will be created on that shard only, and won’t be split to other shards unless you explicitly request that they become sharded collections.
If you want to set which shard explicitly is the primary shard for a db namespace I suggest doing the “movePrimary” as soon as you created the first collection for it.