Description:
This is just a clarification question. Currently, the docs say that for a restore, the system-user secrets need to be identical to those of the source cluster at the time of the backup. I can understand this requirement for a full backup, because it includes the admin.system.users collection: a full restore overwrites the system users, and if their credentials differ from what the operator knows, the operator loses access to the cluster.
This limitation can be a bit of a challenge, though, if one is also required to implement regular secret rotation. We now have to keep track of each system-user secret as well as which backups it corresponds to. Not impossible, but annoying and easy to forget.
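To illustrate the bookkeeping I mean, here is a minimal sketch of how one could record which secret version each backup belongs to. Everything in it is an assumption made for illustration: the function names, the in-memory ledger, the backup name, and the secret key name are placeholders, not part of the operator or PBM.

```python
import hashlib
import json


def secret_fingerprint(secret_data: dict) -> str:
    """Return a stable SHA-256 digest of the secret's key/value pairs."""
    # Sort keys so the same secret always produces the same digest.
    canonical = json.dumps(secret_data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def record_backup(ledger: dict, backup_name: str, secret_data: dict) -> None:
    """Remember which secret version was active when the backup was taken."""
    ledger[backup_name] = secret_fingerprint(secret_data)


def secret_matches_backup(ledger: dict, backup_name: str, secret_data: dict) -> bool:
    """Check whether the given secret is the one recorded for the backup."""
    return ledger.get(backup_name) == secret_fingerprint(secret_data)


# Hypothetical usage: the key name and backup name are placeholders.
ledger = {}
old_secret = {"CLUSTER_ADMIN_PASSWORD": "before-rotation"}
new_secret = {"CLUSTER_ADMIN_PASSWORD": "after-rotation"}
record_backup(ledger, "backup-2024-01-01", old_secret)
```

A ledger like this only detects a mismatch before a restore is attempted. The old secret itself would still have to be kept somewhere, which is exactly the part that is easy to forget.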
Two questions about this:
- Is it correct that a selective restore should work even if the system passwords differ? In the selective case, the admin.system.users collection should not be touched, meaning the operator would still have access.
- In the case of a full restore with different secrets, the operator would lose access and the cluster would be marked as failed. But the restore itself might still be able to finish properly. I’m not too sure about that, because a classic “mongorestore” typically fails at the oplog replay stage if the admin user changed during the restore. But if pbm can handle that, one should be able to repair the cluster simply by replacing the system passwords with the new ones. Or are there other problems?
I’m specifically imagining the worst case, where the system-user secret has been lost. In that case, option 1 (a selective restore) would be the easiest way to get the db back online, while option 2 (a full restore) might need a lot of manual intervention, including bringing up mongodb in no-auth mode to be able to replace the credentials.