In a PSA (Primary, Secondary and Arbiter) structure you have to disable the Read Concern Majority option if you don’t want to lose availability when a node other than the arbiter is stopped (this is in the official documentation). The problem is that with it disabled, pbm backup does not work (ReadConcernMajorityNotEnabled: Majority read concern is not enabled). Is there any way to disable the read concern check, or any other solution?
I am facing the same concern as described by Ruben. If ReadConcernMajority is enabled, then the mongo cluster will stop working if any data-bearing node goes down. Is there a way to trigger a backup from the primary when ReadConcernMajority is not enabled, using Percona?
I haven’t found a solution yet
Hi Ruben (and Rinki).
I believe the direct cause of the error you reported is that connections from the pbm-agents and the PBM CLI to the PBM control collections explicitly set the “majority” readConcern option.
pbm.go: connection to the PBM control collections:
func connect(ctx context.Context, uri, appName string) (*mongo.Client, error) {
	client, err := mongo.NewClient(
		options.Client().ApplyURI(uri).
			SetAppName(appName).
			SetReadPreference(readpref.Primary()).
			SetReadConcern(readconcern.Majority()).
			SetWriteConcern(writeconcern.New(writeconcern.WMajority())),
	)
	...
https://github.com/percona/percona-backup-mongodb/blob/7d5fefc0368d9289b8a0cd880434b7cadba9852e/pbm/pbm.go#L187-L194
N.b. this is not all connections. Connections from the pbm-agent to the local mongod node, e.g. when copying data for a backup or inserting data during a restore, don’t set readConcern. Instead they use the “direct” option, which implies that readPreference and readConcern will be ignored. (The “n.curi” value here is the URI provided by the PBM_MONGODB_URI env var or the --mongodb-uri command-line argument.)
node.go:
func (n *Node) Connect() error {
	conn, err := mongo.NewClient(options.Client().ApplyURI(n.curi).SetAppName("pbm-agent-exec").SetDirect(true))
https://github.com/percona/percona-backup-mongodb/blob/f6e74921a8aee9e993f516d2ebfff45ad978afc2/pbm/node.go#L42-L43
PBM has a design focus on using the replica set’s built-in consistency where it is sensible, but as you noted, Ruben, the changes for transactions starting in v4.0 made majority reads and PSA replica sets sort-of incompatible. A PSA replica set is fine, but a PA (+ S down) replica set does not offer majority read guarantees.
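To make the PA (+ S down) point concrete, here is a small, self-contained sketch (illustrative function names only, not PBM or driver code) of why the majority commit point cannot advance when the secondary is down: the arbiter votes in elections but holds no data, so it can never acknowledge a write.

```go
package main

import "fmt"

// majorityNeeded returns the number of acknowledgements required for a
// majority of the voting members (e.g. 2 of 3 in a PSA set).
func majorityNeeded(votingMembers int) int {
	return votingMembers/2 + 1
}

// canAdvanceMajorityCommit reports whether the majority commit point can
// advance. Only data-bearing nodes can persist and acknowledge writes;
// an arbiter cannot, even though it counts as a voting member.
func canAdvanceMajorityCommit(dataBearingUp, votingMembers int) bool {
	return dataBearingUp >= majorityNeeded(votingMembers)
}

func main() {
	const voting = 3 // P + S + A all vote

	// Healthy PSA: primary + secondary = 2 data-bearing nodes >= 2.
	fmt.Println(canAdvanceMajorityCommit(2, voting)) // true

	// Secondary down (PA): only the primary persists writes, 1 < 2,
	// so majority reads stall and PBM's control-collection reads fail.
	fmt.Println(canAdvanceMajorityCommit(1, voting)) // false
}
```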
I think the error you report can only happen when the replica set holding the PBM control collections falls into a PA (+ S down) state. Could you please confirm that matches what you observe or are worried about?
If I disable read concern majority (“enableMajorityReadConcern: false” in mongod.conf), the pbm-agent detects it and does not start. I can’t test what would happen if the secondary falls, because the pbm-agent doesn’t even start. If, once started, it does not check the read concern for backups (it uses the “direct” connection), there should be no problem even if the arbiter or the secondary is down, but as I said, with read concern majority disabled the pbm-agent does not start. Is there any way to start the agent with read concern majority disabled?
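For reference, the option in question lives under the replication section of mongod.conf:

```yaml
# mongod.conf — the setting Ruben refers to; with this in place the
# pbm-agent refuses to start (ReadConcernMajorityNotEnabled).
replication:
  enableMajorityReadConcern: false
```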
Thank you very much.
Error from pbm-agent with “enableMajorityReadConcern: false” in mongod.conf:
jul 28 11:05:27 mongo1 pbm-agent[14670]: 2020/07/28 11:05:27 connect to mongodb: get config server connetion URI: (ReadConcernMajorityNotEnabled) Majority read concern is not enabled.
I tested everything without “enableMajorityReadConcern: false” in mongod.conf and it all works fine, but that is not the recommendation for a PSA structure.
Hi Ruben.
There is no way to start pbm-agent with the read concern disabled.
I also don’t think it should be an option, due to the very small number of situations in which taking a backup from a node that can’t confer majority read guarantees would be acceptable. I agree that an arbiter vote should be good enough to confer a majority, but as of current versions (4.0, 4.2, 4.4) MongoDB requires a physical write, which an arbiter can’t do. I have a feeling future versions of MongoDB will resolve this issue.
Also, the issue only affects the PSA replica set while one of the data-bearing nodes is down. One scheduled backup may not run, but the next can.
OK, thank you very much for your help