WriteConcern on ReplicaSet, connectionString vs Mongod.conf?

Hi there,

We are using Percona Server for MongoDB v3.6.11-3.1, running a replica set of 5 nodes. Nodes 1, 2 and 3 are in datacenter1 on the same VLAN, and the backends point to those 3 servers, reading from the secondaries. Node 4 is also in datacenter1 but on another VLAN; it has been set up not to vote and it gets no backend traffic. Node 5 is in datacenter2, on a VLAN that can reach datacenter1 but only node 4, meaning node 5 can’t reach nodes 1, 2 or 3. No vote and no backend traffic for node 5 either.

We are experiencing an issue: apparently we write to the primary and very soon afterwards try to read that data back, but it seems the data has not been propagated to node 2 and node 3 (the secondaries) yet. So I am looking to implement a write concern of 2, or perhaps "majority". We use Percona Monitoring and Management v1, and I am seeing 2 seconds of replication lag for node 2 and node 3, and sometimes PMM shows 3-4 secs of replication lag.

We have a database of 50GB of data, and the main collection has far too many indexes (over 50) and close to 90MM records/documents. We mainly use it as a Learning Record Store (LRS), where the official Learning Locker software ships with 30 indexes. With our 50+ indexes plus the data, the database takes about 101GB on disk. We run our backends/databases on VMs (Debian 8) across 4 hosts (ESXi 6.7) on Dell servers, with vCenter, vSAN, vMotion, etc.

It seems to me 2 seconds of replication lag is a lot. Should I add the write concern "majority" in mongod.conf or in the connection string on the backends?
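For context, this is the kind of connection string change I am considering on the backends (hostnames and database name are just placeholders; the relevant part is w=majority):

mongodb://mongo1.example.com:27017,mongo2.example.com:27017,mongo3.example.com:27017/lrs?replicaSet=rs0&readPreference=secondaryPreferred&w=majority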

Thanks in advance.


Hi.

I have a feeling it is plain replication lag. Using a write concern of majority will slow the application down (at least the thread doing the write). It won’t change the speed at which writes are replicated, except indirectly: if the write concern throttles the client load, replication may keep up better.
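To make that concrete, here is a sketch in the mongo shell (the collection and document are made up): the calling thread blocks until enough members have acknowledged the write to satisfy w: "majority", or until wtimeout expires with an error.

db.statements.insertOne(
    { verb: "completed", timestamp: new Date() },
    { writeConcern: { w: "majority", wtimeout: 5000 } }
)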

Just to make things crystal-clear about the replica set configuration, as it affects what counts as part of a majority write, could you please share the output of rs.conf()? If you’re concerned about sharing the hostnames please just replace them with ‘node1.dc1’ etc.

And if you monitor this replica set using PMM or MongoDB Ops Manager, is there anything you see in those graphs?

Cheers,

Akira


Hi, here are my rs.conf() and part of my mongod.conf.

Also, I uploaded screenshots from PMM v1 of my secondaries (Mongo2 and Mongo3).

rs0:PRIMARY> rs.conf()
{
	"_id" : "rs0",
	"version" : 12,
	"protocolVersion" : NumberLong(1),
	"members" : [
		{
			"_id" : 0,
			"host" : "mongo1.fleetdefense.com:27017",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 2,
			"tags" : {
				
			},
			"slaveDelay" : NumberLong(0),
			"votes" : 1
		},
		{
			"_id" : 1,
			"host" : "mongo2.fleetdefense.com:27017",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 1,
			"tags" : {
				
			},
			"slaveDelay" : NumberLong(0),
			"votes" : 1
		},
		{
			"_id" : 2,
			"host" : "mongo3.fleetdefense.com:27017",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 1,
			"tags" : {
				
			},
			"slaveDelay" : NumberLong(0),
			"votes" : 1
		},
		{
			"_id" : 3,
			"host" : "mongo4.fleetdefense.com:27017",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : true,
			"priority" : 0,
			"tags" : {
				
			},
			"slaveDelay" : NumberLong(0),
			"votes" : 0
		},
		{
			"_id" : 4,
			"host" : "mongo5.fleetdefense.com:27017",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : true,
			"priority" : 0,
			"tags" : {
				
			},
			"slaveDelay" : NumberLong(0),
			"votes" : 0
		}
	],
	"settings" : {
		"chainingAllowed" : true,
		"heartbeatIntervalMillis" : 2000,
		"heartbeatTimeoutSecs" : 10,
		"electionTimeoutMillis" : 10000,
		"catchUpTimeoutMillis" : 60000,
		"catchUpTakeoverDelayMillis" : 30000,
		"getLastErrorModes" : {
			
		},
		"getLastErrorDefaults" : {
			"w" : 1,
			"wtimeout" : 0
		},
		"replicaSetId" : ObjectId("5c47cb0acf8ec0b80dc3822e")
	}
}

------------------------------------------------------------------------------

mongod.conf

# mongod.conf, Percona Server for MongoDB
# for documentation of all options, see:
...
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true

systemLog:
  destination: file
  logAppend: true
  logRotate: reopen
  path: /var/log/mongodb/mongod.log

processManagement:
  fork: true
  pidFilePath: /var/run/mongod.pid
net:
  port: 27017
  bindIp: 0.0.0.0

operationProfiling:
  slowOpThresholdMs: 400
  mode: slowOp
  rateLimit: 100

replication:
  replSetName: "rs0"

Hi. Thanks for sharing the PMM graphs. I see the replication lag is showing as 1~2 secs nearly all the time.

There’s a rounding-up issue that affects replication lag calculations. The oplog timestamp used is the MongoDB Timestamp type, which is not a wall-clock time but a composite of a whole second plus an incremental counter that resets to 1 each new second. There is no rule we can use to reverse-calculate which wall-clock millisecond within that second the oplog timestamp represents, so the lag calculation is rounded up to whole seconds.
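As a rough illustration, here is a sketch you can run in the mongo shell against rs.status() (field names as on 3.6); the result only has whole-second precision because of that rounding:

var s = rs.status();
var primary = s.members.filter(function (m) { return m.stateStr === "PRIMARY"; })[0];
s.members.filter(function (m) { return m.stateStr === "SECONDARY"; }).forEach(function (m) {
    // optimeDate only carries the whole-second part of the oplog Timestamp,
    // so this estimate can be off by up to a second.
    print(m.name + " lag ~ " + (primary.optimeDate - m.optimeDate) / 1000 + " s");
});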

But even so, it looks like there is roughly a 1.0 sec replication lag most of the time.

Although your client request rate is low, I suspect the primary is serving the oplog to the secondaries slowly because of high load at the storage engine level.

Looking at the “Scanned and Moved Objects” graph, we can see that 500k to 1M docs are being scanned every second. (Ignore “moved” in this graph title; that was a metric only used in MMAPv1. The important distinction is that “scanned_documents” means documents scanned from the collection, while “scanned” means index entries scanned.) This is for client requests totalling ~150 inserts per second (these don’t cause inefficient scans), ~100 updates per second and ~20 queries per second. (The Document Operations graph shows a similar ~100 documents updated per second, so I deduce each update touches only one document.)

I can’t tell if it is the queries or the updates, but each one is scanning something on the order of 10k ~ 50k documents (full documents from the collection, not index entries).

So a missing-index problem is the most notable thing. Or, if you have an index that is supposed to be making those operations fast, it isn’t suitable in some way and the query planner isn’t using it.
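One way to check is explain() on one of the slow operations (a sketch; the collection and filter here are hypothetical, so substitute a real slow query from your logs or profiler):

db.statements.find({ "actor.mbox": "mailto:someone@example.com" }).explain("executionStats")

In the output, compare executionStats.totalKeysExamined (index entries) with executionStats.totalDocsExamined (full documents); if totalDocsExamined is far above nReturned, the query is not using a suitable index.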

And to go back to the original question: I suspect all this unnecessary work is adding latency when the primary serves the secondaries’ oplog-tailing queries.
