Issues with add and drop user on Percona Mongo 3.2.18-3.9 cluster with delayed nodes

Hi,

I am not a mongo expert, although I am familiar with the product.
I am running a Percona Mongo 3.2.18-3.9 cluster with 7 nodes, two of which are delayed nodes (24 hr delay).
When I attempt to drop or add users from the admin database on the primary, I get the following error:

REPLICA1:PRIMARY> use admin
switched to db admin
REPLICA1:PRIMARY> db.dropUser("someuser");

2018-04-24T16:26:56.299-0400 E QUERY [thread1] Error: timeout :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
DB.prototype.dropUser@src/mongo/shell/db.js:1355:15
@(shell):1:1

And the user is not dropped.
I get the same kind of thing with adding a user.

I suspect the 24 hr delayed nodes are causing this issue.
This cluster was originally MongoDB Community 2.6.8, upgraded to Community 3.0.7, then to Community 3.2.18,
and finally to Percona Mongo 3.2.18-3.9.

In the community versions with the delayed nodes, a similar error/warning would display, but the add or drop of the user would succeed.

I was thinking of using wtimeout with no timeout, so the syntax would look like this:

db.dropUser("someuser", {w: "majority", wtimeout: 0} );

But I am concerned about how this will behave with the 24 hr delayed nodes.
These delayed nodes are configured as follows:

{
"_id" : 4,
"host" : "<node name>:<node port>",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : true,
"priority" : 0,
"tags" : {

},
"slaveDelay" : NumberLong(86400),
"votes" : 1
},

I thought of something like db.dropUser("someuser", {w: "majority", wtimeout: 86500000} ); but I am worried about the length of that timeout.
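For what it's worth, the two settings use different units: slaveDelay is configured in seconds, while wtimeout is in milliseconds. A quick sanity check in plain JavaScript (not shell code; the values are taken from my config above):

```javascript
// slaveDelay is in seconds; wtimeout is in milliseconds.
const slaveDelaySecs = 86400;            // 24 hr delay, from rs.conf()
const delayMs = slaveDelaySecs * 1000;   // minimum wtimeout that outlasts the delay
const proposedWtimeout = 86500000;       // the value I was considering

console.log(delayMs);                    // 86400000
console.log(proposedWtimeout > delayMs); // true: slightly longer than the delay
```

So 86500000 ms is indeed just a little over the 24 hr delay.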

PS - I have a test cluster with no delayed nodes (slightly lower version) and here are the results of the add and drop user operations on that one:

RS1:PRIMARY> db.createUser( { user: "tester", pwd: "dwjhfwouyfufh", roles: ["root"] } )
Successfully added user: { "user" : "tester", "roles" : [ "root" ] }
RS1:PRIMARY> db.dropUser(“tester”);
true
RS1:PRIMARY> db.version()
3.2.16-3.6

At any rate, let me know whether I should use 0 for wtimeout, or the very large number of ms (86500000; note this is a little longer than the delay).

Thanx in advance

Doug Levy

Hi Doug,
I have tested the same with Percona 3.2.18-3.9 on my side and could drop users without issue. I had 3 members, including a delayed member, in the replica set.

mongo-rs1:PRIMARY> use admin
switched to db admin
mongo-rs1:PRIMARY> db.createUser({ user: "tester", pwd: "tested",
... roles:["root"]
... })
Successfully added user: { "user" : "tester", "roles" : [ "root" ] }
mongo-rs1:PRIMARY> 
mongo-rs1:PRIMARY> db.dropUser("tester")
true
mongo-rs1:PRIMARY> 
mongo-rs1:PRIMARY> rs.conf()
{
"_id" : "mongo-rs1",
"version" : 222535,
"protocolVersion" : NumberLong(1),
"members" : [
{
"_id" : 0,
"host" : "172.17.0.10:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 1,
"host" : "172.17.0.9:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 2,
"host" : "172.17.0.8:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 0,
"tags" : {

},
"slaveDelay" : NumberLong(86400),
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : 2000,
"heartbeatTimeoutSecs" : 10,
"electionTimeoutMillis" : 10000,
"getLastErrorModes" : {

},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
},
"replicaSetId" : ObjectId("5ae0e60047636814fcd452ad")
}
}
mongo-rs1:PRIMARY> db.printSlaveReplicationInfo()
source: 172.17.0.9:27017
syncedTo: Wed Apr 25 2018 20:55:22 GMT+0000 (UTC)
0 secs (0 hrs) behind the primary 
source: 172.17.0.8:27017
syncedTo: Wed Apr 25 2018 20:46:05 GMT+0000 (UTC)
557 secs (0.15 hrs) behind the primary 
mongo-rs1:PRIMARY> db.version()
3.2.18-3.9
mongo-rs1:PRIMARY> 

Please send us the output of rs.conf(), db.version(), and rs.status(). I would like to take a look at getLastErrorDefaults.

Regards,
Vinodh Krishnaswamy

Hi and thanks for responding.
Hostnames are changed here for security but I made sure the “placeholder names” are consistent.
rs.status()

REPLICA1:PRIMARY> rs.status()
{
"set" : "REPLICA1",
"date" : ISODate("2018-04-26T21:24:05.263Z"),
"myState" : 1,
"term" : NumberLong(-1),
"heartbeatIntervalMillis" : NumberLong(2000),
"members" : [
{
"_id" : 0,
"name" : "host1.foo.bar.com:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 4927744,
"optime" : Timestamp(1524777844, 6),
"optimeDate" : ISODate("2018-04-26T21:24:04Z"),
"lastHeartbeat" : ISODate("2018-04-26T21:24:04.832Z"),
"lastHeartbeatRecv" : ISODate("2018-04-26T21:24:04.831Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "host2.foo.bar.com:27017",
"configVersion" : 13
},
{
"_id" : 1,
"name" : "host2.foo.bar.com:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 4927745,
"optime" : Timestamp(1524777845, 1),
"optimeDate" : ISODate("2018-04-26T21:24:05Z"),
"electionTime" : Timestamp(1519851408, 1),
"electionDate" : ISODate("2018-02-28T20:56:48Z"),
"configVersion" : 13,
"self" : true
},
{
"_id" : 2,
"name" : "host3.foo.bar.com:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 4926421,
"optime" : Timestamp(1524777842, 12),
"optimeDate" : ISODate("2018-04-26T21:24:02Z"),
"lastHeartbeat" : ISODate("2018-04-26T21:24:03.339Z"),
"lastHeartbeatRecv" : ISODate("2018-04-26T21:24:04.598Z"),
"pingMs" : NumberLong(2),
"syncingTo" : "host2.foo.bar.com:27017",
"configVersion" : 13
},
{
"_id" : 3,
"name" : "host4.foo.bar.com:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 4927487,
"optime" : Timestamp(1524691443, 13),
"optimeDate" : ISODate("2018-04-25T21:24:03Z"),
"lastHeartbeat" : ISODate("2018-04-26T21:24:04.210Z"),
"lastHeartbeatRecv" : ISODate("2018-04-26T21:24:04.209Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "host2.foo.bar.com:27017",
"configVersion" : 13
},
{
"_id" : 4,
"name" : "host5.foo.bar.com:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 4927617,
"optime" : Timestamp(1524691443, 13),
"optimeDate" : ISODate("2018-04-25T21:24:03Z"),
"lastHeartbeat" : ISODate("2018-04-26T21:24:04.826Z"),
"lastHeartbeatRecv" : ISODate("2018-04-26T21:24:04.196Z"),
"pingMs" : NumberLong(1),
"syncingTo" : "host3.foo.bar.com:27017",
"configVersion" : 13
},
{
"_id" : 5,
"name" : "host6.foo.bar.com:27016",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 4926853,
"lastHeartbeat" : ISODate("2018-04-26T21:24:04.279Z"),
"lastHeartbeatRecv" : ISODate("2018-04-26T21:24:04.542Z"),
"pingMs" : NumberLong(1),
"configVersion" : 13
},
{
"_id" : 6,
"name" : "host7.foo.bar.com:27016",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 4927203,
"lastHeartbeat" : ISODate("2018-04-26T21:24:03.457Z"),
"lastHeartbeatRecv" : ISODate("2018-04-26T21:24:03.863Z"),
"pingMs" : NumberLong(2),
"configVersion" : 13
}
],
"ok" : 1
}

REPLICA1:PRIMARY> rs.conf()
{
"_id" : "REPLICA1",
"version" : 13,
"members" : [
{
"_id" : 0,
"host" : "host1.foo.bar.com:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 1,
"host" : "host2.foo.bar.com:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 2,
"host" : "host3.foo.bar.com:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 3,
"host" : "host4.foo.bar.com:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : true,
"priority" : 0,
"tags" : {

},
"slaveDelay" : NumberLong(86400),
"votes" : 1
},
{
"_id" : 4,
"host" : "host5.foo.bar.com:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : true,
"priority" : 0,
"tags" : {

},
"slaveDelay" : NumberLong(86400),
"votes" : 1
},
{
"_id" : 5,
"host" : "host6.foo.bar.com:27016",
"arbiterOnly" : true,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 6,
"host" : "host7.foo.bar.com:27016",
"arbiterOnly" : true,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : 2000,
"heartbeatTimeoutSecs" : 10,
"electionTimeoutMillis" : 10000,
"getLastErrorModes" : {

},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
}
}
}
}

REPLICA1:PRIMARY> db.version()
3.2.18-3.9

Thanks for the output, Douglas, and sorry for the late reply. I set up the same number of members in my test environment to check this. You are getting that error because you used "majority" in your dropUser's write concern.

Your configuration uses two arbiters to make a 7-member replica set, which is not required. Is there a reason you are using two arbiters? Are your members distributed across many data centres? In any case, you can work around this in your current configuration by using a write concern of 3, which your set can satisfy (one PRIMARY and two non-delayed SECONDARY members):


rs1:PRIMARY> db.dropUser("test1", {w: 3, wtimeout: 100})
true

"majority" is calculated from the number of voting members. Your replica set has 7 voting members, so the majority is 4, and since arbiters hold no data, the write must be acknowledged by 4 data-bearing members. Only 3 of your data-bearing members are current; the other 2 are 24 hr delayed, so the write cannot be acknowledged in time and times out. Replicating to 3 members, however, is possible. Removing the arbiter members, or converting the delayed members to regular ones, would also make {w: "majority"} work.
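To illustrate the arithmetic (a plain JavaScript sketch, not shell code; the member counts are taken from your rs.conf()):

```javascript
// "majority" write concern is calculated from the voting members.
const votingMembers = 7;                             // 5 data-bearing + 2 arbiters
const majority = Math.floor(votingMembers / 2) + 1;  // calculated majority

// Arbiters hold no data, so only data-bearing members can acknowledge a write.
const upToDateDataBearing = 3;                       // PRIMARY + 2 non-delayed SECONDARYs

console.log(majority);                               // 4
// 3 < 4, so a majority write must wait on a 24 hr delayed member:
console.log(upToDateDataBearing >= majority);        // false
```

Dropping the two arbiters would bring the voting members down to 5 and the calculated majority down to 3, which your three current data-bearing members can satisfy.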

For example, to make "majority" work, remove the slave delay as follows (assuming replication catches up quickly):


rs1:PRIMARY> cfg = rs.conf()
rs1:PRIMARY> cfg.members[3].priority=1
1
rs1:PRIMARY> cfg.members[4].priority=1
1
rs1:PRIMARY> cfg.members[3].slaveDelay=0
0
rs1:PRIMARY> cfg.members[4].slaveDelay=0
0
rs1:PRIMARY> rs.reconfig(cfg)
{ "ok" : 1 }
rs1:PRIMARY> rs.conf()
{
"_id" : "rs1",
"version" : 25,
"protocolVersion" : NumberLong(1),
"members" : [
{
"_id" : 0,
"host" : "172.17.0.4:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 1,
"host" : "172.17.0.6:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 2,
"host" : "172.17.0.7:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 3,
"host" : "172.17.0.8:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 4,
"host" : "172.17.0.9:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 5,
"host" : "172.17.0.11:27017",
"arbiterOnly" : true,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 6,
"host" : "172.17.0.10:27017",
"arbiterOnly" : true,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : 2000,
"heartbeatTimeoutSecs" : 10,
"electionTimeoutMillis" : 10000,
"getLastErrorModes" : {

},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 100
},
"replicaSetId" : ObjectId("5aebcf5d7f53d69e7ddba762")
}
}
rs1:PRIMARY> db.dropUser("test3", {w: "majority", wtimeout: 100})
true
rs1:PRIMARY>