Percona Server for MongoDB [Stuck at Initializing]

Hi,

After making some changes to the cluster config (increasing the replica set size), the cluster has now been stuck in the Initializing state for days.

The operator reports the following error:

{"level":"error","ts":1654589464.6081927,"logger":"controller_psmdb","msg":"failed to reconcile cluster","Request.Namespace":"percona-support-dbs","Request.Name":"percona-server-psmdb","replset":"rs0","error":"fix tags: write mongo config: replSetReconfig: (NewReplicaSetConfigurationIncomp │ratible) version field value of 2147580194 is out of range","errorVerbose":"(NewReplicaSetConfigurationIncompatible) version field value of 2147580194 is out of range\nreplSetReconfig\ngithub.com/percona/percona-server-mongodb-operator/pkg/psmdb/mongo.WriteConfig\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/psmdb/mongo/mongo.go:232\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileCluster\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:263\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:493\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581\nfix tags: write mongo config\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileCluster\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:264\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:493\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor 
/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}

The above error seems to appear when the operator tries to fix the cluster state and force a cluster reconfiguration using rs.reconfig().

I tried to do it manually, but I got the following:

rs.reconfig(rs.config(), {force:true});

{
	"operationTime" : Timestamp(1654601016, 1),
	"ok" : 0,
	"errmsg" : "version field value of 2147560657 is out of range",
	"code" : 103,
	"codeName" : "NewReplicaSetConfigurationIncompatible",
	"$gleStats" : {
		"lastOpTime" : Timestamp(0, 0),
		"electionId" : ObjectId("7fffffff0000000000000005")
	},
	"lastCommittedOpTime" : Timestamp(1654601016, 1),
	"$configServerState" : {
		"opTime" : {
			"ts" : Timestamp(1654601021, 3),
			"t" : NumberLong(4)
		}
	},
	"$clusterTime" : {
		"clusterTime" : Timestamp(1654601021, 3),
		"signature" : {
			"hash" : BinData(0,"Uivo7xR187DxHdA6OaRUMfuoXXU="),
			"keyId" : NumberLong("7103095769724354583")
		}
	}
}
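
For context, the rejected value (2147560657) is above 2147483647, the signed 32-bit maximum, while the configVersion reported by rs.status() below is 2147479631, i.e. already just under that limit. A quick, read-only sanity check on the primary (hypothetical, just to confirm that assumption) could be:

// print the current replica set config version; replSetReconfig appears
// to reject anything above 2147483647 (2^31 - 1)
var cfg = rs.conf();
print("current config version: " + cfg.version);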

Cluster state:

rs0:PRIMARY> rs.status()
{
	"set" : "rs0",
	"date" : ISODate("2022-06-07T11:13:54.278Z"),
	"myState" : 1,
	"term" : NumberLong(5),
	"syncSourceHost" : "",
	"syncSourceId" : -1,
	"heartbeatIntervalMillis" : NumberLong(2000),
	"majorityVoteCount" : 3,
	"writeMajorityCount" : 3,
	"votingMembersCount" : 5,
	"writableVotingMembersCount" : 4,
	"optimes" : {
		"lastCommittedOpTime" : {
			"ts" : Timestamp(1654600431, 122),
			"t" : NumberLong(5)
		},
		"lastCommittedWallTime" : ISODate("2022-06-07T11:13:51.444Z"),
		"readConcernMajorityOpTime" : {
			"ts" : Timestamp(1654600431, 122),
			"t" : NumberLong(5)
		},
		"readConcernMajorityWallTime" : ISODate("2022-06-07T11:13:51.444Z"),
		"appliedOpTime" : {
			"ts" : Timestamp(1654600431, 122),
			"t" : NumberLong(5)
		},
		"durableOpTime" : {
			"ts" : Timestamp(1654600431, 122),
			"t" : NumberLong(5)
		},
		"lastAppliedWallTime" : ISODate("2022-06-07T11:13:51.444Z"),
		"lastDurableWallTime" : ISODate("2022-06-07T11:13:51.444Z")
	},
	"lastStableRecoveryTimestamp" : Timestamp(1654600376, 1),
	"electionCandidateMetrics" : {
		"lastElectionReason" : "electionTimeout",
		"lastElectionDate" : ISODate("2022-06-07T10:03:43.584Z"),
		"electionTerm" : NumberLong(5),
		"lastCommittedOpTimeAtElection" : {
			"ts" : Timestamp(1654596120, 1),
			"t" : NumberLong(4)
		},
		"lastSeenOpTimeAtElection" : {
			"ts" : Timestamp(1654596120, 1),
			"t" : NumberLong(4)
		},
		"numVotesNeeded" : 3,
		"priorityAtElection" : 2,
		"electionTimeoutMillis" : NumberLong(10000),
		"numCatchUpOps" : NumberLong(0),
		"newTermStartDate" : ISODate("2022-06-07T10:03:46.207Z"),
		"wMajorityWriteAvailabilityDate" : ISODate("2022-06-07T10:03:48.934Z")
	},
	"electionParticipantMetrics" : {
		"votedForCandidate" : true,
		"electionTerm" : NumberLong(4),
		"lastVoteDate" : ISODate("2022-06-07T09:55:31.080Z"),
		"electionCandidateMemberId" : 4,
		"voteReason" : "",
		"lastAppliedOpTimeAtElection" : {
			"ts" : Timestamp(1654595382, 1),
			"t" : NumberLong(3)
		},
		"maxAppliedOpTimeInSet" : {
			"ts" : Timestamp(1654595382, 1),
			"t" : NumberLong(3)
		},
		"priorityAtElection" : 2
	},
	"members" : [
		{
			"_id" : 0,
			"name" : "IP:27017",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 4204,
			"optime" : {
				"ts" : Timestamp(1654600431, 122),
				"t" : NumberLong(5)
			},
			"optimeDurable" : {
				"ts" : Timestamp(1654600431, 122),
				"t" : NumberLong(5)
			},
			"optimeDate" : ISODate("2022-06-07T11:13:51Z"),
			"optimeDurableDate" : ISODate("2022-06-07T11:13:51Z"),
			"lastAppliedWallTime" : ISODate("2022-06-07T11:13:51.444Z"),
			"lastDurableWallTime" : ISODate("2022-06-07T11:13:51.444Z"),
			"lastHeartbeat" : ISODate("2022-06-07T11:13:52.605Z"),
			"lastHeartbeatRecv" : ISODate("2022-06-07T11:13:54.083Z"),
			"pingMs" : NumberLong(0),
			"lastHeartbeatMessage" : "",
			"syncSourceHost" : "IP:27017",
			"syncSourceId" : 4,
			"infoMessage" : "",
			"configVersion" : NumberLong(2147479631),
			"configTerm" : -1
		},
		{
			"_id" : 1,
			"name" : "IP:27017",
			"health" : 1,
			"state" : 7,
			"stateStr" : "ARBITER",
			"uptime" : 4707,
			"lastHeartbeat" : ISODate("2022-06-07T11:13:52.491Z"),
			"lastHeartbeatRecv" : ISODate("2022-06-07T11:13:52.490Z"),
			"pingMs" : NumberLong(0),
			"lastHeartbeatMessage" : "",
			"syncSourceHost" : "",
			"syncSourceId" : -1,
			"infoMessage" : "",
			"configVersion" : NumberLong(2147479631),
			"configTerm" : -1
		},
		{
			"_id" : 2,
			"name" : "IP:27017",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 4172,
			"optime" : {
				"ts" : Timestamp(1654600431, 122),
				"t" : NumberLong(5)
			},
			"optimeDurable" : {
				"ts" : Timestamp(1654600431, 122),
				"t" : NumberLong(5)
			},
			"optimeDate" : ISODate("2022-06-07T11:13:51Z"),
			"optimeDurableDate" : ISODate("2022-06-07T11:13:51Z"),
			"lastAppliedWallTime" : ISODate("2022-06-07T11:13:51.444Z"),
			"lastDurableWallTime" : ISODate("2022-06-07T11:13:51.444Z"),
			"lastHeartbeat" : ISODate("2022-06-07T11:13:53.247Z"),
			"lastHeartbeatRecv" : ISODate("2022-06-07T11:13:53.774Z"),
			"pingMs" : NumberLong(0),
			"lastHeartbeatMessage" : "",
			"syncSourceHost" : "IP:27017",
			"syncSourceId" : 4,
			"infoMessage" : "",
			"configVersion" : NumberLong(2147479631),
			"configTerm" : -1
		},
		{
			"_id" : 3,
			"name" : "IP:27017",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 4803,
			"optime" : {
				"ts" : Timestamp(1654600431, 122),
				"t" : NumberLong(5)
			},
			"optimeDate" : ISODate("2022-06-07T11:13:51Z"),
			"lastAppliedWallTime" : ISODate("2022-06-07T11:13:51.444Z"),
			"lastDurableWallTime" : ISODate("2022-06-07T11:13:51.444Z"),
			"syncSourceHost" : "",
			"syncSourceId" : -1,
			"infoMessage" : "",
			"electionTime" : Timestamp(1654596223, 1),
			"electionDate" : ISODate("2022-06-07T10:03:43Z"),
			"configVersion" : NumberLong(2147479631),
			"configTerm" : -1,
			"self" : true,
			"lastHeartbeatMessage" : ""
		},
		{
			"_id" : 4,
			"name" : "IP:27017",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 4220,
			"optime" : {
				"ts" : Timestamp(1654600431, 122),
				"t" : NumberLong(5)
			},
			"optimeDurable" : {
				"ts" : Timestamp(1654600431, 122),
				"t" : NumberLong(5)
			},
			"optimeDate" : ISODate("2022-06-07T11:13:51Z"),
			"optimeDurableDate" : ISODate("2022-06-07T11:13:51Z"),
			"lastAppliedWallTime" : ISODate("2022-06-07T11:13:51.444Z"),
			"lastDurableWallTime" : ISODate("2022-06-07T11:13:51.444Z"),
			"lastHeartbeat" : ISODate("2022-06-07T11:13:52.741Z"),
			"lastHeartbeatRecv" : ISODate("2022-06-07T11:13:53.971Z"),
			"pingMs" : NumberLong(0),
			"lastHeartbeatMessage" : "",
			"syncSourceHost" : "IP:27017",
			"syncSourceId" : 3,
			"infoMessage" : "",
			"configVersion" : NumberLong(2147479631),
			"configTerm" : -1
		}
	],
	"ok" : 1,
	"$gleStats" : {
		"lastOpTime" : Timestamp(0, 0),
		"electionId" : ObjectId("7fffffff0000000000000005")
	},
	"lastCommittedOpTime" : Timestamp(1654600431, 122),
	"$configServerState" : {
		"opTime" : {
			"ts" : Timestamp(1654600433, 1),
			"t" : NumberLong(4)
		}
	},
	"$clusterTime" : {
		"clusterTime" : Timestamp(1654600433, 1),
		"signature" : {
			"hash" : BinData(0,"dQfRZDpSTcHLlCYrTfHRlYznJRA="),
			"keyId" : NumberLong("7103095769724354583")
		}
	},
	"operationTime" : Timestamp(1654600431, 122)
}

What could be the cause here, and how can it be fixed?


Hi!
Thanks for your report. Could you please share with us the following information:

  1. The age of your cluster (how long it has been running)
  2. Operator version / MongoDB version
  3. The platform you use
  4. Steps to reproduce
  5. The output of the following command
    kubectl get psmdb -o yaml

Hi,

It’s a new cluster running in k8s.
The operator version is 1.12 and MongoDB is 4.4.

Currently the cluster is in the ready state after modifying the operator's WriteConfig function so that the replSetReconfig command is sent with force set to false.

Before that, with force: true, the cluster was stuck in the Initializing state. I'm not sure exactly what happened, but the issue appeared after making some changes to the Percona Server cluster configuration.
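
As far as I understand (my assumption, I have not checked the server source), a non-forced reconfig only increases the config version by one, while a forced reconfig lets the server assign a much larger pseudo-random version, which is what eventually pushed it past the 32-bit limit here. Roughly, in shell terms:

// illustrative only, not something I ran against this cluster
var cfg = rs.conf();
rs.reconfig(cfg);                   // non-forced: the shell helper bumps cfg.version by one
// rs.reconfig(cfg, {force: true}); // forced: the server picks a much larger version itself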


We found the problem and are going to fix it in the next release. You can check this task for updates: [K8SPSMDB-666] Don't force reconfig by default - Percona JIRA


Thanks @Natalia_Marukovich

How can this issue be fixed? Could I have the URL of the fix PR?


Hi @reab, it is under review right now: K8SPSMDB-666: Don't force reconfig by default by egegunes · Pull Request #980 · percona/percona-server-mongodb-operator · GitHub


Great news, thanks @Slava_Sarzhan!

I tested the changes and they seem to work with a fresh cluster. However, after applying the same changes to a running cluster with data, I'm still getting the same “out of range” issue as before. I believe I need to reset the version field value. How can I do this?

msg":"failed to reconcile cluster","Request.Namespace":"percona-dbs","Request.Name":"psmdb-db-","replset":"rs0","error":"fix │
│  tags: write mongo config: replSetReconfig: (NewReplicaSetConfigurationIncompatible) version field value of 2147483648 is out of range",