Percona MongoDB Operator Multi-Cluster Switchover/Failover

Description:

I went through the tutorial to set up Percona MongoDB multi-cluster with the K8s Operator: About Multi-cluster - Percona Operator for MongoDB

Everything went smoothly. I now want to simulate a switchover/failover, but I was not able to find instructions.

What I tried:
Start with:

  • 2 K8S clusters, A and B
  • Percona MongoDB operator deployed in each
  • Deploy an RS with 3 nodes in B, unmanaged, exposed with type NodePort, sharding disabled.
  • Deploy an RS with 3 nodes in A, managed, exposed with type NodePort, sharding disabled. Use B's nodes as externalNodes (using host IPs and NodePort ports).
  • Make sure the credentials and certificates are the same in both clusters.
  • The setup is successful: I have a 6-node ReplicaSet and both PSMDB CRs are "ready" (a quick verification sketch follows this list).
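
A quick way to verify the cross-site membership from mongosh (a minimal sketch, assuming a connection to any of the exposed NodePort endpoints with cluster-admin credentials):

// print each member's address, replica set state and health; all six should be healthy
rs.status().members.forEach(m => print(m.name, m.stateStr, m.health))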

As a switchover process:

  • Change PSMDB CR in A from managed to unmanaged
  • Connect to MongoDB and remove A's nodes from the ReplicaSet (see the sketch after this list). Now the ReplicaSet has 3 nodes, the nodes from B
  • Change PSMDB CR in B from unmanaged to managed
  • The CR in B moves to an error state with:
    message: 'Error: failed to update config members: delete: write mongo config: replSetReconfig:
      (NodeNotFound) No host described in new configuration with {version: 131364, term: 2}
      for replica set rs0 maps to this node'
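
For step 2, the removal can be done with something like the following (a sketch, assuming A's nodes are the members with _id 0, 1 and 2, as in the rs.conf() shown later, and that a forced reconfig is acceptable):

// keep only B's members (_id 3, 4 and 5) in the configuration
cfg = rs.conf()
cfg.members = cfg.members.filter(m => m._id >= 3)
rs.reconfig(cfg, {force: true})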

Hi, please take a look at Disaster Recovery for MongoDB on Kubernetes

Thanks for sharing the link. The failover (disaster recovery) process worked as described.

Now, in the case of a switchover, where both clusters are healthy and I just want to switch the PRIMARY/ACTIVE from A to B, I did not manage to make it work.

What I did:

  • Started with the same setup as described in the blog post
  • Changed the CR in A from managed to unmanaged
  • Left the ReplicaSet config as it was (6 nodes, all healthy)
  • Changed the CR in B from unmanaged to managed
  • The CR in B moves to an error state:
  message: 'Error: failed to update config members: fix member hostname: write mongo
    config: replSetReconfig: (NewReplicaSetConfigurationIncompatible) Found two member
    configurations with same host field, members.1.host == members.4.host == <host IP>:<node port>'
  mongoImage: percona/percona-server-mongodb:7.0.15-9-multi
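
To see the duplicate host condition the error refers to, a simple check from mongosh on any member is (a sketch; each external address should appear only once):

// list the host field of every member in the current configuration
rs.conf().members.map(m => m.host)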

Later edit: Removing the tags from all Mongo nodes and moving the primary to B instead of A resulted in a successful switchover. However, the process involved quite a few manual steps, and I'm unsure whether it is entirely deterministic.

What you did looks good to me. Can you share the output of rs.conf()? Also, what is the value of clusterServiceDNSMode?

clusterServiceDNSMode: "External"

rs.conf() before switchover:

{
  _id: 'rs0',
  version: 78580,
  term: 3,
  members: [
    {
      _id: 0,
      host: '1.2.3.4:30864',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 2,
      tags: {
        serviceName: 'mongodb',
        nodeName: 'mw-3',
        podName: 'mongodb-rs0-0'
      },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 1,
      host: '1.2.3.5:31383',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 2,
      tags: {
        serviceName: 'mongodb',
        nodeName: 'mw-1',
        podName: 'mongodb-rs0-1'
      },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 2,
      host: '1.2.3.6:32351',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 2,
      tags: {
        serviceName: 'mongodb',
        nodeName: 'mw-2',
        podName: 'mongodb-rs0-2'
      },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 3,
      host: '1.2.3.7:30308',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 1,
      tags: { external: 'true' },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 4,
      host: '1.2.3.8:32118',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 1,
      tags: { external: 'true' },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 5,
      host: '1.2.3.9:30498',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 0,
      tags: { external: 'true' },
      secondaryDelaySecs: Long('0'),
      votes: 0
    }
  ],
  protocolVersion: Long('1'),
  writeConcernMajorityJournalDefault: true,
  settings: {
    chainingAllowed: true,
    heartbeatIntervalMillis: 2000,
    heartbeatTimeoutSecs: 10,
    electionTimeoutMillis: 10000,
    catchUpTimeoutMillis: -1,
    catchUpTakeoverDelayMillis: 30000,
    getLastErrorModes: {},
    getLastErrorDefaults: { w: 1, wtimeout: 0 },
    replicaSetId: ObjectId('67dbf1520289a4b6b57b904b')
  }
}

Clean up the tags and switch the primary:

// read the current replica set configuration
cfg = rs.config()
// clear the operator-managed tags from all six members
cfg.members[0].tags = {}
cfg.members[1].tags = {}
cfg.members[2].tags = {}
cfg.members[3].tags = {}
cfg.members[4].tags = {}
cfg.members[5].tags = {}
// raise the priority of member _id 3 (a node in B) so it wins the next election
cfg.members[3].priority = 10
// apply the changes as a forced reconfiguration
rs.reconfig(cfg, {force: true})
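
To confirm the primary actually moved to a node in B before flipping the CRs, a check like the following can be run from any member (a minimal sketch):

// print the member currently reporting the PRIMARY state
rs.status().members.filter(m => m.stateStr === 'PRIMARY').map(m => m.name)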

rs.conf() after switchover:

{
  _id: 'rs0',
  version: 148264,
  term: 4,
  members: [
    {
      _id: 0,
      host: '1.2.3.4:30864',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 1,
      tags: { external: 'true' },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 1,
      host: '1.2.3.5:31383',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 1,
      tags: { external: 'true' },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 2,
      host: '1.2.3.6:32351',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 0,
      tags: { external: 'true' },
      secondaryDelaySecs: Long('0'),
      votes: 0
    },
    {
      _id: 3,
      host: '1.2.3.7:30308',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 2,
      tags: {
        podName: 'mongodb-rs0-0',
        serviceName: 'mongodb',
        nodeName: 'mw-3'
      },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 4,
      host: '1.2.3.8:32118',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 2,
      tags: {
        podName: 'mongodb-rs0-1',
        serviceName: 'mongodb',
        nodeName: 'mw-2'
      },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 5,
      host: '1.2.3.9:30498',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 2,
      tags: {
        nodeName: 'mw-1',
        podName: 'mongodb-rs0-2',
        serviceName: 'mongodb'
      },
      secondaryDelaySecs: Long('0'),
      votes: 1
    }
  ],
  protocolVersion: Long('1'),
  writeConcernMajorityJournalDefault: true,
  settings: {
    chainingAllowed: true,
    heartbeatIntervalMillis: 2000,
    heartbeatTimeoutSecs: 10,
    electionTimeoutMillis: 10000,
    catchUpTimeoutMillis: -1,
    catchUpTakeoverDelayMillis: 30000,
    getLastErrorModes: {},
    getLastErrorDefaults: { w: 1, wtimeout: 0 },
    replicaSetId: ObjectId('67dbf1520289a4b6b57b904b')
  }
}

Tag cleanup should be performed by the operator when you change site B to managed. I believe there might be an issue with the cr.yaml on site B. Anyway, we will update the documentation with clear steps for switchover. Feel free to subscribe to Jira for updates.

In the case of a failover, when I manually remove A's nodes from rs.conf() as described in the blog post, I can confirm the operator refreshes the tags. In the case of a switchover, where I want to keep all the nodes but switch the "active" cluster (that is, the active operator and the primary node), the operator gets confused by the existing tags and fails as described above.

Anyway, it could indeed be a config error on my end, so a documented, reproducible procedure would certainly help me identify any misconfiguration.