Percona MongoDB Operator Multi-Cluster Switchover/Failover

Description:

I went through the tutorial to set up Percona MongoDB multi-cluster with the K8S Operator: About Multi-cluster - Percona Operator for MongoDB

Everything went smoothly. I now want to simulate a switchover/failover, but I was not able to find instructions.

What I tried:
Start with:

  • 2 K8S clusters, A and B
  • Percona MongoDB operator deployed in each
  • Deploy an RS with 3 nodes in B, unmanaged, exposed with type NodePort. Sharding disabled.
  • Deploy an RS with 3 nodes in A, managed, exposed with type NodePort. Sharding disabled. Use B's nodes as externalNodes (using host IPs and NodePort ports); a rough cr.yaml sketch follows this list.
  • Make sure the credentials and certificates are the same
  • The setup is successful: I have a 6-node replica set and both PSMDB CRs are "ready"
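
For reference, the relevant parts of the two cr.yaml files look roughly like this (a sketch, not the full manifests; secrets/TLS settings that must match between clusters are omitted, and exact field names can vary slightly between operator versions, e.g. expose.type vs expose.exposeType):

# site A cr.yaml (managed / active)
spec:
  unmanaged: false
  sharding:
    enabled: false
  replsets:
    - name: rs0
      size: 3
      expose:
        enabled: true
        type: NodePort
      externalNodes:        # site B's nodes, addressed by host IP and NodePort
        - host: 1.2.3.7
          port: 30308
          priority: 1
          votes: 1
        - host: 1.2.3.8
          port: 32118
          priority: 1
          votes: 1
        - host: 1.2.3.9
          port: 30498
          priority: 0
          votes: 0

# site B cr.yaml (unmanaged / passive)
spec:
  unmanaged: true
  sharding:
    enabled: false
  replsets:
    - name: rs0
      size: 3
      expose:
        enabled: true
        type: NodePort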

As a switchover process:

  • Change the PSMDB CR in A from managed to unmanaged (see the sketch after this list)
  • Connect to MongoDB and remove A's nodes from the ReplicaSet. Now the replica set has 3 nodes, the ones from B
  • Change the PSMDB CR in B from unmanaged to managed
  • The CR in B moves to an error state with the following message: 'Error: failed to update config members: delete: write mongo config: replSetReconfig: (NodeNotFound) No host described in new configuration with {version: 131364, term: 2} for replica set rs0 maps to this node'
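
For clarity, the managed/unmanaged flips in the first and third bullets above are just the spec.unmanaged field in each cr.yaml, roughly:

# site A cr.yaml (demote)
spec:
  unmanaged: true

# site B cr.yaml (promote)
spec:
  unmanaged: false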

Hi, please take a look at the blog post Disaster Recovery for MongoDB on Kubernetes

Thanks for sharing the link. The failover (disaster recovery) process worked as described.

Now, in the case of a switchover, where both clusters are healthy but I just want to switch the PRIMARY/ACTIVE cluster from A to B, I did not manage to make it work.

What I did:

  • Started with the same setup as described in the blog post
  • Changed the CR in A from managed to unmanaged
  • Left the ReplicaSet config as it was (6 nodes, all healthy)
  • Changed the CR in B from unmanaged to managed
  • The CR in B moves to error state:
  message: 'Error: failed to update config members: fix member hostname: write mongo
    config: replSetReconfig: (NewReplicaSetConfigurationIncompatible) Found two member
    configurations with same host field, members.1.host == members.4.host == <host IP>:<node port>'
  mongoImage: percona/percona-server-mongodb:7.0.15-9-multi

Later edit: Removing the tags from all Mongo nodes and moving the primary to B instead of A resulted in a successful switchover. However, the process involved quite a few manual steps, and I'm unsure whether it is entirely deterministic.

What you did looks good to me. Can you share the output of rs.conf()? Also, what is the value of clusterServiceDNSMode?

clusterServiceDNSMode: "External"

rs.conf() before switchover:

{
  _id: 'rs0',
  version: 78580,
  term: 3,
  members: [
    {
      _id: 0,
      host: '1.2.3.4:30864',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 2,
      tags: {
        serviceName: 'mongodb',
        nodeName: 'mw-3',
        podName: 'mongodb-rs0-0'
      },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 1,
      host: '1.2.3.5:31383',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 2,
      tags: {
        serviceName: 'mongodb',
        nodeName: 'mw-1',
        podName: 'mongodb-rs0-1'
      },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 2,
      host: '1.2.3.6:32351',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 2,
      tags: {
        serviceName: 'mongodb',
        nodeName: 'mw-2',
        podName: 'mongodb-rs0-2'
      },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 3,
      host: '1.2.3.7:30308',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 1,
      tags: { external: 'true' },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 4,
      host: '1.2.3.8:32118',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 1,
      tags: { external: 'true' },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 5,
      host: '1.2.3.9:30498',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 0,
      tags: { external: 'true' },
      secondaryDelaySecs: Long('0'),
      votes: 0
    }
  ],
  protocolVersion: Long('1'),
  writeConcernMajorityJournalDefault: true,
  settings: {
    chainingAllowed: true,
    heartbeatIntervalMillis: 2000,
    heartbeatTimeoutSecs: 10,
    electionTimeoutMillis: 10000,
    catchUpTimeoutMillis: -1,
    catchUpTakeoverDelayMillis: 30000,
    getLastErrorModes: {},
    getLastErrorDefaults: { w: 1, wtimeout: 0 },
    replicaSetId: ObjectId('67dbf1520289a4b6b57b904b')
  }
}

Clean up the tags and switch the primary (in mongosh):

// fetch the current replica set config
cfg = rs.config()
// clear the operator-set tags on all six members
cfg.members[0].tags = {}
cfg.members[1].tags = {}
cfg.members[2].tags = {}
cfg.members[3].tags = {}
cfg.members[4].tags = {}
cfg.members[5].tags = {}
// raise the priority of member 3 (a site-B node) so it wins the next election
cfg.members[3].priority = 10
// force the reconfig since several members change at once
rs.reconfig(cfg, {force: true})

rs.conf() after switchover:

{
  _id: 'rs0',
  version: 148264,
  term: 4,
  members: [
    {
      _id: 0,
      host: '1.2.3.4:30864',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 1,
      tags: { external: 'true' },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 1,
      host: '1.2.3.5:31383',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 1,
      tags: { external: 'true' },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 2,
      host: '1.2.3.6:32351',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 0,
      tags: { external: 'true' },
      secondaryDelaySecs: Long('0'),
      votes: 0
    },
    {
      _id: 3,
      host: '1.2.3.7:30308',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 2,
      tags: {
        podName: 'mongodb-rs0-0',
        serviceName: 'mongodb',
        nodeName: 'mw-3'
      },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 4,
      host: '1.2.3.8:32118',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 2,
      tags: {
        podName: 'mongodb-rs0-1',
        serviceName: 'mongodb',
        nodeName: 'mw-2'
      },
      secondaryDelaySecs: Long('0'),
      votes: 1
    },
    {
      _id: 5,
      host: '1.2.3.9:30498',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 2,
      tags: {
        nodeName: 'mw-1',
        podName: 'mongodb-rs0-2',
        serviceName: 'mongodb'
      },
      secondaryDelaySecs: Long('0'),
      votes: 1
    }
  ],
  protocolVersion: Long('1'),
  writeConcernMajorityJournalDefault: true,
  settings: {
    chainingAllowed: true,
    heartbeatIntervalMillis: 2000,
    heartbeatTimeoutSecs: 10,
    electionTimeoutMillis: 10000,
    catchUpTimeoutMillis: -1,
    catchUpTakeoverDelayMillis: 30000,
    getLastErrorModes: {},
    getLastErrorDefaults: { w: 1, wtimeout: 0 },
    replicaSetId: ObjectId('67dbf1520289a4b6b57b904b')
  }
}

Tag cleanup should be performed by the operator when you change site B to managed. I believe there might be an issue with the cr.yaml on site B. Anyway, we will update the documentation with clear steps for switchover. Feel free to subscribe to the Jira ticket for updates.

In case of a failover, when I manually remove A's nodes from the replica set config, as described in the blog post, I can confirm the operator refreshes the tags. In case of a switchover, where I want to keep all the nodes but switch the "active" cluster (i.e., the active operator and the primary node), the operator gets confused by the existing tags and fails as described above.

Anyway, it could indeed be a config error on my end, so a reproducible procedure would certainly help me identify any misconfigurations.

Hi @Ivan_Groenewold, is this still on your radar? I still can't perform the switchover unless I manually clean up the tags.

@Laurentiu_Soica I will check this sometime this week.

I think the issue with the tags might be due to your manually removing nodes from the replica set.

Can you try instead letting the operator handle the changes? Basically, add nodes from site B as external on site A, and add nodes from site A as external on site B.

At switchover time, you set the cluster on A to “unmanaged”, and then on B to “managed”. The operator should take care of promoting a node on B site as primary.
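
Roughly, the idea is that each replset's externalNodes list in one site's cr.yaml points at the other site's endpoints, along these lines (host IPs and NodePorts are placeholders):

# site A cr.yaml, replset rs0
externalNodes:            # site B's nodes
  - host: <B-node-1-IP>
    port: <B-node-1-NodePort>
  - host: <B-node-2-IP>
    port: <B-node-2-NodePort>
  - host: <B-node-3-IP>
    port: <B-node-3-NodePort>

# site B cr.yaml, replset rs0
externalNodes:            # site A's nodes
  - host: <A-node-1-IP>
    port: <A-node-1-NodePort>
  - host: <A-node-2-IP>
    port: <A-node-2-NodePort>
  - host: <A-node-3-IP>
    port: <A-node-3-NodePort>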

I think that’s the process I follow (I do node removal on failover only, not switchover).

In case of a switchover, this is the process (the corresponding CR field changes are sketched after the list):

  • Start with cluster A with unmanaged: false, updateStrategy: SmartUpdate, 3 local nodes, 3 externalNodes; and cluster B with unmanaged: true, updateStrategy: OnDelete, 3 local nodes, 3 externalNodes
  • Demote cluster A: unmanaged: true, updateStrategy: OnDelete, 3 local nodes, 3 externalNodes
  • Promote cluster B: unmanaged: false, updateStrategy: SmartUpdate, 3 local nodes, 3 externalNodes
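
The demote/promote in the last two bullets amounts to flipping these fields in the two cr.yaml files (everything else is left as is):

# site A cr.yaml (demote)
spec:
  unmanaged: true
  updateStrategy: OnDelete

# site B cr.yaml (promote)
spec:
  unmanaged: false
  updateStrategy: SmartUpdate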

At this point, cluster B CR reports:

message: 'Error: failed to update config members: fix member hostname: write mongo
  config: replSetReconfig: (NewReplicaSetConfigurationIncompatible) Found two member
  configurations with same host field, members.1.host == members.4.host == IP:PORT'
mongoImage: percona/percona-server-mongodb:8.0.8-3
mongoVersion: 8.0.8-3
observedGeneration: 4
ready: 3
replsets:
  rs0:
    initialized: true
    ready: 3
    size: 3
    status: ready
size: 3
state: error
I couldn't reproduce the same behavior. In my case the promotion of cluster B works fine. Would you mind opening a bug at https://perconadev.atlassian.net/ and providing the full cr.yaml for both clusters? That way the dev team can take a look.

I tried to add the details here: Jira

I wasn't able to edit the description or add files, so I've added the details as comments.