How to plan database availability during monthly Linux patching

We have nearly 500 MongoDB replica sets (plain replica sets + sharded clusters) spanning two data centers. Each cluster has a minimum of 3 members, so on average we have 500 x 3 = 1500 members (a combination of physical and virtual servers, depending on application preferences).

Due to enterprise standards, every host should undergo monthly patching, which creates a big challenge for DBAs: planning database server high availability during the monthly Linux patching.

The challenge we currently face is ensuring database server availability through manual checks, so that only one server from each cluster goes for maintenance at a time, while the patching activity starts in the second week of the month and ends in the last week.

Wondering how the big companies manage Linux patching while achieving high availability, i.e. keeping a majority of the voting members in each cluster up during the patching?

And how to avoid manual intervention?

Hi Venug.

This is a good topic to open a discussion about. MongoDB's replica sets mean that regular maintenance doesn't have to cause any downtime for the database clients, but (important) you must not take down a majority of the nodes in any given replica set at the same time.

Rule of ‘Minority maximum death’?

Before stopping and restarting any mongod node, or the host it is on, the administration script/program needs to determine what the other nodes in the same replica set are and confirm that a majority (i.e. two, for the typical three-node replica set) are healthy before stopping that node. 'Healthy' = either PRIMARY, or SECONDARY with no noticeable replication lag.
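
As a rough illustration (a sketch, not a drop-in tool), the gate before each stop could look like the following. It assumes pymongo, a user allowed to run replSetGetStatus, and a plain PRIMARY/SECONDARY/SECONDARY set; the host names and the 10-second lag threshold are made up:

```python
# Sketch: confirm that a majority of the members will remain healthy after we
# stop the target node. "Healthy" = PRIMARY, or SECONDARY with little lag.
# Assumes pymongo and a plain 3-node PSS set (no arbiters/hidden members);
# host names and the 10-second lag threshold are illustrative only.
from pymongo import MongoClient

MAX_LAG_SECONDS = 10

def safe_to_stop(target_host, seed_host="mongo-a.example.com:27017"):
    status = MongoClient(seed_host).admin.command("replSetGetStatus")
    members = status["members"]

    primaries = [m for m in members if m["stateStr"] == "PRIMARY"]
    if not primaries:
        return False  # no primary at all - definitely not the time to patch
    primary_optime = primaries[0]["optimeDate"]

    healthy_others = 0
    for m in members:
        if m["name"].startswith(target_host):
            continue  # don't count the node we intend to stop
        if m["stateStr"] == "PRIMARY":
            healthy_others += 1
        elif m["stateStr"] == "SECONDARY":
            lag = (primary_optime - m["optimeDate"]).total_seconds()
            if lag <= MAX_LAG_SECONDS:
                healthy_others += 1

    # A majority of all members must still be up once the target is stopped.
    return healthy_others >= len(members) // 2 + 1

if __name__ == "__main__":
    print(safe_to_stop("mongo-b.example.com"))
```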

Parallelization for the win

You can do replica sets in parallel, so in theory it doesn't take longer whether you have one replica set or five hundred. Of course, in practice you'll probably stagger launches of the update procedure so there isn't too much to watch at any given moment.
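
For example, a launcher could start one worker per replica set and simply sleep between launches; the pool size, the 30-second stagger, and the worker body here are placeholders:

```python
# Sketch: one patch worker per replica set, run in parallel but with staggered
# launches so the whole fleet doesn't start in the same moment. The pool size,
# the 30-second stagger, and the worker body are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor

def patch_replica_set(rs_name):
    # ... rolling restart of this one replica set, one member at a time ...
    print(f"patching {rs_name}")

replica_sets = [f"rs{i:03d}" for i in range(500)]

with ThreadPoolExecutor(max_workers=50) as pool:
    for rs in replica_sets:
        pool.submit(patch_replica_set, rs)
        time.sleep(30)  # stagger launches; tune to how much you want to watch
```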

The tricky parts

The tricky part is maintaining the 'Minority maximum death' rule as the script/program runs.

I think the root issue behind this is that devops tools are built without a sense of distributed data. They're made to run per server, using only what state they can sense locally on the server.

Parallel launch within the same replica set can easily lead to race conditions. For example, a local script running on hosts A, B and C of the same replica set might first run a check: 'Are the other hosts in a healthy PRIMARY or SECONDARY state right now?'. If the checks run in parallel at the starting moment, the answer will be true for all of them. Then A, B and C all restart a moment later, taking down the entire replica set simultaneously.

Safety mechanisms have to be programmed. In short, I think a semaphore of some kind should be put into the replica set's own data so external agents can see which node is scheduled to be restarted at any given moment (a rough sketch follows at the end of this post). But at this point we should start determining some concrete points about your case:

  • What is being patched? The entire host server, or just the mongod binaries? (The former is what I assume.)
  • What is the sys admin method? Bash scripts and passwordless SSH? A devops tool like Puppet, Ansible, etc? Program using API of a computing service like AWS or whole-real-bare-metal system like Ubuntu’s MAAS?
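
To make the semaphore idea above concrete, here is a minimal sketch of a restart lock kept in the replica set's own data, assuming pymongo; the database/collection names, the 30-minute expiry, and the connection string are invented for illustration:

```python
# Sketch of a restart semaphore stored in the replica set itself: an agent must
# win this lock before restarting its node, so two hosts of the same set cannot
# restart at the same moment. Database/collection names, the 30-minute expiry,
# and the connection string are invented for illustration.
import datetime
import socket
from pymongo import MongoClient, ReturnDocument
from pymongo.errors import DuplicateKeyError

LOCK_TTL = datetime.timedelta(minutes=30)

def try_acquire_restart_lock(client):
    now = datetime.datetime.utcnow()
    coll = client["dba_maintenance"]["restart_lock"]
    try:
        # Atomically take the lock if it is free, expired, or doesn't exist yet.
        doc = coll.find_one_and_update(
            {"_id": "patching", "$or": [
                {"holder": None},
                {"acquired_at": {"$lt": now - LOCK_TTL}},
            ]},
            {"$set": {"holder": socket.gethostname(), "acquired_at": now}},
            upsert=True,
            return_document=ReturnDocument.AFTER,
        )
    except DuplicateKeyError:
        return False  # someone else holds a live lock
    return doc["holder"] == socket.gethostname()

def release_restart_lock(client):
    client["dba_maintenance"]["restart_lock"].update_one(
        {"_id": "patching", "holder": socket.gethostname()},
        {"$set": {"holder": None}})

if __name__ == "__main__":
    client = MongoClient("mongodb://hostA:27017,hostB:27017/?replicaSet=rs001")
    if try_acquire_restart_lock(client):
        print("lock acquired - safe to schedule this node's restart")
    else:
        print("another node holds the lock - wait and retry later")
```

The point of doing it with a single find_one_and_update is that taking the lock is atomic on the primary, so two agents asking 'is anyone restarting?' at the same moment cannot both conclude they are free to go.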
1 Like

Thanks Akira for your thoughts on the post.

If I understand correctly, parallelization here is essentially rolling upgrades performed per replica set, to ensure we do not have majority deaths.

Even for rolling-fashion OS upgrades, we are following the steps below:

  1. Linux team (sysadmins) releases the list of servers that are planned for patching in the first week of every month
  2. DBA admins process the list, identify the servers so that at least 2 members of each cluster stay available, and prepare the master list in a NAS location
  3. The upgrade process (a Puppet process run by the sysadmins) refers to the master list and upgrades the database servers one at a time, as specified in the master list

But this process still involves at least 3 days of effort from a single DBA resource to verify and prepare the master list.
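
To give an idea of what step 2 involves, the manual work is roughly equivalent to something like the following; the file paths, input formats, and the host-to-replica-set mapping here are only illustrative:

```python
# Rough illustration of the manual "prepare the master list" step: take the
# sysadmins' list of hosts due for patching, group them by replica set, and
# schedule at most one member per replica set per patching window so the other
# members of each cluster stay up. File paths, input formats, and the
# host-to-replica-set mapping are illustrative only.
import csv
from collections import defaultdict

def build_master_list(patch_list_path, topology_path, out_path):
    # topology.csv: host,replica_set (exported from the inventory/CMDB)
    host_to_rs = {}
    with open(topology_path) as f:
        for row in csv.DictReader(f):
            host_to_rs[row["host"]] = row["replica_set"]

    # patch_list.txt: one host per line, as released by the Linux team
    with open(patch_list_path) as f:
        hosts = [line.strip() for line in f if line.strip()]

    by_rs = defaultdict(list)
    for host in hosts:
        by_rs[host_to_rs[host]].append(host)

    # One member per replica set per window; the rest wait for a later window.
    with open(out_path, "w") as out:
        window = 1
        while any(by_rs.values()):
            for rs, members in sorted(by_rs.items()):
                if members:
                    out.write(f"window{window},{rs},{members.pop(0)}\n")
            window += 1

if __name__ == "__main__":
    build_master_list("patch_list.txt", "topology.csv", "master_list.csv")
```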

Wondering how large organizations manage the upgrade process on Linux servers that host MongoDB instances, whether monthly, quarterly, or on whatever defined schedule.

As you highlighted, safety mechanisms have to be programmed.

  • What is being patched? The entire host server, or just the mongod binaries? (The former is what I assume.)
  • What is the sys admin method? Bash scripts and passwordless SSH? A devops tool like Puppet, Ansible, etc? Program using API of a computing service like AWS or whole-real-bare-metal system like Ubuntu’s MAAS?

The OS (RHEL) is being patched monthly.
The sysadmin method is mostly through Puppet, using flyway tools.

1 Like

Hi

The OS (RHEL) is being patched monthly.
The sysadmin method is mostly through Puppet, using flyway tools.

Got it.

I had the experience of programming a solution using Ansible ~3 years ago, whilst I was a contractor DBA for a while. Ansible was a fixed requirement because it had already been used for the non-distributed databases in that department. I found I could use Ansible’s group hierarchy well for organizing environments, clusters, and replica sets. Furthermore, the number of nodes at that site wasn’t big, so I could review the list of servers manually within several page scrolls, at least for a single environment (prod/qa/dev). So I could, for example, have a subgroup for every replica set.

But then there was no way (at least with Ansible at the time) to get it to do work serially within each group.

If I recall correctly, I executed the same shell script in parallel. The script was very careful to wait and not proceed if more than one replica set member was down; when it did do a restart, it exited. I can’t recall what I used to choose the next member to restart, but the one with the maximum uptime at a given time seems like a suitable property.
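
Rewritten in Python with pymongo (the original was a shell script), that per-run decision would look roughly like this; the seed host is a placeholder, and the check is stricter than 'more than one down' because in a 3-node set even a single member already down means restarting another would cost the majority:

```python
# Sketch of the per-run decision: refuse to act if any member is already down,
# otherwise pick the healthy SECONDARY with the longest uptime as the next one
# to restart (i.e. the member patched least recently). The field names follow
# replSetGetStatus; the seed host is a placeholder.
from pymongo import MongoClient

def pick_next_member(seed_host="rs-member-a.example.com:27017"):
    members = MongoClient(seed_host).admin.command("replSetGetStatus")["members"]

    if any(m.get("health", 0) != 1 for m in members):
        return None  # a member is already down - wait, don't make it worse

    secondaries = [m for m in members if m["stateStr"] == "SECONDARY"]
    if not secondaries:
        return None  # nothing safe to restart right now

    # "uptime" is seconds since that member's mongod came up.
    return max(secondaries, key=lambda m: m.get("uptime", 0))["name"]

if __name__ == "__main__":
    print(pick_next_member() or "nothing to do - retry later")
```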

If Puppet (or modern Ansible) has a way to instruct that work be performed serially within a subgroup, or to follow a “disruption budget” of 1, that would make it simpler.

Also, if it is a sharded cluster, the config server replica set should be done first, then the shards in parallel after that.

In your case you’re not just doing a ‘systemctl restart mongod’; you’re doing a ‘systemctl stop mongod’ plus a reboot, then a ‘systemctl start mongod’ after the server is communicating again, so that’s a bit different.
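
Very roughly, and only as a sketch of the ordering, that sequence over passwordless SSH could look like the following; the sudo rights, sleep intervals, timeouts, and the placeholder for the actual OS patching step are all assumptions:

```python
# Sketch of stop -> patch -> reboot -> wait for SSH -> start mongod -> wait
# until the member is back in PRIMARY/SECONDARY state, before anything else in
# the replica set is touched. Sudo rights, sleep intervals, timeouts, and the
# patching step itself are assumptions.
import subprocess
import time
from pymongo import MongoClient
from pymongo.errors import PyMongoError

def ssh(host, command):
    return subprocess.run(["ssh", host, command], check=False).returncode

def patch_and_reboot(host, port=27017, timeout=1800):
    ssh(host, "sudo systemctl stop mongod")
    # ... run the OS patching step here (yum/dnf, Puppet run, etc.) ...
    ssh(host, "sudo systemctl reboot")

    deadline = time.time() + timeout
    # Wait for the host to answer SSH again after the reboot.
    while time.time() < deadline:
        if ssh(host, "true") == 0:
            break
        time.sleep(15)
    else:
        raise RuntimeError(f"{host} did not come back within {timeout}s")

    ssh(host, "sudo systemctl start mongod")

    # Wait until the member reports PRIMARY or SECONDARY before moving on.
    while time.time() < deadline:
        try:
            state = MongoClient(host, port, directConnection=True,
                                serverSelectionTimeoutMS=5000
                                ).admin.command("replSetGetStatus")["myState"]
            if state in (1, 2):  # 1 = PRIMARY, 2 = SECONDARY
                return
        except PyMongoError:
            pass
        time.sleep(15)
    raise RuntimeError(f"mongod on {host} did not rejoin within {timeout}s")
```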

2 Likes

I feel the programmatic way is the only solution.


1 Like