Percona Backup for MongoDB not listing snapshots

Hi there,

I am facing a strange problem while migrating MongoDB Community Edition from bare metal/VMs to Kubernetes with the Percona Operator for MongoDB. All the pods are up and running, but `kubectl get psmdb-backup` shows no backups. Running the `pbm status` command on the pod reports a healthy status and the agents are up and running, but none of the snapshots we took on the bare-metal setup are listed.

Below is the `pbm status` output from the backup-agent pod on Kubernetes.

Cluster:
========
csrs0:
 - csrs0/percona-prod-clust-csrs0-0.percona-prod-clust-csrs0.mongodb.svc.cluster.local:27017 [P]: pbm-agent v2.3.0 OK
 - csrs0/percona-prod-clust-csrs0-1.percona-prod-clust-csrs0.mongodb.svc.cluster.local:27017 [S]: pbm-agent v2.3.0 OK
 - csrs0/percona-prod-clust-csrs0-2.percona-prod-clust-csrs0.mongodb.svc.cluster.local:27017 [S]: pbm-agent v2.3.0 OK
cfg:
 - cfg/percona-prod-clust-cfg-0.percona-prod-clust-cfg.mongodb.svc.cluster.local:27017 [P]: pbm-agent v2.3.0 OK
 - cfg/percona-prod-clust-cfg-1.percona-prod-clust-cfg.mongodb.svc.cluster.local:27017 [S]: pbm-agent v2.3.0 OK
 - cfg/percona-prod-clust-cfg-2.percona-prod-clust-cfg.mongodb.svc.cluster.local:27017 [S]: pbm-agent v2.3.0 OK
csrs1:
 - csrs1/percona-prod-clust-csrs1-0.percona-prod-clust-csrs1.mongodb.svc.cluster.local:27017 [P]: pbm-agent v2.3.0 OK
 - csrs1/percona-prod-clust-csrs1-1.percona-prod-clust-csrs1.mongodb.svc.cluster.local:27017 [S]: pbm-agent v2.3.0 OK
 - csrs1/percona-prod-clust-csrs1-2.percona-prod-clust-csrs1.mongodb.svc.cluster.local:27017 [S]: pbm-agent v2.3.0 OK


PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
(none)

Backups:
========
S3 eu-central-1 BUCKET URL

And here is the output from the existing bare-metal servers:

Cluster:
========
csrs1:
  - csrs1/mongodb-prod-rs1-node01:27017 [P]: pbm-agent v2.3.0 OK
  - csrs1/mongodb-prod-rs1-node02:27017 [S]: pbm-agent v2.3.0 OK
  - csrs1/mongodb-prod-rs1-arbiter:27017 [!Arbiter]: arbiter node is not supported
configReplSet:
  - configReplSet/mongodb-prod-configsvr-1:27019 [S]: pbm-agent v2.3.0 OK
  - configReplSet/mongodb-prod-configsvr-2:27019 [P]: pbm-agent v2.3.0 OK
  - configReplSet/mongodb-prod-configsvr-3:27019 [S]: pbm-agent v2.3.0 OK
csrs0:
  - csrs0/mongodb-prod-rs0-node01:27017 [P]: pbm-agent v2.3.0 OK
  - csrs0/mongodb-prod-rs0-arbiter:27017 [!Arbiter]: arbiter node is not supported
  - csrs0/mongodb-prod-rs0-node02:27017 [S]: pbm-agent v2.3.0 OK


PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
Snapshot backup "2024-07-19T16:00:01Z", started at 2024-07-19T16:00:01Z. Status: snapshot backup. [op id: 669a8d812990bd801d45e657]

Backups:
========
S3 eu-central-1 BUCKET URL
  Snapshots:
    2024-07-19T16:00:01Z 0.00B <logical> [running: running / 2024-07-19T16:00:04Z]
    2024-07-19T10:32:00Z 120.76GB <logical> [restore_to_time: 2024-07-19T11:35:24Z]
    2024-07-18T16:00:01Z 120.73GB <logical> [restore_to_time: 2024-07-18T17:02:45Z]
    2024-07-18T08:00:01Z 120.66GB <logical> [restore_to_time: 2024-07-18T09:03:11Z]
    2024-07-18T00:00:01Z 120.67GB <logical> [restore_to_time: 2024-07-18T01:03:06Z]
    2024-07-17T16:00:01Z 120.62GB <logical> [restore_to_time: 2024-07-17T17:02:50Z]
    2024-07-17T08:00:01Z 120.58GB <logical> [restore_to_time: 2024-07-17T09:02:50Z]
    2024-07-17T00:00:01Z 120.62GB <logical> [restore_to_time: 2024-07-17T01:03:01Z]

Any thoughts or insights would be appreciated.

@Muhammad_Azhar

It could be a metadata sync problem. That can usually be resolved by resyncing the backup list from the storage with one of the commands below.

shell> pbm config --force-resync
OR
shell> pbm config --force-resync --file /etc/pbm_config.yaml
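In the Operator deployment, pbm runs inside the backup-agent container of the database pods, so the resync has to be executed there. A minimal sketch, assuming the pod name and namespace visible in the status output above:

```shell
# Run the resync inside the backup-agent container of a config-server pod.
# Pod name and namespace are taken from the cluster output above; adjust to your setup.
kubectl exec -n mongodb percona-prod-clust-cfg-0 -c backup-agent -- \
  pbm config --force-resync
```

After the resync completes, `pbm list` run in the same container should show the snapshots from the storage again.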

Are you still taking backups from both the old and the new setup, or have you not taken any backup from the new environment yet? Taking backups from both sides (source and target) to the same storage can corrupt the PBM metadata.

If the metadata resync doesn't work, please gather the following information from your new environment:

shell> pbm list
shell> sudo pbm config --list
shell> pbm logs
shell> journalctl -u pbm-agent.service

and the following details from your MongoDB database nodes. Change the endpoint and credentials to match your database pod:

shell> mongo --port 27017 -u admin -p "paSSword" --authenticationDatabase admin --eval "db.getSiblingDB('admin').pbmBackups.find().forEach(function(f){printjson(f)})" > pbmBackups.meta

shell> mongo --port 27017 -u admin -p "paSSword" --authenticationDatabase admin --eval "db.getSiblingDB('admin').pbmAgents.find().forEach(function(f){printjson(f)})" > pbmAgents.meta

shell> mongo --port 27017 -u admin -p "paSSword" --authenticationDatabase admin --eval "db.getSiblingDB('admin').pbmConfig.find().forEach(function(f){printjson(f)})" > pbmConfig.meta

@Muhammad_Azhar interesting use case!

I’m curious how the migration was executed. Was it through external nodes? Or backup/restore?

By design, if you migrate to a new architecture or environment, the expectation is that you take a full backup after the migration. That starts the backup history from scratch.
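With the Operator, an on-demand full backup is requested through a PerconaServerMongoDBBackup resource rather than by calling pbm directly. A sketch under assumptions: the cluster name `percona-prod-clust` is inferred from the pod names above, and `s3-eu-central-1` is a hypothetical storage name that must match a storage entry in your cluster CR.

```shell
# Request an on-demand backup through the Operator.
# clusterName and storageName are assumptions; adjust them to your cluster CR.
kubectl apply -n mongodb -f - <<EOF
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  name: backup-after-migration
spec:
  clusterName: percona-prod-clust
  storageName: s3-eu-central-1
EOF
```

The resulting backup then shows up both in `kubectl get psmdb-backup` and in `pbm list`.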

@anil.joshi, --force-resync did the job, thanks. I was able to retrieve the backups that were taken on the bare metal machines.

@Sergey_Pronin We have a 2-shard setup on bare-metal machines. We went with the backup/restore migration. Yes, we took a fresh full backup and restored it on the new environment.

It was a tricky migration, but through trial and error we managed today to get the backup restored completely. Next up is PITR recovery, to keep downtime minimal.
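For the PITR part, in case it helps anyone following along: a sketch of enabling PITR and restoring to a point in time with plain pbm, assuming the commands run inside the backup-agent container (with the Operator, PITR is normally enabled via `backup.pitr.enabled` in the cluster CR instead, and the timestamp below is an example value):

```shell
# Enable point-in-time recovery oplog slicing.
pbm config --set pitr.enabled=true

# Later, restore to a specific point in time covered by the oplog slices.
# The timestamp is illustrative; it must fall after a base snapshot.
pbm restore --time="2024-07-19T11:30:00"
```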

Quick question: I noticed a few threads where other users mentioned that if we don't use a TLS certificate manager such as cert-manager, the default certificates generated by the Operator do not auto-renew. Is this still true?

@Muhammad_Azhar

Quick question: I noticed a few threads where other users mentioned that if we don't use a TLS certificate manager such as cert-manager, the default certificates generated by the Operator do not auto-renew. Is this still true?

This is true. We work closely with the CNCF ecosystem and encourage users to rely on existing building blocks.
cert-manager is a battle-tested tool for certificate management. Percona Operators rely on Secret resources for storing TLS certificates, and we let users choose how best to manage the certificates within those Secrets. The job of the Operator is to ensure that they are properly mounted and applied within MongoDB.
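One way to keep an eye on this is to read the certificate out of the Secret and check its expiry. A sketch, assuming the Operator's default `<cluster>-ssl` Secret naming (so `percona-prod-clust-ssl` here is an assumption based on the cluster name above):

```shell
# Print the expiry date of the TLS certificate stored in the cluster's Secret.
# "percona-prod-clust-ssl" follows the Operator's <cluster>-ssl naming convention.
kubectl get secret -n mongodb percona-prod-clust-ssl \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate
```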