PBM Physical Restore in K8s Not Working

Hey all,

So I was having issues with logical restores in K8s per the post here. I have since gotten helm chart 1.20.1 up and running with the operator and pbm agent 2.10.0. This did not resolve the issue per this post; it appears to still be a problem with a multi-replicaset setup and MongoDB > 6.x.

As a result, I have started looking into physical restores, which will likely be necessary in production anyway since they should be significantly faster. However, I am seeing some very odd behavior when attempting the physical restores.

Essentially, I kick off the restore with a cr.yaml file:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: physical-restore-from-main-1
  namespace: psmdb-dev-reports
spec:
  clusterName: psmdb-dev-reports-psm
  storageName: s3-us-east-physical
  backupSource:
    type: physical
    destination: s3://<my_bucket>/physical/2025-07-14T20:50:22Z
    s3:
      credentialsSecret: psmdb-backup-s3
      bucket: <my_bucket>

Once I kick it off with kubectl apply -f deploy/backup/restore-physical.yaml -n psmdb-dev-reports, it sits for a while with no status change in kubectl get psmdb-restore -n psmdb-dev-reports. During this time, neither the operator nor the pbm agent generates any logs. After about 5 minutes the operator kills the mongos instances, as is to be expected.
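For reference, this is roughly how I watch it while it sits there; the operator deployment name and namespace are placeholders since that depends on how the operator chart was installed:

# Watch the restore object for a status change
kubectl get psmdb-restore physical-restore-from-main-1 -n psmdb-dev-reports -w

# Watch the pods in the namespace at the same time
kubectl get pods -n psmdb-dev-reports -w

# Tail the operator logs in parallel (adjust the deployment name/namespace to your install)
kubectl logs -f deployment/psmdb-operator -n psmdb-dev-reports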

After this there is still no status change, no further logs from the operator, and still nothing from the pbm agent via pbm logs -f. I assume it is doing the file restore in the background. After about another 5 minutes, the operator starts throwing "Waiting for statefulsets to be ready before restore" logs, similar to this:

2025-07-15T18:17:50.726Z    INFO    Waiting for statefulsets to be ready before restore    {"controller": "psmdbrestore-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBRestore", "PerconaServerMongoDBRestore": {"name":"physical-restore-from-main-1","namespace":"psmdb-dev-reports"}, "namespace": "psmdb-dev-reports", "name": "physical-restore-from-main-1", "reconcileID": "7973623e-ac14-4087-b2d5-ada7cdd37e59", "ready": false}
2025-07-15T18:17:55.727Z    INFO    Waiting for statefulsets to be ready before restore    {"controller": "psmdbrestore-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBRestore", "PerconaServerMongoDBRestore": {"name":"physical-restore-from-main-1","namespace":"psmdb-dev-reports"}, "namespace": "psmdb-dev-reports", "name": "physical-restore-from-main-1", "reconcileID": "8322868f-7960-4800-b7a3-6120ef2863c7", "ready": false}
2025-07-15T18:18:53.902Z    INFO    SmartUpdate    apply changes to secondary pod    {"controller": "psmdb-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDB", "PerconaServerMongoDB": {"name":"psmdb-dev-reports-psm","namespace":"psmdb-dev-reports"}, "namespace": "psmdb-dev-reports", "name": "psmdb-dev-reports-psm", "reconcileID": "6c946aa9-f1dc-457d-8f2a-65d963d1310e", "statefulset": "psmdb-dev-reports-psm-amfam", "replset": "amfam", "pod": "psmdb-dev-reports-psm-amfam-1"}
2025-07-15T18:19:34.797Z    INFO    Pod started    {"controller": "psmdb-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDB", "PerconaServerMongoDB": {"name":"psmdb-dev-reports-psm","namespace":"psmdb-dev-reports"}, "namespace": "psmdb-dev-reports", "name": "psmdb-dev-reports-psm", "reconcileID": "6c946aa9-f1dc-457d-8f2a-65d963d1310e", "pod": "psmdb-dev-reports-psm-amfam-1"}

At this point it starts restarting all of the pods, which I also think is to be expected. However, when the pods come back online, they are missing the backup-agent container. This results in the restore eventually failing, stating that there are no pbm agents available for the restore. I was following pbm logs up to the point of the pod being restarted and nothing was ever posted to it. I did, however, see a few errors in admin.pbmlog for each node of the RS noting msg: 'mark error during restore: check mongod binary: run: exec: "mongod": executable file not found in $PATH. stderr: '. Not sure if these were related to the restore though, as I was only able to look after the pods were restarted and mongos came back up and I could connect.
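For what it's worth, this is the kind of check I'm using to confirm the container is actually gone after the restart; the pod and statefulset names here are from my cluster, so adjust accordingly:

# Containers in one of the replset pods after the operator patches the statefulset
kubectl get pod psmdb-dev-reports-psm-amfam-0 -n psmdb-dev-reports -o jsonpath='{.spec.containers[*].name}{"\n"}'

# Same check against the statefulset template; backup-agent is no longer listed
kubectl get sts psmdb-dev-reports-psm-amfam -n psmdb-dev-reports -o jsonpath='{.spec.template.spec.containers[*].name}{"\n"}'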

The other really odd thing is that I have tried deleting pods, patching the helm chart, etc. in order to get the backup-agent container back. The only two things that seem to work are deleting the statefulsets or a complete terragrunt destroy/apply.
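Concretely, the statefulset workaround is just deleting them and letting the operator reconcile them back into shape, roughly as below; the rs0/cfg names follow the same <cluster>-<replset> pattern as the amfam statefulset from the logs, so treat them as examples:

# Delete the affected statefulsets; the operator recreates them with the backup-agent container present
kubectl delete sts psmdb-dev-reports-psm-rs0 psmdb-dev-reports-psm-amfam psmdb-dev-reports-psm-cfg -n psmdb-dev-reports

# Watch the pods come back
kubectl get pods -n psmdb-dev-reports -w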

We are running EKS on AWS, currently v1.32; the Percona helm chart is 1.20.1, and the physical restore was attempted with backup-agent 2.10.0 and 2.9.1. It should also be noted that I have tried numerous variations of the restore yaml, including removing the s3 section since all of that is defined under the named storage. Also, credentialsSecret: psmdb-backup-s3 is there with the correct AWS secrets.
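For completeness, the most stripped-down variation I have tried looks roughly like the following; the name is just an example, and the bucket/credentials come from the s3-us-east-physical storage defined in the helm values below:

cat <<'EOF' | kubectl apply -n psmdb-dev-reports -f -
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: physical-restore-from-main-2
spec:
  clusterName: psmdb-dev-reports-psm
  storageName: s3-us-east-physical
  backupSource:
    type: physical
    destination: s3://<my_bucket>/physical/2025-07-14T20:50:22Z
EOF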

Helm values for the mongodb deploy:

# Cluster DNS Suffix
# clusterServiceDNSSuffix: svc.cluster.local
# clusterServiceDNSMode: "Internal"

finalizers:
  - percona.com/delete-psmdb-pods-in-order

nameOverride: ""
fullnameOverride: ""

crVersion: 1.20.1
pause: false
unmanaged: false
unsafeFlags:
  tls: false
  replsetSize: false
  mongosSize: false
  terminationGracePeriod: false
  backupIfUnhealthy: false

enableVolumeExpansion: true

annotations: {}

multiCluster:
  enabled: false

updateStrategy: SmartUpdate
upgradeOptions:
  versionServiceEndpoint: https://check.percona.com
  apply: disabled
  schedule: "0 2 * * *"
  setFCV: false

image:
  repository: percona/percona-server-mongodb
  tag: 7.0.12-7
  # tag: 7.0.18-11
imagePullPolicy: Always

secrets:
  encryptionKey: psmdb-encryption-key
  users: psmdb-users-secrets

pmm:
  enabled: true
  image:
    repository: percona/pmm-client
    tag: 2.44.1
  serverHost: pmm.override.in.terraform.threatx.io

replsets:
  rs0:
    name: rs0
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: false
      type: ClusterIP
    resources:
      limits:
        cpu: "4"
        memory: "4G"
      requests:
        cpu: "1"
        memory: "1G"
    volumeSpec:
      pvc:
        storageClassName: gp3-mongo-standard-unencrypted
        resources:
          requests:
            storage: 25Gi
  
  amfam:
    name: amfam
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: false
      type: ClusterIP
    resources:
      limits:
        cpu: "4"
        memory: "4G"
      requests:
        cpu: "1"
        memory: "1G"
    volumeSpec:
      pvc:
        storageClassName: gp3-mongo-standard-unencrypted
        resources:
          requests:
            storage: 25Gi

sharding:
  enabled: true
  balancer:
    enabled: true

  configrs:
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: false
      type: ClusterIP
    resources:
      limits:
        cpu: "1"
        memory: "1G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      pvc:
        storageClassName: gp3-mongo-standard-unencrypted
        resources:
          requests:
            storage: 3Gi

  mongos:
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    resources:
      limits:
        cpu: "4"
        memory: "4G"
      requests:
        cpu: "1"
        memory: "1G"
    expose:
      enabled: true
      exposeType: LoadBalancer
      servicePerPod: true
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: "external"
        service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

backup:
  enabled: true
  image:
    repository: percona/percona-backup-mongodb
    tag: 2.9.1
  resources:
    limits:
      cpu: "3"
      memory: "3G"
    requests:
      cpu: "1"
      memory: "1G"
  storages:
    s3-us-east-logical:
      # main: true
      type: s3
      s3:
        bucket: derrived-from-terraform
        credentialsSecret: psmdb-backup-s3
        serverSideEncryption:
          kmsKeyID: derrived-from-terraform
          sseAlgorithm: aws:kms
        region: us-east-2
        prefix: "logical"
        storageClass: INTELLIGENT_TIERING
    s3-us-east-physical:
      main: true
      type: s3
      s3:
        bucket: derrived-from-terraform
        credentialsSecret: psmdb-backup-s3
        serverSideEncryption:
          kmsKeyID: derrived-from-terraform
          sseAlgorithm: aws:kms
        region: us-east-2
        prefix: "physical"
        storageClass: INTELLIGENT_TIERING
  pitr:
    enabled: false
    oplogOnly: false
  tasks:

Happy to provide any additional information if I can. If anyone has any thoughts, it would be greatly appreciated.

Ok, some more information from what I can gather, though I am not a Go expert.

It looks like the process does the following:

  1. Removes the mongos statefulset - makes sense, bring down mongos.
  2. Does some magic in the background, likely pulling data from AWS and prepping it for restore.
  3. Triggers a patch to the replset statefulsets. This appears to be where the problem resides. After the patch the pods come back, but without the backup-agent container. Once the statefulset is back, we can see that it has the init container but nothing for the actual backup-agent container. Our best guess at this point is that the operator waits 5 minutes for all of the statefulsets to be updated; with a multi-replicaset cluster this may take longer than 5 minutes, and it then fails or exits without restoring the statefulsets back to their original state.
2025-07-16T17:04:23.725Z  INFO  Waiting for statefulsets to be ready before restore
.....
 2025-07-16T17:09:53.855Z  INFO  Waiting for statefulsets to be ready before restore
2025-07-16T17:09:58.917Z	ERROR	failed to make restore	{"controller": "psmdbrestore-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBRestore", "PerconaServerMongoDBRestore": {"name":"physical-restore-from-main-1","namespace":"psmdb-dev-reports"}, "namespace": "psmdb-dev-reports", "name": "physical-restore-from-main-1", "reconcileID": "ffaf9619-720f-4e9e-be70-41f5abd497e3", "restore": "physical-restore-from-main-1", "backup": "", "error": "check if pbm agents are ready: get pbm status: command terminated with exit code 1", "errorVerbose": "command terminated with exit code 1\nget pbm status\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbrestore.(*ReconcilePerconaServerMongoDBRestore).checkIfPBMAgentsReadyForPhysicalRestore.func2\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbrestore/physical.go:1031\nk8s.io/client-go/util/retry.OnError.func1\n\t/go/pkg/mod/k8s.io/client-go@v0.33.0/util/retry/util.go:51\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection\n\t/go/pkg/mod/k8s.io/apimachinery@v0.33.0/pkg/util/wait/wait.go:150\nk8s.io/apimachinery/pkg/util/wait.ExponentialBackoff\n\t/go/pkg/mod/k8s.io/apimachinery@v0.33.0/pkg/util/wait/backoff.go:477\nk8s.io/client-go/util/retry.OnError\n\t/go/pkg/mod/k8s.io/client-go@v0.33.0/util/retry/util.go:50\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbrestore.(*ReconcilePerconaServerMongoDBRestore).checkIfPBMAgentsReadyForPhysicalRestore\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbrestore/physical.go:1014\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbrestore.(*ReconcilePerconaServerMongoDBRestore).reconcilePhysicalRestore\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbrestore/physical.go:130\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbrestore.(*ReconcilePerconaServerMongoDBRestore).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbrestore/perconaservermongodbrestore_controller.go:250\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:334\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:294\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:255\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1700\ncheck if pbm agents are ready\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbrestore.(*ReconcilePerconaServerMongoDBRestore).reconcilePhysicalRestore\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbrestore/physical.go:132\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbrestore.(*ReconcilePerconaServerMongoDBRestore).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbrestore/perconaservermongodbrestore_controller.go:250\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:334\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:294\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:255\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1700"}
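For anyone reproducing this, the wait loop can be watched from the outside with standard kubectl; the statefulset names below follow the <cluster>-<replset> pattern in my cluster, so adjust for yours:

# Rollout progress of the statefulsets the operator is waiting on
kubectl rollout status sts/psmdb-dev-reports-psm-rs0 -n psmdb-dev-reports
kubectl rollout status sts/psmdb-dev-reports-psm-amfam -n psmdb-dev-reports
kubectl rollout status sts/psmdb-dev-reports-psm-cfg -n psmdb-dev-reports

# Ready vs desired replicas across the namespace
kubectl get sts -n psmdb-dev-reports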

I am going to revert to an earlier version of the operator/helm charts, like 1.19.1, and see if it happens there as well, but I need to leave it in the current state for the moment while we analyze.

Ok, I have some more information.

It looks like what is happening is that the CRD patch makes changes to the mongod container by installing pbm, which is needed for the restore. That install goes to /opt/percona/pbm. However, if I try to execute that binary on a node that is currently in a bad state, I get the following GLIBC errors:

[mongodb@psmdb-dev-reports-psm-amfam-0 db]$ /opt/percona/pbm
/opt/percona/pbm: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by /opt/percona/pbm)
/opt/percona/pbm: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /opt/percona/pbm)
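To sanity-check whether this is a base image mismatch, the glibc shipped in the running mongod container can be compared with the PBM image the binary comes from; the libc-executable trick and image tags below are just my guess at a quick check, not anything the operator itself does:

# glibc version inside the running mongod container (the pbm binary wants GLIBC_2.32/2.34)
kubectl exec psmdb-dev-reports-psm-amfam-0 -n psmdb-dev-reports -c mongod -- /lib64/libc.so.6

# glibc version in the PBM image the /opt/percona/pbm binary is copied from
docker run --rm --entrypoint /lib64/libc.so.6 percona/percona-backup-mongodb:2.9.1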

My best guess is that it is coming from this function in ./pkg/controller/perconaservermongodbrestore/physical.go:

func getPBMBinaryAndContainerForExec(pod *corev1.Pod) (string, string) {
        // Default: exec the pbm binary the operator drops into the mongod
        // container at /opt/percona/pbm during a physical restore.
        container := "mongod"
        pbmBinary := "/opt/percona/pbm"

        // Prefer the backup-agent sidecar when it is still in the pod spec;
        // there the pbm binary is on the PATH.
        for _, c := range pod.Spec.Containers {
                if c.Name == naming.ContainerBackupAgent {
                        return naming.ContainerBackupAgent, "pbm"
                }
        }

        return container, pbmBinary
}

Any thoughts?

Hi @dclark,

This happens because your PBM docker image and PSMDB docker image have different base images. Starting from v7.0.16, the base image was changed in the PSMDB docker image. Would it be possible for you to use PSMDB >= v7.0.16?
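If it helps, bumping just the image tag in the helm values and upgrading should be enough to test this; the release and chart names below are placeholders for however the chart is installed on your side:

# Move the cluster to a 7.0.16+ PSMDB image so mongod and PBM share a compatible base image
helm upgrade <release-name> percona/psmdb-db -n psmdb-dev-reports -f values.yaml --set image.tag=7.0.18-11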

It is a new reporting DB so I can def try. I may not be able to get to it today, but I'm happy to give it a whirl and see what happens. I will let you know.

Thanks!