Physical restore never finishes

Hi everyone

Here is the configuration of my test cluster, which I use for backup, restore, and update testing:

Kubernetes : 1.29.10
Helm chart psmdb-operator : 1.18.0 (1.19.0 could not be deployed)
Helm chart psmdb-db : 1.18.0 (1.19.0 could not be deployed)
Mongod : 6.0.19-16-multi
Operator : 1.18.0
PBM : 2.7.0-multi (2.8.0-multi fails with an index error during restore)

In total, 9 pods for 1 replica set: 1 operator, 3 cfg, 3 rs0, and 2 mongos routers.

I want to do a full restore of my MongoDB databases from an existing full physical backup of 22 GiB.

When I launch the restore with a Kubernetes manifest, it starts correctly but then never finishes. I left it running for 24 hours and the restore status is still "running"!

Here is the Kubernetes manifest I used:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: xxxxx-restaure-physical-full-test-1
spec:
  clusterName: "psmdb-db-test"
  backupSource:
    type: "physical"
    destination: "s3://xxxxx-mongo-backup-physical-test/2025-02-04T05:30:21Z"
    s3:
      credentialsSecret: "pbm-mongodb-xxxxxx-xx"
      region: "fr-par"
      bucket: "xxxxx-mongo-backup-physical-test"
      endpointUrl: "https://s3.fr-par.xxx.xxxx"

And here is what happens after applying the manifest with kubectl:

  1. The restore goes into the "waiting" state
  2. The PBM containers in each pod stop
  3. Both mongos routers stop
  4. The restore goes into the "running" state
  5. The restore begins …

Here is the kubectl describe output for the restore:

Name:         xxxxx-restaure-physical-full-test-1
Namespace:    percona-mongodb-test
Labels:       <none>
Annotations:  <none>
API Version:  psmdb.percona.com/v1
Kind:         PerconaServerMongoDBRestore
Metadata:
  Creation Timestamp:  2025-02-04T16:53:06Z
  Generation:          1
  Resource Version:    41116260131
  UID:                 ed9749e0-6850-4b6f-b757-f81043750ed3
Spec:
  Backup Source:
    Destination:  s3://xxxxx-mongo-backup-physical-test/2025-02-04T05:30:21Z
    s3:
      Bucket:              xxxxx-mongo-backup-physical-test
      Credentials Secret:  pbm-mongodb-xxxxxx-xx
      Endpoint URL:        https://s3.fr-par.xxx.xxxx
      Region:              fr-par
    Type:                  physical
  Cluster Name:            psmdb-db-xxxxx
Status:
  Pbm Name:  2025-02-04T17:08:15.142876273Z
  State:     running
Events:      <none>

And here is the psmdb-operator log:

2025-02-05T09:31:11.911Z	DEBUG	PBM restore status	{"controller": "psmdbrestore-controller", "object": {"name":"xxxxx-restaure-physical-full-test-1","namespace":"percona-mongodb-test"}, "namespace": "percona-mongodb-test", "name": "xxxxx-restaure-physical-full-test-1", "reconcileID": "b9491d27-98f2-4fdc-a57d-4e1bcc98c0d7", "status": {"type":"physical","opid":"","name":"2025-02-04T17:08:15.142876273Z","replsets":[{"name":"cfg","start_ts":0,"status":"done","last_transition_ts":1738689045,"first_write_ts":{"T":0,"I":0},"last_write_ts":{"T":0,"I":0},"node":"","conditions":null},{"name":"rs0","start_ts":0,"status":"down","last_transition_ts":1738688969,"first_write_ts":{"T":0,"I":0},"last_write_ts":{"T":0,"I":0},"node":"","conditions":null}],"compression":"","store":{"type":""},"size":0,"mongodb_version":"","fcv":"","start_ts":0,"last_transition_ts":1738688924,"first_write_ts":{"T":0,"I":0},"last_write_ts":{"T":0,"I":0},"hb":{"T":0,"I":0},"status":"running","conditions":null,"n":null,"pbm_version":"","balancer":""}}
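
That log line is hard to read as one block, so here is a minimal Python sketch that pulls out the overall status and the per-replica-set states. It assumes the JSON payload starts at the first "{" on the line, and LOG_LINE is an abridged copy of the line above with most fields trimmed. The helper name replset_states is only illustrative. On this line it reports the overall status as "running", cfg as "done", and rs0 as "down".

```python
import json

# Abridged copy of the operator log line above (most fields trimmed for brevity).
LOG_LINE = (
    '2025-02-05T09:31:11.911Z\tDEBUG\tPBM restore status\t'
    '{"controller": "psmdbrestore-controller", "status": {"type": "physical", '
    '"name": "2025-02-04T17:08:15.142876273Z", "replsets": ['
    '{"name": "cfg", "status": "done", "last_transition_ts": 1738689045}, '
    '{"name": "rs0", "status": "down", "last_transition_ts": 1738688969}], '
    '"status": "running"}}'
)


def replset_states(log_line):
    """Return (overall_status, {replset_name: status}) for one operator log line."""
    # Assumption: everything from the first '{' to the end of the line is JSON.
    payload = json.loads(log_line[log_line.index("{"):])
    status = payload["status"]
    per_replset = {rs["name"]: rs["status"] for rs in status["replsets"]}
    return status["status"], per_replset


overall, per_replset = replset_states(LOG_LINE)
print("overall:", overall)
for name, state in per_replset.items():
    print(f"{name}: {state}")
```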

Can you help me understand what the problem is, and how I can cleanly stop the physical restore to return the cluster to a normal operating state?

Thank you

Hi @fabien.hannecart,

More than 24 hours for 22GB doesn’t sound normal to me. What’s the status of your pods? If you exec into one of the mongod containers and run ps faux, do you see mongod and pbm-agent processes? Also, there’s a /tmp/pbm-agent.log in mongod containers during physical restore, that might include helpful information to debug.
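
The checks above could be scripted roughly like this. It is only a sketch: the namespace comes from the describe output in the question, but the pod name psmdb-db-test-rs0-0 and the container name mongod are assumptions, so check kubectl get pods and adjust them. As written, the script only prints the kubectl commands. run_checks() actually executes them and needs kubectl access to the cluster.

```python
import shlex
import subprocess

NAMESPACE = "percona-mongodb-test"  # namespace from the describe output
POD = "psmdb-db-test-rs0-0"         # hypothetical pod name, check `kubectl get pods`
CONTAINER = "mongod"                # assumed container name


def exec_argv(command, pod=POD, namespace=NAMESPACE, container=CONTAINER):
    """Build the `kubectl exec` argument list that runs `command` in the container."""
    return ["kubectl", "exec", "-n", namespace, pod, "-c", container, "--", *command]


CHECKS = {
    # Are mongod and pbm-agent processes present?
    "processes": exec_argv(["ps", "faux"]),
    # Agent log written inside the mongod container during a physical restore.
    "pbm-agent log": exec_argv(["cat", "/tmp/pbm-agent.log"]),
}


def run_checks():
    """Run each check against the cluster and print whatever it returns."""
    for label, argv in CHECKS.items():
        print(f"--- {label} ---")
        result = subprocess.run(argv, capture_output=True, text=True)
        print(result.stdout or result.stderr)


# Print the commands only; call run_checks() to execute them.
for label, argv in CHECKS.items():
    print(f"{label}: {shlex.join(argv)}")
```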
