Initially my restore script was working fine, but after a few weeks it stopped working and now fails with the error below. I was not able to find any existing reports or solutions.
I’m using filesystem storage on an sshfs-mounted storage box for this, so I thought it might be a storage bottleneck, but that does not seem to be the case.
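For context, the storage is wired up roughly like this (the hostname, user, mount options, and config file path below are placeholders; the storage path matches the config shown further down):
# sshfs mount of the storage box (user/host are placeholders)
sshfs u12345@storagebox.example.com:/backups /tmp/backup-mount -o reconnect,ServerAliveInterval=15
# point PBM at the mounted path using filesystem storage
cat <<'EOF' > /etc/pbm-storage.yaml
storage:
  type: filesystem
  filesystem:
    path: /tmp/backup-mount/mongo-backup
EOF
pbm config --file /etc/pbm-storage.yaml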
Starting restore 2025-05-13T18:28:08.266159982Z to point-in-time 2025-05-13T14:42:31 from '2025-04-24T12:01:21Z'
2025-05-13T18:28:08.000+0000 D [pitr] error while executing pitr, pitr procedure will be restarted
mongo-1 | 2025-05-13T18:28:09.000+0000 I got command restore [name: 2025-05-13T18:28:08.266159982Z, snapshot: 2025-04-24T12:01:21Z point-in-time: <1747147351,0>] <ts: 1747160888>, opid: 68238f3809fe636ddb8c3063
mongo-1 | 2025-05-13T18:28:09.000+0000 I got epoch {1747160554 6105}
mongo-1 | 2025-05-13T18:28:09.000+0000 I [restore/2025-05-13T18:28:08.266159982Z] to time: 2025-05-13T14:42:31Z
mongo-1 | 2025-05-13T18:28:09.000+0000 I [restore/2025-05-13T18:28:08.266159982Z] oplog slicer disabled
mongo-1 | 2025-05-13T18:28:09.000+0000 I [restore/2025-05-13T18:28:08.266159982Z] backup: 2025-04-24T12:01:21Z
mongo-1 | 2025-05-13T18:28:09.000+0000 I [restore/2025-05-13T18:28:08.266159982Z] recovery started
mongo-1 | 2025-05-13T18:28:09.000+0000 D [restore/2025-05-13T18:28:08.266159982Z] waiting for 'starting' status
mongo-1 | .........2025-05-13T18:28:17.000+0000 D [pitr] stop pitr hb
mongo-1 | 2025-05-13T18:28:17.000+0000 D [pitr] stop pitr agent activity monitor
mongo-1 | 2025-05-13T18:28:17.000+0000 D [pitr] stop pitr topo monitor
mongo-1 | 2025-05-13T18:28:17.000+0000 D [pitr] stop pitr config monitor
mongo-1 | 2025-05-13T18:28:17.000+0000 D [pitr] stop pitr error monitor
mongo-1 | Error: get metadata: get: context deadline exceeded
It seems some of your pbm-agents might be stuck. Can you try restarting them? Which version of PBM are you using?
Hey, thanks for following up on this. Yes, both mongo and the agent have been restarted, and I have already tried fully wiping the environment and starting from scratch.
After some debugging, I figured out that:
- it’s not an issue with mongodb connection
- it doesn’t seem to be an issue with the storage location
- there are no locks in mongo (checked as sketched below)
- I’m using 3.9.1, the same version I have been using before; this is also reflected in .pbm.init, so that didn’t change
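A minimal sketch of the lock check, assuming PBM still keeps its locks in the admin.pbmLock / admin.pbmLockOp collections:
# both queries came back empty, i.e. no leftover PBM locks
mongosh --quiet --eval 'db.getSiblingDB("admin").pbmLock.find().toArray()'
mongosh --quiet --eval 'db.getSiblingDB("admin").pbmLockOp.find().toArray()'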
Then I tried running the restore without PITR, like this:
pbm restore $latest_snapshot -w
and that worked. So I assume something is corrupted on the PITR side, but how do I debug this? The logs give no helpful output, and the error doesn’t immediately make sense in relation to PITR.
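For reference, the PITR restore that fails is invoked by the script roughly like this (the timestamp here is just an example of the format):
pbm restore --base-snapshot "$latest_snapshot" --time "2025-05-13T14:42:31" -w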
EDIT: also, there is only one mongo instance in the cluster:
rs0 [direct: primary] test> ...
PBM Config found: [
{
"_id": "68248a1574f6cb3403da0a87",
"storage": {
"type": "filesystem",
"filesystem": {
"path": "/tmp/backup-mount/mongo-backup"
}
},
"pitr": {
"enabled": true,
"oplogSpanMin": 10
},
"epoch": {
"$timestamp": "7504275762761957377"
}
}
]
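For what it’s worth, the same settings can also be read via the CLI instead of mongosh:
pbm config --list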
You can look at the pbm-agent log to see more detail about what exactly is going on with PITR. If you suspect a bug, please open a report at jira.percona.com.
Yes, we could submit a bug report.
However, I’m not entirely sure what I would put into the bug report. The only output I have is what is shown above; there are no more detailed logs. The error itself seems to come from a Go HTTP request, but from testing it appears to be related to PITR, so I don’t understand what caused it. That’s why I’m asking here, to see if anyone has encountered this before. We have only had PBM in production for a month or so and are already having issues.
What complicates matters is that, as far as I’m aware, there is no utility available to test the integrity of the backup slices, or any other way to debug this, so I can’t confirm what caused it.
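The closest I can get is eyeballing the slice files on the storage and comparing them with what PBM reports; a rough sketch, assuming PITR chunks live under pbmPitr/<replset> in the storage path:
# list the oplog slices PBM has written for rs0
ls -lhR /tmp/backup-mount/mongo-backup/pbmPitr/rs0/
# compare with the snapshots and PITR ranges PBM itself reports
pbm list
pbm status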
Start by looking at the pbm-agent logs on each node:
journalctl -u pbm-agent -f
There is more detailed output there than you get from running pbm logs alone.
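You can also bump the verbosity of pbm logs itself; if I remember the flags correctly, something like:
# last 200 entries, including debug-severity messages
pbm logs -t 200 -s D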