PBM restore failed: context deadline exceeded

Hi! I’m getting errors with a PBM PITR restore.
I really want to resolve this, so please feel free to ask for any additional info and I’ll provide it ASAP.
I also tried upgrading to the latest 1.8 version of PBM, but I got the same error.
All replica set nodes can access the S3 bucket, and I can download the PBM backup files with aws s3 without problems.

Setup:

Replica set with 3 nodes: 1 primary, 2 secondaries
Nodes: AWS EC2 t3.large

How I start the restore:

pbm config --set pitr.enabled=false
pbm restore --time="2022-09-15T14:15:56"

After a while these errors occur:


Sep 15 15:01:19 ip-10-0-14-38 pbm-agent[444]: 2022-09-15T15:01:19.000+0000 W [pitrestore/2022-09-15T14:15:56Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 10s
Sep 15 15:01:29 ip-10-0-14-38 pbm-agent[444]: 2022-09-15T15:01:29.000+0000 I [pitrestore/2022-09-15T14:15:56Z] session recreated, resuming download
Sep 15 15:01:29 ip-10-0-14-38 pbm-agent[444]: 2022-09-15T15:01:29.000+0000 E [pitrestore/2022-09-15T14:15:56Z] download 'private-bucket-for-backups/pbmPitr/meta-rs0/20220915/20220915140017-26.20220915141032-159.oplog.snappy' file from S3: context deadline exceeded (Client.Timeout or context cancellation while reading body)
Sep 15 15:01:29 ip-10-0-14-38 pbm-agent[444]: 2022-09-15T15:01:29.000+0000 E [pitrestore/2022-09-15T14:15:56Z] restore: replay chunk 1663250417.1663251032: apply oplog for chunk: download 'private-bucket-for-backups/pbmPitr/meta-rs0/20220915/20220915140017-26.20220915141032-159.oplog.snappy': context deadline exceeded (Client.Timeout or context cancellation while reading body)

Full log from the start of the PBM restore (Sep 15 14:49:53 onward) is on Pastebin.com.

Configs:
uname:

Linux ip-10-0-14-38 5.13.0-1019-aws #21~20.04.1-Ubuntu SMP Wed Mar 16 11:54:08 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

pbm version:

Version:   1.6.1
Platform:  linux/amd64
GitCommit: aebbc24a03f21f9ddee6ebb72f510ff1d1c58fdc
GitBranch: release-1.6.1
BuildTime: 2021-11-03_12:12_UTC
GoVersion: go1.16.9

pbm-agent config:

pitr:
  enabled: true
  oplogSpanMin: 10
storage:
  type: s3
  s3:
    region: eu-north-1
    bucket: private-bucket-for-backups
    prefix: data/pbm/backup/meta
    credentials:
      access-key-id: <access key>
      secret-access-key: <secret key>

pbm list --full:

Backup snapshots:
  2022-09-15T13:03:14Z [complete: 2022-09-15T13:03:19]
  2022-09-15T14:00:10Z [complete: 2022-09-15T14:00:17]
  2022-09-15T14:21:48Z [complete: 2022-09-15T14:21:56]

PITR <off>:
  2022-09-15T13:03:20 - 2022-09-15T14:22:17

  meta-rs0: [{ 2022-09-15T13:03:19 - 2022-09-15T14:22:17}]
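As a sanity check on the ranges above, here is a minimal Python sketch of which snapshot a restore to 14:15:56 would start from. It assumes PBM uses the most recent snapshot completed at or before the target time and then replays oplog chunks up to the target. `pick_base_snapshot` is a hypothetical helper, not part of PBM, and the timestamps are copied from the `pbm list --full` output.

```python
from datetime import datetime

# (start, complete) pairs copied from the `pbm list --full` output above.
SNAPSHOTS = [
    ("2022-09-15T13:03:14", "2022-09-15T13:03:19"),
    ("2022-09-15T14:00:10", "2022-09-15T14:00:17"),
    ("2022-09-15T14:21:48", "2022-09-15T14:21:56"),
]
# PITR range reported by `pbm list --full`.
PITR_RANGE = ("2022-09-15T13:03:20", "2022-09-15T14:22:17")


def pick_base_snapshot(target, snapshots=SNAPSHOTS, pitr_range=PITR_RANGE):
    """Hypothetical helper: latest snapshot completed at or before `target`.

    Assumption: this mirrors how a PITR restore chooses its base backup.
    """
    t = datetime.fromisoformat(target)
    lo, hi = (datetime.fromisoformat(x) for x in pitr_range)
    if not lo <= t <= hi:
        raise ValueError(f"{target} is outside the PITR range")
    done = [s for s in snapshots if datetime.fromisoformat(s[1]) <= t]
    if not done:
        raise ValueError(f"no snapshot completed before {target}")
    return max(done, key=lambda s: datetime.fromisoformat(s[1]))


print(pick_base_snapshot("2022-09-15T14:15:56"))
```

Under that assumption the base snapshot is 2022-09-15T14:00:10Z (complete at 14:00:17), which lines up with the first oplog chunk in the error log starting at 14:00:17.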

mongod config:

storage:
  oplogMinRetentionHours: 1
  dbPath: /var/lib/mongodb
  journal:
    enabled: true

systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

net:
  port: 27017
  bindIp: 0.0.0.0

processManagement:
  timeZoneInfo: /usr/share/zoneinfo

replication:
    replSetName: "meta-rs0"
security:
    authorization: enabled
    keyFile: /opt/mongo_rs_key_file

MongoDB:
No errors in the MongoDB logs, just the usual info records.

I am having the same issue. Were you able to resolve it? Can you share your findings?
For me, it restores the entire cluster but fails at the end while trying to restore local.oplog.rs.bson.gz.

Hi @sohahm ,

What PBM version do you use? If it was a bug, it may already be fixed in the latest version.

But in the original description, I see a timeout during the download, which points to a network issue.
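
To see exactly which oplog slice timed out, the chunk ID in the `replay chunk` error can be decoded. This is a rough Python sketch that assumes the two numbers are Unix timestamps in seconds, which is consistent with the `20220915140017` and `20220915141032` parts of the object name in the same log line. `chunk_range` is a hypothetical helper, not a PBM command.

```python
import re
from datetime import datetime, timezone

# Matches the "replay chunk <start>.<end>" part of the pbm-agent error line.
CHUNK_RE = re.compile(r"replay chunk (\d+)\.(\d+)")


def chunk_range(log_line):
    """Hypothetical helper: return (start, end) of the chunk as UTC strings.

    Assumption: both numbers are Unix timestamps in seconds.
    Returns None if the line has no "replay chunk" entry.
    """
    m = CHUNK_RE.search(log_line)
    if m is None:
        return None
    return tuple(
        datetime.fromtimestamp(int(ts), tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        for ts in m.groups()
    )


line = "restore: replay chunk 1663250417.1663251032: apply oplog for chunk"
print(chunk_range(line))
# ('2022-09-15T14:00:17Z', '2022-09-15T14:10:32Z')
```

So the failing download is the first slice after the 14:00 snapshot. Timing a manual download of that one object from the same node would show whether the node can actually pull it before the client timeout hits.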

I am facing the same issue while downloading from a GCS bucket during a restore.