PBM restore takes long time

Team,

I’m doing a selective restore: restoring a single collection (6 GB) from a FULL backup (2.5 TB).
It has been running for 11+ hours (still progressing).

Is this expected behaviour? Please help me fix this.

command: pbm restore 2023-05-09T14:56:30Z --ns=ctr.trademarks.gecn 2>&1
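
For reference, this is how I am following progress (a minimal example; the flags are taken from the pbm 2.1 help, and the event value should match the restore name that pbm itself reports, so adjust both to your environment):

pbm status
pbm logs --tail=100 --severity=D --event=restore/2023-05-16T04:30:23.527718212Z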

[mongod@ip-10-80-11-253 ~]$ pbm status
Cluster:

shard1ReplSet:
  • shard1ReplSet/10.80.11.253:27038 [P]: pbm-agent v2.1.0 OK
configReplSet:
  • configReplSet/10.80.11.253:27039 [P]: pbm-agent v2.1.0 OK
shard3ReplSet:
  • shard3ReplSet/10.80.10.159:27038 [P]: pbm-agent v2.1.0 OK
shard2ReplSet:
  • shard2ReplSet/10.80.10.186:27038 [P]: pbm-agent v2.1.0 OK

PITR incremental backup:

Status [OFF]

Currently running:

Snapshot restore “2023-05-09T14:56:30Z”, started at 2023-05-16T04:30:23Z. Status: oplog restore. [op id: 646306df173cdb6959b600ac]

Backups:

S3 us-east-1 s3://xxxxxxxxxx/pbm/backup/
Snapshots:
2023-05-12T07:58:25Z 92.68KB <logical, selective> [restore_to_time: 2023-05-12T07:58:33Z]
2023-05-12T07:49:04Z 6.53GB <logical, selective> [restore_to_time: 2023-05-12T07:54:17Z]
2023-05-12T06:56:45Z 75.56KB <logical, selective> [restore_to_time: 2023-05-12T06:56:52Z]
2023-05-11T14:17:04Z 752.69GB <logical, selective> [restore_to_time: 2023-05-11T17:33:36Z]
2023-05-09T14:56:30Z 2.68TB [restore_to_time: 2023-05-10T01:41:15Z]


[mongod@ip-10-80-11-253 ~]$ pbm logs
2023-05-16T04:30:23Z I [shard1ReplSet/10.80.11.253:27038] got epoch {1684211244 3}
2023-05-16T04:30:23Z I [shard1ReplSet/10.80.11.253:27038] [restore/2023-05-16T04:30:23.527718212Z] backup: 2023-05-09T14:56:30Z
2023-05-16T04:30:23Z I [shard1ReplSet/10.80.11.253:27038] [restore/2023-05-16T04:30:23.527718212Z] restore started
2023-05-16T04:30:24Z I [configReplSet/10.80.11.253:27039] [restore/2023-05-16T04:30:23.527718212Z] moving to state running
2023-05-16T04:30:24Z I [shard2ReplSet/10.80.10.186:27038] [restore/2023-05-16T04:30:23.527718212Z] moving to state running
2023-05-16T04:30:24Z I [shard1ReplSet/10.80.11.253:27038] [restore/2023-05-16T04:30:23.527718212Z] moving to state running
2023-05-16T04:30:27Z I [configReplSet/10.80.11.253:27039] [restore/2023-05-16T04:30:23.527718212Z] moving to state dumpDone
2023-05-16T04:32:47Z I [shard3ReplSet/10.80.10.159:27038] [restore/2023-05-16T04:30:23.527718212Z] moving to state dumpDone
2023-05-16T04:32:52Z I [shard2ReplSet/10.80.10.186:27038] [restore/2023-05-16T04:30:23.527718212Z] moving to state dumpDone
2023-05-16T04:32:55Z I [shard3ReplSet/10.80.10.159:27038] [restore/2023-05-16T04:30:23.527718212Z] starting oplog replay
2023-05-16T04:32:56Z I [shard1ReplSet/10.80.11.253:27038] [restore/2023-05-16T04:30:23.527718212Z] moving to state dumpDone
2023-05-16T04:32:57Z I [shard2ReplSet/10.80.10.186:27038] [restore/2023-05-16T04:30:23.527718212Z] starting oplog replay
2023-05-16T04:32:57Z I [shard1ReplSet/10.80.11.253:27038] [restore/2023-05-16T04:30:23.527718212Z] starting oplog replay
2023-05-16T04:32:58Z I [configReplSet/10.80.11.253:27039] [restore/2023-05-16T04:30:23.527718212Z] starting oplog replay
2023-05-16T04:33:53Z I [shard3ReplSet/10.80.10.159:27038] [restore/2023-05-16T04:30:23.527718212Z] oplog replay finished on {1683682875 266}
2023-05-16T04:33:53Z I [shard3ReplSet/10.80.10.159:27038] [restore/2023-05-16T04:30:23.527718212Z] restore finished successfully
2023-05-16T04:35:58Z I [shard2ReplSet/10.80.10.186:27038] [restore/2023-05-16T04:30:23.527718212Z] oplog replay finished on {1683682875 261}
2023-05-16T04:35:58Z I [shard2ReplSet/10.80.10.186:27038] [restore/2023-05-16T04:30:23.527718212Z] restore finished successfully
2023-05-16T04:40:36Z I [shard1ReplSet/10.80.11.253:27038] [restore/2023-05-16T04:30:23.527718212Z] oplog replay finished on {1683682875 267}
2023-05-16T04:40:36Z I [shard1ReplSet/10.80.11.253:27038] [restore/2023-05-16T04:30:23.527718212Z] restore finished successfully

Hi, I don’t see any problems in the logs you provided. Does the data from the restored collection look complete? You might want to open a bug in our jira.percona.com so the dev team can take a look.
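
For example, a quick document count against the target namespace through mongos would show whether the data actually arrived (host and port below are placeholders, and the namespace is the one from your --ns option):

mongosh "mongodb://<mongos-host>:<mongos-port>" --eval 'db.getSiblingDB("ctr").getCollection("trademarks.gecn").countDocuments({})'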

Hi @Ivan_Groenewold

No, the restore of the specific collection did not complete.
I can see from the logs that the data shards finished their restore in about 5 minutes, but the config server is stuck on the oplog replay.
That restore had been running for 24+ hours and I had to kill it.
I have already opened a bug: [PBM-1120] PBM restore takes long time - Percona JIRA
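
In case it is useful for whoever picks up the ticket, the per-replica-set state of the restore can also be read from PBM’s own metadata on the config server. The collection and field names below are assumptions based on what I see with PBM 2.1 and may differ between versions:

mongosh "mongodb://10.80.11.253:27039" --eval 'db.getSiblingDB("admin").pbmRestores.find({name: "2023-05-16T04:30:23.527718212Z"}, {name: 1, status: 1, "replsets.name": 1, "replsets.status": 1}).toArray()'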

Hi @Dmytro_Zghoba,

Can you please guide me on how to proceed with this?

I am unable to restore a specific collection/DB from the full backup, and this affects our prod environment.
As per the logs, the PBM agent has finished, but it did not update the restore status.
The status has shown “Status: oplog restore” for the past 3 days, and nothing further is revealed in the logs.
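
To gather more detail in the meantime, I am pulling debug-level logs for the config-server agent only and checking its service logs on that host (substitute the name of the restore that is currently stuck; the systemd unit name is an assumption based on a default package install):

pbm logs --tail=200 --severity=D --node=configReplSet/10.80.11.253:27039 --event=restore/2023-05-16T04:30:23.527718212Z
sudo journalctl -u pbm-agent --since "3 days ago"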