Restore on a local Docker deployment

Hi,

I’ve set up Percona Server for MongoDB on a bare-metal Kubernetes cluster and things are mostly working fine. I’ve enabled backups to S3-compatible storage and use SSE-C to encrypt them.

On the cluster I’ve tried different recovery scenarios and everything works the way I want, which is great.

However, I’m running into trouble trying to restore locally from the S3 storage.

What I’ve set up so far:

  • create a read-only S3 key so the restore absolutely cannot impact the backups
  • start a local percona-server-mongodb:8.0.19-7 Docker container (same as on the Kubernetes cluster)
  • start a local percona-backup-mongodb:2.12.0 Docker container with a config pointing to the prod S3 storage with the read-only user (same as on the Kubernetes cluster)

The issue is that pbm list shows nothing by default, so I tried to force a resync to update the metadata, but I get a permission error:

docker exec -it pbm pbm config --force-resync --wait
Error: waiting for resync [opid "6a22d1645c1d961583a1e455"]: resync: reinit storage: delete init file: delete 'mongo-backups/.pbm.init' file from S3: operation error S3: DeleteObject, https response error StatusCode: 403, RequestID: tx8c2b460608fe497a85ba8-006a22d165, HostID: tx8c2b460608fe497a85ba8-006a22d165, api error AccessDenied: Access Denied.

I understand that pbm is trying to delete .pbm.init, but I’m not too keen on the restore being able to modify the storage. It feels like I’m missing something. Is it normal for the resync to modify .pbm.init? Any suggestions on how to approach this?

Thanks!

P.S.: the percona/percona-backup-mongodb Docker image documentation isn’t correct. I believe the commands at the bottom should use docker exec, not docker run.


percona config

storage:
  type: s3
  s3:
    region: gra
    endpointUrl: https://s3.gra.io.cloud.ovh.net
    forcePathStyle: true
    bucket: mongo-backups
    credentials:
      access-key-id: redacted
      secret-access-key: redacted
    serverSideEncryption:
      sseCustomerAlgorithm: AES256
      sseCustomerKey: redacted

docker commands

docker network create pbm

docker run --network pbm --name pbm_mongo -d percona/percona-server-mongodb:8.0.19-7 --replSet=rs0
docker exec -it pbm_mongo mongosh --eval='rs.initiate()'

docker run --network pbm --name pbm --mount type=bind,source=$(pwd)/pbm_config.yaml,target=/tmp/pbm_config.yaml,readonly=true -e PBM_MONGODB_URI="mongodb://pbm_mongo:27017" -d percona/percona-backup-mongodb:2.12.0
docker exec -it pbm pbm config --file /tmp/pbm_config.yaml
docker exec -it pbm pbm config --force-resync --wait

EDIT: looking at the code, write access is necessary. My question is: can I resync, or at least import locally, in read-only mode?

// Resync sync oplog, backup, and restore meta from provided storage.
//
// It checks for read and write permissions, drops all meta from the database
// and populate it again by reading meta from the storage.
func Resync(
	ctx context.Context,
	conn connect.Client,
	cfg *config.StorageConf,
	node string,
	includeRestores bool,
) error {
	l := log.LogEventFromContext(ctx)

	stg, err := util.StorageFromConfig(cfg, node, l)
	if err != nil {
		return errors.Wrap(err, "unable to get backup store")
	}

	err = storage.HasReadAccess(ctx, stg)
	if err != nil {
		if !errors.Is(err, storage.ErrUninitialized) {
			return errors.Wrap(err, "check read access")
		}

		err = util.Initialize(ctx, stg)
		if err != nil {
			return errors.Wrap(err, "init storage")
		}
	} else {
		// check write permission and update PBM version
		err = util.Reinitialize(ctx, stg)
		if err != nil {
			return errors.Wrap(err, "reinit storage")
		}
	}

	err = SyncBackupList(ctx, conn, cfg, "", node)
	if err != nil {
		l.Error("failed sync backup metadata: %v", err)
	}

	err = resyncOplogRange(ctx, conn, stg)
	if err != nil {
		l.Error("failed sync oplog range: %v", err)
	}

	err = resyncPhysicalRestores(ctx, conn, stg, includeRestores)
	if err != nil {
		l.Error("failed sync physical restore metadata: %v", err)
	}

	return nil
}

Hi, thanks for reaching out. Unfortunately you ran into a known issue, which the engineering team will need to fix. I suggest you add a +1 to the ticket so it can get prioritized: https://perconadev.atlassian.net/browse/PBM-1480

Thanks @Ivan_Groenewold, it’s exactly that. I didn’t find the issue or the original post when I searched for it. I will see if I can make a dirty patch for the time being so I can actually use my backups.

In the meantime there is a workaround that engineering came up with:

Configure 2 storage profiles in the target cluster:

  • dummy (empty) main storage
  • profile with configuration to access storage of the source cluster (Read-only perms)

Give it a shot and let me know if that works.

From the looks of it, the storage profile workaround doesn’t support PITR, so it’s not ideal for my scenario.

I just made a small dirty patch:

diff --git a/pbm/resync/rsync.go b/pbm/resync/rsync.go
index c93b816f..176e0bae 100644
--- a/pbm/resync/rsync.go
+++ b/pbm/resync/rsync.go
@@ -52,10 +52,10 @@ func Resync(
                }
        } else {
                // check write permission and update PBM version
-               err = util.Reinitialize(ctx, stg)
-               if err != nil {
-                       return errors.Wrap(err, "reinit storage")
-               }
+               // err = util.Reinitialize(ctx, stg)
+               // if err != nil {
+               //      return errors.Wrap(err, "reinit storage")
+               // }
        }
 
        err = SyncBackupList(ctx, conn, cfg, "", node)

then rebuild

make build

then start the agent

export PBM_MONGODB_URI="mongodb://172.20.0.2:27017/?directConnection=true"
./pbm-agent

then sync and restore

./pbm config --force-resync --wait
Storage resync finished

$ ./pbm list
Backup snapshots:
  2026-06-04T19:51:32Z <logical> [restore_to_time: 2026-06-04T19:58:04]

PITR <off>:
  2026-06-04T19:58:05 - 2026-06-05T15:00:30

$ ./pbm restore --time=2026-06-05T15:00:30
Starting restore 2026-06-05T15:04:19.101748157Z to point-in-time 2026-06-05T15:00:30 from '2026-06-04T19:51:32Z'...............Error: get metadata: get: context deadline exceeded

I get an Error: get metadata: get: context deadline exceeded, but the agent is happily chugging away and restoring (painfully slowly).

Glad to hear you were able to work around it. To speed up the restore you can play with the settings described under Restore options in the Percona Backup for MongoDB documentation, among other options. However, keep in mind that logical restore is slow. Physical backup/restore is usually recommended if you have a significant amount of data (more than a few GB).

Thanks @Ivan_Groenewold, I will look into this. I will try not to hijack this thread, but just out of curiosity, would these kinds of metrics make sense? I thought it might have been an S3 decode/download issue, but I can happily download at 10 MiB/s with rclone, and the host system is very far from being overwhelmed.

Each oplog slice covers a 15-minute window and is around 20-25 MB compressed (s2), except the first, which is a bit bigger:

  • pbmPitr/rs0/20260604/20260604195804-201.20260604202520-50.oplog.s2 (38.49 MB) – applied in ~17 minutes
  • pbmPitr/rs0/20260604/20260604202520-50.20260604204020-63.oplog.s2 (21.20 MB) – applied in 10 minutes

2026-06-05T16:18:12.000+0100 I [restore/2026-06-05T15:04:19.101748157Z] starting oplog replay
2026-06-05T16:18:12.000+0100 D [restore/2026-06-05T15:04:19.101748157Z] + applying {rs0 2026-06-04T19:51:32Z/rs0/oplog/20260604195133-104.20260604195804-201.gz gzip {1780602693 104} {1780603084 201} 7329304}

2026-06-05T16:20:10.000+0100 D [restore/2026-06-05T15:04:19.101748157Z] + applying {rs0 pbmPitr/rs0/20260604/20260604195804-201.20260604202520-50.oplog.s2 s2 {1780603084 201} {1780604720 50} 38486786}
2026-06-05T16:23:56.000+0100 W [restore/2026-06-05T15:04:19.101748157Z] failed to download chunk 8388608-16777215
2026-06-05T16:27:56.000+0100 W [restore/2026-06-05T15:04:19.101748157Z] failed to download chunk 16777216-25165823
2026-06-05T16:31:44.000+0100 W [restore/2026-06-05T15:04:19.101748157Z] failed to download chunk 25165824-33554431
2026-06-05T16:35:22.000+0100 W [restore/2026-06-05T15:04:19.101748157Z] failed to download chunk 33554432-41943039

2026-06-05T16:37:48.000+0100 D [restore/2026-06-05T15:04:19.101748157Z] + applying {rs0 pbmPitr/rs0/20260604/20260604202520-50.20260604204020-63.oplog.s2 s2 {1780604720 50} {1780605620 63} 21197197}
2026-06-05T16:41:40.000+0100 W [restore/2026-06-05T15:04:19.101748157Z] failed to download chunk 8388608-16777215
2026-06-05T16:45:21.000+0100 W [restore/2026-06-05T15:04:19.101748157Z] failed to download chunk 16777216-25165823
2026-06-05T16:47:21.000+0100 D [restore/2026-06-05T15:04:19.101748157Z] + applying {rs0 pbmPitr/rs0/20260604/20260604204020-63.20260604205520-73.oplog.s2 s2 {1780605620 63} {1780606520 73} 22525770}
2026-06-05T16:51:03.000+0100 W [restore/2026-06-05T15:04:19.101748157Z] failed to download chunk 8388608-16777215
2026-06-05T16:53:45.000+0100 W [restore/2026-06-05T15:04:19.101748157Z] failed to download chunk 16777216-25165823
2026-06-05T16:55:29.000+0100 D [restore/2026-06-05T15:04:19.101748157Z] + applying {rs0 pbmPitr/rs0/20260604/20260604205520-73.20260604211020-56.oplog.s2 s2 {1780606520 73} {1780607420 56} 23572346}
2026-06-05T16:58:51.000+0100 W [restore/2026-06-05T15:04:19.101748157Z] failed to download chunk 8388608-16777215
2026-06-05T17:02:23.000+0100 W [restore/2026-06-05T15:04:19.101748157Z] failed to download chunk 16777216-25165823
2026-06-05T17:04:17.000+0100 D [restore/2026-06-05T15:04:19.101748157Z] + applying {rs0 pbmPitr/rs0/20260604/20260604211020-56.20260604212520-62.oplog.s2 s2 {1780607420 56} {1780608320 62} 22831040}
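
For a rough sense of scale, the effective apply rate can be worked out from the log itself. The Go sketch below is only an illustration: it assumes the time between two consecutive "+ applying" lines is the time spent on the first slice, and that the last field of each entry is the compressed size in bytes. `bytesPerSecond` and `logLayout` are names made up for this example.

```go
package main

import (
	"fmt"
	"time"
)

// logLayout matches the timestamps in the agent log, e.g. 2026-06-05T16:20:10.000+0100.
const logLayout = "2006-01-02T15:04:05.000-0700"

// bytesPerSecond returns the effective rate for a slice of the given
// compressed size, measured between two log timestamps.
func bytesPerSecond(start, end string, size int64) (float64, error) {
	t0, err := time.Parse(logLayout, start)
	if err != nil {
		return 0, err
	}
	t1, err := time.Parse(logLayout, end)
	if err != nil {
		return 0, err
	}
	secs := t1.Sub(t0).Seconds()
	if secs <= 0 {
		return 0, fmt.Errorf("end %s is not after start %s", end, start)
	}
	return float64(size) / secs, nil
}

func main() {
	// Timestamp of each "+ applying" line, the timestamp of the next one,
	// and the size taken from the last field of the log entry.
	slices := []struct {
		start, end string
		size       int64
	}{
		{"2026-06-05T16:20:10.000+0100", "2026-06-05T16:37:48.000+0100", 38486786},
		{"2026-06-05T16:37:48.000+0100", "2026-06-05T16:47:21.000+0100", 21197197},
	}
	for _, s := range slices {
		rate, err := bytesPerSecond(s.start, s.end, s.size)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%d bytes: %.1f KiB/s\n", s.size, rate/1024)
	}
}
```

Under those assumptions both slices come out at about 35 to 36 KiB/s of compressed oplog, far below the 10 MiB/s measured with rclone.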

Edit: I find those logs a bit odd. The documentation you linked says the download chunks are 32 MB, but here it’s clearly trying to fetch 8 MB chunks, and I haven’t changed the config.
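
The 8 MB figure can be checked directly from the byte ranges in the warnings. The Go sketch below is just that arithmetic; `rangeLen` and `chunkCount` are helper names invented here, and the ranges are assumed to be inclusive, as in HTTP Range requests.

```go
package main

import "fmt"

const mib = 1 << 20

// rangeLen returns the number of bytes in an inclusive byte range first-last.
func rangeLen(first, last int64) int64 { return last - first + 1 }

// chunkCount returns how many chunks of chunkSize bytes cover an object of size bytes.
func chunkCount(size, chunkSize int64) int64 {
	return (size + chunkSize - 1) / chunkSize
}

func main() {
	// Range taken from the "failed to download chunk 8388608-16777215" warning.
	fmt.Printf("chunk length: %d MiB\n", rangeLen(8388608, 16777215)/mib)

	// Sizes of the first two PITR slices, from the last field of the log entries.
	for _, size := range []int64{38486786, 21197197} {
		fmt.Printf("%d bytes: %d chunks at 8 MiB, %d at 32 MiB\n",
			size, chunkCount(size, 8*mib), chunkCount(size, 32*mib))
	}
}
```

Each warned range is exactly 8 MiB long. The first slice spans five 8 MiB chunks and the second spans three, which lines up with the four and two warnings logged for them (every chunk after the first).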

Unfortunately, oplog apply is single-threaded. Also, the cache typically starts cold during a restore, which makes things even worse. So PITR replay is quite slow compared to regular replication oplog apply.

I opened a ticket for the eng team to review the chunk size mismatch; feel free to subscribe for updates: https://perconadev.atlassian.net/browse/PBM-1781