PBM restore of a physical or incremental backup in a Docker container

Hi,

We’re running Percona Server for MongoDB and PBM in Docker containers – one in each. I’ve been experimenting with both physical and incremental backups and I think they may be a good option for us, but I’m trying to figure out how to do a restore.

It appears that in order to do so they would somehow need to be running in the same container… is that correct? How would I go about making that work in that case? Are there requirements for how it would run? e.g. I thought about just using supervisor to run both in the same container, but if PBM is stopping mongod, doing things to the data, and then starting it again, how would that work?

This seems like such a basic thing that I’m kinda surprised I can’t find any documentation or even a reference to it. I assume someone must have solved it before, since it works with the Percona MongoDB Kubernetes Operator?

Hi @Richard_Bateman1,

For now, we do not yet have documentation for using PBM in a containerized environment.
The Percona Operator for MongoDB is a good option – it knows how to do this and does the dirty work for you.

Anyway, let me help you.

During a physical/incremental restore, the pbm-agents run mongod (using exec) to perform data preparation. If pbm-agent runs within the same Linux namespace, it uses the same mongod binary. However, when pbm-agent runs in an isolated namespace (another container), as in your case, it cannot run that mongod. You must provide a mongod binary within the same namespace so that pbm-agent can run it.

For example, you can build a Docker image for pbm-agent that includes mongod, or mount a volume with the binary.
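As a rough illustration of the volume-mount option (container names, paths, and the URI are placeholders, and bind-mounting just the binary only works if its shared-library dependencies are also satisfied inside the pbm-agent image – building a combined image, as shown further below, avoids that problem):

# hypothetical: give the pbm-agent container the host's mongod binary and the same dbpath
docker run -d --name pbm-agent \
  -v /usr/bin/mongod:/usr/bin/mongod:ro \
  -v /var/lib/mongo:/data/db \
  -e PBM_MONGODB_URI="mongodb://<pbm-user>:<password>@<mongod-host>:27017" \
  percona/percona-backup-mongodb:2.0.5 pbm-agent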

The restore will look like:

  1. PBM stops the mongod process.
    It can run in a different namespace/container; PBM invokes db.shutdownServer() on each node.
    The process must not be restarted on exit 0 – please ensure a proper restart policy (see the sketch after this list).
  2. PBM replaces the dbpath content (from a backup).
  3. PBM runs mongod with --dbpath pointing at the dbpath to prepare the data.
    The mongod binary must be available on the filesystem so the exec call can load it into memory.
  4. At the end, PBM stops all mongod and pbm-agent processes.
  5. The cluster is then ready to be started by an admin/operator (PBM does not start it).
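From the operator’s side, that flow looks roughly like this (container names and the backup name are placeholders; treat it as a sketch rather than a verified procedure):

# step 1 precondition: make sure Docker will not auto-restart mongod when PBM shuts it down
docker update --restart=no mongod

# kick off the restore with the pbm CLI
pbm restore <backup-name>

# follow progress via the agent output (the database itself is down during a physical restore);
# pbm describe-restore can also report per-node state
docker logs -f pbm-agent

# step 5: once the restore is done, start mongod and pbm-agent again yourself
docker start mongod pbm-agent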

An example Docker image for a percona/percona-server-mongodb:6.0-based cluster:

# build stage: compile pbm and pbm-agent from the PBM sources in the build context
FROM golang:1.19
WORKDIR /build

RUN apt-get update -y && apt-get install -y libkrb5-dev
COPY . .
RUN make

# final stage: the mongod image with the PBM binaries added alongside it
FROM percona/percona-server-mongodb:6.0

COPY --from=0 /go/bin/pbm /go/bin/pbm-agent /usr/local/bin/
# ENV rather than RUN export, so the PATH change persists in the image
ENV PATH="${PATH}:/usr/local/bin"

USER 1001
CMD ["pbm-agent"]

The example is for illustrative purposes only; it isn’t verified as production-ready.
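Wiring the two containers together would then look roughly like this (the volume name, image tag, network details, and credentials are all placeholders – the important part is that mongod and pbm-agent mount the same dbpath volume so the agent can replace its contents):

# mongod container, with the dbpath in a named volume and no auto-restart
docker run -d --name mongod --restart=no \
  -v mongo-data:/data/db \
  percona/percona-server-mongodb:6.0

# pbm-agent container from the combined image, mounting the *same* dbpath volume
docker run -d --name pbm-agent \
  -v mongo-data:/data/db \
  -e PBM_MONGODB_URI="mongodb://<pbm-user>:<password>@<mongod-host>:27017" \
  my-psmdb-pbm:6.0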

Understood – thanks, that helps.

As an aside, a slightly easier way to build that Docker image, which doesn’t require the PBM source, is:

# stage 0: the published PBM image, used only as a source of the binaries
FROM percona/percona-backup-mongodb:2.0.5

# final stage: the mongod image with pbm and pbm-agent copied in
FROM percona/percona-server-mongodb:6.0

COPY --from=0 /usr/bin/pbm /usr/bin/pbm-agent /usr/bin/
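Building it then needs no PBM checkout at all – just the two published images (the tag is whatever you like):

docker build -t my-psmdb-pbm:6.0 .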

I’ve got a restore running now using the container I built from the Dockerfile provided… except it’s 4.4, because we haven’t upgraded yet.

Question about incremental backups – is each incremental a diff from the base, or is each one a diff from the previous incremental? In other words, if I do a weekly incremental base and daily incremental backups and then restore from, say, Wednesday, does it need to download the base and every day up through Wednesday, or just the base and Wednesday?

A diff from the previous one, like git history. All previous increments are applied one by one. For a big dataset it is really fast and economical.

But you can always start a new chain with the --base flag.
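For example, with a weekly base and daily increments (scheduling via cron or similar is up to you):

# weekly: start a new incremental chain (a new base)
pbm backup --type=incremental --base

# daily: an increment on top of the previous backup in the chain
pbm backup --type=incremental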

Understood – so fewer base backups is better for keeping storage small (a full physical backup of my db is > 200G), but worse for quick restores.

That’s what I expected, but I wanted to verify. I’m running a test restore now – it looks like it’s working, though it is currently only a single-member replica set. We’ll see how this goes =]

I appreciate the help!

For a multi-node replica set it should be faster than a logical restore. PBM replaces the dbpath on every replica set member, so on startup mongod does not need to do an initial sync (streaming collections and the oplog from the primary to the secondaries in logical format).

I’m not sure if this should be a new topic or if it’s related somehow to using it within a container…

I tried the restore over the weekend but the restore failed. The full log can be found at SignalStuff PrivateBin

The most relevant bits of the log, with a little surrounding context, are:

2023-04-15T06:18:20.000+0000 I [restore/2023-04-14T19:15:45.184193361Z] copy <2023-04-14T04:37:40Z/insightset/index/4585-9101825161959954868.wt.zst.100663296-16777216> to </data/db/index/4585-9101825161959954868.wt>
2023-04-15T06:18:22.000+0000 I [restore/2023-04-14T19:15:45.184193361Z] copy <2023-04-14T04:37:40Z/insightset/index/4585-9101825161959954868.wt.zst.184549376-83886080> to </data/db/index/4585-9101825161959954868.wt>
2023-04-15T06:18:26.000+0000 I [restore/2023-04-14T19:15:45.184193361Z] copy <2023-04-14T04:37:40Z/insightset/journal/WiredTigerLog.0000000277.zst.0-104857600> to </data/db/journal/WiredTigerLog.0000000277>
2023-04-15T06:18:29.000+0000 I [restore/2023-04-14T19:15:45.184193361Z] copy <2023-04-14T04:37:40Z/insightset/journal/WiredTigerLog.0000000277.zst> to </data/db/journal/WiredTigerLog.0000000277>
2023-04-15T06:18:30.000+0000 I [restore/2023-04-14T19:15:45.184193361Z] copy <2023-04-14T04:37:40Z/insightset/journal/WiredTigerLog.0000000278.zst> to </data/db/journal/WiredTigerLog.0000000278>
2023-04-15T06:18:34.000+0000 I [restore/2023-04-14T19:15:45.184193361Z] preparing data
2023/04/15 06:18:35 mongod process: exit status 14, [1681539515:302757][287:0x7f9c66559540], txn-recover: __posix_open_file, 808: /data/db/key.db/./WiredTigerLog.0000000006: handle-open: open: No such file or directory
[1681539515:302800][287:0x7f9c66559540], txn-recover: __wt_txn_recover, 1109: Recovery failed: No such file or directory
[1681539515:303322][287:0x7f9c66559540], connection: __wt_cache_destroy, 364: cache server: exiting with 3 pages in memory and 0 pages evicted
[1681539515:303342][287:0x7f9c66559540], connection: __wt_cache_destroy, 370: cache server: exiting with 6574 bytes in memory
[1681539515:303347][287:0x7f9c66559540], connection: __wt_cache_destroy, 376: cache server: exiting with 6236 bytes dirty and 1 pages dirty
[1681539515:340225][287:0x7f9c66559540], txn-recover: __posix_open_file, 808: /data/db/key.db/./WiredTigerLog.0000000006: handle-open: open: No such file or directory
[1681539515:340236][287:0x7f9c66559540], txn-recover: __wt_txn_recover, 1109: Recovery failed: WT_ERROR: non-specific WiredTiger error
[1681539515:340499][287:0x7f9c66559540], connection: __wt_cache_destroy, 364: cache server: exiting with 2 pages in memory and 0 pages evicted
[1681539515:340506][287:0x7f9c66559540], connection: __wt_cache_destroy, 370: cache server: exiting with 6637 bytes in memory
[1681539515:340509][287:0x7f9c66559540], connection: __wt_cache_destroy, 376: cache server: exiting with 6468 bytes dirty and 1 pages dirty
[1681539515:381006][287:0x7f9c66559540], txn-recover: __posix_open_file, 808: /data/db/key.db/./WiredTigerLog.0000000006: handle-open: open: No such file or directory
[1681539515:381016][287:0x7f9c66559540], txn-recover: __wt_txn_recover, 1109: Recovery failed: WT_ERROR: non-specific WiredTiger error
[1681539515:381311][287:0x7f9c66559540], connection: __wt_cache_destroy, 364: cache server: exiting with 2 pages in memory and 0 pages evicted
[1681539515:381318][287:0x7f9c66559540], connection: __wt_cache_destroy, 370: cache server: exiting with 6637 bytes in memory
[1681539515:381321][287:0x7f9c66559540], connection: __wt_cache_destroy, 376: cache server: exiting with 6468 bytes dirty and 1 pages dirty
2023-04-15T06:23:35.000+0000 D [restore/2023-04-14T19:15:45.184193361Z] rm tmp conf
2023-04-15T06:23:35.000+0000 D [restore/2023-04-14T19:15:45.184193361Z] clean-up dbpath

The issue seems to be that /data/db/key.db/./WiredTigerLog.0000000006 doesn’t exist – and indeed that filename does not show up anywhere in the log file (WiredTigerLog.0000000005 does), but I do see a file 2023-04-08T18:50:23Z/insightset/key.db/WiredTigerLog.0000000006.zst in the base incremental backup snapshot.

There were some failed incremental backup attempts in between the one I restored and the base.

Any ideas where I should start troubleshooting?

An additional data point that may or may not matter: the missing file is only 859 bytes compressed, but expands to something much larger than that.

-rw-r--r--  1 ops  ops  5242880 Apr  9 03:32 WiredTigerLog.0000000006
-rw-r--r--  1 ops  ops      859 Apr  9 03:32 WiredTigerLog.0000000006.zst

That seems like it would mean that the file is mostly empty, but I don’t know if that’s normal or what.

Hi @Richard_Bateman1

PBM 2.1.0 was released on April 18 and is supposed to have this issue fixed. However, please note that incremental backups made with previous PBM versions are incompatible with restores in PBM 2.1.0, so you will need to make new backups. Please try the new version and let us know if there are any issues.
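For example (a sketch; updating the images themselves is up to you):

# after updating the pbm CLI and all pbm-agent containers to 2.1.0
pbm version
pbm status

# pre-2.1.0 incremental chains cannot be restored by 2.1.0, so start a fresh chain
pbm backup --type=incremental --base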