Hello everyone,
I’m currently working on setting up a new MongoDB cluster with 3 nodes for development and testing purposes. All three nodes are hosted on the same machine. I’m using Docker to run this MongoDB cluster along with containers for percona-backup-mongodb. Below is my docker-compose.yml file:
version: "3"
services:
  rs101:
    image: percona/percona-server-mongodb:6.0
    container_name: rs101
    hostname: rs101
    ports:
      - "27017:27017"
    networks:
      - db-network
    command: "--port=27017 --replSet rs"
  rs102:
    image: percona/percona-server-mongodb:6.0
    container_name: rs102
    hostname: rs102
    ports:
      - "28017:28017"
    networks:
      - db-network
    command: "--port=28017 --replSet rs"
  rs103:
    image: percona/percona-server-mongodb:6.0
    container_name: rs103
    hostname: rs103
    ports:
      - "29017:29017"
    networks:
      - db-network
    command: "--port=29017 --replSet rs"
  percona-backup-mongod-1:
    container_name: pbm1
    networks:
      - db-network
    environment:
      - PBM_MONGODB_URI=mongodb://pbmuser:secretpwd@rs101:27017
    image: percona/percona-backup-mongodb:latest
  percona-backup-mongodb-2:
    container_name: pbm2
    networks:
      - db-network
    environment:
      - PBM_MONGODB_URI=mongodb://pbmuser:secretpwd@rs102:28017
    image: percona/percona-backup-mongodb:latest
  percona-backup-mongodb-3:
    container_name: pbm3
    networks:
      - db-network
    environment:
      - PBM_MONGODB_URI=mongodb://pbmuser:secretpwd@rs103:29017
    image: percona/percona-backup-mongodb:latest
  rs-init:
    image: percona/percona-server-mongodb:6.0
    container_name: rs-init
    restart: "no"
    networks:
      - db-network
    depends_on:
      - rs101
      - rs102
      - rs103
    command: >
      mongosh --host rs101:27017 --eval
      '
      config = {
      "_id" : "rs",
      "members" : [
        {
          "_id" : 0,
          "host" : "rs101:27017"
        },
        {
          "_id" : 1,
          "host" : "rs102:28017"
        },
        {
          "_id" : 2,
          "host" : "rs103:29017"
        }
      ]
      };
      rs.initiate(config);
      '
networks:
  db-network:
    driver: bridge
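Once the containers are up, replica set membership can be confirmed from rs101 with something like the sketch below. It assumes mongosh is available inside the rs101 container (the rs-init service already relies on it in the same image) and that no authentication is enforced, since mongod is started without auth options here. The DOCKER variable and the function name are illustrative, not part of my setup.

```shell
#!/bin/sh
# Print each replica set member and its state, as seen from rs101.
# DOCKER is an illustrative override so the command can be dry-run.
DOCKER="${DOCKER:-docker}"

rs_member_states() {
    # rs.status() lists every member with its name and state string.
    $DOCKER exec rs101 mongosh --port 27017 --quiet --eval \
        'rs.status().members.forEach(m => print(m.name + " " + m.stateStr))'
}

rs_member_states || echo "could not query rs101" >&2
```

If the replica set initiated correctly, this should list one primary and two secondaries, matching what pbm status reports.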
At this stage, I can create logical snapshot backups, which are stored in a MinIO bucket:
$ pbm list
Backup snapshots:
2024-04-03T10:10:53Z <logical> [restore_to_time: 2024-04-03T10:11:04Z]
2024-04-03T11:49:10Z <logical> [restore_to_time: 2024-04-03T11:49:23Z]
2024-04-03T11:54:49Z <logical> [restore_to_time: 2024-04-03T11:55:00Z]
2024-04-03T11:56:48Z <logical> [restore_to_time: 2024-04-03T11:57:00Z]
2024-04-03T12:31:19Z <logical> [restore_to_time: 2024-04-03T12:31:25Z]
PITR <on>:
However, I’m encountering difficulties in implementing incremental backups:
$ pbm backup --type incremental --base
Starting backup '2024-04-03T13:00:59Z'....Backup '2024-04-03T13:00:59Z' to remote store 's3://xxxxxxxxxxxx/mongo-backup' has started
$ pbm logs -s D -t 1000
2024-04-03T13:01:00Z I [rs/rs101:27017] got command backup [name: 2024-04-03T13:00:59Z, compression: s2 (level: default)] <ts: 1712149259>
2024-04-03T13:01:00Z I [rs/rs102:28017] got command backup [name: 2024-04-03T13:00:59Z, compression: s2 (level: default)] <ts: 1712149259>
2024-04-03T13:01:00Z I [rs/rs101:27017] got epoch {1712148849 3}
2024-04-03T13:01:00Z I [rs/rs102:28017] got epoch {1712148849 3}
2024-04-03T13:01:00Z I [rs/rs103:29017] got command backup [name: 2024-04-03T13:00:59Z, compression: s2 (level: default)] <ts: 1712149259>
2024-04-03T13:01:00Z D [rs/rs101:27017] [backup/2024-04-03T13:00:59Z] init backup meta
2024-04-03T13:01:00Z I [rs/rs103:29017] got epoch {1712148849 3}
2024-04-03T13:01:00Z D [rs/rs101:27017] [backup/2024-04-03T13:00:59Z] nomination list for rs: [[rs102:28017 rs103:29017] [rs101:27017]]
2024-04-03T13:01:00Z D [rs/rs101:27017] [backup/2024-04-03T13:00:59Z] nomination rs, set candidates [rs102:28017 rs103:29017]
2024-04-03T13:01:00Z I [rs/rs102:28017] [backup/2024-04-03T13:00:59Z] backup started
2024-04-03T13:01:00Z D [rs/rs103:29017] [backup/2024-04-03T13:00:59Z] skip after nomination, probably started by another node
2024-04-03T13:01:01Z D [rs/rs102:28017] [backup/2024-04-03T13:00:59Z] flush incremental backup history
2024-04-03T13:01:01Z D [rs/rs102:28017] [backup/2024-04-03T13:00:59Z] backup cursor id: 45d13b6e-84a5-448c-a7bf-1caca7035795
2024-04-03T13:01:04Z D [rs/rs102:28017] [backup/2024-04-03T13:00:59Z] set journal up to {1712149261 2}
2024-04-03T13:01:05Z I [rs/rs102:28017] [backup/2024-04-03T13:00:59Z] uploading data
2024-04-03T13:01:05Z D [rs/rs102:28017] [backup/2024-04-03T13:00:59Z] stop cursor polling: <nil>, cursor err: <nil>
2024-04-03T13:01:05Z I [rs/rs102:28017] [backup/2024-04-03T13:00:59Z] mark RS as error `upload data files: upload file `/data/db/journal/WiredTigerLog.0000000006`: get file stat: stat /data/db/journal/WiredTigerLog.0000000006: no such file or directory`: <nil>
2024-04-03T13:01:05Z I [rs/rs102:28017] [backup/2024-04-03T13:00:59Z] mark backup as error `upload data files: upload file `/data/db/journal/WiredTigerLog.0000000006`: get file stat: stat /data/db/journal/WiredTigerLog.0000000006: no such file or directory`: <nil>
2024-04-03T13:01:05Z E [rs/rs102:28017] [backup/2024-04-03T13:00:59Z] backup: upload data files: upload file `/data/db/journal/WiredTigerLog.0000000006`: get file stat: stat /data/db/journal/WiredTigerLog.0000000006: no such file or directory
2024-04-03T13:01:05Z D [rs/rs102:28017] [backup/2024-04-03T13:00:59Z] releasing lock
2024-04-03T13:01:05Z D [rs/rs101:27017] [backup/2024-04-03T13:00:59Z] bcp nomination: rs won by rs102:28017
2024-04-03T13:01:05Z D [rs/rs101:27017] [backup/2024-04-03T13:00:59Z] skip after nomination, probably started by another node
According to the log, the backup fails on rs102 because PBM cannot stat /data/db/journal/WiredTigerLog.0000000006. I can locate this file on one of the MongoDB nodes (rs103), but not on the other nodes.
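To narrow this down, a quick check of which containers can actually see that path might look like the sketch below. The container names come from the docker-compose.yml above. The DOCKER variable and the function name are illustrative, and the sketch assumes each image ships a `test` binary.

```shell
#!/bin/sh
# Report which containers can see the journal file named in the PBM error.
# DOCKER is an illustrative override so the check can be dry-run.
DOCKER="${DOCKER:-docker}"
JOURNAL_FILE="/data/db/journal/WiredTigerLog.0000000006"

check_journal_file() {
    for c in "$@"; do
        # "test -e" succeeds only if the path exists inside that container.
        if $DOCKER exec "$c" test -e "$JOURNAL_FILE" 2>/dev/null; then
            echo "$c: present"
        else
            echo "$c: missing"
        fi
    done
}

# Check both the mongod containers and the pbm-agent containers.
check_journal_file rs101 rs102 rs103 pbm1 pbm2 pbm3
```

Including pbm1 to pbm3 shows whether the agents share a filesystem view with the mongod containers; a container that is not running is also reported as missing.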
Here is the current PBM status:
$ pbm status
Cluster:
========
rs:
  - rs/rs101:27017 [P]: pbm-agent v2.4.1 OK
  - rs/rs102:28017 [S]: pbm-agent v2.4.1 OK
  - rs/rs103:29017 [S]: pbm-agent v2.4.1 OK

PITR incremental backup:
========================
Status [ON]

Currently running:
==================
(none)
And the backup config:
storage:
  type: s3
  s3:
    endpointUrl: "http://xxxxxxxxxx"
    region: us-east-1
    bucket: mongo-backup
    credentials:
      access-key-id: xxxxxxx
      secret-access-key: xxxxxxx
I’m wondering if anyone could shed some light on what might be going wrong or if there’s something I might have missed in the setup process.
Thank you in advance for your help and suggestions!