Hi all,
I’m struggling to restore from an incremental backup. I’ve deployed an application to an OpenShift cluster and created two containers inside the same pod so that they can reach each other. I can take a backup as expected from the PBM container, but I’m unable to restore the database from the incremental backup, because the PBM agent cannot reach mongod — it’s in another container.
sh-4.4$ pbm restore 2024-01-26T07:57:50Z
Starting restore 2024-01-26T08:38:17.422755382Z from '2024-01-26T07:57:50Z'. Error: check mongod binary: run: exec: "mongod": executable file not found in $PATH. stderr:
- Restore on replicaset "rs0" in state: error: check mongod binary: run: exec: "mongod": executable file not found in $PATH. stderr:
2024-01-26T08:38:01.000+0000 I [pitr] got done signal, stopping
2024-01-26T08:38:06.000+0000 I [pitr] created chunk 2024-01-26T08:28:31 - 2024-01-26T08:38:01
2024-01-26T08:38:06.000+0000 I [pitr] pausing/stopping with last_ts 2024-01-26 08:38:01 +0000 UTC
2024-01-26T08:38:17.000+0000 I got command restore [name: 2024-01-26T08:38:17.422755382Z, snapshot: 2024-01-26T07:57:50Z] <ts: 1706258297>
2024-01-26T08:38:17.000+0000 I got epoch {1706258271 2}
2024-01-26T08:38:17.000+0000 I [restore/2024-01-26T08:38:17.422755382Z] backup: 2024-01-26T07:57:50Z
2024-01-26T08:38:17.000+0000 I [restore/2024-01-26T08:38:17.422755382Z] recovery started
2024-01-26T08:38:17.000+0000 D [restore/2024-01-26T08:38:17.422755382Z] port: 27637
2024-01-26T08:38:17.000+0000 E [restore/2024-01-26T08:38:17.422755382Z] restore: check mongod binary: run: exec: "mongod": executable file not found in $PATH. stderr:
2024-01-26T08:38:17.000+0000 D [restore/2024-01-26T08:38:17.422755382Z] hearbeats stopped
2024-01-26T08:39:16.000+0000 D [pitr] start_catchup [oplog only]
How can I redirect the request to the PSMDB container so that the mongod daemon can be stopped? The mongod binary is located in the /usr/bin folder there.
I’d like to mention that I cannot use the Operator, as I need MongoDB 7 and incremental backups.
Thanks in advance for your assistance.
Hi @Iliterallyneedhelp,
Restore on replicaset “rs0” in state: error: check mongod binary: run: exec: “mongod”: executable file not found in $PATH. stderr:
The error means that pbm-agent performed pre-checks and couldn’t find the mongod binary in its $PATH. The “actual” restore hasn’t started yet.
During a Physical Restore, the PBM agent shuts down the mongod process remotely by sending db.adminCommand("shutdown"), copies the backup files to the dbpath, and execs mongod --dbpath=[...] to perform “post-restore” actions.
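Conceptually, the sequence looks roughly like the commands below (a simplified sketch, not PBM’s actual implementation; /data/db is a placeholder dbpath, and the port is the one from the log above):

# Simplified sketch of the physical-restore steps; not the real implementation.
mongo "$PBM_MONGODB_URI" --eval 'db.adminCommand({ shutdown: 1 })'  # stop mongod remotely
# ...copy the backup files from the storage into the dbpath...
mongod --dbpath /data/db --port 27637  # temporary mongod for the "post-restore" actions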
Make sure of the following (a quick way to verify each point is sketched after this list):
- The PBM Agent container has the same mongod binary as the PSMDB container, and its $PATH contains mongod (i.e., pbm-agent can exec the mongod process)
- The PSMDB dbpath is a mounted volume (i.e., it’s accessible to other containers)
- The PBM Agent container has the same volume mounted as the PSMDB container (i.e., it can read/write the PSMDB dbpath content)
- The PBM Agent runs under the same User ID or Group ID as the PSMDB mongod process (i.e., it has the same read/write permissions for the dbpath)
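For example, you can sanity-check these points from a shell inside the PBM Agent container (a rough sketch; /data/db is a placeholder for your actual dbpath):

# Run inside the PBM Agent container; /data/db stands in for the real dbpath.
which mongod                                       # mongod must resolve via $PATH
mongod --version                                   # and should match the PSMDB server version
ls -ld /data/db                                    # the dbpath volume must be mounted here too
touch /data/db/.pbm_test && rm /data/db/.pbm_test  # read/write permissions on the dbpath
id                                                 # UID/GID should match the mongod process in the PSMDB container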
Hi @Dmytro_Zghoba
Thanks for your answer. I finally realized why mongod is required in the PBM Agent container, and the point below was indeed the issue. I used your response from the past as an example:
The PBM Agent container has the same mongod binary as the PSMDB container, and its $PATH contains mongod (i.e., pbm-agent can exec the mongod process)
I’m just curious about one thing.
mongod is the process that keeps the container alive, but the container needs to stop it in order to run the restore. Additionally, after stopping mongod, the PBM Agent container is going to be shut down within 5 minutes. I have a large database, about 150 GB, so the restore is going to take a while.
Could you please advise what’s the best approach to keep these two containers alive while the restore is in progress?
Thanks once more, Dmytro
For restore purposes, I’d change the restart policy and liveness checks for the containers/pod.
Make the pod run with the PBM container only.
At the end of the Physical Restore, the PBM container also stops, and you will need to start the whole cluster manually.
It may look like this (an example oc sequence follows the list):
- delete the Services/Endpoints (i.e., stop client connections)
- restart the cluster with updated pod/container configs
- run the restore
- (when the restore is done) run the cluster with the normal pod/container configs
- create the Services/Endpoints again
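As a rough example with oc (every object and file name below is a hypothetical placeholder; substitute your actual Service, pod manifests, and backup name):

oc delete service mongodb-server                         # 1. stop client connections
oc delete pod mongodb-server-0                           # 2a. stop the normal pod...
oc apply -f restore-pod.yaml                             # 2b. ...and start one with the PBM container only
oc exec restore-pod -- pbm restore 2024-01-26T07:57:50Z  # 3. run the restore
oc delete pod restore-pod                                # 4a. remove the restore pod...
oc apply -f normal-pod.yaml                              # 4b. ...and bring back the normal pod/containers config
oc apply -f mongodb-service.yaml                         # 5. re-create the Service/Endpoints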
You cannot have a partially running cluster, because data consistency and the internal cluster state would be broken. Until the cluster is fully recovered, you should not allow any client connections.
I suggest you look at Percona Operator for MongoDB. It is an open-source product and is available for free (Apache 2.0).
Hi again, @Dmytro_Zghoba
I’ve followed your recommendation, and I’m getting the logs below after editing start-agent.sh. I tried to disable the restart policy inside the PBM container.
pbm config --file pbm-config.yaml
Error: connect to mongodb: create mongo connection: mongo ping: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: mongodb-server:27017, Type: Unknown, Last error: dial tcp 172.30.12.91:27017: connect: connection refused }, ] }
+ exec pbm-agent
2024/01/29 08:07:05 Exit: connect to PBM: create mongo connection: mongo ping: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: mongodb-server:27017, Type: Unknown, Last error: dial tcp 172.30.12.91:27017: connect: connection refused }, ] }
It’s fairly obvious that the mongodb server cannot be reached, since I followed your recommendation to run only the PBM Agent container. Which file do I need to edit, and how, to make the pbm commands work?
My current start-agent.sh looks like this:
#!/bin/bash
for argv; do
  if [[ -n "$use_next" ]]; then
    export PBM_MONGODB_URI="${argv}"
    break
  fi
  if [[ "$argv" == '--mongodb-uri' ]]; then
    use_next='true'
    # TODO should we check if last?
    continue
  elif [[ "$argv" == '--mongodb-uri='* ]]; then
    export PBM_MONGODB_URI="${argv#--mongodb-uri=}"
    break
  fi
done
# TODO should we check if all parts are set?
set +o xtrace
[[ -z "$PBM_MONGODB_URI" ]] && export PBM_MONGODB_URI="mongodb://${PBM_AGENT_MONGODB_USERNAME}:${PBM_AGENT_MONGODB_PASSWORD}@localhost:${PBM_MONGODB_PORT}/?replicaSet=${PBM_MONGODB_REPLSET}"
set -o xtrace
if [ "$RESTORE_MODE" != "true" ]; then
  if [ "${1:0:9}" = "pbm-agent" ]; then
    OUT="$(mktemp)"
    OUT_CFG="$(mktemp)"
    timeout=5
    for i in {1..10}; do
      # The ARM image doesn't contain the mongo CLI; the preliminary check is skipped,
      # and PBM will return an error in case of a connection failure.
      if [ ! -e "/usr/bin/mongo" ]; then
        break
      fi
      if [ "${SHARDED}" ]; then
        echo "waiting for sharded cluster"
        # check in case the shard has the 'shardsvr' role
        set +o xtrace
        mongo "${PBM_MONGODB_URI}" --eval="db.isMaster().\$configServerState.opTime.ts" --quiet | tee "$OUT"
        exit_status=${PIPESTATUS[0]}  # mongo's exit status, not tee's
        set -o xtrace
        # check in case the shard has the 'configsvr' role
        set +o xtrace
        mongo "${PBM_MONGODB_URI}" --eval="db.isMaster().configsvr" --quiet | tail -n 1 | tee "$OUT_CFG"
        exit_status_cfg=${PIPESTATUS[0]}  # mongo's exit status, not tee's
        set -o xtrace
        ts=$(grep -E '^Timestamp\([0-9]+, [0-9]+\)$' "$OUT")
        isCfg=$(grep -E '^2$' "$OUT_CFG")
        if [[ "${exit_status}" == 0 && "${ts}" ]] || [[ "${exit_status_cfg}" == 0 && "${isCfg}" ]]; then
          break
        else
          sleep "$((timeout * i))"
        fi
      else
        set +o xtrace
        mongo "${PBM_MONGODB_URI}" --eval="(db.isMaster().hosts).length" --quiet | tee "$OUT"
        exit_status=${PIPESTATUS[0]}  # mongo's exit status, not tee's
        set -o xtrace
        rs_size=$(grep -E '^([0-9]+)$' "$OUT")
        if [[ "${exit_status}" == 0 ]] && [[ $rs_size -ge 1 ]]; then
          break
        else
          sleep "$((timeout * i))"
        fi
      fi
    done
    rm -f "$OUT" "$OUT_CFG"
  fi
else
  echo "Restore mode activated"
fi
pbm config --file pbm-config.yaml
exec "$@"