Percona Mongodb Restore hung and data not restored

Hi,
We are managing a mongodb sharded cluster (Mongodb 4.4) and have 3 data nodes, 3 config servers and 2 mongos servers. I installed percona agent on all of the servers, and had a remote gcs storage. I was able to perform backup and upload the data to gcs. But when trying to do a restore, the process would hung and pbm logs were scant. Could you help with this issue?

> pbm status
Cluster:
========
configsvr:
  - configsvr/g-fraud-es-mongodb-config-primary-01-mongo-a-01:27017: pbm-agent v1.5.0 OK
  - configsvr/g-fraud-es-mongodb-config-secondary-01-mongo-a-01:27017: pbm-agent v1.5.0 OK
  - configsvr/g-fraud-es-mongodb-config-arbiter-01-mongo-a-01:27017: pbm-agent v1.5.0 OK
rs0:
  - rs0/g-fraud-es-mongodb-primary-set01-mongo-a-01:27017: pbm-agent v1.5.0 OK
  - rs0/g-fraud-es-mongodb-secondary-set01-mongo-a-01:27017: pbm-agent v1.5.0 OK
  - rs0/g-fraud-es-mongodb-secondary-02-set01-mongo-a-01:27017: pbm-agent v1.5.0 OK
PITR incremental backup:
========================
Status [OFF]
Currently running:
==================
PITR restore "2021-06-16T03:03:46Z", started at 2021-06-16T06:32:53Z. Status: snapshot restore. [op id: 60c99b1422caa6a3de0c4ce0]
Backups:
========
S3 asia-southeast1 s3://https://storage.googleapis.com/risk_integration_mongodb_backup/pbm/test
  Snapshots:
    2021-06-16T03:03:46Z 2.08MB [complete: 2021-06-16T03:04:07]
    2021-06-15T10:06:34Z 1.37MB [complete: 2021-06-15T10:06:55]
  PITR chunks:
    2021-06-16T03:04:07 - 2021-06-16T04:48:14 643.50KB
    2021-06-15T10:06:55 - 2021-06-16T03:03:52 5.15MB

> pbm logs -i 60c99b1422caa6a3de0c4ce0
2021-06-16T06:32:52Z I [/] [pitr/2021-06-16T04:48:14Z] recovery started
2021-06-16T06:32:52Z E [/] [pitr/2021-06-16T04:48:14Z] restore: define last oplog slice: get: mongo: no documents in result
2021-06-16T06:32:53Z I [rs0/g-fraud-es-mongodb-secondary-02-set01-mongo-a-01:27017] [pitr/2021-06-16T04:48:14Z] Node in not suitable for restore
2021-06-16T06:32:53Z I [configsvr/g-fraud-es-mongodb-config-primary-01-mongo-a-01:27017] [pitr/2021-06-16T04:48:14Z] recovery started
2021-06-16T06:32:53Z I [rs0/g-fraud-es-mongodb-primary-set01-mongo-a-01:27017] [pitr/2021-06-16T04:48:14Z] recovery started
2021-06-16T06:32:53Z I [rs0/g-fraud-es-mongodb-secondary-set01-mongo-a-01:27017] [pitr/2021-06-16T04:48:14Z] Node in not suitable for restore
2021-06-16T06:32:53Z I [configsvr/g-fraud-es-mongodb-config-secondary-01-mongo-a-01:27017] [pitr/2021-06-16T04:48:14Z] Node in not suitable for restore
2021-06-16T06:32:53Z I [configsvr/g-fraud-es-mongodb-config-arbiter-01-mongo-a-01:27017] [pitr/2021-06-16T04:48:14Z] Node in not suitable for restore
2021-06-16T06:32:53Z E [/] [pitr/2021-06-16T04:48:14Z] unbale to run the restore while another backup or restore process running
2021-06-16T06:33:01Z I [configsvr/g-fraud-es-mongodb-config-primary-01-mongo-a-01:27017] [pitr/2021-06-16T04:48:14Z] mongorestore finished

The restore ended very fast, but the data was not restored, and the status did not change. After hours of waiting we did a little digging into the code and deleted the locks in the db so we can try again. Unfortuately, the result was the same. The process hung, and data was not restored.

> pbm logs -i 60c9dab8956c4f8056f6c055
2021-06-16T11:04:25Z I [configsvr/g-fraud-es-mongodb-config-arbiter-01-mongo-a-01:27017] [restore/2021-06-16T03:03:46Z] node is not primary so it unsuitable to do restore
2021-06-16T11:04:25Z I [configsvr/g-fraud-es-mongodb-config-primary-01-mongo-a-01:27017] [restore/2021-06-16T03:03:46Z] restore started
2021-06-16T11:04:25Z I [configsvr/g-fraud-es-mongodb-config-secondary-01-mongo-a-01:27017] [restore/2021-06-16T03:03:46Z] node is not primary so it unsuitable to do restore
2021-06-16T11:04:25Z I [rs0/g-fraud-es-mongodb-secondary-set01-mongo-a-01:27017] [restore/2021-06-16T03:03:46Z] node is not primary so it unsuitable to do restore
2021-06-16T11:04:25Z I [/] [restore/2021-06-16T03:03:46Z] restore started
2021-06-16T11:04:25Z E [/] [restore/2021-06-16T03:03:46Z] restore: prepare snapshot: extra/unknown replica set found in the backup: rs0,configsvr
2021-06-16T11:04:25Z E [/] [restore/2021-06-16T03:03:46Z] unbale to run the restore while another operation running
2021-06-16T11:04:25Z I [rs0/g-fraud-es-mongodb-primary-set01-mongo-a-01:27017] [restore/2021-06-16T03:03:46Z] restore started
2021-06-16T11:04:25Z I [rs0/g-fraud-es-mongodb-secondary-02-set01-mongo-a-01:27017] [restore/2021-06-16T03:03:46Z] node is not primary so it unsuitable to do restore
2021-06-16T11:04:33Z I [configsvr/g-fraud-es-mongodb-config-primary-01-mongo-a-01:27017] [restore/2021-06-16T03:03:46Z] mongorestore finished

Any idea how to solve the issue? Thanks!!

1 Like

Hi, based on the output you seem to be using an arbiter in the config server replica set. This is not a supported MongoDB configuration.
Going back to your issue, you cannot restore to 2021-06-16T03:03:46Z since your first snapshot finished at 2021-06-16T03:04:07. Try using 2021-06-16T03:04:08 instead and since you had a hung operation I’d restart all agents just in case.
Hope that helps

1 Like