Hello,
we have installed percona mongodb with pbm agents on aks.
3 nodes/pods, 1 primary 2 secondary, each node/pod has its pbm-agent installed
mongo version 4.4.6-8
nodes: 3
sharding: no
replica: yes
pitr: no
we did succesfully a backup of the entire cluster and saved it on Minio.
the dump backup file size is 15gb, gz compressed. Now we want to restore it using the manifest
but we got always errors like
2021-12-13T11:18:32.000+0000 W [restore/2021-12-13T00:00:23Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-12-13T11:18:32.000+0000 W [restore/2021-12-13T00:00:23Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 1s
2021-12-13T11:18:33.000+0000 I [restore/2021-12-13T00:00:23Z] session recreated, resuming download
from aks resource manifest
check cluster for restore dump done: convergeCluster: restore on the shard
rs0 failed with: restore mongo dump (successes: 5767909 / fails: 0):
xxxx: error restoring from archive on stdin: reading
bson input: error demultiplexing archive; archive io error
can someone help us understanding where could be the problem?
Could be a problem of download time from minio?
{âlevelâ:âinfoâ,âtsâ:1639468601.797091,âloggerâ:âcontroller_perconaservermongodbrestoreâ,âmsgâ:âwaiting for mongos terminationâ}
{âlevelâ:âinfoâ,âtsâ:1639468601.8832114,âloggerâ:âcontroller_perconaservermongodbrestoreâ,âmsgâ:âwaiting for mongos terminationâ}
{âlevelâ:âinfoâ,âtsâ:1639468603.1849604,âloggerâ:âcontroller_psmdbâ,âmsgâ:âbalancer disabledâ}
{âlevelâ:âinfoâ,âtsâ:1639468605.5126054,âloggerâ:âcontroller_perconaservermongodbrestoreâ,âmsgâ:âWaiting for restore metadataâ,âPBM nameâ:â2021-12-14T07:56:45.349643054Zâ,ârestoreâ:âpercona-store-mongo-restore-20211213-d-v1â,âbackupâ:âcron-percona-store-mo-20211213000013-9gvsgâ}
rs0-0
2021-12-14T07:56:45.000+0000 I got command restore [name: 2021-12-14T07:56:45.349643054Z, backup name: 2021-12-13T00:00:23Z] <ts: 1639468605>
2021-12-14T07:56:45.000+0000 I got epoch {1639468604 2}
2021-12-14T07:56:45.000+0000 I [restore/2021-12-13T00:00:23Z] node is not primary so it unsuitable to do restore
rs0-1
2021-12-14T07:56:45.000+0000 I got command restore [name: 2021-12-14T07:56:45.349643054Z, backup name: 2021-12-13T00:00:23Z] <ts: 1639468605>
2021-12-14T07:56:45.000+0000 I got epoch {1639468604 2}
2021-12-14T07:56:45.000+0000 I [restore/2021-12-13T00:00:23Z] node is not primary so it unsuitable to do restore
rs0-2
2021-12-14T00:00:21.000+0000 I got command backup [name: 2021-12-14T00:00:20Z, compression: gzip] <ts: 1639440020>
2021-12-14T00:00:21.000+0000 I got epoch {1639440020 1}
2021-12-14T00:00:38.000+0000 D [backup/2021-12-14T00:00:20Z] skip after nomination, probably started by another node
2021-12-14T07:56:45.000+0000 I got command restore [name: 2021-12-14T07:56:45.349643054Z, backup name: 2021-12-13T00:00:23Z] <ts: 1639468605>
2021-12-14T07:56:45.000+0000 I got epoch {1639468604 2}
2021-12-14T07:56:45.000+0000 I [restore/2021-12-13T00:00:23Z] restore started
2021-12-14T07:56:48.761+0000 connected to node type: replset
2021-12-14T07:56:48.781+0000 archive prelude semantic.Semantic.MRSTRTPLMR
2021-12-14T07:56:48.781+0000 archive prelude semantic.Semantic.BYND
âŠ
|2021-12-14T07:56:48.859+0000|archive prelude catalog.Attribute.Device|
|â|â|
|2021-12-14T07:56:48.859+0000|archive prelude admin.system.users|
|2021-12-14T07:56:48.859+0000|archive prelude admin.system.roles|
|2021-12-14T07:56:48.859+0000|archive prelude admin.system.version|
|2021-12-14T07:56:48.859+0000|archive format version 0.1|
|2021-12-14T07:56:48.859+0000|archive server version 4.4.6-8|
|2021-12-14T07:56:48.859+0000|archive tool version 0.0.1|
|2021-12-14T07:56:48.859+0000|setting number of parallel collections to number of parallel collections in archive (2)|
|2021-12-14T07:56:48.864+0000|preparing collections to restore from|
|2021-12-14T07:56:48.864+0000|found collection semantic.Semantic.MRSTRTPLMR bson to restore to semantic.Semantic.MRSTRTPLMR|
2021-12-14T07:56:48.873+0000 found collection metadata from semantic.Semantic.MTN1 to restore to semantic.Semantic.MTN1
2021-12-14T07:56:48.873+0000 found collection admin.pbmRUsers bson to restore to admin.pbmRUsers
2021-12-14T07:56:48.873+0000 found collection metadata from admin.pbmRUsers to restore to admin.pbmRUsers
2021-12-14T07:56:48.873+0000 skipping restore of system.profile collection in admin
2021-12-14T07:56:48.873+0000 skipping restore of system.profile metadata
2021-12-14T07:56:48.873+0000 found collection admin.pbmRRoles bson to restore to admin.pbmRRoles
2021-12-14T07:56:48.873+0000 found collection metadata from admin.pbmRRoles to restore to admin.pbmRRoles
2021-12-14T07:56:48.873+0000 found collection admin.system.users bson to restore to admin.system.users
2021-12-14T07:56:48.873+0000 found collection metadata from admin.system.users to restore to admin.system.users
âŠ
2021-12-14T07:56:48.876+0000 found collection catalog.EntityType bson to restore to catalog.EntityType
2021-12-14T07:56:48.876+0000 found collection metadata from catalog.EntityType to restore to catalog.EntityType
2021-12-14T07:56:48.876+0000 found collection catalog.Datasource bson to restore to catalog.Datasource
2021-12-14T07:56:48.876+0000 found collection metadata from catalog.Datasource to restore to catalog.Datasource
2021-12-14T07:56:48.876+0000 found collection catalog.Attribute.Device bson to restore to catalog.Attribute.Device
2021-12-14T07:56:48.876+0000 found collection metadata from catalog.Attribute.Device to restore to catalog.Attribute.Device
2021-12-14T07:56:48.876+0000 found collection benchmark.test bson to restore to benchmark.test
2021-12-14T07:56:48.876+0000 found collection metadata from benchmark.test to restore to benchmark.test
2021-12-14T07:56:48.876+0000 found collection estimation.test bson to restore to estimation.test
2021-12-14T07:56:48.876+0000 found collection metadata from estimation.test to restore to estimation.test
2021-12-14T07:56:48.964+0000 first non special collection semantic.Semantic.BRKHG_B found. The demultiplexer will handle it and the remainder
2021-12-14T07:56:48.964+0000 restoring up to 2 collections in parallel
2021-12-14T07:56:48.978+0000 dropping collection semantic.Semantic.BRKHG_B before restoring
2021-12-14T07:56:48.984+0000 reading metadata for semantic.Semantic.BRKHG_B from archive on stdin
2021-12-14T07:56:48.985+0000 creating collection semantic.Semantic.BRKHG_B using options from metadata
2021-12-14T07:56:49.057+0000 restoring semantic.Semantic.BRKHG_B from archive on stdin
2021-12-14T07:56:49.062+0000 using 10 insertion workers
2021-12-14T07:56:51.557+0000 dropping collection semantic.Semantic.TDRNG_5 before restoring
2021-12-14T07:56:51.564+0000 reading metadata for semantic.Semantic.TDRNG_5 from archive on stdin
2021-12-14T07:56:51.564+0000 creating collection semantic.Semantic.TDRNG_5 using options from metadata
2021-12-14T07:56:51.589+0000 restoring semantic.Semantic.TDRNG_5 from archive on stdin
2021-12-14T07:56:51.595+0000 using 10 insertion workers
2021-12-14T08:03:49.000+0000 W [restore/2021-12-13T00:00:23Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-12-14T08:03:49.000+0000 W [restore/2021-12-13T00:00:23Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 1s
2021-12-14T08:03:50.000+0000 I [restore/2021-12-13T00:00:23Z] session recreated, resuming download
2021-12-14T08:05:12.000+0000 W [restore/2021-12-13T00:00:23Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-12-14T08:05:12.000+0000 W [restore/2021-12-13T00:00:23Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 1s
2021-12-14T08:05:13.000+0000 I [restore/2021-12-13T00:00:23Z] session recreated, resuming download
2021-12-14T08:06:39.000+0000 W [restore/2021-12-13T00:00:23Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-12-14T08:06:39.000+0000 W [restore/2021-12-13T00:00:23Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 1s
2021-12-14T08:06:40.000+0000 I [restore/2021-12-13T00:00:23Z] session recreated, resuming download
2021-12-14T08:07:11.000+0000 W [restore/2021-12-13T00:00:23Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-12-14T08:07:11.000+0000 W [restore/2021-12-13T00:00:23Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 2s
2021-12-14T08:07:13.000+0000 I [restore/2021-12-13T00:00:23Z] session recreated, resuming download
2021-12-14T08:07:43.000+0000 W [restore/2021-12-13T00:00:23Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-12-14T08:07:43.000+0000 W [restore/2021-12-13T00:00:23Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 3s
2021-12-14T08:07:46.000+0000 I [restore/2021-12-13T00:00:23Z] session recreated, resuming download
2021-12-14T08:08:17.000+0000 W [restore/2021-12-13T00:00:23Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-12-14T08:08:17.000+0000 W [restore/2021-12-13T00:00:23Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 4s
2021-12-14T08:08:21.000+0000 I [restore/2021-12-13T00:00:23Z] session recreated, resuming download
2021-12-14T08:08:52.000+0000 W [restore/2021-12-13T00:00:23Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-12-14T08:08:52.000+0000 W [restore/2021-12-13T00:00:23Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 5s
2021-12-14T08:08:57.000+0000 I [restore/2021-12-13T00:00:23Z] session recreated, resuming download
2021-12-14T08:09:27.000+0000 W [restore/2021-12-13T00:00:23Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-12-14T08:09:27.000+0000 W [restore/2021-12-13T00:00:23Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 6s
2021-12-14T08:09:33.000+0000 I [restore/2021-12-13T00:00:23Z] session recreated, resuming download
2021-12-14T08:10:03.000+0000 W [restore/2021-12-13T00:00:23Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-12-14T08:10:03.000+0000 W [restore/2021-12-13T00:00:23Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 7s
2021-12-14T08:10:10.000+0000 I [restore/2021-12-13T00:00:23Z] session recreated, resuming download
2021-12-14T08:10:41.000+0000 W [restore/2021-12-13T00:00:23Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-12-14T08:10:41.000+0000 W [restore/2021-12-13T00:00:23Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 8s
2021-12-14T08:10:49.000+0000 I [restore/2021-12-13T00:00:23Z] session recreated, resuming download
2021-12-14T08:11:19.000+0000 W [restore/2021-12-13T00:00:23Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-12-14T08:11:19.000+0000 W [restore/2021-12-13T00:00:23Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 9s
2021-12-14T08:11:28.000+0000 I [restore/2021-12-13T00:00:23Z] session recreated, resuming download
2021-12-14T08:11:59.000+0000 W [restore/2021-12-13T00:00:23Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-12-14T08:11:59.000+0000 W [restore/2021-12-13T00:00:23Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 10s
2021-12-14T08:12:09.000+0000 I [restore/2021-12-13T00:00:23Z] session recreated, resuming download
2021-12-14T08:12:09.000+0000 E [restore/2021-12-13T00:00:23Z] download âbk-mongo-dev/2021-12-13T00:00:23Z_rs0.dump.gzâ file from S3: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-12-14T08:12:09.374+0000 finished restoring semantic.Semantic.BRKHG_B (2839546 documents, 0 failures)
2021-12-14T08:12:09.558+0000 finished restoring semantic.Semantic.TDRNG_5 (2878599 documents, 0 failures)
2021-12-14T08:12:09.559+0000 demux finishing when there are still outs (7)
2021-12-14T08:12:09.559+0000 demux finishing (err:corruption found in archive; ParserConsumer.BodyBSON() ( corruption found in archive; read bson ( unexpected EOF ) ))
2021-12-14T08:12:09.000+0000 E [restore/2021-12-13T00:00:23Z] restore: restore mongo dump (successes: 5718145 / fails: 0): semantic.Semantic.BRKHG_B: error restoring from archive on stdin: reading bson input: error demultiplexing archive; archive io error
2021-12-14T08:12:09.000+0000 D [restore/2021-12-13T00:00:23Z] releasing lock
well, the backup worked.
do i have to use a different manifest for the restore?
a few days ago i was checking the logs and realized that the restore has dropped the collection xxx (donât remember which one) and its indexes. then recreated it and copied the bsondocuments, but failed when trying to restore indexes.
in our db we have collections with 100/150 millions of documents. maybe we have to change some mongo setting?