Hi,
I am seeing the messages below in the logs while restoring backups to a new cluster on version 2.2.1.
Also, the full cluster restore took ~24 hours, whereas it took ~18 hours on the older version (2.0.5).
2023-11-13T14:57:27Z W [shard1ReplSet/10.80.11.0:27038] [restore/2023-11-13T09:29:21.900315695Z] retryChunk got copy: context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 0s
2023-11-13T14:57:27Z I [shard1ReplSet/10.80.11.0:27038] [restore/2023-11-13T09:29:21.900315695Z] session recreated, resuming download
[mongod@ip-10-80-11-188 ~]$ pbm status
Cluster:
========
shard3ReplSet:
- shard3ReplSet/10.80.11.188:27038 [P]: pbm-agent v2.2.1 OK
configReplSet:
- configReplSet/10.80.11.0:27039 [P]: pbm-agent v2.2.1 OK
shard1ReplSet:
- shard1ReplSet/10.80.11.0:27038 [P]: pbm-agent v2.2.1 OK
shard2ReplSet:
- shard2ReplSet/10.80.11.40:27038 [P]: pbm-agent v2.2.1 OK
PITR incremental backup:
========================
Status [OFF]
Currently running:
==================
(none)
Backups:
========
S3 us-east-1 s3://cm-mongo-db-shared-prod-va/percona/backup/
Snapshots:
2023-11-11T01:00:02Z 2.24TB <logical> [restore_to_time: 2023-11-11T12:53:21Z]
Is the issue repeating every time? Did you try running the restore again?
2023-11-13T14:57:27Z W [shard1ReplSet/10.80.11.0:27038] [restore/2023-11-13T09:29:21.900315695Z] retryChunk got copy: context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 0s
2023-11-13T14:57:27Z I [shard1ReplSet/10.80.11.0:27038] [restore/2023-11-13T09:29:21.900315695Z] session recreated, resuming download
Was the network stable, and was the target cluster healthy during the activity? Did you observe anything unusual in the MongoDB or system/kernel logs?
Still, to expedite the restore you can tune the parallel download settings depending on your hardware resources and database load. To do so, edit the PBM configuration as below:
- numDownloadWorkers - the number of workers that download data from the storage. By default, it equals the number of CPU cores.
- maxDownloadBufferMb - the maximum size of the memory buffer that stores downloaded data chunks for decompression and ordering. It is calculated as numDownloadWorkers * downloadChunkMb * 16.
- downloadChunkMb - the size of the data chunk to download (32 MB by default).
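For example, the settings above can be applied with `pbm config`. The values here are illustrative only (assuming an 8-core host), not recommendations; tune them to your own hardware and load:

```shell
# Hypothetical values for an 8-core host -- adjust to your environment.
# numDownloadWorkers defaults to the number of CPU cores.
pbm config --set restore.numDownloadWorkers=8

# Buffer sized as numDownloadWorkers * downloadChunkMb * 16 = 8 * 32 * 16
pbm config --set restore.maxDownloadBufferMb=4096

# Chunk size; 32 MB is the default.
pbm config --set restore.downloadChunkMb=32

# Verify the resulting configuration.
pbm config
```

Alternatively, you can put these under the `restore:` section of a YAML config file and apply it with `pbm config --file <file>`.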
I also get these errors from time to time while I’m doing test restores:
pbm-agent[57935]: 2025-06-12T10:05:22.000+0000 W [restore/2025-06-12T09:51:51.921935763Z] failed to download chunk 1090519040-1098907647
pbm-agent[57935]: 2025-06-12T10:05:22.000+0000 W [restore/2025-06-12T09:51:51.921935763Z] retryChunk got failed to download chunk 1090519040-1098907647 (of 6006484807) after 2 retries: copy: context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 1s
[…]
pbm-agent[57935]: 2025-06-12T11:56:28.000+0000 W [restore/2025-06-12T11:41:04.460428581Z] failed to download chunk 6006243328-6014631935
I’m wondering whether that means some data from the backup will not be restored, or whether the downloads are simply retried and there’s no risk that the restored DB will be inconsistent?