Backup failed with error: "upload to GCS: 502 Bad Gateway"

Hi guys,
I’ve been running into this problem lately when performing backups on my MongoDB sharded cluster. Here are the status output and the backup log. Hope you can help me solve this, thank you.

Cluster:
========
configRS:
  - configRS/b2b-mongo-1:57017: pbm-agent v1.7.0 OK
  - configRS/b2b-mongo-3:57017: pbm-agent v1.7.0 OK
  - configRS/b2b-mongo-4:57017: pbm-agent v1.7.0 OK
myShard_0:
  - myShard_0/b2b-mongo-1:37017: pbm-agent v1.7.0 OK
  - myShard_0/b2b-mongo-2:37017: pbm-agent v1.7.0 OK
  - myShard_0/b2b-mongo-4:37017: pbm-agent v1.7.0 OK
myShard_1:
  - myShard_1/b2b-mongo-2:47017: pbm-agent v1.7.0 OK
  - myShard_1/b2b-mongo-3:47017: pbm-agent v1.7.0 OK
  - myShard_1/b2b-mongo-4:47017: pbm-agent v1.7.0 OK


PITR incremental backup:
========================
Status [ON]

Currently running:
==================
(none)

Backups:
========
S3 asia-east1 s3://https://storage.googleapis.com/omd-mongodb/pbm/backup
  Snapshots:
    2022-06-22T18:00:01Z 0.00B [ERROR: check cluster for dump done: convergeCluster: backup on shard myShard_1 failed with: ] [2022-06-22T20:06:34]
    2022-06-19T18:00:01Z 0.00B [ERROR: check cluster for dump done: convergeCluster: backup on shard myShard_0 failed with: ] [2022-06-19T20:08:13]
    2022-06-15T18:00:01Z 0.00B [ERROR: check cluster for dump done: convergeCluster: backup on shard myShard_1 failed with: ] [2022-06-15T20:02:22]
    2022-06-12T18:00:01Z 476.81GB <logical> [complete: 2022-06-12T20:11:59]
    2022-06-08T18:00:01Z 479.73GB <logical> [complete: 2022-06-08T20:17:03]
    2022-06-05T18:00:01Z 472.58GB <logical> [complete: 2022-06-05T20:10:38]
    2022-06-01T18:00:01Z 484.08GB <logical> [complete: 2022-06-01T20:17:48]
    2022-05-28T18:00:01Z 485.72GB <logical> [complete: 2022-05-28T20:24:22]
  PITR chunks [249.29GB]:
    2022-05-28T20:24:23 - 2022-06-16T02:30:5
2022-06-22T18:00:02Z I [myShard_1/b2b-mongo-3:47017] [backup/2022-06-22T18:00:01Z] backup started
2022-06-22T18:00:02Z I [myShard_0/b2b-mongo-2:37017] [backup/2022-06-22T18:00:01Z] backup started
2022-06-22T18:00:02Z I [configRS/b2b-mongo-1:57017] [backup/2022-06-22T18:00:01Z] backup started
2022-06-22T18:00:07Z I [configRS/b2b-mongo-1:57017] [backup/2022-06-22T18:00:01Z] mongodump finished, waiting for the oplog
2022-06-22T20:06:33Z I [myShard_1/b2b-mongo-3:47017] [backup/2022-06-22T18:00:01Z] dropping tmp collections
2022-06-22T20:06:33Z I [myShard_1/b2b-mongo-3:47017] [backup/2022-06-22T18:00:01Z] mark RS as error `mongodump: write data: upload to GCS: 502 Bad Gateway.`: <nil>
2022-06-22T20:06:33Z E [myShard_1/b2b-mongo-3:47017] [backup/2022-06-22T18:00:01Z] backup: mongodump: write data: upload to GCS: 502 Bad Gateway.
2022-06-22T20:06:34Z I [configRS/b2b-mongo-1:57017] [backup/2022-06-22T18:00:01Z] dropping tmp collections
2022-06-22T20:06:34Z I [configRS/b2b-mongo-1:57017] [backup/2022-06-22T18:00:01Z] mark RS as error `check cluster for dump done: convergeCluster: backup on shard myShard_1 failed with: `: <nil>
2022-06-22T20:06:34Z I [configRS/b2b-mongo-1:57017] [backup/2022-06-22T18:00:01Z] mark backup as error `check cluster for dump done: convergeCluster: backup on shard myShard_1 failed with: `: <nil>
2022-06-22T20:06:34Z E [configRS/b2b-mongo-1:57017] [backup/2022-06-22T18:00:01Z] backup: check cluster for dump done: convergeCluster: backup on shard myShard_1 failed with: 
2022-06-22T20:41:53Z I [myShard_0/b2b-mongo-2:37017] [backup/2022-06-22T18:00:01Z] mongodump finished, waiting for the oplog
2022-06-22T20:41:54Z I [myShard_0/b2b-mongo-2:37017] [backup/2022-06-22T18:00:01Z] dropping tmp collections
2022-06-22T20:41:54Z I [myShard_0/b2b-mongo-2:37017] [backup/2022-06-22T18:00:01Z] mark RS as error `waiting for dump done: backup stuck, last beat ts: 1655928392`: <nil>
2022-06-22T20:41:54Z E [myShard_0/b2b-mongo-2:37017] [backup/2022-06-22T18:00:01Z] backup: waiting for dump done: backup stuck, last beat ts: 1655928392

Hello, as you can see, the problem is a communication issue when uploading to GCS. Have you tested that all servers on all shards can write to GCS? Have you checked the logs on the Google side?
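For example, here is a rough write test you could run from each node. It is only a sketch: the bucket and prefix are taken from your status output, the HMAC key environment variable names are placeholders, and I’m assuming you authenticate with HMAC keys against GCS’s S3-compatible endpoint, which is how PBM talks to it.

# Quick per-node write test against GCS via its S3-compatible endpoint.
import os
import socket

import boto3  # pip install boto3

BUCKET = "omd-mongodb"   # from your pbm status output
PREFIX = "pbm/backup"    # from your pbm status output

s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.googleapis.com",
    aws_access_key_id=os.environ["GCS_HMAC_ACCESS_KEY"],      # placeholder env var name
    aws_secret_access_key=os.environ["GCS_HMAC_SECRET_KEY"],  # placeholder env var name
)

# Upload a tiny object named after the host so you can see which nodes succeed.
key = f"{PREFIX}/write-test-{socket.gethostname()}"
s3.put_object(Bucket=BUCKET, Key=key, Body=b"pbm write test")
print("wrote", key)

If this succeeds on every node but the backup still fails hours in (your error appears around the two-hour mark), that points more at transient 5xx errors on long uploads than at permissions.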

Hi Igroene, the problem went away after upgrading from PBM v1.6.1 to v1.7.0 and adding a retryer to the config. I think it’s because the backup set is relatively large, and an unstable network can cause the writes to GCS to fail.
For anyone facing the same problem, here is what I added to the config after upgrading to v1.7.0:

storage:
...
    retryer:
      numMaxRetries: 10
      minRetryDelay: 30
      maxRetryDelay: 5

I’m not sure if this is a permanent fix, but it’s worth a try.
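A couple of notes in case it helps: as far as I understand from the PBM docs, the retryer block goes under storage.s3 (the "..." above just stands for the rest of my storage settings), and minRetryDelay is in milliseconds while maxRetryDelay is in minutes, so 30/5 is not a typo. Roughly, the storage section looks like this (the credentials below are placeholders, not my real values):

storage:
  type: s3
  s3:
    region: asia-east1
    endpointUrl: https://storage.googleapis.com
    bucket: omd-mongodb
    prefix: pbm/backup
    credentials:
      access-key-id: <HMAC access key>
      secret-access-key: <HMAC secret>
    retryer:
      numMaxRetries: 10
      minRetryDelay: 30
      maxRetryDelay: 5

After editing the file I re-applied it with pbm config --file <config file>.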
