Hi,
We are seeing some issues when running a PBM incremental base backup. The error message we get is:
ERROR: check cluster for backup done: convergeCluster: backup on shard mongo_data_rs10 failed with: %!s()
The backup logs:
2025-03-19T16:28:00Z D [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] uploading: /data/db/collection-3--2064123749121529916.wt [0:479232] 468.00KB
2025-03-19T16:28:01Z D [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] uploading: /data/db/index-5--3410557875914912461.wt [0:253952] 248.00KB
2025-03-19T16:28:01Z D [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] uploading: /data/db/index-70-1975402631693463105.wt [0:49152] 48.00KB
2025-03-19T16:28:01Z D [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] uploading: /data/db/index-42411--2854269189489801028.wt [0:32768] 32.00KB
2025-03-19T16:28:01Z D [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] uploading: /data/db/index-42591--2854269189489801028.wt [0:98304] 96.00KB
2025-03-19T16:28:01Z D [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] uploading: /data/db/index-26--4382307725957457486.wt [0:4096] 4.00KB
2025-03-19T16:28:01Z D [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] uploading: /data/db/index-42459--2854269189489801028.wt [0:512000] 500.00KB
2025-03-19T16:28:01Z D [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] uploading: /data/db/index-23--3768745852357879697.wt [0:249856] 244.00KB
2025-03-19T16:28:01Z D [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] uploading: /data/db/storage.bson [0:114] 114.00B
2025-03-19T16:28:01Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] uploading data done
2025-03-19T16:28:01Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] uploading journals
2025-03-19T16:28:01Z D [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] uploading: /data/db/journal/WiredTigerLog.0000031853 100.00MB
2025-03-19T16:28:01Z D [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] uploading: /data/db/journal/WiredTigerLog.0000031854 100.00MB
2025-03-19T16:28:01Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] uploading journals done
2025-03-19T16:28:01Z D [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] stop cursor polling: <nil>, cursor err: <nil>
2025-03-19T16:28:02Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] mark RS as error `waiting for done: backup stuck, last beat ts: 1742396422`: <nil>
2025-03-19T16:28:02Z D [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] set balancer on
2025-03-19T16:28:02Z E [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] backup: waiting for done: backup stuck, last beat ts: 1742396422
2025-03-19T16:28:02Z D [mongo_data_rs1/mongodb_content_rs1n1:27017] [backup/2025-03-19T13:33:56Z] releasing lock
I can't seem to find anything related to this anywhere. I hope you can point me in the right direction or have a solution.
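For reference, the excerpt above came from the per-backup log; something like the following should pull the same output (assuming the standard pbm CLI, where the event name is the backup's start time):

pbm logs --tail=200 --severity=D --event=backup/2025-03-19T13:33:56Z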
Thanks for the answer. I'm trying to find something in the agent log now, but I also just found this in the backup log:
2025-03-19T15:00:23Z D [mongo_data_rs10/mongodb_content_rs10n2:27017] [backup/2025-03-19T13:33:56Z] stop cursor polling: <nil>, cursor err: connection pool for 127.0.0.1:27017 was cleared because another operation failed with: connection(127.0.0.1:27017[-3153]) incomplete read of message header: read tcp 127.0.0.1:47900->127.0.0.1:27017: i/o timeout: connection(127.0.0.1:27017[-3153]) incomplete read of message header: read tcp 127.0.0.1:47900->127.0.0.1:27017: i/o timeout
2025-03-19T15:00:23Z I [mongo_data_rs10/mongodb_content_rs10n2:27017] [backup/2025-03-19T13:33:56Z] mark RS as error `upload file `/data/db/journal/WiredTigerLog.0000027827`: get file stat: stat /data/db/journal/WiredTigerLog.0000027827: no such file or directory`: <nil>
2025-03-19T15:00:23Z D [mongo_data_rs10/mongodb_content_rs10n2:27017] [backup/2025-03-19T13:33:56Z] set balancer on
2025-03-19T15:00:23Z E [mongo_data_rs10/mongodb_content_rs10n2:27017] [backup/2025-03-19T13:33:56Z] backup: upload file `/data/db/journal/WiredTigerLog.0000027827`: get file stat: stat /data/db/journal/WiredTigerLog.0000027827: no such file or directory
2025-03-19T15:00:23Z D [mongo_data_rs10/mongodb_content_rs10n2:27017] [backup/2025-03-19T13:33:56Z] releasing lock
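Side note: decoding the `last beat ts` from the rs1 error above (assuming it is a plain Unix timestamp):

date -u -d @1742396422
# Wed Mar 19 15:00:22 UTC 2025

That is almost exactly when rs10 lost its journal file (15:00:23Z), so it looks like rs1 sat waiting until 16:28 for a heartbeat that stopped the moment rs10 failed.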
Cluster:
========
mongo_data_rs1:
- mongo_data_rs1/mongodb_content_rs1n1:27017 [S]: pbm-agent v2.3.1 OK
- mongo_data_rs1/mongodb_content_rs1n2:27017 [P]: pbm-agent v2.3.1 OK
- mongo_data_rs1/mongodb_content_rs1n3:27017 [!Arbiter]: arbiter node is not supported
mongo_data_rs8:
- mongo_data_rs8/mongodb_content_rs8n1:27017 [P]: pbm-agent v2.3.1 OK
- mongo_data_rs8/mongodb_content_rs8n2:27017 [S]: pbm-agent v2.3.1 OK
- mongo_data_rs8/mongodb_content_rs8n3:27017 [!Arbiter]: arbiter node is not supported
mongo_data_rs3:
- mongo_data_rs3/mongodb_content_rs3n1:27017 [P]: pbm-agent v2.3.1 OK
- mongo_data_rs3/mongodb_content_rs3n2:27017 [S]: pbm-agent v2.3.1 OK
- mongo_data_rs3/mongodb_content_rs3n3:27017 [!Arbiter]: arbiter node is not supported
mongo_data_rs2:
- mongo_data_rs2/mongodb_content_rs2n1:27017 [P]: pbm-agent v2.3.1 OK
- mongo_data_rs2/mongodb_content_rs2n2:27017 [S]: pbm-agent v2.3.1 OK
- mongo_data_rs2/mongodb_content_rs2n3:27017 [!Arbiter]: arbiter node is not supported
mongo_conf:
- mongo_conf/mongodb_content_cfg_server1:27017 [S]: pbm-agent v2.3.1 OK
- mongo_conf/mongodb_content_cfg_server2:27017 [P]: pbm-agent v2.3.1 OK
- mongo_conf/mongodb_content_cfg_server3:27017 [S]: pbm-agent v2.3.1 OK
mongo_data_rs10:
- mongo_data_rs10/mongodb_content_rs10n1:27017 [P]: pbm-agent v2.3.1 OK
- mongo_data_rs10/mongodb_content_rs10n2:27017 [S]: pbm-agent v2.3.1 OK
- mongo_data_rs10/mongodb_content_rs10n3:27017 [!Arbiter]: arbiter node is not supported
mongo_data_rs12:
- mongo_data_rs12/mongodb_content_rs12n1:27017 [S]: pbm-agent v2.3.1 OK
- mongo_data_rs12/mongodb_content_rs12n2:27017 [P]: pbm-agent v2.3.1 OK
- mongo_data_rs12/mongodb_content_rs12n3:27017 [!Arbiter]: arbiter node is not supported
mongo_data_rs7:
- mongo_data_rs7/mongodb_content_rs7n1:27017 [S]: pbm-agent v2.3.1 OK
- mongo_data_rs7/mongodb_content_rs7n2:27017 [P]: pbm-agent v2.3.1 OK
- mongo_data_rs7/mongodb_content_rs7n3:27017 [!Arbiter]: arbiter node is not supported
mongo_data_rs9:
- mongo_data_rs9/mongodb_content_rs9n1:27017 [S]: pbm-agent v2.3.1 OK
- mongo_data_rs9/mongodb_content_rs9n2:27017 [P]: pbm-agent v2.3.1 OK
- mongo_data_rs9/mongodb_content_rs9n3:27017 [!Arbiter]: arbiter node is not supported
mongo_data_rs5:
- mongo_data_rs5/mongodb_content_rs5n1:27017 [S]: pbm-agent v2.3.1 OK
- mongo_data_rs5/mongodb_content_rs5n2:27017 [P]: pbm-agent v2.3.1 OK
- mongo_data_rs5/mongodb_content_rs5n3:27017 [!Arbiter]: arbiter node is not supported
mongo_data_rs4:
- mongo_data_rs4/mongodb_content_rs4n1:27017 [S]: pbm-agent v2.3.1 OK
- mongo_data_rs4/mongodb_content_rs4n2:27017 [P]: pbm-agent v2.3.1 OK
- mongo_data_rs4/mongodb_content_rs4n3:27017 [!Arbiter]: arbiter node is not supported
mongo_data_rs11:
- mongo_data_rs11/mongodb_content_rs11n1:27017 [S]: pbm-agent v2.3.1 OK
- mongo_data_rs11/mongodb_content_rs11n2:27017 [P]: pbm-agent v2.3.1 OK
- mongo_data_rs11/mongodb_content_rs11n3:27017 [!Arbiter]: arbiter node is not supported
mongo_data_rs6:
- mongo_data_rs6/mongodb_content_rs6n1:27017 [P]: pbm-agent v2.3.1 OK
- mongo_data_rs6/mongodb_content_rs6n2:27017 [S]: pbm-agent v2.3.1 OK
- mongo_data_rs6/mongodb_content_rs6n3:27017 [!Arbiter]: arbiter node is not supported
I don't know if that gives more insight into what's happening?
Our oplog size is set to 200 GB, and the backup size is about 10 TB.
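In case it matters, this is roughly how we checked the configured oplog size on a shard (the hostname is just an example; run against each replica set):

mongo --host mongodb_content_rs1n2:27017 --quiet --eval 'var s = db.getSiblingDB("local").runCommand({collStats: "oplog.rs"}); print("configured:", s.maxSize / 1024 / 1024 / 1024, "GB, used:", s.size / 1024 / 1024 / 1024, "GB")'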
Hi, you are very likely hitting a bug in the oplog dump/upload. We have fixed a few of them since PBM 2.3.1 and made the process auto-retry on failure. I suggest you upgrade to the latest PBM 2.9.0 ASAP and try again.
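For reference, a rough sketch of the per-node upgrade (assuming packages from the Percona repository on a Debian/Ubuntu host; adapt to your package manager):

# on every host running pbm-agent
sudo percona-release enable pbm release
sudo apt-get update && sudo apt-get install --only-upgrade percona-backup-mongodb
sudo systemctl restart pbm-agent
pbm version   # confirm the new version on each node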
Can you tell me which version it's fixed in? We are running MongoDB 5.0 right now and can't upgrade to 6, 7, or 8 right away, so I'm hoping it's fixed in PBM 2.7.0?