I have a sharded cluster in production: 10 shards plus 1 config replica set, with 4 replica set members each (3 + 1 hidden). Several databases are sharded, and we use transactions.
I made a backup with PBM; it completed without errors.
Backup command: pbm backup --wait
I have a staging cluster for testing restores. It differs from production only in the number of replica set members and the hostnames; the shard names are the same. Before the restore, the staging cluster was clean and mongos was shut down.
All names in the logs below have been anonymized for this topic.
Restore command:
pbm --mongodb-uri "mongodb://<mongodb_uri>" restore 2023-01-04T00:00:02Z
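Output of pbm status: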
Cluster:
========
config:
- config/mdb-config.domain:27035 [P]: pbm-agent v2.0.2 OK
shard_1:
- shard_1/mdb-shard-1.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_2:
- shard_2/mdb-shard-2.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_3:
- shard_3/mdb-shard-3.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_4:
- shard_4/mdb-shard-4.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_5:
- shard_5/mdb-shard-5.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_6:
- shard_6/mdb-shard-6.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_7:
- shard_7/mdb-shard-7.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_8:
- shard_8/mdb-shard-8.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_9:
- shard_9/mdb-shard-9.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_10:
- shard_10/mdb-shard-10.domain:27017 [P]: pbm-agent v2.0.2 OK
PITR incremental backup:
========================
Status [OFF]
Currently running:
==================
(none)
Backups:
========
S3 <region> s3://<bucket-name>
Snapshots:
2023-01-23T00:00:08Z 1.05TB <logical> [restore_to_time: 2023-01-23T07:33:48Z]
2023-01-22T00:00:09Z 1.04TB <logical> [restore_to_time: 2023-01-22T07:25:25Z]
2023-01-21T00:00:16Z 1.04TB <logical> [restore_to_time: 2023-01-21T07:27:25Z]
2023-01-20T00:00:23Z 1.08TB <logical> [restore_to_time: 2023-01-20T07:46:17Z]
Logs:
2023-01-26T08:54:21Z I [shard_3/mdb-shard-3.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:21Z I [shard_3/mdb-shard-3.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:21Z I [shard_3/mdb-shard-3.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:21Z I [shard_3/mdb-shard-3.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_4/mdb-shard-4.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_4/mdb-shard-4.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_8/mdb-shard-8.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_4/mdb-shard-4.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_8/mdb-shard-8.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_4/mdb-shard-4.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_5/mdb-shard-5.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_5/mdb-shard-5.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_6/mdb-shard-6.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_6/mdb-shard-6.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_9/mdb-shard-9.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [config/mdb-config.domain:27035] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_7/mdb-shard-7.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_2/mdb-shard-2.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_9/mdb-shard-9.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_9/mdb-shard-9.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [config/mdb-config.domain:27035] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_2/mdb-shard-2.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [config/mdb-config.domain:27035] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_7/mdb-shard-7.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_2/mdb-shard-2.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_7/mdb-shard-7.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_9/mdb-shard-9.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [config/mdb-config.domain:27035] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_7/mdb-shard-7.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_2/mdb-shard-2.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_1/mdb-shard-1.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_1/mdb-shard-1.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_1/mdb-shard-1.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_1/mdb-shard-1.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_10/mdb-shard-10.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_10/mdb-shard-10.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_10/mdb-shard-10.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_10/mdb-shard-10.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_3/mdb-shard-3.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:23Z I [shard_4/mdb-shard-4.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:23Z I [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:23Z I [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:23Z I [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:23Z I [shard_9/mdb-shard-9.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:23Z I [config/mdb-config.domain:27035] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:23Z I [shard_7/mdb-shard-7.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:23Z I [shard_2/mdb-shard-2.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:24Z I [shard_1/mdb-shard-1.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:25Z I [shard_10/mdb-shard-10.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:49Z I [config/mdb-config.domain:27035] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T08:54:49Z I [config/mdb-config.domain:27035] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T13:05:06Z I [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T13:05:07Z I [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T13:09:05Z I [shard_1/mdb-shard-1.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T13:09:05Z I [shard_1/mdb-shard-1.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T13:09:41Z I [shard_4/mdb-shard-4.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T13:09:41Z I [shard_4/mdb-shard-4.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T13:18:30Z I [shard_9/mdb-shard-9.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T13:18:31Z I [shard_9/mdb-shard-9.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T13:22:17Z I [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T13:22:18Z I [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T13:32:08Z I [shard_2/mdb-shard-2.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T13:32:09Z I [shard_2/mdb-shard-2.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T13:52:55Z I [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T13:52:55Z I [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T14:01:22Z I [shard_7/mdb-shard-7.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T14:01:23Z I [shard_7/mdb-shard-7.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T14:38:03Z I [shard_3/mdb-shard-3.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T14:38:03Z I [shard_3/mdb-shard-3.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T16:51:46Z I [shard_10/mdb-shard-10.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T16:51:46Z I [shard_10/mdb-shard-10.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T16:51:47Z I [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:47Z I [shard_7/mdb-shard-7.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:47Z I [shard_1/mdb-shard-1.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:47Z I [shard_3/mdb-shard-3.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:47Z I [shard_10/mdb-shard-10.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:48Z I [shard_2/mdb-shard-2.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:48Z I [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:48Z I [shard_9/mdb-shard-9.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:48Z I [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:48Z I [shard_4/mdb-shard-4.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:48Z I [config/mdb-config.domain:27035] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:49Z E [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:50Z E [shard_1/mdb-shard-1.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:50Z E [shard_10/mdb-shard-10.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:50Z E [shard_7/mdb-shard-7.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-2>] sync failed with: wait transaction <<long-transaction-entry-2>> [] {1672790405 476}: shard shard_1 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:50Z E [shard_9/mdb-shard-9.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-3>] sync failed with: wait transaction <<long-transaction-entry-3>> [] {1672790405 244}: shard shard_1 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:50Z E [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-4>] sync failed with: wait transaction <<long-transaction-entry-4>> [] {1672790405 211}: shard shard_1 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:52Z E [shard_4/mdb-shard-4.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-5>] sync failed with: wait transaction <<long-transaction-entry-5>> [] {1672790405 192}: shard shard_1 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:52Z E [shard_2/mdb-shard-2.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-6>] sync failed with: wait transaction <<long-transaction-entry-6>> [] {1672790405 300}: shard shard_1 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:52Z E [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-7>] sync failed with: wait transaction <<long-transaction-entry-7>> [] {1672790405 735}: shard shard_1 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:52Z E [shard_3/mdb-shard-3.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-4>] sync failed with: wait transaction <<long-transaction-entry-4>> [] {1672790405 211}: shard shard_1 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:53Z I [config/mdb-config.domain:27035] [restore/2023-01-26T08:54:21.860592751Z] oplog replay finished on {1672816465 1380}
2023-01-26T16:51:54Z E [config/mdb-config.domain:27035] [restore/2023-01-26T08:54:21.860592751Z] check cluster for the restore done: convergeCluster: restore on the shard shard_1 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T20:35:22Z W [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] errGetObj Err: RequestError: send request failed
caused by: Get "https://<region>/<bucket-name>/<backup-path>/2023-01-04T00%3A00%3A02Z/shard_8/local.oplog.rs.bson.gz": read tcp <source_ip>:<sport> -> <destIp>:<port>: read: connection reset by peer
2023-01-26T20:35:22Z W [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] got RequestError: send request failed
caused by: Get "https://<region>/<bucket-name>/<backup-path>/2023-01-04T00%3A00%3A02Z/shard_8/local.oplog.rs.bson.gz": read tcp <source_ip>:<sport> -> <destIp>:<port>: read: connection reset by peer, try to reconnect in 1s
2023-01-26T20:35:23Z I [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] session recreated, resuming download
2023-01-26T21:25:12Z W [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] io.copy: read tcp <source_ip>:<sport> -> <destIp>:<port>: read: connection reset by peer
2023-01-26T21:25:12Z W [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] got read tcp <source_ip>:<sport> -> <destIp>:<port>: read: connection reset by peer, try to reconnect in 1s
2023-01-26T21:25:13Z I [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] session recreated, resuming download
2023-01-26T21:51:49Z W [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] errGetObj Err: RequestError: send request failed
caused by: Get "https://<region>/<bucket-name>/<backup-path>/2023-01-04T00%3A00%3A02Z/shard_6/local.oplog.rs.bson.gz": read tcp <source_ip>:<sport> -> <destIp>:<port>: read: connection reset by peer
2023-01-26T21:51:49Z W [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] got RequestError: send request failed
caused by: Get "https://<region>/<bucket-name>/<backup-path>/2023-01-04T00%3A00%3A02Z/shard_6/local.oplog.rs.bson.gz": read tcp <source_ip>:<sport> -> <destIp>:<port>: read: connection reset by peer, try to reconnect in 1s
2023-01-26T21:51:50Z I [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] session recreated, resuming download
The error is always the same:
{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
But I examined the collection in the backup and the entire oplog: neither contains a document with the value { documentId: null, someid: null }.
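As far as I understand, if the unique index IX_DocumentId_SomeId is not sparse or partial, then every document where documentId and someid are missing (or null) maps to the same index key { documentId: null, someid: null }, so the duplicate key does not have to exist as a literal stored value. I looked for such documents with roughly this query (db/collection/field names are the anonymized placeholders from the logs above):

mongosh "mongodb://mdb-shard-8.domain:27017/db" --eval '
  // documents whose indexed fields are missing or null all share the
  // index key { documentId: null, someid: null } under a non-sparse unique index
  db.getCollection("collection.name1").find({
    $or: [
      { documentId: { $exists: false } }, { documentId: null },
      { someid: { $exists: false } }, { someid: null }
    ]
  }).limit(5).forEach(printjson);
  // index definition, to see whether the index is sparse/partial/unique
  printjson(db.getCollection("collection.name1").getIndexes());
'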
Could the problem be related to transactions? The transactions run against different collections, some of which are sharded.
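To see whether the failing entry belongs to a distributed transaction, I can also dump the oplog file that pbm replays for this snapshot (the same shard_8 file that appears in the download retries above) and look at the entry at timestamp {1672790405, 110}; as far as I know, transaction entries carry lsid and txnNumber fields. Roughly (bucket and path are the anonymized placeholders):

# fetch the snapshot oplog for shard_8 from the backup storage
aws s3 cp "s3://<bucket-name>/<backup-path>/2023-01-04T00:00:02Z/shard_8/local.oplog.rs.bson.gz" .
# dump it to JSON and locate the entry pbm failed on ({t:1672790405, i:110})
gunzip -c local.oplog.rs.bson.gz | bsondump | grep '"t":1672790405,"i":110'

Note that in the error above pbm prints "LSID":null and "TxnNumber":null for this op, so I am not sure it is a transaction entry at all.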
Can you please help me fix this error?