Sharded cluster restore error: E11000 duplicate key error

I have a sharded cluster in prod: 10 shards plus 1 config replica set, with 4 replica set members each (3 + 1 hidden). Several databases are sharded. We use transactions.

I made a backup with PBM. The backup completed without errors.

Backup command: pbm backup --wait
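
For reference, a minimal way to double-check that a backup finished cleanly (standard PBM CLI only, nothing specific to this cluster):

pbm status                        # all agents report OK, nothing left running
pbm list                          # the snapshot appears with a restore_to_time
pbm logs --tail 50 --severity W   # no warnings or errors around the backup window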

I have a stage cluster for testing the restore. It differs from prod only in the number of replica set members and the hostnames; the shard names are the same. Before the restore, the stage cluster was clean and mongos was shut down.

For this topic, all names in the logs have been changed.
Restore command:

pbm --mongodb-uri "mongodb://<mongodb_uri>" restore 2023-01-04T00:00:02Z

pbm status output at this point:
Cluster:
========
config:
  - config/mdb-config.domain:27035 [P]: pbm-agent v2.0.2 OK
shard_1:
  - shard_1/mdb-shard-1.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_2:
  - shard_2/mdb-shard-2.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_3:
  - shard_3/mdb-shard-3.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_4:
  - shard_4/mdb-shard-4.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_5:
  - shard_5/mdb-shard-5.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_6:
  - shard_6/mdb-shard-6.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_7:
  - shard_7/mdb-shard-7.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_8:
  - shard_8/mdb-shard-8.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_9:
  - shard_9/mdb-shard-9.domain:27017 [P]: pbm-agent v2.0.2 OK
shard_10:
  - shard_10/mdb-shard-10.domain:27017 [P]: pbm-agent v2.0.2 OK


PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
(none)

Backups:
========
S3 <region> s3://<bucket-name>
  Snapshots:
    2023-01-23T00:00:08Z 1.05TB <logical> [restore_to_time: 2023-01-23T07:33:48Z]
    2023-01-22T00:00:09Z 1.04TB <logical> [restore_to_time: 2023-01-22T07:25:25Z]
    2023-01-21T00:00:16Z 1.04TB <logical> [restore_to_time: 2023-01-21T07:27:25Z]
    2023-01-20T00:00:23Z 1.08TB <logical> [restore_to_time: 2023-01-20T07:46:17Z]

Logs:

2023-01-26T08:54:21Z I [shard_3/mdb-shard-3.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:21Z I [shard_3/mdb-shard-3.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:21Z I [shard_3/mdb-shard-3.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:21Z I [shard_3/mdb-shard-3.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_4/mdb-shard-4.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_4/mdb-shard-4.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_8/mdb-shard-8.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_4/mdb-shard-4.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_8/mdb-shard-8.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_4/mdb-shard-4.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_5/mdb-shard-5.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_5/mdb-shard-5.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_6/mdb-shard-6.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_6/mdb-shard-6.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_9/mdb-shard-9.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [config/mdb-config.domain:27035] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_7/mdb-shard-7.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_2/mdb-shard-2.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_9/mdb-shard-9.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_9/mdb-shard-9.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [config/mdb-config.domain:27035] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_2/mdb-shard-2.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [config/mdb-config.domain:27035] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_7/mdb-shard-7.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_2/mdb-shard-2.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_7/mdb-shard-7.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_9/mdb-shard-9.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [config/mdb-config.domain:27035] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_7/mdb-shard-7.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_2/mdb-shard-2.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_1/mdb-shard-1.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_1/mdb-shard-1.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_1/mdb-shard-1.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_1/mdb-shard-1.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_10/mdb-shard-10.domain:27017] got command restore [name: 2023-01-26T08:54:21.860592751Z, backup name: 2023-01-04T00:00:02Z] <ts: 1674723261>
2023-01-26T08:54:22Z I [shard_10/mdb-shard-10.domain:27017] got epoch {1674460055 2}
2023-01-26T08:54:22Z I [shard_10/mdb-shard-10.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] backup: 2023-01-04T00:00:02Z
2023-01-26T08:54:22Z I [shard_10/mdb-shard-10.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restore started
2023-01-26T08:54:22Z I [shard_3/mdb-shard-3.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:23Z I [shard_4/mdb-shard-4.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:23Z I [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:23Z I [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:23Z I [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:23Z I [shard_9/mdb-shard-9.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:23Z I [config/mdb-config.domain:27035] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:23Z I [shard_7/mdb-shard-7.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:23Z I [shard_2/mdb-shard-2.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:24Z I [shard_1/mdb-shard-1.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:25Z I [shard_10/mdb-shard-10.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state running
2023-01-26T08:54:49Z I [config/mdb-config.domain:27035] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T08:54:49Z I [config/mdb-config.domain:27035] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T13:05:06Z I [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T13:05:07Z I [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T13:09:05Z I [shard_1/mdb-shard-1.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T13:09:05Z I [shard_1/mdb-shard-1.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T13:09:41Z I [shard_4/mdb-shard-4.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T13:09:41Z I [shard_4/mdb-shard-4.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T13:18:30Z I [shard_9/mdb-shard-9.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T13:18:31Z I [shard_9/mdb-shard-9.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T13:22:17Z I [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T13:22:18Z I [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T13:32:08Z I [shard_2/mdb-shard-2.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T13:32:09Z I [shard_2/mdb-shard-2.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T13:52:55Z I [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T13:52:55Z I [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T14:01:22Z I [shard_7/mdb-shard-7.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T14:01:23Z I [shard_7/mdb-shard-7.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T14:38:03Z I [shard_3/mdb-shard-3.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T14:38:03Z I [shard_3/mdb-shard-3.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T16:51:46Z I [shard_10/mdb-shard-10.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] restoring users and roles
2023-01-26T16:51:46Z I [shard_10/mdb-shard-10.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] moving to state dumpDone
2023-01-26T16:51:47Z I [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:47Z I [shard_7/mdb-shard-7.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:47Z I [shard_1/mdb-shard-1.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:47Z I [shard_3/mdb-shard-3.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:47Z I [shard_10/mdb-shard-10.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:48Z I [shard_2/mdb-shard-2.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:48Z I [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:48Z I [shard_9/mdb-shard-9.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:48Z I [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:48Z I [shard_4/mdb-shard-4.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:48Z I [config/mdb-config.domain:27035] [restore/2023-01-26T08:54:21.860592751Z] starting oplog replay
2023-01-26T16:51:49Z E [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:50Z E [shard_1/mdb-shard-1.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:50Z E [shard_10/mdb-shard-10.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:50Z E [shard_7/mdb-shard-7.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-2>] sync failed with: wait transaction <<long-transaction-entry-2>> [] {1672790405 476}: shard shard_1 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:50Z E [shard_9/mdb-shard-9.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-3>] sync failed with: wait transaction <<long-transaction-entry-3>> [] {1672790405 244}: shard shard_1 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:50Z E [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-4>] sync failed with: wait transaction <<long-transaction-entry-4>> [] {1672790405 211}: shard shard_1 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:52Z E [shard_4/mdb-shard-4.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-5>] sync failed with: wait transaction <<long-transaction-entry-5>> [] {1672790405 192}: shard shard_1 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:52Z E [shard_2/mdb-shard-2.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-6>] sync failed with: wait transaction <<long-transaction-entry-6>> [] {1672790405 300}: shard shard_1 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:52Z E [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-7>] sync failed with: wait transaction <<long-transaction-entry-7>> [] {1672790405 735}: shard shard_1 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:52Z E [shard_3/mdb-shard-3.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-4>] sync failed with: wait transaction <<long-transaction-entry-4>> [] {1672790405 211}: shard shard_1 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T16:51:53Z I [config/mdb-config.domain:27035] [restore/2023-01-26T08:54:21.860592751Z] oplog replay finished on {1672816465 1380}
2023-01-26T16:51:54Z E [config/mdb-config.domain:27035] [restore/2023-01-26T08:54:21.860592751Z] check cluster for the restore done: convergeCluster: restore on the shard shard_1 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying a transaction entry: distributed txn [<long-transaction-entry-1>] sync failed with: wait transaction <<long-transaction-entry-1>> [] {1672790405 146}: shard shard_8 failed with: replay chunk 1672790405.1672816465: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1672790405,"I":110},"Term":192,"Hash":null,"Version":2,"Operation":"u","Namespace":"db.collection.name1","Object":[{"Key":"$v","Value":2},{"Key":"diff","Value":[{"Key":"u","Value":[{"Key":"captureKey","Value":{"Subtype":3,"Data":"<objectid-1>"}},{"Key":"capturedAt","Value":"2023-01-04T00:00:05.144Z"},{"Key":"state","Value":1},{"Key":"totalCaptureCount","Value":641},{"Key":"upTime","Value":"2023-01-04T00:11:35.144Z"}]}]}],"Query":[{"Key":"_id","Value":{"Subtype":3,"Data":"<objectid-data-1>"}}],"UI":{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }
2023-01-26T20:35:22Z W [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] errGetObj Err: RequestError: send request failed
caused by: Get "https://<region>/<bucket-name>/<backup-path>/2023-01-04T00%3A00%3A02Z/shard_8/local.oplog.rs.bson.gz": read tcp <source_ip>:<sport> -> <destIp>:<port>: read: connection reset by peer
2023-01-26T20:35:22Z W [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] got RequestError: send request failed
caused by: Get "https://<region>/<bucket-name>/<backup-path>/2023-01-04T00%3A00%3A02Z/shard_8/local.oplog.rs.bson.gz": read tcp <source_ip>:<sport> -> <destIp>:<port>: read: connection reset by peer, try to reconnect in 1s
2023-01-26T20:35:23Z I [shard_8/mdb-shard-8.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] session recreated, resuming download
2023-01-26T21:25:12Z W [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] io.copy: read tcp <source_ip>:<sport> -> <destIp>:<port>: read: connection reset by peer
2023-01-26T21:25:12Z W [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] got read tcp <source_ip>:<sport> -> <destIp>:<port>: read: connection reset by peer, try to reconnect in 1s
2023-01-26T21:25:13Z I [shard_5/mdb-shard-5.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] session recreated, resuming download
2023-01-26T21:51:49Z W [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] errGetObj Err: RequestError: send request failed
caused by: Get "https://<region>/<bucket-name>/<backup-path>/2023-01-04T00%3A00%3A02Z/shard_6/local.oplog.rs.bson.gz": read tcp <source_ip>:<sport> -> <destIp>:<port>: read: connection reset by peer
2023-01-26T21:51:49Z W [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] got RequestError: send request failed
caused by: Get "https://<region>/<bucket-name>/<backup-path>/2023-01-04T00%3A00%3A02Z/shard_6/local.oplog.rs.bson.gz": read tcp <source_ip>:<sport> -> <destIp>:<port>: read: connection reset by peer, try to reconnect in 1s
2023-01-26T21:51:50Z I [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] session recreated, resuming download

The error is always the same:

{"Subtype":4,"Data":"<objectid-data-2>"},"LSID":null,"TxnNumber":null,"PrevOpTime":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: db.collection.name1 index: IX_DocumentId_SomeId dup key: { documentId: null, someid: null }

But I looked at the collection in the backup and at the whole oplog: they do not contain a document with the value { documentId: null, someid: null }.
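
To be concrete, this is the kind of check I mean, using the namespace and index name from the error above ("db" / "collection.name1" are the placeholder names from the log; run mongosh against the source data):

mongosh "mongodb://<mongodb_uri>" --quiet --eval '
  const c = db.getSiblingDB("db").getCollection("collection.name1");
  // documents with null or missing values in the indexed fields
  print(c.countDocuments({
    $or: [
      { documentId: null }, { documentId: { $exists: false } },
      { someid: null },     { someid: { $exists: false } }
    ]
  }));
  // definition of IX_DocumentId_SomeId (unique? sparse? partial?)
  printjson(c.getIndexes().filter(i => i.name === "IX_DocumentId_SomeId"));
'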

Could the problem be related to transactions?
The transactions are performed on different collections, some of which are sharded.

Can you please help me fix this error?

Hey @Andrey_Agafonov ,

Very interesting. I can't say what causes it right now. Also, I see lots of network issues.

Anyway, is it MongoDB or Percona Server for MongoDB? What version? Is the mongod you backed up the same version as on stage? With the same configs deployed (except for the number of members per replica set)?

I created ticket PBM-1040 for it.
Please give us more environment and context details to help identify and resolve the issue.
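
For example, the basic version details could be collected with the standard tooling (adjust the URI for your environment):

pbm version
pbm status
mongosh "mongodb://<mongodb_uri>" --quiet --eval 'db.version()'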

Also, I see lots of network issues.

Why do you think so? Because:

2023-01-26T21:51:49Z W [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] errGetObj Err: RequestError: send request failed
caused by: Get "https://<region>/<bucket-name>/<backup-path>/2023-01-04T00%3A00%3A02Z/shard_6/local.oplog.rs.bson.gz": read tcp <source_ip>:<sport> -> <destIp>:<port>: read: connection reset by peer
2023-01-26T21:51:49Z W [shard_6/mdb-shard-6.domain:27017] [restore/2023-01-26T08:54:21.860592751Z] got RequestError: send request failed

This is strange. The stage cluster and the S3 bucket are located in AWS, in the same region.

MongoDB: 5.0.10
The backed-up mongod is the same. pbm-agent version v2.0.2 (the same on prod and stage).
The configurations are the same.
The S3 bucket is the same.

With the same configs deployed (except for the number of members per replica set)?

Differences between prod and stage:

  • Oplog size
  • Instance size (CPU, RAM; stage is smaller than prod)
  • Hostnames (but the shard names are the same)
  • Number of members per replica set

Please tell me if you need more information.
Thanks.

Prod:
Config:

systemLog:
 destination: file
 path: /var/log/mongodb/mongoc.log
 logAppend: true
 logRotate: reopen
storage:
 dbPath: /datadrive/mongodb
 journal:
  enabled: true
net:
 port: <port>
 bindIp: <ip>
security:
 keyFile: <file-path>
replication:
 oplogSizeMB: 1024
 replSetName: config
sharding:
 clusterRole: configsvr

Shard:

systemLog:
 destination: file
 path: /var/log/mongodb/mongod.log
 logAppend: true
 logRotate: reopen
storage:
 dbPath: /datadrive/mongodb
 engine: wiredTiger
 journal:
  enabled: true
net:
 port: <port>
 bindIp: <ip>
security:
 keyFile: <file-path>
replication:
 oplogSizeMB: 32768
 replSetName: shard_1
sharding:
 clusterRole: shardsvr

#########
Stage:

Config:

systemLog:
 destination: file
 path: /var/log/mongodb/mongoc.log
 logAppend: true
storage:
 dbPath: /datadrive/mongodb/
 journal:
  enabled: true
net:
 port: <port>
 bindIpAll: true
security:
 keyFile: <file-path>
replication:
 oplogSizeMB: 64
 replSetName: config
sharding:
 clusterRole: configsvr

Shard:

systemLog:
 destination: file
 path: /var/log/mongodb/mongod.log
 logAppend: true
 logRotate: reopen
storage:
 dbPath: /datadrive/mongodb
 engine: wiredTiger
 journal:
  enabled: true
net:
 port: <port>
 bindIpAll: true
security:
 keyFile: <file-path>
replication:
 oplogSizeMB: 16768
 replSetName: shard_1
sharding:
 clusterRole: shardsvr