Description:
I’m using mongosync to replicate data from one MongoDB cluster to another, both managed by the Percona Kubernetes Operator. I’m seeing a flood of NoSuchTransaction errors on the destination side, and it looks like mongosync is repeatedly aborting or expiring transactions before it can commit each batch of CRUD events. I’m syncing from a single replica set to a sharded cluster made up of multiple replica sets.
Steps to Reproduce:
- Deploy Source Cluster
- Deploy Destination Cluster
- Run mongosync
Version:
MongoDB 8
Logs:
{"time":"2025-05-13T07:45:15.302926Z","level":"debug","serverID":"7c04cb2b","mongosyncID":"coordinator","crudBatchID":"42c2b834-62ab-42e9-a986-47410fe0b245","sessionID":{"id": {"$uuid": "5dd1147f-1409-43de-8fb8-edca4f675c54"}},"componentNames":["Change Event Application","Change Event Application","CRUD Processors","CRUD Processors","Change Event Applier 45 (CRUD)"],"errorFromPreviousTransactionFunctionCall":{"msErrorLabels":["serverError"],"clientType":"destination","database":"metaverse","collection":"item_scheduler","collectionUUID":"134ed987-f29e-4dcf-bf6a-4833cdb67917","failedCommand":"RunCommand","failedRunCommand":"[{update item_scheduler} {updates [[{q {\"_id\": {\"$oid\":\"66cdf56abe81e218a7bc1931\"}}} {u [[{$_internalApplyOplogUpdate [{oplogUpdate {\"$v\": {\"$numberInt\":\"2\"},\"diff\": {\"u\": {\"fetch_sales_at\": {\"$date\":{\"$numberLong\":\"1747208712681\"}}}}}}]}]]} {multi false} {upsert false}]]} {bypassDocumentValidation true} {bypassEmptyTsReplacement true}]","message":"Change Event Applier 45 (CRUD) failed to apply update event (cluster time: &{T:1747122315 I:12}, namespace: { db: metaverse, coll: item_scheduler, sourceUUID: <nil>, destUUID: <nil> }, collUUID: 84d025f9-ec57-4094-b894-ecd4171fa7e0): failed to update document when querying on the _id field: failed to update document on destination: failed to execute a command on the MongoDB server: (NoSuchTransaction) cannot continue txnId 1991 for session 5dd1147f-1409-43de-8fb8-edca4f675c54 - O0CMtIVItQN4IsEOsJdrPL8s7jv5xwh5a/A5Qfvs2A8= - - with txnRetryCounter 0"},"timesCalled":2,"transactionIdentifier":"ApplyEventsBatch","message":"Calling transaction callback."}
Expected Result:
All change events should be applied cleanly in batches, with no transaction aborts. mongosync should steadily advance its progress without retrying or losing events.
Actual Result:
Each CRUD batch begins a multi‐document transaction on the destination, but frequently the server responds with:
(NoSuchTransaction) cannot continue txnId <…> for session <…> – with txnRetryCounter 0
Additional Information:
I came across this article: "Don't Use Load Balancer In front of Mongos" (Finisky Garden).
In short:
- The Percona Operator may expose multiple mongos instances behind a single load balancer. If the commands of one mongosync transaction reach different mongos routers (because nothing pins the session to one router), could that be what aborts the transactions?
My mongos configuration:
mongos:
  size: 3
  expose:
    enabled: true
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
      service.beta.kubernetes.io/aws-load-balancer-scheme: internal
      service.beta.kubernetes.io/aws-load-balancer-ip-address-type: ipv4
      service.beta.kubernetes.io/aws-load-balancer-name: acme-prod-psmdb-default-sharded
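To make my suspicion concrete, here is a toy model of what I think is going on. It is not MongoDB or mongosync code: `ToyMongos`, `run_batch`, and the per-command round-robin are all made up for illustration, on the assumption that each mongos only knows about transactions that were started through it and that the load balancer can send commands from one transaction to different routers (for example via different pooled connections).

```python
import itertools


class NoSuchTransaction(Exception):
    """Stand-in for the server error seen in the mongosync log."""


class ToyMongos:
    """Hypothetical router that only remembers transactions it started."""

    def __init__(self, name):
        self.name = name
        self.open_txns = set()  # (session_id, txn_number) pairs

    def run(self, session_id, txn_number, start=False):
        key = (session_id, txn_number)
        if start:
            self.open_txns.add(key)
        elif key not in self.open_txns:
            raise NoSuchTransaction(
                f"cannot continue txnId {txn_number} "
                f"for session {session_id} on {self.name}"
            )
        return self.name


def run_batch(pick_router, session_id, txn_number, statements=3):
    """Send the statements of one transaction; the first one starts it."""
    served_by = []
    for i in range(statements):
        router = pick_router()
        served_by.append(router.run(session_id, txn_number, start=(i == 0)))
    return served_by


routers = [ToyMongos(f"mongos-{i}") for i in range(3)]

# Case 1: a balancer in front of three routers, modelled as round-robin.
round_robin = itertools.cycle(routers)
try:
    run_batch(lambda: next(round_robin), "session-a", 1991)
    balanced_outcome = "ok"
except NoSuchTransaction as exc:
    balanced_outcome = f"NoSuchTransaction: {exc}"

# Case 2: every statement of the transaction goes to the same router.
pinned_outcome = run_batch(lambda: routers[0], "session-a", 1992)

print(balanced_outcome)
print(pinned_outcome)
```

In this model the balanced case fails on the second statement, because it lands on a router that never saw the transaction start, while the pinned case completes. Whether the real NLB and mongosync's connection pool behave like this is exactly what I'm unsure about.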
Any guidance on how to configure the Percona Operator, mongosync settings, or MongoDB parameters to eliminate these NoSuchTransaction failures would be greatly appreciated!