Transaction exceptions in sharded cluster

I'm getting lots of transaction exceptions in my sharded cluster:

cannot continue txnId 27087 for session uuid - base64 -  -  with txnRetryCounter 0 cannot continue txnId 27086 for session

The transaction eventually completes, but the errors seem to occur at random.

The problem occurs because the service acts as a load balancer in front of the mongos pods. All operations in a transaction need to go through the same mongos pod, but behind the load balancer the pod that receives each operation is effectively random, so a single transaction can end up split across pods.
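
To make the failure mode concrete, here is a toy simulation. It is only a sketch, not how mongos is implemented: the ToyMongos class, the session names and the per-statement round-robin routing are assumptions made up for illustration (a real load balancer typically spreads connections rather than individual statements). It shows that a router which never saw the start of a transaction rejects the later statements, while pinning to one router avoids that.

```python
import itertools


class ToyMongos:
    """Hypothetical stand-in for one mongos pod; it only tracks transactions it started."""

    def __init__(self, name):
        self.name = name
        self.active = {}  # session id -> transaction number this router knows about

    def run(self, session_id, txn_id, start=False):
        if start:
            self.active[session_id] = txn_id
            return "ok"
        if self.active.get(session_id) != txn_id:
            # This router never saw the start of the transaction.
            raise RuntimeError(
                f"{self.name}: cannot continue txnId {txn_id} for session {session_id}"
            )
        return "ok"


def run_transaction(pick_router, session_id, txn_id, n_ops=3):
    """Send n_ops statements of one transaction and count how many are rejected."""
    errors = 0
    for i in range(n_ops):
        router = pick_router()
        try:
            router.run(session_id, txn_id, start=(i == 0))
        except RuntimeError:
            errors += 1
    return errors


routers = [ToyMongos(f"mongos-{i}") for i in range(3)]

# Load-balanced service: consecutive statements land on different routers.
round_robin = itertools.cycle(routers)
balanced_errors = run_transaction(lambda: next(round_robin), "s1", 27087)

# Per-pod service: every statement goes to the same router.
pinned = routers[0]
pinned_errors = run_transaction(lambda: pinned, "s2", 27087)

print(balanced_errors, pinned_errors)
```

In this toy model the load-balanced run rejects every statement after the first, and the pinned run rejects none.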

As a resolution, I set the servicePerPod param to true on the mongos config. My services that require transactional writes can then each target one specific mongos pod, which eliminates the random routing that was causing those errors.

There doesn’t seem to be an out-of-the-box option to enable both per-pod services and a single service covering all mongos pods. However, since creating an additional service for the mongos pods is trivial, you can easily get the best of both worlds.
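
As a rough sketch of that extra service, the snippet below builds a plain Kubernetes Service manifest as JSON. The service name "mongos-all" and the selector label are hypothetical placeholders, not values taken from any operator; check the labels actually set on your mongos pods (for example with kubectl get pods --show-labels) and substitute them. Port 27017 is the MongoDB default and is assumed here.

```python
import json


def mongos_service(name, selector, port=27017):
    """Build a ClusterIP Service manifest that selects every pod matching `selector`."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "ClusterIP",
            "selector": dict(selector),
            "ports": [{"name": "mongos", "port": port, "targetPort": port}],
        },
    }


# Hypothetical name and label; replace with the labels on your own mongos pods.
manifest = mongos_service("mongos-all", {"app.kubernetes.io/component": "mongos"})
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f. Workloads that don't use transactions can connect through this combined service, while transactional writers keep using a single per-pod service.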

