Transaction exceptions in a sharded cluster

I'm getting lots of transaction exceptions in my sharded cluster:

cannot continue txnId 27087 for session uuid - base64 -  -  with txnRetryCounter 0 cannot continue txnId 27086 for session

The transaction eventually completes, but the errors seem to occur at random.

The problem occurs because the service acts as a load balancer in front of the mongos pods. All operations in a transaction need to go to the same mongos pod, but behind a load-balancing service each operation can be routed to a different pod, so which mongos handles a given operation is effectively random.
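To make the failure mode concrete, here is a toy simulation of it. This is only a sketch: ToyMongos, run_transaction and the "cannot continue txn" string are made up for illustration and do not reflect how mongos is actually implemented. The only assumption modelled is that a router can continue a transaction only if it saw that transaction's first statement.

```python
import itertools


class ToyMongos:
    """Toy stand-in for a mongos router that tracks transactions in memory."""

    def __init__(self, name):
        self.name = name
        self.open_txns = set()  # (session_id, txn_number) pairs started here

    def handle(self, session_id, txn_number, first_statement):
        key = (session_id, txn_number)
        if first_statement:
            self.open_txns.add(key)
            return "ok"
        # A later statement only succeeds on the router that saw the first one.
        return "ok" if key in self.open_txns else "cannot continue txn"


def run_transaction(route, statements=3):
    """Send the statements of one transaction through `route`.

    `route` is a zero-argument callable returning the ToyMongos that
    receives the next operation.
    """
    results = []
    for i in range(statements):
        router = route()
        results.append(router.handle("session-a", 1, first_statement=(i == 0)))
    return results


# Load-balanced service: each operation may land on a different router.
lb_routers = [ToyMongos("mongos-0"), ToyMongos("mongos-1")]
round_robin = itertools.cycle(lb_routers)
balanced = run_transaction(lambda: next(round_robin))

# Per-pod service: every operation goes to the same router.
single_router = ToyMongos("mongos-0")
pinned = run_transaction(lambda: single_router)

print(balanced)  # ['ok', 'cannot continue txn', 'ok']
print(pinned)    # ['ok', 'ok', 'ok']
```

With round-robin routing the second statement lands on a router that never saw the transaction start, which is the toy equivalent of the error above. Pinning every statement to one router avoids it.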

As a resolution, I set the servicePerPod parameter to true in the mongos configuration. My services that require transactional writes can then connect to one specific mongos pod, which eliminates the random routing that caused those errors.
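On the client side, the idea is to give the application a connection string with exactly one mongos host in it. Here is a minimal sketch of building such a URI. The helper name pinned_mongos_uri, the cluster name, the namespace and the `<cluster>-mongos-<index>` service naming pattern are all assumptions for illustration, so check `kubectl get svc` for the names your deployment actually creates.

```python
def pinned_mongos_uri(cluster, pod_index, namespace, port=27017):
    """Build a connection string that targets exactly one mongos service."""
    # Assumed (hypothetical) per-pod service naming scheme:
    #   <cluster>-mongos-<index>.<namespace>.svc.cluster.local
    host = f"{cluster}-mongos-{pod_index}.{namespace}.svc.cluster.local"
    return f"mongodb://{host}:{port}/"


# Hypothetical cluster and namespace names.
uri = pinned_mongos_uri("my-cluster", 0, "mongodb")
print(uri)  # mongodb://my-cluster-mongos-0.mongodb.svc.cluster.local:27017/
```

The resulting string can be passed to whichever driver you use. The point is only that the host list contains a single mongos, so every statement of a transaction goes through the same router.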

There doesn’t seem to be an out-of-the-box option to expose the mongos pods both through per-pod services and through a single shared service. However, since creating an additional service for the mongos pods is trivial, you can easily get the best of both worlds.

Hope it helps :smiley:
