Environment
- Kubernetes: EKS (AWS)
- Percona PSMDB Operator: 1.21.1
- Percona Server for MongoDB: 7.0.24-13
- Percona Backup for MongoDB (PBM): 2.11.0
- Istio: 1.28.3 (Ambient mode, no sidecars)
- Replica set: 3 members (psmdb-db-rs0-{0,1,2}), sharding disabled
- TLS: preferTLS
Summary
We have a PSMDB cluster that was fully functional before we introduced Istio ambient mode into the namespace. After enabling Istio ambient (ztunnel-based L4 mesh, no sidecars), the PSMDB operator and the backup agent started experiencing intermittent connection handshake failures to replica set members. The replica set itself remains healthy (3/3 ready), but the operator’s reconciliation loop and backup operations fail intermittently.
Symptoms
1. Operator reconciliation errors
The PerconaServerMongoDB CR flips between the ready and error states. The error message is always a connection handshake failure to one of the replica set members; the target member varies (sometimes rs0-0, sometimes rs0-2):
Status:
  Backup Config Hash: 9623d82b4e716743357ab93e2cf249a013a23dc3b843e51c7e49d
  Backup Image: percona/percona-backup-mongodb:2.11.0
  Backup Version: 2.11.0
  Conditions:
    Last Transition Time: 2026-02-06T18:52:35Z
    Status: True
    Type: initializing
    Last Transition Time: 2026-02-06T19:03:32Z
    Message: update PiTR config: create pbm object: create PBM connection to psmdb-db-rs0-0.psmdb-db-rs0.ns1.svc.cluster.local:27017,psmdb-db-rs0-1.psmdb-db-rs0.ns1.svc.cluster.local:27017,psmdb-db-rs0-2.psmdb-db-rs0.ns1.svc.cluster.local:27017: create mongo connection: ping: connection() error occurred during connection handshake: handshake failure: connection(psmdb-db-rs0-2.psmdb-db-rs0.ns1.svc.cluster.local:27017[-2911361]) socket was unexpectedly closed: EOF: connection(psmdb-db-rs0-2.psmdb-db-rs0.ns1.svc.cluster.local:27017[-2911361]) socket was unexpectedly closed: EOF
    Reason: ErrorReconcile
    Status: True
    Type: error
    Last Transition Time: 2026-02-06T19:21:17Z
    Status: True
    Type: ready
    Last Transition Time: 2026-02-16T19:57:16Z
    Status: False
    Type: sharding
  Host: psmdb-db-rs0.ns1.svc.cluster.local
  Message: Error: dial: ping mongo: connection() error occurred during connection handshake: handshake failure: connection(psmdb-db-rs0-2.psmdb-db-rs0.ns1.svc.cluster.local:27017[-2911376]) socket was unexpectedly closed: EOF: connection(psmdb-db-rs0-2.psmdb-db-rs0.ns1.svc.cluster.local:27017[-2911376]) socket was unexpectedly closed: EOF
  Mongo Image: percona/percona-server-mongodb:7.0.24-13
  Mongo Version: 7.0.24-13
  Observed Generation: 9
  Ready: 3
  Replsets:
    rs0:
      Initialized: true
      Ready: 3
      Size: 3
      Status: ready
  Size: 3
  State: error
The replica set is fully operational (Ready: 3, Status: ready) and all application connections via the service work fine, but the operator intermittently can’t reach individual pod hostnames during reconciliation.
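To see which member is affected most often, the failing host can be tallied from the error messages. The following is only a rough sketch: the `failing_members` helper and the regular expression are my own illustration (not part of the operator or PBM), and the sample strings are shortened copies of the two messages in the status above.

```python
import re
from collections import Counter

# Matches the fragment "connection(<host>:<port>[<id>]) socket was unexpectedly closed".
# The bare "connection() error occurred ..." prefix does not match because the host part is required.
CLOSED_RE = re.compile(
    r"connection\(([^:()\s]+):(\d+)\[-?\d+\]\) socket was unexpectedly closed"
)

def failing_members(messages):
    """Count in how many messages each host had its socket closed during the handshake."""
    counts = Counter()
    for msg in messages:
        # Each error repeats the same fragment twice, so de-duplicate hosts per message.
        hosts = {m.group(1) for m in CLOSED_RE.finditer(msg)}
        counts.update(hosts)
    return counts

# Shortened copies of the two messages from the CR status.
sample = [
    "create mongo connection: ping: connection() error occurred during connection handshake: "
    "handshake failure: connection(psmdb-db-rs0-2.psmdb-db-rs0.ns1.svc.cluster.local:27017[-2911361]) "
    "socket was unexpectedly closed: EOF: "
    "connection(psmdb-db-rs0-2.psmdb-db-rs0.ns1.svc.cluster.local:27017[-2911361]) "
    "socket was unexpectedly closed: EOF",
    "Error: dial: ping mongo: connection() error occurred during connection handshake: "
    "handshake failure: connection(psmdb-db-rs0-2.psmdb-db-rs0.ns1.svc.cluster.local:27017[-2911376]) "
    "socket was unexpectedly closed: EOF: "
    "connection(psmdb-db-rs0-2.psmdb-db-rs0.ns1.svc.cluster.local:27017[-2911376]) "
    "socket was unexpectedly closed: EOF",
]

tally = failing_members(sample)
for host, count in tally.most_common():
    print(host, count)
```

Feeding the function the messages collected over a longer period (for example from the operator log) would show whether the failures are spread evenly across members or concentrated on one pod or node.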
2. Backup failures
On-demand and scheduled backups (PBM logical backup to S3-compatible storage) fail with similar errors. The backup agent successfully dumps most collections and uploads them to storage, but ultimately fails because PBM loses heartbeat connectivity to a RS member:
2026-02-16T20:07:13 I [backup/...] dump finished, waiting for the oplog
2026-02-16T20:07:13 I [backup/...] mark backup as error `check cluster for dump done: convergeCluster: lost shard rs0, last beat ts: 1771272383`
2026-02-16T20:07:13 E [backup/...] backup: check cluster for dump done: convergeCluster: lost shard rs0, last beat ts: 1771272383
The underlying connection errors during backup are consistently:
connection() error occurred during connection handshake: handshake failure: connection(psmdb-db-rs0-2...[-259]) socket was unexpectedly closed: EOF
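For context, the `last beat ts` value in the log above is a Unix timestamp, so it can be compared with the time the backup was marked as failed. A small sketch, assuming the PBM log timestamps are in UTC (I have not confirmed the pod timezone):

```python
from datetime import datetime, timezone

# "last beat ts" from the convergeCluster error, interpreted as Unix seconds.
last_beat = datetime.fromtimestamp(1771272383, tz=timezone.utc)

# Time of the "mark backup as error" log line.
# Assumption: the PBM log timestamps are UTC.
marked_error = datetime(2026, 2, 16, 20, 7, 13, tzinfo=timezone.utc)

gap_seconds = int((marked_error - last_beat).total_seconds())
print(last_beat.isoformat())  # 2026-02-16T20:06:23+00:00
print(gap_seconds)            # 50
```

Under that assumption, the last recorded heartbeat for rs0 was about 50 seconds old when PBM declared the shard lost.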
Questions
- Is the PSMDB operator known to work with Istio ambient mode? Are there any recommended configurations or known incompatibilities with ztunnel traffic interception?
- Are there recommended Istio PeerAuthentication / DestinationRule configurations for PSMDB clusters running in an Istio ambient mesh?
Any guidance on running Percona PSMDB operator in an Istio ambient mode environment would be greatly appreciated.