PerconaServerMongoDBBackup | psmdb operator | server selection error

hparikh · November 22, 2024, 8:06pm

We have a single-shard MongoDB cluster deployed in Kubernetes with psmdb-operator-1.16.2. Backups are enabled, and they complete successfully for small data sizes of 5-10 GB. However, when a backup runs longer, for a DB size of more than 50 GB, we start getting “server selection error” followed by “socket was unexpectedly closed” for the cfg replset.

We see this issue with both backup types, logical and physical.

Below is the backup config YAML:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  finalizers:
  - percona.com/delete-backup
  name: manual-bkp-test1-20241122
spec:
  clusterName: test1
  storageName: gcs
  type: physical

Attached is the psmdb-backup status, where you will find the error details. The backup-agent logs for the cfg and rs0 replsets show that the backup completes successfully; it is only the operator that reports the backup status as error.

$ kubectl get psmdb-backup manual-bkp-test1-20241122
NAME                        CLUSTER    STORAGE   DESTINATION                                                   TYPE       STATUS   COMPLETED   AGE
manual-bkp-test1-20241122   test1      gcs       s3://test1-mdb-backups/test1-test1-mdb/2024-11-22T18:23:53Z   physical   error                97m
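
The table above shows only the state; the error text the operator recorded can be printed from the backup object itself. A minimal sketch, assuming the custom resource exposes `.status.state` and `.status.error` (worth confirming with `kubectl get psmdb-backup <name> -o yaml`); `show_backup_error` is a hypothetical helper name:

```shell
# Print the state and error message stored on a psmdb-backup object.
# Assumption: the object's status carries "state" and "error" fields.
show_backup_error() {
  backup_name="$1"
  kubectl get psmdb-backup "$backup_name" \
    -o jsonpath='{.status.state}{"\t"}{.status.error}{"\n"}'
}
```

For the backup in this post it would be called as `show_backup_error manual-bkp-test1-20241122`.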

Is there any backup configuration option that lets us increase the socket timeout to overcome this error?

Thanks

Harsh

anil.joshi · November 26, 2024, 10:34am

@hparikh

Did you check the health of the backend DB nodes around that time, to rule out network or other hiccups/blockers?

Maybe you can share the Mongo pod logs and description for more insight?

kubectl logs pod_name
kubectl describe pod pod_name
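
Each pod runs more than one container, so it may help to name the container explicitly when collecting logs. A sketch, assuming the database container is called `mongod` and the PBM sidecar `backup-agent` (`kubectl describe pod` lists the actual names); `collect_pod_logs` and the output file names are hypothetical:

```shell
# Save logs of both containers plus the pod description into out_dir.
# Assumption: the containers are named "mongod" and "backup-agent".
collect_pod_logs() {
  pod="$1"
  out_dir="${2:-.}"
  mkdir -p "$out_dir"
  for container in mongod backup-agent; do
    kubectl logs "$pod" -c "$container" > "$out_dir/$pod-$container.log"
  done
  kubectl describe pod "$pod" > "$out_dir/$pod-describe.log"
}
```

It would be run once per cfg and rs0 pod, for example `collect_pod_logs pod_name ./diag`.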

Is there any specific time period when this issue happens?

Can you please share the information below as well, so we can check more on the PBM side?

sudo pbm status > pbm.status;
sudo pbm list > list.out;
sudo pbm logs -t0 > logs.out;
sudo pbm version > version.out;
sudo pbm config --list > conf.out;
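
In an operator deployment the `pbm` CLI is usually run inside the PBM sidecar rather than on a host, so the commands above would likely be wrapped in `kubectl exec` and run without `sudo`. A sketch under that assumption, with the sidecar container assumed to be named `backup-agent`; `collect_pbm_diagnostics` is a hypothetical helper name:

```shell
# Run the requested pbm commands inside one pod's PBM sidecar and
# save each output locally under the file names used above.
# Assumption: the sidecar container is named "backup-agent".
collect_pbm_diagnostics() {
  pod="$1"
  out_dir="${2:-.}"
  mkdir -p "$out_dir"
  kubectl exec "$pod" -c backup-agent -- pbm status        > "$out_dir/pbm.status"
  kubectl exec "$pod" -c backup-agent -- pbm list          > "$out_dir/list.out"
  kubectl exec "$pod" -c backup-agent -- pbm logs -t0      > "$out_dir/logs.out"
  kubectl exec "$pod" -c backup-agent -- pbm version       > "$out_dir/version.out"
  kubectl exec "$pod" -c backup-agent -- pbm config --list > "$out_dir/conf.out"
}
```

Any one replica set pod should do, since every agent talks to the same PBM control collections: `collect_pbm_diagnostics pod_name ./pbm-diag`.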

How much memory and CPU are allocated?

hparikh · December 4, 2024, 8:30pm

Hi @anil.joshi

Thank you for your kind feedback. We checked the health of the MongoDB cluster for cfg, rs0 and mongos; they are all running healthy. We don’t see any errors in the mongod pods (containers) that could cause the backup agent to hit a socket timeout.

This issue happens every time we run the backups.

Please find attached all the requested output. If you think anything else is required, kindly let me know.

pbm_queries.log (24.0 KB)
pod-describe.log (12.1 KB)
psmdb-yaml.log (10.4 KB)

Best,

Harsh

hparikh · December 12, 2024, 6:51pm

I tried testing with a newer version as well as the latest version of backup-agent; still no luck.

percona-backup-mongodb:2.4.1 (default for 1.16 chart)
percona-backup-mongodb:2.7.0

  spec:
    backup:
      enabled: true
      image: percona/percona-backup-mongodb:2.7.0
      pitr:
        enabled: false
      storages:
        gcs:
          s3:

Harsh
