MongoDB initial backup won't work with Operator version 1.16.2

Hi there,

I am testing the Percona MongoDB Operator.

The cluster is deployed and it's healthy:
katajistok@dbaasdev001:~/.kube$ kubectl --kubeconfig nksm-he-nesc-dbaas-prod get pods -n dbaas-mongodb-wft-mongo
NAME                                              READY   STATUS    RESTARTS   AGE
percona-server-mongodb-operator-cb86ccd85-xdkp2   1/1     Running   0          37m
wft-mongo-cfg-0                                   3/3     Running   0          20m
wft-mongo-cfg-1                                   3/3     Running   0          20m
wft-mongo-cfg-2                                   3/3     Running   0          20m
wft-mongo-mongos-0                                2/2     Running   0          20m
wft-mongo-mongos-1                                2/2     Running   0          20m
wft-mongo-mongos-2                                2/2     Running   0          19m
wft-mongo-rs0-0                                   3/3     Running   0          20m
wft-mongo-rs0-1                                   3/3     Running   0          20m
wft-mongo-rs0-2                                   3/3     Running   0          20m
katajistok@dbaasdev001:~/.kube$ kubectl --kubeconfig nksm-he-nesc-dbaas-prod get psmdb -n dbaas-mongodb-wft-mongo
NAME        ENDPOINT                                                     STATUS   AGE
wft-mongo   wft-mongo-mongos.dbaas-mongodb-wft-mongo.svc.cluster.local   ready    36m

But the backup fails:
katajistok@dbaasdev001:~/.kube$ kubectl --kubeconfig nksm-he-nesc-dbaas-prod get psmdb-backup -n dbaas-mongodb-wft-mongo
NAME                       CLUSTER     STORAGE      DESTINATION                                         TYPE      STATUS   COMPLETED   AGE
initial-on-demand-backup   wft-mongo   s3-storage   s3://dbaas-mongodb-wft-mongo/2024-10-01T08:14:37Z   logical   error                4m31s

Description:
katajistok@dbaasdev001:~/.kube$ kubectl --kubeconfig nksm-he-nesc-dbaas-prod describe psmdb-backup initial-on-demand-backup -n dbaas-mongodb-wft-mongo
Name:         initial-on-demand-backup
Namespace:    dbaas-mongodb-wft-mongo
Labels:       <none>
Annotations:  <none>
API Version:  psmdb.percona.com/v1
Kind:         PerconaServerMongoDBBackup
Metadata:
  Creation Timestamp:  2024-10-01T08:13:52Z
  Generation:          1
  Resource Version:    2686756144
  UID:                 51c34eaa-c89f-435d-877b-f09fe105e1d2
Spec:
  Cluster Name:  wft-mongo
  Storage Name:  s3-storage
Status:
  Destination:      s3://dbaas-mongodb-wft-mongo/2024-10-01T08:14:37Z
  Error:            no available agent(s) on replsets: rs0
  Last Transition:  2024-10-01T08:14:37Z
  Pbm Name:         2024-10-01T08:14:37Z
  Replset Names:
    cfg
    rs0
  s3:
    Bucket:                  dbaas-mongodb-wft-mongo
    Credentials Secret:      my-cluster-name-backup-s3
    Endpoint URL:            he-pi-s3-shn-102.xyz
    Region:                  us-east-1
    Server Side Encryption:
  Start:         2024-10-01T08:14:37Z
  State:         error
  Storage Name:  s3-storage
  Type:          logical
Events:         <none>

This is a fresh MongoDB cluster. Could you please help me?

Thank you,
-Kimmo

Hi Kimmo,
I see from the error that there is no backup agent available on rs0:
Error: no available agent(s) on replsets: rs0

Can you please check whether you have backup agents available? You can check the backup-agent logs like this:
kubectl --kubeconfig nksm-he-nesc-dbaas-prod logs pod/wft-mongo-rs0-0 -c backup-agent -n dbaas-mongodb-wft-mongo

And also get the pod description:
kubectl --kubeconfig nksm-he-nesc-dbaas-prod describe pod/wft-mongo-rs0-0 -n dbaas-mongodb-wft-mongo
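
If the agent containers are up, you can also ask PBM itself which agents it sees; a sketch, assuming the pbm CLI and its connection environment are present in the backup-agent container (they are in the stock Percona images):

kubectl --kubeconfig nksm-he-nesc-dbaas-prod exec wft-mongo-rs0-0 -c backup-agent -n dbaas-mongodb-wft-mongo -- pbm status

The Agents section of that output should list one agent per mongod node; rs0 members missing from it would match the "no available agent(s)" error above.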

Regards,
Vinodh Guruji

It seems to be a password issue?

katajistok@dbaasdev001:~$ kubectl --kubeconfig .kube/nksm-he-nesc-dbaas-prod logs pod/wft-mongo-rs0-0 -c backup-agent -n dbaas-mongodb-wft-mongo | tail -n 10
2024/10/01 17:43:34 [entrypoint] pbm-agent exited with code 1
2024/10/01 17:43:34 [entrypoint] restart in 5 sec
2024/10/01 17:43:39 [entrypoint] starting pbm-agent
2024/10/01 17:43:39 Exit: connect to PBM: create mongo connection to configsvr with connection string 'mongodb://wft-mongo-cfg-0.wft-mongo-cfg.dbaas-mongodb-wft-mongo.svc.cluster.local:27017,wft-mongo-cfg-1.wft-mongo-cfg.dbaas-mongodb-wft-mongo.svc.cluster.local:27017,wft-mongo-cfg-2.wft-mongo-cfg.dbaas-mongodb-wft-mongo.svc.cluster.local:27017#8It8ALnEOJ2TlW@localhost:27017/?replicaSet=rs0&tls=true&tlsCertificateKeyFile=%2Ftmp%2Ftls.pem&tlsCAFile=/etc/mongodb-ssl%2Fca.crt&tlsInsecure=true': connect: error parsing uri: unescaped colon in password
2024/10/01 17:43:39 [entrypoint] pbm-agent exited with code 1
2024/10/01 17:43:39 [entrypoint] restart in 5 sec
2024/10/01 17:43:44 [entrypoint] starting pbm-agent
2024/10/01 17:43:44 Exit: connect to PBM: create mongo connection to configsvr with connection string 'mongodb://wft-mongo-cfg-0.wft-mongo-cfg.dbaas-mongodb-wft-mongo.svc.cluster.local:27017,wft-mongo-cfg-1.wft-mongo-cfg.dbaas-mongodb-wft-mongo.svc.cluster.local:27017,wft-mongo-cfg-2.wft-mongo-cfg.dbaas-mongodb-wft-mongo.svc.cluster.local:27017#8It8ALnEOJ2TlW@localhost:27017/?replicaSet=rs0&tls=true&tlsCertificateKeyFile=%2Ftmp%2Ftls.pem&tlsCAFile=/etc/mongodb-ssl%2Fca.crt&tlsInsecure=true': connect: error parsing uri: unescaped colon in password
2024/10/01 17:43:44 [entrypoint] pbm-agent exited with code 1
2024/10/01 17:43:44 [entrypoint] restart in 5 sec

I cannot copy & paste the describe output; the forum says something about new users not being able to add more than 2 links …

I changed the MONGODB_BACKUP_USER password and it got a bit further; the Pods were restarted. It still runs into an error: "starting deadline exceeded".
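
For reference, the password change itself can be done by patching the users Secret; a minimal sketch, assuming the Secret follows the default <cluster-name>-secrets naming (the actual name is whatever spec.secrets.users points to in the cluster CR, and the password value here is just a placeholder):

kubectl --kubeconfig nksm-he-nesc-dbaas-prod -n dbaas-mongodb-wft-mongo patch secret wft-mongo-secrets \
  --type merge -p '{"stringData":{"MONGODB_BACKUP_PASSWORD":"NewPasswordWithoutSpecials1"}}'

The Operator then propagates the new credentials and restarts the affected Pods.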

katajistok@dbaasdev001:~/.kube$ kubectl --kubeconfig nksm-he-nesc-dbaas-prod get psmdb-backup -n dbaas-mongodb-wft-mongo
NAME                       CLUSTER     STORAGE      DESTINATION                                         TYPE      STATUS   COMPLETED   AGE
initial-on-demand-backup   wft-mongo   s3-storage   s3://dbaas-mongodb-wft-mongo/2024-10-01T19:50:04Z   logical   error                4m21s

katajistok@dbaasdev001:~/.kube$ kubectl --kubeconfig nksm-he-nesc-dbaas-prod describe psmdb-backup initial-on-demand-backup -n dbaas-mongodb-wft-mongo
Name:         initial-on-demand-backup
Namespace:    dbaas-mongodb-wft-mongo
Labels:       <none>
Annotations:  <none>
API Version:  psmdb.percona.com/v1
Kind:         PerconaServerMongoDBBackup
Metadata:
  Creation Timestamp:  2024-10-01T19:49:18Z
  Generation:          1
  Resource Version:    2688258487
  UID:                 2b792b2f-a50f-4946-a713-00ce66e4f9c5
Spec:
  Cluster Name:  wft-mongo
  Storage Name:  s3-storage
Status:
  Destination:      s3://dbaas-mongodb-wft-mongo/2024-10-01T19:50:04Z
  Error:            starting deadline exceeded
  Last Transition:  2024-10-01T19:50:05Z
  Pbm Name:         2024-10-01T19:50:04Z
  Replset Names:
    cfg
    rs0
  s3:
    Bucket:                  dbaas-mongodb-wft-mongo
    Credentials Secret:      my-cluster-name-backup-s3
    Endpoint URL:            he-pi-s3-shn-102.nesc.nokia.net
    Region:                  us-east-1
    Server Side Encryption:
  Start:         2024-10-01T19:50:05Z
  State:         error
  Storage Name:  s3-storage
  Type:          logical
Events:         <none>

katajistok@dbaasdev001:~/.kube$ kubectl --kubeconfig nksm-he-nesc-dbaas-prod logs pod/wft-mongo-rs0-0 -c backup-agent -n dbaas-mongodb-wft-mongo
2024/10/01 19:44:49 [entrypoint] starting pbm-agent
2024-10-01T19:44:50.000+0000 I pbm-agent:
Version: 2.4.1
Platform: linux/amd64
GitCommit: 67a182e77ff70be5781ad9b68e42dbb59f1a3de6
GitBranch: release-2.4.1
BuildTime: 2024-03-21_11:09_UTC
GoVersion: go1.19
2024-10-01T19:44:50.000+0000 I starting PITR routine
2024-10-01T19:44:50.000+0000 I node: rs0/wft-mongo-rs0-0.wft-mongo-rs0.dbaas-mongodb-wft-mongo.svc.cluster.local:27017
2024-10-01T19:44:50.000+0000 I listening for the commands
2024-10-01T19:44:52.000+0000 D [pitr] start_catchup
2024-10-01T19:44:52.000+0000 E [pitr] init: catchup: get last backup: no backup found. full backup is required to start PITR
2024-10-01T19:44:55.000+0000 W [agentCheckup] get current storage status: query mongo: mongo: no documents in result
2024-10-01T19:45:22.000+0000 D [pitr] start_catchup
2024-10-01T19:45:22.000+0000 E [pitr] init: catchup: get last backup: no backup found. full backup is required to start PITR
2024-10-01T19:45:52.000+0000 D [pitr] start_catchup
2024-10-01T19:45:52.000+0000 E [pitr] init: catchup: get last backup: no backup found. full backup is required to start PITR
2024-10-01T19:46:22.000+0000 D [pitr] start_catchup
2024-10-01T19:46:22.000+0000 E [pitr] init: catchup: get last backup: no backup found. full backup is required to start PITR
2024-10-01T19:46:52.000+0000 D [pitr] start_catchup
2024-10-01T19:46:52.000+0000 E [pitr] init: catchup: get last backup: no backup found. full backup is required to start PITR
2024-10-01T19:47:22.000+0000 D [pitr] start_catchup
2024-10-01T19:47:22.000+0000 E [pitr] init: catchup: get last backup: no backup found. full backup is required to start PITR
2024-10-01T19:47:52.000+0000 D [pitr] start_catchup
2024-10-01T19:47:52.000+0000 E [pitr] init: catchup: get last backup: no backup found. full backup is required to start PITR
2024-10-01T19:48:22.000+0000 D [pitr] start_catchup
2024-10-01T19:48:22.000+0000 E [pitr] init: catchup: get last backup: no backup found. full backup is required to start PITR
2024-10-01T19:48:52.000+0000 D [pitr] start_catchup
2024-10-01T19:48:52.000+0000 E [pitr] init: catchup: get last backup: no backup found. full backup is required to start PITR
2024-10-01T19:49:23.000+0000 D [pitr] start_catchup
2024-10-01T19:49:23.000+0000 E [pitr] init: catchup: get last backup: no backup found. full backup is required to start PITR
2024-10-01T19:49:42.000+0000 I got command backup [name: 2024-10-01T19:49:41Z, compression: gzip (level: default)] <ts: 1727812181>
2024-10-01T19:49:42.000+0000 I got epoch {1727812170 4}
2024-10-01T19:49:42.000+0000 D [backup/2024-10-01T19:49:41Z] skip after nomination, probably started by another node
2024-10-01T19:49:53.000+0000 I [pitr] oplog slicer is paused for lock [Snapshot backup, opid: 66fc525573c9cdfef8b7281a]
2024-10-01T19:50:05.000+0000 I got command backup [name: 2024-10-01T19:50:04Z, compression: gzip (level: default)] <ts: 1727812204>
2024-10-01T19:50:05.000+0000 I got epoch {1727812193 1}
2024-10-01T19:50:08.000+0000 I [pitr] oplog slicer is paused for lock [Snapshot backup, opid: 66fc525573c9cdfef8b7281a]
2024-10-01T19:50:10.000+0000 D [backup/2024-10-01T19:50:04Z] get lock: another operation is running: Snapshot backup '66fc525573c9cdfef8b7281a'
2024-10-01T19:50:10.000+0000 D [backup/2024-10-01T19:50:04Z] skip: lock not acquired
2024-10-01T19:50:23.000+0000 I [pitr] oplog slicer is paused for lock [Snapshot backup, opid: 66fc525573c9cdfef8b7281a]
2024-10-01T19:50:38.000+0000 D [pitr] start_catchup
2024-10-01T19:50:38.000+0000 I [pitr] streaming started from 2024-10-01 19:50:15 +0000 UTC / 1727812215
2024-10-01T19:50:38.000+0000 D [pitr] start_ok

I deleted the backup, paused/unpaused the MongoDB cluster, and started the backup again. Now it worked.
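
For anyone reproducing this, that sequence can be done roughly as follows (a sketch; pausing and resuming go through the spec.pause field of the psmdb custom resource):

kubectl --kubeconfig nksm-he-nesc-dbaas-prod -n dbaas-mongodb-wft-mongo delete psmdb-backup initial-on-demand-backup
kubectl --kubeconfig nksm-he-nesc-dbaas-prod -n dbaas-mongodb-wft-mongo patch psmdb wft-mongo --type merge -p '{"spec":{"pause":true}}'
# wait for the cluster Pods to terminate, then resume:
kubectl --kubeconfig nksm-he-nesc-dbaas-prod -n dbaas-mongodb-wft-mongo patch psmdb wft-mongo --type merge -p '{"spec":{"pause":false}}'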

katajistok@dbaasdev001:~/.kube$ kubectl --kubeconfig nksm-he-nesc-dbaas-prod get psmdb-backup -n dbaas-mongodb-wft-mongo
No resources found in dbaas-mongodb-wft-mongo namespace.

katajistok@dbaasdev001:~/.kube$ kubectl --kubeconfig nksm-he-nesc-dbaas-prod apply -f bak.yaml -n dbaas-mongodb-wft-mongo
perconaservermongodbbackup.psmdb.percona.com/initial-on-demand-backup created
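
The contents of bak.yaml were not posted; reconstructed from the describe output above, a minimal manifest would look roughly like this:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  name: initial-on-demand-backup
spec:
  clusterName: wft-mongo
  storageName: s3-storage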

katajistok@dbaasdev001:~/.kube$ kubectl --kubeconfig nksm-he-nesc-dbaas-prod get psmdb-backup -n dbaas-mongodb-wft-mongo
NAME                       CLUSTER     STORAGE      DESTINATION                                         TYPE      STATUS   COMPLETED   AGE
initial-on-demand-backup   wft-mongo   s3-storage   s3://dbaas-mongodb-wft-mongo/2024-10-01T20:03:44Z   logical   ready    42s         108s

I guess the Operator doesn't like this password:
echo -n 'OSM4SXQ4QUxuRU9KMlRsVw==' | base64 --decode
9#8It8ALnEOJ2TlW

Maybe the password should not start with a number?

Or is it because of the hash sign (#) in the password?

-Kimmo

Hello,

Just made one test: I removed the # from the password and the backup worked. Is there a list of "safe" characters for System Users passwords somewhere?

-Kimmo

I deleted the backup, paused/unpaused the MongoDB cluster, and started the backup again. Now it worked.

Glad to hear that!

Indeed, it works without "#". I tested the same and was able to reproduce your problem.

Backup User password:

$ kubectl get secret/internal-my-cluster-name-users -n psmdb --template='{{.data.MONGODB_BACKUP_PASSWORD}}' | base64 -d
asf#shdfA9

PBM agent complained as follows:

2024/10/02 17:56:38 [entrypoint] pbm-agent exited with code 1
2024/10/02 17:56:38 [entrypoint] restart in 5 sec
2024/10/02 17:56:43 [entrypoint] starting pbm-agent
2024/10/02 17:56:43 Exit: connect to PBM: create mongo connection: connect: parse "mongodb://backup:asf": invalid port ":asf" after host
2024/10/02 17:56:43 [entrypoint] pbm-agent exited with code 1
2024/10/02 17:56:43 [entrypoint] restart in 5 sec
2024/10/02 17:56:48 [entrypoint] starting pbm-agent
2024/10/02 17:56:48 Exit: connect to PBM: create mongo connection: connect: parse "mongodb://backup:asf": invalid port ":asf" after host
2024/10/02 17:56:48 [entrypoint] pbm-agent exited with code 1

I can connect to the DB by percent-encoding the special character, as follows (the percent-encoding of # is %23):

mongosh "mongodb://backup:asf%23shdfA9@localhost:27017/?authSource=admin"

I raised K8SPSMDB-1172 in the Percona JIRA for this. Thanks for sharing the issue with us. You can subscribe to the JIRA ticket to follow the updates.

Regards,
Vinodh Guruji
