Issues with operator backup to S3

Hi,

I'm struggling with a number of related backup issues and wondered if anyone had any advice that might help.

I'm carrying out some testing on a 3-node cluster formed from Hetzner cloud nodes, in a namespace called mongodb. Percona Server for MongoDB has been installed using Helm. Changes to the setup are applied as follows:

helm upgrade mongodb-clu1 percona/psmdb-db -n mongodb --reuse-values -f storage.yaml

where different YAML files adjust the various parts of the S3 backup configuration.
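For context, the S3 part of storage.yaml is roughly along these lines (values illustrative rather than a verbatim copy; the secret name is a placeholder), with the backups intended to land under archive/server20 inside the bucket:

backup:
  enabled: true
  storages:
    s3-eu-west:
      type: s3
      s3:
        bucket: our.fqdn.com
        region: eu-west-2
        credentialsSecret: mongodb-clu1-backup-s3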

Connectivity to S3 has been successfully tested -

kubectl run -i --rm aws-cli --image=perconalab/awscli --restart=Never -- bash -c 'AWS_ACCESS_KEY_ID=aws_access_key \
AWS_SECRET_ACCESS_KEY=aws_secret_access_key \
AWS_DEFAULT_REGION=eu-west-2 \
/usr/bin/aws s3 ls s3://our.fqdn.com/archive/server20/'
If you don't see a command prompt, try pressing enter.
2021-07-14 12:09:35 0
2022-04-25 02:01:37 1184854844 backup_server20_22-04-25.tgz
2022-05-30 02:01:33 1192485332 backup_server20_22-05-30.tgz
2022-06-27 02:01:34 1192498841 backup_server20_22-06-27.tgz
2022-07-25 02:01:36 1192498442 backup_server20_22-07-25.tgz
2022-08-29 02:01:32 1192491757 backup_server20_22-08-29.tgz
2022-09-26 02:01:36 1192491772 backup_server20_22-09-26.tgz

but attempts to schedule the backup fail with the following error:

2023-06-02T10:58:11.000+0000 E [agentCheckup] check storage connection: storage check failed with: get S3 object header: RequestError: send request failed
caused by: Head "https://s3.eu-west-2.amazonaws.com/our.fqdn.com/archive/server20/.pbm.init": net/http: invalid header field value for "Authorization"

As the backup hasn’t completed, the log is full of these messages.
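For reference, the schedule itself is defined as a backup task in the same Helm values, roughly like this (the cron expression shown here is illustrative):

backup:
  tasks:
    - name: daily-s3-eu-west
      enabled: true
      schedule: "30 9 * * *"
      storageName: s3-eu-west
      compressionType: gzip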

One final point: I can't be sure at present whether these messages come from the original cron job, which didn't reference the namespace and which I can't delete.

root@kube-1:~# kubectl get psmdb-backup -A
NAMESPACE   NAME      CLUSTER                 STORAGE      DESTINATION            TYPE      STATUS   COMPLETED   AGE
default     backup1   mongodb-clu1            s3-eu-west                                                        31d
mongodb     backup2   mongodb-clu1-psmdb-db   s3-eu-west   2023-05-26T14:20:53Z   logical   error               8m8s

Attempts to remove backup1 claim to have succeeded but always hang. Is there a way to remove it?
root@kube-1:~# kubectl get psmdb-backup -A
NAMESPACE   NAME      CLUSTER        STORAGE      DESTINATION   TYPE   STATUS   COMPLETED   AGE
default     backup1   mongodb-clu1   s3-eu-west                                             38d
root@kube-1:~# kubectl -n default delete psmdb-backup/backup1
perconaservermongodbbackup.psmdb.percona.com "backup1" deleted
^Croot@kube-1:~# kubectl get psmdb-backup -A
NAMESPACE   NAME                                         CLUSTER                 STORAGE      DESTINATION            TYPE      STATUS   COMPLETED   AGE
default     backup1                                      mongodb-clu1            s3-eu-west                                                        38d
mongodb     cron-mongodb-clu1-psm-20230602113000-4wp57   mongodb-clu1-psmdb-db   s3-eu-west   2023-06-02T11:30:21Z   logical   error               4m53s
root@kube-1:~# kubectl -n mongodb delete psmdb-backup/cron-mongodb-clu1-psm-20230602113000-4wp57
perconaservermongodbbackup.psmdb.percona.com "cron-mongodb-clu1-psm-20230602113000-4wp57" deleted
root@kube-1:~# kubectl get psmdb-backup -A
NAMESPACE   NAME      CLUSTER        STORAGE      DESTINATION   TYPE   STATUS   COMPLETED   AGE
default     backup1   mongodb-clu1   s3-eu-west                                             38d

There is more config info available if required.

Thanks,

Mike

Hello @mikes ,

thanks for reaching out.
Most of the time I see auth issues in the operator caused by a trailing end-of-line symbol.
When you encode the S3 keys into base64, do you use the -n flag as recommended in our docs?

$ echo -n YOURKEY | base64 -w0

or on macOS

$ echo -n YOURKEY | base64
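Alternatively, you can let kubectl do the encoding for you, which avoids the trailing newline problem entirely (replace the secret name with whatever your storage's credentialsSecret points to):

$ kubectl -n mongodb create secret generic mongodb-clu1-backup-s3 \
    --from-literal=AWS_ACCESS_KEY_ID=YOURKEY \
    --from-literal=AWS_SECRET_ACCESS_KEY=YOURSECRET \
    --dry-run=client -o yaml | kubectl apply -f -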

The deletion might also freeze because of the wrong key, as the Operator tries to delete the backup from the bucket as well.
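If you need to remove the stuck backup1 object regardless, clearing its finalizers should let it go without the Operator trying to clean up the bucket first:

$ kubectl -n default patch psmdb-backup backup1 --type=merge -p '{"metadata":{"finalizers":[]}}'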

Morning Sergey,

Well, I've tried the base64 encoding of the keys, and the error message has changed. It's now getting through to the next step, but still failing.

root@kube-1:~# kubectl -n mongodb get psmdb-backup/cron-mongodb-clu1-psm-20230605093000-4kcwj -o yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  creationTimestamp: "2023-06-05T09:30:00Z"
  finalizers:
  - delete-backup
  generateName: cron-mongodb-clu1-psm-20230605093000-
  generation: 1
  labels:
    ancestor: daily-s3-eu-west
    cluster: mongodb-clu1-psmdb-db
    type: cron
  name: cron-mongodb-clu1-psm-20230605093000-4kcwj
  namespace: mongodb
  resourceVersion: "12694995"
  uid: a16a2835-c63f-4d44-bc09-ea1089cb1b2a
spec:
  clusterName: mongodb-clu1-psmdb-db
  compressionType: gzip
  storageName: s3-eu-west
  type: logical
status:
  destination: "2023-06-05T09:30:21Z"
  error: "oplog: write data: upload to S3: SignatureDoesNotMatch: The request signature
    we calculated does not match the signature you provided. Check your key and signing
    method.\n\tstatus code: 403, request id: P93HRN7KKDAVHC67, host id: cjztxSXGLYw8c5QTsJAdgbAPrOyi8ETZzfBoc0dX6PfMggUSgacvChY+RkupBZUwulys0SkT2RQ=."
  lastTransition: "2023-06-05T09:30:31Z"
  pbmName: "2023-06-05T09:30:21Z"
  replsetNames:
  - rs0
  s3:

Just as a reminder - this command line approach works without having to base64 encode the keys.

kubectl run -i --rm aws-cli --image=perconalab/awscli

Is there any way of testing the backup function by being directly logged in to a pod?
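For instance, would exec'ing into the backup-agent sidecar and running the PBM CLI there be a sensible check? Something like this, assuming the default pod naming:

kubectl -n mongodb exec -it mongodb-clu1-psmdb-db-rs0-0 -c backup-agent -- pbm status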

Thanks

Mike

Also, it does seem to have managed to write the .pbm.init file

[root@h80-84-58-218 Jun]# s3cmd -c /root/.ms-s3cfg get s3://vrbackups.viewranger.com/archive/server22/.pbm.init
download: 's3://vrbackups.viewranger.com/archive/server22/.pbm.init' -> './.pbm.init' [1 of 1]
5 of 5 100% in 0s 276.49 B/s done
[root@h80-84-58-218 Jun]# more .pbm.init
2.0.4

Which does seem to suggest it has write access to the bucket.

More confused.

Mike

So, next time round, 30 minutes later, a full dump of the DB takes place, but all attempts to "PUT" these files into the S3 bucket fail with the error shown above. I have the logs for that bucket; here are a couple of entries:

50459b1a7c8542464d1c71746fe30d82ce4aa05da24e3039e4bcc3e523004e79 vrbackups.viewranger.com [05/Jun/2023:09:30:28 +0000] 65.109.193.4 - HK4A29YW3Y4AQ50W REST.PUT.OBJECT archive/server22/2023-06-05T09%253A30%253A21Z/rs0/admin.pbmLog.gz "PUT /vrbackups.viewranger.com/archive/server22/2023-06-05T09:30:21Z/rs0/admin.pbmLog.gz HTTP/1.1" 403 SignatureDoesNotMatch 3098 - 5 - "-" "aws-sdk-go/1.44.159 (go1.19; linux; amd64) S3Manager" - JysAYgmbyCEMRtHS3W19z1UMS1KafOCSqhR0ZrCDGkvzPajtz5QnHiIlwftIvxsL/ImUkecWAKM= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader s3.eu-west-2.amazonaws.com TLSv1.2 - -
::::::::::::::
./2023-06-05-10-31-17-E6B59F2B3378490E
::::::::::::::
50459b1a7c8542464d1c71746fe30d82ce4aa05da24e3039e4bcc3e523004e79 vrbackups.viewranger.com [05/Jun/2023:09:36:42 +0000] 65.109.193.4 arn:aws:iam::************:user/MikeShield 3JF8FR22GG4Q8GAZ REST.HEAD.OBJECT archive/server22/.pbm.init "HEAD /vrbackups.viewranger.com/archive/server22/.pbm.init HTTP/1.1" 200 - - 5 31 - "-" "aws-sdk-go/1.44.159 (go1.19; linux; amd64)" - tACpOIpvEzouC6z5NIXMdGrp57FtcYEWrBO9j1gz3d1T+8ggycZxOqs7Ue19wZ/nqg4LiUM9M2A= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader s3.eu-west-2.amazonaws.com TLSv1.2 - -

So essentially, we can use the auth and "GET", but not "PUT". My AWS account credentials have been able to "PUT" to this bucket for some time now (6-7 years), so it might be something to do with that, but I have to admit to being totally flummoxed at present.
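To try to rule out the credentials themselves, the next thing I'll try is repeating the earlier one-off pod test but with a PUT rather than a listing, something along these lines (the object name is just a scratch file):

kubectl run -i --rm aws-cli --image=perconalab/awscli --restart=Never -- bash -c 'AWS_ACCESS_KEY_ID=aws_access_key AWS_SECRET_ACCESS_KEY=aws_secret_access_key AWS_DEFAULT_REGION=eu-west-2 /usr/bin/aws s3 cp /etc/hostname s3://vrbackups.viewranger.com/archive/server22/put-test.txt'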

Mike

Looking closely - the HEAD command has picked up my AWS ID, which is why I've had to come back and edit the post to remove it, while the PUT command only has a "-" in the log at the same place. I'd say this looks to be an issue with the code, but I'll wait and see.

Thanks,

Mike

Hey @mikes ,

thanks for diving deep into it.
I’m asking our Percona Backup for MongoDB experts to look into it.

Thanks Sergey,

You might want to tell them that the issue is in the way PBM handles folders under the bucket - probably the handling of "/" in the name if you use a structure under the bucket, as we do. Ours is fairly simple: two top-level folders, called backup and archive, which have different retention policies applied, and then server names below each of those.

I've created a new bucket, removed the folder structure and, hey presto, it worked straight away. I'd still class it as an issue, as it's not unreasonable to expect to be able to use these features of S3.
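The other thing I plan to test, in case it's useful to anyone else, is keeping the folder structure but moving it out of the bucket field and into the storage's prefix option, which as far as I can tell from the operator's CR reference is the supported way to point at a sub-folder (values illustrative):

backup:
  storages:
    s3-eu-west:
      type: s3
      s3:
        bucket: vrbackups.viewranger.com
        region: eu-west-2
        credentialsSecret: mongodb-clu1-backup-s3
        prefix: archive/server22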

Hope that helps

Mike