Description:
Many backups end in an error state.
Steps to Reproduce:
Configure backups to S3 storage.
Version:
all
Logs:
…
Destination: s3://accel-webapp-dev-elog-backup/2024-02-21T00:00:21Z
Error: some of pbm-agents were lost during the backup
Last Transition: 2024-02-21T00:00:52Z
Pbm Name: 2024-02-21T00:00:21Z
Pbm Pod: elog-plus-cluster-rs0-0
Replset Names: rs0
…
Additional Information:
I have a 3-node replica set, and each backup agent shows these logs:
2024-02-21T12:00:21.000+0000 D [backup/2024-02-21T12:00:21Z] init backup meta
2024-02-21T12:00:21.000+0000 D [backup/2024-02-21T12:00:21Z] nomination list for rs0: [[elog-plus-cluster-rs0-1.elog-plus-cluster-rs0.elog-plus.svc.cluster.local:27017 elog-plus-cluster-rs0-2.elog-plus-c
2024-02-21T12:00:21.000+0000 D [backup/2024-02-21T12:00:21Z] nomination rs0, set candidates [elog-plus-cluster-rs0-1.elog-plus-cluster-rs0.elog-plus.svc.cluster.local:27017 elog-plus-cluster-rs0-2.elog-p
2024-02-21T12:00:22.000+0000 D [backup/2024-02-21T12:00:21Z] skip after nomination, probably started by another node
2024-02-21T12:00:26.000+0000 D [pitr] set pitr span to 20m0s
2024-02-21T12:00:26.000+0000 D [backup/2024-02-21T12:00:21Z] bcp nomination: rs0 won by elog-plus-cluster-rs0-1.elog-plus-cluster-rs0.elog-plus.svc.cluster.local:27017
…
2024-02-21T12:00:28.399+0000 Mux close namespace elogs.mongockChangeLog
2024-02-21T12:00:28.399+0000 done dumping admin.pbmRRoles (2 documents)
2024-02-21T12:00:28.399+0000 Mux close namespace admin.pbmRRoles
2024-02-21T12:00:28.399+0000 done dumping admin.pbmPITRChunks (358 documents)
2024-02-21T12:00:28.399+0000 Mux close namespace admin.pbmPITRChunks
2024/02/21 12:00:28 [entrypoint] pbm-agent exited with code -1
2024/02/21 12:00:28 [entrypoint] restart in 5 sec
2024/02/21 12:00:33 [entrypoint] starting pbm-agent
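To check whether an agent restart lines up with a failed backup, the entrypoint exit lines can be pulled out of an agent log with a small script. This is only a sketch: the function name `find_agent_exits` and the regular expression are my own assumptions based on the log format shown above, and they are not part of PBM or the operator.

```python
import re
from datetime import datetime

# Matches entrypoint lines such as:
#   2024/02/21 12:00:28 [entrypoint] pbm-agent exited with code -1
# The optional \W around the name tolerates quoting characters, and \s+
# tolerates a line break between "pbm-agent" and "exited".
EXIT_RE = re.compile(
    r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[entrypoint\]\s+"
    r"\W?pbm-agent\W?\s+exited with code (-?\d+)"
)


def find_agent_exits(log_text):
    """Return a list of (timestamp, exit_code) for each pbm-agent exit found."""
    exits = []
    for stamp, code in EXIT_RE.findall(log_text):
        when = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
        exits.append((when, int(code)))
    return exits


if __name__ == "__main__":
    # Sample taken from the agent log quoted in this report.
    sample = (
        "2024/02/21 12:00:28 [entrypoint] pbm-agent exited with code -1\n"
        "2024/02/21 12:00:28 [entrypoint] restart in 5 sec\n"
        "2024/02/21 12:00:33 [entrypoint] starting pbm-agent\n"
    )
    for when, code in find_agent_exits(sample):
        print(when.isoformat(), code)
```

Comparing the returned timestamps with the Pbm Name and Last Transition values of a failed backup shows whether the agent exited while that backup was running.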
This is my MongoDB deployment for the operator:
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: elog-plus-cluster
  finalizers:
    - delete-psmdb-pods-in-order
spec:
  crVersion: 1.15.0
  image: percona/percona-server-mongodb:6.0.12-9
  imagePullPolicy: Always
  allowUnsafeConfigurations: false
  updateStrategy: SmartUpdate
  upgradeOptions:
    versionServiceEndpoint: https://check.percona.com
    apply: Disabled
    schedule: "0 2 * * *"
    setFCV: false
  secrets:
    users: mongodb-secret
    encryptionKey: elog-plus-cluster-encryption-key
    # vault: my-cluster-name-vault
  pmm:
    enabled: false
    image: percona/pmm-client:2.39.0
    serverHost: monitoring-service
    # mongodParams: --environment=ENVIRONMENT
    # mongosParams: --environment=ENVIRONMENT
  replsets:
    - name: rs0
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9216'
        prometheus.io/path: '/metrics'
      sidecars:
        - image: percona/mongodb_exporter:2.37.0
          name: mongodb-exporter
          args: ["--compatible-mode", "--discovering-mode", "--collector.diagnosticdata", "--collector.replicasetstatus","--collector.dbstats", "--collector.topmetrics", "--collector.indexstats", "--mongodb.uri=$(MONGODB_URI)", "--web.listen-address=$(POD_IP):9216"]
          env:
            - name: EXPORTER_USER
              valueFrom:
                secretKeyRef:
                  name: mongodb-secret
                  key: MONGODB_CLUSTER_MONITOR_USER
            - name: EXPORTER_PASS
              valueFrom:
                secretKeyRef:
                  name: mongodb-secret
                  key: MONGODB_CLUSTER_MONITOR_PASSWORD
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: MONGODB_URI
              value: "mongodb://$(EXPORTER_USER):$(EXPORTER_PASS)@$(POD_IP)/?replicaSet=rs0&authMechanism=SCRAM-SHA-256"
      podDisruptionBudget:
        maxUnavailable: 1
        # minAvailable: 0
      expose:
        enabled: false
        exposeType: ClusterIP
      resources:
        limits:
          cpu: "2"
          memory: "2G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 100Gi
      nonvoting:
        enabled: false
        size: 3
        affinity:
          antiAffinityTopologyKey: "kubernetes.io/hostname"
        podDisruptionBudget:
          maxUnavailable: 1
          # minAvailable: 0
        resources:
          limits:
            cpu: "300m"
            memory: "0.5G"
          requests:
            cpu: "300m"
            memory: "0.5G"
        volumeSpec:
          persistentVolumeClaim:
            resources:
              requests:
                storage: 10Gi
      arbiter:
        enabled: false
        size: 1
        affinity:
          antiAffinityTopologyKey: "kubernetes.io/hostname"
  sharding:
    enabled: false
    configsvrReplSet:
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      podDisruptionBudget:
        maxUnavailable: 1
      expose:
        enabled: false
        exposeType: ClusterIP
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 3Gi
    mongos:
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      expose:
        exposeType: ClusterIP
  backup:
    enabled: true
    image: percona/percona-backup-mongodb:2.3.1
    serviceAccountName: percona-server-mongodb-operator
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    storages:
      s3-tid:
        type: s3
        s3:
          bucket: accel-webapp-dev-elog-backup
          credentialsSecret: s3-backup-secret
          region: us-west-1
          prefix: ""
          uploadPartSize: 10485760
          maxUploadParts: 10000
          storageClass: STANDARD
          endpointUrl: https://s3dfrgw.slac.stanford.edu
    pitr:
      enabled: true
      oplogSpanMin: 20
      compressionType: gzip
      compressionLevel: 6
    tasks:
      - name: daily-s3-tid
        enabled: true
        schedule: "0 */12 * * *"
        keep: 3
        storageName: s3-tid
        compressionType: gzip
        compressionLevel: 6
      # - name: weekly-s3-us-west
      #   enabled: false
      #   schedule: "0 0 * * 0"
      #   keep: 5
      #   storageName: s3-us-west
      #   compressionType: gzip
      #   compressionLevel: 6
      # - name: weekly-s3-us-west-physical
      #   enabled: false
      #   schedule: "0 5 * * 0"
      #   keep: 5
      #   type: physical
      #   storageName: s3-us-west
      #   compressionType: gzip
      #   compressionLevel: 6
This is the output for one of the backups that ends in error:
Status:
  Destination: s3://accel-webapp-dev-elog-backup/2024-02-21T12:00:21Z
  Error: some of pbm-agents were lost during the backup
  Last Transition: 2024-02-21T12:01:01Z
  Pbm Name: 2024-02-21T12:00:21Z
  Pbm Pod: elog-plus-cluster-rs0-2
  Replset Names:
    rs0
  s3:
    Bucket: accel-webapp-dev-elog-backup
    Credentials Secret: s3-backup-secret
    Endpoint URL: https://s3dfrgw.slac.stanford.edu
    Max Upload Parts: 10000
    Region: us-west-1
    Server Side Encryption:
    Storage Class: STANDARD
    Upload Part Size: 10485760