Hi,
We received an alert yesterday that the disk on one of our MongoDB pods was 90% full.
When we compare the df -h output for the mountpoint, it doesn’t match the du output:
df -h /data/db
Filesystem Size Used Avail Use% Mounted on
none 5.0G 4.5G 548M 90% /data/db
du -schx /data/db
685M /data/db
685M Total
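As I understand it, df counts allocated blocks on the filesystem while du walks the directory tree, so a file that has been unlinked but is still held open by some process shows up in df but not in du. A minimal sketch that reproduces this gap (the temp paths are purely illustrative):

```shell
# Create a file, hold it open on fd 3, then unlink it.
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/blob" bs=1M count=64 status=none
exec 3<"$tmp/blob"
rm "$tmp/blob"

du -sh "$tmp"            # du no longer sees the file
readlink /proc/$$/fd/3   # ".../blob (deleted)" - blocks still counted by df
exec 3<&-                # closing the fd is what finally frees the space
rm -r "$tmp"
```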
At first we couldn’t figure out what was going on, but we did notice that if you delete the MongoDB replset pod, then when it comes back up the df output drops to match the du output.
I tried killing the mongod process to see if it was holding onto open file handles or similar, but the discrepancy between df and du remained.
Then I ran, for example:
% kubectl -n example exec example-mongodb-rs0-2 -c mongod -- df -h /data/db
Filesystem Size Used Avail Use% Mounted on
none 5.0G 1.3G 3.7G 26% /data/db
% kubectl -n example exec example-mongodb-rs0-2 -c backup-agent -- kill 1
% kubectl -n example exec example-mongodb-rs0-2 -c mongod -- df -h /data/db
Filesystem Size Used Avail Use% Mounted on
none 5.0G 718M 4.3G 15% /data/db
and realised that it’s the backup-agent container that is holding onto the space.
Unfortunately, that space is definitely being consumed: I tried to dd a 4 GB file to check, and it hit ENOSPC very quickly.
I looked in /proc/1/fd and so on, but couldn’t see anything obviously concerning.
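For anyone wanting to reproduce the check, what I ran was roughly the following, scanning every visible fd for link targets marked (deleted). This is only a sketch, and inside a container it will only see PIDs in that container's namespace:

```shell
# List deleted-but-still-open files for all visible PIDs, with sizes.
# Run inside the suspect container, e.g. via:
#   kubectl -n example exec <pod> -c backup-agent -- sh -c '<this loop>'
for fd in /proc/[0-9]*/fd/*; do
  tgt=$(readlink "$fd" 2>/dev/null) || continue
  case "$tgt" in
    *" (deleted)")
      size=$(stat -Lc %s "$fd" 2>/dev/null)
      printf '%s -> %s (%s bytes)\n' "$fd" "$tgt" "${size:-?}"
      ;;
  esac
done
```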
Our configuration is:
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: "{{ mongodb_config.cluster.name }}"
  namespace: {{ app_namespace }}
  finalizers:
    - percona.com/delete-psmdb-pods-in-order
    - percona.com/delete-psmdb-pvc
spec:
  enableVolumeExpansion: true
  enableExternalVolumeAutoscaling: false
  crVersion: {{ mongodb_config.cluster.version }}
  image: percona/percona-server-mongodb:{{ mongodb_config.images.server }}
  imagePullPolicy: IfNotPresent
  updateStrategy: SmartUpdate
  upgradeOptions:
    versionServiceEndpoint: https://check.percona.com
    apply: disabled
    schedule: "0 2 * * *"
    setFCV: false
  secrets:
    users: "{{ mongodb_config.cluster.name }}-secrets"
  pmm:
    enabled: false
    image: percona/pmm-client:{{ mongodb_config.images.pmm }}
    serverHost: monitoring-service
  replsets:
    - name: rs0
      configuration: |
        security:
          enableEncryption: false
      size: 3
      affinity:
        antiAffinityTopologyKey: "topology.kubernetes.io/zone"
      sidecars:
        - name: metrics
          image: percona/mongodb_exporter:{{ mongodb_config.images.exporter }}
          env:
            - name: EXPORTER_USER
              valueFrom:
                secretKeyRef:
                  name: "{{ mongodb_config.cluster.name }}-secrets"
                  key: MONGODB_CLUSTER_MONITOR_USER
            - name: EXPORTER_PASS
              valueFrom:
                secretKeyRef:
                  name: "{{ mongodb_config.cluster.name }}-secrets"
                  key: MONGODB_CLUSTER_MONITOR_PASSWORD
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: MONGODB_URI
              value: "mongodb://$(EXPORTER_USER):$(EXPORTER_PASS)@$(POD_IP):27017"
          args: ["--discovering-mode", "--compatible-mode", "--collect-all", "--log.level=warn", "--mongodb.uri=$(MONGODB_URI)"]
      podDisruptionBudget:
        maxUnavailable: 1
      expose:
        enabled: false
      resources:
        limits:
          cpu: "{{ mongodb_config.resources.limits.cpu }}"
          memory: "{{ mongodb_config.resources.limits.memory }}"
        requests:
          cpu: "{{ mongodb_config.resources.requests.cpu }}"
          memory: "{{ mongodb_config.resources.requests.memory }}"
      volumeSpec:
        persistentVolumeClaim:
          storageClassName: "{{ mongodb_config.storage.className }}"
          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: {{ mongodb_config.storage.capacity }}
      hidden:
        size: 1
        enabled: false
      nonvoting:
        size: 1
        enabled: false
      arbiter:
        size: 1
        enabled: false
  backup:
    enabled: true
    resources:
      requests:
        memory: "{{ mongodb_config.backup.resources.requests.memory }}"
        cpu: "{{ mongodb_config.backup.resources.requests.cpu }}"
      limits:
        memory: "{{ mongodb_config.backup.resources.limits.memory }}"
        cpu: "{{ mongodb_config.backup.resources.limits.cpu }}"
    image: percona/percona-backup-mongodb:{{ mongodb_config.images.backup }}
    startingDeadlineSeconds: 300
    storages:
      radosgw:
        type: s3
        s3:
          bucket: "{{ mongodb_bucket_name }}"
          credentialsSecret: mongodb-backup
          endpointUrl: "http://objectstore.objectstore.svc.cluster.local"
          prefix: ""
          region: ceph-objectstore
    pitr:
      enabled: true
      compressionType: gzip
      compressionLevel: 6
    tasks:
      - name: daily-backup
        enabled: true
        schedule: "0 1 * * *"
        type: physical
        retention:
          count: 30
          type: count
          deleteFromStorage: true
        storageName: radosgw
        compressionType: gzip
        compressionLevel: 6
  logcollector:
    enabled: false
  unsafeFlags:
    tls: true
  tls:
    mode: disabled
We are using the following:
- Percona Operator for MongoDB 1.21.1
- MongoDB 8.0.12-4
- backup-agent 2.11.0
pbm status:
Cluster:
========
rs0:
- example-mongodb-rs0-0.example-mongodb-rs0.example.svc.cluster.local:27017 [P]: pbm-agent [v2.11.0] OK
- example-mongodb-rs0-1.example-mongodb-rs0.example.svc.cluster.local:27017 [S]: pbm-agent [v2.11.0] OK
- example-mongodb-rs0-2.example-mongodb-rs0.example.svc.cluster.local:27017 [S]: pbm-agent [v2.11.0] OK
PITR incremental backup:
========================
Status [ON]
Running members: rs0/example-mongodb-rs0-2.example-mongodb-rs0.example.svc.cluster.local:27017;
Currently running:
==================
(none)
Backups:
========
S3 ceph-objectstore http://objectstore.objectstore.svc.cluster.local:8080/mongodb-backup-8840c0f9-4186-4fc0-8517-97db01fcb950
Snapshots:
2026-01-20T01:00:00Z 48.00MB <physical> success [restore_to_time: 2026-01-20T01:00:02]
2026-01-19T01:00:00Z 52.51MB <physical> success [restore_to_time: 2026-01-19T01:00:02]
2026-01-18T01:00:00Z 54.08MB <physical> success [restore_to_time: 2026-01-18T01:00:01]
2026-01-17T01:00:01Z 53.47MB <physical> success [restore_to_time: 2026-01-17T01:00:03]
2026-01-16T01:00:00Z 51.36MB <physical> success [restore_to_time: 2026-01-16T01:00:01]
2026-01-15T01:51:41Z 49.16MB <physical> success [restore_to_time: 2026-01-15T01:51:43]
2026-01-15T01:51:05Z 48.76MB <physical> success [restore_to_time: 2026-01-15T01:51:08]
2026-01-15T01:50:29Z 48.52MB <physical> success [restore_to_time: 2026-01-15T01:50:31]
2026-01-15T01:49:59Z 48.30MB <physical> success [restore_to_time: 2026-01-15T01:50:00]
2026-01-15T01:49:25Z 48.02MB <physical> success [restore_to_time: 2026-01-15T01:49:27]
2026-01-15T01:48:55Z 47.80MB <physical> success [restore_to_time: 2026-01-15T01:48:57]
2026-01-15T01:48:19Z 48.34MB <physical> success [restore_to_time: 2026-01-15T01:48:20]
2026-01-15T01:47:43Z 48.11MB <physical> success [restore_to_time: 2026-01-15T01:47:45]
2026-01-15T01:47:12Z 51.38MB <physical> success [restore_to_time: 2026-01-15T01:47:15]
2026-01-15T01:46:42Z 47.81MB <physical> success [restore_to_time: 2026-01-15T01:46:44]
2026-01-15T01:46:11Z 47.53MB <physical> success [restore_to_time: 2026-01-15T01:46:12]
2026-01-15T01:44:33Z 57.57MB <physical> success [restore_to_time: 2026-01-15T01:44:34]
2026-01-15T01:43:47Z 57.33MB <physical> success [restore_to_time: 2026-01-15T01:43:48]
2026-01-15T01:43:11Z 57.10MB <physical> success [restore_to_time: 2026-01-15T01:43:12]
2026-01-01T01:00:00Z 0.00B <physical> failed [ERROR: some of pbm-agents were lost during the backup] [2026-01-15T01:43:11]
2025-12-31T01:00:00Z 56.64MB <physical> success [restore_to_time: 2025-12-31T01:00:02]
2025-12-30T01:00:00Z 55.63MB <physical> success [restore_to_time: 2025-12-30T01:00:01]
2025-12-29T01:00:00Z 55.00MB <physical> success [restore_to_time: 2025-12-29T01:00:02]
2025-12-28T01:00:00Z 51.40MB <physical> success [restore_to_time: 2025-12-28T01:00:02]
2025-12-27T01:00:00Z 50.88MB <physical> success [restore_to_time: 2025-12-27T01:00:02]
2025-12-26T01:00:00Z 47.24MB <physical> success [restore_to_time: 2025-12-26T01:00:02]
2025-12-25T01:00:00Z 46.98MB <physical> success [restore_to_time: 2025-12-25T01:00:01]
2025-12-24T01:00:00Z 43.04MB <physical> success [restore_to_time: 2025-12-24T01:00:02]
2025-12-23T01:00:00Z 42.65MB <physical> success [restore_to_time: 2025-12-23T01:00:02]
2025-12-22T01:00:00Z 38.63MB <physical> success [restore_to_time: 2025-12-22T01:00:02]
2025-12-21T01:00:00Z 37.97MB <physical> success [restore_to_time: 2025-12-21T01:00:02]
2025-12-20T01:00:00Z 34.35MB <physical> success [restore_to_time: 2025-12-20T01:00:02]
PITR chunks [1.48GB]:
2025-12-20T01:00:03 - 2026-01-20T02:29:56
2025-12-19T01:00:50 - 2025-12-20T01:00:02 (no base snapshot)
I couldn’t get pbm logs -x -s D -t 0 to run, but here is the output of kubectl logs from a pod that is currently showing the discrepancy:
% kubectl exec -n example example-mongodb-rs0-1 -c backup-agent -- df -h /data/db
Filesystem Size Used Avail Use% Mounted on
none 5.0G 1.3G 3.7G 26% /data/db
% kubectl exec -n example example-mongodb-rs0-1 -c backup-agent -- du -schx /data/db
648M /data/db
648M total
logs: GitHub gist 1d880a6bc36839e9a3ee205b3359fbee
Then, when I run kill 1 in the backup-agent container, the usage drops:
% kubectl exec -n example example-mongodb-rs0-1 -c backup-agent -- kill 1
% kubectl exec -n example example-mongodb-rs0-1 -c backup-agent -- df -h /data/db
Filesystem Size Used Avail Use% Mounted on
none 5.0G 716M 4.3G 15% /data/db