Unable to Start MongoDB After Resources Update, Possible Data Corruption

Description:

We’ve been using the Percona Operator for MongoDB for many months now with success: no issues, and the backup/restore process working well. Yesterday we tried to increase the resources of our MongoDB server, but once the configuration change was applied, MongoDB simply fails to start.

We’re currently trying to restore a backup, but this issue is really worrying if we’re to continue using the operator. I hope this was some error on our part.

We also tried booting the server up without sharding, with no luck, and of course tried letting it run for 15–20 minutes, also without luck.
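To spell out the "without sharding" attempt: that was just toggling the sharding flag in the same CR (sketch below, using the psmdb.percona.com/v1 layout shown in the full config further down):

spec:
  sharding:
    enabled: false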

Steps to Reproduce:

This has never happened before; this is the first time we’re seeing it. We’ve updated resources and scaled up the disk many times without any issues. I’ve attached our configuration YAML (the change was to increase CPU from 4 vCPUs to 6, and memory from 8 GB to 12 GB).

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: mongodb
  namespace: mongodb
spec:
  crVersion: 1.19.0
  image: percona/percona-server-mongodb:8.0.4-1-multi
  tls:
    mode: disabled
  unsafeFlags:
    tls: true
    replsetSize: true
    mongosSize: true
  upgradeOptions:
    apply: disabled
    schedule: "0 2 * * *"
  secrets:
    users: mongodb
  replsets:
  - name: rs0
    size: 1
    affinity:
      antiAffinityTopologyKey: 'none'
    # if using an even number of nodes, set arbiter to true
    arbiter:
      enabled: false
      size: 1
    configuration: |
      replication:
        # ~25 GB
        oplogSizeMB: 25800
      operationProfiling:
        slowOpThresholdMs: 10000
        mode: slowOp
        rateLimit: 100
    podDisruptionBudget:
      maxUnavailable: 1
    resources:
      limits:
        cpu: '6'
        memory: '12Gi'
      requests:
        cpu: '4'
        memory: '8Gi'
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 550Gi
        storageClassName: gp2-xfs
  sharding:
    enabled: true

    configsvrReplSet:
      size: 1
      resources:
        limits:
          cpu: '2'
          memory: 2048Mi
        requests:
          cpu: '1'
          memory: 1536Mi
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 25Gi
          storageClassName: gp2-xfs

    mongos:
      size: 1
  users:
  - name: <user>
    db: <db>
    passwordSecretRef:
      name: password
      key: password
    roles:
      - name: dbOwner
        db: <db>
      - name: read
        db: local
      - name: read
        db: config
      - name: clusterMonitor
        db: admin
      - name: readAnyDatabase
        db: admin
  backup:
    enabled: false
    image: percona/percona-backup-mongodb:2.8.0-multi
    pitr:
      enabled: true
      oplogOnly: true
    storages:
      s3-bkp:
        type: s3
        s3:
          bucket: <bucket-name>
          region: ap-southeast-1
          credentialsSecret: <name>
    resources:
      limits:
        cpu: 1
        memory: 2Gi
      requests:
        cpu: 500m
        memory: 1Gi
    tasks:
    - name: "backup-task"
      enabled: true
      # At 0 minutes past the hour, every 12 hours, starting at 10:00 AM
      # https://crontab.cronhub.io
      schedule: "0 10/12 * * *"
      keep: 8
      type: physical
      storageName: s3-bkp

Version:

1.19.0

Logs:

mongo-logs.txt (653.1 KB)

Hi @Adhiraj_Singh, I think you have a lot of data, and you need to increase the termination grace period via the terminationGracePeriodSeconds option to be safe the next time you restart your cluster. Please see the related Jira ticket.
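For reference, here is a minimal sketch of where that option goes in the CR (terminationGracePeriodSeconds is available for replsets, and also for configsvrReplSet and mongos under sharding; the 300-second value below is illustrative, not a recommendation):

replsets:
- name: rs0
  # seconds Kubernetes waits for a clean mongod shutdown before force-killing the pod;
  # larger datasets generally need longer to flush and checkpoint on shutdown
  terminationGracePeriodSeconds: 300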

I see, thank you. What value would you recommend for a 0.5 TB dataset? Also, what is the default value?