PSMDB restore fails with zstd compression

Hi folks,

I have a problem with PSMDB restore in Kubernetes when using zstd compression.

Description:

I’ve created a simple 3-node replica set cluster with daily physical and logical backups using:

  • compressionType: zstd
  • compressionLevel: 6

Backups work perfectly fine, whether scheduled or on-demand. However, I am unable to restore my cluster from either physical or logical backups.

  • Physical backup: The cluster switches to the initializing state, and the restore then fails with “backup not found.”
  • Logical backup: The restore fails every time with an “unexpected EOF” error.

Steps to Reproduce:

  1. Create a simple replica set using the following CR:
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: replicaset-test-cluster
  namespace: percona-mongodb
  finalizers:
    - percona.com/delete-psmdb-pods-in-order
spec:
  pause: false
  crVersion: 1.19.1
  image: percona/percona-server-mongodb:7.0.15-9-multi
  imagePullPolicy: Always
  updateStrategy: SmartUpdate
  upgradeOptions:
    versionServiceEndpoint: https://check.percona.com
    apply: disabled
    schedule: "0 2 * * *"
    setFCV: false
  secrets:
    users: replicaset-test-cluster-secrets
    encryptionKey: replicaset-test-cluster-mongodb-encryption-key
  pmm:
    enabled: true
    image: percona/pmm-client:2.44.0
    serverHost: pmm-qa.slicetest.com
    mongodParams: --environment=QA --cluster=replicaset-test-cluster
  replsets:
  - name: rs0
    size: 3
    configuration: |
      operationProfiling:
        mode: all
        slowOpThresholdMs: 100
        rateLimit: 10
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: false
      type: ClusterIP
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 3Gi

    nonvoting:
      enabled: false
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 3Gi
    arbiter:
      enabled: false
      size: 1
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"

  sharding:
    enabled: false
    configsvrReplSet:
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      podDisruptionBudget:
        maxUnavailable: 1
      expose:
        enabled: false
        type: ClusterIP
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 3Gi

    mongos:
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      expose:
        type: ClusterIP

  users:
  - name: app-test-user
    db: admin
    passwordSecretRef:
      name: app-test-user-secret
      key: password
    roles:
      - name: readWrite
        db: test_db

  backup:
    enabled: true
    image: percona/percona-backup-mongodb:2.8.0-multi
    storages:
      s3-us-east:
        type: s3
        s3:
          bucket: mongodb-backup-qa
          credentialsSecret: replicaset-test-cluster-backup-s3
          region: us-east-1
          prefix: "replicaset-test-cluster"
    pitr:
      enabled: false
      oplogOnly: false
      compressionType: zstd
      compressionLevel: 6
    tasks:
      - name: daily-s3-us-east
        enabled: true
        schedule: "0 0 * * *"
        keep: 10
        storageName: s3-us-east
        compressionType: zstd
        compressionLevel: 6
        type: physical
      - name: daily-s3-us-east-logic
        enabled: true
        schedule: "0 1 * * *"
        keep: 10
        storageName: s3-us-east
        compressionType: zstd
        compressionLevel: 6
        type: logical
  2. Make some visible data changes (create a database and a collection, add some data), take a backup, and then drop some of the objects so there is an anchor to verify the restore against. An example on-demand backup manifest is shown after the restore config below.
  3. Take the created backup and start the restore:
    Backup info:
NAME                                         CLUSTER                   STORAGE      DESTINATION                                                              TYPE       STATUS   COMPLETED   AGE
cron-replicaset-test--20250408000000-8lmbh   replicaset-test-cluster   s3-us-east   s3://mongodb-backup-qa/replicaset-test-cluster/2025-04-08T00:00:21Z   physical   ready    39h         39h

Restore config:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: restore-replicaset-test-cluster
spec:
  clusterName: replicaset-test-cluster
  backupName: cron-replicaset-test--20250408000000-8lmbh
  storageName: s3-us-east
  backupSource:
    type: physical
    destination:  s3://mongodb-backup-qa/replicaset-test-cluster/2025-04-08T00:00:21Z
    s3:
      credentialsSecret: replicaset-test-cluster-backup-s3
      region: us-east-1
      bucket: mongodb-backup-qa
      endpointUrl: https://s3.us-east-1.amazonaws.com/
      prefix: "replicaset-test-cluster"
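
For reference, the on-demand backups mentioned in step 2 were created with a PerconaServerMongoDBBackup object roughly like the sketch below. The object name is just an illustrative placeholder; the cluster, storage, and compression settings mirror the CR above.

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  # example name, any unique name works
  name: on-demand-backup-zstd
  namespace: percona-mongodb
spec:
  clusterName: replicaset-test-cluster
  storageName: s3-us-east
  type: physical
  # same compression settings as the scheduled tasks
  compressionType: zstd
  compressionLevel: 6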

Version:

Operator: 1.19.1
MongoDB server: 7.0.15-9
Percona Backup for MongoDB: 2.8.0

Logs:

Operator logs:

  1. Physical backup: gist:13d1579be189ac54b4392e08206ff611 · GitHub
  2. Logical backup: Logical backup restore with zstd compression · GitHub

Expected Result:

A working cluster with actual data from the backup.

Actual Result:

  • Physical backup: The cluster stays in the initializing state and never stabilizes.
  • Logical backup: Data and users are partially lost (the users defined in the cluster YAML are recreated; others are not).

Additional Information:

I’ve switched the compression to the default gzip, and I was able to restore both physical and logical backups on the same cluster. This suggests that the problem is specific to zstd compression. A sketch of the changed backup task is shown below.
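
A minimal sketch of the changed spec.backup.tasks entry with gzip, assuming everything else in the CR stays as above (compressionLevel is omitted here so the default level is used):

    tasks:
      - name: daily-s3-us-east
        enabled: true
        schedule: "0 0 * * *"
        keep: 10
        storageName: s3-us-east
        # only the compression settings changed; with gzip both
        # physical and logical restores completed successfully
        compressionType: gzip
        type: physical
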

Hi, thanks for letting us know. Can you please open a bug at jira.percona.com with this info so the dev team can review it?