Feature Request: CR field to set sizeLimit on the PXC operator-managed tmp emptyDir volume

The PXC Operator mounts a hardcoded emptyDir: {} volume at /tmp (MySQL’s tmpdir) on every PXC pod. This volume has no sizeLimit, and there is no Custom Resource field to set one. The only lever available today is pxc.resources.limits.ephemeral-storage, which caps the combined total of the container’s rootfs, log files, and all node-disk-backed emptyDir volumes, but offers no way to target tmp specifically.

We would like a CR field (e.g. pxc.tmpVolumeSpec.sizeLimit) to place a dedicated limit on just the tmp emptyDir, independent of the container’s overall ephemeral-storage budget.


Steps to Reproduce:

  1. Deploy a PXC cluster with operator version 1.17.

  2. Inspect the resulting StatefulSet:

    kubectl get statefulset mysql-pxc -n <namespace> -o yaml | grep -A5 'name: tmp'
    

    You will see:

    - name: tmp
      emptyDir: {}
    
  3. Confirm there is no sizeLimit set and no CR field to set one:

    grep -i 'tmp' deploy/cr.yaml
    
  4. Note that pxc.resources.limits.ephemeral-storage is the only available cap, but it is a blunt instrument: it limits the combined total of container rootfs + log files + tmp (and any other node-disk emptyDirs). There is no way to express “allow 200 GiB for the container layer but cap tmp at 10 GiB” through the CR.

  5. Attempt to patch sizeLimit onto the StatefulSet directly (X is a placeholder for the zero-based index of the tmp volume in the volumes list):

    kubectl patch statefulset mysql-pxc -n <namespace> --type=json \
      -p='[{"op":"replace","path":"/spec/template/spec/volumes/X/emptyDir","value":{"sizeLimit":"10Gi"}}]'
    

    The operator will overwrite this on the next reconcile cycle.
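
The X in the patch path above has to be looked up per cluster. A minimal sketch of one way to find it, assuming the volume is named tmp as shown in step 2; the helper name volume_index is hypothetical and not part of kubectl or the operator:

```shell
# volume_index <name>: reads volume names from stdin, one per line,
# and prints the zero-based position of <name>; exits non-zero if absent.
volume_index() {
  awk -v name="$1" '
    $0 == name { print NR - 1; found = 1; exit }
    END { if (!found) exit 1 }
  '
}

# Usage against a live cluster (same StatefulSet and namespace as in the steps above):
# kubectl get statefulset mysql-pxc -n <namespace> \
#   -o jsonpath='{range .spec.template.spec.volumes[*]}{.name}{"\n"}{end}' | volume_index tmp
```

The printed number can then be substituted for X in the JSON patch path. This does not prevent the operator from reverting the patch.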


Version:

  • Operator: 1.17 or greater
  • Percona XtraDB Cluster: 8.0.x
  • Kubernetes: AKS 1.34 (Azure)

Logs:

The issue is structural and does not produce error logs. The symptom we observe is DiskPressure on database pool nodes, driven by high ephemeral write throughput to the node OS disk:

# Observed via Prometheus node_disk_written_bytes_total on the rootfs device (nvme0n1):
# Peak write rate:  ~5.76 GiB/hr on pxc-0
# Average:          ~2.29 GiB/hr sustained
# Daily write churn: ~700 GiB/day average, up to ~1,360 GiB/day

# MySQL temp file activity (mysql_global_status_created_tmp_files):
# pxc-0: ~14.9 files/min
# pxc-1: ~0 files/min
# pxc-2: ~0 files/min

# /tmp occupancy at rest (low, but spikes under query load):
# pxc-0: 5.75 MB
# pxc-1: 0.02 MB
# pxc-2: 0.02 MB

Temp files are short-lived (written and deleted per-query), so point-in-time occupancy metrics understate the risk — only write throughput reflects it.
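
Since occupancy snapshots miss short-lived files, temp-file activity can also be estimated without Prometheus by sampling MySQL's Created_tmp_files status counter twice and converting the difference to a per-minute rate. A minimal sketch; the helper name tmp_files_per_min is hypothetical, and the pod name, container name (pxc), and client credentials in the commented usage are assumptions to adapt:

```shell
# tmp_files_per_min <first_sample> <second_sample> <seconds_between_samples>
# Prints the growth of a counter as a per-minute rate with one decimal place.
tmp_files_per_min() {
  awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN {
    if (s <= 0) exit 1                     # guard against division by zero
    printf "%.1f\n", (b - a) * 60 / s
  }'
}

# Taking one sample (assumed pod/container names; add credentials as needed):
# kubectl exec <pod> -n <namespace> -c pxc -- \
#   mysql -N -e "SHOW GLOBAL STATUS LIKE 'Created_tmp_files'" | awk '{print $2}'
```

For example, two hypothetical samples of 1000 and 1149 taken 600 seconds apart give `tmp_files_per_min 1000 1149 600`, which prints 14.9.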


Expected Result:

A pxc.tmpVolumeSpec (or equivalent) field in the CR allowing at minimum:

spec:
  pxc:
    tmpVolumeSpec:
      emptyDir:
        sizeLimit: 10Gi          # targeted cap on /tmp only
        # medium: Memory         # optional: move to tmpfs

This would be independent of pxc.resources.limits.ephemeral-storage, letting operators express separate budgets for MySQL temp files vs. the container layer.

Alternatively, first-class operator support for redirecting tmpdir would also address the problem. For example, the operator could recognize tmpdir = /var/lib/mysql/tmp in pxc.configuration and pre-create the directory, which would move temp I/O off the node OS disk entirely.


Actual Result:

The tmp emptyDir is unconditionally created as emptyDir: {}. Any sizeLimit patched onto the StatefulSet is overwritten on the next operator reconcile. The only workaround is to set tmpdir to a path on the datadir PVC via pxc.configuration:

spec:
  pxc:
    configuration: |
      [mysqld]
      tmpdir = /var/lib/mysql/tmp

This works (MySQL creates the directory on startup, since the mysql user owns /var/lib/mysql), but it is undocumented, not validated by the operator, and moves all temp I/O to the datadir PVC — which may or may not be desirable depending on PVC size and IOPS allocation.


Additional Information:

  • Why ephemeral-storage alone is insufficient: pxc.resources.limits.ephemeral-storage caps the total of rootfs + logs + all node-disk emptyDirs. A cluster with e.g. 50 GiB of memory and heavy sort workloads needs a large MySQL working set but only a modest tmp cap; these two budgets can’t be expressed independently today.
  • Why emptyDir sizeLimit matters separately: A per-volume sizeLimit causes kubelet to evict the pod if that specific volume exceeds the cap, giving a targeted safety valve without affecting the container’s overall ephemeral-storage budget.
  • emptyDir medium: Memory (tmpfs) would be an alternative that keeps temp files in memory, but it is also not configurable through the CR. A memory-backed emptyDir is not counted against ephemeral-storage limits; its usage counts against the container’s memory limit instead.
  • PVC redirect (tmpdir = /var/lib/mysql/tmp) removes tmp from ephemeral-storage accounting entirely (PVC mounts are never counted), which may be the right default for database-pool nodes with large PVCs but small OS disks.
  • On AKS nodes with a 29 GiB OS disk (e.g. Standard_B4ms general pool), the combination of container layers, log files, and uncapped tmp usage can contribute to DiskPressure evictions, cascading to otherwise-healthy co-located pods.
  • Related gap: there is no pxc.extraVolumes / pxc.extraVolumeMounts primitive in the CR for adding operator-managed volumes, so there is no general workaround for this class of problem.
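
The difference between the pod-wide ephemeral-storage cap and a per-volume sizeLimit, as described in the bullets above, can be illustrated with a toy model. This is a simplified sketch of the two checks, not the kubelet's actual logic; the function name eviction_reason and the integer-GiB inputs are hypothetical:

```shell
# eviction_reason <tmp_used> <tmp_size_limit> <other_used> <pod_limit>
# All values are whole GiB. A tmp_size_limit of 0 means "no sizeLimit set",
# which is what the operator renders today (emptyDir: {}).
eviction_reason() {
  tmp_used=$1 tmp_limit=$2 other_used=$3 pod_limit=$4
  if [ "$tmp_limit" -gt 0 ] && [ "$tmp_used" -gt "$tmp_limit" ]; then
    # Targeted check: only the tmp volume's own usage matters.
    echo "evict: tmp emptyDir exceeds sizeLimit"
  elif [ $((tmp_used + other_used)) -gt "$pod_limit" ]; then
    # Blunt check: rootfs + logs + emptyDirs are summed together.
    echo "evict: pod exceeds ephemeral-storage limit"
  else
    echo "ok"
  fi
}
```

With no sizeLimit, `eviction_reason 12 0 50 210` prints ok, because 12 GiB of tmp usage is invisible inside a 210 GiB combined budget. With a 10 GiB cap, `eviction_reason 12 10 50 210` reports the tmp volume specifically.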

@Julio_Guevara I created https://perconadev.atlassian.net/browse/K8SPXC-1869 to address this.