Help understanding how resource limits are set on clusters

I tried setting up a medium cluster and noticed these limits in the pod resources:
    resources:
      limits:
        cpu: 3200m
        memory: 7Gi
      requests:
        cpu: 3040m
        memory: 7140383129600m

Also, when I set up a custom 3-node cluster with 7 CPU and 24 GB RAM, I see the same limits as above. I ran a quick sysbench and saw no performance difference between the medium preset and my custom specs. I am running this on 3 nodes dedicated to the DB, each with 8 CPU and 32 GB RAM.

Is there something I am missing here? Even resizing the cluster to small and then to large does not change anything in performance, nor in these limits when inspecting the YAML.

Or should I update the PerconaXtraDBCluster CR values directly, since this is not working in the UI?
I am using the latest Everest v1.0.1 on RKE2 1.30.2 with EBS volumes.

I also keep getting the message [Could not get cluster resources] when customizing the cluster.
Could this be related?

I tried patching the limits in the PerconaXtraDBCluster CR, but they still don’t change. It would be helpful if someone could point me in the right direction to increase the limits.

I had the same issue. CPU requests are always 3, even if you set smaller values with the custom option. Also, there is no way to change the requests/limits values manually.
It would be nice if a “more details” switch were added to the UI so users could see and edit the CR YAML before deployment.

Hey folks. How do I reproduce it?

Here is what I did:

  1. Deployed 1.0.1 on my k8s cluster
  2. Created a PXC cluster with CPU 1.5 and RAM 1.5

I see the following resources in kubectl output:

    resources:
      limits:
        cpu: 1500m
        memory: 1500M
      requests:
        cpu: 1500m
        memory: 1500M

So are we saying that requests and limits are set correctly, but there are performance issues?
Or is there a problem where, in some cases, requests and limits are not set correctly?

cc @beta-auction @Slavisa_Milojkovic

I tried again now and it works with the small preset. When this issue happened, the initial DB had been created with higher resource requests than were available. After I deleted that DB from the UI (since it could not be provisioned), each subsequent attempt to create a smaller DB instance was also given 3 CPU requests, as if the operator had kept the initial requests somewhere.

So to reproduce the issue, I think you should first try to create a DB larger than the available resources in the cluster, then delete it from the UI, and then try to create a smaller DB.

No idea how this works, but we cannot launch a cluster like this; we already had an OOMKilled because the limits are set too low (7G). I am not sure whether this is hardcoded or not.

We are also seeing these warnings in the logs, which concern us:

    2024-08-14T13:14:20.872Z	INFO	KubeAPIWarningLogger	spec.template.spec.containers[0].resources.requests[memory]: fractional byte value "101994987520m" is invalid, must be an integer
    2024-08-14T13:14:20.872Z	INFO	KubeAPIWarningLogger	spec.template.spec.containers[0].resources.limits[memory]: fractional byte value "107374182400m" is invalid, must be an integer
    2024-08-14T13:14:20.054Z	INFO	KubeAPIWarningLogger	spec.template.spec.containers[1].resources.requests[memory]: fractional byte value "7140383129600m" is invalid, must be an integer
    2024-08-14T13:14:20.054Z	INFO	KubeAPIWarningLogger	spec.template.spec.containers[0].resources.limits[memory]: fractional byte value "214643507200m" is invalid, must be an integer
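
As far as I can tell, these milli-byte values are what Kubernetes prints whenever a memory quantity is not a whole number of bytes, which happens when a request or limit is derived by scaling another value. For the PXC container, for example, the 7140383129600m request is exactly 95% of the 7Gi limit. A minimal Go sketch of that arithmetic, assuming the request really is derived as a fraction of the limit (this is a guess, not the actual Everest code):

    package main

    import (
        "fmt"

        "k8s.io/apimachinery/pkg/api/resource"
    )

    func main() {
        // Assumption: the request is derived as 95% of the limit. 95% of 7Gi is
        // 7140383129.6 bytes, which is not a whole number of bytes, so the
        // resulting Quantity is serialized with the milli ("m") suffix.
        limit := resource.MustParse("7Gi")
        request := resource.NewMilliQuantity(limit.MilliValue()*95/100, resource.DecimalSI)

        fmt.Println(limit.String())   // prints: 7Gi
        fmt.Println(request.String()) // prints: 7140383129600m (the value from the warning)
    }

That would explain the odd-looking numbers, but not why the limit ends up at 7Gi instead of what we asked for.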

For us it just ignores our resource requests and sets 7Gi of RAM as the container limit, which causes OOMKilled because our DB needs at least 14G to work correctly.

case everestv1alpha1.EngineSizeMedium:
	pxc.Spec.PXC.PodSpec.LivenessProbes.TimeoutSeconds = 451
	pxc.Spec.PXC.PodSpec.ReadinessProbes.TimeoutSeconds = 451
	pxc.Spec.PXC.PodSpec.Resources = pxcResourceRequirementsMedium
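	// ^ note (not in the original source): pxcResourceRequirementsMedium is a
	// fixed preset, so when this branch is taken the CPU/memory values from the
	// DatabaseCluster spec appear to be overwritten with the preset.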

This is the code path being hit, but this was the payload we sent:

{
  "apiVersion": "everest.percona.com/v1alpha1",
  "kind": "DatabaseCluster",
  "metadata": {
    "name": "backend",
    "namespace": "everest"
  },
  "spec": {
    "backup": {
      "enabled": true,
      "schedules": [
        {
          "enabled": true,
          "name": "backend",
          "backupStorageName": "wasabi",
          "schedule": "0 22 * * *",
          "retentionCopies": 3
        }
      ]
    },
    "engine": {
      "type": "pxc",
      "version": "8.0.36-28.1",
      "replicas": 3,
      "resources": {
        "cpu": "4",
        "memory": "20G"
      },
      "storage": {
        "class": "local-nvme",
        "size": "100G"
      },
      "config": "[mysqld]\nkey_buffer_size=16M\nmax_allowed_packet=128M\nmax_connections=250\ninnodb_buffer_pool_size=16G"
    },
    "monitoring": {
      "monitoringConfigName": "pmm"
    },
    "proxy": {
      "replicas": 3,
      "expose": {
        "type": "internal"
      }
    }
  }
}

So I don’t know how, but the medium size preset is being applied instead of our custom values; and with innodb_buffer_pool_size=16G in the config but only a 7Gi container limit, the OOMKilled is no surprise.

I think custom isn’t properly supported at all from what I can gather.

Lower limits worked for me too, but anything above 4 CPU (such as 6) or more than 7 GB of RAM doesn’t work; the limits are stuck as mentioned above. I also tried the latest Everest version (1.1.0).

I tried this again, this time spinning up a new k3s cluster to see if I could replicate the issue that happened on RKE2.

Installed the latest Everest.
Installed a single-node MySQL with 1 CPU, 2 GB RAM, and a 25 GB disk:

apiVersion: everest.percona.com/v1alpha1
kind: DatabaseCluster
metadata:
  creationTimestamp: '2024-08-17T06:36:52Z'
  finalizers:
    - everest.percona.com/upstream-cluster-cleanup
    - foregroundDeletion
  generation: 4
  labels:
    clusterName: mysql-u3b
....
    replicas: 1
    resources:
      cpu: '1'
      memory: 2G
    storage:
      class: local-path
      size: 25G
    type: pxc
    userSecretsName: everest-secrets-mysql-u3b
    version: 8.0.36-28.1

When checking the PerconaXtraDBCluster CR:

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
...
pxc:
    # skipping the config here since it is the default set by the operator
    image: percona/percona-xtradb-cluster:8.0.36-28.1
    lifecycle: {}
    livenessProbes:
      timeoutSeconds: 450
    podDisruptionBudget:
      maxUnavailable: 1
    readinessProbes:
      timeoutSeconds: 450
    resources:
      limits:
        cpu: 600m
        memory: 1825361100800m
      requests:
        cpu: 570m
        memory: 1728724336640m
    serviceType: ClusterIP
    sidecarResources: {}
    size: 1
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 25G
        storageClassName: local-path

Now uninstall this MySQL instance.
Install a single-node MySQL with 6 CPU, 16 GB RAM, and a 50 GB disk:

apiVersion: everest.percona.com/v1alpha1
kind: DatabaseCluster
  labels:
    clusterName: mysql-37y
...
engine:
    config: ....
    replicas: 1
    resources:
      cpu: '6'
      memory: 16G
    storage:
      class: local-path
      size: 50G
    type: pxc
    userSecretsName: everest-secrets-mysql-37y
    version: 8.0.36-28.1

And in the PerconaXtraDBCluster CR we have:

    resources:
      limits:
        cpu: 3200m
        memory: 7Gi
      requests:
        cpu: 3040m
        memory: 7140383129600m

I even checked the node allocations (kubectl describe node xxx) just to be sure:

  Resource           Requests              Limits
  --------           --------              ------
  cpu                4117m (25%)           5080m (31%)
  memory             8803424665600m (13%)  9434Mi (14%) ## only ~7GB requested by the DB; the rest (~2GB) is other workloads in the cluster
  ephemeral-storage  0 (0%)                0 (0%)
  hugepages-1Gi      0 (0%)                0 (0%)
  hugepages-2Mi      0 (0%)                0 (0%)

Interestingly, when I request 12 CPU and 52 GB RAM with an 80 GB disk, the resources are as below:

    resources:
      limits:
        cpu: 3200m
        memory: 28Gi ## <-- seems limited to 28 GB, not the 52 GB requested
      requests:
        cpu: 3040m
        memory: 28561532518400m

Checked on the node too:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests               Limits
  --------           --------               ------
  cpu                4403m (27%)            5381m (33%)
  memory             32458208706560m (49%)  33190Mi (52%)
  ephemeral-storage  0 (0%)                 0 (0%)
  hugepages-1Gi      0 (0%)                 0 (0%)
  hugepages-2Mi      0 (0%)                 0 (0%)

This clearly looks like a bug in the logic that assigns resources: memory appears to be clamped to a value from a fixed set of ranges, and CPU is limited to 3200m whenever more than 4 CPUs are requested.
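
From the outside it behaves as if the requested values are mapped onto a handful of fixed presets instead of being applied as-is, which would be consistent with the EngineSizeMedium code path quoted earlier. Here is a purely hypothetical Go sketch of that behaviour, built only from the limits observed above; the preset names and thresholds are my guesses, not the actual everest-operator code:

    package main

    import (
        "fmt"

        corev1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/api/resource"
    )

    // Hypothetical presets matching the limits seen in the generated
    // PerconaXtraDBCluster CRs earlier in this thread.
    var (
        smallPreset = corev1.ResourceList{
            corev1.ResourceCPU:    resource.MustParse("600m"),
            corev1.ResourceMemory: resource.MustParse("1825361100800m"), // ~1.7Gi, seen for the 1 CPU / 2G request
        }
        mediumPreset = corev1.ResourceList{
            corev1.ResourceCPU:    resource.MustParse("3200m"),
            corev1.ResourceMemory: resource.MustParse("7Gi"), // seen for the 6 CPU / 16G request
        }
        largePreset = corev1.ResourceList{
            corev1.ResourceCPU:    resource.MustParse("3200m"),
            corev1.ResourceMemory: resource.MustParse("28Gi"), // seen for the 12 CPU / 52G request
        }
    )

    // presetFor guesses which preset gets applied as the limits; the thresholds
    // are made up purely to reproduce the mappings reported in this thread.
    func presetFor(requestedMem resource.Quantity) corev1.ResourceList {
        switch {
        case requestedMem.Cmp(resource.MustParse("8G")) < 0:
            return smallPreset
        case requestedMem.Cmp(resource.MustParse("32G")) < 0:
            return mediumPreset
        default:
            return largePreset
        }
    }

    func main() {
        for _, req := range []string{"2G", "16G", "24G", "52G"} {
            p := presetFor(resource.MustParse(req))
            fmt.Printf("requested %-3s -> limits cpu=%s mem=%s\n", req, p.Cpu().String(), p.Memory().String())
        }
    }

Whatever the exact thresholds are, the custom CPU/memory values never make it into the PerconaXtraDBCluster CR.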

BTW, I tested this on a standard k3s cluster (IPv6 enabled) with Cilium, on a single AWS EC2 m6a.4xlarge instance.

cc @Sergey_Pronin @silent

Hello folks,
Thank you for letting us know about this issue.
I can confirm that this is indeed a bug in the Everest operator. I have created a ticket to track this internally. We expect to release a fix with the next version of Everest (v1.2.0).

Hi @Diogo_Recharte, is this still scheduled to be fixed in the next release?

I saw comments from Manish Chawla in the ticket saying he tested it and it worked on a version number that I don’t recognise (maybe a test build).

But I can confirm this is still an issue for us.

Yes, this fix will be part of the upcoming v1.2.0 release.
