High Availability in PMM

Hi,

We are in the process of setting up the PMM Server in our Kubernetes cluster to monitor MySQL, PostgreSQL, and ProxySQL instances.

According to the PMM documentation, PMM HA was scheduled for release in Q3 2024, but it has not been released yet.

Could you provide an estimated timeline for when this feature will be included in the Helm charts for public release?

Any information would be greatly appreciated.

Thank you,
Arul

Hello @Arul_Deepan_Anbumani,
There is no official timeline for PMM HA at this time. Our team is busy working on the PMM v3 release. K8S should be handling the “HA” component of PMM anyway: if the PMM pod goes down, K8S will recreate it and reattach the data volume. There will be an outage while K8S does these tasks.

As a workaround, I would suggest following the ‘Manual HA setup’ steps for PMM within your K8S environment.

Hi @matthewb,

Thank you for the update. I understand that K8S can handle the basic recovery of PMM through pod recreation, but the outage during this process is something we want to minimize, especially for production environments with critical monitoring needs.

I will proceed with the manual HA setup within our Kubernetes environment for now and will reach out to the community if further assistance is needed.

Appreciate your response.

Cheers,
Arul

Respectfully, Kubernetes does not manage the “HA” element of StatefulSets well. By design, the pods of a StatefulSet are not expected to be deleted when their node is offline or unavailable.
The only way this happens is if you are using a control plane that removes unresponsive nodes and creates new ones, which I am not using. So there is no HA with the current setup, and it seems you’re delaying your more clustered (out of the box) version.
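
To make the StatefulSet point concrete, here is a minimal Python sketch. It is not part of PMM or its Helm chart, and the function names are made up for illustration. It assumes you have saved the output of `kubectl get nodes -o json` and `kubectl get pods -A -o json`, and it lists the StatefulSet pods that are still bound to a node whose Ready condition is not "True". Those are the pods that stay stuck until the node comes back, the node object is removed, or someone force-deletes them.

```python
def not_ready_nodes(nodes):
    """Return the names of nodes whose Ready condition is missing or not "True".

    `nodes` is the parsed JSON from `kubectl get nodes -o json`.
    """
    names = set()
    for node in nodes.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # A status of "False" or "Unknown" both mean the node is not usable.
        if ready is None or ready.get("status") != "True":
            names.add(node["metadata"]["name"])
    return names


def stranded_statefulset_pods(pods, nodes):
    """List (namespace, pod, node) for StatefulSet pods on not-ready nodes.

    `pods` is the parsed JSON from `kubectl get pods -A -o json`.
    """
    bad_nodes = not_ready_nodes(nodes)
    stranded = []
    for pod in pods.get("items", []):
        meta = pod["metadata"]
        owners = meta.get("ownerReferences", [])
        owned_by_sts = any(o.get("kind") == "StatefulSet" for o in owners)
        node_name = pod.get("spec", {}).get("nodeName")
        if owned_by_sts and node_name in bad_nodes:
            stranded.append((meta.get("namespace", "default"), meta["name"], node_name))
    return stranded
```

To try it, load the two saved files with `json.load` and pass the resulting dictionaries to `stranded_statefulset_pods(pods, nodes)`. An empty list means no StatefulSet pod is currently sitting on a not-ready node.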

We are not purposely delaying anything. Percona prioritizes features by customer demand when we look at our plans. Right now, native HA for PMM is not in high demand because most of our customers use some form of existing container HA to handle this. Because PMM keeps its data on a separate volume, it is very easy to recreate PMM without much loss in visibility if it goes down.