Percona XtraDB Cluster 8.x Setup with ProxySQL and Keepalived for HA

Hi,

We are setting up a Percona XtraDB Cluster with three nodes, each running on an AWS EC2 instance. To handle query routing, we plan to use ProxySQL, also deployed on an EC2 instance.

However, in this setup, ProxySQL becomes a single point of failure. To ensure high availability, we considered deploying Keepalived on two ProxySQL nodes. But since Keepalived relies on VRRP, which uses multicast by default, and multicast isn’t supported in AWS VPCs, we’re unsure whether this is a viable solution.

Could you advise whether Keepalived can be used in this setup? If not, what would be the best approach to achieve high availability for ProxySQL in an AWS environment?

Looking forward to your recommendations.

Hey @Faisal_Hassan,
just for my understanding, the two ProxySQL nodes are in the same network (VPC)?
If yes, you can set up a Pacemaker cluster with a virtual IP and ProxySQL as resources, and add those two to a resource group. In that setup you configure your clients to use the cluster IP of the Pacemaker cluster for communication. If one node goes offline, the other node takes over the cluster IP and your service stays available.
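The setup described above might look roughly like this with `pcs` (a sketch only; it assumes Pacemaker/Corosync are installed and the cluster is already authenticated and started on both ProxySQL nodes, and the IP, netmask, and resource names are placeholders):

```shell
# Virtual IP the clients will connect to (placeholder address):
pcs resource create proxysql-vip ocf:heartbeat:IPaddr2 \
    ip=10.0.1.100 cidr_netmask=24 op monitor interval=10s

# ProxySQL managed as a systemd resource:
pcs resource create proxysql systemd:proxysql op monitor interval=30s

# Group them so the VIP and ProxySQL always fail over together,
# to the same node:
pcs resource group add proxysql-ha proxysql-vip proxysql
```

One caveat in AWS: a plain `IPaddr2` virtual IP is not routable in a VPC by itself; it typically has to be paired with an agent or alert/notify script that reassigns a secondary private IP or Elastic IP to the active instance through the EC2 API.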

Hey @Jostrus,
Firstly, thanks for your response! And yes, the two ProxySQL nodes are in the same VPC.
I’ll try setting up Pacemaker as you suggested. Do you have any reference or guide for the setup?
What about Keepalived? Will it work in this setup?

Looking forward to your insights!

Hey @Faisal_Hassan,
sure thing, here is a guide:

I prefer HAProxy, but that’s just my personal opinion. Keepalived will work as well with that setup. I also use Keepalived and even ipvsadm for different use cases, but I just like HAProxy a bit more.
In case you want to use Keepalived as a load balancer instead of HAProxy, you have to use the systemd Pacemaker resource: pcs resource create keepalived systemd:keepalived
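Since VRRP multicast is blocked in a VPC, Keepalived would also have to run in unicast mode there. A minimal sketch of one node’s configuration (all IPs, the interface name, and the router ID are placeholders):

```conf
# /etc/keepalived/keepalived.conf -- first node; mirror on the peer
vrrp_instance PROXYSQL_VIP {
    state MASTER              # BACKUP on the second node
    interface eth0
    virtual_router_id 51
    priority 100              # lower value on the second node
    unicast_src_ip 10.0.1.11  # this node's private IP
    unicast_peer {
        10.0.1.12             # the other ProxySQL node
    }
    virtual_ipaddress {
        10.0.1.100/24
    }
}
```

Even with unicast VRRP, the VPC won’t deliver traffic to the virtual IP until it is actually assigned to the active instance, so in AWS the failover usually also needs a notify script that moves a secondary private IP or Elastic IP via the EC2 API.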

I don’t know how much you know about pacemaker/corosync cluster so if you need further clarification or in case you are interested in deeper insights about pacemaker just hit me up.

Hi @Jostrus,

First of all, thanks for the references and for sharing your opinion! I appreciate the insights.

Actually, I’m new to this setup and would love to learn more about Pacemaker/Corosync.
Could you share deeper insights on how they work in a high-availability setup? Any best practices or gotchas to watch out for?

Looking forward for your response!

Hey @Faisal_Hassan,
best practice for a Pacemaker/Corosync setup is to have quorum (at least three nodes). It can also be configured as a two-node setup, but that isn’t recommended. In that case you have to check STONITH (“shoot the other node in the head” fencing) and set the no-quorum policy to ignore for a two-node cluster.
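For a two-node cluster, those properties might be set like this (a sketch; quorum loss handling can also be done via Corosync’s `two_node` option, and the fence device shown is a placeholder you would replace with your real fencing configuration):

```shell
# Don't stop resources when quorum is lost -- only sane on two nodes,
# and only together with working fencing:
pcs property set no-quorum-policy=ignore

# Keep STONITH enabled and configure a fence agent; the agent name and
# its parameters here are illustrative placeholders:
pcs property set stonith-enabled=true
# pcs stonith create fence-node2 fence_aws ...
```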

As always, you can also set up a witness node in case you don’t want a third full node that can take over the whole traffic.

Lastly, the one other thing I can think of is a dedicated NIC for heartbeat messages, but that depends on the amount of traffic already being sent and received over your main NIC. It’s not required, and you can still have a pretty solid setup without it, but it’s definitely best practice.
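A dedicated heartbeat NIC would show up in Corosync as a second ring. A sketch of a Corosync 3.x nodelist with a redundant ring (all names and addresses are placeholders):

```conf
# /etc/corosync/corosync.conf (fragment)
nodelist {
    node {
        name: proxysql1
        nodeid: 1
        ring0_addr: 10.0.1.11      # main NIC
        ring1_addr: 192.168.50.11  # dedicated heartbeat NIC
    }
    node {
        name: proxysql2
        nodeid: 2
        ring0_addr: 10.0.1.12
        ring1_addr: 192.168.50.12
    }
}
```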

Edit: One advantage of Pacemaker/Corosync is the already-built OCF scripts: resource-agents/heartbeat/galera.in at main · ClusterLabs/resource-agents · GitHub
For example, you can set up Galera replication with Pacemaker. There are scripts for various services, so it’s worth having a look at them.
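As a sketch of what using that Galera agent could look like with `pcs` (the cluster address and node names are placeholders, and the exact promotable syntax depends on your pcs version):

```shell
# Galera is managed as a promotable (master) resource so Pacemaker can
# control which nodes join and bootstrap the cluster:
pcs resource create galera ocf:heartbeat:galera \
    wsrep_cluster_address="gcomm://node1,node2,node3" \
    promotable promoted-max=3
```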

Hey @Jostrus,

These two are extremely complex pieces of software and would not be recommended as best practice.

Since you are in AWS, the simplest solutions available to you are an AWS Elastic Load Balancer, an Elastic IP, or an EC2 Auto Scaling group.