Percona HAProxy fails on machine reboot (failure in resolving DNS names)

SuperHammer · January 28, 2025, 5:00pm

Pardon me if this is not the correct section, but I couldn’t find an HAProxy-specific topic to post this on.

Percona offers an HAProxy package, which can be installed with:
yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
percona-release setup ppg16
dnf install percona-haproxy.x86_64

When using it along keepalived, one may also want to enable binding to any IP address:
sysctl -w net.ipv4.ip_nonlocal_bind=1
setsebool -P haproxy_connect_any=1

Here is a small /etc/haproxy/haproxy.cfg that can help us reproduce the problem:

global
        chroot      /var/lib/haproxy
        log         /dev/log local0
        maxconn 4096
        daemon
        user        haproxy
        group       haproxy
        stats socket /var/lib/haproxy/stats level admin
        pidfile     /var/run/haproxy.pid

defaults
        log         global
        option      dontlognull
        timeout http-request    60s
        timeout queue           1m
        timeout connect         60s
        timeout client          5m
        timeout server          5m
        timeout http-keep-alive 30s
        timeout check           60s
        mode                    http
        balance                 source

listen my-listener-here
  bind 192.168.248.239:6443
  mode tcp
  server master1 my-backend:6443 check inter 1s

If we run a systemctl status haproxy.service we can see the service is running.

Now let’s restart the machine:

# reboot

Now, upon the machine’s reboot, we can see the status is now failed:

# systemctl status haproxy.service
● haproxy.service - HAProxy Load Balancer
   Loaded: loaded (/usr/lib/systemd/system/haproxy.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/haproxy.service.d
           └─override.conf
   Active: failed (Result: exit-code) since Tue 2025-01-28 15:07:26 CET; 15s ago
  Process: 1195 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q (code=exited, status=1/FAILURE)

gen 28 15:07:26 myhostname systemd[1]: Starting HAProxy Load Balancer...
gen 28 15:07:26 myhostname systemd[1]: haproxy.service: Control process exited, code=exited status=1
gen 28 15:07:26 myhostname systemd[1]: haproxy.service: Failed with result 'exit-code'.
gen 28 15:07:26 myhostname systemd[1]: Failed to start HAProxy Load Balancer.

The ExecStartPre failed with status=1/exit code 1.

Let us override the service to troubleshoot the issue with strace:

export SYSTEMD_EDITOR="vim"
systemctl edit haproxy.service

We will add the following lines:

[Service]
ExecStartPre=
ExecStartPre=/bin/strace /usr/sbin/haproxy -f $CONFIG -c -q

Now let’s once again reboot the machine:

# reboot

The service is once again failed. Let’s take a look at the journalctl logs:

gen 28 15:10:54 myhostname strace[1199]: connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.248.239")}, 16) = -1 ENETUNREACH (Network is unreachable)
gen 28 15:10:54 myhostname strace[1199]: close(4)                                = 0
gen 28 15:10:54 myhostname strace[1199]: socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 4
gen 28 15:10:54 myhostname strace[1199]: setsockopt(4, SOL_IP, IP_RECVERR, [1], 4) = 0
gen 28 15:10:54 myhostname strace[1199]: connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.248.239")}, 16) = -1 ENETUNREACH (Network is unreachable)
gen 28 15:10:54 myhostname strace[1199]: close(4)                                = 0
gen 28 15:10:54 myhostname strace[1199]: socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 4
gen 28 15:10:54 myhostname strace[1199]: setsockopt(4, SOL_IP, IP_RECVERR, [1], 4) = 0
gen 28 15:10:54 myhostname strace[1199]: connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.248.239")}, 16) = -1 ENETUNREACH (Network is unreachable)
gen 28 15:10:54 myhostname strace[1199]: close(4)                                = 0
gen 28 15:10:54 myhostname strace[1199]: rt_sigprocmask(SIG_BLOCK, [HUP USR1 USR2 PIPE ALRM CHLD TSTP URG VTALRM PROF WINCH IO], [], 8) = 0
gen 28 15:10:54 myhostname strace[1199]: uname({sysname="Linux", nodename="myhostname", ...}) = 0
gen 28 15:10:54 myhostname strace[1199]: rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
gen 28 15:10:54 myhostname strace[1199]: exit_group(1)                           = ?
gen 28 15:10:54 myhostname strace[1199]: +++ exited with 1 +++
gen 28 15:10:54 myhostname systemd[1]: haproxy.service: Control process exited, code=exited status=1
gen 28 15:10:54 myhostname systemd[1]: haproxy.service: Failed with result 'exit-code'.
gen 28 15:10:54 myhostname systemd[1]: Failed to start HAProxy Load Balancer.

Clearly, it cannot resolve it’s own name, and this seems to be the problem.

We can verify this by having haproxy NOT fail upon missed DNS resolution in haproxy.cfg, like so:

global
        chroot      /var/lib/haproxy
        log         /dev/log local0
...
defaults
        # never fail on address resolution
        default-server init-addr last,libc,none
        log         global
...

Now, let us reboot once again.

Now, we can see that the haproxy service is running:

# systemctl status haproxy.service
...
   Active: active (running) since Tue 2025-01-28 15:15:35 CET; 18s ago
...

So, why does this happen?

If we take a look at another service’s unit file, for example keepalived’s service unit:

# cat /usr/lib/systemd/system/keepalived.service
[Unit]
Description=LVS and VRRP High Availability Monitor
After=network-online.target syslog.target
Wants=network-online.target

It is waiting for network-online.target before starting.
Instead, our HAProxy service is waiting for network.target:

# cat /usr/lib/systemd/system/haproxy.service
[Unit]
Description=HAProxy Load Balancer
After=network.target

To confirm this is the problem, we can override the service configuration like so:

# export SYSTEMD_EDITOR="vim"
# systemctl edit haproxy.service

[Unit]
After=network-online.target
Wants=network-online.target

With this configuration, upon reboot HAProxy will happily start.

I hope this article helps someone with the same problem save some time.

@perconafriends do you think such a modification would be possible/a good default configuration for the haproxy package?

It seems the haproxy package is reused across projects so I don’t have the complete picture in mind right now.

More information can be found on the various network targets here NetworkTarget

Best regards!

matthewb · January 28, 2025, 6:28pm

Hey there @SuperHammer,
Thank you for this research/investigation. I have opened a JIRA with our team for verification as a bug fix.
https://perconadev.atlassian.net/browse/DISTMYSQL-503

Topic		Replies	Views
Problems with HAProxy as load balancer and 2 nodes on leastcon / roundrobin Percona XtraDB Cluster 5.x	0	2950	November 9, 2013
Help with proxy_protocol_networks and HAPROXY Percona XtraDB Cluster 5.x	2	1156	November 20, 2018
Very Strange problem with haproxy Percona XtraDB Cluster 5.x	0	4049	November 25, 2014
Cannot update haproxy service parameters Percona Operator for MySQL	1	508	December 7, 2023
haproxy not working. one node dead causes full failure Percona XtraDB Cluster 5.x	0	1008	October 18, 2013

Percona HAProxy fails on machine reboot (failure in resolving DNS names)

Related topics