Hi,
We’ve been working with a client who has a 3-node PXC 8.0.32 cluster on RHEL 8.10. The cluster had been running fine for several months and was restarted last week after some patching on one of the nodes.
The bootstrap node came up fine, but both joiner nodes were failing during SST, with the following as the core error:
2025-02-19T08:42:49.925402-05:00 0 [Note] [MY-000000] [WSREP-SST] /usr/bin/wsrep_sst_xtrabackup-v2: line 1248: /usr/bin/ls: Operation not permitted
Line 1248 of wsrep_sst_xtrabackup-v2 is doing this:
sockets=$(ls -l /proc/$pid/fd | grep socket | cut -d'[' -f2 | cut -d ']' -f1 | tr '\n' '|')
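For anyone who wants to try that pipeline in isolation, here is a minimal sketch with plain ASCII quotes. The function name list_socket_inodes is made up for illustration; only the grep/cut/tr pipeline itself comes from the script. It reads "ls -l" output from stdin so it can be exercised without touching /proc.

```shell
# Hypothetical helper (name is an assumption): the same text processing as
# line 1248 of the SST script, applied to "ls -l" output read from stdin.
list_socket_inodes() {
    # Keep only fd entries that point at sockets, e.g. "3 -> socket:[12345]",
    # pull out the inode number between the brackets, and join with "|".
    grep socket | cut -d'[' -f2 | cut -d']' -f1 | tr '\n' '|'
}

# Against a live process it would be used like this ($pid is whatever
# process is being inspected):
#   ls -l "/proc/$pid/fd" | list_socket_inodes
```

Note that the text processing is not what fails in our case: the error is raised by the ls on /proc/$pid/fd itself, before grep ever sees any output.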
The client rolled back the updates on the patched server, but the issue persists, so we suspect that some security policy or hardening change has been applied across all servers in recent weeks or months.
We’ve subsequently narrowed it down to the “capabilities” in the mysqld systemd unit file, /etc/systemd/system/mysqld.service:
CapabilityBoundingSet=CAP_IPC_LOCK CAP_DAC_OVERRIDE CAP_AUDIT_WRITE
After some trial and error (with a test service), we found that adding two extra capabilities, CAP_SYS_PTRACE and CAP_DAC_READ_SEARCH, was sufficient to prevent the error and allow the joiner nodes to SST successfully:
CapabilityBoundingSet=CAP_IPC_LOCK CAP_DAC_OVERRIDE CAP_AUDIT_WRITE CAP_SYS_PTRACE CAP_DAC_READ_SEARCH
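To confirm which bounding set a running process actually ended up with, the CapBnd hex mask in /proc/<pid>/status can be decoded. The following is a rough sketch rather than a definitive tool: the function names has_cap and report_caps are hypothetical, and the bit numbers (2 for CAP_DAC_READ_SEARCH, 19 for CAP_SYS_PTRACE) are taken from linux/capability.h.

```shell
# has_cap HEXMASK BIT: succeeds if the given bit is set in the mask.
# HEXMASK is the value from the CapBnd line, without a 0x prefix.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# report_caps HEXMASK: print whether the two extra capabilities are present.
report_caps() {
    for pair in CAP_DAC_READ_SEARCH:2 CAP_SYS_PTRACE:19; do
        name=${pair%%:*}   # text before the colon
        bit=${pair##*:}    # text after the colon
        if has_cap "$1" "$bit"; then
            echo "$name present"
        else
            echo "$name missing"
        fi
    done
}

# Against a live process (the pidof lookup is an assumption about the setup):
#   report_caps "$(awk '/^CapBnd:/ {print $2}' "/proc/$(pidof mysqld)/status")"
```

Where libcap is installed, capsh --decode=<mask> prints the full list of capability names for the same mask.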
It’s good that we have a fix, but we don’t know why this is happening in this client’s environment - we haven’t encountered it internally or on any of our other client systems.
We’ve checked and ruled out pretty much everything we know of: SELinux, fapolicyd, AppArmor, /proc mount flags and a whole bunch of weird and wonderful systemd and kernel settings.
Has anybody encountered something similar? What else can we check in terms of audit logging to trace why the ls on /proc is being denied?
Any wisdom much appreciated.
Thanks,
Neil