State transfer failed: -13 (Permission denied)

I have started the first node with the bootstrap option, and it is running well and serving queries. However, when I try to add the 2nd and 3rd nodes, they both fail with the same error. It’s odd as they were all running perfectly just a bit ago. I had to stop all the machines to do some maintenance and rebooted them. This is the mysql log on the 2nd machine trying to join:

2023-09-07T22:30:55.822334Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [141, 141]
2023-09-07T22:30:55.822343Z 0 [Note] [MY-000000] [Galera] Shifting OPEN -> PRIMARY (TO: 12374121)
2023-09-07T22:30:55.822439Z 1 [Note] [MY-000000] [Galera] ####### processing CC 12374121, local, ordered
2023-09-07T22:30:55.822471Z 1 [Note] [MY-000000] [Galera] Maybe drain monitors from -1 upto current CC event 12374121 upto:-1
2023-09-07T22:30:55.822481Z 1 [Note] [MY-000000] [Galera] Drain monitors from -1 up to -1
2023-09-07T22:30:55.822493Z 1 [Note] [MY-000000] [Galera] Process first view: 329f6151-d406-11ed-8dc7-77e2709c9139 my uuid: 359d6040-4dce-11ee-a6c0-4e911c923161
2023-09-07T22:30:55.822508Z 1 [Note] [MY-000000] [Galera] Server pxc-cluster-node-2 connected to cluster at position 329f6151-d406-11ed-8dc7-77e2709c9139:12374121 with ID 359d6040-4dce-11ee-a6c0-4e911c923161
2023-09-07T22:30:55.822521Z 1 [Note] [MY-000000] [WSREP] Server status change disconnected -> connected
2023-09-07T22:30:55.822566Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2023-09-07T22:30:55.822597Z 1 [Note] [MY-000000] [Galera] ####### My UUID: 359d6040-4dce-11ee-a6c0-4e911c923161
2023-09-07T22:30:55.822609Z 1 [Note] [MY-000000] [Galera] Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 10), state transfer needed: yes
2023-09-07T22:30:55.822781Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2023-09-07T22:30:55.822876Z 1 [Note] [MY-000000] [Galera] ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: -1
2023-09-07T22:30:55.822900Z 1 [Note] [MY-000000] [Galera] State transfer required: 
	Group state: 329f6151-d406-11ed-8dc7-77e2709c9139:12374121
	Local state: 00000000-0000-0000-0000-000000000000:-1
2023-09-07T22:30:55.822909Z 1 [Note] [MY-000000] [WSREP] Server status change connected -> joiner
2023-09-07T22:30:55.822917Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2023-09-07T22:30:55.823125Z 0 [Note] [MY-000000] [WSREP] Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.128.0.50' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '68329' --mysqld-version '8.0.31-23.2'   '' )
2023-09-07T22:30:56.729249Z 1 [Note] [MY-000000] [WSREP] Prepared SST request: xtrabackup-v2|10.128.0.50:4444/xtrabackup_sst//1
2023-09-07T22:30:56.729353Z 1 [Note] [MY-000000] [Galera] Check if state gap can be serviced using IST
2023-09-07T22:30:56.729382Z 1 [Note] [MY-000000] [Galera] Local UUID: 00000000-0000-0000-0000-000000000000 != Group UUID: 329f6151-d406-11ed-8dc7-77e2709c9139
2023-09-07T22:30:56.729403Z 1 [Note] [MY-000000] [Galera] ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 12374121, STRv: 3
2023-09-07T22:30:56.729500Z 1 [Note] [MY-000000] [Galera] IST receiver addr using tcp://10.128.0.50:4568
2023-09-07T22:30:56.729708Z 1 [Note] [MY-000000] [Galera] Prepared IST receiver for 0-12374121, listening at: tcp://10.128.0.50:4568
2023-09-07T22:30:56.730269Z 0 [Note] [MY-000000] [Galera] Member 0.0 (pxc-cluster-node-2) requested state transfer from '*any*'. Selected 1.0 (pxc-cluster-node-1)(SYNCED) as donor.
2023-09-07T22:30:56.730303Z 0 [Note] [MY-000000] [Galera] Shifting PRIMARY -> JOINER (TO: 12374123)
2023-09-07T22:30:56.730343Z 1 [Note] [MY-000000] [Galera] Requesting state transfer: success, donor: 1
2023-09-07T22:30:56.730363Z 1 [Note] [MY-000000] [Galera] Resetting GCache seqno map due to different histories.
2023-09-07T22:30:56.730375Z 1 [Note] [MY-000000] [Galera] GCache history reset: 329f6151-d406-11ed-8dc7-77e2709c9139:0 -> 329f6151-d406-11ed-8dc7-77e2709c9139:12374121
2023-09-07T22:30:57.432489Z 0 [Warning] [MY-000000] [Galera] 1.0 (pxc-cluster-node-1): State transfer to 0.0 (pxc-cluster-node-2) failed: -13 (Permission denied)
2023-09-07T22:30:57.432538Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1216: Will never receive state. Need to abort.
2023-09-07T22:30:57.432550Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread
2023-09-07T22:30:57.432571Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread
2023-09-07T22:30:57.432703Z 0 [Note] [MY-000000] [Galera] gcomm: closing backend
2023-09-07T22:30:58.436334Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(NON_PRIM,359d6040-a6c0,8)
memb {
	359d6040-a6c0,0
	}
joined {
	}
left {
	}
partitioned {
	abb830ee-9751,0
	}
)
2023-09-07T22:30:58.436409Z 0 [Note] [MY-000000] [Galera] (359d6040-a6c0, 'tcp://0.0.0.0:4567') turning message relay requesting off
2023-09-07T22:30:58.436429Z 0 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0
2023-09-07T22:30:58.436440Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))
2023-09-07T22:30:58.436606Z 0 [Note] [MY-000000] [Galera] gcomm: closed
2023-09-07T22:30:58.436628Z 0 [Note] [MY-000000] [Galera] /usr/sbin/mysqld: Terminated.
2023-09-07T22:30:58.436639Z 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation
2023-09-07T22:30:58.436646Z 0 [Note] [MY-000000] [WSREP] Terminating SST process

This is the MySQL conf for the joining node:

[client]
socket=/var/run/mysqld/mysqld.sock

[mysqld]
server-id=1
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
log_error=/var/log/mysql/error.log
pid_file=/var/run/mysqld/mysqld.pid
max_connections=1000
wsrep_auto_increment_control=0
auto_increment_increment=1
binlog_expire_logs_seconds=604800
wsrep_provider=/usr/lib/galera4/libgalera_smm.so
wsrep_cluster_address=gcomm://10.128.0.51,10.128.0.50,10.128.0.49
binlog_format=ROW
wsrep_slave_threads=8
wsrep_log_conflicts
innodb_autoinc_lock_mode=2
wsrep_node_address=10.128.0.50
wsrep_cluster_name=pxc-cluster
wsrep_node_name=pxc-cluster-node-2
pxc_strict_mode=ENFORCING
wsrep_sst_method=xtrabackup-v2

The failure of course is in these lines:

2023-09-07T22:30:57.432489Z 0 [Warning] [MY-000000] [Galera] 1.0 (pxc-cluster-node-1): State transfer to 0.0 (pxc-cluster-node-2) failed: -13 (Permission denied)
2023-09-07T22:30:57.432538Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1216: Will never receive state. Need to abort.

It says permission denied, but it doesn’t specify which permission to do what?

The datadir on both nodes is owned by the mysql user, and as far as I understand in PXC 8, we don’t need to create an SST user in MySQL as its done transparently and automatically. How can I figure out what permission is failing?

Is SELinux enabled? That’s usually the first source of permission denied errors.

1 Like

Hi @matthewb checked, its not enabled for this install.

Is there a way to get more verbose exceptions?

Hi @Cyril_Graze, In general MySQL / Percona MySQL 8 will have the default value of 2 for parameter log_error_verbosity which means logging ERROR , WARNING. So, the server which you have shared the log has ERROR , WARNING& NOTE with all verbosity exceptions.

Please refer here for more details of the log_error_verbosity.

1 Like

There should be an additional error log in your $DATADIR for xtrabackup which is created during the SST process. Check this log for additional errors. Also, on the joiner, there should be a receiver xtrabackup log with more info.

1 Like

@matthewb @Uday_Kumar_Rajarapu

Thanks for your prompt and helpful replies. I have digged deeper now in the logs, and have found the source of the exception on the bootstrapped node:

2023-09-09T16:40:17.125198Z 0 [Note] [MY-000000] [WSREP-SST] /usr/bin/wsrep_sst_common: line 979: /usr/bin/mysql: Operation not permitted
2023-09-09T16:40:17.126508Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:126
2023-09-09T16:40:17.145568Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '10.128.0.50:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --mysqld-version '8.0.33-25.1'   '' --gtid '329f6151-d406-11ed-8dc7-77e2709c9139:14503219' : 13 (Permission denied)
2023-09-09T16:40:17.148647Z 0 [Note] [MY-000000] [Galera] SST sending failed: -13

Looks like wsrep_sst_common is not being allowed to execute the MySQL binary.

Any tips on the best solution for this? :slight_smile:

The code in that script is:

function exec_sql() {
  local user=$1
  local password=$2
  local socket=$3
  local query=$4
  local args=""
  local retvalue
  local retoutput
  local default_auth=""

  if [[ $# -ge 5 ]]; then
    args=$5
  fi

  defaults=$(printf '[client]\nuser=%s\npassword="%s"\nsocket=%s\n%s' \
    "${user}" \
    "${password}" \
    "${socket}" \
    "${default_auth}"
  )

  retoutput=$(mysql --defaults-file=<(echo "${defaults}") --unbuffered --batch --silent ${args} -e "$query")
  retvalue=$?

  printf "%s" "${retoutput}"
  return $retvalue
}

These are the file permissions for MySQL:

$ ls -la /usr/bin/mysql
-rwxr-xr-x 1 root root 7980496 Jul 27 07:05 /usr/bin/mysql

This isn’t by chance ARM is it?
Check your system audit.log and other system logs to see if execution was denied for some reason. Also, what OS/version are you running?

@matthewb

No its a standard 64bit VM on GCP. Running Ubuntu 22.04.

Previously had checked if AppArmor was enabled, but the package wasn’t even intstalled. How can I make 100% sure AppArmor is not blocking this?

There will be OS/audit log/system log entries if blocked by the kernel/OS.

There are no entries specific to either /usr/bin/wsrep_sst_common or /usr/bin/mysql. On the donor node, kern.log is empty, and on the joiner node the only entries I see are these:

user@storage-instance:~$ sudo tail -n 300 /var/log/kern.log
Sep 11 12:52:18 storage-instance kernel: [348690.146538] kauditd_printk_skb: 32915 callbacks suppressed
Sep 11 12:52:18 storage-instance kernel: [348690.146543] audit: type=1400 audit(1694436738.146:539852): apparmor="ALLOWED" operation="exec" class="file" profile="/usr/sbin/mysqld" name="/usr/bin/dash" pid=551155 comm="sst_joiner" requested_mask="x" denied_mask="x" fsuid=114 ouid=0 target="/usr/sbin/mysqld//null-/usr/bin/dash"
Sep 11 12:52:18 storage-instance kernel: [348690.146551] audit: type=1400 audit(1694436738.146:539853): apparmor="ALLOWED" operation="file_inherit" class="file" profile="/usr/sbin/mysqld//null-/usr/bin/dash" name="/var/lib/mysql/grastate.dat" pid=551155 comm="sh" requested_mask="wr" denied_mask="wr" fsuid=114 ouid=114
Sep 11 12:52:18 storage-instance kernel: [348690.146624] audit: type=1400 audit(1694436738.146:539854): apparmor="ALLOWED" operation="file_mmap" class="file" profile="/usr/sbin/mysqld//null-/usr/bin/dash" name="/usr/bin/dash" pid=551155 comm="sh" requested_mask="r" denied_mask="r" fsuid=114 ouid=0
Sep 11 12:52:18 storage-instance kernel: [348690.146685] audit: type=1400 audit(1694436738.150:539855): apparmor="ALLOWED" operation="file_mmap" class="file" profile="/usr/sbin/mysqld//null-/usr/bin/dash" name="/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2" pid=551155 comm="sh" requested_mask="r" denied_mask="r" fsuid=114 ouid=0
Sep 11 12:52:18 storage-instance kernel: [348690.146898] audit: type=1400 audit(1694436738.150:539856): apparmor="ALLOWED" operation="open" class="file" profile="/usr/sbin/mysqld//null-/usr/bin/dash" name="/etc/ld.so.cache" pid=551155 comm="sh" requested_mask="r" denied_mask="r" fsuid=114 ouid=0
Sep 11 12:52:18 storage-instance kernel: [348690.146926] audit: type=1400 audit(1694436738.150:539857): apparmor="ALLOWED" operation="getattr" class="file" profile="/usr/sbin/mysqld//null-/usr/bin/dash" name="/etc/ld.so.cache" pid=551155 comm="sh" requested_mask="r" denied_mask="r" fsuid=114 ouid=0
Sep 11 12:52:18 storage-instance kernel: [348690.146931] audit: type=1400 audit(1694436738.150:539858): apparmor="ALLOWED" operation="getattr" class="file" profile="/usr/sbin/mysqld//null-/usr/bin/dash" name="/etc/ld.so.cache" pid=551155 comm="sh" requested_mask="r" denied_mask="r" fsuid=114 ouid=0
Sep 11 12:52:18 storage-instance kernel: [348690.146948] audit: type=1400 audit(1694436738.150:539859): apparmor="ALLOWED" operation="open" class="file" profile="/usr/sbin/mysqld//null-/usr/bin/dash" name="/usr/lib/x86_64-linux-gnu/libc.so.6" pid=551155 comm="sh" requested_mask="r" denied_mask="r" fsuid=114 ouid=0
Sep 11 12:52:18 storage-instance kernel: [348690.146966] audit: type=1400 audit(1694436738.150:539860): apparmor="ALLOWED" operation="getattr" class="file" profile="/usr/sbin/mysqld//null-/usr/bin/dash" name="/usr/lib/x86_64-linux-gnu/libc.so.6" pid=551155 comm="sh" requested_mask="r" denied_mask="r" fsuid=114 ouid=0
Sep 11 12:52:18 storage-instance kernel: [348690.146970] audit: type=1400 audit(1694436738.150:539861): apparmor="ALLOWED" operation="getattr" class="file" profile="/usr/sbin/mysqld//null-/usr/bin/dash" name="/usr/lib/x86_64-linux-gnu/libc.so.6" pid=551155 comm="sh" requested_mask="r" denied_mask="r" fsuid=114 ouid=0

Any other ideas?

$ file /usr/bin/mysql
Is it the correct binary? Can you execute it yourself directly?

$ file /usr/bin/mysql
/usr/bin/mysql: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=f23f66b417ee261e98e935cd748934678b62a9b8, for GNU/Linux 3.2.0, stripped

yes, it works as expected with no permission errors

I’m really stuck here @matthewb :man_facepalming:

Looking at the service status, I see this warning about PIDs, is this normal or indicative of a know issue?

# systemctl status mysql.service
× mysql.service - Percona XtraDB Cluster
     Loaded: loaded (/lib/systemd/system/mysql.service; enabled; vendor preset: enabled)
     Active: failed (Result: signal) since Tue 2023-09-12 14:27:00 UTC; 10s ago
    Process: 703594 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
    Process: 703631 ExecStartPre=/usr/bin/mysql-systemd check-grastate (code=exited, status=0/SUCCESS)
    Process: 703660 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
    Process: 703662 ExecStartPre=/bin/sh -c VAR=`bash /usr/bin/mysql-systemd galera-recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
    Process: 704007 ExecStart=/usr/sbin/mysqld $_WSREP_START_POSITION (code=killed, signal=ABRT)
    Process: 704498 ExecStopPost=/usr/bin/mysql-systemd stop-post (code=exited, status=0/SUCCESS)
   Main PID: 704007 (code=killed, signal=ABRT)
     Status: "Server startup in progress"
        CPU: 10.572s

Sep 12 14:26:42 storage-instance systemd[1]: Starting Percona XtraDB Cluster...
Sep 12 14:26:57 storage-instance systemd[1]: mysql.service: Got notification message from PID 704383, but reception only permitted for main PID 704007
Sep 12 14:26:59 storage-instance systemd[1]: mysql.service: Main process exited, code=killed, status=6/ABRT
Sep 12 14:27:00 storage-instance mysql-systemd[704498]:  WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Sep 12 14:27:00 storage-instance mysql-systemd[704498]:  WARNING: mysql may be already dead
Sep 12 14:27:00 storage-instance systemd[1]: mysql.service: Failed with result 'signal'.
Sep 12 14:27:00 storage-instance systemd[1]: Failed to start Percona XtraDB Cluster.
Sep 12 14:27:00 storage-instance systemd[1]: mysql.service: Consumed 10.572s CPU time.

The PID file is empty (warning) because the ‘Main process exited’ 1 line above. Normal message.
Have you verified all the permissions for all paths/files listed in the my.cnf file? The socket, datadir, error log, pid, etc?

@matthewb I have double checked all paths and files in the cnf.

What’s odd is that the 3 node cluster was running perfectly fine, after a server restart which was needed on emergency maintenance, the SST didn’t want to start. I’ve been having to run a single node since then.

I’ve set verbosity in logs to the max, and verified that there are not AppArmor issues. How else can I figure out why I’m getting permission denied?

still seeing this on the error log for the donor node:

2023-09-24T17:21:57.195775Z 0 [Note] [MY-000000] [Galera] async IST sender starting to serve tcp://10.128.0.50:4568 sending 28948844-28951119, preload starts from 28948844
2023-09-24T17:21:57.195980Z 0 [Note] [MY-000000] [WSREP] Initiating SST/IST transfer on DONOR side (wsrep_sst_xtrabackup-v2 --role 'donor' --address '10.128.0.50:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --mysqld-version '8.0.33-25.1'   '' --gtid '329f6151-d406-11ed-8dc7-77e2709c9139:28951138' )
2023-09-24T17:21:57.196241Z 0 [Note] [MY-000000] [Galera] IST sender 28948844 -> 28951119
2023-09-24T17:21:57.215063Z 10 [Note] [MY-000000] [WSREP] DONOR thread signaled with 0
2023-09-24T17:21:57.771413Z 0 [Note] [MY-000000] [WSREP-SST] /usr/bin/wsrep_sst_common: line 979: /usr/bin/mysql: Operation not permitted
2023-09-24T17:21:57.772644Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:126

Check if xtrabackup is already running on any of the nodes. It could be left orphaned after last failed SST.
Also make sure all the important TCP ports are opened between nodes - 4567, 4444, 4568.