Unable to bootstrap the 1st node

Hi Percona community,
I have three nodes with Percona XtraDB Cluster installed on all of them. I am trying to configure the cluster, and when I run systemctl start mysql@bootstrap.service, I get the following error:
Job for mysql@bootstrap.service failed because a timeout was exceeded.
See “systemctl status mysql@bootstrap.service” and “journalctl -xe” for details.

When I run systemctl status mysql@bootstrap.service this is what I get:

[mysqluser@prod-mysql-node01 ~]$ systemctl status mysql@bootstrap.service
● mysql@bootstrap.service - Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap
Loaded: loaded (/usr/lib/systemd/system/mysql@.service; disabled; vendor preset: disabled)
Active: failed (Result: timeout) since Mon 2024-02-19 10:40:36 EAT; 29s ago
Process: 116552 ExecStopPost=/usr/bin/mysql-systemd stop-post (code=exited, status=3)
Process: 116410 ExecStart=/usr/sbin/mysqld $EXTRA_ARGS $_WSREP_START_POSITION (code=killed, signal=KILL)
Process: 116304 ExecStartPre=/bin/sh -c VAR=bash /usr/bin/mysql-systemd galera-recovery; [ $? -eq 0 ] && systemctl set-en>
Process: 116302 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 116260 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
Main PID: 116410 (code=killed, signal=KILL)
Status: “Server startup in progress”

Feb 19 10:39:06 prod-mysql-node01.ipsl.co.ke systemd[1]: mysql@bootstrap.service: start operation timed out. Terminating.
Feb 19 10:40:36 prod-mysql-node01.ipsl.co.ke systemd[1]: mysql@bootstrap.service: State ‘stop-sigterm’ timed out. Killing.
Feb 19 10:40:36 prod-mysql-node01.ipsl.co.ke systemd[1]: mysql@bootstrap.service: Killing process 116410 (mysqld) with signal>
Feb 19 10:40:36 prod-mysql-node01.ipsl.co.ke systemd[1]: mysql@bootstrap.service: Main process exited, code=killed, status=9/>
Feb 19 10:40:36 prod-mysql-node01.ipsl.co.ke mysql-systemd[116552]: /usr/bin/mysql-systemd: line 233: kill: (116410) - No suc>
Feb 19 10:40:36 prod-mysql-node01.ipsl.co.ke mysql-systemd[116552]: WARNING: mysql already dead
Feb 19 10:40:36 prod-mysql-node01.ipsl.co.ke mysql-systemd[116552]: ERROR! Stale PID file: /var/run/mysqld/mysqld.pid
Feb 19 10:40:36 prod-mysql-node01.ipsl.co.ke systemd[1]: mysql@bootstrap.service: Control process exited, code=exited status=3
Feb 19 10:40:36 prod-mysql-node01.ipsl.co.ke systemd[1]: mysql@bootstrap.service: Failed with result ‘timeout’.
Feb 19 10:40:36 prod-mysql-node01.ipsl.co.ke systemd[1]: Failed to start Percona XtraDB Cluster with config /etc/sysconfig/my>

This is my log file:
[root@prod-mysql-node01 log]# tail -f mysqld.log
2024-02-19T07:37:37.496153Z 0 [Note] [MY-000000] [Galera] Server initialized
2024-02-19T07:37:37.496163Z 0 [Note] [MY-000000] [WSREP] Server status change initializing → initialized
2024-02-19T07:37:37.496180Z 0 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2024-02-19T07:37:37.496236Z 2 [Note] [MY-000000] [Galera] Bootstrapping a new cluster, setting initial position to 00000000-0000-0000-0000-000000000000:-1
2024-02-19T07:37:37.497612Z 8 [Warning] [MY-013185] [Server] Currently unknown variable ‘clone_valid_donor_list’ was read from the persisted config file.
2024-02-19T07:37:37.497671Z 8 [Note] [MY-000000] [Galera] pause
2024-02-19T07:37:37.500525Z 7 [Note] [MY-000000] [WSREP] Cluster table is empty, not recovering transactions
2024-02-19T07:37:37.500570Z 2 [Note] [MY-000000] [WSREP] Server status change initialized → joined
2024-02-19T07:37:37.500582Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2024-02-19T07:37:37.500596Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.

What could I be missing? I am running it on Rocky Linux 8.9.

Hi, have you changed the location of the PID file? It seems systemd is timing out while waiting for the service to come up.

Hi Ivan_Groenewold,
I have not changed it. It is still at the default location, /var/run/mysqld/mysqld.pid.
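The systemd output above ends with "Stale PID file: /var/run/mysqld/mysqld.pid", i.e. the file outlived the process it names. Before retrying, something like the following minimal Python sketch can tell whether the PID file is missing, stale, or points at a live process. The path is the default quoted in this thread; the function names are hypothetical, and the check relies on the standard trick of sending signal 0, which tests for existence without touching the process.

```python
import os
from pathlib import Path
from typing import Optional

# Path taken from the systemd output above; adjust if pid-file is set elsewhere.
PID_FILE = Path("/var/run/mysqld/mysqld.pid")


def classify_pid_text(text: Optional[str]) -> str:
    """Classify PID-file contents as 'missing', 'invalid', 'stale' or 'alive'."""
    if text is None:
        return "missing"
    text = text.strip()
    if not text.isdigit() or int(text) <= 0:
        return "invalid"
    try:
        os.kill(int(text), 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return "stale"  # no such process, the file was left behind
    except PermissionError:
        return "alive"  # process exists but belongs to another user
    return "alive"


def pid_file_state(pid_file: Path = PID_FILE) -> str:
    """Read the PID file (if any) and classify it."""
    try:
        return classify_pid_text(pid_file.read_text())
    except FileNotFoundError:
        return "missing"


if __name__ == "__main__":
    print(f"{PID_FILE}: {pid_file_state()}")
```

A "stale" result only confirms the symptom; the reason mysqld never finished starting still has to come from the error log.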

Please remove the unknown parameter (clone_valid_donor_list) from your my.cnf and also from $datadir/mysqld-auto.cnf.
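The warning in the log says the variable was read from the persisted config file, which is the JSON file mysqld-auto.cnf written by SET PERSIST. Here is a minimal Python sketch of the cleanup, assuming mysqld is stopped and both files are backed up first. Because the JSON layout may differ between 8.0 releases, the sketch removes the key wherever it appears rather than assuming a fixed path. For my.cnf, it drops lines that set the option, allowing for MySQL's interchangeable '-'/'_' in option names and the loose- prefix. Script and function names are hypothetical.

```python
import json
import re
import sys
from pathlib import Path
from typing import Any

OPTION = "clone_valid_donor_list"  # the unknown variable from the error log above


def drop_key(node: Any, key: str) -> int:
    """Recursively delete `key` from nested dicts/lists; return how many were removed."""
    removed = 0
    if isinstance(node, dict):
        if key in node:
            del node[key]
            removed += 1
        for value in node.values():
            removed += drop_key(value, key)
    elif isinstance(node, list):
        for item in node:
            removed += drop_key(item, key)
    return removed


def clean_auto_cnf(text: str, key: str = OPTION) -> str:
    """Return mysqld-auto.cnf (JSON) text with every entry named `key` removed."""
    data = json.loads(text)
    drop_key(data, key)
    return json.dumps(data, indent=2)


def clean_my_cnf(text: str, key: str = OPTION) -> str:
    """Drop my.cnf lines that set `key` ('-' or '_' spelling, optional loose- prefix)."""
    name = "[-_]".join(re.escape(part) for part in key.split("_"))
    pattern = re.compile(rf"^\s*(?:loose[-_])?{name}\s*(?:=|$)")
    return "".join(line for line in text.splitlines(keepends=True) if not pattern.match(line))


if __name__ == "__main__":
    # Hypothetical usage (mysqld stopped, backups taken):
    #   python drop_option.py /etc/my.cnf <datadir>/mysqld-auto.cnf
    for arg in sys.argv[1:]:
        path = Path(arg)
        text = path.read_text()
        cleaned = clean_auto_cnf(text) if path.name == "mysqld-auto.cnf" else clean_my_cnf(text)
        path.write_text(cleaned)
```

Rewriting mysqld-auto.cnf with json.dumps changes its whitespace but not its content. Files pulled in through !include or !includedir have to be passed explicitly.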

Hi matthewb,
After I removed the parameter, the bootstrap started. However, when I try to join the second node by starting mysql, I get the following error:

[root@prod-mysql-node03 log]# tail -f mysqld.log
2024-02-21T06:19:23.502248Z 0 [Warning] [MY-000000] [Galera] Handshake failed: tlsv1 alert decrypt error
2024-02-21T06:19:25.001472Z 0 [Warning] [MY-000000] [Galera] Handshake failed: tlsv1 alert decrypt error
2024-02-21T06:19:26.504612Z 0 [Warning] [MY-000000] [Galera] Handshake failed: tlsv1 alert decrypt error
2024-02-21T06:19:28.003667Z 0 [Warning] [MY-000000] [Galera] Handshake failed: tlsv1 alert decrypt error
2024-02-21T06:19:29.502814Z 0 [Warning] [MY-000000] [Galera] Handshake failed: tlsv1 alert decrypt error
2024-02-21T06:19:31.004831Z 0 [Warning] [MY-000000] [Galera] Handshake failed: tlsv1 alert decrypt error
2024-02-21T06:24:38.571926Z 0 [Warning] [MY-000000] [Galera] Handshake failed: tlsv1 alert decrypt error
2024-02-21T06:24:40.571613Z 0 [Warning] [MY-000000] [Galera] Handshake failed: tlsv1 alert decrypt error
2024-02-21T07:11:32.950841Z 0 [Warning] [MY-000000] [Galera] Handshake failed: tlsv1 alert decrypt error
2024-02-21T07:11:34.943237Z 0 [Warning] [MY-000000] [Galera] Handshake failed: tlsv1 alert decrypt error

What could be the issue here?

You need to copy the SSL certificates from node1 over to node2 before starting node2. Or, more simply, disable pxc_encrypt_cluster_traffic on both nodes.
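If you go the copying route, it helps to confirm that node2 really ends up with node1's files before starting it. Below is a minimal Python sketch that compares SHA-256 fingerprints of the certificate files in two directories (for example, node1's files copied to a local folder and node2's datadir). The file names ca.pem, server-cert.pem and server-key.pem are an assumption: they are the names of the SSL files PXC 8.0 generates in the datadir by default, so adjust them if your ssl-ca/ssl-cert/ssl-key settings point elsewhere. Function names are hypothetical.

```python
import hashlib
import sys
from pathlib import Path
from typing import Dict, List, Optional

# Assumed file names: the SSL files PXC 8.0 generates in the datadir by default.
CERT_FILES = ("ca.pem", "server-cert.pem", "server-key.pem")


def fingerprints(directory: Path, names=CERT_FILES) -> Dict[str, Optional[str]]:
    """Map each file name to the SHA-256 of its contents (None if the file is missing)."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        path = directory / name
        result[name] = hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None
    return result


def mismatched(a: Dict[str, Optional[str]], b: Dict[str, Optional[str]]) -> List[str]:
    """Names whose fingerprints differ, or that are missing on either side."""
    return sorted(n for n in a.keys() | b.keys() if a.get(n) is None or a.get(n) != b.get(n))


if __name__ == "__main__":
    # Hypothetical usage: python compare_certs.py ./node1-certs /path/to/node2/datadir
    node1_dir, node2_dir = Path(sys.argv[1]), Path(sys.argv[2])
    bad = mismatched(fingerprints(node1_dir), fingerprints(node2_dir))
    print("certificate files match" if not bad else f"differ or missing: {', '.join(bad)}")
```

Copied files still need to be readable by the mysql user on node2.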

Hi matthewb,
I am now getting this error when I try to bootstrap:
[root@prod-mysql-node01 log]# tail -f mysqld.log
ERROR! WSREP: Failed to recover position:
Log of wsrep recovery (--wsrep-recover):
INFO: WSREP: Running position recovery with --log_error='/data01/mysql_data/mysql/wsrep_recovery_verbose.E1jd1l' --pid-file='/data01/mysql_data/mysql/prod-mysql-node01.ipsl.co.ke-recover.pid'
ERROR! WSREP: Failed to recover position:
Log of wsrep recovery (--wsrep-recover):
INFO: WSREP: Running position recovery with --log_error='/data01/mysql_data/mysql/wsrep_recovery_verbose.xKt2xs' --pid-file='/data01/mysql_data/mysql/prod-mysql-node01.ipsl.co.ke-recover.pid'
ERROR! WSREP: Failed to recover position:
Log of wsrep recovery (--wsrep-recover):
INFO: WSREP: Running position recovery with --log_error='/data01/mysql_data/mysql/wsrep_recovery_verbose.BufNcm' --pid-file='/data01/mysql_data/mysql/prod-mysql-node01.ipsl.co.ke-recover.pid'
ERROR! WSREP: Failed to recover position:

Please attach more logs. Those last repeated lines of the same content don’t really help much. Is this on node1 after you disabled the cluster traffic encryption? Please provide a more detailed list of actions/steps/commands taken.

Hi matthewb,
This is after I removed the parameter. I am now trying to bootstrap the first node using the bootstrap command: systemctl start mysql@bootstrap.service

It is not producing any other log output apart from what I shared earlier.

@Cris Please stop mysql and make sure no mysqld instances are running (verify with ps -Af | grep mysqld). Then zero out the MySQL error log (e.g. echo > /path/to/file.log), try the bootstrap again, and attach the entire log file.
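The first two steps of that procedure (confirm nothing is running, then empty the log) can be scripted so they are not skipped by accident. This is a minimal Python sketch, Linux only, that looks for processes whose /proc/&lt;pid&gt;/comm is mysqld (roughly what ps -Af | grep mysqld shows) and refuses to truncate the log while any are found. The script name and log path in the usage comment are hypothetical placeholders.

```python
import sys
from pathlib import Path
from typing import List


def mysqld_pids(proc: Path = Path("/proc")) -> List[int]:
    """Return PIDs whose /proc/<pid>/comm is 'mysqld' (Linux only)."""
    pids = []
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            name = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited meanwhile, or comm is unreadable
        if name == "mysqld":
            pids.append(int(entry.name))
    return sorted(pids)


def truncate_log(log: Path) -> None:
    """Empty the error log in place, like the echo command above."""
    with open(log, "w"):
        pass


if __name__ == "__main__":
    # Hypothetical usage: python prep_bootstrap.py /path/to/file.log
    log_path = Path(sys.argv[1])
    running = mysqld_pids()
    if running:
        sys.exit(f"mysqld is still running (PIDs {running}); stop it first")
    truncate_log(log_path)
    print(f"{log_path} emptied; now run: systemctl start mysql@bootstrap.service")
```

Starting from an empty file means the attached log covers exactly one bootstrap attempt.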

Hi matthewb,
I did another clean upgrade from MySQL 8.0.32 to Percona MySQL 8.0.32 and ran the bootstrap command, which succeeded, and node 1 is currently active. Now, when I try to join the second node, I get the error shown in the attached file named node 2. It is not very descriptive, so it is difficult to troubleshoot.



Could you take a look? I have also attached part of the node 1 log file.

I can see where node2 failed, but the reason will be in node1's log, and the screenshot you provided of it does not cover the same time window. Please find the matching timestamps in node1's log and look for errors there.
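Lining the two logs up by hand is error-prone, so here is a minimal Python sketch that prints only the lines of one error log logged within a window around a given timestamp (for example, the moment node2 gave up). It assumes the UTC timestamp format seen in the logs in this thread, such as 2024-02-21T06:19:23.502248Z, and that the nodes' clocks are in sync (NTP), since any skew shifts the window. The script name and default window size are arbitrary choices.

```python
import sys
from datetime import datetime, timedelta
from typing import Iterable, Iterator, Optional

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # e.g. 2024-02-21T06:19:23.502248Z (UTC)


def parse_ts(line: str) -> Optional[datetime]:
    """Return the timestamp that starts an error-log line, or None."""
    try:
        return datetime.strptime(line.split(" ", 1)[0], TS_FORMAT)
    except ValueError:
        return None


def lines_near(lines: Iterable[str], center: datetime, seconds: float = 30.0) -> Iterator[str]:
    """Yield lines logged within +/- `seconds` of `center`.

    Lines without a timestamp (continuations, dumps) follow the decision
    made for the last timestamped line before them.
    """
    window = timedelta(seconds=seconds)
    keep = False
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            keep = abs(ts - center) <= window
        if keep:
            yield line


if __name__ == "__main__":
    # Hypothetical usage: python log_window.py node1.log 2024-02-21T06:19:23.502248Z
    path, stamp = sys.argv[1], sys.argv[2]
    center = parse_ts(stamp)
    if center is None:
        sys.exit(f"unrecognised timestamp: {stamp}")
    with open(path, encoding="utf-8", errors="replace") as fh:
        sys.stdout.writelines(lines_near(fh, center))
```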

Hi matthewb,
Attached are the log files from both nodes as text files, starting from where the run began:
Node 2 logfile.txt (15.1 KB)
Node 1 logfile.txt (12.1 KB)

Both logs show 'operation canceled'. Make sure ports 4444, 4567, and 4568 are open between both hosts, and make sure SELinux/AppArmor is disabled.
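"Open" here means reachable from the other node, which is different from "in use" on the local one. In Galera terms, 4567 is group communication and is always listening on a running node, while 4568 (IST) and 4444 (SST) normally only listen on the joining node during a state transfer. Seeing only 4567 in use on node1 is therefore expected. Here is a minimal Python sketch of a reachability probe to run from each node against the other. A "refused" result means the host answered but nothing was listening. "unreachable" (timeout, no route to host) more often points at a firewall. The script name is hypothetical.

```python
import socket
import sys
from typing import Dict, Iterable

# Galera/PXC ports named above: 4567 group communication, 4568 IST, 4444 SST.
PXC_PORTS = (4567, 4568, 4444)


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection and report 'open', 'refused' or 'unreachable'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: usually nothing is listening on the port.
        return "refused"
    except OSError:
        # Timeout, "no route to host", etc.: often a firewall dropping or rejecting.
        return "unreachable"


def probe_all(host: str, ports: Iterable[int] = PXC_PORTS) -> Dict[int, str]:
    """Probe every port and return a port -> state mapping."""
    return {port: probe(host, port) for port in ports}


if __name__ == "__main__":
    # Hypothetical usage, on node2 against node1 and vice versa:
    #   python probe_pxc_ports.py <other-node-ip>
    for port, state in probe_all(sys.argv[1]).items():
        print(f"{sys.argv[1]}:{port} {state}")
```

From node2, 4567 on node1 should come back "open". To test 4444/4568 in the other direction, something has to be listening on node2 at the time, for example a joiner in the middle of SST.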

Hi matthewb,
Only port 4567 is in use by mysql (see attached). SELinux is disabled on both servers.

What about the firewall? Is that disabled too (iptables, ufw, firewalld, etc.)?

They are all disabled on all the nodes.

Is node1 in PRIMARY state? Can you read and write data on it?
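The quickest way to answer that is from the wsrep status variables. This minimal Python sketch parses the tab-separated output of mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" and lists anything that differs from what a healthy node normally reports: Primary component, Synced, connected and ready, and optionally the expected cluster size. The variable names are standard Galera status variables. The script name and the choice of checks are assumptions.

```python
import sys
from typing import Dict, List, Optional

# Values a healthy, synced PXC/Galera node normally reports.
EXPECTED = {
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
    "wsrep_connected": "ON",
    "wsrep_ready": "ON",
}


def parse_status(text: str) -> Dict[str, str]:
    """Parse tab-separated 'name<TAB>value' lines as printed by mysql -N -B."""
    status: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition("\t")
        if sep:
            status[name.strip()] = value.strip()
    return status


def problems(status: Dict[str, str], expected_size: Optional[int] = None) -> List[str]:
    """List every status variable that differs from the expected value."""
    expected = dict(EXPECTED)
    if expected_size is not None:
        expected["wsrep_cluster_size"] = str(expected_size)
    return [f"{name}={status.get(name)!r}, expected {value!r}"
            for name, value in expected.items() if status.get(name) != value]


if __name__ == "__main__":
    # Usage: mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | python check_wsrep.py 2
    size = int(sys.argv[1]) if len(sys.argv) > 1 else None
    issues = problems(parse_status(sys.stdin.read()), size)
    print("\n".join(issues) if issues else "node looks healthy")
```

On a freshly bootstrapped node1, wsrep_cluster_size is 1. It should go up to 2 once node2 has joined.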

Hi matthewb,
Node 1 is in PRIMARY state and can read and write data, as shown below:

So node1 is online, bootstrapped, and in PRIMARY state, but node2 will not join when you start it. Nine times out of ten this is network related. Read over your my.cnf files again and check that everything lines up between the nodes. Feel free to post both here.
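To make the "check everything lines up" step less manual, here is a minimal Python sketch that reads the [mysqld] section of each node's my.cnf. It flags settings that usually have to match on every node, and it checks that each node's wsrep_node_address appears in its wsrep_cluster_address. The list of settings is a partial, assumed selection, not an official checklist. An option that is absent on one node and set explicitly to its default on another will show up as "differs", so treat the output as a prompt to look, not a verdict. !include/!includedir lines are skipped, so included files need to be compared separately.

```python
import configparser
import sys
from pathlib import Path
from typing import Dict, List, Optional

# Settings that normally have to be identical on every node (a partial list).
SAME_ON_ALL_NODES = ("wsrep_cluster_name", "wsrep_cluster_address",
                     "pxc_encrypt_cluster_traffic", "wsrep_sst_method")


def mysqld_options(text: str) -> Dict[str, Optional[str]]:
    """Parse the [mysqld] section of my.cnf text, normalising '-' to '_' in names."""
    parser = configparser.ConfigParser(allow_no_value=True, strict=False,
                                       interpolation=None, delimiters=("=",))
    # configparser cannot handle !include/!includedir directives, so skip them.
    body = "\n".join(line for line in text.splitlines() if not line.lstrip().startswith("!"))
    parser.read_string(body)
    if not parser.has_section("mysqld"):
        return {}
    return {k.replace("-", "_"): (v.strip().strip("'\"") if v is not None else None)
            for k, v in parser.items("mysqld")}


def compare(configs: Dict[str, str]) -> List[str]:
    """Map node label -> my.cnf text; return a list of human-readable problems."""
    opts = {node: mysqld_options(text) for node, text in configs.items()}
    issues = []
    for key in SAME_ON_ALL_NODES:
        values = {node: o.get(key) for node, o in opts.items()}
        if len(set(values.values())) > 1:
            issues.append(f"{key} differs: {values}")
    for node, o in opts.items():
        address = o.get("wsrep_node_address")
        members = (o.get("wsrep_cluster_address") or "").replace("gcomm://", "")
        hosts = [m.strip().split(":")[0] for m in members.split(",") if m.strip()]
        if address and address.split(":")[0] not in hosts:
            issues.append(f"{node}: wsrep_node_address {address} is not listed in wsrep_cluster_address")
    return issues


if __name__ == "__main__":
    # Hypothetical usage: python compare_cnf.py node1-my.cnf node2-my.cnf
    found = compare({name: Path(name).read_text() for name in sys.argv[1:]})
    print("\n".join(found) if found else "no differences in the checked settings")
```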