XtraDB Cluster 5.7: Can't Connect Other Nodes

Hello!
I have the following problem with Percona XtraDB Cluster 5.7:

Test environment: 4 hosts:
ipa01.test.local (CentOS 7) | 10.0.2.22 - DC/DNS

per01.test.local (Ubuntu 20.04 LTS) | 10.0.2.30 - this host will be the bootstrap node
per02.test.local (Ubuntu 20.04 LTS) | 10.0.2.31 - joiner
per03.test.local (Ubuntu 20.04 LTS) | 10.0.2.32 - joiner

Configure the OS after installation:
1. Set /etc/hosts. For example, on per01.test.local:

127.0.0.1 localhost
127.0.1.1 per01
10.0.2.30 per01 per01.test.local
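A quick sanity check for this step (a sketch; the names are from this example): each cluster name should resolve to the LAN address the other nodes will use, not only to the 127.0.1.1 loopback entry.

```shell
# Check what each cluster name actually resolves to on this host.
# A name that resolves to 127.0.1.1 (or not at all) will break
# node-to-node communication later.
for name in per01 per01.test.local per02.test.local per03.test.local; do
  printf '%-20s -> ' "$name"
  getent hosts "$name" || echo "does not resolve"
done
```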

2. Set up the network in /etc/netplan/00-installer-config.yaml. For example, on per01.test.local:

network:
  ethernets:
    ens33:
      addresses:
      - 10.0.2.30/24
      gateway4: 10.0.2.1
      nameservers:
        addresses:
        - 10.0.2.22
        search: [test.local]
  version: 2

netplan apply

3. Set the time zone:
timedatectl set-timezone Asia/Almaty

4. Update packages:
apt -y update && apt -y upgrade

5. The firewall is disabled by default; verify:
ufw status
Status: inactive

6. Install freeipa-client:
apt install -y freeipa-client

7. Join the domain:
ipa-client-install --mkhomedir --domain test.local --realm TEST.LOCAL --enable-dns-updates --ntp-server=10.0.2.22

The ipa-client-install command was successful

8. Check the IPA client:
kinit admin
klist
Valid starting Expires Service principal
01/04/2023 10:08:23 01/05/2023 10:08:21 krbtgt/TEST.LOCAL@TEST.LOCAL

9. Reboot the hosts.

I strictly followed all the steps in the official installation instructions for Percona XtraDB Cluster 5.7:

  1. sudo apt remove apparmor
  2. curl -O https://repo.percona.com/apt/percona-release_latest.generic_all.deb
  3. sudo apt -y install gnupg2 lsb-release ./percona-release_latest.generic_all.deb
  4. sudo apt update
  5. sudo percona-release setup pxc-57
  6. apt -y install percona-xtradb-cluster-57
    password for all hosts: Root1234
  7. sudo service mysql stop

Already at this step an error can be seen:
root@per01:/home/dk01# systemctl status mysql

● mysql.service - LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon
     Loaded: loaded (/etc/init.d/mysql; generated)
     Active: inactive (dead) since Wed 2023-01-04 10:13:48 +06; 1min 50s ago
       Docs: man:systemd-sysv-generator(8)
    Process: 4514 ExecStop=/etc/init.d/mysql stop (code=exited, status=0/SUCCESS)

Jan 04 10:13:16 per01.test.local /etc/init.d/mysql[3285]: MySQL PID not found, pid_file detected/guessed: /var/run/mysqld/mysqld.pid
Jan 04 10:13:18 per01.test.local mysql[3222]:    ...done.
Jan 04 10:13:18 per01.test.local systemd[1]: Started LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon.
Jan 04 10:13:35 per01.test.local systemd[1]: Stopping LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon...
Jan 04 10:13:35 per01.test.local mysql[4514]:  * Stopping MySQL (Percona XtraDB Cluster) mysqld
Jan 04 10:13:48 per01.test.local /etc/init.d/mysql[4636]: MySQL PID not found, pid_file detected/guessed: /var/run/mysqld/mysqld.pid
Jan 04 10:13:48 per01.test.local /etc/init.d/mysql[4640]: MySQL PID not found, pid_file detected/guessed: /var/run/mysqld/mysqld.pid
Jan 04 10:13:48 per01.test.local mysql[4514]:    ...done.
Jan 04 10:13:48 per01.test.local systemd[1]: mysql.service: Succeeded.
Jan 04 10:13:48 per01.test.local systemd[1]: Stopped LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon.

OK, let's ignore this for now and continue following the instructions.

8. On per01, set wsrep.cnf:
nano /etc/mysql/percona-xtradb-cluster.conf.d/wsrep.cnf

[mysqld]
# Path to Galera library
wsrep_provider=/usr/lib/libgalera_smm.so
# Cluster connection URL contains IPs of nodes
#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address=gcomm://10.0.2.30,10.0.2.31,10.0.2.32
# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW
# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB
# Slave thread to use
wsrep_slave_threads=1
wsrep_log_conflicts
# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2
# Node IP address
wsrep_node_address=10.0.2.30
# Cluster name
wsrep_cluster_name=test-cluster
#If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name=per01.test.local
#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=ENFORCING
# SST method
wsrep_sst_method=xtrabackup-v2
#Authentication for SST method
wsrep_sst_auth="sstuser:Root12345"
  9. Copy wsrep.cnf to the other nodes and change wsrep_node_name and wsrep_node_address.

  10. Bootstrap the first node:
    /etc/init.d/mysql bootstrap-pxc

root@per01:/etc/mysql/percona-xtradb-cluster.conf.d# /etc/init.d/mysql bootstrap-pxc

  • Bootstrapping Percona XtraDB Cluster database server mysqld [ OK ]
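The "copy wsrep.cnf to the other nodes" step can be sketched as a small helper (a sketch: patch_wsrep is a made-up name, and how you transfer the file, scp or Ansible or anything else, is up to you; only the two per-node settings are rewritten):

```shell
# Patch a local copy of the bootstrap node's wsrep.cnf for a joiner node.
# Only wsrep_node_address and wsrep_node_name must differ per node; the
# rest of the file should stay identical across the cluster.
patch_wsrep() {
  cnf=$1        # path to a local copy of wsrep.cnf
  node_ip=$2    # e.g. 10.0.2.32
  node_name=$3  # e.g. per03.test.local
  sed -i \
    -e "s/^wsrep_node_address=.*/wsrep_node_address=${node_ip}/" \
    -e "s/^wsrep_node_name=.*/wsrep_node_name=${node_name}/" \
    "$cnf"
}

# For per03 this would be:
# patch_wsrep ./wsrep.cnf 10.0.2.32 per03.test.local
```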
  11. Check the ports:
    ss -tunlp
Netid            State             Recv-Q             Send-Q                         Local Address:Port                         Peer Address:Port            Process
udp              UNCONN            0                  0                                  127.0.0.1:323                               0.0.0.0:*                users:(("chronyd",pid=777,fd=5))
udp              UNCONN            0                  0                              127.0.0.53%lo:53                                0.0.0.0:*                users:(("systemd-resolve",pid=729,fd=12))
udp              UNCONN            0                  0                                      [::1]:323                                  [::]:*                users:(("chronyd",pid=777,fd=6))
tcp              LISTEN            0                  4096                           127.0.0.53%lo:53                                0.0.0.0:*                users:(("systemd-resolve",pid=729,fd=13))
tcp              LISTEN            0                  128                                  0.0.0.0:22                                0.0.0.0:*                users:(("sshd",pid=794,fd=3))
tcp              LISTEN            0                  4096                                 0.0.0.0:4567                              0.0.0.0:*                users:(("mysqld",pid=5399,fd=13))
tcp              LISTEN            0                  80                                         *:3306                                    *:*                users:(("mysqld",pid=5399,fd=44))
tcp              LISTEN            0                  128                                     [::]:22                                   [::]:*                users:(("sshd",pid=794,fd=4))
  12. Connect to MySQL:
    mysql -u root -p
mysql> show status like 'wsrep%';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| wsrep_local_state_uuid           | 1d701a1f-8be6-11ed-b2ae-7e7f086a9a74 |
| wsrep_protocol_version           | 9                                    |
| wsrep_last_applied               | 0                                    |
| wsrep_last_committed             | 0                                    |
| wsrep_replicated                 | 0                                    |
| wsrep_replicated_bytes           | 0                                    |
| wsrep_repl_keys                  | 0                                    |
| wsrep_repl_keys_bytes            | 0                                    |
| wsrep_repl_data_bytes            | 0                                    |
| wsrep_repl_other_bytes           | 0                                    |
| wsrep_received                   | 2                                    |
| wsrep_received_bytes             | 142                                  |
| wsrep_local_commits              | 0                                    |
| wsrep_local_cert_failures        | 0                                    |
| wsrep_local_replays              | 0                                    |
| wsrep_local_send_queue           | 0                                    |
| wsrep_local_send_queue_max       | 1                                    |
| wsrep_local_send_queue_min       | 0                                    |
| wsrep_local_send_queue_avg       | 0.000000                             |
| wsrep_local_recv_queue           | 0                                    |
| wsrep_local_recv_queue_max       | 2                                    |
| wsrep_local_recv_queue_min       | 0                                    |
| wsrep_local_recv_queue_avg       | 0.500000                             |
| wsrep_local_cached_downto        | 0                                    |
| wsrep_flow_control_paused_ns     | 0                                    |
| wsrep_flow_control_paused        | 0.000000                             |
| wsrep_flow_control_sent          | 0                                    |
| wsrep_flow_control_recv          | 0                                    |
| wsrep_flow_control_interval      | [ 100, 100 ]                         |
| wsrep_flow_control_interval_low  | 100                                  |
| wsrep_flow_control_interval_high | 100                                  |
| wsrep_flow_control_status        | OFF                                  |
| wsrep_flow_control_active        | false                                |
| wsrep_flow_control_requested     | false                                |
| wsrep_cert_deps_distance         | 0.000000                             |
| wsrep_apply_oooe                 | 0.000000                             |
| wsrep_apply_oool                 | 0.000000                             |
| wsrep_apply_window               | 0.000000                             |
| wsrep_apply_waits                | 0                                    |
| wsrep_commit_oooe                | 0.000000                             |
| wsrep_commit_oool                | 0.000000                             |
| wsrep_commit_window              | 0.000000                             |
| wsrep_local_state                | 4                                    |
| wsrep_local_state_comment        | Synced                               |
| wsrep_cert_index_size            | 0                                    |
| wsrep_cert_bucket_count          | 22                                   |
| wsrep_gcache_pool_size           | 1320                                 |
| wsrep_causal_reads               | 0                                    |
| wsrep_cert_interval              | 0.000000                             |
| wsrep_open_transactions          | 0                                    |
| wsrep_open_connections           | 0                                    |
| wsrep_ist_receive_status         |                                      |
| wsrep_ist_receive_seqno_start    | 0                                    |
| wsrep_ist_receive_seqno_current  | 0                                    |
| wsrep_ist_receive_seqno_end      | 0                                    |
| wsrep_incoming_addresses         | 10.0.2.30:3306                       |
| wsrep_cluster_weight             | 1                                    |
| wsrep_desync_count               | 0                                    |
| wsrep_evs_delayed                |                                      |
| wsrep_evs_evict_list             |                                      |
| wsrep_evs_repl_latency           | 0/0/0/0/0                            |
| wsrep_evs_state                  | OPERATIONAL                          |
| wsrep_gcomm_uuid                 | 8c0ac25b-8be7-11ed-8c48-8a82a5b54526 |
| wsrep_gmcast_segment             | 0                                    |
| wsrep_cluster_conf_id            | 1                                    |
| wsrep_cluster_size               | 1                                    |
| wsrep_cluster_state_uuid         | 1d701a1f-8be6-11ed-b2ae-7e7f086a9a74 |
| wsrep_cluster_status             | Primary                              |
| wsrep_connected                  | ON                                   |
| wsrep_local_bf_aborts            | 0                                    |
| wsrep_local_index                | 0                                    |
| wsrep_provider_name              | Galera                               |
| wsrep_provider_vendor            | Codership Oy <info@codership.com>    |
| wsrep_provider_version           | 3.63(rf47405c)                       |
| wsrep_ready                      | ON                                   |
+----------------------------------+--------------------------------------+
75 rows in set (0.00 sec)
  13. Create the SST user:
mysql> CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 'Root12345';
mysql> GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';
mysql> FLUSH PRIVILEGES;
  14. Add nodes to the cluster, for example per03:
    /etc/init.d/mysql start

root@per03:/etc/mysql/percona-xtradb-cluster.conf.d# systemctl status mysql

● mysql.service - LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon
     Loaded: loaded (/etc/init.d/mysql; generated)
     Active: failed (Result: exit-code) since Wed 2023-01-04 12:38:25 +06; 13min ago
       Docs: man:systemd-sysv-generator(8)
    Process: 6378 ExecStart=/etc/init.d/mysql start (code=exited, status=1/FAILURE)
Jan 04 12:38:03 per03.test.local systemd[1]: Starting LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon...
Jan 04 12:38:03 per03.test.local mysql[6378]:  * Starting MySQL (Percona XtraDB Cluster) database server mysqld
Jan 04 12:38:03 per03.test.local /etc/init.d/mysql[6436]: MySQL PID not found, pid_file detected/guessed: /var/run/mysqld/mysqld.pid
Jan 04 12:38:05 per03.test.local mysql[6378]:  * State transfer in progress, setting sleep higher mysqld
Jan 04 12:38:25 per03.test.local mysql[6378]:  * The server quit without updating PID file (/var/run/mysqld/mysqld.pid).
Jan 04 12:38:25 per03.test.local mysql[6378]:    ...fail!
Jan 04 12:38:25 per03.test.local systemd[1]: mysql.service: Control process exited, code=exited, status=1/FAILURE
Jan 04 12:38:25 per03.test.local systemd[1]: mysql.service: Failed with result 'exit-code'.
Jan 04 12:38:25 per03.test.local systemd[1]: Failed to start LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon.

Any ideas, guys? I can provide any additional information.


Please provide the mysql error log and not the output of systemctl.

Also, init.d scripts have been deprecated/dead for almost 7 years across all major operating systems. Why are you not using systemctl?


Thank you for your comment.

I used them because that's what the manual recommends. But I also tried systemctl start mysql, and unfortunately the result is the same. Maybe there are some errors or omissions in the installation process?


Hmm. The manual is very much out of date then in that regard.

Did you provide the logs? I don’t see them above.


Sure.

host = per03.test.local

  1. journalctl -xe
Jan 05 10:53:23 per03.test.local systemd[1]: Starting LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon...
-- Subject: A start job for unit mysql.service has begun execution
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit mysql.service has begun execution.
--
-- The job identifier is 630.
Jan 05 10:53:23 per03.test.local mysql[15818]:  * Starting MySQL (Percona XtraDB Cluster) database server mysqld
Jan 05 10:53:23 per03.test.local /etc/init.d/mysql[15876]: MySQL PID not found, pid_file detected/guessed: /var/run/mysqld/mysqld.pid
Jan 05 10:53:25 per03.test.local mysql[15818]:  * State transfer in progress, setting sleep higher mysqld
Jan 05 10:53:45 per03.test.local mysql[15818]:  * The server quit without updating PID file (/var/run/mysqld/mysqld.pid).
Jan 05 10:53:45 per03.test.local mysql[15818]:    ...fail!
Jan 05 10:53:45 per03.test.local systemd[1]: mysql.service: Control process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- An ExecStart= process belonging to unit mysql.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
Jan 05 10:53:45 per03.test.local systemd[1]: mysql.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit mysql.service has entered the 'failed' state with result 'exit-code'.
Jan 05 10:53:45 per03.test.local systemd[1]: Failed to start LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon.
-- Subject: A start job for unit mysql.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit mysql.service has finished with a failure.
--
-- The job identifier is 630 and the job result is failed.
  2. tail /var/log/mysql/error.log
    It is empty.

What other logs could I provide you to understand the situation?


Hello Dmitriy,
We see that it is trying to get a state transfer, which is expected:
Jan 05 10:53:25 per03.test.local mysql[15818]:  * State transfer in progress

So can we get the mysql error logs from the donor, i.e. the server which you bootstrapped?
Also, post the config file for per03.


Connect to node1 and run SELECT @@log_error; that will give the location of MySQL's error log. Please provide the most recent contents from when you tried to start node2.


Thank you all for your participation. Problem solved.

I had two platforms, and this problem arose on each of them. However, it is a mistake to assume that the problem is "MySQL PID not found, pid_file detected/guessed: /var/run/mysqld/mysqld.pid" itself. This is just a symptom that can be caused by various problems, namely:

  1. Mistakes in host naming
  2. Mistakes in names or IP addresses in the wsrep.cnf file
  3. Mistakes in mysqld.cnf
  4. Lack of access to the donor host
  5. Password errors for user sstuser

My recommendations:
If you have a similar error on the first node, try to get it started minimally:

  1. Leave the default mysqld.cnf configuration file and do not edit it.
  2. Do not specify additional addresses in the wsrep option “wsrep_cluster_address=gcomm://”

It should start. Then gradually make your changes.

If the first donor host starts up for you, but the rest do not start, similar to my problem, try the following:

  1. Check that the mysqld.cnf file is the same as on the donor host.
  2. Check that there are no mistakes in names or IP addresses in the wsrep.cnf file.
  3. Check that the local hostnames are correct and the hosts are reachable from each other.
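The reachability check from the list above can be sketched like this (a sketch assuming bash and coreutils timeout; 4444 and 4568 are the standard SST and IST ports, alongside 3306 and 4567):

```shell
# From a joiner node, check that the donor resolves and that the ports
# Galera needs are reachable: 3306 (client), 4444 (SST), 4567 (group
# communication), 4568 (IST). The donor address is from this thread.
donor=10.0.2.30
getent hosts per01.test.local || echo "per01.test.local does not resolve"
for port in 3306 4444 4567 4568; do
  if timeout 2 bash -c "</dev/tcp/${donor}/${port}" 2>/dev/null; then
    echo "port ${port}: open"
  else
    echo "port ${port}: closed/filtered"
  fi
done
```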

Check the following log if you are having problems:
/var/log/mysqld.log

Also, I did not resort to this myself, but given how little information is provided for analysis in this case, a thought: you could temporarily adjust the script that launches the mysql service and redirect the error output somewhere instead of /dev/null. See this article on how this works: What is /dev/null in Linux? - Ravesli
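A generic illustration of that tip (run_with_log and the paths are made-up names for illustration): append stderr to a log file instead of discarding it with 2>/dev/null.

```shell
# Instead of sending a command's stderr to /dev/null, append it to a log
# file that can be inspected afterwards.
run_with_log() {
  logfile=$1; shift
  "$@" 2>>"$logfile"    # keep errors instead of dropping them
}

run_with_log /tmp/startup.err ls /nonexistent-path || true
cat /tmp/startup.err    # the error message is preserved here
```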

In my case, there were different errors in different places that visually led to the same symptom.

Have a nice day!
