Percona XtraDB 8.0.30-22 - Nodes 2 and 3 refuse to connect

I can bootstrap the first node and it functions as it should. However, for some reason I cannot get nodes 2 and 3 to start. Here are the contents of the mysqld.cnf file on Ubuntu 22.04.1 LTS (Jammy):

[client]
socket=/var/run/mysqld/mysqld.sock

[mysqld]
server-id=2
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
pid-file=/var/run/mysqld/mysqld.pid

binlog_expire_logs_seconds=604800
wsrep_provider=/usr/lib/galera4/libgalera_smm.so
wsrep_cluster_address=gcomm://192.168.56.250,192.168.56.251,192.168.56.249
binlog_format=ROW
wsrep_slave_threads=8
wsrep_log_conflicts
innodb_autoinc_lock_mode=2
wsrep_node_address=192.168.56.250
wsrep_cluster_name=nextcloud
wsrep_node_name=nextcloud2
pxc_strict_mode=ENFORCING
wsrep_sst_method=xtrabackup-v2

Obviously, node 3 has its own IP address and hostname, with the same cluster name.

Here is the output of systemctl status mysql.service:
mysql.service - Percona XtraDB Cluster
Loaded: loaded (/lib/systemd/system/mysql.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2022-12-30 16:33:40 UTC; 13min ago
Process: 5356 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
Process: 5393 ExecStartPre=/usr/bin/mysql-systemd check-grastate (code=exited, status=0/SUCCESS)
Process: 5422 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 5424 ExecStartPre=/bin/sh -c VAR=`bash /usr/bin/mysql-systemd galera-recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
Process: 5514 ExecStart=/usr/sbin/mysqld $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
Process: 5517 ExecStopPost=/usr/bin/mysql-systemd stop-post (code=exited, status=0/SUCCESS)
Main PID: 5514 (code=exited, status=1/FAILURE)
Status: “Server shutdown complete”
CPU: 1.098s

Dec 30 16:33:03 nextcloud2 systemd[1]: Starting Percona XtraDB Cluster

Dec 30 16:33:40 nextcloud2 systemd[1]: mysql.service: Main process exited, code=exited, status=1/FAILURE
Dec 30 16:33:40 nextcloud2 mysql-systemd[5517]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Dec 30 16:33:40 nextcloud2 mysql-systemd[5517]: WARNING: mysql may be already dead
Dec 30 16:33:40 nextcloud2 systemd[1]: mysql.service: Failed with result ‘exit-code’.
Dec 30 16:33:40 nextcloud2 systemd[1]: Failed to start Percona XtraDB Cluster.
Dec 30 16:33:40 nextcloud2 systemd[1]: mysql.service: Consumed 1.098s CPU time.

If there is any more information required then please let me know.
I have tried many different options including re-installing and rebuilding 3 x new VMs but I keep getting the same issue where the bootstrap node works correctly but nodes 2 and 3 will not even start.

Any help appreciated.
Many thanks

Can you ensure that you have wsrep_sst_auth configured on the donor node? Read about this here,

Also, ensure you have the ports for SST open between all the nodes.

Four TCP ports are used by Percona XtraDB Cluster:

  • the regular MySQL port (default 3306)
  • port for group (Galera) communication (default 4567)
  • port for State Transfer (default 4444)
  • port for Incremental State Transfer (default is: port for group communication (4567) + 1 = 4568)
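A rough way to test the ports listed above from one node is sketched below. The node addresses are the ones from wsrep_cluster_address in the question; the function names are made up for this example. Note that, as far as I know, 4444 and 4568 only have a listener while a state transfer is actually running, so "unreachable" on those two does not by itself prove a firewall problem.

```shell
# Sketch: probe the four PXC ports on each cluster node.
# NODES defaults to the addresses from the question; override as needed.
NODES="${NODES:-192.168.56.249 192.168.56.250 192.168.56.251}"
PORTS="${PORTS:-3306 4567 4444 4568}"

# Return 0 if a TCP connection to host $1, port $2 succeeds within 2 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash must be installed.
port_open() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Print one "host:port open|unreachable" line per host/port pair.
check_cluster_ports() {
    for host in $NODES; do
        for port in $PORTS; do
            if port_open "$host" "$port"; then
                echo "$host:$port open"
            else
                echo "$host:$port unreachable"
            fi
        done
    done
}

# Only probe the network when explicitly asked to, e.g. "bash ports.sh run".
if [ "${1:-}" = "run" ]; then
    check_cluster_ports
fi
```

Running it from each node in turn shows which direction, if any, is blocked.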

This doc should help clear up any remaining doubts about configuring PXC.

@yunus_shaikh wsrep_sst_auth was removed in 8.0 so that’s not an issue anymore.

@adgwytc PXC 8.0 also enables SSL by default. You are required to copy the SSL certificates from the bootstrap node to the other nodes before you start them. All nodes must use the same SSL certificates, otherwise connections will fail. You can set pxc-encrypt-cluster-traffic=OFF as a temporary workaround to make sure your nodes can actually connect, but best security practice is to leave this on and fix the SSL certificates.
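For reference, the temporary workaround would look roughly like this in the [mysqld] section of mysqld.cnf. This is only a sketch of the setting named above; it presumably has to be applied on every node, since a node with encryption on is not expected to connect to one with it off.

```ini
[mysqld]
# Temporary diagnostic setting only: turns off encryption of cluster traffic.
# Revert this once all nodes share the same certificates.
pxc-encrypt-cluster-traffic=OFF
```

If the nodes join with this set, the network and cluster configuration are fine and the remaining problem is in the certificates.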

Thank you both for your advice. Much appreciated. I will be following the advice tomorrow morning at work.

Strangely, the way I configured and copied the SSL certs worked fine on Red Hat but does not appear to work on Ubuntu. The Red Hat Percona version was 8.0 too.

Thanks again and I shall update tomorrow when I have tested.

So, it looks like there is an issue with the certs. If I turn certs off then all systems connect without a problem. I will look into this and update.

Unfortunately I still cannot get this functioning with the certs. Here is how I am creating the certs on the bootstrapped node and then copying across to the other nodes:

openssl req -new -newkey rsa:4096 -nodes -keyout nextcloud1.key -out nextcloud1.csr
openssl x509 -req -sha256 -days 365 -in nextcloud1.csr -signkey nextcloud1.key -out nextcloud1.pem
openssl x509 -req -days 365 -in nextcloud1.csr -signkey nextcloud1.key -out nextcloud1.crt

The same steps are then repeated for nextcloud2 and nextcloud3, giving three sets of certs. These are copied to the following directory:

/var/lib/mysql/

and then called from within mysqld.cnf file as follows:

[client]
socket=/var/run/mysqld/mysqld.sock
ssl-ca = /var/lib/mysql/nextcloud1.pem
ssl-cert = /var/lib/mysql/nextcloud1.crt
ssl-key = /var/lib/mysql/nextcloud1.key

[mysqld]
server-id=1
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
pid-file=/var/run/mysqld/mysqld.pid
tls_version=TLSv1.2
require_secure_transport = ON
ssl-ca = /var/lib/mysql/nextcloud1.pem
ssl-cert = /var/lib/mysql/nextcloud1.crt
ssl-key = /var/lib/mysql/nextcloud1.key

The bootstrap node fails to come up with this configuration but works without it.

Any ideas?

Don’t put the certs in /var/lib/mysql as they will be overwritten. Create something like /etc/ssl/mysql, put all certs there, and adjust my.cnf to use that path.

Hi Matthew,
Thanks for the heads up, but that is not the issue. If that were the case it would fail on Red Hat too, as that is where I placed them there. But, just to be safe, I followed your advice and completed the following:

mkdir /etc/ssl/nextcloud - I set the ownership of the directory and of all the certs to mysql:mysql.

I did the same on the other two servers.

Now, it could be I have made the certs incorrectly here for Ubuntu. Let’s go through what I did:

Server 1: (nextcloud1)
openssl req -new -newkey rsa:4096 -nodes -keyout nextcloud1.key -out nextcloud1.csr
openssl x509 -req -sha256 -days 365 -in nextcloud1.csr -signkey nextcloud1.key -out cacert.pem
openssl x509 -req -days 365 -in nextcloud1.csr -signkey nextcloud1.key -out nextcloud1.crt

Server 2: (nextcloud2)
openssl req -new -newkey rsa:4096 -nodes -keyout nextcloud2.key -out nextcloud2.csr
openssl x509 -req -sha256 -days 365 -in nextcloud2.csr -signkey nextcloud2.key -out cacert.pem
openssl x509 -req -days 365 -in nextcloud2.csr -signkey nextcloud2.key -out nextcloud2.crt

Server 3: (nextcloud3)
openssl req -new -newkey rsa:4096 -nodes -keyout nextcloud3.key -out nextcloud3.csr
openssl x509 -req -sha256 -days 365 -in nextcloud3.csr -signkey nextcloud3.key -out cacert.pem
openssl x509 -req -days 365 -in nextcloud3.csr -signkey nextcloud3.key -out nextcloud3.crt

As per recommendations this was all completed on the bootstrap server (nextcloud1) and then copied to the locations on server 2 and server 3.

However, because there could only be one .pem file, it was never overwritten, and my guess is that it should remain so (not that it makes much difference, as server 2 and server 3 are not even being started yet - this is purely server 1).

Here is the relevant section of the mysqld.cnf file:
[client]
socket=/var/run/mysqld/mysqld.sock
ssl-ca = /etc/ssl/nextcloud/cacert.pem
ssl-cert = /etc/ssl/nextcloud/nextcloud1.crt
ssl-key = /etc/ssl/nextcloud/nextcloud1.key

[mysqld]
server-id=1
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
pid-file=/var/run/mysqld/mysqld.pid
tls_version=TLSv1.2
require_secure_transport = ON
ssl-ca = /etc/ssl/nextcloud/cacert.pem
ssl-cert = /etc/ssl/nextcloud/nextcloud1.crt
ssl-key = /etc/ssl/nextcloud/nextcloud1.key

The error is the typical “There is something wrong with your certs” error as shown below:

Jan 04 15:46:24 nextcloud1 systemd[1]: Starting Percona XtraDB Cluster

Jan 04 15:46:24 nextcloud1 mysql-systemd[19103]: WARNING: PXC is in bootstrap mode. To switch to normal operation, first stop the mysql@bootstrap.service then start the mysql service
Jan 04 15:46:24 nextcloud1 systemd[1]: mysql.service: Control process exited, code=exited, status=1/FAILURE
Jan 04 15:47:54 nextcloud1 systemd[1]: mysql.service: State ‘stop-post’ timed out. Terminating.
Jan 04 15:47:54 nextcloud1 systemd[1]: mysql.service: Control process exited, code=killed, status=15/TERM
Jan 04 15:47:54 nextcloud1 systemd[1]: mysql.service: Failed with result ‘exit-code’.
Jan 04 15:47:54 nextcloud1 systemd[1]: mysql.service: Unit process 19257 (sleep) remains running after unit stopped.
Jan 04 15:47:54 nextcloud1 systemd[1]: Failed to start Percona XtraDB Cluster.

root@nextcloud1:/etc/ssl/nextcloud# mysql -u root -p
Enter password:
ERROR 2026 (HY000): SSL connection error: error:0A000086:SSL routines::certificate verify failed

There is nothing showing in /var/log/mysql/error.log

Any further ideas please? I have tried regenerating the certs a multitude of times now with no luck.

No no, the SAME certs on all 3 machines. Don’t make new ones for each machine. Make the certs only once and copy to all 3 machines.
Also, remove require_secure_transport from my.cnf.
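As an illustration of "make the certs only once", here is a minimal sketch that creates one CA and one server certificate signed by that CA, following the general openssl pattern from the MySQL manual rather than any procedure confirmed in this thread. The file names, the subject names (nextcloud-ca, nextcloud-node), the validity period and the scratch directory are all assumptions for the example.

```shell
# Sketch only: one CA and one server certificate, generated once on the first
# node and then copied unchanged to the other nodes.
# CERT_DIR defaults to a scratch directory; on a real node it might be
# /etc/ssl/nextcloud (the path used in this thread).
CERT_DIR="${CERT_DIR:-$(mktemp -d)}"

# 1. CA private key and self-signed CA certificate.
openssl genrsa -out "$CERT_DIR/ca-key.pem" 2048
openssl req -new -x509 -nodes -days 3650 -key "$CERT_DIR/ca-key.pem" \
    -subj "/CN=nextcloud-ca" -out "$CERT_DIR/ca.pem"

# 2. Server key (unencrypted because of -nodes) and certificate request.
#    The subject is deliberately different from the CA's subject.
openssl req -newkey rsa:2048 -nodes -keyout "$CERT_DIR/server-key.pem" \
    -subj "/CN=nextcloud-node" -out "$CERT_DIR/server-req.pem"

# 3. Sign the request with the CA key instead of self-signing it.
openssl x509 -req -in "$CERT_DIR/server-req.pem" -days 3650 \
    -CA "$CERT_DIR/ca.pem" -CAkey "$CERT_DIR/ca-key.pem" -set_serial 01 \
    -out "$CERT_DIR/server-cert.pem"

# 4. Check that the server certificate verifies against the CA.
openssl verify -CAfile "$CERT_DIR/ca.pem" "$CERT_DIR/server-cert.pem"

# 5. Copy the same three files to every node (hypothetical command):
#    scp ca.pem server-cert.pem server-key.pem nextcloud2:/etc/ssl/nextcloud/
```

With these names, every node's mysqld.cnf would point ssl-ca at ca.pem, ssl-cert at server-cert.pem and ssl-key at server-key.pem. The difference from the commands in the question is step 3: the server certificate is signed by a separate CA key rather than by its own key.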

Hi Matthew,

Again, many thanks for the response. It seems that the bootstrap system just does not like the certs and I have no idea why.

I have completed as you stated (below):

Removed all current certs from each system and confirmed no mysql running with “systemctl status mysql”.
On server 1 (nextcloud1) I completed the following:

openssl req -new -newkey rsa:4096 -nodes -keyout nextcloud1.key -out nextcloud1.csr
openssl x509 -req -sha256 -days 365 -in nextcloud1.csr -signkey nextcloud1.key -out cacert.pem
openssl x509 -req -days 365 -in nextcloud1.csr -signkey nextcloud1.key -out nextcloud1.crt

I then copied these to the other two servers into “/etc/ssl/nextcloud/” and made them mysql:mysql.

In the mysqld.cnf file I commented out “require_secure_transport” as recommended.

I then restarted with “systemctl start mysql@bootstrap.service” and I still get the same errors as shown below:

Jan 04 20:48:22 nextcloud1 systemd[1]: Starting Percona XtraDB Cluster

Jan 04 20:48:56 nextcloud1 systemd[1]: mysql.service: Main process exited, code=exited, status=1/FAILURE
Jan 04 20:48:56 nextcloud1 mysql-systemd[20158]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Jan 04 20:48:56 nextcloud1 mysql-systemd[20158]: WARNING: mysql may be already dead
Jan 04 20:48:56 nextcloud1 systemd[1]: mysql.service: Failed with result ‘exit-code’.
Jan 04 20:48:56 nextcloud1 systemd[1]: Failed to start Percona XtraDB Cluster.

Trying to login to see:
root@nextcloud1:/etc/ssl/nextcloud# mysql -u root -p
Enter password:
ERROR 2026 (HY000): SSL connection error: error:0A000086:SSL routines::certificate verify failed

Error log from this morning:
2023-01-05T08:56:36.564240Z 1 [Note] [MY-000000] [WSREP] Server status change initialized -> joined
2023-01-05T08:56:36.564266Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2023-01-05T08:56:36.564281Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2023-01-05T08:56:36.571772Z 1 [Note] [MY-000000] [Galera] Recording CC from group: 28
2023-01-05T08:56:36.571797Z 1 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from group: 28
2023-01-05T08:56:36.571815Z 1 [Note] [MY-000000] [Galera] Min available from gcache for CC from group: 1
2023-01-05T08:56:36.571836Z 1 [Note] [MY-000000] [Galera] Server nextcloud1 synced with group
2023-01-05T08:56:36.571844Z 1 [Note] [MY-000000] [WSREP] Server status change joined -> synced
2023-01-05T08:56:36.571849Z 1 [Note] [MY-000000] [WSREP] Synchronized with group, ready for connections
2023-01-05T08:56:36.571853Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.

And finally, the full “mysqld.cnf” file without the comments:
[client]
socket=/var/run/mysqld/mysqld.sock
ssl-ca = /etc/ssl/nextcloud/cacert.pem
ssl-cert = /etc/ssl/nextcloud/nextcloud1.crt
ssl-key = /etc/ssl/nextcloud/nextcloud1.key

[mysqld]
server-id=1
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
pid-file=/var/run/mysqld/mysqld.pid
tls_version=TLSv1.2
#require_secure_transport = ON
ssl-ca = /etc/ssl/nextcloud/cacert.pem
ssl-cert = /etc/ssl/nextcloud/nextcloud1.crt
ssl-key = /etc/ssl/nextcloud/nextcloud1.key

binlog_expire_logs_seconds=604800
wsrep_provider=/usr/lib/galera4/libgalera_smm.so
wsrep_cluster_address=gcomm://
binlog_format=ROW
wsrep_slave_threads=8
wsrep_log_conflicts
innodb_autoinc_lock_mode=2
wsrep_node_address=192.168.56.251
wsrep_cluster_name=nextcloud
wsrep_node_name=nextcloud1
pxc_strict_mode=ENFORCING
wsrep_sst_method=xtrabackup-v2

The actual cluster, as mentioned before, works fine and tested good without the certs, so something is not right with the certs but I cannot work out what it is.

The way I did it initially works fine on Red Hat; I just cannot get it to work on Ubuntu.

Update:

I removed the certs again and when re-configuring them I did not apply any passcode at the end of the information insertion for the CSR.

When I attempted to restart via bootstrap it failed, however, when I utilised “systemctl start mysql” it started, but, when trying to login I get the same error as follows:

root@nextcloud1:/etc/ssl/nextcloud# systemctl start mysql
root@nextcloud1:/etc/ssl/nextcloud# systemctl status mysql
● mysql.service - Percona XtraDB Cluster
Loaded: loaded (/lib/systemd/system/mysql.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2023-01-05 09:28:12 UTC; 4s ago
Process: 22020 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
Process: 22057 ExecStartPre=/usr/bin/mysql-systemd check-grastate (code=exited, status=0/SUCCESS)
Process: 22086 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 22088 ExecStartPre=/bin/sh -c VAR=`bash /usr/bin/mysql-systemd galera-recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
Process: 22193 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 22195 ExecStartPost=/usr/bin/mysql-systemd start-post $MAINPID (code=exited, status=0/SUCCESS)
Main PID: 22136 (mysqld)
Status: “Server is operational”
Tasks: 52 (limit: 9407)
Memory: 394.7M
CPU: 2.650s
CGroup: /system.slice/mysql.service
└─22136 /usr/sbin/mysqld --wsrep_start_position=dbdd8456-883b-11ed-bfe0-860417cae853:29

root@nextcloud1:/etc/ssl/nextcloud# mysql -u root -p
Enter password:
ERROR 2026 (HY000): SSL connection error: error:0A000086:SSL routines::certificate verify failed

Given that all three systems were shut down in the correct order, the first system should be started again in bootstrap mode. However, that does not explain why I still cannot log in.

Weirdly, the “systemctl start mysql” will only start if the following line is blank:

wsrep_cluster_address=gcomm://

If I place any addresses of the other nodes in here it will not run.

Many thanks

This combination:

wsrep_cluster_address=gcomm://
systemctl start mysql

is the SAME as this combination:

wsrep_cluster_address=gcomm://node1,node3,node4
systemctl start mysql@bootstrap

An empty gcomm:// is the same thing as bootstrap.

Systemctl treats mysql and mysql@bootstrap as 2 completely independent services. You MUST stop mysql@bootstrap if that is how you started it earlier.
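To make the equivalence concrete, the sketch below reads wsrep_cluster_address from a config file and reports whether a plain "systemctl start mysql" would bootstrap a new cluster (empty gcomm://) or try to join the listed nodes. The function names are invented for this example, and it only handles the underscore spelling of the option on a single line.

```shell
# Sketch: does this config make a plain "systemctl start mysql" bootstrap?

# Print the last wsrep_cluster_address value found in config file $1.
cluster_address() {
    awk '/^[ \t]*wsrep_cluster_address[ \t]*=/ {
             v = $0
             sub(/^[^=]*=/, "", v)     # drop everything up to the first "="
             gsub(/[ \t\r]/, "", v)    # strip whitespace
             last = v
         }
         END { print last }' "$1"
}

# Return 0 if the address is an empty gcomm://, i.e. bootstrap behaviour.
is_bootstrap_config() {
    [ "$(cluster_address "$1")" = "gcomm://" ]
}

# Print "bootstrap" or "join" for config file $1.
describe_start_mode() {
    if is_bootstrap_config "$1"; then
        echo "bootstrap"
    else
        echo "join"
    fi
}
```

For example, describe_start_mode /etc/mysql/mysql.conf.d/mysqld.cnf (the path is an assumption; use wherever your mysqld.cnf lives) would print "bootstrap" for the configuration shown in the question. Once the other nodes have joined, the first node would be switched to normal operation by stopping mysql@bootstrap.service and then starting mysql with the full address list restored.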

Remove the RSA passphrase from your SSL keys: openssl rsa -in ca-key.pem -out ca-key.pem
Here is documentation for how I create the SSL certificates in our training labs. I know this works because I use it weekly.
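The passphrase removal mentioned above can be tried out safely on a throwaway key first. The sketch below creates a key protected with the demo passphrase "secret", writes an unencrypted copy with openssl rsa, and checks both files. The helper name, the file names and the passphrase are assumptions for the example; the check relies on encrypted PEM keys carrying the word ENCRYPTED in their header.

```shell
# Return 0 if PEM private key file $1 is passphrase-protected.
key_is_encrypted() {
    grep -q 'ENCRYPTED' "$1"
}

WORK_DIR="$(mktemp -d)"

# Throwaway RSA key protected with the demo passphrase "secret".
openssl genrsa -aes256 -passout pass:secret -out "$WORK_DIR/enc-key.pem" 2048

# Write an unencrypted copy of the same key.
openssl rsa -in "$WORK_DIR/enc-key.pem" -passin pass:secret \
    -out "$WORK_DIR/plain-key.pem"

key_is_encrypted "$WORK_DIR/enc-key.pem" && echo "enc-key.pem: passphrase-protected"
key_is_encrypted "$WORK_DIR/plain-key.pem" || echo "plain-key.pem: no passphrase"
```

On a real node, key_is_encrypted could be pointed at the file named in ssl-key to see whether this step is needed at all.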

Hi Matthew,

Now I know where I was going wrong.

This has worked perfectly and I now have the Cluster running with the certs enabled in mysqld.cnf.

Many thanks for your help with regards to this. Very much appreciated.

Hi Matthewb,

If we specify the certificate paths in mysqld.cnf, then it will not look in the default data dir for the certificates that were created automatically. Is that right?

Hi @rahuls50, yes, that is correct and that is the point. If you use the automatically generated certificates then the certs will be different on node1 and node2, and node2 will not be able to connect to node1. It is documented on our website that the certificates must be from the same CA in order for PXC node encryption to work.
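One way to confirm that two nodes really hold the same CA certificate is to compare fingerprints. This is a small sketch with made-up helper names; it uses only openssl x509 -fingerprint.

```shell
# Print the SHA-256 fingerprint of PEM certificate file $1.
cert_fingerprint() {
    openssl x509 -noout -fingerprint -sha256 -in "$1" | cut -d= -f2
}

# Return 0 if both files contain the same (readable) certificate.
same_cert() {
    fp_a="$(cert_fingerprint "$1")"
    fp_b="$(cert_fingerprint "$2")"
    [ -n "$fp_a" ] && [ "$fp_a" = "$fp_b" ]
}
```

Running cert_fingerprint on the file named in ssl-ca on each node and comparing the output by eye is usually enough; if the values differ, the nodes are not using the same CA.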

Hello Matthewb,

Thank you for your response.

Can you please tell me about the certificates:

CA.pem = this is the root CA certificate
server-key.pem = I am confused about what this should be
server-cert.pem = I am confused about what this should be

I have a wildcard certificate (e.g. *.example.com).

Can I use this wildcard certificate as my server-cert.pem?

I have example.crt, which includes the root CA, the intermediate CA, and *.example.com.

I also have example.pem, which includes the root CA, the intermediate CA, *.example.com, and the server key.

Also, does it need to be in .pem format only?

Please let me know.
I always get confused when it comes to certificates.

@rahuls50 please open a new forum topic to discuss other issues so we can close this one out. thanks.

Hello Matthewb,

OK, I will open a new one.