Joining a new node to a cluster

I am starting a new Percona XtraDB Cluster 8.x cluster.
My bootstrapped node is up and running.
Here is the my.cnf on node 2:

[mysqld]
server-id=5
datadir=/db
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
pid-file=/var/run/mysqld/mysqld.pid
wsrep_provider=/usr/lib/galera4/libgalera_smm.so
wsrep_cluster_address=gcomm://192.168.2.61 #first node's ip address
binlog_format=ROW
innodb_autoinc_lock_mode=2
wsrep_node_address=192.168.4.71
wsrep_cluster_name=Cluster-2021
wsrep_node_name=Node2
wsrep_sst_method=xtrabackup-v2
wsrep_sst_donor=192.168.2.61
wsrep_sst_auth=rep:2021_rep

I am getting the following errors. From Galera: "Fail to access the file (/db//gvwstate.dat) error (No such file or directory)", even though the /db directory does exist.

connecting to group '
[Galera] handshake with remote endpoint ssl://192.168.2.61:4567 failed: asio.ssl:337047686: 'certificate verify failed' ( 337047686: 'error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed')
[Galera] handshake with remote endpoint ssl://192.168.2.61:4567 failed: asio.ssl:337047686: 'certificate verify failed' ( 337047686: 'error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed')

Can someone give some advice? Is this a security certificate issue?


Yes, it is. PXC 8 has SSL/TLS enabled by default. You need to copy the *.pem files that were created on node1 over to the data directory on node2, then start node2. Same for node3. These steps are detailed in our documentation.


I bypassed the SSL issue by setting pxc-encrypt-cluster-traffic = OFF. I am now getting a connection timeout. However, I am not sure why the port is listed as 4567: there is no port in my.cnf or in the service file, and on the working node SHOW GLOBAL VARIABLES LIKE 'PORT'; returns 3306. Is there a port setting for this?

(909df668-b14a, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.2.61:4567 timed out, no messages seen in PT3S, socket stats: rtt: 217 rttvar: 108 rto: 204000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000296819 last_delivered_since: 3000296819 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
The port is open, but I still get this error.


After making some changes in my.cnf, I am getting this error: failed to establish connection with cluster (reason: 7)
I've bootstrapped the first node and am now trying to join the second node, but I get the error above.
wsrep_cluster_address has the IP of the bootstrapped node.
I also see: Failed to open backend connection: -110


Ports 4567 (group communication) and 4568 (IST) are used by Galera itself; these, plus 4444 (SST), must be open between all nodes. Galera does not use 3306; 3306 is for application <-> MySQL connections.


I updated the my.cnf file for the bootstrapped node and restarted it:
wsrep_provider_options="gmcast.listen_addr=tcp://192.168.4.61:23306; ist.recv_addr=192.168.4.61:43306"
wsrep_sst_receive_address=192.168.4.61:33306

On my joiner node, I updated my.cnf to:
wsrep_provider_options="gmcast.listen_addr=tcp://192.168.4.72:23306; ist.recv_addr=192.168.4.72:43306"
wsrep_sst_receive_address=192.168.4.72:33306

My bootstrapped node is up, but when I try to start the joiner node I get:

[Galera] gcomm: connecting to group 'DB-cluster', peer '192.168.2.61:'
[Galera] failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out) at gcomm/src/pc.cpp:connect():161
[Galera] gcs/src/gcs_core.cpp:gcs_core_open():219: Failed to open backend connection: -110 (Connection timed out)
[Galera] gcs/src/gcs.cpp:gcs_open():1757: Failed to open channel 'DB-cluster' at 'gcomm://192.168.2.61': -110 (Connection timed out)
[WSREP] Provider/Node (gcomm://192.168.2.61) failed to establish connection with cluster (reason: 7)
I checked whether SELinux is enabled on both machines; it is disabled on both.
I can ping the 192.168.2.61 address, so it should not be a connection issue.


I recommend that you do not change the ports; you will only add to the confusion already here.

Ping doesn't tell you anything about whether these ports are reachable. You need to check the actual ports:

node1: socat - TCP-LISTEN:4444
node2: echo "hello" | socat - TCP:ip.adr.of.node1:4444

Then swap the nodes and run the commands again. Do this for all ports involved.
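If socat is not installed on a node, the same listen/send test can be sketched with Python's standard socket module. This is an illustrative equivalent of the two socat commands above, not part of PXC; the function names listen_once and send_hello are hypothetical. Like socat, the listener can only bind a port that nothing else on that host is already using.

```python
import socket


def listen_once(port, host="0.0.0.0", timeout=30.0):
    """Rough equivalent of `socat - TCP-LISTEN:<port>`: accept one
    connection and return everything the peer sent before closing."""
    # Raises OSError (EADDRINUSE) if something already holds the port,
    # which is what socat reports as "Address already in use".
    with socket.create_server((host, port)) as srv:
        srv.settimeout(timeout)
        conn, _addr = srv.accept()
        with conn:
            conn.settimeout(timeout)
            chunks = []
            while True:
                data = conn.recv(4096)
                if not data:  # peer closed its side
                    break
                chunks.append(data)
    return b"".join(chunks)


def send_hello(host, port, message=b"hello\n", timeout=5.0):
    """Rough equivalent of `echo "hello" | socat - TCP:<host>:<port>`."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(message)
```

For example, listen_once(4444) on node1 blocks until node2 runs send_hello("192.168.2.61", 4444), then returns b"hello\n". Swap the roles and repeat for each port.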


OK, I removed the wsrep_provider_options line. I then tested all of the ports: I get "hello" back on ports 4444 and 4568, and on port 4567 I get "already in use". I am guessing that means the ports are open for exchanging data?
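A note on "already in use": that error comes from the listening side (TCP-LISTEN), when the port is already bound by another process on that host, so it says nothing about reachability. On a running PXC node the holder is normally mysqld's own Galera listener (the log earlier in this thread shows 'tcp://0.0.0.0:4567'). A minimal sketch of the same local check, assuming Python 3 on the node; port_in_use is a hypothetical helper name:

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if some process on this host already has `port` bound.

    This is the condition socat reports as "Address already in use". A port
    still in TIME_WAIT from a recently closed connection can also report True.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. permission or bad address: not an "in use" answer
    return False
```

port_in_use(4567) returning True on node1 while PXC is running is expected; to test 4567 itself, run the listener on the node where mysqld is stopped.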


The sync process is failing with the following errors. This is from the donor:

xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
xb_stream_write_data() failed.
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
[01] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[01] xtrabackup: Error: failed to copy datafile.
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
xb_stream_write_data() failed.
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
[02] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[02] xtrabackup: Error: failed to copy datafile.
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
xb_stream_write_data() failed.
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
xb_stream_write_data() failed.
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
[04] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[04] xtrabackup: Error: failed to copy datafile.
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
[03] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[03] xtrabackup: Error: failed to copy datafile.                                                                                    
211207 11:21:13 >> log scanned up to (10790810915817)
[ERROR] [MY-000000] [WSREP-SST] ------------innobackup.backup.log (END) ------------
[ERROR] [MY-000000] [WSREP-SST] ******************************************************
[ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:22
[ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.4.71:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/db1/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --mysqld-version '8.0.25-15.1'   '' --gtid 'bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:28' : 22 (Invalid argument)
[Note] [MY-000000] [Galera] SST sending failed: -22
[Note] [MY-000000] [WSREP] Server status change donor -> joined 
[Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
[ERROR] [MY-000000] [WSREP] Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.4.71:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/db1/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --mysqld-version '8.0.25-15.1'   '' --gtid 'bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:28'
[Warning] [MY-000000] [Galera] 0.0 (DB403): State transfer to 1.0 (DBDEV) failed: -22 (Invalid argument)
[Note] [MY-000000] [Galera] Shifting DONOR/DESYNCED -> JOINED (TO: 28)
[Note] [MY-000000] [Galera] Member 0.0 (DB403) synced with group.
[Warning] [MY-000000] [Galera] Peer (IST receiver) 751374bb-574e-11ec-8506-22e187707781 for IST AsyncSender seems to be disconnected. Terminating IST AsyncSender.

Here is some of the log from the joiner. I am not sure why the first line says the node is running in bootstrap mode when I am joining a cluster.

[Warning] [MY-000000] [WSREP] Node is running in bootstrap/initialize mode. Disabling pxc_strict_mode checks
[Galera] Server DBDEV connected to cluster at position bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:28 with ID 751374bb-574e-11ec-8506-22e187707781
[MY-000000] [WSREP] Prepared SST request: xtrabackup-v2|192.168.4.71:4444/xtrabackup_sst//1
[Note] [MY-000000] [Galera] Check if state gap can be serviced using IST
[Note] [MY-000000] [Galera] Local UUID: 00000000-0000-0000-0000-000000000000 != Group UUID: bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33
[Note] [MY-000000] [Galera] ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 28, STRv: 3
[Note] [MY-000000] [Galera] IST receiver addr using tcp://192.168.4.71:4568
[Note] [MY-000000] [Galera] Prepared IST receiver for 0-28, listening at: tcp://192.168.4.71:4568
[Note] [MY-000000] [Galera] Member 1.0 (DBDEV) requested state transfer from 'DB403'. Selected 0.0 (DB403)(SYNCED) as donor.
[Note] [MY-000000] [Galera] Shifting PRIMARY -> JOINER (TO: 28)
[Note] [MY-000000] [Galera] Requesting state transfer: success, donor: 0
[Note] [MY-000000] [WSREP-SST] Proceeding with SST.........
[Note] [MY-000000] [WSREP-SST] ............Waiting for SST streaming to complete!
[Note] [MY-000000] [Galera] (751374bb-8506, 'tcp://0.0.0.0:4567') turning message relay requesting off
[ERROR] [MY-000000] [WSREP-SST] Killing SST (30922) with SIGKILL after stalling for 120 seconds
[Warning] [MY-000000] [Galera] 0.0 (DB403): State transfer to 1.0 (DBDEV) failed: -22 (Invalid argument)
[ERROR] [MY-000000] [Galera] gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1214: Will never receive state. Need to abort.
[Note] [MY-000000] [Galera] gcomm: terminating thread
[Note] [MY-000000] [WSREP-SST] /usr/bin/wsrep_sst_xtrabackup-v2: line 185: 30924 Killed socat -u TCP-LISTEN:4444,reuseaddr,retry=30 stdio
[ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
[ERROR] [MY-000000] [WSREP-SST] Error while getting data from donor node:  exit codes: 137 137
[ERROR] [MY-000000] [WSREP-SST] Line 1296
[ERROR] [MY-000000] [WSREP-SST] ******************************************************
[ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:32
[ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.4.71' --datadir '/db1/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '30416' --mysqld-version '8.0.25-15.1'   '' : 32 (Broken pipe)
[ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
[ERROR] [MY-000000] [WSREP] SST script aborted with error 32 (Broken pipe)
[ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 32 (Broken pipe). Most likely it is due to inability to communicate with the cluster primary component. Restart required.

You posted this question in the PXC 5.* forum, but you are running 8.0.25. Since you are running 8.*, you are probably being affected by SSL issues. You need to copy the *.pem certs from node1 over to node2 and configure mysql to load them before starting node2. I see you said you disabled SSL, but did you restart node1 after changing that?

All of your error messages point to networking issues, but since you confirmed the ports are open using socat/echo, then SSL is the next issue.
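With logs this long, it can help to bucket the messages before reading them. The sketch below is a hypothetical triage helper: the patterns are copied from messages quoted in this thread, and the labels are rough heuristics (assumptions), not an official Galera or PXC error classification. The gvwstate.dat message from the first post is bucketed as benign because Galera's own wording for it notes that it is expected on a node's first boot.

```python
import re
from collections import Counter

# Hypothetical heuristics: patterns copied from messages quoted in this
# thread; labels are rough guesses at where to look first.
RULES = [
    (re.compile(r"certificate verify failed"),
     "tls: peers do not trust each other's certificates"),
    (re.compile(r"pc\.wait_prim_timeout|no messages seen in"),
     "network: group communication (4567) not reachable"),
    (re.compile(r"Killing SST .* after stalling"),
     "sst: joiner saw the SST stream stall"),
    (re.compile(r"Broken pipe"),
     "sst: the other end of the SST stream went away"),
    (re.compile(r"gvwstate\.dat"),
     "benign: no saved primary-component view yet (first boot)"),
]


def triage(lines):
    """Count log lines per heuristic label; the first matching rule wins."""
    counts = Counter()
    for line in lines:
        for pattern, label in RULES:
            if pattern.search(line):
                counts[label] += 1
                break
    return counts
```

For example, open the log-error file from the first post's my.cnf (/var/log/mysql/error.log) and pass the file object to triage(); counts.most_common() then lists the buckets, largest first.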


Yes, after disabling SSL with pxc-encrypt-cluster-traffic = OFF, node1 has been restarted (a few times). Is there a username and password that I need to create? I will copy the PEM files.
Sorry, I didn't see it was under PXC 5.*.


I have added the PEM files. Here is the updated my.cnf:

# SST method
wsrep_sst_method=xtrabackup-v2
wsrep_sst_donor=DB403
pxc-encrypt-cluster-traffic = ON
[sst]
encrypt=4
ssl-ca=/var/lib/mysql/ca.pem
ssl-cert=/var/lib/mysql/server-cert.pem
ssl-key=/var/lib/mysql/server-key.pem

Here is the error I am getting:

[Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
[ERROR] [MY-000000] [Galera] handshake with remote endpoint ssl://192.168.2.61:4567 failed: asio.ssl:337047686: 'certificate verify failed' ( 337047686: 'error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed')

I get several of those failure messages.
I copied the cert files from node1 to node2, so they are the same.


There's clearly some issue with the certificates. Is there a passphrase on the key? Can you use openssl to verify the cert against the CA?
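A sketch of that check in Python, in case it helps: it runs a TLS handshake entirely in memory, with a "server" presenting server-cert.pem/server-key.pem and a "client" that trusts only ca.pem, which is roughly what two nodes do with pxc-encrypt-cluster-traffic=ON. It is an approximation under that assumption (the helper name check_node_certs is hypothetical, and Galera's own TLS settings may differ). The openssl CLI equivalent for the chain part is openssl verify -CAfile ca.pem server-cert.pem.

```python
import ssl


def check_node_certs(ca_file, cert_file, key_file, max_rounds=10):
    """In-memory TLS handshake between a 'server' presenting cert_file and
    key_file and a 'client' that trusts only ca_file. Returns (ok, message)."""
    try:
        server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        # Fails if the key does not match the cert. The callback supplies an
        # empty passphrase, so an encrypted key should fail to load rather
        # than prompt on the terminal.
        server_ctx.load_cert_chain(cert_file, key_file, password=lambda: b"")
        client_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # CERT_REQUIRED
        client_ctx.check_hostname = False  # only the CA chain is checked here
        client_ctx.load_verify_locations(cafile=ca_file)
    except OSError as exc:  # covers ssl.SSLError and FileNotFoundError
        return False, f"could not load files: {exc}"

    c_in, c_out = ssl.MemoryBIO(), ssl.MemoryBIO()
    s_in, s_out = ssl.MemoryBIO(), ssl.MemoryBIO()
    client = client_ctx.wrap_bio(c_in, c_out)
    server = server_ctx.wrap_bio(s_in, s_out, server_side=True)
    pending = [client, server]
    try:
        for _ in range(max_rounds):
            for end in list(pending):
                try:
                    end.do_handshake()
                    pending.remove(end)
                except ssl.SSLWantReadError:
                    pass  # needs bytes from the other side first
            # Shuttle TLS records between the two in-memory endpoints.
            to_server, to_client = c_out.read(), s_out.read()
            if to_server:
                s_in.write(to_server)
            if to_client:
                c_in.write(to_client)
            if not pending:
                return True, "cert matches key and chains to the CA"
    except ssl.SSLError as exc:  # e.g. ssl.SSLCertVerificationError
        return False, f"handshake failed: {exc}"
    return False, "handshake did not complete"
```

Run it on each node against the exact files that node's my.cnf loads, e.g. check_node_certs("/var/lib/mysql/ca.pem", "/var/lib/mysql/server-cert.pem", "/var/lib/mysql/server-key.pem"), and compare the results.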


Here is the output.

/var/lib/mysql/ca.pem: CN = MySQL_Server_8.0.25-15_Auto_Generated_CA_Certificate
error 18 at 0 depth lookup: self signed certificate
error /var/lib/mysql/ca.pem: verification failed

The certificate created by Percona is self-signed, so the verify tool cannot verify it against a public certificate authority. This does not mean the certificate is invalid, only that it is self-signed.

Do I need the *.pem files that were created in the first node's datadir? My datadir is not the usual MySQL location; the PEM files in /var/lib/mysql are the defaults MySQL added at install time. There are also PEM files in my datadir; should I use those?


Is it possible to use XtraBackup 8.0.26-18 with XtraDB Cluster 8.0.25? I think that might be my issue.


You should be using the PEM files from your current $datadir. If you changed datadir, there will be new PEM files in the new datadir. You can remove the PEM files from /var/lib/mysql.


Do I need a different location for the PEM files?
My datadir is /db1/. Do I need to make a new directory (certdir) on all of the nodes and copy the PEM files into it?
I was guessing that when bringing on a new node, the PEM files would get copied over during the SST.


This is 100% false. You must manually copy over the PEM files before starting a node via SST.

The best solution is to make a cert dir on all nodes. I use /etc/ssl/mysql/ in my training classes. Start the first node using whatever datadir you want and let mysql generate the PEM files there. Copy the PEM files to /etc/ssl/mysql/, then edit my.cnf and point all of the cert options at this path.

Copy the PEM files to node2. Edit node2's my.cnf to match node1 (i.e., the SSL paths). Start node2. Repeat for node3. This way, all of your certs are always in the same location on every server. If you put the certs into the datadir on node2 before starting, the SST process will erase the PEM files before syncing data.
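The cert-dir steps above can be sketched in Python, assuming mysqld on node1 has already generated ca.pem, server-cert.pem and server-key.pem in its datadir. stage_certs is a hypothetical helper: it copies only those three files into /etc/ssl/mysql/ and returns my.cnf lines pointing at the copies, mirroring the [sst] options used earlier in this thread. It does not change file ownership; the copies must stay readable by the OS user mysqld runs as.

```python
import shutil
from pathlib import Path

# Only the three files the Galera/SST options below refer to.
PEM_FILES = ("ca.pem", "server-cert.pem", "server-key.pem")


def stage_certs(datadir, certdir="/etc/ssl/mysql"):
    """Copy mysqld's generated PEM files out of datadir into certdir and
    return my.cnf lines that point at the copies."""
    src, dst = Path(datadir), Path(certdir)
    dst.mkdir(parents=True, exist_ok=True)
    for name in PEM_FILES:
        # copy2 keeps the mode bits; ownership is NOT changed here, so make
        # sure the mysql OS user can still read the copies.
        shutil.copy2(src / name, dst / name)
    ssl_lines = [
        f"ssl-ca={dst / 'ca.pem'}",
        f"ssl-cert={dst / 'server-cert.pem'}",
        f"ssl-key={dst / 'server-key.pem'}",
    ]
    return "\n".join(
        ["[mysqld]", "pxc-encrypt-cluster-traffic=ON", *ssl_lines,
         "", "[sst]", "encrypt=4", *ssl_lines]
    )
```

For example, run stage_certs("/db1") on node1, copy /etc/ssl/mysql/ to node2 and node3, and add the returned lines to every node's my.cnf before starting it. Keeping the files outside the datadir is the point: the SST will not touch them.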


OK, thanks; I assumed the new directory was the correct way to handle it. I also created a tmpdir for storing files during the SST. Now I am getting a new error: the sync starts and I see many "log scanned up to (xxxxx)" lines, then it fails with the following errors.

On the joiner:

[ERROR] [MY-000000] Killing SST (77001) with SIGKILL after stalling for 120 seconds
[Note] [MY-000000] [WSREP-SST] /usr/bin/wsrep_sst_xtrabackup-v2: line 185: 77003 Killed socat -u TCP-LISTEN:4444,reuseaddr,retry=30 stdio
[ERROR] [MY-000000] [WSREP-SST] Error while getting data from donor node:  exit codes: 137 137
[ERROR] [MY-000000] [WSREP-SST] Line 1296
[ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.4.71' --datadir '/db1/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '76495' --mysqld-version '8.0.25-15.1'   '' : 32 (Broken pipe)
....

On the donor:

211210 19:30:50 >> log scanned up to (10790818111434)
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
xb_stream_write_data() failed.
[ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (END) ------------
[ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.4.71:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/db1/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --mysqld-version '8.0.25-15.1' '' --gtid 'bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:54' : 22 (Invalid argument)

That usually means the SSL tunnel was not established, or was terminated early or unexpectedly.
