Galera Cluster ERROR when adding a node

Good afternoon, everyone. I have a cluster with two nodes, DB01 and DB02. DB02 (the primary) crashed on its own, and when we restarted DB02, DB01 stayed as primary. So we tried starting DB02 as the secondary with the command “systemctl start mysqld”, without the --wsrep-new-cluster option, since DB02 had taken that option from the primary.

After that restart we got the following errors:

2022-12-19T22:54:26.015335Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():160
2022-12-19T22:54:26.015358Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -110 (Connection timed out)
2022-12-19T22:54:26.015581Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel ‘zabbix_data_cluster’ at ‘gcomm://172.16.187.1,172.16.187.2 ,172.16.187.3’: -110 (Connection timed out)
2022-12-19T22:54:26.015600Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2022-12-19T22:54:26.015608Z 0 [ERROR] WSREP: wsrep::connect(gcomm://172.16.187.1,172.16.187.2,172.16.187.3) failed: 7
2022-12-19T22:54:26.015649Z 0 [ERROR] Aborting

and a WARNING:

2022-12-19T22:53:55.010452Z 0 [Warning] WSREP: access file(/data/database//gvwstate.dat) failed(No such file or directory)

My question is:

How do I get DB02 back into the cluster?

The situation today is that both DB01 and DB02 are stopped.

The data files are in:

/data/database

I also have another question: what are the files in this directory for?

/var/lib/mysql

drwxr-xr-x 2 mysql mysql 53 Jun 3 2021 pki
-rw------- 1 mysql mysql 5 Sep 20 01:54 mysql.sock.lock
srwxrwxrwx 1 mysql mysql 0 Sep 20 01:54 mysql.sock

Should I force a clean SST by clearing either /data/database or /var/lib/mysql, or is there another way out, such as:

stop DB01 and start it with the command:

service mysql start --wsrep-new-cluster

and then start DB02 with the command:

service mysql start

so that we end up with DB01 (primary) and DB02 (secondary)?

Could you help me with the best option or solution?

thank you

Check your my.cnf. It looks like you might have some strange characters or extra spaces in your gcomm:// address.
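
If it helps, here is a rough sketch of how you could check the address string for that kind of problem. It only looks for whitespace and non-ASCII characters in each comma-separated entry; the function name `find_address_problems` is made up for this example, and you would paste in the value of wsrep_cluster_address from your own my.cnf. The sample value is copied from the "Failed to open channel" log line above, which shows a space before the second comma (that could also just be a copy-paste artifact).

```python
def find_address_problems(value):
    """Return a list of problems found in a gcomm:// address string.

    Hypothetical helper: it only checks for whitespace and non-ASCII
    characters, nothing else about the address is validated.
    """
    prefix = "gcomm://"
    if not value.startswith(prefix):
        return ["value does not start with gcomm://"]

    problems = []
    # Everything after the prefix is a comma-separated list of nodes.
    for entry in value[len(prefix):].split(","):
        if any(ch.isspace() for ch in entry):
            problems.append("whitespace in entry %r" % entry)
        if not entry.isascii():
            problems.append("non-ASCII character in entry %r" % entry)
    return problems


# Value as it appears in the log line above (note the space after .2).
address = "gcomm://172.16.187.1,172.16.187.2 ,172.16.187.3"
for problem in find_address_problems(address):
    print(problem)
```

An empty result means no stray characters were found in the string you passed in; it does not prove the addresses are reachable.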


I know it’s an old post, but the same thing happened to me a while ago and I haven’t found the solution yet.
I had a Galera Cluster with 3 nodes running perfectly, but when I tried to move one node to a different IP segment, a problem arose. The first ERROR notification was:

[Note] WSREP: (d2866861-ab3e, 'tcp://0.0.0.0:4567') connection established to a56f0b84-b8ba tcp://10.xxx.xxx.xxx:4567
[Note] WSREP: (d2866861-ab3e, 'tcp://0.0.0.0:4567') connection to peer 66e3b9f2-ba37 with addr tcp://10.yyy.yyy.yyy:4567 timed out, no messages seen
....
2025-01-20T08:30:39.519493+07:00 hqmariadb5 mariadbd[26146]: 2025-01-20  8:30:39 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view
2025-01-20T08:30:39.519590+07:00 hqmariadb5 mariadbd[26146]: #011 at ./gcomm/src/pc.cpp:connect():160

My guess is that the cluster connection was established, but something then went wrong at the gcomm layer.

It went back to normal when I moved the node back to the old IP subnet. There was no network firewall or host firewall blocking the connection.
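
For anyone who wants to double-check reachability from the moved node, here is a minimal sketch of a plain TCP connect test. It assumes the Galera default ports are unchanged: 4567 for group communication (the one in the log above), 4568 for IST and 4444 for SST. The function names are made up for this example, and a successful connect only shows that the port is reachable, not that gcomm can form a primary view.

```python
import socket

# Galera default ports; an assumption, adjust if your config overrides them.
GALERA_PORTS = {"gcomm": 4567, "ist": 4568, "sst": 4444}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_peer(host, timeout=3.0):
    """Return a dict mapping each port name to True/False for one peer."""
    return {
        name: can_connect(host, port, timeout)
        for name, port in GALERA_PORTS.items()
    }
```

Calling `check_peer()` with the address of each other node, from the node that was moved, gives one True/False per port. Keep in mind that the IST and SST ports are normally only open while a transfer is in progress, so a False there is expected on an idle node; the 4567 result is the one that matters for the "failed to reach primary view" error.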