Hi all,
I have a two-node Percona XtraDB Cluster 8 setup up and running with the default SSL settings:
# mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight       | 2                                    |
| wsrep_cluster_capabilities |                                      |
| wsrep_cluster_conf_id      | 3                                    |
| wsrep_cluster_size         | 2                                    |
| wsrep_cluster_state_uuid   | b88ce12a-e660-11ec-ab48-124f00d72b1c |
| wsrep_cluster_status       | Primary                              |
+----------------------------+--------------------------------------+
But when I try to add an arbitrator to the cluster, I get a timeout error.
Command used to start garbd:
garbd --group=uat_ossvpnsuite \
      --address="gcomm://10.1.54.64:4567,10.1.54.65:4567" \
      --option="socket.ssl_key=/var/lib/galera/server-key.pem;socket.ssl_cert=/var/lib/galera/server-cert.pem;socket.ssl_ca=/var/lib/galera/ca.pem;socket.ssl_cipher=AES128-SHA256"
Output:
2022-06-10 10:17:44.504 INFO: CRC-32C: using 64-bit x86 acceleration.
2022-06-10 10:17:44.504 INFO: Read config:
daemon: 0
name: garb
address: gcomm://10.1.54.64:4567,10.1.54.65:4567
group: uat_ossvpnsuite
sst: trivial
donor:
options: socket.ssl_key=/var/lib/galera/server-key.pem;socket.ssl_cert=/var/lib/galera/server-cert.pem;socket.ssl_ca=/var/lib/galera/ca.pem;socket.ssl_cipher=AES128-SHA256; gcs.fc_limit=9999999; gcs.fc_factor=1.0; gcs.fc_master_slave=yes
cfg:
log:
recv_script:
2022-06-10 10:17:44.505 WARN: Option 'gcs.fc_master_slave' is deprecated and will be removed in the future versions, please use 'gcs.fc_single_primary' instead.
2022-06-10 10:17:44.507 INFO: protonet asio version 0
2022-06-10 10:17:44.507 INFO: Using CRC-32C for message checksums.
2022-06-10 10:17:44.507 INFO: backend: asio
2022-06-10 10:17:44.507 INFO: gcomm thread scheduling priority set to other:0
2022-06-10 10:17:44.507 WARN: Fail to access the file (./gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2022-06-10 10:17:44.508 INFO: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2022-06-10 10:17:44.508 INFO: GMCast version 0
2022-06-10 10:17:44.508 INFO: (cdf95202-abcb, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2022-06-10 10:17:44.508 INFO: (cdf95202-abcb, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2022-06-10 10:17:44.509 INFO: EVS version 1
2022-06-10 10:17:44.509 INFO: gcomm: connecting to group 'uat_ossvpnsuite', peer '10.1.54.64:4567,10.1.54.65:4567'
2022-06-10 10:17:47.511 INFO: (cdf95202-abcb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.1.54.64:4567 timed out, no messages seen in PT3S, socket stats: rtt: 186 rttvar: 93 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000607106 last_delivered_since: 3000607106 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2022-06-10 10:17:47.511 INFO: (cdf95202-abcb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.1.54.65:4567 timed out, no messages seen in PT3S, socket stats: rtt: 334 rttvar: 167 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000793103 last_delivered_since: 3000793103 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2022-06-10 10:17:47.511 INFO: announce period timed out (pc.announce_timeout)
2022-06-10 10:17:47.512 INFO: EVS version upgrade 0 -> 1
2022-06-10 10:17:47.512 INFO: PC protocol upgrade 0 -> 1
2022-06-10 10:17:47.512 WARN: no nodes coming from prim view, prim not possible
2022-06-10 10:17:47.512 INFO: Current view of cluster as seen by this node
view (view_id(NON_PRIM,cdf95202-abcb,1)
memb {
cdf95202-abcb,0
}
joined {
}
left {
}
partitioned {
}
)
2022-06-10 10:17:48.012 WARN: last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3.50317S), skipping check
2022-06-10 10:17:52.012 INFO: (cdf95202-abcb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.1.54.64:4567 timed out, no messages seen in PT3S, socket stats: rtt: 271 rttvar: 135 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 2999936276 last_delivered_since: 2999936276 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2022-06-10 10:17:55.512 INFO: (cdf95202-abcb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.1.54.65:4567 timed out, no messages seen in PT3S, socket stats: rtt: 342 rttvar: 171 rto: 200000 lost: 0 last_data_recv: 3500 cwnd: 10 last_queued_since: 3499608484 last_delivered_since: 3499608484 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2022-06-10 10:17:58.513 INFO: (cdf95202-abcb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.1.54.64:4567 timed out, no messages seen in PT3S, socket stats: rtt: 180 rttvar: 90 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 2999752298 last_delivered_since: 2999752298 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2022-06-10 10:18:02.014 INFO: (cdf95202-abcb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.1.54.65:4567 timed out, no messages seen in PT3S, socket stats: rtt: 318 rttvar: 159 rto: 200000 lost: 0 last_data_recv: 3500 cwnd: 10 last_queued_since: 3499663839 last_delivered_since: 3499663839 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2022-06-10 10:18:05.014 INFO: (cdf95202-abcb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.1.54.64:4567 timed out, no messages seen in PT3S, socket stats: rtt: 227 rttvar: 113 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 2999996758 last_delivered_since: 2999996758 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2022-06-10 10:18:08.015 INFO: (cdf95202-abcb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.1.54.65:4567 timed out, no messages seen in PT3S, socket stats: rtt: 360 rttvar: 180 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 2999639557 last_delivered_since: 2999639557 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2022-06-10 10:18:11.016 INFO: (cdf95202-abcb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.1.54.64:4567 timed out, no messages seen in PT3S, socket stats: rtt: 202 rttvar: 101 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 2999724405 last_delivered_since: 2999724405 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2022-06-10 10:18:14.516 INFO: (cdf95202-abcb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.1.54.65:4567 timed out, no messages seen in PT3S, socket stats: rtt: 360 rttvar: 180 rto: 200000 lost: 0 last_data_recv: 3500 cwnd: 10 last_queued_since: 3499270093 last_delivered_since: 3499270093 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2022-06-10 10:18:17.517 INFO: (cdf95202-abcb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.1.54.64:4567 timed out, no messages seen in PT3S, socket stats: rtt: 204 rttvar: 102 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 2999658372 last_delivered_since: 2999658372 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2022-06-10 10:18:17.526 INFO: PC protocol downgrade 1 -> 0
2022-06-10 10:18:17.527 INFO: Current view of cluster as seen by this node
view ((empty))
2022-06-10 10:18:17.528 ERROR: failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():161
2022-06-10 10:18:17.528 ERROR: gcs/src/gcs_core.cpp:gcs_core_open():219: Failed to open backend connection: -110 (Connection timed out)
2022-06-10 10:18:18.529 INFO: gcomm: terminating thread
2022-06-10 10:18:18.529 INFO: gcomm: joining thread
2022-06-10 10:18:18.529 ERROR: gcs/src/gcs.cpp:gcs_open():1758: Failed to open channel 'uat_ossvpnsuite' at 'gcomm://10.1.54.64:4567,10.1.54.65:4567': -110 (Connection timed out)
2022-06-10 10:18:18.529 INFO: Shifting CLOSED -> DESTROYED (TO: 0)
2022-06-10 10:18:18.529 FATAL: Garbd exiting with error: Failed to open connection to group: 110 (Connection timed out)
at garb/garb_gcs.cpp:Gcs():35
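One thing I am not sure about: since PXC 8 enables pxc-encrypt-cluster-traffic by default, garbd may need TLS switched on explicitly with socket.ssl=YES, which is not in my command above. A sketch of what I would try next, written as a garbd config file (the /etc/sysconfig/garb path and the GALERA_* variable names come from the stock garb init script and may differ per distro):

```shell
# Hypothetical /etc/sysconfig/garb (Debian: /etc/default/garb).
# socket.ssl=YES is an assumption on my side: PXC 8 turns
# pxc-encrypt-cluster-traffic on by default, so the arbitrator may have
# to enable TLS explicitly as well.
GALERA_GROUP="uat_ossvpnsuite"
GALERA_NODES="10.1.54.64:4567 10.1.54.65:4567"
GALERA_OPTIONS="socket.ssl=YES;socket.ssl_key=/var/lib/galera/server-key.pem;socket.ssl_cert=/var/lib/galera/server-cert.pem;socket.ssl_ca=/var/lib/galera/ca.pem;socket.ssl_cipher=AES128-SHA256"
```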
The certificates were copied from node01, so they are identical on all nodes.
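To be precise about "identical": what I compared are the certificate fingerprints, and I also verified the server certificate against the CA. Sketched below on a throwaway CA/cert pair, since I obviously can't paste the real keys; on the nodes the files are the /var/lib/galera/*.pem ones from the garbd command:

```shell
# Throwaway CA + server cert standing in for /var/lib/galera/{ca,server-cert}.pem
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo-ca" \
    -keyout "$tmp/ca-key.pem" -out "$tmp/ca.pem" -days 1 2>/dev/null
openssl req -newkey rsa:2048 -nodes -subj "/CN=demo-server" \
    -keyout "$tmp/server-key.pem" -out "$tmp/server.csr" 2>/dev/null
openssl x509 -req -in "$tmp/server.csr" -CA "$tmp/ca.pem" -CAkey "$tmp/ca-key.pem" \
    -CAcreateserial -out "$tmp/server-cert.pem" -days 1 2>/dev/null
# 1) Fingerprint must be identical on every node (run on each host, compare)
openssl x509 -in "$tmp/server-cert.pem" -noout -fingerprint -sha256
# 2) Server cert must verify against the CA file garbd is pointed at
result=$(openssl verify -CAfile "$tmp/ca.pem" "$tmp/server-cert.pem")
echo "$result"
rm -rf "$tmp"
```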
Percona Server config (my.cnf) on the nodes:
[mysqld]
transaction-isolation = READ-COMMITTED
max_connect_errors = 9999999
wait_timeout = 9999999
bind-address = 0.0.0.0
datadir = /var/lib/mysql/
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_provider_options="gcache.size=512M"
wsrep_cluster_address=gcomm://10.1.54.64,10.1.54.65
binlog_format=ROW
wsrep_node_address=10.1.54.65
wsrep_sst_method=xtrabackup-v2
wsrep_cluster_name=uat_ossvpnsuite
wsrep_node_name=nluu-ossvpnsuitedb02
pxc_strict_mode=ENFORCING
log_timestamps=SYSTEM
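For what it's worth, as far as I understand PXC 8 appends the socket.ssl_* settings to wsrep_provider_options automatically when pxc-encrypt-cluster-traffic is ON, so the effective values can be read back on a data node with `mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'\G"`. A sketch of the check, with a sample string standing in for the live value I can't paste here:

```shell
# Sample value standing in for the node's real wsrep_provider_options;
# on a node, pipe the SHOW GLOBAL VARIABLES output through the same filter.
opts="gcache.size=512M; socket.ssl=YES; socket.ssl_ca=ca.pem; socket.ssl_cert=server-cert.pem; socket.ssl_key=server-key.pem"
ssl_opts=$(printf '%s\n' "$opts" | tr ';' '\n' | grep -i 'ssl')
printf '%s\n' "$ssl_opts"
```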
Does anyone have an idea of how to fix this problem?