Adding arbiter to existing cluster

Cihan_Tunali · November 10, 2020, 8:05am

Hello all,

I have a 3-node XtraDB cluster and want to add an arbiter to it. I followed this article: https://www.percona.com/doc/percona-xtradb-cluster/8.0/howtos/garbd_howto.html, but the arbiter could not communicate with the cluster even though it has network connectivity. What am I missing?

Arbiter log:

2020-11-10 16:26:45.537 INFO: CRC-32C: using hardware acceleration.

2020-11-10 16:26:45.537 INFO: Read config:

daemon:   0

name:    garb

address:   gcomm://10.10.10.201:4567,10.10.10.202:4567,10.10.10.203:4567

group:    pxc-poc

sst:     trivial

donor:

options:   gcs.fc_limit=9999999; gcs.fc_factor=1.0; gcs.fc_master_slave=yes

cfg:

log:

recv_script:

2020-11-10 16:26:45.539 INFO: protonet asio version 0

2020-11-10 16:26:45.539 INFO: Using CRC-32C for message checksums.

2020-11-10 16:26:45.539 INFO: backend: asio

2020-11-10 16:26:45.539 INFO: gcomm thread scheduling priority set to other:0

2020-11-10 16:26:45.540 WARN: Fail to access the file (./gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown

2020-11-10 16:26:45.540 INFO: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown

2020-11-10 16:26:45.540 INFO: GMCast version 0

2020-11-10 16:26:45.540 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) listening at tcp://0.0.0.0:4567

2020-11-10 16:26:45.541 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) multicast: , ttl: 1

2020-11-10 16:26:45.541 INFO: EVS version 1

2020-11-10 16:26:45.541 INFO: gcomm: connecting to group ‘pxc-poc’, peer ‘10.10.10.201:4567,10.10.10.202:4567,10.10.10.203:4567’

2020-11-10 16:26:48.543 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.201:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:26:48.543 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.202:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:26:48.543 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.203:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:26:48.543 INFO: announce period timed out (pc.announce_timeout)

2020-11-10 16:26:48.543 INFO: EVS version upgrade 0 → 1

2020-11-10 16:26:48.543 INFO: PC protocol upgrade 0 → 1

2020-11-10 16:26:48.543 WARN: no nodes coming from prim view, prim not possible

2020-11-10 16:26:48.543 INFO: Current view of cluster as seen by this node

view (view_id(NON_PRIM,60f0857d,1)

memb {

60f0857d,0

}

joined {

}

left {

}

partitioned {

}

)

2020-11-10 16:26:49.043 WARN: last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3.50246S), skipping check

2020-11-10 16:26:52.544 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.201:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:26:55.544 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.202:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:26:58.545 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.201:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:27:01.545 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.202:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:27:04.546 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.201:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:27:07.547 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.202:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:27:10.547 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.201:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:27:13.548 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.202:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:27:16.548 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.201:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:27:18.556 INFO: PC protocol downgrade 1 → 0

2020-11-10 16:27:18.556 INFO: Current view of cluster as seen by this node

view ((empty))

2020-11-10 16:27:18.556 ERROR: failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)

 at gcomm/src/pc.cpp:connect():159

2020-11-10 16:27:18.556 ERROR: gcs/src/gcs_core.cpp:gcs_core_open():220: Failed to open backend connection: -110 (Connection timed out)

2020-11-10 16:27:18.556 ERROR: gcs/src/gcs.cpp:gcs_open():1700: Failed to open channel ‘pxc-poc’ at ‘gcomm://10.10.10.201:4567,10.10.10.202:4567,10.10.10.203:4567’: -110 (Connection timed out)

2020-11-10 16:27:18.556 FATAL: Exception in creating receive loop: Failed to open connection to group: 110 (Connection timed out)

 at garb/garb_gcs.cpp:Gcs():35

matthewb · November 11, 2020, 12:40pm

@Ghan Confirm using telnet that you can indeed open a connection from the arbitrator node to 10.10.10.201:4567. Since this is 8.0, SSL is enabled by default. Did you copy the SSL certificates from node1 to the arbitrator node?
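If telnet is not installed on the arbitrator host, the same reachability check can be sketched in Python. This is only a minimal sketch: the helper names `can_connect` and `check_peers` are made up for illustration, and the addresses in the trailing comment are the ones from the `gcomm://` string in the log above. Note that a successful TCP connect only shows the port is reachable; it says nothing about whether the SSL settings on both sides match.

```python
import socket
from typing import Dict, Iterable


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_peers(hosts: Iterable[str], port: int = 4567) -> Dict[str, bool]:
    """Check every cluster node on the Galera group communication port."""
    return {host: can_connect(host, port) for host in hosts}


# Example call, run from the arbitrator node:
#   check_peers(["10.10.10.201", "10.10.10.202", "10.10.10.203"])
```

If every node reports reachable but garbd still logs `gmcast.peer_timeout`, the network path is probably fine and the encryption settings are the next thing to look at.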

If you already have a 3-node cluster, why are you adding a 4th member (arbitrator)? All this does is add the possibility of split-brain errors.

Why do you feel you need the arbitrator?

Cihan_Tunali · November 12, 2020, 1:31am

Hi @matthewb and thank you for your reply.

First, I did not copy the SSL certificates because that step is not mentioned on the documentation page (https://www.percona.com/doc/percona-xtradb-cluster/8.0/howtos/garbd_howto.html) and I did not know about it. Can you please guide me on how to do that?

Second, I will have a 4-node cluster: 2 primary nodes, 1 arbiter, and 1 primary with vote 0. The node with vote 0 will never become primary but will keep its data up to date, so I will use it to take storage snapshots (I will stop the mysql service, take a hardware-level snapshot of the disk, then start the mysql service again). Thank you!

matthewb · November 12, 2020, 12:41pm

You could change the weight of node1 to 2, node2 to 2, node3 to 0. That way, still a total vote of 3. If node3 goes offline, node1+node2 = 3.

You are right, the SSL docs are not there. You need to SCP the .pem files from node1 over to the arbitrator node and configure garb to use them.

Cihan_Tunali · November 12, 2020, 1:32pm

I will change the weight of node1 to 1, node2 to 1, node3 to 0 and arbiter to 1. I think that will be enough.
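To sanity-check that weight plan, here is a rough sketch of the vote arithmetic. It assumes a simplified rule, a strict majority of the total weight, whereas Galera's real quorum calculation compares against the previous primary component and treats graceful departures differently. The function name `has_quorum` is made up for illustration, and the weights are the ones proposed above (set on each node through the `pc.weight` provider option).

```python
from typing import Iterable, Mapping


def has_quorum(weights: Mapping[str, int], alive: Iterable[str]) -> bool:
    """Simplified rule: surviving weight must exceed half of the total weight."""
    alive_set = set(alive)
    total = sum(weights.values())
    present = sum(w for name, w in weights.items() if name in alive_set)
    return 2 * present > total


# Weights from the plan above: the snapshot node (node3) carries no vote.
weights = {"node1": 1, "node2": 1, "node3": 0, "arbiter": 1}

# node3 stopped for a snapshot: the other three hold 3 of 3 votes.
snapshot_ok = has_quorum(weights, ["node1", "node2", "arbiter"])

# node3 stopped and node1 lost: node2 + arbiter hold 2 of 3 votes.
one_failure_ok = has_quorum(weights, ["node2", "arbiter"])

# Only node1 and node3 can see each other: 1 of 3 votes, no quorum.
minority_ok = has_quorum(weights, ["node1", "node3"])
```

With these weights, stopping node3 never costs a vote, and the remaining three members tolerate the loss of any one of them.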

About the SSL part: after locating the SSL files on node1, I don't know how to configure garb to use them.

matthewb · November 17, 2020, 10:04am

@Ghan You'll need to read up on the Galera Arbitrator documentation for the parameters you need to pass to enable SSL connectivity. There are some SSL examples here: https://galeracluster.com/library/documentation/arbitrator.html
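As a rough illustration of what those examples boil down to, the sketch below assembles a garbd command line that passes the `socket.ssl_key`, `socket.ssl_cert`, and `socket.ssl_ca` provider options. Treat it as a sketch under assumptions: the helper `garbd_command` and the `/etc/ssl/garbd/...` paths are hypothetical (use wherever the .pem files copied from node1 actually live), and the exact flag spelling should be confirmed with `garbd --help` on your version.

```python
import shlex
from typing import List, Sequence


def garbd_command(group: str, peers: Sequence[str],
                  ssl_key: str, ssl_cert: str, ssl_ca: str) -> List[str]:
    """Build a garbd argument list with SSL provider options (does not run it)."""
    options = ";".join([
        f"socket.ssl_key={ssl_key}",
        f"socket.ssl_cert={ssl_cert}",
        f"socket.ssl_ca={ssl_ca}",
    ])
    address = "gcomm://" + ",".join(peers)
    return ["garbd", f"--group={group}", f"--address={address}",
            f"--options={options}"]


# Group name and peers come from the arbiter log above;
# the certificate paths are placeholders, not real defaults.
cmd = garbd_command(
    "pxc-poc",
    ["10.10.10.201:4567", "10.10.10.202:4567", "10.10.10.203:4567"],
    ssl_key="/etc/ssl/garbd/server-key.pem",
    ssl_cert="/etc/ssl/garbd/server-cert.pem",
    ssl_ca="/etc/ssl/garbd/ca.pem",
)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

The key, certificate, and CA must be the same set the cluster nodes use, otherwise the arbitrator will keep timing out on the handshake exactly as in the log above.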
