Adding arbiter to existing cluster

Hello all,

I have a 3-node XtraDB cluster and want to add an arbiter to it. I followed the article at https://www.percona.com/doc/percona-xtradb-cluster/8.0/howtos/garbd_howto.html but the arbiter cannot communicate with the cluster even though it has network connectivity. What am I missing?

Arbiter log:

2020-11-10 16:26:45.537 INFO: CRC-32C: using hardware acceleration.

2020-11-10 16:26:45.537 INFO: Read config:

daemon:   0

name:    garb

address:   gcomm://10.10.10.201:4567,10.10.10.202:4567,10.10.10.203:4567

group:    pxc-poc

sst:     trivial

donor:

options:   gcs.fc_limit=9999999; gcs.fc_factor=1.0; gcs.fc_master_slave=yes

cfg:

log:

recv_script:

2020-11-10 16:26:45.539 INFO: protonet asio version 0

2020-11-10 16:26:45.539 INFO: Using CRC-32C for message checksums.

2020-11-10 16:26:45.539 INFO: backend: asio

2020-11-10 16:26:45.539 INFO: gcomm thread scheduling priority set to other:0

2020-11-10 16:26:45.540 WARN: Fail to access the file (./gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown

2020-11-10 16:26:45.540 INFO: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown

2020-11-10 16:26:45.540 INFO: GMCast version 0

2020-11-10 16:26:45.540 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) listening at tcp://0.0.0.0:4567

2020-11-10 16:26:45.541 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) multicast: , ttl: 1

2020-11-10 16:26:45.541 INFO: EVS version 1

2020-11-10 16:26:45.541 INFO: gcomm: connecting to group ‘pxc-poc’, peer ‘10.10.10.201:4567,10.10.10.202:4567,10.10.10.203:4567’

2020-11-10 16:26:48.543 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.201:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:26:48.543 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.202:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:26:48.543 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.203:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:26:48.543 INFO: announce period timed out (pc.announce_timeout)

2020-11-10 16:26:48.543 INFO: EVS version upgrade 0 → 1

2020-11-10 16:26:48.543 INFO: PC protocol upgrade 0 → 1

2020-11-10 16:26:48.543 WARN: no nodes coming from prim view, prim not possible

2020-11-10 16:26:48.543 INFO: Current view of cluster as seen by this node

view (view_id(NON_PRIM,60f0857d,1)

memb {

60f0857d,0

}

joined {

}

left {

}

partitioned {

}

)

2020-11-10 16:26:49.043 WARN: last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3.50246S), skipping check

2020-11-10 16:26:52.544 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.201:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:26:55.544 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.202:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:26:58.545 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.201:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:27:01.545 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.202:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:27:04.546 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.201:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:27:07.547 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.202:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:27:10.547 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.201:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:27:13.548 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.202:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:27:16.548 INFO: (60f0857d, ‘tcp://0.0.0.0:4567’) connection to peer 00000000 with addr tcp://10.10.10.201:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)

2020-11-10 16:27:18.556 INFO: PC protocol downgrade 1 → 0

2020-11-10 16:27:18.556 INFO: Current view of cluster as seen by this node

view ((empty))

2020-11-10 16:27:18.556 ERROR: failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)

 at gcomm/src/pc.cpp:connect():159

2020-11-10 16:27:18.556 ERROR: gcs/src/gcs_core.cpp:gcs_core_open():220: Failed to open backend connection: -110 (Connection timed out)

2020-11-10 16:27:18.556 ERROR: gcs/src/gcs.cpp:gcs_open():1700: Failed to open channel ‘pxc-poc’ at ‘gcomm://10.10.10.201:4567,10.10.10.202:4567,10.10.10.203:4567’: -110 (Connection timed out)

2020-11-10 16:27:18.556 FATAL: Exception in creating receive loop: Failed to open connection to group: 110 (Connection timed out)

 at garb/garb_gcs.cpp:Gcs():35

@Ghan Confirm using telnet that you can indeed open a connection from the arbitrator node to 10.10.10.201:4567. Since this is 8.0, SSL is enabled by default. Did you copy the SSL certificates from node1 to the arbitrator node?
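A scripted equivalent of the telnet check, as a minimal sketch: it attempts a plain TCP connection to port 4567 on each node address listed in the arbiter log. The names `can_connect` and `check_nodes` and the 3-second timeout are illustrative choices, not part of any Percona tooling. A successful connect only shows that the port is reachable; it says nothing about whether the SSL handshake would succeed.

```python
import socket

# Node addresses taken from the arbiter log in this thread.
NODES = ["10.10.10.201", "10.10.10.202", "10.10.10.203"]
GALERA_PORT = 4567


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_nodes(nodes=NODES, port=GALERA_PORT):
    """Map each node address to its reachability on the given port."""
    return {host: can_connect(host, port) for host in nodes}
```

Run on the arbitrator host, `check_nodes()` returns one True/False entry per node, which makes it easy to see whether a firewall is blocking only some of the peers.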

If you already have a 3-node cluster, why are you adding a 4th member (arbitrator)? An even number of voting members can split 2 vs. 2 with neither half holding a majority, so all this does is add the possibility of split-brain errors.

Why do you feel you need the arbitrator?

Hi @matthewb and thank you for your reply.

First, I did not copy the SSL certificates because that step is not mentioned on the documentation page (https://www.percona.com/doc/percona-xtradb-cluster/8.0/howtos/garbd_howto.html) and I did not know about it. Can you please guide me on how to do that?

Second, I will have a 4-node cluster consisting of 2 primary nodes, 1 arbiter, and 1 primary node with a vote weight of 0. The node with weight 0 will not become primary anymore but will keep its data up to date, so I will use it to take storage snapshots (I will stop the MySQL service, take a hardware-level snapshot of the disk, then start the MySQL service again). Thank you!

You could change the weight of node1 to 2, node2 to 1, and node3 to 0. That way you still have a total vote of 3, and if node3 goes offline, node1 + node2 = 3.

You are right, the SSL docs are not there. You need to SCP the .pem files from node1 over to the arbitrator node and configure garb to use them.

I will change the weight of node1 to 1, node2 to 1, node3 to 0 and arbiter to 1. I think that will be enough.
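To sanity-check a weight layout like this one, the quorum arithmetic can be sketched in a few lines. This assumes Galera's weighted-quorum rule in simplified form: a partition stays primary when the summed weight of its reachable members is strictly greater than half the summed weight of the previous primary component (nodes that left gracefully are ignored here). The function name and the node names are illustrative.

```python
def has_quorum(weights, reachable):
    """Simplified weighted-quorum check.

    weights:   dict mapping node name -> vote weight in the last primary component
    reachable: iterable of node names still visible in this partition
    """
    total = sum(weights.values())
    present = sum(weights[name] for name in reachable)
    # Strictly more than half of the previous total weight is required.
    return 2 * present > total


# Layout proposed in the post above: two full members, a zero-weight
# snapshot node, and the arbiter.
weights = {"node1": 1, "node2": 1, "node3": 0, "arbiter": 1}

# node3 stopped for a snapshot: 3 of 3 votes remain.
print(has_quorum(weights, ["node1", "node2", "arbiter"]))  # True
# node1 lost as well: 2 of 3 votes remain.
print(has_quorum(weights, ["node2", "arbiter"]))           # True
# Only node2 and node3 left: 1 of 3 votes.
print(has_quorum(weights, ["node2", "node3"]))             # False
```

Under these assumptions, stopping node3 for a snapshot never affects quorum, and the cluster tolerates the loss of any one of the three voting members.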

As for SSL, once I have located the SSL files on node1, I don't know how to configure garb to use them.

@Ghan You’ll need to read up on the galera arbitrator documentation for the parameters you need to pass to enable the SSL connectivity. There are some SSL examples here: https://galeracluster.com/library/documentation/arbitrator.html
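As a rough sketch of what the resulting invocation might look like, the snippet below assembles a garbd command line that passes the certificate files through Galera's `socket.ssl_*` provider options. The directory `/etc/garbd/ssl` and the file names `ca.pem`, `server-cert.pem`, and `server-key.pem` are assumptions; substitute whatever was actually copied from node1, and check the option names against the linked Galera documentation for your version.

```python
import shlex


def build_garbd_command(group, nodes, ssl_dir, port=4567):
    """Assemble a garbd argument list with SSL provider options.

    ssl_dir and the .pem file names below are hypothetical placeholders.
    """
    address = "gcomm://" + ",".join(f"{host}:{port}" for host in nodes)
    options = "; ".join([
        f"socket.ssl_ca={ssl_dir}/ca.pem",
        f"socket.ssl_cert={ssl_dir}/server-cert.pem",
        f"socket.ssl_key={ssl_dir}/server-key.pem",
    ])
    return ["garbd", "--group", group, "--address", address, "--options", options]


# Group name and node addresses as they appear in the arbiter log.
cmd = build_garbd_command(
    "pxc-poc",
    ["10.10.10.201", "10.10.10.202", "10.10.10.203"],
    "/etc/garbd/ssl",
)
# Print a shell-quoted form that can be reviewed before running it by hand.
print(shlex.join(cmd))
```

Building the argument list in code and printing it, rather than executing it directly, keeps the sketch safe to run on any machine; the same three options could equally be placed in the garbd configuration file.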