Nodes Cannot Join the Percona XtraDB Cluster 8.0.33

Hello,

I have reinstalled percona-xtradb-cluster-server-8.0.33-25 many times. I start the first server in bootstrap mode, but I cannot get the other nodes to join the cluster.

2023-10-20T20:31:30.239323Z 0 [Warning] [MY-000000] [Galera] There are no nodes in the same segment that will ever be able to become donors, yet there is a suitable donor outside. Will use that one.
2023-10-20T20:31:30.239411Z 0 [Note] [MY-000000] [Galera] Member 1.1 (ishhysdb4) requested state transfer from ‘any’. Selected 0.0 (ishhysdb1)(SYNCED) as donor.
2023-10-20T20:31:30.239497Z 0 [Note] [MY-000000] [Galera] Shifting PRIMARY → JOINER (TO: 15)
2023-10-20T20:31:30.239614Z 2 [Note] [MY-000000] [Galera] Requesting state transfer: success, donor: 0
2023-10-20T20:31:30.239652Z 2 [Note] [MY-000000] [Galera] Resetting GCache seqno map due to different histories.
2023-10-20T20:31:30.239683Z 2 [Note] [MY-000000] [Galera] GCache history reset: ab45006b-6f5b-11ee-b945-1abbefdb6fc0:0 → 2f9cb569-6f81-11ee-8223-e65b2c6c5365:15
2023-10-20T20:33:09.347961Z 0 [Note] [MY-000000] [WSREP-SST] Trying to terminate (1037779) socat -u TCP-LISTEN:4444,reuseaddr,retry=30 stdio | /usr/bin/pxc_extra/pxb-8.0/bin/xbstream -x with SIGTERM
2023-10-20T20:33:09.508161Z 0 [Note] [MY-000000] [WSREP-SST] /usr/bin/wsrep_sst_xtrabackup-v2: line 218: 1037781 Exit 143 socat -u TCP-LISTEN:4444,reuseaddr,retry=30 stdio
2023-10-20T20:33:09.508261Z 0 [Note] [MY-000000] [WSREP-SST] 1037782 Terminated | /usr/bin/pxc_extra/pxb-8.0/bin/xbstream -x
2023-10-20T20:33:10.518492Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
2023-10-20T20:33:10.518616Z 0 [ERROR] [MY-000000] [WSREP-SST] Possible timeout in receving first data from donor in gtid/keyring stage
2023-10-20T20:33:10.518722Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 1381
2023-10-20T20:33:10.518822Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************************************************
2023-10-20T20:33:10.519099Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:32
2023-10-20T20:33:10.717793Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role ‘joiner’ --address ‘10.32.12.47’ --datadir ‘/var/lib/mysql/’ --basedir ‘/usr/’ --plugindir ‘/usr/lib64/mysql/plugin/’ --defaults-file ‘/etc/my.cnf’ --defaults-group-suffix ‘’ --parent ‘1037237’ --mysqld-version ‘8.0.33-25.1’ ‘’ : 32 (Broken pipe)
2023-10-20T20:33:10.717910Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
2023-10-20T20:33:10.717942Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 32 (Broken pipe)
2023-10-20T20:33:10.718159Z 3 [Note] [MY-000000] [Galera] Processing SST received
2023-10-20T20:33:10.718218Z 3 [Note] [MY-000000] [Galera] SST request was cancelled
2023-10-20T20:33:10.718293Z 3 [ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 32 (Broken pipe). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2023-10-20T20:33:10.718346Z 3 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()
2023-10-20T20:33:10.718384Z 3 [Note] [MY-000000] [Galera] Closing send monitor…
2023-10-20T20:33:10.718408Z 3 [Note] [MY-000000] [Galera] Closed send monitor.
2023-10-20T20:33:10.718506Z 3 [Note] [MY-000000] [Galera] gcomm: terminating thread
2023-10-20T20:33:10.718676Z 3 [Note] [MY-000000] [Galera] gcomm: joining thread
2023-10-20T20:33:10.718826Z 3 [Note] [MY-000000] [Galera] gcomm: closing backend
2023-10-20T20:33:11.724873Z 3 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))

It looks like the port cannot be used. It is hard to tell without actually testing it, but you could check:

  • the port is open
  • there is no software preventing the service from binding to the port (e.g. SELinux or AppArmor)
  • there is no other software listening on the port (you could use “netstat” to determine that)

Hope this gives you some idea of where to look.

The "no nodes in the same segment" warning is strange. Please share the entire config for the bootstrap node and one of the joiner nodes.

Additionally, as Taras said, check all ports: 4444, 4567, 4568, 3306
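If it helps, here is a minimal Python sketch for testing TCP reachability of those four ports from one node to another. The function name `check_ports` and the default timeout are my own choices, not anything shipped with PXC. Keep in mind that, as far as I know, 4444 is only listening on the joiner while an SST is actually in progress (that is the `socat -u TCP-LISTEN:4444` in the log above), and 4568 similarly only during IST, so a "closed" result for those two outside a transfer is expected.

```python
import socket

# Ports named in this thread: 3306 (MySQL), 4444 (SST), 4567 (group
# communication), 4568 (IST).
PXC_PORTS = (3306, 4444, 4567, 4568)


def check_ports(host, ports=PXC_PORTS, timeout=3.0):
    """Return {port: bool}; True means a TCP connect to host:port succeeded."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or
            # name-resolution failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

Run it on the donor against the joiner's address (and the other way around), for example `print(check_ports("10.32.12.47"))`. A port that is open locally but unreachable from the peer points at something in between the two hosts.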

Hi,

I checked all the ports and found no issues. Additionally, the installed OS is RHEL 8.7, and SELinux and firewalld are stopped.

Here is my current cluster configuration. Thanks for your collaboration.

# Provider specific configuration options

wsrep_provider_options = “pc.recovery=TRUE;pc.weight=2;gmcast.segment=1;socket.ssl_compression=no;gcache.keep_pages_size=4;gcache.page_size=128M;gcache.size=5024M;evs.send_window=1024;evs.user_send_window=1024; evs.use_aggregate = true; gcs.fc_factor = 1; gcs.fc_limit = 500; gcs.max_packet_size = 2097152;gcs.fc_debug = 0; evs.inactive_check_period = PT10S; evs.inactive_timeout = PT1M; evs.suspect_timeout = PT30S; evs.stats_report_period = PT1M; evs.info_log_mask = 0; evs.install_timeout = PT1M; evs.join_retrans_period = PT2S; evs.keepalive_period = PT3S; evs.max_install_timeouts = 3; evs.use_aggregate = true; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql; gcache.name = /var/lib/mysql/galera.cache; gcs.fc_debug = 0; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.32.12.48; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT20S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3”

# Binary log expiration period is 604800 seconds, which equals 7 days

binlog_expire_logs_seconds = 604800

######## wsrep ###############

# Path to Galera library

wsrep_provider = /usr/lib64/galera4/libgalera_smm.so

wsrep_cluster_address = gcomm://10.32.12.48,10.32.12.49,10.32.12.45,10.32.12.47,10.32.12.46

# In order for Galera to work correctly binlog format should be ROW

binlog_format = ROW

wsrep_applier_threads = 16
wsrep_certify_nonPK = 1
wsrep_max_ws_rows = 131072
wsrep_node_incoming_address = 10.32.12.48
wsrep_max_ws_size = 1073741824
wsrep_debug = 0
wsrep_retry_autocommit = 5
wsrep_auto_increment_control = 0
auto_increment_increment = 6
auto_increment_offset = 1
wsrep_sync_wait = 0
wsrep_log_conflicts = 1
wsrep_sst_donor_rejects_queries= 0

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera

innodb_autoinc_lock_mode = 2
wsrep_slave_threads = 12

# Node IP address

wsrep_node_address = 10.32.12.48

# Cluster name

wsrep_cluster_name = ish-pxc

#If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name = ishhysdb1

#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode = DISABLED

# SST method

wsrep_sst_method = xtrabackup-v2
pxc-encrypt-cluster-traffic = OFF
innodb_strict_mode = 0
#lower_case_table_names = 1
sql_mode = NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION

This is the configuration of another node.

# Provider specific configuration options

wsrep_provider_options = “pc.recovery=TRUE;pc.weight=2;gmcast.segment=1;socket.ssl_compression=no;gcache.keep_pages_size=4;gcache.page_size=128M;gcache.size=5024M;evs.send_window=1024;evs.user_send_window=1024; evs.use_aggregate = true; gcs.fc_factor = 1; gcs.fc_limit = 500; gcs.max_packet_size = 2097152;gcs.fc_debug = 0; evs.inactive_check_period = PT10S; evs.inactive_timeout = PT1M; evs.suspect_timeout = PT30S; evs.stats_report_period = PT1M; evs.info_log_mask = 0; evs.install_timeout = PT1M; evs.join_retrans_period = PT2S; evs.keepalive_period = PT3S; evs.max_install_timeouts = 3; evs.use_aggregate = true; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql; gcache.name = /var/lib/mysql/galera.cache; gcs.fc_debug = 0; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.32.12.49; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT20S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3”

# Binary log expiration period is 604800 seconds, which equals 7 days

binlog_expire_logs_seconds = 604800

######## wsrep ###############

# Path to Galera library

wsrep_provider = /usr/lib64/galera4/libgalera_smm.so

wsrep_cluster_address = gcomm://10.32.12.48,10.32.12.49,10.32.12.45,10.32.12.47,10.32.12.46

# In order for Galera to work correctly binlog format should be ROW

binlog_format = ROW

wsrep_applier_threads = 16
wsrep_certify_nonPK = 1
wsrep_max_ws_rows = 131072
wsrep_node_incoming_address = 10.32.12.49
wsrep_max_ws_size = 1073741824
wsrep_debug = 0
wsrep_retry_autocommit = 5
wsrep_auto_increment_control = 0
auto_increment_increment = 6
auto_increment_offset = 1
wsrep_sync_wait = 0
wsrep_log_conflicts = 1
wsrep_sst_donor_rejects_queries= 0

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera

innodb_autoinc_lock_mode = 2
wsrep_slave_threads = 12

# Node IP address

wsrep_node_address = 10.32.12.49

# Cluster name

wsrep_cluster_name = ish-pxc

#If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name = ishhysdb2

#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode = DISABLED

# SST method

wsrep_sst_method = xtrabackup-v2
pxc-encrypt-cluster-traffic = OFF
innodb_strict_mode = 0
#lower_case_table_names = 1
sql_mode = NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION

What error do you see in the error log of the donor?
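To pull just the relevant lines out of a long error log, something like the following Python sketch may help. It assumes the log format shown in the joiner output above (timestamp, thread id, then bracketed level, error code, and subsystem); the names `LOG_LINE` and `filter_log` are made up for this example.

```python
import re

# Matches lines such as:
# 2023-10-20T20:33:10.518492Z 0 [ERROR] [MY-000000] [WSREP-SST] message
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<thread>\d+)\s+\[(?P<level>[^\]]+)\]\s+"
    r"\[(?P<code>[^\]]+)\]\s+\[(?P<subsystem>[^\]]+)\]\s+(?P<msg>.*)$"
)


def filter_log(lines, levels=("ERROR", "Warning"),
               subsystems=("WSREP-SST", "WSREP", "Galera")):
    """Yield (timestamp, level, subsystem, message) for matching log lines."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        # Lines that do not follow the format (e.g. "view ((empty))") are skipped.
        if m and m["level"] in levels and m["subsystem"] in subsystems:
            yield m["ts"], m["level"], m["subsystem"], m["msg"]
```

Usage would be along the lines of `for row in filter_log(open(path)): print(*row)`, where `path` is whatever `log_error` points to on the donor. Comparing the donor's timestamps with the joiner's around 20:31:30 to 20:33:10 should show whether the donor ever started sending.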

Setting pc.weight=2 is unusual. You should keep each node at weight=1. I see other unusual parameters too. I would advise you to stick with the defaults until after your cluster is online and functional. Then you can tweak settings.

The company we receive service from had enabled an antivirus application in front of the servers, which is why the cluster requests were failing.
Thank you.

Thanks for this information.
Have you seen any other strange parameters?

wsrep_certify_nonPK
wsrep_max_ws_rows
wsrep_max_ws_size
wsrep_auto_increment_control (leave this default)
wsrep_sync_wait
wsrep_sst_donor_rejects_queries
wsrep_provider_options (there are many in here that rarely need to be set)
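On that last point, the posted wsrep_provider_options string also sets some keys more than once: gcs.max_packet_size appears as both 2097152 and 64500, and evs.use_aggregate and gcs.fc_debug are repeated. I would not rely on which value ends up taking effect. A small Python sketch for spotting such repeats follows; the function names are invented for this example and it only does plain string splitting.

```python
from collections import defaultdict


def parse_provider_options(raw):
    """Split a wsrep_provider_options string into a list of (key, value) pairs."""
    pairs = []
    # Strip surrounding straight or curly quotes, as pasted in the thread.
    for item in raw.strip().strip('"“”').split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        pairs.append((key.strip(), value.strip()))
    return pairs


def repeated_options(raw):
    """Return {key: [values...]} for every key that appears more than once."""
    seen = defaultdict(list)
    for key, value in parse_provider_options(raw):
        seen[key].append(value)
    return {k: v for k, v in seen.items() if len(v) > 1}


# Excerpt of the options string posted above.
opts = ("pc.weight=2; gcs.max_packet_size = 2097152; gcs.fc_debug = 0; "
        "evs.use_aggregate = true; evs.use_aggregate = true; "
        "gcs.fc_debug = 0; gcs.max_packet_size = 64500")
print(repeated_options(opts))
```

Pasting the full string from my.cnf into `opts` lists every repeated key with all of its values, which makes the conflicting ones easy to spot before trimming the list down to what is really needed.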