Nodes Cannot Join Percona XtraDB Cluster 8.0.33

Akif_Sinmaz · October 20, 2023, 8:41pm

Hello,

I have reinstalled percona-xtradb-cluster-server-8.0.33-25 many times. The first server starts in bootstrap mode, but the other nodes cannot join the cluster.

2023-10-20T20:31:30.239323Z 0 [Warning] [MY-000000] [Galera] There are no nodes in the same segment that will ever be able to become donors, yet there is a suitable donor outside. Will use that one.
2023-10-20T20:31:30.239411Z 0 [Note] [MY-000000] [Galera] Member 1.1 (ishhysdb4) requested state transfer from 'any'. Selected 0.0 (ishhysdb1)(SYNCED) as donor.
2023-10-20T20:31:30.239497Z 0 [Note] [MY-000000] [Galera] Shifting PRIMARY -> JOINER (TO: 15)
2023-10-20T20:31:30.239614Z 2 [Note] [MY-000000] [Galera] Requesting state transfer: success, donor: 0
2023-10-20T20:31:30.239652Z 2 [Note] [MY-000000] [Galera] Resetting GCache seqno map due to different histories.
2023-10-20T20:31:30.239683Z 2 [Note] [MY-000000] [Galera] GCache history reset: ab45006b-6f5b-11ee-b945-1abbefdb6fc0:0 -> 2f9cb569-6f81-11ee-8223-e65b2c6c5365:15
2023-10-20T20:33:09.347961Z 0 [Note] [MY-000000] [WSREP-SST] Trying to terminate (1037779) socat -u TCP-LISTEN:4444,reuseaddr,retry=30 stdio | /usr/bin/pxc_extra/pxb-8.0/bin/xbstream -x with SIGTERM
2023-10-20T20:33:09.508161Z 0 [Note] [MY-000000] [WSREP-SST] /usr/bin/wsrep_sst_xtrabackup-v2: line 218: 1037781 Exit 143 socat -u TCP-LISTEN:4444,reuseaddr,retry=30 stdio
2023-10-20T20:33:09.508261Z 0 [Note] [MY-000000] [WSREP-SST] 1037782 Terminated | /usr/bin/pxc_extra/pxb-8.0/bin/xbstream -x
2023-10-20T20:33:10.518492Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
2023-10-20T20:33:10.518616Z 0 [ERROR] [MY-000000] [WSREP-SST] Possible timeout in receving first data from donor in gtid/keyring stage
2023-10-20T20:33:10.518722Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 1381
2023-10-20T20:33:10.518822Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************************************************
2023-10-20T20:33:10.519099Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:32
2023-10-20T20:33:10.717793Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.32.12.47' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '1037237' --mysqld-version '8.0.33-25.1' '' : 32 (Broken pipe)
2023-10-20T20:33:10.717910Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
2023-10-20T20:33:10.717942Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 32 (Broken pipe)
2023-10-20T20:33:10.718159Z 3 [Note] [MY-000000] [Galera] Processing SST received
2023-10-20T20:33:10.718218Z 3 [Note] [MY-000000] [Galera] SST request was cancelled
2023-10-20T20:33:10.718293Z 3 [ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 32 (Broken pipe). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2023-10-20T20:33:10.718346Z 3 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()
2023-10-20T20:33:10.718384Z 3 [Note] [MY-000000] [Galera] Closing send monitor...
2023-10-20T20:33:10.718408Z 3 [Note] [MY-000000] [Galera] Closed send monitor.
2023-10-20T20:33:10.718506Z 3 [Note] [MY-000000] [Galera] gcomm: terminating thread
2023-10-20T20:33:10.718676Z 3 [Note] [MY-000000] [Galera] gcomm: joining thread
2023-10-20T20:33:10.718826Z 3 [Note] [MY-000000] [Galera] gcomm: closing backend
2023-10-20T20:33:11.724873Z 3 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))

Taras_Onishchuk · October 20, 2023, 10:32pm

It looks like the port cannot be used. It is hard to tell without actually testing it, but you could check that:

the port is open
there is no software preventing the service from binding to the port (e.g. SELinux or AppArmor)
there is no other software listening on the port (you could use "netstat" to determine that)

Hope this gives you some idea of where to look.

matthewb · October 21, 2023, 2:27pm

This is a strange warning. Please share your entire config for bootstrap node and one of the joiner nodes.

Additionally, as Taras said, check all ports: 4444, 4567, 4568, 3306
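One way to run the port check suggested above from another node is a small TCP connect test. This is only a sketch: the function names are made up for illustration, and the port roles in the comment are the usual PXC defaults rather than something stated in this thread. Port 4444 is normally only listening on a joiner while an SST is in progress (the socat listener in the log above), so a refusal there outside an SST does not by itself indicate a problem.

```python
import socket
import sys

# Ports named in this thread. Usual PXC roles (an assumption, not from the thread):
# 3306 MySQL clients, 4444 SST, 4567 Galera group communication, 4568 IST.
PXC_PORTS = (3306, 4444, 4567, 4568)


def check_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_node(host, ports=PXC_PORTS, timeout=3.0):
    """Map each port to True (reachable) or False (refused, filtered, or timed out)."""
    return {port: check_port(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Hosts are taken from the command line; nothing is probed without arguments.
    for host in sys.argv[1:]:
        for port, ok in check_node(host).items():
            print(f"{host}:{port} {'reachable' if ok else 'NOT reachable'}")
```

Run it from each node against every other node, for example with the node addresses as arguments. A successful connect only shows that something accepted the TCP handshake; a device sitting in front of the servers can still interfere with the traffic afterwards.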

Akif_Sinmaz · October 21, 2023, 7:47pm

Hi,

I checked all the ports and found no issues. Additionally, the installed OS is RHEL 8.7, and SELinux and firewalld are both stopped.

Here is my current cluster configuration. Thanks for your collaboration.

# Provider specific configuration options

wsrep_provider_options = "pc.recovery=TRUE;pc.weight=2;gmcast.segment=1;socket.ssl_compression=no;gcache.keep_pages_size=4;gcache.page_size=128M;gcache.size=5024M;evs.send_window=1024;evs.user_send_window=1024; evs.use_aggregate = true; gcs.fc_factor = 1; gcs.fc_limit = 500; gcs.max_packet_size = 2097152;gcs.fc_debug = 0; evs.inactive_check_period = PT10S; evs.inactive_timeout = PT1M; evs.suspect_timeout = PT30S; evs.stats_report_period = PT1M; evs.info_log_mask = 0; evs.install_timeout = PT1M; evs.join_retrans_period = PT2S; evs.keepalive_period = PT3S; evs.max_install_timeouts = 3; evs.use_aggregate = true; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql; gcache.name = /var/lib/mysql/galera.cache; gcs.fc_debug = 0; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.32.12.48; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT20S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3"

# Binary log expiration period is 604800 seconds, which equals 7 days
binlog_expire_logs_seconds = 604800

######## wsrep ###############

# Path to Galera library
wsrep_provider = /usr/lib64/galera4/libgalera_smm.so

wsrep_cluster_address = gcomm://10.32.12.48,10.32.12.49,10.32.12.45,10.32.12.47,10.32.12.46

# In order for Galera to work correctly binlog format should be ROW
binlog_format = ROW

wsrep_applier_threads = 16
wsrep_certify_nonPK = 1
wsrep_max_ws_rows = 131072
wsrep_node_incoming_address = 10.32.12.48
wsrep_max_ws_size = 1073741824
wsrep_debug = 0
wsrep_retry_autocommit = 5
wsrep_auto_increment_control = 0
auto_increment_increment = 6
auto_increment_offset = 1
wsrep_sync_wait = 0
wsrep_log_conflicts = 1
wsrep_sst_donor_rejects_queries = 0

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode = 2
wsrep_slave_threads = 12

# Node IP address
wsrep_node_address = 10.32.12.48

# Cluster name
wsrep_cluster_name = ish-pxc

# If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name = ishhysdb1

# pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode = DISABLED

# SST method
wsrep_sst_method = xtrabackup-v2
pxc-encrypt-cluster-traffic = OFF
innodb_strict_mode = 0
#lower_case_table_names = 1
sql_mode = NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION

This is the configuration of another node.

# Provider specific configuration options

wsrep_provider_options = "pc.recovery=TRUE;pc.weight=2;gmcast.segment=1;socket.ssl_compression=no;gcache.keep_pages_size=4;gcache.page_size=128M;gcache.size=5024M;evs.send_window=1024;evs.user_send_window=1024; evs.use_aggregate = true; gcs.fc_factor = 1; gcs.fc_limit = 500; gcs.max_packet_size = 2097152;gcs.fc_debug = 0; evs.inactive_check_period = PT10S; evs.inactive_timeout = PT1M; evs.suspect_timeout = PT30S; evs.stats_report_period = PT1M; evs.info_log_mask = 0; evs.install_timeout = PT1M; evs.join_retrans_period = PT2S; evs.keepalive_period = PT3S; evs.max_install_timeouts = 3; evs.use_aggregate = true; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql; gcache.name = /var/lib/mysql/galera.cache; gcs.fc_debug = 0; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.32.12.49; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT20S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3"

# Binary log expiration period is 604800 seconds, which equals 7 days
binlog_expire_logs_seconds = 604800

######## wsrep ###############

# Path to Galera library
wsrep_provider = /usr/lib64/galera4/libgalera_smm.so

wsrep_cluster_address = gcomm://10.32.12.48,10.32.12.49,10.32.12.45,10.32.12.47,10.32.12.46

# In order for Galera to work correctly binlog format should be ROW
binlog_format = ROW

wsrep_applier_threads = 16
wsrep_certify_nonPK = 1
wsrep_max_ws_rows = 131072
wsrep_node_incoming_address = 10.32.12.49
wsrep_max_ws_size = 1073741824
wsrep_debug = 0
wsrep_retry_autocommit = 5
wsrep_auto_increment_control = 0
auto_increment_increment = 6
auto_increment_offset = 1
wsrep_sync_wait = 0
wsrep_log_conflicts = 1
wsrep_sst_donor_rejects_queries = 0

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode = 2
wsrep_slave_threads = 12

# Node IP address
wsrep_node_address = 10.32.12.49

# Cluster name
wsrep_cluster_name = ish-pxc

# If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name = ishhysdb2

# pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode = DISABLED

# SST method
wsrep_sst_method = xtrabackup-v2
pxc-encrypt-cluster-traffic = OFF
innodb_strict_mode = 0
#lower_case_table_names = 1
sql_mode = NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION

yunus_shaikh · October 23, 2023, 8:03am

What error do you see in the error log of the donor?
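When comparing the donor's log with the joiner's, it can help to pull out only the error entries and the SST script's own messages. The following is a rough sketch that assumes the line layout seen in the joiner log at the top of this thread (timestamp, thread id, then three bracketed fields); the function name is invented for illustration and the log file paths are supplied by the caller.

```python
import re
import sys

# Layout assumed from the excerpt in this thread:
# <timestamp> <thread> [<level>] [<code>] [<subsystem>] <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<thread>\d+)\s+\[(?P<level>[^\]]+)\]\s+"
    r"\[(?P<code>[^\]]+)\]\s+\[(?P<subsystem>[^\]]+)\]\s+(?P<message>.*)$"
)


def sst_and_errors(lines):
    """Yield parsed entries that are errors or come from the WSREP-SST subsystem."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # continuation lines such as "view ((empty))" are skipped
        if match["level"] == "ERROR" or match["subsystem"] == "WSREP-SST":
            yield match.groupdict()


if __name__ == "__main__":
    # Log file paths come from the command line; nothing is read without arguments.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for entry in sst_and_errors(handle):
                print(entry["ts"], entry["subsystem"], entry["message"])
```

Running it over the donor's and the joiner's error logs for the same time window makes it easier to see whether the donor ever started sending data before the joiner timed out.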

matthewb · October 23, 2023, 2:26pm

This is unusual. You should keep each node at weight=1. I see other unusual parameters too. I would advise you to stick with the defaults until your cluster is online and functional. Then you can tweak settings.

Akif_Sinmaz · October 23, 2023, 2:36pm

Our service provider had activated an antivirus application in front of the servers. That is why the cluster requests were failing.
Thank you.

Akif_Sinmaz · October 23, 2023, 2:38pm

Thanks for this information.
Have you seen any other strange parameters?

matthewb · October 23, 2023, 3:52pm

wsrep_certify_nonPK
wsrep_max_ws_rows
wsrep_max_ws_size
wsrep_auto_increment_control (leave this default)
wsrep_sync_wait
wsrep_sst_donor_rejects_queries
wsrep_provider_options (there are many in here that rarely need to be set)
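A quick way to review a long wsrep_provider_options value is to split it into key/value pairs and look for keys that are set more than once. In the string posted above, gcs.max_packet_size appears twice with different values (2097152 and 64500), and evs.use_aggregate and gcs.fc_debug are each repeated; the thread does not say which value takes effect. The sketch below is only an illustration with made-up function names, and the sample is an excerpt of the posted string, not the full value.

```python
from collections import defaultdict


def parse_provider_options(raw):
    """Split a wsrep_provider_options string into (key, value) pairs, in order."""
    pairs = []
    # Strip surrounding straight or typographic quotes before splitting on ';'.
    for item in raw.strip().strip('"“”').split(";"):
        if "=" not in item:
            continue  # skip empty fragments such as a trailing ';'
        key, _, value = item.partition("=")
        pairs.append((key.strip(), value.strip()))
    return pairs


def find_duplicates(pairs):
    """Return {key: [values...]} for every key that is set more than once."""
    seen = defaultdict(list)
    for key, value in pairs:
        seen[key].append(value)
    return {key: values for key, values in seen.items() if len(values) > 1}


# Excerpt of the options string posted in this thread (not the full value).
sample = (
    "pc.weight=2;gcs.max_packet_size = 2097152;gcs.fc_debug = 0; "
    "evs.use_aggregate = true; evs.use_aggregate = true; "
    "gcs.fc_debug = 0; gcs.max_packet_size = 64500"
)

for key, values in find_duplicates(parse_provider_options(sample)).items():
    print(key, values)
```

Removing repeated keys, and then removing anything that merely restates a default, leaves a much shorter value that is easier to reason about while the cluster is being brought online.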
