Percona Cluster Node Timeout in Azure VM

I have a Percona XtraDB Cluster running on Azure VMs with three nodes: two MySQL nodes (NodeA and NodeB) and one Galera Arbitrator (garbd) node (NodeC).
NodeA and NodeC are in the same VNet and are connected to NodeB through a single VNet peering.
I joined the nodes into a cluster over the VNet peering using private IP addresses. The problem is that a MySQL node loses its connection and mysqld stops. The logs below were found. Is there any fix for this?

In Stopped Node (NodeA)

WSREP: Failed to report last committed 20712650, -110 (Connection timed out)
WSREP: last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT2.35911S), skipping check
Log of wsrep recovery (--wsrep-recover)

In Running Node (NodeB) - Bootstrap

2020-02-25T10:50:58.405580Z 0 [Note] WSREP: (9288914c, 'tcp://0.0.0.0:4567') connection to peer 807f459c with addr tcp://NodeB:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2020-02-25T10:50:58.406359Z 0 [Note] WSREP: (9288914c, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://NodeB:4567
2020-02-25T10:50:59.630488Z 0 [Note] WSREP: (9288914c, 'tcp://0.0.0.0:4567') reconnecting to 807f459c (tcp://NodeB:4567), attempt 0
2020-02-25T10:51:00.906739Z 0 [Note] WSREP: declaring node with index 0 suspected, timeout PT5S (evs.suspect_timeout)
2020-02-25T10:51:00.906800Z 0 [Note] WSREP: evs:: proto(9288914c, GATHER, view_id(REG,807f459c,3)) suspecting node: 807f459c
2020-02-25T10:51:00.906815Z 0 [Note] WSREP: evs:: proto(9288914c, GATHER, view_id(REG,807f459c,3)) suspected node without join message, declaring inactive
2020-02-25T10:51:01.406965Z 0 [Note] WSREP: declaring node with index 0 inactive (evs.inactive_timeout)
2020-02-25T10:51:01.527169Z 0 [Note] WSREP: declaring a4893046 at tcp://NodeC:4444 stable
2020-02-25T10:51:01.647292Z 0 [Note] WSREP: Node 9288914c state primary
2020-02-25T10:51:01.767513Z 0 [Note] WSREP: Current view of cluster as seen by this node

In Running Arbitrator Node (Node C)

2020-02-25 10:50:58.355 INFO: (a4893046, 'tcp://0.0.0.0:4444') connection to peer 807f459c with addr tcp://NodeB:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2020-02-25 10:50:58.355 INFO: (a4893046, 'tcp://0.0.0.0:4444') turning message relay requesting on, nonlive peers: tcp://NodeB:4567
2020-02-25 10:50:59.356 INFO: (a4893046, 'tcp://0.0.0.0:4444') reconnecting to 807f459c (tcp://NodeB:4567), attempt 0
2020-02-25 10:51:00.356 INFO: declaring node with index 0 suspected, timeout PT5S (evs.suspect_timeout)
2020-02-25 10:51:00.356 INFO: evs:: proto(a4893046, OPERATIONAL, view_id(REG,807f459c,3)) suspecting node: 807f459c
2020-02-25 10:51:00.356 INFO: evs:: proto(a4893046, OPERATIONAL, view_id(REG,807f459c,3)) suspected node without join message, declaring inactive
2020-02-25 10:51:00.856 INFO: declaring node with index 0 inactive (evs.inactive_timeout)

Any help on this please.

Hi kvigneshs

Could you please add this information:
- version of Percona XtraDB Cluster
- version or other information about the Azure environment
- copies of the my.cnf for each of the nodes in the cluster

Meanwhile, I will share this link with the team in case they have any suggestions.

Please also see this JIRA post. If you have responses to Przemyslaw’s questions, that could help a great deal: https://jira.percona.com/browse/PXC-2285

Hi Lorraine,

Thanks for your reply. Please find the MySQL configuration below; the same configuration is used for NodeA and NodeB. The Percona XtraDB Cluster version I'm using is 5.7.28.

############### mysqld.cnf ###############

# Template my.cnf for PXC
# Edit to your requirements.
[client]
socket=/var/lib/mysql/mysql.sock
[mysqld]
server-id=1
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysql/mysql-error.log
pid-file=/var/run/mysqld/mysqld.pid
log-bin
log_slave_updates
expire_logs_days=7
sql_mode=''
innodb_buffer_pool_size = 4G # (adjust value here, 50%-70% of total RAM)
innodb_buffer_pool_instances=10
innodb_log_file_size = 1G
#innodb_log_file_size = 50331648
innodb_flush_log_at_trx_commit = 1 # may change to 2 or 0
innodb_flush_method = O_DIRECT
innodb_read_io_threads = 16
innodb_write_io_threads = 16
innodb_io_capacity = 3000
innodb_io_capacity_max = 6000
innodb_temp_data_file_path=ibtmp1:12M:autoextend:max:1G
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
log_bin_trust_function_creators = 1
query_cache_type = 1
query_cache_size =125M
innodb_autoinc_lock_mode = 2
hot_cache.key_buffer_size=1G
slow_query_log = 1
slow_query_log_file=/var/log/mysql/slow-query.log
long_query_time=10
log-queries-not-using-indexes



############### wsrep.cnf ###############

[mysqld]
# Path to Galera library
wsrep_provider=/usr/lib64/galera3/libgalera_smm.so
# Cluster connection URL contains IPs of nodes
#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address=gcomm://NODE_A,NODE_B,NODE_C
wsrep_provider_options="gcache.size=3G;gcache.page_size=1G;gcache.recover=yes"
# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW
# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB
# Slave thread to use
wsrep_slave_threads=16
wsrep_log_conflicts=ON
# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2
# Node IP address
wsrep_node_address=IPADDRESS_A/B
# Cluster name
wsrep_cluster_name=decodeglobal-cluster
#If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name=CLUSTERNAME
#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=DISABLED
# SST method
wsrep_sst_method=xtrabackup-v2
#Authentication for SST method
wsrep_sst_auth="USER:PASSWORD"
max_connections=500
max_connect_errors=500
sql_mode=''


############### mysqld_safe.cnf ###############
#
# The Percona Server 5.7 configuration file.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html

[mysqld_safe]
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/lib/mysql/mysql.sock
nice = 0

Azure Environment:

NodeA and NodeC are in the same region and the same VNet, but NodeB is in a different region, connected through Azure VNet peering.
I have tried clustering these machines using both public and private IP addresses.
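
For reference, the timeouts named in the logs above (gmcast.peer_timeout, evs.suspect_timeout, evs.inactive_timeout) are Galera provider options that are set through wsrep_provider_options. The fragment below is only a sketch of how they could be relaxed for a cross-region link, keeping the gcache settings from the wsrep.cnf above. The durations are illustrative assumptions, not values confirmed for this cluster, and would need to be tuned against the latency actually measured between the two regions.

```ini
[mysqld]
# Sketch only: the evs.* durations below are assumed values for a WAN link.
# The logs above show evs.suspect_timeout at PT5S; here it is raised so that
# a short network stall does not get a node declared suspected.
# evs.keepalive_period should stay below evs.suspect_timeout, and
# evs.inactive_timeout should stay above evs.suspect_timeout.
# All options go on one line, because a second wsrep_provider_options entry
# would replace the first instead of adding to it.
wsrep_provider_options="gcache.size=3G;gcache.page_size=1G;gcache.recover=yes;evs.keepalive_period=PT3S;evs.suspect_timeout=PT30S;evs.inactive_timeout=PT1M;evs.install_timeout=PT1M"
```

If something like this is tried, the same evs.* values would presumably need to be applied on NodeA, NodeB and the arbitrator on NodeC, since the nodes judge each other's liveness with their own local settings.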