Percona Cluster Node Timeout in Azure VM

kvigneshs · February 25, 2020, 6:21am

I have a Percona XtraDB Cluster running on Azure VMs with three nodes: two MySQL nodes (NodeA and NodeB) and one garbd arbitrator (NodeC).
Node A and Node C are in the same VNet and are connected to NodeB through a single VNet peering.
I joined the nodes into a cluster over VNet peering using private IP addresses. The issue is that a MySQL node loses its connection and mysqld stops; the logs below were found. Is there any fix for this?

In Stopped Node (NodeA)

WSREP: Failed to report last committed 20712650, -110 (Connection timed out)
WSREP: last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT2.35911S), skipping check
Log of wsrep recovery (--wsrep-recover)

In Running Node (NodeB) - Bootstrap

2020-02-25T10:50:58.405580Z 0 [Note] WSREP: (9288914c, 'tcp://0.0.0.0:4567') connection to peer 807f459c with addr tcp://NodeB:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2020-02-25T10:50:58.406359Z 0 [Note] WSREP: (9288914c, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://NodeB:4567
2020-02-25T10:50:59.630488Z 0 [Note] WSREP: (9288914c, 'tcp://0.0.0.0:4567') reconnecting to 807f459c (tcp://NodeB:4567), attempt 0
2020-02-25T10:51:00.906739Z 0 [Note] WSREP: declaring node with index 0 suspected, timeout PT5S (evs.suspect_timeout)
2020-02-25T10:51:00.906800Z 0 [Note] WSREP: evs:: proto(9288914c, GATHER, view_id(REG,807f459c,3)) suspecting node: 807f459c
2020-02-25T10:51:00.906815Z 0 [Note] WSREP: evs:: proto(9288914c, GATHER, view_id(REG,807f459c,3)) suspected node without join message, declaring inactive
2020-02-25T10:51:01.406965Z 0 [Note] WSREP: declaring node with index 0 inactive (evs.inactive_timeout)
2020-02-25T10:51:01.527169Z 0 [Note] WSREP: declaring a4893046 at tcp://NodeC:4444 stable
2020-02-25T10:51:01.647292Z 0 [Note] WSREP: Node 9288914c state primary
2020-02-25T10:51:01.767513Z 0 [Note] WSREP: Current view of cluster as seen by this node

In Running Arbitrator Node (Node C)

2020-02-25 10:50:58.355 INFO: (a4893046, 'tcp://0.0.0.0:4444') connection to peer 807f459c with addr tcp://NodeB:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2020-02-25 10:50:58.355 INFO: (a4893046, 'tcp://0.0.0.0:4444') turning message relay requesting on, nonlive peers: tcp://NodeB:4567
2020-02-25 10:50:59.356 INFO: (a4893046, 'tcp://0.0.0.0:4444') reconnecting to 807f459c (tcp://NodeB:4567), attempt 0
2020-02-25 10:51:00.356 INFO: declaring node with index 0 suspected, timeout PT5S (evs.suspect_timeout)
2020-02-25 10:51:00.356 INFO: evs:: proto(a4893046, OPERATIONAL, view_id(REG,807f459c,3)) suspecting node: 807f459c
2020-02-25 10:51:00.356 INFO: evs:: proto(a4893046, OPERATIONAL, view_id(REG,807f459c,3)) suspected node without join message, declaring inactive
2020-02-25 10:51:00.856 INFO: declaring node with index 0 inactive (evs.inactive_timeout)

kvigneshs · February 26, 2020, 6:08am

Any help on this please.

lorraine.pocklington · February 26, 2020, 6:35am

Hi kvigneshs

Could you please add this information:
- version of Percona XtraDB Cluster
- version or other information about the Azure environment
- copies of the my.cnf for each of the nodes in the cluster

Meanwhile, I will share this link with the team in case they have any suggestions.

lorraine.pocklington · February 26, 2020, 6:38am

Please also see this JIRA post. If you have responses to Przemyslaw's questions, that could help a great deal: [PXC-2285] A lot of WSREP: Failed to report last committed <number>, -110 (Connection timed out) - Percona JIRA

kvigneshs · February 27, 2020, 9:17am

Hi Lorraine,

Thanks for your reply. Please find the MySQL configuration below; the same configuration is used for Node A and Node B. The Percona XtraDB Cluster version I'm using is 5.7.28.

############### mysqld.cnf ###############

# Template my.cnf for PXC
# Edit to your requirements.
[client]
socket=/var/lib/mysql/mysql.sock
[mysqld]
server-id=1
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysql/mysql-error.log
pid-file=/var/run/mysqld/mysqld.pid
log-bin
log_slave_updates
expire_logs_days=7
sql_mode=''
innodb_buffer_pool_size = 4G # (adjust value here, 50%-70% of total RAM)
innodb_buffer_pool_instances=10
innodb_log_file_size = 1G
#innodb_log_file_size = 50331648
innodb_flush_log_at_trx_commit = 1 # may change to 2 or 0
innodb_flush_method = O_DIRECT
innodb_read_io_threads = 16
innodb_write_io_threads = 16
innodb_io_capacity = 3000
innodb_io_capacity_max = 6000
innodb_temp_data_file_path=ibtmp1:12M:autoextend:max:1G
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
log_bin_trust_function_creators = 1
query_cache_type = 1
query_cache_size =125M
innodb_autoinc_lock_mode = 2
hot_cache.key_buffer_size=1G
slow_query_log = 1
slow_query_log_file=/var/log/mysql/slow-query.log
long_query_time=10
log-queries-not-using-indexes



############### wsrep.cnf ###############

[mysqld]
# Path to Galera library
wsrep_provider=/usr/lib64/galera3/libgalera_smm.so
# Cluster connection URL contains IPs of nodes
#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address=gcomm://NODE_A,NODE_B,NODE_C
wsrep_provider_options="gcache.size=3G;gcache.page_size=1G;gcache.recover=yes"
# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW
# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB
# Slave thread to use
wsrep_slave_threads=16
wsrep_log_conflicts=ON
# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2
# Node IP address
wsrep_node_address=IPADDRESS_A/B
# Cluster name
wsrep_cluster_name=decodeglobal-cluster
#If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name=CLUSTERNAME
#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=DISABLED
# SST method
wsrep_sst_method=xtrabackup-v2
#Authentication for SST method
wsrep_sst_auth="USER:PASSWORD"
max_connections=500
max_connect_errors=500
sql_mode=''


############### mysqld_safe.cnf ###############
#
# The Percona Server 5.7 configuration file.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html

[mysqld_safe]
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/lib/mysql/mysql.sock
nice = 0

Azure Environment:

Node A and Node C are in the same region and the same VNet, but Node B is in a different region, connected through Azure VNet peering.
I have tried clustering these machines using both public and private IP addresses.
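
For reference, the timeouts named in the logs above (gmcast.peer_timeout, evs.suspect_timeout, evs.inactive_timeout) are Galera provider options and can be set through wsrep_provider_options. The fragment below is only a sketch of what a more latency-tolerant setting for a cross-region link might look like. The evs.* values are assumptions chosen for illustration, were not suggested anywhere in this thread, and have not been tested on this setup. Because wsrep_provider_options is a single string, the gcache settings from wsrep.cnf are repeated in it.

```ini
[mysqld]
# Illustrative values only (assumed, not taken from this thread).
# Longer evs.* timeouts make the cluster slower to evict a peer
# when the inter-region link stalls briefly.
wsrep_provider_options="gcache.size=3G;gcache.page_size=1G;gcache.recover=yes;evs.keepalive_period=PT3S;evs.suspect_timeout=PT30S;evs.inactive_timeout=PT1M;evs.install_timeout=PT1M"
```

The trade-off is that a node that has really failed is also detected later, so the values would need to be weighed against the actual latency between the two regions. The same evs.* values would presumably have to be applied on every cluster member, including the arbitrator.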
