Help: my Percona XtraDB Cluster is stuck

My cluster is not fully running.
I have tried restarting the VMs and deleting the pods so they get recreated, but the cluster still does not come up.

HAProxy check output:

The following values are used for PXC node 192.168.6.7 in backend galera-nodes:
wsrep_local_state is ; pxc_maint_mod is ; wsrep_cluster_status is ; 2 nodes are available
PXC node 192.168.6.7 for backend galera-nodes is not ok
ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.0.100' (111)
The following values are used for PXC node 192.168.0.100 in backend galera-admin-nodes:
wsrep_local_state is ; pxc_maint_mod is ; wsrep_cluster_status is ; 2 nodes are available
PXC node 192.168.0.100 for backend galera-admin-nodes is not ok
ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.6.7' (111)
The following values are used for PXC node 192.168.6.7 in backend galera-admin-nodes:
wsrep_local_state is ; pxc_maint_mod is ; wsrep_cluster_status is ; 2 nodes are available
PXC node 192.168.6.7 for backend galera-admin-nodes is not ok
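
ERROR 2003 with errno 111 is a plain TCP "connection refused", so mysqld on those nodes is not listening at all, which is why the wsrep values in the check output come back empty. For reference, these are the variables the HAProxy check is reading; once mysqld accepts connections they can be queried directly on a node (the monitor user below is only a placeholder):

# Placeholder credentials; queries the same values the HAProxy health check reports above.
mysql -h 192.168.6.7 -u monitor -p -e "
  SHOW GLOBAL STATUS LIKE 'wsrep_local_state';
  SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
  SHOW GLOBAL VARIABLES LIKE 'pxc_maint_mode';"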

Logs from the stuck node:

[26] pxc.cluster1-pxc-2.mysqld-error.log: [1630924555.120780992, {"log"=>"2021-09-06T10:35:55.120549Z 0 [Note] [MY-000000] [WSREP] Starting replication"}]
[27] pxc.cluster1-pxc-2.mysqld-error.log: [1630924555.120942420, {"log"=>"2021-09-06T10:35:55.120754Z 0 [Note] [MY-000000] [Galera] Connecting with bootstrap option: 0"}]
[28] pxc.cluster1-pxc-2.mysqld-error.log: [1630924555.121128766, {"log"=>"2021-09-06T10:35:55.120944Z 0 [Note] [MY-000000] [Galera] Setting GCS initial position to fe2fa9e6-0bc8-11ec-8466-c69f6ee237c8:172242"}]
[29] pxc.cluster1-pxc-2.mysqld-error.log: [1630924555.121395857, {"log"=>"2021-09-06T10:35:55.121208Z 0 [Note] [MY-000000] [Galera] protonet asio version 0"}]
[30] pxc.cluster1-pxc-2.mysqld-error.log: [1630924555.121971133, {"log"=>"2021-09-06T10:35:55.121816Z 0 [Note] [MY-000000] [Galera] Using CRC-32C for message checksums."}]
[31] pxc.cluster1-pxc-2.mysqld-error.log: [1630924555.122168534, {"log"=>"2021-09-06T10:35:55.121969Z 0 [Note] [MY-000000] [Galera] initializing ssl context"}]
[32] pxc.cluster1-pxc-2.mysqld-error.log: [1630924555.122639960, {"log"=>"2021-09-06T10:35:55.122423Z 0 [Note] [MY-000000] [Galera] backend: asio"}]
[33] pxc.cluster1-pxc-2.mysqld-error.log: [1630924555.122895856, {"log"=>"2021-09-06T10:35:55.122705Z 0 [Note] [MY-000000] [Galera] gcomm thread scheduling priority set to other:0 "}]
[34] pxc.cluster1-pxc-2.mysqld-error.log: [1630924555.123194286, {"log"=>"2021-09-06T10:35:55.122995Z 0 [Warning] [MY-000000] [Galera] Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown"}]
[35] pxc.cluster1-pxc-2.mysqld-error.log: [1630924555.123414833, {"log"=>"2021-09-06T10:35:55.123184Z 0 [Note] [MY-000000] [Galera] Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown"}]
[36] pxc.cluster1-pxc-2.mysqld-error.log: [1630924555.123840002, {"log"=>"2021-09-06T10:35:55.123666Z 0 [Note] [MY-000000] [Galera] GMCast version 0"}]
[37] pxc.cluster1-pxc-2.mysqld-error.log: [1630924555.124179266, {"log"=>"2021-09-06T10:35:55.124010Z 0 [Note] [MY-000000] [Galera] (37240c87-9ed6, 'ssl://0.0.0.0:4567') listening at ssl://0.0.0.0:4567"}]
[38] pxc.cluster1-pxc-2.mysqld-error.log: [1630924555.124403181, {"log"=>"2021-09-06T10:35:55.124158Z 0 [Note] [MY-000000] [Galera] (37240c87-9ed6, 'ssl://0.0.0.0:4567') multicast: , ttl: 1"}]
[39] pxc.cluster1-pxc-2.mysqld-error.log: [1630924555.124900097, {"log"=>"2021-09-06T10:35:55.124716Z 0 [Note] [MY-000000] [Galera] EVS version 1"}]
[40] pxc.cluster1-pxc-2.mysqld-error.log: [1630924555.125158754, {"log"=>"2021-09-06T10:35:55.124983Z 0 [Note] [MY-000000] [Galera] gcomm: connecting to group 'cluster1-pxc', peer '192.168.0.100:,192.168.6.7:'"}]
[0] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127278661, {"log"=>"2021-09-06T10:35:58.126989Z 0 [Note] [MY-000000] [Galera] announce period timed out (pc.announce_timeout)"}]

[1] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127478766, {"log"=>"2021-09-06T10:35:58.127276Z 0 [Note] [MY-000000] [Galera] EVS version upgrade 0 -> 1"}]
[2] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127625161, {"log"=>"2021-09-06T10:35:58.127421Z 0 [Note] [MY-000000] [Galera] PC protocol upgrade 0 -> 1"}]
[3] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127850355, {"log"=>"2021-09-06T10:35:58.127644Z 0 [Warning] [MY-000000] [Galera] no nodes coming from prim view, prim not possible"}]
[4] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127973903, {"log"=>"2021-09-06T10:35:58.127821Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node"}]
[5] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127975122, {"log"=>"view (view_id(NON_PRIM,37240c87-9ed6,1)"}]
[6] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127975941, {"log"=>"memb {"}]
[7] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127976241, {"log"=>"       37240c87-9ed6,0"}]
[8] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127976431, {"log"=>"       }"}]
[9] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127976648, {"log"=>"joined {"}]
[10] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127977023, {"log"=>"      }"}]
[11] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127977268, {"log"=>"left {"}]
[12] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127977499, {"log"=>"      }"}]
[13] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127977793, {"log"=>"partitioned {"}]
[14] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127978028, {"log"=>"      }"}]
[15] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127978276, {"log"=>")"}]
[0] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.628627499, {"log"=>"2021-09-06T10:35:58.628333Z 0 [Warning] [MY-000000] [Galera] last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3.50362S), skipping check"}]


Hi Defri_tri_wahyudi,

The last message from the log is:
[0] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.628627499, {"log"=>"2021-09-06T10:35:58.628333Z 0 [Warning] [MY-000000] [Galera] last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3.50362S), skipping check"}]

Taken from this blog post: Galera warning "last inactive check" - Percona Database Performance Blog
This warning is related to the evs.inactive_check_period option, which controls how often the group communication layer polls for peer response times. If a node is delayed, it is added to the delay list, which can eventually lead to the cluster evicting that node.
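
If that poll period ever needs tuning, it lives inside wsrep_provider_options. A minimal sketch (the PT1S value is only an example; in an operator-managed deployment this would normally be set through the cluster's PXC configuration rather than a hand-edited my.cnf):

# Inspect the current value on a running node:
mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'\G" | tr ';' '\n' | grep inactive_check_period
# Example my.cnf entry (illustrative value only; unlisted provider options keep their defaults):
# wsrep_provider_options="evs.inactive_check_period=PT1S"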

Also, this node does not seem to see the other nodes:
[3] pxc.cluster1-pxc-2.mysqld-error.log: [1630924558.127850355, {"log"=>"2021-09-06T10:35:58.127644Z 0 [Warning] [MY-000000] [Galera] no nodes coming from prim view, prim not possible"}]
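
A quick way to confirm basic reachability is to test the Galera ports from inside the stuck pod (a rough sketch; nc may not exist in the image, and 4567/4568/4444 are the standard group-communication, IST and SST ports):

for host in 192.168.0.100 192.168.6.7; do
  for port in 3306 4567 4568 4444; do
    nc -z -w 2 "$host" "$port" && echo "$host:$port open" || echo "$host:$port CLOSED"
  done
done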

Lastly, when a node rejoins the cluster, it needs to reach the other nodes and complete IST or SST before it can fully rejoin:
https://galeracluster.com/library/documentation/state-transfer.html
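
Whether the joiner can use IST or needs a full SST depends on the position recorded in its grastate.dat and on whether a donor still holds the missing write-sets in its gcache; a quick look at the joiner's state file (path taken from your log) can tell you where it stands:

# On the stuck node; an all-zero uuid with seqno -1 means there is no usable local state and a full SST will be needed.
cat /var/lib/mysql/grastate.dat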

You should re-check that this node is properly configured and that the other nodes can reach it. While the node rejoins, check whether IST/SST is occurring: Monitoring a Galera Cluster — Galera Cluster Documentation. After restarting the node, give it some time to fully join, as it won't be accessible and HAProxy will mark it as offline while IST/SST is ongoing.
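
To watch the join from the MySQL side once mysqld on the joiner accepts connections, something like the following (also worth running on the donor) should show wsrep_local_state_comment moving through Joining/Joined to Synced; the pod name is taken from your log file names and the container name is assumed:

# Follow the joiner's error log while it comes back:
kubectl logs -f cluster1-pxc-2 -c pxc
# Check the replication state from SQL:
mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_local_state_comment','wsrep_cluster_size','wsrep_cluster_status','wsrep_ready');"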

Regards
