Percona Bootstrap - Network/Storage Issues

We have had a few incidents where network issues between the clusters or hardware issues (such as slow storage) caused the whole cluster to hang. Each time, we had to shut down and bootstrap the cluster to bring it back up. Is there a setting available in XtraDB Cluster that can bring down the troubled node within the cluster so that no bootstrap is required? Below are the errors we see during those times.

Server version: 5.6.28-76.1-56-log Percona XtraDB Cluster (GPL)
Total Nodes - 6

2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)
2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)
2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)

2021-05-23 07:49:01 15940 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)
2021-05-23 07:49:01 15940 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)
2021-05-23 07:49:07 15940 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)
2021-05-23 07:49:07 15940 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)
2021-05-23 07:49:15 15940 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)
2021-05-23 07:49:15 15940 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)

Indeed, you should not have to shut down the whole cluster. Ideally, we would need more diagnostic information to see why this happens.
However, you are using version 5.6, which is very old and has reached end of life, so we will not be able to do much about it.
I strongly recommend migrating to Percona XtraDB Cluster 8.0.22.
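
As a first step toward the diagnostic information mentioned above, it can help to see how the gcs_caused() warnings cluster in time and by return code. The following is a minimal sketch, not an official Percona tool: it assumes the error log lines have exactly the format quoted in the question, and the function name summarize_gcs_warnings is hypothetical.

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpt in the question:
# 2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2} \d+ \[Warning\] WSREP: "
    r"gcs_caused\(\) returned (-?\d+) \((.+)\)$"
)


def summarize_gcs_warnings(lines):
    """Count gcs_caused() warnings per (minute, return code, message).

    Hypothetical helper: lines that do not match the assumed format are ignored.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            minute, code, message = match.groups()
            counts[(minute, int(code), message)] += 1
    return counts


if __name__ == "__main__":
    # Two sample lines copied from the question; replace with the real error log.
    sample = [
        "2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)",
        "2021-05-23 07:30:33 15940 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)",
    ]
    for (minute, code, message), count in sorted(summarize_gcs_warnings(sample).items()):
        print(f"{minute}  code={code}  count={count}  {message}")
```

Running this over the full error log of each node shows when the warnings start on each node and which return code dominates, which is the sort of timeline worth posting alongside the logs.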