Resolving IST Utilization Issues in a Percona XtraDB Cluster and Adding a New Node

Hello All,

In my current setup, I’m managing a two-node Percona XtraDB Cluster on Ubuntu 16.04.6, running MySQL 5.7.23-23 (mysql Ver 14.14) with xtrabackup 2.4.12 (based on MySQL server 5.7.19). However, I’ve encountered a persistent problem: whenever MySQL fails or shuts down unexpectedly, the affected node does not use IST (Incremental State Transfer) and falls back to SST (State Snapshot Transfer). The SST wipes the node’s data directory and rebuilds it from scratch, which is very time-consuming given our 3 TB of data. I’m looking for ways to recover a node via IST, or to force IST, and for any manual steps that could help a node rejoin with IST after a crash or ungraceful MySQL shutdown.
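For context on why a crashed node falls back to SST: after an unclean shutdown, Galera typically records `seqno: -1` in the node’s `grastate.dat`, and without a recoverable position the joiner cannot request IST. The sketch below (my own illustrative helpers, not part of any Percona tooling, with a made-up example file body) shows the check involved:

```python
# Sketch: decide whether a crashed node can even attempt IST, based on
# its grastate.dat. Assumes the standard Galera grastate.dat format;
# the helper names are illustrative, not part of any Percona tooling.

def parse_grastate(text):
    """Parse the key: value pairs from a grastate.dat body."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state

def ist_possible(state):
    """A node with seqno -1 (unclean shutdown) cannot request IST
    unless its position is recovered first (e.g. via gcache.recover=yes
    on the donor plus wsrep position recovery on the joiner)."""
    return state.get("seqno", "-1") != "-1"

# Example grastate.dat after a *clean* shutdown (uuid is made up):
clean = """# GALERA saved state
version: 2.1
uuid: 6e6fa2ee-xxxx
seqno: 1234567
safe_to_bootstrap: 0
"""
print(ist_possible(parse_grastate(clean)))  # True: IST may be attempted
```

If `seqno` is `-1`, the donor’s gcache contents are irrelevant; the position has to be recovered before IST is even on the table.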

To address this issue, I’m in the process of adding a third node to the cluster, in line with Percona best practices. This will let me experiment with various recovery mechanisms without jeopardizing the two nodes carrying the main production workload. However, adding the node caused a significant spike in MySQL memory usage on the master node, even though the master was not the donor for the new node. I had to shut the extra node down due to the memory pressure on the master. My primary goal is to find a way to add a new node without risking the current workloads.
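One option worth considering (my suggestion, not something stated in this thread) is to pin the donor explicitly on the joining node so it never picks the busy node; `wsrep_sst_donor` is a standard Galera option for this. A minimal sketch, with hypothetical node names:

```ini
# my.cnf on the joining (third) node -- "node2" is a placeholder hostname
[mysqld]
# Prefer node2 as donor; the trailing comma allows fallback to any
# other node if node2 is unavailable.
wsrep_sst_donor = node2,
```

This only controls donor selection; it does not by itself reduce the donor’s load while the transfer runs.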

I would appreciate your insights and assistance in resolving these challenges I’m encountering.

Just so you know, 5.7 reaches end of life in 8 days (Oct 31 2023). There will be no more support for this version, so you need to upgrade as soon as you can; otherwise you won’t be able to get any help.

That is dangerous. You should always have a minimum of 3 nodes: with only 2 nodes there is no quorum majority, so losing either node (or the link between them) means split brain.

I’ve never understood why this isn’t enabled by default. You should add this to all my.cnf files.

wsrep_provider_options = "gcache.size=2G;gcache.recover=yes"
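Whether IST is possible after downtime depends on the donor’s gcache still holding every write-set the joiner missed, so the gcache should be sized against your write rate and worst-case downtime. A rough sizing sketch (the rate figure is a placeholder you would measure on your own cluster, e.g. from `wsrep_replicated_bytes` deltas):

```python
# Rough gcache sizing: IST works only if the donor's gcache still holds
# all write-sets generated while the joiner was down.

def min_gcache_bytes(writeset_bytes_per_sec, max_downtime_sec, safety_factor=2.0):
    """Smallest gcache that still permits IST after the given downtime,
    padded by a safety factor for bursts."""
    return int(writeset_bytes_per_sec * max_downtime_sec * safety_factor)

# Example: ~1 MB/s of write-sets, tolerate 30 minutes of downtime.
needed = min_gcache_bytes(1_000_000, 30 * 60)
print(needed // (1024 ** 2), "MiB")  # 3433 MiB -> a 2G gcache would be too small
```

For a 3 TB dataset with any meaningful write rate, it may be worth going well above 2G if disk space allows, since an SST is so expensive to redo.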

First of all, thank you for the fast response. Regarding your reply:

I’m aware of this; it’s the main reason I’m trying to set up an additional node. Yet I’m encountering memory spikes when it starts syncing: the 2nd node’s status in ProxySQL changes to OFFLINE_SOFT and, as the DONOR, it stops serving requests. That leaves the master node as the only functioning node, and I don’t want to risk the current workload with an OOM on the master. Do you have any suggestions to lower this risk or overcome this obstacle?

My servers are running on hosts with 16 CPUs and 32 GB of memory each.

Is it possible to set these variables while MySQL is running, or does it require a restart of the MySQL server?

I’m not sure if those specific gcache parameters can be set dynamically. I would recommend a restart.

Yes, that is expected.

I’m unsure as to why node1 would see increased memory while node2 is serving as DONOR.