Error when a node stops on Percona Cluster

There is a cluster structure consisting of 3 nodes. Due to a locking issue, we shut down the 1st node, deleted the /var/lib/mysql directory, and performed an SST (State Snapshot Transfer) again. It transferred all the data, but the /var/lib/mysql directory appeared empty. When we tried to start the MySQL service, we encountered the following error:

Error: “Stale sst_in_progress file in datadir: mysqld
Starting MySQL (Percona XtraDB Cluster) database server: mysqld
State transfer in progress, setting sleep higher: mysqld . . .
The server quit without updating the PID file.”

What should I do?

Can you give more detail on this? “locking issue” is not something you see in PXC when used correctly.

How did you verify this?

Make sure to ls -la because there was probably a .sst directory.

Make sure to erase everything and try the SST again. Give it plenty of time to fully finish. MySQL will start automatically once SST is finished.
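
A minimal sketch of the check suggested above, assuming the leftovers of an interrupted SST are the hidden .sst directory and the sst_in_progress marker file named in the startup error. The function name sst_leftovers is hypothetical, not part of PXC:

```shell
# List leftovers from an interrupted SST in a datadir.
# Prints one path per line; prints nothing if the datadir looks clean.
sst_leftovers() {
    datadir="$1"
    # .sst is the hidden working directory mentioned above;
    # sst_in_progress is the marker file named in the startup error.
    for name in .sst sst_in_progress; do
        if [ -e "$datadir/$name" ]; then
            printf '%s\n' "$datadir/$name"
        fi
    done
}
```

For example, sst_leftovers /var/lib/mysql prints any paths that should be removed before retrying the SST. The function only reports; it does not delete anything.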

I verified the transferred data by comparing sizes:
Node2 data size = Node1 data size

Running ls -al /var/lib/mysql showed no files in the directory.
What do you mean by 'deleting everything'? Could you give us the exact steps?
Also, should we start Node1 with bootstrap? I'm not sure whether it matters that Node1 was originally the primary.

The grastate.dat file in var/mysql_old:

# GALERA saved state
version: 2.1
uuid: fc229951-c807-11ed-963b-029e3177219a
seqno: 0
safe_to_bootstrap: 1

Maybe it can be recovered. We set the following parameter on another node: wsrep_provider_options='pc.bootstrap=true';
We use MySQL version 5.7.40.
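
The grastate.dat fields quoted above can be read with a small helper before deciding how to start a node. This is only a sketch, assuming the "key: value" layout shown in the post. The function name grastate_field is hypothetical:

```shell
# Print the value of one field (e.g. seqno, safe_to_bootstrap) from grastate.dat.
grastate_field() {
    file="$1"
    field="$2"
    # Lines look like "seqno: 0"; the comment line starting with '#' never matches.
    awk -F': *' -v key="$field" '$1 == key { print $2 }' "$file"
}
```

For example, grastate_field /var/lib/mysql/grastate.dat safe_to_bootstrap prints the flag value. The flag only describes the state saved on that one node, so it does not by itself say whether bootstrapping is appropriate while other nodes are running.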

This is a dead MySQL version. You should upgrade to 8.0 as soon as you can.

I was referring to deleting the hidden .sst directory.

Absolutely not. You only bootstrap a node when starting a cluster for the first time. If you have 2 other nodes running, and you bootstrapped node1, you would create 2 independent clusters.

If you have 2 other nodes running, simply do a normal MySQL start on node1. It will SST from node2/3 and then start and join the cluster.
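
One way to confirm that the node actually joined after a normal start is to look at the wsrep_local_state_comment status variable. The sketch below assumes the status output arrives as tab-separated name/value pairs on stdin, as the mysql client prints in batch mode. The function name node_is_synced is hypothetical:

```shell
# Read tab-separated wsrep status pairs on stdin and succeed only if
# wsrep_local_state_comment reports Synced.
node_is_synced() {
    awk -F'\t' '$1 == "wsrep_local_state_comment" { state = $2 }
                END { exit !(state == "Synced") }'
}
```

A possible use on node1 once the service is up: mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | node_is_synced && echo joined. While the SST is still running the server does not accept connections, so this check is only meaningful after startup completes.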

We cleared the /DB/mysql folder, including the hidden .sst directory, and initiated the SST process. All of these steps were carried out two days ago. Although the SST completed two days ago, the MySQL service still didn’t start successfully.
Is it possible that MySQL is crashing?

You need to read the logs. There should be a .sst folder in the $datadir with logs; also check the MySQL error log and the logs on the donor host.
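
A rough sketch of scanning those logs for failures, assuming that lines containing "error" or "fatal" are the interesting ones. The function name scan_logs is hypothetical, and which log files exist depends on the node and the SST method:

```shell
# Print lines that look like errors from each readable log file given.
# Missing files are skipped, since not every log exists on every node.
scan_logs() {
    for log in "$@"; do
        [ -r "$log" ] || continue
        # -H prefixes each match with the file name, -n with the line number.
        grep -Hn -i -E 'error|fatal' "$log"
    done
    return 0
}
```

On the joiner this could be run as scan_logs /var/log/mysqld.log, using the --log-error path shown in this thread, plus any log files found in the datadir or its .sst folder. The same scan can be repeated on the donor host.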

We were unable to find any relevant information in the logs. Would it be possible to do the following:
Node2 is currently a functioning virtual machine. I duplicated Node2 to create a new machine. If I edit the configuration on Node1 and remove the .sst directory, could this potentially resolve the issue? Alternatively, could you kindly provide us with a step-by-step guide on how to set up a new machine from scratch and register it properly?

This is what we do in our Percona XtraDB Cluster Tutorial training class:

  1. Copy the my.cnf from node X to node Y
  2. Change the wsrep-node-address to Y’s IP
  3. systemctl start mysql on node Y

That’s it. Node Y will initiate an SST from Node X, and automatically join the cluster.

Yes, you could duplicate the VM and change the config in a similar way. Make sure you have set wsrep_provider_options="gcache.recover=yes" so the gcache can be used for IST.
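
Steps 1 and 2 of the list above could be scripted roughly as follows. This is a sketch that assumes the address is set on a single "wsrep-node-address = ..." or "wsrep_node_address = ..." line in my.cnf. The function name clone_config and the IP in the usage note are hypothetical:

```shell
# Copy a my.cnf and rewrite its wsrep node address for the new node.
# Accepts both spellings: wsrep-node-address and wsrep_node_address.
clone_config() {
    src="$1"
    dst="$2"
    new_ip="$3"
    # Keep everything up to and including the '=' and replace only the value.
    sed -E "s/^([[:space:]]*wsrep[-_]node[-_]address[[:space:]]*=[[:space:]]*).*/\1${new_ip}/" \
        "$src" > "$dst"
}
```

For example, clone_config nodeX-my.cnf nodeY-my.cnf 10.0.0.3 writes a copy for node Y with only the node address changed; every other line, including the cluster address, is copied as-is. After placing the result on node Y, step 3 (systemctl start mysql) is run there.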

We have a data size of 7TB, and the issue we’re facing is related to our setup with Percona XtraDB Cluster 5.7 (PXC 5.7). Initially, the MySQL data directory is set up at /var/lib/mysql. After this, we change the MySQL configuration to point to a new data directory (DB/mysql/). When we start MySQL, the SST (State Snapshot Transfer) process begins and completes successfully, but afterward, MySQL fails to start.

ps aux:
wsrep-provider=/usr/lib/galera3/libgalera_smm.so --log-error=/var/log/mysqld.log --open-files-limit=65000 --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1

Error:
server quit without updating PID file (/var/run/mysqld/mysqld.pid)
Failed to start LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon.

Did you try the option I mentioned above? Are you trying to bootstrap this node?

Yes, the node joined the cluster. Bootstrap solved our problem. Thank you.