How to stop and start a 3-node XtraDB MySQL cluster

tarique · August 11, 2023, 1:10pm

How do I stop and start a 3-node XtraDB MySQL cluster where the first node is the bootstrap node? How do I start the bootstrap node if it goes down accidentally, and have it rejoin the cluster? How do I start the other two nodes if they go down accidentally, and have them rejoin the cluster?

michael.villegas · August 12, 2023, 4:41pm

Hello Tarique, welcome to the Percona Community Forum.

Regarding your question: if your 3-node cluster is up and running and the node that originally bootstrapped the cluster goes down, just start it like any other regular MySQL service. It will join the cluster, and one of the other two nodes will bring it up to date using IST or SST, depending on whether the missing write-sets are still available in the donor's gcache. As long as at least one node in the cluster is still up and running, the other nodes can be started like any other regular MySQL service.
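As a rough sketch of that single-node case, assuming the `mysql` systemd unit from the PXC packages, root privileges, and MySQL client credentials that work without a prompt (for example in `~/.my.cnf`), you could start the node normally and then confirm it rejoined by polling the Galera status variable `wsrep_cluster_size`. The helper names `wsrep_status` and `rejoin_node` are hypothetical, not Percona tools.

```shell
# Hypothetical helpers for restarting ONE node while the rest of the cluster is up.
# Assumptions: the systemd unit is called "mysql", this runs as root, and the
# mysql client can log in non-interactively (e.g. credentials in ~/.my.cnf).

# Print the value of a global status variable, e.g. wsrep_cluster_size.
wsrep_status() {
    mysql -N -B -e "SHOW GLOBAL STATUS LIKE '$1'" | awk '{ print $2 }'
}

# Start the node as a regular service, then poll until the cluster reports
# the expected number of members or max_wait seconds have passed.
rejoin_node() {
    expected_size=${1:-3}
    max_wait=${2:-600}
    systemctl start mysql || return 1
    waited=0
    while [ "$waited" -lt "$max_wait" ]; do
        size=$(wsrep_status wsrep_cluster_size 2>/dev/null)
        if [ "$size" = "$expected_size" ]; then
            echo "node rejoined: wsrep_cluster_size=$size"
            return 0
        fi
        sleep 5
        waited=$((waited + 5))
    done
    echo "timed out waiting for wsrep_cluster_size=$expected_size" >&2
    return 1
}
```

Usage on the node that went down: `rejoin_node 3`. If IST is not possible and the node needs a full SST, the join can take a long time, so raise the second argument accordingly.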
If all nodes are down, you need to bootstrap the cluster from the last node that went down, and then start the other nodes as regular MySQL services. If you don't know which node was the last to go down, review the contents of the file “grastate.dat” (in the datadir) on each node and look for the following:

safe_to_bootstrap: 1
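A minimal sketch of that check, assuming the default datadir `/var/lib/mysql` (where `grastate.dat` lives) and the `mysql@bootstrap.service` unit that appears in the `systemctl status` output in this thread. The function names are hypothetical. It only bootstraps when the local `grastate.dat` says `safe_to_bootstrap: 1`; the remaining nodes are then started with the regular `mysql` service.

```shell
# Hypothetical helpers for a full-cluster restart (all nodes down).
# Assumption: grastate.dat is in the default datadir /var/lib/mysql.
GRASTATE=${GRASTATE:-/var/lib/mysql/grastate.dat}

# Print the value of one "key: value" line from grastate.dat,
# e.g. grastate_field "$GRASTATE" safe_to_bootstrap
grastate_field() {
    awk -F: -v key="$2" '$1 == key { v = $2; gsub(/[[:space:]]/, "", v); print v }' "$1"
}

# Bootstrap this node only if grastate.dat marks it as the last one to leave.
bootstrap_if_safe() {
    file=${1:-$GRASTATE}
    if [ "$(grastate_field "$file" safe_to_bootstrap)" = "1" ]; then
        systemctl start mysql@bootstrap.service
    else
        echo "safe_to_bootstrap is not 1 in $file; check the other nodes" >&2
        return 1
    fi
}
```

Run `grastate_field /var/lib/mysql/grastate.dat safe_to_bootstrap` on every node first, and call `bootstrap_if_safe` only on the node that reports `1`.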

I hope this was helpful.
Cheers,
Michael

CTutte · September 11, 2023, 1:00pm

Hi tarique,

We have a blog post on how to recover PXC in different scenarios: Galera Replication – How to Recover a PXC Cluster | Percona

It covers six scenarios that show how to recover the cluster depending on whether one or more nodes left the cluster, gracefully or ungracefully.

tarique · September 19, 2023, 7:07am

Hey Michael,

All nodes in my Percona cluster went down due to some table issues. I checked the grastate.dat file and found that one of the nodes has safe_to_bootstrap: 1, but when I tried to bootstrap from that node, the bootstrap failed. What else needs to be done? I am stuck now. Please help me.

systemctl status mysql@bootstrap.service
● mysql@bootstrap.service - Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap
Loaded: loaded (/usr/lib/systemd/system/mysql@.service; disabled; vendor preset: disabled)
Active: activating (start) since Tue 2023-09-19 10:55:16 +03; 4s ago
Process: 16130 ExecStopPost=/usr/bin/mysql-systemd stop-post (code=exited, status=0/SUCCESS)
Process: 16232 ExecStartPre=/bin/sh -c VAR=bash /usr/bin/mysql-systemd galera-recovery; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$>
Process: 16230 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 16189 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
Main PID: 16315 (mysqld)
Status: “Server startup in progress”
Tasks: 6 (limit: 204283)
Memory: 377.9M
CGroup: /system.slice/system-mysql.slice/mysql@bootstrap.service
└─16315 /usr/sbin/mysqld --wsrep-new-cluster

Sep 19 10:55:16 MUATCMSMPXDB3 systemd[1]: Starting Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap…

CTutte · September 19, 2023, 1:01pm

Hi Tarique,

From your outputs it seems that the server is in the process of starting up:

Status: “Server startup in progress”

This can take a while depending on factors such as crash recovery, binlog scanning, slow hardware, buffer pool warmup, etc.
You should monitor the error.log to see what the server is doing.

The error log path is set in the config file with the variable “log_error”.
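Option names in MySQL option files accept either dashes or underscores, so a `log-error` line is the same setting as `log_error`. As a sketch, assuming the `my_print_defaults` utility that ships with the server is on your PATH, you could resolve the configured path and follow the log while the node starts. `error_log_path` and `follow_error_log` are hypothetical helpers.

```shell
# Hypothetical helper: read [mysqld] options as my_print_defaults prints them
# (one "--name=value" per line) and print the configured error log path.
# log-error and log_error are the same option; if it appears more than once,
# the last occurrence wins, so keep only the last match.
error_log_path() {
    sed -n 's/^--log[-_]error=//p' | tail -n 1
}

# Follow this node's error log while it starts up.
follow_error_log() {
    path=$(my_print_defaults mysqld | error_log_path)
    if [ -z "$path" ]; then
        echo "log_error is not set in the [mysqld] group" >&2
        return 1
    fi
    tail -n 100 -f "$path"
}
```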

Regards

tarique · September 19, 2023, 1:12pm

Thank you for your response @CTutte .

This is what I found in mysql.log:

==============================
2023-09-19T13:06:36.189184Z 2 [Note] [MY-000000] [WSREP] Server status change connected → joiner
2023-09-19T13:06:36.189201Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2023-09-19T13:06:36.189222Z 2 [Note] [MY-000000] [WSREP] Server status change joiner → initializing
2023-09-19T13:06:36.189231Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2023-09-19T13:06:36.192991Z 3 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2023-09-19T13:06:37.096336Z 3 [ERROR] [MY-012179] [InnoDB] Could not find any file associated with the tablespace ID: 6492
2023-09-19T13:06:37.096399Z 3 [ERROR] [MY-012964] [InnoDB] Use --innodb-directories to find the tablespace files. If that fails then use --innodb-force-recovery=1 to ignore this and to permanently lose all changes to the missing tablespace(s)
2023-09-19T13:06:37.196647Z 3 [ERROR] [MY-012930] [InnoDB] Plugin initialization aborted with error Generic error.
2023-09-19T13:06:37.596514Z 3 [ERROR] [MY-010334] [Server] Failed to initialize DD Storage Engine
2023-09-19T13:06:37.596723Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
2023-09-19T13:06:37.596754Z 0 [ERROR] [MY-010119] [Server] Aborting
2023-09-19T13:06:37.596762Z 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation
2023-09-19T13:06:39.596889Z 0 [Note] [MY-000000] [WSREP] Server status change initializing → disconnecting
2023-09-19T13:06:39.596973Z 0 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2023-09-19T13:06:39.597019Z 0 [Note] [MY-000000] [Galera] Closing send monitor…
2023-09-19T13:06:39.597032Z 0 [Note] [MY-000000] [Galera] Closed send monitor.
2023-09-19T13:06:39.597042Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread
2023-09-19T13:06:39.597054Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread
2023-09-19T13:06:39.597067Z 1 [Note] [MY-000000] [WSREP] rollbacker thread exiting 1
2023-09-19T13:06:39.597353Z 0 [Note] [MY-000000] [Galera] gcomm: closing backend
2023-09-19T13:06:39.597395Z 2 [ERROR] [MY-000000] [Galera] Exception: State wait was interrupted
2023-09-19T13:06:39.597408Z 0 [Note] [MY-000000] [Galera] PC protocol downgrade 1 → 0
2023-09-19T13:06:39.597424Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))
2023-09-19T13:06:39.597457Z 2 [ERROR] [MY-000000] [Galera] View callback failed. This is unrecoverable, restart required. (FATAL)
at galera/src/replicator_smm.cpp:submit_view_info():2594
2023-09-19T13:06:39.597489Z 2 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()
2023-09-19T13:06:39.597507Z 2 [Note] [MY-000000] [Galera] /usr/sbin/mysqld: Terminated.

Here are my my.cnf file details:

# Template my.cnf for PXC
# Edit to your requirements.

[client]
socket=/var/lib/mysql/mysql.sock

[mysqld]
server-id=1
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
skip-log-bin = true
max_connections=1000

# Binary log expiration period is 604800 seconds, which equals 7 days
binlog_expire_logs_seconds=604800

######## wsrep ###############
# Path to Galera library
wsrep_provider=/usr/lib64/galera4/libgalera_smm.so

# Cluster connection URL contains IPs of nodes
#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address=gcomm://10.0.0.20,10.0.0.21,10.0.0.22

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# Slave thread to use
wsrep_slave_threads=8

wsrep_log_conflicts

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2

# Node IP address
wsrep_node_address=10.0.0.20

# Cluster name
wsrep_cluster_name=pxc-cluster

#If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name=pxc-cluster-node-3-MUATCMSMPXDB3

#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=DISABLED

# SST method
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth="sstuser:QRagis@78hb"
pxc-encrypt-cluster-traffic=OFF

Kindly let me know if I have missed anything mandatory in my.cnf that would help bring up the server.

CTutte · September 19, 2023, 1:22pm

Hi Tarique,

This might be complicated to troubleshoot in a community forum.

First of all, when you bootstrap a node, make sure the bootstrap node is up and running before starting any other node.
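A minimal sketch of that ordering, assuming the `mysql@bootstrap.service` and `mysql` units seen in this thread, root privileges, and client credentials in `~/.my.cnf`. `wait_until_synced` is a hypothetical helper; it polls the Galera status variable `wsrep_local_state_comment` until it reports `Synced`.

```shell
# Hypothetical helper: block until this node reports Synced or max_wait expires.
# Assumption: the mysql client can log in non-interactively (e.g. ~/.my.cnf).
wait_until_synced() {
    max_wait=${1:-900}   # seconds; crash recovery or SST can be slow
    waited=0
    state=""
    while [ "$waited" -lt "$max_wait" ]; do
        state=$(mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'" 2>/dev/null | awk '{ print $2 }')
        if [ "$state" = "Synced" ]; then
            return 0
        fi
        sleep 5
        waited=$((waited + 5))
    done
    echo "node did not reach Synced within ${max_wait}s (last state: ${state:-unknown})" >&2
    return 1
}

# On the bootstrap node only:
#   systemctl start mysql@bootstrap.service && wait_until_synced
# Then on each remaining node, one at a time:
#   systemctl start mysql && wait_until_synced
```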

From the outputs it seems this node is a “joiner”:

2023-09-19T13:06:36.189184Z 2 [Note] [MY-000000] [WSREP] Server status change connected → joiner

Are you sure that there are NO nodes up and running?

2023-09-19T13:06:37.096399Z 3 [ERROR] [MY-012964] [InnoDB] Use --innodb-directories to find the tablespace files. If that fails then use --innodb-force-recovery=1 to ignore this and to

The error log also complains that the current data directory is missing files. This could be because you changed the configuration, because the datadir was deleted when this node was started incorrectly and tried to SST, because the local disks are broken, because the datadir is intact but the files themselves are corrupt, or because of something else.
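To narrow down which of these applies, one rough first check (a sketch, not an official tool) is to compare the `datadir` your config points at with what is actually on disk. The file names checked below are assumptions based on MySQL 8.0 defaults: `ibdata1` (default system tablespace), `mysql.ibd` (data dictionary tablespace), and `grastate.dat` (Galera state). `check_datadir` is hypothetical.

```shell
# Hypothetical helper: report which expected files are missing from a datadir.
# Assumes MySQL 8.0 default file names; adjust if your config changes them.
check_datadir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "datadir $dir does not exist" >&2
        return 1
    fi
    missing=0
    for f in grastate.dat ibdata1 mysql.ibd; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=$((missing + 1))
        fi
    done
    # Count .ibd tablespace files as a rough sanity check.
    ibd_count=$(find "$dir" -name '*.ibd' | wc -l | tr -d ' ')
    echo "found $ibd_count .ibd files under $dir"
    [ "$missing" -eq 0 ]
}

# Example, using the datadir from the [mysqld] group:
#   check_datadir "$(my_print_defaults mysqld | sed -n 's/^--datadir=//p' | tail -n 1)"
```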

That said, make sure every node is down.
Then back up the datadir of every server in case something goes wrong (for example, the datadir gets deleted by a wrong startup sequence).
Then bootstrap the most advanced node (the one with the most recent data). If that fails, check disk health and any other errors that show up. You might need to use InnoDB force recovery (https://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html). Note that it's relatively safe to use force recovery with values up to 3. Values above 3 can cause data loss, in which case you are better off starting one of the other nodes (which might not have up-to-date data, but likely has a healthy dataset).
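The "make sure every node is down, then back up the datadir" steps could look roughly like this on each node. This is a sketch assuming the default datadir `/var/lib/mysql`, enough free space under a hypothetical `/root/pxc-backups` directory, and `pgrep` being available; `backup_datadir` is not a Percona tool.

```shell
# Hypothetical helper: refuse to run while mysqld is up, then archive the
# datadir so a wrong startup sequence (e.g. an unwanted SST) cannot destroy
# the only copy. Assumption: default datadir /var/lib/mysql.
backup_datadir() {
    src=${1:-/var/lib/mysql}
    dest=${2:-/root/pxc-backups}
    if pgrep -x mysqld >/dev/null 2>&1; then
        echo "mysqld is still running; stop it before taking a file-level copy" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    archive="$dest/datadir-$(uname -n)-$(date +%Y%m%d%H%M%S).tar.gz"
    # -C keeps paths in the archive relative to the datadir's parent directory.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}
```

Running it on all three nodes before trying any bootstrap means a failed attempt can always be rolled back to the state the nodes were in when they went down.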