How to stop and start an XtraDB Mysql cluster of 3 nodes, where the first node is a bootstrap node ? How to start the bootstrap node if it is down accidentally and join back to the cluster ? How to start other two nodes if they are down accidentally and join back to the cluster ??
Hello Tarique, welcome to the Percona Community Forum.
Regarding your question, if your 3 nodes cluster is up and running and somehow the original node that boostrap the cluster is down, just start it as any other regular MySQL service, it will join the cluster and one of the other two nodes will help it get updated using IST or SST depending on the gcache. As long as there is one node in the cluster still up and running, the other nodes can be started as any other regular MySQL service.
If all nodes are down, then you need to boostrap the cluster from the last node to go down, and then start the other nodes as a regular MySQL service. If you don’t know which was the last node to go down you can review the contests of file “grastate.dat” and look for the following
safe_to_bootstrap: 1
I hope this was helpful.
Cheers,
Michael
Hi tarique,
We have a blogpost on how to recover PXC under different scenarios: Galera Replication – How to Recover a PXC Cluster | Percona
There are 6 scenarios that show how to recover the cluster depending if 1 (or more) nodes left the cluster, gracefully and ungracefully.
Hey Michael,
My Percona cluster’s all nodes were down due to some table issues. I saw the grastate.dat file and found that one of the node says , safe_to_bootstrap: 1 but when I tried to bootstrap from this node, failed to bootstrap, what else needs to be done. Now I am stuck. Please help me.
systemctl status mysql@bootstrap.service
● mysql@bootstrap.service - Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap
Loaded: loaded (/usr/lib/systemd/system/mysql@.service; disabled; vendor preset: disabled)
Active: activating (start) since Tue 2023-09-19 10:55:16 +03; 4s ago
Process: 16130 ExecStopPost=/usr/bin/mysql-systemd stop-post (code=exited, status=0/SUCCESS)
Process: 16232 ExecStartPre=/bin/sh -c VAR=bash /usr/bin/mysql-systemd galera-recovery
; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$>
Process: 16230 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 16189 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
Main PID: 16315 (mysqld)
Status: “Server startup in progress”
Tasks: 6 (limit: 204283)
Memory: 377.9M
CGroup: /system.slice/system-mysql.slice/mysql@bootstrap.service
└─16315 /usr/sbin/mysqld --wsrep-new-cluster
Sep 19 10:55:16 MUATCMSMPXDB3 systemd[1]: Starting Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap…
Hi Tarique,
From your outputs it seems that the server is in the process of starting up:
Status: “Server startup in progress”
It can take a while based on different factors such as crash recovery, binlog scanning, slow hardware, buffer pool warmup etc…
You should monitor the error.log to see what the server is doing.
The error.log path can be configured in the config file with the variable name “log_error”
Regards
Thank you for your response @CTutte .
This is what I found from mysql.log:
==============================
2023-09-19T13:06:36.189184Z 2 [Note] [MY-000000] [WSREP] Server status change connected → joiner
2023-09-19T13:06:36.189201Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2023-09-19T13:06:36.189222Z 2 [Note] [MY-000000] [WSREP] Server status change joiner → initializing
2023-09-19T13:06:36.189231Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2023-09-19T13:06:36.192991Z 3 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2023-09-19T13:06:37.096336Z 3 [ERROR] [MY-012179] [InnoDB] Could not find any file associated with the tablespace ID: 6492
2023-09-19T13:06:37.096399Z 3 [ERROR] [MY-012964] [InnoDB] Use --innodb-directories to find the tablespace files. If that fails then use --innodb-force-recovery=1 to ignore this and to permanently lose all changes to the missing tablespace(s)
2023-09-19T13:06:37.196647Z 3 [ERROR] [MY-012930] [InnoDB] Plugin initialization aborted with error Generic error.
2023-09-19T13:06:37.596514Z 3 [ERROR] [MY-010334] [Server] Failed to initialize DD Storage Engine
2023-09-19T13:06:37.596723Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
2023-09-19T13:06:37.596754Z 0 [ERROR] [MY-010119] [Server] Aborting
2023-09-19T13:06:37.596762Z 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation
2023-09-19T13:06:39.596889Z 0 [Note] [MY-000000] [WSREP] Server status change initializing → disconnecting
2023-09-19T13:06:39.596973Z 0 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2023-09-19T13:06:39.597019Z 0 [Note] [MY-000000] [Galera] Closing send monitor…
2023-09-19T13:06:39.597032Z 0 [Note] [MY-000000] [Galera] Closed send monitor.
2023-09-19T13:06:39.597042Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread
2023-09-19T13:06:39.597054Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread
2023-09-19T13:06:39.597067Z 1 [Note] [MY-000000] [WSREP] rollbacker thread exiting 1
2023-09-19T13:06:39.597353Z 0 [Note] [MY-000000] [Galera] gcomm: closing backend
2023-09-19T13:06:39.597395Z 2 [ERROR] [MY-000000] [Galera] Exception: State wait was interrupted
2023-09-19T13:06:39.597408Z 0 [Note] [MY-000000] [Galera] PC protocol downgrade 1 → 0
2023-09-19T13:06:39.597424Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))
2023-09-19T13:06:39.597457Z 2 [ERROR] [MY-000000] [Galera] View callback failed. This is unrecoverable, restart required. (FATAL)
at galera/src/replicator_smm.cpp:submit_view_info():2594
2023-09-19T13:06:39.597489Z 2 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()
2023-09-19T13:06:39.597507Z 2 [Note] [MY-000000] [Galera] /usr/sbin/mysqld: Terminated.
here is my my.cnf file details:
Template my.cnf for PXC
Edit to your requirements.
[client]
socket=/var/lib/mysql/mysql.sock
[mysqld]
server-id=1
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
skip-log-bin = true
max_connections=1000
Binary log expiration period is 604800 seconds, which equals 7 days
binlog_expire_logs_seconds=604800
######## wsrep ###############
Path to Galera library
wsrep_provider=/usr/lib64/galera4/libgalera_smm.so
Cluster connection URL contains IPs of nodes
#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address=gcomm://10.0.0.20,10.0.0.21,10.0.0.22
In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW
Slave thread to use
wsrep_slave_threads=8
wsrep_log_conflicts
This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2
Node IP address
wsrep_node_address=10.0.0.20
Cluster name
wsrep_cluster_name=pxc-cluster
#If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name=pxc-cluster-node-3-MUATCMSMPXDB3
#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=DISABLED
SST method
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=“sstuser:QRagis@78hb”
pxc-encrypt-cluster-traffic=OFF
kindly let me know if there is any mandatory stuff that will help in bringing up the server that I missed in my.cnf file
Hi Tarique,
This might be complicated to troubleshoot in a community forum.
First of all; when you bootstrap a node make sure that the bootstrap node is up and running before starting any other node.
From the outputs it seems this node is a “joiner”:
2023-09-19T13:06:36.189184Z 2 [Note] [MY-000000] [WSREP] Server status change connected → joiner
Are you sure that there are NO nodes up and running?
2023-09-19T13:06:37.096399Z 3 [ERROR] [MY-012964] [InnoDB] Use --innodb-directories to find the tablespace files. If that fails then use --innodb-force-recovery=1 to ignore this and to
The error.log is also complaining that the current datadirectory is missing files. Either because you changed configuration, or because the datadir folder was deleted when wrongly starting this node and due to trying to SST, or either the local disks are broken, or the data dir is correct intact and safe but the files themselves are corrupt, or maybe something else.
Above said. Make sure every node is down.
Then make a backup of all the servers datadir in case you do something wrong (and the datadir process gets deleted on a wrong startup sequence).
Then bootstrap the newest node. If that fails, check disk health and any other errors that might pop up. You might need to use innodb force recovery (https://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html) Note that it’s safe to use force recovery up to a value of 3. A value bigger than 3 will cause data loss; in which case you better start another one of the nodes (that might not have up to date info but likely a healthy dataset)
Regards
Thanks @CTutte for prompt response.
let me try the innodb-force-recovery method, is it ok if the try bootstraping from the same node which has value 1 for safe_to_bootstrap after putting innodb_recovery value in my.cnf or do we have to try starting only with normal mysql start command for innodb_recovery method ??
currently no nodes are running