3-node cluster config. Attempted to load a large amount of data into one of the databases. The process was disconnected by what appears to be an OS error. Having trouble disconnecting/shutting down the offending node. I will attach the relevant error.log section.
Current state of mysql service…
● mysql.service - Percona XtraDB Cluster
Loaded: loaded (/lib/systemd/system/mysql.service; enabled; vendor preset: enabled)
Active: activating (start) since Wed 2021-12-08 01:49:57 UTC; 12h ago
Process: 919 ExecStartPre=/bin/sh -c VAR=
bash /usr/bin/mysql-systemd galera-recovery; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSIT
Process: 915 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 844 ExecStartPre=/usr/bin/mysql-systemd check-grastate (code=exited, status=0/SUCCESS)
Process: 743 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
Main PID: 1012 (mysqld)
Status: “Server startup in progress”
Tasks: 6 (limit: 4915)
└─1012 /usr/sbin/mysqld --wsrep_start_position=9ca37890-4965-11ec-9375-7f14bb2ba97e:18889
Dec 08 01:49:57 db-app01-atl-prd systemd: Starting Percona XtraDB Cluster…
root@db-app01-atl-prd:~# ps -ef | grep mysql
mysql 1012 1 0 01:49 ? 00:00:05 /usr/sbin/mysqld --wsrep_start_position=9ca37890-4965-11ec-9375-7f14bb2ba97e:18889
root 2722 2649 0 14:06 pts/1 00:00:00 grep --color=auto mysql
app01_error.log (58.1 KB)
You can run systemctl stop mysql to gracefully stop the mysql process.
I tried that, but it just seems to hang. I'll try it again and see if I get any log entries.
PXC requires a minimum of 10s to stop. Open another session and tail -f the mysql log while you stop it to watch what happens.
It is stuck at the same spot. These are new nodes. Short of re-installing the cluster software on the node, can I reset/delete the /var/lib/mysql files to simulate a new instance and add it to the cluster? Basically simulate a brand-new node.
Each time you start a new node, PXC must sync the data set from the other nodes. If you have 100GB of data, that will take about 1-2 hours over gigabit Ethernet. Are you watching the logs? The logs will tell you that an SST is in progress.
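As a rough sanity check on that estimate, here is a back-of-envelope calculation (a sketch; the 15% effective-throughput figure is an assumption, not from this thread, reflecting that donor disk reads and SST overhead keep the transfer well below line rate):

```shell
# Back-of-envelope SST duration for a network-bound transfer.
# Assumptions: 100 GB data set, gigabit link, ~15% effective throughput.
data_gb=100
eff_mbps=150                               # 0.15 Gbps in megabits/s
secs=$(( data_gb * 8000 / eff_mbps ))      # GB -> megabits, then divide by rate
echo "estimated SST time: $(( secs / 3600 ))h $(( secs % 3600 / 60 ))m"
# prints: estimated SST time: 1h 28m
```

At full line rate the same transfer would take only about 13 minutes, which is why a "1-2 hours" rule of thumb implies substantial donor-side overhead.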
I understand how it works, but it does not appear to be doing anything. This is the result of a tail -f of /var/log/mysql/error.log; these have been the last lines in the log for the past hour…
I really just want to trash this node and re-add it to the cluster.
2021-12-08T19:16:06.989369Z 1 [Note] [MY-000000] [Galera] Non-primary view
2021-12-08T19:16:06.989621Z 1 [Note] [MY-000000] [WSREP] Server status change disconnected → connected
2021-12-08T19:16:06.989864Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2021-12-08T19:16:06.990120Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2021-12-08T19:16:07.485951Z 0 [Warning] [MY-000000] [Galera] last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3.50233S), skipping check
Then do that. Just kill -9 mysql, erase /var/lib/mysql, and start it up. It will auto-SST from the other nodes.
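That sequence can be sketched as a small script (a sketch, not an official Percona tool; the data-directory path is the common Debian/Ubuntu default, and because wiping /var/lib/mysql is destructive the script only prints the commands unless --force is passed):

```shell
#!/bin/sh
# Wipe a stuck PXC node so it rejoins the cluster via a full SST.
# DESTRUCTIVE: erases the local data directory. Only run this on the broken
# node, and only while the rest of the cluster is up and Primary.
DATADIR=/var/lib/mysql

if [ "${1:-}" = "--force" ]; then
    run() { "$@"; }                 # actually execute
else
    run() { echo "would run: $*"; } # dry-run by default
fi

run systemctl stop mysql    # graceful stop first; may hang if mysqld is wedged
run pkill -9 -x mysqld      # last resort: force-kill the stuck process
run rm -rf "$DATADIR"       # remove all local state, grastate.dat included
run mkdir -p "$DATADIR"
run chown mysql:mysql "$DATADIR"
run systemctl start mysql   # node starts with no state -> donor streams an SST
```

Run without arguments it just prints the commands; passing --force executes them, which is exactly the by-hand sequence described above.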
I did just that, back in business. Thanks for the tips!