I have a 3-node Galera cluster. Two members (call them db1 and db2) are on the same subnet but on different ESX hosts. The third node (db3) was on a remote ESX server connected over a site-to-site tunnel. Recently the two original servers were taken down to be moved to the same location as the one new server (db3). After the equipment was moved and the servers (Red Hat 7) were booted, both of those nodes failed to start. Worse, while db3 remained up, it was still reporting the cluster size as 3, even though db1 and db2 had failed to start.
I eventually shut down db3 and started db1 with the new-cluster option. What I do not understand is how or why db3 would still report a cluster size of 3. Has anyone experienced something like this? Any ideas as to why or how this could have happened?
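
For anyone who wants to compare notes, here is a minimal Python sketch of how the relevant status values could be read side by side. It assumes the output of `SHOW STATUS LIKE 'wsrep_%'` was captured as tab-separated name/value lines. The function names, the `expected_size` parameter, and the sample text are hypothetical and made up for illustration; the sample is not a capture from db3.

```python
def parse_wsrep_status(text):
    """Parse tab-separated 'name<TAB>value' lines into a dict of wsrep_* values."""
    status = {}
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) == 2 and parts[0].startswith("wsrep_"):
            status[parts[0]] = parts[1]
    return status


def describe_node(status, expected_size=3):
    """Summarize what a node reports about the cluster it believes it is in."""
    size = int(status.get("wsrep_cluster_size", 0))
    return {
        "size": size,
        # "Primary" means the node thinks it belongs to the primary component.
        "primary": status.get("wsrep_cluster_status") == "Primary",
        # "ON" means the node thinks it can accept queries.
        "ready": status.get("wsrep_ready") == "ON",
        "full": size == expected_size,
    }


# Illustrative sample only, not real output from any of the nodes above.
sample = (
    "wsrep_cluster_size\t3\n"
    "wsrep_cluster_status\tPrimary\n"
    "wsrep_ready\tON\n"
)

summary = describe_node(parse_wsrep_status(sample))
print(summary)
```

Looking at `wsrep_cluster_status` and `wsrep_ready` next to `wsrep_cluster_size` may help show whether the node still considered itself part of a primary component when it reported that size.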