Graceful shutdown of individual nodes.

Hi folks,

I have been wondering whether anyone has tried to perform a “graceful shutdown” of nodes in a Percona XtraDB Cluster.

To explain my problem: I have installed a cluster (Percona XtraDB Cluster 5.6) more or less identical to the example in the PXC documentation: a three-node cluster (let’s call the nodes A, B and C) and an HAProxy server as a load balancer. HAProxy polls the three nodes on port 9200 (the clustercheck script also mentioned in the documentation). It works smoothly, but I miss a way to shut down nodes gracefully, one at a time. I know I can stop the mysql service on a node, but this will interrupt the clients in whatever they are doing. I know it shouldn’t be a problem for the database or for data integrity, and the documentation (chapter 4.4) calls this a graceful shutdown, but I don’t think it is the nicest way to take a node down. From the server’s point of view it is fine, but from the clients’ point of view it is not exactly graceful.
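For readers less familiar with this setup: the health check HAProxy runs against port 9200 is a plain HTTP request, and clustercheck answers with status 200 when the node is synced and 503 otherwise. The Python sketch below only approximates that probe; the function name `node_is_healthy` is hypothetical.

```python
import http.client


def node_is_healthy(host: str, port: int = 9200, timeout: float = 2.0) -> bool:
    """Return True if the clustercheck endpoint on host:port answers HTTP 200.

    Roughly what an HAProxy HTTP check does: any non-200 status, a refused
    connection or a timeout counts as "down".
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # refused, timed out or garbled reply: treat the node as unavailable
        return False
    finally:
        conn.close()
```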

I figured that if I stop the service on port 9200 on one node (e.g. node A), HAProxy will see node A as down and will not send any new connections to it. All current MySQL connections to node A, however, remain active. This lets the clients finish whatever they are doing and disconnect, and the next time they connect they will be routed to node B or C. I find this very useful for planned maintenance: when all the connections to node A are gone, I can stop the MySQL service, do whatever I need to do, start MySQL and the clustercheck service again, and move on to the next node.
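The "wait until all the connections to node A are gone" step can be scripted as a polling loop. This Python sketch assumes a hypothetical `count_connections` callable that you would implement with your MySQL driver, for example by counting client sessions in `SHOW PROCESSLIST` while excluding the checking session itself and Galera system threads.

```python
import time
from typing import Callable


def wait_for_drain(
    count_connections: Callable[[], int],
    timeout: float = 3600.0,
    poll_interval: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no client connections remain on the node, or give up.

    count_connections is a hypothetical, caller-supplied hook.
    Returns True if the node drained within `timeout` seconds, else False.
    """
    deadline = clock() + timeout
    while True:
        if count_connections() == 0:
            return True   # safe to stop mysql on this node
        if clock() >= deadline:
            return False  # persistent connections are still open
        sleep(poll_interval)
```

A `False` result is exactly the situation described next: long-lived connections that never go away on their own.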

But I’m facing a small problem. Many of the clients connecting to my cluster are services that run continuously and keep a persistent connection to the database. So even if I stop the clustercheck service on a node, the connections from these services are never closed.

Since most of these services are developed in-house, it is fairly simple to change how they connect to the cluster.

My suggestion would be to instruct the developers to make all services disconnect and reconnect if a connection is open for more than one hour.
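A minimal Python sketch of that one-hour rule, assuming each service obtains its connection through a small wrapper instead of holding the raw connection forever. `RecyclingConnection`, `connect` and `close` are hypothetical names; `connect` is assumed to open a connection through the HAProxy address, so a recycled connection lands on a node HAProxy still considers up.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

C = TypeVar("C")


class RecyclingConnection(Generic[C]):
    """Hand out a DB connection, reopening it once it is max_age seconds old.

    Recycling only happens inside get(), i.e. between units of work, so a
    transaction that is in progress is never cut off.
    """

    def __init__(self, connect: Callable[[], C], close: Callable[[C], None],
                 max_age: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._connect = connect  # hypothetical hook: connect via HAProxy
        self._close = close      # hypothetical hook: close a connection
        self._max_age = max_age
        self._clock = clock
        self._conn: Optional[C] = None
        self._opened_at = 0.0

    def get(self) -> C:
        now = self._clock()
        if self._conn is not None and now - self._opened_at >= self._max_age:
            self._close(self._conn)  # too old: drop it and reconnect below
            self._conn = None
        if self._conn is None:
            self._conn = self._connect()
            self._opened_at = now
        return self._conn
```

Because the age is only checked in `get()`, a service should call it at the start of each unit of work rather than caching the returned object. If many services start at the same moment, adding some random jitter to `max_age` keeps them from all reconnecting in the same second.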

But before I do so, I would like to hear whether any of you have experienced the same problem, and whether you have found another workaround for it.

Best Regards,
Lenny Andersen
A happy PXC DBA! 🙂

Hi Lenny,

Very good point. HAProxy does not check whether each node is still operational on every new request; that would be overkill and too slow. So if you stop a node, some connections may indeed hit a black hole before HAProxy notices that it is down.
Stopping the health-check service before taking the node down for maintenance is certainly one way to do it. However, HAProxy also offers interactive management via its socket interface, where you can put individual servers or whole backends into maintenance mode. See some details here:
http://haproxy.tech-notes.net/9-2-unix-socket-commands/
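As an illustration of the socket interface, here is a Python sketch that sends a single runtime command to HAProxy. It assumes `haproxy.cfg` declares a `stats socket` with `level admin`; the socket path and the backend/server names used below are hypothetical. The `set server <backend>/<server> state ready|drain|maint` command is available in HAProxy 1.6 and later; older versions offer `disable server` and `enable server` instead.

```python
import socket

# Assumption: must match the 'stats socket ... level admin' line in haproxy.cfg.
DEFAULT_SOCKET = "/var/run/haproxy.sock"


def haproxy_command(command: str, socket_path: str = DEFAULT_SOCKET,
                    timeout: float = 5.0) -> str:
    """Send one command to the HAProxy stats socket and return the reply.

    In non-interactive mode HAProxy answers one command and then closes
    the connection, so we read until EOF.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # EOF: HAProxy has finished replying
                break
            chunks.append(chunk)
    return b"".join(chunks).decode()


def set_server_state(backend: str, server: str, state: str,
                     socket_path: str = DEFAULT_SOCKET) -> str:
    """Switch one server to 'ready', 'drain' or 'maint' (HAProxy 1.6+)."""
    if state not in ("ready", "drain", "maint"):
        raise ValueError(f"unsupported state: {state!r}")
    return haproxy_command(f"set server {backend}/{server} state {state}",
                           socket_path=socket_path)
```

For example, `set_server_state("pxc-back", "nodeA", "drain")` stops new connections to node A while existing ones continue, much like stopping clustercheck; `"maint"` puts the server into full maintenance mode, and `"ready"` brings it back afterwards. Neither state closes connections that are already established, so the persistent-connection problem remains.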

And overly long-lived connections can certainly be a problem. IMHO the application should be able to handle broken connections by itself and try to reconnect if needed.
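One way to follow that advice, sketched in Python: wrap each unit of work so that a lost connection triggers a reconnect and a retry with exponential backoff. `run_with_reconnect`, `work`, `reconnect` and the exception list are hypothetical hooks for whatever driver the service uses. Retrying is only safe when the work can be repeated, for example a whole transaction that was rolled back because its node went away.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_reconnect(
    work: Callable[[], T],
    reconnect: Callable[[], None],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run work(); if the connection breaks, reconnect and try again.

    retry_on should list the driver's "connection lost" exceptions.
    The last failure is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return work()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # exponential backoff with jitter, then open a fresh connection,
            # which HAProxy routes to a node it still considers healthy
            sleep(base_delay * 2 ** attempt * random.uniform(0.5, 1.5))
            reconnect()
    raise AssertionError("unreachable")
```

If a connection drops during `COMMIT`, the client cannot tell whether the transaction was applied, so non-idempotent writes need an application-level check before they are retried.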