Delay listening until WSREP has prepared node?

Edward_Hibbert · March 2, 2022, 9:37am

If I reboot a node, I get a bunch of user requests which fail with

‘WSREP has not yet prepared node for application use’

This only lasts a short time before the node is fully ready - but end user errors are bad, and undermine the point of having a cluster.

Is there a way to configure MySQL/Percona to delay listening for user connections until WSREP has completed its initialisation?

matthewb · March 2, 2022, 3:11pm

Hello @Edward_Hibbert,
If you frontend your cluster using ProxySQL then the proxy won’t send connections to the backend cluster nodes until WSREP is ready.

Can you explain the difference between receiving a ‘WSREP has not prepared’ error vs a ‘connection refused’ error? Both are errors that your app would receive.

Edward_Hibbert · March 2, 2022, 6:04pm

Thanks for your reply. Generic load-balancers such as haproxy will route a request to a different pool member if they can’t establish a connection. If they can establish a connection, then most of them won’t be aware of the meaning of specific SQL errors so they’ll just pass them back to the client.

So for those kinds of environments I can’t see a case where accepting requests but returning errors to all of them would be better than not accepting the connection at all. I could easily be missing something - is there such a case?

I don’t currently use ProxySQL, but that’s a useful tip. I’m not keen to add another component; I was hoping there would be a configurable way of telling Percona not to listen immediately.

matthewb · March 2, 2022, 6:21pm

Gotcha. Makes sense.

The issue here is that ‘WSREP has not prepared…’ is not a “this only happens at startup” status. Nodes can disconnect from and reconnect to the cluster at any time. You can have a PXC node that’s been running just fine for days, then disconnects due to some network blip, and the apps will get ‘WSREP not prepared’ while the node is disconnected. If clients are blocked, how would you know that a situation has arisen? How would your alerting/monitoring function if connections were blocked?

You wouldn’t add another component; you would replace haproxy with ProxySQL.

HAProxy supports backend health checks. How about implementing this?
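
As a rough illustration of what such a health check could decide on, here is a minimal Python sketch. It assumes the check script has already run `SHOW GLOBAL STATUS LIKE 'wsrep_%'` on the node and collected the rows into a dict; the function names `node_is_ready` and `health_response` are made up for this example, and how the result is exposed to HAProxy (for instance as a small HTTP endpoint) is left out.

```python
def node_is_ready(status):
    """Decide whether a Galera/PXC node should receive traffic.

    `status` maps wsrep status variable names to their string values,
    as returned by SHOW GLOBAL STATUS LIKE 'wsrep_%'.
    """
    return (
        status.get("wsrep_ready") == "ON"                    # node accepts queries
        and status.get("wsrep_cluster_status") == "Primary"  # part of the primary component
        and status.get("wsrep_local_state") == "4"           # 4 means Synced
    )


def health_response(status):
    """Map the node state to an HTTP-style (code, body) pair for a health check."""
    if node_is_ready(status):
        return 200, "node is synced\n"
    return 503, "node is not ready\n"
```

With a check like this, the load balancer takes the node out of rotation while it is rejoining, yet the node itself still accepts connections for monitoring and diagnosis.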

Edward_Hibbert · March 2, 2022, 6:56pm

Wouldn’t it be better for Percona to drop the listen and then re-open it when it was in a fit state to receive connections? That would also allow detection of an issue (though surely there are better ways) and be less likely to trip up unwary clients. Much MySQL client code won’t be aware of Galera-specific errors or realise that they are transient.

I still can’t see any situation at all where accepting connections when you know you’re going to reject the requests is better than not accepting the connections in the first place. That situation is something all clients have to handle in some way anyway.

(Sorry if that sounds rude, it’s not meant to be.)

I was using haproxy as an example; actually I have some custom code for which servers I use - there are some long-running requests which get sent to different servers for preference to avoid locking issues, but which should fail over if a server is unavailable, so the logic is a bit customised to the application. That code attempts but fails to handle the WSREP case, which is what prompted me to wonder why I was writing that code in the first place, and whether there was a way around it.

Anyway, it sounds like there isn’t any Percona config for this, so I’ll look at fixing my client some more.
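
For anyone landing here with the same problem, client-side handling of this kind might look roughly like the sketch below. It is not the code from this thread. It assumes the driver raises exceptions whose first argument is the numeric MySQL error code (PyMySQL and MySQLdb are believed to do this, but check your driver), and `run_with_failover`, `error_code` and the `run_query` callback are hypothetical names. Error 1047 is the code the server reports with the ‘WSREP has not yet prepared node’ message; 2002 and 2003 are the client-side “can’t connect” codes.

```python
# Codes treated as "this node is unavailable, try the next one".
WSREP_NOT_READY = 1047              # WSREP has not yet prepared node for application use
CANNOT_CONNECT = {2002, 2003}       # client could not reach the server
RETRYABLE = {WSREP_NOT_READY} | CANNOT_CONNECT


def error_code(exc):
    """Return the numeric error code if the exception carries one as args[0]."""
    if exc.args and isinstance(exc.args[0], int):
        return exc.args[0]
    return None


def run_with_failover(servers, run_query):
    """Call run_query(server) on each server in preference order.

    Moves on to the next server when the node refuses the connection or
    reports that WSREP is not ready; any other error is raised unchanged.
    """
    last_error = None
    for server in servers:
        try:
            return run_query(server)
        except OSError as exc:
            # Raw socket-level failures such as connection refused or timeout.
            last_error = exc
        except Exception as exc:
            if error_code(exc) not in RETRYABLE:
                raise  # a genuine SQL error: do not hide it by retrying elsewhere
            last_error = exc
    raise RuntimeError("no server could run the query") from last_error
```

Treating 1047 the same way as a refused connection gives the client the failover behaviour it would get if the node were not listening at all.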

matthewb · March 2, 2022, 7:13pm

How would you know what the problem is if you can’t connect? If you don’t know the problem, how would you fix it? What if this wasn’t a transient issue but something more systemic, and now you can’t connect because the node is in a bad state and is blocking connections? You’d never be able to diagnose it, let alone fix it. It’s like removing the entire dashboard from your car. Car won’t start because there’s a problem. What’s the problem? Who knows, because there’s no dashboard to tell you the battery is dead.

Because “mysql” is not the issue. “mysql” is not the one generating this error. This is a plugin running inside mysql that is producing the error. Thus mysql accepts the connection and when you request information from the plugin, the plugin gives an error. This is no different than if InnoDB or MyRocks had an error. If InnoDB had some issue that it couldn’t fetch data, “mysql” would still allow the connection and InnoDB would be returning an error. Again, if you simply blocked connections when there are error conditions, how would you even know what the issue is?

Because the application should indeed be handling this case. The app said “give me data”, the database returned “sorry, can’t do that”, and thus the app is responsible for follow-up action. You (the app) walk into a grocery store (the db) and ask for a candy bar. Store says “sorry, none here”. Who fixes the situation? You do. You don’t wait there (indefinitely) until the store has a bar in stock. You leave and go to another store. You also didn’t know the store had an issue until after you went inside (connected). If the doors were closed, you have no idea if they are closed cause no candy bars, or closed for lunch break, or the door is broken.
