Delay listening until WSREP has prepared node?

If I reboot a node, I get a bunch of user requests which fail with

‘WSREP has not yet prepared node for application use’

This only lasts a short time before the node is fully ready - but end user errors are bad, and undermine the point of having a cluster.

Is there a way to configure MySQL/Percona to delay listening for user connections until WSREP has completed its initialisation?

Hello @edwh,
If you frontend your cluster using ProxySQL then the proxy won’t send connections to the backend cluster nodes until WSREP is ready.

Can you explain the difference between receiving a ‘WSREP has not prepared’ error and a ‘connection refused’ error? Both are errors that your app would receive.

Thanks for your reply. Generic load-balancers such as haproxy will route a request to a different pool member if they can’t establish a connection. If they can establish a connection, then most of them won’t be aware of the meaning of specific SQL errors so they’ll just pass them back to the client.

So for those kinds of environments I can’t see a case where accepting requests but returning errors to all of them would be better than not accepting the connection at all. I could easily be missing something - is there such a case?

I don’t currently use ProxySQL, but that’s a useful tip. I’m not keen to add another component; I was hoping there would be a configurable way of telling Percona not to listen immediately.

Gotcha. Makes sense.

The issue here is that ‘WSREP has not prepared…’ is not a “this only happens at startup” status. Nodes can disconnect from and reconnect to the cluster at any time. You can have a PXC node that’s been running just fine for days, then it disconnects due to some network blip, and the apps will get ‘WSREP not prepared’ while the node is disconnected. If clients are blocked, how would you know that such a situation has arisen? How would your alerting/monitoring function if connections were blocked?

You wouldn’t add another component; you would replace haproxy with ProxySQL.

HAProxy supports backend health checks. How about implementing this?
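
A minimal sketch of such a health check, using only the Python standard library and in the spirit of the `clustercheck` script shipped with Percona XtraDB Cluster: it answers HTTP 200 only when `wsrep_ready` is `ON` and `wsrep_local_state` is 4 (Synced), and 503 otherwise. `fetch_status` is a hypothetical callable you would supply, for example one that runs `SHOW GLOBAL STATUS LIKE 'wsrep_%'` through whichever MySQL driver you use and turns the (name, value) rows into a dict; port 9200 is only a conventional choice.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict

# Galera wsrep_local_state values (wsrep_local_state_comment in the comments).
STATE_DONOR_DESYNCED = 2  # "Donor/Desynced"
STATE_SYNCED = 4          # "Synced"


def node_is_ready(status: Dict[str, str], allow_donor: bool = False) -> bool:
    """Decide health from SHOW GLOBAL STATUS values (name -> value strings)."""
    # wsrep_ready=OFF is the condition behind "WSREP has not yet prepared node".
    if str(status.get("wsrep_ready", "OFF")).upper() != "ON":
        return False
    try:
        state = int(status.get("wsrep_local_state", "0"))
    except (TypeError, ValueError):
        return False
    if state == STATE_SYNCED:
        return True
    return allow_donor and state == STATE_DONOR_DESYNCED


def make_handler(fetch_status: Callable[[], Dict[str, str]], allow_donor: bool = False):
    """Build an HTTP handler answering 200 (synced) or 503 (anything else)."""

    class WsrepCheckHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                ready = node_is_ready(fetch_status(), allow_donor)
            except Exception:
                # MySQL unreachable or the status query failed: report the node as down.
                ready = False
            body = b"Galera node is synced.\n" if ready else b"Galera node is not synced.\n"
            self.send_response(200 if ready else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the check quiet; the load balancer polls it frequently

    return WsrepCheckHandler


def serve(fetch_status: Callable[[], Dict[str, str]], port: int = 9200) -> None:
    """Run the check until interrupted (blocks the calling thread)."""
    HTTPServer(("", port), make_handler(fetch_status)).serve_forever()
```

HAProxy would then typically be pointed at it with `option httpchk` plus something like `server node1 <address>:3306 check port 9200`, so a node that is up but not synced is taken out of rotation instead of handing out WSREP errors. Treating Donor/Desynced as unhealthy by default is a judgment call: a donor serving a blocking state transfer may not answer queries promptly.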

Wouldn’t it be better for Percona to stop listening and then start again once it was in a fit state to receive connections? That would also allow detection of an issue (though surely there are better ways) and be less likely to trip up unwary clients. Much MySQL client code won’t be aware of Galera-specific errors or realise that they are transient.

I still can’t see any situation at all where accepting connections when you know you’re going to reject the requests is better than not accepting the connections in the first place. That situation is something all clients have to handle in some way anyway.

(Sorry if that sounds rude, it’s not meant to be.)

I was using haproxy as an example; actually I have some custom code for which servers I use - there are some long-running requests which get sent to different servers for preference to avoid locking issues, but which should fail over if a server is unavailable, so the logic is a bit customised to the application. That code attempts but fails to handle the WSREP case, which is what prompted me to wonder why I was writing that code in the first place, and whether there was a way around it.
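
A minimal sketch of that kind of client-side failover, assuming a hypothetical `run_query(server)` callable that opens a connection and runs the request: connection failures and the WSREP error mean "try the next server", anything else is re-raised. The error numbers are assumptions to verify against your own stack: Galera reports the WSREP condition as error 1047 (SQLSTATE 08S01), and 2003 is the client's "Can't connect to MySQL server" error. Drivers expose the number differently (mysql-connector-python has `.errno`, PyMySQL puts it in `args[0]`), hence the best-effort helper.

```python
from typing import Callable, Dict, Optional, Sequence, TypeVar

T = TypeVar("T")

# Assumed error numbers: 1047 = Galera "WSREP has not yet prepared node",
# 2003 = client-side "Can't connect to MySQL server".
TRANSIENT_ERRNOS = {1047, 2003}


class AllServersFailed(Exception):
    """Hypothetical error raised when every candidate server failed transiently."""

    def __init__(self, errors: Dict[str, BaseException]):
        super().__init__(f"all {len(errors)} servers failed")
        self.errors = errors


def mysql_errno(exc: BaseException) -> Optional[int]:
    """Best-effort extraction of a MySQL error number from a driver exception."""
    errno = getattr(exc, "errno", None)
    if isinstance(errno, int):
        return errno
    if exc.args and isinstance(exc.args[0], int):
        return exc.args[0]
    return None


def is_transient(exc: BaseException) -> bool:
    # Socket-level failures (refused, reset, timed out): try the next server.
    if isinstance(exc, OSError):
        return True
    # Node reachable but not (or no longer) synced, or the driver's own connect error.
    return mysql_errno(exc) in TRANSIENT_ERRNOS


def run_with_failover(servers: Sequence[str], run_query: Callable[[str], T]) -> T:
    """Try servers in preference order; return the first successful result."""
    errors: Dict[str, BaseException] = {}
    for server in servers:
        try:
            return run_query(server)
        except Exception as exc:
            if not is_transient(exc):
                raise  # e.g. a syntax error: retrying elsewhere won't help
            errors[server] = exc
    raise AllServersFailed(errors)
```

Lost-connection errors such as 2013 are deliberately left out of the transient set: the statement may already have run, so replaying it on another server is only safe for idempotent requests.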

Anyway, it sounds like there isn’t any Percona config for this, so I’ll look at fixing my client some more.

How would you know what the problem is if you can’t connect? If you don’t know the problem, how would you fix it? What if this wasn’t a transient issue but something more systemic, and now you can’t connect because the node is in a bad state and is blocking connections? You’d never be able to diagnose it, let alone fix it. It’s like removing the entire dashboard from your car. The car won’t start because there’s a problem. What’s the problem? Who knows, because there’s no dashboard to tell you the battery is dead.

Because “mysql” is not the issue. “mysql” is not the one generating this error. It is a plugin running inside mysql that produces the error. So mysql accepts the connection, and when you request information from the plugin, the plugin returns an error. This is no different from an error in InnoDB or MyRocks. If InnoDB had some issue that prevented it from fetching data, “mysql” would still allow the connection and InnoDB would return an error. Again, if you simply blocked connections whenever there are error conditions, how would you even know what the issue is?

Because the application should indeed be handling this case. The app said “give me data”, the database returned “sorry, can’t do that”, and so the app is responsible for follow-up action. You (the app) walk into a grocery store (the db) and ask for a candy bar. The store says “sorry, none here”. Who fixes the situation? You do. You don’t wait there (indefinitely) until the store has a bar in stock. You leave and go to another store. You also didn’t know the store had an issue until after you went inside (connected). If the doors were closed, you’d have no idea whether they were closed because there are no candy bars, because it’s their lunch break, or because the door is broken.
