TL;DR: Success! I got it working.
I installed the latest debug version from the ProxySQL site, on a second server, and had pretty much the same dismal results as I did with Percona’s version of it, on my first server - my application was blowing up, left and right, with all the issues I noted earlier in this thread. It was pretty discouraging.
But I thought to myself, “It could be the software is crap, but there’s a hell of a lot of people using it, just fine, and I’m new to it, so what might I be overlooking?” I’m a SysAdmin by trade, and often, if something new doesn’t work, it turns out to be a PEBKAC issue.
So, I thought, “When does everything ‘just work’?” Well, that’s when all the multiple instances of my application (I say ‘my’, but - trust me - I didn’t write it) were connected to a SINGLE database for their reads and writes. That, in turn, led me back to my earlier afterthought question, above:
Does this have anything to do with my cluster having just 1 hostgroup (0), with all 3 nodes read/write?
So, I decided to set things up as a read/write split, which I’d seen mentioned in numerous places as something a lot of people did. I was concerned about only having one writer, because what would happen if it went offline - would I be left with no writers, only readers? That would kinda break stuff too. But then I figured someone must have thought of that already, so I started googling around. I came across this article:
It assuaged my concerns about being left with no writers (I wouldn’t be), and it showed me what to do. So, since I was mostly set up correctly already, I just used that article to see how to modify my mysql_servers and scheduler tables. I also modified mysql-query_retries_on_failure as it mentioned. I did not create any rules. I wanted it to act as if there was only one server, just like when ProxySQL wasn’t the middle-man.
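For anyone following along, here is a rough sketch of the kind of admin statements a single-writer layout boils down to. It is not copied from the article: the hostnames, hostgroup IDs, port and retry count are placeholders I made up for illustration, and the scheduler entry that handles writer failover is left out because its arguments depend on the checker script you use. The little Python helper only builds the statements as strings, so you can review them before running anything against the ProxySQL admin interface.

```python
# Sketch only: builds ProxySQL admin statements for a one-writer layout.
# Hostnames, hostgroup IDs, port and retry count are hypothetical placeholders.

WRITER_HG = 0  # assumed writer hostgroup
READER_HG = 1  # assumed reader hostgroup


def single_writer_statements(writer, readers, retries, port=3306):
    """Return admin statements that place exactly one node in the writer hostgroup."""
    placements = [(WRITER_HG, writer)] + [(READER_HG, host) for host in readers]
    stmts = []
    for hostgroup, host in placements:
        stmts.append(
            "INSERT INTO mysql_servers (hostgroup_id, hostname, port) "
            f"VALUES ({hostgroup}, '{host}', {port});"
        )
    # The variable name is the one mentioned in the post; the value is an assumption.
    stmts.append(
        f"UPDATE global_variables SET variable_value='{retries}' "
        "WHERE variable_name='mysql-query_retries_on_failure';"
    )
    # Changes made in the admin interface only take effect once loaded to runtime,
    # and only survive a restart once saved to disk.
    stmts += [
        "LOAD MYSQL SERVERS TO RUNTIME;",
        "SAVE MYSQL SERVERS TO DISK;",
        "LOAD MYSQL VARIABLES TO RUNTIME;",
        "SAVE MYSQL VARIABLES TO DISK;",
    ]
    return stmts


if __name__ == "__main__":
    for line in single_writer_statements("node1", ["node2", "node3"], retries=2):
        print(line)
```

With no query rules defined, all traffic goes to the user's default hostgroup, so as long as that is the writer hostgroup the application sees what looks like one server.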
I re-ran my test - essentially restarting all 51 instances of my application at once, so that it beat on the database - and it just worked. Not a single error anywhere. It was just as flawless as it was without ProxySQL - though perhaps a teeny bit slower than going directly to the server, but I’d read that the debug version was a tad slower, so that was OK.
So, it would seem that my application cannot handle multiple writers. Why, I do not know or understand, but having a single writer just works.
Next, I’ll set up some query rules, to redirect some traffic to the two reader (and, potentially, writer) nodes that are currently sitting unused. Rather than use the black & white method mentioned in the article above, I plan to do it the more “efficient” way spelled out in the “read/write split using regex and digest” section of this article:
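To make the idea concrete, here is a toy model of that selective routing: everything goes to the writer by default, and only a few specific, known-expensive SELECT patterns get sent to the readers. This is just my illustration of the concept in Python, not how ProxySQL implements it, and the hostgroup IDs and query patterns are hypothetical. In ProxySQL itself the equivalent would be rows in mysql_query_rules using match_digest or match_pattern.

```python
import re

# Hypothetical hostgroup IDs, for illustration only.
WRITER_HG = 0
READER_HG = 1

# Only hand-picked, known-expensive reads are redirected; the patterns below
# are made-up examples. Rules are checked in order and the first match wins.
RULES = [
    (re.compile(r"^SELECT COUNT\(\*\) FROM orders", re.IGNORECASE), READER_HG),
    (re.compile(r"^SELECT .* FROM report_archive", re.IGNORECASE), READER_HG),
]


def route(query, default_hostgroup=WRITER_HG):
    """Return the hostgroup a query would be sent to under the rules above."""
    text = query.strip()
    for pattern, hostgroup in RULES:
        if pattern.search(text):
            return hostgroup
    # Anything not explicitly matched stays on the single writer.
    return default_hostgroup


if __name__ == "__main__":
    for q in (
        "SELECT COUNT(*) FROM orders WHERE status = 'open'",
        "SELECT * FROM users WHERE id = 1",
        "UPDATE orders SET status = 'shipped' WHERE id = 7",
    ):
        print(route(q), q)
```

The point of doing it this way, rather than sending every SELECT to the readers, is that unmatched queries keep behaving exactly as they do in the single-writer setup that already works.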
If my application is still working OK, after adding the query-routing rules - and I’m betting it will - then I’ll go try re-configuring the other server (the one with Percona’s version on it) and making sure all is well there, too (I’m betting it will be). Then all I’ll have to do is decide which flavor I wish to run, going forward.
As Hannibal Smith would say, “I love it when a plan comes together!”