In a 2-real-node PXC cluster, is it possible to install a Galera Arbitrator (garbd) on the same server as one of the nodes?
Using a different port than 4567 for it? Any downside to doing it?
Can I do for example Server1 = Node1+ProxySQL+garbd & Server 2 = Node2
Or Server1 = Node1+ProxySQL & Server 2 = Node2+garbd
Which is better considering that Node1 also runs the webserver?
@xirtam,
If you put garbd + a node on the same server, then if that server goes down/offline/network split, the entire cluster goes offline. That server would have a voting power of 2 (garbd = 1, node = 1), and the other node has a power of 1, for a total of 3. You need > 50% to survive, and 2 missing out of 3 is 66.66% gone.
Ideally you should have a separate webserver. It is dangerous to have DB and web on same host from a security perspective.
Yes, usually I do a 4-server cluster (1 web + 3 DB), but I can't in this case because the rack is full (I have a 10Gb private link between them, and I don't want to connect over the network).
Back to the initial question: yes, if that node goes down I lose 66% of the votes, but does it really matter, since the web server and the main node would go down together (they are on the same box)?
My only concern is to not have split brain.
Probably I don't even need garbd in this situation, because it's impossible for an unlinked server to get writes, correct?
I mean there are only 2 cases:
- The webserver + node (which we'll call the main node) are down, so it can't write to the 2nd node = no split brain.
- The 2nd node is down, so web+1st node can’t write anyway.
My only concern is: if a node goes down, and I don't have garbd, can it rejoin the cluster without problems and get the diffs?
Web and main PXC node will be up, but the PXC node will deny all activity since it is no longer part of a cluster. 66% is gone. The remaining 33% will refuse all reads and all writes.
If you only have 1 webserver, why do you even have PXC? If the machine the webserver is running on goes down, the site is down. What’s the benefit/purpose of having another machine with PXC+garbd if there’s no app to use it?
Set pc.weight=2 on the web+pxc node. This node now counts for 2 votes. Forget garbd; it's unnecessary. Node2's weight is 1, so the total is 3. If the web+pxc server is down, the app is down and there are no writes anywhere. If node2 is down, web+pxc still has a majority (2/3) and reads/writes continue.
The worst situation then is if PXC on server1 crashes but web/ProxySQL are still up. Node2 will be non-primary because it only has 1/3 of the votes. You can configure dirty reads in this case, or force node2 to become primary. If you cleanly shut down pxc1 (i.e. systemctl stop mysql), that reduces the total count from 3/3 to 1/1 and the cluster remains functioning on node2.
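For reference, a minimal sketch of how that weight could be set on the web+pxc node (pc.weight is dynamic; remember to also put it in wsrep_provider_options under [mysqld] in my.cnf so it survives a restart):

```sql
-- Run on the web+pxc node only. This changes just pc.weight;
-- other provider options keep their current values.
SET GLOBAL wsrep_provider_options = 'pc.weight=2';

-- Check the total cluster weight (2 here + 1 on node2 = 3).
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_weight';
```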
oh right. this is way easier.
To split reads/writes on bidirectionally replicated DBs. I'm having high CPU usage on node1 right now, and I'm trying to split the load. But maybe there is an easier way to do it without PXC?
Is this something I can set up in the config, or must it be done manually only when node1 is down?
Why? Have you investigated the reason? Is the webserver causing the CPU load? Do you see high user% or high iowait%? High iowait% means the disks are causing the problem. How many CPU cores? Can you upgrade to 2-4 more cores? Are you swapping to disk? If so, add more memory. PXC alone should be using 80% of your memory, leaving only 20% for ProxySQL+web+OS. That's not much, and it makes you prone to swapping.
If there is high CPU from PXC, you probably have bad queries that are not using indexes. Fix the queries and you'll see CPU drop. You can also turn down some IO-intensive durability settings, like setting innodb_flush_log_at_trx_commit=2 and sync_binlog=500.
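As a concrete sketch of those two settings (only do this if losing up to roughly a second of committed transactions in a crash is acceptable):

```sql
-- Flush the InnoDB redo log to disk about once per second instead of at every commit.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- fsync the binary log every 500 commits instead of at every commit.
SET GLOBAL sync_binlog = 500;
```

Both are dynamic, but should also go under [mysqld] in my.cnf to persist across restarts.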
Yes, it's a badly written app with no easy way to optimize the queries right now without rewriting most of it from scratch. I won't go into details here; that's out of our scope. It's not our job to rewrite it. We just have to resolve the load issue fast.
The servers are all dedicated EPYC, 96 cores / 256 GB RAM.
Currently it is not PXC; it's a normal single MariaDB instance. We are evaluating a switch to PXC to split the load in two.
So I was thinking about a 2-node cluster (I would set up 3 nodes, but as I said, we are out of space in the rack).
I need replication to use a load balancer, but it must be bidirectional, as reads/writes can happen on both nodes. That's why master/slave replication is not an option.
Any other options, other than PXC?
No, that’s not how it works. The application has no knowledge of the backend infrastructure. ProxySQL will send writes to a single node, and then load balance reads between the nodes.
It’s not going to split the load in 2. 50% would go to node1, which replicates that to the other node. The other 50% goes to node2, which replicates that back to the other node. Both nodes receive 100%, always. PXC and other clustering solutions are not write-scaling solutions, they are HA/Read scaling solutions.
You can offset some load away from node1 by redirecting all reads to node2. Even in this case, node1 receives 100% of the writes and replicates them to node2. Both nodes receive the same write load. The difference is that there is no web server on node2, so node2 can handle more “load” of both reads and writes.
You can use ProxySQL to cache certain bad queries for 5-10s. That’ll help.
Solving “load issues fast” can be done by fixing queries. You could fix 5 of the worst queries in about an hour, and drop CPU by 20%. That is certainly fast and far more stable/long-term than rearchitecting the backend. Or even just cache the top 20 worst queries; that’ll lighten the load too.
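For illustration, here is roughly what such a caching rule looks like on the ProxySQL admin interface; the rule_id, digest pattern, and table name are just placeholders for your own worst queries:

```sql
-- Cache the result of one known-bad SELECT for 10 seconds (cache_ttl is in ms).
INSERT INTO mysql_query_rules (rule_id, active, match_digest, cache_ttl, apply)
VALUES (50, 1, '^SELECT .* FROM heavy_report_table', 10000, 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```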
I forgot to say that my problem is read scaling. The problem comes from some SELECTs. There is no problem with writing; 95% of the load is reading. I need a read-scaling solution.
Then, when the CMS is updated, the bad code comes back. We already fixed lots of them, but some are beyond fixing (nested SELECT queries that won't use indexes or need full table scans).
Also, this is not really an ISP's job anyway; they pay, they get more servers/power. We don't put our hands on customers' code. It will be fixed eventually in the CMS.
Anyway, 100% of the problems are from SELECTs, so PXC (or any other alternative) with a balancer would temporarily fix the issues.
Since 100% of the problems are SELECTs, use either PXC+ProxySQL or traditional async replication+ProxySQL. Both will accomplish the task of scaling reads.
Does async+ProxySQL mean normal master/slave replication?
If using PXC, can it be a 2-node-only cluster without problems, since web+node1 are on the same server? Or do I need to allow writes only on a main node? How is this done? In ProxySQL?
Yes, source/replica is the current terminology.
As long as you adjust the weight on web+node1, yes
Yes, create a ProxySQL rule that sends all SELECTs to hostgroup 11 (or whatever group you define as the reader group), while any other queries go to hostgroup 10 (the writer group). Both our documentation pages and ProxySQL's own docs have detailed info on how to set this up.
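A minimal sketch of such rules, assuming hostgroup 10 is the writer and 11 is the reader as above (rule_ids are arbitrary):

```sql
-- On the ProxySQL admin interface (port 6032).
-- SELECT ... FOR UPDATE stays on the writer; plain SELECTs go to the readers;
-- everything else falls through to the user's default hostgroup (the writer).
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (100, 1, '^SELECT .* FOR UPDATE', 10, 1),
       (200, 1, '^SELECT',               11, 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```

The application user's default_hostgroup in mysql_users would point at 10, so anything not matched by the rules (INSERT/UPDATE/DELETE, DDL) lands on the writer.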
Is this the same as using gcs.fc_single_primary=yes with PXC? Which is better to use: PXC or async?
Can this be ignored using gcs.fc_single_primary?
No, it is not. gcs.fc_single_primary simply alters the way flow control is calculated. Async and PXC are completely different in the way transactions are replicated and applied.
Which one is preferable/easier for simple read balancing, considering I will have only 2 nodes?
The most important thing to me is that if something goes down, a simple restart would automatically relink them and keep up the replication like nothing happened.
I will set up writing only on node1, and reading from both, like you said, with ProxySQL on the web/node1 box.
So 1) web+node1+proxysql writing+reading 2) node2 only reading.
What is the easier, lower-overhead way to build such a setup: PXC with weight 2 on node1, or simple master/slave replication?
Async is more aligned for this, simple source/replica. PXC’s primary purpose is high-availability.
Depends on why the node went down. Depends on why replication broke. MySQL does not fix itself. You would still need to perform basic DBA duties and diagnose/fix. If you did a simple restart to change a config setting, or down for upgrade/maintenance, then yes, replication will auto reconnect and resume.
By this, do you mean the classic “CHANGE MASTER TO MASTER_HOST” setup with different server_ids? Do you have a link to some guide?
Is this guide still applicable? I need synchronous or semi-synchronous master/slave replication, because I need the data to be exactly the same if multiple SELECTs are done on different nodes.
Yes, CHANGE REPLICATION SOURCE TO SOURCE_HOST…, SOURCE_USER=… There are endless guides on the web, and in our documentation.
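A bare-bones sketch of what that looks like on node2, the replica (host, user, and password are placeholders, and this assumes GTID-based replication on MySQL 8.0.23+; MariaDB still uses the CHANGE MASTER TO ... syntax):

```sql
-- Run on node2. Placeholder host and credentials.
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'node1.internal',
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = '***',
  SOURCE_AUTO_POSITION = 1;   -- requires gtid_mode=ON on both servers

START REPLICA;

-- Verify the IO/SQL threads are running and check Seconds_Behind_Source.
SHOW REPLICA STATUS\G
```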
Synchronous replication is achieved with PXC; it is not available with standard async MySQL. The semi-synchronous plugin is deprecated and will be removed soon; you should not use it.
As long as the SELECTs are done within the same transaction, ProxySQL forces them to the same node. ProxySQL will not split apart queries within a transaction.
I couldn't find one on the Percona blog/docs which didn't involve xtrabackup copies or PXC. That's why I'm asking for one.
So this sets me back to a PXC setup.
But there can be a delay between a write on node1 and when node2 receives it. What happens if a write has been done on node1 (the only node enabled for writes in ProxySQL) but a SELECT is done on node2 before it receives that write?
A completely async replication could lead to data discrepancy, no?
I don’t agree because PXC is synchronous replication, async apply.
That’s the nature of async replication. What you describe is a ‘critical read’, and is typically handled in the application by starting a RW transaction, which ProxySQL will send to the writer node.
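For illustration, the “critical read” pattern looks something like this from the application side (table/column names are made up; it assumes the read/write split rules above and transaction_persistent=1 on the ProxySQL user, which is the default):

```sql
-- START TRANSACTION does not match the ^SELECT rule, so it lands on the
-- writer hostgroup; with transaction_persistent=1 the SELECT stays on that
-- same connection and therefore sees the write it depends on.
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 42;   -- placeholder critical read
COMMIT;
```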
PXC and GR are no different in this regard. Both have synchronous replication, but both also apply asynchronously. There is no MySQL setting that can implement/guarantee 2-phase-commit, synchronous writes. Semi-sync replication only guarantees the transaction is in the replica's relay log; it does not guarantee a commit on the replica.
Data discrepancy is when both systems are quiesced (i.e. idle) and the data is different. That's not the case with async, nor with PXC/GR, because replication is not idle. If you execute a SELECT on the source, then the same on a replica that has 2s of lag, the result may be different, but this is not considered a data discrepancy because both systems are actively making changes. Eventually both will have the same data, so no data discrepancy arises.