PXC on 500 GB existing MySQL DB

I would like to confirm a few things before going further with getting PXC to work.

  1. We have around 500 GB of existing data in total. From the PXC docs it seems PXC will transfer the data to my new nodes on its own? So if I have my master, I don't need to copy the data to the other nodes first; it will handle this. How well does this part work? Is it best to stop transactions on the master, or can it handle constantly changing data while doing the transfer?

  2. Some of my nodes will be far away, 400+ ms latency. I assume that the updates between nodes happen separately from my local writes? For example, if I do 1,000 separate write statements in 1 second against a table, this will not slow down anything on the machine issuing the write statements, and my other nodes will get the 1,000 transactions ‘eventually’ (normally within 400 ms or so)? The issue is that, because of the latency to some nodes, I can't have the server initiating the SQL statement wait for all nodes to confirm it.

Hi jyhetpos,

  1. SST (the initial data sync) happens under the hood using Percona XtraBackup, so the new replica receives an exact copy of the data set and then syncs with the primary. The primary can continue to receive traffic, but expect some performance degradation, as there will be heavy disk and network activity while SST is in progress (see the quick status check sketched below this list).

  2. PXC is very sensitive to latency. Every time a write transaction is committed on the primary, the acknowledgement (ACK) is not returned to the client until all nodes in the topology have accepted and validated the transaction. As a result, every write transaction will be heavily delayed by the overall cluster latency (which can easily exceed 400 ms depending on the number of nodes and their geographic distribution).
    For PXC, it is strongly recommended that the nodes are close to each other with low latency, not geographically distributed. Check this blog for reference: How Not to do MySQL High Availability: Geographic Node Distribution with Galera-Based Replication Misuse - Percona Database Performance Blog
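
If it helps to put numbers on both points, here is a minimal sketch (these are standard Galera/PXC status counters; exact values depend on your workload and version) of what you can check from the MySQL client:

```sql
-- On the joining node: is SST/IST still running, or is the node synced?
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
-- e.g. 'Joining: receiving State Transfer' vs. 'Synced'

-- On the primary: how much are cluster latency and flow control costing writes?
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';   -- fraction of time writes were paused by flow control
SHOW GLOBAL STATUS LIKE 'wsrep_local_send_queue_avg';  -- growing average suggests a slow network path to other nodes
SHOW GLOBAL STATUS LIKE 'wsrep_cert_deps_distance';    -- how much parallel apply is possible on the other nodes
```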

Regards

Thanks for the quick reply.

1) Thanks, that sounds great.
2) I am trying to have in-sync databases between two regions; is there any solution that you are aware of that can do this? Or how do other places do this? Writing to the same location isn't practical when clients are on different continents.

Hello @jyhetpos,

2) I am trying to have in-sync databases between two regions; is there any solution that you are aware of that can do this?

Why do you need this? If region A goes down, will all traffic redirect to region B? Is region A a disaster recovery for region B?

Writing to the same location isn't practical when clients are on different continents.

But if region A goes down, will clients redirect to region B? If so, that makes writing to a single location perfectly practical.

If region A goes down but you don’t fail A over to B, then what’s the point in having in-sync databases?

What most people do is what I've described: either everyone writes to the same region, accepting the lag, or they write to a local cluster dedicated to that region and don't try to keep the databases in sync.

You could set up async replication between both regions, and have B write to the database in B and A write to the database in A, with async replication keeping them in sync. But this can introduce other issues, like duplicate entries, if by some chance a client writes to A, then tries to read/write on B before replication has caught up.
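
If you do go the async route, a minimal sketch of one direction (MySQL 8.0 `CHANGE REPLICATION SOURCE` syntax; the hostname and `repl` user below are placeholders, and this assumes GTIDs are enabled on both clusters) would be:

```sql
-- Run on one node of the region-B cluster, pointing at a node of the region-A cluster.
CHANGE REPLICATION SOURCE TO
    SOURCE_HOST = 'pxc-a-node1.example.com',   -- placeholder hostname
    SOURCE_PORT = 3306,
    SOURCE_USER = 'repl',                      -- placeholder replication user
    SOURCE_PASSWORD = '***',
    SOURCE_AUTO_POSITION = 1,                  -- requires GTIDs on both sides
    SOURCE_SSL = 1;
START REPLICA;

-- Repeat in the opposite direction on a region-A node for two-way replication,
-- then monitor lag and channel health with:
SHOW REPLICA STATUS\G
```

If you replicate in both directions, people typically also set different auto_increment_increment / auto_increment_offset values per region so that auto-increment keys generated in A and B can't collide. That only avoids primary-key clashes; it doesn't solve the read-after-write problem described above.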