Issue replicating from MySQL 8.0.22 to Percona XtraDB Cluster 8.0.34-26.1

Our current setup is a standalone MySQL 8.0.22 server (I know, it’s bad) and we’d like to migrate to a PXC cluster as the application has grown quite a lot.

My main issue is that I’m trying to set up replication between our existing instance and one of the nodes in the cluster, but the nodes fail with: “WSREP has not yet prepared node for application use”.

Here are the steps that I took:

  1. Dump the database that we want to replicate:
    mysqldump -h standalone -uroot -P 3306 --password \
        --skip-lock-tables \
        --skip-add-locks \
        --single-transaction \
        --flush-logs \
        --master-data=2 \
        somedatabase
    
  2. Import the database on PXC-3
  3. Change master on PXC-3 using the MASTER_LOG_FILE and MASTER_LOG_POS from the dump:
    CHANGE MASTER TO MASTER_HOST="192.168.0.100",
    MASTER_USER="replication_user", MASTER_PASSWORD="*******",
    MASTER_LOG_FILE='binlog.000004', MASTER_LOG_POS=196;
    
  4. Start replication (the commands I used for steps 2 and 4 are sketched below)
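
For reference, steps 2 and 4 were roughly the following (the host name and dump file name are just placeholders from my local setup; since the dump was taken without --databases, the schema has to be created before importing):

    # step 2: create the schema and import the dump on PXC-3
    mysql -h pxc-3 -uroot -p -e "CREATE DATABASE IF NOT EXISTS somedatabase"
    mysql -h pxc-3 -uroot -p somedatabase < somedatabase.sql

    # step 4: start replication and check that both replica threads are running
    mysql -h pxc-3 -uroot -p -e "START REPLICA; SHOW REPLICA STATUS\G"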

After starting the replication, the PXC-3 node picked up the data; however, the other nodes started failing with “WSREP has not yet prepared node for application use”. After restarting them, they came back up successfully with all the data from PXC-3 replicated.

To test the replication I inserted a new row on the standalone instance and went to check on PXC-3 to see if it had replicated; however, PXC-3 too started failing with “WSREP has not yet prepared node for application use”.
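
When a node is in that state, its wsrep status can be checked with the standard Galera status variables, e.g.:

    SHOW GLOBAL STATUS LIKE 'wsrep_ready';
    SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
    SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';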

So I’m clearly doing something wrong here, maybe I misconfigured something?

Here are the configs:
Existing (8.0.22) instance:

[mysqld]
default_authentication_plugin=mysql_native_password
;skip-log-bin
max_connections=1024
innodb_buffer_pool_size=1G
sort_buffer_size=256000000

server-id=4
gtid_mode=ON
enforce-gtid-consistency=ON

log-bin=binlog
binlog-format=row

binlog-do-db=somedatabase

PXC-node-3 (8.0.34-26.1):

[mysqld]
server-id=3
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
log_error_verbosity=2
pid-file=/var/run/mysqld/mysqld.pid

# Binary log expiration period is 604800 seconds, which equals 7 days
binlog_expire_logs_seconds=604800

######## wsrep ###############
# Path to Galera library
wsrep_provider=/usr/lib/galera4/libgalera_smm.so

# Cluster connection URL contains IPs of nodes
#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address=gcomm://192.168.122.100,192.168.122.101,192.168.122.102

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# Slave thread to use
wsrep_slave_threads=8

wsrep_log_conflicts

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2

# Node IP address
wsrep_node_address=192.168.122.102
# Cluster name
wsrep_cluster_name=pxc-cluster

#If wsrep_node_name is not specified,  then system hostname will be used
wsrep_node_name=pxc-cluster-node-3

#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=ENFORCING

pxc_encrypt_cluster_traffic=off

# SST method
wsrep_sst_method=xtrabackup-v2

# Tuning
max_connections=1024
innodb_buffer_pool_size=2G
sort_buffer_size=256M

# Temporary replication
gtid_mode=ON
enforce-gtid-consistency=ON
log-replica-updates=ON
skip-replica-start=ON
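
A quick way to sanity-check that the GTID settings from that last block actually took effect on one of the PXC nodes is something like:

    SELECT @@global.gtid_mode,
           @@global.enforce_gtid_consistency,
           @@global.log_replica_updates;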

Here’s an overview of what I’m trying to achieve in this first phase of the migration:

Hello @bogdannbv

That innodb_buffer_pool_size is incredibly tiny. You said the application is growing? With a pool this small, you’re hamstringing yourself. The buffer pool size should start at around 80% of the system’s overall RAM.
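
For example (numbers purely illustrative), on a dedicated 32 GB database host that would mean something in the region of:

    innodb_buffer_pool_size = 25G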

Growing in what way? PXC is not a write-scaling solution; it is a high-availability solution. If you’re expecting PXC to handle more write transactions, you need to reset your expectations. PXC gives you read scaling (more nodes to spread reads across) and HA (if one node goes down, the other two are hot and ready to take over).

pxc_strict_mode

I would set this to permissive to test and see what’s not working. Have you looked at the PXC error logs?
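
pxc_strict_mode is dynamic, so for a quick test you can flip it at runtime without a restart, something like:

    SET GLOBAL pxc_strict_mode = 'PERMISSIVE';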

Also, I would have started PXC as a single node (you did bootstrap the first node, right?), imported the mysqldump, connected replication, waited for it to catch up, and only then created nodes 2 and 3. That way you isolate where any issues might be happening.
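
On a systemd-based PXC 8.0 install, bootstrapping the first node and then starting the joiners typically looks like this (service names from the stock Percona packages):

    # on the first node only: bootstrap a new cluster
    systemctl start mysql@bootstrap.service

    # on the remaining nodes, once the first one is Primary
    systemctl start mysql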

Hi @matthewb, one mistake on my side: I forgot to mention that I haven’t actually done this in production yet. I’m working in a local environment consisting of 4 VMs that emulate the production topology.

So the innodb_buffer_pool_size is enough for what I’m doing as it’s just me hitting the database and my test database is fairly small.

I’m not expecting PXC to solve any potential write-scalability issues as we’re not at that point yet. I’m specifically looking for high availability and automatic failover in case one of the nodes goes down.

I’ve looked at the error logs but found no errors, which is weird. I’ll try increasing the verbosity level and post the logs here if anything stands out.

The cluster was up and running properly when I did this but I’ll try starting with a single node in bootstrap mode and then bring the other nodes up.

Hi again @matthewb, I’ve triple-checked everything and the issue was that I forgot to symlink the replication.cnf file on the other two nodes. (I keep the last part of the PXC config above in a separate file so that I can remove it entirely once the migration is done, and so I can modify it more easily on the host machine via a shared volume between all the PXC instances.)
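
The fix itself was just the missing link plus a restart; the paths below are from my local VMs and purely illustrative:

    # hypothetical paths: replication.cnf sits on the shared volume
    ln -s /mnt/shared/replication.cnf /etc/mysql/conf.d/replication.cnf
    systemctl restart mysql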

I’ve also found the error; I had overlooked it because it was buried between warnings about mysql_native_password being deprecated:

[ERROR] [MY-000000] [Galera] Failed to apply write set: gtid: 6fcd1063-b155-11ee-958e-d68d57bd7808:42138 server_id: 34c9b460-da84-11ee-adc0-a73915500901 client_id: 30 trx_id: 1323 flags: 3 (start_transaction | commit)

To double-check, I removed the replication.cnf file from one of the nodes and restarted it. The node started and joined the cluster normally; however, once I added a new row to one of the tables, sure enough the node without the GTID config failed with “WSREP has not yet prepared node for application use”.

So yeah, basically skill issue :laughing:

Thank you for helping me and for the tips!