PXC 8.x replication issues

Good afternoon. Maybe someone can help.

I have a single standalone node running version 8.x.
I then created a PXC 8.x cluster:

  • created a backup from the single node
  • restored it to the first PXC 8 node and started that node in bootstrap mode
  • added the 2 remaining nodes to the cluster

It seems that everything is OK.

But now I need to bring the cluster up to date with the single node.
I configure replication from it on a cluster node (I tried both GTID and binlog position) and run START SLAVE, but the node does not sync anything (Seconds_Behind_Master keeps increasing), commands cannot be executed, and only a restart of the service helps.
After restarting the node, I see this in the log:
Slave SQL for channel '': Worker 1 failed executing transaction '****' at master log mysql-bin.000011, end_log_pos 108802; Error 'WSREP has not yet prepared node for application use' on query. Default database: ''. Query: 'BEGIN', Error_code: MY-001047
Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000011' position 108633

I would be grateful for your help…

You are mixing two different replication styles. PXC does not use traditional replication. PXC uses the Galera communications library to manage transaction replication.

If you were successful in getting all 3 nodes connected together, then YAY you are done! That is your cluster. The nodes communicate and share transactions over tcp/4567 (Galera group communication); tcp/4444 is used for SST. There is nothing else to configure.

That is true, but the cluster does not have up-to-date data at the moment. The current data is on the working single node from which I took the backup for the cluster.

The question is: how do I synchronise the cluster with the single node?

Any idea how to synchronise the cluster with the single node?

You should read up on PXC 101 Basics. PXC has built-in, native synchronization. When you bring node2 online, it will contact another member of the cluster. If node2 doesn’t have the same data as the other node, then node2 will automatically receive a copy of the entire dataset. This is known as SST.

Shut down the new node. Erase the datadir completely and then start MySQL. This new node will automatically get a copy of the entire dataset from the existing node.

You misunderstood me.
I have one working single 8.x node.
Our task is to migrate from this node to the cluster.
I took a backup from this single node and created a cluster with 3 nodes.
Now, in order to switch over to the cluster, I need to synchronise it with the working single node.
When I run START SLAVE on one of the cluster nodes, that node crashes.

Ok. I understand now.

In your 3-node cluster, is it online and functioning? Is it in Primary state? Can you create tables, INSERT, DELETE, SELECT on ALL nodes without errors? Make sure this works first.
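
The health check described above could be sketched roughly as follows. This is only an illustration: it assumes the output of SHOW GLOBAL STATUS LIKE 'wsrep_%' has already been collected into a dict, and the function name wsrep_problems is hypothetical. The status variable names themselves (wsrep_cluster_status, wsrep_local_state_comment, wsrep_ready, wsrep_cluster_size) are the standard Galera ones. The error 'WSREP has not yet prepared node for application use' from the log above is typically what a node returns while wsrep_ready is OFF.

```python
# Sketch: list reasons why a PXC node may not be usable, based on wsrep status.
# `status` is assumed to be a dict built from SHOW GLOBAL STATUS LIKE 'wsrep_%'.

def wsrep_problems(status, expected_size=3):
    """Return a list of problems; an empty list means the node looks usable."""
    problems = []
    if status.get("wsrep_cluster_status") != "Primary":
        problems.append("cluster is not in Primary state")
    if status.get("wsrep_local_state_comment") != "Synced":
        problems.append("node is not Synced")
    if status.get("wsrep_ready") != "ON":
        problems.append("wsrep_ready is not ON")
    if int(status.get("wsrep_cluster_size", 0)) != expected_size:
        problems.append("unexpected cluster size")
    return problems
```

Running this against every node, in addition to a manual CREATE TABLE / INSERT / DELETE / SELECT test, gives a quick picture of whether the cluster is in a state where starting async replication makes sense.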

Then, pick one of the cluster nodes and configure it as a replica of your existing single MySQL server. You should be using GTIDs for this to help prevent any potential replication/binlog issues.
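
As a rough sketch of what "configure it as a replica using GTIDs" means in SQL terms, the helper below only builds the statements as strings; it does not connect to anything. The host name and user in the usage note are made up, and the function name build_replica_sql is hypothetical. The statements use the 8.0.23+ syntax; on older 8.0 releases the equivalents are CHANGE MASTER TO ... MASTER_AUTO_POSITION=1 and START SLAVE.

```python
# Sketch: build the SQL for a GTID-based (auto-position) async replica.
# Credentials are deliberately left out; supply them however you normally do.

def build_replica_sql(source_host, repl_user, source_port=3306):
    """Return the statements to run on the PXC node that will act as replica."""
    change_source = (
        "CHANGE REPLICATION SOURCE TO "
        f"SOURCE_HOST='{source_host}', SOURCE_PORT={int(source_port)}, "
        f"SOURCE_USER='{repl_user}', SOURCE_AUTO_POSITION=1"
    )
    return [change_source, "START REPLICA"]
```

For example, build_replica_sql("mysql1.example.internal", "repl") returns the two statements to run on the chosen cluster node. With SOURCE_AUTO_POSITION=1 the replica asks the source for the GTIDs it is missing, so no binlog file name or position has to be given.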

Yes, I did that: I used GTID to set up replication from the single node on one of the cluster nodes.
But the problem is that when I run START SLAVE, the node drops out of the cluster and only a restart helps.
That is where the problem is.

But the most interesting thing is that when I restart this node without cluster mode (I comment out everything related to wsrep_* in the config),
replication works fine and without errors.
A mystery((((

Do you have log_slave_updates=on in the replica/node config?

What you are trying to do is almost literally the first lab we do in our PXC Training Course, so I know that this works.

Yep, but I use the new variable name, log_replica_updates = 1.
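
A small sketch of how the settings mentioned so far could be checked on the node that acts as the async replica. It assumes the values have already been read (for example from SHOW GLOBAL VARIABLES) into a dict; the function name missing_gtid_settings and the exact list of required settings are assumptions based on this thread, not an official checklist.

```python
# Sketch: report which GTID/binlog settings are not ON.
# `variables` is assumed to be a dict of server variable name -> value.

REQUIRED_ON = ("gtid_mode", "enforce_gtid_consistency", "log_bin", "log_replica_updates")

def _as_on_off(value):
    # Treat 1/TRUE the same as ON, since config files often use "= 1".
    text = str(value).strip().upper()
    return "ON" if text in ("ON", "1", "TRUE") else text

def missing_gtid_settings(variables):
    seen = {name.lower(): _as_on_off(value) for name, value in variables.items()}
    # Servers older than 8.0.26 only know the old name, log_slave_updates.
    if "log_replica_updates" not in seen and "log_slave_updates" in seen:
        seen["log_replica_updates"] = seen["log_slave_updates"]
    return sorted(name for name in REQUIRED_ON if seen.get(name) != "ON")
```

An empty result means all four settings are on; anything else names the variables worth looking at first.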

Well, I would start your cluster over fresh because it looks like you have some sort of data mismatch that is causing the cluster to vote to expel that member when replication starts.

Blow away the cluster, bootstrap 1 node, configure async replication to this node. Verify this works, then start node2, wait for SST, verify replicated events received by node1 are going to node2. Then start node3.

Do I understand correctly that I should start the first node in bootstrap mode and run slave replication on it, without adding the other nodes?
And after the first node syncs with the single node, connect the second and third?

Yes.

It did not help(((
In bootstrap mode without other nodes, replication behaves exactly the same.
It runs, but does not catch up with the server (

In the log:
[WSREP] Pending to replicate MySQL GTID event (probably a stale event). Discarding it now.

Unfortunately I can’t diagnose this issue any further without accessing your systems to see what is going on. It sounds like you have some fundamental configuration issues.

As my last piece of advice, I would try the following:

  1. Erase PXC1 datadir
  2. Ensure that all GTID-related configuration is set on mysql1
  3. Take a new xtrabackup of mysql1 and restore to PXC1
  4. Start PXC1 in bootstrap mode
  5. Configure PXC1 as replica using GTID
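
One detail that sits between steps 3 and 5 is telling the restored node which GTIDs the backup already contains. As far as I know, xtrabackup writes this into a file called xtrabackup_binlog_info in the backup directory (binlog file, position, GTID set, whitespace separated, with long GTID sets continued on following lines). The sketch below parses that format under this assumption; the function names are hypothetical, and the GTID values in the usage note are made up.

```python
# Sketch: read binlog coordinates and GTID set from xtrabackup_binlog_info text.
# Assumed layout: "<binlog file> <position> <gtid set>", with the GTID set
# possibly continued on the following lines.

def parse_binlog_info(text):
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty xtrabackup_binlog_info")
    fields = lines[0].split()
    if len(fields) < 2:
        raise ValueError("expected at least a binlog file and a position")
    # Join the GTID part of the first line with any continuation lines.
    gtid_set = "".join(fields[2:] + lines[1:])
    return {"file": fields[0], "position": int(fields[1]), "gtid_set": gtid_set}

def gtid_purged_sql(gtid_set):
    """Statement to run on the restored node before configuring replication."""
    return f"SET GLOBAL gtid_purged='{gtid_set}'"
```

For example, the text "mysql-bin.000011\t108633\taaaa:1-5,\nbbbb:1-3" parses to the GTID set aaaa:1-5,bbbb:1-3. On a freshly restored node, setting gtid_purged usually requires that the node's own gtid_executed does not conflict with it, so check that first.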

These steps are the same steps we do in our PXC Tutorial Session, which I just finished delivering to a client last Thursday. I know for 100% certainty that this works.

Thanks for the help, I will keep looking for what is wrong.
I don’t understand why, when I comment out the wsrep_ lines in my.cnf on PXC1 and start the node without cluster mode, replication works fine with mysql1.

But as soon as the node is in cluster mode, no matter whether bootstrapped or in normal mode, replication starts (Slave_IO_Running: Yes / Slave_SQL_Running: Yes) but lags behind, and the node drops out of the cluster…

The problem, as I see it, is hidden in the cluster mode, but there are not that many settings:

pxc_encrypt_cluster_traffic = OFF
wsrep_cluster_address = gcomm://10.0.0.1,10.0.0.2,10.0.0.3
wsrep_cluster_name = mysql-cluster
wsrep_node_address = 10.0.0.1
wsrep_node_name = mysql-cluster-n1
wsrep_provider = /usr/lib64/libgalera_smm.so
wsrep_provider_options = 'evs.inactive_timeout=PT1M;evs.install_timeout=PT1M;evs.keepalive_period=PT3S;evs.send_window=512;evs.suspect_timeout=PT30S;evs.user_send_window=256;evs.version=1;gcache.size=1G;gcs.fc_factor=0.8'
wsrep_slave_threads = 16
wsrep_sst_method = xtrabackup-v2

You need to examine the reason behind the lag. Is it table locking? Massive row writes? I would remove all of those wsrep_provider_options you have except for the gcache size.
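
To make the suggestion about wsrep_provider_options concrete, here is a small sketch that reduces a provider options string to just the gcache size. The function name keep_provider_options is hypothetical; the input format (semicolon-separated key=value pairs) is the one used in the config posted above.

```python
# Sketch: keep only selected keys from a wsrep_provider_options string.

def keep_provider_options(options, keep_prefixes=("gcache.size",)):
    # Drop surrounding quotes, split on ';', and ignore empty fragments.
    parts = [part.strip() for part in options.strip("'\"").split(";") if part.strip()]
    return ";".join(part for part in parts if part.startswith(keep_prefixes))
```

Applied to the options string from the config above, this leaves gcache.size=1G, which is the value to put back into my.cnf while testing with the remaining evs.* and gcs.* settings at their defaults.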

If you took a current backup and restored it to pxc1 then started replication, all within a few minutes, then there should be 0 lag. If you have lag, there’s some other issue happening.
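
To watch the lag right after starting replication, something like the following could be used. It is a sketch that parses the vertical (\G) output of SHOW REPLICA STATUS (or SHOW SLAVE STATUS on older versions) that has been captured as text; the function names are hypothetical, and both the new and old field names are checked.

```python
# Sketch: extract replication lag from captured "SHOW REPLICA STATUS\G" output.

def parse_vertical_status(text):
    """Turn 'Key: value' lines into a dict; separator rows are skipped."""
    row = {}
    for line in text.splitlines():
        if ":" not in line or line.lstrip().startswith("*"):
            continue
        key, _, value = line.partition(":")
        row[key.strip()] = value.strip()
    return row

def replication_lag(row):
    """Return lag in seconds, or None when the server reports NULL or nothing."""
    for name in ("Seconds_Behind_Source", "Seconds_Behind_Master"):
        value = row.get(name)
        if value not in (None, "", "NULL"):
            return int(value)
    return None
```

Sampling this every few seconds shows whether the lag is shrinking, flat, or growing. A value that only grows while both threads report Yes suggests the SQL thread is blocked rather than slow, which matches the symptoms described earlier in this thread.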
