Donor node issues

When a new node joins the cluster and node1 becomes the donor, does that remove node1 from the cluster and block all reads and writes?

I have a multi-master setup and all of my scripts write to node2. Is there a way to keep new nodes from using that node as their donor? Here is my setup below.

[client]
socket=/var/run/mysqld/mysqld.sock
ssl-ca=/etc/mysql/percona-cert/ca.pem
ssl-cert=/etc/mysql/percona-cert/server-cert.pem
ssl-key=/etc/mysql/percona-cert/server-key.pem

[mysqld]
server-id=1
user=mysql
tmpdir=/db3/tmp
datadir=/db1
local_infile=ON
pid-file=/var/run/mysqld/mysqld.pid
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
default_storage_engine=InnoDB
sql_mode = STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
innodb_buffer_pool_size = 20G
innodb_buffer_pool_instances = 8
innodb_lock_wait_timeout = 100
wait_timeout = 86400
interactive_timeout = 86400
max_heap_table_size = 96M
sort_buffer_size = 16M
join_buffer_size = 16M
tmp_table_size = 96M
max_sort_length = 8K

max_allowed_packet = 16M
group_concat_max_len = 16M
max_connections = 200
max_connect_errors = 10
max_binlog_size = 128M
log-bin=binlog
log_slave_updates
binlog_expire_logs_seconds=604800
wsrep_provider=/usr/lib/galera4/libgalera_smm.so
binlog_format=ROW
wsrep_slave_threads=8
wsrep_log_conflicts
innodb_autoinc_lock_mode=2
wsrep_cluster_address=gcomm://192.168.2.59,192.168.2.61,192.168.2.64,192.168.2.62
wsrep_node_address=192.168.2.60
wsrep_node_name=DB403
wsrep_cluster_name=DB-cluster
pxc_strict_mode=PERMISSIVE
ssl-ca=/etc/mysql/percona-cert/ca.pem
ssl-cert=/etc/mysql/percona-cert/server-cert.pem
ssl-key=/etc/mysql/percona-cert/server-key.pem
tls_version=TLSv1.2,TLSv1.3
[sst]
tmpdir=/db3/tmp
sst-idle-timeout=0
encrypt=4
ssl-ca=/etc/mysql/percona-cert/ca.pem
ssl-cert=/etc/mysql/percona-cert/server-cert.pem
ssl-key=/etc/mysql/percona-cert/server-key.pem
[mysqld_safe]
service_startup_timeout=73200

192.168.2.59 is the designated write node. Do I just not include it in the wsrep_cluster_address?
I joined a new node and I believe it used 192.168.2.59 as the donor, which seems to have caused a traffic jam with transaction locks that lasted for hours, until the locks were granted or freed.

---TRANSACTION 852286683, ACTIVE (PREPARED) 7392 sec
51 lock struct(s), heap size 8312, 7781 row lock(s), undo log entries 5100
MySQL thread id 6845761, OS thread handle 140612192237312, query id 170451164 dev101.domain.pub 192.168.2.187 user1 wsrep: write set replicated and certified (1712284)
commit

Hi danarashad,

While a node is acting as an SST donor, it won’t be blocked for read/write operations, assuming the SST method is xtrabackup and you don’t have non-transactional tables.
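For reference, the SST method is chosen in my.cnf; in recent Percona XtraDB Cluster releases xtrabackup-v2 is the default, so your setup likely already uses it (shown here only as a sketch, since your config doesn’t set it explicitly):

[mysqld]
wsrep_sst_method=xtrabackup-v2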

When xtrabackup runs, it executes:
Executing LOCK TABLES FOR BACKUP…
This applies certain locks, as described here: Introducing backup locks in Percona Server - Percona Database Performance Blog
When the xtrabackup backup is finished, the locks are released.

SST can be an expensive operation depending on data set size, network, and hardware resources. Because of the extra load and/or locking, server performance will probably degrade and connections might pile up under peak traffic. It is strongly suggested that you use a non-writer node for serving SST. You can do so as explained here: State Snapshot Transfers — Galera Cluster Documentation

Regards


Hello @danarashad,
During the SST donor process, a node is not blocked for writes/reads. The node does get desynced from the cluster, which only prevents it from invoking flow control to ask other nodes to slow down.
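You can watch this happen on the donor: while it serves SST, its state changes from Synced to Donor/Desynced, for example:

SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
-- shows 'Donor/Desynced' while the node is serving SST, 'Synced' otherwise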

wsrep_cluster_address has nothing to do with being a donor. It is only a hint for the communications library to find nodes. Even if you remove a node from this parameter, other nodes will announce every node they know of when a joiner connects, and it will connect to them anyway.

You can specify a donor by setting wsrep_sst_donor= in my.cnf; the joining node will then request SST from that server.
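For example, in the joiner’s my.cnf (DB401 is a hypothetical donor node name here; the value matches the donor’s wsrep_node_name):

[mysqld]
wsrep_sst_donor=DB401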

Also, the transaction info you show looks pretty bad. 1 query with 7781 row locks? Was this an UPDATE?


After I paused all the crons, the write node eventually moved past the 7781 row locks. We have a lot of DML; it was either an UPDATE or an INSERT via LOAD DATA INFILE. (Is there a way to look up what a transaction is actually doing?)
I took this over from someone else, and they used LOAD DATA INFILE for giant transactions.


You can do SHOW PROCESSLIST to see the query being executed.
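If SHOW PROCESSLIST truncates the statement, SHOW FULL PROCESSLIST shows the complete query, and information_schema.innodb_trx shows what each open InnoDB transaction is doing, for example:

SHOW FULL PROCESSLIST;

SELECT trx_id, trx_state, trx_started, trx_rows_locked, trx_query
FROM information_schema.innodb_trx
ORDER BY trx_started;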

How many nodes are in your cluster?


I have 5 nodes in the cluster. SHOW PROCESSLIST didn’t show much of anything; I don’t recall exactly what it showed, but it didn’t include the LOAD DATA INFILE statement.
Can wsrep_sst_donor= be a list of IPs, e.g. wsrep_sst_donor=192.168.0.1,192.168.0.2?


Yes. The joiner will try them in order.
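For example (DB401 and DB402 are hypothetical node names; wsrep_sst_donor matches donors by their wsrep_node_name, and per the Galera documentation a terminating comma lets the joiner fall back to any other available donor if none in the list responds):

[mysqld]
wsrep_sst_donor=DB401,DB402,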


Thank you, I will give that a try.
