SST locks donor node for writing after any DDL

JiriZyka · December 21, 2024, 8:43pm

Hello,
When performing a long SST, after any DDL query is ran like CREATE TABLE or ALTER TABLE, the query gets stuck on backup locks and then also blocks any subsequent insert and delete queries.
Is there a way to avoid the write locks after a DDL query is ran?

How to reproduce:

Start an SST, that will take longer than a few minutes(use rlimit to slowdown the sst process)
Run CREATE TABLE (FULL PROCESSLIST will show its stuck by backup locks)
Run any WRITE query → it will be stuck til the end of the SST

More information
Percona XtraDB version: 8.0.39-30.1
Number of nodes in the cluster: 2 + arbiter
The second nodes serves almost as a “slave” it never receives write queries for the same database as the first node - Perhaps it would be safe to enable --no-backup-locks?
Mariabackup seems to be able to handle DDLs during SSTs and backups: https://jira.mariadb.org/browse/MDEV-16791

Thank you for any help

JiriZyka · December 21, 2024, 8:46pm

Here is the full configuration:

[client]
socket=/var/run/mysqld/mysqld.sock

[mysqld]
sql-mode = "STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION"
table_open_cache = 200000
table_open_cache_instances=64
back_log=3500
max_connections=30000

# files
innodb_file_per_table
innodb_log_file_size=15G
innodb_log_files_in_group=2
innodb_open_files=4000

# buffers
innodb_buffer_pool_size= 40G
innodb_buffer_pool_instances=8
innodb_log_buffer_size=64M

# tune
innodb_doublewrite= 1
innodb_thread_concurrency=0
innodb_flush_log_at_trx_commit= 0
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_max_dirty_pages_pct=90
innodb_max_dirty_pages_pct_lwm=10
innodb_lru_scan_depth=2048
innodb_page_cleaners=4
join_buffer_size=256K
sort_buffer_size=256K
innodb_use_native_aio=1
innodb_stats_persistent = 1

innodb_adaptive_flushing = 1
innodb_flush_neighbors = 0
innodb_read_io_threads = 16
innodb_write_io_threads = 16
innodb_io_capacity=1500
innodb_io_capacity_max=2500
innodb_purge_threads=4
innodb_adaptive_hash_index=0
max_prepared_stmt_count=1000000
innodb_monitor_enable = '%'
performance_schema = ON

ssl-ca=/etc/mysql/ssl/ca.pem
ssl-cert=/etc/mysql/ssl/server-cert.pem
ssl-key=/etc/mysql/ssl/server-key.pem
ssl-cipher=TLSv1.2
server-id="1"
datadir="/mnt/database-data-cluster-test/database-data"
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
pid-file=/var/run/mysqld/mysqld.pid
wsrep_provider_options="socket.ssl_key=/etc/mysql/ssl/server-key.pem;socket.ssl_cert=/etc/mysql/ssl/server-cert.pem;socket.ssl_ca=/etc/mysql/ssl/ca.pem"
wsrep_notify_cmd = "/usr/local/sbin/wsrep_notify_slack --name database_cluster_test"

# Binary log expiration period is 604800 seconds, which equals 7 days
binlog_expire_logs_seconds=604800

######## wsrep ###############
# Path to Galera library
wsrep_provider=/usr/lib/galera4/libgalera_smm.so

# Cluster connection URL contains IPs of nodes
#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address=gcomm://10.1.0.18,10.2.0.18

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# Slave thread to use
wsrep_slave_threads=8

wsrep_log_conflicts

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2

# Node IP address
wsrep_node_address="10.1.0.18"
# Cluster name
wsrep_cluster_name="database_cluster_test"

#If wsrep_node_name is not specified,  then system hostname will be used
wsrep_node_name="database-cluster-test-proxmox1-datanode"

#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=MASTER

# SST method
wsrep_sst_method=xtrabackup-v2

[sst]
encrypt=4
rlimit=30m
ssl-ca=/etc/mysql/ssl/ca.pem
ssl-cert=/etc/mysql/ssl/server-cert.pem
ssl-key=/etc/mysql/ssl/server-key.pem

matthewb · December 23, 2024, 4:02pm

Hello @JiriZyka,
From the docs:

LOCK TABLES FOR BACKUP uses a new MDL lock type to block updates to non-transactional tables and DDL statements for all tables.

What you are experiencing is correct behavior. You should not be running any DDL during an SST. The SST process needs a consistent schema to create a clone of the node, thus any/all DDL are blocked during the SST process.

Because you ran a DDL, it got blocked, and this block also blocks writes to the table. If you do not run a DDL, then writes will not get blocked.

JiriZyka · January 6, 2025, 12:45pm

Hello @matthewb,

Thank you for the answer.

In your message, you say that: blocks writes to the table. however it seems that one ALTER query on a single database blocks ALL write queries everywhere. Not just on the altered table or the altered database.

I assume this is still intended behaviour and not a bug. Is it dangerous to enable --no-backup-locks in my specific situation? Is there any other mariabackup-like(https://jira.mariadb.org/browse/MDEV-16791) workaround?

matthewb · January 6, 2025, 6:42pm

Yes, that is correct and is intended behavior. You should not be running DDL during a backup/SST process.

Topic		Replies	Views
Donor node can not serve write query while SST Percona XtraDB Cluster 5.x	6	1165	October 27, 2021
Waiting for table backup lock drop table if exists file_search	15	1545	March 11, 2022
Backup locking other nodes Percona XtraBackup	4	825	March 18, 2022
XtraDB Cluster -non-locking option to rebuild 2 nodes w/out taking Production down Percona XtraDB Cluster 5.x	1	642	September 27, 2016
Adding Nodes With Minimal Downtime? Percona XtraDB Cluster 5.x	1	701	April 8, 2019

SST locks donor node for writing after any DDL

Related topics