Hanging MDL locks in Percona Cluster

Hello,

We have three Percona XtraDB Cluster nodes (Ver 8.0.27-18.1 for Linux on x86_64 (Percona XtraDB Cluster (GPL), Release rel18, Revision ac35177, WSREP version 26.4.3)) running in multi-master replication mode, with ProxySQL (version 2.0.15-percona-1.1, codename Truls) installed in front of the nodes.
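
For completeness, cluster membership and node state look like they should on all nodes; this is just the standard Galera status check, nothing custom:

-- Run on any node: wsrep_cluster_size should report 3,
-- wsrep_local_state_comment should be 'Synced',
-- and wsrep_cluster_status should be 'Primary'.
SHOW GLOBAL STATUS WHERE Variable_name IN
    ('wsrep_cluster_size', 'wsrep_local_state_comment', 'wsrep_cluster_status');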

Percona config:

[mysqld]
datadir                 = /var/lib/mysql
tmpdir                  = /srv/mysql_tmpdir
internal_tmp_mem_storage_engine = MEMORY
user                    = mysql
default_storage_engine  = InnoDB
binlog_format           = ROW
skip-name-resolve       = 1
sql_mode                = ""
collation_server        = utf8_unicode_ci
character_set_server    = utf8

# WSREP #
wsrep_provider          = /usr/lib/galera4/libgalera_smm.so
wsrep_provider_options  = "gcache.size=1G;gcache.recover=yes;gcs.fc_limit=160;gcs.fc_factor=0.8;pc.weight=1;cert.log_conflicts=yes;"
wsrep_cluster_address   = gcomm://10.0.1.11,10.0.1.12,10.0.1.13
wsrep_node_address      = 10.0.1.11
wsrep_node_name         = pxc-msk1-crm-node01
wsrep_sst_donor         = pxc-msk1-crm-node03
wsrep_cluster_name      = autocrm-cluster
wsrep_sst_method        = xtrabackup-v2
wsrep_retry_autocommit  = 10
wsrep_applier_threads     = 16
wsrep_certify_nonPK     = 1
wsrep_applier_FK_checks = 0
wsrep_certification_rules = optimized
pxc_strict_mode         = ENFORCING
pxc-encrypt-cluster-traffic = OFF

# 16-02-2022: wsrep_ignore_apply_errors=7 & lock_wait_timeout=7200
wsrep_ignore_apply_errors = 7
lock_wait_timeout       = 7200
max_write_lock_count    = 10

# LOGGING #
skip-log-bin
general_log             = ON
general_log_file        = /var/log/mysql/general.log
wsrep_log_conflicts     = ON
log_error               = /var/log/mysql/error.log
slow_query_log          = 1
slow_query_log_file     = /var/log/mysql/slow.log
long_query_time         = 1

# CACHES AND LIMITS #
tmp_table_size          = 32M
max_heap_table_size     = 32M
max_connections         = 1000
open_files_limit        = 240010
table_definition_cache  = 262144
table_open_cache        = 4096
wait_timeout            = 14400
sort_buffer_size        = 2M
thread_cache_size       = 80
thread_pool_size        = 8
performance_schema_digests_size = 524288

# INNODB #
innodb_flush_method             = O_DIRECT
innodb_log_files_in_group       = 2
innodb_log_file_size            = 2G
innodb_flush_log_at_trx_commit  = 2
innodb_file_per_table           = 1
innodb_buffer_pool_size         = 32G
innodb_thread_concurrency       = 0
innodb_autoinc_lock_mode        = 2
innodb_buffer_pool_instances    = 32
innodb_read_io_threads          = 8
innodb_write_io_threads         = 8
innodb_io_capacity              = 200
innodb_open_files               = 262144
innodb_print_all_deadlocks      = 1

[sst]
progress = 1
compressor='zstd -2 -T8'
decompressor='zstd -d -T8'
backup_threads=8

[xtrabackup]
parallel=8

ProxySQL config:

mysql_variables=
{
        interfaces="0.0.0.0:3306"
        threads=16
        max_connections=2048
        default_query_delay=0
        default_query_timeout=36000000
        default_schema="information_schema"
        poll_timeout=2000
        stacksize=1048576
        have_compress=true
        server_version="8.0"
        connect_timeout_server=3000
        connect_timeout_server_max=15000
        connect_retries_on_failure=3
        connect_timeout_client=15000
        monitor_history=600000
        monitor_connect_interval=60000
        monitor_ping_interval=10000
        monitor_read_only_interval=1500
        monitor_read_only_timeout=500
        ping_interval_server_msec=120000
        ping_timeout_server=500
        commands_stats=true
        sessions_sort=true
}

mysql_query_rules:
(
        {
                rule_id=101
                active=1
                match_digest="^SELECT.*FOR UPDATE$"
                destination_hostgroup=200
                apply=1
        },
        {
                rule_id=102
                active=1
                match_digest="^SELECT"
                destination_hostgroup=400
                apply=1
        }
)

mysql_galera_hostgroups:
(
        {
                writer_hostgroup=200
                backup_writer_hostgroup=300
                reader_hostgroup=400
                offline_hostgroup=0
                active=1
                max_writers=1
                writer_is_also_reader=2
                max_transactions_behind=30
                comment="PXC CRM Group"
        }
)
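
To confirm which node ProxySQL currently treats as the writer, the runtime server table can be checked on the ProxySQL admin interface (these are standard ProxySQL tables; 6032 is the default admin port, adjust if yours differs):

-- Connect to the admin interface, e.g. mysql -h127.0.0.1 -P6032 -uadmin -p
SELECT hostgroup_id, hostname, port, status
FROM runtime_mysql_servers
ORDER BY hostgroup_id, hostname;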

One node acts as the writer and the other nodes serve reads.
During normal operation we see hanging metadata locks appear on the writer node.

mysql> SELECT * FROM performance_schema.metadata_locks where LOCK_DURATION='EXPLICIT';
+-------------+----------------+------------------+-------------+-----------------------+-----------+---------------+-------------+------------------------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA  | OBJECT_NAME      | COLUMN_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_DURATION | LOCK_STATUS | SOURCE                 | OWNER_THREAD_ID | OWNER_EVENT_ID |
+-------------+----------------+------------------+-------------+-----------------------+-----------+---------------+-------------+------------------------+-----------------+----------------+
| TABLE       | crm_auraavto   | sales_case_needs | NULL        |       140574325155248 | SHARED    | EXPLICIT      | GRANTED     | dictionary_impl.cc:438 |             154 |           3223 |
| TABLE       | crm_orbis-auto | contact_result   | NULL        |       140618572197200 | SHARED    | EXPLICIT      | GRANTED     | dictionary_impl.cc:438 |             441 |            122 |
| TABLE       | crm_forpost    | users            | NULL        |       140572023107136 | SHARED    | EXPLICIT      | GRANTED     | dictionary_impl.cc:438 |             195 |            276 |
+-------------+----------------+------------------+-------------+-----------------------+-----------+---------------+-------------+------------------------+-----------------+----------------+
3 rows in set (0.01 sec)
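
To identify the sessions that own these locks, OWNER_THREAD_ID can be joined against performance_schema.threads (a generic diagnostic query; for locks held by background/internal threads the PROCESSLIST_* columns come back NULL, which itself indicates the lock is not owned by a client connection):

SELECT ml.OBJECT_SCHEMA, ml.OBJECT_NAME, ml.LOCK_TYPE, ml.LOCK_STATUS,
       t.PROCESSLIST_ID, t.PROCESSLIST_USER, t.PROCESSLIST_COMMAND,
       t.PROCESSLIST_TIME, t.PROCESSLIST_INFO
FROM performance_schema.metadata_locks AS ml
JOIN performance_schema.threads AS t
  ON t.THREAD_ID = ml.OWNER_THREAD_ID
WHERE ml.LOCK_DURATION = 'EXPLICIT';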

We don't see these locks on the reader nodes.
If we switch the writer role to another node, the locks immediately appear there instead. We do not make any changes to the database structure.
If we try to alter the structure of a table that holds one of these locks, the node crashes with an MDL BF-BF conflict error.
Can anybody help explain why we end up with these hanging metadata locks?
