PXC 8.0 - nodes exit from the cluster after an MDL conflict

Hello all,

Can someone shed some light on the problem with the MDL conflict causing MySQL to shutdown node in the cluster?

Our enviroment is 3 PXC nodes with loadbalancing via ProxySQL:
node1 read-write
node2 read-only
node3 read-only

8.0.26-16.1 Percona XtraDB Cluster (GPL), Release rel16, Revision b141904, WSREP version 26.4.3

When a problem occurs:
When we execute mass ALTER tables (adding columns/keys/foreign keys) on read-write node1, other nodes may crash because of MDL conflicts.

node1 my.cnf

[mysqld]
datadir                 = /var/lib/mysql
tmpdir                  = /srv/mysql_tmpdir
internal_tmp_mem_storage_engine = MEMORY
user                    = mysql
default_storage_engine  = InnoDB
binlog_format           = ROW
skip-name-resolve	= 1
sql_mode                = ""
collation_server        = utf8_unicode_ci
character_set_server    = utf8

# WSREP #
wsrep_provider          = /usr/lib/galera4/libgalera_smm.so
wsrep_provider_options  = "gcache.size=1G;gcache.recover=yes;gcs.fc_limit=160;gcs.fc_factor=0.8;pc.weight=1;cert.log_conflicts=yes;"
wsrep_cluster_address   = gcomm://10.0.1.10,10.0.2.10,10.0.3.10
wsrep_node_address      = 10.0.1.10
wsrep_node_name         = pxc-node1
wsrep_sst_donor         = pxc-node3
wsrep_cluster_name      = pxc-cluster
wsrep_sst_method        = xtrabackup-v2
wsrep_retry_autocommit  = 10
wsrep_applier_threads     = 32
wsrep_certify_nonPK     = 1
wsrep_applier_FK_checks	= 0
wsrep_certification_rules = optimized
pxc_strict_mode         = PERMISSIVE
pxc-encrypt-cluster-traffic = OFF
wsrep_ignore_apply_errors = 7
lock_wait_timeout	= 7200

# LOGGING #
skip-log-bin
general_log             = ON
general_log_file        = /var/log/mysql/general.log
wsrep_log_conflicts     = ON
log_error               = /var/log/mysql/error.log
slow_query_log          = 1
slow_query_log_file     = /var/log/mysql/slow.log
long_query_time         = 1

# CACHES AND LIMITS #
tmp_table_size          = 32M
max_heap_table_size     = 32M
max_connections         = 1000
open_files_limit        = 240010
table_definition_cache  = 262144
table_open_cache        = 4096
wait_timeout            = 14400
sort_buffer_size        = 2M
thread_cache_size       = 80
thread_pool_size        = 16
performance_schema_digests_size = 1048576

# INNODB #
innodb_flush_method             = O_DIRECT
innodb_log_files_in_group       = 2
innodb_log_file_size            = 2G
innodb_flush_log_at_trx_commit  = 2
innodb_file_per_table           = 1
innodb_buffer_pool_size         = 64G
innodb_thread_concurrency       = 0
innodb_autoinc_lock_mode        = 2
innodb_buffer_pool_instances    = 64
innodb_read_io_threads          = 16
innodb_write_io_threads         = 16
innodb_io_capacity		= 200
innodb_open_files               = 262144
innodb_print_all_deadlocks      = 1

[sst]
progress = 1
compressor='zstd -2 -T6'
decompressor='zstd -d -T6'
backup_threads=6

[xtrabackup]
parallel=6

pxc-node2 mysql-error.log

2022-03-16T17:34:26.082428Z 196328 [Note] [MY-000000] [WSREP] MDL conflict db=xxx_dbname table=xxx_client ticket=10 solved by abort
2022-03-16T17:34:27.256059Z 199044 [Note] [MY-000000] [WSREP] MDL conflict db=xxx_dbname table=xxx_client ticket=10 solved by abort
2022-03-16T17:34:29.284658Z 14 [Note] [MY-000000] [WSREP] MDL BF-BF conflict

schema:  xxx_dbname
request: (thd-tid:14    seqno:119068119         exec-mode:toi, query-state:exec, conflict-state:committed)
          cmd-code:3 3  query:ALTER TABLE `xxx_common_client` ADD `phone_hash` varchar(255) UNIQUE)

granted: (thd-tid:18    seqno:119068125         exec-mode:high priority, query-state:exec, conflict-state:committing)
          cmd-code:3 167        query:(null))

2022-03-16T17:34:29.284706Z 14 [Note] [MY-000000] [WSREP] MDL ticket: type: shared write, space: TABLE, db: xxx_dbname, name: xxx_client
2022-03-16T17:34:29.284741Z 14 [ERROR] [MY-010119] [Server] Aborting
2022-03-16T17:34:29.284755Z 14 [Note] [MY-000000] [WSREP] Initiating SST cancellation
2022-03-16T17:34:31.290194Z 14 [Warning] [MY-000000] [Server] /usr/sbin/mysqld: Forcing close of thread 210453  user: 'xxx_dbname'
2022-03-16T17:34:31.290722Z 2 [Note] [MY-000000] [WSREP] rollbacker thread exiting 2
2022-03-16T17:34:31.290801Z 14 [Note] [MY-000000] [WSREP] Server status change synced -> disconnecting
2022-03-16T17:34:31.290829Z 14 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2022-03-16T17:34:31.290899Z 14 [Note] [MY-000000] [Galera] Closing send monitor...
2022-03-16T17:34:31.290921Z 14 [Note] [MY-000000] [Galera] Closed send monitor.
2022-03-16T17:34:31.290984Z 14 [Note] [MY-000000] [Galera] gcomm: terminating thread
2022-03-16T17:34:31.291028Z 14 [Note] [MY-000000] [Galera] gcomm: joining thread
2022-03-16T17:34:31.291822Z 14 [Note] [MY-000000] [Galera] gcomm: closing backend
2022-03-16T17:34:31.293414Z 14 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(NON_PRIM,a8594bf1-864a,11)
memb {
        d57e7da2-86e4,0
        }
joined {
        }
left {
        }
partitioned {
        a8594bf1-864a,0
        ea59cd1a-a3b1,0
        }
)
2022-03-16T17:34:31.293492Z 14 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0
2022-03-16T17:34:31.293519Z 14 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))
2022-03-16T17:34:31.293782Z 14 [Note] [MY-000000] [Galera] gcomm: closed
2022-03-16T17:34:31.293951Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2022-03-16T17:34:31.294090Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [128, 160]
2022-03-16T17:34:31.294113Z 0 [Note] [MY-000000] [Galera] Received NON-PRIMARY.
2022-03-16T17:34:31.294130Z 0 [Note] [MY-000000] [Galera] Shifting SYNCED -> OPEN (TO: 119068160)
2022-03-16T17:34:31.294163Z 0 [Note] [MY-000000] [Galera] New SELF-LEAVE.
2022-03-16T17:34:31.294204Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [0, 0]
2022-03-16T17:34:31.294226Z 0 [Note] [MY-000000] [Galera] Received SELF-LEAVE. Closing connection.
2022-03-16T17:34:31.294244Z 0 [Note] [MY-000000] [Galera] Shifting OPEN -> CLOSED (TO: 119068160)
2022-03-16T17:34:31.294265Z 0 [Note] [MY-000000] [Galera] RECV thread exiting 0: Success
2022-03-16T17:34:31.294731Z 14 [Note] [MY-000000] [Galera] recv_thread() joined.
2022-03-16T17:34:31.294804Z 14 [Note] [MY-000000] [Galera] Closing replication queue.
2022-03-16T17:34:31.294828Z 14 [Note] [MY-000000] [Galera] Closing slave action queue.

Maybe there is some magic wsrep option that will prevent a nodes from dropping out of a cluster during mass ALTER and MDL conflicts? Or are we wishing for a weird and contradictory ACID?
Or maybe we need some tuning wsrep flow control?

Thank you all in advance for your advice and tips.

2 Likes

Are you 100% certain that all nodes have the same schema before attempting these ALTERs? The default methodology for DDLs is to use TOI (wsrep_osu_method) which tells all nodes to execute the ALTER at the same time, so if all nodes have the same schema initially, then the ALTER will execute just fine on all nodes. But if there are differences before the ALTER, then it will probably fail.

Have you isolated a specific ALTER that causes the disconnect, or is it random ALTER? Have you tried doing a single ALTER instead of “mass ALTERs”?

3 Likes

Hi @oliko , thanks for posting to the Percona Forums, welcome!!

Have you used our data consistency checker called pt-table-checksum? With it you can check the integrity of the data across nodes in PXC. Have a look at our documentation:

2 Likes

Hello @Michael_Coburn and @matthewb !
Thank you for your prompt answers! :blush:
Let me explain a little bit of the specifics. We make changes to the table structure often and constantly, as the product is being actively developed. There are about 500 databases in our cluster and it often happens that we need to massively change the structure of the one table in each database. We analyzed the problems, we could not find a pattern. The problem occurs on different tables and databases. From one ALTER, as a rule, replication does not stop, falls on massive ALTER-s.
To complete the picture, over the past 2 months such crashes occurred 15 times with MDL, and each time we have to start MySQL on node and waiting for SST.
Thanks for the tip about pt-table-checksum, we will check it very soon, but we have to constantly re-synchronize databases, which exclude variant, that there is a difference in database structures between nodes. Also, we’ve never played around with the wsrep_osu_method option to be able to change the table structure on only 1 node. By the way, can “wsrep_osu_method = NBO” save us?
Also now found that some DDL (ALTERs) was executed past ProxySQL, so there was a situation when in fact on node1 were r/w requests from applications, and in node2 read requests + DDL. Perhaps this also contributed to the fact that we have constant MDL conflicts, right?

1 Like

@oliko,
PXC can only manage 1 ALTER at a time. If you are attempting to execute multiple DDLs that is not going to be supported/work.

You should never have differences in schemas between PXC nodes. This will absolutely cause crashes. PXC nodes should always have the same data, the same schema, at all times. This is the very nature of PXC/Galera and it is a violation of the fundamentals for there to be either data or schema differences.

1 Like

We want to clarify the situation. On the nodes of our cluster, MDL locks are constantly hanging on the “common_user” table. They appear immediately after launch and synchronization with other nodes. On different nodes, we see different locked tables in different databases. Any change to the schema of this table (for example, deleting a foreign key to this table) leads to a node crash.

Here is an example of a permanently hanging lock:

SELECT * FROM performance_schema.metadata_locks l left join `performance_schema`.`threads` t on l.OWNER_THREAD_ID = t.THREAD_ID where l.LOCK_DURATION='EXPLICIT'\G;

 OBJECT_TYPE: TABLE
        OBJECT_SCHEMA: dsf_geely
          OBJECT_NAME: common_user
          COLUMN_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140454290305904
            LOCK_TYPE: SHARED
        LOCK_DURATION: EXPLICIT
          LOCK_STATUS: GRANTED
               SOURCE: dictionary_impl.cc:438
      OWNER_THREAD_ID: 170
       OWNER_EVENT_ID: 113
            THREAD_ID: 170
                 NAME: thread/sql/THREAD_wsrep_applier
                 TYPE: FOREGROUND
       PROCESSLIST_ID: 27
     PROCESSLIST_USER: NULL
     PROCESSLIST_HOST: NULL
       PROCESSLIST_DB: NULL
  PROCESSLIST_COMMAND: Query
     PROCESSLIST_TIME: 2
    PROCESSLIST_STATE: NULL
     PROCESSLIST_INFO: NULL
     PARENT_THREAD_ID: 1
                 ROLE: NULL
         INSTRUMENTED: YES
              HISTORY: YES
      CONNECTION_TYPE: NULL
         THREAD_OS_ID: 1568767
       RESOURCE_GROUP: SYS_default

Please tell me what could be the reason for this behavior.

P.S. And of course we do only one ALTER at a time. (No parallel ALTERs)

1 Like

Nodes will typically crash in PXC on purpose if the data is different between nodes. This is a safety mechanism. Data should never be different between nodes. Metadata locks occur even on plain MySQL, but since this is also PXC, the locks can have bigger impact.

Have you tried using RSU or NBO options?

1 Like