PXC8.0 MDL conflict during operations CREATE/DROP USER or GRANT

Hi,
https://perconadev.atlassian.net/browse/PXC-4315
Fixed in 8.0.35
I’m afraid the OOM error was not related to this message.

If you have steps to reproduce OOM, please create a Jira ticket.

Hello,
@Kamil_Holubicki

Error:
[MY-000000] [WSREP] MDL conflict db= table= ticket=10 solved by abort

still exists in version 8.0.36 and appeared in 8.0.33, but in the messages above there was an answer that a fix is ​​expected in 8.0.37.
The OOM is not related to the ticket, but the same error appeared during SST, which is strange, this did not happen before in 8.0.33.
The log was filled with several thousand entries.
On cluster 8.0.33, the only thing that was added was the use of roles and privileges.

I recently picked up a new cluster with version 8.0.36 and restored a full backup on it using xtrabackup.
Then I checked the CREATE/DROP USER or GRANT commands, no errors or freezes. But the cluster was not under traffic, we will check this. If the new cluster does not have such problems with MDL conflict and freezing, then we will switch to it.

So I’ll come back later with the result.

Hello,

Today we switched to a new cluster with a full xtrabackup backup restore, switched traffic to ProxySQL.
Then I executed the GRANT command for one of the users, it went through instantly, no errors or freezes occurred.
For now we are leaving it for testing and will also check for other commands.

Hello,
Alas, another command:

GRANT SELECT, INSERT ON `db_orders`.purchased_product TO `user_name`@`%`;

caused problems:

  1. execution was slow
Query OK, 0 rows affected (3 min 48.00 sec)
  1. there were Aborted Connections and MDL conflict errors on the cluster:
2024-08-16 10:43:05
2024-08-16T07:43:05.407201Z 632643 [Note] [MY-000000] [WSREP] cluster conflict due to high priority abort for threads:
2024-08-16 10:43:05
2024-08-16T07:43:05.407192Z 632643 [Note] [MY-000000] [WSREP] --------- CONFLICT DETECTED -------- 2024-08-16 10:43:05 2024-08-16T07:43:05.407174Z 632643 [Note] [MY-000000] [WSREP] MDL conflict db= table= ticket=1 solved by abort 2024-08-16 10:43:05 2024-08-1 6 10:43:05 SQL: START TRANSACTION 2024-08-16 10:43:05 THD: 634148, mode: local, state: exec, conflict: aborted, seqno: -1 2024-08-16 10:43:05
2024-08-16T07:43:05.407058Z 632643 [Note] [MY-000000] [WSREP] Victim thread: 2024-08-16 10:43:05 2024-08-16 10:43:05 SQL: FLUSH PRIVILEGES 2024-08-16 1 0:43:05 THD: 632643, mode: toi, state: exec, conflict: aborted, seqno: 43603881 2024-08-16 10:43:05 2024-08-16T07:43:05.406954Z 632643 [Note] [MY-000000] [WSREP] Winning thread:
2024-08-16 10:43:05
2024-08-16 10:43:05
2024-08-16T07:43:05.406939Z 632643 [Note] [MY-000000] [WSREP] cluster conflict due to high priority abort for threads:
2024-08-16 10:43:05
2024-08-16T07:43:05.406848Z 632643 [Note] [MY-000000] [WSREP] --------- CONFLICT DETECTED -------
  1. one node is stuck, but the cluster is still up
  2. MySQL log was filled with errors:
[MY-010926] [Server] Access denied for user 'usr_nomenclature'@'10.9.2.146' (using password: NO)
... 
[Server] Aborted connection 632226 to db: 'db_stocks' user: 'usr_stocks' host: '10.9.2.146' (Got an error reading communication packets).

Despite the users having privileges, everyone lost access.
The problem was solved by restarting the frozen node.

There is a suspicion that it may be related to the use of roles and privileges for them.
I will test the hypothesis - I will delete all created roles and assign privileges to each user as it was before.
This is the only thing that changed in the cluster and raises suspicion.

Hi @shigaev.s ,
The issue should be solved in upcoming 8.0.37 (PXC-4385)

1 Like

@Kamil_Holubicki Hi!
When is the deadline?

So, I managed to do it.
What was done:

  1. disabled ProxySQL (2 nodes in K8S) by setting the scale to 0 for the duration of the work to eliminate traffic and any load
  2. manually deleted roles with the DROP ROLE command, and the very first command:
mysql> DROP ROLE `db_bookwatch_full`;
Query OK, 0 rows affected (6 min 19.98 sec)

took an insanely long time to execute, and in this group there was only one user with access to 1 DB with FULL privileges
At this time, all cluster nodes froze, but the cluster itself was alive. After executing the command, the cluster became operational and all subsequent commands were executed instantly and without affecting the operation of the cluster nodes
Most likely it has nothing to do with granting privileges through roles or to the user.
3) launched Ansible role with a task for users: create/delete user and privileges. I repeated it 5 times.
During this time there were several:

[MY-000000] [WSREP] MDL conflict db= table= ticket=1 solved by abort

from the log:

2024-08-26 17:34:16	
2024-08-26T14:34:16.707145Z 17 [Note] [MY-000000] [WSREP] MDL conflict db= table= ticket=1 solved by abort
2024-08-26 17:34:16	
2024-08-26 17:34:16	
	  LIMIT 1
2024-08-26 17:34:16	
	    AND column_name IN ('status', 'enabled')
2024-08-26 17:34:16	
	    AND table_name = 'INNODB_METRICS'
2024-08-26 17:34:16	
	  WHERE table_schema = 'information_schema'
2024-08-26 17:34:16	
	  FROM information_schema.columns
2024-08-26 17:34:16	
	    column_name
2024-08-26 17:34:16	
   SQL: SELECT
2024-08-26 17:34:16	
   THD: 973244, mode: local, state: exec, conflict: executing, seqno: -1
2024-08-26 17:34:16	
2024-08-26T14:34:16.707126Z 17 [Note] [MY-000000] [WSREP] Victim thread: 
2024-08-26 17:34:16	
2024-08-26 17:34:16	
   SQL: FLUSH PRIVILEGES
2024-08-26 17:34:16	
   THD: 17, mode: toi, state: exec, conflict: committed, seqno: 116670811
2024-08-26 17:34:16	
2024-08-26T14:34:16.707117Z 17 [Note] [MY-000000] [WSREP] Winning thread: 
2024-08-26 17:34:16	
2024-08-26 17:34:16	
2024-08-26T14:34:16.707105Z 17 [Note] [MY-000000] [WSREP] cluster conflict due to high priority abort for threads:
2024-08-26 17:34:16	
2024-08-26T14:34:16.707039Z 17 [Note] [MY-000000] [WSREP] --------- CONFLICT DETECTED --------
2024-08-26 17:34:15	
2024-08-26T14:34:15.572126Z 16 [Note] [MY-000000] [WSREP] MDL conflict db= table= ticket=1 solved by abort
2024-08-26 17:34:08	
2024-08-26T14:34:08.425718Z 19 [Note] [MY-000000] [WSREP] MDL conflict db= table= ticket=1 solved by abort
2024-08-26 17:34:05	
2024-08-26T14:34:05.348976Z 21 [Note] [MY-000000] [WSREP] MDL conflict db= table= ticket=1 solved by abort
2024-08-26 17:34:03	
2024-08-26T14:34:03.280406Z 13 [Note] [MY-000000] [WSREP] MDL conflict db= table= ticket=1 solved by abort
2024-08-26 17:31:37	
2024-08-26T14:31:37.753089Z 19 [Note] [MY-000000] [WSREP] MDL conflict db= table= ticket=1 solved by abort
2024-08-26 17:31:36	
2024-08-26T14:31:36.710479Z 20 [Note] [MY-000000] [WSREP] MDL conflict db= table= ticket=1 solved by abort
2024-08-26 17:31:36	
2024-08-26 17:31:36	
	  LIMIT 1
2024-08-26 17:31:36	
	    AND column_name IN ('status', 'enabled')
2024-08-26 17:31:36	
	    AND table_name = 'INNODB_METRICS'
2024-08-26 17:31:36	
	  WHERE table_schema = 'information_schema'
2024-08-26 17:31:36	
	  FROM information_schema.columns
2024-08-26 17:31:36	
	    column_name
2024-08-26 17:31:36	
   SQL: SELECT
2024-08-26 17:31:36	
   THD: 973072, mode: local, state: exec, conflict: executing, seqno: -1
2024-08-26 17:31:36	
2024-08-26T14:31:36.710447Z 20 [Note] [MY-000000] [WSREP] Victim thread: 
2024-08-26 17:31:36	
2024-08-26 17:31:36	
   SQL: FLUSH PRIVILEGES
2024-08-26 17:31:36	
   THD: 20, mode: toi, state: exec, conflict: committed, seqno: 116670475
2024-08-26 17:31:36	
2024-08-26T14:31:36.710428Z 20 [Note] [MY-000000] [WSREP] Winning thread: 
2024-08-26 17:31:36	
2024-08-26 17:31:36	
2024-08-26T14:31:36.710417Z 20 [Note] [MY-000000] [WSREP] cluster conflict due to high priority abort for threads:
2024-08-26 17:31:36	
2024-08-26T14:31:36.710407Z 20 [Note] [MY-000000] [WSREP] --------- CONFLICT DETECTED --------
2024-08-26 17:31:36	
2024-08-26T14:31:36.710343Z 20 [Note] [MY-000000] [WSREP] MDL conflict db= table= ticket=1 solved by abort
  1. returned ProxySQL to work
  2. checked metrics and logs - no errors

It is not clear from this why one command out of about 45 took so long to execute? All the others were executed quickly and without errors or freezes.
Also, executing Ansible role with task for users went without problems.

I leave it for testing until the next CREATE/DROP, GRANT/REVOKE commands.

We are also waiting for a fix and its release date.

Hello,

I ran several REVOKE and GRANT commands.
They were executed quickly and without errors/freezing. No MDL conflicts.

This time there were no roles, all privileges were provided at the user level.
And it looks like there are problems with using roles in a cluster when nodes are under traffic. Or such an unfixed bug that it does not always appear.

Is there a fix date?