PXC8.0 MDL conflict during operations CREATE/DROP USER or GRANT

shigaev.s · June 13, 2024, 12:29pm

Hello!

There is a 3-node cluster running this version:

Server version: 8.0.33-25.1 Percona XtraDB Cluster (GPL), Release rel25, Revision 0c56202, WSREP version 26.1.4.3

wsrep_OSU_method | TOI
wsrep_log_conflicts | ON

When executing CREATE/DROP USER or GRANT,
the server freezes and all nodes log this error:

[Note] [MY-000000] [WSREP] MDL conflict db= table= ticket=10 solved by abort

In this case the following helps:

stop nodes 2 and 3: systemctl stop mysql
start node 2: systemctl start mysql
after SST completes on node 2, start node 3: systemctl start mysql
after SST completes on node 3, the cluster is restored
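
A minimal sketch of how the wait between those steps could be checked, assuming the `mysql` client can log in on the joining node without extra options. `wsrep_local_state_comment` is the standard Galera status variable; the helper name `is_synced` is hypothetical and not part of PXC.

```shell
#!/bin/sh
# is_synced (hypothetical helper): reads the tab-separated output of
#   mysql -N -B -e "SHOW STATUS LIKE 'wsrep_local_state_comment'"
# on stdin and succeeds only if the node reports Synced.
is_synced() {
    awk '$1 == "wsrep_local_state_comment" && $2 == "Synced" { ok = 1 }
         END { exit !ok }'
}

# Usage on the node that was just started (commented out so the
# snippet has no side effects when sourced):
#   until mysql -N -B -e "SHOW STATUS LIKE 'wsrep_local_state_comment'" | is_synced; do
#       sleep 5
#   done
```

While the node is still receiving SST the client cannot connect, so the check fails and the loop keeps waiting; the next node would only be started once the loop exits.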

In some cases CREATE/DROP USER or GRANT
completes successfully, or completes only after a long delay, for example 1.5 minutes or more.

The commands are executed on a single node, and no similar commands are running elsewhere in the cluster, i.e. there is no parallel execution.

This behavior has not previously been observed with such commands.

What is the problem, and how can it be fixed?

matthewb · June 13, 2024, 5:05pm

If you switch to wsrep_OSU_method=NBO, do you get the same behavior? Are there any other table locks happening during the CREATE USER? During the stall, can you run SHOW ENGINE INNODB STATUS?
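
A minimal sketch of the checks asked about above, assuming a second session can still connect during the stall and that the Performance Schema metadata lock instrumentation is enabled (the default in 8.0). The function name `stall_diagnostics_sql` and the output file name are hypothetical.

```shell
#!/bin/sh
# stall_diagnostics_sql (hypothetical helper): prints read-only SQL to
# stdout; it does not connect to any server by itself.
stall_diagnostics_sql() {
    cat <<'SQL'
SHOW ENGINE INNODB STATUS;
SHOW FULL PROCESSLIST;
SELECT object_type, object_schema, object_name,
       lock_type, lock_status, owner_thread_id
  FROM performance_schema.metadata_locks;
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';
SQL
}

# Usage from a second session while the CREATE USER is stalled:
#   stall_diagnostics_sql | mysql --vertical > stall_report.txt
```

Rows in `metadata_locks` with `lock_status` of `PENDING` would point at the sessions waiting on a lock.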

shigaev.s · June 14, 2024, 7:44am

I haven’t tested it with wsrep_OSU_method=NBO.
Is it enough to set it at the session level and then execute the commands?

I’ll check it out and come back again.

Thanks for the answer!

shigaev.s · June 17, 2024, 7:15am

I can’t get the status, because everything freezes until two of the nodes are shut down.

matthewb · June 17, 2024, 7:33pm

Yes, SET SESSION wsrep_osu_method=NBO then run the CREATE USER.

Any other information in the mysql error log during the stalls?

shigaev.s · June 25, 2024, 3:06pm

mysql> SET SESSION wsrep_osu_method=NBO;
Query OK, 0 rows affected (0,00 sec)

mysql> DROP USER `petrov.s`@`%`;
ERROR 1235 (42000): This version of MySQL doesn't yet support 'this query in wsrep_OSU_method NBO'

log error.log:

2024-06-25 18:03:30	
2024-06-25T15:03:30.088700Z 1327819 [ERROR] [MY-000000] [WSREP] Fail to replicate: DROP USER `petrov.s`@`%`

The conflicts arose again on the delete.
Nothing else can be seen, because the server hangs.

Very strange behavior.

matthewb · June 25, 2024, 5:56pm

Can you put together a repeatable test case? I’d like to try it. If I can repeat it as well, then we can open a bug report.

shigaev.s · June 26, 2024, 8:20am

The test case is as follows.
There is a user table.
Run any one of the following:

CREATE USER 'test_user'@'%' IDENTIFIED BY 'password';
GRANT SELECT ON `db_test`.* TO 'test_user'@'%';
DROP USER `petrov.s`@`%`;

It doesn’t matter what user, privileges or database.
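
A minimal sketch of how these statements could be turned into a repeatable loop for a test cluster. The user name prefix `mdl_repro_`, the password and the function name `repro_sql` are placeholders, not anything from the affected cluster; the password may need changing if a password validation policy is active.

```shell
#!/bin/sh
# repro_sql N (hypothetical helper): prints N rounds of
# CREATE USER / GRANT / DROP USER. It only generates SQL text.
repro_sql() {
    rounds=${1:-10}
    i=1
    while [ "$i" -le "$rounds" ]; do
        user="mdl_repro_$i"   # placeholder account name
        printf "CREATE USER '%s'@'%%' IDENTIFIED BY 'password';\n" "$user"
        printf "GRANT SELECT ON \`db_test\`.* TO '%s'@'%%';\n" "$user"
        printf "DROP USER '%s'@'%%';\n" "$user"
        i=$((i + 1))
    done
}

# Usage against one node of a test cluster only:
#   repro_sql 50 | mysql -v
```

Running it through `mysql -v` echoes each statement, so the last one printed before a hang shows which statement stalled.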

When such statements are executed, all nodes in the cluster log errors and freeze.
What helps is stopping all nodes, bootstrapping, and then joining the nodes one at a time.

Not long ago we started using roles: https://dev.mysql.com/doc/refman/8.0/en/roles.html
Could this be related?

I’ll try to reproduce this on a test cluster of the same version.

Modestas_Mockus · June 26, 2024, 8:53am

I have seen some hangs when I run all the queries in one command with FLUSH PRIVILEGES at the end. There are no issues if I run the commands one by one, with a few seconds between them and before FLUSH PRIVILEGES.

shigaev.s · June 26, 2024, 8:57am

Yes, only one command is executed. I listed several because any one of them leads to the problem.
This behavior did not exist before.
When loading the output of the pt-show-grants utility, the result is identical.
What else can I check?
Can using roles have this effect?

shigaev.s · June 26, 2024, 1:07pm

Could this be related to the bug PXC-4315 (Percona JIRA)?
I am studying the updates and planning to upgrade. After that, I will check the commands again.

matthewb · June 26, 2024, 5:38pm

Hello @shigaev.s,
I created a 3-node PXC 8.0.35 cluster and was unable to repeat this. I ran the CREATE/DROP USER statements as you provided and did not experience any cluster stalls at all. Can you please upgrade to the latest PXC and see if your issue remains?

shigaev.s · June 27, 2024, 7:57am

Yesterday we upgraded to version 8.0.36-28 (04/03/2024).
Today we tested several commands (REVOKE, GRANT, DROP USER): no more MDL conflicts or cluster freezes.
It seems that the bug fix in PXC-4315 (Percona JIRA) helped.

We’ll leave it for testing for now.
I’ll come back later and if it doesn’t happen again, I’ll mark the ticket as resolved.

shigaev.s · July 10, 2024, 7:58am

The problem has recurred.
On a DROP USER command the entire cluster froze: no logs, no metrics.
Moreover, the user was eventually deleted.
I checked this after restoring the cluster.
That is unfortunate. The cause is unknown, and I don’t know how to debug it.

Any ideas?

matthewb · July 10, 2024, 2:38pm

Hello @shigaev.s,
https://perconadev.atlassian.net/browse/PXC-4385
This fix will be in 8.0.37

shigaev.s · July 10, 2024, 2:42pm

Hello @matthewb ,
will wait.

matthewb · July 10, 2024, 3:58pm

If you can create a core dump while the server is hung (kill -11), that would help us.
Then create a Jira ticket and upload the compressed core dump file.

Kamil_Holubicki · July 10, 2024, 4:54pm

THIS blogpost can be helpful.
When the server is stuck, just kill it with kill -11. This causes a core file to be created.
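
A minimal sketch of that step, assuming a Linux host where the server process is named `mysqld`, `pidof` is available, and core dumps are already enabled (for example the `core-file` option under `[mysqld]` and a non-zero core size limit for the service). The helper name `segv_mysqld` and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# segv_mysqld (hypothetical helper): sends SIGSEGV (signal 11) to the
# given PID so the process aborts and, if core dumps are enabled,
# writes a core file. With DRY_RUN=1 it only prints the command.
segv_mysqld() {
    pid=$1
    if [ -z "$pid" ]; then
        echo "usage: segv_mysqld PID" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "kill -11 $pid"
    else
        kill -11 "$pid"
    fi
}

# Usage on the hung node (commented out; this terminates mysqld):
#   cat /proc/sys/kernel/core_pattern   # shows where the kernel writes cores
#   segv_mysqld "$(pidof mysqld)"
```

Running it once with `DRY_RUN=1` first confirms that the PID lookup returns the expected process before the server is actually killed.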

shigaev.s · July 11, 2024, 7:16am

Hello,
I’ll try this if it fails.

shigaev.s · July 23, 2024, 2:33pm

Hello, @matthewb
when is version 8.0.37 expected?
The other day there was this error:

[MY-000000] [WSREP] MDL conflict db= table= ticket=10 solved by abort

arose in a production environment, but this time during SST, after one of the nodes went down with OOM.
This didn’t happen before. The cluster version there is still 8.0.33.
Stopping the frozen node and bootstrapping helped here as well.
