Percona XtraDB and Magento cron job failures

Hi,

I am trying to get Magento to work reliably with a newly installed three-node Percona XtraDB Cluster. Specifics are:

Magento 2.4.5
percona-xtradb-cluster 1:8.0.29-21-1.focal
Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal

The configuration I have set up does currently work and Magento mostly runs fine. However, I am seeing a lot of errors reported by Magento's cron functionality, and the cron tasks are not running properly.

In the Percona XtraDB logs I see various indications of some problems:

2022-09-30T13:26:15.079423Z 95974 [Warning] [MY-000000] [WSREP] Percona-XtraDB-Cluster doesn’t recommend use of GET_LOCK with pxc_strict_mode = PERMISSIVE
2022-09-30T13:26:15.079468Z 95974 [Note] [MY-000000] [WSREP] MDL conflict db= table=my-prefix|cron_703fd9d88e47d6ee07c6d281af86adb5 ticket=10 solved by abort
2022-09-30T13:26:15.080297Z 95974 [Warning] [MY-000000] [WSREP] Percona-XtraDB-Cluster doesn’t recommend use of RELEASE_LOCK with pxc_strict_mode = PERMISSIVE

Now, the GET_LOCK and RELEASE_LOCK warnings I am familiar with, and I have always had to put the cluster into PERMISSIVE mode (SET GLOBAL pxc_strict_mode=PERMISSIVE; on each node). I realise this is not ideal, but I have never found a better solution.
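
For reference, this is roughly what I run on each node (persisting it in my.cnf is my assumption for surviving restarts; SET GLOBAL on its own is lost when the node restarts):

-- Run on every node in the cluster (runtime change only; I assume
-- pxc_strict_mode = PERMISSIVE also needs to go under [mysqld] in my.cnf to persist):
SET GLOBAL pxc_strict_mode = PERMISSIVE;

-- Verify the current value:
SHOW GLOBAL VARIABLES LIKE 'pxc_strict_mode';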

What is more concerning, though, is this:

2022-09-30T13:26:15.079468Z 95974 [Note] [MY-000000] [WSREP] MDL conflict db= table=my-prefix|cron_703fd9d88e47d6ee07c6d281af86adb5 ticket=10 solved by abort

Note: I am suspicious that this may be the cause of the errors reported in the Magento cron results.

I have tried setting the wsrep_debug variable (SET GLOBAL wsrep_debug=SERVER;) on the main node that Magento talks to. (Although I normally run a load-balanced configuration with HAProxy, I have configured it so that Magento always talks to one specific node, with the other nodes treated as backups in case the primary goes down.) This does generate a lot of extra information, but I am not sure what it is telling me!

I get quite a few log entries like this:
2022-09-30T13:32:30.177531Z 92714 [Note] [MY-000000] [WSREP] wsrep_commit_empty(92714)
2022-09-30T13:32:30.178397Z 93824 [Note] [MY-000000] [WSREP] wsrep_commit_empty(93824)

and also:
2022-09-30T13:32:14.519241Z 96336 [Note] [MY-000000] [WSREP] Thread holds MDL locks at TOI begin: ANALYZE TABLE xtento_xtcore_config_data 96336
2022-09-30T13:32:14.519299Z 96336 [Note] [MY-000000] [WSREP] Executing Query (ANALYZE TABLE xtento_xtcore_config_data) with write-set (-1) and exec_mode: local in TO Isolation mode
2022-09-30T13:32:14.519353Z 96336 [Note] [MY-000000] [WSREP] wsrep: initiating TOI for write set (-1)
2022-09-30T13:32:14.519393Z 93824 [Note] [MY-000000] [WSREP] wsrep_commit_empty(93824)
2022-09-30T13:32:14.519761Z 92714 [Note] [MY-000000] [WSREP] wsrep_commit_empty(92714)
2022-09-30T13:32:14.521223Z 93824 [Note] [MY-000000] [WSREP] wsrep_commit_empty(93824)
2022-09-30T13:32:14.521250Z 92714 [Note] [MY-000000] [WSREP] wsrep_commit_empty(92714)
2022-09-30T13:32:14.522364Z 96336 [Note] [MY-000000] [WSREP] Query (ANALYZE TABLE xtento_xtcore_config_data) with write-set (91523284) and exec_mode: toi replicated in TO Isolation mode
2022-09-30T13:32:14.522381Z 92714 [Note] [MY-000000] [WSREP] wsrep_commit_empty(92714)
2022-09-30T13:32:14.522456Z 96336 [Note] [MY-000000] [WSREP] wsrep: TO isolation initiated for write set (91523284)
2022-09-30T13:32:14.522921Z 93824 [Note] [MY-000000] [WSREP] wsrep_commit_empty(93824)
2022-09-30T13:32:14.523904Z 92714 [Note] [MY-000000] [WSREP] wsrep_commit_empty(92714)
2022-09-30T13:32:14.525028Z 93824 [Note] [MY-000000] [WSREP] wsrep_commit_empty(93824)
2022-09-30T13:32:14.525400Z 92714 [Note] [MY-000000] [WSREP] wsrep_commit_empty(92714)
2022-09-30T13:32:14.526672Z 93824 [Note] [MY-000000] [WSREP] wsrep_commit_empty(93824)
2022-09-30T13:32:14.526891Z 92714 [Note] [MY-000000] [WSREP] wsrep_commit_empty(92714)
2022-09-30T13:32:14.528034Z 93824 [Note] [MY-000000] [WSREP] wsrep_commit_empty(93824)
2022-09-30T13:32:14.528181Z 92714 [Note] [MY-000000] [WSREP] wsrep_commit_empty(92714)
2022-09-30T13:32:14.529697Z 93824 [Note] [MY-000000] [WSREP] wsrep_commit_empty(93824)
2022-09-30T13:32:14.529912Z 92714 [Note] [MY-000000] [WSREP] wsrep_commit_empty(92714)
2022-09-30T13:32:14.531318Z 93824 [Note] [MY-000000] [WSREP] wsrep_commit_empty(93824)
2022-09-30T13:32:14.531378Z 96336 [Note] [MY-000000] [WSREP] TO END: 91523284: ANALYZE TABLE xtento_xtcore_config_data
2022-09-30T13:32:14.531403Z 96336 [Note] [MY-000000] [WSREP] wsrep: completed TOI write set (91523284)
2022-09-30T13:32:14.531418Z 96336 [Note] [MY-000000] [WSREP] Setting WSREPXid (InnoDB): 67833533-30bc-11ec-8c11-7e8198aaf81e:91523284
2022-09-30T13:32:14.531433Z 96336 [Note] [MY-000000] [WSREP] Updating WSREPXid: 67833533-30bc-11ec-8c11-7e8198aaf81e:91523284
2022-09-30T13:32:14.532072Z 92714 [Note] [MY-000000] [WSREP] wsrep_commit_empty(92714)

But I am unsure if this is helpful in diagnosing the issues at the moment.

If anyone could help with how to proceed with debugging, or has any ideas on how to fix these kinds of issues, it would be most appreciated.

Many thx!

Reading a little more at:

suggests I may be able to use:
SET GLOBAL pxc_strict_mode=MASTER;
on my main node, as long as the Magento application writes only to that node.

When I set that, I no longer see the GET_LOCK and RELEASE_LOCK warnings, and the docs suggest that table locks are not validated by my main node when it is set to MASTER.

In this scenario, I am getting many of these errors reported in the log:
2022-09-30T14:32:03.849728Z 100308 [Note] [MY-000000] [WSREP] MDL conflict db= table=cron_8b73d49c9b3097b2c19067ed72dff619 ticket=10 solved by abort
2022-09-30T14:32:03.849876Z 100291 [Note] [MY-000000] [WSREP] MDL conflict db= table=cron_5d6e389185b7a7ff8ada1b70e5d5beec ticket=10 solved by abort
2022-09-30T14:32:03.851049Z 100308 [Note] [MY-000000] [WSREP] MDL conflict db= table=cron_8b73d49c9b3097b2c19067ed72dff619 ticket=10 solved by abort
2022-09-30T14:32:03.852351Z 100308 [Note] [MY-000000] [WSREP] MDL conflict db= table=cron_8b73d49c9b3097b2c19067ed72dff619 ticket=10 solved by abort
2022-09-30T14:32:03.853605Z 100308 [Note] [MY-000000] [WSREP] MDL conflict db= table=cron_8b73d49c9b3097b2c19067ed72dff619 ticket=10 solved by abort

I suspect these are the cause of the Magento cron jobs failing. I realise that aborting the transaction is how the cluster ensures consistency across the nodes, but how can I fix this so that the Magento cron works correctly?

Any help gratefully received,
Many thx!

Why are you running so many ANALYZE TABLE statements? That isn't something to be run on a regular basis. And as you noted, you shouldn't be using GET_LOCK with PXC.

Hi matthewb,

Many thx for replying to this!

This is the Magento (PHP) code that is generating these queries. I am not in control of that application, so I cannot tell you why or when Magento does this. I also cannot prevent Magento from issuing those GET_LOCK queries - that is embedded in the application code.

I guess the real question for me is what I should do about this to be able to use the XtraDB cluster reliably with Magento (as I have for approximately 5 years before this latest upgrade!).

If you have any ideas, they’d be most welcome!

Magento is not designed to operate against a cluster like PXC or even Group Replication. I would ensure that all Magento connections to your database are to a single instance.

You could switch out HAProxy for ProxySQL, which would allow you to block specific queries and simply return an OK to the client.
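
As a rough sketch (the rule_id and the digest pattern are placeholders; I'm using ANALYZE TABLE as the example since that's what shows up in your logs), a ProxySQL query rule with OK_msg set will answer the client with an OK packet and never forward the statement to PXC:

-- On the ProxySQL admin interface (port 6032 by default):
-- intercept ANALYZE TABLE statements and reply with OK instead of forwarding them.
INSERT INTO mysql_query_rules (rule_id, active, match_digest, OK_msg, apply)
VALUES (100, 1, '^ANALYZE TABLE', 'blocked by ProxySQL', 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;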

You could also try setting wsrep_OSU_method=NBO, which may reduce the metadata lock conflicts.
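
Something along these lines, although NBO only applies to a subset of DDL, so treat this as an experiment rather than a guaranteed fix:

-- Switch the online-schema-upgrade method from the default TOI to NBO
-- (non-blocking operations); revert with SET GLOBAL wsrep_OSU_method = 'TOI';
SET GLOBAL wsrep_OSU_method = 'NBO';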

I would also reach out to Magento support (since you pay for a license) and ask them for their recommended settings for PXC.

Thx matthewb for your reply.

So are you suggesting that Magento can only run reliably against a single-node MySQL DB? I have had it running against a Percona XtraDB Cluster for the last 5 years and have not had the cron issues I am reporting here.

I will try the setting you suggest in case it helps, so thanks for that suggestion.

Did you see my other post at:

where I ask about wsrep_convert_lock_to_trx - it would be really useful to get your view on that.

Thanks for your responses; any ideas or insight into this issue are of course very welcome!

I’m saying that the designers of Magento have not written the code in such a way that it works properly against PXC, most notably the GET_LOCK usage and the constant ANALYZE TABLE statements. Neither of those is a best practice against a cluster.

Were you not originally on 5.7? MySQL 8 changed some things, as did PXC 8 with the introduction of Galera 4. This again goes to my point above that the maintainers of Magento are not coding with clusters in mind, nor does it appear that they are paying attention to database changes that may impact their software (going by what you are describing).

If you can isolate all traffic from Magento to a single server, you will probably have a better time.

@matthewb Thx again for the useful reply on this thread.

I totally hear what you are saying here - Magento hasn’t been coded in a way that is optimized for clustered DB deployments; however, I am sure there are people out there who have solved this in some way.

Yes, you are correct - I had a 5.7 Percona XtraDB Cluster that had been working more or less fine, and these latest issues have occurred as a consequence of trying to get a modern 8.0 setup in place.

I can of course set things up with a standalone Percona MySQL instance and I am sure everything will work fine, albeit with the limited scaling (i.e. only vertical scaling of the DB node) that this would impose.

My last question for you is this: I can easily ensure that the Magento software ONLY talks to one node of the XtraDB cluster - would there be any way to get a configuration that works in this case? I have tried, but I still get the same errors (because of course it is still an XtraDB cluster, even though I am talking to only one node).

It sounds like my best option is going to be a single-node Percona instance, unless you have any other ideas to try.

Thx again for your replies to my questions, very helpful!

I would suggest looking at ProxySQL instead of HAProxy. ProxySQL is a layer-7 MySQL proxy that “speaks” MySQL protocol and can filter/route based on the actual query itself. You can also use it to firewall/block those ‘ANALYZE TABLE’ commands.

You won’t be able to get around the GET_LOCK() as that appears to be core to Magento’s functionality, but ProxySQL does detect its use and routes correctly. You’ll probably have to stay in PERMISSIVE mode and ignore the constant warnings, to be honest (unless there’s some option in Magento to not use database locks).

I don’t see any errors in your logs above. What I see are ‘Note’-level entries (informational messages, not error conditions). Make sure any tables that Magento creates use the InnoDB engine (that should be the system-wide default, but just make sure Magento isn’t overriding it).
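
A quick way to check (just a sketch; substitute your actual Magento schema name for 'magento'):

-- List any tables in the Magento schema that are not using InnoDB:
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = 'magento'   -- hypothetical schema name, use yours
  AND engine <> 'InnoDB';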

@matthewb thanks for the reply, I will have a look at using ProxySQL as an option, as it sounds like that definitely gives some more room for configuring the routing.
Yes, I realise that these are “Note”-level log items, but the cluster is coping with the situation by aborting the operation - fine for the cluster, but not fine for Magento, as the work it requested never gets done!
And yes, I have checked the engine on the Magento tables and they are definitely all InnoDB, so that at least is OK.
I am having to revert to a single Percona node for now, but I haven’t given up on getting Magento working with the 8.0 XtraDB cluster quite yet; I will post any solutions here if I can get things to a reliable state, in the hope that it will help others trying to get this configuration working.

Thx for your replies to my questions.

Can you find out specifically what isn’t working? Can you see from the Magento logs what queries are failing?
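
If it helps, Magento records the outcome of each cron run in its cron_schedule table, so something like this (assuming the default table name; yours may carry your configured table prefix) should show which jobs are failing and with what message:

-- Recent cron jobs that did not succeed:
SELECT job_code, status, messages, scheduled_at, executed_at
FROM cron_schedule
WHERE status IN ('error', 'missed')
ORDER BY scheduled_at DESC
LIMIT 20;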
