What causes this!? Percona server 8.0.31 unresponsive every day same time

So… we've been hitting a weird case for the past week.

Our Percona Server 8.0.31 on Ubuntu 22.04 (128G RAM) becomes unresponsive every day at the same time.

The symptoms are a bit weird too. Our usual connection count is about 30-40 connections. Then at 12:30 this starts to increase, or rather the reported number starts to increase. For the brief time we can monitor this, we can tell there are not any more active connections than usual.

Every day, at 12:30 exactly, ‘show processlist’ becomes unresponsive. Queries run for a few minutes more, but at 12:34 there’s nothing left to do but a hard kill. You can still connect to mysql at 12:30, but nothing works. (It just stalls.)

You’d say: easy - check the logs, crontabs, events, incoming connections, for anything that starts at 12:30.

We did that:

  • There are no long running queries
  • There are no events running at 12:30
  • There are no cronjobs running at 12:30
  • There are no external connections (other than our usual web application) that execute at that time.

The logs show absolutely nothing. The MySQL log shows nothing, syslog shows nothing.

What strikes me as weird is that ‘show processlist’ just stalls and never returns anything. Server does not crash or report anything.

This is a new server that has been migrated from another server that ran 8.0.30 just fine. It is an Ubuntu 22.04 server with 128G RAM, and pretty much nothing else runs on that machine. MySQL is limited to 70G, but tops out at 40G anyway.

Does anyone have any ideas? Our (unfortunate) next step would be to downgrade, but that requires a lot of work and has no guarantee of working.

Hi @crewone, based on your description it seems like you are facing some sort of deadlock.

Are you able to collect a coredump when your server is stalled, either by configuring the Linux built-in coredump or by using libcoredumper (see "Say Hello to Libcoredumper - A New Way to Generate Core Dumps, and Other Improvements" on the Percona Database Performance Blog)?

This will help to validate if we have a chain of threads blocking each other.

To be clear: you suggest killing mysql with a SIGSEGV signal instead of a SIGKILL (when it becomes unresponsive) and checking the coredump?

I will try this! Thanks for the suggestion, to be continued.

Correct. That, or run gcore $(pidof mysqld) to collect a coredump.
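
For anyone following along, here is a minimal sketch of the gcore route that writes the core to a known location. It is not from the thread: OUT_DIR, the mysqld-core prefix, and the function names are hypothetical, and it assumes gdb is installed (it provides gcore), that the script runs as root, and that the target directory has enough free space for a core that can run to tens of gigabytes.

```shell
#!/bin/sh
# Sketch: write a core of the running mysqld to a known location with gcore.
# Assumptions (not from the thread): gdb is installed (it provides gcore),
# the script runs as root, and OUT_DIR has enough free space for the core.

OUT_DIR="${OUT_DIR:-/var/tmp}"   # hypothetical output directory

# "gcore -o PREFIX PID" writes the core to "PREFIX.PID"; build that name here
# so later steps can find the file.
core_file_for() {
    printf '%s/mysqld-core.%s\n' "$OUT_DIR" "$1"
}

collect_core() {
    # pidof -s returns a single PID even if several mysqld processes exist.
    pid="$(pidof -s mysqld)" || { echo "mysqld is not running" >&2; return 1; }
    gcore -o "$OUT_DIR/mysqld-core" "$pid" || return 1
    # Print the path of the core that was just written.
    core_file_for "$pid"
}

# Only act when explicitly asked, so sourcing the file has no side effects.
if [ "${1:-}" = "collect" ]; then
    collect_core
fi
```

Saved under a name of your choice and run with the argument collect while the server is stalled, it should print the path of the core file. Unlike a SIGSEGV, gcore leaves the process running, so you still need to restart mysqld afterwards.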

We have identified what causes this crash.
We removed the offending line on Friday, and the service has not crashed since.
We re-introduced the line today at a different time, and sure enough, mysql became completely unresponsive at exactly that time.

The code itself is too absurd, too simple to cause a complete mysql failure, yet it does.

This is the code, which ran in a non-interactive shell:

echo -n '' | nc {$host} 3306 | cut -d~ -f1

We have a coredump, but it is 70G.
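
A core that size is awkward to share, and the per-thread backtraces alone may be enough to check for a chain of threads blocking each other. The sketch below is one way to extract them into a small text file with gdb and get a rough first impression. The function names and the MYSQLD_BIN default are assumptions, the binary must be the exact one that produced the core (ideally with debug symbols installed), and the lock-wait pattern is only a heuristic based on common glibc frame names.

```shell
#!/bin/sh
# Sketch: turn a large core into a small text file of per-thread backtraces.
# Assumptions: gdb is installed and MYSQLD_BIN is the exact binary that
# produced the core (with matching debug symbols, if available).

MYSQLD_BIN="${MYSQLD_BIN:-/usr/sbin/mysqld}"   # typical path; adjust as needed

# usage: dump_backtraces CORE_FILE OUT_FILE
dump_backtraces() {
    gdb -batch -ex 'thread apply all bt' "$MYSQLD_BIN" "$1" > "$2" 2>/dev/null
}

# Count thread sections; gdb starts each one with "Thread N (...):".
count_threads() {
    grep -c '^Thread [0-9]' "$1"
}

# Heuristic: count threads whose stack mentions a mutex or rwlock wait.
# Paragraph mode (RS = "") treats each blank-line-separated thread as a record.
count_lock_waiters() {
    awk 'BEGIN { RS = "" }
         /lll_lock_wait|pthread_mutex_lock|pthread_rwlock_(rd|wr)lock/ { n++ }
         END { print n + 0 }' "$1"
}
```

For example, after dump_backtraces /path/to/core bt.txt, running count_threads bt.txt and count_lock_waiters bt.txt gives a quick sense of how many threads are parked on a lock. The text file also compresses well and is far easier to attach to a bug report than the core itself.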

Hi @crewone.

This is indeed a bug. I have raised it as [PS-8660] Server version 8.0.31 stalls querying I_S P_S when new connection arrives (Percona JIRA). It was introduced in 8.0.31 and will be fixed as part of 8.0.32.