Percona XtraDB node crashing randomly without logs

Hi,

I’m having problems with a Percona XtraDB Cluster 5.7.43.
The nodes all run as Primary, but only one of them is used to write data at a time.
The crashes started after upgrading from 5.6.x to 5.7 and appear totally random from what I can see.
It’s always the node that has the “master ip” that crashes, usually once a week.
I can restart the mysql service and the node just runs again.
I have looked into the logs (error.log) and they are empty. I have already checked whether it was an OOM kill, and that’s not the case.
I have also installed PMM and can see no unusual behavior.

The database itself is less than 10 GB and the nodes each have 6 GB RAM and 2 vCPUs, so that shouldn’t be a problem either.

I’m thankful for any sort of help troubleshooting the issue.

I would recommend first fixing the error logging. If the error log is correctly configured, it should at least contain the normal mysqld startup and shutdown messages.

Once the error logging is fixed, then whenever the next crash happens we will have some crash details logged in the error log, which will help investigate the issue further.
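
As a quick sanity check (a minimal sketch, assuming the mysql client can connect locally; the log file path below is only an example), you can confirm where mysqld is writing its error log and how verbose it is:

```
# Show the configured error log destination and verbosity level
mysql -e "SHOW GLOBAL VARIABLES LIKE 'log_error%';"

# Confirm the reported file exists and is writable by the mysql user
# (replace /var/log/mysqld.log with the path reported above)
ls -l /var/log/mysqld.log
```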

Hi @Abhinav_Gupta
Maybe I put it wrong: the error.log does hold information, for example on startup of the mysql service.
What I mean is that when the mysql service crashes, there’s no information in the error log
(as if it was killed by the OOM killer).

@doctober,
That sounds correct. When the OOM killer kills mysql, there isn’t any information that I recall seeing in mysql’s own log. You should look through dmesg and the journald logs for more information.
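
For example (assuming systemd and a unit named mysql; on some distros it is mysqld, and log paths vary by OS), something like this usually surfaces OOM kills or other kernel-level evidence:

```
# Look for OOM-killer activity or segfaults around the crash time
dmesg -T | grep -i -E 'oom|killed process|segfault|mysqld'

# Check what systemd/journald recorded for the service around the crash
journalctl -u mysql --since "7 days ago" | grep -i -E 'kill|signal|abort|crash'

# Classic syslog locations, depending on the distribution
grep -i -E 'oom|mysqld' /var/log/messages /var/log/syslog 2>/dev/null
```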

Hi,

As I’ve said, it wasn’t an OOM kill.
I have already looked through all available logs and dmesg and there isn’t any relevant information.
Could it be a bug of any sort?

@doctober,

Is log_error_verbosity also set to “3” in order to capture any useful details (warnings and notes as well)?

You could still share the database error log files along with the OS/kernel logs (/var/log/messages or dmesg -T) covering the issue time frame for a quick look.

Were you able to capture any useful information (show full processlist, show engine innodb status) before the crash, just to observe the query/workload pattern?
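
If the crash window is hard to predict, one lightweight option is to snapshot these periodically so there is data from just before the next crash. The sketch below is only an illustration (the directory, the 60-second interval, and password-less client authentication via ~/.my.cnf or socket auth are assumptions); Percona Toolkit’s pt-stalk is a more complete tool for this kind of capture:

```
# Hypothetical snapshot loop: dump processlist and InnoDB status every 60 seconds
# Assumes the mysql client can authenticate without a password prompt
mkdir -p /var/log/mysql-snapshots
while true; do
  ts=$(date +%F_%H%M%S)
  mysql -e "SHOW FULL PROCESSLIST\G"     > /var/log/mysql-snapshots/processlist_$ts.log
  mysql -e "SHOW ENGINE INNODB STATUS\G" > /var/log/mysql-snapshots/innodb_$ts.log
  sleep 60
done
```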

It’s always the node that has the “master ip” that crashes, usually once a week.

It could be some kind of workload that is triggering the crash. Is there any particular job or process that ran before these crashes?

Regards,
Anil

@anil.joshi

log_error_verbosity is already set to “3”; there haven’t been any errors logged so far.
The last crash was about 1 week ago and since then everything has been going smoothly.
“show full processlist” and “show engine innodb status” show very little workload. I also have pmm-agent capturing information. In the PMM dashboard I can observe no anomalies and a very steady load.
I can share more detailed information if you tell me what you would like to see.
As far as I know, no job/process ran that could’ve crashed MySQL.

I have looked into the syslog and kernel log and they show nothing that could have affected the MySQL process (there were no OOM kills or jobs/processes to affect MySQL).

I’m planning to upgrade the cluster from 5.7 to 8 soon and hope to solve the “issue”.

Thank you,
Simon

Hey @doctober

Thanks for confirming the details.

The last crash was about 1 week ago and since then everything has been going smoothly.
“show full processlist” and “show engine innodb status” show very little workload. I also have pmm-agent capturing information. In the PMM dashboard I can observe no anomalies and a very steady load.

We are glad to hear the problem is not recurring and everything is running normally. Let us know if you face any further issues.

I’m planning to upgrade the cluster from 5.7 to 8 soon and hope to solve the “issue”.

Indeed, there are a lot of fixes in MySQL 8, and you will certainly see some benefits and performance enhancements.

https://dev.mysql.com/doc/refman/8.0/en/upgrading-from-previous-series.html
https://dev.mysql.com/doc/refman/8.0/en/mysql-nutshell.html

As a side note, before migrating to 8.0 please verify the impacting changes with the upgrade checker utility to avoid any dependency issues later on.
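
For example, with MySQL Shell installed, the upgrade checker can be run against the 5.7 node roughly like this (account, target version, and config path are placeholders for your environment):

```
# Report incompatibilities before the 5.7 -> 8.0 upgrade
mysqlsh -- util checkForServerUpgrade root@localhost:3306 \
    --target-version=8.0.34 --config-path=/etc/my.cnf
```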