Percona XtraDB Cluster crash and loss of MySQL services

Hi,

There was an outage on our production database recently. When I checked, the database was down. When I tried to start it, I got the error “Failed to start mysql.service: Unit mysql.service is masked.”
When I then tried to start the database using the unmask command, the main mysql service had been removed from all nodes: “Failed to start mysql.service: Unit mysql.service not found.”

I then reinstalled on all nodes and brought all nodes back up with bootstrap. I have no idea what caused this or why the database went down, so I have attached the logs from the relevant databases for your review.

I don’t know if it is related to the problem or not, but I have an observation.

At the time of the problem, the errors shown below appeared in the log, and the same errors are now starting to appear again. Please help me prevent another outage.

2024-04-09T05:01:01.735511Z 0 [Warning] [MY-000000] [Galera] Handshake failed: peer did not return a certificate
2024-04-09T05:01:04.799247Z 0 [Warning] [MY-000000] [Galera] Handshake failed: peer did not return a certificate
2024-04-09T05:01:04.826104Z 0 [Warning] [MY-000000] [Galera] Handshake failed: wrong version number
2024-04-09T05:01:04.929789Z 0 [Warning] [MY-000000] [Galera] Handshake failed: unknown protocol
2024-04-09T05:01:05.968724Z 0 [Warning] [MY-000000] [Galera] Handshake failed: unknown protocol
2024-04-09T05:01:06.004839Z 0 [Warning] [MY-000000] [Galera] Handshake failed: version too low
2024-04-09T05:01:06.038211Z 0 [Warning] [MY-000000] [Galera] Handshake failed: version too low
2024-04-09T05:01:07.069188Z 0 [Warning] [MY-000000] [Galera] Handshake failed: unsupported protocol
2024-04-09T05:01:08.102054Z 0 [Warning] [MY-000000] [Galera] Handshake failed: unsupported protocol
2024-04-09T05:01:09.133675Z 0 [Warning] [MY-000000] [Galera] Handshake failed: unsupported protocol
2024-04-09T05:01:10.169407Z 0 [Warning] [MY-000000] [Galera] Handshake failed: unsupported protocol

Main node : NODE1_156

node3_error_158.log (623.6 KB)
node2_error_157.log (633.4 KB)
node1_error_156.log (1.2 MB)

Please read through the MySQL error logs that you attached.

2023-09-26T11:23:50.656217Z 0 [ERROR] [MY-000067] [Server] unknown variable ‘encrypt=4’.
2023-09-26T11:23:50.656546Z 0 [ERROR] [MY-013236] [Server] The designated data directory /data/mysql/ is unusable. You can remove all files that the server added to it.

2023-09-22T08:20:11.487612Z 0 [Note] [MY-000000] [WSREP-SST] Trying to terminate (89210) socat -u openssl-listen:4444,reuseaddr,cert=/data/mysql/server-cert.pem,key=/data/mysql/server-key.pem,cafile=/data/mysql/ca.pem,verify=1,retry=30 stdio | /usr/bin/pxc_extra/pxb-8.0/bin/xbstream -x with SIGTERM
2023-09-22T08:20:11.493857Z 0 [Note] [MY-000000] [WSREP-SST] Terminated
2023-09-22T08:20:12.497333Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
2023-09-22T08:20:12.497468Z 0 [ERROR] [MY-000000] [WSREP-SST] Possible timeout in receving first data from donor in gtid/keyring stage

You’ve got some issue with your SSL certificates. I suggest you remove all the encryption/SSL settings and get your cluster back online first. Then try to bring SSL back into the config.
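
If it helps while going through the attached logs, here is a minimal Python sketch that counts the Galera “Handshake failed” warnings by reason, so you can see which failure dominates on each node. It assumes the log lines have the same layout as the ones quoted above. The function name `tally_handshake_failures` is made up for this example; it is not part of any Percona tool.

```python
import re
import sys
from collections import Counter

# Assumed line layout, taken from the warnings quoted in this thread:
# 2024-04-09T05:01:01.735511Z 0 [Warning] [MY-000000] [Galera] Handshake failed: <reason>
HANDSHAKE_RE = re.compile(
    r"^\S+\s+\d+\s+\[Warning\]\s+\[MY-\d+\]\s+\[Galera\]\s+"
    r"Handshake failed: (?P<reason>.+?)\s*$"
)


def tally_handshake_failures(lines):
    """Count Galera 'Handshake failed' warnings, grouped by failure reason."""
    counts = Counter()
    for line in lines:
        match = HANDSHAKE_RE.match(line)
        if match:
            counts[match.group("reason")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is passed on the command line, e.g. node1_error_156.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for reason, count in tally_handshake_failures(fh).most_common():
            print(f"{count:6d}  {reason}")
```

Run it once per node against that node's error log and compare the counts. Lines that are not Galera handshake warnings are simply ignored.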