PXC 8.0.26 2nd node not starting

I created a cluster, and the bootstrapped node is up. When I started the second node, the SST worked and the data was copied to the second node. Both nodes run the exact same version: mysql Ver 8.0.26-16.1 for Linux on x86_64 (Percona XtraDB Cluster (GPL), Release rel16, Revision b141904, WSREP version 26.4.3). The second node completes the transfer and is in sync with the other node in the cluster. Then it just shuts down.

In the error log I see a few warnings

2022-02-04T02:29:21.196662Z 0 [Warning] [MY-013245] [Server] The SSL library function CRYPTO_set_mem_functions failed. This is typically caused by the SSL library already being used. As a result the SSL memory allocation will not be instrumented.
2022-02-04T02:29:22.595468Z 4 [System] [MY-000000] [WSREP] PXC upgrade completed successfully
2022-02-04T02:29:22.711375Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1 is enabled for channel mysql_main
2022-02-04T02:29:22.715006Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1.1 is enabled for channel mysql_main
2022-02-04T02:29:22.802344Z 0 [Note] [MY-000000] [Galera] 1.0 (DB404): State transfer from 0.0 (DB403) complete.
2022-02-04T02:32:28.853625Z 1 [Note] [MY-000000] [Galera] Server DB404 synced with group
2022-02-04T09:53:16.342103Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user <via user signal>. Shutting down mysqld (Version: 8.0.26-16.1).
2022-02-04T09:53:16.342177Z 0 [Note] [MY-000000] [WSREP] Received shutdown signal. Will sleep for 10 secs before initiating shutdown. pxc_maint_mode switched to SHUTDOWN
2022-02-04T09:53:26.343096Z 0 [Note] [MY-000000] [WSREP] Shutdown replication

I did notice a message saying the PXC upgrade completed, so I am assuming node 2 has been upgraded to the correct version.
However, when I started the cluster earlier, it failed because the datadir was not clean. At that point it reported the version below, which is the same version it had before the upgrade-complete message.

2022-02-03T12:44:47.974519Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.26-16.1)  Percona XtraDB Cluster (GPL), Release rel16, Revision b141904, WSREP version 26.4.3.

On the donor node, it just forgets the joiner node. There are no warnings; I get some "Deleted page /db1/gcache.page.000040" messages and then the lines below.

2022-02-04T02:29:23.599187Z 0 [Note] [MY-000000] [Galera] Member 1.0 (DB404) synced with group.
2022-02-04T09:53:27.343871Z 0 [Note] [MY-000000] [Galera] forgetting b1517a88-ba9f (ssl://192.168.2.64:4567)
2022-02-04T09:53:27.343975Z 0 [Note] [MY-000000] [Galera] Node 3382bee4-891b state primary

Hi @danarashad , thank you for posting to the Percona forums!

Something told DB404 to shut down roughly seven and a half hours after it joined the cluster successfully. That is longer than any default timeout of systemd or another system-level service. You will want to investigate what automation or tooling you have that might have triggered the shutdown. Check syslog for whoever may have stopped the service.

Checking the syslog, it seems like everything is working and then I hit a timeout. I have included the messages from the syslog and the MySQL error log, as well as the mysql.service file.

syslog

Feb  5 01:38:02 oceandb404 kernel: [195744.345168] audit: type=1400 audit(1644025082.644:75581): apparmor="ALLOWED" operation="unlink" profile="/usr/sbin/mysqld" name="/db1/#innodb_temp/temp_10.ibt" pid=25977 comm="mysqld" requested_mask="d" denied_mask="d" fsuid=115 ouid=115
Feb  5 01:38:03 oceandb404 mysql-systemd[28832]:  WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Feb  5 01:38:03 oceandb404 mysql-systemd[28832]:  WARNING: mysql may be already dead
Feb  5 01:38:03 oceandb404 systemd[1]: mysql.service: Failed with result 'timeout'.
Feb  5 01:38:03 oceandb404 systemd[1]: Failed to start Percona XtraDB Cluster.

mysql error log

2022-02-04T13:38:11.191859Z 2 [Note] [MY-000000] [Galera] Server DB404 synced with group
2022-02-04T13:38:11.191900Z 2 [Note] [MY-000000] [WSREP] Server status change joined -> synced
2022-02-04T13:38:11.191916Z 2 [Note] [MY-000000] [WSREP] Synchronized with group, ready for connections
2022-02-04T13:38:11.191929Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2022-02-05T01:37:46.342102Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user <via user signal>. Shutting down mysqld (Version: 8.0.26-16.1).
2022-02-05T01:37:46.342167Z 0 [Note] [MY-000000] [WSREP] Received shutdown signal. Will sleep for 10 secs before initiating shutdown. pxc_maint_mode switched to SHUTDOWN
2022-02-05T01:37:56.342951Z 0 [Note] [MY-000000] [WSREP] Shutdown replication

mysql.service

[Unit]
Description=Percona XtraDB Cluster
After=network.target syslog.target local-fs.target remote-fs.target
Requires=local-fs.target remote-fs.target
[Install]
WantedBy=multi-user.target
Alias=mysqld.service
[Service]
Type=notify

TimeoutStartSec=0
# Added by me; during SST it kept timing out.
TimeoutSec=43200
PrivateNetwork=false
User=mysql
Group=mysql
CapabilityBoundingSet=CAP_IPC_LOCK CAP_DAC_OVERRIDE CAP_AUDIT_WRITE
PrivateDevices=false
ProtectSystem=full
ProtectHome=true
PermissionsStartOnly=true
EnvironmentFile=-/etc/default/mysql
ExecStartPre=/usr/bin/mysql-systemd start-pre
ExecStartPre=/usr/bin/mysql-systemd check-grastate
ExecStartPre=/bin/sh -c "systemctl unset-environment _WSREP_START_POSITION"
ExecStartPre=/bin/sh -c "VAR=`bash /usr/bin/mysql-systemd galera-recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1"
ExecStart=/usr/sbin/mysqld $_WSREP_START_POSITION
ExecStartPost=/bin/sh -c "systemctl unset-environment _WSREP_START_POSITION"
ExecStartPost=/usr/bin/mysql-systemd start-post $MAINPID
ExecStop=/usr/bin/mysql-systemd stop
ExecStopPost=/usr/bin/mysql-systemd stop-post
ExecReload=/usr/bin/mysql-systemd reload
KillSignal=SIGTERM
Restart=on-abort
RestartSec=5s
RestartPreventExitStatus=SIGABRT
UMask=007
PrivateTmp=false
LimitNOFILE=16364
Nice=0
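
As a sanity check on the numbers posted above, the gap between the node reporting that it is synced and the shutdown signal arriving can be computed from the error-log timestamps and compared with the TimeoutSec value in the unit file. This is only a minimal sketch: the timestamps and the 43200 value are copied from this thread, the helper name `seconds_between` is made up for illustration, and a close match is a hint that the systemd start timeout is worth investigating, not proof of the cause.

```python
from datetime import datetime

# Timestamp layout used in the mysqld error log lines quoted in this thread.
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def seconds_between(start: str, end: str) -> float:
    """Return the number of seconds from `start` to `end` (error-log timestamps)."""
    t0 = datetime.strptime(start, LOG_TS_FORMAT)
    t1 = datetime.strptime(end, LOG_TS_FORMAT)
    return (t1 - t0).total_seconds()


# "synced with group" -> "Received SHUTDOWN", first attempt.
first_run = seconds_between("2022-02-04T02:32:28.853625Z",
                            "2022-02-04T09:53:16.342103Z")
# "Synchronized with group" -> "Received SHUTDOWN", second attempt.
second_run = seconds_between("2022-02-04T13:38:11.191916Z",
                             "2022-02-05T01:37:46.342102Z")
timeout_sec = 43200  # TimeoutSec from the posted mysql.service

print(f"first run:  {first_run / 3600:.2f} h between sync and shutdown")
print(f"second run: {second_run / 3600:.2f} h between sync and shutdown")
print(f"TimeoutSec: {timeout_sec / 3600:.2f} h")
```

With these inputs the second gap comes out roughly 25 seconds short of 12 hours, which is close to TimeoutSec=43200. systemd counts its timeout from the start-up phase rather than from the sync message, so the comparison is approximate.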

When I started mysql, it stayed attached to my tmux session, so when the session was killed, mysql died with it. That raised another question: why was mysql not being pushed to the background after starting? Apparmor didn’t have the proper permissions set.
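
For context, the unit file posted earlier uses Type=notify. With that setting systemd treats the service as started only once the process reports readiness over the socket named in the NOTIFY_SOCKET environment variable, and until then `systemctl start` keeps waiting in the foreground. The following is a minimal sketch of that readiness message, not mysqld's actual implementation; the function name `notify_ready` is hypothetical, and whether a blocked notification is what kept the start command attached to the tmux session here is an assumption that this thread never confirms.

```python
import os
import socket


def notify_ready(message: bytes = b"READY=1") -> bool:
    """Send a readiness datagram to systemd; return False when no socket is set."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by systemd with a notify socket
    if path.startswith("@"):
        # A leading '@' denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, path)
    return True


if __name__ == "__main__":
    print("notified" if notify_ready() else "NOTIFY_SOCKET not set; nothing sent")
```

If the service process cannot reach that socket, the unit stays in the "activating" state until the start timeout expires.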

Hi @danarashad ,
It’s great that you solved the issue.

Apparmor didn’t have the proper permissions set.

Is my understanding correct that the problem was the AppArmor profile configuration for PXC? If so, could you please create a bug ticket at jira.percona.com describing what was missing?

The admin refused to actually give me an answer. All he said was "AppArmor permissions". I will keep bugging them to get an answer to help the community.

Wasted a day trying to get these nodes to connect after going to Percona 8.

the nodes just die with

I am thinking it is AppArmor as well; I reinstalled the Percona cluster packages dozens of times today.
So what is the correct method for getting AppArmor to work?
I tried using both the new and the existing profile during install.

Screenshot from 2022-12-03 16-37-04

#dmesg
[628955.182264] audit: type=1400 audit(1670045048.076:28): apparmor="ALLOWED" operation="connect" profile="/usr/sbin/mysqld" name="/run/systemd/userdb/io.systemd.DynamicUser" pid=3088288 comm="mysqld" requested_mask="w" denied_mask="w" fsuid=0 ouid=0

usr.sbin.mysqld.txt (2.1 KB)
usr.bin.wsrep_sst_xtrabackup-v2.txt (2.0 KB)

and after adding

wsrep_provider_options="socket.ssl=yes;socket.ssl_ca=/etc/mysql/certs/ca.pem;socket.ssl_cert=/etc/mysql/certs/server-cert.pem;socket.ssl_key=/etc/mysql/certs/server-key.pem"

I cannot even get the node to bootstrap.
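
One thing that can be checked before reinstalling is whether every certificate path named in wsrep_provider_options exists and is readable on the node. A minimal sketch, assuming it is run as the same OS user that runs mysqld; the helper names `parse_provider_options` and `unreadable_files` are made up for illustration and are not part of PXC or Galera:

```python
import os


def parse_provider_options(value: str) -> dict:
    """Split a 'key=value;key=value' provider option string into a dict."""
    options = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, val = item.partition("=")
        options[key.strip()] = val.strip()
    return options


def unreadable_files(options: dict) -> list:
    """Return the socket.ssl_* file paths that are missing or not readable."""
    file_keys = ("socket.ssl_ca", "socket.ssl_cert", "socket.ssl_key")
    return [options[k] for k in file_keys
            if k in options and not os.access(options[k], os.R_OK)]


# Option string copied from the post above.
PROVIDER_OPTIONS = (
    "socket.ssl=yes;socket.ssl_ca=/etc/mysql/certs/ca.pem;"
    "socket.ssl_cert=/etc/mysql/certs/server-cert.pem;"
    "socket.ssl_key=/etc/mysql/certs/server-key.pem"
)

if __name__ == "__main__":
    opts = parse_provider_options(PROVIDER_OPTIONS)
    for path in unreadable_files(opts):
        print(f"missing or unreadable: {path}")
```

The option string points at /etc/mysql/certs/, while the scp command in this post copies the .pem files into /var/lib/mysql, so it may be worth confirming that both nodes have the files where the configuration expects them. This check only covers ordinary file permissions and says nothing about AppArmor rules.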

Time for yet another
apt remove --purge percona-xtradb-cluster-*
run my install script and
scp /var/lib/mysql/*.pem root@192.168.2.39:/var/lib/mysql
all over again

Hi @MasterCATZ ,
AA profiles related to PXC are installed in complain mode. According to the documentation:
“In this mode security policy is not enforced but rather access violations are logged to the system log.” You can also completely disable the AA profiles for PXC, but I don’t think it will help.
So I would search for the problem somewhere else.
BTW, I’m still not sure how the PXC AA profiles could have been the cause of the initial problem described in this thread, because the original reporter was not able to provide solution details.
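
The audit lines posted in this thread can be checked against that description. A minimal sketch that pulls the key=value fields out of an AppArmor audit message and reports whether the access was only logged (apparmor="ALLOWED", as written in complain mode) or actually blocked (apparmor="DENIED"); the function names are hypothetical, and the sample line is the dmesg output quoted earlier:

```python
import re

# Matches key=value and key="quoted value" pairs in an audit message.
FIELD_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_audit_fields(line: str) -> dict:
    """Extract key=value and key="value" pairs from an audit log line."""
    return {m.group(1): m.group(2) if m.group(2) is not None else m.group(3)
            for m in FIELD_RE.finditer(line)}


def was_blocked(line: str) -> bool:
    """True only when AppArmor reports the access as DENIED."""
    return parse_audit_fields(line).get("apparmor") == "DENIED"


SAMPLE = ('[628955.182264] audit: type=1400 audit(1670045048.076:28): '
          'apparmor="ALLOWED" operation="connect" profile="/usr/sbin/mysqld" '
          'name="/run/systemd/userdb/io.systemd.DynamicUser" pid=3088288 '
          'comm="mysqld" requested_mask="w" denied_mask="w" fsuid=0 ouid=0')

if __name__ == "__main__":
    fields = parse_audit_fields(SAMPLE)
    print(fields["profile"], fields["operation"], fields["name"],
          "blocked" if was_blocked(SAMPLE) else "logged only")
```

Both AppArmor lines shown in this thread carry apparmor="ALLOWED", which is consistent with the profiles being in complain mode on those hosts.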

Please describe your problem in more detail.

  1. Is it the 1st node that doesn’t start, or 1st node starts and the 2nd one fails to join?
  2. Please provide error logs from nodes
  3. Please provide the output of dmesg
  4. Please provide the output of journalctl -xe