Node gets shut down after a certain period of inactivity

Hi,
I have created a 2-node cluster and was trying to see how it works. I was able to start the server and watch replication happen for a while. After some period of inactivity on the node, i.e. no updates or replication, both nodes got shut down.
Logs:
2020-08-28T07:11:41.744142Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: '/var/run/mysqld/mysqlx.sock' bind-address: '::' port: 33060
…
2020-08-28T07:25:39.881942Z 0 [Note] [MY-000000] [Galera] forgetting 88fe372e (ssl://172.20.51.155:4567)
2020-08-28T07:25:39.882021Z 0 [Note] [MY-000000] [Galera] Node 9e7c4dd3 state primary
2020-08-28T07:25:39.882061Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,9e7c4dd3,3)
memb { 9e7c4dd3,0 }
joined { }
left { }
partitioned { 88fe372e,0 }
)
2020-08-28T07:25:39.882079Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk
2020-08-28T07:25:39.884838Z 0 [Note] [MY-000000] [Galera] forgetting 88fe372e (ssl://172.20.51.155:4567)
2020-08-28T07:25:39.884961Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2020-08-28T07:25:39.885432Z 0 [Note] [MY-000000] [Galera] STATE_EXCHANGE: sent state UUID: aca27403-e8ff-11ea-88e9-b6d98e7f0934
2020-08-28T07:25:39.885541Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: aca27403-e8ff-11ea-88e9-b6d98e7f0934
2020-08-28T07:25:39.885634Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: aca27403-e8ff-11ea-88e9-b6d98e7f0934 from 0 (ip-172-20-38-237.internal)
2020-08-28T07:25:39.885741Z 0 [Note] [MY-000000] [Galera] Quorum results: version = 6, component = PRIMARY, conf_id = 2, members = 1/1 (primary/total), act_id = 36, last_appl. = 32, protocols = 2/10/4 (gcs/repl/appl), vote policy= 0, group UUID = 33b6c06a-e6a7-11ea-979e-16d3d063f46c
2020-08-28T07:25:39.885877Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [100, 100]
2020-08-28T07:25:39.886020Z 15 [Note] [MY-000000] [Galera] ####### processing CC 37, local, ordered
2020-08-28T07:25:39.886148Z 15 [Note] [MY-000000] [Galera] Drain monitors from 36 upto 36
2020-08-28T07:25:39.886238Z 15 [Note] [MY-000000] [Galera] REPL Protocols: 10 (5, 3)
2020-08-28T07:25:39.886311Z 15 [Note] [MY-000000] [Galera] ####### My UUID: 9e7c4dd3-e8fd-11ea-8744-06996b25e5ed
2020-08-28T07:25:39.886395Z 15 [Note] [MY-000000] [Galera] ####### ST not required
2020-08-28T07:25:39.886462Z 15 [Note] [MY-000000] [Galera] Skipping cert index reset
2020-08-28T07:25:39.886525Z 15 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 36 -> 37
2020-08-28T07:25:39.886598Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2020-08-28T07:25:39.886671Z 15 [Note] [MY-000000] [Galera] ####### Setting monitor position to 37
2020-08-28T07:25:39.889719Z 15 [Note] [MY-000000] [Galera] Recording CC from group: 37
2020-08-28T07:25:39.889791Z 15 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from group: 35
2020-08-28T07:25:39.889855Z 15 [Note] [MY-000000] [Galera] Min available from gcache for CC from group: 34
2020-08-28T07:25:39.889930Z 15 [Note] [MY-000000] [Galera] ================================================
View:
  id: 33b6c06a-e6a7-11ea-979e-16d3d063f46c:37
  status: primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 0
  members(1):
  0: 9e7c4dd3-e8fd-11ea-8744-06996b25e5ed, ip-172-20-38-237.internal
=================================================
2020-08-28T07:25:39.890033Z 15 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2020-08-28T07:25:44.962366Z 0 [Note] [MY-000000] [Galera] cleaning up 88fe372e (ssl://172.20.51.155:4567)
2020-08-28T07:26:46.107137Z 0 [Note] [MY-000000] [WSREP] Received shutdown signal. Will sleep for 10 secs before initiating shutdown. pxc_maint_mode switched to SHUTDOWN
2020-08-28T07:26:56.700485Z 0 [Note] [MY-000000] [WSREP] Shutdown replication
2020-08-28T07:26:56.700672Z 0 [Note] [MY-000000] [WSREP] Server status change synced -> disconnecting
2020-08-28T07:26:56.700789Z 0 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2020-08-28T07:26:56.700896Z 0 [Note] [MY-000000] [Galera] Closing send monitor...
2020-08-28T07:26:56.700984Z 0 [Note] [MY-000000] [Galera] Closed send monitor.
2020-08-28T07:26:56.701070Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread
2020-08-28T07:26:56.701165Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread
2020-08-28T07:26:56.701311Z 0 [Note] [MY-000000] [Galera] gcomm: closing backend
2020-08-28T07:26:56.701431Z 0 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0
2020-08-28T07:26:56.701526Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))
2020-08-28T07:26:56.701829Z 0 [Note] [MY-000000] [Galera] gcomm: closed
2020-08-28T07:26:56.701919Z 0 [Note] [MY-000000] [Galera] New SELF-LEAVE.


My wsrep_provider_options are initialized with

'socket.ssl_key=server-key.pem;socket.ssl_cert=server-cert.pem;socket.ssl_ca=ca.pem;evs.keepalive_period = PT3S; evs.suspect_timeout = PT30S; evs.inactive_timeout = PT1M; evs.install_timeout = PT1M'
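For context, that string sits in my.cnf roughly like this (I am assuming the usual [mysqld] section; the exact file layout may differ on your side):

[mysqld]
wsrep_provider_options="socket.ssl_key=server-key.pem;socket.ssl_cert=server-cert.pem;socket.ssl_ca=ca.pem;evs.keepalive_period=PT3S;evs.suspect_timeout=PT30S;evs.inactive_timeout=PT1M;evs.install_timeout=PT1M"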



No other special config changes have been made.
Can anybody help me understand whether these values are the reason for this shutdown? Is there any way to keep the two nodes operating all the time (no restrictions)?
I am asking because it looks like a config issue, since both nodes went through a graceful shutdown. Please let me know if any more information is needed.

I would remove all of those wsrep_provider_options except the SSL ones. Changing the evs.* parameters away from their defaults is extremely rare and not normal. Why did you set them this way?

You do indeed have a graceful shutdown; something else stopped your cluster. Check systemctl status to see whether a decision was made there, or whether some other script sent a signal to stop mysql.
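For example, something along these lines (the timestamps are just taken from your log snippet; adjust the error-log path to wherever yours lives):

systemctl status mysql.service
journalctl -u mysql.service --since "2020-08-28 07:20" --until "2020-08-28 07:30"
grep -iE 'shutdown|signal|terminat' /var/log/mysql/error.log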

Hi Matt,

I would remove all of those wsrep_provider_options except the SSL ones. Changing the evs.* parameters away from their defaults is extremely rare and not normal. Why did you set them this way?

I added those configs to try to avoid this very shutdown problem. I have removed them now, but the problem still persists.
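In case it helps, I believe the currently active options can be double-checked after a restart with something like:

mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'\G"

so the evs.* values should now be back at their defaults.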

Please find the systemctl status after a successful startup below:

root@ip-172-20-38-237:~# systemctl status mysql.service
● mysql.service - Percona XtraDB Cluster
     Loaded: loaded (/lib/systemd/system/mysql.service; enabled; vendor preset: enabled)
     Active: activating (start-post) since Tue 2020-09-01 08:47:17 UTC; 1min 33s ago
    Process: 37543 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
    Process: 37595 ExecStartPre=/usr/bin/mysql-systemd check-grastate (code=exited, status=0/SUCCESS)
    Process: 37624 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
    Process: 37626 ExecStartPre=/bin/sh -c VAR=`bash /usr/bin/mysql-systemd galera-recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_>
    Process: 37740 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
   Main PID: 37674 (mysqld); Control PID: 37742 (mysql-systemd)
     Status: "Server is operational"
      Tasks: 59 (limit: 4485)
     Memory: 724.6M
     CGroup: /system.slice/mysql.service
             ├─37674 /usr/sbin/mysqld --wsrep_start_position=33b6c06a-e6a7-11ea-979e-16d3d063f46c:42
             ├─37742 /bin/bash -ue /usr/bin/mysql-systemd start-post 37674
             └─38040 sleep 1


Sep 01 08:47:17 ip-172-20-38-237 systemd[1]: Starting Percona XtraDB Cluster...

It goes down after exactly 15 minutes. Please check the systemctl status:

root@ip-172-20-38-237:~# systemctl status mysql.service
● mysql.service - Percona XtraDB Cluster
     Loaded: loaded (/lib/systemd/system/mysql.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Tue 2020-09-01 12:00:56 UTC; 4min 40s ago
    Process: 40831 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
    Process: 40878 ExecStartPre=/usr/bin/mysql-systemd check-grastate (code=exited, status=0/SUCCESS)
    Process: 40907 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
    Process: 40909 ExecStartPre=/bin/sh -c VAR=`bash /usr/bin/mysql-systemd galera-recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_>
    Process: 40957 ExecStart=/usr/sbin/mysqld $_WSREP_START_POSITION (code=exited, status=0/SUCCESS)
    Process: 41021 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
    Process: 41023 ExecStartPost=/usr/bin/mysql-systemd start-post $MAINPID (code=exited, status=1/FAILURE)
    Process: 43786 ExecStopPost=/usr/bin/mysql-systemd stop-post (code=exited, status=0/SUCCESS)
   Main PID: 40957 (code=exited, status=0/SUCCESS)
     Status: "Server shutdown complete"


Sep 01 11:45:33 ip-172-20-38-237 systemd[1]: Starting Percona XtraDB Cluster...
Sep 01 12:00:42 ip-172-20-38-237 mysql-systemd[41023]:  ERROR!
Sep 01 12:00:42 ip-172-20-38-237 systemd[1]: mysql.service: Control process exited, code=exited, status=1/FAILURE
Sep 01 12:00:56 ip-172-20-38-237 mysql-systemd[43786]:  WARNING: mysql pid file /data/mysqld.pid empty or not readable
Sep 01 12:00:56 ip-172-20-38-237 mysql-systemd[43786]:  WARNING: mysql may be already dead
Sep 01 12:00:56 ip-172-20-38-237 systemd[1]: mysql.service: Failed with result 'exit-code'.
Sep 01 12:00:56 ip-172-20-38-237 systemd[1]: Failed to start Percona XtraDB Cluster.

Earlier I assumed it was shutting down due to inactivity, but it goes down after exactly 15 minutes every time, so inactivity seems unrelated.

Is there something I am missing?
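Since the status above shows the ExecStartPost step (mysql-systemd start-post) exiting with failure, I assume the start-post step and the timeout systemd applies can be inspected with something like this (property names are standard systemd ones; the time window matches the status output above):

systemctl cat mysql.service
systemctl show mysql.service -p TimeoutStartUSec -p ExecStartPost
journalctl -u mysql.service --since "2020-09-01 11:45" --until "2020-09-01 12:01"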

Also, with reference to this post, I wanted to add that I have modified the following config:

[mysqld_safe]
pid-file            = /data/mysqld.pid
log-error           = /var/log/mysql/error.log
but I have already given the required permissions:
drwxr-x---  3 mysql mysql 4096 Sep 1 12:00 data

Can you please tell me if this could be causing the issue in any way?

drwxr-x--- 3 mysql mysql 4096 Sep 1 12:00 data

Possibly. You do not have global read/execute privileges on that directory. How do you expect systemd to read the pid file if you have denied permissions? Try changing that with chmod a+x /data.
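For example (the mode shown in the comment is only what I would expect to see, not output from your box):

chmod a+x /data
ls -ld /data
# should now show something like: drwxr-x--x 3 mysql mysql 4096 Sep  1 12:00 /data
sudo -u mysql test -x /data && echo "mysql can traverse /data"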

Got it. Actually, it was creating the pid file somewhere else, at the default location (/var/run/mysql/).

Pointing it to /data and giving the required permissions solved the shutdown issue. Just a small query though: when we install MySQL, we generally run it as a daemon process. Is that setup removed for PXC nodes? I didn't find any /etc/init.d/mysqld file.

Do we not need to run PXC nodes as a daemon process?
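For reference on the pid-file change above: since systemd starts /usr/sbin/mysqld directly (see the ExecStart line in the status output), my understanding is that these settings have to live under [mysqld] rather than [mysqld_safe] for mysqld itself to pick them up, roughly:

[mysqld]
pid-file  = /data/mysqld.pid
log-error = /var/log/mysql/error.log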

/etc/init.d/mysqld

This tells me you are on an OS older than CentOS 6/RHEL 6/Ubuntu 12. You should consider upgrading your OS to something that is not going to be EOL'd soon.

PXC is MySQL. So the scripts that start “mysql” also start PXC, because PXC is MySQL with additional features.
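On a systemd-based OS the PXC package installs the mysql.service unit you already saw in your systemctl output, so the daemon is managed the standard systemd way rather than via /etc/init.d, for example:

sudo systemctl enable mysql     # start automatically at boot
sudo systemctl start mysql      # start the daemon now
sudo systemctl status mysql     # verify it is running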