Node shuts down after a certain period of inactivity

nitish54 (Contributor)
Hi,
I have created a 2-node cluster and tried to see how it works. I was able to start the servers and watch replication happen for a while. After some period of inactivity on the nodes, i.e. no updates or replication, both nodes shut down.
Logs:
2020-08-28T07:11:41.744142Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: '/var/run/mysqld/mysqlx.sock' bind-address: '::' port: 33060
..
2020-08-28T07:25:39.881942Z 0 [Note] [MY-000000] [Galera] forgetting 88fe372e (ssl://172.20.51.155:4567)
2020-08-28T07:25:39.882021Z 0 [Note] [MY-000000] [Galera] Node 9e7c4dd3 state primary
2020-08-28T07:25:39.882061Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,9e7c4dd3,3)
memb {
9e7c4dd3,0
}
joined {
}
left {
}
partitioned {
88fe372e,0
}
)
2020-08-28T07:25:39.882079Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk
2020-08-28T07:25:39.884838Z 0 [Note] [MY-000000] [Galera] forgetting 88fe372e (ssl://172.20.51.155:4567)
2020-08-28T07:25:39.884961Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2020-08-28T07:25:39.885432Z 0 [Note] [MY-000000] [Galera] STATE_EXCHANGE: sent state UUID: aca27403-e8ff-11ea-88e9-b6d98e7f0934
2020-08-28T07:25:39.885541Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: aca27403-e8ff-11ea-88e9-b6d98e7f0934
2020-08-28T07:25:39.885634Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: aca27403-e8ff-11ea-88e9-b6d98e7f0934 from 0 (ip-172-20-38-237.internal)
2020-08-28T07:25:39.885741Z 0 [Note] [MY-000000] [Galera] Quorum results:
version    = 6,
component  = PRIMARY,
conf_id    = 2,
members    = 1/1 (primary/total),
act_id     = 36,
last_appl. = 32,
protocols  = 2/10/4 (gcs/repl/appl),
vote policy= 0,
group UUID = 33b6c06a-e6a7-11ea-979e-16d3d063f46c
2020-08-28T07:25:39.885877Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [100, 100]
2020-08-28T07:25:39.886020Z 15 [Note] [MY-000000] [Galera] ####### processing CC 37, local, ordered
2020-08-28T07:25:39.886148Z 15 [Note] [MY-000000] [Galera] Drain monitors from 36 upto 36
2020-08-28T07:25:39.886238Z 15 [Note] [MY-000000] [Galera] REPL Protocols: 10 (5, 3)
2020-08-28T07:25:39.886311Z 15 [Note] [MY-000000] [Galera] ####### My UUID: 9e7c4dd3-e8fd-11ea-8744-06996b25e5ed
2020-08-28T07:25:39.886395Z 15 [Note] [MY-000000] [Galera] ####### ST not required
2020-08-28T07:25:39.886462Z 15 [Note] [MY-000000] [Galera] Skipping cert index reset
2020-08-28T07:25:39.886525Z 15 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 36 -> 37
2020-08-28T07:25:39.886598Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2020-08-28T07:25:39.886671Z 15 [Note] [MY-000000] [Galera] ####### Setting monitor position to 37
2020-08-28T07:25:39.889719Z 15 [Note] [MY-000000] [Galera] Recording CC from group: 37
2020-08-28T07:25:39.889791Z 15 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from group: 35
2020-08-28T07:25:39.889855Z 15 [Note] [MY-000000] [Galera] Min available from gcache for CC from group: 34
2020-08-28T07:25:39.889930Z 15 [Note] [MY-000000] [Galera] ================================================
View:
  id: 33b6c06a-e6a7-11ea-979e-16d3d063f46c:37
  status: primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 0
  members(1):
0: 9e7c4dd3-e8fd-11ea-8744-06996b25e5ed, ip-172-20-38-237.internal
=================================================
2020-08-28T07:25:39.890033Z 15 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2020-08-28T07:25:44.962366Z 0 [Note] [MY-000000] [Galera]  cleaning up 88fe372e (ssl://172.20.51.155:4567)
2020-08-28T07:26:46.107137Z 0 [Note] [MY-000000] [WSREP] Received shutdown signal. Will sleep for 10 secs before initiating shutdown. pxc_maint_mode switched to SHUTDOWN
2020-08-28T07:26:56.700485Z 0 [Note] [MY-000000] [WSREP] Shutdown replication
2020-08-28T07:26:56.700672Z 0 [Note] [MY-000000] [WSREP] Server status change synced -> disconnecting
2020-08-28T07:26:56.700789Z 0 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2020-08-28T07:26:56.700896Z 0 [Note] [MY-000000] [Galera] Closing send monitor...
2020-08-28T07:26:56.700984Z 0 [Note] [MY-000000] [Galera] Closed send monitor.
2020-08-28T07:26:56.701070Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread
2020-08-28T07:26:56.701165Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread
2020-08-28T07:26:56.701311Z 0 [Note] [MY-000000] [Galera] gcomm: closing backend
2020-08-28T07:26:56.701431Z 0 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0
2020-08-28T07:26:56.701526Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))
2020-08-28T07:26:56.701829Z 0 [Note] [MY-000000] [Galera] gcomm: closed
2020-08-28T07:26:56.701919Z 0 [Note] [MY-000000] [Galera] New SELF-LEAVE.
...

My wsrep_provider_options is initialized with:
'socket.ssl_key=server-key.pem;socket.ssl_cert=server-cert.pem;socket.ssl_ca=ca.pem;evs.keepalive_period = PT3S; evs.suspect_timeout = PT30S; evs.inactive_timeout = PT1M; evs.install_timeout = PT1M'
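
For reference, this string goes into my.cnf under [mysqld]; a rough sketch of how it looks there (same values as above; absolute paths for the certificate files are the safer choice):

[mysqld]
wsrep_provider_options="socket.ssl_key=server-key.pem;socket.ssl_cert=server-cert.pem;socket.ssl_ca=ca.pem;evs.keepalive_period=PT3S;evs.suspect_timeout=PT30S;evs.inactive_timeout=PT1M;evs.install_timeout=PT1M"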


No other special config changes have been made.
Can anybody help me understand whether these values are the reason for this shutdown? Is there any way to keep these two nodes operating all the time (no restrictions)?
I am asking because it must be a config issue, since both nodes were shut down gracefully. Please let me know if any more information is needed here.

Answers

  • matthewb (Percona Staff)

    I would remove all of those wsrep_provider_options except the SSL ones. Changing the EVS.* parameters away from defaults is extremely rare and not normal. Why did you set these this way?
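
    Something like this, keeping only the SSL settings from your current value (a sketch, not the only valid form):

    wsrep_provider_options='socket.ssl_key=server-key.pem;socket.ssl_cert=server-cert.pem;socket.ssl_ca=ca.pem'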

    You do indeed have a graceful shutdown; something else stopped your cluster. Check systemctl status to see if a decision was made there, or some other script sent a signal to stop mysql.
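
    For example, assuming the unit is named mysql.service and the host uses systemd/journald, something like:

    systemctl status mysql.service
    journalctl -u mysql.service -b --no-pager | tail -n 200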

  • nitish54 (Contributor)

    Hi Matt,

    I would remove all of those wsrep_provider_options except the SSL ones. Changing the EVS.* parameters away from defaults is extremely rare and not normal. Why did you set these this way?

    I added those settings to try to avoid this very shutdown problem. I have removed them now, but the problem still persists.

    Please find the service status after a successful startup below:

    root@ip-172-20-38-237:~# systemctl status mysql.service
    ● mysql.service - Percona XtraDB Cluster
         Loaded: loaded (/lib/systemd/system/mysql.service; enabled; vendor preset: enabled)
         Active: activating (start-post) since Tue 2020-09-01 08:47:17 UTC; 1min 33s ago
        Process: 37543 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
        Process: 37595 ExecStartPre=/usr/bin/mysql-systemd check-grastate (code=exited, status=0/SUCCESS)
        Process: 37624 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
        Process: 37626 ExecStartPre=/bin/sh -c VAR=`bash /usr/bin/mysql-systemd galera-recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_>
        Process: 37740 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
       Main PID: 37674 (mysqld); Control PID: 37742 (mysql-systemd)
         Status: "Server is operational"
          Tasks: 59 (limit: 4485)
         Memory: 724.6M
         CGroup: /system.slice/mysql.service
                 ├─37674 /usr/sbin/mysqld --wsrep_start_position=33b6c06a-e6a7-11ea-979e-16d3d063f46c:42
                 ├─37742 /bin/bash -ue /usr/bin/mysql-systemd start-post 37674
                 └─38040 sleep 1
    
    
    Sep 01 08:47:17 ip-172-20-38-237 systemd[1]: Starting Percona XtraDB Cluster...
    

    It goes down after exactly 15 minutes. Please check the systemctl status:

    root@ip-172-20-38-237:~# systemctl status mysql.service
    ● mysql.service - Percona XtraDB Cluster
         Loaded: loaded (/lib/systemd/system/mysql.service; enabled; vendor preset: enabled)
         Active: failed (Result: exit-code) since Tue 2020-09-01 12:00:56 UTC; 4min 40s ago
        Process: 40831 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
        Process: 40878 ExecStartPre=/usr/bin/mysql-systemd check-grastate (code=exited, status=0/SUCCESS)
        Process: 40907 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
        Process: 40909 ExecStartPre=/bin/sh -c VAR=`bash /usr/bin/mysql-systemd galera-recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_>
        Process: 40957 ExecStart=/usr/sbin/mysqld $_WSREP_START_POSITION (code=exited, status=0/SUCCESS)
        Process: 41021 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
        Process: 41023 ExecStartPost=/usr/bin/mysql-systemd start-post $MAINPID (code=exited, status=1/FAILURE)
        Process: 43786 ExecStopPost=/usr/bin/mysql-systemd stop-post (code=exited, status=0/SUCCESS)
       Main PID: 40957 (code=exited, status=0/SUCCESS)
         Status: "Server shutdown complete"
    
    
    Sep 01 11:45:33 ip-172-20-38-237 systemd[1]: Starting Percona XtraDB Cluster...
    Sep 01 12:00:42 ip-172-20-38-237 mysql-systemd[41023]:  ERROR!
    Sep 01 12:00:42 ip-172-20-38-237 systemd[1]: mysql.service: Control process exited, code=exited, status=1/FAILURE
    Sep 01 12:00:56 ip-172-20-38-237 mysql-systemd[43786]:  WARNING: mysql pid file /data/mysqld.pid empty or not readable
    Sep 01 12:00:56 ip-172-20-38-237 mysql-systemd[43786]:  WARNING: mysql may be already dead
    Sep 01 12:00:56 ip-172-20-38-237 systemd[1]: mysql.service: Failed with result 'exit-code'.
    Sep 01 12:00:56 ip-172-20-38-237 systemd[1]: Failed to start Percona XtraDB Cluster.
    

    Earlier I assumed it was shutting down due to inactivity, but it goes down after exactly 15 minutes every time, so inactivity seems unrelated.

    Is there something I am missing?
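
    (One observation on the 15 minutes: the start at 11:45:33 and the ERROR from mysql-systemd at 12:00:42 are roughly 15 minutes apart, and the unit fails in ExecStartPost while the main mysqld process exits cleanly, so the fixed interval could simply be a startup/readiness timeout rather than anything workload related. Assuming systemd, the relevant settings can be inspected as shown below; the /usr/bin/mysql-systemd start-post helper may also have its own wait loop.)

    systemctl show mysql.service -p TimeoutStartSec
    systemctl cat mysql.service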

  • nitish54 (Contributor)

    Also, I wanted to add, with reference to this post, that I have modified the following config:

    [mysqld_safe]
    pid-file            = /data/mysqld.pid
    log-error           = /var/log/mysql/error.log
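
    (Side note: since the node is managed by systemd, as in the status output above, I believe mysqld_safe is not used on such installs, so settings under [mysqld_safe] may simply be ignored by mysqld. The equivalent under [mysqld] would look roughly like:)

    [mysqld]
    pid-file            = /data/mysqld.pid
    log-error           = /var/log/mysql/error.log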
    

    But I have already given the required permissions:

    drwxr-x---  3 mysql mysql 4096 Sep 1 12:00 data
    

    Can you please tell me if this could be causing the issue in any way?

  • matthewb (Percona Staff)
    edited September 1

    drwxr-x--- 3 mysql mysql 4096 Sep 1 12:00 data

    Possibly. You do not have global read/execute privs on that directory. How do you expect systemd to read the pid file if you denied permissions? Try changing that using chmod a+x /data
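
    A quick way to double-check, assuming util-linux is available, is to walk the permissions along the whole path:

    namei -l /data/mysqld.pid
    ls -ld /data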

  • nitish54 (Contributor)

    Got it. Actually, it was creating the pid file somewhere else, at the default location (/var/run/mysql/).

    Pointing it to /data and giving the required permissions solved the shutdown issue. Just a small query though: when we install MySQL, we generally run it as a daemon process. Is that configuration removed for PXC nodes? I didn't find any file at /etc/init.d/mysqld.

    Do we not need to run PXC nodes as a daemon process?

  • matthewb (Percona Staff)

    /etc/init.d/mysqld

    This tells me you are on an OS older than CentOS6/RHEL6/Ubuntu12. You should consider upgrading your OS to something that is not going to be EOL'd soon.

    PXC is MySQL. So the scripts that start "mysql" also start PXC, because PXC is MySQL with additional features.
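
    On a systemd-based OS the service is managed with systemctl rather than an init script; for example, using the unit name from your status output above:

    systemctl enable mysql
    systemctl start mysql
    systemctl status mysql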
