
How to reset Percona XtraDB Cluster on all nodes?

porunov (Contributor)
Hello,

I have a problem with cluster testing. Following the documentation, I installed 3 nodes and connected them. Everything worked. Then I deliberately crashed all the nodes and brought up two of them (assuming the 3rd node is permanently dead).

I am using CentOS 7, and the mysql service starts automatically after a node restarts (systemctl enable mysql).

I see the following state on the nodes:
systemctl status mysql
Loaded: loaded (/usr/lib/systemd/system/mysql.service; enabled; vendor preset: disabled)
Active: activating (start-post) since Thu 2016-12-08 01:38:33 EET; 1h 57min left
Process: 926 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
Main PID: 984 (mysqld_safe); : 985 (mysql-systemd)
CGroup: /system.slice/mysql.service
├─ 984 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
├─1566 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --wsr...
└─control
├─ 985 /bin/bash -ue /usr/bin/mysql-systemd start-post 984
└─4168 sleep 1


They are waiting for the 3rd node. But, as I said, I assume it is permanently dead. So I tried to manually restart the first node as a new donor node.

First of all I tried to stop a node:
systemctl stop mysql

But it refuses to stop. I checked the status again:
Loaded: loaded (/usr/lib/systemd/system/mysql.service; enabled; vendor preset: disabled)
Active: deactivating (stop-sigterm) (Result: exit-code)
Process: 6950 ExecStop=/usr/bin/mysql-systemd stop (code=exited, status=2)
Process: 985 ExecStartPost=/usr/bin/mysql-systemd start-post $MAINPID (code=exited, status=1/FAILURE)
Process: 926 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
Main PID: 984 (mysqld_safe)
CGroup: /system.slice/mysql.service
├─ 984 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
└─1566 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --wsrep-provider=/usr/lib64/galera3/libgalera_smm.so --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --wsrep_start_position=43de3d74-bca8-11e6-a178-57b39b925285:9

Dec 08 01:38:33 GlusterDC1_1 systemd[1]: Starting Percona XtraDB Cluster...
Dec 08 01:38:35 GlusterDC1_1 mysqld_safe[984]: 2016-12-07T23:38:35.462867Z mysqld_safe Logging to '/var/log/mysqld.log'.
Dec 08 01:38:35 GlusterDC1_1 mysqld_safe[984]: 2016-12-07T23:38:35.708324Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Dec 08 01:38:35 GlusterDC1_1 mysqld_safe[984]: 2016-12-07T23:38:35.849329Z mysqld_safe Skipping wsrep-recover for 43de3d74-bca8-11e6-a178-57b39b925285:9 pair
Dec 08 01:38:35 GlusterDC1_1 mysqld_safe[984]: 2016-12-07T23:38:35.852782Z mysqld_safe Assigning 43de3d74-bca8-11e6-a178-57b39b925285:9 to wsrep_start_position
Dec 07 23:53:41 GlusterDC1_1 mysql-systemd[985]: ERROR!
Dec 07 23:53:41 GlusterDC1_1 systemd[1]: mysql.service: control process exited, code=exited status=1
Dec 07 23:53:41 GlusterDC1_1 mysql-systemd[6950]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Dec 07 23:53:41 GlusterDC1_1 mysql-systemd[6950]: ERROR! mysql already dead
Dec 07 23:53:41 GlusterDC1_1 systemd[1]: mysql.service: control process exited, code=exited status=2


Something clearly failed. So I decided to disable the mysql service and reboot the node. After the node came back up I ran:
systemctl start mysql@bootstrap.service

to bootstrap a new cluster. But node 2 would not connect to it. Then I brought node 3 back up and ran:
SET GLOBAL wsrep_provider_options='pc.bootstrap=true';

Nothing seems to work. I stopped all three nodes, and now all of them are waiting for something and don't work.

How do I reset all the nodes and rejoin them into a cluster?

Sincerely,
Alexandr

Comments

  • [NUT] (Contributor)
    My personal experience with PXC is that it does not like to be controlled using the systemd mechanism.

    Try starting the first (known good) node with:
    /etc/init.d/mysql bootstrap-pxc
    

    Once that is running, start any subsequent node with:
    /etc/init.d/mysql start
    

    The failed node might time out while trying to start/reconnect (depending on how much data it needs to sync), but after the timeout the sync may still be running, so check it before doing anything else with:
    ps aux | grep mysql
    

    Once it seems to have joined the cluster, it may still show as 'failed' in the systemd job status (with a timeout reason). To fix that you can issue a:
    /etc/init.d/mysql restart
    

    That is usually how I fix it when something similar happens to me.
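A note for recovering after a full-cluster crash (not from the thread, but standard Galera practice): each node records its last committed transaction in grastate.dat in the datadir, and the node with the highest seqno is normally the one to bootstrap. A seqno of -1 means the shutdown was unclean and the position has to be recovered first (e.g. with mysqld_safe --wsrep-recover, which the logs above reference). A minimal sketch, using a hypothetical get_seqno helper:

```shell
# Hypothetical helper: print the seqno recorded in a grastate.dat file.
# On a real node the file is /var/lib/mysql/grastate.dat.
get_seqno() {
    awk -F': *' '$1 == "seqno" { print $2 }' "$1"
}

# Run on every node, then bootstrap the one with the highest seqno:
#   get_seqno /var/lib/mysql/grastate.dat
```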
  • porunov (Contributor)
    Thank you for your suggestion! Unfortunately I have problems with bootstrapping: a node can't be bootstrapped while mysql is still alive, but I can't stop/kill mysql because it is waiting for something. I tried different ways to stop mysql, all without success. Here is what I tried:
    systemctl stop mysql

    it fails.

    killall -KILL mysql mysqld_safe mysqld

    It kills mysql for about 2 seconds, and then mysql is respawned, so I have no window in which to run:
    systemctl start mysql@bootstrap.service

    Also I tried
    kill -9 PID
    and
    kill -15 PID

    The same result as with "killall -KILL"

    Then I tried:
    /usr/local/mysql/bin/mysqladmin -u root -p shutdown

    It shows:
    mysqladmin: connect to server at 'localhost' failed
    error: 'Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)'
    Check that mysqld is running and that the socket: '/var/lib/mysql/mysql.sock' exists!

    I tried to start it with:
    systemctl start mysql

    But of course that doesn't work while the cluster isn't fully connected.

    Then I tried:
    /usr/bin/mysqld_multi stop

    No output.

    # /usr/bin/mysqld_safe stop
    2016-12-08T16:55:35.218919Z mysqld_safe Logging to '/var/log/mysqld.log'.
    2016-12-08T16:55:35.239877Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
    2016-12-08T16:55:35.249570Z mysqld_safe Skipping wsrep-recover for 43de3d74-bca8-11e6-a178-57b39b925285:9 pair
    2016-12-08T16:55:35.250808Z mysqld_safe Assigning 43de3d74-bca8-11e6-a178-57b39b925285:9 to wsrep_start_position
    2016-12-08T16:55:36.470775Z mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended

    Looking at the log file, we can see that after mysql stops it comes straight back (immortal mysql):

    2016-12-08T16:56:13.471971Z 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
    2016-12-08T16:56:13.471980Z 0 [Note] WSREP: restore pc from disk failed
    2016-12-08T16:56:13.472480Z 0 [Note] WSREP: GMCast version 0
    2016-12-08T16:56:13.472706Z 0 [Note] WSREP: (3a0efa51, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
    2016-12-08T16:56:13.472717Z 0 [Note] WSREP: (3a0efa51, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
    2016-12-08T16:56:13.473082Z 0 [Note] WSREP: EVS version 0
    2016-12-08T16:56:13.473168Z 0 [Note] WSREP: gcomm: bootstrapping new group 'pxc-cluster'
    2016-12-08T16:56:13.473208Z 0 [Note] WSREP: start_prim is enabled, turn off pc_recovery
    2016-12-08T16:56:13.473317Z 0 [ERROR] WSREP: Address already in use
    2016-12-08T16:56:13.473463Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 98: error while trying to listen 'tcp://0.0.0.0:4567?socket.non_blocking=1', asio error 'Address already in use': 98 (Address already in use)
    at gcomm/src/asio_tcp.cpp:listen():816
    2016-12-08T16:56:13.473479Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -98 (Address already in use)
    2016-12-08T16:56:13.473527Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1407: Failed to open channel 'pxc-cluster' at 'gcomm://192.168.0.120,192.168.0.121,192.168.0.122': -98 (Address already in use)
    2016-12-08T16:56:13.473537Z 0 [ERROR] WSREP: gcs connect failed: Address already in use
    2016-12-08T16:56:13.473543Z 0 [ERROR] WSREP: wsrep::connect(gcomm://192.168.0.120,192.168.0.121,192.168.0.122) failed: 7
    2016-12-08T16:56:13.473548Z 0 [ERROR] Aborting

    2016-12-08T16:56:13.473559Z 0 [Note] Forcefully disconnecting 0 remaining clients
    2016-12-08T16:56:13.473563Z 0 [Note] WSREP: Service disconnected.
    2016-12-08T16:56:14.474088Z 0 [Note] WSREP: Some threads may fail to exit.
    2016-12-08T16:56:14.474182Z 0 [Note] Binlog end
    2016-12-08T16:56:14.474392Z 0 [Note] /usr/sbin/mysqld: Shutdown complete

    2016-12-08T16:56:14.497916Z mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
    2016-12-08T16:56:22.669108Z 0 [Note] WSREP: view((empty))
    2016-12-08T16:56:22.669398Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
    at gcomm/src/pc.cpp:connect():162
    2016-12-08T16:56:22.669439Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
    2016-12-08T16:56:22.669843Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1407: Failed to open channel 'pxc-cluster' at 'gcomm://192.168.0.120,192.168.0.121,192.168.0.122': -110 (Connection timed out)
    2016-12-08T16:56:22.669880Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
    2016-12-08T16:56:22.669900Z 0 [ERROR] WSREP: wsrep::connect(gcomm://192.168.0.120,192.168.0.121,192.168.0.122) failed: 7
    2016-12-08T16:56:22.669912Z 0 [ERROR] Aborting

    2016-12-08T16:56:22.669938Z 0 [Note] Forcefully disconnecting 0 remaining clients
    2016-12-08T16:56:22.669951Z 0 [Note] WSREP: Service disconnected.
    2016-12-08T16:56:23.673653Z 0 [Note] WSREP: Some threads may fail to exit.
    2016-12-08T16:56:23.673688Z 0 [Note] Binlog end
    2016-12-08T16:56:23.673800Z 0 [Note] /usr/sbin/mysqld: Shutdown complete

    2016-12-08T16:56:23.689228Z mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
    2016-12-08T16:56:23.764524Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
    2016-12-08T16:56:23.773417Z mysqld_safe Skipping wsrep-recover for 43de3d74-bca8-11e6-a178-57b39b925285:9 pair
    2016-12-08T16:56:23.774686Z mysqld_safe Assigning 43de3d74-bca8-11e6-a178-57b39b925285:9 to wsrep_start_position
    2016-12-08T16:56:23.942531Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
    2016-12-08T16:56:23.943450Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.14-8-57-log) starting as process 20603 ...
    2016-12-08T16:56:23.945460Z 0 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=GlusterDC1_1-bin' to avoid this problem.
    2016-12-08T16:56:23.945632Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
    2016-12-08T16:56:23.945643Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera3/libgalera_smm.so'
    2016-12-08T16:56:23.948918Z 0 [Note] WSREP: wsrep_load(): Galera 3.17(r447d194) by Codership Oy <[email protected]> loaded successfully.
    2016-12-08T16:56:23.948967Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
    2016-12-08T16:56:23.949262Z 0 [Note] WSREP: Found saved state: 43de3d74-bca8-11e6-a178-57b39b925285:9
    2016-12-08T16:56:23.949709Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.0.120; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = fa
    2016-12-08T16:56:23.960759Z 0 [Note] WSREP: Service thread queue flushed.
    2016-12-08T16:56:23.960806Z 0 [Note] WSREP: Assign initial position for certification: 9, protocol version: -1
    2016-12-08T16:56:23.960822Z 0 [Note] WSREP: wsrep_sst_grab()
    2016-12-08T16:56:23.960827Z 0 [Note] WSREP: Start replication
    2016-12-08T16:56:23.960839Z 0 [Note] WSREP: Setting initial position to 43de3d74-bca8-11e6-a178-57b39b925285:9
    2016-12-08T16:56:23.960944Z 0 [Note] WSREP: protonet asio version 0
    2016-12-08T16:56:23.961054Z 0 [Note] WSREP: Using CRC-32C for message checksums.
    2016-12-08T16:56:23.961091Z 0 [Note] WSREP: backend: asio
    2016-12-08T16:56:23.961142Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0


    Also I tried:

    # /usr/bin/mysql-systemd stop
    SUCCESS! Stopping Percona XtraDB Cluster......

    Those are just words that mysql prints; they mean nothing, because mysql is still alive and broken.

    # ps -ef | grep 'mysql'
    root 22823 1 0 19:02 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe --basedir=/usr --wsrep-new-cluster
    mysql 23236 22823 0 19:02 ? 00:00:01 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --wsrep-provider=/usr/lib64/galera3/libgalera_smm.so --wsrep-new-cluster --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --wsrep_start_position=43de3d74-bca8-11e6-a178-57b39b925285:9
    root 23519 3143 0 19:05 pts/0 00:00:00 grep --color=auto mysql
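For this kind of kill-and-watch testing, a small poll loop (a hypothetical helper, not part of any Percona tooling) makes it easy to see whether mysqld actually stays down or is respawned within a timeout:

```shell
# wait_gone PID TIMEOUT_SECONDS
# Returns 0 once the process has exited, 1 if it is still alive after TIMEOUT.
wait_gone() {
    pid=$1; timeout=$2; waited=0
    while kill -0 "$pid" 2>/dev/null; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# e.g.: pid=$(cat /var/run/mysqld/mysqld.pid); kill "$pid"; wait_gone "$pid" 30
```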
  • porunov (Contributor)
    Looking at the status, we see that mysql is now failed and we cannot do anything with it.

    # systemctl status mysql
    ● mysql.service - Percona XtraDB Cluster
    Loaded: loaded (/usr/lib/systemd/system/mysql.service; enabled; vendor preset: disabled)
    Active: failed (Result: start-limit) since Thu 2016-12-08 18:56:26 EET; 10min ago
    Process: 20664 ExecStopPost=/usr/bin/mysql-systemd stop-post (code=exited, status=0/SUCCESS)
    Process: 20635 ExecStop=/usr/bin/mysql-systemd stop (code=exited, status=2)
    Process: 17926 ExecStartPost=/usr/bin/mysql-systemd start-post $MAINPID (code=exited, status=1/FAILURE)
    Process: 17925 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)
    Process: 20823 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=1/FAILURE)
    Main PID: 17925 (code=exited, status=0/SUCCESS)

    Dec 08 18:56:25 GlusterDC1_1 systemd[1]: mysql.service: control process exited, code=exited status=1
    Dec 08 18:56:25 GlusterDC1_1 systemd[1]: Failed to start Percona XtraDB Cluster.
    Dec 08 18:56:25 GlusterDC1_1 systemd[1]: Unit mysql.service entered failed state.
    Dec 08 18:56:25 GlusterDC1_1 systemd[1]: mysql.service failed.
    Dec 08 18:56:26 GlusterDC1_1 systemd[1]: mysql.service holdoff time over, scheduling restart.
    Dec 08 18:56:26 GlusterDC1_1 systemd[1]: start request repeated too quickly for mysql.service
    Dec 08 18:56:26 GlusterDC1_1 systemd[1]: Failed to start Percona XtraDB Cluster.
    Dec 08 18:56:26 GlusterDC1_1 systemd[1]: Unit mysql.service entered failed state.
    Dec 08 18:56:26 GlusterDC1_1 systemd[1]: mysql.service failed.
    Dec 08 19:04:17 GlusterDC1_1 systemd[1]: Stopped Percona XtraDB Cluster.


    This is the 3rd day I have been trying to figure out how to simply stop Percona XtraDB Cluster. It seems that only the developers know how to do it.
    Right now the best workaround is to not start mysql after a reboot. If it isn't started after the reboot then yes, your solution works just fine. But for me it is critical to keep the nodes running and recover Percona XtraDB Cluster without rebooting.

    Sincerely,
    Alexandr
  • porunov (Contributor)
    Here is also the journalctl -xe output when I try to bootstrap a new node:

    -- Unit [email protected] has begun starting up.
    Dec 08 18:33:18 GlusterDC1_1 mysql-systemd[11271]: State transfer in progress, setting sleep higher
    Dec 08 18:33:18 GlusterDC1_1 mysqld_safe[11270]: 2016-12-08T16:33:18.209277Z mysqld_safe Logging to '/var/log/mysqld.log'.
    Dec 08 18:33:18 GlusterDC1_1 mysqld_safe[11270]: 2016-12-08T16:33:18.229007Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
    Dec 08 18:33:18 GlusterDC1_1 mysqld_safe[11270]: 2016-12-08T16:33:18.237268Z mysqld_safe Skipping wsrep-recover for 43de3d74-bca8-11e6-a178-57b39b925285:9 pair
    Dec 08 18:33:18 GlusterDC1_1 mysqld_safe[11270]: 2016-12-08T16:33:18.238587Z mysqld_safe Assigning 43de3d74-bca8-11e6-a178-57b39b925285:9 to wsrep_start_position
    Dec 08 18:33:19 GlusterDC1_1 mysqld_safe[11270]: 2016-12-08T16:33:19.449360Z mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
    Dec 08 18:33:28 GlusterDC1_1 mysql-systemd[11271]: /usr/bin/mysql-systemd: line 137: kill: (11270) - No such process
    Dec 08 18:33:28 GlusterDC1_1 mysql-systemd[11271]: ERROR! mysqld_safe with PID 11270 has already exited: FAILURE
    Dec 08 18:33:28 GlusterDC1_1 systemd[1]: [email protected]: control process exited, code=exited status=1
    Dec 08 18:33:28 GlusterDC1_1 mysql-systemd[11722]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
    Dec 08 18:33:28 GlusterDC1_1 mysql-systemd[11722]: ERROR! mysql already dead
    Dec 08 18:33:28 GlusterDC1_1 systemd[1]: [email protected]: control process exited, code=exited status=2
    Dec 08 18:33:28 GlusterDC1_1 mysql-systemd[11751]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
    Dec 08 18:33:28 GlusterDC1_1 mysql-systemd[11751]: WARNING: mysql may be already dead
    Dec 08 18:33:28 GlusterDC1_1 systemd[1]: Failed to start Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap.
    -- Subject: Unit [email protected] has failed
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    --
    -- Unit [email protected] has failed.
    --
    -- The result is failed.
    Dec 08 18:33:28 GlusterDC1_1 systemd[1]: Unit [email protected] entered failed state.
    Dec 08 18:33:28 GlusterDC1_1 systemd[1]: [email protected] failed.
    Dec 08 18:33:28 GlusterDC1_1 systemd[1]: [email protected] holdoff time over, scheduling restart.
    Dec 08 18:33:28 GlusterDC1_1 systemd[1]: Starting Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap...
    -- Subject: Unit [email protected] has begun start-up
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    --
    -- Unit [email protected] has begun starting up.
    Dec 08 18:33:28 GlusterDC1_1 mysql-systemd[11825]: State transfer in progress, setting sleep higher
    Dec 08 18:33:28 GlusterDC1_1 mysqld_safe[11824]: 2016-12-08T16:33:28.736807Z mysqld_safe Logging to '/var/log/mysqld.log'.
    Dec 08 18:33:28 GlusterDC1_1 mysqld_safe[11824]: 2016-12-08T16:33:28.758482Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
    Dec 08 18:33:28 GlusterDC1_1 mysqld_safe[11824]: 2016-12-08T16:33:28.767379Z mysqld_safe Skipping wsrep-recover for 43de3d74-bca8-11e6-a178-57b39b925285:9 pair
    Dec 08 18:33:28 GlusterDC1_1 mysqld_safe[11824]: 2016-12-08T16:33:28.768738Z mysqld_safe Assigning 43de3d74-bca8-11e6-a178-57b39b925285:9 to wsrep_start_position
    Dec 08 18:33:29 GlusterDC1_1 mysqld_safe[11824]: 2016-12-08T16:33:29.985157Z mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
    Dec 08 18:33:38 GlusterDC1_1 mysql-systemd[11825]: /usr/bin/mysql-systemd: line 137: kill: (11824) - No such process
    Dec 08 18:33:38 GlusterDC1_1 mysql-systemd[11825]: ERROR! mysqld_safe with PID 11824 has already exited: FAILURE
    Dec 08 18:33:38 GlusterDC1_1 systemd[1]: [email protected]: control process exited, code=exited status=1
    Dec 08 18:33:38 GlusterDC1_1 mysql-systemd[12276]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
    Dec 08 18:33:38 GlusterDC1_1 mysql-systemd[12276]: ERROR! mysql already dead
    Dec 08 18:33:38 GlusterDC1_1 systemd[1]: [email protected]: control process exited, code=exited status=2
    Dec 08 18:33:38 GlusterDC1_1 mysql-systemd[12305]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
    Dec 08 18:33:38 GlusterDC1_1 mysql-systemd[12305]: WARNING: mysql may be already dead
    Dec 08 18:33:38 GlusterDC1_1 systemd[1]: Failed to start Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap.
    -- Subject: Unit [email protected] has failed
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    --
    -- Unit [email protected] has failed.
    --
    -- The result is failed.
    Dec 08 18:33:38 GlusterDC1_1 systemd[1]: Unit [email protected] entered failed state.
    Dec 08 18:33:38 GlusterDC1_1 systemd[1]: [email protected] failed.
    Dec 08 18:33:38 GlusterDC1_1 systemd[1]: [email protected] holdoff time over, scheduling restart.
    Dec 08 18:33:38 GlusterDC1_1 systemd[1]: Starting Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap...
    -- Subject: Unit [email protected] has begun start-up
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    --
    -- Unit [email protected] has begun starting up.
    Dec 08 18:33:39 GlusterDC1_1 mysql-systemd[12382]: State transfer in progress, setting sleep higher
    Dec 08 18:33:39 GlusterDC1_1 mysqld_safe[12381]: 2016-12-08T16:33:39.210703Z mysqld_safe Logging to '/var/log/mysqld.log'.
    Dec 08 18:33:39 GlusterDC1_1 mysqld_safe[12381]: 2016-12-08T16:33:39.235482Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
    Dec 08 18:33:39 GlusterDC1_1 mysqld_safe[12381]: 2016-12-08T16:33:39.245788Z mysqld_safe Skipping wsrep-recover for 43de3d74-bca8-11e6-a178-57b39b925285:9 pair
    Dec 08 18:33:39 GlusterDC1_1 mysqld_safe[12381]: 2016-12-08T16:33:39.247441Z mysqld_safe Assigning 43de3d74-bca8-11e6-a178-57b39b925285:9 to wsrep_start_position
    Dec 08 18:33:40 GlusterDC1_1 mysqld_safe[12381]: 2016-12-08T16:33:40.475170Z mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
    Dec 08 18:33:49 GlusterDC1_1 mysql-systemd[12382]: /usr/bin/mysql-systemd: line 137: kill: (12381) - No such process
    Dec 08 18:33:49 GlusterDC1_1 mysql-systemd[12382]: ERROR! mysqld_safe with PID 12381 has already exited: FAILURE
    Dec 08 18:33:49 GlusterDC1_1 systemd[1]: [email protected]: control process exited, code=exited status=1
    Dec 08 18:33:49 GlusterDC1_1 mysql-systemd[12833]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
    Dec 08 18:33:49 GlusterDC1_1 mysql-systemd[12833]: ERROR! mysql already dead
    Dec 08 18:33:49 GlusterDC1_1 systemd[1]: [email protected]: control process exited, code=exited status=2
    Dec 08 18:33:49 GlusterDC1_1 mysql-systemd[12862]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
    Dec 08 18:33:49 GlusterDC1_1 mysql-systemd[12862]: WARNING: mysql may be already dead
    Dec 08 18:33:49 GlusterDC1_1 systemd[1]: Failed to start Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap.
    -- Subject: Unit [email protected] has failed
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    --
    -- Unit [email protected] has failed.
    --
    -- The result is failed.
    Dec 08 18:33:49 GlusterDC1_1 systemd[1]: Unit [email protected] entered failed state.
    Dec 08 18:33:49 GlusterDC1_1 systemd[1]: [email protected] failed.
    Dec 08 18:33:49 GlusterDC1_1 systemd[1]: [email protected] holdoff time over, scheduling restart.
    Dec 08 18:33:49 GlusterDC1_1 systemd[1]: Starting Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap...
    -- Subject: Unit [email protected] has begun start-up
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    --
    -- Unit [email protected] has begun starting up.
    Dec 08 18:33:49 GlusterDC1_1 mysql-systemd[12936]: State transfer in progress, setting sleep higher
    Dec 08 18:33:49 GlusterDC1_1 mysqld_safe[12935]: 2016-12-08T16:33:49.733041Z mysqld_safe Logging to '/var/log/mysqld.log'.
    Dec 08 18:33:49 GlusterDC1_1 mysqld_safe[12935]: 2016-12-08T16:33:49.753820Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
    Dec 08 18:33:49 GlusterDC1_1 mysqld_safe[12935]: 2016-12-08T16:33:49.763018Z mysqld_safe Skipping wsrep-recover for 43de3d74-bca8-11e6-a178-57b39b925285:9 pair
    Dec 08 18:33:49 GlusterDC1_1 mysqld_safe[12935]: 2016-12-08T16:33:49.764291Z mysqld_safe Assigning 43de3d74-bca8-11e6-a178-57b39b925285:9 to wsrep_start_position
    Dec 08 18:33:50 GlusterDC1_1 mysqld_safe[12935]: 2016-12-08T16:33:50.977916Z mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
  • [NUT] (Contributor)
    You wrote so much I have to ask, just to make sure:
    1. Do you use a service watchdog? If so, is it still controlling MySQL?
    2. You say 'bootstrap a new node'; I hope you do not start it with the bootstrap keyword on the command line? Each node that should join an already existing cluster should be started as if it were a regular MySQL; only the very FIRST node needs the bootstrap. You actually create a NEW cluster with every node you start with 'bootstrap-pxc'.
    3. The failed starts (by both systemd and its own init.d script!) can occur because the sync is running and taking longer than expected; as I said before, that's true.
    4. If you have the luxury of starting over (without any databases to sync), it's probably the easiest and fastest way to troubleshoot.
    5. A shutdown will be held up (and mysql will get restarted, from what I have read and seen) by the wsrep cluster if the mysql node is seen as a 'Donor' to the other nodes while a sync is still in progress. This protects data integrity, which is the whole reason to install and use a cluster like PXC... it's simply doing its job ;)
    6. Don't cycle reboots/restarts too quickly; it disrupts the sync and master elections, which take some time to complete. (This is from my personal experience; I have fought with PXC and the differences in managing it myself.) In a normally running cluster, give the nodes at least 5 minutes to complete all tasks after rebooting/restarting one node, and only then reboot/restart the next node if needed. Extend this time by a minute for every 15 minutes of downtime of the node.
    It might be wise to put list item 4 to use if you can. It makes issues a whole lot easier to troubleshoot, and node startup/shutdown takes less time.
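The rule of thumb in item 6 can be made concrete with a tiny helper (hypothetical, purely to spell out the arithmetic: 5 minutes base, plus one minute per 15 minutes the node was down):

```shell
# suggested_wait DOWNTIME_MINUTES
# Prints how many minutes to wait before touching the next node.
suggested_wait() {
    echo $(( 5 + $1 / 15 ))
}
```

For example, a node that was down for 45 minutes would get a 5 + 3 = 8 minute grace period.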
  • porunov (Contributor)
    1. No. I have a bare CentOS 7 Minimal installation with Percona XtraDB Cluster on it.
    2. I used bootstrap only for the first node. After that I don't bootstrap (i.e. when a cluster is already running I don't use it).
    3. How do I kill the sync? After a node starts, it stays in the 'activating' state for 2 hours; if we try to stop it (systemctl stop mysql) it stays in the 'deactivating' state indefinitely. When it is in the 'deactivating' state we can't do anything except reboot the node (i.e. 'systemctl start mysql', 'systemctl stop mysql', and kill -9 PID_OF_MYSQL don't work).
    4. Sometimes the sync doesn't work. If a node is crashed (power off) and then started again (power on), it can stay in the 'activating' state indefinitely. I tried this with 3 VMs: I crashed one, booted it again, and it showed the 'activating' state (that was in the morning). By midday I checked whether it had synced, but it was still in the 'activating' state. You might think that is because of a very large database, but it isn't: I have only one database with about 10 tables, each holding 0 to 20 rows.
    6. I see. But there can be a situation where only one node is rebooted, and after the reboot it doesn't want to sync.

    Do you know a method to completely kill mysql without rebooting?
    Right now mysql starts automatically (systemctl enable mysql), but sometimes it doesn't want to connect to the cluster. For me the only way to bring the whole cluster back together is the following:
    1. On any node:
    systemctl disable mysql
    shutdown now
    2. On other nodes:
    shutdown now
    3. Start a node where mysql is disabled and use:
    systemctl start mysql@bootstrap.service
    systemctl enable mysql
    4. Start other nodes.

    The problem is that I need to restart all the nodes (not kill mysql and restart it, but restart every node where PXC is installed).
    If I were able to completely stop mysql, I could simply stop the whole PXC cluster and create a new one without restarting other services. Do you know a way to just stop mysql when it is in the 'activating' or 'deactivating' state?
  • [NUT] (Contributor)
    To answer the question in your last answer first:
    • disable the use of systemd (systemctl); PXC will not work properly with it (from my personal experience).
    • only use /etc/init.d/mysql to bootstrap and start/stop nodes.
    • only use /etc/init.d/mysql status to check its status (yes, I know it displays output similar to the systemd status, but while systemd might show mysql as down, the init.d script may show it as active; it really does check the status differently).

    To reply to your other answers to my questions:
    1. good.
    2. Ok, clear (just had to ask, to be sure)
    3. The sync is not meant to be killed; it syncs because the node is out of sync. Why would you want to kill it in the first place? If you kill the sync, the cluster nodes will run out of sync with different datasets, which makes no sense; you should run multiple standalone database servers if that is what you want.
    4. Correct, but this only happens when the other 2 nodes are in disarray. If your cluster is humming along happily when this happens, it should actually fence the broken node, discard its data if a regular sync is not an option (due to too much garbled data), and sync to the state of the other nodes. Only when it finally reaches the state the other nodes are in will it be unfenced and become online and active again. Sometimes this will actually fail; the only way around that is to wipe all the data on the node (empty the mysql data directory), run mysql_install_db (or mysqld --initialize) and mysql_secure_installation to start fresh, and let it connect to the cluster and sync as if it were a completely new node.
    5. (where is the answer to my actual point 4? Your 4 answers my 5.)
    6. see my answer to 3.

    I hope you'll figure it out soon, I know how frustrating these things can be...
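The start-fresh path described in item 4 can be sketched as follows (destructive; the datadir path and the init commands are assumptions based on the 5.7 logs earlier in the thread):

```shell
# Guard: refuse to re-initialize unless the datadir was emptied first.
datadir_is_empty() {
    [ -z "$(ls -A "$1" 2>/dev/null)" ]
}

# On the broken node (DESTROYS its local data), roughly:
#   /etc/init.d/mysql stop
#   rm -rf /var/lib/mysql/*
#   datadir_is_empty /var/lib/mysql && mysqld --initialize --user=mysql
#   /etc/init.d/mysql start   # rejoins the cluster and syncs as a new node
```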
  • porunov (Contributor)
    Thank you for your help!

    I looked in '/usr/lib/systemd/system/mysql.service' and found an interesting thing at the bottom:

    # Unsafe for PXC
    # mysqld_safe handles this too.
    Restart=always

    I don't know why it is enabled in the PXC installation if it is unsafe, but I commented it out:
    #Restart=always

    After that I rebooted my machines. Nothing changed at first, true. But now I can easily kill the mysql process if something goes wrong and start it again! So now I don't need to restart the machines at all. If something happens (stuck in the 'activating' state) I can stop/kill mysql and start it again!
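One refinement to this fix (an assumption on my part, standard systemd practice rather than anything Percona documents for this unit): rather than editing /usr/lib/systemd/system/mysql.service in place, where a package upgrade can restore Restart=always, put the override in a drop-in. A sketch with a hypothetical write_override helper:

```shell
# write_override DIR: write a systemd drop-in that disables auto-restart.
# On a real node DIR is /etc/systemd/system/mysql.service.d.
write_override() {
    mkdir -p "$1"
    printf '[Service]\nRestart=no\n' > "$1/no-restart.conf"
}

# On a real node, follow up with: systemctl daemon-reload
```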

    Sincerely,
    Alexandr

MySQL, InnoDB, MariaDB and MongoDB are trademarks of their respective owners.
Copyright ©2005 - 2020 Percona LLC. All rights reserved.