
MySQL stops handling requests when restarting mysql on other nodes --- donor/desync

shockwavecs Contributor Inactive User Role Beginner
In our cluster, a node will experience an issue from time to time. When this happens, nodes 2 and 3 will crash, resulting in:
ERROR! MySQL (Percona XtraDB Cluster) is not running, but PID file exists

If I restart mysql on the failed nodes, our Node 1 will no longer service mysql requests. Node 1 will show
| wsrep_local_state_comment  | Donor/Desynced
until Node 2 and Node 3 receive updates. After this, MySQL is OK.

This is a problem because I have to wait until late at night to restart nodes 2 and 3 so that the downtime does not affect our website.
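
For reference, this is how the state shows up when I check node 1 (standard wsrep status variables):

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';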

Comments

  • zmahomedy Entrant Inactive User Role Beginner
    I think you are having a split-brain situation.
    The rule is that after any kind of failure, a Galera node will
    consider itself part of the primary partition if it can still see a
    majority of the nodes that were in the cluster before the failure.
    Majority means more than 50%.
    So if you have 3 nodes and one goes away, the 2 remaining are fine.
    If you have 3 nodes and 2 go away simultaneously, the 1 remaining must
    assume it is the one having problems and will go offline.
    If you have 3 nodes and 1 goes away, then you have a 2-node cluster. This
    is not good: now if either node goes away, the other one alone is not
    in the majority, so it will have to go offline. The same is true if you
    have a 4-node cluster and simultaneously lose 2 nodes. Etc.
    But all is not lost. The node is still there, and if you as a human
    being know it is the right thing to do, then you can run some manual
    command to re-activate that node again (such as the command given by
    Haris, or just restart, etc...).

    There was a whole article on unknown commands and split-brain situations on the Percona site, but I can't seem to find it.
    In order to restore the cluster, execute the command below on the working node. It will make that node form the Primary Component again, and restarting the previously crashed nodes should hopefully let them rejoin the cluster.
    mysql> SET GLOBAL wsrep_provider_options='pc.bootstrap=true';
    (you can tell a node is stuck in this state because its wsrep_ready status variable is set to 0)
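    A rough way to verify the effect (these are standard Galera status variables, so they should apply here):
    mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';  -- should change from non-Primary to Primary after pc.bootstrap
    mysql> SHOW GLOBAL STATUS LIKE 'wsrep_ready';           -- should change from OFF (0) back to ON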
  • shockwavecs Contributor Inactive User Role Beginner
    Hey zmahomedy,
    Thanks for the reply. Sorry, I worded it incorrectly.

    Node 1 always works, serving mysql queries (r/w), until I restart the 2 dead nodes. The restart of the dead nodes is what prompts the good node to temporarily go offline while it acts as donor for them. After a full SST is sent, I have a 3-node cluster once again.
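    For what it's worth, how hard the donor is hit also depends on the SST method: rsync blocks the donor for the whole transfer, while xtrabackup-v2 keeps it mostly usable. A minimal my.cnf sketch, assuming percona-xtrabackup is installed on every node (the credentials are just placeholders):
    [mysqld]
    wsrep_sst_method = xtrabackup-v2
    wsrep_sst_auth   = sstuser:sstpassword   # hypothetical SST user; create it on the donor first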
  • przemek Percona Support Engineer Percona Staff Role
    The question is why node2 and node3 are crashing. Usually when a node is restarted, a full SST is not needed, just a fast IST. So perhaps your nodes are being shut down due to some inconsistency. The answer should be in their error logs.
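    A related thing to check while looking at the logs: IST is only possible if the donor's gcache still holds the write-sets the restarted node missed, so a larger gcache makes a full SST less likely. A sketch (the 2G value is only an example, not a recommendation):
    [mysqld]
    wsrep_provider_options = "gcache.size=2G"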
  • shockwavecs Contributor Inactive User Role Beginner
    OK, I think I have located the culprit. It looks like converting the MyISAM tables to InnoDB has caused the other nodes to crash. We have HAProxy sending traffic primarily to node 1, which then replicates the changes to nodes 2 and 3. Nodes 2 and 3 break after this and require a full SST. I made a new post here detailing the issue.
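    In case it helps anyone else: PXC only replicates InnoDB reliably, so before (and after) converting it may be worth listing what is still on MyISAM, e.g.:
    mysql> SELECT table_schema, table_name FROM information_schema.tables
        ->  WHERE engine = 'MyISAM'
        ->    AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');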