Primary node restart failed

marat.komarov
I've set up a simple master-slave configuration using weighted quorum: http://www.codership.com/wiki/doku.p...eighted_quorum

node-1 is a master:
wsrep_provider_options="pc.weight=1; gcs.fc_master_slave=yes"
wsrep_cluster_address=gcomm://node-1,node-2

node-2 is a slave:
wsrep_provider_options="pc.weight=0; gcs.fc_master_slave=yes"
wsrep_cluster_address=gcomm://node-1,node-2

/etc/hosts on both servers has the IP address mappings for node-1 and node-2.
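
To make the effect of these weights concrete, here is a minimal sketch of the weighted-quorum rule as I read it in the Galera documentation: a component stays primary only if the weight of its current members is strictly more than half of the weight of the last primary component, after subtracting members that left gracefully. The function name `has_quorum` and the dictionary layout are assumptions made for illustration; this is not Galera's actual code.

```python
def has_quorum(prev_primary, left_gracefully, current):
    """Return True if `current` may remain a primary component.

    Each argument maps a node name to its pc.weight:
      prev_primary    - members of the last primary component
      left_gracefully - members known to have left cleanly
      current         - members of the component being evaluated
    """
    prev = sum(prev_primary.values())
    left = sum(left_gracefully.values())
    now = sum(current.values())
    # Strict inequality: exactly half of the weight is not enough.
    return (prev - left) / 2 < now


weights = {"node-1": 1, "node-2": 0}

# node-1 is shut down cleanly, node-2 is left alone
node2_after_clean_stop = has_quorum(weights, {"node-1": 1}, {"node-2": 0})

# node-1 disappears without a clean shutdown, node-2 is left alone
node2_after_crash = has_quorum(weights, {}, {"node-2": 0})

# node-2 is shut down cleanly, node-1 is left alone
node1_alone = has_quorum(weights, {"node-2": 0}, {"node-1": 1})

print(node2_after_clean_stop, node2_after_crash, node1_alone)
```

Under this reading, node-2 with weight 0 can never hold the primary component by itself, so once node-1 goes down there is no primary component left for the restarted node-1 to join.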

When I restart MySQL on the master with `service mysql restart`, it fails to rejoin the cluster, and the server log shows a connection error (reported as "Connection timed out"):


130529 8:59:31 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer 'node-1:,node-2:'
130529 8:59:31 [Warning] WSREP: (9a694329-c85f-11e2-0800-3f3cae6230d1, 'tcp://0.0.0.0:4567') address 'tcp://10.137.48.175:4567' points to own listening address, blacklisting
130529 8:59:31 [Note] WSREP: (9a694329-c85f-11e2-0800-3f3cae6230d1, 'tcp://0.0.0.0:4567') address 'tcp://10.137.48.175:4567' pointing to uuid 9a694329-c85f-11e2-0800-3f3cae6230d1 is blacklisted, skipping
130529 8:59:31 [Note] WSREP: declaring b27d9646-c85b-11e2-0800-eeeca83be563 stable
130529 8:59:31 [Note] WSREP: view(view_id(NON_PRIM,9a694329-c85f-11e2-0800-3f3cae6230d1,16) memb {
9a694329-c85f-11e2-0800-3f3cae6230d1,
b27d9646-c85b-11e2-0800-eeeca83be563,
} joined {
} left {
} partitioned {
faec897c-c85e-11e2-0800-2721f22ff1fc,
})
130529 9:00:01 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():139
130529 9:00:01 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
130529 9:00:01 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'my_wsrep_cluster' at 'gcomm://node-1,node-2': -110 (Connection timed out)
130529 9:00:01 [ERROR] WSREP: gcs connect failed: Connection timed out
130529 9:00:01 [ERROR] WSREP: wsrep::connect() failed: 6
130529 9:00:01 [ERROR] Aborting
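
The key detail in the log above is the `NON_PRIM` state in the `view(view_id(...))` line. As a small, hedged helper for spotting that in a longer log, the sketch below extracts the state, UUID, and trailing number from such a line. The regular expression is based only on the line format shown above, and the names `parse_view` and `seq` are labels chosen for illustration.

```python
import re

# Matches e.g. "view(view_id(NON_PRIM,9a694329-...,16)"
VIEW_RE = re.compile(
    r"view\(view_id\((?P<state>\w+),(?P<uuid>[0-9a-f-]+),(?P<seq>\d+)\)"
)


def parse_view(line):
    """Return (state, uuid, seq) from a WSREP view line, or None if absent."""
    match = VIEW_RE.search(line)
    if match is None:
        return None
    return match.group("state"), match.group("uuid"), int(match.group("seq"))


sample = (
    "130529 8:59:31 [Note] WSREP: view(view_id(NON_PRIM,"
    "9a694329-c85f-11e2-0800-3f3cae6230d1,16) memb {"
)
print(parse_view(sample))
```

A line that yields `NON_PRIM` as the state means the nodes could see each other but did not form a primary component, which is consistent with the "failed to reach primary view" error that follows.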


To get node-1 back into the cluster, I started it with `service mysql start --wsrep-cluster-address="gcomm://"`.

My question: is this expected behavior? It is a bit strange to get a connectivity error while node-2 is up and listening on port 4567. When the cluster contains only one primary node, is it safe to restart the MySQL daemon on it?

Comments

  • percona.jayj (Percona Consulting Lead)
    Can you pin this behavior on the weighted quorum? Does it behave normally if you get rid of the weight?
  • marat.komarov
    Yes, without the 'weight' settings it works normally.
  • percona.jayj (Percona Consulting Lead)
    Then by all means file a bug! http://www.percona.com/doc/percona-x...bugreport.html :D

    I haven't tested this feature, so I can't vouch for it.
  • f.demassis
    Hello,

    I'm experiencing a similar situation in an identical two-node configuration (useful for small shops with no shared storage).
    When both nodes are restarted, it is necessary to reset the quorum (`service mysql bootstrap-pxc` on the master node; this probably also works on the slave) in order to restart the Galera cluster cleanly. So every restart of both nodes requires manual intervention.
    I tried to reproduce the problem by stopping each of the two nodes in turn. Is there any patch, or any configuration of the evs.* timing parameters, that avoids this problem?

    I cannot post the test sequence because of the character limit.

    Thanks in advance,
    Fabrizio
