
PXC inconsistent view after network crash

La Cancellera Yoann
Hi,


We had a cluster outage too: some machines restarted, and the nodes were able to start again and see each other, but could not reach a Primary state.
We are running two PXC 5.7.26 nodes plus garbd.

The logs look normal on every node except one, which shows this line (yes, the closing parenthesis really is missing in the original):
2020-04-06T07:35:41.992687+01:00 0 [Warning] WSREP: node uuid: 366255e7 last_prim(type: 3, uuid: 06eac166) is inconsistent to restored view(type: V_NON_PRIM, uuid: 06eac166

In the error.log, these exact messages loop continuously.

Where does this warning come from?
  • I still have not found it in the source code
  • I can't find any information about it, apart from the post I linked, which has no resolution
  • I checked manually: every node has the exact same view (same view number, same number of nodes, same node IDs, same record of who joined, who left, and who is "partitioned")
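
That manual view comparison can be scripted. Below is a minimal sketch that flags any view-related wsrep status variable differing between nodes; it assumes you have already collected the output of SHOW STATUS LIKE 'wsrep%' from each node into per-node dicts, and the node names and sample values shown are hypothetical:

```python
# Minimal sketch: compare view-related wsrep status variables across nodes.
# The per-node values below are hypothetical sample data; in practice they
# would come from `SHOW STATUS LIKE 'wsrep%'` on each node.

VIEW_KEYS = [
    "wsrep_cluster_conf_id",    # view number
    "wsrep_cluster_size",       # number of nodes in the view
    "wsrep_cluster_state_uuid", # cluster state UUID
    "wsrep_cluster_status",     # Primary / non-Primary
]

def find_view_mismatches(status_by_node):
    """Return {variable: {node: value}} for variables that differ between nodes."""
    mismatches = {}
    for key in VIEW_KEYS:
        values = {node: status.get(key) for node, status in status_by_node.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches

# Hypothetical sample data mirroring this incident: node1 still thinks it is Primary.
status_by_node = {
    "node1": {"wsrep_cluster_conf_id": "42", "wsrep_cluster_size": "3",
              "wsrep_cluster_state_uuid": "8c88b4da", "wsrep_cluster_status": "Primary"},
    "node2": {"wsrep_cluster_conf_id": "42", "wsrep_cluster_size": "3",
              "wsrep_cluster_state_uuid": "8c88b4da", "wsrep_cluster_status": "non-Primary"},
}

print(find_view_mismatches(status_by_node))
```

In my case all of the view values matched, which is exactly what made the warning so puzzling.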

Also, this same node has a different value for "protocol" in this kind of log:
2020-04-06T07:11:30.421833+01:00 1 [Note] WSREP: New cluster view: global state: 8c88b4da-c72f-11e4-b875-b742ccccfc53:76007269, view# -1: non-Primary, number of nodes: 3, my index: 0, protocol version -1
The other nodes have "protocol version 3". What does this difference mean?


Thank you



Best Answer

  • Kamil Holubicki (Percona Staff)
    Accepted Answer
    Hi,
    The warning comes from percona-xtradb-cluster-galera/gcomm/src/pc_proto.cpp, gcomm::pc::Proto::deliver_view() line 252.
    The protocol version -1 is the consequence of the node not being joined to the cluster.
    You could try to collect more debug info by setting the following in my.cnf on all nodes: 
    wsrep_debug=1
    wsrep_provider_options="debug = yes;other_options_here"
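
    Assuming a standard PXC 5.7 setup, those two settings would go into the [mysqld] section of my.cnf on each node, something like the following sketch ("other_options_here" stands for whatever provider options you already have configured):

    ```ini
    [mysqld]
    # Extra wsrep debug logging; very verbose, remove once diagnosis is done
    wsrep_debug=1
    # Keep your existing provider options and add "debug = yes"
    wsrep_provider_options="debug = yes;other_options_here"
    ```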

Answers

  • La Cancellera Yoann
    Hi,
    Thanks a lot, I will continue from there and update my post if I find anything interesting
  • La Cancellera Yoann
    edited April 8
    For the sake of anyone searching for a similar issue, here is my understanding of what happened:

    We had a VM outage, with each node on its own VM.
    It turns out one PXC node (1) did not acknowledge that garbd came back with a new UUID. It kept the old UUID in its member list, along with the state it had before the incident, hence PRIMARY (type 3).

    The other PXC node (2) logged "WSREP: remote endpoint tcp://x.x.x.x:4567 changed identity 366255e7 -> 0b05842a", while node (1) kept both UUIDs.

    Node (1) would then report its view as "inconsistent", because it still had node 366255e7 stored with last_prim = PRIMARY, while every other node was reporting NOT PRIMARY.

    The fun thing is that I could restart node (1) or garbd as many times as I wanted: it still kept the old UUID, and even added one more stale UUID each time garbd was restarted, while node (2) always acknowledged the identity changes.

    I had to shut everything down and bootstrap the cluster again.
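
    For reference, the full shutdown and re-bootstrap went roughly as follows. This is a sketch of the standard PXC 5.7 bootstrap procedure, not the exact commands from this incident; the service names assume a systemd install:

    ```shell
    # On ALL nodes: stop MySQL (and stop garbd on its host)
    systemctl stop mysql

    # On the node with the most advanced state (highest seqno in
    # /var/lib/mysql/grastate.dat), mark it safe to bootstrap by
    # setting the following line in grastate.dat:
    #   safe_to_bootstrap: 1

    # Bootstrap that node as a new Primary Component
    systemctl start mysql@bootstrap.service

    # Then start the remaining node(s) and garbd normally; they rejoin via SST/IST
    systemctl start mysql
    ```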

MySQL, InnoDB, MariaDB and MongoDB are trademarks of their respective owners.
Copyright ©2005 - 2020 Percona LLC. All rights reserved.