"exception in PC" on node 1 -> whole 3 node cluster froze

hodkahodka
I have a 3-node setup: nodes 1 and 2 are in datacenter A, node 3 is in datacenter B.
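For context: the state dump below shows the nodes split across Galera segments 1 and 2, matching the two datacenters. A minimal sketch of that kind of configuration (host addresses and the cluster name are placeholders, not my actual values):

  # my.cnf on node 3 (datacenter B) -- illustrative sketch only
  [mysqld]
  wsrep_provider=/usr/lib/libgalera_smm.so
  wsrep_cluster_name=my_cluster
  wsrep_cluster_address=gcomm://10.0.1.11,10.0.1.12,10.0.2.13
  # gmcast.segment groups nodes by location: nodes 1 and 2 run segment=1 (A),
  # node 3 runs segment=2 (B), as seen in the state dump below
  wsrep_provider_options="gmcast.segment=2"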

Today, after a lot of cluster partitioning and resyncing, node 1 failed with the error shown below (see "ERROR ON NODE 1").

Afterwards, the whole cluster froze, with nodes 2 and 3 logging messages like the following:
2014-02-16 12:57:47 2595 [Note] WSREP: Nodes 62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d are still in unknown state, unable to rebootstrap new prim
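During the freeze, the cluster view on nodes 2 and 3 is presumably non-Primary (the node 1 log below ends with "view# -1: non-Primary"); that can be confirmed on each node with:

  SHOW STATUS LIKE 'wsrep_cluster_status';  -- expect 'non-Primary' while frozen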

Does anyone have any ideas on how to solve this issue?
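The only recovery path I know of so far, assuming the surviving nodes' data is consistent, is to find the most advanced node and force a new Primary Component on it; a sketch I have not yet dared to run:

  -- on each surviving node, find the highest committed seqno
  SHOW STATUS LIKE 'wsrep_last_committed';
  -- on the most advanced node only, force a new Primary Component
  SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';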

Thanks!

Frank.


ERROR ON NODE 1:

2014-02-16 12:41:02 2504 [ERROR] WSREP: caught exception in PC, state dump to stderr follows:
pc::Proto{uuid=62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,start_prim=0,npvo=0,ignore_sb=0,ignore_quorum=0,state=1,last_sent_seq=4,checksum=0,instances=
62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,prim=1,un=1,last_seq=4,last_prim=view_id(PRIM,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,46),to_seq=151253,weight=1,segment=1
6811f0c9-9367-11e3-9044-cb32ea280bb8,prim=0,un=0,last_seq=1,last_prim=view_id(PRIM,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,40),to_seq=151199,weight=1,segment=2
bbce851d-9367-11e3-8a0e-9a5107ef8b9f,prim=1,un=1,last_seq=43,last_prim=view_id(PRIM,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,46),to_seq=151253,weight=1,segment=1
,state_msgs=
62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,pcmsg{ type=STATE, seq=0, flags= 0, node_map { 62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,prim=1,un=0,last_seq=4,last_prim=view_id(PRIM,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,46),to_seq=151253,weight=1,segment=1
bbce851d-9367-11e3-8a0e-9a5107ef8b9f,prim=1,un=0,last_seq=43,last_prim=view_id(PRIM,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,46),to_seq=151253,weight=1,segment=1
}}
6811f0c9-9367-11e3-9044-cb32ea280bb8,pcmsg{ type=STATE, seq=0, flags= 0, node_map { 62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,prim=1,un=1,last_seq=4,last_prim=view_id(PRIM,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,40),to_seq=151199,weight=1,segment=1
6811f0c9-9367-11e3-9044-cb32ea280bb8,prim=0,un=0,last_seq=1,last_prim=view_id(PRIM,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,40),to_seq=151199,weight=1,segment=2
bbce851d-9367-11e3-8a0e-9a5107ef8b9f,prim=1,un=1,last_seq=2,last_prim=view_id(PRIM,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,40),to_seq=151199,weight=1,segment=1
}}
bbce851d-9367-11e3-8a0e-9a5107ef8b9f,pcmsg{ type=STATE, seq=0, flags= 0, node_map { 62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,prim=1,un=0,last_seq=4,last_prim=view_id(PRIM,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,46),to_seq=151253,weight=1,segment=1
bbce851d-9367-11e3-8a0e-9a5107ef8b9f,prim=1,un=0,last_seq=43,last_prim=view_id(PRIM,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,46),to_seq=151253,weight=1,segment=1
}}
,current_view=view(view_id(REG,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,49) memb {
62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,0
6811f0c9-9367-11e3-9044-cb32ea280bb8,0
bbce851d-9367-11e3-8a0e-9a5107ef8b9f,0
} joined {
6811f0c9-9367-11e3-9044-cb32ea280bb8,0
} left {
} partitioned {
}),pc_view=view(view_id(PRIM,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,46) memb {
62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,1
bbce851d-9367-11e3-8a0e-9a5107ef8b9f,1
} joined {
} left {
} partitioned {
}),mtu=32636}
2014-02-16 12:41:02 2504 [Note] WSREP: evs::msg{version=0,type=1,user_type=255,order=4,seq=0,seq_range=0,aru_seq=-1,flags=4,source=bbce851d-9367-11e3-8a0e-9a5107ef8b9f,source_view_id=view_id(REG,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,49),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=3202335,node_list=()
} 116
2014-02-16 12:41:02 2504 [ERROR] WSREP: exception caused by message: evs::msg{version=0,type=3,user_type=255,order=1,seq=0,seq_range=-1,aru_seq=0,flags=4,source=6811f0c9-9367-11e3-9044-cb32ea280bb8,source_view_id=view_id(REG,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,49),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=3202238,node_list=()
}
state after handling message: evs::proto(evs::proto(62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d, OPERATIONAL, view_id(REG,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,49)), OPERATIONAL) {
current_view=view(view_id(REG,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,49) memb {
62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,0
6811f0c9-9367-11e3-9044-cb32ea280bb8,0
bbce851d-9367-11e3-8a0e-9a5107ef8b9f,0
} joined {
} left {
} partitioned {
}),
input_map=evs::input_map: {aru_seq=0,safe_seq=0,node_index=node: {idx=0,range=[1,0],safe_seq=0} node: {idx=1,range=[1,0],safe_seq=0} node: {idx=2,range=[1,0],safe_seq=0} },
fifo_seq=3203297,
last_sent=0,
known={
62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,evs::node{operational=1,suspected=0,installed=1,fifo_seq=-1,}
6811f0c9-9367-11e3-9044-cb32ea280bb8,evs::node{operational=1,suspected=0,installed=1,fifo_seq=3202238,}
bbce851d-9367-11e3-8a0e-9a5107ef8b9f,evs::node{operational=1,suspected=0,installed=1,fifo_seq=3202337,}
}
}2014-02-16 12:41:02 2504 [ERROR] WSREP: exception from gcomm, backend must be restarted: msg_state == local_state: 62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d node 62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d prim state message and local states not consistent: msg node prim=1,un=0,last_seq=4,last_prim=view_id(PRIM,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,46),to_seq=151253,weight=1,segment=1 local state prim=1,un=1,last_seq=4,last_prim=view_id(PRIM,62b5e5f0-9367-11e3-a0ac-abaf5f8dac6d,46),to_seq=151253,weight=1,segment=1 (FATAL)
at gcomm/src/pc_proto.cpp:validate_state_msgs():607
2014-02-16 12:41:02 2504 [Note] WSREP: Received self-leave message.
2014-02-16 12:41:02 2504 [Note] WSREP: Flow-control interval: [0, 0]
2014-02-16 12:41:02 2504 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2014-02-16 12:41:02 2504 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 2988725)
2014-02-16 12:41:02 2504 [Note] WSREP: RECV thread exiting 0: Success
2014-02-16 12:41:02 2504 [Note] WSREP: New cluster view: global state: 5dd126ae-2944-11e3-9d8e-a65147a95bff:2988725, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 2
2014-02-16 12:41:17 2504 [Note] WSREP: applier thread exiting (code:0)
2014-02-16 16:17:15 2504 [Warning] WSREP: gcs_caused() returned -103 (Software caused connection abort)
2014-02-16 16:17:15 2504 [Warning] WSREP: gcs_caused() returned -103 (Software caused connection abort)
2014-02-16 16:25:03 2504 [Note] /usr/sbin/mysqld: Normal shutdown

Comments

  • hodkahodka
    Well, I moved my third node to the same datacenter as nodes 1 and 2, and hope that works as a work-around. It isn't a real solution, though; I thought XtraDB Cluster was capable of running with nodes distributed across datacenters (see the timeout sketch below).
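    From what I have read, multi-datacenter (WAN) deployments usually need relaxed evs timeouts so brief inter-DC blips don't keep partitioning the cluster. A sketch of the commonly documented WAN settings (values illustrative, untested here):

      wsrep_provider_options="gmcast.segment=2; evs.keepalive_period=PT3S; evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M; evs.install_timeout=PT1M"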
