Hello,
I run Percona XtraDB Cluster on 2 nodes, 10.0.0.91 (pxc-01) and 10.0.0.92 (pxc-02), plus 1 arbitrator (garb) on 10.0.0.10.
Today, I shut down the 10.0.0.92 (pxc-02) machine.
10.0.0.91 (pxc-01) became non-primary.
I thought the arbitrator was supposed to take over and declare 10.0.0.91 (pxc-01) primary to keep it working?
After 1 minute, I had to run "SET GLOBAL wsrep_provider_options='pc.bootstrap=true';" on 10.0.0.91 to resolve the situation.
Any ideas why?
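For reference, this is how I understand the quorum rule. The following is a rough Python sketch of the weighted-quorum formula described in the Galera documentation, assuming every member (garb included) has the default weight of 1. The function and variable names are mine, for illustration only:

```python
# Rough sketch of Galera's weighted-quorum rule (illustrative, not Galera code).
# Assumption: every member, including garb, has the default weight of 1.

def keeps_primary(last_primary, left_gracefully, current):
    """Return True if `current` may form a Primary Component.

    Each argument maps a node name to its weight:
      last_primary     members of the last seen Primary Component
      left_gracefully  members known to have left cleanly
      current          members of the new component
    """
    needed = (sum(last_primary.values()) - sum(left_gracefully.values())) / 2
    return sum(current.values()) > needed


last_primary = {"pxc-01": 1, "pxc-02": 1, "garb": 1}

# What I expected: pxc-02 is lost without a clean leave, while pxc-01 and
# garb still see each other, so 2 > 1.5 and the component stays Primary.
print(keeps_primary(last_primary, {}, {"pxc-01": 1, "garb": 1}))  # True

# What the views in the logs below show (memb_num = 1 on both garb and
# pxc-01): each node alone, so 1 > 1.5 is false and it goes Non-Primary.
print(keeps_primary(last_primary, {}, {"pxc-01": 1}))  # False
```

By this rule, 2 surviving members out of 3 should have stayed Primary, which is why I expected the arbitrator to keep pxc-01 working.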
Here are the logs from the arbitrator:
2016-03-24 05:39:35.073 INFO: (b23eccb0, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.0.0.92:4567
2016-03-24 05:39:36.573 INFO: (b23eccb0, 'tcp://0.0.0.0:4567') reconnecting to c2c3d173 (tcp://10.0.0.92:4567), attempt 0
2016-03-24 05:39:36.972 INFO: evs::proto(b23eccb0, OPERATIONAL, view_id(REG,0a880fcf,7)) suspecting node: c2c3d173
2016-03-24 05:39:36.972 INFO: evs::proto(b23eccb0, OPERATIONAL, view_id(REG,0a880fcf,7)) suspected node without join message, declaring inactive
2016-03-24 05:39:44.472 WARN: evs::proto(b23eccb0, GATHER, view_id(REG,0a880fcf,7)) install timer expired
2016-03-24 05:39:44.472 INFO: no install message received
2016-03-24 05:39:44.472 INFO: view(view_id(NON_PRIM,0a880fcf,7) memb {
b23eccb0,0
} joined {
} left {
} partitioned {
0a880fcf,2
c2c3d173,2
})
2016-03-24 05:39:44.472 INFO: view(view_id(NON_PRIM,b23eccb0,8) memb {
b23eccb0,0
} joined {
} left {
} partitioned {
0a880fcf,2
c2c3d173,2
})
2016-03-24 05:39:44.472 INFO: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2016-03-24 05:39:44.472 INFO: Flow-control interval: [9999999, 9999999]
2016-03-24 05:39:44.472 INFO: Received NON-PRIMARY.
2016-03-24 05:39:44.472 INFO: Shifting SYNCED -> OPEN (TO: 154125793)
2016-03-24 05:39:44.472 INFO: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2016-03-24 05:39:44.472 INFO: Flow-control interval: [9999999, 9999999]
2016-03-24 05:39:44.472 INFO: Received NON-PRIMARY.
2016-03-24 05:40:03.978 INFO: declaring 0a880fcf at tcp://10.0.0.91:4567 stable
2016-03-24 05:40:03.979 INFO: view(view_id(NON_PRIM,0a880fcf,9) memb {
0a880fcf,2
b23eccb0,0
} joined {
} left {
} partitioned {
c2c3d173,2
})
2016-03-24 05:40:03.979 INFO: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
2016-03-24 05:40:03.979 INFO: Flow-control interval: [9999999, 9999999]
2016-03-24 05:40:03.979 INFO: Received NON-PRIMARY.
2016-03-24 05:40:50.271 INFO: view(view_id(PRIM,0a880fcf,9) memb {
0a880fcf,2
b23eccb0,0
} joined {
} left {
} partitioned {
c2c3d173,2
})
2016-03-24 05:40:50.271 INFO: save pc into disk
2016-03-24 05:40:50.271 WARN: open file(./gvwstate.dat.tmp) failed(Permission denied)
2016-03-24 05:40:50.271 INFO: forgetting c2c3d173 (tcp://10.0.0.92:4567)
2016-03-24 05:40:50.271 INFO: New COMPONENT: primary = yes, bootstrap = yes, my_idx = 1, memb_num = 2
2016-03-24 05:40:50.271 INFO: (b23eccb0, 'tcp://0.0.0.0:4567') turning message relay requesting off
2016-03-24 05:40:50.271 INFO: STATE EXCHANGE: Waiting for state UUID.
2016-03-24 05:40:50.272 INFO: STATE EXCHANGE: sent state msg: d2125aae-f169-11e5-a54f-2f2fb5523976
2016-03-24 05:40:50.272 INFO: STATE EXCHANGE: got state msg: d2125aae-f169-11e5-a54f-2f2fb5523976 from 0 (pxc-01)
2016-03-24 05:40:50.272 INFO: STATE EXCHANGE: got state msg: d2125aae-f169-11e5-a54f-2f2fb5523976 from 1 (garb)
2016-03-24 05:40:50.272 WARN: Quorum: No node with complete state:
2016-03-24 05:40:50.272 INFO: Partial re-merge of primary b28e0200-efcc-11e5-894c-7bf586b54a55 found: 2 of 3.
2016-03-24 05:40:50.272 INFO: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 7,
members = 2/2 (joined/total),
act_id = 154125793,
last_appl. = 154125696,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = 7f16b5ae-d1f6-11e5-9382-4ec823ecea7a
2016-03-24 05:40:50.272 INFO: Flow-control interval: [9999999, 9999999]
2016-03-24 05:40:50.272 INFO: Restored state OPEN -> SYNCED (154125793)
2016-03-24 05:40:53.076 INFO: cleaning up c2c3d173 (tcp://10.0.0.92:4567)
Here are the logs from the remaining node (pxc-01):
2016-03-24 05:39:35 1811 [Note] WSREP: (0a880fcf, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.0.0.92:4567
2016-03-24 05:39:36 1811 [Note] WSREP: (0a880fcf, 'tcp://0.0.0.0:4567') reconnecting to c2c3d173 (tcp://10.0.0.92:4567), attempt 0
2016-03-24 05:40:01 1811 [Note] WSREP: evs::proto(0a880fcf, GATHER, view_id(REG,0a880fcf,7)) suspecting node: c2c3d173
2016-03-24 05:40:01 1811 [Note] WSREP: evs::proto(0a880fcf, GATHER, view_id(REG,0a880fcf,7)) suspected node without join message, declaring inactive
2016-03-24 05:40:02 1811 [Note] WSREP: view(view_id(NON_PRIM,0a880fcf,7) memb {
0a880fcf,2
} joined {
} left {
} partitioned {
b23eccb0,0
c2c3d173,2
})
2016-03-24 05:40:02 1811 [Note] WSREP: view(view_id(NON_PRIM,0a880fcf,8) memb {
0a880fcf,2
} joined {
} left {
} partitioned {
b23eccb0,0
c2c3d173,2
})
2016-03-24 05:40:02 1811 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2016-03-24 05:40:02 1811 [Note] WSREP: Flow-control interval: [16, 16]
2016-03-24 05:40:02 1811 [Note] WSREP: Received NON-PRIMARY.
2016-03-24 05:40:02 1811 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 154125793)
2016-03-24 05:40:02 1811 [Note] WSREP: New cluster view: global state: 7f16b5ae-d1f6-11e5-9382-4ec823ecea7a:154125793, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2016-03-24 05:40:02 1811 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2016-03-24 05:40:02 1811 [Note] WSREP: Flow-control interval: [16, 16]
2016-03-24 05:40:02 1811 [Note] WSREP: Received NON-PRIMARY.
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 1119, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 336, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 1816, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 516, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 504, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 504, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 528, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 504, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 504, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 914, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 336, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 873, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 336, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 347, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 347, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 1570, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 771, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 783, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 843, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 440, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 440, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 409, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 440, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 415, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 509, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 500, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 538, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Note] WSREP: New cluster view: global state: 7f16b5ae-d1f6-11e5-9382-4ec823ecea7a:154125793, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2016-03-24 05:40:03 1811 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 805, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Warning] WSREP: Send action {(nil), 538, TORDERED} returned -107 (Transport endpoint is not connected)
2016-03-24 05:40:03 1811 [Note] WSREP: declaring b23eccb0 at tcp://10.0.0.10:4567 stable
2016-03-24 05:40:03 1811 [Note] WSREP: view(view_id(NON_PRIM,0a880fcf,9) memb {
0a880fcf,2
b23eccb0,0
} joined {
} left {
} partitioned {
c2c3d173,2
})
2016-03-24 05:40:03 1811 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 2
2016-03-24 05:40:03 1811 [Note] WSREP: Flow-control interval: [23, 23]
2016-03-24 05:40:03 1811 [Note] WSREP: Received NON-PRIMARY.
2016-03-24 05:40:03 1811 [Note] WSREP: New cluster view: global state: 7f16b5ae-d1f6-11e5-9382-4ec823ecea7a:154125793, view# -1: non-Primary, number of nodes: 2, my index: 0, protocol version 3
2016-03-24 05:40:03 1811 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-03-24 05:40:50 1811 [Note] WSREP: view(view_id(PRIM,0a880fcf,9) memb {
0a880fcf,2
b23eccb0,0
} joined {
} left {
} partitioned {
c2c3d173,2
})
2016-03-24 05:40:50 1811 [Note] WSREP: save pc into disk
2016-03-24 05:40:50 1811 [Note] WSREP: forgetting c2c3d173 (tcp://10.0.0.92:4567)
2016-03-24 05:40:50 1811 [Note] WSREP: deleting entry tcp://10.0.0.92:4567
2016-03-24 05:40:50 1811 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = yes, my_idx = 0, memb_num = 2
2016-03-24 05:40:50 1811 [Note] WSREP: (0a880fcf, 'tcp://0.0.0.0:4567') turning message relay requesting off
2016-03-24 05:40:50 1811 [Note] WSREP: STATE_EXCHANGE: sent state UUID: d2125aae-f169-11e5-a54f-2f2fb5523976
2016-03-24 05:40:50 1811 [Note] WSREP: STATE EXCHANGE: sent state msg: d2125aae-f169-11e5-a54f-2f2fb5523976
2016-03-24 05:40:50 1811 [Note] WSREP: STATE EXCHANGE: got state msg: d2125aae-f169-11e5-a54f-2f2fb5523976 from 0 (pxc-01)
2016-03-24 05:40:50 1811 [Note] WSREP: STATE EXCHANGE: got state msg: d2125aae-f169-11e5-a54f-2f2fb5523976 from 1 (garb)
2016-03-24 05:40:50 1811 [Warning] WSREP: Quorum: No node with complete state:
Version : 3
Flags : 0x7
Protocols : 0 / 7 / 3
State : NON-PRIMARY
Prim state : SYNCED
Prim UUID : b28e0200-efcc-11e5-894c-7bf586b54a55
Prim seqno : 7
First seqno : 154012102
Last seqno : 154125793
Prim JOINED : 3
State UUID : d2125aae-f169-11e5-a54f-2f2fb5523976
Group UUID : 7f16b5ae-d1f6-11e5-9382-4ec823ecea7a
Name : 'pxc-01'
Incoming addr: '10.0.0.91:3306'
Version : 3
Flags : 0xe
Protocols : 0 / 127 / 127
State : NON-PRIMARY
Prim state : SYNCED
Prim UUID : b28e0200-efcc-11e5-894c-7bf586b54a55
Prim seqno : 7
First seqno : -1
Last seqno : 154125793
Prim JOINED : 3
State UUID : d2125aae-f169-11e5-a54f-2f2fb5523976
Group UUID : 7f16b5ae-d1f6-11e5-9382-4ec823ecea7a
Name : 'garb'
Incoming addr: ''
2016-03-24 05:40:50 1811 [Note] WSREP: Partial re-merge of primary b28e0200-efcc-11e5-894c-7bf586b54a55 found: 2 of 3.
2016-03-24 05:40:50 1811 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 7,
members = 2/2 (joined/total),
act_id = 154125793,
last_appl. = 154125672,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = 7f16b5ae-d1f6-11e5-9382-4ec823ecea7a
2016-03-24 05:40:50 1811 [Note] WSREP: Flow-control interval: [23, 23]
2016-03-24 05:40:50 1811 [Note] WSREP: Restored state OPEN -> SYNCED (154125793)
2016-03-24 05:40:50 1811 [Note] WSREP: New cluster view: global state: 7f16b5ae-d1f6-11e5-9382-4ec823ecea7a:154125793, view# 8: Primary, number of nodes: 2, my index: 0, protocol version 3
2016-03-24 05:40:50 1811 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-03-24 05:40:50 1811 [Note] WSREP: REPL Protocols: 7 (3, 2)
2016-03-24 05:40:50 1811 [Note] WSREP: Service thread queue flushed.
2016-03-24 05:40:50 1811 [Note] WSREP: Assign initial position for certification: 154125793, protocol version: 3
2016-03-24 05:40:50 1811 [Note] WSREP: Service thread queue flushed.
2016-03-24 05:40:50 1811 [Note] WSREP: Synchronized with group, ready for connections
2016-03-24 05:40:50 1811 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-03-24 05:40:53 1811 [Note] WSREP: cleaning up c2c3d173 (tcp://10.0.0.92:4567)