We have been running a three-node cluster in production for about two months now. We upgraded from 5.5 to the latest 5.6 with Galera 3.x without issues. We use the third node (call it C) as an async master for a 5.6 slave at our offsite location, replicating over an SSH tunnel. Node C does not serve any traffic. We send all writes to node A and use nodes A and B for reads.
The problem we are seeing is that the third node (C) keeps disconnecting from the cluster and then rejoins by performing an IST. I have no idea why this is happening, but it also seems to corrupt the async slave when it happens.
Here is the full log from cluster member C leaving and rejoining the cluster. We have reinstalled this member on new hardware and renamed it, but no matter what, it always disconnects. The other members are humming along just fine. The problem seems to be limited to the third node to join.
ERROR LOG - attachments don’t work
2014-05-18 14:31:43 3167 [Note] WSREP: (c008a8e1-de1c-11e3-913f-d6d7b3e7ba76, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://XXX.XXX:4567
2014-05-18 14:31:43 3167 [Note] WSREP: (c008a8e1-de1c-11e3-913f-d6d7b3e7ba76, 'tcp://0.0.0.0:4567') turning message relay requesting off
2014-05-18 14:31:43 3167 [Warning] WSREP: last inactive check more than PT1.5S ago (PT5.55984S), skipping check
2014-05-18 14:31:44 3167 [Note] WSREP: view(view_id(NON_PRIM,1469ea44-daff-11e3-97a5-8bd469d0b8fc,376) memb {
c008a8e1-de1c-11e3-913f-d6d7b3e7ba76,0
} joined {
} left {
} partitioned {
1469ea44-daff-11e3-97a5-8bd469d0b8fc,0
d62ce8fe-dad9-11e3-a835-d29c74758152,0
})
2014-05-18 14:31:44 3167 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2014-05-18 14:31:44 3167 [Note] WSREP: Flow-control interval: [500, 500]
2014-05-18 14:31:44 3167 [Note] WSREP: Received NON-PRIMARY.
2014-05-18 14:31:44 3167 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 58520762)
2014-05-18 14:31:44 3167 [Note] WSREP: New cluster view: global state: 95890958-8d33-11e3-aea7-ff8c6255f30b:58520762, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
2014-05-18 14:31:44 3167 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-05-18 14:31:44 3167 [Note] WSREP: view(view_id(NON_PRIM,c008a8e1-de1c-11e3-913f-d6d7b3e7ba76,377) memb {
c008a8e1-de1c-11e3-913f-d6d7b3e7ba76,0
} joined {
} left {
} partitioned {
1469ea44-daff-11e3-97a5-8bd469d0b8fc,0
d62ce8fe-dad9-11e3-a835-d29c74758152,0
})
2014-05-18 14:31:44 3167 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2014-05-18 14:31:44 3167 [Note] WSREP: Flow-control interval: [500, 500]
2014-05-18 14:31:44 3167 [Note] WSREP: Received NON-PRIMARY.
2014-05-18 14:31:44 3167 [Note] WSREP: New cluster view: global state: 95890958-8d33-11e3-aea7-ff8c6255f30b:58520762, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
2014-05-18 14:31:44 3167 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-05-18 14:31:45 3167 [Note] WSREP: declaring 1469ea44-daff-11e3-97a5-8bd469d0b8fc stable
2014-05-18 14:31:45 3167 [Note] WSREP: declaring d62ce8fe-dad9-11e3-a835-d29c74758152 stable
2014-05-18 14:31:46 3167 [Note] WSREP: Node 1469ea44-daff-11e3-97a5-8bd469d0b8fc state prim
2014-05-18 14:31:46 3167 [Note] WSREP: view(view_id(PRIM,1469ea44-daff-11e3-97a5-8bd469d0b8fc,378) memb {
1469ea44-daff-11e3-97a5-8bd469d0b8fc,0
c008a8e1-de1c-11e3-913f-d6d7b3e7ba76,0
d62ce8fe-dad9-11e3-a835-d29c74758152,0
} joined {
} left {
} partitioned {
})
2014-05-18 14:31:46 3167 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
2014-05-18 14:31:46 3167 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2014-05-18 14:31:46 3167 [Note] WSREP: STATE EXCHANGE: sent state msg: 23c6bd5c-de99-11e3-9cd7-bb91f8f93683
2014-05-18 14:31:46 3167 [Note] WSREP: STATE EXCHANGE: got state msg: 23c6bd5c-de99-11e3-9cd7-bb91f8f93683 from 0 (pxc-01)
2014-05-18 14:31:46 3167 [Note] WSREP: STATE EXCHANGE: got state msg: 23c6bd5c-de99-11e3-9cd7-bb91f8f93683 from 1 (pxc-04)
2014-05-18 14:31:46 3167 [Note] WSREP: STATE EXCHANGE: got state msg: 23c6bd5c-de99-11e3-9cd7-bb91f8f93683 from 2 (pxc-02)
2014-05-18 14:31:46 3167 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 295,
members = 2/3 (joined/total),
act_id = 58520767,
last_appl. = 58520707,
protocols = 0/5/2 (gcs/repl/appl),
group UUID = 95890958-8d33-11e3-aea7-ff8c6255f30b
2014-05-18 14:31:46 3167 [Note] WSREP: Flow-control interval: [500, 500]
2014-05-18 14:31:46 3167 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 58520767)
2014-05-18 14:31:46 3167 [Note] WSREP: State transfer required:
Group state: 95890958-8d33-11e3-aea7-ff8c6255f30b:58520767
Local state: 95890958-8d33-11e3-aea7-ff8c6255f30b:58520762
2014-05-18 14:31:46 3167 [Note] WSREP: New cluster view: global state: 95890958-8d33-11e3-aea7-ff8c6255f30b:58520767, view# 296: Primary, number of nodes: 3, my index: 1, protocol version 2
2014-05-18 14:31:46 3167 [Warning] WSREP: Gap in state sequence. Need state transfer.
2014-05-18 14:31:48 3167 [Note] WSREP: You have configured 'xtrabackup' state snapshot transfer method which cannot be performed on a running server. Wsrep provider won't be able to fall back to it if other means of state transfer are unavailable. In that case you will need to restart the server.
2014-05-18 14:31:48 3167 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-05-18 14:31:48 3167 [Note] WSREP: REPL Protocols: 5 (3, 1)
2014-05-18 14:31:48 3167 [Note] WSREP: Service thread queue flushed.
2014-05-18 14:31:48 3167 [Note] WSREP: Assign initial position for certification: 58520767, protocol version: 3
2014-05-18 14:31:48 3167 [Note] WSREP: Service thread queue flushed.
2014-05-18 14:31:48 3167 [Warning] WSREP: Releasing seqno 58520762 before 58520763 was assigned.
2014-05-18 14:31:48 3167 [Note] WSREP: Prepared IST receiver, listening at: tcp://XXX.XXX:4568
2014-05-18 14:31:48 3167 [Note] WSREP: Member 1.0 (pxc-04) requested state transfer from 'pxc-02'. Selected 2.0 (pxc-02)(SYNCED) as donor.
2014-05-18 14:31:48 3167 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 58520784)
2014-05-18 14:31:48 3167 [Note] WSREP: Requesting state transfer: success, donor: 2
2014-05-18 14:31:48 3167 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): discarded 0 bytes
2014-05-18 14:31:48 3167 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): found 1/17 locked buffers
2014-05-18 14:31:48 3167 [Note] WSREP: Receiving IST: 5 writesets, seqnos 58520762-58520767
2014-05-18 14:31:48 3167 [Note] WSREP: IST received: 95890958-8d33-11e3-aea7-ff8c6255f30b:58520767
2014-05-18 14:31:48 3167 [Note] WSREP: 1.0 (pxc-04): State transfer from 2.0 (pxc-02) complete.
2014-05-18 14:31:48 3167 [Note] WSREP: Shifting JOINER -> JOINED (TO: 58520784)
2014-05-18 14:31:48 3167 [Note] WSREP: 2.0 (pxc-02): State transfer to 1.0 (pxc-04) complete.
2014-05-18 14:31:48 3167 [Note] WSREP: Member 1.0 (pxc-04) synced with group.
2014-05-18 14:31:48 3167 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 58520784)
2014-05-18 14:31:48 3167 [Note] WSREP: Member 2.0 (pxc-02) synced with group.
2014-05-18 14:31:48 3167 [Note] WSREP: Synchronized with group, ready for connections
2014-05-18 14:31:48 3167 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
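
In case it helps anyone correlate these drops across a longer error log, here is a minimal Python sketch that pulls out the two kinds of lines that seem most relevant in the excerpt above: the "Shifting X -> Y" state changes and the "last inactive check" warning, which reports how long the node went without running its check (PT5.55984S here). The function name summarize and the regexes are my own illustration, written against the line formats seen in this log only. They are not part of any Galera or Percona tooling and may need adjusting for other versions.

```python
import re

TS = r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"

# State-change lines. Accepts "->" as well as the arrow character that
# forum software sometimes substitutes for it.
SHIFT_RE = re.compile(
    r"^" + TS + r" .*WSREP: Shifting "
    r"(?P<old>[A-Z/]+) (?:->|\u2192) (?P<new>[A-Z/]+) \(TO: (?P<seqno>\d+)\)"
)

# "last inactive check" warnings; captures the reported gap in seconds.
STALL_RE = re.compile(
    r"^" + TS + r" .*last inactive check more than "
    r"PT[\d.]+S ago \(PT(?P<secs>[\d.]+)S\)"
)


def summarize(lines):
    """Return a list of tuples describing stalls and state shifts, in log order.

    Stall:  (timestamp, "stall", seconds)
    Shift:  (timestamp, "shift", old_state, new_state, seqno)
    """
    events = []
    for line in lines:
        m = SHIFT_RE.match(line)
        if m:
            events.append(
                (m.group("ts"), "shift", m.group("old"), m.group("new"),
                 int(m.group("seqno")))
            )
            continue
        m = STALL_RE.match(line)
        if m:
            events.append((m.group("ts"), "stall", float(m.group("secs"))))
    return events
```

To use it, pass any iterable of log lines, for example an open file handle on the MySQL error log, and print the returned tuples. If every SYNCED -> OPEN shift is preceded by a stall warning, that would at least narrow down where to look next.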