Thanks for your reply Jay. I really appreciated it.
It does recover by itself but takes a few minutes.
I’ll have to try to trap the wsrep variable states when it happens.
I’ve pasted the mysql logs below. Seems pretty consistent at this time each day.
Thanks in advance for any insight you can offer.
130708 22:04:31 [Note] WSREP: (0dca7efa-ceb9-11e2-0800-426d225ba0b6, ‘tcp://0.0.0.0:4567’) reconnecting to 063fe2dd-ceba-11e2-0800-37fa3005f9fe (tcp://172.18.4.129:4567), attempt 0
130708 22:04:32 [Note] WSREP: evs::proto(0dca7efa-ceb9-11e2-0800-426d225ba0b6, GATHER, view_id(REG,063fe2dd-ceba-11e2-0800-37fa3005f9fe,152)) suspecting node: 063fe2dd-ceba-11e2-0800-37fa3005f9fe
130708 22:04:33 [Note] WSREP: declaring 7554f7d0-ceb7-11e2-0800-c1b580896c4b stable
130708 22:04:33 [Note] WSREP: Node 0dca7efa-ceb9-11e2-0800-426d225ba0b6 state prim
130708 22:04:33 [Note] WSREP: view(view_id(PRIM,0dca7efa-ceb9-11e2-0800-426d225ba0b6,154) memb {
0dca7efa-ceb9-11e2-0800-426d225ba0b6,
7554f7d0-ceb7-11e2-0800-c1b580896c4b,
} joined {
} left {
} partitioned {
063fe2dd-ceba-11e2-0800-37fa3005f9fe,
})
130708 22:04:33 [Note] WSREP: forgetting 063fe2dd-ceba-11e2-0800-37fa3005f9fe (tcp://172.18.4.129:4567)
130708 22:04:33 [Note] WSREP: deleting entry tcp://172.18.4.129:4567
130708 22:04:33 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
130708 22:04:33 [Note] WSREP: (0dca7efa-ceb9-11e2-0800-426d225ba0b6, ‘tcp://0.0.0.0:4567’) turning message relay requesting off
130708 22:04:33 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 8d4155ec-e7c6-11e2-0800-577937398872
130708 22:04:33 [Note] WSREP: STATE EXCHANGE: sent state msg: 8d4155ec-e7c6-11e2-0800-577937398872
130708 22:04:33 [Note] WSREP: STATE EXCHANGE: got state msg: 8d4155ec-e7c6-11e2-0800-577937398872 from 0 (pxc3)
130708 22:04:33 [Note] WSREP: STATE EXCHANGE: got state msg: 8d4155ec-e7c6-11e2-0800-577937398872 from 1 (pxc2)
130708 22:04:33 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 145,
members = 2/2 (joined/total),
act_id = 533576,
last_appl. = 533498,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = a9e219f1-4291-11e2-0800-f07a1fee082b
130708 22:04:33 [Note] WSREP: Flow-control interval: [23, 23]
130708 22:04:33 [Note] WSREP: New cluster view: global state: a9e219f1-4291-11e2-0800-f07a1fee082b:533576, view# 146: Primary, number of nodes: 2, my index: 0, protocol version 2
130708 22:04:33 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
130708 22:04:33 [Note] WSREP: Assign initial position for certification: 533576, protocol version: 2
130708 22:04:38 [Note] WSREP: cleaning up 063fe2dd-ceba-11e2-0800-37fa3005f9fe (tcp://172.18.4.129:4567)
130708 22:05:09 [Note] WSREP: (0dca7efa-ceb9-11e2-0800-426d225ba0b6, ‘tcp://0.0.0.0:4567’) turning message relay requesting on, nonlive peers: tcp://172.18.4.129:4567
130708 22:05:09 [Note] WSREP: (0dca7efa-ceb9-11e2-0800-426d225ba0b6, ‘tcp://0.0.0.0:4567’) turning message relay requesting off
130708 22:05:10 [Note] WSREP: declaring 063fe2dd-ceba-11e2-0800-37fa3005f9fe stable
130708 22:05:10 [Note] WSREP: declaring 7554f7d0-ceb7-11e2-0800-c1b580896c4b stable
130708 22:05:10 [Note] WSREP: Node 0dca7efa-ceb9-11e2-0800-426d225ba0b6 state prim
130708 22:05:10 [Note] WSREP: view(view_id(PRIM,063fe2dd-ceba-11e2-0800-37fa3005f9fe,155) memb {
063fe2dd-ceba-11e2-0800-37fa3005f9fe,
0dca7efa-ceb9-11e2-0800-426d225ba0b6,
7554f7d0-ceb7-11e2-0800-c1b580896c4b,
} joined {
} left {
} partitioned {
})
130708 22:05:10 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
130708 22:05:10 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
130708 22:05:10 [Note] WSREP: STATE EXCHANGE: sent state msg: a392b179-e7c6-11e2-0800-cb8c686fe4f5
130708 22:05:10 [Note] WSREP: STATE EXCHANGE: got state msg: a392b179-e7c6-11e2-0800-cb8c686fe4f5 from 0 (pxc1)
130708 22:05:10 [Note] WSREP: STATE EXCHANGE: got state msg: a392b179-e7c6-11e2-0800-cb8c686fe4f5 from 1 (pxc3)
130708 22:05:10 [Note] WSREP: STATE EXCHANGE: got state msg: a392b179-e7c6-11e2-0800-cb8c686fe4f5 from 2 (pxc2)
130708 22:05:10 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 146,
members = 2/3 (joined/total),
act_id = 533582,
last_appl. = 533498,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = a9e219f1-4291-11e2-0800-f07a1fee082b
130708 22:05:10 [Note] WSREP: Flow-control interval: [28, 28]
130708 22:05:10 [Note] WSREP: New cluster view: global state: a9e219f1-4291-11e2-0800-f07a1fee082b:533582, view# 147: Primary, number of nodes: 3, my index: 1, protocol version 2
130708 22:05:10 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
130708 22:05:10 [Note] WSREP: Assign initial position for certification: 533582, protocol version: 2
130708 22:05:42 [Note] WSREP: Node 0 (pxc1) requested state transfer from ‘any’. Selected 1 (pxc3)(SYNCED) as donor.
130708 22:05:42 [Note] WSREP: Shifting SYNCED → DONOR/DESYNCED (TO: 533582)
130708 22:05:42 [Note] WSREP: IST request: a9e219f1-4291-11e2-0800-f07a1fee082b:533576-533582|tcp://172.18.4.129:4568
130708 22:05:42 [Note] WSREP: async IST sender starting to serve tcp://172.18.4.129:4568 sending 533577-533582
130708 22:05:42 [Note] WSREP: async IST sender served
130708 22:05:42 [Note] WSREP: 1 (pxc3): State transfer to 0 (pxc1) complete.
130708 22:05:42 [Note] WSREP: Shifting DONOR/DESYNCED → JOINED (TO: 533582)
130708 22:05:42 [Note] WSREP: Member 1 (pxc3) synced with group.
130708 22:05:42 [Note] WSREP: Shifting JOINED → SYNCED (TO: 533582)
130708 22:05:42 [Note] WSREP: Synchronized with group, ready for connections
130708 22:05:42 [Note] WSREP: 0 (pxc1): State transfer from 1 (pxc3) complete.
130708 22:05:42 [Note] WSREP: Member 0 (pxc1) synced with group.
130708 22:06:51 [Note] WSREP: (0dca7efa-ceb9-11e2-0800-426d225ba0b6, ‘tcp://0.0.0.0:4567’) turning message relay requesting on, nonlive peers: tcp://172.18.4.130:4567
130708 22:06:52 [Note] WSREP: (0dca7efa-ceb9-11e2-0800-426d225ba0b6, ‘tcp://0.0.0.0:4567’) reconnecting to 7554f7d0-ceb7-11e2-0800-c1b580896c4b (tcp://172.18.4.130:4567), attempt 0
130708 22:06:53 [Note] WSREP: evs::proto(0dca7efa-ceb9-11e2-0800-426d225ba0b6, GATHER, view_id(REG,063fe2dd-ceba-11e2-0800-37fa3005f9fe,155)) suspecting node: 7554f7d0-ceb7-11e2-0800-c1b580896c4b
130708 22:06:54 [Note] WSREP: evs::proto(0dca7efa-ceb9-11e2-0800-426d225ba0b6, GATHER, view_id(REG,063fe2dd-ceba-11e2-0800-37fa3005f9fe,155)) suspecting node: 7554f7d0-ceb7-11e2-0800-c1b580896c4b
130708 22:06:55 [Note] WSREP: declaring 063fe2dd-ceba-11e2-0800-37fa3005f9fe stable
130708 22:06:55 [Note] WSREP: Node 063fe2dd-ceba-11e2-0800-37fa3005f9fe state prim
130708 22:06:55 [Note] WSREP: view(view_id(PRIM,063fe2dd-ceba-11e2-0800-37fa3005f9fe,156) memb {
063fe2dd-ceba-11e2-0800-37fa3005f9fe,
0dca7efa-ceb9-11e2-0800-426d225ba0b6,
} joined {
} left {
} partitioned {
7554f7d0-ceb7-11e2-0800-c1b580896c4b,
})
130708 22:06:55 [Note] WSREP: forgetting 7554f7d0-ceb7-11e2-0800-c1b580896c4b (tcp://172.18.4.130:4567)
130708 22:06:55 [Note] WSREP: deleting entry tcp://172.18.4.130:4567
130708 22:06:55 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
130708 22:06:55 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
130708 22:06:55 [Note] WSREP: (0dca7efa-ceb9-11e2-0800-426d225ba0b6, ‘tcp://0.0.0.0:4567’) turning message relay requesting off
130708 22:06:55 [Note] WSREP: STATE EXCHANGE: sent state msg: e1e0b984-e7c6-11e2-0800-e0315f7301f1
130708 22:06:55 [Note] WSREP: STATE EXCHANGE: got state msg: e1e0b984-e7c6-11e2-0800-e0315f7301f1 from 0 (pxc1)
130708 22:06:55 [Note] WSREP: STATE EXCHANGE: got state msg: e1e0b984-e7c6-11e2-0800-e0315f7301f1 from 1 (pxc3)
130708 22:06:55 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 147,
members = 2/2 (joined/total),
act_id = 533583,
last_appl. = 0,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = a9e219f1-4291-11e2-0800-f07a1fee082b
130708 22:06:55 [Note] WSREP: Flow-control interval: [23, 23]
130708 22:06:55 [Note] WSREP: New cluster view: global state: a9e219f1-4291-11e2-0800-f07a1fee082b:533583, view# 148: Primary, number of nodes: 2, my index: 1, protocol version 2