Slave SQL thread stopped automatically

Hi,
We have replication between two Percona XtraDB Cluster setups.

Each cluster has 3 nodes.

The slave SQL thread stops occasionally, and we have not found the cause.
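When the SQL thread stops, the replica's own status is usually the first place to look for a reason. Below is a minimal sketch, assuming you have saved the vertical output of SHOW SLAVE STATUS\G as text (for example with mysql -e "SHOW SLAVE STATUS\G"). The helper names are hypothetical. Slave_SQL_Running, Last_SQL_Errno and Last_SQL_Error are standard columns of that output.

```python
from typing import Dict, Optional


def parse_slave_status(text: str) -> Dict[str, str]:
    """Parse the vertical (\\G) output of SHOW SLAVE STATUS into a dict."""
    status: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only: error messages may contain more colons.
        key, sep, value = line.strip().partition(":")
        # Lines without a colon (the "*** 1. row ***" banner, blanks) are skipped.
        if sep:
            status[key.strip()] = value.strip()
    return status


def sql_thread_problem(status: Dict[str, str]) -> Optional[str]:
    """Return a short description if the SQL thread is not running, else None."""
    if status.get("Slave_SQL_Running") == "Yes":
        return None
    errno = status.get("Last_SQL_Errno", "0")
    if errno != "0":
        return f"SQL thread stopped, error {errno}: {status.get('Last_SQL_Error', '')}"
    # Errno 0: the status output records no SQL error for this stop.
    return "SQL thread stopped, Last_SQL_Errno is 0 (no SQL error recorded)"
```

If Last_SQL_Errno is non-zero, the failing statement is the lead to follow. If it is 0, the cause is more likely outside the replication stream itself.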

We found the following messages in the error log:

140609 3:37:14 [Note] WSREP: (afc8dae2-7204-11e3-8876-275ef829891c, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.174.10.163:4567
140609 3:37:15 [Note] WSREP: (afc8dae2-7204-11e3-8876-275ef829891c, 'tcp://0.0.0.0:4567') reconnecting to 8d7cae5c-7207-11e3-a546-db0fa85eeb02 (tcp://10.174.10.163:4567), attempt 0
140609 3:37:15 [Note] WSREP: (afc8dae2-7204-11e3-8876-275ef829891c, 'tcp://0.0.0.0:4567') cleaning up duplicate 0x2ab328435790 after established 0x2ab328051590
140609 3:37:15 [Note] WSREP: (afc8dae2-7204-11e3-8876-275ef829891c, 'tcp://0.0.0.0:4567') turning message relay requesting off
140609 5:34:03 [Note] WSREP: (afc8dae2-7204-11e3-8876-275ef829891c, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.174.10.162:4567 tcp://10.174.10.163:4567
140609 5:34:04 [Note] WSREP: (afc8dae2-7204-11e3-8876-275ef829891c, 'tcp://0.0.0.0:4567') reconnecting to df4de7f2-7205-11e3-8865-c6bdf5daf743 (tcp://10.174.10.162:4567), attempt 0
140609 5:34:04 [Note] WSREP: (afc8dae2-7204-11e3-8876-275ef829891c, 'tcp://0.0.0.0:4567') cleaning up established 0x3491f510 which is duplicate of 0x348b73d0
140609 5:34:04 [Note] WSREP: (afc8dae2-7204-11e3-8876-275ef829891c, 'tcp://0.0.0.0:4567') turning message relay requesting off
140609 11:16:25 [Note] Slave SQL thread exiting, replication stopped in log 'ff-clusterdb03-lhr.000045' at position 181715488
140609 11:50:47 [Note] WSREP: ready state reached
140609 11:50:47 [Note] Slave SQL thread initialized, starting replication in lf.000045' at position 181715488, relay log '/u02/mysql/binlogs/ff-relay-bin.000108' position: 170203363
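This excerpt shows WSREP peer drops at 3:37 and 5:34 and a slave SQL thread stop at 11:16. When the same pattern repeats many times in a log, a small script can line the events up. Below is a minimal sketch, assuming the 5.5-style YYMMDD H:MM:SS timestamp shown here. parse_events and stops_with_last_peer_drop are hypothetical helpers, and the patterns only rely on the message text in this log.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Match straight or typographic quotes around the binlog file name.
QUOTE = "['\u2018\u2019]"
NOT_QUOTE = "[^'\u2018\u2019]+"
TS = re.compile(r"^(\d{6})\s+(\d{1,2}:\d{2}:\d{2})")
PEERS_DOWN = re.compile(r"nonlive peers:\s*(.+)$")
SQL_STOP = re.compile(
    r"Slave SQL thread exiting, replication stopped in log "
    + QUOTE + "(" + NOT_QUOTE + ")" + QUOTE + r" at position (\d+)"
)


class Event(NamedTuple):
    when: datetime
    kind: str  # "peer_down" or "sql_stop"
    detail: str


def parse_events(lines: Iterable[str]) -> List[Event]:
    """Extract WSREP nonlive-peer events and slave SQL thread stops."""
    events: List[Event] = []
    for line in lines:
        ts = TS.match(line)
        if not ts:
            continue
        # "140609 3:37:14" -> 2014-06-09 03:37:14 (YYMMDD H:MM:SS)
        when = datetime.strptime(f"{ts.group(1)} {ts.group(2)}", "%y%m%d %H:%M:%S")
        m = PEERS_DOWN.search(line)
        if m:
            events.append(Event(when, "peer_down", m.group(1).strip()))
            continue
        m = SQL_STOP.search(line)
        if m:
            # detail is "binlog_file:position"
            events.append(Event(when, "sql_stop", f"{m.group(1)}:{m.group(2)}"))
    return events


def stops_with_last_peer_drop(
    events: List[Event],
) -> List[Tuple[Event, Optional[timedelta]]]:
    """Pair each SQL thread stop with the time since the previous peer drop."""
    last_drop: Optional[Event] = None
    pairs: List[Tuple[Event, Optional[timedelta]]] = []
    for ev in sorted(events, key=lambda e: e.when):
        if ev.kind == "peer_down":
            last_drop = ev
        elif ev.kind == "sql_stop":
            gap = (ev.when - last_drop.when) if last_drop else None
            pairs.append((ev, gap))
    return pairs
```

For the excerpt above, the 11:16:25 stop comes about 5h42m after the 5:34:03 peer drop. A gap that long might suggest the stop is not a direct result of that network event. Short gaps repeated across many incidents would point the other way.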

Is this the full error log, or did you truncate some parts?
Is 10.174.10.163 the remote master for this node?

Yes, this is the full error log; the same messages are repeated many times.
10.174.10.163/162/161 are all in the same cluster group.
We get the same errors on all nodes.

Can you outline exactly what your replication topology looks like?
Also, the “turning message relay requesting on, nonlive peers” message means that either there was a networking problem between nodes, or a node crashed or was killed.
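Each nonlive-peers message names the peers that stopped responding. Counting those names may show whether one peer (or one link) is behind most of the drops. Below is a minimal sketch under that assumption. count_nonlive_peers is a hypothetical helper, and the regular expression only relies on the message text quoted in the log above.

```python
import re
from collections import Counter
from typing import Iterable

NONLIVE = re.compile(r"nonlive peers:\s*(.+)$")


def count_nonlive_peers(lines: Iterable[str]) -> Counter:
    """Count how often each peer address is reported as nonlive."""
    counts: Counter = Counter()
    for line in lines:
        m = NONLIVE.search(line)
        if m:
            # Peers are space-separated, e.g. "tcp://10.174.10.162:4567 tcp://..."
            counts.update(m.group(1).split())
    return counts
```

Run this over the error log of each of the three nodes and compare the counts. That may help separate a single flaky node from a general network problem.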

Thanks, Mike, for your prompt response.

Our replication setup is shown below.


On node C, the slave stops frequently.
Please help us resolve it.

[Attachment: replication setup diagram]