Questions about two events being logged in the error log

gabeguillen · July 30, 2013, 1:35pm

I have been noticing in my PXC error log that roughly every day (sometimes several times a day), I get about 16 of the exact same error messages logged consecutively:

130729 15:44:53 [Note] WSREP: (29f1d3b8-f86d-11e2-0800-28466944d053, ‘tcp://0.0.0.0:4567’) address ‘tcp://XXX.XXX.XXX.XXX:4567’ pointing to uuid 29f1d3b8-f86d-11e2-0800-28466944d053 is blacklisted, skipping

The only difference between the lines is the timestamp, and all of them fall within a period of about two seconds.

The other issue that I am seeing occurs at 1am and 3am when our automated backup solution kicks in on this specific node:

130730 1:02:28 [Note] WSREP: Provider paused at a8e8a277-6f03-11e2-0800-5896d9f10d3c:14031249
130730 1:02:28 [Note] WSREP: Provider resumed.

Several of these are logged consecutively. Does this mean that the node is no longer a member of the cluster while this is happening? I do not get a wsrep_notify_cmd hit, and our load balancer has never logged the server as down when polling the clustercheck xinetd script (I realize the pause is so brief that the poller would be unlikely to catch the node in a down state). Or does it mean the node is just pausing for flow control reasons? The other nodes in the cluster do not log any events during the period in which this is occurring.

Thanks for any insight.

gabeguillen · July 30, 2013, 1:36pm

Forgot to mention this is PXC 5.5.30 wsrep_23.7.4.r3843.

Thanks.

gabeguillen · August 2, 2013, 2:11pm

Nobody else is seeing this?

percona.jayj · August 6, 2013, 12:50pm

The "blacklisted, skipping" notes are harmless by themselves, but they indicate that some extra state checking is happening (AFAICT).

gabeguillen wrote:

The other issue that I am seeing occurs at 1am and 3am when our automated backup solution kicks in on this specific node:

130730 1:02:28 [Note] WSREP: Provider paused at a8e8a277-6f03-11e2-0800-5896d9f10d3c:14031249
130730 1:02:28 [Note] WSREP: Provider resumed.

Several of these are logged consecutively. Does this mean that the node is no longer a member of the cluster while this is happening? I do not get a wsrep_notify_cmd hit, and our load balancer has never logged the server as down when polling the clustercheck xinetd script (I realize the pause is so brief that the poller would be unlikely to catch the node in a down state). Or does it mean the node is just pausing for flow control reasons? The other nodes in the cluster do not log any events during the period in which this is occurring.

This is triggered when the Galera provider on this node is unable to write locally. Typically this is caused by an FTWRL (FLUSH TABLES WITH READ LOCK), probably from your backup.

A paused provider cannot write, so the local recv queue backs up. That in turn may cause flow control, depending on your fc_limit and the associated fc* settings in wsrep_provider_options, AND only if your node is in the ‘Synced’ state. So flow control is related, but seeing this message does not necessarily mean flow control kicked in.
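To make that condition concrete, here is a rough sketch of the check as described above. It is a simplification, not Galera's actual implementation: the function name, the strict "greater than" comparison, and the sample numbers are all assumptions for illustration. In practice you would compare the wsrep_local_recv_queue status variable against your gcs.fc_limit setting and look at wsrep_local_state_comment for the node state.

```python
def flow_control_expected(recv_queue_len, fc_limit, node_state):
    """Simplified model of the condition described in this post.

    recv_queue_len: current length of the local receive queue
    fc_limit:       the configured flow control limit (hypothetical value here)
    node_state:     node state string, e.g. "Synced" or "Donor/Desynced"

    Assumption: flow control is only expected when the queue has grown
    past the limit AND the node is in the Synced state.
    """
    return node_state == "Synced" and recv_queue_len > fc_limit


# Hypothetical numbers, not taken from any real cluster.
print(flow_control_expected(20, 16, "Synced"))          # queue over limit, Synced
print(flow_control_expected(20, 16, "Donor/Desynced"))  # queue over limit, not Synced
print(flow_control_expected(5, 16, "Synced"))           # queue under limit
```

A short pause that resumes before the queue passes the limit would fall into the third case, which fits a "paused" and "resumed" pair logged within the same second.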

I can’t necessarily account for why you’d see many of these, but what backup method are you using? That may explain it.
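If it helps to pin down how many of these are logged and exactly when, something like the following could count them per second so the counts can be lined up against the backup schedule. This is only a sketch: the regular expression assumes the timestamp and "[Note] WSREP:" layout seen in the lines quoted in this thread, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, based on the lines quoted in this thread, e.g.
# "130730 1:02:28 [Note] WSREP: Provider paused at ..."
LINE_RE = re.compile(r"^(\d{6})\s+(\d{1,2}:\d{2}:\d{2}) \[Note\] WSREP: (.*)$")


def summarize_wsrep_events(lines):
    """Count paused, resumed and blacklisted notes per (date, time, kind)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a WSREP note in the assumed layout
        date, time, message = match.groups()
        if message.startswith("Provider paused"):
            kind = "paused"
        elif message.startswith("Provider resumed"):
            kind = "resumed"
        elif "is blacklisted, skipping" in message:
            kind = "blacklisted"
        else:
            continue  # some other WSREP note; ignored here
        counts[(date, time, kind)] += 1
    return counts


# The two lines from the original post, used as sample input.
sample = [
    "130730 1:02:28 [Note] WSREP: Provider paused at "
    "a8e8a277-6f03-11e2-0800-5896d9f10d3c:14031249",
    "130730 1:02:28 [Note] WSREP: Provider resumed.",
]
summary = summarize_wsrep_events(sample)
for (date, time, kind), count in sorted(summary.items()):
    print(date, time, kind, count)
```

To run it against a real log, pass an open file object instead of the sample list, for example `summarize_wsrep_events(open(path))`, where `path` points at your own error log.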
