Questions about two events being logged in the error log

  1. I have noticed in my PXC error log that roughly every day (sometimes several times a day) I get about 16 identical messages logged consecutively:

130729 15:44:53 [Note] WSREP: (29f1d3b8-f86d-11e2-0800-28466944d053, 'tcp://') address 'tcp://XXX.XXX.XXX.XXX:4567' pointing to uuid 29f1d3b8-f86d-11e2-0800-28466944d053 is blacklisted, skipping

The only difference between the lines is the timestamp, and all of them fall within about a two-second window.

  2. The other issue I am seeing occurs at 1am and 3am, when our automated backup solution kicks in on this specific node:

130730 1:02:28 [Note] WSREP: Provider paused at a8e8a277-6f03-11e2-0800-5896d9f10d3c:14031249
130730 1:02:28 [Note] WSREP: Provider resumed.

Several of these pairs are logged consecutively. Does this mean the node is no longer a member of the cluster while this is happening? wsrep_notify_cmd is not triggered, and our load balancer has never logged the server as down when polling the clustercheck xinetd script (I do realize the paused period is very short, so it would be hard for the poller to catch it in a down state). Or does it just mean the node is pausing for flow control reasons? The other nodes in the cluster do not log any events during this period.
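
If it helps to quantify both patterns, here is a minimal Python sketch that scans an error log for the two message types quoted above: it groups the "is blacklisted, skipping" lines into bursts and pairs each "Provider paused at" line with the next "Provider resumed" line. It assumes the MySQL 5.5 timestamp format seen in the excerpts (YYMMDD H:MM:SS) and matches only on the message text quoted here; the function names and the two-second burst gap are hypothetical choices, not anything PXC itself provides.

```python
import re
from datetime import datetime, timedelta

# Assumed format: MySQL 5.5 error-log timestamps like "130729 15:44:53"
# (YYMMDD H:MM:SS). The hour may be space-padded, so allow any whitespace.
TS_RE = re.compile(r"^(\d{6})\s+(\d{1,2}):(\d{2}):(\d{2})\s")


def parse_ts(line):
    """Return the line's timestamp as a datetime, or None if it has none."""
    m = TS_RE.match(line)
    if not m:
        return None
    date, h, mi, s = m.groups()
    return datetime.strptime(f"{date} {int(h):02d}:{mi}:{s}", "%y%m%d %H:%M:%S")


def blacklist_bursts(lines, gap=timedelta(seconds=2)):
    """Group 'is blacklisted, skipping' lines into bursts.

    A new burst starts when a matching line comes more than `gap` after the
    previous matching line. Returns a list of (start, end, count) tuples.
    """
    bursts = []
    for line in lines:
        ts = parse_ts(line)
        if ts is None or "is blacklisted, skipping" not in line:
            continue
        if bursts and ts - bursts[-1][1] <= gap:
            start, _, count = bursts[-1]
            bursts[-1] = (start, ts, count + 1)
        else:
            bursts.append((ts, ts, 1))
    return bursts


def pause_intervals(lines):
    """Pair each 'Provider paused at' line with the next 'Provider resumed' line.

    Returns (paused_at, resumed_at, position) tuples, where position is the
    uuid:seqno string logged after 'Provider paused at'. A pause with no
    matching resume is reported with resumed_at=None.
    """
    intervals = []
    pending = None  # (timestamp, position) of the currently open pause
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if "WSREP: Provider paused at " in line:
            if pending is not None:  # previous pause never saw a resume
                intervals.append((pending[0], None, pending[1]))
            pos = line.split("Provider paused at ", 1)[1].strip()
            pending = (ts, pos)
        elif "WSREP: Provider resumed" in line and pending is not None:
            intervals.append((pending[0], ts, pending[1]))
            pending = None
    if pending is not None:
        intervals.append((pending[0], None, pending[1]))
    return intervals
```

Both functions accept any iterable of lines, so an open file object for the error log can be passed directly (open the file separately for each call, since iterating consumes it). The log only has one-second resolution, so a pause and resume logged in the same second show up as a zero-length interval.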

Thanks for any insight.

Forgot to mention: this is PXC 5.5.30, wsrep_23.7.4.r3843.


Nobody else is seeing this?

These are harmless by themselves, but they indicate that some extra state checking is happening (AFAICT).

This is triggered by the Galera provider on this node not being able to write locally. Typically this would be caused by an FTWRL (FLUSH TABLES WITH READ LOCK), probably issued by your backup.

A paused provider cannot write, so the local receive queue backs up. That in turn may trigger flow control, depending on gcs.fc_limit and the related gcs.fc_* settings in wsrep_provider_options, AND only if your node is in the 'Synced' state. So flow control is related, but seeing this message does not necessarily mean flow control was triggered.
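
To make the flow-control reasoning above a little more concrete, here is a toy Python model of a single node's receive queue. It is a deliberate simplification, not Galera's implementation: it assumes the node requests a cluster-wide pause once its queue exceeds fc_limit (only while in the 'Synced' state) and releases it once the queue drops below fc_limit * fc_factor, loosely mirroring the intent of gcs.fc_limit and gcs.fc_factor. The class name, the per-step arrival and apply counts, and the default values are illustrative assumptions, not Galera defaults.

```python
from dataclasses import dataclass


@dataclass
class FcModel:
    """Toy model of a receive queue and flow control (hypothetical, simplified).

    Assumptions, not Galera internals:
    - while the provider is paused, write-sets queue up and none are applied;
    - a 'Synced' node turns FC on when the queue exceeds fc_limit;
    - it turns FC off when the queue falls below fc_limit * fc_factor;
    - a node in any other state never engages FC in this model.
    """
    fc_limit: int = 16
    fc_factor: float = 0.5
    state: str = "Synced"
    queue: int = 0
    fc_active: bool = False

    def step(self, arrived, provider_paused, apply_rate):
        """Advance one time step and return whether FC is active afterwards."""
        # While FC is active the cluster stops replicating new write-sets,
        # so nothing new arrives at this node during that step.
        if not self.fc_active:
            self.queue += arrived
        # A paused provider applies nothing; otherwise drain up to apply_rate.
        if not provider_paused:
            self.queue = max(0, self.queue - apply_rate)
        if self.state == "Synced":
            if not self.fc_active and self.queue > self.fc_limit:
                self.fc_active = True
            elif self.fc_active and self.queue < self.fc_limit * self.fc_factor:
                self.fc_active = False
        return self.fc_active


synced = FcModel()
print([synced.step(10, True, 50) for _ in range(3)])  # paused for three steps
print(synced.step(0, False, 50))                      # provider resumed

desynced = FcModel(state="Donor/Desynced")
print([desynced.step(10, True, 50) for _ in range(3)])
```

With these made-up numbers the Synced node engages flow control on the second paused step and releases it as soon as the provider resumes and drains the queue, while the same pause on a Donor/Desynced node only grows its queue. In this model, as in the explanation above, a provider pause can lead to flow control but does not have to.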

I can’t necessarily account for why you’d see many of these, but what backup method are you using? That may explain it.