Periodic / sporadic leaving/rejoining of nodes.

We moved to a new set of 3 servers (PXDC 5.5.34-25.9.607, migrated from CentOS 5.10 to 6.5, physical servers with plenty of CPU and RAM, dual-bonded gigabit, same datacenter), and we are seeing a strange quirk where one node drops out of the cluster and quickly rejoins. Things seem to work as designed when this happens and the cluster recovers quickly, but the fact that it drops all idle connections is causing us some headaches.

This seems to happen to one node the most, but it has happened to all three, so it doesn’t appear to be a hardware/network issue with just one server. It happens anywhere from once a week to 2-3 times in one day.

These databases are small (10 GB) and under relatively light load, and I’ve seen these issues during both “peak” and “mostly-idle” periods.

Attached is a log of one such event. Does anyone have any thoughts or suggestions? Is there anything we could tweak to make the cluster less sensitive? We could probably tolerate even 30 seconds of stalls rather than having a node drop and quickly rejoin.

xtradb-flap.txt (11.1 KB)

This node apparently had problems communicating with the 192.168.0.21 and 192.168.0.25 nodes. Check the quality of the network links.
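
One rough way to check the links is to probe the two peers from the affected node for a while and see whether connections fail or stall around the time of a flap. The script below is only a sketch under some assumptions: it uses port 4567 (the default Galera group communication port, which may differ in your setup), and the function names, the 1-second timeout and the 60 rounds are arbitrary illustrative choices. A TCP connect only shows reachability and coarse latency, so it will not catch every kind of packet loss on a bonded link.

```python
import socket
import time

PEERS = ["192.168.0.21", "192.168.0.25"]  # peer addresses named in the reply above
PORT = 4567  # assumption: default Galera group communication port


def probe(host, port, timeout=1.0):
    """Return the TCP connect time in seconds, or None if the attempt failed."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None


def summarize(samples):
    """Count failed probes and report the slowest successful one."""
    ok = [s for s in samples if s is not None]
    return {
        "probes": len(samples),
        "failed": len(samples) - len(ok),
        "max_seconds": max(ok) if ok else None,
    }


def watch(peers, port, rounds=60, interval=1.0):
    """Probe every peer once per round and return a summary per peer."""
    samples = {peer: [] for peer in peers}
    for _ in range(rounds):
        for peer in peers:
            samples[peer].append(probe(peer, port))
        time.sleep(interval)
    return {peer: summarize(values) for peer, values in samples.items()}


if __name__ == "__main__":
    for peer, stats in watch(PEERS, PORT).items():
        print(peer, stats)
```

If this is left running from each node in turn, a non-zero "failed" count or a large "max_seconds" toward the same peers around the time of a flap would point at the network rather than at the cluster software.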