Troubleshooting: Primary Node in a PXC rapidly syncs and desyncs.

One of the nodes (our primary node in this case) has a recurring issue where it repeatedly desyncs and resyncs within the span of a few seconds. This happens approximately every 10 minutes. We want to avoid rebuilding the node.
The log shows:
2018-06-27 17:27:54 31214 [Note] WSREP: Member 2.0 (loaddb-56-node1) resyncs itself to group
2018-06-27 17:27:54 31214 [Note] WSREP: Shifting DONOR/DESYNCED → JOINED (TO: 1667883)
2018-06-27 17:27:54 31214 [Note] WSREP: Member 2.0 (loaddb-56-node1) synced with group.
2018-06-27 17:27:54 31214 [Note] WSREP: Shifting JOINED → SYNCED (TO: 1667883)
2018-06-27 17:27:54 31214 [Note] WSREP: Synchronized with group, ready for connections
2018-06-27 17:27:54 31214 [Note] WSREP: Setting wsrep_ready to true
2018-06-27 17:38:37 31214 [Note] WSREP: Member 2.0 (loaddb-56-node1) desyncs itself from group
2018-06-27 17:38:37 31214 [Note] WSREP: Shifting SYNCED → DONOR/DESYNCED (TO: 1669602)
2018-06-27 17:38:37 31214 [Note] WSREP: Provider paused at e73053ea-2b5c-11e7-976a-bf64a7a065f8:1669602 (5831)
2018-06-27 17:38:37 31214 [Note] WSREP: resuming provider at 5831
2018-06-27 17:38:37 31214 [Note] WSREP: Provider resumed.
2018-06-27 17:38:37 31214 [Note] WSREP: Member 2.0 (loaddb-56-node1) resyncs itself to group
2018-06-27 17:38:37 31214 [Note] WSREP: Shifting DONOR/DESYNCED → JOINED (TO: 1669602)
2018-06-27 17:38:37 31214 [Note] WSREP: Member 2.0 (loaddb-56-node1) synced with group.
2018-06-27 17:38:37 31214 [Note] WSREP: Shifting JOINED → SYNCED (TO: 1669602)
2018-06-27 17:38:37 31214 [Note] WSREP: Synchronized with group, ready for connections
2018-06-27 17:38:37 31214 [Note] WSREP: Setting wsrep_ready to true

2018-06-27 17:38:44 31214 [Note] WSREP: Member 2.0 (loaddb-56-node1) desyncs itself from group
2018-06-27 17:38:44 31214 [Note] WSREP: Shifting SYNCED → DONOR/DESYNCED (TO: 1669705)
2018-06-27 17:38:44 31214 [Note] WSREP: Provider paused at e73053ea-2b5c-11e7-976a-bf64a7a065f8:1669705 (6025)
2018-06-27 17:38:44 31214 [Note] WSREP: resuming provider at 6025
2018-06-27 17:38:44 31214 [Note] WSREP: Provider resumed.
2018-06-27 17:38:44 31214 [Note] WSREP: Member 2.0 (loaddb-56-node1) resyncs itself to group
2018-06-27 17:38:44 31214 [Note] WSREP: Shifting DONOR/DESYNCED → JOINED (TO: 1669705)
2018-06-27 17:38:44 31214 [Note] WSREP: Member 2.0 (loaddb-56-node1) synced with group.
2018-06-27 17:38:44 31214 [Note] WSREP: Shifting JOINED → SYNCED (TO: 1669705)
2018-06-27 17:38:44 31214 [Note] WSREP: Synchronized with group, ready for connections
2018-06-27 17:38:44 31214 [Note] WSREP: Setting wsrep_ready to true
–end of log–
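
To put numbers on how often the cycle repeats, the desync/resync messages can be pulled out of the error log and the gaps between them measured. This is only a rough sketch, assuming the log lines keep exactly the format pasted above; the function names and the SAMPLE list are made up for illustration and are not part of PXC or Galera.

```python
import re
from datetime import datetime

# Matches the timestamp, the member name in parentheses, and the
# desync/resync message, assuming the line format shown in the log above.
EVENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*WSREP: Member \S+ \((\S+)\) "
    r"(desyncs itself from|resyncs itself to) group"
)


def desync_events(lines):
    """Return (timestamp, member, kind) tuples, kind being 'desync' or 'resync'."""
    events = []
    for line in lines:
        m = EVENT_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            kind = "desync" if m.group(3).startswith("desyncs") else "resync"
            events.append((ts, m.group(2), kind))
    return events


def desync_gaps(events):
    """Return the seconds elapsed between consecutive desync events."""
    times = [ts for ts, _, kind in events if kind == "desync"]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# A few lines copied from the log above (hypothetical variable name).
SAMPLE = [
    "2018-06-27 17:27:54 31214 [Note] WSREP: Member 2.0 (loaddb-56-node1) resyncs itself to group",
    "2018-06-27 17:38:37 31214 [Note] WSREP: Member 2.0 (loaddb-56-node1) desyncs itself from group",
    "2018-06-27 17:38:37 31214 [Note] WSREP: Member 2.0 (loaddb-56-node1) resyncs itself to group",
    "2018-06-27 17:38:44 31214 [Note] WSREP: Member 2.0 (loaddb-56-node1) desyncs itself from group",
    "2018-06-27 17:38:44 31214 [Note] WSREP: Member 2.0 (loaddb-56-node1) resyncs itself to group",
]

if __name__ == "__main__":
    for ts, member, kind in desync_events(SAMPLE):
        print(ts, member, kind)
    print(desync_gaps(SAMPLE and desync_events(SAMPLE)))
```

Running the same functions over the full error log should show whether the bursts really line up on a roughly 10-minute schedule or drift over time.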

Some notable things:

  • When I run "SHOW STATUS LIKE 'wsrep%';" in mysql, node 1 reports wsrep_gcache_pool_size = 16682368, but nodes 2 and 3 both report wsrep_gcache_pool_size = 268221616. I don't know if this is relevant, but it's something I noticed.
  • It only happens on Node1.
  • We have tried forcing an SST on Node1 (it successfully pulls the data from Node2), but the node continues to have this issue afterwards.
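
To check whether other wsrep values differ on Node1 in the same way, the status output from each node can be compared side by side. Below is a minimal sketch, assuming the values have already been collected by hand from "SHOW STATUS LIKE 'wsrep%';" on each node; the helper name and the node labels are hypothetical. It only flags which node disagrees with the majority, it does not say whether the difference is the cause of the desyncs.

```python
from collections import Counter


def outlier_nodes(values_by_node):
    """Return the nodes whose value differs from the most common value."""
    counts = Counter(values_by_node.values())
    majority_value, _ = counts.most_common(1)[0]
    return {
        node: value
        for node, value in values_by_node.items()
        if value != majority_value
    }


# wsrep_gcache_pool_size as reported above; node labels are placeholders.
gcache_pool_size = {
    "node1": 16682368,
    "node2": 268221616,
    "node3": 268221616,
}

if __name__ == "__main__":
    print(outlier_nodes(gcache_pool_size))
```

The same helper can be applied to any other status variable copied from the three nodes.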