One of my nodes is behind

I have a 5-node cluster. As a basic check, I run the same statement on every server: one table should show the same date_and_time value on all nodes, but one of the nodes does not have that value. The cluster is multi-master, but my loading scripts write only to node1. I ran show status like 'wsrep_clu%'; on each node and they all report the same values:

  wsrep_cluster_weight        5
  wsrep_cluster_capabilities
  wsrep_cluster_conf_id       53
  wsrep_cluster_size          5
  wsrep_cluster_state_uuid    76174b3b-01de-11ed-a719-bb52abb919c1
  wsrep_cluster_status        Primary

show processlist shows a lot of what I assume are in-progress replication threads:

             db: NULL
        Command: Query
           Time: 9758
          State: wsrep: committed write set (623924)
           Info: NULL
        Time_ms: 11820

             Id: 8
           User: event_scheduler
           Host: localhost
             db: NULL
        Command: Daemon
           Time: 1446964
          State: Waiting on empty queue
           Info: NULL
        Time_ms: 1446964376
      Rows_sent: 0
  Rows_examined: 0

             Id: 11
           User: system user
           Host:
             db: store1
        Command: Query
           Time: 9739
          State: wsrep: updating row for write-set (623925)
           Info: NULL
        Time_ms: 359
      Rows_sent: 0
  Rows_examined: 0

           User: system user
           Host:
             db: NULL
        Command: Query
           Time: 9760
          State: wsrep: committed write set (623922)
           Info: NULL
        Time_ms: 12429
      Rows_sent: 0
  Rows_examined: 0

Is there any way to inspect the write set referenced in wsrep: committed write set (###), to determine exactly what the issue is?

Also, since the node is stuck and not applying replicated changes (or thinks it’s up to date), is it safe to just restart the node?


Restarting the node won’t change anything. In PXC/Galera, write sets are replicated synchronously, but each node applies them asynchronously, so a node can fall behind on apply while still being reported as a healthy cluster member. Verify that all 5 nodes have exactly the same hardware specifications and exactly the same my.cnf parameters. Are all 5 nodes on the same LAN? I suggest installing and configuring PMM so that all 5 nodes are plotted on the same graphs; that way you can see whether one of them is hitting CPU, disk, or memory limits.
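
Until PMM is in place, a rough way to see which node is behind on apply is to compare wsrep_last_committed and wsrep_local_recv_queue across the nodes. The snippet below is only a minimal sketch: it assumes you have already collected the output of SHOW GLOBAL STATUS LIKE 'wsrep_%' from each node into a dictionary, and the function name, node names, and sample numbers are made up for illustration.

```python
def find_lagging_nodes(status_by_node, max_gap=0):
    """Return nodes whose wsrep_last_committed trails the most advanced node.

    status_by_node maps a node name to a dict of status variables as strings,
    the way SHOW GLOBAL STATUS returns them. This helper is hypothetical.
    """
    committed = {
        node: int(status["wsrep_last_committed"])
        for node, status in status_by_node.items()
    }
    newest = max(committed.values())

    lagging = {}
    for node, seqno in committed.items():
        gap = newest - seqno
        if gap > max_gap:
            lagging[node] = {
                "gap": gap,
                # Write sets received from the cluster but not yet applied.
                "recv_queue": int(
                    status_by_node[node].get("wsrep_local_recv_queue", 0)
                ),
            }
    return lagging


# Illustrative values only, not taken from a real cluster.
sample = {
    "node1": {"wsrep_last_committed": "624100", "wsrep_local_recv_queue": "0"},
    "node2": {"wsrep_last_committed": "624100", "wsrep_local_recv_queue": "0"},
    "node3": {"wsrep_last_committed": "623924", "wsrep_local_recv_queue": "175"},
}

print(find_lagging_nodes(sample))
```

A node that shows a growing gap together with a non-empty receive queue is receiving write sets but applying them slowly, which is the situation described above. Because the values are sampled from each node at slightly different moments, a small gap on a busy cluster is normal; max_gap is there to ignore it.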


All nodes are on the same LAN. The my.cnf parameters are the same, with a few exceptions: wsrep_sst_donor and wsrep_cluster_address, which of course exclude the node’s own IP and, for the donor, its own name. The server specs are slightly different on two of the nodes, but none of the boxes are maxing out CPU, disk, or memory while the node is out of sync.
