XtraDB Cluster node disconnects from cluster after SST and needs a new SST

Lukasz · August 6, 2013, 6:06am

XtraDB Cluster node db1 tries to join the cluster (db3, garbd). After a successful SST and an established connection (wsrep_cluster_size shows 3 nodes), db1 disconnects from the cluster and needs another SST (Remove '/home/database/mysql//grastate.dat' file and restart if you wish to continue. (FATAL)).

The cluster had worked fine for 2 months.

The following important options are set (DB1):
wsrep_sst_method=xtrabackup
wsrep_cluster_address=gcomm://192.168.1.85,192.168.1.87,192.168.1.60
wsrep_sst_donor=db3_node
wsrep_causal_reads=1
wsrep_provider_options="gcache.size=4G"

Warning and error messages:

DB1:

130806 10:27:37 [Warning] Could not increase number of max_open_files to more than 65535 (request: 1049087)
130806 10:27:37 [Warning] WSREP: Could not open saved state file for reading: /home/database/mysql//grastate.dat
130806 10:27:37 [Warning] WSREP: (0d2e5276-fe72-11e2-9d9d-77047f92373b, 'tcp://0.0.0.0:4567') address 'tcp://192.168.1.85:4567' points to own listening address, blacklisting
130806 10:27:38 [Warning] WSREP: Gap in state sequence. Need state transfer.
130806 10:27:40 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (38b565ea-acd1-11e2-0800-579a44d43f8f): 1 (Operation not permitted)
130806 10:30:19 [Warning] WSREP: last inactive check more than PT1.5S ago (PT20.2287S), skipping check
130806 10:32:00 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.61507S), skipping check
130806 10:33:03 [Warning] WSREP: last inactive check more than PT1.5S ago (PT2.6398S), skipping check
130806 10:34:41 [Warning] WSREP: last inactive check more than PT1.5S ago (PT6.47264S), skipping check
130806 10:38:04 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.56166S), skipping check
130806 10:44:57 [Warning] WSREP: Could not find peer:
130806 10:47:04 [Warning] WSREP: last inactive check more than PT1.5S ago (PT10.2864S), skipping check
130806 10:47:27 [Warning] WSREP: Rejecting JOIN message from 0 (db1_node): new State Transfer required.
130806 10:47:28 [ERROR] WSREP: Local state seqno (64285056) is greater than group seqno (64276104): states diverged. Aborting to avoid potential data loss. Remove '/home/database/mysql//grastate.dat' file and restart if you wish to continue. (FATAL)
130806 10:51:14 [Warning] Could not increase number of max_open_files to more than 65535 (request: 1049087)
130806 10:51:14 [Warning] WSREP: Could not open saved state file for reading: /home/database/mysql//grastate.dat
130806 10:51:14 [Warning] WSREP: (5a0906fb-fe75-11e2-9944-efb4a1d25eb5, 'tcp://0.0.0.0:4567') address 'tcp://192.168.1.85:4567' points to own listening address, blacklisting
130806 10:51:15 [Warning] WSREP: Gap in state sequence. Need state transfer.
130806 10:51:17 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (38b565ea-acd1-11e2-0800-579a44d43f8f): 1 (Operation not permitted)
130806 10:53:10 [Warning] WSREP: last inactive check more than PT1.5S ago (PT12.3021S), skipping check
130806 10:53:20 [Warning] WSREP: last inactive check more than PT1.5S ago (PT4.09586S), skipping check
130806 10:54:21 [Warning] WSREP: last inactive check more than PT1.5S ago (PT2.66241S), skipping check
130806 11:09:42 [Warning] WSREP: last inactive check more than PT1.5S ago (PT7.10041S), skipping check
130806 11:12:10 [Warning] WSREP: Rejecting JOIN message from 1 (db1_node): new State Transfer required.
130806 11:12:10 [ERROR] WSREP: Local state seqno (64301015) is greater than group seqno (64290014): states diverged. Aborting to avoid potential data loss. Remove '/home/database/mysql//grastate.dat' file and restart if you wish to continue. (FATAL)

DB3:

130806 9:12:10 [Warning] WSREP: discarding established (time wait) f32bff04-fe66-11e2-0800-31ae218b35a7 (tcp://192.168.1.85:4567)
130806 9:12:12 [Warning] WSREP: discarding established (time wait) f32bff04-fe66-11e2-0800-31ae218b35a7 (tcp://192.168.1.85:4567)
130806 9:18:54 [Warning] WSREP: discarding established (time wait) f32bff04-fe66-11e2-0800-31ae218b35a7 (tcp://192.168.1.85:4567)
130806 9:19:53 [Warning] WSREP: discarding established (time wait) f32bff04-fe66-11e2-0800-31ae218b35a7 (tcp://192.168.1.85:4567)
130806 9:28:47 [Warning] WSREP: Rejecting JOIN message from 2 (db1_node): new State Transfer required.
130806 10:34:43 [Warning] WSREP: discarding established (time wait) 0d2e5276-fe72-11e2-9d9d-77047f92373b (tcp://192.168.1.85:4567)
130806 10:34:44 [Warning] WSREP: discarding established (time wait) 0d2e5276-fe72-11e2-9d9d-77047f92373b (tcp://192.168.1.85:4567)
130806 10:34:46 [Warning] WSREP: discarding established (time wait) 0d2e5276-fe72-11e2-9d9d-77047f92373b (tcp://192.168.1.85:4567)
130806 10:47:05 [Warning] WSREP: discarding established (time wait) 0d2e5276-fe72-11e2-9d9d-77047f92373b (tcp://192.168.1.85:4567)
130806 10:47:27 [Warning] WSREP: Rejecting JOIN message from 0 (db1_node): new State Transfer required.
130806 10:53:05 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
130806 11:09:43 [Warning] WSREP: discarding established (time wait) 5a0906fb-fe75-11e2-9944-efb4a1d25eb5 (tcp://192.168.1.85:4567)
130806 11:09:44 [Warning] WSREP: Could not find peer: 5a0906fb-fe75-11e2-9944-efb4a1d25eb5
130806 11:09:45 [Warning] WSREP: discarding established (time wait) 5a0906fb-fe75-11e2-9944-efb4a1d25eb5 (tcp://192.168.1.85:4567)
130806 11:09:46 [Warning] WSREP: discarding established (time wait) 5a0906fb-fe75-11e2-9944-efb4a1d25eb5 (tcp://192.168.1.85:4567)
130806 11:12:10 [Warning] WSREP: Rejecting JOIN message from 1 (db1_node): new State Transfer required.
130806 11:12:10 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)

percona.jayj · August 6, 2013, 12:31pm

Generally when Galera tells you things like this, you should listen.
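
As a rough illustration of what the [ERROR] lines in the DB1 log are reporting, the two seqnos can be pulled out of each message and subtracted to see how far ahead of the group DB1 considers itself at the moment it aborts. This is only a sketch: the function name seqno_gap is made up for the example, and the pattern is matched against the message text exactly as it appears in the log above, so it may need adjusting for other Galera versions.

```python
import re

# Matches the "states diverged" message as it appears in the DB1 log above.
SEQNO_RE = re.compile(
    r"Local state seqno \((\d+)\) is greater than group seqno \((\d+)\)"
)

def seqno_gap(log_line):
    """Return local seqno minus group seqno, or None if the line does not match."""
    match = SEQNO_RE.search(log_line)
    if match is None:
        return None
    local_seqno, group_seqno = (int(value) for value in match.groups())
    return local_seqno - group_seqno
```

Run against the two [ERROR] lines quoted in the first post, this gives a gap of 8952 for the 10:47:28 abort and 11001 for the 11:12:10 abort.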

I also noted these messages:

130806 10:27:37 [Warning] Could not increase number of max_open_files to more than 65535 (request: 1049087)

That makes me suspect you have an open files limit problem on DB1 – have you adjusted your ulimits?
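
One way to check what limit the running mysqld process actually received, rather than what the shell's ulimit shows, is to read the "Max open files" row of /proc/<pid>/limits on Linux. The sketch below assumes a Linux host and that you already know the mysqld PID; the function names are hypothetical helpers for this example, and 1049087 is simply the requested value from the warning above.

```python
from pathlib import Path

REQUESTED = 1049087  # value mysqld asked for, taken from the DB1 warning

def parse_max_open_files(limits_text):
    """Return (soft, hard) as strings from the 'Max open files' row."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return soft, hard
    raise ValueError("no 'Max open files' row found")

def satisfies(limit, requested):
    """True if a limit value ('unlimited' or a number) covers the request."""
    return limit == "unlimited" or int(limit) >= requested

def report(pid):
    """Print the open-files limits of a running process (Linux only)."""
    text = Path(f"/proc/{pid}/limits").read_text()
    soft, hard = parse_max_open_files(text)
    for name, limit in (("soft", soft), ("hard", hard)):
        verdict = "ok" if satisfies(limit, REQUESTED) else "too low"
        print(f"{name} limit: {limit} ({verdict} for {REQUESTED})")
```

Calling report() with the PID of mysqld on DB1 would show whether the soft or hard limit is what caps the process at 65535.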
