Lukasz
August 6, 2013, 6:06am
XtraDB Cluster node db1 tries to join the cluster (db3, garbd). The SST succeeds and the connection is established (wsrep_cluster_size shows 3 nodes), but then db1 disconnects from the cluster and requires another SST (Remove '/home/database/mysql//grastate.dat' file and restart if you wish to continue. (FATAL)).
The cluster had worked fine for 2 months before this.
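For context, grastate.dat is where Galera keeps a node's state UUID and seqno between restarts. Below is a minimal sketch of reading it, assuming the `key: value` layout written by Galera 2.x/3.x. The helper `parse_grastate` and the sample contents are hypothetical (the UUID is borrowed from the group UUID in DB1's log). Mapping a missing file to the all-zero UUID and seqno -1 is my assumption, chosen to match what DB1 reports right after "Could not open saved state file for reading".

```python
from typing import Optional, Tuple

# All-zero UUID: what DB1 reports when it has no saved state.
ZERO_UUID = "00000000-0000-0000-0000-000000000000"


def parse_grastate(text: Optional[str]) -> Tuple[str, int]:
    """Return (uuid, seqno) from grastate.dat contents (hypothetical helper).

    Assumes the Galera 2.x/3.x 'key: value' layout. text=None stands for a
    missing or unreadable file and is mapped to (ZERO_UUID, -1).
    """
    if text is None:
        return ZERO_UUID, -1
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, the "# GALERA saved state" comment, and junk.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields.get("uuid", ZERO_UUID), int(fields.get("seqno", "-1"))


# Illustrative file contents; the UUID is the group UUID from DB1's log.
sample = """# GALERA saved state
version: 2.1
uuid:    38b565ea-acd1-11e2-0800-579a44d43f8f
seqno:   -1
cert_index:
"""

print(parse_grastate(sample))  # ('38b565ea-acd1-11e2-0800-579a44d43f8f', -1)
print(parse_grastate(None))    # ('00000000-0000-0000-0000-000000000000', -1)
```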
The following relevant options are set on DB1:
wsrep_sst_method=xtrabackup
wsrep_cluster_address=gcomm://192.168.1.85,192.168.1.87,192.168.1.60
wsrep_sst_donor=db3_node
wsrep_causal_reads=1
wsrep_provider_options="gcache.size=4G"
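To make these settings concrete, here is a small sketch that reads them as written. The helpers `gcomm_peers`, `provider_options` and `size_to_bytes` are hypothetical, and treating `4G` as 4 * 1024**3 bytes is my assumption. It also suggests why the "points to own listening address, blacklisting" warning in the log below is expected: 192.168.1.85 is DB1's own address, so only the other two entries are actual peers.

```python
from __future__ import annotations

import re

# Assumption: size suffixes are read as powers of 1024.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def gcomm_peers(address: str, own_ip: str) -> list[str]:
    """Split a gcomm:// address list and drop this node's own IP."""
    prefix = "gcomm://"
    if not address.startswith(prefix):
        raise ValueError("expected a gcomm:// address")
    hosts = [h.strip() for h in address[len(prefix):].split(",") if h.strip()]
    # Compare on the host part only, so "1.2.3.4:4567" also matches own_ip.
    return [h for h in hosts if h.rsplit(":", 1)[0] != own_ip]


def provider_options(opts: str) -> dict[str, str]:
    """Parse semicolon-separated 'key=value' provider options."""
    result: dict[str, str] = {}
    for item in opts.split(";"):
        key, sep, value = item.partition("=")
        if sep:
            result[key.strip()] = value.strip()
    return result


def size_to_bytes(size: str) -> int:
    """Convert a size such as '4G' to bytes."""
    m = re.fullmatch(r"(\d+)([KMGT]?)", size.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised size: {size!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


peers = gcomm_peers(
    "gcomm://192.168.1.85,192.168.1.87,192.168.1.60", own_ip="192.168.1.85"
)
gcache_bytes = size_to_bytes(provider_options("gcache.size=4G")["gcache.size"])
print(peers)         # ['192.168.1.87', '192.168.1.60']
print(gcache_bytes)  # 4294967296
```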
Warning and error messages.
DB1:
130806 10:27:37 [Warning] Could not increase number of max_open_files to more than 65535 (request: 1049087)
130806 10:27:37 [Warning] WSREP: Could not open saved state file for reading: /home/database/mysql//grastate.dat
130806 10:27:37 [Warning] WSREP: (0d2e5276-fe72-11e2-9d9d-77047f92373b, 'tcp://0.0.0.0:4567') address 'tcp://192.168.1.85:4567' points to own listening address, blacklisting
130806 10:27:38 [Warning] WSREP: Gap in state sequence. Need state transfer.
130806 10:27:40 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (38b565ea-acd1-11e2-0800-579a44d43f8f): 1 (Operation not permitted)
130806 10:30:19 [Warning] WSREP: last inactive check more than PT1.5S ago (PT20.2287S), skipping check
130806 10:32:00 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.61507S), skipping check
130806 10:33:03 [Warning] WSREP: last inactive check more than PT1.5S ago (PT2.6398S), skipping check
130806 10:34:41 [Warning] WSREP: last inactive check more than PT1.5S ago (PT6.47264S), skipping check
130806 10:38:04 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.56166S), skipping check
130806 10:44:57 [Warning] WSREP: Could not find peer:
130806 10:47:04 [Warning] WSREP: last inactive check more than PT1.5S ago (PT10.2864S), skipping check
130806 10:47:27 [Warning] WSREP: Rejecting JOIN message from 0 (db1_node): new State Transfer required.
130806 10:47:28 [ERROR] WSREP: Local state seqno (64285056) is greater than group seqno (64276104): states diverged. Aborting to avoid potential data loss. Remove '/home/database/mysql//grastate.dat' file and restart if you wish to continue. (FATAL)
130806 10:51:14 [Warning] Could not increase number of max_open_files to more than 65535 (request: 1049087)
130806 10:51:14 [Warning] WSREP: Could not open saved state file for reading: /home/database/mysql//grastate.dat
130806 10:51:14 [Warning] WSREP: (5a0906fb-fe75-11e2-9944-efb4a1d25eb5, 'tcp://0.0.0.0:4567') address 'tcp://192.168.1.85:4567' points to own listening address, blacklisting
130806 10:51:15 [Warning] WSREP: Gap in state sequence. Need state transfer.
130806 10:51:17 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (38b565ea-acd1-11e2-0800-579a44d43f8f): 1 (Operation not permitted)
130806 10:53:10 [Warning] WSREP: last inactive check more than PT1.5S ago (PT12.3021S), skipping check
130806 10:53:20 [Warning] WSREP: last inactive check more than PT1.5S ago (PT4.09586S), skipping check
130806 10:54:21 [Warning] WSREP: last inactive check more than PT1.5S ago (PT2.66241S), skipping check
130806 11:09:42 [Warning] WSREP: last inactive check more than PT1.5S ago (PT7.10041S), skipping check
130806 11:12:10 [Warning] WSREP: Rejecting JOIN message from 1 (db1_node): new State Transfer required.
130806 11:12:10 [ERROR] WSREP: Local state seqno (64301015) is greater than group seqno (64290014): states diverged. Aborting to avoid potential data loss. Remove '/home/database/mysql//grastate.dat' file and restart if you wish to continue. (FATAL)
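The "last inactive check" warnings use ISO 8601 durations. As I read the message, it fires when more than PT1.5S (1.5 s) has passed since the previous check, and the value in parentheses is the actual interval. Here is a small sketch for pulling those delays out of a log. The helpers `pt_seconds` and `inactive_gaps` are hypothetical, and only the seconds-only `PT<seconds>S` form seen in this log is handled.

```python
import re

# Seconds-only ISO 8601 durations such as "PT1.5S" or "PT20.2287S".
# Forms with minutes/hours (e.g. "PT1M5S") are not handled here.
_PT = re.compile(r"PT(\d+(?:\.\d+)?)S")
_CHECK = re.compile(
    r"last inactive check more than (PT[\d.]+S) ago \((PT[\d.]+S)\)"
)


def pt_seconds(duration: str) -> float:
    """Convert a seconds-only ISO 8601 duration to a float."""
    m = _PT.fullmatch(duration)
    if m is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return float(m.group(1))


def inactive_gaps(log_lines):
    """Yield (timestamp, threshold_s, actual_s) for each skipped check."""
    for line in log_lines:
        m = _CHECK.search(line)
        if m:
            # Timestamps in this log look like "130806 10:30:19" (15 chars).
            yield line[:15], pt_seconds(m.group(1)), pt_seconds(m.group(2))


log = [
    "130806 10:30:19 [Warning] WSREP: last inactive check more than PT1.5S ago (PT20.2287S), skipping check",
    "130806 10:47:04 [Warning] WSREP: last inactive check more than PT1.5S ago (PT10.2864S), skipping check",
    "130806 10:53:10 [Warning] WSREP: last inactive check more than PT1.5S ago (PT12.3021S), skipping check",
]
worst = max(inactive_gaps(log), key=lambda gap: gap[2])
print(worst)  # ('130806 10:30:19', 1.5, 20.2287)
```

On these lines from DB1's log, the longest interval is the roughly 20-second one at 10:30:19.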
DB3:
Generally, when Galera tells you something like this, you should listen.
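Roughly, what Galera reports in DB1's log can be modelled as a comparison of (UUID, seqno) pairs. With the all-zero UUID, DB1 cannot use IST and gets a full SST. Later, its seqno (64285056) is ahead of the group's (64276104), which Galera treats as diverged history and refuses to continue. The sketch below is a simplified illustration of that reasoning, not Galera's actual implementation. `NodeState` and `join_decision` are hypothetical, and the post-SST state of DB1 is assumed.

```python
from dataclasses import dataclass

ZERO_UUID = "00000000-0000-0000-0000-000000000000"


@dataclass(frozen=True)
class NodeState:
    uuid: str   # cluster state UUID
    seqno: int  # last applied writeset (-1 = none)


def join_decision(local: NodeState, group: NodeState) -> str:
    """Simplified, hypothetical model of a joiner's checks (not Galera's code)."""
    if local.uuid != group.uuid:
        # No usable history (e.g. missing grastate.dat -> all-zero UUID):
        # incremental transfer is impossible, so a full SST is needed.
        return "SST"
    if local.seqno > group.seqno:
        # The joiner claims writesets the group never committed.
        return "ABORT: states diverged"
    if local.seqno == group.seqno:
        return "in sync"
    # Assumes the donor's gcache still holds seqnos local.seqno+1..group.seqno;
    # otherwise a full SST would be needed instead.
    return "IST"


group = NodeState("38b565ea-acd1-11e2-0800-579a44d43f8f", 64276104)
print(join_decision(NodeState(ZERO_UUID, -1), group))  # SST
# Hypothetical: DB1 after SST, carrying the seqno from the first FATAL line.
db1 = NodeState(group.uuid, 64285056)
print(join_decision(db1, group))  # ABORT: states diverged
print(db1.seqno - group.seqno)    # 8952
```

In the second attempt, the gap is 64301015 - 64290014 = 11001 writesets, so DB1 was ahead of the group both times.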
I also noticed this message:
130806 10:27:37 [Warning] Could not increase number of max_open_files to more than 65535 (request: 1049087)
That makes me suspect an open files limit problem on DB1: mysqld requested 1049087 file descriptors but could not raise its limit above 65535. Have you adjusted your ulimits?
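Here is a quick way to quantify that warning and to see which limits a process actually gets. This sketch uses Python's standard `re` and `resource` modules (`resource` is Unix-only). `open_files_shortfall` is a hypothetical helper. Note that `getrlimit` reports the limits of the Python process itself, not of the running mysqld. On Linux, mysqld's effective limits can be read from `/proc/<pid>/limits`.

```python
import re
import resource  # Unix-only standard-library module

_WARN = re.compile(r"max_open_files to more than (\d+) \(request: (\d+)\)")


def open_files_shortfall(log_line: str):
    """Return (granted, requested, missing) from the mysqld warning, or None."""
    m = _WARN.search(log_line)
    if m is None:
        return None
    granted, requested = int(m.group(1)), int(m.group(2))
    return granted, requested, requested - granted


line = ("130806 10:27:37 [Warning] Could not increase number of "
        "max_open_files to more than 65535 (request: 1049087)")
print(open_files_shortfall(line))  # (65535, 1049087, 983552)

# Limits of *this* Python process, as inherited from the shell/service.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
hard_text = "unlimited" if hard == resource.RLIM_INFINITY else str(hard)
print(f"RLIMIT_NOFILE: soft={soft} hard={hard_text}")
```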