New Cluster locked after starting second node

Hello

I am trying to move our 270 GB database to a new cluster.

To do this I installed a 3-node cluster running CentOS 6 and XtraDB Cluster, and everything worked fine.

Then I shut down the 3 nodes, took an XtraBackup backup from our old server, started the first node, and configured it as a slave of the old server to keep the data up to date. Everything worked fine.

Then I started the second node and it began the state transfer. As soon as the transfer finished and MySQL started on the second node, both nodes seemed to enter a deadlock and both got locked.

I can't run any command on either node. When I try to do use database on the second node I get an unknown command error. On the first node I tried STOP SLAVE, but it hung until I ran /etc/init.d/mysql stop on the second node; then it was released and continued normally. The second node, however, stayed stuck in the shutdown process at:

120914 21:20:22 [Note] WSREP: recv_thread() joined.
120914 21:20:22 [Note] WSREP: Closing slave action queue.

So I am very confused: the nodes work fine before I copy our database over, but as soon as I copy it something goes wrong, and nothing reports what the problem is.

Update:

I think I am having a split brain here, so I will now try stopping replication to have time to start both node02 and node03, and we will see how it goes.

Any idea how to start node02 and node03 without running into split brain? Note that when I start node02 I go straight into this state, without even running a query on the system.

Any ideas?
BR
Shahbour

Hello

I did start the three nodes, but about 3 minutes after running START SLAVE they got stuck again.

I was able to run SELECT on all three, but no INSERT or UPDATE.

I checked the wsrep status on each node and they were all ready and synced.
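
A check of this kind can be sketched as below. Which variables were actually inspected is not stated above, so the list is an assumption; the names themselves are standard Galera/wsrep status variables.

```sql
-- Assumed check, run on each node (the exact variables inspected are not stated in this post).
SHOW GLOBAL STATUS WHERE Variable_name IN (
  'wsrep_ready',                -- ON when the node accepts queries
  'wsrep_local_state_comment',  -- Synced on a healthy, caught-up node
  'wsrep_cluster_status',       -- Primary; anything else suggests a partition
  'wsrep_cluster_size',         -- number of nodes this node currently sees
  'wsrep_local_recv_queue',     -- write sets waiting to be applied locally
  'wsrep_flow_control_paused'   -- fraction of time replication was paused by flow control
);
```

On a 3-node cluster without split brain, every node should report wsrep_cluster_status as Primary and wsrep_cluster_size as 3.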

When I run SHOW FULL PROCESSLIST; on the second node it shows the output below.

Any idea?

mysql> SHOW FULL PROCESSLIST;
+----+-------------+-----------+------+---------+-------+------------------------------------------+-----------------------+-----------+---------------+-----------+
| Id | User        | Host      | db   | Command | Time  | State                                    | Info                  | Rows_sent | Rows_examined | Rows_read |
+----+-------------+-----------+------+---------+-------+------------------------------------------+-----------------------+-----------+---------------+-----------+
|  1 | system user |           | NULL | Sleep   | 70948 | applied write set 3178572                | NULL                  |         0 |             0 |         1 |
|  2 | system user |           | NULL | Sleep   |  2431 | wsrep aborter idle                       | NULL                  |         0 |             0 |         1 |
|  3 | system user |           | NULL | Sleep   | 70948 | Delete_rows_log_event::find_row(3178571) | NULL                  |         0 |             0 |         1 |
|  5 | root        | localhost | NULL | Query   |     0 | sleeping                                 | SHOW FULL PROCESSLIST |         0 |             0 |         1 |
+----+-------------+-----------+------+---------+-------+------------------------------------------+-----------------------+-----------+---------------+-----------+
4 rows in set (0.00 sec)

Any idea?

Hello

It seems the second node is spending too much time running

Delete_rows_log_event::find_row(18003806)

What is this, and why is it locking everything?

BR
Shahbour
