Restart problem with XtraDB Cluster nodes

Hi,

I set up a cluster of six nodes. Everything works fine, but when I shut down a node I can't start it again. Here is what appears in the log during startup:

terminate called after throwing an instance of 'gu::NotFound'
13:43:36 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
key_buffer_size=0
read_buffer_size=262144
max_used_connections=0
max_threads=10000
thread_count=2
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 7806328 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x1557c500
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 412ad0f8 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7bb1c5]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x693af4]
/lib64/libpthread.so.0[0x31f4a0ebe0]
/lib64/libc.so.6(gsignal+0x35)[0x31f4230285]
/lib64/libc.so.6(abort+0x110)[0x31f4231d30]
/usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x114)[0x31f8ebed14]
/usr/lib64/libstdc++.so.6[0x31f8ebce16]
/usr/lib64/libstdc++.so.6[0x31f8ebce43]
/usr/lib64/libstdc++.so.6[0x31f8ebce56]
/usr/lib64/libstdc++.so.6(__cxa_call_unexpected+0x48)[0x31f8ebc8a8]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM15prepare_for_ISTERPvRlRK10wsrep_uuidl+0x70b)[0x2aaaacf3c88b]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM21prepare_state_requestEPKvlRK10wsrep_uuidl+0x13d)[0x2aaaacf3cb3d]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM22request_state_transferEPvRK10wsrep_uuidlPKvl+0x35)[0x2aaaacf3cd95]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM19process_conf_changeEPvRK15wsrep_view_infoiNS_10Replicator5StateEl+0x5cb)[0x2aaaacf2d60b]
/usr/lib64/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_action+0x8ee)[0x2aaaacf0e5de]
/usr/lib64/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPv+0x58)[0x2aaaacf0e898]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0xfd)[0x2aaaacf2caed]
/usr/lib64/libgalera_smm.so(galera_recv+0x23)[0x2aaaacf41be3]
/usr/sbin/mysqld(_Z25wsrep_replication_processP3THD+0x6b)[0x58d12b]
/usr/sbin/mysqld(start_wsrep_THD+0x3f3)[0x51ffa3]
/lib64/libpthread.so.0[0x31f4a0677d]
/lib64/libc.so.6(clone+0x6d)[0x31f42d325d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 2
Status: NOT_KILLED
The manual page at MySQL :: MySQL 8.0 Reference Manual :: B.3.3.3 What to Do If MySQL Keeps Crashing contains
information that should help you find out what is causing the crash.
120411 15:43:36 mysqld_safe mysqld from pid file /var/lib/mysql/srv2.pid ended

I found a workaround: if I delete the grastate.dat file in the datadir, the node starts and works again.

Why can't this node start unless I delete the grastate.dat file?
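For context: grastate.dat is where Galera saves the node's cluster state UUID and the seqno of the last write-set it applied, and a restarting node uses that position when it asks the cluster for the data it missed. The backtrace above dies in `ReplicatorSMM::prepare_for_IST`, called from `request_state_transfer`, i.e. while preparing that request. Below is a minimal sketch for inspecting the file, assuming the usual layout of a `# GALERA saved state` comment followed by `key: value` lines; the sample contents and the helper names `parse_grastate` and `saved_position` are hypothetical, not part of any Galera tool.

```python
# Minimal grastate.dat reader (sketch). Assumes the usual layout:
# a "# GALERA saved state" comment line followed by "key: value" lines.
ZERO_UUID = "00000000-0000-0000-0000-000000000000"

# Hypothetical sample contents. On a real node you could read
# open("/var/lib/mysql/grastate.dat").read() instead (datadir guessed
# from the pid file path in the log above).
SAMPLE = """# GALERA saved state
version: 2.1
uuid:    aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
seqno:   1234
cert_index:
"""


def parse_grastate(text: str) -> dict:
    """Return the file's key/value pairs as a dict of strings."""
    state = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip the header comment and blank lines
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def saved_position(state: dict):
    """Return (uuid, seqno) if the file records a usable position, else None.

    A missing or all-zero uuid, or a negative seqno (-1 is what the file
    holds while the position is unknown), is treated by this sketch as
    "no saved position to resume from".
    """
    uuid = state.get("uuid", "")
    try:
        seqno = int(state.get("seqno", "-1"))
    except ValueError:
        return None
    if uuid in ("", ZERO_UUID) or seqno < 0:
        return None
    return uuid, seqno


if __name__ == "__main__":
    print(saved_position(parse_grastate(SAMPLE)))
    # ('aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee', 1234)
```

Deleting the file amounts to the `None` case: the node no longer has a position to resume from.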

Thank you in advance for your help.

Hi,

I see that a new Percona XtraDB Cluster forum has been opened. Could someone move this thread there?

After many tests, I found that this problem is related to one of the nodes, which is also a MySQL replication slave. This node receives a lot of transactions through the replication channel. If I remove this node from the cluster, stopping and starting any of the other nodes works perfectly, but in that setup no data is being written.

Initially, I had configured the replication slave to be the first node of the cluster. I then removed it from the cluster and added it back so that it was no longer the first node. Again, when I stop and start a node, the same error message appears in the mysqld.log file.

I don't understand why this happens, and I don't know how to avoid it.

Edit:

Apparently it is not a replication problem. I excluded the replication slave node from the cluster, ran sysbench against one node, and tried to restart another node while the benchmark was running: same issue.
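One reading consistent with these observations (an assumption, not a confirmed diagnosis): a restarting node only needs a state transfer if the cluster committed write-sets while it was down. Under replication or sysbench load there is always such a gap, so the node requests an incremental state transfer (IST), which is the path the backtrace in the first post dies on (`process_conf_change` → `request_state_transfer` → `prepare_for_IST`). With no writes there is nothing to transfer, and with grastate.dat deleted the node has no saved position and asks for a full snapshot (SST) instead. The sketch below models that decision; `DonorState`, `state_transfer_needed` and the field names are hypothetical, and the real donor cache is bounded in bytes (`gcache.size` in `wsrep_provider_options`) rather than by a seqno.

```python
from dataclasses import dataclass

ZERO_UUID = "00000000-0000-0000-0000-000000000000"


@dataclass
class DonorState:
    """Hypothetical view of the donor node; field names are illustrative only."""
    group_uuid: str          # cluster state UUID
    group_seqno: int         # last write-set committed in the cluster
    cache_lowest_seqno: int  # oldest write-set still held in the donor's gcache


def state_transfer_needed(joiner_uuid: str, joiner_seqno: int, donor: DonorState) -> str:
    """Approximate which kind of state transfer a restarting node ends up needing."""
    # No saved position (e.g. grastate.dat deleted) or a different cluster history.
    if joiner_uuid in ("", ZERO_UUID) or joiner_uuid != donor.group_uuid or joiner_seqno < 0:
        return "SST"
    # Nothing was committed while the node was down: it can rejoin as is.
    if joiner_seqno >= donor.group_seqno:
        return "none"
    # Every missing write-set is still cached on the donor: incremental transfer.
    if joiner_seqno + 1 >= donor.cache_lowest_seqno:
        return "IST"
    # Part of the gap has already been evicted from the cache: full snapshot.
    return "SST"


if __name__ == "__main__":
    uuid = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"  # hypothetical cluster UUID
    idle = DonorState(uuid, group_seqno=1000, cache_lowest_seqno=1)  # no writes while down
    busy = DonorState(uuid, group_seqno=5000, cache_lowest_seqno=1)  # writes while down
    print(state_transfer_needed(uuid, 1000, idle))     # none
    print(state_transfer_needed(uuid, 1000, busy))     # IST
    print(state_transfer_needed(ZERO_UUID, -1, busy))  # SST
```

Under this model, the fact that restarts only fail while writes are flowing points at the IST path rather than at replication itself.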