Percona-XtraDB-Cluster 5.7.19 Bug or Not?

I have a 3-node Cluster installed on Oracle Linux Server release 7.3. On each Node RAM is 16GB.

1st Node shutdown immediately and error in log is like following

2018-03-10T17:53:34.779241Z 10 [Note] WSREP: Victim thread:
THD: 1362991, mode: local, state: committing, conflict: no conflict, seqno: -1
SQL: commit

2018-03-10T17:59:14.170233Z 1369479 [Note] Aborted connection 1369479 to db: ‘shems’ user: ‘asterisk’ host: ‘10.22.12.105’ (Got an error reading communication packets)
2018-03-10T18:03:32.001458Z 1370575 [Note] WSREP: MDL conflict db= table=task_event_2 ticket=MDL_EXCLUSIVE solved by abort

*** Priority TRANSACTION:
TRANSACTION 12735007, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
MySQL thread id 7, OS thread handle 139814477833984, query id 48121740 System lock

*** Victim TRANSACTION:
TRANSACTION 12735006, ACTIVE 0 sec
, undo log entries 1
MySQL thread id 1362991, OS thread handle 139814137227008, query id 48121738 10.22.12.105 shems wsrep: initiating replication for write set (-1)
commit
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1566 page no 3 n bits 88 index PRIMARY of table shems.realtime_map trx id 12735006 lock_mode X locks rec but not gap
2018-03-10T18:05:47.369670Z 7 [Note] WSREP: --------- CONFLICT DETECTED --------
2018-03-10T18:05:47.369680Z 7 [Note] WSREP: cluster conflict due to high priority abort for threads:

2018-03-10T18:05:47.369687Z 7 [Note] WSREP: Winning thread:
THD: 7, mode: applier, state: executing, conflict: no conflict, seqno: 3779483
SQL: (null)

2018-03-10T18:05:47.369693Z 7 [Note] WSREP: Victim thread:
THD: 1362991, mode: local, state: committing, conflict: no conflict, seqno: -1
SQL: commit

2018-03-10T18:22:47.005681Z 1373879 [Note] WSREP: MDL conflict db= table=task_event_2 ticket=MDL_EXCLUSIVE solved by abort
2018-03-10T18:43:01.001650Z 1377351 [Note] WSREP: MDL conflict db= table=task_event_2 ticket=MDL_EXCLUSIVE solved by abort
2018-03-10T18:43:01.003203Z 1377353 [Note] WSREP: MDL conflict db= table=task_event ticket=MDL_EXCLUSIVE solved by abort
2018-03-10T19:03:03.001551Z 1380791 [Note] WSREP: MDL conflict db= table=task_event ticket=MDL_EXCLUSIVE solved by abort
2018-03-10T20:00:51.686305Z 1390566 [Note] Aborted connection 1390566 to db: ‘shems’ user: ‘asterisk’ host: ‘10.22.12.105’ (Got an error reading communication packets)
2018-03-10T20:05:02.002695Z 1391429 [Note] WSREP: MDL conflict db= table=task_event ticket=MDL_EXCLUSIVE solved by abort
2018-03-10T20:05:02.003068Z 1391430 [Note] WSREP: MDL conflict db= table=task_event_2 ticket=MDL_EXCLUSIVE solved by abort
2018-03-10T20:08:06.370495Z 1308616 [Note] Aborted connection 1308616 to db: ‘shems’ user: ‘asterisk’ host: ‘10.22.12.105’ (Got an error reading communication packets)
2018-03-10T20:45:08.784890Z 3 [Note] WSREP: MDL conflict db= table= ticket=MDL_SHARED solved by abort
2018-03-10T20:45:09.755715Z 1398317 [Note] WSREP: MDL conflict db= table=task_event_2 ticket=MDL_EXCLUSIVE solved by abort
2018-03-10T20:50:57.001104Z 1399312 [Note] WSREP: MDL conflict db= table=task_event ticket=MDL_EXCLUSIVE solved by abort
2018-03-10T20:50:57.001923Z 1399313 [Note] WSREP: MDL conflict db= table=task_event_2 ticket=MDL_EXCLUSIVE solved by abort
2018-03-10T21:58:49.001785Z 1410960 [Note] WSREP: MDL conflict db= table=task_event_2 ticket=MDL_EXCLUSIVE solved by abort

*** Priority TRANSACTION:
TRANSACTION 12770159, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
MySQL thread id 8, OS thread handle 139814477035264, query id 48592432 System lock

*** Victim TRANSACTION:
TRANSACTION 12770158, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
MySQL thread id 1411254, OS thread handle 139814133233408, query id 48592430 event_scheduler updating
update call_durations set call_duration=@duration where call_id= NAME_CONST(‘cid’,_utf8’1520664191.13624’ COLLATE ‘utf8_general_ci’) and user_id= NAME_CONST(‘uid’,8)
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1468 page no 4 n bits 264 index GEN_CLUST_INDEX of table shems.call_durations trx id 12770158 lock_mode X
2018-03-10T22:00:31.075756Z 8 [Note] WSREP: --------- CONFLICT DETECTED --------
2018-03-10T22:00:31.075777Z 8 [Note] WSREP: cluster conflict due to high priority abort for threads:

2018-03-10T22:00:31.075828Z 8 [Note] WSREP: Winning thread:
THD: 8, mode: applier, state: executing, conflict: no conflict, seqno: 3796899
SQL: (null)

2018-03-10T22:00:31.075849Z 8 [Note] WSREP: Victim thread:
THD: 1411254, mode: local, state: idle, conflict: no conflict, seqno: -1
SQL: update call_durations set call_duration=@duration where call_id= NAME_CONST(‘cid’,_utf8’1520664191.13624’ COLLATE ‘utf8_general_ci’) and user_id= NAME_CONST(‘uid’,8)

2018-03-11 02:00:31 0x7f2903bef700 InnoDB: Assertion failure in thread 139814133233408 in file lock0lock.cc line 7973
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: [url]http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html[/url]
InnoDB: about forcing recovery.
22:00:31 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at [url]https://bugs.launchpad.net/percona-xtradb-cluster[/url]

key_buffer_size=134217728
read_buffer_size=2097152
max_used_connections=12
max_threads=152
thread_count=15
connection_count=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 751788 K bytes of memory
Hope that’s ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f28802cfe00
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong…
stack_bottom = 7f2903beeabf thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0xf3de9b]
/usr/sbin/mysqld(handle_fatal_signal+0x471)[0x7adfc1]
/lib64/libpthread.so.0(+0xf370)[0x7f2938714370]
/lib64/libc.so.6(gsignal+0x37)[0x7f29367f21d7]
/lib64/libc.so.6(abort+0x148)[0x7f29367f38c8]
/usr/sbin/mysqld[0x77766a]
/usr/sbin/mysqld[0x109158d]
/usr/sbin/mysqld(_ZN7RecLock14deadlock_checkEP9ib_lock_t+0x1d)[0x10915ad]
/usr/sbin/mysqld(_ZN7RecLock12add_to_waitqEPK9ib_lock_tPK9lock_prdt+0x82)[0x1091dc2]
/usr/sbin/mysqld[0x109251f]
/usr/sbin/mysqld(_Z34lock_clust_rec_read_check_and_lockmPK11buf_block_tPKhP12dict_index_tPKm9lock_modemP9que_thr_t+0xe5)[0x1093a85]
/usr/sbin/mysqld[0x115337a]
/usr/sbin/mysqld(_Z15row_search_mvccPh15page_cur_mode_tP14row_prebuilt_tmm+0x1635)[0x115b715]
/usr/sbin/mysqld(_ZN11ha_innobase13general_fetchEPhjj+0x1a5)[0x1041ae5]
/usr/sbin/mysqld(_ZN7handler11ha_rnd_nextEPh+0x177)[0x82a097]
/usr/sbin/mysqld(_Z13rr_sequentialP11READ_RECORD+0x20)[0xc5f250]
/usr/sbin/mysqld(Z12mysql_updateP3THDR4ListI4ItemES4_y15enum_duplicatesPyS6+0x11bf)[0xd8dd5f]
/usr/sbin/mysqld(_ZN14Sql_cmd_update23try_single_table_updateEP3THDPb+0x1d6)[0xd904b6]
/usr/sbin/mysqld(_ZN14Sql_cmd_update7executeEP3THD+0x36)[0xd907f6]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x3797)[0xd000c7]
/usr/sbin/mysqld(_ZN13sp_instr_stmt9exec_coreEP3THDPj+0x50)[0xc7bb40]
/usr/sbin/mysqld(_ZN12sp_lex_instr23reset_lex_and_exec_coreEP3THDPjb+0x374)[0xc7d774]
/usr/sbin/mysqld(_ZN12sp_lex_instr29validate_lex_and_execute_coreEP3THDPjb+0xbb)[0xc7e15b]
/usr/sbin/mysqld(_ZN13sp_instr_stmt7executeEP3THDPj+0x330)[0xc7f510]
/usr/sbin/mysqld(_ZN7sp_head7executeEP3THDb+0x53b)[0xc7737b]
/usr/sbin/mysqld(_ZN7sp_head17execute_procedureEP3THDP4ListI4ItemE+0x7a7)[0xc7b017]
/usr/sbin/mysqld(_ZN14Event_job_data7executeEP3THDb+0xabb)[0xdd9ccb]
/usr/sbin/mysqld(_ZN19Event_worker_thread3runEP3THDP28Event_queue_element_for_exec+0x188)[0xe95cd8]
/usr/sbin/mysqld(event_worker_thread+0x57)[0xe95da7]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0xfc2394]
/lib64/libpthread.so.0(+0x7dc5)[0x7f293870cdc5]
/lib64/libc.so.6(clone+0x6d)[0x7f29368b473d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (48e8c48): update call_durations set call_duration=@duration
where call_id= NAME_CONST(‘cid’,_utf8’1520664191.13624’ COLLATE ‘utf8_general_ci’) and user_id= NAME_CONST(‘uid’,8)
Connection ID (thread ID): 1411254
Status: NOT_KILLED

[COLOR=#FF0000]WHAT IS THE REASON? IS IT BUG OR NOT?

Anytime a crash like this happens, it’s a bug, it’s running into an assert in the code.

What is the exact scenario? From the stack trace it looks like you’re calling a stored procedure from an event. Is this correct?

This appears to be a bug when the event gets rolled back due to conflict. This occurs on both PXC 5.6 and 5.7.

A jira bug has been created for this:

[url][PXC-2079] Crash during rollback of a stored procedure called in an event - Percona JIRA

Yes it is an event calling procedure.

Generally I think there is some problems with events in PXC 5.7. For example in my situation (3-node cluster). The Events are only enabled in 1st node by default. If 1-st node is crash then the events are not enabled in one of the other 2 nodes automatically. Even if I forced to enable in one of other 2 nodes after crash, although status of events show enabled, actually doesn’t work.
Means that events are only works in 1-st node - originated node.
Is it really work principle of events in PXC 5.7? If yes then it is the problem for cluster and hope to be solved in following releases or can be solved with patch or bug fixes.
If there is any documentation or blog which explain events detailed, please inform me.
Thanks.

As far as events are concerned, PXC (galera) replication should behave similarly to asynchronous replication.

This is a link to some documentation from MySQL.

https://dev.mysql.com/doc/refman/5.7/en/replication-features-invoked.html

I’ve tested the events and they seem to work on the other nodes (after they are enabled and event-scheduler is set to ON) and as long as the event action is not rolled back due to a BF abort.

There is no automatic failover of the events in the case a node goes down.

Thanks Kenn Takara,

It is clear now.

In my.cnf event-scheduler configured as OFF. I forget about it. So it doesn’t work.