Node 2 crashes, possible cause?

Hi,

We have a 3-node Percona XtraDB Cluster 5.7.x. Node 2 crashed today and I have no idea why. Can somebody shed some light on this?

This is what we are running:

mysql> \s


mysql Ver 14.14 Distrib 5.7.28-31, for debian-linux-gnu (x86_64) using 6.3

Connection id: 654497

Current database:

Current user: root@localhost

SSL: Cipher in use is ECDHE-RSA-AES128-GCM-SHA256

Current pager: stdout

Using outfile: ''

Using delimiter: ;

Server version: 5.7.28-31-57.2-log Percona XtraDB Cluster (GPL), Release rel31, Revision 939b65f, WSREP version 31.41, wsrep_31.41

Protocol version: 10

Connection: 127.0.0.1 via TCP/IP

Server characterset: utf8mb4

Db characterset: utf8mb4

Client characterset: utf8mb4

Conn. characterset: utf8mb4

TCP port: 3306

Uptime: 7 days 21 hours 14 min 28 sec

2020-10-23T06:53:39.857829-05:00 309846 [Note] Aborted connection 309846 to db: 'unconnected' user: 'monitorps1' host: '10.102.70.145' (Got an error reading communication packets)

2020-10-23T06:53:51.956845-05:00 0 [Note] WSREP: Deleted page /var/lib/mysql/alertdriving/gcache.page.000005

2020-10-23T06:53:52.667054-05:00 309847 [Note] Aborted connection 309847 to db: 'unconnected' user: 'monitorps2' host: '10.102.70.146' (Got an error reading communication packets)

2020-10-23T06:53:52.671057-05:00 309848 [Note] Aborted connection 309848 to db: 'unconnected' user: 'monitorps3' host: '10.102.70.147' (Got an error reading communication packets)

2020-10-23T06:53:52.678720-05:00 309850 [Note] Aborted connection 309850 to db: 'unconnected' user: 'monitorps1' host: '10.102.70.145' (Got an error reading communication packets)

2020-10-23T06:54:14.064383-05:00 309853 [Note] Aborted connection 309853 to db: 'unconnected' user: 'monitorps3' host: '10.102.70.147' (Got an error reading communication packets)

2020-10-23T06:54:14.064685-05:00 309855 [Note] Aborted connection 309855 to db: 'unconnected' user: 'monitorps1' host: '10.102.70.145' (Got an error reading communication packets)

2020-10-23T06:54:14.065164-05:00 309852 [Note] Aborted connection 309852 to db: 'unconnected' user: 'monitorps2' host: '10.102.70.146' (Got an error reading communication packets)

2020-10-23T06:54:34.283625-05:00 0 [Note] WSREP: Created page /var/lib/mysql/alertdriving/gcache.page.000008 of size 209691272 bytes

2020-10-23T06:55:07.904887-05:00 309883 [Note] WSREP: monitor wait failed for causal read (repl.causal_read_timeout): : 110 (Connection timed out)

at galerautils/src/gu_lock.hpp:wait():97

2020-10-23T06:55:07.905164-05:00 309883 [Note] Aborted connection 309883 to db: 'unconnected' user: 'monitorps2' host: '10.102.70.146' (Got an error reading communication packets)

2020-10-23T06:55:07.910990-05:00 309884 [Note] WSREP: monitor wait failed for causal read (repl.causal_read_timeout): : 110 (Connection timed out)

at galerautils/src/gu_lock.hpp:wait():97

2020-10-23T06:55:07.911111-05:00 309884 [Note] Aborted connection 309884 to db: 'unconnected' user: 'monitorps3' host: '10.102.70.147' (Got an error reading communication packets)

2020-10-23T06:55:09.165197-05:00 309885 [Note] WSREP: monitor wait failed for causal read (repl.causal_read_timeout): : 110 (Connection timed out)

at galerautils/src/gu_lock.hpp:wait():97

2020-10-23T06:55:09.165463-05:00 309885 [Note] Aborted connection 309885 to db: 'unconnected' user: 'monitorps1' host: '10.102.70.145' (Got an error reading communication packets)

2020-10-23T06:55:25.087214-05:00 309889 [Note] Aborted connection 309889 to db: 'unconnected' user: 'monitorps3' host: '10.102.70.147' (Got an error reading communication packets)

2020-10-23T06:55:25.087535-05:00 309890 [Note] Aborted connection 309890 to db: 'unconnected' user: 'monitorps1' host: '10.102.70.145' (Got an error reading communication packets)

2020-10-23T06:55:25.089725-05:00 309888 [Note] Aborted connection 309888 to db: 'unconnected' user: 'monitorps2' host: '10.102.70.146' (Got an error reading communication packets)

2020-10-23T06:55:25.138792-05:00 0 [Note] WSREP: Deleted page /var/lib/mysql/alertdriving/gcache.page.000006

2020-10-23T06:55:25.176410-05:00 0 [Note] WSREP: Deleted page /var/lib/mysql/alertdriving/gcache.page.000007

2020-10-23T06:55:33.583201-05:00 0 [Note] WSREP: Deleted page /var/lib/mysql/alertdriving/gcache.page.000008

2020-10-23T08:08:02.362851-05:00 310147 [Note] Aborted connection 310147 to db: 'information_schema' user: 'root' host: '10.102.70.146' (Got an error reading communication packets)

2020-10-23T15:17:30.873494-05:00 309925 [Note] Aborted connection 309925 to db: 'information_schema' user: 'root' host: '10.102.70.147' (Got an error reading communication packets)

2020-10-23T16:31:29.148026-05:00 342873 [Note] Aborted connection 342873 to db: 'unconnected' user: 'monitorps2' host: '10.102.70.146' (Got an error reading communication packets)

2020-10-23T16:31:33.233197-05:00 342874 [Note] Aborted connection 342874 to db: 'unconnected' user: 'monitorps1' host: '10.102.70.145' (Got an error reading communication packets)

2020-10-23T16:31:34.701012-05:00 342875 [Note] Aborted connection 342875 to db: 'unconnected' user: 'monitorps3' host: '10.102.70.147' (Got an error reading communication packets)

2020-10-23T16:31:39.356872-05:00 342876 [Note] Aborted connection 342876 to db: 'unconnected' user: 'monitorps2' host: '10.102.70.146' (Got an error reading communication packets)

2020-10-23T16:31:42.639972-05:00 342877 [Note] Aborted connection 342877 to db: 'unconnected' user: 'monitorps1' host: '10.102.70.145' (Got an error reading communication packets)

*** Error in `/usr/sbin/mysqld': corrupted double-linked list: 0x00007f0e14206f90 ***

20:34:16 UTC - mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

Attempting to collect some information that could help diagnose the problem.

As this is a crash and something is definitely wrong, the information

collection process might fail.

Please help us make Percona XtraDB Cluster better by reporting any

bugs at https://jira.percona.com/projects/PXC/issues

key_buffer_size=8388608

read_buffer_size=131072

max_used_connections=20

max_threads=152

thread_count=20

connection_count=11

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68419 K bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f0e14012030

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 7f0ea4d2ceb8 thread_stack 0x40000

/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0xedabec]

/usr/sbin/mysqld(handle_fatal_signal+0x479)[0x7a9079]

/lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f0ebe668890]

/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f0ebc5ee067]

/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f0ebc5ef448]

/lib/x86_64-linux-gnu/libc.so.6(+0x731b4)[0x7f0ebc62c1b4]

/lib/x86_64-linux-gnu/libc.so.6(+0x7898e)[0x7f0ebc63198e]

/lib/x86_64-linux-gnu/libc.so.6(+0x79923)[0x7f0ebc632923]

/usr/sbin/mysqld[0x100a08e]

/usr/sbin/mysqld(_Z29ibuf_merge_or_delete_for_pageP11buf_block_tRK9page_id_tPK11page_size_tm+0xb28)[0x1010438]

/usr/sbin/mysqld(_Z16buf_page_get_genRK9page_id_tRK11page_size_tmP11buf_block_tmPKcmP5mtr_tbP7dberr_t+0x154f)[0x11c12cf]

/usr/sbin/mysqld(_Z27btr_cur_search_to_nth_levelP12dict_index_tmPK8dtuple_t15page_cur_mode_tmP9btr_cur_tmPKcmP5mtr_t+0x5d6)[0x119e6b6]

/usr/sbin/mysqld(_Z27row_ins_sec_index_entry_lowmmP12dict_index_tP16mem_block_info_tS2_P8dtuple_tmP9que_thr_tb+0x2c5)[0x10abc05]

/usr/sbin/mysqld(_Z23row_ins_sec_index_entryP12dict_index_tP8dtuple_tP9que_thr_tb+0x132)[0x10b1912]

/usr/sbin/mysqld(_Z12row_ins_stepP9que_thr_t+0x2b4)[0x10b4104]

/usr/sbin/mysqld[0x10c92fb]

/usr/sbin/mysqld(_ZN11ha_innobase9write_rowEPh+0x20f)[0xfc8a6f]

/usr/sbin/mysqld(_ZN7handler12ha_write_rowEPh+0x198)[0x82a2e8]

/usr/sbin/mysqld(_ZN20Write_rows_log_event9write_rowEPK14Relay_log_infob+0x12c)[0xe4d87c]

/usr/sbin/mysqld(_ZN20Write_rows_log_event11do_exec_rowEPK14Relay_log_info+0x85)[0xe4dc95]

/usr/sbin/mysqld(_ZN14Rows_log_event12do_apply_rowEPK14Relay_log_info+0x26)[0xe37e26]

/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0xb3c)[0xe4a6bc]

/usr/sbin/mysqld(_ZN9Log_event11apply_eventEP14Relay_log_info+0x61)[0xe420a1]

/usr/sbin/mysqld(_Z14wsrep_apply_cbPvPKvmjPK14wsrep_trx_meta+0x3d2)[0x7c28b2]

/usr/lib/galera3/libgalera_smm.so(+0x17951c)[0x7f0eb324451c]

/usr/lib/galera3/libgalera_smm.so(+0x1bf2b4)[0x7f0eb328a2b4]

/usr/lib/galera3/libgalera_smm.so(+0x1c1ee0)[0x7f0eb328cee0]

/usr/lib/galera3/libgalera_smm.so(+0x1c58ee)[0x7f0eb32908ee]

/usr/lib/galera3/libgalera_smm.so(+0x19e280)[0x7f0eb3269280]

/usr/lib/galera3/libgalera_smm.so(+0x19fa8a)[0x7f0eb326aa8a]

/usr/lib/galera3/libgalera_smm.so(+0x1c611b)[0x7f0eb329111b]

/usr/lib/galera3/libgalera_smm.so(+0x1db64d)[0x7f0eb32a664d]

/usr/sbin/mysqld[0x7c40fb]

/usr/sbin/mysqld(start_wsrep_THD+0x1ef)[0x79a3df]

/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0xef2d24]

/lib/x86_64-linux-gnu/libpthread.so.0(+0x8064)[0x7f0ebe661064]

/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f0ebc6a162d]

Trying to get some variables.

Some pointers may be invalid and cause the dump to abort.

Query (0): is an invalid pointer

Connection ID (thread ID): 10

Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting

http://www.percona.com/software/percona-xtradb-cluster/. You may find information

in the manual which will help you identify the cause of the crash.

Hi, this looks like a bug. Could you open an issue in our JIRA so the developers can look into it?

https://jira.percona.com/projects/PXC/issues
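
When filing the issue, a condensed list of the backtrace frames can make the report easier to scan. Below is a minimal sketch in Python, assuming the error log excerpt is available as a string. `FRAME_RE` and `parse_frames` are hypothetical helper names made up for this illustration, not part of any Percona tooling, and the sample lines are copied from the backtrace above.

```python
import re

# One backtrace frame as printed by mysqld, for example:
#   /usr/sbin/mysqld(handle_fatal_signal+0x479)[0x7a9079]
#   /usr/sbin/mysqld[0x100a08e]
FRAME_RE = re.compile(
    r"^(?P<binary>[^\s(\[]+)"                                # path to binary or library
    r"(?:\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-f]+)\))?"  # optional (symbol+offset)
    r"\[(?P<address>0x[0-9a-f]+)\]$"                         # absolute address
)


def parse_frames(log_text):
    """Return (binary, symbol, address) for every backtrace frame in log_text.

    symbol is None when the frame carries no symbol name.
    """
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            symbol = match.group("symbol") or None
            frames.append((match.group("binary"), symbol, match.group("address")))
    return frames


# Sample frames taken from the crash report in this thread.
sample = """\
/usr/sbin/mysqld(handle_fatal_signal+0x479)[0x7a9079]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f0ebc5ef448]
/usr/sbin/mysqld[0x100a08e]
/usr/lib/galera3/libgalera_smm.so(+0x17951c)[0x7f0eb324451c]
"""

for binary, symbol, address in parse_frames(sample):
    print(f"{address:>16}  {binary}  {symbol or '<no symbol>'}")
```

Mangled C++ names such as `_Z12row_ins_stepP9que_thr_t` can be piped through `c++filt` from binutils to get readable signatures before pasting them into the report.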

Thanks