PXC 5.6.24 - All cluster nodes got signal 11 after mass update query

shtzeng · August 17, 2015, 7:40am

Our cluster has 6 nodes and 1 arbitrator.
Each server averages 10000 qps.
Basic environment: Ubuntu 14.04.2 LTS + Intel Xeon E5-2620v2 * 2 + 8 * 300GB SAS built as RAID 10 + Percona XtraDB Cluster 5.6.24-25.11 + galera 3.11.1 + 1G network

When somebody runs a bad query like "UPDATE large_table SET some_column = 'new_value';", all nodes block write queries and wait for the lock to be released.
After the update finished, some nodes rejoined the cluster. Then mysqld got signal 11 on one node after another, until every node had died.

How can this problem be explained?
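
Side note on the "bad query": Galera replicates each transaction as a single write set at commit, so one UPDATE over the whole table becomes one very large write set. This does not explain the signal 11 itself, but a common way to avoid the cluster-wide stall is to split the update into small batches. Below is a minimal sketch that only generates the statements. It assumes large_table has an integer primary key named id, which is a hypothetical column not shown in the post, and the batch size of 1000 is arbitrary.

```python
def batched_updates(min_id, max_id, batch_size=1000):
    """Yield UPDATE statements that each cover at most batch_size ids.

    Assumes a hypothetical integer primary key column `id` on large_table.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_id
    while start <= max_id:
        # Inclusive upper bound of this batch, capped at max_id.
        end = min(start + batch_size - 1, max_id)
        yield (
            "UPDATE large_table SET some_column = 'new_value' "
            f"WHERE id BETWEEN {start} AND {end};"
        )
        start = end + 1


if __name__ == "__main__":
    # Print the statements for ids 1..2500 in batches of 1000.
    for statement in batched_updates(1, 2500, batch_size=1000):
        print(statement)
```

Each statement would be run and committed on its own, so every write set stays small. The id range (min_id, max_id) would have to come from the real table.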

This is the mysqld signal 11 log. Every node showed the same result:


Query (xxxx): is an invalid pointer

Detailed log:


10:01:45 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=2145
max_threads=4098
thread_count=57
connection_count=24
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1640430 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x37fe0300
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f6ca6a69e00 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x91b54c]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x698ba1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f76d3600340]
/lib/x86_64-linux-gnu/libc.so.6(strlen+0x2a)[0x7f76d2a93aea]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x3e7)[0x71af47]
/usr/sbin/mysqld(_ZN18Prepared_statement7executeEP6Stringb+0x3a5)[0x737085]
/usr/sbin/mysqld[0x737371]
/usr/sbin/mysqld(_Z19mysqld_stmt_executeP3THDPcj+0x32f)[0x7379af]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1c8c)[0x725cfc]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x1e4)[0x727524]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x172)[0x6f0432]
/usr/sbin/mysqld(handle_one_connection+0x40)[0x6f0620]
/usr/sbin/mysqld(pfs_spawn_thread+0x143)[0xb42bd3]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f76d35f8182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f76d2b0547d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f693d99ad80): is an invalid pointer
Connection ID (thread ID): 1073150170
Status: NOT_KILLED
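
For anyone comparing the backtraces from several nodes, here is a minimal sketch that splits each frame line into binary, symbol, offset and address. The regular expression is only an assumption based on the frame format seen in the log above, not an official format description, and parse_frame is a hypothetical helper name.

```python
import re

# Matches frames such as:
#   /usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x91b54c]
#   /usr/sbin/mysqld[0x737371]
#   /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f76d3600340]
FRAME_RE = re.compile(
    r"^(?P<binary>\S+?)"
    r"(?:\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-f]+)\))?"
    r"\[(?P<address>0x[0-9a-f]+)\]$"
)


def parse_frame(line):
    """Return a dict describing one backtrace frame, or None if no match."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return {
        "binary": match.group("binary"),
        # An empty or missing symbol is reported as None.
        "symbol": match.group("symbol") or None,
        "offset": match.group("offset"),
        "address": match.group("address"),
    }


if __name__ == "__main__":
    sample = [
        "/lib/x86_64-linux-gnu/libc.so.6(strlen+0x2a)[0x7f76d2a93aea]",
        "/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x3e7)[0x71af47]",
        "/usr/sbin/mysqld[0x737371]",
    ]
    for line in sample:
        print(parse_frame(line))
```

Symbols that start with _Z are mangled C++ names. A tool such as c++filt can demangle them, for example _Z21mysql_execute_commandP3THD is mysql_execute_command(THD*).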

my.cnf


# Default values.
[mysqld_safe]
flush_caches
numa_interleave
#
#
[mysqld]
back_log = 65535
binlog_format = ROW
character_set_server = utf8mb4
collation_server = utf8mb4_general_ci
datadir = /var/lib/mysql
default_storage_engine = InnoDB
expand_fast_index_creation = 1
expire_logs_days = 7
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_instances = 16
innodb_buffer_pool_populate = 1
innodb_buffer_pool_size = 32G
innodb_data_file_path = ibdata1:64M;ibdata2:64M:autoextend
innodb_file_format = Barracuda
innodb_file_per_table
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_io_capacity = 1600
innodb_large_prefix
innodb_locks_unsafe_for_binlog = 1
innodb_log_file_size = 64M
innodb_print_all_deadlocks = 1
innodb_read_io_threads = 64
innodb_stats_on_metadata = FALSE
innodb_support_xa = FALSE
innodb_write_io_threads = 64
log-bin = mysqld-bin
log-queries-not-using-indexes
log-slave-updates
long_query_time = 1
max_allowed_packet = 64M
max_connect_errors = 4294967295
max_connections = 4096
min_examined_row_limit = 1000
performance-schema-instrument='%=ON'
port = 3306
relay-log-recovery = TRUE
skip-name-resolve
slow_query_log = 1
slow_query_log_timestamp_always = 1
table_open_cache = 4096
thread_cache = 1024
tmpdir = /srv/tmp
transaction_isolation = REPEATABLE-READ
updatable_views_with_limit = 0
user = mysql
wait_timeout = 60
#
# Galera
wsrep_cluster_address = gcomm://ip1:4567,ip2:4567,ip3:4567
wsrep_cluster_name = XXX
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_provider_options = "gcache.size=4G"
wsrep_slave_threads = 32
wsrep_sst_auth = "root:rootPASSWORD"
wsrep_sst_method = xtrabackup-v2
#
# XXX You *MUST* change!
server-id =

gasolwu · August 19, 2015, 4:45am

I found an issue similar to this one:

https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1273567

gasolwu · August 19, 2015, 4:52am

Related issue: https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1269842
