Production Percona DB crashing repeatedly

Hi all, I’ve got a production box that crashes repeatedly with the following:
Mar 15 12:13:01 box863 mysqld: 18:13:01 UTC - mysqld got signal 11 ;
Mar 15 12:13:01 box863 mysqld: This could be because you hit a bug. It is also possible that this binary
Mar 15 12:13:01 box863 mysqld: or one of the libraries it was linked against is corrupt, improperly built,
Mar 15 12:13:01 box863 mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
Mar 15 12:13:01 box863 mysqld: We will try our best to scrape up some info that will hopefully help
Mar 15 12:13:01 box863 mysqld: diagnose the problem, but since we have already crashed,
Mar 15 12:13:01 box863 mysqld: something is definitely wrong and this may fail.
Mar 15 12:13:01 box863 mysqld: Please help us make Percona Server better by reporting any
Mar 15 12:13:01 box863 mysqld: bugs at System Dashboard - Percona JIRA
Mar 15 12:13:01 box863 mysqld:
Mar 15 12:13:01 box863 mysqld: key_buffer_size=268435456
Mar 15 12:13:01 box863 mysqld: read_buffer_size=4194304
Mar 15 12:13:01 box863 mysqld: max_used_connections=0
Mar 15 12:13:01 box863 mysqld: max_threads=1502
Mar 15 12:13:01 box863 mysqld: thread_count=0
Mar 15 12:13:01 box863 mysqld: connection_count=0
Mar 15 12:13:01 box863 mysqld: It is possible that mysqld could use up to
Mar 15 12:13:01 box863 mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 12585173 K bytes of memory
Mar 15 12:13:01 box863 mysqld: Hope that’s ok; if not, decrease some variables in the equation.
Mar 15 12:13:01 box863 mysqld:
Mar 15 12:13:01 box863 mysqld: Thread pointer: 0x0
Mar 15 12:13:01 box863 mysqld: Attempting backtrace. You can use the following information to find out
Mar 15 12:13:01 box863 mysqld: where mysqld died. If you see no messages after this, something went
Mar 15 12:13:01 box863 mysqld: terribly wrong…
Mar 15 12:13:01 box863 mysqld: stack_bottom = 0 thread_stack 0x40000
Mar 15 12:13:01 box863 mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7cbfc5]
Mar 15 12:13:01 box863 mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x6a0ec4]
Mar 15 12:13:01 box863 mysqld: /lib64/libpthread.so.0[0x344aa0f710]
Mar 15 12:13:01 box863 mysqld: /usr/sbin/mysqld(_ZN3THD15raise_conditionEjPKcN11MYSQL_ERROR18enum_warning_levelES1_+0x3b)[0x56ed6b]
Mar 15 12:13:01 box863 mysqld: /usr/sbin/mysqld(_Z19push_warning_printfP3THDN11MYSQL_ERROR18enum_warning_levelEjPKcz+0xdd)[0x57c08d]
Mar 15 12:13:01 box863 mysqld: /usr/sbin/mysqld(ib_warn_row_too_big+0x86)[0x8348a6]
Mar 15 12:13:01 box863 mysqld: /usr/sbin/mysqld[0x8e7753]
Mar 15 12:13:01 box863 mysqld: /usr/sbin/mysqld[0x8f3c70]
Mar 15 12:13:01 box863 mysqld: /usr/sbin/mysqld[0x8f1c52]
Mar 15 12:13:01 box863 mysqld: /usr/sbin/mysqld[0x8f2785]
Mar 15 12:13:01 box863 mysqld: /usr/sbin/mysqld[0x96f9f5]
Mar 15 12:13:01 box863 mysqld: /usr/sbin/mysqld[0x963d27]
Mar 15 12:13:01 box863 mysqld: /usr/sbin/mysqld[0x882cf7]
Mar 15 12:13:01 box863 mysqld: /usr/sbin/mysqld[0x877ebc]
Mar 15 12:13:01 box863 mysqld: /lib64/libpthread.so.0[0x344aa079d1]
Mar 15 12:13:01 box863 mysqld: /lib64/libc.so.6(clone+0x6d)[0x344a2e88fd]
Mar 15 12:13:01 box863 mysqld: You may download the Percona Server operations manual by visiting
Mar 15 12:13:01 box863 mysqld: [url]http://www.percona.com/software/percona-server/[/url]. You may find information
Mar 15 12:13:01 box863 mysqld: in the manual which will help you identify the cause of the crash.
Mar 15 12:13:04 box863 mysqld_safe: mysqld from pid file /var/lib/mysql/box863.bluehost.com.pid ended

Sorry, I forgot to include that I’ve had to recover InnoDB tables several times over the past few days due to this issue.

Hi dctechtest;

What version are you using? I think you might be hitting bug #20144839 (I can't link to it, as Oracle is not publishing the details). Try upgrading to 5.5.42 or 5.6.23 (if you are not there already) and see if it still happens.

-Scott

Updated to mysql Ver 14.14 Distrib 5.5.42-37.1, for Linux (x86_64) using readline 5.1, and my crashes stopped. Thanks scott.nemes.

Hi dctechtest;

Excellent! Glad that worked out for you. =)

-Scott

Hi Scott,
Could you please help me with my problem? I am on 5.6.21 and experiencing a similar kind of issue.

Hi,

I'm getting the exact same problem on a Percona XtraDB Cluster node that was upgraded to v5.5.41 just a few days ago. I tried downgrading back to v5.5.37, but the same problem occurs again!

The node syncs fine via xtrabackup from another node in the cluster (420 GB of data), but when it tries to start after the xtrabackup transfer, it crashes with the following logs:

150413 22:24:20 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (71f6d011-dc25-11e3-9cad-be804f0c3a69): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():447. IST will be unavailable.
150413 22:24:20 [Note] WSREP: Node 0 (altair) requested state transfer from ‘any’. Selected 1 (atlas)(SYNCED) as donor.
150413 22:24:20 [Note] WSREP: Shifting PRIMARY → JOINER (TO: 19776226)
150413 22:24:20 [Note] WSREP: Requesting state transfer: success, donor: 1
150413 23:30:34 [Note] WSREP: 1 (atlas): State transfer to 0 (altair) complete.
150413 23:30:34 [Note] WSREP: Member 1 (atlas) synced with group.
WSREP_SST: [INFO] Proceeding with SST (20150413 23:30:34.796)
WSREP_SST: [INFO] Removing existing ib_logfile files (20150413 23:30:34.800)
WSREP_SST: [INFO] Preparing the backup at /data/mysql/ (20150413 23:30:34.878)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf --apply-log $rebuildcmd ${DATA} &>${DATA}/innobackup.prepare.log (20150413 23:30:34.881)
WSREP_SST: [INFO] Total time on joiner: 0 seconds (20150413 23:31:14.622)
WSREP_SST: [INFO] Removing the sst_in_progress file (20150413 23:31:14.626)
150413 23:31:14 [Note] WSREP: SST complete, seqno: 19776226
150413 23:31:14 [Warning] Using unique option prefix myisam_recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
150413 23:31:14 [Note] Plugin ‘FEDERATED’ is disabled.
150413 23:31:14 InnoDB: The InnoDB memory heap is disabled
150413 23:31:14 InnoDB: Mutexes and rw_locks use GCC atomic builtins
150413 23:31:14 InnoDB: Compressed tables use zlib 1.2.7
150413 23:31:14 InnoDB: Using Linux native AIO
150413 23:31:14 InnoDB: Initializing buffer pool, size = 20.0G
150413 23:31:16 InnoDB: Completed initialization of buffer pool
150413 23:31:16 InnoDB: highest supported file format is Barracuda.
150413 23:31:17 InnoDB: Waiting for the background threads to start
21:31:17 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at [url]https://bugs.launchpad.net/percona-xtradb-cluster[/url]

key_buffer_size=268435456
read_buffer_size=131072
max_used_connections=0
max_threads=702
thread_count=2
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1798672 K bytes of memory
Hope that’s ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong…
stack_bottom = 0 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x29)[0x7abeb9]
/usr/sbin/mysqld(handle_fatal_signal+0x372)[0x6aa6d2]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7fbdbf8728d0]
/usr/sbin/mysqld(_ZN3THD15raise_conditionEjPKcN11MYSQL_ERROR18enum_warning_levelES1_+0x22)[0x589062]
/usr/sbin/mysqld(_Z19push_warning_printfP3THDN11MYSQL_ERROR18enum_warning_levelEjPKcz+0xc7)[0x596d17]
/usr/sbin/mysqld(ib_warn_row_too_big+0x6d)[0x7d2d5d]
/usr/sbin/mysqld[0x869bbc]
/usr/sbin/mysqld[0x8774d3]
/usr/sbin/mysqld[0x8748cd]
/usr/sbin/mysqld[0x873a88]
/usr/sbin/mysqld[0x874e56]
/usr/sbin/mysqld[0x8753da]
/usr/sbin/mysqld[0x8e4220]
/usr/sbin/mysqld[0x8dd158]
/usr/sbin/mysqld[0x80e126]
/usr/sbin/mysqld[0x8040fc]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4)[0x7fbdbf86b0a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fbdbdab804d]
You may download the Percona XtraDB Cluster operations manual by visiting
[url]http://www.percona.com/software/percona-xtradb-cluster/[/url]. You may find information
in the manual which will help you identify the cause of the crash.
150413 23:31:17 mysqld_safe mysqld from pid file /data/mysql/altair.pid ended

At the moment I'm unable to restart the node because of this problem, so we are running on a two-node cluster, in a very degraded mode. Is there a bug entry open for this, please? Is there any workaround (other than MySQL 5.5.42, which is not yet available for Percona XtraDB Cluster)?

Regards,

Laurent MINOST
IPD - Infopro Digital

Can you share your my.cnf and the server's hardware configuration? Some time ago I hit this same bug across several 5.5 releases, due to a high thread count and misconfigured memory settings. Most of the time, signal 11 (a segmentation fault) means a process tried to access a piece of data at a memory address that isn't actually mapped, and in mysqld's case that means a crash!

I recommend that you review your configuration following the math presented in the error message:
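As a minimal sketch of that estimate, here is the formula from the crash header with the values it printed. Note that sort_buffer_size is not shown in the report, so the 4M used below is only an assumed placeholder; substitute the value from your own my.cnf.

```shell
#!/bin/sh
# Worst-case memory estimate, per the formula mysqld prints on crash:
#   key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads
key_buffer_size=268435456    # from the crash header
read_buffer_size=131072      # from the crash header
sort_buffer_size=4194304     # NOT in the crash header; assumed 4M placeholder
max_threads=702              # from the crash header
total=$(( key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads ))
echo "worst case: $(( total / 1024 )) K ($(( total / 1048576 )) MB)"
```

If that worst case exceeds the machine's RAM plus swap, the server can be killed or crash under load, so trim the per-thread buffers or max_connections accordingly.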

If the other nodes have the same amount of machine resources and the same MySQL configuration in my.cnf, you need to check the other nodes as well.

Just my 2 cents.

Hi wagnerbianchi,

Thanks for your answer.

Please note that this problem only started after upgrading to v5.5.41 of PXC: after the upgrade the node could no longer complete an SST, so I tried downgrading back to v5.5.37, without any success. The node is now stuck, because every time I try to restart it, it performs an SST from another node and then crashes right after the SST. So the configuration is probably not the culprit here, as it has not changed since the node was working properly on v5.5.37!

Here is an output of the server’s hardware for each node :

root@altair:~# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Stepping: 4
CPU MHz: 2600.174
BogoMIPS: 5199.87
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31

root@altair:~# free -m
             total       used       free     shared    buffers     cached
Mem:         64516      62101       2414          5        782      58067
-/+ buffers/cache:       3251      61264
Swap:        15257          1      15256

The dataset is located on a volume of 4 Toshiba MK3001GRRB 300 GB SAS 6Gb/s disks.

Attached is the my.cnf file for this node; please rename it to altair_my.cnf.7z. The file had to be compressed because the upload size limit here is 20 KB, and since the .7z extension is forbidden, I renamed it to .txt.

I'm now considering taking a full backup of one of the surviving nodes, changing its grastate.dat, and restoring that backup onto this node so that no SST happens at startup, to see whether it crashes again or not … but even if that works, this situation is not normal.
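For reference, a sketch of what that grastate.dat edit could look like. The uuid below is the group state UUID from the log above; the seqno must be the one recorded by the backup (in xtrabackup_galera_info), and 19776226 is only the value from the earlier SST log, used for illustration. DATADIR defaults to the current directory here; on this node it would be /data/mysql.

```shell
#!/bin/sh
# Sketch: write a grastate.dat matching the restored backup so the node
# does not request a full SST at startup. All values are illustrative.
DATADIR=${DATADIR:-.}
cat > "$DATADIR/grastate.dat" <<'EOF'
# GALERA saved state
version: 2.1
uuid:    71f6d011-dc25-11e3-9cad-be804f0c3a69
seqno:   19776226
EOF
```

If the uuid/seqno pair does not exactly match the backup, the node will fall back to SST anyway, so double-check xtrabackup_galera_info before starting mysqld.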

Please tell me if any other information would be useful to have.
Thanks!

Regards,

Laurent

altair_my.cnf.7z.txt (9.25 KB)

Hi,

Does anyone have any idea about this problem, please?

Regards,

Laurent

Hi,

I'm still having the problem when doing an SST from another node in the cluster. Does anyone have any idea how to solve this crash, please?

150630 8:55:17 [Note] Plugin ‘FEDERATED’ is disabled.
150630 8:55:17 InnoDB: The InnoDB memory heap is disabled
150630 8:55:17 InnoDB: Mutexes and rw_locks use GCC atomic builtins
150630 8:55:17 InnoDB: Compressed tables use zlib 1.2.7
150630 8:55:17 InnoDB: Using Linux native AIO
150630 8:55:17 InnoDB: Initializing buffer pool, size = 20.0G
150630 8:55:18 InnoDB: Completed initialization of buffer pool
150630 8:55:18 InnoDB: highest supported file format is Barracuda.
150630 8:55:19 InnoDB: Waiting for the background threads to start
06:55:20 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at [url]https://bugs.launchpad.net/percona-xtradb-cluster[/url]

key_buffer_size=268435456
read_buffer_size=131072
max_used_connections=0
max_threads=702
thread_count=2
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1798672 K bytes of memory
Hope that’s ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong…
stack_bottom = 0 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x29)[0x7abeb9]
/usr/sbin/mysqld(handle_fatal_signal+0x372)[0x6aa6d2]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7f5e078e78d0]
/usr/sbin/mysqld(_ZN3THD15raise_conditionEjPKcN11MYSQL_ERROR18enum_warning_levelES1_+0x22)[0x589062]
/usr/sbin/mysqld(_Z19push_warning_printfP3THDN11MYSQL_ERROR18enum_warning_levelEjPKcz+0xc7)[0x596d17]
/usr/sbin/mysqld(ib_warn_row_too_big+0x6d)[0x7d2d5d]
/usr/sbin/mysqld[0x869bbc]
/usr/sbin/mysqld[0x8774d3]
/usr/sbin/mysqld[0x8748cd]
/usr/sbin/mysqld[0x873a88]
/usr/sbin/mysqld[0x874e56]
/usr/sbin/mysqld[0x873a88]
/usr/sbin/mysqld[0x874e56]
/usr/sbin/mysqld[0x873a88]
/usr/sbin/mysqld[0x874e56]
/usr/sbin/mysqld[0x8753da]
/usr/sbin/mysqld[0x8e4220]
/usr/sbin/mysqld[0x8dd158]
/usr/sbin/mysqld[0x80e126]
/usr/sbin/mysqld[0x8040fc]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4)[0x7f5e078e00a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f5e05b2d04d]
You may download the Percona XtraDB Cluster operations manual by visiting
[url]http://www.percona.com/software/percona-xtradb-cluster/[/url]. You may find information
in the manual which will help you identify the cause of the crash.
150630 08:55:21 mysqld_safe mysqld from pid file /var/lib/mysql/altair.pid ended

Regards,

Laurent