HP DL385 Performance and Memory giving me grey hair :(

Hmmm ok so I’m not quite sure where to start here…

A bit of background to our setup…

2 Datacentres, 15 servers in each, only one datacentre active.

In Active DC - Master DB is a HP DL360, 2 x 6 Core 4800MHz CPU, 64GB RAM - all other servers in the DC are slaves to this.
12 of the slaves are also DL360s - exactly the same config (RAM, CPU); 2 of the servers are DL385s: 64GB RAM, 2 x 16 Core 3500MHz CPU

In Standby DC we have exactly the same - 13 DL360s and 2 DL385s - all the same spec.
One of the DL360s is a slave to the master in the Active DC; all other servers are slaved from that.

Everything is good so far…

We’re using our standby DC to try and bottom out some performance issues - specifically, the two DL385s are underperforming by orders of magnitude compared to the DL360s.

In addition, on one of the DL385s, if I raise innodb_buffer_pool_size to 35GB, MySQL won’t start, yet on the other DL385 it’s fine.

In terms of my.cnf parameters - apart from the obvious bin logging enabled on the masters, everything is the same and controlled by puppet.

I’m kind of lost as to:
a) why MySQL won’t start with a buffer pool of 35GB or more on one server, while on another identical one it’s fine
b) why the DL385s are performing so badly

I know the information I have provided is probably only a fraction of what is needed for a much more detailed investigation but just as a top level guess, can anyone think of anything that I’m missing?

We’re using 5.5.30-rel30.2.500 on all boxes.

Key my.cnf params as follows (this is from a server that starts fine with a 35GB buffer pool):

innodb_additional_mem_pool_size=33554432
innodb_buffer_pool_size=37580963840
innodb_log_buffer_size=16777216
join_buffer_size=131072
key_buffer_size=8388608
sort_buffer_size=2097152
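
As a quick sanity check, the byte values above can be converted to binary units. This is only a minimal sketch; the helper name to_gib is made up for illustration. It shows that innodb_buffer_pool_size=37580963840 is exactly 35 GiB, so there is no unit slip in the config itself.

```python
# Convert my.cnf byte values to GiB (1 GiB = 1024**3 bytes).
# The values are copied from the config above; to_gib is a hypothetical
# helper name used only for this illustration.
GIB = 1024 ** 3


def to_gib(n_bytes):
    """Return a byte count expressed in GiB."""
    return n_bytes / GIB


params = {
    "innodb_additional_mem_pool_size": 33554432,
    "innodb_buffer_pool_size": 37580963840,
    "innodb_log_buffer_size": 16777216,
    "join_buffer_size": 131072,
    "key_buffer_size": 8388608,
    "sort_buffer_size": 2097152,
}

for name, value in params.items():
    print(f"{name}: {to_gib(value):.4f} GiB")
```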

This is the error we get when we increase the buffer pool to 35GB or more on one of the DL385s:

130711 13:50:55 mysqld_safe mysqld from pid file /var/lib/mysql/mysqld.pid ended
130711 13:50:56 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql/data
130711 13:50:56 [Note] Plugin 'FEDERATED' is disabled.
130711 13:50:56 InnoDB: The InnoDB memory heap is disabled
130711 13:50:56 InnoDB: Mutexes and rw_locks use GCC atomic builtins
130711 13:50:56 InnoDB: Compressed tables use zlib 1.2.3
130711 13:50:56 InnoDB: Using Linux native AIO
130711 13:50:56 InnoDB: Error: Linux Native AIO is not supported on tmpdir.
InnoDB: You can either move tmpdir to a file system that supports native AIO
InnoDB: or you can set innodb_use_native_aio to FALSE to avoid this message.
130711 13:50:56 InnoDB: Error: Linux Native AIO check on tmpdir returned error[22]
130711 13:50:56 InnoDB: Warning: Linux Native AIO disabled.
130711 13:50:56 InnoDB: Initializing buffer pool, size = 35.0G
130711 13:50:58 InnoDB: Assertion failure in thread 47165255037984 in file ut0mem.c line 103 
InnoDB: Failing assertion: ret || !assert_on_error 
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
12:50:58 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=0
max_threads=3002 
thread_count=0
connection_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6577353 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7b1b75]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x68d494]
/lib64/libpthread.so.0[0x2ae58190fbe0]
/lib64/libc.so.6(gsignal+0x35)[0x2ae582bb4285]
/lib64/libc.so.6(abort+0x110)[0x2ae582bb5d30]
/usr/sbin/mysqld[0x87a4a3]
/usr/sbin/mysqld[0x927136]
/usr/sbin/mysqld[0x85a09e]
/usr/sbin/mysqld[0x8a3783]
/usr/sbin/mysqld[0x8a3c51]
/usr/sbin/mysqld[0x856850]
/usr/sbin/mysqld[0x816953]
/usr/sbin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x48)[0x68fe68]
/usr/sbin/mysqld[0x59742a]
/usr/sbin/mysqld(_Z11plugin_initPiPPci+0xa1d)[0x59b3ad]
/usr/sbin/mysqld[0x51a5fb]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x46d)[0x51e0fd]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2ae582ba1994]
/usr/sbin/mysqld[0x513339]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
130711 13:50:58 mysqld_safe mysqld from pid file /var/lib/mysql/mysqld.pid ended

Hello,
If InnoDB attempts to allocate more memory for its buffer pool than is available, it normally does not crash as it did in your case; it just gives you a clear message and refuses to initialize, like this:

130717 12:04:10 InnoDB: Initializing buffer pool, size = 100.0G
InnoDB: mmap(110310195200 bytes) failed; errno 12
130717 12:04:10 InnoDB: Completed initialization of buffer pool
130717 12:04:10 InnoDB: Fatal error: cannot allocate memory for the buffer pool
130717 12:04:10 [ERROR] Plugin 'InnoDB' init function returned error.
130717 12:04:10 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
130717 12:04:10 [ERROR] Unknown/unsupported storage engine: InnoDB
130717 12:04:10 [ERROR] Aborting

But in your case, MySQL crashed in this function:

/**
Allocates memory.
@return own: allocated memory */
UNIV_INTERN
void*
ut_malloc_low(
/*==========*/
    ulint n,                /*!< in: number of bytes to allocate */
    ibool assert_on_error)  /*!< in: if TRUE, we crash mysqld if the
                            memory cannot be allocated */
{
#ifndef UNIV_HOTBACKUP
    ulint retry_count;
    void* ret;

    if (UNIV_LIKELY(srv_use_sys_malloc)) {
        ret = malloc(n);
        ut_a(ret || !assert_on_error);

        return(ret);
    }

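The ut_a() assertion there fails when malloc(n) returns NULL while assert_on_error is set, so the allocation itself failed. This could mean the memory area you are trying to use is corrupted, which could also explain the overall performance problems. Do you see anything related in the dmesg log?
I would suggest performing deep health checks of those DL385s, starting with a full memtest.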
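
To make the dmesg check less of a manual eyeball job, something like the following could filter the kernel log for memory-related messages. Treat it as a rough sketch: the keyword list is my own assumption about what is worth flagging (OOM kills, machine checks, EDAC/ECC reports) and is not exhaustive, and the function names are made up for illustration.

```python
import re
import subprocess

# Assumed keyword list: common kernel-log markers of memory trouble.
# It is a starting point, not an authoritative set.
SUSPECT = re.compile(
    r"out of memory|oom-killer|machine check|\bmce\b|\bedac\b"
    r"|hardware error|\becc\b",
    re.IGNORECASE,
)


def suspicious_lines(log_text):
    """Return the lines of a kernel log that match any suspect keyword."""
    return [line for line in log_text.splitlines() if SUSPECT.search(line)]


def scan_dmesg():
    """Run the standard `dmesg` command and return its suspicious lines.

    Reading the kernel log may require root on some systems.
    """
    result = subprocess.run(["dmesg"], capture_output=True, text=True)
    return suspicious_lines(result.stdout)
```

Running scan_dmesg() on both DL385s and comparing the results would show whether only the box that fails to start is logging memory errors.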

Also, on servers where you can set a buffer pool large enough to fit most of the hot data in memory, performance could indeed be orders of magnitude better than on servers where the buffer pool is too small to hold that hot data. It’s disk vs memory speed :)
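
One way to put a number on that is the buffer pool hit ratio, computed from the Innodb_buffer_pool_read_requests and Innodb_buffer_pool_reads counters in SHOW GLOBAL STATUS. The sketch below is only illustrative: the function name is made up, and you would feed it the counter values read from each server.

```python
def buffer_pool_hit_ratio(read_requests, disk_reads):
    """Fraction of logical reads served from the buffer pool.

    read_requests: value of Innodb_buffer_pool_read_requests
    disk_reads:    value of Innodb_buffer_pool_reads (reads that had
                   to go to disk because the page was not in the pool)
    Returns None when no reads have been requested yet.
    """
    if read_requests == 0:
        return None
    return 1.0 - disk_reads / read_requests
```

A noticeably lower ratio on the DL385s than on the DL360s under the same workload would point at the buffer pool being too small for the hot data on those boxes.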