We run two Percona servers in master-slave replication. The server has only a few simple tables, but each table holds hundreds of millions of rows and that number keeps growing.
However, the slave crashes quite often, I suspect because of a memory spike. The problem is that once it has crashed, it keeps crashing: after it restarts and recovers from the last crash, the slave SQL thread gets stuck in the "Opening tables" state when it tries to resume replication. When that happens, I have to stop the server and reset the master log position. Note that I also use innodb_buffer_pool_restore_at_startup = 300.
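For reference, the reset I run looks roughly like this (the log file name and position below are placeholders, not my real values; I take the actual coordinates from SHOW MASTER STATUS on the master):

STOP SLAVE;
-- coordinates are examples only
CHANGE MASTER TO
    MASTER_LOG_FILE = 'mysql-bin.000123',
    MASTER_LOG_POS  = 4;
START SLAVE;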
My question is: what causes the crash: the replication, the buffer pool restore, or something else?
My configuration file is listed below; the server has 16 GB of memory:
key_buffer = 10M
max_allowed_packet = 64M
thread_stack = 192K
thread_cache_size = 1000
binlog_cache_size = 1M
max_heap_table_size = 64M
sort_buffer_size = 2M
join_buffer_size = 2M
myisam-recover = BACKUP
max_connections = 1000
table_cache = 100000
thread_concurrency = 8
tmp_table_size = 64M
query_cache_limit = 64M
query_cache_size = 256M
ft_min_word_len = 4
# * Logging and Replication
log_error = /var/log/mysql/error.log
long_query_time = 2
log_slow_queries
server-id = 2
log-bin = mysql-bin
relay-log = localhost-relay-bin
expire_logs_days = 2
max_binlog_size = 100M
bulk_insert_buffer_size = 64M
innodb_buffer_pool_size = 12G
innodb_thread_concurrency = 0
innodb_concurrency_tickets = 5000
innodb_flush_method = O_DIRECT
innodb_data_file_path = ibdata1:1G;ibdata2:1G:autoextend
innodb_lock_wait_timeout = 100
innodb_locks_unsafe_for_binlog = 1
innodb_buffer_pool_instances = 8
innodb_log_file_size = 256M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit=2
innodb_additional_mem_pool_size = 16M
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_file_per_table = 1
innodb_doublewrite = 0
innodb_buffer_pool_restore_at_startup = 300
innodb_open_files = 500
sync_binlog = 0
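For what it's worth, my rough worst-case memory math (pessimistically assuming all 1000 connections allocate their per-session buffers in full, which normally doesn't happen since they are allocated on demand) already exceeds the 16 GB of physical RAM, even before counting tmp_table_size:

  12G                         (innodb_buffer_pool_size)
+ 256M                        (query_cache_size)
+ 1000 × ( 2M  sort_buffer_size
         + 2M  join_buffer_size
         + 1M  binlog_cache_size
         + 192K thread_stack )  ≈ 5.2G
------------------------------------------
≈ 17.4G worst case on a 16 GB box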