mysql replication slave stuck using 100% cpu and no io for several days

Greetings Percona forum members! (And apologies in advance if this is not the right place to post.) I’m having some severe replication delay issues, and am hoping someone here has some advice. I’m running Percona-Server-server-55-5.5.28-rel29.1.335.rhel5 on centos 5.8 (kernel 2.6.18-274.18.1.el5). All hardware is the same, 2x6-core cpu, 8-disk software raid10, 2-disk software raid1 on flash, 96GB ram. The database is medium-sized (about 750GB) and is all innodb (except mysql system catalogs, of course).

I’m trying to get replication going, and it works fine with test ddl and dml (create table, insert/update/delete rows, drop table, etc.). Then the nightly batch job runs, which consists of several million dmls (inserts, updates, etc.), and the slave falls behind and never catches up. The master has no problem with this job, but (in the latest attempt) the slave has been stuck burning cpu for several days (its cpu usage is currently 4737 minutes) and has been stuck on the same event since 2012-11-16 12:19:55 UTC according to the relay log file and position from ‘show slave status’. There is very little to no io activity on the slave (and no swapping), per vmstat and iostat. The master log file and position continue to increase with activity on the master, and both the io and sql slave threads are running. Going by mysqlbinlog output, the statement being applied on the slave is a transaction with 62k update statements.

stracing the sql slave thread (where all the cpu is being consumed) shows the only system calls are futex and sched_yield (typically 24 over a 10-second period). Running pt-pmp (as well as manually poking around with gdb), it looks like there is some sort of activity on the replication sql thread (handle_slave_sql), so I don’t think the thread is just stuck in a tight loop doing nothing.

I am aware that replication in 5.5 is single-threaded, but it seems a little extreme to take 3+ days to replicate an already-single-threaded 3-hour batch job, especially since I can bring up a new slave via full-database innobackupex, scp, and restore in 5-6 hours. (I might have to go the not-very-warm standby route if this is actually a mysql replication limitation, and not the more likely case of a bone-headed misconfiguration on my part.)

pt-pmp output:

1 thd_opt_slow_log,innobase_get_slow_log,increment_page_get_s tatistics,buf_page_optimistic_get,btr_pcur_restore_position func,sel_restore_position_for_mysql,row_search_for_mysql,ha_ innobase::general_fetch,Rows_log_event::find_row,Update_rows log_event::do_exec_row,Rows_log_event::do_apply_event,apply event,apply_event_and_update_pos,exec_relay_log_event,handl e_slave_sql,start_thread,clone
1 row_sel_store_mysql_rec,row_search_for_mysql,ha_innobase::ge neral_fetch,Rows_log_event::find_row,Update_rows_log_event:: do_exec_row,Rows_log_event::do_apply_event,apply_event,apply event_and_update_pos,exec_relay_log_event,handle_slave_sql, start_thread,clone
1 rec_offs_nth_extern,row_sel_store_mysql_rec,row_search_for_m ysql,ha_innobase::general_fetch,Rows_log_event::find_row,Upd ate_rows_log_event::do_exec_row,Rows_log_event::do_apply_eve nt,apply_event,apply_event_and_update_pos,exec_relay_log_eve nt,handle_slave_sql,start_thread,clone
1 rec_init_offsets,rec_get_offsets_func,row_search_for_mysql,h a_innobase::general_fetch,Rows_log_event::find_row,Update_ro ws_log_event::do_exec_row,Rows_log_event::do_apply_event,app ly_event,apply_event_and_update_pos,exec_relay_log_event,han dle_slave_sql,start_thread,clone
1 rec_init_offsets_comp_ordinary,rec_init_offsets,rec_get_offs ets_func,row_search_for_mysql,ha_innobase::general_fetch,Row s_log_event::find_row,Update_rows_log_event::do_exec_row,Row s_log_event::do_apply_event,apply_event,apply_event_and_upda te_pos,exec_relay_log_event,handle_slave_sql,start_thread,cl one
1 rec_get_nth_field_offs,row_sel_store_mysql_rec,row_search_fo r_mysql,ha_innobase::general_fetch,Rows_log_event::find_row, Update_rows_log_event::do_exec_row,Rows_log_event::do_apply
event,apply_event,apply_event_and_update_pos,exec_relay_log
event,handle_slave_sql,start_thread,clone
1 mutex_get_waiters,mutex_exit_func,pfs_mutex_exit_func,buf_LR U_buf_pool_running_out,sel_set_rec_lock,row_search_for_mysql ,ha_innobase::general_fetch,Rows_log_event::find_row,Update
rows_log_event::do_exec_row,Rows_log_event::do_apply_event,a pply_event,apply_event_and_update_pos,exec_relay_log_event,h andle_slave_sql,start_thread,clone
1 memcpy,row_sel_store_mysql_rec,row_search_for_mysql,ha_innob ase::general_fetch,Rows_log_event::find_row,Update_rows_log_ event::do_exec_row,Rows_log_event::do_apply_event,apply_even t,apply_event_and_update_pos,exec_relay_log_event,handle_sla ve_sql,start_thread,clone
1 memcpy,rec_copy_prefix_to_buf,btr_pcur_store_position,row_se arch_for_mysql,ha_innobase::general_fetch,Rows_log_event::fi nd_row,Update_rows_log_event::do_exec_row,Rows_log_event::do _apply_event,apply_event,apply_event_and_update_pos,exec_rel ay_log_event,handle_slave_sql,start_thread,clone
1 memcpy@plt,row_sel_field_store_in_mysql_format,row_sel_store _mysql_rec,row_search_for_mysql,ha_innobase::general_fetch,R ows_log_event::find_row,Update_rows_log_event::do_exec_row,R ows_log_event::do_apply_event,apply_event,apply_event_and_up date_pos,exec_relay_log_event,handle_slave_sql,start_thread, clone

/etc/my.cnf:

autocommit=1
binlog_format=mixed
character_set_server=utf8
collation_server=utf8_general_ci
default_storage_engine=innodb
expire_logs_days=14
innodb_buffer_pool_instances=8
innodb_buffer_pool_size=64G
innodb_doublewrite=1
innodb_file_per_table=1
innodb_flush_log_at_trx_commit=1
innodb_import_table_from_xtrabackup=1
innodb_io_capacity=10000
innodb_lazy_drop_table=1
innodb_log_buffer_size=256M
innodb_log_file_size=1G
innodb_max_purge_lag=100000 # transactions
innodb_old_blocks_time=1000 # msec
innodb_open_files=10000
innodb_recovery_stats=1
innodb_recovery_update_relay_log=1
innodb_stats_on_metadata=0
innodb_stats_sample_pages=128
innodb_strict_mode=1
innodb_use_sys_stats_table=1
key_buffer_size=16M
log_bin=mysql-bin
log_slave_updates=1
long_query_time=5
max_allowed_packet=16M
net_buffer_length=8K
read_buffer_size=256K
read_only=1
read_rnd_buffer_size=512K
relay_log=relay-bin
server_id=3
skip_external_locking=1
skip_name_resolve=1
slave_net_timeout=30 # seconds
slow_query_log=1
slow_query_log_file=slow
slow_query_log_timestamp_always=1
slow_query_log_timestamp_precision=microsecond
sort_buffer_size=512K
sql_mode=ansi_quotes,no_auto_value_on_zero,only_full_group_b y,pipes_as_concat,traditional
sync_binlog=1
table_open_cache=64
wait_timeout=60 # seconds

innodb engine status:

Mon Nov 19 10:14:06 2012 (root@localhost:/(none)) > show engine innodb status\G
*************************** 1. row ***************************
Type: InnoDB
Name:
Status:

121119 10:14:11 INNODB MONITOR OUTPUT

Per second averages calculated from the last 7 seconds

BACKGROUND THREAD

srv_master_thread loops: 1920 1_second, 1798 sleeps, 21 10_second, 3388 background, 3362 flush
srv_master_thread log flush and writes: 2568

SEMAPHORES

OS WAIT ARRAY INFO: reservation count 314188, signal count 332175
Mutex spin waits 560945, rounds 9113328, OS waits 290665
RW-shared spins 41825, rounds 675842, OS waits 11911
RW-excl spins 6254, rounds 473476, OS waits 11017
Spin rounds per wait: 16.25 mutex, 16.16 RW-shared, 75.71 RW-excl

FILE I/O

I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
ibuf aio reads: 0, log i/o’s: 0, sync i/o’s: 0
Pending flushes (fsync) log: 0; buffer pool: 0
791036 OS file reads, 1044201 OS file writes, 47195 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s

INSERT BUFFER AND ADAPTIVE HASH INDEX

Ibuf: size 1, free list len 23003, seg size 23005, 631719 merges
merged operations:
insert 2063838, delete mark 0, delete 0
discarded operations:
insert 0, delete mark 0, delete 0
Hash table size 151384511, node heap has 1134 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s

LOG

Log sequence number 1594257315915
Log flushed up to 1594257305594
Last checkpoint at 1594257305594
Max checkpoint age 1738750649
Checkpoint age target 1684414692
Modified age 10321
Checkpoint age 10321
0 pending log writes, 0 pending chkp writes
4963 log i/o’s done, 0.00 log i/o’s/second

BUFFER POOL AND MEMORY

Total memory allocated 70598524928; in additional pool allocated 0
Internal hash tables (constant factor + variable factor)
Adaptive hash index 1229659760 (1211076088 + 18583672)
Page hash 8851208 (buffer pool 0 only)
Dictionary cache 284884854 (283213232 + 1671622)
File system 241072 (82672 + 158400)
Lock system 178238888 (169998968 + 8239920)
Recovery system 0 (0 + 0)
Dictionary memory allocated 1671622
Buffer pool size 4194296
Buffer pool size, bytes 68719345664
Free buffers 3138210
Database pages 1054450
Old database pages 389396
Modified db pages 6
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 27333, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 790800, created 263650, written 1014668
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 1054450, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]

INDIVIDUAL BUFFER POOL INFO

—BUFFER POOL 0
Buffer pool size 524287
Buffer pool size, bytes 8589918208
Free buffers 394540
Database pages 129542
Old database pages 47839
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 3794, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 97108, created 32434, written 126217
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 129542, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
—BUFFER POOL 1
Buffer pool size 524287
Buffer pool size, bytes 8589918208
Free buffers 390917
Database pages 133166
Old database pages 49176
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 3221, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 100480, created 32686, written 128281
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 133166, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
—BUFFER POOL 2
Buffer pool size 524287
Buffer pool size, bytes 8589918208
Free buffers 388946
Database pages 135136
Old database pages 49903
Modified db pages 1
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 3202, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 100670, created 34466, written 131055
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 135136, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
—BUFFER POOL 3
Buffer pool size 524287
Buffer pool size, bytes 8589918208
Free buffers 394562
Database pages 129523
Old database pages 47832
Modified db pages 2
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 3311, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 97354, created 32169, written 122813
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 129523, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
—BUFFER POOL 4
Buffer pool size 524287
Buffer pool size, bytes 8589918208
Free buffers 394118
Database pages 129962
Old database pages 47994
Modified db pages 3
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 3758, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 96360, created 33602, written 125670
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 129962, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
—BUFFER POOL 5
Buffer pool size 524287
Buffer pool size, bytes 8589918208
Free buffers 390488
Database pages 133593
Old database pages 49334
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 3232, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 100861, created 32732, written 128490
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 133593, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
—BUFFER POOL 6
Buffer pool size 524287
Buffer pool size, bytes 8589918208
Free buffers 391541
Database pages 132545
Old database pages 48947
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 3574, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 99841, created 32704, written 126869
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 132545, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
—BUFFER POOL 7
Buffer pool size 524287
Buffer pool size, bytes 8589918208
Free buffers 393098
Database pages 130983
Old database pages 48371
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 3241, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 98126, created 32857, written 125273
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 130983, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]

ROW OPERATIONS

0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
—OLDEST VIEW—
Normal read view
Read view low limit trx n:o 98D0E
Read view up limit trx id 98D0D
Read view low limit trx id 98D0E
Read view individually stored trx ids:
Read view trx id 98D0D

Main thread process no. 454, id 46985552390464, state: waiting for server activity
Number of rows inserted 16418313, updated 58224, deleted 0, read 173479127113
0.00 inserts/s, 0.14 updates/s, 0.00 deletes/s, 633670.19 reads/s

TRANSACTIONS

Trx id counter 98D18
Purge done for trx’s n:o < 98D0C undo n:o < 0
History list length 315
LIST OF TRANSACTIONS FOR EACH SESSION:
—TRANSACTION 0, not started
MySQL thread id 10144, OS thread handle 0x2abbafb93940, query id 40631 localhost root
show engine innodb status
—TRANSACTION 98D16, ACTIVE 280488 sec fetching rows
mysql tables in use 1, locked 1
88056 lock struct(s), heap size 8239544, 5759016 row lock(s), undo log entries 58224
MySQL thread id 2, OS thread handle 0x2abbafb52940, query id 3180 Reading event from the relay log
TABLE LOCK table longitude.date_sum_temp trx id 98D16 lock mode IX
RECORD LOCKS space id 1437 page no 4 n bits 144 index GEN_CLUST_INDEX of table longitude.date_sum_temp trx id 98D16 lock_mode X
RECORD LOCKS space id 1437 page no 5 n bits 152 index GEN_CLUST_INDEX of table longitude.date_sum_temp trx id 98D16 lock_mode X
RECORD LOCKS space id 1437 page no 6 n bits 144 index GEN_CLUST_INDEX of table longitude.date_sum_temp trx id 98D16 lock_mode X
RECORD LOCKS space id 1437 page no 7 n bits 144 index GEN_CLUST_INDEX of table longitude.date_sum_temp trx id 98D16 lock_mode X
RECORD LOCKS space id 1437 page no 8 n bits 144 index GEN_CLUST_INDEX of table longitude.date_sum_temp trx id 98D16 lock_mode X
RECORD LOCKS space id 1437 page no 9 n bits 144 index GEN_CLUST_INDEX of table longitude.date_sum_temp trx id 98D16 lock_mode X
RECORD LOCKS space id 1437 page no 10 n bits 144 index GEN_CLUST_INDEX of table longitude.date_sum_temp trx id 98D16 lock_mode X
RECORD LOCKS space id 1437 page no 11 n bits 144 index GEN_CLUST_INDEX of table longitude.date_sum_temp trx id 98D16 lock_mode X
RECORD LOCKS space id 1437 page no 12 n bits 144 index GEN_CLUST_INDEX of table longitude.date_sum_temp trx id 98D16 lock_mode X
TOO MANY LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS

END OF INNODB MONITOR OUTPUT

1 row in set (0.04 sec)

Slave status:

Mon Nov 19 16:58:05 2012 (root@localhost:/(none)) > show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: xxx.xxx.xxx.xxx
Master_User: replication
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000040
Read_Master_Log_Pos: 130506017
Relay_Log_File: relay-bin.000006
Relay_Log_Pos: 250128026
Relay_Master_Log_File: mysql-bin.000029
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 250127880
Relay_Log_Space: 13226498555
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 304767
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)

I suppose your problem is related to MIXED binlog format and the way the nightly job does those updates. Note that any non-deterministic update in MIXED format results in writing such changes using row based logging. This means that a single statement on master, though changing a lot of rows, results in binlog having a separate binary entries for each modified or inserted row. An example of non-deterministic insert would be: INSERT INTO t1 SELECT * FROM t2 WHERE … Say you insert a million rows this way, you will get a huge binlog entry having a million records, which is just slow to write/transfer by the master and write(relay log)/read/execute by the slave.

Thanks for the response przemek, I found the problem (I’ve been meaning to update this thread for a few days now…). The 62k updates were against a table without a unique key, so each row caused a table scan of the 5M-row table. Adding a primary key to the table fixed the problem. As the table easily fits in memory, it’s not surprising that the server was totally cpu-bound. What was surprising to me was that even with row-based replication, mysql is still doing some sort of logical (not physical) replication, which means that you can get horrible performance problems like this as the schema changes.

Also, during testing, much larger updates via RBR (eg 1M rows) now replicate in less than a minutes, whereas in this case only 62k updates were taking many days (and entirely cpu-bound, as mentioned in the original post).

In case anyone is interested, the relevant source code is in sql/log_event.cc:9506 (Rows_log_event::find_row), with a comment stating: “If no key is specified or table does not have keys, a table scan is used to find the row.”