Replication thread spends too much time in "freeing items" state

Hi guys. I have a Master-Slave replication (both are 5.6.21-70.0-log Percona Server). Slave is used for XtraBackup + some SELECT queries are executed once in awhile. We also have percona agent running on a server (for Percona Cloud Tools).
Red bars (CPU usage) on image are caused by XtraBackup:
[ATTACH=CONFIG]temp_337_1433497825592_323[/ATTACH]
( [url]http://i.imgur.com/ZkUuIQa.png[/url] )

Replication gap grows as shown on image (graphs show last 7 days). I can reboot slave - things will be fine for 2-3 days but gap will start growing again as shown on a graph.
[ATTACH=CONFIG]temp_335_1433497845981_524[/ATTACH]
( [url]http://i.imgur.com/sudZKPv.png[/url] )

The problem is - gap appears even before XtraBackup is scheduled to run. It just appears when traffic goes up a bit and becomes worth and worth every day. In 9 out of 10 “SHOW PROCESSLIST” cases thread is in “freeing items”:

mysql> show processlist;
| 212 | system user | | NULL | Connect | 861178 | Waiting for master to send event | NULL | 0 | 0 |
| 104717 | percona-agent | localhost | NULL | Sleep | 0 | | NULL | 403 | 403 |
| 104723 | system user | | NULL | Connect | 7368 | freeing items | NULL | 0 | 0 |

So replication gap is 7368 seconds. No queries are executed at that time. It doesn’t matter whether xtrabackup is running or not - amount of time thread spends at ‘freeing items’ visually looks the same for the whole day. I assume the reason gap goes down is because we have 2 times less traffic at that period of a day (US oriented service).

/etc/my.cnf:

[mysqld_safe]
malloc-lib = /usr/lib64/libjemalloc.so.1

[mysqld]

default-storage-engine = innodb
replicate-wild-do-table=db_name.%
key_buffer_size = 16K
max_allowed_packet = 1M
read_buffer_size = 256K
read_rnd_buffer_size = 256K
net_buffer_length = 2K
max_connections = 2000
server-id = 2
log-bin=mysql-bin
expire_logs_days=4
max_binlog_size=20873534

query_cache_limit = 0
query_cache_size = 0
query_cache_type = 0

innodb_buffer_pool_size = 23G
innodb_log_file_size = 2G
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_file_per_table
innodb_read_io_threads = 4
innodb_write_io_threads = 8
innodb_thread_concurrency = 0
innodb_io_capacity = 600
innodb_support_xa = 0
innodb_stats_on_metadata = 0
innodb_buffer_pool_instances = 4

tmp_table_size = 128M
max_heap_table_size = 128M

long_query_time = 1.0
slow_query_log = 0
slow_query_log_file = /var/lib/mysql/slow-query.log
slow_query_log_timestamp_always = 1
slow_query_log_timestamp_precision = microsecond
slow_query_log_use_global_control = all
log_slow_verbosity = full

Below is “SHOW GLOBAL STATUS” difference with 300 seconds interval:

Binlog_snapshot_position diff is 1204
Binlog_cache_use diff is 1
Binlog_stmt_cache_use diff is 5
Bytes_received diff is 177512897
Bytes_sent diff is 1138503753
Com_admin_commands diff is 52
Com_begin diff is 467110
Com_commit diff is 467111
Com_create_index diff is 1
Com_create_table diff is 2
Com_delete diff is 2222
Com_drop_table diff is 3
Com_insert diff is 6109
Com_select diff is 8307
Com_set_option diff is 12
Com_show_slave_status diff is 17
Com_show_status diff is 307
Com_show_variables diff is 1
Com_update diff is 467563
Connections diff is 35
Created_tmp_disk_tables diff is 3
Created_tmp_files diff is 3
Created_tmp_tables diff is 314
Handler_commit diff is 951268
Handler_delete diff is 18768
Handler_external_lock diff is 968327
Handler_prepare diff is 4
Handler_read_first diff is 6
Handler_read_key diff is 613326
Handler_read_last diff is 1
Handler_read_next diff is 12186897
Handler_read_rnd diff is 1008274
Handler_read_rnd_next diff is 25303902
Handler_update diff is 1541282
Handler_write diff is 6675629
Innodb_background_log_sync diff is 298
Innodb_buffer_pool_pages_data diff is 924
Innodb_buffer_pool_bytes_data diff is 15138816
Innodb_buffer_pool_pages_dirty diff is 1476
Innodb_buffer_pool_bytes_dirty diff is 24182784
Innodb_buffer_pool_pages_flushed diff is 62591
Innodb_buffer_pool_pages_free diff is 3
Innodb_buffer_pool_pages_made_not_young diff is 65532
Innodb_buffer_pool_pages_made_young diff is 99828
Innodb_buffer_pool_pages_old diff is 341
Innodb_buffer_pool_read_requests diff is 72325656
Innodb_buffer_pool_reads diff is 15456
Innodb_buffer_pool_write_requests diff is 3302830
Innodb_data_fsyncs diff is 6258
Innodb_data_read diff is 310525952
Innodb_data_reads diff is 18954
Innodb_data_writes diff is 531176
Innodb_data_written diff is 2468012032
Innodb_dblwr_pages_written diff is 62591
Innodb_dblwr_writes diff is 706
Innodb_history_list_length diff is 108
Innodb_ibuf_merged_delete_marks diff is 27967
Innodb_ibuf_merged_deletes diff is 8659
Innodb_ibuf_merged_inserts diff is 6907
Innodb_ibuf_merges diff is 8535
Innodb_log_write_requests diff is 360723
Innodb_log_writes diff is 467335
Innodb_lsn_current diff is 177732798
Innodb_lsn_flushed diff is 177628594
Innodb_lsn_last_checkpoint diff is 192034095
Innodb_master_thread_active_loops diff is 298
Innodb_max_trx_id diff is 937646
Innodb_mutex_os_waits diff is 2568
Innodb_mutex_spin_rounds diff is 112751
Innodb_mutex_spin_waits diff is 16936
Innodb_os_log_fsyncs diff is 495
Innodb_os_log_written diff is 417008640
Innodb_pages_created diff is 652
Innodb_pages_read diff is 18953
Innodb_pages_written diff is 62591
Innodb_purge_trx_id diff is 937647
Innodb_rows_deleted diff is 18768
Innodb_rows_inserted diff is 48277
Innodb_rows_read diff is 32417934
Innodb_rows_updated diff is 512089
Innodb_s_lock_os_waits diff is 492
Innodb_s_lock_spin_rounds diff is 24043
Innodb_s_lock_spin_waits diff is 2070
Innodb_x_lock_os_waits diff is 1163
Innodb_x_lock_spin_rounds diff is 52583
Innodb_x_lock_spin_waits diff is 2282
Key_read_requests diff is 25033329
Key_reads diff is 8320794
Key_write_requests diff is 3064203
Key_writes diff is 3052392
Opened_files diff is 281
Opened_table_definitions diff is 6
Opened_tables diff is 411
Queries diff is 951980
Questions diff is 8686
Select_range diff is 8253
Select_scan diff is 317
Slow_queries diff is 10214
Table_locks_immediate diff is 484163
Table_open_cache_hits diff is 484162
Table_open_cache_misses diff is 3
Threads_connected diff is 1
Uptime diff is 300
Uptime_since_flush_status diff is 300

Few more things:

  • XtraBackup is running on the same machine. It works with NFS directory.
  • Slave was running well for months. Issue started in a week after XtraBackup was configured on a server.
  • Server is running on CentOS 6

Appreciate any help around the issue. Will be happy to provide more data if needed.

Thanks in advance!
Alex