High IO write latency with O_DIRECT on 5.7

Hi guys,

I have a master-slave replication setup on Percona Server 5.6.26-74.0-log. I recently ordered new machines and installed 5.7 on them to replace the old setup. These two machines are now slaves of the old 5.6 master. The configs were copied over, so the setups are essentially the same in terms of the important settings.

First, the problem: the new slaves are slower, and replication lag is high at peak traffic. The old 5.6 slave has no lag issues at all. See the screenshots of IO write latency: it is about 0.15 ms on 5.6 and around 1 ms on 5.7 with O_DIRECT enabled. I switched from O_DIRECT to O_DSYNC to see the difference (that is the middle part of the screenshot). O_DSYNC is better, but I'd prefer O_DIRECT since I have a BBU and write-back cache on the RAID controller.
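For what it's worth, the latency numbers above come from monitoring graphs; on the box itself I can watch the same thing with iostat (the await / w_await column, depending on the sysstat version, is the average write latency in ms; the device name below is just my layout):

iostat -dx sda 1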

Both machines have pretty much the same hardware, apart from the RAID level. The old 5.6 slave has RAID1, the new 5.7 slave has RAID10. Same SSD model though, just four drives instead of two. Both are on ext4 (the new machine is mounted with noatime, the old one is not).
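To double-check the mount flags I just compare /proc/mounts on both boxes, e.g.:

grep sda /proc/mounts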

ioping results (-D enables O_DIRECT):

Old machine:
ioping -c 10 -WWW -D .
4 KiB >>> . (ext4 /dev/sda3): request=1 time=46.1 us (warmup)
4 KiB >>> . (ext4 /dev/sda3): request=2 time=74.0 us

min/avg/max/mdev = 56.0 us / 62.2 us / 74.0 us / 4.84 us

New machine:
ioping -c 10 -WWW -D .
4 KiB >>> . (ext4 /dev/sda4): request=1 time=50.5 us (warmup)
4 KiB >>> . (ext4 /dev/sda4): request=2 time=76.1 us

min/avg/max/mdev = 68.7 us / 88.6 us / 102.9 us / 11.5 us

The RAID configs are the same in terms of caching:

5.6 server:
arcconf getconfig 1 | grep -i cache
Controller Cache Preservation : Disabled
Global Physical Device Write Cache Policy: Drive Specific
Controller Cache Backup Unit Information
Read-cache setting : Enabled
Read-cache status : On
Write-cache setting : On when protected by battery/ZMM
Write-cache status : On
Write Cache : Enabled (write-back)
Write Cache : Enabled (write-back)

5.7 server:
arcconf getconfig 1 | grep -i cache
Controller Cache Preservation : Disabled
Global Physical Device Write Cache Policy: Drive Specific
Controller Cache Backup Unit Information
Read-cache setting : Enabled
Read-cache status : On
Write-cache setting : Enabled
Write-cache status : On
Write Cache : Enabled (write-back)
Write Cache : Enabled (write-back)
Write Cache : Enabled (write-back)
Write Cache : Enabled (write-back)
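If it helps, the per-drive write cache setting can also be checked directly on the physical devices (this is from memory, the exact arcconf syntax may vary by controller/firmware):

arcconf getconfig 1 pd | grep -i "write cache"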

5.6 my.cnf

malloc-lib = /usr/lib64/libjemalloc.so.1

[mysqld]
default-storage-engine = innodb
innodb_buffer_pool_size = 20G
innodb_log_file_size = 2G
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_file_per_table
innodb_read_io_threads = 4
innodb_write_io_threads = 8
innodb_thread_concurrency = 0
innodb_io_capacity = 3000
innodb_support_xa = 0
innodb_stats_on_metadata = 0
innodb_buffer_pool_instances = 8
tmp_table_size = 128M
max_heap_table_size = 128M

5.7 my.cnf

malloc-lib = /usr/lib64/libjemalloc.so.1

[mysqld]
default-storage-engine = innodb
query_cache_type = 0
query_cache_size = 0
innodb_buffer_pool_size = 23G
innodb_log_file_size = 2G
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 2
#innodb_flush_method = O_DIRECT
innodb_flush_method = O_DSYNC
innodb_file_per_table
innodb_read_io_threads = 4
innodb_write_io_threads = 8
innodb_thread_concurrency = 0
innodb_io_capacity = 3000
innodb_io_capacity_max = 10000
innodb_support_xa = 0
innodb_stats_on_metadata = 0
innodb_buffer_pool_instances = 8
innodb_purge_threads = 4
tmp_table_size = 128M
max_heap_table_size = 128M
table_open_cache = 20000
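One thing I still plan to check on the 5.7 slave is where the write latency actually goes inside MySQL - redo log, doublewrite buffer, or tablespace files. Assuming the sys schema is installed (it ships with 5.7), something along these lines should break it down per file (write_latency is in picoseconds in the x$ view):

mysql -e 'SELECT file, count_write, write_latency FROM sys.x$io_global_by_file_by_latency ORDER BY write_latency DESC LIMIT 10'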

Initially the new machines were on CentOS 7 and XFS, but when I saw the first results I downgraded to CentOS 6 and ext4 to match the old setup. The results did not change, so it does not appear to be an OS/filesystem problem.

So with pretty much the same IO latency and IO bandwidth (I tested the drives with fio and got nearly identical results), MySQL 5.7 with O_DIRECT is for some reason 6-7 times slower (~1 ms write time vs ~0.15 ms).
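For reference, a direct random-write fio run along these lines is what I used to compare the drives (16k block size to mimic InnoDB pages, queue depth 1; the filename/path here is just an example and my exact parameters may have differed slightly):

fio --name=directwrite --filename=/var/lib/mysql/fio.test --rw=randwrite --bs=16k --size=1G --ioengine=psync --direct=1 --iodepth=1 --runtime=60 --time_based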

Any clues as to what's going wrong, guys? I'd appreciate any hint.

Many thanks,
Alex