Terrible write performance (Ubuntu 20.04, percona-server-5.7)

Hi,
Maybe someone here can help. I’m running percona-server 5.7 on Ubuntu 20.04. On Ubuntu 16.04 it has been running fine so far, but since 16.04 will soon reach end of life, these hosts should be upgraded to 20.04.

Here is the issue: no matter whether I use ext4, btrfs or xfs, write performance is horrible (about 300 inserts/s). The old host, on the other hand, reaches up to 1200 inserts/s.

Both servers are VMs with the same specs; the only difference is the operating system. My preferred filesystem is btrfs. The mount option nobarrier for xfs is obsolete in Ubuntu 20.04.

#BTRFS Mount Options
/dev/vdb1 /var/lib/mysql btrfs subvol=,defaults,noatime,space_cache,autodefrag,discard,compress=lzo

The situation is the same if nodatacow is set as a mount option.
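
For reference, this is roughly how I check which options are actually in effect and how nodatacow can be applied per directory instead of per mount (paths as above; note that chattr +C only affects files created after the flag is set):

# show the options the kernel actually applied to the MySQL volume
findmnt -T /var/lib/mysql

# check / set the No_COW attribute on the datadir (new files only)
lsattr -d /var/lib/mysql
chattr +C /var/lib/mysql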

Any suggestions?

Thanks in advance


Hi BoneBunny,

Both ext4 and xfs are recommended for MySQL/Percona Server, so the filesystem shouldn’t be the issue.

Possible reasons for performance differences:

  • Different configuration. Check that you are using the same settings on both MySQL servers. You can also review how InnoDB flushes write activity and how to tune it in Tuning MySQL/InnoDB Flushing for a Write-Intensive Workload - Percona Database Performance Blog
  • Difference in MySQL major version. Default values do change between major versions, and that can impact performance, but there are also other architectural changes. Are you comparing against the same version?
  • Disk scheduler: cfq has worse performance than noop and deadline: Linux schedulers in tpcc like benchmark - Percona Database Performance Blog . Newer kernels ship other disk schedulers, so check which one you are using, as it may not be a recommended one (a quick check is sketched after this list).
  • Are you comparing throughput against similar disk specs? E.g. if you are comparing a spinning disk against an SSD, the comparison won’t be meaningful. Check iostat and verify that both disks show similar read/write activity.
  • How are you comparing inserts? You should run the same load against both servers, using either sysbench or tools like pt-upgrade — Percona Toolkit Documentation . Depending on table size and row size the number of transactions might differ, but overall performance might be similar.
  • Are mount options the same on both systems?
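
For example, a quick check of the active scheduler and of the disk activity could look like this (assuming the data volume is vdb, as in the mount line from the first post):

# the scheduler shown in [brackets] is the active one
cat /sys/block/vdb/queue/scheduler

# compare read/write activity on both hosts while the workload runs
iostat -xm 5 vdb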

Please confirm whether the performance difference still exists once everything is the same.


Hi @BoneBunny,
You are the first person I’ve ever seen to say “Preferred filesystem is btrfs”. :smiley:
In any case, in addition to what @CTutte said above, you can use sysbench to run raw disk IO benchmarks and see if “mysql” is the problem, or to determine if you have a more low-level disk issue.


Hi @CTutte

The MySQL settings are the same for both servers; the only difference is that the old server runs percona-server 5.7 on Ubuntu 16.04 and the new one runs percona-server 5.7 on Ubuntu 20.04. Disk specs are also the same: KVM virtual machines running on SSD storage.

The disk scheduler is mq-deadline; the only other one available is none. Testing with tools like sysbench I get write speeds of about 600 MB per second.

The server is currently running ext4 with the following mount options:

defaults,noatime,nobarrier,nobh,errors=remount-ro

This is the config running on both servers, which have 152 GB of RAM:

bind_address = *
bulk_insert_buffer_size = 64M
innodb_buffer_pool_size = 100G
innodb_buffer_pool_instances = 25
innodb_buffer_pool_chunk_size = 4096M
innodb_log_buffer_size = 512M
innodb_open_files = 8192
innodb_file_per_table = 1
innodb_doublewrite = OFF
innodb_flush_log_at_timeout = 1
innodb_flush_log_at_trx_commit = 2
innodb_io_capacity = 2000
innodb_log_file_size = 13G
innodb_read_io_threads = 4
innodb_write_io_threads = 4
innodb_print_all_deadlocks = ON
innodb_status_output = OFF
innodb_adaptive_hash_index = 0
innodb_adaptive_hash_index_parts = 1
join_buffer_size = 524288
key_buffer_size = 128M
max_allowed_packet = 32M
max_connections = 1201
max_heap_table_size = 128M
max_relay_log_size = 256M
max_slowlog_files = 10
max_slowlog_size = 1G
open_files_limit = 65535
performance_schema_digests_size = 10000
performance_schema_events_stages_history_long_size = 10000
performance_schema_events_statements_history_long_size = 10000
performance_schema_events_transactions_history_long_size = 10000
performance_schema_events_waits_history_long_size = 10000
query_cache_limit = 1M
query_cache_size = 0
query_cache_type = OFF
query_response_time_stats = ON
read_rnd_buffer_size = 524288
read_buffer_size = 244K
read_only = OFF
skip-external-locking
skip_name_resolve = ON
slow_query_log = ON
sort_buffer_size = 4194304
thread_cache_size = 8
thread_stack = 256K
tmp_table_size = 128M
userstat = ON
table_definition_cache = 2000
table_open_cache = 8192
thread_pool_size = 36
default_time_zone = SYSTEM
innodb_flush_method = O_DIRECT
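
For completeness, this is roughly how I would double-check that the running values really match on both servers (the hostnames below are placeholders):

# dump the runtime variables on each server, copy the files to one host and diff them
mysql -e "SHOW GLOBAL VARIABLES" | sort > /tmp/vars-$(hostname -s).txt
diff /tmp/vars-oldhost.txt /tmp/vars-newhost.txt

# or let Percona Toolkit compare them directly
pt-config-diff h=oldhost h=newhost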

If I can provide any other useful stats just let me know.


@BoneBunny Have you removed mysql from the equation? Use sysbench’s file tests to perform direct IO benchmarks.

https://www.alibabacloud.com/blog/testing-io-performance-with-sysbench_594709

If you still get bad performance, then you know mysql is not at fault.
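
For example, a minimal prepare/run/cleanup sequence (file size and test mode are only examples) could be:

sysbench fileio --file-total-size=15G prepare
sysbench fileio --file-total-size=15G --file-test-mode=rndwr --file-extra-flags=direct --time=300 run
sysbench fileio --file-total-size=15G cleanup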


The slave kept up with replication (no lag) with sync_binlog=0. Currently sync_binlog=10 is set, but I know this can cause data loss, and if I set it back to 1 the replication lag will grow again.

If setting sync_binlog=1 is causing replication lag, then this is a clear indicator that your disks have issues. For a third time, I recommend that you stop mysql and perform basic disk IO benchmarks.
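
Since sync_binlog=1 makes the server sync the binary log around commit time, a sync-heavy test is a closer proxy for it than a pure throughput test. A minimal fio sketch (sizes are arbitrary, the interesting numbers are the fsync/fdatasync latencies fio reports):

# small writes with an fdatasync after every write, similar to binlog syncing
fio --name=synctest --rw=write --bs=4k --size=256m --fdatasync=1 --ioengine=sync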

Here are two IO tests: the first one using sysbench, the second one using fio:

### sysbench ###
sysbench fileio --file-total-size=15G --file-test-mode=rndrw --time=300 --max-requests=0 --file-io-mode=async --file-extra-flags=direct  --file-num=1 --file-rw-ratio=1 --file-fsync-freq=0 run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Extra file open flags: directio
1 files, 15GiB each
15GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.00
Calling fsync() at the end of test, Enabled.
Using asynchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      20040.73
    writes/s:                     20040.56
    fsyncs/s:                     0.00

Throughput:
    read, MiB/s:                  313.14
    written, MiB/s:               313.13

General statistics:
    total time:                          300.0050s
    total number of events:              12024999

Latency (ms):
         min:                                    0.01
         avg:                                    0.02
         max:                                   42.99
         95th percentile:                        0.04
         sum:                               292995.31

Threads fairness:
    events (avg/stddev):           12024999.0000/0.00
    execution time (avg/stddev):   292.9953/0.00

### fio ###
fio -direct=1 -iodepth=128 -rw=randrw -ioengine=libaio -bs=4m -size=10G -numjobs=1 -runtime=1000 -group_reporting -filename=iotest -name=randrw_test 
randrw_test: (g=0): rw=randrw, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=128
fio-3.16
Starting 1 process
randrw_test: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=724MiB/s,w=648MiB/s][r=181,w=162 IOPS][eta 00m:00s]
randrw_test: (groupid=0, jobs=1): err= 0: pid=396151: Thu Jan 21 16:03:28 2021
  read: IOPS=159, BW=638MiB/s (669MB/s)(5048MiB/7908msec)
    slat (usec): min=670, max=13612, avg=3043.95, stdev=2634.92
    clat (msec): min=138, max=927, avg=333.64, stdev=43.34
     lat (msec): min=149, max=933, avg=336.68, stdev=43.35
    clat percentiles (msec):
     |  1.00th=[  215],  5.00th=[  296], 10.00th=[  309], 20.00th=[  321],
     | 30.00th=[  326], 40.00th=[  330], 50.00th=[  330], 60.00th=[  334],
     | 70.00th=[  338], 80.00th=[  347], 90.00th=[  359], 95.00th=[  376],
     | 99.00th=[  456], 99.50th=[  584], 99.90th=[  860], 99.95th=[  927],
     | 99.99th=[  927]
   bw (  KiB/s): min=385024, max=753664, per=97.31%, avg=636057.73, stdev=98391.43, samples=15
   iops        : min=   94, max=  184, avg=155.20, stdev=23.96, samples=15
  write: IOPS=164, BW=657MiB/s (688MB/s)(5192MiB/7908msec); 0 zone resets
    slat (usec): min=331, max=15904, avg=3011.16, stdev=2737.95
    clat (msec): min=137, max=666, avg=440.57, stdev=81.70
     lat (msec): min=138, max=672, avg=443.58, stdev=81.80
    clat percentiles (msec):
     |  1.00th=[  226],  5.00th=[  317], 10.00th=[  334], 20.00th=[  363],
     | 30.00th=[  388], 40.00th=[  418], 50.00th=[  451], 60.00th=[  477],
     | 70.00th=[  498], 80.00th=[  518], 90.00th=[  542], 95.00th=[  558],
     | 99.00th=[  592], 99.50th=[  617], 99.90th=[  642], 99.95th=[  667],
     | 99.99th=[  667]
   bw (  KiB/s): min=335872, max=770048, per=96.48%, avg=648627.60, stdev=105469.60, samples=15
   iops        : min=   82, max=  188, avg=158.27, stdev=25.73, samples=15
  lat (msec)   : 250=1.41%, 500=83.87%, 750=14.61%, 1000=0.12%
  cpu          : usr=5.36%, sys=19.16%, ctx=2426, majf=0, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.3%, 16=0.6%, 32=1.2%, >=64=97.5%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=1262,1298,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=638MiB/s (669MB/s), 638MiB/s-638MiB/s (669MB/s-669MB/s), io=5048MiB (5293MB), run=7908-7908msec
  WRITE: bw=657MiB/s (688MB/s), 657MiB/s-657MiB/s (688MB/s-688MB/s), io=5192MiB (5444MB), run=7908-7908msec

MySQL was stopped before testing.

Hi again,

Was the same sysbench run on the other server for comparison?
If disk performance is the same, you can use the pt-upgrade tool (pt-upgrade — Percona Toolkit Documentation) to execute the queries from the slow/general log against two distinct servers for performance comparison.

Lastly, let me ask: in your previous response you mentioned something about “sync_binlog” and the replica lagging.
Are you comparing a standalone vs a standalone, or a primary vs a replica? If you are comparing a primary vs a replica, then performance might vary a bit, since replication was originally applied single-threaded on the replica; on more recent versions it can be parallelized, but only up to some extent. Maybe the problem is not disk performance but a replication bottleneck. If using a replica, do you see one thread saturated while most threads sit idle? Do you see “System lock” in “SHOW SLAVE STATUS \G” frequently? You can read how to check parallel replication efficiency with the steps in this blog post: Estimating potential for MySQL 5.7 parallel replication - Percona Database Performance Blog

Also, is sync_binlog set to the same value on both servers? Regarding the “replica is lagging” part: is the server on Ubuntu 20.04 a lagging replica, or do both servers have a replica and only the replica on 20.04 is lagging?
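
If the replica does turn out to be bottlenecked on a single applier thread, parallel apply can be enabled on the 5.7 replica. A minimal sketch (the worker count is only an example; persist the same values in my.cnf afterwards):

# run on the replica: switch to LOGICAL_CLOCK parallel apply with 8 workers
mysql -e "STOP SLAVE SQL_THREAD; SET GLOBAL slave_parallel_type='LOGICAL_CLOCK'; SET GLOBAL slave_parallel_workers=8; START SLAVE SQL_THREAD;"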


I’ve set sync_binlog=0 on the slave only. On the master the IO performance is nearly the same.
As for the setup, it is a master-slave setup. Master and slave have the same specs, and the master has no performance issues so far.
If, for testing, the slave is used for production, it has no issues either: it does its 700-900 inserts per second. So the problem is related to replication, which, if sync_binlog=1 is set, constantly increases the slave lag. I will have a look at the single-thread situation.
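
A quick way to check that on the replica (a sketch; the performance_schema table below is the 5.7 one):

# is the SQL thread stuck applying, and is only one applier worker actually doing work?
mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Seconds_Behind_Master|Slave_SQL_Running_State'
mysql -e "SELECT worker_id, service_state, last_seen_transaction FROM performance_schema.replication_applier_status_by_worker\G"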
