Enormous increase in number of redo writes in 8.0 with innodb log writer threads and slow transactions processing without it

while performing performance tests of our system in preparation for an upgrade to Percona MySQL 8.0 we have noticed that IO load increased by about two times and almost up to 100% utilization.

so after concluding series of tests, which include a very short transaction performing one of insert operations used by our system and executed with the same fixed TPS per thread (16) and with the same number of threads (128) plus a periodical cleanup of records inserted earlier, we have discovered that a lot more of writes to redo logs and log fsyncs are performed in comparation with MySQL 5.7:

there are in avg 48.91k (max 71.3k) of log write operations and in avg 25.43MB/s (max 36.85MB/s) on Percona MySQL 8.0.22 vs 1.71k (1.87k) and 5.98MB/s (8.55MB/s) on Percona MySQL 5.7.20.

it appears that the new log writer threads, which are enabled by default, are responsible for it and after turning it off via the new option “innodb_log_writer_threads”, the number of redo log write operations and amount of log data have been returned to values close to ones on MySQL 5.7.

However, an average elapsed time per transaction is greatly increased: from 0.505ms on MySQL 8.0.22 with log writer threads (and 0.485ms on MySQL 5.7) up to 2.120ms on MySQL 8.0.22 without log writer threads and most of time (~1.68ms) have been spent during commits.

the only changed metrics are ones related to row locks: an average number of threads waiting on row locks, which is most likely a consequence of an increased transaction time, and a number of innodb contentions: there are about three times more of OS wait operations and Spin waits.

the both situations, i.e. a soaring spike of redo log related operations caused by redo log writer threads and an increased transaction time without it, look like serious degradations and probably caused by some bugs.

any ideas how to relieve such situation?

is it the expected behavior or should I file a formal bug report for it?

there is a screenshot of PMM graphs showing redo log IOs during the test runs and contentions: a time interval from 14:20 until 14:40 corresponds MySQL 5.7, from 15:35 to 15:55 - MySQL 8.0 with redo log writer threads turned on and from 16:40 to 17:00 - MySQL 8.0 with redo log writer threads turned off.


the same options are used for the both mysql 5.7 and 8.0:

[mysqld]

innodb_numa_interleave = ON

relay-log = relay-bin
relay_log_info_repository = TABLE
relay_log_recovery = ON

character-set-filesystem = utf8
default-time-zone = +00:00

log-bin-trust-function-creators = ON
explicit_defaults_for_timestamp = ON
max_allowed_packet = 16M

character_set_server = latin1
collation_server = latin1_swedish_ci
innodb_autoinc_lock_mode = 1
log_error_verbosity = 3
master_info_repository = FILE

default_authentication_plugin=mysql_native_password

max_connections = 500

log-bin = mysql-bin
binlog_format = ROW

userstat = OFF
thread_statistics = OFF
performance_schema = OFF
event_scheduler = ON

key_buffer_size = 64M
read_rnd_buffer_size = 8M
read_buffer_size = 2M
sort_buffer_size = 2M

max_heap_table_size = 128M
tmp_table_size = 128M

innodb_strict_mode = ON

innodb_stats_auto_recalc = ON
innodb_stats_persistent_sample_pages = 40

innodb_log_buffer_size = 8M
innodb_log_file_size = 100M

innodb_buffer_pool_size = 8192M
innodb_buffer_pool_instances = 8
innodb_buffer_pool_load_at_startup = ON
innodb_buffer_pool_dump_at_shutdown = ON

innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 1
innodb_flush_neighbors = 1

innodb_thread_concurrency = 0
innodb_commit_concurrency = 0

table_open_cache = 2000
table_open_cache_instances = 8

innodb_adaptive_hash_index = ON

innodb_io_capacity = 400
innodb_write_io_threads = 4
innodb_read_io_threads = 4
innodb_purge_threads = 2

innodb_print_all_deadlocks = ON

audit log configuration

plugin-load = audit_log=audit_log.so
audit_log_strategy = ASYNCHRONOUS
audit_log_file = audit.log
audit_log_format = NEW
audit_log_policy = QUERIES
audit_log_rotate_on_size = 1073741824
audit_log_rotations = 10
audit_log_handler = FILE

sync_binlog = 0

PMM

performance_schema=ON
performance-schema-instrument=‘statement/%=ON’
performance-schema-consumer-statements-digest=ON
innodb_monitor_enable=all

1 Like

Hi @ep11ripe

I believe what you see is related to new MySQL 8.0 redo log design, you can read more there:
https://mysqlserverteam.com/mysql-8-0-new-lock-free-scalable-wal-design/

Unfortunately I think it may lead to performance issues you observe, and I do not have good recommendation how to solve them.
You may play with
https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-logging.html
See section
“Optimize the use of spin delay by user threads waiting for flushed redo. Spin delay helps reduce latency.”

1 Like

it seems that one of goals of reworking redo logs is too reduce a number of fsync operations, but in reality I see an opposite picture even in a simple single thread test doing a 100k of inserts with a single insert per transaction.

there are some metrics collected for runs with and without log writer threads on percona mysql 8.0.22.

they perform about the same number of work:

trx_commits_insert_update 100040 100036
trx_rw_commits 100041 100037

both log writer and log flusher threads are actively working in the 1st case:

log_flusher_no_waits 21064 0
log_flusher_wait_loops 125614 0
log_flusher_waits 99947 0
log_flush_notifier_no_waits 71 0
log_flush_notifier_wait_loops 200354 0
log_flush_notifier_waits 111635 0

log_on_flush_wait_loops 173413 0
log_on_flush_waits 100109 0
log_write_notifier_no_waits 12 0
log_write_notifier_wait_loops 175388 0
log_write_notifier_waits 107184 0
log_writer_no_waits 62797 0
log_writer_wait_loops 209129 0
log_writer_waits 118107 0

a total number of log flushes is about 20% bigger with new log writer threads

os_log_fsyncs 120731 100152
os_data_fsyncs 121602 100772
dblwr_async_requests 944 624

and while some of stats are almost identical :

log_full_block_writes 59100 59083
buffer_pool_write_requests 505061 504693
log_write_requests 503220 502838

the amount of data written is increased by ~15% and a number log_partial_block_writes increased by ~20% and it’s almost the same as for number of flushes.

buffer_data_written 136507392 120154112
log_writes 180904 159210
os_log_bytes_written 92658688 81554944
os_data_writes 182860 160524
log_partial_block_writes 121804 100127

buffer_pages_written 944 624
innodb_dblwr_pages_written 944 624
innodb_dblwr_writes 944 624
buffer_page_written_undo_log 741 438

the similar tests but using a “real” table with several secondary indexes shows more than 240% increase in a number of flushes and log partial block writes as well as more than 150% increase in amount of log data written and the same test performed in parallel (128 threads doing 16 tps each) the difference is just monstrous

1 Like

@ep11ripe You have plenty of data to open a case at https://bugs.mysql.com/ since this is their design.

1 Like