We have a MySQL 8.0.33-25.1 Percona XtraDB (wsrep) cluster with 6 nodes. The cluster itself is stable and works as expected.
One of the nodes also acts as a binlog-enabled replication primary for a single asynchronous replica outside the cluster.
Both are AWS instances of comparable size; the replica has more provisioned IOPS than the primary due to the nature of its workload.
The same setup ran on MySQL 5.7, which we migrated away from a few months ago.
Since the migration the replica has been behaving very strangely. I understand there are probably some under-the-hood changes to replication in 8.0, but what we are experiencing is not stable at all.
The workload on the replica is mostly reads. The replica also hosts databases that are not replicated and do receive writes, but as far as replication is concerned, the replicated databases only ever get reads on that host.
Here are 2 example workflows that are running on the replica:
- DB1 is replicated from the primary; the replica reads data from it, creates "snapshots" of that data, and writes them to DB2 on the replica (DB2 is not replicated to any server).
- Tableau reads data from the replica once per day on a schedule; that extract is then used for reports throughout the day.
There are other workloads of smaller scale, but of the same type, that run on that replica.
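To make the first workflow concrete, the snapshot step is essentially of this shape (the table and column names here are hypothetical, for illustration only):

```sql
-- Hypothetical shape of the snapshot job:
-- read from the replicated DB1, write into the local-only DB2.
INSERT INTO DB2.daily_snapshot (id, payload, snapshot_at)
SELECT id, payload, NOW()
FROM DB1.table1
WHERE updated_at >= CURDATE();
```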
The problem we have is that during any significant read query, replication pauses and waits for the read query to finish before continuing. This frequently ends in lock wait timeouts, since those read queries can sometimes run for many hours, producing errors like the following:
2024-03-21T09:49:22.030113Z 2386083 [Warning] [MY-010584] [Repl] Replica SQL for channel '': Worker 1 failed executing transaction 'ANONYMOUS' at source log mysql-bin.000639, end_log_pos 853032011; Could not execute Update_rows event on table DB1.table1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's source log mysql-bin.000639, end_log_pos 853032011, Error_code: MY-001205
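When the applier stalls we can usually see which session is blocking it; a query along these lines (using the `sys` schema that ships with 8.0) should show the long-running read as the blocker of the replication worker:

```sql
-- Identify which session is blocking the replication applier.
-- sys.innodb_lock_waits is available out of the box in MySQL 8.0.
SELECT wait_started,
       locked_table,
       waiting_pid,
       waiting_query,
       blocking_pid,
       blocking_query
FROM sys.innodb_lock_waits
ORDER BY wait_started;
```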
We can increase the timeout, but that's not what we are after. Prior to the migration, the 5.7 replica kept up with replication throughout the day. Over the past two months we have tried tweaking various variables to get back to that state, without success. These are the variables we are currently running that differ from our previous defaults:
replica_preserve_commit_order = OFF
replica_parallel_type = DATABASE
replica_parallel_workers = 1
innodb_adaptive_hash_index = OFF
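For reference, the replication-related settings above can also be changed at runtime (the applier has to be stopped first); this is how we would expect them to be applied on the default channel:

```sql
STOP REPLICA SQL_THREAD;
SET GLOBAL replica_preserve_commit_order = OFF;
SET GLOBAL replica_parallel_type = 'DATABASE';  -- deprecated as of 8.0.29, but still accepted
SET GLOBAL replica_parallel_workers = 1;
START REPLICA SQL_THREAD;
```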
And this is our default my.cnf (some values are redacted):
# Template my.cnf for PXC
# Edit to your requirements.
[client]
socket=XXX
default-character-set=utf8
[mysqld]
open-files-limit = 100000
table_open_cache = 20000
innodb_open_files = 20000
character-set-server = utf8
collation-server = utf8_unicode_ci
datadir = XXX
long_query_time = 5
slow_query_log = 1
slow_query_log_file = XXX
log_bin = mysql-bin
log-error = XXX
pid-file = XXX
default_storage_engine = InnoDB
pxc_strict_mode = ENFORCING
sql_mode = ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
early-plugin-load = keyring_file.so
keyring-file-data = XXX
pxc-encrypt-cluster-traffic = ON
socket = XXX
server-id = 9
binlog-ignore-db = information_schema
binlog-ignore-db = mysql
binlog-ignore-db = performance_schema
binlog-ignore-db = XXX
binlog_format = ROW
log_slave_updates = ON
#
binlog_expire_logs_seconds = 604800
ft_min_word_len = 1
ft_stopword_file = ''
join_buffer_size = 256K
key_buffer_size = 16M
lc-messages-dir = /usr/share/mysql
lower_case_table_names = 0
max_allowed_packet = 512M
max_connections = 400
max_heap_table_size = 16M
memlock = 0
read_buffer_size = 256K
read_rnd_buffer_size = 512K
replica_preserve_commit_order = OFF
replica_parallel_type = DATABASE
replica_parallel_workers = 1
skip-external-locking
skip_name_resolve
sort_buffer_size = 256K
sysdate_is_now = 1
table_definition_cache = 20000
thread_cache_size = 20
thread_stack = 256K
tmp_table_size = 16M
tmpdir = XXX
net_read_timeout = 60
# 5.6 backwards compatibility
#
explicit_defaults_for_timestamp = 1
# INNODB OPTIONS - You may want to tune the below depending on number of cores and disk sub
#
innodb_autoinc_lock_mode = 1
innodb_adaptive_hash_index = OFF
innodb_buffer_pool_instances = 8
innodb_buffer_pool_size = 12G
innodb_doublewrite = 1
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 3
innodb_flush_method = O_DIRECT
innodb_io_capacity = 200
innodb_log_buffer_size = 8M
innodb_log_files_in_group = 2
innodb_log_file_size = 128M
innodb_read_io_threads = 4
innodb_thread_concurrency = 0
innodb_write_io_threads = 4
# avoid statistics update when doing e.g show tables
#
innodb_ft_min_token_size = 0
innodb_stats_on_metadata = 0
[mysqld_safe]
pid-file = XXX
#syslog
malloc-lib = /usr/lib64/libtcmalloc.so.4.4.5
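To see what the applier is doing while it stalls, 8.0 also exposes per-worker state in performance_schema; something like this shows which worker is waiting and on which transaction:

```sql
-- Per-worker applier state on the default replication channel.
SELECT worker_id,
       service_state,
       last_error_number,
       last_error_message,
       applying_transaction
FROM performance_schema.replication_applier_status_by_worker;
```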
I am assuming that we are missing a key piece here, but so far we have been unable to find it. As a last resort we are considering changing the default transaction isolation level from REPEATABLE READ to READ COMMITTED (or even READ UNCOMMITTED), which may help with the locking, but that would require significantly more research into the implications for our use cases, so we would like to avoid it if at all possible.
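A narrower variant of that change, if it comes to it, would be to set the isolation level only for the long-running reporting sessions rather than server-wide; in 8.0 this is a per-session variable:

```sql
-- Scope the isolation change to one session instead of the whole server.
SET SESSION transaction_isolation = 'READ-COMMITTED';
-- ... run the long reporting query here ...
```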
Any suggestions are welcome.