Hi team,
Maybe a stupid question, but after seeing many issues with MySQL 8.0.x and problems with the cluster, is there any minor version that has run longer in production environments without random issues? (I tried it, and every second day ended with an unusable cluster.)
I really like what you are doing; you are among the few who openly share DBA knowledge.
For now I’m on a 5.7 XtraDB Cluster and it works great. I’m just checking whether it’s worth upgrading now or waiting until a few more releases pass and the upstream MySQL and XtraDB sources for v8.x.x are battle-proven.
Kind Regards.
Hello @hmilkovi,
PXC 8 has been out for almost 2 years and is incredibly stable. If you are experiencing issues, I suspect it is a configuration issue. Can you elaborate on the errors you are seeing?
Hi @matthewb,
many thanks for helping.
I have a monitoring system that saves changes into the XtraDB cluster, and I’m hitting both issues:
I have high concurrency on data updates, but I balance it all to one node, and in the application logic we retry on lock timeout or deadlock with a linear back-off.
I don’t see any OOM kills or anything similar. The configuration is the same, but 5.7 works great and 8.0.x doesn’t.
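For context, the retry in the application fires on the two standard retryable errors, 1213 (deadlock found) and 1205 (lock wait timeout). These are the only related server-side knobs I look at, shown here just as a reference for what is in play, not as values I recommend:

SHOW GLOBAL VARIABLES LIKE 'innodb_lock_wait_timeout';  -- how long a statement waits for a row lock before error 1205
SHOW GLOBAL VARIABLES LIKE 'wsrep_retry_autocommit';    -- how many times the node retries autocommit statements that fail certification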
Docker image version: 8.0.23-14.1
Config:
[mysqld]
# Tuning
innodb_buffer_pool_size = 6G
wsrep_slave_threads = 64
innodb_read_io_threads = 32
innodb_write_io_threads = 32
innodb_io_capacity = 6000
innodb_io_capacity_max = 8000
innodb_purge_threads = 8
innodb_page_cleaners = 8
innodb_buffer_pool_instances = 6
wsrep_provider_options = "pc.weight=10; gcache.size=1G;"
wsrep_sync_wait = 7
# Limits
max-connections = 10000
open-files-limit = 65535
The only things I see in the logs are deadlocks detected, queries stuck in the process list, and the node getting thrown out of the cluster after some time.
After hanging, I noticed nodes get killed with signal 11 and the gcache gets corrupted.
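In case it helps anyone debugging the same symptoms, these are the standard checks I run when a node starts misbehaving (nothing custom, just the usual status queries):

SHOW ENGINE INNODB STATUS;                               -- the LATEST DETECTED DEADLOCK section shows the two conflicting transactions
SHOW FULL PROCESSLIST;                                   -- the stuck queries show up here
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';          -- should say Primary on a healthy node
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';     -- should say Synced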
I will investigate these issues further when I get some more time.
I have tried to reproduce it, but I can’t, as the application logic has been changed to avoid deadlocks where possible.
Do you have any foreign keys? Are you running in repeatable-read mode?
Why are you running pc.weight=10?
64 wsrep slave threads and innodb_io_capacity = 6000 for only a 6G buffer pool make me think you just slapped some numbers in there without understanding what each of those parameters does. Please read Tuning MySQL/InnoDB Flushing for a Write-Intensive Workload - Percona Database Performance Blog
wsrep_sync_wait = 7 is quite intense and will certainly cause slowness in your application. Have you tested with this disabled or maybe =1?
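To be clear about what 7 means: wsrep_sync_wait is a bitmask (1 = reads, 2 = UPDATE/DELETE, 4 = INSERT/REPLACE), so 7 turns on causality checks for everything. It is also dynamic, so you can experiment at runtime before touching the config file. Roughly (adjust to your own testing):

SELECT @@transaction_isolation;     -- confirm the isolation level you are actually running
SET SESSION wsrep_sync_wait = 1;    -- causality checks on reads only, for this session
-- compare application latency, then set it globally on the node (and persist it in the config) only if it behaves:
SET GLOBAL wsrep_sync_wait = 1;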
I don’t have any foreign keys. Yes, I’m using REPEATABLE-READ mode.
I had a 5-node cluster and flow control was a problem, so I raised wsrep_slave_threads to 64. When we hit performance issues (round trips across all 5 nodes), we scaled down to 3 nodes but left this parameter in place.
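For the record, this is how I was watching flow control on each node while tuning; standard Galera counters, nothing custom:

SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';    -- fraction of time replication was paused since the counters were last reset
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_sent';      -- flow-control messages this node sent to slow the others down
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue_avg';   -- average receive queue length; consistently above 0 means the appliers can't keep up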
After reconsidering the application logic, you are right: a lower wsrep_sync_wait setting is more than enough.
Thanks for pointing me to this guide. After reading about io_capacity, I removed it, as the default seems to be enough. I also changed other variables as suggested in the guide.
Now I will test it and come back with feedback after 2 days.
My suspicion is that it was my fault: I pushed a heavy write load onto a 5-node cluster with a bad configuration, and once the deadlocks plus 100% flow control kicked in, the cluster crashed.
I’m so angry at myself that I didn’t take more time to investigate the mysqld crash.
@matthewb many thanks for the help.
It’s stable now and hasn’t had any issues for 2 days; it really was my own misconfiguration.
I think the issue was that I had set wsrep_sync_wait = 7 on a 5-node cluster with heavy insert/update concurrency, where the nodes couldn’t keep up, flow control kicked in, and there were many deadlocks.
After fixing the configuration and the application logic, and scaling down from 5 to 3 nodes to reduce round trips, everything is perfect.