Hi everyone!
I recently migrated my client from PXC 5.7.34-37 to PXC 8.0.36-28.1.
The old environment operated on 3 LAMP servers, which ran, among others: galera, it is worth adding that these machines were not powerful.
OLD GALERA CONFIG:
[mysqld]
server-id=14
datadir=/var/lib/mysql
tmpdir=/data/var/mysql-tmp
log-bin=/data/var/lib/mysql/mysql-bin
relay-log=/data/var/lib/mysql/relay
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
log-bin
log_slave_updates
expire_logs_days=7
bind-address = 0.0.0.0
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_cluster_name=pxc-cluster
wsrep_cluster_address=gcomm://some,nice,ips
#wsrep_cluster_address=gcomm://
wsrep_node_name=node1.mydomain.pl
wsrep_node_address=some_nice_ip
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sstuser:someniceauth
transaction-isolation = READ-UNCOMMITTED
innodb_file_per_table
innodb_file_format = barracuda
innodb_log_file_size = 256M
innodb_buffer_pool_size = 140GB
innodb_buffer_pool_instances = 8
innodb_flush_method = O_DIRECT
innodb_lock_wait_timeout = 100
innodb_autoinc_lock_mode=2
binlog_format=ROW
max_binlog_files=2000
max_binlog_size=50M
innodb_flush_log_at_trx_commit=2
sync_binlog=1
pxc_strict_mode=ENFORCING
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
sql-mode="NO_ENGINE_SUBSTITUTION"
max_connections=2048
secure-file-priv=/somewhere/is/nfs
[sst]
sst-initial-timeout=10000
tmpdir=/var/lib/mysql-sst-tmp
LAMPs were migrated and divided into Proxmox VMs on ZFS, with NVME disks.
However, some queries can take much longer to execute than in the previous environment.
NEW GALERA CONFIG:
# Template my.cnf for PXC
# Edit to your requirements.
[client]
# Socket file location for client connections
socket=/var/lib/mysql/mysql.sock
[mysqld]
# Server identification
server-id=1
#read_only=1
# Data directory and files
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
pid-file=/var/run/mysqld/mysqld.pid
log-error=/var/log/mysqld.log
log-bin=some_nice-sql-1-bin
# Network settings
bind-address=some_nice_ip
skip-name-resolve=1
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Security and authentication
default-authentication-plugin=mysql_native_password
# Note: Remember to remove 'skip-grant-tables' in production for security reasons
# Logging and binary logging
log-error-verbosity=1
expire_logs_days=1
binlog_expire_logs_seconds=86400 # Equals 1 days
log_slave_updates=1
binlog_format=ROW
# Performance settings
join_buffer_size=8M
innodb_redo_log_capacity=8G
max_connections=2048
transaction-isolation=READ-UNCOMMITTED
innodb_file_per_table=1
innodb_buffer_pool_size=220GB
innodb_buffer_pool_instances=16
innodb_log_buffer_size=16M
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT
innodb_lock_wait_timeout=100
innodb_adaptive_hash_index=1
tmp_table_size=16M
max_heap_table_size=16M
sort_buffer_size=524288
# SQL modes and defaults
sql-mode="NO_ENGINE_SUBSTITUTION"
explicit_defaults_for_timestamp=1
default_storage_engine=InnoDB
# Galera Cluster Configuration
wsrep_provider=/usr/lib64/galera4/libgalera_smm.so
wsrep_cluster_address=gcomm://some,nice,ips
wsrep_node_address=some_nice_ip
wsrep_cluster_name=some_nice-galera-cluster
wsrep_node_name=some_nice_node_1 # If not specified, system hostname will be used
wsrep_slave_threads=128
wsrep_log_conflicts=1
wsrep_sst_method=xtrabackup-v2
pxc-encrypt-cluster-traffic=OFF
pxc_strict_mode=ENFORCING
innodb_autoinc_lock_mode=2
wsrep_provider_options="gcache.size=4G" # Cache size for incremental state transfer
# PMM configuration
slow_query_log=ON
log_output=FILE
long_query_time=10
log_slow_admin_statements=ON
log_slow_slave_statements=ON
log_slow_rate_limit=100
log_slow_rate_type='query'
slow_query_log_always_write_time=1
log_slow_verbosity='full'
slow_query_log_use_global_control='all'
# NFS Config
secure-file-priv=/somewhere/is/nfs
New VMs for SQL, have 300 GB RAM, maybe do you see anything that I can improve here?
Because it works a little bit slower than prev environment…
Proxmoxes, are connected via 10G, MTA is setted on 9999, and one more think to add:
I also did a benchmark sysbench:
OLD CLUSTER: [NOT PROD]
SQL statistics:
queries performed:
read: 3240426
write: 925836
other: 462918
total: 4629180
transactions: 231459 (3857.02 per sec.)
queries: 4629180 (77140.31 per sec.)
ignored errors: 0 (0.00 per sec.)
reconnects: 0 (0.00 per sec.)
General statistics:
total time: 60.0056s
total number of events: 231459
Latency (ms):
min: 1.74
avg: 2.59
max: 29.58
95th percentile: 5.67
sum: 599573.76
Threads fairness:
events (avg/stddev): 23145.9000/178.91
execution time (avg/stddev): 59.9574/0.00
NEW CLUSTER [PROD]
SQL statistics:
queries performed:
read: 1979138
write: 486415
other: 361787
total: 2827340
transactions: 141367 (2355.74 per sec.)
queries: 2827340 (47114.81 per sec.)
ignored errors: 0 (0.00 per sec.)
reconnects: 0 (0.00 per sec.)
General statistics:
total time: 60.0086s
total number of events: 141367
Latency (ms):
min: 2.37
avg: 4.24
max: 1741.55
95th percentile: 6.91
sum: 599815.28
Threads fairness:
events (avg/stddev): 14136.7000/21.19
execution time (avg/stddev): 59.9815/0.00
TEST VM WITH 1 NODE GALERA [ LXC + 64 GB RAM + EXT4 + LVM ]
SQL statistics:
queries performed:
read: 4780104
write: 1365744
other: 682872
total: 6828720
transactions: 341436 (5690.29 per sec.)
queries: 6828720 (113805.87 per sec.)
ignored errors: 0 (0.00 per sec.)
reconnects: 0 (0.00 per sec.)
General statistics:
total time: 60.0026s
total number of events: 341436
Latency (ms):
min: 1.03
avg: 1.76
max: 94.10
95th percentile: 2.81
sum: 599636.85
Threads fairness:
events (avg/stddev): 34143.6000/79.27
execution time (avg/stddev): 59.9637/0.00
But IMO they are not something to rely on, because some test queries ended with very similar time.
Any idea?