Hi, I am using Percona XtraDB Cluster 5.7.
We have a 3-node cluster on GCP, with 70 cores and 560 GB of RAM on each node.
Under heavy load on the server, INSERT/UPDATE queries take longer, sometimes more than 1 minute.
We use ProxySQL for load balancing, with all write operations going to a single node; we don't write to multiple nodes.
The configuration on each node is as below:
[client]
port = 3306
socket=/data/mysql/mysql.sock
[mysqld]
# GENERAL
user = mysql
default-storage-engine = InnoDB
socket = /data/mysql/mysql.sock
pid-file = /data/mysql/mysql.pid
server-id = 1
skip-name-resolve
# MyISAM
key-buffer-size = 32M
# SAFETY
max-allowed-packet = 2048M
max-connect-errors = 1000000
sql_mode = NO_ENGINE_SUBSTITUTION
# DATA STORAGE
datadir = /data/mysql/
# BINARY LOGGING
log-bin = /data/mysql/mysql-bin
expire-logs-days = 7
sync-binlog = 0
log_slave_updates
#sync-binlog is changed to 0 from 1
# CACHES AND LIMITS
tmp-table-size = 32M
max-heap-table-size = 32M
query-cache-type = 0
query-cache-size = 0
max-connections = 6000
thread-cache-size = 150
open-files-limit = 1024000
table-definition-cache = 5120
table-open-cache = 10240
# INNODB
innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 4
innodb-log-file-size = 5G
innodb-file-per-table = 1
innodb-buffer-pool-size = 350G
innodb_buffer_pool_chunk_size = 256M
innodb_io_capacity = 20000
innodb_io_capacity_max = 80000
innodb_read_io_threads = 12
innodb_thread_concurrency = 0
innodb_write_io_threads = 12
innodb_flush_log_at_trx_commit = 2
# LOGGING
log-error = /data/mysql/mysql-error.log
log-queries-not-using-indexes = 0
slow-query-log = 0
slow-query-log-file = /data/mysql/mysql-slow.log
group_concat_max_len = 10485760
# WSREP configuration
wsrep_provider=/usr/lib64/galera3/libgalera_smm.so
wsrep_provider_options="gcache.size=15G;gcs.fc_limit=600; gcs.fc_master_slave=YES; gcs.fc_factor=1.0"
[mysqld]
# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW
# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB
# Slave thread to use
wsrep_slave_threads= 70
wsrep_certification_rules=OPTIMIZED
wsrep_log_conflicts
# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2
# SST method
wsrep_sst_method=xtrabackup-v2
Below is the wsrep status on node3 (master)
+----------------------------------+----------------+
| Variable_name                    | Value          |
+----------------------------------+----------------+
| wsrep_flow_control_paused_ns     | 678859607091   |
| wsrep_flow_control_paused        | 0.040823       |
| wsrep_flow_control_sent          | 0              |
| wsrep_flow_control_recv          | 1352           |
| wsrep_flow_control_interval      | [ 1200, 1200 ] |
| wsrep_flow_control_interval_low  | 1200           |
| wsrep_flow_control_interval_high | 1200           |
| wsrep_flow_control_status        | OFF            |
+----------------------------------+----------------+
8 rows in set (0.00 sec)
+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_local_state_uuid     | 4ac77c71-4d37-11e8-a591-26cd1b1f321b |
| wsrep_local_commits        | 3677445                              |
| wsrep_local_cert_failures  | 0                                    |
| wsrep_local_replays        | 0                                    |
| wsrep_local_send_queue     | 0                                    |
| wsrep_local_send_queue_max | 866                                  |
| wsrep_local_send_queue_min | 0                                    |
| wsrep_local_send_queue_avg | 1.161227                             |
| wsrep_local_recv_queue     | 0                                    |
| wsrep_local_recv_queue_max | 2                                    |
| wsrep_local_recv_queue_min | 0                                    |
| wsrep_local_recv_queue_avg | 0.002228                             |
| wsrep_local_cached_downto  | 9412001488                           |
| wsrep_local_state          | 4                                    |
| wsrep_local_state_comment  | Synced                               |
| wsrep_local_bf_aborts      | 0                                    |
| wsrep_local_index          | 1                                    |
+----------------------------+--------------------------------------+
17 rows in set (0.01 sec)
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Threadpool_idle_threads | 0     |
| Threadpool_threads      | 0     |
| Threads_cached          | 20    |
| Threads_connected       | 2951  |
| Threads_created         | 3032  |
| Threads_running         | 18    |
+-------------------------+-------+
6 rows in set (0.00 sec)
+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| wsrep_cert_deps_distance | 80.328236 |
+--------------------------+-----------+
1 row in set (0.00 sec)
Below is the wsrep status on node2 (slave)
+----------------------------------+----------------+
| Variable_name                    | Value          |
+----------------------------------+----------------+
| wsrep_flow_control_paused_ns     | 703714065012   |
| wsrep_flow_control_paused        | 0.040810       |
| wsrep_flow_control_sent          | 529            |
| wsrep_flow_control_recv          | 1352           |
| wsrep_flow_control_interval      | [ 1200, 1200 ] |
| wsrep_flow_control_interval_low  | 1200           |
| wsrep_flow_control_interval_high | 1200           |
| wsrep_flow_control_status        | OFF            |
+----------------------------------+----------------+
8 rows in set (0.00 sec)
+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_local_state_uuid     | 4ac77c71-4d37-11e8-a591-26cd1b1f321b |
| wsrep_local_commits        | 23492                                |
| wsrep_local_cert_failures  | 0                                    |
| wsrep_local_replays        | 0                                    |
| wsrep_local_send_queue     | 0                                    |
| wsrep_local_send_queue_max | 1                                    |
| wsrep_local_send_queue_min | 0                                    |
| wsrep_local_send_queue_avg | 0.000000                             |
| wsrep_local_recv_queue     | 0                                    |
| wsrep_local_recv_queue_max | 1231                                 |
| wsrep_local_recv_queue_min | 0                                    |
| wsrep_local_recv_queue_avg | 27.537734                            |
| wsrep_local_cached_downto  | 9412001742                           |
| wsrep_local_state          | 4                                    |
| wsrep_local_state_comment  | Synced                               |
| wsrep_local_bf_aborts      | 0                                    |
| wsrep_local_index          | 2                                    |
+----------------------------+--------------------------------------+
17 rows in set (0.00 sec)
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Threadpool_idle_threads | 0     |
| Threadpool_threads      | 0     |
| Threads_cached          | 150   |
| Threads_connected       | 36    |
| Threads_created         | 749   |
| Threads_running         | 1     |
+-------------------------+-------+
6 rows in set (0.00 sec)
+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| wsrep_cert_deps_distance | 80.322232 |
+--------------------------+-----------+
1 row in set (0.00 sec)
Below is the status of wsrep on node1 (slave)
+----------------------------------+----------------+
| Variable_name                    | Value          |
+----------------------------------+----------------+
| wsrep_flow_control_paused_ns     | 679410518302   |
| wsrep_flow_control_paused        | 0.040840       |
| wsrep_flow_control_sent          | 824            |
| wsrep_flow_control_recv          | 1358           |
| wsrep_flow_control_interval      | [ 1200, 1200 ] |
| wsrep_flow_control_interval_low  | 1200           |
| wsrep_flow_control_interval_high | 1200           |
| wsrep_flow_control_status        | OFF            |
| wsrep_flow_control_active        | false          |
| wsrep_flow_control_requested     | false          |
+----------------------------------+----------------+
10 rows in set (0.00 sec)
+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_local_state_uuid     | 4ac77c71-4d37-11e8-a591-26cd1b1f321b |
| wsrep_local_commits        | 0                                    |
| wsrep_local_cert_failures  | 0                                    |
| wsrep_local_replays        | 0                                    |
| wsrep_local_send_queue     | 0                                    |
| wsrep_local_send_queue_max | 1                                    |
| wsrep_local_send_queue_min | 0                                    |
| wsrep_local_send_queue_avg | 0.000000                             |
| wsrep_local_recv_queue     | 1191                                 |
| wsrep_local_recv_queue_max | 1228                                 |
| wsrep_local_recv_queue_min | 0                                    |
| wsrep_local_recv_queue_avg | 35.918619                            |
| wsrep_local_cached_downto  | 9412001742                           |
| wsrep_local_state          | 4                                    |
| wsrep_local_state_comment  | Synced                               |
| wsrep_local_bf_aborts      | 0                                    |
| wsrep_local_index          | 0                                    |
+----------------------------+--------------------------------------+
17 rows in set (0.00 sec)
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Threadpool_idle_threads | 0     |
| Threadpool_threads      | 0     |
| Threads_cached          | 28    |
| Threads_connected       | 46    |
| Threads_created         | 74    |
| Threads_running         | 1     |
+-------------------------+-------+
6 rows in set (0.00 sec)
+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| wsrep_cert_deps_distance | 80.309898 |
+--------------------------+-----------+
1 row in set (0.00 sec)
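To make the flow control and queue counters above easier to compare across the three nodes, here is a minimal Python sketch that converts them into friendlier units. The numbers are copied from the status output above. The dictionary layout and the `summarize` function are only illustrative and not part of any Percona or Galera tooling. It relies on `wsrep_flow_control_paused_ns` being the total paused time in nanoseconds, and on `wsrep_flow_control_paused` being the fraction of time replication was paused since the status counters were last reset.

```python
# Counters copied from the SHOW STATUS output of each node above.
NODES = {
    "node3 (master)": {
        "paused_ns": 678859607091, "paused": 0.040823,
        "fc_sent": 0, "recv_queue_avg": 0.002228,
    },
    "node2 (slave)": {
        "paused_ns": 703714065012, "paused": 0.040810,
        "fc_sent": 529, "recv_queue_avg": 27.537734,
    },
    "node1 (slave)": {
        "paused_ns": 679410518302, "paused": 0.040840,
        "fc_sent": 824, "recv_queue_avg": 35.918619,
    },
}


def summarize(counters):
    """Convert raw wsrep counters into seconds and percentages."""
    return {
        "paused_seconds": counters["paused_ns"] / 1e9,  # ns -> s
        "paused_percent": counters["paused"] * 100,     # fraction -> %
        "fc_sent": counters["fc_sent"],
        "recv_queue_avg": counters["recv_queue_avg"],
    }


for name, counters in NODES.items():
    s = summarize(counters)
    print(
        f"{name}: paused {s['paused_seconds']:.1f} s "
        f"({s['paused_percent']:.2f}% of the time), "
        f"flow control messages sent: {s['fc_sent']}, "
        f"avg receive queue: {s['recv_queue_avg']:.2f}"
    )
```

Read this way, each node reports roughly 680 to 700 seconds of paused replication (about 4% of the time), and the flow control messages were sent by node1 and node2, which are also the nodes with the larger average receive queues.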
Can someone help me figure out what is going wrong here?