High latency in write operations under heavy load

Hi, I am using Percona XtraDB Cluster 5.7.

We have a 3-node cluster on GCP, with 70 cores and 560 GB RAM on each node.

During heavy load on the cluster, we see INSERT/UPDATE queries taking more time; sometimes they go above 1 minute.

We are using ProxySQL for load balancing, with all write operations going to a single node; we do not write to multiple nodes.
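For context, a single-writer ProxySQL layout like this is usually configured through the ProxySQL admin interface. The snippet below is only an illustrative sketch; the hostnames and hostgroup IDs are hypothetical, not our actual values:

```sql
-- Illustrative ProxySQL admin commands (run against the admin port, usually 6032).
-- Hostgroup 10 = the single writer, hostgroup 20 = readers; names are examples.
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (10, 'node3', 3306);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, 'node1', 3306);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, 'node2', 3306);
-- Default traffic goes to the writer; route SELECTs to the reader hostgroup.
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT', 20, 1);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
```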

The configuration on each node is as below:

[client]
port = 3306
socket=/data/mysql/mysql.sock

[mysqld]

# GENERAL

user = mysql
default-storage-engine = InnoDB
socket = /data/mysql/mysql.sock
pid-file = /data/mysql/mysql.pid
server-id = 1
skip-name-resolve

# MyISAM

key-buffer-size = 32M

# SAFETY

max-allowed-packet = 2048M
max-connect-errors = 1000000
sql_mode = NO_ENGINE_SUBSTITUTION

# DATA STORAGE

datadir = /data/mysql/

# BINARY LOGGING

log-bin = /data/mysql/mysql-bin
expire-logs-days = 7
sync-binlog = 0
log_slave_updates

# sync-binlog was changed from 1 to 0

# CACHES AND LIMITS

tmp-table-size = 32M
max-heap-table-size = 32M
query-cache-type = 0
query-cache-size = 0
max-connections = 6000
thread-cache-size = 150
open-files-limit = 1024000
table-definition-cache = 5120
table-open-cache = 10240

# INNODB

innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 4
innodb-log-file-size = 5G
innodb-file-per-table = 1
innodb-buffer-pool-size = 350G
innodb_buffer_pool_chunk_size = 256M
innodb_io_capacity = 20000
innodb_io_capacity_max = 80000
innodb_read_io_threads = 12
innodb_thread_concurrency = 0
innodb_write_io_threads = 12
innodb_flush_log_at_trx_commit = 2

# LOGGING

log-error = /data/mysql/mysql-error.log
log-queries-not-using-indexes = 0
slow-query-log = 0
slow-query-log-file = /data/mysql/mysql-slow.log
group_concat_max_len = 10485760

# WSREP configuration

wsrep_provider=/usr/lib64/galera3/libgalera_smm.so
wsrep_provider_options="gcache.size=15G; gcs.fc_limit=600; gcs.fc_master_slave=YES; gcs.fc_factor=1.0"

[mysqld]

# In order for Galera to work correctly, binlog format should be ROW

binlog_format=ROW

# MyISAM storage engine has only experimental support

default_storage_engine=InnoDB

# Slave threads to use

wsrep_slave_threads= 70

wsrep_certification_rules=OPTIMIZED

wsrep_log_conflicts

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera

innodb_autoinc_lock_mode=2

# SST method

wsrep_sst_method=xtrabackup-v2

Below is the wsrep status on node3 (master):

+----------------------------------+----------------+
| Variable_name                    | Value          |
+----------------------------------+----------------+
| wsrep_flow_control_paused_ns     | 678859607091   |
| wsrep_flow_control_paused        | 0.040823       |
| wsrep_flow_control_sent          | 0              |
| wsrep_flow_control_recv          | 1352           |
| wsrep_flow_control_interval      | [ 1200, 1200 ] |
| wsrep_flow_control_interval_low  | 1200           |
| wsrep_flow_control_interval_high | 1200           |
| wsrep_flow_control_status        | OFF            |
+----------------------------------+----------------+
8 rows in set (0.00 sec)

+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_local_state_uuid     | 4ac77c71-4d37-11e8-a591-26cd1b1f321b |
| wsrep_local_commits        | 3677445                              |
| wsrep_local_cert_failures  | 0                                    |
| wsrep_local_replays        | 0                                    |
| wsrep_local_send_queue     | 0                                    |
| wsrep_local_send_queue_max | 866                                  |
| wsrep_local_send_queue_min | 0                                    |
| wsrep_local_send_queue_avg | 1.161227                             |
| wsrep_local_recv_queue     | 0                                    |
| wsrep_local_recv_queue_max | 2                                    |
| wsrep_local_recv_queue_min | 0                                    |
| wsrep_local_recv_queue_avg | 0.002228                             |
| wsrep_local_cached_downto  | 9412001488                           |
| wsrep_local_state          | 4                                    |
| wsrep_local_state_comment  | Synced                               |
| wsrep_local_bf_aborts      | 0                                    |
| wsrep_local_index          | 1                                    |
+----------------------------+--------------------------------------+
17 rows in set (0.01 sec)

+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Threadpool_idle_threads | 0     |
| Threadpool_threads      | 0     |
| Threads_cached          | 20    |
| Threads_connected       | 2951  |
| Threads_created         | 3032  |
| Threads_running         | 18    |
+-------------------------+-------+
6 rows in set (0.00 sec)

+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| wsrep_cert_deps_distance | 80.328236 |
+--------------------------+-----------+
1 row in set (0.00 sec)

Below is the wsrep status on node2 (slave):

+----------------------------------+----------------+
| Variable_name                    | Value          |
+----------------------------------+----------------+
| wsrep_flow_control_paused_ns     | 703714065012   |
| wsrep_flow_control_paused        | 0.040810       |
| wsrep_flow_control_sent          | 529            |
| wsrep_flow_control_recv          | 1352           |
| wsrep_flow_control_interval      | [ 1200, 1200 ] |
| wsrep_flow_control_interval_low  | 1200           |
| wsrep_flow_control_interval_high | 1200           |
| wsrep_flow_control_status        | OFF            |
+----------------------------------+----------------+
8 rows in set (0.00 sec)

+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_local_state_uuid     | 4ac77c71-4d37-11e8-a591-26cd1b1f321b |
| wsrep_local_commits        | 23492                                |
| wsrep_local_cert_failures  | 0                                    |
| wsrep_local_replays        | 0                                    |
| wsrep_local_send_queue     | 0                                    |
| wsrep_local_send_queue_max | 1                                    |
| wsrep_local_send_queue_min | 0                                    |
| wsrep_local_send_queue_avg | 0.000000                             |
| wsrep_local_recv_queue     | 0                                    |
| wsrep_local_recv_queue_max | 1231                                 |
| wsrep_local_recv_queue_min | 0                                    |
| wsrep_local_recv_queue_avg | 27.537734                            |
| wsrep_local_cached_downto  | 9412001742                           |
| wsrep_local_state          | 4                                    |
| wsrep_local_state_comment  | Synced                               |
| wsrep_local_bf_aborts      | 0                                    |
| wsrep_local_index          | 2                                    |
+----------------------------+--------------------------------------+
17 rows in set (0.00 sec)

+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Threadpool_idle_threads | 0     |
| Threadpool_threads      | 0     |
| Threads_cached          | 150   |
| Threads_connected       | 36    |
| Threads_created         | 749   |
| Threads_running         | 1     |
+-------------------------+-------+
6 rows in set (0.00 sec)

+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| wsrep_cert_deps_distance | 80.322232 |
+--------------------------+-----------+
1 row in set (0.00 sec)

Below is the wsrep status on node1 (slave):

+----------------------------------+----------------+
| Variable_name                    | Value          |
+----------------------------------+----------------+
| wsrep_flow_control_paused_ns     | 679410518302   |
| wsrep_flow_control_paused        | 0.040840       |
| wsrep_flow_control_sent          | 824            |
| wsrep_flow_control_recv          | 1358           |
| wsrep_flow_control_interval      | [ 1200, 1200 ] |
| wsrep_flow_control_interval_low  | 1200           |
| wsrep_flow_control_interval_high | 1200           |
| wsrep_flow_control_status        | OFF            |
| wsrep_flow_control_active        | false          |
| wsrep_flow_control_requested     | false          |
+----------------------------------+----------------+
10 rows in set (0.00 sec)

+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_local_state_uuid     | 4ac77c71-4d37-11e8-a591-26cd1b1f321b |
| wsrep_local_commits        | 0                                    |
| wsrep_local_cert_failures  | 0                                    |
| wsrep_local_replays        | 0                                    |
| wsrep_local_send_queue     | 0                                    |
| wsrep_local_send_queue_max | 1                                    |
| wsrep_local_send_queue_min | 0                                    |
| wsrep_local_send_queue_avg | 0.000000                             |
| wsrep_local_recv_queue     | 1191                                 |
| wsrep_local_recv_queue_max | 1228                                 |
| wsrep_local_recv_queue_min | 0                                    |
| wsrep_local_recv_queue_avg | 35.918619                            |
| wsrep_local_cached_downto  | 9412001742                           |
| wsrep_local_state          | 4                                    |
| wsrep_local_state_comment  | Synced                               |
| wsrep_local_bf_aborts      | 0                                    |
| wsrep_local_index          | 0                                    |
+----------------------------+--------------------------------------+
17 rows in set (0.00 sec)

+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Threadpool_idle_threads | 0     |
| Threadpool_threads      | 0     |
| Threads_cached          | 28    |
| Threads_connected       | 46    |
| Threads_created         | 74    |
| Threads_running         | 1     |
+-------------------------+-------+
6 rows in set (0.00 sec)

+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| wsrep_cert_deps_distance | 80.309898 |
+--------------------------+-----------+
1 row in set (0.00 sec)

Can someone help with what is going wrong here?


I have seen wsrep_slave_threads set too high cause problems. Check out this link: Setting Parallel Slave Threads — Galera Cluster Documentation. What does your SHOW PROCESSLIST look like when you are seeing the 1-minute response times?
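As a rough sizing aid (a common rule of thumb, not an official formula), wsrep_cert_deps_distance estimates how many write-sets can, on average, be applied in parallel, so it gives a ceiling for useful wsrep_slave_threads values; the variable is dynamic in PXC 5.7:

```sql
-- Check the average parallel-apply window, then size the applier threads.
-- A value of ~80 suggests anything much above 80 threads buys nothing.
SHOW GLOBAL STATUS LIKE 'wsrep_cert_deps_distance';
SET GLOBAL wsrep_slave_threads = 16;  -- takes effect for new write-sets
```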


Hi,
I tried setting wsrep_slave_threads to 8, 16, 32, 70, 80, and 120.

The same issue occurred every time.

Under heavy load, the process list shows over 4K connections, and all wsrep threads are writing.


Also, my master node has CPU usage of around 70% under heavy load, while the other two slave servers have a CPU load of 1% or 2%.


Are your cluster nodes in different locations? If so, look at network latency. Also check whether you are facing disk I/O issues. How does ProxySQL look under the heavy loads?


All nodes are in the same location; network latency is <1 ms.

We are not facing disk I/O issues.

ProxySQL looks normal; no weird behaviour in terms of CPU/RAM.


Does a difference in MySQL versions create issues?

Master node: mysql Ver 14.14 Distrib 5.7.30-33, for Linux (x86_64) using 6.2
Slave1: mysql Ver 14.14 Distrib 5.7.36-39, for Linux (x86_64) using 6.2
Slave2: mysql Ver 14.14 Distrib 5.7.24-26, for Linux (x86_64) using 6.2


To me, that is not best practice; all nodes should be on the same version. A couple more things: we have been focused on MySQL settings, yet we have not looked at the database itself. Check that all your tables have primary keys, and also review your slow query log.
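To verify the primary-key point across all the tenant databases at once, an information_schema query along these lines can help (standard MySQL 5.7 schema tables, nothing beyond that assumed):

```sql
-- List base tables that have no PRIMARY KEY constraint.
-- Tables without a PK are a known source of slow row-based replication.
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON c.table_schema = t.table_schema
      AND c.table_name   = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
  AND c.constraint_name IS NULL;
```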


We have primary keys on all tables; some tables have composite primary keys.

We have a multi-tenant application, and each tenant has its own database.

For the slow query log: is the Query_time listed in the logs measured before or after replication?


Before you do anything else, I would get all your servers on the same version. Then we can continue trying to solve the issue if it persists.


Query time in the slow log does not include replication. This is especially true for SELECTs, which don't replicate. Since you have the slow query log enabled, what do the stats say about where the time is being spent? Do you have each node monitored by PMM? When your primary node is at 70%, what do you see with regard to disk and network? Is all 70% of that consumed by MySQL, or do you have a lot of I/O wait?
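One way to quantify how much of a given window was lost to flow-control stalls is to diff wsrep_flow_control_paused_ns (a monotonic nanosecond counter) between two samples, rather than relying on wsrep_flow_control_paused, which is averaged since the last FLUSH STATUS. A minimal sketch; the sample values and 60 s interval are illustrative:

```python
# Estimate the recent flow-control pause ratio from two samples of the
# Galera status counter wsrep_flow_control_paused_ns.

def flow_control_paused_fraction(ns_before: int, ns_after: int,
                                 interval_seconds: float) -> float:
    """Fraction of the sampling interval the node spent paused.

    The counter accumulates nanoseconds spent paused since the node
    joined the cluster, so the delta between two samples divided by the
    wall-clock interval gives the pause ratio for just that window.
    """
    paused_ns = ns_after - ns_before
    return paused_ns / (interval_seconds * 1e9)

# e.g. two samples taken 60 s apart, counter grew by 3 s of paused time
frac = flow_control_paused_fraction(678_859_607_091, 681_859_607_091, 60)
print(frac)  # -> 0.05, i.e. 5% of the window was spent paused
```

A ratio consistently above a few percent during the slow periods would point at the readers throttling the writer.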


We installed PMM a couple of hours ago.

As we speak, we are seeing slow write queries; attached is a screenshot from PMM.

Let me know if you need anything else; I am happy to provide it.


@matthewb The slow log mostly contains INSERT/UPDATE/DELETE statements; examples are below:

# Time: 2022-04-22T21:19:15.052186Z
# User@Host: XXXXXXX @ [10.32.14.19]  Id: 23239
# Schema: XXXXXX  Last_errno: 0  Killed: 0
# Query_time: 10.107317  Lock_time: 0.000171  Rows_sent: 0  Rows_examined: 1  Rows_affected: 1
# Bytes_sent: 52
use XXXXX;
SET timestamp=1650662355;
UPDATE emails SET emailMode = 'opened' WHERE id='2540993';

# Time: 2022-04-22T21:19:15.052269Z
# User@Host: XXXXXX @ [10.32.14.19]  Id: 22299
# Schema: XXXXXX  Last_errno: 0  Killed: 0
# Query_time: 10.020400  Lock_time: 0.000307  Rows_sent: 0  Rows_examined: 0  Rows_affected: 1
# Bytes_sent: 19
use XXXXX;
SET timestamp=1650662355;
INSERT INTO microsoft_synced_email_changes (id,hostname,accountId,owner,emailId,event,processingStatus,retryCount,enqueuedAt) VALUES (DEFAULT,'XXXXX','XXXXXXX',25,'XXXXXXXX','Updated','InQueue',0,1650662345);

# Time: 2022-04-22T21:19:15.052395Z
# User@Host: XXXXXX @ [10.32.0.23]  Id: 23240
# Schema: XXXXXX  Last_errno: 1265  Killed: 0
# Query_time: 10.001954  Lock_time: 0.000049  Rows_sent: 0  Rows_examined: 1  Rows_affected: 1
# Bytes_sent: 52
use XXXXXX;
SET timestamp=1650662355;
UPDATE emails SET emailMode = 'sending' WHERE id = 7406615;

There are a lot of such queries. The interesting thing is that even a query with a WHERE condition on the primary key is taking 10 s or more.

One observation from my side:

What I have observed is that during SST, when only one node is Synced/Primary and the other two are in Donor and Joiner states, I don't see slow-log entries for INSERT/UPDATE/DELETE statements, i.e. when the certification process is not happening between nodes.


That makes perfect sense. If there’s only 1 node in the cluster, then there’s nothing to replicate and thus things process very fast.

When you took that PMM screenshot, were you seeing slow writes at that time? What about flow control during these slow events?


Hi @matthewb

Here is the PMM screenshot of the Galera Node Compare dashboard from a period when we see this behaviour. At 11:31 AM we saw latency of more than 10 s on write queries.

What I see is that when the replication queue goes high, it creates a problem.

FYI:
We have upgraded all our MySQL nodes to 5.7.36-39-57.

Also, as suggested in an earlier comment, we set wsrep_slave_threads to 16 on all the servers.

We have changed the servers to GCP n2-highmem-80 (80 cores, 629 GB RAM) and changed the disks to Extreme Persistent Disks with 10,000 dedicated IOPS.

Below is the screenshot of Galera Node Compare

Below is the screenshot of InnoDB Details for the master node, on which writes are happening.

Let me know if you need any other details.


Yes, this is known as "flow control." Managing Flow Control — Galera Cluster Documentation
I can see on the graphs that flow control is kicking in every now and then. You might want to adjust gcs.fc_limit to something higher than the default. Try 300 or 400, and set gcs.fc_factor=0.9.
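For what it's worth, both settings can be changed at runtime through wsrep_provider_options, without a restart; the change is lost on restart unless it is also persisted in my.cnf:

```sql
-- Applied at runtime; mirror the values in wsrep_provider_options in
-- my.cnf so they survive a restart.
SET GLOBAL wsrep_provider_options = 'gcs.fc_limit=400; gcs.fc_factor=0.9';
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_interval';  -- verify the new window
```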

I'm also seeing a lot of InnoDB transaction history. This is usually caused by long-running queries. Make sure your queries are as short as possible and don't hold transactions open. I would also recommend innodb_flush_log_at_trx_commit=0.

Beyond this, you might benefit from a MySQL Health Audit that can go into more detail.


@matthewb Let me set the parameters as you suggested and see whether performance improves.

Will increasing wsrep_slave_threads help?


@matthewb Flow control is already at a higher value, as below:

wsrep_flow_control_interval | [ 624, 693 ]
wsrep_flow_control_interval_low | 624
wsrep_flow_control_interval_high | 693

And I have already signed an Advanced Support contract with Percona today.


@matthewb After setting these parameters, I am still seeing the issue.


What does PMM show on one of the reader nodes during these scenarios? Pull up CPU/disk during these events. Something is causing those nodes to lag behind, which causes them to issue flow-control messages, which in turn causes the writer to slow down.
