Help Troubleshooting a behaivor with max connections and Max treads.

albeda1 · June 10, 2020, 1:42am

Hi People, It’s my first question here, after looking in a lot of places before. Any help will be welcome!
Percona MySQL 5.7.24-27 running over a private on premise vmware vm, with external HP SAS 15k rpm disks, with 32 cores and 32 GB of ram. Only MySql running here, no other app nor softwares. Linux CentOS 7
MySQL Conf
max_allowed_packet=4M
innodb_log_file_size=2Ginnodb_flush_log_at_trx_commit=1innodb_flush_method=O_DIRECTmax_connections = 950 innodb_buffer_pool_size=15589934592 innodb_buffer_pool_chunk_size=2073741824innodb_buffer_pool_instances=20
thread_handling=pool-of-threads

I cannot solve a behavior I see when a high load comes from the app server. By the way, during this high load period, the app server become very unstable and very slow. Impossible to work with because the DB is not returning data. The worst part, during that period, the DB server answers flawesly using the cli.
So, after lot of test… I do not understand why when my max connection limit is at near 1000 connections, MySql Stops at near 250 concurrents, and no more. (And I’m sure there are a lot more wating… )
So when this limit is reached, the app server stop responding…
Disk seams normal… Lot of IOPs available… processor is not idle, but lot of capacity is available during this high load period. Attaching a graph to be more precise. You can see it stops at 280 connections and cannot climb to max.
I’m really running nuts…

vaibhav_upadhyay40 · June 11, 2020, 5:58pm

share the processlist and CPU utilisation during peak/incident
commands:
mysql> select * from information_schema.processlist order by time desc;
shell> sar or graph (select few hours before and after peak/incident)

2) Share the app server resource utilisation during the peak.
CPU and Memory at least. Good to have iops also

3) share below mysql command output
mysql> show engine innodb status \G

5) any error msg on app side while connecting to DB during peak/issue? please check. There can be a bottleneck on app side as well (probability)

6) share below output
app –> shell> ulimit -a
db –> shell> ulimit -a
db—> mysql> show global variables;

PS: nothing to with above issue.
*round of the innodb_buffer_pool_size if possible.
*For best efficiency, specify a combination of innodb_buffer_pool_instances and innodb_buffer_pool_size so that each buffer pool instance is at least 1GB. For e.g. if buffer pool is 15GB then innodb_buffer_pool_instances should not be > 15.

albeda1 · June 12, 2020, 12:20am

Thanks a lot for your time vaibhav_upadhyay40, i really appreciate it.

1 and 2 are not easy to answer, cause I have not recorded that info at that time, and I cannot reproduce it. Let me explain better.
I’m running a Moodle site, with 15000 students. The site goes well with lots of students using the plattform, 1000, 1500 logged in-no problem… but the problem starts in an exam, when near 2000 needs to access to the app server at the same time. so, it hangs after login. The error is the typical 503, cause the SQL cannot answer fast enough, and the appserver ends the connections cause timeout.
In front of the MySql server, we have NGINX+Varnish acting as SSL offload server and reverse proxy/cache. in another server, runs the apache with php-fpm and moodle… and the last its the Database. all over a cluster of 3 Vmware baremetal hosts, with plenty of capacty.
The 3 vm uses 100 cores and 100gb of ram. The Storage is a HP 2040 SAS one, with SSD discs in RAID1.

I cannot reproduce the behaivor cause it’s a live environment, and when hangs, lot of students wants to kill me… so… for now, we are taking smaller exams, with less students… until we found some kind of solution to test.

So, for 1 and 2 question, I have no historic data to show, but trust me… about processlist,CPU utilisation, memory, IOPS and Internet Bandwidth, was pretty free, not near 20%… this is why I’m surprised with the graph showed in my first post, showing a clear limit in Connections. I’m also behind a problem in the APP server, but everithing seems normal…

About 3 and 6, I’m uploading in two files the results. About Ulimit, I think they are enough…
prlimit --pid 11943 (the values are the same for all the app involved nginx, apache, varnish, etc, this is from the MYSQL)RESOURCE DESCRIPTION SOFT HARD UNITSAS address space limit unlimited unlimited bytesCORE max core file size 0 unlimited blocksCPU CPU time unlimited unlimited secondsDATA max data size unlimited unlimited bytesFSIZE max file size unlimited unlimited blocksLOCKS max number of file locks held unlimited unlimitedMEMLOCK max locked-in-memory address space 65536 65536 bytesMSGQUEUE max bytes in POSIX mqueues 819200 819200 bytesNICE max nice prio allowed to raise 0 0NOFILE max number of open files 100240 150240NPROC max number of processes 127946 127946RSS max resident set size unlimited unlimited pagesRTPRIO max real-time priority 0 0RTTIME timeout for real-time tasks unlimited unlimited microsecsSIGPENDING max number of pending signals 127946 127946STACK max stack size 8388608 unlimited bytes

So… any advice or trick… will be welcome… cause this is giving me nuts…

albeda1 · June 12, 2020, 12:20am

By the way, I also changed in the three server

echo 2000 65000 > /proc/sys/net/ipv4/ip_local_port_range

who knows… this is a change I did after the last problem, did not got the chance to test in the live system.

vaibhav_upadhyay40 · June 12, 2020, 6:56am

MySQL
“cause the SQL cannot answer fast enough, and the appserver ends the connections cause timeout”

If slow-query-log is enabled, please check the slow queries and identify the which query(s) is not fast enough.
If not then i suggest you to enable it. set the log-query-time as per your expected/standard SLA for the database response.
Since you are using PMM for monitoring you can refer to “MySQL Overview” dashboard –> graph name “MySQL Slow Queries”

php-fpm
also check the php-fpm logs, verify the max children , max request are enough and efficient.
Isolate which call/code is experiencing slowness or timeout during the incident.

Nginx
check the logs for any errors/warnings, verify the worker_process and worker_connections are efficiently tuned.

Topic		Replies	Views
100% Connections Usage Problem Other MySQL® Questions	5	649	September 5, 2007
Advice pls my.cnf settings for my system Other MySQL® Questions	2	713	October 6, 2007
Slow site during peak times... help! Other MySQL® Questions	13	722	October 17, 2007
Newbie, Dont know where to start.. require ur assistance Other MySQL® Questions	3	418	April 13, 2007
Max Concurrent Connections Other MySQL® Questions	4	2385	September 10, 2008

Help Troubleshooting a behaivor with max connections and Max treads.

Related topics