Extreme bottleneck on MySQL

Hi,

I have been searching for a solution to this problem for months. Although there is nearly zero load on the server and no active domains, requesting just a few pages slows the server to a crawl. I can even bring it down simply by using Apache's ab.exe to simulate a few concurrent connections. Within seconds of starting a benchmark with only 10 concurrent requests, I see a "server is having capacity issues" error in my browser.
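The command I run is roughly like this (the URL is just a placeholder here, and the total request count is only an example):

ab -n 100 -c 10 http://www.example.com/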

When I check the Plesk health monitor, I see that MySQL is causing very high CPU load and using a lot of RAM. I sometimes see this even when the server is completely idle.

I decided to replace MySQL with Percona Server, but literally nothing changed. Even my Plesk instance was extremely slow because of this. I applied every update I could and used the Percona configuration wizard, but everything stayed the same. After several months and many Plesk updates, my Plesk instance now seems to work very well, although I haven't changed anything since then. But the bottleneck on my website is still there.

After a bit more searching I gave the Percona configuration wizard another try, and it seems to have solved the CPU load issue to some extent. (It was going over 50% CPU before; now it's 5-15% when idle, which I think is still high.) But it is still using a lot of memory.

Here is my config file: [url]https://tools.percona.com/configuration/IFQanJBW[/url]

By the way, I compared it with the previous .cnf file and couldn't see any difference that would explain the improved CPU usage(!?)

Server config:
Intel Xeon E5-2620 CPU
4GB RAM
40GB SSD disk space.
CentOS 6.9
Plesk Onyx 17.5.3

SHOW PROCESSLIST usually shows nothing but the admin shell session, which is the one actually running the "show processlist" command; as I said, there is zero load. From time to time I just see this:


| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+----+-------------+-----------+-------------+---------+------+--------------------+------------------------------------------------------------------------------------------------------+-----------+---------------+
| 27 | admin | localhost | NULL | Query | 0 | init | show processlist | 0 | 0 |
| 38 | b2c_db | localhost | b2c_db | Query | 1 | logging slow query | SELECT cscart_settings_sections.name, cscart_settings_sections.section_id, cscart_settings_sections. | 117 | 234 |
+----+-------------+-----------+-------------+---------+------+--------------------+------------------------------------------------------------------------------------------------------+-----------+---------------+
2 rows in set (0.01 sec)

And here is the last portion of the slow query log which, I think, does not actually contain very slow queries:


DELETE FROM cscart_access_restriction_block WHERE expires < 1520090991;
# User@Host: b2c_db[b2c_db] @ localhost [] Id: 675
# Schema: b2c_db Last_errno: 0 Killed: 0
# Query_time: 0.000523 Lock_time: 0.000132 Rows_sent: 1 Rows_examined: 126 Rows_affected: 0
# Bytes_sent: 2018
SET timestamp=1520090991;
SELECT * FROM cscart_bm_locations as l LEFT JOIN cscart_bm_locations_descriptions as d ON d.location_id = l.location_id AND d.lang_code = 'tr' WHERE l.layout_id = 6 AND l.is_default = 1 ORDER BY l.is_default desc;
# User@Host: b2c_db[b2c_db] @ localhost [] Id: 675
# Schema: b2c_db Last_errno: 0 Killed: 0
# Query_time: 0.000479 Lock_time: 0.000243 Rows_sent: 1 Rows_examined: 13 Rows_affected: 0
# Bytes_sent: 117
SET timestamp=1520090991;
SELECT payment_id FROM cscart_payments INNER JOIN cscart_ult_objects_sharing ON (cscart_ult_objects_sharing.share_object_id = cscart_payments.payment_id AND cscart_ult_objects_sharing.share_company_id = 1 AND cscart_ult_objects_sharing.share_object_type = 'payments') WHERE template = 'views/orders/components/payments/cc_sanal_pos_modulleri.tpl';
# User@Host: b2c_db[b2c_db] @ localhost [] Id: 672
# Schema: b2c_db Last_errno: 0 Killed: 0
# Query_time: 0.011445 Lock_time: 0.000399 Rows_sent: 65 Rows_examined: 1099 Rows_affected: 0
# Bytes_sent: 36787
SET timestamp=1520090991;
SELECT cscart_bm_snapping.grid_id as grid_id,cscart_bm_snapping.block_id as block_id,IFNULL(dynamic_object_content.content, default_content.content) as content,IFNULL(dynamic_object_content.object_id, default_content.object_id) AS object_id,IFNULL(dynamic_object_content.object_type, default_content.object_type) AS object_type,cscart_bm_block_statuses.object_ids as object_ids,cscart_bm_snapping.*,cscart_bm_blocks.*,cscart_bm_blocks_descriptions.* FROM cscart_bm_snapping LEFT JOIN cscart_bm_blocks ON cscart_bm_blocks.block_id = cscart_bm_snapping.block_id LEFT JOIN cscart_bm_block_statuses ON cscart_bm_snapping.snapping_id = cscart_bm_block_statuses.snapping_id AND cscart_bm_block_statuses.object_type LIKE '' LEFT JOIN cscart_bm_blocks_descriptions ON cscart_bm_blocks.block_id = cscart_bm_blocks_descriptions.block_id LEFT JOIN cscart_bm_blocks_content AS default_content ON cscart_bm_blocks.block_id = default_content.block_id AND cscart_bm_blocks_descriptions.lang_code = default_content.lang_code AND default_content.snapping_id = 0 AND default_content.object_id = 0 AND default_content.object_type like '' LEFT JOIN cscart_bm_blocks_content AS dynamic_object_content ON cscart_bm_blocks.block_id = dynamic_object_content.block_id AND cscart_bm_blocks_descriptions.lang_code = dynamic_object_content.lang_code AND dynamic_object_content.object_id = 0 AND dynamic_object_content.object_type like '' WHERE cscart_bm_snapping.grid_id IN (416, 417, 418, 422, 423, 424, 419, 420, 421, 427, 428, 429, 430, 431, 433, 434, 432, 863, 442, 443, 444, 445, 446, 447, 452, 775, 823, 782, 785, 788, 861, 825, 819, 821, 834, 836, 839, 840, 842, 848, 850, 865, 858, 438, 453, 774, 824, 789, 804, 826, 818, 822, 835, 837, 841, 843, 849, 851, 881, 439, 882, 787, 820, 853, 440, 783, 436, 859, 790, 854, 773, 855, 816, 856, 838, 857, 827, 852, 845, 441, 448, 449, 450, 451) AND cscart_bm_blocks_descriptions.lang_code='tr' AND cscart_bm_blocks.company_id = 1 ORDER BY cscart_bm_snapping.order, cscart_bm_snapping.block_id;
# User@Host: b2c_db[b2c_db] @ localhost [] Id: 676
# Schema: b2c_db Last_errno: 0 Killed: 0
# Query_time: 0.000292 Lock_time: 0.000118 Rows_sent: 0 Rows_examined: 7 Rows_affected: 0
# Bytes_sent: 195
SET timestamp=1520090991;
SELECT a.item_id, b.reason FROM cscart_access_restriction as a LEFT JOIN cscart_access_restriction_reason_descriptions as b ON a.item_id = b.item_id AND a.type = b.type AND lang_code = 'tr' WHERE (ip_from <= 530270013 AND ip_to >= 530270013) AND a.type IN ('ips', 'ipr', 'ipb') AND status = 'A';
# User@Host: b2c_db[b2c_db] @ localhost [] Id: 676
# Schema: b2c_db Last_errno: 0 Killed: 0
# Query_time: 0.000390 Lock_time: 0.000143 Rows_sent: 4 Rows_examined: 76 Rows_affected: 0
# Bytes_sent: 519
SET timestamp=1520090991;
SELECT * FROM cscart_logos WHERE IF(layout_id = 0, 1, IF(layout_id = 6, 1, 0)) AND IF(style_id = '', 1, IF(style_id = 'Flame-Elmacik', 1, 0)) AND company_id = 1;
# User@Host: b2c_db[b2c_db] @ localhost [] Id: 676
# Schema: b2c_db Last_errno: 0 Killed: 0
# Query_time: 0.000469 Lock_time: 0.000160 Rows_sent: 1 Rows_examined: 117 Rows_affected: 0
# Bytes_sent: 664
SET timestamp=1520090991;
SELECT * FROM cscart_settings_sections WHERE name = 'seo' AND type = 'ADDON' AND (FIND_IN_SET('ROOT', cscart_settings_sections.edition_type) OR FIND_IN_SET('VENDOR', cscart_settings_sections.edition_type) OR FIND_IN_SET('ULT:ROOT', cscart_settings_sections.edition_type) OR FIND_IN_SET('ULT:VENDOR', cscart_settings_sections.edition_type));
# User@Host: b2c_db[b2c_db] @ localhost [] Id: 677
# Schema: b2c_db Last_errno: 0 Killed: 0
# Query_time: 0.000419 Lock_time: 0.000208 Rows_sent: 1 Rows_examined: 3 Rows_affected: 0
# Bytes_sent: 467
SET timestamp=1520090991;
SELECT cscart_languages.* FROM cscart_languages INNER JOIN cscart_ult_objects_sharing ON (cscart_ult_objects_sharing.share_object_id = cscart_languages.lang_id AND cscart_ult_objects_sharing.share_company_id = 1 AND cscart_ult_objects_sharing.share_object_type = 'languages') WHERE 1 AND cscart_languages.status = 'A';
# Time: 180303 18:29:52
# User@Host: b2c_db[b2c_db] @ localhost [] Id: 677
# Schema: b2c_db Last_errno: 0 Killed: 0
# Query_time: 0.000523 Lock_time: 0.000241 Rows_sent: 1 Rows_examined: 7 Rows_affected: 0
# Bytes_sent: 985
SET timestamp=1520090992;
SELECT a.*, b.description FROM cscart_currencies as a LEFT JOIN cscart_currency_descriptions as b ON a.currency_code = b.currency_code AND lang_code = 'tr' INNER JOIN cscart_ult_objects_sharing ON (cscart_ult_objects_sharing.share_object_id = a.currency_id AND cscart_ult_objects_sharing.share_company_id = 1 AND cscart_ult_objects_sharing.share_object_type = 'currencies') WHERE 1 AND status IN ('A', 'H') ORDER BY a.position;

Any opinions that might help would be great.
Thanks in advance.

Hi pisisler;

That does seem strange. A few questions:

  1. Is this host shared and/or a VM? Or is it a physical dedicated host?
  2. What all is running on the host? Plesk, Apache, MySQL, more?
  3. What type of disk are you running MySQL on? spinning disk? SSD?

I would be interested in some further diagnostic info to see what your system looks like. I would look at the following (a combined sketch of these commands follows the list):

  1. sar (to see CPU and IOWAIT mainly)
  2. free -m (to see memory and swap usage)
  3. iotop (to see what process(es) are doing I/O-wise)
  4. df -h (to see if you’re low on disk space)
  5. top (sorted by memory usage, to see what else besides MySQL is using memory and how much)
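For reference, one possible pass over all of these from a root shell looks roughly like this (the flags are common sysstat/procps/iotop options; the ps line is an alternative to sorting inside top, which may be handy if your top does not support sorting by a field from the command line):

sar -u 5 3                       # CPU utilization, including %iowait (3 samples, 5 seconds apart)
free -m                          # memory and swap usage in MB
iotop -obn3                      # only processes actually doing I/O, batch mode, 3 iterations
df -h                            # disk space per filesystem
ps aux --sort=-%mem | head -15   # top memory consumers, highest first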

The main thing that stands out is your memory situation. 4G is not a lot of memory, and you have 2G of that going to the buffer pool, so I could see you running out of memory and swapping pretty quickly if anything else is using up much memory. Easiest thing to try first would be bumping your innodb_buffer_pool_size down to 1G to see if that helps. If it does, then you know right away that it’s a memory problem.
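A rough sketch of that change, assuming the setting lives in the [mysqld] section of /etc/my.cnf (the exact path can differ on a Plesk box), followed by a MySQL restart:

[mysqld]
innodb_buffer_pool_size = 1G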

Hi Scott.

1- This is a VM that is used only by me.
2- A standard Plesk instance, with Apache, nginx and Percona; nothing more than what Plesk ships with.
3- It’s SSD.

Actually my buffer pool size was already 1G; I switched it to 2G to see if that would help. 2G is also what the Percona config wizard suggested. I understand that 4G of RAM is not huge, but in my opinion it should easily be enough to host a couple of hundred visitors, and I have zero. Most of the time there is literally zero load on the server.

I attached the results of the commands you asked for.

[attached screenshot with the output of those commands]

Hi pisisler;

A few things stand out:

  1. Your sar output shows 7-30% iowait when the CPU spikes up (i.e. when work is actually being done)
  2. Your sar output only shows 2 CPUs
  3. Your free -m output shows some swapping

All of this leads me to believe that you are using a VM on a shared host, and that the underlying resources are not what you think they are. It looks like you do have the 4G of RAM, but only 2 cores, and probably a heavily over-saturated SSD. These are likely the underlying cause of the poor performance, and this is very common with shared VPS services (i.e. any VPS service that is fairly inexpensive).

So while you could further limit the resources MySQL uses (limiting memory and disk usage, disabling binary logging, etc.), that would likely make it unusable for any realistic project; a rough sketch of the kind of settings I mean is below. If it were me, I would try another VPS to see if you get better performance. You could even test it out on your local machine, if you have a decent desktop computer, just to see how it would perform with more resources.
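Purely as a sketch of the kind of knobs I mean (these go in the [mysqld] section of my.cnf; the values are examples, not recommendations):

innodb_buffer_pool_size = 512M        # cut InnoDB's memory footprint further
innodb_flush_log_at_trx_commit = 2    # fewer fsyncs per commit, at the cost of losing up to ~1 second of transactions on a crash
# and comment out any log-bin line if you do not need binary logging (replication / point-in-time recovery)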

Thank you for your comment.

I pay $35 a month for this service, which they call a VDS. Let's say it is shared and they are hiding that from me (yes, it's quite possible), but why would MySQL spike the CPU and the memory while it has literally zero load? I am checking the stats both from the server itself and from Google Analytics. Most of the time I am the only one connected to the server. If it's really a resource issue, how can it be loaded nearly to death while it's completely idle? Are you saying that if it's shared, the load is coming from other users? Isn't that against the logic of virtualization? I am lost here…

I intend to try another provider, but since I will mostly be testing it while idle too, I guess I won't be sure about the real cause even if it goes well. I won't really be testing running websites, because the websites will be idle there too; as you saw, the "SHOW PROCESSLIST" command shows nothing but the admin shell.

With this system, I am not able to accommodate even 10 concurrent users.

Hi pisisler;

My hunch is that it's not an issue with what you are doing; it's an issue with the underlying host system (the one hosting your VM). Meaning that even the slightest load in your VM causes issues because the host system's resources are overtaxed (basically, some companies oversell their resources to make more money).

Using the SSD as an example: imagine that you are sharing this SSD with 100 other people (each with their own VM), and those 100 people are already using 99.9% of the SSD's capability. So when you go to do anything in your VM on that same SSD, you end up having to wait because the SSD is already almost fully utilized by other people. This is an oversimplification, but that is basically my thinking. It is a common problem with entry-level VPS providers, because providing a high-quality VPS that performs well is expensive, so they oversell the resources to keep the price more affordable (and to make more money).

Yes, Scott, I know that happens all the time. But I didn't know that virtualization software like VMware would allocate resources this sloppily. If a portion of a resource is dedicated to someone, it should stay dedicated, and the VM software is responsible for that. What I believed was that when I run commands like "top", it shows my own resource usage statistics. For example, say the CPU is divided into 10 parts and I have one of them; shouldn't the top command show me the percentage of that one part only? And you are saying that the portion dedicated to me is so very low that it spikes even with slight usage…
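From what I have read, the only hint top gives about this from inside the VM is the "st" (steal) value on the Cpu(s) line, i.e. the share of time the hypervisor gave my virtual CPU to somebody else; sar reports the same thing as %steal. Something roughly like this should show it:

sar -u 1 5    # watch the %steal column; consistently high values would point at a busy host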

But isn’t the total disk write so much when it’s idle? I think this has nothing to do with the sharing of the resources. Because it’s giving me my own disk write stats. So would really MySQL write 500k/s when it’s idle?

It is hard to change service providers now, as I have already paid for a year in advance. This is my 5th year with them and it's the first time I am having this kind of issue, so I still have some doubts about the real cause. Maybe I will try a fresh installation on the same machine.

Thank you for your time Scott, I really appreciate it.