Hello
It took me a few days to check some things, fix the queries, rewrite the scripts, and so on. First, I’d like to thank @matthewb and @Ankit_Kapoor1 for your input. I’ve taken care of the issues you pointed out: I’ve changed the InnoDB buffer pool size and replaced the queries that were doing full table scans, ORDER BY RAND()/GROUP BY, etc.
I’ve also paid more attention to the article about unused servers showing high load because the monitoring software was hitting the same timestamps as the load monitor in the kernel. This made me think that tiny bumps in the number of processes can have a huge impact on the result. I’ve created a new graph (I’ll use PMM the next time I have a problem, I swear) and then compared the load graph with the number of cron tasks.
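A minimal Python sketch of that effect, under simplifying assumptions: the load figure is modelled as an exponentially damped average of the runnable-process count, sampled every 5 seconds over a 60-second window. The `burst` workload, its numbers, and the function names are hypothetical and only illustrate the idea; they are not measurements from this server, and the real kernel sampling differs in detail.

```python
import math

def sampled_load(runnable_at, duration_s, sample_interval_s=5.0,
                 window_s=60.0, offset_s=0.0):
    """Exponentially damped average of the runnable count, sampled periodically.

    Simplified model of a 1-minute load average; not the kernel's exact code.
    """
    decay = math.exp(-sample_interval_s / window_s)
    load = 0.0
    t = offset_s
    while t < duration_s:
        # Each sample only sees the runnable count at that exact instant.
        load = load * decay + runnable_at(t) * (1.0 - decay)
        t += sample_interval_s
    return load

def burst(t, period_s=5.0, burst_len_s=0.5, tasks=10):
    """Hypothetical workload: `tasks` processes run for 0.5 s every 5 s."""
    return tasks if (t % period_s) < burst_len_s else 0

# Sampling in lockstep with the bursts vs. sampling between them.
aligned = sampled_load(burst, 600, offset_s=0.0)
shifted = sampled_load(burst, 600, offset_s=2.5)

# Time-averaged number of runnable processes for the same workload.
true_mean = 10 * 0.5 / 5.0

print(f"aligned sampling: {aligned:.2f}")
print(f"shifted sampling: {shifted:.2f}")
print(f"true mean:        {true_mean:.2f}")
```

The same workload yields very different load figures depending only on whether the sampling instants coincide with the short bursts, which is why spreading or merging periodic jobs can change the reported load far more than the actual CPU usage changes.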
I’ve merged the bash scripts into single ones (where possible) and moved most of the php-cgi requests to the proper poller scripts.
I deployed the query and cron request changes right after 16:00, and then some more cron request fixes before 22:00.
Below is the graph of the load (the upper part) and the number of tasks being executed by cron.
