Glad you gave it a whirl to start with; looks like you got something going so that’s good. =)
I would probably start by just running the check once a minute, which should be adequate. The only case this will miss is a server that goes from healthy to broken in under a minute, which is definitely possible. But to start with, I would try something like the below, which you would call from cron once a minute. Note that I just put this together, so test it yourself and make sure you are comfortable with it before using it:
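#!/bin/bash
# Log the MySQL processlist every 10 seconds for about 5 minutes once the thread count passes 300.
# Replace user/password with your own credentials.
THREADS=$(mysqladmin --user=user --password=password status | awk '{print $4}')
# ${THREADS:-0} keeps the comparison valid if mysqladmin fails and prints nothing.
if [ "${THREADS:-0}" -gt 300 ] && [ ! -e /tmp/mysql_proc_check.flg ]
then
    # The flag file stops a second copy from starting while this loop is running.
    touch /tmp/mysql_proc_check.flg
    for i in {1..30}
    do
        echo "" >> /tmp/processlist.log
        date >> /tmp/processlist.log
        mysqladmin --user=user --password=password processlist >> /tmp/processlist.log
        echo "" >> /tmp/processlist.log
        sleep 10
    done
    rm /tmp/mysql_proc_check.flg
fi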
In theory this checks whether there are more than 300 threads (connections), and if so, it loops 30 times at 10-second intervals, or roughly 5 minutes in total. Each iteration appends the processlist to a file along with the date/time so you can track it more easily. Note I added a “flag” file as well, which should prevent cron from starting several copies at once, as that would make things a lot worse. Which brings up the point that checking the processlist does add load, so do be careful with this.
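One weak spot with the flag: if the script gets killed partway through the loop, the flag file is left behind and the check will never fire again until you delete it by hand. Here is a rough sketch of one way around that, using a lock directory (mkdir is atomic, so two copies cannot both get it) plus a trap that cleans up on exit. The lock path and the function names are just ones I made up for illustration, and I have not run this against a live server, so treat it as a starting point only:

```shell
#!/bin/bash
# Hypothetical lock location; override by setting LOCK_DIR before running.
LOCK_DIR="${LOCK_DIR:-/tmp/mysql_proc_check.lock}"

# Succeeds only for the first caller, because mkdir fails if the directory exists.
acquire_lock() {
    mkdir "$LOCK_DIR" 2>/dev/null
}

# Removes the lock; stays quiet if it is already gone.
release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null || true
}

if acquire_lock; then
    # Clean up the lock whenever the script exits, including after an error.
    trap release_lock EXIT
    # The processlist logging loop would go here.
    :
fi
```

You would still keep the thread-count test in front of this. For cron, a line such as "* * * * * /path/to/mysql_proc_check.sh" (the path is a placeholder) runs it once a minute. A kill -9 skips the trap, so a stale lock is still possible in that one case.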