I’m working with the pmp-check-aws-rds.py plugin (1.1.8) on Nagios Core (4.3.2), and over the last month we have been receiving an intermittent service alert: “(Service check timed out after 60.01 seconds)”. It happens on all of our RDS instances (60 instances across 2 regions, MySQL 5.7.17).
A quick investigation shows that those timeouts are being counted as “connection errors” on every RDS instance (SELECT * FROM performance_schema.host_cache).
Nothing was changed on the Nagios server, and I can’t understand why it sometimes works and sometimes doesn’t. Roughly 1 in every 10 checks fails.
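To put a number on the “1 in 10” figure outside of Nagios, a small timing harness could help. The following is only a sketch: the helper names sample_check and timeout_rate are made up for illustration, and the command to run is whatever check command line your Nagios command definition uses (not reproduced here, since it depends on your setup). The 60-second default simply mirrors the timeout in the alert text.

```python
import subprocess
import sys
import time


def sample_check(cmd, runs=10, timeout=60.0):
    """Run `cmd` repeatedly and record how long each run took.

    Returns a list of (seconds, outcome) tuples, where outcome is either
    "exit <code>" or "timeout". `cmd` is a list such as ["prog", "arg"].
    """
    results = []
    for _ in range(runs):
        start = time.monotonic()
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                timeout=timeout,  # child is killed if it runs longer
            )
            outcome = "exit %d" % proc.returncode
        except subprocess.TimeoutExpired:
            outcome = "timeout"
        results.append((time.monotonic() - start, outcome))
    return results


def timeout_rate(results):
    """Fraction of runs that ended in a timeout (0.0 for no runs)."""
    if not results:
        return 0.0
    timed_out = sum(1 for _, outcome in results if outcome == "timeout")
    return timed_out / len(results)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to sample.
    samples = sample_check(sys.argv[1:])
    for seconds, outcome in samples:
        print("%7.2fs  %s" % (seconds, outcome))
    print("timeout rate: %.0f%%" % (100 * timeout_rate(samples)))
```

Running it as the nagios user from the Nagios server, with the same command line Nagios uses, should show whether the slow runs also occur when the plugin is invoked by hand or only under the scheduler.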
Thank you for these reports. Just to let you know, we are opening a review report on our JIRA system for this so that the team can review it properly. I will update when I have more information.
Hello posters, I have an update from our team. If you are able to help further with this issue that would be FANTASTIC… It’s been logged as a bug but we could do with some more info to track it down:
Tested with pmp-check-aws-rds version 1.1.8, but did not see any timeouts.
It may be related to the specific Nagios configuration for the RDS service and its command definition.