Hi,
I’m working with the pmp-check-aws-rds.py plugin (1.1.8) on Nagios Core (4.3.2), and over the last month we have been receiving, from time to time, the service alert “(Service check timed out after 60.01 seconds)”. It happens on all RDS instances (60 instances across 2 regions, MySQL 5.7.17).
Quick investigration shows that those timeouts are being counted as “connection errors” on every RDS instance (SELECT * FROM performance_schema.host_cache).
Nothing was changed on the Nagios server, and I can’t understand why it sometimes works and sometimes doesn’t: roughly 1 in every 10 checks fails.
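To narrow down the intermittent failures, one option would be to time the plugin outside Nagios and log how long each run takes. Below is a minimal sketch of that idea, not something taken from the plugin itself. The helper name time_command is hypothetical, the 60-second default only mirrors the timeout in the alert, and the command shown is a harmless placeholder that would need to be replaced with the real plugin invocation.

```python
import subprocess
import sys
import time


def time_command(cmd, timeout=60.0):
    """Run cmd once and return (elapsed_seconds, status).

    status is "exit <code>" when the command finishes, or "timeout"
    when it is still running after `timeout` seconds.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
        status = "exit %d" % proc.returncode
    except subprocess.TimeoutExpired:
        status = "timeout"
    return time.monotonic() - start, status


if __name__ == "__main__":
    # Placeholder command (an assumption): replace it with the actual
    # pmp-check-aws-rds.py command line used in the Nagios service definition.
    cmd = [sys.executable, "-c", "pass"]
    for _ in range(3):
        elapsed, status = time_command(cmd)
        print("%.2fs %s" % (elapsed, status))
```

Running this in a loop from the Nagios host for a while should show whether the slow runs also occur outside Nagios, and whether they cluster around particular times or regions.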
Thanks.