pmp-check-aws-rds - Service check timed out after 60.01 seconds

Hi,

I’m using the pmp-check-aws-rds.py plugin (1.1.8) with Nagios Core (4.3.2), and over the last month we have been receiving the service alert “(Service check timed out after 60.01 seconds)” from time to time. It happens on all of our RDS instances (60 instances across 2 regions, MySQL 5.7.17).
A quick investigation shows that these timeouts are counted as “connection errors” on every RDS instance (SELECT * FROM performance_schema.host_cache).

Nothing was changed on the Nagios server, and I can’t understand why it sometimes works and sometimes doesn’t: roughly 1 in every 10 checks fails.

Thanks.

Happens to me too with RDS. Please advise.

+1 - Happens to me as well.

+1 - happens to me as well

Thank you for these reports. Just to let you know, we are opening a review report in our JIRA system for this so that the team can review it properly. I will update when I have more information.

Hello posters, I have an update from our team. If you are able to help further with this issue that would be FANTASTIC… It’s been logged as a bug but we could do with some more info to track it down:

Tested with pmp-check-aws-rds version 1.1.8, but did not see any timeouts.

It may be related to the specific Nagios configuration for the RDS service and its command definition.

https://www.percona.com/doc/percona-monitoring-plugins/LATEST/nagios/pmp-check-aws-rds.py.html

Please provide more info about the metrics you are monitoring and the command(s) you are using which result in timeouts.
For direct updates on this issue, please add details on https://jira.percona.com/browse/PMM-2606 , which we created for this issue.
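
One way to gather the details requested above is to run the exact check command by hand, outside Nagios, and time each run. The following is only a minimal sketch and not part of the plugin: `time_check` is a hypothetical helper name, and the command you pass in is assumed to be whatever your own Nagios command definition expands to.

```python
import subprocess
import sys
import time


def time_check(cmd, runs=10, timeout=60.0):
    """Run cmd `runs` times; return a list of (seconds, outcome) tuples.

    outcome is "exit N" for a finished run, or "timeout" if the command
    was still running after `timeout` seconds and had to be killed.
    """
    results = []
    for _ in range(runs):
        start = time.monotonic()
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True,
                                  timeout=timeout)
            outcome = "exit %d" % proc.returncode
        except subprocess.TimeoutExpired:
            outcome = "timeout"
        results.append((time.monotonic() - start, outcome))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the check command.
    for seconds, outcome in time_check(sys.argv[1:]):
        print("%.2fs  %s" % (seconds, outcome))
```

Saved as, say, time_check.py (a hypothetical file name), it would be invoked as `python time_check.py` followed by the full plugin command line. If the slow runs also show up here, the delay is probably not specific to Nagios itself; if they never do, the Nagios service and command configuration becomes the more likely suspect.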

Thank you!