pmp-check-mysql-replication-running returns UNKNOWN when the SQL and IO threads are running with no error

Hi all,
I am new to the Percona Toolkit and just downloaded the Percona Monitoring Plugins for Nagios.

For my slaves that are replicating fine, the replication-running check returns UNKNOWN even though the slave is healthy:

SHOW SLAVE STATUS;
Slave_IO_State: Waiting for master to send event
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Last_Error:

The logic in the 0.9 plugin seems to fall through to the default EXITSTATUS of STATE_UNKNOWN (3) when both threads are running and there is no error, which by my reckoning should be OK.
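As a minimal sketch of that reasoning, the decision logic from the script below can be run in isolation against the healthy output above. `sample_status` is a hypothetical stand-in for the plugin's `mysql_exec 'SHOW SLAVE STATUS\G'` call, and the `-c`/`-w` option handling is left out. Neither grep matches, so EXITSTATUS keeps its initial STATE_UNKNOWN value.

```shell
#!/bin/sh
# Sketch: the 0.9 decision logic, fed with a hypothetical healthy status.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# Hypothetical stand-in for: mysql_exec 'SHOW SLAVE STATUS\G'
sample_status() {
   printf '%s\n' \
      '             Slave_IO_State: Waiting for master to send event' \
      '           Slave_IO_Running: Yes' \
      '          Slave_SQL_Running: Yes' \
      '                 Last_Errno: 0' \
      '                 Last_Error: '
}

TEMP=$(mktemp "${TMPDIR:-/tmp}/replcheck.XXXXXX") || exit $?
trap 'rm -f "${TEMP}"' EXIT
sample_status > "${TEMP}"

EXITSTATUS=$STATE_UNKNOWN
if grep 'Last_Error: .' "${TEMP}" >/dev/null 2>&1; then
   EXITSTATUS=$STATE_CRITICAL
elif grep '_Running: No' "${TEMP}" >/dev/null 2>&1; then
   EXITSTATUS=$STATE_WARNING
fi
# Neither branch matched, so EXITSTATUS is still STATE_UNKNOWN.
echo "EXITSTATUS=${EXITSTATUS}"
```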

I can’t imagine that this is a bug, so I am presuming that I am missing the purpose of this check. Is it supposed to depend on some other check or something?

Thanks
Tom

# ############################################################
# Set up constants, etc.
# ############################################################

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4
EXITSTATUS=$STATE_UNKNOWN

# ############################################################
# Run the program.
# ############################################################

main() {
   # Get options
   for o; do
      case "${o}" in
         -c) shift; OPT_CRIT="${1}"; shift; ;;
         --defaults-file) shift; OPT_DEFT="${1}"; shift; ;;
         -H) shift; OPT_HOST="${1}"; shift; ;;
         -l) shift; OPT_USER="${1}"; shift; ;;
         -p) shift; OPT_PASS="${1}"; shift; ;;
         -P) shift; OPT_PORT="${1}"; shift; ;;
         -S) shift; OPT_SOCK="${1}"; shift; ;;
         -w) shift; OPT_WARN="${1}"; shift; ;;
         --version) grep -A2 '^=head1 VERSION' "$0" | tail -n1; exit 0 ;;
         --help) perl -00 -ne 'm/^ Usage:/ && print' "$0"; exit 0 ;;
         -*) echo "Unknown option ${o}. Try --help."; exit 1; ;;
      esac
   done
   if [ -e '/etc/nagios/mysql.cnf' ]; then
      OPT_DEFT="${OPT_DEFT:-/etc/nagios/mysql.cnf}"
   fi

   # Get replication status into a temp file. TODO: move this into a subroutine
   # and test it.
   local TEMP=$(mktemp "/tmp/${0##*/}.XXXX") || exit $?
   trap 'rm -rf "${TEMP}" >/dev/null 2>&1' EXIT
   mysql_exec 'SHOW SLAVE STATUS\G' > "${TEMP}"
   if [ $? = 0 ]; then
      # SHOW SLAVE STATUS isn't an error if the server isn't a replica. The file
      # will be empty if that happens.
      if [ -s "${TEMP}" ]; then
         NOTE=$(awk '$1 ~ /_Running:|Last_Error:/{print substr($0, 1, 100)}' "${TEMP}")
         if grep 'Last_Error: .' "${TEMP}" >/dev/null 2>&1; then
            EXITSTATUS=$STATE_CRITICAL
            NOTE="CRIT $NOTE"
         elif grep '_Running: No' "${TEMP}" >/dev/null 2>&1; then
            if [ "${OPT_CRIT}" ]; then
               EXITSTATUS=$STATE_CRITICAL
               NOTE="CRIT $NOTE"
            else
               EXITSTATUS=$STATE_WARNING
               NOTE="WARN $NOTE"
            fi
         fi
      elif [ "${OPT_WARN}" ]; then
         # Empty file; not a replica, but that's not supposed to happen.
         NOTE="WARN This server is not configured as a replica."
         EXITSTATUS=$STATE_WARNING
      else
         # Empty file; not a replica.
         NOTE="OK This server is not configured as a replica."
         EXITSTATUS=$STATE_OK
      fi
   else
      EXITSTATUS=$STATE_UNKNOWN
      NOTE="UNK could not determine replication status"
   fi

   echo $NOTE
   exit $EXITSTATUS
}

This is a really embarrassing bug. It turns out that I didn’t have a test case for “OK”, and in the process of fixing a failure for “Replication isn’t even set up on this server,” I introduced this bug.

This is https://bugs.launchpad.net/percona-monitoring-plugins/+bug/936571
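For reference, a minimal sketch of the missing branch: a final `else` after the `_Running: No` test that reports OK when both threads are running and Last_Error is empty. It wraps the 0.9 logic in a hypothetical `classify` function and feeds it hypothetical sample outputs, one per case, including the OK case that had no test. This is an illustration under those assumptions, not necessarily the patch that closed the bug.

```shell
#!/bin/sh
# Sketch: the 0.9 decision logic with the missing OK branch added.
# classify() is a hypothetical helper: it reads SHOW SLAVE STATUS\G text
# from the file named in $1 and sets NOTE and EXITSTATUS.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

classify() {
   EXITSTATUS=$STATE_UNKNOWN
   if [ -s "$1" ]; then
      NOTE=$(awk '$1 ~ /_Running:|Last_Error:/{print substr($0, 1, 100)}' "$1")
      if grep 'Last_Error: .' "$1" >/dev/null 2>&1; then
         EXITSTATUS=$STATE_CRITICAL
         NOTE="CRIT $NOTE"
      elif grep '_Running: No' "$1" >/dev/null 2>&1; then
         if [ "${OPT_CRIT}" ]; then
            EXITSTATUS=$STATE_CRITICAL
            NOTE="CRIT $NOTE"
         else
            EXITSTATUS=$STATE_WARNING
            NOTE="WARN $NOTE"
         fi
      else
         # The branch missing from 0.9: both threads running, no error.
         EXITSTATUS=$STATE_OK
         NOTE="OK $NOTE"
      fi
   elif [ "${OPT_WARN}" ]; then
      NOTE="WARN This server is not configured as a replica."
      EXITSTATUS=$STATE_WARNING
   else
      NOTE="OK This server is not configured as a replica."
      EXITSTATUS=$STATE_OK
   fi
}

TEMP=$(mktemp "${TMPDIR:-/tmp}/replcheck.XXXXXX") || exit $?
trap 'rm -f "${TEMP}"' EXIT

# Case 1: healthy replica (hypothetical sample output).
printf '%s\n' 'Slave_IO_Running: Yes' 'Slave_SQL_Running: Yes' 'Last_Error: ' > "${TEMP}"
classify "${TEMP}"; OK_CASE=$EXITSTATUS

# Case 2: SQL thread stopped, no error text.
printf '%s\n' 'Slave_IO_Running: Yes' 'Slave_SQL_Running: No' 'Last_Error: ' > "${TEMP}"
classify "${TEMP}"; WARN_CASE=$EXITSTATUS

# Case 3: replication error reported.
printf '%s\n' 'Slave_IO_Running: Yes' 'Slave_SQL_Running: No' 'Last_Error: Duplicate entry' > "${TEMP}"
classify "${TEMP}"; CRIT_CASE=$EXITSTATUS

# Case 4: not a replica (empty SHOW SLAVE STATUS output).
: > "${TEMP}"
classify "${TEMP}"; NOT_REPLICA_CASE=$EXITSTATUS

echo "ok=${OK_CASE} warn=${WARN_CASE} crit=${CRIT_CASE} not_replica=${NOT_REPLICA_CASE}"
```

With `-c` and `-w` unset, this should print `ok=0 warn=1 crit=2 not_replica=0`. Keeping the four cases side by side is what catches a regression like this one: the not-a-replica fix and the healthy-replica path are exercised together.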