Thursday, October 7, 2010

Nagios Uptime Plugin

After the recent nagios-plugins update that we applied to our CentOS Nagios box, we found that the uptime check we had created no longer worked. The check used the check_snmp plugin to query HOST-RESOURCES-MIB::hrSystemUptime.0 on Linux hosts, or 1.3.6.1.2.1.1.3.0 on other devices, and alerted us if a device's uptime was less than ten minutes. For some reason the new check_snmp plugin no longer parses the results for these OIDs the way the old one did. It reported the data correctly, but it would not trigger a critical alert when the uptime was less than the threshold we specified.
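SNMP reports uptime as TimeTicks, which are hundredths of a second, so a ten-minute threshold has to be expressed in ticks before it can be compared. As a rough sketch (the function name minutes_to_ticks is my own invention, not part of any plugin), the conversion looks like this:

```shell
#!/bin/bash
# Hypothetical helper: convert minutes to SNMP TimeTicks (1 tick = 1/100 second).
minutes_to_ticks() {
    local minutes=$1
    echo $(( minutes * 60 * 100 ))
}

# Ten minutes, the threshold used in our check.
minutes_to_ticks 10   # prints 60000
```

So a ten-minute critical threshold corresponds to a value of 60000 ticks.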

To get us back up and running, I wrote a bash script plugin that grabs the SNMP data, parses it manually, and reports back the status. Here is the result, aptly called check_uptime.sh:

#!/bin/bash
#Description: Nagios plugin to check uptimes of servers and equipment.
#Linux uses a different OID for uptime than does everything else. If you use the regular
#OID for Linux hosts you will get an erroneous answer.
#Version 0.5
#Created by Jason Wasser
#Until I can learn how to parse command line arguments, we must accept the command line
#arguments in order.

#$1 - hostname
#$2 - community name
#$3 - critical threshold in ticks (100 ticks per second); CRITICAL when uptime is less than this
#$4 - OID - choose between one of the values below

#OID for Linux: HOST-RESOURCES-MIB::hrSystemUptime.0
#OID for Other: 1.3.6.1.2.1.1.3.0

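#Everything after the "=" in the snmpget output, e.g. "Timeticks: (12345) 0:02:03.45"
UPTFULL=$(snmpget -v1 -c "$2" "$1" "$4" | cut -d "=" -f 2)
#The raw tick count between the parentheses
UPT=$(echo "$UPTFULL" | cut -d "(" -f 2 | cut -d ")" -f 1)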

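#If snmpget failed or returned something other than a tick count, report UNKNOWN
case "$UPT" in
    ''|*[!0-9]*)
        echo "UNKNOWN -" "$UPTFULL"
        exit 3
        ;;
esac

if [ "$UPT" -lt "$3" ]; then
    echo "CRITICAL -" "$UPTFULL"
    RET=2
else
    #Uptime equal to or above the threshold is OK
    echo "OK -" "$UPTFULL"
    RET=0
fi
exit $RET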
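
The header comment mentions not yet knowing how to parse command line arguments. One way to do that in bash is the getopts builtin. Below is a minimal sketch, not the version we run: the option letters (-H, -C, -c, -o), the parse_args function, the sample host and community names, and the choice of 1.3.6.1.2.1.1.3.0 as a default OID are all my own assumptions for illustration.

```shell
#!/bin/bash
# Sketch only: option letters, function name, and defaults are illustrative assumptions.
# -H hostname, -C community, -c critical threshold in ticks, -o OID (optional)
parse_args() {
    local opt
    local OPTIND=1                  # reset so the function can be called more than once
    HOST=""
    COMMUNITY=""
    CRIT=""
    OID="1.3.6.1.2.1.1.3.0"         # assumed default: the non-Linux uptime OID
    while getopts "H:C:c:o:" opt; do
        case "$opt" in
            H) HOST=$OPTARG ;;
            C) COMMUNITY=$OPTARG ;;
            c) CRIT=$OPTARG ;;
            o) OID=$OPTARG ;;
            *) return 3 ;;          # unknown option: Nagios UNKNOWN
        esac
    done
    # Host, community, and threshold are required.
    if [ -z "$HOST" ] || [ -z "$COMMUNITY" ] || [ -z "$CRIT" ]; then
        echo "UNKNOWN - usage: check_uptime.sh -H host -C community -c ticks [-o oid]"
        return 3
    fi
    return 0
}

# Hypothetical invocation: switch01 and public are placeholder values.
parse_args -H switch01 -C public -c 60000
echo "$HOST $COMMUNITY $CRIT $OID"
```

With something like this in place, the rest of the script would use $HOST, $COMMUNITY, $CRIT, and $OID instead of $1 through $4, and the arguments could be given in any order.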