WatchDog
Watchdog functionality is very useful, as mh will occasionally lock up. The basic idea is for mh to "touch" a file every 10 seconds. If an external process detects that the file hasn't been updated in a while, it restarts MisterHouse, since something has gone terribly wrong.
This example uses the DaemonTools setup. You will have to substitute your own restart mechanism if you are using something else.
# This should be added to your code directory
# watchdog_file should point to an existing file that is writable by MisterHouse
# (utime only updates timestamps; it does not create the file)
# Watchdog: touch a file every 10 seconds
if (new_second 10) {
    my $now = time;
    utime $now, $now, $config_parms{watchdog_file};
}
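Because Perl's utime fails on a file that does not exist, the watchdog file has to be created once before MisterHouse starts touching it. The following is a minimal sketch of one way to do that. ensure_watchdog_file is a hypothetical helper name (not part of MisterHouse), and the example path is only an assumption that should match your own watchdog_file setting.

```shell
#!/bin/sh
# Hypothetical helper: create the watchdog file if it is missing.
# Run it as the user MisterHouse runs as, so that mh can update the file.
ensure_watchdog_file() {
    file=$1
    # Create the parent directory if needed, then the file only when it is absent
    mkdir -p "$(dirname "$file")" && { [ -e "$file" ] || touch "$file"; }
}

# Example call (the path is an assumption; use your own watchdog_file value):
# ensure_watchdog_file /usr/local/mh/data/watchdog
```

Calling the helper a second time leaves an existing file in place, so it is safe to run from a startup script.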
# this is added to the crontab of a user who can terminate / restart MisterHouse
* * * * * /path/to/script/shown/below
#!/bin/sh
# script run by crontab every minute
WATCHDOGFILE=/usr/local/mh/data/watchdog
# if watchdog hasn't been touched in 90 seconds, restart MisterHouse
MAXDIFFERENCE=90
SERVICE=/service/mh
SVC=/command/svc
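watchdog=`stat --format=%Y "$WATCHDOGFILE"`
now=`date +%s`
difference=$(( $now - $watchdog ))
if [ "$difference" -gt "$MAXDIFFERENCE" ]; then
    $SVC -t $SERVICE
    # DIDIT is not defined in this script; set it to your own
    # logging/notification command if you want the restart recorded
    if [ -n "$DIDIT" ]; then
        $DIDIT mhwatchdogrestart
    fi
fi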
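The freshness test in the script above can be exercised on its own before it is wired into cron. The following is a minimal sketch, assuming GNU stat (as the script itself does). file_age and is_stale are hypothetical helper names, not part of MisterHouse or daemontools.

```shell
#!/bin/sh
# Print the age of a file in seconds (now minus its modification time).
# Assumes GNU stat, which supports --format=%Y.
file_age() {
    mtime=$(stat --format=%Y "$1") || return 1
    now=$(date +%s)
    echo $(( now - mtime ))
}

# Succeed (status 0) when the file is older than the given number of seconds.
# Returns 2 when the file cannot be read, so a missing file is reported
# as an error rather than silently treated as stale.
is_stale() {
    age=$(file_age "$1") || return 2
    [ "$age" -gt "$2" ]
}
```

To try it, create a scratch file, backdate it (for example with GNU touch -d '5 minutes ago'), and run is_stale on it with a limit of 90. Treating a missing file as an error instead of as stale is a design choice in this sketch; it avoids a restart loop when the watchdog path is simply misconfigured.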
This is my simple, external, independent method for achieving similar results (Pete Flaherty). The original was written around 2002 for RH6 and has had some updates over the years. This script assumes your startup script is /etc/init.d/misterhouse (start|stop|restart).
mrwatcher (shell script): place it in the ..../mh/bin/ directory (adjust as necessary)
# MisterHouse process watcher for linux
# Pete Flaherty 15 Jul 2002 - initial release for general processes
# 22 Nov 2007 - update for better process detection
#
# Set this script to run in a cron job to ensure uptime; mine runs every 5 mins
# see cron entry in the next listing
# make sure we have a place to log stuff to
[ -e /var/log/watch.log ] || touch /var/log/watch.log
# Misterhouse test
RUNNING=`ps agx | grep /usr/local/mh/bin/mh | grep -v grep`
echo "testing"
if [ "$RUNNING" = "" ] ; then
    echo `date` 'MisterHouse is NOT running. Restarting using the misterhouse init file'
    /etc/init.d/misterhouse stop
    # and be sure that it's really stopped
    killall mh
    # and restart the world
    #
    /etc/init.d/misterhouse start
    # Log failures to the crash log
    echo `date` ' --- CRASH/STOPPED Auto RESTART ---' >> /var/log/watch.log
    # Sometimes we want to be notified by email too; this works
    ## echo "MisterHouse on " `hostname` "was restarted " | mail -s "Watcher Restart MESSAGE" [email protected]
else
    echo `date` $RUNNING 'is Running fine' >/dev/null
    # for debugging, if more verbosity is needed
    # echo `date` $RUNNING 'is Running fine'>> /var/log/watch.log
fi
Then add a crontab entry to run it:
# check MisterHouse every 5 mins
*/5 * * * * /usr/local/mh/bin/mrwatcher