I’ve been working on a server that has been going offline. I can tell it’s offline because an indicator in my mail program shows that I can’t get mail. The first time it happened, we called over to the data center and had them restart everything. That worked for a while, but then it happened again. This time we went over to the data center, and the server itself was fine. Replacing the switch that connected two servers made the problem go away. A few weeks later, the server went offline again. This time it looked like a DDoS attack. It has been fine ever since, but we wanted a better way of knowing that there is an issue than me happening to look at my email program and noticing that I wasn’t getting mail.
One way to check whether the server is alive is to ping it every few minutes. If there is no response, a text goes to the owner and an email goes to me.
I already have a couple of cron jobs running under my crontab on my server that send out notifications by text or email every minute, so I edited the crontab to add another job to the list.
To see the cron jobs, run crontab -l -u myuserid. To edit them, run crontab -e.
# m h dom mon dow command
*/1 * * * * /home/myuserid/reminder.sh
*/1 * * * * /home/myuserid/ping_remote_server.sh
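If every minute turns out to be more often than needed, the minute field can take a larger step. A sketch, assuming a five-minute interval (an arbitrary choice for illustration, not what I am running):

```
# m h dom mon dow command
# Assumed interval: every 5 minutes instead of every minute
*/5 * * * * /home/myuserid/ping_remote_server.sh
```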
Here’s the shell script I wrote to ping the server and send out notifications.
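#!/bin/sh
# Ping once, quietly, discarding all output; only the exit status matters.
ping -c 1 -q remoteservername.com >/dev/null 2>&1
return_code=$?
# A non-zero exit status means the ping failed.
if [ "$return_code" -ne 0 ]; then
    #echo "Failure"
    # Empty body from /dev/null; the subject line carries the message.
    mail -s "Server Down" 8055551212@mobile.mycingular.com myuserid@MyServer.com < /dev/null
fi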
I ping it once (-c 1) and quietly (-q), since I don’t care about the output, and send the output and any error messages to /dev/null. All I care about is whether the ping succeeded. Every Linux command returns an exit status, which you can read with $? immediately after the command runs. An exit status of 0 means success, so I check whether it is not 0.
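To see the exit status check on its own, here is a minimal sketch. The report_status function is a hypothetical helper, not part of my script, and the commands true and sh -c 'exit 3' are stand-ins for a ping that succeeds and one that fails:

```shell
#!/bin/sh
# Hypothetical helper: run any command silently, then report on its exit status.
report_status() {
    "$@" >/dev/null 2>&1
    return_code=$?          # must be read right after the command runs
    if [ "$return_code" -ne 0 ]; then
        echo "down (exit status $return_code)"
    else
        echo "up"
    fi
}

report_status true               # stand-in for a successful ping
report_status sh -c 'exit 3'     # stand-in for a failed ping
```

The first call prints "up" and the second prints "down (exit status 3)". Swapping in ping -c 1 -q with a real host name would give the same kind of check as the script.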
I don’t need any details in the text that I send, so I set the subject and grab the content from