Server Management Suite

  • 1.  Monitoring Linux services

    Posted Nov 24, 2011 11:30 AM

    Hi everyone,

    Does anyone have a hint on how to monitor Linux services in the same way that Windows services can be monitored? I have been searching around, but have not found anything on the topic that helps.

    Thanks for any advice.

    -BBC



  • 2.  RE: Monitoring Linux services
    Best Answer

    Posted Nov 28, 2011 09:29 AM

    We are using 7.0 and take a targeted approach to monitoring Linux services, as opposed to our current method of monitoring all Windows services that are set to 'Automatic'.

    It's a two-part process: we gather a custom inventory of servers that are running daemons we want to monitor, and then we have a metric in place that watches only those services to ensure that they are running, and running with the same command line arguments as when the inventory was taken (this will alert if someone has changed parameters).

    The first part of the process is to gather the custom inventory.  Here is the custom inventory gathering script we use:

    . `aex-helper info path -s INVENTORY`/lib/helpers/custominv_inc.sh
    #
    # Sample script for custom inventory
    # The first line of code should always be included at the beginning of the script
    # Actual script for collecting inventory data begins after the following label:
    # SCRIPT_BEGINS_HERE
    #!/usr/bin/python
    import os
    daemonlist = ['LLAWP','ntpd','named','ndsd','crond','xinetd','avagent','postfix','syslog']
    try:
        os.makedirs('/opt/altiris/data')
    except OSError:
        if os.path.exists('/opt/altiris/data'):
            pass
        else:
            raise
    monitoringfile = open('/opt/altiris/data/ci-daemons.mon', 'w')
    ps = os.popen("ps axwwl").read()
    processes = ps.split('\n')
    nfields = len(processes[0].split()) - 1
    print "CI_DAEMONS_LINUX" #Put the name of the CI table here...duh.
    print "Delimiters=\"+\" "
    print "string64 string256" #put the real field values in here when we're ready to roll
    print "Application Command_Line"
    for row in processes[0:len(processes)-1]:
        proc = row.split(None, nfields)
        if proc[3] == "1": #Check to see if PPID is 1
            executable = proc[-1].split(None)[0].split("/")[-1] #split the command output of ps on spaces, then split the first item on /, the last item should be the executable
            for daemon in daemonlist:
                if executable.find(daemon) > -1:
                    print "%s+%s" % (executable, proc[-1])
                    monitoringfile.write(executable+"+"+proc[-1]+"\n")
    monitoringfile.close()
     

    This will load the custom inventory table with the process name found on the server, as well as the entire command line.  In addition to loading the CI table, it also leaves behind a file, '/opt/altiris/data/ci-daemons.mon', with one 'executable+command line' record per daemon, which we will use to monitor each server. Only processes whose parent is PID 1 and whose executable name contains one of the entries in the list variable 'daemonlist' are inventoried and monitored.
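
    To make the parsing step easier to follow, here is a minimal Python 3 sketch of how one row of `ps axwwl` output becomes a record in the monitoring file. It is an illustration only, not part of the script we run: the function names `parse_ps_row`, `format_record` and `parse_record`, and the sample header and row, are made up for this example.

```python
# Illustrative sketch only; names and sample data are assumptions, not part of
# the original inventory script.

SAMPLE_HEADER = "F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND"
SAMPLE_ROW = "5     0  1234     1  20   0  12345  1024 poll_s Ss   ?          0:02 /usr/sbin/ntpd -u ntp:ntp -g"


def parse_ps_row(header, row):
    """Split one `ps axwwl` row into (ppid, executable, command_line).

    Returns None for rows that do not have every column (e.g. a blank line).
    """
    columns = header.split()
    # COMMAND is the last column and may contain spaces, so split at most
    # len(columns) - 1 times and keep the remainder as the full command line.
    fields = row.split(None, len(columns) - 1)
    if len(fields) < len(columns):
        return None
    command_line = fields[-1]
    # First word of the command, with any directory prefix removed.
    executable = command_line.split()[0].split("/")[-1]
    return fields[columns.index("PPID")], executable, command_line


def format_record(executable, command_line):
    """Build one line of the monitoring file: executable, '+', command line."""
    return executable + "+" + command_line


def parse_record(line):
    """Split a monitoring-file line back into (executable, command_line)."""
    executable, _, command_line = line.rstrip("\n").partition("+")
    return executable, command_line


if __name__ == "__main__":
    ppid, executable, command_line = parse_ps_row(SAMPLE_HEADER, SAMPLE_ROW)
    if ppid == "1":  # only processes owned by init are treated as daemons
        print(format_record(executable, command_line))
```

    Note that `parse_record` splits on the first '+' only, so a command line that itself contains a '+' survives intact in this sketch; that is a choice made for the example.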

    The metric we use to monitor the inventoried daemons is a command metric, which reads the custom inventory file that we left behind on the server and checks that each daemon is running with the same parameters as when the inventory was taken.  It looks like this:

    while read -r i;do q=`echo $i|awk -F'+' '{print $2}'`;process=`echo $i|awk -F'+' '{print $1}'`;check="/var/run/${process}_restart";if [ -f "$check" ];then echo "Restarting $process";else p=`ps -ewwwwww -o ppid -o cmd | grep "$q"|grep -v grep|awk '  { if($1=="1") print $0} '|uniq|wc -l`;if [ "$process" = "LLAWP" ];then q=`echo $q|sed 's/ -a//g'`;p=`ps -ewwwwww -o ppid -o cmd |sed 's/ -a//g'| grep "$q"|grep -v grep|awk '  { if($1=="1") print $0} '|uniq|wc -l`;fi;fullcmd=`echo $q|tr '[:blank:]' '_'`;if [ $p -ne 1 ];then echo "Error $fullcmd";else echo "Running $fullcmd";fi;fi;done < /opt/altiris/data/ci-daemons.mon
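
    Because the one-liner is hard to read, here is a rough Python 3 sketch of its core decision: a daemon counts as running only if exactly one process owned by PID 1 matches the inventoried command line. This is not a drop-in replacement for the command metric. The names `count_init_owned` and `status_line` are hypothetical, the process list is passed in as (ppid, cmd) pairs rather than read from `ps`, and it uses plain substring matching where the shell version uses `grep` patterns.

```python
# Illustrative sketch only; function names are assumptions for this example.
import re


def count_init_owned(command_line, processes):
    """Count distinct init-owned process lines whose command contains command_line.

    `processes` is a list of (ppid, cmd) string pairs, for example parsed from
    the output of `ps -e -o ppid -o cmd`. A set is used to drop identical
    lines (the shell version uses `uniq`, which only drops adjacent repeats).
    """
    return len({(ppid, cmd) for ppid, cmd in processes
                if ppid == "1" and command_line in cmd})


def status_line(record, processes):
    """Return 'Running <cmd>' or 'Error <cmd>' for one monitoring-file record."""
    _executable, _, command_line = record.rstrip("\n").partition("+")
    # Spaces and tabs become underscores so the status is a single token.
    label = re.sub(r"[ \t]", "_", command_line)
    if count_init_owned(command_line, processes) == 1:
        return "Running " + label
    return "Error " + label


if __name__ == "__main__":
    sample = [("1", "/usr/sbin/ntpd -g"), ("1", "/usr/sbin/crond")]
    print(status_line("ntpd+/usr/sbin/ntpd -g", sample))
    print(status_line("named+/usr/sbin/named -u named", sample))
```

    Zero matches and more than one match are both reported as an error, which mirrors the `-ne 1` test in the metric.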

    There are a number of unique features that we have included in this script for our environment.  The first is that we check for the presence of a file '/var/run/${process}_restart'.  We have modified our init scripts for a number of daemons to touch a file in /var/run when they perform a restart so that we do not falsely report that they are down during a scheduled restart time.  We also included a provision for the LLAWP daemon (Siteminder) because it will sometimes restart itself and include a '-a' runtime parameter that was causing false alerts.
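
    The two special cases described above can be sketched in Python 3 as small helpers. Again, this is only an illustration under assumptions: the function names are invented, the `run_dir` parameter exists only so the example can be tried outside /var/run, and the LLAWP command line in the demo is a made-up placeholder rather than a real Siteminder invocation.

```python
# Illustrative sketch only; names, the run_dir parameter and the sample
# command line are assumptions for this example.
import os


def restart_in_progress(process, run_dir="/var/run"):
    """True if the init script left a '<process>_restart' marker file behind."""
    return os.path.isfile(os.path.join(run_dir, process + "_restart"))


def strip_llawp_flag(text):
    """Remove every ' -a' occurrence, like `sed 's/ -a//g'` in the metric."""
    return text.replace(" -a", "")


def effective_command(process, command_line):
    """Command line to compare against: ' -a' is ignored for LLAWP only."""
    if process == "LLAWP":
        return strip_llawp_flag(command_line)
    return command_line


if __name__ == "__main__":
    # Hypothetical command line, used purely to show the effect.
    print(effective_command("LLAWP", "/opt/example/LLAWP /opt/example/agent.conf -a"))
    print(restart_in_progress("ntpd"))
```

    For the comparison to work, the same stripping has to be applied to the live process list as well as to the inventoried command line, which is what the second `ps ... | sed` pipeline in the metric does.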

    This monitoring approach presupposes that daemons will be running during the time that we take our daily custom inventory, and also that those daemons should be running at all times.



  • 3.  RE: Monitoring Linux services

    Posted Dec 02, 2011 12:03 PM

    Hi Scott,

    Thanks very much for the post; it was very helpful!

    -BBC