Endpoint Management Community Blog

Altiris 7.x - Linux Monitor Agent Troubleshooting Guide

Created: 18 Feb 2011
Author: scott.hall

What should be running on a Linux server with the monitor agent installed

  • Run 'ps -ef|grep aex|grep -v grep' from the command line. Each of the following processes should be listed:
  • aex-pluginmanager.bin -D
  • aex-pluginmanager.bin -F -nm -nc
  • aex-metricprovider.bin
  • aex-logupload.bin
  • aex-appdetector.bin
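To avoid eyeballing the ps output every time, the check can be scripted. This is only a sketch: the function name missing_agent_processes is made up for illustration, and it assumes the process command lines show up in ps exactly as written in the list above.

```shell
#!/bin/sh
# Hypothetical helper: reads a process listing on stdin and prints one
# "missing: ..." line for each expected agent process that is not in it.
# The expected strings are copied from the list above.
missing_agent_processes() {
    listing=$(cat)
    for expected in \
        "aex-pluginmanager.bin -D" \
        "aex-pluginmanager.bin -F -nm -nc" \
        "aex-metricprovider.bin" \
        "aex-logupload.bin" \
        "aex-appdetector.bin"
    do
        # -F treats the pattern as a fixed string; "--" ends option parsing.
        if ! printf '%s\n' "$listing" | grep -qF -- "$expected"; then
            printf 'missing: %s\n' "$expected"
        fi
    done
}
```

Typical use would be 'ps -ef|grep aex|grep -v grep|missing_agent_processes'. No output means all five processes were found; the names that do get printed tell you which of the symptoms below to look at.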

Symptoms

All processes appear to be in memory and active, but no metrics are available in 'real time'

  1. Determine whether the server has policies applied to it in the Resource Manager.
    • If no policies are applied, the agent has no metrics to return.
  2. Check /opt/altiris/notification/monitor/etc/Config.xml.
    1. Is the Config.xml empty, meaning it has only two lines, the second one being 'config/'?
      • This could be a glitch that I've seen in SP3 and SP5.
        • If the server has policies applied to it, stop the AltirisSM service, delete the Config.xml, start the AltirisSM service and examine the new Config.xml.
    2. Do a 'less /opt/altiris/notification/monitor/etc/Config.xml'
      • Check for instances of the word 'inactive' on policy lines for things like processor and memory.
        • If instances of 'inactive' appear for processor and memory policies, that indicates a detection rule is inactivating them.
          1. First, check that the server reports itself as Red Hat Enterprise Linux Server by running 'cat /etc/redhat-release' from the command line.
          2. The rpm database could be the problem.
            • From the command line, try 'rpm -qa' both as your user id and as root (see your friendly neighborhood Unix admin for assistance). If you see errors that contain the string 'Lock table is out of available locker entries', the detection policy was unable to determine the version of glibc from the rpm database and inactivated the policy.
            • If that's the case, have your friendly neighborhood Unix admin run 'rm /var/lib/rpm/__db.00*' followed by 'rpm --rebuilddb'. Once the rebuild finishes, it's safe to stop the AltirisSM service, delete the Config.xml and restart the AltirisSM service.
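The Config.xml and rpm checks above can be sketched as two small shell functions. The names config_state and rpm_lock_error are hypothetical. "Empty" is taken to mean two lines or fewer, and "inactive" is matched as a plain substring, since the exact layout of Config.xml is not shown here.

```shell
#!/bin/sh
# Default path from the steps above; pass another file as $1 to override.
CONFIG_XML=/opt/altiris/notification/monitor/etc/Config.xml

# Hypothetical helper: prints "empty" when the file has two lines or fewer,
# "inactive" when any line contains the string 'inactive', "ok" otherwise.
config_state() {
    file=${1:-$CONFIG_XML}
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    # grep -c '' counts every line, including a last line with no newline.
    lines=$(grep -c '' "$file")
    if [ "$lines" -le 2 ]; then
        echo empty
    elif grep -q inactive "$file"; then
        echo inactive
    else
        echo ok
    fi
}

# Hypothetical helper: succeeds when the rpm output on stdin contains the
# lock table error quoted above.
rpm_lock_error() {
    grep -q 'Lock table is out of available locker entries'
}
```

For example, 'rpm -qa 2>&1 | rpm_lock_error && echo "rpm database needs a rebuild"' covers the rpm step; the 2>&1 is there because the error is normally written to stderr. Neither function changes anything on the server, so they are safe to run before involving the Unix admin.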

Everything but aex-logupload.bin and aex-appdetector.bin is in memory and active, and no metrics are available in 'real time'

  1. Determine whether the server has policies applied to it in the Resource Manager.
    • If no policies are applied, the agent has no metrics to return.
  2. Check /opt/altiris/notification/monitor/etc/Config.xml.
    1. Is the Config.xml empty, meaning it has only two lines, the second one being 'config/'?
      • This could be a glitch I've seen in SP3 and SP5.
        • If the server has policies, stop the AltirisSM service, delete the Config.xml, start the AltirisSM service and examine the new Config.xml.
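The stop, delete, start sequence can be wrapped in one function so the file is never removed while the service is still running. This is a sketch under a few assumptions: reset_monitor_config is a made-up name, the commands that stop and start the AltirisSM service are passed in as arguments because they are not spelled out in this guide, and the file is renamed to Config.xml.old rather than deleted so the old copy can be compared with the regenerated one.

```shell
#!/bin/sh
# Hypothetical helper.
# Usage: reset_monitor_config CONFIG STOP_CMD START_CMD
# STOP_CMD and START_CMD are whatever stops and starts the AltirisSM
# service on your server (placeholders, not taken from the guide).
reset_monitor_config() {
    config=$1
    stop_cmd=$2
    start_cmd=$3
    # Do not touch the file unless the service stopped cleanly.
    sh -c "$stop_cmd" || {
        echo "stop failed; $config left untouched" >&2
        return 1
    }
    # The guide deletes the file; renaming it keeps a copy to compare.
    if [ -e "$config" ]; then
        mv "$config" "$config.old" || return 1
    fi
    sh -c "$start_cmd"
}
```

After the service comes back up, examine the new Config.xml as described above. If it is still only two lines long, the empty file was probably not the cause.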

Everything but aex-metricprovider.bin appears to be in memory and active, and no metrics are available in 'real time'

  1. CRASH!
    • This means aex-metricprovider.bin bit the dust and orphaned its children, aex-appdetector.bin and aex-logupload.bin.
      • It's doubtful that there is any useful information in the logs. The only thing you can really do at this point is stop the AltirisSM service and start it back up again.
      • If the server is a repeat offender, consult with your friendly neighborhood Unix admin to see if they would be willing to allow you to generate core dumps for aex-metricprovider.bin. To enable them, add the following line as the second line of /opt/altiris/notification/monitor/bin/aex-metricprovider and restart the AltirisSM service:
        • ulimit -c unlimited
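If the same change has to go onto several servers, the edit can be scripted. The sketch below assumes aex-metricprovider is a plain shell wrapper script whose first line is the interpreter line; the function name add_core_ulimit and the .orig backup suffix are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: inserts 'ulimit -c unlimited' as the second line of
# the given script, keeping a backup copy next to it as <script>.orig.
add_core_ulimit() {
    script=$1
    # Nothing to do if the exact line is already present.
    grep -qx 'ulimit -c unlimited' "$script" && return 0
    cp -p "$script" "$script.orig" || return 1
    # Print line 1, then the new line, then everything else unchanged.
    # Redirecting into the existing file keeps its permissions.
    awk 'NR == 1 { print; print "ulimit -c unlimited"; next } { print }' \
        "$script.orig" > "$script"
}
```

Run as root it would be called as 'add_core_ulimit /opt/altiris/notification/monitor/bin/aex-metricprovider', followed by a restart of the AltirisSM service. Running it a second time is harmless because the line is only added once.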