How to Demonstrate Monitor Solution for Dell Servers 

Sep 10, 2007 05:44 PM

If you have installed Monitor Solution for Dell Servers and want to demonstrate its functionality on a real Dell server, what is the best thing to show? Chances are you do not have spare Dell servers lying around on which you can simulate a database failure or an OS crash.

This guide shows how to demonstrate two components of Monitor Solution: first, monitoring an actual hardware event from a Dell server; second, monitoring a service that is, or should be, running.

Requirements

To run this demo you need a Notification Server with Monitor Solution and Monitor Solution for Dell Servers installed. You also need a Dell server that has the Altiris Agent with the Monitor Agent installed, as well as Dell OpenManage Server Administrator (OMSA). The Altiris Monitor Agent obtains data from OMSA through SNMP requests. The agent then uses its own mechanisms and the Altiris Agent to transfer data from the server to the computer running Notification Server.

Demo 1: Monitoring a power failure

Simulating a power failure on a server is easy, provided the server has two power supplies connected. If it has only one, do not try this demo until you find a second power cable (especially on a production server). Once you have installed your Monitor Solution agents, there are a couple of things to check to make sure everything is ready to go.

The first thing to check is the collection called 'All Dell Windows Computers with OpenManage Server Administrator and Altiris Monitor Agent SP4'. You can find it in the Configuration > Dell Servers folder on the Manage > Monitoring page. The Dell server should appear in this collection; if it does not, make sure that the agent has been pushed and that the client has updated its configuration. Note: it may take a few minutes for the machine to appear in this collection. You should also make sure that your monitor packs are disabled by going to Tasks > Manage Monitor Packs.

It is important to disable monitor packs before you begin to use the product. Otherwise you may trigger rules unexpectedly, which generates alerts and leaves your dashboard continuously in a critical state. Once you become more familiar with Monitor Solution you can start to experiment with enabling different monitor packs.

Click Manage > Monitoring to access the main monitor page within the Altiris 6.5 console. The first thing to look at is the monitor dashboard. Here you should see your server status as good (represented by green bars in the graph).

This guide isn't aimed at those who are new to Monitor Solution, but as a recap you need to understand the differences between metrics, rules and monitor packs.

Metrics define how the Monitor Agent collects data from supported data sources, called metric sources. Each agent can use numerous metrics to define all of the data that you want to collect.

Rules specify how to analyze metric or event data collected by the Monitor Agent. Rules also define under what conditions they are triggered and the actions taken. The actions can include sending an e-mail, generating an SNMP trap, creating an Alert Manager or Helpdesk incident, and running a command on the monitored computer from the command-line.

A Monitor pack is a group of rules for a similar purpose, such as monitoring an operating system or an application. The rules of a Monitor pack are grouped into categories for greater ease in working with them.

Monitor Solution for Dell Servers provides the metrics needed to monitor Dell servers.
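To make the recap concrete, here is a minimal sketch of how metrics, rules and monitor packs relate to each other. It is not the product's actual data model: the class names, the "failed" value and the action strings are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Metric:
    """Describes what the agent collects and from which metric source."""
    name: str
    source: str  # for example "SNMP" or "WMI"


@dataclass
class Rule:
    """Analyzes a metric value and lists the actions to take when triggered."""
    name: str
    metric: Metric
    condition: Callable[[object], bool]  # returns True when the rule triggers
    actions: List[str] = field(default_factory=list)

    def evaluate(self, value: object) -> List[str]:
        # Return the actions to run, or an empty list if the rule is not triggered.
        return list(self.actions) if self.condition(value) else []


@dataclass
class MonitorPack:
    """Groups rules for a similar purpose into named categories."""
    name: str
    categories: Dict[str, List[Rule]] = field(default_factory=dict)

    def add_rule(self, category: str, rule: Rule) -> None:
        self.categories.setdefault(category, []).append(rule)


# Hypothetical example values, loosely modeled on Demo 1.
psu_metric = Metric("Power supply status", "SNMP")
psu_rule = Rule(
    "Power Supply detected a failure",
    psu_metric,
    condition=lambda value: value == "failed",  # assumed value, for illustration
    actions=["send e-mail", "create Helpdesk incident"],
)
pack = MonitorPack("Dell-Windows-Basic")
pack.add_rule("Server Administrator", psu_rule)
```

The point of the sketch is the layering: a metric only collects, a rule decides and acts, and a monitor pack is just a container that organizes rules into categories.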

For a more detailed understanding of how Monitor Solution works, please go to the documentation section on the Altiris website.

Back to the demo. The first thing we need to do is edit the existing Dell-Windows-Basic monitor pack. Double-click the monitor pack and you will notice that there are two categories by default: Server Administrator and Storage Management.

Double-click the Server Administrator category and scroll down the rules until you find "Power Supply detected a failure".

Double-click this rule and, on the Action tab, change 'Display state as' to Critical and select 'Incident acknowledgment' as the reset state. This means that although the power failure may be corrected when the power is plugged back in, the dashboard will stay critical until we acknowledge that we have seen the problem.

Now, to see if the monitor pack is set up correctly, pull the power cable out of one of the power supplies at the back of the server. After a few seconds (it could take up to 60 seconds) your dashboard should turn red.

If you click on the glasses icon at the bottom of the page you will be shown which rules have triggered the critical status.

In this example you can see that pulling the power cable from the server has triggered two rules: the first is that a power supply failure has been detected, and the second is that redundancy has been lost as a result. Explain to your audience that at this point you would go to the server room, see that the power cable has been pulled out, and restore power to the machine. Even though the agent will detect that normal service has been resumed, the dashboard will not reset until you acknowledge the rule. To do this, click the acknowledge all icon.

Once the pop-up window appears, close the acknowledgment and go back to the dashboard. You should see that normal service has been resumed (the graph should be green again).

Demo 2: Monitoring a Service

The second demo scenario shows how Monitor Solution can be used to monitor a service and make sure that it is always running. For this demo we are going to use the Altiris Deployment Server service (the express service). We will create a metric, a rule and a monitor pack category to show how, if the Altiris service is stopped, we can be alerted and the service can be restarted automatically.

Creating a Metric

The first thing you need to do is to create a Metric that monitors the Altiris Service. Go to the Manage > Monitoring view and click on the 'Manage Metrics' link.

Click on the + icon to create a new metric. Complete the Metric as shown in the image below. Make sure you create a WMI metric.
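As a rough illustration of what a WMI metric for a service can look at, the sketch below builds a WQL query against the standard Win32_Service class, whose State property is a string such as "Running" or "Stopped". This is not a reproduction of the settings in the image: the helper name and the service name "ExampleService" are assumptions, so substitute the real service name from your own server.

```python
def service_state_query(service_name: str) -> str:
    """Build a WQL query that returns the State of one Windows service.

    Backslashes and single quotes are escaped so that the name can be
    embedded safely in a WQL string literal.
    """
    escaped = service_name.replace("\\", "\\\\").replace("'", "\\'")
    return f"SELECT State FROM Win32_Service WHERE Name = '{escaped}'"


# "ExampleService" is a placeholder, not the real Deployment Server service name.
query = service_state_query("ExampleService")
print(query)
```

Because State comes back as text rather than a number, any comparison against it (for example with 'Running') has to be a string comparison.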

The next thing to do is edit your monitor pack to add your newly created metric. To do this, browse to your monitor packs and edit the Dell-Windows-Basic monitor pack. Create a new category called Altiris Management as shown below.

Double-click the new category. Under the General tab you need to create a detection rule for the Altiris service. The detection rule allows the agent to work out whether it needs to apply this policy; if the detection rule isn't met, the policy is simply ignored. This stops policies from being downloaded to servers that they do not apply to. Create a new detection method and complete the details as shown in the image below.

Now click on the 'Rules' tab. Under the General tab we need to create a rule for the service that you intend to monitor. In this instance we will be monitoring the Altiris Deployment Service. Click the New button and complete the rule metric as shown below. In the Metric drop-down, select the Altiris Deployment Service metric that you created earlier. The 'Value' field should be 'Running', as this is the state of the service that we are monitoring.
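The idea behind a detection rule can be sketched in a few lines. This is only a conceptual model, assuming detection means "the named service exists on the machine"; the function name and the service names are hypothetical.

```python
from typing import Dict, List, Set


def applicable_categories(installed_services: Set[str],
                          detection_by_category: Dict[str, str]) -> List[str]:
    """Return the categories whose detection service exists on this machine.

    Categories whose detection rule is not met are simply skipped, so their
    rules never reach servers they do not apply to.
    """
    return [
        category
        for category, required_service in detection_by_category.items()
        if required_service in installed_services
    ]


# Hypothetical service names, for illustration only.
detection = {
    "Altiris Management": "ExampleDeploymentService",
    "Storage Management": "ExampleStorageService",
}
server_with_ds = {"ExampleDeploymentService", "Spooler"}
print(applicable_categories(server_with_ds, detection))
```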

Now that we have created a monitor metric for the Altiris Service and created a rule that monitors that the service should be in a running state, we need to create an action for when the service leaves the running state (i.e. the service stops).

The first thing to do is click on the 'Action' tab. Under the display state, make sure that the state is set to 'Critical'. It is also important to set the 'Reset state using' value to "Update Metric Value". This updates the dashboard automatically, so that when the service restarts the dashboard changes back to a good state.

The final step is to define the action that we want to automate. In this instance we want to try to restart the Altiris service when the rule detects that it has stopped. To do this, click the New button and complete the fields as shown in the image below.
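The difference between the two reset states used in this guide ('Incident acknowledgment' in Demo 1 and "Update Metric Value" here) can be sketched as a small state machine. This is a simplified model for explanation only; the class, method and mode names are assumptions, not product internals.

```python
class RuleState:
    """Tracks the dashboard state of one rule under a given reset mode."""

    MODES = ("incident_acknowledgment", "update_metric_value")

    def __init__(self, reset_mode: str) -> None:
        if reset_mode not in self.MODES:
            raise ValueError(f"unknown reset mode: {reset_mode}")
        self.reset_mode = reset_mode
        self.state = "Normal"

    def on_metric(self, condition_met: bool) -> None:
        """Process a new metric value."""
        if condition_met:
            self.state = "Critical"
        elif self.reset_mode == "update_metric_value":
            # The state follows the metric: a healthy value clears it.
            self.state = "Normal"
        # With incident acknowledgment, a healthy value alone changes nothing.

    def acknowledge(self) -> None:
        """An operator acknowledges the incident, which clears the state."""
        self.state = "Normal"


power = RuleState("incident_acknowledgment")
power.on_metric(True)    # cable pulled
power.on_metric(False)   # cable plugged back in
print(power.state)       # still Critical until acknowledged

service = RuleState("update_metric_value")
service.on_metric(True)  # service stopped
service.on_metric(False) # service running again
print(service.state)     # back to Normal without operator input
```

For a self-healing scenario such as an automatic service restart, following the metric value keeps the dashboard in step with reality, whereas acknowledgment suits hardware faults that someone should have seen.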
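As a hedged illustration of the kind of command such an action might run on the monitored computer, the sketch below builds a standard Windows "net start" command line. It does not reproduce the fields from the image, and "ExampleService" is a placeholder for the real service name.

```python
import subprocess
from typing import List


def restart_command(service_name: str) -> List[str]:
    """Return the command line that starts a stopped Windows service."""
    return ["net", "start", service_name]


def restart_service(service_name: str) -> int:
    """Run the start command and return its exit code (Windows only)."""
    completed = subprocess.run(restart_command(service_name), check=False)
    return completed.returncode


# Only build and show the command here; nothing is executed.
print(" ".join(restart_command("ExampleService")))
```

Passing the command as a list of arguments rather than one string means a service name containing spaces does not need extra quoting.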

Testing the Rule

To test the rule you first need to update the agent's configuration so that it downloads the policy we just created. You can then go to your Monitor Solution dashboard, where you should see that everything is normal.

Now open Services under Administrative Tools and stop the Altiris express Server service as shown below.

If your policy is configured correctly, after a few moments you should see the dashboard change to show the critical error.

If you click on the glasses icon you will see the triggered rules that affected the machine. On the page you should see the name of the rule for the service that you created earlier. Because you have configured the rule to try to restart the service automatically, the service may restart and the dashboard will then go back to a normal state.

You can now play around with the solution and create your own monitor rules that monitor other services or processes that are running.

Comments

Apr 18, 2008 07:21 PM

In the first definition I needed to change numeric to string...
Thanks
Dom

Apr 18, 2008 07:16 PM

Hello,
I am following the process, but before the Configure Rule Metric screen I get a Rule Configuration screen with a General tab:
Name: Altiris Deployment Services
Type: Based on Metric
Rule Metric I click New and get the Configure Rule Metric BUT
Metric Type: WMI
Metric : Altiris Deployment Services
Polling Interval: 30 seconds
Statistic: None
Time Period greyed out
Condition: I only have Equal To, Greater than, Greater than or equal to, Less than, Less than or equal to and Not equal to, which seem to refer to numeric values!
Value type: Constant
Value: it only allows numbers, as expected with the condition not having the proper statement.
"Please enter a numeric value"
It seems that a step to define alpha, alphanumeric, numeric or whatever is missing?
Thanks
Dom

Sep 13, 2007 05:26 PM

Suggested Dell Monitor 6.1 "out of box" rules that can be triggered by user initiated Dell hardware failures are listed below. Thanks goes out to Eric Szewczyk for providing this:

  • Dell Voltage Status Combined (Changed in OMSA) (manual)
  • Temperature has exceeded upper critical threshold (May be able to set in OMSA for PE2650?) (manual)
  • Minimum temperature probe warning threshold value changed (manual)
  • Maximum temperature probe warning threshold value changed (manual)
  • Array disk removed (manual)
  • Array disk inserted (manual)
  • Array disk degraded (automatic)
  • Array disk rebuild started (automatic)
  • Array disk rebuild completed (automatic)
  • Array disk offline (automatic)
  • Array disk(s) have been removed from a virtual disk. The virtual disk will be in a Failed state during the next system reboot. (manual)
  • Array disk(s) that are part of a virtual disk have been removed while the system was shut down. The removal was discovered during system start-up. (manual)
  • Virtual disk degraded (automatic)
  • Virtual disk check consistency started (automatic)
  • Virtual disk check consistency completed (automatic)
  • Virtual disk rebuild started (automatic)
  • Virtual disk rebuild completed (automatic)
  • A virtual disk and all of its member array disks have been removed while the system was shut down. This removal was discovered during system start-up. (manual)
  • A system BIOS update has been scheduled for the next reboot (will need to suppress reboot) (manual)
  • Redundancy is offline (pull power cable from ONE of redundant power supplies) (manual)
  • Redundancy regained (plug power cable back in) (manual)
  • AC power has been restored (automatic)
  • Dell Chassis Intrusion Status Combined Log (remove chassis cover) (manual)
  • Chassis intrusion returned to normal (automatic)
  • Server Administrator starting (try stopping the OMSA service and then restart) (manual)
  • Server Administrator startup complete (automatic)
  • A device has been inserted (try inserting a USB thumb drive?) (manual)
  • An enclosure blink operation has initiated (use sample job from C:\Training folder) (manual)
  • An enclosure blink has ceased (automatic)
  • Asset tag changed (BIOS configuration) (manual)
