System Outage Incident Process
In our environment, we are required to notify the IT department when a system becomes unavailable. Over time, this has produced many different messages with varying amounts of information. The messages often omit the cause of the outage, and frequently there is no follow-up communication. This also leaves no way to report on how many outages we've had and for which applications and services. Altiris provides functionality that not only captures and delivers this information but also lets us automate much of the work.
Our process begins with the creation of a Quick Incident. We heavily use these preconfigured incident templates for speed and consistency in the information we gather when creating a ticket. Also, in a crisis, we often don't have time to fill out all of the necessary details and route the incident accordingly. For an outage, we want our users to create an incident with the same information documented each time.
We set the title to specific verbiage that instructs the user to name the incident with just the system, service or application name. The definition of "system" (for this Quick Incident) is any hardware, software, networking component or task that negatively impacts the abilities of our customers to get their work done. In order for management to know how long the system has been unavailable, we also want them to set the outage duration at that point. The category is specified and the priority, urgency and impact fields are set to the most critical. (We've added the values of enterprise, site, department and individual to our Impact field, so it may look different than yours). The comment field requires detailed information about the issue, cause, corrective actions and the approximate duration.
Once our IT worker fills out all the required information and saves the incident, a Notify Rule is launched based upon defined criteria. Here's how we set up our notification rule:
Name: ITS System Availability Notification
Comment: This notify rule is triggered when it sees 'UNAVAILABLE' in the title.
Send e-mail: ITS System Availability Issue
Send pager e-mail: [none selected]
Send to:
  To: ITDivision@domain.com
  Exclude editing worker: No
  Also send to other e-mail addresses: No
When: Every time incident is saved
And: When ANY of these is TRUE
  "Title" contains "UNAVAILABLE" (case sensitive)
  "Version" is equal to "1"
  <end>
Visible: Yes
Locked: No
Default: Yes
Active: Yes
Since we pre-configure our Quick Incident with the word "UNAVAILABLE" in the title, it's easier to require the word to be all caps. The case-sensitive match ensures that not every incident with the word "unavailable" in its title fires off an e-mail to the entire IT Division. We also include the version so that subsequent updates are not sent out via e-mail. An outage incident often requires several updates, and in some cases those updates go to a variety of different people. At this point, we update the division only AFTER the situation has been resolved.
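How the two conditions combine is worth checking before a rule like this goes live. The following is a minimal Python sketch that models the rule, not Altiris code: the `Incident` class and the `rule_fires` function are hypothetical names used only for illustration. It evaluates the title and version conditions under both the ANY and ALL settings.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical stand-in for a Helpdesk incident (illustration only)."""
    title: str
    version: int


def rule_fires(incident: Incident, mode: str) -> bool:
    """Evaluate the two notify-rule conditions under 'ANY' or 'ALL'."""
    conditions = [
        "UNAVAILABLE" in incident.title,  # case-sensitive substring match
        incident.version == 1,            # true only on the first save
    ]
    if mode == "ALL":
        return all(conditions)
    if mode == "ANY":
        return any(conditions)
    raise ValueError(f"unknown mode: {mode!r}")


new_outage = Incident("Payroll UNAVAILABLE", version=1)
updated_outage = Incident("Payroll UNAVAILABLE", version=2)
unrelated_new = Incident("Printer jam in room 12", version=1)

for inc in (new_outage, updated_outage, unrelated_new):
    print(inc.title, inc.version, rule_fires(inc, "ANY"), rule_fires(inc, "ALL"))
```

In this model, ALL fires only on the first save of an outage incident, while ANY also fires on later updates to the outage and on every newly created incident, whatever its title.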
After the rule is launched, the e-mail is sent out with the required information. We now have an incident that tracks exactly which system is down, and we can document our progress towards a resolution. If the Help Desk begins to take calls relating to the system, these tickets can be linked back to this primary outage incident. Again, this helps to determine impact and provides information we will want to report on in the future.
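To give a rough idea of the kind of report this makes possible, here is a small Python sketch. It assumes the outage incidents have somehow been exported as simple records of system name, outage duration in hours, and number of linked incidents; the record layout, the sample values, and the `summarize` function are all hypothetical and are not part of Altiris.

```python
from collections import defaultdict

# Hypothetical sample records, made up for illustration:
# (system name, outage duration in hours, linked incidents)
outages = [
    ("Payroll", 2.0, 14),
    ("E-mail", 0.5, 30),
    ("Payroll", 1.5, 9),
]


def summarize(records):
    """Total the outage count, hours, and linked incidents per system."""
    summary = defaultdict(
        lambda: {"outages": 0, "hours": 0.0, "linked_incidents": 0}
    )
    for system, hours, linked in records:
        entry = summary[system]
        entry["outages"] += 1
        entry["hours"] += hours
        entry["linked_incidents"] += linked
    return dict(summary)


report = summarize(outages)
for system, stats in sorted(report.items()):
    print(system, stats)
```

Because every outage incident is titled with just the system name, grouping by that name is enough to see which systems go down repeatedly.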
The final steps in our process separate the final updates to the incident from the message we send out to tell our department that the system is available again. The original incident is edited and the following information is added to the comment field:
- The cause of the system outage
- The impact to our users
- The total duration of the Outage
- Total incidents associated/linked to this outage
- The steps that led to the system availability
The worker editing the incident then selects the e-mail button from within the incident. This is located between the View Asset Properties and the Incident Link button. Within an incident, we frequently click on the Send an e-mail when this incident is saved button. This will allow a separate e-mail with only the information typed in the "message field" to be included in the electronic communication. However, the incident history will contain the specifics of the outage and the steps to restore service. Many times this much detail is required for reporting purposes, but not necessary for updating our users or the IT department. After selecting the e-mail button, the "E-mail a message to specified recipients" screen will display.
Here you can choose to have the message sent to particular workers like the Assigned to worker or the worker who modified the incident. Multiple choices can be selected and will be sent after saving the incident. So, an e-mail can be sent to the contact for more information pertaining to that incident, to workers listed in the system, or to separate e-mail addresses.
In our example, we enter the name of an additional e-mail address in the "These Addresses" field. Our Exchange e-mail distribution list for the IT Division is entered so the text in the "message field" will be sent out as a follow up to our co-workers on the status of this outage. Attachments can be included with more detailed information or if your vendor has sent something to explain the issue. To do this, simply click on the "Upload new attachments" hyperlink and browse to the file you want to add. On the same screen, another option can include adding a hyperlink with a designated URL. There may be somewhere on the web that will include more information about what happened to your system or how it can be prevented in the future. You can add a description to your attachment and/or remove it if you need to add an updated version. Click on the "Done" button when you're finished.
An e-mail template must be used when sending this to the specified recipient. By default, only one template is displayed because it differs from the other e-mail templates. If you go to the Admin menu, E-Mail Templates, List E-Mail Message Templates, you'll notice something different about the Incident Correspondence template: it's an e-mail action instead of an e-mail notification. Put simply, an e-mail action is used for incident e-mail functions such as a worker sending a message from within the incident and logging it as part of the comment history for a ticket. More important, if you edit the Incident Correspondence e-mail template, you'll notice that it doesn't have tons of HTML displayed. Rather, it lists the "workitem_email_message" in the body of the template.
It's only going to display the text listed in the message field and then whatever text you add below. You can modify this information if you want. Simply copy the Incident Correspondence email template and add in the information you want to display. To customize what you want displayed, refer to the simple macros section of the Help Desk Solution Product Guide on page 248. They have many macros listed there with detailed descriptions and examples.
For our macro and based on our image, you would see our system outage explanation and thank you in the e-mail message. After saving the incident, the message will be sent. It will also keep the record of the email action in the incident history which can be referred to later. An example is listed below:
4 2/1/2008 10:06:03 AM Name work E mail
This incident was saved, but there were no changes detected.
--------
E-mail has been sent.
E-mail template: E-mail contact - Incident question
To: customercontact
Cc:
Bcc:
Attachments:
Message:Question for contact here.
Finally, you need to send out the procedure to your IT workers so the process can be followed consistently. We wanted to make sure everyone would start using the new procedure and that our documentation was simple and easy to understand. Below are the instructions for the process we send out to our users:
"ITS System Availability Issue" Quick Incident Steps:
Quick or recurring incidents are designed to help you expedite commonly reported problems.
- On the Helpdesk menu, go to Quick Incidents, List Quick Incidents.
- Scroll down and click on the "ITS System Availability Issue" Quick Incident (listed fourth from the bottom)
- Click the Run icon.
- Fill in the Contact (customer who reported the issue or your name)
- Fill in the Asset information
- In the Title field, remove the text between the square brackets and replace with the name of the system. (The word UNAVAILABLE must remain to activate the Altiris notification process.)
- In the Comment field, there are five fields that need to be filled out:
- Issue:
- Impact to Users:
- Cause:
- Correction:
- Outage Duration (hrs):
- Fill in the Application information and any remaining fields. By default, the incident status is set to closed.
- Click Save. Once you save the incident, the system will automatically send out an e-mail notification to the IT Division with the contents of the Comment field.
*Note: After initially saving the incident, any edits or updates made to this incident will not send notifications to the ITS Division.
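If you want to sanity-check incidents against these instructions, the rules can be expressed as a short Python sketch. This is only an illustration under stated assumptions: the function `check_outage_incident` is hypothetical, it works on plain title and comment strings rather than on Altiris data, and it assumes the five comment labels are Issue, Impact to Users, Cause, Correction, and Outage Duration (hrs), each on its own line.

```python
import re

# Assumed labels for the five Comment fields (illustration only).
REQUIRED_LABELS = [
    "Issue",
    "Impact to Users",
    "Cause",
    "Correction",
    "Outage Duration (hrs)",
]


def check_outage_incident(title: str, comment: str) -> list[str]:
    """Return a list of problems; an empty list means the incident looks complete."""
    problems = []
    if "UNAVAILABLE" not in title:
        problems.append("Title must contain UNAVAILABLE in capitals.")
    if "[" in title or "]" in title:
        problems.append("Replace the bracketed placeholder with the system name.")
    for label in REQUIRED_LABELS:
        # A field counts as filled when text follows its label on the same line.
        pattern = r"^[ \t]*" + re.escape(label) + r":[ \t]*\S"
        if not re.search(pattern, comment, flags=re.MULTILINE):
            problems.append(f"Comment field '{label}' is empty or missing.")
    return problems


good_comment = (
    "Issue: Login page times out\n"
    "Impact to Users: Enterprise\n"
    "Cause: Expired certificate\n"
    "Correction: Renewed certificate\n"
    "Outage Duration (hrs): 2\n"
)

print(check_outage_incident("Payroll UNAVAILABLE", good_comment))
print(check_outage_incident("[system name] UNAVAILABLE", "Issue:\nCause: unknown\n"))
```

The first call describes a complete incident; the second still has the bracketed placeholder in the title and leaves most of the comment fields empty, so it returns one problem per gap.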
Using Altiris for this entire process allows us to be consistent in reporting our events and outages. Not only can we document patterns of systems that repeatedly become unavailable, but we can also gauge how many customers were affected and how frequently. All of this replaces our old method of sending out e-mails that everyone kept in their inbox and that left no record of the unavailable system. Most important, we were able to combine several manual processes into one automated process, which saved us time and money. So, try it!






Looks good. One thing I want to point out, in case anyone wishes to use this: the "When" clause in the Notify Rule, where you set the conditions, should read:
"When ALL of these is TRUE"
If it's left as "When ANY of these is TRUE", then the notification will go out any time an incident's version is equal to "1". Basically, when ANY incident is created, regardless of the title, a notification e-mail will be sent.
Thanks for sharing though. I can see a lot of benefit for those using e-mail notifications!
System Outage Incident Process
Yes, you're correct...it should be ALL instead of ANY.
A good rule of thumb when modifying rules....think of all the possible outcomes for how it might be used...otherwise, they can sometimes produce results you don't expect....
In the end...it's worth it because they're a great way to automate the notification process. :)