
Incident Management and Automatic Alerting Tools

By Sturgisman, A. Kent
Feb 23, 2004
By ITSM Watch Staff





Much of the advice regarding Incident Management rests on an unstated assumption: that the main source of Incidents is calls to the Service Desk. Although this may be true for the great majority of lower-impact requests, different recommendations apply to organizations that have "Event Management" tools in place that automatically generate Incidents.

Event Management systems automatically raise alerts on exceptions to the expected level of service. Alerts are pre-classified and create Incident records either automatically or upon human initiation. These higher-impact Incidents are managed by the Incident Management team, with status updates fed back to the Service Desk.
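A minimal Python sketch of the pre-classification idea follows. The alert types, categories, priorities and the `on_alert` function are hypothetical illustrations, not part of any particular Event Management product. Each alert type maps to a rule that either opens an Incident record automatically or leaves it for an operator to initiate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlertRule:
    """Pre-classification for one alert type (hypothetical fields)."""
    category: str
    priority: int
    auto_create: bool  # True: open the Incident automatically; False: wait for an operator


@dataclass
class IncidentRecord:
    alert_type: str
    category: str
    priority: int
    source: str = "event-management"


# Hypothetical pre-classification table, keyed by alert type.
RULES = {
    "disk_full": AlertRule(category="Storage", priority=2, auto_create=True),
    "latency_high": AlertRule(category="Application", priority=3, auto_create=False),
}


def on_alert(alert_type: str, operator_approved: bool = False) -> Optional[IncidentRecord]:
    """Create an Incident record from a pre-classified alert.

    Returns None when the rule requires human initiation and no operator
    has approved it yet.
    """
    rule = RULES[alert_type]
    if rule.auto_create or operator_approved:
        return IncidentRecord(alert_type, rule.category, rule.priority)
    return None
```

Calling `on_alert("disk_full")` yields a record immediately. `on_alert("latency_high")` returns `None` until it is called again with `operator_approved=True`.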

To integrate optimally with the Incident, Change, and Problem Management processes, take the following key steps:
  1. Generate alerts only when an action needs to be taken. Alerts that trigger automatic correction can be logged and closed in the Incident Management system for record keeping and analysis, but there is no point in presenting them in real time.

    This requires careful requirements design, including definition of the Incident classification, notification, and error-correction actions. If one is fortunate enough to have a fully populated CMDB that the event management tool can navigate easily, have the tool determine which services are (or may be) affected. Where that is not available, define the affected service(s) in the alert notification/correction requirements.

  2. Where possible, all alerts should be collected in a single place for record keeping and uniform action as defined in the requirements.

  3. All alerts SHOULD be related to an Incident, a blackout (maintenance) window, or a Change (see blackout/maintenance). This allows multiple alerts to be tied to the same Incident or Change (or to an Incident that is related to an RFC).

  4. Record all alert events in the Incident Record, including the support team notification and response, and the alert clearing (logged automatically when the tool that detected the Incident also detects its resolution). An Incident that is closed before the Alert Clear event is logged is suspect.

  5. Establish an Alert Quality review practice: periodically review the alerts and Incidents raised over a specified period. Alerts without Incidents may indicate alerting noise or imperfect process execution.

    Incidents without alerts MAY point to opportunities to improve process execution, or to missed opportunities to instrument and alert so that errors are caught before there is business impact. Not all situations can be caught by instrumentation; this is particularly true of "personal space" Incidents.

  6. The Incidents-without-alerts situation can also reveal opportunities to monitor business data for business incidents as opposed to technical indicators for IT incidents.

  7. Mine the PMDB for alerting opportunities.
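Steps 1 and 3 to 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference design. The `Alert`, `Incident` and `Window` types, the matching of alerts to work by configuration item (CI) name and time, and the `INC-n` numbering are all hypothetical. A real tool would follow CMDB relationships rather than a simple CI-name match.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Alert:
    alert_id: str
    ci: str                                # configuration item that raised the alert
    raised_at: datetime
    cleared_at: Optional[datetime] = None  # time of the Alert Clear event, if any
    auto_corrected: bool = False
    related_to: Optional[str] = None       # Incident, Change or blackout-window reference


@dataclass
class Incident:
    incident_id: str
    ci: str
    closed_at: Optional[datetime] = None
    alert_ids: List[str] = field(default_factory=list)


@dataclass
class Window:
    """A blackout (maintenance) window or an implementing Change for one CI."""
    ref: str
    ci: str
    start: datetime
    end: datetime


def needs_action(alert: Alert) -> bool:
    """Step 1: auto-corrected alerts are logged and closed, not presented in real time."""
    return not alert.auto_corrected


def relate_alert(alert: Alert, incidents: List[Incident], windows: List[Window]) -> str:
    """Step 3: tie the alert to a window/Change, an open Incident, or a new Incident."""
    for w in windows:
        if w.ci == alert.ci and w.start <= alert.raised_at <= w.end:
            alert.related_to = w.ref
            return w.ref
    for inc in incidents:
        if inc.ci == alert.ci and inc.closed_at is None:
            break  # reuse the open Incident so repeat alerts do not spawn duplicates
    else:
        inc = Incident(incident_id=f"INC-{len(incidents) + 1}", ci=alert.ci)
        incidents.append(inc)
    inc.alert_ids.append(alert.alert_id)
    alert.related_to = inc.incident_id
    return inc.incident_id


def suspect_closures(incidents: List[Incident], alerts: List[Alert]) -> List[str]:
    """Step 4: closed Incidents whose alerts were not cleared before closure."""
    by_id = {a.alert_id: a for a in alerts}
    suspects = []
    for inc in incidents:
        if inc.closed_at is None:
            continue
        for aid in inc.alert_ids:
            cleared = by_id[aid].cleared_at
            if cleared is None or cleared > inc.closed_at:
                suspects.append(inc.incident_id)
                break
    return suspects


def alert_quality_review(alerts: List[Alert], incidents: List[Incident]):
    """Step 5: alerts without Incidents (possible noise) and Incidents without alerts."""
    incident_ids = {inc.incident_id for inc in incidents}
    # Alerts absorbed by a Change or blackout window are listed too; filter them if expected.
    alerts_without_incidents = [a.alert_id for a in alerts if a.related_to not in incident_ids]
    incidents_without_alerts = [inc.incident_id for inc in incidents if not inc.alert_ids]
    return alerts_without_incidents, incidents_without_alerts
```

In this sketch, two alerts on the same CI fold into one Incident. An alert raised inside a maintenance window is tied to the Change instead. Closing an Incident before all of its alerts have cleared marks it as suspect.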
Case Study:
In the fourth quarter of 2002, the event management system was implemented and began initiating Incidents. In that quarter, ~13% of all "non-personal space" Incidents were reported by the alerting system. A year later the average was ~42%. Two years later, 80% of all non-personal space Incidents were initiated by the alerting system.
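The case-study figure is a simple ratio. The following is a minimal sketch of how it might be computed. It assumes each Incident record carries two hypothetical flags: whether the alerting system initiated it, and whether it is a "personal space" Incident. The article does not say how "personal space" is recorded, so that flag is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class IncidentSummary:
    initiated_by_alert: bool
    personal_space: bool  # hypothetical flag; how "personal space" is classified is site-specific


def alert_initiated_share(incidents: Iterable[IncidentSummary]) -> float:
    """Percentage of non-personal-space Incidents initiated by the alerting system."""
    eligible = [i for i in incidents if not i.personal_space]
    if not eligible:
        return 0.0
    initiated = sum(1 for i in eligible if i.initiated_by_alert)
    return 100.0 * initiated / len(eligible)
```

Personal-space Incidents are excluded from both the numerator and the denominator, matching the way the case study states its percentages.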

A. Kent Sturgisman developed a career as a business and government computing professional for nearly 20 years before encountering IT Service Management in 1997. Since then he has studied and applied ITSM best practices to the benefit of his employer, a household name in financial services. He is pleased to have this opportunity to share his thoughts on using ITSM to achieve high flexibility, stability and quality at low risk and cost.



