Improving Incident Management

Incident Management is predictably fixable, writes ITSM Watch columnist George Spafford of Pepperweed Consulting.
May 18, 2007

George Spafford

Incident Management is concerned with deviations from, and threats to, the standard operation of services. Over time, even the best services will have incidents.

How IT reacts to them will be pivotal not just to operations in their drive to reduce mean time to repair (MTTR) and increase mean time between failures, but also to customer satisfaction.

As a result, many IT departments strive to find opportunities to improve their Incident Management process. One approach is to understand the expanded Incident Management lifecycle and look for means to improve each stage.

Incident resolution does follow an observable lifecycle. Consider the following six stages:

  • Occurrence – Something happens to a configuration item (CI).
  • Detection – The incident is detected either by monitoring tools, IT personnel or, worst case, the user.
  • Diagnosis – The next step is to determine what has happened.
  • Repair – Then the CI needs to be corrected. This may be a true solution or a temporary work-around aimed at getting the user back to some degree of productive work.
  • Recover – The CI is then put back into production.
  • Restore – Finally the service is put back into production.

    Now, when we look at each of the above steps of the expanded lifecycle, we can look for process improvement opportunities. This approach allows for greater scrutiny because it gives stakeholders a model to walk through mentally. Let’s step through each stage now:

    First, there isn’t much we can do about incidents occurring, so let’s start with step two, the detection of incidents. One approach is to identify deviations from standard operational norms through automated alerts. This is a necessary reactive stance aimed at identifying that something has occurred, such as a service on a server failing.

    The second approach is more proactive and involves the use of identified thresholds to send alerts and/or alarms. Some monitoring tools allow for multiple levels of events. Thus, a warning alert may be sent at 90% of disk utilization and then a critical alarm sent at 98%.
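The multi-level thresholds described above can be sketched as follows. This is a minimal illustration, not the behavior of any particular monitoring tool: the function name and the severity labels are assumptions, and only the 90% and 98% figures come from the text.

```python
from typing import Optional

# Thresholds taken from the example in the text.
WARNING_PCT = 90.0
CRITICAL_PCT = 98.0


def classify_disk_utilization(used_pct: float) -> Optional[str]:
    """Map a disk utilization percentage to an alert level.

    Returns "critical", "warning", or None when no alert is needed.
    The function name and labels are hypothetical.
    """
    if not 0.0 <= used_pct <= 100.0:
        raise ValueError(f"utilization must be between 0 and 100, got {used_pct}")
    # Check the most severe level first so 98% and above never
    # gets reported as a mere warning.
    if used_pct >= CRITICAL_PCT:
        return "critical"
    if used_pct >= WARNING_PCT:
        return "warning"
    return None
```

A reading of 93.5 would produce a warning, while 99.0 would escalate to a critical alarm.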

    First Things First

    The last opportunity involves monitoring trends and making management decisions accordingly. The intent is not to use one of the three to the exclusion of the others. Instead, a blended approach should be pursued to prevent incidents in the first place and to detect them effectively and efficiently when they do occur.
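One way to picture trend monitoring is to project when a resource will cross a threshold, so that capacity can be added before an incident occurs. The sketch below assumes one disk utilization sample per day and a simple straight-line fit; the function name, the default 90% threshold, and the choice of a linear model are all illustrative assumptions rather than anything prescribed here.

```python
from typing import Optional, Sequence


def days_until_threshold(daily_pct: Sequence[float],
                         threshold_pct: float = 90.0) -> Optional[float]:
    """Estimate days until utilization reaches threshold_pct.

    daily_pct holds one sample per day, oldest first. A least-squares
    line is fitted and extended forward. Returns None when the trend
    is flat or falling, and 0.0 when the fitted line is already at or
    above the threshold on the most recent day.
    """
    n = len(daily_pct)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_pct))
    slope = sxy / sxx  # percentage points gained per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Day index where the fitted line meets the threshold,
    # measured relative to the most recent sample (index n - 1).
    crossing_day = (threshold_pct - intercept) / slope
    return max(0.0, crossing_day - (n - 1))
```

A projection like this supports a management decision (order storage, archive data) rather than an alert, which is what separates it from the threshold approach.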

    When diagnosing incidents, the most important first question to always ask is, “What changed?”

    For persons not schooled in the linkages between changes and incidents, this question isn’t always asked. We know statistically that an incident is typically preceded by some change in state. If that information can be detected through automated tools and then shared with Incident Management, diagnosis will be faster, and both the first fix rate and MTTR metrics will improve along with customer satisfaction.
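As a rough sketch of how change data might be shared with Incident Management, the example below lists the changes made to an affected CI shortly before an incident, newest first. The `ChangeRecord` fields, the `recent_changes` function, and the 72-hour lookback window are hypothetical and are not drawn from any specific tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class ChangeRecord:
    """A hypothetical change log entry for one configuration item."""
    ci: str
    applied_at: datetime
    summary: str


def recent_changes(changes: Iterable[ChangeRecord],
                   ci: str,
                   incident_at: datetime,
                   lookback: timedelta = timedelta(hours=72)) -> List[ChangeRecord]:
    """Answer "What changed?" for one CI.

    Returns changes applied to the CI within the lookback window that
    ends at the incident time, most recent first.
    """
    window_start = incident_at - lookback
    matches = [
        c for c in changes
        if c.ci == ci and window_start <= c.applied_at <= incident_at
    ]
    # The newest change is listed first because it is usually the
    # first candidate a diagnostician will want to examine.
    return sorted(matches, key=lambda c: c.applied_at, reverse=True)
```

Handing this short list to whoever is diagnosing the incident gives them a starting point before any deeper investigation begins.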

    In addition to feeding change data to Incident Management, the problem-solving skills of personnel can be improved through education in the configuration items they are responsible for and by tracking incidents and resolutions developed by vendors.
