Improving Incident Management
Incident Management is predictably fixable, writes ITSM Watch columnist George Spafford of Pepperweed Consulting.
How IT reacts to incidents will be pivotal not just to operations, in their drive to reduce mean time to repair and increase mean time between failures, but also to customer satisfaction.
Incident resolution does follow an observable lifecycle. Consider the following six stages:
Now, when we look at each of the above steps of the expanded lifecycle, we can look for process improvement opportunities. This approach allows for greater scrutiny because there is a model to mentally walk stakeholders through. Let's step through each of the steps now:
First, there isn't much we can do about an incident occurring, so let's start with step two: the detection of incidents. One approach is to identify deviations from standard operational norms through automated alerts. This is a necessary reactive stance aimed at identifying that something has occurred, such as a service on a server failing.
The second approach is more proactive and involves the use of identified thresholds to send alerts and/or alarms. Some monitoring tools allow for multiple levels of events. Thus, a warning alert may be sent at 90% of disk utilization and then a critical alarm sent at 98%.
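The two-level threshold idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `WARNING_PCT`, `CRITICAL_PCT`, `classify_disk_usage`, and `disk_used_pct` are hypothetical, the 90% and 98% values are the ones from the example in the text, and a real monitoring tool would supply its own configuration and alert delivery.

```python
import shutil
from typing import Optional

# Hypothetical threshold values, taken from the example in the text.
WARNING_PCT = 90.0
CRITICAL_PCT = 98.0


def classify_disk_usage(used_pct: float) -> Optional[str]:
    """Map a disk utilization percentage to an event level.

    Returns "critical", "warning", or None when no threshold is crossed.
    """
    if used_pct >= CRITICAL_PCT:
        return "critical"
    if used_pct >= WARNING_PCT:
        return "warning"
    return None


def disk_used_pct(path: str = "/") -> float:
    """Read the current utilization of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    level = classify_disk_usage(disk_used_pct("/"))
    if level is not None:
        # Printing stands in for whatever alerting mechanism is in use.
        print(f"{level} event: disk utilization threshold crossed")
```

Checking the critical threshold first ensures that a disk at 99% raises a critical alarm rather than a second warning.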
The last opportunity involves monitoring trends and making management decisions accordingly. The intent is not to use one of the three to the exclusion of the others. Instead, a blended approach should be pursued to prevent incidents in the first place and to detect them effectively and efficiently when they do transpire.
First Things First
When diagnosing incidents, the most important first question to always ask is, "What changed?"
People not schooled in the linkages between changes and incidents don't always ask it. We know statistically that an incident is typically preceded by some change in state. If that information can be detected through automated tools and then shared with Incident Management, diagnosis will speed up, with both the first fix rate and MTTR metrics improving along with customer satisfaction.
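As a rough sketch of what sharing change data with Incident Management could look like, the Python below lists the changes applied to an affected configuration item shortly before an incident, newest first. Everything here is assumed for illustration: the `Change` record, its fields, the `recent_changes` function, and the 24-hour default window are hypothetical and do not come from any particular ITSM tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Change:
    """Hypothetical change record, as a change-tracking tool might export it."""
    change_id: str
    ci: str                # configuration item the change touched
    applied_at: datetime   # when the change was applied


def recent_changes(
    changes: List[Change],
    ci: str,
    incident_time: datetime,
    window: timedelta = timedelta(hours=24),  # assumed look-back period
) -> List[Change]:
    """Return changes to `ci` applied within `window` before the incident.

    The result is sorted newest first, so the most likely suspect is on top.
    """
    earliest = incident_time - window
    candidates = [
        c for c in changes
        if c.ci == ci and earliest <= c.applied_at <= incident_time
    ]
    return sorted(candidates, key=lambda c: c.applied_at, reverse=True)
```

Called with the incident's timestamp and the affected configuration item, this gives the person diagnosing the incident a short list that answers "What changed?" before any manual digging starts.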
In addition to interfacing change data to Incident Management, the problem-solving skills of personnel can be improved through education in the configuration items they are responsible for and by tracking incidents and resolutions developed by vendors.