The Importance of Classifying Incidents, Service Calls
By Kathiresan Lakshmanan
Proper Incident classification is a crucial step in determining which tasks are most important and who should handle them.
Incident and Change Management are two processes that were widely used, in some form, even before ITIL entered the picture. Most organizations have at least a simple event monitoring tool in place to watch the health and performance of their IT infrastructure.
ITIL discusses in depth how the Incident Management process should be implemented and followed. One such discussion concerns Event Classification (Prioritization and Categorization). Even organizations that have implemented ITIL, with tools that are properly implemented, customized and finely tuned, still struggle to classify incidents. Common questions include:
- How do you differentiate between an Alarm, Incident, Event, Alert, Symptom and Problem?
- Why is Event and Incident categorization (or classification) so important, and how is it linked to the other ITIL processes?
- What is the best practice for standardizing Event and Incident classification?
Difference between Alarm, Incident, Event, Alert, Symptom and Problem
These terms are often used interchangeably. Any change in the status of an IT infrastructure component can be termed an Event; Alarms and Alerts are essentially the same as events. A change in status, or a metric crossing a threshold in either direction, can result in an alarm, event or alert. Some monitoring tools also use the terms Symptom and Problem, where symptoms are events that can point to a problem, the root cause behind all of those symptoms.
When it comes to ITIL, everyone knows what an Incident and a Problem are and how they differ. Two kinds of tools are widely in use: Event Monitoring Tools, which use the terms Event, Alarm, Alert, Incident and Problem, and Service Management Tools, which use Incident and Problem. Every event is a potential Incident, but many ask whether every event generated by the monitoring tools should become an Incident. The general consensus is that not every event in the infrastructure should necessarily end up as an Incident.
Consider the following example: a reading of 80% CPU utilization in a 5-minute polling interval can be an Event. If this condition persists continuously for 15 minutes, it can become an Incident. If that Incident is closed with a workaround, or the same Incident keeps recurring over a period of time, it can be a potential Problem. The bottom line is that not every event should be opened as an Incident; open an Incident only when some action will be taken on it.
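The CPU example above can be sketched as a small promotion rule from Event to Incident to potential Problem. This is a minimal illustration, not how any particular monitoring or service management tool works. It assumes that "80% CPU" means a reading at or above 80%, that three consecutive 5-minute polls stand in for "15 minutes continuous", and that three matching closed Incidents count as "repeating". The `ClosedIncident` record and its fields are hypothetical.

```python
from dataclasses import dataclass

# Assumed values based on the example in the text; tune for your environment.
CPU_THRESHOLD_PCT = 80.0   # assumption: a reading >= 80% counts as a breach
POLL_INTERVAL_MIN = 5      # the monitoring tool polls every 5 minutes
SUSTAINED_MIN = 15         # breach must persist this long to become an Incident
POLLS_NEEDED = SUSTAINED_MIN // POLL_INTERVAL_MIN  # 3 consecutive polls


def classify_samples(samples: list[float]) -> str:
    """Classify CPU samples (one per poll) as 'none', 'event' or 'incident'."""
    consecutive = 0
    saw_event = False
    for value in samples:
        if value >= CPU_THRESHOLD_PCT:
            saw_event = True          # any single breach is an Event
            consecutive += 1
            if consecutive >= POLLS_NEEDED:
                return "incident"     # sustained breach: open an Incident
        else:
            consecutive = 0           # breach interrupted, start counting again
    return "event" if saw_event else "none"


@dataclass
class ClosedIncident:
    ci: str                       # hypothetical: configuration item, e.g. a server name
    symptom: str                  # hypothetical: e.g. "cpu_high"
    closed_with_workaround: bool


def is_problem_candidate(history: list[ClosedIncident], ci: str, symptom: str,
                         repeat_threshold: int = 3) -> bool:
    """Flag a potential Problem if a matching Incident was closed with a
    workaround, or if it has recurred at least `repeat_threshold` times
    (the threshold of 3 is an assumption)."""
    matches = [i for i in history if i.ci == ci and i.symptom == symptom]
    return any(i.closed_with_workaround for i in matches) or len(matches) >= repeat_threshold


print(classify_samples([85, 82, 70]))      # event
print(classify_samples([85, 82, 91, 60]))  # incident
```

Counting consecutive polls rather than averaging keeps a single spike from opening an Incident, which matches the article's advice to open Incidents only when action will follow.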
Why is Incident and Event Classification so important, and how is it linked to other ITIL Processes?
ITIL is about adopting best practices and continuously improving on the current state so that you can serve customers better and increase customer satisfaction. Process improvement should help meet, and ideally exceed, customer expectations as defined in the Service Level Agreement (SLA). Because Availability is directly linked to the SLA, and in turn to the Incident Management process, the ability to categorize and classify incidents is of utmost importance.
Proper categorization of Incidents helps route them to the right team the first time. Accurate routing is very important in the Incident Management process because it saves a great deal of troubleshooting time and brings the service back to normal sooner. One of the KPIs (Key Performance Indicators) of the Incident Management process is how many times an Incident bounced between teams. Categorization also makes it possible to analyze Incidents by classification for proactive Problem Management, which in turn helps reduce the number of Incidents.
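The bounce KPI described above can be computed from each Incident's assignment history. The sketch below is illustrative only: it assumes each Incident is represented as an ordered list of the teams it was assigned to (a hypothetical data shape, not any tool's export format), counts a "bounce" as a move to a different team, and also totals bounces per category to hint at which categories are routing poorly.

```python
from collections import Counter


def bounce_count(assignment_history: list[str]) -> int:
    """Count how many times an Incident moved to a different team.

    `assignment_history` is the ordered list of assignment groups the
    Incident passed through (a hypothetical data shape)."""
    return sum(
        1 for prev, cur in zip(assignment_history, assignment_history[1:]) if prev != cur
    )


def first_time_right_rate(incidents: dict[str, list[str]]) -> float:
    """Fraction of Incidents resolved by the first team they were routed to."""
    if not incidents:
        return 0.0
    unbounced = sum(1 for history in incidents.values() if bounce_count(history) == 0)
    return unbounced / len(incidents)


def bounces_by_category(incidents: dict[str, list[str]],
                        categories: dict[str, str]) -> Counter:
    """Total bounces per category; high totals hint at poor categorization."""
    totals: Counter = Counter()
    for incident_id, history in incidents.items():
        totals[categories[incident_id]] += bounce_count(history)
    return totals


# Hypothetical sample data: Incident ID -> teams it was assigned to, in order.
incidents = {
    "INC001": ["Network"],
    "INC002": ["Unix", "Database", "Storage"],
    "INC003": ["Windows", "Windows"],
}
categories = {"INC001": "Network", "INC002": "Server", "INC003": "Server"}

print(first_time_right_rate(incidents))            # about 0.667
print(bounces_by_category(incidents, categories))  # Server: 2, Network: 0
```

Grouping bounces by category connects the routing KPI to the Problem Management analysis mentioned in the text: a category that repeatedly bounces is a candidate for better categorization rules.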
Proper Incident classification is also essential for deciding which Incidents to work on first. Because this directly affects meeting the SLAs, the classification scheme needs careful thought. If two identical IT failures occur at the same time, one affecting a Business critical service and the other a Business important service, proper classification lets the First Level Analyst work on the Business critical Incident first, helping to prevent a business loss and an SLA violation. In this sense, Incident prioritization is directly related to the penalties in the SLA.
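The scenario above, with two identical failures arriving at the same time, amounts to a work queue ordered by service criticality. The sketch below uses Python's `heapq` as a priority queue. The numeric ranks for the two tiers named in the text, the tie-break on opening time, and the Incident IDs and service names are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical ranking of the service tiers named in the text; lower is worked first.
CRITICALITY_RANK = {"business critical": 1, "business important": 2}


@dataclass(order=True)
class QueuedIncident:
    rank: int                                # compared first: service criticality
    opened_at: float                         # tie-breaker: older Incidents first
    incident_id: str = field(compare=False)
    service: str = field(compare=False)


def build_queue(incidents: list[tuple[str, str, str, float]]) -> list[QueuedIncident]:
    """Build a min-heap from (incident_id, service, tier, opened_at) tuples."""
    heap: list[QueuedIncident] = []
    for incident_id, service, tier, opened_at in incidents:
        heapq.heappush(heap, QueuedIncident(CRITICALITY_RANK[tier], opened_at,
                                            incident_id, service))
    return heap


def next_incident(heap: list[QueuedIncident]) -> str:
    """Pop the Incident the First Level Analyst should pick up next."""
    return heapq.heappop(heap).incident_id


# Two identical failures at the same moment (hypothetical IDs and services).
queue = build_queue([
    ("INC100", "Payroll", "business important", 0.0),
    ("INC101", "Online Orders", "business critical", 0.0),
])
print(next_incident(queue))  # INC101
```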
Incident classification is, in one way or another, linked to other processes such as Problem, Availability and Service Level Management.
Best Practice for Standardizing Event, Incident Classification
Incidents have two main classifications:
- Prioritization
- Categorization
Prioritization is usually done at the Service Desk by Service Desk Analysts. For Incidents opened by tools, it can be automated to some extent.
The Impact and Urgency of an Incident together determine its Priority. Severity comes from the events identified by the monitoring tools and can be mapped directly to the "Impact to the Business."
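Combining Impact and Urgency into Priority is often done with a lookup matrix, which is one way tool-opened Incidents can be prioritized automatically. The sketch below uses a common 3x3 layout that yields Priority 1 (highest) to 5; the exact matrix values and the mapping from monitoring severities (`critical`, `major`, `minor`, `warning`) to Impact are assumptions, since each organization defines its own.

```python
# Example Impact x Urgency matrix (1 = highest priority). The values follow a
# commonly used 3x3 layout; your organization's matrix may differ.
PRIORITY_MATRIX = {
    ("high", "high"): 1,   ("high", "medium"): 2,   ("high", "low"): 3,
    ("medium", "high"): 2, ("medium", "medium"): 3, ("medium", "low"): 4,
    ("low", "high"): 3,    ("low", "medium"): 4,    ("low", "low"): 5,
}

# Hypothetical mapping from monitoring-tool severities to business Impact.
SEVERITY_TO_IMPACT = {
    "critical": "high",
    "major": "high",
    "minor": "medium",
    "warning": "low",
}


def priority_for_event(severity: str, urgency: str) -> int:
    """Derive a Priority from an event's Severity (used as Impact) and an Urgency."""
    impact = SEVERITY_TO_IMPACT[severity.lower()]
    return PRIORITY_MATRIX[(impact, urgency.lower())]


print(priority_for_event("Critical", "high"))  # 1
print(priority_for_event("minor", "low"))      # 4
```

Keeping the matrix as data rather than nested `if` statements makes it easy to adjust when the SLA changes.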
Severity and Priority are very commonly used interchangeably. Most often, Sev1 to Sev5 are used to prioritize incidents, and SLAs are commonly signed based on this prioritization. Sev1 is the most critical and Sev5 the least; some tools use Priority 1 to 5 instead. Others use prioritization codes such as None, Low, Medium, High, Critical and Global.
Since Severity from the event management tools is used to identify Impact, it is better to use the ITIL term Priority for the prioritization part of the classification.
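When several tools label prioritization differently, as described above, it can help to normalize their labels onto a single Priority 1 to 5 scale before reporting against the SLA. The sketch below assumes that the tools' Sev1 to Sev5 and Priority 1 to 5 labels are being used as priorities (1 most critical). The text lists six word codes for five levels, so the mapping of word codes here is purely hypothetical; which codes share a level is a local decision.

```python
import re

# Hypothetical mapping of word codes onto the 1 (most critical) to 5 scale.
# Six codes share five levels, so "global" and "critical" both map to 1 here.
WORD_TO_PRIORITY = {
    "global": 1, "critical": 1, "high": 2, "medium": 3, "low": 4, "none": 5,
}


def normalize_priority(label: str) -> int:
    """Map 'Sev2', 'P2', 'Priority 2' or a word code onto one Priority 1-5 scale."""
    text = label.strip().lower()
    # Numeric forms such as "sev1", "severity 3", "p5" or "priority 2".
    match = re.fullmatch(r"(?:sev|severity|p|priority)\s*([1-5])", text)
    if match:
        return int(match.group(1))
    if text in WORD_TO_PRIORITY:
        return WORD_TO_PRIORITY[text]
    raise ValueError(f"unrecognized priority label: {label!r}")


for label in ["Sev1", "Priority 3", "P5", "Medium"]:
    print(label, normalize_priority(label))
```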
What factors influence the Priority?
- Impact - the severity of the Incident, a measure of its impact on the business
- Urgency - how much delay can be tolerated in fixing the issue, i.e. how quickly it must be resolved
- Customer importance - e.g. a call from the CEO
- Resources required to fix the issue
- Potential cost of non-resolution
- Disruption of service to the customer
Note: Impact is not about technical complexity; it is about the effect on the business if the service is not restored in time.