8 Steps to Better Incident Classification
Incident classification is one of the most important and yet least implemented aspects of ITIL, writes ITSM Watch columnist Hank Marquis of itSM Solutions.
Still, many IT organizations struggle with classification, as evidenced by the number of incidents coded as unknown or other. Each such code means that classification has failed, leading to more downtime and decreased service quality.
There are several relatively quick and easy fixes for this problem, and below I describe 8 simple steps to improve incident classification.
Classification is understanding, identifying, and quantifying affected systems. Effective classification helps route the incident to the correct team. Classification starts to go wrong when diagnostic scripts become too complex.
While extremely valuable, scripts require diligent management effort. Trying to collect massive amounts of data through dozens of questions slows the process down, complicates the workflow, and results in incomplete classification.
The lesson is to keep your diagnostic scripts as simple and purposeful as possible.
To improve classification, examine the following 8 areas:
Use diagnostic scripts. Use scripts to standardize and formalize incident classification. Without a repeatable process like a script, you cannot reliably classify incidents. Without good, easy-to-use scripts, you will not obtain the management information required to maintain their effectiveness.
Classify by configuration item (CI), not symptoms. The classic, often repeated mistake is to classify an incident based on what the user says (i.e., thinks) is the problem.
This is a recipe for bouncing incidents. Symptoms change, can be misleading, and quite often the user honestly does not know what they are experiencing.
Instead, collect information (No.1 above) and base the decision on the affected service, system, etc. Do record the symptoms in a comments field, but know that different users report different symptoms for the same incident. Classifying by symptom is worst practice, not best practice.
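As a rough illustration of classifying by CI, the sketch below groups user reports by the affected configuration item and keeps each symptom only as a comment. The `Report` type, the CI names, and the symptom strings are hypothetical and not taken from any particular tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Report:
    ci: str       # affected configuration item, established via the diagnostic script
    symptom: str  # what the user said; recorded as a comment, never used as the key


def group_by_ci(reports):
    """Group user reports by affected CI, keeping symptoms as comments."""
    grouped = defaultdict(list)
    for report in reports:
        grouped[report.ci].append(report.symptom)
    return dict(grouped)


# Hypothetical data: two users describe the same outage differently.
reports = [
    Report("mail-server-01", "Outlook is frozen"),
    Report("mail-server-01", "I can't send email"),
    Report("vpn-gateway", "Remote access is slow"),
]
grouped = group_by_ci(reports)
```

Keyed on the CI, the two differently worded mail reports land in the same classification, whereas keying on the symptom text would have produced two unrelated records.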
Classify incidents, not calls. Logging calls is a very different activity from classification and initial support. Call logging simply gathers route data a specialist will use later. If you are logging calls, then do not bother trying to perform incident classification.
Keep it simple. Review all diagnostic scripts often to make sure they are not too complicated. Good enough is perfect. Too much time spent trying to make a perfect script often results in something too difficult and long-winded for staff to complete in a reasonable amount of time.
Always make sure you include other or unknown as diagnostic codes. These codes are indicators that your scripts need maintenance.
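One way to act on these indicator codes is to track how often they appear. The following is a minimal sketch, assuming incident codes are available as plain strings; the 10% threshold is an arbitrary placeholder, not a recommended value.

```python
from collections import Counter

CATCH_ALL = {"other", "unknown"}


def catch_all_rate(codes):
    """Return the fraction of incidents coded as 'other' or 'unknown'."""
    if not codes:
        return 0.0
    counts = Counter(code.strip().lower() for code in codes)
    return sum(counts[c] for c in CATCH_ALL) / len(codes)


def needs_maintenance(codes, threshold=0.10):
    """Flag the scripts for review when catch-all codes exceed the (assumed) threshold."""
    return catch_all_rate(codes) > threshold


# Hypothetical sample of classification codes from closed incidents.
sample = ["email", "Other", "network", "unknown", "email"]
```

Here two of the five sample incidents carry a catch-all code, so `needs_maintenance(sample)` would flag the scripts for review.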
Use a service catalog. If you have a service catalog, use it. If you do not have one, consider implementing one. Service catalogs can dramatically improve the speed of incident classification as you have to collect less information. This improves data accuracy in the incident record, and assists in routing, escalation, and support.
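To show why a catalog reduces the information you have to collect, here is a small sketch in which naming the service is enough to fill in the routing and escalation details. The catalog contents, team names, and field names are all hypothetical.

```python
from typing import NamedTuple, Optional


class CatalogEntry(NamedTuple):
    service: str
    support_team: str        # where the incident is routed
    escalation_contact: str  # who is notified on escalation


# Hypothetical service catalog keyed by service name.
CATALOG = {
    "email": CatalogEntry("email", "messaging-team", "messaging-manager"),
    "vpn": CatalogEntry("vpn", "network-team", "network-manager"),
}


def classify(service_name: str) -> Optional[CatalogEntry]:
    """Look up a service; None means the catalog has no entry for it."""
    return CATALOG.get(service_name.strip().lower())
```

A `None` result is a natural place to fall back to the other or unknown code rather than guessing.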
Use your tools. Most of the automated software tools available today provide well-implemented incident classification features, but you have to use them! Check the capabilities of your systems and, where possible, put them to work. Remember tip No.4.
Take maturity into account. You can't win the Super Bowl with a high school football team. For new organizations, or those without strong process controls, you will need to change slowly over time. You need to assess your maturity to establish your expectations. Don't try to take your high school team to the Super Bowl.
Validate your scope. It is very important to realize that not every single event that occurs warrants an incident. It is easy to set your scope too wide or too narrow.
Too wide, and every normal automated system event can become an incident, swamping your staff and systems. Too narrow, and you are not delivering the highest value.
A good rule-of-thumb is to raise an incident only if there is some action required. This means that normal diagnostic messages on throughput, utilization and so on should not be incidents. Set your scope carefully for highest performance.
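The rule of thumb above can be sketched as a simple filter over system events. The `Event` type and its `action_required` flag are assumptions for illustration; in practice that flag would have to come from your monitoring tool or an operator.

```python
from dataclasses import dataclass


@dataclass
class Event:
    source: str
    message: str
    action_required: bool  # assumed to be set upstream by monitoring or an operator


def incidents_from(events):
    """Keep only the events that call for some action; the rest stay as plain events."""
    return [event for event in events if event.action_required]


# Hypothetical events: a routine utilization message and a genuine fault.
events = [
    Event("router-1", "utilization 42%", False),
    Event("db-02", "disk full, writes failing", True),
]
raised = incidents_from(events)
```

Only the `db-02` event becomes an incident; the routine utilization message is left out of the incident queue.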