
How to Mitigate the Risk of Failed Changes, Part II

Success is based on stated change objectives, writes ITSMWatch columnist George Spafford.
Jan 4, 2010

George Spafford

In the first article, I looked at the risks of ignoring change processes and the failures associated with doing so. In this article, I consider what really constitutes change success. The issue is that it is important to avoid commingling data types. A change represents a modification to a production state. It is not tasked with fixing an incident, for example, but with balancing the risks of a change to the operational IT environment.

A change may be related to an incident, and the intent may be to fix one or more incidents, but that is not what we must measure. Instead, a change must have one or more defined objectives that require a change in state, and the plan must formally define how those objectives will be achieved.

Imagine that a change is about replacing a problem DLL on a Windows server that caused an incident. The change objective should be to replace the DLL and the plan should outline what needs to be done. It is the CMS that must relate this change to the appropriate incidents. This segregation enables analysis and reporting.

The reason for this is that a change can be successful without directly resolving an incident. On one hand, it may take a number of successful changes to resolve some incidents, and each needs a meaningful objective that isn’t just “resolve incident 123”. On the other hand, it is also possible for a change to go in successfully per the objective and plan but not resolve the incident as expected.
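As a minimal sketch of this separation, the Python below judges a change only against its own objectives while the related incident stays open. It assumes hypothetical record types (`Change`, `Objective`, `Incident`) and a plain list of ID pairs standing in for the relationships a CMS would hold. None of these names come from ITIL or any particular tool.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Objective:
    # Hypothetical: one stated objective of a change and whether it was met.
    description: str
    achieved: bool = False


@dataclass
class Change:
    # Hypothetical change record; it carries objectives, not incident outcomes.
    change_id: str
    objectives: List[Objective] = field(default_factory=list)

    def succeeded(self) -> bool:
        # A change with no stated objectives cannot be judged a success.
        return bool(self.objectives) and all(o.achieved for o in self.objectives)


@dataclass
class Incident:
    # Hypothetical incident record with its own, separate status.
    incident_id: str
    resolved: bool = False


def incidents_for(change_id: str, links: List[Tuple[str, str]]) -> List[str]:
    # Look up related incident IDs from the relationship list.
    return [inc for chg, inc in links if chg == change_id]


# Relationships live outside both records (a stand-in for the CMS).
links = [("CHG-1", "INC-123")]

change = Change("CHG-1", [Objective("Replace the problem DLL on the server", achieved=True)])
incident = Incident("INC-123", resolved=False)

print(change.succeeded())            # the change met its objective
print(incidents_for("CHG-1", links)) # the incident it relates to
print(incident.resolved)             # the incident may still be open
```

Keeping the success test on the change and the resolution flag on the incident means a report can count this change as successful and still show the incident as unresolved.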

Change vs. Incident

Now, to some people, the idea of even a single change completing successfully but not resolving an incident may seem ludicrous. In groups where there is significant segregation of duties, however, it is very conceivable that another team makes an error and gives incorrect instructions to the group actually planning and implementing the change.

To illustrate this, we need a scenario: Imagine that someone in incident management does a quick review and says that “the error is with object XYZ v1 and needs to be replaced with XYZ v2”. The person managing the change plans it perfectly, the change is introduced into production according to plan, and the objective of replacing XYZ v1 with XYZ v2 is met. However, the incident is still there, and when people investigate further, they find that two other objects were the real cause and raise a new change.

From a process improvement perspective, we need to understand that the problem isn’t with Change Management or maybe even Release and Deployment Management. There was an error on the part of Incident Management or Problem Management to relay the correct need. Confusing the different records and stating that a change failed when it actually did not could make continual improvement difficult, not to mention penalize the wrong group.

How you handle these scenarios given your capabilities is something to think through. The main point is to recognize that a change needs objectives and success or failure is based on whether these objectives are attained.

Normalized, accurate and timely information is vital. To emphasize the point of the previous section, IT management needs timely and accurate information to make the right decisions. Change-related data needs to be in change records, and relationships to other record types such as Incident, Problem, Capacity, Availability, Release & Deployment and so on must then be established and managed. Data that is buried in the wrong records or the wrong fields will hamper effective and efficient analysis and reporting.

Reviewing Failed Changes

Organizations should review failed changes at least weekly to understand the causal factors. Indeed, reviewing failed changes should be part of each change advisory board’s (CAB’s) standing agenda. It’s not enough to say “Oh, these changes failed” and leave it at that. Instead, Continual Service Improvement (CSI) resources should be engaged, potentially with Problem Management, to understand the root cause and implement a solution. Corrective actions must take place, or the processes and IT will not improve and will repeat the same mistakes.

Organizations need to firmly define what a failed change is in order to improve. Care must be taken to ensure that records are completed and related correctly to aid in future analysis and management decision making. Does someone need training? Is something not documented? Is there a problem with a tool? Does this server need to be rebuilt according to standards? These questions, and many more, must be answered; otherwise the organization is doomed to an endless loop of incidents and wasteful levels of unplanned work. Failed changes will be many times worse if the organization fails to identify them for what they are and then doesn’t learn from them.
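The weekly review described above can be sketched as a small report that pulls the failed changes closed in the last seven days and tallies them by recorded cause. This is only an illustration: the record layout, the field names (`closed`, `failed`, `cause`) and the cause labels are assumptions, not fields from any specific change management tool.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical change records; field names and cause labels are illustrative.
changes = [
    {"id": "CHG-1", "closed": date(2010, 1, 4), "failed": True, "cause": "training"},
    {"id": "CHG-2", "closed": date(2010, 1, 5), "failed": False, "cause": None},
    {"id": "CHG-3", "closed": date(2010, 1, 6), "failed": True, "cause": "documentation"},
    {"id": "CHG-4", "closed": date(2010, 1, 7), "failed": True, "cause": "training"},
    {"id": "CHG-5", "closed": date(2009, 12, 20), "failed": True, "cause": "tooling"},
]


def failed_in_window(records, as_of, days=7):
    # Keep failed changes closed within the `days` before `as_of` (inclusive of as_of).
    start = as_of - timedelta(days=days)
    return [r for r in records if r["failed"] and start < r["closed"] <= as_of]


def causes(records):
    # Tally the recorded causal factor of each change.
    return Counter(r["cause"] for r in records)


review = failed_in_window(changes, as_of=date(2010, 1, 8))
for cause, count in causes(review).most_common():
    print(cause, count)
```

A tally like this only helps if the cause field is filled in consistently, which is why the records need to be completed and related correctly in the first place.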

It is only once an IT service is placed into operation that it can deliver value to the organization. Any changes made to that service carry potential risks. There are risks in terms of continuing to meet defined requirements and service levels, including the service warranty dimensions of confidentiality, integrity and availability. When the IT service is adversely impacted, IT goes into a reactive mode of trying to correct the incident, and this is, by definition, unplanned work that comes at the expense of planned work.

At the same time, there is a very real risk of not making the change and not enabling the business to pursue its goals and objectives. This tension between stability and agility is a very real and continuous balancing act. An important aspect of this balancing act relates to failed changes.

George Spafford is an experienced executive, a prolific author and speaker, and has consulted and conducted training on strategy, IT management, information security and overall process improvement globally. He can be reached at gspaff@hotmail.com.

