April 24, 2008
By George Spafford
Don't let incidents and problems blemish the veracity of your change management process, writes ITSMWatch columnist George Spafford of Pepperweed Consulting.
The ITIL change management process is tasked with balancing the risk of making a change against the impacts to the business of not making a change. While this process has earned a bad reputation when not properly implemented, it is an enabler that can increase speed and agility by reducing levels of unplanned work associated with failed changes.
A key metric to track for oversight of the process is the change success rate (CSR). When the process is employed correctly, it becomes effective, and we should see correlations between improvements in the change success rate, a decline in availability-related incidents and, in turn, reductions in unplanned work.
In pursuit of this metric, we need to establish how it is computed. It is generally accepted that the CSR is a percentage: the number of successful changes divided by the total number of changes, with the result multiplied by 100. As with many things, the devil is in the details, because a great many definitions exist of what constitutes “successful changes” and “total changes”.
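As a minimal sketch of the calculation just described, the function below computes the CSR as a percentage. The function name and the input checks are illustrative assumptions, not part of any ITIL definition, and the sketch deliberately leaves open what counts as a "successful" or "total" change.

```python
def change_success_rate(successful_changes: int, total_changes: int) -> float:
    """Return the change success rate (CSR) as a percentage.

    Hypothetical helper: the caller decides what counts as a
    "successful" change and which changes make up the total.
    """
    if total_changes <= 0:
        raise ValueError("total_changes must be a positive number")
    if not 0 <= successful_changes <= total_changes:
        raise ValueError("successful_changes must be between 0 and total_changes")
    # Successful changes divided by total changes, expressed as a percentage.
    return 100 * successful_changes / total_changes


# Example: 45 successful changes out of 50 total.
print(change_success_rate(45, 50))  # 90.0
```

The arithmetic is trivial; the hard part, as noted above, is agreeing on the two inputs.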
In terms of successful changes, some groups will say a change is successful if it “fixes” what is broken, and others will add that changes shouldn’t create incidents or problems. While the latter part is acceptable, the former is not, because it mixes processes rather than isolating change management.
Change management is fundamentally a process designed to manage risks.
Incident management is tasked with dealing with deviations from standard operations, or things that may threaten standard operations, by opening an incident record in the configuration management database (CMDB). In situations where the root cause is unknown and/or a major incident is underway, a problem record may also be opened. Both of these processes have their own metrics.
With change management we need to focus on the changes – not the incidents, problems or other process areas. We need to understand if the changes were planned correctly and were then implemented according to plan.
To better focus the metric on change management, a change is successful if it was implemented according to plan without creating incidents or problems. We do not want to cause overlapping measures by requiring that the related incident be resolved. Resolution isn’t the point of the change management process.
If incidents and problems are not resolved by a change, then those respective records remain open and they can be related through fields in the CMDB. For example, if the change record RFC123 is opened in the CMDB, it can be related to incident records INC001 and INC002 and to problem record PRB004.
If the change is implemented per plan without any issues but the two incident and one problem records that required the change are not resolved, then that is not a failing of the change management process. In fact, the change was implemented successfully based on our aforementioned criteria.
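The success criterion above can be sketched in a few lines. The `ChangeRecord` class and its field names are hypothetical, chosen only to mirror the RFC123 example; the point is that the records which prompted the change are linked to it but play no part in the success test.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChangeRecord:
    """Hypothetical change record; field names are illustrative only."""
    change_id: str
    implemented_per_plan: bool
    # Records that prompted the change: linked, but not part of the success test.
    related_records: List[str] = field(default_factory=list)
    # Incidents or problems created by the change itself.
    caused_records: List[str] = field(default_factory=list)


def is_successful(change: ChangeRecord) -> bool:
    """A change succeeds if it went per plan and created no incidents or problems."""
    return change.implemented_per_plan and not change.caused_records


# RFC123 went per plan and caused nothing new, so it counts as successful
# even though INC001, INC002 and PRB004 may still be open.
rfc123 = ChangeRecord(
    "RFC123",
    implemented_per_plan=True,
    related_records=["INC001", "INC002", "PRB004"],
)
print(is_successful(rfc123))  # True
```

Whether the related records were resolved would be tracked by the incident and problem management metrics instead.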
Upon reading the above-referenced article, the question remains: what metrics are important for measuring the effectiveness of the Change Management process?
Mr. Spafford is partially correct in his assessment that measuring the success rate of changes is an important measurement in determining the effectiveness of your process. He also correctly indicates that Change Management is largely about Risk Management.
What the author does not clearly indicate is the very close relationship between Change Management and Incident Management. The "risk" that Change Management seeks to mitigate is the risk of service unavailability. Incidents impact availability. In fact, if a service experiences unavailability, that is always a result of an Incident.
Therefore, contrary to the assertion put forth by Mr. Spafford, Change Management should be very concerned about the occurrence of incidents: not in order to resolve them, but as an indicator of the effectiveness of the Change Management process.
It follows that if there is an increase in the number of incidents following a change, there is probably something wrong with the process. The risk is not being properly mitigated.
This information should be used to make the proper adjustments to the process to bring the risk down to an acceptable level. Ideally, there should be no incidents following changes, because the change has been properly planned, tested and coordinated, and its risk minimized.
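One way to look for this signal is to compare the number of incidents opened shortly before a change with the number opened shortly after it. The sketch below is only an illustration: the function name, the seven-day window and the sample timestamps are all assumptions, and a rise after a change suggests a link but does not prove that the change caused the incidents.

```python
from datetime import datetime, timedelta


def incidents_around_change(change_time, incident_times, window=timedelta(days=7)):
    """Count incidents opened in the window before and the window after a change.

    Hypothetical helper; the 7-day default window is an arbitrary assumption.
    """
    before = sum(1 for t in incident_times if change_time - window <= t < change_time)
    after = sum(1 for t in incident_times if change_time <= t < change_time + window)
    return before, after


# Made-up timestamps for illustration only.
change_time = datetime(2008, 4, 24, 12, 0)
incident_times = [
    datetime(2008, 4, 20, 9, 0),   # inside the "before" window
    datetime(2008, 4, 24, 15, 0),  # inside the "after" window
    datetime(2008, 4, 25, 8, 30),  # inside the "after" window
    datetime(2008, 5, 10, 8, 0),   # outside both windows
]

before, after = incidents_around_change(change_time, incident_times)
print(before, after)  # 1 2
```

In practice the incidents would be filtered to those linked to the same configuration items as the change, so that unrelated incidents do not distort the comparison.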
Lastly, as the CMDB is a repository for information on Service Components (CIs), it is unlikely that Changes, Incidents or Problems will be recorded there. They will all be recorded in their corresponding repositories and linked to each related CI.
To read the article in its entirety, visit ITSMWatch