'Availability' Is No Longer Just an IT Metric
How you measure system availability says everything about your IT ops.
Mickey Zandi, managing principal, Consulting Services, at SunGard Availability Services, agreed. "Uptime is always driven by the business and supported by IT," he said. "To determine availability metrics, we first interview the business and then the IT team. We identify the core business measures for success, which typically revolve around revenue, cost and profit. Next, we identify what are the infrastructure components that drive those mission-critical applications and measure the business impact of downtime."
This focus on what the end user wants hasn't always been the defining factor in calculating system availability. Mouline explained that, in the past, service availability concentrated on specific areas of infrastructure. If a server was pingable, it was working. "Whether a server is up is interesting," he said, "but not necessarily relevant from a business perspective."
Steve Shalita, VP Marketing at NetScout, has also seen a shift in the perception of availability.
"Outright failures are fairly rare these days," he explained. "Things are being built to avoid outages, equipment is built to maintain standards of availability. Many view degradation in the same way as an outage, and it can be much more impactful."
Previously, Shalita added, downtime was almost always caused by network problems. Today, he sees many more issues with application or server configuration, although "the network" still gets blamed for performance problems. Availability measures have to take into account every element that contributes to the user experience.
Measures of availability
Once availability is defined in a way that is meaningful to you, it's then possible to measure it. The standard approach is to express it as a percentage of time. "The holy grail in the enterprise is five nines," said Shalita. That's 99.999% available. The small window when services are not available equates to a little over five minutes per year.
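To make the arithmetic concrete, here is a minimal Python sketch that converts an availability target into the downtime it permits each year. The function name `allowed_downtime_minutes` and the 365-day year are assumptions made for illustration; they do not come from the people interviewed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare two, three, four and five nines.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.1f} minutes per year")
```

Each extra nine cuts the allowance by a factor of ten: three nines permits roughly 8.8 hours a year, while five nines leaves about 5.3 minutes.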
"Response time is another measure," said Mouline. "How long did it take for the transaction to go through?" This is important because consistency is what counts for users. If it takes you half a second to process a transaction today and three seconds tomorrow, you'll soon start to feel that the system is unreliable.
"Measurement should be based on the business need," said Chris O'Connell, director of Marketing at Nimsoft, Inc. "For example, for office workers at their desks, the work peak is typically between 9 a.m. and 5 p.m. That being the case, they need the best possible response time and availability during that time, depending on the application. Another example that clarifies where specific application prioritization plays a role would be at the end of a quarter at any given company, the financial tools may be given priority over other types of applications. The customer should be able to easily set and adjust priorities as necessary, based on business requirements."
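One way to read the nine-to-five example is to count only the part of an outage that falls inside the working day. The sketch below does that under some loud assumptions: the function name `business_minutes_down` is hypothetical, the window is fixed at 9 a.m. to 5 p.m., weekends and holidays are treated like any other day, and timestamps are naive local times.

```python
from datetime import datetime, time, timedelta

BUSINESS_START = time(9, 0)   # assumed start of the working day
BUSINESS_END = time(17, 0)    # assumed end of the working day


def business_minutes_down(start: datetime, end: datetime) -> float:
    """Minutes of an outage that overlap the 9-to-5 window on each day it spans."""
    total = 0.0
    day = start.date()
    while day <= end.date():
        window_start = datetime.combine(day, BUSINESS_START)
        window_end = datetime.combine(day, BUSINESS_END)
        # Clip the outage to this day's business window.
        overlap_start = max(start, window_start)
        overlap_end = min(end, window_end)
        if overlap_end > overlap_start:
            total += (overlap_end - overlap_start).total_seconds() / 60
        day += timedelta(days=1)
    return total


# An outage from 8:30 to 9:45 only costs 45 business minutes.
print(business_minutes_down(datetime(2024, 3, 4, 8, 30), datetime(2024, 3, 4, 9, 45)))
```

An outage that starts and ends in the evening scores zero here, which is the point of measuring against the business need rather than the clock.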
These days, however, it is rare for a business to operate only between nine and five. Now that flexible working and a global customer base are commonplace, IT teams have less and less time in which to schedule maintenance work.
"The traditional IT team mentality was there is a maintenance window to change or update systems," said Zandi. "Maintenance windows are luxuries that no longer apply in the 7x24x365 global business environment. Maintenance needs to be done transparently to the business."
Fortunately, manufacturers have given us options for working in this environment. "Network-level vendors have kit that allows for upgrades without disrupting operations," Shalita explained.
Mouline believes that there is still the option of having a maintenance window for consumer-facing applications. Where the expectation of availability is 100%, scheduling a time for maintenance may be the only option. Any planned downtime should be well communicated and as infrequent as possible. Other applications don't have these restrictions: a trading application, for example, only needs to be used during trading hours, so users' expectations of availability will be different. Understanding expectations of availability helps define when maintenance can be planned.
Calculating the impact
Unplanned downtime has a massive business impact. Figures from Alinean show that outages in a messaging system can cost around $1,000 a minute. Downtime for trading applications can cost up to $40,000 a minute, so the tolerance for downtime differs between mission-critical systems and others.
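As a rough illustration of how those per-minute figures scale, the following Python sketch multiplies them by the length of an outage. The dictionary keys and the function name `outage_cost` are assumptions for this example, and the trading figure is the quoted upper bound, so the result is a ceiling rather than an estimate.

```python
# Per-minute costs quoted above (figures from Alinean).
COST_PER_MINUTE = {
    "messaging": 1_000,   # "around $1,000 a minute"
    "trading": 40_000,    # "up to $40,000 a minute", so an upper bound
}


def outage_cost(system: str, minutes_down: float) -> float:
    """Approximate cost in dollars of an outage lasting minutes_down minutes."""
    return COST_PER_MINUTE[system] * minutes_down


# Compare the same 30-minute outage on both kinds of system.
for system in COST_PER_MINUTE:
    print(f"{system}: ${outage_cost(system, 30):,.0f}")
```

The same half-hour outage comes to about $30,000 for messaging and as much as $1.2 million for trading, which is why the two kinds of system warrant different availability targets.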
It may take some time to define appropriate measures for system availability, but there is one thing everyone's clear on: the opinion of the IT department is not important.
"There should be only one way that matters: availability in the mind of the customer and end-users' perception and experience," said O'Connell. "We measure by the users' experience, because that's reality of the situation."
Elizabeth Harrin is Computer Weekly's IT Blogger of the Year 2010. She is also director of The Otobos Group, a business writing consultancy specializing in IT and project management. She's the author of "Social Media for Project Managers" and "Project Management in the Real World". She has a decade of experience in IT and business change functions in healthcare and financial services, and is ITIL v3 Foundation certified.