
The Capacity Manager And Their Complex Juggling Act

By Adam Grummitt
May 23, 2005


This article is about virtualization, consolidation, grids, blades, utility computing, ITIL and other ghosties; the reality for the Capacity Manager.

There is an old Cornish prayer from the West Country of England:
From ghoulies and ghosties,
And long-leggedy beasties
And things that go bump in the night,
Good Lord, deliver us!
The Information and Communication Technology (ICT) data-center ideally would not fear ghosties or have anything that goes bump in the night, as it would be entirely self-managed. Such ideas for exploiting computational intelligence have been around for some time but are yet to be fully realized. Meanwhile, other ideas that exploit newly emerging technology are gradually becoming popular.

In the list of topics above that a Capacity Manager could face, some are real; some are less so; but not all are "ghosties" or "vaporware."

Computing has always been a complex juggling act. Essentially it is a problem of interfacing devices that perform at hugely different speeds. Processors, disks, controllers, networks, memory, bus, cache and all the other variables in the ICT architecture have to be configured, each with its own bandwidth and performance characteristics. The exercise must then be repeated for all the demands placed on the system by a large number of different users performing different functions. The key performance issue remains one of balancing them all to minimize the inevitable bottlenecks in such a disparate environment.
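The balancing act above can be made concrete with a small sketch based on the operational laws of performance analysis. All device names and figures here are invented for illustration; they are not from the article.

```python
# Hypothetical sketch: given per-device service demands (seconds of device
# time needed per transaction) and a transaction arrival rate, compute each
# device's utilization and flag the bottleneck. Names and numbers are
# illustrative only.

def utilizations(service_demands, arrival_rate):
    """Utilization of each device = arrival rate x service demand (utilization law)."""
    return {dev: arrival_rate * demand for dev, demand in service_demands.items()}

def bottleneck(service_demands):
    """The device with the largest service demand saturates first."""
    return max(service_demands, key=service_demands.get)

demands = {"cpu": 0.010, "disk": 0.030, "network": 0.005}  # sec per transaction
util = utilizations(demands, arrival_rate=20.0)            # 20 transactions/sec
# Here the disk reaches 60% utilization and is the bottleneck device.
```

However the real configuration is arranged, the same reasoning applies: the device with the highest demand per unit of work caps the throughput of the whole system.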

The idea of virtualized computing is not new, having long been used to describe a form of virtual data management in storage, or virtual Operating Systems (OSs) for multiple users with logical partitioning capabilities, or emulation of one OS on another. This was used for sharing development, testing and production instances on the same hardware. Virtualization can now be applied at different levels where a server can support multiple instances of OS images. Similar techniques can be used to separate the logical from the physical by virtualization of data (SAN) and networks (LAN/WAN).

Virtualized computing is essentially the practice of automating consolidation of ICT resources. This allows them to be managed as a "pool" of like devices (servers, storage or networking capacity) that can be allocated on the fly. This can be achieved by multiple machines with special load balancers in front of them or other techniques where a virtualization software layer sits on top of multiple hardware systems, making them look like a single pool of computing power.

In the mainframe computer world with IBM's VM (now called z/VM) OS, each of many simultaneous users of the mainframe (physical server) is given apparent control of the entire computer with access to an OS and all other needed resources.

VMware is the major software vendor providing server virtualization for Windows and Linux systems with Intel-powered servers. Microsoft offers its own Virtual Server product.

Consolidation is the current manifestation of what used to be called "down-sizing" or "right-sizing," reflecting the prevalent economic mood. In essence, it is viewed as having just the right number of servers (or file storage or databases), the right number of locations (centralization vs. subsidiarity), the right number of rationalized applications and the right number of data repositories.

The centralization pendulum has swung to and fro. Originally, there was a desire to eliminate large central processing complexes (the dreaded "mainframe") at all costs, and to implement all ICT services on small "open systems."

Now, it would seem, many are coming back to the realization that central control of a few large servers, as opposed to dispersed control of many small servers, can have major advantages.

Grid computing is a type of distributed computing in which a wide-ranging network connects multiple computers whose resources can then be shared by all end-users in what is often called "peer-to-peer" computing. It has emerged from the scientific and technical computing segment and is now maybe on the verge of more widespread commercialization.

The principal focus of Grid Computing to date has been on "scavenging" - maximizing the use of available processor resources for compute-intensive applications and thus taking advantage of unused capacity within a network.
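The scavenging idea can be sketched in a few lines: hand compute-intensive work only to machines whose current utilization leaves spare capacity. The host names, threshold and load figures below are invented for the example.

```python
# Illustrative sketch of "scavenging": assign jobs to hosts whose current
# CPU utilization is below a threshold, least-loaded first. All names and
# figures are hypothetical.

def idle_hosts(host_utilization, threshold=0.25):
    """Return hosts with spare capacity, least-loaded first."""
    spare = [(u, h) for h, u in host_utilization.items() if u < threshold]
    return [h for u, h in sorted(spare)]

def assign(tasks, host_utilization):
    """Round-robin tasks over the currently idle hosts."""
    hosts = idle_hosts(host_utilization)
    return {t: hosts[i % len(hosts)] for i, t in enumerate(tasks)} if hosts else {}

load = {"pc-01": 0.05, "pc-02": 0.90, "pc-03": 0.10}
plan = assign(["job-a", "job-b", "job-c"], load)
# pc-02 is busy, so the three jobs are spread over pc-01 and pc-03.
```

A production grid scheduler adds reservation, checkpointing and data movement on top, but the core decision is this filter-and-distribute step.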

Blade Servers and Blade PCs are an emerging hardware option. They are the latest manifestation of modular computing, which has been a long-standing goal. Many early approaches amounted in essence to rack mounting rather than pedestal- or cabinet-based devices. Now the move is to blades within the rack. This is applied both to complete servers with all their interfaces and to complete PC systems, with the user having just a keyboard, mouse (or equivalent) and monitor - a thin client, or essentially back to the old distributed dumb terminals.

In a blade chassis, slim, hot-swappable blade servers fit like books in a bookshelf and each is an independent server, with its own processors, memory, storage, network controllers, OS and applications. The blade server simply slides into a bay in the chassis and plugs into a mid or backplane, sharing power, fans, floppy drives, switches, and ports with other blade servers.

Utility Computing
Utility computing is all about supply and demand. It is aimed at getting computing resources working together as one entity, balancing workload demands to available resources. Technologies like Web Services, rapid server provisioning, storage virtualization and network route-optimization technologies tend to fuse servers, storage and network links under a centralized control. The utility computing vision is a new data center architecture built on standards, commodity components and consolidated control. The idea is that ICT systems are shared, self-managing and efficient.

In the initial stage nearly every company attempted to create its own terms for similar ideas in this area. IBM chose "on demand", HP picked "adaptive" and "utility", Microsoft preferred "agility," and Forrester chose the word "organic." Now most are happy to just say "utility computing." However, it is now more confusing as to what each vendor means by utility computing. Some refer solely to data-center sharing, some to dynamic provisioning, and some offer a pure pay-as-you-go financing option for outsourcing. Most are referring to a variable mix of ideas.

There are four main approaches to utility computing. Any given organization might use none, one, more than one or conceivably all of them.

  1. The first approach is a provisioning system that can move work among multiple servers as required. Hardware may also be installed with extra capacity built in (but not turned on), allowing that capacity to be brought online at short notice.
  2. The second approach is service virtualization, as already discussed.
  3. The third approach is outsourcing. A service provider can help a user handle peaks in its computing needs.
  4. Partitioning is the fourth main approach to utility computing: buy one big box, partition it to run different applications and different OSs or OS levels if necessary, and change partitions on the fly.
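The first approach above, built-in standby capacity, reduces to a simple calculation: how many dormant units must be switched on to cover a demand peak? The function and figures below are a toy model, not any vendor's provisioning API.

```python
# Toy model of "capacity on demand": when demand exceeds the active
# capacity, activate standby units (installed but not turned on).
# Units and numbers are invented for illustration.

def provision(demand, active_capacity, standby_units, unit_size):
    """Return how many standby units to activate to cover demand (capped
    at the number of units actually installed)."""
    shortfall = demand - active_capacity
    if shortfall <= 0:
        return 0                          # active capacity already suffices
    needed = -(-shortfall // unit_size)   # ceiling division
    return min(needed, standby_units)

# 140 units of demand against 100 active; standby comes in blocks of 25,
# so two standby blocks must be switched on.
extra = provision(demand=140, active_capacity=100, standby_units=4, unit_size=25)
```

The commercial subtlety is in the pricing of those dormant units, not in the arithmetic; the calculation itself is this simple.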
ITIL
This at least has a clear definition. It is the Information Technology Infrastructure Library. It is a set of books which define Good Practice within ICT, with emphasis on IT Service Management (ITSM).

The UK Office of Government Commerce issues procedures under the ITIL title (see www.itil.co.uk). Developed in the late 1980s, ITIL started as a guide for computing in the UK public sector. Metron played a leading part in the early Demonstrator project for it. Today, ITIL is known and used worldwide (see www.itsmf.net).

The core IT Service Management (ITSM) books define Service Support and Service Delivery. These two key ITSM books define ten processes (and one function: the Service Desk or Help Desk). One of these processes (in Service Delivery) is Capacity Management, which embraces both computer performance management and capacity planning.

Capacity Management
The essential purpose of Capacity Management is to provide sufficient computing capacity to satisfy the needs of the business. This discipline is increasingly needed: despite rapid hardware price-performance improvements, the aspirations of end-users and the resource demands of application software are growing even more rapidly, so total ICT expenditure is also increasing.

Essentially the activities defined within ITIL as Capacity Management include:

  • Performance management (monitoring, accounting, tuning etc.)
  • Capacity planning (workload characterization and trending, configuration planning etc.)
  • Performance forecasting (modeling and future performance prediction etc.)
Irrespective of the hardware architecture or virtualization approach, performance of a system is key to the user (once you have system availability and continuity). The essential matter is to find the key metrics that reflect the performance of the system and capture them. Certainly every real instance of an OS will yield useful measures in terms of resources required at system level as well as per user or command or process. These can be captured and maintained in a Capacity Management Database, which can then be used to characterize the workload and assess the potential impact of changes in those workloads.
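Workload characterization from such a database can be sketched as a roll-up of per-process samples into totals per workload class. The record layout and sample figures below are invented; real capacity databases hold far richer metrics.

```python
# Hypothetical sketch: roll per-process samples from a capacity management
# database up into resource use per workload class, so the impact of a
# change in any one workload can be assessed. Record layout is invented.

from collections import defaultdict

def characterize(samples):
    """Sum CPU seconds and I/O counts per workload class."""
    totals = defaultdict(lambda: {"cpu_s": 0.0, "io": 0})
    for s in samples:
        totals[s["workload"]]["cpu_s"] += s["cpu_s"]
        totals[s["workload"]]["io"] += s["io"]
    return dict(totals)

samples = [
    {"workload": "batch",  "cpu_s": 120.0, "io": 4000},
    {"workload": "online", "cpu_s": 45.5,  "io": 900},
    {"workload": "batch",  "cpu_s": 60.0,  "io": 1500},
]
profile = characterize(samples)
# profile now gives total resource use per workload class; scaling a class
# (e.g. "what if online traffic doubles?") is then simple arithmetic.
```

Once the profile exists, a "what if" question becomes a multiplication of one class's demands before re-checking device utilizations.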

Beyond this, other metrics can be collected from data sources as available to reveal key patterns within applications or devices such as a SAN to relate their activity to the given workloads. Thus end-to-end pictures and correlations can be derived which help to pin-point the root cause of performance problems.

There is a danger of becoming obsessed with the details of the underlying architecture. What is at issue is the performance of the solution as experienced by the end-user and ensuring that the service is maintained at the required level. Collect performance data on a regular basis and analyze it. Identify what is key, measure it, publish it, evaluate it and act upon what it foretells.
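Acting on what the data foretells can be as plain as fitting a trend to regular samples of a key metric and projecting when a threshold will be crossed. The least-squares fit below is a minimal sketch with invented figures; real forecasting would also weigh seasonality and planned business change.

```python
# Minimal trend sketch: least-squares fit over regularly spaced samples of
# a key metric, projecting when a threshold would be crossed. Data invented.

def linear_fit(ys):
    """Slope and intercept of y over sample index 0..n-1."""
    n = len(ys)
    xs = range(n)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def periods_until(ys, threshold):
    """Samples from now until the fitted trend reaches the threshold,
    or None if the trend is flat or falling."""
    slope, intercept = linear_fit(ys)
    if slope <= 0:
        return None
    return max(0.0, (threshold - intercept) / slope - (len(ys) - 1))

cpu_util = [0.50, 0.54, 0.58, 0.62, 0.66]   # weekly peak CPU utilization
weeks_left = periods_until(cpu_util, threshold=0.90)
# The fitted trend crosses 90% about six weeks from the last sample.
```

The point is not the statistics but the discipline: a published number of "weeks of headroom left" turns raw monitoring data into something the business can act upon.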

Adam Grummitt has been playing with computers since graduating from Cambridge way back. Doing research in mass spectrometry he used the first Digital PDP-8 in the UK and early IBM mainframes.

He has since been an analyst, designer and programmer for end-users, software houses and as a consultant. He has been in performance management and capacity planning for many years, specializing in strategic studies, application trials and performance engineering.

He has served time in Health Computing (first in an NHS Integrated Patient Record Project, then in an NHS Regional Datacentre, then in a health computing software house) and in more recent years he has done a significant amount of tactical and strategic Capacity Management consultancy for the NHS in the UK (with Metron).

He was a founding Director of Metron and he is now responsible for the roadmap, consultancy and international partners.