Monday, March 03, 2008

Perfect Uptime, Is it Necessary?

I recently published a comment on TechRepublic in a discussion thread on the concept of 99.999% availability and perfect "uptime." I've excerpted the thread below.

Visit TechRepublic for the complete discussion.

You might have read about the recent (some call it annual) BlackBerry outage in North America or reports of Amazon’s S3 storage service being inaccessible for several hours just last month. As an IT professional, you may wonder how much downtime is considered acceptable or if perfect uptime is even possible.

...Now, everyone knows that ISPs oversubscribe their bandwidth. Guaranteed bandwidth is available, though, if you are prepared to pay for it. Where I live in Singapore, all ISPs offer “business Internet” connectivity that delivers pretty close to advertised speeds round-the-clock.

...However, they can cost 10 times or more what I pay as a consumer at home. The same goes for entry-level hosting plans with “shared” bandwidth. Similarly, if you require five 9’s of uptime, then be prepared to pay for it, be it in the form of redundant data centers, multiple Internet trunks, fail-over clusters, or even a couple of mainframe computers. Does the operation of your company require 99.999%, or even “perfect,” uptime?

My Comments:

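By now, we should all be aware that availability should be measured by service. This means that if several systems are necessary to deliver a service (email, Internet access), we should measure the combined uptime of all system components required to deliver that service, not each component in isolation. Because the service is down whenever any required component is down, the combined figure can be no better than that of the weakest component. In doing so, we can arrive at a predictable level of uptime.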
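To make the per-service measurement concrete, here is a minimal sketch in Python. It assumes the components are all required (a serial chain) and fail independently, in which case one common model multiplies their individual availabilities. The component names and figures are hypothetical, chosen only for illustration.

```python
from math import prod


def serial_availability(components):
    """Availability of a service that needs every listed component to be up.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return prod(components.values())


# Hypothetical availability figures for the pieces of an email service.
email_chain = {
    "mail_server": 0.999,
    "storage": 0.9995,
    "network": 0.999,
    "dns": 0.9999,
}

service = serial_availability(email_chain)
simple_average = sum(email_chain.values()) / len(email_chain)

print(f"combined (serial) availability: {service:.4%}")
print(f"simple average of components:   {simple_average:.4%}")
```

Under these assumptions the combined figure comes out lower than any single component, and lower than a simple average of the components would suggest.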

This is the baseline used to determine the improvements necessary to meet customers' requirements and expectations regarding availability. If a higher level of availability is required, then additional components (bandwidth, storage, memory) can be added to reach the desired level.
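One way to reason about how much to add is sketched below. It assumes the extra components are identical, redundant copies that fail independently, so the service is down only when every copy is down. That is a simplification (adding bandwidth or memory is a capacity change rather than pure redundancy), and the function names are my own.

```python
def parallel_availability(a, n):
    """Availability of n redundant copies, each with availability a.

    Assumes independent failures: the group is down only if all n are down.
    """
    return 1 - (1 - a) ** n


def replicas_needed(a, target):
    """Smallest number of redundant copies that meets the target."""
    if not (0 < a < 1) or not (0 < target < 1):
        raise ValueError("a and target must be strictly between 0 and 1")
    n = 1
    while parallel_availability(a, n) < target:
        n += 1
    return n


# Hypothetical baseline: a single component measured at 99% availability.
baseline = 0.99
for target in (0.999, 0.9999, 0.99999):
    print(f"target {target:.3%}: {replicas_needed(baseline, target)} copies")
```

The baseline measurement is the input here; without it there is no way to tell how far the current design is from the target.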

Additionally, availability is always relative to the timeframe in which it is measured.

This means that a service that requires 99.999% availability between the hours of 9 and 5, Monday through Friday, may differ considerably (in terms of architecture and resource requirements) from a service that requires the same level of availability 24/7.
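A quick calculation shows what the same percentage means over the two windows. This sketch assumes a 52-week year and treats 9-to-5, Monday through Friday, as 40 hours a week. The function name is hypothetical.

```python
def allowed_downtime_minutes(availability, hours_per_week, weeks=52):
    """Downtime budget, in minutes per year, inside the measured window."""
    measured_minutes = hours_per_week * weeks * 60
    return (1 - availability) * measured_minutes


five_nines = 0.99999
business_hours = allowed_downtime_minutes(five_nines, hours_per_week=8 * 5)
around_the_clock = allowed_downtime_minutes(five_nines, hours_per_week=24 * 7)

print(f"9-5, M-F: {business_hours:.2f} minutes per year")
print(f"24/7:     {around_the_clock:.2f} minutes per year")
```

The budgets are tiny in both cases, but the business-hours service has 128 hours a week outside the measured window in which maintenance and outages do not count against it. The 24/7 service has none, which is what drives the difference in architecture.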

Thus, not every service requires the same level of availability. This concept of relative availability is one that is always missing from the discussions of uptime.

What is also missing from these discussions is a definition of "uptime" and the impact of performance on that definition. If a service is available but very slow, is it still considered "up"?
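The effect of the definition can be sketched with a handful of monitoring probes. This example assumes each probe records whether the service responded and how long it took. The sample data and the 1000 ms threshold are hypothetical, the sort of number an agreement would have to specify.

```python
def measured_availability(samples, max_latency_ms=None):
    """Fraction of probes that count as "up".

    samples: list of (responded, latency_ms) tuples.
    max_latency_ms: if given, a response slower than this counts as "down".
    """
    if not samples:
        raise ValueError("no samples to measure")
    good = sum(
        1
        for responded, latency_ms in samples
        if responded and (max_latency_ms is None or latency_ms <= max_latency_ms)
    )
    return good / len(samples)


# Hypothetical probe results: (responded, latency in milliseconds).
probes = [(True, 120), (True, 2500), (False, 0), (True, 200)]

print(measured_availability(probes))                       # responses only
print(measured_availability(probes, max_latency_ms=1000))  # slow counts as down
```

The same probes give two different uptime figures depending on whether a slow response counts as "up."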

Here is where the specifics in a Service Level Agreement (SLA) become very important. ITIL provides guidance on these and other IT Service Management process disciplines.
