Prepared by CustomSystems
Enterprise High Availability
Segment
Compaq Computer Corporation
Contents
Introduction
Determining Availability Requirements
What is the Cost of Downtime?
Recovery Point and Recovery Time
What Causes Downtime?
Vulnerable Technologies
Availability Technologies
Architecting High-Availability Systems
-- Deployment Options
-- Service and Support Options
Architecting and Deploying
High-Availability Solutions
Abstract: The demand for high availability is growing. Long required for
mission-critical applications in industries such as finance, process
manufacturing, and telecommunications, high availability today is fast
becoming a requirement in many other industries as well.
This white paper provides an overview of the combination of factors that
defines high-availability computing requirements. It describes
methodologies that can assist you in architecting and deploying the right
level of availability across your information-technology environment. And
it describes how Compaq can assist organizations of any size in achieving
their high-availability computing goals.
Putting it all Together
List of Sales Offices
Help us improve our technical communication. Let us know what you think
about the technical information in this document. Your feedback is valuable
and will help us structure future communications. Please send your
comments to: customsystems@digital.com
Notice
The information in this publication is subject to change without notice and is provided “AS IS” WITHOUT
WARRANTY OF ANY KIND. THE ENTIRE RISK ARISING OUT OF THE USE OF THIS
INFORMATION REMAINS WITH RECIPIENT. IN NO EVENT SHALL COMPAQ BE LIABLE FOR
ANY DIRECT, CONSEQUENTIAL, INCIDENTAL, SPECIAL, PUNITIVE OR OTHER DAMAGES
WHATSOEVER (INCLUDING WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS
PROFITS, BUSINESS INTERRUPTION OR LOSS OF BUSINESS INFORMATION), EVEN IF
COMPAQ HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
The limited warranties for Compaq products are exclusively set forth in the documentation accompanying
such products. Nothing herein should be construed as constituting a further or additional warranty.
This publication does not constitute an endorsement of the product or products that were tested. The
configuration or configurations tested or described may or may not be the only available solution. This test
is not a determination of product quality or correctness, nor does it ensure compliance with any federal,
state, or local requirements.
Product names mentioned herein may be trademarks and/or registered trademarks of their respective
companies.
Netelligent, Armada, Cruiser, Concerto, QuickChoice, ProSignia, Systempro/XL, Net1, LTE Elite,
Vocalyst, PageMate, SoftPaq, FirstPaq, SolutionPaq, EasyPoint, EZ Help, MaxLight, MultiLock,
QuickBlank, QuickLock, UltraView, Innovate logo, Wonder Tools logo in black/white and color, and
Compaq PC Card Solution logo are trademarks and/or service marks of Compaq Computer Corporation.
Microsoft, Windows, Windows NT, Windows NT Server and Workstation, Microsoft SQL Server for
Windows NT are trademarks and/or registered trademarks of Microsoft Corporation.
NetWare and Novell are registered trademarks and intraNetWare, NDS, and Novell Directory Services are
trademarks of Novell, Inc.
White Paper prepared by CustomSystems Enterprise High Availability Segment
First Edition (October 1998)
ECG064/1198
Introduction
When the systems that run your organization¹ are down, the costs can be devastating. Lost opportunities.
Lost revenues. Failure-to-perform fees. Non-compliance penalties. Plus stranded fixed costs you have to
keep on paying whether your people are productive or not.
Potentially more damaging is a nearly incalculable loss: the loss of good will. Customers, partners, and
suppliers affected by system shutdowns perceive your organization as poorly run and ill-suited to meeting
their needs.
Thus, the starting point for any discussion of availability has to be the cost of information system downtime
to your organization. The higher the cost of downtime, the more robust the availability environment needs
to be. And the more successful your environment is in delivering the level of availability required by the
organization, the faster the return on your investment.
The purpose of this white paper is to provide an overview of the factors that – taken together – define your
availability needs and how Compaq Computer Corporation can meet them.
Definitions
Before proceeding with a discussion of high availability, it is helpful to define a few terms.
Availability: The ratio of the total time a functional unit is capable of being used during a given interval to
the length of that interval. It is the proportion of time a system is productive, which implies that it is performing adequately and not merely running.
Mission Critical: A term applied to information systems upon which the success of an organization
depends and the loss of which results in unacceptable functional or financial harm.
Mean Time Between Failures (MTBF): A statistically derived length of time a user may reasonably expect a
component, device, or system to work between two incapacitating failures.
Reliability: A measure of how dependable a system is once you actually use it. Reliability can also be
considered the sum of availability and data integrity.
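The availability ratio defined above can be illustrated with a short calculation. This is a minimal sketch, not part of the original paper: the function name, the choice of hours as the unit, and the 30-day example figures are all assumptions made for illustration.

```python
def availability(uptime_hours: float, interval_hours: float) -> float:
    """Return availability as usable time divided by the length of the interval."""
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    if not 0 <= uptime_hours <= interval_hours:
        raise ValueError("uptime_hours must lie between 0 and interval_hours")
    return uptime_hours / interval_hours


# Hypothetical example: a 30-day month (720 hours) with 3 hours of outage.
monthly = availability(uptime_hours=717.0, interval_hours=720.0)
print(f"Availability for the month: {monthly:.4%}")
```

The same ratio applies to any interval, so the result depends heavily on whether the interval is a full calendar period or only the hours in which the system is expected to be in service.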
Determining Availability Requirements
Determining an organization’s availability requirements and architecting a system to meet them is a multistep process.
1. Determine the cost of downtime. (Page 4)
2. Understand recovery in terms of point and time. That is, to what point in the system’s operations must
recovery return, and how much time may elapse between the point of failure and recovery? (Page 5)
3. Focus on the events that can have a negative impact on the ability to keep an application -- and an
organization -- up and running. Understanding these events is essential for developing the right high-availability environment for your organization. (Page 6)
4. Understand the vulnerability of various systems with respect to the above. (Page 7)
5. Once you understand the events that lead to downtime, determine which technology areas you will need
to focus on to achieve the level of availability required. (Page 8)
6. When all these factors are understood, architect and deploy the availability environment. (Page 9)
¹ As used in this white paper, organization means any entity that requires computing technology to achieve its goals and conduct its
operations. Examples include businesses, departments of businesses, academic institutions, research facilities, or military units.
Many of our customers find it more cost-effective to engage Compaq for architectural and deployment
activities.
1. What is the Cost of Downtime?
You need your Information System to survive in a world fraught with risk. A world where chance events and
outright failures can bring your operations grinding to a halt.
What happens to your organization when the system goes down? The range of answers runs from
“inconvenient” to “catastrophic.”
If your answer is closer to “catastrophic” than to “inconvenient,” then you should read on. If your answer is
closer to “inconvenient,” then perhaps you should read on to see if things are really as rosy as they seem.
In fact, most organizations underestimate -- or have not calculated -- the impact of downtime on their
business. The Gartner Group (1998) studied downtime costs for a variety of industries. The table below
summarizes the findings.
In order to measure the impact of downtime, let’s ask a basic question that helps quantify the level of
availability you might need.
Who and what gets hurt when a system goes down?
Processes: Vital business processes may be interrupted, lost, corrupted, or even changed. Such processes
might include order management, inventory management, financial reporting, transactions, manufacturing,
human resources, life-sustaining medical systems, extended 911 identification, ATM operations, and more.
Programs: Both long- and short-term revenue can be affected. Key employee or customer activities might
be missed or lost.
Business itself: In this age of electronic commerce, if prospects or customers can’t access your site, chances
are they’ll never come back. And the chances are good that they’ll end up with your competitor. Customers
lost forever!
People: Lives can be lost; employee benefits missed with adverse impact on morale; governmental
program problems might harm citizenry; and even battles can be adversely affected if vital information is
lost, corrupted, or late.
Projects: Hundreds of thousands of person-hours of work can be lost, deadlines can be missed, and change orders might be skipped with devastating results.
Operations: Those who manage the daily activities of an organization may find themselves without needed
data, with lost information, with standard activities lost, or with key reports missing or corrupted.
Loss can be measured in more than money. But if money is the measure then the figures can be astounding.
In a recent study, the Standish Group (1998) reports that costs of downtime typically range from $1,000 to
$27,000 per minute. What’s more, they report that in some cases, the cost of downtime for a single incident
has exceeded $10,000,000. And if you consider the estimates of the Gartner Group as noted, the costs can
be in the billions! Think about what downtime means to your organization.
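To put the per-minute range quoted above in concrete terms, the following sketch multiplies it out for an outage of a given length. It is only an illustration: the function name and the 45-minute outage are assumptions, and a real estimate would use your own organization's cost figures rather than the industry-wide range.

```python
from typing import Tuple

# Per-minute cost range taken from the Standish Group (1998) figures quoted above.
LOW_COST_PER_MINUTE = 1_000
HIGH_COST_PER_MINUTE = 27_000


def downtime_cost_range(outage_minutes: float) -> Tuple[float, float]:
    """Return the (low, high) cost estimate in dollars for one outage."""
    if outage_minutes < 0:
        raise ValueError("outage_minutes must not be negative")
    return (outage_minutes * LOW_COST_PER_MINUTE,
            outage_minutes * HIGH_COST_PER_MINUTE)


# Hypothetical example: a single 45-minute outage.
low, high = downtime_cost_range(45)
print(f"Estimated cost: ${low:,.0f} to ${high:,.0f}")
# prints "Estimated cost: $45,000 to $1,215,000"
```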
2. Recovery Point and Recovery Time
High availability means different things to different people. At the high end it is called “continuous
availability” or “nonstop computing” and has come to mean something on the order of 99.999% uptime,
some five minutes a year of downtime. Pretty impressive. But what is your definition of high availability?
Perhaps you don’t need “five-nines” but you’d like to come as close as you can. Your requirement may not
be for continuous computing 24 hours per day, 365 days per year, but you may require that when your
system is in operation it cannot go down. An airborne surveillance and target acquisition system might be
in operation for only eight hours over the forward edge of battle, but it had better be available every second that
it’s there. Or a retail operation that does 90% of its business during a holiday season had better not go down
during those few weeks or months. Each type of availability may impose very different requirements.
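The “five-nines” figure can be checked with a short calculation that converts an availability percentage into the downtime it allows. This is a sketch under stated assumptions: the function name is illustrative, a year is taken as 365 days, and the optional window parameter is a hypothetical addition for systems that only need to be up during a limited operating period.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def allowed_downtime_minutes(availability_percent: float,
                             window_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime, in minutes, permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return window_minutes * (1 - availability_percent / 100)


# Downtime budget per year for several common availability targets.
for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% uptime allows {allowed_downtime_minutes(pct):.1f} minutes per year")

# The same target applied to a hypothetical eight-hour operating window.
mission_seconds = allowed_downtime_minutes(99.999, window_minutes=8 * 60) * 60
print(f"99.999% over eight hours allows {mission_seconds:.2f} seconds")
```

At 99.999% the yearly budget comes to roughly 5.3 minutes, which matches the “some five minutes a year” cited above. Over an eight-hour window the same target leaves well under a second, which is why a time-limited requirement can be just as demanding as a year-round one.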
The first thing to keep in mind, though, is that defining availability depends on your needs in terms of
Recovery Point and Recovery Time.
While the inherent reliability of Information Systems has been increasing, things still do happen that cause
applications to stop. Disaster Recovery specialists tend to examine the impact possibilities in terms of
Recovery Point -- the amount of “acceptable” loss -- and Recovery Time -- the amount of time needed to
get back in operation. Recovery Point is most important in data-centric operations where the loss of data is
unacceptable. Recovery Time is most important in transaction-centric operations where real-time continuity
is key.
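One way to make the two measures concrete is to record them for an actual or simulated outage and compare them with the limits your organization can tolerate. The sketch below is an assumption-laden illustration, not a method described in this paper: the `Incident` class, its field names, and the example numbers are all hypothetical, and Recovery Point is measured here as a count of lost transactions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    """Hypothetical record of one outage; field names are illustrative only."""
    last_committed_txn: int   # last transaction committed before the failure
    last_recovered_txn: int   # last transaction still present after recovery
    failed_at: datetime       # moment the application stopped
    restored_at: datetime     # moment the application was back in operation

    @property
    def transactions_lost(self) -> int:
        """Recovery Point achieved: work that did not survive the failure."""
        return self.last_committed_txn - self.last_recovered_txn

    @property
    def recovery_time(self) -> timedelta:
        """Recovery Time achieved: how long operations were interrupted."""
        return self.restored_at - self.failed_at


def meets_targets(incident: Incident,
                  max_transactions_lost: int,
                  max_recovery_time: timedelta) -> bool:
    """Return True only if both the point and the time limits were met."""
    return (incident.transactions_lost <= max_transactions_lost
            and incident.recovery_time <= max_recovery_time)


# Hypothetical outage: 500 transactions lost, service restored after 20 minutes.
incident = Incident(
    last_committed_txn=10_500,
    last_recovered_txn=10_000,
    failed_at=datetime(1998, 10, 1, 9, 0),
    restored_at=datetime(1998, 10, 1, 9, 20),
)
print(incident.transactions_lost, incident.recovery_time)

# A data-centric operation that tolerates no loss fails the check even though
# the 30-minute time limit was met.
print(meets_targets(incident, max_transactions_lost=0,
                    max_recovery_time=timedelta(minutes=30)))
```

Keeping the two limits separate reflects the distinction drawn above: a data-centric operation would set the transaction limit tightly, while a transaction-centric operation would set the time limit tightly.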
Do you need fast recovery, or recovery to the exact state prior to the failure… or both? What is the impact
on your operations measured by a Recovery Point standard? If you don’t resume processing right where
you left off will it be inconvenient? Damaging? Catastrophic? What is the most effective and efficient
method to use to recover the information? What is the impact on business measured in Recovery Time? If
you don’t resume processing within a second will it be inconvenient? Damaging? Catastrophic?
Thus the recovery strategy you use depends on this assessment of Recovery Point and Recovery Time. The
diagram below displays four availability options measured in those terms.
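[Figure: Availability Options. Four recovery options (Electronic Vaulting, Remote Hot Sites, On-Line Hot Backup, and 24 x 365) positioned by Recovery Time, ranging from weeks down to machine cycles, and by Recovery Point, ranging from 1000’s of transactions down to 0 transactions.]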