White Paper
October 1998 ECG064/1198
Prepared by CustomSystems Enterprise High Availability Segment
Compaq Computer Corporation
Contents
Introduction 3
Determining Availability Requirements 3
What is the Cost of Downtime? 4
Recovery Point and Recovery Time 5
What Causes Downtime? 6
Vulnerable Technologies 7
Availability Technologies 8
Architecting High-Availability Systems 9
-- Deployment Options 10
-- Service and Support Options 10
Putting It All Together 12
List of Sales Offices 14
Architecting and Deploying High-Availability Solutions
Abstract: The demand for high availability is growing. Long required for mission-critical applications in industries such as finance, process manufacturing, and telecommunications, high availability today is fast becoming a requirement in many other industries as well.
This white paper provides an overview of the combination of factors that defines high-availability computing requirements. It describes methodologies that can assist you in architecting and deploying the right level of availability across your information-technology environment. And it describes how Compaq can assist organizations of any size in achieving their high-availability computing goals.
Help us improve our technical communication. Let us know what you think about the technical information in this document. Your feedback is valuable and will help us structure future communications. Please send your comments to: customsystems@digital.com
Notice
The information in this publication is subject to change without notice and is provided “AS IS” WITHOUT WARRANTY OF ANY KIND. THE ENTIRE RISK ARISING OUT OF THE USE OF THIS INFORMATION REMAINS WITH RECIPIENT. IN NO EVENT SHALL COMPAQ BE LIABLE FOR ANY DIRECT, CONSEQUENTIAL, INCIDENTAL, SPECIAL, PUNITIVE OR OTHER DAMAGES WHATSOEVER (INCLUDING WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS PROFITS, BUSINESS INTERRUPTION OR LOSS OF BUSINESS INFORMATION), EVEN IF COMPAQ HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
The limited warranties for Compaq products are exclusively set forth in the documentation accompanying such products. Nothing herein should be construed as constituting a further or additional warranty.
This publication does not constitute an endorsement of the product or products that were tested. The configuration or configurations tested or described may or may not be the only available solution. This test is not a determination of product quality or correctness, nor does it ensure compliance with any federal, state, or local requirements.
Product names mentioned herein may be trademarks and/or registered trademarks of their respective companies.
Compaq, Contura, Deskpro, Fastart, Compaq Insight Manager, LTE, PageMarq, Systempro, Systempro/LT, ProLiant, TwinTray, ROMPaq, LicensePaq, QVision, SLT, ProLinea, SmartStart, NetFlex, DirectPlus, QuickFind, RemotePaq, BackPaq, TechPaq, SpeedPaq, QuickBack, PaqFax, Presario, SilentCool, CompaqCare (design), Aero, SmartStation, MiniStation, and PaqRap are registered in the United States Patent and Trademark Office.
Netelligent, Armada, Cruiser, Concerto, QuickChoice, ProSignia, Systempro/XL, Net1, LTE Elite, Vocalyst, PageMate, SoftPaq, FirstPaq, SolutionPaq, EasyPoint, EZ Help, MaxLight, MultiLock, QuickBlank, QuickLock, UltraView, Innovate logo, Wonder Tools logo in black/white and color, and Compaq PC Card Solution logo are trademarks and/or service marks of Compaq Computer Corporation.
Microsoft, Windows, Windows NT, Windows NT Server and Workstation, and Microsoft SQL Server for Windows NT are trademarks and/or registered trademarks of Microsoft Corporation.
NetWare and Novell are registered trademarks and intraNetWare, NDS, and Novell Directory Services are trademarks of Novell, Inc.
DIGITAL and OpenVMS are trademarks of Digital Equipment Corporation. Pentium and Intel are registered trademarks of Intel Corporation.
Copyright ©1998 Compaq Computer Corporation. All rights reserved. Printed in the U.S.A.
Architecting and Deploying High-Availability Solutions
White Paper prepared by CustomSystems Enterprise High Availability Segment
First Edition (October 1998)
ECG064/1198
Introduction
When the systems that run your organization¹ are down, the costs can be devastating. Lost opportunities. Lost revenues. Failure-to-perform fees. Non-compliance penalties. Plus stranded fixed costs you have to keep on paying whether your people are productive or not.
Potentially more damaging is a nearly incalculable loss: the loss of good will. Customers, partners, and suppliers affected by system shutdowns perceive your organization as poorly run and ill-suited to meeting their needs.
Thus, the starting point for any discussion of availability has to be the cost of information system downtime to your organization. The higher the cost of downtime, the more robust the availability environment needs to be. And the more successful your environment is in delivering the level of availability required by the organization, the faster the return on your investment.
The purpose of this white paper is to provide an overview of the factors that, taken together, define your availability needs, and to describe how Compaq Computer Corporation can meet them.
Definitions
Before proceeding with a discussion of high availability, it is helpful to define a few terms.
Availability: The ratio of the total time a functional unit is capable of being used during a given interval to the length of that interval. In other words, it is the proportion of time a system is productive, which implies that it is also performing adequately.
Mission Critical: A term applied to information systems upon which the success of an organization depends and the loss of which results in unacceptable functional or financial harm.
Mean Time Between Failures (MTBF): A statistically derived length of time a user may reasonably expect a component, device, or system to work between two incapacitating failures.
Reliability: A measure of how dependable a system is once you actually use it. Reliability can also be considered the sum of availability and data integrity.
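The Availability and MTBF definitions above reduce to simple ratios. The sketch below is a minimal illustration, not a Compaq tool: the function names and all figures are hypothetical, and it uses one common MTBF point estimate (total operating time divided by the number of incapacitating failures observed), which assumes failures are counted consistently across comparable units.

```python
def availability(usable_hours: float, interval_hours: float) -> float:
    """Fraction of an interval during which the unit was capable of being used."""
    if interval_hours <= 0:
        raise ValueError("interval must be positive")
    if not 0 <= usable_hours <= interval_hours:
        raise ValueError("usable time must lie within the interval")
    return usable_hours / interval_hours


def estimate_mtbf(operating_hours: float, failures: int) -> float:
    """Point estimate of MTBF: total operating time divided by failure count."""
    if failures <= 0:
        raise ValueError("need at least one observed failure")
    return operating_hours / failures


year = 365 * 24  # 8,760 hours; ignores leap years

# Hypothetical server that was unusable for 8.76 hours during one year
print(f"{availability(year - 8.76, year):.4%}")

# Hypothetical fleet: 50 servers running a full year with 4 incapacitating failures
print(estimate_mtbf(50 * year, 4))
```

The first result is 99.9% availability; the second is an MTBF of 109,500 hours for the hypothetical fleet.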
Determining Availability Requirements
Determining an organization’s availability requirements and architecting a system to meet them is a multistep process.
1. Determine the cost of downtime. (Page 4)
2. Understand recovery in terms of point and time. That is, to what point in the system’s operations must you be able to recover, and how much time may elapse between the failure and recovery? (Page 5)
3. Focus on the events that can have a negative impact on the ability to keep an application -- and an organization -- up and running. Understanding these events is essential for developing the right high-availability environment for your organization. (Page 6)
4. Understand how vulnerable your various systems are to these events. (Page 7)
5. Once you understand the events that lead to downtime, determine which technology areas you will need to focus on to achieve the level of availability required. (Page 8)
6. When all these factors are understood, architect and deploy the availability environment. (Page 9)
¹ As used in this white paper, organization means any entity that requires computing technology to achieve its goals and conduct its operations. Examples include businesses, departments of businesses, academic institutions, research facilities, or military units.
Many of our customers find it more cost-effective to engage Compaq for architectural and deployment activities.
1. What is the Cost of Downtime?
You need your Information System to survive in a world fraught with risk. A world where chance events and outright failure can bring your operations to a grinding halt.
What happens to your organization when the system goes down? The range of answers runs from “inconvenient” to “catastrophic.”
If your answer is closer to “catastrophic” than to “inconvenient,” then you should read on. If your answer is closer to “inconvenient,” then perhaps you should read on to see if things are really as rosy as they seem. In fact, most organizations underestimate -- or have not calculated -- the impact of downtime on their business. The Gartner Group (1998) studied downtime costs for a variety of industries. The table below summarizes the findings.
Industry          Application               Average Cost per Hour of Downtime
Financial         Brokerage Operations      $6,500,000
Financial         Credit Card Sales         $2,600,000
Media             Pay-per-View              $1,150,000
Retail            Home Shopping (TV)        $113,000
Retail            Catalog Sales             $90,000
Transportation    Airline Reservations      $89,500
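To relate the hourly figures in the Gartner Group table to an availability target, one rough approach multiplies the hourly cost by the hours of downtime that availability level permits in a year. The sketch below assumes that cost grows linearly with outage length at the table's average rate and uses a hypothetical 99.9% availability target; real losses are rarely that uniform, and the function name is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; ignores leap years

# Average cost per hour of downtime, from the Gartner Group (1998) table
COST_PER_HOUR = {
    "Brokerage Operations": 6_500_000,
    "Credit Card Sales": 2_600_000,
    "Pay-per-View": 1_150_000,
    "Home Shopping (TV)": 113_000,
    "Catalog Sales": 90_000,
    "Airline Reservations": 89_500,
}


def annual_downtime_cost(cost_per_hour: float, availability: float) -> float:
    """Yearly cost if the system is down for (1 - availability) of the year's hours."""
    downtime_hours = (1.0 - availability) * HOURS_PER_YEAR
    return cost_per_hour * downtime_hours


for app, rate in COST_PER_HOUR.items():
    cost = annual_downtime_cost(rate, 0.999)  # hypothetical 99.9% availability
    print(f"{app:<22} ${cost:>14,.0f}")
```

Even at 99.9% availability (about 8.76 hours of downtime a year), the brokerage figure works out to roughly $57 million a year under this linear assumption.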
In order to measure the impact of downtime, let’s ask a basic question that helps quantify the level of availability you might need.
Who and what gets hurt when a system goes down?
Processes: Vital business processes may be interrupted, lost, corrupted, or even changed. Such processes might include order management, inventory management, financial reporting, transactions, manufacturing, human resources, life-sustaining medical systems, extended 911 identification, ATM operations, and more.
Programs: Both long- and short-term revenue can be affected. Key employee or customer activities might be missed or lost.
Business itself: In this age of electronic commerce, if prospects or customers can’t access your site, chances are they’ll never come back. And the chances are good that they’ll end up with your competitor. Customers lost forever!
People: Lives can be lost; employee benefits can be missed, hurting morale; problems in government programs might harm citizens; and even battles can be adversely affected if vital information is lost, corrupted, or late.
Projects: Hundreds of thousands of person-hours of work can be lost, deadlines can be missed, and change orders might be skipped, with devastating results.
Operations: Those who manage the daily activities of an organization may find themselves without needed data, with lost information, with standard activities lost, or with key reports missing or corrupted.
Loss can be measured in more than money. But if money is the measure, then the figures can be astounding. In a recent study, the Standish Group (1998) reports that costs of downtime typically range from $1,000 to $27,000 per minute. What’s more, they report that in some cases the cost of downtime for a single incident has exceeded $10,000,000. And at the hourly rates estimated by the Gartner Group, a prolonged outage could run into the billions! Think about what downtime means to your organization.
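Converting the Standish Group's per-minute range to hourly terms puts it alongside the Gartner Group figures, and dividing a total by an hourly rate shows how long an outage must last to reach it. This sketch again assumes a constant cost rate; the helper names and the $1 billion threshold are illustrative only.

```python
import math


def per_hour(cost_per_minute: float) -> float:
    """Convert a per-minute downtime cost to a per-hour cost."""
    return cost_per_minute * 60


def hours_to_reach(total_cost: float, cost_per_hour: float) -> float:
    """Outage length (hours) at which a constant hourly rate accumulates total_cost."""
    return total_cost / cost_per_hour


low, high = per_hour(1_000), per_hour(27_000)  # Standish Group range, per minute
print(f"${low:,.0f} to ${high:,.0f} per hour")  # $60,000 to $1,620,000 per hour

# Minutes at the top of the Standish range before one incident passes $10,000,000
print(math.ceil(10_000_000 / 27_000))  # 371

# Hours at the Gartner brokerage rate ($6.5M/hour) before losses reach $1 billion
print(round(hours_to_reach(1_000_000_000, 6_500_000), 1))  # 153.8
```

At the brokerage rate, reaching $1 billion takes roughly 154 hours, a little over six days of continuous outage.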
2. Recovery Point and Recovery Time
High availability means different things to different people. At the high end it is called “continuous availability” or “nonstop computing” and has come to mean something on the order of 99.999% uptime, or about five minutes of downtime a year. Pretty impressive. But what is your definition of high availability? Perhaps you don’t need “five-nines” but you’d like to come as close as you can. Your requirement may not be for continuous computing 24 hours per day, 365 days per year, but you may require that when your system is in operation it cannot go down. An airborne surveillance and target acquisition system might be in operation for only eight hours over the forward edge of battle, but it had better be available every second that it’s there. Or a retail operation that does 90% of its business during a holiday season had better not go down during those few weeks or months. Each of these availability needs may impose very different requirements.
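The arithmetic behind “five-nines” is the complement of the availability ratio multiplied by the length of the period. The helper below (a hypothetical name, assuming a 365-day year) tabulates the yearly downtime budget for 99% through 99.999%, then applies the same target to an eight-hour mission window like the surveillance example, on the assumption that availability is measured only over the window in which the system must operate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def allowed_downtime_minutes(availability: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime budget implied by an availability target over a period."""
    return (1.0 - availability) * period_minutes


for nines in range(2, 6):  # 99% through 99.999%
    target = 1 - 10 ** -nines
    print(f"{target:.5%}: {allowed_downtime_minutes(target):,.1f} minutes/year")

# An eight-hour mission window at 99.999% leaves well under a second of downtime
print(f"{allowed_downtime_minutes(0.99999, 8 * 60) * 60:.3f} seconds")
```

Five-nines works out to about 5.3 minutes a year, and under a third of a second across the eight-hour window.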
The first thing to keep in mind, though, is that defining availability depends on your needs in terms of Recovery Point and Recovery Time.
While the inherent reliability of Information Systems has been increasing, things still do happen that cause applications to stop. Disaster Recovery specialists tend to examine the impact possibilities in terms of Recovery Point -- the amount of “acceptable” loss -- and Recovery Time -- the amount of time needed to get back in operation. Recovery Point is most important in data-centric operations where the loss of data is unacceptable. Recovery Time is most important in transaction-centric operations where real-time continuity is key.
Do you need fast recovery, or recovery to the exact state prior to the failure… or both? What is the impact on your operations measured by a Recovery Point standard? If you don’t resume processing right where you left off will it be inconvenient? Damaging? Catastrophic? What is the most effective and efficient method to use to recover the information? What is the impact on business measured in Recovery Time? If you don’t resume processing within a second will it be inconvenient? Damaging? Catastrophic?
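One way to frame the Recovery Point and Recovery Time questions is as two limits that a candidate recovery strategy must satisfy. The sketch below is hypothetical throughout: it measures Recovery Point in minutes of lost work rather than transactions, assumes that a strategy copying data every N minutes can lose up to N minutes of work, and uses invented strategies and restore times purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    max_data_loss_minutes: float  # Recovery Point: how much work may be lost
    max_outage_minutes: float     # Recovery Time: how long until processing resumes


@dataclass
class Strategy:
    name: str
    copy_interval_minutes: float  # how often data is copied off the primary (hypothetical)
    restore_minutes: float        # estimated time to resume processing (hypothetical)

    def worst_case_loss(self) -> float:
        # A failure just before the next copy loses everything since the last one
        return self.copy_interval_minutes


def meets(s: Strategy, r: Requirement) -> bool:
    """True only if the strategy satisfies both the Recovery Point and Recovery Time limits."""
    return (s.worst_case_loss() <= r.max_data_loss_minutes
            and s.restore_minutes <= r.max_outage_minutes)


need = Requirement(max_data_loss_minutes=15, max_outage_minutes=60)
options = [
    Strategy("nightly tape backup", copy_interval_minutes=24 * 60, restore_minutes=8 * 60),
    Strategy("hourly remote copy", copy_interval_minutes=60, restore_minutes=30),
    Strategy("continuous replication", copy_interval_minutes=0, restore_minutes=5),
]
for s in options:
    print(f"{s.name:<24} {'meets' if meets(s, need) else 'misses'}")
```

Checking the two limits separately mirrors the distinction drawn above: a data-centric operation would tighten the loss limit, a transaction-centric one the outage limit.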
Thus the recovery strategy you use depends on this assessment of Recovery Point and Recovery Time. The diagram below displays four availability options measured in those terms.
Diagram: Availability Options. Four options (Electronic Vaulting, Remote Hot Sites, On-Line Hot Backup, and 24 x 365) positioned by Recovery Time (from weeks to machine cycles) and Recovery Point (from 1000’s of transactions to 0 transactions).