Hp COMPAQ PROLIANT 8000 NMI Crash Dump Feature

White Paper
January 2000
11N8-0100A-WWEN
Prepared by: Industry Standard Server Division, Compaq Computer Corporation
What Is NMI Crash Dump
Compaq Automatic Server Recovery (ASR)
Initiating a NMI Crash Dump Under NetWare
Initiating a NMI Crash Dump Under SCO OpenServer 5 and SCO UnixWare 7
How to Set Up NMI Crash Dump Under Windows NT 4.0
    Microsoft Software
    Compaq Software
How to Set Up NMI Crash Dump Under Windows 2000
    Microsoft Software (subject to change)
    Compaq Software (subject to change)
Where Is the Crash Dump Button Located?
    ProLiant 8000
    ProLiant 8500
Compaq ProLiant Server NMI Crash Dump Feature
Abstract: This document provides a description of the Compaq ProLiant Class Computer Systems implementation of NMI-based Crash Dump facilities. This facility can be beneficial to system administrators in their root cause failure analysis.
NMI Crash Dump allows customers to obtain critical diagnostic information in the event of system lock-ups and other failures. Both user-initiated and automatic crash dump support is presented.
Notice
THE INFORMATION IN THIS PUBLICATION IS SUBJECT TO CHANGE WITHOUT NOTICE AND IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND. THE ENTIRE RISK ARISING OUT OF THE USE OF THIS INFORMATION REMAINS WITH RECIPIENT. IN NO EVENT SHALL COMPAQ BE LIABLE FOR ANY DIRECT, CONSEQUENTIAL, INCIDENTAL, SPECIAL, PUNITIVE, OR OTHER DAMAGES WHATSOEVER (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS PROFITS, BUSINESS INTERRUPTION, OR LOSS OF BUSINESS INFORMATION), EVEN IF COMPAQ HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
The limited warranties for Compaq products are exclusively set forth in the documentation accompanying such products. Nothing herein should be construed as constituting a further or additional warranty.
This publication does not constitute an endorsement of the product or products that were tested. The configuration or configurations tested or described may or may not be the only available solution. This test is not a determination of product quality or correctness, nor does it ensure compliance with any federal, state, or local requirements.
Compaq, NonStop, Deskpro, Compaq Insight Manager, Systempro, Systempro/LT, ProLiant, ROMPaq, QVision, SmartStart, NetFlex, QuickFind, PaqFax, and ProSignia are registered with the United States Patent and Trademark Office.
ActiveAnswers, Netelligent, Systempro/XL, SoftPaq, Fastart, QuickBlank, and QuickLock are trademarks and/or service marks of Compaq Computer Corporation.
Microsoft, Windows, and Windows NT are trademarks and/or registered trademarks of Microsoft Corporation.
Intel, Pentium, and Xeon are trademarks and/or registered trademarks of Intel Corporation.
Other product names mentioned herein may be trademarks and/or registered trademarks of their respective companies.
© 2000 Compaq Computer Corporation. All rights reserved. Printed in the U.S.A.
Compaq ProLiant Server NMI Crash Dump Feature White Paper prepared by Industry Standard Server Division
First Edition (January 2000) Document Number 11N8-0100A-WWEN

What Is NMI Crash Dump

Non-Maskable Interrupt (NMI) Crash Dump is a diagnostic mechanism that allows crash dump files to be created when a system is hung and cannot respond to traditional debug mechanisms.
Crash dump analysis is an essential part of diagnosing reliability problems such as hangs in operating systems, device drivers, and applications. Many crashes freeze a system in such a way that the only recourse is a hard reset (i.e., power cycling the system). Because resetting the system erases the information needed to analyze the problem, the system should be able to perform a memory dump before the hard reset is performed. A dump switch and the associated support in Windows NT, Windows 2000, NetWare, SCO OpenServer 5, and SCO UnixWare 7 provide this function.
Figure 1: NMI Crash Dump Issue Resolution Events
The dump switch can be used to diagnose software failures by forcing the operating system to invoke the non-maskable interrupt (NMI) handler and generate a crash dump log. The crash dump log can provide critical information for root-cause analysis that may be difficult or impossible to obtain through other means.
A user initiates an NMI event by pressing the dump switch. The NMI can allow a hung system to become responsive enough to generate a crash dump log.
WARNING: Using the NMI Crash Dump switch on a functioning system (under any operating system) causes the unit to fail abruptly. This is the designed behavior of the NMI Crash Dump switch, so it should never be used during normal operation.
Compaq has enhanced Automatic Server Recovery (ASR) handling for situations where an automated crash dump may be desired. ASR detects a lockup and normally resets the system. ASR can instead be configured to generate the Crash Dump NMI, which allows a crash dump file to be written before the system is reset.
The Compaq health (systems management) driver must be loaded in order to properly differentiate an NMI caused by the dump switch from a PCI SERR NMI. The dump switch operates even if the health driver is not loaded.
The NMI Crash Dump switch may not work in all situations. Examples include: after another NMI has already occurred in the system, when the OS crash handler cannot run properly, following some hardware failures, and while ASR is in progress.
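The limitations listed above can be read as a pre-check an administrator might run mentally before pressing the switch. The following Python sketch is illustrative only: the SystemState fields and both function names are hypothetical and do not correspond to any Compaq or operating system interface, and the list of blocking conditions covers only the examples named in the text.

```python
from dataclasses import dataclass


@dataclass
class SystemState:
    """Hypothetical snapshot of the conditions named in the text."""
    prior_nmi_occurred: bool = False         # another NMI already occurred
    crash_handler_operational: bool = True   # OS crash handler can still run
    blocking_hardware_failure: bool = False  # one of the "some hardware failures"
    asr_in_progress: bool = False            # ASR is already handling a lockup
    health_driver_loaded: bool = True        # Compaq health driver is loaded


def dump_switch_blockers(state):
    """Return the documented reasons the dump switch may not work."""
    reasons = []
    if state.prior_nmi_occurred:
        reasons.append("another NMI has already occurred")
    if not state.crash_handler_operational:
        reasons.append("OS crash handler cannot run properly")
    if state.blocking_hardware_failure:
        reasons.append("hardware failure")
    if state.asr_in_progress:
        reasons.append("ASR is in progress")
    return reasons


def can_identify_dump_switch_nmi(state):
    """The health driver is needed to tell a dump-switch NMI from a PCI SERR NMI."""
    return state.health_driver_loaded


if __name__ == "__main__":
    hung = SystemState(asr_in_progress=True, health_driver_loaded=False)
    print(dump_switch_blockers(hung))
    print(can_identify_dump_switch_nmi(hung))
```

Note that the health driver check is kept separate from the blockers: according to the text, the switch still operates without the driver, and only the ability to identify the NMI source is lost.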

Compaq Automatic Server Recovery (ASR)

ASR provides a means to recover from a lockup caused by either the software or the hardware. The feature consists of a hardware countdown timer and a software driver, which periodically refreshes the countdown timer. If the driver fails to report in and refresh the timer, the ASR hardware timer normally resets the machine.
When configured for NMI Crash Dump under Windows NT 4.0, Windows 2000, SCO OpenServer 5, or SCO UnixWare 7, ASR initiates the NMI Crash Dump instead of resetting the machine.
ASR does not cover cases where the operating system or applications may be executing in an indeterminate state. These cases would require the system administrator to intervene and initiate the NMI with the crash dump button.
Note: All Compaq systems require the Compaq health (systems management) driver to support ASR.
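The interaction between the countdown timer and the refreshing driver can be illustrated with a small simulation. This is a minimal Python sketch of a generic watchdog pattern, not Compaq's implementation: the AsrTimer class, the TimeoutAction values, the tick-based timing, and the event names are all assumptions made for illustration.

```python
from enum import Enum


class TimeoutAction(Enum):
    RESET = "reset"                    # default ASR behavior
    NMI_CRASH_DUMP = "nmi_crash_dump"  # ASR configured for NMI Crash Dump


class AsrTimer:
    """Toy model of a hardware countdown timer (all names hypothetical)."""

    def __init__(self, timeout_ticks, action=TimeoutAction.RESET):
        self.timeout_ticks = timeout_ticks
        self.action = action
        self.remaining = timeout_ticks
        self.events = []

    def refresh(self):
        """Called by the driver to restart the countdown."""
        self.remaining = self.timeout_ticks

    def tick(self):
        """Advance one tick; return True if the timer expired on this tick."""
        if self.remaining == 0:
            return False  # already expired earlier
        self.remaining -= 1
        if self.remaining > 0:
            return False
        if self.action is TimeoutAction.NMI_CRASH_DUMP:
            # Dump is written first, then the system resets.
            self.events += ["nmi", "crash_dump_written", "reset"]
        else:
            self.events.append("reset")
        return True


def run(timer, driver_alive_ticks, total_ticks):
    """Simulate a driver that refreshes every tick until it hangs."""
    for t in range(total_ticks):
        if t < driver_alive_ticks:
            timer.refresh()
        if timer.tick():
            break
    return timer.events


if __name__ == "__main__":
    timer = AsrTimer(timeout_ticks=3, action=TimeoutAction.NMI_CRASH_DUMP)
    print(run(timer, driver_alive_ticks=5, total_ticks=20))
```

In this model a driver that keeps refreshing never lets the timer expire, while a driver that stops refreshing triggers either a plain reset or an NMI followed by a dump and a reset, depending on the configured action.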
Figure 2: Simplified Block Diagram of NMI Crash Dump Architecture (components shown: Compaq Sysmgmt Driver (ASR Refresh), Hardware Abstraction Layer, NMI, NMI Dump Button, Compaq Hardware (NMI Tracker), ASR Timer, Crash Dump/Mass Storage Driver, crash dump file)