Digital Equipment Corporation
Maynard, Massachusetts
Revised, July 1993
First Printing, December 1992
The information in this document is subject to change without notice and should not be construed
as a commitment by Digital Equipment Corporation.
Digital Equipment Corporation assumes no responsibility for any errors that may appear in this
document.
The software, if any, described in this document is furnished under a license and may be used or
copied only in accordance with the terms of such license. No responsibility is assumed for the use
or reliability of software or equipment that is not supplied by Digital Equipment Corporation or its
affiliated companies.
in preparing future documentation.
The following are trademarks of Digital Equipment Corporation: Alpha AXP, AXP, DEC, DECchip,
DECconnect, DECdirect, DECnet, DECserver, DEC VET, DESTA, MSCP, RRD40, ThinWire,
TMSCP, TU, UETP, ULTRIX, VAX, VAX DOCUMENT, VAXcluster, VMS, the AXP logo, and the
DIGITAL logo.
OSF/1 is a registered trademark of Open Software Foundation, Inc.
All other trademarks and registered trademarks are the property of their respective holders.
FCC NOTICE: The equipment described in this manual generates, uses, and may emit radio
frequency energy. The equipment has been type tested and found to comply with the limits for
a Class A computing device pursuant to Subpart J of Part 15 of FCC Rules, which are designed
to provide reasonable protection against such radio frequency interference when operated in a
commercial environment. Operation of this equipment in a residential area may cause interference,
in which case the user at his own expense may be required to take measures to correct the
interference.
This document was prepared using VAX DOCUMENT, Version 2.1.
This guide describes the procedures and tests used to service DEC 4000 AXP
systems.
Intended Audience
This guide is intended for use by Digital Equipment Corporation service personnel
and qualified self-maintenance customers.
Conventions
The following conventions are used in this guide.
Convention                      Meaning
Return                          A key name enclosed in a box indicates that you press that key.
Ctrl/x                          Ctrl/x indicates that you hold down the Ctrl key while you press another key, indicated here by x. In examples, this key combination is enclosed in a box, for example, Ctrl/C.
bold type                       In the online book (Bookreader), bold type in examples indicates commands and other instructions that you enter at the keyboard.
lowercase                       Lowercase letters in commands indicate that commands can be entered in uppercase or lowercase.
(system drawing)                In some illustrations, small drawings of the DEC 4000 AXP system appear in the left margin. Shaded areas help you locate components on the front or back of the system.
Warning                         Warnings contain information to prevent personal injury.
Caution                         Cautions provide information to prevent damage to equipment or software.
[]                              In command format descriptions, brackets indicate optional elements.
console command abbreviations   Console command abbreviations must be entered exactly as shown.
boot                            Console and operating system commands are shown in this special typeface.
italic type                     Italic type in console command sections indicates a variable.
< >                             In console mode online help, angle brackets enclose a placeholder for which you must specify a value.
{ }                             In command descriptions, braces containing items separated by commas imply mutually exclusive items.
1
System Maintenance Strategy
Any successful maintenance strategy is based on the proper understanding
and use of information services, service tools, service support and escalation
procedures, field feedback, and troubleshooting procedures. This chapter
describes the maintenance strategy for the DEC 4000 AXP system.
•Section 1.1 provides a diagnostic strategy you should use to troubleshoot a
DEC 4000 AXP system.
•Section 1.2 explains the service delivery methodology.
•Section 1.3 lists the product tools and utilities.
•Section 1.4 lists available information services.
•Section 1.5 describes field feedback procedures.
1.1 Troubleshooting the System
Before troubleshooting any system problem, check the site maintenance log for
the system’s service history. Be sure to ask the system manager the following
questions:
•Has the system been used before and did it work correctly?
•Have changes to hardware or updates to firmware or software been made to
the system recently?
•What is the state of the system—is the operating system up?
If the operating system is down and you are not able to bring it up, use the
console environment diagnostic tools, such as RBDs and LEDs.
If the operating system is up, use the operating system environment
diagnostic tools, such as error logs, crash dumps, DEC VET and UETP
exercisers, and other log files.
System Maintenance Strategy 1–1
System problems can be classified into the following five categories:
1. Power problems
2. Problems getting to the console
3. Failures reported by the console subsystem
4. Boot failures
5. Failures reported by the operating system
Using these categories, you can quickly determine a starting point for diagnosis
and eliminate the unlikely sources of the problem. Table 1–1 provides the
recommended tools or resources you should use to isolate problems in each
category.
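The five categories appear to follow the order in which the system comes up, so the first stage that fails points to the category. The following Python sketch illustrates that triage under this assumption. The class, field, and function names are illustrative only and do not come from this guide.

```python
from dataclasses import dataclass
from typing import Optional

# The five problem categories listed above, keyed by their number in this guide.
CATEGORIES = {
    1: "Power problems",
    2: "Problems getting to the console",
    3: "Failures reported by the console subsystem",
    4: "Boot failures",
    5: "Failures reported by the operating system",
}


@dataclass
class Observations:
    """Yes/no answers gathered on site (hypothetical field names)."""
    power_ok: bool         # the system powers on
    console_reached: bool  # power-up screens are displayed
    console_error: bool    # the console program or self-tests report an error
    boot_ok: bool          # the operating system boots
    os_error: bool         # the operating system logs errors, hangs, or crashes


def classify(obs: Observations) -> Optional[int]:
    """Return the number of the first category that applies, or None."""
    if not obs.power_ok:
        return 1
    if not obs.console_reached:
        return 2
    if obs.console_error:
        return 3
    if not obs.boot_ok:
        return 4
    if obs.os_error:
        return 5
    return None


if __name__ == "__main__":
    number = classify(Observations(True, True, False, False, False))
    print(number, CATEGORIES[number])
```

Checking the stages in this order mirrors the idea of eliminating unlikely sources first: a system that never reaches the console cannot yet be diagnosed as a boot failure.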
Table 1–1 Recommended Troubleshooting Procedures

1. Power Problems (Table 1–2)
Description: No power at system enclosure or trouble with power supply subsystem, as indicated by LEDs.
Diagnostic Tools/Resources and Reference:
•Power supply subsystem LEDs: Refer to Section 2.1.1 for information on interpreting power supply LEDs.

2. Problems Getting to Console Mode (Table 1–3)
Description: System powers up, but does not display power-up screen.
Diagnostic Tools/Resources and Reference:
•OCP LEDs: Refer to Section 2.1.2 for information on interpreting OCP LEDs.
•Console terminal troubleshooting flow: Refer to Table 1–3 for information on troubleshooting console terminal problems.
•Power-up sequence description: Refer to Section 2.3 and 2.3.3 for a description of the power-up and self-test sequence.
•Robust mode power-up: Refer to Section 2.2.3 for a description of robust mode power-up and its functions.

5. Failures Reported by the Operating System (Table 1–6)
Description: Operating system generates error logs; process hangs or operating system crashes.
Diagnostic Tools/Resources and Reference:
•Error logs: Refer to Chapter 4 for information on interpreting error logs.
•Crash dump: Refer to OpenVMS AXP Alpha System Dump Analyzer Utility Manual for information on how to interpret OpenVMS crash dump files. Refer to the Guide to Kernel Debugging (AA–PS2TA–TE) for information on using the DEC OSF/1 Krash Utility.
•DEC VET or UETP: Refer to Section 3.3 for a description of DEC VET, and Section 3.4 for information on running UETP software exercisers.
•Other log files: Refer to Chapter 4 for information on using log files such as SETHOST.LOG and OPERATOR.LOG to aid in troubleshooting.
Use the following tables to identify the diagnostic flow for the five types of system
problems:
•Table 1–2 provides the diagnostic flow for power problems.
•Table 1–3 provides the diagnostic flow for problems getting to console mode.
•Table 1–4 provides the diagnostic flow for problems reported by the console
program.
•Table 1–5 provides the diagnostic flow for boot problems.
•Table 1–6 provides the diagnostic flow for errors reported by the operating
system.
Table 1–2 Diagnostic Flow for Power Problems

Symptom: No AC power at system as indicated by AC present LED.
•Action: Check the power source and power cord.

Symptom: AC power is present, but system does not power on.
•Action: Check the system AC circuit breaker setting.
•Action: Check the DC on/off switch setting.
•Action: Examine power supply subsystem LEDs to determine if a power supply unit or fan has failed, or if the system has shut down due to an overtemperature condition. Reference: Section 2.1.1
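The power-problem flow in Table 1–2 can be sketched as a single branch on the AC present LED. This minimal Python example assumes that the first action in the table belongs to the no-AC symptom and the remaining three to the symptom where AC is present but the system does not power on. The function name is hypothetical.

```python
from typing import List


def power_problem_actions(ac_present_led_lit: bool) -> List[str]:
    """Return the Table 1-2 actions, in order, for the observed symptom."""
    if not ac_present_led_lit:
        # No AC power at the system: start at the source.
        return ["Check the power source and power cord."]
    # AC power is present, but the system does not power on.
    return [
        "Check the system AC circuit breaker setting.",
        "Check the DC on/off switch setting.",
        "Examine the power supply subsystem LEDs (Section 2.1.1).",
    ]


if __name__ == "__main__":
    for step in power_problem_actions(ac_present_led_lit=True):
        print(step)
```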
Table 1–3 Diagnostic Flow for Problems Getting to Console Mode

Symptom: Power-up screens (or console event log) are not displayed.
•Action: Check OCP LEDs for a failure during self-tests. If two OCP LEDs remain lit, either option could be at fault. Reference: Section 2.1.2
•Action: Check baud rate setting for console terminal and system. The system default baud rate setting is 9600. Reference: Section 6.5
•Action: Try connecting the console terminal to the auxiliary console port.
Note: No console output is directed to the auxiliary console port until the power-up self-tests have completed and you press the Enter key or Ctrl/x.
•Action: For certain situations, power up under robust mode to bypass the power-up script and get to a low-level console. From console mode, you can then edit the nvram file, set and examine environment variables, or initialize individual phases of drivers. Reference: Section 2.2.3
Table 1–4 Diagnostic Flow for Problems Reported by the Console Program

Symptom: Power-up screens are displayed, but tests do not complete.
•Action: Use power-up display and/or OCP LEDs to determine error. Reference: Section 2.2 and Section 2.1.2

Symptom: Console program reports error.
•Action: Examine the console event log to check for embedded error messages recorded during power-up. Reference: Section 2.2.1
•Action: If power-up screens indicate problems with mass storage devices, use the troubleshooting flow charts to determine the problems. Reference: Section 2.2.2
•Action: Run RBD tests to verify problem. Reference: Section 3.1
•Action: Use the show error command to examine error information contained in serial control bus EEPROMs. Reference: Section 3.1.4
Table 1–5 Diagnostic Flow for Boot Problems

Symptom: System cannot find boot device.
•Action: Check system configuration for correct device parameters (node ID, device name, and so on) and environment variables (bootdef_dev, boot_file, boot_osflags). Reference: Section 6.2.1, Section 6.3, and Section 6.4

Symptom: Device does not boot.
•Action: Run device test to check that boot device is operating. Reference: Section 3.2
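When checking the environment variables named in Table 1–5, it can help to note which ones are unset before looking further. The following Python sketch assumes the values have already been read from the console and copied into a dictionary by hand. The function name and the sample values are illustrative only, and an empty value is flagged for review rather than treated as an error, because some variables may legitimately be left empty.

```python
# Boot-related environment variables named in Table 1-5.
BOOT_VARS = ("bootdef_dev", "boot_file", "boot_osflags")


def unset_boot_vars(env):
    """Return the boot-related variables that are missing or empty in env."""
    return [name for name in BOOT_VARS if not str(env.get(name, "")).strip()]


if __name__ == "__main__":
    # Hypothetical values transcribed from a console session.
    recorded = {"bootdef_dev": "dka0", "boot_file": "", "boot_osflags": "0"}
    print("Review these variables:", unset_boot_vars(recorded))
```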
Table 1–6 Diagnostic Flow for Errors Reported by the Operating System

Symptom: System is hung or has crashed.
•Action: Examine the crash dump file. Reference: Operating system documentation
•Action: Use the show error command to examine error information contained in serial control bus EEPROMs (console environment error log). Reference: Section 3.1.4

Symptom: Operating system is up.
•Action: Examine the operating system error log files to isolate the problem. Reference: Chapter 4
•Action: If the problem occurs intermittently, run DEC VET or UETP to stress the system. Reference: Section 3.3 and Section 3.4
•Action: Examine other log files, such as SETHOST.LOG, OPCOM.LOG, and OPERATOR.LOG.
1.2 Service Delivery Methodology
Before beginning any maintenance operation, you should be familiar with the
following:
•The site agreement
•Your local and area geography support and escalation procedures
•Your Digital Services product delivery plan
Service delivery methods are part of the service support and escalation
procedure. When appropriate, remote services should be part of the initial
system installation. Methods of service delivery include the following:
•Local support
•Remote call screening
•Remote diagnosis (using modem support)
Recommended System Installation
The recommended system installation includes:
1. Hardware installation and acceptance testing. Acceptance testing includes
running ROM-based diagnostics.
2. Software installation and acceptance testing. For example, using OpenVMS
Factory Installed Software (FIS), and then acceptance testing with DEC VET
or UETP.
3. Installation of the remote service tools and equipment to allow a Digital
Service Center to dial in to the system. Refer to your remote service delivery
strategy.
If you do not follow your service delivery methodology, you risk incurring
excessive service expenses for any product.
1.3 Product Service Tools and Utilities
This section lists the array of service tools and utilities available for acceptance
testing, diagnosis, and serviceability and provides recommendations for their use.
Error Handling/Logging
OpenVMS and DEC OSF/1 operating systems provide recovery from errors,
fault handling, and event logging. The OpenVMS Error Report Formatter
(ERF) provides bit-to-text translation of the event logs for interpretation.
DEC OSF/1 uses UERF to capture the same kinds of information.
RECOMMENDED USE: Analysis of error logs is the primary method of
diagnosis and fault isolation. If the system is up, or the customer allows the
service representative to bring the system up, look at this information first.
Refer to Chapter 4 for information on using error logs to isolate faults.
ROM-Based Diagnostics (RBDs)
ROM-based diagnostics have significant advantages:
•There is no load time.
•The boot path is more reliable.
•Diagnosis is done in console mode.
RECOMMENDED USE: The ROM-based diagnostic facility is the primary
means of console environment testing and diagnosis of the CPU, memory,
Ethernet, Futurebus+, and SCSI and DSSI subsystems. Use ROM-based
diagnostics in the acceptance test procedures when you install a system,
add a memory module, or replace the following: CPU module, memory
module, backplane, I/O module, Futurebus+ device, or storage device. Refer
to Section 3.1 for information on running ROM-based diagnostics.
Loopback Tests
Internal and external loopback tests are used to isolate a failure by testing
segments of a particular control or data path. The loopback tests are a subset
of the ROM-based diagnostics.
RECOMMENDED USE: Use loopback tests to isolate problems with the
auxiliary console port and Ethernet controllers. Refer to Section 3.1.12 for
instructions on performing loopback tests.
Firmware Console Commands
Console commands are used to set and examine environment variables and
device parameters. For example, the show memory, show configuration, and
show device commands are used to examine the configuration; the set
(bootdef_dev, auto_action, and boot_osflags) commands are used to set
environment variables; and the cdp command is used to configure DSSI
parameters.
RECOMMENDED USE: Use console commands to set and examine
environment variables and device parameters. Refer to Section 6.2 for
information on firmware commands and utilities.
Option LEDs During Power-Up
The power supply LEDs display pass/fail test results for the power supply
subsystem; the operator control panel (OCP) LEDs display pass/fail self-test
results for CPU, memory, I/O, and Futurebus+ modules. Storage devices and
Futurebus+ modules have their own LEDs as well.
RECOMMENDED USE: Monitor LEDs during power-up to see if the devices
pass their self-tests. Refer to Chapter 2 for information on LEDs and power-up tests.
Operating System Exercisers (DEC VET or UETP)
The Digital Verifier and Exerciser Tool (DEC VET) is supported by the
OpenVMS and DEC OSF/1 operating systems. DEC VET performs exerciser-oriented maintenance testing of both hardware and operating system. UETP
is included with OpenVMS and is designed to test whether the OpenVMS
operating system is installed correctly.
RECOMMENDED USE: Use DEC VET or UETP as part of acceptance testing
to ensure that the CPU, memory, disk, tape, file system, and network are
interacting properly. Also use DEC VET or UETP to stress test the user’s
environment and configuration by simulating system operation under heavy
loads to diagnose intermittent system failures.
Crash Dumps
For fatal errors, such as fatal bugchecks, OpenVMS and DEC OSF/1 operating
systems will save the contents of memory to a crash dump file.
RECOMMENDED USE: The support representative should analyze crash
dump files. To save a crash dump file for analysis, you need to know
proper system settings. Refer to the OpenVMS AXP Alpha System Dump Analyzer Utility Manual or the Guide to Kernel Debugging (AA–PS2TA–TE)
for instructions.
Other Log Files
Several types of log files, such as operator log, console event log, sethost log,
and accounting file (accounting.dat) are useful in troubleshooting.
RECOMMENDED USE: Use the sethost log and other log files to
capture/examine the console output and compare with event logs and crash
dumps in order to see what the system was doing at the time of the error.
1.4 Information Services
As a Digital service representative, you may access several information resources,
including advanced database applications, online training courses, and remote
diagnostic tools. A brief description of some of these resources follows.
Technical Information Management Architecture (TIMA)
TIMA is an online database that delivers technical and reference information
to service representatives. A key benefit of TIMA is the pooling of worldwide
knowledge and expertise.
DEC 4000 AXP Model 600 Series Information Set
The DEC 4000 AXP Model 600 Series Information Set consists of service
documentation that contains information on installing and using, servicing
and upgrading, and understanding the system. The guide you are reading
is part of the set. The hardcopy kit number is EK–KN430–DK. The set is
also available on TIMA. Refer to your DEC 4000 Model 600 Information Map
(EK–KN430–IN) for detailed information.
Training
Computer Based Training (CBT) and lecture lab courses are available from
the Digital training center:
•DEC 4000 System Installation and Troubleshooting (CBT course, EY–
Digital Services Product Delivery Plan (Hardware or Software)
The Product Delivery Plan documents Digital Services’ delivery commitments.
The plan is the communications vehicle used among the various groups
responsible for ensuring consistency between Digital Services’ delivery
strategies and engineering product strategies.
Blitzes
Technical updates are ‘‘blitzed’’ to the field using online mail and TIMA.
Storage and Retrieval System (STARS)
STARS is a worldwide database for storing and retrieving technical
information. The STARS databases, which contain more than 150,000 entries,
are updated daily.
Using STARS, you can quickly retrieve the most up-to-date technical
information via DSNlink or DSIN.
1.5 Field Feedback
Providing the proper feedback to the corporation is essential in closing the loop
on any service call. Consider the following when completing a service call:
•Fill out repair tags accurately and with as much symptom information as
possible so that repair centers can fix a problem.
•Provide accurate call closeout information for Labor Activity Reporting
System (LARS) or Call-Handling and Management Planning (CHAMP).
•Keep an up-to-date site maintenance log, whether hardcopy or electronic, to
provide a record of the performed maintenance.
2
Power-On Diagnostics and System LEDs
This chapter provides information on how to interpret system LEDs and the
power-up console screens. In addition, a description of the power-up and
bootstrap sequence is provided as a resource to aid in troubleshooting.
•Section 2.1 describes how to interpret system LEDs.
•Section 2.2 describes how to interpret the power-up screens.
•Section 2.3 describes the power-up sequence.
•Section 2.3.3 describes power-on self-tests.
•Section 2.4 describes the boot sequence.
2.1 Interpreting System LEDs
DEC 4000 AXP systems have several diagnostic LEDs that indicate whether
modules and subsystems have passed self-tests. The power system controller
constantly monitors the power supply subsystem and can indicate several types
of failures. The system LEDs are used primarily to troubleshoot power problems
and problems getting to the console program.
This section describes the function of each of the following types of system LEDs,
and what action to take when a failure is indicated.
•Power supply LEDs
•Operator control panel (OCP) LEDs
•I/O panel LEDs
•Futurebus+ option LEDs
•Storage device LEDs
Power-On Diagnostics and System LEDs 2–1
2.1.1 Power Supply LEDs
The power supply LEDs (Figure 2–1) are used to indicate the status of the
components that make up the power supply subsystem. The following types of
failures will cause the power system controller to shut down the system:
•Power system controller (PSC) failure
•Fan failure
•Overtemperature condition
•Power regulator failures (indicated by the DC3 or DC5 failure LEDs)
•Front end unit (FEU) failure
Note
The AC circuit breaker will also shut down the system. If a power surge
occurs, the breaker will trip, causing the switch to return to the off
position (0). If the circuit breaker trips, wait 30 seconds before setting the
switch to the on position (1).
Refer to Table 2–1 for information on interpreting the LEDs and determining
what actions to take when a failure is indicated.
Figure 2–2 shows the local disk converter (LDC) and fan locations as they
correspond to the fault ID display.
Figure 2–1 Power Supply LEDs
[Figure: power supply modules (FEU, PSC, DC5, DC3) and their indicators. FEU: AC Circuit Breaker, AC Present, FEU OK, FEU Failure. PSC: PSC OK, PSC Failure, Overtemperature Shutdown, Fan Failure, Disk Power Failure, Fault ID Display. DC5: DC5 OK, DC5 Failure. DC3: DC3 OK, DC3 Failure.]
Table 2–1 Interpreting Power Supply LEDs

Front End Unit (FEU)

Indicator: AC Present
Meaning: When lit, indicates AC power is present at the AC input connector (regardless of circuit breaker position).
Action on Error: If AC power is not present, check the power source and power cord. If the system will not power up and the AC LED is the only lit LED, check if the system AC circuit breaker has tripped. Replace the front end unit (Chapter 5) if the system circuit breaker is broken.

Indicator: FEU OK
Meaning: When lit, indicates DC output voltages for the FEU are above the specified minimum.

Indicator: FEU Failure
Meaning: When lit, indicates DC output voltages for the FEU are less than the specified minimum.
Action on Error: Replace front end unit (Chapter 5).

Power System Controller (PSC)

Indicator: PSC OK
Meaning: When blinking, indicates the PSC is performing power-up self-tests. When steady, indicates the PSC is functioning normally.

Indicator: PSC Failure
Meaning: When lit, indicates the PSC has detected a fault in itself.
Action on Error: Replace power system controller (Chapter 5).

Indicator: Disk Power Failure
Meaning: When lit, indicates a disk power problem for the storage compartment specified in the hexadecimal fault ID display. The most likely failing unit is the local disk converter, but a shorting cable or drive could also be at fault.
Action on Error: To isolate the local disk converter, disconnect the drives on the specified bus and then power up the system. If the Disk Power Failure LED lights with the drives disconnected, replace the failing local disk converter (Chapter 5). Refer to Figure 2–2 to locate the local disk converter specified by the fault ID display. A is the top compartment, D is the bottom compartment.

Indicator: Fan Failure
Meaning: When lit, indicates a fan has failed or a cable guide is not properly secured. The failure is identified by a number displayed in the hexadecimal fault ID display.
Action on Error: Refer to Figure 2–2 to locate the failure specified by the fault ID display. Replace the failing fan (Chapter 5).

Indicator: Overtemperature Shutdown
Meaning: When lit, indicates the PSC has shut down the system due to excessive internal temperature.
Action on Error: Set the AC circuit breaker to off (0) and wait one minute before turning on the system. Make sure the air intake is unobstructed and that the room temperature does not exceed maximum requirement as described in the DEC 4000 Site Preparation Checklist.

DC–DC Converter (DC3)

Indicator: DC3 OK
Meaning: When lit, indicates that all the DC3 output voltages are within specified tolerances.

Indicator: DC3 Failure
Meaning: When lit, indicates that one of the output voltages is outside specified tolerances.
Action on Error: Replace the DC3 converter (Chapter 5).

DC–DC Converter (DC5)

Indicator: DC5 OK
Meaning: When lit, indicates the DC5 output voltage is within specified tolerances.

Indicator: DC5 Failure
Meaning: When lit, indicates the DC5 output voltage is outside specified tolerances.
Action on Error: Replace the DC5 converter (Chapter 5).
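The failure indicators in Table 2–1 each map to a first corrective action, which lends itself to a simple lookup. The Python sketch below is one way to record that mapping for note-taking. The dictionary and function names are hypothetical, and the action text is abbreviated from the table.

```python
# Failure indicators from Table 2-1 and the first action the table gives for each.
FAILURE_ACTIONS = {
    "FEU Failure": "Replace front end unit (Chapter 5).",
    "PSC Failure": "Replace power system controller (Chapter 5).",
    "Disk Power Failure": "Disconnect the drives on the specified bus and power up again.",
    "Fan Failure": "Locate the failure from the fault ID display (Figure 2-2).",
    "Overtemperature Shutdown": "Set the AC circuit breaker to off (0) and wait one minute.",
    "DC3 Failure": "Replace the DC3 converter (Chapter 5).",
    "DC5 Failure": "Replace the DC5 converter (Chapter 5).",
}


def actions_for(lit_leds):
    """Return (indicator, action) pairs for every lit failure indicator."""
    return [(led, FAILURE_ACTIONS[led]) for led in lit_leds if led in FAILURE_ACTIONS]


if __name__ == "__main__":
    # Status indicators such as "AC Present" and "FEU OK" are ignored.
    for led, action in actions_for(["AC Present", "FEU OK", "DC3 Failure"]):
        print(led + ": " + action)
```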
Figure 2–2 LDC and Fan Unit Locations and Error Codes
[Figure: locations of local disk converters A, B, C, and D and of fans 1 through 4. Fans are located behind the cable guides.]
Fan Error Codes
1 - Rear left
2 - Rear right
3 - Front left
4 - Front right
9 - A cable guide is not properly secured or two or more fans have failed.
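The fault ID display is read together with the LED that is lit: with Fan Failure it shows a fan error code, and with Disk Power Failure it identifies a local disk converter. The following Python sketch decodes the codes documented in Table 2–1 and Figure 2–2. The function name is hypothetical, and the positions of converters B and C are not stated in the text, so they are left unspecified.

```python
# Fan error codes from Figure 2-2.
FAN_CODES = {
    "1": "Rear left fan",
    "2": "Rear right fan",
    "3": "Front left fan",
    "4": "Front right fan",
    "9": "A cable guide is not properly secured or two or more fans have failed",
}

# Local disk converter IDs; Table 2-1 states only that A is top and D is bottom.
LDC_CODES = {
    "A": "Local disk converter A (top compartment)",
    "B": "Local disk converter B",
    "C": "Local disk converter C",
    "D": "Local disk converter D (bottom compartment)",
}


def decode_fault_id(led, code):
    """Interpret the fault ID display for the failure LED that is lit."""
    code = code.strip().upper()
    if led == "Fan Failure":
        return FAN_CODES.get(code, "Unrecognized fan code " + code)
    if led == "Disk Power Failure":
        return LDC_CODES.get(code, "Unrecognized converter ID " + code)
    raise ValueError("The fault ID display applies only to fan and disk power failures")


if __name__ == "__main__":
    print(decode_fault_id("Fan Failure", "3"))
    print(decode_fault_id("Disk Power Failure", "a"))
```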