This manual is for anyone who services the DIGITAL Server 5300 systems. It
includes troubleshooting information, configuration rules, and instructions for removal
and replacement of field-replaceable units.
Digital Equipment Corporation
Maynard, Massachusetts
February 1998
Digital Equipment Corporation makes no representations that the use of its products in the manner described in this
publication will not infringe on existing or future patent rights, nor do the descriptions contained in this publication
imply the granting of licenses to make, use, or sell equipment or software in accordance with the description.
Possession, use, or copying of the software described in this publication is authorized only pursuant to a valid
written license from DIGITAL or an authorized sublicensor.
The following are trademarks of Digital Equipment Corporation: Alpha, DIGITAL, RRD46, StorageWorks, and
the DIGITAL logo.
The following are third-party trademarks:
Adobe and PostScript are registered trademarks of Adobe Systems, Incorporated.
Helvetica and Times are registered trademarks of Linotype Co.
Microsoft and MS-DOS are registered trademarks and Windows is a trademark of Microsoft Corporation.
This manual is written for the customer service engineer.
Document Structure
This manual uses a structured documentation design. Topics are organized into small
sections for efficient reference. Each topic begins with an abstract, followed by an
illustration or example, and ends with descriptive text. This manual has eight chapters, as
follows:
Chapter 1, System Overview, introduces the DIGITAL Server 5300 system. It
describes each system component.
Chapter 2, Power-Up, provides information on how to interpret the power-up display
on the operator control panel, the console screen, and system LEDs. It also
describes how hardware diagnostics execute when the system is initialized.
Chapter 3, Troubleshooting, describes troubleshooting during power-up and booting,
as well as the test command.
Chapter 4, Error Registers, describes the error registers used to hold error
information.
Chapter 5, Removal and Replacement, describes removal and replacement
procedures for field-replaceable units (FRUs).
Chapter 6, Running Utilities, explains how to run utilities such as the EISA
Configuration Utility and RAID Standalone Configuration Utility.
Chapter 7, Halts, Console Commands, and Environment Variables, summarizes the
commands used to examine and alter the system configuration.
Chapter 8, Managing the System Remotely, describes how to use the Remote
Console Manager (RCM) to monitor and control the system remotely.
Documentation Titles
The following table lists other books in the documentation set.
System Documentation
Title                                      Order Number
User and Installation Documentation Kit    QC–06CAB–H8
DIGITAL Server 5300 User’s Guide           ER–K8FWW–UA
DIGITAL Server 5300 Basic Installation     ER–K8FWW–IM
Information on the Internet
Access the latest system firmware with a Web browser as follows:
http://www.windowsnt.digital.com/
1
System Overview
The DIGITAL Server 5300 system base unit consists of up to two CPUs, up to 2 Gbytes of
memory, 6 I/O slots, and up to 7 SCSI storage devices. The system is enclosed in a
pedestal. DIGITAL Server 5300 systems can also be mounted in a standard 19” rack.
The DIGITAL Server 5300 system supports the Windows NT operating system.
Topics in this chapter include the following:
• System Enclosure
• Operator Control Panel and Drives
• System Consoles
• System Architecture
• CPU Types
• Memory
• Memory Addressing
• System Motherboard
• System Bus Backplane
• System Bus to PCI Bus Bridge
• PCI I/O Subsystem
• Remote Control Logic
• Power Control Logic
• Power Circuit and Cover Interlock
• Power Supply
• Power Up/Down Sequence
• Maintenance Bus (I2C Bus)
• StorageWorks Drives
System Enclosure
The system has up to two CPU modules and up to 2 Gbytes of memory. A
single fast wide or fast wide Ultra SCSI StorageWorks shelf provides
storage.
Figure 1-1 System Enclosure
The numbered callouts in Figure 1-1 refer to the system components.
1. System card cage, which holds the system motherboard and the CPU, memory, and
system I/O.
2. PCI/EISA section of the system card cage.
3. Operator control panel assembly, which includes the control panel, the LCD display,
and the floppy drive.
4. CD-ROM drive.
5. Cooling section containing two fans.
6. StorageWorks shelf.
Cover Interlock
The system has a single cover interlock switch tripped by the top cover. To override the
cover interlock, use a suitable object to close the interlock circuit. Disk damage will result if the system is run with the top cover off.
The control panel includes the On/Off, Halt, and Reset buttons and an LCD
display.
Figure 1-3 Control Panel Assembly
OCP display. The OCP display is a 16-character LCD that indicates status during power-up and self-test. While the operating system is running, the LCD displays the system type.
Its controller is on the XBUS.
CD-ROM. The CD-ROM drive is used to load software, firmware, and updates. Its
controller is on PCI1 on the PCI backplane on the system motherboard.
Floppy disk. The floppy drive is used to load software. The floppy controller is on the
XBUS on the PCI backplane on the system motherboard.
On/Off button. Powers the system on or off. When the LED to the right of the button is
lit, the power is on. The On/Off button is connected to the power supplies through the
system interlock and the RCM logic.
Reset button. Initializes the system.
Halt button. The effect of pressing the Halt button depends on the state of the
machine.
To get to the SRM console, press the Halt button and then press the Reset button.
(Pressing the Halt button when the system is running Windows NT causes a “halt
assertion” flag to be set in the firmware. When Reset is pressed the console reads the “halt
assertion” flag and ignores environment variables that would cause the system to boot.)
The function of the Halt button is complex because it depends on the state of the machine
when the button is pressed. See “Halt Button Functions” in Chapter 7 for a full discussion
of the Halt button.
System Consoles
There are two console programs: the SRM console and the AlphaBIOS
console.
NOTE: The console prompt displays only after the entire power-up sequence is
complete. This can take up to several minutes if the memory is very large.
On systems running the Windows NT operating system, the Boot menu is displayed when
the AlphaBIOS console is invoked (see Figure 1-4).
Figure 1-4 AlphaBIOS Boot Menu
AlphaBIOS 5.32
Please select the operating system to start:
Windows NT Server 4.0
Use ↑ and ↓ to move the highlight to your choice.
Press Enter to choose.
DIGITAL Server 5300
Press <F2> to enter SETUP
SRM Console
The SRM console is a command-line interface that provides support for examining and
modifying the system state and configuring and testing the system. The SRM console can
be run from a serial terminal or a graphics monitor. The following console prompt is
displayed whenever the SRM console is invoked:
P00>>>
AlphaBIOS Console
The AlphaBIOS console is a menu-based interface that supports the Microsoft Windows
NT operating system. AlphaBIOS is used to set up operating system selections, boot
Windows NT, and display information about the system configuration. The EISA
Configuration Utility and the RAID Standalone Configuration Utility are run from the
AlphaBIOS console. AlphaBIOS runs on either a serial or graphics terminal. Windows
NT requires a graphics monitor.
Environment Variables
Environment variables are software parameters that define, among other things, the system
configuration. They are used to pass information to different pieces of software running in
the system at various times.
Refer to Chapter 7 of this guide for a list of the environment variables used to configure a
system.
Refer to your system User’s Guide for information on setting environment variables.
Most environment variables are stored in the NVRAM that is placed in a socket on the
system motherboard. Even though the NVRAM can be removed and replaced on a new
system motherboard, it is recommended that you keep a record of the environment
variables for each system that you service. Some environment variable settings are lost
when a module is swapped and must be restored after the new module is installed. Refer
to Chapter 7 for a convenient worksheet for recording environment variable settings.
System Architecture
Alpha microprocessor chips are used in these systems. The CPU, memory,
and the I/O modules are connected to the system motherboard.
Figure 1-5 Architecture Diagram
Note: When the EISA/ISA slot on PCI Bus 0 is used, the last PCI slot on PCI Bus 1 is not
available.
The system uses the Alpha chip for the CPU. The CPU, memory, and I/O devices
connect to the system motherboard. On the system motherboard are:
• The system bus
• Two system bus to PCI bus chip sets that bridge two PCI buses to the system bus
• Two 64-bit PCI buses with three PCI option slots each (five 64-bit PCI slots; one 32-bit
PCI slot)
• One EISA/ISA bus bridged to one of the PCIs (If an EISA/ISA option is used, one PCI
slot cannot be used)
• One CD-ROM controller built in to the other PCI
• One EISA/ISA to XBUS bridge to the built-in XBUS options
A fully configured system can have two CPUs, eight DIMM memory pairs, and a total of
six I/O options. The I/O options can be all PCI options or a combination of PCI options
and a single EISA/ISA option.
The system bus has a 128-bit data bus, protected by 16 bits of ECC, and a 40-bit
command/address bus, protected by parity. The bus speed is set to 66.6 MHz. The 40-bit
address bus can address one terabyte (about a million million bytes). The bus
connects CPUs, memory, and the system bus to PCI bus bridges.
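The bus arithmetic above can be checked with a short sketch. It assumes the widths given in this chapter (40 address bits, 128 data bits, 16 ECC bits); the function names are illustrative only and do not correspond to any system software.

```python
# Widths as stated in this chapter; names are hypothetical, for illustration.
ADDRESS_BITS = 40   # command/address bus width
DATA_BITS = 128     # data bus width
ECC_BITS = 16       # ECC bits protecting the data bus


def address_space_bytes(address_bits: int = ADDRESS_BITS) -> int:
    """Number of distinct byte addresses an address bus of this width can form."""
    return 1 << address_bits


def data_path_width(data_bits: int = DATA_BITS, ecc_bits: int = ECC_BITS) -> int:
    """Total signal lines carrying data plus ECC."""
    return data_bits + ecc_bits


if __name__ == "__main__":
    print(address_space_bytes())            # bytes addressable with 40 bits
    print(address_space_bytes() // 2**30)   # the same figure in Gbytes
    print(data_path_width())                # data plus ECC lines
```

With 40 address bits the result is 2 to the 40th power bytes, which is 1024 Gbytes, or one terabyte.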
There is a cache external to the CPU chip on CPU modules. The Alpha chip has an 8-Kbyte instruction cache (I-cache), an 8-Kbyte write-through data cache (D-cache), and a
96-Kbyte, write-back secondary data cache (S-cache). The cache system is write-back.
The system supports up to two CPUs.
Memory on these systems is constructed of DIMM memory pairs placed onto two memory
modules called riser cards. The riser cards are placed into the two memory slots on the
system motherboard. One member of a DIMM pair is placed onto one riser card, and the
other member is placed onto another riser card. Each riser card drives half of the system
bus, along with the associated ECC bits. Memory pairs consist of two synchronous
DIMMs of the same size and are placed into the same slot on each riser card.
The system bus-to-PCI bus bridge chip set translates system bus commands and data
addressed to I/O space to PCI commands and data. It also translates PCI bus commands
and data addressed to system memory or CPUs to system bus commands and data. The
PCI bus is a 64-bit wide bus used for I/O.
Logic and sensors on the system motherboard monitor power status and the system
environment (temperature and fan speeds).
CPU Types
There are several CPU variants differentiated by CPU speeds.
Figure 1-6 CPU Module Placement
Alpha Chip Composition
The Alpha chip is made using state-of-the-art chip technology, has a transistor count of 9.3
million, consumes 50 watts of power, and is air cooled (a fan is on the chip). The default
cache system is write-back and when the module has an external cache, it is write-back.
The Alpha chip used in these systems is the 21164.
The following rules should be applied to CPU configuration:
• The first CPU must be in CPU slot 0 to provide the system clock.
• The second CPU should be installed in CPU slot 1.
• Both CPUs must have the same Alpha chip clock speed. The system bus may hang
without an error message if the oscillators clocking the CPUs are different.
Memory
Memory consists of two riser cards and up to eight pairs of DIMMs. Each
riser card receives one of the two DIMMs in the DIMM pair. There are two
DIMM variants: a 32-Mbyte version and a 128-Mbyte version.
Figure 1-7 Memory Placement
Memory Variants
Memory consists of two riser cards supporting eight DIMM pairs. There are two DIMM
variants: a 32-Mbyte version and a 128-Mbyte version. Maximum memory using 32-Mbyte DIMMs is 128 Mbytes, and the maximum memory using 128-Mbyte DIMMs is 2
Gbytes. All memory is synchronous.
Table 1-3 Memory Variants
Option      Size      Module         Type      DRAM No.   Size
MS300-BA    64 MB     54-25084-DA    Synch.    18         4M x 72 = 32MB
MS300-DA    256 MB    54-25092-DA    Synch.    18         16M x 72 = 128MB
Memory Operation
Memory drives the system bus in bursts. Upon each memory fetch, data is transferred in 4
consecutive cycles transferring 64 bytes. Each DIMM in the pair provides half the data, or
64 bits plus 8 ECC bits, of the octaword (16 byte) transferred on the system bus. DIMMs
are placed in slots on the riser cards, which are placed in the slots designated MEM L and
MEM H on the system motherboard.
Memory in slot MEM L does not drive the lower 8 bytes, and memory in slot
MEM H does not drive the higher 8 bytes of the 16-byte transfer. Some bits
originating from MEM L are high order bits, and some bits originating from
MEM H are low order bits.
In a system, memories of different sizes are permitted, but:
• DIMMs are installed and used in pairs. Both DIMMs in a memory pair must be of
the same size.
• Each riser card receives one DIMM of the DIMM pair.
• The largest DIMM pair must be in riser card slot 0.
• Other memory pairs must be the same size or smaller than the first memory pair.
• Memory pairs must be installed in consecutive slots.
• Memory configurations that have a 64-Mbyte pair in riser card slot 0 are limited to
two DIMM pairs or 128 Mbytes for the system. (The reason for this restriction is that
the bit map describing memory holes can grow larger than physical memory.)
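The configuration rules above can be expressed as a small checker. This is a minimal sketch, not a DIGITAL tool: it assumes a configuration is described as a list of DIMM pair sizes in Mbytes indexed by riser card slot (None for an empty slot), and it reads "consecutive slots" together with "largest pair in slot 0" as meaning that pairs fill slots starting at slot 0. All names are hypothetical.

```python
VALID_PAIR_SIZES_MB = (64, 256)  # two 32-Mbyte DIMMs or two 128-Mbyte DIMMs
MAX_SLOTS = 8                    # eight DIMM pair slots per riser card


def check_memory_config(pairs_mb):
    """Return a list of rule violations; an empty list means the layout is valid.

    pairs_mb[i] is the size in Mbytes of the DIMM pair in riser card slot i,
    or None if the slot is empty.
    """
    problems = []
    if len(pairs_mb) > MAX_SLOTS:
        problems.append("more than eight slots described")
    installed = [(slot, size) for slot, size in enumerate(pairs_mb) if size is not None]
    if not installed:
        return ["no memory pairs installed"]
    for slot, size in installed:
        if size not in VALID_PAIR_SIZES_MB:
            problems.append(f"slot {slot}: unsupported pair size {size} Mbytes")
    slots = [slot for slot, _ in installed]
    if slots != list(range(len(slots))):
        problems.append("pairs must occupy consecutive slots starting at slot 0")
    first = pairs_mb[0]
    if first is not None:
        if any(size > first for _, size in installed):
            problems.append("the largest pair must be in slot 0")
        if first == 64 and len(installed) > 2:
            problems.append("a 64-Mbyte pair in slot 0 limits the system to two pairs")
    return problems
```

For example, `check_memory_config([256, 256, 64])` reports no problems, while `check_memory_config([64, 256])` reports that the largest pair is not in slot 0.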
Memory Addressing
Memory addressing in these systems is fixed regardless of the size of the
DIMMs. The address of a DIMM pair is fixed according to the slot in which
the pair is placed. The starting address of each pair in each slot on the riser
card starts on a 512-Mbyte boundary.
Figure 1-8 How Memory Addressing Is Calculated
Riser Card Slot    Starting Address    Address Space (Gbytes)
7                  e0000000            3.5
6                  c0000000            3.0
5                  a0000000            2.5
4                  80000000            2.0
3                  60000000            1.5
2                  40000000            1.0
1                  20000000            .5
0                  00000000            0
The address space ends at 4.0 Gbytes.
The rules for addressing memory are as follows:
1. A memory pair consists of two DIMMs of the same size.
2. Memory pairs in riser cards may be of different sizes.
3. The memory pair in slot 0 must be the largest of all memory pairs. Other memory
pairs may be as large but none may be larger.
4. The physical starting address of each memory pair is N times 512 Mbytes (20000000
hexadecimal), where N is the slot number on the riser card.
5. Memory addresses are contiguous within each memory pair.
6. If memory pairs do not completely fill the 512-Mbyte space provided, memory
“holes” occur in the physical address space.
7. Software creates contiguous virtual memory even though physical memory may not be
contiguous.
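The addressing rules can be worked through in a few lines. The sketch below assumes the same list-of-pair-sizes description of a configuration used for illustration elsewhere (sizes in Mbytes, indexed by slot, None for empty); the function names are hypothetical.

```python
SLOT_SPACING = 512 * 2**20  # 512 Mbytes, 20000000 hexadecimal
SLOT_COUNT = 8


def pair_start_address(slot: int) -> int:
    """Physical starting address of the DIMM pair in the given riser card slot."""
    if not 0 <= slot < SLOT_COUNT:
        raise ValueError("slot must be 0 through 7")
    return slot * SLOT_SPACING


def memory_map(pairs_mb):
    """Return (slot, start, end_exclusive, unused_bytes) for each installed pair.

    unused_bytes is the part of the slot's 512-Mbyte window that the pair does
    not fill, that is, the size of the memory hole following the pair.
    """
    result = []
    for slot, size_mb in enumerate(pairs_mb):
        if size_mb is None:
            continue
        start = pair_start_address(slot)
        size = size_mb * 2**20
        result.append((slot, start, start + size, SLOT_SPACING - size))
    return result


if __name__ == "__main__":
    for slot, start, end, hole in memory_map([256, 64]):
        print(f"slot {slot}: {start:08x}-{end - 1:08x}, hole of {hole // 2**20} Mbytes")
```

A 64-Mbyte pair in slot 1, for instance, starts at 20000000 and leaves a 448-Mbyte hole before the slot 2 boundary at 40000000.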
System Motherboard
The system motherboard contains five major logic sections performing five
major system functions.
• The PCI backplane containing two PCI buses, an EISA/ISA bus, a built-in CD-ROM
controller, and an XBUS with several devices integral to the system.
System Bus (Backplane)
The system bus consists of a 40-bit command/address bus, a 128-bit plus ECC data bus,
and several control signals and clocks. The system bus is part of the system motherboard.
Figure 1-10 System Bus Block Diagram
The system bus consists of a 40-bit command/address bus, a 128-bit plus ECC data bus,
and several control signals, clocks, and a bus arbiter. The bus requires that all CPUs have
the same high-speed oscillator providing the clock to the Alpha chip.
The system bus connects up to two CPUs, up to eight DIMM memory pairs on two riser
cards, and two I/O bus bridges.
The system bus clock is provided by an oscillator on the CPU in slot CPU0. This
oscillator is adjusted to maintain the system bus at a 66 MHz speed no matter what the
speed of the CPU is.
The system bus backplane initiates memory refresh transactions.
Five volt, 3.43 volt, and 12 volt power is provided directly to the motherboard from the
power supplies.
System Bus to PCI Bus Bridge
The bridge is the physical interconnect between the system bus and the PCI bus.
Figure 1-11 System Bus to PCI Bus Bridge Block Diagram
The system bus to PCI bus bridge module converts system bus commands and data
addressed to I/O space to PCI commands and data; and converts PCI bus commands and
data addressed to system memory or CPUs to system bus commands and data.
The bridge has two major components:
• Command/address processor (CAP) chip
• Two data path chips (MDPA and MDPB)
There are two sets of these three chips, one set for each PCI.
The interface on the system bus side of the bridge responds to system bus commands
addressed to the upper 64 Gbytes of I/O space. I/O space is addressed whenever bit <39>
on the system bus address lines is set. The space so defined is 512 Gbytes in size. The
first 448 Gbytes are reserved and the last 64 Gbytes, when bits <38:36> are set, are
mapped to the PCI I/O buses.
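The address decode described above can be illustrated as follows. This is a sketch of the rule as stated in the text (bit <39> selects I/O space; bits <38:36> all set select the PCI-mapped region), not the bridge's actual logic, and the names are hypothetical.

```python
ADDRESS_MASK = (1 << 40) - 1      # 40-bit system bus address
IO_SPACE_BIT = 1 << 39            # bit <39>: address is in I/O space
PCI_SELECT_MASK = 0b111 << 36     # bits <38:36>


def classify_address(address: int) -> str:
    """Classify a 40-bit system bus address according to the text above."""
    if address < 0 or address > ADDRESS_MASK:
        raise ValueError("address does not fit in 40 bits")
    if not (address & IO_SPACE_BIT):
        return "not I/O space"
    if (address & PCI_SELECT_MASK) == PCI_SELECT_MASK:
        return "PCI I/O (last 64 Gbytes)"
    return "reserved (first 448 Gbytes)"
```

Bit <39> alone selects a 512-Gbyte region. Requiring three more bits to be set leaves one eighth of that region, or 64 Gbytes, for the PCI buses.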
The interface on the PCI side of the bridge responds to commands addressed to CPUs and
memory on the system bus. On the PCI side, the bridge provides the interface to the PCIs.
Each PCI bus is addressed separately. The bridge does not respond to devices
communicating with each other on the same PCI bus. However, should a device on one
PCI address a device on the other PCI bus, commands, addresses, and data run through the
bridge out onto the system bus and back through the bridge to the other PCI bus.
In addition to its bridge function, the system bus to PCI bus bridge module monitors every
transaction on the system bus for errors. It monitors the data lines for ECC errors and the
command/address lines for parity errors.
PCI I/O Subsystem
The I/O subsystem consists of two 64-bit PCI buses. One has an embedded EISA/ISA
bridge and three PCI option slots; the other has a built-in CD-ROM controller and three PCI
option slots.
The logic for the two PCI buses is on the system motherboard.
• PCI0 is a 64-bit bus with a built-in PCI to EISA/ISA bus bridge. PCI0 has three PCI
slots and one EISA/ISA slot. When the EISA/ISA slot is used, PCI slot 4 on PCI bus
1 is not available. An 8-bit XBUS is connected to the EISA/ISA bus. On this bus
there is an interface to the system I2C bus; mouse and keyboard support; an I/O
combo controller supporting two serial ports, the floppy controller, and a parallel port;
a real-time clock; two 1-Mbyte flash ROMs containing system firmware; and an 8-Kbyte NVRAM.
• PCI1 is a 64-bit bus with a built-in CD-ROM SCSI controller with three PCI slots.
Cable connectors to the CD-ROM, the floppy, and the OCP are on the motherboard.
Connectors for the mouse, keyboard, two COM ports, the parallel port, and a modem are on
the system bulkhead. The bulkhead is part of the system motherboard.
Remote Control Logic
A section of the motherboard provides remote control operation of the system. A four-switch switchpack enables or disables remote control features.
Figure 1-13 Remote Control Logic
The system allows both local and remote control. A set of switches enables or disables
remote control.
Table 1-5 Remote Control Switch Functions
Switch         Condition        Function
1 EN RCM       On (default)     Allows remote system control
               Off              Does not allow remote system control
2 MODEM OFF    On               Disables the RCM modem port
               Off (default)    Enables the RCM modem port
3 RPD DIS      On               Disables remote power down
               Off (default)    Enables remote power down
4 SET DEF      On               Resets the RCM microprocessor defaults
               Off (default)    Allows use of conditions set by the user
The default settings allow complete remote control. To restrict remote control, change
the switch settings as shown in Table 1-5.
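As a quick reference, the switch functions in Table 1-5 can be captured in a lookup table. The sketch below is illustrative only; the data structure and function name are assumptions, and nothing here reads the actual switchpack.

```python
# switch number -> (label, function when On, function when Off, default is On)
RCM_SWITCHES = {
    1: ("EN RCM", "Allows remote system control",
        "Does not allow remote system control", True),
    2: ("MODEM OFF", "Disables the RCM modem port",
        "Enables the RCM modem port", False),
    3: ("RPD DIS", "Disables remote power down",
        "Enables remote power down", False),
    4: ("SET DEF", "Resets the RCM microprocessor defaults",
        "Allows use of conditions set by the user", False),
}


def describe_switches(positions=None):
    """Describe each switch; positions maps switch number to True (On) or False (Off).

    Switches missing from positions are taken to be at their default setting.
    """
    positions = positions or {}
    lines = []
    for number, (label, when_on, when_off, default_on) in sorted(RCM_SWITCHES.items()):
        is_on = positions.get(number, default_on)
        lines.append(f"{number} {label}: {when_on if is_on else when_off}")
    return lines
```

Calling `describe_switches()` with no argument lists the default behavior, which allows complete remote control.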
See Chapter 8 for information on controlling the system remotely.
The remote console manager connects to a modem through the modem port on the
bulkhead. The RCM uses VAUX power provided by the system power supplies.
The standard I/O ports (keyboard, mouse, COM1 and COM2 serial ports, and parallel
port) are on the same bulkhead.
Power Control Logic
The power control section of the motherboard controls power sequencing and monitors
power supply voltage, system temperature, and fans.
Figure 1-14 Power Control Logic
The power control logic performs these functions:
• Monitors system temperature and powers down the system 30 seconds after it detects
that the internal temperature of the system is above the value of the environment variable
over_temp. Default = 55° C.
• Monitors the system and CPU fans at one second intervals and powers down the
system 30 seconds after it detects a fan failure.
• Provides some visual indication of faults through LEDs.
• Controls reset sequencing.
• Provides I2C interface for fans, power supplies, and temperature signals:
  • Power supply 0, 1: present
  • Power supply 0, 1: power OK
  • CPU fan 0, 1: OK
  • CPU 1: present
  • Overtemp: Temp OK
  • System fan 0, 1: OK
  • Fan Kit OK
Power Circuit and Cover Interlock
Power is distributed throughout the system and mechanically can be broken
by the On/Off switch, the cover interlock, or remotely through the RCM.
Figure 1-15 Power Circuit Diagram
Figure 1-15 shows the distribution of power throughout the system. DC power applied to
the system is interrupted by an open in the circuit, by the RCM signal RCM_DC_EN_L, or by
a power fault detected by a power supply. The opens can be caused by the On/Off button
or the cover interlock.
A failure anywhere in the circuit will result in the removal of DC power. A potential
failure is the relay used in the remote control logic to control the RCM_DC_EN_L signal.
The cover interlock is located under the top cover between the system card cage and the
storage area. To override the interlock, place a suitable object in the interlock switch that
closes it.
Power Supply
Two power supplies provide system power.
Figure 1-16 Back of Power Supply and Location
Description
Two power supplies each provide 450 W to the system. Redundant power is not available
at this time.
Power Supply Features
• 88–132 and 176–264 Vrms AC input
• 450 watts output. Output voltages are as follows:
Output Voltage    Min. Voltage    Max. Voltage    Max. Current
+5.0V is sensed on the system motherboard.
+3.43V is sensed on all CPUs in the system and the system bus motherboard.
• Current share on +5.0V, +3.43V, and +12V.
• 1% regulation on +3.43V.
• Fault protection (latched). If a fault is detected by the power supply, it will shut
down. The power supply faults detected are:
Fan Failure
Over-voltage
Overcurrent
Power overload
• DC_ENABLE_L input signal starts the DC outputs.
• SHUTDOWN_H input signal shuts the power supply off in case of a system fan or
CPU fan failure.
• POK_H output signal indicates that the power supply is operating properly.
Power Up/Down Sequence
System power can be controlled manually by the On/Off button on the OCP or remotely
through the RCM. The power-up/down sequence flow is shown below.
Figure 1-17 Power Up/Down Sequence Flowchart
When AC is applied to the system, Vaux (auxiliary voltage) is asserted and is sensed by
the power control logic (PCL) section of the motherboard. If the On/Off button is On, the
PCL asserts DC_ENABLE_L, starting the power supplies. If there is a hard fault on
power-up, the power supplies shut down immediately; otherwise, the power system powers
up and remains up until the system is shut off or the PCL senses a fault. If a power fault is
sensed, the signal SHUTDOWN is asserted after a 30-second delay. Cycling the On/Off
button can restore the power.
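The sequence described above can be summarized as a small decision function. This is a simplified sketch of the paragraph, not the PCL implementation: the function, its parameters, and the result fields are hypothetical, and only the 30-second delay is taken from the text.

```python
from typing import NamedTuple, Optional

SHUTDOWN_DELAY_S = 30  # delay before SHUTDOWN is asserted, from the text


class PowerStatus(NamedTuple):
    vaux_present: bool             # auxiliary voltage is asserted
    dc_outputs_on: bool            # power supplies are delivering DC
    shutdown_in_s: Optional[int]   # seconds until shutdown, None if none pending


def power_status(ac_applied: bool, button_on: bool,
                 hard_fault: bool = False, sensed_fault: bool = False) -> PowerStatus:
    """Return the power state implied by the inputs, following the text above."""
    if not ac_applied:
        return PowerStatus(False, False, None)
    if not button_on:
        # Vaux is present whenever AC is applied, even with the button Off.
        return PowerStatus(True, False, None)
    if hard_fault:
        # Hard fault on power-up: the supplies shut down immediately.
        return PowerStatus(True, False, 0)
    if sensed_fault:
        # Fault sensed while running: SHUTDOWN follows after the delay.
        return PowerStatus(True, True, SHUTDOWN_DELAY_S)
    return PowerStatus(True, True, None)
```

The sketch makes one point explicit that is easy to miss when servicing the system: Vaux is present on the motherboard whenever AC is applied, regardless of the On/Off button.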
Maintenance Bus (I2C Bus)
The I2C bus (referred to as the “I squared C bus”) is a small internal
maintenance bus used to monitor system conditions scanned by the power
control logic, write the fault display, store error state, and track
configuration information in the system. Although all system modules (not
I/O modules) sit on the maintenance bus, only the I2C controller accesses it.
Figure 1-18 I2C Bus Block Diagram
Monitor
The I2C bus monitors the state of system conditions scanned by the power control logic.
There are two registers that the power control logic writes data to:
• One records the state of the fans and power supplies and is latched when there is a
fault.
• The other causes an interrupt on the I2C bus when a CPU or system fan fails, an
overtemperature condition exists, or power supplied to the system exhibits an
overcurrent condition.
The interrupt, received by the I2C bus controller on PCI 0 and passed on to the IOD 0 chip
set, alerts the system of imminent power shutdown. The controller has 30 seconds to read
the two registers and store the information in the EEPROM on the motherboard. The SRM
console command show power reads these registers.
Fault Display
The OCP display is written through the I2C bus.
Error State
Error state is stored for power, fan, and overtemperature conditions on the I2C bus.
Configuration Tracking
Each CPU and each logical section of the system motherboard (the PCI bridge, the PCI
backplane, the power control logic, the remote console manager), and the system
motherboard itself has an EEPROM that contains information about the module that can
be written and read over the I2C bus. All EEPROMs contain the following information:
• Module type
• Module serial number
• Hardware revision for the logical block
• Firmware revision
StorageWorks Drives
The system supports up to seven StorageWorks drives.
Figure 1-19 StorageWorks Drive Location
The StorageWorks drives are to the right of the system cage. Up to seven drives fit into
the shelf. The system supports fast wide Ultra SCSI disk drives. The RAID controller is
also supported. With an optional Ultra SCSI Bus Splitter Kit the StorageWorks shelf can
be split into two buses.
2
Power-Up
This chapter describes system power-up testing and explains the power-up displays. The
following topics are covered:
• Control Panel
• Power-Up Sequence
• SROM Power-Up Test Flow
• SROM Errors Reported
• XSROM Power-Up Test Flow
• XSROM Errors Reported
• Console Power-Up Tests
• Console Device Determination
• Console Power-Up Display
• Fail-Safe Loader
Control Panel
The control panel display indicates the likely device when testing fails.
Figure 2-1 Control Panel and LCD Display
Sample LCD display: P0 TEST 11 CPU0
(Callouts 1 through 4 in the figure mark the display fields described in Table 2-1.)
• When the On/Off button LED is on, power is applied and the system is running.
When it is off, the system is not running, but power may or may not be present. If the
power supplies are receiving AC power, Vaux is present on the system motherboard
regardless of the condition of the On/Off switch.
• When the Halt button LED is lit and the On/Off button LED is on, the system should
be running either the SRM console or Windows NT.
The potentiometer, accessible through the access hole just above the Reset button, controls
the intensity of the LCD. Use a small Phillips head screwdriver to adjust it.
Table 2-1 Control Panel Display
Field   Content             Display                   Meaning
1       CPU number          P0–P1                     CPU reporting status
2       Status              TEST                      Tests are executing
                            FAIL                      Failure has been detected
                            MCHK                      Machine check has occurred
                            INTR                      Error interrupt has occurred
3       Test number
4       Suspected device    CPU0–1                    CPU module number
                            MEM0–7 and L, H, or *     Memory pair number and low DIMM, high DIMM, or either
                            IOD0 (1)                  Bridge to PCI bus 0
                            IOD1 (1)                  Bridge to PCI bus 1
                            FROM0 (1)                 Flash ROM
                            COMBO (1)                 COM controller
                            PCEB (1)                  PCI-to-EISA bridge
                            ESC (1)                   EISA system controller
                            NVRAM (1)                 Nonvolatile RAM
                            TOY (1)                   Real-time clock
                            I8242 (1)                 Keyboard and mouse controller
(1) On the system motherboard (54-25147-01).
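When reading the display during service, it can help to split a line into its four fields. The sketch below parses the sample display "P0 TEST 11 CPU0". It assumes the fields are separated by spaces as in that sample and treats the test number as an opaque string; the function name and return format are hypothetical.

```python
import re

# Status codes and meanings from Table 2-1.
STATUS_MEANINGS = {
    "TEST": "Tests are executing",
    "FAIL": "Failure has been detected",
    "MCHK": "Machine check has occurred",
    "INTR": "Error interrupt has occurred",
}

# Assumed layout: CPU number, status, test number, suspected device.
OCP_PATTERN = re.compile(
    r"^P(?P<cpu>[01])\s+(?P<status>TEST|FAIL|MCHK|INTR)\s+(?P<test>\S+)\s+(?P<device>\S+)$"
)


def parse_ocp_display(text: str) -> dict:
    """Split a control panel display line into its four fields."""
    match = OCP_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized display text: {text!r}")
    status = match.group("status")
    return {
        "cpu": int(match.group("cpu")),
        "status": status,
        "meaning": STATUS_MEANINGS[status],
        "test": match.group("test"),
        "device": match.group("device"),
    }
```

For the sample display, the result identifies CPU 0 as the reporting CPU, TEST as the status, 11 as the test number, and CPU0 as the suspected device.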
Power-Up Sequence
Console and most power-up tests reside on the I/O subsystem, not on the
CPU or on any other module on the system bus.
Figure 2-2 Power-Up Flow
[Flow: Power-up/reset -> SROM code loaded into each CPU's I-cache -> SROM tests execute -> XSROM loaded into each CPU's S-cache -> XSROM tests execute -> SRM console loaded into memory -> SRM console tests execute -> SRM console either remains in the system or loads the AlphaBIOS console]
Definitions
SROM. The SROM is a 128-Kbit ROM on each CPU module. The ROM contains
minimal diagnostics that test the Alpha chip and the path to the XSROM. Once the path is
verified, it loads XSROM code into the Alpha chip and jumps to it.
XSROM. The XSROM, or extended SROM, contains back-up cache and memory tests,
the I/O subsystem tests for embedded devices, and a fail-safe loader. The XSROM code
resides in sector 0 of FEPROM 0 on the XBUS. Sector 2 of FEPROM 0 contains a
duplicate copy of the code and is used if sector 0 is corrupt. Code for sizing DIMM
memory resides in sector 1 of FEPROM 0 along with the PALcode.
FEPROM. Two 1-Mbyte programmable ROMs (FEPROMs) are on the XBUS on PCI0.
FEPROM 0 contains two copies of the XSROM, and the SRM console and decompression
code. FEPROM 1 contains the AlphaBIOS and NT HAL code. See Figure 2-3. These
two FEPROMs can be flash updated. Refer to Chapter 6.
Figure 2-3 Contents of FEPROMs
[Figure: FEPROM 0, by 64-Kbyte sector: sector 0, XSROM and fail-safe loader; sector 1, PALcode and XSROM DIMM code; sector 2, XSROM and fail-safe loader; remaining sectors, decompression and SRM console code. FEPROM 1 (1 Mbyte): AlphaBIOS code.]
For the console to run, the path from the CPU to the XSROM must be functional. The
XSROM resides in FEPROM0 on the XBUS, off the EISA bus, off PCI 0, off IOD 0. See
Figure 2-4. This path is minimally tested by SROM.
[Figure 2-4: path from the CPU to the XSROM. Recoverable labels: PCI slots, EISA slot, XBUS, real-time clock, combo I/O (serial ports, parallel port, floppy controller), mouse/keyboard, I2C bus interface, BDATA transceivers, NVRAM (8K x 8), flash ROM (2 MB). A partly legible note states that when a slot on PCI bus 0 is used, the last PCI slot on PCI bus 1 is not available.]
The SROM contents are loaded into each CPU’s I-cache and executed on power-up/reset.
After testing the caches on each processor chip, it tests the path to the XSROM. Once this
path is tested and deemed reliable, layers of the XSROM are loaded sequentially into the
processor chip on each CPU. None of the SROM or XSROM power-up tests are run from
memory—all run from the caches in the CPU chip, thus providing excellent diagnostic
isolation. Later power-up tests, run under the console, are used to complete testing of the
I/O subsystem.
There are two console programs: the SRM console and the AlphaBIOS console, as detailed
in your system User’s Guide. By default, the SRM console is always loaded and I/O
system tests are run under it before the system loads AlphaBIOS.
SROM Power-Up Test Flow
The SROM tests the CPU chip and the path to the XSROM.
Figure 2-5 SROM Power-Up Test Flow
[Flowchart. Recoverable steps, for each CPU: initialize CPU chip; turn off CPU LED; check for D-cache errors; check that all 3 S-cache banks pass; check for duplicate tag or fill errors; light CPU LED; determine primary; size IOD; run loopback on each IOD and light the IOD LEDs on a pass; initialize PCI-EISA bridge chip; read TOY NVRAM; initialize combo chip on XBUS for access to COM port 1; initialize OCP port on XBUS for access to OCP display; print to console device and OCP; initialize all S-cache banks; check integrity of XSROM; load first 8K of XSROM into S-cache; jump to XSROM overlay in S-cache. Failure exits are marked HANG, including the XSROM integrity check after it fails twice.]
The Alpha chip built-in self-test tests the I-cache at power-up and upon reset.
Each CPU chip loads its SROM code into its I-cache and starts executing it. If the chip is
partially functional, the SROM code continues to execute. However, if the chip cannot
perform most of its functions, that CPU hangs and its pass/fail LED remains off.
(In these systems, the CPU pass/fail LED is not visible.)
If the system has more than one CPU and at least one passes both the SROM and XSROM
power-up tests, the system will bring up the console. The console checks the
FW_SCRATCH register where evidence of the power-up failure is left. Upon finding the
error, the console sends these messages to COM1 and the OCP:
• COM1 (or VGA): Power-up tests have detected a problem with your system
• OCP: Power-up failure
Table 2-2 lists the tests performed by the SROM.
Table 2-2 SROM Tests

Test Name                     Logic Tested
D-cache RAM March test        D-cache access, D-cache data, D-cache address logic
D-cache Tag RAM March test    D-cache tag store RAM, D-cache bank address logic
S-cache Data March test       S-cache RAM cells, S-cache data path, S-cache address path
S-cache Tag RAM March test    S-cache tag store RAM, S-cache bank [...]
[...]                         SC_CTL register and parity error forcing logic, SC_STAT register and reporting logic
[...]                         [...] through CAP chip and MDP0 on each IOD, PCI0 A/D lines <31:0>
SROM Errors Reported
The SROM reports machine checks, pending interrupt/exception errors,
and errors related to corruption of FEPROM 0. If SROM errors are fatal,
the particular CPU will hang and only the CPU self-test pass LEDs and/or
the LEDs on the system motherboard will indicate the failure. The CPU
self-test pass LED is not visible but the IOD0 and IOD1 pass LEDs are.
Example 2-1 SROM Errors Reported at Power-Up
Unexpected Machine Check (CPU Error)
UNEX MCHK on CPU 0
EXC_ADR 42a9
EI_STAT fffffff004ffffff
EI_ADDR ffffff000000801f
SC_STAT 0
SC_ADDR FFFFFF0000005F2F
Once the SROM has completed its tests and verified the path to the
FEPROM containing the XSROM code, it loads the first 8 Kbytes of
XSROM into the primary CPU’s S-cache and jumps to it. XSROM tests are
described in Table 2-3. Failure indicates a CPU failure.
Figure 2-6 XSROM Power-Up Flowchart
[Flowchart. Recoverable steps: XSROM banner to OCP/console device; clear SC_FHIT (force hit); enable all 3 S-cache banks; initialize B-cache and enable duplicate tag; run B-cache tests, printing errors to the OCP/console device and a done message to the console device; boot processor redetermination; size system memory through the I squared C bus; print memory information to the console device; check for an illegal memory configuration, printing warnings to the console device and OCP; initialize all memory pairs; run memory tests, printing trace and errors to the OCP/console device and a done message to the console device; boot processor redetermination; primary verifies the checksum of the PAL/decompression/console code. On a pass, the primary unloads the PAL/decompression code, jumps to PALcode, and starts the console; the secondaries are alerted that the console has started, then jump to and run PALcode and join the console. On a fail, the fail-safe loader runs.]
Note: The XSROM can only print to the console device if the environment variable console = serial. It always sends output to the OCP.
After jumping to the primary CPU's S-cache, the code then intentionally I-caches itself and
is completely register based (no D-stream for stack or data storage is used). The only
D-stream accesses are writes/reads during testing.
Each FEPROM has sixteen 64-Kbyte sectors. The first sector contains B-cache tests,
memory tests, and a fail-safe loader. The second sector contains support for system
memory and PALcode. The third sector contains a copy of the first sector. The remaining
thirteen sectors contain the SRM console and decompression code.
Memory tests are run during power-up and reset (see Figure 2-4). They are also
affected by the state of the memory_test environment variable, which can have
the following values:
FULL      Test all memory
PARTIAL   Test up to the first 256 Mbytes
NONE      Test 32 Mbytes
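The effect of the three memory_test values can be sketched in a few lines of Python. This is an illustration under stated assumptions, not console code: the function name mbytes_tested is hypothetical, and it assumes that the PARTIAL and NONE limits are capped by the amount of memory actually installed.

```python
from typing import Optional

# Upper bound, in Mbytes, on the memory tested for each memory_test value.
# None means no limit (FULL tests all memory).
_LIMITS = {"FULL": None, "PARTIAL": 256, "NONE": 32}


def mbytes_tested(memory_test: str, installed_mbytes: int) -> int:
    """Return how many Mbytes the power-up memory tests would cover."""
    key = memory_test.upper()
    if key not in _LIMITS:
        raise ValueError(f"memory_test must be one of {sorted(_LIMITS)}")
    limit: Optional[int] = _LIMITS[key]
    # Assumption: a limit larger than the installed memory has no effect.
    return installed_mbytes if limit is None else min(installed_mbytes, limit)


# 640 Mbytes matches the configuration shown in Example 2-3.
for setting in ("FULL", "PARTIAL", "NONE"):
    print(setting, mbytes_tested(setting, 640))
```

The smaller settings shorten power-up time at the cost of leaving most of memory untested until the operating system or the test command exercises it.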
Test  Test Name                   Logic Tested
11    B-cache Data March test     B-cache data RAMs, CPU chip B-cache control, CPU chip B-cache address decode, INDEX_H<23:6> (address bus)
12    B-cache Tag March test      B-cache tag store RAMs, B-cache STAT store RAMs
13    B-cache ECC Data Line test  CPU chip ECC generation and checking logic, ECC lines from CPU chip to B-cache, B-cache ECC RAMs
14    B-cache Tag Data Line test  Access to B-cache tags, shorts between tag data and its status and parity bits
15    B-cache Data Line test      B-cache data lines to B-cache data RAMs, B-cache read/write logic
16    B-cache ECC Data Line test  CPU chip ECC generation and checking logic, ECC lines from CPU chip to B-cache, B-cache ECC RAMs
Table 2-4 Memory Tests

Test  Test Name               Logic Tested                     Description
20    Memory Data test        Data path to and from memory     Test floats 1 and 0 across data and check bit data lines. Errors are reported for each DIMM memory card from MEM0_L to MEM7_H.
21    Memory Address test     Address path to and from memory  Same as test 20.
23*   Memory Bitmap Building  No new logic                     Maps out bad memory by way of the bitmap. It does not completely fail memory.
24    Memory March test       No new logic                     Maps out bad memory.

*There is no test 22.
XSROM Errors Reported
The XSROM reports B-cache test errors and memory test errors. It also
reports a warning if memory is illegally configured.
Example 2-2 XSROM Errors Reported at Power-Up
B-cache Error (CPU Error)
TEST ERR on cpu0          #CPU running the test
FRUcpu0
err#2
tst#11
exp:5555555555555555      #Expected data
rcv:aaaaaaaaaaaaaaaa      #Received data
adr:ffff8                 #B-cache location
Memory Error (Memory Module Indicated)
20..21..
TEST ERR on cpu0          #CPU running test
FRU:MEM1L                 #Low member of memory pair 1
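When an error report gives expected and received data, the exclusive OR of the two values shows which data bits differ. The Python sketch below is a generic illustration of that comparison, using the exp and rcv values from the B-cache error above; the function name failing_bits is hypothetical and nothing here is produced by the XSROM itself.

```python
from typing import List


def failing_bits(expected_hex: str, received_hex: str) -> List[int]:
    """Return the bit positions (0 = least significant) where the values differ."""
    diff = int(expected_hex, 16) ^ int(received_hex, 16)
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


bits = failing_bits("5555555555555555", "aaaaaaaaaaaaaaaa")
print(len(bits), "bits differ")
```

In the example report every bit is inverted, which suggests a problem common to the whole data path rather than a single stuck line.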
Once loaded, the SRM console tests each IOD further. Table 2-5 describes
the IOD power-up tests, and Table 2-6 describes the PCI power-up tests.
Table 2-5 IOD Tests

Test #  Test Name                             Description
1       IOD CSR Access test                   Read and write all CSRs in each IOD.
2       Loopback test                         Dense space writes to the IOD’s PCI dense space to check the integrity of ECC lines.
3       ECC test                              Loopback tests similar to test 2 but with a varying pattern to create an ECC of 0s. Single- and double-bit errors are checked.
4       Parity Error and Fill Error tests     Parity errors are forced on the address and data lines on system bus and PCI buses. A fill error transaction is forced on the system bus.
5       Translation Error test                A loopback test using scatter/gather address translation logic on each IOD.
6       Write Pending test                    Runs test 2 with the write-pending bit set and clear in the CAP chip control register.
7       PCI Loopback test                     Loops data through each PCI on each IOD, testing the mask field of the system bus.
8       PCI Peer-to-Peer Byte Mask test       Tests that devices on the same PCI and on different PCIs can communicate.
9       Page Table Entry test 1 (CAP chip)    Tests every PTE using scatter/gather translation and addressing.
10      Page Table Entry test 2 (CAP chip)    Tests random PTEs forcing use of all interesting tag and page registers.
Table 2-6 PCI Motherboard Tests

Test Number  Test Name                               Diagnostic Name  Description
1            PCEB                                    pceb_diag        Tests the PCI to EISA bridge chip
2            ESC                                     esc_diag         Tests the EISA system controller
3            8K NVRAM                                nvram_diag       Tests the NVRAM
4            Real-Time Clock                         ds1287_diag      Tests the real-time clock chip
5            Keyboard and Mouse                      i8242_diag       Tests the keyboard/mouse chip
6            Flash ROM                               flash_diag       Dumps contents of flash ROM
7            Serial and Parallel Ports and Floppy    combo_diag       Tests COM ports 1 and 2, the parallel port, and the floppy
8            CD-ROM                                  ncr810_diag      Tests the CD-ROM controller
For both IOD tests and PCI 0 and PCI 1 tests, trace and failure status are sent to the OCP. If
any of these tests fails, a warning is sent to the SRM console device after the console
prompt (or AlphaBIOS pop-up box). The IOD LEDs on the system motherboard are
controlled by the diagnostics. If an LED is off, a failure occurred.
Console Device Determination
After the SROM and XSROM have completed their tasks, the SRM console
program, as it starts, determines where to send its power-up messages.
Figure 2-7 Console Device Determination Flowchart
Console Device Options
The console device can be either a serial terminal or a graphics monitor. Specifically:
• A serial terminal connected to COM1 off the bulkhead. The terminal connected to
COM1 must be set to 9600 baud. This baud rate cannot be changed.
• A graphics monitor off an adapter on PCI0.
Systems running Windows NT must have a graphics monitor as the console device and run
AlphaBIOS as the console program.
During power-up, the SROM and the XSROM always send progress and error messages to
the OCP and to the COM1 serial port if the SRM console environment variable (set with
the set console command) is set to serial. If the console environment variable is set to
graphics, no messages are sent to COM1.
If the console device is connected to COM1, the SROM, XSROM, and console power-up
messages are sent to it once it has been initialized. If the console device is a graphics
device, console power-up messages are sent to it, but SROM and XSROM power-up
messages are lost. No matter what the console environment variable setting, each of the
three programs sends messages to the control panel display.
Power-Up Messages   Console Set to   Console Set to
Sent By             Serial           Graphics
SROM                COM1             Lost, though a subset is sent to the OCP
XSROM               COM1             Lost, though a subset is sent to the OCP
SRM console         COM1             VGA, though a subset is sent to the OCP
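The routing rules in this section can be summarized as a small lookup. The Python sketch below is illustrative only: the function name message_destinations and the sender labels are hypothetical, and the rules are taken from the text and table above (the OCP always receives messages, COM1 only when console is set to serial, and the graphics monitor only from the SRM console).

```python
from typing import List

SENDERS = ("SROM", "XSROM", "SRM console")


def message_destinations(sender: str, console: str) -> List[str]:
    """Return where power-up messages from `sender` appear for a console setting."""
    if sender not in SENDERS:
        raise ValueError(f"sender must be one of {SENDERS}")
    if console not in ("serial", "graphics"):
        raise ValueError("console must be 'serial' or 'graphics'")
    destinations = ["OCP"]  # every program writes to the control panel display
    if console == "serial":
        destinations.append("COM1")
    elif sender == "SRM console":
        destinations.append("VGA")  # SROM and XSROM messages are lost in this case
    return destinations


for sender in SENDERS:
    print(sender, message_destinations(sender, "graphics"))
```

This is why a serial terminal on COM1 is the more useful console device when a power-up failure is being diagnosed.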
Console Power-Up Display
The entire power-up display prints to a serial terminal (if the console
environment variable is set to serial), and parts of it print to the control
panel display. The last several lines print to either a serial terminal or a
graphics monitor.
Example 2-3 Power-Up Display
SROM V3.0 on cpu0
SROM V3.0 on cpu1
XSROM V5.0 on cpu0
XSROM V5.0 on cpu1
BCache testing complete on cpu1
BCache testing complete on cpu0
mem_pair0 - 256 MB
mem_pair1 - 256 MB
mem_pair2 - 64 MB
mem_pair3 - 64 MB
20..21..20..21..23..24..24..
Memory testing complete on cpu0
Memory testing complete on cpu1
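The memory pair lines in the display give a quick check of the installed memory. The Python sketch below sums them; it assumes the line format shown in Example 2-3 ("mem_pairN - SIZE MB"), and the function name total_memory_mb is hypothetical.

```python
import re

# Memory pair lines copied from Example 2-3.
DISPLAY = """\
mem_pair0 - 256 MB
mem_pair1 - 256 MB
mem_pair2 - 64 MB
mem_pair3 - 64 MB
"""


def total_memory_mb(display: str) -> int:
    """Add up the sizes reported on the mem_pair lines of a power-up display."""
    sizes = re.findall(r"^mem_pair\d+ - (\d+) MB$", display, flags=re.MULTILINE)
    return sum(int(size) for size in sizes)


print(total_memory_mb(DISPLAY), "MB")
```

A total that is lower than the memory known to be installed points to a memory pair that was not sized at power-up.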
At power-up or reset, the SROM code on each CPU module is loaded into that
module’s I-cache and tests the module. If all tests pass, the processor’s LED
lights. If any test fails, the LED remains off and power-up testing terminates on
that CPU.
The first determination of the primary processor is made, and the primary
processor executes a loopback test to each PCI bridge. If this test passes, the
bridge LED lights. If it fails, the LED remains off and power-up continues. The
EISA system controller, PCI-to-EISA bridge, COM1 port, and control panel port
are all initialized thereafter.
Each CPU prints an SROM banner to the device attached to the COM1 port and
to the control panel display. (The banner prints to COM1 if the console
environment variable is set to serial. If it is set to graphics, nothing prints to the
console terminal, only to the control panel display, until the primary CPU starts the console.)
Each processor’s S-cache is initialized, and the XSROM code in the FEPROM on
PCI 0 is unloaded into it. (If the unload is not successful, a copy is unloaded
from a different FEPROM sector. If the second try fails, the CPU hangs.)
Each processor jumps to the XSROM code and sends an XSROM banner to the
COM1 port and to the control panel display.
The three S-cache banks on each processor are enabled, and then the
B-cache is tested. If a failure occurs, a message is sent to the COM1 port and to
the control panel display.
Each CPU sends a B-cache completion message to COM1.
The primary CPU is again determined, and memory is sized using code in sector 1
of FEPROM 0.
The information on memory pairs is sent to COM1. If an illegal memory
configuration is detected, a warning message is sent to COM1 and the control
panel display.
Memory is initialized and tested, and the test trace is sent to COM1 and the
control panel display. Each CPU participates in the memory testing. The
numbers for tests 20 and 21 might appear interspersed, as in Example 2–3. This
is normal behavior. Test 24 can take several minutes if the memory is very large.
The message “P0 TEST 24 MEM**” is displayed on the control panel display;
the second asterisk rotates to indicate that testing is continuing. If a failure occurs,
a message is sent to the COM1 port and to the control panel display.
Each CPU sends a test completion message to COM1.
Example 2–3 Power-Up Display (Continued)
starting console on CPU 0
sizing memory
0 256 MB DIMM
1 256 MB DIMM
2 64 MB DIMM
3 64 MB DIMM
starting console on CPU 1
probing IOD1 hose 1
bus 0 slot 1 - NCR 53C810
bus 0 slot 2 - DECchip 21041-AA
bus 0 slot 3 - NCR 53C810
probing IOD0 hose 0
bus 0 slot 1 - PCEB
probing EISA Bridge, bus 1
bus 0 slot 2 - S3 Trio64/Trio32
bus 0 slot 3 - DECchip 21140-AA
Configuring I/O adapters...
Ncr0, hose 1, bus 0, slot 1
Tulip0, hose 1, bus 0, slot 2
Ncr1, hose 1, bus 0, slot 3
Floppy0, hose 0, bus 1 slot 0
Mc0, hose 0 bus 0, slot 2
tulip1, hose 0, bus 0, slot 3
System temperature is 31 degrees C
DIGITAL Server 5300 Console V5.0, 02-SEP-1997 18:18:26
P00>>>
The final primary CPU determination is made. The primary CPU unloads
PALcode and decompression code from the FEPROM on PCI 0 to its B-cache.
The primary CPU then jumps to the PALcode to start the SRM console.
The primary CPU prints a message indicating that it is running the console.
Starting with this message, the power-up display is printed to the default console
terminal, regardless of the state of the console environment variable. (If console is
set to graphics, the display from here to the end is saved in a memory buffer and
printed to the graphics monitor after the PCI buses are sized and the graphics
device is initialized.)
The size and type of each memory pair is determined.
The console is started on each of the secondary CPUs. A status message prints for
each CPU.
The PCI bridges (indicated as IODn) are probed and the devices are reported. I/O
adapters are configured.
The SRM console banner and prompt are printed. (The SRM prompt is shown in
this manual as P00>>>. It can, however, be P01>>>.)
The SRM console loads and starts the AlphaBIOS console.
Fail-Safe Loader
The fail-safe loader is a software routine that loads the SRM console image
from floppy. Once the console is running you will want to run LFU to
update FEPROM 0 with a new image.
If the fail-safe loader loads, the following conditions exist on the machine:
• The SROM has passed its tests and successfully unloaded the XSROM. If the SROM
fails to unload both copies of XSROM, it reports the failure to the control panel
display and COM1 if possible, and the system hangs.
• The XSROM has completed its B-cache and memory tests but has failed to unload the
PALcode in FEPROM 0 sector 1 or the SRM console code.
• The XSROM reports the errors encountered and loads the fail-safe loader.
3 Troubleshooting
This chapter describes troubleshooting during power-up and booting. It also describes the
console test command and other useful commands. The following topics are covered:
• Troubleshooting with LEDs
• Troubleshooting Power Problems
• Running Diagnostics—Test Command
• Releasing Secure Mode
• Testing an Entire System
• Other Useful Console Commands
Troubleshooting with LEDs
During power-up, reset, initialization, or testing, diagnostics are run on
CPUs, memories, I/O bridges, and the PCI backplane and its embedded
options. This section describes possible problems that can be identified by
checking LEDs. Unfortunately, LEDs on the CPU module are not visible;
the only visible LEDs are on the system motherboard.
Figure 3-1 System Motherboard LEDs
[Figure: system motherboard LEDs, labeled IOD 0 Pass, IOD 1 Pass, Fan Fault, and Temp OK]
System Motherboard LEDs
You see the system motherboard LEDs by looking through the grate at the back of the
machine. The normal state of the LEDs is shown in Figure 3-1.
• If one of the IOD LEDs is off, the system bus to PCI bus bridge has failed. Replace
the system motherboard.
• If the Fan Fault LED is ON, at least one of the four fans is broken. If this condition
occurs while the system is up and running, an error message identifying the FRU is
printed to the console. If this condition occurs during a cold start, to identify which
fan caused the fan fault, reset the system and watch the OCP display. During the first
30 seconds, one of the following messages should appear:
• SYSx Fan Failed where x = 0 or 1
• CPUx Fan Failed where x = 0 or 1
Replace the failing FRU.
• If the Temp OK LED is OFF, an overtemperature condition exists. Several things can
cause this condition: blocked airflow, temperature in the room where the system is
located is too high, the system card cage is open and air is not channeled properly over
the system. Fix any of these conditions, if possible. The overtemperature threshold is
programmable and is controlled by the environment variable over_temp. Its default
is 55 degrees C. After the system has cooled down and can be powered up, you can
change the threshold. If you do this and the temperature inside the system gets too
hot, it is likely that system errors will occur and the system may crash.
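The LED checks above amount to a short decision list. The following Python sketch restates them; it is an illustration only, the function name led_findings is hypothetical, and the returned strings paraphrase the actions described in this section.

```python
from typing import List


def led_findings(iod0_pass: bool, iod1_pass: bool,
                 fan_fault: bool, temp_ok: bool) -> List[str]:
    """Translate the four system motherboard LED states into service actions.

    Each argument is True when the corresponding LED is lit.
    """
    findings = []
    if not (iod0_pass and iod1_pass):
        findings.append("System bus to PCI bus bridge failed: replace the system motherboard")
    if fan_fault:
        findings.append("Fan failure: identify the fan from the console or OCP message")
    if not temp_ok:
        findings.append("Overtemperature: check airflow, room temperature, and card cage")
    return findings


# Normal state: both IOD LEDs and Temp OK lit, Fan Fault off.
print(led_findings(True, True, False, True))
```

An empty list corresponds to the normal LED state shown in Figure 3-1.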
Troubleshooting Power Problems
Power problems can occur before the system is up or while the system is
running.
Power Problem List
The system will halt for the following reasons:
1. A CPU fan failure
2. A system fan failure
3. An overtemperature condition
4. Power supply failure
5. Circuit breaker(s) tripped
6. AC problem
7. Interlock switch activation or failure
8. Environmental electrical failure or unrecoverable system fault with auto_action ev =
halt or boot
9. Cable failure
Indication of failure:
1. LEDs indicate fan and overtemperature condition
2. The OCP display
3. Circuit breaker(s) tripped
There is no obvious indication for failures 7 through 9 from the power system.
Halt Caused by Power, Fan, or Overtemperature Condition
If a system is stopped because of a power, fan, or overtemperature problem, the console
and the OCP should report the problem.
If Power Problem Occurs at Power-Up
If the system has a power problem on a cold start, the motherboard LEDs and the OCP
display will indicate a problem. Causes of power problems are:
• Broken system fan
• Broken CPU fan
• A power supply could be broken and the system could still power up momentarily.
(During power-up, an overcurrent condition occurs with two power supplies and is
tolerated for a short period but a persistent overcurrent is not.)
• Power control logic on the motherboard could fail
• Interlock failure
• Wire problems
• Temperature problem (unlikely)
Recommended Order for Troubleshooting Failure at Power-Up
1. If the SRM console does not come all the way up, restart the system if the system runs NT
and watch for an error message on the OCP display. Replace the FRU indicated.
2. If you can get to the SRM console, use the show power command. It will show the
last power fault.
3. If neither step 1 nor step 2 identifies a FRU, replace the motherboard.
Running Diagnostics — Test Command
The test command runs diagnostics on the entire system, CPU devices,
memory devices, and the PCI I/O subsystem. The test command runs only
from the SRM console. Ctrl/C stops the test. The console cannot be secure.
Example 3-1 Test Command Syntax
P00>>> help test
FUNCTION
SYNOPSIS
test [-q] [-t <time>] [option]
where option is:
cpun
memn
pcin
where n = 0, 1 or * for CPUs and PCIs
where n = 0 through 7 or * for MEM
The entire system is tested by default if no option is specified.
If you are running the Microsoft Windows NT operating system, switch from
AlphaBIOS to the SRM console in order to enter the test command. From the
AlphaBIOS console, press in the Halt button (the LED will light) and reset the
system.
-t time Specifies the run time in seconds. The default for system test is 600 seconds (10
minutes).
-qDisables the display of status messages as exerciser processes are started and
stopped during testing.
option Either cpun, memn, or pcin, where n is 0, 1, or * for CPUs and PCIs; or where n
is 0 through 7 or * for memory. If nothing is specified, the entire system is tested.
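The syntax above can be checked before a command is typed at the console or sent over a serial line. The Python sketch below builds a command string from the documented pieces; the function name build_test_command is hypothetical, and the sketch only formats text and does not talk to a console.

```python
import re
from typing import Optional

# Valid options per the synopsis: cpun and pcin with n = 0, 1, or *;
# memn with n = 0 through 7 or *.
_OPTION = re.compile(r"^(?:(?:cpu|pci)[01*]|mem[0-7*])$")


def build_test_command(option: Optional[str] = None,
                       seconds: Optional[int] = None,
                       quiet: bool = False) -> str:
    """Return an SRM console test command line, validating each argument."""
    parts = ["test"]
    if quiet:
        parts.append("-q")  # suppress exerciser start/stop status messages
    if seconds is not None:
        if seconds <= 0:
            raise ValueError("run time must be a positive number of seconds")
        parts += ["-t", str(seconds)]
    if option is not None:
        if not _OPTION.match(option):
            raise ValueError(f"invalid option: {option!r}")
        parts.append(option)
    return " ".join(parts)


print(build_test_command("mem*", seconds=120, quiet=True))
```

Calling build_test_command() with no arguments returns the plain test command, which tests the entire system for the default 600 seconds.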
Releasing Secure Mode
The console cannot be secure for most SRM console commands to run. If
the console is not secure, user mode console commands can be entered. See
the system manager if the system is secure and you do not know the
password.
Example 3-2 Releasing/Reestablishing Secure Mode
P00>>> login
Please enter password: xxxx
P00>>>
[User mode SRM console commands are now available.]
P00>>> set secure
The console command login clears secure.
If the password has been forgotten and the system is in secure mode, the procedure for
regaining control is:
1. Enter the login command
P00>>> login
2. At the Please enter password: prompt, press the Halt button and then press the
Return key.
The password is now cleared and the console is in user mode. A new password must be set
to put the console into secure mode again.
For a full discussion of securing the console, see your system User’s Guide.
Testing an Entire System
A test command with no modifiers runs all exercisers for subsystems and
devices on the system. I/O devices tested are supported boot devices. The
test runs for 10 minutes.
Example 3-3 Sample Test Command
P00>>> test
Console is in diagnostic mode
System test, runtime 600 seconds
Type ^C to stop testing
Configuring system..
polling ncr0 (NCR 53C810) slot 1, bus 0 PCI, hose 1 SCSI Bus ID 7
dka500.5.0.1.1 DKa500 RRD45 1645
polling ncr1 (NCR 53C810) slot 3, bus 0 PCI, hose 1 SCSI Bus ID 7
dkb200.2.0.3.1 DKb200 RZ29B 0007
dkb400.4.0.3.1 DKb400 RZ29B 0007
polling floppy0 (FLOPPY) PCEB - XBUS hose 0
dva0.0.0.1000.0 DVA0 RX23
polling tulip0 (DECchip 21040-AA) slot 2, bus 0 PCI, hose 1
ewa0.0.0.2.1: 08-00-2B-E5-B4-1A
Starting background memory test, affinity to all CPUs..
Starting processor/cache thrasher on each CPU..
Starting processor/cache thrasher on each CPU..
Testing SCSI disks (read-only)
No CD/ROM present, skipping embedded SCSI test
Testing other SCSI devices (read-only)..
Testing floppy drive (dva0, read-only)
ID Program Device Pass Hard/Soft Bytes Written Bytes Read
NI72000047
PCI Motherboard 25147-01 a 0003 saddle0
NI72000047
Bus 0 iod0 (PCI0)
Slot Option Name Type Rev Name
1 PCEB 4828086 0005 pceb0
2 S3 Trio64/Trio32 88115333 0054 vga0
3 DECchip 21041-AA 141011 0011 tulip0
Bus 1 pceb0 (EISA Bridge connected to iod0, slot 1)
Slot Option Name Type Rev Name
Bus 0 iod1 (PCI1)
Slot Option Name Type Rev Name
1 NCR 53C810 11000 0002 ncr0
4 QLogic ISP1020 10201077 0005 isp0
4 Error Registers
This chapter describes the registers used to hold error information. These registers
include:
• External Interface Status Register
• External Interface Address Register
• MC Error Information Register 0
• MC Error Information Register 1
• CAP Error Register
• PCI Error Status Register 1
External Interface Status Register - EI_STAT
The EI_STAT register is a read-only register that is unlocked and cleared
by any PALcode read. A read of this register also unlocks the EI_ADDR,
BC_TAG_ADDR, and FILL_SYN registers subject to some restrictions.
The EI_STAT register is not unlocked or cleared by reset.
Fill data from B-cache or main memory could have correctable or uncorrectable errors in
ECC mode. System address/command parity errors are always treated as uncorrectable
hard errors, irrespective of the mode. The sequence for reading, unlocking, and clearing
EI_STAT, EI_ADDR, BC_TAG_ADDR, and FILL_SYN is as follows:
1. Read the EI_ADDR, BC_TAG_ADDR, and FILL_SYN registers in any order. This does
not unlock or clear any register.
2. Read the EI_STAT register. This operation unlocks the EI_ADDR, BC_TAG_ADDR,
and FILL_SYN registers. It also unlocks the EI_STAT register subject to conditions
given in Table 4-2, which defines the loading and locking rules for external interface
registers.
If the first error is correctable, the registers are loaded but not locked. On the
second correctable error, the registers are neither loaded nor locked.
Registers are locked on the first uncorrectable error except the second hard error
bit. This bit is set only for an uncorrectable error that follows an uncorrectable
error. A correctable error that follows an uncorrectable error is not logged as a
second error. B-cache tag parity errors are uncorrectable in this context.
Bits      Access  Description
<63:36>           All ones.
<35>      R       Second External Interface Hard Error. Indicates
                  that a fill from B-cache or main memory, or a system
                  address/command received by the CPU has a hard
                  error while one of the hard error bits in the EI_STAT
                  register is already set.
<34>      R       Fill I-Ref D-Ref. When set, indicates that the error
                  occurred during an I-ref fill. When clear, indicates
                  that the error occurred during a D-ref fill. This bit has
                  meaning only when one of the ECC or parity error bits
                  is set. This bit is not defined for a B-cache tag parity
                  error (BC_TPERR) or a B-cache tag control parity
                  error (BC_TC_ERR).
<33>      R       External Interface Command/Address Parity
                  Error. Indicates that an address and command
                  received by the CPU has a parity error.
<32>      R       Uncorrectable ECC Error. Indicates that fill data
                  received from outside the CPU contained an
                  uncorrectable ECC error. In parity mode, this bit
                  indicates a data parity error.
<31>      R       Correctable ECC Error. Indicates that fill data
                  received from outside the CPU contained a correctable ECC error.
<30>      R       External Interface Error Source. When set,
                  indicates that the error source is fill data from main
                  memory or a system address/command parity error.
                  When clear, the error source is fill data from the B-cache.
                  This bit is only meaningful when <COR_ECC_ERR>,
                  <UNC_ECC_ERR>, or <EI_PAR_ERR> is set in this
                  register. This bit is not defined for a B-cache tag error
                  (BC_TPERR) or a B-cache tag control parity error
                  (BC_TC_ERR).
<29>      R       B-Cache Tag Control Parity Error. Indicates that a
                  B-cache read transaction encountered bad parity in the
                  tag control RAM.
<28>      R       B-Cache Tag Address Parity Error. Indicates that a
                  B-cache read transaction encountered bad parity in the
                  tag address RAM.
<27:24>   R       Chip Identification. Read as “5.” Future update
                  revisions to the chip will return new unique values.
<23:0>            All ones.
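The bit assignments above can be applied to a logged EI_STAT value with a short decoder. The Python sketch below is an illustration under stated assumptions: the function name decode_ei_stat is hypothetical, and the position of the correctable ECC error bit (<31>) is inferred from the order of the descriptions. The sample value is the EI_STAT value printed in Example 2-1.

```python
# Single-bit fields of EI_STAT, keyed by bit position.
EI_STAT_FLAGS = {
    35: "second external interface hard error",
    34: "error occurred during an I-ref fill (D-ref fill when clear)",
    33: "command/address parity error (EI_PAR_ERR)",
    32: "uncorrectable ECC error (UNC_ECC_ERR)",
    31: "correctable ECC error (COR_ECC_ERR)",  # bit position inferred
    30: "error source is main memory or system address/command",
    29: "B-cache tag control parity error (BC_TC_ERR)",
    28: "B-cache tag address parity error (BC_TPERR)",
}


def decode_ei_stat(value: int) -> dict:
    """Return the set flags and the chip identification field of EI_STAT."""
    return {
        "flags": [name for bit, name in EI_STAT_FLAGS.items() if (value >> bit) & 1],
        "chip_id": (value >> 24) & 0xF,  # bits <27:24>
    }


# EI_STAT value from Example 2-1.
print(decode_ei_stat(0xFFFFFFF004FFFFFF))
```

For that value no error bits are set, so the machine check in the example was not caused by an external interface error latched in EI_STAT.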
External Interface Address Register - EI_ADDR
The EI_ADDR register contains the physical address associated with errors
reported by the EI_STAT register. It is unlocked by a read of the EI_STAT
Register. This register is meaningful only when one of the error bits is set.
Address: FF FFF0 0148
Access: R
Table 4-2 Loading and Locking Rules for External Interface Registers
Correctable  Uncorrectable  Second Hard   Load      Lock            Action When
Error        Error          Error         Register  Register        EI_STAT Is Read
0            0              Not possible  No        No              Clears and unlocks all registers
1            0              Not possible  Yes       No              Clears and unlocks all registers
0            1              0             Yes       Yes             Clears and unlocks all registers
1 [1]        1              0             Yes       Yes             Clear bit (c) does not unlock.
                                                                    Transition to “0,1,0” state.
0            1              1             No        Already locked  Clears and unlocks all registers
1 [1]        1              1             No        Already locked  Clear bit (c) does not unlock.
                                                                    Transition to “0,1,1” state.
[1] These are special cases. It is possible that when EI_ADDR is read, only the correctable error bit is set and the
registers are not locked. By the time EI_STAT is read, an uncorrectable error is detected and the registers are
loaded again and locked. The value of EI_ADDR read earlier is no longer valid. Therefore, for the “1,1,x” case,
when EI_STAT is read, the correctable error bit is cleared and the registers are not unlocked or cleared. Software
must reexecute the IPR read sequence. On the second read operation, error bits are in “0,1,x” state, all the related
IPRs are unlocked, and EI_STAT is cleared.
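The rules in Table 4-2 can be read as a small state machine. The following Python sketch models only the "Action When EI_STAT Is Read" column, to show why software has to repeat the read sequence in the "1,1,x" case. It does not touch hardware, and the function names read_ei_stat and reads_until_unlocked are hypothetical.

```python
def read_ei_stat(correctable, uncorrectable, second_hard):
    """Model the effect of one EI_STAT read on the three error bits.

    Returns (new_state, unlocked), where new_state is the
    (correctable, uncorrectable, second_hard) tuple after the read.
    """
    if second_hard and not uncorrectable:
        # Table 4-2 lists a second hard error as "Not possible" here.
        raise ValueError("second hard error requires an uncorrectable error")
    if correctable and uncorrectable:
        # "1,1,x": only the correctable bit clears; nothing is unlocked.
        return (0, 1, second_hard), False
    # All other rows: the read clears and unlocks all registers.
    return (0, 0, 0), True


def reads_until_unlocked(state, max_reads=2):
    """Repeat the modeled read until the registers unlock; return the count."""
    for attempt in range(1, max_reads + 1):
        state, unlocked = read_ei_stat(*state)
        if unlocked:
            return attempt
    raise RuntimeError("registers still locked after %d reads" % max_reads)
```

In this model, reads_until_unlocked((1, 1, 0)) takes two reads and reads_until_unlocked((0, 1, 0)) takes one. This matches the footnote: the first read leaves the "0,1,x" state and the second read unlocks the registers.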
MC Error Information Register 0 (MC_ERR0 - Offset = 800)
The low-order MC bus (system bus) address bits are latched into this
register when the system bus to PCI bus bridge detects an error event. If the
event is a hard error, the register bits are locked. A write to clear symptom
bits in the CAP Error Register unlocks this register. When the valid bit
(MC_ERR_VALID) in the CAP Error Register is clear, the contents are
undefined.
transaction on the system
bus when an error is
detected.
MC Error Information Register 1 (MC_ERR1 - Offset = 840)
The high-order MC bus (system bus) address bits and error symptoms are
latched into this register when the system bus to PCI bus bridge detects an
error. If the event is a hard error, the register bits are locked. A write to
clear symptom bits in the CAP Error Register unlocks this register. When
the valid bit (MC_ERR_VALID) in the CAP Error Register is clear, the
contents are undefined.
Name            Bits    Type  Initial  Description
                              State
...             ...     ...   ...      ... <30:23> in the CAP_ERR Register. Set
                                       if MC_ERR0 and MC_ERR1 contain a valid
                                       address.
Reserved        <30:21> RO    0
Dirty           <20>    RO    0        Set if the system bus error was
                                       associated with a Read/Dirty
                                       transaction. When set, the device ID
                                       field <19:14> does not indicate the
                                       source of the data.
Reserved        <19:17>       1        All ones.
DEVICE_ID       <16:14> RO    0        Slot number of bus master at the time of
                                       the error.
MC_CMD<5:0>     <13:8>  RO    0        Active command at the time the error was
                                       detected.
ADDR<39:32>     <7:0>   RO    0        Address bits <39:32> of the transaction
                                       on the system bus when an error is
                                       detected.
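The MC_ERR1 field positions above lend themselves to a simple decode. The Python sketch below is illustrative only. The function names are hypothetical, and it assumes that MC_ERR0 supplies address bits <31:0>; the text says only that MC_ERR0 holds the low-order address bits. The check on bit <31> of CAP_ERR reflects the rule that the contents are undefined when MC_ERR_VALID is clear.

```python
def decode_mc_err1(mc_err1):
    """Split a 32-bit MC_ERR1 value into the fields listed in the table."""
    return {
        "dirty": (mc_err1 >> 20) & 0x1,      # bit <20>
        "device_id": (mc_err1 >> 14) & 0x7,  # bits <16:14>
        "mc_cmd": (mc_err1 >> 8) & 0x3F,     # bits <13:8>
        "addr_39_32": mc_err1 & 0xFF,        # bits <7:0>
    }


def mc_error_address(mc_err0, mc_err1, cap_err):
    """Return the latched system bus address, or None if it is not valid.

    Assumption for this sketch: MC_ERR0 carries address bits <31:0>.
    """
    if not (cap_err >> 31) & 1:  # MC_ERR_VALID clear: contents undefined
        return None
    return ((mc_err1 & 0xFF) << 32) | (mc_err0 & 0xFFFFFFFF)
```

When the Dirty field is set, a caller should not treat device_id as the source of the data, as the table notes.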
CAP Error Register (CAP_ERR - Offset = 880)
CAP_ERR is used to log information pertaining to an error detected by the
CAP or MDP ASIC. If the error is a hard error, the register is locked. All
bits, except the LOST_MC_ERR bit, are locked on hard errors. CAP_ERR
remains locked until software writes to it to clear each individual error
bit.
[Figure: CAP_ERR register bit layout, showing fields MC_ERR_VALID, RDSB,
RDSA, CRDB, CRDA, NXM, MC_ADR_PERR, LOST_MC_ERR, PIO_OVFL, PCI_ERR_VALID,
PTE_INV, MAB, SERR, and PERR. PKW 0551B-97]
Table 4-5 CAP Error Register
Name            Bits    Type  Initial  Description
                              State
MC_ERR_VALID    <31>    RO    0        Logical OR of bits <30:23> in this
                                       register. When set, MC_ERR0 and MC_ERR1
                                       are latched.
RDSB            <30>    RW1C  0        Uncorrectable ECC error detected by
                                       MDPB. Clear state in MDPB before
                                       clearing this bit.
RDSA            <29>    RW1C  0        Uncorrectable ECC error detected by
                                       MDPA. Clear state in MDPA before
                                       clearing this bit.
CRDB            <28>    RW1C  0        Correctable ECC error detected by MDPB.
                                       Clear state in MDPB_STAT before clearing
                                       this bit.
CRDA            <27>    RW1C  0        Correctable ECC error detected by MDPA.
                                       Clear state in MDPA_STAT before clearing
                                       this bit.
NXM             <26>    RW1C  0        System bus master transaction status NXM
                                       (Read with Address bit <39> set but
                                       transaction not pended or transaction
                                       target above the top of memory
                                       register). CPU will also get a fill
                                       error on reads.
MC_ADR_PERR     <25>    RW1C  0        Set when a system bus command/address
                                       parity error is detected.
LOST_MC_ERR     <24>    RW1C  0        Set when an error is detected but not
                                       logged because the associated symptom
                                       fields and registers are locked with the
                                       state of an earlier error.
PIO_OVFL        <23>    RW1C  0        Set when a transaction that targets this
                                       system bus to PCI bus bridge is not
                                       serviced because the buffers are full.
                                       This is a symptom of setting the
                                       PEND_NUM field in CAP_CNTL to an
                                       incorrect value.
Reserved        <22:5>  RO    0
PCI_ERR_VALID   <4>     RO    0        Logical OR of bits <3:0> of this
                                       register. When set, the PCI error
                                       address register is locked.
PTE_INV         <3>     RW1C  0        Invalid page table entry on
                                       scatter/gather access.
MAB             <2>     RW1C  0        PCI master state machine detected PCI
                                       Target Abort (likely cause: NXM) (except
                                       Special Cycle). On reads, a fill error
                                       is also returned.
SERR            <1>     RW1C  0        PCI target state machine observed SERR#.
                                       CAP asserts SERR when it is master and
                                       detects target abort.
PERR            <0>     RW1C  0        PCI master state machine observed PERR#.
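To make the CAP_ERR layout concrete, the following Python sketch decodes a 32-bit CAP_ERR value into the field names from Table 4-5 and builds the write-one-to-clear value for the RW1C bits that are set. It is an illustration under stated assumptions, not a service procedure: the function names are hypothetical, and it does not perform the MDPA or MDPB state clearing that the table requires before RDSA, RDSB, CRDA, or CRDB are cleared.

```python
# Field names and bit positions are taken from Table 4-5.
CAP_ERR_BITS = {
    "MC_ERR_VALID": 31, "RDSB": 30, "RDSA": 29, "CRDB": 28, "CRDA": 27,
    "NXM": 26, "MC_ADR_PERR": 25, "LOST_MC_ERR": 24, "PIO_OVFL": 23,
    "PCI_ERR_VALID": 4, "PTE_INV": 3, "MAB": 2, "SERR": 1, "PERR": 0,
}
READ_ONLY = {"MC_ERR_VALID", "PCI_ERR_VALID"}  # type RO in the table


def decode_cap_err(value):
    """Return the names of all CAP_ERR fields that are set in value."""
    return [name for name, bit in CAP_ERR_BITS.items() if (value >> bit) & 1]


def rw1c_clear_value(value):
    """Return a value with a 1 in every RW1C position that is set in value."""
    mask = 0
    for name, bit in CAP_ERR_BITS.items():
        if name not in READ_ONLY and (value >> bit) & 1:
            mask |= 1 << bit
    return mask


def valid_bits_consistent(value):
    """Check that each valid bit equals the OR of the bits it summarizes."""
    mc_ok = bool(value & (0xFF << 23)) == bool((value >> 31) & 1)   # <30:23>
    pci_ok = bool(value & 0xF) == bool((value >> 4) & 1)            # <3:0>
    return mc_ok and pci_ok
```

A value whose valid bits fail valid_bits_consistent is most likely a misread rather than a real register state, since the table defines both valid bits as logical ORs of the symptom bits.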
PCI Error Status Register 1 (PCI_ERR1 - Offset = 1040)
PCI_ERR1 is used by the system bus to PCI bus bridge to log bus address
<31:0> pertaining to an error condition logged in CAP_ERR. This register
always captures PCI address <31:0>, even for a PCI DAC cycle. When the
PCI_ERR_VALID bit in CAP_ERR is clear, the contents are undefined.
5 Removal and Replacement
When the system interlocks are disabled and the system is still powered on,
voltages are low in the system, but current is high. Observe the following
guidelines to prevent personal injury.
1. Remove any jewelry that may conduct electricity before working on the
system.
2. If you need to access the system card cage, power down the system and wait
2 minutes to allow components in that area to cool.
Table 5-1 Field-Replaceable Unit Part Numbers (continued)
System Backplane, Display, and Support Hardware
54-25147-02               System motherboard
RX23L-AB                  Floppy
RRD46-AB or 30-48116-02   CD-ROM
54-23302-02               OCP assembly
70-31349-01               Speaker assembly
Fans
70-31351-01               Cooling fan 120x120
70-31350-01               Cooling fan 92x92
12-24701-34               CPU fan
Power System Components
30-43120-02               Power supply
SCSI Hardware
54-23365-01               SCSI backplane
30-48985-01               Ultra SCSI bus extender
Power Cords
BN35B-02                  North America, Japan 12V, 75 inches long
BN35S-02                  Australia, New Zealand, 2.5 m long
BN35R-02                  Central Europe, 2.5 m long
BN35J-02                  UK, Ireland, 2.5 m long
BN35K-02                  Switzerland, 2.5 m long
BN35P-02                  Denmark, 2.5 m long
BN35M-02                  Italy, 2.5 m long
BN35L-02                  Egypt, India, South Africa, 2.5 m long
BN35N-02                  Israel, 2.5 m long