DEC DIGITAL Server 5300 Service Guide

DIGITAL Server 5300 Service Guide
Part Number: ER–K8FWW–SG. A01
February 1998
Digital Equipment Corporation Maynard, Massachusetts
Digital Equipment Corporation makes no representations that the use of its products in the manner described in this publication will not infringe on existing or future patent rights, nor do the descriptions contained in this publication imply the granting of licenses to make, use, or sell equipment or software in accordance with the description.
Possession, use, or copying of the software described in this publication is authorized only pursuant to a valid written license from DIGITAL or an authorized sublicensor.
© Digital Equipment Corporation 1998. All rights reserved.
The following are trademarks of Digital Equipment Corporation: Alpha, DIGITAL, RRD46, StorageWorks, and the DIGITAL logo.
The following are third-party trademarks: Adobe and PostScript are registered trademarks of Adobe Systems, Incorporated. Helvetica and Times are registered trademarks of Linotype Co. Microsoft and MS-DOS are registered trademarks and Windows is a trademark of Microsoft Corporation.

Table of Contents

1 System Overview

System Enclosure....................................................................................................................1–2
Cover Interlock................................................................................................................ 1–3
Operator Control Panel and Drives..........................................................................................1–4
System Consoles..................................................................................................................... 1–6
AlphaBIOS Boot Menu.................................................................................................... 1–6
SRM Console...................................................................................................................1–6
AlphaBIOS Console......................................................................................................... 1–7
Environment Variables.....................................................................................................1–7
System Architecture................................................................................................................1–8
CPU Types ...........................................................................................................................1–10
Alpha Chip Composition................................................................................................ 1–11
CPU Configuration Rules............................................................................................... 1–11
Memory................................................................................................................................ 1–12
Memory Variants........................................................................................................... 1–13
Memory Operation......................................................................................................... 1–13
Memory Configuration Rules......................................................................................... 1–13
Memory Addressing.............................................................................................................. 1–14
System Motherboard.............................................................................................................1–16
System Bus (Backplane) ................................................................................................ 1–18
System Bus to PCI Bus Bridge....................................................................................... 1–20
PCI I/O Subsystem.........................................................................................................1–22
Remote Control Logic.................................................................................................... 1–24
Power Control Logic......................................................................................................1–26
Power Circuit and Cover Interlock........................................................................................ 1–28
Power Supply........................................................................................................................ 1–30
Description .................................................................................................................... 1–31
Power Supply Features................................................................................................... 1–31
Power Up/Down Sequence.................................................................................................... 1–32
Maintenance Bus (I²C Bus)................................................................................................... 1–34
Monitor.......................................................................................................................... 1–35
Fault Display ................................................................................................................. 1–35
Error State ..................................................................................................................... 1–35
Configuration Tracking.................................................................................................. 1–35
StorageWorks Drives............................................................................................................ 1–36

2 Power-Up

Control Panel.......................................................................................................................... 2–2
Power-Up Sequence................................................................................................................ 2–4
Definitions....................................................................................................................... 2–4
SROM Power-Up Test Flow................................................................................................... 2–8
SROM Errors Reported......................................................................................................... 2–11
XSROM Power-Up Test Flow .............................................................................................. 2–12
XSROM Errors Reported...................................................................................................... 2–15
Console Power-Up Tests....................................................................................................... 2–16
Console Device Determination ............................................................................................. 2–18
Console Device Options................................................................................................. 2–19
Console Power-Up Display................................................................................................... 2–20
Fail-Safe Loader................................................................................................................... 2–24

3 Troubleshooting

Troubleshooting with LEDs.................................................................................................... 3–2
System Motherboard LEDs.............................................................................................. 3–3
Troubleshooting Power Problems ........................................................................................... 3–4
Power Problem List ......................................................................................................... 3–4
Halt Caused by Power, Fan, or Overtemperature Condition ............................................. 3–5
If Power Problem Occurs at Power-Up............................................................................. 3–5
Recommended Order for Troubleshooting Failure at Power-Up....................................... 3–5
Running Diagnostics — Test Command................................................................................. 3–6
Releasing Secure Mode ..........................................................................................................3–7
Testing an Entire System........................................................................................................ 3–8
Testing Memory................................................................................................................... 3–10
Testing PCI.................................................................................................................... 3–12
Other Useful Console Commands......................................................................................... 3–14

4 Error Registers

External Interface Status Register - EI_STAT......................................................................... 4–2
External Interface Address Register - EI_ADDR.....................................................................4–5
MC Error Information Register 0 (MC_ERR0 - Offset = 800)................................................. 4–6
MC Error Information Register 1 (MC_ERR1 - Offset = 840)................................................. 4–7
CAP Error Register (CAP_ERR - Offset = 880)...................................................................... 4–9
PCI Error Status Register 1 (PCI_ERR1 - Offset = 1040)...................................................... 4–11

5 Removal and Replacement

System Safety.........................................................................................................................5–1
FRU List.................................................................................................................................5–2
System Exposure ....................................................................................................................5–6
Exposing the System........................................................................................................5–7
Dressing the System......................................................................................................... 5–7
CPU Removal and Replacement............................................................................................. 5–8
Removal ..........................................................................................................................5–9
Replacement.................................................................................................................... 5–9
Verification...................................................................................................................... 5–9
CPU Fan Removal and Replacement .................................................................................... 5–10
Removal ........................................................................................................................5–11
Replacement.................................................................................................................. 5–11
Verification.................................................................................................................... 5–11
Memory Riser Card Removal and Replacement.................................................................... 5–12
Removal ........................................................................................................................5–13
Replacement.................................................................................................................. 5–13
Verification.................................................................................................................... 5–13
DIMM Removal and Replacement........................................................................................ 5–14
Removal ........................................................................................................................5–15
Replacement.................................................................................................................. 5–15
Verification.................................................................................................................... 5–15
System Motherboard Removal and Replacement.................................................................. 5–16
Removal ........................................................................................................................5–17
Replacement.................................................................................................................. 5–17
Verification.................................................................................................................... 5–17
PCI/EISA Option Removal and Replacement ....................................................................... 5–18
Removal ........................................................................................................................5–19
Replacement.................................................................................................................. 5–19
Verification.................................................................................................................... 5–19
Power Supply Removal and Replacement............................................................................. 5–20
Removal ........................................................................................................................5–21
Replacement.................................................................................................................. 5–21
Verification.................................................................................................................... 5–21
Power Harness Removal and Replacement............................................................................ 5–22
Removal........................................................................................................................ 5–23
Replacement.................................................................................................................. 5–23
Verification.................................................................................................................... 5–23
System Fan Removal and Replacement ................................................................................ 5–24
Removal........................................................................................................................ 5–25
Replacement.................................................................................................................. 5–25
Verification.................................................................................................................... 5–25
Cover Interlock Removal and Replacement.......................................................................... 5–26
Removal........................................................................................................................ 5–27
Replacement.................................................................................................................. 5–27
Verification.................................................................................................................... 5–27
Operator Control Panel Removal and Replacement ............................................................. 5–28
Removal........................................................................................................................ 5–29
Replacement.................................................................................................................. 5–29
Verification.................................................................................................................... 5–29
CD-ROM Removal and Replacement................................................................................... 5–30
Removal........................................................................................................................ 5–31
Replacement.................................................................................................................. 5–31
Verification.................................................................................................................... 5–31
Floppy Removal and Replacement........................................................................................ 5–32
Removal........................................................................................................................ 5–33
Replacement.................................................................................................................. 5–33
Verification.................................................................................................................... 5–33
SCSI Disk Removal and Replacement.................................................................................. 5–34
Removal........................................................................................................................ 5–35
Replacement.................................................................................................................. 5–35
Verification.................................................................................................................... 5–35
StorageWorks Backplane Removal and Replacement........................................................... 5–36
Removal........................................................................................................................ 5–37
Replacement.................................................................................................................. 5–37
Verification.................................................................................................................... 5–37
StorageWorks Ultra SCSI Bus Extender Removal and Replacement..................................... 5–38
Removal........................................................................................................................ 5–39
Replacement.................................................................................................................. 5–39
Verification.................................................................................................................... 5–39

6 Running Utilities

Running Utilities from a Graphics Monitor............................................................................. 6–2
Running Utilities from a Serial Terminal................................................................................ 6–3
Running ECU......................................................................................................................... 6–4
Running RAID Standalone Configuration Utility.................................................................... 6–5
Updating Firmware with LFU................................................................................................. 6–6
Updating Firmware from the CD-ROM............................................................................ 6–8
Updating Firmware from a Network Device................................................................... 6–12
LFU Commands............................................................................................................. 6–16
Updating Firmware from AlphaBIOS.................................................................................... 6–19
Upgrading AlphaBIOS.......................................................................................................... 6–20

7 Halts, Console Commands, and Environment Variables

Halt Button Functions............................................................................................................. 7–2
Using the Halt Button............................................................................................................. 7–2
Using Halt to Clear the Console Password .......................................................................7–2
Halt Assertion.........................................................................................................................7–3
Summary of SRM Console Commands................................................................................... 7–5
Summary of SRM Environment Variables .......................................................................7–7
Recording Environment Variables .......................................................................................... 7–8

8 Managing the System Remotely

RCM Overview....................................................................................................................... 8–2
First-Time Setup..................................................................................................................... 8–3
Configuring the Modem................................................................................................... 8–4
Qualified Modems.....................................................................................................8–4
Modem Configuration Procedure...............................................................................8–4
Dialing In and Invoking RCM.......................................................................................... 8–5
Dialing In and Invoking RCM................................................................................... 8–5
Using RCM Locally......................................................................................................... 8–6
RCM Commands.................................................................................................................... 8–7
Command Conventions.............................................................................................8–8
Dial-Out Alerts..................................................................................................................... 8–16
Enabling Dial-Out Alerts ............................................................................................... 8–16
Composing the Dial-Out String...................................................................................... 8–17
Using the RCM Switchpack.................................................................................................. 8–19
Uses of the Switchpack.................................................................................................. 8–21
Changing a Switch Setting............................................................................................. 8–21
Resetting the RCM to Factory Defaults.......................................................................... 8–22
Troubleshooting Guide ......................................................................................................... 8–23
Modem Dialog Details.......................................................................................................... 8–26
Default Initialization and Answer Strings....................................................................... 8–26
Modifying Initialization and Answer Strings.................................................................. 8–26
Initialization String Substitutions...................................................................................8–27

Figures

Figure 1-1 System Enclosure................................................................................ 1–2
Figure 1-2 Cover Interlock Circuit........................................................................ 1–3
Figure 1-3 Control Panel Assembly ...................................................................... 1–4
Figure 1-4 AlphaBIOS Boot Menu........................................................................ 1–6
Figure 1-5 Architecture Diagram.......................................................................... 1–8
Figure 1-6 CPU Module Placement......................................................................1–10
Figure 1-7 Memory Placement.............................................................................1–12
Figure 1-8 How Memory Addressing Is Calculated..............................................1–14
Figure 1-9 System Motherboard...........................................................................1–16
Figure 1-10 System Bus Block Diagram..............................................................1–18
Figure 1-11 System Bus to PCI Bus Bridge Block Diagram.................................1–20
Figure 1-12 PCI Block Diagram ..........................................................................1–22
Figure 1-13 Remote Control Logic......................................................................1–24
Figure 1-14 Power Control Logic.........................................................................1–26
Figure 1-15 Power Circuit Diagram.....................................................................1–28
Figure 1-16 Back of Power Supply and Location.................................................1–30
Figure 1-17 Power Up/Down Sequence Flowchart...............................................1–32
Figure 1-18 I²C Bus Block Diagram.....................................................................1–34
Figure 1-19 StorageWorks Drive Location...........................................................1–36
Figure 2-1 Control Panel and LCD Display........................................................... 2–2
Figure 2-2 Power-Up Flow ................................................................................... 2–4
Figure 2-3 Contents of FEPROMs ........................................................................ 2–5
Figure 2-4 Console Code Critical Path (Block Diagram)....................................... 2–6
Figure 2-5 SROM Power-Up Test Flow................................................................ 2–8
Figure 2-6 XSROM Power-Up Flowchart ............................................................2–12
Figure 2-7 Console Device Determination Flowchart...........................................2–18
Figure 3-1 System Motherboard LEDs.................................................................. 3–2
Figure 5-1 System FRU Locations........................................................................ 5–2
Figure 5-2 Exposing the System ........................................................................... 5–6
Figure 5-3 Removing CPU Module....................................................................... 5–8
Figure 5-4 Removing CPU Fan............................................................................5–10
Figure 5-5 Removing Memory Riser Card...........................................................5–12
Figure 5-6 Removing a DIMM from a Memory Riser Card..................................5–14
Figure 5-7 Removing System Motherboard..........................................................5–16
Figure 5-8 Removing PCI/EISA Option...............................................................5–18
Figure 5-9 Removing Power Supply ....................................................................5–20
Figure 5-10 Removing Power Harness................................................................ 5–22
Figure 5-11 Removing System Fan..................................................................... 5–24
Figure 5-12 Removing Cover Interlock............................................................... 5–26
Figure 5-13 Removing the OCP.......................................................................... 5–28
Figure 5-14 Removing CD-ROM........................................................................ 5–30
Figure 5-15 Removing Floppy ............................................................................ 5–32
Figure 5-16 Removing StorageWorks Disk......................................................... 5–34
Figure 5-17 Removing StorageWorks Backplane................................................ 5–36
Figure 5-18 Removing StorageWorks Ultra SCSI Bus Extender.......................... 5–38
Figure 6-1 Running a Utility from a Graphics Monitor.......................................... 6–2
Figure 6-2 Starting LFU from the AlphaBIOS Console......................................... 6–6
Figure 6-3 AlphaBIOS Setup Screen................................................................... 6–19
Figure 8-1 RCM Connections ............................................................................... 8–3
Figure 8-2 Location of RCM Switchpack on System Board ................................ 8–19
Figure 8-3 RCM Switches (Factory Settings)...................................................... 8–20

Tables

Table 1-1 Chip Description................................................................................. 1–11
Table 1-2 CPU Variants...................................................................................... 1–11
Table 1-3 Memory Variants................................................................................ 1–13
Table 1-4 PCI Motherboard Slot Numbering....................................................... 1–22
Table 1-5 Remote Control Switch Functions....................................................... 1–25
Table 2-1 Control Panel Display........................................................................... 2–3
Table 2-2 SROM Tests ....................................................................................... 2–10
Table 2-3 XSROM Tests..................................................................................... 2–13
Table 2-4 Memory Tests..................................................................................... 2–14
Table 2-5 IOD Tests ........................................................................................... 2–16
Table 2-6 PCI Motherboard Tests ....................................................................... 2–17
Table 4-1 External Interface Status Register ......................................................... 4–4
Table 4-2 Loading and Locking Rules for External Interface Registers................. 4–5
Table 4-3 MC Error Information Register 0.......................................................... 4–6
Table 4-4 MC Error Information Register............................................................. 4–8
Table 4-5 CAP Error Register............................................................................... 4–9
Table 4-6 PCI Error Status Register.................................................................... 4–11
Table 5-1 Field-Replaceable Unit Part Numbers................................................... 5–2
Table 6-1 AlphaBIOS Option Key Mapping ......................................................... 6–3
Table 6-2 LFU Command Summary................................................................... 6–16
Table 7-1 Results of Pressing the Halt Button....................................................... 7–2
Table 7-2 Summary of SRM Console Commands................................................. 7–5
Table 7-3 Environment Variable Summary........................................................... 7–7
Table 7-4 Environment Variables Worksheet........................................................ 7–8
Table 8-1 RCM Command Summary.................................................................... 8–7
Table 8-2 RCM Status Command Fields............................................................. 8–15
Table 8-3 Elements of the Dial-Out String.......................................................... 8–18
Table 8-4 RCM Switch Settings.......................................................................... 8–20
Table 8-5 RCM Troubleshooting ........................................................................ 8–23

Intended Audience

This manual is written for the customer service engineer.

Document Structure

This manual uses a structured documentation design. Topics are organized into small sections for efficient reference. Each topic begins with an abstract, followed by an illustration or example, and ends with descriptive text. This manual has eight chapters, as follows:
Chapter 1, System Overview, introduces the DIGITAL Server 5300 system. It
describes each system component.
Chapter 2, Power-Up, provides information on how to interpret the power-up display
on the operator control panel, the console screen, and system LEDs. It also describes how hardware diagnostics execute when the system is initialized.
Chapter 3, Troubleshooting, describes troubleshooting during power-up and booting,
as well as the test command.
Chapter 4, Error Registers, describes the error registers used to hold error
information.
Chapter 5, Removal and Replacement, describes removal and replacement
procedures for field-replaceable units (FRUs).
Chapter 6, Running Utilities, explains how to run utilities such as the EISA
Configuration Utility and RAID Standalone Configuration Utility.
Chapter 7, Halts, Console Commands, and Environment Variables, summarizes the
commands used to examine and alter the system configuration.
Chapter 8, Managing the System Remotely, describes how to use the Remote
Console Manager (RCM) to monitor and control the system remotely.

Preface


Documentation Titles

The following table lists other books in the documentation set.
System Documentation
Title                                        Order Number
User and Installation Documentation Kit      QC–06CAB–H8
DIGITAL Server 5300 User’s Guide             ER–K8FWW–UA
DIGITAL Server 5300 Basic Installation       ER–K8FWW–IM

Information on the Internet

Access the latest system firmware with a Web browser as follows:
http://www.windowsnt.digital.com/
1

System Overview

The DIGITAL Server 5300 system base unit consists of up to two CPUs, up to 2 Gbytes of memory, 6 I/O slots, and up to 7 SCSI storage devices. The system is enclosed in a pedestal. DIGITAL Server 5300 systems can also be mounted in a standard 19” rack.
The DIGITAL Server 5300 system supports the Windows NT operating system. Topics in this chapter include the following:
System Enclosure
Operator Control Panel and Drives
System Consoles
System Architecture
CPU Types
Memory
Memory Addressing
System Motherboard
System Bus Backplane
System Bus to PCI Bus Bridge
PCI I/O Subsystem
Remote Control Logic
Power Control Logic
Power Circuit and Cover Interlock
Power Supply
Power Up/Down Sequence
Maintenance Bus (I2C Bus)
StorageWorks Drives
DIGITAL Server 5300 1–1
Page 13
System Overview

System Enclosure

The system has up to two CPU modules and up to 2 Gbytes of memory. A single fast wide or fast wide Ultra SCSI StorageWorks shelf provides storage.
Figure 1-1 System Enclosure
The numbered callouts in Figure 1-1 refer to the system components.
System card cage, which holds the system motherboard and the CPU, memory, and system I/O.
PCI/EISA section of the system card cage.
Operator control panel assembly, which includes the control panel, the LCD display, and the floppy drive.
CD-ROM drive.
Cooling section containing two fans.
StorageWorks shelf.

Cover Interlock

The system has a single cover interlock switch tripped by the top cover. To override the cover interlock, use a suitable object to close the interlock circuit. Disk damage will result if the system is run with the top cover off.
Figure 1-2 Cover Interlock Circuit (diagram: power supply, J30, switch pack, motherboard connectors J2 and J7, cover interlock switch, On/Off push button on the OCP, and the DC_ENABLE_L signal)
__________________________ Note _____________________________
The cover interlock must be engaged to enable power-up.
____________________________________________________________

Operator Control Panel and Drives

The control panel includes the On/Off, Halt, and Reset buttons and an LCD display.
Figure 1-3 Control Panel Assembly (callouts: OCP display, CD-ROM, floppy)
OCP display. The OCP display is a 16-character LCD that indicates status during power-up and self-test. While the operating system is running, the LCD displays the system type. Its controller is on the XBUS.
CD-ROM. The CD-ROM drive is used to load software, firmware, and updates. Its controller is on PCI1 on the PCI backplane on the system motherboard.
Floppy disk. The floppy drive is used to load software. The floppy controller is on the XBUS on the PCI backplane on the system motherboard.
On/Off button. Powers the system on or off. When the LED to the right of the button is lit, the power is on. The On/Off button is connected to the power supplies through the system interlock and the RCM logic.
Reset button. Initializes the system.
Halt button. The effect of pressing the Halt button depends on the state of the machine when the button is pressed.
To get to the SRM console, press the Halt button and then press the Reset button. (Pressing the Halt button when the system is running Windows NT causes a “halt assertion” flag to be set in the firmware. When Reset is pressed, the console reads the “halt assertion” flag and ignores environment variables that would cause the system to boot.)
See “Halt Button Functions” in Chapter 7 for a full discussion of the Halt button.

System Consoles

There are two console programs: the SRM console and the AlphaBIOS console.
_____________________________ Note ____________________________
The console prompt is displayed only after the entire power-up sequence is complete. This can take several minutes if the memory is very large.
______________________________________________________________

AlphaBIOS Boot Menu

On systems running the Windows NT operating system, the Boot menu is displayed when the AlphaBIOS console is invoked (see Figure 1-4).
Figure 1-4 AlphaBIOS Boot Menu
AlphaBIOS 5.32
Please select the operating system to start:
Windows NT Server 4.0
Use ↑ and ↓ to move the highlight to your choice. Press Enter to choose.
DIGITAL Server 5300
Press <F2> to enter SETUP

SRM Console

The SRM console is a command-line interface that provides support for examining and modifying the system state and configuring and testing the system. The SRM console can be run from a serial terminal or a graphics monitor. The following console prompt is displayed whenever the SRM console is invoked:
P00>>>

AlphaBIOS Console

The AlphaBIOS console is a menu-based interface that supports the Microsoft Windows NT operating system. AlphaBIOS is used to set up operating system selections, boot Windows NT, and display information about the system configuration. The EISA Configuration Utility and the RAID Standalone Configuration Utility are run from the AlphaBIOS console. AlphaBIOS runs on either a serial or graphics terminal. Windows NT requires a graphics monitor.

Environment Variables

Environment variables are software parameters that define, among other things, the system configuration. They are used to pass information to different pieces of software running in the system at various times.
Refer to Chapter 7 of this guide for a list of the environment variables used to configure a system.
Refer to your system User’s Guide for information on setting environment variables.
Most environment variables are stored in the NVRAM that is placed in a socket on the system motherboard. Even though the NVRAM can be removed and replaced on a new system motherboard, it is recommended that you keep a record of the environment variables for each system that you service. Some environment variable settings are lost when a module is swapped and must be restored after the new module is installed. Refer to Chapter 7 for a convenient worksheet for recording environment variable settings.

System Architecture

Alpha microprocessor chips are used in these systems. The CPU, memory, and the I/O modules are connected to the system motherboard.
Figure 1-5 Architecture Diagram (block diagram: CPUs and memory pairs on the system bus, with a 128-bit data bus plus 16 ECC bits and a 40-bit command/address bus; system to PCI bus bridges IOD0 and IOD1; 64-bit PCI bus 0 and PCI bus 1 with three PCI slots each; EISA bridge, EISA bus, and EISA slot; XBUS with real-time clock, combo I/O, mouse/keyboard, I2C bus interface, 8K x 8 NVRAM, and 2-MB flash ROM)
Note: When the EISA/ISA slot on PCI Bus 0 is used, the last PCI slot on PCI Bus 1 is not available.
The system uses the Alpha chip for the CPU. The CPU, memory, and I/O devices connect to the system motherboard. The system motherboard contains:
The system bus
Two system bus to PCI bus chip sets that bridge two PCI buses to the system bus
Two 64-bit PCI buses with three PCI option slots each (five 64-bit PCI slots and one 32-bit PCI slot)
One EISA/ISA bus bridged to one of the PCI buses (if an EISA/ISA option is used, one PCI slot cannot be used)
One CD-ROM controller built in to the other PCI bus
One EISA/ISA to XBUS bridge to the built-in XBUS options
A fully configured system can have two CPUs, eight DIMM memory pairs, and a total of six I/O options. The I/O options can be all PCI options or a combination of PCI options and a single EISA/ISA option.
The system bus has a 128-bit data bus protected by 16 bits of ECC (144 bits in all) and a 40-bit command/address bus protected by parity. The bus speed is set to 66.6 MHz. The 40-bit address bus can address one terabyte (2^40 bytes, about a million million). The bus connects the CPUs, memory, and the system bus to PCI bus bridges.
There is a cache external to the CPU chip on CPU modules. The Alpha chip has an 8-Kbyte instruction cache (I-cache), an 8-Kbyte write-through data cache (D-cache), and a 96-Kbyte write-back secondary data cache (S-cache). The cache system is write-back. The system supports up to two CPUs.
Memory on these systems is constructed of DIMM memory pairs placed onto two memory modules called riser cards. The riser cards are placed into the two memory slots on the system motherboard. One member of a DIMM pair is placed onto one riser card, and the other member is placed onto the other riser card. Each riser card drives half of the system bus, along with the associated ECC bits. Memory pairs consist of two synchronous DIMMs of the same size and are placed into the same slot on each riser card.
The system bus-to-PCI bus bridge chip set translates system bus commands and data addressed to I/O space to PCI commands and data. It also translates PCI bus commands and data addressed to system memory or CPUs to system bus commands and data. The PCI bus is a 64-bit wide bus used for I/O.
Logic and sensors on the system motherboard monitor power status and the system environment (temperature and fan speeds).

CPU Types

There are several CPU variants differentiated by CPU speeds.
Figure 1-6 CPU Module Placement (motherboard view: CPU 0 and CPU 1 slots, MEM L and MEM H slots, PCI 0 slots 2 through 4, PCI 1 slots 2 through 4, EISA/ISA slot, PCI bridges, LEDs, RCM switchpack, bulkhead connectors, and the power, floppy, OCP, fan, internal SCSI, RCM power-down, and speaker connectors)

Alpha Chip Composition

The Alpha chip is made using state-of-the-art chip technology, has a transistor count of 9.3 million, consumes 50 watts of power, and is air cooled (a fan is on the chip). The default cache system is write-back and when the module has an external cache, it is write-back. The Alpha chip used in these systems is the 21164.
Table 1-1 Chip Description
Unit          Description
Instruction   8-Kbyte cache, 4-way issue
Execution     4-way execution; 2 integer units, 1 floating-point adder, 1 floating-point multiplier
Memory        Merge logic, 8-Kbyte write-through first-level data cache, 96-Kbyte write-back second-level data cache, bus interface unit
Table 1-2 CPU Variants
Module Variant   Clock Frequency   Onboard Cache   Color
B3007-AA         400 MHz           4 Mbytes        Orange
B3007-CA         533 MHz           4 Mbytes        Violet

CPU Configuration Rules

The following rules should be applied to CPU configuration:
The first CPU must be in CPU slot 0 to provide the system clock.
The second CPU should be installed in CPU slot 1.
Both CPUs must have the same Alpha chip clock speed. The system bus may hang without an error message if the oscillators clocking the CPUs are different.

Memory

Memory consists of two riser cards and up to eight pairs of DIMMs. Each riser card receives one of the two DIMMs in the DIMM pair. There are two DIMM variants: a 32-Mbyte version and a 128-Mbyte version.
Figure 1-7 Memory Placement (motherboard view: riser card slots MEM L and MEM H beside the CPU 0 and CPU 1 slots)

Memory Variants

Memory consists of two riser cards supporting eight DIMM pairs. There are two DIMM variants: a 32-Mbyte version and a 128-Mbyte version. Maximum memory using 32-Mbyte DIMMs is 128 Mbytes and the maximum memory using 128-Mbyte DIMMs is 2 Gbytes. All memory is synchronous.
Table 1-3 Memory Variants
Option     Size     Module        Type     DRAM No.   DRAM Size
MS300-BA   64 MB    54-25084-DA   Synch.   18         4M x 72 = 32MB
MS300-DA   256 MB   54-25092-DA   Synch.   18         16M x 72 = 128MB

Memory Operation

Memory drives the system bus in bursts. Upon each memory fetch, data is transferred in 4 consecutive cycles transferring 64 bytes. Each DIMM in the pair provides half the data, or 64 bits plus 8 ECC bits, of the octaword (16 byte) transferred on the system bus. DIMMs are placed in slots on the riser cards, which are placed in the slots designated MEM L and MEM H on the system motherboard.
20-47405-D3
20-45619-D3
__________________________NOTE ____________________________
Memory in slot MEM L does not drive the lower 8 bytes, and memory in slot MEM H does not drive the higher 8 bytes of the 16-byte transfer. Some bits originating from MEM L are high order bits, and some bits originating from MEM H are low order bits.
____________________________________________________________

Memory Configuration Rules

In a system, memories of different sizes are permitted, but:
DIMMs are installed and used in pairs. Both DIMMs in a memory pair must be of the same size.
Each riser card receives one DIMM of the DIMM pair.
The largest DIMM pair must be in riser card slot 0.
Other memory pairs must be the same size or smaller than the first memory pair.
Memory pairs must be installed in consecutive slots.
Memory configurations that have a 64-Mbyte pair in riser card slot 0 are limited to two DIMM pairs or 128 Mbytes for the system. (The reason for this restriction is that the bit map describing memory holes can grow larger than physical memory.)
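The configuration rules above can be checked mechanically. The following Python sketch is illustrative only and is not part of the system firmware or any DIGITAL tool. The function name, the list-based input format, and the wording of the messages are assumptions made for this example; the pair sizes (64 and 256 Mbytes) come from Table 1-3.

```python
def check_memory_config(pair_sizes_mb):
    """Return a list of rule violations for a proposed memory configuration.

    pair_sizes_mb[i] is the size in Mbytes of the DIMM pair in riser card
    slot i (64 or 256), or None if the slot is empty. This input layout is
    an assumption of the sketch.
    """
    problems = []
    if len(pair_sizes_mb) > 8:
        problems.append("more than eight DIMM pairs")
    populated = [size for size in pair_sizes_mb if size is not None]
    if not populated:
        return ["no memory installed"]
    if any(size not in (64, 256) for size in populated):
        problems.append("unsupported pair size")
    # Pairs must occupy consecutive slots, beginning with slot 0.
    if pair_sizes_mb[:len(populated)] != populated:
        problems.append("pairs are not in consecutive slots")
    # The largest pair must be in slot 0.
    if pair_sizes_mb[0] is None or pair_sizes_mb[0] < max(populated):
        problems.append("largest pair is not in slot 0")
    # A 64-Mbyte pair in slot 0 limits the system to two pairs (128 Mbytes).
    if pair_sizes_mb[0] == 64 and len(populated) > 2:
        problems.append("64-Mbyte pair in slot 0 limits the system to two pairs")
    return problems


print(check_memory_config([256, 256, 64]))   # valid configuration
print(check_memory_config([64, 64, 64]))     # violates the two-pair limit
```

An empty list means that none of the rules listed above is violated; it does not confirm that the DIMMs themselves are good.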

Memory Addressing

Memory addressing in these systems is fixed regardless of the size of the DIMMs. The address of a DIMM pair is fixed according to the slot in which the pair is placed. Each pair starts on a 512-Mbyte boundary determined by its slot on the riser card.
Figure 1-8 How Memory Addressing Is Calculated
Riser Card Slot   Starting Address (hex)   Offset in Address Space (Gbytes)
7                 e0000000                 3.5
6                 c0000000                 3.0
5                 a0000000                 2.5
4                 80000000                 2.0
3                 60000000                 1.5
2                 40000000                 1.0
1                 20000000                 0.5
0                 00000000                 0
The rules for addressing memory are as follows:
1. A memory pair consists of two DIMMs of the same size.
2. Memory pairs in riser cards may be of different sizes.
3. The memory pair in slot 0 must be the largest of all memory pairs. Other memory
pairs may be as large but none may be larger.
4. The physical starting address of each memory pair is N times 512 Mbytes (20000000 hexadecimal), where N is the slot number on the riser card.
5. Memory addresses are contiguous within each memory pair.
6. If memory pairs do not completely fill the 512-Mbyte space provided, memory
“holes” occur in the physical address space.
7. Software creates contiguous virtual memory even though physical memory may not be
contiguous.
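The addressing rules above reduce to simple arithmetic. The following Python sketch is illustrative only: the function name and input format are assumptions made for this example. It computes the starting address of each pair as the slot number times 512 Mbytes and reports the part of each 512-Mbyte window that the pair does not fill.

```python
SLOT_SPACING = 512 * 2**20  # 512 Mbytes, or 20000000 hexadecimal


def memory_map(pair_sizes_mb):
    """Return (slot, start, end, unused_bytes) for each populated slot.

    pair_sizes_mb[i] is the size in Mbytes of the DIMM pair in riser card
    slot i, or None if the slot is empty (an assumed input format).
    """
    rows = []
    for slot, size_mb in enumerate(pair_sizes_mb):
        if size_mb is None:
            continue
        start = slot * SLOT_SPACING          # rule 4: N times 512 Mbytes
        size = size_mb * 2**20
        rows.append((slot, start, start + size - 1, SLOT_SPACING - size))
    return rows


for slot, start, end, unused in memory_map([256, 256, 64]):
    print(f"slot {slot}: {start:08x}-{end:08x}, "
          f"{unused // 2**20} Mbytes of the window unused")
```

With two 256-Mbyte pairs and one 64-Mbyte pair, the pairs start at 00000000, 20000000, and 40000000, which matches the starting addresses in Figure 1-8. The unused part of a window that lies below another populated slot is a memory hole in the sense of rule 6.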

System Motherboard

The system motherboard contains five major logic sections performing five major system functions.
Figure 1-9 System Motherboard (motherboard view showing the five logic sections: system bus backplane, power control logic, remote control logic, system bus to PCI bus bridges, and PCI backplane with legacy I/O devices)
The five sections on the system motherboard are:
The system bus or the CPU and memory backplane
The power control logic
The remote control logic
The system bus to PCI bus bridges
The PCI backplane containing two PCI buses, an EISA/ISA bus, a built-in CD-ROM controller, and an XBUS with several devices integral to the system.

System Bus (Backplane)

The system bus consists of a 40-bit command/address bus, a 128-bit plus ECC data bus, and several control signals and clocks. The system bus is part of the system motherboard.
Figure 1-10 System Bus Block Diagram (block diagram: CPU0 and CPU1, synchronous DRAM memory with memory control and arbitration, and the MC to PCI bridges IOD0 and IOD1, connected by MC ADR <39:4>, MC DATA <127:0>, and the MC bus control signals)
The system bus consists of a 40-bit command/address bus, a 128-bit plus ECC data bus, and several control signals, clocks, and a bus arbiter. The bus requires that all CPUs have the same high-speed oscillator providing the clock to the Alpha chip.
The system bus connects up to two CPUs, up to eight DIMM memory pairs on two riser cards, and two I/O bus bridges.
The system bus clock is provided by an oscillator on the CPU in slot CPU0. This oscillator is adjusted to maintain the system bus at a 66 MHz speed no matter what the speed of the CPU is.
The system bus backplane initiates memory refresh transactions.
Five-volt, 3.43-volt, and 12-volt power is provided directly to the motherboard from the power supplies.

System Bus to PCI Bus Bridge

The bridge is the physical interconnect between the system bus and the PCI bus.
Figure 1-11 System Bus to PCI Bus Bridge Block Diagram (block diagram: the CAP chip carries control and address; the MDPA and MDPB chips carry ECC and data <63:0> and <127:64> between the system bus and PCI bus AD<31:0> and AD<63:32>)
The system bus to PCI bus bridge module converts system bus commands and data addressed to I/O space to PCI commands and data; and converts PCI bus commands and data addressed to system memory or CPUs to system bus commands and data.
The bridge has two major components:
Command/address processor (CAP) chip
Two data path chips (MDPA and MDPB)
There are two sets of these three chips, one set for each PCI.
The interface on the system bus side of the bridge responds to system bus commands addressed to the upper 64 Gbytes of I/O space. I/O space is addressed whenever bit <39> on the system bus address lines is set. The space so defined is 512 Gbytes in size. The first 448 Gbytes are reserved and the last 64 Gbytes, when bits <38:36> are set, are mapped to the PCI I/O buses.
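The address decoding described above can be illustrated with a few bit tests. The following Python sketch is illustrative only and does not describe the bridge hardware: the function name and the returned labels are assumptions made for this example. It classifies a 40-bit system bus address by bit <39> and bits <38:36>.

```python
def classify_address(addr):
    """Classify a 40-bit system bus address by its high-order bits."""
    if not 0 <= addr < 2**40:
        raise ValueError("address does not fit in 40 bits")
    if not (addr >> 39) & 1:
        return "memory"              # bit <39> clear: not I/O space
    if (addr >> 36) & 0b111 == 0b111:
        return "pci"                 # bits <38:36> set: upper 64 Gbytes
    return "reserved"                # first 448 Gbytes of I/O space


GBYTE = 2**30
print(2**39 // GBYTE)                 # size of I/O space in Gbytes
print(2**36 // GBYTE)                 # size of the PCI-mapped region in Gbytes
print(classify_address(0xF << 36))    # lowest address in the PCI-mapped region
```

The two sizes follow from the bit positions: bit <39> selects a 2^39-byte (512-Gbyte) space, and fixing bits <38:36> leaves 2^36 bytes (64 Gbytes), so 448 Gbytes remain reserved.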
The interface on the PCI side of the bridge responds to commands addressed to CPUs and memory on the system bus. On the PCI side, the bridge provides the interface to the PCIs. Each PCI bus is addressed separately. The bridge does not respond to devices communicating with each other on the same PCI bus. However, should a device on one PCI address a device on the other PCI bus, commands, addresses, and data run through the bridge out onto the system bus and back through the bridge to the other PCI bus.
In addition to its bridge function, the system bus to PCI bus bridge module monitors every transaction on the system bus for errors. It monitors the data lines for ECC errors and the command/address lines for parity errors.

PCI I/O Subsystem

The I/O subsystem consists of two 64-bit PCI buses. One has an embedded EISA/ISA bridge and three PCI option slots; the other has a built-in CD-ROM controller and three PCI option slots.
Figure 1-12 PCI Block Diagram (block diagram: PCI-0 bus with two 64-bit slots, one 32-bit slot, and the PCI to EISA/ISA bridge chip set; PCI-1 bus with three 64-bit slots and the 53C810 SCSI controller; serial interrupt logic; 33.3-MHz oscillator and clock buffer; XBUS with real-time clock, combo I/O, mouse/keyboard, I2C bus interface, 8K x 8 NVRAM, and 2-MB flash ROM)
Table 1-4 PCI Motherboard Slot Numbering
Slot   PCI0                      PCI1
1      PCI to EISA/ISA bridge    Internal CD-ROM controller
2      PCI slot                  PCI slot
3      PCI slot                  PCI slot
4      PCI slot                  PCI slot
The logic for the two PCI buses is on the system motherboard.
PCI0 is a 64-bit bus with a built-in PCI to EISA/ISA bus bridge. PCI0 has three PCI slots and one EISA/ISA slot. When the EISA/ISA slot is used, PCI slot 4 on PCI bus 1 is not available. An 8-bit XBUS is connected to the EISA/ISA bus. On this bus there is an interface to the system I2C bus; mouse and keyboard support; an I/O combo controller supporting two serial ports, the floppy controller, and a parallel port; a real-time clock; two 1-Mbyte flash ROMs containing system firmware; and an 8-Kbyte NVRAM.
PCI1 is a 64-bit bus with a built-in CD-ROM SCSI controller and three PCI slots.
Cable connectors to the CD-ROM, the floppy, and the OCP are on the motherboard. Connectors for the mouse, keyboard, two COM ports, the parallel port, and a modem are on the system bulkhead. The bulkhead is part of the system motherboard.

Remote Control Logic

A section of the motherboard provides remote control operation of the system. A four-switch switchpack enables or disables remote control features.
Figure 1-13 Remote Control Logic (motherboard view: RCM switchpack with switches 1 EN RCM, 2 MODEM OFF, 3 RPD DIS, and 4 SET DEF; RCM power is VAUX from the power supplies)
The system allows both local and remote control. A set of switches enables or disables remote control.
Table 1-5 Remote Control Switch Functions
Switch        Condition       Function
1 EN RCM      On (default)    Allows remote system control
              Off             Does not allow remote system control
2 Modem Off   On              Disables the RCM modem port
              Off (default)   Enables the RCM modem port
3 RPD DIS     On              Disables remote power down
              Off (default)   Enables remote power down
4 SET DEF     On              Resets the RCM microprocessor defaults
              Off (default)   Allows use of conditions set by the user
The default settings allow complete remote control. To obtain any other behavior, the user must change the switch settings.
See Chapter 8 for information on controlling the system remotely.
The remote console manager connects to a modem through the modem port on the bulkhead. The RCM uses VAUX power provided by the system power supplies.
The standard I/O ports (keyboard, mouse, COM1 and COM2 serial ports, and parallel ports) are on the same bulkhead.
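As a quick reference, the switch functions in Table 1-5 can be expressed as a lookup. The following Python sketch is illustrative only: the dictionary layout and function name are assumptions made for this example, and the switch positions must still be read from the switchpack by eye. The descriptions are taken from Table 1-5.

```python
# Switch number: (label, function when On, function when Off), from Table 1-5.
RCM_SWITCHES = {
    1: ("EN RCM", "Allows remote system control",
        "Does not allow remote system control"),
    2: ("Modem Off", "Disables the RCM modem port",
        "Enables the RCM modem port"),
    3: ("RPD DIS", "Disables remote power down",
        "Enables remote power down"),
    4: ("SET DEF", "Resets the RCM microprocessor defaults",
        "Allows use of conditions set by the user"),
}

# Default positions from Table 1-5 (True means On).
DEFAULTS = {1: True, 2: False, 3: False, 4: False}


def describe(settings):
    """Return one line per switch describing the effect of its position."""
    lines = []
    for number in sorted(RCM_SWITCHES):
        label, when_on, when_off = RCM_SWITCHES[number]
        effect = when_on if settings[number] else when_off
        lines.append(f"{number} {label}: {effect}")
    return lines


for line in describe(DEFAULTS):
    print(line)
```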

Power Control Logic

The power control section of the motherboard controls power sequencing and monitors power supply voltage, system temperature, and fans.
Figure 1-14 Power Control Logic (motherboard view showing the location of the power control logic)
The power control logic performs these functions:
Monitors system temperature and powers down the system 30 seconds after it detects that the internal temperature of the system is above the value of the environment variable over_temp. Default = 55° C.
Monitors the system and CPU fans at one-second intervals and powers down the system 30 seconds after it detects a fan failure.
Provides some visual indication of faults through LEDs.
Controls reset sequencing.
Provides an I2C interface for fans, power supplies, and temperature signals:
Power supply 0, 1: present
Power supply 0, 1: power OK
CPU fan 0, 1: OK
CPU 1: present
Overtemp: Temp OK
System fan 0, 1: OK
Fan Kit OK

Power Circuit and Cover Interlock

Power is distributed throughout the system and mechanically can be broken by the On/Off switch, the cover interlock, or remotely through the RCM.
Figure 1-15 Power Circuit Diagram (diagram: power supply, J30, switch pack, motherboard connectors J7 and J2, cover interlock, On/Off push button on the OCP, and the DC_ENABLE_L signal)
Figure 1-15 shows the distribution of power throughout the system. DC power applied to the system is interrupted by an open in the circuit, by the RCM signal RCM_DC_EN_L, or by a power fault detected by a power supply. Opens can be caused by the On/Off button or the cover interlock.
A failure anywhere in the circuit will result in the removal of DC power. A potential failure is the relay used in the remote control logic to control the RCM_DC_EN_L signal.
The cover interlock is located under the top cover between the system card cage and the storage area. To override the interlock, place a suitable object in the interlock switch that closes it.

Power Supply

Two power supplies provide system power.
Figure 1-16 Back of Power Supply and Location (power supply 0 and power supply 1; rear connectors: current share, miscellaneous signal, +5V/return, +12V/return, and +3.4V/return)

Description

Two power supplies each provide 450 W to the system. Redundant power is not available at this time.

Power Supply Features

88–132 and 176–264 Vrms AC input
450 watts output. Output voltages are as follows:
Output Voltage (V)   Min. Voltage (V)   Max. Voltage (V)   Max. Current (A)
+5.0                 4.90               5.25               52
+3.43                3.400              3.465              37.4
+12                  11.5               12.6               17
–12                  –13.2              –10.9              0.5
–5.0                 –5.5               –4.6               0.2
Vaux                 4.85               5.25               0.6
Remote sense on +5.0V and +3.43V
+5.0V is sensed on the system motherboard. +3.43V is sensed on all CPUs in the system and the system bus motherboard.
Current share on +5.0V, +3.43V, and +12V.
1% regulation on +3.43V.
Fault protection (latched). If a fault is detected by the power supply, it will shut down. The power supply faults detected are:
Fan failure
Overvoltage
Overcurrent
Power overload
DC_ENABLE_L input signal starts the DC outputs.
SHUTDOWN_H input signal shuts the power supply off in case of a system fan or CPU fan failure.
POK_H output signal indicates that the power supply is operating properly.
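When output voltages are measured with a meter, the limits in the table above can be applied as a simple range check. The following Python sketch is illustrative only: the dictionary keys, the function name, and the sample readings are assumptions made for this example, and an ASCII minus sign is used for the negative outputs.

```python
# (minimum, maximum) voltage for each output, from the output voltage table.
LIMITS = {
    "+5.0": (4.90, 5.25),
    "+3.43": (3.400, 3.465),
    "+12": (11.5, 12.6),
    "-12": (-13.2, -10.9),
    "-5.0": (-5.5, -4.6),
    "Vaux": (4.85, 5.25),
}


def out_of_tolerance(readings):
    """Return the names of outputs whose measured voltage is out of range."""
    failed = []
    for name, volts in readings.items():
        low, high = LIMITS[name]
        if not low <= volts <= high:
            failed.append(name)
    return failed


# Hypothetical meter readings for three of the outputs.
print(out_of_tolerance({"+5.0": 5.02, "+3.43": 3.50, "-12": -12.1}))
```

In this hypothetical case only the +3.43V reading falls outside its limits, because 3.50 is above the 3.465 maximum.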

Power Up/Down Sequence

System power can be controlled manually by the On/Off button on the OCP or remotely through the RCM. The power-up/down sequence flow is shown below.
Figure 1-17 Power Up/Down Sequence Flowchart (flowchart elements: apply AC power; Vaux on; On/Off button on or off; assert DC_ENABLE_L; power supply starts; any faults; assert POK; fan/temperature OK; assert SHUTDOWN; 30-second delay; disable outputs; deassert POK)
When AC is applied to the system, Vaux (auxiliary voltage) is asserted and is sensed by the power control logic (PCL) section of the motherboard. If the On/Off button is on, the PCL asserts DC_ENABLE_L, starting the power supplies. If there is a hard fault on power-up, the power supplies shut down immediately; otherwise, the power system powers up and remains up until the system is shut off or the PCL senses a fault. If a power fault is sensed, the signal SHUTDOWN is asserted after a 30-second delay. Cycling the On/Off button can restore the power.

Maintenance Bus (I2C Bus)

The I2C bus (referred to as the “I squared C bus”) is a small internal maintenance bus used to monitor system conditions scanned by the power control logic, write the fault display, store error state, and track configuration information in the system. Although all system modules (not I/O modules) sit on the maintenance bus, only the I2C controller accesses it.
Figure 1-18 I2C Bus Block Diagram (block diagram: the I2C bus controller on the XBUS, reached through IOD0, PCI0, and ISA, connects to the OCP, the motherboard thermometer/thermostat, the PCL registers, the CPUs, up to eight memory pairs, IOD 0 and IOD 1, and PCI 0 and PCI 1)

Monitor

The I2C bus monitors the state of system conditions scanned by the power control logic. There are two registers that the power control logic writes data to:
One records the state of the fans and power supplies and is latched when there is a fault.
The other causes an interrupt on the I2C bus when a CPU or system fan fails, an overtemperature condition exists, or power supplied to the system exhibits an overcurrent condition.
The interrupt, received by the I2C bus controller on PCI 0 and passed on to the IOD 0 chip set, alerts the system of imminent power shutdown. The controller has 30 seconds to read the two registers and store the information in the EEPROM on the motherboard. The SRM console command show power reads these registers.

Fault Display

The OCP display is written through the I2C bus.

Error State

Error state is stored for power, fan, and overtemperature conditions on the I2C bus.

Configuration Tracking

Each CPU and each logical section of the system motherboard (the PCI bridge, the PCI backplane, the power control logic, the remote console manager), and the system motherboard itself, has an EEPROM that contains information about the module that can be written and read over the I2C bus. All EEPROMs contain the following information:
Module type
Module serial number
Hardware revision for the logical block
Firmware revision

StorageWorks Drives

The system supports up to seven StorageWorks drives.
Figure 1-19 StorageWorks Drive Location
StorageWorks Drives Shelf
The StorageWorks drives are to the right of the system cage. Up to seven drives fit into the shelf. The system supports fast wide Ultra SCSI disk drives. The RAID controller is also supported. With an optional Ultra SCSI Bus Splitter Kit the StorageWorks shelf can be split into two buses.
2

Power-Up

This chapter describes system power-up testing and explains the power-up displays. The following topics are covered:
Control Panel
Power-Up Sequence
SROM Power-Up Test Flow
SROM Errors Reported
XSROM Power-Up Test Flow
XSROM Errors Reported
Console Power-Up Tests
Console Device Determination
Console Power-Up Display
Fail-Safe Loader

Control Panel

The control panel display indicates the likely device when testing fails.
Figure 2-1 Control Panel and LCD Display
&RQWURO 3DQHO
1 2 3
P0 TEST 11 CPU0
4
PKW0510-97
When the On/Off button LED is on, power is applied and the system is running.
When it is off, the system is not running, but power may or may not be present. If the power supplies are receiving AC power, Vaux is present on the system motherboard regardless of the condition of the On/Off switch.
When the Halt button LED is lit and the On/Off button LED is on, the system should be running either the SRM console or Windows NT.
The potentiometer, accessible through the access hole just above the Reset button, controls the intensity of the LCD. Use a small Phillips-head screwdriver to adjust it.
Table 2-1 Control Panel Display
Field             Content                  Display Meaning
CPU number        P0–P1                    CPU reporting status
Status            TEST                     Tests are executing
                  FAIL                     Failure has been detected
                  MCHK                     Machine check has occurred
                  INTR                     Error interrupt has occurred
Test number
Suspected device  CPU0–1                   CPU module number
                  MEM0–7 and L, H, or *    Memory pair number and low DIMM, high DIMM, or either
                  IOD0 [1]                 Bridge to PCI bus 0
                  IOD1 [1]                 Bridge to PCI bus 1
                  FROM0 [1]                Flash ROM
                  COMBO [1]                COM controller
                  PCEB [1]                 PCI-to-EISA bridge
                  ESC [1]                  EISA system controller
                  NVRAM [1]                Nonvolatile RAM
                  TOY [1]                  Real-time clock
                  I8242 [1]                Keyboard and mouse controller
[1] On the system motherboard (54-25147-01).
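When OCP text is copied into a service log, the fields of Table 2-1 can be separated mechanically. The sketch below assumes the four-field layout shown in the sample display (P0 TEST 11 CPU0). The function and type names are hypothetical, and displays that do not follow this layout (for example, fan failure messages) are rejected rather than interpreted.

```python
import re
from typing import NamedTuple


class OcpStatus(NamedTuple):
    cpu: int        # CPU reporting status (P0 or P1)
    status: str     # TEST, FAIL, MCHK, or INTR
    test: str       # test number, kept as text exactly as displayed
    device: str     # suspected device, e.g. CPU0 or MEM1L


# Four whitespace-separated fields, as in the sample display.
OCP_PATTERN = re.compile(r"^P(\d)\s+(TEST|FAIL|MCHK|INTR)\s+(\S+)\s+(\S+)$")


def parse_ocp(line: str) -> OcpStatus:
    """Split one OCP display line into the fields of Table 2-1."""
    match = OCP_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized OCP display: {line!r}")
    cpu, status, test, device = match.groups()
    return OcpStatus(int(cpu), status, test, device)


print(parse_ocp("P0 TEST 11 CPU0"))
```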

Power-Up Sequence

Console and most power-up tests reside on the I/O subsystem, not on the CPU or on any other module on the system bus.
Figure 2-2 Power-Up Flow
1. Power-up/reset
2. SROM code loaded into each CPU's I-cache
3. SROM tests execute
4. XSROM loaded into each CPU's S-cache
5. XSROM tests execute
6. SRM console loaded into memory
7. SRM console tests execute
8. SRM console either remains in the system or loads AlphaBIOS console

Definitions

SROM. The SROM is a 128-Kbit ROM on each CPU module. The ROM contains minimal diagnostics that test the Alpha chip and the path to the XSROM. Once the path is verified, it loads XSROM code into the Alpha chip and jumps to it.
XSROM. The XSROM, or extended SROM, contains back-up cache and memory tests, the I/O subsystem tests for embedded devices, and a fail-safe loader. The XSROM code resides in sector 0 of FEPROM 0 on the XBUS. Sector 2 of FEPROM 0 contains a duplicate copy of the code and is used if sector 0 is corrupt. Code for sizing DIMM memory resides in sector 1 of FEPROM 0 along with the PALcode.
FEPROM. Two 1-Mbyte programmable ROMs (FEPROMs) are on the XBUS on PCI 0. FEPROM 0 contains two copies of the XSROM, the SRM console, and the decompression code. FEPROM 1 contains the AlphaBIOS and NT HAL code. See Figure 2-3. Both FEPROMs can be flash updated. Refer to Chapter 6.
Figure 2-3 Contents of FEPROMs
FEPROM 0 (sixteen 64-Kbyte sectors, 1 Mbyte)
Sector 0        XSROM and fail-safe loader
Sector 1        PALcode and XSROM DIMM code
Sector 2        XSROM and fail-safe loader
Sectors 3–15    Decompression code and SRM console code
FEPROM 1 (1 Mbyte)
AlphaBIOS code
For the console to run, the path from the CPU to the XSROM must be functional. The XSROM resides in FEPROM 0 on the XBUS, off the EISA bus, off PCI 0, off IOD 0. See Figure 2-4. This path is minimally tested by SROM.
Figure 2-4 Console Code Critical Path (Block Diagram)
[Block diagram: the CPU and memory pairs sit on the system bus (128-bit data bus + 16 ECC and 40-bit command/address bus). System-to-PCI bus bridge 0 (IOD 0) and bridge 1 (IOD 1) connect the system bus to the 64-bit PCI Bus 0 and PCI Bus 1 and their PCI slots. On PCI Bus 0, the EISA bridge leads to the EISA bus and the EISA slot, and the XBUS transceivers lead to the XBUS, which carries the real-time clock, the combo I/O chip (serial ports, parallel port, floppy controller), the mouse/keyboard controller, the I2C bus interface, the 8K x 8 NVRAM, and the 2-MB flash ROM.]
Note: When the EISA slot on PCI Bus 0 is used, the last PCI slot on PCI Bus 1 is not available.
The SROM contents are loaded into each CPU’s I-cache and executed on power-up/reset. After testing the caches on each processor chip, it tests the path to the XSROM. Once this path is tested and deemed reliable, layers of the XSROM are loaded sequentially into the processor chip on each CPU. None of the SROM or XSROM power-up tests are run from memory—all run from the caches in the CPU chip, thus providing excellent diagnostic isolation. Later power-up tests, run under the console, are used to complete testing of the I/O subsystem.
There are two console programs: the SRM console and the AlphaBIOS console, as detailed in your system User’s Guide. By default, the SRM console is always loaded and I/O system tests are run under it before the system loads AlphaBIOS.

SROM Power-Up Test Flow

The SROM tests the CPU chip and the path to the XSROM.
Figure 2-5 SROM Power-Up Test Flow
[Flowchart, steps in approximate order. For each CPU: initialize the CPU chip and turn off the CPU LED; hang on D-cache errors; initialize all S-cache banks and hang unless all 3 S-cache banks pass; hang on duplicate tag or fill errors; otherwise light the CPU LED. Then: determine the primary CPU; size the IODs and run loopback on each IOD, lighting the IOD LEDs on a pass; initialize the PCI-EISA bridge chip; read the TOY NVRAM; initialize the combo chip on the XBUS for access to COM port 1; initialize the OCP port on the XBUS for access to the OCP display; print to the console device and OCP; check the integrity of the XSROM. On a pass, load the first 8K of XSROM into the S-cache and jump to the XSROM overlay in the S-cache. If the check fails twice, hang.]
The Alpha chip built-in self-test tests the I-cache at power-up and upon reset. Each CPU chip loads its SROM code into its I-cache and starts executing it. If the chip is partially functional, the SROM code continues to execute. However, if the chip cannot perform most of its functions, that CPU hangs and that CPU pass/fail LED remains off. (In these systems, the CPU pass/fail LED is not visible.)
If the system has more than one CPU and at least one passes both the SROM and XSROM power-up tests, the system will bring up the console. The console checks the FW_SCRATCH register where evidence of the power-up failure is left. Upon finding the error, the console sends these messages to COM1 and the OCP:
COM1 (or VGA): Power-up tests have detected a problem with your system
OCP: Power-up failure
Table 2-2 lists the tests performed by the SROM.
Table 2-2 SROM Tests
Test Name                     Logic Tested
D-cache RAM March test        D-cache access, D-cache data, D-cache address logic
D-cache Tag RAM March test    D-cache tag store RAM, D-cache bank address logic
S-cache Data March test       S-cache RAM cells, S-cache data path, S-cache address path
S-cache Tag RAM March test    S-cache tag store RAM, S-cache bank address logic
I-cache Parity Error test     I-cache parity error detection, ISCR register and error forcing
                              logic, IC_PERR_STAT register and reporting logic
D-cache Parity Error test     D-cache parity error detection, DC_MODE register and parity error
                              forcing logic, DC_PERR_STAT register and reporting logic
S-cache Parity Error test     S-cache parity error detection, SC_CTL register and parity error
                              forcing logic, SC_STAT register and reporting logic
IOD Access test               Access to IOD CSRs, data path through CAP chip and MDP0 on each
                              IOD, PCI0 A/D lines <31:0>

SROM Errors Reported

The SROM reports machine checks, pending interrupt/exception errors, and errors related to corruption of FEPROM 0. If SROM errors are fatal, the particular CPU will hang and only the CPU self-test pass LEDs and/or the LEDs on the system motherboard will indicate the failure. The CPU self-test pass LED is not visible but the IOD0 and IOD1 pass LEDs are.
Example 2-1 SROM Errors Reported at Power-Up
Unexpected Machine Check (CPU Error)
UNEX MCHK on CPU 0
EXC_ADR 42a9
EI_STAT fffffff004ffffff
EI_ADDR ffffff000000801f
SC_STAT 0
SC_ADDR FFFFFF0000005F2F
Pending Interrupt/Exception (CPU Error)
INT-EXC on CPU0
ISR 400000
EI_STAT fffffff007ffffff
EI_ADDR ffffff7fffffffdf
FIL_SYN 631B
BCTGADR ffffffa7fffcafff
FEPROM Failures (PCI Motherboard Error)
Sector 0 failures (XSROM flash unload failure)
Sctr 0 -XSROM headr PTTRN fail
Sctr 0 -XSROM headr CHKSM fail
Sctr 0 -XSROM code CHKSM fail
Sector 2 failures (XSROM recovery flash unload failure)
Sctr 2 -XSROM headr PTTRN fail
Sctr 2 -XSROM headr CHKSM fail
Sctr 2 -XSROM code CHKSM fail

XSROM Power-Up Test Flow

Once the SROM has completed its tests and verified the path to the FEPROM containing the XSROM code, it loads the first 8 Kbytes of XSROM into the primary CPU’s S-cache and jumps to it. XSROM tests are described in Table 2-3. Failure indicates a CPU failure.
Figure 2-6 XSROM Power-Up Flowchart
[Flowchart, steps in approximate order: XSROM banner to OCP/console device. Clear SC_FHIT (force hit). Enable all 3 S-cache banks. Initialize B-cache and enable duplicate tag. Run B-cache tests; print errors to OCP/console device; done message to console device. Boot processor redetermination. Size system memory through the I2C bus. Print memory information to the console device. Check for illegal memory configuration; print warnings to the console device and OCP. Initialize all memory pairs. Run memory tests; print trace and errors to OCP/console device; done message to console device. Boot processor redetermination. Primary verifies checksum of PAL/decompression/console code. On a pass, the primary unloads PAL/decompression code, jumps to PALcode, and starts the console. On a failure, the primary loads the fail-safe loader. Secondaries are alerted that the console has started; they jump to and run PALcode and join in the console.]
Note: The XSROM can only print to the console device if the environment variable console = serial. It always sends output to the OCP.
After jumping to the primary CPU's S-cache, the code then intentionally I-caches itself and is completely register based (no D-stream for stack or data storage is used). The only D-stream accesses are writes/reads during testing.
Each FEPROM has sixteen 64-Kbyte sectors. The first sector contains B-cache tests, memory tests, and a fail-safe loader. The second sector contains support for system memory and PALcode. The third sector contains a copy of the first sector. The remaining thirteen sectors contain the SRM console and decompression code.
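The sector arithmetic for FEPROM 0 can be checked with a short Python sketch. It assumes that sectors are numbered from 0 and laid out contiguously from offset 0 of the part; the manual gives the sector size and count but not the bus addresses, so the offsets here are relative and the names are illustrative.

```python
from typing import Tuple

SECTOR_SIZE = 64 * 1024   # 64 Kbytes per sector
SECTOR_COUNT = 16         # sixteen sectors per FEPROM

# Contents of FEPROM 0 as described in the text; sectors 3-15 share one role.
FEPROM0_LAYOUT = {
    0: "B-cache tests, memory tests, fail-safe loader",
    1: "system memory support and PALcode",
    2: "copy of sector 0",
}


def sector_range(sector: int) -> Tuple[int, int]:
    """Return (first, last) byte offset of a sector, relative to the FEPROM start."""
    if not 0 <= sector < SECTOR_COUNT:
        raise ValueError(f"sector must be 0..{SECTOR_COUNT - 1}")
    start = sector * SECTOR_SIZE
    return start, start + SECTOR_SIZE - 1


def sector_contents(sector: int) -> str:
    """Describe what FEPROM 0 holds in the given sector."""
    sector_range(sector)  # validates the sector number
    return FEPROM0_LAYOUT.get(sector, "SRM console and decompression code")


for n in (0, 1, 2, 3, 15):
    first, last = sector_range(n)
    print(f"sector {n:2d}: {first:#07x}-{last:#07x}  {sector_contents(n)}")
```

Sixteen sectors of 64 Kbytes account for the full 1 Mbyte of each FEPROM.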
__________________________NOTE ____________________________
Memory tests are run during power-up and reset (see Figure 2-4). They are also affected by the state of the memory_test environment variable, which can have the following values:
FULL       Test all memory
PARTIAL    Test up to the first 256 Mbytes
NONE       Test 32 Mbytes
____________________________________________________________
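The effect of memory_test on the amount of memory covered can be sketched as follows. This is an illustration rather than firmware behavior: the function name is hypothetical, and limiting the NONE case to the installed size on very small configurations is an assumption.

```python
MB = 1024 * 1024


def bytes_tested(memory_test: str, installed_bytes: int) -> int:
    """Return how many bytes the power-up memory tests would cover."""
    setting = memory_test.upper()
    if setting == "FULL":
        return installed_bytes
    if setting == "PARTIAL":
        return min(installed_bytes, 256 * MB)   # up to the first 256 Mbytes
    if setting == "NONE":
        return min(installed_bytes, 32 * MB)    # assumed cap at installed size
    raise ValueError(f"unknown memory_test value: {memory_test!r}")


# 640 Mbytes matches the memory pairs shown in Example 2-3 (256+256+64+64).
installed = 640 * MB
for value in ("FULL", "PARTIAL", "NONE"):
    print(value, bytes_tested(value, installed) // MB, "MB")
```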
Table 2-3 XSROM Tests
Test  Test Name                     Logic Tested
11    B-cache Data March test       B-cache data RAMs, CPU chip B-cache control, CPU chip
                                    B-cache address decode, INDEX_H<23:6> (address bus)
12    B-cache Tag March test        B-cache tag store RAMs, B-cache STAT store RAMs
13    B-cache ECC Data Line test    CPU chip ECC generation and checking logic, ECC lines
                                    from CPU chip to B-cache, B-cache ECC RAMs
14    B-cache Tag Data Line test    Access to B-cache tags, shorts between tag data and its
                                    status and parity bits
15    B-cache Data Line test        B-cache data lines to B-cache data RAMs, B-cache
                                    read/write logic
16    B-cache ECC Data Line test    CPU chip ECC generation and checking logic, ECC lines
                                    from CPU chip to B-cache, B-cache ECC RAMs
Table 2-4 Memory Tests
Test  Test Name                Logic Tested                     Description
20    Memory Data test         Data path to and from memory     Test floats 1 and 0 across data and check
                                                                bit data lines. Errors are reported for
                                                                each DIMM memory card from MEM0_L to
                                                                MEM7_H.
21    Memory Address test      Address path to and from memory  Same as test 20.
23*   Memory Bitmap Building   No new logic                     Maps out bad memory by way of the bitmap.
                                                                It does not completely fail memory.
24    Memory March test        No new logic                     Maps out bad memory.
*There is no test 22.

XSROM Errors Reported

The XSROM reports B-cache test errors and memory test errors. It also reports a warning if memory is illegally configured.
Example 2-2 XSROM Errors Reported at Power-Up
B-cache Error (CPU Error)
TEST ERR on cpu0              #CPU running the test
FRU cpu0
err# 2
tst# 11
exp: 5555555555555555         #Expected data
rcv: aaaaaaaaaaaaaaaa         #Received data
adr: ffff8                    #B-cache location
Memory Error (Memory Module Indicated)
20..21..
TEST ERR on cpu0              #CPU running test
FRU: MEM1L                    #Low member of memory pair 1
                              #error occurred
err# c
tst# 21
22..23..24..Memory testing complete on cpu0
Memory Configuration Error (Operator Error)
ERR! mem_pair0 misconfigured
ERR! mem_pair1 card size mismatch
ERR! mem_pair6 card type mismatch
ERR! mem_pair1 EMPTY
FEPROM Failures (PCI Error)
Sctr 1 -PAL headr PTTRN fail
Sctr 1 -PAL headr CHKSM fail
Sctr 1 -PAL code CHKSM fail
Sctr 3 -CONSLE headr PTTRN fail
Sctr 3 -CONSLE headr CHKSM fail
Sctr 3 -CONSLE code CHKSM fail
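When a B-cache or memory error reports expected (exp:) and received (rcv:) data, comparing the two values bit by bit shows which data lines disagree. The sketch below is a generic XOR comparison, not a DIGITAL tool; the function name is hypothetical, and bit 0 is assumed to be the least significant bit of the printed value.

```python
from typing import List


def failing_bits(expected_hex: str, received_hex: str) -> List[int]:
    """Return the bit positions (0 = least significant) where the values differ."""
    diff = int(expected_hex, 16) ^ int(received_hex, 16)
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# Values taken from the B-cache error in Example 2-2.
bits = failing_bits("5555555555555555", "aaaaaaaaaaaaaaaa")
print(len(bits), "bits differ")
```

In the Example 2-2 case every bit is inverted, which points away from a single stuck data line.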

Console Power-Up Tests

Once loaded, the SRM console tests each IOD further. Table 2-5 describes the IOD power-up tests, and Table 2-6 describes the PCI power-up tests.
Table 2-5 IOD Tests
Test #  Test Name                           Description
1       IOD CSR Access test                 Read and write all CSRs in each IOD.
2       Loopback test                       Dense space writes to the IOD’s PCI dense space to check
                                            the integrity of ECC lines.
3       ECC test                            Loopback tests similar to test 2 but with a varying pattern
                                            to create an ECC of 0s. Single- and double-bit errors are
                                            checked.
4       Parity Error and Fill Error tests   Parity errors are forced on the address and data lines on
                                            system bus and PCI buses. A fill error transaction is
                                            forced on the system bus.
5       Translation Error test              A loopback test using scatter/gather address translation
                                            logic on each IOD.
6       Write Pending test                  Runs test 2 with the write-pending bit set and clear in the
                                            CAP chip control register.
7       PCI Loopback test                   Loops data through each PCI on each IOD, testing the mask
                                            field of the system bus.
8       PCI Peer-to-Peer Byte Mask test     Tests that devices on the same PCI and on different PCIs
                                            can communicate.
9       Page Table Entry test 1 (CAP chip)  Tests every PTE using scatter/gather translation and
                                            addressing.
10      Page Table Entry test 2 (CAP chip)  Tests random PTEs forcing use of all interesting tag and
                                            page registers.
Table 2-6 PCI Motherboard Tests
Test Number  Test Name                              Diagnostic Name  Description
1            PCEB                                   pceb_diag        Tests the PCI to EISA bridge chip
2            ESC                                    esc_diag         Tests the EISA system controller
3            8K NVRAM                               nvram_diag       Tests the NVRAM
4            Real-Time Clock                        ds1287_diag      Tests the real-time clock chip
5            Keyboard and Mouse                     i8242_diag       Tests the keyboard/mouse chip
6            Flash ROM                              flash_diag       Dumps contents of flash ROM
7            Serial and Parallel Ports and Floppy   combo_diag       Tests COM ports 1 and 2, the parallel
                                                                     port, and the floppy
8            CD-ROM                                 ncr810_diag      Tests the CD-ROM controller
For both IOD tests and PCI 0 and PCI 1 tests, trace and failure status is sent to the OCP. If any of these tests fail, a warning is sent to the SRM console device after the console prompt (or AlphaBIOS pop-up box). The IOD LEDs on the system motherboard are controlled by the diagnostics. If an LED is off, a failure occurred.

Console Device Determination

After the SROM and XSROM have completed their tasks, the SRM console program, as it starts, determines where to send its power-up messages.
Figure 2-7 Console Device Determination Flowchart

Console Device Options

The console device can be either a serial terminal or a graphics monitor. Specifically:
A serial terminal connected to COM1 off the bulkhead. The terminal connected to COM1 must be set to 9600 baud. This baud rate cannot be changed.
A graphics monitor off an adapter on PCI0.
Systems running Windows NT must have a graphics monitor as the console device and run AlphaBIOS as the console program.
During power-up, the SROM and the XSROM always send progress and error messages to the OCP and to the COM1 serial port if the SRM console environment variable (set with the set console command) is set to serial. If the console environment variable is set to graphics, no messages are sent to COM1.
If the console device is connected to COM1, the SROM, XSROM, and console power-up messages are sent to it once it has been initialized. If the console device is a graphics device, console power-up messages are sent to it, but SROM and XSROM power-up messages are lost. No matter what the console environment variable setting, each of the three programs sends messages to the control panel display.
Messages Sent By   Console Set to Serial   Console Set to Graphics
SROM               COM1                    Lost, though a subset is sent to the OCP
XSROM              COM1                    Lost, though a subset is sent to the OCP
SRM console        COM1                    VGA, though a subset is sent to the OCP
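The routing rules in this section can be condensed into a small lookup, which is convenient when deciding where to attach a terminal before a power-up. The Python sketch below restates the table and the surrounding text only; the function name and the destination labels are illustrative.

```python
from typing import List

PROGRAMS = ("SROM", "XSROM", "SRM console")


def destinations(program: str, console: str) -> List[str]:
    """Return where power-up messages from a program appear for a console setting."""
    if program not in PROGRAMS:
        raise ValueError(f"unknown program: {program!r}")
    if console == "serial":
        # All three programs reach COM1; the control panel always gets output.
        return ["COM1", "OCP"]
    if console == "graphics":
        if program == "SRM console":
            return ["VGA", "OCP"]
        # SROM and XSROM messages are lost except for the OCP subset.
        return ["OCP"]
    raise ValueError(f"console must be 'serial' or 'graphics', not {console!r}")


for name in PROGRAMS:
    print(name, destinations(name, "graphics"))
```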

Console Power-Up Display

The entire power-up display prints to a serial terminal (if the console environment variable is set to serial), and parts of it print to the control panel display. The last several lines print to either a serial terminal or a graphics monitor.
Example 2-3 Power-Up Display
SROM V3.0 on cpu0
SROM V3.0 on cpu1
XSROM V5.0 on cpu0
XSROMb V5.0 on cpu1
BCache testing complete on cpu1
BCache testing complete on cpu0
mem_pair0 - 256 MB
mem_pair1 - 256 MB
mem_pair2 - 64 MB
mem_pair3 - 64 MB
20..21..20..21..23..24..24..
Memory testing complete on cpu0
Memory testing complete on cpu1
At power-up or reset, the SROM code on each CPU module is loaded into that module’s I-cache and tests the module. If all tests pass, the processor’s LED lights. If any test fails, the LED remains off and power-up testing terminates on that CPU.
The first determination of the primary processor is made, and the primary processor executes a loopback test to each PCI bridge. If this test passes, the bridge LED lights. If it fails, the LED remains off and power-up continues. The EISA system controller, PCI-to-EISA bridge, COM1 port, and control panel port are all initialized thereafter.
Each CPU prints an SROM banner to the device attached to the COM1 port and to the control panel display. (The banner prints to COM1 if the console environment variable is set to serial. If it is set to graphics, nothing prints to the console terminal, only to the control panel display, until the primary CPU reports that it is running the console.)
Each processor’s S-cache is initialized, and the XSROM code in the FEPROM on PCI 0 is unloaded into it. (If the unload is not successful, a copy is unloaded from a different FEPROM sector. If the second try fails, the CPU hangs.)
Each processor jumps to the XSROM code and sends an XSROM banner to the COM1 port and to the control panel display.
The three S-cache banks on each processor are enabled, and then the B-cache is tested. If a failure occurs, a message is sent to the COM1 port and to the control panel display.
Each CPU sends a B-cache completion message to COM1.
The primary CPU is again determined, and memory is sized using code in sector 1 of FEPROM 0.
The information on memory pairs is sent to COM1. If an illegal memory configuration is detected, a warning message is sent to COM1 and the control panel display.
Memory is initialized and tested, and the test trace is sent to COM1 and the control panel display. Each CPU participates in the memory testing. The numbers for tests 20 and 21 might appear interspersed, as in Example 2–3. This is normal behavior. Test 24 can take several minutes if the memory is very large. The message “P0 TEST 24 MEM**” is displayed on the control panel display; the second asterisk rotates to indicate that testing is continuing. If a failure occurs, a message is sent to the COM1 port and to the control panel display.
Each CPU sends a test completion message to COM1.
Example 2–3 Power-Up Display (Continued)
starting console on CPU 0
sizing memory
0 256 MB DIMM
1 256 MB DIMM
64 MB DIMM
64 MB DIMM
starting console on CPU 1
probing IOD1 hose 1
bus 0 slot 1 - NCR 53C810
bus 0 slot 2 - DECchip 21041-AA
bus 0 slot 3 - NCR 53C810
probing IOD0 hose 0
bus 0 slot 1 - PCEB
probing EISA Bridge, bus 1
bus 0 slot 2 - S3 Trio64/Trio32
bus 0 slot 3 - DECchip 21140-AA
Configuring I/O adapters...
Ncr0, hose 1, bus 0, slot 1
Tulip0, hose 1, bus 0, slot 2
Ncr1, hose 1, bus 0, slot 3
Floppy0, hose 0, bus 1 slot 0
Mc0, hose 0 bus 0, slot 2
tulip1, hose 0, bus 0, slot 3
System temperature is 31 degrees C
DIGITAL Server 5300 Console V5.0, 02-SEP-1997 18:18:26
P00>>>
The final primary CPU determination is made. The primary CPU unloads PALcode and decompression code from the FEPROM on PCI 0 to its B-cache. The primary CPU then jumps to the PALcode to start the SRM console.
The primary CPU prints a message indicating that it is running the console. Starting with this message, the power-up display is printed to the default console terminal, regardless of the state of the console environment variable. (If console is set to graphics, the display from here to the end is saved in a memory buffer and printed to the graphics monitor after the PCI buses are sized and the graphics device is initialized.)
The size and type of each memory pair is determined.
The console is started on each of the secondary CPUs. A status message prints for each CPU.
The PCI bridges (indicated as IODn) are probed and the devices are reported. I/O adapters are configured.
The SRM console banner and prompt are printed. (The SRM prompt is shown in this manual as P00>>>. It can, however, be P01>>>.)
The SRM console loads and starts the AlphaBIOS console.
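The probe lines in the power-up display (bus, slot, device) are regular enough to collect into an inventory from a captured serial log. The following Python sketch assumes the line format shown in Example 2-3; the function name is hypothetical, and lines that do not match are skipped, not treated as errors.

```python
import re
from typing import Optional, Tuple

# Matches lines such as "bus 0 slot 2 - DECchip 21041-AA".
# Both a hyphen and an en dash are accepted as the separator.
PROBE_LINE = re.compile(r"^bus (\d+) slot (\d+) [-\u2013] (.+)$")


def parse_probe(line: str) -> Optional[Tuple[int, int, str]]:
    """Return (bus, slot, device) for a probe line, or None for other lines."""
    match = PROBE_LINE.match(line.strip())
    if match is None:
        return None
    bus, slot, device = match.groups()
    return int(bus), int(slot), device


log = [
    "probing IOD1 hose 1",
    "bus 0 slot 1 - NCR 53C810",
    "bus 0 slot 2 - DECchip 21041-AA",
]
devices = [entry for entry in map(parse_probe, log) if entry is not None]
print(devices)
```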

Fail-Safe Loader

The fail-safe loader is a software routine that loads the SRM console image from floppy. Once the console is running, run LFU to update FEPROM 0 with a new image.
_________________________ NOTE____________________________
FEPROM 0 contains images of the SROM, XSROM, PAL, decompression, and SRM console code.
___________________________________________________________
If the fail-safe loader loads, the following conditions exist on the machine:
The SROM has passed its tests and successfully unloaded the XSROM. If the SROM fails to unload both copies of XSROM, it reports the failure to the control panel display and COM1 if possible, and the system hangs.
The XSROM has completed its B-cache and memory tests but has failed to unload the PALcode in FEPROM 0 sector 1 or the SRM console code.
The XSROM reports the errors encountered and loads the fail-safe loader.
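The conditions above amount to a short decision sequence. The Python sketch below models only that sequence as described in this section. The four boolean inputs and the function name are hypothetical, standing in for the checksum and pattern checks the firmware performs.

```python
def power_up_outcome(xsrom_sector0_ok: bool, xsrom_sector2_ok: bool,
                     palcode_ok: bool, console_ok: bool) -> str:
    """Return what the system ends up running after the FEPROM 0 checks."""
    # The SROM tries sector 0 first, then the copy in sector 2.
    if not (xsrom_sector0_ok or xsrom_sector2_ok):
        return "hang"
    # The XSROM runs its tests, then tries to unload PALcode and console code.
    if not (palcode_ok and console_ok):
        return "fail-safe loader"
    return "SRM console"


# A corrupt console image with a good XSROM leads to the fail-safe loader.
print(power_up_outcome(True, True, True, False))
```

The sketch makes one point explicit: the fail-safe loader can only run if at least one XSROM copy is intact, because the loader itself is stored with the XSROM.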
3

Troubleshooting

This chapter describes troubleshooting during power-up and booting. It also describes the console test command and other useful commands. The following topics are covered:
Troubleshooting with LEDs
Troubleshooting Power Problems
Running Diagnostics—Test Command
Releasing Secure Mode
Testing an Entire System
Other Useful Console Commands

Troubleshooting with LEDs

During power-up, reset, initialization, or testing, diagnostics are run on CPUs, memories, I/O bridges, and the PCI backplane and its embedded options. This section describes possible problems that can be identified by checking LEDs. Unfortunately, LEDs on the CPU module are not visible; the only visible LEDs are on the system motherboard.
Figure 3-1 System Motherboard LEDs
[Figure: system motherboard LEDs, labeled IOD 0 Pass, IOD 1 Pass, Fan Fault, and Temp OK.]

System Motherboard LEDs

You can see the system motherboard LEDs by looking through the grate at the back of the machine. The normal state of the LEDs is shown in Figure 3-1.
If one of the IOD LEDs is off, the system bus to PCI bus bridge has failed. Replace the system motherboard.
If the Fan Fault LED is ON, at least one of the four fans is broken. If this condition occurs while the system is up and running, an error message identifying the FRU is printed to the console. If this condition occurs during a cold start, reset the system and watch the OCP display to identify which fan caused the fan fault. During the first 30 seconds, one of the following messages should appear:
SYSx Fan Failed where x = 0 or 1
CPUx Fan Failed where x = 0 or 1
Replace the failing FRU.
If the Temp OK LED is OFF, an overtemperature condition exists. Several things can cause this condition: airflow is blocked, the temperature in the room where the system is located is too high, or the system card cage is open and air is not channeled properly over the system. Fix any of these conditions, if possible. The overtemperature threshold is programmable and is controlled by the environment variable over_temp. Its default is 55 degrees C. After the system has cooled down and can be powered up, you can change the threshold. If you raise the threshold and the temperature inside the system gets too hot, it is likely that system errors will occur and the system may crash.

Troubleshooting Power Problems

Power problems can occur before the system is up or while the system is running.

Power Problem List

The system will halt for the following reasons:
1. A CPU fan failure
2. A system fan failure
3. An overtemperature condition
4. Power supply failure
5. Circuit breaker(s) tripped
6. AC problem
7. Interlock switch activation or failure
8. Environmental electrical failure or unrecoverable system fault with auto_action ev = halt or boot
9. Cable failure
Indication of failure:
1. LEDs indicate fan and overtemperature condition
2. The OCP display
3. Circuit breaker(s) tripped
There is no obvious indication for failures 7 through 9 from the power system.

Halt Caused by Power, Fan, or Overtemperature Condition

If a system is stopped because of a power, fan, or overtemperature problem, the console and the OCP should report the problem.

If Power Problem Occurs at Power-Up

If the system has a power problem on a cold start, the motherboard LEDs and the OCP display will indicate a problem. Causes of power problems are:
Broken system fan
Broken CPU fan
A power supply could be broken and the system could still power up momentarily. (During power-up, an overcurrent condition occurs with two power supplies and is tolerated for a short period, but a persistent overcurrent is not.)
Power control logic on the motherboard could fail
Interlock failure
Wire problems
Temperature problem (unlikely)

Recommended Order for Troubleshooting Failure at Power-Up

1. If the SRM console does not come all the way up, restart the system if the system runs NT and watch for an error message on the OCP display. Replace the FRU indicated.
2. If you can get to the SRM console, use the show power command. It will show the last power fault.
3. If neither step 1 nor step 2 identifies a FRU, replace the motherboard.

Running Diagnostics — Test Command

The test command runs diagnostics on the entire system, CPU devices, memory devices, and the PCI I/O subsystem. The test command runs only from the SRM console, and the console must not be in secure mode. Ctrl/C stops the test.
Example 3-1 Test Command Syntax
P00>>> help test
FUNCTION
SYNOPSIS
test [-q] [-t <time>] [option]
where option is:
cpun
memn
pcin
where n = 0, 1 or * for CPUs and PCIs
where n = 0 through 7 or * for MEM
The entire system is tested by default if no option is specified.
_________________________ NOTE____________________________
If you are running the Microsoft Windows NT operating system, switch from AlphaBIOS to the SRM console in order to enter the test command. From the AlphaBIOS console, press in the Halt button (the LED will light) and reset the system.
___________________________________________________________
test [-t time] [-q] [option]
-t time    Specifies the run time in seconds. The default for system test is 600 seconds (10 minutes).
-q         Disables the display of status messages as exerciser processes are started and stopped during testing.
option     Either cpun, memn, or pcin, where n is 0, 1, or * for CPUs and PCIs; or where n is 0 through 7 or * for memory. If nothing is specified, the entire system is tested.
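If test commands are prepared in advance, for example in a script that feeds a serial console, the syntax rules above can be checked before the command is sent. The Python sketch below only builds and validates the command string according to the synopsis. It does not talk to the console, and the function name is hypothetical.

```python
import re
from typing import Optional

# cpun and pcin accept n = 0, 1, or *; memn accepts n = 0 through 7 or *.
OPTION_PATTERN = re.compile(r"(?:cpu|pci)[01*]|mem[0-7*]")


def build_test_command(option: Optional[str] = None,
                       seconds: Optional[int] = None,
                       quiet: bool = False) -> str:
    """Build an SRM test command line following test [-t time] [-q] [option]."""
    parts = ["test"]
    if seconds is not None:
        if seconds <= 0:
            raise ValueError("run time must be a positive number of seconds")
        parts += ["-t", str(seconds)]
    if quiet:
        parts.append("-q")
    if option is not None:
        if OPTION_PATTERN.fullmatch(option) is None:
            raise ValueError(f"invalid test option: {option!r}")
        parts.append(option)
    return " ".join(parts)


print(build_test_command())                  # whole system, default run time
print(build_test_command("mem3", 120, True))
```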

Releasing Secure Mode

Most SRM console commands run only when the console is not in secure mode. If the console is not secure, user mode console commands can be entered. See the system manager if the system is secure and you do not know the password.
Example 3-2 Releasing/Reestablishing Secure Mode
P00>>> login
Please enter password: xxxx
P00>>>
[User mode SRM console commands are now available.]
P00>>> set secure
The console command login clears secure.
If the password has been forgotten and the system is in secure mode, the procedure for regaining control is:
1. Enter the login command
P00>>> login
2. At the Please enter password: prompt, press the Halt button and then press the Return key.
The password is now cleared and the console is in user mode. A new password must be set to put the console into secure mode again.
For a full discussion of securing the console, see your system User’s Guide.

Testing an Entire System

A test command with no modifiers runs all exercisers for subsystems and devices on the system. I/O devices tested are supported boot devices. The test runs for 10 minutes.
Example 3-3 Sample Test Command
P00>>> test
Console is in diagnostic mode System test, runtime 600 seconds Type ^C to stop testing Configuring system..
polling ncr0 (NCR 53C810) slot 1, bus 0 PCI, hose 1 SCSI Bus ID 7 dka500.5.0.1.1 DKa500 RRD45 1645 polling ncr1 (NCR 53C810) slot 3, bus 0 PCI, hose 1 SCSI Bus ID 7 dkb200.2.0.3.1 DKb200 RZ29B 0007 dkb400.4.0.3.1 DKb400 RZ29B 0007 polling floppy0 (FLOPPY) PCEB - XBUS hose 0 dva0.0.0.1000.0 DVA0 RX23 polling tulip0 (DECchip 21040-AA) slot 2, bus 0 PCI, hose 1 ewa0.0.0.2.1: 08-00-2B-E5-B4-1A
Testing EWA0 network device Testing VGA (alphanumeric mode only)
Starting background memory test, affinity to all CPUs.. Starting processor/cache thrasher on each CPU.. Starting processor/cache thrasher on each CPU..
Testing SCSI disks (read-only) No CD/ROM present, skipping embedded SCSI test Testing other SCSI devices (read-only).. Testing floppy drive (dva0, read-only)
3–8
DIGITAL Server 5300
Page 80
Troubleshooting
ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- ------------
00003047 memtest      memory       1      0    0    134217728     134217728
00003050 memtest      memory       205    0    0    213883392     213883392
00003059 memtest      memory       192    0    0    200253568     200253568
00003062 memtest      memory       192    0    0    200253568     200253568
00003084 memtest      memory       80     0    0    82827392      82827392
000030d8 exer_kid     dkb200.2.0.3 26     0    0    0             13690880
000030d9 exer_kid     dkb400.4.0.3 26     0    0    0             13674496
0000310d exer_kid     dva0.0.0.100 0      0    0    0             0

ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- ------------
00003047 memtest      memory       1      0    0    432013312     432013312
00003050 memtest      memory       635    0    0    664716032     664716032
00003059 memtest      memory       619    0    0    647940864     647940864
00003062 memtest      memory       620    0    0    648989312     648989312
00003084 memtest      memory       263    0    0    274693376     274693376
000030d8 exer_kid     dkb200.2.0.3 90     0    0    0             47572992
000030d9 exer_kid     dkb400.4.0.3 90     0    0    0             47523840
0000310d exer_kid     dva0.0.0.100 0      0    0    0             327680

ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- ------------
00003047 memtest      memory       1      0    0    727711744     727711744
00003050 memtest      memory       1054   0    0    1104015744    1104015744
00003059 memtest      memory       1039   0    0    1088289024    1088289024
00003062 memtest      memory       1041   0    0    1090385920    1090385920
00003084 memtest      memory       447    0    0    467607808     467607808
000030d8 exer_kid     dkb200.2.0.3 155    0    0    0             81488896
000030d9 exer_kid     dkb400.4.0.3 155    0    0    0             81472512
0000310d exer_kid     dva0.0.0.100 1      0    0    0             607232
Testing aborted. Shutting down tests. Please wait..
System test complete
^C
P00>>>
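
When a long test run is captured to a log, the status tables can be checked mechanically instead of by eye. The following is a minimal sketch, not a DIGITAL tool: it assumes the console output was saved as text with one status row per line, and the names parse_status and failing_devices are hypothetical. It reads the eight columns shown in the tables above and reports any device whose Hard or Soft error count is nonzero.

```python
import re

# Assumed column layout, taken from the sample status tables:
# ID, Program, Device, Pass, Hard, Soft, Bytes Written, Bytes Read
ROW = re.compile(
    r"^(?P<id>[0-9a-f]{8})\s+(?P<program>\S+)\s+(?P<device>\S+)\s+"
    r"(?P<passes>\d+)\s+(?P<hard>\d+)\s+(?P<soft>\d+)\s+"
    r"(?P<written>\d+)\s+(?P<read>\d+)\s*$"
)

def parse_status(text):
    """Return one dict per status row; header and divider lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            row = match.groupdict()
            for key in ("passes", "hard", "soft", "written", "read"):
                row[key] = int(row[key])
            rows.append(row)
    return rows

def failing_devices(rows):
    """Return the sorted names of devices that logged a hard or soft error."""
    return sorted({r["device"] for r in rows if r["hard"] or r["soft"]})

SAMPLE = """\
ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- ------------
00003047 memtest      memory       1      0    0    134217728     134217728
000030d8 exer_kid     dkb200.2.0.3 26     0    0    0             13690880
"""

if __name__ == "__main__":
    print(failing_devices(parse_status(SAMPLE)))
```

An empty list means no row in the captured output reported an error.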

Testing Memory

The test mem command tests individual memory devices or all memory. The test shown in Example 3-4 runs for 2 minutes.
Example 3-4 Sample Test Memory Command
P00>>> test memory
Console is in diagnostic mode
System test, runtime 120 seconds

Type ^C to stop testing

Starting background memory test, affinity to all CPUs..
Starting memory thrasher on each CPU..
Starting memory thrasher on each CPU..

ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- ------------
000046d7 memtest      memory       1      0    0    48234496      48234496
000046e0 memtest      memory       122    0    0    126862208     126862208
000046e9 memtest      memory       111    0    0    115329280     115329280
000046f2 memtest      memory       109    0    0    113232384     113232384
000046fb memtest      memory       41     0    0    41937920      41937920

ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- ------------
000046d7 memtest      memory       1      0    0    226492416     226492416
000046e0 memtest      memory       566    0    0    592373120     592373120
000046e9 memtest      memory       555    0    0    580840192     580840192
000046f2 memtest      memory       554    0    0    579791744     579791744
000046fb memtest      memory       211    0    0    220174080     220174080

ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- ------------
000046d7 memtest      memory       1      0    0    404750336     404750336
000046e0 memtest      memory       1011   0    0    1058932480    1058932480
000046e9 memtest      memory       1000   0    0    1047399552    1047399552
000046f2 memtest      memory       999    0    0    1046351104    1046351104
000046fb memtest      memory       381    0    0    398410240     398410240

ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- ------------
000046d7 memtest      memory       1      0    0    583008256     583008256
000046e0 memtest      memory       1456   0    0    1525491840    1525491840
000046e9 memtest      memory       1446   0    0    1515007360    1515007360
000046f2 memtest      memory       1444   0    0    1512910464    1512910464
000046fb memtest      memory       550    0    0    575597952     575597952

ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- ------------
000046d7 memtest      memory       1      0    0    761266176     761266176
000046e0 memtest      memory       1901   0    0    1992051200    1992051200
000046e9 memtest      memory       1892   0    0    1982615168    1982615168
000046f2 memtest      memory       1889   0    0    1979469824    1979469824
000046fb memtest      memory       720    0    0    753834112     753834112

ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- ------------
000046d7 memtest      memory       1      0    0    937426944     937426944
000046e0 memtest      memory       2346   0    0    2458610560    2458610560
000046e9 memtest      memory       2337   0    0    2449174528    2449174528
000046f2 memtest      memory       2333   0    0    2444980736    2444980736
000046fb memtest      memory       890    0    0    932070272     932070272
Memory test complete
Test time has expired...
P00>>>

Testing PCI

The test pci command tests PCI buses and devices. The test runs for 2 minutes.
Example 3-5 Sample Test Command for PCI
P00>>> test pci*
Console is in diagnostic mode
System test, runtime 120 seconds

Type ^C to stop testing

Configuring all PCI buses..
polling ncr0 (NCR 53C810) slot 1, bus 0 PCI, hose 1   SCSI Bus ID 7
dka500.5.0.1.1     DKa500                   RRD45  1645
polling ncr1 (NCR 53C810) slot 3, bus 0 PCI, hose 1   SCSI Bus ID 7
dkb200.2.0.3.1     DKb200                   RZ29B  0007
dkb400.4.0.3.1     DKb400                   RZ29B  0007
polling tulip0 (DECchip 21040-AA) slot 2, bus 0 PCI, hose 1
ewa0.0.0.2.1: 08-00-2B-E5-B4-1A
polling floppy0 (FLOPPY) PCEB - XBUS hose 0
dva0.0.0.1000.0    DVA0                     RX23

Testing all PCI buses..
Testing EWA0 network device
Testing VGA (alphanumeric mode only)
Testing SCSI disks (read-only)
Testing floppy (dva0, read-only)

ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- ------------
00002c29 exer_kid     dkb200.2.0.3 27     0    0    0             14642176
00002c2a exer_kid     dkb400.4.0.3 27     0    0    0             14642176
00002c5e exer_kid     dva0.0.0.100 0      0    0    0             0

ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- ------------
00002c29 exer_kid     dkb200.2.0.3 92     0    0    0             48689152
00002c2a exer_kid     dkb400.4.0.3 92     0    0    0             48689152
00002c5e exer_kid     dva0.0.0.100 0      0    0    0             286720
Testing aborted. Shutting down tests. Please wait..
Testing complete
^C
P00>>>

Other Useful Console Commands

There are several console commands that help diagnose the system.
The show power command can be used to identify power, temperature, and fan faults.
Example 3-6 Show Power
P00>>> show power
                        Status
Power Supply 0          good
Power Supply 1          good
System Fans             good
CPU Fans                good
Temperature             good

Current ambient temperature is 20 degrees C
System shutdown temperature is set to 55 degrees C
The system was last reset via a system software reset
0 Environmental events are logged in nvram
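
If show power output is collected from several systems, a short script can pick out the components that need attention. This is a hedged sketch only: the function names are hypothetical, the component labels are copied from Example 3-6, and because the example shows only the "good" state, the sketch assumes that any other status word indicates a fault.

```python
# Component labels as printed in Example 3-6.
LABELS = ("Power Supply 0", "Power Supply 1", "System Fans", "CPU Fans", "Temperature")

def parse_show_power(text):
    """Map each known component label to the status word printed after it."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        for label in LABELS:
            if line.startswith(label):
                status[label] = line[len(label):].strip()
    return status

def faults(status):
    """Assumption: any status other than 'good' is treated as a fault."""
    return [name for name, state in status.items() if state != "good"]

SAMPLE = """\
                        Status
Power Supply 0          good
Power Supply 1          good
System Fans             good
CPU Fans                good
Temperature             good

Current ambient temperature is 20 degrees C
"""

if __name__ == "__main__":
    print(faults(parse_show_power(SAMPLE)))
```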
The show memory command shows memory DIMMs and their starting addresses.
Example 3-7 Show Memory
P00>>> show memory
Slot Type MB   Base
---- ---- ---- --------
0    DIMM 256  0
1    DIMM 256  20000000
2    DIMM 256  40000000
3    DIMM 256  60000000
Total 1.2GB
The show fru command lists all FRUs in the system.
Example 3-8 Show FRU
P00>>> show fru
Digital Equipment Corporation
DIGITAL Server 5300

Console V5.0-2

Module                 Part #     Type  Rev   Name       Serial #
System Motherboard     25147-01   0     0000  mthrbrd0   NI72000047
Memory 256 MB DIMM     N/A        0     0000  mem0       N/A
Memory 256 MB DIMM     N/A        0     0000  mem1       N/A
Memory 256 MB DIMM     N/A        0     0000  mem2       N/A
Memory 256 MB DIMM     N/A        0     0000  mem3       N/A
CPU (4MB Cache)        B3007-AA   3     0000  cpu0       KA705TRVNS
Bridge (IOD0/IOD1)     25147-01   600   0032  iod0/iod1  NI72000047
PCI Motherboard        25147-01   a     0003  saddle0    NI72000047

Bus 0 iod0 (PCI0)
Slot Option Name        Type      Rev   Name
1    PCEB               4828086   0005  pceb0
2    S3 Trio64/Trio32   88115333  0054  vga0
3    DECchip 21041-AA   141011    0011  tulip0

Bus 1 pceb0 (EISA Bridge connected to iod0, slot 1)
Slot Option Name        Type      Rev   Name

Bus 0 iod1 (PCI1)
Slot Option Name        Type      Rev   Name
1    NCR 53C810         11000     0002  ncr0
4    QLogic ISP1020     10201077  0005  isp0

4 Error Registers

This chapter describes the registers used to hold error information. These registers include:
External Interface Status Register
External Interface Address Register
MC Error Information Register 0
MC Error Information Register 1
CAP Error Register
PCI Error Status Register 1

External Interface Status Register - EI_STAT

The EI_STAT register is a read-only register that is unlocked and cleared by any PALcode read. A read of this register also unlocks the EI_ADDR, BC_TAG_ADDR, and FILL_SYN registers subject to some restrictions. The EI_STAT register is not unlocked or cleared by reset.
Address: FF FFF0 0168    Type: R

Figure (PKW0453-96): EI_STAT bit layout. Bits <63:36> all 1s; <35> SEO_HRD_ERR; <34> FIL_IRD; <33> EI_PAR_ERR; <32> UNC_ECC_ERR; <31> COR_ECC_ERR; <30> EI_ES; <29> BC_TC_PERR; <28> BC_TPERR; <27:24> CHIP_ID<3:0>; <23:0> all 1s.
Fill data from B-cache or main memory could have correctable or uncorrectable errors in ECC mode. System address/command parity errors are always treated as uncorrectable hard errors, irrespective of the mode. The sequence for reading, unlocking, and clearing EI_STAT, EI_ADDR, BC_TAG_ADDR, and FILL_SYN is as follows:
1. Read the EI_ADDR, BC_TAG_ADDR, and FILL_SYN registers in any order. This step does not unlock or clear any register.
2. Read the EI_STAT register. This operation unlocks the EI_ADDR, BC_TAG_ADDR, and FILL_SYN registers. It also unlocks the EI_STAT register, subject to the conditions given in Table 4-2, which defines the loading and locking rules for external interface registers.
__________________________NOTE ____________________________
If the first error is correctable, the registers are loaded but not locked. On the second correctable error, the registers are neither loaded nor locked. Registers are locked on the first uncorrectable error except the second hard error bit. This bit is set only for an uncorrectable error that follows an uncorrectable error. A correctable error that follows an uncorrectable error is not logged as a second error. B-cache tag parity errors are uncorrectable in this context.
____________________________________________________________
Table 4-1 External Interface Status Register
Name          Bits     Type  Description
COR_ECC_ERR   <31>     R     Correctable ECC Error. Indicates that fill data received from outside the CPU contained a correctable ECC error.
EI_ES         <30>     R     External Interface Error Source. When set, indicates that the error source is fill data from main memory or a system address/command parity error. When clear, the error source is fill data from the B-cache. This bit is only meaningful when <COR_ECC_ERR>, <UNC_ECC_ERR>, or <EI_PAR_ERR> is set in this register. This bit is not defined for a B-cache tag error (BC_TPERR) or a B-cache tag control parity error (BC_TC_PERR).
BC_TC_PERR    <29>     R     B-Cache Tag Control Parity Error. Indicates that a B-cache read transaction encountered bad parity in the tag control RAM.
BC_TPERR      <28>     R     B-Cache Tag Address Parity Error. Indicates that a B-cache read transaction encountered bad parity in the tag address RAM.
CHIP_ID       <27:24>  R     Chip Identification. Read as “5.” Future update revisions to the chip will return new unique values.
              <23:0>         All ones.
              <63:36>        All ones.
SEO_HRD_ERR   <35>     R     Second External Interface Hard Error. Indicates that a fill from B-cache or main memory, or a system address/command received by the CPU has a hard error while one of the hard error bits in the EI_STAT register is already set.
FIL_IRD       <34>     R     Fill I-Ref D-Ref. When set, indicates that the error occurred during an I-ref fill. When clear, indicates that the error occurred during a D-ref fill. This bit has meaning only when one of the ECC or parity error bits is set. This bit is not defined for a B-cache tag parity error (BC_TPERR) or a B-cache tag control parity error (BC_TC_PERR).
EI_PAR_ERR    <33>     R     External Interface Command/Address Parity Error. Indicates that an address and command received by the CPU has a parity error.
UNC_ECC_ERR   <32>     R     Uncorrectable ECC Error. Indicates that fill data received from outside the CPU contained an uncorrectable ECC error. In parity mode, this bit indicates a data parity error.
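
When an EI_STAT value appears in an error log, the bit assignments in Table 4-1 can be applied with a few shifts and masks. The sketch below is illustrative only; the function names and the example value are hypothetical. It treats UNC_ECC_ERR, EI_PAR_ERR, BC_TPERR, and BC_TC_PERR as hard errors, following the statements in this section that address/command parity errors and B-cache tag parity errors are uncorrectable.

```python
# Single-bit fields and their positions, from Table 4-1.
EI_STAT_BITS = {
    "SEO_HRD_ERR": 35,
    "FIL_IRD": 34,
    "EI_PAR_ERR": 33,
    "UNC_ECC_ERR": 32,
    "COR_ECC_ERR": 31,
    "EI_ES": 30,
    "BC_TC_PERR": 29,
    "BC_TPERR": 28,
}

HARD_ERROR_FIELDS = ("UNC_ECC_ERR", "EI_PAR_ERR", "BC_TPERR", "BC_TC_PERR")

def decode_ei_stat(value):
    """Split a 64-bit EI_STAT value into named fields."""
    fields = {name: bool((value >> bit) & 1) for name, bit in EI_STAT_BITS.items()}
    fields["CHIP_ID"] = (value >> 24) & 0xF  # bits <27:24>
    return fields

def is_hard_error(fields):
    """True if any uncorrectable (hard) error bit is set."""
    return any(fields[name] for name in HARD_ERROR_FIELDS)

# Hypothetical example value: bits <63:36> and <23:0> read as ones,
# CHIP_ID = 5, with UNC_ECC_ERR and EI_ES set.
EXAMPLE = (0xFFFFFFF << 36) | (1 << 32) | (1 << 30) | (5 << 24) | 0xFFFFFF

if __name__ == "__main__":
    decoded = decode_ei_stat(EXAMPLE)
    print(decoded, is_hard_error(decoded))
```

For the example value, the decode reports an uncorrectable ECC error on fill data from main memory (EI_ES set) during a D-ref fill (FIL_IRD clear).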

External Interface Address Register - EI_ADDR

The EI_ADDR register contains the physical address associated with errors reported by the EI_STAT register. It is unlocked by a read of the EI_STAT register. This register is meaningful only when one of the error bits is set.

Address: FF FFF0 0148    Access: R

Table 4-2 Loading and Locking Rules for External Interface Registers
Correctable  Uncorrectable  Second Hard   Load      Lock            Action When EI_STAT Is Read
Error        Error          Error         Register  Register
0            0              Not possible  No        No              Clears and unlocks all registers
1            0              Not possible  Yes       No              Clears and unlocks all registers
0            1              0             Yes       Yes             Clears and unlocks all registers
1*           1              0             Yes       Yes             Clear bit (c) does not unlock. Transition to “0,1,0” state.
0            1              1             No        Already locked  Clears and unlocks all registers
1*           1              1             No        Already locked  Clear bit (c) does not unlock. Transition to “0,1,1” state.

* These are special cases. It is possible that when EI_ADDR is read, only the correctable error bit is set and the registers are not locked. By the time EI_STAT is read, an uncorrectable error is detected and the registers are loaded again and locked. The value of EI_ADDR read earlier is no longer valid. Therefore, for the “1,1,x” case, when EI_STAT is read, the correctable error bit is cleared and the registers are not unlocked or cleared. Software must reexecute the IPR read sequence. On the second read operation, error bits are in the “0,1,x” state, all the related IPRs are unlocked, and EI_STAT is cleared.
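
The effect of the “1,1,x” special case on software can be seen with a small model of the rules in Table 4-2. This is a simplified sketch for illustration, not PALcode: the state is a hypothetical tuple (correctable, uncorrectable, second hard error), and the function names are assumptions. It shows why the read sequence must be repeated when both the correctable and uncorrectable bits are set.

```python
def read_ei_stat(state):
    """Model one EI_STAT read.

    state is (correctable, uncorrectable, second_hard).
    Returns (next_state, unlocked) following Table 4-2.
    """
    correctable, uncorrectable, second_hard = state
    if correctable and uncorrectable:
        # Special case: only the correctable bit clears; registers stay locked.
        return (0, uncorrectable, second_hard), False
    # Every other row: the read clears and unlocks all registers.
    return (0, 0, 0), True

def reads_until_unlocked(state):
    """Count the EI_STAT reads needed before the registers are unlocked."""
    reads = 0
    unlocked = False
    while not unlocked:
        state, unlocked = read_ei_stat(state)
        reads += 1
    return reads

if __name__ == "__main__":
    for start in [(1, 0, 0), (0, 1, 0), (1, 1, 0), (1, 1, 1)]:
        print(start, reads_until_unlocked(start))
```

In this model the “1,1,x” rows need two reads and every other row needs one, which matches the requirement to reexecute the IPR read sequence.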

MC Error Information Register 0 (MC_ERR0 - Offset = 800)

The low-order MC bus (system bus) address bits are latched into this register when the system bus to PCI bus bridge detects an error event. If the event is a hard error, the register bits are locked. A write to clear symptom bits in the CAP Error Register unlocks this register. When the valid bit (MC_ERR_VALID) in the CAP Error Register is clear, the contents are undefined.
Figure (PKW0551-97): MC_ERR0 bit layout. <31:4> Failing Address ADDR<31:4>; <3:0> 0.
Table 4-3 MC Error Information Register 0
Name        Bits    Type  Initial State  Description
ADDR<31:4>  <31:4>  RO    0              Contains the address of the transaction on the system bus when an error is detected.
Reserved    <3:0>   RO    0

MC Error Information Register 1 (MC_ERR1 - Offset = 840)

The high-order MC bus (system bus) address bits and error symptoms are latched into this register when the system bus to PCI bus bridge detects an error. If the event is a hard error, the register bits are locked. A write to clear symptom bits in the CAP Error Register unlocks this register. When the valid bit (MC_ERR_VALID) in the CAP Error Register is clear, the contents are undefined.
Figure (PKW0551A-97): MC_ERR1 bit layout. <31> VALID bit; <30:21> reserved (0); <20> Dirty bit; <19:17> 111; <16:14> DEVICE_ID; <13:8> MC Command <5:0>; <7:0> Failing Address ADDR<39:32>.
Table 4-4 MC Error Information Register 1
Name         Bits     Type  Initial State  Description
VALID        <31>     RO    0              Logical OR of bits <30:23> in the CAP_ERR Register. Set if MC_ERR0 and MC_ERR1 contain a valid address.
Reserved     <30:21>  RO    0
Dirty        <20>     RO    0              Set if the system bus error was associated with a Read/Dirty transaction. When set, the device ID field <19:14> does not indicate the source of the data.
Reserved     <19:17>        1              All ones.
DEVICE_ID    <16:14>  RO    0              Slot number of bus master at the time of the error.
MC_CMD<5:0>  <13:8>   RO    0              Active command at the time the error was detected.
ADDR<39:32>  <7:0>    RO    0              Address bits <39:32> of the transaction on the system bus when an error is detected.
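
Because the failing system bus address is split across MC_ERR0 and MC_ERR1, the two values must be combined to recover ADDR<39:4>. The sketch below shows one way to do this from logged register values; the function name and the example values are hypothetical, and the field positions are taken from Tables 4-3 and 4-4.

```python
def decode_mc_err(mc_err0, mc_err1):
    """Combine MC_ERR0 and MC_ERR1 into named fields.

    The low four address bits are reserved in MC_ERR0, so the
    reconstructed address is aligned to 16 bytes.
    """
    return {
        "valid": bool((mc_err1 >> 31) & 1),    # bit <31>
        "dirty": bool((mc_err1 >> 20) & 1),    # bit <20>
        "device_id": (mc_err1 >> 14) & 0x7,    # bits <16:14>
        "mc_cmd": (mc_err1 >> 8) & 0x3F,       # bits <13:8>
        "address": ((mc_err1 & 0xFF) << 32) | (mc_err0 & 0xFFFFFFF0),
    }

# Hypothetical logged values, for illustration only.
MC_ERR0 = 0x12345670
MC_ERR1 = (1 << 31) | (0x7 << 17) | (2 << 14) | (0x18 << 8) | 0x80

if __name__ == "__main__":
    fields = decode_mc_err(MC_ERR0, MC_ERR1)
    print(hex(fields["address"]), fields)
```

The address is meaningful only when the valid bit is set; when it is clear, the register contents are undefined.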

CAP Error Register (CAP_ERR - Offset = 880)

CAP_ERR logs information about an error detected by the CAP or MDP ASIC. If the error is a hard error, the register is locked: all bits except the LOST_MC_ERR bit are locked. CAP_ERR remains locked until the register is written to clear each individual error bit.
Figure (PKW0551B-97): CAP_ERR bit layout. <31> MC_ERR_VALID; <30> RDSB; <29> RDSA; <28> CRDB; <27> CRDA; <26> NXM; <25> MC_ADR_PERR; <24> LOST_MC_ERR; <23> PIO_OVFL; <22:5> reserved; <4> PCI_ERR_VALID; <3> PTE_INV; <2> MAB; <1> SERR; <0> PERR.
Table 4-5 CAP Error Register
Name           Bits    Type  Initial State  Description
MC_ERR_VALID   <31>    RO    0              Logical OR of bits <30:23> in this register. When set, MC_ERR0 and MC_ERR1 are latched.
RDSB           <30>    RW1C  0              Uncorrectable ECC error detected by MDPB. Clear state in MDPB before clearing this bit.
RDSA           <29>    RW1C  0              Uncorrectable ECC error detected by MDPA. Clear state in MDPA before clearing this bit.
CRDB           <28>    RW1C  0              Correctable ECC error detected by MDPB. Clear state in MDPB_STAT before clearing this bit.
CRDA           <27>    RW1C  0              Correctable ECC error detected by MDPA. Clear state in MDPA_STAT before clearing this bit.
NXM            <26>    RW1C  0              System bus master transaction status NXM (Read with Address bit <39> set but transaction not pended or transaction target above the top of memory register.) CPU will also get a fill error on reads.
MC_ADR_PERR    <25>    RW1C  0              Set when a system bus command/address parity error is detected.
LOST_MC_ERR    <24>    RW1C  0              Set when an error is detected but not logged because the associated symptom fields and registers are locked with the state of an earlier error.
PIO_OVFL       <23>    RW1C  0              Set when a transaction that targets this system bus to PCI bus bridge is not serviced because the buffers are full. This is a symptom of setting the PEND_NUM field in CAP_CNTL to an incorrect value.
Reserved       <22:5>  RO    0
PCI_ERR_VALID  <4>     RO    0              Logical OR of bits <3:0> of this register. When set, the PCI error address register is locked.
PTE_INV        <3>     RW1C  0              Invalid page table entry on scatter/gather access.
MAB            <2>     RW1C  0              PCI master state machine detected PCI Target Abort (likely cause: NXM) (except Special Cycle). On reads fill error is also returned.
SERR           <1>     RW1C  0              PCI target state machine observed SERR#. CAP asserts SERR when it is master and detects target abort.
PERR           <0>     RW1C  0              PCI master state machine observed PERR#.
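
A logged CAP_ERR value can be decoded with the bit positions in Table 4-5. The following sketch is an illustration under stated assumptions, not service software: the function names and example value are hypothetical. It lists the symptom bits that are set, checks that the two read-only valid bits agree with the symptom bits they summarize, and builds the write-one-to-clear (RW1C) mask that would clear the logged symptoms.

```python
# RW1C symptom bits and their positions, from Table 4-5.
CAP_ERR_BITS = {
    "RDSB": 30, "RDSA": 29, "CRDB": 28, "CRDA": 27,
    "NXM": 26, "MC_ADR_PERR": 25, "LOST_MC_ERR": 24, "PIO_OVFL": 23,
    "PTE_INV": 3, "MAB": 2, "SERR": 1, "PERR": 0,
}

def decode_cap_err(value):
    """Return the set symptom bits and the two valid flags."""
    return {
        "errors": [n for n, bit in CAP_ERR_BITS.items() if (value >> bit) & 1],
        "mc_err_valid": bool((value >> 31) & 1),
        "pci_err_valid": bool((value >> 4) & 1),
    }

def valid_bits_consistent(value):
    """MC_ERR_VALID is the OR of <30:23>; PCI_ERR_VALID is the OR of <3:0>."""
    mc_ok = bool((value >> 31) & 1) == bool((value >> 23) & 0xFF)
    pci_ok = bool((value >> 4) & 1) == bool(value & 0xF)
    return mc_ok and pci_ok

def clear_mask(value):
    """Value to write back so that every set RW1C symptom bit is cleared."""
    mask = 0
    for bit in CAP_ERR_BITS.values():
        mask |= value & (1 << bit)
    return mask

# Hypothetical logged value: NXM on the system bus side, MAB on the PCI side.
EXAMPLE = (1 << 31) | (1 << 26) | (1 << 4) | (1 << 2)

if __name__ == "__main__":
    print(decode_cap_err(EXAMPLE), hex(clear_mask(EXAMPLE)))
```

Note that Table 4-5 requires the MDP state to be cleared before the RDSA, RDSB, CRDA, or CRDB bits are cleared; the mask above does not model that ordering.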

PCI Error Status Register 1 (PCI_ERR1 - Offset = 1040)

PCI_ERR1 is used by the system bus to PCI bus bridge to log bus address <31:0> pertaining to an error condition logged in CAP_ERR. This register always captures PCI address <31:0>, even for a PCI DAC cycle. When the PCI_ERR_VALID bit in CAP_ERR is clear, the contents are undefined.
Figure (PKW0551C-97): PCI_ERR1 bit layout. <31:0> Failing Address ADDR<31:0>.
Table 4-6 PCI Error Status Register 1
Name        Bits    Type  Initial State  Description
ADDR<31:0>  <31:0>  RO    0              Contains address bits <31:0> of the transaction on the PCI bus when an error is detected.

5 Removal and Replacement

This chapter describes removal and replacement procedures for field-replaceable units (FRUs).

System Safety

Observe the safety guidelines in this section to prevent personal injury.
________________________ CAUTION___________________________
Wear an antistatic wrist strap whenever you work on a system.
____________________________________________________________
________________________WARNING __________________________
When the system interlocks are disabled and the system is still powered on, voltages are low in the system, but current is high. Observe the following guidelines to prevent personal injury.
1. Remove any jewelry that may conduct electricity before working on the system.
2. If you need to access the system card cage, power down the system and wait 2 minutes to allow components in that area to cool.
____________________________________________________________

FRU List

Figure 5-1 shows the locations of FRUs, and Table 5-1 lists the part numbers of all field-replaceable units.
Figure 5-1 System FRU Locations (PKW0521-97). Callouts: SCSI Disks, OCP and Display, Floppy, CPUs, Memory, Power Supplies, I/O Options, CD-ROM.
Table 5-1 Field-Replaceable Unit Part Numbers
CPU Modules
B3107-AA 400 MHz CPU, 4 Mbyte cache
B3107-CA 533 MHz CPU, 4 Mbyte cache
Memory Modules
54-25084-DA 32 Mbyte DIMM (synchronous) 20-47405-D3
54-25092-DA 128 Mbyte DIMM (synchronous) 20-45619-D3
54-25149-01 Memory riser card
System Backplane, Display, and Support Hardware
54-25147-02 System motherboard
RX23L-AB Floppy
RRD46-AB or 30-48116-02 CD-ROM
54-23302-02 OCP assembly
70-31349-01 Speaker assembly
Fans
70-31351-01 Cooling fan 120x120
70-31350-01 Cooling fan 92x92
12-24701-34 CPU fan
Power System Components
30-43120-02 Power supply
SCSI Hardware
54-23365-01 SCSI backplane
30-48985-01 Ultra SCSI bus extender
Power Cords
BN35B-02 North America, Japan 12V, 75-inches long
BN35S-02 Australia, New Zealand, 2.5m long
BN35R-02 Central Europe, 2.5m long
BN35J-02 UK, Ireland, 2.5m long
BN35K-02 Switzerland, 2.5m long
BN35P-02 Denmark, 2.5m long
BN35M-02 Italy, 2.5m long
BN35L-02 Egypt, India, South Africa, 2.5m long
BN35N-02 Israel, 2.5m long