Digital Equipment AlphaServer 1000A Service Manual

AlphaServer 1000A Service Guide
Order Number: EK–ALPSV–SV.A01
Digital Equipment Corporation Maynard, Massachusetts
First Printing, March 1996
Possession, use, or copying of the software described in this publication is authorized only pursuant to a valid written license from Digital or an authorized sublicensor.
Copyright © Digital Equipment Corporation, 1996. All Rights Reserved.
The following are trademarks of Digital Equipment Corporation: AlphaServer, DEC, DECchip, DEC VET, Digital, OpenVMS, StorageWorks, VAX DOCUMENT, and the DIGITAL logo.
Digital UNIX Version 3.0 is an X/Open UNIX 93 branded product. Windows NT is a trademark of Microsoft Corp.
All other trademarks and registered trademarks are the property of their respective holders.
FCC NOTICE: The equipment described in this manual generates, uses, and may emit radio frequency energy. The equipment has been type tested and found to comply with the limits for a Class B computing device pursuant to Subpart J of Part 15 of FCC Rules, which are designed to provide reasonable protection against such radio frequency interference when operated in a commercial environment. Operation of this equipment in a residential area may cause interference, in which case the user at his own expense may be required to take measures to correct the interference.
S3016
This document was prepared using VAX DOCUMENT Version 2.1.
Contents
Preface
1 Troubleshooting Strategy
1.1 Troubleshooting the System
1.1.1 Problem Categories
1.2 Service Tools and Utilities
1.3 Information Services
2 Power-Up Diagnostics and Display
2.1 Interpreting Error Beep Codes
2.2 SROM Memory Power-Up Tests
2.3 Power-Up Screen
2.3.1 Console Event Log
2.4 Mass Storage Problems Indicated at Power-Up
2.5 Storage Device LEDs
2.6 EISA Bus Problems Indicated at Power-Up
2.6.1 Additional EISA Troubleshooting Tips
2.7 PCI Bus Problems Indicated at Power-Up
2.7.1 Additional PCI Troubleshooting Tips
2.8 Fail-Safe Loader
2.8.1 Fail-Safe Loader Functions
2.8.2 Activating the Fail-Safe Loader
2.9 Power-Up Sequence
2.9.1 AC Power-Up Sequence
2.9.2 DC Power-Up Sequence
2.10 Firmware Power-Up Diagnostics
2.10.1 Serial ROM Diagnostics
2.10.2 Console Firmware-Based Diagnostics
3 Running System Diagnostics
3.1 Running ROM-Based Diagnostics
3.2 Command Summary
3.3 Command Reference
3.3.1 test
3.3.2 cat el and more el
3.3.3 memory
3.3.4 netew
3.3.5 network
3.3.6 net -s
3.3.7 net -ic
3.3.8 kill and kill_diags
3.3.9 show_status
3.4 Acceptance Testing and Initialization
3.5 DEC VET
4 Error Log Analysis
4.1 Fault Detection and Reporting
4.1.1 Machine Check/Interrupts
4.2 Error Logging and Event Log Entry Format
4.3 Event Record Translation
4.3.1 OpenVMS Alpha Translation Using DECevent
4.3.2 Digital UNIX Translation Using DECevent
5 System Configuration and Setup
5.1 Verifying System Configuration
5.1.1 System Firmware
5.1.2 Switching Between Interfaces
5.1.3 Verifying Configuration: ARC Menu Options for Windows NT
5.1.3.1 Display Hardware Configuration
5.1.3.2 Set Default Variables
5.1.4 Verifying Configuration: SRM Console Commands for Digital UNIX and OpenVMS
5.1.4.1 show config
5.1.4.2 show device
5.1.4.3 show memory
5.1.4.4 Setting and Showing Environment Variables
5.2 System Bus Options
5.2.1 CPU Daughter Board
5.2.2 Memory Modules
5.3 Motherboard
5.4 EISA Bus Options
5.5 ISA Bus Options
5.5.1 Identifying ISA and EISA Options
5.6 EISA Configuration Utility
5.6.1 Before You Run the ECU
5.6.2 How to Start the ECU
5.6.3 Configuring EISA Options
5.6.4 Configuring ISA Options
5.7 PCI Bus Options
5.7.1 PCI-to-PCI Bridge
5.8 SCSI Buses
5.8.1 Internal StorageWorks Shelf
5.8.2 External SCSI Expansion
5.8.3 SCSI Bus Configurations
5.9 Power Supply Configurations
5.10 Console Port Configurations
5.10.1 set console
5.10.2 set tt_allow_login
5.10.3 set tga_sync_green
5.10.4 Setting Up a Serial Terminal to Run ECU
5.10.5 Using a VGA Controller Other than the Standard On-Board VGA
6 AlphaServer 1000A FRU Removal and Replacement
6.1 AlphaServer 1000A FRUs
6.2 Removal and Replacement
6.2.1 Cables
6.2.2 Power Supply DC Cable Assembly
6.2.3 CPU Daughter Board
6.2.4 Fans
6.2.5 StorageWorks Drive
6.2.6 Internal StorageWorks Backplane
6.2.7 Memory Modules
6.2.8 Interlock Switch
6.2.9 Motherboard
6.2.10 NVRAM Chip (E14) and NVRAM TOY Clock Chip (E78)
6.2.11 OCP Module
6.2.12 Power Supply
6.2.13 Speaker
6.2.14 Removable Media
A Default Jumper Settings
A.1 Motherboard Jumpers
A.2 CPU Daughter Board (J3 and J4) Supported Settings
A.3 CPU Daughter Board (J1 Jumper)
Glossary
Index
Examples
5–1 Sample Hardware Configuration Display
Figures
2–1 Jumper J1 on the CPU Daughter Board
2–2 AlphaServer 1000A Memory Layout
2–3 StorageWorks Disk Drive LEDs (SCSI)
2–4 Floppy Drive Activity LED
2–5 CD–ROM Drive Activity LED
2–6 Jumper J1 on the CPU Daughter Board
5–1 System Architecture: AlphaServer 1000A
5–2 Device Name Convention
5–3 Card Cages and Bus Locations
5–4 Memory Layout on the Motherboard
5–5 EISA and ISA Boards
5–6 PCI Board
5–7 Single Controller Configuration
5–8 Dual Controller Configuration with Split StorageWorks Backplane
5–9 Triple Controller Configuration with Split StorageWorks Backplane
5–10 Power Supply Configurations
5–11 Power Supply Cable Connections
6–1 FRUs, Front Right
6–2 FRUs, Rear Left
6–3 Opening Front Door
6–4 Removing Top Cover and Side Panels
6–5 Floppy Drive Cable (34-Pin)
6–6 OCP Module Cable (10-Pin)
6–7 Power Cord
6–8 Power Supply Current Sharing Cable (3-Pin)
6–9 Removing Cable Channel Guide
6–10 Power Supply DC Cable Assembly
6–11 Power Supply Storage Harness (12-Pin)
6–12 Interlock/Server Management Cable (2-Pin)
6–13 Internal StorageWorks Jumper Cable (68-Pin)
6–14 Wide-SCSI (Controller to StorageWorks Shelf) Cable (68-Pin)
6–15 Wide-SCSI (Controller to StorageWorks Shelf) Cable (68-Pin)
6–16 Wide-SCSI (J10 to Bulkhead Connector) Cable (68-Pin)
6–17 SCSI (Embedded 8-bit) Removable-Media Cable (50-Pin)
6–18 Removing CPU Daughter Board
6–19 Removing Fans
6–20 Removing StorageWorks Drive
6–21 Removing Power Supply
6–22 Removing Internal StorageWorks Backplane
6–23 Memory Layout on Motherboard
6–24 Removing SIMMs from Motherboard
6–25 Installing SIMMs on Motherboard
6–26 Removing the Interlock Safety Switch
6–27 Removing EISA and PCI Options
6–28 Removing CPU Daughter Board
6–29 Removing Motherboard
6–30 Motherboard Layout
6–31 Removing Front Door
6–32 Removing Front Panel
6–33 Removing the OCP Module
6–34 Removing Power Supply
6–35 Removing Speaker
6–36 Removing a CD–ROM Drive
6–37 Removing a Tape Drive
6–38 Removing a Floppy Drive
A–1 Motherboard Jumpers (Default Settings)
A–2 AlphaServer 1000A 4/266 CPU Daughter Board (Jumpers J3 and J4)
A–3 AlphaServer 1000A 4/233 CPU Daughter Board (Jumpers J3 and J4)
A–4 CPU Daughter Board (J1 Jumper)
Tables
1–1 Diagnostic Flow for Power Problems
1–2 Diagnostic Flow for Problems Getting to Console Mode
1–3 Diagnostic Flow for Problems Reported by the Console Program
1–4 Diagnostic Flow for Boot Problems
1–5 Diagnostic Flow for Errors Reported by the Operating System
2–1 Interpreting Error Beep Codes
2–2 SROM Memory Tests, CPU Jumper J1
2–3 Console Power-Up Countdown Description and Field Replaceable Units (FRUs)
2–4 Mass Storage Problems
2–5 Troubleshooting RAID Problems
2–6 EISA Troubleshooting
2–7 PCI Troubleshooting
3–1 Summary of Diagnostic and Related Commands
4–1 AlphaServer 1000 Fault Detection and Correction
5–1 Listing the ARC Firmware Device Names
5–2 ARC Firmware Device Names
5–3 ARC Firmware Environment Variables
5–4 Environment Variables Set During System Configuration
5–5 Operating System Memory Requirements
5–6 Summary of Procedure for Configuring EISA Bus (EISA Options Only)
5–7 Summary of Procedure for Configuring EISA Bus with ISA Options
5–8 SCSI Storage Configurations
6–1 AlphaServer 1000A FRUs
6–2 Power Cord Order Numbers
Preface
This guide describes the procedures and tests used to service AlphaServer 1000A systems. AlphaServer 1000A systems use a deskside “wide-tower” enclosure.
Intended Audience
This guide is intended for use by Digital Equipment Corporation service personnel and qualified self-maintenance customers.
Conventions
The following conventions are used in this guide:
Convention      Meaning
Return          A key name enclosed in a box indicates that you press that key.
Ctrl/x          Ctrl/x indicates that you hold down the Ctrl key while you press another key, indicated here by x. In examples, this key combination is enclosed in a box, for example, Ctrl/C.
Warning         Warnings contain information to prevent personal injury.
Caution         Cautions provide information to prevent damage to equipment or software.
Note            A note calls the reader’s attention to any information that may be of special importance.
boot            Console and operating system commands are shown in this special typeface.
[ ]             In command format descriptions, brackets indicate optional elements.
show config     Console command abbreviations must be entered exactly as shown. Commands shown in lowercase can be entered in either uppercase or lowercase.
italic type     In console command sections, italic type indicates a variable.
< >             In console mode online help, angle brackets enclose a placeholder for which you must specify a value.
{ }             In command descriptions, braces containing items separated by commas imply mutually exclusive items.
Related Documentation
AlphaServer 1000A Owner’s Guide, EK-ALPSV-OG
DEC Verifier and Exerciser Tool User’s Guide, AA-PTTMD-TE
Guide to Kernel Debugging, AA-PS2TD-TE
OpenVMS Alpha System Dump Analyzer Utility Manual, AA-PV6UB-TE
DECevent Translation and Reporting Utility for OpenVMS Alpha, User and Reference Guide, AA-Q73KC-TE
DECevent Translation and Reporting Utility for Digital UNIX, User and Reference Guide, AA-QAA3A-TE
DECevent Analysis and Notification Utility for OpenVMS Alpha, User and Reference Guide, AA-Q73LC-TE
DECevent Analysis and Notification Utility for Digital UNIX, User and Reference Guide, AA-QAA4A-TE
StorageWorks RAID Array 200 Subsystems Controller Installation and Standalone Configuration Utility User’s Guide, EK-SWRA2-IG
1
Troubleshooting Strategy
This chapter describes the troubleshooting strategy for AlphaServer 1000A systems.
Section 1.1 provides questions to consider before you begin troubleshooting an AlphaServer 1000A system.
Tables 1–1 through 1–5 provide a diagnostic flow for each category of system problem.
Section 1.2 lists the product tools and utilities.
Section 1.3 lists available information services.
1.1 Troubleshooting the System
Before troubleshooting any system problem, check the site maintenance log for the system’s service history. Be sure to ask the system manager the following questions:
Has the system been used before and did it work correctly?
Have changes to hardware or updates to firmware or software been made to the system recently? If so, are the revision numbers compatible for the system? (Refer to the hardware and operating system release notes.)
What is the state of the system—is the operating system running?
If the operating system is down and you are not able to bring it up, use the console environment diagnostic tools, such as the power-up display and ROM-based diagnostics (RBDs).
If the operating system is running, use the operating system environment diagnostic tools, such as the DECevent event management utility (to translate and interpret error logs), crash dumps, and exercisers (DEC VET).
1.1.1 Problem Categories
System problems can be classified into the following five categories. Using these categories, you can quickly determine a starting point for diagnosis and eliminate the unlikely sources of the problem.
1. Power problems (Table 1–1)
2. No access to console mode (Table 1–2)
3. Console-reported failures (Table 1–3)
4. Boot failures (Table 1–4)
5. Operating system-reported failures (Table 1–5)
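Read in order, the five categories form a simple decision sequence in which each category applies only after the ones before it have been ruled out. The following Python sketch assumes exactly that ordering and shows how an observation of the system maps to the table at which to start. The function and its yes/no inputs are hypothetical and are not part of any Digital tool.

```python
def starting_table(powers_on: bool, reaches_console: bool,
                   console_reports_failure: bool, boots: bool) -> str:
    """Pick the diagnostic-flow table at which to begin (hypothetical helper).

    Each check assumes the categories before it have been ruled out, so the
    tests run in the same order as the five problem categories.
    """
    if not powers_on:
        return "Table 1-1"  # power problems
    if not reaches_console:
        return "Table 1-2"  # no access to console mode
    if console_reports_failure:
        return "Table 1-3"  # console-reported failures
    if not boots:
        return "Table 1-4"  # boot failures
    return "Table 1-5"      # operating system-reported failures


# Example: the system powers on and reaches the console, but will not boot.
print(starting_table(powers_on=True, reaches_console=True,
                     console_reports_failure=False, boots=False))  # prints Table 1-4
```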
Table 1–1 Diagnostic Flow for Power Problems
Symptom: System does not power on.
Action:
  Check the power source and power cord.
  Check that the system’s top cover is properly secured. A safety interlock switch shuts off power to the system if the top cover is removed.
  If there are two power supplies, make sure both power supplies are plugged in.
  Check the On/Off switch setting on the operator control panel.
  Check that the ambient room temperature is within environmental specifications (10–40°C, 50–104°F).
  Check that internal power supply cables are plugged in at both the power supply and system motherboard (Section 5.9).
Symptom: Power supply shuts down after a few seconds (fan failure).
Action:
  Using a flashlight, look through the front (to the left of the internal StorageWorks shelf) to determine whether the fans are spinning at power-up. A failure of either fan causes the system to shut down after a few seconds.
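The two temperature ranges in Table 1–1 are the same limits expressed in different units, related by F = C × 9/5 + 32 (10°C gives 50°F, and 40°C gives 104°F). The Python sketch below is a minimal illustration with made-up function names. It checks a measured room temperature against that range in either unit and treats both limits as inclusive, which is an assumption.

```python
# Ambient operating range from Table 1-1: 10-40 degrees C (50-104 degrees F).
MIN_C, MAX_C = 10.0, 40.0


def celsius_to_fahrenheit(c: float) -> float:
    """F = C * 9/5 + 32."""
    return c * 9.0 / 5.0 + 32.0


def fahrenheit_to_celsius(f: float) -> float:
    """C = (F - 32) * 5/9."""
    return (f - 32.0) * 5.0 / 9.0


def ambient_in_spec(temp: float, unit: str = "C") -> bool:
    """Return True if a measured room temperature lies within 10-40 C inclusive."""
    unit = unit.upper()
    if unit == "F":
        temp = fahrenheit_to_celsius(temp)
    elif unit != "C":
        raise ValueError("unit must be 'C' or 'F'")
    return MIN_C <= temp <= MAX_C


# The Fahrenheit limits in the table follow from the Celsius ones.
print(celsius_to_fahrenheit(MIN_C), celsius_to_fahrenheit(MAX_C))  # 50.0 104.0
print(ambient_in_spec(22), ambient_in_spec(108, "F"))               # True False
```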
Table 1–2 Diagnostic Flow for Problems Getting to Console Mode
Symptom: Power-up screen is not displayed.
Action:
  Interpret the error beep codes at power-up (Section 2.1) for a failure detected during self-tests.
  Check that the keyboard and monitor are properly connected and turned on.
  If the power-up screen is not displayed, yet the system enters console mode when you press Return, check that the console environment variable is set correctly. If you are using a VGA monitor as the console terminal, the console variable should be set to “graphics.” If you are using a serial console terminal, the console variable should be set to “serial.”
  If a VGA controller other than the standard on-board VGA controller is being used, refer to Section 5.10 for more information.
  If the console environment variable is set to serial, the power-up screen is routed to the COM1 serial communication port (Section 5.10) and cannot be viewed from the VGA monitor.
  Try connecting a console terminal to the COM1 serial communication port (Section 5.10). If necessary, use an MMJ-to-9-pin adapter (H8571-J). Check the baud rate setting for the console terminal and the system. The system baud rate setting is 9600. When using the COM1 port, you must set the console environment variable to “serial.”
  In certain situations, power up using the fail-safe loader (Section 2.8) to load new console firmware from a diskette.
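The console rule in Table 1–2 reduces to a two-entry lookup: a VGA monitor needs the console variable set to graphics, a serial terminal needs it set to serial, and the setting decides where the power-up screen appears. The Python sketch below encodes that lookup and prints the set console command an operator would type (Section 5.10.1 describes the command). The terminal labels and helper names are hypothetical, and the code only builds a string; it does not communicate with the system.

```python
# Console environment variable value for each kind of console terminal,
# per Table 1-2. The dictionary keys are illustrative labels only.
CONSOLE_VALUE = {
    "vga": "graphics",   # VGA monitor used as the console terminal
    "serial": "serial",  # serial terminal on COM1, 9600 baud
}


def set_console_command(terminal: str) -> str:
    """Build the 'set console' command line for the given terminal type."""
    try:
        value = CONSOLE_VALUE[terminal.lower()]
    except KeyError:
        raise ValueError(f"unknown terminal type: {terminal!r}") from None
    return f"set console {value}"


def power_up_screen_location(console_value: str) -> str:
    """Report where the power-up screen is displayed for a console setting."""
    if console_value == "serial":
        return "COM1 serial port"  # not visible on the VGA monitor
    if console_value == "graphics":
        return "VGA monitor"
    raise ValueError(f"unexpected console value: {console_value!r}")


print(set_console_command("serial"), "->", power_up_screen_location("serial"))
```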
Table 1–3 Diagnostic Flow for Problems Reported by the Console Program
Symptom: Power-up tests do not complete.
Action:
  Interpret the error beep codes at power-up (Section 2.1) and check the power-up screen (Section 2.3) for a failure detected during self-tests.
Symptom: Console program reports an error:
    Error beep codes report an error at power-up.
    Power-up screen includes error messages.
Action:
  Use the error beep codes (Section 2.1) and/or the console terminal (Section 2.3) to determine the error.
  Examine the console event log (enter the more el command; Section 2.3.1) or the power-up screen (Section 2.3) to check for embedded error messages recorded during power-up.
  If the power-up screen or console event log indicates problems with mass storage devices, or if storage devices are missing from the show config display, use the troubleshooting tables (Section 2.4) to determine the problem.
  Note
  The external SCSI terminator must be installed on the SCSI port at the rear of the enclosure. Without the terminator, some SCSI drives will not be available, and these drives will be missing from the show config display.
  If the power-up screen or console event log indicates problems with EISA devices, or if EISA devices are missing from the show config display, use the troubleshooting table (Section 2.6) to determine the problem.
  If the power-up screen or console event log indicates problems with PCI devices, or if PCI devices are missing from the show config display, use the troubleshooting table (Section 2.7) to determine the problem.
  Run the ROM-based diagnostic (RBD) tests (Section 3.1) to verify the problem.
Table 1–4 Diagnostic Flow for Boot Problems
Symptom: System cannot find boot device.
Action:
  Check the system configuration for the correct device parameters (node ID, device name, and so on).
    For Digital UNIX and OpenVMS, use the show config and show device commands (Section 5.1).
    For Windows NT, use the Display Hardware Configuration display and the Set Default Environment Variables display (Section 5.1).
  Check the system configuration for the correct environment variable settings.
    For Digital UNIX and OpenVMS, examine the auto_action, bootdef_dev, boot_osflags, and os_type environment variables. Also, make sure that the bus_probe_algorithm environment variable is set to “new” (Section 5.1.4.4).
    For problems booting over a network, check the ew*0_protocols or er*0_protocols environment variable settings. Set them to bootp for systems that boot from a Digital UNIX server, and to mop for systems that boot from an OpenVMS server (Section 5.1.4.4).
    For Windows NT, examine the FWSEARCHPATH, AUTOLOAD, and COUNTDOWN environment variables (Section 5.1.3.2).
Symptom: Device does not boot.
Action:
  For problems booting over a network, check the ew*0_protocols or er*0_protocols environment variable settings. Set them to bootp for systems that boot from a Digital UNIX server, and to mop for systems that boot from an OpenVMS server (Section 5.1.4.4).
  For systems running Digital UNIX or OpenVMS, make sure that the bus_probe_algorithm environment variable is set to “new” (Section 5.1.4.4).
  Run the device tests (Section 3.1) to check that the boot device is operating.
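Of the SRM environment variables listed in Table 1–4, only two have a stated required value: bus_probe_algorithm must be “new”, and the ew*0_protocols or er*0_protocols setting must match the boot server (bootp for a Digital UNIX server, mop for an OpenVMS server). The Python sketch below checks a dictionary of variable values against those two rules only. It assumes the values were transcribed by hand from the console’s show output; the function name, dictionary layout, and server labels are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional

# Network-boot protocol expected for each boot-server operating system (Table 1-4).
PROTOCOL_FOR_SERVER = {"digital unix": "bootp", "openvms": "mop"}


def check_boot_settings(env: Dict[str, str],
                        boot_server_os: Optional[str] = None) -> List[str]:
    """Return warnings for SRM settings that Table 1-4 says can block booting.

    env maps environment variable names to their values, for example as
    transcribed from the console's show output (parsing is not shown).
    """
    problems = []
    probe = env.get("bus_probe_algorithm")
    if probe != "new":
        problems.append(f"bus_probe_algorithm is {probe!r}; expected 'new'")
    if boot_server_os is not None:
        wanted = PROTOCOL_FOR_SERVER[boot_server_os.lower()]
        for name in sorted(env):
            # Matches ewa0_protocols, ewb0_protocols, era0_protocols, and so on.
            if fnmatchcase(name, "ew*0_protocols") or fnmatchcase(name, "er*0_protocols"):
                if env[name].lower() != wanted:
                    problems.append(f"{name} is {env[name]!r}; expected {wanted!r}")
    return problems


settings = {"bus_probe_algorithm": "new", "ewa0_protocols": "bootp"}
print(check_boot_settings(settings, boot_server_os="OpenVMS"))
```

Checking only stated requirements keeps the sketch honest: auto_action, bootdef_dev, boot_osflags, and os_type depend on the site configuration, so the table asks you to examine them rather than naming a correct value.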
Table 1–5 Diagnostic Flow for Errors Reported by the Operating System
Symptom: System is hung or has crashed.
Action:
  Examine the crash dump file.
  Refer to the OpenVMS Alpha System Dump Analyzer Utility Manual (AA-PV6UB-TE) for information on how to interpret OpenVMS crash dump files.
  Refer to the Guide to Kernel Debugging (AA-PS2TD-TE) for information on using the Digital UNIX Krash Utility.
Symptom: Errors have been logged and the operating system is up.
Action:
  Examine the operating system error log files to isolate the problem (Chapter 4).
  If the problem occurs intermittently, run an operating system exerciser, such as DEC VET, to stress the system.
  Refer to the DEC Verifier and Exerciser Tool User’s Guide (AA-PTTMD-TE) for instructions on running DEC VET.
1.2 Service Tools and Utilities
This section lists the array of service tools and utilities available for acceptance testing, diagnosis, and serviceability and provides recommendations for their use.
Error Handling/Logging Tools
Digital UNIX, OpenVMS, and Microsoft Windows NT operating systems provide recovery from errors, fault handling, and event logging. The DECevent Translation and Reporting Utility provides bit-to-text translation of Digital UNIX and OpenVMS event logs for interpretation.
RECOMMENDED USE: Analysis of error logs is the primary method of diagnosis and fault isolation. If the system is up, or you are able to bring it up, look at this information first.
ROM-Based Diagnostics (RBDs)
Many ROM-based diagnostics and exercisers are embedded in AlphaServer 1000A systems. ROM-based diagnostics execute automatically at power-up and can be invoked in console mode using console commands.
RECOMMENDED USE: ROM-based diagnostics are the primary means of testing the console environment and diagnosing the CPU, memory, Ethernet, I/O buses, and SCSI and DSSI subsystems. Use ROM-based diagnostics in the acceptance test procedures when you install a system, add a memory module, or replace the following components: CPU module, memory module, motherboard, I/O bus device, or storage device. Refer to Chapter 3 for information on running ROM-based diagnostics.
Loopback Tests
Internal and external loopback tests are used to isolate a failure by testing segments of a particular control or data path. The loopback tests are a subset of the ROM-based diagnostics.
RECOMMENDED USE: Use loopback tests to isolate problems with the COM2 serial port, the parallel port, and Ethernet controllers. Refer to Chapter 3 for instructions on performing loopback tests.
Firmware Console Commands
Console commands are used to set and examine environment variables and device parameters, as well as to invoke ROM-based diagnostics and exercisers. For example, the show memory, show configuration, and show device commands are used to examine the configuration; the set bootdef_dev, set auto_action, and set boot_osflags commands are used to set environment variables; and the cdp command is used to configure DSSI parameters.
RECOMMENDED USE: Use console commands to set and examine environment variables and device parameters and to run RBDs. Refer to Section 5.1 for information on configuration-related firmware commands and Chapter 3 for information on running RBDs.
Operating System Exercisers (DEC VET)
The Digital Verifier and Exerciser Tool (DEC VET) is supported by the Digital UNIX, OpenVMS, and Windows NT operating systems. DEC VET performs exerciser-oriented maintenance testing of both hardware and operating system.
RECOMMENDED USE: Use DEC VET as part of acceptance testing to ensure that the CPU, memory, disk, tape, file system, and network are interacting properly. Also use DEC VET to stress test the user’s environment and configuration by simulating system operation under heavy loads to diagnose intermittent system failures.
Crash Dumps
For fatal errors, such as fatal bugchecks, Digital UNIX and OpenVMS operating systems will save the contents of memory to a crash dump file.
RECOMMENDED USE: Crash dump files can be used to determine why the system crashed. To save a crash dump file for analysis, you need to know the proper system settings. Refer to the OpenVMS Alpha System Dump Analyzer Utility Manual (AA-PV6UB-TE) for OpenVMS, or to the Guide to Kernel Debugging (AA-PS2TD-TE) for Digital UNIX.
1.3 Information Services
Several information resources are available, including online information for servicers and customers, computer-based training, and maintenance documentation database services. A brief description of some of these resources follows.
Fast Track Service Help File
The information contained in this guide, including the field-replaceable unit (FRU) procedures and illustrations, is available in online format. You can download the hypertext file (A200A-S.HLP) or a self-extracting .HLP file from TIMA, or order the diskette (AK-QQRMA-CA) or the AlphaServer 1000A Maintenance Kit (QZ-OOUAB-GC). The maintenance kit includes hardcopy, diskette, and illustrated parts breakdown.
Alpha Firmware Updates
Under certain circumstances, such as a CPU upgrade or replacement of the system backplane, you need to update your system firmware. An Alpha Firmware CD–ROM is shipped on an "as released" basis with the Digital UNIX, OpenVMS, and Windows NT operating systems. The Alpha firmware files can also be downloaded from the Internet as follows:
ftp://ftp.digital.com/pub/Digital/Alpha/firmware/
http://www.service.digital.com/alpha/server/firmware/
New versions of firmware released between shipments of the Alpha Firmware CD–ROM are available in an interim directory:
ftp://ftp.digital.com/pub/Digital/Alpha/firmware/interim/
ECU Revisions
The EISA Configuration Utility (ECU) is used for configuring EISA options on AlphaServer systems. Systems are shipped with an ECU kit, which includes the ECU license. Customers who already have the ECU and license, but need the latest revision of the ECU, can order a separate kit. Call 1-800-DIGITAL to order.
If the customer plans to migrate from Digital UNIX or OpenVMS to Windows NT, you must rerun the appropriate ECU. Failure to run the operating-system-specific ECU will result in system failure.
OpenVMS Patches
Software patches for the OpenVMS operating system are available from the World Wide Web as follows:
http://www.service.digital.com/html/patch_service.html
Choose the "Contract Access" option if you have a valid software contract with Digital or you wish to become a software contract customer. Choose the "Public Access" option if you do not have a software service contract.
Late-Breaking Technical Information
You can download up-to-date files and late-breaking technical information from the Internet for managing AlphaServer 1000A systems.
FTP address:
ftp.digital.com
cd /pub/DEC/Alpha/systems/as1000/docs
World Wide Web address:
http://www.service.digital.com/alpha/server/1000.html
The information includes firmware updates, the latest configuration utilities, software patches, lists of supported options, Wide SCSI information and more.
Supported Options
Refer to the AlphaServer 1000A Supported Options List for a list of options supported under Digital UNIX, OpenVMS, and Windows NT. The options list is available from the Internet as follows:
FTP address:
ftp://ftp.digital.com/pub/Digital/Alpha/systems/
World Wide Web address:
http://www.service.digital.com/alpha/server/
You can obtain information about hardware configurations for the AlphaServer 1000A from the Digital Systems and Options Catalog. The catalog is regularly published to assist in ordering and configuring systems and hardware options. Each printing of the catalog presents all of the products that are announced, actively marketed, and available for ordering.
You can download printable PostScript files of any section of the catalog from the Internet as follows (be sure to check the Readme file):
ftp://ftp.digital.com/pub/Digital/info/SOC/
Training
The following Computer-Based Training (CBT) and lecture lab courses are available from the Digital training center:
Alpha Concepts
DSSI Concepts: EY-9823E
ISA and EISA Bus Concepts: EY-I113E-P0
RAID Concepts: EY-N935E
SCSI Concepts and Troubleshooting: EY-P841E, EY-N838E
Digital Assisted Services
Digital Assisted Services (DAS) offers products, services, and programs to customers who participate in the maintenance of Digital computer equipment. Components of Digital assisted services include:
Spare parts and kits
Diagnostics and service information/documentation
Tools and test equipment
Parts repair services, including Field Change Orders
2 Power-Up Diagnostics and Display
This chapter provides information on how to interpret error beep codes and the power-up display on the console screen. In addition, a description of the power-up and firmware power-up diagnostics is provided as a resource to aid in troubleshooting.
Section 2.1 describes how to interpret error beep codes at power-up.
Section 2.2 describes SROM memory tests that can be run at power-up to isolate failing SIMM memory.
Section 2.3 describes how to interpret the power-up screen display.
Section 2.4 describes how to troubleshoot mass storage problems indicated at power-up or storage devices missing from the show config display.
Section 2.5 shows the location of storage device LEDs.
Section 2.6 describes how to troubleshoot EISA bus problems indicated at power-up or EISA devices missing from the show config display.
Section 2.7 describes how to troubleshoot PCI bus problems indicated at power-up or PCI devices missing from the show config display.
Section 2.8 describes the use of the Fail-Safe Loader.
Section 2.9 describes the power-up sequence.
Section 2.10 describes power-on self-tests.
2.1 Interpreting Error Beep Codes
If errors are detected at power-up, audible beep codes are emitted from the system. For example, if the SROM code could not find any good memory, you would hear a 1-3-3 beep code (one beep, a pause, a burst of three beeps, a pause, and another burst of three beeps).
The beep codes are the primary diagnostic tool for troubleshooting problems when console mode cannot be accessed. Refer to Table 2–1 for information on interpreting error beep codes.
Table 2–1 Interpreting Error Beep Codes
Beep Code Problem Corrective Action
1-1-2 ROM data path error detected while loading ARC/SRM console code.
  1. Use the Fail-Safe Loader to load new ARC/SRM console code (Section 2.8).
  2. If the problem persists after you successfully load new console firmware, replace the motherboard (Chapter 6).
1-1-4 The SROM code is unable to load the console code: Flash ROM header area or checksum error detected.
  1. Use the Fail-Safe Loader to load new ARC/SRM console code (Section 2.8).
  2. If the problem persists after you successfully load new console firmware, replace the motherboard (Chapter 6).
1-2-1 TOY NVRAM failure.
  Replace the TOY NVRAM chip (E78) on the system motherboard (Chapter 6).
1-3-3 No usable memory detected.
  1. Verify that the memory modules are properly seated and try powering up again.
  2. Swap bank 0 memory with known good memory and run the SROM memory tests at power-up (Section 2.2).
  3. If populating bank 0 with known good memory does not solve the problem, replace the CPU daughter board (Chapter 6).
  4. If replacing the CPU daughter board does not solve the problem, replace the motherboard (Chapter 6).
3-1-2 J1 jumper on the CPU daughter board set incorrectly, or failure of the native SCSI controller (Qlogic 1020A).
  1. Check that the J1 jumper on the CPU daughter board is set at bank 1 for AlphaServer 1000A systems, as opposed to bank 0, which is reserved for AlphaServer 1000 systems (Figure 2–1).
  2. If the J1 jumper setting is not the problem, replace the motherboard (Chapter 6).
3-3-1 Generic system failure. Possible problem sources include the TOY NVRAM chip (Dallas DS1287A) or the PCI-to-EISA bridge chipset (Intel 82375EB).
  1. Replace the TOY NVRAM chip (E78) on the system motherboard (Chapter 6).
  2. If replacing the TOY NVRAM chip does not solve the problem, replace the motherboard (Chapter 6).
3-3-2 J1 jumper on the CPU daughter board set incorrectly, or failure of the PCI-to-PCI bridge (DECchip 21050).
  1. Check that the J1 jumper on the CPU daughter board is set at bank 1 for AlphaServer 1000A systems, as opposed to bank 0, which is reserved for AlphaServer 1000 systems (Figure 2–1).
  2. If the J1 jumper setting is not the problem, replace the motherboard (Chapter 6).
3-3-3 Failure of the native SCSI controller (Qlogic 1020A) on the system motherboard.
  Replace the motherboard (Chapter 6).
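The lookup in Table 2–1 can be sketched as a small Python function. This is only an illustration, assuming you write down what you hear as the number of beeps in each of the three bursts (for example, 1-3-3 becomes [1, 3, 3]). The names BEEP_CODES and decode_beeps are hypothetical, and the problem strings are abbreviated from the table.

```python
from __future__ import annotations

# Hypothetical lookup table; problem summaries abbreviated from Table 2-1.
BEEP_CODES = {
    "1-1-2": "ROM data path error while loading ARC/SRM console code",
    "1-1-4": "Flash ROM header area or checksum error; SROM cannot load console code",
    "1-2-1": "TOY NVRAM failure",
    "1-3-3": "No usable memory detected",
    "3-1-2": "J1 jumper set incorrectly or native SCSI controller failure",
    "3-3-1": "Generic system failure (TOY NVRAM chip or PCI-to-EISA bridge)",
    "3-3-2": "J1 jumper set incorrectly or PCI-to-PCI bridge failure",
    "3-3-3": "Native SCSI controller failure",
}


def decode_beeps(bursts: list[int]) -> tuple[str, str]:
    """Convert beep counts per burst, e.g. [1, 3, 3], into (code, problem)."""
    # Every code in Table 2-1 has three bursts of one or more beeps.
    if len(bursts) != 3 or any(count < 1 for count in bursts):
        raise ValueError("expected three bursts of at least one beep each")
    code = "-".join(str(count) for count in bursts)
    return code, BEEP_CODES.get(code, "not listed in Table 2-1")
```

For example, decode_beeps([1, 3, 3]) returns ("1-3-3", "No usable memory detected"), which points you to the memory steps in the table.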
2.2 SROM Memory Power-Up Tests
To test SIMM memory and report the position of a failing SIMM, select an SROM power-up test with jumper J1 (Figure 2–1) on the CPU daughter board. The progress and results of these tests are reported on the LCD display on the operator control panel (OCP).
To thoroughly test memory and data paths, complete the SROM tests in the order presented in Table 2–2. If a SIMM is reported bad, replace the SIMM (Chapter 6) and resume testing at bank 4 (Memory Test).
Table 2–2 SROM Memory Tests, CPU Jumper J1
Bank # Test Description Test Results
3 Cache Test: Tests backup cache.
  Test status displays on OCP:
  ....done.
  If the test takes longer than a few seconds to complete, there is a problem with the backup cache; replace the CPU daughter board (Chapter 6).
5 Memory Test: Tests memory with backup and data cache disabled.
  Test status displays on OCP:
  12345.done.
  If an error is detected, the bank number and failing SIMM position are displayed. The following OCP message indicates a failing SIMM at bank 0, SIMM position 2:
  FAIL B:0 S:2
  Test duration: Approximately 10 seconds per 8 megabytes of memory.
  Figure 2–2 shows the bank and SIMM layout for AlphaServer 1000A systems. After determining the bad SIMM, refer to Chapter 6 for instructions on replacing FRUs.
  Note: The memory tests do not test the ECC SIMMs. If the operating system logs five or more single-bit correctable errors, replace the suspected ECC SIMMs with good SIMMs and repeat the memory test.
  ECC SIMMs cannot be used in the standard memory banks (banks 0–3). ECC SIMMs are specialized for use only in ECC banks.
6 Memory Test, Cache Enabled: Tests memory with backup and data cache enabled.
  Test status displays on OCP:
  12345.done.
  If an error is detected, the bank number and failing SIMM position are displayed. The following OCP message indicates a failing SIMM at bank 0, SIMM position 2:
  FAIL B:0 S:2
  Test duration: Approximately 2 seconds per 8 megabytes of memory.
  Figure 2–2 shows the bank and SIMM layout for AlphaServer 1000A systems. After determining the bad SIMM, refer to Chapter 6 for instructions on replacing FRUs.
  Note: The memory tests do not test the ECC SIMMs. If the operating system logs five or more single-bit correctable errors, replace the suspected ECC SIMMs with good SIMMs and repeat the memory test.
  ECC SIMMs cannot be used in the standard memory banks (banks 0–3). ECC SIMMs are specialized for use only in ECC banks.
4 Backup Cache Test: Tests backup cache alternately with data cache enabled, then disabled.
  Test status displays on OCP:
  d 12345.done. D 12345.done. D 12345.done. d 12345.done.
  If an error is detected, the bank number and failing SIMM position are displayed. The following OCP message indicates a failing SIMM at bank 0, SIMM position 2:
  FAIL B:0 S:2
  Test duration: Approximately 2 seconds per 8 megabytes of memory.
  Figure 2–2 shows the bank and SIMM layout for AlphaServer 1000A systems. After determining the bad SIMM, refer to Chapter 6 for instructions on replacing FRUs.
  Note: The memory tests do not test the ECC SIMMs. If the operating system logs five or more single-bit correctable errors, replace the suspected ECC SIMMs with good SIMMs and repeat the memory test.
  ECC SIMMs cannot be used in the standard memory banks (banks 0–3). ECC SIMMs are specialized for use only in ECC banks.
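The test durations in Table 2–2 are given per 8 megabytes, so the expected run time scales with installed memory. The following Python sketch turns those approximate rates into an estimate. The function name is hypothetical, bank 3 is left out because the table only says it should finish within a few seconds, and rounding a partial 8 MB chunk up to a whole chunk is an assumption.

```python
import math

# Approximate seconds per 8 MB, from the "Test duration" entries in Table 2-2.
SECONDS_PER_8MB = {
    5: 10,  # J1 bank 5: memory test, backup and data cache disabled
    6: 2,   # J1 bank 6: memory test, backup and data cache enabled
    4: 2,   # J1 bank 4: backup cache test
}


def estimate_test_seconds(jumper_bank: int, memory_mb: int) -> int:
    """Rough run time for the SROM test selected by jumper J1."""
    if jumper_bank not in SECONDS_PER_8MB:
        raise ValueError(f"Table 2-2 gives no duration for J1 bank {jumper_bank}")
    # Assumption: a partial 8 MB chunk is counted as a whole chunk.
    chunks = math.ceil(memory_mb / 8)
    return chunks * SECONDS_PER_8MB[jumper_bank]
```

With 64 MB installed, estimate_test_seconds(5, 64) gives 80 seconds and estimate_test_seconds(6, 64) gives 16 seconds. A test that runs far longer than its estimate is worth investigating, but the table's figures are only approximate.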
Figure 2–1 Jumper J1 on the CPU Daughter Board
[Figure: jumper J1 on the CPU daughter board, with jumper banks 0 through 7]
Bank Jumper Setting
0 Standard boot setting (AlphaServer 1000 systems)
1 Standard boot setting (AlphaServer 1000A systems)
2 Mini-console setting: Internal use only
3 SROM CacheTest: backup cache test
4 SROM BCacheTest: backup cache and memory test
5 SROM memTest: memory test with backup and data cache disabled
6 SROM memTestCacheOn: memory test with backup and data cache enabled
7 Fail-Safe Loader setting: selects fail-safe loader firmware
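Beep codes 3-1-2 and 3-3-2 can both be caused by a J1 jumper left at the wrong bank. The following Python sketch encodes the bank settings above and flags a jumper that is not at the model's standard boot bank. The names J1_SETTINGS, STANDARD_BOOT_BANK, and check_boot_jumper are hypothetical; the setting text comes from Figure 2–1.

```python
from __future__ import annotations

# Jumper J1 bank settings from Figure 2-1.
J1_SETTINGS = {
    0: "Standard boot setting (AlphaServer 1000 systems)",
    1: "Standard boot setting (AlphaServer 1000A systems)",
    2: "Mini-console setting: internal use only",
    3: "SROM CacheTest: backup cache test",
    4: "SROM BCacheTest: backup cache and memory test",
    5: "SROM memTest: memory test with backup and data cache disabled",
    6: "SROM memTestCacheOn: memory test with backup and data cache enabled",
    7: "Fail-Safe Loader setting: selects fail-safe loader firmware",
}

# Standard boot bank for each model, per Figure 2-1.
STANDARD_BOOT_BANK = {"AlphaServer 1000": 0, "AlphaServer 1000A": 1}


def check_boot_jumper(model: str, bank: int) -> str | None:
    """Return a warning if J1 is not at the model's standard boot bank, else None."""
    expected = STANDARD_BOOT_BANK[model]
    if bank == expected:
        return None
    return (f"J1 is at bank {bank} ({J1_SETTINGS[bank]}); "
            f"a normal boot on {model} expects bank {expected}")
```

A check like this is most useful after SROM testing or fail-safe loading, when the jumper has to be returned to bank 1 before a normal boot.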
Figure 2–2 AlphaServer 1000A Memory Layout
[Figure: standard memory banks 0 through 3, each with SIMM positions 0 through 3, and the ECC banks, which hold one ECC SIMM each for banks 0, 1, 2, and 3]
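The OCP failure message from the SROM memory tests (for example, FAIL B:0 S:2) names a bank and SIMM position in the layout above. The following Python sketch extracts those two numbers and checks them against the standard banks 0 through 3 and SIMM positions 0 through 3 shown in Figure 2–2. The function name and the exact message pattern are assumptions based on the sample message in Table 2–2.

```python
from __future__ import annotations

import re

# Matches OCP failure messages such as "FAIL B:0 S:2" (format taken from Table 2-2).
FAIL_MESSAGE = re.compile(r"FAIL\s+B:(\d+)\s+S:(\d+)")


def parse_fail_message(ocp_text: str) -> tuple[int, int] | None:
    """Return (bank, simm_position) from an OCP FAIL message, or None if absent."""
    match = FAIL_MESSAGE.search(ocp_text)
    if match is None:
        return None
    bank, simm = int(match.group(1)), int(match.group(2))
    # Figure 2-2 shows standard banks 0-3, each with SIMM positions 0-3.
    if not (0 <= bank <= 3 and 0 <= simm <= 3):
        raise ValueError(f"position outside the Figure 2-2 layout: {ocp_text!r}")
    return bank, simm
```

For example, parse_fail_message("FAIL B:0 S:2") returns (0, 2), identifying SIMM 2 in bank 0 as the module to replace (Chapter 6).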
2.3 Power-Up Screen
During power-up self-tests, the test status and result are displayed on the console terminal. Information similar to the following example should be displayed on the screen.
ff.fe.fd.fc.fb.fa.f9.f8.f7.f6.f5.
ef.df.ee.f4.ed.ec.initializing keyboard
eb.....ea.e9.e8.e7.e6.e5.e4.e3.e2.e1.e0.
X4.4-5365, built on Oct 27 1995 at 09:26:04
>>>
Table 2–3 provides a description of the power-up countdown for output to the serial console port. If the power-up display stops, use the beep codes (Table 2–1) and Table 2–3 to isolate the likely field-replaceable unit (FRU).
Table 2–3 Console Power-Up Countdown Description and Field-Replaceable Units (FRUs)
Countdown | Description | Likely FRU
ff | Console initialization started | Non-specific/Status message
fe | Initialized idle PCB | Non-specific/Status message
fd | Initializing semaphores | Non-specific/Status message
fc, fb, fa | Initializing heap | Non-specific/Status message
f9 | Initializing driver structures | Non-specific/Status message
f8 | Initializing idle process PID | Non-specific/Status message
f7 | Initializing file system | TOY chip (E78)
f6 | Initializing timer data structures | Non-specific/Status message
f5 | Lowering IPL | Non-specific/Status message
f4 | Entering idle loop | TOY chip (E78)
ef | Start memory configuration (heap) | SIMM memory or backplane
df | Configure PCI and EISA bus | PCI or EISA option
ee | Start phase 1 drivers: NVRAM and PCICFG drivers | NVRAM chip (E14) or PCI option
ed | Start phase 2 drivers: IIC bus and OCP drivers | Non-specific/Status message
ec | Start phase 3 drivers (console select): tt serial line class, tga graphics, vga graphics, and keyboard drivers | Keyboard, VGA or TGA option, or backplane
eb | Run power-up memory test | SIMM memory
ea | Start phase 4 drivers | Non-specific/Status message
e9 | Phase 4 drivers complete | Non-specific/Status message
e8 | Initialize environment variables | Non-specific/Status message
e7 | Start SCSI class driver | Backplane (on-board Qlogic 1020A)
e6 | Start phase 5 drivers: I/O drivers | PCI or EISA option
e5 | Restore timers | TOY chip (E78)
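When the power-up display stops, the last countdown code printed identifies the step that was in progress. The following Python sketch pulls the countdown codes out of captured serial console output and looks up the likely FRU from Table 2–3. It assumes that the last code shown is the stalled step and that codes are printed as two lowercase hex digits followed by a period, as in the sample screen; the function and table names are hypothetical.

```python
from __future__ import annotations

import re

# Codes in Table 2-3 that point to a specific part; the other listed codes
# are non-specific status messages.
SPECIFIC_FRU = {
    "f7": "TOY chip (E78)",
    "f4": "TOY chip (E78)",
    "ef": "SIMM memory or backplane",
    "df": "PCI or EISA option",
    "ee": "NVRAM chip (E14) or PCI option",
    "ec": "Keyboard, VGA or TGA option, or backplane",
    "eb": "SIMM memory",
    "e7": "Backplane (on-board Qlogic 1020A)",
    "e6": "PCI or EISA option",
    "e5": "TOY chip (E78)",
}
LISTED_CODES = {"ff", "fe", "fd", "fc", "fb", "fa", "f9", "f8", "f7", "f6", "f5",
                "f4", "ef", "df", "ee", "ed", "ec", "eb", "ea", "e9", "e8", "e7",
                "e6", "e5"}

# Two lowercase hex digits followed by a period, not preceded by a letter or
# digit (so text such as "X4.4-5365" is not mistaken for a code).
COUNTDOWN = re.compile(r"(?<![0-9A-Za-z])([0-9a-f]{2})(?=\.)")


def last_countdown(console_output: str) -> tuple[str, str] | None:
    """Return (last code shown, likely FRU), or None if no code was printed."""
    codes = COUNTDOWN.findall(console_output)
    if not codes:
        return None
    last = codes[-1]
    if last in SPECIFIC_FRU:
        return last, SPECIFIC_FRU[last]
    if last in LISTED_CODES:
        return last, "Non-specific/Status message"
    return last, "not listed in Table 2-3"
```

For a display that stops after "f7.", the sketch returns ("f7", "TOY chip (E78)"). Combine the result with any beep code (Table 2–1) before replacing parts.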
Digital UNIX or OpenVMS Systems
Digital UNIX and OpenVMS operating systems are supported by the SRM firmware (see Section 5.1.1). The SRM console prompt follows:
>>>
Windows NT Systems
The Windows NT operating system is supported by the ARC firmware (see Section 5.1.1). Systems using Windows NT power up to the ARC boot menu as follows:
Alpha Firmware Version n.nn
Copyright (c) 1993-1995 Microsoft Corporation
Copyright (c) 1993-1995 Digital Equipment Corporation
Boot menu:
Boot Windows NT
Boot an alternate operating system...
Run a program...
Supplementary menu...
Use the arrow keys to select, then press Enter.
2.3.1 Console Event Log
AlphaServer 1000A systems maintain a console event log consisting of status messages received during power-on self-tests. If problems occur during power-up, standard error messages indicated by asterisks (***) may be embedded in the console event log. To display the console event log, use the more el or cat el command.
Note
To stop the screen display from scrolling, press Ctrl/S. To resume scrolling, press Ctrl/Q.
You can also use the command more el to display the console event log one screen at a time.
The following example shows a console event log that contains a standard error message indicating that the mouse is not plugged in or is not working.
>>> cat el
ff.fe.fd.fc.fb.fa.f9.f8.f7.f6.f5.
ef.df.ee.f4.ed.ec.initializing keyboard ** mouse error **
eb.ea.e9.e8.e7.e6.e5.e4.e3.e2.e1.e0.
X4.4-5365, built on Oct 27 1995 at 09:26:04
>>>
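Because error messages in the console event log are set off by asterisks, a saved copy of the log can be scanned for them mechanically. The following Python sketch is an illustration only: it assumes you have captured the cat el output as text, and the function name is hypothetical.

```python
from __future__ import annotations

import re

# Error messages in the event log are wrapped in asterisks, e.g. "** mouse error **".
ERROR_MARK = re.compile(r"\*{2,}\s*(.*?)\s*\*{2,}")


def find_event_log_errors(event_log: str) -> list[str]:
    """Return the text of every asterisk-delimited message in the log."""
    return [match.group(1) for match in ERROR_MARK.finditer(event_log)]
```

Applied to the log above, the sketch returns ["mouse error"].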
2.4 Mass Storage Problems Indicated at Power-Up
Mass storage failures at power-up are usually indicated by read fail messages. Other problems are indicated by storage devices missing from the show config display.
Table 2–4 provides information for troubleshooting mass storage problems indicated at power-up or storage devices missing from the show config display.
Table 2–5 provides troubleshooting tips for AlphaServer systems that use the RAID Array 200 Subsystem.
Section 2.5 provides information on storage device LEDs. Use Tables 2–4 and 2–5 to diagnose the likely cause of the problem.
Table 2–4 Mass Storage Problems
Problem Symptom Corrective Action
Drive failure
  Symptom: Fault LED for drive is on (steady) (Section 2.5).
  Corrective action: Replace drive.
Duplicate SCSI IDs
  Symptom: Drives with duplicate SCSI IDs are missing from the show config display.
  Corrective action: Correct SCSI IDs. You may need to reconfigure the internal StorageWorks backplane (Section 5.8).
SCSI ID set to 7 (reserved for host ID)
  Symptom: Valid drives are missing from the show config display. One drive may appear seven times on the show config display.
  Corrective action: Correct SCSI IDs.
Duplicate host IDs on a shared bus
  Symptom: Valid drives are missing from the show config display. One drive may appear seven times on the show config display.
  Corrective action: Change the host ID through the pk*0_host_id environment variable (set pk*0_host_id) for systems running OpenVMS or Digital UNIX (SRM console). For systems running Windows NT (ARC console), choose "Set default configuration" in the Setup Menu.
Missing or loose cables; drives not properly seated on StorageWorks shelf
  Symptom: Activity LEDs do not come on. Drive missing from the show config display.
  Corrective action: Remove device and inspect cable connections. Reseat drive on StorageWorks shelf.
SCSI bus length exceeded
  Symptom: Drives may disappear intermittently from the show config and show device displays.
  Corrective action: A SCSI bus extended to the internal StorageWorks shelf with the backplane configured as a single bus cannot be extended outside of the enclosure.
  A SCSI bus extended to the internal StorageWorks shelf with the backplane configured as a dual bus can be extended 1 meter outside of the enclosure.
  The entire SCSI bus length, from terminator to terminator, must not exceed 6 meters for single-ended SCSI-2 at 5 MB/sec, or 3 meters for single-ended SCSI-2 at 10 MB/sec.
Terminator missing or wrong terminator used
  Symptom: Read/write errors in the console event log; storage adapter port may fail.
  If the bulkhead terminator for the removable-media bus is missing, removable-media devices may not be recognized by the system and may be missing from the show config and show device displays.
  Corrective action: Attach appropriate terminators as needed (external SCSI terminator for use with the RAID Array 200 Subsystem, 12-41667-04 (68-pin), 17-04166-02 (50-pin); external SCSI terminator for the removable-media bus, 12-41667-05).
  Note: The SCSI terminator jumper (J51) on the system motherboard should be set to "on" to enable the onboard SCSI termination.
Extra terminator
  Symptom: Devices produce errors or device IDs are dropped.
  Corrective action: Check that the bus is terminated only at the beginning and end. Remove unnecessary terminators.
  Note: The SCSI terminator jumper (J51) on the system motherboard should be set to "on" to enable the onboard SCSI termination.
SCSI storage controller failure
  Symptom: Problems persist after eliminating the other problem sources.
  Corrective action: Replace the failing EISA or PCI storage adapter module (or the motherboard, for the native SCSI controller).
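Two of the problems in Table 2–4 (ID conflicts and bus length) are simple rules that can be checked before cabling. The following Python sketch applies them to a planned configuration. It assumes you list each device's SCSI ID and the total bus length yourself; host_id defaults to 7 as in the table but should match any value set with pk*0_host_id. The function names are hypothetical, and the sketch does not model the internal StorageWorks shelf rules.

```python
from __future__ import annotations

from collections import Counter

# Maximum single-ended SCSI-2 bus length in meters, terminator to terminator,
# keyed by transfer rate in MB/sec (from the "SCSI bus length exceeded" row).
MAX_BUS_METERS = {5: 6.0, 10: 3.0}


def check_scsi_ids(device_ids: list[int], host_id: int = 7) -> list[str]:
    """Flag duplicate device IDs and devices set to the host adapter's ID."""
    problems = []
    for scsi_id, count in sorted(Counter(device_ids).items()):
        if scsi_id == host_id:
            problems.append(f"ID {scsi_id} is reserved for the host adapter")
        if count > 1:
            problems.append(f"ID {scsi_id} is used by {count} devices")
    return problems


def bus_length_ok(total_meters: float, rate_mb_per_sec: int) -> bool:
    """True if the whole bus is within the single-ended SCSI-2 limit."""
    return total_meters <= MAX_BUS_METERS[rate_mb_per_sec]
```

For example, check_scsi_ids([0, 1, 1, 7]) reports both the duplicate ID 1 and the device set to the reserved host ID 7, and a 5.5-meter bus passes at 5 MB/sec but fails at 10 MB/sec.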
Table 2–5 provides troubleshooting hints for AlphaServer 1000A systems that have the StorageWorks RAID Array 200 Subsystem. The RAID subsystem includes either the KZESC-xx (SWXCR-Ex) or the KZPSC-xx (SWXCR-Px) PCI backplane RAID controller.
Table 2–5 Troubleshooting RAID Problems
Symptom Action
Some RAID drives do not appear on the show device d display.
  Valid configured RAID logical drives appear as DRA0–DRAn, not as DKn. Configure the drives by running the RAID Configuration Utility (RCU), following the instructions in the StorageWorks RAID Array 200 Subsystems Controller Installation and Standalone Configuration Utility User's Guide, EK-SWRA2-IG.
  Reminder: Several physical disks can be grouped as a single logical DRAn device.
  External SCSI terminators used with the SWXCR controller must be of the following type: 12-41667-04 (68-pin); 17-41667-02 (50-pin).
Drives on the SWXCR controller power up with the amber Fault light on.
  Whenever you move drives onto or off of the controller, run the RAID Configuration Utility to set up the drives and logical units. Follow the instructions in the StorageWorks RAID Array 200 Subsystems Controller Installation and Standalone Configuration Utility User's Guide.
  External SCSI terminators used with the SWXCR controller must be of the following type: 12-41667-04 (68-pin); 17-41667-02 (50-pin).
Cannot access disks connected to the RAID subsystem on Windows NT systems.
  On Windows NT systems, disks connected to the controller must be spun up before they can be accessed. While running the ECU, verify that the controller is set to spin up two disks every six seconds. This is the default setting if you are using the default configuration files for the controller. If the settings are different, adjust them as needed.
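The default spin-up setting (two disks every six seconds) means a large array takes noticeably longer to become accessible. The following Python sketch lists when each group of disks is told to spin up. It assumes the first pair starts at time zero and numbers the disks 0, 1, 2, and so on; these numbers are illustrative only and are not SCSI IDs, and the function name is hypothetical.

```python
from __future__ import annotations


def spinup_schedule(num_disks: int, disks_per_step: int = 2,
                    step_seconds: int = 6) -> list[tuple[int, list[int]]]:
    """Return (start time in seconds, disk numbers) for each spin-up group."""
    schedule = []
    for step, first in enumerate(range(0, num_disks, disks_per_step)):
        group = list(range(first, min(first + disks_per_step, num_disks)))
        schedule.append((step * step_seconds, group))
    return schedule
```

With five disks, spinup_schedule(5) returns [(0, [0, 1]), (6, [2, 3]), (12, [4])], so the last disk starts spinning about 12 seconds after the first.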
2.5 Storage Device LEDs
Storage device LEDs indicate the status of the device.
Figure 2–3 shows the LEDs for disk drives contained in a StorageWorks shelf. A failure is indicated by the Fault light on each drive.
Figure 2–4 shows the Activity LED for the floppy drive. This LED is on when the drive is in use.
Figure 2–5 shows the Activity LED for the CD–ROM drive. This LED is on when the drive is in use.
For information on other storage devices, refer to the documentation provided by the manufacturer or vendor.
Figure 2–3 StorageWorks Disk Drive LEDs (SCSI)
[Figure: Activity and Fault LEDs on a StorageWorks disk drive]
Figure 2–4 Floppy Drive Activity LED
[Figure: location of the floppy drive Activity LED]
Figure 2–5 CD–ROM Drive Activity LED
[Figure: location of the CD–ROM drive Activity LED]
2.6 EISA Bus Problems Indicated at Power-Up
EISA bus failures at power-up are usually indicated by the following message displayed during power-up:
EISA Configuration Error. Run the EISA Configuration Utility.
Run the EISA Configuration Utility (ECU) (Section 5.4) when this message is displayed. Other EISA bus problems are indicated by the absence of EISA devices from the show config display.
Table 2–6 provides steps for troubleshooting EISA bus problems that persist after you run the ECU.
Table 2–6 EISA Troubleshooting
Step Action
1 Confirm that the EISA module and any cabling are properly seated.
2 Run the ECU to:
  Confirm that the system has been configured with the most recently installed controller.
  See what the hardware jumper and switch settings should be for each ISA controller.
  See what the software settings should be for each ISA and EISA controller.
  See if the ECU deactivated (<>) any controllers to prevent conflict.
  See if any controllers are locked (!), which limits the ECU's ability to change resource assignments.
3 Confirm that the hardware jumpers and switches on ISA controllers reflect the settings indicated by the ECU. Start with the last ISA module installed.
4 Run ROM-based diagnostics for the type of option:
  Storage adapter: Run test to exercise the storage devices off the EISA controller option (Section 3.3.1).
  Ethernet adapter: Run netew or network to exercise an Ethernet adapter (Section 3.3.4, Section 3.3.5).
5 Check for a bad slot by moving the last installed controller to a different slot.
6 Call the option manufacturer or support for help.
2.6.1 Additional EISA Troubleshooting Tips
The following tips can aid in isolating EISA bus problems.
Peripheral device controllers need to be seated (inserted) carefully, but firmly, into their slots to make all necessary contacts. Improper seating is a common source of problems for EISA modules.
Be sure you run the correct version of the ECU for the operating system. For Windows NT, use ECU diskette DECpc AXP (AK-PYCJ*-CA); for Digital UNIX and OpenVMS, use ECU diskette DECpc AXP (AK-Q2CR*-CA).
The CFG files supplied with the option you want to install may not work on AlphaServer 1000A systems. Some CFG files call overlay files that are not required on this system or may reference inappropriate system resources, for example, BIOS addresses. Contact the option vendor to obtain the proper CFG file.
Peripherals cannot share direct memory access (DMA) channels. Assignment of more than one peripheral to the same DMA channel can cause unpredictable results or even loss of function of the EISA module.
Not all EISA products work together. EISA is an open standard, and not every EISA product or combination of products can be tested. Violations of specifications may matter in some configurations, but not in others.
Manufacturers of EISA options often test the most common combinations and may have a list of ISA and EISA options that do not function in combination with particular systems. Be sure to check the documentation or contact the option vendor for the most up-to-date information.
EISA systems will not function unless they are first configured using the ECU.
The ECU will not notify you if the configuration program diskette is write-protected when it attempts to write the system configuration file (system.sci) to the diskette.
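Since peripherals cannot share DMA channels, it can help to check a planned set of channel assignments before running the ECU. The following Python sketch groups peripherals by DMA channel and reports any channel claimed more than once. It assumes you have listed each peripheral's channel yourself; it does not read anything from the ECU, and the function and device names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def dma_conflicts(assignments: dict[str, int]) -> dict[int, list[str]]:
    """Map each DMA channel claimed by more than one peripheral to those peripherals."""
    by_channel: dict[int, list[str]] = defaultdict(list)
    for device, channel in assignments.items():
        by_channel[channel].append(device)
    return {channel: sorted(devices)
            for channel, devices in by_channel.items() if len(devices) > 1}
```

For example, dma_conflicts({"SCSI adapter": 5, "Sound card": 5, "Tape controller": 6}) returns {5: ["SCSI adapter", "Sound card"]}, meaning one of those two must be moved to a free channel.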
2.7 PCI Bus Problems Indicated at Power-Up
PCI bus failures at power-up are usually indicated by the inability of the system to see the device. Table 2–7 provides steps for troubleshooting PCI bus problems. Use the table to diagnose the likely cause of the problem.
Note
Some PCI devices do not implement PCI parity, and some have a parity-generating scheme in which parity is sometimes incorrect or is not compliant with the PCI Specification. In such cases, the device functions properly as long as parity is not checked. The pci_parity environment variable for the SRM console, or the ENABLEPCIPARITY CHECKING environment variable for the ARC console, allows you to turn off parity checking so that false PCI parity errors do not result in machine check errors.
When you disable PCI parity, no parity checking is implemented for any PCI device, even those devices that produce correct, compliant parity.
Table 2–7 PCI Troubleshooting
Step Action
1 Confirm that the PCI module and any cabling are properly seated.
2 Run ROM-based diagnostics for the type of option:
  Storage adapter: Run test to exercise the storage devices off the PCI controller option (Section 3.3.1).
  Ethernet adapter: Run netew or network to exercise an Ethernet adapter (Section 3.3.4, Section 3.3.5).
3 Check for a bad slot by moving the last installed controller to a different slot.
4 Call the option manufacturer or support for help.
2.7.1 Additional PCI Troubleshooting Tips
Some PCI options are restricted to the primary PCI bus, slots 11, 12, and 13. Refer to the following documents for restrictions on specific PCI options:
AlphaServer 1000A READ THIS FIRST—shipped with the system.
AlphaServer 1000A Supported Options List—The options list is available from the Internet at the following locations:
ftp://ftp.digital.com/pub/DEC/Alpha/systems/
http://www.service.digital.com/alpha/server/
2.8 Fail-Safe Loader
The fail-safe loader (FSL) is a redundant or backup ROM that allows you to power up without running power-up diagnostics and to load new SRM/ARC and FSL console firmware from the firmware diskette.
Note
The fail-safe loader should be used only when a failure at power-up prohibits you from getting to the console program. You cannot boot an operating system from the fail-safe loader.
If a checksum error is detected when the SRM/ARC console is loading at power-up (error beep code 1-1-4), you need to activate the fail-safe loader and reinstall the firmware.
The fail-safe loader (FSL) allows you to attempt to recover when one of the following is the cause of a problem getting to the console program under normal power-up:
A hardware or power failure, or accidental power down during a firmware upgrade occurred.
A configuration error, such as an incorrect environment variable setting or an inappropriate nvram script.
A driver error at power-up.
A checksum error is detected when the SRM console is loading at power-up (corrupted firmware).
The fail-safe loader program is also available on diskette.
2.8.1 Fail-Safe Loader Functions
From the FSL program, you can update or load new SRM/ARC console firmware and FSL console firmware.
Note
When installing new console firmware, the flash ROM VPP enable jumper (J50) on the motherboard must be enabled.
2.8.2 Activating the Fail-Safe Loader
To activate the FSL:
1. Install the jumper at bank 7 of the J1 jumper on the CPU daughter board (Figure 2–6). The jumper is normally installed in the standard boot setting (bank 1 for AlphaServer 1000A systems).
2. Install the console firmware diskette and turn on the system. Two messages are displayed on the operator control panel (OCP) when the FSL program loads the diskette:
   OCP Message     Meaning
   Floppy Loader   FSL firmware is executing.
   Starting CPU    FSL firmware found a valid boot block, loaded the program into memory, and is attempting to transfer control to the loaded program.
3. Reinstall the console firmware from a firmware diskette.
4. When you have finished, power down and return the J1 jumper to the standard boot setting (bank 1).
Figure 2–6 Jumper J1 on the CPU Daughter Board
[Figure: jumper J1 on the CPU daughter board, with jumper banks 0 through 7]
Bank Jumper Setting
0 Standard boot setting (AlphaServer 1000 systems)
1 Standard boot setting (AlphaServer 1000A systems)
2 Mini-console setting: Internal use only
3 SROM CacheTest: backup cache test
4 SROM BCacheTest: backup cache and memory test
5 SROM memTest: memory test with backup and data cache disabled
6 SROM memTestCacheOn: memory test with backup and data cache enabled
7 Fail-Safe Loader setting: selects fail-safe loader firmware
2.9 Power-Up Sequence
During the AlphaServer 1000A power-up sequence, the power supplies are stabilized and the system is initialized and tested through the firmware power-on self-tests.
The power-up sequence includes the following:
Power supply power-up:
  – AC power-up
  – DC power-up
Two sets of power-on diagnostics:
  – Serial ROM diagnostics
  – Console firmware-based diagnostics
Caution
The AlphaServer 1000A enclosure will not power up if the top cover is not securely attached. Removing the top cover will cause the system to shut down.
2.9.1 AC Power-Up Sequence
The following power-up sequence occurs when AC power is applied to the system (system is plugged in) or when electricity is restored after a power outage:
1. The front end of the power supply begins operation and energizes.
2. The power supply then waits for the DC power to be enabled.
Note
The top cover and side panels must be securely installed. A safety interlock prevents the system from being powered on with the cover and panels removed.
2.9.2 DC Power-Up Sequence
DC power is applied to the system with the DC On/Off button on the operator control panel.
A summary of the DC power-up sequence follows:
1. When the DC On/Off button is pressed, the power supply checks for a POK_H condition.
2. 12V, 5V, 3.3V, and -12V outputs are energized and stabilized. If the outputs do not come into regulation, the power-up is aborted and the power supply enters the latching-shutdown mode.
2.10 Firmware Power-Up Diagnostics
After successful completion of AC and DC power-up sequences, the processor performs its power-up diagnostics. These tests verify system operation, load the system console, and test the core system (CPU, memory, and motherboard), including all boot path devices. These tests are performed as two distinct sets of diagnostics:
1. Serial ROM diagnostics—These tests are loaded from the serial ROM located on the CPU daughter board into the CPU’s instruction cache (I-cache). The tests check the basic functionality of the system and load the console code from the FEPROM on the motherboard into system memory.
Failures during these tests are indicated by audible error beep codes (Table 2–1). Failures of customized SROM tests (Section 2.2), set using the J1 jumper on the CPU daughter board, are displayed on the operator control panel.
2. Console firmware-based diagnostics—These tests are executed by the console code. They test the core system, including all boot path devices.
Failures during these tests are reported to the console terminal through the power-up screen or console event log.
2.10.1 Serial ROM Diagnostics
The serial ROM diagnostics are loaded into the CPU’s instruction cache from the serial ROM on the CPU daughter board. The diagnostics test the system in the following order:
1. Test the CPU and backup cache located on the CPU daughter board.
2. Test the CPU module’s system bus interface.
3. Test the system bus to PCI bus bridge and the system bus to EISA bus bridge. If either bridge fails, an audible error beep code (3-3-1) sounds (Table 2–1). The power-up tests continue despite these errors.
4. Test the PCI-to-PCI bus bridge. If the bridge fails, an error beep code (3-3-2) sounds.
5. Test the native SCSI controller. If the controller fails, an error beep code (3-1-2) sounds.
6. Configure the memory in the system and test only the first 4 MB of memory. If there is more than one memory module of the same size, the lowest-numbered memory module (the one closest to the CPU) is tested first.
If the memory test fails, the failing bank is mapped out and memory is reconfigured and re-tested. Testing continues until good memory is found. If good memory is not found, an error beep code (1-3-3) is generated and the power-up tests are terminated.
7. Check the data path to the FEPROM on the motherboard.
8. The console program is loaded into memory from the FEPROM on the motherboard. A checksum test is executed for the console image. If the checksum test fails, an error beep code (1-1-4) is generated, and the power-up tests are terminated.
If the checksum test passes, control is passed to the console code, and the console firmware-based diagnostics are run.
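The beep codes and the memory map-out rule in the steps above can be collected into a small lookup, which may help when matching an audible code to a failing component. The following Python sketch only restates what this section says; the names SROM_BEEP_CODES, describe_beep, and first_good_bank are hypothetical, and first_good_bank simplifies step 6 by assuming banks are tried strictly in bank-number order.

```python
from typing import List, Optional

# Beep codes named in Section 2.10.1: (failing item, do power-up tests stop?).
# None means this section does not say whether testing continues.
SROM_BEEP_CODES = {
    "3-3-1": ("system bus to PCI or EISA bus bridge", False),
    "3-3-2": ("PCI-to-PCI bus bridge", None),
    "3-1-2": ("native SCSI controller", None),
    "1-3-3": ("no good memory found", True),
    "1-1-4": ("console image checksum failure", True),
}


def describe_beep(code: str) -> str:
    """Return a one-line summary of an SROM beep code."""
    item, stops = SROM_BEEP_CODES[code]
    if stops is None:
        outcome = "outcome not stated"
    elif stops:
        outcome = "power-up tests terminated"
    else:
        outcome = "power-up tests continue"
    return f"{code}: {item} ({outcome})"


def first_good_bank(bank_passed: List[bool]) -> Optional[int]:
    """Simplified model of step 6: banks are tried lowest-numbered first,
    and a failing bank is mapped out before the next one is tried.

    bank_passed[i] is a hypothetical test result for bank i. Returns the
    first passing bank, or None, in which case the SROM sounds 1-3-3.
    """
    for bank, passed in enumerate(bank_passed):
        if passed:
            return bank
    return None


if __name__ == "__main__":
    print(describe_beep("1-1-4"))
    bank = first_good_bank([False, True, True])
    print(describe_beep("1-3-3") if bank is None else f"continuing with bank {bank}")
```

Running the sketch prints the 1-1-4 summary and reports that testing continues with bank 1, because bank 0 is mapped out.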
2.10.2 Console Firmware-Based Diagnostics
Console firmware-based tests are executed once control is passed to the console code in memory. They check the system in the following order:
1. Perform a complete check of system memory. Steps 2–5 may be completed in parallel.
2. Start the I/O drivers for mass storage devices and tapes. At this time a complete functional check of the machine is made. After the I/O drivers are started, the console program continuously polls the bus for devices (approximately every 20 to 30 seconds).
3. Check that EISA configuration information is present in NVRAM for each EISA module detected and that no information is present for modules that have been removed.
4. Run exercisers on the drives currently seen by the system.
Note
This step does not ensure that all disks in the system will be tested or that any device drivers will be completely tested. Spin-up time varies for different drives, so not all disks may be on line at this point in the power-up sequence. To ensure complete testing of disk devices, use the test command (Section 3.3.1).
5. Enter console mode or boot the operating system. This action is determined by the auto_action environment variable.
If the os_type environment variable is set to NT, the ARC console is loaded into memory, and control is passed to the ARC console.
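Step 5 can be read as a small decision procedure. The following Python sketch is a simplified model, not console firmware code, and the function name is hypothetical. It assumes that an os_type of NT takes precedence, and it treats only the auto_action value boot as booting the operating system; any other value (for example, halt) is modeled as entering console mode.

```python
def final_power_up_step(auto_action: str, os_type: str) -> str:
    """Hypothetical model of step 5 of the console firmware-based diagnostics."""
    # Assumption: os_type NT loads the ARC console and passes control to it.
    if os_type.strip().upper() == "NT":
        return "load ARC console"
    # Assumption: only auto_action=boot boots; other values stay in console mode.
    if auto_action.strip().lower() == "boot":
        return "boot operating system"
    return "enter console mode"


if __name__ == "__main__":
    print(final_power_up_step("halt", "NT"))
```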
3
Running System Diagnostics
This chapter provides information on how to run system diagnostics.
Section 3.1 describes how to run ROM-based diagnostics, including error reporting utilities and loopback tests.
Section 3.4 describes acceptance testing and initialization procedures.
Section 3.5 describes the DEC VET operating system exerciser.
3.1 Running ROM-Based Diagnostics
ROM-based diagnostics (RBDs), which are part of the console firmware that is loaded from the FEPROM on the system motherboard, offer many powerful diagnostic utilities, including the ability to examine error logs from the console environment and run system- or device-specific exercisers.
AlphaServer 1000A RBDs rely on exerciser modules, rather than functional tests, to isolate errors. The exercisers are designed to run concurrently, providing maximum bus interaction between the console drivers and the target devices.
The multitasking ability of the console firmware allows you to run diagnostics in the background (using the background operator “&” at the end of the command). You run RBDs by using console commands.
Note
ROM-based diagnostics, including the test command, are run from the SRM console (firmware used by OpenVMS and Digital UNIX operating systems). If you are running a Windows NT system, refer to Section 5.1.2 for the steps used to switch between consoles.
RBDs report errors to the console terminal and/or the console event log.
3.2 Command Summary
Table 3–1 provides a summary of the diagnostic and related commands.
Table 3–1 Summary of Diagnostic and Related Commands

Command      Function                                                   Reference

Acceptance Testing
test         Quickly tests the core system. The test command is the     Section 3.3.1
             primary diagnostic for acceptance testing and console
             environment diagnosis.

Error Reporting
cat el       Displays the console event log.                            Section 3.3.2
more el      Displays the console event log one screen at a time.       Section 3.3.2

Extended Testing/Troubleshooting
memory       Runs memory exercisers each time the command is entered.   Section 3.3.3
             These exercisers run concurrently in the background.
net -ic      Initializes the MOP counters for the specified Ethernet    Section 3.3.7
             port.
net -s       Displays the MOP counters for the specified Ethernet port. Section 3.3.6
netew        Runs external MOP loopback tests for specified EISA- or    Section 3.3.4
             PCI-based ew* (DECchip 21040, TULIP) Ethernet ports.
network      Runs external MOP loopback tests for specified EISA- or    Section 3.3.5
             PCI-based er* (DEC 4220, LANCE) Ethernet ports.

Loopback Testing
test lb      Conducts loopback tests for COM2 and the parallel port     Section 3.3.1
             in addition to quick core system tests.
netew        Runs external MOP loopback tests for specified EISA- or    Section 3.3.4
             PCI-based ew* (DECchip 21040, TULIP) Ethernet ports.
network      Runs external MOP loopback tests for specified EISA- or    Section 3.3.5
             PCI-based er* (DEC 4220, LANCE) Ethernet ports.

Diagnostic-Related Commands
kill         Terminates a specified process.                            Section 3.3.8
kill_diags   Terminates all currently executing diagnostics.            Section 3.3.8
show_status  Reports the status of currently executing                  Section 3.3.9
             tests/exercisers.
3.3 Command Reference
This section provides detailed information on the diagnostic commands and related commands.
3.3.1 test
The test command runs firmware diagnostics for the entire core system. The tests are run concurrently in the background. Fatal errors are reported to the console terminal.
Use the cat el command in conjunction with the test command to examine test/error information reported to the console event log. Because the tests run concurrently and indefinitely (until you stop them with the kill_diags command), they are useful for flushing out intermittent hardware problems.
Note
By default, no write tests are performed on disk and tape drives. Media must be installed to test the floppy drive and tape drives. A loopback connector is required for the COM2 port (9-pin loopback connector, 12-27351-01). The test command does not test the DNSES, TGA card, reflective memory option, or third-party options.
When using the test command after shutting down an operating system, you must initialize the system to a quiescent state. Enter the following commands at the SRM console:
P00>>> set auto_action halt
P00>>> init
...
P00>>> test
After testing is completed, set the auto_action environment variable to its previous value (usually boot) and use the Reset button to reset the system.
To terminate the tests, use the kill command to terminate an individual diagnostic or the kill_diags command to terminate all diagnostics. Use the show_status display to determine the process ID when terminating an individual diagnostic test.
Note
A serial loopback connector (12-27351-01) must be installed on the COM2 serial port for the kill_diags command to successfully terminate system tests.
The test script tests devices in the following order:
1. Console loopback tests if lb argument is specified: COM2 serial port and parallel port.
2. Network external loopback tests for E*A0. This test requires that the Ethernet port be terminated or connected to a live network; otherwise, the test will fail.
3. Memory tests (one pass).
4. Read-only tests: DK* disks, DR* disks, DU* disks, MK* tapes, DV* floppy.
5. VGA console tests. These tests are run only if the console environment variable is set to “serial.” The VGA console test displays rows of the letter “H”.
Synopsis:
test [lb]
Argument:
[lb] The loopback option includes console loopback tests for the COM2 serial port and the parallel port during the test sequence.
Examples:
In the following example, the system is tested and the tests complete successfully.
Note
Examine the console event log after running tests.
>>> test
Requires diskette and loopback connectors on COM2 and parallel port
type kill_diags to halt testing
type show_status to display testing progress
type cat el to redisplay recent errors
Testing COM2 port
Setting up network test, this will take about 20 seconds
Testing the network
48 Meg of System Memory
   Bank 0 = 16 Mbytes(4 MB Per Simm) Starting at 0x00000000
   Bank 1 = 16 Mbytes(4 MB Per Simm) Starting at 0x01000000
   Bank 2 = 16 Mbytes(4 MB Per Simm) Starting at 0x02000000
   Bank 3 = No Memory Detected
Testing the memory
Testing parallel port
Testing the SCSI Disks
Non-destructive Test of the Floppy started
dka400.4.0.6.0 has no media present or is disabled via the RUN/STOP switch
file open failed for dka400.4.0.6.0
Testing the VGA(Alphanumeric Mode only)
Printer offline
file open failed for para
>>> show_status
      ID      Program       Device   Pass Hard/Soft Bytes Written    Bytes Read
-------- ------------ ------------ ------ --------- ------------- -------------
00000001         idle       system      0     0   0             0             0
0000002d     exer_kid         tta1      0     0   0             1             0
0000003d      nettest era0.0.0.2.1     43     0   0          1376          1376
00000045      memtest       memory      7     0   0     424673280     424673280
00000052     exer_kid dka100.1.0.6      0     0   0             0       2688512
00000053     exer_kid dka200.2.0.6      0     0   0             0        922624
>>> kill_diags
>>>
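In the memory summary printed by the test example above, each bank's start address equals the total size of the populated banks before it. The following Python sketch reproduces those addresses. It assumes banks are mapped back to back in bank-number order, which matches this example but is not stated as a rule in this manual; the function name is hypothetical.

```python
from typing import List, Tuple

MB = 1 << 20  # bytes per megabyte


def bank_start_addresses(bank_sizes_mb: List[int]) -> List[Tuple[int, int]]:
    """Return (bank, start_address) for each populated bank.

    Assumption: banks are mapped back to back in bank-number order.
    A size of 0 stands for "No Memory Detected".
    """
    layout = []
    next_addr = 0
    for bank, size_mb in enumerate(bank_sizes_mb):
        if size_mb == 0:
            continue  # empty bank: nothing is mapped
        layout.append((bank, next_addr))
        next_addr += size_mb * MB
    return layout


if __name__ == "__main__":
    for bank, addr in bank_start_addresses([16, 16, 16, 0]):
        print(f"Bank {bank} Starting at 0x{addr:08X}")
```

With three 16 MB banks, the start address advances by 0x01000000 (16 MB) per bank, matching the 48 MB total shown in the example.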
In the following example, the system is tested and reports a fatal error message because no network server responded to a loopback message. Check the Ethernet connectivity of this system.
>>> test
Requires diskette and loopback connectors on COM2 and parallel port
type kill_diags to halt testing
type show_status to display testing progress
type cat el to redisplay recent errors
Testing COM2 port
Setting up network test, this will take about 20 seconds
Testing the network
*** Error (era0), Mop loop message timed out from: 08-00-2b-3b-42-fd
*** List index: 7 received count: 0 expected count 2
>>>
3.3.2 cat el and more el
The cat el and more el commands display the current contents of the console event log. Status and error messages (if problems occur) are logged to the console event log at power-up, during normal system operation, and while running system tests.
Standard error messages are indicated by asterisks (***). When cat el is used, the contents of the console event log scroll by. You can use the Ctrl/S combination to stop the screen from scrolling and Ctrl/Q to resume scrolling.
The more el command allows you to view the console event log one screen at a time.
Synopsis:
cat el or more el
Examples:
The following example shows an abbreviated console event log that contains a standard error message:
The error message indicates the keyboard is not plugged in or is not working.
>>> cat el
*** keyboard not plugged in...
ff.fe.fd.fc.fb.fa.f9.f8.f7.f6.f5.
ef.df.ee.f4.ed.ec.eb.ea.e9.e8.e7.e6.
port pka0.7.0.6.0 initialized, scripts are at 4f7faa0
resetting the SCSI bus on pka0.7.0.6.0
port pkb0.7.0.12.0 initialized, scripts are at 4f82be0
resetting the SCSI bus on pkb0.7.0.12.0
e5.e4.e3.e2.e1.e0.
V1.1-1, built on Nov 4 1994 at 16:44:07
device dka400.4.0.6.0 (RRD43) found on pka0.4.0.6.0
>>>
3.3.3 memory
The memory command tests memory by running a memory exerciser each time the command is entered. The exercisers are run in the background and nothing is displayed unless an error occurs.
The number of exercisers, as well as the length of time for testing, depends on the context of the testing. Generally, running three to five exercisers for 15 minutes to 1 hour is sufficient for troubleshooting most memory problems.
To terminate the memory tests, use the kill command to terminate an individual diagnostic or the kill_diags command to terminate all diagnostics. Use the show_status display to determine the process ID when terminating an individual diagnostic test.
Synopsis:
memory
Examples:
The following is an example with no errors.
>>> memory
>>> memory
>>> memory
Testing the memory
>>> show_status
      ID      Program       Device   Pass Hard/Soft Bytes Written    Bytes Read
-------- ------------ ------------ ------ --------- ------------- -------------
00000001         idle       system      0     0   0             0             0
0000006b      memtest       memory      1     0   0      53477376      53477376
00000071      memtest       memory      1     0   0      31457280      31457280
00000077      memtest       memory      1     0   0      24117248      24117248
>>> kill_diags
>>>
The following is an example with a memory compare error indicating bad SIMMs.
>>> memory
>>> memory
>>> memory
*** Hard Error - Error #44 - Memory compare error
Diagnostic Name  ID        Device  Pass  Test  Hard/Soft  1-JAN-2066
memtest          000000c8  brd0    1     1     1   0      12:00:01
Expected value: 00000004
Received value: 80000001
Failing addr: 800001c
*** End of Error ***
>>> kill_diags
>>>
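To see which data bits were wrong in a memory compare error, XOR the expected and received values. The following Python sketch does this for the values in the example above; the helper name is hypothetical. It reports bit positions only. Mapping a failing address and bit to a specific SIMM depends on the memory layout, which this section does not describe.

```python
from typing import List


def differing_bits(expected: int, received: int, width: int = 32) -> List[int]:
    """Return bit positions (0 = least significant) where the values differ."""
    diff = (expected ^ received) & ((1 << width) - 1)
    return [bit for bit in range(width) if (diff >> bit) & 1]


if __name__ == "__main__":
    expected, received = 0x00000004, 0x80000001  # values from the error report
    print(f"XOR: 0x{expected ^ received:08X}")
    print("Differing bits:", differing_bits(expected, received))
```

Here the XOR is 0x80000005, so bits 0, 2, and 31 differ: more than a single-bit discrepancy.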
3.3.4 netew
The netew command is used to run MOP loopback tests for any EISA- or PCI-based ew* (DECchip 21040, TULIP) Ethernet ports. The command can also be used to test a port on a “live” network.
The loopback tests are set to run continuously (-p pass_count set to 0). Use the kill command (or Ctrl/C) to terminate an individual diagnostic or the kill_diags command to terminate all diagnostics. Use the show_status display to determine the process ID when terminating an individual diagnostic test.
Note
While some results of network tests are reported directly to the console, you should examine the console event log (using the cat el or more el commands) for complete test results.
Synopsis:
netew
When the netew command is entered, the following script is executed:
net -sa ew*0>ndbr/lp_nodes_ew*0
set ew*0_loop_count 2 2>nl
set ew*0_loop_inc 1 2>nl
set ew*0_loop_patt ffffffff 2>nl
set ew*0_loop_size 10 2>nl
set ew*0_lp_msg_node 1 2>nl
net -cm ex ew*0
echo "Testing the network"
nettest ew*0 -sv 3 -mode nc -p 0 -w 1 &
The script builds a list of nodes to which MOP loopback packets are sent, sets certain test environment variables, and tests the Ethernet port by using the following variation of the nettest exerciser:
nettest ew*0 -sv 3 -mode nc -p 0 -w 1 &
Testing an Ethernet Port:
>>> netew
>>> show_status
      ID      Program       Device   Pass Hard/Soft Bytes Written    Bytes Read
-------- ------------ ------------ ------ --------- ------------- -------------
00000001         idle       system      0     0   0             0             0
000000d5      nettest ewa0.0.0.0.0     13     0   0        308672        308672
>>> kill_diags
>>>
3.3.5 network
The network command is used to run MOP loopback tests for any EISA- or PCI-based er* (DEC 4220, LANCE) Ethernet ports. The command can also be used to test a port on a “live” network.
The loopback tests are set to run continuously (-p pass_count set to 0). Use the kill command (or Ctrl/C) to terminate an individual diagnostic or the kill_diags command to terminate all diagnostics. Use the show_status display to determine the process ID when terminating an individual diagnostic test.
Note
While some results of network tests are reported directly to the console, you should examine the console event log (using the cat el or more el commands) for complete test results.
Synopsis:
network
When the network command is entered, the following script is executed:
echo "setting up the network test, this will take about 20 seconds"
net -stop er*0
net -sa er*0>ndbr/lp_nodes_er*0
net ic er*0
set er*0_loop_count 2 2>nl
set er*0_loop_inc 1 2>nl
set er*0_loop_patt ffffffff 2>nl
set er*0_loop_size 10 2>nl
set er*0_lp_msg_node 1 2>nl
set er*0_mode 44 2>nl
net -start er*0
echo "Testing the network"
nettest er*0 -sv 3 -mode nc -p 0 -w 1 &
The script builds a list of nodes to which MOP loopback packets are sent, sets certain test environment variables, and tests the Ethernet port by using the following variation of the nettest exerciser:
nettest er*0 -sv 3 -mode nc -p 0 -w 1 &
Testing an Ethernet Port:
>>> network
>>> show_status
      ID      Program       Device   Pass Hard/Soft Bytes Written    Bytes Read
-------- ------------ ------------ ------ --------- ------------- -------------
00000001         idle       system      0     0   0             0             0
000000d5      nettest era0.0.0.0.0     13     0   0        308672        308672
>>> kill_diags
>>>
3.3.6 net -s
The net -s command displays the MOP counters for the specified Ethernet port.
Synopsis:
net -s ewa0
Example:
>>> net -s ewa0
Status counts:
ti: 72 tps: 0 tu: 47 tjt: 0 unf: 0 ri: 70 ru: 0 rps: 0 rwt: 0 at: 0 fd: 0 lnf: 0 se: 0 tbf: 0 tto: 1 lkf: 1 ato: 1 nc: 71 oc: 0
MOP BLOCK:
Network list size: 0
MOP COUNTERS:
Time since zeroed (Secs): 42
TX:
Bytes: 0 Frames: 0 Deferred: 1 One collision: 0 Multi collisions: 0
TX Failures:
Excessive collisions: 0 Carrier check: 0 Short circuit: 71 Open circuit: 0 Long frame: 0 Remote defer: 0 Collision detect: 71
RX:
Bytes: 49972 Frames: 70 Multicast bytes: 0 Multicast frames: 0
RX Failures:
Block check: 0 Framing error: 0 Long frame: 0 Unknown destination: 0 Data overrun: 0 No system buffer: 0 No user buffers: 0
>>>
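When scanning net -s output for trouble, it can help to extract only the nonzero counters. The following Python sketch parses one section of the display (for example, the TX Failures line) into a dictionary. It assumes each counter appears as "label: integer", as in the example above; the function names are hypothetical, and the sketch does not interpret what any counter means. Parse each section separately, because labels such as Bytes and Frames appear under both TX and RX.

```python
import re
from typing import Dict

# Matches "label: integer" pairs such as "Short circuit: 71" or "tto: 1".
_PAIR = re.compile(r"([A-Za-z][A-Za-z ()]*?):\s*(\d+)")


def parse_counters(section_text: str) -> Dict[str, int]:
    """Parse one counter section of net -s output into {label: value}."""
    return {label.strip(): int(value) for label, value in _PAIR.findall(section_text)}


def nonzero(counters: Dict[str, int]) -> Dict[str, int]:
    """Keep only counters whose value is not zero."""
    return {name: value for name, value in counters.items() if value}


if __name__ == "__main__":
    tx_failures = ("Excessive collisions: 0 Carrier check: 0 Short circuit: 71 "
                   "Open circuit: 0 Long frame: 0 Remote defer: 0 Collision detect: 71")
    print(nonzero(parse_counters(tx_failures)))
```

For the TX Failures line in the example, this prints the Short circuit and Collision detect counters, both 71.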
3.3.7 net -ic
The net -ic command initializes the MOP counters for the specified Ethernet port.
Synopsis:
net -ic ewa0
Example:
>>> net -ic ewa0
>>> net -s ewa0
Status counts:
ti: 72 tps: 0 tu: 47 tjt: 0 unf: 0 ri: 70 ru: 0 rps: 0 rwt: 0 at: 0 fd: 0 lnf: 0 se: 0 tbf: 0 tto: 1 lkf: 1 ato: 1 nc: 71 oc: 0
MOP BLOCK:
Network list size: 0
MOP COUNTERS:
Time since zeroed (Secs): 3
TX:
Bytes: 0 Frames: 0 Deferred: 0 One collision: 0 Multi collisions: 0
TX Failures:
Excessive collisions: 0 Carrier check: 0 Short circuit: 0 Open circuit: 0 Long frame: 0 Remote defer: 0 Collision detect: 0
RX:
Bytes: 0 Frames: 0 Multicast bytes: 0 Multicast frames: 0
RX Failures:
Block check: 0 Framing error: 0 Long frame: 0 Unknown destination: 0 Data overrun: 0 No system buffer: 0 No user buffers: 0
>>>
3.3.8 kill and kill_diags
The kill and kill_diags commands terminate diagnostics that are currently executing.
Note
A serial loopback connector (12-27351-01) must be installed on the COM2 serial port for the kill_diags command to successfully terminate system tests.
The kill command terminates a specified process.
The kill_diags command terminates all diagnostics.
Synopsis:
kill_diags
kill [PID . . . ]
Argument:
[PID . . . ] The process ID of the diagnostic to terminate. Use the show_status command to determine the process ID.
3.3.9 show_status
The show_status command reports one line of information per executing diagnostic. The information includes ID, diagnostic program, device under test, error counts, passes completed, bytes written, and bytes read.
Many of the diagnostics run in the background and provide information only if an error occurs. Use the show_status command to display the progress of diagnostics. The following command string is useful for periodically displaying status information for diagnostics running in the background:
>>> while true;show_status;sleep n;done
where n is the number of seconds between show_status displays.
Synopsis:
show_status
Example:
>>> show_status
      ID      Program       Device   Pass Hard/Soft Bytes Written    Bytes Read
-------- ------------ ------------ ------ --------- ------------- -------------
00000001         idle       system      0     0   0             0             0
0000002d     exer_kid         tta1      0     0   0             1             0
0000003d      nettest era0.0.0.2.1     43     0   0          1376          1376
00000045      memtest       memory      7     0   0     424673280     424673280
00000052     exer_kid dka100.1.0.6      0     0   0             0       2688512
>>>
The columns of the display are:
ID: Process ID
Program: Program module name
Device: Device under test
Pass: Diagnostic pass count
Hard/Soft: Error count (hard and soft). Soft errors are not usually fatal; hard errors halt the system or prevent completion of the diagnostics.
Bytes Written: Bytes successfully written by the diagnostic
Bytes Read: Bytes successfully read by the diagnostic
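If show_status output is captured, for example from a terminal log of the while loop shown above, it can be parsed into records. The following Python sketch is an illustration under stated assumptions: each data row has exactly eight whitespace-separated fields in the column order listed above, and data rows start with a hexadecimal ID. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiagStatus:
    pid: str            # ID column exactly as displayed
    program: str
    device: str
    passes: int
    hard_errors: int
    soft_errors: int
    bytes_written: int
    bytes_read: int


def parse_show_status(text: str) -> List[DiagStatus]:
    """Parse captured show_status output into one record per diagnostic.

    Header, separator, and prompt lines are skipped because they are not
    eight-field rows that start with a hexadecimal ID.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue
        try:
            int(fields[0], 16)                    # data rows start with a hex ID
            counts = [int(f) for f in fields[3:]]  # pass, hard, soft, written, read
        except ValueError:
            continue
        rows.append(DiagStatus(fields[0], fields[1], fields[2], *counts))
    return rows


def with_hard_errors(rows: List[DiagStatus]) -> List[DiagStatus]:
    """Return the diagnostics that report at least one hard error."""
    return [r for r in rows if r.hard_errors > 0]
```

The pid field keeps the ID as displayed, so it can be read off directly when choosing a process for the kill command.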
3.4 Acceptance Testing and Initialization
Perform the acceptance testing procedure listed below after installing a system or whenever adding or replacing the following:
Memory modules
Motherboard
CPU daughter board
Storage devices
EISA or PCI options
1. Run the RBD acceptance tests using the test command.
2. If you have added or moved an EISA option or some ISA options, run the EISA Configuration Utility (ECU).
3. Bring up the operating system.
4. Run DEC VET to test that the operating system is correctly installed. Refer to Section 3.5 for information on DEC VET.
3.5 DEC VET
Digital’s DEC Verifier and Exerciser Tool (DEC VET) software is a multipurpose system maintenance tool that performs exerciser-oriented maintenance testing. DEC VET runs on Digital UNIX, OpenVMS, and Windows NT operating systems. DEC VET consists of a manager and exercisers. The DEC VET manager controls the exercisers. The exercisers test system hardware and the operating system.
DEC VET supports various exerciser configurations, ranging from a single device exerciser to full system loading, that is, simultaneous exercising of multiple devices.
Refer to the DEC Verifier and Exerciser Tool User’s Guide (AA–PTTMD–TE) for instructions on running DEC VET.
4
Error Log Analysis
This chapter provides information on how to interpret error logs reported by the operating system.
Section 4.1 describes machine check/interrupts and how these errors are detected and reported.
Section 4.2 describes the entry format used by the error formatters.
Section 4.3 describes how to generate a formatted error log using the DECevent Translation and Reporting Utility available with OpenVMS and Digital UNIX.
4.1 Fault Detection and Reporting
Table 4–1 provides a summary of the fault detection and correction components of AlphaServer 1000A systems.
Generally, PALcode handles exceptions as follows:
The PALcode determines the cause of the exception.
If possible, it corrects the problem and passes control to the operating system for reporting before returning the system to normal operation.
If error/event logging is required, control is passed through the system control block (SCB) to the appropriate exception handler.
Table 4–1 AlphaServer 1000 Fault Detection and Correction

Component                      Fault Detection/Correction Capability

KN22A Processor Module
DECchip 21064 and 21064A       Contains error detection and correction (EDC) logic
microprocessors                for data cycles. There are check bits associated with
                               all data entering and exiting the 21064(A)
                               microprocessor. A single-bit error on any of the four
                               longwords being read can be corrected (per cycle). A
                               double-bit error on any of the four longwords being
                               read can be detected (per cycle).
Backup cache (B-cache)         EDC check bits on the data store, and parity on the
                               tag address store and tag control store.

Memory Subsystem
Memory SIMMs                   EDC logic protects data by detecting and correcting
                               data cycle errors. A single-bit error on any of the
                               four longwords can be corrected (per cycle). A
                               double-bit error on any of the four longwords being
                               read can be detected (per cycle).

System Motherboard
SCSI Controller                SCSI data parity is generated.
EISA-to-PCI bridge chip        PCI data parity is generated.
PCI-to-PCI bridge chip         PCI data parity is generated.
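The single-bit-correct, double-bit-detect behavior listed in Table 4–1 is what a SEC-DED (single-error-correcting, double-error-detecting) code provides. The following Python sketch is a generic textbook illustration using an extended Hamming (8,4) code on a 4-bit value. It is not the check-bit layout used by the 21064(A), the B-cache, or the memory SIMMs, which this manual does not describe; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

DATA_POSITIONS = (3, 5, 6, 7)  # codeword indices that carry data bits


def encode(nibble: int) -> List[int]:
    """Encode a 4-bit value as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds overall parity; indices 1, 2, and 4 hold Hamming check bits.
    """
    word = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (nibble >> i) & 1
    for p in (1, 2, 4):
        # Check bit p covers every other position whose index has bit p set.
        word[p] = sum(word[i] for i in range(1, 8) if i & p and i != p) % 2
    word[0] = sum(word[1:]) % 2  # overall parity makes the total even
    return word


def decode(received: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, nibble); status is 'ok', 'corrected', or 'double'."""
    word = list(received)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(word[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p
    overall_odd = sum(word) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # Single-bit error at index `syndrome` (0 means the parity bit itself).
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits flipped.
        return "double", None
    nibble = sum(word[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    word[6] ^= 1          # inject a single-bit error
    print(decode(word))   # corrected back to 11
    word[2] ^= 1          # inject a second error
    print(decode(word))   # detected, not corrected
```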
4.1.1 Machine Check/Interrupts
The exceptions that result from hardware system errors are called machine check/interrupts. They occur when a system error is detected during the processing of a data request. There are four types of machine check/interrupts related to system events:
1. Processor machine check (SCB 670)
2. System machine check (SCB 660)
3. Processor-corrected machine check (SCB 630)
4. System-corrected machine check (SCB 620)
During the error handling process, errors are first handled by the appropriate PALcode error routine and then by the associated operating system error handler. The causes of each of the machine check/interrupts are as follows. The system control block (SCB) vector through which PALcode transfers control to the operating system is shown in parentheses.
Processor Machine Check (SCB: 670)
Processor machine check errors are fatal system errors that result in a system crash. The error handling code for these errors is common across all platforms using the DECchip 21064 and 21064A microprocessors.
The DECchip 21064 or 21064A microprocessor detected one or more of the following uncorrectable data errors:
– Uncorrectable B-cache data error
– Uncorrectable memory data error
A B-cache tag or tag control parity error occurred
Hard error was asserted in response to:
– Double-bit Istream ECC error
– Double-bit Dstream ECC error
– System transaction terminated with CACK_HERR
– I-cache parity errors
– D-cache parity errors
System Machine Check (SCB: 660)
A system machine check is a system-detected error, external to the DECchip 21064 microprocessor and possibly not related to the activities of the CPU. These errors are specific to AlphaServer 1000A systems.
Fatal errors:
System overtemperature failure
System complete power supply failure
The power supply number is called out in the register: power supply 1 is the bottom supply; power supply 2 is the top supply.
System fan failure
I/O read/write retry timeout
DMA data parity error
I/O data parity error
Slave abort PCI transaction
DEVSEL not asserted
Uncorrectable read error
Invalid page table lookup (scatter gather)
Memory cycle error
B-cache tag address parity error
B-cache tag control parity error
Non-existent memory error
ESC NMI: IOCHK
Processor-Corrected Machine Check (SCB: 630)
Processor-corrected machine checks are caused by B-cache errors that are detected and corrected by the DECchip 21064 or 21064A microprocessor. These are nonfatal errors that result in an error log entry. The error handling code for these errors is common across all platforms using the DECchip 21064 and 21064A microprocessors.
Single-bit Istream ECC error
Single-bit Dstream ECC error
System transaction terminated with CACK_SERR
System-Corrected Machine Check (SCB: 620)
These nonfatal errors are AlphaServer 1000A-specific correctable errors. They result in the generation of the correctable machine check logout frame:
Correctable read errors
Single power supply failure when operating with redundant power supplies.
System overtemperature warning
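The four machine check/interrupt types described in Section 4.1.1 can be summarized as a lookup keyed by SCB vector. The following Python sketch only restates this section; the keys are the SCB vectors exactly as printed here, and the names MachineCheck, MACHINE_CHECKS, and describe are hypothetical.

```python
from typing import NamedTuple


class MachineCheck(NamedTuple):
    name: str
    fatal: bool
    scope: str  # where this section says the error handling applies


# Keyed by the SCB vector as printed in Section 4.1.1.
MACHINE_CHECKS = {
    "670": MachineCheck("Processor machine check", True,
                        "common to DECchip 21064/21064A platforms"),
    "660": MachineCheck("System machine check", True,
                        "AlphaServer 1000A-specific"),
    "630": MachineCheck("Processor-corrected machine check", False,
                        "common to DECchip 21064/21064A platforms"),
    "620": MachineCheck("System-corrected machine check", False,
                        "AlphaServer 1000A-specific"),
}


def describe(scb_vector: str) -> str:
    """Summarize a machine check/interrupt by its SCB vector."""
    mc = MACHINE_CHECKS[scb_vector]
    severity = "fatal" if mc.fatal else "nonfatal, logged"
    # Per Section 4.1.1, PALcode handles the error first, then the OS handler.
    return f"SCB {scb_vector}: {mc.name} ({severity}; {mc.scope})"


if __name__ == "__main__":
    for vector in ("670", "660", "630", "620"):
        print(describe(vector))
```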
4.2 Error Logging and Event Log Entry Format
The Digital UNIX and OpenVMS error handlers can generate several entry types. All error entries, with the exception of correctable memory errors, are logged immediately. Entries can be of variable length based on the number of registers within the entry.
Each entry consists of an operating system header, several device frames, and an end frame. Most entries have a PAL-generated logout frame, and may contain frames for CPU, memory, and I/O.
4.3 Event Record Translation
Systems running Digital UNIX and OpenVMS operating systems use the DECevent management utility to translate events into ASCII reports derived from system event entries (bit-to-text translations).
The DECevent utility has the following features relating to the translation of events:
Translating event log entries into readable reports
Selecting input and output sources
Filtering input events
Selecting alternate reports
Translating events as they occur
Maintaining and customizing the user environment with the interactive shell commands
Note
Microsoft Windows NT does not currently provide bit-to-text translation of system errors.
Section 4.3.1 summarizes the command used to translate the error log information for the OpenVMS operating system using DECevent.
Section 4.3.2 summarizes the command used to translate the error log information for the Digital UNIX operating system using DECevent.
4.3.1 OpenVMS Alpha Translation Using DECevent
The kernel error log entries are translated from binary to ASCII using the DIAGNOSE command. To invoke the DECevent utility, enter the DCL command DIAGNOSE.
Format: DIAGNOSE/TRANSLATE [qualifier] [, . . . ] [infile[, . . . ]]
Example:
$ DIAGNOSE/TRANSLATE/SINCE=14-JUN-1995
For more information on generating error log reports using DECevent, refer to DECevent Translation and Reporting Utility for OpenVMS Alpha, User and Reference Guide, AA-Q73KC-TE.
System faults can be isolated by examining translated system error logs or using the DECevent Analysis and Notification Utility. Refer to the DECevent Analysis and Notification Utility for OpenVMS Alpha, User and Reference Guide, AA-Q73LC-TE, for more information.
4.3.2 Digital UNIX Translation Using DECevent
The kernel error log entries are translated from binary to ASCII using the dia command. To invoke the DECevent utility, enter the dia command.
Format: dia [-a -f infile[ . . . ]]
Example:
% dia -t s:14-jun-1995:10:00
For more information on generating error log reports using DECevent, refer to DECevent Translation and Reporting Utility for Digital UNIX, User and Reference Guide, AA-QAA3-TE.
System faults can be isolated by examining translated system error logs or using the DECevent Analysis and Notification Utility. Refer to the DECevent Analysis and Notification Utility for Digital UNIX, User and Reference Guide, AA-QAA4A-TE, for more information.
5
System Configuration and Setup
This chapter provides configuration and setup information for AlphaServer 1000A systems and system options.
Section 5.1 describes how to examine the system configuration using the console firmware.
– Section 5.1.1 describes the function of the two firmware interfaces used with AlphaServer 1000A systems.
– Section 5.1.2 describes how to switch between firmware interfaces.
– Sections 5.1.3 and 5.1.4 describe the commands used to examine system configuration for each firmware interface.
Section 5.2 describes the system bus configuration.
Section 5.3 describes the motherboard.
Section 5.4 describes the EISA bus.
Section 5.5 describes how ISA options are compatible on the EISA bus.
Section 5.6 describes the EISA configuration utility (ECU).
Section 5.7 describes the PCI bus.
Section 5.8 describes SCSI buses and configurations.
Section 5.9 describes power supply configurations.
Section 5.10 describes the console port configurations.
5.1 Verifying System Configuration
Figure 5–1 illustrates the system architecture for AlphaServer 1000A systems.
Figure 5–1 System Architecture: AlphaServer 1000A
[Block diagram. Labeled components: CPU card (21064, SROM, 2 MB Bcache); Comanche, Decade, and Epic; memory (16 MB–1 GB); primary PCI bus and secondary PCI bus with PCI slots; PCI-PCI bridge; PCI-EISA bridge; EISA bus with EISA slots; QLogic ISP1020A with Fast-Wide SCSI bus; X-Bus; NS 87332 (serial ports, floppy port, parallel port); 8242 keyboard and mouse controller with buffers; SVGA Cirrus 5422; TOY; Flash ROM (1 MB); EISA config RAM; OCP.]
5.1.1 System Firmware
The system firmware currently provides support for the following operating systems:
Digital UNIX and OpenVMS Alpha are supported under the SRM command line interface, which can be serial or graphical. The SRM firmware is in compliance with the Alpha System Reference Manual (SRM).
Windows NT is supported under the ARC menu interface, which is graphical. The ARC firmware is in compliance with the Advanced RISC Computing Standard Specification (ARC).
The console firmware provides the data structures and callbacks available to booted programs defined in both the SRM and ARC standards.
SRM Command Line Interface
Systems running Digital UNIX or OpenVMS access the SRM firmware through a command line interface (CLI). The CLI is a UNIX-style shell that provides a set of commands and operators, as well as a scripting facility. The CLI allows you to configure and test the system, examine and alter system state, and boot the operating system.
The SRM console prompt is >>>.
Several system management tasks can be performed only from the SRM console command line interface:
All console test and reporting commands are run from the SRM console.
Certain environment variables are changed using the SRM set command. For example: er*0_protocols, ew*0_mode, ew*0_protocols, ocp_text, pk*0_fast, pk*0_host_id.
To run the ECU, you must enter the ecu command. This command will boot the ARC firmware and the ECU software.
ARC Menu Interface
Systems running Windows NT access the ARC console firmware through menus that are used to configure and boot the system, run the EISA Configuration Utility (ECU), the RAID Configuration Utility (RCU), or other adapter configuration utilities, or set environment variables.
You must run the EISA Configuration Utility (ECU) whenever you add, remove, or move an EISA or ISA option in your AlphaServer system. The ECU is run from diskette. Two diskettes are supplied with your system shipment, one for Digital UNIX and OpenVMS and one for Windows NT. For more information about running the ECU, refer to Section 5.6.
If you purchased a StorageWorks RAID Array 200 Subsystem for your server, you must run the RAID Configuration Utility (RCU) to set up the disk drives and logical units. Refer to StorageWorks RAID Array 200 Subsystems Controller Installation and Standalone Configuration Utility User’s Guide, included in your RAID kit.
5.1.2 Switching Between Interfaces
For a few procedures it is necessary to switch from one console interface to the other.
The test command is run from the SRM interface.
The EISA Configuration Utility (ECU) and the RAID Configuration Utility (RCU) are run from the ARC interface.
Switching from SRM to ARC
Two SRM console commands are used to temporarily switch to the ARC console:
The arc command loads the ARC firmware and switches to the ARC menu interface.
The ecu command loads the ARC firmware and then boots the ECU diskette.
For systems that boot the Windows NT operating system, return to the ARC console by setting the os_type environment variable to NT, then enter the init command:
>>> set os_type NT
>>> init
Switching from ARC to SRM
Switch from the ARC console to the SRM console as follows:
1. From the Boot menu, select the Supplementary menu.
2. From the Supplementary menu, select ‘‘Set up the system.’’
3. From the Setup menu, select ‘‘Switch to OpenVMS or UNIX console.’’
4. Select your operating system, then press Enter on ‘‘Setup menu.’’
5. When the ‘‘Power-cycle the system to implement the change’’ message is displayed, press the Reset button. Once the console firmware is loaded and the system is initialized, the SRM console prompt, >>>, is displayed.
5.1.3 Verifying Configuration: ARC Menu Options for Windows NT
The following ARC menu options are used for verifying system configuration on Windows NT systems:
Display hardware configuration (Section 5.1.3.1): Lists the ARC device names for devices installed in the system.
Set default environment variables (Section 5.1.3.2): Allows you to select values for Windows NT firmware environment variables.
5.1.3.1 Display Hardware Configuration
The hardware configuration display provides the following information:
The first screen displays system information, such as the memory, CPU type, speed, NVRAM usage, the ARC version time stamp, and the type of video option detected.
The second screen displays devices detected by the firmware, including the monitor, keyboard, serial ports and devices on the SCSI bus. Tape devices are displayed, but cannot be accessed from the firmware.
The third screen contains the PCI slot information: bus number, device number, function number, vendor ID, revision ID, interrupt vector and device type. All PCI network cards are displayed.
The fourth screen contains the EISA slot information: slot, device, and identifier. All EISA network cards are displayed.
Table 5–1 lists the steps to view the hardware configuration display.
Table 5–1 Listing the ARC Firmware Device Names
Step 1. Action: If necessary, access the Supplementary menu. Result: The system displays the Supplementary menu.
Step 2. Action: Choose ‘‘Display hardware configuration’’ and press Enter. Result: The system displays the hardware configuration screens.
Table 5–2 explains the device names listed on the second screen of the hardware configuration display.
Note
The available boot devices display marks tape devices as not used by the firmware. All PCI and EISA network cards are listed under the PCI and EISA screen displays.
Table 5–2 ARC Firmware Device Names
Name Description
multi(0)key(0)keyboard(0), multi(0)serial(0), multi(0)serial(1)
The multi( ) devices are located on the system module. These devices include the keyboard port and the serial line ports.
eisa(0)video(0)monitor(0), eisa(0)disk(0)fdisk(0)
The eisa( ) devices are provided by devices on the EISA bus. These devices include the monitor and the diskette drive.
scsi(0)disk(0)rdisk(0), scsi(0)cdrom(5)fdisk(0)
The scsi( ) devices are SCSI disk or CD–ROM devices. These examples represent installed SCSI devices. The disk drive is set to SCSI ID 0, and the CD–ROM drive is set to SCSI ID 5. Both devices have a logical unit number of 0.
Example 5–1 Sample Hardware Configuration Display
12/20/1995 9:06:23 AM Wednesday
Alpha Processor and System Information:
Processor ID 21064
Processor Revision 3
System Revision 0x1
Processor Speed 266.02 MHz
Physical Memory 64 MB
Backup Cache Size 2 MB
Extended Firmware Information:
Version: 4.45 (Proto) 951212.0949
NVRAM Environment Usage: 75% (744 of 1024 bytes)
Video Option detected:
BIOS controlled video card
Press any key to continue...
12/20/1995 9:06:23 AM Wednesday
Devices detected by the firmware:
eisa(0)video(0)monitor(0)
multi(0)key(0)keyboard(0)
eisa(0)disk(0)fdisk(0) (Removable)
multi(0)serial(0)
multi(0)serial(1)
scsi(0)disk(0)rdisk(0) (4 Partitions) DEC RZ29B (C)DEC007
scsi(0)cdrom(0)fdisk(0) (Removable) DEC RRD43 (C) DEC 1084
Press any key to continue...
12/20/1995 9:06:23 AM Wednesday
PCI slot information:
Bus     Device  Function  Vendor  Device  Revision  Interrupt  Device
Number  Number  Number    ID      ID      ID        Vector     Type
------  ------  --------  ------  ------  --------  ---------  -----------
0       7       0         8986    482     4         0          EISA bridge
0       8       0         1011    1       2         0          PCI bridge
0       11      0         1011    2       23        13         Ethernet
1       1       0         1000    1       1         19         SCSI
Press any key to continue...
12/20/1995 9:06:23 AM Wednesday
EISA slot information:
Slot  Device  Identifier
0     Other   DEC5000
0     Disk    FLOPPY
Press any key to continue...
5.1.3.2 Set Default Variables
The Set default environment variables option of the Setup menu sets and displays the default Windows NT firmware environment variables.
Caution
Do not edit or delete the default firmware Windows NT environment variables. This can result in corrupted data or make the system inoperable. To modify the values of the environment variables, use the menu options on the ‘‘Set up the system’’ menu.
Table 5–3 lists and explains the default ARC firmware environment variables.
Table 5–3 ARC Firmware Environment Variables
Variable Description
A: The default floppy drive. The default value is eisa()disk()fdisk().
AUTOLOAD The default startup action, either YES (boot) or NO or undefined (remain in Windows NT firmware).
CONSOLEIN The console input device. The default value is multi()key()keyboard()console().
CONSOLEOUT The console output device. The default value is eisa()video()monitor()console().
COUNTDOWN The default time limit in seconds before the system boots automatically when AUTOLOAD is set to YES. The default value is 10.
ENABLEPCIPARITYCHECKING Enables or disables parity checking on the PCI bus. Leaving parity checking disabled prevents machine check errors that can occur if a PCI device does not set parity correctly on the bus. The default value is FALSE (PCI parity checking disabled).
FLOPPY The capacity of the default diskette drive, either 1 (1.2 MB), 2 (1.44 MB), or 3 (2.88 MB).
FLOPPY2 The capacity of an optional second diskette drive, either N (not installed), 1, 2, or 3.
FWSEARCHPATH The search path used by the Windows NT firmware and other programs to locate particular files. The default value is the same as the SYSTEMPARTITION environment variable value.
KEYBOARDTYPE The keyboard language. The default is U.S. (English).
TIMEZONE The time zone in which the system is located. This variable accepts ISO/IEC 9945-1 (POSIX) standard values.
VERSION The firmware version.
Note
The operating system or other programs, for example, the ECU, may create either temporary or permanent environment variables for their own use. Do not edit or delete these environment variables.
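The AUTOLOAD, COUNTDOWN, FLOPPY, and FLOPPY2 entries in Table 5–3 combine into a few simple rules. The following Python sketch encodes those rules for illustration only; the dictionary input and the helper names startup_action and floppy_capacities are hypothetical, and the firmware does not expose such an interface.

```python
# Sketch: interpret a few ARC firmware environment variables from Table 5-3.
# The dict-of-strings input format is a hypothetical choice for illustration.

# FLOPPY / FLOPPY2 codes mapped to drive capacity in MB (Table 5-3).
FLOPPY_CAPACITY_MB = {"1": 1.2, "2": 1.44, "3": 2.88}


def startup_action(env: dict) -> str:
    """Describe the startup action implied by AUTOLOAD and COUNTDOWN."""
    if env.get("AUTOLOAD", "").upper() == "YES":
        # COUNTDOWN is the delay in seconds before the automatic boot; default 10.
        seconds = int(env.get("COUNTDOWN", "10"))
        return f"boot after {seconds} s"
    # NO or undefined: remain in the Windows NT firmware.
    return "remain in firmware"


def floppy_capacities(env: dict) -> tuple:
    """Return (first, second) drive capacities in MB; second is None for N."""
    first = FLOPPY_CAPACITY_MB[env["FLOPPY"]]
    second = FLOPPY_CAPACITY_MB.get(env.get("FLOPPY2", "N"))
    return first, second


print(startup_action({"AUTOLOAD": "YES"}))                 # boot after 10 s
print(floppy_capacities({"FLOPPY": "2", "FLOPPY2": "N"}))  # (1.44, None)
```

Treating a missing FLOPPY2 the same as N mirrors the table's statement that the second drive is optional.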
5.1.4 Verifying Configuration: SRM Console Commands for Digital UNIX and OpenVMS
The following SRM console commands are used to verify system configuration on Digital UNIX and OpenVMS systems:
show config (Section 5.1.4.1): Displays the buses on the system and the devices found on those buses.
show device (Section 5.1.4.2): Displays the devices and controllers in the system.
show memory (Section 5.1.4.3): Displays main memory configuration.
set and show (Section 5.1.4.4): Set and display environment variable settings.
5.1.4.1 show config
The show config command displays all devices found on the system bus, PCI bus, and EISA bus. You can use the information in the display to identify target devices for commands such as boot and test, as well as to verify that the system sees all the devices that are installed. The configuration display includes the following:
Firmware:
The version numbers for the firmware code, PALcode, SROM chip, and CPU are displayed.
Memory:
The memory size and configuration for each bank of memory are displayed.
PCI Bus:
Bus 0, Slot 7 = PCI to EISA bridge chip
Bus 0, Slot 8 = PCI to PCI bridge chip
Bus 2, Slot 0 = SCSI controller on backplane, along with storage drives on the bus.
Bus 2, Slots 1–4 correspond to physical PCI card cage slots on the secondary PCI bus:
Slot 1 = PCI1
Slot 2 = PCI2
Slot 3 = PCI3
Slot 4 = PCI4
In the case of storage controllers, the devices off the controller are also displayed.
Bus 0, Slots 11–13 correspond to physical PCI card cage slots on the primary PCI bus:
Slot 11 = PCI11
Slot 12 = PCI12
Slot 13 = PCI13
In the case of storage controllers, the devices off the controller are also displayed.
EISA Bus:
Slot numbers correspond to EISA card cage slots (1 and 2). For storage controllers, the devices off the controller are also displayed.
For more information on device names, refer to Figure 5–2. Refer to Figure 5–3 for the location of physical slots.
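The slot assignments listed above amount to a lookup from the bus and slot numbers printed by show config to a physical location. The Python sketch below is a hypothetical helper (physical_location is not a console command); its handling of bus numbers above 2 is an assumption based on the PCI-to-PCI bridge example later in this section.

```python
# Sketch: map a (bus, logical slot) pair from show config to the physical
# location described in Section 5.1.4.1. Hypothetical helper.

FIXED = {
    (0, 7): "PCI-to-EISA bridge chip",
    (0, 8): "PCI-to-PCI bridge chip",
    (2, 0): "onboard SCSI controller on the backplane",
}


def physical_location(bus: int, slot: int) -> str:
    if (bus, slot) in FIXED:
        return FIXED[(bus, slot)]
    if bus == 2 and 1 <= slot <= 4:
        return f"PCI{slot} (secondary PCI bus)"
    if bus == 0 and 11 <= slot <= 13:
        return f"PCI{slot} (primary PCI bus)"
    if bus > 2:
        # Assumption: buses above 2 belong to option cards that carry their
        # own PCI-to-PCI bridge, and their logical slots restart at 0.
        return f"logical slot {slot} behind an option-card bridge (bus {bus})"
    raise ValueError(f"bus {bus}, slot {slot} is not listed in Section 5.1.4.1")


print(physical_location(2, 4))   # PCI4 (secondary PCI bus)
print(physical_location(0, 11))  # PCI11 (primary PCI bus)
```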
Synopsis:
show config
Example:
>>> show config
Firmware
SRM Console: X4.4-5365
ARC Console: 4.43p
PALcode: VMS PALcode X5.48-115, OSF PALcode X1.35-84
Serial Rom: X2.1
Processor DECchip (tm) 21064A-6
MEMORY
32 Meg of System Memory
Bank 0 = 32 Mbytes() Starting at 0x00000000
PCI Bus
Bus 00 Slot 07: Intel 8275EB PCI to Eisa Bridge
Bus 00 Slot 08: Digital PCI to PCI Bridge Chip
Bus 02 Slot 00: ISP1020 Scsi Controller
pka0.7.0.2000.0 SCSI Bus ID 7
dka0.0.0.2000.0 RZ29B
dka500.5.0.2000.0 RRD45
Bus 02 Slot 04: DECchip 21040 Network Controller
ewa0.0.0.2004.0 08-00-2B-E5-6A-41
Bus 00 Slot 11: DECchip 21040 Network Controller
ewb0.0.0.11.0 08-00-2B-E1-03-19
EISA Bus Modules (installed)
>>>
Note
The onboard SCSI controller (Qlogic 1020A) is always device pka.
The following show config example illustrates how PCI options that contain a PCI-to-PCI bridge are represented in the display. For each option that contains a PCI-to-PCI bridge, the bus number increments by 1, and the logical slot numbers start anew at 0.
The sample system configuration contains the following options:
Primary Bus
Physical PCI slot 11: KZPSM option with PCI-to-PCI bridge
Physical PCI slot 12: KZPSM option with PCI-to-PCI bridge
Physical PCI slot 13: (empty)
Secondary Bus
Physical PCI slot 1: (empty)
Physical PCI slot 2: (empty)
Physical PCI slot 3: KZPSM option with PCI-to-PCI bridge
Physical PCI slot 4: NCR810 SCSI controller
Example:
>>> show config
Firmware
SRM Console: X4.4-5365
ARC Console: 4.43p
PALcode: VMS PALcode X5.48-115, OSF PALcode X1.35-84
Serial Rom: X2.1
Processor DECchip (tm) 21064A-6
MEMORY
32 Meg of System Memory
Bank 0 = 32 Mbytes() Starting at 0x00000000
PCI Bus
Bus 00 Slot 07: Intel 8275EB PCI to Eisa Bridge
Bus 00 Slot 08: Digital PCI to PCI Bridge Chip
Bus 02 Slot 00: ISP1020 SCSI Controller
pka0.7.0.2000.0 SCSI Bus ID 7
dka0.0.0.2000.0 RZ29B
dka500.5.0.2000.0 RRD45
Bus 02 Slot 03: Digital PCI to PCI Bridge Chip
Bus 3 Slot 0
ewa0.0.0.3000.0 08-00-3C-E6-6B-41
Bus 3 Slot 1 ISP1020 Controller
pkb0.7.0.3001.0
dkb0.0.0.3001.0
dkb100.0.0.3001.0
Bus 02 Slot 04: NCR 810 SCSI Controller
dkc100.1.0.2004.0
dkc200.2.0.2004.0
Bus 00 Slot 11: Digital PCI to PCI Bridge Chip
Bus 04 Slot 0: DEC 21040 Network Controller
ewb0.0.0.4000.0 08-12-2E-C3-04-92
Bus 04 Slot 1: ISP1020 Controller
pkd0.7.0.4001.0
Bus 00 Slot 12: Digital PCI to PCI Bridge Chip
Bus 05 Slot 0: DEC 21040 Network Controller
ewc0.0.0.5000.0 08-24-3D-C6-08-04
Bus 05 Slot 1: ISP1020 Controller
pke0.7.0.5001.0
EISA Bus Modules (installed)
Slot 01: MLX0075 SCSI Controller
dkc0.7.0.2004.0
>>>
5.1.4.2 show device
The show device command displays the devices and controllers in the system.
The device name convention is shown in Figure 5–2.
Figure 5–2 Device Name Convention
Example: dka0.0.0.0.0. The name consists of the driver ID (dk), the adapter ID (a), and the device unit number (0), followed by the bus node number, channel number, logical slot number, and hose number, separated by periods.
Driver ID: Two-letter port or class driver designator:
PU: DSSI port, DU: DSSI disk, MU: DSSI tape
PK: SCSI port, DK: SCSI disk, MK: SCSI tape
EW: Ethernet port (TULIP chip, DECchip 21040)
ER: Ethernet port (LANCE chip, DEC 4220)
DR: RAID-set device
DV: Floppy drive
Adapter ID: One-letter adapter designator (A, B, C...).
Device Unit Number: Unique device unit number. SCSI unit numbers are forced to 100 x Node ID.
Bus Node Number: Bus node ID.
Channel Number: Used for multi-channel devices.
Logical Slot Number: For EISA options, corresponds to EISA option physical slot numbers (1 and 2). For PCI options:
Slot 7 = PCI to EISA bridge chip
Slot 8 = PCI to PCI bridge chip
Slot 0 = SCSI controller on system backplane
Slots 1–4 = (Secondary bus) Correspond to physical PCI option slots PCI1, PCI2, PCI3, and PCI4
Slots 11–13 = (Primary bus) Correspond to physical PCI option slots PCI11, PCI12, and PCI13
Hose Number: 0 = PCI_0 (32-bit PCI); 1 = EISA.
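The fields in Figure 5–2 can be split mechanically. This Python sketch parses a name such as dka500.5.0.2000.0 into its fields. The DeviceName class and parse_device_name function are hypothetical, and decoding a logical slot of 1000 or more as bus * 1000 + slot (for example, 2004 for bus 2, slot 4) is inferred from the show config examples rather than stated in this manual.

```python
import re
from dataclasses import dataclass

# Two-letter driver designators listed in Figure 5-2.
DRIVERS = {
    "pu": "DSSI port", "du": "DSSI disk", "mu": "DSSI tape",
    "pk": "SCSI port", "dk": "SCSI disk", "mk": "SCSI tape",
    "ew": "Ethernet port (TULIP)", "er": "Ethernet port (LANCE)",
    "dr": "RAID-set device", "dv": "Floppy drive",
}
HOSES = {0: "PCI_0 (32-bit PCI)", 1: "EISA"}

# driver(2 letters) adapter(1 letter) unit . node . channel . slot . hose
NAME_RE = re.compile(r"^([a-z]{2})([a-z])(\d+)\.(\d+)\.(\d+)\.(\d+)\.(\d+)$")


@dataclass
class DeviceName:
    driver: str
    adapter: str
    unit: int
    node: int
    channel: int
    slot: int
    hose: int

    def bus_and_slot(self) -> tuple:
        # Assumption from the show config examples: slot = bus * 1000 + slot.
        return divmod(self.slot, 1000)


def parse_device_name(name: str) -> DeviceName:
    match = NAME_RE.match(name.lower())
    if match is None:
        raise ValueError(f"not an SRM device name: {name!r}")
    driver, adapter, *numbers = match.groups()
    return DeviceName(driver, adapter, *(int(n) for n in numbers))


dev = parse_device_name("dka500.5.0.2000.0")
print(DRIVERS[dev.driver], dev.unit, dev.node, dev.bus_and_slot(), HOSES[dev.hose])
# SCSI disk 500 5 (2, 0) PCI_0 (32-bit PCI)
```

The parsed unit number also illustrates the SCSI rule in the figure: unit 500 is 100 x node 5.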
Synopsis:
show device [device_name]
Argument:
[device_name] The device name or device abbreviation. When abbreviations or wildcards are used, all devices that match the type are displayed.
Example:
>>> show device
dka400.4.0.6.0 DKA400 RRD43 2893
dva0.0.0.0.1 DVA0
era0.0.0.2.1 ERA0 08-00-2B-BC-93-7A
pka0.7.0.6.0 PKA0 SCSI Bus ID 7
>>>
From left to right, the columns in the display are the console device name, the node name (alphanumeric, up to 6 characters), the device type, and the firmware version (if known).
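When show device output is captured to a file, a host-side filter can approximate the abbreviation and wildcard argument. The Python sketch below makes an assumption about matching behavior and does not describe the console's own rules: a pattern without wildcards is treated as a prefix, and wildcards follow Python's fnmatch module. The helper name select_devices is hypothetical.

```python
from fnmatch import fnmatchcase


def select_devices(names, pattern):
    """Filter console device names by abbreviation or wildcard.

    Assumption: a pattern with no wildcard characters is treated as a
    prefix, so "dk" selects every SCSI disk; "*" and "?" follow fnmatch.
    """
    pattern = pattern.lower()
    if not any(ch in pattern for ch in "*?["):
        pattern += "*"
    return [n for n in names if fnmatchcase(n.lower(), pattern)]


# Device names taken from the show device example above.
devices = ["dka400.4.0.6.0", "dva0.0.0.0.1", "era0.0.0.2.1", "pka0.7.0.6.0"]
print(select_devices(devices, "dk"))     # ['dka400.4.0.6.0']
print(select_devices(devices, "*.6.0"))  # ['dka400.4.0.6.0', 'pka0.7.0.6.0']
```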
5.1.4.3 show memory
The show memory command displays information for each bank of memory in the system.
Synopsis:
show memory
Example:
>>> show memory
48 Meg of System Memory
Bank 0 = 16 Mbytes(4 MB Per Simm) Starting at 0x00000000
Bank 1 = 16 Mbytes(4 MB Per Simm) Starting at 0x01000000
Bank 2 = 16 Mbytes(4 MB Per Simm) Starting at 0x02000000
Bank 3 = No Memory Detected
>>>
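The bank starting addresses in the show memory example follow from adding up the sizes of the preceding banks: 16 MB is 0x01000000 bytes, so Bank 1 starts at 0x01000000 and Bank 2 at 0x02000000. The Python sketch below reproduces that arithmetic, assuming populated banks are mapped contiguously in bank order. The helper name bank_layout is hypothetical, and the SIMM count is simply the bank size divided by the per-SIMM size shown in the display.

```python
MB = 1 << 20


def bank_layout(bank_sizes_mb, simm_mb):
    """Return (bank, start_address, simm_count) for each populated bank.

    Assumes banks are mapped contiguously in bank order, as in the
    show memory example; a size of 0 means "No Memory Detected".
    """
    layout, base = [], 0
    for bank, size in enumerate(bank_sizes_mb):
        if size == 0:
            continue
        layout.append((bank, base, size // simm_mb))
        base += size * MB
    return layout


for bank, start, simms in bank_layout([16, 16, 16, 0], simm_mb=4):
    print(f"Bank {bank} starts at 0x{start:08X} ({simms} SIMMs)")
# Bank 0 starts at 0x00000000 (4 SIMMs)
# Bank 1 starts at 0x01000000 (4 SIMMs)
# Bank 2 starts at 0x02000000 (4 SIMMs)
```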
5.1.4.4 Setting and Showing Environment Variables
The environment variables described in Table 5–4 are typically set when you are configuring a system.
Synopsis:
set [-default] [-integer] [-string] envar value
show envar
Note
Whenever you use the set command to reset an environment variable, you must initialize the system to put the new setting into effect. You initialize the system by entering the init command or pressing the Reset button.
Arguments:
envar The name of the environment variable to be modified.
value The value that is assigned to the environment variable. This may be an ASCII string.
Options:
-default Restores variable to its default value.
-integer Creates variable as an integer.
-string Creates variable as a string (default).
Examples:
>>> set bootdef_dev eza0
>>> show bootdef_dev eza0
>>> show auto_action boot
>>> set boot_osflags 0,1
>>>
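The set, show, and init rules described in this section can be modeled in a few lines. The Python sketch below is a hypothetical host-side model, not console code. In the model, show reflects a set immediately, as the bootdef_dev example does, while the value the firmware acts on changes only after initialization. Table 5–4 documents rejection of unlisted values for auto_action; applying the same check to the other listed variables is an assumption.

```python
class SrmEnvironment:
    """Hypothetical host-side model of the set/show/init rules.

    Allowed values come from Table 5-4 and are compared case-insensitively.
    Variables without a listed value set accept any string.
    """

    ALLOWED = {
        "auto_action": {"boot", "halt", "restart"},
        "bus_probe_algorithm": {"old", "new"},
        "console": {"graphics", "serial"},
        "os_type": {"vms", "unix", "nt"},
        "pci_parity": {"on", "off"},
    }

    def __init__(self):
        self.stored = {}     # value reported by show
        self.effective = {}  # value the firmware acts on

    def set(self, envar, value):
        allowed = self.ALLOWED.get(envar)
        if allowed is not None and value.lower() not in allowed:
            # Rejected values leave the variable unchanged.
            raise ValueError(f"{envar}: invalid value {value!r}")
        self.stored[envar] = value

    def show(self, envar):
        return self.stored.get(envar)

    def init(self):
        # New settings take effect only after init (or the Reset button).
        self.effective = dict(self.stored)


env = SrmEnvironment()
env.set("bootdef_dev", "eza0")
print(env.show("bootdef_dev"), env.effective.get("bootdef_dev"))  # eza0 None
env.init()
print(env.effective["bootdef_dev"])  # eza0
```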
Table 5–4 Environment Variables Set During System Configuration
Variable Attributes Function
auto_action NV,W The action the console should take following an error halt or power failure. Defined values are:
BOOT: Attempt bootstrap.
HALT: Halt, enter console I/O mode.
RESTART: Attempt restart. If restart fails, try boot.
No other values are accepted. Other values result in an error message, and the variable remains unchanged.
Key to variable attributes:
NV: Nonvolatile. The last value saved by system software or set by console commands is preserved across system initializations, cold bootstraps, and long power outages.
W: Warm nonvolatile. The last value set by system software is preserved across warm bootstraps and restarts.
bootdef_dev NV The device or device list from which booting is to be attempted when no path is specified on the command line. Set at the factory to the disk with Factory Installed Software; otherwise NULL.
boot_file NV,W The default file name used for the primary bootstrap when no file name is specified by the boot command. The default value when the system is shipped is NULL.
boot_osflags NV,W Default additional parameters to be passed to system software during booting if none are specified by the boot command.
OpenVMS: On the OpenVMS Alpha operating system, these additional parameters are the root number and boot flags. The default value when the system is shipped is NULL.
Digital UNIX: The following parameters are used with the Digital UNIX operating system:
a Autoboot. Boots /vmunix from bootdef_dev and goes to multiuser mode. Use this for a system that should come up automatically after a power failure.
s Stop in single-user mode. Boots /vmunix to single-user mode and stops at the # (root) prompt.
i Interactive boot. Requests the name of the image to boot from the specified boot device. Other flags, such as -kdebug (to enable the kernel debugger), may be entered using this option.
D Full dump; implies ‘‘s’’ as well. By default, if Digital UNIX crashes, it completes a partial memory dump. Specifying ‘‘D’’ forces a full dump at system crash.
Common settings are a (autoboot) and Da (autoboot and create full dumps if the system crashes).
bus_probe_algorithm NV Specifies a bus probe algorithm for the system.
OLD: Systems running OpenVMS V6.1 or earlier must set the bus probe algorithm to old. Failure to do so could result in bugcheck errors when booting from an EISA device.
NEW: Systems running Digital UNIX V3.0B or later or OpenVMS V6.2 or later should be set to new. This setting improves the bus sizing and configuration for Digital UNIX systems.
Not applicable for systems running Windows NT.
console NV Sets the device on which power-up output is displayed.
GRAPHICS: Sets the power-up output to be displayed at a graphics terminal or device connected to the VGA module at the rear of the system.
SERIAL: Sets the power-up output to be displayed on the device that is connected to the COM1 port at the rear of the system.
ew*0_mode NV Sets the Ethernet controller to the default Ethernet device type.
‘‘aui’’: Sets the default Ethernet device to AUI or thinwire.
‘‘twisted’’: Sets the default Ethernet device to 10Base-T (twisted-pair).
‘‘auto’’: Reads the device connected to the Ethernet port and sets the default to the appropriate Ethernet device type.
er*0_protocols, ew*0_protocols NV Determines which network protocols are enabled for booting and other functions.
‘‘mop’’: Sets the network protocol to MOP, the setting typically used for systems using the OpenVMS operating system.
‘‘bootp’’: Sets the network protocol to bootp, the setting typically used for systems using the Digital UNIX operating system.
‘‘bootp,mop’’: When the settings are used in a list, the mop protocol is attempted first, followed by bootp.
os_type NV Sets the default operating system.
‘‘vms’’ or ‘‘unix’’: Sets the system to boot the SRM firmware.
‘‘nt’’: Sets the system to boot the ARC firmware.
pci_parity NV Disable or enable parity checking on the PCI bus.
ON: PCI parity enabled.
OFF: PCI parity disabled.
Some PCI devices do not implement PCI parity checking, and some have a parity-generating scheme in which the parity is sometimes incorrect or is not fully compliant with the PCI specification. In such cases, the device functions properly as long as parity is not checked. The default value is ON (PCI parity enabled).
Note
If you disable PCI parity, no parity checking is implemented for any PCI device, even those devices in full compliance with the PCI specification.
pk*0_fast NV Sets whether Fast SCSI devices on a SCSI controller perform in standard or fast mode.
0: Sets the default speed for devices on the controller to standard SCSI. If a controller is set to standard SCSI mode, both standard and Fast SCSI devices will perform in standard mode.
1: Sets the default speed for devices on the controller to Fast SCSI mode. A controller connected to both standard and Fast SCSI devices automatically runs each device at the appropriate rate, either fast or standard mode.
pk*0_host_id NV Sets the controller host bus node ID to a value between 0 and 7.
0 to 7: Assigns the bus node ID for the specified host adapter.
pk*0_soft_term NV Enables or disables SCSI terminators. This environment variable applies to systems using the QLogic ISP1020 SCSI controller.
The QLogic ISP1020 SCSI controller implements the 16-bit wide SCSI bus. The QLogic module has two terminators, one for the low 8 bits and one for the high 8 bits. There are five possible values:
off: Turns off both low 8 bits and high 8 bits.
low: Turns on low 8 bits and turns off high 8 bits.
high: Turns on high 8 bits and turns off low 8 bits.
on: Turns on both low 8 bits and high 8 bits.
diff: Places the bus in differential mode.
tga_sync_green NV Sets the location of the SYNC signal generated by the ZLXp-E PCI graphics accelerator (PBXGA). This environment variable must be set correctly so that the graphics monitor will synchronize. The parameter is a bit mask, where the least significant bit (LSB) sets the vertical SYNC for the first graphics card found, the second bit for the second card found, and so on.
The command set tga_sync_green 00 sets all graphics cards to synchronize on a separate vertical SYNC line, as required by some monitors. See the monitor documentation for all other information.
ff: Synchronizes the graphics monitor on systems that do not use a ZLXp-E PCI graphics accelerator (default setting).
00: Synchronizes the graphics monitor on systems with a ZLXp-E PCI graphics accelerator.
tt_allow_login NV Enables or disables login to the SRM console firmware on alternative console ports.
0: Disables login on alternative console ports.
1: Enables login on alternative console ports (default setting).
If the console output device is set to ‘‘serial’’, set tt_allow_login 1 allows you to log in on the primary COM1 port, the alternate COM2 port, or the graphics monitor.
If the console output device is set to ‘‘graphics’’, set tt_allow_login 1 allows you to log in through either the COM1 or COM2 console port.
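Because tga_sync_green is a bit mask with one bit per graphics card, in the order the cards are found, its value can be computed from per-card settings. The Python sketch below assumes at most eight cards (two hex digits, as in the ff and 00 values) and uses a hypothetical helper name. A set bit keeps the ff default behavior for that card, and a cleared bit selects the separate vertical SYNC line used by the 00 setting.

```python
def tga_sync_green_mask(default_sync):
    """Build a tga_sync_green value from per-card settings.

    default_sync[i] describes the i-th graphics card found (bit i, LSB first).
    True sets the bit, as in the ff default; False clears it, which selects the
    separate vertical SYNC line used with the 00 setting.
    Assumption: at most 8 cards, matching the two-hex-digit values in Table 5-4.
    """
    if len(default_sync) > 8:
        raise ValueError("more than 8 graphics cards")
    mask = 0
    for bit, use_default in enumerate(default_sync):
        if use_default:
            mask |= 1 << bit
    return f"{mask:02x}"


print(tga_sync_green_mask([False]))        # 00
print(tga_sync_green_mask([True] * 8))     # ff
print(tga_sync_green_mask([False, True]))  # 02
```

A result such as 02 would then be entered as set tga_sync_green 02 and put into effect with init.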
Note
Whenever you use the set command to reset an environment variable, you must initialize the system to put the new setting into effect. Initialize