Copyright Hewlett-Packard Company 1998. All Rights Reserved.
Reproduction, adaptation, or translation without prior written
permission is prohibited, except as allowed under the copyright laws.
The information contained in this document is subject to change without
notice.
Hewlett-Packard makes no warranty of any kind with regard to this
material, including, but not limited to, the implied warranties of
merchantability and fitness for a particular purpose. Hewlett-Packard
shall not be liable for errors contained herein or for incidental or
consequential damages in connection with the furnishing, performance
or use of this material.
This document describes the offline diagnostics for V2500 servers. It is
not intended to be a tutorial or troubleshooting guide but a reference
guide that contains information on all utilities and scripts used to
troubleshoot these systems.
Notational conventions
This section describes notational conventions used in this book.
bold monospace
    In command examples, bold monospace identifies input that must be
    typed exactly as shown.
monospace
    In paragraph text, monospace identifies command names, system calls,
    and data structures and types.
    In command examples, monospace identifies command output, including
    error messages.
italic
    In paragraph text, italic identifies titles of documents.
    In command syntax diagrams, italic identifies variables that you must
    provide.
The following command example uses brackets to indicate that the variable
output_file is optional:
command input_file [output_file]
In command syntax diagrams, text surrounded by curly brackets indicates a
choice. The choices available are shown inside the curly brackets and
separated by the pipe sign (|).
The following command example indicates that you can enter either a or b:
command {a | b}
Keycap
    Keycap indicates the keyboard keys you must press to execute the
    command example.
NOTE
    A note highlights important supplemental information.
CAUTION
    A caution highlights procedures or information necessary to avoid
    damage to equipment, damage to software, loss of data, or invalid test
    results.
1 Introduction
This chapter presents an overview of the diagnostic mechanism for
V2500 servers.
Utilities board
The diagnostic mechanism in the V2500 servers is centered around the
Stingray Core Utilities board (SCUB). The SCUB is mounted under the
MidPlane Interconnect board (MIB) toward the front of the system. See
Figure 1.
Figure 1 Location of the Utilities board (callouts: Power board, MidPlane,
Utilities board)
The following devices connect to the Utilities board:
• Core logic bus
• Environmental sensors
• Test points
• Liquid crystal display (LCD)
• Attention lightbar
• Teststation
The teststation connects to the system via the ethernet and RS232
connections. It is used to configure and run diagnostics on the system.
A system will boot and operate without a teststation, and failure of
the teststation will not cause interruption of the system.
Figure 2 shows the Utilities board functional layout.
The following hardware components comprise the Utilities board:
• Core logic—Contains initialization, booting firmware, controller for
ethernet and RS-232 interface, and various memories.
• Power-On circuit—Controls powering up the entire system.
Environmental sensors are located throughout the system and
connect to the SMUC. The SMUC latches interrupts from these
sensors as well as other interrupts. The SMUC and the power-on
circuit together control system power-up. The power-on circuit drives
the attention lightbar diagnostic display through which the operator
can determine power-on status.
• Stingray Processor Utilities controller (SPUC)—Interfaces to the core
logic bus.
The SPUC connects to the two core logic buses. Each bus connects up
to four Stingray Processor Agent Controllers (SPACs).
• JTAG (Joint Test Action Group) interface—Supports a teststation for
running diagnostics. The V2500 servers use a test method called
scanning to test boards and other hardware units.
The microprocessor-controlled JTAG interface captures incoming
command packets and sends out scan information packets across the
ethernet connection to the teststation. Through the teststation
connection, one can read and write every CSR in the system.
Figure 2 Utilities board (block diagram: core logic, SPUC, SMUC, power-on
circuit, JTAG interface, and clock logic on the Utilities board; two core
logic buses, each to four SPACs; Ethernet and RS232 links to the
teststation; liquid crystal display and LED display; MIB)
Core logic
The core logic contains initialization and booting firmware and is
described in the following sections.
Flash memory
The core logic contains a four-MByte electrically erasable programmable
read only memory (EEPROM) storage for Processor-Dependent Code
(PDC). PDC consists of Power-On Self Test (POST) and Open Boot
PROM (OBP). The V2500 server uses these two components plus
additional firmware called spp_pdc that is laid over OBP and interfaces
OBP to HP-UX. Flash memory also contains all diagnostic tests, utilities,
and scripts.
Flash memory is configured as 512-KByte addresses by 32 data bits with
only 32-bit read and write accesses allowed. EEPROM devices are used
for flash memory so that it may be rewritten for field upgrades. It can
also be written when the SPUC is scanned.
Nonvolatile static RAM
The core logic section contains a nonvolatile battery-backed 128-Kbyte
RAM (NVRAM) for storing system log and configuration information.
This RAM is byte addressable and can be accessed even after power
failures.
DUART
A Dual Universal Asynchronous Receiver-Transmitter (DUART)
provides two RS232 serial ports and a single parallel port. One serial port
provides an interface to a terminal used as a local console to analyze
problems, reconfigure the system, and provide other user access. The
parallel port of the DUART drives the LCD. The second RS232 port can
be used for a modem for field service.
RAM
Random access memory (RAM) provides support for the core system
functions. When the system powers up, the processors operate out of this
RAM to run self test and configure the rest of the node. Once the system
is fully configured, the processors execute out of main memory. The RAM
is byte addressable and is 512 KBytes, configured as 128-KByte
addresses by 32 data bits.
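As a quick check of the RAM figures above, the capacity follows from the number of addresses multiplied by the word width. The sketch below is illustrative only: capacity_bytes is a hypothetical helper, not part of any V2500 tool, and it assumes that "K" means 1024.

```python
K = 1024  # assumption: "KByte" is taken to mean 1024 bytes


def capacity_bytes(addresses: int, data_bits: int) -> int:
    """Capacity of a memory with `addresses` locations of `data_bits` bits each."""
    return addresses * data_bits // 8


# Core logic RAM: 128K addresses, each 32 data bits wide.
ram_bytes = capacity_bytes(128 * K, 32)
print(ram_bytes // K, "KBytes")  # prints "512 KBytes"
```

The result agrees with the 512 KBytes stated for the core logic RAM.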
Console ethernet
The ethernet I/O port provides a connection to the teststation over
LAN1.
Attention lightbar and LCD
The attention light bar displays environmental information, such as the
source of an environmental error that caused the Utilities board to power
down the node.
The liquid crystal display provides basic system information. The core
logic drives the LCD through the parallel port on the DUART. The
attention lightbar and LCD are detailed in “System displays” on page 12.
COP interface
A serial EEPROM (referred to as the COP chip) is located on each major
board and holds information such as serial number, assembly revision, wire
revision, truncated board part number, and so on. The SMUC connects to the
COP bus selector (CBS) chip on the MIB, allowing each COP chip in a node to
be read.
SPUC
The SPUC provides interrupts and error messages to and receives
control messages from the processors through two 18-bit, bidirectional
buses. Each bus connects up to four SPACs. The SPUC also provides core
logic bus arbitration for the processors.
SMUC and Power-on
The SMUC registers system environmental parameters. It connects to
the utilities bus so that processors can monitor the node by accessing the
appropriate CSRs. The SMUC works in conjunction with the power-on
circuit to power up the entire system, and it can operate when the rest of
the node is powered off or in some indeterminate state. The SMUC drives
the environment LCD display. The teststation can also read the
environmental LCD display using the sppdsh utility. See “sppdsh” on
page 268.
SMUC environmental monitoring
The following environmental conditions are monitored:
• ASIC installation error sensing
• FPGA configuration and status
• Thermal sensing
• Fan sensing
• Power failure sensing
• 48-V failure
• 48-V maintenance
• Ambient air temperature sensing
• Power-on
Table 1 Environmental conditions monitored by the SMUC and power-on circuit

Condition              Type                   Action
ASIC Not Installed OK  Environmental error    Power not turned on, LED indication
FPGA not OK            Environmental error    Power not turned on, LED indication
48-V Fail              Environmental error    Power turned off, LED indication
MIB power fail         Environmental error    Power turned off, LED indication
Board over temp        Environmental error    Power off in one second, LED indication, interrupt
Fan not turning        Environmental error    Power off in one second, LED indication, interrupt
Ambient air hot        Environmental error    Power off in one second, LED indication, interrupt
Other power fail       Environmental error    Power off in one second, LED indication, interrupt
Ambient air warm       Environmental warning  LED indication, interrupt
48-Volt maintenance    Environmental warning  LED indication, interrupt
Hard error             Hard error             LED indication, interrupt
Environmental condition detected by power-on function
The power-on function detects environmental errors (such as ASIC Not
Installed OK or FPGA Not OK). It does not turn on power to the node
until the conditions are corrected. It also detects environmental errors
such as 48-V Fail while the system is powering up and MIB Power Fail
after the system has powered up. If a failure is detected in these two
cases, the power-on circuit turns off power to the system.
Environmental warnings such as 48-Volt maintenance are also detected
by the power-on circuit.
In all cases, the power-on circuit sets an environmental attention light
bar code. The code is prioritized so that it displays the highest priority
error or warning. See “Attention light bar” on page 16 for a list of codes.
Environmental conditions detected by SMUC
The SMUC detects most of the environmental conditions. It samples
error conditions during a time period derived from a local 10-Hz clock
that drives the power-on circuit. It registers all the environmental error
conditions twice and then logically ORs them together. If the conditions
persist for 200 ms, the environmental error bit is set, and an
environmental error interrupt is sent to the SPUC, which sends it on to
the processors. The SMUC then waits 1.2 seconds and commands the
power-on circuit to power down the system.
This same procedure exists for an environmental warning, except that
an environmental warning interrupt is sent and the power-on circuit
does not power down the system.
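The timing described above can be sketched as a small simulation. This is an illustration under stated assumptions, not the SMUC logic itself: it assumes one sample per period of the 10-Hz clock (100 ms), treats a condition seen in two consecutive samples as having persisted for 200 ms, and counts time from the start of the sample sequence. The function name simulate and its arguments are hypothetical.

```python
SAMPLE_PERIOD_MS = 100      # one period of the 10-Hz clock (assumed sampling rate)
PERSIST_MS = 200            # condition must persist this long before it is flagged
POWER_DOWN_DELAY_MS = 1200  # delay between the error interrupt and power-down


def simulate(samples, is_error=True):
    """Return (interrupt_time_ms, power_down_time_ms) for a sampled condition.

    `samples` holds one boolean per 100 ms sample. A warning (is_error=False)
    raises an interrupt but never powers the system down. Either value is
    None if the event does not occur.
    """
    needed = PERSIST_MS // SAMPLE_PERIOD_MS  # consecutive active samples required
    run = 0
    for i, active in enumerate(samples):
        run = run + 1 if active else 0
        if run == needed:
            interrupt_ms = (i + 1) * SAMPLE_PERIOD_MS
            power_down_ms = interrupt_ms + POWER_DOWN_DELAY_MS if is_error else None
            return interrupt_ms, power_down_ms
    return None, None


print(simulate([False, True, True]))         # (300, 1500)
print(simulate([True, False, True, False]))  # (None, None): never persists
print(simulate([True, True], is_error=False))  # (200, None): warning only
```

A condition that clears between samples resets the count, which is why the second call never raises an interrupt.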
The environmental error interrupt and the 1.2 second delay provide the
system adequate time to read CSRs to determine the cause of the error,
log the condition in NVRAM, and display the condition on the attention
lightbar.
After the system is powered down, the Utilities board is still powered up,
but all outputs are disconnected from the system.
Environmental control
The Utilities board performs the following functions to control the node
environment.
Power-on
When the power switch is turned on, the outputs of the 48-Volt power
supplies become active. Several hundred milliseconds after the Utilities
board 5-Volt supply reaches its nominal level, the power-on circuit starts
powering up the other DC-to-DC converters of the node in succession.
The power-on circuit does not power up the node if an ASIC is installed
incorrectly (see “ASIC installation error” on page 18) or if an FPGA is not
configured (see “FPGA configuration and status” on page 19). It keeps
the system powered up unless an environmental condition occurs that
warrants a power-down.
Voltage margining
Voltage margining is divided into four groups called quadrants. The user
can margin quadrants separately. When the upper margin is set for a
quadrant, for example, all boards in that quadrant are margined upper.
Clock margining
Parallel ports on the core logic microprocessor select the nominal, upper,
or external clock that drives the node.
JTAG interface
The JTAG interface supports a teststation and a mechanism to fan out
JTAG to all the boards in a node. It is used only for testing.
JTAG functions are described in the following sections.
Teststation interface
The teststation can be a PA-RISC based workstation. The interface to
the teststation is an ethernet AUI port for flexibility in connecting to
many workstations. It is also easily expandable.
DC test of a node
To perform the DC test, the Test Bus Controller (TBC) first scans data
to all boards in a node. Then each JTAG device performs a capture step
that completes the movement of the test data from the driver to the
receiver. This step is described in the JTAG 1149.1 specification.
AC test of a node
To perform the AC test, the Test Bus Controller (TBC) scans data to all
boards in a node and then loads an AC test instruction into all ASICs
on one board at a time. The scan ring on each board is paused.
Once all boards have been loaded with the AC test instruction, the TBC
takes all boards out of pause mode simultaneously, causing them all to
exit update together and execute the AC test.
The AC test enables clocks inside the ASICs so that they test internal
and external paths at the system clock rate. They all execute on the same
system clock.
JTAG fanout
The teststation interface is thin ethernet. In addition to the teststation,
this port is also used for the console ethernet. There is one cable that
connects to all the nodes and to the teststation (if it exists) and to
whatever device or network that will display the console.
System displays
The V2500 server provides two means of displaying status and error
reporting: an LCD and an Attention light bar.
Figure 3 System displays (callouts: DC OFF, DC ON, CONSOLE ENABLE,
CONSOLE SECURE, TOC, LCD display, Attention light bar)
Front panel LCD
The front panel is a 20-character by 4-line liquid crystal display as
shown in Figure 4.
Figure 4 Front panel LCD
0 (0,0)
MIII IIII IIII IIII
IIII IIII IIII IIII
abcedfghijklr
When the node key switch is turned on, the LCD powers up but is
initially blank.
Power-On Self Test (POST) then starts displaying output on the LCD. The
following sections describe the output shown in Figure 4.
Node status line
The Node Status Line shows the node ID in both decimal and X, Y
topology formats.
Processor status line
The processor status line shows the current run state for each processor
in the node. Table 2 shows the initialization step code definitions and
Table 3 shows the run-time status codes. The M in the first processor
status line stands for the monarch processor.
Table 2 Processor initialization steps

Step  Description
0     Processor internal diagnostic register initialization.
1     Processor early data cache initialization.
2     Processor stack SRAM test (optional).
3     Processor stack SRAM initialization.
4     Processor BIST-based instruction cache initialization.
5     Processor BIST-based data cache initialization.

Table 3 Processor run-time status codes

Code  Description
R     RUN: Performing system initialization operations.
I     IDLE: Processor is in an idle loop, awaiting a command.
M     MONARCH: The main POST initialization processor.
H     HPMC: Processor has detected a high priority machine check (HPMC).
T     TOC: Processor has detected a transfer of control (TOC).
S     SOFT_RESET: Processor has detected a soft RESET.
D     DEAD: Processor has failed initialization or selftest.
d     DECONFIG: Processor has been deconfigured by POST or the user.
-     EMPTY: Empty processor slot.
?     UNKNOWN: Processor slot status is unknown.
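To make the processor status line easier to read, the codes above can be applied one character at a time. The sketch below is a hypothetical helper, not a V2500 utility. It assumes that each non-space character on the status line stands for one processor slot and that the spaces only group the characters visually.

```python
# Run-time status codes, as listed for the processor status line.
RUN_STATES = {
    "R": "RUN", "I": "IDLE", "M": "MONARCH", "H": "HPMC", "T": "TOC",
    "S": "SOFT_RESET", "D": "DEAD", "d": "DECONFIG", "-": "EMPTY",
    "?": "UNKNOWN",
}
# Initialization step codes 0 through 5.
INIT_STEPS = set("012345")


def decode_status_line(line):
    """Return one state name per processor slot shown on the status line."""
    states = []
    for ch in line.replace(" ", ""):  # assumption: spaces are separators only
        if ch in INIT_STEPS:
            states.append("INIT_STEP_" + ch)
        else:
            states.append(RUN_STATES.get(ch, "UNRECOGNIZED"))
    return states


states = decode_status_line("MIII IIII IIII IIII")
print(states[0], states.count("IDLE"))  # MONARCH 15
```

Applied to the first processor status line shown in Figure 4, this reports one monarch processor and fifteen idle processors.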
Message display line
The message display line shows the POST initialization progress. This is
updated by the monarch processor. The system console also shows detail
for some of these steps. Table 4 shows the code definitions.
Table 4 Message display line

Message display code  Description
a                     Utilities board (SCUB) hardware initialization.
b                     Processor initialization/selftest rendezvous.
c                     Utilities board (SCUB) SRAM test (optional).
d                     Utilities board (SCUB) SRAM initialization.
e                     Reading Node ID and serial number.
f                     Verifying non-volatile RAM (NVRAM) data structures.
g                     Probing system hardware (ASICs).
h                     Initializing system hardware (ASICs).
i                     Probing processors.
j                     Initializing, and optionally testing, remaining SCUB SRAM.
k                     Probing main memory.
l                     Initializing main memory.
r                     Enabling system error hardware.
Power supply indicators
When the keyswitch on the operator panel is in the DC ON position, both
the AC power (amber) LED and the DC power (green) LED on each of the
power supplies should be on.
Attention light bar
The Attention light bar is located at the top left corner on the front of the
HP 9000 V2500 server as shown in Figure 3 on page 12. This light bar
displays system status in three ways:
• Off—system powered down
• Steady on—system powered up
• Flashing—error condition
The SMUC prioritizes system environmental errors and warnings and
passes the information to the power-on circuit. This circuit prioritizes the
6-bit field with its environmental conditions and produces a 7-bit field
plus an attention bit (ATTN) that drives the attention light bar. ATTN is
on if there is an environmental warning. Second-level registers in the
SMUC drive the attention light bar.
In general, power-on-detected errors have higher priority than
SMUC-detected errors, and the lower the error code number, the higher its
priority. Environmental warnings have lower priority than environmental
errors. Table 5 shows the attention light bar error codes in hexadecimal.
The top of the table is the highest priority, the bottom the lowest. If a
higher-priority condition occurs, that one is displayed.
Table 5 Attention light bar error codes

ATTN  Attention light bar  Description
1     26-2F                48-V error, NPSLR failure, PWRUP=0-9
1     30-39                48-V error, no supply failure, PWRUP=0-9
1     3A                   48-V yo-yo error
1     3B                   MIB power failure (PB)
1     3C                   Clock failure
1     3D-3F                Not used (3)
1     40-47                MB0-MB7 power failure
1     48-4F                PB0L, PB1R, PB2L, PB3R, PB4L, PB5R, PB6L, PB7R power failure
1     50-57                PB0R, PB1L, PB2R, PB3L, PB4R, PB5L, PB6R, PB7L power failure (possibly switch R and L)
1     58-5B                IOB (LF,LR,RF,RR) power failure
1     5C-61                Fan failure (UR,UM,UL,LR,LM,LL)
1     62                   Ambient hot or ambient shutdown
1     63                   Overtemp MIB
1     64-67                Overtemp quadrant (RL, RU, LL, LU)
1     68                   Hard error
1     69                   Ambient warm
1     6A-6F                Not used (6)
1     70-73                DC supply maintenance (UL,UR,LL,LR)
1     74                   AC circuit failure
1     75-7F                Not used (11)
0     00-09                PWRUP state (00=System all powered up), attention LED off
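When reading a code off the attention light bar, the ranges in Table 5 can be treated as a simple lookup. The sketch below is illustrative only: CODE_TABLE and describe are hypothetical names, the descriptions are shortened, and only the rows that appear in this section are included, so any other code (including the ranges marked "Not used") returns None.

```python
# (low, high, description) rows transcribed from the codes listed in Table 5.
CODE_TABLE = [
    (0x00, 0x09, "PWRUP state (00 = system all powered up)"),
    (0x26, 0x2F, "48-V error, NPSLR failure"),
    (0x30, 0x39, "48-V error, no supply failure"),
    (0x3A, 0x3A, "48-V yo-yo error"),
    (0x3B, 0x3B, "MIB power failure"),
    (0x3C, 0x3C, "Clock failure"),
    (0x40, 0x47, "MB0-MB7 power failure"),
    (0x48, 0x57, "PB power failure"),
    (0x58, 0x5B, "IOB power failure"),
    (0x5C, 0x61, "Fan failure"),
    (0x62, 0x62, "Ambient hot or ambient shutdown"),
    (0x63, 0x63, "Overtemp MIB"),
    (0x64, 0x67, "Overtemp quadrant"),
    (0x68, 0x68, "Hard error"),
    (0x69, 0x69, "Ambient warm"),
    (0x70, 0x73, "DC supply maintenance"),
    (0x74, 0x74, "AC circuit failure"),
]


def describe(code):
    """Return the description for a light bar code, or None if it is not listed."""
    for low, high, text in CODE_TABLE:
        if low <= code <= high:
            return text
    return None


print(describe(0x3A))  # 48-V yo-yo error
print(describe(0x5D))  # Fan failure
```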
SCUB 3.3-Volt error
This error indicates that the SCUB 3.3-Volt power supply has failed, but
the 5-Volt supply has not.
ASIC installation error
Each ASIC in the node has ASIC Install lines to prevent power-up if an
ASIC is installed incorrectly (such as a SPAC installed in an ERAC
position). If an ASIC is improperly installed, the Utilities board does not
power up the system. This condition is not monitored after power up.
DC OK error
When this error is displayed, the power-on circuit did not power up the
system, because one or more 48-Volt power supplies reported an error. In
systems with redundant 48-Volt power supplies, this error means that
two or more 48-Volt supplies reported an error.
48-Volt error
If the 48-Volt supply has dropped below 42 volts for any reason other
than normally turning off the system or an AC failure, then this error is
displayed by the power-on circuit. Also, the 48-Volt supply that reported
the error and the power-up state of the system at the time of the error are
displayed.
48-Volt yo-yo error
This error indicates that a 48-Volt error occurred and the SCUB lost and
then later regained power without the machine being turned off. The
power-on circuit will display this error and not power on the system,
because the 48-Volt supply is likely at fault.
Clock failure
If the system clock fails, the SMUC is unable to monitor environmental
errors that could possibly damage the system. If the power-on circuit
receives no response from the SMUC, it powers down the system and
displays this error.
FPGA configuration and status
The SMUC is programmed by a serial data transfer from EEPROM upon
Utilities board power-up. If the transfer does not complete properly, the
SMUC cannot configure itself and many environmental conditions
cannot be monitored. The power-on circuit monitors both the SMUC and
SPUC and does not power up the system if they are not configured
correctly.
Board over-temperature
On each board in the node, there is one temperature sensor that detects
board overheating. The sensors are bussed together into four-node
quadrants, along with the MIB, and applied to the SMUC.
Fan sensing
The V2500 node has up to six fans, but only four may be configured.
Sensors in the fans determine if the fans are running properly. The
SMUC waits 12.8 seconds for the fans to spin up after power-up before
monitoring them. It is assumed that the unconfigured fans do not report
errors.
Power failure
Because a power failure on a board could cause damage to other boards,
a mechanism on each board detects 3.3-Volt failures on that board.
Power failures are considered environmental errors, and the system is
powered down after they are detected.
MidPlane Interface Board (MIB) power failure
If the MIB power fails, the power-on circuit powers down the entire node.
The Utilities board is still active, but the power-on circuit displays the
power failure condition and disables all Utilities board outputs that drive
the node. This condition persists until power is cycled on the Utilities
board.
48-Volt maintenance
There are four 48-Volt power supplies; three are required, and one is a
redundant source. Each sends a signal to the power-on circuit. If any
supply fails at any time, the circuit asserts the 48-V maintenance line to
the SMUC, which reports the environmental warning to the processors.
The power-on circuit displays the “highest priority” 48-Volt supply that
failed.
Ambient air sensors
The ambient air sensors detect a too warm or too hot condition in the
input air stream to the Utilities board (and therefore the entire node).
Ambient air too warm is an environmental warning; ambient air too hot
is an environmental error that powers down the system.
The temperature set points are set using the sppdsh utility, described in
“sppdsh” on page 268. The digital temperature sensor has nonvolatile
storage for the temperature set points. Power-on reset starts the digital
temperature sensor without the core logic microprocessor intervening.
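The warm and hot set points described above amount to a two-threshold classification. The sketch below is a minimal illustration, not the sensor's actual behavior: classify_ambient is a hypothetical function, the set-point values in the example are invented for illustration, and treating a reading equal to a set point as exceeding it is an assumption.

```python
def classify_ambient(temp_c, warm_c, hot_c):
    """Classify an ambient air reading against the warm and hot set points."""
    if warm_c >= hot_c:
        raise ValueError("warm set point must be below hot set point")
    if temp_c >= hot_c:
        return "environmental error"    # too hot: system is powered down
    if temp_c >= warm_c:
        return "environmental warning"  # too warm: reported, no power-down
    return "ok"


# Hypothetical set points, for illustration only.
print(classify_ambient(25, warm_c=35, hot_c=45))  # ok
print(classify_ambient(38, warm_c=35, hot_c=45))  # environmental warning
print(classify_ambient(50, warm_c=35, hot_c=45))  # environmental error
```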
AC circuit fail
An AC circuit failure denotes that the circuit that detects AC failures is
broken. A power-on reset clears this warning.
2 Configuration management
The teststation allows the user to configure the node using the
ts_config utility. ts_config configures the teststation to
communicate with the node. The teststation daemon, ccmd, monitors the
node and reports back configuration information, error information and
general status. ts_config must be run before using ccmd.
Two additional utilities, sppdsh and xconfig, allow configuration
information to be read and changed. OBP can also be used to
modify the configuration.
Teststation
The teststation is used for configuring, monitoring, testing, and error
logging. It is not required for normal operation of a node.
The teststation communicates with the JTAG interface in the nodes. The
JTAG port remains idle if no teststation is connected to it. It receives
communications packets, interprets requests, and generates responses to
them. The hardware on the node can read board information, system
configuration, device revisions, and environmental conditions. When a
teststation is present, all of these parameters are read or written by the
configuration management tools.
The configuration management daemon, ccmd, initiates communications
between the teststation and the nodes.
ts_config
ts_config [-display display name]
V2500 nodes added to the teststation must be configured by ts_config
to enable diagnostic and scan capabilities, environmental and hard-error
monitoring, and console access.
Once the configuration for each node is set, it is retained when new
teststation software is installed.
ts_config tasks include:
• Configuring a node—Adding a node to or removing a node from the
teststation configuration
• Configuring the terminal mux—Configuring and removing the
terminal mux on the teststation
• Installing a node—Upgrading JTAG firmware, configuring a node
scub_ip address, and resetting a node
The user must have root privilege to configure a node or the terminal
mux, because several HP-UX system files are modified during the
configuration.
Starting ts_config
To start ts_config from the teststation desktop, click on an empty area
of the background to obtain the Workspace menu and then select the
ts_config (root) option. Enter the root password.
To start ts_config from a shell (local or remote), ensure that the
DISPLAY environment variable is set appropriately before starting
ts_config.
Alternatively, the -display start-up option may be used. For example:
# /spp/bin/ts_config -display myws:0
NOTE For shells that are run from the teststation desktop, the DISPLAY
variable is set (at the shell start-up) to the local teststation display.
ts_config operation
The ts_config utility displays an active list of nodes that are powered
up and connected to the teststation diagnostic LAN. The operator selects
a node and configures the selected node. A sample display is shown
below.
Figure 5ts_config sample display
The window has two main parts: the drop-down menu bar and the
display panel. The display panel contains a list of nodes and their status.
To select a node, click the line containing the desired node entry in the
list with the left mouse button. When a node is selected, information about
that node is shown at the bottom of the ts_config window. If an action
needs to be performed to configure the node, specific instructions are
included.
ts_config automatically updates the display when it detects either a
change in the configuration status of any node or a new node. However,
the automatic update is disabled while the user has a node selected.
After the node is selected, the display is not updated until the user
selects an action or refreshes the node list. The upper right corner of the
ts_config window indicates whether a node has been selected.
The ts_config window title includes in parentheses the name of the
effective user ID running ts_config, either root or sppuser.
The ts_config display shows the configuration status of the nodes.
Table 6 shows the possible status values.
Table 6 ts_config status values

Configuration status: Upgrade JTAG firmware
Description: The version of JTAG firmware running on the SCUB does not
support the capabilities required to complete the node configuration
process.
Action required: Select the node and follow the instructions given at the
bottom of the ts_config window. ts_config guides the operator through the
JTAG firmware upgrade procedure.

Configuration status: Not Configured
Description: ts_config has detected the node on the Diagnostic LAN, the
JTAG firmware is capable of supporting the node configuration activity,
and the node needs to be configured.
Action required: Select the node and follow the instructions given at the
bottom of the ts_config window. ts_config guides the operator through the
node configuration procedure described later in this document.

Configuration status: Active
Description: The node is configured and answering requests on the
Diagnostic LAN.
Action required: None required. This is the desired status.

Configuration status: Inactive
Description: The teststation node configuration file contains information
about the specified node, but the node is not responding to requests on
the Diagnostic LAN. This status is also shown if a node was configured and
then removed from the teststation LAN without being deconfigured.
Action required: Power up the node and/or check for a LAN connection
problem. If the node information shown is for a node that has been
removed, select the node, then select “Actions,” “Deconfigure Node,” and
click “Yes.”

Configuration status: Node Id changed
Description: The node is configured and answering requests on the
diagnostic LAN, but the node ID currently reported by the node does not
match the teststation configuration information.
Action required: Select the node to obtain additional information. If the
node COP information was changed to a different node ID and the new node
ID is correct, select “Actions,” “Configure Node,” then click “Configure.”
The teststation configuration information is updated using the new node
ID.
Configuration Procedures
The following procedures provide additional details about each
configuration action. ts_config automatically guides the user through
the appropriate procedure when a node is selected.
Upgrade JTAG firmware
NOTE This procedure does not need to be performed unless the status shows
“Upgrade JTAG firmware.” If the node shows “Not Configured,” skip this
section.
Step 1. Select the node from the list in the display panel. For example, clicking
on node 0 in the list highlights that line as shown in Figure 6.
Figure 6 ts_config showing node 0 highlighted
Notice that after the node has been highlighted, ts_config displays
information concerning the node. In this step, it tells the user what
action to take next, “This node’s JTAG firmware must be upgraded.
Select “Actions,” “Upgrade JTAG firmware” and “Yes” to upgrade.”
Step 2. Select “Actions” to drop the pop-down menu and then click “Upgrade
Step 3. A message panel like the one shown in Figure 8 appears. Read the
message. If this is the desired action, click “Yes” to begin the upgrade.
Figure 8 Upgrade JTAG firmware confirmation panel
Step 4. After the firmware is loaded, a panel like the one shown in Figure
9 appears. Click “OK” and then power-cycle the node to activate the new
firmware.
Figure 9 ts_config power-cycle panel
When the node is powered up, the “Configuration Status” should change
to “Not Configured.”
Configure a Node
Step 1. Select the desired node from the list of available nodes. When the node is
selected, the appropriate line is highlighted as shown in Figure 10.
Notice that the bottom of the display indicates that Node 0 is not configured
and provides the steps necessary to configure the node.
Figure 10 ts_config indicating Node 0 as not configured
Step 2. Select “Actions” and then click “Configure Node,” as shown in Figure 11.
Figure 11 ts_config “Configure Node” selection
After ts_config is invoked to configure the node, a node configuration
panel like the one in Figure 12 appears.
Figure 12 ts_config node configuration panel
Step 3. Enter a name for the V2500 System. The teststation uses this name as
the “Complex Name” and to generate the IP hostnames of the Diagnostic
and OBP LAN interfaces. Select a short name that teststation users can
easily relate to the associated system (for example: hw2a, swtest, etc.).
Step 4. Select an appropriate serial connection for the V2500 console from the
pop-down option menu in the node configuration panel.
ts_config automatically assigns the first unused serial port. If the
terminal mux has been configured, the terminal mux ports are included
in the list of available serial connections.
The IP address information for the Diagnostic interface is provided. The
ts_config utility automatically changes the IP address of the
diagnostic LAN interface to prevent a duplicate when other nodes are
added to this teststation configuration.
ts_config automatically updates the local /etc/hosts file with the
names and addresses of the Diagnostic and OBP LAN interfaces.
Step 5. Click “Configure.”
This updates several teststation files. A node configuration
confirmation panel like the one in Figure 13 appears.
ts_config checks the scub_ip address stored in NVRAM in the node.
If the scub_ip address is correct, no action is required. If the node is not
detected and scanned by ccmd, ts_config may ask you to try again
later. The ccmd detection scan process should take less than a minute.
Step 3. If prompted by ts_config (as indicated by the panel in Figure 16), click
Step 4. A panel like the one shown in Figure 17 appears, confirming that the
scub_ip address is set. Click OK.
Figure 17 ts_config scub_ip address set confirmation panel
Initiate a node reset to activate the new scub_ip address.
Reset the Node
Step 1. Select the desired node from the list of available nodes.
Step 2. Select “Actions,” then “Reset Node.” This is indicated in Figure 18.
Figure 18 ts_config “Reset Node” selection
A panel like the one shown in Figure 19 appears.
Figure 19 ts_config node reset panel
Step 3. In the Node Reset panel, select the desired “Reset Level” and “Boot
Options,” then click “Reset.”
Deconfigure a Node
Deconfiguring a node removes the selected node from the teststation
configuration. The teststation will no longer monitor the environmental
and hard-error status of this node. Console access to the node is also
disabled.
Step 1. Select the desired node from the list of available nodes.
Step 2. Select “Actions,” then “Deconfigure Node,” then click “Yes.”
Add/Configure the Terminal Mux
To add or reconfigure the terminal mux, perform the following
procedure.
Step 1. In the ts_config display, select “Actions,” then “Configure Terminal
Mux.”
Step 2. Select “Add/Configure Terminal Mux.” This is indicated in Figure 20.
A panel like the one shown in Figure 21 appears. This panel requires the
terminal mux IP address.
Figure 21 ts_config terminal mux IP address panel
Step 3. Connect a serial cable from serial port 2 on the teststation to port 1 on
the terminal mux.
Step 4. Enter the desired “Terminal Mux IP Address” and click “Configure,” as
indicated in Figure 22.
Figure 22 Terminal mux IP address entered into panel
Remove terminal mux
ts_config does not remove the terminal mux if any node consoles are
assigned to terminal mux ports.
Step 1. Select “Actions,” then “Configure Terminal Mux.”
Step 2. Select “Remove Terminal Mux,” then click “Yes.”
Teststation-to-system communications
This section describes how the teststation communicates with the system
using the utilities presented in Chapter 11, “Utilities.”
Figure 23 depicts the V-Class server to teststation communications using
HP-UX.
Figure 23 Teststation-to-system communications
[Diagram not reproduced. Labels in the diagram: teststation, ccmd,
event_logger, hard_logger, sppconsole, ttylink, consolelogx, modem,
private ethernet (LAN 1), global ethernet (LAN 0), RS232 (tty0 and tty1),
ethernet (JTAG) scan, NFS-FWCP, dfdutil, pcirom, Node, DUART, LCD,
console messages, memlog, events, syslog, POST/test controller, JTAG FW,
HPUX, spp_pdc, OBP.]
The hardware components located on the SCUB are shown in the
diagram on the left side of the node or system. They include three
ethernet ports and one DUART.
A layer of firmware between HP-UX and OBP called spp_pdc allows the
HP-UX kernel to communicate with OBP. spp_pdc is platform-dependent
code and runs on top of OBP, providing access to the devices
and OBP configuration properties.
LAN communications
One system ethernet port connects to global LAN 0. The other ethernet
port connects to the system private LAN 1. The JTAG port is used for
scanning. The other port is used for downloading system firmware (via
NFS using fwcp or via tftp using pdcfl), downloading disk firmware
using dfdutil (dfdutil uses tftp to read peripheral firmware), and
loading Symbios FORTH code using pcirom.
The configuration daemon, ccmd, which is located on the teststation,
obtains system configuration information.
Serial communications
The DUART port on the SCUB provides an RS232 serial link (tty0) to
the teststation. Through this port, HP-UX, OBP, POST, and the Test
Controller send console messages. The teststation processes these
messages using sppconsole, ttylink, and consolelogx. POST and
OBP also send system status to the LCD connected to the DUART.
ccmd
ccmd builds a configuration information database on the teststation. The
board names and revisions, the device names and revisions, and the
start-up information generated by POST are all read and stored in
memory for use by other diagnostic tools.
ccmd is typically run automatically from /etc/inittab on the teststation.
Entering init on the teststation starts ccmd. init monitors ccmd and
respawns it if it ever stops. Once started, ccmd becomes a daemon and
allocates a block of teststation memory used for a database for all nodes,
boards, and devices.
ccmd broadcasts a command to all JTAG ports to report in. Each node
responds with its IP address, error status, complex identification, and
the node identification number. ccmd continues until no new responses
are detected.
Once ccmd has all valid node numbers and IP addresses, it scans each
ring of each node looking for JTAG device IDs. The JTAG IDs contain
device and revision information stored in the teststation database. The
JTAG ID is cross checked in the /spp/data/part_ids file to retrieve a
complete device description of the part. The file for the part is also in
/spp/data.
After ccmd loads all parts and their descriptions into the database, it
reads all board information. Board and device information is cross
checked in the file /spp/data/DB_RING_FILE. Complex and node
configuration files for est_config are written to the /spp/data/ directory.
Once all information is in place, ccmd monitors the node for changes in
node configuration or error status. After a 10 second pause, ccmd once
again broadcasts to determine which nodes are powered up or have an
error condition. If a node response is not received after six broadcasts,
then the node is removed from the database of existing nodes.
If no nodes are responding, ccmd clears all node data and continues
broadcasting, waiting for a node to respond. If a node powers up, the
node database is rebuilt.
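The monitoring behavior described above can be sketched as a small model. This is an illustration only, not the ccmd implementation: the class name NodeTracker and its methods are hypothetical, and the only value taken from the text is the limit of six unanswered broadcasts.

```python
from dataclasses import dataclass, field

# From the text: a node is removed after six broadcasts without a response.
MAX_MISSED_BROADCASTS = 6


@dataclass
class NodeTracker:
    """Hypothetical model of the node list that ccmd maintains."""

    # node id -> number of consecutive broadcasts with no response
    missed: dict = field(default_factory=dict)

    def poll(self, responders):
        """Record one broadcast round and return the ids of removed nodes."""
        responders = set(responders)
        for node in responders:
            self.missed[node] = 0  # new node, or known node still answering
        removed = []
        for node in list(self.missed):
            if node in responders:
                continue
            self.missed[node] += 1
            if self.missed[node] >= MAX_MISSED_BROADCASTS:
                del self.missed[node]
                removed.append(node)
        return sorted(removed)

    def known_nodes(self):
        """Return the ids of all nodes currently in the database."""
        return sorted(self.missed)
```

In this model, a node that stops answering stays in the database for five rounds and is dropped on the sixth. When the database becomes empty, the next responding node repopulates it.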
If ccmd detects a hard error, it starts the hard_logger script to extract
additional information from the node through the JTAG interface. After
the hard_logger runs, ccmd resets the node or complex that failed.
This behavior can be stopped with autoreset.
ccmd sends output to the console. If running under X-windows as
sppuser, it sends its output to the teststation console message output
window. The -d debug option generates a substantial amount of console
output. Output is also sent to the file /spp/data/ccmd_log.
ccmd responds to several signals. The SIGHUP signal tells ccmd to
rebuild the teststation database. A SIGINT or SIGABRT signal terminates
the ccmd process.
NOTE The time zone information is read when ccmd starts. If the time zone
information changes, ccmd should be restarted as well.
xconfig
xconfig is the graphical tool that can also modify the parameters
initialized by POST to reconfigure a node.
The graphical interface allows the user to see the configuration state.
The names are also consistent with the hardware names, since individual
configuration parameters are hidden from the user. The drawback of
xconfig is that it cannot be used as a part of script-based tests, nor can
it be used for remote debug.
xconfig is started from a shell. Information on node 0 is read and
interpreted to form the starting X-windows display shown in Figure 24.
The xconfig window appears on the system indicated by the
environment variable $DISPLAY. This may be overridden, however, by
using the following command:
% xconfig -display system_name:0.0
The xconfig window has two display views: one shows each component as
a physical location in the server, the other shows them as logical names.
Figure 24 and Figure 25 show the window in each view, respectively. To
switch between views, click on the Help button in the menu bar and then click
the Change names option. See “Menu bar” on page 45.
Figure 24 xconfig window (physical location names)
Figure 25 xconfig window (logical names)
As buttons are clicked, the item selected changes state and color. There
is a legend on the screen to explain the color and status. The change is
recorded in the teststation’s image of the node.
When the user is satisfied with the new configuration, it should be copied
back into the node, and the node should be reset to enable the changes.
The main xconfig window has three sections:
• Menu bar—Provides additional capability and functions.
• Node configuration map—Provides the status of the node.
• Node control panel—Provides the capability to select a node and
control the way data flows to it.
Menu bar
The menu bar appears at the top of the xconfig main window. It has
four menus that provide additional features:
• File menu—Displays the file and exit options.
• Memory menu—Displays the main memory and CTI cache memory
options.
• Error Enable menu—Displays the device menu options for error
enabling and configuration.
• Help menu—Displays the help and about options.
The menu bar is shown in Figure 26.
Figure 26 xconfig window menu bar
The File menu provides the capability to save and restore node images
and to exit xconfig.
The Memory menu provides the capability to enable or disable memory
at the memory DIMM level by the total memory size and to change the
network cache size on a multinode complex.
The Error Enable menu provides the capability to change a device’s
response to an error condition. This is normally only used for
troubleshooting.
The Help menu provides a help box that acts as online documentation
and also contains program revision information.
Node configuration map
The node configuration map is a representation of the left and right side
views of a node as shown in Figure 27.
Figure 27 xconfig window node configuration map
The button boxes are positioned to represent the actual boards as viewed
from the left and right sides. Each of the configurable components of the
node is in the display. The buttons are used as follows:
• Green button—Indicates that the component is present and enabled.
• Red button—Indicates that the component is software disabled in the
system.
• White button—Indicates that it is not possible to determine what the
status of the component would be if POST were to be started.
• Blue box—Indicates that the component is either not present or fails
the power-on self tests.
• Brown button—Indicates that POST had to hardware deconfigure
this component in order to properly execute.
• Grey button—Indicates a hardware component that did not properly
initialize.
The colors are shown in the legend box of the node control panel.
Components can change from enabled to disabled or disabled to
unknown by clicking on the appropriate button with the left mouse
button.
A multinode system requires an additional component on a memory
board to enable the scalable coherent memory interface. This component
can be viewed by right-clicking the memory board button. The
right mouse button toggles the memory board display between the
memory board and the SCI device.
Node control panel
The node control panel allows the user to select a node, select the stop
clocks on an error function, select the boot parameters for a node and
direct data flow between the node and the xconfig utility. It is shown in
Figure 28.
Figure 28 xconfig window node control panel
The node number is shown in the node box. A new number can be
selected by clicking on the node box and selecting the node from the
pull-down menu. A new complex can be selected by clicking on the complex
box and selecting it from the pull-down. A node IP address is displayed
along with the node number and complex.
When a new node is selected and available, its data is automatically read
and the node configuration map updated. The data image is kept on the
teststation until it is rebuilt on the node using the Replace button. This
is similar to the replace command on sppdsh.
Even though data can be rebuilt on a node, it does not become active
until POST runs again and reconfigures the system. The Reset or Reset
All buttons can be used to restart POST on one or all nodes of a system.
A multinode system requires a reset all to properly function.
A Retrieve button is available on the node control panel to get a fresh
copy of the parameter settings in the system. Clicking this button
overwrites the settings local to the teststation and xconfig.
The Stop-on-hard button is typically used to assist in fault isolation. It
stops all system clocks shortly after an error occurs. Only scan-based
operations are available once system clocks have stopped.
The last group of buttons controls what happens after POST completes.
The node can become idle or boot OBP, the test controller, or spsdv. The
test controller and spsdv are additional diagnostic modes.
Configuration utilities
V2500 diagnostics provides utilities that assist the user with
configuration management.
autoreset
autoreset allows the user to specify whether ccmd should
automatically reset a complex after a hard error and after the hard
logger error analysis software has run. The automatic reset occurs if a
.ccmd_reset file does not exist in the complex-specific directories.
Arguments to autoreset are <complex_name> on, <complex_name> off,
or <complex_name> chk.
The output of the chk option for a complex name of hw2a looks like:
Autoreset for hw2a is enabled.
or
Autoreset for hw2a is disabled.
NOTE autoreset determines the behavior of ccmd when it encounters an error
condition. ccmd makes its decision whether to reset a complex
immediately after running hard_logger. Enabling autoreset after
hard_logger has run does not reset the complex.
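The file-based rule above can be sketched as follows. This is a minimal illustration, not the autoreset source: the function names are hypothetical, and the location of the complex-specific directory is passed in as a parameter because the text does not give its path.

```python
from pathlib import Path


def autoreset_enabled(complex_dir):
    """Return True when no .ccmd_reset file exists in complex_dir.

    complex_dir is an assumed stand-in for the complex-specific directory.
    """
    return not (Path(complex_dir) / ".ccmd_reset").exists()


def chk_message(complex_name, complex_dir):
    """Build a status line in the form shown for the chk option."""
    state = "enabled" if autoreset_enabled(complex_dir) else "disabled"
    return f"Autoreset for {complex_name} is {state}."
```

In this sketch, disabling autoreset corresponds to creating the .ccmd_reset file and enabling it corresponds to removing the file.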
est_config
est_config is a utility that builds the node and complex descriptions
used by est. est_config reads support files at
/spp/data/DB_RING_FILE, reads the electronic board identifier (COP
chip) and scans to completely describe the node or complex. It also uses
the hardware database created by ccmd. The data retrieved is organized
and sorted into an appropriate node configuration file in the
/spp/data/<complex name> directory.
An optional configuration directory can be specified using the -p
argument. est_config works across all nodes unless a specific node or
complex is requested with the -n option.
NOTE If there is a node_#.pwr file that is older than the node_#.cfg file, existing
node configuration files do not need to be updated. est_config also
generates a complex_uts.cfg file that can be compared against a
complex.cfg file for accuracy and consistency.
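The age comparison in the note can be expressed as a short check. This is a sketch under the assumption that "older" refers to file modification time; the function name config_is_current is hypothetical and is not part of est_config.

```python
import os


def config_is_current(pwr_path, cfg_path):
    """Return True when the .pwr file is older than the .cfg file.

    Assumption: "older" is judged by modification time. If either file is
    missing, the configuration is treated as needing an update.
    """
    if not (os.path.exists(pwr_path) and os.path.exists(cfg_path)):
        return False
    return os.path.getmtime(pwr_path) < os.path.getmtime(cfg_path)
```

A caller would pass the node_#.pwr and node_#.cfg paths for one node and regenerate the configuration only when the function returns False.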
xsecure
xsecure is an application that helps make a V2500 class teststation
secure from external sources. This tool disables modem and LAN activity
to provide an extra layer of security for the V2500 system. xsecure may
be run as a command line tool or a window-based application.
In secure mode, all network LANs other than the tsdart bus are disabled,
and the optional modem on the second serial port is disabled. In
normal mode, all networks and modems are re-enabled.
If the command line [-on | -off | -check] options are used,
xsecure does not use the GUI interface. These options allow the user to
turn the secure mode on or off, or to check the secure mode
status.
A simple button with a red or green secure mode indicator provides the
user with secure mode status information. The red indicator shows that
the secure mode process has begun. The label near the red button will
inform the user when the teststation is secure. A green indicator and the
appropriate label show that the network is available and the teststation
may be accessed through the ethernet port.
For xsecure to work properly, the teststation, console cables,
terminal mux, and modems must be configured in specific ways. The
teststation JTAG connections, OBP connections, and an optional terminal
mux must all be connected to the Diagnostic LAN and identified in the
/etc/hosts file as tsdart-d. The sppconsole serial cable must be connected
to serial port 0 and to node 0. An optional modem may be connected to
serial port 1.
3 Power-On Self Test
POST is the Power On Self Test firmware for the V-Class platform.
POST provides processor and system hardware initialization
functionality, as well as providing basic processor selftest and utilities
board SRAM pattern test capability. This chapter describes how POST
initializes a node and handles power up errors.
Overview
Upon power up, all processors and hardware must be initialized before
the node proceeds with booting. POST begins executing and brings up
the node from an indeterminate state and then calls OBP.
None of the POST modules can be directly controlled via a user interface.
Program control is provided by a set of configuration parameters
(processing flags and variable definitions) stored in NVRAM by OBP,
do_rest, or xconfig.
The error reporting modules display error codes for all fatal errors that
occur during POST execution. Any recoverable errors are reported to
OBP. POST status is reflected on the LCD display.
POST performs the following tasks:
• Initializes and conditionally performs cache tests on each processor in
the node
• Validates all shared data structures within the NVRAM
• Initializes the core logic required to start OBP execution
• Determines node configuration
• Initializes all ASICs
• Initializes main memory
• Sets up CTI cache
• Invokes OBP or the Test Controller
Any fatal errors are reported to the user by way of the system LCD and
the system console. POST passes node configuration and any options to
OBP via shared data structures.
Reset
The following types of reset invoke POST:
• Power up reset— If a client had execution control before the power
down condition, it invokes POST to initialize the hardware. POST
initializes all hardware after a power up reset.
• Hard reset—If a client had execution control before the hard reset, it
invokes POST to initialize the hardware. POST restarts execution
and reinitializes all hardware.
• Soft reset—If a soft reset condition has occurred while POST was
executing, POST restarts execution but does not initialize main
memory. It invokes its interactive prompt.
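The three reset types listed above can be summarized in a small lookup. This is an illustrative sketch, not POST firmware logic: the names Reset and post_actions are hypothetical, and reading "initializes all hardware" as including main memory is an assumption.

```python
from enum import Enum


class Reset(Enum):
    POWER_UP = "power up"
    HARD = "hard"
    SOFT = "soft"


def post_actions(reset):
    """Summarize what POST does after the given reset type.

    Assumption: "all hardware" in the text includes main memory.
    """
    if reset is Reset.SOFT:
        # Soft reset during POST: restart, skip main memory, go interactive.
        return {"initialize_main_memory": False, "interactive_prompt": True}
    # Power up and hard reset: POST (re)initializes all hardware.
    return {"initialize_main_memory": True, "interactive_prompt": False}
```

The practical difference is that only a soft reset preserves the contents of main memory, which is why it leads to the interactive prompt rather than a normal boot.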
POST modules
POST executes modules listed below in chronological order:
• Processor Initialization and Selftest—Each processor initializes itself
on power up or reset in parallel with the other processors.
Initialization includes setting values into the internal diagnostic
registers, initializing the instruction and data caches, clearing a
scratch RAM area for stack and data storage, and enabling
high-priority machine checks (HPMC), low-priority machine checks
(LPMC), and transfer of control (TOC). Selftest includes instruction
set tests, instruction and data cache RAM tests and TLB RAM tests.
• SCUB Hardware Initialization—POST clears any error state in the
SCUB, initializes the SCUB hardware registers and DUART, and
initializes and optionally tests the SRAM on the SCUB (see
scuba_test_enable).
• Non-volatile Configuration Data Verification—POST verifies the
checksum of all shared data regions in a battery-backed-up SRAM
(NVRAM). POST verifies only the regions it shares with other
modules, such as OBP, and those private to POST. If a region fails, it
is rebuilt using default values.
• Hardware Configuration Determination—POST determines the ASIC
installation status and verifies that each installed ASIC responds to
register accesses. If one does not, it is reported as failing. POST then
configures the system to utilize the maximum amount of installed
hardware based on the V2500 hardware configuration rules.
• Node Hardware (ASIC) Initialization—POST sets up all available
hardware with the proper operating mode(s) enabled. Routing is
configured for the current hardware population.
• Node Main Memory Initialization—POST probes all installed
memory boards for memory installation status. It then enables each
memory board as a 2-, 4-, or 8-board configuration based on V2500
configuration rules. All remaining memory boards are configured to
have the same logical memory population. It then initializes main
memory in parallel on up to eight processors, using the initialization
hardware in the memory controllers.
• Page Deallocation Table Support—POST supports reading the page
deallocation table (PDT) and remapping memory if it detects a bad
page in the HP-UX good-memory region. It updates all entries to
reflect the new memory layout if remapping occurs. It also clears the
PDT if a memory hardware change is detected.
• Client Boot—POST cleans up any residual state from POST
execution and boots the client specified in boot_module. POST can
boot clients with all processors or with just the monarch
processor, leaving the other processors in an idle loop.
Interactive mode
POST for the V2500 provides a command line interface for configuration
and debugging. The command line interface is invoked if boot_module is
set to “interactive,” or by a soft reset or a TOC during POST execution.
Interactive mode commands
POST supports the following commands at the line prompt:
• help—Displays a list of supported commands and their usage.
• banner—Displays the POST version and build information.
• reset [loader|post|soft]—Causes POST to perform a reset of
the node. If loader is specified, then the node is hard reset and
executes the firmware loader PDCFL. If post is specified, the node is
hard reset and executes POST. If soft is specified, the node is soft
reset and executes POST.
• dcm—Dumps the configuration map from NVRAM and displays the
hardware status of the machine, showing what hardware is enabled,
deconfigured, or failing.
• setenv <parm> <value>—Sets the configuration parameter
specified by parm to the value.
• printenv [parm]—Prints the value of the configuration parameter
specified by parm. If no parameter is specified, then all are printed.
• get_opt [asic_type [asic_number]]—Dumps the option mode
bits for the ASIC type specified by asic_type. If an asic_number is
also specified, then only the values of that ASIC are printed. If an
asic_number is not specified, then all ASICs of that asic_type are
dumped. If no asic_type is specified, then all ASICs are dumped.
• pdt—Dumps the current Page Deallocation Table (PDT) contents.
• clear_pdt—Clears out all entries in the PDT.
• memmap—Displays any rows that have been logically remapped due to
PDT entries or failing software-deconfigured DIMMs.
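The optional arguments of get_opt follow a narrowing pattern that can be sketched as a filter. This is an illustration only: the function name select_asics and the (type, number) tuple representation are hypothetical, and the ASIC type names in the usage below are placeholders rather than real V2500 ASIC names.

```python
def select_asics(asics, asic_type=None, asic_number=None):
    """Return the ASICs that get_opt would dump for the given arguments.

    asics is an iterable of (asic_type, asic_number) tuples (assumed form).
    """
    asics = list(asics)
    if asic_type is None:
        if asic_number is not None:
            raise ValueError("asic_number requires asic_type")
        return asics  # no arguments: dump every ASIC
    return [
        (kind, number)
        for kind, number in asics
        if kind == asic_type and (asic_number is None or number == asic_number)
    ]
```

For example, with installed = [("typeA", 0), ("typeA", 1), ("typeB", 0)], calling select_asics(installed, "typeA") keeps both typeA entries, and adding asic_number=1 keeps only the second.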
Configuration parameters
The following parameters control the runtime operation of POST:
• ts_ip—Specifies the teststation IP address for LAN messaging. The
value should be set to the IP address of the diagnostics LAN port on
the teststation. [default: 15.99.111.99]
Table 7 Name of teststation IP address for listed utilities
Utility    Parameter name
OBP        ts-ip#
POST       ts_ip
sppdsh     ts_ip
• scub_ip—Specifies the IP address used for LAN interface hardware
on the utilities (SCUB) board. This is the IP address that POST, OBP,
and the Test Controller use for LAN messaging with the teststation.
[default: none]
Table 8 Name of scub IP address for listed utilities
Utility    Parameter name
OBP        obp-ip#
POST       scub-ip
sppdsh     scub_ip
ts_config  scub_ip
• cti_cache_size—Specifies the amount of memory, in megabytes,
to reserve in the node for CTI cache. This is used only in multinode
configurations. [default: 0 Mbytes]
Table 9 Name of CTI cache size for listed utilities
• scuba_test_enable—Controls whether the SCUB
SRAM is tested before initialization. This affects the processor
initialization, since processors test and initialize their own stack
region. It also affects the SCUB initialization and core LAN SRAM
initialization steps for the monarch. [default: true]
Table 12 Name of scuba test enable for listed utilities
• master_error_enable—Determines whether POST will enable
errors or not. This is used in conjunction with
use_error_overrides to determine how errors are enabled.
[default: true]
Table 13 Name of master error enable for listed utilities
Utility    Parameter name
OBP        master-error-enable?
POST       master_error_enable
sppdsh     master_error_enable
• use_error_overrides—Determines if POST will use the built-in
defaults for errors or the user error overrides. This is only checked if
master_error_enable is enabled. [default: false]
Table 14 Name of use error overrides for listed utilities
• force_monarch—Determines if POST will force the monarch
selection to a specific processor. The processor is specified in
monarch_number. [default: false]
Table 15 Name of force monarch for listed utilities
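The interaction between master_error_enable and use_error_overrides described in the parameter list can be sketched as a two-level decision. This is an illustration of the documented rule, not POST code: the function name and the returned labels are hypothetical, and the default argument values mirror the defaults given above.

```python
def error_enable_mode(master_error_enable=True, use_error_overrides=False):
    """Return which error settings POST would apply (labels are illustrative).

    use_error_overrides is consulted only when master_error_enable is true.
    """
    if not master_error_enable:
        return "errors not enabled"
    if use_error_overrides:
        return "user error overrides"
    return "built-in defaults"
```

With both parameters left at their defaults, the sketch returns "built-in defaults".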
POST has three types of messages: LCD, console, and error. This section
discusses each type.
LCD messages
Each node has an LCD display. Figure 29 shows the display and
indicates what each line on the display means.
Figure 29 Front panel LCD
Messages
0 (0,0)
MIII IIII IIII IIII
IIII IIII IIII IIII
abcedfghijklr
Node status line
The Node Status Line shows the physical node ID in both decimal and
X, Y topology formats.
Processor status line
The processor status line shows the current run state for each processor
in the node. Table 17 shows the initialization step code definitions and
Table 18 shows the run-time status codes. The M in the first processor
status line stands for the monarch processor.
Status  Description
R       RUN: Performing system initialization operations.
I       IDLE: Processor is in an idle loop, awaiting a command.
M       MONARCH: The main POST initialization processor.
H       HPMC: processor has detected a high priority machine check (HPMC).
T       TOC: processor has detected a transfer of control (TOC).
S       SOFT_RESET: processor has detected a soft RESET.
D       DEAD: processor has failed initialization or selftest.
d       DECONFIG: processor has been deconfigured by POST or the user.
-       EMPTY: Empty processor slot.
?       UNKNOWN: processor slot status is unknown.
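A processor status line such as the one in Figure 29 can be decoded with a simple lookup built from the status codes above. This is a sketch for illustration: the function name decode_status_line is hypothetical, and treating spaces as separators between groups of four processors is an assumption based on the figure.

```python
# Status letters and names taken from the processor status table.
STATUS_CODES = {
    "R": "RUN",
    "I": "IDLE",
    "M": "MONARCH",
    "H": "HPMC",
    "T": "TOC",
    "S": "SOFT_RESET",
    "D": "DEAD",
    "d": "DECONFIG",
    "-": "EMPTY",
    "?": "UNKNOWN",
}


def decode_status_line(line):
    """Translate one LCD processor status line into a list of state names.

    Whitespace is skipped (assumed to be a visual separator only).
    Raises KeyError for a character that is not in the table.
    """
    return [STATUS_CODES[ch] for ch in line if not ch.isspace()]
```

Decoding the first processor line of Figure 29, "MIII IIII IIII IIII", yields one MONARCH entry followed by fifteen IDLE entries.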
Message display line
The message display line shows the POST initialization progress. This is
updated by the monarch processor. The system console also shows detail
for some of these steps. Table 19 shows the code definitions.
c  Utilities board (SCUB) SRAM test (optional).
d  Utilities board (SCUB) SRAM initialization.
e  Reading Node ID and serial number.
f  Verifying non-volatile RAM (NVRAM) data structures.
g  Probing system hardware (ASICs).
h  Initializing system hardware (ASICs).
i  Probing processors.
j  Initializing, and optionally testing, remaining SCUB SRAM.
k  Probing main memory.
l  Initializing main memory.
r  Enabling system error hardware.
Console messages
POST provides several messages that are displayed on the teststation
console. This section describes these console messages.
Type-of-boot
This message reports the type of boot for the current POST execution,
and the node ID and monarch processor.
For example:
POST Hard Boot on [0:PB1R_A]
Version and build
This message reports the version and build information for POST.
For example:
HP9000/V2500 POST Release 1.0, compiled 1998/11/04 14:33:12
Processor probe
This message reports the processors as they are detected in the system.
Only available processors are reported; any failing or deconfigured
processors are not listed. Processors in this list may be deconfigured if
they share a Runway bus with a processor that fails the probe or is
deconfigured.
This message reports that the Utilities board SRAM reserved for
missing or unavailable CPUs is being initialized. The SRAM is tested
prior to initialization if scuba_test_enable is true.
For example:
Completing core logic SRAM initialization.
Main memory initialization
This message reports that main memory initialization has started.
For example:
Starting main memory initialization.
Memory probe
This message reports the status of the memory boards as they are
detected and probed for DIMMs.
This message reports the total memory installed and available, in
megabytes.
For example:
Installed memory: 2048 MBs, available memory: 2048 MBs
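When console output is captured to a log, the memory totals can be extracted from this message with a regular expression. This is a sketch that assumes the message always has the exact form shown in the example above; the function name parse_memory_message is hypothetical.

```python
import re

# Pattern based on the single example line in the text (assumed stable).
_MEMORY_RE = re.compile(
    r"Installed memory:\s*(\d+)\s*MBs,\s*available memory:\s*(\d+)\s*MBs"
)


def parse_memory_message(line):
    """Return (installed_mb, available_mb), or None if the line does not match."""
    match = _MEMORY_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A difference between the two values would indicate that some installed memory is not available.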
Main memory initialization started
This message marks the beginning of main memory initialization.
For example:
Initializing main memory.
Parallel memory initialization
This message reports that main memory initialization will be done with
multiple processors in parallel. It is printed only if more than one
processor is available for memory initialization.
For example:
Parallel memory initialization in progress.
Memory initialization progress
This message reports the results of the initialization, the initializing
processor, and the memory board for each board available in the node.
Each character indicates the physical location of the DIMM and the
logical size of the DIMM. The memory information is encoded as follows:
Value  Memory Type
This message indicates that POST is entering its interactive mode.
POST provides a console interface for system configuration and debug.
For example:
Booting Interactive
Interactive prompt
The following is the POST interactive prompt and is only seen if
boot_module is set to interactive.
For example:
[node id 0:monarch] POST>
Chassis codes
The processor initialization and selftest functions in POST report
status and error information with chassis codes. These chassis codes
are shared with cpu3000 and are documented in the man page with
the exception of the following POST-specific codes:
0x6103C    The processor is executing its processor initialization code.
0x22025    The processor encountered a data error while loading the
           processor Icache.
0x22026    The processor encountered a tag error while loading the
           processor Icache.
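When reading logs, the three POST-specific chassis codes can be kept in a small lookup table. This Python sketch is only an illustration: the dictionary and function names are hypothetical, and any code not listed here is simply referred back to the cpu3000 man page rather than decoded.

```python
# The three POST-specific chassis codes listed in this section.
POST_CHASSIS_CODES = {
    0x6103C: "The processor is executing its processor initialization code",
    0x22025: "The processor encountered a data error while loading the processor Icache",
    0x22026: "The processor encountered a tag error while loading the processor Icache",
}


def describe_chassis_code(text):
    """Describe a chassis code given as a hex string such as '0x22025'."""
    code = int(text, 16)  # base 16 accepts an optional 0x prefix
    return POST_CHASSIS_CODES.get(
        code, "Not a POST-specific code; see the cpu3000 man page"
    )
```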
Error messages
POST provides error messages that are printed to the console. This
section describes these error messages.
Teststation parameters failure
This message reports that the test station parameters structure failed
the checksum and was rebuilt to the default structure.
For example:
Test Station Parameters checksum FAILED, rebuilding...
This node may be forced with the sppdsh reboot <node> default
command
Configuration map failure
This message indicates that the configuration map structure failed the
checksum and was rebuilt to defaults. Any user deconfigured hardware
state is lost.
For example:
Configuration Map checksum FAILED, rebuilding...
Configuration parameters failure
This message indicates that the configuration parameters structure
failed the checksum and was rebuilt to the default structure. Any user
overrides of the default values, for parameters that have a default, are
lost. Some parameters have no default and retain the value in NVRAM.
Since NVRAM could be corrupt, these values could be invalid.
This message indicates that the specified ASIC failed the probe. The
status of any components that must be accessed through this component
is unknown, and they may be unavailable even if installed.
For example:
Failed probe of P1R.
Unable to determine status of PB1R_A PB1L_A PB1L_B
PB1L_A IOLR_B.
Memory board deconfiguration
This message indicates that the specified memory board is deconfigured.
This can happen when a memory board is found on one side of memory
without a corresponding pair, since boards must be used in even/odd
pairs. This can also occur when a memory board has no usable memory.
For example:
Deconfiguring: MB5L
Illegal memory board configuration
This message indicates that an illegal memory board configuration was
detected. Memory boards can only be used in two-, four-, or eight-board
configurations. In the following example, a six-board configuration
was detected, and two boards will be deconfigured.
For example:
Illegal 6 memory board configuration.
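The rule behind this message can be sketched as follows. This Python example assumes that POST falls back to the largest legal board count that does not exceed the number detected, which matches the six-board case above (four boards kept, two deconfigured); the function names are hypothetical, and the sketch does not model which specific boards are chosen.

```python
# Legal memory board counts stated in this section.
LEGAL_BOARD_COUNTS = (2, 4, 8)


def usable_board_count(detected):
    """Largest legal count not exceeding `detected`, or 0 if there is none."""
    legal = [count for count in LEGAL_BOARD_COUNTS if count <= detected]
    return max(legal) if legal else 0


def boards_to_deconfigure(detected):
    """Number of boards left over after falling back to a legal count."""
    return detected - usable_board_count(detected)
```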
Memory remap
This message indicates that the specified physical row was mapped to
the indicated logical row. This is done when DIMMs are improperly
installed or when DIMMs in lower rows are not usable. This can occur
when a DIMM is bad or when memory boards contain differing memory
populations.
For example:
MB0L: Physical Row:2 mapped to Logical Row:0
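For log analysis, the board name and the two row numbers can be extracted from this message. The Python sketch below is an assumption-based illustration: the function name is hypothetical and the pattern is derived only from the sample line above.

```python
import re

# Pattern derived from the sample remap line; not a documented format.
REMAP_LINE = re.compile(
    r"^(\w+):\s*Physical Row:(\d+)\s+mapped to\s+Logical Row:(\d+)$"
)


def parse_remap(line):
    """Return (board, physical_row, logical_row), or None if no match."""
    match = REMAP_LINE.match(line.strip())
    if match is None:
        return None
    board, physical, logical = match.groups()
    return board, int(physical), int(logical)
```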
Processor initialization failure
This message indicates that the specified processor failed to perform the
step described during parallel main memory initialization. The monarch
processor completes the initialization assigned to the failing processor.
For example:
PB1R_A timed out during encache memory init code
PB1R_A timed out during memory initialization
PB1R_A timed out during idle request after memory init
PB0L_B failed to go idle after memory init
Unable to force CPU PB2L_A into idle loop
Monarch completing memory initialization
This message indicates that the monarch processor is completing the
memory initialization assigned to the specified processor.
For example:
Using Monarch to initialize memory assigned to PB2L_A
PDT checksum failure
This message indicates that the page deallocation table structure failed
the checksum and was rebuilt to defaults. All bad page information is
lost.
This message indicates that POST detected a change in memory
hardware and cleared all entries in the PDT.
For example:
Detected a hardware change, clearing the Page
Deallocation Table (PDT).
Memory remapped
This message indicates that POST remapped memory to achieve an HP-UX
good memory region. This occurs when a bad page is marked within the
good memory region.
For example:
Memory was re-mapped to achieve HP/UX good memory
region.
Contiguous memory block not found
This message indicates that POST could not find a block of contiguous
memory to place at address zero to achieve good memory. POST will
report no main memory to the OBP for this failure.
For example:
HP/UX good memory region could not be achieved.
Processor not reported
This message indicates that a processor failed to mark itself in the
system report register. Reporting happens early in the sequence, and
this failure usually indicates the processor has failed to execute any
instructions.
For example:
Failed probe of PB1R_B, CPU failed to report in.
Processor initialization/selftest failure
This message indicates that a processor failed at some point during
initialization or selftest. The chassis code for the module that failed is
reported.
For example:
Failed probe of PB1R_B
chassis code 0x6103C
Processor not responding to interrupt
This message indicates that a processor properly initialized itself but did
not respond to an external interrupt.
For example:
Failed probe of PB1R_B
cpu PB1R_B did not respond to an interrupt
Shared Runway bus failure
This message indicates that an available processor has been
deconfigured because it shares a Runway bus with a processor that failed
to probe.
For example:
cpu PB1R_A deconfigured due to PB1R_B shutdown.
New monarch processor selected
This message indicates that the previous monarch processor was
deconfigured and a new one was selected. The new monarch continues
the initialization of the rest of the system.
For example:
INFO: New monarch selected: PB0R_A
New monarch processor not found
This message indicates that the other processor on the Runway bus with
the monarch processor was deconfigured or failed and another suitable
processor could not be found to replace the monarch.
For example:
WARNING: The monarch shares a Runway bus with a failed
cpu.
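Because the example messages in this chapter follow fixed text patterns, a log scanner can map console lines back to the message headings used here. The following Python sketch is illustrative only: the function name classify_post_line and the regular expressions are assumptions derived from the sample lines in this chapter, not an interface provided by POST, and they cover only a subset of the messages.

```python
import re

# (heading, pattern) pairs; patterns are modeled on the sample output
# shown in this chapter and may not match every firmware revision.
POST_MESSAGE_PATTERNS = [
    ("Teststation parameters failure",
     re.compile(r"^Test Station Parameters checksum FAILED")),
    ("Configuration map failure",
     re.compile(r"^Configuration Map checksum FAILED")),
    ("Memory board deconfiguration",
     re.compile(r"^Deconfiguring: \S+")),
    ("Illegal memory board configuration",
     re.compile(r"^Illegal \d+ memory board configuration")),
    ("Memory remap",
     re.compile(r"Physical Row:\d+ mapped to Logical Row:\d+")),
    ("Monarch completing memory initialization",
     re.compile(r"^Using Monarch to initialize memory assigned to \S+")),
    ("Shared Runway bus failure",
     re.compile(r"^cpu \S+ deconfigured due to \S+ shutdown")),
    ("New monarch processor selected",
     re.compile(r"^INFO: New monarch selected: \S+")),
]


def classify_post_line(line):
    """Return the heading whose pattern matches the line, or None."""
    text = line.strip()
    for heading, pattern in POST_MESSAGE_PATTERNS:
        if pattern.search(text):
            return heading
    return None
```

Lines that return None are either informational messages or messages not covered by this subset of patterns.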
4 Test Controller
The Test Controller is an EEPROM-based utility that provides the
environment for executing the offline diagnostic tests. It is controlled
through parameters stored in the NVRAM on the Utilities board. The
Test Controller reads these parameters to determine its execution mode,
the number of processors to test, which SMACs to include in the testing,
which subtests to run, and other diagnostic test-specific information.
Test Controller modes
There are three basic operational modes for this utility:
• Stand-alone mode
• Interactive mode
• I/O Utility mode
In stand-alone mode, cxtest invokes the Test Controller. The Test
Controller reads test parameters from NVRAM (these parameters are
written into NVRAM by cxtest before it invokes the Test Controller),
executes the test and subtests specified in NVRAM, and sets a
completion bit in NVRAM when the test and subtests are finished.
cxtest is described in Chapter 5.
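The stand-alone handshake described above can be modeled in a few lines. This Python sketch is a simulation under stated assumptions, not the firmware's implementation: NVRAM is represented by a plain dictionary, and the key names, function names, and the run_subtest callback are all hypothetical.

```python
def cxtest_write_parameters(nvram, test, subtests):
    """Model cxtest storing the test parameters before invoking the controller."""
    nvram["test"] = test
    nvram["subtests"] = list(subtests)
    nvram["complete"] = False  # completion bit starts cleared


def run_test_controller(nvram, run_subtest):
    """Model the Test Controller: read parameters, run subtests, set completion."""
    results = {}
    for subtest in nvram["subtests"]:
        results[subtest] = run_subtest(nvram["test"], subtest)
    nvram["complete"] = True  # completion bit set when all subtests finish
    return results
```

In this model, cxtest would poll nvram["complete"] to learn that the test and subtests have finished.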
In interactive mode, a user interface allows the user to select the
processors to test, select the subtests to run, and examine error
information. The user interface is a set of menus described in this
chapter.
In interactive mode, the Test Controller loops waiting for the start
command. Prior to issuing the start command, any global and/or
processor-specific parameters can be modified. When all tests have
completed, the Test Controller waits for the next start command. Any
combination of parameters and tests may be modified and executed.
In I/O Utility mode, the Test Controller loads, and subsequently
executes, a firmware utility module from the test station. There are
currently two supported utility modules: arrm and dfdutil. The
test station utility tc_ioutil identifies the utility module to be loaded
by updating an NVRAM location with the file name of the module.
See tc_ioutil, arrm, and dfdutil in Chapter 11, “Utilities,” for more
details.
An example of tc_ioutil would be:
tc_ioutil <node> dfdutil.fw
User interface
The Test Controller provides for the control of offline diagnostic test
execution. It utilizes a set of parameters to control its operation. The
parameters consist of the following:
• Global set that controls the overall operation of the Test Controller
• Test set (one per test) that controls how the tests are executed by the
Test Controller
• CPU parameters (one per processor) that contain status information
about the tests executing on each processor
All these parameters are in NVRAM.
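The three parameter sets can be pictured as the following data model. This Python sketch is purely illustrative: the class names, field names, and defaults are hypothetical and do not reflect the actual NVRAM layout; the fields are chosen only from items this chapter mentions (the POST boot selection, the serial/parallel execution mode, subtest selection, looping, the 128 words of test-specific information, and the per-processor state and failure count).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GlobalParameters:
    """Global set: controls overall Test Controller operation (hypothetical fields)."""
    post_boot_selection: int = 0
    execution_mode: int = 0  # 0=serial, 1=parallel, per the Execution Mode prompt


@dataclass
class TestParameters:
    """Test set, one per test (hypothetical fields)."""
    selected_subtests: List[int] = field(default_factory=list)
    loop: bool = False
    test_specific_words: List[int] = field(default_factory=lambda: [0] * 128)


@dataclass
class CpuParameters:
    """CPU parameters, one per processor (hypothetical fields)."""
    state: str = "Not Available"
    fail_count: int = 0


@dataclass
class NvramModel:
    """All three sets together, as a stand-in for the NVRAM contents."""
    global_params: GlobalParameters = field(default_factory=GlobalParameters)
    tests: Dict[str, TestParameters] = field(default_factory=dict)
    cpus: Dict[int, CpuParameters] = field(default_factory=dict)
```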
The user interface allows the user to modify parameters that reside in
NVRAM, thereby controlling the operation of the Test Controller. It also
allows the user to select which subtests are executed on each of the
processors and modify the test parameters, as well as any other test
information.
The Test Controller user interface consists of two basic menus. The first
is the main menu that gives the user the following capabilities:
• Modify the POST boot selection
• Control operation of the Test Controller
• Display the current global parameter selections
• Display processor summary
• Switch processors
• Go to the Test Configuration menu
The second menu is the processor Test Control menu that provides the
following capabilities:
• Select classes of subtests to execute
• Select subtests to execute
• Specify pause enables
• Specify whether or not to loop
• Specify the test and/or subtest error counts
• Read and write the 128 words of test specific information
• Select the hardware to test
• Display the current parameter selections
Main menu
Test Controller Main Menu
MAIN Menu commands
0=Quit Test Controller
1=Begin Test Controller Execution
2=Halt Test Controller Execution
3=Resume Test Controller Execution
4=Switch CPU
5=POST Boot Selection
6=Execution Mode Selection
7=Global Parameter Display
8=CPU Summary Display
9=Display CPU Errors
A=Test Selection Menu
B=Test Configuration Menu
C=Debugging Menu
D=Display revision
Enter Command:
Each main menu selection is defined as follows:
• 0=Quit Test Controller—Terminates the Test Controller utility and
either reboots the system (to POST and then to the selected program)
or halts the system depending on the current value of the POST Boot
Selection flag.
• 1=Begin Test Controller Execution—Starts the Test Controller utility
executing the specified subtests on the selected processors. The entire
system is started from the beginning.
• 2=Halt Test Controller Execution—Temporarily suspends the
operation of the Test Controller. This command may be entered at
any time. Only the Test Controller is halted; subtests on other
processors continue to execute.
• 3=Resume Test Controller Execution—Continues execution from the
point of interruption.
• 4=Switch CPU—Allows the user to start the Test Controller on the
specified processor. The previously used processor starts executing
the command wait loop code. The user is prompted for the processor
as follows:
Enter CPU (0-1f):
• 5=POST Boot Selection—Prompts the user for the new value with the
following prompt:
For all values, POST boots to ESPDV. The Test Controller performs a
hard reset to POST when the Test Controller terminates.
• 6=Execution Mode Selection—Allows the user to select the mode for
executing the subtests. The two options are serial and parallel. The
following prompt queries for the selection:
Execution Mode Selection (0=serial, 1=parallel):
• 7=Global Parameter Display—Displays the available hardware
components, the current “POST Boot Selection” value, and the
current “Execution Mode Selection” value. The display is shown
below:
Example Global Parameter display
Enter command: 7
MAIN Menu - Global Parameters Display
An asterisk denotes that the component has passed POST processing
and is available for diagnostic testing (see processors 0-3 and SMACs
0, 2, 4, and 6 in the display).
• 8=CPU Summary display—Displays a summary of the current
processor and testing information. An example of the display is
shown below:
                      FAIL
CPU  STATE            COUNT  SUBTEST  TEST NAME
===  =====            =====  =======  =========
0    Not Available    n/a    n/a      n/a
1    Not Available    n/a    n/a      n/a
2    Not Available    n/a    n/a      n/a
3    Idle             n/a    n/a      n/a
4    Not Available    n/a    n/a      n/a
5    Not Available    n/a    n/a      n/a
6    Not Available    n/a    n/a      n/a
7    Not Available    n/a    n/a      n/a
8    Not Available    n/a    n/a      n/a
9    Not Available    n/a    n/a      n/a
10   Not Available    n/a    n/a      n/a
11   Not Available    n/a    n/a      n/a
12   Not Available    n/a    n/a      n/a
13   Not Available    n/a    n/a      n/a
14   Not Available    n/a    n/a      n/a
15   Not Available    n/a    n/a      n/a
16   Not Available    n/a    n/a      n/a
17   Not Available    n/a    n/a      n/a
18   Not Available    n/a    n/a      n/a
19   Not Available    n/a    n/a      n/a
20   Not Available    n/a    n/a      n/a
21   Not Available    n/a    n/a      n/a
22   Not Available    n/a    n/a      n/a
23   Idle             n/a    n/a      n/a
24   Idle             n/a    n/a      n/a
25   Not Available    n/a    n/a      n/a
26   Not Available    n/a    n/a      n/a
27   Not Available    n/a    n/a      n/a
28   Not Available    n/a    n/a      n/a
29   Not Available    n/a    n/a      n/a
30   Not Available    n/a    n/a      n/a
31   Not Available    n/a    n/a      n/a
Hit <ENTER> key to return to the MAIN Menu:
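A captured CPU Summary display can be reduced to the list of processors that are available for testing. The Python sketch below is an illustration under assumptions: the function names are hypothetical, and the row pattern assumes that the fail count, subtest, and test name fields each contain no spaces, as in the sample display above.

```python
import re

# One data row: CPU number, state (may contain spaces), then three
# single-token fields. Header and separator lines do not match.
SUMMARY_ROW = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def parse_cpu_summary(text):
    """Return a list of row dictionaries parsed from the display text."""
    rows = []
    for line in text.splitlines():
        match = SUMMARY_ROW.match(line)
        if match is None:
            continue
        cpu, state, fail_count, subtest, test_name = match.groups()
        rows.append({
            "cpu": int(cpu),
            "state": state,
            "fail_count": fail_count,
            "subtest": subtest,
            "test_name": test_name,
        })
    return rows


def available_cpus(rows):
    """CPU numbers whose state is anything other than 'Not Available'."""
    return [row["cpu"] for row in rows if row["state"] != "Not Available"]
```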
Each available hardware component is marked with an asterisk just
to the right of its number (see processors 0-3 and SMACs 0, 2, 4, and 6
in the display).
The possible states in the CPU Summary Display are described in Table
20.
Table 20  Processor States

CPU State          Description
Not Available      Denotes processor is not available for testing.
Running            Denotes a test is currently running on this
                   processor.
Idle               Denotes that no test is running on this processor.
Ready              Denotes last subtest completed and ready for
                   next subtest.
Test Completed     Denotes test completed execution on this
                   processor.
Error Detected     Denotes test halted due to an error condition on
                   this processor.
Test Timeout       Denotes a timeout detected during test execution
                   on this processor; the test is halted.
HW Reqs Not Met    Denotes the hardware selected does not meet the
                   minimum hardware required for executing the
                   test.
User Halted        Denotes user halted test.
Unexpected HPMC    Denotes running test caused an HPMC; the test
                   is halted.
SW Deconfigured    Denotes test automatically halted testing on this
                   processor, because of a software restriction.

• 9=Display CPU Errors—Displays the errors for the currently selected
processor. When selected, the user is prompted for the processor as
follows:
Enter CPU [0-1f]:
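The CPU prompts in this chapter give the range as 0-1f, while the CPU Summary display numbers processors 0 through 31. Assuming the prompt value is hexadecimal, as that range suggests, the conversion between the two can be sketched in Python as follows; the function name is hypothetical.

```python
def parse_cpu_number(text):
    """Convert a CPU prompt entry (assumed hexadecimal, 0-1f) to an integer."""
    value = int(text.strip(), 16)
    if not 0 <= value <= 0x1F:
        raise ValueError("CPU number out of range 0-1f: " + text)
    return value
```

Under this assumption, the processors shown as 23 and 24 in the summary display would be entered at the prompt as 17 and 18.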