IBM p 655 User Manual

Eserver

pSeries 655
User’s Guide
SA38-0617-03
Fourth Edition (February 2004)
Before using this information and the product it supports, read the information in “Safety Notices” on page ix, Appendix A, “Environmental Notices,” on page 123, and Appendix B, “Notices,” on page 125.
A reader’s comment form is provided at the back of this publication. If the form has been removed, address comments to Information Development, Department H6DS-905-6C006, 11501 Burnet Road, Austin, Texas 78758-3493. To send comments electronically, use this commercial internet address: aix6kpub@austin.ibm.com. Any information that you supply may be used without incurring any obligation to you.
© Copyright International Business Machines Corporation, 2002, 2004. All rights reserved.
Note to U.S. Government Users -- Documentation related to restricted rights -- Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.
Contents
Safety Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .ix
Rack Safety Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .ix
Electrical and Mechanical Safety . . . . . . . . . . . . . . . . . . . . . . . . . . .ix
Laser Safety Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xi
Laser Compliance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xi
Data Integrity and Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
About This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xv
ISO 9000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xv
Highlighting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xv
Accessing Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xv
References to AIX Operating System . . . . . . . . . . . . . . . . . . . . . . . . .xv
Related Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xv
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
Chapter 1. Reference Materials . . . . . . . . . . . . . . . . . . . . . . . . . . .1
Documentation Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3
Chapter 2. Introducing the pSeries 655 . . . . . . . . . . . . . . . . . . . . . . . .7
Processor Subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8
Partitioned System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8
Partition Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9
System Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9
Types of Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9
System Attention LEDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11
System Attention LED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12
Accessing System Log Error Information . . . . . . . . . . . . . . . . . . . . . . .12
PCI Adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12
Resetting the System Attention LED . . . . . . . . . . . . . . . . . . . . . . . .13
Chapter 3. Using the Hardware Management Console for pSeries . . . . . . . . . . . . .15
Hardware Management Console (HMC) Overview and Setup . . . . . . . . . . . . . . . .15
System Power-On Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15
Powering On the Processor Subsystem Using the HMC . . . . . . . . . . . . . . . . . .15
Powering Off the Processor Subsystem Using the HMC . . . . . . . . . . . . . . . . . .15
Graphics Console Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16
Understanding the Power-On Self-Test (POST) . . . . . . . . . . . . . . . . . . . . .16
POST Indicators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16
POST Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .17
Chapter 4. Using the Service Processor . . . . . . . . . . . . . . . . . . . . . . .19
Service Processor Menus . . . . . . . . . . . . . . . . . . . . . . . . . . . . .19
Accessing the Service Processor Menus . . . . . . . . . . . . . . . . . . . . . . .19
Saving and Restoring Service Processor Settings . . . . . . . . . . . . . . . . . . .19
Menu Inactivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .20
General User Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .20
Privileged User Menus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .22
Main Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .22
Service Processor Setup Menu . . . . . . . . . . . . . . . . . . . . . . . . . .23
Passwords . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24
System Power Control Menu . . . . . . . . . . . . . . . . . . . . . . . . . . .27
System Information Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . .30
Language Selection Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . .38
Call-In/Call-Out Setup Menu . . . . . . . . . . . . . . . . . . . . . . . . . . .39
Service Processor Parameters in Service Mode (Full System Partition) . . . . . . . . . . . . .39
Service Processor Reboot/Restart Recovery . . . . . . . . . . . . . . . . . . . . . .39
Boot (IPL) Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .39
Failure During Boot Process . . . . . . . . . . . . . . . . . . . . . . . . . . .39
Failure During Normal System Operation . . . . . . . . . . . . . . . . . . . . . . .39
Service Processor Reboot/Restart Policy Controls . . . . . . . . . . . . . . . . . . .40
Updating System Firmware and Microcode . . . . . . . . . . . . . . . . . . . . . . .42
General Information on Processor Subsystem Firmware Updates . . . . . . . . . . . . . .42
Determining the Level of Firmware on the Processor Subsystem . . . . . . . . . . . . . .42
Processor Subsystem Firmware Update Using a Locally Available Image . . . . . . . . . . .43
Updating System Firmware from the AIX Service Aids . . . . . . . . . . . . . . . . . .44
Updating System Firmware from the AIX Command Line . . . . . . . . . . . . . . . . .44
Frame (Power Subsystem) Firmware Update . . . . . . . . . . . . . . . . . . . . .44
Integrated SCSI Controller Microcode Update . . . . . . . . . . . . . . . . . . . . .45
Integrated Ethernet Microcode Update . . . . . . . . . . . . . . . . . . . . . . . .45
Installing Corrective Service on the Frame . . . . . . . . . . . . . . . . . . . . . .45
Reconfiguration Procedure for SNI Adapters . . . . . . . . . . . . . . . . . . . . .46
Configuring and Deconfiguring Processors or Memory . . . . . . . . . . . . . . . . . . .46
Run-Time CPU Deconfiguration (CPU Gard) . . . . . . . . . . . . . . . . . . . . .47
Service Processor System Monitoring - Surveillance . . . . . . . . . . . . . . . . . . .47
System Firmware Surveillance . . . . . . . . . . . . . . . . . . . . . . . . . . .47
Operating System Surveillance . . . . . . . . . . . . . . . . . . . . . . . . . .47
Service Processor Error Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . .48
LCD Progress Indicator Log . . . . . . . . . . . . . . . . . . . . . . . . . . . .49
Resetting the Service Processor . . . . . . . . . . . . . . . . . . . . . . . . . . .50
Service Processor Operational Phases . . . . . . . . . . . . . . . . . . . . . . . .50
Pre-Standby Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .50
Standby Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .50
Bring-Up Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .51
Runtime Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .51
Clearing L3 Gard Records . . . . . . . . . . . . . . . . . . . . . . . . . . . .52
Chapter 5. Using System Management Services . . . . . . . . . . . . . . . . . . . .53
Select Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .54
Change Password Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . .54
Set Privileged-Access Password . . . . . . . . . . . . . . . . . . . . . . . . . .54
View Error Log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .55
Setup Remote IPL (Initial Program Load) . . . . . . . . . . . . . . . . . . . . . . . .55
Change SCSI Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .58
Select Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .58
Select Boot Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .58
Select Boot Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .61
Display Current Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .64
Restore Default Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .64
Multiboot Startup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .64
Exiting System Management Services . . . . . . . . . . . . . . . . . . . . . . . . .64
Chapter 6. Using the Online and Standalone Diagnostics . . . . . . . . . . . . . . . .65
Online and Standalone Diagnostics Operating Considerations . . . . . . . . . . . . . . . .65
Identifying the Terminal Type to the Diagnostics . . . . . . . . . . . . . . . . . . . .65
Undefined Terminal Types . . . . . . . . . . . . . . . . . . . . . . . . . . . .66
Resetting the Terminal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .66
Running Online Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . .66
Online Diagnostics Modes of Operation . . . . . . . . . . . . . . . . . . . . . . . .66
Service Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .66
Concurrent Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .67
Maintenance Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .67
Running Online Diagnostics in Service Mode . . . . . . . . . . . . . . . . . . . . . .68
Running the Online Diagnostics in Concurrent Mode . . . . . . . . . . . . . . . . . . .68
Running the Online Diagnostics in Maintenance Mode . . . . . . . . . . . . . . . . . . .68
Standalone Diagnostic Operation . . . . . . . . . . . . . . . . . . . . . . . . . . .69
Partitioned System Considerations for Standalone Diagnostics . . . . . . . . . . . . . . .69
Running Standalone Diagnostics from a Network Installation Management (NIM) Server . . . . .69
NIM Server Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . .69
Client Configuration and Booting Standalone Diagnostics from the NIM Server . . . . . . . . .70
Chapter 7. Introducing Tasks and Service Aids . . . . . . . . . . . . . . . . . . . .73
Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .73
Add Resource to Resource List . . . . . . . . . . . . . . . . . . . . . . . . . . .75
AIX Shell Prompt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .75
Analyze Adapter Internal Log . . . . . . . . . . . . . . . . . . . . . . . . . . . .75
Backup and Restore Media . . . . . . . . . . . . . . . . . . . . . . . . . . . . .75
Certify Media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .76
Change Hardware Vital Product Data . . . . . . . . . . . . . . . . . . . . . . . . .77
Configure Dials and LPF Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . .77
Configure ISA Adapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .78
Configure Reboot Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .78
Configure Remote Maintenance Policy . . . . . . . . . . . . . . . . . . . . . . . . .79
Configure Ring Indicate Power-On Policy . . . . . . . . . . . . . . . . . . . . . . . .80
Configure Scan Dump Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . .81
Configure Surveillance Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . .81
Create Customized Configuration Diskette . . . . . . . . . . . . . . . . . . . . . . .82
Delete Resource from Resource List . . . . . . . . . . . . . . . . . . . . . . . . .82
Disk Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .82
Disk to Disk Copy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .82
Display/Alter Sector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .83
Display Configuration and Resource List . . . . . . . . . . . . . . . . . . . . . . . .83
Display Firmware Device Node Information . . . . . . . . . . . . . . . . . . . . . . .83
Display Hardware Error Report . . . . . . . . . . . . . . . . . . . . . . . . . . .83
Display Hardware Errors for Any Resource . . . . . . . . . . . . . . . . . . . . . .83
Display Hardware Errors for PCI-X SCSI RAID Adapters . . . . . . . . . . . . . . . . .84
Display Hardware Errors for PCI-X SCSI Adapters . . . . . . . . . . . . . . . . . . .84
Display Hardware Vital Product Data . . . . . . . . . . . . . . . . . . . . . . . . .84
Display Machine Check Error Log . . . . . . . . . . . . . . . . . . . . . . . . . .84
Display Microcode Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .84
Display MultiPath I/O (MPIO) Device Configuration . . . . . . . . . . . . . . . . . . . .84
Display or Change Bootlist . . . . . . . . . . . . . . . . . . . . . . . . . . . . .85
Display or Change Diagnostic Run-Time Options . . . . . . . . . . . . . . . . . . . . .85
Display Previous Diagnostic Results . . . . . . . . . . . . . . . . . . . . . . . . .86
Display Resource Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . .87
Display Service Hints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .87
Display Software Product Data . . . . . . . . . . . . . . . . . . . . . . . . . . .87
Display System Environmental Sensors . . . . . . . . . . . . . . . . . . . . . . . .87
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .88
Display Test Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .88
Display USB Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .88
Download Microcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .89
Download Microcode to PCI SCSI RAID Adapter . . . . . . . . . . . . . . . . . . . .89
Download Microcode to a PCI-X Dual Channel Adapter . . . . . . . . . . . . . . . . .89
Download Microcode to Disk Drive Attached to a PCI SCSI RAID Adapter . . . . . . . . . .89
Download Microcode to a Fiber Channel Adapter . . . . . . . . . . . . . . . . . . . .90
Download Microcode to DVD-RAM Attached to a PCI SCSI Adapter . . . . . . . . . . . . .90
Download Microcode to Disk Attached to PCI SCSI Adapter . . . . . . . . . . . . . . . .90
Download Microcode to Other Devices . . . . . . . . . . . . . . . . . . . . . . .90
Fault Indicators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .91
Fibre Channel RAID Service Aids . . . . . . . . . . . . . . . . . . . . . . . . . .91
Flash SK-NET FDDI Firmware . . . . . . . . . . . . . . . . . . . . . . . . . . . .92
Format Media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .92
Format and/or Erase Hard File Attached to a Non-RAID and PCI-X SCSI Adapter . . . . . . . .92
Hardfile Attached to PCI SCSI RAID Adapter . . . . . . . . . . . . . . . . . . . . .93
Optical Media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .94
Diskette Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .94
Gather System Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . .94
Generic Microcode Download . . . . . . . . . . . . . . . . . . . . . . . . . . . .94
Hot-Plug Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .94
PCI Hot-Plug Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . .95
SCSI Hot Swap Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . .96
SCSI and SCSI RAID Hot-Plug Manager . . . . . . . . . . . . . . . . . . . . . . .97
RAID Hot-Plug Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . .98
Identify Indicators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .99
Identify and Remove Resource Task . . . . . . . . . . . . . . . . . . . . . . . . .99
Identify and System Attention Indicators . . . . . . . . . . . . . . . . . . . . . . . . 100
Local Area Network Analyzer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Log Repair Action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
PCI RAID Physical Disk Identify . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
PCI SCSI Disk Array Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Process Supplemental Media . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
RAID Array Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
PCI SCSI Disk Array Manager . . . . . . . . . . . . . . . . . . . . . . . . . . 102
PCI-X SCSI Disk Array Manager . . . . . . . . . . . . . . . . . . . . . . . . . 102
Run Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Run Error Log Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Run Exercisers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Exerciser Commands (CMD) . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Memory Exerciser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Tape Exerciser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Diskette Exerciser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
CD-ROM Exerciser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Floating Point Exerciser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Save or Restore Hardware Management Policies . . . . . . . . . . . . . . . . . . . . 105
SCSI Bus Analyzer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
SCSI RAID Physical Disk Status and Vital Product Data . . . . . . . . . . . . . . . . . . 106
SCSD Tape Drive Service Aid . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Spare Sector Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
SSA Service Aid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
System Fault Indicator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
System Identify Indicator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Update Disk-Based Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Update System or Service Processor Flash . . . . . . . . . . . . . . . . . . . . . . 108
7135 RAIDiant Array Service Aid . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Command Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7318 Serial Communications Network Server Service Aid . . . . . . . . . . . . . . . . .110
Chapter 8. Verifying Hardware Operation . . . . . . . . . . . . . . . . . . . . . . 111
Considerations Before Running This Procedure . . . . . . . . . . . . . . . . . . . . . 111
Loading the Online Diagnostics in Service Mode . . . . . . . . . . . . . . . . . . . . . 111
Running Standalone Diagnostics from a Network Installation Management (NIM) Server . . . . . .112
NIM Server Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . .112
Client Configuration and Booting Standalone Diagnostics from the NIM Server . . . . . . . .112
Default Boot List and Service Mode Boot List . . . . . . . . . . . . . . . . . . . . . .114
Chapter 9. Hardware Problem Determination . . . . . . . . . . . . . . . . . . . . .115
Problem Determination Using the Standalone or Online Diagnostics . . . . . . . . . . . . .115
Problem Determination When Unable to Load Diagnostics . . . . . . . . . . . . . . . . . 120
Appendix A. Environmental Notices . . . . . . . . . . . . . . . . . . . . . . . . 123
Product Recycling and Disposal . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Environmental Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Acoustical Noise Emissions (1, 2) . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Appendix B. Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Appendix C. Removing and Replacing PCI Adapters . . . . . . . . . . . . . . . . . . 127
Removing a PCI Adapter Cassette . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Replacing a PCI Adapter Cassette . . . . . . . . . . . . . . . . . . . . . . . . . 131
Removing a Non-Hot-Pluggable PCI Adapter . . . . . . . . . . . . . . . . . . . . . 133
Replacing a Non-Hot-Pluggable PCI Adapter . . . . . . . . . . . . . . . . . . . . . 133
Hot-Pluggable PCI Adapter . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
PCI Hot-Plug Manager Access . . . . . . . . . . . . . . . . . . . . . . . . . . 137
PCI Adapter or Blank Filler Removal from a Snap-Assembly-Type Cassette . . . . . . . . . 140
Replacing an Adapter in a PCI Adapter Cassette . . . . . . . . . . . . . . . . . . . . 152
Short Adapter or Blank Filler Installation . . . . . . . . . . . . . . . . . . . . . . . . 156
Long Adapter Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Safety Notices
A danger notice indicates the presence of a hazard that has the potential of causing death or serious personal injury. Danger notices appear on the following pages:
• ix
A caution notice indicates the presence of a hazard that has the potential of causing moderate or minor personal injury. Caution notices appear on the following pages:
• x
• x
• xi
• 140
• 152
For a translation of the safety notices contained in this book, see the System Unit Safety Information, order number SA23-2652.
Rack Safety Instructions
• Do not install this unit in a rack where the internal rack ambient temperatures will exceed 35 degrees C.
• Do not install this unit in a rack where the air flow is compromised. Any side, front or back of the unit used for air flow through the unit must not be in direct contact with the rack.
• Care should be taken to ensure that a hazardous condition is not created due to uneven mechanical loading when installing this unit in a rack. If the rack has a stabilizer it must be firmly attached before installing or removing this unit.
• Consideration should be given to the connection of the equipment to the supply circuit so that overloading of circuits does not compromise the supply wiring or overcurrent protection. To provide the correct power connection to the rack, refer to the rating labels located on the equipment in the rack to determine the total power requirement for the supply circuit.
• An electrical outlet that is not correctly wired could place hazardous voltage on the metal parts of the system or the devices that attach to the system. It is the responsibility of the customer to ensure that the outlet is correctly wired and grounded to prevent an electrical shock.
Electrical and Mechanical Safety
Observe the following safety instructions any time you are connecting or disconnecting devices attached to the system.
DANGER
An electrical outlet that is not correctly wired could place hazardous voltage on metal parts of the system or the devices that attach to the system. It is the responsibility of the customer to ensure that the outlet is correctly wired and grounded to prevent an electrical shock.
Use one hand, when possible, to connect or disconnect signal cables to prevent a possible shock from touching two surfaces with different electrical potentials.
During an electrical storm, do not connect cables for display stations, printers, telephones, or station protectors for communications lines.
D06
CAUTION: This product is equipped with a four-wire (three-phase and ground) power cable for the user’s safety. Use this power cable with a properly grounded electrical outlet to avoid electrical shock.
C27
DANGER
To prevent electrical shock hazard, disconnect all power cables from the electrical outlet before relocating the system.
D01
CAUTION: Metal edges might be sharp.
C38
Laser Safety Information
CAUTION: This product may contain a CD-ROM, DVD-ROM, or laser module on a PCI card, which are class 1 laser products.
C30
Laser Compliance
All lasers are certified in the U.S. to conform to the requirements of DHHS 21 CFR Subchapter J for class 1 laser products. Outside the U.S., they are certified to be in compliance with the IEC 825 (first edition 1984) as a class 1 laser product. Consult the label on each part for laser certification numbers and approval information.
CAUTION: All IBM laser modules are designed so that there is never any human access to laser radiation above a class 1 level during normal operation, user maintenance, or prescribed service conditions. Data processing environments can contain equipment transmitting on system links with laser modules that operate at greater than class 1 power levels. For this reason, never look into the end of an optical fiber cable or open receptacle. Only trained service personnel should perform the inspection or repair of optical fiber cable assemblies and receptacles.
C25, C26
Data Integrity and Verification
IBM computer systems contain mechanisms designed to reduce the possibility of undetected data corruption or loss. This risk, however, cannot be eliminated. Users who experience unplanned outages, system failures, power fluctuations or outages, or component failures must verify the accuracy of operations performed and data saved or transmitted by the system at or near the time of the outage or failure. In addition, users must establish procedures to ensure that there is independent data verification before relying on such data in sensitive or critical operations. Users should periodically check the IBM support websites for updated information and fixes applicable to the system and related software.
About This Book
This book provides information on how to use the server, use diagnostics, use service aids, and verify server operation. This book also provides information to help you solve some of the simpler problems that might occur.
ISO 9000
ISO 9000 registered quality systems were used in the development and manufacturing of this product.
Highlighting
The following highlighting conventions are used in this book:
Bold
Identifies commands, subroutines, keywords, files, structures, directories, and other items whose names are predefined by the system. Also identifies graphical objects such as buttons, labels, and icons that the user selects.
Italics
Identifies parameters whose actual names or values are to be supplied by the user.
Monospace
Identifies examples of specific data values, examples of text similar to what you might see displayed, examples of portions of program code similar to what you might write as a programmer, messages from the system, or information you should actually type.
Accessing Information
Documentation for the IBM Eserver pSeries is available online. Visit the IBM Eserver pSeries Information Center at http://publib16.boulder.ibm.com/pseries/en_US/infocenter/base.
• To access the pSeries publications, click Hardware documentation.
• To view information about the accessibility features of Eserver pSeries hardware and the AIX operating system, click AIX and pSeries accessibility.
References to AIX Operating System
This document may contain references to the AIX operating system. If you are using another operating system, consult the appropriate documentation for that operating system.
This document may describe hardware features and functions. While the hardware supports them, the realization of these features and functions depends upon support from the operating system. AIX provides this support. If you are using another operating system, consult the appropriate documentation for that operating system regarding support for those features and functions.
Related Publications
The following publications provide related information:
• The System Unit Safety Information, order number SA23-2652, contains translations of safety information used throughout this book.
• The Site and Hardware Planning Information, order number SA38-0508, contains information to help you plan your installation.
• The Eserver pSeries 655 Service Guide, order number SA38-0618, contains reference information, maintenance analysis procedures (MAPs), error codes, removal and replacement procedures, and a parts catalog.
• The Eserver pSeries 655 Installation Guide, order number SA38-0616, contains information about how to set up and cable the server, install additional processors and subsystems, and verify server operation.
• The IBM Hardware Management Console for pSeries Installation and Operations Guide, order number SA38-0590, provides information to system administrators about how to install and use a Hardware Management Console (HMC) to manage a system.
• The IBM Hardware Management Console for pSeries Maintenance Guide, order number SA38-0603, provides information about how to service a Hardware Management Console (HMC).
• The RS/6000 and Eserver Diagnostic Information for Multiple Bus Systems, order number SA38-0509, contains diagnostic information, service request numbers (SRNs), and failing function codes (FFCs).
• The RS/6000 and Eserver Adapters, Devices and Cable Information for Multiple Bus Systems, order number SA38-0516, contains information about adapters, devices, and cables for your server. This manual is intended to supplement the service information found in the RS/6000 and Eserver Diagnostic Information for Multiple Bus Systems.
• The RS/6000 and Eserver PCI Adapter Placement Reference, order number SA38-0538, contains information regarding slot restrictions for adapters that can be used in this system.
• The AIX Installation Guide and Reference, order number SC23-4389, describes how to install the operating system, to use a network server, and to install the operating system and run diagnostics on systems connected to a network.
• The AIX Installation in a Partitioned Environment, order number SC23-4384, provides information about installing AIX in a partitioned environment.
Trademarks
The following terms are trademarks of International Business Machines Corporation in the United States, other countries, or both:
• AIX
• Eserver
• IBM
• pSeries
• RS/6000
Other company, product, and service names may be trademarks or service marks of others.
Chapter 1. Reference Materials
Note: This document may contain references to the AIX operating system. If you are using another operating system, consult the appropriate documentation for that operating system.
This document may describe hardware features and functions. While the hardware supports them, the implementation of these features and functions depends on support from the operating system. AIX provides this support. If you are using another operating system, consult the appropriate documentation for that operating system regarding support for those features and functions.
This chapter helps you get started with installing and configuring the Eserver pSeries environment. The following information is included in the chapter:
• Eserver pSeries Roadmap
• Documentation Overview: brief descriptions of the printed and softcopy documentation shipped, including the target audience
The Eserver pSeries Roadmap helps you locate marketing, service, and customer task information. The roadmap guides you through the tasks and the publications that document those tasks.
[Figure: Eserver pSeries Roadmap. A flowchart that starts at Begin and branches on "Managed by HMC?" and "Is System Using Partitions?". It is divided into Marketing and Customer Tasks, Installer Tasks, and Customer Tasks, and covers these steps: Planning, Hardware Installation, Configuring Partitions or Configuring Full System Partition, Installing/Configuring the Operating System, Installing/Configuring Applications, and Using the System. Publications shown against the steps: Site and Hardware Planning Information; Planning for Partitioned-System Operations; Hardware Management Console Installation and Operations Guide; Hardware Installation Guide; AIX Installation in a Partitioned Environment; AIX Installation Guide and Reference; Operating System Installation: Getting Started; Application Documentation; AIX Documentation Library.]
The publications listed in this section are available online. To access the online books, visit our IBM Eserver pSeries Information Center at http://publib16.boulder.ibm.com/pseries/en_US/infocenter/base.
Documentation Overview
This section provides descriptions and target audience information for the Eserver pSeries and AIX 5L documentation libraries. Some of the documentation may only be available in softcopy form. Based on the documentation content, the books are divided into the following categories: Planning, Installing and Configuring, and Using the System.
Table 1. Planning

Documentation Title: Site and Hardware Planning Information
Description: Contains information to help plan for site preparation tasks, such as floor-planning, electrical needs, air conditioning, and other site-planning considerations.
Audience: Marketing, system administrators
Type: softcopy

Documentation Title: Planning for Partitioned-System Operations
Description: Describes planning considerations for partitioned systems, including information on dynamic partitioning and Capacity Upgrade on Demand.
Audience: System administrators
Type: printed and softcopy

Documentation Title: Hardware Management Console for pSeries Installation and Operations Guide
Description: Provides information on how to install, configure, and use a Hardware Management Console (HMC). Logical partition (LPAR) tasks, such as configuring and managing partitions on multiple host servers, are included.
Audience: System administrators
Type: printed and softcopy
Table 2. Installing and Configuring

Documentation Title: Hardware Installation Guide
Description: Provides information on how to install system hardware, cable the system, and verify operations.
Audience: System installer
Type: printed and softcopy

Documentation Title: Planning for Partitioned-System Operations
Description: Describes planning considerations for partitioned systems, including information on dynamic partitioning and Capacity Upgrade on Demand.
Audience: System administrators
Type: printed and softcopy

Documentation Title: Hardware Management Console for pSeries Installation and Operations Guide
Description: Provides information on how to install, configure, and use a Hardware Management Console (HMC). Logical partition (LPAR) tasks, such as configuring and managing partitions on multiple host servers, are included.
Audience: System administrators
Type: printed and softcopy

Documentation Title: AIX Installation in a Partitioned Environment
Description: Provides information on how to install the AIX operating system in an LPAR environment.
Audience: System administrators
Type: printed and softcopy

Documentation Title: AIX Operating System Installation: Getting Started
Description: Provides information on how to install and configure the AIX operating system on a standalone system using a CD-ROM device.
Audience: System administrators
Type: printed and softcopy

Documentation Title: AIX 5L Installation Guide and Reference
Description: Provides information on installing the AIX 5L operating system on standalone systems, as well as on client systems using the Network Installation Management (NIM) interface.
Audience: System administrators
Type: printed and softcopy

Documentation Title: PCI Adapter Placement Reference
Description: Outlines system-specific PCI adapter slot placement and adapter support configurations.
Audience: System administrators, service personnel
Type: softcopy

Documentation Title: AIX 5L Release Notes
Description: Provides late-breaking information for a specific AIX release.
Audience: System administrators
Type: printed and softcopy

Documentation Title: AIX 5L Documentation CD
Description: AIX documentation library (system management guides, user guides, application programmer guides, commands and files references, AIX man pages, and so on).
Audience: System administrators
Type: softcopy
Table 3. Using the System

Documentation Title: Hardware Management Console for pSeries Installation and Operations Guide
Description: Provides information on how to install, configure, and use a Hardware Management Console (HMC). Logical partition (LPAR) tasks, such as configuring and managing partitions on multiple host servers, are included.
Audience: System administrators
Type: printed and softcopy

Documentation Title: Hardware User’s Guide
Description: Provides using, problem determination, and service processor information.
Audience: System administrators
Type: printed and softcopy

Documentation Title: Diagnostic Information for Multiple Bus Systems
Description: Combines operating instructions for hardware diagnostic programs with common MAPs and SRNs (Service Request Numbers).
Audience: Service personnel
Type: printed and softcopy

Documentation Title: PCI Adapter Placement Reference
Description: Outlines system-specific PCI adapter slot placement and adapter support configurations.
Audience: System administrators, service personnel
Type: softcopy

Documentation Title: Hardware Management Console for pSeries Maintenance Guide
Description: Contains MAPs, removal and replacement, error code, and parts information to help diagnose and repair the system.
Audience: Service personnel
Type: printed and softcopy

Documentation Title: Adapters, Devices, and Cable Information for Multiple Bus Systems
Description: Provides information about adapters, devices, and cables that are attached to or used within the system.
Audience: System administrators
Type: printed and softcopy

Documentation Title: System Unit Safety Information
Description: Contains the English version of safety notices, as well as translations of those safety notices into other languages.
Audience: System administrators, service personnel
Type: printed and softcopy

Documentation Title: AIX 5L Documentation CD
Description: AIX documentation library (system management guides, user guides, application programmer guides, commands and files references, AIX man pages, and so on).
Audience: System administrators
Type: softcopy
32
up to 16
Chapter 2. Introducing the pSeries 655
The pSeries 655 is a shared multiprocessor server.
The processor subsystem can be configured (or partitioned) as multiple separate systems. This configuration is known as a logically partitioned system. The processor subsystem is described in “Processor Subsystem” on page 8.
Other options for a rack configuration include the 7040 Model 61D I/O Subsystem and 7040 Model W42 Integrated Battery Feature (IBF) for the power subsystem.
The following components comprise configurations of the pSeries 655 system:
Bulk Power Assembly (BPA). The BPA is the main power control unit for the pSeries 655 system. This redundant bulk power assembly distributes power at 350 V to each drawer where conversion is made to the required chip level. The BPA contains the following components:
Bulk Power Regulator (BPR)
Bulk Power Controller (BPC)
Bulk Power Distributor (BPD)
Bulk Power Jumper (BPJ) - optional
Processor Subsystem. The processor subsystem is a 4 EIA-unit-high drawer. The processor subsystem contains:
One multichip module (MCM) processor operating at one of the following speeds:
1.1 GHz (8-way)
1.3 GHz (4-way)
1.5 GHz (8-way)
1.7 GHz (4-way)
Four L3 cache modules
Two externalized RIO or RIO-2 ports
One pair of internal RIO or RIO-2 ports servicing three PCI-X slots, two 10/100 Ethernet adapters, a dual SCSI adapter, and a service processor card
Fan assembly
Two 3.5-inch SCSI DASD bays
Four memory cards with a minimum of 4 GB and a maximum of 64 GB of memory
The processor subsystem contains the Distributed Converter Assembly (DCA) used in the conversion of 350 V bulk power to the supply voltages required by the various internal components.
The minimum memory required to operate this system is 4 GB, and the maximum amount of memory is
GB.
Integrated Battery Feature (IBF). The IBF is a 2 EIA-unit-high drawer that can be added to your system. The IBF is optional and provides backup electric power in case of a power outage. You can install up to six IBFs in the rack configuration. The total number of IBFs that can be installed depends upon the number of processor subsystems and I/O subsystems installed in the rack.
I/O Subsystem. Each I/O subsystem is a 4 EIA-unit-high subsystem containing up to two I/O boards, disk drives, four DASD backplanes, a midplane card, four cooling fans, and two power supplies (which are independent of the bulk power assembly). A rack can have up to five I/O subsystems with each drawer having 20 PCI card slots, and more than 500 GB of storage.
Note: If your system configuration contains IBFs, the rack drawer space may be limited for I/O subsystems.
Hardware Management Console (HMC). The HMC consists of a display, independent processor, keyboard, and mouse. One HMC is standard for all systems. An additional HMC is optional. Two HMCs can attach to one processor subsystem, or two HMCs can jointly manage up to 16 processor subsystems in up to four racks with the use of 8-port asynchronous adapters and 128-port asynchronous adapters.
For more information about the use of logical partitioned systems, see the “Partitioned System Overview.”
Processor Subsystem
The pSeries 655 Model 651 (processor subsystem) is a processor node installed in a frame-mounted cage. The equipment rack holds a maximum of 16 processor subsystems.
[Figure: Rear view and front view of the frame cage (shown with two processor subsystems and front cover removed), with callouts for the frame cage, the first pSeries 655 processor subsystem, and the second pSeries 655 processor subsystem.]
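The rack limits quoted in this chapter (a maximum of 16 processor subsystems, up to five I/O subsystems, and up to six IBFs) can be expressed as a simple pre-order check. The following Python sketch is illustrative only: the constant and function names are assumptions made for this example, not part of any IBM tool. It checks upper bounds only and does not model how IBFs reduce the drawer space available for I/O subsystems.

```python
# Upper bounds quoted in this chapter; all names here are illustrative.
MAX_PROCESSOR_SUBSYSTEMS = 16  # per equipment rack
MAX_IO_SUBSYSTEMS = 5          # per rack
MAX_IBFS = 6                   # per rack configuration


def rack_limit_violations(processor_subsystems, io_subsystems, ibfs):
    """Return one message for each count that exceeds its stated maximum."""
    checks = [
        ("processor subsystems", processor_subsystems, MAX_PROCESSOR_SUBSYSTEMS),
        ("I/O subsystems", io_subsystems, MAX_IO_SUBSYSTEMS),
        ("IBFs", ibfs, MAX_IBFS),
    ]
    return [
        f"{label}: {count} exceeds the maximum of {limit}"
        for label, count, limit in checks
        if count > limit
    ]


# A configuration within the limits, and one that exceeds two of them.
ok_rack = rack_limit_violations(processor_subsystems=8, io_subsystems=2, ibfs=2)
bad_rack = rack_limit_violations(processor_subsystems=17, io_subsystems=2, ibfs=7)
print(ok_rack)
print(bad_rack)
```

An empty list means none of the three stated maximums is exceeded. The actual number of IBFs and I/O subsystems that fit still depends on the mix of drawers in the rack.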
The Hardware Management Console for pSeries (HMC) is used to manage the resources in the system. The system can be configured as a full system partition, which means that all resources of the system are used as a single system.
The system can also be configured into multiple (or logical) partitioned systems. With a logically partitioned system, system resources can be divided into a number of systems, each running in its own partition.
Numerous configurations of pSeries 655 systems can be managed from one HMC. A second HMC can be used for redundancy.
Partitioned System Overview
Partitioning enables users to configure a single computer into several independent systems. Each of these systems, called logical partitions, is capable of running applications in its own independent environment. This independent environment contains its own operating system, its own set of system processors, its own set of system memory, and its own I/O adapters.
The HMC allows you to perform many hardware management tasks for your managed system, including configuring logical partitions. You can choose to operate your managed system as a single server (called full system partition), or you can choose to run multiple partitions.
Partition Profiles
A profile defines a configuration setup for a managed system or partition. The HMC allows you to create multiple profiles for each managed system or partition. You can then use the profiles you created to start a managed system or partition in a particular configuration.
A partition does not actually own any resources until it is activated; resource specifications are stored within partition profiles. The same partition can operate using different resources at different times, depending on the profile you activate.
When you activate a partition, you enable the system to create a partition using the set of resources in a profile created for that partition. For example, a logical partition profile might indicate to the managed system that its partition requires three processors, 2 gigabytes of memory, and I/O slots 6, 11, and 12 when activated.
You can have more than one profile for a partition. However, you can only activate a partition with one profile at a time.
When you create a partition profile, the HMC shows you all the resources available on your system. The HMC does not, however, verify if another partition profile is currently using a portion of these resources. For example, the HMC might show eight processors on your system, but does not notify you that other partitions are using six of them. You can create two partition profiles, each using a majority of system resources. If you attempt to activate both of these partitions at the same time, the second partition in the activation list fails.
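The overcommitment case described above can be modeled in a few lines. This Python sketch is a simplified illustration, not HMC code: the `PartitionProfile` class, its fields, and the partition names are hypothetical, and only processors and memory are tracked. It shows why the second of two profiles that each request most of the system fails when both are activated.

```python
from dataclasses import dataclass


@dataclass
class PartitionProfile:
    """Hypothetical record of the resources a partition profile requests."""
    partition: str
    processors: int
    memory_gb: int


def activate_in_order(total_processors, total_memory_gb, profiles):
    """Activate profiles in list order against the remaining free resources.

    A profile fails when the processors or memory still free cannot
    cover its request.
    """
    free_cpu, free_mem = total_processors, total_memory_gb
    activated, failed = [], []
    for profile in profiles:
        if profile.processors <= free_cpu and profile.memory_gb <= free_mem:
            free_cpu -= profile.processors
            free_mem -= profile.memory_gb
            activated.append(profile.partition)
        else:
            failed.append(profile.partition)
    return activated, failed


# Two profiles that each request six of the eight processors.
profiles = [
    PartitionProfile("lpar1", processors=6, memory_gb=16),
    PartitionProfile("lpar2", processors=6, memory_gb=16),
]
activated, failed = activate_in_order(8, 32, profiles)
print(activated, failed)
```

Both profiles can be created because nothing is checked at creation time. The conflict appears only at activation, when the second request exceeds what is left.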
System Profiles
Using the HMC, you can create and activate often-used collections of predefined partition profiles. A collection of predefined partition profiles is called a system profile. The system profile is an ordered list of partitions and the profile that is to be activated for each partition. The first profile in the list is activated first, followed by the second profile in the list, followed by the third, and so on.
The system profile helps you change the managed systems from one complete set of partition configurations to another. For example, a customer may want to switch from using eight partitions to using only four, every day. To do this, the system administrator deactivates the eight partitions and activates a different system profile, one specifying four partitions.
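Because a system profile is an ordered list, the daily switch described above reduces to a sequence of steps. The Python sketch below is a model for illustration only. The function name, the tuple layout, and the partition and profile names are assumptions, and it does not call any HMC interface.

```python
def switch_steps(active_partitions, system_profile):
    """List the steps for moving to a new system profile.

    system_profile is an ordered list of (partition, profile) pairs;
    activation follows list order, as described for system profiles.
    """
    steps = [("deactivate", name) for name in active_partitions]
    steps += [("activate", partition, profile)
              for partition, profile in system_profile]
    return steps


# Hypothetical names: eight daytime partitions, a four-partition night profile.
daytime = [f"lpar{i}" for i in range(1, 9)]
night_profile = [(f"night{i}", "batch") for i in range(1, 5)]
steps = switch_steps(daytime, night_profile)
print(len(steps))
print(steps[8])
```

The first eight steps deactivate the running partitions and the last four activate the new ones in list order.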
When you create a group of affinity partitions, the HMC automatically creates a system profile that includes all of the affinity partitions that you created.
Types of Partitions
The HMC allows you to use two types of partitions: logical partitions and the full system partition.
Logical Partitions
Logical partitions are user-defined system resource divisions. Users determine the number of processors, the amount of memory, and the I/O that a logical partition can have when active.
Full System Partition
A special partition called the full system partition assigns all of your managed system’s resources to one large partition. The full system partition is similar to the traditional, non-partitioned method of operating a system. Because all resources are assigned to this partition, no other partitions can be started when the full system partition is running. Likewise, the full system partition cannot be started while other partitions are running.
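The mutual exclusion between the full system partition and logical partitions can be stated as a small predicate. This Python sketch only restates the rule from the paragraph above. The function and constant names are assumptions for illustration.

```python
FULL_SYSTEM = "full system partition"  # illustrative label, not an HMC identifier


def can_start(candidate, running):
    """Return True if the candidate partition may be started.

    The full system partition needs every resource, so it can start only
    when nothing else is running; a logical partition can start only when
    the full system partition is not running.
    """
    if candidate == FULL_SYSTEM:
        return len(running) == 0
    return FULL_SYSTEM not in running


print(can_start(FULL_SYSTEM, []))
print(can_start(FULL_SYSTEM, ["lpar1"]))
print(can_start("lpar2", ["lpar1"]))
print(can_start("lpar1", [FULL_SYSTEM]))
```

The predicate covers only this exclusion rule. Whether a logical partition actually activates also depends on the resources its profile requests.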
The HMC allows you to easily switch from the full system partition to logical partitions. The actual setup of the operating system in a partition may require some careful planning to ensure no conflicts exist between the two environments.
For more detail about partitions, see the IBM Hardware Management Console for pSeries Installation and Operations Guide.
[Figure: Processor subsystem illustration with callouts 1 through 28. Legend entries: Ethernet Port 1; Ethernet Port 0; RIO (or RIO-2) Port 0 (A0); RIO (or RIO-2) Port 1 (A1); RIO (or RIO-2) Adapter; Memory Card Slots 1 through 4; Memory Controllers 0, 1, 3, and 2; Cache Modules 0, 1, 3, and 2; MCM Module (processor); RIO/PCI-X Bridge; SCSI Controller; DASD Ribbon Cable Connector; Service Processor/VPD Card Connector; Adapter Connector; PCI-X Bridge (PH2); PCI-X Bridge (PH0); PCI Adapter Slots 1 through 3.]
System Attention LEDs
On the processor subsystem, there are two system attention LEDs, one on the DCA and the other on the rear of the system. For the specific locations of these LEDs, refer to the following illustration.
[Figure: LED locations on the processor subsystem. Labels: Cable Identify (Port 0, Port 1); Link Status (Enet 0, Enet 1); System Attention; SP Adapter FRU Indicator; Service Processor FRU Indicator; Processor Planar FRU Indicator; Hardfile Carrier; Hot Plug Slot ID; Hot Plug Slot Power; FRU Indicator; Memory Cards 1 through 4 (Bank 1, Bank 2); Activity and Identify for Hardfile 1 and Hardfile 2; DASD Cage FRU Identify; DASD Backplane; DCA (Power In, Power Out, System Attention, DCA FRU Identify); Fan FRU Identify; light pipes that make the LEDs visible with a hardfile plugged in.]
The two LEDs are tied together, so they will always be in the same state, either both on or both off. They are referred to as the system attention LEDs. The system attention LED is turned on when an entry is made in the service processor error log that gets transmitted to the system-level error logs (the AIX error log and the service action event log in the Service Focal Point application). When the attention light comes on, examine these error logs to see if user intervention is required.
If a hardware problem is indicated, call service support.
System Attention LED
The system attention LED lights and stays on when an event occurs that needs either customer intervention or IBM service. The system attention LED lights when an entry is made in the service processor error log. The error entry is transmitted to the following:
System-level error logs
AIX error log
As an entry in the service action event log in the Service Focal Point application; for example, the loss of surveillance from the HMC to a logical partition.
Accessing System Log Error Information
When an error is detected by the system, information about the error is stored in system error logs. The error logs are accessed from the HMC that is used to manage the system.
If the system attention LED lights, use the HMC to access the error logs by doing the following:
1. Open the Service Action Event Log in the Service Focal Point application on the HMC.
2. Check the open service events using the procedure described in “Working with Serviceable Events” in the IBM Hardware Management Console for pSeries Installation and Operations Guide.
Note: For information on these Service Focal Point settings, see “Setting Up Surveillance and Enabling Surveillance Notifications” in the IBM Hardware Management Console for pSeries Installation and Operations Guide.
Events requiring customer intervention are marked Call Home Candidate? NO. For each of these events, examine the description in the serviceable event error details. If actions are listed in the description, perform those actions. If the error indicates a loss of surveillance between the HMC and a partition, check the status of the partition, the network, and the cabling between the HMC and the partition. Correct any problems found. If a surveillance problem is still indicated, call service support.
If the service action event is labeled Call Home Candidate? YES, an error code is supplied to identify the problem. If the system is configured to automatically call home on error, the request for service is placed.
Note: The system attention LED can be reset by following the procedure described in “Resetting the System Attention LED” on page 13.
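The handling of serviceable events described in this section follows a simple decision path based on the Call Home Candidate flag. The Python sketch below restates that path for illustration. The `ServiceableEvent` class and its field names are hypothetical and do not correspond to an actual Service Focal Point data format.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceableEvent:
    """Hypothetical view of one entry in the service action event log."""
    call_home_candidate: bool
    listed_actions: list = field(default_factory=list)
    surveillance_loss: bool = False


def next_steps(event):
    """Return the follow-up steps this section describes for an event."""
    if event.call_home_candidate:
        # Call Home Candidate? YES: an error code identifies the problem.
        return ["use the supplied error code; a service request is placed "
                "if automatic call home is configured"]
    # Call Home Candidate? NO: the customer handles the event.
    steps = list(event.listed_actions)
    if event.surveillance_loss:
        steps += [
            "check the status of the partition",
            "check the network",
            "check the cabling between the HMC and the partition",
            "call service support if the surveillance problem persists",
        ]
    return steps


customer_event = ServiceableEvent(call_home_candidate=False, surveillance_loss=True)
service_event = ServiceableEvent(call_home_candidate=True)
print(next_steps(customer_event))
print(next_steps(service_event))
```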
PCI Adapters
For complete information about removing and replacing PCI adapters, see Appendix C, “Removing and Replacing PCI Adapters,” on page 127.
Resetting the System Attention LED
The system attention LED can be turned off by using the HMC or either of the alternate methods that are described.
Resetting the System Attention LED from the HMC
To reset the system attention LED from the HMC, do the following:
1. On the HMC interface, click Service Applications.
2. Double-click Service Focal Point.
3. In the Contents area of the screen, select Hardware Service Functions. The LED Management window opens.
4. In the LED Management window, select one or more managed systems from the table.
5. Select Deactivate LED. The associated system attention LED is turned off.
For more information about the virtual operator panel on the HMC, see the IBM Hardware Management Console for pSeries Installation and Operations Guide.
Alternate Method of Resetting the System Attention LED Using the AIX Command Line
As a user with root authority, enter diag on the AIX command line, and do the following:
1. Select Task Selection.
2. On the Task Selection Menu, select Identify and Attention Indicators.
3. When the list of LEDs displays, use the cursor to highlight Set System Attention Indicator to Normal.
4. Press Enter, and then press F7 to commit. This action turns off the LED.
Alternate Method of Resetting the System Attention LED Using the Service Processor
If the system is powered off, access the service processor menus. From the service processor main menu, do the following:
1. Select the System Information Menu.
2. Select LED Control Menu.
3. Select Clear System Attention Indicator. This action turns off the LED.
Chapter 3. Using the Hardware Management Console for pSeries
This chapter discusses the Hardware Management Console for pSeries (HMC), system power control, and the power-on self-test that occurs after powering on the system.
Hardware Management Console (HMC) Overview and Setup
The HMC uses its connection to the processor subsystem to perform various functions. The main functions of the HMC include:
Creating and maintaining a multiple-partition environment
Detecting, reporting, and storing changes in hardware conditions
Acting as a service focal point for service representatives to determine an appropriate service strategy
Note: The HMC is shipped with the Eserver pSeries 655 and is the main interface for configuring and managing resources on this system through the HMC virtual terminal window. Virtual terminal window refers to the operating system session on a particular window. You can have up to 16 virtual terminal windows.
All the tasks you need to maintain the interface, the underlying operating system, and the HMC application code are available by using the HMC’s management applications.
For more information about the HMC, refer to the IBM Hardware Management Console for pSeries Installation and Operations Guide.
System Power-On Methods
The HMC is used to power on the managed system. The managed system will reboot in the same mode in which it was previously booted. (If the managed system was previously booted in partitioned system mode, all partitions will automatically start and run.)
Powering On the Processor Subsystem Using the HMC
To power on the processor subsystem using the HMC, do the following:
1. Log in to the HMC with your user ID and password. For more information about HMC user IDs and passwords, refer to the IBM Hardware Management Console for pSeries Installation and Operations Guide.
2. To select your preferred partition environment, click on the Partition Management icon under the HMC host name. The Contents area now lists the processor subsystem as available as a managed system. If you have only one processor subsystem, the Contents area lists the processor subsystem as System A.
3. Select the appropriate managed system.
4. To power on the managed system, select the desired system in the Contents area. Next, on the menu, choose Selected.
5. Select Power On.
Powering Off the Processor Subsystem Using the HMC
Attention: Shut down the partitions before powering off the processor subsystem.
To power off the processor subsystem using the HMC, do the following:
1. To select your preferred partition environment, click on the Partition Management icon under the HMC host name. The Contents area now lists the processor subsystem as available as a managed system. If you have only one processor subsystem, the Contents area lists the processor subsystem as System A.
2. Select the appropriate managed system.
3. To power off the managed system, select the desired system in the Contents area. Next, on the menu, choose Selected.
4. Select Power Off.
5. A screen displays to verify that you want to power off. Select Yes.
Note: Only logic power will be removed; 350 V dc power will still be present within the system.
Graphics Console Support
The pSeries 655 Model 651 Processor Subsystem supports graphics consoles. Graphics console support requires the following adapters:
Graphics adapter with a graphics display attached
Universal Serial Bus (USB) adapter with a keyboard and mouse attached
Only one graphics console is supported per system partition. If the system is running partitions, up to eight partitions can have graphics consoles.
The graphics console is functional only when AIX is running. For any installation or service processor functions, you must use the HMC.
Understanding the Power-On Self-Test (POST)
After power is turned on and before the operating system is loaded, the partition does a power-on self-test (POST). This test performs checks to ensure that the hardware is functioning correctly before the operating system is loaded. During the POST, a POST screen displays, and POST indicators appear on the virtual terminal window. The next section describes the POST indicators and functions that can be accessed during the POST.
POST Indicators
POST indicators show which tests are being performed as the partition prepares to load the operating system. The POST indicators are words that display on the virtual terminal window. Each time that the firmware starts another step in the POST, a POST indicator word appears on the console. Each word is an indicator of the tests that are being performed.
The POST screen displays the following words:
Memory
Memory test
Keyboard
Initialize the keyboard and mouse. The time period for pressing a key to access the System Management Services, or to initiate a service mode boot, is now open. See “POST Keys” on page 17 for more information.
Network
Self-test on network adapters
SCSI
Adapters are being initialized
Speaker
A speaker is not implemented on this system
POST Keys
The POST keys, if pressed after the keyboard POST indicator displays and before the last (speaker) POST indicator displays, cause the system to start services or to initiate service mode boots used for configuring the system and diagnosing problems. The keys are described below:
Note: The program function keys (F1-F12) on a keyboard attached to the HMC or USB card are not used and will be ignored. After the keyboard POST indicator displays, you must use the numeric number keys to enter input.
1 Key
The numeric 1 key, when pressed during POST, starts the System Management Services (SMS) interface.
5 Key
The numeric 5 key, when pressed during POST, initiates a system boot in service mode using the default service mode boot list.
6 Key
The numeric 6 key works like the numeric 5 key, except that the firmware uses the customized service mode boot list.
8 Key
This option is used by service personnel. To enter the open firmware command line, press the numeric 8 key after the word keyboard displays and before the last word speaker displays during startup. After you press the 8 key, the remaining POST indicators display until initialization completes.
When initialization and POST are complete, the open firmware command line (an OK prompt) displays.
Note: This option should only be used by service personnel to obtain additional debug information.
To exit from the open firmware command prompt, type reset-all or power off the system and reboot.
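The POST key rules above combine a timing window (after the keyboard indicator, before the speaker indicator) with a fixed key-to-action mapping. The Python sketch below models those rules for illustration. It is not firmware code, and the function name, list, and dictionary are assumptions made for this example.

```python
# POST indicator words in the order this chapter lists them.
POST_INDICATORS = ["memory", "keyboard", "network", "scsi", "speaker"]

# Actions for the numeric keys described above.
POST_KEY_ACTIONS = {
    "1": "start the System Management Services (SMS) interface",
    "5": "service mode boot using the default service mode boot list",
    "6": "service mode boot using the customized service mode boot list",
    "8": "open firmware command line (service personnel only)",
}


def post_key_action(key, last_indicator_shown):
    """Return the action for a key press, or None if the key is ignored.

    Keys are honored only after the keyboard indicator displays and
    before the speaker indicator displays. Function keys are not in the
    mapping, so they are always ignored.
    """
    position = POST_INDICATORS.index(last_indicator_shown)
    window_open = (POST_INDICATORS.index("keyboard") <= position
                   < POST_INDICATORS.index("speaker"))
    if not window_open:
        return None
    return POST_KEY_ACTIONS.get(key)


print(post_key_action("1", "keyboard"))
print(post_key_action("5", "memory"))
print(post_key_action("F5", "network"))
```

The second call returns None because the window has not opened yet, and the third returns None because function keys are ignored.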
Chapter 4. Using the Service Processor
Note: The information in this chapter regarding the configuring of serial ports does not apply to the serial ports, or modems attached to those serial ports, on the Hardware Management Console (HMC).
The service processor runs on its own power boundary and continually monitors hardware attributes and the environmental conditions within the system. The service processor is controlled by firmware and does not require the operating system to be operational to perform its tasks.
The service processor menus allow you to configure service processor options, as well as enable and disable functions.
Service processor menus are available using an HMC virtual terminal window when OK is displayed on the virtual operator panel or when the service processor has detected a server problem (such as a surveillance failure).
Service Processor Menus
The service processor menus are divided into the following groups:
General user menu - the user must know the general-access password.
Privileged user menus - the user must know the privileged-access password.
If the server is powered off, the service processor menus can be accessed on the HMC.
Accessing the Service Processor Menus
Service processor menus are accessed by opening a virtual terminal window on the HMC. After OK displays in the virtual operator panel on the HMC, press any key on the keyboard to signal the service processor.
When you gain access, the service processor prompts you for a password (if one is set), and when verified, displays the service processor menus.
The service processor menu prompt, represented by 0> on the HMC, indicates the serial port to which the terminal is connected.
Saving and Restoring Service Processor Settings
All the settings that you make (except language) from the service processor menus can be backed up either for recovering from a fault that may corrupt these settings, or for replicating these settings to other servers that include a service processor.
The service aid, Save or Restore Hardware Management Policies, can be used to save your settings after initial setup or whenever the settings must be changed for system operation purposes.
It is strongly recommended that you use this service aid for backing up service processor settings to protect the usefulness of the service processor and the availability of the server. For information about this service aid, refer to “Save or Restore Hardware Management Policies” in the “Introduction to Tasks and Service Aids” section of the RS/6000 and Eserver Diagnostic Information for Multiple Bus Systems.
If this task cannot be run, or the service processor settings were not previously backed up, the settings should be recorded manually. To record the settings manually, do the following:
1. At the service processor main menu, select Option 3, “System Information Menu.”
2. At the system information menu, select Option 3, “Read Service Processor Configuration.”
3. Manually record the settings.
Menu Inactivity
The service processor exits menu mode after ten minutes of inactivity and displays a message indicating that it has done so. Pressing any key on the virtual terminal window causes the main menu to display.
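The ten-minute inactivity rule can be pictured as a comparison against the time of the last key press. The Python sketch below illustrates that rule only. The constant and function names are assumptions, and the service processor firmware does not expose such an interface.

```python
MENU_TIMEOUT_SECONDS = 10 * 60  # ten minutes of inactivity


def menu_should_exit(last_key_time, now):
    """Return True once the inactivity period has reached the timeout.

    Both arguments are timestamps in seconds from the same clock.
    """
    return now - last_key_time >= MENU_TIMEOUT_SECONDS


print(menu_should_exit(last_key_time=0, now=599))
print(menu_should_exit(last_key_time=0, now=600))
```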
General User Menu
The menu options presented to the general user are a subset of the options available to the privileged user. The user must know the general-access password, if one is set, to access this menu.
GENERAL USER MENU

1. Power-on System
2. Power-off System
3. Read VPD Image from Last System Boot
4. Read Progress Indicators from Last System Boot
5. Read Service Processor Error Logs
6. Read System POST Errors

99. Exit from Menus

0>
Power-on System
Allows the user to start the system using the current virtual operator panel on the HMC.
Power-off System
This option is not available on this system.
Read VPD Image from Last System Boot
Displays manufacturer vital product data, such as serial numbers and part numbers, that were stored from the system boot prior to the one in progress now, for the entire system.
Read Progress Indicators from Last System Boot
Displays a number of the boot progress indicators, which may include service processor checkpoints, IPL checkpoints, or AIX configuration codes, from the previous system boot. This information can be useful in diagnosing system faults.
Note: If you are running one or more logical partitions, enter the partition ID (0-15) to display progress indicators for that partition since the last system boot. If your system is running in Full System Partition mode, this option automatically displays details from partition zero.
The progress indicator codes are listed from top (latest) to bottom (oldest).
This information is not stored in nonvolatile storage. If the system is powered off using the HMC, this information is retained. If the ac power is disconnected from the system, this information will be lost. For an example, refer to “LCD Progress Indicator Log” on page 49.
Read Service Processor Error Logs
Displays the service processor error logs. For an example, refer to “Service Processor Error Logs” on page 48.
Read System POST Errors
Displays additional error log information (this option is only for service personnel).
Exit from Menus
Selecting this option will exit the service processor menus. You can re-enter the menus by pressing any key on the console.
Privileged User Menus
The following menus are available to privileged users only. The user must know the privileged-access password, if one is set, to access these menus.
Main Menu
A listing at the top of the main menu contains the following:
• Your system’s current firmware version
• The firmware copyright notice
• The system name given to your server during setup
You need the firmware version for reference when you either update or repair the functions of your service processor.
The system name, an optional field, is the name that your server reports in problem messages. This name helps your support team (for example, your system administrator, network administrator, or service representative) to more quickly identify the location, configuration, and history of your server. Set the system name, from the main menu, using option 6.
Note: The information under the Service Processor Firmware heading in the following Main Menu illustration is example information only.
Service Processor Firmware
VERSION: RH011007
Copyright 2001 IBM Corporation
SYSTEM NAME
MAIN MENU
1. Service Processor Setup Menu
2. System Power Control Menu
3. System Information Menu
4. Language Selection Menu
5. Call-In/Call-Out Setup Menu
   Not Supported
6. Set System Name
99. Exit from Menus
0>
Service Processor Setup Menu
See “Service Processor Setup Menu” on page 23 for more information.
System Power Control Menu
See “System Power Control Menu” on page 27 for more information.
System Information Menu
See “System Information Menu” on page 30 for more information.
Language Selection Menu
See “Language Selection Menu” on page 38 for more information.
Call-In/Call-Out Setup Menu
This function is not available on this system.
Set System Name
Allows setting of the system name.
Reset all L3 Cache Module Records
Clears L3 Gard records after a repair action.
Note: This is a hidden menu option for use only by a service representative.
Service Processor Setup Menu
The following Service Processor Setup Menu is accessed from the Main Menu:
SERVICE PROCESSOR SETUP MENU
1. Change Privileged Access Password
2. Change General Access Password
3. Enable/Disable Console Mirroring:
   Not Supported
4. Start Talk Mode
   Not Supported
5. OS Surveillance Setup Menu
6. Reset Service Processor
7. Reprogram Flash EPROM Menu
   Not Supported
8. Serial Port Snoop Setup Menu
   Not Supported
9. Scan Log Dump Setup Menu:
   Currently As Needed
98. Return to Previous Menu
99. Exit from Menus
0>
Note: Unless otherwise stated in menu responses, settings become effective when a menu is exited using option 98 or 99.
Passwords
Passwords can be any combination of up to eight alphanumeric characters. You can enter longer passwords, but the entries are truncated to include only the first eight characters. The privileged access password can be set from service processor menus or from System Management Services (SMS) utilities (see Chapter 5, “Using System Management Services,” on page 53). The general access password can be set only from service processor menus.
For security purposes, the service processor counts the number of attempts to enter passwords. The results of not recognizing a password within this error threshold are different, depending on whether the attempts are being made locally (at the server) or remotely (through a modem). The error threshold is three attempts.
If the error threshold is reached by someone entering passwords at the server, the service processor commands the server to resume the initial program load (IPL). This action is taken based on the assumption that the server is in an adequately secure location with only authorized users having access. Such users must still successfully enter a login password to access the operating system.
If the error threshold is reached by someone entering passwords remotely, the service processor commands the server to power off to prevent potential security attacks on the server by unauthorized remote users.
The following table lists what you can access with the privileged-access password and the general-access password.
Privileged Access   General Access   Resulting Menu
Password            Password
None                None             Service processor MAIN MENU displays.
Set                 None             Users with the password see the service processor MAIN MENU. Users without the password cannot log in.
Set                 Set              Users see menus associated with the entered password.
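The password rules and the access table can be summarized in a short Python sketch. This is an illustration only and is not service processor firmware; the function names and the returned strings are hypothetical and were chosen for this example.

```python
MAX_PASSWORD_LENGTH = 8  # longer entries are truncated to eight characters
ERROR_THRESHOLD = 3      # failed attempts before the service processor reacts


def effective_password(entry):
    """Return the part of an entry that is actually used."""
    return entry[:MAX_PASSWORD_LENGTH]


def action_on_failed_attempts(failed_attempts, remote):
    """Describe the reaction to failed password attempts."""
    if failed_attempts < ERROR_THRESHOLD:
        return "prompt again"
    # Remote users cause a power-off; local users cause the IPL to resume.
    return "power off server" if remote else "resume IPL"


def resulting_menu(privileged_set, general_set):
    """Mirror the three rows of the access table."""
    if not privileged_set:
        # The general-access password cannot be set before the privileged one.
        return "MAIN MENU displays"
    if not general_set:
        return "MAIN MENU for users with the privileged-access password"
    return "menus associated with the entered password"
```

For example, `effective_password("abcdefghij")` yields `"abcdefgh"`, which shows why two long passwords that share their first eight characters are treated as identical.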
Change Privileged-Access Password
Set or change the privileged-access password. It provides the user with the capability to access all service processor functions. This password is usually used by the system administrator or root user.
Change General-Access Password
Set or change the general-access password. It provides limited access to service processor menus, and is usually available to all users who are allowed to power on the server, especially remotely.
Note: The general-access password can only be set or changed after the privileged access password is set.
Enable/Disable Console Mirroring
This function is not available on this system.
Start Talk Mode
This function is not available on this system.
Surveillance Setup Menu
Note: This option is disabled in partitioned systems.
This menu can be used to set up operating system (OS) surveillance.
OS Surveillance Setup Menu
1. Surveillance:
   Currently Enabled
2. Surveillance Time Interval:
   2 minutes
3. Surveillance Delay:
   2 minutes
98. Return to Previous Menu
0>
Surveillance
Can be set to Enabled or Disabled.
Surveillance Time Interval
Can be set to any number from 2 through 255.
Surveillance Delay
Can be set to any number from 0 through 255.
For more information about surveillance, refer to “Service Processor System Monitoring - Surveillance” on page 47.
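The permitted ranges for the two surveillance values can be checked before they are typed into the menu. The following Python sketch is only an illustration; the function name `validate_surveillance` and the message texts are hypothetical.

```python
def validate_surveillance(interval_minutes, delay_minutes):
    """Return a list of problems; an empty list means both values are valid."""
    problems = []
    # Surveillance Time Interval: any number from 2 through 255.
    if not 2 <= interval_minutes <= 255:
        problems.append("Surveillance Time Interval must be from 2 through 255")
    # Surveillance Delay: any number from 0 through 255.
    if not 0 <= delay_minutes <= 255:
        problems.append("Surveillance Delay must be from 0 through 255")
    return problems
```

A call such as `validate_surveillance(2, 2)` returns an empty list, while an interval of 1 is reported as out of range.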
Reset Service Processor
If this option is selected, entering Y causes the service processor to reboot.
Reprogram Flash EPROM Menu
This function is not available on this system.
Serial Port Snoop Setup Menu
This function is not available on this system.
Scan Log Dump Setup Menu
A scan dump is the collection of chip data that the service processor gathers after a system malfunction, such as a checkstop or hang. The scan dump data may contain chip scan rings, chip trace arrays, and SCOM contents.
The scan dump data are stored in the system control store. The size of the scan dump area is approximately 4 MB.
During the scan log dump, A8xx (in the range A810 to A8FF) displays in the operator panel value on the HMC. The xx characters will change as the scan log dump progresses. If the xx characters do not change after five minutes, the service processor is hung and must be reset.
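The progress check described above can be expressed as a small Python sketch for someone who records the operator panel value by hand or with a script. The sample format (a list of time and code pairs) and both function names are assumptions made for this example; the HMC does not provide such an interface in this guide.

```python
def is_scan_dump_code(code):
    """True if an operator panel value lies in the range A810 to A8FF."""
    if len(code) != 4:
        return False
    try:
        value = int(code, 16)
    except ValueError:
        return False
    return 0xA810 <= value <= 0xA8FF


def dump_appears_hung(samples, limit_seconds=300):
    """samples: (seconds, code) pairs in time order.

    Returns True when the latest scan dump code has not changed for at
    least limit_seconds (five minutes by default).
    """
    if not samples:
        return False
    last_time, last_code = samples[-1]
    if not is_scan_dump_code(last_code):
        return False
    first_seen = last_time
    # Walk backward to find when the current code first appeared.
    for seconds, code in reversed(samples):
        if code != last_code:
            break
        first_seen = seconds
    return last_time - first_seen >= limit_seconds
```

If `dump_appears_hung` returns True, the text above applies: the service processor is considered hung and must be reset.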
When the scan log dump is complete, depending on how the reboot policy is set, the system will either:
• Go to the standby state (and the service processor menus will be available), indicated by OK or STBY in the virtual operator panel on the HMC.
OR
• Attempt to reboot.
Scan Log Dump Setup Menu
1. Scan Log Dump Policy:
   Currently As Needed
2. Scan Log Dump Content:
   Currently As Requested
3. Immediate Dump
98. Return to Previous Menu
0>
Select from the following options:
(As Needed=2, Always=3)
Enter New Option:
The scan log dump policy can be set to the following:
2 = As Needed
The processor run-time diagnostics record the dump data based on the error type. This is the default value.
3 = Always
Selecting this option allows the service processor to record a scan log dump for all error types.
The scan log dump policy can also be set from the Tasks menu in the AIX service aids.
Option 2 displays the following screens:
Scan Log Dump Setup Menu
1. Scan Log Dump Policy:
   Currently As Needed
2. Scan Log Dump Content:
   Currently As Requested
3. Immediate Dump
98. Return to Previous Menu
0>
Select from the following options:
(As Requested=1, Optimum=2, Complete=3, Minimum=4)
Enter New Option:
The scan log dump content can be set to the following:
1 = As Requested
The processor run-time diagnostics will select the contents of the dump file based on the type of error that occurs. This is the default.
2 = Optimum
The dump will include the smallest amount of information needed to diagnose a hardware error.
3 = Complete
The dump will include as much information as possible to allow the complete analysis of hardware and software errors.
4 = Minimum
The dump will include the smallest amount of information possible (a minimum number of hardware scan log rings).
The complete dump will take the longest time to finish; it may take as long as 1.5 hours on a fully configured system.
The scan log dump content can also be set from the Tasks menu in the AIX diagnostic service aids.
If a valid dump file already exists, the dump control code will stop because the contents of the prior dump must be protected.
Option 3, Immediate Dump, can only be used when the system is in the standby state with power on. It is used to dump the system data after a checkstop or machine check occurs when the system firmware is running, or when the operating system is booting or running.
System Power Control Menu
This menu is used to set power control options. Other menus that control boot options are available from this menu.
SYSTEM POWER CONTROL MENU
1. Enable/Disable Unattended Start Mode:
   Currently Enabled
2. Ring Indicate Power-On Menu
3. Reboot/Restart Policy Setup Menu
4. Power-On System
5. Power-Off System
6. Enable/Disable Fast System Boot
   Currently Fast Boot
7. Boot Mode Menu
98. Return to Previous Menu
99. Exit from Menus
0>
Enable/Disable Unattended Start Mode
Use this option to instruct the service processor to restore the power state of the server after a temporary power failure. This option is intended to be used on servers that require automatic power-on after a power failure.
Ring Indicate Power-On Menu
This function is not available on this system.
Reboot/Restart Policy Setup Menu
The following menu controls the Reboot/Restart Policy:
Reboot/Restart Policy Setup Menu
1. Number of reboot attempts:
   Currently 1
2. Use OS-Defined restart policy?
   Currently No
3. Enable supplemental restart policy?
   Currently Yes
4. Call-Out before restart:
   Currently Disabled
98. Return to Previous Menu
0>
Reboot is the process of bringing up the system hardware; for example, from a system reset or power on. Restart is activating the operating system after the system hardware is reinitialized. Restart must follow a successful reboot.
• Number of reboot attempts - If the server fails to successfully complete the boot process, it attempts to reboot the number of times specified. Entry values equal to or greater than 0 are valid. Only successive failed reboot/restart attempts are counted.
• Use OS-Defined restart policy - In a full system partition, this allows the service processor to react in the same way that the operating system does to major system faults by reading the setting of the operating system parameter Automatically Restart/Reboot After a System Crash. This parameter might already be defined, depending on the operating system or its version or level. If the operating system automatic restart setting is defined, it can be set to respond to a major fault by restarting or by not restarting. See your operating system documentation for details on setting up operating system automatic restarts. The default value is No.
On a partitioned system, this setting is ignored.
• Enable supplemental restart policy - The default setting is Yes. When set to Yes in a full system partition, the service processor restarts the system when the system loses control as detected by service processor surveillance, and either:
The Use OS-Defined restart policy is set to No.
OR
The Use OS-Defined restart policy is set to Yes, and the operating system has no automatic restart policy.
If set to Yes in a partitioned system, the service processor restarts the system when the system loses control and it is detected by service processor surveillance.
• Call-Out before restart (Enabled/Disabled) - If a restart is necessary due to a system fault, and you are running a full system partition, you can enable the service processor to call out and report the event. This option can be valuable if the number of these events becomes excessive, which might signal a bigger problem.
This setting is ignored on a partitioned system.
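The conditions under which the supplemental restart policy leads to a restart can be written as a single decision function. The Python sketch below covers only the supplemental policy as described above; the function and parameter names are hypothetical and do not correspond to any firmware interface.

```python
def supplemental_restart(full_system_partition,
                         supplemental_enabled,
                         use_os_defined_policy,
                         os_has_restart_policy,
                         loss_of_control_detected):
    """Return True if the service processor restarts the system
    under the supplemental restart policy."""
    # Nothing happens unless the policy is enabled and surveillance
    # has detected that the system lost control.
    if not (supplemental_enabled and loss_of_control_detected):
        return False
    if not full_system_partition:
        # Partitioned system: the Use OS-Defined setting is ignored.
        return True
    # Full system partition: restart if the OS-defined policy is not used,
    # or if it is used but the operating system defines no restart policy.
    return (not use_os_defined_policy) or (not os_has_restart_policy)
```

With the default menu values (Use OS-Defined restart policy set to No, supplemental policy set to Yes), a detected loss of control results in a restart in both partition modes.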
Power-On System
Allows immediate power-on of the system.
Power-Off System
This option is not available on this system.
Enable/Disable Fast System Boot
Allows the user to select the IPL type, mode, and speed of the system boot.
Attention: Selecting the fast IPL results in several diagnostic tests being skipped and a shorter
memory test being run.
Boot Mode Menu
The Boot Mode Menu allows you to select a boot mode.
Boot Mode Menu
1. Boot to SMS Menu:
   Currently Disabled
2. Service Mode Boot from Saved List:
   Currently Disabled
3. Service Mode Boot from Default List:
   Currently Disabled
4. Boot to Open Firmware Prompt:
   Currently Disabled
98. Return to Previous Menu
0>
To select a boot mode, select a number and press Enter. The item corresponding to the selected number toggles between Disabled and Enabled. If a boot mode is Enabled, the boot mode selected is performed, and the Disabled/Enabled selection is reset to Disabled. The following describes each boot mode:
Boot to SMS Menu
When this selection is enabled, the system boots to the SMS Menu.
Service Mode Boot from Saved List
This selection causes the system to perform a service mode boot using the service mode boot list
saved in NVRAM. If the system boots AIX from the disk drive and AIX diagnostics are loaded on the
disk drive, AIX boots to the diagnostics menu.
Using this option to boot the system is the preferred way to run online diagnostics.
Service Mode Boot from Default List
This selection is similar to Service Mode Boot from Saved List, except the system boots using the
default boot list that is stored in the system firmware. This is normally used to try to boot customer
diagnostics from the CD-ROM drive or NIM server.
Using this option to boot the system is the preferred way to run standalone diagnostics from
CD-ROM.
Boot to Open Firmware
This option should only be used by service personnel to obtain additional debug information. When
this selection is enabled, the system boots to the open firmware prompt.
System Information Menu
This menu provides access to system configuration information, error logs, system resources, and processor configuration.
SYSTEM INFORMATION MENU
1. Read VPD Image from Last System Boot
2. Read Progress Indicators from Last System Boot
3. Read Service Processor Error Logs
4. Read System POST Errors
5. Read NVRAM
6. Read Service Processor Configuration
7. Processor Configuration/Deconfiguration Menu
8. Memory Configuration/Deconfiguration Menu
9. Power Control Network Utilities Menu
10. LED Control Menu
11. MCM/L3 Interposer Plug Count Menu
12. Performance Mode Setup Menu
13. L3 Mode Menu
14. Remote I/O (RIO) Link Speed Setup Menu
98. Return to Previous Menu
99. Exit from Menus
0>
Read VPD Image from Last System Boot
Displays manufacturer’s vital product data (VPD), such as serial numbers, part numbers, and so on, that was stored from the system boot prior to the one in progress now. VPD from all devices in the system is displayed.
Read Progress Indicators from Last System Boot
Displays a number of the boot progress indicators, which may include Service Processor checkpoints, IPL checkpoints, or AIX configuration codes, from the previous system boot. This information can be useful in diagnosing system faults.
The progress indicator codes are listed from top (latest) to bottom (oldest).
This information is not stored in nonvolatile storage. If the system is powered off using the HMC, this information is retained. If the ac power is disconnected from the system, this information will be lost. For an example, refer to “LCD Progress Indicator Log” on page 49.
Read Service Processor Error Logs
Displays error conditions detected by the service processor. Refer to “Service Processor Error Logs” on page 48 for an example of this error log.
Read System POST Errors
This option should only be used by service personnel to obtain additional debug information.
Read NVRAM
Displays Non Volatile Random Access Memory (NVRAM) content.
Read Service Processor Configuration
Displays current service processor configuration.
Processor Configuration/Deconfiguration Menu
Enable/Disable CPU Repeat Gard
CPU repeat gard will automatically deconfigure a CPU during a system boot if a processor has failed BIST (built-in self-test), caused a machine check or check stop, or has reached a threshold of recoverable errors. The processor will remain deconfigured until repeat gard is disabled or the processor is replaced.
The default is enabled.
For more information, see “Configuring and Deconfiguring Processors or Memory” on page 46.
Enable/Disable Dynamic Processor Sparing
This function is not available on this system.
This menu allows the user to change the system processor configuration. If it is necessary to take one of the processors offline, use this menu to deconfigure a processor, and then reconfigure the processor at a later time. An example of this menu follows:
PROCESSOR CONFIGURATION/DECONFIGURATION MENU
77. Enable/Disable CPU Repeat Gard: Currently Enabled
78. Enable/Disable Dynamic Processor Sparing (if available): Currently Enabled
1. 0 3.0 (00) Configured by system    2. 1 3.1 (00) Deconfigured by system
3. 2 3.2 (00) Configured by system    4. 3 3.3 (00) Configured by system
98. Return to Previous Menu
0>
Note: This table is built from vital product data collected during the last boot sequence. The first time the system is powered on, or after the system’s nonvolatile RAM (NVRAM) has been erased, this table may be empty. The table is rebuilt during the next boot into AIX.
The fields of the previous table represent the following:
Column 1
(1.) Menu selection index.
Column 2
(0) Logical processor device number assigned by AIX. You can display these logical device numbers by issuing the following command on the AIX command line:
lsdev -C | grep proc
Column 3
(3.0) Processor address list used by the service processor.
Column 4
(00) Error status of the processors.
The error status of each processor is indicated by AB, where B indicates the number of errors and A indicates the type of error according to the following:
1. Bring-up failure
2. Run-time non-recoverable failure
3. Run-time recoverable failure
4. Group integrity failure
5. Non-repeat-gardable error. The resource may be reconfigured on the next boot.
A status of 00 indicates that the CPU has not had any errors logged against it by the service processor.
To enable or disable CPU repeat gard, use menu option 77. CPU repeat gard is enabled by default.
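The two-character AB error status can be decoded mechanically. The Python sketch below assumes that the error types are numbered 1 through 5 in the order listed above and that both characters are decimal digits; neither assumption is stated explicitly in this guide, and the function name `decode_error_status` is hypothetical.

```python
# Assumed numbering: the error types in the order in which they are listed.
ERROR_TYPES = {
    1: "Bring-up failure",
    2: "Run-time non-recoverable failure",
    3: "Run-time recoverable failure",
    4: "Group integrity failure",
    5: "Non-repeat-gardable error",
}


def decode_error_status(status):
    """Decode an AB status: A is the error type, B is the number of errors."""
    if len(status) != 2 or any(ch not in "0123456789" for ch in status):
        raise ValueError("expected two decimal digits, got %r" % status)
    if status == "00":
        return "no errors logged"
    error_type, count = int(status[0]), int(status[1])
    description = ERROR_TYPES.get(error_type, "unknown error type")
    return "%s, %d error(s)" % (description, count)
```

Under these assumptions, a status of 32 would read as two run-time recoverable failures, and 00 as a processor with no errors logged against it.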
If CPU repeat gard is disabled, processors that are in the “deconfigured by system” state will be reconfigured. These reconfigured processors are then tested during the boot process, and if they pass, they remain online. If they fail the boot testing, they are deconfigured even though CPU repeat gard is disabled.
The failure history of each CPU is retained. If a processor with a history of failures is brought back online by disabling repeat gard, it remains online if it passes testing during the boot process. However, if repeat gard is enabled, the processor is taken offline again because of its history of failures.
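The interaction between repeat gard, failure history, and boot testing can be condensed into a small truth function. This Python sketch is a simplification for illustration; it treats the failure history as a single yes/no value, and the function name is hypothetical.

```python
def processor_online_after_boot(repeat_gard_enabled,
                                has_failure_history,
                                passes_boot_test):
    """Return True if a processor stays online after the next boot."""
    # A processor that fails boot testing is deconfigured in every case,
    # even when repeat gard is disabled.
    if not passes_boot_test:
        return False
    # With repeat gard enabled, a retained history of failures takes the
    # processor offline again.
    if repeat_gard_enabled and has_failure_history:
        return False
    return True
```

The only way to bring a processor with a failure history back online is therefore to disable repeat gard and have the processor pass boot testing.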
Notes:
1. The processor numbering scheme used by the service processor is different from the numbering scheme used by AIX. To ensure that the correct processor is selected, consult the AIX documentation before configuring or deconfiguring a processor.
2. The number of processors available to AIX can be determined by issuing the following command on the AIX command line: bindprocessor -q.
Memory Configuration/Deconfiguration Menu
Enable/Disable Memory Repeat Gard
Memory repeat gard will automatically deconfigure a memory riser card during a system boot if a memory card has failed BIST (built-in self-test), caused a machine check or checkstop, or has reached a threshold of recoverable errors. The memory will remain deconfigured until repeat gard is disabled or the memory card is replaced.
For more information, see “Configuring and Deconfiguring Processors or Memory” on page 46.
Runtime Recoverable Error Repeat Gard
The runtime recoverable error repeat gard flag controls the deallocation of the memory if a recoverable error occurs during runtime. If a recoverable memory error occurs, and runtime recoverable error repeat gard is disabled, the system will continue running with no change in the memory configuration. If a recoverable memory error occurs, and runtime recoverable error repeat gard is enabled, the memory card on which the error occurred will be garded out (taken offline).
The default is disabled.
These menus allow the user to change the system memory configuration. If it is necessary to take one of the memory cards offline, this menu allows you to deconfigure a memory card, and then reconfigure the card at a later time.
When this option is selected, a menu displays. The following is an example of this menu:
MEMORY CONFIGURATION/DECONFIGURATION MENU
77. Enable/Disable Memory Repeat Gard: Currently Enabled
78. Runtime Recoverable Error Repeat Gard: Currently Disabled
1. Memory card
98. Return to Previous Menu
After you select the memory card option by entering 1, a menu displays, allowing the selection of a memory card. The following is an example of this menu.
MEMORY CONFIGURATION/DECONFIGURATION MENU
1. 16.16(00, -) Configured by system    2. 16.17(00, -) Configured by system
3. 16.18(00, -) Configured by system    4. 16.19(00, 1) Partially deconfigured by system
98. Return to Previous Menu
Note: This table is built from vital product data collected during the last boot sequence. The first time the system is powered on, or after the system’s nonvolatile RAM (NVRAM) has been erased, this table may be empty. The table is rebuilt during the next boot into AIX.
The fields in the previous table represent the following:
Column 1
Menu selection index/card number
Column 2
xx.xx : Card address used by service processor
Column 3
(00, -) Error/deconfiguration status
The error status of each memory card is indicated by AB, where B indicates the number of errors and A indicates the type of error according to the following table:
1. Bring-up failure
2. Run-time non-recoverable failure
3. Run-time recoverable failure
4. Group integrity failure
5. Non-repeat-gardable error. The resource may be reconfigured on the next boot.
An error status of (00, -) (for example, 11.16(00, -)) indicates that the memory card has not had any errors logged against it by the service processor, and it is fully configured.
The field after the error status will be one of the following:
- (dash)
indicates that the memory card is fully configured
0 or 1
indicates that memory repeat gard has deconfigured half of the memory card. If this occurs, the status of the card in the menu is shown as Partially deconfigured by system.
To change the memory configuration, select the number of the memory card. The memory card state will change from configured to deconfigured or from deconfigured to configured.
This menu only allows the deconfiguration of an entire card; it does not allow the manual deconfiguration of half a card. If half a card has been deconfigured by the system (Partially deconfigured), it can be manually reconfigured using this menu.
In the previous example menu, each line shows two cards and indicates whether they are configured.
To enable or disable Memory Repeat Gard, use menu option 77 of the Memory Configuration/Deconfiguration menu.
To enable or disable runtime recoverable error repeat gard, use option 78 of the Memory Configuration/Deconfiguration menu.
The failure history of each card is retained. If a card with a history of failures is brought back online by disabling Repeat Gard, it remains online if it passes testing during the boot process. However, if Repeat Gard is enabled, the card is taken offline again because of its history of failures.
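When menu output is captured to a file, a memory card entry such as 16.19(00, 1) can be split into its fields with a regular expression. The Python sketch below assumes the entry format shown in the example menu in this section; the function name `parse_memory_entry` and the dictionary keys are hypothetical.

```python
import re

# card address, (error status, half indicator), then the state text
ENTRY_PATTERN = re.compile(r"(\d+\.\d+)\((\d\d),\s*([-01])\)\s*(.+)")


def parse_memory_entry(text):
    """Split one memory card entry into address, error status, and state."""
    match = ENTRY_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError("not a memory card entry: %r" % text)
    address, error_status, half, state = match.groups()
    return {
        "address": address,
        "error_status": error_status,
        # "-" means no half is deconfigured; 0 or 1 marks a deconfigured half.
        "deconfigured_half": None if half == "-" else int(half),
        "state": state,
    }
```

An entry whose `deconfigured_half` value is not None corresponds to a card shown as Partially deconfigured by system.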
Power Control Network Utilities Menu
POWER CONTROL NETWORK UTILITIES MENU
1. Lamp Test for all Operator Panels
2. Display I/O Type
3. Change I/O Type
4. Collect and display SPCN trace data
5. Start SPCN Flash Update
6. Display Power Subsystem FRU Code Levels
7. Power Subsystem Code Update via the Power Control Network
   Not Applicable
8. Start Power Subsystem Code Update
   Not Applicable
98. Return to Previous Menu
0>
Lamp Test for All Operator Panels
This option is not available on this system.
Display I/O Type
This option, when selected, will display the SPCN address of the CEC drawer and the I/O type of the service processor.
Change I/O Type
Use this option to change the I/O type of the service processor after a service action or configuration change if the I/O type is incorrect. If this option is chosen, you will be asked to make the following entries:
1. For the I/O drawer address, type 1.
2. For the I/O type, type A5.
If either value is not valid, a failure message displays on the console. Press Enter to return to the Power Control Network Utilities Menu.
Collect & display SPCN trace data
This option is used by service personnel to dump the SPCN trace data from the processor subsystem (processor subsystem drawer) to gather additional debug information.
To dump the SPCN trace data for the processor subsystem (processor subsystem drawer), enter 0 when prompted for the SPCN drawer number and enter d0 when prompted for the unit address, as shown in the example screen:
Enter the SPCN drawer number: 0
Enter the unit address: d0
Note: It may take up to 5 minutes for the trace data to be dumped to the screen.
The SPCN trace data will be displayed on the screen. An example of this output follows:
Please wait....
***** Power Trace Data Start *****
00000000ffffffffffff0006158800a000061574a00000060200353700060210
031300060242000000060241040b0006110a0040000611090200000611020804
0000000000061102080b0006041000000006d0e3850000061103000000061109
02000006040200000006040200010006041000010006d0e38501000604100002
000000000006d0e385020006041000030006d0e3850300061580100000061540
03180006101116000006101116010006158010010007158156a0000711000010
00000000000910120000000910120001
***** Power Trace data End *****
(Press Return to Continue)
Start SPCN Flash Update
This option is not available on this system.
Display Power Subsystem FRU Code Levels
This option is not available on this system.
Power Subsystem Code Update via the Power Control Network
This option is not available on this system.
Start Power Subsystem Code Update
This option is not available on this system.
LED Control Menu
This menu displays the state of the I/O subsystem disturbance/system attention LED. Use this menu to toggle the attention/fault LEDs between identify (blinking) and off. Option 1 is only available when the system is in the error state (the processor subsystem is powered on and the service processor menus are available). Option 1 is not available when the system is in standby. An example of this menu follows:
LED Control Menu
1. Set/Reset Identify LED state
2. Clear System Attention Indicator
98. Return to Previous Menu
0>
If option 1 is selected, a list of location codes of the I/O subsystems is shown. The screen will be similar to the following:
LED Control Menu
1. U1.9-P1
2. U1.9-P2
3. U1.5-P1
4. U1.5-P2
Enter number corresponding to the location code, or press Return to continue, or 'x' to return to the menu.
0>4
If one of the devices is selected using the index number, the present state of its LED will be displayed, and you will be given the option to toggle it as shown in these example screens. The final state of the LED will then be displayed whether or not it was changed.
U1.5-P2 is currently in the OFF state
Select from the following (1=IDENTIFY ON, 2=IDENTIFY OFF)
0>2
Please wait ...
U1.5-P2 is currently in the OFF state
(Press Return to continue)
Option 2 is not available on this system.
MCM/L3 Interposer Plug Count Menu
Attention: Do not power on the system when in this menu. Fully exit from this menu before powering on the system.
This menu tracks the number of times that the MCM and L3 cache modules have been plugged into the system backplane.
If the MCM or L3 cache module is reseated or replugged, the plug count for that module must be incremented by 1. If the plug count exceeds the limit of 10 (reaches 11 or greater), a 450x yyyy or 4B2x yyyy error with a detail value of CFF0 that calls out an MCM or L3 cache module will be posted in the service processor error log. The FRU should be replaced during a deferred service call.
If the MCM or L3 cache module is replaced, or installed during an MES upgrade, the plug count must be set using the MCM/L3 Interposer Plug Count menu. If the plug count information is not included with the new or replacement module, enter the default value of 8 (7 for the manufacturing process and 1 for the installation of the module that was just done). If the plug count is not entered, a B1xx 4698 error code, with a detail value of E10B or E10C, will be posted in the service processor error log.
If the service processor card is replaced, the plug counts are retained. However, the plug count menu must be accessed and option 50, Commit the values and write to the VPD, must be executed, so that the plug counts are revalidated. If the counts are not revalidated, a B1xx 4698 error code, with a detail value of E10B or E10C, will be posted in the service processor error log.
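The plug count arithmetic can be illustrated with a few lines of Python. The constant and function names are hypothetical; the values 10, 7, and 1 come from the paragraphs above.

```python
PLUG_COUNT_LIMIT = 10          # exceeding this limit posts an error
MANUFACTURING_PLUGS = 7        # plugs assumed for the manufacturing process
INSTALLATION_PLUGS = 1         # the installation that was just done
DEFAULT_NEW_MODULE_COUNT = MANUFACTURING_PLUGS + INSTALLATION_PLUGS


def count_after_reseat(current_count):
    """Each reseat or replug increments the plug count by 1."""
    return current_count + 1


def exceeds_limit(count):
    """True when the count has reached 11 or greater."""
    return count > PLUG_COUNT_LIMIT
```

A new module entered with the default count of 8 can therefore be reseated twice before a third reseat brings the count to 11 and exceeds the limit.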
A screen similar to the following will be displayed.
MCM/L3 Interposer Plug Count Menu
1. L3_0:7      2. L3_1:9
3. MCM_0:8
4. L3_3:7      5. L3_2:7
50. Commit the values and write to the VPD
98. Return to the Previous Menu
The MCM and L3 cache modules are shown in the same way that they are plugged into the processor subsystem planar; the layout shown in the menu represents the physical location as seen from the front of the subsystem.
The format of the menu entries shown above is the menu index number, followed by L3_xx, followed by the plug count after the colon. The following table matches the index number shown above with the physical location codes.
Menu Index Number   Physical Location Code
1. L3_0             U1.x-P1-C1
2. L3_1             U1.x-P1-C3
3. MCM_0            U1.x-P1-C2
4. L3_3             U1.x-P1-C4
5. L3_2             U1.x-P1-C5
To change the plug count for a particular module, enter a menu index number. For example, to change the plug count of the L3 cache module that is physically in the upper-left corner (U1.9-P1-C1), type 1, then enter the new plug count.
When all of the new plug counts have been entered, select 50, Commit the values and write to the
VPD. This action will store the new values in NVRAM.
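The plug count rules above can be sketched in a few lines of Python. This is an illustrative model only, not service processor code; the names Module, new_module, PLUG_COUNT_LIMIT, and DEFAULT_NEW_MODULE are hypothetical, and only the limit of 10 and the default of 8 come from the text.

```python
from dataclasses import dataclass

PLUG_COUNT_LIMIT = 10     # from the text: a count of 11 or greater posts an error
DEFAULT_NEW_MODULE = 8    # from the text: 7 (manufacturing) + 1 (installation)


@dataclass
class Module:
    """One MCM or L3 cache module and its interposer plug count (hypothetical model)."""
    name: str             # for example "L3_0" or "MCM_0"
    plug_count: int

    def replug(self) -> None:
        """Record one reseat or replug of this module."""
        self.plug_count += 1

    def over_limit(self) -> bool:
        """True when the count exceeds the limit (reaches 11 or greater)."""
        return self.plug_count > PLUG_COUNT_LIMIT


def new_module(name, supplied_count=None):
    """Create a replacement module, using the default when no count is supplied."""
    count = DEFAULT_NEW_MODULE if supplied_count is None else supplied_count
    return Module(name, count)


if __name__ == "__main__":
    l3 = Module("L3_1", 9)
    l3.replug()
    print(l3.name, l3.plug_count, "over limit:", l3.over_limit())
```

The model only tracks counts; on the real system the values take effect when option 50 is selected.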
Performance Mode Setup Menu
If certain types of processor cards are installed in the system, this menu remains not applicable. For other types of processor cards, this menu will be active after the first boot.
Note:
The first time the system is booted after NVRAM is cleared, Not Applicable displays under Performance Mode Setup Menu on the screen. This may also occur if the service processor is replaced, or the processor MCM is upgraded.
If option 12 is selected when Not Applicable is on the screen, the system responds with Not Applicable and redisplays the system information menu. The setup menu can be displayed after the performance mode is set, which happens the first time that the system is booted.
The default performance mode is set by the firmware during IPL. The default mode is designed to provide the best performance for the hardware configuration of the system. The performance mode is system-wide; it cannot be set on a per-partition basis. The default setting can be overwritten using the performance mode setup menu. The performance mode setup menu will be similar to the following:
Default Performance Mode: Standard Operation
1. Current Performance Mode: Standard Operation
98. Return to Previous Menu
0>
Selecting option 1 displays the following performance modes:
Select from the following options:
1. Large Commercial System optimization
2. Standard Operation
3. Turbo Database Mode
0>
If you want to override the default setting, a brief description of each performance mode follows:
• Large Commercial System Optimization is the setting for systems that do not fall under the other two selections, Standard Operation and Turbo Database Mode.
• Standard Operation optimizes the system for high-memory bandwidth applications where minimal sharing of data occurs, and the likelihood of significant hardware data prefetching exists. This is the default performance mode on this system.
• Turbo Database Mode optimizes system operation for environments where there is a large amount of data-sharing among processes running concurrently on the system.
L3 Mode Menu
This menu is not supported on this system.
Remote I/O (RIO) Link Speed Setup Menu
This option is used to change the speed of the RIO interface to external drawers. On this system, setting this speed has no effect on the system.
The hardware capability is detected and set by the system firmware during the boot process. If the hardware capability is not initialized (after NVRAM is cleared), it will be set during the first boot and remain set during subsequent boots.
The maximum RIO speed setting will remain not initialized until set by the user.
The user can set the speed lower than or equal to the hardware capability, but not higher. If the hardware capability is 1000 Mbps, the user can set the link speed to 500 Mbps, and the RIO link will run at 500 Mbps. However, if the hardware capability is 500 Mbps and the user selects 1000 Mbps, the user’s selection will be ignored by the system.
If this option is chosen, a menu similar to the following is displayed:
Remote I/O (RIO) Link Speed Set Up Menu
Hardware Capability (internal and external): 1000 Mbps
1. Maximum RIO speed setting (internal and external): Currently Uninitialized
98. Return to previous menu.
0>1
If option 1 is selected, the RIO interface speed can be set as follows:
Enter new value for this option: (1 = 1000 MBPS, 2 = 500 Mbps)
This setting will remain in effect until it is changed or NVRAM is cleared.
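The relationship between hardware capability and the user setting can be sketched as follows. This is a minimal illustration of the rule described above; the function name effective_rio_speed is hypothetical, and treating an uninitialized setting as "run at hardware capability" is an assumption that the text does not state.

```python
def effective_rio_speed(capability_mbps, user_setting_mbps=None):
    """Return the speed the RIO link would run at under the stated rule.

    capability_mbps:   hardware capability detected by firmware (500 or 1000).
    user_setting_mbps: maximum RIO speed setting, or None if uninitialized.
    """
    if user_setting_mbps is None:
        # Assumption: with no user setting, the link runs at hardware capability.
        return capability_mbps
    if user_setting_mbps > capability_mbps:
        # A selection above the hardware capability is ignored.
        return capability_mbps
    return user_setting_mbps


if __name__ == "__main__":
    for capability, setting in ((1000, 500), (500, 1000), (1000, None)):
        print(capability, setting, "->", effective_rio_speed(capability, setting))
```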
Language Selection Menu
The service processor menus and messages are available in various languages. This menu allows selecting languages in which the service processor and system firmware menus and messages are displayed.
LANGUAGE SELECTION MENU
1. English
2. Francais
3. Deutsch
4. Italiano
5. Espanol
98. Return to Previous Menu
99. Exit from Menus
0>
Note: Your virtual terminal window must support the ISO-8859 character set to correctly display
languages other than English.
Call-In/Call-Out Setup Menu
This menu is not supported on this system.
Service Processor Parameters in Service Mode (Full System Partition)
When the system is in service mode, the following service processor parameters are suspended:
• Unattended Start Mode
• Reboot/Restart Policy
• Surveillance
When service mode is exited, the service processor parameters revert to the customer settings.
Service Processor Reboot/Restart Recovery
Reboot describes bringing the system hardware back up; for example, from a system reset or power-on.
The boot process ends when control passes to the operating system process.
Restart describes activating the operating system after the system hardware is reinitialized. Restart must
follow a successful reboot.
Boot (IPL) Speed
When the server enters reboot recovery, slow IPL is automatically started, which gives the POST an opportunity to locate and report any problems that might otherwise be unreported.
Failure During Boot Process
During the boot process, either initially after system power-on or upon reboot after a system failure, the service processor monitors the boot progress. If progress stops, the service processor can reinitiate the boot process (reboot) if enabled to do so. The service processor can re-attempt this process according to the number of retries selected in the Reboot/Restart Policy Setup Menu.
Failure During Normal System Operation
When the boot process completes and control transfers to the operating system (OS), the service processor can monitor operating system activity (see the Set Surveillance Parameters option in the SERVICE PROCESSOR SETUP MENU). If OS activity stops due to a hardware- or software-induced failure, the service processor can initiate a reboot/restart process based on the settings in the Service Processor Reboot/Restart Policy Setup Menu and the OS automatic restart settings (see the operating system documentation).
If you are using the AIX operating system, the menu item under SMIT for setting the restart policy is Automatically Reboot After Crash. The default is false. When the setting is true, and if the service processor parameter “Use OS-Defined Restart Policy” is yes, the service processor takes over for AIX to reboot/restart after a hardware or surveillance failure.
Service Processor Reboot/Restart Policy Controls
The operating system’s automatic restart policy (see operating system documentation) indicates the operating system response to a system crash. The service processor can be instructed to refer to that policy by the Use OS-Defined Restart Policy setup menu.
If the operating system has no automatic restart policy, or if it is disabled, then the service processor restart policy can be controlled from the service processor menus. Use the Enable Supplemental Restart Policy selection.
Use OS-Defined restart policy - The default setting is no. If set to yes on a full system partition, this causes the service processor to refer to the OS Automatic Restart Policy setting and take action (the same action the operating system would take if it could have responded to the problem causing the restart).
When this setting is no, or if the operating system did not set a policy, the service processor refers to enable supplemental restart policy for its action.
This setting is ignored on a partitioned system.
Enable supplemental restart policy - The default setting is Yes. When set to yes on a full system partition, the service processor restarts the server when the operating system loses control and either:
• The Use OS-Defined restart policy is set to No.
OR
• The Use OS-Defined restart policy is set to Yes and the operating system has no automatic restart policy.
If set to Yes on a partitioned system, the service processor restarts the system when the system loses control and it is detected by service processor surveillance.
The following table describes the relationship among the operating system and service processor restart controls in a full system partition.
OS automatic            Service processor     Service processor       System
reboot/restart after    to use OS-Defined     enable supplemental     response
crash setting           restart policy?       restart policy?
None                    No¹                   No
None                    No¹                   Yes¹                    Restarts
None                    Yes                   No
None                    Yes                   Yes¹                    Restarts
False²                  No¹                   No
False²                  No¹                   Yes¹                    Restarts
False²                  Yes                   No
False²                  Yes                   Yes¹
True                    No¹                   No
True                    No¹                   Yes¹                    Restarts
True                    Yes                   No                      Restarts
True                    Yes                   Yes¹                    Restarts
¹ Service processor default
² AIX default
In a partitioned system, the service processor’s supplemental restart policy is the only setting that is used.
If the service processor supplemental restart policy is enabled, the system restarts. The enable state (Yes) is the default setting for the supplemental restart policy.
If the service processor supplemental restart policy is not enabled, there is no system response.
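The restart rules described above reduce to a short decision function. The following Python sketch is one reading of those rules, not the service processor's actual logic; the function name system_restarts and its parameter names are hypothetical. The OS setting is passed as True, False, or None (no automatic restart policy).

```python
from itertools import product
from typing import Optional


def system_restarts(os_auto_restart: Optional[bool],
                    use_os_defined_policy: bool = False,
                    supplemental_policy: bool = True,
                    partitioned: bool = False) -> bool:
    """Return True if the service processor would restart the system.

    Defaults mirror the stated service processor defaults:
    Use OS-Defined restart policy = no, supplemental restart policy = yes.
    """
    if partitioned:
        # On a partitioned system only the supplemental policy is used.
        return supplemental_policy
    if use_os_defined_policy and os_auto_restart is not None:
        # Defer to the operating system's automatic restart setting.
        return os_auto_restart
    # OS-defined policy not used, or the OS set no policy.
    return supplemental_policy


if __name__ == "__main__":
    # Enumerate the twelve full-system-partition combinations.
    for os_setting, use_os, supp in product((None, False, True),
                                            (False, True), (False, True)):
        response = "Restarts" if system_restarts(os_setting, use_os, supp) else ""
        print(os_setting, use_os, supp, response)
```

Enumerating all combinations this way is a quick check that a policy change gives the expected system response before it is committed in the setup menu.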
Updating System Firmware and Microcode
System firmware and microcode updates can be performed by a systems administrator or by a service representative. You can use the microcode updates application on the HMC to survey the levels of microcode on a system as well as retrieve and apply updates. For detailed information, see the IBM Hardware Management Console for pSeries Installation and Operations Guide.
If the HMC and managed systems are not set up to use the microcode update application, check the availability of firmware and microcode updates by going to http://techsupport.services.ibm.com/server/mdownload2.
General Information on Processor Subsystem Firmware Updates
Firmware on the processor subsystem includes:
• System firmware. System firmware includes:
  - System power control network programming
  - Service processor programming
  - IPL programming
  - Run-time abstraction services
• Frame (Power Subsystem) firmware
• Integrated SCSI controller microcode
• Integrated Ethernet microcode
Determining the Level of Firmware on the Processor Subsystem
Note: This information may be superseded by the information that is available on the following Web site.
Always check the Web site for the latest images and instructions for checking the firmware level. The Web address is http://techsupport.services.ibm.com/server/mdownload2.
The firmware level is indicated in either of the following forms:
• RJyymmdd, where RJ = the pSeries 655 Model 651 Processor Subsystem firmware designation, yy = year, mm = month, and dd = day of the release.
• vJyymmdd, where v = version number, J = the pSeries 655 Model 651 Processor Subsystem firmware designation, yy = year, mm = month, and dd = day of the release.
If your system is running AIX, the platform firmware level can be determined by either of the following methods:
• On the AIX command line, by typing:
  lscfg -vp|grep -p Platform
  A line that begins with ROM level (alterable).. displays the firmware level that is currently on the system.
• Looking at the top of the Service Processor Main Menu.
If your system is running Linux, the platform firmware level can be determined by either of the following methods:
• If the system was installed prior to May of 2003, and has not had a firmware update in May 2003 or later, type the following on the Linux command line:
  lscfg -vp | grep RJ
OR
• If the system was installed in May 2003 or later, or the system has firmware on it that was released in May 2003 or later, type the following on the Linux command line:
  lscfg -vp | grep 3J
A line similar to the following displays the firmware level that is currently on the system:
Alterable ROM level RJ021114
OR
Alterable ROM level 3J030509
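The two firmware level formats can be decoded mechanically, which may be useful when comparing levels across many systems. The following Python sketch is illustrative only; the function name parse_firmware_level is hypothetical, and two points are assumptions not stated in the text: the version number is a single digit (as in 3J030509), and the two-digit year belongs to the 2000s.

```python
import re
from datetime import date

# "R" or a single version digit, then "J", then yymmdd.
_LEVEL = re.compile(r"^([R0-9])J(\d{2})(\d{2})(\d{2})$")


def parse_firmware_level(level):
    """Split a level such as RJ021114 or 3J030509 into version and release date."""
    match = _LEVEL.match(level)
    if match is None:
        raise ValueError("not an RJyymmdd or vJyymmdd level: %r" % level)
    prefix, yy, mm, dd = match.groups()
    return {
        # The RJ form carries no version number.
        "version": None if prefix == "R" else int(prefix),
        # Assumption: two-digit years are 20yy.
        "released": date(2000 + int(yy), int(mm), int(dd)),
    }


if __name__ == "__main__":
    for level in ("RJ021114", "3J030509"):
        print(level, parse_firmware_level(level))
```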
Processor Subsystem Firmware Update Using a Locally Available Image
To update the system firmware using a locally available image, perform the following steps:
1. Log in as root user.
2. If the /tmp/fwupdate directory does not exist, create it by running the following command:
   mkdir /tmp/fwupdate
3. The firmware update file must be downloaded or copied into the /tmp/fwupdate directory on the system. This can be done by using the ftp command to get the image from an ftp server, a NIM server, or by NFS-mounting the directory on the host system. If a control workstation (CWS) is attached to the system, the ftp command can be used to transfer the update file to the target system from the control workstation.
   The firmware update file can also be transferred to the target system by backing up the image onto diskettes from another server and restoring it into the /tmp/fwupdate directory.
   After the firmware update file has been downloaded or copied into the /tmp/fwupdate directory, verify its existence by entering either of the following commands, depending on the name of the update image:
   ls /tmp/fwupdate/RJ*.img
   OR
   ls /tmp/fwupdate/3J*.img
   The update file has either the RJyymmdd.img or the 3Jyymmdd.img format. In both cases, the J in the second position indicates that this is an update image for your system; yy is the year, mm is the month, and dd is the day of the update file.
4. After the update file has been written to the /tmp/fwupdate directory, enter the following command:
   cd /usr/lpp/diagnostics/bin
5. Enter either of the following commands, depending on the name of the update file:
   ./update_flash -f /tmp/fwupdate/RJyymmdd.img
   OR
   ./update_flash -f /tmp/fwupdate/3Jyymmdd.img
   Attention: Do not overlook the periods (.) in the commands shown above. AIX commands are case-sensitive. Type them exactly as shown.
   You will be asked for confirmation to proceed with the firmware update and the required reboot. If you confirm, the system will apply the new firmware, reboot, and return to the AIX prompt. This may take up to ten minutes depending on the configuration of the system.
   Attention: On some systems, the message Wait for rebooting before stopping may display on the system console. Do not turn off the system until the system has fully rebooted to the AIX login prompt. If a shutdown is necessary at that time, log in as root user and issue the shutdown command.
   While the update is in progress, you will see Rebooting... on the display for several minutes.
The firmware update is complete.
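The verification step (confirming that an image named RJyymmdd.img or 3Jyymmdd.img is present before running update_flash) can be scripted. The following Python sketch is a hypothetical helper, not part of AIX; find_update_images and update_flash_command are illustrative names. It only locates image files and builds the command shown in step 5. It does not run the update.

```python
import glob
import os


def find_update_images(directory="/tmp/fwupdate"):
    """Return paths of update images named RJ*.img or 3J*.img, sorted by name."""
    found = []
    for pattern in ("RJ*.img", "3J*.img"):
        found.extend(glob.glob(os.path.join(directory, pattern)))
    return sorted(found)


def update_flash_command(image_path):
    """Build the argument list for step 5; it must be run from /usr/lpp/diagnostics/bin."""
    return ["./update_flash", "-f", image_path]


if __name__ == "__main__":
    images = find_update_images()
    if not images:
        print("No update image found in /tmp/fwupdate")
    for image in images:
        print(" ".join(update_flash_command(image)))
```

Printing the command instead of executing it keeps the confirmation prompt and the required reboot under the operator's control.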
Updating System Firmware from the AIX Service Aids
Attention: This method is not recommended for partitioned systems, but if the device resources are
allocated correctly, the firmware update can be done using the AIX service aid.
Note:
This service aid is supported only in online diagnostics.
If the firmware on a partitioned system is being updated:
• One partition running AIX must have service authority.
• All other partitions except the one with service authority must be shut down.
• The partition with service authority must own the device from which the firmware update image will be read.
• The partition with service authority must have a hard disk.
If the required devices are not in the partition with service authority, the customer or system administrator must reassign the appropriate resources to it. This requires rebooting the partition with service authority.
If the firmware on a full system partition is being updated, no special steps are required to perform the firmware update using the service aid.
Note: Because the system always reboots itself during this type of firmware update process, the update
process may take up to an hour.
Updating System Firmware from the AIX Command Line
Refer to the detailed instructions that are provided on the website with the latest image.
Note: The update process can take up to 60 minutes, and the system reboots itself during the update
process.
Frame (Power Subsystem) Firmware Update
The frame firmware, which includes the power subsystem, is updated using a task on the HMC. Firmware updates (also called corrective service) are available at http://techsupport.services.ibm.com/server/mdownload2. If a service representative is installing the corrective service, the frame firmware download is also available on CORE.
This task downloads a corrective service for the frame onto the HMC.
Note:
Because the HMC is a closed system, you cannot install additional applications on your HMC. All of the tasks you need to maintain the managed system, the underlying operating system, and the HMC application code are available by using the HMC’s management applications.
To download corrective service on the HMC, do the following:
1. From the HMC interface, in the Navigation area (left side of the panel), open the Software Maintenance folder.
2. In the Contents area (right side of the panel), double-click the Frame icon. The Frame application opens in the Contents area.
3. In the Contents area, click the Receive Corrective Service task.
4. Select one of the following by clicking on the circle to the left.
   • Upload corrective service from diskette. If you select this option, make sure the correct diskette is in the HMC diskette drive.
   • Download the corrective service file from a remote system. If you select this option, complete the Remote Site (host name), Patch file (file name or remote system), User ID (for remote system), and password (for remote system).
5. Click OK to copy the corrective service to this HMC.
Integrated SCSI Controller Microcode Update
The SCSI controller is located on the system planar and may require microcode updates. The SCSI controller microcode is updated using a task in AIX diagnostics. A microcode update may be available at http://techsupport.services.ibm.com/server/mdownload2.
The microcode update and procedures for performing the update are available at the Web site.
Integrated Ethernet Microcode Update
The Ethernet controller is located on the system board and may require microcode updates. The Ethernet microcode may be updated using a task in AIX diagnostics. A microcode update may be available at http://techsupport.services.ibm.com/server/mdownload2.
The microcode update and procedures for performing the update are available at the Web site.
Installing Corrective Service on the Frame
This task allows you to update the level of code on the frame after you have downloaded a corrective service. This procedure should be run after any of the following components are replaced:
• Bulk Power Regulator (BPR)
• Bulk Power Controller (BPC)
• Distributed Converter Assembly (DCA)
Notes:
1. To install a corrective service on a frame, you must be a system administrator or a service representative. For information about system administrator or service representative roles, refer to the IBM Hardware Management Console for pSeries Installation and Operations Guide.
2. Do not power off any of the components in the frame at any time during this installation procedure. Interruptions can leave the power subsystem, or one of the other components in the frame, in an unrecoverable state.
To install a corrective service, do the following:
1. If you have not installed the corrective fix from ftp or diskette onto this HMC, perform steps to receive frame corrective service on the HMC, as described in “Frame (Power Subsystem) Firmware Update” on page 44. Then go to Step 4.
2. From the HMC interface, in the Navigation area (left side of the panel), open the Software Maintenance folder.
3. In the Contents area (right side of the panel), double-click the Frame icon. The Frame application opens in the Contents area.
4. In the Contents area, click Install Corrective Service. The Install Corrective Service window opens.
5. Select the Corrective Service Version, and select the frame where the service will be applied.
   Note:
   Unless you are directed otherwise, select the highest version number for the most recent fix.
6. Click Install. The installation may take up to an hour, depending on the number of parts in the frame that require an update. When installation is complete, a window opens to indicate installation status.
   Note:
   If you have replaced a single frame part and are using this procedure to update it, this process normally takes only two to five minutes.
7. If the installation fails, click Reason for Failure.
   a. If you are able to correct the problem(s), click Cancel and go to Step 6.
   b. If you are given a service request code (SRC), perform the steps to service the SRC, then return to this procedure.
   c. Otherwise, contact the next level of support.
8. After the corrective service has been applied successfully, click Cancel to return.
Reconfiguration Procedure for SNI Adapters
After the system has booted to the AIX login prompt, log in and verify that the SNI adapters are configured properly. On the AIX command line, issue the following command:
lscfg | grep sni
There should be two sni adapters identified in the output of the command (represented by their logical device names).
If two adapters are not present, issue the following commands at the AIX command line, where x is the first SNI port (usually sni0) and y is the second SNI port (usually sni1):
ifconfig -d snix down detach
ifconfig -d sniy down detach
rmdev -d -l snix
rmdev -d -l sniy
cfgmgr -v
After issuing the above commands, reboot the system and verify the presence of two SNI ports.
Configuring and Deconfiguring Processors or Memory
All failures that crash the system with a machine check or check stop, even if intermittent, are reported as a diagnostic callout for service repair. To prevent the recurrence of intermittent problems and improve the availability of the system until a scheduled maintenance window, processors and memory cards with a failure history are marked “bad” to prevent their being configured on subsequent boots.
A processor or memory card is marked “bad” under the following circumstances:
• A processor or memory card fails built-in self-test (BIST) or power-on self-test (POST) testing during boot (as determined by the service processor).
• A processor or memory card causes a machine check or check stop during runtime, and the failure can be isolated specifically to that processor or memory card (as determined by the processor runtime diagnostics in the service processor).
• A processor or memory card reaches a threshold of recovered failures that results in a predictive callout (as determined by the processor run-time diagnostics in the service processor).
During boot time, the service processor does not configure processors or memory cards that are marked “bad.”
If a processor or memory card is deconfigured, the processor or memory card remains offline for subsequent reboots until it is replaced or repeat gard is disabled. The repeat gard function also provides the user with the option of manually deconfiguring a processor or memory card, or re-enabling a previously deconfigured processor or memory card.
For information about configuring or deconfiguring a processor, see the Processor Configuration/Deconfiguration Menu on page 30. For information about configuring or deconfiguring a memory card, see the Memory Configuration/Deconfiguration Menu on page 32. Both of these menus are submenus under the System Information Menu. You can enable or disable CPU Repeat Gard or Memory Repeat Gard using the Processor Configuration/Deconfiguration Menu.
Run-Time CPU Deconfiguration (CPU Gard)
L1 instruction cache recoverable errors, L1 data cache correctable errors, and L2 cache correctable errors are monitored by the processor runtime diagnostics (PRD) firmware running on the service processor. When a predefined error threshold is met, an error log with warning severity and threshold exceeded status is returned to AIX. At the same time, PRD marks the CPU for deconfiguration at the next boot. AIX will attempt to migrate all resources associated with that processor to another processor and then stop the defective processor.
Service Processor System Monitoring - Surveillance
Surveillance is a function in which the service processor monitors the system, and the system monitors the service processor. This monitoring is accomplished by periodic samplings called heartbeats.
Surveillance is available during the following phases:
• System firmware bringup (automatic)
• Operating system runtime (optional)
System Firmware Surveillance
System firmware surveillance is automatically enabled during system power-on. It cannot be disabled by the user, and the surveillance interval and surveillance delay cannot be changed by the user.
If the service processor detects no heartbeats during system IPL (for a set period of time), it cycles the system power to attempt a reboot. The maximum number of retries is set from the service processor menus. If the fail condition persists, the service processor leaves the machine powered on, logs an error, and displays menus to the user. If Call-out is enabled, the service processor calls to report the failure and displays the operating-system surveillance failure code on the operator panel on the HMC.
Operating System Surveillance
Note: This function is not available on a partitioned system.
Operating system surveillance provides the service processor with a means to detect hang conditions, as well as hardware or software failures, while the operating system is running. It also provides the operating system with a means to detect a service processor failure, which is indicated by the lack of a return heartbeat.
Operating system surveillance is not enabled by default, allowing you to run operating systems that do not support this service processor option.
You can also use service processor menus and AIX service aids to enable or disable operating system surveillance.
For operating system surveillance to work correctly, you must set these parameters:
• Surveillance enable/disable
• Surveillance interval
  The maximum time the service processor should wait for a heartbeat from the operating system before timeout.
• Surveillance delay
  The length of time to wait from the time the operating system is started to when the first heartbeat is expected.
Surveillance does not take effect until the next time the operating system is started after the parameters have been set.
If desired, you can initiate surveillance mode immediately from service aids. In addition to the three options above, a fourth option allows you to select immediate surveillance, and rebooting of the system is not necessarily required.
If operating system surveillance is enabled (and system firmware has passed control to the operating system), and the service processor does not detect any heartbeats from the operating system, the service processor assumes the system is hung and takes action according to the reboot/restart policy settings. See “Service Processor Reboot/Restart Recovery” on page 39.
If surveillance is selected from the service processor menus, which are only available at system boot, then surveillance is enabled by default as soon as the system boots. From service aids, the selection is optional.
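The roles of the surveillance interval and surveillance delay can be illustrated with a small timing check. This Python sketch is a simplified model, not the service processor implementation; the function name surveillance_timed_out is hypothetical. It assumes that the first heartbeat is allowed the delay plus one interval after the operating system starts, which the text does not state explicitly.

```python
def surveillance_timed_out(os_start, heartbeats, now, interval, delay):
    """Return True if the operating system would be treated as hung.

    All times are seconds on one clock.
    os_start:   time the operating system was started.
    heartbeats: arrival times of heartbeats received so far.
    interval:   maximum wait for a heartbeat before timeout.
    delay:      wait from OS start until the first heartbeat is expected.
    """
    if not heartbeats:
        # Assumption: the first heartbeat is due within delay + interval.
        return now - os_start > delay + interval
    # Otherwise measure from the most recent heartbeat.
    return now - max(heartbeats) > interval


if __name__ == "__main__":
    print(surveillance_timed_out(0, [], 100, interval=60, delay=120))
    print(surveillance_timed_out(0, [130, 180], 250, interval=60, delay=120))
```

When the check reports a timeout, the action taken depends on the reboot/restart policy settings described earlier in this chapter.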
Service Processor Error Logs
The service processor error logs, an example of which follows, contain error conditions detected by the service processor.
Error Log

1. 11/30/99   19:41:56 Service Processor Firmware Failure
   B1004999

Enter error number for more details.
Press Return to continue, or ’x’ to return to menu.
Press "C" to clear error log, any other key to continue. >
Note: The time stamp in this error log is coordinated universal time (UTC), which is also referred to as
Greenwich mean time (GMT). Operating system error logs have additional information available and can time stamp with local time.
Entering an error number provides nine words of system reference code (SRC) data; an example menu follows.
Detail:     6005

SRC
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
word11:B1004999   word12:0110005D   word13:00000000
word14:00000000   word15:00001111   word16:00000000
word17:B1004AAA   word18:0114005D   word19:A4F1E909

B1004999

Press Return to continue, or ’x’ to return to menu.
If Return is pressed, the contents of NVRAM will be dumped 320 bytes at a time, starting at address 0000.
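When error log details are captured from a terminal session, the nine SRC words can be extracted for later comparison. The following Python sketch assumes the wordNN:XXXXXXXX layout shown in the example detail screen; the function name parse_src_words is hypothetical, and the sample text is copied from that example.

```python
import re


def parse_src_words(text):
    """Map each 'wordNN' label to its 8-digit hexadecimal value."""
    pairs = re.findall(r"word(\d+):([0-9A-Fa-f]{8})", text)
    return {"word%s" % number: value.upper() for number, value in pairs}


SAMPLE = """
word11:B1004999   word12:0110005D   word13:00000000
word14:00000000   word15:00001111   word16:00000000
word17:B1004AAA   word18:0114005D   word19:A4F1E909
"""

if __name__ == "__main__":
    for label, value in sorted(parse_src_words(SAMPLE).items()):
        print(label, value)
```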
LCD Progress Indicator Log
The following is an example of the LCD progress indicator log. It shows the types of entries that can appear in the log, and is for example purposes only.
The progress indicator codes are listed from top (latest) to bottom (oldest).
LCD Progress Indicator Log
B0FF
0539..17
0538..17
0539..17
0538..17
0539..17
0581
0538..17
0539..12
0538..12
0539..
0821..01-K1-00
0539..
0728..01-R1-00-00
0539..
0664..40-60-00-1,0
0539..
0777..U0.1-P2-I1/E1
0539..
0742..U0.1-P2-I2/E1
0539..
0776..U0.1-P2-I3/T1
E139
E1FB
E139
E183
Press Return to continue, or ’x’ to return to menu. >
EAA1..U0.1-P1-I4
E172..U0.1-P1
E172..U0.1-P1-I4
E172..U0.1-P1
94BB
9109
9380
9108
9107
9106
9105
9118
9104
9103
9102
90FD
Resetting the Service Processor
If required, the system is restarted by resetting the service processor. If the system is powered up, resetting the service processor will cause the system to shut down. When the service processor is reset, it goes through its power-up sequence, including self-tests. Successful completion of the reset sequence is indicated by OK on the HMC display.
Before resetting the service processor, if the managed system is powered up and running, shut down all partitions, or the full system partition. This action causes the system to shut down and turns off system power.
To reset the service processor when the managed system is powered off, perform either of the following:
• If the service processor is responding to input from the HMC, select Service Processor Setup Menu from the main menu, then select Reset Service Processor. This can only be done by a privileged user.
• Remove, then reapply, power using the HMC.
Service Processor Operational Phases
This section provides a high-level flow of the phases of the service processor.
SP Power Applied
Pre-Standby Phase
Standby Phase: SP Menus Available
Bring-Up Phase: SMS Menus Available
Run-time Phase: Operating System Login Prompt Available
Pre-Standby Phase
This phase is entered when the server is connected to a power source. The server may or may not be fully powered on. This phase is exited when the power-on self-tests (POSTs) and configuration tasks are completed.
The pre-standby phase components are:
• Service Processor Initialization - Service processor performs any necessary hardware and software initialization.
• Service Processor POST - Service processor conducts power-on self-tests on its various work and code areas.
• Service Processor Unattended Start Mode Checks - To assist fault recovery. If unattended start mode is set, the service processor automatically reboots the server. The service processor does not wait for user input or power-on command, but moves through the phase and into the bring-up phase. Access the SMS menus or the service processor menus to reset the unattended start mode.
Standby Phase
The standby phase can be reached in either of the following ways:
With the server off and power connected (the normal path), recognized by OK in the virtual operator panel.
Eserver pSeries 655 User’s Guide
v
On
v
v
an
v
v
v
v
v
v
v
If
v
On an
v
With the server on after an operating system fault, recognized by an 8-digit code in the virtual operator panel.
In the standby phase, the service processor takes care of some automatic duties and is available for menus operation. The service processor remains in the standby phase until a power-on request is detected.
The standby phase components are as follows:
Menus
The service processor menus are password-protected. Before you can access them, you need either the general-user password or the privileged-user password.
On HMC-managed systems, service processor menus are available on the HMC graphical user interface.
Bring-Up Phase
The bring-up phase components are as follows:
Retry Request Check
The service processor checks to see if the previous boot attempt failed. If the specified number of failures are detected, the service processor displays an error code.
Dial Out
This function is handled by the Service Agent code running on the HMC. The service processor issues an error report with the last reported IPL status indicated and any other available error information.
Update Operator Panel (on the HMC)
The service processor displays operator panel data on the HMC virtual terminal window if a remote connection is active.
Environmental Monitoring
The service processor provides expanded error recording and reporting.
System Firmware Surveillance (Heartbeat Monitoring)
The service processor monitors and times the interval between system firmware heartbeats.
Responding to System Processor Commands
The service processor responds to any command issued by the system processor.
Runtime Phase
This phase includes the tasks that the service processor performs during steady-state execution of the operating system.
Environmental Monitoring
The service processor monitors voltages, temperatures, and fan speeds (on some servers).
Responding to System Processor Commands
The service processor responds to any command issued by the system processor.
Run-Time Surveillance
If the device driver is installed and surveillance enabled, the service processor monitors the system heartbeat. If the heartbeat times out, the service processor places an outgoing call. This is different from the bring-up phase scenario, where the specified number of reboot attempts are made before placing an outgoing call.
HMC surveillance
On an HMC-managed system, the service processor monitors the communication link between the managed system and the HMC. If the service processor detects that this communication link has been broken, it will post an error to the operating system running on the managed system.
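The difference between bring-up and run-time heartbeat surveillance can be summarized in a short decision function. This Python sketch is an illustration under stated assumptions: the function name, the phase strings, the returned action labels, and the timeout and retry parameters are all hypothetical, and the real limits are set through the service processor menus.

```python
def surveillance_action(phase, seconds_since_heartbeat, timeout_s,
                        reboot_attempts=0, max_reboot_attempts=0):
    """Decide what to do about the heartbeat in the given phase.

    All names and values are hypothetical; they only mirror the behavior
    described in the text.
    """
    if seconds_since_heartbeat <= timeout_s:
        return "ok"  # heartbeat arrived in time, nothing to do
    if phase == "runtime":
        # Run-time surveillance: a timeout leads directly to an outgoing call.
        return "call_out"
    if phase == "bring-up":
        # Bring-up: the specified number of reboot attempts is made first.
        if reboot_attempts < max_reboot_attempts:
            return "retry_boot"
        return "call_out"
    raise ValueError(f"unknown phase: {phase!r}")
```

With a hypothetical limit of three reboot attempts, a bring-up timeout on the first attempt yields "retry_boot", while the same timeout at run time yields "call_out" immediately.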
Clearing L3 Gard Records
Attention: The following menu should be accessed only by a customer engineer after an L3 cache
module is replaced.
To clear the L3 module Gard records, perform the following steps:
1. At the service processor main menu, enter the access code 85712. A screen similar to the following is displayed.
**** IBM Authorized USE ONLY ****
This menu is for IBM authorized use only.
If you have not been authorized to use this
menu, please discontinue use immediately.
Please press ’x’ and return, for main menu
or press any other key(s) and return, to continue
2. Press Enter. The following text is displayed at the bottom of the screen:
Reset all L3 records
============================
Want to clear all L3 records (y/n)?:
3. Enter y to clear the records. A Task Completed message is displayed.
Note: If n is entered, the Press Return to Continue message is displayed and the L3 Gard records are not cleared.
This step ends the procedure for clearing the L3 Gard records.
Chapter 5. Using System Management Services
Use the system management services menus to view information about your system or partition, and to perform tasks such as setting a password, changing the boot list, and setting the network parameters.
Notes:
1. On some of the system management services (or service processor) screens, you will see the term LPAR. LPAR is equivalent to the term logically partitioned system or just partitioned system.
2. In a partitioned system, only those devices that are assigned to the partition that is being booted display in the SMS menus. In a full system partition, all devices in the system display in the SMS menus.
To start the system management services, do the following:
1. For a partitioned system, use the Hardware Management Console for pSeries (HMC) to restart the partition. For a full system partition, restart the system.
2. For a partitioned system, watch the virtual terminal window on the HMC. For a full system partition, watch the firmware console.
3. Look for the POST indicators memory, keyboard, network, SCSI, speaker, which appear across the bottom of the screen. Press the numeric 1 key after the word keyboard appears, and before the word speaker appears.
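The timing window for the numeric 1 key can be stated as a simple check on which POST indicators are already on the screen. The Python sketch below is only an illustration; the function name is hypothetical and the firmware does not expose such an interface.

```python
# POST indicators in the order they appear across the bottom of the screen.
POST_INDICATORS = ["memory", "keyboard", "network", "SCSI", "speaker"]


def sms_key_window_open(indicators_shown):
    """Return True if pressing 1 now falls after 'keyboard' and before 'speaker'."""
    return "keyboard" in indicators_shown and "speaker" not in indicators_shown
```

The window is open once keyboard has appeared (for example, with memory, keyboard, and network shown) and closes as soon as speaker appears.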
After the system management services starts, the following screen displays:
Main Menu
1  Select Language
2  Change Password Options   NOT available in LPAR mode
3  View Error Log
4  Setup Remote IPL (Initial Program Load)
5  Change SCSI Settings
6  Select Console   NOT available in LPAR mode
7  Select Boot Options
--------------------------------------------------------------------------------------------------
Navigation keys:
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
Note: The System Management Services can also be started using the Service Processor Boot Mode
Menu. See page 28.
On all menus except the Main Menu, there are several navigation keys:
M    Return to the main menu.
ESC  Return to the previous menu.
X    Exit the System Management Services and start the operating system.
If X is entered, you are asked to confirm your choice to exit the SMS menus and start the operating system.
When there is more than one page of information to display, there are two additional navigation keys:
N    Display the next page of the list.
P    Display the previous page of the list.
Note: The lowercase navigation key has the same effect as the uppercase key that is shown on the screen. For example, m or M returns you to the main menu.
On each menu screen, you are given the option of choosing a menu item and pressing Enter (if applicable), or selecting a navigation key.
Select Language
Note: Your virtual terminal must support the ISO-8859 character set to properly display languages other
than English.
This option allows you to change the language used by the text-based System Management Services menus.
SELECT LANGUAGE
1. English
2. Francais
3. Deutsch
4. Italiano
5. Espanol
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
Change Password Options
Note: This option is not available when the system is booted in LPAR mode.
The Change Password Options menu enables you to select from password utilities.
Password Utilities
1  Set Privileged-Access Password
2  Remove Privileged-Access Password
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
Set Privileged-Access Password
The privileged-access password protects against the unauthorized starting of the system programs.
Attention: If the privileged-access password has been enabled, you are asked for the privileged-access
password at startup every time you boot your system.
If you previously had set a privileged-access password and want to remove it, select Remove Privileged-Access Password.
View Error Log
Use this option to view or clear your system’s error log. A menu similar to the following displays when you select this option.
Error Log
            Date      Time      Error Code  Location
Entry 1.    01/04/96  12:13:22  25A80011    00-00
Entry 2.    no error logged
1. Clear error log
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
Note: The time stamp in this error log is coordinated universal time (UTC), which is also referred to as
Greenwich mean time (GMT). Operating system error logs have more information available and can time stamp with your local time.
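Because the SMS error log is stamped in UTC, comparing an entry with an operating system log that uses local time requires a conversion. The following Python sketch shows one way to do that by hand. It assumes the date in the example screen is in month/day/year order, which the screen itself does not state, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def error_log_time_to_local(date_str, time_str, local_tz=None):
    """Convert an SMS error-log time stamp (UTC) to a local time zone.

    Assumes the date is month/day/two-digit year. Passing local_tz=None
    converts to the time zone of the machine running this code.
    """
    utc_time = datetime.strptime(
        f"{date_str} {time_str}", "%m/%d/%y %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    return utc_time.astimezone(local_tz)


# Example: the entry from the sample screen, viewed six hours behind UTC.
central = timezone(timedelta(hours=-6))
print(error_log_time_to_local("01/04/96", "12:13:22", central))
```

The fixed six-hour offset is used only to keep the example deterministic; a real comparison would use the time zone configured on the operating system.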
Setup Remote IPL (Initial Program Load)
This option allows you to enable and set up the remote startup capability of your system or partition. A list of NIC (network interface card) adapters in the system displays first. The following is an example of this screen.
Attention: In a partitioned system, only those network adapters that have been assigned to the partition
being booted display in the IP Parameters menu. In a full system partition, all network adapters in the system are listed in the adapter parameters menu.
NIC Adapters
    Device                            Slot                   Hardware Address
1.  10/100 Mbps Ethernet PCI Adapt    Integ: U1.9-P1/E1      00096baeac10
2.  IBM 10/100/1000 Base-TX PCI       Integ: U1.9-P1-I2/E1   0009cbce0fde
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
When an adapter is selected, the network parameters menu displays:
Network Parameters
10/100 Mbps Ethernet PCI Adapter II: U1.9-P1/E1
1. IP Parameters
2. Adapter Parameters
3. Ping Test
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
Selecting the IP (Internet Protocol) Parameters option displays the following menu.
IP Parameters
10/100 Mbps Ethernet PCI Adapter II: U1.9-P1/E1
1. Client IP Address    [000.000.000.000]
2. Server IP Address    [000.000.000.000]
3. Gateway IP Address   [000.000.000.000]
4. Subnet Mask          [255.255.255.000]
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
To change IP parameters, type the number of the parameter for which you want to change the value.
Entering IP parameters on this screen will automatically update the parameters on the ping test screen.
Attention: If the client system and the server are on the same subnet, set the gateway IP address to
[0.0.0.0].
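Whether the client and the server share a subnet can be checked from the client address, the server address, and the subnet mask. The following Python sketch uses the standard ipaddress module; the helper names and the sample addresses are hypothetical. Because the SMS screen shows zero-padded octets (for example, 255.255.255.000), which recent Python versions reject, the sketch strips the padding first.

```python
import ipaddress


def normalize(dotted):
    """Strip the zero padding shown on the SMS screen (255.255.255.000 -> 255.255.255.0)."""
    return ".".join(str(int(octet)) for octet in dotted.split("."))


def same_subnet(client_ip, server_ip, subnet_mask):
    """Return True if the server address lies in the client's subnet."""
    network = ipaddress.ip_network(
        f"{normalize(client_ip)}/{normalize(subnet_mask)}", strict=False
    )
    return ipaddress.ip_address(normalize(server_ip)) in network


def gateway_setting(client_ip, server_ip, subnet_mask, router_ip):
    """Return the value to enter as the gateway IP address."""
    if same_subnet(client_ip, server_ip, subnet_mask):
        return "0.0.0.0"  # same subnet: no gateway is needed
    return normalize(router_ip)
```

For example, a client at 192.168.1.10 and a server at 192.168.1.20 with mask 255.255.255.0 are on the same subnet, so the gateway is entered as 0.0.0.0.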
Selecting the Adapter Configuration option allows you to set the network speed, enable or disable spanning tree, and set the protocol, as shown in the following example menu:
Adapter Configuration
10/100 Mbps Ethernet PCI Adapter II: U1.9-P1/E1
1. Speed, Duplex
2. Spanning Tree Enabled
3. Protocol
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
Selecting the Speed, Duplex option allows you to set the interface speed at which the card will run and whether it runs at half or full duplex. The current setting is indicated by "<===".
Adapter Configuration
10/100 Mbps Ethernet PCI Adapter II: U1.9-P1/E1
1. auto, auto
2. 10,half
3. 10,full
4. 100,half
5. 100,full
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
Selecting the Spanning Tree Enabled menu allows you to enable or disable the spanning tree. If this flag is enabled (because the network the system is being attached to supports spanning trees), the firmware will impose a waiting period of 60 seconds before the adapter can communicate with the network. If this flag is disabled, the network adapter will be able to access the network immediately after the system is connected.
Spanning Tree Enabled
10/100 Mbps Ethernet PCI Adapter II: U1.9-P1/E1
1. Yes <===
2. No
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
The Protocol option allows you to set the appropriate protocol for your network as shown in the following menu:
Protocol
10/100 Mbps Ethernet PCI Adapter II: U1.9-P1/E1
1. Standard <===
2. IEEE802.3
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
Select the ping test option from the network parameters menu to test an adapter’s network connection to a remote system. After the ping test option is selected, the same series of screens will take you through setting up the IP parameters and the adapter configuration before attempting the ping test.
Notes:
1. After the ping test is initiated, it may take 60 seconds or longer to return a result.
2. Whether the ping test passes or fails, the firmware will stop and wait for a key to be pressed before continuing.
Change SCSI Settings
This option allows you to view and change the addresses of the SCSI controllers attached to your system.
SCSI Utilities
1. Hardware Spin Up Delay
2. Change SCSI Id
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
Select Console
Note: This option is not available in a partitioned system. A virtual terminal window on the HMC is the
default firmware console for a partitioned system.
The Select Console Utility allows you to select which console to use to display the SMS menus. This selection is only for the SMS menus and does not affect the display used by the operating system.
Follow the instructions that display on the screen. The firmware will automatically return to the SMS main menu.
Select Boot Options
Use this menu to view and set various options regarding the installation devices and boot devices.
1. Select Install or Boot a Device
2. Select Boot Devices
3. Multiboot Startup
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
Option 1
Select Install or Boot a Device allows you to select a device to boot from or install the operating
system from. This selection is for the current boot only.
Option 2
Select Boot Devices allows you to set the boot list.
Option 3
Multiboot Startup toggles the multiboot startup flag, which controls whether the multiboot menu is
invoked automatically on startup.
If option 1 is selected, the following menu displays:
Select Device Type
1. Diskette
2. Tape
3. CD/DVD
4. IDE
5. Hard Drive
6. Network
7. None
8. List All Devices
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
If a device is selected that does not reside in the system, a menu with the following message displays:
.---------------------------------------------------------.
| THE SELECTED DEVICES WERE NOT DETECTED IN THE SYSTEM !  |
| Press any key to continue.                              |
`---------------------------------------------------------'
If Hard Drive is selected, the following menu displays:
Select Hard Drive Type
1. SCSI
2. SSA
3. SAN
4. None
5. List All Devices
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
If List All Devices is selected, a menu similar to the following displays, depending on the devices that are installed in the system:
Select Device
Device  Current   Device
Number  Position  Name
1.      1         SCSI 18200 MB Harddisk (loc=U1.9-P1/Z1-A8,0)
2.      -         SCSI 18200 MB Harddisk (loc=U1.9-P1/Z2-A9,0)
3.      -         SCSI 18200 MB Harddisk (loc=U1.9-P1/Z2-Aa,0)
4.      None
5.      List all devices
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
The appropriate device can then be selected for this boot or installation.
When a device is selected for installing the operating system, or to boot from, the Select Task menu allows you to get more information about the device, or to boot from that device in normal mode or service mode. The following is an example of this menu.
Select Task
SCSI 18200 MB Harddisk (loc=U1.9-P1/Z1-A8,0)
1. Information
2. Normal Mode Boot
3. Service Mode Boot
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
If either Normal Mode Boot or Service Mode Boot is selected, the next screen will ask, Are you sure?. If you answer yes, the device will be booted in the appropriate mode. If you answer no, the firmware will return to the Select Task menu.
Select Boot Devices
Attention: In a partitioned system, only those devices from which an operating system can be booted
that are assigned to the partition being booted display on the select boot devices menu. In a full system partition, devices from which an operating system can be booted display on the select boot devices menu.
Note: Use the following menu hierarchy to minimize the search time for bootable devices:
device type -> bus type -> adapter -> devices attached to the adapter
To view all of the potentially bootable devices at one time rather than traversing down the hierarchy with the submenus, go to the Select Device Type menu or the Select Media Type menu, and select List all Devices. The List all Devices option may take a long time to run on a large system with many I/O adapters and devices, such as large disk arrays.
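The trade-off between walking the hierarchy and using List all Devices can be pictured with a nested lookup table. The Python sketch below is purely illustrative: the INVENTORY contents, the adapter keys, the "PCI" bus label, and both function names are hypothetical and do not reflect how the firmware stores or probes devices.

```python
# Hypothetical inventory: device type -> bus type -> adapter -> attached devices.
INVENTORY = {
    "Hard Drive": {
        "SCSI": {
            "U1.9-P1/Z1": ["SCSI 18200 MB Harddisk (loc=U1.9-P1/Z1-A8,0)"],
            "U1.9-P1/Z2": [
                "SCSI 18200 MB Harddisk (loc=U1.9-P1/Z2-A9,0)",
                "SCSI 18200 MB Harddisk (loc=U1.9-P1/Z2-Aa,0)",
            ],
        },
    },
    "Network": {
        "PCI": {
            "U1.9-P1/E1": ["10/100 Mbps Ethernet PCI Adapter II"],
        },
    },
}


def devices_on(inventory, device_type, bus_type, adapter):
    """Follow the hierarchy to one adapter; only that adapter is examined."""
    return inventory.get(device_type, {}).get(bus_type, {}).get(adapter, [])


def list_all_devices(inventory):
    """Visit every adapter of every type, as List all Devices must."""
    return [
        device
        for buses in inventory.values()
        for adapters in buses.values()
        for devices in adapters.values()
        for device in devices
    ]
```

Drilling down touches a single adapter, while listing everything visits each adapter in turn, which is why the full listing can be slow on systems with many adapters and devices.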
Select this option to view and change the customized boot list, which is the sequence of devices read at startup.
Configure Boot Device Order
1. Select 1st Boot Device
2. Select 2nd Boot Device
3. Select 3rd Boot Device
4. Select 4th Boot Device
5. Select 5th Boot Device
6. Display Current Setting
7. Restore Default Setting
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
When any of the options 1-5 is selected, the Select Device Type screen displays, which will be similar to the following.
Select Device Type
1. Diskette
2. Tape
3. CD/DVD
4. IDE
5. Hard Drive
6. Network
7. None
8. List All Devices
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
When a device type is selected, a Select Media Type menu is displayed. The following is an example of that menu.
Select Media Type
1. SCSI
2. SSA
3. SAN
4. IDE
5. ISA
7. None
8. List All Devices
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
When the media type is selected, all adapters of that type are displayed on the Select Media Adapter menu. The following is an example of that menu for a SCSI media type.
Select Media Adapter
1. U1.9-P1/Z1
2. U1.9-P1-I1/Z1
3. U1.9-P1-I1/Z2
4. None
5. List All Devices
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
Each adapter must then be selected individually to see the devices that are attached to it. An example of this menu for the first adapter in the previous example is as follows:
Select Device
Device  Current   Device
Number  Position  Name
1.      1         SCSI 18 GB Harddisk (loc=U1.9-P1/Z1-A8,0)
2.      -         SCSI 18 GB Harddisk (loc=U0.9-P1/Z1-A9,0)
3.      None
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
If there are no devices of the type chosen earlier (on the Select Device Type menu) attached to the adapter that is specified, a message similar to the following displays:
.---------------------------------------------------------.
| THE SELECTED DEVICES WERE NOT DETECTED IN THE SYSTEM    |
| Press any key to continue.                              |
`---------------------------------------------------------'
When a device type is selected, a Select Task menu displays. The following is an example of that menu for a hard disk.
Select Task
SCSI 18200 MB Harddisk (loc=U1.9-P1/Z1-A8,0)
1. Information
2. Set Boot Sequence: Configure as 1st Boot Device
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
Selecting Information displays a menu similar to the following for a hard disk.
Device Information
/pci@3fffdf0a000/pci@2,4/scsi@1/sd@8,0
             : (Integrated) (Bootable)
DEVICE       : SCSI 18200 MB Harddisk (loc=U1.9-P1/Z1-A8,0)
NAME         : sd
DEVICE-TYPE  : block
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
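The first line of the Device Information screen is an Open Firmware style device path, in which each component has the form node-name@unit-address. The short Python sketch below splits such a path into its components; the function name is hypothetical, and the sketch ignores optional device arguments that can follow a colon in some paths.

```python
def parse_device_path(path):
    """Split a device path into (node name, unit address) pairs."""
    nodes = []
    for component in path.strip("/").split("/"):
        # partition() leaves the unit address empty if there is no "@".
        name, _, unit_address = component.partition("@")
        nodes.append((name, unit_address))
    return nodes


for name, unit_address in parse_device_path("/pci@3fffdf0a000/pci@2,4/scsi@1/sd@8,0"):
    print(name, unit_address)
```

For the sample path, the last component is the node sd with unit address 8,0, which matches the NAME field on the screen.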
The Set Boot Sequence option allows you to set the location of the device in the boot list.
Display Current Settings
This option displays the current setting of the customized boot list. An example of this menu, with one device in the boot list, follows.
Current Boot Sequence
1. SCSI 18200 MB Harddisk (loc=U1.9-P1/Z1-A8,0)
2. None
3. None
4. None
5. None
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
Restore Default Settings
This option restores the boot list to the default boot list. The default boot list will vary depending on the devices that are installed in the system.
The default boot list is:
1. Primary diskette drive (if installed)
2. CD-ROM drive (if installed)
3. Tape drive (if installed)
4. Hard disk drive (if installed)
5. Network adapter
--------------------------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: _
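Because most entries in the default boot list apply only if the device is installed, the effective default list depends on the hardware present. The Python sketch below illustrates that filtering. The device labels and the function name are hypothetical, and treating the network adapter as always present simply mirrors the fact that the list above gives it no "if installed" qualifier.

```python
# Default order from the list above; labels are illustrative only.
DEFAULT_BOOT_ORDER = ["diskette", "cd-rom", "tape", "hard disk", "network"]


def default_boot_list(installed):
    """Return the default boot order restricted to the installed devices."""
    return [
        device
        for device in DEFAULT_BOOT_ORDER
        if device == "network" or device in installed
    ]


print(default_boot_list({"hard disk", "cd-rom"}))
```

The relative order of the default list is preserved, so a system with only a CD-ROM drive and a hard disk would try the CD-ROM drive first, then the hard disk, then the network adapter.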
Multiboot Startup
Multiboot Startup toggles the multiboot startup flag, which controls whether the multiboot menu is invoked automatically on startup.
Exiting System Management Services
After you have finished using the system management services, type x (for exit) to boot your system or partition.
Chapter 6. Using the Online and Standalone Diagnostics
Running diagnostics verifies system hardware operation. The diagnostics consist of online diagnostics and standalone diagnostics.
Note:
The diagnostic procedures described in this chapter must be run for each processor subsystem installed in a rack.
Attention: The AIX operating system must be installed in a partition in order to run online diagnostics on
that partition. If the AIX operating system is not installed, use the standalone diagnostic procedures.
Online diagnostics, when they are installed, reside with AIX in the file system. They can be booted:
In single user mode (referred to as service mode)
To run in maintenance mode (referred to as maintenance mode)
To run concurrently with other applications (referred to as concurrent mode)
Standalone diagnostics must be booted before they can be run. If booted, the diagnostics have no access to the AIX error log or the AIX configuration data.
Online and Standalone Diagnostics Operating Considerations
Note: When possible, run online diagnostics in service mode. Online diagnostics perform additional
functions compared to standalone diagnostics. Running online diagnostics in service mode ensures that the error state of the system that has been captured in NVRAM is available for your use in analyzing the problem. The AIX error log and certain SMIT functions are only available when diagnostics are run from the disk drive.
Consider the following items before using the diagnostics:
Standalone diagnostics can run on systems configured for either a full (or single) system partition or a multiple partitioned system. When running in a partitioned system, the device from which you are booting standalone diagnostics must be made available to the partition dedicated to running standalone diagnostics. This may require moving the device from the partition that currently contains the boot device (for example, the network adapter connected to the Network Installation Management (NIM) server that has a standalone diagnostic image) to the partition used to run standalone diagnostics. If you move devices, reboot both partitions. For more information, see “Standalone Diagnostic Operation” on page 69.
When diagnostics are installed, the device support for some devices might not get installed. If this is the case, that device does not display in the diagnostic test list when running disk-based diagnostics.
When running diagnostics in a partitioned system, diagnostics function only with the resources that were assigned to that partition. You must run diagnostics in the partition containing the resource that you want to test.
Identifying the Terminal Type to the Diagnostics
When you run diagnostics, you must identify which type of terminal you are using. If the terminal type is not known when the FUNCTION SELECTION menu is displayed, the diagnostics do not allow you to continue until a terminal is selected from the DEFINE TERMINAL option menu. Choose the "vt320" selection when running diagnostics from an HMC.
Undefined Terminal Types
If you specify an undefined terminal type from the DEFINE TERMINAL option menu, the menu prompts you to enter a valid terminal type. The menu redisplays until either a valid type is entered or you exit the DEFINE TERMINAL option.
Resetting the Terminal
If you enter a terminal type that is valid (according to the DEFINE TERMINAL option menu) but is not the correct type for the HMC virtual terminal window being used, you may be unable to read the screen, use the function keys, or use the Enter key. Bypass these difficulties by pressing Ctrl-C to reset the terminal. The screen display that results from this reset depends on the mode in which the system is being run:
Online Normal or Maintenance Mode - The command prompt displays.
Standalone Mode or Online Service Mode - The terminal type is reset to dumb, the Diagnostic Operating Instruction panel displays, and you are required to go through the DEFINE TERMINAL process again.
Running Online Diagnostics
Consider the following when you run the online diagnostics from a server or a disk:
• The diagnostics cannot be loaded and run from a disk until the AIX operating system has been installed and configured.
• The diagnostics cannot be loaded on a system (client) from a server if that system is not set up to boot from a server over a network. When the system is set up to boot from a server, the diagnostics are run in the same manner as they are from disk.
• On full system partitions, if the diagnostics are loaded from disk or a server, you must shut down the AIX operating system before turning off the system unit to prevent possible damage to disk data. Do this in either of the following ways:
- If the diagnostics were loaded in standalone mode, press the F3 key until DIAGNOSTIC OPERATING INSTRUCTIONS displays. Then press the F3 key once again to shut down the AIX operating system.
- If the diagnostics were loaded in maintenance or concurrent mode, type the shutdown -F command.
• Under some conditions, the system might stop, with instructions displayed on attached displays and terminals. Follow the instructions to select a console display.
Online Diagnostics Modes of Operation
Note: When running online diagnostics on a partition in a partitioned system, diagnostics can be run only on resources that are allocated to that partition.
The online diagnostics can be run in the following modes:
• Service Mode (see “Service Mode”). Refer to “Running Online Diagnostics in Service Mode” on page 68 for instructions on how to run the diagnostics in service mode.
• Concurrent Mode (see “Concurrent Mode” on page 67). Refer to “Running the Online Diagnostics in Concurrent Mode” on page 68 for instructions on how to run the diagnostics in concurrent mode.
• Maintenance Mode (see “Maintenance Mode” on page 67). Refer to “Running the Online Diagnostics in Maintenance Mode” on page 68 for instructions on how to run the diagnostics in maintenance mode.
Service Mode
Service mode provides the most complete checkout of the system resources. This mode also requires that no other programs be running on the partition (or on the system, in a full system partition). All resources of the partition (or of the system, in a full system partition), except the SCSI adapter and the disk drives used for paging, can be tested. However, the memory and processor are tested only during POST, and the diagnostics report the results of the POST tests.
Error-log analysis is done in service mode when you select the Problem Determination option on the DIAGNOSTIC MODE SELECTION menu.
Concurrent Mode
Use concurrent mode to run online diagnostics on some of the system resources while the system is running normal activity.
Because the system is running in normal operation, the following resources cannot be tested in concurrent mode:
• SCSI adapters connected to paging devices
• Disk drive used for paging
• Some display adapters and graphics related devices
• Memory (tested during POST)
• Processor (tested during POST)
The following levels of testing exist in concurrent mode:
• The share-test level tests a resource while the resource is being shared by programs running in normal operation. This testing is mostly limited to normal commands that test for the presence of a device or adapter.
• The sub-test level tests a portion of a resource while the remaining part of the resource is being used in normal operation. For example, this test could test one port of a multiport device while the other ports are being used in normal operation.
• The full-test level requires that the device not be assigned to or used by any other operation. This level of testing on a disk drive might require the use of the varyoff command. The diagnostics display menus to allow you to vary off the needed resource.
Error-log analysis is done in concurrent mode when you select the Problem Determination option on the DIAGNOSTIC MODE SELECTION menu.
To run the online diagnostics in concurrent mode, you must be logged in to the AIX operating system and have proper authority to issue the commands (if help is needed, see the system operator).
The diag command loads the diagnostic controller and displays the online diagnostic menus.
Maintenance Mode
Maintenance mode runs the online diagnostics using the customer’s version of the AIX operating system. This mode requires that all activity on the partition running the AIX operating system be stopped so that the online diagnostics have most of the resources available to check. All of the system resources, except the SCSI adapters, memory, processor, and the disk drive used for paging, can be checked.
Error log analysis is done in maintenance mode when you select the Problem Determination option on the DIAGNOSTIC MODE SELECTION menu.
Use the shutdown -m command to stop all activity on the AIX operating system and put the AIX operating system into maintenance mode. The diag command is then used to invoke the diagnostic controller so you can run the diagnostics. After the diagnostic controller is loaded, follow the normal diagnostic instructions.
Documentation for the AIX operating system is available from the IBM Eserver pSeries Information Center at http://publib16.boulder.ibm.com/pseries/en_US/infocenter/base. Select AIX documentation. The
AIX Documentation CD contains the base set of publications for the operating system, including
system-management and end-user documentation.
Running Online Diagnostics in Service Mode
To run the online diagnostics in service mode from the boot hard disk, do the following:
1. From the HMC, select the Partition Manager.
2. Right-click on the mouse and select Open Terminal Window.
3. From the Service Processor Menu on the VTERM, select Option 2 System Power Control.
4. Select option 6. Verify that the state changes to currently disabled. Disabling fast system boot automatically enables slow boot.
5. Select Option 98 to exit the system power control menu.
6. Use the HMC to power on the managed system in a full system partition by selecting the managed system in the Contents area.
7. Right-click or select the desired system in the Contents area. Next, on the menu, choose Selected.
8. Select Power On.
9. Select the Power on Diagnostic Stored Boot list option (refer to “Full System Management Tasks” in the IBM Hardware Management Console for pSeries Installation and Operations Guide, for more information on full system partitions).
10. Enter any passwords, if requested.
Note: If you are unable to load the diagnostics to the point when the DIAGNOSTIC OPERATING INSTRUCTIONS display, go to “Running Standalone Diagnostics from a Network Installation Management (NIM) Server” on page 69.
Running the Online Diagnostics in Concurrent Mode
To run online diagnostics in concurrent mode, do the following:
1. Log in to the AIX operating system as root user or use CE Login.
2. Enter the diag command.
3. When the DIAGNOSTIC OPERATING INSTRUCTIONS display, follow the instructions to check out the desired resources.
4. When testing is complete, use the F3 key to return to the DIAGNOSTIC OPERATING INSTRUCTIONS. Press the F3 key again to return to the AIX operating system prompt. Be sure to vary on any resource that you had varied off.
5. Press Ctrl-D to log off from root user or CE Login.
Running the Online Diagnostics in Maintenance Mode
To run the online diagnostics in maintenance mode, do the following:
1. Stop all programs running on the partition except the AIX operating system (if help is needed, see the system operator).
2. Log in to the AIX operating system as root user or use CE Login.
3. Type the shutdown -m command.
4. When a message indicates the system is in maintenance mode, enter the diag command.
Note: It might be necessary to set the TERM type again.
5. When the DIAGNOSTIC OPERATING INSTRUCTIONS screen displays, follow the displayed instructions to check out the desired resources.
6. When testing is complete, use the F3 key to return to DIAGNOSTIC OPERATING INSTRUCTIONS. Press the F3 key again to return to the AIX operating system prompt.
7. Press Ctrl-D to log off from root user or CE Login.
Standalone Diagnostic Operation
Use standalone diagnostics to test the system when the online diagnostics are not installed and as a method of testing the disk drives that cannot be tested by the online diagnostics.
Note: No error log analysis is done by the standalone diagnostics.
The standalone diagnostics:
• Are resident on a Network Installation Management (NIM) server
• Provide a method to test the system when the online diagnostics are not installed or cannot be loaded from the disk drive
• Allow testing of the disk drives and other resources that cannot be tested by the online diagnostics
• Do not have access to the AIX configuration data
• Do not have access to the AIX error log
• Do not allow for running of error log analysis
Partitioned System Considerations for Standalone Diagnostics
To run standalone diagnostics on a full system partition, you must reboot the entire system. However, for a partition in a partitioned system, you can boot standalone diagnostics either in a given partition or on the entire system (which is the same procedure as a full system partition). For a partitioned system, before running standalone diagnostics on a given partition, you must move the device from which standalone diagnostics is booted (the network adapter connected to the NIM server, in the case of NIM boot of standalone diagnostics) from its existing location to the partition that will run standalone diagnostics. Devices in a partitioned system are moved on an I/O-slot basis.
Running Standalone Diagnostics from a Network Installation Management (NIM) Server
A client system connected to a network with a NIM server can boot standalone diagnostics from the NIM server if the client-specific settings on both the NIM server and client are correct.
Notes:
1. All operations to configure the NIM server require root user authority.
2. If you replace the network adapter in the client, the network adapter hardware address settings for the client must be updated on the NIM server.
3. The Cstate for each standalone diagnostics client on the NIM server should be kept in the diagnostic boot has been enabled state.
4. On the client partition, the NIM server network adapter can be put in the bootlist after the boot disk drive. This allows the system to boot in standalone diagnostics from the NIM server if there is a problem booting from the disk drive. Another option is to use the Select Boot Options function in the SMS menu to set up the network adapter that is connected to the NIM server for a one-time boot of standalone diagnostics.
NIM Server Configuration
Refer to the “Advanced NIM Configuration Tasks” chapter of the AIX Installation Guide and Reference, for information on doing the following:
• Registering a client on the NIM server
• Enabling a client to run diagnostics from the NIM server
To verify that the client system is registered on the NIM server and the diagnostic boot is enabled, run the command lsnim -a Cstate -Z ClientName from the command line on the NIM server. Refer to the following table for system responses.
Note: The ClientName is the name of the system on which you want to run standalone diagnostics.

System Response:
#name:Cstate:
ClientName:diagnostic boot has been enabled:
Client Status:
The client system is registered on the NIM server and enabled to run diagnostics from the NIM server.

System Response:
#name:Cstate:
ClientName:ready for a NIM operation:
or
#name:Cstate:
ClientName:BOS installation has been enabled:
Client Status:
The client is registered on the NIM server but not enabled to run diagnostics from the NIM server.
Note: If the client system is registered on the NIM server but Cstate has not been set, no data will be returned.

System Response:
0042-053 lsnim: there is no NIM object named "ClientName"
Client Status:
The client is not registered on the NIM server.
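The response table above can be read mechanically. The following Python sketch is illustrative only and is not part of the AIX or NIM tooling: the function name classify_cstate and the status strings it returns are hypothetical, and it assumes the lsnim output has exactly the layout shown in the table (a header line beginning with #, followed by a colon-separated line for the client).

```python
def classify_cstate(lsnim_output: str, client_name: str) -> str:
    """Map the text returned by `lsnim -a Cstate -Z ClientName` to a client status.

    Assumes the colon-separated layout shown in the response table.
    """
    text = lsnim_output.strip()
    if not text:
        # Per the note in the table: registered, but Cstate has not been set.
        return "registered, Cstate not set"
    if "0042-053" in text:
        # Error message: there is no NIM object with this name.
        return "not registered"
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # header line, for example "#name:Cstate:"
        fields = line.split(":")
        if len(fields) >= 2 and fields[0] == client_name:
            if fields[1] == "diagnostic boot has been enabled":
                return "registered, diagnostics enabled"
            # "ready for a NIM operation", "BOS installation has been enabled", ...
            return "registered, diagnostics not enabled"
    return "unknown"
```

A caller would capture the command output on the NIM server and pass it in together with the client name; any response that does not match the table falls through to "unknown" rather than being guessed at.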
Client Configuration and Booting Standalone Diagnostics from the NIM Server
To run standalone diagnostics on a client from the NIM server, do the following:
1. Stop all programs including the AIX operating system (get help if needed).
2. If you are running standalone diagnostics in a full system partition, verify with the system administrator and system users that the system unit can be shut down. Stop all programs, including the AIX operating system. Refer to the AIX operating system documentation for shutdown command information.
In a partitioned system, verify with the system administrator and the system users of that partition that all applications on that partition can be stopped and that the partition can be rebooted. Stop all programs on that partition, including the operating system.
3. If you are in a full system partition, power on the system unit to run standalone diagnostics. In a partitioned system, reboot the partition to run standalone diagnostics.
4. When the keyboard indicator is displayed (the word keyboard on an HMC virtual terminal window), press the number 1 key on the keyboard to display the SMS menu.
5. Enter any requested passwords.
6. Select Setup Remote IPL (Initial Program Load).
7. Enter the client address, server address, gateway address (if applicable), and subnet mask.
8. If the NIM server is set up to allow pinging from the client system, use the ping utility in the RIPL utility to verify that the client system can ping the NIM server. Under the ping utility, choose the network adapter that provides the attachment to the NIM server to do the ping operation. If the ping returns with an OK prompt, the client is prepared to boot from the NIM server. If ping returns with a FAILED prompt, the client cannot proceed with the NIM boot.
Note: If the ping fails, refer to the Boot Problems section of the Eserver pSeries 655 Service Guide and follow the steps for network boot problems.
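Before entering the Remote IPL values, it can help to check them offline. The sketch below is an assumption-laden illustration, not an SMS feature: it interprets "gateway address (if applicable)" in the usual IP sense, namely that a gateway is needed only when the NIM server is outside the client's subnet. The function name gateway_needed and the sample addresses are hypothetical.

```python
import ipaddress


def gateway_needed(client: str, server: str, netmask: str) -> bool:
    """Return True if the server lies outside the client's subnet.

    Assumption: a gateway address applies only when the client and the
    NIM server are on different IP subnets.
    """
    # strict=False lets us pass a host address together with its mask.
    network = ipaddress.ip_network(f"{client}/{netmask}", strict=False)
    return ipaddress.ip_address(server) not in network
```

For example, gateway_needed("10.1.1.5", "10.1.2.1", "255.255.255.0") is true because the two addresses fall in different /24 networks, so a gateway address would have to be supplied.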
To do a one-time boot of the network adapter attached to the NIM server network, do the following:
1. Exit to the SMS Main screen.
2. Select Select Boot Options.
3. Select Install or Boot a Device.
4. On the Select Device Type screen, select Network.
5. Set the network parameters for the adapter from which you want to boot.
6. Exit completely from SMS.
The system starts loading packets while doing a bootp from the network.
Follow the instructions on the screen to select the system console.
• If Diagnostics Operating Instructions Version x.x.x displays, standalone diagnostics have loaded successfully.
• If the AIX login prompt displays, standalone diagnostics did not load. Check the following items:
- The network parameters on the client may be incorrect.
- Cstate on the NIM server may be incorrect.
- Network problems might be preventing you from connecting to the NIM server.
Verify the settings and the status of the network. If you continue to have problems, refer to the Boot Problems section of the Eserver pSeries 655 Service Guide and follow the steps for network boot problems.
Chapter 7. Introducing Tasks and Service Aids
The AIX Diagnostic Package contains programs that are called Tasks. Tasks can be thought of as performing a specific function on a resource; for example, running diagnostics or performing a service aid on a resource.
Notes:
1. Many of these programs work on all system model architectures. Some programs are only accessible from Online Diagnostics in Service or Concurrent mode, while others might be accessible only from Standalone Diagnostics.
2. If the system is running on a logically partitioned system, the following tasks can be executed only in a partition with service authority:
• Configure Reboot Policy
• Configure Remote Maintenance Policy
• Configure Ring Indicate Power On
• Configure Ring Indicate Power-On Policy
• Update System or Service Processor Flash
• Save or Restore Hardware Management Policies
• Configure Scan Dump Policy
To perform one of these tasks, use the Task Selection option from the FUNCTION SELECTION menu.
After a task is selected, a resource menu may be presented showing all resources supported by the task.
A fast-path method is also available to perform a task by using the diag command and the -T flag. By using the fast path, the user can bypass most of the introductory menus to access a particular task. The user is presented with a list of resources available to support the specified task. The fast-path tasks are as follows:
• Certify - Certifies media
• Chkspares - Checks for the availability of spare sectors
• Download - Downloads microcode to an adapter or device
• Disp_mcode - Displays current level of microcode
• Format - Formats media
• Identify - Identifies the PCI RAID physical disks
• IdentifyRemove - Identifies and removes devices (Hot-Plug)
To run these tasks directly from the command line, specify the resource and other task-unique flags. Use the descriptions in this chapter to understand which flags are needed for a given task.
Tasks
The following tasks are described in this chapter:
• Add Resource to Resource List
• AIX Shell Prompt
• Analyze Adapter Internal Log
• Backup and Restore Media
• Certify Media
• Change Hardware Vital Product Data
• Configure Dials and LPF Keys
• Configure ISA Adapters
• Configure Reboot Policy
• Configure Remote Maintenance Policy
• Configure Ring Indicate Power-On Policy
• Configure Scan Dump Policy
• Configure Surveillance Policy
• Create Customized Configuration Diskette
• Delete Resource from Resource List
• Disk Maintenance
• Display Configuration and Resource List
• Display Firmware Device Node Information
• Display Hardware Error Report
• Display Hardware Vital Product Data
• Display Machine Check Error Log
• Display Microcode Level
• Display Multipath I/O (MPIO) Device Configuration
• Display or Change Bootlist
• Display or Change Diagnostic Run Time Options
• Display Previous Diagnostic Results
• Display Resource Attributes
• Display Service Hints
• Display Software Product Data
• Display System Environmental Sensors
• Display Test Patterns
• Display USB Devices
• Download Microcode
• Fibre Channel RAID Service Aids
• Flash SK-NET FDDI Firmware
• Format Media
• Gather System Information
• Generic Microcode Download
• Hot-Plug Task
• Identify Indicators
• Identify and Remove Resource Task (see “Hot-Plug Task” for AIX 4.3.3.10 or higher)
• Identify and System Attention Indicators
• Local Area Network Analyzer
• Log Repair Action
• Periodic Diagnostics
• PCI SCSI Disk Array Manager
• PCI RAID Physical Disk Identify
• Process Supplemental Media
• RAID Array Manager
• Run Diagnostics
• Run Error Log Analysis
• Run Exercisers
• Save or Restore Hardware Management Policies
• SCSI Bus Analyzer
• SCSI and SCSI RAID hot-plug Manager
• SCSI RAID Physical Disk Status and Vital Product Data
• SCSD Tape Drive Service Aid
• Spare Sector Availability
• SSA Service Aid
• System Fault Indicator
• System Identify Indicator
• Update Disk-Based Diagnostics
• Update System or Service Processor Flash
• 7135 RAIDiant Array Service Aids
• 7318 Serial Communication Network Server
Add Resource to Resource List
Use this task to add resources back to the resource list.
Note: Only resources that were previously detected by the diagnostics and deleted from the Diagnostic
Test List are listed. If no resources are available to be added, then none are listed.
AIX Shell Prompt
Note: Use this service aid in Online Service Mode only.
This service aid allows access to the AIX command line. To use this service aid, the user must know the root password (when a root password has been established).
Note: Do not use this task to install code or to change the configuration of the system. This task is
intended to view files, configuration records, and data. Using this service aid to change the system configuration or install code can produce unexplained system problems after exiting the diagnostics.
Analyze Adapter Internal Log
The PCI RAID adapter has an internal log that logs information about the adapter and the disk drives attached to the adapter. Whenever data is logged in the internal log, the device driver copies the entries to the AIX system error log and clears the internal log.
The Analyze Adapter Internal Log service aid analyzes these entries in the AIX system error log. The service aid displays the errors and the associated service actions. Entries that do not require any service actions are ignored.
When running this service aid, a menu is presented to enter the start time, the end time, and the file name. The start time and end time have the following format: [mmddHHMMyy], where mm is the month (1-12), dd is the date (1-31), HH is the hour (00-23), MM is the minute (00-59), and yy is the last two digits of the year (00-99). The file name is the location where the user wants to store the output data.
To invoke the service aid task from the command line, type:
diag -c -d devicename -T "adapela [-s start date -e end date]"
Flag            Description
-c              Specifies no console mode.
-d devicename   Specifies the device whose internal log you want to analyze (for example, SCRAID0).
-s start date   Specifies that all errors after this date are analyzed.
-e end date     Specifies that all errors before this date are analyzed.
-T              Specifies the Analyze Adapter Internal Log task.
Note: To specify a file name from the command line, use the redirection operator at the end of the command to specify where the output of the command is to be sent, for example > filename, where filename is the name and location where the user wants to store the output data (for example, /tmp/adaptlog).
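The mmddHHMMyy format is easy to get wrong by hand. As an illustration only, the following Python sketch formats two timestamps and assembles the command line shown above. The helper name adapela_command is hypothetical, and the sketch only builds the command string; it does not run diag.

```python
from datetime import datetime


def adapela_command(device: str, start: datetime, end: datetime) -> str:
    """Build the diag command line for the Analyze Adapter Internal Log task."""
    fmt = "%m%d%H%M%y"  # mmddHHMMyy: month, date, hour, minute, two-digit year
    task = f"adapela -s {start.strftime(fmt)} -e {end.strftime(fmt)}"
    return f'diag -c -d {device} -T "{task}"'
```

For a window from 08:30 on 1 February 2004 to 17:00 on 15 February 2004 on device SCRAID0, the function returns diag -c -d SCRAID0 -T "adapela -s 0201083004 -e 0215170004". Output redirection (for example, > /tmp/adaptlog) would be appended by the caller.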
Backup and Restore Media
This service aid allows verification of backup media and devices. It presents a menu of tape and diskette devices available for testing and prompts for selecting the desired device. It then presents a menu of available backup formats and prompts for selecting the desired format. The supported formats are tar, backup, and cpio. After the device and format are selected, the service aid backs up a known file to the
selected device, restores that file to /tmp, and compares the original file to the restored file. The restored file remains in /tmp to allow for visual comparison. All errors are reported.
Certify Media
This task allows the selection of diskette, DVD-RAM media, or hard files to be certified. Normally, this is done under the following conditions:
• To determine the condition of the drive and media
• To verify that the media is error-free after a Format Service Aid has been run on the media
Normally, run Certify if, after running diagnostics on a drive and its media, no problem is found but you suspect that a problem still exists.
Hard files can be connected either to a SCSI adapter (non-RAID) or a PCI SCSI RAID adapter. The usage and criteria for a hard file connected to a non-RAID SCSI adapter are different from those for a hard file connected to a PCI SCSI RAID adapter.
Certify Media can be used in the following ways:
• Certify Diskette
This selection enables you to verify the data written on a diskette. When you select this service aid, a menu asks you to select the type of diskette being verified. The program then reads all of the ID and data fields on the diskette one time and displays the total number of bad sectors found.
• Certify DVD-RAM media
This selection reads all of the ID and data fields. It checks for bad data and counts all errors encountered. If an unrecovered error occurs, or recovered errors exceed the threshold value, the data on the media should be transferred to other media and the original media should be discarded.
The Certify service aid displays the following information:
- Capacity in bytes
- Number of Data Errors Not Recovered
- Number of Equipment Check Errors
- Number of Recovered Errors
If the drive is reset during a certify operation, the operation is restarted.
If the drive is reset again, the certify operation is terminated, and the user is asked to run diagnostics on the drive.
This task can be run directly from the AIX command line. See the following command syntax: diag -c -d -T certify
Flag   Description
-c     No console mode
-d     Specifies a device
-T     Specifies the certify task
• Certify Hard File Attached to a Non-RAID and PCI-X RAID SCSI Adapter
For pdisks and hdisks, this selection reads all of the ID and data fields on the hard file. If bad-data errors are encountered, the certify operation counts the errors.
If there are non-recovered data errors that do not exceed the threshold value, do one of the following:
- For hdisk hard files, the hard file must be formatted and then certified again.
- For pdisk hard files, diagnostics should be run on the parent adapter.
If the non-recovered data errors, recovered data errors, recovered and non-recovered equipment errors exceed the threshold values, the hard file must be replaced.
After the read certify of the disk surface completes for hdisk hard files, the certify operation performs 2000 random-seek operations. Errors are also counted during the random-seek operations. If a disk timeout occurs before the random seeks are finished, the disk needs to be replaced.
The Certify service aid displays the following information:
For hdisks:
- Drive capacity in megabytes.
- Number of data errors recovered.
- Number of data errors not recovered.
- Number of equipment checks recovered.
- Number of equipment checks not recovered.
For pdisks:
- Drive capacity in megabytes.
- Number of data errors not recovered.
- Number of LBA reassignments.
- Number of equipment checks not recovered.
This task can be run directly from the AIX command line. See the following command syntax: diag -c -d deviceName -T "certify"
Flag   Description
-c     No console mode
-d     Specifies a device
-T     Specifies the certify task
• Certify Hard File Attached to a PCI SCSI RAID Adapter
This selection is used to certify physical disks attached to a PCI SCSI RAID adapter. Certify reads the entire disk and checks for recovered errors, unrecovered errors, and reassigned errors. If these errors exceed the threshold values, the user is prompted to replace the physical disk.
This task can be run directly from the AIX command line. See the following command syntax: diag -c -d RAIDadapterName -T "certify {-l chID | -A}"
Flag   Description
-c     No console mode
-d     Specifies the RAID adapter to which the disk is attached
-T     Specifies the certify task and its parameters
-l     Specifies physical Disk channel/ID (for example: -l 27)
-A     All disks
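The braces in the syntax above indicate a choice: either one disk (-l chID) or all disks (-A), not both. A small Python sketch can make that choice explicit. It is a hedged illustration; the function name raid_certify_command and the adapter name used in the example are hypothetical, and the code only assembles the command string.

```python
from typing import Optional


def raid_certify_command(adapter: str, channel_id: Optional[str] = None) -> str:
    """Build the certify command for disks on a PCI SCSI RAID adapter.

    With a channel/ID the command targets one physical disk (-l chID);
    without one it targets all disks on the adapter (-A).
    """
    selector = "-A" if channel_id is None else f"-l {channel_id}"
    return f'diag -c -d {adapter} -T "certify {selector}"'
```

Calling raid_certify_command("scraid0", "27") yields diag -c -d scraid0 -T "certify -l 27", and calling it with the adapter name alone yields the -A form.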
Change Hardware Vital Product Data
Use this service aid to display the Display/Alter VPD Selection Menu. The menu lists all resources installed on the system. When a resource is selected, a menu displays that lists all the VPD for that resource.
Note: The user cannot alter the VPD for a specific resource unless the VPD is not machine-readable.
Configure Dials and LPF Keys
Note: The Dials and LPF Keys service aid is not supported in standalone mode (CD-ROM and NIM) on
systems with 32 MB or less memory. If you have problems in standalone mode, use the hard file-based diagnostics.
This service aid provides a tool for configuring and removing dials and LPF keys to the asynchronous serial ports.
This selection invokes the SMIT utility to allow Dials and LPF keys configuration. A TTY must be in the available state on the async port before the Dials and LPF keys can be configured on the port. The task allows an async adapter to be configured, then a TTY port defined on the adapter. Dials and LPF keys can then be defined on the port.
Before configuring Dials or LPF keys on a serial port, you must remove all defined TTYs. To determine if there are any defined TTYs, select List All Defined TTYs. Once all defined TTYs have been removed, then add a new TTY and configure the Dials or LPF keys.
Configure ISA Adapter
This task uses SMIT to identify and configure ISA adapters on systems that have an ISA bus and adapters.
Diagnostic support for ISA adapters not shown in the list might be available on a supplemental diskette. You can use the Process Supplemental Media task to add ISA adapter support from a supplemental diskette.
Whenever an ISA adapter is installed, this service aid must be run and the adapter configured before the adapter can be tested. You must also run this service aid to remove an ISA adapter from the system whenever an ISA adapter is physically removed from the system.
If diagnostics are run on an ISA adapter that has been removed from the system, the diagnostics fail because the system cannot detect the ISA adapter.
Configure Reboot Policy
This service aid controls how the system tries to recover from a system crash.
Use this service aid to display and change the following settings for the Reboot Policy.
Note: Because of system capability, some of the following settings might not be displayed by this service
aid.
• Maximum Number of Reboot Attempts
Enter a number that is 0 or greater.
Note: A value of 0 indicates ’do not attempt to reboot’ a crashed system.
This number is the maximum number of consecutive attempts to reboot the system. The term reboot, in the context of this service aid, describes bringing system hardware back up from scratch; for example, from a system reset or power-on.
When the reboot process completes successfully, the reboot-attempts count is reset to 0, and a restart begins. The term restart, in the context of this service aid, is used to describe the operating system activation process. Restart always follows a successful reboot.
When a restart fails, and a restart policy is enabled, the system attempts to reboot, up to the maximum number of attempts.
• Use the O/S Defined Restart Policy (1=Yes, 0=No)
When ’Use the O/S Defined Restart Policy’ is set to Yes, the system attempts to reboot from a crash if the operating system has an enabled Defined Restart or Reboot Policy.
When ’Use the O/S Defined Restart Policy’ is set to No, or the operating system restart policy is undefined, then the restart policy is determined by the ’Supplemental Restart Policy’.
• Enable Supplemental Restart Policy (1=Yes, 0=No)
The ’Supplemental Restart Policy’, if enabled, is used when the O/S Defined Restart Policy is undefined, or is set to False.
When surveillance detects operating system inactivity during restart, an enabled ’Supplemental Restart Policy’ causes a system reset and the reboot process begins.
• Call-Out Before Restart (on/off)
When enabled, Call-Out Before Restart allows the system to call out (on a serial port that is enabled for call-out) when an operating system restart is initiated. Such calls can be valuable if the number of these events becomes excessive, thus signalling bigger problems.
• Enable Unattended Start Mode (1=Yes, 0=No)
When enabled, ’Unattended Start Mode’ allows the system to recover from the loss of ac power.
If the system was powered-on when the ac loss occurred, the system reboots when power is restored. If the system was powered-off when the ac loss occurred, the system remains off when power is restored.
You can access this service aid directly from the AIX command line, by typing:
/usr/lpp/diagnostics/bin/uspchrp -b
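The interaction between the restart-policy settings described in this section can be summarized as a small decision function. The Python sketch below is one reading of the descriptions, not the service processor's actual logic. The function and parameter names are hypothetical, and the operating system policy is modeled as True (enabled), False, or None (undefined).

```python
from typing import Optional


def reboot_after_crash(max_attempts: int,
                       use_os_policy: bool,
                       os_policy: Optional[bool],
                       supplemental_enabled: bool) -> bool:
    """Decide whether a reboot is attempted after a crash (illustrative reading)."""
    if max_attempts == 0:
        # A maximum of 0 means 'do not attempt to reboot'.
        return False
    if use_os_policy and os_policy is True:
        # The O/S defined restart policy is in use and enabled.
        return True
    # Setting is No, or the O/S policy is undefined or False:
    # the Supplemental Restart Policy decides.
    return bool(supplemental_enabled)
```

Under this reading, a system with the O/S policy undefined still reboots when the supplemental policy is enabled, and no combination of policies overrides a maximum attempt count of 0.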
Configure Remote Maintenance Policy
The Remote Maintenance Policy includes modem configurations and phone numbers to use for remote maintenance support.
Use this service aid to display and change the following settings for the Remote Maintenance Policy.
Note: Because of system capability, some of the following settings might not be displayed by this service aid.
Configuration File for Modem on serial port 1 (S1)
Configuration File for Modem on serial port 2 (S2)
Enter the name of a modem configuration file to load on either S1 or S2. The modem configuration files are located in the directory /usr/share/modems. If a modem file is already loaded, it is indicated by Modem file currently loaded.
Modem file currently loaded on S1
Modem file currently loaded on S2
This is the name of the file that is currently loaded on serial port 1 or serial port 2.
Note: These settings are only shown when a modem file is loaded for a serial port.
Call In Authorized on S1 (on/off)
Call In Authorized on S2 (on/off)
Call In allows the Service Processor to receive a call from a remote terminal.
Call Out Authorized on S1 (on/off)
Call Out Authorized on S2 (on/off)
Call Out allows the Service Processor to place calls for maintenance.
S1 Line Speed
S2 Line Speed
A list of line speeds is available by using List on the screen.
Service Center Phone Number
This is the number of the service center computer. The service center usually includes a computer that takes calls from systems with call-out capability. This computer is referred to as "the catcher." The catcher expects messages in a specific format to which the Service Processor conforms. For more information about the format and catcher computers, refer to the README file in the AIX /usr/samples/syscatch directory. Contact the service provider for the correct telephone number to enter here.
Customer Administration Center Phone Number
This is the number of the System Administration Center computer (catcher) that receives problem calls from systems. Contact the system administrator for the correct telephone number to enter here.
Digital Pager Phone Number In Event of Emergency
This is the number for a pager carried by someone who responds to problem calls from your system.
Customer Voice Phone Number
This is the number for a telephone near the system, or answered by someone responsible for the system. This is the telephone number left on the pager for callback.
Customer System Phone Number
This is the number to which your system’s modem is connected. The service or administration center representatives need this number to make direct contact with your system for problem investigation. This is also referred to as the Call In phone number.
Customer Account Number
This number is available for service providers to use for record-keeping and billing.
Call Out Policy Numbers to call if failure
This is set to either First or All. If the call-out policy is set to First, call out stops at the first successful call to one of the following numbers in the order listed:
1. Service Center
2. Customer Administration Center
3. Pager
If Call Out Policy is set to All, call-out attempts to call all of the following numbers in the order listed:
1. Service Center
2. Customer Administration Center
3. Pager
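The difference between the First and All settings can be illustrated with a short sketch. It is a simplified model under stated assumptions, not the Service Processor's dialing code: the function name call_out and the place_call callback (which returns True when a call succeeds) are hypothetical.

```python
from typing import Callable, List

# Order in which numbers are tried, as listed for the Call Out Policy.
CALL_ORDER = ["Service Center", "Customer Administration Center", "Pager"]


def call_out(policy: str, place_call: Callable[[str], bool]) -> List[str]:
    """Return the destinations that were reached successfully.

    place_call is a hypothetical callback that dials one destination
    and reports whether the call succeeded.
    """
    if policy not in ("First", "All"):
        raise ValueError("policy must be 'First' or 'All'")
    reached = []
    for destination in CALL_ORDER:
        if place_call(destination):
            reached.append(destination)
            if policy == "First":
                break  # First: stop at the first successful call
    return reached
```

If, for instance, the Service Center line fails and the other two succeed, First reaches only the Customer Administration Center, while All reaches both the Customer Administration Center and the pager.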
Customer RETAIN Login ID
Customer RETAIN Login Password
These settings apply to the RETAIN service function.
Remote Timeout, in seconds
Remote Latency, in seconds
These settings are functions of the service provider’s catcher computer.
Number of Retries While Busy
This is the number of times the system should retry calls that resulted in busy signals.
System Name (System Administrator Aid)
This is the name given to the system and is used when reporting problem messages.
Note: Knowing the system name aids the support team in quickly identifying the location, configuration, history, and so on of your system.
You can access this service aid directly from the AIX command line, by typing:
/usr/lpp/diagnostics/bin/uspchrp -m
Configure Ring Indicate Power-On Policy
This service aid allows the user to power on a system by telephone from a remote location. If the system is powered off, and Ring Indicate Power On is enabled, the system powers on at a predetermined number of rings. If the system is already on, no action is taken. In either case, the telephone call is not answered, and the caller receives no feedback that the system has powered on.
Use this service aid to display and change the following settings for the Ring Indicate Power-On Policy:
Because of system capability, some of the following settings might not be displayed by this service aid.
Power On Via Ring Indicate (on/off)
Number of Rings Before Power On
You can access this service aid directly from the AIX command line, by typing:
/usr/lpp/diagnostics/bin/uspchrp -r
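The Ring Indicate Power-On behavior described in this section can be summarized in a few lines. This is a minimal sketch with hypothetical names, assuming the ring count is compared against the 'Number of Rings Before Power On' setting. It models only the decision, not the modem or service processor interface.

```python
def ring_triggers_power_on(ring_indicate_enabled: bool,
                           system_is_on: bool,
                           rings_received: int,
                           rings_before_power_on: int) -> bool:
    """Return True if an incoming call causes the system to power on."""
    # If the system is already on, or the policy is off, no action is taken.
    if system_is_on or not ring_indicate_enabled:
        return False
    # The system powers on once the configured number of rings is reached.
    return rings_received >= rings_before_power_on
```

In every case the call itself is left unanswered, so the caller cannot tell from the telephone line whether the function returned True.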
Configure Scan Dump Policy
Configure Scan Dump Policy allows the user to set or view the scan dump policy (scan dump control and size) in NVRAM. Scan Dump data is a set of chip data that the service processor gathers after a system malfunction. It consists of chip scan rings, chip trace arrays, and Scan COM (SCOM) registers. This data is stored in the scan-log partition of the system’s Nonvolatile Random Access Memory (NVRAM).
Use this service aid to display and change the following settings for the Scan Dump Policy at run time:
Scan Dump Control (how often the dump is taken)
Scan Dump Size (size and content of the dump)
The Scan Dump Control (SDC) settings are as follows:
As Needed: This setting allows the platform firmware to determine whether a scan dump is performed. This is the default setting for the dump policy.
Always: This setting overrides the firmware recommendations and always performs a dump after a system failure.
The Scan Dump Size (SDS) settings are as follows:
As Requested - Dump content is determined by the platform firmware.
Minimum - Dump content collected provides the minimum debug information, enabling the platform to reboot as quickly as possible.
Optimum - Dump content collected provides a moderate amount of debug information.
Complete - Dump data provides the most complete error coverage at the expense of reboot speed.
You can access this service aid directly from the AIX command line by typing:
/usr/lpp/diagnostics/bin/uspchrp -d
Configure Surveillance Policy
Note: This service aid is supported only for systems running in full machine partition.
This service aid monitors the system for hang conditions; that is, hardware or software failures that cause operating system inactivity. When surveillance is enabled and detects operating system inactivity, a call is placed to report the failure.
Use this service aid to display and change the following settings for the Surveillance Policy:
Note: Because of system capability, some of the following settings might not be displayed by this service aid:
Surveillance (on/off)
Surveillance Time Interval - This is the maximum time between heartbeats from the operating system.
Surveillance Time Delay - This is the time to delay between when the operating system is in control and when to begin operating system surveillance.
Changes are to Take Effect Immediately - Set this to Yes if the changes made to the settings in this menu are to take place immediately. Otherwise, the changes take effect beginning with the next system boot.
You can access this service aid directly from the AIX command line, by typing:
/usr/lpp/diagnostics/bin/uspchrp -s
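How the Surveillance Time Interval and Surveillance Time Delay settings combine can be sketched as follows. This is a rough model with hypothetical names. It assumes all times are expressed in the same unit, and that silence is measured from the later of the last heartbeat and the moment surveillance begins. Neither detail is specified in this guide.

```python
def hang_detected(surveillance_on: bool,
                  now: float,
                  os_control_time: float,
                  last_heartbeat: float,
                  time_interval: float,
                  time_delay: float) -> bool:
    """Return True if operating system inactivity would be reported."""
    if not surveillance_on:
        return False
    # Surveillance starts only after the delay has elapsed since the
    # operating system took control.
    surveillance_start = os_control_time + time_delay
    if now < surveillance_start:
        return False
    # Assumption: silence is counted from the later of the last
    # heartbeat and the start of surveillance.
    reference = max(last_heartbeat, surveillance_start)
    # A hang is flagged when the maximum time between heartbeats
    # has been exceeded.
    return now - reference > time_interval
```

With a delay of 10 and an interval of 5, a last heartbeat at time 90 would be flagged as a hang at time 100, whereas a heartbeat at time 98 would not.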
Create Customized Configuration Diskette
This selection invokes the Diagnostic Package Utility Service Aid, which allows the user to create a standalone diagnostic package configuration diskette.
The Standalone Diagnostic Package Configuration Diskette allows the following to be changed from the console:
Default refresh rate for a low function terminal (LFT)
The refresh rate used by the standalone diagnostic package is 60 Hz. If the display’s refresh rate is 77 Hz, set the refresh rate to 77.
Different async terminal console
You can create a console configuration file that allows a terminal attached to any RS232 or RS422 adapter to be selected as a console device. The default device is an RS232 TTY device attached to the first standard serial port (S1).
Delete Resource from Resource List
Use this task to delete resources from the resource list.
Note: Only resources that were previously detected by the diagnostics and have not been deleted from the Diagnostic Test List are listed. If no resources are available to be deleted, then none are listed.
Disk Maintenance
This service aid provides the following options for the fixed-disk maintenance:
Disk to Disk Copy
Display/Alter Sector
Disk to Disk Copy
Notes:
1. This service aid cannot be used to update to a different size drive. The service aid only supports copying from a SCSI drive to another SCSI drive of similar size.
2. Use the migratepv command when copying the contents to other disk drive types. This command also works when copying SCSI disk drives or when copying to a different size SCSI disk drive. Refer to System Management Guide: Operating System and Devices for a procedure on migrating the contents of a physical volume.
This publication is located on the AIX V4.3 Documentation CD. The documentation information is made accessible by loading the documentation CD onto the hard disk or by mounting the CD in the CD-ROM drive.
This selection allows you to recover data from an old drive when replacing it with a new drive. The service aid recovers all logical volume manager (LVM) software reassigned blocks. To prevent corrupted data from being copied to the new drive, the service aid stops if an unrecoverable read error is detected. To help prevent possible problems with the new drive, the service aid stops if the number of bad blocks being reassigned reaches a threshold.
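The two stop conditions described above can be illustrated with a simplified sketch. It is not the service aid's implementation: the block statuses ("ok", "reassigned", "unreadable"), the CopyResult type, and the choice to stop before copying the block that reaches the threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class CopyResult:
    completed: bool
    reason: str
    copied: List[bytes] = field(default_factory=list)


def copy_blocks(blocks: Iterable[Tuple[str, bytes]],
                bad_block_threshold: int) -> CopyResult:
    """Copy blocks in order, stopping on either condition from the text."""
    copied: List[bytes] = []
    reassigned = 0
    for index, (status, data) in enumerate(blocks):
        # Stop on an unrecoverable read error so that corrupted data
        # is never written to the new drive.
        if status == "unreadable":
            return CopyResult(
                False, f"unrecoverable read error at block {index}", copied)
        # Count reassigned bad blocks and stop once the threshold is reached.
        if status == "reassigned":
            reassigned += 1
            if reassigned >= bad_block_threshold:
                return CopyResult(False, "bad-block threshold reached", copied)
        copied.append(data)
    return CopyResult(True, "copy complete", copied)
```

A source with one reassigned block and a threshold of 2 copies completely, while the same source with a threshold of 1 stops at the reassigned block.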
To use this service aid, both the old and new disks must be installed in or attached to the system with unique SCSI addresses. This requires that the new disk drive SCSI address be set to an address that is not currently in use and that the drive be installed in an empty location. If there are no empty locations, then one of the other drives must be removed. When the copy is complete, only one drive can remain installed. Either remove the target drive to return to the original configuration, or perform the following procedure to complete the replacement of the old drive with the new drive:
1. Remove both drives.