
SGI® UV 2000 System User Guide
Document Number 007-5832-002
COPYRIGHT
© 2013 SGI. All rights reserved; provided portions may be copyright in third parties, as indicated elsewhere herein. No permission is granted to copy, distribute, or create derivative works from the contents of this electronic documentation in any manner, in whole or in part, without the prior written permission of SGI.
The software described in this document is "commercial computer software" provided with restricted rights (except as to included open/free source) as specified in the FAR 52.227-19 and/or the DFAR 227.7202, or successive sections. Use beyond license provisions is a violation of worldwide intellectual property laws, treaties and conventions. This document is provided with limited rights as defined in 52.227-14.
The electronic (software) version of this document was developed at private expense; if acquired under an agreement with the USA government or any contractor thereto, it is acquired as "commercial computer software" subject to the provisions of its applicable license agreement, as specified in (a) 48 CFR 12.212 of the FAR; or, if acquired for Department of Defense units, (b) 48 CFR 227-7202 of the DoD FAR Supplement; or sections succeeding thereto. Contractor/manufacturer is SGI, 46600 Landing Parkway, Fremont, CA 94538.
TRADEMARKS AND ATTRIBUTIONS
Silicon Graphics, SGI, the SGI logo, NUMAlink and NUMAflex are trademarks or registered trademarks of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries worldwide.
Intel, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company, Ltd. InfiniBand is a trademark of the InfiniBand Trade Association. LSI, MegaRAID, and MegaRAID Storage Manager are trademarks or registered trademarks of LSI Corporation.
Linux is a registered trademark of Linus Torvalds in the U.S. and other countries.
Red Hat and all Red Hat-based trademarks are trademarks or registered trademarks of Red Hat, Inc. in the United States and other countries.
SUSE LINUX is a registered trademark of Novell Inc.
Windows is a registered trademark of Microsoft Corporation in the United States and other countries.
All other trademarks mentioned herein are the property of their respective owners.
Record of Revision
Version    Description
001        June, 2012    First Release
002        May, 2013     MIC/GPU Blade and RAID updates
Contents
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . xi
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . xiii
Audience. . . . . . . . . . . . . . . . . . . . . . . . . . xv
Important Information . . . . . . . . . . . . . . . . . . . . . . xv
Chapter Descriptions . . . . . . . . . . . . . . . . . . . . . . xvi
Related Publications. . . . . . . . . . . . . . . . . . . . . . .xvii
Conventions . . . . . . . . . . . . . . . . . . . . . . . . . xix
Product Support . . . . . . . . . . . . . . . . . . . . . . . . xx
Reader Comments . . . . . . . . . . . . . . . . . . . . . . . xx
1. Operation Procedures . . . . . . . . . . . . . . . . . . . . . . 1
Precautions . . . . . . . . . . . . . . . . . . . . . . . . . 1
ESD Precaution. . . . . . . . . . . . . . . . . . . . . . . 1
Safety Precautions . . . . . . . . . . . . . . . . . . . . . . 2
Power Connections Overview . . . . . . . . . . . . . . . . . . . . 2
System Connections Overview . . . . . . . . . . . . . . . . . . . . 7
Connecting to the UV System Control Network . . . . . . . . . . . . . . 7
System Controller Access . . . . . . . . . . . . . . . . . . . . 8
Serial Console Hardware Requirements . . . . . . . . . . . . . . . . 9
Establishing a Serial Connection to the CMC on SGI UV 2000 . . . . . . . . . . 9
Establishing CMC IP Hardware Connections . . . . . . . . . . . . . . . 9
Using DHCP to Establish an IP Address . . . . . . . . . . . . . . . 10
Using a Static IP Address . . . . . . . . . . . . . . . . . . . 10
System Control Overview . . . . . . . . . . . . . . . . . . . . . 12
Communicating with the System . . . . . . . . . . . . . . . . . . 13
The SGI Management Center Graphical User Interface . . . . . . . . . . . . 13
Powering-On and Off From the SGI Management Center Interface . . . . . . . . . 13
The Command Line Interface . . . . . . . . . . . . . . . . . . . 14
Powering On and Off from the Command Line Interface . . . . . . . . . . . 14
Command Options for Power On . . . . . . . . . . . . . . . . . 15
Power On the System From the SMN Command Line. . . . . . . . . . . . 15
Specific CLI Commands Used With the SMN. . . . . . . . . . . . . . 16
Optional Power On From the CMC Command Line . . . . . . . . . . . . . 17
Booting Directly From a CMC . . . . . . . . . . . . . . . . . . . 17
Power On the System Using the CMC Network . . . . . . . . . . . . . 17
Optional Power On Using the SMN to Connect to the CMC . . . . . . . . . . 18
Monitoring Power On . . . . . . . . . . . . . . . . . . . . . 19
Power off a UV System . . . . . . . . . . . . . . . . . . . . . 20
Additional CLI Power Command Options . . . . . . . . . . . . . . . 20
Using Embedded Support Partner (ESP). . . . . . . . . . . . . . . . . . 22
Optional Components . . . . . . . . . . . . . . . . . . . . . . 23
PCIe Cards . . . . . . . . . . . . . . . . . . . . . . . . 23
PCIe Drive Controllers in BaseIO Blade . . . . . . . . . . . . . . . . 25
RAID PCIe Disk Controller . . . . . . . . . . . . . . . . . . 25
Non-RAID PCIe Disk Controller . . . . . . . . . . . . . . . . . 26
2. System Control . . . . . . . . . . . . . . . . . . . . . . . . 27
Levels of System Control . . . . . . . . . . . . . . . . . . . . . 27
System Management Node (SMN) Overview . . . . . . . . . . . . . . . 28
CMC Overview . . . . . . . . . . . . . . . . . . . . . . . 29
BMC Overview . . . . . . . . . . . . . . . . . . . . . . . 31
System Controller Interaction . . . . . . . . . . . . . . . . . . . . 31
IRU Controllers . . . . . . . . . . . . . . . . . . . . . . . . 32
Chassis Management Controller Functions . . . . . . . . . . . . . . . . 32
1U Console Option . . . . . . . . . . . . . . . . . . . . . . 32
Flat Panel Rackmount Console Option Features . . . . . . . . . . . . . 33
3. System Overview . . . . . . . . . . . . . . . . . . . . . . . 35
System Models . . . . . . . . . . . . . . . . . . . . . . . . 37
System Architecture. . . . . . . . . . . . . . . . . . . . . . . 39
System Features . . . . . . . . . . . . . . . . . . . . . . . . 41
Modularity and Scalability . . . . . . . . . . . . . . . . . . . . 41
Distributed Shared Memory (DSM) . . . . . . . . . . . . . . . . . 41
Distributed Shared I/O . . . . . . . . . . . . . . . . . . . . . 43
Chassis Management Controller (CMC) . . . . . . . . . . . . . . . . 43
ccNUMA Architecture . . . . . . . . . . . . . . . . . . . . . 43
Cache Coherency . . . . . . . . . . . . . . . . . . . . . 43
Non-uniform Memory Access (NUMA) . . . . . . . . . . . . . . . 44
Reliability, Availability, and Serviceability (RAS) . . . . . . . . . . . . . 44
System Components. . . . . . . . . . . . . . . . . . . . . . . 46
Optional BaseIO SSDs . . . . . . . . . . . . . . . . . . . . . 48
MIC/GPU Enabled Compute Blade . . . . . . . . . . . . . . . . . 49
Bay (Unit) Numbering . . . . . . . . . . . . . . . . . . . . . 50
Rack Numbering . . . . . . . . . . . . . . . . . . . . . . 50
Optional System Components . . . . . . . . . . . . . . . . . . . 50
4. Rack Information . . . . . . . . . . . . . . . . . . . . . . . 51
Overview . . . . . . . . . . . . . . . . . . . . . . . . . 51
SGI UV 2000 Series Rack (42U) . . . . . . . . . . . . . . . . . . . 52
SGI UV 2000 System Rack Technical Specifications . . . . . . . . . . . . . . 56
5. Optional Octal Router Chassis Information . . . . . . . . . . . . . . . . 57
Overview . . . . . . . . . . . . . . . . . . . . . . . . . 57
SGI UV 2000 Series NUMAlink Octal Router Chassis. . . . . . . . . . . . . . 58
SGI UV 2000 External NUMAlink System Technical Specifications . . . . . . . . . . 60
6. Add or Replace Procedures . . . . . . . . . . . . . . . . . . . . 61
Maintenance Precautions and Procedures . . . . . . . . . . . . . . . . . 61
Preparing the System for Maintenance or Upgrade . . . . . . . . . . . . . 62
Returning the System to Operation. . . . . . . . . . . . . . . . . . 62
Overview of PCI Express (PCIe) Operation . . . . . . . . . . . . . . . . 63
Adding or Replacing PCIe Cards in the Expansion Enclosure . . . . . . . . . . . . 64
Installing Cards in the 1U PCIe Expansion Chassis. . . . . . . . . . . . . . 65
Removing and Replacing an IRU Enclosure Power Supply . . . . . . . . . . . 68
7. Troubleshooting and Diagnostics . . . . . . . . . . . . . . . . . . . 71
Troubleshooting Chart . . . . . . . . . . . . . . . . . . . . . . 72
LED Status Indicators . . . . . . . . . . . . . . . . . . . . . . 73
IRU Power Supply LEDs . . . . . . . . . . . . . . . . . . . . 73
Compute/Memory Blade LEDs . . . . . . . . . . . . . . . . . . . 74
SGI Electronic Support . . . . . . . . . . . . . . . . . . . . . . 75
A. Technical Specifications and Pinouts . . . . . . . . . . . . . . . . . . 79
System-level Specifications . . . . . . . . . . . . . . . . . . . . . 79
Physical Specifications . . . . . . . . . . . . . . . . . . . . . . 80
Environmental Specifications . . . . . . . . . . . . . . . . . . . . 81
Power Specifications . . . . . . . . . . . . . . . . . . . . . . . 82
I/O Port Specifications . . . . . . . . . . . . . . . . . . . . . . 83
BaseIO VGA Port Information . . . . . . . . . . . . . . . . . . . 83
Ethernet Port. . . . . . . . . . . . . . . . . . . . . . . . 85
Serial Ports . . . . . . . . . . . . . . . . . . . . . . . . 86
USB Type A Connector . . . . . . . . . . . . . . . . . . . . . 88
B. Safety Information and Regulatory Specifications . . . . . . . . . . . . . . 89
Safety Information . . . . . . . . . . . . . . . . . . . . . . . 89
Regulatory Specifications . . . . . . . . . . . . . . . . . . . . . 91
CMN Number . . . . . . . . . . . . . . . . . . . . . . . 91
CE Notice and Manufacturer’s Declaration of Conformity . . . . . . . . . . . 91
Electromagnetic Emissions. . . . . . . . . . . . . . . . . . . . 92
FCC Notice (USA Only) . . . . . . . . . . . . . . . . . . . 92
Industry Canada Notice (Canada Only) . . . . . . . . . . . . . . . 93
VCCI Notice (Japan Only). . . . . . . . . . . . . . . . . . . 93
Korean Class A Regulatory Notice . . . . . . . . . . . . . . . . 93
Shielded Cables. . . . . . . . . . . . . . . . . . . . . . . 94
Electrostatic Discharge . . . . . . . . . . . . . . . . . . . . . 94
Laser Compliance Statements . . . . . . . . . . . . . . . . . . . 95
Lithium Battery Statement . . . . . . . . . . . . . . . . . . . . 96
List of Figures
Figure 1-1 IRU Power Supply Cable Location Example . . . . . . . . 3
Figure 1-2 Single-Phase 2-Outlet PDU Example . . . . . . . . . . 4
Figure 1-3 Single-Phase 8-Outlet PDU . . . . . . . . . . . . . 5
Figure 1-4 Three-Phase PDU Examples . . . . . . . . . . . . . 6
Figure 1-5 System Management Node Rear Video Connections . . . . . . 8
Figure 1-6 UV CMC Connection Faceplate Example . . . . . . . . . 8
Figure 1-7 PCIe Option Blade Example with Full-Height and Low-Profile Slots . . 24
Figure 1-8 PCIe Option Blade Example with Two Low-Profile Slots . . . . . 24
Figure 1-9 BaseIO Blade and PCIe Disk Controller Example . . . . . . . 25
Figure 2-1 System Management Node Front and Rear Panels . . . . . . . 29
Figure 2-2 SGI UV 2000 LAN-attached System Control Network Example . . . 30
Figure 2-3 Optional 1U Rackmount Console . . . . . . . . . . . 33
Figure 2-4 System Management Node (SMN) Direct Video Connection Ports. . . 34
Figure 3-1 SGI UV 2000 Single-Rack System Example . . . . . . . . 36
Figure 3-2 SGI UV 2000 IRU and Rack. . . . . . . . . . . . . 38
Figure 3-3 Functional Block Diagram of the Individual Rack Unit (IRU) . . . . 40
Figure 3-4 Blade Node Block Diagram Example . . . . . . . . . . 42
Figure 3-5 SGI UV 2000 IRU System Components Example . . . . . . . 47
Figure 3-6 BaseIO Riser Enabled Blade Front Panel Example . . . . . . . 48
Figure 3-7 MIC/GPU Enabled Compute Blade Example Front View . . . . . 49
Figure 4-1 SGI UV 2000 Series Rack Example . . . . . . . . . . . 53
Figure 4-2 Front Lock on Tall (42U) Rack . . . . . . . . . . . . 54
Figure 4-3 Optional Water-Chilled Cooling Units on Rear of SGI 42U Rack . . . 55
Figure 5-1 SGI UV 2000 Optional NUMAlink ORC (Rear View) . . . . . . 58
Figure 5-2 SGI UV 2000 Optional ORC Chassis Example (Front View) . . . . 59
Figure 6-1 Comparison of PCI/PCI-X Connector with PCI Express Connectors . . 63
Figure 6-2 The PCIe Expansion Enclosure . . . . . . . . . . . . 66
Figure 6-3 Card Slot Locations . . . . . . . . . . . . . . . 67
Figure 6-4 Removing an Enclosure Power Supply . . . . . . . . . . 68
Figure 7-1 UV Compute Blade Status LED Locations Example. . . . . . . 74
Figure 7-2 Full Support Sequence Example . . . . . . . . . . . . 75
Figure A-1 VGA Port Pinouts. . . . . . . . . . . . . . . . 83
Figure A-2 Ethernet Port . . . . . . . . . . . . . . . . . 85
Figure A-3 Serial Port Connector . . . . . . . . . . . . . . . 86
Figure A-4 Pin Number Locations for USB Type A Connector . . . . . . . 88
Figure B-1 VCCI Notice (Japan Only) . . . . . . . . . . . . . 93
Figure B-2 Korean Class A Regulatory Notice . . . . . . . . . . . 93
List of Tables
Table 4-1 Tall Rack Technical Specifications . . . . . . . . . . . 56
Table 5-1 External NUMAlink Technical Specifications . . . . . . . . 60
Table 6-1 SGI UV 2000 PCIe Support Levels . . . . . . . . . . . 64
Table 6-2 PCIe Expansion Slot Bandwidth Support Levels . . . . . . . 65
Table 7-1 Troubleshooting Chart . . . . . . . . . . . . . . 72
Table 7-2 Power Supply LED States . . . . . . . . . . . . . 73
Table A-1 SGI UV 2000 System Configuration Ranges . . . . . . . . 79
Table A-2 SGI UV 2000 Physical Specifications . . . . . . . . . . 80
Table A-3 Environmental Specifications . . . . . . . . . . . . 81
Table A-4 Power Specifications . . . . . . . . . . . . . . . 82
Table A-5 VGA Pin Functions . . . . . . . . . . . . . . . 84
Table A-6 Ethernet Pinouts . . . . . . . . . . . . . . . . 85
Table A-7 Serial Port Pinout. . . . . . . . . . . . . . . . 87
Table A-8 Pin Assignments for USB Type A Connector . . . . . . . . 88
About This Guide
This guide provides an overview of the architecture, general operation and descriptions of the major components that compose the SGI UV 2000 family of servers. It also provides the standard procedures for powering on and powering off the system, basic troubleshooting and maintenance information, and important safety and regulatory specifications.
Audience
This guide is written for owners, system administrators, and users of SGI UV 2000 computer systems. It is written with the assumption that the reader has a good working knowledge of computers and computer systems.
Important Information
Warning: To avoid problems that could void your warranty, your SGI or other approved
system support engineer (SSE) should perform all the set up, addition, or replacement of parts, cabling, and service of your SGI UV 2000 system, with the exception of the following items that you can perform yourself:
Using your system console controller to enter commands and perform system functions such as powering on and powering off, as described in this guide.
Adding and replacing PCIe cards, as described in this guide.
Adding and replacing disk drives in dual-disk enabled riser blades.
Removing and replacing the IRU power supplies.
Using the On/Off switch and other switches on the rack PDUs.
Using the ESI/ops panel (operating panel) on optional mass storage bricks.
Chapter Descriptions
The following topics are covered in this guide:
Chapter 1, “Operation Procedures,” provides instructions for powering on and powering off your system.
Chapter 2, “System Control,” describes the function of the overall system control network interface and provides basic instructions for operating the controllers.
Chapter 3, “System Overview,” provides technical overview information needed to understand the basic functional architecture of the SGI UV 2000 systems.
Chapter 4, “Rack Information,” describes the rack sizes and general features.
Chapter 5, “Optional Octal Router Chassis Information,” describes the optional NUMAlink router technology available in SGI UV 2000 systems consisting of two or more racks. This router technology is available in an enclosure “package” known as the Octal Router Chassis.
Chapter 6, “Add or Replace Procedures,” provides instructions for installing or removing the customer-replaceable components of your system.
Chapter 7, “Troubleshooting and Diagnostics,” provides recommended actions if problems occur on your system.
Appendix A, “Technical Specifications and Pinouts,” provides physical, environmental, and power specifications for your system. Also included are the pinouts for the non-proprietary connectors.
Appendix B, “Safety Information and Regulatory Specifications,” lists regulatory information related to use of the UV 2000 system in the United States and other countries. It also provides a list of safety instructions to follow when installing, operating, or servicing the product.
Related Publications
The following SGI documents are relevant to the UV 2000 series system:
SGI UV CMC Software User Guide (P/N 007-5636-00x)
This guide describes how to use the system console controller commands to monitor and manage your SGI UV 2000 system via line commands. Coverage of control includes descriptions of the interface and usage of the commands. These commands are primarily used when a system management node is not present in the system. Note that it does not cover controller command information for the SGI UV 10 or UV 20.
SGI UV System Management Node Administrator's Guide (P/N 007-5694-00x)
This guide covers the system management node (SMN) for SGI UV 2000 series systems. It describes the software and hardware components used with the SMN as well as providing an overview of the UV system control network. System network addressing is covered and a chapter on how to use KVM to enable remote console access from the system management node is included.
SGI Management Center Quick Start Guide (P/N 007-5672-00x)
This document may be helpful to users or administrators of SGI UV systems using the SGI Management Center interface. The guide provides introductory information on configuration, operation and monitoring of your UV system using the management center software.
SGI Management Center System Administrator’s Guide (P/N 007-5642-00x)
This guide is intended for system administrators who work with the SGI Management Center software GUI to manage and control SGI UV 2000 systems. Depending on your system configuration and implementation, this guide may be optional. The manual is written with the assumption the user has a good working knowledge of Linux.
SGI UV Software Install Guide (P/N 007-5675-00x)
In UV systems that come with pre-installed Linux software operating systems, this document describes how to re-install it when necessary.
SGI UV Systems Linux Configuration and Operations Guide (P/N 007-5629-00x)
This guide is a reference document for people who manage the operation of SGI UV 2000 systems. It explains how to perform general system configuration and operation under Linux for SGI UV. For a list of manuals supporting SGI Linux releases and SGI online resources, see the SGI Performance Suite documentation.
SGI UV Systems Installation Guide (P/N 007-5675-00x)
This guide covers software installation on UV 2000 systems and their SMNs.
Linux Application Tuning Guide for SGI X86-64 Based Systems (P/N 007-5646-00x)
This guide includes a chapter that covers advanced tuning strategies for applications running on SGI UV systems as well as other SGI X86 based systems.
MegaRAID SAS Software User’s Guide, publication number (860-0488-00x) This document describes the LSI Corporation’s MegaRAID Storage Manager software.
LSI Integrated SAS for RAID User’s Guide, publication number (860-0476-00x) This user guide explains how to configure and use the software components of the LSI
Integrated RAID software product used with LSI SAS controllers.
Man pages (online) Man pages locate and print the titled entries from the online reference manuals.
You can obtain SGI documentation, release notes, or man pages in the following ways:
See the SGI Technical Publications Library at http://docs.sgi.com. Various formats are available. This library contains the most recent and most comprehensive
set of online books, release notes, man pages, and other information.
The release notes, which contain the latest information about software and documentation in this release, are in a file named README.SGI in the root directory of the SGI ProPack for Linux Documentation CD.
You can also view man pages by typing man <title> on a command line.
SGI systems shipped with Linux include a set of Linux man pages, formatted in the standard UNIX “man page” style. Important system configuration files and commands are documented on man pages. These are found online on the internal system disk (or DVD) and are displayed using
the man command. References in the documentation to these pages include the name of the command and the section number in which the command is found. For example, to display a man page, type the request on a command line:
man commandx
For additional information about displaying man pages using the man command, see man(1). In addition, the apropos command locates man pages based on keywords. For example, to display a list of man pages that describe disks, type the following on a command line:
apropos disk
For information about setting up and using apropos, see apropos(1).
Conventions
The following conventions are used throughout this document:
Convention      Meaning
Command         This fixed-space font denotes literal items such as commands, files, routines, path names, signals, messages, and programming language structures.
variable        The italic typeface denotes variable entries and words or concepts being defined. Italic typeface is also used for book titles.
user input      This bold fixed-space font denotes literal items that the user enters in interactive sessions. Output is shown in nonbold, fixed-space font.
[ ]             Brackets enclose optional portions of a command or directive line.
...             Ellipses indicate that a preceding element can be repeated.
man page(x)     Man page section identifiers appear in parentheses after man page names.
GUI element     This font denotes the names of graphical user interface (GUI) elements such as windows, screens, dialog boxes, menus, toolbars, icons, buttons, boxes, fields, and lists.
Product Support
SGI provides a comprehensive product support and maintenance program for its products, as follows:
If you are in North America, contact the Technical Assistance Center at +1 800 800 4SGI or contact your authorized service provider.
If you are outside North America, contact the SGI subsidiary or authorized distributor in your country. International customers can visit http://www.sgi.com/support/. Click on the “Support Centers” link under the “Online Support” heading for information on how to contact your nearest SGI customer support center.
Reader Comments
If you have comments about the technical accuracy, content, or organization of this document, contact SGI. Be sure to include the title and document number of the manual with your comments. (Online, the document number is located in the front matter of the manual. In printed manuals, the document number is located at the bottom of each page.)
You can contact SGI in any of the following ways:
Send e-mail to the following address: techpubs@sgi.com
Contact your customer service representative and ask that an incident be filed in the SGI incident tracking system.
Send mail to the following address: Technical Publications
SGI 46600 Landing Parkway Fremont, California 94538
SGI values your comments and will respond to them promptly.
Chapter 1
1. Operation Procedures
This chapter explains the basics of how to operate your new system in the following sections:
“Precautions” on page 1
“Power Connections Overview” on page 2
“System Control Overview” on page 12
“Using Embedded Support Partner (ESP)” on page 22
“Optional Components” on page 23
Precautions
Before operating your system, familiarize yourself with the safety information in the following sections:
“ESD Precaution” on page 1
“Safety Precautions” on page 2
ESD Precaution
Caution: Observe all ESD precautions. Failure to do so can result in damage to the equipment.
Wear a grounding wrist strap when you handle any ESD-sensitive device to eliminate possible ESD damage to equipment. Connect the wrist strap cord directly to earth ground.
Safety Precautions
Warning: Before operating or servicing any part of this product, read the “Safety
Information” on page 89.
Danger: Keep fingers and conductive tools away from high-voltage areas. Failure to
follow these precautions will result in serious injury or death. The high-voltage areas of the system are indicated with high-voltage warning labels.
Caution: Power off the system only after the system software has been shut down in an orderly
manner. If you power off the system before you halt the operating system, data may be corrupted.
Warning: If a lithium battery is installed in your system as a soldered part, only qualified
SGI service personnel should replace this lithium battery. For a battery of another type, replace it only with the same type or an equivalent type recommended by the battery manufacturer, or an explosion could occur. Discard used batteries according to the manufacturer’s instructions.
Power Connections Overview
Prior to operation, your SGI UV 2000 system should be set up and connected by a professional installer. If you are powering on the system for the first time or want to confirm proper power connections, follow these steps:
1. Check to ensure that the power connectors on the cables between the rack’s power distribution units (PDUs) and the wall power-plug receptacles are securely plugged in.
2. For each individual IRU that you want to power on, make sure that the power cables are plugged into all the IRU power supplies correctly, see the example in Figure 1-1 on page 3. Setting the circuit breakers on the PDUs to the “On” position will apply power to the IRUs and will start the CMCs in the IRUs. Note that the CMC in each IRU stays powered on as
long as there is power coming into the unit. Turn off the PDU breaker switch on each of the PDUs that supply voltage to the IRU’s power supplies if you want to remove all power from the unit.
Important: In a system configuration using 2-outlet single-phase PDUs, each power supply in an IRU should be connected to a different PDU within the rack. This will ensure the maximum amperage output of a single PDU is not exceeded if a power supply fails.
Figure 1-1 IRU Power Supply Cable Location Example
3. If you plan to power on a server that includes optional mass storage enclosures, make sure
that the power switch on the rear of each PSU/cooling module (one or two per storage enclosure) is in the
1 (on) position.
4. Make sure that all PDU circuit breaker switches (see the examples in the following three
figures) are turned on to provide power to the server when the system is powered on.
Figure 1-2 shows an example of a single-phase 2-plug PDU that can be used with the SGI UV 2000 system. This PDU can be used to distribute power to the IRUs when the system is configured with single-phase power.
Figure 1-2 Single-Phase 2-Outlet PDU Example
Figure 1-3 on page 5 shows an example of an eight-plug single-phase PDU that can be used in the SGI UV 2000 rack system. This unit is used to support auxiliary equipment in the rack.
Figure 1-3 Single-Phase 8-Outlet PDU
Figure 1-4 shows examples of the three-phase PDUs that can be used in the SGI UV 2000 system. These PDUs are used to distribute power to the IRUs when the system is configured with three-phase power. The enlarged section shows an optional PDU status interface panel.
Figure 1-4 Three-Phase PDU Examples
System Connections Overview
You can monitor and interact with your SGI UV 2000 server from the following sources:
Using the SGI 1U rackmount console option you can connect directly to the system
management node (SMN) for basic monitoring and administration of the system. See “1U Console Option” in Chapter 2 for more information; SLES 11 or later is required.
A PC or workstation on the local area network can connect to the SMN’s external ethernet
port and set up remote console sessions or display GUI objects from the SGI Management Center interface.
A serial console display can be plugged into the CMC at the rear of IRU 001. You can also
monitor IRU information and system operational status from other IRUs that are connected to IRU 001.
These console connections enable you to view the status and error messages generated by the chassis management controllers in your SGI UV 2000 rack. For example, you can monitor error messages that warn of power or temperature values that are out of tolerance. See the section “1U Console Option” in Chapter 2, for additional information.
The following subsections describe the options for establishing and using communication connections to work with your SGI UV 2000 system.
Connecting to the UV System Control Network
The ethernet connection is the preferred method of accessing the system console.
Administrators can use one of the following options for connectivity:
If the SMN is plugged into the customer LAN, connect to the SMN (SSH w/ X11
Forwarding) and start the SGI Management Center remotely.
An in-rack system console can be directly connected to the system management node via
VGA and PS2, see Figure 1-5 on page 8. You can then log into the SMN and perform system administration either through CLI commands or via the SGI Management Center interface.
Note that the CMC is factory set to DHCP mode and thus has no fixed IP address and cannot be accessed until an IP address is established. See the subsection “Using DHCP to Establish an IP Address” on page 10 for more information on this topic.
A micro-USB serial connection can be used to communicate directly with the CMC. This connection is typically used for service purposes or for system controller and system console access in small systems where an in-rack system console is not used or available.
System Controller Access
Access to the SGI UV 2000 system controller network is accomplished by the following connection methods:
A LAN connection to the system management node (running the SGI Management Center software application). This can also be done using an optional VGA-connected console, see Figure 1-5.
A micro-USB serial cable connection to the “Console” port (see Figure 1-6) on the CMC (see note below). See also “Serial Console Hardware Requirements” on page 9.
Note: Each IRU has two chassis management controller (CMC) slots located in the rear of the IRU directly below the cooling fans. Only one CMC is supported in each IRU. The CMC slot on the right is the slot that is populated.
Figure 1-5 System Management Node Rear Video Connections
Figure 1-6 UV CMC Connection Faceplate Example
Serial Console Hardware Requirements
The console type and how these console types are connected to the SGI UV 2000 servers is determined by what console option is chosen. If you have an SGI UV 2000 server and wish to use a serially-connected “dumb terminal”, you can connect the terminal via a micro-USB serial cable to the console port connector on the CMC. The terminal should be set to the following functional modes:
Baud rate of 115,200
8 data bits
One stop bit, no parity
No hardware flow control (RTS/CTS)
Note that a serial console is generally connected to the first (bottom) IRU in any single rack configuration.
Establishing a Serial Connection to the CMC on SGI UV 2000
If you have an SGI UV 2000 system and wish to use a serially-connected "dumb terminal", you can connect the terminal via a micro-USB serial cable to the console port connector on the CMC board of the IRU.
1. The terminal should be set to the operational modes described in the previous subsection.
Note that a serial console is generally connected to the CMC on the first (bottom) IRU in any single rack configuration.
2. On the system management node (SMN) port, the CMC is configured to request an IP
address via dynamic host configuration protocol (DHCP).
3. If your system does not have an SMN, the CMC address cannot be directly obtained by
DHCP and will have to be assigned; see the following subsections for more information.
Establishing CMC IP Hardware Connections
For IP address configuration, there are two options: DHCP or static IP. The followi ng subs ections provide information on t he setup and use of both.
Note: Both options require the use of the CMC's serial port; refer to Figure 1-6 on page 8.
For DHCP, you must determine the IP address that the CMC has been assigned; for a static IP, you must also configure the CMC to use the desired static IP address.
To use the serial port connection, you must attach and properly configure a micro-USB cable to the CMC's "CONSOLE" port. Configure the serial port as described in “Serial Console Hardware Requirements” on page 9.
When the serial port session is established, the console will show a CMC login, and the user can login to the CMC as user "root" with password "root".
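For example, from a Linux workstation the serial session might be opened with a terminal emulator such as screen (a minimal sketch; the device name /dev/ttyUSB0 is an assumption and depends on how the micro-USB cable enumerates on your workstation):
# screen /dev/ttyUSB0 115200
CMC login: root
Password: root
CMC:r1i1c>
The CMC prompt shown reflects the controller's rack and IRU position (rack 1, IRU 1 in this example).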
Using DHCP to Establish an IP Address
To determine the IP address assigned to the CMC, you must first establish a connection to the CMC serial port (as indicated in the section “Serial Console Hardware Requirements” on page 9), and run the command "ifconfig eth0". This will report the IP address that the CMC is configured to use.
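For example, the output of "ifconfig eth0" will resemble the following (a sketch only; the addresses shown are illustrative and will differ on your system). The "inet addr" field is the IP address the CMC is currently using:
CMC:r1i1c> ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:30:48:xx:xx:xx
          inet addr:192.168.1.105  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1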
Running the CMC with DHCP is not recommended as the preferred option for SGI UV 2000 systems. The nature of DHCP makes it difficult to determine the IP address of the CMC, and it is possible for that IP address to change over time, depending on the DHCP configuration usage. The exception would be a configuration where the system administrator is using DHCP to assign a "permanent" IP address to the CMC.
To switch from a static IP back to DHCP, the configuration file /etc/sysconfig/ifcfg-eth0 on the CMC must be modified (see additional instructions in the “Using a Static IP Address” section). The file must contain the following line to enable use of DHCP:
BOOTPROTO=dhcp
Using a Static IP Address
To configure the CMC to use a static IP address, the user/administrator must edit the configuration file /etc/sysconfig/ifcfg-eth0 on the CMC. The user can use the "vi" command (i.e. "vi /etc/sysconfig/ifcfg-eth0") to modify the file.
The configuration file should be modified to contain these lines:
BOOTPROTO=static
IPADDR=<IP address to use>
NETMASK=<netmask>
GATEWAY=<network gateway IP address>
HOSTNAME=<hostname to use>
Note that the "GATEWAY" and "HOSTNAME" lines are optional.
After modifying the file, save and write it using the vi command ":w!", and then exit vi using ":q". Then reboot the CMC (using the "reboot" command); after it reboots, it will be configured with the specified IP address.
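As an illustration, a completed configuration file might look like the following (all values are placeholders chosen for this example, not defaults; substitute the addresses and hostname appropriate for your site):
BOOTPROTO=static
IPADDR=192.168.100.21
NETMASK=255.255.255.0
GATEWAY=192.168.100.1
HOSTNAME=uv2000-cmc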
System Control Overview
All SGI UV 2000 system individual rack units (IRUs) use an embedded chassis management controller (CMC). The CMC communicates with both the blade-level board management controllers (BMCs) and the system management node (SMN), which runs the SGI Management Center software. In concert with the SGI Management Center software, they are generically known as the system control network.
The SGI UV 2000 system contro l network pro vides control and monitoring functionality for each compute blade, power supply, and fan assembly in each individual rack unit (IRU) enclosure in the system. The IRU is a 10U-high enclosure that supplies power, cooling, network fabric switching and system control for up to eight compute blades. A single chassis management controller blade is installed at the rear of each IRU.
The SGI Management Center System Administrator’s Guide (P/N 007-5642-00x) provides more detailed information on using the GUI to administer your SGI UV 2000 system.
The SGI Management Center is an application that provides control over multiple IRUs, and communication to other UV systems. Remote administration requires that the SMN be connected by an Ethernet connection to a private or public Local Area Network (LAN).
The CMC network in concert with the SMN provides the following functionality:
Powering the entire system on and off.
Powering individual IRUs on and off.
Powering individual blades in an IRU on and off.
Monitoring the environmental state of the system.
Partitioning the system.
Entering controller commands to monitor or change particular system functions within a particular IRU. See the SGI UV CMC Software User Guide (P/N 007-5636-00x) for a complete list of command line interface (CLI) commands.
Providing access to the system OS console, allowing you to run diagnostics and boot the system.
Communicating with the System
The two primary ways to communicate with and administer the SGI UV 2000 system are through the SGI Management Center interface or the UV command line interface (CLI).
The SGI Management Center Graphical User Interface
The SGI Management Center interface is a server monitoring and management system. The SGI Management Center GUI provides status metrics on operational aspects for each node in a system. The interface can also be customized to meet the specific needs of individual systems.
The SGI Management Center System Administrator’s Guide (P/N 007-5642-00x) provides information on using the interface to monitor and maintain your SGI UV 2000 system. Also, see Chapter 2 in this guide for additional reference information on the SGI Management Center interface.
Powering-On and Off From the SGI Management Center Interface
Commands issued from the SGI Management Center interface are typically sent to all enclosures and blades in the system (up to a maximum 128 blades per SSI) depending on set parameters. SGI Management Center services are started and stopped from scripts that exist in /etc/init.d.
SGI Management Center is commonly installed in /opt/sgi/sgimc, and is controlled by one of these services. This allows you to manage SGI Management Center services using standard Linux tools such as chkconfig and service.
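For example, you might locate and restart the relevant service on the SMN with the standard tools mentioned above (a sketch; the service name sgimc is an assumption, so check the actual script names present in /etc/init.d on your SMN):
# ls /etc/init.d | grep -i sgi
# chkconfig --list sgimc
# service sgimc restart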
If your SGI Management Center interface is not already ru nning, o r you are bring ing it up for the first time, use the following steps:
1. Power on the server running the SGI Management Center interface.
2. Open an ssh or other terminal session command line console to the SMN using a remote
workstation or local VGA terminal.
3. Use the information in the section “Power Connections Overview” on page 2 to ensure that
all system components are supplied with power and ready for bring up.
4. Log in to the SMN as root (the default password is sgisgi).
5. On the command line, enter mgrclient and press Enter. The SGI Management Center Login dialog box is displayed.
6. Enter a user name (root by default) and password (root by default) and click OK. The SGI Management Center interface is displayed.
7. The power-on (green) and power-off (red) buttons are located in the middle of the SGI Management Center GUI’s tool bar, whose icons provide quick access to common tasks and features.
See the SGI Management Center System Administrator’s Guide for more information.
The Command Line Interface
The UV command line interface (CLI) is accessible by logging into either a system management node (SMN) or chassis management controller (CMC).
Note: The command line interface is virtually the same when used from either the SMN or the CMC. Using the command line interface from the SMN may require that the command be targeted to a specific UV 2000 system if the SMN is managing more than one SSI.
Log in as root (default password is root) when logging into the CMC.
Log in as sysco when logging into the SMN.
Once a connection to the SMN or CMC is established, various system control commands can be entered. See “Powering On and Off from the Command Line Interface” on page 14 for specific examples of using the CLI commands.
Powering On and Off from the Command Line Interface
The SGI UV 2000 command line interface is accessible by logging into either the system management node (SMN) as root or the CMC as root.
Instructions issued at the command line interface of a local console prompt typically only affect the local partition or a part of the system. Depending on the directory level you are logged in at, you may power up an entire partition (SSI), a single rack, or a single IRU enclosure. In CLI command console mode, you can obtain only limited information about the overall system configuration. An SMN has information about the IRUs in its SSI. Each IRU has information
about its internal blades, and also (if other IRUs are attached via NUMAlink to the IRU) information about those IRUs.
Command Options for Power On
The following example command options can be used with either the SMN or CMC CLI:
usage: power [-vcow] on|up [system-SSN]...turns power on
-v, --verbose verbose output
-c, --clear clear EFI variables (system and partition targets only)
-o, --override override partition check
-w, --watch watch boot progress
To monitor the power-on sequence during boot, see the section “Monitoring Power On” on page 19; the -uvpower option must be included with the command to power on.
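For example, a verbose power-on that also watches boot progress could combine the flags listed above (a sketch; use only the options reported by power --help on your system):
sysco@uv-system-smn>power -v -w on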
Power On the System From the SMN Command Line
1. Login to the SMN as root, via a terminal window similar to the following:
# ssh -X root@uv-system-smn
root@system-smn>
The default password for logging in to the SMN as root is sgisgi. Once a connection to the SMN is established, the SMN prompt is presented and various system control commands can be entered.
2. To see a list of available commands enter the following:
root@uv-system-smn>ls /sysco/bin/help
3. Change the working directory to sysco, similar to the following:
root@uv-system-smn>cd /sysco
In the following example the system is powered on without monitoring the progress or status of the power-on process. When a power command is issued, it checks to see if the individual rack units (IRUs) are powered on; if not on, the power command powers up the IRUs and then the blades in the IRU are powered on.
4. Enter th e power on command, similar to the following:
sysco@uv-system-smn>power on
The system will take time to fully power up (depending on size and options).
Specific CLI Commands Used With the SMN
The following CLI commands are specifically for the SMN:
auth authenticate SSN/APPWT change
bios perform bios actions
bmc access BMC shell
cmc access CMC shell
config show system configuration
console access system consoles
help list available commands
hel access hardware error logs
hwcfg access hardware configuration variable
leds display system LED values
log display system controller logs
power access power control/status
Type '<cmd> --help' for help on individual commands.
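For example, to display the help text for the power command (a brief illustration of the --help usage noted above):
root@uv-system-smn>power --help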
Optional Power On From the CMC Command Line
Most SGI UV 2000 systems come with a system management node (SMN) and there should be few reasons for powering on the system from a CMC.
Note: Some basic systems are sold without an SMN and require slightly different administrative procedures. These “SMN-less” systems are restricted in regard to the number of blades and router options available.
Use the following information if you have a need to power on from a CMC rather than the SMN CLI or the SGI Management Center GUI. If the SMN is not available you can still boot the system directly by using the CMC, see “Booting Directly From a CMC”.
Note: The command line interface for the CMC is virtually the same as that for the SMN, with the exception that the CMC does not have the ability to target a system when multiple systems are supported from one SMN.
Booting Directly From a CMC
If a system management node (SMN) is not available, it is possible to power on and administer your system directly from the CMC. When available, the optional SMN should always be the primary interface to the system.
The console type and how these console types are connected to the SGI UV 2000 systems is determined by what console option is chosen. To monitor or administer a system through the CMC network, you will need to establish a mini-USB serial connection to the CMC. See the information in the following two subsections.
Power On the System Using the CMC Network
You can use a direct mini-USB serial connection to the CMC to power on your UV system; note that this process is not the standard way to administer a system. Use the following steps:
1. Establish a connection (as detailed in the previous subsections). CMCs have their rack and
“U” position set at the factory. The CMC will have an IP address, similar to the following:
172.17.<rack>.<slot>
2. You can use the IP address of the CMC to login, as follows:
ssh root@<IP-ADDRESS>
Typically, the default password for the CMC set out of the SGI factory is root. The default password for logging in as sysco on the SMN is sgisgi.
The following example shows the CMC prompt:
SGI Chassis Manager Controller, Firmware Rev. x.x.xx
CMC:r1i1c>
This refers to rack 1, IRU 1, CMC.
3. Power up your UV system using the power on command, as follows:
CMC:r1i1c> power on
The system will take time to fully power up (depending on size and options). Larger systems take longer to fully power on. Information on booting Linux from the shell prompt is included at the end of the subsection (“Monitoring Power On” on page 19).
Optional Power On Using the SMN to Connect to the CMC
Typically, the default password for the CMC set out of the SGI factory is root.
Use the following steps to establish a network connection from the SMN to the CMC and power on the system using the CMC prompt and the command line interface:
1. Establish a network connection to the CMC by using the ssh command from the SMN to connect to the CMC, similar to the following example:
Note: This is only valid if your PC or workstation that is connected to the CMC (via the SMN connection) has its /etc/hosts file setup to include the CMCs.
ssh root@hostname-cmc
The following example shows the CMC prompt:
SGI Chassis Manager Controller, Firmware Rev. x.x.xx
CMC:r1i1c>
This refers to rack 1, IRU 1, CMC.
2. Power up your UV system using the power-on command, as follows:
CMC:r1i1c> power on
Note that the larger a system is, the more time it will take to power up completely. Information on booting Linux from the shell prompt is included at the end of the subsection (“Monitoring Power On” on page 19).
Monitoring Power On
Open a separate window on your PC or workstation and establish another connection to the SMN or CMC and use the uvcon command to open a system console and monitor the system boot process. Use the following steps:
CMC:r1i1c> uvcon
uvcon: attempting connection to localhost... uvcon: connection to SMN/CMC (localhost) established. uvcon: requesting baseio console access at r001i01b00... uvcon: tty mode enabled, use ’CTRL-]’ ’q’ to exit uvcon: console access established uvcon: CMC <--> BASEIO connection active ************************************************ ******* START OF CACHED CONSOLE OUTPUT ******* ************************************************ ******** [20100512.143541] BMC r001i01b10: Cold Reset via NL broadcast reset ******** [20100512.143541] BMC r001i01b07: Cold Reset via NL broadcast reset ******** [20100512.143540] BMC r001i01b08: Cold Reset via NL broadcast reset ******** [20100512.143540] BMC r001i01b12: Cold Reset via NL broadcast reset ******** [20100512.143541] BMC r001i01b14: Cold Reset via NL broadcast reset
******** [20100512.143541] BMC r001i01b04: Cold Reset via NL....
Note: Use CTRL-] q to exit the console.
Depending upon the size of your system, it can take 5 to 10 minutes for the UV system to boot to the EFI shell. When the shell> prompt appears, enter fs0:, as follows:
shell> fs0:
At the fs0: prompt, enter the Linux boot loader information, as follows:
fs0:\> \efi\SuSE\elilo
The ELILO Linux Boot loader is called and various SGI configuration scripts are run and the SUSE Linux Enterprise Server 11 Service Pack x installation program appears.
Power off a UV System
To power down the UV system, use the power off command, as follows:
CMC:r1i1c> power off
==== r001i01c (PRI) ====
You can also use the power status command to check the power status of your system:
CMC:r1i1c> power status
==== r001i01c (PRI) ====
on: 0, off: 32, unknown: 0, disabled: 0
The following command options can be used with the power off|down command:
usage: power [-vo] off|down [system-SSN]...turns power off
-v, --verbose verbose output
-o, --override override partition check
Additional CLI Power Command Options
The following are examples of command options related to power status of the system IRUs. These commands and arguments can be used with either the SMN or CMC CLI.
usage: power [-vchow] reset [system-SSN]...toggle reset
-v, --verbose verbose output
-c, --clear clear EFI variables (system and partition
targets only)
-h, --hold hold reset high
-o, --override override partition check
-w, --watch watch boot progress
usage: power [-v] ioreset [system-SSN]...toggle I/O reset
-v, --verbose verbose output
usage: power [-vhow] cycle [system-SSN]...cycle power off on
-v, --verbose verbose output
-h, --hold hold reset high
-o, --override override partition check
-w, --watch watch boot progress
usage: power [-v10ud] [status] [system-SSN]...show power status
-v, --verbose verbose output
-1, --on show only blades with on status
-0, --off show only blades with off status
-u, --unknown show only blades with unknown status
-d, --disabled show only blades with disabled status
usage: power [-ov] nmi|debug [system-SSN]...issue NMI
-o, --override override partition check
-v, --verbose verbose output
usage: power [-v] margin [high|low|norm|<value>] [system-SSN]...power margin control high|low|norm|<value> margin state
-v, --verbose verbose output
usage: power --help
--help display this help and exit
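For example, the status and reset forms might be used as follows (a sketch built only from the options documented above):
CMC:r1i1c> power -0 status
CMC:r1i1c> power -w reset
The first command lists only blades that are currently powered off; the second toggles reset while watching boot progress.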
Using Embedded Support Partner (ESP)
Embedded Support Partner (ESP) automatically detects system conditions that indicate potential future problems and then notifies the appropriate personnel. This enables you and SGI system support engineers (SSEs) to proactively support systems and resolve issues before they develop into actual failures.
ESP enables users to monitor one or more systems at a site from a local or remote connection. ESP can perform the following functions:
Monitor the system configuration, events, performance, and availability.
Notify SSEs when specific events occur.
Generate reports.
ESP also supports the following:
Remote support and on-site troubleshooting.
System group management, which enables you to manage an entire group of systems from a single system.
For additional information on this and other available monitoring services, see the section “SGI Electronic Support” in Chapter 7.
Optional Components
Besides adding a network-connected system console or basic VGA monitor, you can add or replace the following hardware items on your SGI UV 2000 series server:
Peripheral Component Interconnect Express (PCIe) cards into the optional PCIe expansion chassis.
PCIe cards into the blade-mounted PCIe riser card.
Disk drives in your dual disk drive riser card equipped compute blade.
PCIe Cards
The PCIe-based I/O sub-systems are an industry standard for connecting peripherals, storage, and graphics to a processor blade. The following are the primary configurable I/O system interfaces for the SGI UV 2000 series systems:
The optional full-height two-slot internal PCIe blade is a dual-node compute blade that
supports one full-height x16 PCIe Gen3 card in the top slot and one low-profile x16 PCIe Gen3 card in the lower slot. See Figure 1-7 on page 24 for an example.
The optional dual low-profile PCIe blade supports two PCIe x16 Gen3 cards. See Figure 1-8
on page 24 for an example.
The optional external PCIe I/O expansion chassis supports up to four PCIe cards. The
external PCIe chassis is supported by connection to a compute blade using an optional host interface card (HIC). Each x16 PCIe enabled blade host interface connector can support one I/O expansion chassis. See Chapter 6 for more details on the optional external PCIe chassis.
Important: PCIe cards installed in an optional two-slot PCIe blade are not hot swappable or hot pluggable. The compute blade using the PCIe riser must be powered down and removed from the system before installation or removal of a PCIe card(s). Also see “Installing Cards in the 1U PCIe Expansion Chassis” on page 65 for more PCIe related information.
Not all blades or PCIe cards may be available with your system configuration. Check with your SGI sales or service representative for availability. See Chapter 6, “Add or Replace Procedures” for detailed instructions on installing or removing PCIe cards or SGI UV 2000 system disk drives.
Figure 1-7 PCIe Option Blade Example with Full-Height and Low-Profile Slots
Figure 1-8 PCIe Option Blade Example with Two Low-Profile Slots
PCIe Drive Controllers in BaseIO Blade
The SGI UV 2000 system offers a RAID or non-RAID (JBOD) PCIe-based drive controller that resides in the BaseIO blade’s PCIe slot. Figure 1-9 shows an example of the system disk HBA controller location in the BaseIO blade.
Note: The PCIe drive controller (upper-right section in Figure 1-9) always hosts the system boot drives.
Figure 1-9 BaseIO Blade and PCIe Disk Controller Example
RAID PCIe Disk Controller
At the time this document was published, the optional RAID controller used in the BaseIO blade PCIe slot is an LSI MegaRAID SAS 9280-8e. This PCIe 2.0 card uses two external SAS control connectors and supports the following:
RAID levels 0, 1, 5, 6, and 10
Advanced array configuration and management utilities
Support for global hot spares and dedicated hot spares
Support for user-defined stripe sizes: 8, 16, 32, 64, 128, 256, 512, or 1024 KB
The RAID controller also supports the following advanced array configuration and management capabilities:
Online capacity expansion to add space to an existing drive or a new drive
No reboot necessary after expansion
Online RAID level migration, including drive migration, roaming and load balancing
Media scan
User-specified rebuild rates (specifying the percentage of system resources to use, from 0 percent to 100 percent)
Nonvolatile random access memory (NVRAM) of 32 KB for storing RAID system configuration information; the MegaRAID SAS firmware is stored in flash ROM for easy upgrade.
Non-RAID PCIe Disk Controller
At publication time, the LSI 9200-8e low-profile PCIe drive controller HBA is the default non-RAID system disk controller for the SGI UV 2000. This drive controller has the following features:
Supports SATA and SAS link rates of 1.5 Gb/s, 3.0 Gb/s, and 6.0 Gb/s
Provides two x4 external mini-SAS connectors (SFF-8088)
The HBA has onboard Flash memory for the firmware and BIOS
The HBA is a 6.6-in. x 2.713-in., low-profile board
Supports eight-lane, full-duplex PCIe 2.0 performance
The HBA has multiple status and activity LEDs and a diagnostic UART port
A x8 PCIe slot is required for the HBA to operate within the system
Chapter 2
2. System Control
This chapter describes the general interaction and functions of the overall SGI UV 2000 system control. System control parameters depend somewhat on the overall size and complexity of the SGI UV 2000 but will generally include the following three areas:
The system management node (SMN) which runs the SGI Management Center software
The chassis management controllers (CMC) boards - one per IRU
The individual blade-based board management controllers (BMC) - report to the CMCs
Note: While it is possible to operate and administer a very basic (single-rack) SGI UV 2000 system without using an SMN and SGI Management Center, this is the exception rather than the rule.
Levels of System Control
The system control network configuration of your server will depend on the size of the system and the control options selected. Typically, an Ethernet LAN connection to the system controller network is used. This Ethernet connection is made from a remote PC/workstation connected to the system management node (SMN). The SMN is a separate stand-alone server installed in the SGI UV 2000 rack. The SMN acts as a gateway and buffer between the UV system control network and any other public or private local area networks.
Important: The SGI UV system control network is a private, closed network. It should not be reconfigured in any way to change it from the standard SGI UV factory installation. It should not be directly connected to any other network. The UV system control network is not designed for and does not accommodate additional network traffic, routing, address naming (other than its own schema), or DHCP controls (other than its own configuration). The UV system control network also is not security hardened, is not tolerant of heavy network traffic, and is vulnerable to Denial of Service attacks.
System Management Node (SMN) Overview
An Ethernet connection directly from the SMN (Figure 2-1 on page 29) to a local private or public Ethernet allows the system to be administered directly from a local or remote console via the SGI Management Center interface installed on the SMN. Note that there is no direct inter-connected system controller function in the optional expansion PCIe modules.
The system controller network is designed into all IRUs. Controllers within the system report and share status information via the CMC Ethernet interconnects. This maintains controller configuration and topology information between all controllers in an SSI. Figure 2-2 on page 30 shows an example system control network using an optional and separate (remote) workstation to monitor a single-rack SGI UV 2000 system. It is also possible to connect an optional PC or (in-rack) console directly to the SMN, see Figure 2-4 on page 34.
Note: Mass storage option enclosures are not specifically monitored by the system controller network. Most optional mass storage enclosures have their own internal microcontrollers for monitoring and controlling all elements of the disk array. See the user’s guide for your mass storage option for more information on this topic.
For information on administering network connected SGI systems using the SGI Management Center, see the SGI Management Center System Administrator’s Guide (P/N 007-5642-00x).
Figure 2-1 System Management Node Front and Rear Panels (callouts include the slim DVD drive option, power supply module, disk drive bays, system LEDs, system reset, main power, mouse, keyboard, BMC port, USB ports 0 and 1, COM port 1, VGA port, LAN ports 1-4, and the full-height full-depth and full-height half-depth x16 PCIe slots)
CMC Overview
The CMC system for the SGI UV 2000 servers manages power control and sequencing, provides environmental control and monitoring, initiates system resets, stores identification and configuration information, and provides console/diagnostic and scan interface. A CMC port from each chassis management controller connects to a dedicated Ethernet switch that provides a synchronous clock signal to all the CMCs in an SSI.
Viewing the system from the rear, the CMC blade is on the right side of the IRU. The CMC accepts direction from the SMN and supports powering-up and powering-down individual or groups of compute blades and environmental monitoring of all units within the IRU. The CMC sends operational requests to the Baseboard Management Controller (BMC) on each compute
blade installed. The CMC provides data collected from the compute nodes within the IRU to the system management node upon request.
CMCs can communicate with the blade BMCs and other IRU CMCs when they are linked together under a single system image (SSI); also called a partition. Each CMC shares its information with the SMN as well as other CMCs within the SSI. Note that the system management node (server), optional mass storage units and PCIe expansion enclosures do not have a CMC installed.
Figure 2-2 SGI UV 2000 LAN-attached System Control Network Example (a remote workstation monitor connects to the single-rack SGI UV 2000 system over a local area network (LAN) using Cat-5 Ethernet)
BMC Overview
Each compute blade in an IRU has a baseboard management controller (BMC). The BMC is a built-in specialized microcontroller hardware component that monitors and reports on the functional “health” status of the blade. The BMC provides a key functional element in the overall Intelligent Platform Management Interface (IPMI) architecture.
The BMC acts as an interface to the higher levels of system control such as the IRU’s CMC board and the higher level control system used in the system management node. The BMC can report any on-board sensor information that it has regarding temperatures, power status, operating system condition and other functional parameters that may be reported by the blade. When any of the preset limits fall out of bounds, the information will be reported by the BMC and an administrator can take some corrective action. This could entail a node shutdown, reset (NMI) or power cycling of the individual blade.
The individual blade BMCs do not have information on the status of other blades within the IRU. This function is handled by the CMCs and the system management node. Note that blades equipped with an optional BaseIO riser board have a dedicated BMC Ethernet port.
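Because the blade BMCs implement the standard IPMI architecture, their sensor and power data can in principle also be read with generic IPMI tooling. The following Python sketch is a hedged illustration only, not the documented SGI procedure (day-to-day control normally goes through the CMC and SMN); it assumes the open-source ipmitool utility is installed, and the BMC hostname and credentials are hypothetical placeholders.

import subprocess

BMC_HOST = "uv-blade0-bmc.example.com"  # hypothetical BMC address (BaseIO-equipped blades expose a BMC Ethernet port)
BMC_USER = "admin"                      # hypothetical credentials
BMC_PASS = "password"

def ipmi(*args):
    # Run one ipmitool command against the blade BMC over the IPMI LAN interface.
    cmd = ["ipmitool", "-I", "lanplus", "-H", BMC_HOST,
           "-U", BMC_USER, "-P", BMC_PASS, *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    print(ipmi("sensor", "list"))              # temperatures, voltages, fan status
    print(ipmi("chassis", "power", "status"))  # report whether the blade is powered on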
System Controller Interaction
In all SGI UV 2000 servers, all the system controller types (SMNs, CMCs and BMCs) communicate with each other in the following ways:
System control commands and communications are passed between the SMN and CMCs via a private dedicated gigabit Ethernet network. The CMCs communicate directly with the BMC in each installed blade by way of the IRU’s internal backplane.
All CMCs can communicate with each other via an Ethernet “ring” configuration network established within an SSI.
In larger configurations the system control communication path includes a private, dedicated Ethernet switch that allows communication between an SMN and multiple SSI environments.
IRU Controllers
All IRUs have a chassis management controller (CMC) board installed. The following subsection describes the basic features of the controllers:
Note: For additional information on controller commands, see the SGI UV CMC Software User Guide (P/N 007-5636-00x).
Chassis Management Controller Functions
The following list summarizes the control and monitoring functions that the CMC performs. Many of the controller functions are common across all IRUs; however, some functions are specific to the type of enclosure.
Monitors individual blade status via blade BMCs
Controls and monitors IRU fan speeds
Reads system identification (ID) PROMs
Monitors voltage levels and reports failures
Monitors and controls warning LEDs
Monitors the On/Off power process
Provides the ability to create multiple system partitions
Provides the ability to flash system BIOS
1U Console Option
The SGI optional 1U console (Figure 2-3 on page 33) is a rackmountable unit that includes a built-in keyboard/touchpad. It uses a 17-inch (43-cm) LCD flat panel display of up to 1280 x 1024 pixels.
Figure 2-3 Optional 1U Rackmount Console
Flat Panel Rackmount Console Option Features
The 1U flat panel console option has the following features:
1. Slide Release - Move this tab sideways to slide the console out. It locks the drawer closed when the console is not in use and prevents it from accidentally sliding open.
2. Handle - Used to push and pull the module in and out of the rack.
3. LCD Display Controls - The LCD controls include On/Off buttons and buttons to control the position and picture settings of the LCD display.
4. Power LED - Illuminates blue when the unit is receiving power.
The 1U console attaches to the system management node server using PS/2 and HD15M connectors or to an optional KVM switch (not provided by SGI). See Figure 2-4 for the SMN video connection points. The 1U console is basically a “dumb” VGA terminal; it cannot be used as a workstation or loaded with any system administration program.
The 27-pound (12.27-kg) console automatically goes into sleep mode when the cover is closed.
Figure 2-4 System Management Node (SMN) Direct Video Connection Ports (callouts: mouse, keyboard, and VGA port)
Chapter 3
3. System Overview
This chapter provides an overview of the physical and architectural aspects of your SGI UV 2000 series system. The major components of the SGI UV 2000 series systems are described and illustrated.
The SGI UV 2000 series is a family of multiprocessor distributed shared memory (DSM) computer systems that can scale from 16 to 2,048 Intel processor cores as a cache-coherent single system image (SSI). Future releases may scale to larger processor counts for single system image (SSI) applications. Contact your SGI sales or service representative for the most current information on this topic.
In a DSM system, each processor board contains memory that it shares with the other processors in the system. Because the DSM system is modular, it combines the advantages of lower entry-level cost with global scalability in processors, memory, and I/O. You can install and operate the SGI UV 2000 series system in your lab or server room. Each 42U SGI rack holds one to four 10U-high enclosures that support up to eight compute/memory and I/O sub modules known as “blades.” These blades contain printed circuit boards (PCBs) with ASICs, processors, memory components and I/O chipsets mounted on a mechanical carrier. The blades slide directly in and out of the SGI UV 2000 IRU enclosures.
This chapter consists of the following sections:
“System Models” on page 37
“System Architecture” on page 39
“System Features” on page 41
“System Components” on page 46
Figure 3-1 shows the front view of a single-rack SGI UV 2000 system.
Figure 3-1 SGI UV 2000 Single-Rack System Example
System Models
The basic enclosure within the SGI UV 2000 system is the 10U high “individual rack unit” (IRU). The IRU enclosure contains up to eight compute blades connected to each other via a backplane. Each IRU has ports that are brought out to external NUMAlink 6 connectors. The 42U rack for this server houses all IRU enclosures, option modules, and other components; up to 64 processor sockets (512 processor cores) in a single rack. The SGI UV 2000 server system requires a minimum of one BaseIO equipped blade for every 2,048 processor cores. Higher core counts in an SSI may be available in future releases, check with your SGI sales or service representative for the most current information.
Note: Special systems operated without a system management node (SMN) must have an optional external DVD drive available to connect to the BaseIO blade.
Figure 3-2 shows an example of how IRU placement is done in a single-rack SGI UV 2000 server.
The system requires a minimum of one 42U tall rack with three single-phase power distribution unit (PDU) plugs per IRU installed in the rack. Three outlets are required to support each power shelf. There are three power supplies per power shelf and two power connections are required for an SMN.
You can also add additional PCIe expansion enclosures or RAID and non-RAID disk storage to your server system. Power outlet needs for these options should be calculated in advance of determining the number of outlets needed for the overall system.
Figure 3-2 SGI UV 2000 IRU and Rack (the UV rack shown holds four individual rack units (IRUs), a system management node, and a 1U (I/O) expansion slot)
System Architecture
The SGI UV 2000 computer system is based on a distributed shared memory (DSM) architecture. The system uses a global-address-space, cache-coherent multiprocessor that scales up to 512 processor cores in a single rack. Because it is modular, the DSM combines the advantages of lower entry cost with the ability to scale processor count, memory, and I/O independently in each rack. Note that a maximum of 2,048 cores are supported on a single-system image (SSI). Larger SSI configurations may be offered in the future, contact your SGI sales or service representative for additional information.
The system architecture for the SGI UV 2000 system is a sixth-generation NUMAflex DSM architecture known as NUMAlink 6 or NL6. In the NUMAlink 6 architecture, all processors and memory can be tied together into a single logical system. This combination of processors, memory, and internal switches constitute the interconnect fabric called NUMAlink within and between each 10U IRU enclosure.
The basic expansion building block for the NUMAlink interconnect is the processor node; each processor node consists of a dual-Hub ASIC (also known as a HARP) and two eight-core processors with on-chip secondary caches. The Intel processors are connected to the dual-Hub ASIC via quick path interconnects (QPIs). Each dual-Hub ASIC is also connected to the system’s NUMAlink interconnect fabric through one of sixteen NL6 ports.
The dual-Hub ASIC is the heart of the processor and memory node blade technology. This specialized ASIC acts as a crossbar between the processors and the network interface. The Hub ASIC enables any processor in the SSI to access the memory of all processors in the SSI.
Figure 3-3 on page 40 shows a functional block diagram of the SGI UV 2000 series system IRU.
System configurations of up to eight IRUs can be constructed without the use of external routers. Routerless systems can have any number of blades up to a maximum of 64. Routerless system topologies reduce the number of external NUMAlink cables required to interconnect a system.
External optional routers are needed to support multi-rack systems with more than four IRUs, see Chapter 5, “Optional Octal Router Chassis Information” for more information.
Figure 3-3 Functional Block Diagram of the Individual Rack Unit (IRU) (the diagram shows the NUMAlink NI port connections between the compute blades and the left and right IRU backplanes)
Note: This drawing only shows the cabling between the compute blades within the IRU. External cabling (cabling that exits the IRU) is not shown.
System Features
The main features of the SGI UV 2000 series server systems are discussed in the following sections:
“Modularity and Scalability” on page 41
“Distributed Shared Memory (DSM)” on page 41
“Chassis Management Controller (CMC)” on page 43
“Distributed Shared I/O” on page 43
“Reliability, Availability, and Serviceability (RAS)” on page 44
Modularity and Scalability
The SGI UV 2000 series systems are modular systems. The components are primarily housed in building blocks referred to as individual rack units (IRUs). Additional optional mass storage may be added to the rack along with additional IRUs. You can add different types of blade options to a system IRU to achieve the desired system configuration. You can easily configure systems around processing capability, I/O capability, memory size, MIC/GPU capability or storage capacity. The air-cooled IRU enclosure system has redundant, hot-swap fans and redundant, hot-swap power supplies.
Distributed Shared Memory (DSM)
In the SGI UV 2000 series server, memory is physically distributed both within and among the IRU enclosures (compute/memory/I/O blades); however, it is accessible to and shared by all NUMAlinked devices within the single-system image (SSI). That is, all NUMAlinked components sharing a single Linux operating system operate on and share the memory “fabric” of the system. Memory latency is the amount of time required for a processor to retrieve data from memory. Memory latency is lowest when a processor accesses local memory. Note the following sub-types of memory within a system:
If a processor accesses memory that is directly connected to its resident socket, the memory is referred to as local memory. Figure 3-4 on page 42 shows a conceptual block diagram of the blade’s memory, compute and I/O pathways.
If a processor needs to access memory located in another socket, or on another blade within the IRU, (or other NUMAlinked IRUs) the memory is referred to as remote memory.
The total memory within the NUMAlinked system is referred to as global memory.
Figure 3-4 Blade Node Block Diagram Example (the diagram shows the top and bottom node board assemblies with their processor sockets, four DDR3 memory channels (Chan-0 through Chan-3) with A/B DIMM pairs, QPI links, the HARP connector and HARP PCA, NL6 channels to the backplane connector and QSFP iPass ports, BMC-to-CMC Ethernet links, and the PCIe Gen3 x16 connection used for a PCIe x16 slot or BaseIO card)
Distributed Shared I/O
Like DSM, I/O devices are distributed among the blade nodes within the IRUs. Each BaseIO riser card equipped blade node is accessible by all compute nodes within the SSI (partition) through the NUMAlink interconnect fabric.
Chassis Management Controller (CMC)
Each IRU has a chassis management controller (CMC) located directly below the cooling fans in the rear of the IRU. The chassis manager supports powering up and down of the compute blades and environmental monitoring of all units within the IRU.
One GigE port from each compute blade connects to the CMC blade via the internal IRU backplane. A second GigE port from each blade slot is also connected to the CMC. This connection is used to support a BaseIO riser card. Only one BaseIO is supported in an SSI. The BaseIO must be the first blade (lowest) in the SSI.
ccNUMA Architecture
As the name implies, the cache-coherent non-uniform memory access (ccNUMA) architecture has two parts, cache coherency and non-uniform memory access, which are discussed in the sections that follow.
Cache Coherency
The SGI UV 2000 server series use caches to reduce memory latency. Although data exists in local or remote memory, copies of the data can exist in various processor caches throughout the system. Cache coherency keeps the cached copies consistent.
To keep the copies consistent, the ccNUMA architecture uses a directory-based coherence protocol. In the directory-based coherence protocol, each block of memory (128 bytes) has an entry in a table that is referred to as a directory. Like the blocks of memory that they represent, the directories are distributed among the compute/memory blade nodes. A block of memory is also referred to as a cache line.
Each directory entry indicates the state of the memory block that it represents. For example, when the block is not cached, it is in an unowned state. When only one processor has a copy of the
memory block, it is in an exclusive state. And when more than one processor has a copy of the block, it is in a shared state; a bit vector indicates which caches may contain a copy.
When a processor modifies a block of data, the processors that have the same block of data in their caches must be notified of the modification. The SGI UV 2000 server series uses an invalidation method to maintain cache coherence. The invalidation method purges all unmodified copies of the block of data, and the processor that wants to modify the block receives exclusive ownership of the block.
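As a purely conceptual illustration of the directory states and the invalidation step described above (not SGI's hardware implementation; the class and method names are invented for illustration), a toy model in Python might look like this:

UNOWNED, EXCLUSIVE, SHARED = "unowned", "exclusive", "shared"

class Directory:
    def __init__(self):
        # cache-line address -> (state, set of caches holding a copy)
        self.entries = {}

    def read(self, cache_id, line):
        state, holders = self.entries.get(line, (UNOWNED, set()))
        holders = set(holders) | {cache_id}
        # One holder keeps an exclusive copy; more than one holder means shared.
        new_state = EXCLUSIVE if len(holders) == 1 else SHARED
        self.entries[line] = (new_state, holders)

    def write(self, cache_id, line):
        state, holders = self.entries.get(line, (UNOWNED, set()))
        # Invalidation method: purge every other cached copy, then grant the
        # writer exclusive ownership of the block.
        invalidated = holders - {cache_id}
        self.entries[line] = (EXCLUSIVE, {cache_id})
        return invalidated

d = Directory()
d.read(0, 0x80); d.read(1, 0x80)   # line 0x80 becomes shared by caches 0 and 1
print(d.entries[0x80])             # ('shared', {0, 1})
print(d.write(0, 0x80))            # cache 1 is invalidated -> {1}
print(d.entries[0x80])             # ('exclusive', {0})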
Non-uniform Memory Access (NUMA)
In DSM systems, memory is physically located at various distances from the processors. As a result, memory access times (latencies) are different or “non-uniform.” For example, it takes less time for a processor blade to reference its locally installed memory than to reference remote memory.
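On a running Linux SSI, the node topology and the relative local/remote memory distances can be inspected, and a workload can be bound to one node so that its allocations stay in local memory. The sketch below assumes the generic Linux numactl utility is installed; it is not an SGI-specific tool, and the workload path shown is a hypothetical placeholder.

import subprocess

# Print the node count, per-node memory sizes, and the node distance matrix.
subprocess.run(["numactl", "--hardware"], check=True)

# Run a placeholder workload with its CPUs and memory bound to node 0 so that
# all of its allocations come from local memory on that node.
subprocess.run(["numactl", "--cpunodebind=0", "--membind=0", "./my_workload"],
               check=False)  # ./my_workload is hypothetical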
Reliability, Availability, and Serviceability (RAS)
The SGI UV 2000 server series components have the following features to increase the reliability, availability, and serviceability (RAS) of the systems.
Power and cooling:
IRU power supplies are redundant and can be hot-swapped under most circumstances.
Note that this might not be possible in a “fully loaded” system. If all the blade positions are filled, be sure to consult with a service technician before removing a power supply while the system is running.
IRUs have overcurrent protection at the blade and power supply level.
Fans are redundant and can be hot-swapped.
Fans run at multiple speeds in the IRUs. Speed increases automatically when temperature increases or when a single fan fails.
System monitoring:
System controllers monitor the internal power and temperature of the IRUs, and can automatically shut down an enclosure to prevent overheating.
All main memory has Intel Single Device Data Correction, to detect and correct 8 contiguous bits failing in a memory device. Additionally, the main memory can detect and correct any two-bit errors coming from two memory devices (8 bits or more apart).
All high speed links including Intel Quick Path Interconnect (QPI), Intel Scalable Memory Interconnect (SMI), and PCIe have CRC check and retry.
The NUMAlink interconnect network is protected by cyclic redundancy check (CRC).
Each blade/node installed has status LEDs that indicate the blade’s operational condition; LEDs are readable at the front of the IRU.
Systems support the optional Embedded Support Partner (ESP), a tool that monitors the system; when a condition occurs that may cause a failure, ESP notifies the appropriate SGI personnel.
Systems support remote console and maintenance activities.
Power-on and boot:
Automatic testing occurs after you power on the system. (These power-on self-tests or POSTs are also referred to as power-on diagnostics or PODs).
Processors and memory are automatically de-allocated when a self-test failure occurs.
Boot times are minimized.
Further RAS features:
Systems have a local field-replaceable unit (FRU) analyzer.
All system faults are logged in files.
Memory can be scrubbed using error correcting code (ECC) when a single-bit error occurs.
System Components
The SGI UV 2000 series system features the following major components:
42U rack. This is a custom rack used for both the compute and I/O rack in the SGI UV 2000 system. Up to four IRUs can be installed in each rack. There is also space reserved for a system management node and other optional 19-inch rackmounted components.
Individual Rack Unit (IRU). This enclosure contains three power supplies, 2-8 compute/memory blades, BaseIO and other optional riser enabled blades for the SGI UV 2000. The enclosure is 10U high. Figure 3-5 on page 47 shows the SGI UV 2000 IRU system components.
Compute blade. Holds two processor sockets and 8 or 16 memory DIMMs. Each compute blade can be ordered with a riser card that enables the blade to support various I/O options.
BaseIO enabled compute blade. I/O riser enabled blade that supports all base system I/O functions including two ethernet connectors, one BMC ethernet port and three USB ports. System disks are always controlled by a PCIe disk controller installed in the BaseIO blade’s PCIe slot. Figure 3-6 on page 48 shows a front-view example of the BaseIO blade.
Note: While the BaseIO blade is capable of RAID 0 support, SGI does not recommend that the end user configure it in this way. RAID 0 offers no fault tolerance for the system disks and decreases overall system reliability. The SGI UV 2000 ships with RAID 1 functionality (disk mirroring) configured if the option is ordered.
Dual disk enabled compute blade. This riser enabled blade supports two hard disk drives
that normally act as the system disks for the SSI. This blade must be installed adjacent to and physically connected with the BaseIO enabled compute blade. JBOD, RAID 0 and RAID 1 are supported. Note that you must have the BaseIO riser blade optionally enabled to use RAID 1 mirroring on your system disk pair.
Two-Slot Internal PCIe enabled compute blade. The internal PCIe riser based compute
blade supports two internally installed PCI Express option cards. Either two half-height or one half-height and one full-height cards are supported.
MIC/GPU PCIe enabled compute blade. This blade supports one optional MIC or GPU
card in the upper slot via PCIe interface to the bottom node board. Option cards are limited, check with your SGI sales or service representative for available types supported.
External PCIe enabled compute blade. This PCIe enabled board is used in conjunction
with an external PCIe expansion enclosure. A x16 adapter card connects from the blade to the external expansion enclosure, supporting up to four PCIe option cards.
Note: PCIe card options may be limited, check with your SGI sales or support representative.
Figure 3-5 SGI UV 2000 IRU System Components Example (the IRU front view shows compute blades 0 through 7, the BaseIO connectors (PCIe slot, SAS0-3/SAS4-7, VGA, USB, serial, LAN0, LAN1, and BMC), and power supplies PS0, PS1, and PS2)
Optional BaseIO SSDs
The BaseIO blade can be configured with one or two internal 1.8-inch solid state drives (SSDs). The SSDs can be configured as JBOD or RAID1. The RAID1 SSD pair is a software RAID1 and two SSDs must be ordered with the system BaseIO to enable this configuration.
Figure 3-6 BaseIO Riser Enabled Blade Front Panel Example (callouts include the PCIe slot, SAS0-3 and SAS4-7 connectors, VGA, USB, serial, LAN0, LAN1, and BMC ports)
MIC/GPU Enabled Compute Blade
The MIC/GPU enabled compute blade has one single-socket node board and supports one PCIe accelerator card. The MIC/GPU enabled compute blade has the following features:
One HARP ASIC based board assembly with twelve NUMAlink 6 (NL6) ports that connect the blade to the backplane and four NL6 ports connecting the blade to external QSFP ports.
Specialized connectors support the connection to both the bottom compute node and the top MIC or GPU board assembly.
One bottom compute node board assembly with a single processor socket that also supports eight memory DIMM slots (1600 MT/s memory DIMMs).
One baseboard management controller (BMC) and one x16 Gen3 PCIe full-height double-wide slot that supports a single MIC or GPU accelerator card.
The accelerator card connects directly to the bottom compute board assembly via a ribbon cable and draws power from the IRU backplane.
Figure 3-7 MIC/GPU Enabled Compute Blade Example Front View
Bay (Unit) Numbering
Bays in the racks are numbered using standard units. A standard unit (SU) or unit (U) is equal to 1.75 inches (4.445 cm). Because IRUs occupy multiple standard units, IRU locations within a rack are identified by the bottom unit (U) in which the IRU resides. For example, in a 42U rack, an IRU positioned in U01 through U10 is identified as U01.
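A small sketch of the unit arithmetic just described, assuming a 10U-high IRU; the function name is invented for illustration.

INCHES_PER_U = 1.75   # one standard unit (U) is 1.75 in. (4.445 cm)
IRU_HEIGHT_U = 10     # an IRU occupies ten standard units

def iru_label_and_span(bottom_u):
    # An IRU is identified by the bottom unit it occupies.
    top_u = bottom_u + IRU_HEIGHT_U - 1
    return "U%02d" % bottom_u, list(range(bottom_u, top_u + 1))

label, units = iru_label_and_span(1)
print(label, units)                        # U01 occupies units 1 through 10
print(IRU_HEIGHT_U * INCHES_PER_U, "in.")  # 17.5 in. of rack height per IRU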
Rack Numbering
Each rack is numbered with a three-digit number sequentially beginning with 001. A rack contains IRU enclosures, optional mass storage enclosures, and potentially other options. In a single compute rack system, the rack number is always 001.
Optional System Components
Availability of optional components for the SGI UV 2000 systems may vary based on new product introductions or end-of-life components. Some options are listed in this manual, others may be introduced after this document goes to production status. Check with your SGI sales or support representative for current information on available product options not discussed in this manual.
Chapter 4
4. Rack Information
This chapter describes the physical characteristics of the tall (42U) SGI UV 2000 racks in the following sections:
“Overview” on page 51
“SGI UV 2000 Series Rack (42U)” on page 52
“SGI UV 2000 System Rack Technical Specifications” on page 56
Overview
At the time this document was published only the tall (42U) SGI UV 2000 rack (shown in Figure 4-2) was available from the SGI factory for use with the SGI UV 2000 systems. Other racks may be available to house the system IRUs, check with your SGI sales or service representative for information.
SGI UV 2000 Series Rack (42U)
The tall rack (shown in Figure 4-1 on page 53) has the following features and components:
Front and rear door. The front door is opened by grasping the outer end of the
rectangular-shaped door piece and pulling outward. It uses a key lock for security purposes that should open all the front doors in a multi-rack system (see Figure 4-2 on page 54).
Note: The front door and rear door locks are keyed differently. The optional water-chilled rear door (see Figure 4-3 on page 55) does not use a lock.
The standard rear door has a push-button key lock to prevent unauthorized access to the system. The rear doors have a master key that locks and unlocks all rear doors in a system made up of multiple racks. You cannot use the rear door key to secure the front door lock.
Cable entry/exit area. Cable access openings are located in the front floor and top of the
rack. Multiple cables are attached to the front of the IRUs; therefore, a significant part of the cable management occurs in the front part of the rack. The stand-alone system management nodes have cables that attach at the rear of the rack. Rear cable connections will also be required for optional storage modules installed in the same rack with the IRU(s). Optional inter-rack communication cables can pass through the top of the rack. These are necessary whenever the system consists of multiple racks. I/O and power cables normally pass through the bottom of the rack.
Rack structural features. The rack is mounted on four casters; the two rear casters swivel.
There are four leveling pads available at the base of the rack. The base of the rack also has attachment points to support an optional ground strap, and/or seismic tie-downs.
Power distribution units in the rack. Up to 12 outlets are required for a single-rack IRU system as follows:
– Allow three outlets for the first IRU
– Two outlets for a maintenance node SMN (server)
– Two outlets for each storage or PCIe expansion chassis
– Allow three more outlets for each additional IRU in the system
Note that an eight-outlet single-phase PDU may be used for the system management node and other optional equipment. A sketch for estimating outlet counts appears below.
Each three-phase power distribution unit has 9 outlet connections.
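As a rough planning aid only (confirm actual requirements with your SGI representative), the outlet rules listed above can be tallied as follows; the function name is invented for illustration.

def outlets_needed(irus, smns=1, expansion_chassis=0):
    # Three outlets per IRU, two per SMN, two per storage or PCIe expansion chassis.
    return 3 * irus + 2 * smns + 2 * expansion_chassis

# Example: two IRUs, one SMN, and one PCIe expansion chassis.
print(outlets_needed(irus=2, smns=1, expansion_chassis=1))  # 10 outlets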
Figure 4-1 SGI UV 2000 Series Rack Example
Figure 4-2 Front Lock on Tall (42U) Rack
Figure 4-3 Optional Water-Chilled Cooling Units on Rear of SGI 42U Rack
SGI UV 2000 System Rack Technical Specifications
Table 4-1 lists the technical specifications of the SGI UV 2000 series tall rack.
Table 4-1 Tall Rack Technical Specifications
Characteristic / Specification
Height: 79.5 in. (201.9 cm)
Width: 31.3 in. (79.5 cm)
Depth: 45.8 in. (116.3 cm)
Single-rack shipping weight (approximate): 2,381 lbs. (1,082 kg) air cooled; 2,581 lbs. (1,173 kg) water assist cooling
Single-rack system weight (approximate): 2,300 lbs. (1,045 kg) air cooled; 2,500 lbs. (1,136 kg) water assist cooling
Voltage range (nominal): 200-240 VAC (North America) / 230 VAC (International)
Voltage range (tolerance range): 180-264 VAC
Frequency (nominal): 60 Hz (North America) / 50 Hz (International)
Frequency (tolerance range): 47-63 Hz
Phase required: Single-phase or 3-phase
Power requirements (max): 34.57 kVA (33.88 kW) approximate
Hold time: 16 ms
Power cable: 8 ft. (2.4 m) pluggable cords
Chapter 5
5. Optional Octal Router Chassis Information
This chapter describes the optional NUMAlink router technology available in SGI UV 2000 systems consisting of two or more racks. This router technology is available in an enclosure “package” known as the Octal Router Chassis (ORC). This optional ORC chassis can be mounted on the top of the SGI UV 2000 rack. NUMAlink 6 advanced router technology reduces UV 2000 system data transfer latency and increases bisection bandwidth performance. Router option information is covered in the following sections:
“Overview” on page 57
“SGI UV 2000 Series NUMAlink Octal Router Chassis” on page 58
“SGI UV 2000 External NUMAlink System Technical Specifications” on page 60
Overview
At the time this document was published, external NUMAlink router technology was available to support from 2 to 512 SGI UV 2000 racks. Other “internal” NUMAlink router options are also available for high-speed communication between smaller groups of SGI UV 2000 racks. For more information on these topics, contact your SGI sales or service representative.
The standard routers used in the SGI UV 2000 systems are the NL6 router blades located internally to each IRU. Each of these first level routers contains a single 16-port NL6 HARP router ASIC. Twelve ports are used for internal connections (connecting blades together), the remaining four ports are used for external connections. The NUMAlink ORC enclosure is located at the top of each SGI UV 2000 rack equipped with the option. Each top-mounted NUMAlink ORC enclosure contains one to eight 16-port HARP ASIC based router boards. Each of these router boards has a single NL6 HARP router ASIC. This is the same router ASIC that is used in the NL6 router blades installed inside the system IRUs.
Note that the ORC chassis also contains a chassis management controller (CMC) board, two power supplies and its own cooling fans.
SGI UV 2000 Series NUMAlink Octal Router Chassis
The NUMAlink 6 ORC router is a 7U-high fully self contained chassis that holds up to eight 16-port NL6 router blade assemblies. Figure 5-1 shows an example rear view of the ORC with no power or NUMAlink cables connected.
The NUMAlink ORC is composed of the following:
7U-high chassis
4 or 8 HARP based router blade assemblies
Cooling-fan assemblies
Chassis Management Controller (CMC)/power supply assembly (with two power supplies)
Figure 5-1 SGI UV 2000 Optional NUMAlink ORC (Rear View) (callouts include the CMC/SMN, ACC, and CONSOLE ports, RESET button, HEARTBEAT and PWR GOOD indicators, and for each of the eight router blades the BMC ETH0, LINK, HEARTBEAT, PWR GD, 3.3v, 12v, and ID indicators)
Figure 5-2 SGI UV 2000 Optional ORC Chassis Example (Front View) (the router blade slots are numbered 0 through 7)
Note: The NUMAlink unit’s CMC is connected to the CMC in each IRU installed in the rack.
SGI UV 2000 External NUMAlink System Technical Specifications
Table 5-1 lists the basic technical specifications of the SGI UV 2000 series external NUMAlink ORC chassis.
Table 5-1 External NUMAlink Technical Specifications
Characteristic / Specification
Height: 7U or 12.25 in. (31.1 cm)
Width: 13.83 in. (35.13 cm)
Depth: 14.66 in. (37.24 cm)
Top-mount NUMAlink router weight (approximate): 53 lbs. (24.1 kg) not including attached cables
Power supply: Three 760-Watt hot-plug power supplies
Voltage range (nominal): 100-240 VAC (North America) / 230 VAC (International)
Frequency (nominal): 60 Hz (North America) / 50 Hz (International)
Frequency (tolerance range): 47-63 Hz
Phase required: Single-phase
Power cables: 6.5 ft. (2 m) pluggable cords
Chapter 6
6. Add or Replace Procedures
This chapter provides information about installing and removing PCIe cards and system disk drives from your SGI system, as follows:
“Maintenance Precautions and Procedures” on page 61
“Adding or Replacing PCIe Cards in the Expansion Enclosure” on page 64
“Removing and Replacing an IRU Enclosure Power Supply” on page 68
Maintenance Precautions and Procedures
This section describes how to open the system for maintenance and upgrade, protect the components from static damage, and return the system to operation. The following topics are covered:
“Preparing the System for Maintenance or Upgrade” on page 62
“Returning the System to Operation” on page 62
Warning: To avoid problems that could void your warranty, your SGI or other approved system support engineer (SSE) should perform all the setup, addition, or replacement of parts, cabling, and service of your SGI UV 2000 system, with the exception of the following:
Using your system console or network access workstation to enter commands and perform system functions such as powering on and powering off, as described in this guide.
Installing, removing or replacing cards in the optional 1U PCIe expansion chassis.
Adding and replacing disk drives used with your system and using the ESI/ops panel (operating panel) on optional mass storage.
Removing and replacing IRU power supplies.
Preparing the System for Maintenance or Upgrade
To prepare the system for maintenance, follow these steps:
1. If you are logged on to the system, log out. Follow standard procedures for gracefully halting the operating system.
2. Go to the section “Powering-On and Off From the SGI Management Center Interface” in Chapter 1 if you are not familiar with power down procedures.
3. After the system is powered off, locate the power distribution unit(s) (PDUs) in the front of the rack and turn off the circuit breaker switches on each PDU.
Note: Powering the system off is not a requirement when replacing a RAID 1 system disk. Addition of a non-RAID disk can be accomplished while the system is powered on, but the disk is not automatically recognized by system software.
Returning the System to Operation
When you finish installing or removing components, return the system to operation as follows:
1. Turn each of the PDU circuit breaker switches to the “on” position.
2. Power up the system. If you are not familiar with the proper power-up procedure, review the section “Powering-On and Off From the SGI Management Center Interface” in Chapter 1.
3. Verify that the LEDs on the system power supplies and system blades turn on and illuminate green, which indicates that the power-on procedure is proceeding properly.
If your system does not boot correctly, see “Troubleshooting Chart” in Chapter 7, for troubleshooting procedures.
Overview of PCI Express (PCIe) Operation
This section provides a brief overview of the PCI Express (PCIe) technology available as an option with your system. PCI Express has both compatibility and differences with older PCI/PCI-X technology. Check with your SGI sales or service representative for more detail on specific PCI Express board options available with the SGI UV 2000.
PCI Express is compatible with PCI/PCI-X in the following ways:
Compatible software layers
Compati ble device driver models
Same basic board form factors
PCIe controlled devices appear the same as PCI/PCI-X devices to most software
PCI Express technology is different from PCI/PCI-X in the following ways:
PCI Express uses a point-to-point serial interface vs. a shared parallel bus interface used in
older PCI/PCI-X technology
PCIe hardware connectors are not compatible with PCI/PCI-X, (see Figure 6-1)
Potential sustained throughput of x16 PCI Express is approximately four times that of the fastest PCI-X throughputs
Figure 6-1 Comparison of PCI/PCI-X Connector with PCI Express Connectors (the connectors shown are PCI 2.0 32-bit, PCI Express x1, and PCI Express x16)
PCI Express technology uses two pairs of wires for each transmit and receive connection (4 wires total). These four wires are generally referred to as a lane or x1 connection - also called “by 1”.
SGI UV 2000 PCIe technology is available up to a x16 connector (64 wires) or “by 16” in PCI Express card slots. This technology will support PCIe boards that use connectors up to x16 in size. Table 6-1 shows this concept.
For information on which slots in the PCIe expansion chassis support what lane levels, see Table 6-2 on page 65.
Table 6-1 SGI UV 2000 PCIe Support Levels
SGI x16 PCIe Connectors / Support levels in optional chassis
x1 PCIe cards: Supported in all four slots
x2 PCIe cards: Supported in all four slots
x4 PCIe cards: Supported in all four slots
x8 PCIe cards: Supported in two slots
x16 PCIe cards: 1 slot supported
x32 PCIe cards: Not supported
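A quick illustrative check of the lane arithmetic described above (each lane uses four wires); this is simple arithmetic, not an SGI utility.

def pcie_wires(lanes):
    # Each PCIe lane uses two differential pairs: four wires in total.
    return lanes * 4

for lanes in (1, 2, 4, 8, 16):
    print("x%d: %d wires" % (lanes, pcie_wires(lanes)))  # x16 -> 64 wires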
Adding or Replacing PCIe Cards in the Expansion Enclosure
Warning: Before installing, operating, or servicing any part of this product, read the “Safety Information” on page 89.
This section provides instructions for adding or replacing a PCIe card in a PCIe expansion enclosure installed in your system. To maximize the operating efficiency of your cards, be sure to read all the introductory matter before beginning the installation.
Caution: To protect the PCIe cards from ESD damage, SGI recommends that you use a grounding wrist strap while installing a PCIe card.
Installing Cards in the 1U PCIe Expansion Chassis
The PCIe expansion chassis functions in a similar manner to a computer chassis that supports PCIe slots. Always follow the manufacturer’s instructions or restrictions for installing their card.
Important: Replacement (swapping) of a PCIe card in the 1U chassis may be done while the system is powered on. Addition of a new card while the system is running requires a reboot to initiate recognition and functionality. Removal (without replacement) of an existing PCIe card may cause system error messages. When installing PCIe cards, ensure that the input current rating specified on the AC input label is not exceeded.
The EB4-1U-SGI chassis provides space for up to four (4) PCIe cards in the following “lane” bandwidth configurations:
Table 6-2 PCIe Expansion Slot Bandwidth Support Levels
PCIe expansion enclosure slot # / PCIe connector level supported by slot / PCIe slot number location in board “carriage”
Slot 1: Up to x16, Bottom-left side
Slot 2: Up to x4, Top-left side
Slot 3: Up to x8, Top-right side
Slot 4: Up to x4, Bottom-right side
Note: Before installing the PCIe expansion cards, be sure to remove each respective slot cover and use its screw to secure your expansion card in place.
1. Working from the front of the expansion chassis, locate the two “thumb screws” that hold
the PCIe board “carriage” in the expansion chassis.
2. Turn the two thumb screws counter-clockwise until they disengage from the 1U chassis.
3. Pull the T-shaped board “carriage” out of the chassis until the slots are clear of the unit.
4. Select an available slot based on the lane support your PCIe card requires, see Table 6-2.
5. Remove the metal slot cover from the selected slot and retain its screw.
6. Fit the PCIe card into the slot connector with the connector(s) extending out the front of the bracket, then secure the board with the screw that previously held the metal slot cover.
7. Push the PCIe board “carriage” back into the enclosure until it is seated and twist the retaining thumb screws clockwise (right) until fully secure.
Important: After installation, be sure to power on the PCIe expansion enclosure before re-booting your system.
Figure 6-2 The PCIe Expansion Enclosure
Figure 6-3 Card Slot Locations
Removing and Replacing an IRU Enclosure Power Supply
To remove and replace power supplies in an SGI UV 2000 IRU, you do not need any tools. Under most circumstances a single power supply in an IRU can be replaced without shutting down the enclosure or the complete system. In the case of a fully configured (loaded) enclosure, this may not be possible.
Caution: The body of the power supply may be hot; allow time for cooling and handle with care.
Use the following steps to replace a power supply in the blade enclosure box:
1. Open the front door of the rack and locate the power supply that needs replacement.
2. Disengage the power-cord retention clip and disconnect the power cord from the power supply that needs replacement.
Figure 6-4 Removing an Enclosure Power Supply (callout: press latch to release)
3. Press the retention latch of the power supply toward the power connector to release the
supply from the enclosure, see Figure 6-4 on page 68.
4. Using the power supply handle, pull the power supply straight out until it is partly out of the
chassis. Use one hand to support the bottom of the supply as you fully extract it from the enclosure.
5. Align the rear of the replacement power supply with the enclosure opening.
6. Slide the power supply into the chassis until the retention latch engages - you should hear an
audible click.
7. Reconnect the power cord to the supply and engage the retention clip.
Note: If AC power to the rear fan assembly is disconnected prior to the replacement procedure, all the fans will come on and run at top speed when power is reapplied. The speeds will readjust when normal communication with the IRU’s CMC is fully established.
Chapter 7
7. Troubleshooting and Diagnostics
This chapter provides the following sections to help you troubleshoot your system:
“Troubleshooting Chart” on page 72
“LED Status Indicators” on page 73
“SGI Electronic Support” on page 75
Troubleshooting Chart
Table 7-1 lists recommended actions for problems that can occur. To solve problems that are not listed in this table, use the SGI Electronic Support system or contact your SGI system support representative. For more information about the SGI Electronic Support system, see the “SGI Electronic Support” on page 75. For an international list of SGI support centers, see:
http://www.sgi.com/support/supportcenters.html
Table 7-1 Troubleshooting Chart
Problem Description / Recommended Action
The system will not power on: Ensure that the power cords of the IRU are seated properly in the power receptacles. Ensure that the PDU circuit breakers are on and properly connected to the wall source. If the power cord is plugged in and the circuit breaker is on, contact your technical support organization.
An individual IRU will not power on: Ensure the power cables of the IRU are plugged in. Confirm the PDU(s) supporting the IRU are on.
No status LEDs are lighted on an individual blade: Confirm the blade is firmly seated in the IRU enclosure. See also “Compute/Memory Blade LEDs” on page 74.
The system will not boot the operating system: Contact your SGI support organization: http://www.sgi.com/support/supportcenters.html
The amber (yellow) status LED of an IRU power supply is lit or the LED is not lit at all: Ensure the power cable to the supply is firmly connected at both ends and that the PDU is turned on. Check and confirm the supply is fully plugged in. If the green LED does not light, contact your support organization. See Table 7-2 on page 73.
The PWR LED of a populated PCIe slot is not illuminated: Reseat the PCI card.
The Fault LED of a populated PCIe slot is illuminated (on): Reseat the card. If the fault LED remains on, replace the card.
The amber LED of a disk drive is on: Replace the disk drive.
LED Status Indicators
There are a number of LEDs on the front of the IRUs that can help you detect, identify and potentially correct functional interruptions in the system. The following subsections describe these LEDs and ways to use them to understand potential problem areas.
IRU Power Supply LEDs
Each power supply installed in an IRU has a bi-color status LED. The LED will either light green or amber (yellow), or flash green or yellow to indicate the status of the individual supply. See Table 7-2 for a complete list.
Table 7-2 Power Supply LED States
Power supply status / Green LED / Amber LED
No AC power to the supply: Green LED Off, Amber LED Off
Power supply has failed: Green LED Off, Amber LED On
Power supply problem warning: Green LED Off, Amber LED Blinking
AC available to supply (standby) but IRU is off: Green LED Blinking, Amber LED Off
Power supply on (IRU on): Green LED On, Amber LED Off
Compute/Memory Blade LEDs
Each compute/memory blade installed in an IRU has a total of seven LED indicators visible behind the perforated sheet metal of the blade.
At the bottom end (or left side) of the blade (from left to right):
System power good green LED
BMC heartbeat green LED
Blue unit identifier (UID) LED
BMC Ethernet 1 green LED
BMC Ethernet 0 green LED
Green 3.3V auxiliary power LED
Green 12V power good LED
If the blade is properly seated and the system is powered on and there is no LED activity showing on the blade, it must be replaced. Figure 7-1 shows the locations of the blade LEDs.
Figure 7-1 UV Compute Blade Status LED Locations Example
SGI Electronic Support
SGI Electronic Support provides system support and problem-solving services that function automatically, which helps resolve problems before they can affect system availability or develop into actual failures. SGI Electronic Support integrates several services so they work together to monitor your system, notify you if a problem exists, and search for solutions to problems.
Figure 7-2 shows the sequence of events that occurs if you use all of the SGI Electronic Support capabilities.
Figure 7-2 Full Support Sequence Example (the diagram shows the flow from the customer’s system (1), to e-mail notification (2), to the SGI global customer support center (3), to the SGI Knowledgebase (4), to a page or e-mail alert with the case and solutions viewed via Supportfolio Online by the SGI customer and SGI support engineer (5), to implementing the solution (6))
The sequence of events can be described as follows:
1. Embedded Support Partner (ESP) software monitors your system 24 hours a day.
2. When a specified system event is detected, ESP notifies SGI via e-mail (plain text or encrypted).
3. Applications that are running at SGI analyze the information, determine whether a support case should be opened, and open a case if necessary. You and SGI support engineers are contacted (via pager or e-mail) with the case ID and problem description.
4. SGI Knowledgebase searches thousands of tested solutions for possible fixes to the problem. Solutions that are located in SGI Knowledgebase are attached to the service case.
5. You and the SGI support engineers can view and manage the case by using Supportfolio Online as well as search for additional solutions or schedule maintenance.
6. Implement the solution.
Most of these actions occur automatically, and you may receive solutions to problems before they affect system availability. You also may be able to return your system to service sooner if it is out of service.
In addition to the event monitoring and problem reporti ng, SGI Electroni c Support monit ors both system configuration (to help with asset management) and system availability and performance (to help with capacity planning).
The following three components compose the integrated SGI Electronic Support system:
SGI Embedded Support Partner (ESP) is a set of tools and utilities that are embedded in the SGI Linux ProPack release. ESP can monitor a single system or group of systems for system events, software and hardware failures, availability, performance, and configuration changes, and then perform actions based on those events. ESP can detect system conditions that indicate potential problems, and then alert appropriate personnel by pager, console messages, or e-mail (plain text or encrypted). You also can configure ESP to notify an SGI call center about problems; ESP then sends e-mail to SGI with information about the event.
SGI Knowledgebase is a database of solutions to problems and answers to questions that can be searched by sophisticated knowledge management tools. You can log on to SGI Knowledgebase at any time to describe a problem or ask a question. Knowledgebase searches thousands of possible causes, problem descriptions, fixes, and how-to instructions for the solutions that best match your description or question.
Supportfolio Online is a customer support resource that includes the latest information about patch sets, bug reports, and software releases.
The complete SGI Electronic Support services are available to customers who have a valid SGI Warranty, FullCare, FullExpress, or Mission-Critical support contract. To purchase a support contract that allows you to use the complete SGI Electronic Support services, contact your SGI sales representative. For more information about the various support contracts, see the following Web page:
http://www.sgi.com/support
For more information about SGI Electronic Support, see the following Web page:
http://www.sgi.com/support/es
Appendix A
A. Technical Specifications and Pinouts
This appendix contains technical specification information about your system, as follows:
“System-level Specifications” on page 79
“Physical Specifications” on page 80
“Environmental Specifications” on page 81
“Power Specifications” on page 82
“I/O Port Specifications” on page 83
System-level Specifications
Table A-1 summarizes the SGI UV 2000 system configuration ranges. Note that while each compute/memory blade holds two processor sockets (one per node board), each socket can support four, six, or eight processor “cores”.
Table A-1 SGI UV 2000 System Configuration Ranges
Category / Minimum / Maximum
Processors: 32 processor cores (2 blades) (a) minimum; 2,048 processor cores (per SSI) maximum
Individual Rack Units (IRUs): 1 per rack minimum; 4 per rack maximum
Blades per IRU: 2 per IRU minimum; 8 per IRU maximum
Compute/memory blade DIMM capacity: 8 DIMMs per blade minimum; 16 DIMMs per blade maximum
CMC units: 1 per IRU minimum; 1 per IRU maximum
Number of BaseIO riser enabled blades: One per SSI minimum; One per SSI maximum
a. Dual-node blades support eight to 16 cores per blade.
Physical Specifications
Table A-2 shows the physical specifications of the SGI UV 2000 system.
Table A-2 SGI UV 2000 Physical Specifications
Feature / Specification
Dimensions for a single 24-inch wide tall rack, including doors and side panels: Height: 79.5 in. (201.9 cm); Width: 31.3 in. (79.5 cm); Depth: 43.45 in. (110.4 cm)
Shipping dimensions: Height: 81.25 in. (206.4 cm); Width: 42 in. (106.7 cm); Depth: 52 in. (132.1 cm)
Single-rack shipping weight (approximate): 2,381 lbs. (1,082 kg) air cooled; 2,581 lbs. (1,173 kg) water assist cooling
Single-rack system weight (approximate): 2,300 lbs. (1,045 kg) air cooled; 2,500 lbs. (1,136 kg) water assist cooling
Access requirements: Front 48 in. (121.9 cm); Rear 48 in. (121.9 cm); Side None
10U-high Individual Rack Unit (IRU) enclosure specifications: Dimensions: 17.5 in. high x 19 in. (flange width) wide x 27 in. deep (44.45 cm high x 48.26 cm wide x 68.58 cm deep)
Note: Racks equipped with optional top-mounted NUMAlink (ORC) routers have an additional weight of 53 lbs. (24.1 kg) plus the weight of additional cables.