Sun Microsystems Sun Fire 15K, Sun Fire 12K User Guide

Sun™ Fire 15K/12K Dynamic
Reconfiguration (DR) User Guide
Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A.
Part No. 816-5075-12 January 2003, Revision A
Send comments about this document to: docfeedback@sun.com
Copyright 2003 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved. Sun Microsystems, Inc. has intellectual property rights relating to technology embodied in the product that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents, and one or more additional patents or pending patent applications in the U.S. and in other countries.
This document and the product to which it pertains are distributed under licenses restricting their use, copying, distribution, and decompilation. No part of the product or of this document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any.
Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, AnswerBook2, docs.sun.com, and Solaris are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements.
Use, duplication, or disclosure by the U.S. Government is subject to restrictions set forth in the Sun Microsystems, Inc. license agreements and as provided in DFARS 227.7202-1(a) and 227.7202-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (Oct. 1998), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Contents
Preface
    Before You Read This Book
    How This Book Is Organized
    Using UNIX Commands
    Typographic Conventions
    Shell Prompts
    Related Documentation
    Accessing Sun Documentation Online
    Sun Welcomes Your Comments
1. Introduction to DR on the Sun Fire 15K/12K Server
    What Is DR?
    Where You Execute DR Commands
    Command Line Interface (CLI)
    Graphical User Interface (GUI)
    Automatic DR
    Enhanced System Availability
    DR Concepts
    Detachability
    Quiescence
    Suspend-Safe and Suspend-Unsafe Devices
    Attachment Points
    Conditions and States
    DR Operations
    Hot-Plug Hardware
    Sun Fire 15K/12K Domains
    Component Types
    DR on I/O Boards
    Solving a Problem With an I/O Device
    Golden IOSRAM
    DR on hsPCI+ I/O Boards
    Permanent and Non-permanent Memory
    Target Memory Constraints
    Correctable Memory Errors
    Capacity on Demand (COD)
    DR on COD Boards
    Enabling DR on Domains Running the Solaris 8 2/02 Operating Environment
    An Illustration of DR Concepts
2. DR State and Condition Models
    Board States and Conditions
    Board Slot States
    Board Occupant States
    Board Conditions
    Component States and Conditions
    Component Receptacle States
    Component Occupant States
    Component Conditions
3. DR Operations and Software Components on the Domain
    DR Operations
    Before You Perform DR Operations
    Before Performing DR Operations on I/O Boards
    Connect Operation
    Configure Operation
    CPUs and Memory
    I/O Boards
    After the Configure Operation
    Disconnect Operation
    Unconfigure Operation
    Non-permanent Memory
    Permanent Memory
    Software Components
    Domain Configuration Server
    DR Driver
    Reconfiguration Coordination Manager
    System Events Framework
4. DR User Interfaces on the Domain
    DR Commands and Options on the Domain
    State Change Functions
    Availability Change Functions
    Condition Change Functions
    Options and Operands
5. DR Domain Procedures
    Attachment Points
    Displaying Board Status
    Basic Status Display
    Detailed Status Display
    Removing a Board
    To Remove a CPU/Memory Board
    To Remove an I/O Board
    Adding a Board
    To Install a Board
    DR Using cfgadm(1M) - Examples
    Displaying Help
    Displaying Verbose Messages
    Suppressing User Confirmation
    Power Control When Disconnecting Boards
    Power Control of Disconnected Boards
    Connecting and Configuring Boards
    Hot Plugging PCI Adapter Cards
    Testing a Board
    Displaying Attachment Point Information
    Tracking Memory Unconfigure Operations
    Finding the Board Containing Permanent Memory
Preface
This book describes the Dynamic Reconfiguration (DR) feature of the Sun™ Fire 15K and Sun Fire 12K systems. DR enables you to attach system boards to and detach them from Sun Fire 15K/12K domains while the Solaris operating environment continues to run.
Before You Read This Book
This book is intended for the Sun Fire 15K/12K system administrator who has a working knowledge of UNIX® systems, particularly those based on the Solaris™ operating environment. If you do not have such knowledge, first read the Solaris user and system administrator books provided with this system and consider UNIX system administration training.
How This Book Is Organized
This book contains the following chapters:
Chapter 1, “Introduction to DR on the Sun Fire 15K/12K Server”
Chapter 2, “DR State and Condition Models”
Chapter 3, “DR Operations and Software Components on the Domain”
Chapter 4, “DR User Interfaces on the Domain”
Chapter 5, “DR Domain Procedures”
Using UNIX Commands
This document may not contain information on basic UNIX® commands and procedures such as shutting down the system, booting the system, and configuring devices.
See one or more of the following for this information:
Online documentation for the Solaris™ software environment
Other software documentation that you received with your system
Typographic Conventions
Typeface or Symbol: AaBbCc123 (monospace)
Meaning: The names of commands, files, and directories; on-screen computer output
Examples: Edit your .login file. Use ls -a to list all files. % You have mail.

Typeface or Symbol: AaBbCc123 (bold monospace)
Meaning: What you type, when contrasted with on-screen computer output
Examples: % su
          Password:

Typeface or Symbol: AaBbCc123 (italic)
Meaning: Book titles, new words or terms, words to be emphasized. Command-line variable; replace with a real name or value
Examples: Read Chapter 6 in the User’s Guide. These are called class options. You must be superuser to do this. To delete a file, type rm filename.
Shell Prompts
Shell                                    Prompt
C shell                                  machine_name%
C shell superuser                        machine_name#
Bourne shell and Korn shell              $
Bourne shell and Korn shell superuser    #
Related Documentation
Application: SMS-related DR user information
Title: System Management Services (SMS) 1.3 Dynamic Reconfiguration User Guide
Part Number: 816-7723

Application: SMS Administration Guide
Title: System Management Services (SMS) 1.3 Administrator Guide
Part Number: 816-5319

Application: Platform-specific release notes
Title: Solaris 9 4/03 Release Notes Supplement for Sun Hardware
Part Number: 817-1106

Application: SMS Release Notes
Title: System Management Services (SMS) 1.3 Release Notes
Part Number: 816-5321

Application: DR Webpage
Title: http://www.sun.com/servers/highend/dr_sunfire
Part Number: n/a
Accessing Sun Documentation Online
You can view and print a broad selection of Sun™ documentation, including localized versions, at:
http://www.sun.com/documentation
You can also purchase printed copies of select Sun documentation from iUniverse, the Sun documentation provider, at:
http://corppub.iuniverse.com/marketplace/sun
Sun Welcomes Your Comments
Sun is interested in improving its documentation and welcomes your comments and suggestions. You can email your comments to Sun at:
docfeedback@sun.com
Please include the part number of this document (816-5075-12) in the subject line of your email.
CHAPTER 1

Introduction to DR on the Sun Fire 15K/12K Server

This chapter describes general concepts that pertain to the Dynamic Reconfiguration (DR) feature on the Sun Fire 15K and Sun Fire 12K servers.

What Is DR?

DR on the Sun Fire 15K/12K server enables you to perform hardware configuration changes to a live domain that is running the Solaris operating environment, without causing machine downtime. You can also use DR in conjunction with hot-swap to physically add boards to or remove them from the server.

Where You Execute DR Commands

You can execute DR operations from the Sun Fire 15K/12K system controller (SC) by using the System Management Services (SMS) commands addboard(1M), moveboard(1M), deleteboard(1M), and rcfgadm(1M), or from the domain by using the cfgadm(1M) command. DR operations using SMS commands are described in Chapter 5, “DR Domain Procedures.”
Note – If the addboard(1M), moveboard(1M), deleteboard(1M), rcfgadm(1M), or cfgadm(1M) command fails during a DR operation, the board does not return to its original state. A dxs or dca error message is logged to the domain. If the error is recoverable, you can retry the command. If the error is unrecoverable, you must reboot the domain to use the board.

Command Line Interface (CLI)

The DR software has a command line interface through the cfgadm(1M) command, which is the configuration administration program. The DR agent also provides a remote interface to the Sun Management Center 3.0 software.

Graphical User Interface (GUI)

The optional Sun Management Center 3.0 Platform Update 4 software, which is designed for these systems, provides features such as domain management, as well as a graphical user interface (GUI) where you perform DR operations. If you prefer a graphical user interface to a command line interface, use the Sun Management Center 3.0 software.
To use the Sun Management Center 3.0 Platform Update 4 software, you must attach the system controller board to a network. With a network connection, you can view both the command line interface and the graphical user interface. For instructions on how to use the Sun Management Center 3.0 Platform Update 4 software, refer to the Sun Management Center 3.0 User’s Guide, shipped with the Sun Management Center 3.0 Platform Update 4 software. For instructions on how to connect the system controller board to a network, see your system’s installation documentation.

Automatic DR

Automatic DR enables an application to execute DR operations without requiring user interaction. This ability is provided by an enhanced DR framework that includes the reconfiguration coordination manager (RCM) and the system event facility, called sysevent. The RCM enables application-specific loadable modules to register callbacks. The callbacks perform preparatory tasks before a DR operation, error recovery during a DR operation, or clean-up after a DR operation. The sysevent facility enables applications to register for system events and receive notifications of those events. The automatic DR framework interfaces with the RCM and with the sysevent facility to enable applications to automatically give up resources prior to unconfiguring them and to capture new resources as they are configured into the domain.
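The callback model described above can be sketched in a few lines of Python. This is only an illustration of the idea of registering callbacks that run before a DR operation, on error, and after it. The class CallbackRegistry, its phase names, and the board name SB3 are hypothetical and are not part of the RCM or sysevent interfaces.

```python
from collections import defaultdict


class CallbackRegistry:
    """Toy stand-in for an RCM-style registry (hypothetical, not the real RCM API)."""

    PHASES = ("before", "error", "after")

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register(self, phase, callback):
        """Register a callback for one of the three phases."""
        if phase not in self.PHASES:
            raise ValueError(f"unknown phase: {phase}")
        self._callbacks[phase].append(callback)

    def run_operation(self, resource, operation):
        """Run a DR-like operation with callbacks around it; return True on success."""
        for cb in self._callbacks["before"]:
            cb(resource)            # preparatory tasks, such as releasing the resource
        try:
            operation(resource)
        except Exception:
            for cb in self._callbacks["error"]:
                cb(resource)        # error recovery
            return False
        for cb in self._callbacks["after"]:
            cb(resource)            # clean-up
        return True


log = []
registry = CallbackRegistry()
registry.register("before", lambda r: log.append(f"release {r}"))
registry.register("after", lambda r: log.append(f"cleanup {r}"))
ok = registry.run_operation("SB3", lambda r: log.append(f"unconfigure {r}"))
```

In this sketch an application gives up a resource in its "before" callback, which mirrors how the text describes applications releasing resources prior to an unconfigure operation.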

Enhanced System Availability

The DR feature enables you to hot-swap system boards without bringing the server down. It is used to unconfigure the resources on a faulty system board from a domain so that the system board can be removed from the server. The repaired or replacement board can be inserted into the domain while the Solaris operating environment continues to run. DR then configures the resources on the board into the domain. If you use the DR feature to add or remove a system board or component, DR always leaves the board or component in a known configuration state. See Chapter 2, “DR State and Condition Models,” for more information about configuration states for system boards and components.

DR Concepts

This section contains descriptions of general DR concepts that pertain to Sun Fire 15K/12K domains. For more information about DR concepts on the SC, refer to the System Management Services (SMS) 1.3 Dynamic Reconfiguration User Guide.

Detachability

For a device to be detachable, it must meet the following requirements:
• The device driver must support DDI_DETACH.
• Critical resources must be redundant or accessible through an alternate pathway. CPUs and memory banks can be redundant critical resources. Disk drives are examples of critical resources that can be accessible through an alternate pathway.
Some boards cannot be detached because their resources cannot be moved. For example, if a domain has only one CPU board, that CPU board cannot be detached. An I/O board is not detachable if it controls the boot drive.
If there is no alternate pathway for an I/O board, you can:
• Put the disk chain on a separate I/O board. The secondary I/O board can then be detached.
• Add a second path to the device through a second I/O board so that the I/O board can be detached without losing access to the secondary disk chain.
Note – If you are unsure whether a device is detachable, consult your Sun service representative.
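The two detachability requirements can be expressed as a small predicate. The following Python sketch is a simplified model for reasoning about the rules only. The Device class and its field names are assumptions made for illustration; DR itself does not expose such an interface.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical description of a device, for illustration only."""
    name: str
    driver_supports_detach: bool      # the driver supports DDI_DETACH
    is_critical: bool                 # the domain depends on this resource
    is_redundant: bool = False        # another unit can take over its role
    has_alternate_path: bool = False  # reachable through another board


def is_detachable(dev: Device) -> bool:
    """Apply the two requirements listed in the text."""
    if not dev.driver_supports_detach:
        return False
    if dev.is_critical:
        # Critical resources must be redundant or have an alternate pathway.
        return dev.is_redundant or dev.has_alternate_path
    return True


only_cpu_board = Device("only CPU board", True, True)
second_cpu_board = Device("CPU board with a peer", True, True, is_redundant=True)
dual_path_disk = Device("dual-pathed disk", True, True, has_alternate_path=True)
```

Here is_detachable(only_cpu_board) is False, matching the single-CPU-board example in the text, while the redundant board and the dual-pathed disk both pass the check.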

Quiescence

During the unconfigure operation on a system board with permanent memory (OpenBoot™ PROM or kernel memory), the operating environment is briefly paused, which is known as operating environment quiescence. All operating environment and device activity on the domain must cease during this critical phase of the operation.
Before it can achieve quiescence, the operating environment must temporarily suspend all processes, CPUs, and device activities. If the operating environment cannot achieve quiescence, it displays the reasons, which may include the following:
• An execution thread did not suspend.
• A device exists that cannot be paused by the operating environment.
Note – Real-time processes do not prevent quiescence.
The conditions that cause processes to fail to suspend are generally temporary. Examine the reasons for any failure; if the operating environment failed to suspend a process, retry the operation.

Suspend-Safe and Suspend-Unsafe Devices

When DR suspends the operating environment, all of the device drivers that are attached to the operating environment must also be suspended. If a driver cannot be suspended (or subsequently resumed), the DR operation fails.
A suspend-safe device does not access memory or interrupt the system while the operating environment is in quiescence. A driver is suspend-safe if it supports operating environment quiescence (if it can be suspended and then resumed). A suspend-safe driver also guarantees that when a suspend request is successfully completed, the device that the driver manages will not attempt to access memory, even if the device is open when the suspend request is made.
A suspend-unsafe device allows a memory access or a system interruption to occur while the operating environment is in quiescence.
DR uses an unsafe driver list in the dr.conf file to prevent unsafe devices from accessing memory or interrupting the operating environment during a DR operation. The dr.conf file resides in the following directory: /platform/SUNW,Sun-Fire-15000/kernel/drv/. The unsafe driver list is a property in the dr.conf file with the following format:
unsupported-io-drivers="driver1","driver2","driver3";
DR reads this list when it prepares to suspend the operating environment so that it can unconfigure a memory component. If DR finds an active driver in the unsafe driver list, it aborts the DR operation and returns an error message. The message includes the identity of the active, unsafe driver. You must manually remove the usage of the device by performing one or more of the following tasks:
• Kill the processes using the device.
• Unload the driver by using the modunload(1M) command.
• Disconnect the cables (depending on the type of device).
You can retry the DR operation after you have stopped usage of the device.
Note – If you are unsure whether a device is suspend-safe, contact your Sun service representative.
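To see in advance which drivers would block a suspend, you can read the unsafe driver list yourself. The following Python sketch parses the unsupported-io-drivers property in the format shown earlier and intersects it with a list of active drivers. It assumes the property is written with straight double quotes and ends with a semicolon. The function names and the sample driver names are hypothetical, and how you obtain the list of active drivers is left open.

```python
import re


def parse_unsafe_drivers(conf_text: str) -> list[str]:
    """Return the driver names listed in an unsupported-io-drivers property."""
    # Drop comment lines so a commented-out property is not picked up.
    lines = [ln for ln in conf_text.splitlines()
             if not ln.lstrip().startswith("#")]
    match = re.search(r'unsupported-io-drivers\s*=\s*([^;]*);', "\n".join(lines))
    if match is None:
        return []
    return re.findall(r'"([^"]+)"', match.group(1))


def blocking_drivers(conf_text: str, active_drivers: list[str]) -> list[str]:
    """Return the active drivers that also appear in the unsafe driver list."""
    unsafe = set(parse_unsafe_drivers(conf_text))
    return sorted(unsafe & set(active_drivers))


# Sample text in the dr.conf format; the driver names are placeholders.
sample = '# example only\nunsupported-io-drivers="driver1","driver2","driver3";\n'
unsafe_list = parse_unsafe_drivers(sample)
blockers = blocking_drivers(sample, ["driver2", "otherdrv"])
```

With the sample text, only driver2 is both active and listed as unsafe, so it is the one whose usage would have to be stopped before retrying the DR operation.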

Attachment Points

An attachment point is a collective term that refers to a board slot, a system board installed in the slot, and any devices connected to the board. DR can display the status of the board, the board slot, and the attachment point. The term occupant refers to the combination of a board and its attached devices.
• A board slot (sometimes referred to as a receptacle) has the ability to electrically isolate the occupant from the host machine. The software can put a board slot into low-power mode.
• Board slots can be named according to slot numbers, or can be anonymous (for example, a SCSI chain).
• An occupant I/O board includes any external storage devices connected by interface cables.
There are two types of names for attachment points:
• A physical attachment point describes the software driver and location of the slot. Examples of physical attachment point names are:
/devices/pseudo/dr@0:SBx (for a CPU/memory board in slot 0)
-OR-
/devices/pseudo/dr@0:IOx (for an I/O board or Max CPU board in slot 1)
where x represents the expander number (0 through 17 on the Sun Fire 15K system, and 0 through 8 on the Sun Fire 12K system) for a particular board.
Note – CPU/memory boards are installed only in slot 0. I/O boards and Max CPU boards are installed only in slot 1.
• A logical attachment point is an abbreviated name created by the system to refer to the physical attachment point. Logical attachment points take one of the following two forms:
SBx (for CPU/memory boards in slot 0)
-OR-
IOx (for I/O boards or Max CPU boards in slot 1)
To obtain a list of all available logical attachment points, use the cfgadm(1M) command with its -l option.
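The relationship between the two naming forms can be sketched as a small Python helper that validates a logical attachment point name and builds the matching physical name. This is an illustration of the naming rules stated above, not a system interface. The function name, the model labels "15K" and "12K", and the error messages are assumptions.

```python
import re

# Number of expanders per model, from the ranges 0-17 and 0-8 given in the text.
EXPANDERS = {"15K": 18, "12K": 9}


def physical_attachment_point(logical: str, model: str = "15K") -> str:
    """Map a logical name such as SB3 or IO17 to its physical attachment point."""
    match = re.fullmatch(r"(SB|IO)(\d+)", logical)
    if match is None:
        raise ValueError(f"not a logical attachment point: {logical!r}")
    kind, expander = match.group(1), int(match.group(2))
    if expander >= EXPANDERS[model]:
        raise ValueError(f"expander {expander} is out of range for the {model}")
    return f"/devices/pseudo/dr@0:{kind}{expander}"


sb3 = physical_attachment_point("SB3")
io17 = physical_attachment_point("IO17", model="15K")
```

A name such as IO9 is accepted for the Sun Fire 15K model but rejected for the Sun Fire 12K model, whose expander numbers stop at 8.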

Conditions and States

A state is the operational status of either a board slot or its occupant. A condition is the operational status of an attachment point. The cfgadm(1M) command can display nine types of states and conditions. See Chapter 2, “DR State and Condition Models,” for descriptions of the conditions and states for system boards and components.

DR Operations

There are four main types of operations related to boards: connection, configuration, unconfiguration, and disconnection. A board that is brought into a domain is first connected and then configured. A board that is removed from a domain is first unconfigured and then disconnected.
During the connect operation, the system provides power to the slot, and the operating environment begins monitoring the board’s temperature.
During the configure operation, the operating environment assigns functional roles to the board, and loads device drivers for the board and for devices attached to it.
During the unconfigure operation, the system detaches the board logically from the operating environment and takes the associated device drivers offline. Environmental monitoring continues, but devices on the board are not available for system use.
During the disconnect operation, the system stops monitoring the board and power to the slot is turned off.
To power off a board that is in use (configured), first stop its use (unconfigure it), and then disconnect it from the domain. After a new or upgraded system board is inserted into the slot, connect the board and configure it.
The cfgadm(1M) command can connect and configure (or unconfigure and disconnect) in a single command. To connect and configure a board using a single command, see the section “Adding a Board” in Chapter 5. To unconfigure and disconnect a board using a single command, see the section “Removing a Board” in Chapter 5.
If necessary, each operation (connect, configure, unconfigure, or disconnect) can be performed separately using the cfgadm(1M) command.
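The required ordering of the four operations can be modeled as a small state machine. The following Python sketch uses three simplified stage names (disconnected, connected, configured) chosen for illustration; they are not the full set of states and conditions that cfgadm(1M) reports, which Chapter 2 describes.

```python
# Allowed transitions: (current stage, operation) -> next stage.
# The stage names are a simplification made for this sketch.
TRANSITIONS = {
    ("disconnected", "connect"): "connected",
    ("connected", "configure"): "configured",
    ("configured", "unconfigure"): "connected",
    ("connected", "disconnect"): "disconnected",
}


def next_state(state: str, operation: str) -> str:
    """Return the stage reached by one operation, or raise if it is out of order."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(f"cannot {operation} a board that is {state}") from None


def run(state: str, operations: list[str]) -> str:
    """Apply a sequence of operations in order."""
    for op in operations:
        state = next_state(state, op)
    return state


added = run("disconnected", ["connect", "configure"])
removed = run("configured", ["unconfigure", "disconnect"])
```

The sketch rejects a disconnect on a configured board because it models each step separately. As noted above, cfgadm(1M) can carry out both steps with a single command, so the sketch captures only the order in which the steps occur.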

Hot-Plug Hardware

Hot-plug boards and modules have special connectors that supply electrical power to the board or module before the data pins make contact. Boards and devices that do not have hot-plug connectors cannot be inserted or removed while the system is running.
I/O boards and CPU/memory boards used in the Sun Fire 15K/12K server are hot-plug devices. Some devices, such as the peripheral power supply, are not hot-plug modules and cannot be removed while the system is running.

Sun Fire 15K/12K Domains

The Sun Fire 15K/12K server can be divided into dynamic system domains, which consist of logical and physical groupings of system board slots. Each domain is electrically isolated into hardware partitions, which ensures that a problem encountered in one domain cannot affect other domains.
Domain configuration is determined by the domain configuration table in the platform configuration database (PCD), which resides on the SC. The domain table controls how system board slots are logically partitioned into domains. Because the table represents the intended domain configuration, it can include both empty slots and occupied slots.
The number of slots available to a given domain is controlled by an available component list that is maintained on the system controller. (Refer to the System Management Services (SMS) 1.3 Administrator Guide for more information about the available component list.) After a slot has been assigned to a domain, it becomes visible to that domain and unavailable and invisible to any other domain. Conversely, you must disconnect and unassign a slot from its domain before you can assign and connect it to another domain.
The logical domain is the set of slots that belong to the domain. The physical domain is the set of boards that are physically interconnected. A slot can be a member of a logical domain and not be part of a physical domain.
After a domain is booted, the system boards and empty slots can be assigned to (or unassigned from) a logical domain; however, they cannot become a part of the physical domain until the operating environment requests it.
System boards or slots that are not assigned to a domain are available to all domains in whose available component lists they appear. These boards can be assigned to a domain by the platform administrator. Or, an available component list can be set up on the system controller to allow users with appropriate privileges to assign available boards to a domain.

Component Types

You can use DR to configure or to unconfigure several types of components:
Component Type    Description
cpu               An individual CPU
memory            All of the memory on the board
pci               Any I/O device, controller, or bus

DR on I/O Boards

You must use caution when you add or remove I/O boards to which devices are attached. Before you can remove a board with I/O devices, all of its devices must be closed and all of its file systems must be unmounted.
If you need to remove an I/O board with attached devices from a domain temporarily and then re-add it before any other boards with I/O devices are added, reconfiguration is not necessary. In this case, device paths to the board devices remain unchanged.