Submit comments about this document at: http://www.sun.com/hwdocs/feedback
Copyright 2005 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
Sun Microsystems, Inc. has intellectual property rights relating to technology that is described in this document. In particular, and without
limitation, these intellectual property rights might include one or more of the U.S. patents listed at http://www.sun.com/patents and one or
more additional patents or pending patent applications in the U.S. and in other countries.
This document and the product to which it pertains are distributed under licenses restricting their use, copying, distribution, and
decompilation. No part of the product or of this document may be reproduced in any form by any means without prior written authorization of
Sun and its licensors, if any.
Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Parts of the product might be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in
the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, AnswerBook2, docs.sun.com, Sun Fire, and Solaris™ are trademarks or registered trademarks of Sun
Microsystems, Inc. in the U.S. and in other countries.
All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and in other
countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges
the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun
holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN
LOOK GUIs and otherwise comply with Sun’s written license agreements.
U.S. Government Rights—Commercial use. Government users are subject to the Sun Microsystems, Inc. standard license agreement and
applicable provisions of the FAR and its supplements.
DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT,
ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Contents
Preface xi
1. Introduction to DR 1
DR on Sun Fire High-End and Midrange Systems 1
What DR Lets You Do 2
Overview of Common DR Operations 2
How to Use DR 3
Hot-Plug Hardware 4
Automatic DR (ADR) 4
Capacity on Demand (COD) 5
DR on Solaris Software 6
DR on Domains Running the Solaris 9 OS or Solaris 10 OS 6
DR on Domains Running the Solaris 8 OS 6
2. DR Concepts 7
Dynamic System Domains 8
Attachment Points 8
Attachment Point Classes 9
High-End System Attachment Points 10
Midrange System Attachment Points 10
Changes To Attachment Points 11
States and Conditions 11
Board and Board Slot States 12
Board Conditions 13
Component States 13
Component Conditions 14
Detachability 14
Permanent and Non-Permanent Memory 15
Copy-Rename 15
Memory Interleaving 16
Correctable Memory Errors 16
Quiescence 16
Suspend-Safe and Suspend-Unsafe Devices 18
DR on I/O Boards 19
High-End Systems I/O Boards, Golden IOSRAM, MaxCPU, and hsPCI+ 19
Midrange Systems I/O Assemblies, PCI and CompactPCI 20
Notes about CompactPCI 20
Common DR Board Operations 21
Connect Operation 21
Configure Operation 22
Disconnect Operation 22
Unconfigure Operation 22
Illustrations of DR Concepts 23
3. Preparing to Use DR 27
The cfgadm(1M) Command 27
The rcfgadm(1M) Command (High-End Only) 29
Checking Device Type, State and Condition 30
▼To display states, types and conditions 30
iv Sun Fire High-End and Midrange Systems Dynamic Reconfiguration User’s Guide • August 2005
▼To display information about board slots and components 30
Preparing to Use DR on a Domain 30
▼To Display Boards Available to the Domain 31
Displaying System Board Status 31
▼To Display System Board Status 31
Testing Boards 32
▼To Test a System Board 32
▼To Test an I/O Board (Midrange Only) 33
▼To Prepare an I/O Board for DR (High-End Only) 34
4. DR Procedures – From the System Domain 37
Adding System Boards 38
▼To Add a System Board 38
▼To Connect a System Board But Not Configure it 39
▼To Configure a Connected System Board 39
Deleting System Boards 40
▼To Delete a System Board 40
▼To Unconfigure But Not Disconnect a System Board 40
▼To Delete an Unconfigured System Board 40
▼To Delete a System Board Temporarily 40
▼To Find the System Board that Contains a Domain’s Permanent Memory 41
▼To Unconfigure a System Board with Permanent Memory 41
Moving System Boards 42
▼To Move a System Board Between Domains 42
Adding I/O Boards 43
▼To Add an I/O Board 43
▼To Add and Connect an I/O Board But Not Configure it 44
▼To Configure a Connected I/O Board 44
▼To Delete an I/O Board 45
▼To Unconfigure an I/O Board But Not Disconnect it 45
▼To Disconnect an Unconfigured I/O Board 45
Adding/Deleting/Tracking Memory and CPU 45
▼To Configure CPU on a System Board 45
▼To Configure Memory on a System Board 46
▼To Configure All CPUs and Memory on a System Board 46
▼To Unconfigure CPU on a System Board 46
▼To Unconfigure Memory on a System Board 46
▼To Unconfigure All CPUs and Memory on a System Board 47
▼To Track a Memory Unconfigure Operation 47
PCI Adapter Card Operations 47
▼To Connect a PCI slot on an I/O Board 48
▼To Configure a PCI slot on an I/O Board 48
▼To Disconnect a PCI slot on an I/O Board 48
▼To Unconfigure a PCI Slot on an I/O Board 49
5. SMS DR Procedures – From the SC (High-End Only) 51
Showing Device Information 52
▼To Show Device Information 52
Showing Platform Information 54
▼To Show Platform Information 55
Showing Board Information 55
SC State Models 55
The showboards(1M) command 56
▼To Show Board Information 57
Adding Boards 57
▼To Add a Board to a Domain 58
Deleting Boards 58
TABLE 5-5 Board State Conditions on the Sun Fire High-End Systems SC 56
TABLE 5-6 addboard Command Options 61
TABLE 5-7 Privileges Needed to Use the addboard Command 62
TABLE 5-8 deleteboard Command Options 63
TABLE 5-9 Privileges Needed to Use the deleteboard Command 64
TABLE 5-10 moveboard Command Options 65
TABLE 5-11 Privileges Needed to Use the moveboard Command 66
TABLE 5-12 rcfgadm Command Options 67
TABLE 5-13 Privileges Needed to Use the rcfgadm Command 68
TABLE 5-14 showboards Command Options 69
TABLE 5-15 showdevices Command Options 69
TABLE 5-16 showplatform Command Options 70
TABLE A-1 DR Operation and Command Summary 79
Preface
This document describes the dynamic reconfiguration (DR) software on Sun Fire™
E25K/E20K/15K/12K systems and Sun Fire E6900/E4900/6800/4810/4800/3800
systems running the Solaris™ Operating System (Solaris OS).
This document replaces the following user guides:
■ Sun Fire High-End Systems Dynamic Reconfiguration User Guide
■ Sun Fire Midrange Systems Dynamic Reconfiguration User Guide
■ System Management Services (SMS) Dynamic Reconfiguration User Guide
Before You Read This Document
This book is intended for the Sun Fire high-end and midrange system platform
administrator who has a working knowledge of UNIX® systems, particularly those
based on the Solaris OS. If you do not have such knowledge, first read the Solaris OS
user and system administrator books provided with this system, and consider UNIX
system administration training.
Using UNIX Commands
This document does not contain information about basic UNIX® commands and
procedures such as shutting down the system, booting the system, and configuring
devices. See the following sources for this information:
■ Software documentation that you received with your system
■ Solaris OS documentation, which is at: http://docs.sun.com
Shell Prompts
Shell                                   Prompt
C shell                                 machine-name%
C shell superuser                       machine-name#
Bourne shell and Korn shell             $
Bourne shell and Korn shell superuser   #
Typographic Conventions
Typeface¹    Meaning                                   Examples
AaBbCc123    The names of commands, files,             Edit your .login file.
             and directories; on-screen                Use ls -a to list all files.
             computer output                           % You have mail.
AaBbCc123    What you type, when contrasted            % su
             with on-screen computer output            Password:
AaBbCc123    Book titles, new words or terms,          Read Chapter 6 in the User’s Guide.
             words to be emphasized.                   These are called class options.
             Replace command-line variables            You must be superuser to do this.
             with real names or values.                To delete a file, type rm filename.
¹ The settings on your browser might differ from these settings.
Sun is not responsible for the availability of third-party web sites mentioned in this
document. Sun does not endorse and is not responsible or liable for any content,
advertising, products, or other materials that are available on or through such sites
or resources. Sun will not be responsible or liable for any actual or alleged damage
or loss caused by or in connection with the use of or reliance on any such content,
goods, or services that are available on or through such sites or resources.
Sun Welcomes Your Comments
Sun is interested in improving its documentation and welcomes your comments and
suggestions. You can submit your comments by going to:
http://www.sun.com/hwdocs/feedback
Please include the title and part number of your document with your feedback:
Sun Fire High-End and Midrange Systems Dynamic Reconfiguration User’s Guide, part
number 819-1501-10.
CHAPTER 1
Introduction to DR
The Sun Fire high-end and midrange systems listed in the Preface can be divided
into domains, each functioning as a separate computer, running its own operating
system (see “Dynamic System Domains” on page 8). The dynamic reconfiguration
(DR) feature lets you enable and disable a domain’s system boards, I/O boards, and
certain components while that domain continues running.
Part of DR runs on Solaris software in the domain and is managed through the
cfgadm(1M) command. Another part runs on the system controller (SC).
This chapter covers the following topics:
■ “DR on Sun Fire High-End and Midrange Systems” on page 1
■ “What DR Lets You Do” on page 2
■ “How to Use DR” on page 3
■ “Hot-Plug Hardware” on page 4
■ “Automatic DR (ADR)” on page 4
■ “Capacity on Demand (COD)” on page 5
■ “DR on Solaris Software” on page 6
DR on Sun Fire High-End and Midrange Systems
System boards on midrange systems are sometimes called CPU/Memory boards. They
are the same boards as those on high-end systems. This document exclusively uses
the term system board. System boards are interchangeable between high-end and
midrange platforms.
High-end system I/O boards and midrange system I/O assemblies are similar in
some ways, but different in others. This document uses the term I/O board for both
except when necessary for clarity.
The I/O buses on a high-end system I/O board support PCI or hsPCI+ cards and
MaxCPU boards. A MaxCPU board fits into slot 1 and contains two CPUs and no
memory.
Midrange system I/O boards support PCI or CompactPCI cards.
This document uses the generic term PCI when referring to hsPCI+ and CompactPCI
cards except when clarity demands otherwise.
What DR Lets You Do
Some of the tasks you can use DR for include:
■ Display the status and state of system or I/O boards and some components to
help you prepare for DR operations.
■ Test live boards.
■ Logically detach (electrically isolate) system or I/O boards from a domain in
preparation for moving to another domain or removal from the system while the
domain remains running. The detach operation is sometimes called a delete board
action.
■ Logically attach system or I/O boards to a domain, to add resources or replace a
removed board, while the domain remains running. The attach operation is
sometimes called an add board action.
■ Configure or unconfigure CPU or memory modules on system boards to control
power and capacity of a domain or isolate faulty components.
■ Enable or disable PCI cards or related components and slots.
For example, you can DR detach a faulty system board, then use the system’s
hot-plug feature to physically remove it. After plugging in the repaired board or a
replacement, you can use DR to configure the board into the domain. If you use the
DR feature to add or remove a system board or component, DR always leaves the
board or component in a known configuration state. See “States and Conditions” on
page 11 for more information about configuration states for system boards and
components.
You can also assign a system board or I/O board to a different domain for load
balancing or to provide extra capabilities for specific tasks.
Overview of Common DR Operations
DR software enables you to do the following tasks:
■ Add, delete, or move system boards or I/O boards between domains.
■ Configure or unconfigure CPU or memory modules on system boards.
■ Connect and configure or disconnect and unconfigure PCI cards on I/O boards.
The four main types of DR operations that support the above actions are connect,
configure, unconfigure, and disconnect.
TABLE 1-1 Main DR Operations
Operation     Description
Connect       Provides power to the slot that holds a board and begins system
              monitoring of the board’s temperature.
Configure     Makes the operating system assign functional roles to a board, and load
              device drivers for the board, and for devices attached to the board. The
              configure operation includes a connect operation.
Unconfigure   Logically detaches a board from the operating system and takes the
              associated device drivers offline. Environmental monitoring continues, but
              devices on the board are not available for system use.
Disconnect    Turns off power to the slot that holds the board and stops monitoring the
              board. The disconnect operation includes an unconfigure operation.
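The relationship among the four operations can be sketched in a few lines of Python. This is an illustrative model only, not part of the DR software; the names IMPLIED_STEPS and expand_operation are hypothetical. It records only what the table states: a configure operation includes a connect operation, and a disconnect operation includes an unconfigure operation.

```python
# Illustrative model of the four main DR operations.
# The names below are hypothetical; this is not a Solaris or SMS interface.
IMPLIED_STEPS = {
    "connect": ["connect"],
    "configure": ["connect", "configure"],        # configure includes connect
    "unconfigure": ["unconfigure"],
    "disconnect": ["unconfigure", "disconnect"],  # disconnect includes unconfigure
}


def expand_operation(operation):
    """Return the ordered steps that a requested DR operation implies."""
    if operation not in IMPLIED_STEPS:
        raise ValueError(f"unknown DR operation: {operation!r}")
    return list(IMPLIED_STEPS[operation])


if __name__ == "__main__":
    for op in ("connect", "configure", "unconfigure", "disconnect"):
        print(op, "->", " then ".join(expand_operation(op)))
```

For example, expand_operation("disconnect") returns the unconfigure step followed by the disconnect step.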
Note – If a system board is in use, you must stop its use and disconnect it from the
domain before you power it off. After a new or upgraded system board is inserted
and powered on, connect its attachment point (see “Attachment Points” on page 8)
and configure it for use by the operating system. For more information about DR
operations, see “Common DR Board Operations” on page 21.
How to Use DR
You can initiate DR operations in any of the following ways:
■ Use the GUI provided by Sun™ Management Center software. For more
information, see the Sun Management Center User’s Guide.
■ Use the Solaris command cfgadm(1M) with the appropriate options and flags in
the domain. “DR Procedures – From the System Domain” on page 37 tells how to
use cfgadm with its DR-related options, organized by task.
■ On high-end systems, use the System Management Services (SMS) DR command
rcfgadm(1M) on the SC. rcfgadm(1M) takes the same DR-related options as
cfgadm(1M). The main visible difference is that rcfgadm(1M) often requires an
additional -d domain_id parameter. For information about rcfgadm(1M), see
“rcfgadm(1M)” on page 67.
■ On high-end systems, use the SMS DR commands (besides rcfgadm(1M)) on the
SC. The SMS DR commands include addboard(1M), moveboard(1M),
deleteboard(1M), and others. You can find information about these commands
in “SMS DR Procedures – From the SC (High-End Only)” on page 51, in the SMS
Reference Manual, or by executing the man(1) command in an SC window running
SMS software.
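The difference between the cfgadm(1M) and rcfgadm(1M) command lines can be sketched as follows. This Python fragment is illustrative only: the helper name dr_command, the board name SB3, and the domain ID A are assumptions made for the example. It builds an argument list for a state-change request with the -c option, and adds the -d domain_id parameter when the request is meant to be issued from a high-end SC. It does not run either command.

```python
# Illustrative only: builds argument lists, does not execute anything.
# dr_command is a hypothetical helper, not part of Solaris or SMS software.
STATE_CHANGE_FUNCTIONS = ("connect", "configure", "unconfigure", "disconnect")


def dr_command(function, ap_id, domain_id=None):
    """Return the argument list for a state-change request.

    Without domain_id, the list is for cfgadm, run in the domain.
    With domain_id, the list is for rcfgadm, run on a high-end SC.
    """
    if function not in STATE_CHANGE_FUNCTIONS:
        raise ValueError(f"unsupported function: {function!r}")
    if domain_id is None:
        return ["cfgadm", "-c", function, ap_id]
    return ["rcfgadm", "-d", domain_id, "-c", function, ap_id]


if __name__ == "__main__":
    print(" ".join(dr_command("configure", "SB3")))
    print(" ".join(dr_command("configure", "SB3", domain_id="A")))
```

The two argument lists differ only in the command name and the -d domain_id pair, which reflects the statement that rcfgadm(1M) takes the same DR-related options as cfgadm(1M).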
When running DR on a midrange system you might need to execute one or more
midrange system SC commands – such as showplatform and showboards –
before or during DR operations. Their use is briefly described where appropriate in
this document, and you can find more information about them in the Sun Fire Midrange Systems Controller Command Reference Manual.
Caution – The midrange system SC commands addboard and deleteboard are
not DR commands like the high-end system SMS commands of the same name. You
can safely use these midrange system SC commands only when the domain is
powered off. For more information about these and other midrange system SC
commands, see the Sun Fire Midrange Systems Controller Command Reference Manual.
Hot-Plug Hardware
A hot-pluggable device can be logically connected to or disconnected from a running
system. (A hot-swappable device can be physically connected to or disconnected from a
running system.) Hot-pluggable boards and modules have special connectors that
supply electrical power to the board or module before the data pins make contact.
Boards and devices that have hot-plug connectors can be inserted or removed while
the system is running; that is, they are hot-swappable.
System boards and I/O boards are hot-plug devices. However, some devices, such as
the peripheral power supply, are not hot-plug modules and cannot be disconnected
while the system is running.
Automatic DR (ADR)
Automatic DR (ADR) lets your applications execute DR operations with no user
interaction. ADR uses an enhanced DR framework that includes the reconfiguration
coordination manager (RCM) and the system event facility, sysevent. The RCM
enables application-specific loadable modules to register callbacks. The callbacks can
perform preparatory tasks before, error-recovery actions during, and clean-up after a
DR operation. The system event framework enables applications to register for
system events and receive notifications of those events.
ADR interfaces with the RCM and sysevent to enable applications to automatically
give up resources prior to unconfiguring them, and to capture new resources as they
are configured into the domain.
An application can execute the cfgadm(1M) command from a domain, which is
called local ADR. In addition, on high-end systems, the application can execute an
SMS DR command from the SC, which is called global ADR. On high-end systems
you can use global ADR to move system boards from one domain to another,
configure hot-swapped boards into a domain, and remove system boards from a
domain.
Capacity on Demand (COD)
The Capacity on Demand (COD) option provides additional CPU resources on COD
system boards that you install in your Sun Fire system. A Sun Fire COD system can
have a mix of both standard and COD system boards installed. At least one active
CPU is required for each domain in the system.
You can use DR to move COD boards into and out of domains in the same way you
use it to move standard system boards. But you can use the CPUs on a COD board
only after you purchase right-to-use (RTU) licenses for them. Each COD RTU license
entitles you to receive a COD RTU license key that enables a specified number of
CPUs on COD boards in a single system.
Whenever you use DR to configure a COD board into a domain, make sure enough
RTU licenses are available to the target domain to enable each active CPU on the
COD board. If the target domain does not have enough RTU licenses available to it
when you attempt to add a COD board, the system displays a status message for
each CPU that cannot be enabled in the domain.
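The license check described above amounts to simple arithmetic, sketched here in Python. The sketch is illustrative only and assumes that one RTU license enables one CPU; the function name cod_cpus_enabled is hypothetical and does not correspond to any COD command.

```python
# Illustrative arithmetic only. Assumption: one RTU license enables one CPU.
# cod_cpus_enabled is a hypothetical name, not a COD or SMS interface.
def cod_cpus_enabled(cpus_on_board, rtu_licenses_available):
    """Return (enabled, not_enabled) CPU counts for a COD board."""
    if cpus_on_board < 0 or rtu_licenses_available < 0:
        raise ValueError("counts must not be negative")
    enabled = min(cpus_on_board, rtu_licenses_available)
    return enabled, cpus_on_board - enabled


if __name__ == "__main__":
    enabled, not_enabled = cod_cpus_enabled(4, 2)
    print(f"{enabled} CPUs enabled, {not_enabled} CPUs not enabled")
```

Under that assumption, a board with four CPUs and two available licenses would have two CPUs that cannot be enabled, and the system would display a status message for each of them.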
For more information about the COD option for high-end systems, see the System Management Services (SMS) Administrator Guide.
DR on Solaris Software
This document describes the latest version of DR as it runs on or with the latest
Solaris 8, Solaris 9, and Solaris 10 software releases. Be sure to check the SunSolve℠
database at http://sunsolve.sun.com for the latest patches.
Note – Sun Microsystems suggests you run the latest versions of all Sun software on
your systems for the highest performance and to take advantage of the latest
enhancements.
The following sections describe any special considerations for using DR with specific
Solaris releases.
DR on Domains Running the Solaris 9 OS or
Solaris 10 OS
The Solaris 10 3/05 HW1 OS is the first release of Solaris 10 software to support the
UltraSPARC® IV+ system board, and the Solaris 9 9/05 OS is the first release of
Solaris 9 software to do so. You can add UltraSPARC IV+ boards to a domain
configured with older boards, but you cannot use DR to add an older board to a
domain that was booted with all UltraSPARC IV+ boards. (You can add an older
board to a domain booted with all UltraSPARC IV+ boards if you shut down the
domain first.)
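The board-mixing rule above can be expressed as a single check. The following Python sketch is illustrative only; the function name can_add_with_dr and its parameters are hypothetical. It encodes only the one restriction stated above and treats every other combination as allowed.

```python
# Illustrative only: encodes the single restriction stated in this section.
# can_add_with_dr is a hypothetical name, not a Solaris interface.
def can_add_with_dr(board_is_iv_plus, domain_booted_with_all_iv_plus):
    """Return True if DR can add the board while the domain keeps running."""
    if domain_booted_with_all_iv_plus and not board_is_iv_plus:
        # An older board can join such a domain only after a shutdown.
        return False
    return True


if __name__ == "__main__":
    print(can_add_with_dr(board_is_iv_plus=True, domain_booted_with_all_iv_plus=False))
    print(can_add_with_dr(board_is_iv_plus=False, domain_booted_with_all_iv_plus=True))
```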
For additional information about domain restrictions with UltraSPARC IV+ boards
on Sun Fire midrange systems, see the Sun Fire Midrange Systems Platform Administration Manual for Firmware Release 5.19.
DR on Domains Running the Solaris 8 OS
The Solaris 8 2/02 OS is the first release of Solaris 8 software to support DR of I/O
boards. In addition, System Management Services (SMS) 1.3 on Sun Fire high-end
systems is the first release of SMS to fully support DR. You can enable the full
functionality of DR on domains running the Solaris 8 2/02 OS or a later Solaris 8
release by installing patches and a new kernel update on the domain, and by installing
the latest version of SMS software on your high-end server’s system controller (SC).
The Solaris 8 OS does not support UltraSPARC IV+ boards.
CHAPTER 2
DR Concepts
This chapter describes the DR concepts you should understand before attempting to
use DR.
If you plan to execute DR operations on a high-end server’s system controller (SC)
using SMS DR commands, be sure to read Chapter 5, “SMS DR Procedures – From
the SC (High-End Only)” on page 51. Some of the information in this chapter is
repeated in Chapter 5, but from a different perspective. Reading both chapters might
yield a more comprehensive picture of the DR feature.
This chapter covers the following topics:
■ “Dynamic System Domains” on page 8
■ “Attachment Points” on page 8
■ “States and Conditions” on page 11
■ “Detachability” on page 14
■ “Permanent and Non-Permanent Memory” on page 15
■ “Quiescence” on page 16
■ “Suspend-Safe and Suspend-Unsafe Devices” on page 18
■ “DR on I/O Boards” on page 19
■ “Common DR Board Operations” on page 21
■ “Illustrations of DR Concepts” on page 23
Note – The UltraSPARC IV+ board contains dual-core CPUs. References in this
document to CPUs or processors might refer to either single-core or dual-core
types, and all procedures apply to both.
Dynamic System Domains
The Sun Fire system can be divided into domains. Each domain is based on the
system board slots that are assigned to it. Each domain is also electrically
isolated in its own hardware partition, which ensures that a failure in one domain does
not affect the other domains in the server.
Each domain configuration is determined in a configuration database that resides
on the SC. The configuration database – on high-end systems, the platform
configuration database (PCD) – controls how the system board slots are logically
partitioned into domains. The domain configuration represents the intended domain
configuration. Thus, the configuration can include empty slots and populated slots.
The physical domain is determined by the logical domain.
The number of slots available to a given domain is controlled by an ACL. ACL is an
abbreviation for available component list on high-end system domains, or access
control list on midrange system domains. The ACL for all domains is maintained on
the SC. A slot must be assigned or available to a domain before you can change its
state. After a slot has been assigned to a domain, it becomes visible to that domain
and invisible and unavailable to all other domains. Conversely, you must disconnect
and unassign a slot from its domain before you can assign and connect it to another
domain.
The logical domain is the set of slots that belong to the domain. The physical domain
is the set of boards that are physically interconnected. A slot can be a member of a
logical domain without having to be part of a physical domain. After the domain is
booted, the system boards and the empty slots can be assigned to or unassigned
from a logical domain; however, they are not allowed to become a part of the
physical domain until the operating system requests it. System boards or slots that
are not assigned to any domain are available to all domains. These boards can be
assigned to a domain by the platform administrator; however, an ACL can be set up
on the SC to allow users with appropriate privileges to assign available boards to a
domain.
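The assignment rules in this section can be modeled with a small table that maps slots to domains. The Python sketch below is illustrative only: the class SlotTable and its methods are hypothetical and do not correspond to the configuration database or to any SC command. It enforces two rules from the text: an assigned slot is unavailable to every other domain, and a slot must be disconnected and unassigned before it can be assigned to another domain.

```python
# Illustrative toy model; SlotTable is hypothetical and is not the
# platform configuration database or any SC interface.
class SlotTable:
    def __init__(self):
        self._owner = {}         # slot name -> domain that the slot is assigned to
        self._connected = set()  # slots that are currently connected

    def available_to(self, slot, domain):
        """Unassigned slots are available to all domains."""
        owner = self._owner.get(slot)
        return owner is None or owner == domain

    def assign(self, slot, domain):
        if not self.available_to(slot, domain):
            raise RuntimeError(f"{slot} must be unassigned from its domain first")
        self._owner[slot] = domain

    def connect(self, slot, domain):
        if self._owner.get(slot) != domain:
            raise RuntimeError(f"{slot} is not assigned to domain {domain}")
        self._connected.add(slot)

    def disconnect(self, slot):
        self._connected.discard(slot)

    def unassign(self, slot):
        if slot in self._connected:
            raise RuntimeError(f"{slot} must be disconnected before it is unassigned")
        self._owner.pop(slot, None)


if __name__ == "__main__":
    table = SlotTable()
    table.assign("SB3", "A")
    table.connect("SB3", "A")
    print(table.available_to("SB3", "B"))  # False while domain A owns the slot
    table.disconnect("SB3")
    table.unassign("SB3")
    table.assign("SB3", "B")
    print(table.available_to("SB3", "B"))  # True after reassignment
```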
Attachment Points
An attachment point is a collective term for a board or device, the slot that holds it,
and any components on it. Slots are sometimes called receptacles.
Sun Fire systems support the following attachment points:
■ Board attachment point – A system or I/O board slot, the board in that slot, and
any devices connected to the board.
■ PCI attachment point – A PCI card and its attachment to the PCI bus that holds it.
■ Component attachment point – A CPU or memory module and its connection to the
system board. A component attachment point is sometimes called a dynamic
attachment point.
Note – Many users are concerned only with changing the status of boards and
devices. So, for simplicity, some procedures in this document refer to board
attachment points simply as boards, PCI attachment points as PCI cards, and
component attachment points as CPU or memory modules. Where simplification
might cause confusion, proper names are used.
The term occupant refers to the combination of a board and its attached devices,
including any external storage devices connected by interface cables.
Board slots can be named according to slot numbers, or can be anonymous (for
example, when in a SCSI chain).
DR recognizes two types of attachment point names:
■ Physical attachment point – The software driver and the location of the slot.
■ Logical attachment point – An abbreviated name created by the system to refer to the
physical attachment point.
To obtain a list of all available logical attachment points, use the following command
in the domain:
# cfgadm -l
Attachment Point Classes
Sun Fire systems support classes of attachment points. The two classes DR users
need to know about are sbd and pci.
■ sbd – System boards, CPU and memory modules, and the CPU and memory
modules’ connections to the system board. Also, I/O boards, PCI buses, and the
PCI buses’ connections to the I/O board.
■ pci – PCI cards, which connect into PCI buses.
To view a list of the attachment points and the type of board associated with each,
use the following command as superuser:
# cfgadm -a -s "cols=ap_id:class"
High-End System Attachment Points
Examples of physical attachment point names on high-end systems are:
/devices/pseudo/dr@0:SBx (for a system board in slot 0)
/devices/pseudo/dr@0:IOx (for an I/O board in slot 1)
where 0 is node 0 (zero), SB is a system board, IO is an I/O board, and x represents
the board number or expander number for a particular board. System boards and
I/O boards are numbered 0 to 17.
Note – System boards are installed only in slot 0. I/O boards and Max CPU boards
are installed only in slot 1.
Logical attachment points on a high-end system take one of the following two forms:
SBx (for system boards)
IOx (for I/O boards or Max CPU boards)
Midrange System Attachment Points
Examples of physical attachment point names on a midrange system are:
/devices/ssm@0,0:N0.SBx (for a system board)
/devices/ssm@0,0:N0.IBx (for an I/O board)
where N0 is node 0 (zero), SB is a system board, IB is an I/O board, and x is a slot
number (0 through 5 for a system board, 6 through 9 for an I/O board).
Logical attachment points on midrange systems take one of the following two forms:
N0.SBx (for a system board)
N0.IBx (for an I/O board)
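The naming patterns for high-end and midrange attachment points can be summarized in a short Python sketch. It is illustrative only: the function names and the "high-end", "midrange", "system", and "io" keywords are assumptions made for the example and are not part of cfgadm(1M). The name prefixes and number ranges are the ones given in the two preceding sections.

```python
# Illustrative helpers only; the function names and keyword strings are
# assumptions for this sketch and are not part of cfgadm(1M).
PHYSICAL_PREFIX = {
    "high-end": "/devices/pseudo/dr@0:",
    "midrange": "/devices/ssm@0,0:",
}


def logical_ap(platform, kind, number):
    """Return the logical attachment point name for a board."""
    if kind not in ("system", "io"):
        raise ValueError(f"unknown board kind: {kind!r}")
    if platform == "high-end":
        # System boards and I/O boards are numbered 0 to 17.
        if number not in range(0, 18):
            raise ValueError("high-end board numbers run from 0 to 17")
        return ("SB" if kind == "system" else "IO") + str(number)
    if platform == "midrange":
        # Slots 0 through 5 hold system boards; slots 6 through 9 hold I/O boards.
        valid = range(0, 6) if kind == "system" else range(6, 10)
        if number not in valid:
            raise ValueError(f"slot {number} cannot hold a midrange {kind} board")
        return "N0." + ("SB" if kind == "system" else "IB") + str(number)
    raise ValueError(f"unknown platform: {platform!r}")


def physical_ap(platform, kind, number):
    """Return the physical attachment point name for a board."""
    name = logical_ap(platform, kind, number)  # validates all arguments first
    return PHYSICAL_PREFIX[platform] + name


if __name__ == "__main__":
    print(logical_ap("high-end", "system", 3))
    print(physical_ap("midrange", "io", 7))
```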
Changes To Attachment Points
You can use the cfgadm(1M) command to change attachment points. You can:
■ Change the state of an attachment point. The specific cfgadm(1M) options are:
■ configure
■ unconfigure
■ connect
■ disconnect
■ Change the availability of an attachment point’s associated board. The specific
cfgadm(1M) options are:
■ assign
■ unassign
■ Change the condition of an attachment point’s board slot. The specific
cfgadm(1M) options are:
■ poweron
■ poweroff
■ test
For information about states, see the sections that follow. For more information
about attachment points, see the cfgadm(1M) man page.
States and Conditions
This section describes the states and conditions of boards, slots, components, and
attachment points.
■ State is the operational status of either a board slot or its occupant.
■ Condition is the operational status of an attachment point.
The cfgadm(1M) command can display nine types of states and conditions. For
more information, see “Component States” on page 13 and “Component
Conditions” on page 14.
Chapter 2 DR Concepts 11
Note – The following information about boards and board slots also applies to PCI
cards and the PCI buses that hold them.
Board and Board Slot States
When a board slot does not hold a board, its state is empty. When the slot does
contain a board, the state of the board is either disconnected or connected.
TABLE 2-1 Board and Board Slot States
State          Description
empty          The slot does not hold a board.
disconnected   The board in the slot is disconnected from the system bus. A board
               can be in the disconnected state without being powered off.
               However, a board must be powered off and in the disconnected
               state before you remove it from the slot. A newly inserted board is
               in the disconnected state.
connected      The board in the slot is powered on and connected to the system
               bus. You can view the components on a board only after it is in the
               connected state.
Caution – Physically removing a board that is in the connected state, or that is
powered on and in the disconnected state, crashes the operating system and can
result in permanent damage to that system board.
A board in the connected state is either configured or unconfigured. A board
that is disconnected is always unconfigured.
TABLE 2-2 Configured and Unconfigured Boards
Name           Description
configured     The board is available for use by the Solaris software.
unconfigured   The board is not available for use by the Solaris software.
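The rules in this section (a disconnected board is always unconfigured, a connected board is powered on, and a board must be powered off and disconnected before removal) can be expressed as simple checks. The following is a sketch only. The Slot class and both function names are hypothetical and are not part of any Sun interface.

```python
from dataclasses import dataclass

SLOT_STATES = ("empty", "disconnected", "connected")
BOARD_STATES = ("configured", "unconfigured")


@dataclass(frozen=True)
class Slot:
    slot_state: str                    # one of SLOT_STATES
    board_state: str = "unconfigured"  # one of BOARD_STATES
    powered_on: bool = False


def is_consistent(slot: Slot) -> bool:
    """Check a state combination against the rules stated in this section."""
    if slot.slot_state not in SLOT_STATES or slot.board_state not in BOARD_STATES:
        return False
    if slot.slot_state == "connected":
        # A connected board is powered on; it can be configured or unconfigured.
        return slot.powered_on
    # An empty slot or a disconnected board is always unconfigured.
    return slot.board_state == "unconfigured"


def safe_to_remove(slot: Slot) -> bool:
    """A board can be pulled only when it is disconnected and powered off."""
    return (
        is_consistent(slot)
        and slot.slot_state == "disconnected"
        and not slot.powered_on
    )
```

For example, safe_to_remove(Slot("disconnected", "unconfigured", powered_on=True)) returns False, which matches the caution above about boards that are disconnected but still powered on.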
The following states are visible only from the SC:
TABLE 2-3 Board States Visible Only From the SC
Name        Description
Available   The slot, which might or might not contain a board, is not assigned
            to any particular domain.
Assigned    The slot, which might or might not contain a board, belongs to a
            domain, but the hardware has not been configured to use it.
Active      The board in the slot is being actively used by the domain to which
            it has been assigned. You cannot reassign an active board.
Board Conditions
A board can be in one of three conditions: unknown, ok, or failed. Its slot might be
designated as unusable.
TABLE 2-4 Board and Board Slot Conditions
Name       Description
unknown    The board has not been tested.
ok         The board is operational.
failed     The board failed testing.
unusable   The board slot is unusable.
Component States
Unlike a board, a CPU or memory module cannot be individually connected or
disconnected. Thus, all such components are in the connected state.
The connected component is either configured or unconfigured.
TABLE 2-5 Connected Components: Configured or Unconfigured
Name           Description
configured     The component is available for use by the Solaris OS.
unconfigured   The component is not available for use by the Solaris OS.
Component Conditions
A CPU or memory module is unknown, ok, or failed.
TABLE 2-6 CPU or Memory Module Conditions
Name      Description
unknown   The component has not been tested.
ok        The component is operational.
failed    The component failed testing.
Detachability
A detachable device is one that conforms to the following rules:
■ The device driver must support DDI_DETACH.
■ Critical resources must be redundant or accessible through an alternate pathway.
CPUs and memory banks can be redundant critical resources. Disk drives are
examples of critical resources that can be accessible through an alternate pathway.
Some boards cannot be detached because their resources cannot be moved. For
example, if a domain has only one CPU board, that CPU board cannot be detached.
An I/O board is not detachable if it controls the boot drive.
If an I/O board has no alternate pathway, you can do one of the following:
■ Put the disk chain on a separate I/O board. The secondary I/O board can then be
detached.
■ Add a second path to the device through a second I/O board so that the I/O
board can be detached without losing access to the secondary disk chain.
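The two detachability rules can be read as a single predicate. The following sketch assumes that the second rule applies only to devices that hold critical resources. The Device class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    driver_supports_detach: bool      # the device driver supports DDI_DETACH
    holds_critical_resource: bool     # for example, the only CPU board or the boot drive
    is_redundant: bool = False        # another device provides the same resource
    has_alternate_path: bool = False  # the resource is reachable through another board


def is_detachable(dev: Device) -> bool:
    """Apply the two detachability rules from this section."""
    if not dev.driver_supports_detach:
        return False
    if not dev.holds_critical_resource:
        # Assumption: the second rule constrains critical resources only.
        return True
    return dev.is_redundant or dev.has_alternate_path
```

Under this model, an I/O board that controls the boot drive and has no alternate pathway is not detachable. Adding a second path through another I/O board changes the result.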
Note – If you are unsure whether a device is detachable, consult with your Sun
service representative.
Permanent and Non-Permanent Memory
Before you can delete a board, the operating system must vacate the memory on that
board. Vacating a board entails flushing the contents of its non-permanent memory
to swap space and copying the contents of its permanent memory (that is, the kernel
and OpenBoot™ PROM software) to another memory board.
To relocate permanent memory, the operating system on a domain must be
temporarily quiesced. The length of the quiescence depends on the domain I/O
configuration and the running workloads.
Detaching a board with permanent memory is the only time when the operating
system is quiesced; therefore, you should know where permanent memory resides so
that you can avoid impacting the operation of the domain significantly. To display
the size of permanent memory, use the cfgadm(1M) command with its -av option.
To vacate a board that has permanent memory, the operating system must find a
sufficiently large block of available memory, called target memory, on which to copy
the current contents of permanent memory, which is referred to as source memory.
Copy-Rename
User processes can release memory by paging it out to the swap device. But the
Solaris kernel, which resides in permanent memory, cannot be released in that
manner. Instead, cfgadm uses the copy-rename technique to release the memory.
After the OS identifies a suitable target board – one that has enough memory to hold
the permanent memory to be moved – the DR software executes the following steps:
1. Vacates the memory on the target board by paging the memory out to swap.
2. Quiesces the operating system.
3. Copies the contents (permanent memory) from the source board to the target
board. This is the copy part of the operation.
4. Reprograms the hardware to swap the memory address ranges of the source and
target board. This is the rename part of the operation.
5. Releases the operating system from its quiesced state.
Memory Interleaving
System boards cannot be dynamically reconfigured if system memory is interleaved
across multiple system boards. PCI cards and I/O boards can be dynamically
reconfigured regardless of whether memory is interleaved.
For more information about memory interleaving on high-end systems, see the Sun Fire High-End Systems Administration Manual. For midrange systems, see the
interleave-scope parameter of the setupdomain command, which is described
in both the Sun Fire Midrange Systems Platform Administration Manual and the Sun
Fire Midrange System Controller Command Reference Manual.
Correctable Memory Errors
Correctable memory errors indicate that the memory on a system board – that is, one
or more of its dual inline memory modules (DIMMs), or portions of the hardware
interconnect – might be faulty and need replacement. When the SC detects
correctable memory errors, it initiates a record-stop dump to save the diagnostic
data. A record-stop dump in progress can interfere with a DR operation.
When a record-stop occurs from a correctable memory error, allow the record-stop
dump to complete before you initiate a DR operation.
If the faulty component causes repeated reporting of correctable memory errors, the
SC performs multiple record-stop dumps. If this happens, you should temporarily
disable the dump-detection mechanism on the SC; allow the current dump to finish;
then initiate the DR operation. After the DR operation finishes, re-enable the dump
detection.
Quiescence
During the unconfigure operation on a system board with permanent memory
(OpenBoot™ PROM or kernel memory), the operating system is briefly paused,
which is known as operating system quiescence. All operating system and device
activity on the domain must cease during this critical phase of the operation.
A quick way to determine whether a board has permanent memory is to use the
following command:
# cfgadm -av | grep permanent