Sun Microsystems, Inc.
4150 Network Circle
Santa Clara, CA 95054 U.S.A.
Part No. 816-5075-12
January 2003, Revision A
Send comments about this document to: docfeedback@sun.com
Copyright 2003 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
Sun Microsystems, Inc. has intellectual property rights relating to technology embodied in the product that is described in this document. In
particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at
http://www.sun.com/patents, and one or more additional patents or pending patent applications in the U.S. and in other countries.
This document and the product to which it pertains are distributed under licenses restricting their use, copying, distribution, and
decompilation. No part of the product or of this document may be reproduced in any form by any means without prior written authorization of
Sun and its licensors, if any.
Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in
the U.S. and other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, AnswerBook2, docs.sun.com, and Solaris are trademarks, registered trademarks, or service marks of Sun
Microsystems, Inc. in the U.S. and other countries.
All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other
countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges
the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun
holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN
LOOK GUIs and otherwise comply with Sun’s written license agreements.
Use, duplication, or disclosure by the U.S. Government is subject to restrictions set forth in the Sun Microsystems, Inc. license agreements and as
provided in DFARS 227.7202-1(a) and 227.7202-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (Oct. 1998), FAR 12.212(a) (1995), FAR 52.227-19, or
FAR 52.227-14 (ALT III), as applicable.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT,
ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Sun Fire 15K/12K Dynamic Reconfiguration (DR) User Guide • January 2003
Contents
Preface
    Before You Read This Book
    How This Book Is Organized
    Using UNIX Commands
    Typographic Conventions
    Shell Prompts
    Related Documentation
    Accessing Sun Documentation Online
    Sun Welcomes Your Comments
1. Introduction to DR on the Sun Fire 15K/12K Server
    What Is DR?
    Where You Execute DR Commands
    Command Line Interface (CLI)
    Graphical User Interface (GUI)
    Automatic DR
    Enhanced System Availability
    DR Concepts
    Detachability
    Quiescence
    Suspend-Safe and Suspend-Unsafe Devices
    Attachment Points
    Conditions and States
    DR Operations
    Hot-Plug Hardware
    Sun Fire 15K/12K Domains
    Component Types
    DR on I/O Boards
    Solving a Problem With an I/O Device
    Golden IOSRAM
    DR on hsPCI+ I/O Boards
    Permanent and Non-permanent Memory
    Target Memory Constraints
    Correctable Memory Errors
    Capacity on Demand (COD)
    DR on COD Boards
    Enabling DR on Domains Running the Solaris 8 2/02 Operating Environment
    An Illustration of DR Concepts
2. DR State and Condition Models
    Board States and Conditions
    Board Slot States
    Board Occupant States
    Board Conditions
    Component States and Conditions
    Component Receptacle States
    Component Occupant States
    Component Conditions
3. DR Operations and Software Components on the Domain
    DR Operations
    Before You Perform DR Operations
    Before Performing DR Operations on I/O Boards
    Connect Operation
    Configure Operation
    CPUs and Memory
    I/O Boards
    After the Configure Operation
    Disconnect Operation
    Unconfigure Operation
    Non-permanent Memory
    Permanent Memory
    Software Components
    Domain Configuration Server
    DR Driver
    Reconfiguration Coordination Manager
    System Events Framework
4. DR User Interfaces on the Domain
    DR Commands and Options on the Domain
    State Change Functions
    Availability Change Functions
    Condition Change Functions
    Options and Operands
5. DR Domain Procedures
    Attachment Points
    Displaying Board Status
    Basic Status Display
    Detailed Status Display
    Removing a Board
    ▼ To Remove a CPU/Memory Board
    ▼ To Remove an I/O Board
    Adding a Board
    ▼ To Install a Board
    DR Using cfgadm(1M) - Examples
    Displaying Help
    Displaying Verbose Messages
    Suppressing User Confirmation
    Power Control When Disconnecting Boards
    Power Control of Disconnected Boards
    Connecting and Configuring Boards
    Hot Plugging PCI Adapter Cards
    Testing a Board
    Displaying Attachment Point Information
    Tracking Memory Unconfigure Operations
    Finding the Board Containing Permanent Memory
Preface
This book describes the Dynamic Reconfiguration (DR) feature of the Sun™ Fire 15K
and Sun Fire 12K systems. DR enables you to attach system boards to and detach
them from Sun Fire 15K/12K domains while the Solaris operating environment
continues to run.
Before You Read This Book
This book is intended for the Sun Fire 15K/12K system administrator who has a
working knowledge of UNIX® systems, particularly those based on the Solaris™
operating environment. If you do not have such knowledge, first read the Solaris
user and system administrator books provided with this system and consider UNIX
system administration training.
How This Book Is Organized
This book contains the following chapters:
Chapter 1 “Introduction to DR on the Sun Fire 15K/12K Server”
Chapter 2 “DR State and Condition Models”
Chapter 3 “DR Operations and Software Components on the Domain”
Chapter 4 “DR User Interfaces on the Domain”
Chapter 5 “DR Domain Procedures”
Using UNIX Commands
This document may not contain information on basic UNIX® commands and
procedures such as shutting down the system, booting the system, and configuring
devices.
See one or more of the following for this information:
■ Online documentation for the Solaris™ software environment
■ Other software documentation that you received with your system
Typographic Conventions
Typeface or Symbol          Meaning                                   Examples
AaBbCc123 (monospace)       The names of commands, files, and         Edit your .login file.
                            directories; on-screen computer output    Use ls -a to list all files.
                                                                      % You have mail.
AaBbCc123 (bold monospace)  What you type, when contrasted with       % su
                            on-screen computer output                 Password:
AaBbCc123 (italic)          Book titles, new words or terms,          Read Chapter 6 in the User’s Guide.
                            words to be emphasized                    These are called class options.
                                                                      You must be superuser to do this.
                            Command-line variable; replace            To delete a file, type rm filename.
                            with a real name or value
Shell Prompts
Shell                                    Prompt
C shell                                  machine_name%
C shell superuser                        machine_name#
Bourne shell and Korn shell              $
Bourne shell and Korn shell superuser    #
Related Documentation
DR Webpage: http://www.sun.com/servers/highend/dr_sunfire
System Management Services (SMS) 1.3 Dynamic Reconfiguration User Guide
System Management Services (SMS) 1.3 Administrator Guide
Solaris 9 4/03 Release Notes Supplement for Sun Hardware
Release Notes
Part numbers: 816-7723, 816-5319, 817-1106, 816-5321, n/a
Accessing Sun Documentation Online
You can view and print a broad selection of Sun(TM) documentation, including
localized versions, at:
http://www.sun.com/documentation
You can also purchase printed copies of select Sun documentation from iUniverse,
the Sun documentation provider, at:
http://corppub.iuniverse.com/marketplace/sun
Sun Welcomes Your Comments
Sun is interested in improving its documentation and welcomes your comments and
suggestions. You can email your comments to Sun at:
docfeedback@sun.com
Please include the part number of this document (816-5075-12) in the subject line of
your email.
CHAPTER 1
Introduction to DR on the Sun Fire 15K/12K Server
This chapter describes general concepts that pertain to the
Dynamic Reconfiguration (DR) feature on the Sun Fire 15K and Sun Fire 12K servers.
What Is DR?
DR on the Sun Fire 15K/12K server enables you to perform hardware configuration
changes to a live domain that is running the Solaris operating environment, without
causing machine downtime. You can also use DR in conjunction with hot-swap to
physically add boards to or remove them from the server.
Where You Execute DR Commands
You can execute DR operations from the Sun Fire 15K/12K system controller (SC) by
using the system management services (SMS) commands: addboard(1M),
moveboard(1M), deleteboard(1M), and rcfgadm(1M); or from the domain by
using the cfgadm(1M) command. DR operations using SMS commands are
described in Chapter 5, “DR Domain Procedures.”
Note – If the addboard(1M), moveboard(1M), deleteboard(1M), rcfgadm(1M),
or cfgadm(1M) command fails during a DR operation, the board does not return to
its original state. A dxs or dca error message is logged to the domain. If the error is
recoverable, you can retry the command. If the error is unrecoverable, you must
reboot the domain to use the board.
Command Line Interface (CLI)
The DR software has a command line interface through the cfgadm(1M) command,
which is the configuration administration program. The DR agent also provides a
remote interface to the Sun Management Center 3.0 software.
Graphical User Interface (GUI)
The optional Sun Management Center 3.0 Platform Update 4 software, which is
designed for these systems, provides features such as domain management, as well
as a graphical user interface (GUI) where you perform DR operations. If you prefer
to use a graphical user interface instead of a command line interface, use the Sun
Management Center 3.0 software.
To use the Sun Management Center 3.0 Platform Update 4 software, you must attach
the system controller board to a network. With a network connection, you can view
both the command line interface and the graphical user interface. For instructions on
how to use the Sun Management Center 3.0 Platform Update 4 software, refer to the
Sun Management Center 3.0 User’s Guide, shipped with the Sun Management Center
3.0 Platform Update 4 software. For instructions on how to connect the system
controller to a network through the network connection on the system controller
board, see your system’s installation documentation.
Automatic DR
Automatic DR enables an application to execute DR operations without requiring
user interaction. This ability is provided by an enhanced DR framework that
includes the reconfiguration coordination manager (RCM) and the system event
facility, called sysevent. The RCM enables application-specific loadable modules to
register callbacks. The callbacks perform preparatory tasks before a DR operation,
error recovery during a DR operation, or clean-up after a DR operation. The
sysevent facility enables applications to register for system events and receive
notifications of those events. The automatic DR framework interfaces with the RCM
and with the sysevent facility to enable applications to automatically give up
resources prior to unconfiguring them and to capture new resources as they are
configured into the domain.
Enhanced System Availability
The DR feature enables you to hot-swap system boards without bringing the server
down. It is used to unconfigure the resources on a faulty system board from a
domain so that the system board can be removed from the server. The repaired or
replacement board can be inserted into the domain while the Solaris operating
environment continues to run. DR then configures the resources on the board into
the domain. If you use the DR feature to add or remove a system board or
component, DR always leaves the board or component in a known configuration
state. See Chapter 2 “DR State and Condition Models” for more information about
configuration states for system boards and components.
DR Concepts
This section contains descriptions of general DR concepts that pertain to Sun Fire
15K/12K domains. For more information about DR concepts on the SC, refer to the
System Management Services (SMS) 1.3 Dynamic Reconfiguration User Guide.
Detachability
For a device to be detachable, it must conform to the following items:
■ The device driver must support DDI_DETACH.
■ Critical resources must be redundant or accessible through an alternate pathway.
CPUs and memory banks can be redundant critical resources. Disk drives are
examples of critical resources that can be accessible through an alternate pathway.
Some boards cannot be detached because their resources cannot be moved. For
example, if a domain has only one CPU board, that CPU board cannot be detached.
An I/O board is not detachable if it controls the boot drive.
If there is no alternate pathway for an I/O board, you can:
■ Put the disk chain on a separate I/O board. The secondary I/O board can then be
detached.
■ Add a second path to the device through a second I/O board so that the I/O
board can be detached without losing access to the secondary disk chain.
Note – If you are unsure whether a device is detachable, consult your Sun service
representative.
Quiescence
During the unconfigure operation on a system board with permanent memory
(OpenBoot™ PROM or kernel memory), the operating environment is briefly
paused, which is known as operating environment quiescence. All operating
environment and device activity on the domain must cease during this critical phase
of the operation.
Before it can achieve quiescence, the operating environment must temporarily
suspend all processes, CPUs, and device activities. If the operating environment
cannot achieve quiescence, it displays the reasons, which may include the following:
■ An execution thread did not suspend.
■ A device exists that cannot be paused by the operating environment.
Note – Real-time processes do not prevent quiescence.
The conditions that cause processes to fail to suspend are generally temporary.
Examine the reasons for any failure, and if the operating environment encountered a
failure to suspend a process, simply try the operation again.
Suspend-Safe and Suspend-Unsafe Devices
When DR suspends the operating environment, all of the device drivers that are
attached to the operating environment must also be suspended. If a driver cannot be
suspended (or subsequently resumed), the DR operation fails.
A suspend-safe device does not access memory or interrupt the system while the
operating environment is in quiescence. A driver is suspend-safe if it supports
operating environment quiescence (if it can be suspended and then resumed). A
suspend-safe driver also guarantees that when a suspend request is successfully
completed, the device that the driver manages will not attempt to access memory,
even if the device is open when the suspend request is made.
A suspend-unsafe device allows a memory access or a system interruption to occur
while the operating environment is in quiescence.
DR uses an unsafe driver list in the dr.conf file to prevent unsafe devices from
accessing memory or interrupting the operating environment during a DR
operation. The dr.conf file resides in the following directory:
/platform/SUNW,Sun-Fire-15000/kernel/drv/. The unsafe driver list is a
property in the dr.conf file with the following format:
DR reads this list when it prepares to suspend the operating environment so that it
can unconfigure a memory component. If DR finds an active driver in the unsafe
driver list, it aborts the DR operation and returns an error message. The message
includes the identity of the active, unsafe driver. You must manually stop the
use of the device by performing one or more of the following tasks:
■ Kill the processes using the device.
■ Unload the driver by using the modunload(1M) command.
■ Disconnect the cables (depending on the type of device).
You can retry the DR operation after you have stopped usage of the device.
Note – If you are unsure whether a device is suspend-safe, contact your Sun service
representative.
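The unsafe-driver check described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the DR driver's implementation: the function names and the sample driver names are hypothetical, and the unsafe list is passed in as an already-parsed collection rather than read from the dr.conf file.

```python
def find_active_unsafe_drivers(unsafe_drivers, active_drivers):
    """Return the active drivers that also appear in the unsafe driver list.

    Hypothetical helper: both arguments are plain iterables of driver names.
    """
    unsafe = set(unsafe_drivers)
    # Keep the order in which the active drivers were reported.
    return [name for name in active_drivers if name in unsafe]


def can_suspend(unsafe_drivers, active_drivers):
    """Model the decision: refuse to suspend if any active driver is unsafe."""
    blocking = find_active_unsafe_drivers(unsafe_drivers, active_drivers)
    if blocking:
        return False, "unsafe driver(s) active: " + ", ".join(blocking)
    return True, "no active unsafe drivers"


if __name__ == "__main__":
    # Made-up driver names, for illustration only.
    print(can_suspend(["drvA", "drvB"], ["drvC", "drvB"]))
```

In this model, the returned message plays the role of the error message that names the active, unsafe driver; after the driver is no longer active, the same check passes and the operation can be retried.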
Attachment Points
An attachment point is a collective term that refers to a board slot, a system board
installed in the slot, and any devices connected to the board. DR can display the
status of the board, the board slot, and the attachment point. The term occupant
refers to the combination of a board and its attached devices.
■ A board slot (sometimes referred to as a receptacle) has the ability to electrically
isolate the occupant from the host machine. The software can put a board slot into
low-power mode.
■ Board slots can be named according to slot numbers, or can be anonymous (for
example, a SCSI chain).
■ An occupant I/O board includes any external storage devices connected by
interface cables.
There are two types of names for attachment points:
■ A physical attachment point describes the software driver and location of the slot.
Examples of physical attachment point names are:
/devices/pseudo/dr@0:SBx (for a CPU/memory board in slot 0)
-OR-
/devices/pseudo/dr@0:IOx (for an I/O board or Max CPU board in slot 1)
where x represents the expander number (0 through 17 on the Sun Fire 15K
system, and 0 through 8 on the Sun Fire 12K system) for a particular board.
Note – CPU/memory boards are installed only in slot 0. I/O boards and Max CPU
boards are installed only in slot 1.
■ A logical attachment point is an abbreviated name created by the system to refer
to the physical attachment point. Logical attachment points take one of the
following two forms:
SBx (for CPU/memory boards in slot 0)
-OR-
IOx (for I/O boards or Max CPU boards in slot 1)
To obtain a list of all available logical attachment points, use the cfgadm(1M)
command with its -l option.
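The naming rules above can be summarized in a short Python sketch that maps a logical attachment point to its physical name and slot. The function names and the `model` parameter are assumptions made for this illustration; only the name patterns, the slot assignments, and the expander ranges come from the text.

```python
import re

# Expander ranges stated in the text: 0-17 on the Sun Fire 15K, 0-8 on the 12K.
MAX_EXPANDER = {"15K": 17, "12K": 8}

# Logical attachment points look like SBx or IOx, where x is a number.
_LOGICAL = re.compile(r"(SB|IO)(\d+)")


def parse_logical(ap_id, model="15K"):
    """Split a logical attachment point such as 'SB3' into (kind, expander)."""
    match = _LOGICAL.fullmatch(ap_id)
    if match is None:
        raise ValueError("not a logical attachment point: %r" % ap_id)
    kind, expander = match.group(1), int(match.group(2))
    if expander > MAX_EXPANDER[model]:
        raise ValueError("expander %d out of range for %s" % (expander, model))
    return kind, expander


def physical_name(ap_id, model="15K"):
    """Build the physical attachment point name for a logical one."""
    kind, expander = parse_logical(ap_id, model)
    return "/devices/pseudo/dr@0:%s%d" % (kind, expander)


def slot_number(ap_id, model="15K"):
    """SB boards occupy slot 0; IO (I/O or Max CPU) boards occupy slot 1."""
    kind, _ = parse_logical(ap_id, model)
    return 0 if kind == "SB" else 1
```

For example, under these assumptions `physical_name("IO4")` yields `/devices/pseudo/dr@0:IO4`, and `parse_logical("SB9", model="12K")` raises an error because the Sun Fire 12K has only expanders 0 through 8.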
Conditions and States
A state is the operational status of either a board slot or its occupant. A condition is
the operational status of an attachment point. The cfgadm(1M) command can
display nine types of states and conditions. See Chapter 2, “DR State and Condition
Models,” for descriptions of the conditions and states for system boards and
components.
DR Operations
There are four main types of operations related to boards: connection, configuration,
unconfiguration, and disconnection. A board that is brought into a domain is first
connected and then configured. A board that is removed from a domain is first
unconfigured and then disconnected.
During the connect operation, the system provides power to the slot, and the
operating environment begins monitoring the board’s temperature.
During the configure operation, the operating environment assigns functional roles
to the board, and loads device drivers for the board and for devices attached to it.
During the unconfigure operation, the system detaches the board logically from the
operating environment and takes the associated device drivers offline.
Environmental monitoring continues, but devices on the board are not available for
system use.
During the disconnect operation, the system stops monitoring the board and power
to the slot is turned off.
To power off a board that is in use (configured), first stop its use (unconfigure it),
and then disconnect it from the domain. After a new or upgraded system board is
inserted into the slot, connect the board and configure it.
The cfgadm(1M) command can connect and configure (or unconfigure and
disconnect) in a single command. To connect and configure a board using a single
command, see the section “Adding a Board” in Chapter 5. To unconfigure and
disconnect a board using a single command, see the section “Removing a Board” in
Chapter 5.
If necessary, each operation (connect, configure, unconfigure, or disconnect) can be
performed separately using the cfgadm(1M) command.
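The required ordering of the four operations can be sketched as a small Python model. The three state names used here are a simplification chosen for this example (the actual states reported by cfgadm are described in Chapter 2), and the helper names are hypothetical. The last function only builds a `cfgadm -c` command line as a list of arguments; it does not run anything.

```python
# Simplified three-state model; these state names are illustrative only.
ORDER = ["disconnected", "connected", "configured"]

# Operation that moves a board one step up or one step down the ORDER list.
STEP_UP = {"disconnected": "connect", "connected": "configure"}
STEP_DOWN = {"configured": "unconfigure", "connected": "disconnect"}


def plan_operations(current, target):
    """List the DR operations, in order, that take a board from current to target."""
    i, j = ORDER.index(current), ORDER.index(target)
    ops = []
    while i < j:
        ops.append(STEP_UP[ORDER[i]])
        i += 1
    while i > j:
        ops.append(STEP_DOWN[ORDER[i]])
        i -= 1
    return ops


def cfgadm_argv(operation, ap_id):
    """Build (but do not run) a cfgadm command line for one operation."""
    return ["cfgadm", "-c", operation, ap_id]
```

Under this model, bringing a board in from the disconnected state yields the sequence connect, configure, and removing a configured board yields unconfigure, disconnect, which matches the order described above.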
Hot-Plug Hardware
Hot-plug boards and modules have special connectors that supply electrical power
to the board or module before the data pins make contact. Boards and devices that
do not have hot-plug connectors cannot be inserted or removed while the system is
running.
I/O boards and CPU/memory boards used in the Sun Fire 15K/12K server are hot-plug
devices. Some devices, such as the peripheral power supply, are not hot-plug
modules and cannot be removed while the system is running.
Sun Fire 15K/12K Domains
The Sun Fire 15K/12K server can be divided into dynamic system domains, which
are composed of logical and physical groupings of system board slots. Each domain
is electrically isolated into hardware partitions, which ensures that a problem
encountered in one domain cannot affect other domains.
Domain configuration is determined by the domain configuration table in the
platform configuration database (PCD), which resides on the SC. The domain table
controls how system board slots are logically partitioned into domains. The domain
configuration represents the intended domain configuration. Thus, the configuration
can include empty slots and occupied slots.
The number of slots available to a given domain is controlled by an available
component list that is maintained on the system controller. (Refer to the System
Management Services (SMS) 1.3 Administrator Guide for more information about the
available component list.) After a slot has been assigned to a domain, it becomes
visible to that domain and unavailable and invisible to any other domain.
Conversely, you must disconnect and unassign a slot from its domain before you can
assign and connect it to another domain.
The logical domain is the set of slots that belong to the domain. The physical domain
is the set of boards that are physically interconnected. A slot can be a member of a
logical domain and not be part of a physical domain.
After a domain is booted, the system boards and empty slots can be assigned to (or
unassigned from) a logical domain; however, they cannot become a part of the
physical domain until the operating environment requests it.
System boards or slots that are not assigned to a domain are available to all domains
in whose available component lists they appear. These boards can be assigned to a
domain by the platform administrator. Or, an available component list can be set up
on the system controller to allow users with appropriate privileges to assign
available boards to a domain.
Component Types
You can use DR to configure or to unconfigure several types of components:
Component Type    Description
cpu               An individual CPU
memory            All of the memory on the board
pci               Any I/O device, controller, or bus
DR on I/O Boards
You must use caution when you add or remove I/O boards to which devices are
attached. Before you can remove a board with I/O devices, all of its devices must be
closed and all of its file systems must be unmounted.
If you need to remove an I/O board with attached devices from a domain
temporarily and then re-add it before any other boards with I/O devices are added,
reconfiguration is not necessary. In this case, device paths to the board devices
remain unchanged.
Solving a Problem With an I/O Device
All I/O devices must be closed before they are unconfigured. If you encounter a
problem with an I/O device, the following list can help you to overcome the
problem.
■ Use the fuser(1M) command to see which processes have the device open.
■ Run showdevices(1M) on the SC to determine the state and usage of the device.
■ If disk mirroring is being used to access a device connected to the board,
reconfigure the device so that it is accessible by way of controllers on other
system boards.
■ Unmount file systems.
■ Remove multipathing databases from board-resident partitions. The location of
multipathing databases is explicitly chosen by the user and can be changed.
Refer to the System Management Services (SMS) 1.3 Dynamic Reconfiguration User
Guide for special instructions for I/O devices.
■ Remove any private regions used by volume managers. By default, volume
managers use a private region on each device that they control. Such devices
must be removed from volume manager control before they can be detached.
■ Take any RSM 2000 controllers offline by using the rm6 or rdacutil commands.
■ Remove disk partitions from the swap configuration.
■ If a detach-unsafe device is present on the board, close all instances of the device
and use modunload(1M) to unload the driver.
■ Unmounting file systems may affect NFS client systems.
■ Either kill any process that directly opens a device or raw partition, or direct it to
close the open device on the board.
Note – If you use the ndd(1M) command to set the configuration parameters for
network drivers, the parameters may not persist after a DR operation. Use the
/etc/system file or the driver.conf file for a specific driver to set the
parameters permanently.
Golden IOSRAM
Each I/O board in a domain contains an IOSRAM device. However, only one
IOSRAM device, called golden IOSRAM, is used for SC-to-domain communications
at a time. The golden IOSRAM contains the “tunnel” that is used for SC-to-domain
communications. Because DR can remove I/O boards, it is sometimes necessary to
stop using the current golden IOSRAM and make another IOSRAM device the
golden IOSRAM. This process is called a “tunnel switch,” and takes place whenever
DR unconfigures the current golden IOSRAM.
When a domain is booted, the lowest-numbered I/O board in the domain is
typically selected to be the initial golden IOSRAM.
DR on hsPCI+ I/O Boards
DR supports dynamic reconfiguration of hsPCI+ I/O boards. Each hsPCI+ I/O
board includes two XMITS ASICs and four hot-pluggable hsPCI slots.
Permanent and Non-permanent Memory
Before you can delete a board, the operating environment must vacate the memory
on that board. Vacating a board entails flushing the contents of its non-permanent
memory to swap space; and copying the contents of its permanent memory (that is,
the kernel and OpenBoot™ PROM software) to another memory board.
To relocate permanent memory, the operating environment on a domain must be
temporarily quiesced. The length of the quiescence depends on the domain I/O
configuration and the running workloads.
Detaching a board with permanent memory is the only time when the operating
environment is quiesced; therefore, you should know where permanent memory
resides so that you can avoid impacting the operation of the domain significantly. To
display the size of permanent memory, use the cfgadm(1M) command with its -av
option. To vacate a board that has permanent memory, the operating environment
must find a sufficiently large block of available memory, called target memory, on
which to copy the current contents of permanent memory, which is referred to as
source memory.
Target Memory Constraints
A DR memory operation can be disallowed if the target domain does not have
enough memory to hold the contents currently stored in permanent memory.
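The constraint can be illustrated with a minimal Python sketch. The data model (a dictionary of available memory per board, in megabytes), the function name, and the rule of preferring the smallest sufficient block are all assumptions made for this example; the text does not describe how the operating environment actually selects target memory.

```python
def choose_target(source_board, permanent_mb, available_mb_by_board):
    """Pick a board whose available memory can hold the permanent memory.

    Returns the board name, or None if no other board is large enough,
    which is the case in which the operation would be disallowed.
    """
    candidates = [
        (available, board)
        for board, available in available_mb_by_board.items()
        if board != source_board and available >= permanent_mb
    ]
    if not candidates:
        return None
    # Smallest sufficient block first; this tie-break is an arbitrary choice.
    return min(candidates)[1]
```

With 2048 MB of permanent memory on SB0 and other boards offering 1024, 4096, and 3072 MB, this sketch selects the 3072 MB board; if only the 1024 MB board were present, it would return None.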
Correctable Memory Errors
Correctable memory errors indicate that the memory on a system board (that is, one
or more of its Dual Inline Memory Modules (DIMMs), or portions of the hardware
interconnect) may be faulty and need replacement. When the SC detects correctable
memory errors, it initiates a record-stop dump to save the diagnostic data, which can
interfere with a DR operation.
When a record-stop occurs from a correctable memory error, allow the record-stop
dump to complete before you initiate a DR operation.
If the faulty component causes repeated reporting of correctable memory errors, the
SC performs multiple record-stop dumps. If this happens, you should temporarily
disable the dump-detection mechanism on the SC; allow the current dump to finish;
then initiate the DR operation. After the DR operation finishes, re-enable the dump
detection.
Capacity on Demand (COD)
The COD option provides additional CPU resources on COD CPU/Memory boards
that you install in your Sun Fire 15K/12K system. Although your Sun Fire 15K/12K
system comes configured with a minimum number of standard (active)
CPU/Memory boards, it can have a mix of both standard and COD CPU/Memory
boards installed, up to a maximum of 18 boards. At least one active CPU is required for
each domain in the system.
DR on COD Boards
You can use DR to move COD boards into and out of domains in the same way you
use DR to move standard CPU/Memory boards.
You can use the CPUs on a COD board only after you purchase right-to-use (RTU)
licenses for them. Each COD RTU license entitles you to receive a COD RTU license
key that enables a specified number of CPUs on COD boards in a single system.
Whenever you use DR to configure a COD board into a domain, make sure that
enough RTU licenses are available to the target domain to enable each active CPU on
the COD board. If there are not enough RTU licenses available to a target domain
when you add a COD board, a status message is displayed for each CPU that cannot
be enabled in the domain.
For more information about the COD option, see the System Management Services (SMS) 1.3 Administrator Guide.
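The license check described above comes down to simple arithmetic. The following shell sketch is illustrative only: the function name unlicensed_cpus and its two numeric arguments are assumptions made for this example and are not part of any Sun tool. See the SMS Administrator Guide for how RTU licenses are actually tracked.

```sh
#!/bin/sh
# Sketch: estimate how many CPUs on a COD board would stay disabled
# because too few RTU licenses are available to the target domain.
# unlicensed_cpus is a hypothetical helper, not a Sun command.

# unlicensed_cpus AVAILABLE_RTU_LICENSES ACTIVE_CPUS_ON_BOARD
unlicensed_cpus() {
    if [ "$2" -gt "$1" ]; then
        echo $(( $2 - $1 ))
    else
        echo 0
    fi
}

# Example: 2 licenses available, 4 active CPUs on the COD board.
echo "CPUs that cannot be enabled: $(unlicensed_cpus 2 4)"
```

A nonzero result suggests that a status message would be displayed for that many CPUs when the board is configured into the domain.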
Enabling DR on Domains Running the
Solaris 8 2/02 Operating Environment
While the Solaris 9 4/03 operating environment supports the full functionality of
DR, previous versions of the Solaris operating environment did not support
reconfiguration of I/O boards.
You can enable the full functionality of DR on domains that are running the Solaris 8
2/02 software by installing patches and a new kernel update on the domain, and by
installing the System Management Services (SMS) 1.3 software on the system
controller (SC).
For complete information and instructions for enabling DR on a domain that is
running the Solaris 8 2/02 software, visit:
http://www.sun.com/servers/highend/dr_sunfire
An Illustration of DR Concepts
DR lets you disconnect, then reconnect system circuit boards without bringing the
system down. You can use DR to add or remove system resources while the system
continues to operate.
To illustrate reconfiguration of system resources, consider the following Sun Fire 15K
system configuration, as depicted in the diagram that follows.
Note – The following diagram illustrates DR operations on a Sun Fire 15K system.
For a Sun Fire 12K system everything is identical, including the commands that you
enter, except that there is a maximum of nine boards, numbered 0 through 8.
Domain A contains system boards 0 and 2, and I/O board 2. Domain B contains
system boards 1 and 3, and I/O boards 1, 3, and 4.
[Diagram: system boards 0 through 17 and I/O boards 0 through 17, with the boards that belong to Domain A and Domain B marked]
FIGURE 1-1 Domains A & B before Reconfiguration
To assign system board 4 and I/O board 0 to Domain A, and to move I/O board 4
from Domain B to Domain A, you can use the Sun Management Center software’s
GUI. Alternatively, you can perform the following steps manually on the CLI in
each domain:
1. Enter the following configuration command on the command line in Domain B to
disconnect I/O board 4 from Domain B:
# cfgadm -c disconnect -o nopoweroff,unassign IO4
2. Then, enter the following single command on the command line in Domain A,
which assigns, connects, and configures system board 4 and I/O boards 0 and 4
into Domain A:
# cfgadm -c configure SB4 IO0 IO4
The following system configuration is the result. Only the way in which the boards
are connected has changed, not the physical layout of the boards within the cabinet.
[Diagram: system boards 0 through 17 and I/O boards 0 through 17, with the boards that belong to Domain A and Domain B marked after the reconfiguration]
FIGURE 1-2 Domains A & B after Reconfiguration
CHAPTER 2
DR State and Condition Models
This chapter contains descriptions of the state and condition models for boards and
components. The state models are divided into two categories: receptacle and
occupant.
Before you attempt to perform any DR operation on a board or component from the
domain, determine its state and condition. To display the type, state, and condition
of each component and the state and condition of each board slot in the domain, use
the cfgadm(1M) command with the -la options. See the section “Component
Types” on page 8 for a list of the component types.
You can use the prtdiag(1M) command to display information about board slots
and components. The prtdiag(1M) command displays board numbers in the
format SBxx or IOxx, where xx is the board number that includes leading zeroes.
Board States and Conditions
This section contains descriptions of the states and conditions of system boards and
board slots (also known as receptacles).
Board Slot States
A slot can have one of three states: empty, disconnected, or connected.
Name            Description
empty           A board is not present.
disconnected    The board is disconnected from the system bus. A board can be
                in the disconnected state without being powered off. However,
                a board must be powered off and in the disconnected state
                before you remove it from the slot.
connected       The board is powered on and connected to the system bus. You
                can view the components on a board only after it is in the
                connected state.
Whenever you insert a board into a slot, the slot’s state changes from empty to
disconnected. Whenever you remove a board, the slot’s state changes from
disconnected to empty.
Note – Physically removing a board that is in the connected state, or that is
powered on and in the disconnected state, crashes the operating system and can
result in permanent damage to that system board.
Board Occupant States
A board can have one of two occupant states: configured or unconfigured. The
occupant state of a disconnected board is always unconfigured.
Name            Description
configured      At least one component on the board is configured.
unconfigured    All of the components on the board are unconfigured.
Board Conditions
A board can be in one of four conditions: unknown, ok, failed, or unusable.
Name            Description
unknown         The board has not been tested.
ok              The board is operational.
failed          The board failed testing.
unusable        The board slot is unusable.
Component States and Conditions
This section contains descriptions of the states and conditions for components.
Component Receptacle States
A component cannot be individually connected or disconnected. Thus, components
can have only one state: connected.
Component Occupant States
A component can have one of two occupant states: configured or unconfigured. The
following table contains the name and description of the occupant states for
components.
Name            Description
configured      The component is available for use by the Solaris operating
                environment.
unconfigured    The component is not available for use by the Solaris
                operating environment.
Component Conditions
A component can have one of three conditions: unknown, ok, and failed. The
following table contains the name and description of the conditions for components.
Name            Description
unknown         The component has not been tested.
ok              The component is operational.
failed          The component failed testing.
CHAPTER 3
DR Operations and Software Components on the Domain
This chapter contains descriptions of the four general DR operations: connect,
configure, disconnect, and unconfigure. For more information on how to perform
these operations, see Chapter 5, “DR Domain Procedures.”
This chapter also contains information about the various software components that
work together to accomplish DR operations. The components that are used during a
DR operation depend entirely on the point of initiation of the DR operation. For
example, if you initiate the DR operation from the Sun Fire 15K/12K system
controller (SC), the system uses several more software components to accomplish the
DR operation than when you initiate the DR operation from the domain.
For more information about the software components that reside on the SC, refer to
the System Management Services (SMS) 1.3 Dynamic Reconfiguration User Guide.
DR Operations
This section contains descriptions of the four general DR operations: connect,
configure, disconnect, and unconfigure. These operations are described from the
point of view of the domain. They do not contain information that is specific to the
SC.
Before You Perform DR Operations
Before you perform DR operations for the first time on a domain after it has been
booted, make sure the board is available to the domain. To display a list of boards
that are available to the domain, use the cfgadm(1M) command with its -l option.
An error may occur if you attempt to perform DR operations on a board that:
■ Is not listed in the domain’s available component list and is not assigned to the
domain; or
■ Is listed in the domain’s available component list, but is assigned to another
domain.
In either of these cases, the board is not available to the domain. For more
information about the available component list, refer to the System Management Services (SMS) 1.3 Administrator Guide.
Before Performing DR Operations on I/O Boards
Before you attempt to perform DR operations on an I/O board in a domain, make
sure there are at least two CPUs available to the domain. Further, make sure that at
least one of those CPUs is located on a CPU/memory board; and that no processes
are bound to it. See the pbind(1M) man page for more information about bound
processes.
When you use DR to configure an I/O board into a domain (or to test an I/O board
explicitly using the cfgadm(1M) command with its -t option), one CPU that is an
occupant on a CPU/Memory board in the same domain is selected to test the board.
Further, no process can be bound to the CPU, and at least one additional CPU must
remain in the domain. If no such CPU is available to perform the test, a message
such as the following is displayed: “WARNING: No CPU available for I/O cage
test.”
The CPU is unconfigured from the domain and the I/O board is tested. After the test is
complete, the CPU is configured back into the domain. After the CPU is successfully
reconfigured, its timestamp as displayed by the psrinfo(1M) command will differ
from timestamps for other CPUs in the domain.
Connect Operation
During the connect operation, DR attempts to assign the slot to the domain if a
system board is available and if it is not part of any logical domain. After the slot
has been assigned, DR requests that the SC power on and test the board. After the
board has been tested, DR requests the SC to connect the board electronically to the
system, which makes the board part of the physical domain. The operating system
then probes the components on the board.
To connect a system board through the domain rather than the SC, use the
cfgadm(1M) command as follows:
# cfgadm -c connect SBx
where x represents the number of a particular board (0 through 17 for a Sun Fire
15K, or 0 through 8 for a Sun Fire 12K).
Note – If the cfgadm(1M) command fails during a DR operation, the board does not
return to its original state. A dxs or dca error message is logged to the domain. If
the error is recoverable, you can retry the command. If the error is unrecoverable,
you will need to reboot the domain to use the board.
The syntax of the cfgadm(1M) command to connect an I/O board is as follows:
# cfgadm -c connect IOx
where x represents the number (0 through 17 for a Sun Fire 15K, or 0 through 8 for a
Sun Fire 12K) of a particular board.
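The board number ranges above can be checked before a command is issued. The following shell sketch assumes hypothetical helper names (max_board, board_ap_id) and the platform labels "15K" and "12K"; it only prints the resulting command instead of running it.

```sh
#!/bin/sh
# Sketch: validate a board number for the platform before building a
# logical attachment point such as SB4 or IO0.
# max_board and board_ap_id are illustrative names, not Sun commands.

# max_board PLATFORM -> prints the highest board number (17 or 8)
max_board() {
    case "$1" in
        15K) echo 17 ;;
        12K) echo 8 ;;
        *)   return 1 ;;
    esac
}

# board_ap_id PLATFORM KIND NUMBER -> prints SBx or IOx, or fails
board_ap_id() {
    max=$(max_board "$1") || return 1
    case "$2" in SB|IO) ;; *) return 1 ;; esac
    case "$3" in ''|*[!0-9]*) return 1 ;; esac
    [ "$3" -le "$max" ] || return 1
    echo "$2$3"
}

# Dry run: print the connect command rather than executing it.
if ap=$(board_ap_id 15K SB 4); then
    echo "cfgadm -c connect $ap"
fi
```

Rejecting an out-of-range number early avoids issuing a command against an attachment point that cannot exist on a Sun Fire 12K system.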
The states and conditions for the attachment point before a board is inserted are:
■ Receptacle state—Empty
■ Occupant state—Unconfigured
■ Condition—Unknown
After a board is physically inserted, the states and conditions are:
■ Receptacle state—Disconnected
■ Occupant state—Unconfigured
■ Condition—Unknown
After the attachment point is logically connected, the states and conditions are:
■ Receptacle state—Connected
■ Occupant state—Unconfigured
■ Condition—OK
Configure Operation
During the configure operation, DR attempts to connect the board slot if its state is
disconnected. It then traverses the tree of devices that was created during the
connect operation. (DR creates Solaris device tree nodes and attaches device drivers
if necessary.)
The CPUs are added to the CPU list; and memory is initialized and added to the
system memory pool. After the configure function has completed successfully, the
CPUs and memory are ready for use.
For I/O devices, you must use the mount(1M) and ifconfig(1M) commands before
the devices can be used.
When you configure a board into a domain using cfgadm, the board is automatically
connected and configured.
CPUs and Memory
To configure a CPU on a system board through the domain rather than the SC, use
the cfgadm(1M) command as follows:
# cfgadm -c configure SBx::cpuy
where x represents the board number (0 through 17 on a Sun Fire 15K system, and 0
through 8 on a Sun Fire 12K system) and y represents the CPU number (0 through 3).
The syntax of the cfgadm(1M) command to configure memory is as follows:
# cfgadm -c configure SBx::memory
where x represents the board number (0 through 17 on a Sun Fire 15K system, and 0
through 8 on a Sun Fire 12K system). For memory, the command applies to all the
memory on the system board.
To configure all the CPUs and memory on a system board, use the following
command:
# cfgadm -c configure SBx
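The component attachment point syntax shown above (SBx::cpuy and SBx::memory) can be composed with small helpers. This is a minimal sketch: cpu_ap_id and mem_ap_id are hypothetical function names, and the loop only prints the commands rather than running them.

```sh
#!/bin/sh
# Sketch: build dynamic attachment point IDs for the components on a
# system board. Function names are illustrative only.

# cpu_ap_id BOARD CPU -> prints SBx::cpuy (CPU number must be 0 through 3)
cpu_ap_id() {
    case "$1" in ''|*[!0-9]*) return 1 ;; esac
    case "$2" in [0-3]) ;; *) return 1 ;; esac
    echo "SB$1::cpu$2"
}

# mem_ap_id BOARD -> prints SBx::memory (all memory on the board)
mem_ap_id() {
    case "$1" in ''|*[!0-9]*) return 1 ;; esac
    echo "SB$1::memory"
}

# Dry run: print the per-component configure commands for board 2.
for cpu in 0 1 2 3; do
    echo "cfgadm -c configure $(cpu_ap_id 2 "$cpu")"
done
echo "cfgadm -c configure $(mem_ap_id 2)"
```

Configuring the whole board with a single SBx operand, as shown above, is usually simpler; the per-component form is useful when only one CPU or the memory needs to change state.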
I/O Boards
To configure one of the PCI slots that holds the PCI adapter with hotplug capability,
the syntax of the cfgadm(1M) command is as follows:
# cfgadm -c configure pcisch0:e00b1slot1
For more information, see “Hot Plugging PCI Adapter Cards” on Page 41.
To configure an I/O board, use the following command:
# cfgadm -c configure IOx
After the Configure Operation
The states and conditions for a configured attachment point are:
■ Receptacle state—Connected
■ Occupant state—Configured
■ Condition—OK
Now the system is aware of the usable devices that reside on the board, and all
devices can be mounted or configured for use.
Disconnect Operation
During a disconnect operation, the DR framework communicates with the SC to
program the interconnect so that the system board is removed from the physical
domain. It then attempts to perform the tasks related to the unconfigure operation.
A board can be in the disconnected state without being powered off. However, the
board must be powered off and in the disconnected state before you can remove it
from the slot.
The syntax of the cfgadm(1M) command to disconnect the board is as follows:
# cfgadm -c disconnect SBx
where x represents the board number (0 through 17 on a Sun Fire 15K system, and 0
through 8 on a Sun Fire 12K system) for a particular board.
Before the board is disconnected, the states and conditions are:
■ Receptacle state—Connected
■ Occupant state—Configured
■ Condition—OK
After the board is disconnected, the states and conditions are:
■ Receptacle state—Disconnected
■ Occupant state—Unconfigured
■ Condition—Unknown
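The state and condition lists in the connect, configure, and disconnect sections can be summarized as a small lookup. The following shell sketch is an illustration only: the stage names and the functions expected_states and stage_matches are assumptions, and the values are written in lowercase, as in the state tables in Chapter 2.

```sh
#!/bin/sh
# Sketch of the attachment point states listed in the connect, configure,
# and disconnect sections. Stage and function names are illustrative.

# expected_states STAGE -> prints "receptacle occupant condition"
expected_states() {
    case "$1" in
        empty)        echo "empty unconfigured unknown" ;;
        inserted)     echo "disconnected unconfigured unknown" ;;
        connected)    echo "connected unconfigured ok" ;;
        configured)   echo "connected configured ok" ;;
        disconnected) echo "disconnected unconfigured unknown" ;;
        *)            return 1 ;;
    esac
}

# stage_matches STAGE R_STATE O_STATE CONDITION -> exit status 0 on a match
stage_matches() {
    want=$(expected_states "$1") || return 1
    [ "$want" = "$2 $3 $4" ]
}

# Print the whole table.
for stage in empty inserted connected configured disconnected; do
    echo "$stage: $(expected_states "$stage")"
done
```

A script could use stage_matches to confirm that a board reached the expected state after a DR operation before it continues.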
Unconfigure Operation
The unconfigure operation can consist of a single operation or two separate
operations, depending on the presence of permanent memory. If the system board
hosts permanent memory, before the unconfigure operation DR moves the memory
contents from the specified board to available memory on a target board in the
domain. See the section “Permanent and Non-permanent Memory” on page 10 for
more information about boards that host permanent memory.
Non-permanent Memory
If the reconfiguration coordination manager (RCM) is present, then DR informs the
RCM about the DR operation. The RCM informs client applications, and the client
applications perform preparatory tasks such as stopping the usage of devices. The
clients communicate their readiness to the RCM, and the RCM communicates its
readiness to DR. Depending on the responses, DR either continues, or aborts the
operation and reports an error to the user.
During the unconfigure operation, DR unconfigures the board resources from the
Solaris operating environment and leaves the board in the disconnected state.
If the board hosts CPUs and/or memory, DR removes them from the Solaris
operating environment, making them unusable to the operating system. If the board
is an I/O board, DR detaches the device drivers.
Permanent Memory
The following paragraphs and examples specifically illustrate the unconfigure
operation for permanent memory.
In the following code examples, the permanent memory on board 0 must be moved
to another board in the domain, board 1. Board 0 is the source board, and board 1 is
the target board.
For brevity, the CPU information has been removed from the code examples. On the
domain, the unconfigure operation is started with the cfgadm(1M) command:
# cfgadm -c unconfigure -y SB0::memory &
First, a block of memory on the target board that resides in the same address range
as the permanent memory on the source board must be deleted. During this phase,
the source board, the target board, and the memory attachment points are marked as
busy. You can display the status with the following command:
# cfgadm -a -s cols=ap_id:type:r_state:o_state:busy SB0 SB1
After the memory has been deleted on board 1, it is marked as unconfigured. The
memory on the source board remains configured, but it is still marked as busy, as in the
following example.
The memory from the source board is then copied to the target board. After it has
been copied, the occupancy state for the memory is switched. The memory on the
source board becomes unconfigured, and the memory on the target board becomes
configured. At this point in the process, only the source board remains busy, as in
the following example.
After the entire process has been completed, the memory on the source board
remains unconfigured, and the attachment points are not busy, as in the following
example.
The permanent memory has been moved, and the memory on the source board has
been unconfigured. At this point, you can initiate a new status change operation on
either board.
Software Components
This section describes the software components that reside on the domain and make
DR operations possible. However, it does not contain descriptions of all of the DR
components on the Sun Fire 15K/12K platform. Refer to the System Management Services (SMS) 1.3 Dynamic Reconfiguration User Guide for descriptions of the software
components that reside on the Sun Fire 15K/12K system controller (SC).
Domain Configuration Server
The domain configuration server (DCS) is a daemon process that runs on a Sun Fire
15K/12K domain and is started by inetd(1M) when the first remote DR request is
received. A single instance of the DCS runs in each domain on the Sun Fire
15K/12K. The DCS accepts DR requests from the domain configuration agent (DCA)
that runs on the SC. After the DCS accepts a DR operation, it performs the request
and returns the results to the DCA. Refer to the System Management Services (SMS)
1.3 Dynamic Reconfiguration User Guide for more information about the DCA.
Note – If you alter or remove the sun-dr entry in the inetd.conf file, make the
same change to the sun-dr entry in the ipsecinit.conf file.
DR Driver
The DR driver consists of a platform-independent driver, named dr, and a platform-
specific module, named drmach. The DR driver uses standard features of the Solaris
operating environment whenever possible to control DR operations, and it calls the
platform-specific module as needed. The DR driver is responsible for creating minor
nodes in the file system that are used as attachment points for DR operations.
Reconfiguration Coordination Manager
The reconfiguration coordination manager (RCM) is a daemon process that
coordinates DR operations on resources that are present in the domain. The RCM
daemon uses generic application program interfaces (APIs) to coordinate DR
operations between DR initiators and RCM clients.
The RCM consumers consist of DR initiators, which request DR operations, and DR
clients, which react to DR requests. Normally, the DR initiator is the configuration
administration command, cfgadm(1M). However, it can also be a GUI such as Sun™
Management Center.
The DR clients can be:
■ Software layers that export high-level resources comprised of one or more
hardware devices (for example, multipathing applications)
■ Applications that monitor DR operations (for example, Sun Management Center)
■ Entities on a remote system, such as the system controller on a server
System Events Framework
DR uses the Solaris system events framework to notify other software entities of the
occurrence of changes that result from a DR operation. DR accomplishes this by
sending DR events to the system event daemon, syseventd, which, in turn, sends
the events to the subscribers of DR events. For more information about the system
events daemon, refer to the syseventd(1M) man page.
CHAPTER 4
DR User Interfaces on the Domain
This chapter describes the user interfaces on the Sun Fire 15K/12K domain, which
include the commands and options that are available to the user, and important files.
DR Commands and Options on the
Domain
The cfgadm(1M) command is used to perform DR operations on the domain. DR
operations are passed to the libcfgadm(3LIB) library interface, which dynamically
loads a hardware-specific library plugin that actually performs the DR operations.
Note – If the cfgadm(1M) command fails during a DR operation, the board does not
return to its original state. A dxs or dca error message is logged to the domain. If
the error is recoverable, you can retry the command. If the error is unrecoverable,
you will need to reboot the domain to use the board.
The sbd.so.1 hardware-specific plug-in provides DR functionality: connecting,
configuring, unconfiguring, and disconnecting system boards, which enables you to
connect or disconnect a system board from a running system without having to
reboot the system.
The cfgadm(1M) command resides in the /usr/sbin directory. (See the
cfgadm(1M) man page for more information.)
Each board slot appears as a single attachment point in the device tree. You can view
the type, state, and condition of each component, and the state and condition of each
board slot, by using the cfgadm(1M) command with its -a option.
State Change Functions
State change functions, which change the state of a board slot or a component on the
board, can be issued against any attachment point.
The following are state change functions:
■ configure
■ unconfigure
■ connect
■ disconnect
Availability Change Functions
Availability change functions, which change the availability of a board, can be issued
against any attachment point.
The following are availability change functions:
■ assign
■ unassign
Condition Change Functions
Condition change functions, which change the condition of a board slot or a
component on the board, can be issued against any attachment point.
The following are condition change functions:
■ poweron
■ poweroff
■ test
Options and Operands
The following options and operands are supported for the functions shown, where
ap_id specifies the attachment point of the system board or component:
Options and Operands    Specifies
-c connect ap_id        Changes the receptacle state to connected.
-c disconnect ap_id     Changes the receptacle state to disconnected.
-c configure ap_id      Changes the occupancy state to configured.
-c unconfigure ap_id    Changes the occupancy state to unconfigured.
-x assign ap_id         Changes the occupancy state to assigned.
-x unassign ap_id       Changes the occupancy state to unassigned.
-x poweron ap_id        Changes the occupancy state to powered-on.
-x poweroff ap_id       Changes the occupancy state to powered-off.
-l ap_id                Displays the state, status, and condition of system
                        boards and components.
-h [ap_id]              Prints help message text. If ap_id is specified, the
                        help routine of the hardware-specific library for the
                        attachment point indicated by the argument is called.
-v                      Executes in verbose mode.
-n                      Suppresses command prompts for user confirmation,
                        and automatically answers “no.”
-y                      Suppresses command prompts for user confirmation,
                        and automatically answers “yes.”
-s listing_options      Displays the state of attachment points according to
                        listing_options. Supplies listing options to the list
                        (-l) command. The listing_options argument conforms
                        to the syntax conventions of the getsubopt(3C) man
                        page, and specifies: the attachment point selection
                        criteria (select=select_string); the type of matching
                        desired (match=match_type); the order of listing
                        (sort=field_spec); the data that is displayed
                        (cols=field_spec and cols2=field_spec); the column
                        delimiter (delim=string); and whether to suppress
                        column headings (noheadings).
-o hardware_options     Supplies hardware-specific options to the main
                        command option. The format and content of the
                        hardware_options string is completely hardware-
                        specific, and the string conforms to the syntax
                        conventions of the getsubopt(3C) man page.
-t ap_id                Performs a test of one or more attachment points. The
                        test function is used to re-evaluate the condition of
                        the attachment point. Without a test-level specifier
                        in hardware_options, the fastest test that identifies
                        hard faults is used.
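Because listing_options follows the getsubopt(3C) conventions, the individual suboptions are joined with commas into a single -s argument. The following shell sketch shows one way to assemble such a string; listing_options here is a hypothetical helper function, and the command is printed rather than executed.

```sh
#!/bin/sh
# Sketch: join cfgadm listing suboptions into one comma-separated
# argument for the -s option. listing_options is an illustrative name.

# listing_options SUBOPTION... -> prints the suboptions joined by commas
listing_options() {
    out=
    for opt in "$@"; do
        out="${out:+$out,}$opt"
    done
    echo "$out"
}

# Dry run: select the sbd class, choose four columns, drop the headings.
opts=$(listing_options 'select=class(sbd)' \
    'cols=ap_id:r_state:o_state:condition' noheadings)
echo "cfgadm -a -s \"$opts\""
```

Quoting the assembled string matters because the parentheses in select=class(sbd) are special characters to the shell.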
CHAPTER 5
DR Domain Procedures
This chapter describes how you use the cfgadm(1M) command on the Sun Fire
15K/12K domain to perform DR operations. It also describes attachment points and
procedures for displaying the status of system boards.
Attachment Points
Before you use the cfgadm(1M) command, make sure you understand the syntax for
attachment points on the Sun Fire 15K/12K platform. There are physical and logical
attachment points. In addition, single attachment points are used for board slots,
and dynamic attachment points are used for components. Attachment points created
by the DR driver have a physical and logical path.
Physical attachment points for system boards take the following form:
/devices/pseudo/dr@0:SBx (for CPU/memory boards)
-OR-
/devices/pseudo/dr@0:IOx (for I/O boards)
where x represents the expander board number (0 through 17 on a Sun Fire 15K
system, and 0 through 8 on a Sun Fire 12K system).
Logical attachment points for system boards take the following form:
SBx (for CPU/memory boards)
-OR-
IOx (for I/O boards)
where x represents the board number (0 through 17 on a Sun Fire 15K system, and 0
through 8 on a Sun Fire 12K system).
Dynamic attachment points refer to components (CPUs and memory) on system
boards and I/O devices on I/O boards. The attachment points are created by the DR
driver. Refer to the dr(7D) man page for more details.
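The relationship between the logical and physical forms of a board attachment point can be expressed as a simple mapping. The sketch below assumes a hypothetical helper named physical_ap_id and accepts board numbers 0 through 17, the Sun Fire 15K range; on a Sun Fire 12K system only 0 through 8 apply.

```sh
#!/bin/sh
# Sketch: map a logical board attachment point (SBx or IOx) to the
# physical form /devices/pseudo/dr@0:SBx or /devices/pseudo/dr@0:IOx.
# physical_ap_id is an illustrative name, not a Sun command.

# physical_ap_id LOGICAL_AP_ID -> prints the physical attachment point
physical_ap_id() {
    case "$1" in
        SB[0-9]|SB1[0-7]|IO[0-9]|IO1[0-7])
            echo "/devices/pseudo/dr@0:$1" ;;
        *)
            return 1 ;;
    esac
}

for ap in SB4 IO0; do
    echo "$ap -> $(physical_ap_id "$ap")"
done
```

Dynamic attachment points such as SB4::cpu0 are deliberately rejected here, because the physical form shown above applies to board slots only.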
Displaying Board Status
The cfgadm(1M) command displays information about boards and slots. Refer to
the cfgadm_sbd(1M) man page for options to this command.
Basic Status Display
Many operations require that you specify the system board names. To obtain these
system board names, type:
# cfgadm -a -s "select=class(sbd)"
The cfgadm(1M) command displays information only about those boards that are
assigned to the domain; or those boards that appear in the available component list
for the domain and are not assigned to any other domain.
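A script can filter the listing to find boards in a particular state. The following sketch assumes that the listing uses the column order Ap_Id, Type, Receptacle, Occupant, Condition with a single heading line; the sample records are invented for illustration and are not output from a real domain. The function name configured_boards is also hypothetical.

```sh
#!/bin/sh
# Sketch: read a cfgadm-style listing on standard input and print the
# board attachment points whose occupant state is "configured".
# Assumed columns: Ap_Id Type Receptacle Occupant Condition.

configured_boards() {
    # Skip the heading line and any dynamic (SBx::component) entries.
    awk 'NR > 1 && $1 !~ /::/ && $4 == "configured" { print $1 }'
}

# Invented sample data; on a domain, the input would come from cfgadm.
configured_boards <<'EOF'
Ap_Id       Type   Receptacle    Occupant      Condition
IO4         HPCI   disconnected  unconfigured  unknown
SB0         CPU    connected     configured    ok
SB0::cpu0   cpu    connected     configured    ok
EOF
```

On a domain, the same function could read from a pipe, for example from the basic status command shown above.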
For a more detailed status report, use the cfgadm(1M) command with its -v
(verbose) option, which turns on expanded descriptions. In addition to basic
information such as the attachment point ID, receptacle and occupant states, and
board status, the expanded status report also includes the date when the board was
configured into the domain, the type of board, the activity state, and the physical
attachment point.
Removing a Board
This section describes how to remove a CPU/Memory board and an I/O board.
▼ To Remove a CPU/Memory Board
To perform the following steps, you must have domain administrator privileges.
1. Log in to the domain.
2. Use the cfgadm(1M) command with the -l option to determine the attachment
point for the board.
3. Stop all activity on the board.
Using the appropriate Solaris commands, halt all accesses by other CPU/Memory
boards and prevent any further use until the board is replaced.
4. Verify that the board does not have bound processes running.
If a process is bound to a CPU, the board cannot be removed until the process is
unbound. Refer to the pbind(1M) man page for more information.
5. Unconfigure and disconnect the board using the following single command:
# cfgadm -v -c disconnect SBx
where x represents the board number (0 through 17 on a Sun Fire 15K system, or 0
through 8 on a Sun Fire 12K system).
Note – Do not remove a board until it is disconnected. Otherwise the board will be
damaged.
▼ To Remove an I/O Board
To remove an I/O board, you must first stop all usage of the board. To complete the
steps in this procedure, you must have domain administrator privileges.
1. Log in to the domain.
2. Check the status of the board.
# cfgadm -a -s "select=class(sbd)"
3. If the system is using multipathing software:
a. Switch all board functions to the alternate board.
b. Remove any multipathing databases and/or private regions.
c. Wait until all of the alternate paths are functioning before proceeding.
4. Unmount file systems, including metadevices that have a board-resident partition
(for example: umount /partition).
Caution – Unmounting file systems may affect NFS client systems.
5. If the board contains Sun RSM Array™ 2000 controllers, take the controllers off
line, using the rm6 or rdacutil commands.
6. Remove disk partitions from the swap configuration.
7. If any process directly opens a device or raw partition, either kill the process or
direct it to close the open device on the board.
8. If a detach-unsafe device is present on the board, close all instances of the device
and use modunload(1M) to unload the driver.
9. Disconnect the board.
# cfgadm -v -c disconnect IOx
where x represents the board number (0 through 17 on a Sun Fire 15K system, or 0
through 8 on a Sun Fire 12K system).
Note – If the cfgadm(1M) command fails during a DR operation, the board does not
return to its original state. A dxs or dca error message is logged to the domain. If
the error is recoverable, you can retry the command. If the error is unrecoverable,
you will need to reboot the domain to use the board.
Adding a Board
Before installing a board, consider the following points:
■ Never use a board that is defective or suspected to be unreliable; it can crash the
system.
■ The board type and option cards must be supported by DR.
Note – Whenever you use DR to add a COD board into a domain, make sure that
enough COD right-to-use (RTU) licenses are available to the target domain to enable
each active CPU on the COD board. If there are not enough RTU licenses available to
a target domain when you add a COD board to the domain, DR displays an error
message for each CPU that cannot be enabled in the domain. For more information
about the COD option, see the System Management Services (SMS) 1.3 Administrator Guide.
▼ To Install a Board
To perform a board installation from the domain, the board must already be
assigned to the domain, or must be in the available component list. Refer to the
System Management Services (SMS) 1.3 Administrator Guide for information on how to
assign boards or to update the available component list.
1. Verify that the selected board slot can accept a board.
# cfgadm -a -s "select=class(sbd)"
The states and conditions should be:
■ Receptacle state—Empty
■ Occupant state—Unconfigured
■ Condition—Unknown
-OR-
■ Receptacle state—Disconnected
■ Occupant state—Unconfigured
■ Condition—Unknown
2. Connect and configure the board using a single command.
# cfgadm -v -c configure SBx (CPU/memory board)
-OR-
# cfgadm -v -c configure IOx (I/O board)
where x represents the board number (0 through 17 on a Sun Fire 15K system, or 0
through 8 on a Sun Fire 12K system).
After a short delay during which the system tests the board, a message appears in
the domain console log indicating that the components have been configured. The
states and conditions for a connected and configured attachment point should be:
■ Receptacle state—Connected
■ Occupant state—Configured
■ Condition—OK
Now the system is aware of the usable devices on the board and the devices can be
used.
Note – If the cfgadm(1M) command fails during a DR operation, the board does not
return to its original state. A dxs or dca error message is logged to the domain. If
the error is recoverable, you can retry the command. If the error is unrecoverable,
you will need to reboot the domain to use the board.
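The final state check in step 2 can be scripted. The following is a minimal sketch, assuming the default cfgadm listing columns (Ap_Id, Type, Receptacle, Occupant, Condition) with lowercase state values; the function name board_ready is hypothetical, and the sample listing is illustrative rather than captured from a real domain.

```sh
#!/bin/sh
# Hypothetical helper: read a cfgadm-style listing on standard input and
# succeed only if attachment point $1 is connected, configured, and ok.
# Assumes the column order: Ap_Id Type Receptacle Occupant Condition.
board_ready() {
    awk -v ap="$1" '
        $1 == ap {
            found = 1
            ok = ($3 == "connected" && $4 == "configured" && $5 == "ok")
        }
        END {
            if (found && ok) exit 0
            exit 1
        }
    '
}

# Illustrative listing, not captured from a real system.
sample='Ap_Id   Type   Receptacle     Occupant       Condition
SB2     CPU    connected      configured     ok
IO3     HPCI   disconnected   unconfigured   unknown'

if printf '%s\n' "$sample" | board_ready SB2; then
    echo "SB2 is connected, configured, and ok"
fi
```

On a domain, the same check could read live output, for example: cfgadm -a | board_ready SB2.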
DR Using cfgadm(1M) - Examples
Displaying Help
To display help text for commands, use the -h option. If the -h option is followed by
an attachment point identifier, help messages (syntax) related to the hardware-specific
library of the attachment point are shown.
TABLE 5-1 Commands that Display Help
Command            Description
# cfgadm -h        Shows general syntax.
# cfgadm -h pci    Shows PCI hotplug-specific commands.
# cfgadm -h SB2    Shows help related to system board-specific commands and options.
Displaying Verbose Messages
The -v option displays detailed messages as DR operations proceed. For example,
to configure the memory on system board 2 (SB2) with the verbose option, use:
# cfgadm -v -c configure SB2::memory
To unconfigure CPU 3 (cpu3) on system board 0 (SB0) with the verbose option, use
the following command:
# cfgadm -v -c unconfigure SB0::cpu3
Suppressing User Confirmation
When certain cfgadm commands are entered (such as one that unconfigures permanent
memory), the command prompts the user to confirm the operation with yes or no. For
example, the following command unconfigures the memory on system board 6 (SB6),
which holds permanent system memory, and prompts the user for confirmation:
# cfgadm -c unconfigure SB6::memory
System may be temporarily suspended, proceed (yes/no)?
You can suppress the confirmation prompt by using the -y or -n option on the
command line. The -y option automatically responds with “yes” and the -n option
responds with “no.” The following example performs exactly the same operation as
the previous command, but uses the -y option to bypass user confirmation:
# cfgadm -y -c unconfigure SB6::memory
#
Power Control When Disconnecting Boards
To unconfigure system board 6 (SB6), which holds the domain’s permanent memory;
answer user confirmation prompts with a “yes” response; and display verbose
messages, use the following command:
To disconnect I/O board 12 (IO12), but leave it powered-off and assigned to the
same domain, use:
# cfgadm -c disconnect IO12
Power Control of Disconnected Boards
To power-on system board 2 (SB2), use the following command:
# cfgadm -x poweron SB2
To power-off system board 2 (SB2), use the following command:
# cfgadm -x poweroff SB2
Connecting and Configuring Boards
When DR configures a board into a domain, it first connects the board electrically to
the system, putting it into the connected state. DR then configures the system board
so that it is fully available to all applications running in the domain, putting it into
the configured state.
When DR removes a board from a domain, it first unconfigures the system board so
that it is no longer available to any application running in the domain, putting it into
the unconfigured state. DR then disconnects the board electrically from the system,
putting it into the disconnected state.
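The ordering of these state changes can be sketched as a dry-run helper. The function name dr_steps is hypothetical; it only prints the cfgadm state-change commands in the order described above and does not run them. As noted in the board installation procedure, a single configure command also performs the connect step, so the two-step form is shown here only to make the sequence explicit.

```sh
#!/bin/sh
# Hypothetical dry-run helper: print the cfgadm state changes, in order,
# for adding or removing the board at attachment point $2.
# Usage: dr_steps <add|remove> <ap_id>
dr_steps() {
    case $1 in
        add)
            echo "cfgadm -c connect $2"      # receptacle becomes connected
            echo "cfgadm -c configure $2"    # occupant becomes configured
            ;;
        remove)
            echo "cfgadm -c unconfigure $2"  # occupant becomes unconfigured
            echo "cfgadm -c disconnect $2"   # receptacle becomes disconnected
            ;;
        *)
            echo "usage: dr_steps add|remove ap_id" >&2
            return 2
            ;;
    esac
}
```

For example, dr_steps remove IO3 prints the unconfigure command first and the disconnect command second, mirroring the removal sequence.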
Hot Plugging PCI Adapter Cards
Each hotplug slot on an I/O board can be individually connected, configured,
unconfigured, and disconnected. Each attachment point for a hotplug slot, which
identifies both the slot and the adapter card that is plugged into the slot, is created
when the I/O board is configured into the domain.
To connect, but not configure, an adapter at slot 1 of I/O board 1 into a domain, use
a command such as the following:
# cfgadm -c connect pcisch0:e01b1slot1
To configure the adapter at slot 1 of I/O board 1 into the domain, use a command
such as the following:
# cfgadm -c configure pcisch0:e01b1slot1
To disconnect an adapter at slot 1 of I/O board 1 before unplugging the adapter, use
a command such as the following:
# cfgadm -c disconnect pcisch13:e01b1slot1
To unconfigure the adapter at slot 1 of I/O board 1 out of the domain, use a
command such as the following:
# cfgadm -c unconfigure pcisch0:e01b1slot1
For more information, see cfgadm_pci(1M).
Testing a Board
The -t option causes a board to be tested. Prior to running the following command,
system board 2 (SB2) must be disconnected, assigned and powered-on. The
following command includes the verbose option:
# cfgadm -vt SB2
The board is tested using the diagnostic level specified for the domain in the
.postrc file; the default is 16.
Displaying Attachment Point Information
This section includes several examples of commands that you can use to display
system information about attachment points. See the cfgadm(1M) man page for
additional information.
To list the state, status, and condition of all attachment points with the verbose
option, use:
# cfgadm -val
To list the state and condition of an adapter at slot 1 of I/O board 3, use the following
command:
# cfgadm -al pcisch13:e03b1slot1
The following command displays in columnar format the logical name of each
attachment point; along with its condition; status time in both calendar and parsable
formats; and other information:
The following command displays in columnar format the logical name and physical
ID of each attachment point:
# cfgadm -s "cols=ap_id:physid"
The following command displays in columnar format the logical name of each
attachment point, along with its receptacle state; occupant state; occupant type; busy
status; and class: