Sun Microsystems Sun Fire High-End and Midrange Systems User Guide

Sun Microsystems, Inc. www.sun.com
Sun Fire High-End and Midrange Systems
Dynamic Reconfiguration User’s Guide
Part No. 819-1501-10 August 2005, Revision A
Submit comments about this document at: http://www.sun.com/hwdocs/feedback
Copyright 2005 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
Sun Microsystems, Inc. has intellectual property rights relating to technology that is described in this document. In particular, and without limitation, these intellectual property rights might include one or more of the U.S. patents listed at http://www.sun.com/patents and one or more additional patents or pending patent applications in the U.S. and in other countries.
This document and the product to which it pertains are distributed under licenses restricting their use, copying, distribution, and decompilation. No part of the product or of this document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any.
Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Parts of the product might be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, AnswerBook2, docs.sun.com, Sun Fire, and Solaris™ are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and in other countries.
All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and in other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements.
U.S. Government Rights—Commercial use. Government users are subject to the Sun Microsystems, Inc. standard license agreement and applicable provisions of the FAR and its supplements.
DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.

Contents

Preface xi
1. Introduction to DR 1
DR on Sun Fire High-End and Midrange Systems 1
What DR Lets You Do 2
Overview of Common DR Operations 2
How to Use DR 3
Hot-Plug Hardware 4
Automatic DR (ADR) 4
Capacity on Demand (COD) 5
DR on Solaris Software 6
DR on Domains Running the Solaris 9 OS or Solaris 10 OS 6
DR on Domains Running the Solaris 8 OS 6
2. DR Concepts 7
Dynamic System Domains 8
Attachment Points 8
Attachment Point Classes 9
High-End System Attachment Points 10
Midrange System Attachment Points 10
Changes To Attachment Points 11
States and Conditions 11
Board and Board Slot States 12
Board Conditions 13
Component States 13
Component Conditions 14
Detachability 14
Permanent and Non-Permanent Memory 15
Copy-Rename 15
Memory Interleaving 16
Correctable Memory Errors 16
Quiescence 16
Suspend-Safe and Suspend-Unsafe Devices 18
DR on I/O Boards 19
High-End Systems I/O Boards, Golden IOSRAM, MaxCPU, and hsPCI+ 19
Midrange Systems I/O Assemblies, PCI and CompactPCI 20
Notes about CompactPCI 20
Common DR Board Operations 21
Connect Operation 21
Configure Operation 22
Disconnect Operation 22
Unconfigure Operation 22
Illustrations of DR Concepts 23
3. Preparing to Use DR 27
The cfgadm(1M) Command 27
The rcfgadm(1M) Command (High-End Only) 29
Checking Device Type, State and Condition 30
To display states, types and conditions 30
iv Sun Fire High-End and Midrange Systems Dynamic Reconfiguration User’s Guide • August 2005
To display information about board slots and components 30
Preparing to Use DR on a Domain 30
To Display Boards Available to the Domain 31
Displaying System Board Status 31
To Display System Board Status 31
Testing Boards 32
To Test a System Board 32
To Test an I/O Board (Midrange Only) 33
To Prepare an I/O Board for DR (High-End Only) 34
4. DR Procedures – From the System Domain 37
Adding System Boards 38
To Add a System Board 38
To Connect a System Board But Not Configure it 39
To Configure a Connected System Board 39
Deleting System Boards 40
To Delete a System Board 40
To Unconfigure But Not Disconnect a System Board 40
To Delete an Unconfigured System Board 40
To Delete a System Board Temporarily 40
To Find the System Board that Contains a Domain’s Permanent Memory 41
To Unconfigure a System Board with Permanent Memory 41
Moving System Boards 42
To Move a System Board Between Domains 42
Adding I/O Boards 43
To Add an I/O Board 43
To Add and Connect an I/O Board But Not Configure it 44
To Configure a Connected I/O Board 44
To Delete an I/O Board 45
To Unconfigure an I/O Board But Not Disconnect it 45
To Disconnect an Unconfigured I/O Board 45
Adding/Deleting/Tracking Memory and CPU 45
To Configure CPU on a System Board 45
To Configure Memory on a System Board 46
To Configure All CPUs and Memory on a System Board 46
To Unconfigure CPU on a System Board 46
To Unconfigure Memory on a System Board 46
To Unconfigure All CPUs and Memory on a System Board 47
To Track a Memory Unconfigure Operation 47
PCI Adapter Card Operations 47
To Connect a PCI slot on an I/O Board 48
To Configure a PCI slot on an I/O Board 48
To Disconnect a PCI slot on an I/O Board 48
To Unconfigure a PCI Slot on an I/O Board 49
5. SMS DR Procedures – From the SC (High-End Only) 51
Showing Device Information 52
To Show Device Information 52
Showing Platform Information 54
To Show Platform Information 55
Showing Board Information 55
SC State Models 55
The showboards(1M) command 56
To Show Board Information 57
Adding Boards 57
To Add a Board to a Domain 58
Deleting Boards 58
To Delete a Board From a Domain 59
Moving Boards 59
To Move a Board 60
Replacing Active System Boards 60
To Replace an Active System Board 60
SMS DR Commands and Options 61
addboard(1M) 61
deleteboard(1M) 63
moveboard(1M) 65
rcfgadm(1M) 67
scdrhelp(1M) 68
showboards(1M) 68
showdevices(1M) 69
showplatform(1M) 70
Error Message Help System 70
JavaHelp Table of Contents 71
JavaHelp Index 71
JavaHelp Search 72
6. DR Internals 75
Software Components on the Domain 75
Domain Configuration Server (High-End Only) 75
DR Driver 76
Reconfiguration Coordination Manager 76
System Events Framework 77
Software Components on the SC (High-End Only) 77
DR Administration Models 77
DR Processes and Daemons 77
Domain Configuration Agent (DCA) 78
Platform Configuration Daemon (PCD) (High-End Only) 78
Domain X Server (DXS) 78
A. DR Command Summary 79
B. Troubleshooting 81
Unconfigure Operation Failure 81
System Board Unconfiguration Failures 82
Cannot Unconfigure a Board Whose Memory Is Interleaved Across Boards 82
Cannot Unconfigure a CPU to Which a Process is Bound 82
Cannot Unconfigure a CPU Before All Memory is Unconfigured (Midrange Only) 83
Unable to Unconfigure Memory on a Board With Permanent Memory 83
Unable to Unconfigure a CPU 84
Unable to Disconnect a Board 85
I/O Board Unconfiguration Failure 85
Device Busy 85
Problems with I/O Devices 85
RPC or TCP Time-out or Loss of Connection 87
Configure Operation Failure 87
Memory Configuration Failure (Midrange Only) 87
I/O Board Configuration Failure 87
Glossary 89
Index 93

Tables

TABLE 1-1 Main DR Operations 3
TABLE 2-1 Board and Board Slot States 12
TABLE 2-2 Configured and Unconfigured Boards 12
TABLE 2-3 Board States Visible Only From the SC 13
TABLE 2-4 Board and Board Slot Conditions 13
TABLE 2-5 Connected Components: Configured or Unconfigured 13
TABLE 2-6 CPU or Memory Module Conditions 14
TABLE 3-1 cfgadm Options 28
TABLE 3-2 System Board Status Sample Display 32
TABLE 3-3 Diagnostic Levels 33
TABLE 5-1 showdevices Sample Output, CPU 52
TABLE 5-2 showdevices Sample Output, UltraSPARC IV+ (showdevices -d G) 52
TABLE 5-3 showdevices Sample Output, Memory Drain In-Progress 53
TABLE 5-4 showdevices Sample Output, IO Devices 53
TABLE 5-5 Board State Conditions on the Sun Fire High-End Systems SC 56
TABLE 5-6 addboard Command Options 61
TABLE 5-7 Privileges Needed to Use the addboard command 62
TABLE 5-8 deleteboard Command Options 63
TABLE 5-9 Privileges Needed to Use the deleteboard Command 64
TABLE 5-10 moveboard Command Options 65
TABLE 5-11 Privileges Needed to Use the moveboard Command 66
TABLE 5-12 rcfgadm Command Options 67
TABLE 5-13 Privileges Needed to Use the rcfgadm Command 68
TABLE 5-14 showboards Command Options 69
TABLE 5-15 showdevices Command Options 69
TABLE 5-16 showplatform Command Options 70
TABLE A-1 DR Operation and Command Summary 79

Preface

This document describes the dynamic reconfiguration (DR) software on Sun Fire™ E25K/E20K/15K/12K systems and Sun Fire E6900/E4900/6800/4810/4800/3800 systems running the Solaris™ Operating System (Solaris OS).
This document replaces the following user guides:
Sun Fire High-End Systems Dynamic Reconfiguration User Guide
Sun Fire Midrange Systems Dynamic Reconfiguration User Guide
System Management Services (SMS) Dynamic Reconfiguration User Guide
Before You Read This Document
This book is intended for the Sun Fire high-end and midrange system platform administrator who has a working knowledge of UNIX® systems, particularly those based on the Solaris OS. If you do not have such knowledge, first read the Solaris OS user and system administrator books provided with this system, and consider UNIX system administration training.
Using UNIX Commands
This document does not contain information about basic UNIX® commands and procedures such as shutting down the system, booting the system, and configuring devices. See the following sources for this information:
Software documentation that you received with your system
Solaris OS documentation, which is at: http://docs.sun.com
Shell Prompts
Shell Prompt
C shell machine-name%
C shell superuser machine-name#
Bourne shell and Korn shell $
Bourne shell and Korn shell superuser #
Typographic Conventions
Typeface¹: AaBbCc123
Meaning: The names of commands, files, and directories; on-screen computer output
Examples: Edit your .login file. Use ls -a to list all files. % You have mail.
Typeface: AaBbCc123
Meaning: What you type, when contrasted with on-screen computer output
Examples: % su Password:
Typeface: AaBbCc123
Meaning: Book titles, new words or terms, words to be emphasized. Replace command-line variables with real names or values.
Examples: Read Chapter 6 in the User’s Guide. These are called class options. You must be superuser to do this. To delete a file, type rm filename.
¹ The settings on your browser might differ from these settings.
Related Documentation
View the documents listed online at:
http://www.sun.com/products-n-solutions/hardware/docs/
Application: Title
Platform-specific documents: Sun Fire Midrange Systems Platform Administration Manual; Sun Fire High-End Systems Administration Manual; System Management Services (SMS) Administrator Guide; SMS Reference Manual
Platform-specific release notes: Solaris 8 or 9 Release Notes Supplement for Sun Hardware; Solaris 10 Release Notes; System Management Services (SMS) Release Notes
Solaris commands, including cfgadm(1M): Solaris Command Reference Manual
Sun Management Center: Sun Management Center User’s Guide
Capacity on Demand (COD): System Management Services (SMS) Administrator Guide
Documentation, Support, and Training
Sun Function URL Description
Documentation http://www.sun.com/documentation/ Download PDF and HTML documents, and order printed documents
Support and Training http://www.sun.com/supportraining/ Obtain technical support, download patches, and learn about Sun courses
Third-Party Web Sites
Sun is not responsible for the availability of third-party web sites mentioned in this document. Sun does not endorse and is not responsible or liable for any content, advertising, products, or other materials that are available on or through such sites or resources. Sun will not be responsible or liable for any actual or alleged damage or loss caused by or in connection with the use of or reliance on any such content, goods, or services that are available on or through such sites or resources.
Sun Welcomes Your Comments
Sun is interested in improving its documentation and welcomes your comments and suggestions. You can submit your comments by going to:
http://www.sun.com/hwdocs/feedback
Please include the title and part number of your document with your feedback:
Sun Fire High-End and Midrange Systems Dynamic Reconfiguration User’s Guide, part number 819-1501-10.
CHAPTER
1

Introduction to DR

The Sun Fire high-end and midrange systems listed in the Preface can be divided into domains, each functioning as a separate computer, running its own operating system (see “Dynamic System Domains” on page 8). The dynamic reconfiguration (DR) feature lets you enable and disable a domain’s system boards, I/O boards, and certain components while that domain continues running.
Part of DR runs on Solaris software in the domain and is managed through the cfgadm(1M) command. Another part runs on the system controller (SC).
This chapter covers the following topics:
“DR on Sun Fire High-End and Midrange Systems” on page 1
“What DR Lets You Do” on page 2
“How to Use DR” on page 3
“Hot-Plug Hardware” on page 4
“Automatic DR (ADR)” on page 4
“Capacity on Demand (COD)” on page 5
“DR on Solaris Software” on page 6

DR on Sun Fire High-End and Midrange Systems

System boards on midrange systems are sometimes called CPU/Memory boards. They are the same boards as those on high-end systems. This document exclusively uses the term system board. System boards are interchangeable between high-end and midrange platforms.
High-end system I/O boards and midrange systems I/O assemblies are similar in some ways, but different in others. This document uses the term I/O board for both except when necessary for clarity.
The I/O buses on a high-end system I/O board support PCI or hsPCI+ cards and MaxCPU boards. A MaxCPU board fits into slot 1 and contains two CPUs and no memory.
Midrange system I/O boards support PCI or CompactPCI cards.
This document uses the generic term PCI when referring to hsPCI+ and CompactPCI cards except when clarity demands otherwise.

What DR Lets You Do

Some of the tasks you can use DR for include:
Display the status and state of system or I/O boards and some components to help you prepare for DR operations.
Test live boards.
Logically detach (electrically isolate) system or I/O boards from a domain in preparation for moving to another domain or removal from the system while the domain remains running. The detach operation is sometimes called a delete board action.
Logically attach system or I/O boards to a domain, to add resources or replace a removed board, while the domain remains running. The attach operation is sometimes called an add board action.
Configure or unconfigure CPU or memory modules on system boards to control power and capacity of a domain or isolate faulty components.
Enable or disable PCI cards or related components and slots.
For example, you can use DR to detach a faulty system board, then use the system’s hot-plug feature to physically remove it. After plugging in the repaired board or a replacement, you can use DR to configure the board into the domain. If you use the DR feature to add or remove a system board or component, DR always leaves the board or component in a known configuration state. See “States and Conditions” on page 11 for more information about configuration states for system boards and components.
You can also assign a system board or I/O board to a different domain for load balancing or to provide extra capabilities for specific tasks.

Overview of Common DR Operations

DR software enables you to do the following tasks:
Add, delete, or move system boards or I/O boards between domains.
Configure or unconfigure CPU or memory modules on system boards.
Connect and configure or disconnect and unconfigure PCI cards on I/O boards.
The four main types of DR operations that support the above actions are connect, configure, unconfigure, and disconnect.
TABLE 1-1 Main DR Operations
Operation Description
Connect Provides power to the slot that holds a board and begins system monitoring of the board’s temperature.
Configure Makes the operating system assign functional roles to a board, and load device drivers for the board, and for devices attached to the board. The configure operation includes a connect operation.
Unconfigure Logically detaches a board from the operating system and takes the associated device drivers offline. Environmental monitoring continues, but devices on the board are not available for system use.
Disconnect Turns off power to the slot that holds the board and stops monitoring the board. The disconnect operation includes an unconfigure operation.
Note – If a system board is in use, you must stop its use and disconnect it from the domain before you power it off. After a new or upgraded system board is inserted and powered on, connect its attachment point (see “Attachment Points” on page 8) and configure it for use by the operating system.
For more information about DR operations, see “Common DR Board Operations” on page 21.
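The relationship among the four operations in TABLE 1-1 can be sketched as a small state model. The Python below is only an illustration and is not part of the DR software: the Board class and its method names are hypothetical, and the only rules it encodes are the two stated in the table (configure includes connect, and disconnect includes unconfigure).

```python
class Board:
    """Toy model of a board slot; names are hypothetical, not a DR API."""

    def __init__(self):
        self.connected = False   # slot powered and monitored
        self.configured = False  # in use by the operating system

    def connect(self):
        self.connected = True

    def configure(self):
        # TABLE 1-1: the configure operation includes a connect operation.
        if not self.connected:
            self.connect()
        self.configured = True

    def unconfigure(self):
        # The board stays connected (monitoring continues) but is not usable.
        self.configured = False

    def disconnect(self):
        # TABLE 1-1: the disconnect operation includes an unconfigure operation.
        if self.configured:
            self.unconfigure()
        self.connected = False


board = Board()
board.configure()
print(board.connected, board.configured)   # True True
board.disconnect()
print(board.connected, board.configured)   # False False
```

The model deliberately ignores power, testing, and error handling; it only shows why a single configure or disconnect request can move a board through two states.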

How to Use DR

You can initiate DR operations in any of the following ways:
Use the GUI provided by Sun™ Management Center software. For more information, see the Sun Management Center User’s Guide.
Use the Solaris command cfgadm(1M) with the appropriate options and flags in the domain. “DR Procedures – From the System Domain” on page 37 tells how to use cfgadm with its DR-related options, organized by task.
On high-end systems, use the System Management Services (SMS) DR command rcfgadm(1M) on the SC. rcfgadm(1M) takes the same DR-related options as cfgadm(1M). The main visible difference is that rcfgadm(1M) often requires an additional -d domain_id parameter. For information about rcfgadm(1M), see “rcfgadm(1M)” on page 67.
On high-end systems, use the SMS DR commands (besides rcfgadm(1M)) on the SC. The SMS DR commands include addboard(1M), moveboard(1M), deleteboard(1M), and others. You can find information about these commands in “SMS DR Procedures – From the SC (High-End Only)” on page 51, in the SMS Reference Manual, or by executing the man(1) command in an SC window running SMS software.
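The difference between the cfgadm(1M) and rcfgadm(1M) forms can be illustrated with a short helper that only builds the argument list and does not run anything. This is a sketch under stated assumptions: the dr_command function is hypothetical, the -c option is the state-change option described in the cfgadm(1M) man page, and the attachment point SB2 and domain A are made-up values.

```python
def dr_command(function, ap_id, domain_id=None):
    """Return an argv list for a DR state-change request.

    function  -- value for -c, for example "configure" or "disconnect"
    ap_id     -- attachment point ID, for example "SB2" (illustrative)
    domain_id -- if given, build an rcfgadm call for use on the SC
    """
    if domain_id is None:
        # Form used inside the domain itself.
        return ["cfgadm", "-c", function, ap_id]
    # Form used on the SC of a high-end system; rcfgadm adds -d domain_id.
    return ["rcfgadm", "-d", domain_id, "-c", function, ap_id]


print(" ".join(dr_command("configure", "SB2")))
print(" ".join(dr_command("configure", "SB2", domain_id="A")))
```

Check the man pages on your own system before relying on any option shown here.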
When running DR on a midrange system you might need to execute one or more midrange system SC commands – such as showplatform and showboards – before or during DR operations. Their use is briefly described where appropriate in this document, and you can find more information about them in the Sun Fire Midrange Systems Controller Command Reference Manual.
Caution – The midrange system SC commands addboard and deleteboard are not DR commands like the high-end system SMS commands of the same name. You can safely use these midrange system SC commands only when the domain is powered off. For more information about these and other midrange system SC commands, see the Sun Fire Midrange Systems Controller Command Reference Manual.

Hot-Plug Hardware

A hot-pluggable device can be logically connected to or disconnected from a running system. (A hot-swappable device can be physically connected to or disconnected from a running system.) Hot-pluggable boards and modules have special connectors that supply electrical power to the board or module before the data pins make contact. Boards and devices that have hot-plug connectors can be inserted or removed while the system is running; that is, they are hot-swappable.
System boards and I/O boards are hot-plug devices. However, some devices, such as the peripheral power supply, are not hot-plug modules and cannot be disconnected while the system is running.

Automatic DR (ADR)

Automatic DR (ADR) lets your applications execute DR operations with no user interaction. ADR uses an enhanced DR framework that includes the reconfiguration coordination manager (RCM) and the system event facility, sysevent. The RCM enables application-specific loadable modules to register callbacks. The callbacks can perform preparatory tasks before, error-recovery actions during, and clean-up after a DR operation. The system event framework enables applications to register for system events and receive notifications of those events.
ADR interfaces with the RCM and sysevent to enable applications to automatically give up resources prior to unconfiguring them, and to capture new resources as they are configured into the domain.
An application can execute the cfgadm(1M) command from a domain, which is called local ADR. In addition, on high-end systems, the application can execute an SMS DR command from the SC, which is called global ADR. On high-end systems you can use global ADR to move system boards from one domain to another, configure hot-swapped boards into a domain, and remove system boards from a domain.
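The callback idea behind ADR can be pictured with a toy registry. The sketch below is not the RCM interface: the CallbackRegistry class, the phase names, and the resource name SB2 are all hypothetical, and the code only mirrors the three roles described above (preparation before, error recovery during, and clean-up after an operation).

```python
from collections import defaultdict


class CallbackRegistry:
    """Toy stand-in for callback registration; not the RCM interface."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register(self, phase, func):
        # phase is "before", "error", or "after" (hypothetical names).
        self._callbacks[phase].append(func)

    def run_operation(self, resource, operation):
        for func in self._callbacks["before"]:
            func(resource)
        try:
            operation(resource)
        except Exception:
            # Give registered modules a chance to recover, then re-raise.
            for func in self._callbacks["error"]:
                func(resource)
            raise
        for func in self._callbacks["after"]:
            func(resource)


log = []
registry = CallbackRegistry()
registry.register("before", lambda r: log.append("release " + r))
registry.register("after", lambda r: log.append("cleanup " + r))
registry.run_operation("SB2", lambda r: log.append("unconfigure " + r))
print(log)  # ['release SB2', 'unconfigure SB2', 'cleanup SB2']
```

An application that holds a resource would register a "before" callback to give it up, which is the behavior the ADR description above attributes to RCM-aware applications.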

Capacity on Demand (COD)

The Capacity on Demand (COD) option provides additional CPU resources on COD system boards that you install in your Sun Fire system. A Sun Fire COD system can have a mix of both standard and COD system boards installed. At least one active CPU is required for each domain in the system.
You can use DR to move COD boards into and out of domains in the same way you use it to move standard system boards. But you can use the CPUs on a COD board only after you purchase right-to-use (RTU) licenses for them. Each COD RTU license entitles you to receive a COD RTU license key that enables a specified number of CPUs on COD boards in a single system.
Whenever you use DR to configure a COD board into a domain, make sure enough RTU licenses are available to the target domain to enable each active CPU on the COD board. If the target domain does not have enough RTU licenses available to it when you attempt to add a COD board, the system displays a status message for each CPU that cannot be enabled in the domain.
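The license check described above amounts to comparing the number of active CPUs on the board with the number of CPUs the domain is licensed to enable. The following sketch is an illustration only: the function name, the CPU names, and the message text are hypothetical, and it assumes that CPUs are enabled in list order until the licensed count is used up.

```python
def cpus_without_license(active_cpus, licensed_cpu_count):
    """Return the CPUs that cannot be enabled for lack of RTU licenses.

    active_cpus        -- list of CPU names on the COD board (illustrative)
    licensed_cpu_count -- how many CPUs the available RTU licenses enable
    """
    # Assumption: CPUs are enabled in list order until licenses run out.
    return active_cpus[licensed_cpu_count:]


blocked = cpus_without_license(["cpu0", "cpu1", "cpu2", "cpu3"], 2)
for cpu in blocked:
    print(cpu + ": no RTU license available")
```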
For more information about the COD option for high-end systems, see the System Management Services (SMS) Administrator Guide.

DR on Solaris Software

This document describes the latest version of DR as it runs on or with the latest Solaris 8, Solaris 9, and Solaris 10 software releases. Be sure to check the SunSolve℠ database at http://sunsolve.sun.com for the latest patches.
Note – Sun Microsystems suggests you run the latest versions of all Sun software on your systems for the highest performance and to take advantage of the latest enhancements.
The following sections describe any special considerations for using DR with specific Solaris releases.

DR on Domains Running the Solaris 9 OS or Solaris 10 OS

The Solaris 10 3/05 HW1 OS is the first release of Solaris 10 software to support the UltraSPARC® IV+ system board, and the Solaris 9 9/05 OS is the first release of Solaris 9 software to do so. You can add UltraSPARC IV+ boards to a domain configured with older boards, but you cannot use DR to add an older board to a domain that was booted with all UltraSPARC IV+ boards. (You can add an older board to a domain booted with all UltraSPARC IV+ boards if you shut down the domain first.)
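The board-mixing rule above can be written as a one-line check. This is a sketch, not a supported tool: the function and parameter names are hypothetical, and it assumes that a domain booted with a mix of older and UltraSPARC IV+ boards accepts either kind, which the text implies but does not state outright.

```python
def can_dr_add(new_board_is_iv_plus, boot_boards_are_iv_plus):
    """Apply the mixing rule above to a proposed DR add.

    new_board_is_iv_plus    -- True if the board to add is UltraSPARC IV+
    boot_boards_are_iv_plus -- one bool per board the domain was booted with
    """
    booted_all_iv_plus = all(boot_boards_are_iv_plus)
    # Only an older board joining an all-IV+ boot is refused.
    return new_board_is_iv_plus or not booted_all_iv_plus


print(can_dr_add(True, [False, False]))   # True
print(can_dr_add(False, [True, True]))    # False
```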
For additional information about domain restrictions with UltraSPARC IV+ boards on Sun Fire midrange systems, see the Sun Fire Midrange Systems Platform Administration Manual for Firmware Release 5.19.

DR on Domains Running the Solaris 8 OS

The Solaris 8 2/02 OS was the first release of Solaris 8 software to support DR of I/O boards. In addition, System Management Services (SMS) 1.3 on Sun Fire high-end systems is the first release of SMS to fully support DR. You can enable the full functionality of DR on domains running software no earlier than the Solaris 8 2/02 OS by installing patches and a new kernel update on the domain; and by installing the latest version of SMS software on your high-end server’s system controller (SC). The Solaris 8 OS does not support UltraSPARC IV+ boards.
CHAPTER
2

DR Concepts

This chapter describes the DR concepts you should understand before attempting to use DR.
If you plan to execute DR operations on a high-end server’s system controller (SC) using SMS DR commands, be sure to read Chapter 5, “SMS DR Procedures – From the SC (High-End Only)” on page 51. Some of the information in this chapter is repeated in Chapter 5, but from a different perspective. Reading both chapters might yield a more comprehensive picture of the DR feature.
This chapter covers the following topics:
“Dynamic System Domains” on page 8
“Attachment Points” on page 8
“States and Conditions” on page 11
“Detachability” on page 14
“Permanent and Non-Permanent Memory” on page 15
“Quiescence” on page 16
“Suspend-Safe and Suspend-Unsafe Devices” on page 18
“DR on I/O Boards” on page 19
“Common DR Board Operations” on page 21
“Illustrations of DR Concepts” on page 23
Note – The UltraSPARC IV+ board contains dual-core CPUs. References in this document to CPUs or processors might refer to either single-core or dual-core types, and all procedures apply to both.

Dynamic System Domains

The Sun Fire system can be divided into domains. Each domain is based on the system board slots that are assigned to it. Further, each domain is electrically isolated into hardware partitions, which ensures that any failure in one domain does not affect the other domains in the server.
Each domain configuration is defined in a configuration database that resides on the SC. The configuration database – on high-end systems, the platform configuration database (PCD) – controls how the system board slots are logically partitioned into domains. The domain configuration represents the intended domain configuration. Thus, the configuration can include empty slots and populated slots. The physical domain is determined by the logical domain.
The number of slots available to a given domain is controlled by an ACL. ACL is an abbreviation for available component list on high-end system domains, or access control list on midrange system domains. The ACL for all domains is maintained on the SC. A slot must be assigned or available to a domain before you can change its state. After a slot has been assigned to a domain, it becomes visible to that domain and invisible and unavailable to all other domains. Conversely, you must disconnect and unassign a slot from its domain before you can assign and connect it to another domain.
The logical domain is the set of slots that belong to the domain. The physical domain is the set of boards that are physically interconnected. A slot can be a member of a logical domain without having to be part of a physical domain. After the domain is booted, the system boards and the empty slots can be assigned to or unassigned from a logical domain; however, they are not allowed to become a part of the physical domain until the operating system requests it. System boards or slots that are not assigned to any domain are available to all domains. These boards can be assigned to a domain by the platform administrator; however, an ACL can be set up on the SC to allow users with appropriate privileges to assign available boards to a domain.
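The slot ownership rules in this section can be summarized in a toy model. The sketch below is not SC software: the Platform class, its method names, and the slot and domain names are hypothetical, and it leaves out the ACL check and the physical connect step entirely.

```python
class Platform:
    """Toy model of slot ownership; names are hypothetical, not SC software."""

    def __init__(self, slots):
        self.owner = {slot: None for slot in slots}  # None = unassigned

    def assign(self, slot, domain):
        if self.owner[slot] is not None:
            # A slot must be unassigned from its domain before reassignment.
            raise ValueError(f"{slot} is already assigned to {self.owner[slot]}")
        self.owner[slot] = domain

    def unassign(self, slot):
        self.owner[slot] = None

    def available_to(self, domain):
        # Assigned slots belong only to their owner; unassigned slots
        # are available to every domain.
        return sorted(s for s, d in self.owner.items() if d in (None, domain))


platform = Platform(["SB0", "SB1", "SB2"])
platform.assign("SB0", "A")
print(platform.available_to("A"))  # ['SB0', 'SB1', 'SB2']
print(platform.available_to("B"))  # ['SB1', 'SB2']
```

Moving SB0 to domain B in this model takes two steps, unassign and then assign, which matches the requirement to unassign a slot from its domain before assigning it to another.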

Attachment Points

An attachment point is a collective term for a board or device, the slot that holds it, and any components on it. Slots are sometimes called receptacles.
Sun Fire systems support the following attachment points:
Board attachment point – A system or I/O board slot, the board in that slot, and any devices connected to the board.
PCI attachment point – A PCI card and its attachment to the PCI bus that holds it.
Component attachment point – A CPU or memory module and its connection to the system board. A component attachment point is sometimes called a dynamic attachment point.
Note – Many users are concerned only with changing the status of boards and devices. So, for simplicity, some procedures in this document refer to board attachment points simply as boards, PCI attachment points as PCI cards, and component attachment points as CPU or memory modules. Where simplification might cause confusion, proper names are used.
The term occupant refers to the combination of a board and its attached devices, including any external storage devices connected by interface cables.
Board slots can be named according to slot numbers, or can be anonymous (for example, when in a SCSI chain).
DR recognizes two types of attachment point names:
Physical attachment point – The software driver and the location of the slot.
Logical attachment point – An abbreviated name created by the system to refer to the physical attachment point.
To obtain a list of all available logical attachment points, use the following command in the domain:
# cfgadm -l
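When scripting against this listing, it can help to extract just the attachment point names. The following is a minimal sketch, assuming the usual layout of cfgadm -l output: a header row followed by one attachment point per line, with Ap_Id in the first column. The function names and the sample rows are hypothetical and are not captured from a real domain.

```shell
# Print only the Ap_Id column from `cfgadm -l` style output read on stdin.
# Assumes the first line is a header row and Ap_Id is the first column.
list_ap_ids() {
    awk 'NR > 1 && NF > 0 { print $1 }'
}

# Hypothetical sample listing (illustrative values only).
sample_listing() {
    cat <<'EOF'
Ap_Id                Type         Receptacle   Occupant     Condition
N0.IB6               PCI_I/O_Boa  connected    configured   ok
N0.SB0               CPU_Board    connected    configured   unknown
N0.SB2               unknown      empty        unconfigured unknown
EOF
}

sample_listing | list_ap_ids
```

On a live domain, the same filter would be applied as cfgadm -l | list_ap_ids.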

Attachment Point Classes

Sun Fire systems support classes of attachment points. The two classes DR users need to know about are sbd and pci.
sbd – System boards, CPU and memory modules, and the CPU and memory modules’ connections to the system board. Also, I/O boards, PCI buses, and the PCI buses’ connections to the I/O board.
pci – PCI cards, which connect into PCI buses.
Chapter 2 DR Concepts 9
To view a list of the attachment points and the type of board associated with each, use the following command as superuser:
# cfgadm -a -s "cols=ap_id:class"

High-End System Attachment Points

Examples of physical attachment point names on high-end systems are:
/devices/pseudo/dr@0:SBx (for a system board in slot 0)
/devices/pseudo/dr@0:IOx (for an I/O board in slot 1)
where 0 is node 0 (zero), SB is a system board, IO is an I/O board, and x represents the board number or expander number for a particular board. System boards and I/O boards are numbered 0 to 17.
Note – System boards are installed only in slot 0. I/O boards and Max CPU boards are installed only in slot 1.
Logical attachment points on a high-end system take one of the following two forms:
SBx (for system boards)
IOx (for I/O boards or Max CPU boards)

Midrange System Attachment Points

Examples of physical attachment point names on a midrange system are:
/devices/ssm@0,0:N0.SBx (for a system board)
/devices/ssm@0,0:N0.IBx (for an I/O board)
where N0 is node 0 (zero), SB is a system board, IB is an I/O board, and x is a slot number (0 through 5 for a system board, 6 through 9 for an I/O board).
Logical attachment points on midrange systems take one of the following two forms:
N0.SBx (for a system board)
N0.IBx (for an I/O board)
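The relationship between logical and physical names on both system families can be sketched as a small lookup. This is an illustration only, built from the name forms and number ranges quoted in this chapter (0 to 17 for high-end boards, 0 through 5 and 6 through 9 for midrange slots). The function name physical_ap is hypothetical and is not part of the DR software.

```shell
# Map a logical attachment point name to its physical form.
#   High-end:  SBx / IOx, x = 0..17          -> /devices/pseudo/dr@0:<name>
#   Midrange:  N0.SBx (0..5), N0.IBx (6..9)  -> /devices/ssm@0,0:<name>
# Returns 1 (and prints nothing) for names outside those ranges.
physical_ap() {
    case "$1" in
        SB[0-9]|SB1[0-7]|IO[0-9]|IO1[0-7])
            printf '/devices/pseudo/dr@0:%s\n' "$1" ;;
        N0.SB[0-5]|N0.IB[6-9])
            printf '/devices/ssm@0,0:%s\n' "$1" ;;
        *)
            return 1 ;;
    esac
}

physical_ap IO4       # high-end I/O board
physical_ap N0.SB2    # midrange system board
```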

Changes To Attachment Points

You can use the cfgadm(1M) command to change attachment points. You can:
Change the state of an attachment point. The specific cfgadm(1M) options are:
configure
unconfigure
connect
disconnect
Change the availability of an attachment point’s associated board. The specific cfgadm(1M) options are:
assign
unassign
Change the condition of an attachment point’s board slot. The specific cfgadm(1M) options are:
poweron
poweroff
test
For information about states, see the sections that follow. For more information about attachment points, see the cfgadm(1M) man page.
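The options listed above fall into three groups on the command line. The following dry-run sketch prints the cfgadm invocation for a given action without executing it. It assumes that state changes are passed with -c, that assign, unassign, poweron, and poweroff are passed with -x, and that test is invoked with -t; verify these against the cfgadm(1M) and cfgadm_sbd(1M) man pages for your release. The helper name dr_command is hypothetical.

```shell
# Print (do not run) the cfgadm command line for a DR action.
# Assumption: state changes use -c; assign, unassign, poweron, and
# poweroff use -x; test uses -t.
dr_command() {
    action=$1
    ap_id=$2
    case "$action" in
        configure|unconfigure|connect|disconnect)
            echo "cfgadm -c $action $ap_id" ;;
        assign|unassign|poweron|poweroff)
            echo "cfgadm -x $action $ap_id" ;;
        test)
            echo "cfgadm -t $ap_id" ;;
        *)
            echo "unknown action: $action" >&2
            return 1 ;;
    esac
}

dr_command unconfigure N0.SB2
dr_command poweroff N0.SB2
```

Printing the command first gives you a chance to review it before running it as superuser in the domain.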

States and Conditions

This section describes the states and conditions of boards, slots, components, and attachment points.
State is the operational status of either a board slot or its occupant.
Condition is the operational status of an attachment point.
The cfgadm(1M) command can display nine types of states and conditions. For more information, see “Component States” on page 13 and “Component Conditions” on page 14.
Note – The following information about boards and board slots also applies to PCI cards and the PCI buses that hold them.

Board and Board Slot States

When a board slot does not hold a board, its state is empty. When the slot does contain a board, the state of the board is either disconnected or connected.
TABLE 2-1 Board and Board Slot States
State         Description
empty         The slot does not hold a board.
disconnected  The board in the slot is disconnected from the system bus. A board can be in the disconnected state without being powered off. However, a board must be powered off and in the disconnected state before you remove it from the slot. A newly inserted board is in the disconnected state.
connected     The board in the slot is powered on and connected to the system bus. You can view the components on a board only after it is in the connected state.
Caution – Physically removing a board that is in the connected state, or that is powered on and in the disconnected state, crashes the operating system and can result in permanent damage to that system board.
A board in the connected state is either configured or unconfigured. A board that is disconnected is always unconfigured.
TABLE 2-2 Configured and Unconfigured Boards
Name Description
configured The board is available for use by the Solaris software.
unconfigured The board is not available for use by the Solaris software.
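The combinations of board state and configuration described above can be sketched as a small consistency check. This is an illustration, not part of the DR software. The function name valid_board_state is hypothetical, and treating an empty slot as always unconfigured is an assumption, based on there being no board to configure.

```shell
# Check whether a board state / configuration pair is consistent with
# the rules above. Returns 0 if the pair can occur, 1 otherwise.
valid_board_state() {
    case "$1:$2" in
        connected:configured|connected:unconfigured) return 0 ;;
        disconnected:unconfigured)                   return 0 ;;
        empty:unconfigured)                          return 0 ;;  # assumption: nothing to configure
        *)                                           return 1 ;;
    esac
}

if valid_board_state disconnected configured; then
    echo "valid"
else
    echo "invalid: a disconnected board is always unconfigured"
fi
```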
The following states are visible only from the SC:
TABLE 2-3 Board States Visible Only From the SC
Name       Description
Available  The slot, which might or might not contain a board, is not assigned to any particular domain.
Assigned   The slot, which might or might not contain a board, belongs to a domain, but the hardware has not been configured to use it.
Active     The board in the slot is being actively used by the domain to which it has been assigned. You cannot reassign an active board.

Board Conditions

A board can be in one of three conditions: unknown, ok, or failed. Its slot might be designated as unusable.
TABLE 2-4 Board and Board Slot Conditions
Name      Description
unknown   The board has not been tested.
ok        The board is operational.
failed    The board failed testing.
unusable  The board slot is unusable.

Component States

Unlike a board, a CPU or memory module cannot be individually connected or disconnected. Thus, all such components are in the connected state.
The connected component is either configured or unconfigured.
TABLE 2-5 Connected Components: Configured or Unconfigured
Name Description
configured The component is available for use by the Solaris OS.
unconfigured The component is not available for use by the Solaris OS.

Component Conditions

A CPU or memory module is unknown, ok, or failed.
TABLE 2-6 CPU or Memory Module Conditions
Name Description
unknown The component has not been tested.
ok The component is operational.
failed The component failed testing.

Detachability

A detachable device is one that conforms to the following rules:
The device driver must support DDI_DETACH.
Critical resources must be redundant or accessible through an alternate pathway.
CPUs and memory banks can be redundant critical resources. Disk drives are examples of critical resources that can be accessible through an alternate pathway.
Some boards cannot be detached because their resources cannot be moved. For example, if a domain has only one CPU board, that CPU board cannot be detached. An I/O board is not detachable if it controls the boot drive.
If an I/O board has no alternate pathway, you can do one of the following:
Put the disk chain on a separate I/O board. The secondary I/O board can then be detached.
Add a second path to the device through a second I/O board so that the I/O board can be detached without losing access to the secondary disk chain.
Note – If you are unsure whether a device is detachable, consult with your Sun service representative.

Permanent and Non-Permanent Memory

Before you can delete a board, the operating system must vacate the memory on that board. Vacating a board entails flushing the contents of its non-permanent memory to swap space and copying the contents of its permanent memory (that is, the kernel and OpenBoot™ PROM software) to another memory board.
To relocate permanent memory, the operating system on a domain must be temporarily quiesced. The length of the quiescence depends on the domain I/O configuration and the running workloads.
Detaching a board with permanent memory is the only time when the operating system is quiesced; therefore, you should know where permanent memory resides so that you can avoid impacting the operation of the domain significantly. To display the size of permanent memory, use the cfgadm(1M) command with its -av option. To vacate a board that has permanent memory, the operating system must find a sufficiently large block of available memory, called target memory, on which to copy the current contents of permanent memory, which is referred to as source memory.

Copy-Rename

User processes can release memory by paging it out to the swap device. But the Solaris kernel, which resides in permanent memory, cannot be released in that manner. Instead, cfgadm uses the copy-rename technique to release the memory. After the OS identifies a suitable target board – one that has enough memory to hold the permanent memory to be moved – the DR software executes the following steps:
1. Vacates the memory on the target board by paging the memory out to swap.
2. Quiesces the operating system.
3. Copies the contents (permanent memory) from the source board to the target board. This is the copy part of the operation.
4. Reprograms the hardware to swap the memory address ranges of the source and target board. This is the rename part of the operation.
5. Releases the operating system from its quiesced state.
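The precondition for these steps, finding a target board with enough memory to hold the permanent memory being moved, can be sketched as follows. This is purely illustrative: the actual selection is made inside the DR software and is not described in this guide. The function name pick_target_board, the input format, and the sample sizes are all hypothetical.

```shell
# Pick the first candidate board whose memory can hold the permanent
# memory being moved. Input on stdin: "<board> <memory_kb>" per line.
# Prints the board name, or returns 1 if no candidate is large enough.
pick_target_board() {
    needed_kb=$1
    awk -v need="$needed_kb" '
        $2 + 0 >= need + 0 { print $1; found = 1; exit }
        END { exit !found }'
}

# Hypothetical candidates; 2097152 KBytes of permanent memory to move.
printf '%s\n' 'SB1 1048576' 'SB3 4194304' | pick_target_board 2097152
```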

Memory Interleaving

System boards cannot be dynamically reconfigured if system memory is interleaved across multiple system boards. PCI cards and I/O boards can be dynamically reconfigured regardless of whether memory is interleaved.
For more information about memory interleaving on high-end systems, see the Sun Fire High-End Systems Administration Manual. For midrange systems, see the interleave-scope parameter of the setupdomain command, which is described in both the Sun Fire Midrange Systems Platform Administration Manual and the Sun Fire Midrange System Controller Command Reference Manual.

Correctable Memory Errors

Correctable memory errors indicate that the memory on a system board – that is, one or more of its dual inline memory modules (DIMMs), or portions of the hardware interconnect – might be faulty and need replacement. When the SC detects correctable memory errors, it initiates a record-stop dump to save the diagnostic data, which can interfere with a DR operation.
When a record-stop occurs from a correctable memory error, allow the record-stop dump to complete before you initiate a DR operation.
If the faulty component causes repeated reporting of correctable memory errors, the SC performs multiple record-stop dumps. If this happens, temporarily disable the dump-detection mechanism on the SC, allow the current dump to finish, and then initiate the DR operation. After the DR operation finishes, re-enable the dump-detection mechanism.

Quiescence

During the unconfigure operation on a system board with permanent memory (OpenBoot™ PROM or kernel memory), the operating system is briefly paused, which is known as operating system quiescence. All operating system and device activity on the domain must cease during this critical phase of the operation.
A quick way to determine whether a board has permanent memory is to use the following command:
# cfgadm -av | grep permanent
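To report how much permanent memory each board holds, the matching lines can be reduced to an attachment point name and a size. The following is a minimal sketch, assuming that each matching line starts with the attachment point name and that its information text contains a size followed by the words KBytes permanent. The function name permanent_memory and the sample lines are hypothetical.

```shell
# From `cfgadm -av` style lines on stdin, print "<ap_id> <KBytes permanent>"
# for each line that reports permanent memory.
# Assumes the information text contains "... <n> KBytes permanent".
permanent_memory() {
    awk '{
        for (i = 3; i <= NF; i++)
            if ($i == "permanent" && $(i - 1) == "KBytes")
                print $1, $(i - 2)
    }'
}

# Hypothetical sample lines (illustrative values only).
permanent_memory <<'EOF'
SB0::memory connected configured ok base address 0x0, 4194304 KBytes total, 668072 KBytes permanent
SB1::memory connected configured ok base address 0x100000000, 4194304 KBytes total
EOF
```

On a live domain, the same filter would be applied as cfgadm -av | permanent_memory.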