Front cover

IBM eServer p5 590 and 595 System Handbook

Component-based description of the hardware architecture

A guide for machine type 9119 models 590 and 595

Capacity on Demand explained

Peter Domberg
Nia Kelley
TaiJung Kim
Ding Wei

ibm.com/redbooks

International Technical Support Organization

IBM eServer p5 590 and 595 System Handbook

March 2005

SG24-9119-00
© Copyright International Business Machines Corporation 2005. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
First Edition (March 2005)
This edition applies to the IBM eServer p5 9119 Models 590 and 595.
Note: Before using this information and the product it supports, read the information in “Notices” on page xv.
Contents
Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
The team that wrote this redbook. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Become a published author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx
Chapter 1. System overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 What’s new . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 General overview and characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 Microprocessor technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.2 Memory subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.3 I/O subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.4 Media bays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.5 Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Features summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 Operating systems support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5.1 AIX 5L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5.2 Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Chapter 2. Hardware architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1 Server overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 The POWER5 microprocessor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.1 Simultaneous multi-threading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.2 Dynamic power management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.3 The POWER chip evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.4 CMOS, copper, and SOI technology. . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.5 Processor books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.6 Processor clock rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3 Memory subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.1 Memory cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3.2 Memory placement rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Central electronics complex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.4.1 CEC backplane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.5 System flash memory configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.6 Vital product data and system smart chips . . . . . . . . . . . . . . . . . . . . . . . . 36
2.7 I/O drawer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.7.1 EEH adapters and partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.7.2 I/O drawer attachment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.7.3 Full-drawer cabling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.7.4 Half-drawer cabling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.7.5 blind-swap hot-plug cassette. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.7.6 Logical view of a RIO-2 drawer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.7.7 I/O drawer RAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.7.8 Supported I/O adapters in p5-595 and p5-590 systems . . . . . . . . . . 47
2.7.9 Expansion units 5791, 5794, and 7040-61D . . . . . . . . . . . . . . . . . . . 50
2.7.10 Configuration of I/O drawer ID and serial number. . . . . . . . . . . . . . 54
Chapter 3. POWER5 virtualization capabilities. . . . . . . . . . . . . . . . . . . . . . 57
3.1 Virtualization features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.2 Micro-Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.2.1 Shared processor partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2.2 Types of shared processor partitions . . . . . . . . . . . . . . . . . . . . . . . . 62
3.2.3 Typical usage of Micro-Partitioning technology. . . . . . . . . . . . . . . . . 64
3.2.4 Limitations and considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.3 Virtual Ethernet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.3.1 Virtual LAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.3.2 Virtual Ethernet connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.3.3 Dynamic partitioning for virtual Ethernet devices . . . . . . . . . . . . . . . 72
3.3.4 Limitations and considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.4 Shared Ethernet Adapter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.4.1 Connecting a virtual Ethernet to external networks. . . . . . . . . . . . . . 73
3.4.2 Using Link Aggregation (EtherChannel) to external networks . . . . . 77
3.4.3 Limitations and considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.5 Virtual I/O Server. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.6 Virtual SCSI. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.6.1 Limitations and considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Chapter 4. Capacity on Demand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.1 Capacity on Demand overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.2 What’s new in Capacity on Demand? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.3 Preparing for Capacity on Demand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.3.1 Step 1. Plan for future growth with inactive resources . . . . . . . . . . . 87
4.3.2 Step 2. Choose the amount and desired level of activation . . . . . . . 88
4.4 Types of Capacity on Demand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.5 Capacity BackUp. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.6 Capacity on Demand activation procedure . . . . . . . . . . . . . . . . . . . . . . . . 91
4.7 Using Capacity on Demand. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.7.1 Using Capacity Upgrade on Demand . . . . . . . . . . . . . . . . . . . . . . . . 93
4.7.2 Using On/Off Capacity On Demand . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.7.3 Using Reserve Capacity on Demand . . . . . . . . . . . . . . . . . . . . . . . . 98
4.7.4 Using Trial Capacity on Demand . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.8 HMC Capacity on Demand menus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.8.1 HMC command line functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.9 Capacity on Demand configuration rules . . . . . . . . . . . . . . . . . . . . . . . . 103
4.9.1 Processor Capacity Upgrade on Demand configuration rules . . . . 103
4.9.2 Memory Capacity Upgrade on Demand configuration rules . . . . . . 104
4.9.3 Trial Capacity on Demand configuration rules . . . . . . . . . . . . . . . . 104
4.9.4 Dynamic processor sparing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.10 Software licensing considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.10.1 License entitlements for permanent processor activations . . . . . . 106
4.10.2 License entitlements for temporary processor activations . . . . . . 107
4.11 Capacity on Demand feature codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Chapter 5. Configuration tools and rules . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.1 Configuration tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.1.1 IBM Configurator for e-business . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.1.2 LPAR Validation Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.2 Configuration rules for p5-590 and p5-595 . . . . . . . . . . . . . . . . . . . . . . . 117
5.2.1 Minimum configuration for the p5-590 and p5-595 . . . . . . . . . . . . . 118
5.2.2 LPAR considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.2.3 Processor configuration rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.2.4 Memory configuration rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
5.2.5 Advanced POWER Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . 126
5.2.6 I/O sub-system configuration rules . . . . . . . . . . . . . . . . . . . . . . . . . 126
5.2.7 Disks, boot devices, and media devices . . . . . . . . . . . . . . . . . . . . . 128
5.2.8 PCI and PCI-X slots and adapters . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.2.9 Keyboards and displays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.2.10 Frame, power, and battery backup configuration rules . . . . . . . . . 130
5.2.11 HMC configuration rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5.2.12 Cluster 1600 considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
5.3 Capacity planning considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
5.3.1 p5-590 and p5-595 general considerations. . . . . . . . . . . . . . . . . . . 135
5.3.2 Further capacity planning considerations . . . . . . . . . . . . . . . . . . . . 137
Chapter 6. Reliability, availability, and serviceability. . . . . . . . . . . . . . . . 139
6.1 What’s new in RAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.2 RAS overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.3 Predictive functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
6.3.1 First Failure Data Capture (FFDC) . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.3.2 Predictive failure analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
6.3.3 Component reliability. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6.3.4 Extended system testing and surveillance . . . . . . . . . . . . . . . . . . . 145
6.4 Redundancy in components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
6.4.1 Power and cooling redundancy. . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
6.4.2 Memory redundancy mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . 146
6.4.3 Service processor and clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
6.4.4 Multiple data paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.5 Fault recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.5.1 PCI bus error recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.5.2 Dynamic CPU Deallocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
6.5.3 CPU Guard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
6.5.4 Hot-plug components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.5.5 Hot-swappable boot disks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
6.5.6 Blind-swap, hot-plug PCI adapters . . . . . . . . . . . . . . . . . . . . . . . . . 156
6.6 Serviceability features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
6.6.1 Converged service architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
6.6.2 Hardware Management Console . . . . . . . . . . . . . . . . . . . . . . . . . . 158
6.6.3 Error analyzing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
6.6.4 Service processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
6.6.5 Service Agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
6.6.6 Service Focal Point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
6.7 AIX 5L RAS features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
6.8 Linux RAS features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
Chapter 7. Service processor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
7.1 Service processor functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
7.1.1 Firmware binary image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
7.1.2 Platform initial program load . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
7.1.3 Error handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
7.2 Service processor cabling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
7.3 Advanced System Management Interface . . . . . . . . . . . . . . . . . . . . . . . 175
7.3.1 Accessing ASMI using HMC Service Focal Point utility . . . . . . . . . 175
7.3.2 Accessing ASMI using a Web browser . . . . . . . . . . . . . . . . . . . . . . 177
7.3.3 ASMI login window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
7.3.4 ASMI user accounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
7.3.5 ASMI menu functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
7.3.6 Power On/Off tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
7.3.7 System Service Aids tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
7.3.8 System Configuration ASMI menu . . . . . . . . . . . . . . . . . . . . . . . . . 183
7.3.9 Network Services ASMI menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
7.3.10 Performance Setup ASMI menu . . . . . . . . . . . . . . . . . . . . . . . . . . 185
7.4 Firmware updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
7.5 System Management Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
Chapter 8. Hardware Management Console overview . . . . . . . . . . . . . . . 195
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
8.1.1 Desktop HMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
8.1.2 Rack mounted HMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
8.1.3 HMC characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
8.2 HMC setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
8.2.1 The HMC logical communications. . . . . . . . . . . . . . . . . . . . . . . . . . 199
8.3 HMC network interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
8.3.1 Private and open networks in the HMC environment . . . . . . . . . . . 201
8.3.2 Using the HMC as a DHCP server . . . . . . . . . . . . . . . . . . . . . . . . . 202
8.3.3 HMC connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
8.3.4 Predefined HMC user accounts . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
8.4 HMC login . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
8.4.1 Required setup information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
8.5 HMC Guided Setup Wizard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
8.6 HMC security and user management . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
8.6.1 Server security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
8.6.2 Object manager security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
8.6.3 HMC user management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
8.7 Inventory Scout services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
8.8 Service Agent and Service Focal Point . . . . . . . . . . . . . . . . . . . . . . . . . . 237
8.8.1 Service Agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
8.8.2 Service Focal Point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
8.9 HMC service utilities and tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
8.9.1 HMC boot up fails with fsck required. . . . . . . . . . . . . . . . . . . . . . . . 243
8.9.2 Determining HMC serial number. . . . . . . . . . . . . . . . . . . . . . . . . . . 244
Appendix A. Facts and features reference . . . . . . . . . . . . . . . . . . . . . . . . 245
Appendix B. PCI adapter placement guide. . . . . . . . . . . . . . . . . . . . . . . . 253
Expansion unit back view PCI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
PCI-X slot description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
Recommended system unit slot placement and maximums . . . . . . . . . . . 255
Appendix C. Installation planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
Doors and covers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
Enhanced acoustical cover option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
Slimline cover option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
Raised-floor requirements and preparation . . . . . . . . . . . . . . . . . . . . . . . . . . 260
Securing the frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
Considerations for multiple system installations . . . . . . . . . . . . . . . . . . . . 261
Moving the system to the installation site . . . . . . . . . . . . . . . . . . . . . . . . . 262
Dual power installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
Planning and installation documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
Appendix D. System documentation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
IBM eServer Hardware Information Center . . . . . . . . . . . . . . . . . . . . . 266
What is the Hardware Information Center?. . . . . . . . . . . . . . . . . . . . . . . . 266
How do I get it? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
How do I get updates? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
How do I use the application? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
Abbreviations and acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
Other publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
Online resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
How to get IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
Figures
1-1 Primary system frame organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1-2 Powered and bolt on frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1-3 POWER4 and POWER5 architecture comparison . . . . . . . . . . . . . . . . . 7
1-4 POWER4 and POWER5 memory structure comparison . . . . . . . . . . . . . 9
1-5 p5-590 and p5-595 I/O drawer organization . . . . . . . . . . . . . . . . . . . . . 10
2-1 POWER4 and POWER5 system structures. . . . . . . . . . . . . . . . . . . . . . 19
2-2 The POWER chip evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2-3 p5-590 and p5-595 16-way processor book diagram . . . . . . . . . . . . . . 25
2-4 Memory flow diagram for MCM0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2-5 Memory card with four DIMM slots . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2-6 Memory placement for the p5-590 and p5-595 . . . . . . . . . . . . . . . . . . . 29
2-7 p5-595 and p5-590 CEC logic diagram . . . . . . . . . . . . . . . . . . . . . . . . . 32
2-8 p5-595 CEC (top view). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2-9 p5-590 CEC (top view). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2-10 CEC backplane (front side view) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2-11 Single loop 7040-61D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2-12 Dual loop 7040-61D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2-13 I/O drawer RIO-2 ports. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2-14 blind-swap hot-plug cassette . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2-15 I/O drawer top view - logical layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2-16 Hardware Information Center search for PCI placement . . . . . . . . . . . . 48
2-17 Select Model 590 or 595 placement . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2-18 PCI-X slots of the I/O drawer (rear view) . . . . . . . . . . . . . . . . . . . . . . . . 50
2-19 PCI placement guide on IBM eServer Information Center . . . . . . . . . 51
2-20 Minimum to maximum I/O configuration . . . . . . . . . . . . . . . . . . . . . . . . 53
2-21 I/O frame configuration example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3-1 POWER5 partitioning concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3-2 Capped shared processor partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3-3 Uncapped shared processor partitions . . . . . . . . . . . . . . . . . . . . . . . . . 63
3-4 Example of a VLAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3-5 VLAN configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3-6 Logical view of an inter-partition VLAN . . . . . . . . . . . . . . . . . . . . . . . . . 71
3-7 Connection to external network using AIX routing . . . . . . . . . . . . . . . . . 74
3-8 Shared Ethernet Adapter configuration . . . . . . . . . . . . . . . . . . . . . . . . . 75
3-9 Multiple Shared Ethernet Adapter configuration . . . . . . . . . . . . . . . . . . 76
3-10 Link Aggregation (EtherChannel) pseudo device . . . . . . . . . . . . . . . . . 78
3-11 IBM p5-590 and p5-595 Virtualization Technologies . . . . . . . . . . . . . . . 80
3-12 AIX 5L Version 5.3 Virtual I/O Server and client partitions . . . . . . . . . . 82
4-1 HMC Capacity on Demand Order Selection panel . . . . . . . . . . . . . . . . 92
4-2 Enter CoD Enablement Code (HMC window) . . . . . . . . . . . . . . . . . . . . 92
4-3 HMC Billing Selection Wizard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4-4 Manage On/Off CoD Processors HMC Activation window . . . . . . . . . . 97
4-5 Manage On/Off CoD HMC Confirmation Panel and Legal Statement . . 98
4-6 HMC Reserve CoD Processor Activation window . . . . . . . . . . . . . . . . . 99
4-7 CoD Processor Capacity Settings Overview HMC window . . . . . . . . . 101
4-8 CoD Processor Capacity Settings On/Off CoD HMC window . . . . . . . 102
4-9 CoD Processor Capacity Settings Reserve CoD HMC window . . . . . . 102
4-10 CoD Processor Capacity Settings “Trial CoD” HMC window . . . . . . . . 103
5-1 LPAR Validation Tool - creating a new partition . . . . . . . . . . . . . . . . . 114
5-2 LPAR Validation Tool - System Selection dialog . . . . . . . . . . . . . . . . . 114
5-3 LPAR Validation Tool - System Selection processor feature selection 115
5-4 LPAR Validation Tool - Partition Specifications dialog. . . . . . . . . . . . . 115
5-5 LPAR Validation Tool - Memory Specifications dialog. . . . . . . . . . . . . 116
5-6 LPAR Validation Tool - slot assignments . . . . . . . . . . . . . . . . . . . . . . . 117
6-1 IBMs RAS philosophy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6-2 FFDC error checkers and fault isolation registers . . . . . . . . . . . . . . . . 143
6-3 Memory error recovery mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . 147
6-4 EEH on POWER5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
6-5 blind-swap hot-plug cassette . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
6-6 Error reporting structure of POWER5 . . . . . . . . . . . . . . . . . . . . . . . . . 161
6-7 Service focal point overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
7-1 Service processor (front view) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
7-2 Bulk power controller connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
7-3 Oscillator and service processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
7-4 Select service processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
7-5 Select ASMI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
7-6 OK to launch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
7-7 ASMI login . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
7-8 ASMI menu: Welcome (as admin) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
7-9 ASMI menu: Error /Event Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
7-10 ASMI menu: Detailed Error Log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
7-11 ASMI menu: Factory Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
7-12 ASMI Menu: Firmware Update Policy . . . . . . . . . . . . . . . . . . . . . . . . . 184
7-13 ASMI menu: Network Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . 185
7-14 ASMI menu: Logical Memory Block Size . . . . . . . . . . . . . . . . . . . . . . . 186
7-15 Potential system components that require fixes . . . . . . . . . . . . . . . . . 187
7-16 Getting fixes from the IBM eServer Hardware Information Center . . 188
7-17 Partition profile power-on properties . . . . . . . . . . . . . . . . . . . . . . . . . . 189
7-18 System Management Services (SMS) main menu . . . . . . . . . . . . . . . 190
7-19 Select Boot Options menu options. . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
7-20 Configure Boot Device Order menu. . . . . . . . . . . . . . . . . . . . . . . . . . . 192
7-21 Current boot sequence menu (default boot list). . . . . . . . . . . . . . . . . . 193
8-1 Private direct network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
8-2 HMC with hub/switch attachment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
8-3 HMC attached to both private and public network . . . . . . . . . . . . . . . . 205
8-4 Primary and secondary HMC to BPC connections . . . . . . . . . . . . . . . 206
8-5 First screen after login as hscroot user . . . . . . . . . . . . . . . . . . . . . . . . 208
8-6 Guided Setup Wizard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
8-7 Date and Time settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
8-8 The hscroot password . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
8-9 The root password . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
8-10 First part of Guided Setup Wizard is done . . . . . . . . . . . . . . . . . . . . . . 213
8-11 Select LAN adapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
8-12 Speed selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
8-13 Network type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
8-14 Configure eth0 DHCP range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
8-15 Second LAN adapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
8-16 Host name and domain name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
8-17 Default gateway IP address . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
8-18 DNS IP address . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
8-19 End of network configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
8-20 Client contact information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
8-21 Client contact information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
8-22 Remote support information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
8-23 Callhome connection type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
8-24 License agreement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
8-25 Modem configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
8-26 Country or region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
8-27 Select phone number for modem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
8-28 Dial-up configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
8-29 Authorized user for ESA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
8-30 The e-mail notification dialog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
8-31 Communication interruptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
8-32 Summary screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
8-33 Status screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
8-34 Inventory Scout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
8-35 Select server to get VPD data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
8-36 Store data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
8-37 PPP or VPN connection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
8-38 Open serviceable events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
8-39 Manage service events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
8-40 Detail view of a service event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
8-41 Exchange parts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
B-1 7040-61D expansion unit back view with numbered slots . . . . . . . . . . 254
C-1 Search for planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
C-2 Select 9119-590 and 9119-595 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
C-3 Planning information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
D-1 Information Center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
D-2 Search field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
D-3 Navigation bar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
D-4 Toolbar with start off call . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
D-5 Previous pSeries documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
Tables
1-1 p5-590 and p5-595 features summary. . . . . . . . . . . . . . . . . . . . . . . . . . 12
1-2 p5-590 and p5-595 operating systems compatibility . . . . . . . . . . . . . . . 13
2-1 Memory configuration table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2-2 Types of available memory cards for p5-590 and p5-595 . . . . . . . . . . . 29
2-3 Number of possible I/O loop connections . . . . . . . . . . . . . . . . . . . . . . . 39
3-1 Micro-Partitioning overview on p5 systems . . . . . . . . . . . . . . . . . . . . . . 60
3-2 Interpartition VLAN communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3-3 VLAN communication to external network . . . . . . . . . . . . . . . . . . . . . . . 70
3-4 EtherChannel and Link Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4-1 CoD feature comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4-2 Types of Capacity on Demand (functional categories) . . . . . . . . . . . . . 90
4-3 Permanently activated processors by MCM . . . . . . . . . . . . . . . . . . . . 104
4-4 License entitlement example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4-5 p5-590 and p5-595 CoD Feature Codes . . . . . . . . . . . . . . . . . . . . . . . 109
5-1 p5-590 minimum system configuration . . . . . . . . . . . . . . . . . . . . . . . . 118
5-2 p5-595 minimum system configuration . . . . . . . . . . . . . . . . . . . . . . . . 119
5-3 Configurable memory-to-default memory block size . . . . . . . . . . . . . . 125
5-4 p5-590 I/O drawers quantity with different loop mode . . . . . . . . . . . . . 128
5-5 p5-595 I/O drawers quantity with different loop mode . . . . . . . . . . . . . 128
5-6 Hardware Management Console usage . . . . . . . . . . . . . . . . . . . . . . . 134
6-1 Hot-swappable FRUs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7-1 Table of service processor card location codes. . . . . . . . . . . . . . . . . . 173
7-2 Summary of BPC Ethernet hub port connectors . . . . . . . . . . . . . . . . . 173
7-3 ASMI user accounts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
7-4 ASMI user-level access (menu options) . . . . . . . . . . . . . . . . . . . . . . . 180
8-1 HMC user passwords. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
A-1 Facts and Features for p5-590 and p5-595 . . . . . . . . . . . . . . . . . . . . . 246
A-2 System unit details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
A-3 Server I/O attachment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
A-4 Peak bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
A-5 Standard warranty in United States, other countries may vary . . . . . . 250
A-6 Physical planning characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
A-7 Racks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
A-8 I/O device options list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
B-1 Model 61D expansion unit slot location description (PHB 1 and 2) . . . 254
B-2 Model 61D expansion unit slot location description (PHB 3) . . . . . . . . 254
B-3 p5-590 and p5-595 PCI adapter placement table . . . . . . . . . . . . . . . . 255
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.
Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurement may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.
COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.
Trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
eServer®, eServer™, ibm.com®, iSeries™, i5/OS™, pSeries®, xSeries®, zSeries®, AIX 5L™, AIX®, AS/400®, BladeCenter™, Chipkill™, Electronic Service Agent™, Enterprise Storage Server®, Extreme Blue™, ESCON®, Hypervisor™, HACMP™, IBM®, Micro Channel®, Micro-Partitioning™, OpenPower™, OS/400®, Power Architecture™, PowerPC®, POWER™, POWER2™, POWER4™, POWER4+™, POWER5™, PS/2®, PTX®, Redbooks™, Redbooks (logo)™, RS/6000®, S/390®, SP, TotalStorage®, Versatile Storage Server™, Virtualization Engine™
The following terms are trademarks of other companies:
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Intel, Intel Inside (logos), MMX, and Pentium are trademarks of Intel Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, and service names may be trademarks or service marks of others.
Preface
This IBM® Redbook explores the IBM eServer® p5 models 590 and 595 (9119-590, 9119-595), a new level of UNIX® servers providing world-class performance, availability, and flexibility. Ideal for on demand computing environments, data center implementations, application service providers, and high performance computing, this new class of high-end servers includes mainframe-inspired self-management and security designed to meet your most demanding needs. The IBM eServer p5 590 and 595 provide an expandable, high-end enterprise solution for managing the computing requirements necessary to become an on demand business.
This publication includes the following topics:
• p5-590 and p5-595 overview
• p5-590 and p5-595 hardware architecture
• Virtualization overview
• Capacity on Demand overview
• Reliability, availability, and serviceability (RAS) overview
• Hardware Management Console (HMC) features and functions
This publication is an ideal desk-side reference for IBM professionals, IBM Business Partners, and technical specialists who support the p5-590 and p5-595 systems, and for those who want to learn more about this radically new server in a clear, single-source handbook.
The team that wrote this redbook
This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization, Austin Center.
Peter Domberg (Domi) is a Technical Support Specialist in Germany. He has 27 years of experience in the ITS hardware service. His areas of expertise include pSeries®, RS/6000®, networking, and SSA storage. He is also an AIX 5L Certified Specialist and Hardware Support Specialist for the North and East regions in Germany.
Nia Kelley is a Staff Software Engineer based in IBM Austin with over four years of experience in the pSeries firmware development field. She holds a bachelor’s degree in Electrical Engineering from the University of Maryland at College Park. Her areas of expertise include system bring-up and firmware development, in which she has led several project teams. She has also held various architectural positions for existing and future pSeries products. Ms. Kelley is an alumna of the IBM Extreme Blue™ program and has filed numerous patents for the IBM Corporation.
TaiJung Kim is a pSeries Systems Product Engineer at the pSeries post-sales Technical Support Team in IBM Korea. He has three years of experience working on RS/6000 and pSeries products. He is an IBM Certified Specialist in pSeries systems and AIX 5L. He provides clients with technical support on pSeries systems, AIX 5L, and system management.
Ding Wei is an Advisory IT Specialist working for IBM China ATS. He has eight years of experience in the Information Technology field. His areas of expertise include pSeries® and storage products and solutions. He has been working for IBM for six years.
Thanks to the following people for their contributions to this project:
International Technical Support Organization, Austin Center Scott Vetter
IBM Austin Anis Abdul, George Ahrens, Doug Bossen, Pat Buckland, Mark Dewalt, Bob Foster, Iggy Haider, Dan Henderson, Richard (Jamie) Knight, Andy McLaughlin, Jim Mitchell, Cathy Nunez, Jayesh Patel, Craig Shempert, Guillermo Silva, Joel Tendler
IBM Endicott Brian Tolan
IBM Raleigh Andre Metelo
IBM Rochester Salim Agha, Diane Knipfer, Dave Lewis, Matthew Spinler, Stephanie Swanson
IBM Poughkeepsie Doug Baska
IBM Boca Raton Arthur J. Prchlik
IBM Somers Bill Mihaltse, Jim McGaughan
IBM UK Derrick Daines, Dave Williams
IBM France Jacques Noury
IBM Germany Hans Mozes, Wolfgang Seiwald
IBM Australia Cameron Ferstat
IBM Italy Carlo Costantini
IBM Redbook “Partitioning Implementations for IBM eServer p5 and pSeries Servers” team: Nic Irving (CSC Corporation - Australia), Matthew Jenner (IBM Australia), Arsi Kortesnemi (IBM Finland)
Become a published author
Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You'll team with IBM technical professionals, Business Partners and/or clients.
Your efforts will help increase product acceptance and client satisfaction. As a bonus, you'll develop a network of contacts in IBM development labs, and increase your productivity and marketability.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our Redbooks™ to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:
• Use the online Contact us review redbook form found at:
ibm.com/redbooks
• Send your comments in an email to:
redbook@us.ibm.com
• Mail your comments to:
IBM Corporation, International Technical Support Organization Dept. JN9B Building 905 11501 Burnet Road Austin, Texas 78758-3493
Chapter 1. System overview
In this chapter we provide a basic overview of the p5-590 and p5-595 servers, highlighting the new features, market positioning, main characteristics, and supported operating systems. It is organized as follows:
• Section 1.1, “Introduction” on page 2
• Section 1.2, “What’s new” on page 2
• Section 1.3, “General overview and characteristics” on page 4
• Section 1.4, “Features summary” on page 12
• Section 1.5, “Operating systems support” on page 13
1.1 Introduction
The IBM eServer p5 590 and IBM eServer p5 595 are the servers redefining the IT economics of enterprise UNIX and Linux® computing. The up to 64-way p5-595 server is the new flagship of the product line with nearly three times the commercial performance (based on rperf estimates) and twice the capacity of its predecessor, the IBM eServer pSeries 690. Accompanying the p5-595 is the up to 32-way p5-590 that offers enterprise-class function and more performance than the pSeries 690 at a significantly lower price for comparable configurations.
Both systems are powered by IBM’s most advanced 64-bit Power Architecture™ microprocessor, the IBM POWER5™ microprocessor, with simultaneous multi-threading that makes each processor function as two to the operating system, thus increasing commercial performance and system utilization over servers without this capability. The p5-595 features a choice of IBM’s fastest POWER5 processors running at 1.90 GHz or 1.65 GHz, while the p5-590 offers 1.65 GHz processors.
These servers come standard with mainframe-inspired reliability, availability, serviceability (RAS) capabilities and IBM Virtualization Engine™ systems technology with breakthrough innovations such as Micro-Partitioning™. Micro-Partitioning allows as many as ten logical partitions (LPARs) per processor to be defined. Both systems can be configured with up to 254 virtual servers with a choice of AIX 5L™, Linux, and i5/OS™ operating systems in a single server, opening the door to vast cost-saving consolidation opportunities.
1.2 What’s new
The p5-590 and p5-595 bring the following features:
• POWER5 microprocessor
Designed to provide excellent application performance and high reliability. Includes simultaneous multi-threading to help increase commercial system performance and processor utilization. See 1.3.1, “Microprocessor technology” on page 6 and 2.2, “The POWER5 microprocessor” on page 18 for more information.
• High memory / I/O bandwidth
Fast processors wait less for data to be moved through the system. Delivers data faster for the needs of high performance computing and other memory-intensive applications. See 2.3, “Memory subsystem” on page 26 for more information.
• Flexibility in packaging
High-density 24-inch system frame for maximum growth. See 1.3, “General overview and characteristics” on page 4 for more information.
• Shared processor pool
Provides the ability to transparently share processing power between partitions. Helps balance processing power and ensures the high priority partitions receive the processor cycles they need. See 3.2.1, “Shared processor partitions” on page 59 for more information.
• Micro-Partitioning

Allows each processor in the shared processor pool to be split into as many as ten partitions, providing finely tuned processing power to match workloads. See 3.2, “Micro-Partitioning” on page 58 for more information.
• Virtual I/O
Shares expensive resources to help reduce costs. See 3.6, “Virtual SCSI” on page 81 for more information.
• Virtual LAN
Provides the capability for TCP/IP communication between partitions without the need for additional network adapters. See 3.3, “Virtual Ethernet” on page 65 for more information.
• Dynamic logical partitioning
Allows reallocation of system resources without rebooting affected partitions. Offers greater flexibility in using available capacity and more rapidly matching resources to changing business requirements.
• Mainframe-inspired RAS
Delivers exceptional system availability using features usually found on much more expensive systems including service processor, Chipkill™ memory, First Failure Data Capture, dynamic deallocation of selected system resources, dual system clocks, and more. See Chapter 6, “Reliability, availability, and serviceability” on page 139 for more information.
• Broad range of Capacity on Demand (CoD) offerings

Provides temporary access to processors and memory to meet predictable business spikes, prepaid access to processors to meet intermittent or seasonal demands, and a one-time 30-day trial to test increased processor or memory capacity before permanent activation. Also allows processors and memory to be permanently added to meet long-term workload increases. See Chapter 4, “Capacity on Demand” on page 85 for more information.
• Grid Computing support
Allows sharing of a wide range of computing and data resources across heterogeneous, geographically dispersed environments.
• Scaling through Cluster Systems Management (CSM) support
Allows for more granular growth so end-user demands can be readily satisfied. Provides centralized management of multiple interconnected systems. Provides ability to handle unexpected workload peaks by sharing resources.
• Multiple operating system support

Allows clients the flexibility to select the right operating system and the right application to meet their needs. Provides the ability to expand application choices to include many open source applications. See 1.5, “Operating systems support” on page 13 for more information.
1.3 General overview and characteristics
The p5-590 and p5-595 servers are designed with a basic server configuration that starts with a single frame (Figure 1-1) and is configured with optional and required components.

Figure 1-1 Primary system frame organization

The primary 24-inch, 42U system frame contains the Bulk Power Assembly (8U, with a second fully redundant Bulk Power Assembly on the rear); the Central Electronics Complex (CEC, 18U) with up to four 16-way processor books, each containing two multichip modules (MCMs), up to 512 GB of memory, and six RIO-2 I/O hub adapters; two hot-plug redundant blowers with two more on the rear of the CEC; and four 4U bays holding the required first I/O drawer, an optional I/O drawer, and two further optional I/O drawers or internal batteries. An IBM Hardware Management Console (HMC) is also required.
Both systems are powered by IBM’s most advanced 64-bit Power Architecture microprocessor, the POWER5 microprocessor, with simultaneous multi-threading that makes each processor logically appear as two to the operating system, thus increasing commercial throughput and system utilization over servers without this capability. The p5-595 features a choice of IBM’s fastest POWER5 microprocessors running at 1.9 GHz or 1.65 GHz, while the p5-590 offers 1.65 GHz processors.
For additional capacity, either a powered or a non-powered (bolt-on) expansion frame can be configured for a p5-595, as shown in Figure 1-2. The powered frame is required for a 48- or 64-way server with more than four I/O drawers, and should be considered when rapid future I/O growth is anticipated that the primary CEC frame cannot power; it may also be added later for more capacity. The bolt-on frame may be used for a 16- or 32-way server with more than four I/O drawers, using power from the primary CEC frame, and may likewise be added later. The p5-590 can be expanded by an optional bolt-on frame.

Figure 1-2 Powered and bolt-on frames

Every p5-590 and p5-595 server comes standard with Advanced POWER™ Virtualization, providing Micro-Partitioning, Virtual I/O Server, and Partition Load Manager (PLM) for AIX 5L.

Micro-Partitioning enables system configurations with more partitions than processors. Processing resources can be allocated in units as small as 1/10th of a processor and fine-tuned in increments of 1/100th of a processor, so a p5-590 or p5-595 system can define up to ten virtual servers per processor (254 maximum per system), controlled in a shared processor pool for automatic, nondisruptive resource balancing. The virtualization features of the p5-590 and the p5-595 are introduced in Chapter 3, “POWER5 virtualization capabilities” on page 57.
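For example, on a partition running AIX 5L Version 5.3, the lparstat -i command reports the processing capacity assigned to that micro-partition. The following is a minimal sketch; the output is abridged and the values are hypothetical for a partition entitled to half of a processor:

lparstat -i | grep -i capacity
Entitled Capacity                          : 0.50
Minimum Capacity                           : 0.10
Maximum Capacity                           : 2.00
Capacity Increment                         : 0.01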
The ability to communicate between partitions using virtual Ethernet is part of Advanced POWER Virtualization and it is extended with the Virtual I/O Server to include shared Ethernet adapters. Also part of the Virtual I/O Server is virtual SCSI for sharing SCSI adapters and the attached disk drives.
The Virtual I/O Server requires APAR IY62262 and is supported by AIX 5L Version 5.3 with APAR IY60349, as well as by SLES 9 and RHEL AS 3. Also included in Advanced POWER Virtualization is PLM, a powerful policy based tool for automatically managing resources among LPARs running AIX 5L Version 5.3 or AIX 5L Version 5.2 with the 5200-04 Recommended Maintenance package.
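To confirm that a given APAR is installed on an AIX 5L partition, the instfix command can be used; the following is a minimal sketch using the AIX 5L Version 5.3 APAR mentioned above, with illustrative output:

instfix -ik IY60349
All filesets for IY60349 were found.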
IBM Sserver p5 590 and 595 servers also offer optional Capacity on Demand (CoD) capability for processors and memory. CoD functionality is outlined in Chapter 4, “Capacity on Demand” on page 85.
IBM Sserver p5 590 and 595 servers provide significant extensions to the mainframe-inspired reliability, availability, and serviceability (RAS) capabilities found in IBM Sserver p5 and pSeries systems. They come equipped with multiple resources to identify and help resolve system problems rapidly. During ongoing operation, error checking and correction (ECC) checks data for errors and can correct them in real time. First Failure Data Capture (FFDC) capabilities log both the source and root cause of problems to help prevent the recurrence of intermittent failures that diagnostics cannot reproduce. Meanwhile, Dynamic Processor Deallocation and dynamic deallocation of PCI bus slots help to reallocate resources when an impending failure is detected so applications can continue to run unimpeded. RAS function is discussed in Chapter 6, “Reliability, availability, and serviceability” on page 139.
Power options for these systems are described in 5.2.10, “Frame, power, and battery backup configuration rules” on page 130.
A description of RAS features, such as redundant power and cooling, can be found in 6.4, “Redundancy in components” on page 146.
The following sections detail some of the technologies behind the p5-590 and p5-595.
1.3.1 Microprocessor technology
The IBM POWER4™ microprocessor, which was introduced in 2001, was a result of advanced research technologies developed by IBM to create a high-performance, high-scalability chip design to power future IBM Sserver systems. The POWER4 design integrates two processor cores on a single chip, a shared second-level (L2) cache, a directory for an off-chip third-level (L3) cache, and the necessary circuitry to connect it to other POWER4 chips to form a system. The dual-processor chip provides natural thread-level parallelism at the chip level.
The POWER5 microprocessor is IBM's second-generation dual-core microprocessor and extends the POWER4 design by introducing enhanced performance and support for a more granular approach to computing. The POWER5 chip features single- and multi-threaded execution and higher performance in the single-threaded mode than the POWER4 chip at equivalent frequencies.
The primary design objectives of the POWER5 microprocessor are:
򐂰 Maintain binary and structural compatibility with existing POWER4 systems
򐂰 Enhance and extend symmetric multiprocessing (SMP) scalability
򐂰 Continue to provide superior performance
򐂰 Deliver a power-efficient design
򐂰 Enhance reliability, availability, and serviceability
POWER4 to POWER5 comparison
There are several major differences between the POWER4 and POWER5 chip designs; they include the areas shown in Figure 1-3 and discussed in the following sections.
Figure 1-3 POWER4 and POWER5 architecture comparison

POWER4+ to POWER5 comparison

Attribute                        POWER4+ design                POWER5 design                  Benefit
L1 cache                         2-way associative             4-way associative              Improved L1 cache performance
L2 cache                         1.5 MB, 8-way associative     1.9 MB, 10-way associative     Fewer L2 cache misses; better performance
L3 cache                         32 MB, 8-way associative,     36 MB, 12-way associative,     Better cache performance
                                 118 clock cycles              reduced latency
Simultaneous multi-threading     No                            Yes                            Better processor utilization; 30%* system improvement
Partitioning support             1 processor                   1/10th of processor            Better usage of processor resources
Floating-point rename registers  72                            120                            Better performance
Size                             412 mm²                       389 mm²                        50% more transistors in the same space
Chip interconnect: Type          Distributed switch            Enhanced distributed switch    Better systems throughput
  Intra-MCM data bus             ½ proc. speed                 Processor speed                Better performance
  Inter-MCM data bus             ½ proc. speed                 ½ proc. speed

* Based on IBM rPerf projections
Introduction to simultaneous multi-threading
Simultaneous multi-threading is a hardware design enhancement in the POWER5 architecture that allows two separate instruction streams (threads) to execute simultaneously on the processor. It combines the capabilities of superscalar processors with the latency-hiding abilities of hardware multi-threading.
Using multiple on-chip thread contexts, the simultaneous multi-threading processor executes instructions from multiple threads each cycle. By duplicating portions of logic in the instruction pipeline and increasing the capacity of the register rename pool, the POWER5 processor can execute several elements of two instruction streams, or threads, concurrently. Through hardware and software thread prioritization, greater utilization of the hardware resources can be realized without an impact to application performance.
The benefit of simultaneous multi-threading is realized more in commercial environments than in numerically intensive environments, because the number of transactions performed outweighs the actual speed of any single transaction. For example, the simultaneous multi-threading environment is much better suited for a Web server or database server than for a Fortran weather prediction application. In the rare case that applications are tuned for optimal use of processor resources, there may be a decrease in performance due to increased contention for cache and memory. For this reason, simultaneous multi-threading may be disabled.
Although the operating system determines whether simultaneous multi-threading is used, it is otherwise completely transparent to the applications and operating system, and is implemented entirely in hardware. (Simultaneous multi-threading is not supported on AIX 5L Version 5.2.)
1.3.2 Memory subsystem
With the enhanced architecture of larger 7.6 MB L2 and 144 MB L3 caches, each multichip module (MCM) can stage information more effectively from processor memory to applications. These caches allow the p5-590 and p5-595 to run workloads significantly faster than predecessor servers.
The difference in memory hierarchy between POWER4 and POWER5 systems is shown in Figure 1-4.
Figure 1-4 POWER4 and POWER5 memory structure comparison
Two memory technologies are offered: DDR1 and DDR2. Equipped with 8 GB of memory in its minimum configuration, the p5-590 can be scaled to 1 TB using DDR1 266 MHz memory, or from 8 GB to 128 GB using DDR2 533 MHz memory, which is useful for high-performance applications. The p5-595 can be scaled from 8 GB to 2 TB of DDR1 266 MHz memory, or from 8 GB to 256 GB of DDR2 533 MHz memory.
Additional information about memory can be found in 2.3, “Memory subsystem” on page 26 and 5.2.4, “Memory configuration rules” on page 123.
1.3.3 I/O subsystem
Using the RIO-2 ports in the processor books, up to twelve I/O drawers can be attached to a p5-595 and up to eight I/O drawers to the p5-590, providing up to 14 TB and 9.3 TB of 15K rpm disk storage, respectively. Each 4U (4 EIA unit) drawer provides 20 hot-plug, blind-swap PCI-X I/O adapter slots, 16 or 8 front-accessible, hot-swappable disk drive bays, and four or two integrated Ultra3 SCSI controllers. I/O drawers can be installed in the primary 24-inch frame or in an optional expansion frame. Attachment to a wide range of IBM TotalStorage® storage system offerings – including disk storage subsystems, storage area network (SAN) components, tape libraries, and external media drives – is also supported.
A minimum of one I/O drawer (FC 5791 or FC 5794) is required per system. I/O drawer FC 5791 contains 20 PCI-X slots and 16 disk bays, and FC 5794 contains 20 PCI-X slots and 8 disk bays. Existing 7040-61D I/O drawers may also be attached to p5-595 or p5-590 servers as additional I/O drawers (when correctly featured). For more information about the I/O subsystem, refer to 2.7, “I/O drawer” on page 37. The I/O features are shown in Figure 1-5.
Figure 1-5 p5-590 and p5-595 I/O drawer organization
1.3.4 Media bays
You can configure your IBM Sserver p5 595 (9119-595) and p5 590 (9119-590) systems to include a storage device media drawer. This media drawer can be mounted in the CEC rack with three available media bays, two in the front and one in the rear. New storage devices for the media bays include:
򐂰 16X/48X IDE DVD-ROM drive
򐂰 4.7 GB, SCSI DVD-RAM drive
򐂰 36/72 GB, 4 mm internal tape drive
This offering is an alternative to the 7212-102 Storage Device Enclosure, which cannot be mounted in the 590 and 595 CEC rack.
(Figure callouts: each I/O drawer contains up to 16 hot-swappable disks structured into four 4-packs of 36.4 GB or 73.4 GB 15K rpm disk drives (1.1 TB maximum per I/O drawer), with feature options for two or four backplanes; existing IBM 7040-61D I/O drawers may be moved to these servers. Each drawer provides twenty hot-plug PCI-X slots (maximum of 160 hot-plug slots per system); blind-swap cassettes permit PCI-X adapters to be added or replaced without extending the I/O drawer while the system remains available, and any slot can be assigned to any partition.)
1.3.5 Virtualization
The IBM Virtualization Engine can help simplify IT infrastructure by reducing management complexity and providing integrated virtualization technologies and systems services for a single IBM Sserver p5 server or across multiple server platforms. The Virtualization Engine systems technologies that are added or enhanced in systems using the POWER5 architecture are as follows. A detailed description of these features can be found in Chapter 3, “POWER5 virtualization capabilities” on page 57.
򐂰 POWER™ Hypervisor™
Is responsible for time slicing and dispatching the logical partition workload across the physical processors, enforces partition security, and can provide virtual LAN channels between partitions, reducing the need for physical Ethernet adapters that occupy I/O adapter slots.
򐂰 Simultaneous multi-threading
Allows two separate instruction streams (threads) to run concurrently on the same physical processor, improving overall throughput and hardware resource utilization.
򐂰 LPAR
Allows processors, memory, I/O adapters, and attached devices to be grouped logically into separate systems within the same server.
򐂰 Micro-Partitioning
Allows processor resources to be allocated to partitions in units as small as 1/10th of a processor, with increments in units of 1/100th of a processor.
򐂰 Virtual I/O
Includes virtual SCSI for sharing SCSI attached disks and virtual networking to enable sharing of Ethernet adapters.
򐂰 Virtual LAN (VLAN)
Enables high-speed, secure, partition-to-partition communications using the TCP/IP protocol to help improve performance.
򐂰 Capacity on Demand
Allows system resources such as processors and memory to be made available on an as-needed basis.
򐂰 Multiple operating system support
The POWER5 processor-based Sserver p5 products support IBM AIX 5L Version 5.2, IBM AIX 5L Version 5.3, SUSE Linux Enterprise Server 9 (SLES 9), and Red Hat Enterprise Linux AS 3 (RHEL AS 3). IBM i5/OS V5R3 is also available on Sserver p5 models 570, 590, and 595.
1.4 Features summary
Table 1-1 summarizes the major features of the p5-590 and p5-595 servers. For more information, see Appendix A, “Facts and features reference” on page 245.
Table 1-1 p5-590 and p5-595 features summary
IBM Sserver p5 system                            p5-590                  p5-595
Machine type - Model                             9119-590                9119-595
Packaging                                        24-inch system frame    24-inch system frame
Number of expansion racks                        1                       1 or 2
Number of processors per system                  8 to 32                 16 to 64
POWER5 processor speed                           1.65 GHz                1.65 or 1.90 GHz
Number of 16-way processor books
(2 MCMs per book)                                1 or 2                  1, 2, 3, or 4
Memory                                           8 GB - 1024 GB*         8 GB - 2048 GB*
CoD features (all memory CoD features            Processor/Memory CUoD, Reserve CoD,
apply to DDR1 memory only)                       On/Off Processor/Memory CoD,
                                                 Trial Processor/Memory CoD, Capacity BackUp
Maximum micro-partitions                         10 times the number of processors (254 maximum)
PCI-X slots                                      20 per I/O drawer       20 per I/O drawer
Media bays                                       Optional                Optional
Disk bays                                        8 or 16 per I/O drawer  8 or 16 per I/O drawer
Optional I/O drawers                             Up to 8                 Up to 12
Maximum PCI-X slots with maximum I/O drawers     160                     240
Maximum disk bays with maximum I/O drawers       128                     192
Maximum disk storage with maximum I/O drawers    9.3 TB                  14.0 TB

* 32 GB memory cards to enable maximum memory are planned for availability April 8, 2005. Until that time, maximum memory is half as much (512 GB on the p5-590 and 1024 GB on the p5-595).
1.5 Operating systems support
All new POWER5 processor-based servers are capable of running IBM AIX 5L Version 5.3 or AIX 5L Version 5.2 for POWER and support appropriate versions of Linux. Both of the aforementioned supported versions of AIX 5L have been specifically developed and enhanced to exploit and support the extensive RAS features on IBM Sserver pSeries systems. Table 1-2 lists operating systems compatibility.
Table 1-2 p5-590 and p5-595 operating systems compatibility
Operating system (1)                                             p5-590    p5-595
AIX 5L V5.1                                                      No        No
AIX 5L V5.2 (5765-E62)                                           Yes       Yes
AIX 5L V5.3 (5765-G03)                                           Yes       Yes
AIX 5L LPAR                                                      Yes       Yes
Red Hat Enterprise Linux AS 3 for POWER (5639-RDH)               Yes       Yes
SUSE LINUX Enterprise Server 8                                   No        No
SUSE LINUX Enterprise Server 9 for POWER (5639-SLP)              Yes       Yes
Linux LPAR                                                       Yes       Yes
i5/OS (5722-SS1)                                                 Yes       Yes
HACMP™ for AIX 5L V5.2 (5765-F62)                                Yes       Yes
Cluster Systems Management for AIX 5L V1.4 (5765-F67)            Yes       Yes
Cluster Systems Management for Linux on POWER V1.4 (5765-G16)    Yes       Yes

(1) Many of the features are operating system dependent and may not be available on Linux. For more information, check:
http://www.ibm.com/servers/eserver/pseries/linux/whitepapers/linux_pseries.html
1.5.1 AIX 5L
The p5-590 and p5-595 require AIX 5L Version 5.3 or AIX 5L Version 5.2 Maintenance Package 5200-04 (IY56722) or later.
The system requires the following media:
򐂰 AIX 5L for POWER V5.2 (5765-E62), dated 08/2004, or later (CD# LCD4-1133-04), plus APAR IY60347 (required AIX 5L Version 5.2 updates for Sserver p5 590/595)
򐂰 AIX 5L for POWER V5.3 (5765-G03), dated 08/2004, or later (CD# LCD4-7463-00), with APAR IY60349 (required AIX 5L Version 5.3 updates for Sserver p5 590/595)
IBM periodically releases maintenance packages for the AIX 5L operating system. These packages are available on CD-ROM (FC 0907) and can be downloaded from the Internet at:
http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html
You can also get individual operating system fixes and information about obtaining AIX 5L service at this site. AIX 5L Version 5.3 has the Service Update Management Assistant (SUMA) tool, which helps the administrator automate the task of checking for and downloading operating system fixes.
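As a brief sketch of how SUMA can be driven from the command line (the fields accepted may vary with the SUMA level installed), the following previews the latest available fixes without downloading them:

suma -x -a Action=Preview -a RqType=Latest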
The Advanced POWER Virtualization feature is not supported on AIX 5L Version 5.2. AIX 5L Version 5.3 is required to take full advantage of the Advanced POWER Virtualization feature.
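To check whether an existing system already meets the AIX 5L prerequisites listed in this section, the installed maintenance level and APARs can be queried; this is a minimal sketch and the output shown is hypothetical:

oslevel -r
5200-04

instfix -ik IY56722
All filesets for IY56722 were found.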
1.5.2 Linux
For the p5-590 and p5-595, Linux distributions were available through SUSE and Red Hat at the time this publication was written. The p5-590 and p5-595 require the following versions of the Linux distributions:
򐂰 SUSE LINUX Enterprise Server 9 for POWER, or later
򐂰 Red Hat Enterprise Linux AS 3 for POWER (update 4), or later
The Advanced POWER Virtualization feature, dynamic LPAR, and other features require SUSE SLES 9.
Find full information about Red Hat Enterprise Linux AS 3 for POWER at:
http://www.redhat.com/software/rhel/as/
Find full information about SUSE Linux Enterprise Server 9 for POWER at:
http://www.suse.com/us/business/products/server/sles/i_pseries.html
For the latest in IBM Linux news, subscribe to the Linux Line. See:
https://www6.software.ibm.com/reg/linux/linuxline-i
Many of the features that are described in this document are OS-dependent and may not be available on Linux. For more information, check:
http://www.ibm.com/servers/eserver/pseries/linux/whitepapers/linux_pseries.html
IBM only supports the Linux systems of clients with a SupportLine contract that covers Linux. Otherwise, the Linux distributor should be contacted for support.
Chapter 2. Hardware architecture
This chapter revisits the contents of Chapter 1, “System overview” on page 1, but provides deeper technical descriptions of the topics, including the hardware architectures that are implemented in the p5-590 and p5-595 servers, the POWER5 processor, the memory subsystem, and the I/O subsystem, in the following topics:
򐂰 Section 2.1, “Server overview” on page 18
򐂰 Section 2.2, “The POWER5 microprocessor” on page 18
򐂰 Section 2.3, “Memory subsystem” on page 26
򐂰 Section 2.4, “Central electronics complex” on page 30
򐂰 Section 2.5, “System flash memory configuration” on page 36
򐂰 Section 2.6, “Vital product data and system smart chips” on page 36
򐂰 Section 2.7, “I/O drawer” on page 37
2.1 Server overview
The IBM Sserver p5 590 and 595 provide an expandable, high-end enterprise solution for managing the computing requirements necessary to become an on demand business. With the introduction of the POWER5 architecture, there have been numerous improvements over the previous POWER4 architecture based systems.
Both the p5-590 and p5-595 systems are based on the same 24-inch wide, 42 EIA height frame. Inside this frame all the server components are placed in predetermined positions. This design and mechanical organization offers advantages in optimization of floor space usage.
The p5-595 is a 16/32/48/64-way (at 1.9 GHz or 1.65 GHz) SMP system packaged in a 24-inch wide, 18 EIA, 36-inch deep CEC. The CEC is installed in the 42 EIA base primary frame, which also includes two top-mounted front and back bulk power assemblies (BPAs) and support for up to four I/O drawers. A powered I/O frame (FC 5792) and a bolt-on expansion frame (FC 8691) are also available to support additional I/O drawers for the p5-595 system. Up to 12 I/O drawers can be attached to a p5-595.
The p5-590 has an architecture identical to that of the p5-595. It differs from the p5-595 in the following areas:
򐂰 Only a 1.65 GHz processor is supported in a p5-590.
򐂰 The maximum configuration is a 32-way system with up to eight I/O drawers.
򐂰 A powered I/O frame (FC 5792) is not required in the p5-590.
2.2 The POWER5 microprocessor
The POWER5 processor features single-threaded and multi-threaded execution, providing higher performance in the single-threaded mode than its POWER4 predecessor provides at equivalent frequencies. The POWER5 microprocessor maintains both binary and architectural compatibility with existing POWER4 systems to ensure that binaries continue executing properly and that all application optimizations carry forward to newer systems. The POWER5 microprocessor provides additional enhancements such as virtualization, simultaneous multi-threading support, improved reliability, availability, and serviceability at both chip and system levels, and it has been designed to support interconnection of 64 processors along with higher clock speeds.
Figure 2-1 shows the high-level structures of POWER4 and POWER5 processor-based systems. The POWER4 processors scale up to a 32-way symmetric multi-processor. Going beyond 32 processors with POWER4 architecture could increase interprocessor communication, resulting in higher traffic on the interconnection fabric bus. This can cause greater contention and negatively affect system scalability.
Moving the L3 cache to the processor side of the fabric reduces traffic on the fabric bus and enables POWER5 processor-based systems to scale to higher levels of symmetric multi-processing. The POWER5 processor supports a 1.9 MB on-chip L2 cache, implemented as three identical slices with separate controllers for each. Either processor core can independently access each L2 controller. The L3 cache, with a capacity of 36 MB, operates as a backdoor with separate buses for reads and writes that operate at half the processor speed.
Because of the higher transistor density of the POWER5 0.13-µm technology (over the original POWER4), it was possible to move the memory controller on-chip and eliminate a chip that was previously needed for the memory controller function. These changes in the POWER5 processor also have the significant side benefits of reducing latency to the L3 cache and main memory, as well as reducing the number of chips that are necessary to build a system.
The POWER5 processor supports the 64-bit PowerPC® architecture. A single die contains two identical processor cores, each supporting two logical threads. This architecture makes the chip appear as a four-way symmetric multi-processor to the operating system. The POWER5 processor core has been designed to support both enhanced simultaneous multi-threading and single-threaded (ST) operation modes.
Figure 2-1 POWER4 and POWER5 system structures
2.2.1 Simultaneous multi-threading
To meet the ongoing demand for performance improvements at the application level, simultaneous multi-threading functionality is embedded in the POWER5 chip technology. Developers are familiar with process-level parallelism (multi-tasking) and thread-level parallelism (multi-threads). Simultaneous multi-threading is the next stage in achieving higher processor utilization for throughput-oriented applications: it introduces instruction group-level parallelism, feeding the processor's multiple pipelines with instruction groups chosen from different hardware threads belonging to a single OS image.
Simultaneous multi-threading is activated by default when an OS that supports it is loaded. On a 2-way POWER5 processor-based system, the operating system discovers the available processors as a 4-way system. To achieve a higher performance level, simultaneous multi-threading is also applicable in Micro-Partitioning, capped or uncapped, and dedicated partition environments.
Simultaneous multi-threading is supported on POWER5 processor-based systems running AIX 5L Version 5.3 or Linux OS-based systems at the correct kernel level. AIX 5L provides the smtctl command that turns simultaneous multi-threading on and off without subsequent reboot. For Linux, an additional boot option must be set to activate simultaneous multi-threading after a reboot.
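As an illustrative sketch of the smtctl usage described above (consult the command documentation for the complete syntax):

smtctl                  Displays the current simultaneous multi-threading mode.
smtctl -m off -w now    Disables simultaneous multi-threading immediately, without a reboot.
smtctl -m on -w boot    Enables simultaneous multi-threading at the next boot; the boot image
                        must be rebuilt with the bosboot command for the change to take effect.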
The simultaneous multi-threading mode increases the usage of the execution units. In the POWER5 chip, more rename registers have been introduced (both floating-point registers (FPR) and general-purpose registers (GPR) are increased to 120), which are essential for out-of-order execution and vital for simultaneous multi-threading.
Enhanced simultaneous multi-threading features
To improve simultaneous multi-threading performance for various workload mixes and provide robust quality of service, POWER5 provides two features:
򐂰 Dynamic resource balancing
The objective of dynamic resource balancing is to ensure that the two threads executing on the same processor flow smoothly through the system.
Depending on the situation, the POWER5 processor resource balancing logic has a different thread throttling mechanism.
򐂰 Adjustable thread priority
Adjustable thread priority lets software determine when one thread should have a greater (or lesser) share of execution resources.
The POWER5 processor supports eight software-controlled priority levels for each thread.
Single threaded operation
Not all applications benefit from simultaneous multi-threading. Having threads executing on the same processor does not increase the performance of processor intensive applications or applications that consume all of the chip’s memory bandwidth. For this reason, the POWER5 processor supports the single thread (ST) execution mode. In this mode, the POWER5 processor gives all of the physical resources to the active thread, enabling it to achieve higher performance than a POWER4 processor-based system at equivalent frequencies. Highly optimized scientific codes are one example where ST operation is ideal.
Simultaneous multi-threading and ST operation modes can be dynamically switched without affecting server operations. The two modes can coexist on a single physical system; however, only a single mode is possible on each OS instance (partition).
2.2.2 Dynamic power management
In current CMOS (complementary metal oxide semiconductor) technologies, chip power consumption is one of the most important design parameters. With the introduction of simultaneous multi-threading, more instructions execute per cycle per processor core, thus increasing the core’s and the chip’s total switching power. To reduce switching power, POWER5 chips extensively use a fine-grained, dynamic clock-gating mechanism. This mechanism gates off clocks to a local clock buffer if dynamic power management logic knows that the set of latches that are driven by the buffer will not be used in the next cycle. This allows substantial power saving with no performance impact. In every cycle, the dynamic power management logic determines whether a local clock buffer that drives a set of latches can be clock-gated in the next cycle.
In addition to the switching power, leakage power has become a performance limiter. To reduce leakage power, the POWER5 chip uses transistors with low threshold voltage only in critical paths. The POWER5 chip also has a low-power mode, enabled when the system software instructs the hardware to execute both threads at priority 1. In low power mode, instructions dispatch once every 32 cycles at most, further reducing switching power. Both threads are set to priority 1 by the operating system when in the idle loop.
2.2.3 The POWER chip evolution
The p5-590 and p5-595 system complies with the RS/6000® platform architecture, which is an evolution of the PowerPC Common Hardware Reference Platform (CHRP) specifications. Figure 2-2 on page 24 shows the POWER chip evolution.
򐂰 POWER4
The POWER4 processor is not just a chip, but rather an architecture describing how a set of chips is designed together to build a system. As such, POWER4 can be considered a technology in its own right. The interconnect topology, referred to as a Distributed Switch, was new to the industry with POWER4. In that light, systems are built by interconnecting POWER4 chips to form up to 32-way symmetric multi-processors. The reliability, availability, and serviceability design incorporated into POWER4 is pervasive throughout the system and is an integral part of the design. POWER4 is the chip technology used in the pSeries Models 615, 630, 650, 655, 670, 690, and the IntelliStation 275. It is also the basis for the PowerPC® 970 used in JS20 BladeCenter™ servers.
The POWER4 design can handle a varied and robust set of workloads. This is especially important as the on demand business world evolves and data intensive demands on systems merge with commercial requirements. The need to satisfy high-performance computing requirements, with their historically high bandwidth demands, together with commercial requirements for data sharing and SMP scaling, dictated a single design to address both environments.
򐂰 POWER5
POWER5 technology is the next generation of IBM 64-bit architecture. Although the hardware is based on POWER4, POWER5 is much more than just an improvement in processor or chip design. It is a major architectural change, creating a much more efficient superscalar processor complex. For example, the high performance distributed switch is enhanced. POWER5 technology is implemented in the Sserver p5 Models 510, 520, 550, 570, 575, 590, 595 and the OpenPower™ 710 and 720 systems.
As with POWER4 hardware technology, POWER5 technology-based processors have two load/store units, two arithmetic units, one branch execution unit, and one execution unit for logical operations on the condition register (CR). The design of the processor complex is such that it can most efficiently execute multiple instruction streams concurrently. With simultaneous multi-threading active, instructions from two different threads can be issued in a single cycle.
The POWER5 concept is a step further into autonomic computing (an approach to self-managed computing systems with a minimum of human interference; the term derives from the body's autonomic nervous system, which controls key functions without conscious awareness or involvement). Several reliability and availability enhancements are implemented. Along with increased redundant components, it incorporates new technological standards, such as special ways to reduce junction temperatures to reach a high level of availability. The full system design approach is required to maintain balanced utilization of hardware resources and high availability of the new Sserver p5 systems.
Memory and CPU sharing, a dual clock, and dual service processors with failover capability are examples of the full system design approach for high availability. IBM designed the Sserver p5 system processor, caching mechanisms, memory allocation methods, and full ECC support for buses between chips inside a POWER5 system for performance and availability. In addition, advanced error correction and low power consumption circuitry is improved with thermal management.
Multi-processor POWER5 technology-based servers have multiple autonomic computing features for higher availability compared with single processor servers. If a processor is running, but is experiencing a high rate of correctable soft errors, it can be deconfigured. Its workload can be picked up automatically by the remaining processor or processors without an IPL. If there is an unused Capacity Upgrade on Demand processor or if one processor unit of unused capacity in a shared processor pool is available, the deconfigured processor can be replaced dynamically by the unused processor capacity to maintain the same level of available performance.
Figure 2-2 The POWER chip evolution

(Figure content: the evolution runs from the 604e, RS64, and POWER3 processor families through POWER4 at 1.0 to 1.3 GHz (0.18 microns, 2001: distributed switch, shared L2, LPAR, autonomic computing, chip multiprocessing) and POWER4+ at 1.2 to 1.9 GHz (0.13 microns, 2002-3: larger L2, more LPARs, high-speed switch) to POWER5 at 1.5 to 1.9 GHz (0.13 microns, 2004: larger L2 and L3 caches, Micro-Partitioning, enhanced distributed switch, enhanced core parallelism, improved floating-point performance, faster memory environment, on-chip memory controller). POWER5 is used in the p5-510, p5-520, p5-550, p5-570, p5-575, p5-590, p5-595, OpenPower 710, and OpenPower 720. Not all processor speeds are available on all models.)
2.2.4 CMOS, copper, and SOI technology
The POWER5 processor design enables IBM Sserver p5 systems to offer clients improved performance, reduced power consumption, and decreased IT footprint size through logical partitioning, Micro-Partitioning, and Virtual I/O. It is made using IBM 0.13-µm lithography CMOS. The POWER5 processor also uses copper and Silicon-on-Insulator (SOI) technology to allow a higher operating frequency for improved performance, yet with reduced power consumption and improved reliability compared to processors not using this technology.
2.2.5 Processor books
In the p5-590 and p5-595 system, the POWER5 chip has been packaged with the L3 cache chip into a cost-effective multichip module package. The storage structure for the POWER5 processor chip is a distributed memory architecture that provides high memory bandwidth. Each processor can address all memory and sees a single shared memory resource. As such, two MCMs with their associated L3 cache and memory are packaged on a single processor book. Access to memory behind another processor is accomplished through the fabric buses. The p5-590 supports up to two processor books (each book is a 16-way) and the p5-595 supports up to four processor books. Each processor book has dual MCMs containing POWER5 processor chips and 36 MB L3 modules. Each 16-way processor book also includes 16 slots for memory cards and six remote I/O-2 (RIO-2) attachment cards for connection of the system I/O drawers, as shown in Figure 2-3.
Figure 2-3 p5-590 and p5-595 16-way processor book diagram
2.2.6 Processor clock rate
The p5-590 system features base 8-way (CoD), 16-way, and 32-way configurations with the POWER5 processor running at 1.65 GHz. The p5-595 system features base 16-way, 32-way, 48-way, and 64-way configurations with the POWER5 processor running at 1.65 GHz or 1.9 GHz.
To determine the processor characteristics on a running system, use one of the following commands:
lsattr -El procX   Where X is the number of the processor; for example, proc0 is the first processor in the system. (The output of the lsattr command has been expanded in AIX 5L to include the processor clock rate.) The output from the command would be similar to this:

type          powerPC_POWER5   Processor type          False
frequency     1656000000       Processor Speed         False
smt_enabled   true             Processor SMT enabled   False
smt_threads   2                Processor SMT threads   False
state         enable           Processor state         False

(False, as used in this output, signifies that the value cannot be changed through an AIX 5L command interface.)

pmcycles -m   This command (AIX 5L Version 5.3 and later) uses the performance monitor cycle counter and the processor real-time clock to measure the actual processor clock speed in MHz. The pmcycles command is part of the bos.pmapi fileset; first check whether that component is installed by using the lslpp -l bos.pmapi command in AIX 5L. This is the sample output of an 8-way p5-590 system running at 1.65 GHz:

Cpu 0 runs at 1656 MHz
Cpu 1 runs at 1656 MHz
Cpu 2 runs at 1656 MHz
Cpu 3 runs at 1656 MHz
Cpu 4 runs at 1656 MHz
Cpu 5 runs at 1656 MHz
Cpu 6 runs at 1656 MHz
Cpu 7 runs at 1656 MHz

Note: Any p5-595 or p5-590 system made of more than one processor book must have all processor cards running at the same speed.
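In addition to the commands above, the prtconf command gives a system-wide summary; this is a brief sketch with hypothetical values for an 8-way 1.65 GHz p5-590:

prtconf | grep -i processor
Processor Type: PowerPC_POWER5
Number Of Processors: 8
Processor Clock Speed: 1656 MHz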
2.3 Memory subsystem
The p5-590 and p5-595 memory controller is internal to the POWER5 chip. It interfaces to four Synchronous Memory Interface II (SMI-II) buffer chips and eight DIMM cards per processor chip, as shown in Figure 2-4. There are 16 memory card slots per processor book, and each processor chip on an MCM owns a pair of memory cards. The GX+ interface provides the I/O subsystem connection.
Figure 2-4 Memory flow diagram for MCM0
The minimum memory for a p5-590 processor-based system is 2 GB and the maximum installable memory is 1024 GB using DDR1 memory DIMM technology (128 GB using DDR2 memory DIMM). The total memory depends on the number of available processor cards. Table 2-1 lists the possible memory configurations.
Table 2-1 Memory configuration table

System                                        p5-590      p5-595
Min. configurable memory                      8 GB        8 GB
Max. configurable memory using DDR1 memory    1,024 GB    2,048 GB
Max. configurable memory using DDR2 memory    128 GB      256 GB
Max. number of memory cards (a)               32          64

a. The number of installable memory cards depends on the number of installed processor books (16 per processor book). Memory cards are installed in quads.

Note: DDR1 and DDR2 cannot be mixed within a p5-590/p5-595 server.
2.3.1 Memory cards
On the p5-590 and the p5-595 systems, the memory is seated on a memory card, shown in Figure 2-5. Each memory card has four soldered DIMM cards and two SMI-II chips for address/controls and data buffers. Individual DIMM cards cannot be removed or added, and memory cards have a fixed amount of memory. Table 2-2 on page 29 lists the available types of memory cards.
Figure 2-5 Memory card with four DIMM slots
2.3.2 Memory placement rules
The memory features that are available for the p5-590 and the p5-595 at the time of writing are listed in Table 2-2. The memory locations for each processor chip in the MCMs are illustrated in Figure 2-6 on page 29.
Table 2-2 Types of available memory cards for p5-590 and p5-595

Memory type   Size                 Speed     Number of memory cards   Feature code
DDR1 CoD      4 GB (2 GB active)   266 MHz   1                        7816
              8 GB (4 GB active)   266 MHz   1                        7835
DDR1          16 GB                266 MHz   1                        7828
              32 GB                200 MHz   1                        7829
              256 GB package       266 MHz   32 * 8 GB                8195
              512 GB package       266 MHz   32 * 16 GB               8197
              512 GB package       200 MHz   16 * 32 GB               8198
DDR2          4 GB                 533 MHz   1                        7814
Figure 2-6 Memory placement for the p5-590 and p5-595 (figure legend: A0-D0 = MCM0 chips A-D; A1-D1 = MCM1 chips A-D)
The following memory configuration guidelines are recommended.
򐂰 p5-590/p5-595 servers with one processor book must have a minimum of two memory cards installed.
򐂰 Each 8-way MCM (two per processor book) should have memory installed.
򐂰 The same amount of memory should be used for each MCM (two per processor book) in the system.
򐂰 No more than two different sizes of memory cards should be used in any processor book.
򐂰 All MCMs (two per processor book) in the system should have the same total memory size.
򐂰 A minimum of half of the available memory slots in the system should contain memory.
򐂰 It is better to install more cards of smaller capacity than fewer cards of larger capacity.
For high-performance computing, the following is strongly recommended:
򐂰 DDR2 memory is recommended for memory-intensive applications.
򐂰 Install some memory in support of each 8-way MCM (two MCMs per processor book).
򐂰 Use the same size memory cards across all MCMs and processor books in the system.
2.4 Central electronics complex
The Central Electronics Complex is an 18 EIA unit drawer that houses the processors and memory cards of the p5-590 and p5-595 systems. The fundamental differences between the p5-590 and p5-595 systems are outlined in 1.3, “General overview and characteristics” on page 4.
The CEC contains the following components:
򐂰 CEC backplane that serves as the system component mounting unit
򐂰 Multichip module books that contain the POWER5 processors and L3 cache modules
򐂰 Memory cards
򐂰 Service processor unit
򐂰 I/O books that provide the Remote I/O (RIO) ports
򐂰 Fans and blowers for CEC cooling
Figure 2-7 on page 32 provides a logical view of the CEC components. It shows a system populated with two MCMs, eight dual core POWER5 processors, eight L3 cache modules, eight GX+ adapter card slots, the memory subsystem, and dual processor clock.
The CEC houses up to four processor books. Each processor book consists of two MCMs, memory, and GX+ ports (for Remote I/O or for the High Performance Switch, for which a Statement of Direction has been announced for the first half of 2005). A single MCM has exactly four dual-core POWER5 chips and their Level 3 cache memory (4 x 36 MB). A p5-595 supports a maximum of eight MCMs and is 64-way (4 x 2 x 8).
The p5-595 supports a maximum configuration of four processor books, which allows for 64-way processing.
The p5-590 is packaged in the same physical structure as the p5-595. However, the p5-590 supports a maximum of two processor books, which allows for 32-way processing.
The enhanced GX+ high frequency processor bus drives synchronous memory Interface (SMI) memory buffer chips and interfaces with a 36 MB L3 cache controller. The processor bus provides eight bus adapter slots to interface with two types of bus technologies:
򐂰 I/O bus bridges 򐂰 CEC interconnect bridge to support clustered system configurations. The
p5-590 does not support clustered configurations.
The main service processor function is located in the CEC. The service processor subsystem runs an independent operating system and directly drives Ethernet ports to connect to the external IBM Hardware Management Console. The service processor unit is not specifically represented in the CEC logic diagram, Figure 2-7 on page 32. The service processor is explained in more detail in Chapter 7, “Service processor” on page 169.
There are 16 memory card slots available per processor book on the p5-595 and p5-590. For detailed information about memory, refer to 2.3.2, “Memory placement rules” on page 28.
Figure 2-7 p5-595 and p5-590 CEC logic diagram
Major design efforts have contributed to the development of the p5-590 and p5-595 to analyze single points of failure within the CEC and to either eliminate them or to provide hardening capabilities to significantly reduce their probability of failure.
2.4.1 CEC backplane
The CEC backplane is a double-sided passive backplane that serves as the mounting unit for various system components. The top view of p5-595 CEC is shown in Figure 2-8; the p5-590 CEC is shown in Figure 2-9. There are no physical differences between the p5-590 backplane and the p5-595 backplane.
Figure 2-8 p5-595 CEC (top view)
Figure 2-9 p5-590 CEC (top view)
The backplane is positioned vertically in the center of the CEC, and provides mount spaces for processor books with sixteen POWER5 processors and sixteen level 3 (L3) cache modules, eight memory cards, and four I/O books. Figure 2-10 depicts component population on both the front side and back side of the backplane.
The clock card distributes sixteen paired clock signals to the four processor books. The connection scheme for each processor book consists of a pair of parallel connectors, a base and a mezzanine. As seen in Figure 2-10, there are twelve distributed converter assembly (DCA) connectors, two service processor card connectors and two clock card connectors on the back side of the backplane. In addition to the logic card connections there are bus bar connectors on the front side on the top and bottom of the card.
Figure 2-10 CEC backplane (front side view)
The Central Electronics Complex is an 18 EIA unit drawer that houses:
򐂰 1 to 4 processor books (nodes).
The processor book contains the POWER5 processors, the L3 cache modules located in Multichip modules, memory and RIO-2 attachment cards.
򐂰 CEC backplane (double sided passive backplane) that serves as the system component mounting unit. Processor books plug into the front side of the backplane. The node distributed converter assemblies (DCA) plug into the back side of the backplane. The DCAs are the power supplies for the individual processor books.
A Fabric bus structure on the backplane board provides communication between books.
򐂰 Service processor unit.
Located in the panel above the distributed converter assemblies (DCA): contains redundant service processors and Oscillator cards.
򐂰 Remote I/O (RIO) adapters to support attached I/O drawers.
򐂰 Fans and blowers for CEC cooling.
򐂰 Light strip (front, rear).
The backplane is positioned vertically in the center of the CEC, and provides mount spaces for processor books. This is a double sided passive backplane. Figure 2-8 depicts component population on both the front side and back side of the backplane.
The CEC backplane provides the following types of slots:
򐂰 Slots for up to four processor books.
Processor books plug into the front side of the backplane and are isolated into their own power planes, which allows the system to power individual nodes within the CEC on and off.
򐂰 Slots for up to 12 distributed converter assemblies (DCAs). Three DCAs per processor book provide N+1 logic redundancy. The DCA trio is located at the rear of the CEC, behind the processor book it supports.
򐂰 Fabric bus for communications between processor books.
Located in the panel above the CEC DCAs are
򐂰 Service processor and OSC unit assembly
򐂰 VPD card
Note: In the p5-590 configuration, books 2 and 3 are not required.
2.5 System flash memory configuration
In the p5-590 and p5-595, a serial electronically erasable programmable read only memory (sEEPROM) adapter plugs into the back of the central electronics complex backplane. The platform firmware binary image is programmed into the system sEEPROM, also known as system FLASH memory. FLASH memory is initially programmed during manufacturing of the p5-590 and p5-595 systems. However, this single binary image can be reprogrammed to accommodate firmware fixes provided to the client using the hardware management console.
The firmware binary image contains boot code for the p5-590 and p5-595. This boot code includes, but is not limited to, system service processor code, code to initialize the POWER5 processors, memory, and I/O subsystem components, partition management code, and code to support Advanced POWER Virtualization. The firmware binary image also contains hardware monitoring code used during partition run time.
During boot time, the system service processor dynamically allocates the firmware image from flash memory into main system memory. The firmware code is also responsible for loading the operating system image into main memory. Additional information regarding the system service processor can be found in Chapter 7, “Service processor” on page 169.
Refer to 7.4, “Firmware updates” on page 186 for a summary of the firmware update process. Refer to 2.4, “Central electronics complex” on page 30 for more information regarding the system CEC.
2.6 Vital product data and system smart chips
Vital product data (VPD) carries all of the necessary information for the service processor to determine if the hardware is compatible and how to configure the hardware and chips on the card. The VPD also contains the part number and serial number of the card used for servicing the machine as well as the location information of each device for failure analysis. Since the VPD in the card carries all information necessary to configure the card, no card device drivers or special code has to be sent with each card for installation.
Smart chips are micro-controllers used to store vital product data (VPD). The smart chip provides a means for securely storing data that cannot be read, altered, or written other than by IBM privileged code. The smart chip provides a means of verifying IBM Sserver On/Off Capacity on Demand and IBM Sserver Capacity Upgrade on Demand activation codes that only the smart chip on the intended system can verify. This allows clients to purchase additional spare capacity and pay for use only when needed. The smart chip is the basis for the CoD function and for verifying the integrity of the data stored on the card.
2.7 I/O drawer
The p5-590 and p5-595 use remote I/O drawers (that are 4U) for directly attached PCI or PCI-X adapters and SCSI disk capabilities. Each I/O drawer is divided into two separate halves. Each half contains 10 blind-swap hot-plug PCI-X slots and one or two Ultra3 SCSI 4-pack backplanes for a total of 20 PCI-X slots and up to 16 hot-swappable disk bays per drawer.
A minimum of one I/O drawer (FC 5791 or FC 5794) is required per system. I/O drawer feature number 5791 contains 20 PCI-X slots and 16 disk bays, and feature number 5794 contains 20 PCI-X slots and 8 disk bays.
Existing 7040-61D I/O drawers may be attached to a p5-590 or p5-595 as additional I/O drawers, if available. Only 7040-61D I/O drawers containing feature number 6571 PCI-X planars are supported. FC 6563 PCI planars must be replaced with FC 6571 PCI-X planars before the drawer can be attached. RIO-2 drawer interconnect is the only supported protocol (as opposed to the older RIO) in the p5-590 and p5-595.
Only adapters supported on the p5-590 and p5-595 feature I/O drawers are supported in 7040-61D I/O drawers, if attached. Unsupported adapters must be removed before attaching the drawer to the p5-590 and p5-595 server. The p5-590 and p5-595 only support EEH adapters when partitioned.
A maximum of eight I/O drawers can be connected to a p5-590. Each I/O drawer contains twenty 3.3-volt PCI-X adapter slots and up to sixteen disk bays. Fully configured, the p5-590 can support 160 PCI adapters and 128 disks at 15,000 rpm.
A maximum of 12 I/O drawers can be connected to a p5-595. Each I/O drawer contains twenty 3.3-volt PCI-X adapter slots and up to sixteen disk bays. Fully configured, the p5-595 can support 240 PCI adapters and 192 disks at 15,000 rpm.
A blind-swap hot-plug cassette (equivalent to those in FC 4599) is provided in each PCI-X slot of the I/O drawer. Cassettes not containing an adapter will be shipped with a plastic filler card installed to help ensure proper environmental characteristics for the drawer. If additional blind-swap hot-plug cassettes are needed, FC 4599 should be ordered.
All 10 PCI-X slots on each I/O drawer planar are capable of supporting either 64-bit or 32-bit PCI or PCI-X adapters. Each I/O drawer planar provides 10 PCI-X
slots capable of supporting 3.3 V signaling PCI or PCI-X adapters operating at speeds up to 133 MHz. Each I/O drawer planar incorporates two integrated Ultra3 SCSI adapters for direct attachment of the two 4-pack blind-swap backplanes in that half of the drawer; these adapters do not support external SCSI device attachments. Each half of the I/O drawer is powered separately. FC 5791 is a 7040-61D with 16 disk bays, and FC 5794 is a 7040-61D with eight disk bays.
2.7.1 EEH adapters and partitioning
The p5-590 and p5-595 systems are currently only orderable with adapters that support EEH. Support of a non-EEH adapter (OEM adapter) is only possible when the system has not been configured for partitioning. This is the case when a new system is received, for example, and it is in full system partition and is planned to be used without an HMC. EEH will be disabled for that adapter upon system initialization.
When the platform is prepared for partitioning or is partitioned, the POWER Hypervisor prevents disabling EEH upon system initialization. Firmware in the partition will detect any non-EEH device drivers that are installed and will not configure them. Therefore, all adapters in p5 systems must be EEH capable in order to be used by a partition. This applies to I/O installed in I/O drawers attached to a Sserver p5 system.
A client does not need to actually create more than a single partition to put the platform in a state where the Hypervisor considers it to be partitioned. The platform becomes partitioned (in general, but also in specific reference to EEH enabled by default) as soon as the client attaches an HMC and performs any function that relates to partitioning. Simple hardware service operations do not partition the platform, so it is not simply connecting an HMC that has this effect. However, modifying any platform attributes related to partitioning (such as booting under HMC control to only PHYP standby, or suppressing autoboot to the pre-installed OS partition) results in a partitioned platform, even if the client does not actually create additional partitions.
All Sserver p5 platform I/O slots are managed the same way with respect to EEH.
2.7.2 I/O drawer attachment
System I/O drawers are connected to the p5-590 and p5-595 CEC using RIO-2 loops. Drawer connections are made in loops to help protect against a single point-of-failure resulting from an open, missing, or disconnected cable. Systems with non-looped configurations could experience degraded performance and serviceability. The system has a non-looped configuration if only one RIO-2 path is running.
RIO-2 loop connections operate at 1 GHz. RIO-2 loops connect to the system CEC using RIO-2 loop attachment adapters (FC 7818). Each of these adapters has two ports and can support one RIO-2 loop. Up to six of the adapters can be installed in each 16-way processor book. Up to 8 or 12 I/O drawers can be attached to the p5-590 or p5-595, depending on the model and attachment configuration.
I/O drawers may be connected to the CEC in either single-loop or dual-loop mode. Dual-loop mode is recommended whenever possible as it provides the maximum bandwidth between the I/O drawer and the CEC.
򐂰 Single-loop (Figure 2-11) mode connects an entire I/O drawer to the CEC using one RIO-2 loop (2 ports). The two I/O planars in the I/O drawer are connected together using a short RIO-2 cable. Single-loop connection requires one loop (2 ports) per I/O drawer.
򐂰 Dual-loop (Figure 2-12) mode connects each I/O planar in the drawer to the CEC separately. Each I/O planar is connected to the CEC using a separate RIO-2 loop. Dual-loop connection requires two loops (4 ports) per I/O drawer. With a dual-loop configuration, the RIO-2 bandwidth for the I/O drawer is higher.
Table 2-3 lists the number of single-looped and double-looped I/O drawers that can be connected to a p5-590 or p5-595 server based on the number of processor books installed:
Table 2-3 Number of possible I/O loop connections

Number of processor books    Single-looped               Dual-looped
1                            6                           3
2                            8 (p5-590), 12 (p5-595)     6
3                            12 (p5-595)                 9 (p5-595)
4                            12 (p5-595)                 12 (p5-595)
On initial orders of p5-590 or p5-595 servers, IBM manufacturing will place dual-loop-connected I/O drawers as the lowest numerically designated drawers followed by any single-looped I/O drawers.
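As a rough sketch of the loop arithmetic behind Table 2-3 (assuming, per the text above, one RIO-2 loop per FC 7818 adapter, up to six adapters per processor book, and a model maximum of 8 drawers on the p5-590 or 12 on the p5-595), the following illustrative Python lines reproduce the table values. This is a sketch only, not a configuration tool:

# Assumption from the text: each FC 7818 adapter supports one RIO-2 loop (2 ports),
# and up to six adapters can be installed in each 16-way processor book.
LOOPS_PER_BOOK = 6

def drawer_capacity(books: int, max_drawers: int) -> tuple[int, int]:
    # Return (single-looped, dual-looped) drawer counts allowed by the available
    # loops, capped at the model's maximum number of I/O drawers.
    loops = books * LOOPS_PER_BOOK
    return min(loops, max_drawers), min(loops // 2, max_drawers)

print(drawer_capacity(1, 12))  # (6, 3)   -- one processor book
print(drawer_capacity(2, 8))   # (8, 6)   -- two books on a p5-590
print(drawer_capacity(4, 12))  # (12, 12) -- four books on a p5-595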
򐂰 A minimum of two cables is required for each loop for each GX+ adapter. Interconnection between drawers in a loop requires an additional RIO-2 cable.
򐂰 FC 7924 (0.6 m) can only be used as a jumper cable to connect the two I/O
drawer planars in a single loop.
򐂰 FC 3147 (3.5 m) can only be used to connect FC 5791/5794 risers that are in
either a FC 5792 frame or the FC 8691 frame bolted to the primary frame to the GX+ adapters in a processor book.
򐂰 For the 9119-595, FC 3170 (8.0 m) can only be used to connect FC 5791/5794 risers that are in either a FC 5792 frame or the FC 8691 frame bolted to the FC 5792 frame to the GX+ adapters in a processor book.
򐂰 I/O drawer RIO-2 cables are no longer than 8 m.
򐂰 For GX+ adapter to I/O drawer cabling, the first I/O drawer is connected to the first GX+ adapter, and the second I/O drawer to the next available GX+ adapter. All double-looped drawers will be connected first and then the single-looped drawers.
򐂰 The RIO-2 cabling is GX+ adapter port 0 to I/O drawer riser port 0 and GX+ adapter port 1 to I/O drawer port 1.
2.7.3 Full-drawer cabling
For an I/O drawer, the following connections are required:
򐂰 One cable from the P1 RIO-2 Riser card J0 to CEC I/O card Px-Cx-T1.
򐂰 One cable from the P2 RIO-2 Riser card J1 to CEC I/O card Px-Cx-T2.
These cables provide a data and communication path between the memory cards and the I/O drawers (Figure 2-11).
򐂰 A cable is also added between P1 RIO-2 Riser card J1 and P2 RIO-2 Riser card J0 in each drawer.
This cable ensures that each side of the drawer (P1 and P2) can be accessed by the CEC I/O (RIO-2 adapter) card, even if one of the cables is damaged. Each half of the I/O drawer can communicate with the CEC I/O card for its own uses or on behalf of the other side of the drawer.
The cable plugs contain an identifier (ID) that reports the length of the cable to the Inter-Integrated Circuit (I2C) bus and the service processor. The cable ID is used to verify the length of the RIO-2 cable. Different RIO-2 cables exist because the I/O drawer can be located in the CEC frame, the powered 24-inch A frame, or the unpowered 24-inch Z frame. From the cable ID, the system determines the length and the link speed of the RIO-2 cable.
Figure 2-11 Single loop 7040-61D
2.7.4 Half-drawer cabling
Although I/O drawers will not be built in half-drawer configurations, they can be cabled to, and addressed by, the CEC in half-drawer increments (Figure 2-12).
Both STI connectors on one CEC I/O card Px-Cx-T1 and Px-Cx-T2 will be cabled to both ports on P1 RIO-2 Riser card.
Both STI connectors on a different CEC I/O card Px-Cx-T1 and Px-Cx-T2 (possibly on a different processor book) will be cabled to both ports on P2 RIO-2 Riser card.
Figure 2-12 Dual loop 7040-61D
However, to simplify the management of the server, we strongly recommend that I/O loops be configured as described in the IBM Sserver Information Center, and that a different order be followed only when absolutely necessary.
In any case, it is extremely important for the management of the system to keep up-to-date cabling documentation for your systems, because the cabling may differ from the diagrams in the installation guides.
The I/O drawers provide internal storage and I/O connectivity to the system. Figure 2-13 shows the rear view of an I/O drawer, with the PCI-X slots and riser cards that connect to the RIO-2 ports in the I/O books.
Figure 2-13 I/O drawer RIO-2 ports
Each drawer is composed of two physically symmetrical I/O planar boards that contain 10 hot-plug PCI-X slots each, and PCI adapters can be inserted in the rear of the I/O drawer. The planar boards also contain either one or two integrated Ultra3 SCSI adapters and SCSI Enclosure Services (SES), connected to a SCSI 4-pack backplane.
2.7.5 Blind-swap hot-plug cassette
Each PCI (I/O) card must be housed in a blind-swap hot-plug cassette, also named the blind-swap mechanism (Figure 2-14) or PCI carrier, before being installed.
All PCI-X slots on the p5-590 and p5-595 are PCI 2.2-compliant and are hot-plug enabled, which allows most PCI adapters to be removed, added, or replaced without powering down the system. This function enhances system availability and serviceability.
The function of hot-plug PCI adapters is to provide concurrent adding or removal of PCI adapters when the system is running. In the I/O drawer, the installed adapters are protected by plastic separators called blind-swap hot-plug cassettes. These are used to prevent grounding and damage when adding or
removing adapters. The hot-plug LEDs outside the I/O drawer indicate whether an adapter can be plugged into or removed from the system.
The hot-plug PCI adapters are secured with retainer clips on top of the slots; therefore, you do not need a screwdriver to add or remove a card, and there is no screw that can be dropped inside the drawer.
Figure 2-14 blind-swap hot-plug cassette
2.7.6 Logical view of a RIO-2 drawer
Figure 2-15 shows a logical schematic of an I/O drawer (FC 5791) and the relationship between the internal controllers, disks, and I/O slots.
Figure 2-15 I/O drawer top view - logical layout
Each of the 4-packs supports up to four hot-swappable Ultra3 SCSI disk drives, which can be used for installation of the operating system or storing data.
The 36.4 GB and 73.4 GB disks have the following characteristics:
򐂰 Form factor: 3.5-inch, 1-inch (25 mm) high
򐂰 SCSI interface: SCSI Ultra3 (Fast 80), 16 bit
򐂰 Rotational speed: 15,000 rpm (disks with 10K rotational speeds from earlier systems are not supported)
The RIO-2 riser cards are connected to the planar boards. The RIO-2 ports of each riser card are connected through I/O loops to RIO-2 ports on I/O books in the CEC. The connectivity between the I/O drawer RIO-2 ports and the I/O books RIO-2 ports is described in Remote I/O loop.
On each planar board, the ten PCI-X slots use 3.3 V PCI bus signaling and operate at 33 MHz, 66 MHz, or 133 MHz, depending on the adapter. All PCI-X slots are PCI 2.2 compliant and are hot-plug enabled (see 2.7.6, “Logical view of a RIO-2 drawer” on page 44).
PCI adapters have different bandwidth requirements, and there are functional limitations on the number of adapters of a given type in an I/O drawer or a system.
The complete set of limitations is described in the Hardware Information Center. This is regularly updated and should be considered the reference for any questions related to PCI limitations.
In a RIO-2 I/O drawer, all of the I/O slots can be populated with high-speed adapters (for example, Gigabit Ethernet, Fibre Channel, ATM, or Ultra320 SCSI adapters). All slots can be populated, but in some situations optimum performance might not be achieved on every adapter because of bandwidth limitations.
2.7.7 I/O drawer RAS
If there is an RIO-2 failure in a port or cable, an I/O planar board can route data through the other I/O connection and share the remaining RIO-2 cable for I/O.
For power and cooling, each drawer has two redundant DC power supplies and four high reliability fans. The power supplies and fans have redundancy built into them, and the drawer can operate with a failed power supply or a failed fan. The hard drives and the power supplies are hot-swappable, and the PCI adapters are hot-plug.
All power, thermal, control, and communication systems are redundant in order to eliminate outages due to single-component failures.
I/O subsystem communication and monitoring
There are two main communication subsystems between the CEC and the I/O drawers. The power and RAS infrastructure is responsible for gathering environmental information and controlling power on I/O drawers. The RIO-2 loops are responsible for data transfer to and from I/O devices.
Power and RAS infrastructure
The power cables that connect each I/O drawer and the bulk power assembly (BPA) provide both electrical power distribution and reliability, availability, and serviceability (RAS) infrastructure functions, which include:
򐂰 Powering all system components up or down, when requested. These components include I/O drawers and the CEC.
򐂰 Powering down all the system enclosures on critical power faults.
򐂰 Verifying power configuration.
򐂰 Reporting power and environmental faults, as well as faults in the RAS infrastructure network itself, on operator panels and through the service processor.
򐂰 Assigning and writing location information into various VPD elements in the system.
The power and RAS infrastructure monitors power, fans, and thermal conditions in the system for problem conditions. These conditions are reported either through an interrupt mechanism (for critical faults requiring immediate operating system action) or through messages passed from the RAS infrastructure to the service processor to Run-Time Abstraction Service (RTAS).
2.7.8 Supported I/O adapters in p5-595 and p5-590 systems
The following are configuration rules for the I/O drawer.
I/O drawer (FC 5791/5794) adapter placement (p5-590 and p5-595 only)
The FC 5791 drawer provides 20 blind-swap hot-plug PCI-X slots and 4 integrated DASD backplanes that support up to 16 hot-swappable disk bays. The FC 5794 drawer is the same as FC 5791 but supports only two integrated DASD backplanes that support up to eight hot-swappable disk bays. The 20 PCI-X slots are divided into six PCI Host Bridges (PHB) as follows:
򐂰 PHB1 = slots 1, 2, 3, 4
򐂰 PHB2 = slots 5, 6, 7; Z1 onboard
򐂰 PHB3 = slots 8, 9, 10; Z2 onboard
򐂰 PHB4 = slots 11, 12, 13, 14
򐂰 PHB5 = slots 15, 16, 17; Z1 onboard
򐂰 PHB6 = slots 18, 19, 20; Z2 onboard
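Purely for reference, the slot-to-PHB mapping above can be expressed as a small Python lookup (an illustration only, not part of any IBM tool):

# PCI Host Bridge to PCI-X slot mapping for an FC 5791/5794 drawer, as listed above.
PHB_SLOTS = {
    "PHB1": [1, 2, 3, 4],
    "PHB2": [5, 6, 7],        # plus the Z1 onboard SCSI controller
    "PHB3": [8, 9, 10],       # plus the Z2 onboard SCSI controller
    "PHB4": [11, 12, 13, 14],
    "PHB5": [15, 16, 17],     # plus the Z1 onboard SCSI controller
    "PHB6": [18, 19, 20],     # plus the Z2 onboard SCSI controller
}

def phb_for_slot(slot: int) -> str:
    # Return the PCI Host Bridge that drives a given PCI-X slot (1-20).
    for phb, slots in PHB_SLOTS.items():
        if slot in slots:
            return phb
    raise ValueError(f"slot {slot} is not a valid slot number")

print(phb_for_slot(9))  # PHB3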
Figure 2-16 and Figure 2-17 show how to find more information about PCI adapter placement in the IBM Sserver Hardware Information Center by searching for PCI placement.
http://publib.boulder.ibm.com/infocenter/eserver/v1r2s/en_US/index.htm
Note: It is the cabling between the RIO-2 drawer and the BPA that defines the numbering of the I/O drawer, not the physical location of the drawer.
Search for pci placement or refer to:
http://publib.boulder.ibm.com/infocenter/eserver/v1r2s/en_US/info/iphak/expansion61d.htm#expansion61d
Figure 2-16 Hardware Information Center search for PCI placement
Figure 2-17 on page 49 shows sample results after searching for PCI placement.
Figure 2-17 Select Model 590 or 595 placement
Adapters are placed based on the highest position in the table first, into the first slot in the slot priority for that row in the table. If that slot is filled, place the card in the next available slot in the slot priority for that adapter. Adapters have been divided into three performance categories as required.
򐂰 The first I/O drawer at frame position EIA 05 must have a PCI-X Dual Channel Ultra320 SCSI Adapter (FC 5712) for connection to a media device (FC 5710). It will be placed in either slot 10 or slot 20.
򐂰 A blind-swap hot-plug cassette will be assigned for every adapter card on the order. Empty slots will be assigned a blind-swap hot-plug cassette with a plastic filler. Additional blind-swap hot-plug cassettes (FC 4599) can be ordered through your IBM Sales Representative or IBM Business Partner.
򐂰 Actual slot numbers are stamped on planar 1 as I1 through I10 (left to right from the rear). Slot numbers stamped on planar 2 are also numbered I1 through I10 and are represented as slots 11 through 20.
Adapter placement sequence
The adapters will be spread across PHBs (PHB1 = slots 1 to 4, PHB2 = slots 5 to 7, PHB3 = slots 8 to 10, PHB4 = slots 11 to 14, PHB5 = slots 15 to 17, PHB6 = slots 18 to 20) in each drawer starting with the primary drawer (EIA 5) in the following sequence (Figure 2-18).
Figure 2-18 PCI-X slots of the I/O drawer (rear view)
See Appendix B, “PCI adapter placement guide” on page 253 for more information about adapter placement.
2.7.9 Expansion units 5791, 5794, and 7040-61D
The following URL for IBM Sserver Information Center provides direction on which adapters can be placed in the 5791, 5794, and 7040-61D expansion units and where adapters should be placed for optimum performance. Figure 2-19 shows the screen capture of the Information Center.
http://publib.boulder.ibm.com/infocenter/eserver/v1r2s/en_US/index.htm
Search for pci placement 595 or refer to:
http://publib.boulder.ibm.com/infocenter/eserver/v1r2s/en_US/info/iphak/expansion61d.htm#expansion61d
Figure 2-19 PCI placement guide on IBM Sserver Information Center
Model 5791 and 5794 expansion units
The following is an overview of the PCI placement information located in the Information Center. It is intended to give you an idea of what you may find there.
򐂰 System unit back view
򐂰 PCI-X slot description
򐂰 Recommended system unit slot placement and maximums
򐂰 Performance notes (for optimum performance)
Note: Model 7040-61D expansion units can be migrated if they contain the PCI-X planar (FC 6571). Units with the non-PCI-X planar (FC 6563) cannot be migrated.
򐂰 Expansion unit back view
򐂰 Slots 1 through 20 are compatible with PCI or PCI-X adapters
򐂰 All slots support Enhanced Error Handling (EEH)
– The Uffff.ccc.sssssss.Pn.Cm..... represents the Hardware Management Console location code, which provides information as to the identity of the enclosure, backplane, PCI adapter(s), and connector. The ffff.ccc.sssssss in the location code represents the following:
• ffff = Feature Code of the Enclosure (drawer or processor book)
• ccc = the Sequence Number of the Enclosure
• sssssss = the Serial Number of the Enclosure.
Recommended system unit slot placement and maximums
򐂰 Extra High Bandwidth (EHB) adapter. See the Performance notes before installing this adapter.
򐂰 High Bandwidth (HB) adapter. See the Performance notes before installing this adapter.
򐂰 For more information about listed adapters, see pSeries PCI and PCI-X adapters.
򐂰 System unit information
򐂰 No more than three Gigabit (Gb) Ethernet ports per PHB (the per-PHB limits are illustrated in the sketch following this list).
򐂰 No more than three high bandwidth adapters per PHB.
򐂰 No more than one extra high bandwidth adapter per PHB.
򐂰 No more than one 10 Gb Ethernet port per two CPUs in a system. If one 10 Gb Ethernet port is present per two CPUs, no other 10 Gb or 1 Gb Ethernet ports should be installed for optimum performance.
򐂰 No more than two 1 Gb Ethernet ports per one CPU in a system for maximum performance. More Ethernet adapters may be added for connectivity.
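The per-PHB limits above lend themselves to a simple configuration check. The following Python sketch is illustrative only; the adapter attribute names (port counts and bandwidth categories) are hypothetical placeholders, and the real classification of each adapter comes from the placement tables in the Information Center:

from collections import Counter

# Per-PHB limits quoted in the list above.
PER_PHB_LIMITS = {"gb_ethernet_ports": 3, "high_bandwidth": 3, "extra_high_bandwidth": 1}

def check_phb(adapters: list) -> list:
    # adapters: one dict per adapter installed on a single PHB, for example
    # {"gb_ethernet_ports": 2, "high_bandwidth": 1, "extra_high_bandwidth": 0}
    totals = Counter()
    for adapter in adapters:
        totals.update(adapter)
    return [f"{key}: {totals[key]} exceeds the limit of {limit}"
            for key, limit in PER_PHB_LIMITS.items() if totals[key] > limit]

# Two 2-port Gigabit Ethernet adapters on one PHB would break the 3-port rule:
print(check_phb([{"gb_ethernet_ports": 2}, {"gb_ethernet_ports": 2}]))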
Figure 2-20 on page 53 and Figure 2-21 on page 53 show the possible system configurations including the I/O frames.
Figure 2-20 Minimum to maximum I/O configuration
Figure 2-21 I/O frame configuration example
(Figure content: the CEC A frame holds the BPA, 1-4 processor books, and 1-4 I/O drawers; the powered I/O A frame holds I/O drawers 1-8; the unpowered I/O Z frame holds I/O drawers 9-16 (18). Note from the figure: the I/O A frame BPA powers all hardware in the I/O Z frame.)
2.7.10 Configuration of I/O drawer ID and serial number
In some cases, if the I/O drawer was previously connected to a p650 or p655, you must configure the ID and serial number.
Using the ASMI to set the configuration ID
To perform this operation, the server must be powered on and your authority level must be one of the following:
– Administrator (login as admin)
– Authorized service provider
򐂰 If the AC power is not applied, then apply it now.
򐂰 The drawer may power up automatically.
򐂰 FRU replacement will generate a new temporary unit value in the expansion unit control panel. Use this new value to power down the expansion drawer without removing the power cord, using Powering off an expansion unit. Then return here and continue with the next step.
򐂰 On the ASMI welcome panel, specify your user ID and password, and click Log In.
򐂰 In the navigation area, expand System Configuration and click Configure I/O Enclosures.
򐂰 Select the unit identified by the new value in the panel of the unit you are working on. In most cases it will appear as TMPx.yyy.yyyyyyy, where x is a hex digit and the y's are any value.
򐂰 Select change settings.
򐂰 Enter the Power Control Network Identifier:
– 81 for 5074 and 5079 expansion units
– 89 for 5088 and 0588 expansion units
– 8A for 5094 and 5294 expansion units
– 8B for 5095 and 0595 expansion units
– 88 for 5790 expansion units
򐂰 Enter the type-model from the label on the I/O unit.
򐂰 Enter the serial number (also called sequence number) from the label on the I/O unit.
򐂰 Click Save Setting to complete the operation.
򐂰 Do not use the browser’s back button or the values will not be saved.
Note: More information about the Advanced System Management Interface (ASMI) can be found in 7.3, “Advanced System Management Interface” on page 175.
򐂰 Verify the correct value is now displayed in the panel of the unit you are working on.
򐂰 Disconnect all AC power to the unit, wait for the panel to go off, and then reconnect the AC power.
Note: The drawer will automatically power on. Log off and close the ASMI and return to the procedure that sent you here.
Using the control panel to set the configuration ID
To perform this operation, the server must be powered on.
Control panel function 07 is used to query and set the configuration ID and to
display the frame address of any drawer connected to the SPCN network. Since the drawer’s panel will have the MTMS and not frame address displayed, a function is provided to display the frame address.
򐂰 If the AC power is not applied, then apply it now.
򐂰 The drawer may power up automatically.
򐂰 FRU replacement will generate a new temporary unit value in the expansion unit control panel. Use this new value to power down the expansion drawer without removing the power cord. See Powering off an expansion unit, then return here and continue with the next step.
򐂰 Select function 07 on the control panel and press Enter.
򐂰 Select sub function A6 to display the address of all units. The frame address is displayed on all units for 30 seconds.
򐂰 Note the frame address on the unit that you are working on for use in the next steps.
򐂰 Select sub function A9 to set the ID of a drawer.
򐂰 Use the arrow keys to increment/decrement to the first two digits of the frame address noted above.
򐂰 Press Enter.
򐂰 Use the arrow keys to increment/decrement to the last two digits of the frame address noted above.
򐂰 Press Enter.
򐂰 Use the arrow keys to increment/decrement to a configuration ID for the type of unit you are working on:
– 81 for 5074 and 5079 expansion units
– 89 for 5088 and 0588 expansion units
– 8A for 5094 and 5294 expansion units
– 8B for 5095 and 0595 expansion units
– 88 for 5790 expansion units
򐂰 Press Enter (078x 00 will be displayed).
򐂰 Use the arrow keys to increment/decrement until 07** is shown.
򐂰 Press Enter to return the panel to 07.
򐂰 Disconnect all AC power to the unit, wait for the panel to go off, and then reconnect the AC power.
򐂰 The drawer will automatically power on.
򐂰 Continue with the next step to update the MTMS value using the ASMI. If you do not have access to the ASMI, then return to the procedure that sent you here.
򐂰 On the ASMI Welcome pane, specify your user ID and password, and click Log In.
򐂰 In the navigation area, expand System Configuration and click Configure I/O Enclosures.
򐂰 Select the unit identified by the new value in the panel of the unit you are working on. In most cases it will appear as TMPx.yyy.yyyyyyy, where x is a hex digit and the y's are any value.
򐂰 Select change settings.
򐂰 Enter the type-model from the label on the I/O unit.
򐂰 Enter the serial number (also called sequence number) from the label on the I/O unit.
򐂰 Click Save Setting to complete the operation.
Note: Do not use the browser back button or the values will not be saved. Verify the correct value is now displayed in the panel of the unit you are working on. Log off and close the ASMI. Then return to the procedure that sent you here.
Chapter 3. POWER5 virtualization capabilities
Virtualization is a critical component in the on demand operating environment, and the system technologies implemented in the POWER5 processor-based IBM Sserver p5 servers provide a significant advancement in the implementation of functions required for operating in this environment. IBM virtualization innovations on the p5-590 and p5-595 provide industry-unique utilization capabilities for a more responsive, flexible, and simplified infrastructure.
Advanced POWER Virtualization is a no-charge feature (FC 7992) on the p5-590 and p5-595 systems. On other Sserver p5 systems, it is a priced feature.
The Advanced POWER Virtualization feature provides the foundation of POWER5 virtualization technology. In this chapter we introduce the Virtualization Engine and associated features, including Micro-Partitioning, shared processor pool, Virtual I/O Server (disk and LAN), and the Partition Load Manager for AIX 5L logical partitions.
3.1 Virtualization features
The Virtualization Engine comprises a suite of system services and technologies that form key elements of IBM's on demand computing model. It enables the resources of individual servers, storage, and networking products to function as a single pool, allowing more efficient access to and management of resources across an organization. Virtualization is a critical component in the on demand operating environment, and the system technologies implemented in the POWER5 processor-based IBM Sserver p5 servers provide a significant advancement in the enablement of functions required for operating in this environment.
The following sections explain the virtualization engine system technologies that are integrated into Sserver p5 system hardware and operating systems, including:
Micro-Partitioning Enables you to allocate less than a full physical processor to a logical partition, allowing increased overall resource utilization.
Virtual Ethernet Provides network virtualization capabilities that allow communications between integrated servers.
Virtual I/O Server Provides the ability to dedicate I/O adapters and devices to a virtual server, allowing the on demand allocation and management of I/O devices.
POWER Hypervisor Supports partitioning and dynamic resource movement across multiple operating system environments.
The following reference is recommended for readers looking for more introductory material on IBM virtualization concepts: Advanced POWER Virtualization on IBM Eserver p5 Servers: Introduction and Basic Configuration, SG24-7940.
3.2 Micro-Partitioning
Micro-Partitioning is an advanced virtualization feature of POWER5 systems with AIX 5L Version 5.3 and Linux (SUSE LINUX Enterprise Server 9 for POWER systems and Red Hat Enterprise Linux AS 3 for POWER) that allows multiple partitions to share the processing power of a set of physical processors. A partition can be assigned as little as 1/10th of a physical processor's resource. The POWER Hypervisor controls the dispatching of the physical processors to each of the shared processor partitions. In most cases, a shared processor pool containing multiple physical processors is shared among the partitions. Shared processor partitions (partitions using Micro-Partitioning technology) still need dedicated memory, but the partition's I/O requirements can be supported through
a virtual Ethernet adapter and virtual SCSI server. Virtual Ethernet and virtual SCSI are briefly explained in 3.3, “Virtual Ethernet” on page 65 and 3.6, “Virtual SCSI” on page 81. Micro-Partitioning requires the Advanced POWER Virtualization capabilities.
3.2.1 Shared processor partitions
The virtualization of processors enables the creation of a partitioning model which is fundamentally different from the POWER4 systems where whole processors are assigned to partitions and are owned by them. In the new model, physical processors are abstracted into virtual processors that are then assigned to partitions, but the underlying physical processors are shared by these partitions.
Virtual processor abstraction is implemented in the hardware and firmware. From an operating system perspective, a virtual processor is indistinguishable from a physical processor. The key benefit of implementing partitioning in the hardware is to allow an operating system to run on POWER5 technology with little or no changes. Optionally, for optimal performance, the operating system can be enhanced to exploit the shared processor pool more fully, for instance by voluntarily relinquishing CPU cycles to the hardware when they are not needed. AIX 5L Version 5.3 is the first version of AIX 5L that includes such enhancements.
Micro-Partitioning allows for multiple partitions to share one physical processor. Partitions using Micro-Partitioning technology are referred to as shared processor partitions.
A partition may be defined with a processor capacity as small as 0.10 processing units. This represents 1/10 of a physical processor. Each physical processor can be shared by up to 10 shared processor partitions. The shared processor partitions are dispatched and time-sliced on the physical processors under control of the POWER Hypervisor.
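Purely as an illustration of this granularity (not an IBM tool), the following Python lines compute the upper bound on shared processor partitions implied by the 0.10 processing unit minimum and the system-wide maximum of 254 partitions (see 3.2.4); the results line up with the shared processor partition row of Table 3-1:

# 0.10 processing units minimum per shared processor partition means at most
# 10 shared partitions per physical processor, subject to the 254-partition maximum.
MAX_PARTITIONS = 254
PARTITIONS_PER_PROCESSOR = 10

def max_shared_partitions(physical_processors: int) -> int:
    return min(physical_processors * PARTITIONS_PER_PROCESSOR, MAX_PARTITIONS)

print(max_shared_partitions(8))   # 80  (Model 575)
print(max_shared_partitions(32))  # 254 (p5-590, capped by the partition maximum)
print(max_shared_partitions(64))  # 254 (p5-595, capped by the partition maximum)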
Figure 3-1 on page 60 shows the POWER5 partitioning concept:
Figure 3-1 POWER5 partitioning concept
Micro-Partitioning is supported across the entire POWER5 product line, from the entry-level to the high-end systems. Table 3-1 provides the maximum number of logical partitions and shared processor partitions for the different models.
Table 3-1 Micro-Partitioning overview on p5 systems

Sserver p5 servers              Model 510  Model 520  Model 550  Model 570  Model 575  Model 590  Model 595
Processors                      2          2          4          16         8          32         64
Dedicated processor partitions  2          2          4          16         8          32         64
Shared processor partitions     20         20         40         160        80         254        254

It is important to point out that the maximums stated are supported by the hardware, but the practical limits based on production workload demands may be significantly lower.
The shared processor partitions are created and managed by the HMC. When you start creating a partition you have to choose between a shared processor partition and a dedicated processor partition.
Virtual processors
Virtual processors are the whole number of concurrent operations that the operating system can use. The processing power can be conceptualized as being spread equally across these virtual processors. Selecting the optimal number of virtual processors depends on the workload in the partition. Some partitions benefit from greater concurrency, while other partitions require greater power per virtual processor.
Each virtual processor can represent from 0.1 to 1 processing units, depending on the configuration. By default, the number of virtual processors is automatically set to the minimum number needed to satisfy the assigned number of processing units. The default setting maintains a balance of virtual processors to processing units. For example:
򐂰 If you specify 0.50 processing units, one virtual processor will be assigned.
򐂰 If you specify 2.25 processing units, three virtual processors will be assigned.
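The default rule amounts to rounding the entitlement up to the next whole virtual processor (each virtual processor can represent at most 1.0 processing units). A minimal illustrative sketch in Python:

import math

def default_virtual_processors(processing_units: float) -> int:
    # Smallest whole number of virtual processors that can hold the entitlement.
    return math.ceil(processing_units)

print(default_virtual_processors(0.50))  # 1
print(default_virtual_processors(2.25))  # 3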
You can also use the Advanced tab in your partition's profile to change the default configuration and to assign more virtual processors.
At the time of publication, the maximum number of virtual processors per partition is 64.
Dedicated processors
Dedicated processors are whole processors that are assigned to a single partition. If you choose to assign dedicated processors to a logical partition, you must assign at least one processor to that partition.
You cannot mix shared processors and dedicated processors in one partition.
By default, a powered-off logical partition using dedicated processors will have its processors available to the shared processing pool. When the processors are in the shared processing pool, an uncapped partition that needs more processing power can use the idle processing resources. However, when you power on the dedicated partition while the uncapped partition is using the processors, the activated partition will regain all of its processing resources. If you want to prevent dedicated processors from being used in the shared processing pool, you can disable this function using the logical partition profile properties panels on the Hardware Management Console.
Note: You cannot disable the allow idle processor to be shared function when you create a partition. You need to open the properties for the created partition and change it on the processor tab.
3.2.2 Types of shared processor partitions
Shared processor partitions can be of two different types, depending on their ability to use idle processing resources available on the system. If a processor donates unused cycles back to the shared processor pool, or if the system has idle capacity (because there is not enough workload running), the extra cycles may be used by some partitions, depending on their type and configuration.
Capped partitions
A capped partition is defined with a hard maximum limit of processing capacity. That means that it cannot go over its defined maximum capacity in any situation, unless you change the configuration for that partition (either by modifying the partition profile or by executing a dynamic LPAR operation). Even if the system is idle, the capped partition may reach a processor utilization of 100%.
Figure 3-2 shows an example where a shared processor partition is capped at an entitlement of 9.5 (up to the equivalent of 9.5 physical processors). At some points the processor usage reaches 100 percent of this entitlement, and although the machine has extra capacity that is not being used, by design the capped partition cannot use it.
Figure 3-2 Capped shared processor partitions
Uncapped partitions
An uncapped partition has the same definition as a capped partition, except that the maximum processing capacity limit is a soft limit. That means that an uncapped partition may eventually receive more processor cycles than its entitled capacity.
If it is using 100 percent of the entitled capacity and there are idle processors in the shared processor pool, the POWER Hypervisor has the ability to dispatch virtual processors from the uncapped partitions to use the extra capacity.
In the example we used for the capped partition, if we change the partition from capped to uncapped, a possible chart for the capacity utilization is the one shown in Figure 3-3. It still has the equivalent of 9.5 physical processors as its entitlement, but it can use more resources if required and available.
Figure 3-3 Uncapped shared processor partitions
The number of virtual processors on an uncapped partition defines the largest capacity it can use from the shared processor pool. By making the number of virtual processors too small, you may limit the processing capacity of an uncapped partition. A logical partition in the shared processing pool will have at least as many virtual processors as its assigned processing capacity. If you have a partition with 0.50 processing units and 1 virtual processor, the partition cannot exceed 1.00 processing units because it can only run one job at a time, which cannot exceed 1.00 processing units. However, if the same partition with 0.50
processing units was assigned two virtual processors and processing resources were available, the partition could use an additional 1.50 processing units.
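The example reduces to a single rule: a capped partition is bounded by its entitled capacity, while an uncapped partition is bounded by its number of virtual processors (each worth at most 1.0 processing units). An illustrative sketch, not an exact model of the POWER Hypervisor:

def max_usable_capacity(entitled_units: float, virtual_processors: int, capped: bool) -> float:
    # Upper bound on the processing units the partition can consume.
    return entitled_units if capped else float(virtual_processors)

print(max_usable_capacity(0.50, 1, capped=False))   # 1.0
print(max_usable_capacity(0.50, 2, capped=False))   # 2.0
print(max_usable_capacity(9.50, 12, capped=True))   # 9.5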
3.2.3 Typical usage of Micro-Partitioning technology
With fractional processor allocations, more partitions can be created on a given platform, which enables clients to maximize the number of workloads that can be supported simultaneously. Micro-Partitioning enables optimized use of processing capacity while preserving the isolation between applications provided by different operating system images.
There are several scenarios where the usage of Micro-Partitioning can bring advantages such as optimal resource utilization, rapid deployment of new servers and application isolation:
򐂰 Server consolidation
Consolidating several small systems onto a large and robust server brings advantages in management and performance, usually together with reduced total cost of ownership (TCO). A Micro-Partitioning system enables consolidation from small to large systems without the burden of dedicating very powerful processors to a small partition. You can divide a processor among several partitions, giving each one adequate processing capacity.
򐂰 Server provisioning
With Micro-Partitioning and Virtual I/O Server, a new partition can be deployed rapidly, to accommodate unplanned demands, or to be used as a test environment.
򐂰 Virtual server farms
In environments where applications scale with the addition of new servers, the ability to create several partitions sharing processing resources is very useful and contributes to better use of processing resources by the applications deployed on the server farm.
3.2.4 Limitations and considerations
The following limitations should be considered when implementing shared processor partitions:
򐂰 The minimum capacity of a shared processor partition is 0.1 processing units of a physical processor, so the number of shared processor partitions you can create for a system depends mostly on the number of processors in the system.
򐂰 The maximum number of partitions is 254.
򐂰 In a partition, there is a maximum of 64 virtual processors.
򐂰 A mix of dedicated and shared processors within the same partition is not
supported.
򐂰 If you dynamically remove a virtual processor you cannot specify a particular
virtual CPU to be removed. The operating system will choose the virtual CPU to be removed.
򐂰 Shared processors may render AIX 5L affinity management useless. AIX will
continue to utilize affinity domain information as provided by firmware to build associations of virtual processors to memory, and will continue to show preference to redispatching a thread to the virtual CPU that it last ran on.
Operating systems and applications running in shared partitions need not be aware that they are sharing processors. However, overall system performance can be significantly improved by minor operating system changes.
In a shared partition, there is not a fixed relationship between the virtual processor and the physical processor. The POWER Hypervisor will try to use a physical processor with the same memory affinity as the virtual processor, but it is not guaranteed. Virtual processors have the concept of a home physical processor. If it can't find a physical processor with the same memory affinity, then it gradually broadens its search to include processors with weaker memory affinity, until it finds one that it can use. As a consequence, memory affinity is expected to be weaker in shared processor partitions.
Workload variability is also expected to be increased in shared partitions, because there are latencies associated with the scheduling of virtual processors and interrupts. Simultaneous multi-threading may also increase variability, since it adds another level of resource sharing, which could lead to a situation where one thread interferes with the forward progress of its sibling.
If an application is cache sensitive or cannot tolerate variability, then the dedicated partition with simultaneous multi-threading disabled is recommended. In dedicated partitions, the entire processor is assigned to a partition. Processors are not shared with other partitions, and they are not scheduled by the POWER Hypervisor. Dedicated partitions must be explicitly created by the system administrator using the HMC.
3.3 Virtual Ethernet
Virtual Ethernet enables inter-partition communication without the need for physical network adapters assigned to each partition. Virtual Ethernet allows the administrator to define in-memory point-to-point connections between partitions. These connections exhibit similar characteristics as physical high-bandwidth Ethernet connections and supports multiple protocols (IPv4, IPv6, ICMP). Virtual
Ethernet requires an IBM Sserver p5 590 and 595 system with either AIX 5L Version 5.3 or the appropriate level of Linux and a Hardware Management Console to define the virtual Ethernet devices. Virtual Ethernet does not require the Advanced POWER Virtualization feature because it is a function of the POWER Hypervisor.
Because the number of partitions possible on many systems is greater than the number of I/O slots, virtual Ethernet is a convenient and cost-saving option that enables partitions within a single system to communicate with one another through a VLAN. The VLAN creates logical Ethernet connections between one or more partitions and is designed to help prevent a failed or malfunctioning operating system from impacting the communication between two functioning operating systems. The virtual Ethernet connections may also be bridged to an external network to permit partitions without physical network adapters to communicate outside of the server.
The concepts of implementing virtual Ethernet are categorized in the following sections:
򐂰 Section 3.3.1, “Virtual LAN” on page 66
򐂰 Section 3.3.2, “Virtual Ethernet connections” on page 70
򐂰 Section 3.3.3, “Dynamic partitioning for virtual Ethernet devices” on page 72
򐂰 Section 3.3.4, “Limitations and considerations” on page 72
3.3.1 Virtual LAN
This section will discuss the concepts of Virtual LAN (VLAN) technology with specific reference to its implementation within AIX.
Virtual LAN overview
Virtual LAN (VLAN) is a technology used for establishing virtual network segments on top of physical switch devices. If configured appropriately, a VLAN definition can straddle multiple switches. Typically, a VLAN is a broadcast domain that enables all nodes in the VLAN to communicate with each other without any L3 routing or inter-VLAN bridging. In Figure 3-4, two VLANs (VLAN 1 and 2) are defined on three switches (Switch A, B, and C). Although nodes C-1 and C-2 are physically connected to the same switch C, traffic between two nodes can be blocked. To enable communication between VLAN 1 and 2, L3 routing or inter-VLAN bridging should be established between them; this is typically provided by an L3 device.
Figure 3-4 Example of a VLAN
The use of VLAN provides increased LAN security and flexible network deployment over traditional network devices.
AIX 5L Version 5.3 virtual LAN support
Some of the various technologies for implementing VLANs include:
򐂰 Port-based VLAN
򐂰 Layer 2 VLAN
򐂰 Policy-based VLAN
򐂰 IEEE 802.1Q VLAN
VLAN support in AIX 5L Version 5.3 is based on the IEEE 802.1Q VLAN implementation. The IEEE 802.1Q VLAN is achieved by adding a VLAN ID tag to
an Ethernet frame, and the Ethernet switches restricting the frames to ports that are authorized to receive frames with that VLAN ID. Switches also restrict broadcasts to the logical network by ensuring that a broadcast packet is delivered to all ports which are configured to receive frames with the VLAN ID that the broadcast frame was tagged with.
A port on a VLAN-capable switch has a default PVID (Port virtual LAN ID) that indicates the default VLAN the port belongs to. The switch adds the PVID tag to untagged packets that are received by that port. In addition to the PVID, a port may belong to additional VLANs and have those VLAN IDs assigned to it.
A port will only accept untagged packets or packets with a VLAN ID (PVID or additional VIDs) tag of the VLANs the port belongs to. A port configured in the untagged mode is only allowed to have a PVID and will receive untagged packets or packets tagged with the PVID. The untagged port feature helps systems that do not understand VLAN tagging communicate with other systems using standard Ethernet.
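As an illustration of these acceptance rules (a simplification; real 802.1Q switches do considerably more), consider the following Python sketch:

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SwitchPort:
    pvid: int                                            # default VLAN of the port
    additional_vids: set = field(default_factory=set)    # extra VLANs (empty in untagged mode)

def accepts(port: SwitchPort, frame_vlan: Optional[int]) -> bool:
    # frame_vlan is None for an untagged frame; untagged frames are accepted
    # and would be tagged with the PVID on ingress.
    if frame_vlan is None:
        return True
    return frame_vlan == port.pvid or frame_vlan in port.additional_vids

port = SwitchPort(pvid=1, additional_vids={10})
print(accepts(port, None))  # True  (untagged, treated as VLAN 1)
print(accepts(port, 10))    # True  (additional VID)
print(accepts(port, 2))     # False (port does not belong to VLAN 2)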
Each VLAN ID is associated with a separate Ethernet interface to the upper layers (for example IP) and creates unique logical Ethernet adapter instances per VLAN (for example ent1 or ent2).
You can configure multiple VLAN logical devices on a single system. Each VLAN logical device constitutes an additional Ethernet adapter instance. These logical devices can be used to configure the same Ethernet IP interfaces as are used with physical Ethernet adapters.
VLAN communication by example
This section discusses how VLAN communication between partitions and with external networks works in more detail using the sample configuration in Figure 3-5. The configuration is using four client partitions (Partition 1 - Partition
4) and one Virtual I/O Server. Each of the client partitions is defined with one virtual Ethernet adapter. The Virtual I/O Server has a Shared Ethernet Adapter which bridges traffic to the external network. The Shared Ethernet Adapter will be introduced in more detail in 3.4, “Shared Ethernet Adapter” on page 73.
Figure 3-5 VLAN configuration
Interpartition communication
Partition 2 and Partition 4 are using the PVID (Port virtual LAN ID) only. This means that:
򐂰 Only packets for the VLAN specified as the PVID are received.
򐂰 Packets that are sent have a VLAN tag added for the VLAN specified as the PVID by the virtual Ethernet adapter.
In addition to the PVID, the virtual Ethernet adapters in Partition 1 and Partition 3 are also configured for VLAN 10, using a specific network interface (en1) created through smitty vlan. This means that:
򐂰 Packets sent through network interface en1 have a tag added for VLAN 10 by the network interface in AIX 5L.
򐂰 Only packets for VLAN 10 are received by the network interface en1.
򐂰 Packets sent through en0 are automatically tagged for the VLAN specified as the PVID.
򐂰 Only packets for the VLAN specified as the PVID are received by the network interface en0.
Table 3-2 lists which client partitions can communicate with each other, and through which network interfaces.
Table 3-2 Interpartition VLAN communication

VLAN    Partition / Network interface
1       Partition 1 / en0
        Partition 2 / en0
2       Partition 3 / en0
        Partition 4 / en0
10      Partition 1 / en1
        Partition 3 / en1
Communication with external networks
The Shared Ethernet Adapter is configured with PVID 1 and VLAN 10. This means that untagged packets that are received by the Shared Ethernet Adapter are tagged for VLAN 1. Handling of outgoing traffic depends on the VLAN tag of the outgoing packets.
򐂰 Packets tagged with the VLAN which matches the PVID of the Shared
Ethernet Adapter are untagged before being sent out to the external network
򐂰 Packets tagged with a VLAN other than the PVID of the Shared Ethernet
Adapter are sent out with the VLAN tag unmodified.
In our example, Partition 1 and Partition 2 have access to the external network through network interface en0 using VLAN 1. As these packets use the PVID, the Shared Ethernet Adapter removes the VLAN tags before sending the packets to the external network.
Partition 1 and Partition 3 have access to the external network using network interface en1 and VLAN 10. These packets are sent out by the Shared Ethernet Adapter with the VLAN tag; therefore, only VLAN-capable destination devices will be able to receive the packets. Table 3-3 lists this relationship.
Table 3-3 VLAN communication to external network

VLAN    Partition / Network interface
1       Partition 1 / en0
        Partition 2 / en0
10      Partition 1 / en1
        Partition 3 / en1
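The outbound handling just described reduces to one decision per packet: strip the VLAN tag when it matches the Shared Ethernet Adapter's PVID, otherwise leave the tag unchanged. An illustrative Python sketch using the PVID from this example:

from typing import Optional

SEA_PVID = 1  # PVID of the Shared Ethernet Adapter in the example above

def outbound_vlan_tag(packet_vlan: int, sea_pvid: int = SEA_PVID) -> Optional[int]:
    # Tag carried by the frame on the external network; None means untagged.
    return None if packet_vlan == sea_pvid else packet_vlan

print(outbound_vlan_tag(1))   # None -> leaves the server untagged (en0 traffic)
print(outbound_vlan_tag(10))  # 10   -> stays tagged; needs a VLAN-capable destination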
3.3.2 Virtual Ethernet connections
Virtual Ethernet connections supported in POWER5 systems use VLAN technology to ensure that the partitions can only access data directed to them.
The POWER Hypervisor provides a virtual Ethernet switch function based on the IEEE 802.1Q VLAN standard that allows partition communication within the same server. The connections are based on an implementation internal to the Hypervisor that moves data between partitions. This section will describe the various elements of a virtual Ethernet and implications relevant to different types of workloads. Figure 3-6 is an example of an inter-partition VLAN.
Figure 3-6 Logical view of an inter-partition VLAN
Virtual Ethernet adapter concepts
Partitions that communicate through a virtual Ethernet channel will need to have an additional in-memory channel. This requires the creation of an in-memory channel between partitions on the HMC. The kernel creates a virtual device for each memory channel indicated by the firmware. The AIX 5L configuration manager creates the device special files. A unique Media Access Control (MAC) address is also generated when the virtual Ethernet device is created. A prefix value can be assigned for the system so that the generated MAC addresses in a system consist of a common system prefix plus an algorithmically generated unique part per adapter.
The virtual Ethernet can also be used as a bootable device to allow such tasks as operating system installations to be performed using Network Installation Management (NIM).
Performance considerations
The transmission speed of virtual Ethernet adapters is in the range of 1-3 Gigabits per second, depending on the transmission (MTU) size. A partition can support up to 256 virtual Ethernet adapters, with each virtual Ethernet adapter capable of being associated with up to 21 VLANs (20 VIDs and 1 PVID).
The virtual Ethernet connections generally take up more processor time than a local adapter to move a packet (DMA versus copy). For shared processor partitions, performance will be gated by the partition definitions (for example, entitled capacity and number of processors). Small partitions communicating with each other will experience more packet latency due to partition context switching. In general, high bandwidth applications should not be deployed in small shared processor partitions. For dedicated partitions, throughput should be comparable to 1 Gigabit Ethernet for small packets and much better than 1 Gigabit Ethernet for large packets. For large packets, the virtual Ethernet communication is copy bandwidth limited.
For more detailed information relating to virtual Ethernet performance considerations refer to the following publication:
򐂰 Advanced POWER Virtualization on IBM Sserver p5 Servers Architecture
and Performance Considerations, SG24-5768.
3.3.3 Dynamic partitioning for virtual Ethernet devices
Virtual Ethernet resources can be assigned and removed dynamically. On the HMC, virtual Ethernet target and server adapters can be assigned and removed from a partition using dynamic logical partitioning. The mapping between physical and virtual resources on the Virtual I/O Server can also be done dynamically.
3.3.4 Limitations and considerations
The following are limitations that must be considered when implementing a virtual Ethernet:
򐂰 A maximum of 256 virtual Ethernet adapters are permitted per partition.
򐂰 Virtual Ethernet can be used in both shared and dedicated processor partitions, provided the partition is running AIX 5L Version 5.3 or Linux with a kernel that supports virtualization.
򐂰 A mixture of virtual Ethernet connections, real network adapters, or both is permitted within a partition.
򐂰 Virtual Ethernet can only connect partitions within a single system.
򐂰 Virtual Ethernet uses the system processors for all communication functions instead of offloading that load to processors on network adapter cards. As a result, there is an increase in system processor load generated by the use of virtual Ethernet.
3.4 Shared Ethernet Adapter
A Shared Ethernet Adapter can be used to connect a physical Ethernet to the virtual Ethernet. It also provides the possibility for several client partitions to share one physical adapter.
The following sections discuss the various aspects of Shared Ethernet Adapters such as:
򐂰 Section 3.4.1, “Connecting a virtual Ethernet to external networks” on page 73
򐂰 Section 3.4.2, “Using Link Aggregation (EtherChannel) to external networks” on page 77
򐂰 Section 3.4.3, “Limitations and considerations” on page 79
3.4.1 Connecting a virtual Ethernet to external networks
There are two ways you can connect the virtual Ethernet that enables the communication between logical partitions on the same server to an external network.
Routing
By enabling the AIX 5L Version 5.3 routing capabilities (the ipforwarding network option), one partition with a physical Ethernet adapter connected to an external network can act as a router. Figure 3-7 shows a sample configuration. In this type of configuration, the partition that routes the traffic to the external network does not necessarily have to be the Virtual I/O Server, as in the example below. It could be any partition with a connection to the outside world. The client partitions would have their default route set to the partition that routes traffic to the external network.
Figure 3-7 Connection to external network using AIX routing
Shared Ethernet Adapter
Using a Shared Ethernet Adapter (SEA) you can connect internal and external VLANs using one physical adapter. The Shared Ethernet Adapter hosted in the Virtual I/O Server acts as a layer 2 switch between the internal and external network.
Shared Ethernet Adapter is a new service that acts as a layer 2 network bridge to securely transport network traffic from a virtual Ethernet to a real network adapter. The Shared Ethernet Adapter service runs in the Virtual I/O Server. It cannot be run in a general purpose AIX 5L partition.
Shared Ethernet Adapter requires the POWER Hypervisor and Advanced POWER Virtualization features of POWER5 systems and therefore cannot be used on POWER4 systems. It also cannot be used with AIX 5L Version 5.2 because the device drivers for virtual Ethernet are only available for AIX 5L Version 5.3 and Linux. Thus there is no way to connect an AIX 5L Version 5.2 system to a Shared Ethernet Adapter.
The Shared Ethernet Adapter allows partitions to communicate outside the system without having to dedicate a physical I/O slot and a physical network adapter to a client partition. The Shared Ethernet Adapter has the following characteristics:
򐂰 Virtual Ethernet MAC addresses are visible to outside systems
򐂰 Broadcast/multicast is supported
򐂰 ARP and NDP can work across a shared Ethernet
To bridge network traffic between the virtual Ethernet and external networks, the Virtual I/O Server has to be configured with at least one physical Ethernet adapter. One Shared Ethernet Adapter can be shared by multiple VLANs, and multiple subnets can connect using a single adapter on the Virtual I/O Server. Figure 3-8 shows a configuration example. A Shared Ethernet Adapter can include up to 16 virtual Ethernet adapters that share the physical access.
In Figure 3-8 and Figure 3-9 on page 76, the acronym RIOA stands for real I/O adapter and VIOA for virtual I/O adapter.
Figure 3-8 Shared Ethernet Adapter configuration
In the LPAR profile for the Virtual I/O Server partition, the virtual Ethernet adapter that will be associated with the (physical) Shared Ethernet Adapter must have the trunk flag set. Once an Ethernet frame is sent from a virtual Ethernet adapter on a client partition to the POWER Hypervisor, the POWER Hypervisor searches for the destination MAC address within the VLAN. If no such MAC address exists within the VLAN, it forwards the frame to the trunk virtual Ethernet adapter that is defined on the same VLAN. The trunk virtual Ethernet adapter enables a layer 2 bridge to a physical adapter.
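On the Virtual I/O Server command line, a Shared Ethernet Adapter such as the one in Figure 3-8 is created with the mkvdev command. The device names below are assumptions for illustration only: ent0 is taken to be the physical adapter and ent2 the trunk virtual Ethernet adapter with PVID 1:

   $ lsdev -virtual       # identify the trunk virtual Ethernet adapter
   $ mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 1
   # mkvdev reports the name of the new Shared Ethernet Adapter device, for example ent3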
The Shared Ethernet Adapter directs packets based on their VLAN ID tags. It learns this information by observing the packets originating from the virtual adapters. One of the virtual adapters in the Shared Ethernet Adapter is designated as the default PVID adapter. Ethernet frames without any VLAN ID tag are directed to this adapter and assigned the default PVID.
When the Shared Ethernet Adapter receives IP (or IPv6) packets that are larger than the MTU of the adapter through which the packet is forwarded, either IP fragmentation is performed and the fragments are forwarded, or an ICMP packet too big message is returned to the source when the packet cannot be fragmented.
Theoretically, one adapter can act as the only contact with external networks for all client partitions. Depending on the number of client partitions and the network load they produce, performance can become a critical issue. Because the Shared Ethernet Adapter is dependent on virtual I/O, it consumes processor time for all communications. A significant amount of CPU load can be generated by the use of virtual Ethernet and the Shared Ethernet Adapter.
There are several ways to configure physical and virtual Ethernet adapters into Shared Ethernet Adapters to maximize throughput:
򐂰 Using Link Aggregation (EtherChannel), several physical network adapters can be aggregated. See 3.4.2, “Using Link Aggregation (EtherChannel) to external networks” on page 77 for more details.
򐂰 Using several Shared Ethernet Adapters provides more queues and better performance. An example of this configuration is shown in Figure 3-9.
Other aspects that must be taken into consideration are availability and the ability to connect to different networks.
Figure 3-9 Multiple Shared Ethernet Adapter configuration
3.4.2 Using Link Aggregation (EtherChannel) to external networks
Link Aggregation is a network port aggregation technology that allows several Ethernet adapters to be aggregated together to form a single pseudo Ethernet device. This technology can be used to overcome the bandwidth limitation of a single network adapter and to avoid bottlenecks when sharing one network adapter among many client partitions.
For example, ent0 and ent1 can be aggregated into ent3. Interface en3 would then be configured with an IP address. The system considers these aggregated adapters as one adapter; therefore, IP is configured on them as on any other Ethernet adapter. In addition, all adapters in the link aggregation are given the same hardware (MAC) address, so they are treated by remote systems as though they were one adapter. The main benefit of link aggregation is that the aggregated adapters combine the network bandwidth of all member adapters into a single network presence. If an adapter fails, packets are automatically sent on the next available adapter without disruption to existing user connections. The adapter is automatically returned to service on the link aggregation when it recovers.
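On a Virtual I/O Server, an aggregation like the ent0 and ent1 example above can be created with the mkvdev command; on an AIX 5L partition, the equivalent configuration can be done through SMIT (smitty etherchannel). The device and host names below are placeholders, and the name of the resulting pseudo device depends on what is free on the system:

   $ mkvdev -lnagg ent0,ent1
   # Assuming the new pseudo device is ent3, assign an IP address to interface en3
   $ mktcpip -hostname vios1 -inetaddr 10.1.1.2 -interface en3 -netmask 255.255.255.0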
You can use EtherChannel or IEEE 802.3ad Link Aggregation to aggregate network adapters. While EtherChannel is an AIX 5L Version 5.3 specific implementation of adapter aggregation, Link Aggregation follows the IEEE 802.3ad standard. Table 3-4 shows the main differences between EtherChannel and Link Aggregation.
Table 3-4 EtherChannel and Link Aggregation

EtherChannel                                IEEE 802.3ad Link Aggregation
Requires switch configuration               Little, if any, configuration of the
                                            switch is required to form the
                                            aggregation. Some initial setup of
                                            the switch may be required.
Supports different packet distribution      Supports only the standard
modes                                       distribution mode

The main benefit of using Link Aggregation is that, if the switch supports the Link Aggregation Control Protocol (LACP), no special configuration of the switch ports is required. The benefit of EtherChannel is the support of different packet distribution modes, which means it is possible to influence the load balancing of the aggregated adapters. In the remainder of this document, we use the term Link Aggregation where possible, since it is the more universally understood term.
Note: Only outgoing packets are subject to the following discussion; incoming packets are distributed by the Ethernet switch.
Standard distribution mode selects the adapter for outgoing packets by an algorithm. The adapter selection algorithm uses the last byte of the destination IP address (for TCP/IP traffic) or of the destination MAC address (for ARP and other non-IP traffic). Therefore, all packets to a specific IP address will always go through the same adapter. Other adapter selection algorithms, based on the source port, the destination port, or a combination of source and destination ports, are also available. EtherChannel provides one further distribution mode called round robin. This mode rotates through the adapters, giving each adapter one packet before repeating. The packets may be sent out in a slightly different order than they were given to the EtherChannel. Round robin makes the best use of the available bandwidth, but consider that it also introduces the potential for out-of-order packets at the receiving system. This risk is particularly high when there are few, long-lived, streaming TCP connections. When there are many such connections between a host pair, packets from different connections can be intermingled, thereby decreasing the chance of packets for the same connection arriving out of order.
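On an AIX 5L partition, the distribution mode of an existing EtherChannel pseudo device can be changed with the chdev command. This is only a sketch, assuming the EtherChannel device is ent3 and that its interface has been detached first; the attribute values shown should be verified for your AIX level:

   # Switch the (assumed) EtherChannel device ent3 to round robin distribution
   chdev -l ent3 -a mode=round_robin
   # Alternatively, keep standard mode but hash on source and destination ports
   chdev -l ent3 -a hash_mode=src_dst_port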
To avoid the loss of network connectivity caused by a switch failure, EtherChannel and Link Aggregation can provide a backup adapter. The backup adapter should be connected to a different switch than the adapters of the aggregation. In the case of a switch failure, the traffic can then be moved to the backup adapter with no disruption to user connections.
Figure 3-10 shows the aggregation of three plus one adapters into a single pseudo Ethernet device, including the backup feature.
Figure 3-10 Link Aggregation (EtherChannel) pseudo device
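The three plus one configuration of Figure 3-10 could be created on a Virtual I/O Server with a command similar to the following sketch, where ent0, ent1, and ent2 are the (assumed) aggregated adapters and ent3 is the (assumed) backup adapter connected to a second switch:

   $ mkvdev -lnagg ent0,ent1,ent2 -attr backup_adapter=ent3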