BladeSymphony 1000 Architecture White Paper, www.hitachi.com
Chapter 1
Introduction
Executive Summary
Blade servers pack more compute power into a smaller space than traditional rack-mounted servers.
This capability makes them an attractive alternative for consolidating servers, balancing or optimizing
data center workloads, or simply running a wide range of applications at the edge or the Web tier.
However, concerns about the reliability, scalability, power consumption, and versatility of conventional
blade servers keep IT managers from adopting them in the enterprise data center. Many IT
professionals believe that blade servers are not intended for mission-critical applications or compute-intensive workloads.
Leveraging their vast experience in mainframe systems, Hitachi set out to design a blade system that
overcomes these perceptions. The result is BladeSymphony® 1000, the first true enterprise-class
blade server. The system combines Virtage embedded virtualization technology, a choice of industry-standard Intel® processor-based blade servers, integrated management capabilities, and powerful,
reliable, scalable system resources — enabling companies to consolidate infrastructure, optimize
workloads, and run mission-critical applications in a reliable, scalable environment.
For organizations interested in reducing the cost, risk, and complexity of IT infrastructure — whether at
the edge of the network, the application tier, the database tier — or all three — BladeSymphony 1000 is
a system that CIOs can rely on.
Introducing BladeSymphony 1000
BladeSymphony 1000 provides enterprise-class service levels and unprecedented configuration
flexibility using open, industry-standard technologies. BladeSymphony 1000 overcomes the constraints
of previous-generation blade systems to deliver new capabilities and opportunities in the data center.
Blade systems were originally conceived as a means of increasing compute density and saving space
in overcrowded data centers. They were intended primarily as a consolidation platform. A single blade
enclosure could provide power, cooling, networking, various interconnects and management, and
individual blades could be added as needed to run applications and balance workloads. Typically blade
servers have been deployed at the edge or the Web tier and used for file-and-print or other non-critical
applications.
However, blade servers are not yet doing all they are capable of in the enterprise data center. The
perception persists that they are not ready for enterprise-class workloads. Many people doubt that
blade servers can deliver the levels of reliability, scalability, and performance needed to meet the most
stringent workloads and service-level agreements, or that they are open and adaptable enough to keep
pace with fast-changing business requirements.
1. This section and other sections of this chapter draw on content from “2010 Winning IT Management Strategy,” by Nikkei
Solutions Business, published by Nikkei BP, August 2006.
BladeSymphony 1000 (Figure 1) is the first blade system designed specifically for enterprise-class,
mission-critical workloads. It is a 10 rack unit (RU) system that combines Hitachi’s Virtage embedded
virtualization technology, a choice of Intel Dual-Socket, Multi-Core Xeon and/or Intel Dual-Core Itanium
Server Blades (running Windows or Linux), centralized management capabilities, high-performance
I/O, and sophisticated reliability, availability, and serviceability (RAS) features.
Figure 1. BladeSymphony 1000 front view
Enterprise-Class Capabilities
With BladeSymphony 1000, it is now possible for organizations to run mission-critical applications and
consolidate systems and workloads with confidence — at the edge, the application tier, the database
tier, or all three. BladeSymphony 1000 allows companies to run any type of workload with enterprise-class performance, reliability, manageability, scalability, and flexibility. For example:
• BladeSymphony 1000 can be deployed at the edge tier — similar to dual-socket blade and rack
server offerings from Dell, HP, IBM, and others — but with far greater reliability and scalability than
competitive systems.
• BladeSymphony 1000 can be deployed at the application tier — similar to quad-socket blade server
offerings from HP and IBM, but with greater reliability and scalability.
• BladeSymphony 1000 is ideal for the database tier — similar to the IBM p-Series or HP rack-mount
servers, but with a mainframe-class virtualization solution.
Designed to be the first true enterprise-class blade server, the BladeSymphony 1000 provides
outstanding levels of performance, scalability, reliability, and configuration flexibility.
• Performance — BladeSymphony 1000 supports both Intel Dual-Core Itanium and Dual-Core or
Quad-Core Xeon processors in the same chassis. Utilizing Intel Itanium processors, it delivers 64-bit
processing and large memory capacity (up to 256 GB) in an SMP configuration, as well as single
Intel Xeon blade configurations, allowing organizations to optimize for 64-bit or 32-bit workloads and
run all applications at extremely high performance. BladeSymphony 1000 also delivers large I/O
capacity for high throughput.
• Scalability — BladeSymphony 1000 is capable of scaling out to eight Intel Dual-Core Itanium
processor-based server blades in the same chassis, or scaling up to two 16 core SMP servers with
Intel Dual-Core Itanium processor-based server blades.
• Configuration Flexibility — Support for Itanium and Xeon processor-based server blades,
Windows and/or Linux, and industry-standard, best-of-class PCI cards
(PCI-X and PCI Express), providing flexibility and investment protection. The system is extremely
expandable in terms of processor cores, I/O slots, memory, and other components.
Data Center Applications
With its enterprise-class features, BladeSymphony 1000 is an ideal platform for a wide range of data
center scenarios, including:
• Consolidation — BladeSymphony 1000 is an excellent platform for server and application
consolidation because it is capable of running 32-bit and 64-bit applications on Windows or Linux,
with enterprise-class performance, reliability, and scalability.
• Workload Optimization — BladeSymphony 1000 runs a wide range of compute-intensive
workloads on Windows, Linux, or both, making it possible to balance the overall data center
workload quickly and without disruption or downtime.
• Resource Optimization — BladeSymphony 1000 enables the IT organization to increase
utilization rates for expensive resources such as processing power, making it possible to fine-tune
capacity planning and delay unnecessary hardware purchases.
• Reduce Cost, Risk, and Complexity — With BladeSymphony 1000, acquisition costs are
lower than for traditional rack-mount servers. Enterprises can scale up on demand in fine-grained
increments, limiting capital expenditures. BladeSymphony 1000 also reduces the risk of downtime
with built-in sophisticated RAS features. And with support for industry standards such as Windows
and Linux, Itanium and Xeon processors, and PCI-X and PCI Express (PCIe) I/O modules,
BladeSymphony 1000 is designed for the future and protects previous investments in technology.
BladeSymphony 1000 features a very modular design to maximize flexibility and reliability. System
elements are redundant and hot-swappable so the system can be easily expanded without downtime
or unnecessary disruption to service levels. The key components of the system, illustrated in Figure 2,
consist of:
• Server Blades — Up to eight depending on module, available with Intel Xeon or Itanium processors
• Storage Modules — up to two modules supporting either three or six SCSI drives
• I/O Modules — available with PCI-X slots, PCIe slots, or Embedded Fibre Channel Switch, up to two
modules per chassis
• Small footprint chassis containing a passive backplane — eliminates a number of FC and network
cables
• Redundant Power Modules — up to four hot-swap (2+1 or 2+2) modules per chassis for high
reliability and availability
• Redundant Cooling Fan Modules — four hot-swap (3+1) per chassis standard configuration for high
reliability and availability
• Switch & Management Modules — hot-pluggable system management board, up to two modules
per system for high reliability and availability
Figure 2. Key BladeSymphony 1000 components (callouts: cooling fan module, I/O module, backplane, server blade, power module, storage module, Switch & Management Module)
The server blades and I/O modules are joined together through a high speed backplane. Two types of
server blades are available: Intel Xeon Server Blade and Intel Itanium Server Blade. A 10 RU
BladeSymphony 1000 server chassis can accommodate eight server blades of these types. It can also
accommodate a mixture of server blades, as well as storage modules. In addition, multiple Intel
Itanium Server Blades can be combined to build multiple Symmetric Multi Processor (SMP)
configurations. Figure 3 shows a logical diagram of modules interconnecting on the backplane for a
possible configuration with one SMP server and one Intel Xeon server, as well as various options for
hard drive and I/O modules.
Figure 3. Logical components of the BladeSymphony 1000
The following chapters detail the major components of BladeSymphony 1000, as well as management
software and Virtage embedded virtualization technology.
• “Intel Itanium Server Blade” provides details on the Intel Itanium Server Blades and how
they can be combined to create SMP systems of up to 16 cores and 256 GB of memory.
• “Intel Xeon Server Blade” provides details on the Intel Xeon Server Blades.
• “I/O Sub System” provides details on the PCI-X, PCIe, and Embedded Fibre Channel
Switch modules.
• “Chassis, Power, and Cooling” provides details on the two chassis models, as well as
Power and Cooling Fan Modules.
• “Reliability and Serviceability Features” discusses the various reliability, availability, and
serviceability features of the BladeSymphony 1000.
• “Management Software” discusses software management features.
• “Virtage” provides technical details on Virtage embedded virtualization technology.
The BladeSymphony 1000 can support up to eight blades for a total of up to 16 Itanium CPU sockets,
or 32 cores, running Microsoft Windows or Linux. Up to four Intel Itanium Server Blades can be
connected via the high-speed backplane to form a high-performance SMP server of up to 16 cores.
Each Intel Itanium Server Blade, illustrated in Figure 4, includes 16 DDR2 main memory slots. Using
4 GB DIMMs, this equates to 64 GB per server blade (16 GB per core) or 256 GB in a 16 core SMP
configuration, making it an ideal candidate for large in-memory databases and very large data sets.
Each server blade also includes two gigabit Ethernet ports, which connect to the internal gigabit
Ethernet switch in the chassis, as well as two front-side accessible USB 1.1 ports for local media
connectivity and one RS-232 port for debugging purposes.
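The capacity figures above follow from simple multiplication. As a minimal sketch (the constant and function names are illustrative and not part of any Hitachi tool), the following Python reproduces them for a fully populated blade and for a four-blade SMP server:

```python
# Values taken from the text; names are hypothetical and for illustration only.
DIMM_SLOTS_PER_BLADE = 16
CORES_PER_BLADE = 4        # two dual-core Itanium sockets per blade
MAX_BLADES_PER_SMP = 4


def memory_summary(dimm_gb: float, blades: int = 1) -> dict:
    """Memory totals in GB for `blades` fully populated blades joined as one server."""
    if not 1 <= blades <= MAX_BLADES_PER_SMP:
        raise ValueError("an SMP server spans one to four blades")
    per_blade_gb = DIMM_SLOTS_PER_BLADE * dimm_gb
    cores = CORES_PER_BLADE * blades
    total_gb = per_blade_gb * blades
    return {
        "per_blade_gb": per_blade_gb,
        "cores": cores,
        "total_gb": total_gb,
        "gb_per_core": total_gb / cores,
    }


print(memory_summary(4))            # single blade with 4 GB DIMMs
print(memory_summary(4, blades=4))  # 16 core SMP configuration
```

With 4 GB DIMMs this gives 64 GB per blade and 256 GB across four blades, which is 16 GB per core in both cases.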
Figure 4. Intel Itanium Server Blade
Intel Itanium Server Blades include the features listed in Table 1.
Table 2: Main components of the Intel Itanium Server Blade
Component | Manufacturer | Quantity | Description
Bridge | Intel | 1 | PCIe to PCI-X bridge
South Bridge | Intel | 1 | South bridge — connects legacy devices
SIO | SMSC | 1 | Super I/O chip — contains the COM port and other legacy devices
FW ROM | ATMEL/STMicro | 8 MB | A flash ROM storing the images of system firmware. Also used as NVRAM under the control of the system firmware
Gigabit Ethernet | Intel | 1 | Gigabit Ethernet interface controller, two ports, SerDes connection. Wake on LAN supported. TagVLAN supported. PXE Boot supported
USB controller | VIA | 1 | Compatible with UHCI and EHCI
BMC | Renesas | 1 | Management processor
BMC SRAM | Renesas | 2 MB, with parity | Main memory for management processor
FPGA | Xilinx | 1 | Controls the BMC bus, decodes addresses, and functions as a bridge for LPC
Flash ROM | Fujitsu | 16 MB | Backs up BMC codes and SAL
BUS SW (switching over BMC-SVP) | | 1 | Reserved for the SVP duplex (future)
Intel Itanium Processor 9100 Series
The Dual-Core Intel Itanium 9100 series 64-bit processor delivers scalable performance with two high-performance cores per processor, memory addressability up to 1024 TB, 24 MB of on-die cache, and a
667 MHz front-side bus. It also includes multi-threading capability (two threads per core) and support
for virtualization in the silicon.
Explicitly Parallel Instruction Computing (EPIC) technology is designed to enable parallel throughput on
an enormous scale, with up to six instructions per clock cycle, large execution resources
(128 general-purpose registers, 128 floating point registers and 8 branch registers) and advanced
capabilities for optimizing parallel throughput.
The processors deliver mainframe-class reliability, availability, and serviceability features with advanced
error detection, correction, and containment across all major data pathways and the cache
subsystem. They also feature integrated, standards-based error handling across hardware, firmware,
and the operating system.
The Intel Itanium is optimized for dual processor-based platforms and clusters and includes the
following features:
• Wide, parallel hardware based on Itanium architecture for high performance
– Integrated on-die cache of up to 24 MB, cache hints for L1, L2, and L3 caches for reduced
memory latency
– 128 general and 128 floating-point registers supporting register rotation
– Register stack engine for effective management of processor resources
– Support for predication and speculation
• Extensive RAS features for business-critical applications
– Full SMBus compatibility
– Enhanced machine check architecture with extensive ECC and parity protection
– Enhanced thermal management
– Built-in processor information ROM (PIROM)
– Built-in programmable EEPROM
– Socket Level Lockstep
– Core Level Lockstep
• High bandwidth system bus for multiprocessor scalability
– 6.4 GB/sec. bandwidth
– 128-bit wide data bus
– 400 MHz and 533 MHz data bus frequency
– 50 bits of physical memory addressing and 64 bits of virtual addressing
• Two complete 64-bit processing cores on one chip running at 104W
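Two of the figures in the list above can be cross-checked with a short calculation. The sketch below assumes that the quoted bus frequency counts data transfers per second and that GB means 10^9 bytes; both are assumptions, and the function names are illustrative.

```python
def implied_bus_width_bits(bandwidth_gb_s: float, transfers_mhz: float) -> float:
    """Data-bus width implied by a peak bandwidth and a transfer rate."""
    bytes_per_transfer = (bandwidth_gb_s * 1e9) / (transfers_mhz * 1e6)
    return bytes_per_transfer * 8


def addressable_tb(address_bits: int) -> int:
    """Address space covered by `address_bits`, in binary terabytes (2**40 bytes)."""
    return 2 ** address_bits // 2 ** 40


print(round(implied_bus_width_bits(6.4, 400)))  # width in bits at 6.4 GB/sec. and 400 MHz
print(addressable_tb(50))                       # terabytes reachable with 50 address bits
```

Under those assumptions, 6.4 GB/sec. at 400 MHz corresponds to 16 bytes (128 bits) per transfer, and 50 physical address bits cover 1024 TB, the addressability figure quoted for the processor.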
Cache
The processor supports up to 24 MB (12 MB per core) of low-latency, on-die L3 cache (14 cycles)
providing 102 GB/sec. aggregate bandwidth to the processor cores. It also includes separate 16 KB
Instruction L1 and 16 KB Data L1 cache per core, as well as separate 1 MB Instruction L2 and 256 KB
Data L2 cache per core for higher speed and lower latency memory access.
Hyper-Threading Technology
Hyper-Threading Technology (HT Technology) enables one physical processor to transparently appear
and behave as two virtual processors to the operating system. With HT Technology, one dual-core
processor is able to simultaneously run four software threads. HT Technology provides thread-level
parallelism on each processor, resulting in more efficient use of processor resources, higher processing
throughput, and improved performance on multithreaded software, as well as increasing the number of
users a server can support. In order to leverage HT Technology, SMP support in the operating system is
required.
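As a rough illustration of what an SMP-capable operating system sees, the sketch below counts logical processors for one to four blades, assuming two sockets per blade, two cores per socket, and two threads per core as described in the text. The function name is hypothetical.

```python
SOCKETS_PER_BLADE = 2
CORES_PER_SOCKET = 2
THREADS_PER_CORE = 2  # with HT Technology enabled


def logical_processors(blades: int, ht_enabled: bool = True) -> int:
    """Logical processors presented to the operating system for an SMP server."""
    if blades < 1:
        raise ValueError("at least one blade is required")
    cores = blades * SOCKETS_PER_BLADE * CORES_PER_SOCKET
    return cores * (THREADS_PER_CORE if ht_enabled else 1)


for blades in (1, 2, 4):
    print(blades, "blade(s):", logical_processors(blades), "logical processors")
```

A single blade presents eight logical processors and a four-blade SMP server presents 32; with HT Technology disabled the counts fall back to the physical core counts of four and 16.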
Intel Cache Safe Technology and Enhanced Machine Check Architecture
Intel Cache Safe Technology is an automatic cache recovery capability that allows the processor and
server to continue normal operation in case of cache error. It automatically disables cache lines in the
event of a cache memory error, providing higher levels of uptime.
Enhanced Machine Check Architecture provides extensive error detection and address/data path
correction capabilities, as well as system-wide ECC protection. It detects bit-level errors and manages
data corruption, thereby providing better reliability and uptime.
Intel VT Virtualization Technology
The Dual-Core Intel Itanium processor includes hardware-assisted virtualization support that helps
increase virtualization efficiency and broaden operating system compatibility. Intel Virtualization
Technology (Intel VT) enables one hardware platform to function as multiple virtual platforms.
Virtualization solutions enhanced by Intel VT allow a software hypervisor to concurrently run multiple
operating systems and applications in independent partitions.
Demand Based Switching
The Demand Based Switching (DBS) function reduces power consumption by enabling the processor
to move to power-saving mode when under a low system load. The DBS function must be supported
by the operating system.
Hitachi Node Controller
The Hitachi Node Controller controls various kinds of system busses, including the front side bus (FSB),
a PCIe link, and the node link. The Hitachi Node Controller is equipped with three node link ports to
combine up to four server blades. The server blades connect to each other through the node link,
maintain cache coherence collectively, and can be combined to form a ccNUMA type multiprocessor
configuration. The Hitachi Node Controller is connected to memory modules through memory
controllers.
The Hitachi Node Controller provides the interconnection between the two processors, two memory
controllers, three PCI bus interfaces, and connection to up to three other Intel Itanium Server Blades.
Three x 5.3 GB/sec. links can connect up to three other Intel Itanium Server Blades over the backplane
in order to provide 8, 12, or 16 core SMP capabilities. These direct connections provide a distinct
performance advantage by eliminating the need for a cross bar switch found in most SMP system
designs, which reduces memory access latency across server blades.
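The direct-connect topology can be pictured as a full mesh. The example below is an illustration only (the function name and blade numbering are assumptions): it lists the point-to-point links needed so that every blade reaches every other blade in one hop, and rejects group sizes that three node link ports cannot serve.

```python
from itertools import combinations

NODE_LINK_PORTS = 3   # node link ports per Hitachi Node Controller
CORES_PER_BLADE = 4


def full_mesh(blades: int) -> list[tuple[int, int]]:
    """Return the blade pairs that must be linked for a one-hop full mesh."""
    if blades < 1:
        raise ValueError("at least one blade is required")
    if blades - 1 > NODE_LINK_PORTS:
        raise ValueError("each Node Controller has only three node link ports")
    return list(combinations(range(blades), 2))


for n in (2, 3, 4):
    links = full_mesh(n)
    print(f"{n * CORES_PER_BLADE} cores: {len(links)} link(s) {links}")
```

Two, three, and four blades need one, three, and six links respectively. In the four-blade case every blade uses all three of its ports, which is consistent with the 16 core limit described above.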
The Hitachi Node Controller is equipped with three PCIe ports to connect to I/O devices. Two of the
PCIe ports are used to connect to the I/O modules. The remaining port connects to an onboard I/O
device installed on the server blade, which serves a gigabit Ethernet controller, USB controller, and
COM ports.
The Hitachi Node Controller is designed for high performance processors and memory. Throughput
numbers to the processors, memory, and other nodes are listed in Table 3.
Table 3: Bus throughput from the Hitachi Node Controller
Bus | Throughput
Connection between nodes | 400 MHz FSB = 4.8 GB/sec.; 667 MHz FSB = 5.3 GB/sec.
[Figure 5 diagram labels: the NDC connects to two memory controllers, MC0 and MC1, over 72-bit paths; MC0 serves DIMM0A0 through DIMM0D1 and MC1 serves DIMM1A0 through DIMM1D1; the DIMM sets are named Row 01, Row 23, Row 45, and Row 67, each with a mounting priority from 1 to 4.]
Baseboard Management Controller
The Baseboard Management Controller (BMC) is the main controller for Intelligent Platform
Management Interface (IPMI), a common interface to hardware and firmware used to monitor system
health and manage the system. The BMC manages the interface between system management
software and the hardware in the server blade. It is connected to the service processor (SVP) inside the
Switch & Management Module. The BMC and SVP cooperate with each other to control and monitor
the entire system. Sensors built into the system report to the BMC on different parameters such as
temperature, cooling fan speeds, power, mode, OS status, etc. The BMC can send alerts to the system
administrator if the parameters vary from specified preset limits, indicating a potential failure of a
component or the system.
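The alerting behavior described above amounts to comparing sensor readings against preset limits. The following is a minimal sketch of that idea only; it is not the IPMI interface or the BMC firmware, and the sensor names and limit values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Limit:
    low: float
    high: float


# Hypothetical sensors and preset limits; real values are platform specific.
LIMITS = {
    "cpu_temp_c": Limit(5.0, 85.0),
    "fan_rpm": Limit(2000.0, 12000.0),
}


def check_readings(readings: dict[str, float]) -> list[str]:
    """Return one alert message for each reading outside its preset limits."""
    alerts = []
    for name, value in readings.items():
        limit = LIMITS.get(name)
        if limit is None:
            continue  # no limit configured for this sensor
        if not limit.low <= value <= limit.high:
            alerts.append(f"{name}={value} outside [{limit.low}, {limit.high}]")
    return alerts


print(check_readings({"cpu_temp_c": 91.0, "fan_rpm": 6000.0}))
```

In this example only the temperature reading produces an alert, because the fan speed is within its configured range.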
Memory System
Intel Itanium Server Blades are equipped with 16 DIMM slots, which support Registered DDR2-400
SDRAM in 512 MB, 1 GB, 2 GB, and 4 GB (DDR2-533) capacities for a total of up to 64 GB per server blade, or
16 GB per core. The memory system is designed to control a set of four DIMMs for the ECC and the
memory-device replacing function. Accordingly, if DIMMs are added, they must be installed in sets of
four DIMMs. The different DIMMs in each row can be used logically as shown in Figure 5.
Figure 5. Memory configuration
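The population rules above (16 slots, DIMMs added four at a time, four supported capacities) can be expressed as a small check. This is a sketch under those stated rules only; the function name is hypothetical, and because the text does not say whether the DIMMs within a set must match, the sketch does not enforce it.

```python
SUPPORTED_DIMM_MB = {512, 1024, 2048, 4096}
DIMM_SLOTS = 16
DIMMS_PER_SET = 4


def total_memory_gb(dimms_mb: list[int]) -> float:
    """Return installed capacity in GB, or raise ValueError if a rule is broken."""
    if not dimms_mb or len(dimms_mb) > DIMM_SLOTS:
        raise ValueError("install between 4 and 16 DIMMs")
    if len(dimms_mb) % DIMMS_PER_SET:
        raise ValueError("DIMMs must be installed in sets of four")
    unsupported = sorted(set(dimms_mb) - SUPPORTED_DIMM_MB)
    if unsupported:
        raise ValueError(f"unsupported DIMM sizes (MB): {unsupported}")
    return sum(dimms_mb) / 1024


print(total_memory_gb([4096] * 16))  # fully populated with 4 GB DIMMs
print(total_memory_gb([1024] * 8))   # two sets of 1 GB DIMMs
```

A population of six DIMMs, for example, is rejected because it is not a whole number of four-DIMM sets.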
The memory system of the Intel Itanium Server Blade includes several RAS features:
• ECC protection (S2EC-D2ED) — Detects an error in any two sets of consecutive two bits and
corrects errors in any one set of consecutive two bits.
• ECC — The ECC can correct an error in consecutive four bits in any four DIMM set (i.e., a fault in one
DRAM device). This function is equivalent to technology generally referred to as Chipkill and allows
the contents of memory to be reconstructed even if one chip completely fails. The concept is similar
to the way RAID protects content on disk drives.
• Memory device replacing function — The NDC and MC have a function to replace a faulty DRAM
device with a normal spare one assisted by the System Abstraction Layer (SAL) firmware. This keeps
the ECC function (S2EC-D2ED) operating. It can replace up to two DRAM devices in any one set of
four DIMMs.
While dual-processor systems are now commonplace, increasing the number of processors/sockets
beyond two poses many challenges in computer design, particularly in the memory system. As
processors are added to a system the amount of contention for memory access quickly increases to
the point where the intended throughput improvement of more processors is significantly diminished.
The processors spend more time waiting for data to be supplied from memory than performing useful
computing tasks. Conventional uniform memory systems are not capable of scaling to larger numbers
of processors due to memory bus contention. Traditional large SMP systems introduce cross bar
switches in order to overcome this problem. However, this approach adds to the memory hierarchy,
system complexity, and physical size of the system. SMP systems typically do not possess the
advantages of blade systems, e.g., compact packaging and flexibility.
Leveraging their extensive mainframe design experience, Hitachi employs a number of advanced
design techniques to create a blade-based SMP system, allowing the BladeSymphony 1000 to scale
up to an eight socket, 16 core system with as much as 256 GB of memory. The heart of the design is
the Hitachi custom designed Node Controller, which effectively breaks a large system into smaller, more
flexible nodes or server blades in blade format. These server blades can act as complete, independent
systems or up to four server blades can be connected to form a single, efficient multi-processor
system, as illustrated in Figure 6.
Figure 6. Hitachi Node Controller connects multiple server blades
By dividing the SMP system across several server blades, the memory bus contention problem is
solved by virtue of the distributed design. A processor’s access to its on-board memory incurs no
penalty. The two processors (four cores) can access up to 64 GB at the full speed of local memory.
When a processor needs data that is not contained in its locally attached memory, its node controller
needs to contact the appropriate other node controller to retrieve the data. The latency for retrieving
that data is therefore higher than retrieving data from local memory. Since remote memory takes longer
to access, this is known as a non-uniform memory architecture (NUMA). The advantage of using non-uniform memory is the ability to scale to a larger number of processors within a single system image
while still allowing for the speed of local memory access.
While there is a penalty for accessing remote memory, a number of operating systems are enhanced to
improve the performance of NUMA system designs. These operating systems take into account where
data is located when scheduling tasks to run on CPUs, using the closest CPU where possible. Some
operating systems are able to rearrange the location of data in memory to move it closer to the
processors where it is needed. For operating systems that are not NUMA aware, the BladeSymphony
1000 offers a number of memory interleaving options that can improve performance.
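The placement idea behind NUMA-aware scheduling can be sketched in a few lines. The example below is a deliberately simplified illustration, not how any particular operating system implements it; the function name, the page counts, and the tie-breaking rule are all assumptions.

```python
def pick_node(pages_per_node: dict[int, int], free_cpus: dict[int, int]) -> int:
    """Choose the node (server blade) holding most of a task's pages
    among the nodes that currently have a free CPU."""
    candidates = [node for node, free in free_cpus.items() if free > 0]
    if not candidates:
        raise RuntimeError("no free CPU on any node")
    # Prefer the node with the most local pages; break ties by lowest node id.
    return max(candidates, key=lambda node: (pages_per_node.get(node, 0), -node))


pages = {0: 100, 1: 900}                  # most of the task's data sits on node 1
print(pick_node(pages, {0: 2, 1: 1}))     # node 1 has a free CPU, so run there
print(pick_node(pages, {0: 2, 1: 0}))     # node 1 is busy, fall back to node 0
```

When the preferred node is busy the task runs elsewhere and pays the remote-access penalty, which is the situation that data migration, where the operating system supports it, is meant to reduce.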
Each Node Controller can connect to up to three other Node Controllers, providing a point-to-point
connection between every pair of Node Controllers. Point-to-point connections eliminate the shared bus,
which would be prone to contention, and the crossbar switch, which reduces contention compared with a bus
but adds complexity and latency. A remote memory access is streamlined because it only needs to pass
through two Node Controllers, which results in lower latency than in other SMP systems.
BladeSymphony 1000 supports two socket (four-core) Intel Itanium Server Blades that can be scaled to
offer up to two 16 core servers in a single chassis or eight four core servers, or a mixture of SMP and
single module systems, thus reducing footprint and power consumption while increasing utilization and
flexibility. SMP provides higher performance for applications that can utilize large memory and multiple
processors, such as large databases or visualization applications.
The maximum SMP configuration supported by BladeSymphony 1000 is:
• Four Dual Core Intel Itanium Server Blades for a total of 16 CPU cores
• 256 GB memory (64 GB per server blade x 4)
• Eight gigabit NICs (2 on-board per server blade) connected to two internal gigabit Ethernet switches
• Eight PCI-X slots (or 16 PCI-X slots with chassis B)
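The mix of SMP and single-blade servers in one chassis can be summarized with a short helper. This sketch assumes that any combination of one- to four-blade servers fitting in eight slots is allowed; the text does not enumerate the valid mixes, and the function name is hypothetical.

```python
CHASSIS_BLADE_SLOTS = 8
MAX_BLADES_PER_SMP = 4
CORES_PER_BLADE = 4
NICS_PER_BLADE = 2


def describe_layout(blades_per_server: list[int]) -> list[dict]:
    """Summarize each server in a chassis, given the blades joined in each one."""
    if sum(blades_per_server) > CHASSIS_BLADE_SLOTS:
        raise ValueError("a chassis holds at most eight server blades")
    servers = []
    for blades in blades_per_server:
        if not 1 <= blades <= MAX_BLADES_PER_SMP:
            raise ValueError("each server spans one to four blades")
        servers.append({
            "blades": blades,
            "cores": blades * CORES_PER_BLADE,
            "gigabit_nics": blades * NICS_PER_BLADE,
        })
    return servers


print(describe_layout([4, 4]))     # two 16 core SMP servers
print(describe_layout([1] * 8))    # eight four core servers
print(describe_layout([4, 2, 1]))  # a mixed layout with one slot unused
```

The first two calls correspond to the two extremes named in the text; the third shows a mixed chassis.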
With its unique interconnect technology, BladeSymphony 1000 delivers a new level of flexibility in
adding computing resources to adapt to changing business needs. BladeSymphony 1000 can address
scalability requirements by scaling-out (horizontally), or by scaling-up (vertically). Scaling out is ideally
suited to online and other front-end applications that can divide processing requirements across
multiple servers. Scaling out can also provide load-balancing capabilities and higher availability through
redundancy.
Figure 7. Scale-up capabilities
Scaling up is accomplished through SMP, shown in Figure 7. This approach is better suited to
enterprise-class applications requiring 64-bit processing, high computational performance, and large
memory addressability beyond that provided in a typical x86 environment. BladeSymphony 1000 SMP
Interconnect technology and blade form factor allow IT staff to manage scale-up operations on their
own, without a service call. The interconnect allows up to four server blades to be joined into a single
server environment composed of the total resources (CPU, memory, and I/O) resident in each module.
NUMA Architecture
The Intel Itanium Server Blade supports two memory interleave modes, full and non-interleave.
In full interleave mode, the additional latency in accessing memory on other server blades is averaged
across all memory, including local memory, to provide a consistent access time. In non-interleave
mode, a server blade has faster access to local memory than to memory on other server blades. Both
of these options are illustrated in Figure 8.
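The trade-off between the two modes can be approximated with a simple average-latency model. In the sketch below, full interleave is modeled as addresses spread evenly across all blades, and the latency values are placeholders rather than measured BladeSymphony 1000 figures; the function and parameter names are assumptions.

```python
def average_latency(local_ns: float, remote_ns: float, blades: int,
                    mode: str, local_fraction: float = 1.0) -> float:
    """Expected memory latency under a simple two-level (local/remote) model."""
    if blades < 1:
        raise ValueError("at least one blade is required")
    if mode == "full":
        # Addresses are spread evenly, so 1/blades of accesses land locally.
        p_local = 1 / blades
    elif mode == "non":
        if not 0.0 <= local_fraction <= 1.0:
            raise ValueError("local_fraction must be between 0 and 1")
        # Locality depends on where the operating system places data.
        p_local = local_fraction
    else:
        raise ValueError("mode must be 'full' or 'non'")
    return p_local * local_ns + (1 - p_local) * remote_ns


# Placeholder latencies: 100 ns local, 200 ns remote, four blades.
print(average_latency(100, 200, 4, "full"))
print(average_latency(100, 200, 4, "non", local_fraction=0.75))
```

In this model full interleave gives every access the same expected latency regardless of placement, while non-interleave is faster only to the extent that the operating system keeps a task's data on its own blade.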