Abstract: Tuning the disk system and I/O paths can be key to
achieving maximum performance from your server. This paper
begins with definitions, guidelines, and recommendations for I/O
performance tuning. The last section of this paper provides actual
performance data to reinforce recommendations. In general, this
paper deals with universal I/O concerns and applies to a wide range
of server applications. The major areas examined in this paper are
• disk systems
• NICs
• memory
• system configuration
• performance monitoring tools.
Help us improve our technical communication. Let us know what you think
about the technical information in this document. Your feedback is valuable
and will help us structure future communications. Please send your
comments to: novell.feedback@compaq.com
I/O Performance Tuning of Compaq Servers
Notice
The information in this publication is subject to change without notice and is provided “AS IS” WITHOUT
WARRANTY OF ANY KIND. THE ENTIRE RISK ARISING OUT OF THE USE OF THIS
INFORMATION REMAINS WITH RECIPIENT. IN NO EVENT SHALL COMPAQ BE LIABLE FOR
ANY DIRECT, CONSEQUENTIAL, INCIDENTAL, SPECIAL, PUNITIVE OR OTHER DAMAGES
WHATSOEVER (INCLUDING WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS
PROFITS, BUSINESS INTERRUPTION OR LOSS OF BUSINESS INFORMATION), EVEN IF
COMPAQ HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
The limited warranties for Compaq products are exclusively set forth in the documentation accompanying
such products. Nothing herein should be construed as constituting a further or additional warranty.
This publication does not constitute an endorsement of the product or products that were tested. The
configuration or configurations tested or described may or may not be the only available solution. This test
is not a determination of product quality or correctness, nor does it ensure compliance with any federal,
state, or local requirements.
Product names mentioned herein may be trademarks and/or registered trademarks of their respective
companies.
Netelligent, Armada, Cruiser, Concerto, QuickChoice, ProSignia, Systempro/XL, Net1, LTE Elite,
Vocalyst, PageMate, SoftPaq, FirstPaq, SolutionPaq, EasyPoint, EZ Help, MaxLight, MultiLock,
QuickBlank, QuickLock, UltraView, Innovate logo, Wonder Tools logo in black/white and color, and
Compaq PC Card Solution logo are trademarks and/or service marks of Compaq Computer Corporation.
Microsoft, Windows, Windows NT, Windows NT Server and Workstation, Microsoft SQL Server for
Windows NT are trademarks and/or registered trademarks of Microsoft Corporation.
NetWare and Novell are registered trademarks and intraNetWare, NDS, and Novell Directory Services are
trademarks of Novell, Inc.
Technical Guide prepared by OS Integration Engineering
First Edition (March 1999)
Document Number ECG044.0399
Executive Summary
In order to maximize your investment, it is crucial that you get the highest consistent performance
from your server. As the demands placed on your server grow, the hardware performance and
configuration efficiency must keep pace. However, poorly implemented expansion and upgrades,
such as unbalanced busses and memory bottlenecks, can seriously degrade performance. For
example, simply adding or rearranging drives or adapters can, in some cases, increase the
throughput of your entire disk system by as much as 96%. While not all servers will realize gains
of this magnitude so easily, every server can be optimized to increase its functionality and
longevity.
Performance tuning may not only save money now, but may also prevent premature replacement
of viable equipment. Since technology budgets are limited, it is important to get the most out of
every investment, in terms of both performance and, especially, longevity. Less than optimal
server configuration can shorten server life cycles as network demands outpace server
performance. In the meantime, your network clients will work less efficiently and become
frustrated as they deal with a slower network.
You can prevent the adverse effects of a slow network by making sure that your server is
delivering maximum performance. Compaq performs extensive tests to determine the optimal
configuration for different environments. The results of these tests can be invaluable tools in
configuring and tuning your server. Many of the recommendations can be implemented
immediately, some without any cost. All of these guidelines, however, help prevent the maintenance
burden and frustration caused by poor server performance.
By implementing the recommendations found in this paper, your server can immediately become
more responsive. The amount of maintenance and total cost of ownership required by your server
can also drop dramatically. In the long run, you will have earned a greater return on your
investment from both your server and clients.
Introduction
Perhaps the most common term encountered when maximizing server performance is bottleneck. Invariably,
one component in a system will be stressed more than others and, as a result, the performance of
the entire system will be limited when the limits of that device have been reached. A bottleneck is
not always a single device, and in some cases, it’s not a device at all. Often, an entire subsystem
or simply a less than optimal configuration limits performance. Given any configuration, there are
bottlenecks and, usually, the limiting factor cannot be eliminated completely. Maximizing
performance for a server involves careful analysis and reconfiguration to reduce the effect of the
bottlenecks.
In order to identify and reduce the impact of the most stressed component, you must know the
hardware present and normal operating conditions for your server. As you will see, an important
factor in tuning a server is the server mix. The server mix is the set of ratios expressing what
proportion of server requests are:
• random vs. sequential
• small vs. large block transfers
• reads vs. writes
By understanding what data is requested, and how often, you can decide which components are
most likely to become a bottleneck in your environment. For instance, a lightly-loaded server that
performs all random, small file transfers will most likely incur a delay as the drives seek to find
the requested data; the bus on this server remains relatively idle. On the other hand, large,
contiguous file transfers need only position the drive heads initially to begin streaming data to the
client; these requests are more likely to run into a bus saturation limitation.
Perhaps a more obvious factor in server performance is the server itself. In every server
environment, there are variables that must be considered regardless of configuration and
operation. For instance, physical drive delays as the heads are positioned over the data are present
in every computer. Performance tuning seeks to reduce the effect of individual delays on overall
system performance. This document will give general recommendations for reducing the effect of
any given delay and the role that some of the new technology can play.
I/O Performance and Recommendations
Dynamic systems, such as a network, operate within a set of boundary conditions. These
boundary conditions describe the limitations imposed by the individual components of the
system. Optimization of your system for maximum performance involves
• Knowledge of the components and their relationships
• Discovery of current performance through measurements and analysis
• Adaptation to increase performance.
The operating realm is a solution of the boundary conditions. In a graphical form, as in Figure 1,
the solution for each operating parameter is a polygon. While the system can operate anywhere
within the polygon, the boundaries of the polygon are where the system usually operates. In
particular, the modes of a solution are usually at an apex of the polygon.
Figure 1. Boundary Value Conditions Graphical Solution
In Figure 1, the Parameter axis could represent a complex, composite variable such as disk
systems or a simpler variable such as the type of disks used in a system. The performance axis is
usually the axis being optimized. The optimal solution in this example is apex 1, but operating at
2 is not to be ignored. The differences between these points could represent a 5% performance
and a 40% price difference.
In this paper, a flag indicates a key point to a performance recommendation.

In general, simply adding redundant instances of a system allows parallel
execution of that system’s task and can increase the effective performance of
that system. For instance, striping the data from a single drive onto two
physical devices can nearly double drive throughput in some cases.
Unfortunately, this is a situation of diminishing returns. In fact, adding more
than the optimal number of redundant devices can actually degrade
performance. For example, if a SCSI bus is nearly fully populated, with all disk drives in a single
RAID array, sustained read performance will be only marginally better than the same bus with
fewer drives tuned more appropriately for the load. This example benefits more from splitting the
single SCSI bus into two or more busses than from adding more drives to an already saturated
bus. When tuning the I/O of a server, you seek to find the level of redundancy that provides
optimal performance in your server’s application.
Network Interface
The key to realizing the optimal performance from your server is understanding the way in which
clients access your server. The mix of client access can vary widely from server to server, and
even from hour to hour on the same server. In some cases, clients may be requesting random,
scattered, smaller files from the server; as in a web-server. Other situations may ask the server to
retrieve large, contiguous graphic or CAD files. While still other servers will have to respond to a
great deal of write requests. In each of these situations, tuning the performance of the server
requires a different approach. As a result, knowing the kind of load, or server mix, that clients
place on your server is key to tuning your I/O.
On most local area networks, servers are connected to clients over an Ethernet network.
Currently, there are three Ethernet specifications:
• 10BASE-T provides each client with a shared resource with a maximum bandwidth of 10
Megabits per second (Mb/s) or 1 Megabyte per second (MB/s).
• 100BASE-T is 10 times faster than 10BASE-T, providing 10 MB/s of bandwidth.
• Gigabit Ethernet, the newest specification, can move approximately 120 MB of data per
second; 100 times faster than the original 10BASE-T specification.
Ideally, the server should be able to deliver the full 1 MB/s, 10 MB/s, or 100 MB/s to every
network segment, depending on the NICs used.
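The Mb/s figures above convert to MB/s by dividing by eight bits per byte; the paper rounds the results to 1, 10, and roughly 100 to 120 MB/s. A minimal sketch of that arithmetic (the dictionary and function names are illustrative, not part of any Compaq tool):

```python
# Line rates of the three Ethernet specifications, in megabits per second.
ETHERNET_SPECS_MBPS = {
    "10BASE-T": 10,
    "100BASE-T": 100,
    "Gigabit Ethernet": 1000,
}

def megabits_to_megabytes(mbps):
    """Divide by 8 bits per byte to get megabytes per second."""
    return mbps / 8

for name, mbps in ETHERNET_SPECS_MBPS.items():
    print(f"{name}: {mbps} Mb/s ~= {megabits_to_megabytes(mbps):.2f} MB/s")
```

The exact quotients are 1.25, 12.5, and 125 MB/s; the round figures in the text are the usual engineering shorthand.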
In practice, however, on multi-client networks with unswitched hubs and repeaters each client
shares the bandwidth resource with all other clients on the segment. To overcome this limitation,
intelligent network switches enable point-to-point communication between nodes. When using
network switches, each port on the switch is seen as the only client on that network segment. In
this case, your network maximizes throughput because collisions are eliminated. Switches allow
greater utilization of bandwidth; however, the added cost of intelligent switches may outweigh the
performance gains in smaller, less stressed network segments.
When planning or implementing your network, be aware of utilization limitations. Either
distribute network traffic between separate network segments or use intelligent switches
to eliminate packet collisions and maximize throughput.
Peripheral Component Interconnect (PCI) Bus
The PCI bus is a high-performance, 32- or 64-bit local bus that provides an interface for high-speed transfers to peripheral components without taxing the host processor. A single 32-bit PCI
bus provides 133 MB/s of bandwidth between PCI devices and the host bus, which holds the CPU
and main memory. In order to provide greater performance and connectivity, Compaq ProLiant
servers (models 1200 to 7000) connect multiple PCI busses within the server. Compaq uses two
different architectures to connect the two PCI busses to the host bus: the dual-peer PCI bus and
the bridged PCI bus.
In order to maintain maximum performance, Compaq recommends that you balance the
load according to the architecture in your server.
Bus Balancing in ProLiant Servers
Bridged PCI bus
In bridged PCI busses, such as the ProLiant 2500 and ProLiant 2500R, you should
populate the primary PCI bus completely before adding any adapters to the secondary
bus.
Because the secondary PCI bus shares the data path of the primary PCI bus, bus balancing is not
recommended in bridged-PCI servers. In fact, Compaq recommends that you have the primary
bus completely populated before adding cards to the secondary bus. Cards in the secondary bus
need to pass their control and data through the extra bridge chip and incur delays not only from
the bridge chip itself, but also from any synchronization or contention with the primary bus.
Simply stated, I/O loads on the secondary bus are not handled as efficiently as equivalent loads
on the primary bus. When placing devices on the secondary bus, select the adapters with the
lightest I/O load.
Figure 2. Bridged PCI Architecture (diagram: the microprocessor and system memory connect
through the motherboard chipset to the primary PCI bus at 133 MB/s; a PCI bridge chip links the
secondary PCI bus to the primary bus, also at 133 MB/s; each bus provides four PCI slots)
Peer PCI bus
In Peer PCI busses, however, you should attempt to balance the I/O load between the
busses.
Since each 32-bit PCI bus can move 133 MB of data per second, making efficient use of both PCI
busses can deliver 266 MB/s of combined throughput. By balancing the I/O load evenly between
peered PCI busses, you ensure most efficient use of PCI throughput. Although evenly distributing
adapters between the two busses is a good starting point, balancing the load on the two PCI
busses requires a bit more insight into the loads generated by each type of device.
Generally, the guidelines below will deliver a balanced bus when adding controllers to your
server.
1. When installing an even multiple of network or array controllers, split the controllers
evenly between the busses. For example, if you were adding two array controllers
and two network controllers, you should put one each of the network and array
controllers in each of the PCI busses.
2. If installing an "odd" number of controllers, for example, two NICs (Network
Interface Controller) and one drive array controller, split the two network controllers
between the busses. Network controllers consume more bandwidth than array
controllers do, so it is best to split the workload between two busses if possible.
3. Avoid putting two network controllers together in the same bus unless both busses
already have a network controller installed. Because fewer devices can lower
contention, it is generally better to have a system with one dual-port NIC in each
bus than to have two single-port NICs in each bus.
4. When adding redundant NIC pairs, place both NICs on the same bus. If the server
were to fail over to the backup device, the load would remain balanced.
Figure 3. Peer PCI Architecture (diagram: the motherboard chipset connects the microprocessor
and system memory to two peer PCI busses at 133 MB/s each, 266 MB/s combined; each bus
provides four PCI slots)
These guidelines should not be followed if one device consistently operates at a higher load. If,
for example, one particular NIC operates at or near its upper limit while three other NICs in the
same server remain relatively idle, you should not split the number of NICs evenly. In this case, you
should attempt to balance the loads by placing the heavily loaded card in one bus and the less
active NICs in another bus. Balancing the total load on the PCI bus, from all devices, is the key to
maximizing PCI throughput.
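The guidelines above amount to a greedy balancing rule: place each adapter, heaviest load first, on whichever bus currently carries less load. A hypothetical sketch (the load figures are illustrative placeholders, not measured values):

```python
def balance_adapters(adapters):
    """adapters: list of (name, estimated_load) tuples.
    Greedily place the heaviest remaining adapter on the lighter bus."""
    busses = {"primary": [], "secondary": []}
    loads = {"primary": 0, "secondary": 0}
    for name, load in sorted(adapters, key=lambda a: -a[1]):
        target = min(loads, key=loads.get)  # bus with the lower total load
        busses[target].append(name)
        loads[target] += load
    return busses, loads

# NICs consume more bandwidth than array controllers, so they get
# larger (illustrative) load estimates and are placed first.
adapters = [("NIC 1", 12), ("NIC 2", 12), ("Array 1", 8), ("Array 2", 8)]
placement, loads = balance_adapters(adapters)
# Result: one NIC and one array controller per bus, matching guideline 1.
```

The same heuristic reproduces guideline 2 as well: with two NICs and one array controller, the NICs end up on separate busses.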
Table 1. PCI Bus Architectures of ProLiant servers
While balancing the busses cannot increase your maximum throughput of 133 MB/s per bus, it
can increase the potential sustained throughput of the server. By making sure that you use the
maximum bandwidth on the two busses, you can increase overall performance. When a bus is
carrying its maximum sustained throughput, the bus is said to be saturated. Once any bus has
become saturated, it becomes a limiting factor in server performance. Balanced busses saturate at
higher performance levels. Here again, the loads placed on your server will determine the
importance of a balanced bus. In general, the high-speed PCI busses in Compaq servers are less
likely to become saturated in environments where random, small-block transfers are the norm.
Because the bus operates at 33 MHz and transmits 32 bits in parallel, the small files common in
web serving and user storage are not usually on the bus long enough to cause any sustained saturation. However,
exceptionally heavy loads of small block transfers or large block transfers, common in video
streaming or CAD storage, can make a balanced bus a critical part of your server’s performance.
Compaq recommends that you always balance the loads on your PCI busses according to the
guidelines above. However, the impact that balancing will have on performance will vary
depending on the load placed on your server.
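The 133 MB/s and 266 MB/s figures cited above follow from simple arithmetic: a 32-bit bus moves 4 bytes per clock at roughly 33.33 MHz. A back-of-the-envelope sketch:

```python
# Peak PCI bandwidth: clock rate (MHz, i.e. million cycles/s) times
# bus width in bytes per cycle gives MB/s.
PCI_CLOCK_MHZ = 33.33
BUS_WIDTH_BYTES = 4  # 32-bit bus

peak_mb_per_s = PCI_CLOCK_MHZ * BUS_WIDTH_BYTES   # ~133 MB/s per bus
dual_peer_mb_per_s = 2 * peak_mb_per_s            # ~266 MB/s across two peer busses
```

These are theoretical peaks; sustained throughput depends on arbitration overhead and how evenly the adapter loads are balanced.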
SCSI Bus
SCSI provides performance and features that have made it the interface of choice for Compaq
servers. Originally, there was SCSI, which was renamed to SCSI-1 with the advent of SCSI-2.
SCSI-1 suffered from many compatibility issues that were addressed in the next revision. At the
same time SCSI-2 clarified and established SCSI standards, it extended the performance and
functionality, both making SCSI more powerful and resolving compatibility issues. SCSI-3, the
newest standard, extends the functionality of the SCSI bus to new devices such as Fibre Channel,
SCSI-3 Parallel Interface, and the High Performance Serial Bus. Most importantly, SCSI-3 paves
the way for a higher-performance bus interface.
When referencing SCSI devices, prefixes, such as Wide, Narrow, and Fast, are used. Each of
these prefixes gives some insight into the maximum performance of the SCSI device. There are two
classes of prefixes: those that deal with bus speed and those that deal with bus width. Table 2
summarizes and defines some common SCSI prefixes.
Table 2. SCSI prefixes

Bus speed
  Regular  This term is no longer used. Regular, or the lack of Fast or Ultra, denotes the
           original 5 MHz SCSI bus speed. On a narrow bus, Regular SCSI could transmit 5 MB/s.
  Fast     Defined in SCSI-2, the Fast protocol increases the speed of the SCSI bus to 10 MHz.
           On narrow busses, which transmit 1 byte per clock cycle, this gives a maximum
           throughput of 10 MB/s.
  Ultra    The Ultra protocol, part of the SCSI-3 specification, builds on the performance of
           Fast SCSI, but doubles the clock again to 20 MHz. Ultra SCSI can transmit up to
           20 MB/s on a narrow bus.

Bus width
  Narrow   The original SCSI bus is capable of transmitting 8 bits per clock cycle. The term
           narrow is rarely used, but implied by the lack of the Wide prefix.
  Wide     Introduced as part of the SCSI-2 specification, Wide busses allow the transmission
           of 2 bytes or 16 bits per clock cycle. By doubling the data bus width, the
           throughput of the SCSI bus doubles. In Wide-Fast SCSI, the throughput reaches
           20 MB/s; and in Wide-Ultra SCSI, the throughput has a maximum of 40 MB/s.
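Every prefix combination reduces to one formula: throughput in MB/s equals the bus clock in MHz times the bus width in bytes per clock. A small sketch (the constant names are illustrative):

```python
def scsi_throughput_mb_s(clock_mhz, width_bytes):
    """Peak SCSI throughput: MHz (million transfers/s) x bytes per transfer."""
    return clock_mhz * width_bytes

CLOCKS = {"Regular": 5, "Fast": 10, "Ultra": 20}  # bus clock in MHz
WIDTHS = {"Narrow": 1, "Wide": 2}                 # bytes per clock cycle

# Wide-Fast: 10 MHz x 2 bytes = 20 MB/s; Wide-Ultra: 20 MHz x 2 bytes = 40 MB/s.
assert scsi_throughput_mb_s(CLOCKS["Fast"], WIDTHS["Wide"]) == 20
assert scsi_throughput_mb_s(CLOCKS["Ultra"], WIDTHS["Wide"]) == 40
```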
SCSI Bus Interface
Since SCSI was introduced, several specifications have been released and many new extensions
have been defined. With each subsequent release of the SCSI specification, expandability,
performance, flexibility, and compatibility have increased or improved. Currently, there are three
SCSI specifications.
SCSI-1
The original SCSI standard, approved by ANSI in 1986, defined the first SCSI bus in terms of
cabling length, signaling characteristics, commands, and transfer modes. The default (Regular)
speed for SCSI was 5 MB/s. It had an 8-bit (Narrow) parallel bus that transferred a single byte of
data with each bus cycle. “Regular” and “Narrow” conventions are no longer mentioned in the
SCSI protocol names.
SCSI-2
The second version of the SCSI standard, SCSI-2, was approved in 1990. SCSI-2 was an
extensive enhancement that defined support for many advanced features, including:
• Fast SCSI: A high-speed transfer protocol that doubles the speed of the bus to 10 MHz.
With an 8-bit data pathway, the transfer rate is 10 MB/s.
• Wide SCSI: Widens the original 8-bit SCSI bus to 16 bits to permit more data
throughput at a given signaling speed. The combination of Fast and Wide (Fast-Wide SCSI-2) offers data transfer rates up to 20 MB/s.
• More Devices per Bus: Wide SCSI busses support 16 devices (15 drives, plus
controller) as opposed to eight with regular (Narrow) SCSI.
• Better Cables and Connectors: SCSI-2 defined a new high-density 68-pin “B” cable
and connectors.
• Active Termination: Provided more reliable termination of the bus.
In addition to these features, SCSI-2 maintained backward compatibility with all SCSI devices.
SCSI-3
SCSI-3 is a group of documents that define the implementation of SCSI protocols on different
physical layers (SCSI-3 Parallel Interface, High Performance Serial Bus, Fibre Channel, and
Serial Storage Architecture). Each physical layer has different performance characteristics and
uses different hardware. Other documents in the SCSI-3 standard are still being developed.
Currently, the SCSI-3 standard includes SCSI-2’s performance and functionality enhancements
plus:
• Ultra SCSI: Doubles the bus speed to 20 MHz and the transfer rate to 20 MB/s with an
8-bit data pathway.
• Wide-Ultra SCSI-3: Doubles the Ultra SCSI transfer rate to 40 MB/s using a 16-bit data
pathway.
• Improved Cabling: A new 68-pin “P” cable replaces the “B” cable for use with Wide
SCSI.
Compaq has extensively tested and integrated the Wide-Ultra SCSI-3 technology in Compaq
servers and storage options because it allows the highest available performance in a SCSI host
interface and because its backward compatibility provides investment protection for Compaq
customers.
In general, use devices with the highest SCSI specification that your controller can
support.
In the case of Fibre Channel Arrays, use Wide-Ultra SCSI-3 compatible drives for maximum
performance. In all SCSI configurations, note that:
• Wide-SCSI will outperform narrow SCSI.
• Higher clock-rate interfaces (Fast and Ultra) will usually give performance gains.
Compaq does not recommend mixing SCSI revisions or protocols within RAID arrays. If you
were to put a SCSI-1 device as part of a RAID array of Wide-Ultra SCSI-3 devices, the
performance of the entire array would suffer. However, if drives are configured as independent
drives (not as a RAID array), protocols may be mixed to some extent.
Wide and narrow devices may be used on the same bus without affecting the performance of the
individual devices. That is, Wide devices will transfer 16 bits per clock; and Narrow devices will
use 8-bit transfers.
However, when mixing non-Ultra and Ultra drives, you must be aware of potential performance
implications. When the SCSI bus is initialized, the host adapter negotiates the highest transfer
protocol of which each drive is capable. Ideally, the controller will use this maximum speed to do all
transfers to that device. In practice, however, operating a SCSI bus at Ultra speeds places strict
requirements on the configuration in order to maintain signal integrity. For any component of
your Compaq SCSI chain to use Ultra speeds you must adhere to the following:
• Both the device and your controller must be capable of Ultra SCSI.
• The SCSI controller must not be in the middle of a SCSI chain.
• For every 5 Narrow devices, you must have at least one Wide device.
• Any device that communicates at Ultra speeds must be Wide.
If any one of the above rules is broken, no device on the SCSI bus will be able to communicate at
Ultra speeds. In this case, the bus will fall back to Fast, 10 MHz transfers.
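The four rules above can be expressed as a single eligibility check. This is a hedged sketch, not Compaq firmware logic; the device records and field names are hypothetical:

```python
def bus_can_run_ultra(devices, controller_ultra, controller_at_chain_end):
    """devices: list of dicts with boolean 'ultra' and 'wide' flags.
    Returns False if any rule fails; the bus then falls back to Fast (10 MHz)."""
    # Rule: the controller must be Ultra-capable and not mid-chain.
    if not (controller_ultra and controller_at_chain_end):
        return False
    narrow = sum(1 for d in devices if not d["wide"])
    wide = len(devices) - narrow
    # Rule: at least one Wide device for every 5 Narrow devices.
    if wide * 5 < narrow:
        return False
    # Rule: any device communicating at Ultra speeds must be Wide.
    if any(d["ultra"] and not d["wide"] for d in devices):
        return False
    return True
```

For example, a single Wide-Ultra drive on an Ultra controller at the end of the chain passes; move the controller to the middle of the chain and the whole bus drops to Fast.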
SCSI Bus Balancing
Just as with PCI, balancing the load across multiple SCSI busses can increase sustained
throughput. Here again, balancing SCSI controller loads is not as simple as evenly distributing the
number of disks between the busses. Proper bus balancing evenly distributes the loads generated
by access to each disk.
All Compaq SMART-2 SCSI controllers, Compaq SMART-2 SCSI Array Controllers, and most
ProLiant Storage Systems are available with multiple SCSI busses.
Table 3. Smart-2 Controller Family

Model                   Number of SCSI Channels (Busses)   Maximum Number of Spindles
Smart-2SL Controller    2                                  30
Smart-2DH Controller    2                                  30
Smart-2/P Controller    2                                  30
Smart-2/E Controller    2                                  30
Smart Array 3100 ES     3                                  45
Smart Array 3200        2                                  30
Table 4. ProLiant Storage Systems

Model                         Number of SCSI Channels (Busses)   Maximum Number of Drives   SCSI Revision Supported
ProLiant Storage System U1    1                                  7                          Wide-Ultra SCSI-3
ProLiant Storage System U2    2                                  8                          Wide-Ultra SCSI-3
ProLiant Storage System       1                                  7                          Fast-Wide SCSI-2
ProLiant Storage System UE    2                                  12 (1") or 8 (1.6")        Wide-Ultra SCSI-3
The SMART-2 family of controllers and the ProLiant Storage System boxes provide an
integrated storage array solution. The various topologies these combinations present must be
load balanced with an awareness of where the busses are and how the combined load is
distributed across them. Details of the Smart-2 family of array controllers and the ProLiant
Storage System boxes are presented in the two tables above.
On controllers with two independent busses, making certain that the I/O load is evenly distributed
can provide higher sustained throughput from the SCSI interface. When drives of similar
performance and load are used, balancing the SCSI busses is as simple as dividing the number of
drives on each bus evenly. However, drives should not be divided evenly if dividing the drives
between the SCSI busses will require placing a higher I/O load on one of the two busses.
SCSI Bandwidth & Saturation
Because disk requests can be combinations of reads or writes, random or sequential, and small or
large, SCSI throughput is the most application-dependent I/O factor. In some server mixes the
bandwidth limit will never be reached. If you are doing random data retrieval, your drives will
spend more time seeking the data, and the transfers are so small that each one will be off the SCSI
bus before the drive completes the next read. When small, random reads are the norm, you
can have many more devices on your SCSI bus before you reach saturation.
However, large block data transfer environments, such as video editing, will cause the drives to
do fewer seeks and retrieve large, contiguous streams of data. In this case, a Wide-Ultra SCSI-3
bus can be saturated by as few as 4 drives. Once the SCSI bus has become saturated, adding more
drives can actually degrade performance.
Be aware of the load placed on your server and maximize SCSI bandwidth accordingly.
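A rough way to estimate the saturation point is to divide the bus's peak bandwidth by the sustained throughput of one drive. The per-drive figure below is an illustrative assumption for late-1990s drives streaming sequential data, not a measured value:

```python
import math

def drives_to_saturate(bus_mb_s, per_drive_mb_s):
    """Smallest number of drives whose combined streaming rate
    meets or exceeds the bus's peak bandwidth."""
    return math.ceil(bus_mb_s / per_drive_mb_s)

# A Wide-Ultra SCSI-3 bus peaks at 40 MB/s. Assuming ~10 MB/s of
# sequential throughput per drive, about 4 drives saturate the bus,
# consistent with the "as few as 4 drives" figure above.
print(drives_to_saturate(40, 10))
```

For random, small-block mixes the per-drive rate is far lower, which is why many more drives fit on one bus before saturation in those environments.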
Fibre Channel
Fibre Channel (FC) is the next generation in storage technology. FC combines a high-speed
connection between server and storage with flexibility and expandability. This high-speed link is