Brocade Communications Systems 53-0001575-01 User Manual

SAN Design: March 29, 2001 3:18 pm
Building and Scaling BROCADE© SAN Fabrics: Design and Best Practices Guide
53-0001575-01 BROCADE Technical Note Page: 1 of 31
BROCADE SAN Integration and Application Department
Design and Best Practices Guide
This document contains BROCADE© recommendations and guidelines for configuring a Storage Area Network (SAN). The document includes several reference topologies and also provides pointers to products and solutions from BROCADE partners that can be used to implement the target configuration or solution. This information is for reference only and is meant to provide some ideas and starting points for a SAN design. Brocade provides training courses that cover SAN design in more depth; see the Brocade web site, www.brocade.com, to sign up for these courses.
1.0 Introduction
This document details SAN topologies supported by Brocade SilkWorm 2x00 switch fabrics and provides guidance on the number of end user ports that can reliably be deployed based on testing done to date. Exceeding these port count guidelines may have unpredictable effects on fabric stability.
BROCADE is providing this document as a starting point for users interested in implementing a Storage Area Network (SAN). The target users of this document are individuals who are responsible for developing a SAN architecture on behalf of a client user, or end users who desire information to aid in developing a SAN architecture. Section 1 describes SAN topologies and maximum size configurations supported as of the publication of this document. Section 2 presents information that is related to SAN design and that should prove helpful when designing and implementing a SAN. This information relates to cabling, inter-switch links, switch counts, and fabric management options. BROCADE is working with OEMs and integrators exploring fabric solutions of 15, 20, and 30 or more switches. This paper is designed to help in developing fabric solutions that use a large number of switches with hundreds of end nodes in tested and proven topologies. This is a work in progress; as additional information is developed and larger SAN designs are tested, this document will be updated.
Brocade provides SAN design guidance via our sales force to partners and end users. If you need to exceed the limits presented in this guide, please contact a Brocade sales representative to receive help and guidance in designing your fabric.
2.0 Fabric Topologies
This section explores a variety of fabric topologies and provides some specific network examples for SAN fabrics. Topologies fall into the following general categories:
Meshed Topology-- a network of switches that has at least one link to each adjacent switch. Fully meshed designs will have a connection from each switch in the fabric to all other switches in the fabric. Other topologies are a specific instance of a mesh design.
Star Topology-- central switch(es) with some or all ports used to connect to other switches; edge switches connect only to the center switches.
Tier Architecture -- a switch hierarchy of two or more levels with inter-switch connections that assume data paths go from one side (hosts) to the other side (targets).
Each of these topologies has advantages and disadvantages. The SAN designer should be aware of the features and benefits of each design when building a solution for a specific customer environment. Some advantages and disadvantages are detailed here:
Meshed Topology Designs
Provide any-to-any connectivity for devices
Good for designs where locality of data is known and hosts and targets can be located on the same switch, but where some any-to-any connectivity is needed
Provide for resiliency on switch failure, with the fabric able to re-route traffic via other switches in the mesh
Allow for expansion at the edges without disruption of the fabric and attached devices
Allow for scaling in size as port count demands increase (see SAN building block in the sample configuration section)
Host and storage devices can be placed anywhere in the fabric
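The port cost of a fully meshed design can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes identical switches and one ISL between every pair of switches (the `isls_per_pair` parameter is a hypothetical knob), and the function name and example sizes are not tested configurations from this guide.

```python
def full_mesh_ports(switch_count, ports_per_switch, isls_per_pair=1):
    """Return (total ISLs, ISL ports per switch, end user ports) for a full mesh.

    ASSUMPTION: every switch has the same port count and is linked to every
    other switch by `isls_per_pair` inter-switch links.
    """
    if switch_count < 1:
        raise ValueError("need at least one switch")
    isl_ports_per_switch = (switch_count - 1) * isls_per_pair
    if isl_ports_per_switch > ports_per_switch:
        raise ValueError("not enough ports to fully mesh this many switches")
    # Each pair of switches is counted once.
    total_isls = switch_count * (switch_count - 1) // 2 * isls_per_pair
    user_ports = switch_count * (ports_per_switch - isl_ports_per_switch)
    return total_isls, isl_ports_per_switch, user_ports


print(full_mesh_ports(4, 16))   # (6, 3, 52)
print(full_mesh_ports(8, 16))   # (28, 7, 72)
```

Doubling the switch count from four to eight adds only 20 end user ports in this example, because each added switch spends more of its ports on ISLs.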
Star Topology Designs
Two hops maximum, consistent latency
Multiple equal cost paths allowing for load sharing at time of configuration of fabric
Easy to start small and scale
Two paths through the core from edge switches allow for failover
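The hop count and path count properties of a star can be checked on a small model. The sketch below assumes a star in which every edge switch has one ISL to every core switch; the switch names and helper functions are hypothetical and are used only for illustration.

```python
from collections import deque


def build_star(core_count, edge_count):
    """Adjacency map for a star: every edge switch links to every core switch."""
    cores = [f"core{i}" for i in range(core_count)]
    edges = [f"edge{i}" for i in range(edge_count)]
    adj = {name: set() for name in cores + edges}
    for e in edges:
        for c in cores:
            adj[e].add(c)
            adj[c].add(e)
    return adj


def hops(adj, src, dst):
    """Breadth-first search: number of ISLs crossed on a shortest path."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None  # dst is not reachable from src


def max_edge_to_edge_hops(adj):
    """Worst-case hop count between any two edge switches."""
    edges = [n for n in adj if n.startswith("edge")]
    return max(hops(adj, a, b) for a in edges for b in edges if a != b)


def equal_cost_paths(adj, src, dst):
    """Two-hop paths between two edge switches, one per shared core switch."""
    return len(adj[src] & adj[dst])


star = build_star(core_count=2, edge_count=6)
print(max_edge_to_edge_hops(star))               # 2
print(equal_cost_paths(star, "edge0", "edge5"))  # 2
```

With two core switches, every pair of edge switches is two hops apart over two equal cost paths, so the loss of one core leaves one path in place.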
Tier Architecture Designs
Typically three layers of switches, a host layer, core layer, and storage layer
Natural extension to star design (in some cases, Tier designs are Stars)
Core switches provide connectivity between host and storage layer switches and include redundant switching elements
Each layer can be scaled independently
Cores can range from simple to more complex and are easily replaced with higher port count switches
Can still use knowledge of data locality in placing devices in fabric
Allows for bandwidth improvement by using multiple ISLs where needed
Multiple paths in fabric allowing for redundant path selection should a failure occur
2.1 REDUNDANCY AND FAILOVER
One of the major benefits of a Fibre Channel fabric is the ability of the fabric to recover from failures of individual fabric elements as long as the design includes redundant paths. The BROCADE SilkWorm design supports auto re-configuration of the fabric when switches are added to or removed from the fabric. This allows for auto discovery of alternate routes between fabric nodes, with the routing algorithm determining the most efficient route between nodes based on the currently available switches. To take advantage of this feature, the basic fabric design should have built-in redundancy to allow for alternate paths to end node devices. A single switch fabric will have no alternate paths between devices should the switch fail. However, a simple two switch fabric can be designed, along with redundant elements in the host and storage nodes, to allow for failure of a single switch and to use a route through an alternate switch.
A number of factors should be considered when designing a fabric, and there is no one answer or single topology that addresses all problems. Each user will have unique system elements and design needs that will need to be factored into the fabric design. The later portion of this document provides a number of design topologies that can be used as templates for fabric designs. Key elements to consider are:
How much redundancy is required? Hosts with key applications should have redundant paths to required storage (via the fabric), meaning multiple HBAs per host should be considered so a single HBA failure will not result in loss of host access to data.
Storage considerations. RAID devices provide more reliability and resilience to failure of a single drive and allow for auto-recovery on failure. High availability designs should use RAID storage devices as the building blocks for storage -- these devices have built-in recovery when using RAID 1 (mirroring) or RAID 5 (striping with parity). A greater level of reliability can be achieved by mirroring the storage device remotely using the switch support for devices at 10KM distance (or more using devices that support extended distance optical signaling). Some RAID subsystems include the ability to mirror writes to another disk system as a feature of the disk controller; software support for this feature (e.g. Veritas) also exists. A critical storage node can be mirrored locally within a fabric or mirrored across an extended fabric link. BROCADE provides a licensed software option (Extended Switch, available in release 2.1.3) that allows for increasing the E-port buffer credits for extended links. [Buffer credits allow the sending device to continue to send data up to the credit limit without having to wait for acknowledgment, improving performance. More credits allow for a greater pipeline of data on a link, which is particularly useful when transmitting over extended distances.] The extended fabric option is
useful when combined with a link extender that can allow from 30 to 120 kilometers distance between switch elements. There is a latency penalty for extended links that needs to be considered where performance is a concern. Shorter links have lower latency, with roughly 100 microseconds of delay per 10KM of distance for round trip traffic.
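The latency figure above can be turned into a rough planning estimate. The sketch below uses the 100 microseconds per 10KM round trip figure from this guide. The credit estimate rests on two assumptions that do not come from this guide: full-size 2148-byte Fibre Channel frames and a nominal 100 MB/s link data rate. Treat the result as an order-of-magnitude guide, not a configuration value.

```python
import math

ROUND_TRIP_US_PER_10KM = 100.0  # figure quoted in this guide
FRAME_BYTES = 2148              # ASSUMPTION: full-size Fibre Channel frame
LINK_BYTES_PER_SEC = 100e6      # ASSUMPTION: nominal 1 Gbit/s FC data rate


def round_trip_latency_us(distance_km):
    """Round trip propagation delay in microseconds for a link of this length."""
    return distance_km / 10.0 * ROUND_TRIP_US_PER_10KM


def credits_to_fill_link(distance_km):
    """Frames that can be in flight before the first acknowledgment returns."""
    frame_time_us = FRAME_BYTES / LINK_BYTES_PER_SEC * 1e6
    return math.ceil(round_trip_latency_us(distance_km) / frame_time_us)


for km in (10, 40, 120):
    print(km, round_trip_latency_us(km), credits_to_fill_link(km))
```

Under these assumptions the number of credits needed to keep the link busy grows linearly with distance, which is why additional E-port buffer credits matter most on the longest links.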
RAID devices have the added benefit of requiring only one switch port, and an intelligent RAID controller can support multiple SCSI or Fibre Channel drives behind it. RAID controllers also relieve hosts of dedicating CPU cycles to supporting software RAID. The trade-off is cost and performance. A loop of disks contained in a JBOD can also be attached to a single switch port and managed via software RAID. Redundant loops can be used to provide for high availability to stored data.
Host systems can be designed to be passive fabric elements and only activated when a primary host system fails. Designs that use two active hosts sharing the same data can also be achieved. An example of a remote mirrored high availability design is detailed later in this paper.
2.2 REDUNDANT FABRICS
The previous section discussed redundant elements within a fabric design. Another design approach is to use redundant fabrics, in which two independent switch fabrics are used. The simplest version of this is a two switch solution where the switches are not connected. [See the first example in section 2.0]. Should a single switch fail in this simplest design, data is routed via the second switch in the alternate fabric. Recovery to the alternate switch occurs at the host/device driver level, where failure of a data path can be noted and an alternate path to storage can be selected.
There are four levels of redundancy possible within a fabric design. From least reliable to most reliable, they are:
Single fabric, non-resilient
All switches are connected in such a way as to form a single fabric, and this fabric contains at least one single point of failure.
Single fabric, resilient
All switches are connected in such a way as to form a single fabric, but no single point of failure exists which could cause the fabric to segment.
Dual fabric, non-resilient
Half of the switches are connected to form one fabric, and the other half form an identical fabric, which is completely unconnected to the first fabric. Within each fabric, at least one single point of failure exists. This can be used in combination with dual attach hosts and storage to keep a solution up even when one entire fabric fails.
Dual fabric, resilient
Half of the switches are connected to form one fabric, and the other half form an identical fabric, which is completely unconnected to the first fabric. There is no single point of failure in either fabric which could cause the fabric to segment. This can be used in combination with dual attach hosts and storage to keep a solution up even when one entire fabric fails. This is generally the best approach to take to SAN design for high availability.
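The four levels above can be expressed as a check on the switch and ISL graph. The sketch below is one possible interpretation, not a BROCADE tool: it treats each connected group of switches as a fabric, and calls a fabric resilient if removing any one switch or any one ISL leaves the remaining switches connected. It also assumes that a fabric made of a single switch is non-resilient, and it does not check that the two fabrics are identical. All names are hypothetical.

```python
from collections import defaultdict


def fabrics(switches, isls):
    """Group switches into fabrics (connected components over ISLs)."""
    switches = set(switches)
    adj = defaultdict(set)
    for a, b in isls:
        if a in switches and b in switches:
            adj[a].add(b)
            adj[b].add(a)
    seen, groups = set(), []
    for start in sorted(switches):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups


def is_resilient(fabric, isls):
    """True if no single switch or ISL failure segments the fabric."""
    links = [(a, b) for a, b in isls if a in fabric and b in fabric]
    if len(fabric) < 2:
        return False  # ASSUMPTION: a lone switch is a single point of failure
    for switch in fabric:
        if len(fabrics(fabric - {switch}, links)) > 1:
            return False
    for i in range(len(links)):
        # Drop one ISL; parallel ISLs between the same pair remain in place.
        if len(fabrics(fabric, links[:i] + links[i + 1:])) > 1:
            return False
    return True


def classify(switches, isls):
    """Return one of the four redundancy levels described above."""
    groups = fabrics(switches, isls)
    if len(groups) not in (1, 2):
        raise ValueError("expected one or two fabrics")
    count = "Single" if len(groups) == 1 else "Dual"
    resilient = all(is_resilient(g, isls) for g in groups)
    return f"{count} fabric, {'resilient' if resilient else 'non-resilient'}"


def star(prefix, cores, edges):
    """ISL list for a star: every edge switch linked to every core switch."""
    return [(f"{prefix}c{c}", f"{prefix}e{e}")
            for c in range(cores) for e in range(edges)]


def names(isls):
    return {s for link in isls for s in link}


a, b = star("A", 2, 3), star("B", 2, 3)
print(classify(names(a), a))            # Single fabric, resilient
print(classify(names(a + b), a + b))    # Dual fabric, resilient
print(classify({"sw1", "sw2"}, []))     # Dual fabric, non-resilient
```

The brute-force removal test is adequate for fabrics of the size discussed in this guide; a larger model could use an articulation point and bridge search instead.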
FIGURE 1. Levels of Redundancy. This Figure shows an example of each of the types of resilient designs.
[Figure panels: single fabric, non-resilient; single fabric, resilient; dual fabric, non-resilient; dual fabric, resilient. The figure also marks a single point of failure (SPF).]
The following discussion will be about single fabrics with resiliency. If a dual fabric, resilient design is desired (in fact, this is recommended), simply pick the appropriate single fabric design and build two of them.
FIGURE 2. Two Core Switch, Two ISL, Star Design.
The above SAN is a single fabric, resilient design. To deploy a dual fabric, resilient SAN based upon this architecture, the following SAN would be built:
FIGURE 3. Dual Fabric Design, Hosts and Storage Connected to Two Independent Fabrics.
[Figure labels: hosts and storage have dual connections; there is no connection between the two fabrics.]
Redundant fabrics allow SAN management to take place on one SAN while the other SAN stays in operation. For a site where high availability is mandatory and where significant down time cannot be tolerated, this design approach is the most prudent.
When designing a fabric solution, consider using two redundant fabrics to provide the maximum flexibility in terms of SAN downtime and maintenance. This solution will allow for:
Switch upgrades (firmware, hardware, or both) on one SAN while the redundant SAN stays in operation
Failover to the redundant SAN on a switch failure in one SAN, while the failed switch is replaced
Elimination of the single point of failure in the system. While highly meshed and redundant connections in a single fabric are possible, a SAN installed as a redundant set of device inter-connects is simpler to maintain and provides insurance against failure of a single fabric
2.3 SAMPLE SWITCH CONFIGURATIONS
This section provides a number of switch fabric designs using the topologies defined above. Simple switch designs are shown along with more complex meshed fabric designs, Tier Architecture designs, and Star topology designs. This section also provides guidance on the overall size, in the form of port counts, that can reliably be deployed today. As testing is completed for larger port count topologies, they will be added to this document.
2.4 DUAL SWITCH HIGH AVAILABILITY CONFIGURATION - REDUNDANT FABRIC
FIGURE 4. Two Fabrics -- Simplest Redundant Fabric Configuration
[Figure elements: two hosts (H), two switches (SW), and one storage device (D).]
Can use multiple hosts sharing single disk device, with redundant paths in two fabrics
Redundant switches, not combined into fabric
Dual HBAs in hosts, host level software provides for failover to alternate HBA when failure is noted in one HBA
Dual ported storage allows for either host to access the same data
With 8 port switches, this configuration can support 4 hosts and two disk devices; larger configurations are possible with 16 port switches
H - host SW - switch D - storage device
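The port budget for this configuration can be sketched as follows. The sketch assumes that every host has exactly two HBAs, one cabled to each switch, and that every storage device has exactly two ports, one cabled to each switch, so each device consumes one port on each switch. The function names are hypothetical.

```python
def ports_used_per_switch(hosts, storage_devices):
    """Ports consumed on EACH of the two switches.

    ASSUMPTION: every host and storage device is dual attached,
    with one connection to each switch.
    """
    return hosts + storage_devices


def spare_ports_per_switch(ports_per_switch, hosts, storage_devices):
    """Unused ports left on each switch; raises if the devices do not fit."""
    spare = ports_per_switch - ports_used_per_switch(hosts, storage_devices)
    if spare < 0:
        raise ValueError("configuration needs more ports than the switch has")
    return spare


# The 8 port example from the list above: 4 hosts and 2 disk devices.
print(ports_used_per_switch(4, 2))      # 6
print(spare_ports_per_switch(8, 4, 2))  # 2
```

Under these assumptions the 8 port example leaves two ports free on each switch.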
2.5 TWO SWITCH FABRIC FOR MIRRORING AND DISASTER TOLERANCE
FIGURE 5. Extended Fabric Example
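[Figure elements: a host (H), a switch (SW), and a storage device (D) at each of two sites, with a 10 KM link between the two switches.]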
Sample configuration showing only two hosts and two storage devices; larger configurations can be deployed
This example shows how data being used at the local site can be mirrored at a remote site via an extended fabric link. Primary system data is replicated at the remote site, where a backup failover system is located. The remote site can be at 10KM distance with standard FC components (Long Wave Length GBICs). Extended distances (20 to 120 KM) are possible using optical extender devices or DWDM devices.
Starting with Fabric OS version 2.1.1, optional software to support extended fabric is available. This allows an increase in buffer-to-buffer credits on E-ports for maximum performance on links extended over long distances. This option is recommended when links are extended beyond 40KM.
Mirroring across the link is accomplished by use of host based mirroring software or by storage based mirroring options.
Local Site: Primary Compute Site
Remote Site: Activated when primary site fails
Note: this configuration does not show a highly available solution; there are single points of failure. It illustrates the concept of remotely mirroring data for disaster tolerance. Other, more highly available solutions are possible at both sites.
Alternative designs are possible where both sites mirror to each other; sites can also consist of multi-switch fabrics connected over an extended link.
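The distance guidance in this section can be summarized as a small lookup. The sketch below is one reading of the figures given above: up to 10KM with Long Wave Length GBICs, longer distances up to 120KM with optical extender or DWDM devices, and the extended fabric option recommended beyond 40KM. Treating every distance over 10KM as needing an extender is an assumption, since the text only names the 20 to 120KM range. The function name and return values are hypothetical.

```python
def plan_extended_link(distance_km):
    """Return (link components, extended fabric option recommended?)."""
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    if distance_km > 120:
        raise ValueError("beyond the distances covered by this guide")
    if distance_km <= 10:
        components = "Long Wave Length GBICs"
    else:
        # ASSUMPTION: anything past 10KM needs an extender or DWDM device.
        components = "optical extender or DWDM devices"
    return components, distance_km > 40


for km in (10, 25, 60):
    print(km, plan_extended_link(km))
```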