Best Practices
Dell EMC VPLEX: SAN Connectivity

Abstract
Implementation planning and best practices

H13546
January 2021
Revisions
Date            Description
May 2020        Version 4: Updated ALUA connectivity requirements
January 2021    Added metro node content
Acknowledgments
Author: VPLEX CSE Team (VPLEXCSETeam@emc.com)
This document may contain certain words that are not consistent with Dell's current language guidelines. Dell plans to update the document over subsequent future releases to revise these words accordingly.
This document may contain language from third party content that is not under Dell's control and is not consistent with Dell's current guidelines for Dell's own content. When such third party content is updated by the relevant third parties, this document will be revised accordingly.
The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
Use, copying, and distribution of any software described in this publication requires an applicable software license.
Copyright © January 2021 Dell Inc. or its subsidiaries. All Rights Reserved. Dell Technologies, Dell, EMC, Dell EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners. [1/19/2021] [Best Practices] [H13546]
Table of contents
Revisions
Acknowledgments
Table of contents
Executive summary
1 Frontend connectivity
1.1 Frontend/host initiator port connectivity
1.2 Host cross-cluster connect
1.3 VBLOCK and VPLEX front end connectivity rules
2 ESXi path loss handling
2.1 Path loss handling semantics (PDL and APD)
2.1.1 Persistent device loss (PDL)
2.1.2 vSphere storage for vSphere 5.0 U1
2.1.3 vSphere storage for vSphere 5.5
2.1.4 APD handling
2.2 vSphere storage for vSphere 5.1
2.2.1 Disable storage APD handling
2.2.2 Change timeout limits for storage APD
2.2.3 PDL/APD references
3 VPLEX backend array connectivity
3.1 Back-end/storage array connectivity
3.1.1 Active/active arrays
3.1.2 Active/passive and ALUA arrays
3.1.3 Additional array considerations
4 XtremIO connectivity
4.1 XtremIO overview
4.2 VPLEX connectivity to XtremIO
4.2.1 Single engine/single X-Brick
4.2.2 Single engine/dual X-Brick
4.2.3 Single engine/quad X-Brick
4.2.4 Dual engine/single X-Brick
4.2.5 Dual engine/dual X-Brick
4.2.6 Dual engine/quad X-Brick
4.2.7 Quad engine/single X-Brick
4.2.8 Quad engine/dual X-Brick
4.2.9 Quad engine/quad X-Brick
4.3 XtremIO 6 X-Brick connectivity
4.3.1 Solution 1
4.3.2 Solution 2
5 PowerStore and Dell EMC Unity XT
5.1 Metro node feature
5.2 PowerStore
5.3 Dell EMC Unity XT
5.4 Metro node connectivity
5.5 Backend connectivity
5.6 Dell EMC Unity XT and single-appliance PowerStore connectivity
5.7 PowerStore dual-appliance connectivity
5.8 Frontend/host initiator port connectivity
A Technical support and resources
Executive summary
There are four general areas of connectivity for VPLEX:
1. Host to VPLEX connectivity
2. VPLEX to array connectivity
3. WAN connectivity for either Fibre Channel or Ethernet protocols
4. LAN connectivity
This document addresses frontend SAN connectivity for host access to Dell EMC VPLEX Virtual Volumes and VPLEX backend SAN connectivity for access to storage arrays. WAN and LAN connectivity are covered in the VPLEX Networking Best Practices document.
Host connectivity to VPLEX, referred to as frontend connectivity, has specific requirements which differ from the backend (VPLEX to array) connectivity. These requirements are based on architectural design limitations and are checked during health checks and NDU pre-checks. Connectivity violations will be flagged and will prevent the system from being upgraded. Bypassing these requirements is allowed for test environments or proofs of concept, but it is not supported for production environments. System limits are available in the Release Notes for each GeoSynchrony code level.
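The checks referred to above can also be run on demand from the VPLEX CLI. As a sketch (command names may vary slightly by GeoSynchrony release, so verify against the CLI guide for your release):
VPlexcli:/> health-check
VPlexcli:/> ndu pre-check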
Array connectivity varies based on array type and on performance or archival requirements. Path limit requirements are based on the "active" path count, so the total path count would be double for active/passive or ALUA arrays; for example, an array presented with four active paths would have eight paths provisioned in total. ALUA arrays are considered active/passive from the VPLEX point of view because the non-preferred paths are not used except when the primary controller no longer has control of the LUN, such as during an array NDU. All of an active/active array's paths are considered active regardless of the host multipath software.
Content has been added to cover metro node. Metro node is an external hardware/software add-on feature for PowerStore and Dell EMC Unity XT arrays, based on the Dell PowerEdge R640 platform, that provides active/active synchronous replication as well as the standard local use cases. It also provides a local solution, the Local mirror feature, to protect data from a potential array failure. Both of these use cases provide true continuous availability with zero downtime.
Factors that drive the decision on which product to use for protection revolve around RTO (Recovery Time Objective), how long you can afford your applications to be down; RPO (Recovery Point Objective), how much data you can afford to lose when you roll back to a previous known good point in time; and DTO (Decision Time Objective), the time it takes to make the decision to cut over and roll back the data, which adds to the RTO. Metro node provides an active/active storage solution that allows data access for a multitude of host types and applications, giving them zero data unavailability (DU) and zero data loss (DL), which translates into zero DTO for any type of failure up to and including the loss of an entire datacenter. Furthermore, metro node provides transparent data access even during an array outage, either through locally mirrored arrays or through a Metro configuration where the read/write I/O is serviced by the remote site. For this reason the solution is superior to array-based replication: it achieves zero RTO, zero RPO, and zero DTO.
The feature is deployed as a greenfield solution and is intended to be purchased as part of the same order as the PowerStore or Unity XT array. All brownfield deployments (those requesting to upgrade an existing environment for active/active synchronous replication) are supported by VPLEX VS6 hardware/software, and the best practices for that solution are covered in the VPLEX documentation.
The metro node feature must be installed initially as either a Local or a Metro solution. Upgrading from a Local to a Metro at a later date is currently not supported.
This document covers best practices for installation, configuration, and operation of the metro node feature. While there are similarities with the VPLEX product, there are differences between the product code bases that may result in changes to processes, procedures, or even some commands, which makes this the authoritative document for metro node.
Due to feature differences between PowerStore and Unity XT, some connectivity and configuration details are specific to the array platform. Familiarize yourself with the specific differences outlined within this document.
Most synchronous replication solutions require like-for-like configurations between the two sites, meaning the same array models and microcode. With metro node, this applies at the metro node layer but not at the array level; the only recommendation is that the arrays at both sites have similar performance levels. For example, it is fully supported to have a Unity XT array at one site of the Metro and a PowerStore array at the other, and this mixed configuration is supported starting with Ansible 1.1.
Metro node supports both Local RAID-1 and Metro Distributed RAID-1 device structures. PowerStore can take advantage of Local RAID-1 by mirroring across appliances in a clustered PowerStore configuration, protecting the data and allowing host access even during an outage of one appliance. Local RAID-1 resynchronizations are always full resynchronizations, whereas a Distributed RAID-1 uses the Logging volume mapping files for incremental rebuilds.
1 Frontend connectivity
1.1 Frontend/host initiator port connectivity
- Dual fabric designs are considered a best practice.
- The front-end I/O modules on each director should have a minimum of two physical connections, one to each fabric (required).
- Each host should have at least one path to an A director and one path to a B director on each fabric, for a total of four logical paths (required for NDU).
- Multipathing or path failover software is required at the host for access across the dual fabrics.
- Each host should have fabric zoning that provides redundant access to each LUN from a minimum of an A and a B director from each fabric (see the illustrative zoning sketch after this list).
- Four paths are required for NDU.
- Dual and quad engine VPLEX clusters require spanning engines for host connectivity on each fabric.
- Observe director CPU utilization and schedule NDU for times when average director CPU utilization is below 50%.
  - Use the GUI Performance Dashboard in GeoSynchrony 5.1 or newer.
- Skipping the NDU pre-checks would be required for host connectivity with fewer than four paths and is not considered a best practice.
  - This configuration would be supported for a test environment or POC.
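The following is a minimal zoning sketch for one host with two HBAs on a dual-fabric, single engine cluster, using single-initiator/single-target zones. It assumes Brocade FOS CLI and hypothetical alias names (host1_hba0, vplex_A0_FC00, and so on); adapt the aliases, zone names, and zoning configuration names to your own fabrics.
Fabric A:
zonecreate "host1_hba0_vplex_A0_FC00", "host1_hba0; vplex_A0_FC00"
zonecreate "host1_hba0_vplex_B0_FC00", "host1_hba0; vplex_B0_FC00"
cfgadd "fabricA_cfg", "host1_hba0_vplex_A0_FC00; host1_hba0_vplex_B0_FC00"
cfgenable "fabricA_cfg"
Fabric B:
zonecreate "host1_hba1_vplex_A0_FC01", "host1_hba1; vplex_A0_FC01"
zonecreate "host1_hba1_vplex_B0_FC01", "host1_hba1; vplex_B0_FC01"
cfgadd "fabricB_cfg", "host1_hba1_vplex_A0_FC01; host1_hba1_vplex_B0_FC01"
cfgenable "fabricB_cfg"
This gives the host one path to an A director and one to a B director on each fabric, which is the four-path minimum required for NDU.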
Note: For cluster upgrades, when going from a single engine to a dual engine cluster or from a dual to a quad engine cluster, you must rebalance the host connectivity across the newly added engines. Adding engines and then not connecting host paths to them is of no benefit. The NDU pre-check will flag host connectivity that does not span engines in a multi-engine VPLEX cluster as a configuration issue. When scaling up a single engine cluster to a dual, the NDU pre-check may have passed initially but will fail after the addition of the new engine, which is why the host paths must be rebalanced across both engines. A dual to quad upgrade will not flag an issue provided there were no issues prior to the upgrade; you may choose to rebalance the workload across the new engines or add additional hosts to the pair of new engines.
Complete physical connections to the VPLEX before commissioning/setup. Use the same FE/BE ports on each director to avoid confusion, that is, B0-FC00 and A0-FC00. Please refer to the hardware diagrams for port layout.
Host connectivity for Single Engine Cluster meeting NDU pre-check requirements
This illustration shows dual HBAs connected to two Fabrics with each connecting to two VPLEX directors on the same engine in the single engine cluster. This is the minimum configuration that would meet NDU requirements.
Please refer to the Release Notes for the total FE port IT Nexus limit.
Note: Each Initiator / Target connection is called an IT Nexus. Please refer to the VPLEX Hardware Reference guide available on SolVe for port layouts of other VPLEX hardware families.
Host connectivity for HA requirements for NDU pre-checks dual or quad engine
The previous illustration shows host connectivity with dual HBAs connected to four VPLEX directors. This configuration offers the increased level of HA required by the NDU pre-checks and applies to both dual and quad engine VPLEX clusters. It still only counts as four IT Nexuses against the limit identified in the Release Notes for that version of GeoSynchrony.
Note: Please refer to the VPLEX Hardware Reference guide available on SolVe for port layouts of other VPLEX hardware families.
Host connectivity for HA quad engine
The previous illustration shows host connectivity with dual HBAs connected to four VPLEX engines (eight directors). This configuration counts as eight IT Nexuses against the total limit as defined in the Release Notes for that version of GeoSynchrony. Hosts using active/passive path failover software such as VMware NMP should connect a path to all available directors and manually load balance by selecting a different director for the active path on different hosts.
Note: Most host connectivity for hosts running load balancing software should follow the recommendations for a dual engine cluster. The hosts should be configured across two engines, and subsequent hosts should alternate between pairs of engines, effectively load balancing the total combined I/O across all engines. This also reduces consumption of the IT Nexus system limit.
1.2 Host cross-cluster connect
Host cluster connected across sites to both VPLEX clusters
- PowerPath VE provides an auto standby feature created specifically for this environment.
- Host cross-cluster connect applies only to the specific host OS and multipathing configurations listed in the VPLEX ESSM.
- Host initiators are zoned to both VPLEX clusters in a Metro.
- Host multipathing software can be configured for active path/passive path, with the active path going to the local VPLEX cluster. When feasible, configure the multipathing driver to prefer all local cluster paths over remote cluster paths.
- Separate HBA ports should be used for the remote cluster connection to avoid merging of the local and remote fabrics.
- Connectivity at both sites follows the same rules as single host connectivity.
- Supported stretch clusters can be configured using host cross-cluster connect (refer to the VPLEX ESSM).
- Host cross-cluster connect is limited to a VPLEX cluster separation of no more than 1 ms latency.
- Host cross-cluster connect requires the use of VPLEX Witness.
- VPLEX Witness works with Consistency Groups only.
- Host cross-cluster connect must be configured using VPLEX Distributed Devices only.
- Host cross-cluster connect is supported in a VPLEX Metro synchronous environment only.
- At least one backend storage array is required at each site with redundant connections to the VPLEX cluster at that site. Arrays may be cross connected for metadata and logging volumes only.
- All Consistency Groups used in a host cross-cluster connect configuration are required to have the auto-resume attribute set to true.
The unique solution provided by host cross-cluster connect requires that hosts have access to both datacenters. The latency requirements for host cross-cluster connect can be achieved using an extended fabric or fabrics that span both datacenters. The use of backbone fabrics may introduce additional latency, preventing a viable use of host cross-cluster connect. The round-trip time (RTT) must be within 1 ms.
If using PowerPath VE, the only step the customer must take is to enable the autostandby feature:
#powermt set autostandby=on trigger=prox host=xxx
PowerPath will take care of setting to autostandby those paths associated with the remote (non-preferred) VPLEX cluster. PowerPath groups the paths by VPLEX cluster, and the cluster with the lowest minimum path latency is designated as the local/preferred cluster.
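To confirm the result, the autostandby setting and per-path modes can be checked with standard PowerPath commands (a sketch only; on PowerPath/VE for vSphere the equivalent rpowermt commands with the host= argument are used, and output formats vary by PowerPath version):
powermt display options
powermt display dev=all
Paths to the remote VPLEX cluster should be reported in an autostandby mode rather than active.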
1.3 VBLOCK and VPLEX front end connectivity rules
Note: All rules in bold cannot be broken. Rules in italics can be adjusted depending on customer requirements; if there are no specific requirements, simply use the suggested rule.
1. Physical FE connectivity
   a. Each VPLEX director has four front end ports: 0, 1, 2, and 3. In all cases, even ports connect to fabric A and odd ports to fabric B.
      i. For a single VBLOCK connecting to a single VPLEX:
         - Only ports 0 and 1 will be used on each director; ports 2 and 3 are reserved.
         - Connect even VPLEX front end ports to fabric A and odd ports to fabric B.
      ii. For two VBLOCKS connecting to a single VPLEX:
         - Ports 0 and 1 will be used for VBLOCK A.
         - Ports 2 and 3 will be used for VBLOCK B.
         - Connect even VPLEX front end ports to fabric A and odd ports to fabric B.
2. ESX Cluster Balancing across VPLEX Frontend
   All ESX clusters are evenly distributed across the VPLEX front end in the following patterns:

   Single engine:
                Engine 1
   Director     A                  B
   Cluster #    1,2,3,4,5,6,7,8    1,2,3,4,5,6,7,8

   Dual engine:
                Engine 1              Engine 2
   Director     A          B          A          B
   Cluster #    1,3,5,7    2,4,6,8    2,4,6,8    1,3,5,7

   Quad engine:
                Engine 1       Engine 2       Engine 3       Engine 4
   Director     A      B       A      B       A      B       A      B
   Cluster #    1,5    2,6     3,7    4,8     4,8    3,7     2,6    1,3

3. Host / ESX Cluster rules
   a. Each ESX cluster must connect to a VPLEX A and a B director.
   b. For dual and quad configs, the A and B directors must be picked from different engines (see the tables above for recommendations).
   c. The minimum number of directors that an ESX cluster connects to is 2 VPLEX directors.
   d. The maximum number of directors that an ESX cluster connects to is 2 VPLEX directors.
   e. Any given ESX cluster connecting to a given VPLEX cluster must use the same VPLEX frontend ports for all UCS blades regardless of host / UCS blade count.
   f. Each ESX host should see four paths to the same datastore:
      i. 2 across fabric A
         - A VPLEX A director port 0 (or 2 if second VBLOCK)
         - A VPLEX B director port 0 (or 2 if second VBLOCK)
      ii. 2 across fabric B
         - The same VPLEX A director port 1 (or 3 if second VBLOCK)
         - The same VPLEX B director port 1 (or 3 if second VBLOCK)
4. Pathing policy
   a. For non-cross-connected configurations, the adaptive pathing policy is recommended in all cases. Round robin should be avoided, especially for dual and quad systems.
   b. For cross-connected configurations, fixed pathing should be used, with the preferred path set per datastore to a local VPLEX path only, taking care to alternate and balance across the whole VPLEX front end (so that all datastores are not sending I/O to a single VPLEX director).
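As an illustrative sketch only, with a placeholder device ID and path name (the naa identifier and vmhba path below are hypothetical), the fixed policy and preferred path can be applied per device from the ESXi shell; the same settings can also be made per datastore in the vSphere client:
esxcli storage nmp device set --device naa.6000144000000010xxxxxxxxxxxxxxxx --psp VMW_PSP_FIXED
esxcli storage nmp psp fixed deviceconfig set --device naa.6000144000000010xxxxxxxxxxxxxxxx --preferred-path vmhba1:C0:T2:L10
esxcli storage nmp device list --device naa.6000144000000010xxxxxxxxxxxxxxxx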
2 ESXi path loss handling
2.1 Path loss handling semantics (PDL and APD)
vSphere can recognize two different types of total path failure on an ESXi 5.0 U1 or newer server. These are known as "All Paths Down" (APD) and "Persistent Device Loss" (PDL). Either of these conditions can be declared by the VMware ESXi server depending on the failure condition.
2.1.1 Persistent device loss (PDL)
A storage device is considered to be in the permanent device loss (PDL) state when it becomes permanently unavailable to your ESXi host. Typically, the PDL condition occurs when a device is unintentionally removed, its unique ID changes, when the device experiences an unrecoverable hardware error, or in the case of a vSphere Metro Storage Cluster WAN partition. When the storage determines that the device is permanently unavailable, it sends SCSI sense codes to the ESXi host. The sense codes allow your host to recognize that the device has failed and register the state of the device as PDL. The sense codes must be received on all paths to the device for the device to be considered permanently lost. If virtual machine files do not all reside on the same datastore and a PDL condition exists on one of the datastores, the virtual machine will not be killed. VMware recommends placing all files for a given virtual machine on a single datastore, ensuring that PDL conditions can be mitigated by vSphere HA.
When a datastore enters a Permanent Device Loss (PDL) state, High Availability (HA) can power off virtual machines and restart them later. A virtual machine is powered off only when issuing I/O to the datastore. Otherwise, it remains active. A virtual machine that is running memory-intensive workloads without issuing I/O to the datastore might remain active in such situations. VMware offers advanced options to regulate the power off and restart operations for virtual machines. The following settings apply only to a PDL condition and not to an APD condition.
Persistent device loss process flow
Advanced settings have been introduced in VMware vSphere 5.0 Update 1 and 5.5 to enable vSphere HA to respond to a PDL condition. The following settings are for the hosts and VMs in the stretched cluster consuming the virtual storage.
Note: PDL response works in conjunction with DRS rules. If the rule is set to “must”, VMware HA will not violate the rule. If the rule is set to “should”, VMware HA will violate it. The DRS rule should be set to “should” to provide availability.
2.1.2 vSphere storage for vSphere 5.0 U1
2.1.2.1 disk.terminateVMOnPDLDefault set to true:
For each host in the cluster, create and edit /etc/vmware/settings with Disk.terminateVMOnPDLDefault=TRUE, then reboot each host.
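A minimal sketch of that change from the ESXi shell, assuming the file format shown above (append the line only if it is not already present, then reboot the host):
grep -q terminateVMOnPDLDefault /etc/vmware/settings 2>/dev/null || echo "Disk.terminateVMOnPDLDefault=TRUE" >> /etc/vmware/settings
reboot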
2.1.2.2 das.maskCleanShutdownEnabled set to true:
HA Advanced Option. If the option is unset in 5.0 U1, a value of false is assumed, whereas in ESXi 5.1 and later, a value of true is assumed. When a virtual machine powers off and its home datastore is not accessible, HA cannot determine whether the virtual machine should be restarted. So, it must make a decision. If this option is set to false, the responding FDM master will assume the virtual machine should not be restarted, while if this option is set to true, the responding FDM will assume the virtual machine should be restarted.
2.1.3 vSphere storage for vSphere 5.5
2.1.3.1 disk.terminateVMOnPDLDefault set to default:
Advanced virtual machine option. The default value is FALSE. When set to TRUE, this parameter powers off the virtual machine if any device that backs the virtual machine's datastore enters the PDL state. HA will not restart this virtual machine. When set to DEFAULT, VMkernel.Boot.terminateVMOnPDL is used.
2.1.3.2 VMkernel.Boot.terminateVMOnPDL set to true:
Advanced VMkernel option. The default value is FALSE. When set to TRUE, this parameter powers off all virtual machines on the system when storage that they are using enters the PDL state. The setting can be overridden per virtual machine with the disk.terminateVMOnPDLDefault parameter. It can be set only to TRUE or FALSE. With the vSphere Web Client:
1. Browse to the host in the vSphere Web Client navigator.
2. Click the Manage tab and click Settings.
3. Under System, click Advanced System Settings.
4. In Advanced Settings, select the appropriate item.
5. Click the Edit button to edit the value.
6. Click OK.
7. Reboot the host.
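A possible command-line alternative to the Web Client steps above, assuming the option is exposed as the VMkernel setting name terminateVMOnPDL (this name is an assumption; confirm it on your ESXi build before using it):
esxcli system settings kernel set --setting=terminateVMOnPDL --value=TRUE
esxcli system settings kernel list -o terminateVMOnPDL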
2.1.3.3 das.maskCleanShutdownEnabled set to default:
HA Advanced Option. This option is set to TRUE by default. It allows HA to restart virtual machines that were powered off while the PDL condition was in progress. When this option is set to true, HA restarts all virtual machines, including those that were intentionally powered off by a user.
2.1.3.4 disk.AutoremoveOnPDL set to 0:
Advanced VMkernel option. The default is 1. In a vMSC environment, PDL conditions are likely temporary because one site has become orphaned from the other, in which case a failover has occurred. If devices in a PDL state are removed permanently, then when the failure or configuration error of the vMSC environment is fixed they will not automatically be visible to the hosts again; a manual rescan is required to bring the devices back into service. The whole point of a vMSC environment is to handle these situations automatically, so manual rescans should not be necessary. For this reason, the PDL AutoRemove functionality should be disabled on all hosts that are part of a vMSC configuration. Note that this is recommended for both Uniform and Non-Uniform vMSC configurations; any vMSC configuration that could cause a PDL should have the setting changed. To disable this feature:
1. Connect to the ESXi host using the console or SSH. For more information, see Using Tech Support Mode in ESXi 4.1 and ESXi 5.x (KB article 1017910).
2. Run this command to disable AutoRemove: esxcli system settings advanced set -o "/Disk/AutoremoveOnPDL" -i 0
Or with vSphere web client:
1. Browse to the host in the vSphere Web Client navigator.
2. Click the Manage tab and click Settings.
3. Under System, click Advanced System Settings.
4. In Advanced Settings, select the appropriate item.
5. Click the Edit button to edit the value.
6. Click OK.
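Whichever method is used, the current value can be confirmed from the ESXi shell; after the change it should report an integer value of 0:
esxcli system settings advanced list -o "/Disk/AutoremoveOnPDL"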
2.1.3.5 Permanent device loss
- Remove the device from VPLEX, or remove or take offline a LUN from the backend.
- WAN partition: disable the WAN ports on the switch, or log in to VPLEX and disable the WAN ports using the VPlexcli.
2.1.3.6 All paths down
- Remove the volume from the storage view.
- Remove FC ports from the ESXi host (can cause other errors).
- Disable FC ports on the switch.
2.1.4 APD handling
A storage device is considered to be in the all paths down (APD) state when it becomes unavailable to your ESXi host for an unspecified period of time. An APD state can be caused, for example, by a failed switch.
In contrast with the permanent device loss (PDL) state, the host treats the APD state as transient and expects the device to be available again.
The host indefinitely continues to retry issued commands in an attempt to reestablish connectivity with the device. If the host's commands fail the retries for a prolonged period of time, the host and its virtual machines might be at risk of having performance problems and potentially becoming unresponsive.
With vSphere 5.1, a default APD handling feature was introduced. When a device enters the APD state, the system immediately turns on a timer and allows your host to continue retrying non-virtual machine commands for a limited time period.
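The APD handling behavior is governed by the Misc.APDHandlingEnable and Misc.APDTimeout advanced settings (the vSphere default timeout is 140 seconds). As a sketch, they can be inspected, and the timeout adjusted if required, from the ESXi shell; confirm the values appropriate for your environment before changing them:
esxcli system settings advanced list -o "/Misc/APDHandlingEnable"
esxcli system settings advanced list -o "/Misc/APDTimeout"
esxcli system settings advanced set -o "/Misc/APDTimeout" -i 140
Section 2.2 covers disabling APD handling and changing the timeout limits in more detail.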