Configuring Self Migration from the D2D Web Management Interface ............................................... 123
Index .............................................................................................................................................. 129
For more information ........................................................................................................................ 131
3
Product
Generation
Product Number
HP StorageWorks D2D4004i/4009i
G1
EH938A/EH939A
HP StorageWorks D2D4004fc/4009fc
G1
EH941A/EH942A
HP StorageWorks D2D2503i
G1
EH945A
HP StorageWorks D2D4112
G1
EH993A
HP StorageWorks D2D2504i
G1
EJ002A
HP StorageWorks D2D2502i
G1
EJ001A
HP StorageWorks D2D4324
G2
EH985A
HP StorageWorks D2D4312
G2
EH983A
HP StorageWorks D2D4106i
G2
EH996A
HP StorageWorks D2D4112
G2
EH993B
HP StorageWorks D2D4106fc
G2
EH998A
HP StorageWorks D2D2504i
G2
EJ002B
HP StorageWorks D2D2502i
G2
EJ001B
HP StorageWorks D2D3003i
G2
AY149A
Abstract
The HP StorageWorks D2D Backup System products with Dynamic Data Deduplication are Virtual Tape library
and NAS appliances designed to provide a cost-effective, consolidated backup solution for business data and
fast restore of data in the event of loss.
In order to get the best performance from a D2D Backup System there are some configuration best practices that
can be applied. These are described in this document.
Related products
Information in this document relates to the following products:
Validity
This document replaces the document “HP StorageWorks D2D Backup System Best Practices for Performance
Optimization” (HP Document Part Number EH990-90921) to include more detailed content; the previous
document is now obsolete.
Best practices identified in this document are predicated on using up-to-date D2D system firmware (check
www.hp.com/support for available firmware upgrades). In order to achieve optimum performance after
upgrading from older firmware there may be some pre-requisite steps; see the release notes that are available
with the firmware download for more information.
4
Executive summary
This document contains detailed information on best practices to get good performance from an HP D2D Backup
System with HP StoreOnce Deduplication Technology.
HP StoreOnce Technology is designed to increase the amount of historical backup data that can be stored
without increasing the disk space needed. A backup product using deduplication combines efficient disk usage
with the fast single file recovery of random access disk.
As a quick reference these are the important configuration options to take into account when designing a backup
solution:
General D2D best practices at a glance
Always use the HP D2D Sizing tool to size your D2D solution. Always ensure the hardware component firmware and D2D software in your HP D2D Backup System are fully
up-to-date. (Check at http://www.hp.com/support.)
Take into account the need for the deduplication “Housekeeping” process to run when designing backup
configurations. Configure some time every day for the D2D to perform housekeeping.
Run multiple backups in parallel to improve aggregate throughput to a D2D appliance. Running too many concurrent jobs will impact the performance of each individual job. This will be true for all
types of job: backups, restores and replication. Ensure that the solution has been sized correctly to
accommodate the concurrent load requirements.
Restore performance is almost always slower than backup, so disabling “verify” in backup jobs will more than
halve the time taken for the job to complete.
Configure “High availability mode (link Aggregate)” or “Dual Port” network ports to achieve maximum
available network bandwidth.
Identify other performance bottlenecks in your backup environment using HP Library and Tape Tools.
VTL best practices at a glance
Make use of multiple network or fibre channel ports throughout your storage network to eliminate bottlenecks
and split virtual tape libraries across them.
Configure multiple VTLs and separate data types into their own VTLs. Configure larger “block sizes” within the backup application to improve performance. Disable any multiplexing configuration within the backup application. Disable any compression or encryption of data before it is sent to the D2D appliance. Delay physical tape offload/copy operations to allow for the housekeeping process to complete in order to
improve offload performance.
5
NAS best practices at a glance
Configure multiple shares and separate data types into their own shares. Adhere to the suggested maximum number of concurrent operations per share/appliance. Choose disk backup file sizes in backup software to meet the maximum backup size. Disable software compression, deduplication and synthetic full backups. Do not pre-allocate disk space for backup files. Do not append to backup files
6
HP StoreOnce Technology
A basic understanding of the way that HP StoreOnce Technology works is necessary in order to understand
factors that may impact performance of the overall system and to ensure optimal performance of your backup
solution.
HP StoreOnce Technology is an “inline” data deduplication process. It uses hash-based chunking technology,
which analyzes incoming backup data in “chunks” that average up to 4K in size. The hashing algorithm
generates a unique hash value that identifies each chunk and points to its location in the deduplication store.
Hash values are stored in an index that is referenced when subsequent backups are performed. When data
generates a hash value that already exists in the index, the data is not stored a second time. Rather, an entry
with the hash value is simply added to the “recipe file” for that backup session.
Key factors for performance considerations with deduplication:
The inline nature of the deduplication process means that there will always be some performance trade-off for
the benefits of increased disk space utilisation.
With each Virtual Library or NAS Share created there is an associated dedicated deduplication store. If
“Global” deduplication across all backups is required, this will only happen if a single virtual library or NAS
share is configured and all backups are sent to it.
The best deduplication ratio will be achieved by configuring a minimum number of libraries/shares. Best
performance will be gained by configuring a larger number of libraries/shares and optimising for individual
deduplication store complexity.
If servers with lots of similar data are to be backed up, a higher deduplication ratio can be achieved by
backing them all up to the same library/share.
If servers contain dissimilar data types, the best deduplication ratio/performance compromise will be achieved
by grouping servers with similar data types together into their own dedicated libraries/shares. For example, a
requirement to back up a set of exchange servers, SQL database servers, file servers and application servers
would be best served by creating four virtual libraries or NAS shares; one for each server set.
When restoring data from a deduplicating device it must reconstruct the original un-deduplicated data stream
from all of the data chunks. This can result in lower performance than that of the backup.
Full backup jobs will result in higher deduplication ratios and better restore performance (because only one
piece of media is needed for a full restore). Incremental and differential backups will not deduplicate as well.
Replication overview
Deduplication technology is the key enabling technology for efficient replication because only the new data
created at the source site needs to replicate to the target site. This efficiency in understanding precisely which
data needs to replicate can result in bandwidth savings in excess of 95% compared to having to transmit the full
contents of a cartridge from the source site. The bandwidth saving will be dependent on the backup change rate
at the source site.
There is some overhead of control data that also needs to pass across the replication link. This is known as
manifest data, a final component of any hash codes that are not present on the remote site and may also need to
be transferred. Typically the “overhead components” are less than 2% of the total virtual cartridge size to
replicate.
Replication can be “throttled” by using bandwidth limits as a percentage of an existing link, so as not to affect
the performance of other applications running on the same link.
Key factors for Key factors for performance considerations with replication:
Seed replication using physical tape or co-location to improve first replicate performance.
7
Appended backups need to “clone” the cartridge on the target side, so performance of appended tape
Model
Fan in
Maximum number concurrent
source jobs
Maximum number concurrent
target jobs
D2D4324
50
16
48
D2D4312
50
16
48
D2D4112 G2
24 8 24
D2D4106i G2
16 8 24
D2D4106fc G2
16 8 24
D2D2504i G2
8 4 8
D2D2502i G2
4 4 8
D2D4112 G1
24 2 8
D2D4009i/fc G1
16 2 6
D2D4004i/fc G1
16 2 6
D2D2504i G1
8 2 3
D2D2503i G1
6 1 2
D2D2502i G2
4 2 3
replication will not be significantly faster than replicating the whole cartridge.
If a lot of similar data exists on remote office D2D libraries, replicating these into a single target library will
give a better deduplication ratio on the target D2D Backup System.
Replication starts when the cartridge is unloaded or a NAS share file is closed after writing is complete, or
when a replication window is enabled. If a backup spans multiple cartridges or NAS files, replication will start
on the first cartridge/NAS file as soon as the job spans to the second.
Size the WAN link appropriately to allow for replication and normal business traffic taking into account data
change rates.
Apply replication bandwidth limits or apply blackout windows to prevent bandwidth hogging.
The maximum number of concurrent replication jobs supported by source and target D2D appliances can be
varied in the Web Management Interface to manage throughput. The table below shows the default settings for
each product.
8
Note: Fan in is the maximum number of source appliances that may replicate to the device as a replication
target.
Housekeeping overview
If data is deleted from the D2D system (e.g. a virtual cartridge is overwritten or erased), any unique chunks will
be marked for removal, any non-unique chunks are de-referenced and their reference count decremented. The
process of removing chunks of data is not an inline operation because this would significantly impact
performance. This process, termed “housekeeping”, runs on the appliance as a background operation, it runs on
a per cartridge and NAS file basis and will run as soon as the cartridge is unloaded and returned to its storage
slot or a NAS file has completed writing and has been closed by the appliance.
Whilst the housekeeping process can run as soon as a virtual cartridge is returned to its slot, this could cause a
high level of disk access and processing overhead, which would affect other operations such as further backups,
restores, tape offload jobs or replication. In order to avoid this problem the housekeeping process will check for
available resources before running and, if other operations are in progress, the housekeeping will dynamically
hold-off to prevent impacting the performance of other operations. It is, however, important to note that the holdoff is not binary, (i.e. on or off) so, even if backup jobs are in process, some low level of housekeeping will still
take place which may have a slight impact on backup performance.
Housekeeping is an important process in order to maximise the deduplication efficiency of the appliance and, as
such, it is important to ensure that it has enough time to complete. Running backup, restore, tape offload and
replication operations with no break (i.e. 24 hours a day) will result in housekeeping never being able to
complete. As a general rule a number of minutes per day should be allowed for every 100 GB of data
overwritten on a virtual cartridge or NAS share. See Appendix A for numbers for each product.
For example: if, on a daily basis, the backup application overwrites two cartridges in different virtual libraries
with 400 GB of data on each cartridge, an HP D2D4106 appliance would need approximately 30 minutes of
quiescent time over the course of the next 24 hours to run housekeeping in order to de-reference data and
reclaim any free space.
Configuring backup rotation schemes correctly is very important to ensure the maximum efficiency of the product;
doing so reduces the amount of housekeeping that is required and creates a predictable load.
Large housekeeping loads are created if large numbers of cartridges are manually erased or re-formatted. In
general all media overwrites should be controlled by the backup rotation scheme so that they are predictable.
Best practice
Create enough D2D virtual library cartridges for at least one backup rotation schedule and then overwrite the
tape cartridges when the virtual library cartridge data expires or when the data is no longer useful. If a large
expired media pool exists due to a non optimum rotation policy then this will use up space on the D2D
appliance. See Housekeeping monitoring and control on page 71 for more detailed information.
Upstream and Backup Application considerations
Multi-stream vs. Multiplex
Multi-streaming is often confused with Multiplexing; these are however two different (but related) terms. Multistreaming is when multiple data streams are sent to the D2D Backup System simultaneously but separately.
Multiplexing is a configuration whereby data from multiple sources (for example multiple client servers) is backed
up to a single tape drive device by interleaving blocks of data from each server simultaneously and combined
into a single stream. Multiplexing is used with physical tape devices in order to maintain good performance, if
the source servers are slow because it aggregates multiple source server backups into a single stream.
A multiplexed data stream configuration is NOT recommended for use with a D2D system or any other
deduplicating device. This is because the interleaving of data from multiple sources is not consistent from one
backup to the next and significantly reduces the ability of the deduplication process to work effectively; it also
reduces performance. Care must be taken to ensure that multiplexing is not happening by default in a backup
application configuration. For example when using HP DataProtector to back up multiple client servers in a single
backup job, it will default to writing four concurrent multiplexed servers in a single stream. This must be disabled
by reducing the “Concurrency” configuration value for the tape device from 4 to 1.
Use multiple backup streams
The HP D2D system performs best with multiple backup streams sent to it simultaneously. For example an HP
D2D4112 G2 will back up data at approximately 80 MB/s for a single stream, however multiple streams can
deliver an aggregate performance in excess of 360 MB/s.
9
The following graph illustrates only the relationship between the number of active data streams and performance.
It is not based on real data.
Data compression and encryption backup application features
Both software compression and encryption will randomize the source data and will, therefore, not result in a high
deduplication ratio for these data sources. Consequently, performance will also suffer. The D2D appliance will
compress the data at the end of deduplication processing anyway.
For these reasons it is best to do the following, if efficient deduplication and optimum performance are required:
Ensure that there is no encryption of data before it is sent to the D2D appliance. Ensure that software compression is turned off within the backup application.
Not all data sources will result in high deduplication ratios and performance can, therefore, vary across different
data sources. Digital images, video, audio and compressed file archives will typically all yield low deduplication
ratios. If this data predominantly comes from a small number of server sources, consider setting up a separate
library/share for these sources for best performance.
10
Network configuration
All D2D appliances have two 1GBit Ethernet ports, the D2D4312 and D2D4324 appliances also have two
10GBit Ethernet ports. The Ethernet ports are used for data transfer to iSCSI VTL devices and CIFS/NFS shares
and also for management access to the Web Management Interface.
In order to deliver best performance when backing up data over the Ethernet ports it will be necessary to
configure the D2D network ports, backup servers and network infrastructure to maximise available bandwidth to
the D2D device.
Each pair of network ports on the D2D, (i.e. the two 1Gbit ports and the two 10Gbit ports on D2D 4312/4324)
have four configuration modes available as follows:
Single port – In this case Port 1 (1Gbit) or Port 3 (10Gbit) must be used for all data and management traffic. Dual port – Both ports in the pair are used, but must be in separate subnets, both ports can access the Web
Management Interface and virtual libraries and NAS shares are available on both ports.
High Availability mode (Port failover) – Both ports in the pair are used but are bonded to appear as a single
port with a single IP address. Data is only carried on one link, with the other link providing failover.
High Availability mode (Link Aggregate) – Both ports in the pair are used but are bonded to appear as a
single port with a single IP Address. Data is carried across both links to double the available bandwidth on a
single subnet.
11
Single Port mode
1Gbit ports: use this mode only if no other ports are available on the switch network or if the appliance is being
used to transfer data over fibre channel ports only.
On an HP D2D4312 or D2D4324 with 10Gbit ports it is possible that a single 10Gbit port will deliver good
performance in most environments.
Network configuration, single port mode
12
Dual Port mode
Use this mode if:
Servers to be backed up are split across two physical networks which need independent access to the D2D
appliance. In this case virtual libraries and shares will be available on both network ports; the host
configuration defines which port is used.
Separate data (“Network SAN”) and management LANs are being used, i.e. each server has a port for
business network traffic and another for data backup. In this case one port on the D2D can be used solely for
access to the Web Management Interface with the other used for data transfer.
Network configuration, dual port mode
In the case of a separate network SAN being used, configuration of CIFS backup shares with Active Directory
authentication requires careful consideration, see Network Configurations for CIFS AD on page 15 for more
information.
13
High availability port mode (Port failover)
In this mode, no special switch configuration is required other than to ensure that both Ethernet ports in the pair
from the D2D appliance are connected to the same switch. This mode sets up “bonded” network ports, where
both network ports are connected to the same physical switch and behave as one network port. This mode
provides some level of load balancing across the ports but generally only provides port failover.
High availability port mode (Link Aggregate)
This is the recommended mode to achieve highest performance for iSCSI and NAS operation. It requires the
switch to support IEEE802.3ad Link Aggregation and needs the switch to be set up accordingly. See the relevant
product user guide for more details. This mode sets up “bonded” network ports, where both network ports in the
pair are connected to the same physical switch and behave as one network port. This mode provides maximum
bandwidth across the ports and also port failover if one link goes offline.
Network configuration, high availability port mode
14
10Gbit Ethernet ports on the 4312/4324 appliances
10Gbit Ethernet is provided as a viable alternative to the Fibre Channel interface for providing maximum VTL
performance and also comparable NAS performance. When using 10Gbit Ethernet it is common to configure a
“Network SAN”, which is a dedicated network for backup that is separate to the normal business data network;
only backup data is transmitted over this network.
Network configuration, HP D2D4312/4324 with 10Gbit ports
When a separate network SAN is used, configuration of CIFS backup shares with Active Directory authentication
requires careful consideration, see the next section for more information.
Network configuration for CIFS AD
When using CIFS shares for backup on a D2D device in a Microsoft Active Directory environment the D2D CIFS
server may be made a member of the AD Domain so that Active Directory users can be authenticated against
CIFS shares on the D2D.
However, in order to make this possible the AD Domain controller must be accessible from the D2D device. This
may be conflict with a configuration with a Network SAN in order to separate Corporate LAN and Backup
traffic.
15
Broadly there are two possible configurations which allow both:
Access to the Active Directory server for AD authentication and Separation of Corporate LAN and Network SAN traffic
Option 1: HP D2D Backup System on Corporate SAN and Network SAN
In this option, the D2D device has a port in the Corporate SAN which has access to the Active Directory Domain
Controller. This link is then used to authenticate CIFS share access.
The port(s) on the Network SAN are used to transfer the actual data.
This configuration is relatively simple to configure:
On D2D devices with only 1Gbit ports: Dual Port mode should be configured and each port connected and
configured for either the Corporate LAN or Network SAN. In this case one data port is “lost” for authentication
traffic so this solution will not provide optimal performance.
On HP D2D4312/4324 devices with both 10Gbit and 1Gbit ports: the 10Gbit ports can be configured in a
bonded network mode and configured for access to the Network SAN. One or both of the 1Gbit ports can
then be connected to the Corporate LAN for authentication traffic. In this case optimal performance can be
maintained.
The backup application media server also needs network connections into both the Corporate LAN and Network
SAN. The diagram below shows this configuration with an HP D2D4300 Series Backup System.
HP D2D Backup System on Corporate SAN and Network SAN
16
Option 2: HP D2D Backup System on Network SAN only with Gateway
In this option the D2D has connections only to the Network SAN, but there is a network router or Gateway server
providing access to the Active Directory domain controller on the Corporate LAN. In order to ensure two-way
communication between the Network SAN and Corporate LAN the subnet of the Network SAN should be a
subnet of the Corporate LAN subnet.
Once configured, authentication traffic for CIFS shares will be routed to the AD controller but data traffic from
media servers with a connection to both networks will travel only on the Network SAN. This configuration allows
both 1Gbit network connections to be used for data transfer but also allows authentication with the Active
Directory Domain controller. The illustration shows a simple Class C network for a medium-sized LAN
configuration.
HP D2D Backup System on Network SAN only with Gateway
17
Backup server networking
It is important to consider the whole network when considering backup performance, any server acting as a
backup server should be configured where possible with multiple network ports that are teamed / bonded in
order to provide a fast connection to the LAN. Client servers (those that back up via a backup server) may be
connected with only a single port, if backups are to be aggregated through the backup server.
Ensure that no sub-1Gbit network components are in the backup path because this will significantly restrict
backup performance.
18
Virtual Library 1
(e.g. D2D Generic)
Virtual Library n
(e.g. MSL 2024 )
Virtual Library 2
(e.g. MSL 4048)
Host 1Host nHost 2
Medium
Changer
Tape
Drive 1
Tape
Drive 2
Medium
Changer
Medium
Changer
Tape
Drive 1
Tape
Drive 2
Tape
Drive 3
Tape
Drive 4
Tape
Drive 1
Tape
Drive 2
FC Port 1FC Port 2
FC HBAFC HBAFC HBA
HP StorageWorks D2D Backup System
Fibre Channel Fabric
Fibre Channel configuration
Fibre Channel topologies
The D2D appliances support both switched fabric and direct attach (private loop) topologies. Direct Attach (point
to point) topology is not supported.
Switched fabric using NPIV (N Port ID Virtualisation) offers a number of advantages and is the preferred topology
for D2D appliances.
Switched fabric
A switched fabric topology utilizes one or more fabric switches to provide a flexible configuration between
several Fibre Channel hosts and Fibre Channel targets such as the D2D appliance virtual libraries. Switches may
be cascaded of meshed together to form large fabrics.
Fibre Channel, switched fabric topology
The D2D Backup Appliances with a Fibre Channel interface have two FC ports. Either or both of the FC ports
may be connected to a FC fabric, but each virtual library may only be associated with one of these FC ports. By
default, each virtual library will be visible to all hosts connected to the same fabric. It is recommended that each
virtual library is zoned to be visible to only the hosts that require access. Unlike the iSCSI virtual libraries, FC
virtual libraries can be configured to be used by multiple hosts if required.
19
Direct Attach (Private Loop)
Virtual Library 1
(e.g. D2D Generic)
Virtual Library n
(e.g. MSL 2024)
Virtual Library 2
(e.g. MSL 4048)
Host 1Host 2
Medium
Changer
Tape
Drive 1
Tape
Drive 2
Medium
Changer
Medium
Changer
Tape
Drive 1
Tape
Drive 2
Tape
Drive 3
Tape
Drive 4
Tape
Drive 1
Tape
Drive 2
FC Port 1FC Port 2
FC HBAFC HBA
HP StorageWorks D2D Backup System
A direct attach (private loop) topology is implemented by connecting the D2D appliance ports directly to a Host
Bus Adapter (HBA). In this configuration the Fibre Channel private loop protocol must be used.
Fibre Channel, direct attach (private loop) topology
Either of the FC ports on a D2D Backup System may be connected to a FC private loop, direct attach topology.
The FC port configuration of the D2D Appliance should be changed from the default N_Port topology setting to
Loop. This topology only supports a single host connected to each private loop configured FC port.
Zoning
Zoning is only required if a switched fabric topology is used and provides a way to ensure that only the hosts
and targets that they need are visible to servers, disk arrays, and tape libraries. Some of the benefits of zoning
include:
Limiting unnecessary discoveries on the D2D appliance Reducing stress on the D2D appliance and its library devices by polling agents Reducing the time it takes to debug and resolve anomalies in the backup/restore environment Reducing the potential for conflict with untested third-party products
20
Zoning may not always be required for configurations that are already small or simple. Typically the larger the
SAN, the more zoning is needed. Use the following guidelines to determine how and when to use zoning.
Small fabric (16 ports or less)—may not need zoning. Small to medium fabric (16 - 128 ports)—use host-centric zoning. Host-centric zoning is implemented by
creating a specific zone for each server or host, and adding only those storage elements to be utilized by
that host. Host-centric zoning prevents a server from detecting any other devices on the SAN or including
other servers, and it simplifies the device discovery process.
Disk and tape on the same pair of HBAs is supported along with the coexistence of array multipath software
(no multipath to tape or library devices on the HP D2D Backup System, but coexistence of the multipath
software and tape devices).
Large fabric (128 ports or more)—use host-centric zoning and split disk and tape targets. Splitting disk and
tape targets into separate zones will help to keep the HP D2D Backup System free from discovering disk
controllers that it does not need. For optimal performance, where practical, dedicate HBAs for disk and
tape.
Diagnostic Fibre Channel devices
For each D2D FC port there is a Diagnostic Fibre Channel Device presented to the Fabric. There will be one per
active FC physical port. This means there are two per HP D2D4000 series appliance that has two Fibre Channel
ports.
The Diagnostic Fibre Channel Device can be identified by the following example text.
Symbolic Port Name "HP D2D S/N-CZJ1440JBS HP D2DBS Diagnostic Fibre
Channel S/N-MY5040204H Port-1"
Symbolic Node Name "HP D2D S/N-CZJ1440JBS HP D2DBS Diagnostic Fibre Channel
S/N-MY5040204H"
A virtual driver or loader would be identified by the following example text:
Symbolic Port Name "HP D2D S/N-CZJ1440JBS HP Ultrium 4-SCSI Fibre Channel
S/N-CZJ1440JC5 Port-0"
Symbolic Node Name "HP S/N-CZJ1440JBS HP Ultrium 4-SCSI Fibre Channel S/NCZJ1440JC5"
In the above the S/N-CZJ1440JBS for all devices should be identical. If this is Node Port 1, the Node Name
string will be as above but, if Port 2, the Node Name string will end with “Port-2”. Often the diagnostic device
will be listed above the other virtual devices as it logs in first ahead of the virtual devices. The S/N-MY5040204Hstring is an indication of the QLC HBA’s SN not any SN of an appliance/node.
At this time these devices are part of the StoreOnce D2D VTL implementation and are not an error or fault
condition. It is recommended that these devices be removed from the switch zone that is also used for virtual
drives and loaders.
Fibre Channel configuration via Web Management Interface
Full details on how to use the Web Management Interface to create virtual libraries and assign them to one of the
two ports on the appliance (Port 0 and Port 1) are provided in the HP D2D Backup System user guide.
There is a page on the Web Management Interface that allows you to view and edit the Fibre Channel SAN
settings, if necessary. It shows FC settings for each port on the appliance. The editable fields are:
Speed: The default is Auto, which is the recommended option. For users who wish to fix the speed,
other available values are 8Gbs (D2D4312 and D2D4324 only), 4Gbs, 2Gbs and 1Gbs.
Configuring a slower speed can impact performance.
Topology: The default is Auto, which is the recommended option. Loop, where the D2D appliance
simulates a large number of FC devices, and N_Port, when a single target device creates many
virtual devices on a fabric attach port, are also supported. N_Port requires the switch port to
support NPIV (N_Port ID Virtualisation).
21
Another page on the Configuration – Fibre Channel page of the Web Management interface shows the status for
Product/Model
Name
Part Number
Ethernet Connection
Fibre Channel Connection
D2D2502iG1
EJ001A
2 x 1GbE
None
D2D2503 G1
EH945A
2 x 1GbE
None
D2D2504i G1
EJ002A
2 x 1GbE
None
D2D400xi G1
EH938A/EH939A
2 x 1GbE
None
D2D400xFC G1
EH941A/EH942A
2 x 1GbE
2 x 4Gb FC
D2D4112 G1
EH993A
2 x 1GbE
2 x 4Gb FC
D2D2502i G2
EJ001B
2 x 1GbE
None
D2D2504i G2
EJ002B
2 x 1GbE
None
D2D4106i G2
EH996A
2 x 1GbE
None
D2D4106FC G2
EH998A
2 x 1GbE
2 x 4Gb FC
D2D4112 G2
EH993B
2 x 1GbE
2 x 4Gb FC
D2D4312 G2
EH983A
2 x 1GbE,
2 x 10GbE
2 x 8Gb FC
D2D4324 G2
EH985A
2 x 1GbE,
2 x 10GbE
2 x 8Gb FC
all the FC devices that are configured on the D2D appliance. It lists the connection state, port ID, Port type and
number of logins for each virtual library and drive connection. This page is mainly for information and is useful in
troubleshooting.
Best practices for network and Fibre Channel configuration
The following table shows which network and fibre channel ports are present on each model of D2D appliance.
Correct configuration of these interfaces is important for optimal data transfer.
Key factors when considering performance.
It is important to consider the whole network when considering backup performance. Any server acting as a
backup server should be configured where possible with multiple network ports that are bonded in order to
provide a fast connection to the LAN. Client servers (those that backup via a backup server) may be connected
with only a single port if backups are to be aggregated through the backup server.
Ensure that no sub-1Gbit network components are in the backup path as this will significantly restrict backup
performance.
Configure “High availability mode (link Aggregate)” network ports to achieve maximum available network
bandwidth.
Virtual library devices are assigned to an individual interface. Therefore, for best performance, configure both
FC ports and balance the virtual devices across both interfaces to ensure that one link is not saturated whilst
the other is idle.
Switched fabric mode is preferred for optimal performance on medium to large SANs since zoning can be
used.
When using switched fabric mode, Fibre Channel devices should be zoned on the switch to be only accessible
from a single backup server device. This ensures that other SAN events, such as the addition and removal of
other FC devices, do not cause unnecessary traffic to be sent to devices. It also ensures that SAN polling
applications cannot reduce the performance of individual devices.
22
A mixture of iSCSI and FC port virtual libraries and NAS shares can be configured on the same D2D
Daily backup - typical amount of data that can
be backed up AND REPLICATED in 24hrs.
Daily backup - typical amount of data that can
be backed up in 24hrs.
32 TB
38TB
20 TB
24TB
D2D4324
10TB
13TB
7TB
8TB
appliance to balance performance needs.
Sizing solutions
The following diagram provides a simple sizing guide for the HP D2D Generation 2 product family for backups
and backups and replication.
HP D2D Backup System Gen 2 sizing guide
Daily backup: typical amount of data that can be protected by HP StoreOnce Backup systems
Note: Assumes fully configured product, compression rate of 1.5, data change rate of 1%, data retention period
of 6 months and a 12-hour backup window. Actual performance is dependent upon data set type, compression
levels, number of data streams, number of devices emulated and number of concurrent tasks, such as
housekeeping or replication. Additional time is required for periodic physical tape copy, which would reduce the
amount of data that can be protected in 24 hours.
HP also provides a downloadable tool to assist in the sizing of D2D-based data protection solutions at
http://www.hp.com/go/storageworks/sizer.
23
The use of this tool enables accurate capacity sizing, retention period decisions and replication link sizing and
performance for the most complex D2D environments.
A fully worked example using the Sizing Tool and best practices is contained later in the document, see
Appendix B.
24
VTL best practices
Summary of best practices
Tape drive emulation types have no effect on performance or functionality. Configuring multiple tape drives per library enables multi-streaming operation per library for good aggregate
performance.
Do not exceed the recommended maximum concurrent backup streams per library and appliance if maximum
performance is required. See Appendix A.
Target the backup jobs to run simultaneously across multiple drives within the library and across multiple
libraries.
Create multiple libraries on the larger D2D appliances to achieve best aggregate performance.
– Individual libraries for backing up larger servers
– Libraries for consolidated backups of smaller servers
– Separate libraries by data type if the best trade-off between deduplication ratio and performance is needed
Cartridge capacities should be set either to allow a full backup to fit on one cartridge or to match the physical
tape size for offload (whichever is the smaller)
Use a block size of 256KB or greater. For HP Data Protector and EMC Networker software a block size of
512 KB has been found to provide the best deduplication ratio and performance balance.
Disable the backup application verify pass for best performance. Remember that virtual cartridges cost nothing and use up very little space overhead. This allows you to assign
backup jobs their own cartridges rather than appending very many small backups to a single piece of virtual
media. This will provide performance benefits during backup and with housekeeping.
Define slot counts to match required retention policy. The D2DBS, ESL and EML virtual library emulations can
have a large number of configurable slots.
Design backup policies to overwrite media so that space is not lost to a large expired media pool and media
does not have different retention periods.
All backups within a policy can remain on the D2D (there is no need to export or delete cartridges) as there is
very little penalty in keeping multiple full backups in a deduping library.
Recommend that full backup jobs are targeted to specific cartridges, sized appropriately. Reduce the number of appends by specifying separate cartridges for each incremental backup.
Tape library emulation
Emulation types
The HP D2D Backup Systems can emulate several types of physical HP Tape Library device; the maximum
number of drives and cartridge slots is defined by the type of library configured.
Performance however is not related to library emulation other than in the respect of the ability to configure
multiple drives per library and thus enable multiple simultaneous backup streams (multi-streaming operation).
To achieve the best performance of the larger D2D appliances more than one virtual library will be required to
meet the multi-stream needs. For G1 products, a single virtual library is restricted to a maximum of four virtual
drives. As an example, an HP D2D4112 G1 appliance would require at least 10 streams running in parallel to
approach maximum aggregate throughput and so should be configured with a minimum of three virtual libraries,
each with four drives running in parallel, if achieving maximum performance is a critical factor.
For G2 products with 2.1.X or later software a more flexible emulation strategy is supported. The appliance is
provided with a drive pool and these can be allocated to libraries in a flexible manner and so more than 4
drives per library can be configured up to a maximum as defined by the library emulation type. The number of
25
cartridges that can be configured per library has also increased compared to G1 products. The table below lists
D2D2502 G1
D2D2503 G1
D2D2504 G1
D2D400X G1
D2D4112 G1
Maximum VTL drives per library
4 1 4 4 4
Maximum slots per library (D2DBS)
48
48
48
96
144
Maximum slots (MSL2024, MSL4048,
MSL8096)
24, 48, 96
24, 48, 96
24, 48, 96
24, 48, 96
24, 48, 96
Recommended maximum concurrent
backup streams per appliance
16 8 24
12
24
Recommended maximum concurrent
backup streams per library
4 4 4 4 4
D2D2502 G2
D2D2504 G2
D2D4106/4112 G2
D2D4312/4324 G2
Maximum VTL drives per
library/appliance
16
32
64/96
200
Maximum slots per library (D2DBS,
EML-E, ESL-E)
96
96
1024
4096
Maximum slots (MSL2024, MSL4048,
MSL8096)
24, 48, 96
24, 48, 96
24, 48, 96
24, 48, 96
Maximum active streams per store
32
48
64/96
128
Recommended maximum concurrent
backup streams per appliance
16
24
48
64
Recommended maximum concurrent
backup streams per library
4 4 6
12
the key parameters for both G1 and G2 products.
To achieve best performance the recommended maximum concurrent backup streams per library and appliance
in the table should be followed. As an example, while it is possible to configure 40 drives per library on a 4312
appliance, for best performance no more than 12 of these drives should be actively writing or reading at any
one time.
The HP D2DBS emulation type and the ESL/EML type, supported on the G2 product family with code revision 2.1
onwards, provide the most flexibility in numbers of cartridges and drives. This has two main benefits:
It allows for more concurrent streams on backups which are throttled due to host application throughput, such
as multi-streamed backups from a database.
It allows for a single library (and therefore Deduplication Store) to contain similar data from more backups that
can run in parallel to increase deduplication ratio.
The D2DBS emulation type has an added benefit in that it is also clearly identified in most backup applications
as a virtual tape library and so is easier for supportability. It is the recommended option for this reason.
There are a number of other limitations from an infrastructure point of view that need to be considered when
allocating the number of drives per library. As a general point it is recommended that the number of tape drives
per library does not exceed 64 due to the restrictions below:
26
For iSCSI VTL devices a single Windows or Linux host can only access a maximum of 64 devices. A
single library with 63 drives is the most that a single host can access. Configuring a single library with
more than 63 drives will result in not all devices in the library being seen (which may include the library
device). The same limitation could be hit with multiple libraries and fewer drives per library.
A similar limitation exists for Fibre Channel. Although there is a theoretical limit of 255 devices per FC
port on a host or switch, the actual limit appears to be 128 for many switches and HBAs. You should
either balance drives across FC ports or configure less than 128 drives per library.
Some backup applications will deliver less than optimum performance if managing many concurrent
backup tape drives/streams. Balancing the load across multiple backup application media servers can
help here.
Cartridge sizing
The size of a virtual cartridge has no impact on its performance; it is recommended that cartridges are created to
match the amount of data being backed up. For example, if a full backup is 500 GB, the next larger
configurable cartridge size is 800 GB, so this should be selected.
If backups span multiple cartridges, there is a risk that this will impact performance because housekeeping
operations will start on the first backup cartridge as soon as the backup application spans to the next cartridge. If
backups do span a cartridge, the effect of the housekeeping process should be quite small because it will hold off
due to the remaining backup traffic to the appliance taking priority. The performance overhead might become
higher, if several backup jobs are running concurrently and they all behave in the same way so that several
cartridges are performing low level housekeeping activity.
It is best practice to configure housekeeping blackout windows, as described on page 71.
Note that if backups are to be offloaded to physical media elsewhere in the network, it is recommended that the
cartridge sizing matches that of the physical media to be used. For G1 products, direct attach tape offload
cannot span physical cartridges so if a cartridge is too large, the offload operation will fail. Some backup
application-controlled copies can accommodate spanning of a single cartridge onto multiple physical cartridges,
however.
Number of libraries per appliance
The D2D appliance supports the creation of multiple virtual library devices. If large amounts of data are being
backed up from multiple hosts or for multiple disk LUNs on a single host, it is good practice to separate these
across several libraries (and consequently into multiple backup jobs). Each library has a separate deduplication
“store” associated with it. Reducing the amount of data in, and complexity of, each store will improve its
performance.
Creating a number of smaller deduplication “stores” rather than one large store which receives data from
multiple backup hosts could have an impact on the overall effectiveness of deduplication. However, generally,
the cross-server deduplication effect is quite low unless a lot of common data is being stored. If a lot of common
data is present on two servers, it is recommended that these are backed up to the same virtual library.
For best backup performance, configure multiple virtual libraries and use them all concurrently. For best deduplication performance, use a single virtual library and fully utilize all the drives in that one
library.
Backup application configuration
In general backup application configurations for physical tape devices can be readily ported over to target a
deduplicating virtual library with no changes; this is one of the key benefits of virtual libraries – seamless
integration. However considering deduplication in the design of a backup application configuration can improve
performance, deduplication ratio or ease of data recovery so some time spent optimizing backup application
configuration is valuable.
Blocksize and transfer size
As with physical tape, larger tape block sizes and host transfer sizes are of benefit. This is because they reduce
the amount of overhead of headers added by the backup application and also by the transport interface. The
recommended minimum is 256 KB block size, and up to 1 MB is suggested.
For HP Data Protector and EMC Networker Software a block size of 512 KB has been found to provide the best
deduplication ratio and performance balance and is the recommended block size for this application.
27
Some minor setting changes to upstream infrastructure might be required to allow backups with greater than 256
KB block size to be performed. For example, Microsoft’s iSCSI initiator implementation, by default, does not
allow block sizes that are greater than 256 KB. To use a block size greater than this you need to modify the
following registry setting:
Change the REG_DWORD MaxTransferLengthto “80000” hex (524,288 bytes), and restart the
media server – this will restart the iSCSI initiator with the new value.
Disable backup application verify pass
Most backup applications will default to performing a verify operation after a backup job. Whilst this offers a
very good way to ensure that data is backed up successfully it will also heavily impact the performance of the
whole backup job.
Performing a verify operation will more than double the overall backup time due to the fact that restore
performance (required for verify) is slower for inline deduplication-enabled devices.
Disabling verify for selected backup jobs can be done relatively safely as D2D Backup Systems perform CRC
(Cyclic Redundancy Check) checking for every backed-up chunk to ensure that no errors are introduced by the
D2D system. Verifying some backup jobs on a regular basis is recommended. For example, verifying the weekly
full backups where additional time is available might be an option.
Rotation schemes and retention policy
Retention policy
The most important consideration is the type of backup rotation scheme and associated retention policy to
employ. With data deduplication there is little penalty for using a large number of virtual cartridges in a rotation
scheme and therefore a long retention policy for cartridges because most data will be the same between backups
and will therefore be deduplicated.
A long retention policy has the following benefits:
It provides a more granular set of recovery points with a greater likelihood that a file that needs to be
recovered will be available for longer and in many more versions.
It reduces the overwrite frequency of full backup cartridge which reduces the amount of deduplication
housekeeping overhead required.
When using the D2D to copy cartridges to physical media a long retention policy ensures longer validity of the
offloaded cartridges because they inherit the same retention period as the virtual cartridge. Therefore, their
ability to restore data from a physical cartridge is improved.
Rotation scheme
There are two aspects to a rotation scheme which need to be considered:
Full versus Incremental/Differential backups Overwrite versus Append of media
Full versus Incremental/Differential backups
The requirement for full or incremental backups is based on two factors, how often offsite copies of virtual
cartridges are required and speed of data recovery. If regular physical media copies are required, the best
approach is that these are full backups on a single cartridge. Speed of data recovery is less of a concern with a
virtual library appliance than it is with physical media. For example, if a server fails and needs to be fully
recovered from backup, this recovery will require the last full backup plus every incremental backup since (or the
last differential backup). With physical tape it can be a time consuming process to find and load multiple
physical cartridges, however, with virtual tape there is no need to find all of the pieces of media and, because
the data is stored on disk, the time to restore single files is lower due to the ability to randomly seek within a
backup more quickly and to load a second cartridge instantly.
28
Overwrite versus append of media
Overwriting and appending to cartridges is also a concept where virtual tape has a benefit. With physical media
it is often sensible to append multiple backup jobs to a single cartridge in order to reduce media costs; the
downside of this is that cartridges cannot be overwritten until the retention policy for the last backup on that
cartridge has expired. The diagram below shows cartridge containing multiple appended backup sessions some
of which are expired and other that are valid. Space will be used by the D2D to store the expired sessions as
well as the valid sessions. Moving to an overwrite strategy will avoid this.
With virtual tape a large number of cartridges can be configured for “free” and their sizes can be configured so
that they are appropriate to the amount of data stored in a specific backup. Appended backups are of no benefit
because media costs are not relevant. In addition, there will be a penalty when performing any tape offload
because the whole cartridge is offloaded with all backup sessions.
Our recommendations are:
Target full backup jobs to specific cartridges, sized appropriately Reduce the number of appends by specifying separate cartridges for each incremental backup
Taking the above factors into consideration, an example of a good rotation scheme where the customer requires
weekly full backups sent offsite and a recovery point objective of every day in the last week, every week in the
last month, every month in the last year and every year in the last 5 years might be as follows:
– 4 daily backup cartridges, Monday to Thursday, incremental backup, overwritten every week.
– 4 weekly backup cartridges, Fridays, full backup, overwritten every fifth week
– 12 monthly backup cartridges, last Friday of month, overwritten every 13th month.
– 5 yearly backup cartridges, last day of year, overwritten every 5 years.
This means that in the steady state, daily backups will be small, and whilst they will always overwrite the last
week, the amount of data overwritten will be small. Weekly full backups will always overwrite, but housekeeping
has plenty of time to run over the following day or weekend or whenever scheduled to run, the same is true for
monthly and yearly backups.
Total virtual tapes required in above rotation = 25
Each backup job effectively has its own virtual tape.
The customer is also able to offload a full backup every week, month and year after the full backup runs to
physical tape for offsite storage.
29
D2D NAS best practices
Introduction to D2D NAS backup targets
The HP StorageWorks D2D Backup System now supports the ability to create a NAS (CIFS or NFS) share to be
used as a target for backup applications.
The NAS shares provide data deduplication in order to make efficient use of the physical disk capacity when
performing backup workloads.
The D2D device is designed to be used for backup not for primary storage or general purpose NAS (drag and
drop storage). Backup applications provide many configuration parameters that can improve the performance of
backup to NAS targets, so some time spent tuning the backup environment is required in order to ensure best
performance.
Note: HP D2D NAS Implementation guides are available for on the HP web site for the following backup
applications: HP Data Protector 6.11, Symantec Backup Exec 2010, CommVault Simpana 9 and Symantec
NetBackup.
Overview of NAS best practices
Configure bonded network ports for best performance. Configure multiple shares and separate data types into their own shares. Adhere to the suggested maximum number of concurrent operations per share/appliance. Choose disk backup file sizes in backup software to meet the maximum size of the backup data.If this is not
possible, make the backup container size as large as possible.
Disable software compression, deduplication or synthetic full backups. Do not pre-allocate disk space for backup files. Do not append to backup files.
Choosing NAS or VTL for backup
D2D Backup Systems provide both NAS and VTL interfaces; the most appropriate interface for a particular
backup need varies depending on several requirements.
Benefits of NAS
Simpler configuration as no drivers need to be installed and the interface is familiar to most computer users. Often backup to disk is provided as a no cost option in backup applications, so application licensing is
cheaper and simpler.
Backup applications are introducing new features that make good use of disk as a target such as Disk-to-Disk-
to-Tape migration.
Backup applications that do not support backup to tape devices can now be used.
Benefits of VTL
Seamless integration with existing physical tape environment. Backup application media copy is available for physical tape copies. Performance may be better than NAS in some configurations due to the efficiency differences between the
protocols, especially when using a Fibre Channel VTL interface.
30
Loading...
+ 101 hidden pages
You need points to download manuals.
1 point = 1 manual.
You can buy points or you can get point for every manual you upload.