Design Guidelines for Virtual Tape Libraries with
Deduplication and Replication
This document describes the HP StorageWorks VLS and D2D systems and their concepts, including automigration,
deduplication, and replication, to help you define and implement your virtual tape library system. It includes
best practices for working with specific backup applications. This document is intended for use by system
administrators who are experienced with setting up and managing system backups over a SAN.
Part number: AG306-96028
Seventh edition: March 2010
The information contained herein is subject to change without notice. The only warranties for HP products and services are set
forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as
constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
Acknowledgements
Microsoft, Windows, Windows XP, and Windows NT are U.S. registered trademarks of Microsoft Corporation.
Oracle is a registered US trademark of Oracle Corporation, Redwood City, California.
Welcome to virtual tape libraries. This guide describes the HP StorageWorks VLS and D2D systems
and their concepts, including automigration, deduplication, and replication, to help you define and
implement your virtual tape library system. It includes best practices for working with specific backup
applications.
Although every user environment and every user’s goals are different, there are basic considerations
that can help you use the VLS or D2D effectively in your environment. The VLS and D2D are two
powerful and flexible families of devices. Because they can be productively used in so many ways,
there is no “best” configuration. But by asking yourself the questions and following the parameters
outlined in this guide, you can define and implement a system that is best for your particular
environment and applications.
Before proceeding, make sure you are familiar with the items below.
• Tape backup technologies, tape libraries, and backup software.
• SAN environments.
• Fibre Channel technology.
See the Glossary for the definition of acronyms and specific terms.
NOTE:
This guide replaces the HP StorageWorks Virtual Library System Solutions Guide, the HP StorageWorks
Deduplication solutions guide, and the HP StorageWorks Deduplication and replication solutions guide.
HP StorageWorks VLS and D2D Solutions Guide13
Introduction14
2 Concepts
Disk-based Backup and Virtual Tape Libraries
Problems Addressed by Virtual Tape Libraries
You can optimize your backup environment with VLS and D2D if you are:
• Not meeting backup windows due to slow servers.
• Not consistently streaming your tape drives.
• Dealing with restore problems caused by interleaving.
• Performing many restores (such as single file or small database restores).
• Backing up data that has a short life.
• Having issues with backup reliability.
• Using snapshot and clone technology for non-critical data (which makes the storage inappropriately
expensive for the nature of the data).
• Looking to deemphasize tape in your environment. Bear in mind that removable media remains
valuable in its own right and for particular purposes such as site protection and protection from
malicious attack (e.g., viruses and hackers), data distribution, data copy, archive, and regulatory
compliance.
• Improving media management. You can keep incremental backups on virtual tape and send full
backups straight to tape.
Integration of Disk in Data Protection Processes
Globalization, 24x7 environments, and consolidation are driving more rigorous data protection
requirements. To address these requirements, disk is frequently introduced into the backup process.
In disk solutions, data is backed up from an application server (disk) over a dedicated SAN to a
disk-based system and from there to a traditional tape library. This provides enhanced solutions for
slow servers, single-file restores, and perishable data.
One of the particular benefits of the VLS and D2D is that they make a disk array look to your backup
server like a tape library. Implementation requires no new software and no significant redesign of
your backup processes. On the VLS300 and VLS12000 Gateways, because they are attached to an
EVA, the existing Fibre Channel infrastructure and management framework is used and there is no
new management server required.
NOTE:
Tape holds its value for ease of vaulting, economical long-term retention, and immutability (with
WORM). It is the last step in your data’s storage cycle.
Where Virtual Tape Fits in the Big Picture
Virtual libraries are not necessarily the only piece of your backup plans, but they can be an integral
piece of a successful solution. Figure 1 illustrates the common backup technologies and their relative
benefits and costs.
Figure 1 Common Backup Technologies
See What are the Alternatives? for more discussion of the other potential players in your backup
environment.
HP VLS and D2D Portfolio
HP offers a wide range of disk-based backup products to help organizations meet their data protection
challenges. Moving the front line of data protection from tape to disk reduces administrative overhead;
daily backups are entirely automated and involve no tape handling to provide better backup reliability
and less worry.
The entry level D2D100 series Backup System meets the needs of small businesses as a low-cost
solution that does not incorporate deduplication technology. The D2D2500 is well suited for remote
and branch offices and small IT environments, while the more powerful D2D4000 and D2D4100 are
designed for medium-sized companies and small data centers. The D2D2500, D2D4000 and
D2D4100 products include HP Dynamic deduplication, which provides low cost and flexibility to
meet the needs of smaller IT environments.
HP Virtual Library Systems are known for their easy integration, simple management, performance
and capacity scalability, and fast restores. The VLS6000, VLS9000 and VLS12000 EVA Gateway
are designed for medium to large-scale enterprises. They feature Accelerated deduplication, available
by license, to deliver the best backup performance and scalability for high availability data center
environments.
Concepts16
Figure 2 HP Virtual Tape Library Product Range
Typical VLS Environments
In a typical enterprise backup environment, there are multiple application servers backing up data to
a shared tape library on the SAN. Each application server contains a remote backup agent that sends
the data from the application server over the SAN fabric to a tape drive in the tape library. However,
because backup over the SAN is single-threaded (a single host is backing up to a single tape drive),
the speed of any single backup can be limited. This is particularly true when the environment has
high-speed tape drives such as Ultrium 2 or Ultrium 3. The hosts simply cannot keep the drives streaming
at capacity.
NOTE:
HP Ultrium drives will adjust the tape speed to match the data stream to prevent “back-hitching.”
However, the tape drive is still not operating at optimal performance and cannot share bandwidth
with another backup job.
Enterprise data centers with slow SAN hosts in the environment may be unable to utilize the full
performance of high-speed tape drives. Also, shared tape libraries on the SAN can be difficult to
configure both in the hardware and in the data protection software.
Typical D2D Environments
In a typical entry-level or mid-range backup environment, the backup application is performing LAN
backups to a dedicated (non-shared) backup target such as a tape library connected to the single
backup server. Multiple instances of the backup application will generally each require their own
dedicated backup target. These environments may also be remote branch offices, each with their own
local backup application.
As with the VLS, the backup speed of a single host backing up to a single tape drive is normally
limited by the host (which cannot stream high-speed tape drives such as LTO). Tape backups in these
environments therefore use multiplexing to interleave multiple hosts’ backups onto a single tape drive,
which degrades restore performance. Adding a D2D device to these environments brings three benefits:
backups can be de-multiplexed so that restore performance improves; deduplication allows a longer
retention time on disk without significantly higher disk capacity; and deduplication-enabled
replication allows cost-effective off-site copying of the backups for disaster protection.
What are the Alternatives?
Alternatives to virtual tape solutions include:
• Physical Tape
• NAS (network attached storage)
• Application-based Disk Backup (disk to disk, backup to disk, disk to disk to tape)
• Business Copy (snapshot and clone solutions)
Physical Tape
Tape is the foundation for data protection and should be a part of most data protection solutions
(except those with highly perishable data). Consider a direct-to-tape scheme if:
• You are doing large image backups (i.e., databases), or
• Your servers can stream the tape drives.
and
• You do not need fast single file restore, or
• Your current backup window is not strained.
NAS
An alternative to a virtual library is a NAS device acting as a backup target (via NFS or CIFS network
file system protocols). However, these protocols have significant performance and scaling limitations;
writing backups over TCP/IP and NFS/CIFS to the NAS target uses much more CPU on the backup
infrastructure compared to Fibre Channel SAN. In addition, a NAS mount point does not scale to the
size of an enterprise virtual tape library. For example, a VLS can present a single virtual library target
containing multiple petabytes of tape capacity with all backup jobs configured to use the one common
shared high-performance high-capacity VLS backup device.
Consider a NAS target if you:
• Do not have high performance requirements.
• Do not want to run SAN backups.
• Do not need the backup target to significantly scale capacity or performance.
• Want to run Data Protector “virtual full backups.”
Application-based Disk Backup
Utilizing the file library functionality of backup applications is good for small or isolated jobs. When
a large-scale implementation is required, virtual tape offers a more easily managed, higher performing
solution. Consider a file library system if:
• The application is in a LAN or LAN/SAN hybrid configuration.
• Fewer than four servers write data to secondary disk storage.
• You can redeploy existing arrays as secondary disk storage.
• Your environment is static.
Figure 3 Basic Write-to-disk Setup
Table 1 VLS Compared to Application-based Write-to-disk
Setup and management complexity:
    Virtual tape devices: Sets up just like a physical tape library.
    Write-to-disk: Requires configuration of RAID groups, LUNs, volumes, and file systems.
Data compression:
    Virtual tape devices: Software or hardware enabled (software compression generally decreases
    performance).
    Write-to-disk: No device-side data compression available.
Performance:
    Virtual tape devices: Hardware devices are tuned for sequential read and write operations.
    Write-to-disk: Performance dependent on target array or server.
Cost:
    Virtual tape devices: More expensive acquisition cost. Backup software licenses as if physical
    library or per TB. Storage efficiency gained through compression. Lower management overhead.
    Write-to-disk: Free or licensed per TB in most backup applications. Higher management overhead.
Business Copy
Using a business-copy solution (array snapshots/clones) generally involves a much higher cost than
a virtual library system. You might, however, implement such a solution if:
• Virtually instant recovery is critical.
• You need to leverage a high-availability investment.
• You are doing image recovery rather than file recovery.
• You need a zero downtime solution.
Deduplication
Introduction
In recent years, the amount of data that companies produce has been steadily increasing. To comply
with government regulations, or simply for disaster recovery and archival purposes, companies must
retain more and more data. Consequently, the costs associated with data storage – labor, power,
cooling, floor space, transportation of physical media – have all risen. Virtual tape libraries have
become a cornerstone in modern data protection strategy due to their many benefits; chief among
these is cost. The list of virtual tape benefits also includes seamless integration into existing backup
solutions, improved SAN backup performance, and faster single file restores than those performed
with physical tape.
Deduplication, one of the most significant storage enhancements in recent years, promises to reshape
future data protection and disaster recovery solutions. This technology is ideal for virtual tape libraries.
Deduplication technology references blocks of data that have been previously stored, and only stores
new backup data that is unique. Data that is not unique is replaced with a pointer to the location of
the original data. Because there is often a great deal of duplicate data present from one backup
session to the next, disk space is consumed by similar or identical iterations of data. Deduplication
greatly improves storage efficiency by only storing an instance of data once, while still allowing
backup streams to be restored as if they had been retained in their entirety. See Figure 4.
Item  Description
1     Data from the first backup stream is stored to disk.
2     Duplicate data (in blue) as well as unique data (in red) in a second backup stream is identified.
3     Duplicate data in the second backup stream is eliminated.
4     Unique data in the second backup stream is stored to disk.
Figure 4 Unique Backup Data
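The store-once-and-point idea described above can be sketched in a few lines of Python. This is an illustration only, not HP's implementation: the ChunkStore class, the fixed 4-byte chunk size, and the use of SHA-256 digests as pointers are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small so the example is easy to follow


class ChunkStore:
    """Toy deduplicating store: each unique chunk is kept once, keyed by its hash."""

    def __init__(self):
        self.chunks = {}   # digest -> chunk bytes (the single stored instance)
        self.streams = {}  # stream name -> ordered list of digests (the pointers)

    def write(self, name, data):
        """Store a backup stream and return the number of new bytes written."""
        new_bytes = 0
        pointers = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:   # unique data: store it
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            pointers.append(digest)         # duplicate data: pointer only
        self.streams[name] = pointers
        return new_bytes

    def read(self, name):
        """Rebuild the full stream from its pointers."""
        return b"".join(self.chunks[d] for d in self.streams[name])


store = ChunkStore()
first = b"AAAABBBBCCCCDDDD"
second = b"AAAABBBBXXXXDDDD"            # one chunk changed since the first backup
print(store.write("backup1", first))    # 16 (everything is new)
print(store.write("backup2", second))   # 4 (only the changed chunk is stored)
```

Reading either stream back returns the original bytes in full, which mirrors the point that backup streams can be restored as if they had been retained in their entirety.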
HP StorageWorks Deduplication Solutions
HP offers two deduplication technologies: HP Accelerated deduplication, a licensed feature available
with HP StorageWorks Virtual Library Systems (VLS), and HP Dynamic deduplication, an integrated
feature with HP StorageWorks D2D Backup System. Both HP deduplication solutions offer the following
benefits:
• Longer retention of data.
• Faster, less expensive recoveries and improved service levels.
• No data is lost – backup streams can be fully restored.
• Block or chunk level deduplication, providing greater reduction of data.
• Even greater reduction of data when combined with traditional data compression.
HP Accelerated deduplication and HP Dynamic deduplication are designed to meet different needs,
as shown in Table 2.
Table 2 HP Deduplication Solutions
HP Accelerated deduplication:
• Intended for enterprise users.
• Uses object-level differencing technology.
• Fastest possible backup performance.
• Fastest restores.
• Most scalable solution in terms of performance and capacity.
• Potentially higher deduplication ratios.
HP Dynamic deduplication:
• Intended for mid-sized enterprise and remote office users.
• Uses hash-based chunking technology.
• Integrated deduplication.
• Lower cost and a smaller RAM footprint.
• Backup application and data type independence for maximum flexibility.
See VLS Accelerated Deduplication and D2D Dynamic Deduplication for more details on HP
deduplication technologies.
Deduplication Ratios
The storage capacity saved by deduplication is typically expressed as a ratio, where the sum of all
pre-deduplicated backup data is compared with the actual amount of storage the deduplicated data
requires. For example, a 10:1 ratio means that ten times more data is being stored than the actual
physical space it would require.
The most significant factors affecting the deduplication ratio are:
• How long the data is retained.
• How much the data changes between backups.
Table 3 provides an example of storage savings achieved with deduplication. However, many factors
influence how much storage is saved in your specific environment. Based on the retention policies
shown below, six months of data without deduplication requires 12.75 TB of disk space. With
deduplication, six months of data requires less than 1.25 TB of storage.
Retention policy:
• 1 week, 5 daily incremental backups
• 6 months, 25 weekly full backups
Data parameters:
• Data compression rate = 2:1
• Daily change rate = 1% (10% of data in 10% of files)
Table 3 1 TB File Server Backup
                                Data stored normally   Data stored with deduplication
1st daily full backup           500 GB                 500 GB
1st daily incremental backup    50 GB                  5 GB
2nd daily incremental backup    50 GB                  5 GB
3rd daily incremental backup    50 GB                  5 GB
4th daily incremental backup    50 GB                  5 GB
5th daily incremental backup    50 GB                  5 GB
2nd weekly full backup          500 GB                 25 GB
3rd weekly full backup          500 GB                 25 GB
...
25th weekly full backup         500 GB                 25 GB
Total                           12,750 GB              1,125 GB
Approximately 11:1 reduction in data stored
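The totals in Table 3 can be reproduced from the per-backup figures. The sketch below is only arithmetic on the values shown in the table; the function and parameter names are hypothetical and chosen for this example.

```python
def table3_totals(full_gb=500, incr_gb=50, dedup_incr_gb=5, dedup_full_gb=25,
                  daily_incrementals=5, weekly_fulls=25):
    """Return (normal_gb, deduplicated_gb, ratio) for the Table 3 retention policy."""
    later_fulls = weekly_fulls - 1  # every full backup after the first one
    normal = full_gb + daily_incrementals * incr_gb + later_fulls * full_gb
    # With deduplication the first full is stored whole; later backups add only
    # the unique data shown in the table.
    deduplicated = (full_gb + daily_incrementals * dedup_incr_gb
                    + later_fulls * dedup_full_gb)
    return normal, deduplicated, normal / deduplicated


normal, deduplicated, ratio = table3_totals()
print(normal, deduplicated, round(ratio, 1))  # 12750 1125 11.3
```

Changing the per-backup unique data or the number of retained full backups shows how strongly the ratio depends on change rate and retention period.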
Table 4 is an example that may not reflect the savings that all environments achieve using deduplication.
As shown, deduplication ratios depend on the backup policy and on the percentage of change
between backups.
Table 4 Deduplication Ratio Impact
                     Backup policy
                     Daily full and weekly full         Daily incremental (10%) and weekly full
Daily change rate    4 months*   6 months   1 year      4 months*   6 months   1 year
0.5%                 15:1        19:1       25:1        12:1        16:1       23:1
1.0%                 12:1        13:1       16:1        10:1        11:1       15:1
2.0%                 8:1         9:1        9:1         7:1         7:1        9:1
*4 months = 5 daily + 17 weekly backups
See Performance for additional information on optimizing your deduplication performance.
Target-based Deduplication
VLS and D2D deduplication is target-based; the process is running transparently inside the hardware.
This means that when the data is read (by copying to physical tape, restoring a backup, etc.), the
device rebuilds the data. The data that is read is identical to the data that was originally written (like
tape drive compression); there are no pointers in the read data.
Tape Oversubscription
Deduplication requires more virtual tape capacity than physical disk; this is sometimes called tape
oversubscription. The purpose of deduplication is to reduce the amount of disk required to store
multiple generations of backups. Be sure to create enough virtual tape capacity to contain your entire
retention policy; because of deduplication, the physical disk capacity required will be much less.
For example, if you are backing up 50 TB per week and retaining four weeks, you need to create
enough virtual tape capacity (after compression) to store 200 TB of backups. If you have 2:1
compression, you must create 100 TB of virtual tape capacity to hold the four weeks of backup data.
Given deduplication across the four weeks of backup versions, the amount of physical disk required
for this 100 TB of virtual tape would be significantly less.
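The sizing arithmetic in this example can be written as a small helper. This is a minimal sketch with a hypothetical function name; it covers only the virtual tape capacity, not the physical disk, which depends on the deduplication ratio achieved in your environment.

```python
def virtual_tape_capacity_tb(weekly_backup_tb, weeks_retained, compression_ratio):
    """Virtual tape capacity needed to hold the whole retention policy, after compression."""
    return weekly_backup_tb * weeks_retained / compression_ratio


# 50 TB per week, retained for four weeks, with 2:1 compression
print(virtual_tape_capacity_tb(50, 4, 2.0))  # 100.0
```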
NOTE:
Do not create too much virtual tape capacity or your backup application may be set to prefer to use
blank tapes instead of recycling older tapes. You would likely run out of disk space because the older
backups are not being recycled/overwritten and thus the disk space used by these old backups is not
freed up. As in the example above, you should create enough virtual tape capacity to hold backups
for your entire retention policy but no more.
Replication
Introduction to Replication
Deduplication can automate the off-site process and enable disaster recovery by providing site to site
deduplication-enabled replication at a lower cost. Because deduplication knows what data has
changed at a block or byte level, replication becomes more intelligent and transfers only the changed
data instead of the complete data set. This saves time and replication bandwidth, and is one of the
most attractive features that deduplication offers. Replication enables better disaster tolerance with
higher reliability but without the operational costs associated with transporting data off-site on physical
tape.
You can take control of your data at its furthest outposts and bring it to the data center in a cost-effective
way. Using replication, you can protect data anywhere.
Figure 5 Enterprise Deployment with Small and Large Remote and Branch Offices
Replication provides end-to-end management of backup data from the small remote office to the
regional site and finally into the primary data center, all controlled from the primary data center,
while providing local access to backup data as well. Note that replication is within device families
(VLS to VLS, D2D to D2D).
HP StorageWorks Replication Solutions
Most companies now recognize the importance of a robust backup and restore data protection
strategy, although only enterprise level users tend to invest in site disaster recovery. In most cases,
data protection is in the form of daily off-siting of physical tapes. However, even the offsiting of
physical tapes has its down sides—a high level of manual intervention, tape tracking requirements,
etc. The physical transfer of tapes off-site is not very automated.
In addition, one of the pain points for many companies large and small is protecting data in remote
offices. Untrained IT staff manage a daily backup process involving changing of physical tapes, and
the process is prone to human error.
HP replication, available on its VLS and D2D systems, now offers the solution to both these problems.
You can replicate local backup data (virtual cartridges) between sites in a reliable, automated manner
at a fraction of the costs previously required when using high bandwidth links or in some cases physical
tape offsiting.
Consider the “Before” and “After” scenarios detailed below.
Figure 6 Remote Site Data Protection Before Replication
Figure 7 Remote Site Data Protection Using Replication
Deduplication is the key technology enabler for replication on HP VLS and D2D systems. (VLS systems
use HP Accelerated deduplication, and D2D systems use Dynamic deduplication.) The same technology
that allows duplicate data to be detected and stored only once on the HP VLS or D2D system also
allows only the unique data to replicate between sites. Because the volume of data being replicated
between sites is much less than if the full data set was replicated, you can use lower bandwidth links
at correspondingly lower price points. In addition, backup at remote offices can be automated to a
local virtual tape library and then replicated back to a regional data center or primary data center
allowing end-to-end management from the data center of all data in the remote offices.
This transformation is shown in Table 5 which compares the amount of data to transfer both with and
without deduplication. The amount of data to back up in this example is 1 TB.
Table 5 Estimated Time to Replicate Data for a 1 TB Backup Environment at 2:1
Change rate   Data to replicate   Estimated time to replicate
1.0%          16.3 GB             35 hours    73 minutes     5.3 minutes
2.0%          22.5 GB             49 hours    102 minutes    7.3 minutes
T1/T3 and OC12 are old terms with respect to WAN link terminology. Many link providers use their
own names (e.g., IP Clear, Etherflow). This document distinguishes them by their speed using 2
Mbits/sec, 50 Mbits/sec, etc.
One consideration with replication is that you must “initialize” the Virtual Tape Libraries with data
prior to starting the replication. This ensures that the source and target devices are both synchronized
with the relevant reference data to allow them to interpret the changed data (deltas) that comes across
the WAN link during replication.
Replication Deployment Options
You can deploy the HP VLS and D2D systems for replication in many ways depending on your
requirements. You should understand the terminology associated with deduplication and replication.
The key terminology for replication deployment:
• Source: A series of slots/cartridges in a virtual library that act as the source data to replicate. This
is the original copy of the backup data, written to and managed by the source site’s backup application.
• Target or LAN/WAN destination: A series of corresponding slots in another virtual library on
another site in another location which receives data from the source library. This is the secondary
(disaster recovery) copy of the backup data, managed by the replication system.
For both HP VLS and D2D systems, the unit of replication is a virtual cartridge and the replication link
is TCP/IP (one GbE connection per node on the VLS system, and one to two GbE connections on the
D2D system). Figure 8 shows how you can configure the system to replicate all of the cartridges or
just a subset of cartridges from the source virtual library to the target virtual library.
Figure 8 Replication Configuration Options
• Active-Passive: The best deployment for a single site disaster recovery protection. The active source
device receives the local backup data and then replicates it to a passive target device at the disaster
recovery site dedicated to receiving replication data.
• Many-to-one: The best deployment for several smaller sites consolidating their replication to a
central disaster recovery site. Each source device in the smaller sites replicates to a central target
device. You can configure the target so that multiple sources replicate to a common virtual library
on the central device (each source replicates to its own subset of the cartridges in the virtual
library). Alternatively, each source can have its own dedicated virtual library. Up to four remote VLS
sites can copy to a single HP VLS at the central site at launch, and this will be increased over time.
Up to 16 remote D2D sites can copy to a single D2D4000 at the central site, and up to 24 remote
D2D sites can copy to a D2D4100.
• Active-Active and N-Way: The best deployment for sharing your VLS or D2D system hardware for
both receiving backups and receiving replication data (so each device is both a source and a
target as shown in the above diagram). Active-active is one way to implement cross-replication
between sites, but you can use two active-passive deployments to achieve the same result.
Choosing between either active-active or 2x active-passive deployments for cross-replication depends
on which provides the lowest cost. Active-active is only recommended if the backup traffic on each
device is only using up to half of the device’s maximum performance and capacity, because you
need additional performance and capacity for the replication target operations.
For example, if you have two VLS9000 sites that each require 2-nodes/2-arrays for just their
backup performance/capacity and 2-nodes/2-arrays for their replication target
performance/capacity, then you have the following choices for cross-replication deployment:
• Active-Active: Each site requires a 4-node/4-array VLS9000 (with deduplication) shared
between backups and replication target, one rack, and four replication LTUs.
• 2x Active-Passive: Each site requires a 2-node/2-array VLS9000 (with deduplication) for backup
and a separate 2-node/2-array VLS9000 (with deduplication) for replication target, two racks,
and two replication LTUs.
In this example, it costs less to use active-active because it adds two replication LTUs but saves the
hardware/power/footprint cost of a second rack and the cost of a second VLS connectivity kit.
However, if your backups require more than half of the maximum device performance (for example,
more than two nodes out of a maximum configuration of four nodes), you may have to
deploy two devices per site. In this case, a 2x active-passive deployment costs less in licensing
and leaves more room for future device scalability.
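The capacity side of this decision can be sketched as a simple rule of thumb. The function below is hypothetical and deliberately simplified: it only checks whether the backup and replication-target workloads fit together on one device, and it does not model rack, connectivity kit, or LTU pricing, which you would still need to compare.

```python
def cross_replication_deployment(backup_nodes, replication_nodes, max_nodes_per_device):
    """Suggest a cross-replication deployment from node requirements per site.

    Assumption: a shared (active-active) device must hold both the backup
    workload and the replication-target workload within its maximum node count.
    """
    if backup_nodes + replication_nodes <= max_nodes_per_device:
        return "active-active"
    return "2x active-passive"


# The VLS9000 example above: 2 nodes for backup, 2 for replication target, 4 maximum
print(cross_replication_deployment(2, 2, 4))  # active-active
# Backups need more than half of the device: two devices per site
print(cross_replication_deployment(3, 2, 4))  # 2x active-passive
```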
NOTE:
Multi-hop replication (replicating a cartridge from device A to device B, and then replicating the
replicated cartridge from device B to device C) is not yet supported.
Backup Application Interaction with Replication
The replication in both the VLS and D2D systems is mirroring the source cartridge to its matching
target cartridge so both cartridges have the same barcode, the same tape contents, etc. Backup
applications currently cannot handle seeing two copies of the same cartridge at the same time (because
to the backup application, the cartridge is a single entity in the media database). Given this limitation,
you must hide the target virtual library from the source device’s backup application:
• For VLS systems, the replication target is a subset or an entire virtual library that is presented on
front-end Fibre Channel ports, so if the source backup application is running media agents on the
target site you either need to use SAN zoning or the device’s LUN mapping feature to hide this
replication target virtual library from the source device’s backup application.
• For D2D systems, this is currently automatic because the replication target is hidden from all
external host access (until it is converted into a non-replicating library in the event of a disaster recovery).
Figure 9 Presenting the Replication Target to a Different Backup Application
On a VLS (by default) or a D2D (if you enable the read-only mode on the target library), you can still
present the replication target to a different backup application instance (i.e., a separate backup
application master/cell server on the target site with its own media database), which you can use to
“import” replicated cartridges into its media database and then perform restores or copy to physical
tape, etc. See “Creating Archive Tapes from the Target” on page 190 (VLS) or
“Creating Archive Tapes from the Target” on page 99 (D2D) for an example on automating this.
NOTE:
With HP Data Protector, if you have a cell server in each site that can share library devices across
sites through a MoM/CMMDB, you still need to ensure that each cell server only sees its local virtual
library (i.e., the source cell server must not be configured to see the target virtual library and vice-versa).
Replication Limitations
VLS and D2D replication may not work in every environment. Understand the possible limitations:
• Do not confuse Virtual Tape Library replication with “high availability/continuous access” which
is a type of full bandwidth replication used on Disk Array technology whereby primary application
data can be accessed within hours of a disaster at one site from the other site. Virtual tape replication is not high availability; it is a means of automating the offsiting of data resulting in better
disaster recovery coverage.
• System data change rate. The higher the data change rate, the more data requires replicating.
Systems with very high change rates and slower links may not be able to replicate all the data
off-site within 24 hours.
• High latency links. For very large distance replications with many routers involved, the latency of
the WAN link may lead to high inefficiency of the link where throughput is limited and replications
cannot be performed in a timely manner.
• Current link speed is too slow or the implementation of replication on the existing link will cause
unacceptable delays in application response times. Using the HP StorageWorks sizer tool and
some of your inputs, you can evaluate if you will need to increase an existing link speed to be
able to benefit from replication. See http://www.hp.com/go/storageworks/sizer.
• Some additional financial investment will be required as increased bandwidth links, hardware
additions, and/or deduplication and replication licenses, but in general the increased robustness
of the data protection process should pay for itself within 2–3 years.
• On the VLS, HP Accelerated deduplication relies on understanding the metadata format of the
incoming data stream. It does not currently support all data formats and backup APIs. In the case
where an HP VLS cannot deduplicate the data type, the data is sent untouched to the VLS. This
data is replicated as “whole cartridge”; the entire tape contents are replicated, not the deltas
or unique data objects. If a high percentage of your data cannot be deduplicated, the volume of data
to replicate will be very large. If you do not have very large volumes of data to replicate, you
should consider using HP whole cartridge replication. This works essentially in the same way as
replication using echo copy pools; it requires no tape transfer or initialization and no deduplication
or replication licenses. However, all data is transferred between sites, which means the WAN
links will have to be of considerably higher performance, at an associated higher cost.
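The interaction between data change rate and link speed described above can be checked with simple arithmetic. The following is a minimal sketch, not a substitute for the HP StorageWorks sizer tool; the function names and the link_efficiency factor (a rough allowance for latency and protocol overhead) are assumptions made for illustration.

```python
def replication_hours(changed_gb_per_day: float, link_mbps: float,
                      link_efficiency: float = 0.7) -> float:
    """Estimate the hours needed to send one day's changed data over a WAN link.

    link_efficiency is a hypothetical factor (0-1] for latency and protocol
    overhead; it is not a value taken from this guide.
    """
    if link_mbps <= 0 or not 0 < link_efficiency <= 1:
        raise ValueError("link speed and efficiency must be positive")
    megabits = changed_gb_per_day * 8 * 1000  # GB -> megabits (decimal units)
    seconds = megabits / (link_mbps * link_efficiency)
    return seconds / 3600


def fits_in_window(changed_gb_per_day: float, link_mbps: float,
                   window_hours: float = 24.0,
                   link_efficiency: float = 0.7) -> bool:
    """True if the changed data can be replicated within the window."""
    hours = replication_hours(changed_gb_per_day, link_mbps, link_efficiency)
    return hours <= window_hours
```

For data that cannot be deduplicated, changed_gb_per_day is the full size of the cartridges to be replicated, which is why whole cartridge replication needs a much faster link.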
3 Backup Solution Design Considerations
This section uses use models to explore many of the concepts you must consider when designing
your system. Use models are organizational schemes that provide a basic organizational framework
that you can use to delineate your environment and visualize how to implement the VLS or D2D for
your best results. You might also think of the use models as decision trees in which each branch
junction is a decision point that comes with value judgments and trade-offs. The decisions you make
lead to an implementation plan.
The choices you make are economic as well as functional. Different implementations can have different
direct costs such as investment in licensing or equipment purchase/usage, and indirect costs such as
investment in maintenance or configuration. You need to balance time, money, and use policies
against one another. As with the major categories (tape, VT/D2D, clone), there is no one-size-fits-all
solution. The optimum solution will be a balance of the models.
Analyze the Existing Environment
Carefully consider your environment in general, and your topology. Think about what you have, how
you use it, and most importantly, what you want to optimize or improve. Determine your goals for
your environment.
Consider How you Want to Back Up your Data to the VLS or D2D
• Identify the data that you want to back up to the VLS or D2D. For example:
• Data that needs a better service level than going directly to tape but does not need the highest
service level (split mirror, for example).
• Slow servers in your current backup topology. Supported topologies include SAN or LAN/SAN
hybrid:
Figure 10 Backup to the VLS in a LAN/SAN Hybrid Environment
This includes servers that contain lots of small files that are likely to slow down backup performance on the host (e.g., those found on Windows file servers and web servers), blade servers,
etc. Slower servers on the LAN typically interleave the backup through the media server and
go out to one tape drive. These are good candidates for VLS and D2D because you can configure multiple virtual tape drives, disable multiplexing, and then back up each slow server in
parallel (with multiplexing disabled, the restores will run faster). See Multiplexing,
Multistreaming, and Multipathing. You can check the true performance of your servers in a
variety of ways, including using the tools found in the Software Downloads section of the
Developer & Solution Partner Program web page: http://h21007.www2.hp.com/dev/.
• Data that you want to electronically off-site using deduplication-enabled replication (and thus
must be backed up to a local VLS or D2D which can then replicate the data automatically to
another VLS or D2D in another site).
• Archive data.
• Look for aggregate bandwidth bottlenecks between hosts and backup devices. One common
bottleneck is the LAN bandwidth (for LAN backups), so identify which of the larger application
servers could perform LAN-free backup (see LAN-free Backups) and thus move their backup traffic
from the LAN to the SAN.
• List your backup applications. In a heterogeneous environment, you can have multiple applications
writing to different virtual libraries on the same VLS or D2D (which is highly flexible with regard
to the libraries that can be configured). This is similar to library partitioning but more flexible and
easier to set up and maintain. To reduce backup administration, always use the fewest number of
virtual libraries. For example, use one virtual library for each backup application. (See Single
Library vs. Multiple Libraries.)
• How long do you want to retain the data on the VLS or D2D? (See Retention Planning.)
• How fast is your data growing? (See Future Data Growth.) This is critical in deciding which disk
backup technology best fits because you should choose a technology that will continue to scale
in performance and capacity so that it can provide a single backup target even after several years
of data growth.
Consider How you Want to Copy the Backup Data to an Off-site Location
Think about copying the backup data to an off-site location for site disaster protection or long-term
archival. See Considerations for Copies for more details.
Consider Speed and Ease of the Restore as well as the Backup
If you are still within your backup retention period, you can restore the data directly from the disk
backup device. If you are outside of your retention period, you can restore from your long-term retention
tapes if you have created them. If your local disk backup device has been destroyed (i.e., site disaster),
you can recover from your off-site copies (could be tape or could be another replicated disk backup
device). See Considerations for Restores.
Single Library vs. Multiple Libraries
The VLS and D2D configurations are highly flexible with regard to the number and types of libraries.
There are two basic backup use models (single library and multiple library). The choice between them
depends primarily on the architecture of the backup applications. To reduce backup administration,
use the fewest number of virtual libraries. For example, use one virtual library per backup application.
Single Library
The single-library use model is a many-to-one configuration -- all hosts see one communal library. This
is the default configuration.
Figure 11 Backup to VLS in a Simple Deployment (VLS9000–series with One Shared Library Shown)
Benefits of Single Library Systems
• This use model is easy to manage.
• It is easy to copy through the backup application because you already have shared devices and
they all see and are seen by a copy engine or server.
• Some configuration tasks are easier; you don’t, for example, have to worry about assigning
specific jobs to specific libraries.
• You don’t have to worry about performance tuning. If, for example, you have enough bandwidth
for ten backup streams, you configure ten drives and let the device load balance automatically.
The backup application decides how to allocate; you don’t have to specify.
• Multiple hosts are mixed together in one media set. When you copy to tape, you’re using less
media.
• With the VLS9000 or VLS Gateway, you can consolidate your storage system to use only one
large VLS (greater than 500 usable TB) instead of multiple smaller appliances.
If you use Data Protector, licensing is relatively economical: media servers do not require individual
licenses, object copy doesn’t require additional licensing, and virtual tape is licensed per TB. Setup
is largely automated and does not require manual intervention.
Considerations for Single Library Systems
• Not all backup applications are well-suited to this setup. This depends (at least in part) on the
application’s licensing schemes. If you have to purchase shared drive licenses, this model may be
more expensive than a dedicated setup.
• Setup can be challenging in some applications. You may have to configure paths to the tape
drives from all the backup hosts. The backup application may do some of this automatically, but
this, in itself, can be time consuming. And if you change the configuration, you may have to repeat
some management steps and do some re-configuration.
• Multiple hosts are mixed together in one media set. Although you may be using less physical media,
this may not meet your media tracking and pooling needs.
Multiple Library
In this model, multiple hosts map to discrete libraries in a one-to-one or some-to-one configuration.
Figure 12 Backup to VLS in a Simple Deployment (VLS9000–series with Four Dedicated Libraries Shown)
Benefits of Multiple Library Systems
The assignment of drives and media in this configuration is more individualized. This model is helpful
when you need to control the organization and grouping of media. Particular backup jobs go to
particular media (or pools) and no other hosts are writing to that media. Note that backup jobs can
be assigned to particular media in the shared model, also, but it’s easier to group physical media in
the dedicated model because you can group media by host.
You might choose the multiple-library (dedicated) use model if:
• Your backup application won’t handle SAN sharing or if your backup application’s licensing is
prohibitively expensive in a shared environment. You may be able to spend less on licenses because
you can pick the hosts that you want to be visible to particular LUNs.
• You have multiple backup applications. Configure one virtual library per backup application.
• You have multiple SAN fabrics. Configure one virtual library per SAN.
• Your backup application does not handle SAN sharing.
• You have licensing concerns with your backup application. You can adjust library slots and drives
to optimize license fees.
If you choose to copy to tape through the backup application, there are two basic options:
• Shared copy library – There are two scenarios: 1) You have a dedicated copy host that sees the
copy library and all of the dedicated virtual libraries. There is SAN sharing and you may have to
do LUN mapping and masking. 2) All hosts write to the library as a shared device. This involves
copies going on during prime backup or operational time on the hosts.
• Dedicated copy library for each virtual library -- This may be expensive with multiple virtual libraries.
Multiplexing, Multistreaming, and Multipathing
This section explains the concepts of Multiplexing, Multistreaming and Multipathing, and notes which
of these technologies are useful for disk based backup devices and which are not recommended.
Multiplexing
This technology (sometimes called interleaving) is where multiple backup streams (from multiple servers)
are concurrently mixed together into a single tape drive (and thus a single tape cartridge). This is
commonly used for physical tape drives such as LTO where the speed of a single backup stream is
too slow to maximize the tape drive performance and you must run multiple concurrent streams together
into the drive to achieve full drive performance. However, this also means that restore performance
is severely affected because to restore a single backup stream (i.e., restore one server) the backup
application still has to read all of the intermixed data on the tape cartridge and then throw away the
data from the other backups. If you have five servers’ backups multiplexed together on one tape
cartridge, you only get 20% restore performance when you restore one server.
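The 20% figure above follows directly from the number of interleaved streams. As a minimal sketch, assuming the streams are interleaved evenly and ran at equal rates (the function names are hypothetical, chosen for illustration):

```python
def restore_fraction(multiplexed_streams: int) -> float:
    """Fraction of the data read from tape that belongs to the one stream
    being restored, assuming evenly interleaved streams of equal size."""
    if multiplexed_streams < 1:
        raise ValueError("at least one stream is required")
    return 1.0 / multiplexed_streams


def effective_restore_rate(drive_read_mb_s: float,
                           multiplexed_streams: int) -> float:
    """Useful restore throughput for one server, in MB/s."""
    return drive_read_mb_s * restore_fraction(multiplexed_streams)


# Five servers multiplexed onto one cartridge: one fifth of what is read is useful.
print(restore_fraction(5))  # 0.2
```

With multiplexing disabled (one stream per virtual drive), the fraction is 1.0 and the full read rate goes to the restore.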
Multiplexing is not recommended. When you have a disk backup device, you should disable
multiplexing on all backups going to the device because the main advantage of a disk backup device
is the ability to run many more concurrent backup streams than an equivalent physical library with
limited numbers of tape drives. For example, on the VLS you can create 32 virtual tape drives on
every Fibre Channel port and allow 32 non-multiplexed backup streams to be written concurrently to
each Fibre Channel port. This improves backup performance by allowing more backups to run in
parallel and improves restore performance because the backups are non-multiplexed. Disabling
multiplexing also improves deduplication efficiency and performance.
Multistreaming
This technology is where multiple objects in a backup job can be written concurrently to multiple tape
drives (each object going to a different drive). For example, if you have a file server with C:, D:, and
E: volumes, with multistreaming this could be concurrently written to three separate tape drives (C:
going to drive1, D: going to drive2, E: going to drive3). This increases the backup performance of
that server. Another example is a database backup where you may have multiple tablespaces or files
within the same database and these objects can be written in parallel to the tape drives, or some
database backup agents such as Oracle RMAN allow multiple streams even from one object (e.g.,
one database tablespace).
Multistreaming is recommended for disk based backup, particularly because many virtual tape drives
can be created thus allowing more concurrent streams to run from multistream backup jobs.
Multipathing
This technology uses two Fibre Channel paths from the backup server to the virtual tape library,
allowing for higher availability; for example, if you have dual SAN fabrics and wish to continue
operations even if one entire fabric fails. The VLS and D2D support the option of presenting any virtual
library device (changer or tape drive) on two Fibre Channel ports.
The main use of this is to present the virtual library changer device over two Fibre Channel ports
because the backup application must always be able to access the changer to perform any
backup/restore operations. Many enterprise backup applications (such as HP Data Protector, Symantec
NetBackup, etc.) support dual paths to the library changer device and will automatically switch over
to the alternate path if the primary path fails.
Multipathing virtual tape drives is not recommended. Many enterprise backup environments (operating
systems or backup applications) do not support dual paths to tape drives because they see them as two
separate drives. Because a virtual library can present many virtual tape drives over multiple
Fibre Channel ports, the only impact of a SAN fabric failure (provided you have a dual-path changer
configured) is that half the virtual tape drives cannot be accessed; the remaining half
of the drives in the library can still be used and can still access every virtual cartridge in the library.
A SAN fabric failure also halves the available Fibre Channel performance,
so if full backup performance is required even after a fabric failure, the VLS needs to be
configured with double the number of Fibre Channel ports. Instead of configuring dual-path virtual
tape drives, it is better to configure single-path virtual tape drives across multiple Fibre Channel ports that
are connected to both SAN fabrics.
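The port-count reasoning above can be sketched as follows. This is an illustrative calculation only; the function names, the per-port throughput parameter, and the port labels are assumptions, not values or identifiers from the VLS.

```python
import math


def ports_required(target_mb_s: float, per_port_mb_s: float,
                   survive_fabric_failure: bool) -> int:
    """Fibre Channel ports needed to reach a target backup throughput.

    If full performance must survive the loss of one of two SAN fabrics,
    the port count is doubled, as the text above describes.
    per_port_mb_s is a hypothetical figure you must supply.
    """
    base = math.ceil(target_mb_s / per_port_mb_s)
    return base * 2 if survive_fabric_failure else base


def spread_drives(num_drives: int, ports: list) -> dict:
    """Assign single-path virtual drives round-robin across ports.

    If the ports list alternates between the two fabrics, half of the
    drives remain reachable when one fabric fails.
    """
    layout = {port: [] for port in ports}
    for drive in range(num_drives):
        layout[ports[drive % len(ports)]].append(drive)
    return layout
```

For example, spread_drives(4, ["fabricA-port0", "fabricB-port0"]) places drives 0 and 2 on the first fabric and drives 1 and 3 on the second.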
NOTE:
The D2D can configure its two LAN ports in high availability port mode (see D2D Ethernet Ports), but
this is transparent to the backup application because it only sees one "bonded" path to the virtual
library.
Blocksize and Transfer Size
As with physical tape, larger tape block sizes and host transfer sizes are of benefit; they reduce the
amount of overhead of headers added by the backup application and the transport interface.
• D2D: HP recommends a minimum of 64 KB blocksize and suggests up to 1 MB.
• VLS: HP recommends a blocksize of 256 KB (the maximum supported).
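As a minimal sketch, the two recommendations above can be expressed as a simple check to run against a planned backup application block size. The names are hypothetical, and treating "KB" as 1024 bytes is an assumption.

```python
KIB = 1024  # assumption: the guide's "KB" is treated as 1024 bytes

# Recommended block size ranges in bytes, taken from the two bullets above.
RECOMMENDED = {
    "D2D": (64 * KIB, 1024 * KIB),  # minimum 64 KB, suggested up to 1 MB
    "VLS": (256 * KIB, 256 * KIB),  # 256 KB, the maximum supported
}


def check_blocksize(device: str, blocksize_bytes: int) -> str:
    """Compare a planned block size with the recommendation for the device."""
    low, high = RECOMMENDED[device]
    if blocksize_bytes < low:
        return "below recommendation"
    if blocksize_bytes > high:
        return "above recommendation"
    return "ok"
```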
LAN-free Backups
All enterprise backup applications provide the ability to run “LAN-free” backups. This is where
application servers connected to the SAN run the backup media agent and can perform backups
directly over the SAN to the tape library. This removes this backup traffic from the LAN and speeds
up backups (by removing the LAN performance bottleneck). One of the disadvantages of LAN-free
backup to a physical library is that it cannot be multiplexed because each backup stream is written
directly from the data source to the tape drive, and so it could generally only be used with the largest
application servers which can supply high speed backup streams.
One of the key advantages of a SAN virtual library is the ability to increase the amount of LAN-free
backup used. With a virtual library you do not have to provide maximum performance per LAN-free
backup stream as needed on physical tape drives. You can create many virtual drives and thus enable
LAN-free backup on any SAN-connected application servers (regardless of their performance) that are
large enough to bother with LAN-free backup. This improves the backup performance of the LAN-free
backups and also improves performance of the remaining LAN backups because the LAN-free backup
traffic has been removed from the LAN.
Retention Planning
Retention planning and sizing go hand in hand. How long do you need to keep data on disk? How
many full backups do you want to keep on disk? How many incremental backups? How do you want
to optimize retention times of the VLS or D2D? Retention policies help you recycle virtual media. Bear
the following considerations in mind as you plan retention policies:
• If you are not using deduplication, you can retain the data on the disk backup device for a shorter
period such as 1-2 weeks (because more than 90% of restores generally occur within the first
week’s retention of backups) and then use tape copies to retain data for longer periods.
• If you are using deduplication, you can retain data on your disk backup device with the same
level of retention times as you would have had on tape. This provides a more granular set of recovery points with a greater likelihood that a file that you need to recover will be available for
longer and in many more versions. To use deduplication-enabled replication you must have at
least two full backups retained on the disk backup device.
• Once the retention period expires, the virtual media is automatically recycled.
• You should set the tape expiration dates (that is, when the tape is marked as worn out) high because
virtual media does not wear out.
• Backup-job retention time is for virtual media.
• Copy-job retention time is for physical media.
• When copying through the backup application, the virtual and physical pieces of media are
tracked separately and the retention times should be considered and set individually.
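The retention questions above (how many full and incremental backups to keep on disk) translate into a rough capacity figure. The sketch below covers only the case without deduplication or compression, which is a simplifying assumption; the function names are hypothetical.

```python
def disk_capacity_needed_tb(full_size_tb: float, fulls_retained: int,
                            incr_size_tb: float, incrs_retained: int) -> float:
    """Raw capacity needed to hold the retained backups, ignoring
    deduplication and compression (a simplifying assumption)."""
    return full_size_tb * fulls_retained + incr_size_tb * incrs_retained


def meets_replication_minimum(fulls_retained: int) -> bool:
    """Deduplication-enabled replication needs at least two full backups
    retained on the disk backup device, as stated in the list above."""
    return fulls_retained >= 2
```

For example, keeping two 10 TB full backups and ten 1 TB incremental backups needs about 30 TB before any deduplication savings.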
Future Data Growth
How fast is your data growing? This is critical in deciding which disk backup technology best fits;
choose a technology that will continue to scale in performance and capacity so that it still can provide
a single backup target even after several years of data growth.
If you choose a disk backup technology that starts close to its maximum performance/capacity, then
as data grows you must add another device (and thus another backup target) and so on, which means
significant increases in backup administration over time due to having to manually balance all backup
jobs across multiple backup targets. This manual balancing is even more difficult when you have
multiple targets that are each separate deduplication domains, because then when you switch a
backup job from one target to another it “resets” the deduplication for that backup making capacity
planning very complex.
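To judge how long a device can remain a single backup target, you can project compound data growth against its maximum capacity. This is a minimal sketch with hypothetical names; it assumes a constant annual growth rate, which real environments rarely have.

```python
def years_until_full(current_tb: float, annual_growth: float,
                     capacity_tb: float, max_years: int = 50) -> int:
    """Number of whole years of growth the device can absorb before the
    data no longer fits. annual_growth is a fraction (0.5 means 50%).
    Returns 0 if the data already exceeds the capacity."""
    if current_tb > capacity_tb:
        return 0
    size = current_tb
    years = 0
    while years < max_years:
        size *= 1 + annual_growth  # compound growth, one year at a time
        if size > capacity_tb:
            return years
        years += 1
    return max_years
```

For example, 100 TB growing at 50% per year fits within a 500 TB device for three more years (150, 225, then 337.5 TB) and exceeds it in the fourth.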
Considerations for Copies
Generally the disk backup device is located at the same site as the data being backed up, which
means you must create copies of the backup data that are stored in a different remote site to protect
against a site disaster. There are also sometimes long-term (e.g., multi-year) retention requirements
for some backup data that may require copying the backups to physical tape. (There may be legal
requirements to use tape, and a tape on a shelf uses zero power compared to storing the long-term
retention backup data on spinning disks.) When designing your solution, consider the following
options available for copying backups. You can implement all of these without affecting the application
servers and with the minimum of impact on data center backup processes.
• Copy your backups from your virtual library to physical tape and then ship these physical tapes
to off-site storage. There are different methods of creating the copy on physical tape:
• Use the backup application to copy data from the virtual library to the physical library (preferred
method).
• Use the automigration functionality within the VLS (which turns the VLS into a disk-cache of the
physical library) or the tape offload functionality within the D2D.
• If you are using deduplication, another option is to use the deduplication-enabled replication
technology, which cost-effectively copies the backups to another remote device (e.g., VLS replicates
to another VLS, D2D replicates to another D2D) and removes the need to move tapes between
sites. See VLS Replication.
Copy to Physical Tape through the Backup Application
You can migrate data from virtual media in the VLS or D2D device to physical tape media using the
various tape copy/clone mechanisms that exist in enterprise backup applications. Using the data
protection software to copy from virtual to physical media has the following advantages compared
to any transparent migration to tape:
• The virtual and physical media are separately tracked in the backup application’s database, so
you can have different retention times for virtual and physical media and the two pieces of media
can be in different locations. This provides flexibility when restoring data; you can restore from
tape or (if you have configured your system appropriately) you can restore from the virtual library
because it can retain virtual media online after the physical media is sent off-site.
• The tape copy process is an integrated part of the backup application.
• Improved media management and utilization. You can, for example, specify which backups are
copied to physical tape, so fewer tapes are used (the difference can be as much as half).
• The tape library can be used for other tape backup processes. The tape library is a common re-
source on the SAN because it is not hidden behind the VLS or D2D.
After installing the VLS or D2D device and configuring/redirecting the backups to the device, you
will create tape copy/clone jobs in the backup applications to perform the migration to physical tape.
Most backup applications include several copy/clone options, ranging from creating the tape copy
at the same time as the backup (mirroring), scheduling a copy of all backup media to run when all
the backups are expected to finish (scheduled copy), or creating the tape copy immediately after each
backup completes (triggered copy). Scheduled copy is the recommended option so the copy jobs run
after the backup window; otherwise, having copies and backups running at the same time results in
significant performance loss.
When copying virtual cartridges to physical cartridges through the backup application, the data is
read from the virtual library into one of the available media servers which then writes the data back
out again to the physical library. The physical tape library is independent of the virtual tape library
and is entirely under the control of the backup application. This means that the virtual library type
and virtual drive type and virtual cartridge size do not need to match the physical library in any way;
the backup applications all perform “object copy” where they effectively back up the backup and
therefore append and span copy jobs onto the physical tapes just like they would with backup jobs.
Figure 13 Writing to Tape in a LAN/SAN Hybrid Environment
Media Server Considerations
To support the background copy of backup data from the VLS or D2D to a tape library, use one or
more of your existing backup media servers (running the backup application media agents) to copy
the data directly from the device media onto the tape media with data passing over the SAN from
the VLS or D2D to the media server and then back again to the tape library. If you are running the
copies after the backup window is complete, all of the media servers that were performing LAN
backups are now idle and can be used for copies. Do not use media servers that are performing
LAN-free backup because these will be active application servers and could affect application
performance if used for copy operations. Ensure that the media servers used for tape copies are
plugged into the same SAN switch as the virtual library and physical library so the copy traffic does
not use any SAN bandwidth. In this configuration, the tape copy does not impact the running
application servers, so it can be run in the background after the backups are completed.
Benefits of Copying to Physical Tape through the Backup Application
Using a dedicated media server to migrate data stored on a VLS or D2D to a physical library allows
you to:
• Use the functionality of a backup application.
• Copy only the specific files you need onto physical tape.
• Not waste tape storage as physical tapes can be fully filled.
• Monitor and track copy jobs from the backup application.
• Use any tape library that is supported by the backup application.
Considerations for Copying to Physical Tape through the Backup Application
There are potential drawbacks to using a backup application instead of caching to copy data on the
VLS or D2D to physical tape. Consider the following when deciding which data migration method to
use:
• You must purchase additional backup application licenses for the physical library.
• You must configure a backup application with a scheduled copy job to perform the migration.
• If you have the media servers on different SAN switches from the libraries, copy bandwidth is
added to the SAN during data migration.
Copy to Tape using VLS Automigration
You can migrate data from virtual media in the VLS device to physical tape media using
automigration/tape-offload to perform transparent tape migration. The automigration echo copy
feature allows the VLS to act as a tape copy engine that transfers data from virtual cartridges on disk
to a physical tape library connected to the VLS device. See VLS Automigration.
Echo copy essentially acts as a transparent disk cache to the physical library so echo copy jobs can
be performed at times other than the peak backup window. Once automigration is set up, echo copy
operations are automatic and seamless to the user. Echo copy is managed through the automigration
software on the VLS instead of through the backup application.
Figure 14 Echo Copy is Managed through Automigration
Benefits of Echo Copy
• The destination library is not visible to the backup application, so it does not need licensing.
• There is no need to license/configure copy jobs in the backup application.
• Copy traffic is no longer routed through the SAN.
• Maximizes performance of high-speed tape drives by freeing bandwidth during the normal backup
window.
• All destination tapes created by echo copy are in the native tape format, so they can be restored
directly from any tape drive or library that is visible to the backup application.
Considerations of Echo Copy
• The destination library can only be used for copy operations.
• The copy is a full tape copy, rather than an incremental change copy, so it can be an inefficient
use of media if you are using non-appending media pools in your backup jobs.
• The backup application will not be aware of any copy failures.
• Echo copy can only use the tape libraries supported by the VLS firmware (for example, HP MSL,
EML, ESL-E libraries).
• Because echo copy automatically links the virtual and physical tapes (to maintain media management
with enterprise backup applications), human error is a factor to keep in mind. With linked
media management, any mistakes with the destination library's media will also affect the virtual
cartridges. For example, if new tapes are not loaded into the destination library, the new, matching
virtual tapes will not be created. Subsequent backups will fail because the virtual tapes present
are protected.
NOTE:
Automigration echo copy is not suitable for use with deduplication because:
• You cannot use echo copy to create archive tapes from the replication target device because these
must have a different barcode, retention time, cartridge size, and contents. They must be created
by another instance of the backup application. (See Creating Archive Tapes from the Replication
Target.)
• Echo copy acts as a disk-cache to the physical tapes, so when physical tapes are ejected from the
library (for example, to be sent off-site) this also ejects the matching virtual cartridges into the
Firesafe; they are disabled in the deduplication system.
• Echo copy only copies whole cartridges and because the size of deduplicated virtual cartridges
is generally 50-200 GB, a large amount of tape capacity would be wasted if LTO3 or LTO4
physical tapes were used.
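The wasted tape capacity mentioned in the last point can be estimated as follows. This is a minimal sketch with hypothetical names; it uses the native (uncompressed) capacities of LTO-3 (400 GB) and LTO-4 (800 GB) and ignores tape drive compression.

```python
# Native (uncompressed) cartridge capacities in GB.
LTO_NATIVE_GB = {"LTO3": 400, "LTO4": 800}


def tape_utilization(virtual_cartridge_gb: float, tape_type: str) -> float:
    """Fraction of a physical tape filled by one whole-cartridge echo copy."""
    native = LTO_NATIVE_GB[tape_type]
    return min(virtual_cartridge_gb / native, 1.0)


def wasted_gb(virtual_cartridge_gb: float, tape_type: str) -> float:
    """Native capacity left unused on the physical tape."""
    return max(LTO_NATIVE_GB[tape_type] - virtual_cartridge_gb, 0)
```

A 200 GB virtual cartridge fills a quarter of an LTO4 tape; a 50 GB virtual cartridge leaves 350 GB of an LTO3 tape unused.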
Copy to Tape using D2D Tape Offload
You can offload cartridges to physical tape using a tape drive or library that is physically attached
to the D2D system via SAS or pSCSI and is not visible to the backup application. This solution offers
the following benefits:
• Tape offload can be conducted during normal working hours without affecting network performance
because no data is sent over the network during the offload.
• Cartridges can be offloaded at the maximum read performance for a virtual library which makes
this process relatively fast.
• Offloaded cartridges are in backup application format, so you can use them to restore data directly
using a tape drive attached to a media server if necessary.
However, you must take into account a number of shortfalls of this configuration:
• The backup application cannot track the physical cartridges because it played no part in their
creation. A cartridge copied by the D2D is only valid for as long as the virtual cartridge remains
current; if the virtual cartridge is modified (overwritten or appended), the physical cartridge content
no longer has a valid entry in the backup application database. However, you could still use it
for disaster recovery if you lost the backup application database. Creating long rotation schemes
where cartridges are overwritten infrequently works well with this model.
• Even with the scheduling features provided in the D2D GUI, offloads are hard to accurately
schedule to coincide with completion of the original backup.
Do not use physical tape offload as a way to free up space on the D2D by removing the virtual
cartridge after an offload. Doing this will not save space due to the deduplication effect and will result
in backup application database entries becoming invalid. In addition, HP does not recommend using
physical media to extend the retention period of a backup (i.e., keeping a physical cartridge beyond
the point where its associated virtual cartridge has been overwritten) because the backup application
database will become inconsistent with the backup on tape.
Copy to Remote Disk Backup Device using Replication
Deduplication can automate the off-site process and enable disaster recovery by providing site to site
deduplication-enabled replication at a lower cost. Because deduplication knows what data has
changed at a block or byte level, replication becomes more intelligent and transfers only the changed
data instead of the complete data set. This saves time and replication bandwidth, and is one of the
most attractive features that deduplication offers. Replication enables better disaster tolerance without
the operational costs associated with transporting data off-site on physical tape. See Introduction to
Replication.
The replication system on both VLS and D2D creates and maintains a cartridge mirror between the
source and target devices. Once replication is set up, operations are automatic and seamless to the
user. Replication is managed through the firmware on the VLS or D2D, not the backup application.
Benefits of Replication
• Improves the overall reliability in offsiting data.
• Completely automates offsiting data.
• Improves the resilience of remote offices and regional data centers while maintaining local data
recovery capabilities.
• Reduces the overall costs of offsiting data compared to physical tape offsiting when you consider
all manual costs.
• The replication target library is not visible to the backup application and does not need licensing.
• There is no need to license/configure copy jobs in the backup application.
Considerations of Replication
• The backup application will not be aware of any copy failures.
• You cannot have different retention times between the source and target media because cartridges
are mirrored. However, because deduplication reduces the disk space required to retain backups,
you do not generally need different retention times between source and target.
• You cannot present both the source and target media to the same backup application at the same
time. (See Backup Application Interaction with Replication.)
Creating Archive Tapes from the Replication Target
Using replication you cannot present both the source and target media to the same backup application
at the same time. (See Backup Application Interaction with Replication.) You cannot use automigration
(which mirrors whole cartridges) to create the archive tapes because they will have a different barcode,
retention time, contents, and potentially a different size from the replication target media. To create
archive tapes from the replication target media, you must use a second instance of the backup
application to perform this. (See Creating Archive Tapes from the Target.)
The backup application must have the ability to “import” tapes created on one backup application
domain into another domain (e.g., importing a cartridge from a Data Protector cell server to another
cell server). This can be done manually, or it can also be done automatically via a script. On the VLS
the script can be driven by an “ISV Email” email report. See Creating Archive Tapes from the Target
for details and example scripts.
Considerations for Restores
There are two restore use models: from tape and from the VLS or D2D. In either case, copies are in
native tape format and so can be restored directly. Retention policies drive copy and restore use
models.
Restoring from Disk Backup Device
Restore from the VLS or D2D is performed in the same way that restore from tape is done. Simply
direct your backup application to do the restore.
NOTE:
A restore from one tape can be performed at the same time as a backup to a different tape.
Restoring from Backup Application-created Tape Copy
The format of physical tapes used in the VLS and D2D environments is the same as the format in
environments not using a virtual tape library. To restore from tape, place the tape in a library or drive
that the host has access to and restore.
• Time to retrieve media is a factor in recovery time. Depending on the location of your vault and
the media management within the vaulting process, retrieval time can be a key part of recovery
time.
• Recovery from tape can be slower than recovery from disk.
Restoring from the Replication Target
Consider a use case after a site disaster where the local VLS or D2D device is unavailable and the
restore must be performed from the replication target device instead. There are three main methods
of performing a disaster recovery restore from a replication target:
• Restore directly from target device.
The replicated cartridges in the target device are all native format (so they can be restored directly
by a backup application). After presenting the replication target to a replacement backup application and restoring its media database, you can restore your servers from the VLS or D2D device.
See Restore Directly from the VLS Target Device and Restore Directly from the D2D Target Device.
• Restore over LAN/WAN.
This option is where some or all of the source device is rebuilt by restoring the replicated cartridges
over the LAN/WAN back from the target to the source. In the VLS, this is a wholesale (non-deduplicated) restore so can only restore a subset of the source device such as the last backup set. (See
Restore the VLS over the LAN/WAN.) In the D2D, this is a deduplicated reverse replication so
can rebuild the entire source device. (See Reverse Replication on the D2D.)
• Reverse tape initialization.
This option is where the replicated cartridges on the target device can be exported to physical
tape which can then be imported back into a new source device to rebuild it. This is currently only
supported on D2D devices. (See Reverse Tape Initialization on the D2D.)
Performance Bottleneck Identification
In many cases, backup and restore performance using the VLS or D2D is limited by external factors.
For example, performance is affected by the speed at which data can be transferred to and from the
source disk system (the system being backed up), or by the performance of the Ethernet or Fibre
Channel SAN link from the source to the VLS or D2D. To locate bottlenecks in the system, HP provides
some performance tools which are part of the Library and Tape Tools package available at
http://www.hp.com/support/tapetools.
The HP Library and Tape Tools:
• Dev Perf – This provides a simple test which will write data directly from system memory to a
cartridge in a library on the VLS or D2D system. If this tool is run on a server being used as the
backup media server, it can provide the maximum data throughput rate for a single backup or
restore process. This isolates any backup application or source disk system from the environment
and helps identify whether the bottleneck is the VLS or D2D system or the data transport link (Ethernet or Fibre Channel SAN) to the VLS or D2D system. For example, if this test reports 30-40
MB/s on a D2D, there is not a problem with the D2D or transport link. This test should report 100-150 MB/s on a VLS. Before starting the test, HP recommends creating a new virtual library to avoid
overwriting any data on the current libraries. Use the frontpanel function on the Library and
Tape Tools to manually "move" a cartridge into the tape drive.
• Sys Perf – This tool provides two tests which are conducted on the source disk system to perform
backup and restore performance tests. These tests either read (for backup) or write (for restore)
from and to the system disks to calculate how fast data can be transferred from disk and therefore
whether this is a bottleneck. These tests should be run on the media server which backs up to the
VLS or D2D. In order to test how fast any client servers can transfer data, the same performance
tests can be used by mounting a directory from any of the client servers to the media server then
running the test from the media server against these mounted directories; this will show how quickly
data can be transferred from the client server disk right through to the media server.
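Once each stage has been measured, the slowest stage bounds the end-to-end backup rate. The following minimal sketch compares measured rates; the stage names and numbers are hypothetical and are not output from the Library and Tape Tools.

```python
def find_bottleneck(stage_rates_mb_s: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and its rate; the end-to-end rate cannot exceed it."""
    if not stage_rates_mb_s:
        raise ValueError("at least one measured stage is required")
    stage = min(stage_rates_mb_s, key=stage_rates_mb_s.get)
    return stage, stage_rates_mb_s[stage]


# Hypothetical measurements in MB/s, entered by hand after running the tests.
measured = {
    "client disk read via mounted directory (Sys Perf)": 22.0,
    "media server disk read (Sys Perf)": 55.0,
    "memory to virtual cartridge (Dev Perf)": 38.0,
}

stage, rate = find_bottleneck(measured)
print(f"Slowest stage: {stage} at {rate} MB/s")
```

With these sample figures the client disk path, not the backup device or the transport link, would be the first place to investigate.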
Backup SAN Design Guidelines
The design of your SAN environment will affect the performance, efficiency, and reliability of your
backup and recovery scheme. Good SAN design is conducive to good VLS and D2D performance.
Inefficient SAN design can degrade the performance and efficiency of all members of the SAN.
General SAN Design Considerations
See the reference materials at www.hp.com/go/ebs for information about:
• SAN design and configuration
• Synchronizing equipment and firmware (heterogeneous SAN support)
• Working across operating systems (SAN in heterogeneous environments)
SAN Zoning
Due to complexities in multi-hosting tape devices on SANs, it helps to make use of zoning tools to
help keep the backup/restore environment simple and less susceptible to the effects of changing or
problematic SANs. Zoning provides a way for servers, disk arrays, and tape controllers to only see
what hosts and targets they need to see and use. See the Enterprise Backup Solutions Design Guide
available at http://www.hp.com/go/ebs for details.
The benefits of zoning include:
• The potential to greatly reduce target and LUN shifting.
• Limiting unnecessary discoveries on the FC interfaces.
• Reducing stress on backup devices by polling agents.
• Reducing the time it takes to debug and resolve anomalies in the backup/restore environment.
• Reducing the potential for conflict with untested third-party products.
Zoning may not always be required for configurations that are already small or simple. Typically the
bigger the SAN, the more zoning is needed. HP recommends the following for determining how and
when to use zoning.
• Small fabric (16 ports or less) may not need zoning. If no zoning is used, make sure that the virtual
tape external Fibre Channel connections reside in the lowest ports of the switch.
• Small to medium fabric (16 - 128 ports) can use host-centric zoning. Host-centric zoning is implemented
by creating a specific zone for each server or host, and adding only those storage elements
to be utilized by that host. Host-centric zoning prevents a server from detecting any other devices
on the SAN, including other servers, and it simplifies the device discovery process.
• Disk and tape on the same pair of HBAs is supported along with the coexistence of array multipath
software (no multipath to tape, but coexistence of the multipath software and tape devices).
• Large fabric (128 ports or more) can use host-centric zoning and split disk and tape targets.
Splitting disk and tape targets from being in the same zone together will help to keep the tape
controllers free from discovering disk controllers which it does not need to see. For optimal performance, where practical, dedicate HBAs for disk and tape.
NOTE:
Overlapping zones are supported.
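The zoning recommendations above can be sketched as a small helper. This is an illustration only: the guide lists 16 and 128 ports in two categories each, so the sketch assumes that 16 ports counts as small and 128 ports counts as large. The function names, zone naming scheme, and host/target names are hypothetical and do not correspond to any switch vendor's interface.

```python
def recommend_zoning(fabric_ports: int) -> str:
    """Map a fabric size to the zoning approach suggested in this guide."""
    if fabric_ports < 1:
        raise ValueError("fabric_ports must be positive")
    if fabric_ports <= 16:  # assumption: exactly 16 ports is treated as small
        return "zoning optional; keep virtual tape connections on the lowest switch ports"
    if fabric_ports < 128:  # assumption: exactly 128 ports is treated as large
        return "host-centric zoning"
    return "host-centric zoning with disk and tape targets split"


def host_centric_zones(host_targets: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build one zone per host holding that host and only the targets it uses."""
    return {
        f"zone_{host}": [host] + sorted(targets)
        for host, targets in host_targets.items()
    }


# Hypothetical hosts and virtual tape targets.
zones = host_centric_zones({
    "mediasrv1": ["vls_port0"],
    "mediasrv2": ["vls_port1", "vls_port0"],
})
print(recommend_zoning(64))
print(zones)
```

Because each zone contains a single host, no server appears in another server's zone, which matches the host-centric behavior described in the list.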
Operating System Tape Configuration
Enterprise backup applications generally support media agents (which communicate with tape and
virtual tape devices) on various operating systems to provide a heterogeneous enterprise backup
environment. The VLS and D2D products support many of these operating system variations. (The EBS Compatibility Matrix available at http://www.hp.com/go/ebs details which operating systems with
each backup application are supported by VLS and D2D). The Enterprise Backup Solutions Design Guide available at the same web location has detailed instructions and best practices for configuring
the tape drivers for the following operating systems:
• HP-UX
• Microsoft® Windows
• Tru64 UNIX®
• Linux
• NetWare
• Sun Solaris
• IBM AIX
When adding a VLS or D2D virtual library to a server running a media agent, you must:
• Ensure that the minimum operating system patches are installed.
• Configure the Fibre Channel HBAs in the operating system which generally means configuring
persistent binding so that the virtual library device LUNs are always presenting on the same SCSI
path regardless of SAN configuration changes.
• Detect the virtual tape drives and create the tape drive device paths (generally the virtual tape
robot is discovered and configured by the backup application itself rather than the operating
system). Some operating systems limit the number of tape LUNs that can be discovered to eight
LUNs on each device port (remember there are multiple Fibre Channel ports on the backup devices),
but there are workarounds in some cases:
• Windows Large LUN support.
Enable when using more than 8 LUNs per port. Information is available at
http://support.microsoft.com/kb/310072/.
• HPUX Large LUN support (on v11.31 or higher).
For an HP-UX 11.31 server using persistent DSFs, the maximum number of LUNs per bus on
the server can be increased by entering # scsimgr set_attr -a max_lunid=32.
• Disable unnecessary polling/diagnostic applications that can interfere with backups or impact
performance.
• Windows: Disable Windows Removable Storage Manager (RSM).
1. Disconnect the Windows node from the SAN (unplug all Fibre Channel cables).
2. Delete all files and subfolders under the ..\system32\NtmsData folder (location of
the system32 folder varies with different Windows versions).
3. Enable and start the Removable Storage service in the Microsoft Computer Management
applet.
4. Access the Removable Storage (My Computer/Manage/Storage/Removable
Storage) in the Microsoft Computer Management applet.
5. Verify that there are no tape or library devices listed.
6. Stop and disable the Removable Storage service in the Microsoft Computer Management
applet.
7. Reconnect the Windows node to the SAN (plug all Fibre Channel cables back in).
8. Reboot.
• Windows using the HP LTO tape driver: Disable Windows Removable Storage Manager (RSM).
1. Install the 1.0.4.0 or later driver.
2. Complete one of these steps:
a. The driver package contains a DisableAutoRun.reg file that can be used to modify
the system registry. Log into the system as a user with Administrative privileges and
double-click the DisableAutoRun.reg file. The system will prompt you to confirm.
Enter Y or Yes to modify the appropriate registry entry and disable polling.
b. If the driver package does not include the DisableAutoRun.reg file, manually
edit the system registry using RegEdit. Log into the system as a user with
Administrative privileges, run RegEdit, and navigate to the following registry key:
Edit the AutoRun value found in this key. A value of 0 (zero) indicates that polling is
disabled; a value of 1 indicates that polling is enabled.
3. After completing steps 1 and 2, reboot the affected system.
IMPORTANT:
Adding or removing tape drives from the system may cause an older driver inf file to be
re-read, which in turn can re-enable RSM polling. If tape drives are added or removed, check
the registry for proper configuration and, if necessary, repeat step 2.
• Windows: Disable Windows Test Unit Ready polling.
Information is available at http://support.microsoft.com/kb/842411.
• HPUX: Ensure the EMS is not polling tape drives, either by disabling it or by copying the archived
dm_stape.cfg file to the /var/stm/config/tools/monitor folder. The polling
interval can be set to 0 to disable polling.
• HPUX: prevent utilities like mt from rewinding a tape that is in use by the backup application
by turning on the HPUX kernel tunable parameter st_san_safe (which disables tape device
special files that are rewind-on-close)
LUN Masking and Mapping
D2D devices with the Fibre Channel option do not require LUN masking/mapping because every
virtual device LUN (library or drive) has its own unique Fibre Channel WWPN (using what is called
N-port virtualization). This means you use SAN zoning to present each virtual device’s WWPN to the
backup hosts (but you must use a SAN switch that supports this and it makes sharing virtual devices
across multiple media servers more difficult).
On VLS devices, by default all hosts on the SAN can access all the virtual devices (libraries and tape
drives) of the VLS and this mode is used to easily share virtual drives across multiple media servers.
(Enterprise backup applications support the ability to share tape drives across multiple servers).
• The backup applications take care of drive locking to ensure only one media server is using one
shared drive at a time.
• VLS software manages the LUNs assigned to the VLS shared virtual devices by default. The VLS
changes device LUN assignments as needed to make sure that each Fibre Channel host port has
a LUN0, that there are no duplicate LUNs, and that there are no gaps in the LUN numbering on
a Fibre Channel host port (these are operating system requirements).
• This is the easiest scheme for the backup administrator because they can simply configure all their
backup jobs across their media servers to use the shared library with shared drives and the backup
application takes care of deciding which drive is used for which backup job and automatically
scheduling all the jobs onto the available drives.
• One disadvantage of this scheme is that if any one media server goes wrong (for example, mixes
up its tape drives paths), then it can interfere with the backups running from the other media
servers to the shared drives and can be very difficult to diagnose when there are many media
servers.
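The LUN numbering requirements listed above (a LUN0 on each Fibre Channel host port, no duplicate LUNs, and no gaps) can be expressed as a simple check. This is a minimal sketch for reasoning about a LUN list for one host port; the function name is hypothetical and it is not a VLS interface.

```python
def lun_numbering_problems(luns: list[int]) -> list[str]:
    """Return the numbering rules that a list of LUNs on one host port breaks."""
    problems = []
    if 0 not in luns:
        problems.append("no LUN0")
    duplicates = sorted({lun for lun in luns if luns.count(lun) > 1})
    if duplicates:
        problems.append(f"duplicate LUNs: {duplicates}")
    if luns:
        missing = set(range(max(luns) + 1)) - set(luns)
        # A missing LUN0 is already reported above, so exclude it here.
        gaps = sorted(lun for lun in missing if lun != 0)
        if gaps:
            problems.append(f"gaps in numbering: {gaps}")
    return problems


print(lun_numbering_problems([0, 1, 2, 3]))
print(lun_numbering_problems([0, 1, 1, 3]))
```

An empty result means the list satisfies all three rules.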
The VLS has an option to enable LUN masking in the device, which is used to assign dedicated virtual
libraries/drives on a VLS. LUN masking lets you restrict which virtual devices are visible to particular
hosts so that if a media server has any problem it only affects the tape drives dedicated to
that media server; it will not affect any backups on other media servers. Because LUN masking prevents
a host from seeing selected virtual devices on the Fibre Channel host ports, a LUN-masked host
cannot see or write to the virtual devices to which it does not have access. This requires
the administrator to manually allocate specific virtual drives in the virtual library to specific media
servers to meet their performance requirements.
Figure 15 Virtual Tape Environment
In the VLS firmware version 1.x/2.x, the LUN mapping mode is specific to each host. It lets you
manually assign new LUN numbers to the virtual devices visible to the host, and hosts that are not
LUN masked continue to use the default (shared device) LUN numbering. In the firmware version 3.x,
the LUN mapping mode is a global setting across the entire device, and when LUN mapping is
enabled all hosts will only see the virtual devices presented to those hosts (so by default any new host
will see no virtual devices). Also in v3.x, the LUN numbering is automatically generated for any virtual
devices presented to a host.
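The version 3.x behavior can be pictured with a toy model. It is a sketch only: the class and method names are hypothetical, it is not the VLS firmware interface, and it assumes that automatically generated LUN numbers start at 0 and are contiguous (consistent with the operating system requirements noted earlier, but not stated here for v3.x).

```python
class LunMappingV3Model:
    """Toy model of the global LUN mapping mode described for firmware 3.x."""

    def __init__(self, mapping_enabled: bool, all_devices: list[str]):
        self.mapping_enabled = mapping_enabled
        self.all_devices = list(all_devices)
        self.presented: dict[str, list[str]] = {}

    def present(self, host: str, device: str) -> None:
        """Present one virtual device to one host."""
        if device not in self.all_devices:
            raise ValueError(f"unknown device: {device}")
        devices = self.presented.setdefault(host, [])
        if device not in devices:
            devices.append(device)

    def visible_luns(self, host: str) -> dict[int, str]:
        """Return {LUN: device} for a host (assumed numbering: 0, 1, 2, ...)."""
        if self.mapping_enabled:
            devices = self.presented.get(host, [])  # a new host sees nothing
        else:
            devices = self.all_devices  # shared mode: every host sees everything
        return dict(enumerate(devices))


model = LunMappingV3Model(True, ["library1", "drive1", "drive2"])
print(model.visible_luns("new_host"))
model.present("mediasrv1", "library1")
model.present("mediasrv1", "drive2")
print(model.visible_luns("mediasrv1"))
```

The model shows why a newly added host needs devices presented to it explicitly once LUN mapping is enabled.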
Backup Application Basic Guidelines
The following are basic configuration guidelines specific to integrating an HP virtual library with
enterprise backup applications:
• Improve performance by using larger tape blocks such as a 256 KB block size
• Improve performance by disabling multiplexing/interleaving, because multiplexing will dramatically
reduce performance for restores and will also reduce deduplication performance. Instead of multiplexing, create more virtual tape drives and use multistreaming (so that backup jobs run multiple
streams concurrently to multiple tape drives)
• Disable client backup compression because this will defeat target-based deduplication/compression
in the backup device
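These three guidelines lend themselves to a quick configuration review. The sketch below checks hand-entered settings against them; the function and parameter names are hypothetical, it does not query any backup application, and treating 256 KB as a minimum is an assumption based on the example block size given above.

```python
def check_backup_settings(block_size_kb: int, multiplexing: bool,
                          client_compression: bool) -> list[str]:
    """Return guideline warnings; an empty list means no warnings."""
    warnings = []
    if block_size_kb < 256:  # assumption: 256 KB used as the threshold
        warnings.append(
            f"tape block size is {block_size_kb} KB; consider a larger size such as 256 KB"
        )
    if multiplexing:
        warnings.append(
            "multiplexing/interleaving is enabled; use multistreaming to more virtual drives"
        )
    if client_compression:
        warnings.append(
            "client compression is enabled; it defeats target-based deduplication/compression"
        )
    return warnings


for warning in check_backup_settings(64, multiplexing=True, client_compression=False):
    print(warning)
```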
For more information on backup application configuration optimizations needed for VLS deduplication,
see Detailed Backup Application Guidelines for VLS. Additional guidelines are available in the HP
StorageWorks backup application implementation guides at http://www.hp.com/go/ebs detailing
• HP Data Protector
• IBM Tivoli Storage Manager
• EMC NetWorker
• Symantec Backup Exec
• Symantec NetBackup
HP Data Protector Application Overview
The Data Protector cell is a network environment that includes a Cell Manager and client systems that
run agents. Client systems are imported into a cell and belong to a single Cell Manager. Multiple
cells may exist, each with their own Cell Manager. This environment may be managed by a single
Manager of Managers or “MoM.”
• Cell Manager: the main system in the cell. The Cell Manager is the “traffic cop” that controls the
activities and contains the Internal Database (IDB) within the Data Protector cell. It is not necessary
to administer the backup and restore activities directly from the Cell Manager itself, because any
client within the cell (as supported) can connect to the Cell Manager over the network and be
used to administrate the activities of the cell.
• Disk Agent: install the Disk Agent on client systems you want to back up. The Disk Agent reads or
writes data from a disk on the system and sends or receives data from the Media Agent. The Disk
Agent is also installed on the Cell Manager, allowing you to back up data on the Cell Manager,
the Data Protector configuration, and the IDB.
• Media Agent: (For servers that have direct access to tape drives.) During a backup session, the
Media Agent receives data from the Disk Agent and sends it to the tape device (which can be
directly attached or allocated over a SAN). During a restore session, the Media Agent locates
data on the backup medium and sends it to the Disk Agent. The Disk Agent then writes the data
to the disk. The Media Agent also manages the robotics control of a library. If a system has both
the client agent and the media agent installed then it can perform LAN-free backups.
Symantec NetBackup Application Overview
Symantec NetBackup Enterprise Server is based on a client/server architecture, and comprises several
distinct components:
• Master server: The master server (one per storage domain) manages data protection operations.
The NetBackup master server is responsible for containing the backup configurations and policies,
running the scheduler that initiates automated backups, maintaining catalogs that track the location
and contents of all backups, communicating with media servers to initiate backup and restore
processes, and providing both a command line interface and a graphical user interface to administer NetBackup.
• Enterprise Media Manager server: NetBackup Enterprise Server 6.0 introduced the Enterprise
Media Manager (EMM) server, which centrally manages Media Manager data that had previously
been distributed across multiple media servers. You can only have one EMM server per storage
domain. The master server can also be designated as the EMM server. The NetBackup EMM
server is responsible for managing a consolidated media and device database for the NetBackup
storage domain and maintaining run-time status information, NDMP credentials, and a managed
server list.
• Media server: A media server can be any standalone server running NetBackup server software
that receives requests for backup and restore operations from the master server. (A master server
can also be configured as a media server.) A storage domain can contain multiple media servers.
Media servers communicate with the master server and EMM server to initiate backup and restore
operations on their attached storage devices, communicate with NetBackup clients during a backup
or restore operation, monitor the status of storage devices, and provide robotic control.
• Client: NetBackup clients are servers that have NetBackup client software installed. A client performs
backups through a media server (over the LAN if the media server is running on a separate server or
directly as LAN-free backup if the media server is on the client server).
• Storage Unit: A NetBackup storage unit is a storage device attached to a NetBackup server. A
storage unit consists of the media server and tape devices where NetBackup stores files and data.
If a storage unit contains two drives and one is busy, NetBackup can use the other drive without
administrator intervention. To send backups to a storage device, the administrator must define
storage units using the Device Configuration Wizard. For virtual library devices such as VLS and
D2D, the type of storage unit used is the "Media Manager storage unit." (Even though VLS and
D2D are using disk internally, they are seen by NetBackup as a tape library.)
IBM TSM Application Overview
Figure 16 shows an overview of TSM (Tivoli Storage Manager) LAN-based backups that incorporate
virtual libraries. The steps are described below.
Figure 16 TSM LAN-based backups
1. Data on application servers (A-G) is backed up by TSM server via the SAN to the TSM disk pool.
A typical TSM primary disk pool is sized for a minimum of one nightly backup.
2. As the primary disk pool fills, TSM moves data from it via the SAN to the primary tape pool.
Unscheduled migrations: A common mistake is to define undersized disk storage pools
that do not have the capacity to hold one night's worth of backup data. Doing so results in data
migration to tape starting when the disk pool has reached its high migration threshold, rather
than starting as scheduled using the "migrate" function. Unscheduled migrations can interfere
with backups, causing them to fail or run beyond the allowable window.
3. Data is copied (migrated) from the primary disk pool to the primary tape storage pool on a tape
library. This migration is an I/O and CPU-intensive operation, which is typically limited by the
speed of tape drives. If a migration occurs during a backup, the backup performance slows
dramatically.
4. Data is copied from the primary tape storage pool to a copy storage pool, making a set of tapes
to send off-site for disaster protection. By replacing the primary tape storage pool with a virtual
library, you can dramatically improve backup and restore times, reduce the need for primary
disk, and enable more use of advanced TSM functions such as LAN-free backups.
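The disk pool sizing point in step 2 reduces to simple arithmetic: an unscheduled migration is expected when one night of backup data exceeds the space available below the high migration threshold. The sketch below illustrates this with hypothetical capacities and a hypothetical threshold; it does not read any values from TSM.

```python
def unscheduled_migration_expected(pool_capacity_gb: float,
                                   nightly_backup_gb: float,
                                   high_migration_pct: float) -> bool:
    """True if one night's backup would push the pool past its high threshold."""
    if not 0 < high_migration_pct <= 100:
        raise ValueError("high_migration_pct must be in (0, 100]")
    space_before_migration_gb = pool_capacity_gb * high_migration_pct / 100
    return nightly_backup_gb > space_before_migration_gb


# Hypothetical figures: 480 GB nightly backup, threshold set to 90 percent.
print(unscheduled_migration_expected(500, 480, 90))
print(unscheduled_migration_expected(800, 480, 90))
```

In the first case the pool reaches its threshold during the backup window; in the second the migration can wait for its scheduled time.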
Figure 17 shows an overview of TSM LAN-free backups that incorporate virtual libraries to improve
backup performance and reduce LAN load. The steps are described below.
Figure 17 TSM LAN-free backups
1. Data on large application servers (D-G) is backed up by TSM server via the SAN directly to the
tape storage pool.
2. Data is copied from the primary tape storage pool to a copy storage pool, making a set of tapes
to send off-site for disaster protection.
EMC NetWorker Application Overview
EMC NetWorker is based on a client/server architecture and comprises several distinct components:
• Console server: all EMC NetWorker servers and clients are managed from the NetWorker Console
server. The Console server also provides reporting and monitoring capabilities for the servers and
clients.
• Web browser: the Console server is accessed through a graphical interface that can be run from
any computer that has a web browser. Multiple users can access the Console server concurrently
from different browser sessions. A computer that hosts the web browser can also be a NetWorker
client.
• Datazone: a datazone is a single NetWorker server and its client computers. Datazones can be
added as backup requirements increase.
• NetWorker server: NetWorker servers provide services to back up and recover data for the NetWorker
client computers in a datazone.
• NetWorker storage node: a NetWorker storage node can be used to improve performance by
offloading much of the data movement from the NetWorker server involved in a backup or recovery
operation.
• NetWorker client: a NetWorker client computer is any computer whose data must be backed up.
The NetWorker Console server, NetWorker servers, and NetWorker storage nodes are also
NetWorker clients.
4 D2D Systems
D2D Devices
D2D Defined
The entry level D2D100 series Backup System meets the needs of small businesses as a low-cost
solution that does not incorporate deduplication technology. The D2D2500 is well suited for remote
and branch offices and small IT environments, while the more powerful D2D4000 and D2D4100 are
designed for medium-sized companies and small data centers. The D2D2500, D2D4000 and
D2D4100 products include HP Dynamic deduplication, which provides low cost and flexibility to
meet the needs of smaller IT environments.
All HP D2D products are RAID disk-based backup devices using serial ATA drives. All D2D devices
include iSCSI virtual library interfaces over LAN, and the D2D4000/4100 have the option of adding
a Fibre Channel interface to broaden connectivity. The D2D emulates a variety of physical tape
libraries including the tape drives and cartridges inside the libraries. D2D accommodates mixed
IT-platform and backup-application environments. For more insight into mixed-IT solutions and
environment analysis, visit the HP StorageWorks Enterprise Backup Solutions web site:
http://www.hp.com/go/ebs.
The virtual tape solution uses disk storage to simulate a tape library. That is, a virtual library simulates
libraries with tape drives and slots as if they were physical devices. Backup servers see the virtual
libraries as physical libraries. Because you can create many more virtual libraries and drives than
you have physical tape drives, many more SAN-based backups can run concurrently from the
application servers, reducing the aggregate backup window. After the backups are complete, the
tape-offload feature or the data protection software can migrate backup data from the virtual media
to physical tape for off-site disaster protection or long-term archival. This migration to physical tape
can happen outside of the backup window at whatever time you determine will work optimally with
your backup environment. The deduplication-enabled replication feature also provides the ability to
create a second off-site copy in another D2D via a TCP/IP link.
D2D Technical Specifications
Table 6 D2D Technical Specifications
D2D4112 Backup Systems | D2D4000 Backup Systems | D2D2500 Backup Systems
For mid-size data centers and distributed environments | For remote or branch offices and mid-size data centers | For remote or branch offices and smaller data centers
2U rack-mount plus 2U upgrade | 2U rack-mount | 1U rack-mount
Scalable from 12 to 24 TB raw capacity | 4.5 TB and 9 TB raw capacity (3 TB and 7.5 TB usable) | Up to 2, 3, or 4 TB raw capacity (1.5, 2.25, or 3 TB usable) depending on model
Speeds of > 150 MB/sec | Speeds of > 90 MB/sec | Speeds of up to 75 MB/sec
2x iSCSI and 2x FC interfaces | 2x iSCSI or 2x FC interfaces | 2x iSCSI interfaces
Protects up to 24 servers | Protects up to 16 servers | Protects up to 6 servers
 | Up to 16 source appliances per target | Up to 6 source appliances per target
D2D Design Considerations
The D2Ds have two 1GBit Ethernet ports, and some models also have two 4 Gbit Fibre Channel
interfaces. Correct configuration of these interfaces is important for optimal data transfer.
D2D Ethernet Ports
The Ethernet ports are used for both data transfer over the iSCSI protocol and management access
to the web GUI. The two ports have the following configuration modes:
• Single port – Port 1 must be used for all data and management traffic.
Use this mode only if no other ports are available on the switch network or if the appliance is used
to transfer data over Fibre Channel ports only.
• Dual port – Both ports are used, but must be in separate subnets. Both ports can access the web
GUI but virtual libraries are split across the two ports. Use this mode if:
• Servers to back up are split across two physical networks which need independent access to
the D2D. You must assign the virtual libraries to whichever port is on the network of the device
to back up.
• Separate data and management LANs are used, i.e., each server has a port for business network
traffic and another for data backup. One port on the D2D can be used solely for GUI access
with the other used for data transfer.
• High Availability port – Both ports are used but are bonded to appear as a single port.
This is the recommended configuration. No special switch configuration is required other
than ensuring that both Ethernet ports from the D2D are connected to the same switch. This mode
sets up “bonded” network ports that behave as one network port. It provides some level of load
balancing across the ports as well as port failover.
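For dual port mode, the requirement that the two ports be in separate subnets can be checked before the addresses are entered. The sketch below uses the Python standard library ipaddress module; the function name and the addresses are hypothetical examples, not D2D defaults.

```python
import ipaddress


def dual_port_subnets_ok(port1_cidr: str, port2_cidr: str) -> bool:
    """True if the two port addresses are in separate, non-overlapping subnets."""
    net1 = ipaddress.ip_interface(port1_cidr).network
    net2 = ipaddress.ip_interface(port2_cidr).network
    return not net1.overlaps(net2)


# Hypothetical addresses for port 1 and port 2.
print(dual_port_subnets_ok("192.168.1.10/24", "192.168.2.10/24"))
print(dual_port_subnets_ok("192.168.1.10/24", "192.168.1.20/24"))
```

The second pair shares a subnet, so it would suit single port or high availability mode rather than dual port mode.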
Backup Server Networking
When considering backup performance, consider the whole network. Any server acting as a backup
server should be configured where possible with multiple network ports which are bonded in order
to provide a fast connection to the LAN. Client servers (those that back up via a backup server) may
be connected with only a single port if backups are to be aggregated through the backup server.
Ensure that no network components slower than 1 Gbit are in the backup path because they will
significantly restrict backup performance.
Fibre Channel SAN
Two Fibre Channel ports are provided at 4 Gbit. These should both be connected to a SAN switch
which supports 4 Gbit Fibre Channel.
Virtual library devices are assigned to an individual interface; therefore, balancing virtual devices
across both interfaces will ensure that one link is not saturated while the other is idle.
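As a rough illustration of balancing, the sketch below spreads virtual devices across the two Fibre Channel interfaces in round-robin order. The device names and port labels are hypothetical, and the sketch balances by device count only; it does not account for how much data each device receives.

```python
from itertools import cycle

def assign_ports(devices, ports=("FC port 1", "FC port 2")):
    """Assign each virtual device to an interface in round-robin order.

    The port labels are placeholders for illustration, not D2D settings.
    """
    return dict(zip(devices, cycle(ports)))

# Hypothetical virtual library names.
assignment = assign_ports(["lib1", "lib2", "lib3", "lib4"])
for device, port in assignment.items():
    print(f"{device} -> {port}")
```

If some libraries receive far more data than others, assign them by expected load rather than by count.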
Fibre Channel devices should be zoned on the switch so that each is accessible from only a single
backup server. This ensures that other SAN events (such as the addition and removal of other Fibre
Channel devices) do not cause unnecessary traffic to be sent to devices. It also ensures that SAN
polling applications cannot reduce the performance of individual devices.
Multiple Backup Streams
The HP D2D Backup System performs best with multiple backup streams sent to it simultaneously. For
example, a D2D4009i will back up data at approximately 40 MB/s for a single stream; multiple streams
can deliver an aggregate performance in excess of 80 MB/s.
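To see what this difference means for a backup window, the following sketch estimates how long a backup takes at a sustained throughput. The 500 GB data size is an assumed example, the 40 MB/s and 80 MB/s figures are the approximate values quoted above, and the calculation uses decimal units (1 GB = 1000 MB). Real throughput varies with data type and network conditions.

```python
def backup_hours(data_gb: float, throughput_mb_s: float) -> float:
    """Return the hours needed to move data_gb at a sustained throughput.

    Assumes decimal units (1 GB = 1000 MB) and a constant transfer rate.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = data_gb * 1000 / throughput_mb_s
    return seconds / 3600

single = backup_hours(500, 40)  # one stream at roughly 40 MB/s
multi = backup_hours(500, 80)   # several streams, roughly 80 MB/s aggregate
print(f"single stream: {single:.1f} h, multiple streams: {multi:.1f} h")
```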
There are two ways that the D2D can be used to achieve multiple stream backups:
• Several virtual library devices each with a single virtual tape drive receiving one backup stream
each.
Use this configuration if you are backing up multiple servers, each of which is a backup application
“media server,” which can directly connect to a virtual library target. Also use this configuration
if you are backing up one or more servers but wish to keep the data backed up from each logically
separate for easier management.
• A single virtual library with multiple virtual tape drives (up to four on HP D2D4000 models) receiving
backup jobs to each tape drive simultaneously.
Use this configuration if you are backing up a single server (or multiple servers using a single
“media server”), and you wish to achieve some deduplication of backup data across all of the
servers. In this case, also ensure that bonded/trunk network ports are configured for both the
media server and the D2D device.
With either configuration, the backup application must be configured to target the devices
correctly; this usually involves creating multiple backup jobs to back up specific data
to selected virtual libraries.
Configuring backups to run in parallel rather than serially also improves replication and tape offload
operations because these run after backup jobs have completed. Having backups finish at roughly
similar times will ensure that the replication and tape offload jobs are not in contention with backup
jobs.
However, backup application vendors often base their licensing model on the number of tape devices
configured, so you may need to balance between performance and cost. Some applications (e.g.,
HP Data Protector) offer capacity-based licensing (per TB) for virtual tape appliances; this licensing
model may be more cost effective because it allows you to configure any number of libraries or tape
devices.
Disable Backup Application Verify Pass
Most backup applications will default to performing a verify operation after a backup job. While this
offers a very good way to ensure that data is backed up successfully, it will also heavily impact the
performance of the whole backup job. Performing a verify operation will more than double the overall
backup time because restore performance (required for verify) is slower for deduplication-enabled
devices.
Disabling verify for selected backup jobs can be done relatively safely because D2D Backup Systems
perform CRC (Cyclic Redundancy Check) checking for every backed up chunk to ensure that no errors
are introduced by the D2D Backup System. However, if data is incorrectly written by the backup
application but is not corrupt, this will not be detected by the D2D. For this reason, verifying some
backup jobs is still recommended.
Tape Copy or Offload Performance
Copies of cartridges to physical media using the HP D2D Backup System can be achieved by using
either a directly attached tape drive or library where the offload is controlled by a manual or scheduled
operation from the D2D Web Management Interface, or by using a backup application on the media
server to copy data from a cartridge on the HP D2D Backup System to a tape device or library
connected to the media server. In either case, take the deduplication process into consideration when
scheduling the offload.
The D2D allows a tape offload to be automatically scheduled to start immediately after a backup to
the virtual tape completes; most backup applications provide a similar configuration option if performing
a backup application copy. Take care when using these options because they can result in slower
than expected offload performance due to some level of housekeeping running; also, if replication is
configured the two operations will clash and result in a delay to replication.
You should delay any tape offload for several hours after the backup completes if the backup is doing
a significant amount of data overwrite. Otherwise, the housekeeping process will have a detrimental
effect on the tape offload performance. As a general rule, for every 100 GB of data on the cartridge
being overwritten, allow 30 minutes to pass before starting an offload operation. The number of
housekeeping operations generated by an overwrite backup (and therefore how long they take to
run) is hard to predict because it depends on the deduplication ratio and the number of unique
chunks contained in that backup.
In addition, consider other operations running on the D2D. For example, if multiple backups finish at
different times, each may result in some housekeeping work. Add all of the data backup amounts
together to get an idea of the time that housekeeping will take.
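The rule of thumb above (30 minutes per 100 GB overwritten, summed over all backups) can be sketched as a small calculation. The function name and the example backup sizes are illustrative assumptions; the result is only a starting estimate, since actual housekeeping time depends on the deduplication ratio and the number of unique chunks.

```python
MINUTES_PER_100_GB = 30  # general rule quoted in the text above

def offload_delay_minutes(overwritten_gb_per_backup):
    """Estimate how long to wait before starting a tape offload.

    Adds up the data overwritten by every backup that finished on the D2D
    and applies the 30-minutes-per-100-GB rule of thumb.
    """
    total_gb = sum(overwritten_gb_per_backup)
    return total_gb / 100 * MINUTES_PER_100_GB

# Three hypothetical backups overwriting 200 GB, 150 GB and 50 GB.
delay = offload_delay_minutes([200, 150, 50])
print(f"Wait about {delay:.0f} minutes before starting the offload")
```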
Tape Library Emulation
Emulation Types
The HP D2D Backup Systems can emulate several types of HP tape library devices, and the maximum
number of drives and cartridge slots is defined by the type of library configured. However, performance
is not related to library emulation except in the ability to configure multiple drives per library and thus
enable multiple simultaneous backup streams.
Cartridge Sizing
The size of a virtual cartridge has no impact on its performance. HP recommends that cartridges are
created to match the amount of data being backed up. For example, if a full backup is 500 GB, you
should select 800 GB, the next larger configurable cartridge. If backups span multiple cartridges,
you risk a performance impact because housekeeping operations will start on the first backup cartridge
as soon as the backup application spans to the next cartridge.
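The sizing advice can be expressed as "pick the smallest configurable cartridge that holds the full backup." The sketch below assumes a hypothetical list of configurable sizes; only the 800 GB value comes from the example above, so check the sizes your D2D model actually offers.

```python
import bisect

# Hypothetical configurable sizes in GB, sorted ascending.
# Only the 800 GB entry is taken from the example in the text.
CARTRIDGE_SIZES_GB = [100, 200, 400, 800, 1600]

def pick_cartridge_gb(full_backup_gb: float) -> int:
    """Return the smallest configurable size that holds the full backup."""
    i = bisect.bisect_left(CARTRIDGE_SIZES_GB, full_backup_gb)
    if i == len(CARTRIDGE_SIZES_GB):
        raise ValueError("backup is larger than the largest cartridge")
    return CARTRIDGE_SIZES_GB[i]

print(pick_cartridge_gb(500))  # a 500 GB full backup
```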
Number of Libraries per Appliance
The D2D appliance supports creating multiple virtual library devices. If large amounts of data are
backing up from multiple hosts or for multiple disk LUNs on a single host, you should separate these
across several libraries (and consequently into multiple backup jobs). Each library has a separate
deduplication “store” associated with it, and reducing the amount of data and complexity of each
store will improve its performance.
Creating a number of smaller deduplication stores rather than one large store which receives data
from multiple backup hosts can have an impact on the overall effectiveness of deduplication. However,
generally the cross server deduplication effect is quite low unless you are storing a lot of common
data. If you store a large amount of common data on two servers, HP recommends that you back
these up to the same virtual library.
Optimizing Rotation Scheme to Reduce Housekeeping
Reducing how often full backup cartridges are overwritten reduces the amount of deduplication
housekeeping overhead required. You can optimize this by considering the following:
• Longer retention policy: Deduplication imposes little penalty for using a large number of virtual
cartridges in a rotation scheme, and therefore for a long cartridge retention policy, because most
data is the same between backups and is deduplicated. A longer retention policy reduces the number
of overwrites.
• Full vs. incremental/differential backups: The requirement for full or incremental backups is based
on how often offsite copies of virtual cartridges are required and on speed of data recovery. If
regular physical media copies are required, the best approach is that these are full backups on
a single cartridge. Speed of data recovery is less of a concern with a virtual library than with
physical media. For example, if a server fails and needs to be fully recovered from backup, the
recovery will require the last full backup plus every incremental backup since (or the last differential
backup). Finding and loading multiple physical cartridges is time-consuming. However, with virtual
tape there is no need to find all of the pieces of media. Because the data is stored on disk, the
time to restore is lower due to the ability to randomly seek more quickly within a backup and load
a second cartridge instantly.
• Overwrite vs. append: Overwriting and appending to cartridges is also where virtual tape has
an advantage. With physical media you may want to append multiple backup jobs to a single
cartridge in order to reduce media costs; the downside is that cartridges cannot be overwritten
until the retention policy for the last backup on that cartridge has expired. With virtual tape, you
can configure a large number of cartridges for “free” and you can configure their sizes appropriately to the amount of data stored in a specific backup. Therefore, appended backups provide
no benefit. (This does not apply to VLS deduplication where appending is required to fill up tapes
and perform space reclamation on them.)
Considering the above factors, the following provides an example of a good rotation scheme. This
user requires weekly full backups sent offsite and recovery point objectives of every day in the last
week, every week in the last month, every month in the last year, and every year in the last five years:
• Four daily backup cartridges, Monday to Thursday, incremental backup, overwritten every week.
• Four weekly backup cartridges, on Fridays, full backup, overwritten every fifth week.
• 12 monthly backup cartridges, last Friday of the month, overwritten every 13th month.
• Five yearly backup cartridges, last day of the year, overwritten every five years.
In the steady state daily backups will be small, and while they will always overwrite the last week the
amount of data overwritten will be small. Weekly full backups will always overwrite, but housekeeping
has plenty of time to run over the following day or weekend; the same is true for monthly and yearly
backups. The user can also offload a full backup to physical tape every week, month, and year after
the full backup runs for offsite storage.
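The example rotation scheme can be sketched as a function that maps a calendar date to the cartridge pool used that day. Two details are assumptions not stated above: the yearly backup takes precedence when the last day of the year coincides with another backup day, and no backup runs on a Saturday or Sunday unless it is the last day of the year.

```python
from datetime import date, timedelta

# Cartridges per pool, as listed in the rotation scheme above.
POOL_SIZES = {"daily": 4, "weekly": 4, "monthly": 12, "yearly": 5}

def cartridge_pool(day: date):
    """Return the cartridge pool used on `day`, or None if no backup runs."""
    if day.month == 12 and day.day == 31:
        return "yearly"          # assumed to override any other backup
    weekday = day.weekday()      # Monday is 0, Friday is 4
    if weekday <= 3:
        return "daily"           # Monday to Thursday incremental
    if weekday == 4:
        # A Friday is the last of its month if the next Friday is in another month.
        if (day + timedelta(days=7)).month != day.month:
            return "monthly"
        return "weekly"
    return None                  # assumed: no weekend backups

print(sum(POOL_SIZES.values()), "cartridges in total")
print(cartridge_pool(date(2010, 3, 26)))
```

Summing the pools shows that the whole scheme needs 25 virtual cartridges.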
D2D Blueprints
The following blueprints of D2D virtual tape libraries with deduplication and replication start from
specifying company requirements, then defining the HP blueprint for the solution, and finally defining
any solution caveats or ISV dependencies associated with the solution. This will help you make informed
decisions and allow you to quickly assess areas of concern and possible implementations.
Single Site Cost Effective Backup Device Consolidation
Company requirements: small site with fewer than 6 servers, fewer than 5 TB, no SAN, wants a single
device for backup that can grow with the business.
Solution: HP D2D Backup System (iSCSI) with Tape Offload, typically used with HP Data Protector
Express or Symantec Backup Exec
Caveats: The smaller D2D units shown here are a fixed size and not directly expandable in capacity.
The initial sizing is very important.
ISVs: HP Data Protector Express or Symantec Backup Exec are typical backup applications in this
segment. See http://www.hp.com/go/connect and http://www.hp.com/go/ebs.
Figure 18 Single Site Backup Consolidation using iSCSI
Device configuration highlights:
• Configure iSCSI ports in high availability mode to give added redundancy.
• Configure a single library for best deduplication ratio or multiple libraries for best throughput.
• Consider configuring multiple virtual libraries for different data types (office files, MSExchange
databases) across multiple host servers to get the best possible performance and deduplication
ratio.
Backup application highlights:
• D2D devices have the highest throughput with multiple concurrent backup streams. Configure
multiple jobs to run in parallel (at least 4) to obtain best throughput.
• No additional backup application licensing is required for the physical tape library because it is
connected directly to the D2D unit (SAS or SCSI HBA required).
Recovery options: Depending on the retention periods set, it is possible to recover all data directly
from the D2D device over extended periods (3-6 months). If backup data on the D2D is regularly copied
to physical tape and taken offsite, a disaster recovery solution is also available.
Solution advantages:
• Simple to implement
• Cost effective consolidation (iSCSI is free)
Solution trade-offs: Tape Offload functionality needs monitoring as a separate process.
When to use direct attach tape and when to use Backup ISV copy functionality?
If cost is a major driver, use a tape device directly attached to the D2D unit (because this does NOT
attract extra licensing costs with the backup software), and then use the D2D tape offload capability
to transfer data from the D2D to physical tape. However, this requires closer daily management by
checking the D2D through the GUI. This copy is termed a media copy. If a D2D virtual tape has 50 GB of
data, then 50 GB of data will be transferred to physical tape and cannot be appended to.
Use the backup application to copy from the D2D to physical tape when a single point of management
is required and where better use of physical media is required. Object copies via the ISV software
can merge several virtual cartridges onto a single physical cartridge, saving physical media costs but
incurring additional licensing costs.
Large SME Site Consolidation Requiring Fibre Channel Shared Devices
Company requirements: fewer than 20 TB, up to 24 servers, growth proof, 12 hour backup window,
reduce dependency on physical tape.
Solution: HP D2D 4112 “expandable” storage, good performance, backup application Object Copy
to small physical tape library for archiving and disaster recovery.
Caveats: Preferred offload to physical tape is via backup application “object copy” functionality for
maximum physical tape usage efficiency.
ISVs: Typically HP Data Protector, Symantec Netbackup, Tivoli Storage Manager, EMC Networker,
Commvault, Backup Exec, etc. For a full list see http://www.hp.com/go/connect and
http://www.hp.com/go/ebs.
Device configuration highlights: Consider configuring multiple virtual libraries for different data types
(office files, MSExchange databases) across multiple host servers to get the best possible performance
and deduplication ratio.
Backup application configuration highlights: Run multiple backups in parallel for best throughput.
Figure 19 Large SME Site Consolidation Requiring FC Shared Devices
Recovery options: Depending on the retention periods set, it is possible to recover all data directly
from the D2D device over extended periods (3-6 months). If backup data on the D2D is regularly copied
to physical tape and taken offsite, a disaster recovery solution is also available.
Solution advantages:
• Effective consolidation for SME
• Allows backup window to be reduced
• Scalable using D2D4112 – up to 24 virtual devices can be created
• Single management point with Backup Application for D2D & physical tape
• Flexible - some backups to physical tape, some to D2D if required
• Disaster recovery solution included via Physical tape offsiting
Solution trade-offs: Using backup software to do D2D to tape copies attracts extra licensing costs.
Other information:
In this example we are using the backup application to copy data from D2D to physical tape because
the environment is significantly larger and a single point of management is required for all backup
and copy operations. Physical media usage is also a priority. These requirements are best served by
using object copy functionality within the backup application itself. This will attract higher licensing
costs, however.
Multi-site Small Business Disaster Recovery Solution
Company requirements: remove physical tape and any human intervention for backups at remote
sites – completely automate backup and disaster recovery. Fewer than 1 TB at each remote site and
up to 24 remote sites.
Solution: Multi-remote site D2D iSCSI devices replicating into a central consolidated D2D at a disaster
recovery site.
Caveats: Some scripting is required for automated replicated data catalog imports at the target site
(HP Data Protector). Other ISVs require manual import. Sizing replication links and the replication
window is key; use the HP StorageWorks Backup Sizer.
ISVs: HP Data Protector Express, HP Data Protector, Backup Exec, Netbackup, Networker, TSM,
Commvault Galaxy, BakBone NetVault – D2D4000 and D2D2500, Syncsort Backup Express –
D2D4000 and D2D2500.
Device configuration highlights: Remote sites typically will have only one virtual library configured.
The central site can have up to four slot mappings per virtual library, so six virtual libraries at the
central site (6 x 4) give support for up to 24 remote sites.
Consider configuring multiple virtual libraries for different data types (office files, MSExchange
databases) across multiple host servers to get best possible performance and deduplication ratio.
Figure 20 Multi-site Small Business Automated Disaster Recovery Solution
Figure 21 Multi-site Small Business Automated Disaster Recovery Solution with no Physical Tape
For a smaller business where there may be limited budgets or no compliance requirements dictating
data to be archived or stored on tape for several years, the solution shown without physical tape
above is perfectly adequate. The D2D4112 provides RAID 6 data protection plus hot spare disk
capability so data integrity is guaranteed.
Backup application configuration highlights: All replication is controlled via the D2D units. Remote
offices and the central site are configured as normal with the backup application. Multiple parallel
backup streams will always give better performance.
Object copy functionality at the Data centre copies both local data centre backups and replicated
data from the remote offices onto physical tape.
Recovery options: Remote sites can recover data from the local D2D in a matter of minutes. In the
event of a major disaster at the remote site there are several recovery options:
• Rebuild server data from replicated data held on the D2D at the data centre.
• Rebuild remote office and “reverse replicate” data from the data centre back to the remote office
over the WAN link (not recommended for high volumes of data such as hundreds of GB).
• Take physical tape copy at the data centre and temporarily connect the tape device to the server
at the remote site and recover data from physical tape.
The main data centre can be recovered either from D2D at the main data centre or from copies of
data made to physical tape.
Solution advantages:
• Removes all human intervention from the backup at remote sites where there may be no IT expertise.
• Cost effective replication links as low as 2 Mbit/sec because D2D only replicates differences in
data.
• D2D at the data centre also acts as both a source for data centre backups and a target for remote
office replications offering significant benefits of consolidation.
• Physical tape for archiving and a disaster recovery solution for the data centre.
Solution trade-offs:
Before replication can start, a one-time “seeding process” has to be followed to ensure remote sites
and data centre D2D have the same data reference points. Seeding can be done via WAN link, via
co-location of units and then splitting of units, or by using physical tape to transfer data from remote
office to seed unit in data centre. This process only has to be done once.
Other information: WAN link sizing and data centre D2D sizing are critical. Size for an existing
link or a new link using the HP StorageWorks Backup Sizer.
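For a first sanity check before using the sizing tool, the sketch below estimates how much changed data a link can carry during a replication window. The 2 Mbit/sec rate is the figure quoted in this section; the 8-hour window and the `efficiency` parameter (the usable fraction of the link) are illustrative assumptions. The sketch ignores protocol overhead and does not replace the HP StorageWorks Backup Sizer.

```python
def replicable_gb(link_mbit_s: float, window_hours: float,
                  efficiency: float = 1.0) -> float:
    """Return the GB (decimal) a link can transfer in the given window.

    efficiency is the assumed usable fraction of the link (0 to 1).
    """
    bytes_per_second = link_mbit_s * 1_000_000 / 8 * efficiency
    return bytes_per_second * window_hours * 3600 / 1_000_000_000

# A 2 Mbit/sec link over an assumed 8-hour replication window.
print(f"{replicable_gb(2, 8):.1f} GB of changed data per window")
```

If the daily changed data after deduplication exceeds this figure, the link or the window is too small for that site.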
Introduction to VMWare Terminology
Figure 22 VMWare Infrastructure Environment
The above diagram shows the different components that make up a virtualized server environment
and its connections to external primary storage and data protection devices.
ESX acts as a hardware abstraction layer, providing virtual resources to the virtual machines on
one side whilst interfacing to the physical hardware on the other.
In this next set of solutions we will talk about possible methods of providing data protection to this
virtualized environment:
• Base Support – Agents inside Virtual Machines
• Virtual Machine snapshots – ESX Console
• Virtual Machine snapshots – VCB
• Zero Downtime Backup and Instant Recovery
Each technique has its pros and cons as listed below. It is important to point out that “snapshot”
backups (sometimes called image or point-in-time backups) cannot always provide single file restore
capability (for example with Linux the whole image has to be restored). In addition, snapshots are
not best suited to complex online backups of applications such as Oracle, SQL, and Exchange because
they do not quiesce the application in the appropriate way before taking the snapshot. Snapshots
are best suited to being aligned with a VMWare machine OS where the whole environment can be
captured in a single image.
Using D2D in Simple VMWare Environments with ESXi
This configuration is currently being certified with VMWare. See future supported configurations at
the end of this document.
Using D2D in VMWare Environments with ESX4 and VMWare Consolidated Backup
Company requirements: Mid-size VMWare environment, wants VMWare based snapshot backup
with individual Windows file-based recovery using VMWare Consolidated Backup (VCB) in ESX4.
This is the backup solution promoted by VMWare themselves. The main idea behind VCB is to offload
the workload for backup from the ESX server onto a separate proxy server giving potentially better
performance.
Solution: HP D2D with VTL FC interface and separate VCB proxy server.
Caveats: This is mainly a snapshot based backup technology where complete images have to be
restored. File level recovery from a file system backup is only possible with Windows. The proxy
server has to have sufficient storage to store the snapshots if image based backups are being used
because the images are copied from the primary storage to the proxy server for these types of backup.
ISVs: VCB in ESX4 takes snapshots of the VMWare machines which are then presented to the proxy
server for backup so most backup applications will work when they are installed on the Proxy server.
HP has tested Data Protector and Symantec Netbackup to great depth in this area.
Device configuration highlights: Consider creating one virtual library per virtual machine (best
throughput) or one large library for all machines to share (best deduplication ratio). Consider
configuring multiple virtual libraries for different data types (office files, MSExchange Databases)
across multiple host servers to get best possible performance and deduplication ratio.
Backup application configuration highlights: Back up several streams concurrently for best performance.
Backup is a combination of VCB doing the snapshots and backup applications writing the snapshots
to a backup device. VCB is hardware independent and hence can accommodate snapshots with all array
types.
Figure 23 Using D2D in VMWare Environment with ESX4 and VCB Backup for SAN-based Backup
Recovery options: Recovery is a two-stage process: 1) the image is restored to the VCB proxy or the
ESX server, and 2) the image is restored as a VM using VMware Converter if restored to the VCB
proxy, or vcbRestore if restored to the ESX server.
Restart the VM and you are back in business as of the last snapshot. This is the fastest way to do a
complete disaster recovery. For example, HP Data Protector on the proxy can restore directly from
D2D or tape onto virtual machines.
For Windows OS, you can also mount the filesystem backup to the VCB proxy and restore single files or
restore the complete system.
Solution advantages:
• VMWare approved method of backing up, takes the load off the ESX console for backup and
restore and passes it to the proxy server
• Allows all HP supported Fibre Channel backup devices to be connected to Proxy server
• VCB is disk array independent snapshot technology.
• Automation of backup and restore of VM snapshots.
• Disaster recovery of ESX environments
• Easy transfer to physical tape if required, if backup application used.
• With HP Data Protector and most other Backup software the following approaches are possible:
• Online and offline snapshots of VMs.
• File level backup and recovery of data in the VMs
• VCB Image Backup: Online and offline snapshots, Full Backup only.
• VCB File Backup:
• Full Backup
• Incremental and Differential Backups based on per-file modification time — incremental
contains changes since previous backup, differential contains changes since latest full
backup.
Solution trade-offs:
• Full snapshot based recovery – single file recovery only on Windows OS file system backups.
• Distinctly separate proxy server required at additional cost – requires additional storage for copy
of snapshots.
• Backup application integrations such as Oracle, SQL, Exchange, etc. cannot be used with VCB.
Other Information:
After the snapshot has been copied to proxy and backed up, it is deleted from the primary storage
array where it was first taken.
Further enhancements are possible by providing multi-site disaster recovery by using low bandwidth
replication on HP D2D devices.
Using D2D in larger VMWare environments with HP Data Protector Zero Downtime Backup
with Instant Recovery (ZDB + IR)
Company requirements: Large VMWare environment, high throughput required to meet backup
window (Fibre Channel). High SLAs dictate an “Instant Recovery.” High proportion of online backups
required.
Solution: HP D2D with VTL interface and separate VCB proxy server with Data Protector VCB ZDB IR.
Note that the VCB backup and the ZDB backup are completely independent of each other.
Caveats: Limited to arrays that support DP ZDB IR – XP, EVA and LHN (future).
ISVs: HP Data Protector with Zero Downtime Backup Option and Symantec Netbackup with Snapshot
Client are both deep dive tested by HP in this mode of operation.
Device configuration highlights: Consider configuring multiple virtual libraries for different image
types.
Backup application configuration highlights: Configuring Zero Downtime Backup within Data Protector
and Symantec Netbackup requires some additional scripting work. See the HP EBS SAN Design
Guide at http://www.hp.com/go/ebs.
Figure 24 Using D2D in a VMWare Environment with ESX4 and HP Data Protector ZDB IR
Recovery options:
• Two step restore process that provides full recovery of VM and data:
• Recover the VM image from VCB backup.
• Required only in case the whole VM is lost or restore to a new ESX server.
• Recover Application Data:
• For replication based disk backups (IR), perform Instant Recovery.
• For disk/tape backups, restore directly to the VM.
• Recovery performed automatically by Data Protector as specified by user.
Solution advantages:
• VCB snapshot based backup is not really “application aware” whereas HP Data Protector ZDB
quiesces the application prior to the hardware based snapshot and creates a crash consistent
snapshot. It is much more suitable for live databases and email servers than VCB based backups.
• Remove the backup load from VM to a backup server with Data Protector Zero Downtime Backup.
• Reduce application recovery times from hours to minutes with Data Protector Instant Recovery.
• Easy copy to physical tape if required.
Solution trade-offs:
• Limited to EVA & XP & LHN (future)
• Snapshots could take up a large amount of EVA and XP space, at extra cost, depending on the
frequency of snapshots.
• Note that to do the ZDB backup of VM volumes it is a requirement that the VM volumes use the
VMware raw device mappings (RDM) feature.
Other information:
A simple two step backup for complete protection:
• VM Backup
• Data Protector Integration with VMware VCB provides the first layer of protection.
• Provides snapshots that can be used for restoring VM.
• Needed only when VM configuration changes.
• Application Backup
• Provides the second layer that enables protection of application data.
• Can be performed using the ZDB/IR feature of Data Protector by using HP Business Copy for EVA,
XP (LHN to follow).
Further enhancements are possible by providing multi-site disaster recovery by using low bandwidth
replication on HP D2D devices.
D2D Dynamic Deduplication
HP Dynamic deduplication technology is designed around compatibility and cost for users with smaller
IT environments. It offers the following features and benefits:
• Uses hash-based chunking technology, providing deduplication technology at a lower cost.
• Operates independently of backup format, avoiding issues with backup application support.
• Automatic and manageable.
Configuring Dynamic deduplication is simple. There is no need to tell the deduplication software
which backup application you are using or where file markers are. Dynamic deduplication works
with any backup application and with any data type.
See http://www.hp.com/go/ebs for more information on backup applications and data types
recommended for use with Dynamic deduplication.
How it Works
Dynamic deduplication uses hash-based chunking technology, which analyzes incoming backup data
in “chunks” that average 4K in size. The hashing algorithm generates a unique hash value that
identifies each chunk, and points to its location in the deduplication store.
NOTE:
Using small sections of about 4K in size allows the Dynamic deduplication software to locate a greater
number of instances of duplicated data. With smaller chunks of data, more hash values are generated
and compared against each other. If larger chunks of data are analyzed, and a pattern does not
repeat within the boundaries of the chunk, the duplication is not caught.
Hash values are stored in an index that is referenced when subsequent backups are performed. When
data generates a hash value that already exists in the index, the data is not stored a second time.
Rather, an entry with the hash value is simply added to the “recipe file” for that backup session.
Over the course of many backups, numerous instances generating the same hash values occur.
However, the actual data is only stored once, so less storage is consumed by duplicate backup data.
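The index, store, and recipe file described above can be illustrated with a minimal sketch. It is not the D2D implementation: it assumes fixed 4096-byte chunks as a stand-in for chunks that average about 4K, and it assumes SHA-1 as the hash, since the source does not name the hashing algorithm. The class and variable names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size stand-in for chunks averaging about 4K

class DedupStore:
    """Toy deduplication store: an index of hashes plus unique chunks."""

    def __init__(self):
        self.index = {}   # hash value -> position of the chunk in self.chunks
        self.chunks = []  # each unique chunk is stored exactly once

    def backup(self, data: bytes) -> list:
        """Store data and return its recipe: the ordered list of chunk hashes."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha1(chunk).hexdigest()  # assumed hash algorithm
            if digest not in self.index:
                # New hash value: add it to the index and write the chunk.
                self.index[digest] = len(self.chunks)
                self.chunks.append(chunk)
            # Known or new, the hash is always added to the recipe.
            recipe.append(digest)
        return recipe

store = DedupStore()
monday = store.backup(b"A" * 8192 + b"B" * 4096)
tuesday = store.backup(b"A" * 8192 + b"C" * 4096)
print(len(monday) + len(tuesday), "chunks backed up,",
      len(store.chunks), "chunks stored")
```

In this example six chunks are backed up across the two sessions, but only three unique chunks are written to the store.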
Figure 25 illustrates the process that backup data undergoes with Dynamic deduplication. The numbered
list that follows corresponds with the image.
Item  Description
1     The backup stream is analyzed in 4K chunks that generate unique hash values. These hash
      values are placed in an index in memory.
2     When a subsequent backup contains a hash value that is already in the index, a second
      instance of the data is not stored.
3     When new hash values are generated, they are added to the index and the data is written
      to the deduplication store.
Figure 25 Hash-based Chunking
Dynamic Deduplication Implementation
Dynamic deduplication is enabled per library on the D2D. See Figure 26.
When you configure the library, it defaults to deduplication enabled. If you disable it, you cannot
selectively apply deduplication to any data on the library device. Compression is also disabled if
deduplication is disabled.
Figure 26 D2D Configuration
The current deduplication ratio for the D2D device is calculated and displayed in the GUI. This figure
updates automatically. See Figure 27.
Figure 27 Dynamic Deduplication Ratio
The D2D device fully emulates a tape library, including its drive-type and supported compression.
Dynamic deduplication can be used in combination with data compression for even greater storage
savings.
You can directly connect a tape drive to the D2D device for copying and exporting data to physical
media. Data that has been deduplicated can be exported to physical tape without being reassembled
first, and without using the backup application.
Restoring Data
On receiving a restore command from the backup system, the D2D device selects the correct recipe
file and begins reading it, starting with the first hash. This process is illustrated in Figure 28 and the
ordered list that follows.
Figure 28 Restoring from a Recipe File
1. The first hash in the recipe file is located in the index, which provides the location on disk of the
original chunk of data.
2. The original chunk is found and returned to the restore stream.
3. The D2D moves on to the second hash, and repeats this process for all subsequent hash values
in the recipe file until the entire file is returned to the restore stream.
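The three restore steps can be sketched as a simple loop over the recipe. This is a minimal illustration under assumptions: the index is modelled as a dictionary, the deduplication store as a list, and SHA-1 is a stand-in hash; none of these names come from the D2D software.

```python
import hashlib


def restore(recipe, index, store):
    """Rebuild a backup by resolving each hash in the recipe, in order."""
    restored = bytearray()
    for digest in recipe:
        location = index[digest]     # step 1: the index gives the chunk's location
        restored += store[location]  # step 2: the chunk is returned to the restore stream
    return bytes(restored)           # step 3: the loop repeats until the recipe is exhausted


# Hypothetical store holding two unique chunks.
chunks = [b"alpha", b"beta"]
index = {hashlib.sha1(c).hexdigest(): i for i, c in enumerate(chunks)}

# A recipe may reference the same chunk more than once.
recipe = [hashlib.sha1(c).hexdigest() for c in (b"alpha", b"beta", b"alpha")]
restored = restore(recipe, index, chunks)
print(restored)  # b'alphabetaalpha'
```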
Housekeeping
If data is deleted from the D2D system (e.g., a virtual cartridge is overwritten or erased), any unique
chunks only applicable to the deleted data will be marked for removal and non-unique chunks will
be de-referenced. The process of removing chunks of data is not an inline operation because this
would significantly impact performance. Instead, the process of “housekeeping” runs as a background
operation on a per cartridge basis and runs as soon as the cartridge is unloaded and returned to its
storage slot.
While the housekeeping process can run as soon as a virtual cartridge returns to its slot, this could
cause a high level of disk access and processing overhead affecting other operations such as further
backups, restores, tape offload jobs, or replication. In order to avoid this, the housekeeping process
checks for available resources before running; if other operations are in progress, housekeeping
is delayed to avoid impacting their performance. The delay is not binary (i.e., on or
off), so even if backup jobs are in progress some low level of housekeeping will still occur, which may
have a slight impact on backup performance.
Housekeeping is important in maximizing the deduplication efficiency of the appliance. Therefore,
you must ensure that it has enough time to complete. Running backup, restore, tape offload, and
replication operations with no break will result in housekeeping never completing. As a general rule,
housekeeping needs 30 minutes per day for every 100 GB of data overwritten on a virtual cartridge.
For example, if on a daily basis the backup application overwrites two cartridges in different virtual
libraries with 400 GB of data on each cartridge, the system will need two hours of inactivity over the
course of the next 24 hours to run housekeeping in order to de-reference data and reclaim any free
space.
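The general rule above (30 minutes per 100 GB overwritten on a virtual cartridge) can be turned into a quick estimate. The function below is a sketch with a hypothetical name. It reproduces the two-hour figure for one 400 GB cartridge; reading that figure as covering both cartridges in the example assumes that housekeeping for cartridges in different virtual libraries overlaps, which the text does not state explicitly.

```python
MINUTES_PER_100_GB = 30  # general rule quoted in the text


def housekeeping_hours(overwritten_gb):
    """Estimate housekeeping time for data overwritten on one virtual cartridge."""
    return overwritten_gb / 100 * MINUTES_PER_100_GB / 60


print(housekeeping_hours(400))  # 2.0 hours for a 400 GB cartridge
```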
Configuring backup rotation schemes correctly is very important to ensure the maximum efficiency of
the product by reducing the amount of housekeeping required. See Optimizing Rotation Scheme to
Reduce Housekeeping.
D2D Replication
The D2D replication technology (which leverages Dynamic deduplication) offers:
• Ease-of-Use. Unlike some competitors, D2D has a web-based GUI to configure replication targets,
implement policies, and provide comprehensive reporting. Wizards for replication and recovery
make the configuration easy to accomplish in only five steps.
• Tape Offload functionality. Only HP offers easy migration of data from virtual tape to physical
tape either for off-site/archiving of data for compliance reasons, or for use in tape initialization
during replication setup, and in the disaster recovery process.
• HP D2D can run up to eight replications concurrently, allowing optimal use of the bandwidth
available so your replication completes as soon as possible.
• The D2D implementation offers link throttling so you can determine what percent of the available
link speed is to be utilized for replication. This prevents D2D replication swamping the inter-site
link and maintains acceptable application performance across sites whilst replication is also taking
place.
• With many-to-one and N-way replication, the D2D4100 devices can support up to 24:1 fan in
from D2D2500s located at remote sites, allowing you to make replication very cost effective by
consolidating up to 24 remote sites into a single replication target. Replication can also be arranged
to go Site 1 to Site 2 to Site 3 to Site 1 in N-way fashion.
• Minimizes the amount of data to replicate. One of the features of HP D2D replication is that if the
backup software is configured to append to cartridges rather than overwrite them, HP D2D only
transmits manifests to match from the append point, which saves further bandwidth compared to
matching the manifest for the full cartridge each time (i.e., only transmit the manifest for the new
‘session’ appended to the tape).
• Price point. HP D2D2500/4000/4100 Backup Systems support deduplication as standard and
are better value than products from other leading competitors. Replication license costs are
competitive, and you can replicate up to 16 D2D2500s into a single D2D4000 while purchasing only
a single replication license for the D2D4000. The final result is a very
compelling cost structure for deduplication and replication.
How it Works
HP D2D Backup Systems with HP Dynamic deduplication use a technology called hash-based chunking
to save space. Every unique 4K of data within the D2D deduplication store has its own unique hash
code. After the two D2Ds have been “initialized” with reference data (see Implementing the Initialization
Process) then only new data created at the source site needs to replicate to the target site. This efficiency
in understanding precisely which data needs to replicate can result in bandwidth savings in excess
of 95% compared to having to transmit the full contents of a cartridge from the source site. In addition,
there is some overhead of control data, known as manifest data, that also needs to pass across the
replication link. A final component is any hash codes that are not present on the remote site, which
may also need to be transferred. Typically the “overhead components” are less than 2% of the total
virtual cartridge size to replicate.
D2D Replication Implementation
The following must be performed to implement replication on D2D systems.
Licensing
On HP D2D 2500/4000/4100 Backup Systems, Dynamic deduplication is included as standard.
Replication is a licensable feature at extra cost. Only one license is required for the replication target.
Implementing the Initialization Process
Before D2D replication can take place the reference data (the first full backup cycle performed with
deduplication enabled) needs to be sent to the target. This initial reference copy of the data is large
and could take a significant amount of time to transfer (weeks or months) if the WAN/LAN replication
link was sized around the daily deduplicated data to be transferred. Once the target is initialized
with the reference data the daily deduplication-enabled replication can commence over the WAN/LAN
link. The following diagram shows the initialization options:
Figure 29 Options for Initializing the Initial Data
• Active/Passive initialization options:
• Option 1: Co-locate the units and use a local high bandwidth (GbE) link, then separate units
and ship one to the target site.
• Option 2: If the link bandwidth is high enough and you are willing to wait several days then
use the WAN replication link to initialize the target device.
• Option 3: Use physical tape initialization. Physical tape library access is required at both sites.
Export tapes from the source, import tapes into the target. During the import, the deduplication
store is automatically created. Data is exported in backup software format.
• Many-to-one initialization options:
Co-location is probably not an option here because once remote sites are replicating to a single
target device you do not want to disturb it to ship to a remote site to initialize it.
• Option 1: If the link bandwidth is high enough and you are willing to wait several days then
use the WAN replication link to initialize the target device.
• Option 2: Use physical tape initialization. Physical tape library access is required at both sites.
Export tapes from the source, import tapes into the target. During the import, the deduplication
store is automatically created. Data is exported in backup software format.
Initialization — Co-location and WAN Replication
See Configuring Replication for the process of configuring replication on HP D2D devices. In this case
the two D2D units can be directly connected through a GbE link in a single location. The first replication
to a target involving a full local copy will take some significant time, depending on the size of the cartridge
being replicated, because the full set of hash codes and deduplication store have to replicate to the
target.
Physical Tape Initialization — Export and Import
With tape initialization:
• Attach a tape drive or MSL Tape library to the source D2D unit using the Tape Attach tab in the
GUI; perform a scan to discover the device.
• Perform your first full backup to the source D2D device.
• Using the Manual jobs tab under tape attach select the slots on your D2D device that require
copying and perform a virtual tape to physical tape copy.
• The virtual barcode on the source device is copied into the cartridge memory of the LTO physical
media and retrieved during the import process, so even the virtual barcode is transferred correctly.
• Select “Create.”
Once the tapes have been created and shipped to the remote site where the target D2D is located,
the reverse process can take place using “import.”
NOTE:
The import for the tape initialization process must take place before the replication slot mappings are
established. (See Configuring Replication.)
As part of the import process the hash codes and deduplication store are automatically created on
the target device. The import is forced into the virtual mail slot of the virtual library on the target device
and can then be moved into a usable slot.
Replication Setup
Before you set up the D2D for replication it is important to understand the concept of slot mapping,
because this is how replication is administered. It is also important to understand your cartridge
rotation scheme because the replication is all predicated on knowing that weekly backups go to Tape
X in Slot Y, that daily incrementals go to Tape A in Slot B and so on. Replication is triggered by a
tape being ejected from a virtual tape drive in the D2D. HP D2D replication ensures that minimum
bandwidth is used for incremental backups being replicated from the same tape, by only transferring
the incremental manifests associated with each “session” of incremental backups appended to the
same tape.
Figure 30 Slot Mappings
• A “slot mapping collection” is any number of slots within a source library that are configured to
replicate to a target library.
• There is a maximum of one slot mapping collection per source library (you can add slots to the mapping
after initial creation).
• There is a maximum of four slot mapping collections per target library (one for the D2D2500).
The preceding example shows the flexibility of HP D2D replication technology.
In remote office A, daily “fulls” are taken each day so all six tapes in VTL A1 are required to replicate
to an identical target device at data center D because this is considered critical data. All six slots must
be mapped, but it is only when data changes on a cartridge in VTL A1 that replication will take place
because replication is only “triggered” on cartridge eject. A similar situation applies to VTL B2 in
remote office B.
However, for host A2 only the weekly full and monthly full require replicating. Rather than create a
target replication library at remote site D for just two cartridges, you can combine this with the five
cartridges from source library B1. This is because there is a limit of four slot mappings on the target
device. So A2 and B1 have to share a slot mapping on VTL D2. Notice how at data center D, target
replication libraries can be shared among sources, allowing for consolidation.
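The slot mapping limits listed above can be captured in a small data structure with a validation check. This is a sketch only: the class, function, and library names are hypothetical, and the limits are taken from the bullet list (one collection per source library, four per target library, one when the target is a D2D2500).

```python
from dataclasses import dataclass

DEFAULT_TARGET_LIMIT = 4                # slot mapping collections per target library
TARGET_LIMIT_BY_MODEL = {"D2D2500": 1}  # exception stated in the text
SOURCE_LIMIT = 1                        # slot mapping collections per source library


@dataclass
class SlotMappingCollection:
    name: str
    source_library: str
    target_library: str
    slots: dict  # source slot number -> target slot number


def check_limits(collections, target_model):
    """Return a list of limit violations; an empty list means none were found."""
    per_source = {}
    per_target = {}
    for c in collections:
        per_source[c.source_library] = per_source.get(c.source_library, 0) + 1
        per_target[c.target_library] = per_target.get(c.target_library, 0) + 1

    target_limit = TARGET_LIMIT_BY_MODEL.get(target_model, DEFAULT_TARGET_LIMIT)
    problems = []
    for library, count in per_source.items():
        if count > SOURCE_LIMIT:
            problems.append(f"source {library}: {count} collections (limit {SOURCE_LIMIT})")
    for library, count in per_target.items():
        if count > target_limit:
            problems.append(f"target {library}: {count} collections (limit {target_limit})")
    return problems


# Hypothetical layout: two source libraries replicating into one target library.
mappings = [
    SlotMappingCollection("a2-fulls", "VTL A2", "VTL D2", {1: 1, 2: 2}),
    SlotMappingCollection("b1-daily", "VTL B1", "VTL D2", {1: 3, 2: 4, 3: 5, 4: 6, 5: 7}),
]
print(check_limits(mappings, "D2D4000"))  # no violations for a D2D4000 target
print(check_limits(mappings, "D2D2500"))  # one violation for a D2D2500 target
```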
Configuring Replication
The following steps will configure D2D replication.
If using tape initialization, the initialization process has to happen before the replication is configured.
Prior to configuring replication, you must enter the replication license on the target device.
Use the diagram below when reviewing the steps to configuring replication. This example uses a
D2D2500 source library IP 192.168.0.44 and a D2D4000 Fibre Channel target library at
192.168.0.207. The target already has another source mapped into it from D2D4000 at
192.168.0.43.
All the network parameters must be set correctly before starting the replication configuration. The IP
addresses are set as part of the initial D2D installation using either a “Discovery CD” or by connecting
a monitor and keyboard to the rear of the D2D unit. You must set your DNS and gateway address.
DHCP is permitted but the network administrator should ensure that there is a permanent lease assigned.
Figure 31 Example Schematic
1. Start the replication wizard.
a. Select the Replication tab in the D2D GUI and then the Mapping Configuration tab. The wizard
can then be initiated. All the configuration takes place on the source library. The only action
necessary on the target library is to configure it with an IP address. Note also that this is
the starting point for the recovery wizard. Library 1 is the new source library in the D2D2500.
Notice how the source library is initially “non-replicating” until it becomes part of a replication
pair.
b. Select Start Replication Wizard.
2. Add a target device to the source device.
a. Using the D2D replication wizard you can set the IP address of the target library device.
You must know the target library IP address. In this case, it is 192.168.0.207.
b. Input the target appliance IP address.
c. Select Add Target Appliance.
The target library appears to the source and is online.
3. Assign target slots.
a. Select Next.
Use part of an existing library on the target or create a new library on the target directly
from the source for the purpose of replication. In this case, map four slots from the local
(source) library onto target Lib 2 on the D2D4000, which has already been created.
Alternatively you can create a brand new target library by selecting Create New Target Library.
If a new target library is created from the source, the bar codes on the source and target match
identically. If the target library already exists, the bar codes do not match.
Note how target-lib2 is non-replicating until the slot mappings have been configured.
b. Select target-lib2 as the target replication library to use.
c. Select Next.
4. Configure slot mappings.
Set up the slot mappings from the source library slots to the target-lib2 slots and give the slot
mappings a name.
a. Provide a name for the slot mapping.
b. Select Apply.
Notice how the virtual bar codes at both the source and the target are displayed.
In this case there is a 1:1 mapping from the source onto the target replication library using the
drop down lists.
5. Configure throttling.
a. Configure the necessary Blackout windows and Bandwidth throttling settings unique to your
requirements. This example does not allow replication between 08:00 and 18:00 during
the week.
b. Select Finish.
Note in this example the bandwidth is not limited so there is 100% utilization of the available
link.
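The blackout window used in this example (no replication between 08:00 and 18:00 during the week) can be expressed as a simple time check. This sketch only models the policy; it is not how the D2D GUI stores the setting. It assumes "during the week" means Monday to Friday and that the window ends at exactly 18:00, and the function name is hypothetical.

```python
from datetime import datetime, time

BLACKOUT_START = time(8, 0)
BLACKOUT_END = time(18, 0)
BLACKOUT_DAYS = range(0, 5)  # assumption: Monday (0) to Friday (4)


def replication_allowed(moment):
    """Return True when replication may run at the given date and time."""
    in_blackout = (
        moment.weekday() in BLACKOUT_DAYS
        and BLACKOUT_START <= moment.time() < BLACKOUT_END
    )
    return not in_blackout


print(replication_allowed(datetime(2010, 3, 1, 12, 0)))  # Monday midday: False
print(replication_allowed(datetime(2010, 3, 1, 20, 0)))  # Monday evening: True
print(replication_allowed(datetime(2010, 3, 6, 12, 0)))  # Saturday midday: True
```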
Reporting
It is possible to view the replication status from both the source and target D2D devices.
Target Device Reporting
On the target device you can view the target replication library to see the “synchronization status”
of each virtual cartridge. For example, the screenshot below shows a D2D4000 target replication
library with three replication target libraries set up (Target-lib1, Target-lib2, Target-autoload). The top
part of the screen shows how many of the slots in the target libraries are already mapped to different
sources.
Imagine that you just performed a backup to the four cartridges in slots 1-4 on source library
192.168.0.43. As the cartridges are ejected from the drive, replication starts and reports
“synchronized” when the replication of a particular cartridge is complete.
By clicking on target lib 1, then on slot mapping 1 (for target lib 1) you can see the status of the four
cartridges configured for replication; two are already synchronized and two are in the process of
replicating (replication synchronizing).
Source Device Reporting
From a source perspective, it is important to know if your cartridges have replicated successfully. From
each source device on the replication tab there is an “Event History” tab which displays all the
replication activity concerning that source in a time stamped Event History log. For the source library
configured in the example above, you can see the stages and times as each of the four mapped
cartridges is replicated. For the activity selected, the details are shown in the bottom of the screen
shot. Using this Event History you can determine how long a particular cartridge took to replicate,
average replication speed, and so on.
Design Considerations
Because every enterprise backup environment is different, there is no single answer for optimizing
replication. This section discusses replication design considerations to help you optimize replication
in your environment.
Link Sizing
Knowing the rate of change of data on your systems and hence being able to size the replication link
efficiently is the key to a successful implementation. Size the link too small and you cannot off-site all
your data in a reasonable time. Size the link too big and you are paying for bandwidth you may not
be able to use. The most likely implementation scenario is either to utilize unused bandwidth in an
existing link or to provision a bandwidth increase on an existing link to accommodate virtual tape
replication. For most users somewhere between a 2 Mb/sec and 50 Mb/sec link should be sufficient
for the amounts of data typically transferred using HP D2D replication. This section focuses on
understanding how to size the replication link bandwidth required.
The size of the replication link will depend on the usage model for replication.
• Backup and replication take place in a fixed window (up to 12 hours) preferably when the WAN
link is not heavily used by applications (i.e., outside business hours). The replication process has
almost the full bandwidth of the WAN link available to it. Within the HP D2D Backup System
configuration for replication you can set a “blackout window” during which time replication is
NOT allowed to operate.
• The business runs 24x7 and there are no fixed windows in which to do replication. In this example
the replication can be limited and the HP D2D devices can be configured to only use a percentage
of the available bandwidth so as not to adversely affect application performance. The replication
will take longer using WAN link throttling.
HP also provides a Backup Sizing Tool to help you size any backup solution including deduplication
and replication: http://www.hp.com/go/storageworks/sizer .
Sizing the bandwidth required is best illustrated by some examples:
• 100 GB virtual cartridge replication over a 2 Mb/sec link with 100% bandwidth utilization, data
off-sited in a 12 hour fixed window.
• 100 GB virtual cartridge replication over a 2 Mb/sec link with 25% bandwidth utilization, data
off-sited over 24 hours.
• 500 GB virtual cartridge replication over a 2 Mbit/sec link, data off-sited in a 12 hour fixed
window.
• 500 GB virtual cartridge replication over a 10 Mbit/sec link with 20% bandwidth utilization, data
off-sited over 24 hours.
• Many-to-one example with mixed WAN speeds.
Figure 32 Example 1: 100 GB Virtual Cartridge Replication over a 2 Mb/sec Link with 100% Bandwidth
Utilization
Calculating data to replicate:
• 100 GB daily backup to replicate within a 12–hour fixed window (between 20:00 and 08:00).
• Replication time for 2.16 GB is 3.64 hours over this link at 100% bandwidth utilization.
The above example shows a 100 GB tape to replicate daily between sites connected by a Symmetric
Digital Subscriber Line (SDSL) link.
There are several factors involved in sizing the link:
• Full backup size in this example is 100 GB.
• Daily rate of change of data at primary site, in this example 1%.
• Replication required in fixed window or in shared bandwidth mode using bandwidth limiting
(throttling).
• New data being replicated is compressed; communication overhead data (manifest and imperfect
hash code matches) is already compressed.
• Communication overheads typically add up to about 1.5% of the full backup capacity.
• WAN links are not 100% efficient because of TCP/IP overheads, contention, and so on. This example
assumes 66% link efficiency (typical).
The net result is that the system can replicate a total of 2.16 GB in 3.64 hours with full bandwidth
available at 66% efficiency.
To replicate in shared bandwidth mode, perhaps because this business is a 24x7 operation, set
the D2D device up to use no more than 25% of the bandwidth available. In this case the replication
will take 14.4 hours, which is acceptable to most businesses.
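The figures in this example can be reproduced with a short calculation. The sketch below is a reconstruction under assumptions, not a formula taken from the guide: it treats the 2.16 GB as the 1% of changed data compressed at 1.5:1 (the ratio used in Table 7) plus the 1.5% communication overhead, and it uses decimal units (1 GB = 8000 megabits). The function names are hypothetical.

```python
def data_to_replicate_gb(full_backup_gb, change_rate=0.01,
                         compression_ratio=1.5, overhead_fraction=0.015):
    """Changed data after compression, plus manifest and hash-match overhead."""
    new_data = full_backup_gb * change_rate / compression_ratio  # assumed 1.5:1 compression
    overhead = full_backup_gb * overhead_fraction                # about 1.5% of the full backup
    return new_data + overhead


def replication_hours(data_gb, link_mbit_per_sec, utilization=1.0, efficiency=0.66):
    """Transfer time over a WAN link at the given utilization and link efficiency."""
    usable_mbit_per_sec = link_mbit_per_sec * utilization * efficiency
    megabits = data_gb * 1000 * 8  # decimal units
    return megabits / usable_mbit_per_sec / 3600


print(round(data_to_replicate_gb(100), 2))          # about 2.17 GB (quoted as 2.16 GB)
print(round(replication_hours(2.16, 2), 2))         # 3.64 hours at 100% utilization
print(round(replication_hours(2.16, 2, 0.25), 1))   # 14.5 hours at 25% utilization
```

The 25% result comes out slightly above the 14.4 hours quoted in the text, which is consistent with the text rounding the full-bandwidth time to 3.6 hours before multiplying by four.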
Figure 33 Example 2: 100 GB Virtual Cartridge Replication over a 2 Mb/sec Link with 25% Bandwidth
Utilization
Calculating data to replicate:
• The same data calculations as the example in Figure 32.
• The total data to transmit is 2.16 GB.
Results:
• Initial 100 GB backup complete in 30 minutes.
• Replication time for 2.16 GB is 14.4 hours over this link at 25% bandwidth utilization.
The time required for replication is derived from the equation:
NOTE:
To set this in context, in a high bandwidth model you would have to replicate the whole 100 GB
between sites on a 2 Mb/sec link which would take a long time. Instead, using deduplication and
replication, you only have to replicate 2.16 GB of data to achieve the same result.
Now consider a larger backup requirement where 500 GB of backup requires replicating each day.
Figure 34 Example 3: 500 GB Virtual Cartridge Replication over a 2 Mbit/sec Link
Calculating data to replicate:
• 500 GB daily backup to replicate within a 12–hour fixed window (between 20:00 and 08:00).
• Replication time for 10.8 GB is 18.1 hours over this link at 100% bandwidth utilization – but this
does not meet the service level agreement.
In this example, start off with a 2 Mbit/sec SDSL link. You soon realize that replicating the 10.8 GB of data,
even using 100% of the available bandwidth, takes 18.1 hours, which is outside most users'
fixed window of available time. It would take 4 x 18.1 hours using 25% throttling of the 2 Mb/sec
link. Clearly a link speed upgrade is required to meet either of your criteria.
Upgrading to a 10 Mbit/sec link resolves the problem.
The replication time now within a fixed window is 18.1/5 = 3.62 hours.
The replication time now with shared bandwidth mode at 20% = 18.1 hours (within the required 24
hours).
Figure 35 Example 4: 500 GB Virtual Cartridge Replication over a 10 Mbit/sec Link
Calculating data to replicate:
• The same data calculations as the example in Figure 34.
• The total data to transmit is 10.8 GB.
Results:
• Initial 500 GB backup complete in 2.75 hours.
• Replication time for 10.8 GB is 18.1 hours over this link at 20% bandwidth utilization.
Finally, look at a many-to-one example which is a combination of the above scenarios.
Figure 36 Many-to-One Example using Mixed WAN Link Speeds
Here are four remote sites replicating 100 GB and 500 GB backups daily into a central D2D4000.
Note also that the central D2D4000 has 2 x 1 GbE connections from the central router for link
aggregation.
Follow these guidelines for the best approach to a many-to-one scenario:
• Size each of the individual links separately.
• If possible use the “blackout window” settings to stagger the replications throughout the day to
ensure there are no more than eight replications active at any one time.
• Be careful with bandwidth limiting values. There is a minimum of 512 KB/sec and the throttling
value selected should not cause impaired application response times.
• If possible, provision some extra bandwidth above what is required for daily replication to allow
for “catch-up” mode if replications are paused/stopped due to a link failure.
• Wherever possible do a proof of concept before implementing into production.
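When staggering replications with blackout windows, it can help to check how many replications a planned schedule would have active at once. The sketch below is a generic sweep over start and end times, with hypothetical names; hours are plain numbers on one timeline, so a window that runs past midnight can be written with an end value above 24.

```python
MAX_CONCURRENT_REPLICATIONS = 8  # limit quoted in the text


def peak_concurrency(windows):
    """windows: list of (start_hour, end_hour) replication windows on one timeline."""
    events = []
    for start, end in windows:
        events.append((start, 1))   # a replication becomes active
        events.append((end, -1))    # a replication finishes
    events.sort()  # at the same hour, finishes (-1) sort before starts (+1)
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


# Hypothetical schedule for three remote sites.
schedule = [(0, 6), (4, 10), (6, 12)]
print(peak_concurrency(schedule))                                  # 2
print(peak_concurrency(schedule) <= MAX_CONCURRENT_REPLICATIONS)   # True
```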
Table 7 shows the typical ongoing replication times depending on link speeds and how long the
initialization process will take depending on whether initialization is done via co-location or the WAN link.
Use this table as a reference guide.
Table 7 Sample Initialization and Replication Times
Column key:
A  Initial backup size in GB
B  Link speed in Mb/sec
C  Backup time at 25 MB/sec in hours
D  Initialization time via WAN in hours (66% efficiency uncontended), compression at 1.5:1, deduplication at 1.5:1, local backups in progress
E  Initialization time via co-location in hours, replication on GbE
F  Ongoing replication time in hours (1% data change rate), uncontended link (100% utilization)
G  Limiting factor in replication for column F
H  Ongoing replication time in hours (1% data change rate), 25% utilization link
I  Limiting factor in replication for column H
A     B     C     D       E     F      G                       H       I
50    2     0.6   37.1    0.5   1.5    WAN link                6.1     WAN link
100   2     1.1   74.2    1.0   3.1    WAN link                12.3    WAN link
500   2     5.6   370.9*  5.0   15.3*  WAN link                61.3*   WAN link
1000  2     11.1  741.8*  9.9   30.6*  WAN link                122.6*  WAN link
100   10    1.1   14.8    1.0   0.6    WAN link                2.5     WAN link
500   10    5.6   74.2    5.0   3.1    WAN link                12.3    WAN link
1000  10    11.1  148.4*  9.9   6.1    WAN link                24.5    WAN link
100   15    1.1   9.9     1.0   0.4    WAN link/D2D appliance  1.6     WAN link
500   15    5.6   49.5    5.0   2.1    WAN link/D2D appliance  8.2     WAN link
1000  15    11.1  98.9*   9.9   4.3    WAN link/D2D appliance  16.3    WAN link
100   30    1.1   4.9     1.0   0.4    D2D appliance           0.8     WAN link
500   30    5.6   24.7    5.0   2.1    D2D appliance           4.1     WAN link
1000  30    11.1  49.5    9.9   4.3    D2D appliance           8.2     WAN link
100   50    1.1   3.0     1.0   0.4    D2D appliance           0.5     WAN link/D2D appliance
500   50    5.6   14.8    5.0   2.1    D2D appliance           2.5     WAN link/D2D appliance
1000  50    11.1  29.7    9.9   4.3    D2D appliance           4.9     WAN link/D2D appliance
100   1000  1.1   1.0     1.0   0.4    D2D appliance           0.4     D2D appliance
500   1000  5.6   5.0     5.0   2.1    D2D appliance           2.1     D2D appliance
1000  1000  11.1  9.9     9.9   4.3    D2D appliance           4.3     D2D appliance
* = Unacceptable usage model
Depending on the size of the full backup and the link speeds deployed, D2D can deliver the replication
in either a 12 hour window (at 100% link utilization) or 24 hour window (with 25% link utilization).
Areas marked with an asterisk are outside of these stated usage models. A 1000 Mbit/sec link
represents the link speed used when co-locating the devices for initialization.
Telco Provisioning — Overview of Inter-Site Links
This section provides an overview of how Telcos provide WAN links. With HP D2D device replication
the typical parameters required to be specified to the Telco are:
• Physical location with respect to a Telco Point of Presence. It is often said that it is the last 100 m
of a WAN link where all the cost is incurred. A Point of Presence (PoP) is where the Telco has a
connection point to their Core infrastructure. Generally this is housed in the same building as their
telephone exchanges. Sometimes laying in the copper or fibre from a Telco’s point of presence
to your building is where the majority of the installation costs are incurred. The physical location
of your office or disaster recovery site will definitely have an effect on the link price. Needless to
say if your premises already have a link in place then generally increasing the bandwidth of an
existing link is relatively easy. Most Telcos require a business to be within 25 km of a point of
presence for Ethernet connections.
• Speed required—Most D2D replication implementations will not require links above 50 Mbits/sec.
Many will be able to function adequately on speeds as low as 2 Mbit/sec links.
• Resilience—dependent on business criticality of replication. Even with the most basic package
(single line unprotected) most Telcos will guarantee 99.9% availability. Above this you can opt
for master and standby paths (only one active at a time) that link to a single NTE (Network
Termination Equipment) at your premises. If one path fails, you are responsible for moving your equipment
over to the standby path. Finally you can have 2 x NTE at your site with master and standby
paths, and fail-over can be automated.
• Distance—inside a single country, for example the UK, there is generally no additional charge for
distance (unless bandwidth requirements are very high, > 1 Gb/sec), but in large countries where a
LATA (Local Access and Transport Area) boundary is crossed, additional costs (to use
different Telcos' lines in different states) will be incurred to establish a point-to-point connection.
Inter-country or intercontinental distances require a more consultative engagement with your Telco
provider.
• Contended/Uncontended—Contended links are those shared with other users, so bandwidth
cannot always be guaranteed. Even some business services have a certain amount of contention.
For example, BT Etherflow Standard Class of Service offers 80% bandwidth guaranteed with 20%
contended, whereas BT Ethernet Premium Class of Service allows 100% uncontended bandwidth.
HP recommends a link that is 80% uncontended or better for D2D replication.
• What is not acceptable is to try to run an HP D2D Replication solution on a home broadband
type of link where contention rates of 20:1 or 50:1 can be present. Upgrade to business broadband.
• Class of Service. This is a way of prioritizing traffic on the Telco backbone. Voice over IP, for
example, requires a high priority to maintain coherence. Replication is not considered business
critical in this respect, so typically the default class of service is acceptable. Above the default class there
will be a premium class (Assured Forwarding) where, in an MPLS environment, the
data is marked with an identifier to allow it some level of priority within the Telco’s
infrastructure. Finally, data such as VoIP is given a strict priority queue identifier so it always takes
precedence; this is sometimes known as Expedited Forwarding. Class of Service may also have
an effect on the latency (see below) of your environment.
• Security. Another reason for choosing an uncontended service is that your replication data is not
mixed with other users' data. Today’s Ethernet connections typically take the form of a VPN (Virtual
Private Network), so an element of security already exists within the transfer protocol. If security
is a real issue, a further enhancement is to consider implementing IPSEC (Internet Protocol
Security). HP D2D will support this in future releases. The addition of IPSEC security, however, may
reduce overall replication throughput because of the increased overhead involved in encryption.
• Latencies. These are delays introduced in sending data over a distance or delays involved in
routers, etc., used to direct the data paths. A round trip delay (RTD) in milliseconds is a measure
used to indicate latency.
The longer the latency the longer the replication can take, because the HP D2D will only put so
much information on the line before waiting for a response from the target device. HP has tested
D2D with latencies up to 280 ms with no issues.
A typical Telco example: British Telecom provides a core network performance SLA on round
trip delay, which may vary with class of service. For example:
• Expedited Forwarding 20 ms (RTD)
• Assured Forwarding 23 ms (RTD)
• Default 30 ms (RTD)
• Packet Delivery. This is a measure of packet loss due to noise or transients. Lost packets are
retransmitted, which may slow replication down if the link is particularly bad, but most Telcos offer
an SLA on packet delivery of above 99.8%.
• Most Telcos using the latest Networking technology such as MPLS deliver WAN links via a “cloud”
of VPN connections as if their core infrastructure was a massive router.
Figure 37 A Basic WAN using MPLS
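The contention and latency points in the list above can be combined into a rough throughput estimate. The following is a minimal sketch under a simplified model, not a description of how HP D2D actually schedules data: it assumes the sender keeps at most a fixed amount of unacknowledged data on the line per round trip. The function name, the `window_bytes` parameter, and the 64 KiB example value are illustrative assumptions, not documented D2D settings.

```python
def effective_throughput_mbps(link_mbps, guaranteed_fraction, rtd_ms, window_bytes):
    """Estimate usable replication throughput in Mbit/sec.

    link_mbps           -- nominal link speed in Mbit/sec
    guaranteed_fraction -- uncontended share of the link (0.8 for an 80% guarantee)
    rtd_ms              -- round trip delay in milliseconds
    window_bytes        -- ASSUMED amount of data sent before waiting for a
                           response (hypothetical, not a documented D2D value)
    """
    if not 0 < guaranteed_fraction <= 1:
        raise ValueError("guaranteed_fraction must be in (0, 1]")
    if link_mbps <= 0 or rtd_ms <= 0 or window_bytes <= 0:
        raise ValueError("link_mbps, rtd_ms and window_bytes must be positive")

    # Bandwidth the Telco guarantees even when the link is contended.
    guaranteed_mbps = link_mbps * guaranteed_fraction

    # With one window in flight per round trip, throughput cannot exceed
    # window size divided by round trip delay.
    latency_limit_mbps = (window_bytes * 8) / (rtd_ms / 1000) / 1e6

    return min(guaranteed_mbps, latency_limit_mbps)


# A 10 Mbit/sec link with an 80% guarantee and an assumed 64 KiB window:
for rtd in (30, 280):
    print(rtd, "ms ->", round(effective_throughput_mbps(10, 0.8, rtd, 65536), 2), "Mbit/sec")
```

Under these assumptions the 30 ms link is limited by the guaranteed bandwidth, while at 280 ms the round trip delay becomes the limiting factor, which is the effect the Latencies item describes.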
Some Idea of Telco Costs
It is very difficult to generalize on typical link costs because of the variables listed above. As a
practical example, a major UK Telco operator quoted costs for links at various speeds between HP sites
in Bristol, UK and Warrington, UK, a distance of around 250 km. The quote assumes that both sites are
within 25 km of a point of presence and that no additional construction is required to lay cable or fibre.
HP D2D typically requires link speeds from 2 to 50 Mb/sec, depending on how much data needs
to be replicated.
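As a rough way to place a site within that 2 to 50 Mb/sec range, the link speed can be estimated from the amount of data to replicate and the time available. This is a minimal sketch only: the function name, the 80% usable-bandwidth default, and the example figures are assumptions for illustration, and the data volume should be the amount actually sent over the link after deduplication.

```python
def required_link_mbps(data_to_replicate_gb, window_hours, usable_fraction=0.8):
    """Link speed (Mbit/sec) needed to move the data within the window.

    data_to_replicate_gb -- data crossing the link per replication cycle, in GB
    window_hours         -- time available for replication, in hours
    usable_fraction      -- ASSUMED share of nominal bandwidth that is usable
                            (protocol overhead, contention); illustrative default
    """
    if data_to_replicate_gb < 0 or window_hours <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("invalid input")
    bits = data_to_replicate_gb * 1e9 * 8
    seconds = window_hours * 3600
    return bits / seconds / 1e6 / usable_fraction


# Example: 20 GB to replicate in an 8-hour window.
print(round(required_link_mbps(20, 8), 2), "Mbit/sec")
```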
Table 8 Telco Prices for a 3-year Contract Including Installation (No Existing Links in Place)
Bandwidth (Mbit/sec)   Installation cost (UK £)   Rental per year (UK £)   Average per year over 3 years (UK £)
2                      7663                       13004                    15558
10                     7532                       15732                    18242
100                    11787                      24426                    28355
200                    23172                      45560                    53284
500                    23172                      55755                    63479
1000                   23172                      72781                    80505
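The "average per year" figures in Table 8 appear to be the annual rental plus one third of the installation cost, with any fraction dropped. That relationship is inferred from the numbers and is not stated in the quote, so treat the sketch below as a way to check or extend the table under that assumption. The function name is illustrative.

```python
def average_per_year(installation_cost, rental_per_year, contract_years=3):
    """Spread the one-off installation cost evenly over the contract term.

    Returns whole currency units; the fractional part is dropped, which is
    the ASSUMED rounding behind the published figures.
    """
    total = rental_per_year * contract_years + installation_cost
    return total // contract_years


# Table 8 figures: bandwidth (Mbit/sec) -> (installation, rental, published average)
TABLE_8 = {
    2: (7663, 13004, 15558),
    10: (7532, 15732, 18242),
    100: (11787, 24426, 28355),
    200: (23172, 45560, 53284),
    500: (23172, 55755, 63479),
    1000: (23172, 72781, 80505),
}

for bandwidth, (installation, rental, published) in TABLE_8.items():
    computed = average_per_year(installation, rental)
    print(bandwidth, computed, "matches" if computed == published else "differs")
```

The same function can be applied to a quote from your own Telco provider to compare contract lengths on a like-for-like basis.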
Table 9 Prices of a Major US Telco for HP Facilities in the Houston, Texas Area
Bandwidth (Mbit/sec) (1)   Installation cost (US $)   Rental per year (US $)   Average per year over 3 years (US $)
2                          2000                       11400                    12066
10                         2000                       18000                    18666
100                        3000                       36000                    37000
1000                       4000                       72000                    73333
(1) This company did not offer 200 and 500 Mb/s lines.
Table 10 Prices of a Telco Provider in Colorado for HP Sites Located around 250 km (175 miles) Apart (1)
Bandwidth (Mbit/sec)   Installation cost (US $)   Rental per year (US $)   Average per year over 3 years (US $)
5                      600                        8160                     8360
10                     600                        9180                     9380
100                    600                        15300                    15500
200                    1200                       21300                    21700
500                    1200                       39324                    39724
1000                   1200                       69360                    69760
(1) These prices are all within the same local access and transport area.
Table 11 Prices for Point to Point Connections from HP Facilities Based in Singapore
Bandwidth (Mbit/sec)   Installation cost (S $)   Rental per year (S $)   Average per year over 3 years (S $)
5                      4000                      26400                   27733
10                     4000                      43200                   44533
100                    10000                     96000                   99333
200                    10000                     144000                  21700
500                    10000                     216000                  219333
1000                   10000                     336000                  339333
There is clearly a wide variety of costs. Be sure to have a thorough discussion with your Telco provider
to fully investigate all the options available.
Telco Terminology and Branding for WAN Services
Below is a sample of the services provided by BT (British Telecom) in the UK for inter-site WAN
replication links:
• BTnet Premium (Internet access)
This is a flexible bandwidth solution with speeds between 64 Kbps and 1 Gbps. The router is
provided by BT and is designed primarily for internet access. The connection is typically to a BT
point of presence (POP), and optional features such as failover and load balancing are available.
The connection to the BT POP is copper wire (an SDSL-type connection) at the lower speeds
(64 Kbps, 2 Mbps) and optical fibre at the higher speeds.
A VPN must be set up to connect to another site's subnet; this is necessary for a
D2D replication connection. Contention may be an issue with this type of service.
• BT Etherflow
This service is a high-performance WAN service delivered as an Ethernet connection, with
the end user retaining end-to-end control over the IP architecture and the ability to support non-IP
applications such as AppleTalk, IPX, DECnet and SNA.
Speeds are available between 1 Mbps and 1 Gbps. Building blocks are available to create VPNs
and separate traffic classes. (Traffic could be separated for different groups within a company or
for IP telephony.)
This service is optical fibre based. It could be used for D2D replication but would typically
already provide other services on site (such as connections to servers located at a central site).
This technology also runs on the BT 21st Century Network (MPLS).
• BT IP Clear
This service provides network integration using the Telco's MPLS infrastructure and is suitable for
the larger corporate network. It provides a full range of access and is delivered to user locations
by leased line from 128 Kbps up to 1 Gbps through optical fibre. All routing and VPNs are handled
by the MPLS core and typically a user would be provided with an edge router. Voice circuits (IP
telephony) can be fully integrated into this service.
• BT Ethernet extension services
This service provides high-speed dedicated links using permanent optical fibre connections. LAN
speeds from 10 Mbps to 1 Gbps are supported. Alternative protocols such as ATM and
Fibre Channel can be used.
Using this service, the LAN could span several sites with no VPNs.
There is a distance limit of 25 km between sites, and the service can be expensive.
Table 12 Sample British Telecom Services and Speeds
Service                 Available speeds     Suitable for D2D low bandwidth replication
BTnet Premium           64 Kbps to 1 Gbps    Yes (2 Mbps upwards); needs VPN and fixed IP; contention may be an issue
BT Etherflow            1 Mbps to 1 Gbps     Yes (2 Mbps upwards)
BT IP Clear             128 Kbps to 1 Gbps   Yes (2 Mbps upwards)
BT Ethernet Extension   10 Mbps to 1 Gbps    Yes, but limited distance and likely to be expensive
When to use BT Etherflow and when to use BT IP Clear?
• BT Etherflow is the right choice for users who want an Ethernet (Layer 2) hand-off and seek to retain
end-to-end ownership of their IP architecture. It best suits users with a small number of high
bandwidth sites.
• BT IP Clear is the right choice for users who want an IP (Layer 3) hand-off and wish to share their IP
architecture with BT. It offers any-to-any connectivity and is best suited to users with a large number
of low bandwidth sites.
D2D Replication Data Recovery Options
Replication on D2D enables easier recovery of data in the event of a major site disaster. The data is
not instantly available; it has to be recovered through a standard restore process using a backup
application. The following scenarios look at what happens when a total disaster at a remote site
destroys both the server and the D2D appliance that was protecting it.
In the event of a total disaster at a remote site where the D2D Backup System and the servers it is
protecting are destroyed or damaged, the replicated data at the disaster recovery (target) site can
be accessed by means of a backup server on the remote site and transferred either to an application
or onto physical tape for distribution. Alternatively, when the damaged site is repaired the Virtual
Tape Library at the disaster recovery site can be used to “reverse replicate” to the Virtual Tape Library
at the previously damaged site.
Figure 38 Recovery Options — Data Center Rebuild and Reverse Replication
Restore Directly from the D2D Target Device
The first method of recovery relies on the fact that there are probably more trained IT personnel at the
data center, and equipment such as spare servers may well be kept there. The HP D2D system
with replication can act as a source device, a target device or a non-replicating library (a library that
can be connected to a system and data recovered directly from the device). You can convert the target
library into a non-replicating library (by deleting the relevant replication slot mappings), construct a
new server at the data center, hook up the D2D device, and recover the unique server data. Once
complete, the new server and a new D2D can be shipped to the remote site and replication
reestablished.
Reverse Replication on the D2D
A second method of recovering is to perform a function called reverse replication. The data is copied
back from the target to the new source appliance at the remote site. When this is complete, the remote
site can be rebuilt and the server recovered from the data now available on the remote site. The effect
of reverse replication is the same as initializing the device at the remote site, so the time it takes
depends on the link speed and the amount of critical data to be reverse replicated. Once the server is
recovered and able to perform backups again, configure replication back to the target device in the
data center once more. This option is easy to implement even in many-to-one replication scenarios.
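Because the duration of reverse replication depends on link speed and data volume, a quick estimate helps when choosing between this method and the alternatives. The sketch below is a simple transfer-time calculation under stated assumptions: the function name and the 80% usable-bandwidth default are illustrative, and `data_gb` stands for whatever volume actually has to cross the link, which this guide does not quantify.

```python
def reverse_replication_hours(data_gb, link_mbps, usable_fraction=0.8):
    """Rough time (hours) to copy data_gb back over a link of link_mbps.

    usable_fraction is an ASSUMED allowance for overhead and contention.
    """
    if data_gb < 0 or link_mbps <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("invalid input")
    bits = data_gb * 8e9
    usable_bits_per_second = link_mbps * 1e6 * usable_fraction
    return bits / usable_bits_per_second / 3600


# Example: 500 GB of critical data over links of different speeds.
for mbps in (2, 10, 100):
    hours = reverse_replication_hours(500, mbps)
    print(f"{mbps:>4} Mbit/sec: {hours:7.1f} hours ({hours / 24:.1f} days)")
```

If the estimate runs to many days on a slow link, one of the other recovery methods described in this section may be more practical.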
In reverse replication the source and target libraries maintain their identities, but
when the source is rebuilt, a special “recover first” mode forces the source to
go to the target and reverse replicate data from the target back to the source. The basic process takes
place entirely from the remote site where the new D2D appliance has been installed:
1. With a new replacement D2D at the remote site, set the appropriate network settings.
2. Create an identical virtual library (or libraries if more than one was originally configured).
3. At the source, select the Replication tab from the GUI. Then select Mapping Configuration. Begin
as if configuring a replication target, but start the recovery wizard instead of the replication
wizard.
4. Now create a new target appliance. Select Add Target Appliance. Enter the IP address of the target
appliance (which still exists); when it is found, the target appliance is shown. Then select
Next. If there are multiple target libraries within the target appliance, select the relevant one that
was configured prior to the disaster at remote site C.
5. The recovery wizard then provides slot mappings. Select the required tapes that were last
replicated prior to the disaster at remote site C. Select Adopt Slot Mapping and then Next. Choose
the target cartridges you wish to be reverse replicated and select Apply.
The system displays a message indicating that reverse replication has been established.
The source will now commence synchronizing with the target and target virtual tapes will be
copied to the source complete with their original source barcodes, to aid easy recovery of the
data at the source using the backup application. The recover first box is automatically enabled.
Reverse replication starts and can be tracked by clicking on the slot mapping tag.
With reverse replication the deduplication store at the source has to be repopulated from the
beginning, so reverse replication takes longer than “normal replication” after initialization. After
reverse replication, the normal source to target relationship is reestablished automatically. Further
replication will take place from source to target in the normal manner.
It is also possible to reverse the replication even if the target D2D system has lost the mapping. This
is the case if a source library is accidentally deleted. Recovery is similar to the above except that only
the replication wizard will be available. The recover first mapping tick boxes have to be manually
‘ticked’ for each tape which is required to be reverse replicated.
Reverse Tape Initialization on the D2D
A third method is also available. At the data center, copy the tape contents required at remote site
C directly onto physical tape using the tape offload functionality within HP D2D Backup Systems.
Typically these tapes would be the last fully synchronized full backup and any incrementals required.
These tapes can then be transported to remote site C along with the tape library they were prepared
on (if necessary). Importing the tapes back onto a brand new replacement D2D2500 at remote site
C effectively initializes the deduplication store at remote site C and the required data can be recovered
from the replacement D2D device. The data transferred on the physical tapes is in 100% backup
application format. When it is imported into the replacement D2D at remote site C, the import process
of reading the physical tape regenerates all the necessary hash codes and the deduplication store at
remote site C. The data can even be recovered directly from the physical tapes at remote site C if
required; this is exactly the same process as tape initialization.
Figure 39 Recovery Options — Physical Tape
Creating Archive Tapes from the Target
Even with replication removing the need for physical tape offsite, there are still many users who wish
to use physical tape for archive or test recoveries, etc. An archive tape will have the following
differences from the original backup:
• A different retention time from the original backup (for example, you may keep backups and offsite
replicas for three months but keep tape archive for several years).
• Different cartridge contents because it will be a different size (optimum virtual cartridge size for
deduplication is 50-100 GB but physical tapes such as LTO-4 are 800 GB).
• A different barcode, and it will be tracked separately in the backup application.
These characteristics mean that the backup application must control the creation of the archive tape
and must use an object-copy based system (cannot use a whole-tape copy system). This is easy if the
physical library is on the source site because you can use the normal backup application object-copy
mechanisms to simply copy from the source virtual library to the physical library via the backup
application media servers. However, if the physical library is at the target site and you want to perform
the tape archive from the replicated virtual cartridges in the target device, this requires additional
steps. This is because the backup application at the target device that sees the replicated virtual
cartridges cannot be the same backup application that performed the backups at the source site (see
Backup Application Interaction with Replication). The target site backup application does not have
any information about the replicated cartridges in its media database because it did not back up to
those tapes.
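The size mismatch noted above (virtual cartridges of 50-100 GB against 800 GB LTO-4 tapes) is one reason an object-copy approach is needed: several virtual cartridges' worth of backup objects end up on one physical tape. The sketch below only illustrates that arithmetic. It is not how any particular backup application lays out media; it assumes objects are written in order and never split across tapes, and the function name is hypothetical.

```python
def plan_archive_tapes(object_sizes_gb, tape_capacity_gb=800):
    """Group backup objects, in order, onto physical tapes.

    Returns a list of tapes, each a list of indexes into object_sizes_gb.
    ASSUMPTION: an object is never split across two tapes.
    """
    tapes = []
    current, used = [], 0
    for index, size in enumerate(object_sizes_gb):
        if size > tape_capacity_gb:
            raise ValueError(f"object {index} does not fit on one tape")
        if used + size > tape_capacity_gb:
            # Current tape is full; start a new one.
            tapes.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        tapes.append(current)
    return tapes


# Twenty 100 GB virtual cartridges copied to 800 GB physical tapes.
plan = plan_archive_tapes([100] * 20)
print([len(tape) for tape in plan])
```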
Figure 40 Creating Archive Tapes from the Target
In order to have the target backup application copy from the replicated cartridges to a physical tape
library, it must be “taught” what is on the replicated cartridges. This can be done automatically, as
shown in Figure 40, by setting the target virtual library to “read-only” mode (to allow Backup
Application B to access it) and then polling the new XML report
(https://<D2Daddress>/ReplicationStatus.xml) on the D2D with a script to generate a listing of which
virtual cartridges have been successfully replicated since the last poll. This cartridge list can then be
fed into a script
that automatically triggers tape import jobs in the backup application (which read the new cartridge
data and import this content into the media database). The target backup application can then restore
from the imported virtual cartridges or copy them to physical tape, etc. For backup application import
script examples see Detailed Backup Application Guidelines for VLS.
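The polling step described above might look like the following sketch. The layout of ReplicationStatus.xml is not described in this guide, so the element name `Cartridge` and the attributes `barcode`, `status`, and `lastSync` are entirely hypothetical placeholders; adjust them to match the real report. The sample report is embedded so the sketch runs without a D2D; in practice the text would be fetched from the report URL, and the resulting barcode list passed to your backup application's import command.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# HYPOTHETICAL report layout, for illustration only.
SAMPLE_REPORT = """
<ReplicationStatus>
  <Cartridge barcode="ABC001" status="Synchronised" lastSync="2010-03-01T02:10:00"/>
  <Cartridge barcode="ABC002" status="Synchronised" lastSync="2010-02-28T23:50:00"/>
  <Cartridge barcode="ABC003" status="Pending" lastSync="2010-03-01T03:00:00"/>
</ReplicationStatus>
"""


def newly_replicated(report_xml, last_poll):
    """Return barcodes of cartridges fully replicated since last_poll.

    last_poll is an ISO 8601 timestamp string such as "2010-03-01T00:00:00".
    """
    cutoff = datetime.fromisoformat(last_poll)
    root = ET.fromstring(report_xml.strip())
    barcodes = []
    for cartridge in root.iter("Cartridge"):
        if cartridge.get("status") != "Synchronised":
            continue  # still replicating, so not safe to import yet
        if datetime.fromisoformat(cartridge.get("lastSync")) > cutoff:
            barcodes.append(cartridge.get("barcode"))
    return sorted(barcodes)


print(newly_replicated(SAMPLE_REPORT, "2010-03-01T00:00:00"))
```

Storing the time of each poll and passing it in as `last_poll` on the next run keeps the import jobs limited to cartridges that changed in between.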