Dell DX6000, DX6004S, DX6012S, DX Content Router Setup And Configuration Manual

Content Router Setup and Configuration Guide
Version 2.2
Copyright © 2010 Caringo, Inc. All Rights Reserved.
No part of this document may be reproduced, transmitted, or transcribed without the written consent of Caringo, Inc.
Table of Contents
1. Introduction to DX Content Router
1.1. Overview of DX Content Router
1.2. About this Document
1.2.1. Audience
1.2.2. Scope
2. Replication Topologies
2.1. Disaster Recovery
2.2. Mirrored Clusters
2.3. Multi-Site Disaster Recovery
2.4. Content Distribution
3. DX Content Router Services
3.1. Basic Architecture
3.1.1. Structure of a DX Content Router Node
3.1.2. Publisher Service
3.1.3. Replicator Service
4. Installation
4.1. Requirements
4.1.1. Operating System
4.1.2. 3rd Party Package Pre-requisites
4.1.3. Network
4.2. Installing and Configuring DX Content Router Services
4.2.1. Installing Services
4.2.2. Configuring Services and Rules
4.3. Upgrading DX Content Router
4.3.1. Upgrading from 1.x
4.3.2. Upgrading from 2.x
4.4. Removing DX Content Router
5. Running and Managing DX Content Router
5.1. Starting DX Content Router Services
5.2. Publisher and Replicator Shutdown
5.3. Customizing the Standard Rule Sets
5.3.1. Publish all streams on a single channel
5.3.2. Publish all streams on two separate channels
5.3.3. Publish streams with header ‘someHeaderName’ on one channel and all others except text files to a second
5.3.4. Complex Content Metadata Analysis
5.4. Using the Publisher Console
5.4.1. Publisher Console layout:
5.5. HTTP Status Reporting for Publisher and Replicator
5.5.1. Publisher Response
5.5.2. Replicator Response
5.5.3. Request for Source Cluster IP Addresses
6. Support for Content Restoration and Fail-Over
6.1. Administrative Disaster Recovery
6.2. Content Mirroring
6.3. Application-Assisted Fail-Over
A. Content Metadata
A.1. System Metadata
A.2. Content File Server Metadata
A.2.1. Custom Metadata
B. DX Content Router Configuration
Copyright © 2010 Caringo, Inc. All rights reserved iii
Version 2.2
December 2010
B.1. Publisher Configuration
B.2. Replicator Configuration
C. Enumerator API
C.1. Enumerator Types
C.2. Enumerator Start
C.2.1. Enumerator Start Query Arguments
C.2.2. Enumerator Start Response
C.3. Enumerator Next
C.3.1. Enumerator Next Query Arguments
C.3.2. Enumerator Next Response
C.4. Enumerator End
C.4.1. End Response
C.5. Enumerator Timeout
C.6. Configuration and Status Query Arguments
Chapter 1. Introduction to DX Content Router
1.1. Overview of DX Content Router
A DX Storage cluster routinely replicates content objects to other nodes in the same cluster to improve fault tolerance and performance. For disaster recovery and other reasons, it may also be desirable to automatically replicate content objects to another, remote cluster. The remote cluster typically has the following properties:
• A different multicast domain than the local cluster
• Connectivity to the local cluster via Transmission Control Protocol (TCP)
• One or more firewalls separating it from the local cluster
• Unpredictable network latency that may prevent communication for long periods of time
The normal replication techniques used within a DX Storage cluster are not appropriate for remote replication between clusters of this sort. DX Content Router (CR) supplies a more appropriate mechanism for remote replication. It also supplies enumeration of DX Storage content for other purposes like search indexing or virus scanning.
1.2. About this Document
1.2.1. Audience
This document is intended for people in the following roles.
1. Storage system administrators
2. Network administrators
3. Technical architects
4. Application integrators writing with the Enumerator API
Throughout this document, the storage system administrator and network administrator roles are referred to as the administrator. Administrators are normally responsible for allocating storage, managing capacity, monitoring storage system health, replacing malfunctioning hardware, and adding capacity when needed.
This document will also be valuable to technical architects designing scalable, highly redundant, cost-effective application storage solutions.
1.2.2. Scope
This document covers the steps needed to deploy DX Content Router and the administrative actions necessary to monitor and run one or more DX Content Router nodes. The reader is expected to have a background in TCP/IP networking and basic knowledge of server-class hardware. Experience with regular expressions is helpful but optional.
Chapter 2. Replication Topologies
A replication topology is a defined arrangement between independent DX Storage clusters, connected to one another via DX Content Router nodes. DX Content Router supports several alternative replication topologies.
2.1. Disaster Recovery
DX Content Router allows an administrator to replicate some or all of the streams stored in a primary cluster to a disaster recovery (DR) site. In case of a complete failure or loss of the primary cluster, all replicated streams can be recovered from the DR site. It is therefore important that the administrator carefully plan which types of content are replicated to the DR site.
2.2. Mirrored Clusters
DX Content Router allows mirroring between two or more primary clusters. All designated streams stored in Cluster A will be replicated to Cluster B, and vice versa. The administrator of each cluster can decide, based on stream metadata, which streams should be replicated to the other cluster. The two replication sets may be identical, completely disjoint, or partially overlapping.
2.3. Multi-Site Disaster Recovery
Using a combination of one-way DR replication and mirrored clusters, more complex disaster recovery topologies can be deployed. Such deployments rely upon a unique metadata identifier for the origin of each object to ensure the correct subset of content is recovered from a pooled (many-to-one) DR cluster in the event of a disaster. See Appendix A for an overview of the various means by which metadata can be associated with streams in DX Storage. Some or all of the DR sites may be shared with other independent primary clusters. In case of loss of the primary cluster, all replicated streams can be recovered from one or more DR clusters. Recovery requires modifying the rule set in the DR cluster, so you should seek the assistance of your designated support resource.
Below is a figure showing multiple primary clusters rolling up to one DR cluster. In order to distinguish the streams in Primary Cluster 1 from those in Primary Cluster 2, metadata needs to be stored with each stream identifying, at a minimum, the cluster of origin.
2.4. Content Distribution
An alternative use for DX Content Router replication infrastructure is to roll up or distribute content within an organization or between cooperating organizations. With a little forethought and planning when storing descriptive metadata with each stream, a very sophisticated data distribution and storage infrastructure can be created. Because you have the ability to create your own rules, a dynamic pool of data can be created and moved around with relative ease.
In the example below we use a combination of mirrored and one-way replication to create a network of clusters with distinctly different data sets. The only two identical clusters are the Central Repository in Singapore and the DR Cluster in Utah. This same model could be used for roll up of major functional areas, such as Financial, Legal, Engineering, HR, etc.
Chapter 3. DX Content Router Services
3.1. Basic Architecture
A DX Content Router lives within a DX Storage cluster but is always visible to other clusters, and perhaps to the external network at large. Consequently, it is assumed that all communication between DX Content Router services occurs over a secure TCP connection.
The following diagram illustrates, at a high level, the flow of messages that takes place between two DX Storage clusters, one acting as the Primary Cluster and the other as a Disaster Recovery Cluster. Dotted lines represent local cluster communication. Solid lines represent HTTP traffic over TCP between DX Content Router services and also between storage nodes within the two independent clusters.
Alternatively, if the network configuration prevents direct communication between the storage nodes (such as when DX Content Router is installed on a CSN), communications can be configured to route through an SCSP Proxy:
3.1.1. Structure of a DX Content Router Node
A DX Content Router node is a server machine running Linux that executes one or both of two services:
1. Publisher - processes all streams stored in a cluster, filters them based on stream metadata, and publishes UUIDs to remote Subscribers.
2. Subscriber - retrieves UUID publications from remote Publishers. A Replicator, the most common subscriber, retrieves remote UUIDs and sends replication requests to local nodes. It must be installed on a server in the same subnet as its target storage cluster. A third-party application, which may or may not be installed on a node with other DX Content Router services, can also function as a subscriber by integrating with the Enumerator API defined in Appendix C.
DX Content Router service configuration parameters are used to enable one or both services, depending on the intended network topology. For the simple example in the previous section, where a primary cluster replicates in one direction to a single DR cluster and all nodes in both clusters are mutually visible, the DX Content Router in the primary cluster would likely be configured to run only the Publisher service and the DX Content Router in the DR cluster would run just the Subscriber. A slightly more complex example would be a pair of mirrored clusters where all nodes in both clusters still have mutual visibility. DX Content Router servers for this topology would look like the diagram below. For clarity, the assumed direct connections between storage nodes in the two clusters as shown and discussed in the previous example have been omitted.
Similar to the previous proxy-enabled example, if the storage nodes are not able to communicate directly in the configured network topology, the Publisher can be configured to send responses via an SCSP Proxy in a mirrored configuration as well:
The two optional DX Content Router services are designed and deployed as independent processes running on a server. We discuss each of these services in more detail in the following sections.
3.1.2. Publisher Service
The Publisher service collects a comprehensive list of all the UUIDs stored in the cluster (as well as those that have been deleted), filters those UUIDs, and publishes the resulting lists of UUIDs to one or more remote Subscribers. The Publisher consists of several subcomponents:
1. Simple HTTP server
2. Attached data store
3. Filter Rules Engine
3.1.2.1. Filtering UUIDs for Publication
The Publisher traverses the list of UUIDs and evaluates one or more existence validations or simple filter expressions against the values of certain headers in the metadata. A set of matching rules can be configured by the administrator that will determine the topology of the intercluster network. These rules are specified using an XML syntax, the full definition of which can be found below.
By way of example, suppose we want to configure our local cluster to remotely replicate all high and medium priority streams to a primary disaster recovery site, while all other streams are replicated to a secondary DR site. The general XML rule structure to do this might look like this:
<rule-set>
  <publish>
    <select name="PrimaryDR"/>
    <select name="SecondaryDR"/>
  </publish>
</rule-set>
The example above is a good starting point, but on its own it will not perform the filtering required for this example. In order to select the PrimaryDR cluster as the destination of some of the locally stored streams, we want to find all streams whose content metadata contains a header called "DX Storage-priority" whose value starts with either a "1", a "2" or one of the words "high" or "medium". Note that the header name is not case sensitive, but the header value tested by a match expression is case sensitive. Here is a select rule that uses a filter with a matches() expression that would accomplish this:
<select name="PrimaryDR">
  <filter header="<storageProduct/>-priority">
    matches('\s*[12].*|\s*[Hh]igh.*|\s*[Mm]edium.*')
  </filter>
</select>
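As a quick sanity check, the pattern in the PrimaryDR filter above can be tried against a few sample header values. The Python sketch below is only an approximation: it assumes the pattern has to cover the whole header value (the trailing .* in each alternative suggests this) and that Python's re module behaves closely enough to the Publisher's regular expression dialect. Neither assumption is confirmed by this guide, and the sample values are made up.

```python
import re

# Pattern copied from the PrimaryDR filter above.
PRIORITY_PATTERN = re.compile(r"\s*[12].*|\s*[Hh]igh.*|\s*[Mm]edium.*")


def is_primary_dr(header_value: str) -> bool:
    """Return True if the header value would select the PrimaryDR channel.

    Assumption: the pattern must cover the whole value (re.fullmatch).
    """
    return PRIORITY_PATTERN.fullmatch(header_value) is not None


if __name__ == "__main__":
    # Made-up sample values for the priority header.
    for value in ["1", " 2 (urgent)", "High", "medium", "HIGH", "3", "low"]:
        print(f"{value!r}: {is_primary_dr(value)}")
```

Because the header value comparison is case sensitive, a value such as "HIGH" is not selected by this pattern, while "High" and "high" are.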
A select clause specifies a pattern for a single set of data to be retrieved by the Subscriber process by name. The select can contain zero or more filter clauses. If there are multiple filter clauses, then all of them must match a stream’s metadata before the stream is published. As in HTTP, the order of headers within the metadata is not significant. If there are multiple headers in the stream metadata with the given header-name, then any of them can match the given pattern in order for the select to execute. If there are no filter clauses, then the select matches any and every stream, as in the following:
<select name="SecondaryDR"> </select>
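The matching semantics just described (all filters must match, any one of several same-named headers may satisfy a filter, and an empty select matches everything) can be sketched in a few lines of Python. This is an illustration only, not the Publisher's implementation: the tuple-based data model, the function names, and the "x-example-priority" header are hypothetical, and whole-value matching with Python's re module is an assumption.

```python
import re
from typing import List, Tuple

Header = Tuple[str, str]      # (header name, header value)
FilterSpec = Tuple[str, str]  # (header name, regular expression)


def filter_matches(spec: FilterSpec, headers: List[Header]) -> bool:
    """A filter passes if ANY header with the given name matches the pattern."""
    name, pattern = spec
    return any(
        re.fullmatch(pattern, value) is not None  # assumption: whole-value match
        for header_name, value in headers
        if header_name.lower() == name.lower()  # header names are not case sensitive
    )


def select_matches(filters: List[FilterSpec], headers: List[Header]) -> bool:
    """A select passes if ALL of its filters pass; no filters means match everything."""
    return all(filter_matches(spec, headers) for spec in filters)


if __name__ == "__main__":
    # Hypothetical stream metadata with a repeated header.
    metadata = [
        ("X-Example-Priority", "low"),
        ("x-example-priority", "high"),
        ("Content-Type", "text/plain"),
    ]
    both = [("x-example-priority", r"\s*[Hh]igh.*"), ("content-type", r"text/.*")]
    print(select_matches(both, metadata))  # both filters are satisfied
    print(select_matches([], metadata))    # an empty select matches any stream
```

Header order plays no role here because each filter scans the full header list, which mirrors the statement that header order within the metadata is not significant.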
The root tag for a set of DX Content Router rules is called rule-set, which can contain one or more publish tags as shown above. The example rule set below will replicate all high and medium priority streams to the PrimaryDR cluster and all others to the SecondaryDR cluster. It will also send all streams whose Content-Disposition header does not contain a file name ending with ".tmp" to the Backup cluster.
<rule-set>
  <publish>
    <select name="PrimaryDR">
      <filter header="<storageProduct/>-priority">
        matches('\s*[12].*|\s*[Hh]igh.*|\s*[Mm]edium.*')
      </filter>
    </select>
    <select name="SecondaryDR">
    </select>
  </publish>
  <publish>
    <select name="Backup">
      <filter header="content-disposition">
        not matches('.*filename\s*\=.*\.tmp.*')
      </filter>
    </select>
  </publish>
</rule-set>
Notice that a rule-set can contain multiple publish clauses, and each publish clause can contain multiple select clauses. The Filter Rules Engine evaluates all content elements for each publish clause. In the example above, where there are two publish clauses, all content streams can be queued for remote replication once for each publish. In addition, when there are two select clauses in a given publish clause, the content metadata is evaluated against each select clause’s filter set. The select clauses are evaluated in order from top to bottom. When the rules engine finds a select whose filter clauses all evaluate to true, the content stream is placed in the appropriate queue (i.e., PrimaryDR or SecondaryDR) and awaits remote replication. Once the evaluation of that publish clause is complete, the rules engine begins evaluation of the next publish clause. When all publish clauses have been evaluated for a given content stream, the Filter Rules Engine begins evaluation for the next content stream.
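The evaluation order described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Filter Rules Engine itself: the header name "x-example-priority" is invented for the example, filter expressions are limited to matches('...') and not matches('...'), patterns are applied to the whole header value with Python's re module, and a stream is queued on the first select that passes within each publish clause (as the PrimaryDR/SecondaryDR example implies). A "not matches" filter is also assumed to pass when the header is absent.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical rule set; "x-example-priority" is an invented header name.
RULES = r"""
<rule-set>
  <publish>
    <select name="PrimaryDR">
      <filter header="x-example-priority">matches('\s*[12].*|\s*[Hh]igh.*')</filter>
    </select>
    <select name="SecondaryDR"></select>
  </publish>
  <publish>
    <select name="Backup">
      <filter header="content-disposition">not matches('.*filename\s*=.*\.tmp.*')</filter>
    </select>
  </publish>
</rule-set>
"""

# Simplified expression syntax (an assumption): [not] matches('<regex>')
EXPR = re.compile(r"\s*(not\s+)?matches\('(.*)'\)\s*$", re.DOTALL)


def filter_passes(filter_el: ET.Element, headers: Dict[str, List[str]]) -> bool:
    """Evaluate one filter element against lower-cased header names."""
    parsed = EXPR.match(filter_el.text or "")
    if parsed is None:
        raise ValueError(f"unsupported filter expression: {filter_el.text!r}")
    negate, pattern = parsed.group(1) is not None, parsed.group(2)
    values = headers.get(filter_el.get("header", "").lower(), [])
    hit = any(re.fullmatch(pattern, value) for value in values)
    return hit != negate


def route(rules_xml: str, headers: Dict[str, List[str]]) -> List[str]:
    """Return the queue (select name) chosen by each publish clause, in order."""
    queues = []
    for publish in ET.fromstring(rules_xml).findall("publish"):
        for select in publish.findall("select"):  # evaluated top to bottom
            if all(filter_passes(f, headers) for f in select.findall("filter")):
                queues.append(select.get("name"))
                break  # assumption: the first matching select wins
    return queues


if __name__ == "__main__":
    stream = {
        "x-example-priority": ["high"],
        "content-disposition": ['attachment; filename="a.txt"'],
    }
    print(route(RULES, stream))
```

A stream can therefore land in at most one queue per publish clause, but in several queues overall when the rule set has several publish clauses.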
3.1.2.2. Rules
The full syntax for the filter rules of a Publisher is presented in simplified RELAX-NG Compact Syntax.
start = RuleSet
RuleSet = element rule-set { Publish+ }
Publish = element publish { Select+ }
Select = element select { (Filter|Exists|NotExists)+, attribute name { text } }
Filter = element filter {
  HeaderAttr,
  # filter expression, using olderThan(), matches() etc.
  text
}
Exists = element exists { HeaderAttr }
NotExists = element not-exists { HeaderAttr }
HeaderAttr = attribute header { text }
Exists and NotExists are tests to check if the header is present or not. An empty header will match an exists query.
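The presence tests can be illustrated with a short Python sketch. It is a hedged model only: the header names "x-example-tag" and "x-example-hold" are invented, the metadata is represented as a plain dictionary, and it assumes that multiple presence tests in one select must all pass, in the same way that multiple filter clauses must.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical select using both presence tests; header names are invented.
SELECT = """
<select name="Tagged">
  <exists header="x-example-tag"/>
  <not-exists header="x-example-hold"/>
</select>
"""


def presence_tests_pass(select_xml: str, headers: Dict[str, str]) -> bool:
    """Check exists / not-exists elements against a stream's header names."""
    # Header names are not case sensitive; values are ignored on purpose,
    # so a header with an empty value still counts as present.
    present = {name.lower() for name in headers}
    for test in ET.fromstring(select_xml):
        name = test.get("header", "").lower()
        if test.tag == "exists" and name not in present:
            return False
        if test.tag == "not-exists" and name in present:
            return False
    return True


if __name__ == "__main__":
    print(presence_tests_pass(SELECT, {"X-Example-Tag": ""}))
    print(presence_tests_pass(SELECT, {"X-Example-Tag": "", "x-example-hold": "1"}))
```

The first call passes even though the tag header is empty, which corresponds to the statement that an empty header matches an exists query.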
Filter expressions are built using a small set of functions. The set of functions available to a filter are:
matches(regexstr) or contains(regexstr) - matches any part of the header value to a given regular expression