The SCSP Proxy accepts HTTP requests from a host network, forwards them to a DX Storage
cluster, handles redirects transparently, and then supplies the response back to the requestor. In
many deployments, a DX Storage cluster might be isolated on an internal network, protecting it from
undesired interaction with the host network, and also protecting the host network from services like
PXE boot and multicast traffic that can interfere with other network resources.
This reverse proxy can serve several other purposes as well, both in production deployments and in
test environments.
Specifically, the SCSP Proxy supports the following types of interactions:
• Basic SCSP Proxy for a local cluster
• Remote cluster coordination and communication
• Validation of incoming client requests for proper syntax and formatting
1.1. Basic SCSP Proxy
In a local cluster deployment, the SCSP Proxy handles all local SCSP traffic and manages the
associated communication with the local DX Storage cluster. The SCSP Proxy listens for any
inbound SCSP communications on the configured port and determines which DX Storage node to
send the initial request to.
For optimal performance, the SCSP Proxy caches open connections to DX Storage for reuse.
For more information, see the following section:
• Section 1.1.1, “Getting a List of DX Storage Cluster IPs”
1.1.1. Getting a List of DX Storage Cluster IPs
The SCSP Proxy intercepts GET / and HEAD / requests and responds with information it stores
internally about itself and about the cluster for which it serves as a forward proxy. Only GET / and
HEAD / requests result in this type of special handling by the SCSP Proxy.
Query arguments are not processed, even when the resource is empty. A request with a query
argument is forwarded as-is to the cluster and receives no special processing by the Proxy.
The Proxy responds to a GET / request from a client running in a private network with the Proxy's
list of DX Storage IP addresses and data and metadata describing the Proxy's DX Storage cluster.
You can prevent the Proxy from returning cluster IP addresses using the configuration parameter
reportHosts. For more information about configuration parameters, see Chapter 3, Configuration
Parameters.
1.1.1.1. Scsp-Proxy-Cluster Request Header
The optional Scsp-Proxy-Cluster: cluster-name request header, if included with a request,
causes the Proxy to first compare the case-sensitive cluster-name against the Proxy's own
configured cluster name. If the names do not match, no node IP addresses are returned in the
response.
If Scsp-Proxy-Cluster is not present, the request uses the configured cluster name and the list of
node IP addresses reflects the currently known node IPs for that cluster.
In either case, the response metadata includes a response header with the same name containing
the cluster name used by the Proxy.
1.1.1.2. Response Headers
The following table discusses response headers:
Response                          Description
Scsp-Proxy-Cluster                Cluster name (for example, if the cluster is configured on a CSN, it is
                                  the value of scsp.clusterName in
                                  /etc/caringo/scspproxy/scspproxy.cfg).
Scsp-Proxy-Nodes                  ASCII string count of the number of node IP addresses returned in the
                                  body of the response. If this count is zero, an additional reason is
                                  supplied to explain why there were no nodes returned. More information
                                  about the zero-node response is discussed following the table.
Scsp-Proxy-Agent                  Proxy and its software version.
Castor-System-TotalGBAvailable,   Obtained by polling any available node on the local cluster. These
Castor-System-TotalGBCapacity     headers are provided to maintain consistency with current Proxy and
                                  DX Storage responses.
For example, the following response indicates there are 10 nodes in the cluster:
Scsp-Proxy-Nodes: count=10
If the response indicates zero nodes in the cluster, it is accompanied by a reason, as follows:
Scsp-Proxy-Nodes: count=0, reason=no-nodes
The following table discusses reason codes.
Reason for empty node list   Meaning
no-nodes                     The Proxy reports no nodes when, for example, the cluster is off-line or
                             is being rebooted.
bad-subnet                   The request was made from a subnet from which the node IP addresses are
                             not routable.
                             The Proxy determines whether or not a request originated from the same
                             private subnet as its cluster and responds only if the request did
                             originate in that subnet. The Proxy does this to help prevent malicious
                             discovery about the cluster and because private IP addresses are not
                             routable anyway.
bad-cluster                  The cluster name supplied in the Scsp-Proxy-Cluster header did not match
                             the currently configured cluster.
disabled                     Indicates the Proxy's reportHosts configuration parameter is set to
                             False.
1.1.1.3. Response Body
The body of the response to a GET / is a list, with Content-Type text/plain, of the IP addresses of
the DX Storage nodes local to that proxy, one address per CRLF-terminated line.
The Content-Length header calculation is based on the number of bytes in this list of
CRLF-terminated IPs. If the list is empty, the Scsp-Proxy-Nodes header indicates count=0 and the
Content-Length header is zero.
The body of the response to a HEAD / is always empty, although the Content-Length header
indicates the total number of response bytes that would have been returned if the HEAD had been
a GET. Additionally, the Scsp-Proxy-Nodes header indicates the number of nodes that would have
been returned and, if this count is zero, provides the reason for the empty list exactly as it does for
GET /.
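The header and body formats described above can be consumed by a small client-side helper. The following is a minimal sketch in Python 3, not part of the SDK: the function names and the fallback value used when the Scsp-Proxy-Nodes header is absent are assumptions made for illustration.

```python
import http.client


def parse_nodes_header(value):
    """Parse a Scsp-Proxy-Nodes value such as 'count=0, reason=no-nodes'.

    Returns (count, reason); reason is None when no reason is supplied.
    """
    fields = {}
    for part in value.split(","):
        key, _, val = part.strip().partition("=")
        fields[key] = val
    return int(fields["count"]), fields.get("reason")


def parse_node_list(body):
    """Split a CRLF-terminated text/plain body into a list of IP strings."""
    return [line for line in body.decode("ascii").split("\r\n") if line]


def fetch_cluster_nodes(proxy_host, proxy_port=80, cluster_name=None):
    """Issue GET / against the Proxy and return (count, reason, ip_list).

    cluster_name, if given, is sent in the optional Scsp-Proxy-Cluster header.
    Treating a missing Scsp-Proxy-Nodes header as 'count=0' is an assumption.
    """
    headers = {}
    if cluster_name is not None:
        headers["Scsp-Proxy-Cluster"] = cluster_name
    conn = http.client.HTTPConnection(proxy_host, proxy_port, timeout=10)
    try:
        conn.request("GET", "/", headers=headers)
        resp = conn.getresponse()
        body = resp.read()
        count, reason = parse_nodes_header(
            resp.getheader("Scsp-Proxy-Nodes", "count=0"))
        return count, reason, parse_node_list(body)
    finally:
        conn.close()
```

A caller would typically check the returned reason (no-nodes, bad-subnet, bad-cluster, or disabled) before concluding that the cluster is empty.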
1.1.2. About Expect: 100-continue Behavior
If the initial request includes an Expect: 100-continue header, the SCSP Proxy waits to read
the input stream and won't rewind, seek, or reset it on a WRITE, APPEND, or UPDATE for small
streams until 100-continue is received. If the initial request does not include an Expect:
100-continue header, the SCSP Proxy buffers input from the client but attempts to stall the input
data object from the client if the buffer reaches a limit. If the request content length is larger
than 128K, the Proxy adds Expect: 100-continue to the request it sends to the local cluster and
handles DX Storage's 100 response.
1.1.3. About Location Headers
To properly route subsequent requests, the SCSP Proxy returns one Location response header
with its own external IP address and discards any other Location headers. For Replicate on Write
requests, this means all Location response headers are rewritten.
1.2. Remote Cluster Communication and Coordination
Developers who need to interact with multiple DX Storage clusters, either independently or in
multi-cluster requests, can use the SCSP Proxy for remote cluster communication and can even
coordinate requests between more than one cluster at a time.
To use the following syntax, the proxy must be configured with a list of all known remote clusters (for
more information, see Chapter 3, Configuration Parameters). The following syntax is valid only with
the Proxy; sending requests formatted as follows directly to a DX Storage cluster results in a 404
(Not Found) error because DX Storage attempts to resolve it as a path to a named object.
Syntax                              Description
/_proxy/cluster-name/uuid-or-name   Sends a request for an object, referenced by UUID or by name, to a
                                    specific cluster-name.
/_proxy/any/uuid-or-name            any is valid only for remote INFO requests and results in an error
                                    if used with any other method.
                                    any causes a request to be sent for an object, referenced by UUID
                                    or by name, to any available cluster (local or remote).
                                    If the object exists in the local cluster the information is
                                    returned. Otherwise, the request is sent to each remote cluster in
                                    random order. If no cluster is able to locate the data, the error
                                    response from the local server is returned.
/_proxy/remote/uuid-or-name         remote is valid only for remote INFO requests and results in an
                                    error if used with any other method.
                                    remote causes a request for an object to be sent, referenced by
                                    UUID or by name, to a remote cluster. The information is returned
                                    from the first cluster that has the object. If the object cannot
                                    be found in any of the remote clusters, the error response that
                                    was received from the first remote cluster tried is returned.
Note
Support for remote proxy requests without the _proxy/ prefix is deprecated in this
release and will be removed altogether in a future release.
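Client code can assemble these request paths with a small helper. The following is a minimal sketch in Python 3; the function name proxy_path and the choice to percent-encode the path components are assumptions made for illustration, not part of the Proxy or the SDK.

```python
from urllib.parse import quote, urlencode


def proxy_path(target, uuid_or_name=None, query=None):
    """Build a /_proxy request path.

    target       -- a configured cluster name, or the keywords 'any' / 'remote'
    uuid_or_name -- object UUID or name; None omits the component entirely,
                    '' keeps only the trailing slash
    query        -- optional dict of query arguments
    """
    path = "/_proxy/" + quote(target, safe="")
    if uuid_or_name is not None:
        # Keep '/' unescaped so that multi-part object names stay readable.
        path += "/" + quote(uuid_or_name, safe="/")
    if query:
        path += "?" + urlencode(query)
    return path


print(proxy_path("DRCluster", "64f9e40bfcb7046d79cc02c5876c1904"))
print(proxy_path("remote", None, {"domain": "example.com"}))
```

The helper only builds the path; restrictions such as any and remote being valid only for INFO requests still have to be respected by the caller.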
1.2.1. Examples of using /_proxy Syntax
As the following examples show, you must separate the URI from the domain specification using the
slash character (/). In all cases, except where noted, the domain name must be passed as the Host
in the request.
Examples follow:
• AggregateInfo of domain on either a local or remote cluster
Local cluster: GET /_proxy/uuid?indirect=HEAD
Remote cluster: GET /_proxy/cluster-name/uuid?indirect=HEAD
For more information about AggregateInfo, see Section 1.2.3, “Remote Aggregate Info”.
• HEAD of a domain on either a remote or any cluster
Remote cluster: HEAD /_proxy/remote?domain=domain-name
Any cluster: HEAD /_proxy/any?domain=domain-name
Note
The domain= query argument is required whenever you perform an operation (in this
case, HEAD) on a domain.
• POST of an unnamed object to the default cluster domain
Because you are working with the default cluster domain, the domain name does not
need to be sent as the Host in the request. For unnamed objects, POST authentication
is supported only in the default cluster domain.
• Remote synchronous write POST of an unnamed object to the default cluster domain, which has
POST authentication enabled
POST /_proxy/cluster-name/?replicate=immediate
For more information about remote synchronous write, see the next section.
1.2.2. Remote Synchronous Writes and Updates
Remote Synchronous Write enables you to write or update a copy of the same stream both locally
and remotely as part of the same request.
A Remote Synchronous Write first writes two copies of the object to the local cluster. If the local
write fails for any reason, the error response is returned to the requestor and the operation is
abandoned.
If the local write succeeds, the Proxy writes the updated object to the specified remote cluster.
This request is authenticated using the DX Storage administrator credentials specified in the
configuration of the remote cluster.
All query arguments except alias=yes as well as the Expect: Content-MD5 header are stripped
from the remote write to simplify the response coordination between the two clusters.
If the remote write is also successful, a 201 (Created) response is returned to the requestor. If the
remote write fails for any reason, a 202 (Accepted) response is returned, indicating that only the local
write was successful and that remote replication can occur at a later time using DX Content Router
(if enabled).
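A client can use the status code to decide whether the remote copy still needs to be confirmed later. The following is a minimal sketch in Python 3; the function name and the returned labels are hypothetical and chosen only for illustration.

```python
def describe_sync_write(status):
    """Map the status of a Remote Synchronous Write to a short outcome label.

    201 -- local and remote writes both succeeded
    202 -- only the local write succeeded; remote replication may occur later
    any other status is treated here as a failed request
    """
    if status == 201:
        return "local-and-remote"
    if status == 202:
        return "local-only"
    return "failed"


for code in (201, 202, 500):
    print(code, describe_sync_write(code))
```

An application that requires the remote copy could, for example, queue objects that come back as local-only for a later check.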
1.2.3. Remote Aggregate Info
The Proxy has an AggregateInfo method to validate that a set of content exists in a cluster.
Aggregate Info can be issued against a local cluster but it is usually used to validate remote
replication.
You determine the desired data set by first creating a "consistency checkpoint" using the following
format, terminated with CRLF:
{uuid [mutable | immutable] | url-encoded-name}
For unnamed anchor streams, you must use the mutable parameter. (The default, with no parameter
specified, is immutable.)
All named objects are assumed to be mutable so no argument is required.
You must supply a list of either URL-encoded names or UUIDs. (Use percent encoding for object
names, if needed.) The name, UUID, or the consistency checkpoint, is stored as a DX Storage
object.
Info requests in the AggregateInfo method are issued for each name or UUID in the consistency
checkpoint and either object metadata or an error response is returned in the concatenated
response body. Similar to the individual Info method, the response for a successful AggregateInfo
method execution is a 200 code.
The following is a sample checkpoint manifest stream. All object names must be URL-encoded so
you must use percent encoding to escape special characters in named objects, including space. All
line terminators must be CRLF:
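A manifest in this format can also be assembled programmatically. The following is a minimal sketch in Python 3 under stated assumptions: the helper names are hypothetical, a single space is assumed between a UUID and the mutable keyword, and the slash character in object names is assumed to stay unescaped.

```python
import re
from urllib.parse import quote

# A UUID is taken to be a 32-character hexadecimal string.
UUID_RE = re.compile(r"^[0-9a-fA-F]{32}$")


def manifest_line(entry, mutable=False):
    """Return one manifest line (without its terminator) for a UUID or a name."""
    if UUID_RE.match(entry):
        # immutable is the default, so only the mutable keyword is written.
        return entry + (" mutable" if mutable else "")
    # Named objects are assumed mutable and need no keyword; percent-encode
    # special characters, including spaces.
    return quote(entry, safe="/")


def build_manifest(entries):
    """Build a checkpoint manifest body from (uuid_or_name, mutable) pairs."""
    return "".join(manifest_line(e, m) + "\r\n" for e, m in entries).encode("ascii")


body = build_manifest([
    ("64f9e40bfcb7046d79cc02c5876c1904", True),   # unnamed anchor stream
    ("f6a0c60d53dd1047ce670c747b1aba91", False),  # unnamed immutable stream
    ("reports/q3 summary.txt", False),            # named object
])
print(body.decode("ascii"))
```

The resulting bytes would then be written to the local cluster as an ordinary object.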
The checkpoint manifest should be stored in the local cluster and its name or UUID used to issue
the AggregateInfo request. The AggregateInfo method supports named and unnamed objects
for both the manifests and streams stored in the manifest. It supports authentication only for the
manifest stream itself, and checkpoint streams in the manifests that are protected for HEAD are
returned as 401s in the AggregateInfo response body.
Any additional query arguments and headers included with the AggregateInfo method apply to the
GET request issued for the checkpoint manifest only and are not included in the individual Info
requests for each name or UUID in the manifest. If the checkpoint manifest is stored in an unnamed
anchor stream, the AggregateInfo method must be called with alias=yes in its queryArg
dictionary.
AggregateInfo uses Aggregate-Stream-Count as its trailer header.
Expect: Content-MD5 and Range headers, as well as any integrity seal query arguments, are
not supported for the AggregateInfo method.
1.2.3.1. Aggregate Info Response
For each line of the checkpoint manifest, the SCSP Proxy uses chunked encoding to send in the
AggregateInfo response body either a parse error for the line or the line's name or UUID followed by
DX Storage's verbatim response to the Info query for it. A CRLF line-end sequence separates
the uuid line from the Info response. For example, for a manifest that included two UUIDs, both of
which the SDK was able to Info successfully, the following representative response body would be
returned:
Response for stream uuid = 64f9e40bfcb7046d79cc02c5876c1904
HTTP/1.1 200 OK
Castor-System-Alias: 64f9e40bfcb7046d79cc02c5876c1904
Castor-System-Cluster: DR Cluster
Castor-System-Created: Tue, 03 Aug 2010 18:49:27 GMT
Castor-System-Version: 1280861367.827
Content-Length: 11
Content-Type: application/x-www-form-urlencoded
Last-Modified: Tue, 03 Aug 2010 18:49:27 GMT
Etag: "aa9ba7fb4648f05c21411cce06c7c3d5"
Date: Tue, 03 Aug 2010 18:49:28 GMT
Server: CAStor Cluster/4.0.2
Response for stream uuid = f6a0c60d53dd1047ce670c747b1aba91
HTTP/1.1 200 OK
Castor-System-Cluster: DR Cluster
Castor-System-Created: Tue, 03 Aug 2010 18:49:27 GMT
Content-Length: 11
Content-Type: application/x-www-form-urlencoded
Last-Modified: Tue, 03 Aug 2010 18:49:27 GMT
Etag: "f6a0c60d53dd1047ce670c747b1aba91"
Date: Tue, 03 Aug 2010 18:49:28 GMT
Server: CAStor Cluster/4.0.2
If the first line does not have a valid UUID, the proxy responds with a single error message in the
response body stating that the checkpoint stream is incorrectly formatted and stops processing.
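A response body shaped like the sample above can be split back into per-object results. The following is a minimal sketch in Python 3; it assumes every entry begins with a line starting with "Response for stream", as in the sample, and simply skips any line it does not recognize (such as a parse error message). The function name is hypothetical.

```python
def parse_aggregate_info(body_text):
    """Split an AggregateInfo response body into a list of per-object dicts.

    Each dict has 'id' (the UUID or name echoed by the Proxy), 'status'
    (the integer HTTP status of the Info response, or None) and 'headers'.
    """
    entries = []
    current = None
    for line in body_text.splitlines():
        if line.startswith("Response for stream"):
            ident = line.partition("=")[2].strip()
            current = {"id": ident, "status": None, "headers": {}}
            entries.append(current)
        elif current is None or not line.strip():
            continue  # blank line, or text before the first entry
        elif line.startswith("HTTP/"):
            current["status"] = int(line.split()[1])
        elif ":" in line:
            name, _, value = line.partition(":")
            current["headers"][name.strip()] = value.strip()
    return entries


sample = (
    "Response for stream uuid = 64f9e40bfcb7046d79cc02c5876c1904\r\n"
    "HTTP/1.1 200 OK\r\n"
    "Castor-System-Cluster: DR Cluster\r\n"
    "Content-Length: 11\r\n"
)
for entry in parse_aggregate_info(sample):
    print(entry["id"], entry["status"], entry["headers"]["Content-Length"])
```

Checking that every entry has status 200 is one way to confirm that all objects in a checkpoint exist in the target cluster.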
Chapter 2. Installing, Configuring and Running SCSP Proxy
The SCSP Proxy installs as an RPM and is managed very similarly to other RPM packages. If
the SCSP Proxy is being utilized on a Cluster Services Node (CSN), the installation and base
configuration are performed automatically as part of CSN setup. See the CSN Installation and
Configuration Guide for complete details of customizing the SCSP Proxy on the CSN.
2.1. Operating System Requirement
SCSP Proxy has been developed and tested with 64-bit Red Hat Enterprise Linux (RHEL) 5.5.
Other RHEL versions or Linux distributions are not currently supported. The information discussed
in this chapter assumes you use a pre-installed RHEL Linux environment.
2.2. Installing and Configuring the SCSP Proxy
To install the SCSP Proxy, you must first install prerequisite packages (if required). The following
topics discuss these tasks in more detail:
• Section 2.2.3, “Installing or Upgrading the SCSP Proxy Software”
• Section 2.2.4, “Configuring the SCSP Proxy”
2.2.1. Installing SCSP Proxy Prerequisites
Before you can install the SCSP Proxy, you must verify the following prerequisites are installed:
• Python 2.5
• Python Setuptools 0.6c9
• Twisted 8.2.0
None of the preceding are available as standard RPM packages with RHEL 5.5; however, a Python
2.5-compatible Twisted RPM is located in the /scspproxy directory of this SDK.
If you are installing the SCSP Proxy on a Cluster Services Node (CSN), all of the required RPMs are
installed for you.
If you are installing the SCSP Proxy on a non-CSN system, you can manually install the packages
using commands like the following (in the order shown):
Before starting your Proxy installation, upgrade to RHEL 5.5 using upgrade utilities provided with the
operating system.
2.2.3. Installing or Upgrading the SCSP Proxy Software
As with the prerequisites, the procedure for installing the SCSP Proxy from the SDK depends on
whether or not you use a CSN. To install the SCSP Proxy from
the SDK, do either of the following:
• If you use a CSN, to upgrade the SCSP Proxy, copy and unzip the release distribution, and run
the following commands in the order shown:
$ sudo su
# /etc/init.d/scspproxy stop
# cd scspproxy-copy-dir
# ./caringo-scspproxy-install.sh
• If you do not use a CSN and instead run the proxy from a RHEL 5.5 server, extract the SDK on
the server and install the enclosed RPM.
Stop the proxy process using the following command:
$ sudo su
# /etc/init.d/scspproxy stop
The following commands then install or upgrade the software when run by a user with sudo
privileges:
$ sudo su
# unzip "DX StorageSDK-1.2.zip"
# cd "DX StorageSDK-1.2/scspproxy"
# ./caringo-scspproxy-install.sh
Warning
Upgrading the SCSP Proxy using standard Red Hat packaging tools like yum is not
supported because upgrading with these tools can result in losing configuration data.
2.2.4. Configuring the SCSP Proxy
After the proxy has been installed, you must copy the sample scspproxy.cfg and hosts.cfg
files and then modify them using a text editor. For instance, to create and edit the files in their
default location:
You must configure the interface and the port that the SCSP Proxy will listen on, logging, connection
pool settings, the port and host list as well as the name for the local cluster, and any remote
clusters. For full details of all configuration parameters, see Chapter 3, Configuration Parameters.
2.3. Running the SCSP Proxy
After the configuration file has been updated, start and stop the Proxy in either of the following ways:
• Use the startup/shutdown scripts if the Proxy is installed on a CSN, or if it is installed on RHEL
and you do not want to use optional startup options.
• You can start the Proxy using proxyservice.py under any of the following circumstances:
  • The Proxy is installed on an operating system other than RHEL, such as Ubuntu.
Run the following command from the directory where the install file was unzipped (or from any
location if you have configured your Python path to point to the installed location):
python2.5 proxyservice.py options
where options are discussed in the following table.
proxyservice.py startup option    Meaning
--staticlocator                   Start the Proxy and use the static locator.
--cfgfile path-to-scspproxy.cfg   Access the Proxy configuration file from the specified location.
--pidfile path-to-scspproxy.pid   Specify the location of the Proxy service Process Identifier (PID)
                                  file.
                                  The default location is /var/run/scspproxy.pid.
To validate if the proxy is running in front of a DX Storage cluster, enter the Proxy's IP address and
port in a web browser's location or address field. If you see a DX Storage status page the proxy is
working correctly.
2.3.1. Validation Mode
In addition to the basic validation included in the DX Storage SDK, the SCSP Proxy includes a
validation mode that provides validation for incoming SCSP requests. This validation checks the
syntax for common operations like adding query arguments and creating lifepoint headers. The
SCSP Proxy does not discern between clients created using the DX Storage SDK and those that
were created without it.
Validation mode and execution mode are mutually exclusive. Requests are not sent to DX Storage
while in validation mode but are instead analyzed by the Proxy for a response. If a request fails
validation, the Proxy returns an error describing the reason for the failed validation.
Successful status codes in validation mode follow:
• 200 for GET, HEAD, and DELETE
• 201 for POST, PUT, COPY, and APPEND
Errors are returned as one of the following codes:
• The SDK returns a 50x for general request errors such as bad host name, unknown method,
HTTP syntax errors, and so on.
Your client might return other response codes for these types of errors.
Outside of validation mode, headers or query arguments that are invalid are either silently ignored or
trigger an error at the DX Storage server layer and return as SCSP errors to the client.
To enable validation mode in the SCSP Proxy, change the value of the validationMode parameter
in the scspproxy.cfg file to True.
The following items are currently checked by the proxy while in validation mode. Validation mode
supports validation of lifepoint time inputs in RFC 1123 format.
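Clients can produce and pre-check RFC 1123 timestamps before sending them. The following is a minimal sketch in Python 3 that uses only the standard library; the function names are hypothetical, and the sketch covers the timestamp format only, not the syntax of the lifepoint header itself.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def rfc1123(dt):
    """Format an aware datetime as an RFC 1123 date in GMT."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def is_rfc1123(text):
    """Return True if text round-trips unchanged through an RFC 1123 parse."""
    try:
        return rfc1123(parsedate_to_datetime(text)) == text
    except (TypeError, ValueError):
        # Older Python versions raise TypeError for unparseable input,
        # newer ones raise ValueError.
        return False


stamp = rfc1123(datetime(2010, 8, 3, 18, 49, 27, tzinfo=timezone.utc))
print(stamp)
print(is_rfc1123(stamp), is_rfc1123("2010-08-03"))
```

The round-trip check is deliberately strict, so a timestamp with a numeric offset instead of GMT is reported as invalid by this sketch even if the Proxy might accept it.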
Chapter 3. Configuration Parameters
You can configure the SCSP Proxy using either the Cluster Services Node (CSN) console or by
editing the scspproxy.cfg file. If you use CSN, see the CSN Installation and Configuration Guide.
3.1. Configuring the Proxy by Editing scspproxy.cfg
To configure the SCSP proxy when you are not using CSN, edit /etc/caringo/scspproxy/
scspproxy.cfg. The configuration file is divided into five sections: proxy, log, connectionPool, scsp,
and remote. The following table discusses all configuration parameters:
Option Name              Value             Description
[proxy] interface        Ex: 192.168.1.1   The IP address of the external interface for the server
                                           where the proxy is installed.
[proxy] port             Default: 80       The port of the external interface the proxy will listen
                                           on for the server where the proxy is installed.
[proxy] reportHosts      Default: True     Enables or disables the ability to detect the addresses
                                           of cluster nodes. Valid case-sensitive values are True
                                           and False.
[proxy] validationMode   False             Whether or not the proxy should be running in validation
                                           mode or sending received requests to the DX Storage
                                           cluster.
[log] host               Ex: localhost     The address of the syslog server to send log messages to
[log] port               514               The port number of the syslog server to send log messages
                                           to
[log] file               stdout            The name of the file to log messages to if host is blank
[log] fileSize           0                 Limit of the size of the log file.
[log] level              40                Logging level, one of: 10 =
#
# PARAMETERS
#
[proxy]
port = 80
interface = 192.168.1.1
validationMode = False
[log]
#<10 = CHATTER, which includes twisted.internet logs
#10 = DEBUG - Turns on detail message handling logs
#20 = INFO - Turns on request and response echo and connection pool logging
#40 = ERROR - Turns on error logging
level = 40
#default = localhost
#For stdout, leave blank.
host = localhost
#default = 514. Not used for empty host.
port = 514
#default = stdout. Not used if host is not empty.
file = stdout
#If loghost is empty and file is not stdout, this is the log file size
#limit
fileSize = 0
#If host is empty, this is not used. If host is not empty
#then hostBindInterface should be the host that will
#be written with log messages.
hostBindInterface = 'localhost'
facility = 0
[connectionPool]
maxStoredConnections = 200
connectTimeOut = 60
poolTimeOut = 300
[scsp]
hosts = 192.168.1.121 192.168.1.122 192.168.1.123
#name of the DX Storage cluster. This is the cluster we will dns browse for nodes
clusterName =
port = 80
[remote]
hostsConfigFile = /etc/caringo/scspproxy/hosts.cfg
#
# END
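Because the file uses an INI-style layout, it can be inspected with standard tooling before the Proxy is started. The following is a minimal sketch in Python 3 using the standard configparser module; it is not how the Proxy itself reads the file, and the function name, the returned keys, and the fallback values are assumptions made for illustration.

```python
import configparser

SAMPLE = """\
[proxy]
port = 80
interface = 192.168.1.1
validationMode = False
[connectionPool]
maxStoredConnections = 200
[scsp]
hosts = 192.168.1.121 192.168.1.122 192.168.1.123
clusterName =
port = 80
"""


def load_proxy_settings(text):
    """Read a few settings from scspproxy.cfg-style text into a dict."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        "listen": (parser.get("proxy", "interface"),
                   parser.getint("proxy", "port")),
        "validation_mode": parser.getboolean("proxy", "validationMode",
                                             fallback=False),
        "report_hosts": parser.getboolean("proxy", "reportHosts",
                                          fallback=True),
        "scsp_hosts": parser.get("scsp", "hosts", fallback="").split(),
        "cluster_name": parser.get("scsp", "clusterName", fallback=""),
    }


print(load_proxy_settings(SAMPLE))
```

Note that configparser accepts booleans case-insensitively, so this sketch is more lenient than the case-sensitive True and False values the Proxy documents.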
3.2. hosts.cfg
Communication with remote proxies or clusters is configured in a configuration file called hosts.cfg
that is installed by default in the /etc/caringo/scspproxy directory. The hosts.cfg file contains a list of
all known remote clusters, each configured with the following 5 values on a single line per cluster:
Option Name           Description
ClusterName           The common name for a remote cluster that will be used in requests sent to the
                      proxy. May not contain whitespace and cannot be a 32-character hex string. May
                      be the DNS name for the cluster but that is not required.
RemoteAddress         The IP address for the remote proxy the local proxy will communicate with. May
                      not contain whitespace.
Port                  The port on which the remote proxy listens for incoming requests. This is
                      usually port 80. May not contain preceding or trailing whitespace.
RemoteAdminName       The name of an administrator that belongs to the DX Storage Administrators
                      group for the remote DX Storage cluster. This value cannot contain whitespace.
RemoteAdminPassword   The administrator's password. This option can contain blanks as long as they
                      are not leading or trailing.
Version 1.2
December 2010
The following is an example of the hosts.cfg file:
# Sample scspproxy Remote Host Configuration file
#
# ClusterName RemoteAddress Port RemoteAdminName RemoteAdminPassword
# One entry per line
# Fields are whitespace (space and tab) separated, although the password field
# may contain whitespace.
# Note that this implies that none of the other fields may contain whitespace.
#
#DRCluster 192.168.1.123 80 admin yourpwdofchoicehere
#dr 192.168.1.123 80 admin yourpwdofchoicehere
#dr2 192.168.1.123 80 admin yourpwdofchoicehere
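The field rules above (five whitespace-separated fields, with only the password allowed to contain blanks) can be checked before deploying a hosts.cfg file. The following is a minimal sketch in Python 3; the function and type names are hypothetical and this is not the Proxy's own parser.

```python
import re
from collections import namedtuple

RemoteCluster = namedtuple(
    "RemoteCluster", "name address port admin_name admin_password")

HEX32_RE = re.compile(r"^[0-9a-fA-F]{32}$")


def parse_hosts_cfg(text):
    """Parse hosts.cfg-style text into a list of RemoteCluster entries."""
    clusters = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first four runs of whitespace only, so that the
        # password keeps any embedded blanks.
        fields = line.split(None, 4)
        if len(fields) != 5:
            raise ValueError("expected 5 fields: %r" % raw)
        name, address, port, admin, password = fields
        if HEX32_RE.match(name):
            raise ValueError("ClusterName cannot be a 32-character hex string")
        clusters.append(RemoteCluster(name, address, int(port), admin, password))
    return clusters


sample = "# comment\ndr 192.168.1.123 80 admin your pwd here\n"
print(parse_hosts_cfg(sample))
```

Limiting the split to four separators is what lets the final field carry internal whitespace while every other field stays whitespace-free.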