Sun Microsystems 5800 User Manual

Sun StorageTek 5800 System Client API Reference Manual

Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A.
Part No: 820–4796 June 2008
Sun Microsystems, Inc. has intellectual property rights relating to technology embodied in the product that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more U.S. patents or pending patent applications in the U.S. and in other countries.
U.S. Government Rights – Commercial software. Government users are subject to the Sun Microsystems, Inc. standard license agreement and applicable provisions of the FAR and its supplements.
This distribution may include materials developed by third parties.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, the Solaris logo, the Java Coffee Cup logo, docs.sun.com, Java, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise comply with Sun's written license agreements.
Products covered by and information contained in this publication are controlled by U.S. Export Control laws and may be subject to the export or import laws in other countries. Nuclear, missile, chemical or biological weapons or nuclear maritime end uses or end users, whether direct or indirect, are strictly prohibited. Export or reexport to countries subject to U.S. embargo or to entities identified on U.S. export exclusion lists, including, but not limited to, the denied persons and specially designated nationals lists is strictly prohibited.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Copyright 2008 Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.
Contents
Preface
1 Sun StorageTek 5800 System Client API
Changes in Version 1.1
5800 System Overview
5800 System Summary
The 5800 System and Honeycomb
The 5800 System Data Model
The 5800 System Metadata Model
The 5800 System Query Model
The 5800 System Query Integrity Model
Deleting Objects from the 5800 System
2 Sun StorageTek 5800 System Java Client API
Overview of the 5800 System Java Client API
Client Library
Interfaces
Retrying Operations
Performance and Scalability
Updating Client View of the Schema
Java Client Application Deployment
Java API
Java API Packages
Java API Documentation
Basic Concepts
Key Classes
NameValueObjectArchive Application Access
3 Sun StorageTek 5800 System C Client API
Overview of the 5800 System C Client API
Architecture
Interfaces
Retrying Operations
Multithreaded Access
Performance and Scalability
Memory Usage
Updating Schema Definitions
Session Management
C Client Application Deployment
Nonblocking C API
Synchronous C API
Changes for the 1.1 Release
Limitations
Synchronous C Data Types
hc_string_t
hc_long_t
hc_double_t
hc_type_t
hc_value_t
hc_schema_t
hc_nvr_t
hc_session_t
hc_pstmt_t
hc_query_result_set_t
read_from_data_source
write_to_data_destination
hcerr_t
Synchronous C API Functions
Managing 5800 System Sessions
hc_session_create_ez
hc_session_free
hc_session_get_status
hc_session_get_schema
hc_session_get_host
hc_session_get_platform_result
hc_session_get_archive
Managing a Schema
hc_schema_get_type
hc_schema_get_length
hc_schema_get_count
hc_schema_get_type_at_index
Manipulating Name-Value Records
Using the API for Storing Name-Value Records
Using Returned Name-Value Records
Creating and Freeing Name-Value Records
hc_nvr_create
hc_nvr_free
Building Name-Value Records
hc_nvr_add_value
hc_nvr_add_long
hc_nvr_add_double
hc_nvr_add_string
hc_nvr_add_binary
hc_nvr_add_date
hc_nvr_add_time
hc_nvr_add_timestamp
hc_nvr_add_from_string
Retrieving Name-Value Records
hc_nvr_get_count
hc_nvr_get_value_at_index
hc_nvr_get_long
hc_nvr_get_double
hc_nvr_get_string
hc_nvr_get_binary
hc_nvr_get_date
hc_nvr_get_time
hc_nvr_get_timestamp
Creating and Converting Name-Value Records From and To String Arrays
hc_nvr_create_from_string_arrays
hc_nvr_convert_to_string_arrays
Storing Data and Metadata
hc_store_both_ez
hc_store_metadata_ez
hc_check_indexed_ez
Retrieving Data and Metadata
hc_retrieve_ez
hc_retrieve_metadata_ez
hc_range_retrieve_ez
Querying Metadata
hc_query_ez
hc_qrs_next_ez
hc_qrs_is_query_complete
hc_qrs_get_query_integrity_time
hc_qrs_free
hc_pstmt_create
hc_pstmt_free
hc_pstmt_set_string
hc_pstmt_set_char
hc_pstmt_set_double
hc_pstmt_set_long
hc_pstmt_set_date
hc_pstmt_set_time
hc_pstmt_set_timestamp
hc_pstmt_set_binary
hc_pstmt_query_ez
Querying With a Prepared Statement
Deleting Records
hc_delete_ez
Translating Error and Type Codes
hc_decode_hcerr
hc_decode_hc_type
4 Sun StorageTek 5800 System Query Language
Interfaces
Operation
Supported Data Types
Queries
Translating a Query to the Underlying Database
Attribute Format in Queries
SQL Syntax in 5800 System Queries
Literals In Queries
Dynamic Parameters
String Literals
Numeric Literals
Literals for 5800 System Data Types
Canonical String Format
The Canonical String Decode Operation
JDBC and HADB Date and Time Operations
Reserved Words
Supported Expression Types
Examples of Supported Query Expressions
Queries Not Supported in Version 1.1
SQL Words That Are Allowed in Queries
SQL Words That Are Not Allowed in Queries
5 Programming Considerations and Best Practices
Retries and Timeouts
Query Size Limit
Limit the Size of Schema Query Parameters and Literals
Limit Results Per Fetch
Index
Tables
TABLE 4–1 Canonical String Representation of Data Types

Preface

The Sun StorageTek 5800 System Client API Reference Manual is written for programmers and application developers who develop custom applications for the Sun StorageTek™ 5800 System. This document, along with the Sun StorageTek 5800 System SDK Reference Manual, provides the information that you need to develop custom applications for the 5800 system.

How This Book Is Organized

Chapter 1, “Sun StorageTek 5800 System Client API,” provides a summary of the changes for the Sun StorageTek 5800 System 1.1 release, and overviews of the client APIs and query language.
Chapter 2, “Sun StorageTek 5800 System Java Client API,” provides detailed information on the Sun StorageTek 5800 System Java client API.
Chapter 3, “Sun StorageTek 5800 System C Client API,” provides detailed information on the Sun StorageTek 5800 System C client API.
Chapter 4, “Sun StorageTek 5800 System Query Language,” provides detailed information on the Sun StorageTek 5800 System query language.
Chapter 5, “Programming Considerations and Best Practices,” provides programming considerations and best practices that can help you create efficient 5800 system applications.

Related Books

Sun StorageTek 5800 System Regulatory and Safety Compliance Manual, part number 819–3809
Sun StorageTek 5800 System Site Preparation Guide, part number 820–1635
Sun StorageTek 5800 System Administration Guide, part number 820–4118
Sun StorageTek 5800 System SDK Reference Manual, part number 820–4797
Sun StorageTek 5800 System 1.1.1 Release Notes, part number 820–4120

Related Third-Party Web Site References

Third-party URLs are referenced in this document and provide additional, related information.
Note – Sun is not responsible for the availability of third-party web sites mentioned in this document. Sun does not endorse and is not responsible or liable for any content, advertising, products, or other materials that are available on or through such sites or resources. Sun will not be responsible or liable for any actual or alleged damage or loss caused or alleged to be caused by or in connection with use of or reliance on any such content, goods, or services that are available on or through such sites or resources.

Documentation, Support, and Training

The Sun web site provides information about the following additional resources:
Documentation (http://www.sun.com/documentation/)
Support (http://www.sun.com/support/)
Training (http://www.sun.com/training/)

Typographic Conventions

The following table describes the typographic conventions that are used in this book.
TABLE P–1 Typographic Conventions
Typeface | Meaning | Example
AaBbCc123 | The names of commands, files, and directories, and onscreen computer output | Edit your .login file. Use ls -a to list all files. machine_name% you have mail.
AaBbCc123 | What you type, contrasted with onscreen computer output | machine_name% su Password:
aabbcc123 | Placeholder: replace with a real name or value | The command to remove a file is rm filename.
AaBbCc123 | Book titles, new terms, and terms to be emphasized | Read Chapter 6 in the User's Guide. A cache is a copy that is stored locally. Do not save the file. Note: Some emphasized items appear bold online.

Shell Prompts in Command Examples

The following table shows the default UNIX® system prompt and superuser prompt for the C shell, Bourne shell, and Korn shell.
TABLE P–2 Shell Prompts
Shell | Prompt
C shell | machine_name%
C shell for superuser | machine_name#
Bourne shell and Korn shell | $
Bourne shell and Korn shell for superuser | #

Sun Welcomes Your Comments

Sun is interested in improving its documentation and welcomes your comments and suggestions. You can submit your comments by clicking the Feedback link on the http://docs.sun.com web site.
Please include the title and part number of your document with your feedback:
Sun StorageTek 5800 System Client API Reference Manual, part number 820-4796
CHAPTER 1

Sun StorageTek 5800 System Client API

The Sun™ StorageTek™ 5800 system client API provides programmatic access to a 5800 system server to store, retrieve, query, and delete object data and metadata. Synchronous versions are provided in the C and Java™ languages. A future release will implement a non-blocking C API for use with POSIX operations.
This chapter provides a summary of the changes for the Sun StorageTek 5800 System 1.1 release, and overviews of the client APIs and query language.
The following topics are discussed:
“Changes in Version 1.1”
“5800 System Overview”

Changes in Version 1.1

The following general changes have been made in Version 1.1.
Handling is added for storing, retrieving, and querying the following metadata types:
– char (for the Latin 1 character set)
– string (for the Unicode character set)
– binary
– date
– time
– timestamp
Query and queryplus are merged.
Prepared statements (pstmts) are introduced to handle the values of queries that cannot be placed inline, and a new query is introduced to handle them.
The handling of strings that are longer than the string length of the associated field has changed. In 5800 system version 1.1, an attempt to store a value that is longer than the associated field generates an immediate error.

5800 System Overview

This section provides an overview of the 5800 system, the 5800 system history, and a summary of the key points of the 5800 system usage model.
The following topics are discussed:
“5800 System Summary”
“The 5800 System and Honeycomb”
“The 5800 System Data Model”
“The 5800 System Metadata Model”
“The 5800 System Query Model”
“The 5800 System Query Integrity Model”
“Deleting Objects from the 5800 System”

5800 System Summary

The 5800 system is an object-based storage archive appliance for fixed-content data and metadata. The 5800 system is designed from the ground up to be reliable, affordable, and scalable, and to integrate data storage with intelligent data retrieval. It is designed to store huge amounts of data for decades at a time. At that scale, issues of how and where the data is stored, and how that changes over time, can be quite cumbersome. The 5800 system usage model is designed to manage those issues for you, so that your application can deal with just the data.
A custom Application Programming Interface (the 5800 Client API) is provided so that your applications can take advantage of all the features in the 5800 system usage model. The API provides the following capabilities:
Store a new object into the archive (storeObject)
Associate a new metadata record with stored object data (storeMetadata)
Retrieve the data from an object that was previously stored (retrieveData)
Retrieve the metadata from an object that was previously stored (retrieveMetadata)
Delete an object (delete)
Query for matching objects given a query expression of specific object characteristics (query)
The 5800 system API Release 1.1 provides two APIs:
The Java API is described in Chapter 2, “Sun StorageTek 5800 System Java Client API”
The C API is described in Chapter 3, “Sun StorageTek 5800 System C Client API”
This chapter provides a summary of key points of the 5800 system usage model that are useful for understanding either API.
In the following sections, the terms from the Java API are used as an aid to exposition. In all cases, a simple equivalent using the C API is available.
Chapter 4, “Sun StorageTek 5800 System Query Language,” provides a detailed description of query capabilities and query syntax.
Chapter 5, “Programming Considerations and Best Practices,” provides programming considerations and best practices that can help you create efficient 5800 system applications.

The 5800 System and Honeycomb

The original code name for the project that grew into the 5800 system was Project Honeycomb. The Honeycomb name lives on as the name of an OpenSolaris community that is bringing the Honeycomb software stack into the world of Open Source. The first realization of the Honeycomb storage model as a real product is the 5800 system as described in this guide and related guides.
As a model for programmable storage systems, however, the Honeycomb API has a much broader reach than just the 5800 system. The programming model is designed to scale both up and down to any storage archive system that needs to abstract and separate the issues of how data is stored from how it is used. In recognition of both the past and the future, the string “honeycomb” and the initials “hc” still live on in certain aspects of the API described in this guide. When the 5800 system API is used in contexts outside of the 5800 system, the API is referred to as the Project Honeycomb API.

The 5800 System Data Model

The 5800 system stores two types of data: arbitrary object data and structured metadata records. Every metadata record is associated with exactly one data object. Every data object has at least one metadata record. A unique object identifier (OID) is returned when a metadata record is stored. This OID can later be used to retrieve the metadata record or its data object. In addition, metadata records can be retrieved by a query:
OID ↔ Metadata Record → Underlying Object Data
There are two types of metadata, system metadata and user metadata. You cannot override the names and types of system metadata.
Each object in the 5800 system archive consists of some arbitrary bytes of data together with associated metadata that describes the data. Once an object is stored, it is immutable. The 5800 system programming model does not allow the data or the metadata associated with an object to be changed once the object has been stored; in other words, the system is a Write-Once Read-Multiple (WORM) archive.
Each object corresponds to a single stream of data and a single set of metadata. There are no “grouped objects” or “compound objects” other than by application convention. Similarly, there are no “links” or “associations” from one object to another. The customer application is shielded from all details of how or where the object is stored. Internally, the actual location of an object might change over time, or several objects might even share the same underlying storage. The customer application can retrieve the object without knowing these details.
A stream of data is stored in the object archive using storeObject. Once stored, each such object is associated with an object identifier or objectid (OID). The storeObject operation takes both a stream of data and an optional set of user metadata information and returns an OID. The OID can be remembered outside of the 5800 system and may later be used to retrieve the data associated with that object using the retrieveObject operation.
Every object has metadata whether or not user metadata was supplied at the time of the store. For example, each object has system metadata that is system assigned and can never be modified by the user. The OID is associated with the metadata record that represents this object as a whole; the metadata record is then associated with the underlying data:
OID ↔ Metadata Record → Underlying Object Data
The retrieveObject operation takes an OID as input and returns a stream of bytes as output that are identical to the bytes stored during the storeObject operation. Both the storeObject and retrieveObject operations handle the data in a streaming manner. Not all of the data need be present in client memory or in server memory at the same time, which is a crucial point for working with large objects.
For the 5800 system Release 1.1, data sizes up to 400 GBytes are tested and supported. Using sizes even smaller than this may be appropriate as a best practice. For more information, see Chapter 5, “Programming Considerations and Best Practices.”
From within a customer application, the storing of an object into the archive is an all-or-nothing event. Either the object is stored or it is not; there are no partial stores. If a store operation is interrupted, the entire storeObject call fails. Once an OID is returned to the customer application, the object is known to be durable. In the event of an outage that causes some data loss, the system should be no more likely to lose a newly stored object than any other object. There is no way to tie together two different store operations so that both either succeed together or fail together.
Note – A stored object may or may not immediately be queryable. For more information, see “The 5800 System Query Integrity Model.”

The 5800 System Metadata Model

Metadata means “data about the data”; it describes the data and helps to determine how the data should be interpreted. In addition, metadata can be used to facilitate querying the 5800 system for objects that match a particular set of search criteria.
For the 5800 system, the supported metadata option is in the form of name-value fields stored with each object. The set of possible fields is defined in the metadata schema. Setting up a metadata schema is an important system administration task that is described in the 5800 System Administration Guide, and is analogous to the process of database design that goes into creating a data management application. The metadata schema determines what field names, types, and lengths may be used with the metadata stored with each object. In addition, the layout of fields into tables within the schema, together with the definition of views that speed certain searches, determines which kinds of queries about that metadata will be both possible and effective. As such, the metadata schema should match the characteristics of the expected range of applications that will deal with the stored data. The underlying software is designed to support multiple different kinds of metadata to aid in searching. For example, eventually there might be a specialized index to facilitate full-text search within the data objects. This document describes only the API for dealing with the name-value metadata type.
Fields in the schema can be either queryable or non-queryable. The values for non-queryable fields may be retrieved later but may not be used in queries. The 5800 system supports only single-valued fields. Each object can have only a single name-value pair of a given name. There is no built-in support for multiple-valued fields, such as a list of authors of a book in the form of multiple fields named 'author'.
Each data object is associated with a set of name-value pairs at the time the object is stored. Some metadata (system metadata) is assigned by the 5800 system as each object is stored. For example, each object contains an “object creation time” (system.object_ctime) and an OID (system.object_id), both of which are assigned by the system at the time an object is created. Some metadata (the computed metadata) is implicit in the stored data, and is made explicit at the time of the object store. For example, the system exposes the object data length as a metadata field (system.object_size). In addition, the 5800 system computes a Secure Hash Algorithm (SHA1) hash of the stored data as the data is stored and stores the hash as a metadata field (system.object_hash). There is also an associated field (system.object_hash_alg) to specify which hash algorithm was used in computing the system.object_hash. It is currently always set to “sha1.”
Finally, some metadata (the user metadata) is supplied by the customer application in the API call at the time an object is stored. Each store operation is allowed to include a NameValueRecord that indicates a set of name-value pairs to be associated with the data object as metadata. Each name in the name-value record must match a field name from the metadata schema; in addition, the data value supplied for each field must match the type and length for the field as specified in the schema. If the names or values supplied for the user metadata do not match the active schema, then an exception is generated and the object is not stored.
The metadata associated with an object is immutable. There is no operation to modify the metadata associated with an object after the object has been stored. Instead, the storeMetadata operation can be used to create a completely new object by associating new user metadata with the underlying data and system metadata of an existing object. The storeMetadata operation does not merge the new metadata in with the metadata from the original OID; instead, the storeMetadata operation creates a new metadata record pointing to the same data object. To accomplish a merge of new field values into existing metadata, the customer application must manually retrieve the existing metadata from the original object, perform the merge into a single NameValueRecord on the client side, and then call storeMetadata to create a new object with the merged metadata.
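The client-side merge step can be sketched roughly as follows. This is only an illustration: a plain Map stands in for NameValueRecord, the field names such as doc.author are hypothetical, and dropping fields whose names begin with "system." before the new store is an assumption, not documented behavior. The calls to retrieveMetadata and storeMetadata themselves are not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of the client-side merge that precedes a storeMetadata call.
 * A plain Map stands in for the real NameValueRecord (assumption).
 */
final class MetadataMergeSketch {

    /** Assumed naming convention for system-assigned fields. */
    private static final String SYSTEM_PREFIX = "system.";

    /**
     * Returns a new map holding the existing user fields, with any field
     * in 'updates' overriding an existing field of the same name.
     * Neither input map is modified.
     */
    static Map<String, Object> merge(Map<String, Object> existing,
                                     Map<String, Object> updates) {
        Map<String, Object> merged = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : existing.entrySet()) {
            // System fields are assigned by the server, so they are not resubmitted.
            if (!entry.getKey().startsWith(SYSTEM_PREFIX)) {
                merged.put(entry.getKey(), entry.getValue());
            }
        }
        merged.putAll(updates);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> existing = new LinkedHashMap<>();
        existing.put("doc.title", "Q1 report");      // hypothetical field names
        existing.put("doc.author", "alice");
        existing.put("system.object_ctime", 1200000000000L);

        Map<String, Object> updates = new LinkedHashMap<>();
        updates.put("doc.author", "bob");
        updates.put("doc.status", "final");

        System.out.println(merge(existing, updates));
    }
}
```

The merged map would then be copied into a single NameValueRecord and passed to storeMetadata, which yields a new OID that points at the same underlying data.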
When creating a new object using storeMetadata, a new system.object_id and new system.object_ctime are generated, to indicate that a new object has been created. The metadata computed from the object data itself (system.object_length, system.object_hash_alg, and system.object_hash) does not change. Both the storeObject and the storeMetadata operations return a SystemRecord value that includes all of the system-assigned fields.
While retrieving the OID is the most common use of the SystemRecord, the other system fields can also be helpful. For example, the customer application might use the system.object_length, the system.object_hash_alg, and the system.object_hash fields to verify that the data as stored matches the data as present in the customer application. If a hash independently computed on the client matches the hash stored on the 5800 system, then the data store has been validated.
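A client-side hash for this kind of check can be computed with the standard java.security.MessageDigest class, as in the sketch below. It assumes the hash algorithm is SHA-1 and that the stored hash is compared as a hexadecimal string; how the hash is actually exposed by SystemRecord is not shown here and should be taken from the Javadoc.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch: compute a SHA-1 digest on the client while streaming the data. */
final class ClientHashSketch {

    /** Streams the input through SHA-1 and returns the digest as lowercase hex. */
    static String sha1Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        byte[] buffer = new byte[8192];
        int n;
        // Only one buffer of data is held in memory at a time.
        while ((n = in.read(buffer)) != -1) {
            digest.update(buffer, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Compares two hex digests, ignoring case (hex encoding is an assumption). */
    static boolean matches(String clientHex, String serverHex) {
        return clientHex.equalsIgnoreCase(serverHex);
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        String clientHash = sha1Hex(new ByteArrayInputStream(data));
        // In a real application, the second argument would come from the
        // system.object_hash value returned by the store operation.
        System.out.println(matches(clientHash, "A9993E364706816ABA3E25717850C26C9CD0D89D"));
    }
}
```

Because the digest is updated one buffer at a time, the same approach works for objects that are too large to hold in client memory.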
The metadata values associated with an object can be retrieved using the retrieveMetadata operation. The retrieveMetadata operation takes an OID as input, and returns the entire set of user, system, and system-computed metadata. The retrieved metadata is in the form of a NameValueRecord that contains the value of each field as originally stored. The system fields occur using their field names; for example, the field system.object_ctime contains the object creation time. There is no operation to retrieve just a single field or a subset of fields by supplying a list of field names. The retrieveMetadata operation retrieves the values of both queryable and non-queryable fields.

The 5800 System Query Model

One of the primary methods for retrieving data is to specify the characteristics of the desired data and then let the system find it for you. In the 5800 system, a query expression specifies a set of conditions on metadata field values. The system then returns a list of all the objects whose metadata values match the query conditions. Each object is considered individually without reference to any other objects. There are no queries that compare fields in one object with fields in a different object.
Query expressions can use much of the power of Structured Query Language (SQL). Each query expression combines SQL functions and operators, field names from the metadata schema, and literal values. There are no query expressions that select objects based on the data stored in the object itself; all queries apply only to the metadata fields associated with the object. Only queryable fields can be used in query expressions. For an object to show up in a query result set, the object must have a value for each of the fields mentioned in the query; in other words, there is an implicit INNER JOIN between the fields in the query.
A query may optionally specify that the result set should include not just the OID of each matching object, but also the values from a set of selected fields of each matched object. The value retrieved by Query With Select for some field may be a canonical equivalent of the value originally stored in that field. For example, values in numeric fields may have been converted to standard numeric format. Trailing spaces at the end of string fields will have been truncated. (The value that is returned will be some value that would match the original data as stored, in the SQL sense.) To be included in the result set, an object must include values for all queried fields and all selected fields. In other words, there is an implicit INNER JOIN between all the fields in the query and in the select list.
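The effect of the implicit INNER JOIN can be modeled on the client with ordinary collections. The sketch below is not the server's query engine and uses no 5800 system classes: each object's metadata is a plain Map, the field names are hypothetical, and the query condition is an arbitrary predicate. It only illustrates that an object lacking any queried or selected field is excluded before the condition is even evaluated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Sketch: which objects survive the implicit INNER JOIN of a query with a select list. */
final class ImplicitJoinSketch {

    static List<Map<String, Object>> select(List<Map<String, Object>> objects,
                                            List<String> queriedFields,
                                            List<String> selectedFields,
                                            Predicate<Map<String, Object>> condition) {
        List<Map<String, Object>> results = new ArrayList<>();
        for (Map<String, Object> metadata : objects) {
            // An object must have a value for every queried field...
            if (!metadata.keySet().containsAll(queriedFields)) {
                continue;
            }
            // ...and for every field named in the select list.
            if (!metadata.keySet().containsAll(selectedFields)) {
                continue;
            }
            if (condition.test(metadata)) {
                results.add(metadata);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Object> complete = new HashMap<>();
        complete.put("doc.author", "alice");   // hypothetical field names
        complete.put("doc.title", "Q1 report");

        Map<String, Object> noTitle = new HashMap<>();
        noTitle.put("doc.author", "alice");

        List<Map<String, Object>> matches = select(
                Arrays.asList(complete, noTitle),
                Arrays.asList("doc.author"),
                Arrays.asList("doc.title"),
                m -> "alice".equals(m.get("doc.author")));

        // Only the object that has both doc.author and doc.title is returned.
        System.out.println(matches.size());
    }
}
```

A practical consequence is that adding a field to the select list can shrink the result set, so a plain query and a Query With Select over the same condition need not return the same objects.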
There are signicant limitations on which queries may be executed eciently, or at all. See
Chapter 4, “Sun StorageTek 5800 System Query Language,” and Chapter 5, “Programming Considerations and Best Practices”
s for details of these limitations.
There are no ordering guarantees between queries and store operations that are proceeding at the same time. If an object is added to the 5800 system while a query is being performed, and the object matches the query, then the object may or may not show up in the query result set.
For a detailed description of query syntax and query semantics, including a description of exactly what it means for an object to match a query, see Chapter 4, “Sun StorageTek 5800 System Query Language.”

The 5800 System Query Integrity Model

The result set of any query will only return results that match the query. But will it return ALL the matching results? That is the concept of query completeness, referred to here as query integrity. 100% query integrity for a result set is defined as a state in which the result set contains all the objects in the 5800 system that match that particular query. The 5800 system is not always in a state of 100% query integrity. Various system events can induce a state in which the set of objects that are available for query is smaller than the total set of objects stored in the archive. Each query result set supports an operation (isQueryComplete) whereby the customer application can ask, once all the results from the query result set have been processed, whether that set of results constitutes a complete set.
Note – The format of records as stored in the reliable and scalable object archive is not suitable for fast query. To enable searching, the queryable fields from the metadata are indexed in a query engine that can provide fast and flexible query services. The query engine is basically an SQL database. This is why the 5800 system's query language can borrow so heavily from SQL. At various times, the data as indexed in the query engine can get out of date compared to what is stored in the archive. When this happens, query result sets are not known to be complete until the contents of the query engine can be brought back up to date with the actual contents of the archive again.
The 5800 system concept of query integrity as actually implemented is somewhat looser than that of 100% query integrity. Even if a query result set indicates the result set is complete, the 5800 system allows certain objects, known as store index exceptions, to be missing from the query result set, as long as those exceptions were communicated to the customer application at the time the object was stored.
A store index exception is an object for which the original store of the object into the archive succeeded, but at least some part of the insert into the query engine (database) did not succeed. The object may or may not show up in all of the queries that it matches. A store index exception is communicated to the customer application at the time of store by means of a method SystemRecord.isIndexed. A value of false from isIndexed means that the object is not immediately available for query.
A store index exception is said to be resolved when the object becomes available for query. The checkIndexed method can be used to attempt to resolve a store index exception under program control. The checkIndexed operation checks if the object has been added to the query engine, and attempts to insert it if the object has not been added. If the insert into the query engine succeeds, the object is thereby restored to full queryability.
All store index exceptions will also eventually be resolved automatically by ongoing system healing. Each query result set also exports a method getQueryIntegrityTime that can be used to get detailed status on which store index exceptions might still be unresolved. The query integrity time is a time such that all store index exceptions from before that time have been resolved. There is an “ideal” query integrity time, which is the time of the oldest still-unresolved store index exception: an ideal implementation when asked for the query integrity time would always report this ideal value. In actual implementation, the reported query integrity time might be hours or even days earlier than the ideal query integrity time, depending on how far the ongoing system healing has progressed.
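The relationship between unresolved store index exceptions and the query integrity time can be illustrated with a small model. The sketch below is not how the 5800 system computes the value; it assumes times are plain millisecond timestamps and, as a further assumption, that the ideal time equals the current time when nothing is unresolved. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

/** Sketch: a toy model of the "ideal" query integrity time. */
final class QueryIntegritySketch {

    /**
     * The ideal query integrity time is the store time of the oldest
     * still-unresolved store index exception. With none outstanding,
     * this model returns 'now' (an assumption).
     */
    static long idealIntegrityTime(Collection<Long> unresolvedStoreTimes, long now) {
        return unresolvedStoreTimes.isEmpty() ? now : Collections.min(unresolvedStoreTimes);
    }

    /**
     * A store index exception raised before the reported query integrity
     * time is known to have been resolved.
     */
    static boolean knownResolved(long storeTime, long reportedIntegrityTime) {
        return storeTime < reportedIntegrityTime;
    }

    public static void main(String[] args) {
        long ideal = idealIntegrityTime(Arrays.asList(5000L, 3000L, 9000L), 10000L);
        // A real system may report an earlier, more conservative time than the ideal.
        long reported = ideal - 1000L;
        System.out.println(knownResolved(1500L, reported));
        System.out.println(knownResolved(2500L, reported));
    }
}
```

An application that records the store time of each object whose isIndexed value was false could use a comparison like knownResolved against the value from getQueryIntegrityTime to decide which of those objects no longer need a checkIndexed call.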

Deleting Objects from the 5800 System

The 5800 system client API exports an operation to delete a specific object as specified by its OID. Once a delete operation completes normally, subsequent attempts to retrieve that object will fail with an exception. In addition, the object will stop showing up in query result sets that match the original object metadata. There are no transactional guarantees regarding ordering of queries and delete operations that are occurring at the same time. If an object is being deleted at the same time that a query that matches that object is being performed, then that object may or may not show up in the query result set, with no guarantee either way.
Note – When all objects that share an underlying block of data storage have been deleted, the underlying block of data storage will itself be scavenged and returned to the supply of free disk space. But all details of how objects are stored, and how and whether they ever share data or are ever scavenged, are outside of the scope of this API.
Delete operations are all-or-nothing, with some caveats. Specifically, if a delete operation fails with an error, it is possible that the object is not fully deleted but is temporarily not queryable. Such an object is in an analogous state to a store index exception (see “The 5800 System Query Integrity Model”). The queryability of such an object will eventually be resolved by automatic system healing. In addition, the queryability of such an object can be resolved under program control by using the checkIndexed method. Alternatively, the customer application may choose to re-execute the delete operation until it succeeds, or until it fails with an error that indicates the object is already deleted.
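The re-execute strategy can be sketched as a bounded loop. The Outcome enum and DeleteAttempt interface below are hypothetical stand-ins: how the real API distinguishes a transient failure from an "already deleted" error is not specified in this section, so the caller is assumed to map the API's exceptions onto these outcomes.

```java
/** Sketch: re-execute a delete until it succeeds or the object is already gone. */
final class DeleteRetrySketch {

    /** Hypothetical classification of one delete attempt; not part of the real API. */
    enum Outcome { DELETED, ALREADY_DELETED, TRANSIENT_FAILURE }

    /** Hypothetical wrapper around a single call to the delete operation. */
    interface DeleteAttempt {
        Outcome run();
    }

    /**
     * Runs the attempt up to maxAttempts times. Returns true as soon as the
     * object is known to be deleted, false if every attempt failed.
     */
    static boolean deleteUntilGone(DeleteAttempt attempt, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            Outcome outcome = attempt.run();
            if (outcome == Outcome.DELETED || outcome == Outcome.ALREADY_DELETED) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated server: the first attempt fails partway through, and the
        // second reports that the object is already deleted.
        boolean gone = deleteUntilGone(
                () -> calls[0]++ == 0 ? Outcome.TRANSIENT_FAILURE : Outcome.ALREADY_DELETED,
                3);
        System.out.println(gone + " after " + calls[0] + " attempts");
    }
}
```

Treating "already deleted" as success is what makes the loop safe to repeat: a delete that failed after removing the object simply reports that state on the next attempt.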
CHAPTER 2

Sun StorageTek 5800 System Java Client API

This chapter provides information on the 5800 system Java client API.
The following topics are discussed:
“Overview of the 5800 System Java Client API”
“Java Client Application Deployment”
“Java API”
Note – You can find detailed information on the 5800 system Java client API in the Javadocs, which are located in the java/doc/htdocs directory.

Overview of the 5800 System Java Client API

This section provides an overview of the 5800 system Java client API. The following topics are discussed:
“Client Library”
“Interfaces”
“Retrying Operations”
“Performance and Scalability”
“Updating Client View of the Schema”

Client Library

The 5800 system Java client library provides a simple way to communicate with 5800 system clusters. It provides programmatic access to the 5800 system network protocol, which operates over HTTP, enabling you to store, retrieve, query, and delete object data and metadata.
The 5800 system Java client library provides a platform-independent mechanism to upload data and metadata to a 5800 system, and to retrieve and query the data and metadata. The Java client library works with any implementation of J2SE™ platform 4.0 or later with HTTP connectivity to the 5800 system cluster. Access is designed to be high-level and easy to use. Most operations are accomplished in a single (synchronous) function call.

Interfaces

The Java client API interacts with the 5800 system server entirely through an HTTP protocol. The HTTP communication layer uses the Apache Commons HTTP client.
Object data is streamed through the Java client library opaquely and a well-defined data hash is returned for verification purposes. Metadata is added or retrieved with typed accessors. The stored representation of metadata on the 5800 system server is not exposed to the user, and no hash is returned when metadata is stored.
The 5800 system Java client library provides the NameValueObjectArchive class as an application access layer, which should be appropriate for most applications. In addition, an advanced interface provides a mechanism to customize the 5800 system and to serve as a toolkit to build new applications.
Note – The advanced toolkit is not described in this document. If you are interested in pursuing advanced applications, contact your 5800 system Sales Representative.

Retrying Operations

Calls to the Java API should be wrapped with retry logic so that your applications are resilient to transient failures that may be experienced when a node or switch fails while servicing an operation.
Requests that fail on recoverable HTTP errors are automatically retried once. A typical recoverable error occurs when the 5800 system HTTP server times out a connection that the client then tries to reuse (the client maintains a connection pool). This results in a connection failure at request time. Because this is a recoverable error, it is retried and the retry typically succeeds.
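Application-level retry logic of the kind recommended above might look like the following sketch. It is generic and uses only the standard library: the operation is any Callable, and treating IOException as the transient failure type is an assumption, so the exception types actually thrown by the Java API should be checked in the Javadoc. The attempt count and delays are illustrative values.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Sketch: retry an operation on transient failures with exponential backoff. */
final class RetrySketch {

    /**
     * Calls the operation up to maxAttempts times. An IOException is treated
     * as transient (assumption) and triggers a retry after a delay that
     * doubles each time. Any other exception propagates immediately.
     */
    static <T> T withRetry(Callable<T> operation, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;
                }
            }
        }
        // Every attempt failed; surface the most recent failure.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation that fails twice before succeeding.
        String result = withRetry(() -> {
            if (calls[0]++ < 2) {
                throw new IOException("simulated node failure");
            }
            return "stored";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Because a store is all-or-nothing and returns an OID only on success, retrying a failed store does not leave a partial object behind, although each successful store does create a distinct object.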

Performance and Scalability

Starting the Java Virtual Machine (JVM) incurs a performance penalty, but once the JVM is running, you can use the client object archive repeatedly and from multiple threads. I/O is synchronous (blocking). HTTP connections are pooled for performance. You should instantiate one instance of the NameValueObjectArchive per 5800 system server and use it for all access to that server until exit.
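One way to follow the one-instance-per-server recommendation is a small thread-safe registry, sketched below. The type parameter A stands in for NameValueObjectArchive, and the factory function that creates an instance from a server address is supplied by the caller, so no 5800 system classes or constructor signatures are assumed here. The class name and the example address are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Sketch: keep exactly one archive handle per server address and share it
 * across threads. 'A' stands in for NameValueObjectArchive (assumption).
 */
final class ArchiveRegistrySketch<A> {

    private final ConcurrentMap<String, A> byServer = new ConcurrentHashMap<>();
    private final Function<String, A> factory;

    ArchiveRegistrySketch(Function<String, A> factory) {
        this.factory = factory;
    }

    /** Returns the shared handle for this address, creating it on first use. */
    A forServer(String address) {
        // computeIfAbsent runs the factory at most once per key, even when
        // several threads ask for the same address at the same time.
        return byServer.computeIfAbsent(address, factory);
    }

    public static void main(String[] args) {
        // A StringBuilder is used here only as a placeholder handle type.
        ArchiveRegistrySketch<StringBuilder> registry =
                new ArchiveRegistrySketch<>(address -> new StringBuilder(address));
        StringBuilder first = registry.forServer("archive.example.com");
        StringBuilder second = registry.forServer("archive.example.com");
        System.out.println(first == second);
    }
}
```

Since connection setup and the schema fetch happen at instantiation, caching the instance this way pays that cost once per server rather than once per operation.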

Updating Client View of the Schema

In the Java client API, the schema is fetched when the NameValueObjectArchive class is instantiated. If the schema has changed, the client application needs to create a new NameValueObjectArchive. A local copy of the schema is used for some metadata operations.

Java Client Application Deployment

Java applications using the 5800 system Java API reference the honeycomb-client.jar library. You must include this library in your classpath when running your application. The 5800 system Java API was designed to run on Java v1.4, so you need to run your client applications with a Java environment of v1.4 or greater.

Java API

The 5800 system Java client library provides a simple way of communicating with 5800 system clusters. It provides programmatic access to the 5800 system network protocol, which operates over HTTP. You can implement most applications using a handful of these classes, but access to “expert” features is also included.
This section discusses the following topics:
“Java API Packages”
“Java API Documentation”
“Basic Concepts”
“Key Classes”
“NameValueObjectArchive Application Access”

Java API Packages

The Java API is implemented in two packages:
com.sun.honeycomb.client
Provides the base classes required to interact with a 5800 system cluster.
com.sun.honeycomb.common
Contains classes for server-side exceptions.

Java API Documentation

The Java API documentation (Javadoc) is located in the SDK java/doc/htdocs directory, and can be accessed using a browser.

Basic Concepts

The root of the 5800 system Java client API is the NameValueObjectArchive class, which represents a connection to a single 5800 system server. All operations are initiated by invoking methods on a NameValueObjectArchive instance after initializing it with the address of a cluster. The fact that a cluster of machines, rather than a single server, is handling the requests is transparent to the application programmer.
A NameValueObjectArchive uses instances of the ObjectIdentifier class to uniquely identify stored data objects. That is, there is a one-to-one correspondence between instances of ObjectIdentifier and 5800 system metadata objects.
Note – There is potentially a many-to-one relationship between metadata and data objects.
When using NameValueObjectArchive, all metadata queries are executed against a 5800 system server’s user-configurable index of name-value pair lists. This class also ensures that a metadata entry is created for every data object stored, even if no metadata is provided at store time.
An instance of the NameValueObjectArchive class functions as a proxy for the 5800 system server. Instantiation incurs some overhead in establishing communication, so reusing a single instance is the recommended practice. Multithreading is supported with the same instance.
NameValueObjectArchive also allows all metadata operations to be performed in terms of two classes that represent metadata records: SystemRecord and NameValueRecord. These classes represent 5800 system metadata entries. When using NameValueObjectArchive, every stored data object has a corresponding NameValueRecord that contains the extended attributes stored with that data object, and each NameValueRecord has a reference to its SystemRecord, which contains built-in system attributes such as data object size and creation time. In this model, all instances of ObjectIdentifier returned from store operations and metadata queries correspond directly to instances of NameValueRecord.
The results of a 5800 system metadata query are returned using instances of the QueryResultSet class, which the application can step through to retrieve metadata or identifiers. This class manages the details of fetching one batch of results after another.

Key Classes

This section provides an overview of the following key classes in the 5800 system Java client API. For more information on using these classes, see “Basic Concepts.” Also see the Javadoc provided with the 5800 system SDK.
“NameValueObjectArchive”
“NameValueSchema”
“ObjectIdentifier”
“QueryResultSet”
“SystemRecord”
“NameValueRecord”
NameValueObjectArchive
The NameValueObjectArchive class is the main entry point into the 5800 system. Each instance of NameValueObjectArchive provides access to a specic 5800 system server, functioning as a proxy object on which operations can be performed. Multiple simultaneous operations can be accomplished in separate threads on the same NameValueObjectArchive instance. Communication with the 5800 system server is entirely by means of HTTP requests. A pool of HTTP connections is maintained for eciency.
A NameValueObjectArchive instance enables you to store, retrieve, query, and delete object data and associated metadata records. Metadata is associated with an object in a set of name-value pairs (see “NameValueRecord” on page 30). Metadata records can be used to associate application-specific information with the raw data, such as name, MIME type, or purge date. Metadata records consist of structured data that can be queried. Object data is opaque to the 5800 system.
A NameValueObjectArchive instance always ensures that a metadata record is created on the 5800 system server for each newly stored object, even if no metadata is provided with the store. This enables a model of programming where every stored data object is accessed by name-value metadata records (for example, for examining results from queries or performing delete operations). Object data is never deleted directly; it is deleted when its last referencing metadata record is deleted.
For additional information, see “NameValueObjectArchive Application Access” on page 30.
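The lifecycle rule above (a metadata record always exists for a stored object, and object data disappears only when its last referencing record is deleted) can be sketched with a small in-memory model. ToyArchive and all of its method signatures are hypothetical and exist only for this illustration. In particular, the sketch assumes that each storeMetadata call adds a further record pointing at the same underlying data. It is not the SDK's NameValueObjectArchive.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model, NOT the SDK's NameValueObjectArchive.
class ToyArchive {
    private final Map<String, byte[]> dataById = new HashMap<>();        // data id -> bytes
    private final Map<String, String> dataIdByRecord = new HashMap<>();  // record OID -> data id
    private final Map<String, Map<String, String>> metadataByRecord = new HashMap<>();
    private int counter = 0;

    // Stores data and always creates a metadata record, even when metadata is null.
    String storeObject(byte[] data, Map<String, String> metadata) {
        String dataId = "data-" + (++counter);
        dataById.put(dataId, data.clone());
        return addRecord(dataId, metadata);
    }

    // Adds another metadata record that references the same underlying data.
    String storeMetadata(String oid, Map<String, String> metadata) {
        String dataId = dataIdByRecord.get(oid);
        if (dataId == null) {
            throw new IllegalArgumentException("unknown OID: " + oid);
        }
        return addRecord(dataId, metadata);
    }

    // Deletes one metadata record; the data is removed along with its last record.
    void delete(String oid) {
        String dataId = dataIdByRecord.remove(oid);
        if (dataId == null) {
            throw new IllegalArgumentException("unknown OID: " + oid);
        }
        metadataByRecord.remove(oid);
        if (!dataIdByRecord.containsValue(dataId)) {
            dataById.remove(dataId);
        }
    }

    int dataObjectCount() { return dataById.size(); }

    int recordCount() { return metadataByRecord.size(); }

    private String addRecord(String dataId, Map<String, String> metadata) {
        String oid = "oid-" + (++counter);
        Map<String, String> copy = new HashMap<String, String>();
        if (metadata != null) {
            copy.putAll(metadata);
        }
        dataIdByRecord.put(oid, dataId);
        metadataByRecord.put(oid, copy);
        return oid;
    }
}
```

In this model, deleting the first of two records leaves the data in place, and deleting the second removes it. Callers never delete data directly.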
NameValueSchema
An instance of NameValueSchema represents information about the name-value metadata that the 5800 system uses to index data. This instance can be used to enumerate the fields available in the schema as attributes. Each attribute has a name and a type.
See the Sun StorageTek 5800 System Administrator’s Guide for information on how to define attributes.
ObjectIdentifier
Instances of ObjectIdentifier uniquely represent objects in a 5800 system store. The 5800 system creates these instances when objects are stored and returns them to the client as part of the store result. ObjectIdentifier instances can be stored outside of the 5800 system and used later for retrieving objects. External storage can be accomplished using an identifier's string representation, obtained by invoking the toString method. An instance of ObjectIdentifier can be reconstituted using the constructor that takes a String as an argument.
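The round trip through a string representation can be sketched as follows. ToyObjectIdentifier and OidRoundTrip are hypothetical stand-ins written for this illustration. Only the general pattern (toString on the way out, a String-argument constructor on the way back) comes from the text above. The temporary file simply plays the role of storage outside the 5800 system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in, NOT the SDK's ObjectIdentifier.
final class ToyObjectIdentifier {
    private final String value;

    // Rebuilds an identifier from its external string form.
    ToyObjectIdentifier(String value) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("empty identifier");
        }
        this.value = value;
    }

    @Override
    public String toString() {
        return value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ToyObjectIdentifier
                && ((ToyObjectIdentifier) other).value.equals(value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}

class OidRoundTrip {
    // Writes the identifier's string form to a temporary file, reads it back,
    // and reconstitutes an identifier from the text that was read.
    static ToyObjectIdentifier roundTripThroughFile(ToyObjectIdentifier oid) {
        try {
            Path file = Files.createTempFile("oid", ".txt");
            try {
                Files.write(file, oid.toString().getBytes(StandardCharsets.UTF_8));
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                return new ToyObjectIdentifier(text);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An application would typically keep such strings in its own database or catalog rather than in a temporary file.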
QueryResultSet
Instances of QueryResultSet provide access to the objects and metadata matching a query. The query results can be stepped through using the next method. The individual results are identifiers representing objects that match the query.
If selectKeys was specified in the original query, these metadata fields can be accessed using the typed getter methods with each field’s name.
SystemRecord
Instances of SystemRecord represent the system metadata for an object, including OID, object size, SHA1 hash, and creation time. They are returned by storeObject and storeMetadata.
NameValueRecord
Instances of NameValueRecord represent metadata used by the 5800 system to store and index user-extensible lists of name-value pairs. For convenience, instances of NameValueRecord also contain references to the SystemRecord instances of the objects they represent.
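The relationship between the two record types can be pictured with a small data-structure sketch: a name-value record that holds typed attributes and carries a reference to the system record of the object it describes. ToySystemRecord, ToyNameValueRecord, their fields, and their method names are hypothetical, chosen for this illustration. They are not the SDK classes, and the attribute names in the usage note are invented examples.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, NOT the SDK's SystemRecord: built-in attributes only.
final class ToySystemRecord {
    final String oid;
    final long size;
    final long creationTimeMillis;

    ToySystemRecord(String oid, long size, long creationTimeMillis) {
        this.oid = oid;
        this.size = size;
        this.creationTimeMillis = creationTimeMillis;
    }
}

// Hypothetical stand-in, NOT the SDK's NameValueRecord: user-defined
// name-value pairs plus a reference to the object's system record.
final class ToyNameValueRecord {
    private final ToySystemRecord systemRecord;
    private final Map<String, Object> values = new LinkedHashMap<>();

    ToyNameValueRecord(ToySystemRecord systemRecord) {
        this.systemRecord = systemRecord;
    }

    ToySystemRecord getSystemRecord() {
        return systemRecord;
    }

    void put(String name, String value) {
        values.put(name, value);
    }

    void put(String name, long value) {
        values.put(name, Long.valueOf(value));
    }

    // Typed getters: each casts to the expected type and fails if the name is absent.
    String getString(String name) {
        return (String) require(name);
    }

    long getLong(String name) {
        return (Long) require(name);
    }

    private Object require(String name) {
        Object value = values.get(name);
        if (value == null) {
            throw new IllegalArgumentException("no such attribute: " + name);
        }
        return value;
    }
}
```

For example, after `rec.put("doc.title", "Quarterly report")` on a record built around a system record of size 2048, `rec.getString("doc.title")` yields the title and `rec.getSystemRecord().size` yields 2048 without a second lookup.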

NameValueObjectArchive Application Access

Most applications make use of the NameValueObjectArchive class. This class ensures that a default metadata entry is created for every data object stored, even if no metadata is explicitly provided at store time.
The NameValueObjectArchive object functions as a proxy for the 5800 system server. All access is enabled by invoking methods on this object.
The following key methods and classes are used with the NameValueObjectArchive class:
“NameValueObjectArchive” on page 31
“delete” on page 31
“storeObject” on page 31
“storeMetadata” on page 32
“checkIndexed” on page 32
“retrieveObject” on page 33
“retrieveMetadata” on page 33
“getSchema” on page 33
“query” on page 34
“query (with selectKeys)” on page 34