VMware vFabric SQLFire - 1.0 User's Guide

vFabric SQLFire User's Guide

VMware vFabric SQLFire 1.0.3

VMware vFabric Suite 5.1

This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition. To check for more recent editions of this document, see http://www.vmware.com/support/pubs.

EN-000663-00

You can find the most up-to-date technical documentation on the VMware Web site at: http://www.vmware.com/support/

The VMware Web site also pro vides the latest product updates. If you hav e comments about this documentation, submit your feedback to: docfeedback@vmware.com

Copyright © 2012 VMware, Inc. All rights reserved.This product is protected by U.S. and international copyright and intellectual property laws.VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents

VMware is a registered trademark or trademark of VMw are , Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.

VMware, Inc. 3401 Hillview Ave. Palo Alto, CA 94304

About the SQLFire User's Guide.................................................................................13

Supported Configurations and System Requirements.............................................15

Part I: Getting Started with vFabric SQLFire............................................1

Chapter 1: Overview of vFabric SQLFire..................................................3

Data Flow Diagram.......................................................................................................................................4

GemFire, Apache Derby, and SQL Components..........................................................................................4

Chapter 2: Understanding the SQLFire Distributed System...................7

vFabric SQLFire Members............................................................................................................................7

Servers, Peer Clients, and Server Groups....................................................................................................8

Discovery Mechanisms.................................................................................................................................8

Group Membership Service..........................................................................................................................8

Replicated Tables and Par titioned Tables.....................................................................................................9

Parallel Execution of Data-Aware Stored Procedures...................................................................................9

Cache Plug-ins for External Data Connections..........................................................................................10

Chapter 3: Installing vFabric SQLFire.....................................................11

Installation Note for vFabr ic Suite Customers.............................................................................................11

RHEL Only: Install vFabric SQLFire from the VMWare YUM Repository....................................................11

Install vFabric SQLFire from a JAR File......................................................................................................13

Chapter 4: Activating vFabric SQLFire Licensing.................................15

Understand vFabric SQLFire License Options...........................................................................................15

Choose a License Option Based on Topology............................................................................................16

How vFabric SQLFire Manages Licensing..................................................................................................18

Install and Configure vFabric SQLFire Licenses.........................................................................................19

Ver ify Your License and Check Your License Usage...................................................................................21

Chapter 5: Upgrading vFabric SQLFire..................................................23

RHEL Only: Upgrade vFabric SQLFire from the VMWare YUM Repository...............................................23

Upgrade Issues...........................................................................................................................................24

Chapter 6: Connect to vFabric SQLFire with JDBC Tools.....................25

Chapter 7: Tutorials..................................................................................27

Main Steps..................................................................................................................................................27

Create a SQLFire Cluster...........................................................................................................................27

Connect to the Cluster Using SQLF...........................................................................................................29

Create Replicated Tables and Execute Quer ies.........................................................................................30

iii

Implement a Partitioning Strategy...............................................................................................................31

Persist Tables to Disk..................................................................................................................................34

Add Servers to the Cluster and Stop Servers.............................................................................................35

Perfor m Additional Tasks............................................................................................................................36

Chapter 8: vFabric SQLFire Features and Benefits...............................37

Part II: Managing Your Data in vFabric SQLFire.....................................39

Chapter 9: Designing vFabric SQLFire Databases................................41

Design Principles of Scalable, Partition-Aware Databases.........................................................................41

Identify Entity Groups and Partitioning Keys...............................................................................................41

Replicate Code Tables................................................................................................................................42

Dealing with Many-to-Many Relationships..................................................................................................42

Example: Adapting a Database Schema for SQLFire.................................................................................43

Chapter 10: Using Server Groups to Manage Data................................47

Server Groups Overview.............................................................................................................................47

Adding Members to Server Groups............................................................................................................49

Assigning Tables to Server Groups.............................................................................................................50

Chapter 11: Partitioning Tables...............................................................51

How Table Par titioning Wor ks.....................................................................................................................51

Understanding Where Data Is Stored.........................................................................................................52

Failure and Redundancy.............................................................................................................................53

Creating Partitioned Tables.........................................................................................................................54

Rebalancing Partitioned Data on SQLFire Members..................................................................................59

Managing Replication Failures....................................................................................................................60

Chapter 12: Replicating Tables................................................................61

How SQLFire Replicates Tables.................................................................................................................61

Deciding When to Use Replicated Tables...................................................................................................61

Creating Replicated Tables.........................................................................................................................62

Chapter 13: Estimating Memory Requirements.....................................63

Estimating SQLFire Overhead....................................................................................................................63

Viewing Memory Usage in SYS.MEMORYANALYTICS..............................................................................64

Chapter 14: Using Disk Stores to Persist Data......................................69

Overview of Disk Stores..............................................................................................................................69

Guidelines for Designing Disk Stores.........................................................................................................74

Creating a Disk Store or Using the Default.................................................................................................75

Persist Table Data to a Disk Store...............................................................................................................76

Optimizing Availability and Performance.....................................................................................................77

Starting System with Disk Stores................................................................................................................77

vFabric SQLFire User's Guideiv

Contents

Disk Store Management.............................................................................................................................79

Chapter 15: Expor ting and Importing Data with vFabric SQLFire.......89

Using SQLF Commands to Export and Impor t Data..................................................................................89

Using Apache DdlUtils to Import Data........................................................................................................91

Exporting and Importing Data from Text Files.............................................................................................94

Chapter 16:Using T ab le Functions to Import Data as a SQLFire T ables.95

Overview of SQLFire Table Functions.........................................................................................................95

Example Table Function..............................................................................................................................97

Part III: Developing Applications with SQLFire......................................99

Chapter 17:Starting SQLFire Servers with the FabricServer Interface.101

Starting a Network Ser ver........................................................................................................................102

Chapter 18: Developing Java Clients and Peers..................................105

Connect to a SQLFire Server with the Thin Client JDBC Driver...............................................................105

Start a SQLFire Peer with the Peer Client JDBC Driver...........................................................................107

Chapter 19: Configuring SQLFire as a JDBC Datasource..................109

Chapter 20: Storing and Loading JAR Files in SQLFire......................111

Class Loading Overview...........................................................................................................................111

Alternate Methods for Managing JAR Files..............................................................................................113

Chapter 21: Developing ADO.NET Client Applications.......................117

About the ADO.NET Driver.......................................................................................................................117

ADO.NET Driver Classes..........................................................................................................................118

Installing and Using the ADO.NET driver..................................................................................................119

Connecting to SQLFire with the ADO.NET Driver....................................................................................119

Managing Connections.............................................................................................................................120

Executing SQL Commands.......................................................................................................................120

Working with Result Sets..........................................................................................................................121

Storing a Table..........................................................................................................................................122

Storing Multiple Tables..............................................................................................................................123

Specifying Command Parameters with SQLFParameter..........................................................................124

Updating Row Data...................................................................................................................................125

Adding Rows to a Table............................................................................................................................126

Managing SQLFire Transactions...............................................................................................................127

Perfor ming Batch Updates........................................................................................................................128

Generic Coding with the SQLFire ADO.NET Driver..................................................................................129

Chapter 22: Using SQLFire.NET Designer...........................................133

Installing SQLFire.NET Designer..............................................................................................................133

Connecting to a SQLFire Distributed System...........................................................................................133

Editing Tables............................................................................................................................................134

Chapter 23: Understanding the Data Consistency Model...................135

Data Consistency Concepts.....................................................................................................................135

No Ordering Guarantee for DML in Separate Threads.............................................................................136

Updates on Any Row Are Atomic and Isolated.........................................................................................136

Atomicity for Bulk Updates........................................................................................................................136

Chapter 24: Using Distributed Transactions in Your Applications.....137

Overview of SQLFire Distributed Transactions.........................................................................................137

Sequence of Events for a Distributed Transaction....................................................................................140

SQLFire Transaction Design.....................................................................................................................141

Best Practices for Using Transactions.......................................................................................................141

Transaction Functionality and Limitations.................................................................................................142

Chapter 25: Using Data-Aware Stored Procedures.............................143

Configuring a Procedure...........................................................................................................................143

Configuring a Custom Result Processor...................................................................................................144

Invoking a Procedure................................................................................................................................145

Example JDBC Client...............................................................................................................................146

Chapter 26: Using the Procedure Provider API....................................149

Procedure Parameters..............................................................................................................................149

Populating OUT and INOUT Parameters..................................................................................................150

Populating Result Sets..............................................................................................................................150

Using the <local> and <global> Escape Syntax with Nested Queries......................................................152

Chapter 27: Using the Custom Result Processor API.........................153

Implementing the ProcedureResultProcessor Interface ..........................................................................153

Example Result Processor: MergeSort.....................................................................................................153

Chapter 28: Programming User-Defined Types....................................157

Chapter 29: Using Result Sets and Cursors........................................161

Non-updatable, Forward-Only Result Sets...............................................................................................161

Updatable Result Sets..............................................................................................................................162

Scrollable Insensitive Result Sets.............................................................................................................167

Result Sets and Autocommit.....................................................................................................................168

Holdable Result Sets................................................................................................................................168

Part IV: Caching Data with vFabric SQLFire.........................................171

Chapter 30: SQLFire Cache Strategies.................................................173

vFabric SQLFire User's Guidevi

Contents

Chapter 31: Using a RowLoader to Load Existing Data......................175

How SQLFire Invokes a RowLoader.........................................................................................................175

Implementing the RowLoader Interface....................................................................................................176

Using the JDBCRowLoader Example.......................................................................................................176

Chapter 32: Evicting Table Data from SQLFire.....................................179

How LRU Eviction Works..........................................................................................................................179

Limitations of Eviction...............................................................................................................................179

Eviction in Partitioned Tables....................................................................................................................180

Create a Table with Eviction Settings........................................................................................................180

Chapter 33: Handling DML Events Synchronously.............................183

Writer and Listener Cache Plug-ins..........................................................................................................183

Example Writer Implementation................................................................................................................184

Example Listener Implementation............................................................................................................184

Chapter 34: Handling DML Events Asynchronously..........................185

How the AsyncEventListener Works.........................................................................................................185

Implementation Requirements..................................................................................................................186

Implementing an AsyncEventListener.......................................................................................................186

Chapter 35: Using DBSynchronizer to Apply DML to an RDBMS......191

How DBSynchronizer Works.....................................................................................................................191

Restrictions and Limitations......................................................................................................................192

Configuring DBSynchronizer....................................................................................................................193

Chapter 36:Suppressing Event Callbacks for a vFabric SQLFire Connection.197

Part V: Deploying vFabric SQLFire........................................................199

Chapter 37: SQLFire Deployment Models............................................201

Embedded Peer-to-Peer Deployment.......................................................................................................202

Client-Server Deployment.........................................................................................................................204

Multi-site Deployment...............................................................................................................................205

Chapter 38: Steps to Plan and Configure a Deployment....................209

Chapter 39: Configuring Discovery Mechanisms................................211

Using Locators..........................................................................................................................................211

Configure Multicast Discovery..................................................................................................................213

Chapter 40: Star ting and Configuring SQLFire Servers.....................215

Start and Stop SQLFire Ser vers Using sqlf..............................................................................................215

Specify the Server Working Directory.......................................................................................................216

vii

Specify Client Connection Information......................................................................................................216

Define Server Groups...............................................................................................................................216

Execute SQL When You Start a Server.....................................................................................................217

Using Additional Boot Properties..............................................................................................................217

Chapter 41: Configuring Multi-site (WAN) Deployments.....................219

About Gateways........................................................................................................................................219

About High Availability for WAN Deployments..........................................................................................221

Limitations of Multi-Site Replication..........................................................................................................222

Prerequisites for WAN Replication............................................................................................................223

Steps to Configure a Multi-site Deployment..............................................................................................223

Chapter 42: Configuring Authentication and Authorization...............229

Configuring User Authentication...............................................................................................................229

User Names in Authentication and Authorization ....................................................................................237

Configuring User Authorization.................................................................................................................238

Configuring Network Encryption and Authentication with SSL/TLS..........................................................242

Part VI: Managing and Monitoring vFabric SQLFire............................247

Chapter 43: Configuring and Using SQLFire Log Files.......................249

Log Message Format................................................................................................................................249

Severity Levels..........................................................................................................................................249

Using java.util.logging.Logger for Application Log Messages...................................................................250

Using Trace Flags for Advanced Debugging.............................................................................................250

Chapter 44: Querying SQLFire System Tables and Indexes...............253

Getting Information About SQLFire Members..........................................................................................253

Getting Information About User Tables.....................................................................................................254

Chapter 45:Evaluating Query Execution Plans and Query Statistics.259

Capture the Query Execution Plan for Individual Statements...................................................................259

Capture Query Execution Plans for All Statements..................................................................................260

View Query Plans and Statistics...............................................................................................................260

SQLFire Query Execution Plan Codes.....................................................................................................262

Chapter 46: Overriding Optimizer Choices..........................................265

Chapter 47: Evaluating System and Application Performance..........269

Collecting System Statistics......................................................................................................................269

Collecting Application Statistics................................................................................................................270

Using VSD to Analyze Statistics...............................................................................................................270

Chapter 48: Using Java Management Extensions (JMX)....................281

Overview of the vFabric SQLFire JMX Agent...........................................................................................281

vFabric SQLFire User's Guideviii

Contents

Example Configuration.............................................................................................................................281

Starting the JMX Agent.............................................................................................................................282

Stopping the JMX Agent...........................................................................................................................286

Configuring JMX Connectors and Adapters.............................................................................................286

SSL Communication Properties................................................................................................................288

vFabric SQLFire MBeans..........................................................................................................................288

Chapter 49: Best Practices for Tuning Performance...........................291

Tune Application Logic..............................................................................................................................291

Reduce Distribution Overhead..................................................................................................................291

Reduce Overhead of Eviction to Disk.......................................................................................................292

Minimize Update Latency for Replicated Tables.......................................................................................292

Tune FabricServers...................................................................................................................................292

Tuning Disk I/O.........................................................................................................................................293

Running SQLFire in Virtualized Environments..........................................................................................294

Chapter 50:Detecting and Handling Network Segmentation ("Split Brain").295

Part VII: vFabric SQLFire Reference......................................................297

Chapter 51: Configuration Properties..................................................299

Chapter 52: JDBC API............................................................................331

Mapping java.sql.Types to SQL Types......................................................................................................331

java.sql.BatchUpdateException Class......................................................................................................332

java.sql.Connection Interface....................................................................................................................332

java.sql.DatabaseMetaData Interface.......................................................................................................333

java.sql.Driver Interface............................................................................................................................335

java.sql.DriverManager.getConnection Method........................................................................................335

java.sql.PreparedStatement Interface.......................................................................................................335

java.sql.ResultSet Interface......................................................................................................................337

java.sql.SavePoint Class...........................................................................................................................337

java.sql.SQLException Class....................................................................................................................337

java.sql.Statement Class..........................................................................................................................337

javax.sql.XADataSource...........................................................................................................................337

Chapter 53: sqlf Launcher Commands................................................339

sqlf backup................................................................................................................................................340

sqlf compact-all-disk-stores......................................................................................................................343

sqlf compact-disk-store.............................................................................................................................344

sqlf encrypt-password...............................................................................................................................345

sqlf install-jar.............................................................................................................................................345

sqlf list-missing-disk-stores.......................................................................................................................348

sqlf locator................................................................................................................................................349

sqlf Logging Support.................................................................................................................................352

sqlf merge-logs.........................................................................................................................................353

sqlf remove-jar..........................................................................................................................................353

sqlf replace-jar..........................................................................................................................................355

sqlf revoke-missing-disk-store...................................................................................................................358

sqlf server.................................................................................................................................................359

sqlf shut-down-all......................................................................................................................................368

sqlf stats....................................................................................................................................................369

sqlf validate-disk-store..............................................................................................................................372

sqlf version................................................................................................................................................373

sqlf write-data-dtd-to-file...........................................................................................................................373

sqlf write-data-to-db..................................................................................................................................376

sqlf write-data-to-xml................................................................................................................................380

sqlf write-schema-to-db............................................................................................................................384

sqlf write-schema-to-sql............................................................................................................................388

sqlf write-schema-to-xml...........................................................................................................................392

Chapter 54: sqlf Interactive Commands...............................................397

absolute ...................................................................................................................................................397

after last ...................................................................................................................................................398

async .......................................................................................................................................................399

autocommit ..............................................................................................................................................399

before first ................................................................................................................................................400

close ........................................................................................................................................................400

commit .....................................................................................................................................................401

connect ....................................................................................................................................................401

connect client ...........................................................................................................................................402

connect peer ............................................................................................................................................403

describe ...................................................................................................................................................403

disconnect ................................................................................................................................................404

driver ........................................................................................................................................................404

elapsedtime .............................................................................................................................................405

execute ....................................................................................................................................................406

exit ...........................................................................................................................................................407

first............................................................................................................................................................407

get scroll insensitive cursor.......................................................................................................................408

GetCurrentRowNumber ...........................................................................................................................410

help ..........................................................................................................................................................410

last ...........................................................................................................................................................410

LocalizedDisplay.......................................................................................................................................411

MaximumDisplayWidth.............................................................................................................................412

next...........................................................................................................................................................412

prepare ....................................................................................................................................................413

previous....................................................................................................................................................414

protocol.....................................................................................................................................................415

relative......................................................................................................................................................415

remove......................................................................................................................................................416

vFabric SQLFire User's Guidex

Contents

rollback......................................................................................................................................................417

run.............................................................................................................................................................418

set connection...........................................................................................................................................418

show..........................................................................................................................................................419

wait for......................................................................................................................................................422

Chapter 55: SQLFire API........................................................................423

CredentialInitializer...................................................................................................................................424

UserAuthenticator.....................................................................................................................................424

Procedure Implementation Interfaces.......................................................................................................424

Procedure Result Processor Interfaces....................................................................................................426

Chapter 56: SQL Language Reference.................................................431

Keywords and Identifiers...........................................................................................................................431

SQL Statements.......................................................................................................................................432

SQL Clauses.............................................................................................................................................489

SQL Expressions......................................................................................................................................493

JOIN Operations.......................................................................................................................................506

Built-in Functions......................................................................................................................................508

Built-in System Procedures......................................................................................................................550

Data Types................................................................................................................................................573

SQL Standards Conformance...................................................................................................................584

Chapter 57: System Tables.....................................................................605

ASYNCEVENTLISTENERS table.............................................................................................................605

GATEWAYRECEIVERS table...................................................................................................................606

GATEWAYSENDERS table.......................................................................................................................607

MEMBERS system table...........................................................................................................................608

MEMORYANALYTICS system table..........................................................................................................609

STATEMENTPLANS system table............................................................................................................609

SYSALIASES system table.......................................................................................................................610

SYSCHECKS system table.......................................................................................................................610

SYSCOLPERMS system table.................................................................................................................611

SYSCOLUMNS system table....................................................................................................................612

SYSCONGLOMERATES system table.....................................................................................................613

SYSCONSTRAINTS system table............................................................................................................613

SYSDEPENDS system table....................................................................................................................614

SYSDISKSTORES system table..............................................................................................................615

SYSFILES system table............................................................................................................................615

SYSFOREIGNKEYS system table............................................................................................................616

SYSKEYS system table............................................................................................................................616

SYSROLES system table.........................................................................................................................617

SYSROUTINEPERMS system table.........................................................................................................618

SYSSCHEMAS system table....................................................................................................................619

SYSSTATEMENTS system table..............................................................................................................619

SYSSTATISTICS system table..................................................................................................................620

SYSTABLEPERMS system table..............................................................................................................620

SYSTABLES system table........................................................................................................................622

SYSTRIGGERS system table...................................................................................................................624

SYSVIEWS system table..........................................................................................................................625

Chapter 58: Exception Messages and SQL States..............................627

Chapter 59: ADO.NET Driver Reference...............................................631

SQLFire Data Types in ADO.NET.............................................................................................................631

VMware.Data.SQLFire.BatchUpdateException........................................................................................631

VMWare.Data.SQLFire.SQLFClientConnection.......................................................................................631

VMware.Data.SQLFire.SQLFCommand...................................................................................................636

VMware.Data.SQLFire.SQLFCommandBuilder........................................................................................642

VMware.Data.SQLFire.SQLFType............................................................................................................642

VMware.Data.SQLFire.SQLFDataAdapter...............................................................................................643

VMware.Data.SQLFire.SQLFDataReader................................................................................................645

VMware.Data.SQLFire.SQLFException....................................................................................................647

VMware.Data.SQLFire.SQLFParameter...................................................................................................648

VMware.Data.SQLFire.SQLFParameterCollection...................................................................................648

VMware.Data.SQLFire.SQLFTransaction.................................................................................................649

Chapter 60: vFabric SQLFire Limitations.............................................653

SQL Language Limitations.......................................................................................................................653

ADO.NET Driver Limitations.....................................................................................................................660

Troubleshooting Common Problems........................................................................663

vFabric SQLFire Glossary..........................................................................................665

vFabric SQLFire User's Guidexii

About the SQLFire User's Guide

Revised July 31, 2012. The SQLFire User's Guide provides step-by-step procedures for installing, conﬁguring, managing, and developing

applications with VMware® vFabric™ SQLFire. The guide also provides a complete reference for the SQLFire tools, supported SQL statements, and APIs.

Intended Audience

The SQLFire User's Guide is intended for anyone who wants to install or deploy SQLFire, and for developers who want to program applications that access a SQLFire system and/or implement SQLFire APIs. The guide assumes that you are familiar with SQL databases and database terminology, and that you have experience in developing database applications using Java or .NET technologies.

Supported Configurations and System Requirements

Before installing vFabric SQLFire, make sure your system meets the minimum system requirements for installing and running the product.

• Supported Conﬁgurations on page 15

• Host Machine Requirements on page 16

• Increase Unicast Buffer Size on Linux Platforms on page 16

• Disable SYN Cookies on Linux Platforms on page 16

• Client Requirements on page 16

Supported Configurations

The following table shows all supported conﬁgurations for vFabric SQLFire. These include vFabric Suite

supported conﬁgurations, as well as additional conﬁgurations supported by standalone vFabric SQLFire.

Note:

The tables indicate whether the supported conﬁguration is for production or development. Generally, production support means you can run your production application on the platform; developer support means you can develop on the platform but you should not run your production application on it.

Operating System

Architecture

SP2 Professional or Enterprise*

Standard*

Professional or Enterprise*

Production or Developer SupportJVMProcessor

ProductionJava SE 6, update 0_26x86_64, x86_32Red Hat EL 5

ProductionJava SE 6, update 0_26x86_64, x86_32Red Hat EL 6

ProductionJava SE 6, update 0_26x86_64, x86_32Windows Server 2003 R2

ProductionJava SE 6, update 0_26x86_64, x86_32Windows Server 2008 R2

DeveloperJava SE 6, update 0_26x86_64, x86_32Windows 7 SP1

DeveloperJava SE 6x86_64Windows XP

DeveloperJava SE 6x86_64Ubuntu 10.x, 11.x**

*The Microsoft Loopback Adapter is not supported. **Only stable and non-EOL releases supported.

Note: The SQLFire product download does not include Java; you must download and install a supported JDK for your system.

Host Machine Requirements

Requirements for each host:

• A supported Java SE installation.

• File system that supports long ﬁle names.

• Adequate per-user quota of ﬁle handles (ulimit for Linux)

• TCP/IP.

• System clock set to the correct time.

•

For each Linux host, the hostname and host ﬁles must be properly conﬁgured. See the system manpages for hostname and hosts.

• Time synchronization service such as Network Time Protocol (NTP).

Note: For troubleshooting, you must run a time synchronization service on all hosts. Synchronized time stamps allow you to merge log messages from different hosts, for an accurate chronological history of a distributed run.

Increase Unicast Buffer Size on Linux Platforms

On Linux platforms, execute the following commands as the root user to increase the unicast buffer size:

sysctl -w net.core.rmem_max=1048576 sysctl -w net.core.wmem_max=1048576

Disable SYN Cookies on Linux Platforms

Many default Linux installations use SYN cookies to protect the system against malicious attacks that ﬂood TCP SYN packets. The use of SYN cookies dramatically reduces network bandwidth, and can be triggered by a running SQLFire distributed system.

If your SQLFire distributed system is otherwise protected against such attacks, disable SYN cookies to ensure that SQLFire network throughput is not affected.

To disable SYN cookies permanently:

Edit the /etc/sysctl.conf ﬁle to include the following line:

net.ipv4.tcp_syncookies = 0

Setting this value to zero disables SYN cookies.

Reload sysctl.conf:

sysctl -p

Client Requirements

SQLFire supports two JDBC drivers: a thin client JDBC dri ver and a peer JDBC driv er . SQLFire server instances and the peer driver are supported only with Java SE 6. You can download Java from

http://www.oracle.com/technetwork/java/javase/downloads/index.htm.

SQLFire provides a managed ADO.NET driver that you can use for developing non-Java client applications. The ADO.NET driver uses IKVM technology to translate SQLFire JDBC core libraries to .NET MSIL. The ADO.NET driver is supported for applications built using the Microsoft .NET 2.0 SP1 or higher framework.

vFabric SQLFire User's Guide16

Part 1

Getting Started with vFabric SQLFire

Getting Started with vF abric SQLFir e provides step-by-step procedures for installing, conﬁguring, and using VMw are vFabric™ SQLFire. The guide also explains main concepts and provides tutorials to help you quickly begin using SQLFire.

Topics:

• Overview of vFabric SQLFire

• Understanding the SQLFire Distributed System

• Installing vFabric SQLFire

• Activating vFabric SQLFire Licensing

• Upgrading vFabric SQLFire

• Connect to vFabric SQLFire with JDBC Tools

• Tutorials

• vFabric SQLFire Features and Benefits

Chapter 1

Overview of vFabric SQLFire

vFabric SQLFire is a memory-optimized, distributed database management system designed for applications that hav e demanding scalability and availability requirements. Applications can manage database tables entirely in memory, or they can persist tables to disk to reload the data after restarting the system. A SQLFire distributed system can be easily scaled out using commodity hardware.

Getting Started with vFabric SQLFire

Data Flow Diagram

GemFire, Apache Derby, and SQL Components

vFabric SQLFire incorporates core vFabric GemFire technology and Apache Derby RDBMS components to provide a high-performance, distributed database management system. SQLFire extends standard SQL statements where necessary for creating and managing tables and conﬁguring the SQLFire system.

The sections that follow document how SQLFire utilizes the GemFire and Derby component functionality.

vFabric GemFire Technology

vFabric SQLFire incorporates the following VMware vFabric GemFire technology:

• Reliable data distribution

• High performance replication and partitioning

• Caching framework

• Parallel 'data-aware' application behavior routing

vFabric SQLFire User's Guide4

Overview of vFabric SQLFire

The SQLFire community site provides a comparison of SQLFire to other data management systems, such as vFabric GemFire.

Apache Derby RDBMS Components

SQLFire integrates vFabric GemFire functionality with several components of the Apache Derby relational database management system (RDBMS):

• JDBC driver. SQLFire supports a native, high performant JDBC driver (peer driver) and a thin JDBC driver. The peer driver is based on the Derby embedded driver and JDBC 4.0 interfaces, but all communication with SQLFire servers is implemented through the vFabric GemFire distribution layer.

• Query engine. SQLFire uses Derby to parse the SQL queries and generate parse trees. SQLFire injects its own logic for intermediate plan creation and distributes the plan to data stores in the cluster . SQLFire also capitalizes on some aspects of the built-in optimizer in Derby to generate query plans. The query execution itself uses memory-based indexes and custom storage data structures. When query e xecution requires distribution, SQLFire uses a custom algorithm to execute the query in parallel on multiple data stores.

• Network server. SQLFire servers embed the Derby network server for connectivity from thin JDBC and ADO.NET clients. The communication protocol is based on the DRDA standard that is used by in IBM DB2 drivers.

SQL Extensions

vFabric SQLFire modiﬁes and extends the query engine and SQL interface to provide support for partitioned and replicated tables, data-aware procedures, data persistence, data eviction, and other features unique to the distributed SQLFire architecture. SQLFire also adds SQL commands, stored procedures, system tables, and functions to help easily manage features of the distributed system, such as persistent disk stores, listeners, and locators.

Chapter 2

Understanding the SQLFire Distributed System

A SQLFire deployment consists of distributed member processes that connect to each other to form a peer-to-peer network, also known as a distributed system or SQLFire cluster.

The sections that follow explain the interactions of main system components and processes. Tutorials on page 27 help you get started conﬁguring and using a SQLFire distributed system.

vFabric SQLFire Members

Member processes form a single, logical system, and each member has single-hop access to any other member, with single-hop or no-hop access to data.

A SQLFire member is an instance of the SQLFire code that runs in a JVM. A SQLFire member can optionally host data, provide network server functionality for client connections, and provide location services for the distributed system.

A SQLFire distributed system is dynamic, and members can be added or removed at any time. The SQLFire implementation guarantees a consistent view of the distributed system to ensure that data consistency and data integrity are not compromised.

Most SQLFire members are conﬁgured to host data, and are referred to as data stores. Members that are conﬁgured to not host data are referred to as accessors. Both data stores and accessors can execute the DDL and DML commands that SQLFire supports. Data stores provide single-hop or no-hop access to data stored that is stored on members of the distributed system. Accessors provide single-hop access to data stores in the distributed system. Data stores and accessors are licensed separately.

A third type of member, the standalone locator, does not host data and does not support DDL and DML statements on user-deﬁned tables. You use locators to discover members of the SQLFire cluster.

A SQLFire agent is an optional distributed system member that provides access to JMX MBeans for monitoring and managing SQLFire.

For more information, see:

• Starting and Conﬁguring SQLFire Servers on page 215

• Using Locators on page 211

• Start a SQLFire Peer with the Peer Client JDBC Driver on page 107

• Overview of the vFabric SQLFire JMX Agent on page 281

Getting Started with vFabric SQLFire

Servers, Peer Clients, and Server Groups

A SQLFire server is a process that hosts data and is a member of a peer-to-peer distributed system. SQLFire servers run in Java Virtual Machines (JVMs).

You start a SQLFire server using the sqlf tool from a command prompt or terminal window. sqlf launches servers as standalone processes that are akin to database servers. The servers can accept TCP connections from thin clients, authenticate credentials, manage sessions, delegate work to a thread pool for SQL processing, and so forth.

A peer client, also known as an embedded client, is a SQLFire-aware JDBC client that connects to the distrib uted system using the JDBC peer driver. Peer clients are always connected to the distributed system, and they have single-hop access to data. A peer client can be conﬁgured as a pure client (referred to as an accessor member) or as a client that also hosts data (a data store).

Both JDBC peer client processes and server processes are peer members of the distributed system. The members discover each other dynamically through a built-in multicast based discovery mechanism or by using a locator service when TCP discovery is more desirable.

Note: In addition to peer client members that participate in a cluster, SQLFire servers support thin client connections from JDBC and ADO.NET thin client drivers. See Developing Java Clients and Peers on page 105 and Developing ADO.NET Client Applications on page 117.

SQLFire servers and peer clients that host data (when the host-data property is set to true) are automatically part of the default server group. A server gr oup is a logical grouping of SQLFire serv er and peer client members that deﬁnes the data stores that should host data for table. When any SQLFire object is created, such as a table, in the CREATE TABLE statement you can specify the server group name where the table will be hosted. If no group is speciﬁed, the table is hosted in the default server group. Using Server Groups to Manage Data on page 47 provides additional information.

For more information, see:

• Starting and Conﬁguring SQLFire Servers on page 215

• Using Locators on page 211

• Start a SQLFire Peer with the Peer Client JDBC Driver on page 107

• Using Server Groups to Manage Data on page 47

Discovery Mechanisms

A peer member (a server or peer client process) announces itself to the distributed system using one of two mechanisms.

SQLFire provides these discovery mechanisms:

• Locator (TCP/IP). A locator service maintains a registry of all peer members in the distrib uted system at an y

given moment. A locator is typically started as a separate process (with redundancy), but you can also embed a locator in any peer member, such as a SQLFire server. The locator opens a TCP port through which all new members connect to get initial membership information.

• UDP/IP Multicast. Members can optionally use a multicast address to broadcast their presence and receive

membership notiﬁcation information.

Conﬁguring Discovery Mechanisms on page 211 provides more information.

Group Membership Service

The Group Membership Service (GMS) uses self-deﬁned system membership. Processes can join or leave the distributed system at any time. The GMS communicates this information to every other member in the system,

vFabric SQLFire User's Guide8

Understanding the SQLFire Distributed System

with certain consistency guarantees. Each member in the group participates in membership decisions, which ensures that either all members see a new member or no members see it.

The membership coordinator, a key component of the GMS, handles "join" and "lea ve" requests, and also handles members that are suspected of having left the system. The system automatically elects the oldest member of the distributed system to act as the coordinator, and it elects a new one if the member fails or is unreachable. The coordinator's basic purpose is to relay the current membership view to each member of the distributed system and to ensure the consistency of the view at all times.

Because the SQLFire distributed system is dynamic, you can add or remove members in a very short time period. This makes it easy to reconﬁgure the system to handle added demand (load).The GMS permits the distributed system to progress under conditions in which a statically-deﬁned membership system could not. A static model deﬁnes members by host and identity, which mak es it difﬁcult to add or remove members in an acti ve distributed system. The system would have to shut do wn partially or completely to expand or contract the number of members that participate in the system.

For more information, see:

• Start and Stop SQLFire Servers Using sqlf on page 215

• Connect to a Distributed System Using Locators on page 213

• Rebalancing Partitioned Data on SQLFire Members on page 59

Replicated Tables and Partitioned Tables

Tables in SQLFire can be partitioned or replicated. A replicated table keeps a copy of its entire data set locally on every SQLFire server in its server group. A partitioned table manages large volumes of data by partitioning it into manageable chunks and distributing those chunks across all members in the table's server group.

By default, all tables are replicated unless you specify partitioning in the CREATE TABLE statement. (You can change the default behavior to create partitioned tables using the table-default-partitioned boot property). The schema information for all SQLFire objects is visible at all times to all peer members of the distributed system including peer clients, but excluding standalone locators.

Partitioning Tables on page 51 and Replicating Tables on page 61 provide more information.

Parallel Execution of Data-Aware Stored Procedures

In a traditional relational database, stored procedures are application routines that are stored as part of the data dictionary and executed on the database system itself. Stored procedures generally offer high performance because they execute in close proximity to data required by the application logic. SQLFire extends this basic stored procedure capability to support parallel execution of application logic on table data that is partitioned across many peers.

SQLFire applications can execute stored procedures on speciﬁc data hosts, in parallel on all the members of a server group, or can target speciﬁc members based on the data requirements for the procedure. Essentially, application behavior that is encapsulated in stored procedures is moved to the process that hosts the associated data set, and it is executed there. If the required data set is spread across multiple partitions, the procedure is executed in parallel on the partition members. Results are streamed to a coordinating member and aggregated for the client invoking the procedure.

For example, consider an 'Order' table that is partitioned by its 'customer_id', and an application wanting to execute an expensive 'credit check' for several customers. Assume the credit test requires iteration over all the order history. You can parallelize the execution on all members that manage data for these customers and stream the results to the client. All order history required by each execution is locally available in-process.

// typical procedure call CallableStatement callableStmt = connection.prepareCall("{CALL

Getting Started with vFabric SQLFire

order_credit_check(?) "); callableStmt.setArray(1, <list of customer IDs>);

// SQLFire data-aware procedure invocation CallableStatement callableStmt = connection.prepareCall("{CALL order_credit_check() " + "ON TABLE Orders WHERE customerID IN (?)}"); callableStmt.setArray(1, <list of customer IDs>);

// order_credit_check will be executed in parallel on all members where the orders // corresponding to the customerIDs are managed

For more information, see:

• Using Data-Aware Stored Procedures on page 143

• Using the Procedure Provider API on page 149

• Using the Custom Result Processor API on page 153

Cache Plug-ins for External Data Connections

SQLFire is commonly used as a distributed SQL cache in an embedded (peer client) or client-server conﬁguration. It provides a plug-in framework to connect the cache to external data sources such as another relational database.

Several cache plug-ins are supported:

• Reading from Backend on a Miss ("read through")

When SQLFire is used as a cache, applications can conﬁgure a SQL RowLoader that is triggered to load the data from a backend repository on a miss in SQLFire. When an incoming query request for a uniquely identiﬁed row cannot be satisﬁed by the distributed cache, the loader is invoked to retrieve the data from an external source. SQLFire locks the associated row and prevents concurrent readers trying to fetch the same row from bombarding the backend database.

Note: When SQLFire is used as a "pure" cache (that is, some or all tables are conﬁgured with LRU eviction, and only the actively used subset of data is cached in SQLFire), queries on these tables can be based only on the primary key. Only primary key-based queries invoke a conﬁgured loader. Any other query potentially could produce inconsistent results from the cache.

See Using a RowLoader to Load Existing Data on page 175.

• Synchronous Write Back to Backend ("write through")

When data is strictly managed in memory, even with replication, a failure of the entire cluster means loss of data. With synchronous write-through, all changes can be synchronously written to the backend repository before the cache is changed. If and only if the write-through succeeds, the data becomes visible in the cache. You conﬁgure synchronous write-through with a SQLFire "cache writer". See Handling DML Events

Synchronously on page 183.

• Asynchronous Write Back to Backend ("write behind")

If synchronous writes to the backend are too costly, the application can conﬁgure the use of a "write behind cache listener". SQLFire supports several options for how the events are queued, batched, and written to the database of record. It is designed for very high reliability and handles many failure conditions. Persistent queues ensure writes even when the backend database is temporarily unavailable. You can order event delivery or batch it based on a time interval. You can also conﬂate a certain number of rows and continuous updates to a single update, to reduce the load on the backend database of an 'update heavy' system. See Handling DML

Events Asynchronously on page 185.

vFabric SQLFire User's Guide10

Chapter 3

Installing vFabric SQLFire

You can install vFabric SQLFire from the VMware yum repository (RHEL only) or from a downloaded JAR ﬁle. Installation procedures also vary according to whether you obtained vFabric SQLFire as a standalone product or as part of VMware® vFabric Suite™Advanced.

Installation Note for vFabric Suite Customers

vFabric SQLFire is available as a standalone product installation, and as part of vF abric Suite Advanced. vF abric Suite is a set of runtime components that let you build, scale, and run modern, agile applications in virtual environments.

If you obtain SQLFire standalone, you can install it on physical or virtual machines according to the procedures in this document.

If you obtain SQLFire as part of vFabric Advanced, you install it exclusively on VMware virtual machines that run on vSphere. The vFabric Suite install process adds the vFabric License Server to an e xisting vCenter serv er. The vFabric License Server accepts a network license, actually a pool of licenses for vFabric components, which makes it easy to manage the licensing of all components installed on the VMs.

If you obtain SQLFire as part of vFabric Suite Advanced, ﬁrst complete the license activation and installation procedures in Getting Started with vFabric Suite . Then follow procedures in this document to set up your environment for SQLFire and complete any remaining SQLFire-speciﬁc installation and conﬁguration tasks.

RHEL Only: Install vFabric SQLFire from the VMWare YUM Repository

If your operating system is Red Hat Enterprise Linux (RHEL), you have the option to use yum to install and manage vFabric SQLFire instead of using the JAR ﬁle installer.

The yum installation process can be used whether you have purchased vFabric Suite or only the vFabric SQLFire product. vFabric Suite users must ﬁrst install the vFabric License Server in the vCenter environment before installing components. See Getting Started with vFabric Suite for more information.

Prerequisites

See Supported Conﬁgurations and System Requirements on page 15 for speciﬁc host machine requirements for Linux systems.

Procedure

1. Log in as the root user to the RHEL VM on which you are going to install the vFabric component and start

a terminal.

2. Run the rpm command to install both the vfabric-5-repo and vfabric-all-repo RPMs from the VMware

repository.

Getting Started with vFabric SQLFire

For RHEL 5:

rpm -Uvh http://repo.vmware.com/pub/rhel5/vfabric/5.1/vfabric-5.1-repo-5.1-1.noarch.rpm rpm -Uvh http://repo.vmware.com/pub/rhel5/vfabric-all/vfabric-all-repo-1-1.noarch.rpm

For RHEL 6:

rpm -Uvh http://repo.vmware.com/pub/rhel6/vfabric/5.1/vfabric-5.1-repo-5.1-1.noarch.rpm rpm -Uvh http://repo.vmware.com/pub/rhel6/vfabric-all/vfabric-all-repo-1-1.noarch.rpm

Execute the yum install command to install vFabric SQLFire:

yum install vfabric-sqlfire

Note: The yum install command installs the most recent version of the vFabric SQLFire RPM that it ﬁnds in all installed repositories. If you want to install another version, you must explicitly specify the version with the yum install command. Use yum search vfabric-sqlfire

--showduplicates to ﬁnd all versions that are available in the installed repositories.

The yum command begins the install process, resolves dependencies, and displays the packages it plans to install.

If this is the ﬁrst time that you install a vFabric component on the VM, the yum command also installs the vfabric-eula RPM and prompts you to accept the VMware license agreement.

Enter y at the prompt to begin the actual installation.

5. The installer may prompt you to install a GNU Privacy Guard (GPG) k ey for the vFabric repositories. Enter y at each prompt to continue the installation.

Press Space to scroll and read the license agreement. When you reach the end of the agreement, enter yes to accept the license.

The installer displays Complete! when the installation process is ﬁnished.

7. If you have not already done so, download and install a compatible JDK or JRE on the RHEL computer or VM.

Optionally, specify that the vfabric-sqlfire process should automatically start when the operating system starts by running the following command:

chkconfig --level 35 vfabric-sqlfire on

Optionally, specify the conﬁguration of the vfabric-sqlfire process by editing the ﬁle /etc/sysconfig/sqlfire, which is the ﬁle sourced by the script that you will later use to start the SQLFire process (/etc/init.d/vfabric-sqlfire.)

The /etc/sysconfig/sqlfire ﬁle includes many comments to help you decide whether you need to modify it. Here are additional pointers:

• If you do not modify the /etc/sysconﬁg/sqlﬁre ﬁle but simply use the one installed by default, the

vfabric-sqlﬁre process starts up a server instance in a multicast conﬁguration.

• If you want the vfabric-sqlﬁre process to start up using a locator-based conﬁguration, change the LOCATOR

property in the /etc/sysconﬁg/sqlﬁre ﬁle to local, as shown:

LOCATOR=local

This conﬁguration allows a local locator process to start along with a local server instance. To add additional remote locators, add their IP address and port to the LOCA T OR_IP_STRING as shown in the conﬁguration ﬁle as a commented-out example.

vFabric SQLFire User's Guide12

Installing vFabric SQLFire

• If you want to start up only a local locator process and not a local server instance, set the LOCATOR property to locator_only. This sets up a redundant locator conﬁguration; be sure you add the locator IP addresses and port numbers to the LOCATOR_IP_STRING; an example is shown in the conﬁguration ﬁle.

• Finally, set the LOCATOR property to remote if you want to start a local server instance that relies on having locator processes running on one or more remote hosts. Specify these remote hosts using the LOCATOR_IP_STRING property.

10. Start the processes associated with SQLFire by running the following command:

/etc/init.d/vfabric-sqlfire start

By default, the process uses an evaluation license; if you ha ve purchased a production license, see Acti vating

vFabric SQLFire Licensing on page 15 for information about conﬁguring it in the

/opt/vmware/vfabric-sqlfire/vFabric_SQLFire_10x/sqlfire.properties ﬁle. The RPM installation process creates a skeleton sqlfire.properties ﬁle to get you started.

T o stop, restart, and get status about the processes, pass the stop, restart, and status parameters, respectively, to the /etc/init.d/vfabric-sqlfire script:

/etc/init.d/vfabric-sqlfire status

Install vFabric SQLFire from a JAR File

This procedure describes how to install the SQLFire software on a single computer or VM, for either a peer client or client/server deployments. Repeat the procedure to install and license SQLFire on each physical or virtual machine where you want to run a SQLFire member.

Prerequisites

1. Conﬁrm that your system meets the hardware and software requirements described in Supported Conﬁgurations

and System Requirements on page 15.

2. Download SQLFire from the VMware downloads page.

Procedure

1. The SQLFire installation program is distributed as an executable JAR ﬁle, and it requires Java SE 6 or higher JVM to run. Ensure that the path to a supported Java executable is in your PATH environment variable:

java -version

2. Change to the directory where you downloaded the SQLFire software, and execute the JAR ﬁle installer:

java -jar vFabric_SQLFire_10x_Installer.jar

Use the actual ﬁlename of the installer you downloaded (for example, vFabric_SQLFire_103_Installer.jar).

Press Enter to scroll and read the license agreement. When you reach the end of the agreement, enter agree to accept the license.

4. Enter the full path where you want to install SQLFire. By default, SQLFire is installed in the directory where you executed the JAR ﬁle.

Enter yes to create the directory if necessary, or to verify that the installation directory is correct. The installer copies the license agreement and installs SQLFire to a top-level vFabric_SQLFire_10x

directory in the location you speciﬁed.

6. Conﬁgure your SQLFire license, if you have purchased a production license. Activating vFabric SQLFire

Licensing on page 15 provides more information.

Getting Started with vFabric SQLFire

To begin using the sqlf utility to start servers or execute other commands, add the

vFabric_SQLFire_10x/bin directory to your path. For example: export PATH=$PATH:/vFabric_SQLFire_10x/bin

The sqlf script automatically sets the class path relative to the installation directory.

8. Repeat this procedure to install and license SQLFire on each different computer where you want to run a SQLFire member.

vFabric SQLFire User's Guide14

Chapter 4

Activating vFabric SQLFire Licensing

vFabric SQLFire includes a default evaluation license that enables you to run the product tutorial and perform simple evaluation acti vities. You can obtain custom evaluation licensing and purchase production licensing from your VMware account manager or the VMware license portal.

Understand vFabric SQLFire License Options

vFabric SQLFire has a number of licensing options. Read the following sections to understand the different license types.

Default Evaluation License

The default evaluation license included with SQLFire allows you to run up to three non-production data stores in the distributed system. SQLFire uses the default evaluation license upon startup if custom evaluation or production licensing has not been installed. This license never expires. It is not allowed for production use.

Custom Evaluation and Production Licenses

T o get expanded conﬁguration options and to move into production with SQLFire, contact your VMware account manager. You can obtain custom evaluation licenses for your speciﬁc ev aluation requirements. Y ou can purchase a production license when you are ﬁnished with your evaluation. Your VMware account manager will help you determine the best topology and expiration options for your needs. For information on purchasing SQLFire licensing, see http://www.vmware.com/products/datacenter-virtualization/vfabric-sqlﬁre/buy.html.

Custom evaluation licenses have an expiration date. Production licenses never expire.

vFabric SQLFire Professional and Enterprise Licenses

SQLFire uses these main license types for custom evaluation and production licensing:

• vFabric SQLFire Professional license. The vFabric SQLFire Professional license enables you to deploy a

SQLFire distributed system in a production environment. A license is required for each SQLFire distributed system member that hosts data, whether or not they also act as a server in client/server deployments.

• vFabric SQLFire Enterprise license. The standalone vFabric SQLFire Enterprise license enables you to

deploy a SQLFire distributed system in a production environment. This license also enables you to loosely couple SQLFire distributed systems that are at different geographical locations (WAN replication). A license is required for each SQLFire distributed system member that hosts data, whether or not they also act as a server in client/server deployments.

Because the vFabric SQLFire Enterprise license is available only as a standalone license, you must install the license in the sqlfire.properties ﬁle or as a boot property. You cannot install a vFabric SQLFire Enterprise license in the vFabric License Server.

Getting Started with vFabric SQLFire

SQLFire licenses do not restrict the number of clients that can connect to a distributed system. Any number of thin clients and peer clients (SQLFire accessor members) are permitted.

vFabric SQLFire Local Licensing and vFabric Suite Licensing

If you obtained vFabric SQLFire as a standalone product, you install a license locally on each physical and virtual machine that runs vFabric SQLFire.

If you obtained vFabric SQLFire as part of vFabric Suite, you can either use the vFabric License Server to install your SQLFire licenses, or install the license locally in vSphere virtual machines. To use the vFabric License Server for licensing, you must ﬁrst install and conﬁgure the vFabric License Server. Then you install your SQLFire serial numbers in the vFabric License Server. SQLFire will retrieve your licenses dynamically.

To set up the vFabric License Server, see Install vFabric License Server on Your vCenter Server in the vFabric Suite documentation.

Note: You cannot use the vFabric License Server to manage licenses on physical machines, or with standalone SQLFire licenses such as vFabric SQLFire Enterprise.

Choose a License Option Based on Topology

How you plan to use vFabric SQLFire determines which SQLFire license offerings are right for your system. Licensing options are available for the following topologies:

• Single-site System with Multiple Data Stores on page 16

• Multi-site (WAN) System on page 17

For further information about SQLFire purchase options, refer to

http://www.vmware.com/products/datacenter-virtualization/vfabric-sqlﬁre/buy.html.

Single-site System with Multiple Data Stores

Any client/server system or peer-to-peer system requires a vFabric SQLFire license for the SQLFire data stores (SQLFire members that host data). You conﬁgure these licenses for each SQLFire member.

The vFabric SQLFire Professional license enables you to run multiple data stores in a single cluster, with unlimited clients.

Figure 1: Example vFabric SQLFire Professional Deployment

If you need to replicate table data across multiple SQLFire clusters, you must purchase and install a standalone vFabric SQLFire Enterprise license. See Multi-site (WAN) System on page 17.

vFabric SQLFire User's Guide16

Activating vFabric SQLFire Licensing

Installing a vFabric SQLFire Professional License

1. Obtain a product vFabric SQLFire Professional license from VMware.

2. Install the license in SQLFire. Choose one of the following options:

• On each SQLFire data store member, add the vFabric SQLFire license serial number directly to the sqlﬁre.properties ﬁle. See Option 1: Install Licenses Using sqlﬁre.properties on page 19; or

• Create a ﬁle that contains the vFabric SQLFire license serial number, and copy the ﬁle to the vFabric serial number directory on each SQLFire data store member. See Option 2: Install Licenses Using Serial Number

Files on page 20; or

• If you have purchased SQLFire as part of vFabric Suite, install the vFabric SQLFire license using vFabric License Server. See vFabric Suite Only: Conﬁgure vFabric SQLFire for vFabric License Server on page

20.

Multi-site (WAN) System

If you need to replicate table data across multiple SQLFire clusters (for example, over a WAN), you must purchase and install a standalone vFabric SQLFire Enterprise license. Any number of thin clients and peer clients (SQLFire accessor members) are permitted in each cluster. All systems that participate in the WAN installation must be licensed. The license is conﬁgured in each data store member in the distributed system. Any number of thin clients and peer clients (SQLFire accessor members) are permitted in each cluster.

Installing a vFabric SQLFire Enterprise License

Getting Started with vFabric SQLFire

1. Obtain a vFabric SQLFire Enterprise license. You can use the same serial number in each distributed system; however, as a best practice you should obtain a unique serial number for each distributed system that communicates over the WAN.

2. Install the license in SQLFire. Choose one of the following options:

• On each data store member in the distributed system, add the vFabric SQLFire Enterprise serial number

directly to the sqlﬁre.properties ﬁle. See Option 1: Install Licenses Using sqlﬁre.properties on page 19; or

• Create a ﬁle that contains both the vFabric SQLFire Enterprise serial number, and copy the ﬁle to the

vFabric serial number directory on all SQLFire data store members. See Option 2: Install Licenses Using

Serial Number Files on page 20.

3. If you have obtained unique serial numbers for your other distributed systems, install the unique serial numbers on the other distributed systems by using one of the options described in step 2.

How vFabric SQLFire Manages Licensing

Before you install vFabric SQLFire licenses, understand how SQLFire manages your licensing information.

• How vFabric SQLFire Finds and Veriﬁes Your License on page 18

• License Working Directory on page 18

• Local VMware vFabric Directories on page 19

How vFabric SQLFire Finds and Verifies Your License

SQLFire has a default license that it uses if it cannot locate any other valid licensing. Non-default licensing is veriﬁed using a combination of the product sqlfire.jar ﬁle and the serial numbers

you provide, or a license may be acquired dynamically from the vFabric License Server. SQLFire uses the ﬁrst valid licensing it ﬁnds in this list:

Serial number speciﬁed by the license-serial-number boot property.

2. Dynamic licensing provided from a serial number ﬁle stored in the local VMware vFabric serial number directory, conﬁgured by specifying license-serial-number=dynamic as a boot property.

3. Dynamic licensing provided by the vFabric License Server, conﬁgured by specifying license-serial-number=dynamic as a boot property.

If SQLFire cannot validate any of the speciﬁed licensing in the above list, the member process does not start and throws an exception.

If no licensing is speciﬁed, SQLFire uses the default evaluation licensing shipped with the product.

License Working Directory

SQLFire stores licensing information in a directory on your system. SQLFire writes to the ﬁrst writable directory it ﬁnds in this list:

The value of the license-working-dir boot property, if speciﬁed for the member.

2. The SQLFire member's current working directory as determined by System.getProperty("user.dir") at startup.

These are the ﬁles that SQLFire writes:

License state ﬁles with names ending with -license.cfg. Example: vf.sf-license.cfg.

License events ﬁles with names ending with -events.txt. Example: vf.sf-events.txt.

vFabric SQLFire User's Guide18

Activating vFabric SQLFire Licensing

Leave these ﬁles alone. Do not edit or delete these ﬁles, and do not alter the permissions on the ﬁles or the directory where these ﬁles are located. These ﬁles are created using the default permissions of the user who is starting up the SQLFire process. To stop or start the SQLFire process, a user needs write permissions for this directory and the ﬁles in it. Make sure that the permissions for each user are sufﬁcient; otherwise SQLFire may throw an exception during stop or start.

Local VMware vFabric Directories

The location of the local VMware vFabric home directory, if it exists, varies by operating system:

Windows

Linux (or other OS)

%ALLUSERSPROFILE%\VMware\vFabric /opt/vmware/vFabric

The location of the local VMware vFabric serial number directory, if it exists, varies by operating system:

Windows

Linux (or other OS)

%ALLUSERSPROFILE%\VMware\vFabric /etc/opt/vmware/vfabric

Install and Configure vFabric SQLFire Licenses

Installation is required for all licensing except the default evaluation license, which is used automatically when other licensing is not provided. You do not need to install or conﬁgure the default evaluation license.

When you obtain custom evaluation licenses or purchase production licenses, you receive one or more serial numbers to use in your SQLFire member conﬁgurations. The number of serial numbers that you receive depends on the type of licensing that you purchase. You install licensing in all members that you run as data stores. You do not need to install licensing in your clients (thin client applications or SQLFire accessors). Choose a License

Option Based on Topology on page 16 describes the appropriate licensing for each topology.

Note: If you obtain SQLFire as part of vFabric Suite, refer ﬁrst to the license activation procedure in

Getting Started with vFabric Suite in the vFabric Suite documentation. Then complete the tasks in this

procedure, as necessary, to conﬁgure SQLFire-speciﬁc licensing.

Where to Install Your License

If you obtained vFabric SQLFire as a standalone product, you install licenses locally to each SQLFire member by modifying the sqlfire.properties ﬁle, or by supplying a boot property when you start the SQLFire member. See Option 1: Install Licenses Using sqlﬁre.properties on page 19 and Option 2: Install Licenses Using

Serial Number Files on page 20.

If you are installing SQLFire in a vSphere virtual machine as part of vFabric Suite, you can use the vFabric License Server to install your SQLFire licenses and then conﬁgure vFabric SQLFire to communicate with the license server. See vFabric Suite Only: Conﬁgure vFabric SQLFire for vFabric License Server on page 20.

Note: You cannot use the vFabric License Server if you purchased a standalone SQLFire license, such as vFabric SQLFire Enterprise. Install standalone licenses in sqlfire.properties or as a boot property.

Option 1: Install Licenses Using sqlfire.properties

Add a license key to the sqlﬁre.properties ﬁle on each data store member in your deployment, or specify the keys as a boot property when you start a SQLFire member using sqlf or the FabricServer interface.

Getting Started with vFabric SQLFire

This is the recommended way to install licensing for standalone SQLFire deployments. (For SQLFire deployments installed on vSphere virtual machines as part of a vFabric Suite deployment, VMware recommends that you use the vFabric License Server. See vFabric Suite Only: Conﬁgure vFabric SQLFire for vFabric License Server on page 20.)

For example:

# sqlfire.properties for data store or accessor member license-serial-number=#####-#####-#####-#####-#####

If you need to specify multiple serial numbers, use a comma separated list:

# sqlfire.properties for data store or accessor member license-serial-number=#####-#####-#####-#####-#####,#####-#####-#####-#####-#####

Option 2: Install Licenses Using Serial Number Files

Place serial numbers in serial number ﬁles in the VMware vFabric serial number directory. T o conﬁgure SQLFire to use these for licensing, you need to specify license-serial-number=dynamic

as a boot property. You can specify this property in sqlfire.properties, or when starting a SQLFire member using sqlf or the FabricServer interface.

Use this procedure to install and conﬁgure serial number ﬁles for any standalone SQLFire license.

Create a ﬁle named vf.sf-serial-numbers.txt and paste the serial number into it.

Note: If you have multiple serial numbers, enter each serial number on a new line.

2. Save the ﬁle to the appropriate serial numbers directory on each SQLFire member. See Local VMware

vFabric Directories on page 19 for the appropriate directory based on your operating system.

On all SQLFire data store members, specify license-serial-number=dynamic in

sqlfire.properties, or as a boot property. For example: # sqlfire.properties for dynamic licensing of data stores

license-serial-number=dynamic

vFabric Suite Only: Configure vFabric SQLFire for vFabric License Server

You can only use this conﬁguration method if you are running SQLFire on a vSphere virtual machine as part of vFabric Suite. This method is not applicable for installing a standalone vFabric SQLFire license, such as a vFabric SQLFire Enterprise license.

To set up vFabric License Server licensing, refer ﬁrst to the license activation procedure in Activate vFabric

Suite Licenses in the vFabric Suite documentation.

After you install SQLFire licenses on the vFabric License Server, conﬁgure dynamic licensing in SQLFire:

Specify license-serial-number=dynamic in the sqlfire.properties ﬁle on each data store or accessor member. For example:

# sqlfire.properties for dynamic licensing license-serial-number=dynamic

2. (Optional) Change the default timeout value (10000) to indicate the maximum time in milliseconds that the member should wait when obtaining a license dynamically from the vFabric License Server. For example, in sqlﬁre.properties:

#timeout in milliseconds license-server-timeout=20000

vFabric SQLFire User's Guide20

Activating vFabric SQLFire Licensing

Verify Your License and Check Your License Usage

You can verify the validity of your license and monitor your license usage. If you install an invalid serial number or if SQLFire cannot obtain a dynamic license from the vFabric License

Server, SQLFire fails to start and throws an exception. In Getting Started with vFabric Suite, see "Using the vFabric Licensing Check Tool" to learn how to verify your license.

See Licensing Problems on page 663 for a list of common problems and their solutions. If you obtained vFabric SQLFire as part of vFabric Suite and want to check your license usage, see Monitoring

vFabric License Usage in the vFabric Suite documentation.

Chapter 5

Upgrading vFabric SQLFire

SQLFire 1.0.x is compatible with persistence ﬁles (disk stores) created in earlier versions of SQLFire. If you use the JAR ﬁle installation program, simply install SQLFire 1.0.x into a new, dedicated directory, and update your path to use the new vFabric_SQLFire_10x/bin directory (for example, vFabric_SQLFire_103/bin).

Refer to the Release Notes to learn about changes that were introduced in the latest update.

Note: If you installed SQLFire using an RPM from the VMware YUM repository, see RHEL Only: Upgrade

vFabric SQLFire from the VMWare YUM Repository on page 23 for upgrade instructions.

RHEL Only: Upgrade vFabric SQLFire from the VMWare YUM Repository

When VMware releases a new maintenance version of SQLFire, the appropriate VMware YUM repository is updated to include the new RPM. You can use yum to quickly upgrade SQLFire to the latest version.

Prerequisites

• You must have an existing RPM installation of SQLFire. See RHEL Only: Install vFabric SQLFire from the

VMWare YUM Repository on page 11.

• Stop the default vFabric SQLFire server if it is currently running:

/etc/init.d/vfabric-sqlfire stop

Procedure

1. On the RHEL VM or computer on which you have installed vFabric SQLﬁre, start a terminal either as the

root user or as an unprivileged user using sudo.

Execute the yum upgrade command to upgrade SQLFire:

yum upgrade vfabric-sqlfire

Note: The yum upgrade command shown in the preceding e xample upgrades SQLFire to the most recent RPM it ﬁnds in all of the repositories that the yum command knows about. If you want to upgrade to a version that is not the most recent, then you must explicitly specify the version in the yum upgrade command. Use yum search vfabric-sqlfire --showduplicates to ﬁnd all versions of a component in all repositories.

Enter y at the prompt to begin the actual upgrade. When the upgrade process is ﬁnished, you see the Complete! message.

Getting Started with vFabric SQLFire

4. If you use the default SQLFire server conﬁguration, then you must copy user data from the old server installation directory to the new, shared data directory (/var/opt/vmware/sqlfire). For example, if you were upgrading from version 1.0.1 you would enter:

prompt# cp -Rp /opt/vfabric/sqlfire/vFabric_SQLFire_101/datadictionary \ /var/opt/vmware/sqlfire/ prompt# cp -p /opt/vfabric/sqlfire/vFabric_SQLFire_101/vf.sf-license.cfg \ /var/opt/vmware/sqlfire/

5. If you conﬁgured disk persistence or overﬂow for tables and you did not specify an e xplicit disk store to use, then you must copy the default disk store from the old SQLFire installation directory to the new directory. For example:

prompt# cp -p /opt/vfabric/sqlfire/vFabric_SQLFire_101/BACKUPSQLF-DEFAULT-DISKSTORE* \ /var/opt/vmware/sqlfire/

6. If you created new disk stores by specifying path relative to the old SQLFire version's working directory, you must copy those disk stores to the same relative directory in the ne w SQLFire installation. F or e xample, if you created a disk store in the previous release using a command similar to:

sqlf> CREATE DISKSTORE store1 'dir1'; then you must copy the dir1 directory to the new SQLFire installation's working directory:

prompt# cp -Rp /opt/vfabric/sqlfire/vFabric_SQLFire_101/dir1 \ /var/opt/vmware/sqlfire/

Note: Disk stores that you created using a path outside of the older SQLFire working directory continue to work without any changes.

If you modiﬁed /etc/sysconfig/sqlfire from the default conﬁguration prior to the upgrade, the yum upgrade process saved your modiﬁcations to the ﬁle /etc/sysconfig/sqlfire.rpmsave . If you want to continue using these modiﬁcations, then you must manually update the new conﬁguration ﬁle (/etc/sysconfig/sqlfire) with the same changes.

If you made license key modiﬁcations to the old sqlfire.properties ﬁle (such as /opt/vfabric/sqlfire/vFabric_SQLFire_101/sqlfire.properties) prior to the upgrade, the yum upgrade process saved your modiﬁcations to the ﬁle /opt/vfabric/sqlfire/vFabric_SQLFire_101/sqlfire.properties.rpmsave. If you want to continue using these modiﬁcations, then you must manually update the new conﬁguration ﬁle with the same changes.

Upgrade Issues

Upgraded SQLFire members cannot participate in a distributed system that contains SQLFire members from an earlier version. Shut down all existing cluster members before restarting the cluster with upgraded SQLFire members.

Thin clients from earlier versions of SQLFire can operate against the latest version of SQLFire. However, thin clients using the latest SQLFire JDBC drivers are not compatible with older versions of SQLFire.

Refer to the Release Notes to learn about changes that were introduced in the latest update.

vFabric SQLFire User's Guide24

Chapter 6

Connect to vFabric SQLFire with JDBC Tools

Third-party JDBC tools can help you browse data in tables, issue SQL commands, design new tables, and so forth. You can conﬁgure these tools to use the SQLFire JDBC thin client driver to connect to a SQLFire distributed system.

Although the instructions for setting up each tool vary, the general process for establishing a connection involves conﬁguring the JDBC client driver and setting JDBC connection URL properties. Follow these basic steps:

In the third-party tool, choose to conﬁgure a new driver, and select the sqlfireclient.jar ﬁle that contains the SQLFire JDBC client driver. This ﬁle is installed in the lib directory of your SQLFire installation.

2. If the tool does not automatically select a driver class, you will generally have the option of selecting a class from

within the JAR ﬁle. For SQLFire, select the com.vmware.sqlﬁre.jdbc.ClientDriver class.

3. In order to use the client driver, you must specify a JDBC connection URL for your SQLFire distributed system. The basic URL format for the client driver is:

jdbc:sqlfire://hostname:port/

where hostname and port correspond to the -client-bind-address and -client-port value of a SQLFire server or locator in your distributed system.

4. Y our tool may require that you specify a username and password to use for connecting to the system. If the SQLFire server or locator enables authentication (using the -auth-provider boot property), then enter a valid username and password combination to connect to the distributed system.

If authentication is disabled, specify "app" as both the username and password values, or any other temporary v alue.

Note: SQLFire uses the username speciﬁed in the JDBC connection as the schema name when you do not provide the schema name for a database object. SQLFire uses "APP" as the default schema. If your system does not enable authentication, you can specify "APP" for both the username and password to maintain consistency with the default schema behavior.

For a full example of conﬁguring SQLFire with a third-party JDBC tool, see Connecting to VMware vF abric SQLFire

using SQuirreL SQL on the SQLFire community site.

Chapter 7

Tutorials

Learn to conﬁgure and use SQLFire features such as table replication and partitioning, persisting data to disk, and dynamically resizing the cluster.

Main Steps

The tutorial is divided into the following steps, which explain how to set up a cluster of SQLFire servers on multiple Java VMs and then distribute data across the cluster. Perform the steps in the sequence shown.

DescriptionStep

Create a SQLFire Cluster on page 27Step 1 Connect to the Cluster Using SQLF on page 29Step 2 Create Replicated Tables and Execute Queries on page 30Step 3 Implement a Partitioning Strategy on page 31Step 4 Persist Tables to Disk on page 34Step 5 Add Servers to the Cluster and Stop Servers on page 35Step 6

Prerequisites

Install SQLFire on your local computer as described in Install vFabric SQLFire from a JAR File on page 13.

Create a SQLFire Cluster

In this procedure you set up and start a cluster of two SQLFire servers.

Procedure

1. Begin by opening a command prompt or terminal window on the computer or VM where you installed SQLFire.

2. Move to the directory in which you installed the SQLFire software. This tutorial uses the e xample directory

~/vFabric_SQLFire_10x, but you should substitute the actual path to your installation. For example: cd ~/vFabric_SQLFire_103

3. The initial SQLFire cluster in this tutorial contains two standalone SQLFire server members. Create a new directory for each server:

mkdir server1 mkdir server2

Each server will use its local directory to write log ﬁles, backup disk store ﬁles, a datadictionary directory for persisting data, and a single status ﬁle, .sqlfserver.ser.

Getting Started with vFabric SQLFire

4. To manage client connections to the available SQLFire server members, the cluster in this tutorial uses a SQLFire locator member. Create a new directory for the locator member:

mkdir locator

A locator maintains a list of available servers in the cluster, and updates that list as servers join and leave the cluster. Locators also load balance client connections across all available servers.

To start or stop a SQLFire locator or a SQLFire server, you use the sqlf script (for Linux platforms) or

sqlf.bat script (for W indows platforms). In either case, you must ﬁrst ensure that the path to the SQLFire bin directory is part of your PATH environment variable. For example, on a Linux platform enter:

export PATH=$PATH:~/vFabric_SQLFire_10x/bin

On a Windows platform enter:

set Path=%Path%;c:\vFabric_SQLFire_10x\bin

When you start a SQLFire distributed system, always begin by starting the locator member. Use the sqlf

locator command to start the locator in the speciﬁed directory: sqlf locator start -dir=locator -peer-discovery-address=ip_address

-peer-discovery-port=10101 \

-client-bind-address=ip_address -client-port=1527 &

Note: In this step and in all subsequent steps, replace ip_address with the IP address of your local system.

The -peer-discovery-address and -peer-discovery-port combination deﬁnes a unique connection that all members of this SQLFire distributed system use for communicating with one another.

Note: Always use a unique -peer-discovery-port number to avoid joining a cluster that is already running on your network. If other people might be evaluating SQLFire on your netw ork, choose a port number other than 10101.

The -client-bind-address and -client-port combination deﬁnes the connection that client applications use to connect to this locator. In this tutorial, all SQLFire members run on the local computer's IP address and use different port numbers to deﬁne unique connections.

You can verify the peer and client connections in the locator startup messages, which are similar to:

Starting SQLFire Locator using peer discovery on: ip_address[10101] Starting network server for SQLFire Locator at address localhost/127.0.0.1[1527] SQLFire Locator pid: 41149 status: running

Note: By starting the locator member ﬁrst, the locator can manage cluster membership from the start as new servers join and leav e the distributed system. The locator member should also be the last process that you shut down when you want to stop a SQLFire distributed system.

Note: As an alternative, SQLFire members can discover each other using multicast messaging. However, important SQLFire features such as WAN replication and user authentication require that a SQLFire system use locators rather than multicast for discovery. See Conﬁguring Discovery

Mechanisms on page 211.

Now use the sqlf server command to start both SQLFire server members and join them to the distributed system:

sqlf server start -dir=server1 -locators=ip_address[10101]

-client-bind-address=ip_address -client-port=1528 &

vFabric SQLFire User's Guide28

sqlf server start -dir=server2 -locators=ip_address[10101]

-client-bind-address=ip_address -client-port=1529 &

In each command, the -locators option deﬁnes the peer discovery address to use for joining the SQLFire distributed system. (Production deployments generally use multiple locator members, in which case you would specify a comma-separated list of locator host[port] connections when starting a server.)

Again, the combination of -client-bind-address and -client-port indicates that each server will listen for thin clients on a unique connection (ip_address:1528 and ip_address:1529, respectively). However, in this distributed system all clients will connect using the locator instead of making direct connections to servers.

Note: Always start SQLFire servers in the background (using "&"). In a working system, a server may need to wait for other cluster members to become available.

8. Both SQLFire servers start in the background and output messages similar to:

Starting SQLFire Server using locators for peer discovery: ip_address[10101] Starting network server for SQLFire Server at address /ip_address[1528] SQLFire Server pid: 41502 status: running Distributed system now has 2 members. Other members: localhost(41149:locator)<v0>:32977/50114 Logs generated in /Users/yozie/vFabric_SQLFire_10x/server1/sqlfserver.log

Tutorials

Starting SQLFire Server using locators for peer discovery: ip_address[10101] Starting network server for SQLFire Server at address /ip_address[1529] SQLFire Server pid: 41533 status: running Distributed system now has 3 members. Other members: localhost(41149:locator)<v0>:32977/50114,

10.0.1.11(41502:datastore)<v1>:49915/50462 Logs generated in /Users/yozie/vFabric_SQLFire_10x/server2/sqlfserver.log

Startup messages show the cluster membership details. By default, new SQLFire servers host data as data stores, and are automatically added to a default server

group. You can optionally specify server group membership at startup using the server-groups boot property .

Connect to the Cluster Using SQLF

sqlf implements a command-line tool that is based on the Apache Derby ij tool. You can use sqlf to connect to a SQLFire cluster and run scripts or interactive queries. You execute sqlf using the sqlf or sqlf.bat script.

Procedure

1. In the same command prompt or terminal Window in which you started the SQLFire cluster, change to the

quickstart directory: cd quickstart

The quickstart directory contains example SQL script ﬁles that you will use later in the tutorial.

Start an interactive sqlf session:

sqlf This command starts the interactive shell and displays the prompt: sqlf>.

Getting Started with vFabric SQLFire

Print a brief list of the available sqlf commands:

help;

T o connect to the SQLFire cluster using the JDBC thin client driver, use the connect client command and specify the host and port number of the SQLFire locator:

connect client 'ip_address:1527';

Notice that SQLFire does not have the concept of a "database". When you connect to a SQLFire cluster, the distributed system that you connect to is deﬁned by the locator (or alternately, the mcast-port) speciﬁed in the JDBC or ADO.NET connection.

5. Use the following command to view tables in the "sys" schema:

show tables in sys;

You will use information from many of these tables to view information and statistics about a working SQLFire cluster.

6. The sys.members table stores information about the SQLFire members. Execute the following query to see the unique ID assigned to the two SQLFire servers and the locator that you started:

select id from sys.members;

You will see output similar to:

-------------------------------

0.0.1.11(41533)<v2>:52614/50508

localhost(41149)<v0>:32977/50114

10.0.1.11(41502)<v1>:49915/50462

3 rows selected

The output displays the member process ID that is also logged when you start the members at the command line (41149, 41502, and 41533 in the tutorial example).

Create Replicated Tables and Execute Queries

By default SQLFire replicates tables to members of the cluster. In this procedure you create new tables that are replicated the SQLFire cluster.

Procedure

In the same sqlf session, run the ToursDB_schema.sql script to create the tables associated with the ToursDB sample database:

run 'ToursDB_schema.sql';

You see DDL output such as:

sqlf> CREATE TABLE AIRLINES ( AIRLINE CHAR(2) NOT NULL CONSTRAINT AIRLINES_PK PRIMARY KEY, AIRLINE_FULL VARCHAR(24), BASIC_RATE DOUBLE PRECISION, DISTANCE_DISCOUNT DOUBLE PRECISION, BUSINESS_LEVEL_FACTOR DOUBLE PRECISION, FIRSTCLASS_LEVEL_FACTOR DOUBLE PRECISION,

vFabric SQLFire User's Guide30

Tutorials

ECONOMY_SEATS INTEGER, BUSINESS_SEATS INTEGER, FIRSTCLASS_SEATS INTEGER ); 0 rows inserted/updated/deleted [...]

Run the loadTables.sql script to populate the tables with data:

run 'loadTables.sql';

The script output completes with:

sqlf> insert into FLIGHTAVAILABILITY values ('US1357',2,'2004-04-18',0,0,3); 1 row inserted/updated/deleted

3. Enter the following command to show the table names that you created (tables in the APP schema):

show tables in APP;

4. The new tables that you create and the data that you load are replicated on the two SQLFire servers by default. You can check whether tables are partitioned or replicated by querying information in sys.systables. Use the following query to check the data policy that SQLFire has assigned to the tables you just created:

select tablename, datapolicy from sys.systables where tableschemaname='APP';

The output shows that each of the ToursDB tables that you created are replicated. SQLFire replicates tables by default if you do not use the PARTITION BY clause in the CREATE TABLE statement.

Partitioning Tables on page 51 and Replicating Tables on page 61 provide more information about creating

tables in SQLFire.

5. You can also use the sys.members table to determine which members host a particular table:

select id from sys.systables s, sys.members where s.tablename='FLIGHTS';

In the tutorial, both of the SQLFire servers store information for each replicated table.

6. SQLFire provides query features similar to those available in other data management products. F or example, the following command executes a simple query:

SELECT city_name, country, language FROM cities WHERE language LIKE '%ese';

The following query executes a join between tables:

SELECT city_name, countries.country, region, language FROM cities, countries WHERE cities.country_iso_code = countries.country_iso_code AND language LIKE '%ese';

Implement a Partitioning Strategy

In this procedure you drop all tables in the ToursDB schema and then recreate them using a new partitioning and replication strategy.

The ToursDB schema in this tutorial is similar to a 'STAR' schema, having only a few fact tables and several dimension tables. Dimension tables are generally small and change infrequently, b ut are commonly used in join queries. Dimension tables are good candidates for replicating across SQLFire members, because join queries can execute in parallel.

Getting Started with vFabric SQLFire

The AIRLINES, CITIES, COUNTRIES, and MAPS tables are treated as dimension tables, and are replicated across the SQLFire cluster. In the tutorial it is assumed that applications frequently join these related tables based on the FLIGHT_ID column, which is chosen as the partitioning column.

FLIGHTS, FLIGHTS_HISTORY, and FLIGHTAVAILABILITY are fact tables, and they will be partitioned. You will co-locate these tables to ensure that all rows that are associated with FLIGHT_ID are maintained in a single partition. This step ensures that frequent join queries based on a selected ﬂight are pruned to a single member and executed efﬁciently.

Procedure

In a separate terminal window or GUI editor, open the create_colocated_schema.sql ﬁle in the vFabric_SQLFire_10x/quickstart directory to examine the included DDL commands. The SQL script begins by dropping the existing tables in the schema:

DROP TABLE AIRLINES; DROP TABLE CITIES; DROP TABLE COUNTRIES; DROP TABLE FLIGHTAVAILABILITY; DROP TABLE FLIGHTS; DROP TABLE MAPS; DROP TABLE FLIGHTS_HISTORY;

Dimension tables can be replicated using the same basic CREATE statement from the preceding section of the tutorial. However, this script explicitly adds the REPLICATE keyword for clarity. For example:

CREATE TABLE AIRLINES ( AIRLINE CHAR(2) NOT NULL CONSTRAINT AIRLINES_PK PRIMARY KEY, AIRLINE_FULL VARCHAR(24), BASIC_RATE DOUBLE PRECISION, DISTANCE_DISCOUNT DOUBLE PRECISION, BUSINESS_LEVEL_FACTOR DOUBLE PRECISION, FIRSTCLASS_LEVEL_FACTOR DOUBLE PRECISION, ECONOMY_SEATS INTEGER, BUSINESS_SEATS INTEGER, FIRSTCLASS_SEATS INTEGER ) REPLICATE;

The FLIGHTS table is partitioned based on the FLIGHT_ID column:

CREATE TABLE FLIGHTS ( FLIGHT_ID CHAR(6) NOT NULL , SEGMENT_NUMBER INTEGER NOT NULL , ORIG_AIRPORT CHAR(3), DEPART_TIME TIME, DEST_AIRPORT CHAR(3), ARRIVE_TIME TIME, MEAL CHAR(1), FLYING_TIME DOUBLE PRECISION, MILES INTEGER, AIRCRAFT VARCHAR(6), CONSTRAINT FLIGHTS_PK PRIMARY KEY ( FLIGHT_ID, SEGMENT_NUMBER), CONSTRAINT MEAL_CONSTRAINT CHECK (meal IN ('B', 'L', 'D', 'S'))

vFabric SQLFire User's Guide32

Tutorials

) PARTITION BY COLUMN (FLIGHT_ID);

The remaining facts tables are also partitioned, and also colocated with the FLIGHTS table. For example:

CREATE TABLE FLIGHTAVAILABILITY ( FLIGHT_ID CHAR(6) NOT NULL , SEGMENT_NUMBER INTEGER NOT NULL , FLIGHT_DATE DATE NOT NULL , ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0, BUSINESS_SEATS_TAKEN INTEGER DEFAULT 0, FIRSTCLASS_SEATS_TAKEN INTEGER DEFAULT 0, CONSTRAINT FLIGHTAVAIL_PK PRIMARY KEY ( FLIGHT_ID, SEGMENT_NUMBER, FLIGHT_DATE), CONSTRAINT FLIGHTS_FK2 Foreign Key ( FLIGHT_ID, SEGMENT_NUMBER) REFERENCES FLIGHTS ( FLIGHT_ID, SEGMENT_NUMBER)

) PARTITION BY COLUMN (FLIGHT_ID) COLOCATE WITH (FLIGHTS);

In the sqlf session, execute the create_colocated_schema.sql script to drop the existing tables and recreate them with the new partitioning and replication strategy . Execute loadTables.sql to populate the tables with data:

run 'create_colocated_schema.sql'; run 'loadTables.sql';

3. Conﬁrm that the tables are created:

show tables in APP;

4. Verify whether individual tables are replicated or partitioned:

select tablename, datapolicy from sys.systables where tableschemaname='APP';

5. The FLIGHTS table and others are now partitioned across the SQLFire cluster . Query the sys.members table again to display the member IDs that host the table:

select id from sys.systables s, sys.members where s.tablename='FLIGHTS';

6. Now use the DSID Function to see how many rows of the partitioned FLIGHT table are stored on that SQLFire server. For example:

select count(*) memberRowCount, dsid() from flights group by dsid();

7. Execute a join on both partition members in parallel.

select * from flights f, flightavailability fa where f.flight_id = fa.flight_id and f.flight_id = 'AA1116';

The combined results are returned. Because the table is partitioned by FLIGHT_ID, the execution of the join is pruned to the partition that stores the value 'AA1116.' You can verify that ﬂight_id 'AA1116' is located on only one data store using the query:

select count(*), dsid() from flights where flight_id = 'AA1116';

Getting Started with vFabric SQLFire

Persist Tables to Disk

At this point, the SQLFire cluster manages the ToursDB tables only in memory. In this procedure you persist table data to disk.

Procedure

1. In a separate terminal window or GUI editor, examine the contents of the create_persistent_schema.sql script. Notice that this script uses the PERSISTENT keyword in each CREATE TABLE statement. For example:

CREATE TABLE COUNTRIES ( COUNTRY VARCHAR(26) NOT NULL CONSTRAINT COUNTRIES_UNQ_NM Unique, COUNTRY_ISO_CODE CHAR(2) NOT NULL CONSTRAINT COUNTRIES_PK PRIMARY KEY, REGION VARCHAR(26), CONSTRAINT COUNTRIES_UC CHECK (country_ISO_code = upper(country_ISO_code) )

) REPLICATE PERSISTENT;

In the sqlf session, execute the create_persistent_schema.sql script, then load table data:

run 'create_persistent_schema.sql'; run 'loadTables.sql';

Exit the sqlf session:

exit;

4. Now list the contents of each SQLFire server directory:

ls -l ../server1 ../server2

Notice that each SQLFire directory now contains several disk store ﬁles to manage the table data on disk. For example:

BACKUPSQLF-DEFAULT-DISKSTORE.if BACKUPSQLF-DEFAULT-DISKSTORE_1.crf BACKUPSQLF-DEFAULT-DISKSTORE_1.drf

5. Because the data is persisted to disk, SQLFire will recover data from disk even if you shut down the entire cluster. Shutdown the cluster, and then restart the locator and servers:

cd ~/vFabric_SQLFire_10x sqlf shut-down-all -locators=ip_address[10101]

Connecting to distributed system: locators=ip_address[10101] Successfully shut down 3 members

sqlf locator start -dir=locator -peer-discovery-address=ip_address

-peer-discovery-port=10101 \

-client-bind-address=ip_address -client-port=1527 & sqlf server start -dir=server1 -locators=ip_address[10101]

-client-bind-address=ip_address -client-port=1528 & sqlf server start -dir=server2 -locators=ip_address[10101]

-client-bind-address=ip_address -client-port=1529 &

Note: If you are running the tutorial on a W indows system, you will need to start the servers in parallel using two separate command windows.

vFabric SQLFire User's Guide34

When you restart the servers, you may notice messages similar to:

[info 2012/07/24 10:18:11.949 PDT <main> tid=0x1] Region /_DDL_STMTS_META_REGION initialized with data from /10.118.33.206:/Users/yozie/vFabric_SQLFire_103/server1/./datadictionary created at timestamp 1343149012698 version 0 diskStoreId 1fc6a853-69d6-4ffe-8029-218acf165c34 is waiting for the data previously hosted at [/10.118.33.206:/Users/yozie/vFabric_SQLFire_103/server2/./datadictionary created at timestamp 1343149035681 version 0 diskStoreId 49c1e827-4bb5-49cc-951b-221d47bbc92f] to be available

These are not error messages. They indicate that the SQLFire member you are starting is waiting for another member to become available online.

6. Now verify that the persistent tables were reloaded:

sqlf connect client 'ip_address:1527'; select id from sys.systables s, sys.members where s.tablename='FLIGHTS'; select count(*) memberRowCount, dsid() from flights group by dsid();

Add Servers to the Cluster and Stop Servers

SQLFire manages data in a ﬂexible way that enables you to expand or contract your cluster at runtime to support different loads. To dynamically add more capacity to a cluster, you add new server members and specify the

-rebalance option.

Tutorials

Procedure

1. Open a new terminal or command prompt window, and create a directory for the new server. Also set your PATH in the new terminal if necessary:

cd ~/vFabric_SQLFire_10x mkdir server3 export PATH=$PATH:~/vFabric_SQLFire_10x/bin

Start the new server with a unique port number and specify the -rebalance option:

sqlf server start -dir=server3 -locators=ip_address[10101]

-client-bind-address=ip_address -client-port=1530 -rebalance &

3. View the contents of the new SQLFire directory:

ls server3

Notice that the new SQLFire server also persists the available table data, and contains the same disk store ﬁles as the other two servers.

You can view all members of the distributed system using sqlf. In the available sqlf session, e x ecute the query:

select id from sys.members;

5. Verify that all servers now host the data:

select id from sys.systables s, sys.members where s.tablename='FLIGHTS';

6. Examine the table data that each server hosts:

select count(*) memberRowCount, dsid() from flights group by dsid();

Exit the sqlf session:

exit;

Getting Started with vFabric SQLFire

You can stop an individual SQLFire server by using the sqlf server stop command and specifying the server directory. To shut down an entire cluster of servers at once, use the sqlf shut-down-all command:

sqlf shut-down-all -locators=ip_address[10101]

Connecting to distributed system: locators=ip_address[10101] Successfully shut down 4 members

Perform Additional Tasks

After you complete the SQLFire tutorial, you can perform related tutorial tasks to explore other areas of the product.

Explore the toursDB Database

The example toursDB database manages information about air transportation for scheduling ﬂights. If you need quick access to a database to use when trying SQLFire features, toursDB DDL and data script sare available in the quickstart directory. To view the toursDB schema, see ToursDB sample database

Explore SQLFire Language Extensions

The tutorial introduces the basics of partitioning, replication, and perisistence using simple modiﬁcations to the toursDB DDL. See SQL Statements on page 432 to begin learning about other features that are implemented in SQLFire DDL. Y ou can copy and modify the create_colocated_schema.sql as necessary to implement new features.

vFabric SQLFire User's Guide36

Chapter 8

vFabric SQLFire Features and Benefits

The sections that follow summarize main features. The SQLFire community site provides additional information about the features and beneﬁts of SQLFire and a comparison of SQLFire to other data management systems. See also the

SQLFire product page.

In-Memory Data Management with Optimized Disk Persistence

SQLFire enables applications to manage data entirely in memory by using partitioning and synchronous replication to distribute the data across numerous SQLFire members. SQLFire also provides an optimized disk persistence mechanism with a non-ﬂushing algorithm to maintain high performance in applications that require stable, long-term storage. Applications can also use SQLFire to actively cache table data from a traditional disk-based RDBMS.

Continuous Availability, Elastically Scaled, Low Latency

A ﬂexible architecture enables SQLFire to pool memory and disk resources from hundreds of clustered members. This clustered approach provides extremely high throughput, predictable latency, dynamic and linear scalability, and continuous availability of data. By collocating application logic with data and executing application logic in parallel, SQLFire substantially increases application throughput. It also transparently re-executes application logic if a server fails.

Highly Adaptable to Existing Applications

SQLFire is implemented entirely in Java, and it can be embedded directly within a Java application. You can also deploy SQLFire members as standalone servers that participate in a cluster . Jav a applications can connect to a SQLFire cluster using the provided JDBC drivers. Microsoft .NET and Mono applications can connect using the provided ADO.NET driver.

The use of JDBC, ADO.NET, and SQL means that many existing database applications can be easily adapted to use a SQLFire cluster. SQLFire introduces se veral extensions to common SQL Data Deﬁnition Language (DDL) statements to manage data partitioning, replication, synchronization with data sources, and other features. However , most common queries and Data Manipulation Language (DML) statements are based on ANSI SQL-92, so experienced database application developers can use their knowledge of SQL when working with SQLFire.

Part 2

Managing Your Data in vFabric SQLFire

Managing Your Data in vFabric SQLF ir e describes how to design your database schema to take adv antage of replication and partitioning in a distributed SQLFire cluster. You can use this information to design new databases, or to adapt an existing database for use with SQLFire. This guide also describes how to persist your data to disk store ﬁles, and it provides a simple procedure for importing an existing database into SQLFire.

Topics:

• Designing vFabric SQLFire Databases

• Using Server Groups to Manage Data

• Partitioning Tables

• Replicating Tables

• Estimating Memory Requirements

• Using Disk Stores to Persist Data

• Exporting and Importing Data with vFabric SQLFire

• Using Table Functions to Import Data as a SQLFire Tables

Chapter 9

Designing vFabric SQLFire Databases

Unlike in traditional database design, SQLFire requires the designer to analyze the common access patterns and choose a partitioning strategy that results in queries that are pruned to a single partition. In addition, fact tables may need to be replicated in order to make data available for join operations against partitioned data. This chapter describes the basic table design principles that you can use to achieve linear scaling with SQLFire.

Design Principles of Scalable, Partition-Aware Databases

A key design principle for linear scaling is a partitioning strategy that allows most data access (queries) to be pruned to a single partition, thus avoiding expensive locking or latching across multiple partitions during query execution.

In a highly concurrent system having thousands of connections, multiple queries are uniformly spread across the entire data set (and therefore across all partitions), and increasing the number of data stores enables linear scalability. Given sufﬁcient network performance, more connections can be supported without degrading the response time.

Note: SQLFire supports distributed queries by parallelizing the query execution across data stores. However, each query instance on a partition can only join rows that are collocated with the partitioned data. This means that queries can join rows between a partitioned table and any number of replicated tables hosted on the data store with no restrictions. But queries that join multiple, partitioned tables have to be ﬁltered based on the partitioning key. Query examples are provided in this section and in Query

Capabilities and Limitations on page 655.

Identify Entity Groups and Partitioning Keys

In relational database terms, an entity group corresponds to rows that are related to one another through foreign key relationships. Members of an entity group are typically related by parent-child relationships and can be managed in a single partition. To design a SQLFire database for data partitioning, begin by identifying "entity groups" and their associated partitioning keys.

For example:

• In a customer order management system, most transactions operate on data related to a single customer at a

time. Queries frequently join a customer's billing information with their orders and shipping information. For this type of application, you partition related tables using the customer identity. Any customer row along with their "order" and "shipping" rows forms a single entity group having the customer ID as the entity group identity (partitioning key). Partitioning related tables using the customer identity enables you to scale the system linearly as you add more members to support additional customers.

• In a system that manages a comprehensive product catalog (product categories, product speciﬁcations, customer

reviews, rebates, related products, and so forth) most data access focuses on a single product at a time. In such a system, you would partition your data on the product key.

Managing Your Data in vFabric SQLFire

• In an online auction application, you may need to stream incoming auction bids to hundreds of clients with very low latency. To do so, you must manage selected "hot" auctions on a single partition so that they receive sufﬁcient processing power . As the processing demand increases, add more partitions and route the application logic that matches bids to clients to the data store itself.

• In a ﬁnancial trading engine that constantly matches bid prices to asking prices for thousands of securities, partition data using ID of the security. When market data for a security changes, all of the related reference data is co-located with the matching algorithm to ensure low-latency execution.

Life beyond Distributed Transactions provides additional background information about entity groups and

distributed systems.

Creating Partitioned Tables on page 54 describes other ways to partition a table in SQLFire.

Use Entity Groups Instead of "Sharding"

Sharding is a common strategy in traditional databases where database rows are horizontally partitioned into multiple "database shards."

There are a number of differences between application-assisted or system-assisted sharding and SQLFire partition-aware database design:

• Rebalancing: SQLFire rebalances the data automatically by making sure that related rows are migrated together and without any integrity loss. This enables you to add capacity as needed.

• Distributed transactions: SQLFire transaction boundaries are not limited to a single partition. Atomicity and isolation guarantees are provided across the entire distributed system.

• Parallel scatter-gather: Queries that cannot be pruned to a single partition are automatically executed in parallel on data stores. Joins can be performed between tables with the restriction that the joined rows are in fact collocated.

• Subqueries on remote partitions: Even when a query is pruned to a single partition, the query can execute subqueries that operate on data that is stored on remote partitions.

Replicate Code Tables

The "star" schema is the most common relational design pattern. In a star schema, large "fact" tables have foreign key relationships to one or more "code" or "dimension" tables. With SQLFire, you should denormalize code tables into fact tables when possible. When this is not possible, determine whether code tables can be replicated in the SQLFire cluster.

The main goal is to partition the entities in fact tables, but to replicate the rows in slow-changing code tables on each member where the partitions are managed. In this way, a join between the fact table and any number of its code tables can be executed concurrently on each partition, without requiring network hops to other members.

Replicating Tables on page 61 provides more information.

Dealing with Many-to-Many Relationships

Where tables have many-to-many relationships, you have a choice of strategies for handling queries that need to join non-colocated data.

Note: Joins are permitted only on data that is colocated. Query execution can be distributed and ex ecuted in parallel, but the joined rows in each partition member have to be restricted to other rows in the same partition.

For tables have many-to-many relationships, choose one of the following strategies to handle queries that need to join non-colocated data:

vFabric SQLFire User's Guide42

Designing vFabric SQLFire Databases

• Use parallel, data-aware procedures to run the logic for the query on the member that stores some or all of the data (to minimize data distribution hops). The procedure should execute multiple queries and then join the results using application code. Using Data-Aware Stored Procedures on page 143 provides more information.

• Split the query into multiple queries, and perform the join in the application client code.

Example: Adapting a Database Schema for SQLFire

If you have an existing database design that you want to deploy to SQLFire, translate the entity-relationship model into a physical design that is optimized for SQLFire design principles.

The goal is to identify tables to partition or replicate in the SQLFire cluster, and determine the partitioning key(s) for partitioned tables. This generally requires an iterative process to produce the most optimal design.

Procedure

1. Read Guidelines for Adapting a Database to SQLFire on page 43and the preceding sections under Designing

vFabric SQLFire Databases on page 41.

2. Evaluate data access patterns to deﬁne entity groups that are candidates for partitioning. See Step 1: Determine

the entity groups on page 44.

3. Identify all tables in the entity groups. See Step 2: Identify the tables in each entity group on page 44.

4. Identify the "partitioning key" for each partitioned table. The partitioning ke y is the column or set of columns

that are common across a set of related tables. See Step 3: Deﬁne the partitioning key for each group on page

45.

5. Identify the tables that are candidates for replication. You can replicate table data for high availability, or to

co-locate table data that is necessary to execute joins. See Step 4: Identify replicated tables on page 45.

Guidelines for Adapting a Database to SQLFire

Follow these guidelines for designing a SQLFire database or adapting an existing database to SQLFire:

• Focus your efforts on commonly-joined entities. Remember that all join queries must be performed on data that is co-located. In this release, SQLFire only supports joins where the data is co-located. Co-located data is also important for transaction updates, because the transaction can execute without requiring distributed locks in a multi-phase commit protocol.

• After you locate commonly-joined entities, look for parent-child relationships in the joined tables. The primary key of a root entity is generally also the best choice for partitioning key.

• Understand the trade-offs associated with different partitioning and co-location strategies.The steps that follow describe how to evaluate a customer order management system.

This example shows tables from the Microsoft Northwind Traders sample database.

Managing Your Data in vFabric SQLFire

Step 1: Determine the entity groups

Entity groups are generally course-grained entities that have children, grand children, and so forth, and they are commonly used in queries. This example chooses these entity groups:

DescriptionEntity group

Customer

Product

This group uses the customer identity along with orders and order details as the children.

This group uses product details along with the associated supplier information.

Step 2: Identify the tables in each entity group

Identify the tables that belong to each entity group. In this example, entity groups use the following tables.

TablesEntity group

Customer

Customers Orders Shippers

vFabric SQLFire User's Guide44

TablesEntity group

Order Details

Designing vFabric SQLFire Databases

Product

Product Suppliers Category

Step 3: Define the partitioning key for each group

In this example, the partitioning keys are:

Partitioning keyEntity group

CustomerIDCustomer

ProductIDProduct

This example uses customerID as the partitioning key for the Customer group. The customer ro w and all associated orders will be collocated into a single partition. To explicitly colocate Orders with its parent customer row, use the colocate with clause in the create table statement:

create table orders (<column definitions, constraints>) partition by (customerID) colocate with (customers);

Create the OrderDetails table in a similar fashion. In this way, SQLFire supports any queries that join any of Customer, Orders, and OrderDetails. This join query w ould be distributed to all partitions and executed in parallel, with the results streamed back to the client:

select * from customer c , orders o where c.customerID = o.customerID;

A query such as this would be pruned to the single partition that stores "customer100" and executed only on that SQLFire member:

select * from customer c, orders o where c.customerID = o.customerID and c.customerID = 'customer100';

The optimization provided when queries are highly selective comes from engaging the query processor and indexing on a single member rather than on all partitions. With all customer data managed in memory, query response times are very fast. Consider how the above query would execute if the primary key was not used to partition the table. In this case, the query would be routed to each partition member where an index lookup would be performed, even though only a single member might have any data associated with the query.

Finally, consider a case where an application needs to access customer order data for several customers:

select * from customer c, orders o where c.customerID = o.customerID and c.customerID IN ('cust1', 'cust2', 'cust3');

Here, SQLFire prunes the query execution to only those partitions that host 'cust1', 'cust2', and 'cust3'. The union of the results is then returned to the caller.

Step 4: Identify replicated tables

If we assume that the number of categories and suppliers rarely changes, those tables can be replicated in the SQLFire cluster (replicated to all of the SQLFire members that host the entity group). If we assume that the Products table does change often and can be relatively large in size, then partitioning is a better strate gy for that table.

Managing Your Data in vFabric SQLFire

So for the product entity group, table Products is partitioned by ProductID, and the Suppliers and Categories tables are replicated to all of the members where Products is partitioned.

Applications can now join Products, Suppliers and categories. For example:

select * from Products p , Suppliers s, Categories c where c.categoryID = p.categoryID and p.supplierID = s.supplierID and p.productID IN ('someProductKey1', ' someProductKey2', ' someProductKey3');

In the above query, SQLFire prunes the query execution to only those partitions that host 'someProductKey1', ' someProductKey2', and ' someProductKey3.'

vFabric SQLFire User's Guide46

Chapter 10

Using Server Groups to Manage Data

Use server groups to control where table data is stored.

Server Groups Overview

A server group speciﬁes the SQLFire members that will host data for a table. Y ou use a server group to logically group SQLFire data stores for managing a table's data.

Any number of SQLFire members that host data can participate in one or more server groups. You specify named server groups when you start a SQLFire data store.

Managing Your Data in vFabric SQLFire

A SQLFire member that is booted with host-data=false is an accessor and does not host table data, even if you specify one or more server groups. However, peer clients that host data can also participate in server groups.

vFabric SQLFire User's Guide48

Using Server Groups to Manage Data

By default, all servers that host data are added to the "default" server group. Different logical database schema are often managed in different server groups. F or example, an order management

system might manage all customers and their orders in an "Orders" schema deployed to one server group. The same system might manage shipping and logistics data in a different server group. A single peer or server can participate in multiple server groups, typically as a way to colocate related data or to control the number of redundant copies in replicated tables.

With support for dynamic group membership, the number of processes hosting data for a server group can change dynamically. Ho we ver, this dynamic aspect of server group membership is abstracted away from the application developer, who can look at a server group as a single logical server.

Server groups only determine those peers and servers where a table's data is being managed. Tables are always accessible for any peer member of the distributed system, and from thin clients that connect to a single server.

When you invoke server side procedures, you can parallelize the execution of the procedure on all members in the server group. These data-aware procedures also e x ecute on any peer clients that belong to the serv er groups.

Without associating tables to speciﬁc member IP addresses, the capacity of a server group can be dynamically increased or decreased without any impact to existing servers or client applications. SQLFire can automatically rebalance the tables in the server group to the newly added members.

Adding Members to Server Groups

You deﬁne server group membership and/or create a server group when you start a SQLFire member using the server-groups boot property.

Managing Your Data in vFabric SQLFire

For example, if you start a SQLFire server from the command line with sqlf, use the server-groups property to specify the names of one or more server groups that the server should join:

sqlf server start

-server-groups=OrdersDB,OrdersReplicationGrp,DBProcessInstance1

In this example, the SQLFire server participates in three server groups: OrdersDB, OrdersReplicationGrp and DBProcessInstance1. If this is the ﬁrst SQLFire member to deﬁne a named server group, then SQLFire creates the group and adds the new member after startup. If other SQLFire members in the cluster were booted using the same server group, then SQLFire adds this new member to the existing group. If you specify no

-server-groups property, then SQLFire automatically adds the data store member to the default server group.

If you start a SQLFire peer client from within a Java application, specify the server-groups property as part of the JDBC peer client connection string. For example, use the connection URL:

jdbc:sqlfire:;mcast-port=33666;host-data=true;server-groups=OrdersDB,OrdersReplicationGrp,DBProcessInstance1

Starting and Conﬁguring SQLFire Servers on page 215 and Starting SQLFire Servers with the FabricServer Interface on page 101 provide more information about specifying boot properties.

Assigning Tables to Server Groups

When you create a new table, the CREATE TABLE statement can specify the server group to which the table belongs.

A partitioned table is spread across all of the members of the speciﬁed server group. A replicated table is replicated to all members of the server group. See Replicating Tables on page 61.

For example, the following command creates a replicated table on two server groups:

CREATE TABLE COUNTRIES ( COUNTRY VARCHAR(26) NOT NULL, COUNTRY_ISO_CODE CHAR(2) NOT PRIMARY KEY, REGION VARCHAR(26), ) SERVER GROUPS (OrdersDB, OrdersReplicationGrp)

Tables in SQLFire are replicated by default if you do not specify partitioning. If you do not specify one or more server group names, tables are partitioned or replicated across all members of

the default server group for that schema. This behavior may not be desirable in all situations. For example, if the data in a replicated table changes frequently, the cost of maintaining a copy on each server in the default group may be prohibitive. In this case, the application developer or system administrator can have several members participate in a new, smaller server group to limit the number of replicas.

When two tables are partitioned and colocated, it forces partitions that have the same values for those columns in the two tables to be located on the same member. Colocated tables must belong to at least one common serv er group. As a best practice, you deploy colocated tables on e xactly the same server groups. See P artitioning Tables on page 51.

vFabric SQLFire User's Guide50

Chapter 11

Partitioning Tables

Horizontal partitioning involv es spreading a large data set (many ro ws in a table) across members in a cluster . SQLFire uses a variant of the consistent hash algorithm to ensure that data is uniformly balanced across all members of the target server group.

How Table Partitioning Works

Y ou specify the partitioning strate gy of a table in the PARTITION BY clause of the CREATE TABLE statement. The available strategies include hash-partitioning on each ro w's primary k ey v alue, hash-partitioning on column values other than the primary key, range-partitioning, and list-partitioning.

SQLFire maps each row of a partitioned table to a logical "bucket." The mapping of rows to buckets is based on the partitioning strategy that you specify. For example, with hash-partitioning on the primary key, SQLFire determines the logical bucket by hashing the primary key of the table. Each bucket is assigned to one or more members, depending on the number of copies that you conﬁgure for the table. Conﬁguring a partitioned table with one or more redundant copies of data ensures that partitioned data remains available even if a member is lost.

When members are lost or removed, the buckets are reassigned to ne w members based on load. Losing a member in the cluster never results in re-assigning rows to buckets. You can specify the total number of buckets to use with the BUCKETS Attribute of the CREATE TABLE statement. The default number of buckets is 113.

In SQLFire, all peer servers in a distributed system know which peers host which b uckets, so they can efﬁciently access a row with at most one network hop to the member that hosts the data. Reads or writes to a partitioned table are transparently routed to the server that hosts the row that is the target of the operation. Each peer maintains persistent communication channels to every peer in the cluster.

Managing Your Data in vFabric SQLFire

Figure 2: Partitioned Table Data

Although each bucket is assigned to one or more speciﬁc servers, you can use a procedure to relocate buckets in a running system, in order to improve the utilization of resources across the cluster . See Rebalancing Partitioned

Data on SQLFire Members on page 59.

Understanding Where Data Is Stored

SQLFire uses a table's partitioning column values and the partitioning strategy to calculate routing values (typically integer values). It uses the routing values to determine the "bucket" in which to store the data.

Each bucket is then assigned to a server, or to multiple servers if the partitioned table is conﬁgured to have redundancy . The b uckets are not assigned when the table is started up, b ut occurs lazily when the data is actually put into a bucket. This allows you to start a number of members before populating the table.

vFabric SQLFire User's Guide52

Partitioning Tables

Figure 3: Partitioned Data in Buckets

If you set the redundant-copies for the table to be greater than zero, SQLFire designates one of the copies of each bucket as the primary copy. All writes to the bucket go through the primary copy. This ensures that all copies of the bucket are consistent.

The Group Membership Service (GMS) and distributed locking service ensure that all distributed members hav e a consistent view of primaries and secondaries at any moment in time across the distributed system, regardless of membership changes initiated by the administrator or by failures.

Failure and Redundancy

If you have redundant copies of a partitioned table, you can lose servers without loss of data or interruption of service. When a server fails, SQLFire automatically re-routes any operations that were trying to read or write from the failed member to the surviving members.

In this ﬁgure, M1 is reading table values W and Y. It reads W directly from its local copy and attempts to read Y from M3, which is ofﬂine. The read is automatically retried in another available member.

Managing Your Data in vFabric SQLFire

Creating Partitioned Tables

You create a partitioned table on a set of servers identiﬁed by named server groups (or on the default server group if you do not specify a named server group). Clauses in the CREATE TABLE statement determine how table data is partitioned, colocated, and replicated across the server group.

The full syntax for the CREATE TABLE statement is:

CREATE TABLE table-name { ( { column-definition | table-constraint } [, { column-definition | table-constraint } ] * ) | [( column-name [, column-name ] * ) ]

AS query-expression WITH NO DATA

} [ partitioning_clause | REPLICATE ] [ SERVER GROUPS ( server_group_name [, server_group_name ]*)] [ HUB ( 'hub-name' | ALL ) ] [ ASYNCEVENTLISTENER (async-listener-id) ] [ EVICTION BY {eviction_criterion} EVICTACTION { OVERFLOW | DESTROY } ]

vFabric SQLFire User's Guide54

Partitioning Tables

[ EXPIRE { TABLE | ENTRY } WITH { IDLETIME value | TIMETOLIVE value} ACTION { DESTROY | INVALIDATE } ]* [ PERSISTENT ] [ 'disk-store-name' ] [ ASYNCHRONOUS | SYNCHRONOUS ]

Note: This topic focuses only on the partitioning_clause. The CREATE TABLE reference page describes all of the options in more detail.

The partitioning_clause controls the location and distribution of data in server groups. Using server groups and colocation is important for optimizing queries, and it is essential for cross-table joins. This version of SQLFire does not support cross-table joins for non-colocated data, so you must choose the partitioning clause carefully to enable the joins required by your application.

The partitioning clause can specify column partitioning, range partitioning, list partitioning, or expression partitioning:

{ { PARTITION BY { PRIMARY KEY | COLUMN ( column-name [ , column-name ]* ) } | PARTITION BY RANGE ( column-name ) ( VALUES BETWEEN value AND value [ , VALUES BETWEEN value AND value ]* ) | PARTITION BY LIST ( column-name ) ( VALUES ( value [ , value ]* ) [ , VALUES ( value [ , value ]* ) ]* ) } [ COLOCATE WITH ( table-name [ , table-name ] * ) ] } [ REDUNDANCY integer-constant ] [ MAXPARTSIZE integer-constant ] [ BUCKETS integer-constant ]

Note: If the table has no primary key , then SQLFire generates a unique row ID that is uses for partitioning the data.

SQLFire supports the partitioning strategies described below.

DescriptionPartitioning strategy

Column partitioning

Range partitioning

The PARTITION BY COLUMN clause deﬁnes a set of column names to use as the basis for partitioning. As a short-cut, you can use PARTITION BY PRIMARY KEY to refer to the table's primary key column(s). SQLFire uses an internal hash function that typically uses the hashCode() method of the underlying Java type for the speciﬁed column. For multiple columns, the internal hash function uses the serialized bytes of the speciﬁed columns to compute the hash.

The PARTITION BY RANGE clause speciﬁes the ranges of a ﬁeld that should be colocated. This ensures the locality of data for range queries and for cross-table joins. The lower limit of the range is inclusive and the upper limit is exclusiv e. It is not necessary for the ranges to cover the whole spectrum of possible values for the ﬁeld. Values that are not covered by the range are automatically partitioned in the server group, but with no guarantee of locality for those values.

Managing Your Data in vFabric SQLFire

DescriptionPartitioning strategy

List partitioning

Expression partitioning

The PARTITION BY LIST clause speciﬁes the set of values of a ﬁeld that should be colocated to optimize queries and to support cross-table joins. It is not necessary to list all of the possible values for the ﬁeld. Any the v alues that are not part of the list are automatically partitioned in the server group, but with no guarantee of locality for those values.

The PARTITION BY clause that includes an expression is a type of hash partitioning that uses the expression to specify the value on which to hash. The expression must only reference ﬁeld names from the table. This allows rows to be colocated based on a function of their values.

Partitioning Examples

You can partition tables by, for example, column (such as customer name), expression, priority ranges, and status.

Partition Based on Columns

This statement creates a table that is partitioned by the "CustomerName" column. All rows with the same CustomerName are guaranteed to be colocated in the same process space. Here, the SERVER GROUPS clause determines the peers and servers that host data for the partitioned table. A server group is a subset of all the peers and servers that host data in the distributed system.

CREATE TABLE Orders ( OrderId INT NOT NULL, ItemId INT, NumItems INT, CustomerName VARCHAR(100), OrderDate DATE, Priority INT, Status CHAR(10), CONSTRAINT Pk_Orders PRIMARY KEY (OrderId) ) PARTITION BY COLUMN ( CustomerName ) SERVER GROUPS ( OrdersDBServers);

Partition Based on Ranges

When you use the PARTITION BY RANGE clause, specify a column with multiple ranges of values to use for partitioning. The following example speciﬁes partitioning based on three ranges of values for the "Priority" column:

CREATE TABLE Orders ( OrderId INT NOT NULL, ItemId INT, NumItems INT, CustomerName VARCHAR(100), OrderDate DATE, Priority INT, Status CHAR(10), CONSTRAINT Pk_Orders PRIMARY KEY (OrderId) ) PARTITION BY RANGE ( Priority ) ( VALUES BETWEEN 1 AND 11, VALUES BETWEEN 11 AND 31,

vFabric SQLFire User's Guide56

Partitioning Tables

VALUES BETWEEN 31 AND 50 );

Partition Based on a List

When you use the P ARTITION BY LIST clause, specify a column name and one or more lists of column values to use for partitioning. The following example partitions the table based on three different lists of values for the "Status" column:

CREATE TABLE Orders ( OrderId INT NOT NULL, ItemId INT, NumItems INT, CustomerName VARCHAR(100), OrderDate DATE, Priority INT, Status CHAR(10), CONSTRAINT Pk_Orders PRIMARY KEY (OrderId) ) PARTITION BY LIST ( Status ) ( VALUES ( 'pending', 'returned' ), VALUES ( 'shipped', 'received' ), VALUES ( 'hold' ) );

Partition Based on an Expression

Expression partitioning partitions a table by evaluating a SQL expression that you supply. For example, the following statement partitions the table based on the month of the OrderDate column, using the MONTH function as the SQL expression:

Colocating Related Rows from Multiple Tables

The COLOCATE WITH clause speciﬁes the tables with which the partitioned table must be colocated.

Note: Tables that are referenced in the COLOCATE WITH clause must exist at the time you create the partitioned table.

When two tables are partitioned on columns and colocated, it forces partitions having the same values for those columns in both tables to be located on the same SQLFire member. F or e xample, with range or list partitioning, any rows that satisfy the range or list are colocated on the same member for all the colocated tables.

When you specify the COLOCATE WITH clause, you must use the PARTITION BY clause to specify partition columns in the target tables in the same order using the same partitioning strategy (for example, with identical

Managing Your Data in vFabric SQLFire

ranges). The columns must also be of the same type, not considering constraints. Any REDUNDANCY or BUCKETS clause must also be the same as the tables with which it is colocated.

Note: In order for two partitioned tables to be colocated, the SER VER GR OUPS clauses in both CREA TE TABLE statements must be identical. In order for two replicated tables to be colocated, both tables must specify the same server groups or one table's server groups must be a subset of the other table's server groups.

For example, if you create the partitioned table, "countries," as follows:

You can colocate another table, "cities," using the command:

CREATE TABLE CITIES ( CITY_ID INTEGER NOT NULL CONSTRAINT CITIES_PK Primary key, CITY_NAME VARCHAR(24) NOT NULL, COUNTRY VARCHAR(26) NOT NULL, AIRPORT VARCHAR(3), LANGUAGE VARCHAR(16), COUNTRY_ISO_CODE CHAR(2) CONSTRAINT COUNTRIES_FK REFERENCES COUNTRIES (COUNTRY_ISO_CODE) ) PARTITION BY COLUMN (COUNTRY_ISO_CODE) COLOCATE WITH (COUNTRIES)

In this example, both "countries" and "cities" are partitioned using the COUNTRY_ISO_CODE column. Rows with the same COUNTRY_ISO_CODE value are colocated on the same SQLFire members.

See the CREATE TABLE reference page for more information.

Making a Partitioned Table Highly Available

Use the REDUNDANCY clause to specify a number of redundant copies of a table for each partition to maintain. Because SQLFire is primarily a memory-based data management system, it is important to use redundancy when

necessary to enable fail-over if a member shuts down or fails. However, keep in mind that maintaining a large number of redundant copies has an adverse impact on performance, network usage, and memory usage. A

REDUNDANCY value of 1 is recommended to maintain a secondary copy of the table data. For example: CREATE TABLE COUNTRIES

( COUNTRY VARCHAR(26) NOT NULL, COUNTRY_ISO_CODE CHAR(2) NOT PRIMARY KEY, REGION VARCHAR(26), ) REDUNDANCY 1

SQLFire attempts to place copies of the same bucket onto hosts that have different IP addresses if possible, to protect against machine failure. Howev er , if there is only one machine a v ailable SQLFire places multiple copies on that machine. Setting the enforce-unique-host boot property prevents SQLFire from ever placing multiple copies on the same machine.

vFabric SQLFire User's Guide58

Partitioning Tables

Set the redundancy-zone boot property to ensure that SQLFire places redundant copies on speciﬁc zones that you deﬁne. For example, to ensure that redundant copies are placed on different racks, set the redundanc y-zone for each machine to the logical name of the rack on which the machine runs.

See Boot PropertiesY ou specify boot properties when starting a SQLFire serv er with the FabricServer API; when

you make the ﬁrst connection to an embedded SQLFire member with the JDBC embedded driver; and when you use the sqlf connect peer command. .

Limiting the Memory Consumption on a Member

Use the MAXPARTSIZE clause of the CREATE TABLE statement to load- balance partitioned data among the available members.

The MAXP AR TSIZE Attribute on page 460 clause speciﬁes the maximum memory for any partition on a SQLFire member.

Rebalancing Partitioned Data on SQLFire Members

You can use rebalancing to dynamically increase or decrease your SQLFire cluster capacity, or to improve the balance of data across the distributed system.

Rebalancing is a SQLFire member operation that affects partitioned tables created in the cluster. Rebalancing performs two tasks:

• If the a partitioned table's redundancy setting is not satisﬁed, rebalancing does what it can to recover redundancy . See Making a Partitioned Table Highly Available on page 58.

• Rebalancing moves the partitioned table's data buckets between host members as needed to establish the best balance of data across the distributed system.

For efﬁciency, when starting multiple members, trigger the rebalance a single time, after you have added all members.

Note: If you have transactions running in your system, be careful in planning your rebalancing operations. Rebalancing may move data between members, which could cause a running transaction to fail with a TransactionDataRebalancedException.

Start a rebalance operation using one of the following options:

• At the command line when you boot a SQLFire server:

sqlf server start -rebalance

• Eexecuting a system procedure in a running SQLFire member:

call sys.rebalance_all_buckets();

This procedure initiates rebalancing of buckets across the entire SQLFire cluster for all partitioned tables.

How Partitioned Table Rebalancing Works

The rebalancing operation runs asynchronously. As a general rule, rebalancing takes place on one partitioned table at a time. For tables that have colocated data,

rebalancing works on the tables as a group, maintaining the data colocation between the tables. You can continue to access partitioned tables while rebalancing is in progress. Queries, DML operations, and

procedure executions continue while data is moving. If a procedure executes on a local data set, you may see a performance degradation if that data moves to another member during procedure execution. Future invocations are routed to the correct member.

Managing Your Data in vFabric SQLFire

For tables that are conﬁgured with expiration based on idle time, the rebalancing operation resets the last accessed time of the table entries on buckets that are moved.

When to Rebalance a Partitioned Table

Y ou typically w ant to trigger rebalancing when overall capacity is increased or reduced through member startup, shut down or failure.

You may also need to rebalance when you use partitioned table redundancy for high availability, and you have conﬁgured your table to not automatically recover redundancy after a SQLFire member fails (the default RECOVERYDELAY setting). In this case, SQLFire only restores redundancy when you invoke a rebalance operation. See Making a Partitioned Table Highly Available on page 58.

Managing Replication Failures

SQLFire uses multiple failure detection algorithms to detect replication problems quickly. SQLFire replication design focuses on consistency, and does not allow suspect members or network-partitioned members to operate in isolation.

Configuring suspect-member Alerts

When any member of the distributed system fails, it is important for other services to detect the loss quickly and transition application clients to other members. An y peer or server in the cluster can detect a problem with another member of the cluster, which initiates "SUSPECT" processing with the membership coordinator . The membership coordinator then determines whether the suspect member should remain in the distributed system or should be removed.

Use the ack-wait-threshold property to conﬁgure how long a SQLFire peer or server waits to recei v e an acknowledgment from other members that are replicating a table's data. The default value is 15 seconds; you specify a value from 0 to 2147483647 seconds. After this period, the replicating peer sends a severe alert w arning to other members in the distributed system, raising a "suspect_member" alert in the cluster.

To conﬁgure how long the cluster waits for this alert to be acknowledged, set the ack-severe-alert-threshold property. The default value is zero, which disables the property.

How Replication Failure Occurs

Failures during replication can occur in the following ways:

• A replica fails before sending an acknowledgment. The most common failure occurs when a member process is terminated during replication. When this occurs,

the TCP connection from all members is terminated, and the membership view is updated quickly to reﬂect the change. The member who initiated replication continues replicating to other members.

If instead of terminating, the process stays alive (but f ails to respond) the initiating member waits for a period of time and then raises an alert with the distributed system membership coordinator . The membership coordinator evaluates the health of the suspect member based on heartbeats and health reports from other members in the distributed system. The coordinator may decide to evict the member from the distrib uted system, in which case it communicates this change in the membership view to all members. At this point, the member that initiated replication proceeds and completes replication using available peers and servers. In addition, clients connected to this member are automatically re-routed to other members.

• An "Owning" member fails. If the designated owner of data for a certain key fails, the system automatically chooses another replica to

become the owner for the key range that the failed member managed. The updating thread is blocked while this transfer takes place. If at least one replica is available, the operations alw ays succeeds from the application's viewpoint.

vFabric SQLFire User's Guide60

Chapter 12

Replicating Tables

SQLFire server groups control which SQLFire data store members replicate the table's data. SQLFire replicates table data both when a new table is initialized in a cluster and when replicated tables are updated.

How SQLFire Replicates Tables

SQLFire replicates every single row of a replicated table synchronously to each table replica in the tar get serv er group(s). W ith synchronous replication, table reads can be e v enly balanced to any replica in the cluster, with no single member becoming a contention point.

SQLFire replicates data to all peers in the server groups where the table was created. Replication occurs in parallel to other peers over a TCP channel.

SQLFire replicates table data both when a new table is initialized in a cluster and when replicated tables are updated.

Replication at Initialization

When a non-persistent ("memory only") replicated table is created in a peer or server, it initializes itself using data from another member of the server group that hosts the table. A single data store member in the server group is chosen, and data for the table is streamed from that member to initialize the new replica. If the selected member fails during the replication process, the initializing process selects a different member in the server group to stream the data.

Replication During Updates

When an application updates a replicated table, SQLFire distributes the update to all replicas in parallel, utilizing the network bandwidth between individual members of the cluster . The sending peer or server locks the updated row locally, and then distributes the update to the replicas. After each of the replicas processes the update and responds with an acknowledgment message, the originating SQLFire peer returns control to the application. The update process is conservative in order to ensure that all copies of a replicated table contain consistent data. Each receiver processes the update entirely, applying constraint checks if necessary and updating the local table data, before responding with an acknowledgment. Each operation on a single row key is performed atomically per replica, regardless of how many columns are being read or updated.

Deciding When to Use Replicated Tables

Code tables are often good candidates for replication. Application data is frequently normalized to maintain "code" ﬁelds in "fact" tables, and to maintain the details

associated with each "code" in an associated "dimension" table. Code tables are often small and change infrequently, but they are frequently joined with their parent "fact" table in queries. Code tables of this sort are good candidates for using replicated tables.

Managing Your Data in vFabric SQLFire

Also note that this version of SQLFire supports joins only on co-located data. Instead of using partitioning in all cases, you should consider having applications replicate smaller tables that are joined with other partitioned tables.

Note: If multiple applications update the same row of a replicated table at the same time outside of a transaction, the table data can become out of sync when SQLFire replicates those updates. Keep this limitation in mind when using replicated tables.

Creating Replicated Tables

You can create replicated tables explicitly or by default, using CREATE TABLE statement. SQLFire creates a replicated table by default when you execute a CREATE TABLE on page 449 statement and

you do not include a PARTITIONING clause. You can also explicitly create a replicated table using the

REPLICATE clause in the CREATE TABLE statement. For example: CREATE TABLE COUNTRIES

( COUNTRY VARCHAR(26) NOT NULL CONSTRAINT COUNTRIES_UNQ_NM Unique, COUNTRY_ISO_CODE CHAR(2) NOT NULL CONSTRAINT COUNTRIES_PK PRIMARY KEY, REGION VARCHAR(26), CONSTRAINT COUNTRIES_UC CHECK (country_ISO_code = upper(country_ISO_code) ) ) REPLICATE;

Because this command omits the SERVER GROUPS clause, the example creates the 'countries' table and replicates it on members of the cluster that host data (all peers and servers in the default server group that set the host-data property to true).

vFabric SQLFire User's Guide62

Chapter 13

Estimating Memory Requirements

Designing a SQLFire database also involves estimating the memory requirements for your data based on the size of the actual table values and indexes, the o verhead that SQLFire requires for your data, and the o v erall usage pattern for your data. You can estimate the memory requirements for tables using general guidelines for SQLFire overhead. Or, you can load tables with representative data and then query the SQLFire SYS.MEMORYANAL YTICS table to obtain details about the memory required for individual tables and indexes.

Estimating SQLFire Overhead

SQLFire requires different amounts of overhead per table and index entry depending on whether you persist table data or conﬁgure tables for overﬂow to disk. Add these overhead ﬁgures to the estimated size of each table or index entry to provide a rough estimate for your data memory requirements. If you already have representati ve data, use the SQLFire Java agent to query the SYS.MEMORYANALYTICS table to obtain a more accurate picture of the memory required to store your data.

Note: All o verhead values are approximate. Be sure to validate your estimates in a test environment with representative data.

Table 1: Approximate Overhead for SQLFire Table Entries

Approximate overheadOverflow is configured?Table is persisted?

64 bytesNoNo 120 bytesNoYes 152 bytesYesYes

Table 2: Approximate Overhead for SQLFire Index Entries

Approximate overheadType of index entry

80 bytesNew index entry 24 bytesFirst non-unique index entry 8 bytes to 24 bytes*Subsequent non-unique index entry

*If there are more than 100 entries for a single index entry, the overhead per entry increases from 8 bytes to approximately 24 bytes.

Managing Your Data in vFabric SQLFire

Viewing Memory Usage in SYS.MEMORYANALYTICS

SQLFire includes instrumentation to display the memory used by individual tables and indexes in a SQLFire member. You can view this memory usage information by starting a Java agent process when you boot SQLFire members, and then querying the SYS.MEMORYANALYTICS virtual table from a client connection.

Enabling SYS.MEMORYANALYTICS

Follow this procedure to enable the SYS.MEMORYANALYTICS virtual table.

Prerequisites

• Create an evaluation SQLFire distributed system to determine the memory footprint for your data. Do not enable memory analytics on a production system.

• Y ou must ha ve representati ve table data and inde x entries in order to accurately ev aluate the memory footprint. Create the necessary SQL scripts to automate creating your schema and loading data into your tables.

• Consider using fewer SQLFire members in the evaluation system to simplify the evaluation process.

• If your SQLFire distributed system uses locators for member discovery, be sure to enable the Java agent on

the locator as well as the data store members. Although a locator does not generally host data, you can use the locator data to compare data memory footprints with non-data memory footprints.

Procedure

Follow these steps to start a SQLFire member with the Java agent to provide memory analytics:

Use the -javaagent: jar_path Jav a system property to specify the sqlfire.jar ﬁle in your installation when you start each SQLFire member. For example, if you use sqlf to start a SQLFire server:

sqlf server start -client-address=1527

-J-javaagent:c:\vFabric_SQLFire_10x\lib\sqlfire.jar

Specify the complete path and ﬁlename of sqlfire.jar for your system. If you use a locator for member discovery, also use the -javaagent: jar_path system property when

starting the locator.

2. If your schema and data are not already available in the SQLFire system (as persisted data), run any necessary

SQL scripts to create the schema and load table data. For example:

cd c:\vFabric_SQLFire_10x\quickstart sqlf sqlf> connect client 'localhost:1527'; sqlf> run 'create_colocated_schema.sql'; sqlf> run 'loadTables.sql';

3. Connect to SQLFire and query the SYS.MEMORYANAL YTICS table to view the memory usage information:

sqlf sqlf> connect client 'localhost:1527'; sqlf> select * from sys.memoryanalytics; SQLENTITY |ID |MEMORY

---------------------------------------------------------------APP.FLIGHTS (Entry Size, Value Size, Row Count) |dyozie-e4310(6880)<v0>:3439/59731 |30352,31436,542

AAP.FLIGHTS.DESTINDEX (Index Entry Overhead, SkipList Size, Max& |dyozie-e4310(6880)<v0>:3439/59731 |2104,696,3 (2803 = 2.74 kb)

vFabric SQLFire User's Guide64

Estimating Memory Requirements

APP.FLIGHTS.DESTINDEX (Index Entry Size, Value Size, Row Count) |dyozie-e4310(6880)<v0>:3439/59731 |4888,3784,87

[...]

Understanding Table and Index Values

Querying the SYS.MEMORYANALYTICS table provides run-time information about the tables and indexes available in SQLFire.

Table Values

Each table has a single row in SYS.MEMORYANALYTICS identitﬁed with the SQLENTITY format: schema_name.table_name (Entry Size, Value Size, Row Count). The ID column value displays the values separated by commas.

DescriptionTable Value

Entry Size

Value Size

Row Count

The per-entry overhead, in bytes. (This excludes the Value Size below.)

The total size of a table row, in bytes. (This includes the Entry Size overhead.)

The total number of rows stored on the local SQLFire member. For a partitioned table, this includes all buckets for the table, as well as primary and secondary replicas.

For example, the following row from SYS.MEMORYANALYTICS shows that the APP.FLIGHTS table has 30352 bytes of overhead per row, with 542 total rows:

SQLENTITY |ID |MEMORY

---------------------------------------------------------------APP.FLIGHTS (Entry Size, Value Size, Row Count) |dyozie-e4310(6880)<v0>:3439/59731 |30352,31436,542

Index Values

Each index has two rows in SYS.MEMORYANALYTICS. The ﬁrst row uses the SQLENTITY format: schema_name.table_name.index_name (Index Entry Overhead, SkipList Size, Max Level). This row provides details about concurrent skip lists for the index. The ID column value displays these v alues separated by commas:

DescriptionIndex Value

Index Entry Overhead

SkipList Size

The number of linked list objects that the index uses for storing the entry.

The number of linked list objects that the index uses for skipping values to expedite searches.

The total number of skip lists that are currently available for lookups.Max Level

Managing Your Data in vFabric SQLFire

For example, the following row from SYS.MEMORYANALYTICS shows that APP.FLIGHTS.DESTINDEX uses 2803 bytes for skip lists:

SQLENTITY |ID |MEMORY

----------------------------------------------------------------

APP.FLIGHTS.DESTINDEX (Index Entry Overhead, SkipList Size, Max& |dyozie-e4310(6880)<v0>:3439/59731 |2104,696,3 (2803 = 2.74 kb)

The second row for an index uses the SQLENTITY format: schema_name.table_name.inde x_name (Index Entry Size, Value Size, Row Count). This row provides information similar to the values for a table entry. The ID column value displays these values separated by commas:

DescriptionIndex Value

Index Entry Size

Value Size

The per-index overhead, in bytes. (This includes the Index Entry Overhead from the previous table, but excludes the Value Size below.)

The size of the data structure used to point to the underlying region entries, in bytes. For a unique index, this corresponds to the size of a pointer reference.

The total number of entries in the index.Row Count

For example, the following row from SYS.MEMORYANALYTICS shows that APP.FLIGHTS.DESTINDEX has 4888 bytes of overhead with 87 index entries:

SQLENTITY |ID |MEMORY

----------------------------------------------------------------

APP.FLIGHTS.DESTINDEX (Index Entry Size, Value Size, Row Count) |dyozie-e4310(6880)<v0>:3439/59731 |4888,3784,87

Displaying the Total Memory Footprint

You can query SYS.MEMORYANALYTICS with an optmizer hint to display the complete memory footprint of tables and indexes.

Use the sizerHints=withMemoryFootPrint hint with your query to display the memory footprint:

SELECT * FROM sys.memoryAnalytics -- SQLFIRE-PROPERTIES sizerHints=withMemoryFootPrint;

When you include this optimizer hint, the query displays an additional row for each table and index, summarizing the memory footprint for the SQLFire, GemFire, and other components. For example:

SQLENTITY |ID |MEMORY

----------------------------------------------------------------

[...]

APP.FLIGHTS (sqlfire,gemfire,others) |dyozie-e4310(6880)<v0>:3439/59731 |12912,299736,680704 (993352 = 970.07 kb)

APP.FLIGHTS.DESTINDEX (sqlfire,gemfire,others)

vFabric SQLFire User's Guide66

Estimating Memory Requirements

|dyozie-e4310(6880)<v0>:3439/59731 |5072,0,3288 (8360 = 8.16 kb)

[...]

Chapter 14

Using Disk Stores to Persist Data

You can persist table data to disk as a backup to the in-memory copy, or ov erﬂo w table data to disk when memory use gets too high.

Overview of Disk Stores

The two disk store options, overﬂow and persistence, can be used individually or together. Overﬂow uses disk stores as an extension of in-memory table management for both partitioned and replicated tables. Persistence stores a redundant copy of all table data managed in a peer.

See Evicting Table Data from SQLFire on page 179 for more information about conﬁguring tables to overﬂow to disk.

Data Types for Disk Storage

Disk storage is available for these data types:

• Table data. Persist and/or overﬂow table data managed in SQLFire peers.

• Gateway sender queues. Persist gateway sender queues for high availability in a WAN deployment. These

queues always overﬂow.

• AsyncEventListener and DBSynchronizer queues. Persist these queues for high availability. These queues

always overﬂow, and can be persistent.

Creating Disk Stores and Using the Default Disk Store

Y ou create named disk stores in the data dictionary using the CREATE DISKSTORE DDL statement. Individual tables can then specify named disk stores in their CREATE TABLE DDL statements to use the stores for persistence and/or overﬂow. You can store data from multiple tables and queues in the same named disk store. See Guidelines for Designing Disk Stores on page 74.

T ables that do not name a disk store b ut specify persistence or ov erﬂo w in their CREATE TABLE statement use the default disk store. The location of the default diskstore is determined by the value of the sys-disk-dir boot property. The default disk store is named SQLF-DEFAULT-DISKSTORE.

Gateway sender queues, AsyncEventListener queues, and DBSynchronizer queues can also be conﬁgured to use a named disk store. The default disk store is used if you do not specify a named disk store when creating the queue. See CREATE GATEWAYSENDER on page 443 or CREATE ASYNCEVENTLISTENER on page 436.

Peer Client Considerations for Persistent Data

Peer clients (clients started using the host-data=false property) do not use disk stores and can never persist the SQLFire data dictionary. Instead, peer clients rely on other data stores or locators in the distributed system for persisting data. If you use a peer client to execute DDL statements that require persistence and there are no

Managing Your Data in vFabric SQLFire

data stores available in the distributed system, SQLFire throws a data store unavailable exception (SQLState: X0Z08).

You must start locators and data stores before starting peer clients in your distributed system. If you start a peer client as the ﬁrst member of a distributed system, the client initializes an empty data dictionary for the distributed system as a whole. Any subsequent datastore that attempts to join the system conﬂicts with the empty data dictionary and fails to start with a ConﬂictingPersistentDataException.

Shared-Nothing Disk Store Design

Individual SQLFire peers that host table data manage their own disk store ﬁles, completely separate from the disk stores ﬁles of any other member . When you create a disk store, you can deﬁne certain properties that specify where and how each SQLFire peer should manages disk store ﬁles on their local ﬁlesystem.

SQLFire supports persistence for replicated and partitioned tables. The disk store mechanism is designed for locally-attached disks. Each peer manages its own local disk store, and does not share any disk artifacts with other members of the cluster. This shared nothing design eliminates the process-lev el contention that is normally associated with traditional clustered databases. Disk stores use rolling, append-only log ﬁles to avoid disk seeks completely. No comple x B-Tree data structures are stored on disk; instead SQLFire al ways assumes that complex query navigation is performed using in-memory indexes.

Disk stores also support the SQLFire data rebalancing model. When you increase or decrease capacity by adding or removing peers in a cluster, the disk data also relocates itself as necessary.

What SQLFire Writes to the Disk Store

For each disk store, SQLFire stores detailed information about related members and tables. SQLFire stores these items in each disk store:

• List of members that host the store and information on their status, such as running or ofﬂine and time stamps.

• List of tables that use the disk store.

For each table in the disk store, SQLFire stores:

• Conﬁguration attributes pertaining to loading and capacity management, used to load the data quickly on startup.

• Table DML operations.

Disk Store State

Disk store access and management differs according to whether the store is online or ofﬂine. When a member shuts down, its disk stores go ofﬂine. When the member starts up again, its disk stores come

back online in the SQLFire cluster.

• Online, a disk store is owned and managed by its member process.

• Ofﬂine, the disk store is simply a collection of ﬁles in your host ﬁle system. The ﬁles are open to access by

anyone with the right ﬁle system permissions. You can copy the ﬁles using your ﬁle system commands, for backup or to move your member’s disk store location. You can also run maintenance operations on the ofﬂine disk store, like ﬁle compaction and validation using the sqlf utility.

Note: The ﬁles for a disk store are used by SQLFire as a group. Treat them as a single entity. If you copy them, copy them all together. Do not change the ﬁle names or permissions.

When a disk store is ofﬂine, its data is unavailable to the SQLFire distributed system. For partitioned tables, the data is split between multiple members, so you can access the ofﬂine data only if you store replicas of the partitioned table on other members of the cluster.

vFabric SQLFire User's Guide70

Using Disk Stores to Persist Data

Disk Store Directories

When you create a disk store, optionally specify the location of directories where sqlf stores persistence-related ﬁles.

sqlf generates disk store artifacts in a directory that it chooses in the following order:

1. If you provide absolute directory paths, sqlf uses the paths as-is. You must ensure that the proper directory structure exists.

2. If you provide relative directory paths, or you do not specify an y directories, then the path resolution is done in this order:

If the sys-disk-dir boot property speciﬁes a directory, the disk store path is resolv ed relati v e to that directory or is used as the directory for persistence.

If sys-disk-dir is not set, SQLFire uses the directory from which the system was started (the current directory) for path resolution, or it uses that directory as the persistence directory.

Disk Store Persistence Attributes

SQLFire persists data on disk stores in synchronous or asynchronous mode. You conﬁgure the persistence mode for a table in the CREATE TABLE statement, while attributes to control

asynchronous persistence are conﬁgured in the CREATE DISKSTOREstatement. In synchronous mode, SQLFire writes each DML operation to the OS buffer as part of the statement ex ecution.

This mode provides greater reliability than asynchronous mode, but with lower performance. In asynchronous mode, SQLFire batches DML statements before ﬂushing them to the OS buffers. This is f aster

than synchronous mode, but batch operations may be lost if a failure occurs. (You can use redundancy to ensure that updates are successfully logged on another machine.) In asynchronous mode, you can control the frequency of ﬂushing batches by setting the following attributes when you create a named disk store:

• QUEUESIZE sets the number of affected rows that can be asynchronously queued. After this number of

pending rows are queued, operations begin blocking until some of the modiﬁed, created, or deleted rows are

ﬂushed to disk.

• TIMEINTERVAL sets the number of milliseconds that can elapse before queued data is ﬂushed to disk.

See CREATE DISKSTORE on page 438

Note: Always start all SQLFire servers and peer processes that host data on disk in parallel. A SQLFire process with persistent data may wait for other processes to startup ﬁrst to guarantee consistency.

Always use the shutdown-all command to gracefully shut down a cluster. This allows each member to reach a consistent replication state and record that state with other replicas in the cluster. When you restart peers after a graceful shutdown, each member can recover in parallel without waiting for others to pro vide consistent replication data.

Disk Store File Names and Extensions

Disk store ﬁles include store management and access control ﬁles and the operation log, or oplog, ﬁles, consisting of one ﬁle for deletions and another for all other operations.

The next table describe ﬁle names and extensions; they are followed by example disk store ﬁles.

File Names

File names have three parts.

Managing Your Data in vFabric SQLFire

First Part of File Name: Usage Identiﬁer

ExamplesUsageUsage Identifier Values

OVERFLOW

only.

BACKUP

Oplog data from persistent and persistent+overﬂow tables and queues.

Access control - locking the disk store.DRLK_IF

Second Part of File Name: Disk Store Name

Non-default disk stores.<disk store name>

SQLF-DEFAULT-DISKSTORE

Default disk store name, used when persistence or overﬂow are speciﬁed on a table or queue but no disk store is named.

OVERFLOWoverﬂowDS1_1.crfOplog data from overﬂow tables and queues

BACKUPoverﬂowDS1.if, BACKUPSQLF-DEFAULT-DISKSTORE.if

DRLK_IFoverﬂowDS1.lk, DRLK_IFSQLF-DEFAULT-DISKSTORE.lk

ExamplesUsageDisk Store Name Values

name=”OVERFLOWDS1” DRLK_IFOVERFLOWDS1.lk, name=”PERSISTDS1” BACKUPPERSISTDS1_1.crf

Note: SQLFire internally converts and uses disk store names in upper case characters, even if you specify lower case characters in the DDL statement.

DRLK_IFSQLF-DEFAULT-DISKSTORE.lk, BACKUPSQLF-DEFAUL T-DISKSTORE_1.crf

SQLF-DD-DISKSTORE

dictionary.

Third Part of File Name: oplog Sequence Number

Sequence number in the format _n

Oplog data ﬁles only. Numbering starts with

File Extensions

Disk store metadataif

Disk store access controllk

crf

Oplog: create, update, and invalidate operations

BACKUPSQLF-DD-DISKSTORE_1.crfDefault disk store for persisting the data

ExamplesUsageoplog Sequence Number

OVERFLOWoverﬂowDS1_1.crf, BACKUPpersistDS1_2.crf, BACKUPpersistDS1_3.crf

NotesUsageFile Extension Values

Stored in the ﬁrst disk-dir listed for the store. Negligible size - not considered in size control.

Pre-allocated 90% of the total max-oplog-size at creation.

vFabric SQLFire User's Guide72

Using Disk Stores to Persist Data

NotesUsageFile Extension Values

Oplog: delete operationsdrf

Pre-allocated 10% of the total max-oplog-size at creation.

Disk Store Operation Logs

At creation, each operation log is initialized at the disk store's MAXLOGSIZE value, with the size divided between the crf and drf ﬁles. SQLFire only truncates the unused space on a clean shutdown (for example, sqlf server stop or sqlf shut-down-all).

When an operation log is full, SQLFire automatically closes it and creates a new log with the next sequence number. This is called oplog rolling.

Note: Log compaction can change the names of the disk store ﬁles. File number sequencing is usually altered, with some existing logs removed or replaced by newer logs with higher numbering. SQLFire always starts a new log at a number higher than any existing number.

The system rotates through all available disk directories to write its logs. The next log is always started in a directory that has not reached its conﬁgured capacity, if one exists.

When Disk Store oplogs Reach the Configured Disk Capacity

If no directory exists that is within its capacity limits, how SQLFire handles this depends on whether automatic compaction is enabled.

• If A UT OCOMPACT is enabled (set to 'true), SQLFire creates a new oplog in one of the directories, going o ver

the limit, and logs a warning that reports:

Even though the configured directory size limit has been exceeded a new oplog will be created. The current limit is of XXX. The current space used in the directory is YYY.

Note: When auto-compaction is enabled, directory sizes do not limit how much disk space is used. SQLFire performs auto-compaction, which should free space, but the system may go over the conﬁgured disk limits.

• If auto-compaction is disabled, SQLFire does not create a new oplog. DML operations to tables block, and

SQLFire logs the error:

Disk is full and rolling is disabled. No space can be created

Factors Contributing to High Disk Throughput

SQLFire disk store design contains several factors that contribute to very high disk throughput. They include pooling, avoiding seeks, and buffered logging.

Pooling

Each SQLFire member manages its own disk store, and no disk contention exists between processes. Each partition can manage its data on local disks.

If the application "write" load can be uniformly balanced across the cluster, the aggregate disk throughput is equal to the disk transfer rate time the number of partitions, assuming a single disk per partition. Disk transfer rates can be up to 100MB/sec on commodity machines today.

Managing Your Data in vFabric SQLFire

Avoiding Seeks

Random access on disk, which causes disk heads to seek for hundreds of concurrent thread, is probably the single biggest reason why traditional databases do not scale. Average disk seek times today are still 2ms or higher. SQLFire manages most (or all) of the data in cluster memory, and all reads are serv ed without navigating through BTree-based indexes and data ﬁles on disk. This is the case when data is persistent. Note, however, that in overﬂow-only mode, data ﬁles on disk are accessed as necessary.

Buffered Logging

When writes do occur, SQLFire simply logs the operations to disk in "append-only" oplog ﬁles. By appending, SQLFire can continuously write to consecutive sectors on disk without requiring disk head mov ement. SQLFire ﬂushes writes to the OS buffer rather than 'fsync' all the way to disk. The writes are buf fered by the IO subsystem in the kernel, which allows the IO scheduler to merge and sort disk writes to achieve the highest possible disk throughput. Write requests need not initiate any disk I/O until some time in the future.

Thus, from the perspective of a user application, write requests stream at much higher speeds, unencumbered by the performance of the disk. Risk of data loss due to sudden failures at a hardware level are mitigated by having multiple members writing in parallel to disk. In fact, it is assumed that hardware will fail, especially in large clusters and data centers, and that software needs to take these failures into account. The SQLFire system is designed to recover in parallel from disk and to guarantee data consistency when data copies on disk do not agree with each other. Each member of the distributed system logs membership changes to its persistent ﬁles and uses this information during recovery to determine the replica that has the latest changes, and members automatically synchronize these changes at startup.

Guidelines for Designing Disk Stores

W ork with your system designers and de velopers to plan for disk storage requirements in testing and production database systems. Work with host system administrators to determine where to place your disk store directories on each peer computer.

vFabric SQLFire User's Guide74

Using Disk Stores to Persist Data

Consider these guidelines when designing disk stores:

• Tables can be overﬂowed, persisted, or both. For efﬁciency, place table data that is overﬂowed on one disk

store with a dedicated physical disk. Place table data that is persisted, or persisted and overﬂowed, on another disk store with on a different physical disk.

For example, gateway sender, AsyncEventListener, and DBSynchronizer queues are always overﬂowed and may be persisted. Assign them to overﬂow disk stores if you do not persist, and to persistence disk stores if you do. Ensure that each disk store resides on a separate physical disk, for best performance.

• When calculating your disk requirements, consider your table modiﬁcation patterns and compaction strategy.

SQLFire creates each oplog ﬁle at the speciﬁed MAXLOGSIZE. Obsolete DML operations are only removed from the oplogs during compaction, so you need enough space to store all operations that are done between compactions. For tables where you are doing a mix of updates and deletes, if you use automatic compaction, a good upper bound for the required disk space is

(1 / (1 - (compaction_threshold/100)) ) * data size

where data size is the total size of all the table data you store in the disk store. So, for the default COMPACTIONTHRESHOLD of 50, the disk space is roughly twice your data size. The compaction thread could lag behind other operations, causing disk use to rise above the threshold temporarily. If you disable automatic compaction, the amount of disk required depends on how many obsolete operations accumulate between manual compactions.

• Based on your anticipated disk storage requirements and the available disks on your host systems:

• Make sure the new storage does not interfere with other processes that use disk on your systems. If possible,

store your ﬁles to disks that are not used by other processes, including virtual memory or swap space. If you have multiple disks available, for the best performance, place one directory on each disk.

• Use different directories for different peers that run on the same computer. You can use any number of directories for a single disk store.

• Choose disk store names that reﬂect how the stores should be used and that work for your operating systems. Disk store names are used in the disk ﬁle names:

• Use disk store names that satisfy the ﬁle naming requirements for your operating system. For example, if you

store your data to disk in a Windows system, your disk store names could not contain any of these reserved characters, < > : " / \ | ? *.

• Do not use very long disk store names. The full ﬁle names must ﬁt within your operating system limits. On

Linux, for example, the standard limitation is 255 characters.

• Create each disk store with CREATE DISKSTORE before you create persistent or overﬂow tables.

SQLFire peers in the cluster manage their local disk stores using the properties you specify in the CREATE

DISKSTORE statement.

After you create named disk stores, you can create tables that persist or overﬂow their data to disk stores.

Creating a Disk Store or Using the Default

You can create a disk store for persistence and/or overﬂow or use the default disk store. Data from multiple tables can be stored in the same disk store.

Managing Your Data in vFabric SQLFire

Using the Default Disk Store

T ables that do not name a disk store b ut specify persistence or o v erﬂo w in their CREATE TABLE statement are automatically assigned to the default disk store, SQLF-DEFAULT-DISKSTORE. Also, gateway, AsyncEventListener, and DBSynchronizer queues always use the default disk store. The default diskstore is saved to the SQLFire data store's working directory, unless you change the value of the sys-disk-dir boot property to specify another location.

Create a Disk Store

You create a named disk store in the data dictionary using the CREATE DISKSTORE DDL statement. You then assign the disk store to an individual table by specifying the disk store in the table's CREATE TABLE DDL statement. You can store data from multiple tables and queues in the same named disk store.

Persist Table Data to a Disk Store

You conﬁgure the persistence settings for a partitioned or replicated table when you create the table with the CREATE TABLE DDL statement. SQLFire automatically recovers data from disk for persistent tables when you restart SQLFire members.

Procedure

1. Ensure that the data dictionary is persisted in your SQLFire cluster. SQLFire persists the data dictionary by default for all data stores, but you can explicitly enable or disable data dictionary persistence using the persist-dd boot property.

Note: All SQLFire data stores in the same cluster must use a consistent persist-dd value. Accessors cannot persist data, and you cannot set persist-dd to true for an accessor.

2. Create the disk store that you want to use for persisting the table's data, or use the default disk store. See

Creating a Disk Store or Using the Default on page 75.

3. Specify table persistence and the named disk store in the CREATE TABLE statement. For example:

CREATE TABLE Orders(OrderId INT NOT NULL,ItemId INT ) persistent 'OrdersDiskStore' asynchronous

This example uses asynchronous writes to persist table data to the "OrdersDiskStore."

Note: Persistent tables must be associated with a disk store. If you do not specify a named disk store in the CREATE TABLE statement, SQLFire persists the table to the default disk store. For example, the following statement persists the new table to the default disk store:

CREATE TABLE Orders(OrderId INT NOT NULL,ItemId INT ) persistent asynchronous

Note: When you specify asynchronous persistence, asynchronous writes to the disk store use certain disk store attributes. See Disk Store Persistence Attributes on page 71.

Optimizing Availability and Performance

Be aware of what you can do to optimize availability and performance of disk stores. Consider these guidelines:

• When you start your system, start all the members that have persistent tables in parallel. Create and use startup scripts for consistency and completeness.

•

Shut down your system using the sqlf shut-down-all command. This is an ordered shutdown that positions your disk stores for a faster startup.

• Decide on a ﬁle compaction policy and, if needed, develop procedures to monitor your ﬁles and execute re gular compaction.

• Decide on a backup strategy for your disk stores and follow it. You can back up by copying the ﬁles while the system is ofﬂine, or you can back up an online system using the sqlf backup command.

• If you drop or alter any persistent table while your disk store is ofﬂine, consider synchronizing the tables in your disk stores.

Starting System with Disk Stores

When you shut down a member that persists data, the data remains in the disk store ﬁles, av ailable to be reloaded when the member starts up again. Keep in mind that peer clients are dependent on locators or data store members to persist data, as they cannot persist data on their own.

The following sections explain what happens during startup and shutdown:

• Shutdown: Most Recent Data from the Last Run on page 77

• Startup Process on page 77

• Example Startup Scenarios on page 78

Shutdown: Most Recent Data from the Last Run

If more than one member has the same persistent table or queue, the last member to exit leaves the most up-to-date data on disk.

SQLFire stores information on member exit order in the disk stores, so it can start your members with the most recent data set:

• For a persistent replicated table, the last member to exit leaves the most recent data on disk.

• For a partitioned table, where the data is split into buckets, the last member to exist that hosts a particular buck et

leaves the most recent data on disk for that bucket.

Note: Peer clients rely on data stores for persistence. See Peer Client Considerations for Persistent Data on page 76.

Startup Process

When you start a member with disk stores, the stores are loaded back into memory to initialize the member’s table data.

Note: Peer clients rely on data stores for persistence. See Peer Client Considerations for Persistent Data on page 76.

If the member does not hold all of the most recent data in the system:

1. The member does not immediately join the server group, but waits for the member with the most recent data.

Managing Your Data in vFabric SQLFire

If your log level is info or below, the system provides messaging about the wait. Here, the disk store for hostA has the most recent data and the hostB member is waiting for it.

[info 2010/04/09 10:48:26.039 PDT CacheRunner <main> tid=0x1] Region /persistent_PR initialized with data from /10.80.10.64:/export/straw3/users/jpearson/GemFireTesting/hostB/ backupDirectory created at timestamp 1270834766425 version 0 is waiting for the data previously hosted at [/10.80.10.64:/export/straw3/users/jpearson/GemFireTesting/hostA/ backupDirectory created at timestamp 1270834763353 version 0] to be available

During normal startup you can expect to see some waiting messages.

2. When the most recent data is available, the system updates the local tables as needed, logs a message like this, and continues with startup.

[info 2010/04/09 10:52:13.010 PDT CacheRunner <main> tid=0x1] Done waiting for the remote data to be available.

Each member’s persistent tables load and go online as quickly as possible, not waiting unnecessarily for other members to complete. For performance reasons, if both primary and secondary buckets are persisted, data is made available when the primary buckets are loaded without waiting for the secondary buckets to load. The secondary buckets will load asynchronously.

Example Startup Scenarios

• Stop order for a replicated, persistent table:

1. Member A (MA) exits ﬁrst, leaving persisted data on disk for TableP.

2. Member B (MB) continues to run DML operations on TableP, which update its disk store and leaves the disk store for MA in a stale condition.

3. MB exits, leaving the most up-to-date data on disk for Table P.

• Restart order Scenario 1:

1. MB is started ﬁrst. SQLFire recognizes MB as having the most recent disk data for TableP and initializes it from disk.

2. MA is started, recovers its data from disk, and updates it as needed from the data in MB.

• Restart order Scenario 2:

1. MA is started ﬁrst. SQLFire recognizes that MA does not have the most recent disk store data and waits for MB to start before creating TableP in MA.

2. MB is started. SQLFire recognizes MB as having the most recent disk data for TableP and initializes it from disk.

3. MA recovers its TableP data from disk and updates it as needed from the data in MB.

Start a System with Disk Stores

When you start a SQLFire cluster with disk stores, it is recommended that you start all members with persisted data at the same time.

Procedure

1. Start members with persisted data at the same time.

When members with persistent data boot, they determine which have the most recent table data, and they initialize their caches using that data. If you do not start persistent data stores in parallel, some members may hang while waiting for other members to start.

vFabric SQLFire User's Guide78

Using Disk Stores to Persist Data

The following example bash script starts members in parallel. The script waits for the startup to ﬁnish and exits with an error status if one of the jobs fails.

#!/bin/bash ssh servera "cd /my/directory; sqlf server start & ssh serverb "cd /my/directory; sqlf server start &

STATUS=0; for job in `jobs -p` do echo $job wait $job; JOB_STATUS=$?; test $STATUS -eq 0 && STATUS=$JOB_STATUS; done exit $STATUS;

2. Respond to any member startup hangs. If a disk store with the most recent table data does not come online,

other members wait indeﬁnitely rather than come online with stale data. Check for missing disk stores with the sqlf list-missing-disk-stores command. See Handling Missing Disk Stores on page 85.

a. If no disk stores are missing, your peer initialization may be slow for some other reason. Check to see if

other members are hanging or fail to start.

b. If disk stores are missing that you think should be there:

a. Make sure you have started the member. Check the logs for any failure messages. b. Make sure your disk store ﬁles are accessible. If you have moved your member or disk store ﬁles,

you must update your disk store conﬁguration to match.

c. If disk stores are missing that you know are lost, because you have deleted them or their ﬁles are otherwise

unavailable, revoke them so the startup can continue. See Handling Missing Disk Stores on page 85.

Disk Store Management

The sqlf command-line tool has several options for examining and managing your disk stores. The sqlf tool, along with the CREATE DISKSTORE statement, are your management tools for online and ofﬂine disk stores.

Note: Each of these commands operates either on the online disk stores or ofﬂine disk stores, but not both.

sqlf Command

Offline Command

validate-disk-store compact-all-disk-stores compact-disk-store

See . . .Online or

Validate a Disk Store on page 80Off

Compacting Disk Store Log Files on page 80On

Compacting Disk Store Log Files on page 80Off

backup list-missing-disk-stores revoke-missing-disk-store

Backing Up and Restoring Online Disk Stores on page 83On

Handling Missing Disk Stores on page 85On

Managing Your Data in vFabric SQLFire

For the complete command syntax of any sqlf command, run sqlf <command> --help at the command line.

Online sqlf Disk Store Operations

For online operations, sqlf connects to a distributed system and sends the operation requests to the members that have disk stores. These commands do not run on ofﬂine disk stores. You must provide the command with a distributed system properties in a sqlfire.properties ﬁle, or specify the multicast port or locator properties to connect to the cluster (for example, -mcast-port= port_number).

Offline sqlf Disk Store Operations

For ofﬂine operations, sqlf runs the command against the speciﬁed disk store and its speciﬁed directories. You must specify all directories for the disk store.

Ofﬂine operations will not run on online disk stores. The tool locks the disk store while it is running, so the member cannot start in the middle of an operation.

If you try to run an ofﬂine command for an online disk store, you get a message like this:

ERROR: Operation "validate-disk-store" failed because: disk-store=ds1: com.gemstone.gemfire.cache.DiskAccessException: For DiskStore: ds1: Could not lock "hostA/ds1dir1/DRLK_IFds1.lk". Other JVMs might have created

diskstore with same name using the same directory., caused by java.io.IOException: The file "hostA/ds1dir1/DRLK_IFds1.lk" is being used by another process.

Validate a Disk Store

The sqlf validate-disk-store command veriﬁes the health of your ofﬂine disk store. It gives you information about the tables in it, the total rows, and the number of records that would be removed if you compacted the store.

When to use this command:

• Before compacting an ofﬂine disk store to help decide whether it’s worth doing.

• Before restoring a disk store.

• Any time you want to be sure the disk store is in good shape.

Example

sqlf validate-disk-store ds1 hostB/bupDirectory /partitioned_table entryCount=6 bucketCount=10 Disk store contains 1 compactable records. Total number of table entries in this disk store is: 6

Compacting Disk Store Log Files

You can conﬁgure automatic compaction for an operation log based on percentage of garbage content. You can also request compaction manually for online and ofﬂine disk stores.

The following topics deal with compaction:

• How Compaction Works on page 80

• Online Compaction Diagram on page 81

• Run Online Compaction on page 81

• Run Ofﬂine Compaction on page 82

• Performance Beneﬁts of Manual Compaction on page 82

• Directory Size Limits on page 82

vFabric SQLFire User's Guide80

Using Disk Stores to Persist Data

• Example Compaction Run on page 82

How Compaction Works

When a DML operation is added to a disk store, any preexisting operation record for the same record becomes obsolete, and SQLFire marks it as garbage. For example, when you update a record, the update operation is added to the store. If you delete the record later, the delete operation is added and the update operation becomes garbage. SQLFire does not remove garbage records as it goes, but it tracks the percentage of garbage in each operation log, and provides mechanisms for removing garbage to compact your log ﬁles.

SQLFire compacts an old operation log by copying all non-garbage records into the current log and discarding the old ﬁles. As with logging, oplogs are rolled as needed during compaction to stay within the MAXLOGSIZE setting.

You can conﬁgure the system to automatically compact any closed operation log when its garbage content reaches a certain percentage. You can also manually request compaction for online and ofﬂine disk stores. For the online disk store, the current operation log is not available for compaction, no matter how much garbage it contains.

Online Compaction Diagram

Ofﬂine compaction runs essentially in the same way, but without the incoming DML operations. Also, because there is no current open log, the compaction creates a new one to get started.

Run Online Compaction

Old log ﬁles become eligible for online compaction when their garbage content surpasses a conﬁgured percentage of the total ﬁle. A record is garbage when its operation is superseded by a more recent operation for the same record. During compaction, the non-garbage records are added to the current log along with new DML operations. Online compaction does not block current system operations.

•

Run automatic compaction. When AUTOCOMPACT is true, SQLFire automatically compacts each oplog when its garbage content surpasses the COMPACTIONTHRESHOLD. Automatic compaction takes cycles from your other operations, so you may want to disable it and only do manual compaction, to control the timing.

• Run manual compaction. To run manual compaction:

Managing Your Data in vFabric SQLFire

•

Set the disk store attribute ALLOWFORCECOMPACTION to true. This causes SQLFire to maintain extra data about the ﬁles so that it can compact on demand. This is disabled by default to save space. You can run manual online compaction at any time while the system is running. Oplogs eligible for compaction based on the COMPACTIONTHRESHOLD are compacted into the current oplog.

• Run manual compaction as needed. You can compact all online disk stores in a distributed system from the

command-line. For example:

sqlf compact-all-disk-stores

Note: This sqlf command requires a local sqlfire.properties ﬁle that contains properties to locate the distributed system. Or, specify the multicast port or locator properties to connect to the cluster (for example, -mcast-port= port_number).

Run Offline Compaction

Ofﬂine compaction is a manual process. All log ﬁles are compacted as much as possible, regardless of how much garbage they hold. Ofﬂine compaction creates new log ﬁles for the compacted log records.

Use this syntax to compact individual ofﬂine disk stores:

sqlf compact-disk-store myDiskStoreName /firstDir /secondDir

-maxOplogSize=maxMegabytesForOplog

You must provide all of the directories in the disk store. If no oplog max size is speciﬁed, SQLFire uses the system default.

Ofﬂine compaction can take a lot of memory. If you get a java.lang.OutOfMemory error while running this, you made need to increase your heap size. See the sqlf command help for instructions on how to do this.

Performance Benefits of Manual Compaction

You can improve performance during busy times if you disable automatic compaction and run your own manual compaction during lighter system load or during downtimes. You could run the API call after your application performs a large set of data operations. You could run sqlf compact-all-disk-stores every night when system use is very low.

To follow a strategy like this, you need to set aside enough disk space to accommodate all non-compacted disk data. You might need to increase system monitoring to make sure you do not overrun your disk space. You may be able to run only ofﬂine compaction. If so, you can set ALLOWFORCECOMPACTION to false and avoid storing the information required for manual online compaction.

Directory Size Limits

If you reach the disk directory size limits during compaction:

• For automatic compaction, the system logs a warning, but does not stop.

•

For manual compaction, the operation stops and returns a DiskAccessException to the calling process, reporting that the system has run out of disk space.

Example Compaction Run

In this example ofﬂine compaction run listing, the disk store compaction had nothing to do in the *_3.* ﬁles, so they were left alone. The *_4.* ﬁles had garbage records, so the oplog from them was compacted into the new *_5.* ﬁles.

bash-2.05$ ls -ltra backupDirectory total 28

-rw-rw-r-- 1 jpearson users 3 Apr 7 14:56 BACKUPds1_3.drf

vFabric SQLFire User's Guide82

Using Disk Stores to Persist Data

-rw-rw-r-- 1 jpearson users 25 Apr 7 14:56 BACKUPds1_3.crf drwxrwxr-x 3 jpearson users 1024 Apr 7 15:02 ..

-rw-rw-r-- 1 jpearson users 7085 Apr 7 15:06 BACKUPds1.if

-rw-rw-r-- 1 jpearson users 18 Apr 7 15:07 BACKUPds1_4.drf

-rw-rw-r-- 1 jpearson users 1070 Apr 7 15:07 BACKUPds1_4.crf drwxrwxr-x 2 jpearson users 512 Apr 7 15:07 .

bash-2.05$ sqlf validate-disk-store ds1 backupDirectory /root: entryCount=6 /partitioned_region entryCount=1 bucketCount=10 Disk store contains 12 compactable records. Total number of region entries in this disk store is: 7

bash-2.05$ sqlf compact-disk-store ds1 backupDirectory Offline compaction removed 12 records. Total number of region entries in this disk store is: 7

bash-2.05$ ls -ltra backupDirectory total 16

-rw-rw-r-- 1 jpearson users 3 Apr 7 14:56 BACKUPds1_3.drf

-rw-rw-r-- 1 jpearson users 25 Apr 7 14:56 BACKUPds1_3.crf drwxrwxr-x 3 jpearson users 1024 Apr 7 15:02 ..

-rw-rw-r-- 1 jpearson users 0 Apr 7 15:08 BACKUPds1_5.drf

-rw-rw-r-- 1 jpearson users 638 Apr 7 15:08 BACKUPds1_5.crf

-rw-rw-r-- 1 jpearson users 2788 Apr 7 15:08 BACKUPds1.if drwxrwxr-x 2 jpearson users 512 Apr 7 15:09 . bash-2.05$

Backing Up and Restoring Online Disk Stores

When you invok e the sqlf backup command, SQLFire backs up disk stores for all members that are running in the distributed system at that time. Each member with persistent data creates a backup of its own conﬁguration and disk stores.

• Backup Guidelines and Prerequisites on page 83

• Back Up an Online Disk Store on page 84

• What the Online Backup Saves on page 84

• Restore an Online Backup on page 84

Backup Guidelines and Prerequisites

• Run the backup during a period of low activity in your system. The backup does not block any activities in the distributed system, but it does use ﬁle system resources on all hosts in your distributed system and can affect performance.

• Optionally, compact your disk store before running the backup. See Compacting Disk Store Log Files on page

80.

•

Only use the sqlf backup command to create backup ﬁles from a running distributed system. Do not try to create backup ﬁles from a running system using ﬁle copy commands. Y ou will get incomplete and unusable copies.

• Back up to a directory that all members can access. Make sure the directory exists and has the proper permissions for your members to write to it and create subdirectories.

•

Make sure there is a sqlfire.properties ﬁle for the distributed system in the directory where you will run the sqlf command, or specify the multicast port or locator properties to connect to the cluster (for example,

-mcast-port= port_number). The command will back up all disk stores in the speciﬁed distributed system.

• The directory you specify for backup can be used multiple times. Each backup ﬁrst creates a top level directory for the backup, under the directory you specify, identiﬁed to the minute.

Managing Your Data in vFabric SQLFire

You can specify a directory by one of two methods; the command in the procedure below uses the ﬁrst method.

• Use a single physical location, such as a network ﬁle server. Example:

/export/fileServerDirectory/sqlfBackupLocation

• Use a directory that is local to all host machines in the system. Example:

./sqlfBackupLocation

Back Up an Online Disk Store

1. Run the backup command, providing your backup directory location. Example:

sqlf backup /export/fileServerDirectory/gemfireBackupLocation

-locators=warsaw.vmware.com[26340]

2. Read the message that reports on the success of the operation. If the operation is successful, you see a message like this:

Connecting to distributed system: locators=warsaw.vmware.com[26340] The following disk stores were backed up: DiskStore at hosta.vmware.com /home/dsmith/dir1 DiskStore at hostb.vmware.com /home/dsmith/dir2 Backup successful.

If the operation does not succeed at backing up all known members, you see a message like this:

Connecting to distributed system: locators=warsaw.vmware.com[26357] The following disk stores were backed up: DiskStore at hosta.vmware.com /home/dsmith/dir1 DiskStore at hostb.vmware.com /home/dsmith/dir2 The backup may be incomplete. The following disk stores are not online: DiskStore at hostc.vmware.com /home/dsmith/dir3

A member that fails to complete its backup is noted in this ending status message and leaves the ﬁle INCOMPLETE_BACKUP in its highest level backup directory. Ofﬂine members leave nothing, so you only have this message from the backup operation itself. Although ofﬂine members cannot back up their disk stores, a complete backup can be obtained if at least one copy of the data is available in a running member.

What the Online Backup Saves

For each member with persistent data, the backup includes:

• Disk store ﬁles for all stores containing persistent tables.

•

Conﬁguration ﬁles from the member startup (sqlfire.properties). These conﬁguration ﬁles are not automatically restored, to avoid interfering with any more recent conﬁgurations. In particular, if these are extracted from a master jar ﬁle, copying the separate ﬁles into your working area could override the ﬁles in the jar.

• A restore script, written for the member’s operating system, that copies the ﬁles back to their original locations.

For example, in Windows, the ﬁle is restore.bat and in Linux, it is restore.sh.

Restore an Online Backup

The restore script included in the online backup copies ﬁles back to their original locations. You can do this manually if you wish.

Prerequisites

Your members are ofﬂine and the system is down.

vFabric SQLFire User's Guide84

VMware vFabric SQLFire - 1.0 User's Guide

Specifications and Main Features

Frequently Asked Questions

User Manual

Contents

About the SQLFire User's Guide

Supported Configurations and System Requirements

Getting Started with vFabric SQLFire

Overview of vFabric SQLFire

Data Flow Diagram

GemFire, Apache Derby, and SQL Components

Understanding the SQLFire Distributed System

vFabric SQLFire Members

Servers, Peer Clients, and Server Groups

Discovery Mechanisms

Group Membership Service

Replicated Tables and Partitioned Tables

Parallel Execution of Data-Aware Stored Procedures

Cache Plug-ins for External Data Connections

Installing vFabric SQLFire

Installation Note for vFabric Suite Customers

RHEL Only: Install vFabric SQLFire from the VMWare YUM Repository

Install vFabric SQLFire from a JAR File

Activating vFabric SQLFire Licensing

Understand vFabric SQLFire License Options

Choose a License Option Based on Topology

How vFabric SQLFire Manages Licensing

Install and Configure vFabric SQLFire Licenses

Verify Your License and Check Your License Usage

Upgrading vFabric SQLFire

RHEL Only: Upgrade vFabric SQLFire from the VMWare YUM Repository

Upgrade Issues

Connect to vFabric SQLFire with JDBC Tools

Tutorials

Main Steps

Create a SQLFire Cluster

Connect to the Cluster Using SQLF

Create Replicated Tables and Execute Queries

Implement a Partitioning Strategy

Persist Tables to Disk

Add Servers to the Cluster and Stop Servers

Perform Additional Tasks

vFabric SQLFire Features and Benefits

Managing Your Data in vFabric SQLFire

Designing vFabric SQLFire Databases

Design Principles of Scalable, Partition-Aware Databases

Identify Entity Groups and Partitioning Keys

Use Entity Groups Instead of "Sharding"

Replicate Code Tables

Dealing with Many-to-Many Relationships

Example: Adapting a Database Schema for SQLFire

Using Server Groups to Manage Data

Server Groups Overview

Adding Members to Server Groups

Assigning Tables to Server Groups

Partitioning Tables

How Table Partitioning Works

Understanding Where Data Is Stored

Failure and Redundancy

Creating Partitioned Tables

Partitioning Examples

Colocating Related Rows from Multiple Tables

Making a Partitioned Table Highly Available

Limiting the Memory Consumption on a Member

Rebalancing Partitioned Data on SQLFire Members

Managing Replication Failures

Replicating Tables

How SQLFire Replicates Tables

Deciding When to Use Replicated Tables

Creating Replicated Tables

Estimating Memory Requirements

Estimating SQLFire Overhead

Viewing Memory Usage in SYS.MEMORYANALYTICS

Enabling SYS.MEMORYANALYTICS

Understanding Table and Index Values

Displaying the Total Memory Footprint

Using Disk Stores to Persist Data

Overview of Disk Stores

What SQLFire Writes to the Disk Store

Disk Store State