Business objects DATA INTEGRATOR 6.5.1 CORE TUTORIAL User Manual

Data Integrator Core Tutorial

Data Integrator

Core Tutorial

Version 6.5.1

COPYRIGHT No part of the computer software or this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from Business Objects. The information in this document is subject to change without notice. If you find any problems with this documentation, please report them to Business Objects in writing at documentation@businessobjects.com. Business Objects does not warrant that this document is error free.

TRADEMARKS The Business Objects logo, BusinessObjects and Rapid Marts are registered trademarks of Business Objects S.A. in the United States and/or other countries.

Microsoft, Windows, Windows NT, Access, and other names of Microsoft products referenced herein are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

Oracle is a registered trademark of Oracle Corporation. All other names of Oracle products referenced herein are trademarks or registered trademarks of Oracle Corporation.

DETAIL is a trademark of Striva Technology Ltd. SAP, R/3, BW, ALE/WEB and ABAP/4 are the registered or unregistered trademarks of SAP AG. PeopleSoft is a registered trademark of PeopleSoft, Inc. All other product, brand, and company names mentioned herein are the trademarks of their respective owners.

USE RESTRICTIONS This software and documentation is commercial computer software under Federal Acquisition regulations, and is provided only under the Restricted Rights of the Federal Acquisition Regulations applicable to commercial computer software provided at private expense. The use, duplication, or disclosure by the U.S. Government is subject to restrictions set forth in subdivision (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at 252.227- 7013.

Document Number DI-651-0108-001

June 30, 2004

Data Integrator Core Tutorial

Chapter 1 Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

Audience and assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

Tutorial objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

Data Integrator product documentation. . . . . . . . . . . . . . . . . . . . . . . . . 5

Tutorial prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Preparation for this tutorial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Environment required. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Tutorial setup. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Create repository, source, and target databases on an existing

RDBMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Install Data Integrator on your computer . . . . . . . . . . . . . . . . . 9

Run SQL scripts to create sample source and target tables . . 11

Data Integrator tutorial structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

To exit the tutorial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

Chapter 2 Data Integrator Product Overview. . . . . . . . . . . . . . . . . . . . . . .17

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .18

What does Data Integrator do?. . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Data Integrator components. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

How do you use Data Integrator?. . . . . . . . . . . . . . . . . . . . . . . . . . 21

Data Integrator system configurations . . . . . . . . . . . . . . . . . . . . . . 22

Windows implementation. . . . . . . . . . . . . . . . . . . . . . . . . . . .23

UNIX implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .23

The Data Integrator Designer window . . . . . . . . . . . . . . . . . . . . . . . . . 24

Data Integrator objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Object hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

Projects and jobs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .28

Data Integrator Core Tutorial iii

Contents

Work flows and data flows. . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

Object-naming conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

New terms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

Chapter summary and what to do next . . . . . . . . . . . . . . . . . . . . . . . . 31

Chapter 3 Defining Source and Target Metadata in Data Integrator .33

Logging in to the repository. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

Defining a datastore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Importing metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

Defining a file format. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

New terms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

Chapter summary and what to do next . . . . . . . . . . . . . . . . . . . . . . . . 49

Chapter 4 Populating the SalesOrg Dimension from a Flat File. . . . . . 51

Objects and their hierarchical relationships. . . . . . . . . . . . . . . . . . . . . 52

Adding a new project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

Adding a job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

Adding a work flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

About data flows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .57

Adding a data flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Defining the data flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

Validating the data flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

Addressing errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

Saving the project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .65

Executing the job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

About deleting objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

New terms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

Chapter summary and what to do next . . . . . . . . . . . . . . . . . . . . . . . . 72

Chapter 5 Populating the Time Dimension Using a Transform . . . . . . . 73

Retrieving the project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

Adding the job and data flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .75

Defining the time dimension data flow . . . . . . . . . . . . . . . . . . . . . . . . 76

Saving and executing the job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

Chapter summary and what to do next . . . . . . . . . . . . . . . . . . . . . . . . 81

Chapter 6 Populating the Customer Dimension from a Relational

Ta b l e . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .83

Adding the CustDim job and work flow. . . . . . . . . . . . . . . . . . . . . . . . 84

Adding the CustDim data flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

iv Data Integrator Core Tutorial

Contents

Defining the CustDim data flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

Validating the CustDim data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . .88

Executing the CustDim job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

Using the interactive debugger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

Chapter summary and what to do next . . . . . . . . . . . . . . . . . . . . . . . . 94

Chapter 7 Populating the Material Dimension from an XML File . . . . 95

Adding the MtrlDim job, work flow, and data flow. . . . . . . . . . . . . . . 96

Importing a document type definition . . . . . . . . . . . . . . . . . . . . . . . . .97

Defining the MtrlDim data flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .98

Validating the MtrlDim data flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

Executing the MtrlDim job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

Chapter summary and what to do next . . . . . . . . . . . . . . . . . . . . . . . 104

Chapter 8 Populating the Sales Fact Table from Relational Tables . . 105

Adding the SalesFact job, work flow, and data flow. . . . . . . . . . . . . . 106

Defining the SalesFact data flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

Validating the SalesFact data flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

Executing the SalesFact job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

Using metadata reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

Configuring Data Integrator to enable metadata reports . . . . . . 113

Viewing metadata reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

Chapter summary and what to do next . . . . . . . . . . . . . . . . . . . . . . . 119

Chapter 9 Changed Data Capture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

Exercise overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .122

Adding and defining the initial-load job . . . . . . . . . . . . . . . . . . . . . .123

Adding the job and defining global variables . . . . . . . . . . . . . . . 123

Adding and defining the work flow . . . . . . . . . . . . . . . . . . . . . . .124

Defining the set and update time scripts . . . . . . . . . . . . . . .124

Defining the data flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

Adding and defining the delta-load job . . . . . . . . . . . . . . . . . . . . . . .128

Executing the jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

Chapter summary and what to do next . . . . . . . . . . . . . . . . . . . . . . .133

Chapter 10 Recovery Mechanisms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .135

Creating a recoverable work flow manually . . . . . . . . . . . . . . . . . . . .136

Adding the job and defining local variables. . . . . . . . . . . . . . . . . . . . 137

Specifying a recoverable job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

Creating the script that determines the status . . . . . . . . . . . . . . . 138

Data Integrator Core Tutorial v

Contents

Defining the recoverable data flow with a conditional. . . . . . . . 139

Adding the script that updates the status. . . . . . . . . . . . . . . . . . . 142

Specifying job execution order . . . . . . . . . . . . . . . . . . . . . . . . . . .143

Status of the exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

Executing the job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .145

Data Integrator automated recovery properties . . . . . . . . . . . . . . . . .146

Summary and what to do next . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .147

Chapter 11 Multiuser Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .150

Exercise overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .153

Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

Creating a central repository. . . . . . . . . . . . . . . . . . . . . . . . . . . . .154

Creating two local repositories . . . . . . . . . . . . . . . . . . . . . . . . . . .155

Defining connections to the central repository . . . . . . . . . . . . . .156

Working in a multiuser environment . . . . . . . . . . . . . . . . . . . . . . . . . 158

Importing objects into your local repository. . . . . . . . . . . . . . . .158

Activating a connection to the central repository . . . . . . . . . . . .160

Adding objects to the central repository . . . . . . . . . . . . . . . . . . . 161

Checking out objects from the central repository . . . . . . . . . . . .164

Checking in objects to the central repository . . . . . . . . . . . . . . .166

Undoing check out . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .167

Comparing objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .168

Checking out without replacement . . . . . . . . . . . . . . . . . . . . . . .170

Getting objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .173

Using filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .174

Deleting objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .175

Chapter summary and what to do next . . . . . . . . . . . . . . . . . . . . . . .177

Chapter 12 Extracting SAP R/3 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .179

Defining an SAP R/3 datastore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .180

Importing metadata for SAP R/3 source tables. . . . . . . . . . . . . . . . . .182

Repopulating the customer dimension table . . . . . . . . . . . . . . . . . . . 184

Adding the SAP_CustDim job, work flow, and data flow . . . . . .184

Defining the SAP_CustDim data flow . . . . . . . . . . . . . . . . . . . . .184

Defining the R/3 data flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187

Validating the SAP_CustDim data flow . . . . . . . . . . . . . . . . . . . . 191

Executing the SAP_CustDim job . . . . . . . . . . . . . . . . . . . . . . . . . 191

About ABAP job execution errors . . . . . . . . . . . . . . . . . . . . . . . . .192

Repopulating the material dimension table . . . . . . . . . . . . . . . . . . . .193

Adding the SAP_MtrlDim job, work flow, and data flow . . . . . . 193

vi Data Integrator Core Tutorial

Contents

Defining the SAP_MtrlDim data flow . . . . . . . . . . . . . . . . . . . . .193

Defining the R/3 data flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

Executing the SAP_MtrlDim job. . . . . . . . . . . . . . . . . . . . . . . . . . 199

Repopulating the SalesFact table . . . . . . . . . . . . . . . . . . . . . . . . . . . . .200

Adding the SAP_SalesFact job, work flow, and data flow . . . . . . 200

Defining the DF_SAP_SalesFact data flow . . . . . . . . . . . . . . . . . . 200

Defining the R/3 data flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202

Validating the SAP_SalesFact data flow and executing the job . .208

New Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

Chapter summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

Chapter 13 Running a Real-time Job in Test Mode. . . . . . . . . . . . . . . . . . . 211

Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

Appendix A Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .249

Data Integrator Core Tutorial vii

Contents

viii Data Integrator Core Tutorial

Data Integrator Core Tutorial

Before You Begin

Welcome to the Data Integrator Core Tutorial.

The exercises in this tutorial introduce concepts and techniques to extract, transform and load batch data from flat-file and relational database sources for use in a data warehouse. Use this tutorial to obtain practical experience using Data Integrator components including the Designer, repository and Job Server.

This chapter covers the following:

• Introduction

• Audience and assumptions

• Tutorial objectives

• Data Integrator product documentation

• Tutorial prerequisites

• Data Integrator tutorial structure

• To exit the tutorial

Data Integrator Core Tutorial 1

Before You Begin

Introduction

This tutorial introduces core features of Data Integrator. Data Integrator is a component of the Business Objects Business Intelligence suite that allows you to extract and integrate data for analytical reporting and e-business.

While exercises in this tutorial focus on how to use batch data flows, you can also use Data Integrator to develop real-time data extraction and integration as described in Chapter 13, “Running

a Real-time Job in Test Mode”.

The Business Objects Data Integration platform also provides a number of Rapid Marts, which are predefined data models with built-in jobs provided by Data Integrator for use with business intelligence (BI) and online analytical processing (OLAP) tools. Contact your sales representative for more information about Rapid Marts.

2 Data Integrator Core Tutorial

Audience and assumptions

This tutorial assumes that:

• You are an application developer or database administrator

working on data extraction, data warehousing, or data

integration.

• You understand your source data systems, DBMS, business

intelligence, and e-business messaging concepts.

• You understand your organization’s data needs.

• You are familiar with SQL (Structured Query Language).

• You are familiar with Microsoft Windows NT or Windows

2000.

Before You Begin

Audience and assumptions

Data Integrator Core Tutorial 3

Before You Begin

Tutorial objectives

The intent of this tutorial is to introduce core Data Integrator Designer functionality.

After completing this tutorial you should be able to:

• Describe the process for extracting, transforming, and

loading data using Data Integrator

• Identify Data Integrator objects

• Define Data Integrator objects to extract flat-file and

relational data from your sources, transform the data to suit

your needs, and load the data to your targets

• Use Data Integrator features and functions to:

! Recover from run-time errors

! Capture changed data

! Set up a multiuser development environment

! Run a real-time job

! View and print metadata reports

! Examine data throughout a job using the debugger

4 Data Integrator Core Tutorial

Before You Begin

Data Integrator product documentation

Consult the Data Integrator Getting Started Guide for:

• An overview of Data Integrator products and architecture

• Data Integrator installation and configuration information

• A list of product documentation and a suggested reading

path

After you install Data Integrator, you can view technical documentation from many locations. To view documentation in PDF format, you can:

• Select

Start > Programs > Business Objects Data Integrator

version > Data Integrator Documentation,

Designer’s

! Release Notes

! Release Summary

! Technical Manuals

Help menu, click:

Help in Data Integrator Web Administrator

or from the

You can also view and download PDF documentation by visiting

Business Objects Customer Support online. To access this Web

site, you must have a valid user name and password. To obtain your user name and password, go to

http://www.techsupport.businessobjects.com and click

After installing Data Integrator Designer you can also view documentation in HTML format.

Online Help opens

automatically the first time you open the Designer.

You can also open

• Choose

Contents from the Designer’s Help menu.

• Click objects in the object library or workspace and press

Help, using one of the following methods:

F1.

Online Help opens to the subject you selected.

Data Integrator Core Tutorial 5

Before You Begin

Data Integrator product documentation

Use Online Help’s links and tool bar to navigate:

Online help tool bar:

Button Name Description

Home Returns to the Online Help home page.

Show/Hide Navigation

Show/Hide Descriptions

Start/Stop Help Updates

Shows and hides the Table of Contents for the core suite of Data Integrator Technical Manuals. Includes a master Index and Search tabs.

The Online Help window automatically updates its information when you click an object in the Designer window. In addition, you can click any object in the Designer window then click this button to show or hide an object’s local and global variables.

When the lock appears open, each time you click an object in the Designer window, the Online Help information automatically updates.

When the lock button appears closed, the Online Help window does not automatically update when you click objects in the Designer window; you must press F1.

6 Data Integrator Core Tutorial

Tutorial prerequisites

This section provides a high-level description of the steps you need to complete before you begin the tutorial exercises.

Preparation for this tutorial

Read the chapters on logging in to the Designer and the Designer user interface in the Data Integrator Designer Guide to get an overview of the Designer user interface including terms and concepts relevant to this tutorial.

This tutorial also provides a high-level summary in the next chapter, “Data Integrator Product Overview” on page 17.

Environment required

To use this tutorial, you must have Data Integrator running on Windows NT or 2000 and a supported RDBMS (Oracle, IBM DB2, Informix, Microsoft SQL Server, or Sybase).

Before You Begin

Tutorial prerequisites

You can install Data Integrator product components (Designer, Web Administrator, Job Server, Access Server) on a single computer or distribute them across multiple computers.

In the simplest case, all components in the following diagram can reside on the same computer

Data Integrator Core Tutorial 7

Before You Begin

Tutorial prerequisites

Tutorial setup

Ensure you have access to a RDBMS; contact your system administrator for assistance.

To set up your computer for this tutorial you must do the following tasks:

• Create repository, source, and target databases on an existing

RDBMS

• Install Data Integrator on your computer

• Run SQL scripts to create sample source and target tables

The following sections describe each of these tasks.

Create repository, source, and target databases on an existing RDBMS

" To create the databases

1. Log in to your RDBMS.

8 Data Integrator Core Tutorial

Before You Begin

Tutorial prerequisites

2. (Oracle only). Optionally create a service name alias, for

example, using the Oracle Net8 Easy Config utility.

Set the protocol to TCP/IP and enter a service name; for

example, training.bobj. This can act as your connection

name.

3. Create three databases—for your repository, source

operational data store (ODS), and target. For each, you must

create a user account and password. The recommended

values used in the tutorial SQL scripts are:

Repository Source Target

User name repo ods target Password repo ods target

4. Grant access privileges for the user account. For example for

Oracle, grant CONNECT and RESOURCE roles.

5. Make a note of the connection names, database versions,

user names and passwords in the following table. You will be

asked to refer to this information throughout the tutorial.

Value Repository Source Target

• Database connection name (Oracle) or

• Database server name and Database name (MS-SQL Server)

Database version

User name

Password

Install Data Integrator on your computer

See the Data Integrator Getting Started Guide for detailed information about system requirements, configuration, and running the installation program for Windows NT or 2000. See also the previous section, “Environment required” on page 7.

Data Integrator Core Tutorial 9

Before You Begin

Tutorial prerequisites

Be prepared to enter the following information when installing Data Integrator:

• Your Windows NT or 2000 domain and user name

• Windows NT or 2000 password

• Your Windows NT or 2000 computer’s name and host ID

• Data Integrator license information

• Connection information for the local repository and Job Server

NOTE: In the Repository Manager window, click the Local

option for the repository.

• A second Windows NT or 2000 account with rights to log in as a service so the Job Server can run using this account

The installation process takes you into the Data Integrator Server Manager to define a Job Server. Then, you can optionally create an Access Server if you want to use web-based batch job administration.

10 Data Integrator Core Tutorial

Before You Begin

Tutorial prerequisites

The default installation creates the following entries in the

Start > Programs > Business Objects Data Integrator version

menu:

Command Function

Data Integrator Documentation

Data Integrator Designer Opens Data Integrator Designer. License Information Displays license information. Metadata Reports Opens the metadata reporting tool in a

Repository Manager Opens a dialog box that you can use to

Server Manager Opens a dialog box that you can use to

Web Administrator Opens the Data Integrator Web

Provides a list of documentation PDFs including Release Notes, Release Summary, Technical Manuals (complete documentation set), and Tutorial.

web browser

update repository connection information.

configure Job Servers and Access Servers.

Administrator in a browser.

When you installed Data Integrator, it sets up a Windows service for the Job Server. To verify that the service is enabled, open the Services Control Panel and ensure that any Data Integrator services are configured for Started Status and Automatic Startup Type.

Run SQL scripts to create sample source and target tables

Data Integrator installation includes a batch file (CreateTables_databasetype.bat) for each supported RDBMS (indicated by the suffix databasetype). The batch files run SQL scripts that create and populate tables on your source database and create the target schema on the target database.

Data Integrator Core Tutorial 11

Before You Begin

Tutorial prerequisites

" To run the scripts

1. Using Windows Explorer, locate the CreateTables batch file

for your RDBMS in your Data Integrator installation directory in \Tutorial Files\Scripts.

2. Open the appropriate script file and edit the pertinent

connection information (and user names and passwords if you are not using ods/ods and target/target).

The Oracle batch file contains commands of the form:

sqlplus username/password@connection @scriptfile.sql > outputfile.out

The Microsoft SQL Server batch file contains commands of the form:

isql /e /n /U username /S servername /d databasename /P password /i scriptfile.sql /o outputfile.out

(The output files provide logs that you can examine for success or error messages.)

3. Double-click on the batch file to run the SQL scripts.

4. Use an RDBMS query tool to check your source ODS

database.

5. The following tables should exist on your source database

after you run the script. These tables should include a few rows of sample data.

Descriptive Name Table Name in Database

Customer Customer Material Material Sales Order Header SalesOrder Sales Order Line Item SalesItem Sales Delivery Delivery Employee Employee Region Region

12 Data Integrator Core Tutorial

Before You Begin

Tutorial prerequisites

6. Use an RDBMS query tool to check your target data

warehouse.

The following tables should exist on your target database after you run the script.

Descriptive Name Table Name in Database

Sales Org Dimension salesorg_dim Customer Dimension cust_dim Material Dimension mtrl_dim Time Dimension time_dim Employee Dimension employee_dim Sales Fact sales_fact Recovery Status status_table CDC Status CDC_time

Data Integrator Core Tutorial 13

Before You Begin

Data Integrator tutorial structure

The goal of the tutorial exercises is to demonstrate Data Integrator features using a simplified data model. The model is a sales data warehouse with a star schema that contains one fact table and some dimension tables.

Sales Org Dimension

From flat file

Employee Dimension

From relational table

Material Dimension

From XML file or SAP R/3 table

Star Schema

Sales Fact

From relational table

Time Dimension

From transform

Customer Dimension

From relational table

Chapters build on jobs you create and skills learned in previous chapters. You must complete each exercise to begin the next.

This tutorial is organized as follows.

Chapter 2, “Data Integrator Product Overview” introduces the

basic architecture and the user interface for Data Integrator.

Chapter 3, “Defining Source and Target Metadata in Data Integrator” introduces working with the Designer. Use the

Designer to define a datastore and a file format, then import metadata to the object library. After completing this chapter, you will have completed the preliminary work required to define data movement specifications for flat-file data.

14 Data Integrator Core Tutorial

Before You Begin

Data Integrator tutorial structure

Chapter 4, “Populating the SalesOrg Dimension from a Flat File”

introduces basic data flows, query transforms, and source and target tables. The exercise populates the sales organization dimension table from flat-file data.

Chapter 5, “Populating the Time Dimension Using a Transform”

introduces Data Integrator functions. This exercise creates a data flow for populating the time dimension table.

Chapter 6, “Populating the Customer Dimension from a Relational Table” introduces data extraction from relational

tables. This exercise defines a job that populates the customer dimension table.

Chapter 7, “Populating the Material Dimension from an XML File” introduces data extraction from nested sources. This

exercise defines a job that populates the material dimension table.

Chapter 8, “Populating the Sales Fact Table from Relational Tables” continues data extraction from relational tables and

introduces joins and the lookup function. The exercise populates the sales fact table.

Chapter 9, “Changed Data Capture” introduces a basic approach

to changed data capture. The exercise uses variables, parameters, functions, and scripts.

Chapter 10, “Recovery Mechanisms” presents techniques for

recovering from incomplete data loads.

Chapter 11, “Multiuser Development” presents the use of a

central repository for setting up a multiuser development environment.

Chapter 12, “Extracting SAP R/3 Data” provides optional

exercises on extracting data from an SAP R/3 source.

Chapter 13, “Running a Real-time Job in Test Mode” provides

optional exercises on running a real-time job in test mode.

Data Integrator Core Tutorial 15

Before You Begin

To exit the tutorial

You can exit the tutorial at any point after creating a sample project (see “Adding a new project” on page 53).

" To exit the tutorial:

1. From the Project menu, click Exit.

If any work has not been saved, you are prompted to save your work.

2. Click Ye s or No.

" To resume the tutorial:

1. Open Data Integrator and log in to the repository in which

you saved your work.

2. The Designer window opens.

3. From the Project menu, click Open.

4. Click the name of the project you want to work with, then

click

The Designer window opens with the project and the objects within it displayed in the project area.

16 Data Integrator Core Tutorial

Open.

Data Integrator Core Tutorial

Data Integrator Product

Overview

This chapter provides an overview of Data Integrator. It introduces the product architecture and the Data Integrator Designer.

This chapter contains the following sections:

• Introduction

• The Data Integrator Designer window

• Data Integrator objects

• New terms

• Chapter summary and what to do next

Data Integrator Core Tutorial 17

Data Integrator Product Overview

Introduction

This section gives you a brief introduction to Data Integrator.

What does Data Integrator do?

Data Integrator is a data movement and integration tool. One of its fundamental capabilities is extracting, transforming, and loading (ETL) data from heterogeneous sources into a target database or data warehouse. You create applications (jobs) that specify data mappings and transformations by using the Data Integrator Designer.

Data Integrator real-time interfaces provide additional support for real-time data movement and access. Data Integrator real-time reacts immediately to messages as they are sent, performing predefined operations with message content. Data Integrator real-time components provide services to web applications and other client applications.

18 Data Integrator Core Tutorial

Data Integrator components

The Data Integrator product consists of several components including:

• Data Integrator Designer

The Designer allows you to create, test, and execute jobs that populate a data warehouse. It is a development tool with a unique graphical user interface. It enables developers to create objects, then drag, drop, and configure them by selecting icons in a source-to-target flow diagram. It allows you to define data mappings, transformations, and control logic

. Use the Designer to create applications specifying work

flows (job execution definitions) and data flows (data

transformation definitions).

• Data Integrator Job Server

The Job Server is an application that launches the Data Integrator processing engine and serves as an interface to the engine and other components in the Data Integrator suite.

Data Integrator Product Overview

Introduction

• Data Integrator engine

The Data Integrator engine executes individual jobs defined in the application you create using the Designer. When you start your application, the Data Integrator Job Server launches enough engines to effectively accomplish the defined tasks.

• Data Integrator Repository

The repository is a database that stores Designer predefined system objects and user-defined objects including source and target metadata and transformation rules. In addition to the local repository used by the Designer and Job Server, you can optionally establish a central repository for object sharing and version control.

Data Integrator Core Tutorial 19

Data Integrator Product Overview

Introduction

The Designer handles all repository transactions. Direct manipulation of the repository is unnecessary except for:

! Setup before installing Data Integrator

You must create space for a repository within your RDBMS before installing Data Integrator.

! Security administration

Data Integrator uses your security at the network and RDBMS levels.

! Backup and recovery

You can export your repository to a file. Additionally, you should regularly back up the database where the repository is stored.

• Data Integrator Access Server

The Access Server passes messages between web applications and the Data Integrator Job Server and engines. It provides a reliable and scalable interface for request-response processing.

• Data Integrator Web Administrator

The Web Administrator provides browser-based administration of Data Integrator resources, including:

! Scheduling, monitoring, and executing batch jobs ! Configuring, starting, and stopping real-time services ! Configuring Job Server, Access Server, and repository

! Configuring and managing adapters ! Managing users ! Publishing batch jobs and real-time services via Web

20 Data Integrator Core Tutorial

usage

services

Data Integrator Product Overview

Introduction

The following diagram illustrates Data Integrator product components and relationships.

Central Repository

(shared, optional)

Heterogeneous

Data Sources

Designer and

Local Admin

Local Repository

(user-specific)

Job Server and

Engine

Data Warehouse

(Target)

How do you use Data Integrator?

You use Data Integrator to design, produce, and run data movement applications.

Using the Designer, you can build work flows and data flows that specify data extraction, transformation, and loading processes. In Data Integrator real-time, you have the added capability to build real-time data flows that support e-business transactions.

Web-based

Administration

Access Server

Web

Applications

You create jobs to contain, organize, and run your flows. You create projects to organize the jobs.

Refine and build on your design until you have created a well-tested, production-quality application. In Data Integrator, you can set applications to run in test mode or on a specific schedule. Using Data Integrator real-time, you can run applications in real time so they immediately respond to web-based client requests.

Data Integrator Core Tutorial 21

Data Integrator Product Overview

Introduction

Data Integrator system configurations

You can configure Data Integrator in various ways. The following diagram illustrates one possible system configuration.

Designer

Job Server

Access Server

(Windows NT

or 2000)

TCP/IP Network

Sources

Job Server

(UNIX)

Data Warehouse

ERP Data

ERP

Application

Server

When integrating Data Integrator into your existing environment, consider:

• The servers shown in the diagram can be separate physical computers, or they can be installed on a single computer.

• For peak performance, install and create the Data Integrator local repository on either the same computer as the Data Integrator Job Server or on the same computer as the target data warehouse.

In either of the previous configurations, the computer should be on the same LAN segment as the rest of the Data Integrator components.

22 Data Integrator Core Tutorial

As shown in the diagram, most Data Integrator components—the Designer, Job Server, and Access Server—can run on the same Windows NT or 2000 system, or you can install the Job Server on a UNIX system running Hewlett Packard HP-UX, Sun Solaris, or IBM AIX.

Windows implementation

You can configure a Windows NT or 2000 system as either a server or a workstation. A large-memory, multiprocessor system is ideal because the multithreading, pipelining, and parallel work flow execution features in Data Integrator take full advantage of such a system.

You can create your target data warehouse on a database server that may or may not be a separate physical computer.

You can use a shared disk or FTP to transfer data between your source system and the Data Integrator Job Server.

Data Integrator Product Overview

Introduction

UNIX implementation

You can install the Data Integrator Job Server on a UNIX system running Hewlett Packard HP-UX, Sun Solaris, or IBM AIX. You can also configure the Job Server to start automatically when you restart the computer.

Data Integrator Core Tutorial 23

Data Integrator Product Overview

The Data Integrator Designer window

The following illustration shows the key areas of the Data Integrator Designer window.

Workspace

Project area

Tool palette

Local object library

The key areas of the Data Integrator application window are:

• Project area — Contains the current project (and the job(s) and other objects within it) available to you at a given time.

In Data Integrator, all entities you create, modify, or work with are objects.

• Workspace — The area of the application window in which you define, display, and modify objects.

• Local object library — Provides access to local repository objects including built-in system objects, such as transforms, and the objects you build and save, such as jobs and data flows.

24 Data Integrator Core Tutorial

Data Integrator Product Overview

The Data Integrator Designer window

• Tool palette — Buttons on the tool palette enable you to add new objects to the workspace.

Data Integrator Core Tutorial 25

Data Integrator Product Overview

Data Integrator objects

In Data Integrator, all entities you add, define, modify, or work with are objects. Objects have:

Options Control the object. For example, to set up a

connection to a database, defining the database name would be an option for the connection.

Properties Describe the object. For example, the name and

creation date. Attributes are properties used to locate and organize objects.

Classes Type of object. You can copy reusable objects

from the object library. You cannot copy single-use objects.

The following diagram shows transform objects in the Data Integrator object library.

Object list

To resize the object library, click and drag it until you see the border size you want, then release. When you widen the object library, the name of each object is visible next to its icon.

Object library tabs

26 Data Integrator Core Tutorial

Object hierarchy

The following illustration shows the hierarchical relationships for the key object types within Data Integrator.

Data Integrator Product Overview

Data Integrator objects

Key:

Data Integrator Core Tutorial 27

Data Integrator Product Overview

Data Integrator objects

In the repository, the designer groups objects hierarchically from a project, to jobs, to optional work flows, to data flows. In jobs, work flows define a sequence of processing steps, and data flows transform data from source(s) to target(s).

Projects and jobs

A project is the highest-level object in the Data Integrator Designer window. Projects provide you with a way to organize the other objects you create in Data Integrator. Only one project is open at a time (where “open” means “visible in the project area”).

A job is the smallest unit of work that you can schedule independently for execution.

Work flows and data flows

Jobs are composed of work flows and/or data flows:

•A work flow is the incorporation of several data flows into a coherent flow of work for an entire job.

•A data flow is the process by which source data is transformed into target data.

A work flow orders data flows and operations that support them; a work flow also defines the interdependencies between data flows. For example, if one target table depends on values from other tables, use the work flow to specify the order in which you want Data Integrator to populate the tables. Also use work flows to define strategies for handling errors that occur during project execution. You can also use work flows to define conditions for running sections of a project.

The following diagram illustrates a typical work flow.

28 Data Integrator Core Tutorial

Work flow

Data Integrator Product Overview

Data Integrator objects

Control

Operations

A data flow defines the basic task that Data Integrator accomplishes, which involves moving data from one or more sources to one or more target tables or files. You define data flows by identifying the sources from which to extract data, the transformations the data should undergo, and targets.

Data flow

Source(s)

Object-naming conventions

Data Integrator recommends that you follow a consistent naming convention to facilitate object identification. For example, Rapid Marts use standards that are documented in the Business Objects Component Development Guide. Here are some examples:

Data Flow

Data

Transformation(s)

Data Flow

Control

Operations

Target(s)

Prefix Suffix Object Example

JOB Job JOB_SalesOrg WF Work flow WF_SalesOrg DF Data flow DF_Currency

DS Datastore ODS_ds

Data Integrator Core Tutorial 29

Data Integrator Product Overview

New terms

Review the following terms.

Term Description

Attribute Property that can be used as a constraint for locating

objects.

Data flow Contains steps to define how source data becomes target

data. Called by a work flow or job.

Datastore Logical channel that connects Data Integrator to source

and target databases.

Job The smallest unit of work that you can schedule

independently for execution. A job is a special work flow that cannot be called by another work flow or job.

Metadata Data that describes the objects maintained by Data

Integrator.

Object Any project, job, work flow, data flow, datastore, file

format, message, custom function, or transform created, modified, or used in Data Integrator.

Object library Part of the Designer interface that represents a “window”

into the local repository and provides access to reusable objects.

Option A choice in a dialog box that controls how an object

functions.

Project Grouping of jobs for the purpose of organized storage.

The Designer can open only one project at a time.

Property Characteristic used to define the state, appearance, or

value of an object; for example, the name of the object or the date it was created.

Repository A database that stores Designer predefined system objects

and user-defined objects including source and target metadata and transformation rules. Can be local or central (shared).

Source Table, file, or legacy system from which Data Integrator

reads data. Target Table or file to which Data Integrator loads data. Work flow Contains steps to define the order of job execution. Calls

a data flow to manipulate data.

30 Data Integrator Core Tutorial

Data Integrator Product Overview

Chapter summary and what to do next

This chapter has given you a short overview of the Data Integrator product and terminology. For more information about these topics, see the first few chapters of the Data Integrator

Getting Started Guide and Data Integrator Designer Guide.

In the next chapter, you will begin creating objects using the Data Integrator Designer.

Data Integrator Core Tutorial 31

Data Integrator Product Overview

Chapter summary and what to do next

32 Data Integrator Core Tutorial

Data Integrator Core Tutorial

Defining Source and Target

Metadata in Data Integrator

In this chapter you will set up logical connections between Data Integrator, a flat-file source, and a target data warehouse. You will also create and import objects into the local repository. Storing connection metadata in the repository enables you to work within Data Integrator to manage tables that are stored in various environments.

The tasks in this chapter include:

• Logging in to the repository

• Defining a datastore

• Importing metadata

• Defining a file format

Data Integrator Core Tutorial 33

Defining Source and Target Metadata in Data Integrator

Logging in to the repository

When you use Data Integrator, you save your work in the local repository. So, when you open Data Integrator, a login window for the local repository opens.

" To log in to Data Integrator

1. From the Start menu, click Programs > Business Objects Data Integrator version > Data Integrator Designer

As Data Integrator starts, a repository login screen appears. A sample login window (for the Oracle environment) appears as follows.

2. Supply the appropriate login information for your

repository, then click

Data Integrator connects to the specified repository, then opens the Designer.

See “Create repository, source, and target databases on an

existing RDBMS” on page 8 for login information.

34 Data Integrator Core Tutorial

OK.

Defining Source and Target Metadata in Data Integrator

Logging in to the repository

In the next section you will define datastores (connections) for your source and target.

Data Integrator Core Tutorial 35

Defining Source and Target Metadata in Data Integrator

Defining a datastore

Datastores:

• Provide a logical channel (connection) to a database

• Must be specified for each source and target database

• Are used to import metadata for source and target databases into the repository.

• Are used by Data Integrator to read data from source tables and load data to target tables

The databases to which Data Integrator datastores can connect include:

•Oracle

•IBM DB2

•Informix

• Microsoft SQL Server

• Sybase

•ODBC

• DETAIL (for mainframe sources)

Metadata consists of:

! Database tables

! RDBMS functions ! Application-specific data structures

36 Data Integrator Core Tutorial

•Table name

• Column names

• Column data types

• Primary key columns

• Table attributes

Defining Source and Target Metadata in Data Integrator

Connection metadata is defined in the object library as datastores (for tables) and file formats (for flat files).

The next task describes how to define datastores using the Designer. Note that while you are designating the datastores as sources or targets, datastores only function as connections. You will define the actual source and target objects when you define data flows later in the tutorial.

" To define a datastore for the source (ODS) database

1. From the Datastores tab of the object library, right-click in

the blank area and click

New.

Defining a datastore

TIP: To find the Datastores tab, roll your cursor over each

object library tab and read the text. You can also expand the Object Library window by clicking and dragging it or double-clicking it. Tab names display when you expand the Object Library window.

Data Integrator Core Tutorial 37

Defining Source and Target Metadata in Data Integrator

Defining a datastore

The Datastore Editor window opens. A sample Datastore Editor window for the Oracle environment appears as follows:

2. In the Name box, type ODS_DS.

This datastore name labels the connection to the database you will use as a source. The datastore name will appear in the local repository. When you create your own projects/applications, remember to give your objects meaningful names.

3. In the Application Type box, click Custom.

38 Data Integrator Core Tutorial

Defining Source and Target Metadata in Data Integrator

Defining a datastore

4. In the Database Type box, click the option that corresponds

to the database software being used to store the source data.

The remainder of the boxes on the

Administrator

window depend on the Database Type

Datastore Web

selected.

5. Accept the default row commit size of 1000 for all databases.

The commit size option refers to batch array size (number of rows).

6. The following table lists some of the options for the

Datastore Editor depending on the RDBMS you are using. The options marked skip are not used in this tutorial. Enter the information you recorded in “Create repository, source,

and target databases on an existing RDBMS” on page 8.

Oracle DB2 Informix SQL Server Sybase

Datastore name Datastore name Datastore name Datastore name Datastore name Database type Database type Database type Database type Database type CDC DB2 data source Informix data

source

Database connection name

Oracle version DB2 version Informix

version

Bulk loader directory (skip)

Overflow file directory (skip)

User name User name User name User name User name Password Password Password Password Password LiveLoad (skip) LiveLoad (skip) LiveLoad (skip) LiveLoad (skip) LiveLoad (skip)

Bulk loader directory (skip)

Overflow file directory (skip)

Bulk loader directory (skip)

Overflow file directory (skip)

Database server name

Database name Database name

SQL server version

Overflow file directory (skip)

Database server name

Sybase version

Overflow file directory (skip)

7. Click OK.

Data Integrator saves a datastore for your source in the repository.

Data Integrator Core Tutorial 39

Defining Source and Target Metadata in Data Integrator

Defining a datastore

" To define a datastore for the target database

Define a datastore for the target database using the same procedure as for the source (ODS) database.

•Use TARGET_DS for the datastore name.

• Use the information you recorded in “Create repository,

source, and target databases on an existing RDBMS” on page 8.

40 Data Integrator Core Tutorial

Importing metadata

With Data Integrator, you can import metadata for individual tables using a datastore. You can import metadata by:

•Browsing

•Name

• Searching

The following procedure describes how to import by browsing.

" To import metadata for ODS source tables into Data Integrator

1. In the Datastores tab, right-click the ODS_DS datastore and

Open.

click

The names of all the tables in the database defined by the datastore named ODS_DS display in a window in the workspace.

Defining Source and Target Metadata in Data Integrator

Importing metadata

2. Move the cursor over the right edge of the Metadata column

heading until it changes to a resize cursor.

3. Double-click the column separator to automatically resize

the column.

Data Integrator Core Tutorial 41

Defining Source and Target Metadata in Data Integrator

Importing metadata

4. Import the following tables by right-clicking each table name

and clicking

Import. Alternatively, because the tables are

grouped together, click the first name, Shift-click the last, and import them together. (Use Ctrl-click for nonconsecutive entries.)

ods.ods_customer ods.ods_material ods.ods_salesorder ods.ods_salesitem ods.ods_delivery ods.ods_employee ods.ods_region

NOTE: In MS SQL Server, the owner prefix might be dbo

instead of ODS.

Data Integrator imports the metadata for each table into the local repository.

5. Expand the ODS_DS datastore’s Tables container in the

object library and verify the tables are listed in the repository.

42 Data Integrator Core Tutorial

Defining Source and Target Metadata in Data Integrator

Importing metadata

" To import metadata for target tables into Data Integrator

1. Open the TARGET_DS datastore.

2. Import the following tables by right-clicking each table name

and clicking them together.

target.status_table target.cust_dim target.employee_dim target.mtrl_dim target.sales_fact target.salesorg_dim target.time_dim target.CDC_time

NOTE: In MS SQL Server, the owner prefix will be dbo

instead of TARGET.

Import. Alternatively, Use Ctrl-click and import

Data Integrator imports the metadata for each table into the local repository.

Data Integrator Core Tutorial 43

Defining Source and Target Metadata in Data Integrator

Importing metadata

3. Expand the Target_DS datastore’s Tables container in the

object library and verify the tables are listed in the repository.

44 Data Integrator Core Tutorial

Defining Source and Target Metadata in Data Integrator

Defining a file format

If the source or target RDBMS includes data stored in flat files, you must define file formats in Data Integrator. File formats are a set of properties that describe the structure of a flat file.

Data Integrator includes a file format editor. Use it to define flat file formats. The editor supports delimited and fixed-width formats.

You can specify file formats for one file or a group of files. You can define flat files from scratch or by import and modify an existing flat file. Either way, Data Integrator saves a connection to the file or file group.

NOTE: Data Integrator also includes a file format

(Transport_Format) that you can use to read flat files in SAP R/3.

In the next chapter you will use a flat file as your source data. Therefore, you must create a file format and connection to the file now.

Defining a file format

" To define a file format

1. In the object library, click the File Formats tab, right-click in a

blank area of the object library, and click

2. The file format editor opens.

3. Under General, leave Ty p e as Delimited. Change the Name to Format_SalesOrg.

Under Data Files, click the Files folder icon and navigate in your Data Integrator install directory to \Tutorial Files\sales_org.txt. Click displays sample data from the sales_org.txt file in the (lower right) Data Preview pane.

New > File Format.

Open. The file format editor

Data Integrator Core Tutorial 45

Defining Source and Target Metadata in Data Integrator

Defining a file format

5. Under Default Format, change Date to ddmmyyyy.

The source data contains dates with a two-digit day number followed by a two-digit month number, a four-digit year (ddmmyyyy), and no time value. (Note that the sample dates do not contain a delimiter between values. Using an unsupported format string such as ddmmJJJJ will result in incorrect dates, and no error message will appear.)

6. The first row contains the names of the column headings.

Under

Input/Output, change Skip row header to Ye s .

Data Integrator removes the first row and uses these column heading values for the field names.

7. In the upper-right Column Attributes pane, click DateOpen

and change the data type to lengths should appear as follows:

Field Name Data Type Field Size

SalesOffice Int Region VarChar 2 DateOpen Date Country VarChar 7

Date. The column types and

46 Data Integrator Core Tutorial

Defining Source and Target Metadata in Data Integrator

8. Click Save & Close.

Defining a file format

Data Integrator Core Tutorial 47

Defining Source and Target Metadata in Data Integrator

New terms

The terms examined in this chapter included:

Te r m M e a n i n g

Datastore Connection from Data Integrator to tables in source

or target databases. Stored as an object in the repository.

Metadata Data that describes objects maintained by Data

Integrator. Metadata that Data Integrator stores in its local repository includes:

•Table name

• Column name

• Column data types

• Primary key columns

• Table attributes

•RDBMS functions

Object library The GUI part of the Designer representing the local

repository.

File format A set of properties that define the table structure for a

flat file. Stored as an object in the repository.

48 Data Integrator Core Tutorial

Defining Source and Target Metadata in Data Integrator

Chapter summary and what to do next

At this point, you have finished all the preparatory work to define data movement specifications for a flat-file data source to a target data warehouse. In this chapter you have:

• Defined a datastore from Data Integrator to your target data warehouse

• Imported metadata from target tables into the local repository so that you can use Data Integrator Designer to work with these tables

• Defined file formats and a connection to flat-file source data

You are now ready to create a new project and define jobs that will populate the target tables with source data. You will do that for the sales organization dimension table in Chapter 4,

“Populating the SalesOrg Dimension from a Flat File”.

Sales Org Dimension

You can now exit Data Integrator or go on to the next chapter. The information you have created in this chapter has been saved in the local repository and will be automatically available to you the next time you use Data Integrator.

If you need more information about the topics covered in this chapter, see the Data Integrator Designer Guide.

Data Integrator Core Tutorial 49

Defining Source and Target Metadata in Data Integrator

Chapter summary and what to do next

50 Data Integrator Core Tutorial

Data Integrator Core Tutorial

Populating the SalesOrg

Dimension from a Flat File

In this chapter, you will populate the sales organization

Sales Org Dimension

dimension table in your target data warehouse with data from a flat file called Format_SalesOrg.

The concepts and tasks in this chapter include:

• Objects and their hierarchical relationships

• Adding a new project

• Adding a job

• Adding a work flow

• About data flows

• Saving the project

• Executing the job

• About deleting objects

• New terms

• Chapter summary and what to do next

Data Integrator Core Tutorial 51

Populating the SalesOrg Dimension from a Flat File

Objects and their hierarchical relationships

Everything in Data Integrator is an object. The key objects involved in data movement activities (like projects, jobs, work flows, and data flows) display in the Data Integrator Designer project area according to their relationship in the object hierarchy.

The following figure shows a display of the types of objects you will be creating while you are working in this chapter.

Project

Job

Workflow

Dataflow

Object hierarchies are displayed in the project area of the Designer.

52 Data Integrator Core Tutorial

Adding a new project

Projects group and organize related objects. Projects display in the project area of the Designer and can contain any number of jobs, work flows, and data flows.

" To add a new project

1. In the Data Integrator Designer, from the Project menu click New > Project.

2. Name the project Class_Exercises.

3. Click Create.

The project name appears as the only object in the project area of the Designer.

Next, you will define the job that will be used to extract the information from the flat-file source.

Populating the SalesOrg Dimension from a Flat File

Adding a new project

Data Integrator Core Tutorial 53

Populating the SalesOrg Dimension from a Flat File

Adding a job

A job is a reusable object. It is also the second level in the project hierarchy. It contains work flows (which contain the order of steps to be executed) and data flows (which contain data movement instructions). You can execute jobs manually or as scheduled.

In this exercise you will define a job called JOB_SalesOrg.

" To add a new batch job

1. Right-click in the project area and click New Batch Job.

2. Right-click the new job and click Rename. Alternatively,

left-click the job twice (slowly) to make the name editable.

3. Type JOB_SalesOrg.

4. Left-click or press Enter.

The job appears in the project hierarchy under Class_Exercises and in the project area of the object library.

54 Data Integrator Core Tutorial

Adding a work flow

A work flow is a reusable object. It executes only within a Job. Use work flows to:

•Call data flows

• Call another work flow

• Define the order of steps to be executed in your job

• Pass parameters to and from data flows

• Define conditions for executing sections of the project

• Specify how to handle errors that occur during execution

• Define conditions for executing sections of the project

Work flows are optional.

The Data Integrator objects you can use to create work flows appear on the tool palette:

Populating the SalesOrg Dimension from a Flat File

Adding a work flow

Button Component Programming Analog

Work flow Procedure

Data Flow Declarative SQL select statement

Script Subset of lines in a procedure

Conditional If/then/else logic

Try Try block indicator

Catch Try block terminator and exception handler

Data Integrator Core Tutorial 55

Populating the SalesOrg Dimension from a Flat File

Adding a work flow

" To ad d a wo r k f l o w

1. With JOB_SalesOrg selected in the project area, click the

work flow button on the tool palette.

2. Click the blank workspace area.

A work flow icon appears in the workspace. The work flow also appears in the project area on the left under the job name (expand the job to view).

NOTE: You can place a work flow anywhere in the

workspace, but because flows are best viewed from left to right and top to bottom, place it near the top left corner.

3. Change the name of the work flow to WF_SalesOrg.

4. Click the name of the work flow.

An empty view for the work flow appears in the workspace. You will use this view to define the elements of the work flow. Notice the title bar changes to display the name of the work flow.

56 Data Integrator Core Tutorial

About data flows

A data flow defines the flow of data from sources to targets. It is used to:

• Identify the source data you want to read

• Define the transformations you want to perform on the data

• Identify the target table to which you want to load data

A data flow is a reusable object. It is always called from a work flow or a job.

The first data flow you need to create for this tutorial reads sales organization data from a flat file and loads the data into the sales organization dimension table in the target data warehouse.

Adding a data flow

The following procedure creates a data flow named DF_SalesOrg inside the work flow WF_SalesOrg.

Populating the SalesOrg Dimension from a Flat File

About data flows

" To ad d a dat a f l ow

1. Make sure the work flow window is open in the workspace.

If it is not, click the WF_SalesOrg work flow in the project area.

2. Click the data flow button on the tool palette.

3. Click the workspace.

A representation of a data flow appears in the workspace. The data flow also appears in the project area.

4. Change the name of the data flow to DF_SalesOrg.

Data Integrator Core Tutorial 57

Populating the SalesOrg Dimension from a Flat File

About data flows

Notice the project, job, work flow, and data flow objects display in hierarchical form in both the project area and the object library. To navigate to these levels, click their names in the project area.

Data flow in the workspace

Data flow in the project area

Data flow in the local object library

5. Click the data flow name.

A definition area or grid for the data flow appears in the workspace. This area will be used to define the data flow as described in the next section.

58 Data Integrator Core Tutorial

Defining the data flow

Inside the data flow, you must specify the instructions to transform source data into the form you want for the target table. In this case, you will define the data flow instructions for building the sales organization dimension table using the following objects:

• An object representing the source file

• An object defining a query transform (or query). A query transform maps columns from source to target. With a query, you can:

! Select row sets ! Create joins ! Group and order by data values ! Create data set filters ! Execute functions

Populating the SalesOrg Dimension from a Flat File

About data flows

• An object representing the target table into which the data loads

The next three exercises guide you through the steps necessary to define a dataflow’s content:

1. Add objects to the data flow.

2. Connect them in the order that data will flow through them.

3. Define the query that maps the source columns to the target

columns.

" To add objects to a data flow

1. Verify the DF_SalesOrg data flow is open in the workspace.

In the object library, click the

Formats tab.

Data Integrator Core Tutorial 59

Populating the SalesOrg Dimension from a Flat File

About data flows

2. The source for this data flow will be the flat file

Format_SalesOrg. Click and drag the Format_SalesOrg object to the workspace.

3. Drop the Format_SalesOrg object on the left side of the

workspace to leave room for more objects on the right.

4. Click Make Source from the shortcut menu that appears

when you drop the object.

5. Click the query button on the tool palette.

6. Click to the right of the source file in the workspace.

7. In the object library, click the Datastores tab and expand the

datastore named Target_DS.

8. The target for this data flow will be the table

SALESORG_DIM. Click and drag the SALESORG_DIM object to the workspace and drop it to the right of the query.

9. Click Make Target from the shortcut menu that appears

when you drop the table. Data Integrator creates a table with the same schema as the query output schema.

All the elements necessary to create the sales organization dimension table are now in the workspace. In the next section, you will connect the steps in the correct order.

" To define the order of steps in a data flow

You now need to define the sequence for the data flow. The steps execute in left-to-right order.

1. Click the square on the right edge of the source file and drag

it to the triangle on the left edge of the query transform.

60 Data Integrator Core Tutorial

Populating the SalesOrg Dimension from a Flat File

About data flows

2. Use the same drag technique to connect the query transform

to the target table.

The next section introduces the query transform and the query transform editor.

" To define a query transform that maps a source to a target

1. Click the name of the query in the project area or in the

workspace.

The query editor opens showing the source schema, the target schema (which has automatically been copied from the target table metadata to the output pane of the query), and property sheets for defining the query.

There are four columns in the source schema: Sales_Office, Region, DateOpen, and Country. You will specify that three of them (Sales_Office, Region, and DateOpen) be copied to the target table.

2. Map the source columns to target columns by dragging and

dropping each source column name onto the corresponding column for the target schema. From the shortcut menu that appears, click imported from the target. Notice that the column icon next to the source column changes to an arrow when dragged to the target, which indicates that the column has been mapped.

Remap Column to preserve the data type

Data Integrator Core Tutorial 61

Populating the SalesOrg Dimension from a Flat File

About data flows

Also notice that when you select a column in the target schema, the Mapping tab of the property sheet shows the mapping relationship for that column. You can also view the mapping relationship by scrolling the Schema Out pane to the right to show the Mapping column.

Target schema

Source schema

Property sheets

Column mapping definition

3. Validate the query by clicking the Valid ate Cu rre n t button on

the toolbar.

The Output window appears with tabs for Errors, Warnings, and Information. The Warning tab indicates that Data Integrator will convert the data type for the SALESOFFICE column.

4. Close the Output window and click the Back arrow on the

tool bar to close the query editor and return to the data flow view.

62 Data Integrator Core Tutorial

Validating the data flow

Next you will verify that the data flow has been constructed properly (which does not guarantee that the job will run or produce the correct data).

Va l id at i o n menu provides design-time validation options.

The These options check for syntax errors, not run-time errors. Run-time validation occurs while the job executes. (See

“Executing the job” on page 66.)

" To va l i da t e a d at a f l o w

1. Click DF_SalesOrg in the project area.

2. From the menu bar, click Vali dati on > Vali date > C ur rent View

NOTE:

Populating the SalesOrg Dimension from a Flat File

About data flows

! Current View — Validates the object definition open in

the workspace.

! All Objects in View — Validates the object definition

open in the workspace and all of the objects that it calls.

Also note the toolbar.

Data Integrator displays the Output dialog box with the Warning tab indicating that Data Integrator will convert the data type for the SALESOFFICE column.

If the Output window displays any errors, correct them before proceeding.

Val idate C urr ent and Vali date A ll buttons on the

Data Integrator Core Tutorial 63

Populating the SalesOrg Dimension from a Flat File

About data flows

Addressing errors

" To use the Output window:

1. Right-click an error message and click View.

Data Integrator displays the Error Message window in which you can read the expanded error message text.

2. Double-click the error message to open the editor of the

object containing the error.

NOTE: Warning messages do not prohibit job execution.

You have validated the data flow and completed the description of the data movement for the sales organization dimension table.

64 Data Integrator Core Tutorial

Saving the project

You can save the steps you have completed and close Data Integrator at any time.

Populating the SalesOrg Dimension from a Flat File

Saving the project

To save all objects in a project, from the

All

. Or, when you close a project or exit the program, Data Integrator lists all recently changed items and prompts you to save them.

To save work displayed only in the active object workspace, from the

Project menu click Save.

Project menu click Save

Data Integrator Core Tutorial 65

Populating the SalesOrg Dimension from a Flat File

Executing the job

Now you will execute the job you have just created.

" To execute a job

First verify that your job server is running by looking at the job server icon at the bottom of the Designer window. Mouse over the icon to see the Job Server name, machine name, and port number in the status area. If the job server is not running, the icon will have a red X on it.

1. Select the job name in the project area, in this case

JOB_SalesOrg.

2. Right-click and click Execute.

If you have changed any objects and not yet saved them, Data Integrator prompts you to save your work. Click

OK.

66 Data Integrator Core Tutorial

Populating the SalesOrg Dimension from a Flat File

Executing the job

Data Integrator validates the job again before running it. If there are syntax errors, review the error messages and return to the design and validate the effected steps of the job. Also review any warning messages before dismissing them.

If the job validates properly, the Execution Properties dialog box appears. These properties include debugging parameters, whether you want to capture data for viewing, and an option to set global variables.

To set or change a job’s default execution settings, right-click the job name and click

3. For this exercise, do not change the defaults. Click OK.

Properties.

Data Integrator Core Tutorial 67

Populating the SalesOrg Dimension from a Flat File

Executing the job

Data Integrator executes the job and produces three log files:

! Monitor log — A list of the job steps in the order they

started.

! Statistics log — A list of each step in the job, the number

of rows processed by that step, and the time required to complete the operation.

! Error log — A list of any errors produced by the RDBMS,

Data Integrator, or the computer operating system during the job execution.

The log window opens in the workspace and displays the monitor log. The buttons at the top of the log window show which kind of log you are viewing. Clicking the middle button displays the statistics log, and the right-hand button shows the error log.

4. Use the buttons at the top of the log window to view the

statistics log. Because there are no errors or warnings, the error log button is unavailable (there is no error log to examine).

68 Data Integrator Core Tutorial

Populating the SalesOrg Dimension from a Flat File

Executing the job

5. After the job is complete, use your RDBMS query tool to

check the contents of the table named salesorg_dim.

Data Integrator Core Tutorial 69

Populating the SalesOrg Dimension from a Flat File

About deleting objects

Deleting a job or other object from the project area does not delete it from the object library. The object still exists in any other projects to which it has been assigned. Deleting an object from the object library, however, deletes all occurrences of the object.

70 Data Integrator Core Tutorial

New terms

Populating the SalesOrg Dimension from a Flat File

New terms

The terms introduced in this chapter included:

Te r m M e a n i n g

Property sheets Tabs in the query editor used for defining properties

of the output schema Query A Data Integrator transform object Schema A description of the data layout of a table Transform A Data Integrator object that defines data

transformation

Data Integrator Core Tutorial 71

Populating the SalesOrg Dimension from a Flat File

Chapter summary and what to do next

You have now defined and run your first data movement job—a job that populates the sales organization dimension table in the target data warehouse.

The next chapter introduces you to time dimension tables. In that exercise you will populate the time dimension table with some simple time attributes:

•Year number

•Month number

• Business quarter

The chapter also introduces you to the Date_Generation transform.

Time Dimension

If you need more information about the topics covered in this chapter, see the Data Integrator Designer Guide.

72 Data Integrator Core Tutorial

Data Integrator Core Tutorial

Populating the Time

Dimension Using a Transform

The exercise in this chapter builds and populates a time

Time Dimension

dimension table. Time dimension tables contain date/time-related attributes such as season, holiday period, fiscal quarter, and others that are not directly ascertainable from traditional SQL style date/time data types.

The time dimension table in this example is simple in that it contains only the year number, month number, and business quarter as time attributes. It uses a Julian date as a primary key.

The tasks in this chapter include:

• Retrieving the project

• Adding the job and data flow

• Defining the time dimension data flow

• Saving and executing the job

Data Integrator Core Tutorial 73

Populating the Time Dimension Using a Transform

Retrieving the project

If you have closed Data Integrator, reopen the Class_Exercises project.

" To open the Class_Exercises project

1. Start Data Integrator.

2. Log in to your repository.

3. Click Project > Open.

4. Click Class_Exercises.

5. Click OK.

74 Data Integrator Core Tutorial

Populating the Time Dimension Using a Transform

Adding the job and data flow

As you have done in the previous exercise, you will begin with creating a job that contains a data flow. A work flow is not necessary for this job; you will be calling the data flow directly from the job.

" To ad d t h e j o b an d dat a f l ow

1. Add a job named JOB_TimeDim that will load data into the

time dimension table.

For a reminder about how to create a job, see “Adding a job”

on page 54.

2. Open JOB_TimeDim and create a data flow object named

DF_TimeDim. (Do not create a work flow this time.)

Adding the job and data flow

Data Integrator Core Tutorial 75

Populating the Time Dimension Using a Transform

Defining the time dimension data flow

The data flow instructions for populating a time dimension table consist of the following objects:

• A Date_Generation transform as a source

• A query to produce column values from the generated dates such as what day the date falls on, the quarter it’s in, and so on.

• A target table into which the time dimension data loads

" To specify the components of the time data flow

1. Open the data flow named DF_TimeDim.

2. In the object library, click the Transforms tab.

3. Click the Date_Generation transform.

Transforms are predefined Data Integrator objects.

4. Drag the transform icon into the workspace and drop it on

the left side of the workspace to leave room for more objects to the right.

5. Click the query button on the tool palette.

6. Click in the workspace to the right of the Date_Generation

transform.

7. In the object library for the datastore named Target_DS, drag

the table named TIME_DIM into the workspace and drop it to the right of the query.

8. Click Make Target from the shortcut menu.

76 Data Integrator Core Tutorial

Populating the Time Dimension Using a Transform

All the objects you need to create a time dimension table are now available in the workspace. Next you will connect the objects in the order they need to be executed.

" To define the flow of data

1. Click the square on the right edge of the Date_Generation

transform and drag to the triangle on the left edge of the query.

2. Use the same drag technique to connect the query to the

TIME_DIM target.

The connections indicate the flow of data through these instructions.

Defining the time dimension data flow

The following sections define the details of each step of the data flow.

" To define the output of the Date_Generation transform

The Date_Generation transform produces a column of dates for a range and increment you specify.

Data Integrator Core Tutorial 77

Populating the Time Dimension Using a Transform

Defining the time dimension data flow

1. Click the name of the Date_Generation transform to open

the transform editor.

2. Enter these values in the editor columns:

Start Date 1994.01.01 End Date 2000.12.31 Increment daily

3. Click the Back arrow in the tool bar to close the

transform editor and return to the data flow.

The date column serves as the input to the query. The query allows you to apply functions to the input columns and map those columns to an internal data set (the query output).

" To define the output of the query

1. Click the query icon in the project area.

78 Data Integrator Core Tutorial

Populating the Time Dimension Using a Transform

Defining the time dimension data flow

The query editor shows an input schema with a single column, the output schema copied from the target, and property sheets for defining the query.

2. Map the generated date to the NATIVEDATE column by

dragging the DI_GENERATED_DATE column from the input schema to the NATIVEDATE output column, and click Remap Column.

Alternatively, you can drag and drop the DI_GENERATED_DATE column from the source pane into the mapping expression. The column name DI_GENERATED_DATE will be qualified with the name of the Date_Generation transform; for example, Date_Generation_1.DI_GENERATED_DATE.

3. Click each column name in the target schema (one by one)

and define the mapping for the column using the following table. The mapping definition applies to the column selected in the target schema. Type the mapping expression directly on the

Mapping tab.

Alternatively, click the

Function button to open the Function

editor and add functions to the mapping definition that way.

These are the mapping expressions and function descriptions for the rest of the columns:

Column name Mapping Function description

Date_ID

Yea rNum

MonthNum

BusQuarter

julian(di_generated_date)

to_char(di_generated_date, 'yyyy')

month(di_generated_date)

quarter(di_generated_date)

Use the JULIAN function to set the Julian date for that date value.

Use the TO_CHAR function to select only the year out of the date value. Enclose yyyy in single quotes.

Use the MONTH function to set the month number for that date value.

Use the QUARTER function to set the quarter for that date value.

Data Integrator Core Tutorial 79

Populating the Time Dimension Using a Transform

Defining the time dimension data flow

You might notice that the tutorial simplifies the quarter calculation. For the purposes of the tutorial, assume that the business year is the same as the calendar year.

4. Click the Back arrow on the tool bar.

These columns are now the input schema for the TIME_DIM table.

This completes the data-movement description for the time dimension table.

80 Data Integrator Core Tutorial

Populating the Time Dimension Using a Transform

Saving and executing the job

Save the project. For a reminder of how to save a project, see

“Saving the project” on page 65.

Execute the job JOB_TimeDim. After the job runs, check the contents of the TIME_DIM table with a query tool for the RDBMS on which your target data warehouse is stored.

Chapter summary and what to do next

You have now populated two tables in the sales data warehouse:

• Sales Org dimension (from a flat file)

• Time dimension, which consists of a date generation transform and a query

In the next chapter, you will extract data to populate the customer dimension table.

Customer Dimension

At this point, you can exit Data Integrator or go on to the next exercise.

If you need more information about the topics covered in this chapter, see the Data Integrator Designer Guide.

Data Integrator Core Tutorial 81

Populating the Time Dimension Using a Transform

Chapter summary and what to do next

82 Data Integrator Core Tutorial

Data Integrator Core Tutorial

Populating the Customer

Dimension from a Relational Ta b l e

The data you used to populate the sales organization dimension

Customer Dimension

table was in a flat file. In this exercise, you will extract data from a relational table and load it into the customer dimension table.

This exercise also demonstrates how to enable and view metadata reports.

You must first import the source and target tables, as described in “Importing metadata” on page 41.

The tasks in this chapter include:

• Adding the CustDim job and work flow

• Adding the CustDim data flow

• Validating the CustDim data flow

• Executing the CustDim job

• Using the interactive debugger

• Chapter summary and what to do next

Data Integrator Core Tutorial 83

Populating the Customer Dimension from a Relational Table

Adding the CustDim job and work flow

" To add the CustDim job

1. Right-click the Class_Exercises project name and click New Batch Job

2. Rename this job JOB_CustDim.

" To ad d t h e C u s t Di m wo r k f l o w

1. Open JOB_CustDim by clicking on its name in the project

area.

2. Define a work flow named WF_CustDim inside

JOB_CustDim.

For more information, see “Adding a work flow” on page 55.

84 Data Integrator Core Tutorial

Populating the Customer Dimension from a Relational Table

Adding the CustDim data flow

" To add the CustDim data flow object

1. Verify the WF_CustDim work flow window is open in the

workspace.

If not, click the WF_CustDim work flow name in the project area (or in the workspace if visible).

2. Add a data flow to the work flow definition.

3. Rename the data flow DF_CustDim.

4. Click the name of the data flow to open the data flow

definition.

Adding the CustDim data flow

Data Integrator Core Tutorial 85

Populating the Customer Dimension from a Relational Table

Defining the CustDim data flow

The data flow instructions for building the dimension table will consist of the following objects:

• The source table

• A query

• The target table into which the customer dimension data loads

" To bring the objects into the data flow

1. Verify the DF_CustDim data flow is open in the workspace.

2. In the object library, view the list of tables in the ods_ds

datastore. Drag and drop the CUSTOMER table to the left side of the workspace and click

3. Click the query icon on the tool palette. and click to the

right of the table to place the query in the workspace.

Make Source.

4. In the object library, from the list of tables for the target_ds

datastore, drag and drop the CUST_DIM table to the right of the query and click

5. Connect the icons to indicate the flow of data, as shown.

" To define the query

1. In the workspace, click the name of the query to open the

query editor.

86 Data Integrator Core Tutorial

Make Target.

Populating the Customer Dimension from a Relational Table

Defining the CustDim data flow

2. Remap the following source columns to the target schema,

leaving the names and data types as they are in the target.

Column Data type Description

Cust_ID varchar(10) Customer number Cust_classf varchar(2) Customer classification Name1 varchar(35) Customer name Address varchar(35) Address City varchar(35) City Region_ID varchar(2) Region Zip varchar(10) Postal code

(Do not map Cust_Timestamp.)

NOTE: In Microsoft SQL Server and Sybase, you must specify

the columns in the order shown in the table.

3. Click the Back arrow in the icon bar to return to the data

flow.

Data Integrator Core Tutorial 87

Populating the Customer Dimension from a Relational Table

Validating the CustDim data flow

Next you will verify that the data flow has been constructed properly.

" To verify that the data flow has been constructed properly

1. From the menu bar, click Va l id at i o n > Validate > All Objects in View

If your design contains syntax errors, a dialog box appears with a message describing the error. Warning messages usually do not affect proper execution of the job.

If your data flow contains no errors, you will get the following message:

88 Data Integrator Core Tutorial

Populating the Customer Dimension from a Relational Table

Executing the CustDim job

Just as you executed the job named JOB_SalesOrg in the previous exercise, execute the job named JOB_CustDim:

1. In the project area, right-click the job name and click Execute.

You will be prompted to save your work:

Executing the CustDim job

2. Click OK.

The Execution Properties window appears.

3. Check the Capture data for viewing box and accept the

default (100) collects sample output rows from data flows and transforms in the job.

4. After the job completes, ensure there are no error or warning

messages.

5. To view the captured sample data, in the project area click

the data flow to open it in the workspace. Click the magnifying glass on an object to view the data.

Or, use a query tool to check the contents of the cust_dim table.

rows. Data Integrator’s View Data feature

Data Integrator Core Tutorial 89

Populating the Customer Dimension from a Relational Table

Using the interactive debugger

The Designer includes an interactive debugger that allows you to examine and modify data row by row by placing filters and breakpoints on lines in a data flow diagram. The debugger allows you to examine what happens to the data after each transform or object in the flow.

A debug filter functions as a simple query transform with a WHERE clause. Use a filter to reduce a data set in a debug job execution. A breakpoint is the location where a debug job execution pauses and returns control to you.

This exercise demonstrates how to set a breakpoint and view data in debug mode.

" To set a breakpoint in a data flow

1. Click the name of the DF_CustDim data flow to open its

definition.

2. Right-click the line between the source table and the query

and click

3. In the Breakpoint pane, select the Set check box.

90 Data Integrator Core Tutorial

Set Filter/Breakpoint.

Populating the Customer Dimension from a Relational Table

4. Click OK.

Using the interactive debugger

" To use the interactive debugger

1. In the project area, right-click Job_CustDim and click Start debug

If prompted to save your work, click

The Debug Properties window opens.

2. Click OK.

OK.

Data Integrator Core Tutorial 91

Populating the Customer Dimension from a Relational Table

Using the interactive debugger

The Designer displays three additional windows (Call stack, Trace, and Variables) as well as View Data panes below the work space.

Call stack window

View Data panes

Trace window

The left View Data pane shows the data in the CUSTOMER source table, and the right pane shows one row at a time (the default) that has passed to the query.

3. To process the next row, from the Debug menu click Get Next Row

The next row replaces the existing row in the query pane. To see all rows, select the

92 Data Integrator Core Tutorial

Variables window

All check box.

Business objects DATA INTEGRATOR 6.5.1 CORE TUTORIAL User Manual

Specifications and Main Features

Frequently Asked Questions

User Manual

Contents

Before You Begin

Introduction

Audience and assumptions

Tutorial objectives

Data Integrator product documentation

Tutorial prerequisites

Preparation for this tutorial

Environment required

Tutorial setup

Create repository, source, and target databases on an existing RDBMS

Install Data Integrator on your computer

Run SQL scripts to create sample source and target tables

Data Integrator tutorial structure

To exit the tutorial

Introduction

What does Data Integrator do?

Data Integrator components

How do you use Data Integrator?

Data Integrator system configurations

Windows implementation

UNIX implementation

The Data Integrator Designer window

Data Integrator objects

Object hierarchy

Projects and jobs

Work flows and data flows

Object-naming conventions

New terms

Chapter summary and what to do next

Logging in to the repository

Defining a datastore

Importing metadata

Defining a file format

New terms

Chapter summary and what to do next

Objects and their hierarchical relationships

Adding a new project

Adding a job

Adding a work flow

About data flows

Adding a data flow

Defining the data flow

Validating the data flow

Addressing errors

Saving the project

Executing the job

About deleting objects

New terms

Chapter summary and what to do next

Retrieving the project

Adding the job and data flow

Defining the time dimension data flow

Saving and executing the job

Chapter summary and what to do next

Adding the CustDim job and work flow

Adding the CustDim data flow

Defining the CustDim data flow

Validating the CustDim data flow

Executing the CustDim job

Using the interactive debugger