COPYRIGHT No part of the computer software or this document may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in
writing from Business Objects. The information in this document is subject to change without notice. If you find any problems with this
documentation, please report them to Business Objects in writing at documentation@businessobjects.com. Business Objects does not
warrant that this document is error free.
TRADEMARKS
The Business Objects logo, BusinessObjects and Rapid Marts are registered trademarks of Business Objects S.A. in the United States
and/or other countries.
Microsoft, Windows, Windows NT, Access, and other names of Microsoft products referenced herein are either registered trademarks
or trademarks of Microsoft Corporation in the United States and/or other countries.
Oracle is a registered trademark of Oracle Corporation. All other names of Oracle products referenced herein are trademarks or
registered trademarks of Oracle Corporation.
DETAIL is a trademark of Striva Technology Ltd.
SAP, R/3, BW, ALE/WEB and ABAP/4 are the registered or unregistered trademarks of SAP AG.
PeopleSoft is a registered trademark of PeopleSoft, Inc.
All other product, brand, and company names mentioned herein are the trademarks of their respective owners.
USE RESTRICTIONS
This software and documentation is commercial computer software under Federal Acquisition regulations, and is provided only under
the Restricted Rights of the Federal Acquisition Regulations applicable to commercial computer software provided at private expense.
The use, duplication, or disclosure by the U.S. Government is subject to restrictions set forth in subdivision (c)(1)(ii) of the Rights in
Technical Data and Computer Software clause at 252.227- 7013.
The exercises in this tutorial introduce concepts and techniques
to extract, transform and load batch data from flat-file and
relational database sources for use in a data warehouse. Use this
tutorial to obtain practical experience using Data Integrator
components including the Designer, repository and Job Server.
This chapter covers the following:
•Introduction
•Audience and assumptions
•Tutorial objectives
•Data Integrator product documentation
•Tutorial prerequisites
•Data Integrator tutorial structure
•To exit the tutorial
Data Integrator Core Tutorial 1
1
Before You Begin
Introduction
Introduction
This tutorial introduces core features of Data Integrator. Data
Integrator is a component of the Business Objects Business
Intelligence suite that allows you to extract and integrate data for
analytical reporting and e-business.
While exercises in this tutorial focus on how to use batch data
flows, you can also use Data Integrator to develop real-time data
extraction and integration as described in Chapter 13, “Running
a Real-time Job in Test Mode”.
The Business Objects Data Integration platform also provides a
number of Rapid Marts, which are predefined data models with
built-in jobs provided by Data Integrator for use with business
intelligence (BI) and online analytical processing (OLAP) tools.
Contact your sales representative for more information about
Rapid Marts.
2 Data Integrator Core Tutorial
Audience and assumptions
This tutorial assumes that:
•You are an application developer or database administrator
working on data extraction, data warehousing, or data
integration.
•You understand your source data systems, DBMS, business
intelligence, and e-business messaging concepts.
•You understand your organization’s data needs.
•You are familiar with SQL (Structured Query Language).
•You are familiar with Microsoft Windows NT or Windows
2000.
Before You Begin
Audience and assumptions
1
Data Integrator Core Tutorial 3
1
Before You Begin
Tutorial objectives
Tutorial objectives
The intent of this tutorial is to introduce core Data Integrator
Designer functionality.
After completing this tutorial you should be able to:
•Describe the process for extracting, transforming, and
loading data using Data Integrator
•Identify Data Integrator objects
•Define Data Integrator objects to extract flat-file and
relational data from your sources, transform the data to suit
your needs, and load the data to your targets
•Use Data Integrator features and functions to:
!Recover from run-time errors
!Capture changed data
!Set up a multiuser development environment
!Run a real-time job
!View and print metadata reports
!Examine data throughout a job using the debugger
4 Data Integrator Core Tutorial
Before You Begin
Data Integrator product documentation
Data Integrator product documentation
Consult the Data Integrator Getting Started Guide for:
•An overview of Data Integrator products and architecture
•Data Integrator installation and configuration information
•A list of product documentation and a suggested reading
path
After you install Data Integrator, you can view technical
documentation from many locations. To view documentation in
PDF format, you can:
1
•Select
•Select
Start > Programs > Business Objects Data Integrator
version > Data Integrator Documentation,
Designer’s
!Release Notes
!Release Summary
!Technical Manuals
Help menu, click:
Help in Data Integrator Web Administrator
or from the
You can also view and download PDF documentation by visiting
Business Objects Customer Support online. To access this Web
site, you must have a valid user name and password. To obtain
your user name and password, go to
http://www.techsupport.businessobjects.com and click
Register.
After installing Data Integrator Designer you can also view
documentation in HTML format.
Online Help opens
automatically the first time you open the Designer.
You can also open
•Choose
Contents from the Designer’s Help menu.
•Click objects in the object library or workspace and press
Help, using one of the following methods:
F1.
Online Help opens to the subject you selected.
Data Integrator Core Tutorial 5
1
Before You Begin
Data Integrator product documentation
Use OnlineHelp’s links and tool bar to navigate:
Online help tool bar:
ButtonNameDescription
HomeReturns to the Online Help home page.
Show/Hide
Navigation
Show/Hide
Descriptions
Start/Stop Help
Updates
Shows and hides the Table of Contents for the
core suite of Data Integrator Technical Manuals.
Includes a master Index and Search tabs.
The Online Help window automatically updates
its information when you click an object in the
Designer window. In addition, you can click any
object in the Designer window then click this
button to show or hide an object’s local and
global variables.
When the lock appears open, each time you click
an object in the Designer window, the Online
Help information automatically updates.
When the lock button appears closed, the Online
Help window does not automatically update
when you click objects in the Designer window;
you must press F1.
6 Data Integrator Core Tutorial
Tutorial prerequisites
This section provides a high-level description of the steps you
need to complete before you begin the tutorial exercises.
Preparation for this tutorial
Read the chapters on logging in to the Designer and the Designer
user interface in the Data Integrator Designer Guide to get an
overview of the Designer user interface including terms and
concepts relevant to this tutorial.
This tutorial also provides a high-level summary in the next
chapter, “Data Integrator Product Overview” on page 17.
Environment required
To use this tutorial, you must have Data Integrator running on
Windows NT or 2000 and a supported RDBMS (Oracle, IBM
DB2, Informix, Microsoft SQL Server, or Sybase).
Before You Begin
Tutorial prerequisites
1
You can install Data Integrator product components (Designer,
Web Administrator, Job Server, Access Server) on a single
computer or distribute them across multiple computers.
In the simplest case, all components in the following diagram
can reside on the same computer
Data Integrator Core Tutorial 7
1
Before You Begin
Tutorial prerequisites
Tutorial setup
Ensure you have access to a RDBMS; contact your system
administrator for assistance.
To set up your computer for this tutorial you must do the
following tasks:
•Create repository, source, and target databases on an existing
RDBMS
•Install Data Integrator on your computer
•Run SQL scripts to create sample source and target tables
The following sections describe each of these tasks.
Create repository, source, and target databases on an
existing RDBMS
" To create the databases
1. Log in to your RDBMS.
8 Data Integrator Core Tutorial
Before You Begin
Tutorial prerequisites
2. (Oracle only). Optionally create a service name alias, for
example, using the Oracle Net8 Easy Config utility.
Set the protocol to TCP/IP and enter a service name; for
example, training.bobj. This can act as your connection
name.
3. Create three databases—for your repository, source
operational data store (ODS), and target. For each, you must
create a user account and password. The recommended
values used in the tutorial SQL scripts are:
RepositorySourceTarget
User namerepoodstarget
Passwordrepoodstarget
4. Grant access privileges for the user account. For example for
Oracle, grant CONNECT and RESOURCE roles.
5. Make a note of the connection names, database versions,
user names and passwords in the following table. You will be
asked to refer to this information throughout the tutorial.
1
ValueRepositorySourceTarget
•Database connection name (Oracle)
or
•Database server name and
Database name (MS-SQL Server)
Database version
User name
Password
Install Data Integrator on your computer
See the Data Integrator Getting Started Guidefor detailed
information about system requirements, configuration, and
running the installation program for Windows NT or 2000. See
also the previous section, “Environment required” on page 7.
Data Integrator Core Tutorial 9
1
Before You Begin
Tutorial prerequisites
Be prepared to enter the following information when installing
Data Integrator:
•Your Windows NT or 2000 domain and user name
•Windows NT or 2000 password
•Your Windows NT or 2000 computer’s name and host ID
•Data Integrator license information
•Connection information for the local repository and Job
Server
NOTE: In the Repository Manager window, click the Local
option for the repository.
•A second Windows NT or 2000 account with rights to log in
as a service so the Job Server can run using this account
The installation process takes you into the Data Integrator Server
Manager to define a Job Server. Then, you can optionally create an
Access Server if you want to use web-based batch job
administration.
10 Data Integrator Core Tutorial
Before You Begin
Tutorial prerequisites
The default installation creates the following entries in the
Data Integrator DesignerOpens Data Integrator Designer.
License InformationDisplays license information.
Metadata ReportsOpens the metadata reporting tool in a
Repository ManagerOpens a dialog box that you can use to
Server ManagerOpens a dialog box that you can use to
Web AdministratorOpens the Data Integrator Web
Provides a list of documentation PDFs
including Release Notes, Release
Summary, Technical Manuals (complete
documentation set), and Tutorial.
web browser
update repository connection
information.
configure Job Servers and Access Servers.
Administrator in a browser.
1
When you installed Data Integrator, it sets up a Windows service
for the Job Server. To verify that the service is enabled, open the
Services Control Panel and ensure that any Data Integrator
services are configured for Started Status and Automatic Startup
Type.
Run SQL scripts to create sample source and target tables
Data Integrator installation includes a batch file
(CreateTables_databasetype.bat) for each supported RDBMS
(indicated by the suffix databasetype). The batch files run SQL
scripts that create and populate tables on your source database
and create the target schema on the target database.
Data Integrator Core Tutorial 11
1
Before You Begin
Tutorial prerequisites
" To run the scripts
1. Using Windows Explorer, locate the CreateTables batch file
for your RDBMS in your Data Integrator installation
directory in \Tutorial Files\Scripts.
2. Open the appropriate script file and edit the pertinent
connection information (and user names and passwords if
you are not using ods/ods and target/target).
The Oracle batch file contains commands of the form:
(The output files provide logs that you can examine for
success or error messages.)
3. Double-click on the batch file to run the SQL scripts.
4. Use an RDBMS query tool to check your source ODS
database.
5. The following tables should exist on your source database
after you run the script. These tables should include a few
rows of sample data.
Descriptive NameTable Name in Database
CustomerCustomer
MaterialMaterial
Sales Order HeaderSalesOrder
Sales Order Line ItemSalesItem
Sales DeliveryDelivery
EmployeeEmployee
RegionRegion
12 Data Integrator Core Tutorial
Before You Begin
Tutorial prerequisites
6. Use an RDBMS query tool to check your target data
warehouse.
The following tables should exist on your target database after
you run the script.
Descriptive NameTable Name in Database
Sales Org Dimensionsalesorg_dim
Customer Dimensioncust_dim
Material Dimension mtrl_dim
Time Dimension time_dim
Employee Dimensionemployee_dim
Sales Factsales_fact
Recovery Statusstatus_table
CDC StatusCDC_time
1
Data Integrator Core Tutorial 13
1
Before You Begin
Data Integrator tutorial structure
Data Integrator tutorial structure
The goal of the tutorial exercises is to demonstrate Data
Integrator features using a simplified data model. The model is a
sales data warehouse with a star schema that contains one fact
table and some dimension tables.
Sales Org
Dimension
From
flat file
Employee
Dimension
From
relational
table
Material
Dimension
From
XML file
or SAP
R/3 table
Star Schema
Sales Fact
From
relational
table
Time
Dimension
From
transform
Customer
Dimension
From
relational
table
Chapters build on jobs you create and skills learned in previous
chapters. You must complete each exercise to begin the next.
This tutorial is organized as follows.
Chapter 2, “Data Integrator Product Overview” introduces the
basic architecture and the user interface for Data Integrator.
Chapter 3, “Defining Source and Target Metadata in Data
Integrator” introduces working with the Designer. Use the
Designer to define a datastore and a file format, then import
metadata to the object library. After completing this chapter, you
will have completed the preliminary work required to define
data movement specifications for flat-file data.
14 Data Integrator Core Tutorial
Before You Begin
Data Integrator tutorial structure
Chapter 4, “Populating the SalesOrg Dimension from a Flat File”
introduces basic data flows, query transforms, and source and
target tables. The exercise populates the sales organization
dimension table from flat-file data.
Chapter 5, “Populating the Time Dimension Using a Transform”
introduces Data Integrator functions. This exercise creates a data
flow for populating the time dimension table.
Chapter 6, “Populating the Customer Dimension from a
Relational Table” introduces data extraction from relational
tables. This exercise defines a job that populates the customer
dimension table.
Chapter 7, “Populating the Material Dimension from an XML
File” introduces data extraction from nested sources. This
exercise defines a job that populates the material dimension
table.
Chapter 8, “Populating the Sales Fact Table from Relational
Tables” continues data extraction from relational tables and
introduces joins and the lookup function. The exercise populates
the sales fact table.
1
Chapter 9, “Changed Data Capture” introduces a basic approach
to changed data capture. The exercise uses variables, parameters,
functions, and scripts.
Chapter 10, “Recovery Mechanisms” presents techniques for
recovering from incomplete data loads.
Chapter 11, “Multiuser Development” presents the use of a
central repository for setting up a multiuser development
environment.
Chapter 12, “Extracting SAP R/3 Data” provides optional
exercises on extracting data from an SAP R/3 source.
Chapter 13, “Running a Real-time Job in Test Mode” provides
optional exercises on running a real-time job in test mode.
Data Integrator Core Tutorial 15
1
Before You Begin
To exit the tutorial
To exit the tutorial
You can exit the tutorial at any point after creating a sample
project (see “Adding a new project” on page 53).
" To exit the tutorial:
1. From the Project menu, click Exit.
If any work has not been saved, you are prompted to save
your work.
2. Click Ye s or No.
" To resume the tutorial:
1. Open Data Integrator and log in to the repository in which
you saved your work.
2. The Designer window opens.
3. From the Project menu, click Open.
4. Click the name of the project you want to work with, then
click
The Designer window opens with the project and the objects
within it displayed in the project area.
16 Data Integrator Core Tutorial
Open.
Data Integrator Core Tutorial
Data Integrator Product
2
Overview
This chapter provides an overview of Data Integrator. It
introduces the product architecture and the Data Integrator
Designer.
This chapter contains the following sections:
•Introduction
•The Data Integrator Designer window
•Data Integrator objects
•New terms
•Chapter summary and what to do next
Data Integrator Core Tutorial 17
2
Data Integrator Product Overview
Introduction
Introduction
This section gives you a brief introduction to Data Integrator.
What does Data Integrator do?
Data Integrator is a data movement and integration tool. One of
its fundamental capabilities is extracting, transforming, and
loading (ETL) data from heterogeneous sources into a target
database or data warehouse. You create applications (jobs) that
specify data mappings and transformations by using the Data
Integrator Designer.
Data Integrator real-time interfaces provide additional support
for real-time data movement and access. Data Integrator
real-time reacts immediately to messages as they are sent,
performing predefined operations with message content. Data
Integrator real-time components provide services to web
applications and other client applications.
18 Data Integrator Core Tutorial
Data Integrator components
The Data Integrator product consists of several components
including:
•Data Integrator Designer
The Designer allows you to create, test, and execute jobs that
populate a data warehouse. It is a development tool with a
unique graphical user interface. It enables developers to
create objects, then drag, drop, and configure them by
selecting icons in a source-to-target flow diagram. It allows
you to define data mappings, transformations, and control
logic
. Use the Designer to create applications specifying work
flows (job execution definitions) and data flows (data
transformation definitions).
•Data Integrator Job Server
The Job Server is an application that launches the Data
Integrator processing engine and serves as an interface to the
engine and other components in the Data Integrator suite.
Data Integrator Product Overview
Introduction
2
•Data Integrator engine
The Data Integrator engine executes individual jobs defined
in the application you create using the Designer. When you
start your application, the Data Integrator Job Server
launches enough engines to effectively accomplish the
defined tasks.
•Data Integrator Repository
The repository is a database that stores Designer predefined
system objects and user-defined objects including source and
target metadata and transformation rules. In addition to the
local repository used by the Designer and Job Server, you can
optionally establish a central repository for object sharing
and version control.
Data Integrator Core Tutorial 19
2
Data Integrator Product Overview
Introduction
The Designer handles all repository transactions. Direct
manipulation of the repository is unnecessary except for:
!Setup before installing Data Integrator
You must create space for a repository within your
RDBMS before installing Data Integrator.
!Security administration
Data Integrator uses your security at the network and
RDBMS levels.
!Backup and recovery
You can export your repository to a file. Additionally, you
should regularly back up the database where the
repository is stored.
•Data Integrator Access Server
The Access Server passes messages between web applications
and the Data Integrator Job Server and engines. It provides a
reliable and scalable interface for request-response
processing.
•Data Integrator Web Administrator
The Web Administrator provides browser-based
administration of Data Integrator resources, including:
!Scheduling, monitoring, and executing batch jobs
!Configuring, starting, and stopping real-time services
!Configuring Job Server, Access Server, and repository
!Configuring and managing adapters
!Managing users
!Publishing batch jobs and real-time services via Web
20 Data Integrator Core Tutorial
usage
services
Data Integrator Product Overview
Introduction
The following diagram illustrates Data Integrator product
components and relationships.
2
Central Repository
(shared, optional)
Heterogeneous
Data Sources
Designer and
Local Admin
Local Repository
(user-specific)
Job Server and
Engine
Data Warehouse
(Target)
How do you use Data Integrator?
You use Data Integrator to design, produce, and run data
movement applications.
Using the Designer, you can build work flows and data flows that
specify data extraction, transformation, and loading processes.
In Data Integrator real-time, you have the added capability to
build real-time data flows that support e-business transactions.
Web-based
Administration
Access Server
Web
Applications
You create jobs to contain, organize, and run your flows. You
create projects to organize the jobs.
Refine and build on your design until you have created a
well-tested, production-quality application. In Data Integrator,
you can set applications to run in test mode or on a specific
schedule. Using Data Integrator real-time, you can run
applications in real time so they immediately respond to
web-based client requests.
Data Integrator Core Tutorial 21
2
Data Integrator Product Overview
Introduction
Data Integrator system configurations
You can configure Data Integrator in various ways. The following
diagram illustrates one possible system configuration.
Designer
Job Server
Access Server
(Windows NT
or 2000)
TCP/IP Network
Sources
Job Server
(UNIX)
Data Warehouse
ERP Data
ERP
Application
Server
When integrating Data Integrator into your existing
environment, consider:
•The servers shown in the diagram can be separate physical
computers, or they can be installed on a single computer.
•For peak performance, install and create the Data Integrator
local repository on either the same computer as the Data
Integrator Job Server or on the same computer as the target
data warehouse.
In either of the previous configurations, the computer
should be on the same LAN segment as the rest of the Data
Integrator components.
22 Data Integrator Core Tutorial
As shown in the diagram, most Data Integrator
components—the Designer, Job Server, and Access Server—can
run on the same Windows NT or 2000 system, or you can install
the Job Server on a UNIX system running Hewlett Packard
HP-UX, Sun Solaris, or IBM AIX.
Windows implementation
You can configure a Windows NT or 2000 system as either a
server or a workstation. A large-memory, multiprocessor system
is ideal because the multithreading, pipelining, and parallel
work flow execution features in Data Integrator take full
advantage of such a system.
You can create your target data warehouse on a database server
that may or may not be a separate physical computer.
You can use a shared disk or FTP to transfer data between your
source system and the Data Integrator Job Server.
Data Integrator Product Overview
Introduction
2
UNIX implementation
You can install the Data Integrator Job Server on a UNIX system
running Hewlett Packard HP-UX, Sun Solaris, or IBM AIX. You
can also configure the Job Server to start automatically when you
restart the computer.
Data Integrator Core Tutorial 23
2
Data Integrator Product Overview
The Data Integrator Designer window
The Data Integrator Designer window
The following illustration shows the key areas of the Data
Integrator Designer window.
Workspace
Project area
Tool palette
Local object
library
The key areas of the Data Integrator application window are:
•Project area — Contains the current project (and the job(s)
and other objects within it) available to you at a given time.
In Data Integrator, all entities you create, modify, or work
with are objects.
•Workspace — The area of the application window in which
you define, display, and modify objects.
•Local object library — Provides access to local repository
objects including built-in system objects, such as transforms,
and the objects you build and save, such as jobs and data
flows.
24 Data Integrator Core Tutorial
Data Integrator Product Overview
The Data Integrator Designer window
•Tool palette — Buttons on the tool palette enable you to add
new objects to the workspace.
2
Data Integrator Core Tutorial 25
2
Data Integrator Product Overview
Data Integrator objects
Data Integrator objects
In Data Integrator, all entities you add, define, modify, or work
with are objects. Objects have:
OptionsControl the object. For example, to set up a
connection to a database, defining the database
name would be an option for the connection.
PropertiesDescribe the object. For example, the name and
creation date. Attributes are properties used to
locate and organize objects.
ClassesType of object. You can copy reusable objects
from the object library. You cannot copy
single-use objects.
The following diagram shows transform objects in the Data
Integrator object library.
Object
list
To resize the object library, click and drag it until you see the
border size you want, then release. When you widen the object
library, the name of each object is visible next to its icon.
Object library tabs
26 Data Integrator Core Tutorial
Object hierarchy
The following illustration shows the hierarchical relationships
for the key object types within Data Integrator.
Data Integrator Product Overview
Data Integrator objects
2
Key:
Data Integrator Core Tutorial 27
2
Data Integrator Product Overview
Data Integrator objects
In the repository, the designer groups objects hierarchically from
a project, to jobs, to optional work flows, to data flows. In jobs,
work flows define a sequence of processing steps, and data flows
transform data from source(s) to target(s).
Projects and jobs
A project is the highest-level object in the Data Integrator
Designer window. Projects provide you with a way to organize
the other objects you create in Data Integrator. Only one project
is open at a time (where “open” means “visible in the project
area”).
A job is the smallest unit of work that you can schedule
independently for execution.
Work flows and data flows
Jobs are composed of work flows and/or data flows:
•A work flow is the incorporation of several data flows into a
coherent flow of work for an entire job.
•A data flow is the process by which source data is transformed
into target data.
A work flow orders data flows and operations that support them;
a work flow also defines the interdependencies between data
flows. For example, if one target table depends on values from
other tables, use the work flow to specify the order in which you
want Data Integrator to populate the tables. Also use work flows
to define strategies for handling errors that occur during project
execution. You can also use work flows to define conditions for
running sections of a project.
The following diagram illustrates a typical work flow.
28 Data Integrator Core Tutorial
Work flow
Data Integrator Product Overview
Data Integrator objects
2
Control
Operations
A data flow defines the basic task that Data Integrator
accomplishes, which involves moving data from one or more
sources to one or more target tables or files. You define data
flows by identifying the sources from which to extract data, the
transformations the data should undergo, and targets.
Data flow
Source(s)
Object-naming conventions
Data Integrator recommends that you follow a consistent
naming convention to facilitate object identification. For
example, Rapid Marts use standards that are documented in the
Business ObjectsComponent Development Guide. Here are some
examples:
AttributeProperty that can be used as a constraint for locating
objects.
Data flowContains steps to define how source data becomes target
data. Called by a work flow or job.
DatastoreLogical channel that connects Data Integrator to source
and target databases.
JobThe smallest unit of work that you can schedule
independently for execution. A job is a special work flow
that cannot be called by another work flow or job.
MetadataData that describes the objects maintained by Data
Integrator.
ObjectAny project, job, work flow, data flow, datastore, file
format, message, custom function, or transform created,
modified, or used in Data Integrator.
Object libraryPart of the Designer interface that represents a “window”
into the local repository and provides access to reusable
objects.
OptionA choice in a dialog box that controls how an object
functions.
ProjectGrouping of jobs for the purpose of organized storage.
The Designer can open only one project at a time.
PropertyCharacteristic used to define the state, appearance, or
value of an object; for example, the name of the object or
the date it was created.
RepositoryA database that stores Designer predefined system objects
and user-defined objects including source and target
metadata and transformation rules. Can be local or
central (shared).
Source Table, file, or legacy system from which Data Integrator
reads data.
Target Table or file to which Data Integrator loads data.
Work flowContains steps to define the order of job execution. Calls
a data flow to manipulate data.
30 Data Integrator Core Tutorial
Data Integrator Product Overview
Chapter summary and what to do next
Chapter summary and what to do next
This chapter has given you a short overview of the Data
Integrator product and terminology. For more information
about these topics, see the first few chapters of the Data Integrator
Getting Started Guide and Data Integrator Designer Guide.
In the next chapter, you will begin creating objects using the
Data Integrator Designer.
2
Data Integrator Core Tutorial 31
2
Data Integrator Product Overview
Chapter summary and what to do next
32 Data Integrator Core Tutorial
Data Integrator Core Tutorial
Defining Source and Target
3
Metadata in Data Integrator
In this chapter you will set up logical connections between Data
Integrator, a flat-file source, and a target data warehouse. You
will also create and import objects into the local repository.
Storing connection metadata in the repository enables you to
work within Data Integrator to manage tables that are stored in
various environments.
The tasks in this chapter include:
•Logging in to the repository
•Defining a datastore
•Importing metadata
•Defining a file format
Data Integrator Core Tutorial 33
3
Defining Source and Target Metadata in Data Integrator
Logging in to the repository
Logging in to the repository
When you use Data Integrator, you save your work in the local
repository. So, when you open Data Integrator, a login window
for the local repository opens.
" To log in to Data Integrator
1. From the Start menu, click Programs > Business Objects Data
Integrator version > Data Integrator Designer
As Data Integrator starts, a repository login screen appears. A
sample login window (for the Oracle environment) appears
as follows.
.
2. Supply the appropriate login information for your
repository, then click
Data Integrator connects to the specified repository, then
opens the Designer.
See “Create repository, source, and target databases on an
existing RDBMS” on page 8 for login information.
34 Data Integrator Core Tutorial
OK.
Defining Source and Target Metadata in Data Integrator
Logging in to the repository
In the next section you will define datastores (connections) for
your source and target.
3
Data Integrator Core Tutorial 35
3
Defining Source and Target Metadata in Data Integrator
Defining a datastore
Defining a datastore
Datastores:
•Provide a logical channel (connection) to a database
•Must be specified for each source and target database
•Are used to import metadata for source and target databases
into the repository.
•Are used by Data Integrator to read data from source tables
and load data to target tables
The databases to which Data Integrator datastores can connect
include:
•Oracle
•IBM DB2
•Informix
•Microsoft SQL Server
•Sybase
•ODBC
•DETAIL (for mainframe sources)
Metadata consists of:
!Database tables
!RDBMS functions
!Application-specific data structures
36 Data Integrator Core Tutorial
•Table name
•Column names
•Column data types
•Primary key columns
•Table attributes
Defining Source and Target Metadata in Data Integrator
Connection metadata is defined in the object library as
datastores (for tables) and file formats (for flat files).
The next task describes how to define datastores using the
Designer. Note that while you are designating the datastores as
sources or targets, datastores only function as connections. You
will define the actual source and target objects when you define
data flows later in the tutorial.
" To define a datastore for the source (ODS) database
1. From the Datastores tab of the object library, right-click in
the blank area and click
New.
Defining a datastore
3
TIP: To find the Datastores tab, roll your cursor over each
object library tab and read the text. You can also expand the
Object Library window by clicking and dragging it or
double-clicking it. Tab names display when you expand the
Object Library window.
Data Integrator Core Tutorial 37
3
Defining Source and Target Metadata in Data Integrator
Defining a datastore
The Datastore Editor window opens. A sample Datastore
Editor window for the Oracle environment appears as
follows:
2. In the Name box, type ODS_DS.
This datastore name labels the connection to the database
you will use as a source. The datastore name will appear in
the local repository. When you create your own
projects/applications, remember to give your objects
meaningful names.
3. In the Application Type box, click Custom.
38 Data Integrator Core Tutorial
Defining Source and Target Metadata in Data Integrator
Defining a datastore
4. In the Database Type box, click the option that corresponds
to the database software being used to store the source data.
3
The remainder of the boxes on the
Administrator
window depend on the Database Type
Datastore Web
selected.
5. Accept the default row commit size of 1000 for all databases.
The commit size option refers to batch array size (number of
rows).
6. The following table lists some of the options for the
Datastore Editor depending on the RDBMS you are using.
The options marked skip are not used in this tutorial. Enter
the information you recorded in “Create repository, source,
and target databases on an existing RDBMS” on page 8.
OracleDB2InformixSQL ServerSybase
Datastore nameDatastore nameDatastore nameDatastore nameDatastore name
Database typeDatabase typeDatabase typeDatabase typeDatabase type
CDCDB2 data sourceInformix data
source
Database
connection
name
Oracle versionDB2 versionInformix
version
Bulk loader
directory (skip)
Overflow file
directory (skip)
User nameUser name User nameUser nameUser name
Password Password Password Password Password
LiveLoad (skip)LiveLoad (skip)LiveLoad (skip)LiveLoad (skip)LiveLoad (skip)
Bulk loader
directory (skip)
Overflow file
directory (skip)
Bulk loader
directory (skip)
Overflow file
directory (skip)
Database server
name
Database nameDatabase name
SQL server
version
Overflow file
directory (skip)
Database server
name
Sybase version
Overflow file
directory (skip)
7. Click OK.
Data Integrator saves a datastore for your source in the
repository.
Data Integrator Core Tutorial 39
3
Defining Source and Target Metadata in Data Integrator
Defining a datastore
" To define a datastore for the target database
Define a datastore for the target database using the same
procedure as for the source (ODS) database.
•Use TARGET_DS for the datastore name.
•Use the information you recorded in “Create repository,
source, and target databases on an existing RDBMS” on
page 8.
40 Data Integrator Core Tutorial
Importing metadata
With Data Integrator, you can import metadata for individual
tables using a datastore. You can import metadata by:
•Browsing
•Name
•Searching
The following procedure describes how to import by browsing.
" To import metadata for ODS source tables into Data Integrator
1. In the Datastores tab, right-click the ODS_DS datastore and
Open.
click
The names of all the tables in the database defined by the
datastore named ODS_DS display in a window in the
workspace.
Defining Source and Target Metadata in Data Integrator
Importing metadata
3
2. Move the cursor over the right edge of the Metadata column
heading until it changes to a resize cursor.
3. Double-click the column separator to automatically resize
the column.
Data Integrator Core Tutorial 41
3
Defining Source and Target Metadata in Data Integrator
Importing metadata
4. Import the following tables by right-clicking each table name
and clicking
Import. Alternatively, because the tables are
grouped together, click the first name, Shift-click the last, and
import them together. (Use Ctrl-click for nonconsecutive
entries.)
NOTE: In MS SQL Server, the owner prefix will be dbo
instead of TARGET.
Import. Alternatively, Use Ctrl-click and import
3
Data Integrator imports the metadata for each table into the
local repository.
Data Integrator Core Tutorial 43
3
Defining Source and Target Metadata in Data Integrator
Importing metadata
3. Expand the Target_DS datastore’s Tables container in the
object library and verify the tables are listed in the repository.
44 Data Integrator Core Tutorial
Defining Source and Target Metadata in Data Integrator
Defining a file format
If the source or target RDBMS includes data stored in flat files,
you must define file formats in Data Integrator. File formats are a
set of properties that describe the structure of a flat file.
Data Integrator includes a file format editor. Use it to define flat
file formats. The editor supports delimited and fixed-width
formats.
You can specify file formats for one file or a group of files. You
can define flat files from scratch or by import and modify an
existing flat file. Either way, Data Integrator saves a connection to
the file or file group.
NOTE: Data Integrator also includes a file format
(Transport_Format) that you can use to read flat files in SAP R/3.
In the next chapter you will use a flat file as your source data.
Therefore, you must create a file format and connection to the
file now.
Defining a file format
3
" To define a file format
1. In the object library, click the File Formats tab, right-click in a
blank area of the object library, and click
2. The file format editor opens.
3. Under General, leave Ty p e as Delimited. Change the Name to
Format_SalesOrg.
4.
Under Data Files, click the Files folder icon and navigate in
your Data Integrator install directory to \Tutorial
Files\sales_org.txt. Click
displays sample data from the sales_org.txt file in the (lower
right) Data Preview pane.
New > File Format.
Open. The file format editor
Data Integrator Core Tutorial 45
3
Defining Source and Target Metadata in Data Integrator
Defining a file format
5. Under Default Format, change Date to ddmmyyyy.
The source data contains dates with a two-digit day number
followed by a two-digit month number, a four-digit year
(ddmmyyyy), and no time value. (Note that the sample dates
do not contain a delimiter between values. Using an
unsupported format string such as ddmmJJJJ will result in
incorrect dates, and no error message will appear.)
6. The first row contains the names of the column headings.
Under
Input/Output, change Skip row header to Ye s .
Data Integrator removes the first row and uses these column
heading values for the field names.
7. In the upper-right Column Attributes pane, click DateOpen
and change the data type to
lengths should appear as follows:
Defining Source and Target Metadata in Data Integrator
8. Click Save & Close.
Defining a file format
3
Data Integrator Core Tutorial 47
3
Defining Source and Target Metadata in Data Integrator
New terms
New terms
The terms examined in this chapter included:
Te r mM e a n i n g
DatastoreConnection from Data Integrator to tables in source
or target databases. Stored as an object in the
repository.
MetadataData that describes objects maintained by Data
Integrator. Metadata that Data Integrator stores in its
local repository includes:
•Table name
•Column name
•Column data types
•Primary key columns
•Table attributes
•RDBMS functions
Object libraryThe GUI part of the Designer representing the local
repository.
File formatA set of properties that define the table structure for a
flat file. Stored as an object in the repository.
48 Data Integrator Core Tutorial
Defining Source and Target Metadata in Data Integrator
Chapter summary and what to do next
Chapter summary and what to do next
At this point, you have finished all the preparatory work to
define data movement specifications for a flat-file data source to
a target data warehouse. In this chapter you have:
•Defined a datastore from Data Integrator to your target data
warehouse
•Imported metadata from target tables into the local
repository so that you can use Data Integrator Designer to
work with these tables
•Defined file formats and a connection to flat-file source data
You are now ready to create a new project and define jobs that
will populate the target tables with source data. You will do that
for the sales organization dimension table in Chapter 4,
“Populating the SalesOrg Dimension from a Flat File”.
Sales Org
Dimension
3
You can now exit Data Integrator or go on to the next chapter.
The information you have created in this chapter has been saved
in the local repository and will be automatically available to you
the next time you use Data Integrator.
If you need more information about the topics covered in this
chapter, see the Data Integrator Designer Guide.
Data Integrator Core Tutorial 49
3
Defining Source and Target Metadata in Data Integrator
Chapter summary and what to do next
50 Data Integrator Core Tutorial
Data Integrator Core Tutorial
Populating the SalesOrg
4
Dimension from a Flat File
In this chapter, you will populate the sales organization
Sales Org
Dimension
dimension table in your target data warehouse with data from a
flat file called Format_SalesOrg.
The concepts and tasks in this chapter include:
•Objects and their hierarchical relationships
•Adding a new project
•Adding a job
•Adding a work flow
•About data flows
•Saving the project
•Executing the job
•About deleting objects
•New terms
•Chapter summary and what to do next
Data Integrator Core Tutorial 51
4
Populating the SalesOrg Dimension from a Flat File
Objects and their hierarchical relationships
Objects and their hierarchical relationships
Everything in Data Integrator is an object. The key objects
involved in data movement activities (like projects, jobs, work
flows, and data flows) display in the Data Integrator Designer
project area according to their relationship in the object
hierarchy.
The following figure shows a display of the types of objects you
will be creating while you are working in this chapter.
Project
Job
Workflow
Dataflow
Object hierarchies are displayed in the project area of the
Designer.
52 Data Integrator Core Tutorial
Adding a new project
Projects group and organize related objects. Projects display in
the project area of the Designer and can contain any number of
jobs, work flows, and data flows.
" To add a new project
1. In the Data Integrator Designer, from the Project menu click
New > Project.
2. Name the project Class_Exercises.
3. Click Create.
The project name appears as the only object in the project area of
the Designer.
Next, you will define the job that will be used to extract the
information from the flat-file source.
Populating the SalesOrg Dimension from a Flat File
Adding a new project
4
Data Integrator Core Tutorial 53
4
Populating the SalesOrg Dimension from a Flat File
Adding a job
Adding a job
A job is a reusable object. It is also the second level in the project
hierarchy. It contains work flows (which contain the order of
steps to be executed) and data flows (which contain data
movement instructions). You can execute jobs manually or as
scheduled.
In this exercise you will define a job called JOB_SalesOrg.
" To add a new batch job
1. Right-click in the project area and click New Batch Job.
2. Right-click the new job and click Rename. Alternatively,
left-click the job twice (slowly) to make the name editable.
3. Type JOB_SalesOrg.
4. Left-click or press Enter.
The job appears in the project hierarchy under Class_Exercises
and in the project area of the object library.
54 Data Integrator Core Tutorial
Adding a work flow
A work flow is a reusable object. It executes only within a Job.
Use work flows to:
•Call data flows
•Call another work flow
•Define the order of steps to be executed in your job
•Pass parameters to and from data flows
•Define conditions for executing sections of the project
•Specify how to handle errors that occur during execution
•Define conditions for executing sections of the project
Work flows are optional.
The Data Integrator objects you can use to create work flows
appear on the tool palette:
Populating the SalesOrg Dimension from a Flat File
Adding a work flow
4
ButtonComponentProgramming Analog
Work flowProcedure
Data FlowDeclarative SQL select statement
ScriptSubset of lines in a procedure
ConditionalIf/then/else logic
TryTry block indicator
CatchTry block terminator and exception handler
Data Integrator Core Tutorial 55
4
Populating the SalesOrg Dimension from a Flat File
Adding a work flow
" To ad d a wo r k f l o w
1. With JOB_SalesOrg selected in the project area, click the
work flow button on the tool palette.
2. Click the blank workspace area.
A work flow icon appears in the workspace. The work flow
also appears in the project area on the left under the job
name (expand the job to view).
NOTE: You can place a work flow anywhere in the
workspace, but because flows are best viewed from left to
right and top to bottom, place it near the top left corner.
3. Change the name of the work flow to WF_SalesOrg.
4. Click the name of the work flow.
An empty view for the work flow appears in the workspace.
You will use this view to define the elements of the work
flow. Notice the title bar changes to display the name of the
work flow.
56 Data Integrator Core Tutorial
About data flows
A data flow defines the flow of data from sources to targets. It is
used to:
•Identify the source data you want to read
•Define the transformations you want to perform on the data
•Identify the target table to which you want to load data
A data flow is a reusable object. It is always called from a work
flow or a job.
The first data flow you need to create for this tutorial reads sales
organization data from a flat file and loads the data into the
sales organization dimension table in the target data warehouse.
Adding a data flow
The following procedure creates a data flow named DF_SalesOrg
inside the work flow WF_SalesOrg.
Populating the SalesOrg Dimension from a Flat File
About data flows
4
" To ad d a dat a f l ow
1. Make sure the work flow window is open in the workspace.
If it is not, click the WF_SalesOrg work flow in the project
area.
2. Click the data flow button on the tool palette.
3. Click the workspace.
A representation of a data flow appears in the workspace. The
data flow also appears in the project area.
4. Change the name of the data flow to DF_SalesOrg.
Data Integrator Core Tutorial 57
4
Populating the SalesOrg Dimension from a Flat File
About data flows
Notice the project, job, work flow, and data flow objects
display in hierarchical form in both the project area and the
object library. To navigate to these levels, click their names in
the project area.
Data flow in the
workspace
Data flow in the
project area
Data flow in the local
object library
5. Click the data flow name.
A definition area or grid for the data flow appears in the
workspace. This area will be used to define the data flow as
described in the next section.
58 Data Integrator Core Tutorial
Defining the data flow
Inside the data flow, you must specify the instructions to
transform source data into the form you want for the target table.
In this case, you will define the data flow instructions for
building the sales organization dimension table using the
following objects:
•An object representing the source file
•An object defining a query transform (or query). A query
transform maps columns from source to target. With a query,
you can:
!Select row sets
!Create joins
!Group and order by data values
!Create data set filters
!Execute functions
Populating the SalesOrg Dimension from a Flat File
About data flows
4
•An object representing the target table into which the data
loads
The next three exercises guide you through the steps necessary to
define a dataflow’s content:
1. Add objects to the data flow.
2. Connect them in the order that data will flow through them.
3. Define the query that maps the source columns to the target
columns.
" To add objects to a data flow
1. Verify the DF_SalesOrg data flow is open in the workspace.
In the object library, click the
Formats tab.
Data Integrator Core Tutorial 59
4
Populating the SalesOrg Dimension from a Flat File
About data flows
2. The source for this data flow will be the flat file
Format_SalesOrg. Click and drag the Format_SalesOrg object
to the workspace.
3. Drop the Format_SalesOrg object on the left side of the
workspace to leave room for more objects on the right.
4. Click Make Source from the shortcut menu that appears
when you drop the object.
5. Click the query button on the tool palette.
6. Click to the right of the source file in the workspace.
7. In the object library, click the Datastores tab and expand the
datastore named Target_DS.
8. The target for this data flow will be the table
SALESORG_DIM. Click and drag the SALESORG_DIM object
to the workspace and drop it to the right of the query.
9. Click Make Target from the shortcut menu that appears
when you drop the table. Data Integrator creates a table with
the same schema as the query output schema.
All the elements necessary to create the sales organization
dimension table are now in the workspace. In the next section,
you will connect the steps in the correct order.
" To define the order of steps in a data flow
You now need to define the sequence for the data flow. The steps
execute in left-to-right order.
1. Click the square on the right edge of the source file and drag
it to the triangle on the left edge of the query transform.
60 Data Integrator Core Tutorial
Populating the SalesOrg Dimension from a Flat File
About data flows
2. Use the same drag technique to connect the query transform
to the target table.
The next section introduces the query transform and the query
transform editor.
" To define a query transform that maps a source to a target
1. Click the name of the query in the project area or in the
workspace.
The query editor opens showing the source schema, the
target schema (which has automatically been copied from
the target table metadata to the output pane of the query),
and property sheets for defining the query.
4
There are four columns in the source schema: Sales_Office,
Region, DateOpen, and Country. You will specify that three
of them (Sales_Office, Region, and DateOpen) be copied to
the target table.
2. Map the source columns to target columns by dragging and
dropping each source column name onto the corresponding
column for the target schema. From the shortcut menu that
appears, click
imported from the target. Notice that the column icon next
to the source column changes to an arrow when dragged to
the target, which indicates that the column has been
mapped.
Remap Column to preserve the data type
Data Integrator Core Tutorial 61
4
Populating the SalesOrg Dimension from a Flat File
About data flows
Also notice that when you select a column in the target
schema, the Mapping tab of the property sheet shows the
mapping relationship for that column. You can also view the
mapping relationship by scrolling the Schema Out pane to
the right to show the Mapping column.
Target schema
Source schema
Property sheets
Column mapping
definition
3. Validate the query by clicking the Valid ate Cu rre n t button on
the toolbar.
The Output window appears with tabs for Errors, Warnings,
and Information. The Warning tab indicates that Data
Integrator will convert the data type for the SALESOFFICE
column.
4. Close the Output window and click the Back arrow on the
tool bar to close the query editor and return to the data
flow view.
62 Data Integrator Core Tutorial
Validating the data flow
Next you will verify that the data flow has been constructed
properly (which does not guarantee that the job will run or
produce the correct data).
Va l id at i o n menu provides design-time validation options.
The
These options check for syntax errors, not run-time errors.
Run-time validation occurs while the job executes. (See
“Executing the job” on page 66.)
" To va l i da t e a d at a f l o w
1. Click DF_SalesOrg in the project area.
2. From the menu bar, click Vali dati on > Vali date > C ur rent
View
.
NOTE:
Populating the SalesOrg Dimension from a Flat File
About data flows
4
!Current View — Validates the object definition open in
the workspace.
!All Objects in View — Validates the object definition
open in the workspace and all of the objects that it calls.
Also note the
toolbar.
Data Integrator displays the Output dialog box with the Warning
tab indicating that Data Integrator will convert the data type for
the SALESOFFICE column.
If the Output window displays any errors, correct them before
proceeding.
Val idate C urr ent and Vali date A ll buttons on the
Data Integrator Core Tutorial 63
4
Populating the SalesOrg Dimension from a Flat File
About data flows
Addressing errors
" To use the Output window:
1. Right-click an error message and click View.
Data Integrator displays the Error Message window in which
you can read the expanded error message text.
2. Double-click the error message to open the editor of the
object containing the error.
NOTE: Warning messages do not prohibit job execution.
You have validated the data flow and completed the description
of the data movement for the sales organization dimension
table.
64 Data Integrator Core Tutorial
Saving the project
You can save the steps you have completed and close Data
Integrator at any time.
Populating the SalesOrg Dimension from a Flat File
Saving the project
4
To save all objects in a project, from the
All
. Or, when you close a project or exit the program, Data
Integrator lists all recently changed items and prompts you to
save them.
To save work displayed only in the active object workspace, from
the
Project menu click Save.
Project menu click Save
Data Integrator Core Tutorial 65
4
Populating the SalesOrg Dimension from a Flat File
Executing the job
Executing the job
Now you will execute the job you have just created.
" To execute a job
First verify that your job server is running by looking at the job
server icon at the bottom of the Designer window. Mouse over
the icon to see the Job Server name, machine name, and port
number in the status area. If the job server is not running, the
icon will have a red X on it.
1. Select the job name in the project area, in this case
JOB_SalesOrg.
2. Right-click and click Execute.
If you have changed any objects and not yet saved them,
Data Integrator prompts you to save your work. Click
OK.
66 Data Integrator Core Tutorial
Populating the SalesOrg Dimension from a Flat File
Executing the job
Data Integrator validates the job again before running it. If
there are syntax errors, review the error messages and return
to the design and validate the effected steps of the job. Also
review any warning messages before dismissing them.
If the job validates properly, the Execution Properties dialog
box appears. These properties include debugging parameters,
whether you want to capture data for viewing, and an option
to set global variables.
4
To set or change a job’s default execution settings, right-click
the job name and click
3. For this exercise, do not change the defaults. Click OK.
Properties.
Data Integrator Core Tutorial 67
4
Populating the SalesOrg Dimension from a Flat File
Executing the job
Data Integrator executes the job and produces three log files:
!Monitor log — A list of the job steps in the order they
started.
!Statistics log — A list of each step in the job, the number
of rows processed by that step, and the time required to
complete the operation.
!Error log — A list of any errors produced by the RDBMS,
Data Integrator, or the computer operating system during
the job execution.
The log window opens in the workspace and displays the
monitor log. The buttons at the top of the log window show
which kind of log you are viewing. Clicking the middle
button displays the statistics log, and the right-hand button
shows the error log.
4. Use the buttons at the top of the log window to view the
statistics log. Because there are no errors or warnings, the
error log button is unavailable (there is no error log to
examine).
68 Data Integrator Core Tutorial
Populating the SalesOrg Dimension from a Flat File
Executing the job
5. After the job is complete, use your RDBMS query tool to
check the contents of the table named salesorg_dim.
4
Data Integrator Core Tutorial 69
4
Populating the SalesOrg Dimension from a Flat File
About deleting objects
About deleting objects
Deleting a job or other object from the project area does not
delete it from the object library. The object still exists in any
other projects to which it has been assigned. Deleting an object
from the object library, however, deletes all occurrences of the
object.
70 Data Integrator Core Tutorial
New terms
Populating the SalesOrg Dimension from a Flat File
New terms
The terms introduced in this chapter included:
Te r mM e a n i n g
Property sheetsTabs in the query editor used for defining properties
of the output schema
QueryA Data Integrator transform object
SchemaA description of the data layout of a table
TransformA Data Integrator object that defines data
transformation
4
Data Integrator Core Tutorial 71
4
Populating the SalesOrg Dimension from a Flat File
Chapter summary and what to do next
Chapter summary and what to do next
You have now defined and run your first data movement job—a
job that populates the sales organization dimension table in the
target data warehouse.
The next chapter introduces you to time dimension tables. In
that exercise you will populate the time dimension table with
some simple time attributes:
•Year number
•Month number
•Business quarter
The chapter also introduces you to the Date_Generation
transform.
Time
Dimension
If you need more information about the topics covered in this
chapter, see the Data Integrator Designer Guide.
72 Data Integrator Core Tutorial
Data Integrator Core Tutorial
Populating the Time
5
Dimension Using a Transform
The exercise in this chapter builds and populates a time
Time
Dimension
dimension table. Time dimension tables contain
date/time-related attributes such as season, holiday period, fiscal
quarter, and others that are not directly ascertainable from
traditional SQL style date/time data types.
The time dimension table in this example is simple in that it
contains only the year number, month number, and business
quarter as time attributes. It uses a Julian date as a primary key.
The tasks in this chapter include:
•Retrieving the project
•Adding the job and data flow
•Defining the time dimension data flow
•Saving and executing the job
Data Integrator Core Tutorial 73
5
Populating the Time Dimension Using a Transform
Retrieving the project
Retrieving the project
If you have closed Data Integrator, reopen the Class_Exercises
project.
" To open the Class_Exercises project
1. Start Data Integrator.
2. Log in to your repository.
3. Click Project > Open.
4. Click Class_Exercises.
5. Click OK.
74 Data Integrator Core Tutorial
Populating the Time Dimension Using a Transform
Adding the job and data flow
As you have done in the previous exercise, you will begin with
creating a job that contains a data flow. A work flow is not
necessary for this job; you will be calling the data flow directly
from the job.
" To ad d t h e j o b an d dat a f l ow
1. Add a job named JOB_TimeDim that will load data into the
time dimension table.
For a reminder about how to create a job, see “Adding a job”
on page 54.
2. Open JOB_TimeDim and create a data flow object named
DF_TimeDim. (Do not create a work flow this time.)
Adding the job and data flow
5
Data Integrator Core Tutorial 75
5
Populating the Time Dimension Using a Transform
Defining the time dimension data flow
Defining the time dimension data flow
The data flow instructions for populating a time dimension table
consist of the following objects:
•A Date_Generation transform as a source
•A query to produce column values from the generated dates
such as what day the date falls on, the quarter it’s in, and so
on.
•A target table into which the time dimension data loads
" To specify the components of the time data flow
1. Open the data flow named DF_TimeDim.
2. In the object library, click the Transforms tab.
3. Click the Date_Generation transform.
Transforms are predefined Data Integrator objects.
4. Drag the transform icon into the workspace and drop it on
the left side of the workspace to leave room for more objects
to the right.
5. Click the query button on the tool palette.
6. Click in the workspace to the right of the Date_Generation
transform.
7. In the object library for the datastore named Target_DS, drag
the table named TIME_DIM into the workspace and drop it
to the right of the query.
8. Click Make Target from the shortcut menu.
76 Data Integrator Core Tutorial
Populating the Time Dimension Using a Transform
All the objects you need to create a time dimension table are
now available in the workspace. Next you will connect the
objects in the order they need to be executed.
" To define the flow of data
1. Click the square on the right edge of the Date_Generation
transform and drag to the triangle on the left edge of the
query.
2. Use the same drag technique to connect the query to the
TIME_DIM target.
The connections indicate the flow of data through these
instructions.
Defining the time dimension data flow
5
The following sections define the details of each step of the data
flow.
" To define the output of the Date_Generation transform
The Date_Generation transform produces a column of dates for
a range and increment you specify.
Data Integrator Core Tutorial 77
5
Populating the Time Dimension Using a Transform
Defining the time dimension data flow
1. Click the name of the Date_Generation transform to open
the transform editor.
2. Enter these values in the editor columns:
Start Date1994.01.01
End Date2000.12.31
Incrementdaily
3. Click the Back arrow in the tool bar to close the
transform editor and return to the data flow.
The date column serves as the input to the query. The query
allows you to apply functions to the input columns and map
those columns to an internal data set (the query output).
" To define the output of the query
1. Click the query icon in the project area.
78 Data Integrator Core Tutorial
Populating the Time Dimension Using a Transform
Defining the time dimension data flow
The query editor shows an input schema with a single
column, the output schema copied from the target, and
property sheets for defining the query.
2. Map the generated date to the NATIVEDATE column by
dragging the DI_GENERATED_DATE column from the input
schema to the NATIVEDATE output column, and click
Remap Column.
Alternatively, you can drag and drop the
DI_GENERATED_DATE column from the source pane into
the mapping expression. The column name
DI_GENERATED_DATE will be qualified with the name of
the Date_Generation transform; for example,
Date_Generation_1.DI_GENERATED_DATE.
3. Click each column name in the target schema (one by one)
and define the mapping for the column using the following
table. The mapping definition applies to the column selected
in the target schema. Type the mapping expression directly
on the
Mapping tab.
5
Alternatively, click the
Function button to open the Function
editor and add functions to the mapping definition that way.
These are the mapping expressions and function descriptions
for the rest of the columns:
Column nameMappingFunction description
Date_ID
Yea rNum
MonthNum
BusQuarter
julian(di_generated_date)
to_char(di_generated_date,
'yyyy')
month(di_generated_date)
quarter(di_generated_date)
Use the JULIAN function to set the
Julian date for that date value.
Use the TO_CHAR function to
select only the year out of the date
value. Enclose yyyy in single
quotes.
Use the MONTH function to set the
month number for that date value.
Use the QUARTER function to set
the quarter for that date value.
Data Integrator Core Tutorial 79
5
Populating the Time Dimension Using a Transform
Defining the time dimension data flow
You might notice that the tutorial simplifies the quarter
calculation. For the purposes of the tutorial, assume that the
business year is the same as the calendar year.
4. Click the Back arrow on the tool bar.
These columns are now the input schema for the TIME_DIM
table.
This completes the data-movement description for the time
dimension table.
80 Data Integrator Core Tutorial
Populating the Time Dimension Using a Transform
Saving and executing the job
Saving and executing the job
Save the project. For a reminder of how to save a project, see
“Saving the project” on page 65.
Execute the job JOB_TimeDim. After the job runs, check the
contents of the TIME_DIM table with a query tool for the
RDBMS on which your target data warehouse is stored.
Chapter summary and what to do next
You have now populated two tables in the sales data warehouse:
•Sales Org dimension (from a flat file)
•Time dimension, which consists of a date generation
transform and a query
In the next chapter, you will extract data to populate the
customer dimension table.
5
Customer
Dimension
At this point, you can exit Data Integrator or go on to the next
exercise.
If you need more information about the topics covered in this
chapter, see the Data Integrator Designer Guide.
Data Integrator Core Tutorial 81
5
Populating the Time Dimension Using a Transform
Chapter summary and what to do next
82 Data Integrator Core Tutorial
Data Integrator Core Tutorial
Populating the Customer
6
Dimension from a Relational
Ta b l e
The data you used to populate the sales organization dimension
Customer
Dimension
table was in a flat file. In this exercise, you will extract data from
a relational table and load it into the customer dimension table.
This exercise also demonstrates how to enable and view
metadata reports.
You must first import the source and target tables, as described
in “Importing metadata” on page 41.
The tasks in this chapter include:
•Adding the CustDim job and work flow
•Adding the CustDim data flow
•Validating the CustDim data flow
•Executing the CustDim job
•Using the interactive debugger
•Chapter summary and what to do next
Data Integrator Core Tutorial 83
6
Populating the Customer Dimension from a Relational Table
Adding the CustDim job and work flow
Adding the CustDim job and work flow
" To add the CustDim job
1. Right-click the Class_Exercises project name and click New
Batch Job
2. Rename this jobJOB_CustDim.
" To ad d t h e C u s t Di m wo r k f l o w
1. Open JOB_CustDim by clicking on its name in the project
area.
2. Define a work flow named WF_CustDim inside
JOB_CustDim.
For more information, see “Adding a work flow” on page 55.
.
84 Data Integrator Core Tutorial
Populating the Customer Dimension from a Relational Table
Adding the CustDim data flow
" To add the CustDim data flow object
1. Verify the WF_CustDim work flow window is open in the
workspace.
If not, click the WF_CustDim work flow name in the project
area (or in the workspace if visible).
2. Add a data flow to the work flow definition.
3. Rename the data flow DF_CustDim.
4. Click the name of the data flow to open the data flow
definition.
Adding the CustDim data flow
6
Data Integrator Core Tutorial 85
6
Populating the Customer Dimension from a Relational Table
Defining the CustDim data flow
Defining the CustDim data flow
The data flow instructions for building the dimension table will
consist of the following objects:
•The source table
•A query
•The target table into which the customer dimension data
loads
" To bring the objects into the data flow
1. Verify the DF_CustDim data flow is open in the workspace.
2. In the object library, view the list of tables in the ods_ds
datastore. Drag and drop the CUSTOMER table to the left
side of the workspace and click
3. Click the query icon on the tool palette. and click to the
right of the table to place the query in the workspace.
Make Source.
4. In the object library, from the list of tables for the target_ds
datastore, drag and drop the CUST_DIM table to the right of
the query and click
5. Connect the icons to indicate the flow of data, as shown.
" To define the query
1. In the workspace, click the name of the query to open the
query editor.
86 Data Integrator Core Tutorial
Make Target.
Populating the Customer Dimension from a Relational Table
Defining the CustDim data flow
2. Remap the following source columns to the target schema,
leaving the names and data types as they are in the target.
ColumnData typeDescription
Cust_IDvarchar(10)Customer number
Cust_classfvarchar(2)Customer classification
Name1varchar(35)Customer name
Addressvarchar(35)Address
Cityvarchar(35)City
Region_IDvarchar(2)Region
Zipvarchar(10)Postal code
(Do not map Cust_Timestamp.)
NOTE: In Microsoft SQL Server and Sybase, you must specify
the columns in the order shown in the table.
3. Click the Back arrow in the icon bar to return to the data
flow.
6
Data Integrator Core Tutorial 87
6
Populating the Customer Dimension from a Relational Table
Validating the CustDim data flow
Validating the CustDim data flow
Next you will verify that the data flow has been constructed
properly.
" To verify that the data flow has been constructed properly
1. From the menu bar, click
Va l id at i o n > Validate > All Objects in View
If your design contains syntax errors, a dialog box appears with a
message describing the error. Warning messages usually do not
affect proper execution of the job.
If your data flow contains no errors, you will get the following
message:
88 Data Integrator Core Tutorial
Populating the Customer Dimension from a Relational Table
Executing the CustDim job
Just as you executed the job named JOB_SalesOrg in the
previous exercise, execute the job named JOB_CustDim:
1. In the project area, right-click the job name and click
Execute.
You will be prompted to save your work:
Executing the CustDim job
6
2. Click OK.
The Execution Properties window appears.
3. Check the Capture data for viewing box and accept the
default (100)
collects sample output rows from data flows and transforms
in the job.
4. After the job completes, ensure there are no error or warning
messages.
5. To view the captured sample data, in the project area click
the data flow to open it in the workspace. Click the
magnifying glass on an object to view the data.
Or, use a query tool to check the contents of the cust_dim
table.
rows. Data Integrator’s View Data feature
Data Integrator Core Tutorial 89
6
Populating the Customer Dimension from a Relational Table
Using the interactive debugger
Using the interactive debugger
The Designer includes an interactive debugger that allows you to
examine and modify data row by row by placing filters and
breakpoints on lines in a data flow diagram. The debugger
allows you to examine what happens to the data after each
transform or object in the flow.
A debug filter functions as a simple query transform with a
WHERE clause. Use a filter to reduce a data set in a debug job
execution. A breakpoint is the location where a debug job
execution pauses and returns control to you.
This exercise demonstrates how to set a breakpoint and view
data in debug mode.
" To set a breakpoint in a data flow
1. Click the name of the DF_CustDim data flow to open its
definition.
2. Right-click the line between the source table and the query
and click
3. In the Breakpoint pane, select the Set check box.
90 Data Integrator Core Tutorial
Set Filter/Breakpoint.
Populating the Customer Dimension from a Relational Table
4. Click OK.
Using the interactive debugger
6
" To use the interactive debugger
1. In the project area, right-click Job_CustDim and click Start
debug
.
If prompted to save your work, click
The Debug Properties window opens.
2. Click OK.
OK.
Data Integrator Core Tutorial 91
6
Populating the Customer Dimension from a Relational Table
Using the interactive debugger
The Designer displays three additional windows (Call stack,
Trace, and Variables) as well as View Data panes below the
work space.
Call stack
window
View Data panes
Trace window
The left View Data pane shows the data in the CUSTOMER
source table, and the right pane shows one row at a time (the
default) that has passed to the query.
3. To process the next row, from the Debug menu click Get
Next Row
The next row replaces the existing row in the query pane. To
see all rows, select the
92 Data Integrator Core Tutorial
Variables
window
.
All check box.
Loading...
+ hidden pages
You need points to download manuals.
1 point = 1 manual.
You can buy points or you can get point for every manual you upload.