Business Objects products in this release may contain redistributions of software
licensed from third-party contributors. Some of these individual components may
also be available under alternative licenses. A partial listing of third-party
contributors that have requested or permitted acknowledgments, as well as required
notices, can be found at: http://www.businessobjects.com/thirdparty
Welcome to Data Services
Welcome
Data Services XI Release 3 provides data integration and data quality
processes in one runtime environment, delivering enterprise performance
and scalability.
The data integration processes of Data Services allow organizations to easily
explore, extract, transform, and deliver any type of data anywhere across
the enterprise.
The data quality processes of Data Services allow organizations to easily
standardize, cleanse, and consolidate data anywhere, ensuring that end-users
are always working with information that's readily available, accurate, and
trusted.
Documentation set for Data Services
You should become familiar with all the pieces of documentation that relate
to your Data Services product.
Document
    What this document provides

Documentation Map
    Information about available Data Services books, languages, and locations

Release Summary
    Highlights of key features in this Data Services release

Release Notes
    Important information you need before installing and deploying this version of Data Services

Getting Started Guide
    An introduction to Data Services

Installation Guide for Windows
    Information about and procedures for installing Data Services in a Windows environment

Installation Guide for UNIX
    Information about and procedures for installing Data Services in a UNIX environment

Advanced Development Guide
    Guidelines and options for migrating applications, including information on multi-user functionality and the use of the central repository for version control

Designer Guide
    Information about how to use Data Services Designer

Integrator's Guide
    Information for third-party developers to access Data Services functionality. Also provides information about how to install, configure, and use the Data Services Adapter for JMS.

Management Console: Administrator Guide
    Information about how to use Data Services Administrator

Management Console: Metadata Reports Guide
    Information about how to use Data Services Metadata Reports

Migration Considerations Guide
    Information about:
    •Release-specific product behavior changes from earlier versions of Data Services to the latest release
    •How to migrate from Data Quality to Data Services

Performance Optimization Guide
    Information about how to improve the performance of Data Services

Reference Guide
    Detailed reference material for Data Services Designer
Technical Manuals
    A compiled "master" PDF of core Data Services books containing a searchable master table of contents and index:
    •Getting Started Guide
    •Installation Guide for Windows
    •Installation Guide for UNIX
    •Designer Guide
    •Reference Guide
    •Management Console: Metadata Reports Guide
    •Management Console: Administrator Guide
    •Performance Optimization Guide
    •Advanced Development Guide
    •Supplement for J.D. Edwards
    •Supplement for Oracle Applications
    •Supplement for PeopleSoft
    •Supplement for Siebel
    •Supplement for SAP

Tutorial
    A step-by-step introduction to using Data Services
In addition, you may need to refer to several Adapter Guides and
Supplemental Guides.
Document
    What this document provides

Salesforce.com Adapter Interface
    Information about how to install, configure, and use the Data Services Salesforce.com Adapter Interface

Supplement for J.D. Edwards
    Information about license-controlled interfaces between Data Services and J.D. Edwards World and J.D. Edwards OneWorld

Supplement for Oracle Applications
    Information about the license-controlled interface between Data Services and Oracle Applications

Supplement for PeopleSoft
    Information about license-controlled interfaces between Data Services and PeopleSoft

Supplement for SAP
    Information about license-controlled interfaces between Data Services, SAP ERP, and SAP BI/BW

Supplement for Siebel
    Information about the license-controlled interface between Data Services and Siebel
Accessing documentation
You can access the complete documentation set for Data Services in several
places.
Accessing documentation on Windows
After you install Data Services, you can access the documentation from the
Start menu.
1. Choose Start > Programs > BusinessObjects XI 3.1 >
BusinessObjects Data Services > Data Services Documentation.
Note:
Only a subset of the documentation is available from the Start menu. The
documentation set for this release is available in LINK_DIR\Doc\Books\en.
2. Click the appropriate shortcut for the document that you want to view.
Accessing documentation on UNIX
After you install Data Services, you can access the online documentation by
going to the directory where the printable PDF files were installed.
1. Go to LINK_DIR/doc/book/en/.
2. Using Adobe Reader, open the PDF file of the document that you want
to view.
Accessing documentation from the Web
You can access the complete documentation set for Data Services from the
Business Objects Customer Support site.
1. Go to http://help.sap.com.
2. Click Business Objects at the top of the page.
You can view the PDFs online or save them to your computer.
Business Objects information resources
A global network of Business Objects technology experts provides customer
support, education, and consulting to ensure maximum business intelligence
benefit to your business.
Useful addresses at a glance:
Address
    Content

Customer Support, Consulting, and Education services
    Information about Customer Support programs, as well as links to technical articles, downloads, and online forums. Consulting services can provide you with information about how Business Objects can help maximize your business intelligence investment. Education services can provide information about training options and modules. From traditional classroom learning to targeted e-learning seminars, Business Objects can offer a training package to suit your learning needs and preferred learning style.

    Get online and timely information about Data Services, including tips and tricks, additional downloads, samples, and much more. All content is to and from the community, so feel free to join in and contact us if you have a submission.

    Search the Business Objects forums on the SAP Community Network to learn from other Data Services users and start posting questions or share your knowledge with the community.

    Blueprints for you to download and modify to fit your needs. Each blueprint contains the necessary Data Services project, jobs, data flows, file formats, sample data, template tables, and custom functions to run the data flows in your environment with only a few modifications.

Product documentation
http://help.sap.com/
    Business Objects product documentation.

Documentation mailbox
documentation@businessobjects.com
    Send us feedback or questions about your Business Objects documentation. Do you have a suggestion on how we can improve our documentation? Is there something that you particularly like or have found useful? Let us know, and we will do our best to ensure that your suggestion is considered for the next release of our documentation.
    Note:
    If your issue concerns a Business Objects product and not the documentation, please contact our Customer Support experts.

Supported platforms documentation
https://service.sap.com/bosap-support
    Get information about supported platforms for Data Services.
    In the left panel of the window, navigate to Documentation > Supported Platforms > BusinessObjects XI 3.1. Click the BusinessObjects Data Services link in the main window.
Environment Test Strategy
This section covers suggested methods of tuning source and target database
applications, their operating systems, and the network used by your Data
Services environment. It also introduces key Data Services job execution
options.
This section contains the following topics:
•The source OS and database server
•The target OS and database server
•The network
•Data Services Job Server OS and job options
To test and tune Data Services jobs, work with all four of these components
in the order shown above.
In addition to the information in this section, you can use your UNIX or
Windows operating system and database server documentation for specific
techniques, commands, and utilities that can help you measure and tune the
Data Services environment.
The source OS and database server
Tune the source operating system and database to quickly read data from
disks.
Operating system
Make the input and output (I/O) operations as fast as possible. The
read-ahead protocol, offered by most operating systems, can greatly improve
performance. This protocol allows you to set the size of each I/O operation.
Its default value is usually 4 to 8 kilobytes, which is too small. Set it to at
least 64 kilobytes on most platforms.
Database
Tune your database on the source side to perform SELECTs as quickly as
possible.
In the database layer, you can improve the performance of SELECTs in
several ways, such as the following:
•Create indexes on appropriate columns, based on your Data Services
data flows.
•Increase the size of each I/O from the database server to match the OS
read-ahead I/O size.
•Increase the size of the shared buffer to allow more data to be cached in
the database server.
•Cache tables that are small enough to fit in the shared buffer. For example,
if jobs access the same piece of data on a database server, then cache
that data. Caching data on database servers will reduce the number of
I/O operations and speed up access to database tables.
See your database server documentation for more information about
techniques, commands, and utilities that can help you measure and tune the
source databases in your Data Services jobs.
The target OS and database server
Tune the target operating system and database to quickly write data to disks.
Operating system
Make the input and output operations as fast as possible. For example, the
asynchronous I/O, offered by most operating systems, can greatly improve
performance. Turn on the asynchronous I/O.
Database
Tune your database on the target side to perform INSERTs and UPDATEs
as quickly as possible.
In the database layer, there are several ways to improve the performance
of these operations.
Here are some examples from Oracle:
•Turn off archive logging
•Turn off redo logging for all tables
•Tune rollback segments for better performance
•Place redo log files and data files on a raw device if possible
•Increase the size of the shared buffer
See your database server documentation for more information about
techniques, commands, and utilities that can help you measure and tune the
target databases in your Data Services jobs.
The network
When reading and writing data involves going through your network, its ability
to efficiently move large amounts of data with minimal overhead is very
important. Do not underestimate the importance of network tuning (even if
you have a very fast network with lots of bandwidth).
Set network buffers to reduce the number of round trips to the database
servers across the network. For example, adjust the size of the network
buffer in the database client so that each client request completely fills a
small number of network packets.
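To illustrate why a larger client buffer reduces round trips, here is a minimal sketch. It assumes a simplified model in which each request carries as many whole rows as fit in the buffer; the function name, row width, and buffer sizes are hypothetical and do not correspond to any Data Services or database client setting.

```python
def round_trips(total_rows: int, row_bytes: int, buffer_bytes: int) -> int:
    """Estimate client/server round trips needed to move total_rows rows
    when each request carries as many whole rows as fit in the buffer.

    Simplified model (an assumption): protocol overhead is ignored.
    """
    if total_rows < 0 or row_bytes <= 0 or buffer_bytes <= 0:
        raise ValueError("total_rows must be >= 0; sizes must be positive")
    rows_per_request = max(1, buffer_bytes // row_bytes)  # at least one row per trip
    return -(-total_rows // rows_per_request)  # ceiling division


if __name__ == "__main__":
    rows, width = 1_000_000, 200  # hypothetical table: 1M rows of 200 bytes
    for buffer_kb in (2, 32, 256):
        trips = round_trips(rows, width, buffer_kb * 1024)
        print(f"{buffer_kb:>3} KB buffer: {trips:,} round trips")
```

In this model, growing the buffer from 2 KB to 32 KB cuts the round trips from 100,000 to 6,135. The right setting in practice depends on your database client and network, so measure before and after changing it.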
Data Services Job Server OS and job options
Tune the Job Server operating system and set job execution options to
improve performance and take advantage of self-tuning features of Data
Services.
Operating system
Data Services jobs are multi-threaded applications. Typically a single data
flow in a job initiates one al_engine process that in turn initiates at least 4
threads.
For maximum performance benefits:
•Consider a design that will run one al_engine process per CPU at a
time.
•Tune the Job Server OS so that Data Services threads spread to all
available CPUs.
For more information, see Checking system utilization on page 26.
Data Services jobs
You can tune Data Services job execution options after you have:
•Tuned the database and operating system on the source and the target
computers
•Adjusted the size of the network buffer
•Confirmed that your data flow design seems optimal
You can tune the following execution options to improve the performance of
Data Services jobs:
•Monitor sample rate
•Collect statistics for optimization and Use collected statistics
Setting Monitor sample rate
During job execution, Data Services writes information to the monitor log file
and updates job events after processing the number of rows specified in
Monitor sample rate. The default value is 1000. Increase Monitor sample rate
to reduce the number of calls to the operating system to write to the log file.
When setting Monitor sample rate, you must evaluate performance
improvements gained by making fewer calls to the operating system against
your ability to view more detailed statistics during job execution. With a higher
Monitor sample rate, Data Services collects more data before calling the
operating system to open the file, and performance improves. However, with
a higher monitor rate, more time passes before you can view statistics during
job execution.
In production environments when your jobs transfer large volumes of data,
Business Objects recommends that you increase Monitor sample rate to
50,000.
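The trade-off can be made concrete with a rough calculation. The sketch below assumes a simplified model of one monitor log update per Monitor sample rate rows processed; the function name and the 100-million-row job are illustrative assumptions, and actual logging behavior may differ.

```python
def monitor_log_updates(total_rows: int, sample_rate: int) -> int:
    """Approximate number of monitor log updates for a job.

    Simplified model (an assumption): one update after every
    sample_rate rows processed.
    """
    if total_rows < 0 or sample_rate <= 0:
        raise ValueError("total_rows must be >= 0 and sample_rate positive")
    return total_rows // sample_rate


if __name__ == "__main__":
    rows = 100_000_000  # hypothetical production volume
    for rate in (1000, 50_000):
        updates = monitor_log_updates(rows, rate)
        print(f"Monitor sample rate {rate:>6,}: about {updates:,} log updates")
```

Under this model, raising the rate from the default of 1000 to 50,000 reduces the updates for this job from about 100,000 to about 2,000, at the cost of coarser statistics while the job runs.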
Note:
If you use a virus scanner on your files, exclude the Data Services log from
the virus scan. Otherwise, the virus scan analyzes the Data Services log
repeatedly during job execution, which degrades performance.
Collecting statistics for self-tuning
Data Services provides a self-tuning feature to determine the optimal cache
type (in-memory or pageable) to use for a data flow.
To take advantage of this self-tuning feature
1. When you first execute a job, select the option Collect statistics for
optimization to collect statistics which include number of rows and width
of each row. Ensure that you collect statistics with data volumes that
represent your production environment. This option is not selected by
default.
2. The next time you execute the job, this option is selected by default.
3. When changes occur in data volumes, re-run your job with Collect
statistics for optimization to ensure that Data Services has the most
current statistics to optimize cache types.
For more information about these caches, see Using Caches.
Related Topics
•Using Caches on page 63
Measuring Data Services Performance
This section contains the following topics:
•Data Services processes and threads
•Measuring performance of Data Services jobs
Data Services processes and threads
Data Services uses processes and threads to execute jobs that extract data
from sources, transform the data, and load data into a data warehouse. The
number of concurrently executing processes and threads affects the
performance of Data Services jobs.
Data Services processes
The processes Data Services uses to run jobs are:
•al_jobserver
The al_jobserver initiates one process for each Job Server configured on
a computer. This process does not use much CPU power because it is
only responsible for launching each job and monitoring the job's execution.
•al_engine
For batch jobs, an al_engine process runs when a job starts and for each
of its data flows. Real-time jobs run as a single process.
The number of processes a batch job initiates also depends upon the
number of:
•parallel work flows
•parallel data flows
•sub data flows
For an example of the Data Services monitor log that displays the processes,
see Analyzing log files for task duration on page 30.
Data Services threads
A data flow typically initiates one al_engine process, which creates one
thread per data flow object. A data flow object can be a source, transform,
or target. For example, two sources, a query, and a target could initiate four
threads.
If you are using parallel objects in data flows, the thread count will increase
to approximately one thread for each source or target table partition. If you
set the Degree of parallelism (DOP) option for your data flow to a value
greater than one, the thread count per transform will increase. For example,
a DOP of 5 allows five concurrent threads for a Query transform. To run
objects within data flows in parallel, use the following Data Services features:
•Table partitioning
•File multithreading
•Degree of parallelism for data flows
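The thread-count rules above can be sketched as a rough estimator. This is an approximation under stated assumptions, not how Data Services computes anything: it counts one thread per source or target partition (1 for an unpartitioned object) and one thread per transform multiplied by the DOP. The function and parameter names are hypothetical, and the engine may start additional threads.

```python
from typing import Sequence


def estimate_threads(
    source_partitions: Sequence[int],
    transform_count: int,
    target_partitions: Sequence[int],
    dop: int = 1,
) -> int:
    """Rough thread estimate for one data flow.

    source_partitions / target_partitions: one entry per source or target,
    giving its partition count (1 means unpartitioned).
    """
    if dop < 1 or transform_count < 0:
        raise ValueError("dop must be >= 1 and transform_count >= 0")
    if any(p < 1 for p in (*source_partitions, *target_partitions)):
        raise ValueError("partition counts must be >= 1")
    return sum(source_partitions) + transform_count * dop + sum(target_partitions)


if __name__ == "__main__":
    # Two sources, one query, one target: the four-thread example above.
    print(estimate_threads([1, 1], 1, [1]))
    # Same data flow with a DOP of 5 for the Query transform.
    print(estimate_threads([1, 1], 1, [1], dop=5))
    # One source and one target, each with four partitions.
    print(estimate_threads([4], 1, [4]))
```

An estimate like this is mainly useful for comparing designs, for example to see how quickly the thread count grows when partitioning and DOP are combined.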
Related Topics
•Using parallel Execution on page 77
Measuring performance of Data Services jobs
You can use several techniques to measure performance of Data Services
jobs:
•Checking system utilization
•Analyzing log files for task duration
•Reading the Monitor Log for execution statistics
•Reading the Performance Monitor for execution statistics
•Reading Operational Dashboards for execution statistics
Checking system utilization
The number of Data Services processes and threads concurrently executing
affects the utilization of system resources (see Data Services processes and
threads on page 24).
Check the utilization of the following system resources:
•CPU
•Memory
•Disk
•Network
To monitor these system resources, use the following tools:
For UNIX:
•top or a third-party utility (such as glance for HPUX)
For Windows:
•Performance tab on the Task Manager
Depending on the performance of your jobs and the utilization of system
resources, you might want to adjust the number of Data Services processes
and threads. The following sections describe different situations and suggest
Data Services features to adjust the number of processes and threads for
each situation.
CPU utilization
Data Services is designed to maximize the use of CPUs and memory
available to run the job.
The total number of concurrent threads a job can run depends upon job
design and environment. Test your job while watching multi-threaded Data
Services processes to see how much CPU and memory the job requires.
Make needed adjustments to your job design and environment and test again
to confirm improvements.
For example, if you run a job and see that the CPU utilization is very high,
you might decrease the DOP value or run fewer parallel jobs or data flows.
Otherwise, CPU thrashing might occur.
For another example, if you run a job and see that only half a CPU is being
used, or if you run eight jobs on an eight-way computer and CPU usage is
only 50%, you can interpret this CPU utilization in several ways:
•One interpretation might be that Data Services is able to push most of
the processing down to source and/or target databases.
•Another interpretation might be that there are bottlenecks in the database
server or the network connection. Bottlenecks on database servers do
not allow readers or loaders in jobs to use Job Server CPUs efficiently.
To determine bottlenecks, examine:
•Disk service time on database server computers
Disk service time typically should be below 15 milliseconds. Consult
your server documentation for methods of improving performance.
For example, having a fast disk controller, moving database server
log files to a raw device, and increasing log size could improve disk
service time.
•Number of threads per process allowed on each database server
operating system. For example:
•On HPUX, the number of kernel threads per process is configurable.
The CPU to thread ratio defaults to one-to-one. Business Objects
recommends setting the number of kernel threads per CPU to
between 512 and 1024.
•On Solaris and AIX, the number of threads per process is not
configurable. The number of threads per process depends on
system resources. If a process terminates with a message like
"Cannot create threads," you should consider tuning the job.
For example, use the Run as a separate process option to split
a data flow or use the Data_Transfer transform to create two sub
data flows to execute sequentially. Since each sub data flow is
executed by a different Data Services al_engine process, the
number of threads needed for each will be 50% less than in your
previous job design.
If you are using the Degree of parallelism option in your data flow,
reduce the number for this option in the data flow Properties
window.
•Network connection speed
Determine the rate that your data is being transferred across your
network.
•If the network is a bottleneck, you might change your job execution
distribution level from sub data flow to data flow or job to execute
the entire data flow on the local Job Server.
•If the capacity of your network is much larger, you might retrieve
multiple rows from source databases using fewer requests.
•Yet another interpretation might be that the system is under-utilized. In
this case, you might increase the value for the Degree of parallelism
option and increase the number of parallel jobs and data flows.
Related Topics
•Using parallel Execution on page 77
•Using grid computing to distribute data flow execution on page 116
•Using array fetch size on page 185
Data Services memory
For memory utilization, you might have one of the following different cases:
•Low amount of physical memory.
In this case, you might take one of the following actions:
•Add more memory to the Job Server.
•Redesign your data flow to run memory-consuming operations in
separate sub data flows that each use a smaller amount of memory,
and distribute the sub data flows over different Job Servers to access
memory on multiple machines. For more information, see Splitting a
data flow into sub data flows on page 102.
•Redesign your data flow to push down memory-consuming operations
to the database. For more information, see Push-down operations on
page 42.
For example, if your data flow reads data from a table, joins it to a file,
and then groups it to calculate an average, the group by operation might
be occurring in memory. If you stage the data after the join and before
the group by into a database on a different computer, then when a sub
data flow reads the staged data and continues with the group processing,
it can utilize memory from the database server on a different computer.
This situation optimizes your system as a whole.
For information about how to stage your data, see Data_Transfer transform
on page 110. For more information about distributing sub data flows to
different computers, see Using grid computing to distribute data flow
execution on page 116.
•Large amount of memory but it is under-utilized.
In this case, you might cache more data. Caching data can improve the
performance of data transformations because it reduces the number of
times the system must access the database.
Data Services provides two types of caches: in-memory and pageable.
For more information, see Caching data on page 64.
•Paging occurs.
Pageable cache is the default cache type for data flows. On Windows
and Linux, the virtual memory available to the al_engine process is 1.5
gigabytes (500 megabytes of virtual memory is reserved for other engine
operations, totaling 2GB). On UNIX, Data Services limits the virtual
memory for the al_engine process to 3.5 gigabytes (500MB is reserved
for other engine operations, totaling 4GB). If more memory is needed
than these virtual memory limits, Data Services starts paging to continue
executing the data flow.
If your job or data flow requires more memory than these limits, you can
take advantage of one of the following Data Services features to avoid
paging:
•Split the data flow into sub data flows that can each use the amount
of memory set by the virtual memory limits.
Each data flow or each memory-intensive operation within a data flow
can run as a separate process that uses separate memory from each
other to improve performance and throughput. For more information,
see Splitting a data flow into sub data flows on page 102.
•Push down memory-intensive operations to the database server so
that less memory is used on the Job Server computer. For more
information, see Push-down operations on page 42.
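The virtual memory limits described above can be turned into a quick check of whether a data flow is likely to page. The sketch below assumes that cached data size can be approximated as row count times row width, and that a gigabyte means 1024^3 bytes; both are assumptions, all names are hypothetical, and real cache overhead is not modelled.

```python
GIB = 1024 ** 3

# Virtual memory available to the al_engine process for caching, per the
# limits quoted above (1.5 GB on Windows and Linux, 3.5 GB on UNIX).
CACHE_LIMIT_BYTES = {
    "windows": int(1.5 * GIB),
    "linux": int(1.5 * GIB),
    "unix": int(3.5 * GIB),
}


def estimated_cache_bytes(row_count: int, row_width_bytes: int) -> int:
    """Rough size of the cached data: rows times bytes per row (an assumption)."""
    if row_count < 0 or row_width_bytes < 0:
        raise ValueError("row_count and row_width_bytes must be >= 0")
    return row_count * row_width_bytes


def exceeds_cache_limit(cache_bytes: int, platform: str) -> bool:
    """Return True if the estimate is above the platform limit, which
    suggests that paging would start."""
    return cache_bytes > CACHE_LIMIT_BYTES[platform.lower()]


if __name__ == "__main__":
    size = estimated_cache_bytes(20_000_000, 100)  # hypothetical: 20M rows x 100 bytes
    for platform in ("windows", "unix"):
        verdict = "likely to page" if exceeds_cache_limit(size, platform) else "fits"
        print(f"{platform}: {size / GIB:.2f} GB estimated, {verdict}")
```

If the estimate exceeds the limit, the options listed above apply: split the data flow into sub data flows or push the memory-intensive operation down to the database server.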
Analyzing log files for task duration
The trace log shows the progress of an execution through each component
(object) of a job. The following sample Trace log shows a separate Process
ID (Pid) for the Job, data flow, and each of the two sub data flows.