Patents
Business Objects owns the following U.S. patents, which may cover products that are offered and sold by Business Objects: 5,555,403, 6,247,008 B1, 6,578,027 B2, 6,490,593 and 6,289,352.
documentation@businessobjects.com
Trademarks
Business Objects, the Business Objects logo, Crystal Reports, and Crystal Enterprise
are trademarks or registered trademarks of Business Objects SA or its affiliated
companies in the United States and other countries. All other names mentioned herein
may be trademarks of their respective owners.
Third-party contributors
Business Objects products in this release may contain redistributions of software
licensed from third-party contributors. Some of these individual components may
also be available under alternative licenses. A partial listing of third-party
contributors that have requested or permitted acknowledgments, as well as required
notices, can be found at: http://www.businessobjects.com/thirdparty
DataRight IQ Transition Guide
Contents
Chapter 1: Welcome to DataRight IQ    7
Comparing DataRight and DataRight IQ    8
About DataRight IQ
DataRight IQ is advanced data-parsing software that identifies information in
your database so that you can use it more effectively. Here are just a few things
that you can do with DataRight IQ:
parse and standardize data
assign gender and prenames
perform advanced search-and-replace
combine split names or split combined names
About this guide
Use this document to learn about the new features in DataRight IQ, how they compare with DataRight, and how to perform in DataRight IQ the same tasks you performed in DataRight.
Conventions
This document follows these conventions:

Convention | Description
Bold | We use bold type for file names, paths, emphasis, and text that you should type exactly as shown. For example, “Type cd\dirs.”
Italics | We use italics for emphasis and text for which you should substitute your own data or values. For example, “Type a name for your file, and the .txt extension (testfile.txt).”
Menu commands | We indicate commands that you choose from menus in the following format: Menu Name > Command Name. For example, “Choose File > New.”
! | We use this symbol to alert you to important information and potential problems.
(note symbol) | We use this symbol to point out special cases that you should know about.
(tip symbol) | We use this symbol to draw your attention to tips that may be useful to you.
Documentation
Related documents
DataRight IQ comes with other documentation to help you make full use of the application’s abilities.

Document | For those who use | Description
Release Notes | any | Contains any necessary installation information and explains DataRight IQ’s capabilities in relation to the previous version.
User’s Guide | any | Learn more about what DataRight IQ can do, and how to do it. Includes Job-file and Library information.
Views online help | Views | Contains information, accessible online, about running UMD through its Views implementation.
Modifier’s Guide | any | Explains how to modify DataRight IQ to suit your needs, including how to create custom dictionaries using the command-line version of the User-Modifiable Dictionary (UMD) program, how to edit your rule file, and how to set up your user-defined patterns.
Here is a list of other product documentation that you may find useful.

Document | Description
Views: A Quick Guide to Get You Started | Gives you the basic information you need to get started with any Views software.
Edjob booklet | Explains how to use the Edjob utility to update your job files. Use this utility to update your job files when you receive a new version of UMD.
System Administrator’s Guide | Describes installation procedures, system requirements, and more.

Access the latest documentation
You can access product documentation in several places:
On your computer. Release notes, manuals, and other documents for each
product that you have installed are available in the Documentation folder.
Choose Start > Programs > Firstlogic Applications > Documentation.
On the Customer Portal. Go to www.firstlogic.com/customer, and then
click the Documentation link to access all the latest product documentation.
You can view the PDFs online or save them to your computer for viewing or
printing.
Chapter 1:
Welcome to DataRight IQ
This Transition Guide introduces you to the DataRight IQ product, contrasting its
features with those of DataRight. As such, this guide is designed for those
familiar with DataRight 2.56 or previous versions.
What’s new in DataRight IQ
DataRight IQ was developed from the convergence and evolution of products such as DataRight and TrueName Library.
If you’re familiar with DataRight, you have a good basic understanding of DataRight IQ’s Views and Job implementations. DataRight IQ shares many things with the DataRight product. However, many features, as well as some fundamental differences in how the two products parse data, make them distinct from each other.
To see the differences, refer to “Comparing DataRight and DataRight IQ” on page 8.
Because of features such as its increased parsing capabilities, DataRight IQ has numerous parameters in its underlying job file that are not in DataRight's master job file. See “Job file comparisons” on page 55.

Converting from DataRight
Because DataRight and DataRight IQ are two distinctly different products (DataRight IQ is not an update for DataRight, but an upgrade), the process of upgrading from DataRight to DataRight IQ may differ from what you are used to when you simply update to a new version.
For more information on installing DataRight IQ and updating your jobs, see “Installing and running DataRight IQ” on page 13.
Comparing DataRight and DataRight IQ
DataRight IQ builds on DataRight’s features, but it also goes much further.
On the surface, DataRight and DataRight IQ may appear very similar. They both
have a job-file implementation (whose job file blocks contain a lot of the same
parameters). They both have a Views implementation too (and their windows
look very similar). However, DataRight IQ contains several enhancements over
DataRight. Below is a brief list of DataRight IQ features.
Parsing more data
DataRight IQ parses several more types of data than DataRight 2.5x did, including e-mail addresses, phone numbers (U.S. and international), user-defined patterns, and U.S. Social Security numbers. It also parses dates. (DataRight 2.56 could format dates but didn’t actually parse them.)
For more information on parsing dates and other data, see “How DataRight IQ
parses new types of data” on page 31.
Rule-based parsing
DataRight IQ brings you the flexibility of rule-based parsing. Because parsing is based on user-modifiable rules, you can customize how DataRight IQ parses data by editing the rules.
In addition to rule-based parsing, DataRight IQ also provides presumptive
parsing, which is similar to the way DataRight 2.5x parsed. (The presumptive
parsing option is applicable only to name and firm data.)
For more information, see “DataRight IQ uses rule-based parsing” on page 26.
Controlling parsing
DataRight IQ gives you control over what you parse on multi-line input. You can turn each parsing capability on or off for each input line.
For more information, see “Turn off parsing engines” on page 29.
Application fields
DataRight IQ brings you many new PW and AP fields. Most of these fields were added to support the new parsing capabilities.
For more information, see “Fields DataRight IQ uses” on page 49.
User-defined patterns
DataRight IQ lets you define patterns that you can then parse. You define these patterns using regular expressions.
For general information about parsing user-defined patterns, see “Parse user-defined patterns” on page 40. For detailed information about setting up (defining)
user-defined patterns, see the DataRight IQ Modifier’s Guide.
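The following Python sketch gives a rough idea of what a regular-expression pattern matches. It checks field values against a hypothetical part-number pattern. The pattern, the function name, and the use of Python’s re syntax are all assumptions. DataRight IQ’s own pattern syntax and setup are covered in the Modifier’s Guide and may differ.

```python
import re

# Hypothetical user-defined pattern (an assumption for illustration): two uppercase
# letters, a hyphen, and four digits, such as the part number "AB-1234".
# This uses Python's re syntax, which may differ from DataRight IQ's pattern syntax.
PART_NUMBER = re.compile(r"[A-Z]{2}-\d{4}")

def matches_user_pattern(value: str) -> bool:
    """Return True if the whole field value (ignoring outer spaces) fits the pattern."""
    return PART_NUMBER.fullmatch(value.strip()) is not None

for sample in ("AB-1234", " AB-1234 ", "ab-1234", "AB-12345"):
    print(repr(sample), matches_user_pattern(sample))
```

Using fullmatch means the entire value must fit the pattern. A value with an extra digit, such as "AB-12345", is rejected rather than partially matched.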
Name order
Name order can be FML (First, Middle, Last) or LFM (Last, First, Middle). You set this in your DEF file, just as you did with DataRight 2.5x.
In addition to choosing FML or LFM, DataRight IQ lets you determine how strictly name order is applied to parsed names. If you apply “strict” name order, DataRight IQ always parses name data in the order set in your DEF file. If you don’t apply “strict” name order, DataRight IQ uses the DEF file order only when the input is ambiguous.
For more information, see “Control name order” on page 42.
Title associations
DataRight IQ lets you associate names and titles more precisely than DataRight
2.5x. DataRight had a parameter called “Associate Name & Title.” However, in
DataRight IQ you can also associate names and titles on discrete lines and on
multilines.
For more information, see your DataRight IQ User’s Guide.
Search-and-Replace
DataRight IQ’s Search-and-Replace capabilities are very similar to DataRight 2.5x’s. However, because of DataRight IQ’s user-defined pattern matching feature, patterns are now an option when you choose your Search-and-Replace method.
In addition, DataRight IQ has modified its Search-and-Replace function to handle
any casing. You can specify whether you want to ignore casing when performing
Search-and-Replace.
For more information, see your DataRight IQ User’s Guide.
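The two choices described above are literal text versus a pattern, and whether casing is ignored. The small Python sketch below, built on the standard re module, shows how they combine. The function name and keyword arguments are hypothetical illustrations, not DataRight IQ’s interface.

```python
import re

def search_and_replace(text, find, replace, *, use_pattern=False, ignore_case=False):
    """Replace every occurrence of `find` in `text` with `replace`.

    use_pattern: treat `find` as a regular expression instead of literal text.
    ignore_case: match regardless of upper or lower case.
    """
    expr = find if use_pattern else re.escape(find)  # escape literals such as "."
    flags = re.IGNORECASE if ignore_case else 0
    # A function replacement inserts `replace` verbatim (no backslash processing).
    return re.sub(expr, lambda _m: replace, text, flags=flags)

print(search_and_replace("Jones corp", "Corp", "Corporation"))                    # Jones corp
print(search_and_replace("Jones corp", "Corp", "Corporation", ignore_case=True))  # Jones Corporation
print(search_and_replace("Ste 100", r"\bSte\b", "Suite", use_pattern=True))       # Suite 100
```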
Scan-and-Split
DataRight IQ’s Scan-and-Split capabilities are very similar to DataRight 2.5x’s. However, DataRight IQ has modified its Scan-and-Split function to handle any casing. You can specify whether you want to ignore casing when performing Scan-and-Split. (Note: Scan-and-Split does not support DataRight IQ’s new user-defined pattern matching.)
For more information, see your DataRight IQ User’s Guide.
Statistics files
In addition to all the reports you could create in DataRight 2.5x, DataRight IQ lets you generate several statistics files. Statistics files are text files containing all the information from a specific report.
For more information, see “Generate statistics files” on page 44.
Unicode support
DataRight IQ can read and write data that follows the Unicode standard. Strictly
speaking, this ability isn’t different between DataRight and DataRight IQ.
However, because it was added only in DataRight 2.56, many DataRight users
may not be familiar with the implementation of Unicode.
For more information, see the document Unicode and Firstlogic: Introduction to Firstlogic’s Unicode-Enabled Technology, which is available on the customer
portal.
DataRight IQ’s implementation methods
You may be familiar with only one “flavor” of DataRight or DataRight IQ.
However, keep in mind that DataRight IQ spans a range of deployment options.
Implementation | Description
Job file | A batch program for processing database files. It sets processing modes and options by reading a text file (“job file”) that contains specific information on how DataRight IQ should process data.
Views | A graphical interface program for setting up DataRight IQ jobs. Views is available in two forms:
    “Local” Views: Installed and run entirely on Microsoft Windows.
    Remote Views: Has two components. The Views GUI is installed on a Windows PC, while the software for actually processing jobs is installed on a UNIX server.
Library | Toolkits for highly customized integration of DataRight IQ technology into your existing software applications.
RAPID | A simplified approach to integrating DataRight IQ technology into your applications. (RAPID stands for Rapid Application Integration Deployment.) RAPID contains some qualities of a library or toolkit (it has an API and requires some programming to implement) while it also offers the amenities of our batch tools (GUI setup and easy generation of reports).
Although they all have common DataRight IQ capabilities, each may have unique
installation and operating requirements. This guide deals primarily with the Job
file and Views implementations.
Where to look for more information
DataRight IQ includes a number of user guides, reference guides, and online
documentation (see below).
DataRight IQ-specific documentation
The documentation that you receive may depend on how you use DataRight IQ, for example, as a Job File, Views, Library, or RAPID product. (For information on the RAPID implementation of DataRight IQ, see the table “DataRight IQ-related documentation” on page 11.)

Document | For those who use | Description
DataRight IQ Transition Guide (this document) | any | Explains DataRight IQ’s capabilities in relation to DataRight 2.5x and tells you any necessary installation information.
DataRight IQ User’s Guide | any | Explains DataRight IQ’s capabilities (includes units specifically for Job and Library users).
DataRight IQ Modifier’s Guide | any | Explains how to modify DataRight IQ to suit your needs, including how to create custom dictionaries using the command-line version of the User-Modifiable Dictionary (UMD) program, how to create or edit the rule file, and user-defined pattern matching (UDPM) information.
QuickParse User’s Guide | any | Explains the QuickParse utility, which lets you interactively test how DataRight IQ processes a particular line or lines of data.
DataRight IQ Views online help | Views | Contains the online documentation for your Views product (access the help either through your application or from the Documentation CD).
UMD Views online help | any | Contains information about using the Views version of the User-Modifiable Dictionary (UMD) program.

DataRight IQ-related documentation
In addition to DataRight IQ-specific documentation, there may be other product documentation that you need to refer to. To find these documents, look on your Documentation CD.

Document | Description
System Administrator’s Guide | Explains how to install your software.
Database Prep | Explains how to prepare input files for processing, including how to create definition and format files (DEF, FMT, DMT). Contains tips for converting from one database type to another. Explains filters and functions and how to use them.
Views: Quick Start Guide | Gives you the basic information you need to get started with any Views software.
RAPID Online Documentation | Provides the specific information (such as API calls) needed for DataRight IQ RAPID (applies to those who use the RAPID implementation of any related product).
Quick Reference for Views and Job-File Products | Contains handy reference information: descriptions of input and output fields, command-line options, ASCII code values, and a summary of functions and filter operators.
Chapter 2:
Installing and running DataRight IQ
For additional installation information, see Business Objects’ System
Administrator’s Guide.
Paths
You must set your paths appropriately.

Operating system | Comment
Windows | The installation program prompts you to select a drive and specify a directory location. The default directory name is \pw. We recommend that you accept this default. The subdirectories underneath \pw are created automatically, as shown below.
UNIX | The postware directory structure is shown below. Note that postware must not be a root directory.
Directory structure
If you accept the default settings during installation, the installer creates the following directory structure.

pw or postware
    adm          Edjob and those utilities used by multiple products
    dirs         City and ZCF directories
    csgui        Remote Views directory
    dtr_iq       DataRight IQ executables and dictionaries
        samples      Sample job
        utils
            quickparse
Installing your product
On Windows
When you insert the application CD, the installation program should start
automatically. If it doesn’t, follow these steps:
1.Access your Windows Start menu and choose Run.
2.In the Run window, type x:\setup (where x is the letter of your CD-ROM
drive) and click OK.
The installation program should start. For more information about installing, see
Business Objects’ System Administrator’s Guide.
On UNIX
To install DataRight IQ on UNIX, use install_console for the Java Runtime
Environment as explained in Business Objects’ System Administrator’s Guide.
When you install your software on UNIX, you must set path and pw_path. See
the System Administrator’s Guide for details.
UNIX users: Access
the shared libraries
In addition to setting path and pw_path, you must also set the shared library path
environment variable for each user in the appropriate login script. This ensures
that products use the correct files from the /disk_n/postware/adm directory. (We
use disk_n to refer to the file system or disk where you choose to install the
software.)
You need to do this only once. If you already set the shared library path
environment variable for a different product in this CSR, you don’t need to do it
again.
To set the shared library path environment variable, first determine the appropriate environment variable for your platform from this table:

Platform | Environment variable
AIX | LIBPATH
HP/UX 11.0 | SHLIB_PATH
Solaris | LD_LIBRARY_PATH
Linux | LD_LIBRARY_PATH

Then follow these steps:
1. If there is no shared library path, create one referring to /disk_n/postware/adm.
2. If there is an existing shared library path, append /disk_n/postware/adm to the end of your existing entries.
The examples on the next page show steps 1 and 2 from above for both the
Bourne and C shells on Solaris. You may need to make adjustments based on the
type of shell that you use. See the System Administrator’s Guide for details.
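The logic of steps 1 and 2 can also be expressed in Python. This sketch only computes the new value. It does not edit your login script, which is still where the variable must be set. /disk_n is the same placeholder used above, and the names LIB_PATH_VAR and with_adm_dir are hypothetical.

```python
# Shared library path variable for each platform (from the table above).
LIB_PATH_VAR = {
    "AIX": "LIBPATH",
    "HP/UX 11.0": "SHLIB_PATH",
    "Solaris": "LD_LIBRARY_PATH",
    "Linux": "LD_LIBRARY_PATH",
}

def with_adm_dir(existing, adm_dir="/disk_n/postware/adm"):
    """Step 1: create the path if it is empty. Step 2: otherwise append adm_dir."""
    entries = [p for p in (existing or "").split(":") if p]  # UNIX paths use ':'
    if adm_dir not in entries:  # already set for another product: leave it alone
        entries.append(adm_dir)
    return ":".join(entries)

var = LIB_PATH_VAR["Solaris"]
print(f"{var}={with_adm_dir('')}")          # LD_LIBRARY_PATH=/disk_n/postware/adm
print(f"{var}={with_adm_dir('/usr/lib')}")  # LD_LIBRARY_PATH=/usr/lib:/disk_n/postware/adm
```

The membership check mirrors the note that you need to do this only once. Running the logic again for another product does not add a duplicate entry.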
Bourne shell example
If you use Bourne shell on Solaris, add product entries to your .profile or .login
QuickParse
If you use DataRight IQ on the UNIX platform and you have Trolltech’s Qt installed (see www.trolltech.com), you need to set up your path in a specific way. Otherwise, you may have a problem running QuickParse, a utility shipped with DataRight IQ.
The following directions apply only if you have QuickParse installed on your UNIX system and you’re not using Qt version 3.1.2.
The QuickParse utility lets you quickly see how data that you input (either manually or from a database) would parse if input through your DataRight IQ application. QuickParse is written using the Qt toolkit, and it’s shipped with the Qt library version 3.1.2 (libqt.so.3 on UNIX).
You need to set the LD_LIBRARY_PATH correctly so you don’t have problems running QuickParse. The QuickParse directory is:
/postware/DTR_IQ/utils/quickparse
The directory containing libqt.so.3 must be the first item in the path statement. It should read like this:
If you have questions about setting your LD_LIBRARY_PATH, see your system
administrator.
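The dynamic loader searches the directories in LD_LIBRARY_PATH in order, so whichever libqt.so.3 it finds first is the one QuickParse loads. The Python sketch below shows the "first item" rule by building a path with the QuickParse directory at the front. The function name is hypothetical. The default directory is the one given above and may sit under a different prefix on your system. /usr/local/qt/lib stands in for wherever your own Qt installation lives.

```python
def quickparse_first(existing, qp_dir="/postware/DTR_IQ/utils/quickparse"):
    """Return a library path with qp_dir first; any later copy of qp_dir is dropped."""
    rest = [p for p in (existing or "").split(":") if p and p != qp_dir]
    return ":".join([qp_dir] + rest)

# /usr/local/qt/lib is a hypothetical location for a separately installed Qt.
print("LD_LIBRARY_PATH=" + quickparse_first("/usr/local/qt/lib:/disk_n/postware/adm"))
```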
Updating your jobs (DataRight IQ Job and Views)
Before you can use your existing jobs and dictionaries in the latest version of
DataRight IQ, you must update them.
Use Edjob to update jobs
To update your existing jobs to the latest version of DataRight IQ, use the Edjob utility. Don’t try to update your job files by hand. Instead, use the Edjob update utility that’s installed with DataRight IQ.
There are two update script files for you to run on your existing jobs:
To update jobs from DataRight to DataRight IQ: dtr2dtr_iq.upd
To update jobs from an older version of DataRight IQ to the latest version of DataRight IQ: pwdiqjob.upd
Find the DataRight IQ update scripts in the dtr_iq subdirectory.
For complete instructions on running Edjob, see Business Objects’ Edjob User’s Guide.

Use DCTCONV to update parsing dictionaries
If you created any custom parsing dictionaries (using DataRight 2.56 or earlier), you need to convert the transaction file to DataRight IQ’s format and then build a new dictionary.
To convert, enter the following at your DOS prompt:
dctconv transaction_file new_transaction_file
where “transaction_file” is the name of your existing transaction file and “new_transaction_file” is the name of your DataRight IQ transaction file.
This process converts the transaction file to the format accepted by DataRight IQ.
For information about building a dictionary, see the documentation for UMD (User-Modifiable Dictionary) in the DataRight IQ Modifier’s Guide.

Format of your existing transaction file
Your existing transaction file must be formatted with these field lengths:
A sample job is included with your DataRight IQ software. The sample job is
provided so that you can verify your DataRight IQ installation. The sample job
also introduces you to the files used in a DataRight IQ job.
Note: DataRight had a run utility named rundtr, which you could use to run your jobs. In DataRight IQ Job, run your jobs from a command line. In Views, run your jobs by clicking the Run icon.
Supporting files
To read your input files, DataRight IQ needs to know the input database type,
input field names and, in some cases, the length and type of each field.
To provide this information, you create a definition file for each input file, and
perhaps a format file. The definition file tells DataRight IQ the database type and
“translates” your input field names to names that DataRight IQ understands. The
format file tells DataRight IQ how the input file is formatted.
We’ve created definition and format files for the sample input file. Look in the
samples subdirectory for files with the .def and .fmt extensions. Use any text
editor to open the files to view their contents, but don’t change the files in any
way.
For more information about definition files and format files, see Business
Objects’ Database Prep documentation.
If you use DataRight IQ Job
The job file is a set of instructions that tells DataRight IQ how to process your input file. When you create a DataRight IQ job, copy the master job file (master.diq) that is installed in the dtr_iq directory and insert the instructions that are unique to your job.
A sample job file (called quikwin.diq) is included in the samples subdirectory. Use any text editor to open the job file. Notice that the job file includes parameters that are displayed in groups called blocks. Entries typed at the parameters give DataRight IQ instructions on how to process the job. Scroll through the entire sample job file to get an idea of how this job is set up, but do not change any of the parameter entries.
Check file paths. You should access the Auxiliary Files block and check the paths for the dictionary and directory files. If you placed any of the files in different locations, change these entries before you run the sample job.

If you use DataRight IQ Views
DataRight IQ Views gives you an easy-to-use interface to set up and run your job.
Instead of customizing the job file (such as the sample job file mentioned above)
through a text editor, you can use the windows, menus, and controls of DataRight
IQ Views. For assistance with the Views interface, see the product’s online help,
which includes window-level and control-level context-sensitive topics.
Running the sample
job
To run the sample in Job, type the commands shown below for the UNIX
platform. In Views, follow the steps shown below for the Windows platform.
If you didn’t install DataRight IQ in the default directory, change the path name
accordingly.
Before and after you run the job, look at the contents of the samples directory.
Notice which files are input and which files DataRight IQ creates.
Platform | Directions
UNIX (Job) | Enter the following, pressing Enter after each line:
    $ cd /usr/postware/dtr_iq/samples
    $ fldiq quikwin.diq
Windows (Views) | Click Start > Programs > Firstlogic Applications. Then select DataRight IQ. You can either browse for the sample job or type the path and file name for the sample job, and then click the Run icon.
Verification and
processing
When you enter the command line or click the Run icon, DataRight IQ verifies
that all the parameter entries in the job file are valid. As DataRight IQ verifies the
job, it displays progress messages. If DataRight IQ detects a problem, it issues a
verification warning or error.
Once the job file passes verification, DataRight IQ begins processing the input
file. As DataRight IQ processes the input file, it displays messages to keep you
informed of the job’s progress.
Reports
After you run the sample job, you can look at the reports that DataRight IQ
generated. For more information, see the Reports chapter in your DataRight IQ
User’s Guide.
Editing files in Job
DataRight IQ offers two ways in which you can set up and edit your jobs. Use Job to enter your options and set up your job in a text-based file, then run the job using a command line in DOS. Or, use Views to set up your job in a Windows environment and run your job with the click of a button.
Views eliminates many of the opportunities for setup mistakes that exist in Job. Therefore, here are a few things to pay attention to when you set up DataRight IQ using Job.
Copy and edit the master job
In the dtr_iq subdirectory, we provide a master job file called master.diq. To create your own job file, copy the master job file and then edit the copy. Do not try to type your own job file from scratch. When you name your job file, use the file extension .diq, for example, filename.diq.
To edit job files, use a text editor or word processor. If you use a word processor, save the file as simple ASCII text.
The excerpt below from the master job file shows its parts: each BEGIN...END section is a block, the lines inside a block are parameters, and the entries after the equal signs are your job settings.

* MASTER JOB FILE FOR Firstlogic DataRight IQ
BEGIN General DataRight IQ 7.10c =========================
Job Description (to 80 chars)........ =
Job Owner (to 20 chars).............. =
END
BEGIN Execution ===========================================
Post to Input File(s) (Y/N).......... = Y
Post to Output File(s) (Y/N)......... = Y
Create Reports (Y/N)................. = Y

Keep blocks intact
Keep the basic structure of the job blocks intact:
Do not delete parameters or rearrange them within a block.
Do not edit parameter names (anything to the left of the equal sign).
Do not edit block titles.
Do not edit the BEGIN or END lines. (Exception: If you want DataRight IQ
to ignore a block, insert an asterisk in front of the word BEGIN.)
You may add comments at the beginning or end of the job file and between
blocks. Start all comment lines with an asterisk (*). Do not use the key words
BEGIN or END in your comments.
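The structural rules above can be checked mechanically. The Python sketch below reads a job file and reports BEGIN/END mismatches and stray non-comment text. It skips blocks switched off with *BEGIN and collects each active block’s parameters. This is an illustrative reader written under those assumptions, not DataRight IQ’s own verification step. The function name and the returned list-of-pairs layout are hypothetical.

```python
import re

# "BEGIN <title> =====", optionally preceded by "*" to make DataRight IQ skip the block.
BEGIN_RE = re.compile(r"^(\*?)BEGIN\s+(.*?)\s*=*\s*$")

def parse_job(text):
    """Return [(block title, {parameter name: entry}), ...] for the active blocks."""
    blocks = []       # kept as a list because a job may repeat a block title
    params = None     # parameters of the active block being read
    title = None      # title of the block being read, or None between blocks
    skipped = False   # True inside a block switched off with "*BEGIN"
    for raw in text.splitlines():
        line = raw.strip()
        begin = BEGIN_RE.match(line)
        if begin:
            if title is not None:
                raise ValueError(f"BEGIN inside unclosed block {title!r}")
            title, skipped = begin.group(2), begin.group(1) == "*"
            if not skipped:
                params = {}
                blocks.append((title, params))
            continue
        if line == "END":
            if title is None:
                raise ValueError("END without a matching BEGIN")
            title = None
            continue
        if not line or line.startswith("*"):
            continue  # blank line or comment
        if title is None:
            raise ValueError(f"text outside a block must be a comment: {line!r}")
        if skipped:
            continue
        name, sep, entry = line.partition("=")
        if not sep:
            raise ValueError(f"parameter line has no '=': {line!r}")
        params[name.rstrip(". ")] = entry.strip()  # drop the dot leader from the name
    if title is not None:
        raise ValueError(f"block {title!r} has no END line")
    return blocks

sample = """* MASTER JOB FILE FOR Firstlogic DataRight IQ
BEGIN General DataRight IQ 7.10c =========================
Job Description (to 80 chars)........ = $job test run
Job Owner (to 20 chars).............. =
END
*BEGIN Execution ===========================================
Create Reports (Y/N)................. = Y
END
"""
print(parse_job(sample))
```

The Execution block is left out of the result because its BEGIN line starts with an asterisk.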
Type entries correctly
Parameter names are often followed by guidelines or options shown in parentheses. You can distinguish a guideline from an option by its case: guidelines are shown in lowercase, options in UPPERCASE. In the two parameters below, “to 80 chars” is a guideline and “SPEED/SPACE” lists the options:

Job Description (to 80 chars)........ =
Cache Buffer Size (SPEED/SPACE)...... = SPEED

Case does not matter when you type your entry, but be sure to spell options exactly as shown. (There is one exception: At Y/N parameters, you may spell out “Yes” or “No.”)
If you’re entering a long parameter entry, never press the Enter key. Simply let
the entry wrap onto the next line.
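Together, these rules make a simple check: uppercase text in parentheses lists options, entries are case-insensitive, and “Yes” or “No” are accepted at Y/N parameters. The Python sketch below applies that check to one entry. It is a hypothetical helper that illustrates the rules, not part of DataRight IQ.

```python
import re

def check_entry(parameter, entry):
    """Validate an entry against the text in parentheses after a parameter name.

    UPPERCASE text such as (SPEED/SPACE) lists options; lowercase text such as
    (to 80 chars) is only a guideline, so the entry is returned unchanged.
    """
    name = parameter.rstrip(". ")              # drop the dot leader
    m = re.search(r"\(([^)]*)\)$", name)        # last "(...)" at the end of the name
    hint = m.group(1) if m else ""
    if not hint.isupper():                      # a guideline, or nothing at all
        return entry
    options = hint.split("/")
    value = entry.strip().upper()               # case does not matter
    if options == ["Y", "N"] and value in ("YES", "NO"):
        value = value[0]                        # "Yes"/"No" are allowed at Y/N
    if value not in options:
        raise ValueError(f"{entry!r} is not one of {options} for {name!r}")
    return value

print(check_entry("Cache Buffer Size (SPEED/SPACE)......", "speed"))  # SPEED
print(check_entry("Create Reports (Y/N).................", "Yes"))    # Y
```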
Include all required information
Blocks
Certain blocks are required in every job. The documentation that describes each block indicates whether that block is required or optional. For more information about each block, see the DataRight IQ User’s Guide.
Parameters
Most parameters require an entry. Only a few optional parameters may be left blank.
$job, $time, and
$date macros
In some parameter entries, you can include the macros $job, $time, and $date.
DataRight IQ converts the macro to a specific piece of information. (You can also
use these macros in Views.)
$job
DataRight IQ automatically converts $job to the base name of the job file (without path or extension). For example, your entry at the Output File Name parameter might be $job.dat. If your job file were named my_file.diq, DataRight IQ would name the output file my_file.dat. You can include the $job macro in file names, the job description, and report headers.
$time and $date
DataRight IQ automatically converts $time and $date to the time and date, which are taken from your computer’s clock when the job starts running. The time is ten characters long in the format hh:mm:ss with am or pm. The date is eleven characters long in the format dd-mmm-yyyy.
You can include the $time and $date macros in the job description or in report
headers.
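The Python sketch below imitates the macro substitution described above, using the stated formats. Some details are assumptions because the text does not specify them: lowercase am/pm with no space, zero-padded hours on a 12-hour clock, and English three-letter month names. The function name is hypothetical.

```python
from datetime import datetime
from pathlib import PurePath

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def expand_macros(entry, job_file, now):
    """Replace $job, $time, and $date in a parameter entry."""
    job = PurePath(job_file).stem                   # base name: no path, no extension
    hour = now.hour % 12 or 12                      # 12-hour clock (assumed)
    suffix = "am" if now.hour < 12 else "pm"
    time_str = f"{hour:02d}:{now.minute:02d}:{now.second:02d}{suffix}"  # 10 characters
    date_str = f"{now.day:02d}-{MONTHS[now.month - 1]}-{now.year:04d}"  # 11 characters
    return (entry.replace("$job", job)
                 .replace("$time", time_str)
                 .replace("$date", date_str))

start = datetime(2003, 4, 20, 14, 5, 9)  # pretend the job started at this moment
print(expand_macros("$job.dat", "my_file.diq", start))             # my_file.dat
print(expand_macros("Run $date at $time", "my_file.diq", start))   # Run 20-Apr-2003 at 02:05:09pm
```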
Editing files in Views
DataRight IQ Views is provided with DataRight IQ Job. You can choose which software you want to use based on your experience and comfort level. Some of you are used to setting up your jobs using the master job file and running them with a DOS command. Others may be more familiar with using a GUI (graphical user interface).
Advantages of Views
With Views, you eliminate many of the chances for mistakes that you can make in Job. You can’t delete parameters as you can in Job, and Views often tells you when you’ve entered an incorrect option. Most options aren’t typed but are chosen from a drop-down list or by selecting or deselecting an option. This eliminates the chance for mistakes or misspellings in your entries.
Views provides help at the click of a button. If you don’t know what an option does, simply click the question mark icon in the Views window and click the option. An explanation appears telling you what the option does. For more in-depth information, access the online help by choosing Help from the menu.
While there is always a chance to make errors in setting up your jobs, Views
provides an environment that eliminates many of the chances for error.
For details on using Views, see the Views Quick Start Guide.
Figure: The DataRight IQ Views window, with these callouts:
Views provides a menu bar and toolbar that give you easy ways to set up and run your jobs.
Views presents you with the elements of DataRight IQ Job, but in Views, you open windows to enter parameter information.
Access DataRight IQ Job blocks by expanding groups in the tree view and double-clicking the block that you want to set up. The block appears in its own window.
Notice that the entries in Views are usually options that you select. You can see how the parameters and the options match up in this example.
Chapter 3:
What and how DataRight IQ parses
You can use DataRight IQ to identify and isolate various data and data
components from fields of information. We call this parsing.
What DataRight IQ parses, compared to DataRight
With DataRight IQ you can parse more data than you can using DataRight. Comparing what you can parse with DataRight versus DataRight IQ, you can see that DataRight IQ takes you well beyond just name and address.
With its increased parsing capabilities, DataRight IQ lets you parse the following types of data:

Data type | DataRight | DataRight IQ
names | yes | yes
job titles | yes | yes
firms (company data) | yes | yes
U.S. street addresses | yes | yes
e-mail addresses | no | yes
Social Security numbers | no | yes
U.S. phone numbers | no | yes
int’l phone numbers | no | yes
dates | see note | yes
user-defined patterns | no | yes

For more information, see your DataRight IQ User’s Guide and the next chapter.
DataRight and dates. Previously, with DataRight 2.5x, you could format
dates, but not parse them. DataRight IQ recognizes dates in a variety of
formats and breaks those dates into components.
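The following Python sketch shows what "breaking a date into components" means. It tries a few input layouts in turn and returns the year, month, and day of the first one that fits. The list of layouts is an assumption for illustration, not DataRight IQ’s list of supported formats. The month-name layouts rely on Python’s default C locale, which uses English month names.

```python
from datetime import datetime

# A few input layouts to try, in order (an assumption, not DataRight IQ's list).
DATE_FORMATS = ("%b %d, %Y", "%B %d, %Y", "%m/%d/%Y", "%Y-%m-%d")

def parse_date(text):
    """Return the date's components as a dict, or None if no layout matches."""
    for fmt in DATE_FORMATS:
        try:
            d = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next layout
        return {"year": d.year, "month": d.month, "day": d.day}
    return None

print(parse_date("Apr 20, 2003"))  # {'year': 2003, 'month': 4, 'day': 20}
print(parse_date("04/20/2003"))    # {'year': 2003, 'month': 4, 'day': 20}
```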
For a complete list of the input fields that DataRight IQ accepts, see Business
Objects’ Quick Reference documentation. For a more detailed description about
how DataRight IQ parses each type of data, see the next chapter.
Parsing—DataRight versus DataRight IQ
You can use DataRight to identify and isolate a wide variety of data. However, it
can’t recognize as many data fields as DataRight IQ.
In the example below (using the jobfile implementation), notice all the data that
DataRight doesn’t know what to do with—and so puts into the Extra field. But
when you use DataRight IQ to parse the same data, it properly identifies data like
phone, e-mail address, and Social Security number.
Input data
Mr. Dan R. Smith, Jr., CPA
Director of Admissions
Jones Inc.
PO Box 567
1234 Main St S
Biron, WI 54494
421-55-2424
dsmith@rdrindustries.com
507-555-3423
Apr 20, 2003
Data parsed by DataRight Job 2.56
Field                Data
Prename              Mr.
First Name           Dan
Middle Name          R.
Last Name            Smith
Maturity Postname    Jr.
Honorary Postname    CPA
Title                Director of Admissions
Firm                 Jones Inc.
Address              PO Box 567
Address              1234 Main St S
Lastline             Biron, WI 54494
Extra                421-55-2424
Extra                dsmith@rdrindustries.com
Extra                507-555-3423
Extra                April 20, 2003
Data parsed by DataRight IQ Job 7.10
Field                Data
Prename              Mr.
First Name           Dan
Middle Name          R.
Last Name            Smith
Maturity Postname    Jr.
Honorary Postname    CPA
Title                Director of Admissions
Firm                 Jones Inc.
Address              POB 567
Address              1234 Main St S
Lastline             Biron, WI 54494
Social Security      421-55-2424
E-mail address       dsmith@rdrindustries.com
Phone                507-555-3423
Date                 April 20, 2003
More application fields
DataRight IQ User’s Guide
DataRight IQ brings you many new PW and AP fields. Most of these fields have
been added because of the new parsing capabilities. For more information, see
“Fields DataRight IQ uses” on page 49.
How DataRight IQ differs
DataRight IQ and DataRight have some underlying differences in their parsing behavior. DataRight IQ significantly improves the parsing and identification of name and firm data through configurable, rule-based parsing.
DataRight IQ is not your old DataRight
What’s the basis for this difference?
DataRight IQ and DataRight parse some data differently. You may find that you
receive different results when you parse the same data through DataRight IQ and
through DataRight.
DataRight IQ makes fewer assumptions about data than DataRight makes. For
example, when DataRight encounters data on a name line, it assumes that data is
a name. DataRight IQ, however, exercises more caution in its parsing behavior.
DataRight IQ doesn’t automatically assume data on a name line is name data. It
parses the data, and if the data does not parse as name data, it places the data in an
Extra field. To make that data parse as a name, you may need to add a rule or edit
a dictionary.
Rule-based and presumptive. In addition to its rule-based parsing, DataRight IQ can also parse presumptively (similar to DataRight) when no parsing rule matches. For more information, see “DataRight IQ uses rule-based parsing” on page 26.
DataRight 2.xx uses a methodology for parsing that’s based on identifying and
isolating words and then comparing them to an empirical source, also known as a
dictionary lookup, to determine their meaning.
DataRight IQ builds on this method and uses configurable rules. This parsing
method enhances the DataRight approach by using external rules, in combination
with dictionary lookup, to guide the parser’s actions.
What’s the benefit of this difference?
DataRight IQ’s approach to parsing provides the following benefits:
Context sensitivity: In addition to dictionary lookups, DataRight IQ can perform limited context-based parsing, in which the parser surveys a word’s surroundings and can sometimes infer its meaning from its relationship to other elements.
Flexibility: Because rules are no longer hard coded, they can be changed to meet project needs without changing the program. Advanced users can change, add, or delete rules to meet their specific needs.
DataRight IQ uses rule-based parsing
In DataRight 2.56 and earlier, you could perform only presumptive parsing; you didn’t have the option of rule-based parsing. DataRight IQ uses presumptive parsing on name and firm data only when rule-based parsing does not produce a match (and presumptive parsing is turned on).
DataRight IQ follows sets of rules to determine how to parse data. It decides how to parse based only on the data itself and the rules. When the data matches a rule, DataRight IQ outputs the data accordingly.
Presumptive parsing
The parsing in DataRight was entirely hard coded (unlike DataRight IQ, which uses an editable rule file), so there was no way to change parsing results. If you knew that your data was a name or a firm, you input it on a name line or firm line, and it always parsed as such, regardless of what it actually was.
Although DataRight IQ introduced rule-based parsing, some users prefer the way DataRight sent every entry to name or firm. So we incorporated the presumptive parsing option into DataRight IQ.
When data is input on a name line or a firm line, DataRight IQ first tries its name rules or firm rules, respectively. If the data doesn't match a rule (and you have activated presumptive parsing in your setup), DataRight IQ uses presumptive parsing to make a best guess. With presumptive parsing, a name or firm is always parsed out of a name or firm line.
Because the input always goes to the rule set first, in some cases the rules match only part of the entry and parse that part out. The remainder goes to Extra.
Some examples
In the example below, the name line data “xb1wc so34bod2jc” is recognized as junk data when parsed by the rules (the data contains numbers, or has no letters at all), so it is sent to the Extra output field.
Input (on a name line)    Output with rule-based parsing
                          Field     Data
xb1wc so34bod2jc          Extra1    xb1wc so34bod2jc
If you turn presumptive parsing on, the same data is parsed as first and last name
because it came in on a nameline.
Input (on a name line)    Output with presumptive parsing
                          Field         Data
xb1wc so34bod2jc          First name    xb1wc
                          Last name     so34bod2jc
If you have a legitimate name in your data (like “john smith” among the same junk data, below), parsing pulls out John as a first name, Smith as a last name, and the rest as Extra, regardless of whether you use rule-based or presumptive parsing, because the data matched a parsing rule.
Input                          Output with presumptive parsing on
                               Field         Data
xb1wc john smith so34bod2jc    First name    John
                               Last name     Smith
                               Extra1        xb1wc so34bod2jc
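To make the fallback concrete, here is a toy Python model of the behavior in these examples. It is a sketch, not DataRight IQ’s parser: the two tiny dictionaries and the single “known first name followed by known last name” rule are hypothetical stand-ins for the real dictionaries and the rule file.

```python
# Hypothetical stand-ins for DataRight IQ's name dictionaries; the real
# dictionaries and the rule file (drlrules.dat) are far richer.
FIRST_NAMES = {"john", "dan", "tommy"}
LAST_NAMES = {"smith", "jones"}


def parse_name_line(line: str, presumptive: bool = False) -> dict:
    """Toy model: try one hypothetical name rule, then optionally fall back."""
    tokens = line.split()
    # Hypothetical rule: a known first name immediately followed by a known last name.
    for i in range(len(tokens) - 1):
        first, last = tokens[i], tokens[i + 1]
        if first.lower() in FIRST_NAMES and last.lower() in LAST_NAMES:
            result = {"First name": first.capitalize(), "Last name": last.capitalize()}
            leftover = tokens[:i] + tokens[i + 2:]
            if leftover:
                # Data the rule did not consume goes to Extra.
                result["Extra1"] = " ".join(leftover)
            return result
    # No rule matched: presumptive parsing always yields a name from a name line.
    if presumptive and len(tokens) >= 2:
        return {"First name": tokens[0], "Last name": " ".join(tokens[1:])}
    if presumptive and tokens:
        return {"Last name": tokens[0]}
    # Rule-based only: the unmatched entry is sent to Extra.
    return {"Extra1": line}
```

Calling parse_name_line("xb1wc so34bod2jc") returns only an Extra1 entry, while adding presumptive=True returns a first and last name, mirroring the first two tables.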
Turn presumptive parsing on
If you use DataRight IQ Library, you accomplish this with an API call. In Job, activate presumptive parsing for both name and firm lines in the Parsing Control block:
BEGIN Parsing Control =============================================
Parsing Mode (NONE/PARSE)............ = PARSE
Presumptive Parse Name Lines......... = Y
Presumptive Parse Firm Lines......... = Y
END
In Views, select the Parsing Setup group in the main window, then open the Parsing Control window. Select the Use Presumptive Parsing for Name Lines and Use Presumptive Parsing for Firm Lines options.
DataRight IQ’s multiline parsing order
When input is on a multiline, DataRight IQ parses data in the following order:
Order    Parsed item
1        Street address and lastline
2        E-mail address
3        U.S. Social Security number
4        Date
5        Phone number (U.S. or Canadian)
6        Phone number (International)
7        User-defined pattern
8        Name and title
9        Firm
Why order?
The order in which DataRight IQ parses your data is important: if DataRight IQ identifies data as one thing before it can evaluate it as another, you may get unexpected results.
For example, if DataRight IQ identifies a nine-digit number as a U.S. Social Security number, it won’t evaluate that data as a potential international phone number. Likewise, if you set up a custom pattern that looks for 5-digit numbers, anything already recognized as a ZIP code is never evaluated against your pattern.
When parsing, DataRight IQ looks through each record for different types of
data. For each type of data, DataRight IQ makes a separate “pass” through the
data. If DataRight IQ finds something on one pass, it extracts that data—and on
the next pass examines only the data that remains.
When an item is recognized, it doesn’t go to the next step.
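The pass-by-pass behavior can be sketched in Python. The regular expressions below are simplified assumptions, not the patterns DataRight IQ uses, and only a few of the nine passes are modeled. What the sketch does show is that each pass removes what it recognizes, so later passes never see it; here the ZIP code is consumed by the lastline pass before a custom 5-digit pattern can claim it.

```python
import re

# Hypothetical, greatly simplified patterns, listed in the documented order.
PASSES = [
    ("Lastline", re.compile(r"\b[A-Z][a-z]+, [A-Z]{2} \d{5}\b")),
    ("E-mail", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("US phone", re.compile(r"\(?\b\d{3}\)?[ -]?\d{3}-\d{4}\b")),
    ("User pattern", re.compile(r"\b\d{5}\b")),  # a custom 5-digit pattern
]


def parse_multiline(text: str) -> tuple:
    """Run each pass in order; return (found items, leftover text)."""
    found = {}
    remaining = text
    for label, pattern in PASSES:
        match = pattern.search(remaining)
        if match:
            found[label] = match.group(0)
            # Remove the recognized item so later passes never see it.
            remaining = remaining[:match.start()] + remaining[match.end():]
    return found, " ".join(remaining.split())
```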
Modify how DataRight IQ parses
DataRight IQ parses better than DataRight, in part because it gives you more control over how it parses.
You can, of course, set the input fields and retrieve the output fields as usual. And
with DataRight IQ you can create custom dictionaries just like you can with
DataRight—by using the User-Modifiable Dictionary (UMD) utility.
But with DataRight IQ you can also create or edit parsing rules in the rule file
(drlrules.dat). The rule file controls how DataRight IQ parses name and firm
data.
Use pre-defined rules
DataRight IQ already provides hundreds of rules for many different possible combinations of data. These rules will likely satisfy the parsing needs of most users. However, you may encounter data that isn’t being parsed the way that you want it to be parsed. Or maybe you want to tweak a rule so that it returns a different confidence score. In situations like these, it is very handy to be able to edit the rule file.
For more information on editing the rules by which DataRight IQ parses, see the
DataRight IQ Modifier’s Guide.
Turn off parsing engines
To help you control how DataRight IQ parses, the program lets you directly control what it parses. For each input line, you can selectively turn off the parsing of addresses, names, firms, Social Security numbers, dates, phone numbers, user-defined patterns, and e-mail addresses.
In your DataRight IQ Job product, see the Multiline Parsing block. In your
DataRight IQ Library product, see the drl_disable_iline_parsers() function (if
using C) or the DisableILineParsers() method (if using C++).
Chapter 4:
How DataRight IQ parses new types of
data
DataRight IQ can identify and isolate various data and data components from
fields of information:
Fields that DataRight IQ parses that DataRight did not    For more information, see
e-mail addresses                                        page 32
U.S. Social Security numbers                            page 34
dates                                                   page 37
phone numbers                                           page 38
user-defined patterns                                   page 40
For a complete list of the input fields that DataRight IQ accepts, see Business Objects’ Quick Reference for Views and Job-File Products.
Parse e-mail addresses
When DataRight IQ parses input data that it determines is an e-mail address, it
places the components of that data into specific fields for output. Below is an
example of a simple e-mail address:
sales@firstlogic.com
By identifying the various data components (user name, host, and so on) by their
relationships to each other, DataRight IQ then assigns the data to specific fields.
Fields used
DataRight IQ outputs the individual components of a parsed e-mail address: the e-mail user name, complete domain name, top domain, second domain, third domain, fourth domain, fifth domain, and host name. DataRight IQ uses dedicated e-mail fields for inputting and outputting this information.
What DataRight IQ does
With DataRight IQ, you can do the following things with an e-mail address:
Parse the e-mail address, either in a field by itself or combined in a field with
other data.
Break the domain name down into sub-elements.
Verify that an e-mail address is properly formatted.
Flag the address for special handling (see “Flag addresses” on page 32).
Not verified
Several aspects of an e-mail address are not verified by DataRight IQ. DataRight IQ does not verify:
whether the domain name (the portion to the right of the @ sign) is
registered.
whether an e-mail server is active at that address.
whether the user name (the portion to the left of the @ sign) is registered on
that e-mail server (if any).
whether the personal name in the record can be reached at this e-mail
address.
Flag addresses
You can flag e-mail addresses based on a list of criteria you create or maintain. For example, if you focus on B2B (business to business), you might want to flag consumer-oriented domain names such as hotmail, yahoo, or aol.com.
You flag addresses by matching them against a list of hosts and domain names in a file named drlemail.dat. You can post a flag (to the field EmailISP1-6) that indicates whether the address was found in the list. You can then use this flag to separate records, either in the current job or in a future process.
E-mail components
The AP field where DataRight IQ places the data depends on the position of the data in the record. DataRight IQ follows the Domain Name System (DNS) in determining the correct output field.
When DataRight IQ parses the following data:
expat@london.home.office.city.co.uk
it assigns these components to the following fields according to DNS:
Sample data                      Field           Field description
expat                            EmailUser1      The user name, or “addressee”: the person or department for whom the e-mail is intended.
london.home.office.city.co.uk    EmailAllD1      The “all D” is the entire domain.
uk                               EmailTopD1      The top-level domain. In DNS, the highest level of the hierarchy after the root (the last “dot”). In a domain name, the portion that appears farthest to the right, often “com,” “org,” “gov,” and so on.
home.office.city.co              EmailTopD2-5    The elements between the host and the top-level domain.
london                           Host            The element immediately to the right of the “at” symbol (@).
For example, with the input data, expat@london.home.office.city.co.uk,
DataRight IQ outputs each element in the following fields:
= expat@london.home.office.city.co.uk
= expat
= london.home.office.city.co.uk
= uk
= co
= city
= office
= home
= london
= f
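The DNS-based assignment can be sketched in Python. The dictionary keys are descriptive names chosen for this sketch rather than DataRight IQ field names; comments note the corresponding field from the table.

```python
def parse_email(address: str) -> dict:
    """Split an e-mail address into components, reading the domain in DNS order."""
    user, at, domain = address.partition("@")
    if not at or not user or not domain:
        raise ValueError(f"not an e-mail address: {address!r}")
    host, *others = domain.split(".")
    return {
        "user": user,          # EmailUser: left of the @ sign
        "all_domain": domain,  # EmailAllD: the entire domain
        # DNS is read right to left, so levels[0] is the top-level domain
        # (EmailTopD1), levels[1] the next level down, and so on.
        "levels": others[::-1],
        "host": host,          # the element immediately right of the @ sign
    }
```

For expat@london.home.office.city.co.uk, levels comes out as uk, co, city, office, home, with london as the host, which is the same order as the field list above.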
Parse Social Security numbers
DataRight IQ parses U.S. Social Security numbers (SSNs) that appear either by themselves or surrounded by other text on an input line.
Fields used
For inputting and outputting U.S. Social Security number information, DataRight IQ uses the following fields:
Input fields                      Output fields
PW.SSN1-6 and multiline fields    AP.SSN1-6
The six available PW.SSN fields (PW.SSN1-6) store outputted Social Security number data.
Example of data: 123-45-6789
Typical field length: 9-11 characters
How this differs from DataRight
DataRight (as opposed to DataRight IQ) had one PW.SSN field that could contain a Social Security number. DataRight did not perform any processing on the data in this field.
You could use this field to overcome field-naming differences among input files. For example, if one input file contained a field named SS_Number and another input file contained a field named Soc_Sec_No, you could define both fields as PW.SSN. This gave you a common Social Security number field (PW.SSN) to use in filters and output posting.
Setting up SSN parse in DataRight IQ
DataRight IQ has two SSN options to set up for parsing:
Specify the location of the file that contains SSN information
Select the type of delimiter to use
SSN information file
Specify the location of the SSN information file in the Auxiliary Files block in DataRight IQ.
SSN delimiter
In the Standardization/Assignment Control block, you can determine which delimiter DataRight IQ uses when it outputs the SSN by setting the SSN Delimiter option. In Job, a note at the end of the block lists the delimiters to choose from. In Views, choose from the options in a drop-down list.
How DataRight IQ handles Social Security numbers
DataRight IQ handles Social Security numbers in two steps:
1. Identifies a potential SSN by looking for any of three patterns:
Pattern        Digits per grouping                          Delimited by
nnnnnnnnn      9 consecutive digits                         n.a.
nnn nn nnnn    3, 2, and 4 (for area, group, and serial)    spaces
nnn-nn-nnnn    3, 2, and 4 (for area, group, and serial)    all supported delimiters
2. Performs a validity check on the first five digits only. Two outcomes of this validity check are possible:
Outcome    Description
Pass       DataRight IQ successfully parses the data, and the Social Security number comes out in an AP field.
Fail       DataRight IQ does not parse the data because it’s not a valid SSN as defined by the U.S. government, so the data comes out as Extra, unparsed data.
Check validity
When performing a validity check, DataRight IQ doesn’t verify that a particular 9-digit Social Security number has been issued, or that it’s the correct number for any named person. Instead, it validates only the first 5 digits (area and group). DataRight IQ doesn’t validate the last 4 digits (serial), except to confirm that they are digits.
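The two-step handling can be sketched in Python. This is an illustration under stated assumptions: the HIGH_GROUP values are made up, the accepted delimiter set is a guess, and the group-ordering rule is the issuance order the SSA historically published rather than anything read from drlssn.dat.

```python
import re
from typing import Optional

# Hypothetical excerpt of an SSA "high group" table: for each area number,
# the highest group issued so far. DataRight IQ reads its real data from
# drlssn.dat, whose format is not shown in this guide.
HIGH_GROUP = {"421": "99", "507": "07"}

# Step 1: nine digits, or 3-2-4 digits split by one repeated delimiter.
# The delimiter set "-./ " is an assumption for this sketch.
SSN_PATTERN = re.compile(r"\b(\d{3})([-./ ]?)(\d{2})\2(\d{4})\b")

# Historical SSA issuance order for groups within an area:
# odd 01-09, even 10-98, even 02-08, then odd 11-99.
GROUP_ORDER = (
    [f"{g:02d}" for g in range(1, 10, 2)]
    + [f"{g:02d}" for g in range(10, 99, 2)]
    + [f"{g:02d}" for g in range(2, 9, 2)]
    + [f"{g:02d}" for g in range(11, 100, 2)]
)


def area_group_is_valid(area: str, group: str) -> bool:
    """Step 2: validate only the first five digits (area and group)."""
    high = HIGH_GROUP.get(area)
    if high is None or group not in GROUP_ORDER:
        return False  # unknown area (e.g. an ITIN starting with 9) or group 00
    return GROUP_ORDER.index(group) <= GROUP_ORDER.index(high)


def find_ssn(text: str, delimiter: str = "-") -> Optional[str]:
    """Return the first valid SSN in text, rewritten with the chosen delimiter."""
    for match in SSN_PATTERN.finditer(text):
        area, _, group, serial = match.groups()
        # The serial is checked only for being four digits (by the pattern).
        if area_group_is_valid(area, group):
            return delimiter.join((area, group, serial))
    return None  # apparent SSNs that fail validation are not output
```

An EIN written as nn-nnnnnnn never fits the 3-2-4 shape, so the sketch, like the guide’s description, never treats it as an SSN.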
SSA data
DataRight IQ’s validation of the first 5 digits is driven by a table from the Social Security Administration (SSA). That table is updated monthly as the SSA opens new groups. The rules and data that guide this check are available at http://www.ssa.gov/history/ssn/geocard.html.
Update your SSN file
Business Objects provides the Social Security Number (SSN) file (drlssn.dat) for DataRight IQ customers interested in parsing recently issued and existing U.S. Social Security numbers. The SSN file is updated monthly with the latest SSN information from the U.S. government. Business Objects converts the data to a format that DataRight IQ can use and posts the data by the 5th of every month.
You can obtain the most current SSN file from the Customer Portal site at http://download.firstlogic.com. From this area you can download the latest drlssn.dat file used to parse U.S. Social Security numbers within DataRight IQ.
Outputs valid SSNs
DataRight IQ outputs only Social Security numbers that pass its validation. If an apparent SSN fails validation, DataRight IQ does not pass the number on as a parsed, but invalid, Social Security number.
Other U.S. ID numbers
Your data may include other numbers used in the United States for governmental identification purposes. DataRight IQ’s capability is aimed at U.S. Social Security numbers, which are, in effect, Tax IDs for individuals. Other such numbers include the ITIN and the EIN.
Name    Description
ITIN    Individual Taxpayer Identification Number
        This “cousin” to the Social Security number is what the IRS assigns to people who earn money and pay federal income taxes but who are not citizens (they are resident or non-resident aliens). An ITIN looks like an SSN except that it begins with the number 9.
        DataRight IQ treats an ITIN as an invalid SSN. An ITIN might match the pattern but not make it through the check against the SSN table, so it comes out as unparsed Extra.
EIN     Employer Identification Number
        Synonymous with a corporate Tax Identification Number (TIN) or Tax ID, this number is also 9 digits. However, its pattern is nn-nnnnnnn. Because of that, the EIN is not recognized by DataRight IQ’s SSN parser.
Use UDPM to parse other patterns. If you need to parse patterns that aren’t covered by one of DataRight IQ’s usual parsing engines, use DataRight IQ’s UDPM (user-defined pattern matching) feature.
Parse dates
DataRight IQ recognizes dates in a variety of formats and breaks those dates into
components.
Fields used
For inputting and outputting date information, DataRight IQ uses the following fields:
Input fields                       Output fields
PW.Date1-6 and multiline fields    AP.Date1-6
Formats and delimiters
DataRight IQ supports the following formats and delimiters. That is, you can select any one of these formats to standardize dates.
Format         Example
yyyy*mm*dd     2004 01 27
yy*mm*dd       04 01 27
dd*mm*yyyy     27 01 2004
dd*mm*yy       27 01 04
mm*dd*yyyy     01 27 2004
mm*dd*yy       01 27 04
dd*mmm*yy      27 Jan 04
dd*mmm*yyyy    27 Jan 2004
mmm*dd*yyyy    Jan 27 2004
mmm*dd*yy      Jan 27 04
yyyymmdd       20040127
yymmdd         040127
ddmmyyyy       27012004
ddmmyy         270104
mmddyyyy       01272004
mmddyy         012704
Delimiter*    Description
<none>        no space
<space>       a space
/             forward slash
-             dash
\             backward slash
.             period
* Delimiters appear between date components only in formats that have delimiters (the * positions in the formats above). You also have the option of using no delimiter (<none>).
DataRight IQ can parse up to six dates from your defined record. That is,
DataRight IQ identifies one or more dates (up to six) in the input, breaks found
dates into components, and makes dates available as output in either the original
format or a user-selected standard format.
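Standardizing to one of the formats above can be sketched in Python. This sketch covers only the output side: it assumes the date has already been recognized and broken into year, month, and day, and the function name is hypothetical.

```python
import datetime
import re

MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_date(d: datetime.date, fmt: str, delimiter: str = " ") -> str:
    """Render d with a format from the table, e.g. 'dd*mmm*yyyy'.

    Each '*' in fmt is replaced by the chosen delimiter ('' for <none>).
    """
    tokens = {
        "yyyy": f"{d.year:04d}",
        "yy": f"{d.year % 100:02d}",
        "mmm": MONTH_ABBR[d.month - 1],
        "mm": f"{d.month:02d}",
        "dd": f"{d.day:02d}",
    }
    # Longer tokens are listed first so 'yyyy' is not read as two 'yy' tokens.
    pieces = re.findall(r"yyyy|yy|mmm|mm|dd|\*", fmt)
    if "".join(pieces) != fmt:
        raise ValueError(f"unsupported format: {fmt!r}")
    return "".join(delimiter if p == "*" else tokens[p] for p in pieces)
```

With date(2004, 1, 27), the format dd*mmm*yyyy and a space delimiter yield 27 Jan 2004, the same value shown in the table.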
Parse phone numbers
DataRight IQ can parse both North American (U.S. and Canada) and
international phone numbers. When DataRight IQ parses a phone number, it
outputs the individual components of the number into the appropriate AP fields
(see the examples below).
Fields used
For inputting and outputting phone number information, DataRight IQ uses dedicated phone fields.
U.S. versus international phone numbers
Phone numbering systems differ around the world. DataRight IQ recognizes phone numbers by their pattern and, for non-U.S. numbers, by their country code too.
U.S. and Canada
What DataRight IQ calls U.S. phone numbers would more properly be called North American phone numbers. The Canadian phone number standard follows the same pattern as U.S. phone numbers. Because of this, when DataRight IQ parses a phone number from either the U.S. or Canada, it posts the data to AP.USPhoneX.
DataRight IQ searches for U.S. phone numbers by commonly used patterns such
as: (234) 567-8901, 234-567-8901, and 2345678901.
DataRight IQ gives you the option for some reformatting on output (such as your
choice of delimiters). Below is an example with extension text:
Input data:(901) 234-5678 EXT 1234
Output data:901-234-5678 Ext. 1234
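North American number recognition and reformatting can be sketched in Python. The regular expression is an assumption written to cover the input styles shown in this section; it is not DataRight IQ’s pattern set, and the function names are hypothetical.

```python
import re
from typing import Optional

# Assumed pattern covering the input styles shown in this section.
US_PHONE = re.compile(
    r"^(?:(?P<type>[A-Za-z]+)\s+)?"           # optional line type, e.g. "Work"
    r"(?:\(?(?P<area>\d{3})\)?[-. ]*)?"       # optional area code
    r"(?P<prefix>\d{3})[-. ]?"                # prefix
    r"(?P<line>\d{4})"                        # line number
    r"(?:\s*(?:ext\.?|x)\s*(?P<ext>\d+))?$",  # optional extension
    re.IGNORECASE,
)


def parse_us_phone(text: str) -> Optional[dict]:
    """Return the phone components, or None if the text doesn't match."""
    match = US_PHONE.match(text.strip())
    return match.groupdict() if match else None


def format_us_phone(text: str, delimiter: str = "-") -> Optional[str]:
    """Reformat a North American number with the chosen delimiter."""
    parts = parse_us_phone(text)
    if parts is None:
        return None
    digits = [p for p in (parts["area"], parts["prefix"], parts["line"]) if p]
    result = delimiter.join(digits)
    if parts["ext"]:
        result += f" Ext. {parts['ext']}"
    return result
```

format_us_phone("(901) 234-5678 EXT 1234") returns 901-234-5678 Ext. 1234, matching the example above.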
Europe and Pacific Rim
DataRight IQ searches for European and Pacific Rim numbers by pattern. The patterns used are stored in drlphint.dat, and they require that the country code appear at the beginning of the number. DataRight IQ doesn’t offer any options for reformatting international phone numbers. Also, DataRight IQ doesn’t cross-compare the phone number to the address to see whether the country and city codes in the phone number match the address.
Phone number components
Phone numbers consist of different output components depending on whether they’re U.S. or international numbers.
Individual components for:
U.S. phone numbers    Non-U.S. phone numbers
area code             country code
prefix                city code
line number           number
extension             description
line type
Example of U.S. phone number
Say you have the following U.S. phone data for input:
Work (308)-555-8402 ext 34
DataRight IQ parses the data into the following AP fields:
AP field         Output value
AP.USAreaCod1    308
AP.USPhonPre1    555
AP.USPhonLin1    8402
AP.USPhonExt1    34
AP.USPhonTyp1    Work
Some of these fields (namely, area code, extension, and type) are optional. If your input doesn’t have appropriate values for these fields, DataRight IQ leaves them empty.
Example of non-U.S. phone number
Say you have the following international (non-U.S.) phone data for input:
61-9-0123-4567
DataRight IQ parses the data into the following AP fields (all data must be present for the phone data to be valid):
AP field         Output value    Note
AP.IntCtryCd1    61
AP.IntCityCd1    9
AP.IntPhNum1     0123-4567
AP.IntPhDesc1    Australia       Populated based on the Country ID.
DataRight IQ accepts international phone numbers only as they would be dialed from the U.S. For example, the number must start with the appropriate country code.
Also, if presented on a line with other data, the international phone number must start the line.
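A Python sketch of the two international rules above: the country code must come first, and the number must start the line. The country table and the regular expression are hypothetical stand-ins for the patterns in drlphint.dat; the output keys borrow the AP field names from the example table.

```python
import re
from typing import Optional

# Hypothetical country table; DataRight IQ's real patterns live in drlphint.dat.
COUNTRIES = {"61": "Australia", "44": "United Kingdom", "49": "Germany"}

# Assumed shape: country code, city code, then the local number.
INTL_PHONE = re.compile(
    r"(?P<country>\d{1,3})[-. ](?P<city>\d{1,4})[-. ]"
    r"(?P<number>\d{2,4}[-. ]?\d{2,4})\b"
)


def parse_intl_phone(line: str) -> Optional[dict]:
    # match() anchors at the start of the line, as the guide requires.
    m = INTL_PHONE.match(line)
    if not m or m["country"] not in COUNTRIES:
        return None
    return {
        "IntCtryCd": m["country"],
        "IntCityCd": m["city"],
        "IntPhNum": m["number"],
        "IntPhDesc": COUNTRIES[m["country"]],  # populated from the country code
    }
```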
Parse user-defined patterns
Parse any number or alphanumeric
With DataRight IQ you can parse data that’s outside the range of name, title, address, and so on. With DataRight IQ’s user-defined pattern matching (UDPM) feature, you can parse a wide variety of other data. In other words, DataRight IQ can parse any kind of number or alphanumeric for which you can define a pattern.
Fields used
For inputting and outputting user-defined pattern information, DataRight IQ uses the following fields:
Input fields                          Output fields        Description of output field
PW.Pattern1-4 and multiline fields    AP.Pattern1-4        The pattern
                                      AP.PatnLabel1-4      The label for the pattern
                                      AP.Patnsub1-4_1-5    The subpattern(s) of the pattern
The pattern label is created in the drludpm.dat file when the pattern is defined.
How it’s done
DataRight IQ parses patterns through its user-defined pattern matching (UDPM) feature, which uses regular expressions. That is, you can set up data patterns to suit your data (such as part numbers), and DataRight IQ parses your data according to those user-defined patterns.
DataRight IQ’s UDPM feature makes possible the parsing and extraction of
virtually any kind of data that conforms to a pattern—any type of data pattern that
can be expressed using regular expressions.
Define your pattern
When you create a user-defined pattern, you must end the line with a carriage return/linefeed. All characters before the carriage return/linefeed, even blank spaces, are considered part of the pattern.
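How a label, a regular expression, and its subpatterns fit together can be sketched in Python. The LABEL-tab-regex line layout is an assumption made for this sketch; see the DataRight IQ Modifier’s Guide for the actual drludpm.dat syntax.

```python
import re
from typing import Optional

# Hypothetical file contents: one "LABEL<TAB>regex" definition per line,
# each ending in CR/LF. This layout is an assumption, not drludpm.dat syntax.
UDPM_FILE = "PARTNO\t" + r"([A-Z]{2})-(\d{4})-([A-Z])" + "\r\n"


def load_patterns(text: str) -> list:
    """Return (label, compiled regex) pairs, one per CR/LF-terminated line."""
    patterns = []
    for line in text.split("\r\n"):
        if not line:
            continue
        label, _, regex = line.partition("\t")
        # Everything before the CR/LF is kept, so trailing spaces stay in the pattern.
        patterns.append((label, re.compile(regex)))
    return patterns


def match_patterns(text: str, patterns: list) -> Optional[dict]:
    """Return the first pattern hit with its label and subpatterns."""
    for label, regex in patterns:
        m = regex.search(text)
        if m:
            return {
                "pattern": m.group(0),
                "label": label,
                # At most five subpatterns, mirroring AP.Patnsub1-4_1-5.
                "subpatterns": list(m.groups())[:5],
            }
    return None
```

Because trailing spaces are kept, a definition whose line ends in a space matches only text that contains that space, which is why the guide warns that every character before the CR/LF counts.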
For more information
For more information on UDPM and setting up the patterns that DataRight IQ will parse, see the DataRight IQ Modifier’s Guide, which accompanies the DataRight IQ product. This ability is for advanced users of DataRight IQ. You should read and follow all warnings before changing how your product works.
Chapter 5:
DataRight IQ’s additional features
In addition to improvements in parsing, DataRight IQ has several other new
features to enhance your control over file parsing.
Control name order. DataRight IQ gives you more control over how name order is applied to parsed names.
Generate statistics files. DataRight IQ generates statistics files that you can use to create custom reports.
Use confidence scores. DataRight IQ assigns calculated numeric scores to parsed names and firms to indicate how confident it is that the data was parsed correctly.
Control name order
DataRight IQ lets you determine name order—FML (First Middle Last) or LFM
(Last First Middle)—just like DataRight did. However, instead of only two name
orders (FML and LFM), DataRight IQ provides you with more control over how
those orders are followed.
“Strict” or “Suggest”
Now you can decide how name order is applied to parsed names. DataRight IQ determines name order according to how you set the following parameter(s):
You set the name order (FML or LFM) in the definition (DEF) file.
Turn on strict name order
Library
To make name order in DataRight IQ work as you may be used to with DataRight, use one of the two *_STRICT values. When you choose SUGGEST, DataRight IQ uses the suggested order only when the input is ambiguous; in that case, it looks to the name order to determine which name is First and which is Last.
When you use the *_STRICT values, DataRight IQ parses name data in the order you set. This means that if you use DRL_FLM_STRICT, “Tommy Jones” on input parses as First name Tommy and Last name Jones on output. Likewise, if you use DRL_LFM_STRICT, “Tommy Jones” parses as Last name Tommy and First name Jones.
Know your data. Be aware that when you choose the strict values for name parsing, the order you choose is applied to every name. This could produce unexpected results, such as in the “Tommy Jones” example above.
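The difference between strict and suggested order can be sketched in Python for two-word names. The FIRST_NAMES dictionary and the test for ambiguity (both words, or neither, look like first names) are assumptions of this sketch, not DataRight IQ’s logic; the order strings FML and LFM stand in for the DEF-file setting.

```python
# Hypothetical first-name dictionary; here it only decides whether a
# two-word name is ambiguous.
FIRST_NAMES = {"tommy", "dan", "john"}


def assign(a: str, b: str, order: str) -> dict:
    """Apply the configured name order (FML or LFM) to a two-word name."""
    if order == "FML":
        return {"First": a, "Last": b}
    if order == "LFM":
        return {"Last": a, "First": b}
    raise ValueError(f"unknown name order: {order!r}")


def parse_two_word_name(name: str, order: str = "FML", strict: bool = False) -> dict:
    a, b = name.split()
    if strict:
        # *_STRICT: the configured order wins for every record.
        return assign(a, b, order)
    a_is_first = a.lower() in FIRST_NAMES
    b_is_first = b.lower() in FIRST_NAMES
    if a_is_first != b_is_first:
        # Unambiguous: exactly one word looks like a first name.
        return {"First": a, "Last": b} if a_is_first else {"First": b, "Last": a}
    # SUGGEST: the configured order is consulted only when the name is ambiguous.
    return assign(a, b, order)
```

With order LFM and strict=True, “Tommy Jones” comes back as Last name Tommy and First name Jones, the surprise the warning above describes; without strict, the same input keeps Tommy as the first name.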
Job
If you use DataRight IQ Job, you enable strict name order in the Input File block by changing the value for the Strict Name Order parameter to Y (yes).
Strict Name Order (Y/N).......... = N
The only valid values for this parameter are Y and N (yes and no).
Value    Description
Yes      DataRight IQ uses the name order set in your DEF file for every record.
No       DataRight IQ uses the name order set in your DEF file only when the name is ambiguous.
Views
Set up strict name order the same way that you set it up for Job: open the Input File window and select Strict Name Order.
Generate statistics files
You can have DataRight IQ generate statistics files. Statistics files are text files
containing all the information from a specific report. You can then use these files
when you create custom reports.
DataRight IQ can create the following statistics files:
Statistics file           Based on this report    Description of statistics file
Job Statistics file       Job Summary             This statistics file contains a single record. The statistics in this file represent the significant aspects of your DataRight IQ job.
Exec Statistics file      Executive Summary       This statistics file contains everything that you can find in the Executive Summary report.
Output Statistics file    Output File             This statistics file includes one record per list per output file. Totals for the lists are not provided; to determine totals, add up the appropriate fields, based on the type of output file.
In the same way that DataRight IQ creates reports, DataRight IQ creates each statistics file during the process that the file describes.
For example, to create the Output Statistics file, your job must be set up to create an output file and an Output File report. So, in addition to the Output Statistics File block, your job must include a Report: Output File block and a Create File for Output block.
Enable statistics file generation
Activate statistics file generation in the Execution block by typing Y for the Create Report Statistics Files parameter in Job, or by selecting the Create Reports Statistics File(s) option in Views.
Yes and No (or selected/deselected) are the only options for this parameter.
Value              Description
Yes (selected)     If the parameter is set to Y (selected in Views), DataRight IQ verifies the Statistics Files block parameters for a valid file type and path. Valid file types are ASCII, delimited, or dBASE3.
No (deselected)    If the parameter is set to N (deselected in Views), DataRight IQ ignores the Statistics File block and doesn’t create any statistics files during processing.
After you have enabled statistics files, you need to set them up.
Setting options for statistics files
Job
You can set options and file paths for your statistics files through the Statistics Files block. In this block, you specify the file names, locations, and types. To save time, use the $job macro wherever a file name is needed.
Note: The options in the block come in pairs. At the first parameter, type a
full path for the statistics file. On the second line type the file type that you
want (ASCII, Delimited, or dBASE3).
BEGIN Statistics File =========================================
Job Stats Name (path & file name).... = $jobj.dsj
File Type (ASCII/DBASE3/DELIMITED)... = ASCII
Exec Stats Name (path & file name) .. = $jobe.dse
File Type (ASCII/DBASE3/DELIMITED)... = ASCII
Output Stats Name (path & file name). = $jobo.dso
File Type (ASCII/DBASE3/DELIMITED)... = ASCII
END
For more information
For complete information about statistics files, see the Reports chapter in your DataRight IQ User’s Guide.
Use confidence scores
DataRight IQ assigns a confidence score to each parse.
A confidence score is a number from 0 to 100 that quantifies the confidence that a piece of data was correctly parsed. In DataRight IQ, there is a confidence score for the items parsed by a rule, as well as for the subcomponents that make up the pieces of that rule. You can use these scores to analyze your parsing results.
Discrete fields are parsed at 100 percent confidence. For non-discrete fields, you can specify the breakpoints for confidence ranges.
For reports only. Confidence scores are used for reporting purposes only.
How you set the confidence scores has no effect on the way your data is
parsed. If you change the High Confidence setting to a lower percentage, your
report will show more data as parsed with high confidence, but that doesn’t
change how DataRight IQ parsed that data.
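To make the "reports only" point concrete, here is a minimal sketch that places an already-computed score into a High, Medium, or Low report range. The scores never change; only the label does. The defaults 31 and 21 are the High Confidence and Medium Confidence values from the job-file Report Defaults example in this chapter. Treating a score equal to a threshold as belonging to that range, and calling everything below Medium "Low", are assumptions; `report_bucket` is a hypothetical name.

```python
def report_bucket(score, high=31, medium=21):
    """Return the report range for a 0-100 confidence score.

    Assumption: a score equal to a threshold falls in that range.
    Changing the thresholds changes only the label, never the score.
    """
    if not 0 <= score <= 100:
        raise ValueError("confidence scores range from 0 to 100")
    if score >= high:
        return "High"
    if score >= medium:
        return "Medium"
    return "Low"


scores = [100, 45, 28, 10]
print([report_bucket(s) for s in scores])           # default thresholds
print([report_bucket(s, high=25) for s in scores])  # lower High Confidence
```

Lowering the High threshold to 25 moves the score of 28 from Medium to High in the report, while the parse that produced it stays exactly the same.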
Changing scores using a confidence booster
DataRight IQ chooses the rule with the highest confidence score. You cannot
change the initial confidence score that DataRight IQ determines. However, if you
want DataRight IQ to use a different rule, you can add a “confidence booster” to
that rule so that it returns a higher confidence. DataRight IQ then chooses that
rule for the parse.
Refer to the Modifier’s Guide for more details about confidence boosters and
changing rule files.
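The selection logic described above can be sketched as picking the candidate with the highest score, where a booster raises one rule's score. This is an illustration only: it assumes a booster simply adds points to a rule's score and that scores are capped at 100. The rule names and `choose_rule` are hypothetical; the Modifier’s Guide describes how boosters are actually written in rule files.

```python
def choose_rule(candidates, boosters=None):
    """Pick the candidate rule with the highest (boosted) confidence score.

    candidates: dict of rule name -> confidence score (0-100).
    boosters: dict of rule name -> points added to that rule's score.
    Assumption: a booster is additive and the result is capped at 100.
    """
    boosters = boosters or {}

    def boosted(rule):
        return min(100, candidates[rule] + boosters.get(rule, 0))

    return max(candidates, key=boosted)


parses = {"NAME_FIRST_LAST": 72, "NAME_LAST_FIRST": 65}
print(choose_rule(parses))                           # NAME_FIRST_LAST
print(choose_rule(parses, {"NAME_LAST_FIRST": 10}))  # NAME_LAST_FIRST
```

The initial scores in `parses` are never modified; the booster only changes which rule wins the comparison.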
Set confidence scores
Job
In DataRight IQ’s job-file implementation, the Report Defaults block has the
following parameters that control confidence scores for non-discrete fields:
High Confidence (2 to 100)........... = 31
Medium Confidence (1 to 99).......... = 21
Name High Confidence (2 to 100)...... = 31
Name Medium Confidence (1 to 99)..... = 21
Firm High Confidence (2 to 100)...... = 66
Firm Medium Confidence (1 to 99)..... = 21
The High Confidence and Medium Confidence parameters are “whole record”
confidence scores that correspond to similar parameters in DataRight 2.5x.
The Name and Firm confidence parameters are new in DataRight IQ.
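Each parameter above states its own allowed range in parentheses, so a migrated job file can be checked mechanically. The sketch below reads those lines, verifies each value against its stated range, and flags any Medium setting that is not below its matching High setting. That last check is an assumption inferred from the ranges (High starts at 2, Medium ends at 99), not a documented rule; `read_confidence_params` and `misordered_pairs` are hypothetical helpers.

```python
import re

# Matches lines such as "High Confidence (2 to 100)........... = 31".
PARAM = re.compile(
    r"^(?P<name>.+?) \((?P<lo>\d+) to (?P<hi>\d+)\)\.*\s*=\s*(?P<value>\d+)$"
)

SAMPLE = """\
High Confidence (2 to 100)........... = 31
Medium Confidence (1 to 99).......... = 21
Name High Confidence (2 to 100)...... = 31
Name Medium Confidence (1 to 99)..... = 21
Firm High Confidence (2 to 100)...... = 66
Firm Medium Confidence (1 to 99)..... = 21"""


def read_confidence_params(text):
    """Return {parameter name: value}, checking each value against its range."""
    params = {}
    for line in text.splitlines():
        m = PARAM.match(line.strip())
        if not m:
            continue  # not a confidence parameter line
        lo, hi, value = int(m["lo"]), int(m["hi"]), int(m["value"])
        if not lo <= value <= hi:
            raise ValueError(f"{m['name']} = {value} is outside {lo} to {hi}")
        params[m["name"]] = value
    return params


def misordered_pairs(params):
    """List Medium settings that are not below their matching High setting.

    Assumption: each Medium threshold is meant to be lower than its High one.
    """
    problems = []
    for prefix in ("", "Name ", "Firm "):
        high = params.get(prefix + "High Confidence")
        medium = params.get(prefix + "Medium Confidence")
        if high is not None and medium is not None and medium >= high:
            problems.append(prefix + "Medium Confidence")
    return problems


params = read_confidence_params(SAMPLE)
print(params["Firm High Confidence"], misordered_pairs(params))  # 66 []
```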
Views
In Views, you control the confidence scores for non-discrete fields in the Reports
Defaults block.
For more information
See your DataRight IQ User’s Guide. For an overview of confidence scores and
details about them, see the chapter “Data-quality scores and codes.”
Chapter 6: Fields DataRight IQ uses
DataRight and DataRight IQ don’t use the same fields. Well, not all the same
fields. DataRight IQ has a number of fields that you won’t find in DataRight; its
increased parsing abilities require them. Conversely, DataRight IQ has no need for,
and so doesn’t use, some fields that DataRight used.
Avoid these input fields
If you’re a job-file user, you should not use certain input fields that you may be
familiar with from DataRight 2.56.
Avoid            Use instead
PW.Name_line     PW.Name_line1-6
PW.Phone         PW.Phone1-6
PW.SSN           PW.SSN1-6
If your definition file contains a field with any of these names, you will receive
an invalid field error message.
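When migrating a definition file, a quick check like the one below can flag the avoided names before DataRight IQ reports an invalid field error. It assumes you have already pulled the field names out of the definition file (that step is not shown) and compares names case-insensitively, which is also an assumption. `find_avoided_fields` is a hypothetical helper; the replacements come from the table above.

```python
# Avoided fields (upper-cased for comparison) and their DataRight IQ replacements.
REPLACEMENTS = {
    "PW.NAME_LINE": "PW.Name_line1-6",
    "PW.PHONE": "PW.Phone1-6",
    "PW.SSN": "PW.SSN1-6",
}


def find_avoided_fields(field_names):
    """Return (field, suggested replacement) for each avoided field name.

    Assumption: field names are compared case-insensitively.
    """
    hits = []
    for name in field_names:
        suggestion = REPLACEMENTS.get(name.strip().upper())
        if suggestion:
            hits.append((name.strip(), suggestion))
    return hits


fields = ["PW.Name_line1", "PW.Phone", "PW.SSN2", "PW.SSN"]
print(find_avoided_fields(fields))
# [('PW.Phone', 'PW.Phone1-6'), ('PW.SSN', 'PW.SSN1-6')]
```

Numbered fields such as `PW.Name_line1` and `PW.SSN2` pass the check, because only the unnumbered names are rejected.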
Comparing fields in DataRight IQ and DataRight
Input fields
You can compare PostWare (PW) fields in DataRight IQ and DataRight.
DataRight IQ 7.10    DataRight 2.56    How DataRight IQ differs
DataRight IQ’s other application fields are very similar to DataRight’s. The
following tables contrast DataRight IQ’s APU and APC fields with DataRight
2.56’s.
APU fields
DataRight IQ 7.10    DataRight 2.56    How DataRight IQ differs
Here you can make a side-by-side comparison of the entire master job files for
DataRight 2.56 and DataRight IQ 7.10c revision 2.
On the following pages, the job file for DataRight 2.56 appears in the left column
with the corresponding blocks of DataRight IQ’s job file alongside in the right
column.
Change bars
The following pages use change bars to show what’s different between DataRight
and DataRight IQ. Change bars to the right of the DataRight column show which
DataRight blocks and parameters don’t exist in DataRight IQ. Similarly, change
bars next to the DataRight IQ column show which DataRight IQ blocks don’t exist
in DataRight.
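If you want to reproduce the block-level part of this comparison for your own job files, a sketch like the following collects the names from each `BEGIN <name> ====` line and reports the blocks found in only one file. It compares block names only, not individual parameters, and skips lines commented out with a leading `*`. The two fragments in the example are made up for illustration and are not the real master job files; `block_names` and `changed_blocks` are hypothetical helpers.

```python
import re

# A block starts with "BEGIN <name>" followed by a run of "=" characters.
# Lines commented out with a leading "*" do not match this pattern.
BLOCK = re.compile(r"^\s*BEGIN\s+(.+?)\s*=*\s*$")


def block_names(job_text):
    """Collect the names of all blocks defined in a job file."""
    names = set()
    for line in job_text.splitlines():
        m = BLOCK.match(line)
        if m:
            names.add(m.group(1))
    return names


def changed_blocks(old_text, new_text):
    """Return (blocks only in old, blocks only in new), like change bars."""
    old, new = block_names(old_text), block_names(new_text)
    return sorted(old - new), sorted(new - old)


# Made-up fragments for illustration; these are not the real master job files.
OLD = "BEGIN Execution ====\nEND\nBEGIN Example Old Block ====\nEND\n"
NEW = "BEGIN Execution ====\nEND\nBEGIN Statistics File ====\nEND\n"
print(changed_blocks(OLD, NEW))  # (['Example Old Block'], ['Statistics File'])
```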
Appendix A:
DataRight 2.56
DataRight IQ 7.10
* MASTER JOB FILE FOR DataRight
BEGIN General DataRight 2.56c ==========================
Job Description (to 80 chars)........ =
Job Owner (to 20 chars).............. =
END
BEGIN Execution ========================================
Parsing Mode (NONE/PARSE)............ = Parse
Post to Input File(s) (Y/N).......... = Y
Post to Output File(s) (Y/N)......... = Y
Create Reports (Y/N)................. = Y
Warn Before File Overwrite (Y/N)..... = Y
Show Detailed Process messages (Y/N). = Y
Message Update Increment............. = 1000
Work File Directory.................. =
Create Backup File(s) (Y/N).......... = N
Backup Directory (path).............. =
Cache Buffer Size (SPEED/SPACE)...... = Speed
END
BEGIN Auxiliary Files ==================================
*BEGIN Create Scan/Split Function =======================
Function Name (to 10 chars).......... =
Internal Table Name (to 20 chars).... =
External Table (path & file name).... =
Table Priority (INTERNAL/EXTERNAL)... = INTERNAL
Scan Method (WORD/STR)............... = WORD
Split Method (BEFORE/AFTER/3PART).... = BEFORE
END
BEGIN Date Conversion =================================
Function Name (to 10 chars).......... =
Input Date Format (See NOTE)......... =
Output Date Format (See NOTE)........ =
Output Field Type (DATE/CHARACTER)... = DATE
Use 20xx for Years 00 to ?? (00-99).. = 50
END
* NOTE:
* For Date Format, specify any combination of the ...
* DD for day
* MM or MMM for month (MM is numerals, MMM is alpha)
* YY or YYYY for year
* Punctuation and/or spaces can be used for delimiters.
BEGIN Create File for Output ===========================
Output File (location & file name)... =
File Type (See NOTE)................. = ASCII
Create DEF file (Y/N)................ = N
Rec Format to Clone (path & file name)=
Field (name,length,type[,misc])...... =
END
* NOTE: The following are valid File Types:
* DBASE3
* ASCII
* DELIMITED
* EBCDIC
* RMS (VMS only)
* RMS_FIXED (VMS only)
BEGIN Post to Output File =============================