Identify and parse name, title, firm data,
phone numbers, Social Security numbers,
dates, e-mail addresses, and user-defined patterns
Assign gender and add prenames
Create personalized greetings
Generate match standards
Convert files to a standard format
Search for and replace data
Scan and split data
Generate reports and statistics files
Notices
Published in the United States of America by Firstlogic, Inc., 100 Harborview Plaza,
La Crosse, Wisconsin 54601-4071.
Customer Care
Technical help is free for customers who are current on their ESP. Advisors are
available from 8 a.m. to 6 p.m. Central time, Monday through Friday. When you call,
have at hand the user’s manual and the version number of your Firstlogic product.
Call from a location where you can operate your software while speaking on the
phone. To save time, fax or e-mail your questions, and an advisor will call or e-mail
back with answers prepared. Or visit our Knowledge Base on the Customer Portal
web site to find answers on your own, right away, at any time of the day or night.
Our Customer Care group also manages our customer database and order processing.
Call them for order status, shipment tracking, reporting damaged shipments or flawed
media, changes in contact information, and so on.
Phone: 888-788-9004 in the U.S. and Canada; elsewhere +1-608-788-9000
Fax: 608-788-2870
Web site: http://www.firstlogic.com/customer
E-mail: customer@firstlogic.com
Product literature: 888-215-6442, fax 608-788-1188, or information@firstlogic.com
Corporate receptionist: 608-782-5000, or fax 608-788-1188

What do you think of this guide?
The Firstlogic Technical Publications group strives to bring you the most useful and
accurate publications possible. Please give us your opinion about our documentation
by filling out the brief survey at http://www.firstlogic.com/customer/surveys/default.asp.
We appreciate your feedback! Thank you!

Legal notices
Firstlogic, Inc., or any authorized dealer distributing this product, makes no warranty, expressed or implied, with respect to
this computer software product or with respect to this manual or its contents, its quality, performance, merchantability, or
fitness for any particular purpose or use. It is solely the responsibility of the purchaser to determine its suitability for a
particular purpose or use. Firstlogic, Inc. will in no event be liable for direct, indirect, incidental, or consequential damages
resulting from any defect or omission in this software product, this manual, the program disks, or related items and
processes, including, but not limited to, any interruption of service, loss of business or anticipatory profit, even if
Firstlogic, Inc. has been advised of the possibility of such damages. This statement of limited liability is in lieu of all other
warranties or guarantees, expressed or implied, including warranties of merchantability or fitness for a particular purpose.
Registered trademarks of Firstlogic, Inc. include 1L, 1L (ball design), ACE, ACSpeed, DataJet, DocuRight, eDataQuality,
Entry Planner, Firstlogic, Firstlogic InfoSource, FirstPrep, FirstSolutions, GeoCensus, i·d·Centric, IQ Insight, iSummit,
Label Studio, MailCoder, Match/Consolidate, PostWare, Postalsoft, Postalsoft Address Dictionary, Postalsoft Business
Edition by Firstlogic, Postalsoft DeskTop Mailer, Postalsoft DeskTop PostalCoder, Postalsoft DeskTop Presort, Postalsoft
Manifest Reporter, PrintForm, RapidKey, Total Rewards, and TrueName. Trademarks of Firstlogic, Inc. include
DataRight, IRVE, and TaxIQ. Trademarks of the United States Postal Service include CASS, DPV, eLOT, FASTforward,
link
and ZIP. All other trademarks are the property of their respective owners.
Firstlogic’s DataRight IQ product includes a number of user guides, reference
guides, and online documentation (see below).
About this guide
This guide is divided into three units: User’s Guide, Job-File Reference, and
Library Guide.
User’s Guide (for any DataRight IQ implementation): Explains DataRight IQ’s capabilities in general terms, with information specific to different implementations (Job, Views, and Library) in separate units:
Unit 1: DataRight IQ Overview (for any implementation): Contains an overview of the software’s tools and features.
Unit 2: DataRight IQ Job and Views (for Job and/or Views): Contains more detailed information about the software’s capabilities and includes some setup instructions that you can use in your own job creation.
Unit 3: DataRight IQ Library (for Library): Contains the detailed information that you need to set up and run the Library implementation of DataRight IQ.
Conventions used
The following conventions are used throughout this guide and other Firstlogic
documentation:
Bold: We use bold type for file names, paths, emphasis, and text that you should type exactly as shown. For example, “Type cd\dirs.”
Italics: We use italics for emphasis and text for which you should substitute your own data or values. For example, “Type a name for your file, and the .txt extension (testfile.txt).”
Menu commands: We indicate commands that you choose from menus in the following format: Menu Name > Command Name. For example, “Choose File > New.”
!: We use this symbol to alert you to important information and potential problems.
We use this symbol to point out special cases that you should know about.
We use this symbol to draw your attention to tips that may be useful to you.
Documentation
Other documentation
Besides this user’s guide, DataRight IQ comes with other documentation to help
you make full use of the application’s abilities.
DataRight IQ Release Notes (for any implementation): Contains any necessary installation information and explains DataRight IQ’s capabilities in relation to the previous version.
DataRight IQ Transition Guide (for any implementation): For those upgrading from DataRight or TrueName, this guide explains the differences between those products and DataRight IQ.
DataRight IQ Views online help (for Views): Contains online information about running DataRight IQ through its Views implementation.
DataRight IQ Modifier’s Guide (for any implementation): Explains how to modify DataRight IQ to suit your needs, including how to create custom dictionaries using the command-line version of the User-Modifiable Dictionary (UMD) program, how to edit your rule file, and how to set up your user-defined patterns.

Other related documentation
Below is a list of other Firstlogic documentation that you may find useful.
Views: A Quick Guide to Get You Started: Gives you the basic information you need to get started with any Firstlogic Views software.
Edjob booklet: Explains how to use the Edjob utility to update your job files. Use this utility to update your job files when you receive a new version of DataRight IQ.
System Administrator’s Guide: Describes installation procedures, system requirements, and more.

Access the latest documentation
You can access Firstlogic documentation in several places:
•On your computer. Release notes, manuals, and other documents for each
Firstlogic product that you’ve installed are available in the Documentation
folder. Choose Start > Programs > Firstlogic Applications >
Documentation.
•On the Firstlogic Customer Portal. Go to www.firstlogic.com/customer,
and then click the Documentation link to access all the latest Firstlogic
documentation. You can view the PDFs online, save them to your computer,
or order professionally printed documents that will be delivered to you. To
order printed documents, see the following instructions.
Unit 1: DataRight IQ Overview
Contents
Welcome to DataRight IQ, page 11
Standardize data, page 23
Add new data to existing data, page 31
Custom dictionaries, page 41
The chapters in this unit are for all DataRight IQ users. The
information in this unit applies whether you use the Job, Views,
or Library implementation of DataRight IQ.
These chapters provide a conceptual overview of DataRight IQ’s
capabilities. They also include many references to later chapters
(Job file, Views, and Library) for information specific to a particular
implementation, such as step-by-step procedures.
Chapter 1: Welcome to DataRight IQ
DataRight IQ is advanced data-parsing software that identifies the information in
your database so that you can use it more effectively.
What is parsing?
DataRight IQ identifies and isolates data from other data. We call this parsing. It
can parse a wide variety of data:
•address
•e-mail
•SSN
•date
•phone numbers (U.S. and Canada)
•phone numbers (international)
•user-defined pattern matching (UDPM)
•name
•firm
It can parse this data even if it’s floating in unfielded lines.
While parsing, the software can standardize your data to make it more consistent,
add gender information and salutations, and create output records in your
preferred target format.
Batch versus nonbatch parsing
No matter which implementation of DataRight IQ you have (Views, Job, RAPID,
Library, and so on), you can parse the same data. However, there are some
differences in how DataRight IQ parses data depending on its implementation.
DataRight IQ Views and Job implementations: Parses batches of data (data in
databases).
DataRight IQ Library implementation: Parses data one record at a time
according to your setup. Its behavior depends on how you integrate it with your
software.
Keep these differences in mind when reading DataRight IQ’s documentation.
When the documentation refers to accessing databases or to a batch process,
the information applies only to Views and Job, not to DataRight IQ Library.
How DataRight IQ works
Job and Views
Below are the basic steps DataRight IQ (Job and Views) takes as it processes a
record:
Input records: The software takes one record at a time from a database.
Select records for processing: The software selects and processes only the records you want. For example, you can select records based on specific criteria such as age or income, or select a representative sample of records for processing.
Modify input data: The software modifies or converts input data on input. For example, DataRight IQ could convert prename codes to prenames or remove unwanted data from a field.
Parse data: The software identifies and parses names, titles, firm data, addresses, e-mail addresses, Social Security numbers, dates, phone numbers, and any other user-defined patterns. It then breaks the data into lines and individual components such as prename, first name, middle name, last name, and title.
Standardize data: The software performs case conversion, standardizes common words (Incorp becomes Inc.), and fills in missing city, state, or ZIP Code data. It also can convert firm names, titles, and other data to acronyms (General Motors becomes GM).
Split records: The software splits one input record into several output records. For example, if an input file contains multiple names, DataRight IQ can create a separate record for each person. The software also splits combined names such as John and Mary Smith into individual names such as John Smith and Mary Smith.
Generate new data: The software generates new data that you can add to the record:
•Gender codes. DataRight IQ assigns a gender code for each name.
•Prenames. If DataRight IQ is confident of a name’s gender, it can assign the prename Mr., Ms., or Mrs.
•Match standards. DataRight IQ can generate match standards, or potential matching words. For example, DataRight IQ can tell you that Patrick and Patricia are potential matches for the name Pat.
•Greetings. DataRight IQ generates personal salutations in various styles: formal (Dear Mr. Shakespeare), casual (Dear William), and title (Dear Playwright).
Select records for output: For each output file, the software selects only the records you want. For example, you can select records based on specific criteria (such as gender) or select a representative sample of records.
Output data: The software offers four types of data for output:
•Standardized data parsed into components and lines.
•Unstandardized (raw) data parsed into components and lines.
•Raw data taken directly from the input file (not parsed).
•Additional data and codes generated during processing.
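The Job and Views steps above behave like a chain of functions applied to one record at a time. The Python sketch below illustrates only that control flow under stated assumptions: the step functions, the dict-based record, and the tiny code and word tables are hypothetical stand-ins, not DataRight IQ's job-file settings or API.

```python
import re
from typing import Callable, Iterable, Iterator, Optional

Record = dict  # hypothetical record: field name -> value
Step = Callable[[Record], Optional[Record]]  # a step returns None to drop the record


def select_for_processing(rec: Record) -> Optional[Record]:
    # Hypothetical input selection: skip records that have no name.
    return rec if rec.get("name") else None


def modify_input(rec: Record) -> Record:
    # Hypothetical input modification: convert a prename code to a prename.
    prename_codes = {"8": "Dr"}
    out = dict(rec)
    code = out.get("prename", "")
    out["prename"] = prename_codes.get(code, code)
    return out


def standardize(rec: Record) -> Record:
    # Hypothetical standardization of one common firm word (Incorp -> Inc.).
    out = dict(rec)
    out["firm"] = re.sub(r"\bIncorp\b\.?", "Inc.", out.get("firm", ""))
    return out


def run_pipeline(records: Iterable[Record], steps: list) -> Iterator[Record]:
    for rec in records:  # one record at a time
        current: Optional[Record] = rec
        for step in steps:
            current = step(current)
            if current is None:  # a selection step rejected the record
                break
        if current is not None:
            yield current
```

Keeping each step as a separate function mirrors the table: selection can drop a record early, and later steps only see records that survived.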
Library
Below are the basic steps that DataRight IQ (Library implementation) takes as it
processes a record:
Input records: You pass one record at a time to DataRight IQ Library.
Parse data: DataRight IQ identifies and parses names, titles, firm data, addresses, e-mail addresses, Social Security numbers, dates, phone numbers, and any other user-defined patterns. It then breaks the data into lines and individual components such as prename, first name, middle name, last name, and title.
Standardize data: DataRight IQ can perform case conversion and standardize common words (Incorp becomes Inc.). DataRight IQ can also convert firm names, titles, and other data to acronyms (General Motors becomes GM).
Split records: DataRight IQ also splits combined names such as John and Mary Smith into individual names such as John Smith and Mary Smith.
Generate new data: DataRight IQ generates new data that you can add to the record:
•Gender codes. DataRight IQ assigns a gender code for each name.
•Prenames. If DataRight IQ is confident of a name’s gender, it can assign the prename Mr., Ms., or Mrs.
•Match standards. DataRight IQ can generate match standards, or potential matching words. For example, DataRight IQ can tell you that Patrick and Patricia are potential matches for the name Pat.
•Greetings. DataRight IQ generates personal salutations in various styles: formal (Dear Mr. Shakespeare), casual (Dear William), and title (Dear Playwright).
Output data: DataRight IQ makes the parsed data and new data available for output. You retrieve the items that you want.
What DataRight IQ can do
We’ve discussed how DataRight IQ works; now let’s introduce you to the many
things that DataRight IQ can do with your data. We’ll explain some of these in
more detail later in this guide.
Convert file format
If you rent or purchase data, or if you process data from multiple sources,
DataRight IQ can help you prepare your data for further processing. Using
DataRight IQ (Job or Views), you can process up to 255 input files at one time,
regardless of their format, and convert them to your target format.
If data floats in unfielded lines in the record, DataRight IQ can identify
information and isolate it so you can place each data element exactly where you
want in the output file.
Parse data
You can use DataRight IQ to identify and isolate a wide variety of data, even if
the data is floating in lines.
Input data:
Mr. Dan R. Smith, Jr., CPA
Account Mgr.
Jones Inc.
Dept. of Accounting
421-55-2424
PO Box 567
Biron, WI 54494
drsmith@jonesinc.com
507-555-3423
Jan 3, 2003
Parsed data:
Prename: Mr.
First Name: Dan
Middle Name: R.
Last Name: Smith
Maturity Postname: Jr.
Honorary Postname: CPA
Title: Account Mgr.
Firm: Jones Inc.
Firm Location: Dept. of Accounting
Social Security: 421-55-2424
E-mail Address: drsmith@jonesinc.com
Phone: 507-555-3423
Date: Jan 3, 2003
Address: PO Box 567
Last Line: Biron, WI 54494
•DataRight IQ (Job and Views) parses up to six names per record. For
all six names found, it parses components such as prename, first name,
middle name, last name, and postname. Then it sends the data to
individual fields. DataRight IQ also parses up to six job titles from
each record.
•DataRight IQ (Job and Views) parses up to two firm names (such as
IBM) and up to two firm locations (such as Engineering Dept.) per
record. DataRight IQ can also convert firm names to accepted
acronyms—for example, it can convert General Motors Corp. to GM.
•DataRight IQ (Job and Views) parses U.S. address lines, as well as
city, state, and ZIP Code data.
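To make the idea of isolating floating lines concrete, the sketch below classifies each line of the sample record with regular expressions. It is a toy under loud assumptions: the patterns cover only the sample formats shown above (hyphenated SSNs, U.S.-style phone numbers, "Mon D, YYYY" dates, simple last lines), and the field labels are borrowed from the example. DataRight IQ's actual parser relies on its parsing dictionary and rules, which is why name, title, and firm lines are left over here.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical patterns that match only the sample formats in the example above.
PATTERNS = [
    ("Social Security", re.compile(r"^\d{3}-\d{2}-\d{4}$")),
    ("Phone", re.compile(r"^\d{3}-\d{3}-\d{4}$")),
    ("E-mail Address", re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")),
    ("Date", re.compile(r"^[A-Z][a-z]{2} \d{1,2}, \d{4}$")),
    ("Last Line", re.compile(r"^[A-Za-z .]+,? [A-Z]{2} \d{5}(-\d{4})?$")),
    ("Address", re.compile(r"^(PO Box \d+|\d+ .+)$", re.IGNORECASE)),
]


def classify_lines(lines: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Assign each floating line to the first matching pattern; return leftovers too."""
    fields: Dict[str, str] = {}
    leftovers: List[str] = []
    for line in lines:
        text = line.strip()
        label = next((name for name, pat in PATTERNS if pat.match(text)), None)
        if label is None:
            leftovers.append(text)  # names, titles, firms need dictionary-based parsing
        else:
            fields[label] = text
    return fields, leftovers
```

Pattern order matters: the more specific patterns (SSN, phone) are tried before the broad address pattern.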
Select records
DataRight IQ (Job and Views) offers advanced record-selection features on input
and output. You decide which records will be processed and which records will be
included in your output files:
•You can select records based on specific criteria such as gender, age,
or geographical data.
•You can also select a representative sample of records. For example,
you could select 50,000 records at random from throughout a file (Job
and Views).
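The two selection styles above can be sketched as follows. This is an illustration, not DataRight IQ's selection engine: `select_where` and `reservoir_sample` are hypothetical helpers, and reservoir sampling (Algorithm R) is one standard way to draw a uniform random sample from throughout a file in a single pass without knowing its length in advance.

```python
import random
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def select_where(records: Iterable[T], criterion: Callable[[T], bool]) -> List[T]:
    """Criteria-based selection, e.g. keep only records with a given gender code."""
    return [rec for rec in records if criterion(rec)]


def reservoir_sample(records: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Choose k records uniformly at random in one pass (reservoir sampling, Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive: j is in [0, i]
            if j < k:
                sample[j] = rec  # happens with probability k / (i + 1)
    return sample
```

For the example in the text, `reservoir_sample(records, 50_000)` would keep 50,000 records drawn from the whole file rather than just its first 50,000.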
Standardize data
DataRight IQ can standardize data to make your records more consistent. Some
things that it can standardize include case, punctuation, and acronyms. For more
information on DataRight IQ’s standardization abilities, see “Standardize data”
on page 23.
Assign gender and prenames
DataRight IQ assigns a precise gender code to each name. DataRight IQ offers
several levels of gender codes: strong male, strong female, weak male, weak
female, and ambiguous. The intelligence behind gender assignment lies partly in
the parsing software and partly in the parsing dictionary. For more information,
see “Gender codes” on page 32.
When DataRight IQ can assign a gender with strong or weak confidence, it can
also assign a prename: Mr., Ms., or Mrs. For more information, see “Prenames”
on page 34.
Create personalized greetings
You can use DataRight IQ to create personalized greetings in formal, casual, and
title styles: Dear Mr. Shakespeare, Dear William, and Dear Playwright.
DataRight IQ creates a greeting for each person, as well as an overall greeting for
the entire record (for example, Dear William and Harold or Dear Sirs).
You can customize greetings by specifying the greeting word and ending
punctuation. For example, the default casual style yields Dear William, but you
could create greetings such as Greetings, William! instead.
For more information, see “Salutations” on page 37.
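The chain from gender code to prename to greeting can be sketched like this. The gender levels come from the text; everything else is an assumption for illustration: `FIRST_NAME_GENDER` is a hypothetical stand-in for the parsing dictionary, the sketch never chooses Mrs. (that would need data it does not model), and it builds only per-person greetings, not the overall record greeting.

```python
from enum import Enum
from typing import Optional


class Gender(Enum):
    STRONG_MALE = "strong male"
    STRONG_FEMALE = "strong female"
    WEAK_MALE = "weak male"
    WEAK_FEMALE = "weak female"
    AMBIGUOUS = "ambiguous"


# Hypothetical stand-in for the gender data in the parsing dictionary.
FIRST_NAME_GENDER = {
    "william": Gender.STRONG_MALE,
    "mary": Gender.STRONG_FEMALE,
    "pat": Gender.AMBIGUOUS,
}


def assign_gender(first: str) -> Gender:
    return FIRST_NAME_GENDER.get(first.lower(), Gender.AMBIGUOUS)


def assign_prename(gender: Gender) -> Optional[str]:
    # Strong or weak confidence allows a prename; ambiguous names get none.
    # Choosing Mrs. needs information this sketch does not model, so it uses Ms.
    if gender in (Gender.STRONG_MALE, Gender.WEAK_MALE):
        return "Mr."
    if gender in (Gender.STRONG_FEMALE, Gender.WEAK_FEMALE):
        return "Ms."
    return None


def greeting(first: str, last: str, title: Optional[str] = None,
             style: str = "formal", word: str = "Dear", end: str = "") -> str:
    if style == "casual":
        body = first
    elif style == "title" and title:
        body = title
    else:  # formal, or title style with no title available
        prename = assign_prename(assign_gender(first))
        body = f"{prename} {last}" if prename else f"{first} {last}"
    return f"{word} {body}{end}"
```

For example, `greeting("William", "Shakespeare", style="casual", word="Greetings,", end="!")` produces the customized form Greetings, William! described above.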
Perform advanced search-and-replace
DataRight IQ (Job and Views) offers a powerful search-and-replace feature that
lets you convert and modify data:
•Convert coded data. For example, you could convert prename codes to
prenames.
•Remove unwanted data. You can search for unwanted data and delete
it by replacing it with nothing. For example, you could remove dates
from a name field.
•Search-and-replace. You can perform a traditional search-and-replace,
searching for a specific value and replacing it with another value. For
example, you could search for Occupant and replace it with Current Resident.
•Search-and-put. You can search for a value in one field and put the
“replacement” value into a different field, leaving the original field
intact. For example, you could search for income codes in a field from
your input file, convert the codes to income ranges, and put the results
into a field in your output file.
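The four operations above can be sketched as small functions over a record. The code tables (`PRENAME_CODES`, `INCOME_RANGES`) and the date pattern in the usage note are hypothetical examples built from this guide's own samples; in DataRight IQ you define search-and-replace tables through the product's own setup, not through code like this.

```python
import re
from typing import Dict

# Hypothetical code tables; in DataRight IQ you would define your own.
PRENAME_CODES = {"8": "Dr"}
INCOME_RANGES = {"3": "50,000 to 75,000"}


def search_and_replace(value: str, table: Dict[str, str]) -> str:
    """Replace a field value found in the table; leave any other value as-is."""
    return table.get(value.strip(), value)


def remove_unwanted(value: str, pattern: str) -> str:
    """Delete matches of `pattern` (replace them with nothing), then tidy spacing."""
    return re.sub(r"\s{2,}", " ", re.sub(pattern, "", value)).strip()


def search_and_put(record: Dict[str, str], src: str, dest: str,
                   table: Dict[str, str]) -> Dict[str, str]:
    """Look up record[src] and write the result into record[dest]; src stays intact."""
    out = dict(record)
    key = record.get(src, "").strip()
    if key in table:
        out[dest] = table[key]
    return out
```

For instance, `remove_unwanted("Ann Jones 3/2/04", r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")` strips the date from a name field, and `search_and_put` turns an income code into a range in a separate output field.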
Split combined names
If your records include combined names (most often married couples or other
family members), you can use DataRight IQ to split them. You can leave both
names in one record or create two separate output records. For example,
Ms. Patricia and Mr. William Jones becomes:
Ms. Patricia Jones
Mr. William Jones
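A very small sketch of splitting a combined name follows. It assumes the simplest pattern from the example, where the people are joined by "and" (or "&") and only the last person carries the shared surname; names such as "John Smith and Mary Jones" would be mishandled. `split_combined_name` is a hypothetical helper, not DataRight IQ's splitting logic, which uses its parsing rules and dictionary.

```python
import re
from typing import List


def split_combined_name(name: str) -> List[str]:
    """Split 'A and B Surname' into ['A Surname', 'B Surname'].

    Assumes only the last person in the line carries the shared surname.
    """
    parts = [p.strip() for p in re.split(r"\s+(?:and|&)\s+", name.strip()) if p.strip()]
    if len(parts) < 2:
        return [name.strip()]
    *others, last_part = parts
    tokens = last_part.split()
    if len(tokens) < 2:  # no surname to share, e.g. "John and Mary"
        return parts
    surname = tokens[-1]
    return [f"{p} {surname}" for p in others] + [last_part]
```

The returned list can either be joined back into one record or written out as one record per person, matching the two options described above.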
Scan-and-split
The scan-and-split feature is designed to arrange data more precisely from fields
that contain complex information, such as mixed combinations of names, firms,
account numbers, special designations (as shown below), or dates. For example, a
field might contain data like the following:
John Smith, Trustee for Mary Smith
With these settings:
Scan for: Trustee for
Split method: After
the result is:
Line1: John Smith, Trustee for
Line2: Mary Smith
Use the scan-and-split feature to manipulate parts of the data into one, two,
or three fields, and to designate combinations of strings. In the example above, the
scanned phrase Trustee for is placed after the first name listed.
Refer to “Step 1: Create a scan-and-split table” on page 109 for detailed
information about the scan-and-split feature and how to set it up in DataRight
IQ.
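The "After" split method in the example can be sketched as a case-insensitive search for the scanned phrase. The "after" branch reproduces the Line1/Line2 result shown above; the "before" branch is a hypothetical second method added only for contrast and is not taken from this guide. Real scan-and-split tables are configured in DataRight IQ itself.

```python
from typing import Tuple


def scan_and_split(text: str, phrase: str, method: str = "after") -> Tuple[str, str]:
    """Split `text` at `phrase`; returns (line1, line2), or (text, "") if not found."""
    idx = text.lower().find(phrase.lower())
    if idx < 0:
        return text, ""
    end = idx + len(phrase)
    if method == "after":  # the phrase stays at the end of Line1
        return text[:end].strip(), text[end:].strip()
    if method == "before":  # hypothetical: the phrase starts Line2
        return text[:idx].strip().rstrip(","), text[idx:].strip()
    raise ValueError(f"unknown split method: {method}")
```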
Migrate data from a mainframe system
Suppose a database on your mainframe system contains data entered during day-to-day operations. The first five fields of the database contain different data from
record to record (the data “floats” between fields) and often contain extraneous
data that you want to remove. The database also contains several fields of coded
data that you want to convert, as well as a date field that you need to convert from
2-digit years to 4-digit years.
You can use DataRight IQ to conform this data to your master format:
identify and parse floating data
remove extraneous information
convert coded data
convert and parse dates
Identify and parse floating data
If your input file is in a multiline format (fields contain different data from record
to record), DataRight IQ can identify and parse individual data elements and put
them into separate fields.
Input data, record 1:
Line1: Mr. Dan Williams CPA
Line2: Jones Engineering Inc.
Line3: Dept. of Accounting
Line4: PO box 567
Line5: 1234 Main St
City: La Crosse
State: WI
ZIP: 54601
Input data, record 2:
Line1: Jan Smith, President
Line2: Smith Consulting
Line3: 123 W 4th St
City: Winona
State: MN
ZIP: 55987
Parsed data, record 1:
Name: Mr. Dan Williams
Title: CPA
Firm: Jones Engineering Inc.
Firm Location: Dept. of Accounting
Address1: 1234 Main St
Address2: PO box 567
City: La Crosse
State: WI
ZIP: 54601
Parsed data, record 2:
Name: Jan Smith
Title: President
Firm: Smith Consulting
Address1: 123 W 4th St
City: Winona
State: MN
ZIP: 55987
Remove extraneous information
You can use DataRight IQ to remove extraneous information from a field. For
example, you could remove a date from a name field:
Input data:
Line1: Ann Jones 3/2/04
Line2: 123 Main St
Line3: Onalaska WI 54650
Cleansed record:
Name: Ann Jones
Address1: 123 Main St
Last Line: Onalaska WI 54650
Date: 03/02/2004
Convert coded data
Suppose your database has several fields that contain coded data. You eventually
want to consolidate the data in this database with the data in a master database.
You want to convert the codes so that they are consistent with the coding system
in the master database.
Your coding system lists a variety of prenames and assigns a number to each.
Similarly, you have numbers representing gender, marital status, and different
levels of income.
Input data:
Prename: 8
Gender: 1
Marital status: 2
Income: 3
Converted data:
Prename: Dr
Gender: F
Marital status: S
Income: 50,000 to 75,000
Here is what the coding system may look like for this example:
Prename: 8 = Dr
Gender: 1 = female
Marital status: 2 = single
Income: 3 = 50,000 to 75,000
Convert and parse dates
If your database contains a date field, the software can convert the format to your
standard, and even parse the data into fields:
Input data: 013197
Converted data:
Date: 1/31/1997
Year: 1997
Month: January
Day: 31
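The date conversion above (013197 to 1/31/1997 plus separate year, month, and day fields) can be sketched as below. Two assumptions are labeled in the code: the input is MMDDYY, and two-digit years are expanded with a hypothetical pivot of 50 (below 50 becomes 20xx, otherwise 19xx). DataRight IQ's own century rule and output formats are set in the product and may differ.

```python
from datetime import date
from typing import Dict

MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")


def parse_mmddyy(raw: str, pivot: int = 50) -> Dict[str, str]:
    """Parse a 6-digit MMDDYY value and split it into date fields.

    Two-digit years below `pivot` become 20xx; the rest become 19xx (an assumed rule).
    """
    if len(raw) != 6 or not raw.isdigit():
        raise ValueError(f"expected 6 digits, got {raw!r}")
    mm, dd, yy = int(raw[:2]), int(raw[2:4]), int(raw[4:])
    year = 2000 + yy if yy < pivot else 1900 + yy
    d = date(year, mm, dd)  # raises ValueError for impossible dates
    return {
        "Date": f"{d.month}/{d.day}/{d.year}",
        "Year": str(d.year),
        "Month": MONTH_NAMES[d.month - 1],
        "Day": str(d.day),
    }
```

A fixed month-name tuple is used instead of `strftime("%B")` so the output does not depend on the system locale.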
Prepare records for match processing
Suppose you maintain a large master database. On a daily basis you consolidate
data from a number of sources and add records to your master database. In the
master database you want just one record for each person, so you use a matching
process to eliminate duplicate records and consolidate information about each
individual into a single record.
DataRight IQ can help you prepare your records for matching.
Parse data
DataRight IQ can parse (identify) individual data components and put them into
separate fields. This makes it easier to match data, because you can compare
apples to apples. For example, you can compare a last name in one record to a last
name in another record rather than compare whole lines of data.
Input data:
Line1: Intl. Marketing, Inc.
Line2: Dept. of Sales
Line3: Pat Smith, Sales Mgr.
Line4: 328 Bluebird Ln
Line5: Biron WI 54494
Parsed data:
First Name: Pat
Last Name: Smith
Title: Sales Mgr.
Firm: Intl. Marketing, Inc.
Firm Location: Dept. of Sales
Address: 328 Bluebird Ln
City: Biron
State: WI
ZIP: 54494
Standardize firm names
You can also use DataRight IQ to standardize inconsistent firm names.
Consistency among records can help you improve matching results.
DataRight IQ standardizes commonly used words in firm names. For example,
International Harvester, Inc.; Internatl. Harvester, Incorp.; and
Intl. Harvester, Incorporated all become Intl. Harvester, Inc.
DataRight IQ can also convert firm names to accepted acronyms. For example,
International Business Machines; Internatl. Business Machines; and
Intl. Business Machines Corporation all become IBM.
Convert nonmailing city names
DataRight IQ can convert nonmailing, or “vanity,” city names to the city name
preferred by the U.S. Postal Service. This improves consistency among records.
For example, Hollywood becomes Los Angeles.
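Word-level firm-name standardization can be sketched as a token lookup. The `FIRM_WORDS` table below is a hypothetical stand-in covering only the variants in the example; DataRight IQ keeps its real mappings in the parsing dictionary. Each word is looked up with any trailing period removed and case ignored, so Internatl., Intl, and International all map to the same standard form.

```python
import re

# Hypothetical variant table; the real mappings live in DataRight IQ's parsing dictionary.
FIRM_WORDS = {
    "international": "Intl.", "internatl": "Intl.", "intl": "Intl.",
    "incorporated": "Inc.", "incorp": "Inc.", "inc": "Inc.",
}


def standardize_firm(name: str) -> str:
    """Replace each known firm word with its standard form; keep other words as-is."""
    def fix(match: re.Match) -> str:
        word = match.group(0)
        return FIRM_WORDS.get(word.rstrip(".").lower(), word)
    return re.sub(r"[A-Za-z]+\.?", fix, name)
```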
Provide match standards for name data
DataRight IQ can provide match standards for first and middle names. For
example, DataRight IQ can tell you that Patrick and Patricia are potential
matches for the first name Pat:
Pat: Patrick, Patricia
Match standards can help you overcome two types of matching problems:
alternate spellings (Catherine and Katherine) and nicknames (Pat and Patrick).
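Match standards can be pictured as a lookup from a name to its potential matches. The table below is a hypothetical sample covering the nickname and alternate-spelling cases from the text, and `potential_match` is an illustrative helper rather than DataRight IQ's matching logic. Note that the standards bridge Pat to both Patrick and Patricia without making Patrick and Patricia match each other.

```python
from typing import Dict, List

# Hypothetical match-standard table (nicknames and alternate spellings).
MATCH_STANDARDS: Dict[str, List[str]] = {
    "pat": ["Patrick", "Patricia"],
    "katherine": ["Catherine"],
    "catherine": ["Katherine"],
}


def match_standards(first: str) -> List[str]:
    return MATCH_STANDARDS.get(first.strip().lower(), [])


def potential_match(a: str, b: str) -> bool:
    """Names potentially match if equal, or if either lists the other as a standard."""
    na, nb = a.strip().lower(), b.strip().lower()
    if na == nb:
        return True
    return (nb in (s.lower() for s in match_standards(a))
            or na in (s.lower() for s in match_standards(b)))
```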
Example
This example shows how DataRight IQ can prepare records for matching.
Input record from data source 1:
Intl Marketing, Inc.
Dept. of Accounting
Pat Smith, Accounting Mgr.
328 Bluebird Ln
Biron, WI 54494
Input record from data source 2:
Smith, Patricia R.
International Mktg, Incorp.
328 Bluebird Ln
Wisconsin Rapids, Wisconsin
Cleansed record 1:
First Name: Pat
Match Standards: Patrick, Patricia
Last Name: Smith
Title: Accounting Mgr.
Firm: Intl. Mktg, Inc.
Firm Location: Dept. of Accounting
Address: 328 Bluebird Ln
City: Wisconsin Rapids
State: WI
ZIP: 54494
Cleansed record 2:
First Name: Patricia
Match Standards: Patricia
Middle Name: R.
Last Name: Smith
Firm: Intl. Mktg, Inc.
Address: 328 Bluebird Ln
City: Wisconsin Rapids
State: WI
ZIP: 54494
Convert file type and format
Suppose you run a service bureau. One of your clients recently rented two lists
from two different list brokers. One list is an ASCII file and one is a dBASE file,
and the files are in different formats. Your client wants you to convert the lists to
his preferred “house” format and put all the records in a single database. The
client wants to evaluate each broker and asks you to provide information about
the quality of the data in each list.
You can use DataRight IQ to do the following:
Convert files to the desired type and format.
Consolidate records into a single output file.
Generate reports containing separate data-quality statistics for each list.
List A (ASCII) format: Name, Address1, Address2, Address3, Phone
Sample record: John A. Smith | 1234 Main St | La Crosse WI | 54601 | 608-555-1212
List B (dBASE) format: First_Name, Mid_Init, Last Name, Address, Apt, City, State, ZIP, Phone
Sample record: Mary | R | Jones | 913 12th Ave S | Apt 30 | Onalaska | WI | 54601 | 608-123-4567
Output (dBASE) format: Name, Address1, Last_Line, Phone
John A. Smith | 1234 Main St | La Crosse WI 54601 | 608-555-1212
Ms. Mary R Jones | 913 12th Ave S Apt 30 | Onalaska WI 54601 | 608-123-4567
Report: contains separate statistics for each list (List A and List B).
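The conversion above can be sketched as one mapping function per source format plus a consolidation loop that keeps per-list counts for the report. Everything here is hypothetical: the mapper names, the assumption that List A's Address2 and Address3 hold the city-state and ZIP (true for the sample record only), and the two statistics counted. The sketch also does not assign prenames, so it yields Mary R Jones rather than Ms. Mary R Jones.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, str]


def from_list_a(rec: Row) -> Row:
    # ASCII list: Name, Address1, Address2, Address3, Phone.
    # Assumes Address2/Address3 hold the city-state and ZIP, as in the sample record.
    return {
        "Name": rec["Name"],
        "Address1": rec["Address1"],
        "Last_Line": f"{rec['Address2']} {rec['Address3']}".strip(),
        "Phone": rec["Phone"],
    }


def from_list_b(rec: Row) -> Row:
    # dBASE list: First_Name, Mid_Init, Last Name, Address, Apt, City, State, ZIP, Phone.
    name = " ".join(p for p in (rec["First_Name"], rec["Mid_Init"], rec["Last Name"]) if p)
    address = " ".join(p for p in (rec["Address"], rec["Apt"]) if p)
    return {
        "Name": name,
        "Address1": address,
        "Last_Line": " ".join(p for p in (rec["City"], rec["State"], rec["ZIP"]) if p),
        "Phone": rec["Phone"],
    }


def consolidate(sources: Iterable[Tuple[str, Iterable[Row], Callable[[Row], Row]]]
                ) -> Tuple[List[Row], Dict[str, Counter]]:
    """Map every source record to the house format; keep separate counts per list."""
    output: List[Row] = []
    stats: Dict[str, Counter] = {}
    for list_name, records, mapper in sources:
        counts: Counter = Counter()
        for rec in records:
            row = mapper(rec)
            output.append(row)
            counts["records"] += 1
            if not row["Phone"]:
                counts["missing phone"] += 1
        stats[list_name] = counts
    return output, stats
```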
Convert floating data to fielded data
Suppose that your company has traditionally stored customer data on a
mainframe system. The database has an open format: Data is stored in lines, and
there is little or no consistency in the position of data within these lines. You have
a huge amount of data about your customers, but that information is not very
accessible in its current format.
The most important step in making the data more useful and accessible is to
identify specific data elements and put them into separate fields. DataRight IQ
can do this because of the many pieces of information that it can identify and
isolate. (Refer to “What is parsing?” on page 11 for a complete list.)
Input data, record 1:
Line1: Ms. Roberta L. Williams,
Line2: CPA
Line3: Jones Engineering Inc.
Line4: Dept. of Accounting
Line5: PO Box 123
Line6: La Crosse WI 54601
Parsed data, record 1:
Prename: Ms.
First Name: Roberta
Middle Name: L.
Last Name: Williams
Postname: CPA
Firm: Jones Engineering Inc.
Firm Location: Dept. of Accounting
Extra: PO Box 123
Extra: La Crosse WI 54601
Input data, record 2:
Line1: Smith Consulting
Line2: William Smith, Jr., President
Line3: 123 W 4th St
Line4: Winona MN 55987-3546
Parsed data, record 2:
First Name: William
Last Name: Smith
Postname: Jr.
Title: President
Firm: Smith Consulting
Address: 123 W 4th St
Lastline: Winona MN 55987-3546
Chapter 2: Standardize data
Read this chapter to find out how DataRight IQ makes your data more consistent
from record to record through standardization.
Name format
Name format is the sequence of name components in a name line, for example,
First-Middle-Last or Last-First-Middle. If the name format is consistent
throughout your database, you get better name-parsing results if you “tell”
DataRight IQ the name format. Do this by setting the name format in your
definition (DEF) file.
However, DataRight IQ adheres to the way it's set in your DEF file only when the
input is ambiguous. If you want DataRight IQ to apply “strict” name order, select
the Strict Name Order option in the Input File block. Then DataRight IQ parses
name data the way you set it in the DEF file.
Known name format
When the software knows the name format (and has compared the data to the
rules), it correctly identifies the first name and last name. But if it does not
know the name format, in some cases it may not be able to determine which name
is the first and which is the last. If a name is determined to be ambiguous,
the software assumes that the name format is first-middle-last.
For example, if the software doesn’t know the name format, it cannot
accurately decide whether a name such as Carey Donnell should be Donnell Carey.
Both names could be first names.
With a known name format of last-first-middle, DataRight IQ knows that the first
name in the sequence is really the last name. It then outputs either Carey, Donnell
if your output format is set to last-first-middle, or Donnell Carey if your
output format is set to first-middle-last.
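The effect of a known input format on the output can be sketched with two helpers. Assumptions, labeled plainly: the format codes 'FML' and 'LFM' are invented for this sketch; an unknown format (None) jumps straight to the first-middle-last fallback, whereas DataRight IQ first tries to identify components from the data itself; and the Strict Name Order option is not modeled.

```python
from typing import Dict, List, Optional


def parse_name(tokens: List[str], input_format: Optional[str]) -> Dict[str, str]:
    """input_format: 'FML' (first-middle-last), 'LFM' (last-first-middle), or None."""
    fmt = input_format or "FML"  # unknown format: fall back to first-middle-last
    if len(tokens) < 2:
        raise ValueError("need at least a first and a last name")
    if fmt == "LFM":
        return {"first": tokens[1], "middle": " ".join(tokens[2:]), "last": tokens[0]}
    if fmt == "FML":
        return {"first": tokens[0], "middle": " ".join(tokens[1:-1]), "last": tokens[-1]}
    raise ValueError(f"unknown name format: {fmt}")


def format_name(parts: Dict[str, str], output_format: str) -> str:
    given = " ".join(p for p in (parts["first"], parts["middle"]) if p)
    if output_format == "LFM":
        return f"{parts['last']}, {given}"
    return f"{given} {parts['last']}"
```

With the input tokens Carey Donnell read as last-first-middle, `format_name` gives Carey, Donnell for last-first-middle output and Donnell Carey for first-middle-last output.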
Inconsistent name format
The software can accept and process input names even if the name format varies
from record to record. For example, it can accept and process a file in which some
names are in the First-Middle-Last sequence and others are in the Last-First-Middle sequence.
If the name format is inconsistent in your database, tell DataRight IQ that the
name format is unknown. It will then identify first, middle, and last name
components based on the data itself.
Refer to “Define input name format” on page 51 for details about setting up name
format in your jobs.
Address data
DataRight IQ can standardize city, state, and ZIP Code data and can assign
missing or invalid last-line data. This can help you fill in missing data, overcome
inconsistencies, and produce standardized address output.
Limited last-line standardization
If you choose, DataRight IQ can perform limited last-line standardization:
•Standardize state data to the U.S. Postal Service abbreviation. For example,
Wisc. becomes WI.
•Provide city and state data based on the ZIP Code. If the city or state is
missing or misspelled, or if the city-state combination is not valid, DataRight
IQ assigns the correct city and state for the input ZIP Code:
La Crescent 55947 becomes La Crescent MN 55947
Onalsky WI 54650 becomes Onalaska WI 54650
•Provide ZIP Code data. If a ZIP Code is missing or is not valid for the city
and state, DataRight IQ can assign a default ZIP Code:
Minneapolis MN becomes Minneapolis MN 55401
La Crosse WI 54620 becomes La Crosse WI 54601
! Caution: Some cities have more than one 5-digit ZIP Code. The default
ZIP may or may not be correct for the mailing address in the record.
Convert nonmailing city names
If you standardize the last line, you can also convert nonmailing city names. A
nonmailing city is a city that is served by a post office located in another city.
DataRight IQ can convert nonmailing city names to post-office city names:
Hollywood CA becomes Los Angeles CA 90027
Little Chute WI 54914 becomes Appleton WI 54914
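The last-line rules above can be sketched as a short sequence of table lookups. The reference tables are hypothetical and built only from this section's examples; a real implementation would draw on U.S. Postal Service ZIP Code data, and DataRight IQ's own rule order may differ. The order assumed here is: abbreviate the state, convert a nonmailing city, let a recognized ZIP supply the city and state, and otherwise fall back to the city's default ZIP (which, as the caution notes, may not be the right one).

```python
from typing import Tuple

# Hypothetical reference data built only from the examples above; a real
# implementation would use the U.S. Postal Service's ZIP Code data.
STATE_ABBREVIATIONS = {"wisc.": "WI", "wisconsin": "WI", "minnesota": "MN"}
ZIP_TO_CITY_STATE = {
    "55947": ("La Crescent", "MN"), "54650": ("Onalaska", "WI"),
    "54601": ("La Crosse", "WI"), "55401": ("Minneapolis", "MN"),
    "90027": ("Los Angeles", "CA"), "54914": ("Appleton", "WI"),
}
DEFAULT_ZIP = {("Minneapolis", "MN"): "55401", ("La Crosse", "WI"): "54601"}
NONMAILING_CITY = {("Hollywood", "CA"): ("Los Angeles", "90027"),
                   ("Little Chute", "WI"): ("Appleton", "54914")}


def standardize_last_line(city: str, state: str, zip_code: str) -> Tuple[str, str, str]:
    city, zip_code = city.strip(), zip_code.strip()
    state = STATE_ABBREVIATIONS.get(state.strip().lower(), state.strip().upper())
    # Convert a nonmailing ("vanity") city to its post-office city.
    if (city, state) in NONMAILING_CITY:
        city, default_zip = NONMAILING_CITY[(city, state)]
        zip_code = zip_code or default_zip
    if zip_code in ZIP_TO_CITY_STATE:
        # A recognized ZIP supplies (or corrects) the city and state.
        city, state = ZIP_TO_CITY_STATE[zip_code]
    else:
        # Missing or unrecognized ZIP: fall back to the city's default ZIP, if any.
        zip_code = DEFAULT_ZIP.get((city, state), zip_code)
    return city, state, zip_code
```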
Convert words to acronyms
DataRight IQ can convert components to widely accepted acronyms. You can use
DataRight IQ to convert firm names, titles, and so on, to standard acronyms.
For example, DataRight IQ can convert General Motors to GM, or National
Aeronautics and Space Administration to NASA. Also, given the input Chief
Executive Officer, DataRight IQ can produce CEO as output.
Acronyms can help you overcome inconsistencies among input records. If you
are preparing data for matching, more consistent data may lead to better matching
results, especially for firm names.
For example, input variants such as International Business Machines, Internatl.
Business Machines, and Intl. Business Machines Corporation can all be converted
to the acronym IBM.
DataRight IQ produces an acronym only when one is available in the parsing
dictionary; it does not produce initials by algorithm or rule. That is, DataRight
IQ produces only accepted acronyms for selected companies. DataRight IQ does
not simply take the first letter of each word in the firm name and return a pseudo
“acronym.”
You can enhance and customize acronyms by using the User-Modifiable
Dictionary (UMD) program. For details on UMD, see your DataRight IQ Modifier’s Guide or the online help that accompanies UMD Views.
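Here is a minimal Python sketch of dictionary-only acronym conversion. The ACRONYMS and WORD_VARIANTS tables and the to_acronym function are hypothetical stand-ins for parsing-dictionary entries. The sketch shows one point: a phrase that is not listed yields no acronym, rather than a made-up set of initials.

```python
# Hypothetical excerpt of acronym entries; in DataRight IQ these come from
# the parsing dictionary and can be extended with UMD.
ACRONYMS = {
    "GENERAL MOTORS": "GM",
    "NATIONAL AERONAUTICS AND SPACE ADMINISTRATION": "NASA",
    "CHIEF EXECUTIVE OFFICER": "CEO",
    "INTERNATIONAL BUSINESS MACHINES": "IBM",
}
# Hypothetical spelling variants normalized before the lookup.
WORD_VARIANTS = {"INTERNATL.": "INTERNATIONAL", "INTL.": "INTERNATIONAL"}


def to_acronym(phrase):
    """Return the dictionary acronym for phrase, or None if there is none.

    No initials are ever built by rule: an unknown phrase yields None.
    """
    words = [WORD_VARIANTS.get(w, w) for w in phrase.upper().split()]
    return ACRONYMS.get(" ".join(words))
```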
Convert case
DataRight IQ offers three casing styles: upper, mixed, and lower. You can also
choose to retain (preserve) the case that was used in the input record.
Case          Original data            Resulting data
Upper         Dr. John McKay, Ph.D.    DR. JOHN MCKAY, PH.D.
Mixed         DR. JOHN MCKAY, PH.D.    Dr. John McKay, Ph.D.
Lower         Dr. John McKay, Ph.D.    dr. john mckay, ph.d.
Preserve      DR. John MCKay, PH.D.    DR. John MCKay, PH.D.
Intelligent mixed case
The basic rule of mixed case is to capitalize the first letter of the word and put the
rest of the word in lowercase. However, there are exceptions:
Some words should be in all uppercase, such as IBM and RN
Some words should be in all lowercase, such as of and or
Some words contain an internal capital letter, such as McKay
DataRight IQ can intelligently apply the correct casing to these and many other
mixed-case exceptions.
When DataRight IQ applies mixed case, it doesn’t simply look at the spelling of
the word. DataRight IQ also looks at how the word is being used. For example,
the word MS should have an uppercase “S” when it is used as a postname
meaning “Master of Science,” but the “s” should be lowercase for the prename
Ms. DataRight IQ intelligently applies the correct casing:
Input                      Output
JOHN R. SMITH, MS          John R. Smith, M.S.
MS. MARY ANN JOHNSON       Ms. Mary Ann Johnson
You can customize mixed casing by creating a custom capitalization dictionary.
For details, see “Improve casing results” on page 44.
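Here is a small Python sketch of usage-sensitive mixed casing. The CAPITALIZATION table and the mixed_case function are hypothetical. They show how an exception entry can apply to a single usage (MS as a postname), while the basic rule handles the other usages (Ms as a prename). Punctuation standardization, such as MS becoming M.S., is not modeled.

```python
# Hypothetical capitalization entries: word -> {usage: casing}.
# "ANY" means the casing applies however the word is used.
CAPITALIZATION = {
    "IBM": {"ANY": "IBM"},
    "RN": {"ANY": "RN"},
    "OF": {"ANY": "of"},
    "OR": {"ANY": "or"},
    "MCKAY": {"ANY": "McKay"},
    "MS": {"POSTNAME": "MS"},  # uppercase S only for the postname
}


def mixed_case(word, usage):
    """Case one word for the given usage (e.g. "PRENAME", "POSTNAME")."""
    entry = CAPITALIZATION.get(word.upper(), {})
    casing = entry.get(usage, entry.get("ANY"))
    if casing is not None:
        return casing
    # Basic rule: capitalize the first letter, lowercase the rest.
    return word[:1].upper() + word[1:].lower()
```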
Other standardization options
Some of the options below can be set in the Standardization Options block or the
Salutation Options block. Others are done automatically by DataRight IQ. For
example, DataRight IQ always standardizes common job-title words and firm
words, and strips punctuation from address lines.
Common job-title words
When your input data contains job titles, DataRight IQ standardizes them. For
example:
Input                          Standardized output
Mr John A Smith, C. E. O.      Mr. John A. Smith CEO
Address-line punctuation
DataRight IQ removes punctuation from address lines:
Input                          Standardized output
123 S. Main St., Apt. 4        123 S Main St Apt 4
Retain original punctuation
If you prefer to retain the original punctuation, you must also retain the original
text:
Input                          Output
Mr. John A. Smith C.E.O.       Mr. John A. Smith C.E.O.
Note: When you retrieve original text, you always get original punctuation,
regardless of your setting for punctuation standardization.
Phone number formats and extensions
Use DataRight IQ’s phone number settings to standardize phone numbers from
the United States and Canada. There are no format options for international
(non-U.S.) numbers, because they consist only of the country code and number.
You can choose one of these phone formats:
(xxx)xxx-xxxx (default)
xxx-xxx-xxxx
xxxxxxxxxx
In addition, you can enter the text that you want to appear in front of
extension numbers. For example, enter Ext. to place Ext. in front of each
extension number.
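Here is a Python sketch of the three phone formats and the extension text. The format_phone function and its regular expression are assumptions made for illustration. They accept only ten-digit U.S. or Canadian numbers with optional extension digits, and they return None for anything else.

```python
import re

# Output templates for the three formats listed above.
PHONE_FORMATS = {
    "(xxx)xxx-xxxx": "({0}){1}-{2}",
    "xxx-xxx-xxxx": "{0}-{1}-{2}",
    "xxxxxxxxxx": "{0}{1}{2}",
}
# Ten digits in 3-3-4 groups, then optional extension digits after a separator.
PHONE_PATTERN = re.compile(r"\D*(\d{3})\D*(\d{3})\D*(\d{4})(?:\D+(\d{1,6}))?\D*")


def format_phone(raw, style="(xxx)xxx-xxxx", ext_text="Ext."):
    """Return raw in the chosen style, or None if it is not a 10-digit number."""
    match = PHONE_PATTERN.fullmatch(raw)
    if match is None:
        return None
    area, exchange, line, ext = match.groups()
    result = PHONE_FORMATS[style].format(area, exchange, line)
    if ext:
        result += f" {ext_text} {ext}"
    return result
```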
Formats for dates
Use DataRight IQ to change date formats. DataRight IQ offers many date formats
for you to choose from.
In addition to choosing the format for dates, you can specify the delimiter to use.
Choose from <none>, <space>, a forward (/) or backward (\) slash, a dash, or a period.
Job
In Job, enter the Format number and Delimiter number that relate to the format
and delimiter that you want. A list of date formats and delimiter formats follows
the Standardization/Assignment Control block. The parameters are Output Date
Format and Output Delimiter Format.
* Date Format Options:
* Format 1 - YYYY*MM*DD
* Format 2 - YY*MM*DD
* Format 3 - DD*MM*YYYY
* Format 4 - DD*MM*YY
* Format 5 - MM*DD*YYYY
* Format 6 - MM*DD*YY
* Format 7 - DD*MMM*YY
* Format 8 - DD*MMM*YYYY
* Format 9 - MMM*DD*YYYY
* Format 10 - MMM*DD*YY
* Format 11 - YYYYMMDD
* Format 12 - YYMMDD
* Format 13 - DDMMYYYY
* Format 14 - DDMMYY
* Format 15 - MMDDYYYY
* Format 16 - MMDDYY
* Date/SSN Delimiter Options:
* Delimiter 1 - '' (No delimiter)
* Delimiter 2 - ' ' (Space)
* Delimiter 3 - '/'
* Delimiter 4 - '-'
* Delimiter 5 - '\'
* Delimiter 6 - '.'
Views
For Views users, choose a date format and date delimiter from the drop-down
lists in the Standardization Style window.
You can also control how DataRight IQ reads input dates by completing the Input File
options Input Date Format (Month before Day) and Input Date Format (Year first).
These are in the Input File block. Refer to “Input Date Format (Month before Day) Input
Date Format (Year first)” on page 211 for descriptions.
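Here is a Python sketch of how the Job-file format numbers and delimiter numbers combine. The numbering follows the Job-file listing above. The format_date function and the uppercase month abbreviations used for MMM are assumptions, not documented DataRight IQ behavior.

```python
import re
from datetime import date

# Format and delimiter numbers as listed in the Job-file comments above;
# "*" marks where the delimiter goes.
DATE_FORMATS = {
    1: "YYYY*MM*DD", 2: "YY*MM*DD", 3: "DD*MM*YYYY", 4: "DD*MM*YY",
    5: "MM*DD*YYYY", 6: "MM*DD*YY", 7: "DD*MMM*YY", 8: "DD*MMM*YYYY",
    9: "MMM*DD*YYYY", 10: "MMM*DD*YY", 11: "YYYYMMDD", 12: "YYMMDD",
    13: "DDMMYYYY", 14: "DDMMYY", 15: "MMDDYYYY", 16: "MMDDYY",
}
DELIMITERS = {1: "", 2: " ", 3: "/", 4: "-", 5: "\\", 6: "."}
# Assumption: MMM is an uppercase three-letter month abbreviation.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
# Longer tokens come first so YYYY is not read as two YY tokens.
TOKEN = re.compile(r"YYYY|YY|MMM|MM|DD|\*")


def format_date(d, format_number, delimiter_number):
    """Render date d using a Job-file format number and delimiter number."""
    pieces = {
        "YYYY": f"{d.year:04d}",
        "YY": f"{d.year % 100:02d}",
        "MMM": MONTHS[d.month - 1],
        "MM": f"{d.month:02d}",
        "DD": f"{d.day:02d}",
        "*": DELIMITERS[delimiter_number],
    }
    return TOKEN.sub(lambda m: pieces[m.group(0)], DATE_FORMATS[format_number])
```

Formats 11 through 16 contain no delimiter position, so in this sketch the delimiter number has no effect on them.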
Output standardized data
You can output either standardized or unstandardized data from DataRight IQ. To
output standardized data, post output fields (AP.fieldname) to your output file. If
you prefer unstandardized data, post database fields (DB.fieldname), input fields
(PW.fieldname), or output fields that contain parsed, unstandardized data
(APU.fieldname).
For more information about data available for posting, see “Post the data that you
want” on page 86. For a complete list of PW, AP, and APU fields, see Firstlogic’s
Quick Reference for Views and Job-File products.
Chapter 3:
Add new data to existing data
You can augment your existing data by having DataRight IQ add the following
kinds of data:
gender codes (such as “strong female”)
prenames (such as “Ms.,” “Mr.,” or “Mrs.”)
match standards
salutations (such as “Dear Ms. Jones”)
For example, given the input Lori Jones, CEO, DataRight IQ can provide the
following data:
Name               Ms. Lori Jones
Gender             Strong female
Prename            Ms.
Match Standards    LAURA, LOREN
Formal Greeting    Dear Ms. Jones:
Casual Greeting    Dear Lori:
Title Greeting     Dear CEO:
Gender codes
Add precise gender
information
For each name in the record, DataRight IQ assigns a gender code.
You can add gender data to your database, or you can use gender codes to select
records. For example, if you’re mailing a women’s newsletter, you could select
records for women only.
Reliable gender data
The name and gender data that DataRight IQ uses to determine gender are
obtained from a variety of sources and compiled into the parsing dictionary
(parsing.dct). Each name in the dictionary carries a gender value that helps
DataRight IQ determine whether the name is male or female. The gender values
are defined as follows:
Strong male: almost certainly male (94 to 100% of people with this name are male). Example: Robert
Weak male: probably male (70 to 94% of people with this name are male). Example: Terry
Ambiguous: name does not reliably indicate gender (fewer than 70% male, fewer than 70% female). Examples: Pat; a male prename with a female name
Weak female: probably female (70 to 94% of people with this name are female). Example: Lynn
Strong female: almost certainly female (94 to 100% of people with this name are female). Examples: Anne; a female prename with a male name
To assign a gender code, DataRight IQ looks up the gender for the prename and
the first name, then uses gender-assignment rules to assign a gender code.
It is not uncommon for a married woman to combine a female prename with her
husband’s name. DataRight IQ considers the name strong female:
Mrs. Fred Saeger = strong female
If a male prename occurs with a female name, DataRight IQ usually considers the
gender “ambiguous”:
Mr. Terri Smith = ambiguous
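Here is a simplified Python sketch of the gender rules described above. The thresholds come from the gender-value definitions. The function names, the prename table, and the handling of boundary values are hypothetical. The real rules may differ; for example, the guide says a male prename with a female name is only usually treated as ambiguous.

```python
FEMALE = {"STRONG_FEMALE", "WEAK_FEMALE"}
# Hypothetical prename genders; the real values come from the parsing dictionary.
PRENAME_GENDER = {"MR.": "MALE", "MRS.": "FEMALE", "MS.": "FEMALE"}


def gender_from_share(male_percent):
    """Map the percentage of people with a first name who are male to a category."""
    female_percent = 100 - male_percent
    if male_percent >= 94:
        return "STRONG_MALE"
    if male_percent >= 70:
        return "WEAK_MALE"
    if female_percent >= 94:
        return "STRONG_FEMALE"
    if female_percent >= 70:
        return "WEAK_FEMALE"
    return "AMBIGUOUS"


def assign_gender(prename, name_gender):
    """Combine the prename's gender with the first name's gender category."""
    prename_gender = PRENAME_GENDER.get(prename.upper())
    if prename_gender == "FEMALE":
        # e.g. Mrs. Fred Saeger: a female prename with a male name is strong female.
        return "STRONG_FEMALE"
    if prename_gender == "MALE":
        # e.g. Mr. Terri Smith: a male prename with a female name is ambiguous.
        return "AMBIGUOUS" if name_gender in FEMALE else "STRONG_MALE"
    return name_gender
```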
Gender codes and
how to retrieve them
DataRight IQ assigns a gender code for the first six names found in a record. You
can retrieve gender codes from the output field AP.Gender.
1 DRL_MALE_STRONG: Strong male. (High confidence that the person is male. That is, the name belongs to someone who is almost certainly a male.) Example: John
2 DRL_MALE_WEAK: Weak male. (Some confidence that the person is male. That is, the name belongs to someone who is probably male.) Example: Adrian
3 DRL_AMBIGUOUS: Ambiguous. (The name does not reliably indicate a gender. The name could be either male or female.) Example: Pat
4 DRL_FEMALE_WEAK: Weak female. (Some confidence that the person is female. That is, the name belongs to someone who is probably a female.) Example: Lynn
5 DRL_FEMALE_STRONG: Strong female. (High confidence that the person is female. That is, the name belongs to someone who is almost certainly a female.) Example: Mary
6 DRL_MULTI_MIXED: Multiple names, at least one male and at least one female (no ambiguous or unassigned genders). Example: John and Mary
7 DRL_MULTI_NAMES_MALE: Multiple names, all male. Example: John and Adrian
8 DRL_MULTI_NAMES_FEMALE: Multiple names, all female. Example: Mary and Lynn
9 DRL_MULTI_NAMES_AMBIGUOUS: Multiple names, at least one ambiguous (but none unassigned). Example: William and Pat
0 DRL_UNASSIGNED: Unassigned. (The first name could not be found in the dictionary and any prename is gender-neutral.) Example: PVT first CL Jkiloji Smith
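The gender codes listed above map naturally onto an enumeration. In this Python sketch, the constant names and values come from the list. The record_gender function is an inferred, hypothetical summary of the descriptions: it derives a multiname code from the codes of the individual names. Its handling of unassigned names in a multiname record is also an assumption.

```python
from enum import IntEnum


class GenderCode(IntEnum):
    """Gender codes returned in AP.Gender, as listed above."""
    DRL_UNASSIGNED = 0
    DRL_MALE_STRONG = 1
    DRL_MALE_WEAK = 2
    DRL_AMBIGUOUS = 3
    DRL_FEMALE_WEAK = 4
    DRL_FEMALE_STRONG = 5
    DRL_MULTI_MIXED = 6
    DRL_MULTI_NAMES_MALE = 7
    DRL_MULTI_NAMES_FEMALE = 8
    DRL_MULTI_NAMES_AMBIGUOUS = 9


MALE = {GenderCode.DRL_MALE_STRONG, GenderCode.DRL_MALE_WEAK}
FEMALE = {GenderCode.DRL_FEMALE_STRONG, GenderCode.DRL_FEMALE_WEAK}


def record_gender(codes):
    """Summarize the per-name codes for one record (hypothetical logic)."""
    if not codes:
        return GenderCode.DRL_UNASSIGNED
    if len(codes) == 1:
        return GenderCode(codes[0])
    if GenderCode.DRL_UNASSIGNED in codes:
        # Assumption: the list defines no multiname code for this case.
        return GenderCode.DRL_UNASSIGNED
    if GenderCode.DRL_AMBIGUOUS in codes:
        return GenderCode.DRL_MULTI_NAMES_AMBIGUOUS
    if all(c in MALE for c in codes):
        return GenderCode.DRL_MULTI_NAMES_MALE
    if all(c in FEMALE for c in codes):
        return GenderCode.DRL_MULTI_NAMES_FEMALE
    return GenderCode.DRL_MULTI_MIXED
```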
Prenames
Add prenames as
separate components
or in name lines
If DataRight IQ can assign a strong or weak gender to a name, it can also assign a
prename. For example, given John Smith as input, DataRight IQ can assign the
prename Mr. You can add prenames to your database, either in a separate field or
as part of a name line.
You can retrieve prenames as separate components or as part of a name line (up
to six names).
Input          Output
John Smith     Prename: Mr.
               Name: Mr. John Smith
If the input does not include a prename, DataRight IQ assigns a prename based on
gender. DataRight IQ offers two options so that you can get the prename results
you want.
In Job, find the parameters in the Standardization/Assignment Control block:
Use Generated Prenames (Y/N)......... = Y
Female Prename Assignment (MS/MRS)... = MS
In Views, find the options in the Standardization Style window.
When and how
DataRight IQ assigns
prenames
If the input does not include a prename, DataRight IQ assigns a prename based on
its assigned gender code. If the gender is strong or weak, DataRight IQ assigns
the appropriate prename (Mr. or Ms.). To avoid offending anyone, it does not
assign a prename if the gender is ambiguous.
Input               Gender assigned    Output prename
Chuck Strempler     Strong male        Mr.
Adrian Mileau       Weak male          Mr.
Pat O’Malley        Ambiguous          (none)
Abbie Van Buren     Weak female        Ms.
Gladys Saeger       Strong female      Ms.
When the input includes a prename, DataRight IQ always carries over that
prename to the output, even if the gender of the first and middle names suggests a
different prename.
Input               Output prename
Ms. Glenn Close     Ms.
Mr. Stacey Keach    Mr.
Mrs. Gerald Ford    Mrs.
Female prename
Normally, for a female name DataRight IQ assigns the prename Ms. For example,
given Mary Smith as input, DataRight IQ produces Ms. Mary Smith as output.
However, in some situations you may prefer to assume marriage and assign the
prename Mrs. You can tell DataRight IQ to use the prename Mrs. for input name
lines such as John and Mary Smith:
Input                   Output
John and Mary Smith     Mr. John and Mrs. Mary Smith
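The prename rules in this section can be sketched in Python as follows. The assign_prename function is hypothetical. Its female_prename argument stands in for the Female Prename Assignment (MS/MRS) parameter. The sketch assumes that the MRS setting applies to any female name without an input prename, although the guide's example shows only a dual-name line.

```python
MALE = {"STRONG_MALE", "WEAK_MALE"}
FEMALE = {"STRONG_FEMALE", "WEAK_FEMALE"}


def assign_prename(input_prename, gender, female_prename="MS"):
    """Choose the output prename for one name (hypothetical helper).

    female_prename mirrors the Job parameter Female Prename Assignment (MS/MRS).
    """
    if input_prename:
        # An input prename is always carried over, even if it conflicts
        # with the gender of the first and middle names.
        return input_prename
    if gender in MALE:
        return "Mr."
    if gender in FEMALE:
        return "Mrs." if female_prename == "MRS" else "Ms."
    return ""  # ambiguous or unassigned: no prename is assigned
```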
Prenames in output
name lines
Include prenames
To include the generated prenames in your output file’s name lines, include these
output fields in your output file setup:
AP.Name_Line1-6
AP.Name1-6
Then set the Use Generated Prenames parameter to Y in Job:
BEGIN Standardization/Assignment Control ======================
Standardize Lastline (Y/N)........... = Y
Non-Mailing Cities (CONVERT/PRESERVE) = PRESERVE
Case (UPPER/lower/Mixed/SAME)........ = Mixed
Use Generated Prenames (Y/N)......... = Y
*******The remaining block omitted for illustration********
In Views, check the Include Generated Prenames in Name Lines option in the
Standardization Style window.
Here’s a sample of the output:
Input                  Output
Anne Smith, Engr.      Ms. Anne Smith, Engr.
Exclude prenames
You can exclude the generated prenames from your output file’s name lines by
entering N in the Use Generated Prenames parameter in Job. In Views, clear the
Include Generated Prenames in Name Lines option in the
Standardization Style window.
Here’s a sample of the output without the generated prename:
Input                  Output
Anne Smith, Engr.      Anne Smith, Engr.
For complete step-by-step instructions, refer to the Views online help.
Salutations
Include a salutation in your correspondence. DataRight IQ generates a salutation
for each individual in the record, and it also generates a salutation for the entire
record. Here are some of the salutation features in DataRight IQ:
Create formal salutations such as Dear Mr. McKay.
Create casual salutations such as Dear Dan.
Customize the greeting word (such as Dear or Hello) and the ending
punctuation.
Use alternate greetings under certain circumstances. For example, if the name
is absent you can use a title greeting such as Dear VP of Sales.
You can set the salutation options in both Views and Job. In Views, set up
salutations in the Salutation Options window. In Job, set up salutations in the
Salutation Options block:
BEGIN Salutation Options ====================
Salutation Format (FORMAL/CASUAL).... = FORMAL
Use Title Salutation if No Name (Y/N) = N
Salutation Initiator................. = DEAR
Salutation Punctuation............... = ,
Salutation Connector................. =
Short Greeting for All Male Record... = SIRS
Short Greeting for All Female Record. = LADIES
Alternate Salutation................. =
Alternate Salutation Threshold(1-100) =
END
DataRight IQ uses
sophisticated logic
DataRight IQ uses sophisticated logic to create high-quality salutations. It does
not follow a simplistic pattern of “Dear <Name>”. Instead, it looks at all of the
name elements that are present and produces the best possible salutation.
For example, given the input J. Ewing, DataRight IQ never produces the
undesirable salutation Dear J. Instead, DataRight IQ detects that the first name is
an initial and produces the salutation Dear J. Ewing.
Find descriptions of these salutation options on the following pages:
formal or casual
customized
multiname
alternate for absent or low-scoring names
Choose formal or
casual salutations
DataRight IQ creates salutations for all six names found in the record. You can
choose formal salutations such as Dear Mr. Shakespeare or casual salutations
such as Dear William.
Formal salutations
For formal salutations, DataRight IQ uses the prename and last name whenever
possible. DataRight IQ uses the input prename or the DataRight IQ-generated
prename:
Input name        Gender assigned    Formal salutation
Mary Smith        Strong female      Dear Ms. Smith,
Robert Jones      Strong male        Dear Mr. Jones,
Casual salutations
For casual salutations, DataRight IQ generally uses the first name:
Input name        Casual salutation
Alex Shaker       Dear Alex,
If the first name is an initial only, DataRight IQ does not generate a salutation
such as Dear J. Instead, DataRight IQ uses the first initial and the last name:
Input name        Casual salutation
J. R. Ewing       Dear J. Ewing
If no first name exists, DataRight IQ uses the prename and last name:
Input name        Casual salutation
Mr. Smith         Dear Mr. Smith,
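Here is a minimal Python sketch of the single-name formal and casual salutation rules. The salutation function and its parameters are hypothetical. The initiator and punctuation arguments loosely mirror the Salutation Initiator and Salutation Punctuation options, and the sketch applies the punctuation uniformly to every salutation.

```python
def _is_initial(name):
    """True for a bare initial such as "J" or "J."."""
    return len(name.rstrip(".")) == 1


def salutation(first, last, prename="", style="FORMAL",
               initiator="Dear", punctuation=","):
    """Build a salutation for one person (hypothetical helper)."""
    if style == "FORMAL" and prename and last:
        body = f"{prename} {last}"          # Dear Ms. Smith
    elif style == "CASUAL" and first and not _is_initial(first):
        body = first                         # Dear Alex
    elif first and last:
        body = f"{first} {last}"             # Dear J. Ewing (never Dear J.)
    elif prename and last:
        body = f"{prename} {last}"           # no first name: Dear Mr. Smith
    else:
        return None                          # not enough name data
    return f"{initiator} {body}{punctuation}"
```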
Create the salutations
you want
DataRight IQ creates a salutation for each name in the record (up to six). You can
decide what initiator word, connector word and ending punctuation to use in your
salutations.
For example, in the salutation Dear Mr. Smith and Ms. Jones: the initiator is Dear,
the connector is and, and the punctuation is the colon (:).
You can create salutations in any style you want. For example, you could create
salutations such as Hello, Bob & Mary!
Create multiname salutations
For each record, DataRight IQ generates a salutation for the entire record, which
you can retrieve from the output field AP.Salute_Rec. The record salutation is a
greeting for all of the persons in the record.
Consistency within each salutation
DataRight IQ attempts to use the same style for all the names in the record. For
example, for a formal salutation, if a prename is unavailable for one or more
names in the record, DataRight IQ uses first names and last names for all persons.
This results in consistency within the salutation:
Input                          Formal salutation
John Smith and Pat Jones       Dear John Smith and Pat Jones,
                               (not Dear Mr. Smith and Pat Jones)
Formal salutations for dual names
For dual names, the formal salutation uses a shared last name:
Input                          Formal salutation
John and Mary Smith            Dear Mr. and Ms. Smith,
You can retrieve dual-name salutations from the output field AP.Dual_Salut.
Short greetings
If a record contains multiple names of the same strong gender, you can generate
short formal salutations such as Dear Sirs:
Input names                    Formal greeting
Robert Jones and Bob Smith     Dear Sirs:
Mary Smith and Jane Hammond    Dear Ladies:
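The multiname rules can be sketched in Python. Person, record_salutation, and the gender strings are hypothetical. The male_greeting and female_greeting arguments stand in for the Short Greeting for All Male Record and Short Greeting for All Female Record options. The sketch covers formal salutations only.

```python
from typing import NamedTuple


class Person(NamedTuple):
    prename: str  # input or generated prename; "" if none was assigned
    first: str
    last: str
    gender: str   # e.g. "STRONG_MALE", "AMBIGUOUS"


def record_salutation(people, initiator="Dear", connector="and",
                      punctuation=",", short_greetings=False,
                      male_greeting="Sirs", female_greeting="Ladies"):
    """Formal salutation for every person in a record (hypothetical helper)."""
    genders = {p.gender for p in people}
    if short_greetings and len(people) > 1:
        if genders == {"STRONG_MALE"}:
            return f"{initiator} {male_greeting}{punctuation}"
        if genders == {"STRONG_FEMALE"}:
            return f"{initiator} {female_greeting}{punctuation}"
    joiner = f" {connector} "
    if all(p.prename for p in people):
        if len(people) > 1 and len({p.last for p in people}) == 1:
            # Dual name with a shared last name: Mr. and Ms. Smith
            body = joiner.join(p.prename for p in people) + " " + people[0].last
        else:
            body = joiner.join(f"{p.prename} {p.last}" for p in people)
    else:
        # Keep one style: if any prename is missing, use first and last names for all.
        body = joiner.join(f"{p.first} {p.last}" for p in people)
    return f"{initiator} {body}{punctuation}"
```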
Sometimes name data is missing or a name receives a low parsing-confidence
score. In these situations, you can generate alternate salutations.
Title salutation
If a person’s job title is present but name data is absent, you can use the job title in
the salutation:
Input                  Salutation
Name1: (none)          Dear Sales Mgr.,
Title1: Sales Mgr.
Alternate salutation for low-scoring names
If a name receives a low parsing-confidence score, you may prefer to use a
generic salutation rather than use questionable name data. For low-scoring names,
you can use a generic salutation such as Dear Valued Customer or Dear Sports Fan.
For more information about parsing-confidence scores, see “Data-quality scores
and codes” on page 165.
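Here is a Python sketch of how an alternate salutation might be chosen. The choose_salutation function is hypothetical. Its use_title_if_no_name, alternate, and threshold arguments loosely mirror the Use Title Salutation if No Name, Alternate Salutation, and Alternate Salutation Threshold(1-100) options. The sketch assumes that a name scoring below the threshold gets the alternate greeting.

```python
def choose_salutation(name_salutation, title="", score=100, threshold=None,
                      alternate="Dear Valued Customer,", use_title_if_no_name=True,
                      initiator="Dear", punctuation=","):
    """Pick the salutation to output (hypothetical helper).

    name_salutation is the salutation built from the name, or None when the
    record has no name. score is the name's parsing-confidence score (1-100).
    """
    if name_salutation is None:
        if use_title_if_no_name and title:
            return f"{initiator} {title}{punctuation}"   # Dear Sales Mgr.,
        return alternate
    if threshold is not None and score < threshold:
        return alternate   # questionable name data: use the generic greeting
    return name_salutation
```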
Chapter 4:
Custom dictionaries
A custom dictionary can help DataRight IQ work better with your data. You can
create custom dictionaries to improve casing and parsing results.
This chapter explains how to create your own custom dictionaries and how they
improve casing and parsing results.
Create a custom dictionary with UMD
To create a custom dictionary, use the User-Modifiable Dictionary program
(UMD) installed with DataRight IQ.
UMD can help you create both a custom capitalization dictionary and a custom
parsing dictionary. For instructions on using UMD, see the DataRight IQ Modifier’s Guide or the online help that accompanies UMD Views.
We recommend that you create a separate custom capitalization dictionary and
use it in addition to the capitalization dictionary that comes with the software,
pwcap.dct.
Important: Each time you install a software update, we overwrite the file
pwcap.dct. To protect your work, give your dictionary a name other than
pwcap.dct. This will prevent your dictionary from being overwritten the
next time that you install an update of the software.
For Views users: The easiest way to create a custom capitalization dictionary is
to use the Capitalization Wizard. Start the Wizard by choosing Tools >
Capitalization Wizard.
For Job users: Create a database containing your dictionary entries. Then, use
UMD to convert the database to a capitalization dictionary. For instructions about
what fields to include in your database, what data to enter in each field, and how
to convert the database to a dictionary, see the DataRight IQ Modifier’s Guide.
How to use your
custom dictionary
You can specify two capitalization dictionaries. We recommend that you use our
capitalization dictionary, pwcap.dct, as dictionary #1 and your custom dictionary
as dictionary #2.
Duplicates in dictionaries? If both dictionaries contain the same word,
DataRight IQ uses the entry from dictionary 2.
The way you specify your custom dictionary depends on how you implemented
your DataRight IQ product:
In DataRight IQ Library, to use your custom dictionary you specify the path
and file name of the dictionary file.
In Job, when you set up your job file, tell DataRight IQ to use your
capitalization dictionary in addition to the standard dictionary. In the
Auxiliary Files section, list pwcap.dct as Capitalization Dictionary 1 and
your custom dictionary as Capitalization Dictionary 2.
BEGIN Auxiliary Files =========================================
*************Portions omitted for illustration************
*************Portions omitted for illustration************
END
In DataRight IQ Views, you specify your dictionaries in the Auxiliary Files
window, in the Dictionaries group.
Processing speed. Processing is a little slower when DataRight IQ consults
two capitalization dictionaries. The difference in speed depends on a number
of factors, but expect processing to take about one percent longer
than it would with one capitalization dictionary.
Improve casing results
Mixed-case results
If you use mixed case, the general rule is to capitalize the first letter of the word
and put the rest of the word in lowercase. However, there are exceptions to that
rule, such as McDonald, Ph.D., IBM, NY, and so on.
How DataRight IQ capitalizes in mixed case
To handle mixed-case exceptions, DataRight IQ consults a capitalization
dictionary. DataRight IQ includes a capitalization dictionary called pwcap.dct,
which contains mixed-case exceptions.
A capitalization dictionary is a list of words that are mixed-case exceptions. The
dictionary contains the correct casing of a word and also indicates when that
casing should be used.
For example, the capitalization dictionary has entries for both MS and CO:
Dictionary entry    Usage
MS                  POSTNAME
CO                  STATE
The word MS is cased differently depending on how it is used: MS as an
abbreviation for the postname “Master of Science,” or Ms as a prename. So, the
entry in the capitalization dictionary indicates that MS (uppercase “S”) should be
used only for postname data.
The word CO is cased differently depending on how it is used: CO as an
abbreviation for the state of Colorado, but Co as an abbreviation for the word
company. So, the dictionary entry indicates that CO (uppercase “O”) should be
used only for state data.
Improve mixed-case results
Most DataRight IQ users find that our capitalization dictionary (pwcap.dct) is
sufficient for producing good mixed-case results. However, it would be
impossible for our capitalization dictionary to contain every mixed-case
exception. If DataRight IQ does not case a word as you would like, you can create
a custom capitalization dictionary.
For example, TechTel is not in our capitalization dictionary, so DataRight IQ
capitalizes only the first letter of the word:
Input             Output
TECHTEL, INC.     Techtel Inc.
If you add the word TechTel to your custom capitalization dictionary, you can get
the desired mixed-case results:
Input             Output
TECHTEL, INC.     TechTel Inc.
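The two-dictionary setup and the TechTel example can be sketched in Python. The dictionaries here are small hypothetical stand-ins, not the contents of pwcap.dct. The function names are invented, and usage restrictions such as POSTNAME are ignored. Punctuation is left untouched, so TECHTEL, INC. becomes TechTel, Inc. rather than the standardized TechTel Inc.

```python
import re


def merge_dictionaries(*dictionaries):
    """Merge capitalization dictionaries; a later dictionary wins on duplicates."""
    merged = {}
    for entries in dictionaries:
        merged.update({word.upper(): casing for word, casing in entries.items()})
    return merged


def mixed_case_text(text, casing):
    """Apply mixed case to each run of letters, using dictionary exceptions."""
    def fix(match):
        word = match.group(0)
        return casing.get(word.upper(), word[:1].upper() + word[1:].lower())
    return re.sub(r"[A-Za-z]+", fix, text)
```

Listing the standard dictionary first and the custom dictionary second reproduces the rule that an entry in dictionary 2 overrides the same word in dictionary 1.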
Improve parsing results
DataRight IQ uses a parsing dictionary to guide parsing, gender assignment, and
standardization. You can create a custom dictionary to improve parsing results for
your specific data.
Customizing your dictionary. For details on adding or editing dictionary
entries, see the DataRight IQ Modifier’s Guide or UMD online help.
How DataRight IQ
uses the dictionary
The parsing dictionary lists words and phrases and tells how they’re used. For
example, the parsing dictionary tells DataRight IQ that the word Engineering can
be used as a firm word (Smith Engineering Inc.) or in a job title (VP of Engineering).
DataRight IQ gets other information from the dictionary, too:
The dictionary contains gender data. For example, the dictionary tells
DataRight IQ that the name Anne is a female name, and that Mr. is a male
prename.
The dictionary also tells DataRight IQ the standard and acronym forms of
words. For example, the dictionary indicates that Inc. is the standard form of
Incorporated and that GM is the acronym for General Motors.
The dictionary contains match standards. For example, the dictionary tells
DataRight IQ that the names Patricia and Patrick are potential matches for
the name Pat.
Correct specific parsing behavior
You might customize the parsing dictionary to correct specific parsing behavior
that you have seen in your output.
Local names
DataRight IQ’s name data is based on an analysis of U.S. residents. As such, the
parsing dictionary is broadly useful across the United States. However, you may
want to tailor the dictionary to better suit your data by adding ethnic or regional
names. If DataRight IQ doesn’t recognize a specific name (for example, Jinco
Xandru), you can add Jinco to the dictionary as a first name and Xandru as a last
name.
Industry-specific
jargon
Our parsing dictionary is useful across many industries. You might tailor the
dictionary to better suit your own industry by adding special titles, prenames or
postnames, acronyms, or other jargon words.
For example, if you process data for the real estate industry, you might add
industry-specific postnames such as CRS, ABR, and GRI.
Specific phrases
Some words can be used in firm names and in job titles. As a result, DataRight IQ
may incorrectly parse some job titles as firm names. To improve parsing, you can
add phrases to the dictionary.
Firm names
containing personal
names
Often a firm name is made up of personal names. As a result, DataRight IQ may
incorrectly parse the firm as a personal name. For example, the catalog retailer J. Crew may be parsed as a personal name rather than as a firm.
To improve parsing, you can add multiple-word firm names to the dictionary. For
example, to parse J. Crew as a firm rather than as a personal name, you could add J. Crew to the dictionary as a firm name.
Create a custom
dictionary
To create a custom parsing dictionary, use our User-Modifiable Dictionaries
(UMD) program. For instructions on using UMD, see the DataRight IQ Modifier’s Guide or UMD Views online help.
Note: If you add a word to a parsing dictionary and that word also has special
mixed casing (such as TechTel), remember to also add the word to your
custom capitalization dictionary.
Unit 2:
DataRight IQ Job and Views
Contents
Job file
Views
Library
Set up a DataRight IQ job              49
Overview of data parsing               55
Details of data parsing                65
Process parsed data                    81
Search-and-replace                     89
Scan-and-split                         107
Input and output                       119
Reports and statistics                 141
Data-quality scores and codes          165
Master job file                        183
The chapters in this unit mostly apply to Job and Views users. If
you use the Library implementation of DataRight IQ, there is a
section in “Details of data parsing” on page 65 that explains some
parsing features exclusive to Library. For complete Library
information, see “Welcome to DataRight IQ Library” on
page 247.
Chapter 5:
Set up a DataRight IQ job
For almost every DataRight IQ job that you run, you need to perform some basic
setup tasks. These tasks include setting up your input files, auxiliary files, and
output files.
You also need to set your preferences for standardizing data and adding
salutations. In addition, you need to set up your reports and verify that your job is
ready.
Set up your input files
DataRight IQ can accept up to 255 input files for each job. You need to tell
DataRight IQ where to find your input files and how to read them.
Almost everything you need to know about setting up input files is explained in
Firstlogic’s Database Prep documentation. For detailed how-to instructions on
setting up your input files, see that documentation.
Tell DataRight IQ the format of the file
For flat files and some types of databases, you have to tell DataRight IQ the
physical format of the input file. To do this, create a format file (also known as an
FMT or DMT file). Our Database Prep manual explains how to set up a format
file.
Define the input fields
DataRight IQ recognizes a specific set of input fields called PW fields. If you
want DataRight IQ to process the data in a field, you can do one of two things:
Use the Modify PW Field parameter (see the Input File block).
Map that field to one of the DataRight IQ-recognized PW fields. To map
fields, you must create a separate file called a definition file, or DEF file. In
Views, you can create a DEF file using the DefMap tool in the Tools menu.
For instructions on setting up a DEF file, see our Database Prep manual. For a
description of each PW field, see our Quick Reference. For more guidelines about
setting DataRight IQ’s input fields, see “Define input fields” on page 51.
Tell DataRight IQ where the file is located
Most of the work of setting up your input files is done outside of DataRight IQ.
Inside your DataRight IQ job file, the only thing you really need to do is provide
the location and file name in the Input File section of your job.
Define input fields
Setting up input fields is an extremely important part of setting up your DataRight
IQ job. The more accurately you set up the fields in your definition (DEF) file,
the better DataRight IQ’s output parsing results will be.
For a complete list and descriptions of the DataRight IQ input fields, see our
Quick Reference.
Define fields as precisely as possible
The DEF file tells DataRight IQ what kind of data each input field contains. The
more information you give DataRight IQ about the contents of a field, the better
your parsing results will be.
Define fields accurately
When you define an input field, make sure that you account for all of the different
kinds of data that occur in the field. If the content of the field varies from record
to record, make sure that your field definition accounts for this.
A common mistake is to define a field based on its field name rather than on the
actual contents of the field. This method often leads to inaccurate field
definitions, which lead to inaccurate output results.
Browse data before defining fields
Field names can be deceiving. Before you write a DEF file, look at the data in
your input database. It is much quicker to browse the input database and write an
accurate DEF file than it is to rerun an entire job because of a mistake in the DEF
file.
Prepare for floating data
Usually you can present a floating field to DataRight IQ as a multiline or name/
firm field. However, sometimes a database floats between two completely
different record formats. This usually results when someone appends one
database to another without realizing that the two files have different formats.
The result is a multiformat file.
Multiformat files require special setup; see “Straighten out a file that has multiple
record formats” on page 121.
Define input name format
Name format is the sequence of name components in a name line, for example,
First-Middle-Last or Last-First-Middle. If the name format is consistent
throughout your database, you get better name-parsing results if you tell
DataRight IQ the name format.
For example, given the name Thomas Todd, DataRight IQ can correctly identify
the first name and last name if you tell it the name format. However, if DataRight
IQ does not know the name format, it cannot tell which is the first name and which
is the last: is the first name Thomas or Todd?
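Here is a Python sketch of why a known name format resolves the Thomas Todd ambiguity. The split_name function and the format labels FIRST_MIDDLE_LAST and LAST_FIRST_MIDDLE are hypothetical. The sketch does not model how DataRight IQ handles an unknown name format, which relies on dictionary data.

```python
def split_name(name, name_format):
    """Split a name line using a known name format (hypothetical helper).

    Supports "FIRST_MIDDLE_LAST" and "LAST_FIRST_MIDDLE"; needs at least two words.
    """
    words = name.replace(",", " ").split()
    if len(words) < 2:
        raise ValueError("need at least a first and a last name")
    if name_format == "FIRST_MIDDLE_LAST":
        first, *middle, last = words
    elif name_format == "LAST_FIRST_MIDDLE":
        last, first, *middle = words
    else:
        raise ValueError(f"unknown name format: {name_format}")
    return {"first": first, "middle": " ".join(middle), "last": last}
```

The same input, Thomas Todd, yields a different first name under each format. This is why telling DataRight IQ a consistent format improves results.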
Chapter 5: Set up a DataRight IQ job
Set up auxiliary files
To identify and process data, DataRight IQ depends on a set of auxiliary files. All you have to do is tell DataRight IQ where the files are located on your computer.
About the files
Here’s a brief description of how DataRight IQ uses these files.

Required files
File                    Name          Type             Description
Parsing                 parsing.dct   Dictionary file  DataRight IQ’s parsing information (name, firm, and title).
Address                 addrln.dct    Dictionary file  DataRight IQ’s address line information.
Last line               lastln.dct    Dictionary file  DataRight IQ’s last line information.
Firm line               firmln.dct    Dictionary file  DataRight IQ’s firm (company) information.
City                    city08.dir    Directory file   DataRight IQ’s city information.
ZCF                     zcf08.dir     Directory file   DataRight IQ’s U.S. ZIP Code information.
Capitalization          pwcap.dct     Dictionary file  DataRight IQ’s capitalization information.

Optional files
File                    Name          Type             Description
Custom capitalization   [name].dct    Dictionary file  Your own additional capitalization information.
Default ASCII FMT       [name].fmt    Format file      DataRight IQ’s default ASCII format.
Default DEF             [name].def    Definition file  DataRight IQ’s default.

DataRight IQ User’s Guide
Set up your output files
Although it is possible to post data back to your input file, most DataRight IQ
users post records to an output file or files. When you set up a new output file,
you must perform two separate but equally important tasks:
• Set up the format of the output file.
• Specify what data to put in each output field.
Create the format of the output file
If you create a new database for output, you must first define the format of that
new file—the file type, the sequence of fields, field names, field lengths, and so
on.
You can create the format of an output file in two ways: You can manually define
the format of the file, or you can clone the format of an existing file. For more
information about output format, see “Specify the output format you want” on
page 124.
Post data into the output fields
After you define the format of the output file, you must specify the content—what
information will be posted into each output field.
You can post raw data from the input file, data processed by DataRight IQ, and
new data generated during processing.
Enable output posting
If you want to post data to an output file, make sure you enable output posting in
the Execution section of the job file. If output posting is disabled, DataRight IQ
ignores your instructions for creating and posting to output files.
Verify that your job is ready
Before you can run your DataRight IQ job, you must verify that it’s ready. When
DataRight IQ verifies a job, it makes sure that you have provided all of the
information required to run the job. DataRight IQ also makes sure that all your
job settings are valid.
Verifier messages
During verification, DataRight IQ may issue two types of messages:
Message type   Description
Error          If DataRight IQ finds a problem that would prevent the job from running, it gives an error message.
Warning        If DataRight IQ finds a less serious problem, it gives a warning message. This means there is a possibility the job will produce unexpected results.
Batch verifier
When you start the batch-processing version of DataRight IQ (DataRight IQ Job), you do so by typing a command line at your operating-system prompt. Below is an example.
fldiq /v /lmy_job c:\pw\dtr_iq\jobs\my_job.diq
DataRight IQ automatically starts verifying that the job is ready for processing. It
stops on the first serious error. After you correct the error, you start the
verification process all over again.
Views verifier
DataRight IQ Views offers a handy way to verify jobs. Views can find and
present more than one error at a time. You can jump directly to a trouble spot by
selecting an error or warning message and clicking the Go To button.
Chapter 6:
Overview of data parsing
You can use DataRight IQ to identify and isolate various data and data components from fields of information. We call this parsing.
What DataRight IQ parses
DataRight IQ can identify and isolate various data and data components from
fields of information, including:
• street or “land” addresses and last lines
• e-mail addresses
• Social Security numbers
• dates
• U.S. and international phone numbers
• user-defined patterns
• names and titles of persons
• firms
For a complete list of the input fields that DataRight IQ accepts, see Firstlogic’s
Quick Reference documentation. For a more detailed description about how
DataRight IQ parses each type of data, see the next chapter.
DataRight IQ uses a parsing dictionary and word patterns
To identify and parse data, DataRight IQ uses a parsing dictionary and then
examines words and word patterns. It then isolates data into different types of
fields.
Dictionary lookups
As DataRight IQ processes each field, it breaks the field into words. DataRight IQ looks up each word in a parsing dictionary (parsing.dct). The parsing dictionary helps DataRight IQ determine what type of data each word might be.
For example, the parsing dictionary tells DataRight IQ that the word Engineering could be part of a job title, firm name, or firm location.
Word patterns
After DataRight IQ performs dictionary lookups, it looks at the position and sequence of words in the field. DataRight IQ examines word patterns to help identify data components.
For example, the word Techtel is not in the parsing dictionary. However, if an input field contains Techtel Inc., DataRight IQ can correctly identify it as a firm name, because DataRight IQ knows that a word (or words) followed by Inc. is a firm name.
Isolate data components for output
After DataRight IQ identifies individual data elements, it isolates each component. For example, if given Techtel Inc., Dept. of Engineering as input, DataRight IQ can isolate Techtel Inc. as the firm name and Dept. of Engineering as the firm location.
Input:           Techtel Inc., Dept. of Engineering
Firm Name:       Techtel Inc.
Firm Location:   Dept. of Engineering
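The lookup-then-pattern sequence described above can be sketched in Python as follows. The dictionary entries, type names, and the two rules are hypothetical stand-ins chosen to reproduce the Techtel example; the real parsing.dct and rule file are far richer.

```python
# Hypothetical, tiny stand-in for parsing.dct: word -> possible data types.
PARSING_DICT = {
    "ENGINEERING": {"TITLE", "FIRM_NAME", "FIRM_LOCATION"},
    "DEPT.": {"FIRM_LOCATION"},
    "INC.": {"FIRM_DESIGNATOR"},
}


def lookup(word):
    """Dictionary lookup: the set of types a word might be (empty if unknown)."""
    return PARSING_DICT.get(word.upper(), set())


def parse_firm_field(field):
    """Split a firm field into firm name and firm location using two toy rules."""
    result = {"firm_name": None, "firm_location": None}
    for segment in (s.strip() for s in field.split(",")):
        words = segment.split()
        if not words:
            continue
        # Word-pattern rule: one or more words followed by a designator such as Inc.
        if len(words) >= 2 and "FIRM_DESIGNATOR" in lookup(words[-1]):
            result["firm_name"] = segment
        # Dictionary-driven rule: a segment that starts with a location word such as Dept.
        elif "FIRM_LOCATION" in lookup(words[0]):
            result["firm_location"] = segment
    return result


print(parse_firm_field("Techtel Inc., Dept. of Engineering"))
```

Note that Techtel itself never appears in the toy dictionary; it is recognized only because of its position before Inc., which mirrors the word-pattern behavior described in the text.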
DataRight IQ uses rule-based parsing
One of the ways that you can modify DataRight IQ’s behavior is by creating or
editing parsing rules in the rule file (drlrules.dat). The rule file controls how
DataRight IQ parses name and firm data.
Rule-file organization
The rule file has a straightforward organization. It consists of a header, explanatory information, and parsing rules grouped by data type.
The header
The file’s header identifies the DataRight IQ rule file. You must not alter or delete the header.
DRL Rule File v1.0;
# DO NOT EDIT, MODIFY OR REMOVE THE ABOVE LINE!!!!!
#
Explanatory information
You can add notes and explanations for rules in your rule file. Each note line must begin with a pound sign (#) so that it is treated as a comment.
# The following types will be used in each example
#
# NAME_DESIGN = ATTN
# PRENAME = MR
# NAME_STRONG_FN = JOHN
Parsing rules by data type
Finally, the rule file contains rules grouped by data type. Groups of rules include:
• Name rules
• Dual name rules
• Firm rules
• Address Line rules
• Last Line rules
• Optional rules, not enabled by default
#######################################################
# NAME RULES #
# #
#######################################################
Modifying the rule file
The rule file has hundreds of rules for many different possible combinations of data. These rules will likely satisfy the parsing needs of most DataRight IQ users. However, you can add to or change the rule file. Refer to the DataRight IQ Modifier’s Guide for details.
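If you keep a customized copy of drlrules.dat, a small pre-flight check like the following Python sketch can confirm that the header line is still intact and separate comment lines from everything else. The function name is hypothetical, and because the rule syntax itself is not shown here, non-comment lines are only collected, not interpreted.

```python
EXPECTED_HEADER = "DRL Rule File v1.0;"


def split_rule_file(text):
    """Check the protected header, then separate comment lines from other lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != EXPECTED_HEADER:
        raise ValueError("rule file header is missing or has been altered")
    comments, other = [], []
    for line in lines[1:]:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith("#"):
            comments.append(stripped)
        else:
            other.append(stripped)  # rule lines; their syntax is not interpreted here
    return comments, other


sample = (
    "DRL Rule File v1.0;\n"
    "# DO NOT EDIT, MODIFY OR REMOVE THE ABOVE LINE!!!!!\n"
    "#\n"
    "# PRENAME = MR\n"
)
comments, rules = split_rule_file(sample)
print(len(comments), len(rules))  # 3 0
```

You might run such a check on the text of your edited file before starting a job, so that an accidentally altered header is caught early.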
Presumptive parsing
For name and firm information, when the rules don’t apply, DataRight IQ uses
presumptive parsing.
When data is input on a name line (or firm line), DataRight IQ first tries its rule-based parsing with the name (or firm) rules. If the data doesn’t match a rule, and you have activated presumptive parsing in your setup, DataRight IQ uses presumptive parsing to make a best guess. With presumptive parsing, a name or firm is always parsed out of a name or firm line.
Because the input always goes to the rule set first, in some cases the rules match only part of the entry and parse that part out. The remaining data goes to Extra.
Some examples
With rule-based parsing, data that matches a rule is parsed according to that rule.
In the example below, the name line data “xb1wc so34bod2jc” is recognized as junk data when parsing by the rules (the data has numbers and/or no letters in it), so it’s sent to the Extra output field.

Input (on a name line)   Output with rule-based parsing
                         Field        Data
xb1wc so34bod2jc         Extra1       xb1wc so34bod2jc
If you turn presumptive parsing on, the same data is parsed as a first and last name because it came in on a name line.

Input (on a name line)   Output with presumptive parsing
                         Field        Data
xb1wc so34bod2jc         First name   xb1wc
                         Last name    so34bod2jc
If you have a legitimate name in your data (like “john smith” among the same junk data, below), parsing pulls out John as a first name, Smith as a last name, and the rest as Extra, regardless of whether you use rule-based or presumptive parsing, because the data matched a parsing rule.
Input                         Output with presumptive parsing on
                              Field        Data
xb1wc john smith so34bod2jc   First name   John
                              Last name    Smith
                              Extra1       xb1wc so34bod2jc
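The difference between the two modes can be modeled with a toy Python sketch that reproduces the three examples above. The single rule (a known first name followed by an alphabetic word) and the name list are hypothetical stand-ins for DataRight IQ’s dictionary and rule file. Sending any middle words to Extra in presumptive mode is likewise an assumption made for the sketch.

```python
import re

# Hypothetical stand-in for the dictionary knowledge that lets a name rule fire.
KNOWN_FIRST_NAMES = {"john", "mary", "thomas"}
ALPHABETIC = re.compile(r"[A-Za-z]+")


def parse_name_line(line, presumptive=False):
    words = line.split()
    # Rule-based pass (always tried first): a known first name followed by an alphabetic word.
    for i in range(len(words) - 1):
        first, last = words[i], words[i + 1]
        if first.lower() in KNOWN_FIRST_NAMES and ALPHABETIC.fullmatch(last):
            result = {"first": first.capitalize(), "last": last.capitalize()}
            extra = words[:i] + words[i + 2:]
            if extra:
                result["extra"] = " ".join(extra)  # unmatched remainder goes to Extra
            return result
    # No rule matched: presumptive parsing makes a best guess on a name line.
    if presumptive and len(words) >= 2:
        result = {"first": words[0], "last": words[-1]}
        if len(words) > 2:
            result["extra"] = " ".join(words[1:-1])  # assumption for this sketch
        return result
    # Rule-based only: everything is treated as junk and sent to Extra.
    return {"extra": " ".join(words)} if words else {}


print(parse_name_line("xb1wc so34bod2jc"))
print(parse_name_line("xb1wc so34bod2jc", presumptive=True))
print(parse_name_line("xb1wc john smith so34bod2jc"))
```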
Turn presumptive parsing on
In Job, activate presumptive parsing for both name and firm lines in the Parsing
Control box:
BEGIN Parsing Control =============================================
Parsing Mode (NONE/PARSE)............ = PARSE
Presumptive Parse Name Lines......... = Y
Presumptive Parse Firm Lines......... = Y
END
In Views, select the Parsing Setup group in the main window, then open the Parsing Control window. Select the Use Presumptive Parsing for Name Lines and Use Presumptive Parsing for Firm Lines options.
Parse discrete components and lines
Data files contain data that is arranged in many different formats. DataRight IQ accepts input data in several ways:
• discrete components
• data as whole lines
Discrete components
Each field in a discrete component file contains one piece of information. The Prename field contains only a prename and that’s it. There is a field for a first name, last name, postname, and so on, so that the input record looks like this:
Prename         Mr.
First Name      John
Middle Name     A.
Last Name       Smith
Postname        Jr.
Title           Engineer
Firm Name       Firstlogic Inc.
Firm Location   Dept. of Engrg.
These fields are in essence already parsed.
Because of the nature of discrete fields, DataRight IQ does not perform its own
parsing. For example, if you tell DataRight IQ that a piece of data is a first name,
DataRight IQ will not try to parse it as a last name, even if it looks like a last
name.
Data as whole lines
DataRight IQ can accept data as whole lines. The name line contains a prename, first, middle, and last name, postname, and title. The address contains the street address, city, state, and ZIP Code. The e-mail line contains user, host, and domain information.
Name Line   Mr. John A. Smith Jr., Engineer
Firm Line   Firstlogic Inc., Dept. of Engrg.
Address     100 Harborview Plz, La Crosse, WI 54601
e-mail      ja.smith@firstlogic.com
Parse floating data
Sometimes a database contains fields that include different data from record to record. In other words, data floats between fields. DataRight IQ recognizes floating data in several types of fields:
• multiline field
• name/firm field
• “non-addr” field
Multiline
Some data entry systems provide lines where data entry operators can enter any data. There may be little or no consistency about the position of data within these lines. We call this a multiline format.
Here are some guidelines for multiline data:
• You may pass in multiple names on one line, for example, John and Mary Jones, or Bill Johnson and Craig Andrews.
• You may combine name and job-title data on the same line, for example, Robert Smith, Software Engineer.
• Separate firm data from name or title data. Do not pass in firm data on the same line as name or job-title data (for example, John Smith, Engineering Services Inc.). DataRight IQ will usually parse the entire line as a firm name, because personal names commonly occur within firm names.
A multiline field may contain any of the following:
• A name line (may include a job title).
• A job title.
• A firm name or firm location or both.
• An address line.
• City, state, and/or ZIP Code.
• Other data such as a U.S. Social Security number, phone number, date, e-mail address, or data that matches a user-defined pattern.
When DataRight IQ processes a multiline field, it identifies and isolates the data
listed above. Any other data is sent to an output field called AP.Extra.
Name/firm field
A name/firm field is the same as a multiline field except that it never contains address data or any data from other parsers. A name/firm field may contain any of the following:
• a name line (may include a job title)
• a job title
• a firm name or firm location or both
When DataRight IQ processes a name/firm field, it identifies and isolates name,
title, and firm data. Any other data (including address data or data from other
parsers) is sent to an output field called AP.Extra.
Non-address field
A non-address field is the same as a name/firm field in that it does not contain any address data. However, it can contain data from other parsers. A non-address field may contain any of the following:
• any data allowed in a name/firm field
• an e-mail address
• a date
• a phone number
• data that matches a user-defined pattern (UDPM)
• a U.S. Social Security number
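The three floating-field types differ only in which kinds of data they accept. The following Python sketch restates the lists above as data. The kind names are hypothetical labels, and treating unaccepted data in a non-address field as AP.Extra is an assumption based on how the other two field types behave.

```python
# Kinds of data each floating-field type accepts, restated from the lists above.
ALL_KINDS = {"address", "email", "ssn", "date", "phone", "udpm", "name_title", "firm"}

FIELD_TYPE_KINDS = {
    "multiline": ALL_KINDS,
    "name_firm": {"name_title", "firm"},
    "non_address": ALL_KINDS - {"address"},
}


def destination(field_type, data_kind):
    """Return where recognized data of a given kind ends up for a field type.

    Assumption: data a non-address field does not accept goes to AP.Extra,
    as the guide states for multiline and name/firm fields.
    """
    return "parsed" if data_kind in FIELD_TYPE_KINDS[field_type] else "AP.Extra"


print(destination("name_firm", "address"))   # AP.Extra
print(destination("non_address", "email"))   # parsed
```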
DataRight IQ’s multiline parsing order
When input is on a multiline, DataRight IQ parses data in the following order:
Order   Parsed item
1       Street address and last line
2       E-mail address
3       U.S. Social Security number
4       Date
5       Phone number (U.S. or Canadian)
6       Phone number (international)
7       User-defined pattern
8       Name and title
9       Firm
Why order?
The order in which DataRight IQ parses your data is important: if DataRight IQ identifies data as one thing before it can evaluate it as another, you may get unexpected results.
For example, if DataRight IQ identifies a nine-digit number as a U.S. Social Security number, it won’t evaluate that data as a potential international phone number. Likewise, if you set up a custom pattern that looks for 5-digit numbers, anything recognized as a ZIP Code never gets evaluated against your pattern.
When parsing, DataRight IQ looks through each record for different types of data. For each type of data, DataRight IQ makes a separate “pass” through the data. If DataRight IQ finds something on one pass, it extracts that data, and on the next pass it examines only the data that remains.
Once an item is recognized, it is not passed on to the later steps.
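The pass-by-pass extraction described above can be sketched in Python with a few regular expressions run in the documented order (steps 1, 3, 6, and 7). The patterns are deliberately simplistic assumptions, not DataRight IQ’s recognizers, but they show both effects from the text: a nine-digit SSN never reaches the international-phone pass, and a five-digit ZIP Code never reaches a five-digit user-defined pattern.

```python
import re

# Toy recognizers in multiline parsing order; the regexes are illustrative only.
PASSES = [
    ("last_line_zip", re.compile(r"\b\d{5}(?:-\d{4})?\b")),        # step 1 (ZIP part only)
    ("ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b|\b\d{9}\b")),        # step 3
    ("intl_phone", re.compile(r"\b\d{9,12}\b")),                    # step 6
    ("udpm_five_digits", re.compile(r"\b\d{5}\b")),                 # step 7
]


def multipass_parse(text):
    """Run each pass in order; each pass sees only what earlier passes left behind."""
    found = {}
    remaining = text
    for name, pattern in PASSES:
        matches = pattern.findall(remaining)
        if matches:
            found[name] = matches
            remaining = pattern.sub(" ", remaining)  # extracted data is gone for later passes
    return found, " ".join(remaining.split())


print(multipass_parse("SSN 123456789 La Crosse WI 54601"))
print(multipass_parse("Part 12345"))
```

In the second call, the five-digit number is claimed by the ZIP Code pass, so the user-defined pattern pass never sees it, which is exactly the pitfall the text warns about.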
Modify how DataRight IQ parses
DataRight IQ parses better than DataRight, in part because it gives you more control over how it parses.
You can, of course, set the input fields and retrieve the output fields as usual. And with DataRight IQ you can create custom dictionaries just as you can with DataRight, by using Firstlogic’s User-Modifiable Dictionary (UMD) utility.
In addition, with DataRight IQ you can create or edit parsing rules in the rule file (drlrules.dat). The rule file controls how DataRight IQ parses name and firm data. You can also define patterns and rules for parsing data that DataRight IQ does not already parse.
Use pre-defined rules
DataRight IQ already provides hundreds of rules for many different possible
combinations of data. These rules will likely satisfy the parsing needs of most
users. However, you may encounter data that isn’t being parsed the way that you
want it to be parsed. Or, maybe you want to tweak a rule so that it returns a
different confidence score. In situations like this, it is very handy to be able to edit
the rule file.
Edit the rule file
The DataRight IQ rule file (drlrules.dat) controls how DataRight IQ parses groups of output type subcomponents for name and firm data. For more information on editing the rules by which DataRight IQ parses, see the DataRight IQ Modifier’s Guide.
Turn off parsing engines for multiline files
To give you more control, DataRight IQ also lets you decide what it parses. For each input line of a multiline file, you can selectively turn off the parsing of addresses, names, firms, Social Security numbers, dates, phone numbers, user-defined patterns, and e-mail addresses.
In your DataRight IQ Job product, see the Multiline Parsing block.
Chapter 7:
Details of data parsing
DataRight IQ can identify and isolate various data and data components from fields of information, as listed in the following table.
Fields that DataRight IQ parses              For more information, see
street or “land” addresses and last lines    page 66
e-mail addresses                             page 67
U.S. Social Security numbers                 page 69
dates                                        page 72
phone numbers                                page 73
user-defined patterns                        page 76
names and titles of persons                  page 77
firms                                        page 77
For a complete list of the input fields that DataRight IQ accepts, see Firstlogic’s
Quick Reference for Views and Job-File Products documentation.
Library only
In addition, some parsing features in DataRight IQ Library are not included in Views and Job. They are:
• Associate a title with a name on the same line (page 78)
• Associate a title with a name on a different line (page 79)
• Associate name lines with title lines (page 80)
Note: Some DataRight IQ implementations have inherent differences in how
they operate. DataRight IQ’s Views and Library implementations, for
example, process data differently. Views can accept databases and process
batch files, while Library accepts data one record at a time.
Keep these differences in mind when you read DataRight IQ’s
documentation. For example, when the documentation refers to accessing
databases or to a batch process, this doesn’t apply to DataRight IQ Library.
Parse street addresses
DataRight IQ parses address data (sometimes called “street addresses” to
distinguish them from other addresses, such as e-mail addresses).
DataRight IQ accepts U.S. and Puerto Rican address lines. DataRight IQ accepts city, state, and ZIP Code data either together on one line or as discrete fields, provided the fields directly follow each other.
Components:    La Crosse   WI   54601-4051
Whole lines:   100 Harborview Plaza
               La Crosse WI 54601-4051
Note: If you have more than one address line (not counting city-state-ZIP),
present all of your address lines to DataRight IQ as multiline (floating) fields.
Don’t present one line as an address line and the other as a multiline field.
Parse e-mail addresses
When DataRight IQ parses input data that it determines is an e-mail address, it
places the components of that data into specific fields for output. Below is an
example of a simple e-mail address:
sales@firstlogic.com
By identifying the various data components (user name, host, and so on) by their
relationships to each other, DataRight IQ then assigns the data to specific fields.
Fields used
DataRight IQ outputs the individual components of a parsed e-mail address: the e-mail user name, complete domain name, top domain, second domain, third domain, fourth domain, fifth domain, and host name.
For inputting and outputting e-mail address information, DataRight IQ uses the fields listed at right:
What DataRight IQ does
With DataRight IQ, you can do the following things with an e-mail address:
• Parse the e-mail address, either in a field by itself or combined in a field with other data.
• Break the domain name down into sub-elements.
• Verify that an e-mail address is properly formatted.
• Flag the address for special handling (see “Flag addresses” on page 67).
Not verified
Several aspects of an e-mail address are not verified by DataRight IQ. DataRight IQ does not verify:
• whether the domain name (the portion to the right of the @ sign) is registered.
• whether an e-mail server is active at that address.
• whether the user name (the portion to the left of the @ sign) is registered on that e-mail server (if any).
• whether the personal name in the record can be reached at this e-mail address.
Flag addresses
You can flag e-mail addresses based on a list of criteria you create or maintain. For example, if you focus on B2B, you might want to flag consumer-oriented domain names such as hotmail.com, yahoo.com, or aol.com.
You flag addresses by matching them against a list of hosts and domain names in a file named drlemail.dat. The software sets an e-mail ISP indicator to T (true) or F (false). Then, using a filter on the T/F output, you can post a flag (to the field EmailISP1-6) that you can use to separate this data, so that you can process the flagged addresses separately from other e-mail addresses.
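The matching step can be sketched in Python. The domain set below is a hypothetical stand-in for the hosts and domain names in drlemail.dat, whose file format is not described here; checking parent domains (so that mail.yahoo.com matches yahoo.com) is also an assumption of the sketch.

```python
# Hypothetical stand-in for the hosts and domain names listed in drlemail.dat.
ISP_DOMAINS = {"hotmail.com", "yahoo.com", "aol.com"}


def isp_flag(email, isp_domains=ISP_DOMAINS):
    """Return "T" if the address's domain (or a parent domain) is on the list, else "F"."""
    _, sep, domain = email.lower().rpartition("@")
    if not sep:
        return "F"  # not an e-mail address at all
    labels = domain.split(".")
    # Candidate domains: the full domain plus each parent, e.g. mail.yahoo.com -> yahoo.com.
    candidates = {".".join(labels[i:]) for i in range(len(labels))}
    return "T" if candidates & isp_domains else "F"


addresses = ["jdoe@aol.com", "sales@firstlogic.com", "pat@mail.yahoo.com"]
business_only = [a for a in addresses if isp_flag(a) == "F"]
print(business_only)  # ['sales@firstlogic.com']
```

The final filter mirrors the workflow in the text: compute T/F per address, then separate the flagged consumer addresses from the rest.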
E-mail components
The AP field where DataRight IQ places each element depends on the element’s position in the e-mail address. DataRight IQ follows the Domain Name System (DNS) in determining the correct output field.
When DataRight IQ parses the following data:
expat@london.home.office.city.co.uk
it assigns these components to the following fields according to DNS.
Sample data                     Field          Field description
expat                           EmailUser1     The user name, or “addressee”: the person or department, for example, for whom the e-mail is intended.
london.home.office.city.co.uk   EmailAllD1     The “all D” is the entire domain.
uk                              EmailTopD1     The top-level domain. In DNS, the highest level of the hierarchy after the root (the last “dot”). In a domain name, the portion that appears farthest to the right, often “com,” “org,” “gov,” and so on.
home.office.city.co             EmailTopD2-5   The elements between the host and the top-level domain.
london                          Host           The element immediately to the right of the “at” symbol (@).
For example, with the input data expat@london.home.office.city.co.uk, DataRight IQ outputs each element in the following fields:
Full e-mail address = expat@london.home.office.city.co.uk
EmailUser1 = expat
EmailAllD1 = london.home.office.city.co.uk
EmailTopD1 = uk
EmailTopD2 = co
EmailTopD3 = city
EmailTopD4 = office
EmailTopD5 = home
Host = london
EmailISP1 = F
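The DNS-based assignment can be sketched in a few lines of Python. This is a simplified model of the table above, not DataRight IQ’s parser: it assumes exactly one @ sign, treats the first label after @ as the host, and drops any labels beyond the five that EmailTopD1-5 can hold.

```python
def parse_email(address):
    """Split an e-mail address into the components listed in the table above (toy model)."""
    user, sep, domain = address.partition("@")
    if not sep or not user or not domain or "@" in domain:
        raise ValueError(f"not a well-formed e-mail address: {address!r}")
    labels = domain.split(".")
    result = {"EmailUser1": user, "EmailAllD1": domain, "Host": labels[0]}
    # EmailTopD1 is the rightmost label; TopD2..TopD5 move left toward the host.
    for level, label in enumerate(labels[1:][::-1][:5], start=1):
        result[f"EmailTopD{level}"] = label
    return result


for field, value in parse_email("expat@london.home.office.city.co.uk").items():
    print(field, "=", value)
```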
Parse Social Security numbers
DataRight IQ parses U.S. Social Security numbers (SSNs) that appear either by themselves or on an input line surrounded by other text.
Fields used
For inputting and outputting U.S. Social Security number information, DataRight IQ uses the following fields:
Input fields                     Output fields
PW.SSN1-6 and multiline fields   AP.SSN1-6
The six available AP.SSN fields (AP.SSN1-6) store output Social Security number data.
Setting up SSN parsing in DataRight IQ
Example of data: 123-45-6789
Typical field length: 9-11 characters
DataRight IQ has two SSN options to set up for parsing:
• Specify the location of the file that contains SSN information.
• Select the type of delimiter to use.
SSN information file
Specify the location of the SSN information file in the Auxiliary Files block in DataRight IQ.
SSN delimiter
In the Standardization/Assignment Control block, you can choose the delimiter DataRight IQ uses when it outputs the SSN by setting the SSN Delimiter option. In Job, a note at the end of the block lists the delimiters you can choose from. In Views, choose from the options in a drop-down list.
How DataRight IQ parses Social Security numbers
DataRight IQ parses Social Security numbers in two steps:
1. Identifies a potential SSN by looking for any of three patterns:
Pattern       Digits per grouping                         Delimited by
nnnnnnnnn     9 consecutive digits                        n.a.
nnn nn nnnn   3, 2, and 4 (for area, group, and serial)   spaces
nnn-nn-nnnn   3, 2, and 4 (for area, group, and serial)   all supported delimiters
2. Performs a validity check on the first five digits only. Two outcomes of this validity check are possible:
Outcome   Description
Pass      DataRight IQ successfully parses the data, and the Social Security number comes out in an AP field.
Fail      DataRight IQ does not parse the data because it’s not a valid SSN as defined by the U.S. government, so the data comes out as Extra, unparsed data.
Check validity
When performing a validity check, DataRight IQ doesn’t verify that a particular 9-digit Social Security number has been issued, or that it’s the correct number for any named person. Instead, it validates only the first 5 digits (area and group). DataRight IQ doesn’t validate the last 4 digits (serial), except to confirm they are digits.
SSA data
DataRight IQ’s validation of the first 5 digits is driven by a table from the Social Security Administration (SSA). That table is updated monthly as the SSA opens new groups. The rules and data that guide this check are available at http://www.ssa.gov/history/ssn/geocard.html.
Update your SSN file
Firstlogic provides the Social Security Number (SSN) file (drlssn.dat) for DataRight IQ customers interested in parsing recently issued and existing U.S. Social Security numbers. The SSN file is updated monthly with the latest SSN information from the U.S. government. Firstlogic converts the data to a format that DataRight IQ can use and posts the data by the 5th of every month.
You can obtain the most current SSN file from the Firstlogic Customer Portal site at http://www.firstlogic.com/customer. From this area you can download the latest drlssn.dat file used to parse U.S. Social Security numbers within DataRight IQ.
Outputs valid SSNs
DataRight IQ outputs only Social Security numbers that pass its validation. If an apparent SSN fails validation, DataRight IQ does not pass on the number as a parsed, but invalid, Social Security number.
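The two-step process (pattern match, then area-and-group check) can be sketched in Python. The set of valid (area, group) pairs is a hypothetical stand-in for the SSA-derived table in drlssn.dat, whose format is not described here, and only a hyphen or a space is accepted as a delimiter in this sketch.

```python
import re

# Step 1: the three patterns from the table. The backreference \2 requires the same
# delimiter (hyphen, space, or none) between all three groups.
SSN_PATTERN = re.compile(r"\b(\d{3})([- ]?)(\d{2})\2(\d{4})\b")


def parse_ssns(text, valid_area_groups, delimiter="-"):
    """Return (parsed SSNs, Extra data) for every candidate SSN in text.

    valid_area_groups is a hypothetical set of (area, group) pairs standing in
    for the SSA table; the serial is only checked to be four digits.
    """
    parsed, extra = [], []
    for match in SSN_PATTERN.finditer(text):
        area, _, group, serial = match.groups()
        # Step 2: validate the first five digits only.
        if (area, group) in valid_area_groups:
            parsed.append(delimiter.join((area, group, serial)))
        else:
            extra.append(match.group(0))  # fails validation: comes out as Extra
    return parsed, extra


print(parse_ssns("ID 123 45 6789 and 987654321", {("123", "45")}))
```

An ITIN such as 987654321 matches the pattern but not the (hypothetical) table, so it lands in Extra, consistent with the ITIN behavior described below.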
Other U.S. ID numbers
Your data may include other numbers used in the United States for governmental identification purposes. DataRight IQ’s capability is aimed at U.S. Social Security numbers, which are, in effect, Tax IDs for individuals. Other such numbers include the ITIN and the EIN.
Name   Description
ITIN   Individual Taxpayer Identification Number
       This “cousin” to the Social Security number is what the IRS assigns to people who earn money and pay federal income taxes but who are not citizens (they are resident or non-resident aliens). An ITIN looks like an SSN except that it begins with the number 9.
       DataRight IQ treats an ITIN as an invalid SSN. An ITIN may match the SSN pattern but fails the check against the SSN table, so it comes out as unparsed Extra.
EIN    Employer Identification Number
       Synonymous with a corporate Tax Identification Number (TIN) or Tax ID, this number is also 9 digits. However, its pattern is nn-nnnnnnn. Because of that, the EIN is not recognized by DataRight IQ’s SSN parser.
Use UDPM to parse other patterns. If you need to parse patterns that aren’t covered by one of DataRight IQ’s usual parsing engines, use DataRight IQ’s UDPM (user-defined pattern matching) feature.
Parse dates
DataRight IQ recognizes dates in a variety of formats and breaks those dates into
components.
Fields used
For inputting and outputting date information, DataRight IQ uses the following fields:
Input fields                      Output fields
PW.Date1-6 and multiline fields   AP.Date1-6

Formats and delimiters
DataRight IQ supports the following formats and delimiters. That is, you can select any one of these formats to standardize dates.

Format        Example
yyyy*mm*dd    2004 01 27
yy*mm*dd      04 01 27
dd*mm*yyyy    27 01 2004
dd*mm*yy      27 01 04
mm*dd*yyyy    01 27 2004
mm*dd*yy      01 27 04
dd*mmm*yy     27 Jan 04
dd*mmm*yyyy   27 Jan 2004
mmm*dd*yyyy   Jan 27 2004
mmm*dd*yy     Jan 27 04
yyyymmdd      20040127
yymmdd        040127
ddmmyyyy      27012004
ddmmyy        270104
mmddyyyy      01272004
mmddyy        012704

Delimiter*    Description
<none>        no space
<space>       a space
/             forward slash
-             dash
\             backward slash
.             period

* An asterisk in a format marks where the selected delimiter appears between date components; formats without an asterisk never use a delimiter. You also have the option of using no delimiter (<none>).
DataRight IQ can parse up to six dates from your defined record. That is,
DataRight IQ identifies one or more dates (up to six) in the input, breaks found
dates into components, and makes dates available as output in either the original
format or a user-selected standard format.
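Standardizing a date amounts to reading its components and writing them back in the chosen format and delimiter. A minimal Python sketch, assuming the input’s layout is already known (DataRight IQ recognizes it for you) and mapping the guide’s yyyy, yy, mmm, mm, and dd tokens to strftime codes. The %b code depends on the locale and yields Jan only in an English or C locale.

```python
from datetime import datetime

# The guide's format tokens mapped to strftime codes; longer tokens are tried first.
TOKENS = {"yyyy": "%Y", "mmm": "%b", "yy": "%y", "mm": "%m", "dd": "%d"}


def to_strftime(fmt, delimiter):
    """Translate a format such as "dd*mmm*yyyy" into a strftime pattern."""
    out, i = [], 0
    while i < len(fmt):
        if fmt[i] == "*":
            out.append(delimiter.replace("%", "%%"))  # "*" marks the delimiter position
            i += 1
            continue
        for token in ("yyyy", "mmm", "yy", "mm", "dd"):
            if fmt.startswith(token, i):
                out.append(TOKENS[token])
                i += len(token)
                break
        else:
            raise ValueError(f"unsupported format: {fmt}")
    return "".join(out)


def standardize_date(value, input_layout, output_fmt, delimiter=" "):
    """Parse value with a known strptime layout and rewrite it in the chosen format."""
    parsed = datetime.strptime(value, input_layout)
    return parsed.strftime(to_strftime(output_fmt, delimiter))


print(standardize_date("01/27/2004", "%m/%d/%Y", "dd*mmm*yyyy"))      # 27 Jan 2004
print(standardize_date("2004-01-27", "%Y-%m-%d", "mm*dd*yy", "/"))    # 01/27/04
print(standardize_date("27.01.2004", "%d.%m.%Y", "yyyymmdd"))         # 20040127
```

Passing an empty string as the delimiter corresponds to the <none> option.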
Parse phone numbers
DataRight IQ can parse both North American (U.S. and Canada) and
international phone numbers. When DataRight IQ parses a phone number, it
outputs the individual components of the number into the appropriate AP fields
(see the examples below).
Fields used
For inputting and outputting phone number information, DataRight IQ uses the following fields:
Phone numbering systems differ around the world. DataRight IQ recognizes phone numbers by their pattern and, for non-U.S. numbers, by their country code as well.
U.S. and Canada
What DataRight IQ calls U.S. phone numbers would more properly be called North American phone numbers. The Canadian phone number standard follows the same pattern as U.S. phone numbers. Because of this, when DataRight IQ parses a phone number from either the U.S. or Canada, it posts the data to AP.USPhoneX.
DataRight IQ searches for U.S. phone numbers by commonly used patterns such
as: (234) 567-8901, 234-567-8901, and 2345678901.
DataRight IQ gives you the option for some reformatting on output (such as your choice of delimiters). Below is an example with extension text:
Input data:    (901) 234-5678 EXT 1234
Output data:   901-234-5678 Ext. 1234
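The pattern search and output reformatting can be sketched in Python with one regular expression. The regex is an assumption covering only the three layouts named above plus an optional extension; unlike DataRight IQ, it requires an area code. The dictionary keys reuse the AP field names from the U.S. phone example later in this section.

```python
import re

# Toy pattern for (234) 567-8901, 234-567-8901, and 2345678901, with an optional
# extension such as "EXT 1234" or "ext 34". Assumption: the area code is required here.
US_PHONE = re.compile(
    r"\(?(?P<area>\d{3})\)?[-. ]*(?P<prefix>\d{3})[-. ]?(?P<line>\d{4})"
    r"(?:\s*(?:ext\.?|x)\s*(?P<ext>\d+))?",
    re.IGNORECASE,
)


def parse_us_phone(text):
    """Return the number's components keyed by AP field name, or None if none is found."""
    m = US_PHONE.search(text)
    if m is None:
        return None
    return {
        "AP.USAreaCod1": m["area"],
        "AP.USPhonPre1": m["prefix"],
        "AP.USPhonLin1": m["line"],
        "AP.USPhonExt1": m["ext"] or "",
    }


def format_us_phone(text, delimiter="-", ext_prefix="Ext."):
    """Rebuild the number with a chosen delimiter and extension prefix."""
    parts = parse_us_phone(text)
    if parts is None:
        return None
    number = delimiter.join(
        (parts["AP.USAreaCod1"], parts["AP.USPhonPre1"], parts["AP.USPhonLin1"])
    )
    if parts["AP.USPhonExt1"]:
        number += f" {ext_prefix} {parts['AP.USPhonExt1']}"
    return number


print(format_us_phone("(901) 234-5678 EXT 1234"))  # 901-234-5678 Ext. 1234
```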
Europe and Pacific Rim
DataRight IQ searches for European and Pacific Rim numbers by pattern. The patterns used are stored in drlphint.dat. They require that the country code appear at the beginning of the number. DataRight IQ doesn’t offer any options for reformatting international phone numbers. Also, DataRight IQ doesn’t cross-compare the phone number with the address to see if the country and city codes in the phone number match the address.
Phone number components
Phone numbers consist of different output components depending on whether they’re U.S. or international numbers.
Individual components for:
U.S. phone numbers   Non-U.S. phone numbers
area code            country code
prefix               city code
line number          number
extension            description
line type
Example of U.S. phone number
Say you have the following U.S. phone data for input:
Work (308)-555-8402 ext 34
DataRight IQ parses the data into the following AP fields:
AP field        Output value
AP.USAreaCod1   308
AP.USPhonPre1   555
AP.USPhonLin1   8402
AP.USPhonExt1   34
AP.USPhonTyp1   Work
Some of these fields (namely, area code, extension, and type) are optional. If your input doesn’t have appropriate values for these fields, DataRight IQ leaves them empty.
Phone type
There is a finite list of values that can be returned as a phone type.
Phone type   Possible input values
Business     business, bus, work
Home         home, hme, personal
Data         data
Voice        voice, vmail
Fax          fax
BBS          bbs
Cellular     cell, cellular, mobile
Example of non-U.S. phone number
Say you have the following international (non-U.S.) phone data for input:
61-9-0123-4567
DataRight IQ parses the data into the following AP fields (all data must be present for the phone data to be valid):
AP field        Output value   Note
AP.IntCtryCd1   61
AP.IntCityCd1   9
AP.IntPhNum1    0123-4567
AP.IntPhDesc1   Australia      Populated based on the Country ID.
Important: DataRight IQ accepts international phone numbers only as they would be dialed from the U.S. For example, the number must start with the appropriate country code.
Also, if presented on a line with other data, the international phone number must start the line.
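The two constraints above (country code first, number at the start of the line) can be sketched in Python. The country table is a hypothetical excerpt (DataRight IQ reads its international patterns from drlphint.dat, whose format is not shown here), and handling only hyphen-delimited numbers is a simplification.

```python
# Hypothetical excerpt of a country-code table.
COUNTRY_NAMES = {"61": "Australia", "44": "United Kingdom", "49": "Germany"}


def parse_intl_phone(line):
    """Parse a hyphen-delimited international number that starts the line."""
    tokens = line.split()
    if not tokens:
        return None
    parts = tokens[0].split("-")  # only the first token is considered: it must start the line
    if len(parts) < 3 or not all(p.isdigit() for p in parts):
        return None  # country code, city code, and number must all be present
    country, city, number = parts[0], parts[1], "-".join(parts[2:])
    if country not in COUNTRY_NAMES:
        return None
    return {
        "AP.IntCtryCd1": country,
        "AP.IntCityCd1": city,
        "AP.IntPhNum1": number,
        "AP.IntPhDesc1": COUNTRY_NAMES[country],  # description from the country code
    }


print(parse_intl_phone("61-9-0123-4567"))
print(parse_intl_phone("Call 61-9-0123-4567"))  # None: the number does not start the line
```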
Formatting output data
DataRight IQ lets you specify the prefix (for example, EXT.) that indicates a phone number’s extension. For more information, see “Standard Phone Extension” on page 241.
Parse user-defined patterns
Parse any number or alphanumeric
With DataRight IQ you can parse data that’s outside the range of name, title,
address, and so on. With DataRight IQ’s user-defined pattern matching (UDPM)
feature, you can parse a wide variety of data such as:
account numbers
part numbers
purchase orders
invoice numbers
VINs (vehicle identification numbers)
driver license numbers
In other words, DataRight IQ can parse any kind of number or alphanumeric string for which you can define a pattern.
Fields used
For inputting and outputting user-defined pattern information, DataRight IQ uses the following fields:
Input fields: PW.Pattern1-4 and multiline fields
Output fields        Description of output field
AP.Pattern1-4        The pattern
AP.PatnLabel1-4      The label for the pattern. The pattern label is created in the drludpm.dat file when the pattern is defined.
AP.Patnsub1-4_1-5    The subpattern(s) of the pattern
How it’s done
DataRight IQ parses patterns through its user-defined pattern matching (UDPM) feature, which uses regular expressions. That is, you can set up data patterns to suit your data (such as part numbers), and DataRight IQ can parse your data according to those user-defined patterns.
DataRight IQ’s UDPM feature makes it possible to parse and extract virtually any kind of data that conforms to a pattern, that is, any data pattern that can be expressed as a regular expression.
Define your pattern
When you create a user-defined pattern, you must include a carriage return/linefeed at the end of the line. All characters before the carriage return/linefeed, even blank spaces, are considered part of the pattern.
For more information
For more information on UDPM and setting up the patterns that DataRight IQ will parse, see the DataRight IQ Modifier’s Guide, which accompanies the DataRight IQ product. This feature is intended for advanced users of DataRight IQ. Read and follow all warnings before changing how your product works.
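A minimal Python sketch of the idea: each hypothetical pattern definition pairs a label with a regular expression, and the capture groups play the role of subpatterns. The PATTERN_DEFS dictionary is not the drludpm.dat format, and the five-subpattern limit is only inferred from the AP.Patnsub1-4_1-5 field names. The load_pattern_line helper shows why trailing blanks matter: only the line terminator is removed.

```python
import re

# Hypothetical label -> regular expression definitions (not the drludpm.dat format).
PATTERN_DEFS = {
    "PartNumber": r"^([A-Z]{2})-(\d{4})-([A-Z])$",
    "InvoiceNumber": r"^INV(\d{6})$",
}


def load_pattern_line(line):
    """Remove only the CR/LF terminator; trailing blanks stay in the pattern."""
    return line.rstrip("\r\n")


def parse_udpm(value, defs=PATTERN_DEFS):
    """Return the matched pattern, its label, and up to five subpatterns."""
    for label, regex in defs.items():
        m = re.match(regex, value)
        if m:
            return {
                "Pattern": m.group(0),
                "PatnLabel": label,
                "Patnsub": list(m.groups())[:5],
            }
    return None


print(parse_udpm("AB-1234-X"))
```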
Parse names and titles
DataRight IQ can parse name and title data.
A person’s name can consist of the following parts: prename, first name, middle
name, last name, postname, and so on.
DataRight IQ can accept up to six names and titles as discrete components.
DataRight IQ also accepts name and title data on partial lines or whole lines. The
name line or multiline field may contain one or two names per line.
Components
Mr.    John    A.    Smith    Jr.    Accountant
Partial lines
Mr. John A.
Smith Jr., Accountant
Whole line
Mr. John A. Smith Jr., Accountant
Parse firms
DataRight IQ can parse firm and company data.
DataRight IQ accepts firm name and firm location data. A firm location is a
department, building, mail stop, or other location within a company. DataRight
IQ accepts firm names and firm locations as components or whole lines.
Components
Firstlogic Inc.
Dept. of Accounting
Whole line
Firstlogic Inc., Dept. of Accounting
Library only: Associate a title with a name on the same line
When a name and job title are on the same input line, by default DataRight IQ
associates the title with a name in the line. This makes it easier for you to retrieve
a personal name and the job title that goes with that name.
For Job and Views users: This functionality is controlled by several parameters in the Standardization/Assignment Control block. See “Tell DataRight IQ whether to associate data on title lines with data on name lines and between multilines” on page 239.
Multiple names and titles
If names and titles occur on the same line, DataRight IQ associates each title with the closest preceding name:
Input: John Smith, Supervisor and Mary Smith, Mgr.
Output:
Name1    Mr. John Smith
Title1   Supervisor
Name2    Ms. Mary Smith
Title2   Mgr
Dual name and title
For dual names such as John and Mary Smith, by default DataRight IQ associates the job title with the first of the two names:
Input: John and Mary Smith, Manager
Output:
Name1    Mr. John Smith
Title1   Manager
Name2    Ms. Mary Smith
Title2   (empty)
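The two same-line rules can be sketched in Python as follows. The input is assumed to be already parsed into an ordered list of name, dual-name, and title items; the item format and the function name associate_titles are hypothetical. Treating a title with no preceding name as a separate, nameless entry is an assumption of this sketch.

```python
def associate_titles(items):
    """items: ordered ("name", str), ("dual", (str, str)) or ("title", str) tuples."""
    people = []          # each entry: {"name": ..., "title": ...}
    target = None        # person the next title should attach to
    for kind, value in items:
        if kind == "name":
            people.append({"name": value, "title": ""})
            target = people[-1]                  # closest preceding name
        elif kind == "dual":
            first = {"name": value[0], "title": ""}
            people.extend([first, {"name": value[1], "title": ""}])
            target = first                       # dual names: title goes to the first
        elif kind == "title":
            if target is not None:
                target["title"] = value
            else:
                people.append({"name": "", "title": value})  # nameless entry
            target = None
    return people


print(associate_titles([("name", "Mr. John Smith"), ("title", "Supervisor"),
                        ("name", "Ms. Mary Smith"), ("title", "Mgr")]))
```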
Library only: Associate a title with a name on a different line
Suppose your input records contain multiple unfielded names and job titles. In
your output records, you want to create separate fields for each person and his or
her job title. To do this, you need to match each job title with a person:
Input:
Line1    John Smith and Ed Jones, Mgr.
Line2    Ann Rose
Line3    Software Engr.
Output:
Name1    Mr. John Smith
Title1   (empty)
Name2    Mr. Ed Jones
Title2   Mgr.
Name3    Ms. Ann Rose
Title3   Software Engr.
Enable interline association
We provide several options so that you can associate a name on one input line
with a job title on another input line.
By default, DataRight IQ is conservative. If a name and title are not on the same
line, DataRight IQ does not make any association between them. Instead, it parses
the title as a separate, “nameless” person:
Input:
Line1    Mr. John Smith
Line2    Manager
Output:
Name1    Mr. John Smith
Title1   (empty)
Name2    (empty)
Title2   Manager
If you prefer, DataRight IQ can associate a name on one line with a title on
another.
Note: Regardless of how you set the title-association options, a title must
follow a name in order for it to be associated with that name. If a title precedes
a name, DataRight IQ will never associate the title with that name.
Input:
Line1    CEO
Line2    Kathryn Jones
Output:
Name1    (empty)
Title1   CEO
Name2    Kathryn Jones
Title2   (empty)
Library only: Associating name lines with title lines
You can associate name lines with their corresponding title lines, for example, DRL_INAMELINE1 with DRL_ITITLELINE1:
Input:
DRL_INAMELINE1     Mr. Bob Smith
DRL_ITITLELINE1    CEO
Output:
Name1    Mr. Bob Smith
Title1   CEO
Most users will probably want to set up this level of association.
Associating data that is input on a multiline
You can make an association even if the name or title (or both) occurs on a
multiline:
Input:
DRL_ILINE1    Mr. Bill Johnson
DRL_ILINE2    Firstlogic Inc.
DRL_ILINE3    Software Engr.
Output:
Name1    Mr. Bill Johnson
Title1   Software Engr.
If you allow association on multilines, DataRight IQ can associate a name and
title even if they are on nonconsecutive lines (as shown in the example).
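The difference between the conservative default and interline association can be sketched in Python. The allow_interline flag is a hypothetical stand-in for DataRight IQ’s title-association options, and each input line is assumed to be pre-classified into name, title, and firm items.

```python
def associate_lines(lines, allow_interline=False):
    """lines: list of input lines, each a list of (kind, value) tuples where
    kind is "name", "title", or "firm". Returns {"name", "title"} dicts."""
    people = []
    pending = None  # most recent name still waiting for a title
    for line in lines:
        if not allow_interline:
            pending = None  # conservative default: no association across lines
        for kind, value in line:
            if kind == "name":
                people.append({"name": value, "title": ""})
                pending = people[-1]
            elif kind == "title":
                if pending is not None:
                    pending["title"] = value
                    pending = None
                else:
                    people.append({"name": "", "title": value})  # "nameless" person
            # firm items are skipped, so a name and a title can be
            # associated across nonconsecutive lines
    return people


multiline = [[("name", "Mr. Bill Johnson")], [("firm", "Firstlogic Inc.")],
             [("title", "Software Engr.")]]
print(associate_lines(multiline, allow_interline=True))
print(associate_lines(multiline))
```

With allow_interline=True the sketch pairs Mr. Bill Johnson with Software Engr.; with the default it reports the title as a separate, nameless person.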
Chapter 8: Process parsed data
DataRight IQ can process parsed data in many ways. During processing, DataRight IQ can do the following:
Standardize data in several ways
Add new data
Remove unwanted data
Convert dates
Post the data that you want
Standardize data
DataRight IQ can help you make data more consistent from record to record.
Correct inconsistent name format
You can use DataRight IQ to straighten out disorderly name-line fields. For example, you might be processing records in which most names are in First-Middle-Last order, but in some records the sequence is Last-First-Middle. DataRight IQ can help you put them all in the same format, because you can retrieve name components and post them to your output file in any sequence.
Input:   John A. Smith, Jr.
         Jones, Mary R.
Output:  John A. Smith, Jr.
         Mary R. Jones
Note: If you post whole name lines (rather than components), DataRight IQ
outputs in FML (first-middle-last) name format.
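To illustrate the reordering, here is a deliberately naive Python sketch that turns a Last, First Middle string into First-Middle-Last order. It assumes a comma separates the last name from the rest unless the text after the comma is a maturity postname; DataRight IQ instead parses the name into components, which is far more robust.

```python
MATURITY = {"jr", "jr.", "sr", "sr.", "ii", "iii", "iv"}


def to_fml(name):
    """Naively reorder 'Last, First Middle' into 'First Middle Last'."""
    if "," not in name:
        return name
    before, after = (part.strip() for part in name.split(",", 1))
    if after.lower() in MATURITY:
        return name  # already First-Middle-Last, followed by a postname
    return f"{after} {before}"


for raw in ("John A. Smith, Jr.", "Jones, Mary R."):
    print(to_fml(raw))
```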
Convert case
You can use DataRight IQ to convert case to upper, lower, or mixed case, or you can choose to preserve case. In mixed case, DataRight IQ intelligently cases words such as McDonald, Ph.D., and IBM. It can even recognize different casings for the same word. For example, DataRight IQ correctly cases CO (Colorado) as a state and Co as the abbreviation for the word “company.”
For more information, see “Convert case” on page 27.
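The idea behind mixed casing can be sketched with an exception dictionary. The CASE_EXCEPTIONS table and the state_context flag are hypothetical; how DataRight IQ actually decides that CO is a state is not documented here. Punctuation attached to a word is not handled.

```python
# Hypothetical exception table; DataRight IQ's real casing rules are internal.
CASE_EXCEPTIONS = {"mcdonald": "McDonald", "ph.d.": "Ph.D.", "ibm": "IBM"}


def mixed_case(text, state_context=False):
    """Convert text to mixed case, honoring a small table of exceptions."""
    words = []
    for word in text.split():
        key = word.lower()
        if key == "co":
            # Same letters, two casings: state abbreviation or "company".
            words.append("CO" if state_context else "Co")
        elif key in CASE_EXCEPTIONS:
            words.append(CASE_EXCEPTIONS[key])
        else:
            words.append(word.capitalize())
    return " ".join(words)


print(mixed_case("RONALD MCDONALD PH.D."))
print(mixed_case("DENVER CO", state_context=True))
```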
Standardize business words
To correct inconsistencies among records, DataRight IQ standardizes common job-title and firm words. For example, DataRight IQ takes abbreviations such as Internatl, Intrntl, and Interntl and converts them all to Intl. DataRight IQ can also convert firm names to acronyms. For example, given General Motors Corp. as input, DataRight IQ can produce GM as output.
For details, see “Standardize data” on page 23.
Standardize and assign last-line data
DataRight IQ can also do some limited standardization and assignment of last-line data (city, state, ZIP):
Assign city, state, or ZIP Code data if it is missing or not valid.
Convert vanity city names to post-office city names. For example, given Hollywood as input, produce Los Angeles as output.
For details, see “Standardize data” on page 23.
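Both kinds of standardization above behave like table look-ups. This Python sketch uses small hypothetical tables (STANDARD_WORDS, FIRM_SUFFIXES, VANITY_CITIES); the real tables that ship with DataRight IQ are much larger and are not reproduced here.

```python
# Hypothetical look-up tables; DataRight IQ ships its own standardization data.
STANDARD_WORDS = {"internatl": "Intl", "intrntl": "Intl", "interntl": "Intl"}
FIRM_SUFFIXES = {"corp.", "corp", "inc.", "inc", "co.", "co", "ltd.", "ltd"}
VANITY_CITIES = {"hollywood": "Los Angeles"}


def standardize_words(text):
    """Replace each known variant spelling with its standard form."""
    return " ".join(STANDARD_WORDS.get(w.lower(), w) for w in text.split())


def firm_acronym(firm):
    """Build an acronym from a firm name, ignoring corporate suffixes."""
    return "".join(w[0].upper() for w in firm.split() if w.lower() not in FIRM_SUFFIXES)


def post_office_city(city):
    """Map a vanity city name to its post-office city name, if one is known."""
    return VANITY_CITIES.get(city.lower(), city)


print(standardize_words("Acme Interntl Sales"), firm_acronym("General Motors Corp."))
```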
Add new data
DataRight IQ not only helps you correct the data you already have—it also adds
new data.
Gender codes
DataRight IQ assigns a highly descriptive gender code for each name (up to six per record). For example, DataRight IQ can assign a code to tell you that Reginald White is a male and Lynn Jones is probably a female.
For more information, see “Gender codes” on page 32.
Prenames
If a name has a strong or weak gender, DataRight IQ can add a prename. If the name is determined to be ambiguous, DataRight IQ does not add a prename. You can retrieve the prename as a separate component or as part of a name line:
Input                 Output                         Description
Dan and Anne McKay    Mr. Dan and Mrs. Anne McKay    Strong male and female
Alice McKay           Ms. Alice McKay                Strong female
Terry Fitsimmons      Mr. Terry Fitsimmons           Weak male
Pat Tubler            Pat Tubler                     Ambiguous
“Terry” is a weak male name, so it’s assigned “Mr.”
“Pat” is ambiguous, so it is not assigned a prename.
For more information, see “Prenames” on page 34.
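A minimal Python sketch of the prename rule: a name with a strong or weak gender gets a prename, and an ambiguous name gets none. The GENDER table and its labels are hypothetical and are not DataRight IQ’s gender codes; dual names such as Dan and Anne are not handled.

```python
# Hypothetical first-name gender table (not DataRight IQ's codes).
GENDER = {"dan": "strong male", "anne": "strong female", "alice": "strong female",
          "terry": "weak male", "pat": "ambiguous"}


def add_prename(full_name):
    """Prefix Mr. or Ms. when the first name has a strong or weak gender."""
    gender = GENDER.get(full_name.split()[0].lower(), "ambiguous")
    if gender.endswith("female"):   # check female first: "female" ends with "male"
        return "Ms. " + full_name
    if gender.endswith("male"):
        return "Mr. " + full_name
    return full_name                # ambiguous or unknown: no prename


for name in ("Alice McKay", "Terry Fitsimmons", "Pat Tubler"):
    print(add_prename(name))
```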
Salutations
You can use DataRight IQ to create salutations such as Dear Mr. McKay or Dear Dan. You can customize the greeting word (such as Dear or Hello) and the ending punctuation. You can also use alternate greetings under certain circumstances. For example, if the name is absent you can use a title greeting, such as Dear VP of Sales.
For more information, see “Salutations” on page 37.
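The fallback logic for salutations can be sketched as below. The parameter names and the default_greeting value are assumptions for this example, not DataRight IQ settings.

```python
def salutation(prename="", last_name="", title="", greeting="Dear",
               punctuation=":", default_greeting="Valued Customer"):
    """Build a greeting, falling back to a title greeting when the name is absent."""
    if prename and last_name:
        name_part = f"{prename} {last_name}"
    elif title:
        name_part = title                 # alternate greeting: use the job title
    else:
        name_part = default_greeting      # hypothetical last-resort greeting
    return f"{greeting} {name_part}{punctuation}"


print(salutation("Mr.", "McKay"))
print(salutation(title="VP of Sales"))
print(salutation("Mr.", "Jones", greeting="Hello", punctuation=","))
```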
Match standards
Given an input nickname, DataRight IQ can often produce a match-standard name. For example, given the input nickname Al, DataRight IQ can return names that are potential matches, such as Alan and Alphonse. If your database can perform multiway matching, you can use the match standards to improve matching performance.
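A minimal Python sketch of how match standards can widen a comparison: two first names are treated as possible matches if they share any standard form. The MATCH_STANDARDS table is hypothetical.

```python
# Hypothetical nickname table; DataRight IQ's match-standard data is far larger.
MATCH_STANDARDS = {"al": ["Alan", "Alphonse"], "doug": ["Douglas"]}


def match_standards(first_name):
    """Return the match-standard names for a nickname (empty if none)."""
    return MATCH_STANDARDS.get(first_name.lower(), [])


def names_may_match(a, b):
    """True if the two names, or any of their match standards, coincide."""
    forms_a = {a.lower(), *(s.lower() for s in match_standards(a))}
    forms_b = {b.lower(), *(s.lower() for s in match_standards(b))}
    return bool(forms_a & forms_b)


print(match_standards("Al"), names_may_match("Al", "Alan"))
```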
Data-quality scores and codes
DataRight IQ generates scores and codes describing the quality of the data found
in input and output fields. DataRight IQ produces codes that identify what
components were parsed from each record, the types of changes made to data
components, and the parsing errors associated with specific record components.
For more information, see “Data-quality scores and codes” on page 165.
Remove unwanted data
You can use DataRight IQ to remove extraneous or unwanted data from a field.
Use search-and-replace
You can use the search-and-replace feature to remove unwanted data from a field. For example, you could remove designators from a name field:
Before                      After
John Smith, Trustee         John Smith
Anne Jones, Beneficiary     Anne Jones
For more information, see “Search-and-replace” on page 89.
Use standard functions
You can also use DataRight IQ functions to modify, convert, and manipulate
data. For example, suppose a field in your database sometimes contains a
parenthetical comment such as (Deliver to back door) at the end of the field.
You could use DataRight IQ functions to look for parentheses in the field, then
delete the parentheses and everything between them.
Before                               After
123 Main St (back door)              123 Main St
345 10th Ave (deliver after 2:00)    345 10th Ave
To delete the parenthetical data with DataRight IQ functions, you would first check whether the field contains parentheses. If it does, you would keep everything except the parentheses and their contents. To do that, you would find the character position of the opening parenthesis and extract everything before that point.
To do this, you would write an expression similar to this:
For more information about what functions are, how they work, and how to use
them, see Firstlogic’s Database Prep documentation.
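The steps described above can be sketched in Python. This is not DataRight IQ’s function syntax, and the function name strip_parenthetical is hypothetical; it finds the opening parenthesis and keeps everything before it.

```python
def strip_parenthetical(field):
    """Keep everything before the first '(' and drop the trailing blank."""
    pos = field.find("(")
    if pos == -1:
        return field             # no parentheses: leave the field unchanged
    return field[:pos].rstrip()


for value in ("123 Main St (back door)", "345 10th Ave (deliver after 2:00)"):
    print(strip_parenthetical(value))
```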
Convert dates
You can convert dates from 2-digit years to 4-digit years.
Suppose you manage a membership database for a professional organization. The database contains a date indicating when a person’s membership expires. The field has a 2-digit year, for example, 99-10-25. Some memberships expire in the 2000s; for example, 03-12-31 indicates that the membership expires on December 31, 2003.
You can use DataRight IQ to convert the dates from 2-digit years to 4-digit years:
Before      After
04-12-31    2004-12-31
98-04-25    1998-04-25
99-01-01    1999-01-01
05-03-22    2005-03-22
Convert 1900s and 2000s
You can convert dates to 4-digit years even if some dates are in the 2000s. Just tell DataRight IQ what year to use as the cutoff. For example, if you set the cutoff year to “05,” DataRight IQ converts the years 00 through 05 to the 2000s (20xx) and the years 06 through 99 to the 1900s (19xx).
Convert to and from almost any format
DataRight IQ can convert dates to and from almost any format:
Convert almost any input format. For example, DataRight IQ supports formats such as MM/DD/YY (01/23/04), YY-MMM-DD (04-Jan-23), DDMMYY (230104), MMM DD, YY (Jan 23, 04) and more. (Refer to “Formats and delimiters” on page 72 for a list of all date formats.)
Choose from a variety of date delimiters, such as no delimiter, dashes, or slashes. (Refer to “Formats and delimiters” on page 72 for a list of delimiters.)
Convert date-type or character-type fields
DataRight IQ doesn’t require that your dates be stored in date-type fields. DataRight IQ can convert the format of dates stored in character-type fields or date-type fields.
You can convert the format of dates, particularly in character-type fields. For example, you could convert from a DD-MM-YYYY format to a YYYY-MMM-DD format.
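Both conversions in this section can be sketched in Python. to_four_digit applies the cutoff rule to a YY-MM-DD string (the default cutoff of 5 matches the “05” example), and convert_date uses the standard datetime.strptime and strftime calls. Python’s own %y directive uses a fixed pivot (69 through 99 map to the 1900s, 00 through 68 to the 2000s) rather than a configurable cutoff, and %b gives English month abbreviations in the default C locale. Both function names are hypothetical.

```python
from datetime import datetime


def to_four_digit(date_str, cutoff=5):
    """Expand a YY-MM-DD string: years 00..cutoff -> 20xx, the rest -> 19xx."""
    yy, mm, dd = date_str.split("-")
    year = int(yy)
    year += 2000 if year <= cutoff else 1900
    return f"{year:04d}-{mm}-{dd}"


def convert_date(text, in_fmt, out_fmt):
    """Reformat a date stored as text from one strftime-style format to another."""
    return datetime.strptime(text, in_fmt).strftime(out_fmt)


print(to_four_digit("98-04-25"))                           # 1998-04-25
print(convert_date("23-01-2004", "%d-%m-%Y", "%Y-%b-%d"))  # 2004-Jan-23 in the C locale
```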
Post the data that you want
DataRight IQ offers a variety of output data so that you can post the data you
want into each output field.
For complete descriptions of DataRight IQ output fields, see Firstlogic’s Quick
Reference. For an introductory discussion of output posting, see Firstlogic’s
Database Prep documentation.
Name and title components and lines
For names and job titles, DataRight IQ offers individual components and an individual name line for each of up to six names. DataRight IQ also provides complete name lines containing the same name data as the input name line.
Input: Dr. Mary R. Smith, M.D. and Mr. Doug A. Jones, Jr., Vice President
Available for output:
Components
Prename 1: Dr.
First Name 1: Mary
Middle Name 1: R.
Last Name 1: Smith
Maturity Postname 1:
Other Postname 1: M.D.
Title 1:
Prename 2: Mr.
First Name 2: Doug
Middle Name 2: A.
Last Name 2: Jones
Maturity Postname 2: Jr.
Other Postname 2:
Title 2: Vice President
Line for each name
Name 1: Dr. Mary R. Smith, M.D.
Name 2: Mr. Doug A. Jones Jr.
Name line
Dr. Mary R. Smith M.D. and Mr. Doug A. Jones Jr. Vice President
Firm components and lines
DataRight IQ provides two kinds of firm data: firm names and firm locations. A firm location is a department, mail stop, or other location within a company.
Input: Firstlogic Inc., Dept. of Accounting
Available for output:
Components
Firm: Firstlogic Inc.
Firm Location: Dept. of Accounting
Line
Firstlogic Inc. Dept. of Accounting
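The “Line for each name” output shown earlier in this section can be approximated by joining components in order. In this Python sketch the component keys are hypothetical, and the comma before an other postname (but not before a maturity postname) simply reproduces the example; DataRight IQ’s actual formatting rules may differ.

```python
def name_line(c):
    """Join one person's parsed name components into a single name line."""
    parts = [c.get(k, "") for k in ("prename", "first", "middle", "last", "maturity")]
    line = " ".join(p for p in parts if p)
    if c.get("other_postname"):
        line += ", " + c["other_postname"]  # e.g. M.D., set off by a comma
    return line


print(name_line({"prename": "Dr.", "first": "Mary", "middle": "R.",
                 "last": "Smith", "other_postname": "M.D."}))
print(name_line({"prename": "Mr.", "first": "Doug", "middle": "A.",
                 "last": "Jones", "maturity": "Jr."}))
```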
Address components and lines
DataRight IQ offers address and last-line components as well as address lines and
a last line. Address-line components are offered for use in matching; we don’t
recommend them for general use.
Address: 800 W Benton St Apt 6, PO box 123
Primary Address: 800 W Benton St
PO Box Line: PO box 123
Last Line: Tomah WI 54660-1474
Note: Similar data are available for rural-route addresses.
DataRight IQ offers two kinds of parsed data: standardized and unstandardized.
Standardized data is altered by DataRight IQ according to your settings and
DataRight IQ’s standardization rules.
Unstandardized data is identified and parsed into individual components or
lines, but the data is not altered. Casing and spelling are left exactly as they
appear in the input file.
Input: John Mckay Jr, ACCOUNTANT
       Firstlogic Incorp., dept. of accounting
Output          Standardized           Unstandardized
Name:           Mr. John McKay, Jr.    John Mckay Jr
Title:          Accountant             ACCOUNTANT
Firm:           Firstlogic Inc.        Firstlogic Incorp.
Firm Location:  Dept. of Accounting    dept. of accounting
You can retrieve standardized data from AP fields, and unstandardized data from
APU fields. For a list of fields, see our Quick Reference.
New data
During processing, DataRight IQ generates new data.
Input: Doug Jones
New data:
Name: Mr. Doug Jones
Gender: 1 (strong male)
Prename: Mr.
Salutation: Dear Mr. Jones:
Match Std: Douglas
You can retrieve new data from DataRight IQ application (AP) fields. For a list of
DataRight IQ application fields, see our Quick Reference.
Note: DataRight IQ also generates parsing-confidence scores and data-quality
codes. For more information about scores and codes, see “Data-quality scores
and codes” on page 165.
Raw data from the input file
You can copy raw data directly from the input file to the output file.
For example, suppose your input file contains data that you don’t want to process with DataRight IQ’s parsers. You want to preserve the data so that it is exactly the same in the output file as in the input file. For each record, you can carry that data over from the input file to the output file.
To copy data directly from the input file, use the field name as it appears in your format file (FMT or DMT) or your dBASE3 file, prefixed with “DB.” For example, for the input field BIRTHDATE, you could post DB.BIRTHDATE.
Note: To post raw data from many fields at once, you can use the Copy Input Data To Output File option in the Post To Output File section of the job file. For setup details, see the Views online help.
Overcome field-naming conflicts
Suppose you’re processing two input files. One file has a field called PART_NO
and another has a field called PART_ID. You want to post the part numbers to the
output file. However, if you post DB.PART_NO you’ll get data from the first
input file but not the second. Likewise, if you post DB.PART_ID, you’ll get data
from the second file but not the first.
To overcome the difference in input field names (PART_NO versus PART_ID),
present both input fields to DataRight IQ as PW.PART_NO. You could then post
PW.PART_NO in the output file. Your DEF file entries would look like this:
DEF file 1:
PW.PART_NO = PART_NO
DEF file 2:
PW.PART_NO = PART_ID
Note: You can also use user-defined PW fields to overcome field-naming
conflicts. For more information, see Firstlogic’s Database Prep
documentation.
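Conceptually, each DEF file maps one shared PW field name to a different input field name. The Python sketch below models that with hypothetical dictionaries; it is not the DEF file format.

```python
# Hypothetical stand-ins for the two DEF files: each maps the shared PW field
# name to the field name used by that particular input file.
DEF_FILE_1 = {"PW.PART_NO": "PART_NO"}
DEF_FILE_2 = {"PW.PART_NO": "PART_ID"}


def present_as_pw(record, def_map):
    """Return the record's data keyed by PW field names instead of DB names."""
    return {pw_field: record.get(db_field, "") for pw_field, db_field in def_map.items()}


file1 = [{"PART_NO": "A-100"}]
file2 = [{"PART_ID": "B-200"}]
posted = ([present_as_pw(r, DEF_FILE_1) for r in file1]
          + [present_as_pw(r, DEF_FILE_2) for r in file2])
print(posted)
```

Posting PW.PART_NO then yields part numbers from both files, which is the point of the shared PW name.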
Chapter 9: Search-and-replace
DataRight IQ’s search-and-replace feature lets you modify data or filter records
according to search-and-replace results.
There are many ways that you can apply the search-and-replace feature to your
jobs. Here are four ways for you to use this feature:
Convert coded data (see page 99)
Remove unwanted data from a field (see page 100)
Search and put (see page 101)
Select a subset of records (see page 102)
In this chapter, read an overview of the search-and-replace feature (“Simple search and replace” on page 90), then learn how to set up your own search-and-replace (“How to use search-and-replace” on page 92).
This chapter ends with examples of the four ways to use search and replace (listed
above), and a few other examples of ways that you can use this feature to enhance
your job results (“Additional examples” on page 103).
Simple search and replace
When you use search and replace, you can search for
a substring
a word
a pattern
the entire contents of a field
and replace it with another value.
For example, if a company changes its name, you could search for the old
company name and replace it with the new company name. To do this type of
traditional search-and-replace, you must perform the search-and-replace on a
field in the input database (a DB field).
Substring
Use the string search-and-replace method to replace a string of characters that may be found next to or between other characters in a field. For example, you could strip extraneous punctuation marks from a field:
Search value   Replace value   Before       After
/              (space)         John/Jones   John Jones
Use the string search-and-replace method carefully. The search string is replaced whenever it is found, even if it’s part of a word:
Search value   Replace value   Before               After
And            &               Gerald K. Anderson   Gerald K. &erson
Processing speed
A substring search slows processing. The effect on speed depends on a number of variables, including file sizes and the size of the search-and-replace table.
Word
Use the word search-and-replace method to replace a word found within a field.
Search value   Replace value   Before           After
New            Renewal         New subscriber   Renewal subscriber
Pattern
Use the pattern search to search for patterns within a field that you can replace with a value. For example, suppose a database contains vehicle identification numbers (VINs) that you want to replace with the make of the vehicle.
Search value:  ^[A-Z0-9]([F])[A-Z0-9]{7}([A-Z])([A-Z])([0-9]{6})$
Replace value: Ford
Before:        1FMDU34X7TZA04833
After:         Ford
Field
Use the field search-and-replace method to match and replace the entire contents of a field. The entire field must match the search value, and the entire field is replaced:
Search value   Replace value      Before             After
Occupant       Current Resident   Occupant           Current Resident
                                  Current Occupant   Current Occupant
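The four methods can be compared in a short Python sketch. The function search_replace, its method names, and the simplified, hypothetical VIN pattern in the usage lines are assumptions for illustration; they are not DataRight IQ’s API. The string method tries longer search values first, following the longest-to-shortest order described later in this chapter.

```python
import re


def search_replace(value, table, method):
    """Apply a search-and-replace table to one field value using one method."""
    if method == "field":        # the entire field must match a search value
        return table.get(value, value)
    if method == "word":         # whole words only
        return " ".join(table.get(w, w) for w in value.split(" "))
    if method == "string":       # substrings anywhere, even inside words
        for search in sorted(table, key=len, reverse=True):  # longest first
            value = value.replace(search, table[search])
        return value
    if method == "pattern":      # regular-expression search values
        for regex, replacement in table.items():
            new_value, count = re.subn(regex, replacement, value)
            if count:
                return new_value
        return value
    raise ValueError(f"unknown method: {method}")


print(search_replace("Gerald K. Anderson", {"And": "&"}, "string"))
print(search_replace("1FMDU34X7TZA04833", {r"^[A-Z0-9]F[A-Z0-9]{15}$": "Ford"}, "pattern"))
```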
How to use search-and-replace
Setting up a search-and-replace process involves three main steps:
1. Create a search-and-replace table.
2. Create a search-and-replace function.
3. Use the search-and-replace function.
Create a search-and-replace table
First, you need to specify what to search for and how to replace it. You use a search-and-replace table to tell DataRight IQ each search value and its replacement value.
You can use an internal table created within the job file or an external table that resides in a separate file. (See “Step 1: Create a search-and-replace table” on page 93.)
Create a search-and-replace function
After you create a search-and-replace table, you need to specify how and when to conduct the search. You use a search-and-replace function to tell DataRight IQ how to conduct the search. (See “Step 2: Create a search-and-replace function” on page 95.)
Use the search-and-replace function
To actually perform a search-and-replace, you must use your search-and-replace function elsewhere in the job. You can use the function on input or on output. When you use the function, you specify which field to search and where to place the results.
Tell DataRight IQ when to conduct the search (on input or output), what field to search, and where to place the results by using the function anywhere a filter can be applied. (See “Step 3: Use the search-and-replace function” on page 96.)
Step 1: Create a search-and-replace table
If you want to conduct a search-and-replace, the first step is to tell DataRight IQ
exactly what values to search for and how to replace them. To do this, you need to
create a search-and-replace table.
Search-and-replace table
Suppose you store prenames as 2-byte codes and want to convert the codes to prenames. Create a table that tells DataRight IQ each code and its replacement value. For example, your search-and-replace table might look like this:
Search for    Replace with
01            Mr.
02            Mrs.
03            Ms.
04            Miss
05            Dr.
06            Rev.
07            Rabbi
08            Lt.
09            Col.
10            Gen.
Internal versus external tables
You can use two types of search-and-replace tables: internal and external.
Use an internal table for a job-specific task. For example, if your input file
contains prename codes unique to that file, you could use an internal table to
convert the codes.
Use an external table for a frequently performed task. For example, suppose
your company uses the same prename codes in all its databases. Whenever
you process a database, you need to convert the codes. You could create an
external table and use it every time you need to convert the codes.
How to create an internal table
To create an internal table, set up the Create Internal Table section in the job file.
For setup details, see the Views online help.
How to create an external table
An external table resides in a separate file and can be used over and over again for
any DataRight IQ job.
If you have DataRight IQ Views, the quickest way to create a new external
table is to use the Search And Replace Wizard in the Tools menu.
If you own DataRight IQ Job, create a database containing a search field and
a replace field. For each record, type a search value and its replacement
value. Then, use the User-Modifiable Dictionary (UMD) program to convert
the database to an external search-and-replace table. For guidelines on how to
format the database and use UMD, see the DataRight IQ Modifier’s Guide.
Table entries are independent
DataRight IQ does not search for values in the order in which they appear in the
search-and-replace table. Instead, DataRight IQ stores search-and-replace values
in a sequence that optimizes look-ups. It searches from the longest string to the
shortest string. Set up your table with the intention that each entry is used in a
separate search-and-replace action.
If you need to perform one search-and-replace before you perform another, set up
two separate search-and-replace actions and nest one inside the other. For more
information about nested functions, see the Database Prep manual.
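The behavior described above (longest string first, each entry applied independently) can be approximated in Python with one regular-expression pass. build_replacer is a hypothetical helper: because the text is scanned once, a replacement value is never searched again, and nesting two replacers gives the “one search-and-replace before another” behavior.

```python
import re


def build_replacer(table):
    """Return a function that applies every table entry in a single pass."""
    if not table:
        return lambda text: text
    # Longer search strings are tried first, so the longest match wins.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # One left-to-right scan: replaced text is never searched again.
    return lambda text: pattern.sub(lambda m: table[m.group(0)], text)


codes = build_replacer({"Intl": "International", "Int": "Interest"})
print(codes("Intl Int"))          # International Interest
first = build_replacer({"a": "b"})
second = build_replacer({"b": "c"})
print(second(first("ab")))        # cc: the nested search sees the first's output
```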
Step 2: Create a search-and-replace function
A search-and-replace table is just the first step; it tells DataRight IQ each search
value and its replacement value. You also have to specify how and when to
conduct the search.
The search-and-replace function gives DataRight IQ the following information:
The function name.
Which search-and-replace table(s) to use.
Whether to search for the entire contents of a field, a word within a field, a substring within a field, or a pattern within a field. Options are field, word, string, and pattern.
What action to take if the search field does not contain any of the search values from your table.
Case sensitivity.
Name and build your function
To give DataRight IQ these instructions, you create and use a search-and-replace function. The function tells DataRight IQ how to conduct the search.
To create a search-and-replace function, open a Create Search/Replace Function window. In this window, you:
Enter a name for the function.
Tell DataRight IQ which search-and-replace table(s) to use.
Choose to search the entire contents of a field, a word within a field, a substring within a field, or a pattern within a field.
Leave unmatched data intact or replace it with a default value.
Tell DataRight IQ whether to ignore case.
Refer to “Create Search/Replace Function” on page 204 for descriptions of the
parameters in this block. Views users may consult the Views online help.
Step 3: Use the search-and-replace function
You can apply a search-and-replace function anywhere that you can apply a filter:
Job-file block: Input File (Views); Post to Input File (Job)
  Purpose: Apply a function to the input file before posting modified data.
  Function location: Modify PW Field (Views); Copy (Source, Destination) (Job)
Job-file block: Post to Output File
  Purpose: Apply a function before posting modified data to an output file.
  Function location: Post Data setting (Views); Copy (source, destination) (Job)
Job-file blocks: Input File; Post to Output File; Input List Description
  Purpose: Select a set of records for processing or for inclusion in an output file, list, or report.
  Function location: Filter setting (Views); Filter (Job)
Job-file blocks: Report: Parsing Error; Report: Change; Report: All Records
  Purpose: Select a set of records for inclusion in a report.
  Function location: Report Options, Record Filter (Views); Record Filter (Job)
Apply your function on input
By setting up the search-and-replace function in the Input File block, you are
telling DataRight IQ to conduct the search on input. The search-and-replace
function tells DataRight IQ how to conduct the search. When you actually use the
function, you tell DataRight IQ the following information:
Which field do you want to search?
Where do you want to place the results?
Using the example from “Search-and-replace table” on page 93, suppose your
coded prename data is stored in a field called PREFIX. You want to convert the
codes to prenames on input so that DataRight IQ can process the prename data.
In Views: Modify PW Field
Tell DataRight IQ what fields to search by using the Modify PW Field tool in the Input File block.
To access Modify PW Field, click Modify in the Input File window after you’ve
entered the input file path and filename.
In the Modify PW Fields window, you specify where to place the results of the search-and-replace function. (In our example, it is the Pre_Name field.) Then you build the expression that you will use. The expression consists of your search-and-replace function applied to the field to search (in our example, the database field PREFIX).
After you set up Modify PW Fields, DataRight IQ will search the PREFIX field
in your input file and place the search-and-replace results in the DataRight IQ
input field PW.Pre_Name.
In Job
In Job, you only need to enter your expression in the Copy (source, destination)
parameter:
*BEGIN Post to Input File ======================================
Use a search-and-replace function to select a set of records to be included in an input file, output file, list, or report. You can apply the function in these blocks:
Input File
Post to Output File
Input List Description
Report: Parsing Error
Report: Change
Report: All Records
In Views
Click Report Details in the report blocks to apply your function. In the other blocks, use the Filter feature. Below is the sample function in the Report Details window:
In Job
The function is in the Record Filter parameter.
*BEGIN Report: Change ==========================================
Location and File Name/Printer Device =
Existing File (APPEND/REPLACE)....... =
Number of Copies (1 to 10)........... =
Case (UPPER/Mixed)................... =
**********Portions omitted for illustration *************
Record Filter (to 1024 chars)........ = ConvertPre(DB.PREFIX)
Nth Select Type (USER/AUTO/RANDOM)... = USER
User Nth Select (1.0 - ???).......... =
Max # of Records to Print............ = 500
Field Type (AP/CUSTOM)............... = AP
Custom Copy (src,len[,title])........ =
END
Convert coded data
You can use search-and-replace to convert data on input. For example, you could
convert numeric prename codes to actual prename data (for example,
1 = Mr.) so that DataRight IQ could work with the prename data during
processing.
You can also convert data before posting it to an output file. For example, to save
storage space you might prefer to store prenames as 1-byte codes. You could use
search-and-replace to convert prenames to codes, then post the codes to the
output file.
Example
Suppose your input file has a PREFIX field containing coded prename data. You want to convert the codes to prenames. Since DataRight IQ uses prename data during processing, you want to convert the codes on input.
Create a search-and-replace table
First, set up a search-and-replace table showing each code and its corresponding prename:
BEGIN Create Internal Table ==============================
Internal Table Name (to 20 chars).... = Prename Table
Table Entry (search,replace)......... = 1, Mr.
Table Entry (search,replace)......... = 2, Mrs.
Table Entry (search,replace)......... = 3, Ms.
Table Entry (search,replace)......... = 4, Miss
Table Entry (search,replace)......... = 5, Dr.
Table Entry (search,replace)......... = 6, Rev.
END
Create a search-and-replace function
Next, set up a search-and-replace function. Specify the name of the table and the
type of search to perform. For this example, we’ll do a field search because the
code fills the entire 1-byte PREFIX field. We don’t want to leave any stray codes,
so if DataRight IQ finds a code that is not in our table, we’ll replace it with a
default value of nothing (indicated by two double-quotation marks with nothing
in between).
BEGIN Create Search/Replace Function =====================
Function Name (to 10 chars).......... = conv_pre
Internal Table Name (to 20 chars).... = Prename Table
External Table (path & file name).... =
Search Priority (INTERNAL/EXTERNAL).. =
Search & Repl. Method (FIELD/WORD/STR)= field
Default Return Action (ORIG/DFLT).... = dflt
Default Return Value ................ = ""
Case Insensitive Search/Replace(Y/N)........... = N
END
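A field search with a default return action behaves roughly like a dictionary lookup with a fallback. The Python sketch below is illustrative only: make_field_sr and its parameters are hypothetical stand-ins for the Search & Repl. Method, Default Return Action, Default Return Value, and Case Insensitive Search/Replace parameters, and it assumes that a FIELD search matches only when the whole field equals a table entry.

```python
# Hypothetical model of a FIELD-mode search-and-replace function.
# Parameter names are illustrative; they are not DataRight IQ settings.
PRENAME_TABLE = {
    "1": "Mr.", "2": "Mrs.", "3": "Ms.",
    "4": "Miss", "5": "Dr.", "6": "Rev.",
}


def make_field_sr(table, default_action="DFLT", default_value="",
                  case_insensitive=False):
    """Return a function that replaces an entire field value via `table`."""
    if case_insensitive:
        lookup = {key.lower(): value for key, value in table.items()}
    else:
        lookup = dict(table)

    def sr(field_value: str) -> str:
        key = field_value.lower() if case_insensitive else field_value
        if key in lookup:
            return lookup[key]
        # No table match: ORIG keeps the input, DFLT returns the default value.
        return field_value if default_action == "ORIG" else default_value

    return sr


# Mirrors the conv_pre settings: FIELD search, DFLT action, empty default.
conv_pre = make_field_sr(PRENAME_TABLE, default_action="DFLT", default_value="")

if __name__ == "__main__":
    print(repr(conv_pre("3")))  # 'Ms.'
    print(repr(conv_pre("9")))  # ''  (code not in the table)
```

With default_action="ORIG", the unmatched code 9 would pass through unchanged instead of being blanked, which is exactly the stray-code situation the example avoids by choosing DFLT.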
Use the search-and-replace function
Finally, use the search-and-replace function elsewhere in the job. For our
example, we want to convert the coded data as it is input, so we’ll use the search-and-replace function at the Modify PW Field parameter in the Input File block.
When we use the function, we specify which field to search (the search field) and
where to place the results (the destination field):
Modify PW Field = conv_pre(DB.PREFIX), Pre_Name
In this line, conv_pre is the S/R function, DB.PREFIX is the search field, and
Pre_Name is the destination PW field.
Remove unwanted data from a field
You can use search-and-replace to remove unwanted data from a field. The
search-and-replace feature lets you search for substrings or words within a field,
or for the entire contents of a field. This means you can target very specific data
to be removed from a field while leaving the rest of the field intact.
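The three search methods differ in how much of the field a table entry must match. This Python sketch is a simplified illustration, not the product's matching logic: search_replace is a hypothetical name, and it assumes that words are whitespace-delimited, that STR entries are applied in table order, and that an unmatched FIELD search returns the original value.

```python
import re


def search_replace(value: str, table: dict, method: str) -> str:
    """Hypothetical model of the FIELD, WORD, and STR search methods."""
    method = method.upper()
    if method == "FIELD":
        # Match only when the entire field equals a search value.
        return table.get(value, value)
    if method == "WORD":
        # Replace whole whitespace-delimited words; spacing is preserved.
        return re.sub(r"\S+", lambda m: table.get(m.group(0), m.group(0)), value)
    if method == "STR":
        # Replace every occurrence of each search value, wherever it appears.
        for search, replace in table.items():
            value = value.replace(search, replace)
        return value
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    table = {"Doe": "Smith"}
    print(search_replace("John Doe", table, "FIELD"))    # John Doe
    print(search_replace("John Doe", table, "WORD"))     # John Smith
    print(search_replace("John Doeman", table, "WORD"))  # John Doeman
    print(search_replace("John Doeman", table, "STR"))   # John Smithman
```

The last two calls show why a substring search is the one to use when a search value may sit next to other characters.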
Example: Suppose you are processing a file in which the Name field in each record includes
an extra phrase (for example, Mr. John Doe, beneficiary). As you input each record, you
want to remove the phrase from the Name field.
Create a search-and-replace table
Create a search-and-replace function
Use the search-and-replace function
First, set up a search-and-replace table. The easiest way to remove the phrase from the
Name field is to search for it and replace it with nothing (an empty string):
BEGIN Create Internal Table ===============================
Next, set up a search-and-replace function. Specify the name of the table and the
type of search to perform. For this example, we need to do a substring search (str)
because each search value may lie next to or between other characters.
BEGIN Create Search/Replace Function =========================
Finally, use the search-and-replace function elsewhere in the job. For our
example, we want to modify data on input, so we’ll use the search-and-replace
function at the Modify PW Field parameter in the Input File block.
When we use the function, we specify which field to search (the search field) and
where to place the results (the destination field):
Modify PW Field = removeben(DB.Name), Name_Line1
In this line, removeben is the S/R function, DB.Name is the search field, and
Name_Line1 is the destination PW field.
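To see the whole flow for this example, the Python sketch below applies a substring removal to one record and writes the result to a destination field. It is an illustration under stated assumptions: the search value ", beneficiary" is assumed (the guide's table entries are not shown here), records are modeled as dictionaries, and removeben and modify_pw_field are hypothetical Python stand-ins for the job-file function and the Modify PW Field parameter.

```python
# Hypothetical stand-ins for the removeben function and Modify PW Field.
# Assumed table entry: delete the substring ", beneficiary" (STR search).
REMOVE_TABLE = {", beneficiary": ""}


def removeben(value: str) -> str:
    """Apply each STR-mode table entry wherever it occurs in the value."""
    for search, replace in REMOVE_TABLE.items():
        value = value.replace(search, replace)
    return value


def modify_pw_field(record: dict, func, search_field: str, dest_field: str) -> None:
    """Model of 'Modify PW Field = func(search_field), dest_field'."""
    record[dest_field] = func(record[search_field])


if __name__ == "__main__":
    # "Name" stands in for the input field referenced as DB.Name.
    record = {"Name": "Mr. John Doe, beneficiary"}
    modify_pw_field(record, removeben, "Name", "Name_Line1")
    print(record["Name_Line1"])  # Mr. John Doe
```

The search field is left untouched; only the destination field receives the cleaned value.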