Business objects DATARIGHT IQ 7.70C User Manual

DataRight IQ

User’s Guide

Version 7.70c July 2006

Identify and parse name, title, firm data, phone numbers, Social Security numbers, dates, e-mail addresses, and user-defined patterns

Assign gender and add prenames

Create personalized greetings

Generate match standards

Convert files to a standard format

Search for and replace data

Scan and split data

Generate reports and statistics files

Notices

Published in the United States of America by Firstlogic, Inc., 100 Harborview Plaza, La Crosse, Wisconsin 54601-4071.

Customer Care

Technical help is free for customers who are current on their ESP. Advisors are available from 8 a.m. to 6 p.m. Central time, Monday through Friday. When you call, have at hand the user’s manual and the version number of your Firstlogic product. Call from a location where you can operate your software while speaking on the phone. To save time, fax or e-mail your questions, and an advisor will call or e-mail back with answers prepared. Or visit our Knowledge Base on the Customer Portal web site to find answers on your own, right away, at any time of the day or night.

Our Customer Care group also manages our customer database and order processing. Call them for order status, shipment tracking, reporting damaged shipments or flawed media, changes in contact information, and so on.

Phone: 888-788-9004 in the U.S. and Canada; elsewhere +1-608-788-9000

Fax: 608-788-2870

Web site: http://www.firstlogic.com/customer

E-mail: customer@firstlogic.com

Product literature: 888-215-6442, fax 608-788-1188, or information@firstlogic.com

Corporate receptionist: 608-782-5000, or fax 608-788-1188

What do you think of this guide?

The Firstlogic Technical Publications group strives to bring you the most useful and accurate publications possible. Please give us your opinion about our documentation by filling out the brief survey at http://www.firstlogic.com/customer/surveys/default.asp. We appreciate your feedback! Thank you!

Legal notices

Firstlogic Proprietary and Confidential

© 2006 Firstlogic, Inc. All rights reserved. This publication and accompanying software are protected by U.S. copyright law and international treaties. No part of this publication or accompanying software may be copied, transferred, or distributed to any person without the express written permission of Firstlogic, Inc.

National ZIP+4 Directory © 2006 United States Postal Service. Firstlogic Directories © 2006 Firstlogic, Inc. All City, ZCF, state ZIP+4, regional ZIP+4, and supporting directories are also protected under the Firstlogic copyright. Firstlogic, Inc. is a nonexclusive interface distributor of the USPS and holds a nonexclusive license to publish and sell ZIP+4 databases on optical and magnetic media. Firstlogic publishes this document and offers the Firstlogic product to the public under a nonexclusive license from the United States Postal Service. The price of the Firstlogic product is not established, controlled, or approved by the U.S. Postal Service.

Firstlogic, Inc., or any authorized dealer distributing this product, makes no warranty, expressed or implied, with respect to this computer software product or with respect to this manual or its contents, its quality, performance, merchantability, or fitness for any particular purpose or use. It is solely the responsibility of the purchaser to determine its suitability for a particular purpose or use. Firstlogic, Inc. will in no event be liable for direct, indirect, incidental, or consequential damages resulting from any defect or omission in this software product, this manual, the program disks, or related items and processes, including, but not limited to, any interruption of service, loss of business or anticipatory profit, even if Firstlogic, Inc. has been advised of the possibility of such damages. This statement of limited liability is in lieu of all other warranties or guarantees, expressed or implied, including warranties of merchantability or fitness for a particular purpose.

Registered trademarks of Firstlogic, Inc. include 1L, 1L (ball design), ACE, ACSpeed, DataJet, DocuRight, eDataQuality, Entry Planner, Firstlogic, Firstlogic InfoSource, FirstPrep, FirstSolutions, GeoCensus, i·d·Centric, IQ Insight, iSummit, Label Studio, MailCoder, Match/Consolidate, PostWare, Postalsoft, Postalsoft Address Dictionary, Postalsoft Business Edition by Firstlogic, Postalsoft DeskTop Mailer, Postalsoft DeskTop PostalCoder, Postalsoft DeskTop Presort, Postalsoft Manifest Reporter, PrintForm, RapidKey, Total Rewards, and TrueName. Trademarks of Firstlogic, Inc. include DataRight, IRVE, and TaxIQ. Trademarks of the United States Postal Service include CASS, DPV, eLOT, FASTforward, NCOALink, and ZIP. All other trademarks are the property of their respective owners.

DataRight IQ User’s Guide

Contents...........................................................................................................3

Preface .............................................................................................................7

DataRight IQ Overview ................................................................................ 9

Chapter 1: Welcome to DataRight IQ ............................................................ 11

How DataRight IQ works ..............................................................................12

What DataRight IQ can do.............................................................................14

Migrate data from a mainframe system..........................................................17

Prepare records for match processing ............................................................19

Convert file type and format ..........................................................................21

Convert floating data to fielded data..............................................................22

Chapter 2: Standardize data ................................................................... 23

Name format...................................................................................................24

Address data ...................................................................................................25

Convert words to acronyms............................................................................26

Convert case ...................................................................................................27

Other standardization options.........................................................................28

Output standardized data ..............................................................................30

Chapter 3: Add new data to existing data ...................................................... 31

Gender codes ..................................................................................................32

Prenames ........................................................................................................34

Salutations ......................................................................................................37

Chapter 4: Custom dictionaries ................................................................ 41

Create a custom dictionary with UMD .........................................................42

Improve casing results....................................................................................44

Improve parsing results ..................................................................................45

DataRight IQ Job and Views...................................................................... 47

Chapter 5: Set up a DataRight IQ job .......................................................... 49

Set up your input files ....................................................................................50

Define input fields..........................................................................................51

Set up auxiliary files.......................................................................................52

Set up your output files ..................................................................................53

Verify that your job is ready ..........................................................................54


Chapter 6: Overview of data parsing ........................................................... 55

DataRight IQ uses parsing dictionary and word patterns .............................. 56

DataRight IQ uses rule-based parsing .......................................................... 57

Presumptive parsing....................................................................................... 58

Parse discrete components and lines..............................................................60

Parse floating data.......................................................................................... 61

DataRight IQ’s multiline parsing order .........................................................63

Modify how DataRight IQ parses..................................................................64

Chapter 7: Details of data parsing ............................................................ 65

Parse street addresses.....................................................................................66

Parse e-mail addresses ................................................................................... 67

Parse Social Security number ........................................................................69

Parse dates......................................................................................................72

Parse phone numbers ..................................................................................... 73

Parse user-defined patterns ............................................................................ 76

Parse names and titles....................................................................................77

Parse firms .....................................................................................................77

Library only: Associate a title with a name on the same line........................ 78

Library only: Associate a title with a name on a different line......................79

Library only: Associating name lines with title lines ................................80

Chapter 8: Process parsed data ................................................................ 81

Standardize data............................................................................................. 82

Add new data ................................................................................................. 83

Remove unwanted data ................................................................................ 84

Convert dates .................................................................................................85

Post the data that you want ............................................................................ 86

Chapter 9: Search-and-replace ................................................................. 89

Simple search and replace.............................................................................. 90

How to use search-and-replace ......................................................................92

Step 1: Create a search-and-replace table ...................................................... 93

Step 2: Create a search-and-replace function ................................................ 95

Step 3: Use the search-and-replace function.................................................. 96

Convert coded data ........................................................................................ 99

Remove unwanted data from a field............................................................100

Search and put.............................................................................................. 101

Select a subset of records............................................................................ 102

Additional examples .................................................................................... 103

Chapter 10: Scan-and-split .................................................................... 107

Step 1: Create a scan-and-split table............................................................ 109

Step 2: Create a scan-and-split function ......................................................111

Step 3: Apply the scan-and-split function.................................................... 113

Advanced techniques: ..................................................................................116


Chapter 11: Input and output .................................................................. 119

Select and control input and output records.................................................126

Set up output files.........................................................................................133

Chapter 12: Reports and statistics ............................................................ 141

Manage your reports and statistics files .......................................................143

Generate statistics files.................................................................................145

About reports................................................................................................147

About statistics files .....................................................................................161

Chapter 13: Data-quality scores and codes ..................................................... 165

Confidence scores ........................................................................................166

Using parsing-confidence scores..................................................................167

Confidence scores for Names.......................................................................169

Confidence scores for firms .........................................................................172

Change codes................................................................................................173

Quality codes................................................................................................176

Error codes ...................................................................................................179

Status codes ..................................................................................................180

Chapter 14: Master job file ................................................................... 183

Chapter 15: Job-file blocks and parameters .................................................... 193

Auxiliary Files..............................................................................................194

Create File for Output ..................................................................................196

Create Internal Table....................................................................................199

Create Scan/Split Function...........................................................................201

Create Scan/Split Table................................................................................203

Create Search/Replace Function ..................................................................204

Execution......................................................................................................206

General ........................................................................................................209

Input File ......................................................................................................210

Input List Description...................................................................................214

Multiline Parsing..........................................................................................216

Output Control..............................................................................................217

Parsing Control.............................................................................................218

Post to Input File ..........................................................................................219

Post to Output File........................................................................................220

Report Defaults ............................................................................................223

Report: All Records......................................................................................227

Report: Change.............................................................................................230

Report: Executive Summary

Report: Input File Summary

Report: Input List Summary

Report: Job Summary...................................................................................231

Report: List Quality......................................................................................232

Report: Output File.......................................................................................233

Report: Parsing Error ...................................................................................234

Salutation Options........................................................................................235


Standardization/Assignment Control...........................................................237

Statistics Files .............................................................................................. 242

Undetermined List Action ...........................................................................243

Unicode Conversion ....................................................................................244


Welcome to DataRight IQ Library ......................................................... 247

Chapter 16: Overview of DataRight IQ Library .................................................. 249

Phases of operation ...................................................................................... 250

Application Program Interfaces (API).........................................................251

Chapter 17: Setup phase ....................................................................... 253

Define input fields ....................................................................................... 254

Input settings................................................................................................ 256

Setup methods..............................................................................................264

Chapter 18: Validation, parsing, and output ................................................... 267

Validation Phase .......................................................................................... 268

Parsing Phase ...............................................................................................269

Output Phase ................................................................................................270

The query method ........................................................................................273

Chapter 19: Library compiling and linking ..................................................... 275

Sample program........................................................................................... 276

Chapter 20: Library fields and options ........................................................ 279

General defines ............................................................................................ 280

Library options............................................................................................. 281

Miscellaneous options..................................................................................283

Options for names and firms........................................................................284

Options for greetings ................................................................................... 287

Options for dates.......................................................................................... 289

Options for phone numbers..........................................................................292

Options for Social Security numbers........................................................... 293

Options for disabling parsing.......................................................................294

Gender codes................................................................................................295

Chapter 21: Library methods for C++ ........................................................... 297

Chapter 22: Library functions for C ........................................................... 309

Library API functions ..................................................................................311

Index............................................................................................................ 333


Preface

Firstlogic’s DataRight IQ product includes a number of user guides, reference guides, and online documentation (see below).

About this guide

This guide is divided into three units: DataRight IQ Overview, DataRight IQ Job and Views, and DataRight IQ Library.

DataRight IQ User’s Guide (for any DataRight IQ user): Explains DataRight IQ’s capabilities in general terms, with information specific to different implementations (Job, Views, and Library) in separate units (see below).

Unit 1: DataRight IQ Overview (for any DataRight IQ user): Contains an overview of the software’s tools and features.

Unit 2: DataRight IQ Job and Views (for Job and/or Views users): Contains more detailed information about the software’s capabilities and includes some setup instructions that you can use in your own job creation.

Unit 3: DataRight IQ Library (for Library users): Contains the detailed information that you need to set up and run the Library implementation of DataRight IQ.

Conventions used

The following conventions are used throughout this guide and other Firstlogic documentation:

Bold: We use bold type for file names, paths, emphasis, and text that you should type exactly as shown. For example, “Type cd\dirs.”

Italics: We use italics for emphasis and text for which you should substitute your own data or values. For example, “Type a name for your file, and the .txt extension (testfile.txt).”

Menu commands: We indicate commands that you choose from menus in the following format: Menu Name > Command Name. For example, “Choose File > New.”

We use this symbol to alert you to important information and potential problems.

We use this symbol to point out special cases that you should know about.

We use this symbol to draw your attention to tips that may be useful to you.


Documentation

Other documentation

Besides this user’s guide, DataRight IQ comes with other documentation to help you fully use the application’s abilities:

• Job file
• Views
• Library

DataRight IQ Overview

Contents

Welcome to DataRight IQ 11

Standardize data 23

Add new data to existing data 31

Custom dictionaries 41

The chapters in this unit are for all DataRight IQ users. The information in this unit applies whether you use the Job, Views, or Library implementation of DataRight IQ.

These chapters provide a conceptual overview of DataRight IQ’s capabilities. They also include many references to later chapters for information specific to a particular implementation, such as step-by-step procedures.


Chapter 1: Welcome to DataRight IQ

DataRight IQ is advanced data-parsing software that identifies the information in your database so that you can use it more effectively.

What is parsing?

DataRight IQ identifies and isolates data from other data. We call this parsing. It can parse a wide variety of data:

• address
• e-mail
• Social Security number (SSN)
• date
• phone numbers (U.S. and Canada)
• phone numbers (international)
• user-defined pattern matching (UDPM)
• name
• firm

It can parse this data even if it floats in unfielded lines.

While parsing, the software can standardize your data to make it more consistent, add gender information and salutations, and create output records in your preferred target format.

Batch versus nonbatch parsing

No matter which implementation of DataRight IQ you have (Views, Job, RAPID, Library, and so on), you can parse the same data. However, how DataRight IQ parses data differs somewhat depending on the implementation.

DataRight IQ Views and Job implementations: Parses batches of data (data in databases).

DataRight IQ Library implementation: Parses data one record at a time according to your setup. Its behavior depends on how you integrate it with your software.

Keep these differences in mind when reading DataRight IQ’s documentation. When the documentation refers to accessing databases or to a batch process, it applies only to Views and Job, not to DataRight IQ Library.


How DataRight IQ works

Job and Views

Below are the basic steps DataRight IQ (Job and Views) takes as it processes a record:

Input records: The software takes one record at a time from a database.

Select records for processing: The software selects and processes only the records you want. For example, you can select records based on specific criteria such as age or income, or select a representative sample of records for processing.

Modify input data: The software modifies or converts input data as it is read. For example, DataRight IQ could convert prename codes to prenames or remove unwanted data from a field.

Parse data: The software identifies and isolates names, job titles, firm data, addresses, e-mail addresses, Social Security numbers, dates, phone numbers, and any other user-defined patterns. It then breaks the data into lines and individual components such as prename, first name, middle name, last name, and title.

Standardize data: The software performs case conversion, standardizes common words (Incorp becomes Inc.), and fills in missing city, state, or ZIP Code data. It can also convert firm names, titles, and other data to acronyms (General Motors becomes GM).

Split records: The software splits one input record into several output records. For example, if an input record contains multiple names, DataRight IQ can create a separate record for each person. The software also splits combined names such as John and Mary Smith into individual names such as John Smith and Mary Smith.

Generate new data: The software generates new data that you can add to the record:

• Gender codes. DataRight IQ assigns a gender code to each name.
• Prenames. If DataRight IQ is confident of a name’s gender, it can assign the prename Mr., Ms., or Mrs.
• Match standards. DataRight IQ can generate match standards, or potential matching words. For example, DataRight IQ can tell you that Patrick and Patricia are potential matches for the name Pat.
• Greetings. DataRight IQ generates personal salutations in various styles: formal (Dear Mr. Shakespeare), casual (Dear William), and title (Dear Playwright).

Select records for output: For each output file, the software selects only the records you want. For example, you can select records based on specific criteria (such as gender) or select a representative sample of records.

Output data: The software offers four types of data for output:

• Standardized data parsed into components and lines.
• Unstandardized (raw) data parsed into components and lines.
• Raw data taken directly from the input file (not parsed).
• Additional data and codes generated during processing.
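The combined-name split described under Split records can be illustrated with a short sketch. This is a simplified heuristic written for illustration, not DataRight IQ’s parser (which relies on its parsing dictionary and rules): it assumes the names are joined by “and” or “&”, and that a one-word name borrows the surname of the last name in the group. `split_combined_name` is a hypothetical helper.

```python
import re


def split_combined_name(full_name: str) -> list[str]:
    """Split a shared-surname name such as 'John and Mary Smith'.

    Hypothetical, simplified heuristic:
    - parts are separated by 'and' or '&' (case-insensitive);
    - if the last part has two or more words, its last word is treated
      as a shared surname and appended to any one-word part.
    """
    parts = re.split(r"\s+(?:and|&)\s+", full_name.strip(), flags=re.IGNORECASE)
    parts = [p.strip() for p in parts if p.strip()]
    if len(parts) < 2:
        return parts
    last_words = parts[-1].split()
    if len(last_words) < 2:
        # No surname to share, for example "John and Mary".
        return parts
    surname = last_words[-1]
    return [f"{p} {surname}" if len(p.split()) == 1 else p for p in parts]


print(split_combined_name("John and Mary Smith"))  # ['John Smith', 'Mary Smith']
```

Names that already carry their own surname, such as John Jones & Mary Smith, pass through unchanged; real data (hyphenated surnames, suffixes, three or more people) needs the dictionary-driven rules this sketch leaves out.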

Library

Below are the basic steps that DataRight IQ (Library implementation) takes as it processes a record:

Input records: You pass one record at a time to DataRight IQ Library.

Parse data: The software identifies and isolates names, job titles, firm data, …

Standardize data: DataRight IQ can perform case conversion and standardize common words (Incorp becomes Inc.). DataRight IQ can also convert firm names, titles, and other data to acronyms (General Motors becomes GM).

Split records: DataRight IQ also splits combined names such as John and Mary Smith into individual names such as John Smith and Mary Smith.

Generate new data: DataRight IQ generates new data that you can add to the record:

• Gender codes. DataRight IQ assigns a gender code to each name.
• Prenames. If DataRight IQ is confident of a name’s gender, it can assign the prename Mr., Ms., or Mrs.
• Match standards. DataRight IQ can generate match standards, or potential matching words. For example, DataRight IQ can tell you that Patrick and Patricia are potential matches for the name Pat.
• Greetings. DataRight IQ generates personal salutations in various styles: formal (Dear Mr. Shakespeare), casual (Dear William), and title (Dear Playwright).

Output data: DataRight IQ makes the parsed data and new data available for output. You retrieve the items that you want.
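A match-standard lookup can be pictured as a table that maps a name to its potential matches. The sketch below is a hypothetical illustration only: `MATCH_STANDARDS`, `match_standards`, and `is_potential_match` are names invented for this example, and the one-entry table holds just the Pat example from this guide. DataRight IQ draws its match standards from its own parsing dictionary, which this sketch does not reproduce.

```python
# Hypothetical match-standard table. The only entry is the example from this
# guide; DataRight IQ's real match standards come from its parsing dictionary.
MATCH_STANDARDS: dict[str, list[str]] = {
    "pat": ["Patrick", "Patricia"],
}


def match_standards(name: str) -> list[str]:
    """Return a copy of the potential matches recorded for a name (case-insensitive)."""
    return list(MATCH_STANDARDS.get(name.strip().lower(), []))


def is_potential_match(a: str, b: str) -> bool:
    """True if the names are equal or either is a match standard of the other."""
    a_key, b_key = a.strip().lower(), b.strip().lower()
    if a_key == b_key:
        return True
    a_matches = {m.lower() for m in match_standards(a)}
    b_matches = {m.lower() for m in match_standards(b)}
    return b_key in a_matches or a_key in b_matches


print(match_standards("Pat"))                     # ['Patrick', 'Patricia']
print(is_potential_match("Patricia", "Pat"))      # True
print(is_potential_match("Patrick", "Patricia"))  # False: no shared entry
```

Note that Patrick and Patricia are each potential matches for Pat without being potential matches for each other; a matching step would typically compare candidates through their shared short form rather than treat the relation as transitive.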


What DataRight IQ can do

Now that we’ve discussed how DataRight IQ works, let’s introduce the many things that DataRight IQ can do with your data. We’ll explain some of these in more detail later in this guide.

Convert file format

If you rent or purchase data, or if you process data from multiple sources, DataRight IQ can help you prepare your data for further processing. Using DataRight IQ (Job or Views), you can process up to 255 input files at one time, regardless of their format, and convert them to your target format.

If data floats in unfielded lines in the record, DataRight IQ can identify information and isolate it so you can place each data element exactly where you want it in the output file.
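The idea of converting several input formats to one target format can be sketched in two steps: read each source into records that share common field names, then write every record in the target layout. This is a minimal sketch under stated assumptions, not a DataRight IQ job setup: it assumes one comma-delimited source and one fixed-width source, and the field names, column positions, and helpers `read_delimited`, `read_fixed_width`, and `write_target` are hypothetical.

```python
import csv
import io

# Hypothetical target layout for the converted file.
TARGET_FIELDS = ["name", "city", "state", "zip"]


def read_delimited(text: str) -> list[dict[str, str]]:
    """Read comma-delimited text whose first row names the fields."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def read_fixed_width(text: str, layout: list[tuple[str, int, int]]) -> list[dict[str, str]]:
    """Read fixed-width text; layout gives (field, start, end), 0-based and end-exclusive."""
    records = []
    for line in text.splitlines():
        if line.strip():
            records.append({field: line[start:end].strip() for field, start, end in layout})
    return records


def write_target(records: list[dict[str, str]]) -> str:
    """Write records as comma-delimited text in the target field order."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=TARGET_FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing fields become empty strings; fields outside the target layout are dropped.
        writer.writerow({field: record.get(field, "") for field in TARGET_FIELDS})
    return out.getvalue()


delimited = "name,city,state,zip\nDan Smith,Biron,WI,54494\n"
fixed = f"{'Mary Jones':<20}{'La Crosse':<14}WI54601\n"
layout = [("name", 0, 20), ("city", 20, 34), ("state", 34, 36), ("zip", 36, 41)]

print(write_target(read_delimited(delimited) + read_fixed_width(fixed, layout)), end="")
```

Normalizing every source to the same field names first keeps the writer independent of how many input formats you add later.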

Parse data

You can use DataRight IQ to identify and isolate a wide variety of data, even if the data floats in unfielded lines.

Input data:

Mr. Dan R. Smith, Jr., CPA
Account Mgr.
Jones Inc.
Dept. of Accounting
421-55-2424
PO Box 567
Biron, WI 54494
drsmith@jonesinc.com
507-555-3423
Jan 3, 2003

Parsed data:

Prename: Mr.
First Name: Dan
Middle Name: R.
Last Name: Smith
Maturity Postname: Jr.
Honorary Postname: CPA
Title: Account Mgr.
Firm: Jones Inc.
Firm Location: Dept. of Accounting
Social Security: 421-55-2424
E-mail Address: drsmith@jonesinc.com
Phone: 507-555-3423
Date: January 3, 2003
Address: PO Box 567
Last Line: Biron, WI 54494
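For the fixed-format items in the sample record (Social Security number, phone, e-mail, date, address, and last line), a rough idea of how floating data can be recognized is to test each line against a pattern. The sketch below is a simplified stand-in, not DataRight IQ’s method: the regular expressions accept only the formats in this sample, the record is assumed to hold one item per line, and `PATTERNS` and `parse_floating` are hypothetical names. Names, titles, and firms fall through as leftovers because recognizing them needs a parsing dictionary and rules.

```python
import re

# Simplified, hypothetical patterns for the sample formats only. Order matters:
# each line is assigned to the first unfilled field whose pattern matches.
PATTERNS = {
    "social_security": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "phone": re.compile(r"^\d{3}-\d{3}-\d{4}$"),
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "date": re.compile(r"^[A-Z][a-z]{2,8}\.? \d{1,2}, \d{4}$"),
    "last_line": re.compile(r"^[A-Za-z .]+, [A-Z]{2} \d{5}(-\d{4})?$"),
    "address": re.compile(r"^(PO Box \d+|\d+ .+)$", re.IGNORECASE),
}


def parse_floating(lines):
    """Return (parsed fields, unmatched lines) for a list of floating lines."""
    parsed, leftover = {}, []
    for raw in lines:
        line = raw.strip()
        for field, pattern in PATTERNS.items():
            if field not in parsed and pattern.match(line):
                parsed[field] = line
                break
        else:
            # No pattern matched: names, titles, and firms end up here.
            leftover.append(line)
    return parsed, leftover


record = [
    "Mr. Dan R. Smith, Jr., CPA", "Account Mgr.", "Jones Inc.",
    "Dept. of Accounting", "421-55-2424", "PO Box 567", "Biron, WI 54494",
    "drsmith@jonesinc.com", "507-555-3423", "Jan 3, 2003",
]
parsed, leftover = parse_floating(record)
print(parsed)
print(leftover)
```

Checking the stricter patterns first keeps 421-55-2424 from being mistaken for a phone number; a production parser would also weigh context and confidence rather than accept the first match.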


• DataRight IQ (Job and Views) parses up to six names per record. For each name, it parses components such as prename, first name, middle name, last name, and postname, and then sends the data to individual fields. DataRight IQ also parses up to six job titles from each record.

• DataRight IQ (Job and Views) parses up to two firm names (such as IBM) and up to two firm locations (such as Engineering Dept.) per record. DataRight IQ can also convert firm names to accepted acronyms; for example, it can convert General Motors Corp. to GM.

• DataRight IQ (Job and Views) parses U.S. address lines, as well as city, state, and ZIP Code data.

Select records DataRight IQ (Job and Views) offers advanced record-selection features on input

and output. You decide what records will be processed and what records will be included in your output files:

• You can select records based on specific criteria such as gender, age, or geographical data.

• You can also select a representative sample of records—for example, you could select 50,000 records at random from throughout a file (Job and Views).
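This section does not describe how DataRight IQ draws its random sample. Reservoir sampling is one standard way to take a uniform random sample in a single pass over a file of unknown length. The Python sketch below uses it as a stand-in technique, and sample_records is a hypothetical name.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_records(records: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k records in one pass.

    Reservoir sampling (Algorithm R): the total record count need not be
    known in advance, and only k records are held in memory.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)
        else:
            # Keep the new record with probability k / (i + 1) by replacing
            # a randomly chosen record already in the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


print(len(sample_records(range(1_000_000), 50_000, seed=7)))
```

To sample lines from a text file, you could pass an open file object in place of the range.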

Standardize data

DataRight IQ can standardize data to make your records more consistent. Some things that it can standardize include case, punctuation, and acronyms. For more information on DataRight IQ’s standardization abilities, see “Standardize data” on page 23.

Assign gender and prenames

DataRight IQ assigns a precise gender code to each name. DataRight IQ offers several levels of gender codes: strong male, strong female, weak male, weak female, and ambiguous. The intelligence behind gender assignment lies partly in the parsing software and partly in the parsing dictionary. For more information, see “Gender codes” on page 32.

When DataRight IQ can assign a gender with strong or weak confidence, it can also assign a prename: Mr., Ms., or Mrs. For more information, see “Prenames” on page 34.

Create personalized greetings

You can use DataRight IQ to create personalized greetings in formal, casual, and title styles: Dear Mr. Shakespeare, Dear William, and Dear Playwright. DataRight IQ creates a greeting for each person, as well as an overall greeting for the entire record (for example, Dear William and Harold or Dear Sirs).

You can customize greetings by specifying the greeting word and ending punctuation. For example, the default casual style yields Dear William, but you could create greetings such as Greetings, William! instead.

For more information, see “Salutations” on page 37.

Chapter 1: Welcome to DataRight IQ

Perform advanced search-and-replace

DataRight IQ (Job and Views) offers a powerful search-and-replace feature that lets you convert and modify data:

• Convert coded data. For example, you could convert prename codes to prenames.

• Remove unwanted data. You can search for unwanted data and delete it by replacing it with nothing. For example, you could remove dates from a name field.

• Search-and-replace. You can perform a traditional search-and-replace, searching for a specific value and replacing it with another value. For example, you could search for Occupant and replace it with Current Resident.

• Search-and-put. You can search for a value in one field and put the “replacement” value into a different field, leaving the original field intact. For example, you could search for income codes in a field from your input file, convert the codes to income ranges, and put the results into a field in your output file.
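The four kinds of search-and-replace above can be sketched in Python as table lookups. The codes reuse the coding-system example later in this chapter (8 = Dr, 3 = 50,000 to 75,000). The function names and the date pattern are hypothetical, not DataRight IQ settings.

```python
import re
from typing import Dict

# Hypothetical lookup tables (codes taken from the coding-system example).
PRENAME_CODES: Dict[str, str] = {"8": "Dr"}
INCOME_CODES: Dict[str, str] = {"3": "50,000 to 75,000"}
VALUE_REPLACEMENTS: Dict[str, str] = {"Occupant": "Current Resident"}

# Assumed shape of the "unwanted data": a month/day/year date such as 3/2/04.
DATE_PATTERN = re.compile(r"\s*\b\d{1,2}/\d{1,2}/\d{2,4}\b")


def search_and_replace(value: str, table: Dict[str, str]) -> str:
    """Replace the whole field value when it appears in the table."""
    return table.get(value.strip(), value)


def remove_unwanted(value: str) -> str:
    """Delete unwanted data by replacing it with nothing."""
    return DATE_PATTERN.sub("", value).strip()


def search_and_put(record: Dict[str, str], source: str, target: str,
                   table: Dict[str, str]) -> Dict[str, str]:
    """Look up record[source] and put the result in record[target].

    The source field is left intact; a new dict is returned.
    """
    result = dict(record)
    code = record.get(source, "").strip()
    if code in table:
        result[target] = table[code]
    return result


print(search_and_put({"income_code": "3"}, "income_code", "income_range", INCOME_CODES))
```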

Split combined names

If your records include combined names (most often, married couples or other family members), you can use DataRight IQ to split them. You can leave both names in one record or create two separate output records.

Input                                 Output
Ms. Patricia and Mr. William Jones    Ms. Patricia Jones
                                      Mr. William Jones
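Here is a minimal Python sketch of splitting a combined name that shares a trailing last name. It assumes the connector is "and" or "&" and that the last name is the final word. The tiny prename list and the helper names are hypothetical; DataRight IQ's own splitting logic is dictionary-driven and far more thorough.

```python
import re
from typing import List

PRENAMES = {"mr", "ms", "mrs", "dr"}  # hypothetical, deliberately tiny list
CONNECTOR = re.compile(r"\s+(?:and|&)\s+", re.IGNORECASE)


def _has_own_last_name(part: str) -> bool:
    """True if the part has at least two words besides a prename."""
    words = [w for w in part.split() if w.lower().rstrip(".") not in PRENAMES]
    return len(words) >= 2


def split_combined_name(name_line: str) -> List[str]:
    """Split 'A and B Surname' into separate names that share the surname."""
    parts = CONNECTOR.split(name_line.strip())
    if len(parts) < 2:
        return [name_line.strip()]
    surname = parts[-1].split()[-1]
    # Give the shared surname only to parts that lack their own last name.
    names = [p if _has_own_last_name(p) else f"{p} {surname}" for p in parts[:-1]]
    names.append(parts[-1])
    return names


print(split_combined_name("Ms. Patricia and Mr. William Jones"))
```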

Scan-and-split

The scan-and-split feature is designed to arrange data more precisely from fields that contain complex information, such as mixed combinations of names, firms, account numbers, special designations (as shown below), or dates. For example, a field might contain data like the following:

John Smith, Trustee for Mary Smith

Scan for:       Trustee for
Split method:   After

Line1:   John Smith, Trustee for
Line2:   Mary Smith

Use the scan-and-split feature to manipulate parts of the data into one, two, or three fields, and to designate combinations of strings. In the example above, the After split method places the scanned phrase Trustee for after the first name listed, so the phrase stays in Line1 and the remaining name moves to Line2.

Refer to “Step 1: Create a scan-and-split table” on page 109 for detailed information about the Scan-and-Split feature and how to set it up in DataRight IQ.
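The After split method shown in the example can be approximated in a few lines of Python. This is a hypothetical sketch: the function name is an assumption, the search is case-insensitive by choice, and only the After behavior from the example is modeled.

```python
from typing import Tuple


def scan_and_split_after(field: str, phrase: str) -> Tuple[str, str]:
    """Split a field so the scanned phrase ends Line1 and the rest goes to Line2.

    If the phrase is not found, the field is returned unchanged in Line1.
    """
    index = field.lower().find(phrase.lower())
    if index == -1:
        return field.strip(), ""
    cut = index + len(phrase)
    return field[:cut].strip(), field[cut:].strip()


line1, line2 = scan_and_split_after("John Smith, Trustee for Mary Smith", "Trustee for")
print(line1)
print(line2)
```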


Migrate data from a mainframe system

Suppose a database on your mainframe system contains data entered during day-to-day operations. The first five fields of the database contain different data from record to record (the data “floats” between fields) and often contain extraneous data that you want to remove. The database also contains several fields of coded data that you want to convert, as well as a date field that you need to convert from 2-digit years to 4-digit years.

You can use DataRight IQ to make this data conform to your master format:

• identify and parse floating data

• remove extraneous information

• convert coded data

• convert and parse dates

Identify and parse floating data

If your input file is in a multiline format (fields contain different data from record to record), DataRight IQ can identify and parse individual data elements and put them into separate fields.

Input data (fields Line1 through Line5, City, State, ZIP)

Record 1:
  Line1    Mr. Dan Williams CPA
  Line2    Jones Engineering Inc.
  Line3    Dept. of Accounting
  Line4    PO box 567
  Line5    1234 Main St
  City     La Crosse
  State    WI
  ZIP      54601

Record 2:
  Jan Smith, President
  Smith Consulting
  123 W 4th St
  Winona MN 55987

Parsed data

Record 1:
  Name            Mr. Dan Williams CPA
  Firm            Jones Engineering Inc.
  Firm Location   Dept. of Accounting
  Address1        1234 Main St
  Address2        PO box 567
  City            La Crosse
  State           WI
  ZIP             54601

Record 2:
  Name            Jan Smith
  Title           President
  Firm            Smith Consulting
  Address1        123 W 4th St
  City            Winona
  State           MN
  ZIP             55987


Remove extraneous information

You can use DataRight IQ to remove extraneous information from a field. For example, you could remove a date from a name field:

Input data

Line1   Ann Jones 3/2/04
Line2   123 Main St
Line3   Onalaska WI 54650

Cleansed record

Name        Ann Jones
Address1    123 Main St
Last Line   Onalaska WI 54650
Date        03/02/2004
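The following is a minimal Python sketch of pulling a date out of a name field and writing it as MM/DD/YYYY, as in the cleansed record above. It assumes month/day/year input order. It also assumes a hypothetical century window (00 to 49 become 2000s, 50 to 99 become 1900s), because this section does not say how DataRight IQ expands 2-digit years.

```python
import re
from typing import Optional, Tuple

# Assumes month/day/year order, as in the example 3/2/04.
DATE_IN_TEXT = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})\b")
PIVOT = 50  # hypothetical century window: 00-49 -> 2000s, 50-99 -> 1900s


def expand_year(year: str) -> int:
    value = int(year)
    if len(year) == 4:
        return value
    return 2000 + value if value < PIVOT else 1900 + value


def extract_date(name_field: str) -> Tuple[str, Optional[str]]:
    """Remove a date from a name field; return (name, MM/DD/YYYY or None)."""
    match = DATE_IN_TEXT.search(name_field)
    if match is None:
        return name_field.strip(), None
    month, day, year = match.groups()
    date = f"{int(month):02d}/{int(day):02d}/{expand_year(year)}"
    remainder = name_field[:match.start()] + " " + name_field[match.end():]
    return " ".join(remainder.split()), date


print(extract_date("Ann Jones 3/2/04"))
```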

Convert coded data

Suppose your database has several fields that contain coded data. You eventually want to consolidate the data in this database with the data in a master database, so you want to convert the codes so that they are consistent with the coding system in the master database.

Your coding system lists a variety of prenames and assigns a number to each. Similarly, you have numbers representing gender, marital status, and different levels of income.

Input data              Converted data
Prename: 8              Prename: Dr
Gender: 1               Gender: F
Marital status: 2       Marital status: S
Income: 3               Income: 50,000 to 75,000

Here is what the coding system may look like for this example:

Prename: 8 = Dr

Gender: 1 = female

Marital status: 2 = single

Income: 3 = 50,000 to 75,000

Convert and parse dates

If your database contains a date field, the software can convert the format to your standard, and even parse the data into fields:

Input data

013197

Converted data

Date        Year   Month     Day
1/31/1997   1997   January   31
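This Python sketch converts an MMDDYY value such as 013197 into a display date plus separate Year, Month, and Day fields. The century window (00 to 49 become 2000s) is a hypothetical choice, because the text does not state DataRight IQ's rule. The function name is also an assumption.

```python
from datetime import date
from typing import Dict

MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")
PIVOT = 50  # hypothetical century window: 00-49 -> 2000s, 50-99 -> 1900s


def parse_mmddyy(value: str) -> Dict[str, str]:
    """Convert an MMDDYY string into a display date plus Year/Month/Day fields."""
    if len(value) != 6 or not value.isdigit():
        raise ValueError(f"expected six digits in MMDDYY order, got {value!r}")
    month, day, yy = int(value[0:2]), int(value[2:4]), int(value[4:6])
    year = 2000 + yy if yy < PIVOT else 1900 + yy
    parsed = date(year, month, day)  # raises ValueError for impossible dates
    return {
        "Date": f"{parsed.month}/{parsed.day}/{parsed.year}",
        "Year": str(parsed.year),
        "Month": MONTH_NAMES[parsed.month - 1],
        "Day": str(parsed.day),
    }


print(parse_mmddyy("013197"))
```

Using datetime.date for validation means a value such as 023097 (February 30) is rejected rather than silently passed through.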


Prepare records for match processing

Suppose you maintain a large master database. On a daily basis you consolidate data from a number of sources and add records to your master database. In the master database you want just one record for each person, so you use a matching process to eliminate duplicate records and consolidate information about each individual into a single record.

DataRight IQ can help you prepare your records for matching.

Parse data

DataRight IQ can parse (identify) individual data components and put them into separate fields. This makes it easier to match data, because you can compare apples to apples. For example, you can compare a last name in one record to a last name in another record rather than compare whole lines of data.

Input data

Line1   Intl. Marketing, Inc.
Line2   Dept. of Sales
Line3   Pat Smith, Sales Mgr.
Line4   328 Bluebird Ln
Line5   Biron WI 54494

Parsed data

First Name      Pat
Last Name       Smith
Title           Sales Mgr.
Firm            Intl. Marketing, Inc.
Firm Location   Dept. of Sales
Address         328 Bluebird Ln
City            Biron
State           WI
ZIP             54494

Standardize firm names

You can also use DataRight IQ to standardize inconsistent firm names. Consistency among records can help you improve matching results.

• DataRight IQ standardizes commonly used words in firm names:

Input                                  Output
International Harvester, Inc.          Intl. Harvester, Inc.
Internatl. Harvester, Incorp.          Intl. Harvester, Inc.
Intl. Harvester, Incorporated          Intl. Harvester, Inc.

• DataRight IQ can also convert firm names to accepted acronyms:

Input                                  Output
International Business Machines        IBM
Internatl. Business Machines           IBM
Intl. Business Machines Corporation    IBM

Convert nonmailing city names

DataRight IQ can convert nonmailing, or “vanity,” city names to the city name preferred by the U.S. Postal Service. This improves consistency among records.

Input        Output
Hollywood    Los Angeles


Provide match standards for name data

DataRight IQ can provide match standards for first and middle names. For example, DataRight IQ can tell you that Patrick and Patricia are potential matches for the first name Pat.

First name   Match standards
Pat          Patrick, Patricia

Match standards can help you overcome two types of matching problems: alternate spellings (Catherine and Katherine) and nicknames (Pat and Patrick).
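Here is a minimal Python sketch of how match standards might be used when comparing first names. The MATCH_STANDARDS table is a hypothetical excerpt; in the product, match standards come from the parsing dictionary. Both function names are assumptions.

```python
from typing import Dict, List

# Hypothetical excerpt; real match standards come from the parsing dictionary.
MATCH_STANDARDS: Dict[str, List[str]] = {
    "PAT": ["Patrick", "Patricia"],
    "CATHERINE": ["Katherine"],
    "KATHERINE": ["Catherine"],
}


def match_standards(first_name: str) -> List[str]:
    return MATCH_STANDARDS.get(first_name.strip().upper(), [])


def first_names_may_match(a: str, b: str) -> bool:
    """True if the names are equal or one is a match standard of the other."""
    a_key, b_key = a.strip().upper(), b.strip().upper()
    if a_key == b_key:
        return True
    a_standards = {s.upper() for s in match_standards(a)}
    b_standards = {s.upper() for s in match_standards(b)}
    return b_key in a_standards or a_key in b_standards


print(first_names_may_match("Pat", "Patricia"))
```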

Example

This example shows how DataRight IQ can prepare records for matching.

Input record from data source 1

Intl Marketing, Inc.
Dept. of Accounting
Pat Smith, Accounting Mgr.
328 Bluebird Ln
Biron, WI 54494

Input record from data source 2

Smith, Patricia R.
International Mktg, Incorp.
328 Bluebird Ln
Wisconsin Rapids, Wisconsin

Cleansed record (data source 1)

First Name        Pat
Match Standards   Patrick, Patricia
Middle Name       (none)
Last Name         Smith
Title             Accounting Mgr.
Firm              Intl. Mktg, Inc.
Firm Location     Dept. of Accounting
Address           328 Bluebird Ln
City              Wisconsin Rapids
State             WI
ZIP               54494

Cleansed record (data source 2)

First Name        Patricia
Match Standards   Patricia
Middle Name       R.
Last Name         Smith
Title             (none)
Firm              Intl. Mktg, Inc.
Address           328 Bluebird Ln
City              Wisconsin Rapids
State             WI
ZIP               54494


Convert file type and format

Suppose you run a service bureau. One of your clients recently rented two lists from two different list brokers. One list is an ASCII file and one is a dBASE file, and the files are in different formats. Your client wants you to convert the lists to his preferred “house” format and put all the records in a single database. The client wants to evaluate each broker and asks you to provide information about the quality of the data in each list.

You can use DataRight IQ to do the following:

• Convert files to the desired type and format.

• Consolidate records into a single output file.

• Generate reports containing separate data-quality statistics for each list.

ASCII list
Format: Name, Address1, Address2, Address3, Phone
John A. Smith | 1234 Main St | La Crosse WI 54601 | 608-555-1212

dBASE list
Format: First_Name, Mid_Init, Last Name, Address, Apt, City, State, ZIP, Phone
Mary | R | Jones | 913 12th Ave S | Apt 30 | Onalaska | WI | 54601 | 608-123-4567

Output: a single dBASE file in the house format
Format: Name, Address1, Last_Line, Phone
John A. Smith | 1234 Main St | La Crosse WI 54601 | 608-555-1212
Ms. Mary R Jones | 913 12th Ave S Apt 30 | Onalaska WI 54601 | 608-123-4567

Report
Reports contain separate statistics for each list (List A and List B).


Convert floating data to fielded data

Suppose that your company has traditionally stored customer data on a mainframe system. The database has an open format: Data is stored in lines, and there is little or no consistency in the position of data within these lines. You have a huge amount of data about your customers, but that information is not very accessible in its current format.

The most important step in making the data more useful and accessible is to identify specific data elements and put them into separate fields. DataRight IQ can do this because of the many pieces of information that it can identify and isolate. (Refer to “What is parsing?” on page 11 for a complete list.)

Input data (fields Line1 through Line6)

Ms. Roberta L. Williams, CPA
Jones Engineering Inc.
Dept. of Accounting
PO Box 123
La Crosse WI 54601

Parsed data

Prename         Ms.
First Name      Roberta
Middle Name     L.
Last Name       Williams
Postname        CPA
Firm            Jones Engineering Inc.
Firm Location   Dept. of Accounting
Extra           PO Box 123
Extra           La Crosse WI 54601

Input data (fields Line1 through Line6)

Smith Consulting
William Smith, Jr., President
123 W 4th St
Winona MN 55987-3546

Parsed data

First Name      William
Last Name       Smith
Postname        Jr.
Title           President
Firm            Smith Consulting
Address         123 W 4th St
Lastline        Winona MN 55987-3546


Chapter 2: Standardize data

Read this chapter to find out how DataRight IQ makes your data more consistent from record to record through standardization.


Name format

Name format is the sequence of name components in a name line. For example, First-Middle-Last or Last-First-Middle. If the name format is consistent throughout your database, you get better name-parsing results if you “tell” DataRight IQ the name format. Do this by setting the name format in your definition (DEF) file.

However, DataRight IQ follows the name format set in your DEF file only when the input is ambiguous. If you want DataRight IQ to apply “strict” name order, select the Strict Name Order option in the Input File block. DataRight IQ then parses all name data in the format set in the DEF file.

Known name format

When the software knows the name format (and has compared the data to the rules), it correctly identifies the first name and the last name. If it does not know the name format, in some cases it cannot determine which name is the first and which is the last. If a name is determined to be ambiguous, the software assumes that the name format is first-middle-last.

For example, if the software doesn’t know the name format, it cannot reliably decide whether a name such as Carey Donnell should be read as Donnell Carey, because both names could be first names.

With a known name format of last-first-middle, DataRight IQ knows that the first word in the sequence, Carey, is really the last name. It then outputs either Carey, Donnell if your output format is set to last-first-middle, or Donnell Carey if your output format is set to first-middle-last.

Inconsistent name format

The software can accept and process input names even if the name format varies from record to record. For example, it can accept and process a file in which some names are in the First-Middle-Last sequence and others are in the Last-First-Middle sequence.

If the name format is inconsistent in your database, tell DataRight IQ that the name format is unknown. It will then identify first, middle, and last name components based on the data itself.
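Here is a minimal Python sketch of the difference a known name format makes. It is hypothetical and handles only plain space-separated names. It treats "unknown" like first-middle-last, the default described above for ambiguous names, and does none of the dictionary-based detection that DataRight IQ performs.

```python
from typing import Dict


def parse_name(name: str, name_format: str = "unknown") -> Dict[str, str]:
    """Split a simple name into first, middle, and last components.

    name_format is "first-middle-last", "last-first-middle", or "unknown";
    "unknown" falls back to first-middle-last in this sketch.
    """
    words = name.split()
    if not words:
        return {"first": "", "middle": "", "last": ""}
    if name_format == "last-first-middle":
        return {"first": words[1] if len(words) > 1 else "",
                "middle": " ".join(words[2:]),
                "last": words[0]}
    if len(words) == 1:
        return {"first": words[0], "middle": "", "last": ""}
    return {"first": words[0], "middle": " ".join(words[1:-1]), "last": words[-1]}


def format_name(parts: Dict[str, str], output_format: str) -> str:
    """Render parsed parts in the requested output order."""
    given = " ".join(p for p in (parts["first"], parts["middle"]) if p)
    if output_format == "last-first-middle":
        return f"{parts['last']}, {given}" if given else parts["last"]
    return " ".join(p for p in (given, parts["last"]) if p)


parts = parse_name("Carey Donnell", "last-first-middle")
print(format_name(parts, "last-first-middle"))
print(format_name(parts, "first-middle-last"))
```

For the Carey Donnell example, a last-first-middle reading makes Carey the last name. The output order then decides whether it prints first or last.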

Refer to “Define input name format” on page 51 for details about setting up name format in your jobs.


Address data

DataRight IQ can standardize city, state, and ZIP Code data and can assign missing or invalid last-line data. This can help you fill in missing data, overcome inconsistencies, and produce standardized address output.

Limited last-line standardization

If you choose, DataRight IQ can perform limited last-line standardization:

• Standardize state data to the U.S. Postal Service abbreviation. For example, Wisc. becomes WI.

• Provide city and state data based on the ZIP Code. If the city or state is missing or misspelled, or if the city-state combination is not valid, DataRight IQ assigns the correct city and state for the input ZIP Code:

Input                 Output
La Crescent 55947     La Crescent MN 55947
Onalsky WI 54650      Onalaska WI 54650

• Provide ZIP Code data. If a ZIP Code is missing or is not valid for the city and state, DataRight IQ can assign a default ZIP Code:

Input                 Output
Minneapolis MN        Minneapolis MN 55401
La Crosse WI 54620    La Crosse WI 54601

Caution: Some cities have more than one 5-digit ZIP Code. The default ZIP may or may not be correct for the mailing address in the record.

Convert nonmailing city names

If you standardize the last line, you can also convert nonmailing city names. A nonmailing city is a city that is served by a post office located in another city. DataRight IQ can convert nonmailing city names to post-office city names:

Input Output

Hollywood CA Los Angeles CA 90027

Little Chute WI 54914 Appleton WI 54914
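The last-line rules in this section can be approximated with table lookups. The Python sketch below is hypothetical. Its tables are tiny excerpts chosen only to reproduce this section's examples, whereas real postal reference data is far larger and ships with the product. The rule for deciding whether to trust the city or the ZIP Code is also an assumption. As the caution above notes, a default ZIP Code may not be right for the actual address.

```python
from typing import Dict, Tuple

CityState = Tuple[str, str]

# Hypothetical excerpts chosen to reproduce the examples in this section.
ZIP_TO_CITY: Dict[str, CityState] = {
    "55947": ("La Crescent", "MN"), "54650": ("Onalaska", "WI"),
    "54601": ("La Crosse", "WI"), "54914": ("Appleton", "WI"),
    "55401": ("Minneapolis", "MN"), "90027": ("Los Angeles", "CA"),
}
DEFAULT_ZIP: Dict[CityState, str] = {
    ("Minneapolis", "MN"): "55401", ("La Crosse", "WI"): "54601",
    ("Onalaska", "WI"): "54650", ("Hollywood", "CA"): "90027",
}
NONMAILING: Dict[CityState, CityState] = {
    ("Hollywood", "CA"): ("Los Angeles", "CA"),
    ("Little Chute", "WI"): ("Appleton", "WI"),
}
STATE_ABBREVIATIONS = {"WISC.": "WI", "WISCONSIN": "WI"}


def standardize_last_line(city: str, state: str, zip_code: str,
                          convert_nonmailing: bool = True) -> Tuple[str, str, str]:
    city, zip_code = city.strip(), zip_code.strip()
    state = STATE_ABBREVIATIONS.get(state.strip().upper(), state.strip().upper())
    key = (city, state)
    if zip_code in ZIP_TO_CITY:
        zip_city = ZIP_TO_CITY[zip_code]
        valid_here = key == zip_city or NONMAILING.get(key) == zip_city
        if not valid_here:
            if key in DEFAULT_ZIP:
                zip_code = DEFAULT_ZIP[key]  # known city: trust it, fix the ZIP
            else:
                city, state = zip_city       # unknown city: trust the ZIP
    elif key in DEFAULT_ZIP:
        zip_code = DEFAULT_ZIP[key]          # missing or unknown ZIP
    if convert_nonmailing and (city, state) in NONMAILING:
        city, state = NONMAILING[(city, state)]
    return city, state, zip_code


print(standardize_last_line("Onalsky", "WI", "54650"))
```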


Convert words to acronyms

DataRight IQ can convert components to widely accepted acronyms. You can use DataRight IQ to convert firm names, titles, and so on, to standard acronyms.

For example, DataRight IQ can convert General Motors to GM, or National Aeronautics and Space Administration to NASA. Also, given the input Chief Executive Officer, DataRight IQ can produce CEO as output.

Acronyms can help you overcome inconsistencies among input records. If you are preparing data for matching, more consistent data may lead to better matching results, especially for firm names.

Input                                  Output
International Business Machines        IBM
Internatl. Business Machines           IBM
Intl. Business Machines Corporation    IBM

DataRight IQ produces an acronym only when one is available in the parsing dictionary; it does not produce initials by algorithm or rule. That is, DataRight IQ produces only accepted acronyms for selected companies. It does not simply take the first letter of each word in the firm name and return a pseudo “acronym.”

You can enhance and customize acronyms by using the User Modifiable Dictionary (UMD) program. For details on UMD, see your DataRight IQ Modifier’s Guide or the online help that accompanies UMD Views.
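Here is a minimal Python sketch of dictionary-driven firm standardization followed by acronym lookup. It mirrors the rule above that an acronym is returned only when the table has one. Both tables are hypothetical excerpts; the real tables live in the parsing dictionary and can be extended with UMD.

```python
from typing import Dict

# Hypothetical excerpts of firm-word and acronym tables.
FIRM_WORDS: Dict[str, str] = {
    "INTERNATIONAL": "Intl.", "INTERNATL.": "Intl.", "INTL.": "Intl.", "INTL": "Intl.",
    "INCORPORATED": "Inc.", "INCORP.": "Inc.", "INC.": "Inc.",
    "CORPORATION": "Corp.", "CORP.": "Corp.",
}
ACRONYMS: Dict[str, str] = {
    "INTL. BUSINESS MACHINES": "IBM",
    "INTL. BUSINESS MACHINES CORP.": "IBM",
    "GENERAL MOTORS": "GM",
    "GENERAL MOTORS CORP.": "GM",
    "NATIONAL AERONAUTICS AND SPACE ADMINISTRATION": "NASA",
}


def standardize_firm(name: str) -> str:
    """Replace common firm words with their standard forms, keeping commas."""
    tokens = name.replace(",", " ,").split()
    standardized = [FIRM_WORDS.get(t.upper(), t) for t in tokens]
    return " ".join(standardized).replace(" ,", ",")


def to_acronym(name: str) -> str:
    """Return an accepted acronym if the table has one; never invent initials."""
    standardized = standardize_firm(name)
    return ACRONYMS.get(standardized.upper(), standardized)


print(to_acronym("Intl. Business Machines Corporation"))
```

Standardizing firm words first lets one acronym entry cover several spellings of the same name.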


Convert case

DataRight IQ offers three styles of casing: upper, mixed, and lower. You can also choose to retain (preserve) the case that was used in the input record.

Case setting   Original data             Resulting data
Upper          Dr. John McKay, Ph.D.     DR. JOHN MCKAY, PH.D.
Mixed          DR. JOHN MCKAY, PH.D.     Dr. John McKay, Ph.D.
Lower          Dr. John McKay, Ph.D.     dr. john mckay, ph.d.
Preserve       DR. John MCKay, PH.D.     DR. John MCKay, PH.D.

Intelligent mixed case

The basic rule of mixed case is to capitalize the first letter of the word and put the rest of the word in lowercase. However, there are exceptions:

• Some words should be in all uppercase, such as IBM and RN.

• Some words should be in all lowercase, such as of and or.

• Some words contain an internal capital letter, such as McKay.

DataRight IQ can intelligently apply the correct casing to these and many other mixed-case exceptions.

When DataRight IQ applies mixed case, it doesn’t simply look at the spelling of the word. DataRight IQ also looks at how the word is being used. For example, the word MS should have an uppercase “S” when it is used as a postname meaning “Master of Science,” but the “s” should be lowercase for the prename Ms. DataRight IQ intelligently applies the correct casing:

Input                    Output
JOHN R. SMITH, MS        John R. Smith, M.S.
MS. MARY ANN JOHNSON     Ms. Mary Ann Johnson

You can customize mixed casing by creating a custom capitalization dictionary. For details, see “Improve casing results” on page 44.
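Here is a minimal Python sketch of role-aware mixed casing, showing how the same word (MS) can be cased differently as a prename and as a postname. It assumes each word already carries a role from parsing. The small exception tables stand in for a capitalization dictionary, and every table and function name here is hypothetical.

```python
from typing import List, Tuple

# Hypothetical capitalization tables (a tiny stand-in for a capitalization dictionary).
ALWAYS_UPPER = {"IBM", "RN", "CPA", "CEO"}
ALWAYS_LOWER = {"of", "or", "and"}
SPECIAL_CASE = {"MCKAY": "McKay", "PH.D.": "Ph.D."}
PRENAME_CASE = {"MR.": "Mr.", "MS.": "Ms.", "MRS.": "Mrs.", "DR.": "Dr."}
POSTNAME_CASE = {"MS": "M.S.", "PHD": "Ph.D."}


def mixed_case_word(word: str, role: str, position: int) -> str:
    key = word.upper()
    if role == "prename" and key in PRENAME_CASE:
        return PRENAME_CASE[key]
    if role == "postname" and key in POSTNAME_CASE:
        return POSTNAME_CASE[key]
    if key in SPECIAL_CASE:
        return SPECIAL_CASE[key]
    if key in ALWAYS_UPPER:
        return key
    if position > 0 and word.lower() in ALWAYS_LOWER:
        return word.lower()
    return word[:1].upper() + word[1:].lower()  # basic rule


def mixed_case(tokens: List[Tuple[str, str]]) -> str:
    """Case (word, role) pairs; role is "prename", "postname", or "other"."""
    out = ""
    for i, (word, role) in enumerate(tokens):
        cased = mixed_case_word(word, role, i)
        if i == 0:
            out = cased
        elif role == "postname":
            out += ", " + cased
        else:
            out += " " + cased
    return out


print(mixed_case([("JOHN", "other"), ("R.", "other"), ("SMITH", "other"), ("MS", "postname")]))
```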


Other standardization options

Some of the options below can be set in the Standardization Options block or the Salutation Options block. Others are done automatically by DataRight IQ. For example, DataRight IQ always standardizes common job-title words and firm words, and strips punctuation from address lines.

Common job-title words

When your input data contains job titles, DataRight IQ standardizes them. For example:

Input                          Standardized output
Mr John A Smith, C. E. O.      Mr. John A. Smith CEO

Address-line punctuation

DataRight IQ removes punctuation from address lines:

Input                          Standardized output
123 S. Main St., Apt. 4        123 S Main St Apt 4

Retain original punctuation

If you prefer to retain the original punctuation, you must also retain the original text.

Input                          Output
Mr. John A. Smith C.E.O.       Mr. John A. Smith C.E.O.

Note: When you retrieve original text, you always get original punctuation, regardless of your setting for punctuation standardization.
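The exact punctuation rules DataRight IQ applies to address lines are not listed here. The following is a hedged Python approximation that reproduces the address example above. The function name is hypothetical, and keeping hyphens, slashes, and # is an assumption.

```python
def strip_address_punctuation(line: str) -> str:
    """Remove periods and commas from an address line.

    Periods are deleted (so P.O. becomes PO) and commas become spaces;
    other characters such as hyphens, slashes, and # are kept in this sketch.
    """
    without_periods = line.replace(".", "")
    without_commas = without_periods.replace(",", " ")
    return " ".join(without_commas.split())  # collapse repeated spaces


print(strip_address_punctuation("123 S. Main St., Apt. 4"))
```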

Phone number formats and extensions

Use DataRight IQ’s phone number settings to standardize phone numbers from the United States and Canada. (No format options are available for international (non-U.S.) numbers, because they consist only of the country code and number.)

You can choose from one of these phone formats:

• (xxx)xxx-xxxx (default)

• xxx-xxx-xxxx

• xxxxxxxxxx

In addition, you can enter the extension text that you want to appear in front of extension numbers. For example, enter Ext. to have Ext. appear in front of each extension number.
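Here is a minimal Python sketch of the three phone layouts plus extension text. It is hypothetical in several ways: the function name, the ways it recognizes an extension in the input (x, ext, ext., extension), and the single space before the extension text are all assumptions, not documented DataRight IQ behavior.

```python
import re
from typing import Optional

# The three layouts listed above, expressed as str.format templates.
PHONE_FORMATS = {
    "(xxx)xxx-xxxx": "({0}){1}-{2}",
    "xxx-xxx-xxxx": "{0}-{1}-{2}",
    "xxxxxxxxxx": "{0}{1}{2}",
}
# Assumed ways an extension might be written in the input.
EXTENSION = re.compile(r"\s*(?:extension|ext\.?|x)\s*(\d+)\s*$", re.IGNORECASE)


def format_phone(raw: str, style: str = "(xxx)xxx-xxxx",
                 extension_text: str = "Ext.") -> Optional[str]:
    """Reformat a U.S./Canadian number; return None if it is not 10 digits."""
    extension = None
    match = EXTENSION.search(raw)
    if match:
        extension = match.group(1)
        raw = raw[:match.start()]
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code 1
    if len(digits) != 10:
        return None
    result = PHONE_FORMATS[style].format(digits[:3], digits[3:6], digits[6:])
    if extension:
        result += f" {extension_text} {extension}"
    return result


print(format_phone("608.555.1212 x 42"))
```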

Formats for dates

Use DataRight IQ to change date formats. DataRight IQ offers many date formats for you to choose from.

In addition to choosing the format for dates, you can specify the delimiter to use. Choose from <none>, <space>, a forward (/) or backward (\) slash, or a dash.


Job

In Job, enter the format number and delimiter number that correspond to the format and delimiter that you want. A list of date formats and delimiter formats follows the Standardization/Assignment Control block. The parameters are Output Date Format and Output Delimiter Format.

* Date Format Options:
*    Format 1  - YYYY*MM*DD
*    Format 2  - YY*MM*DD
*    Format 3  - DD*MM*YYYY
*    Format 4  - DD*MM*YY
*    Format 5  - MM*DD*YYYY
*    Format 6  - MM*DD*YY
*    Format 7  - DD*MMM*YY
*    Format 8  - DD*MMM*YYYY
*    Format 9  - MMM*DD*YYYY
*    Format 10 - MMM*DD*YY
*    Format 11 - YYYYMMDD
*    Format 12 - YYMMDD
*    Format 13 - DDMMYYYY
*    Format 14 - DDMMYY
*    Format 15 - MMDDYYYY
*    Format 16 - MMDDYY
* Date/SSN Delimiter Options:
*    Delimiter 1 - '' (No delimiter)
*    Delimiter 2 - ' ' (Space)
*    Delimiter 3 - '/'
*    Delimiter 4 - '-'
*    Delimiter 5 - '\'
*    Delimiter 6 - '.'

Views

For Views users, choose a date format and date delimiter from the drop-down lists in the Standardization Style window.

You can also control date format on input by completing the Input File options Input Date Format (Month before Day) and Input Date Format (Year first). These are in the Input File block. Refer to “Input Date Format (Month before Day) Input Date Format (Year first)” on page 211 for descriptions.
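The numbered formats and delimiters in the Job list can be modeled as templates in which * marks the delimiter position. Below is a hypothetical Python sketch. It assumes MMM means a three-letter uppercase month abbreviation and that numeric parts are zero-padded. Neither detail is spelled out in the list, so check your own output.

```python
from datetime import date

# Format and delimiter numbers as listed in the Job file comments.
# "*" marks where the chosen delimiter goes.
DATE_FORMATS = {
    1: "YYYY*MM*DD", 2: "YY*MM*DD", 3: "DD*MM*YYYY", 4: "DD*MM*YY",
    5: "MM*DD*YYYY", 6: "MM*DD*YY", 7: "DD*MMM*YY", 8: "DD*MMM*YYYY",
    9: "MMM*DD*YYYY", 10: "MMM*DD*YY", 11: "YYYYMMDD", 12: "YYMMDD",
    13: "DDMMYYYY", 14: "DDMMYY", 15: "MMDDYYYY", 16: "MMDDYY",
}
DELIMITERS = {1: "", 2: " ", 3: "/", 4: "-", 5: "\\", 6: "."}
# Assumption: MMM is a three-letter uppercase month abbreviation.
MONTH_ABBREVIATIONS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
                       "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")
TOKENS = ("YYYY", "MMM", "YY", "MM", "DD")  # longer tokens are tried first


def format_date(d: date, format_number: int, delimiter_number: int = 3) -> str:
    """Render d using one of the numbered output date formats."""
    values = {
        "YYYY": f"{d.year:04d}",
        "YY": f"{d.year % 100:02d}",
        "MMM": MONTH_ABBREVIATIONS[d.month - 1],
        "MM": f"{d.month:02d}",
        "DD": f"{d.day:02d}",
    }
    pattern = DATE_FORMATS[format_number]
    out, i = [], 0
    while i < len(pattern):
        if pattern[i] == "*":
            out.append(DELIMITERS[delimiter_number])
            i += 1
            continue
        token = next(t for t in TOKENS if pattern.startswith(t, i))
        out.append(values[token])
        i += len(token)
    return "".join(out)


print(format_date(date(1997, 1, 31), 8, 2))
```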


Output standardized data

You can output either standardized or unstandardized data from DataRight IQ. To output standardized data, post output fields (AP.fieldname) to your output file. If you prefer unstandardized data, post database (DB.fieldname), input fields (PW.fieldname), or output fields that contain parsed, unstandardized data (APU.fieldname).

For more information about data available for posting, see “Post the data that you want” on page 86. For a complete list of PW, AP, and APU fields, see Firstlogic’s Quick Reference for Views and Job-File products.


Chapter 3: Add new data to existing data

You can augment your existing data by having DataRight IQ add new data. This added data can include the following:

• gender codes (such as “strong female”)

• prenames (such as “Ms.,” “Mr.,” or “Mrs.”)

• match standards

• salutations (such as “Dear Ms. Jones”)

For example, given the input Lori Jones, CEO, DataRight IQ can provide you with the following data:

Name              Ms. Lori Jones
Gender            Strong female
Prename           Ms.
Match Standards   LAURA, LOREN
Formal Greeting   Dear Ms. Jones:
Casual Greeting   Dear Lori:
Title Greeting    Dear CEO:


Gender codes

Add precise gender information

For each name in the record, DataRight IQ assigns a gender code.

You can add gender data to your database, or you can use gender codes to select records. For example, if you’re mailing a women’s newsletter, you could select records for women only.

Reliable gender data

The name and gender data that DataRight IQ uses for determining gender is obtained from a variety of sources and compiled into the Parsing dictionary (parsing.dct). Each name in the dictionary has a gender value that helps DataRight IQ determine whether the name is male or female. Here’s a look at how DataRight IQ does it:

Strong male: almost certainly male (94 to 100% of people with this name are male). Example: Robert

Weak male: probably male (70 to 94% of people with this name are male). Example: Terry

Ambiguous: name does not reliably indicate gender (fewer than 70% male, fewer than 70% female). Examples: Pat; a male prename with a female name

Weak female: probably female (70 to 94% of people with this name are female). Example: Lynn

Strong female: almost certainly female (94 to 100% of people with this name are female). Examples: Anne; a female prename with a male name

To assign a gender code, DataRight IQ looks up the gender for the prename and the first name, then uses gender-assignment rules to assign a gender code.

It is not uncommon for a married woman to combine a female prename with her husband’s name. DataRight IQ considers the name strong female:

Mrs. Fred Saeger = strong female

If a male prename occurs with a female name, DataRight IQ usually considers the gender “ambiguous”:

Mr. Terri Smith = ambiguous
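Here is a hypothetical Python sketch of the two steps described above: classifying a first name from its male share, then applying the two prename rules shown in the examples. The ranges in the table meet at 70% and 94%; this sketch counts exactly 94% as strong and exactly 70% as weak, which is an assumption. The dictionary may also store a gender value rather than a percentage. Prename and name combinations that the text does not describe simply keep the name's gender here.

```python
from typing import Optional

MALE = {"strong male", "weak male"}
FEMALE = {"strong female", "weak female"}


def gender_from_share(percent_male: float) -> str:
    """Classify a first name by the share of people with that name who are male."""
    percent_female = 100.0 - percent_male
    if percent_male >= 94:
        return "strong male"
    if percent_male >= 70:
        return "weak male"
    if percent_female >= 94:
        return "strong female"
    if percent_female >= 70:
        return "weak female"
    return "ambiguous"


def assign_gender(name_gender: str, prename_gender: Optional[str] = None) -> str:
    """Combine the name's gender with a prename gender ("male", "female", or None)."""
    if prename_gender == "female" and name_gender in MALE:
        return "strong female"  # e.g., Mrs. Fred Saeger
    if prename_gender == "male" and name_gender in FEMALE:
        return "ambiguous"      # e.g., Mr. Terri Smith
    # Combinations not described in the text keep the name's gender here.
    return name_gender


print(assign_gender("strong male", "female"))
```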


Gender codes and how to retrieve them

DataRight IQ assigns a gender code for the first six names found in a record. You can retrieve gender codes from the output field AP.Gender.

1  DRL_MALE_STRONG: Strong male. (High confidence that the person is male. That is, the name belongs to someone who is almost certainly male.) Example: John

2  DRL_MALE_WEAK: Weak male. (Some confidence that the person is male. That is, the name belongs to someone who is probably male.) Example: Adrian

3  DRL_AMBIGUOUS: Ambiguous. (The name does not reliably indicate a gender. The name could be either male or female.) Example: Pat

4  DRL_FEMALE_WEAK: Weak female. (Some confidence that the person is female. That is, the name belongs to someone who is probably female.) Example: Lynn

5  DRL_FEMALE_STRONG: Strong female. (High confidence that the person is female. That is, the name belongs to someone who is almost certainly female.) Example: Mary

6  DRL_MULTI_MIXED: Multiple names, at least one male and at least one female (no ambiguous or unassigned genders). Example: John and Mary

7  DRL_MULTI_NAMES_MALE: Multiple names, all male. Example: John and Adrian

8  DRL_MULTI_NAMES_FEMALE: Multiple names, all female. Example: Mary and Lynn

9  DRL_MULTI_NAMES_AMBIGUOUS: Multiple names, at least one ambiguous (but none unassigned). Example: William and Pat

0  DRL_UNASSIGNED: Unassigned. (The first name could not be found in the dictionary and any prename is gender-neutral.) Example: PVT first CL Jkiloji Smith
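The multi-name codes 6 through 9 can be derived from the per-name codes 1 through 5. The Python sketch below is hypothetical. The code table does not say which code applies when one of several names is unassigned, so returning 0 in that case is an assumption.

```python
from typing import Sequence

UNASSIGNED, MALE_STRONG, MALE_WEAK, AMBIGUOUS, FEMALE_WEAK, FEMALE_STRONG = range(6)
MULTI_MIXED, MULTI_MALE, MULTI_FEMALE, MULTI_AMBIGUOUS = 6, 7, 8, 9


def combined_gender_code(codes: Sequence[int]) -> int:
    """Combine per-name gender codes (0-5) into a single code for the names."""
    if not codes:
        return UNASSIGNED
    if len(codes) == 1:
        return codes[0]
    if UNASSIGNED in codes:
        return UNASSIGNED  # assumption: not covered by the code table
    if AMBIGUOUS in codes:
        return MULTI_AMBIGUOUS
    if all(c in (MALE_STRONG, MALE_WEAK) for c in codes):
        return MULTI_MALE
    if all(c in (FEMALE_STRONG, FEMALE_WEAK) for c in codes):
        return MULTI_FEMALE
    return MULTI_MIXED


print(combined_gender_code([MALE_STRONG, FEMALE_STRONG]))  # John and Mary
```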


Prenames

Add prenames as separate components or in name lines

If DataRight IQ can assign a strong or weak gender to a name, it can also assign a prename. For example, given John Smith as input, DataRight IQ can assign the prename Mr. You can add prenames to your database, either in a separate field or as part of a name line.

You can retrieve prenames as separate components or as part of a name line (up to six names).

Input Output

John Smith Prename: Mr.

Name: Mr. John Smith

If the input does not include a prename, DataRight IQ assigns a prename based on gender. DataRight IQ offers two options so that you can get the prename results you want.

In Job, find the parameters in the Standardization/Assignment Control block:

Use Generated Prenames (Y/N)......... = Y

Female Prename Assignment (MS/MRS)... = MS

In Views, find the options in the Standardization Style window.


When and how DataRight IQ assigns prenames

If the input does not include a prename, DataRight IQ assigns a prename based on its assigned gender code. If the gender is strong or weak, DataRight IQ assigns the appropriate prename (Mr. or Ms.). To avoid offending anyone, it does not assign a prename if the gender is ambiguous.

Input Gender assigned Output prename

Chuck Strempler Strong male Mr.

Adrian Mileau Weak male Mr.

Pat O’Malley Ambiguous (none)

Abbie Van Buren Weak female Ms.

Gladys Saeger Strong female Ms.

When the input includes a prename, DataRight IQ always carries over that prename to the output, even if the gender of the first and middle names suggests a different prename.

Input Output prename

Ms. Glenn Close Ms.

Mr. Stacey Keach Mr.

Mrs. Gerald Ford Mrs.

Female prename

Normally, for a female name DataRight IQ assigns the prename Ms. For example, given Mary Smith as input, DataRight IQ produces Ms. Mary Smith as output. However, in some situations you may prefer to assume marriage and assign the prename Mrs. You can tell DataRight IQ to use the prename Mrs. for input name lines such as John and Mary Smith:

Input Output

John and Mary Smith Mr. John and Mrs. Mary Smith
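Here is a minimal Python sketch of the prename rules above. An input prename is always kept. Strong and weak genders get Mr. or Ms. (or Mrs.), and ambiguous or unassigned names get no prename. The female_prename argument loosely mirrors the Female Prename Assignment (MS/MRS) parameter. Applying MRS to every generated female prename is a simplification, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def assign_prename(gender: str, input_prename: Optional[str] = None,
                   female_prename: str = "MS") -> Optional[str]:
    if input_prename:
        return input_prename  # an input prename is always carried over
    if gender in ("strong male", "weak male"):
        return "Mr."
    if gender in ("strong female", "weak female"):
        return "Mrs." if female_prename.upper() == "MRS" else "Ms."
    return None  # ambiguous or unassigned: no prename


def name_line_with_prenames(first_names: List[Tuple[str, str]], last_name: str,
                            female_prename: str = "MS") -> str:
    """Build a name line such as 'Mr. John and Mrs. Mary Smith'."""
    parts = []
    for first, gender in first_names:
        prename = assign_prename(gender, female_prename=female_prename)
        parts.append(f"{prename} {first}" if prename else first)
    return " and ".join(parts) + " " + last_name


print(name_line_with_prenames([("John", "strong male"), ("Mary", "strong female")], "Smith", "MRS"))
```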


Prenames in output name lines

Include prenames

To include the generated prenames in your output file’s name lines, include these output fields in your output file setup:

• AP.Name_Line1-6

• AP.Name1-6

And complete the following parameter in Job:

BEGIN Standardization/Assignment Control ======================

Standardize Lastline (Y/N)........... = Y

Non-Mailing Cities (CONVERT/PRESERVE) = PRESERVE

Case (UPPER/lower/Mixed/SAME)........ = Mixed

Use Generated Prenames (Y/N)......... = Y

*******The remaining block omitted for illustration********

In Views, select the Include Generated Prenames in Name Lines option in the Standardization Style window.

Here’s a sample of the output:

Input Output

Anne Smith, Engr. Ms. Anne Smith, Engr.

Exclude prenames

You can exclude the generated prenames from your output file’s name lines by entering N in the Use Generated Prenames parameter in Job. In Views, clear the Include Generated Prenames in Name Lines option in the Standardization Style window.

Here’s a sample of the output without the generated prename:

Input Output

Anne Smith, Engr. Anne Smith, Engr.

For complete step-by-step instructions, refer to the Views online help.


Salutations

Include a salutation in your correspondence. DataRight IQ generates a salutation for each individual in the record, and it also generates a salutation for the entire record. Here are some of the salutation features in DataRight IQ:

• Create formal salutations such as Dear Mr. McKay.

• Create casual salutations such as Dear Dan.

• Customize the greeting word (such as Dear or Hello) and the ending punctuation.

• Use alternate greetings under certain circumstances. For example, if the name is absent, you can use a title greeting such as Dear VP of Sales.

Here are the salutation options shown in both Views and Job:

Set up salutations in the Salutation Options window in Views.

BEGIN Salutation Options ====================
Salutation Format (FORMAL/CASUAL).... = FORMAL
Use Title Salutation if No Name (Y/N) = N
Salutation Initiator................. = DEAR
Salutation Punctuation............... = ,
Salutation Connector................. =
Short Greeting for All Male Record... = SIRS
Short Greeting for All Female Record. = LADIES
Alternate Salutation................. =
Alternate Salutation Threshold(1-100) =
END

Set up salutations in the Salutation Options block in Job.


DataRight IQ uses sophisticated logic

DataRight IQ uses sophisticated logic to create high-quality salutations. It does not follow a simplistic pattern of “Dear <Name>”. Instead, it looks at all of the name elements that are present and produces the best possible salutation.

For example, given the input J. Ewing, DataRight IQ never produces the undesirable salutation Dear J. Instead, DataRight IQ detects that the first name is an initial and produces the salutation Dear J. Ewing.

Find descriptions of these salutation options on the following pages:

• formal or casual

• customized

• multiname

• alternate for absent or low-scoring names

Choose formal or casual salutations

DataRight IQ creates salutations for all six names found in the record. You can choose formal salutations such as Dear Mr. Shakespeare or casual salutations such as Dear William.

Formal salutations

For formal salutations, DataRight IQ uses the prename and last name whenever possible. DataRight IQ uses the input prename or the DataRight IQ-generated prename:

Input name Gender assigned Formal salutation

Mary Smith Strong female Dear Ms. Smith,

Robert Jones Strong male Dear Mr. Jones,

Casual salutations

For casual salutations, DataRight IQ generally uses the first name:

Input name Casual salutation

Alex Shaker Dear Alex,

If the first name is an initial only, DataRight IQ does not generate a salutation such as Dear J. Instead, DataRight IQ uses the first initial and the last name:

Input name Casual salutation

J. R. Ewing Dear J. Ewing

If no first name exists, DataRight IQ uses the prename and last name:

Input name Casual salutation

Mr. Smith Dear Mr. Smith,
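The single-name rules above can be read as a short decision procedure. The Python sketch below is illustrative only. The Name class, the style strings, and the formal fallback used when no prename is available are assumptions, not DataRight IQ's own logic. Gender assignment, which supplies generated prenames such as Ms., is not modeled, so the sketch expects the prename to be passed in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Name:
    """Hypothetical container for one parsed name (field names are illustrative)."""
    prename: Optional[str] = None  # input prename or one generated from gender
    first: Optional[str] = None
    last: Optional[str] = None


def is_initial(word: str) -> bool:
    """True for a lone letter, with or without a period, such as "J." or "J"."""
    return len(word.rstrip(".")) == 1


def salutation_name(name: Name, style: str = "FORMAL") -> Optional[str]:
    """Choose the name portion of a single-person salutation."""
    if style == "FORMAL":
        if name.prename and name.last:
            return f"{name.prename} {name.last}"
        # Assumed fallback when no prename is available: first and last name.
        return " ".join(p for p in (name.first, name.last) if p) or None
    # CASUAL: prefer a real first name.
    if name.first and not is_initial(name.first):
        return name.first
    # Never produce "Dear J."; use the initial plus the last name instead.
    if name.first and name.last:
        return f"{name.first} {name.last}"
    # No first name: fall back to prename plus last name.
    if name.prename and name.last:
        return f"{name.prename} {name.last}"
    return None


def salutation(name: Name, style: str = "FORMAL",
               initiator: str = "Dear", punctuation: str = ",") -> Optional[str]:
    core = salutation_name(name, style)
    return f"{initiator} {core}{punctuation}" if core else None
```

For example, salutation(Name(first="J.", last="Ewing"), "CASUAL") returns "Dear J. Ewing,".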

DataRight IQ User’s Guide

Create the salutations you want

DataRight IQ creates a salutation for each name in the record (up to six). You can decide what initiator word, connector word and ending punctuation to use in your salutations.

Dear Mr. Smith and Ms. Jones:

In this example, “Dear” is the initiator, “and” is the connector, and the colon is the ending punctuation.

You can create salutations in any style you want. For example, you could create salutations such as Hello, Bob & Mary!
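The initiator, connector, and punctuation wrap around the names in a simple way. The following Python sketch shows one way to assemble them. The function name is hypothetical. The comma style used for three or more names is an assumption, because this guide does not describe it.

```python
from typing import List, Optional


def compose_salutation(greetings: List[str], initiator: str = "Dear",
                       connector: str = "and", punctuation: str = ",") -> Optional[str]:
    """Join per-person greetings with a connector, then add initiator and punctuation."""
    if not greetings:
        return None
    if len(greetings) == 1:
        body = greetings[0]
    else:
        # Assumed style for three or more names: "A, B and C".
        body = ", ".join(greetings[:-1]) + f" {connector} " + greetings[-1]
    return f"{initiator} {body}{punctuation}"
```

With initiator "Hello,", connector "&", and punctuation "!", the names Bob and Mary produce Hello, Bob & Mary!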

Create multiname salutations

For each record, DataRight IQ generates a salutation for the entire record, which you can retrieve from the output field AP.Salute_Rec. The record salutation is a greeting for all of the persons in the record.

Consistency within each salutation

DataRight IQ attempts to use the same style for all the names in the record. For example, for a formal salutation, if a prename is unavailable for one or more names in the record, DataRight IQ uses first names and last names for all persons. This results in consistency within the salutation:

Input                        Formal salutation
John Smith and Pat Jones     Dear John Smith and Pat Jones,
                             (not Dear Mr. Smith and Pat Jones)

Formal salutations for dual names

For dual names, the formal salutation uses a shared last name:

Input                        Formal salutation
John and Mary Smith          Dear Mr. and Ms. Smith,

You can retrieve dual-name salutations from the output field AP.Dual_Salut.

Short greetings

If a record contains multiple names of the same strong gender, you can generate short formal salutations such as Dear Sirs:

Input names Formal greeting

Robert Jones and Bob Smith Dear Sirs:

Mary Smith and Jane Hammond Dear Ladies:
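The multiname rules (short greetings, dual names, and the consistency rule) can be combined into one decision. The Python sketch below shows one way to do that. The Person class, the gender codes, and the order in which the checks run are assumptions made for illustration. The sketch does not reproduce DataRight IQ's internal logic or the AP.Salute_Rec and AP.Dual_Salut output fields.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    """Hypothetical parsed person; the gender codes are illustrative."""
    first: str
    last: str
    prename: Optional[str] = None
    gender: str = "UNKNOWN"  # e.g. "STRONG_MALE" or "STRONG_FEMALE"


# Mirrors the job-file short greetings SIRS and LADIES (casing is illustrative).
SHORT_GREETINGS = {"STRONG_MALE": "Sirs", "STRONG_FEMALE": "Ladies"}


def formal_record_salutation(people: List[Person], initiator: str = "Dear",
                             connector: str = "and", punctuation: str = ",",
                             use_short_greeting: bool = False) -> str:
    if not people:
        raise ValueError("no names in record")
    genders = {p.gender for p in people}
    # Short greeting: several names, all with the same strong gender.
    if use_short_greeting and len(people) > 1 and len(genders) == 1:
        short = SHORT_GREETINGS.get(next(iter(genders)))
        if short:
            return f"{initiator} {short}{punctuation}"
    if all(p.prename for p in people):
        if len(people) == 2 and people[0].last == people[1].last:
            # Dual name with a shared last name: "Mr. and Ms. Smith".
            parts = [f"{people[0].prename} {connector} {people[1].prename} {people[0].last}"]
        else:
            parts = [f"{p.prename} {p.last}" for p in people]
    else:
        # Consistency rule: one missing prename means first + last for everyone.
        parts = [f"{p.first} {p.last}" for p in people]
    body = f" {connector} ".join(parts)
    return f"{initiator} {body}{punctuation}"
```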

Sometimes name data is missing or a name receives a low parsing-confidence score. In these situations, you can generate alternate salutations.


Title salutation

If a person’s job title is present but name data is absent, you can use the job title in the salutation:

Input                  Salutation
Name1: (none)          Dear Sales Mgr.,
Title1: Sales Mgr.

Alternate salutation for low-scoring names

If a name receives a low parsing-confidence score, you may prefer to use a generic salutation rather than use questionable name data. For low-scoring names, you can use a generic salutation such as Dear Valued Customer or Dear Sports Fan.

For more information about parsing-confidence scores, see “Data-quality scores and codes” on page 165.
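The job-file parameters Use Title Salutation if No Name (Y/N), Alternate Salutation, and Alternate Salutation Threshold(1-100) appear to control these fallbacks. The Python sketch below shows one plausible way they combine. The function name is hypothetical. Two behaviors are also assumptions: a score strictly below the threshold counts as low, and the alternate salutation is returned when neither a name nor a title is present.

```python
from typing import Optional


def choose_salutation(name_salutation: Optional[str], title: Optional[str],
                      confidence: int, threshold: int,
                      alternate: str = "Dear Valued Customer,",
                      use_title_if_no_name: bool = True,
                      initiator: str = "Dear", punctuation: str = ",") -> str:
    """Pick a salutation, falling back when the name is absent or low-scoring."""
    if not 1 <= threshold <= 100:
        raise ValueError("threshold must be between 1 and 100")
    if name_salutation is None:
        # Title salutation: name data absent but a job title present.
        if use_title_if_no_name and title:
            return f"{initiator} {title}{punctuation}"
        return alternate  # assumption: generic greeting when nothing else exists
    if confidence < threshold:
        # Low parsing-confidence score: avoid questionable name data.
        return alternate
    return name_salutation
```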


Chapter 4: Custom dictionaries

A custom dictionary can help improve how DataRight IQ works for you. You can create custom dictionaries to improve casing and parsing results.

This chapter explains how to create your own custom dictionaries and how custom dictionaries improve casing and parsing results.


Create a custom dictionary with UMD

To create a custom dictionary, use the User-Modifiable Dictionary program (UMD) installed with DataRight IQ.

UMD can help you create both a custom capitalization dictionary and a custom parsing dictionary. For instructions on using UMD, see the DataRight IQ Modifier’s Guide or the online help that accompanies UMD Views.

We recommend that you create a separate custom capitalization dictionary and use it in addition to the capitalization dictionary that comes with the software, pwcap.dct.

Important: Each time you install a software update, we overwrite the file pwcap.dct. To protect your work, give your dictionary a name other than pwcap.dct. This prevents your dictionary from being overwritten the next time you install a software update.

For Views users: The easiest way to create a custom capitalization dictionary is to use the Capitalization Wizard. Start the Wizard by choosing Tools > Capitalization Wizard.

For Job users: Create a database containing your dictionary entries. Then, use UMD to convert the database to a capitalization dictionary. For instructions about what fields to include in your database, what data to enter in each field, and how to convert the database to a dictionary, see the DataRight IQ Modifier’s Guide.

How to use your custom dictionary

You can specify two capitalization dictionaries. We recommend that you use our capitalization dictionary, pwcap.dct, as dictionary #1 and your custom dictionary as dictionary #2.

Duplicates in dictionaries? If both dictionaries contain the same word, DataRight IQ uses the entry from dictionary 2.

The way you specify your custom dictionary depends on how you implemented your DataRight IQ product:

• In DataRight IQ Library, to use your custom dictionary you specify the path and file name of the dictionary file.
• In Job, when you set up your job file, tell DataRight IQ to use your capitalization dictionary in addition to the standard dictionary. In the Auxiliary Files section, list pwcap.dct as Capitalization Dictionary 1 and your custom dictionary as Capitalization Dictionary 2.

BEGIN Auxiliary Files =========================================
*************Portions omitted for illustration************
Capitalization Dct 1(path & pwcap.dct)= pwcap.dct
Capitalization Dct 2(path & file.dct) =
*************Portions omitted for illustration************
END


In DataRight IQ Views, you specify your dictionaries in the Auxiliary Files window in the Dictionaries group:

Processing speed. Processing is a little slower when DataRight IQ consults two capitalization dictionaries. The difference in speed varies depending upon a number of variables, but expect processing to take about one percent longer than it would with one capitalization dictionary.


Improve casing results

Mixed-case results

If you use mixed case, the general rule is to capitalize the first letter of the word and put the rest of the word in lowercase. However, there are exceptions to that rule, such as McDonald, Ph.D., IBM, NY, and so on.

To handle mixed-case exceptions, DataRight IQ consults a capitalization dictionary. DataRight IQ includes a capitalization dictionary called pwcap.dct, which contains mixed-case exceptions.

How DataRight IQ capitalizes in mixed case

Improve mixed-case results

A capitalization dictionary is a list of words that are mixed-case exceptions. The dictionary contains the correct casing of a word and also indicates when that casing should be used.

For example, the capitalization dictionary has entries for both MS and CO:

Dictionary entry Usage

MS POSTNAME

CO STATE

The word MS is cased differently depending upon how it is used: MS as an abbreviation for the postname “Master of Science,” or Ms as a prename. So, the entry in the capitalization dictionary indicates that MS (uppercase “S”) should be used only for postname data.

The word CO is cased differently depending upon how it is used: CO as an abbreviation for the state of Colorado, but Co as an abbreviation for the word company. So, the dictionary entry indicates that CO (uppercase “O”) should be used only for state data.

Most DataRight IQ users find that our capitalization dictionary (pwcap.dct) is sufficient for producing good mixed-case results. However, it would be impossible for our capitalization dictionary to contain every mixed-case exception. If DataRight IQ does not case a word as you would like, you can create a custom capitalization dictionary.


For example, TechTel is not in our capitalization dictionary, so DataRight IQ capitalizes only the first letter of the word:

TECHTEL, INC.

Techtel Inc.

If you add the word TechTel to your custom capitalization dictionary, you can get the desired mixed-case results:

TECHTEL, INC.

TechTel Inc.
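The casing behavior in this chapter can be modeled with ordinary lookup tables. It has three parts: the general first-letter rule, usage-restricted entries such as MS and CO, and dictionary 2 winning when both dictionaries contain the same word. The Python sketch below is a simplified in-memory model, not the .dct file format. The entries, usage names, and function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory model: key = (uppercased word, usage restriction or None).
CapDict = Dict[Tuple[str, Optional[str]], str]

PWCAP: CapDict = {  # a few illustrative entries in the spirit of pwcap.dct
    ("MCDONALD", None): "McDonald",
    ("IBM", None): "IBM",
    ("MS", "POSTNAME"): "MS",
    ("CO", "STATE"): "CO",
}
CUSTOM: CapDict = {("TECHTEL", None): "TechTel"}


def lookup(word: str, usage: str, dct: CapDict) -> Optional[str]:
    key = word.upper()
    # A usage-specific entry applies only when the word is used that way.
    return dct.get((key, usage)) or dct.get((key, None))


def mixed_case(word: str, usage: str, dct1: CapDict, dct2: CapDict) -> str:
    # Dictionary 2 is consulted first, so it wins when both contain the word.
    for dct in (dct2, dct1):
        hit = lookup(word, usage, dct)
        if hit:
            return hit
    # General rule: first letter uppercase, the rest lowercase.
    return word[:1].upper() + word[1:].lower()
```

Without the custom entry, mixed_case("TECHTEL", "FIRM", PWCAP, {}) falls back to the general rule and returns Techtel.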

Improve parsing results

DataRight IQ uses a parsing dictionary to guide parsing, gender assignment, and standardization. You can create a custom dictionary to improve parsing results for your specific data.

Customizing your dictionary. For details on adding or editing dictionary entries, see the DataRight IQ Modifier’s Guide or UMD online help.

How DataRight IQ uses the dictionary

The parsing dictionary lists words and phrases and tells how they’re used. For example, the parsing dictionary tells DataRight IQ that the word Engineering can be used as a firm word (Smith Engineering Inc.) or in a job title (VP of Engineering).

DataRight IQ gets other information from the dictionary, too:

• The dictionary contains gender data. For example, the dictionary tells DataRight IQ that the name Anne is a female name, and that Mr. is a male prename.
• The dictionary also tells DataRight IQ the standard and acronym forms of words. For example, the dictionary indicates that Inc. is the standard form of Incorporated and that GM is the acronym for General Motors.
• The dictionary contains match standards. For example, the dictionary tells DataRight IQ that the names Patricia and Patrick are potential matches for the name Pat.
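Conceptually, each parsing-dictionary entry bundles the kinds of information listed above. The Python sketch below models a handful of entries as records. The field names, usage tags, and helper functions are hypothetical. Real entries are created and edited with UMD, not in code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Entry:
    """Hypothetical model of one parsing-dictionary entry."""
    word: str
    usages: Set[str] = field(default_factory=set)   # e.g. {"FIRM", "TITLE"}
    gender: Optional[str] = None                    # e.g. "FEMALE" for Anne
    standard: Optional[str] = None                  # standard form, e.g. "Inc."
    acronym: Optional[str] = None                   # e.g. "GM"
    match_standards: List[str] = field(default_factory=list)


ENTRIES = [
    Entry("Engineering", usages={"FIRM", "TITLE"}),
    Entry("Anne", usages={"FIRST_NAME"}, gender="FEMALE"),
    Entry("Mr.", usages={"PRENAME"}, gender="MALE"),
    Entry("Incorporated", usages={"FIRM"}, standard="Inc."),
    Entry("General Motors", usages={"FIRM"}, acronym="GM"),
    Entry("Pat", usages={"FIRST_NAME"}, match_standards=["Patricia", "Patrick"]),
]
DICTIONARY: Dict[str, Entry] = {e.word.upper(): e for e in ENTRIES}


def usages(word: str) -> Set[str]:
    entry = DICTIONARY.get(word.upper())
    return entry.usages if entry else set()


def standardize(word: str) -> str:
    entry = DICTIONARY.get(word.upper())
    return entry.standard if entry and entry.standard else word


def potential_match(a: str, b: str) -> bool:
    """True if the words are equal or either lists the other as a match standard."""
    if a.upper() == b.upper():
        return True
    ea, eb = DICTIONARY.get(a.upper()), DICTIONARY.get(b.upper())
    return bool((ea and b in ea.match_standards) or (eb and a in eb.match_standards))
```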

Correct specific parsing behavior

You might customize the parsing dictionary to correct specific parsing behavior that you have seen in your output.

Local names

DataRight IQ’s name data is based on an analysis of U.S. residents. As such, the parsing dictionary is broadly useful across the United States. However, you may want to tailor the dictionary to better suit your data by adding ethnic or regional names. If DataRight IQ doesn’t recognize a specific name (for example, Jinco Xandru), you can add Jinco to the dictionary as a first name and Xandru as a last name.

Industry-specific jargon

Our parsing dictionary is useful across many industries. You might tailor the dictionary to better suit your own industry by adding special titles, prenames or postnames, acronyms, or other jargon words.

For example, if you process data for the real estate industry, you might add industry-specific postnames such as CRS, ABR, and GRI.

Specific phrases

Some words can be used in firm names and in job titles. As a result, DataRight IQ may incorrectly parse some job titles as firm names. To improve parsing, you can add phrases to the dictionary.

Firm names containing personal names

Often a firm name is made up of personal names. As a result, DataRight IQ may incorrectly parse the firm as a personal name. For example, the catalog retailer J. Crew may be parsed as a personal name rather than as a firm.


To improve parsing, you can add multiple-word firm names to the dictionary. For example, to parse J. Crew as a firm rather than as a personal name, you could add J. Crew to the dictionary as a firm name.

Create a custom dictionary

To create a custom parsing dictionary, use our User-Modifiable Dictionaries (UMD) program. For instructions on using UMD, see the DataRight IQ Modifier’s Guide or UMD Views online help.

Note: If you add a word to a parsing dictionary and that word also has special mixed casing (such as TechTel), remember to also add the word to your custom capitalization dictionary.


Unit 2:

DataRight IQ Job and Views

Contents


Set up a DataRight IQ job 49

Overview of data parsing 55

Details of data parsing 65

Process parsed data 81

Search-and-replace 89

Scan-and-split 107

Input and output 119

Reports and statistics 141

Data-quality scores and codes 165

Master job file 183

The chapters in this unit mostly apply to Job and Views users. If you use the Library implementation of DataRight IQ, there is a section in “Details of data parsing” on page 65 that explains some parsing features exclusive to Library. For complete Library information, see “Welcome to DataRight IQ Library” on page 247.



Chapter 5: Set up a DataRight IQ job

For almost every DataRight IQ job that you run, you need to perform some basic setup tasks. These tasks include setting up the following files:

• input files
• auxiliary files (defaults)
• output files

You also need to set your preferences for standardizing data and adding salutations. In addition, you need to set up your reports and verify that your job is ready.


Set up your input files

DataRight IQ can accept up to 255 input files for each job. You need to tell DataRight IQ where to find your input files and how to read them.

Almost everything you need to know about setting up input files is explained in Firstlogic’s Database Prep documentation. For detailed how-to instructions on setting up your input files, see that documentation.

Tell DataRight IQ the format of the file

For flat files and some types of databases, you have to tell DataRight IQ the physical format of the input file. To do this, create a format file (also known as an FMT or DMT file). Our Database Prep manual explains how to set up a format file.

Define the input fields

DataRight IQ recognizes a specific set of input fields called PW fields. If you want DataRight IQ to process the data in a field, you can do one of two things:

• Use the Modify PW Field parameter (see the Input File block).
• Map that field to one of the DataRight IQ-recognized PW fields. To map fields, you must create a separate file called a definition file, or DEF file. In Views, you can create a DEF file using the DefMap tool in the Tools menu.

For instructions on setting up a DEF file, see our Database Prep manual. For a description of each PW field, see our Quick Reference. For more guidelines about setting DataRight IQ’s input fields, see “Define input fields” on page 51.

Tell DataRight IQ where the file is located

Most of the work of setting up your input files is done outside of DataRight IQ. Inside your DataRight IQ job file, the only thing you really need to do is provide the location and file name in the Input File section of your job.


Define input fields

Setting up input fields is an extremely important part of setting up your DataRight IQ job. The more accurately you set up the fields in your definition (DEF) file, the better DataRight IQ’s output parsing results will be.

For a complete list and descriptions of the DataRight IQ input fields, see our Quick Reference.

Define fields as precisely as possible

The DEF file tells DataRight IQ what kind of data each input field contains. The more information you give DataRight IQ about the contents of a field, the better your parsing results will be.

Define fields accurately

When you define an input field, make sure that you account for all of the different kinds of data that occur in the field. If the content of the field varies from record to record, make sure that your field definition accounts for this.

A common mistake is to define a field based on its field name rather than on the actual contents of the field. This method often leads to inaccurate field definitions, which lead to inaccurate output results.

Browse data before defining fields

Field names can be deceiving. Before you write a DEF file, look at the data in your input database. It is much quicker to browse the input database and write an accurate DEF file than it is to rerun an entire job because of a mistake in the DEF file.

Prepare for floating data

Usually you can present a floating field to DataRight IQ as a multiline or name/firm field. However, sometimes a database floats between two completely different record formats. This usually results when someone appends one database to another without realizing that the two files have different formats. The result is a multiformat file. Multiformat files require special setup; see “Straighten out a file that has multiple record formats” on page 121.

Define input name format

Name format is the sequence of name components in a name line—for example, First-Middle-Last or Last-First-Middle. If the name format is consistent throughout your database, you get better name-parsing results if you tell DataRight IQ the name format.

For example, given the name Thomas Todd, DataRight IQ can correctly identify the first name and last name if you tell it the name format. However, if DataRight IQ does not know the name format, it’s unclear which is the first name and which the last—is it Thomas Todd or Todd Thomas?
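A purely positional parser shows why the declared format helps. This Python sketch is a simplification under stated assumptions. The format names FIRST_MIDDLE_LAST and LAST_FIRST_MIDDLE are hypothetical labels, not actual parameter values. Unlike DataRight IQ, the sketch ignores the parsing dictionary entirely.

```python
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    first: str
    middle: Optional[str]
    last: str


def parse_name(line: str, name_format: Optional[str] = None) -> Optional[ParsedName]:
    """Assign name parts by position, using a declared name format (sketch only)."""
    words = line.replace(",", " ").split()
    if name_format is None or len(words) < 2:
        # Without a declared format, "Thomas Todd" could be read either way.
        return None
    if name_format == "LAST_FIRST_MIDDLE":
        # Move the leading last name to the end: Last-First-Middle -> First-Middle-Last.
        words = words[1:] + words[:1]
    elif name_format != "FIRST_MIDDLE_LAST":
        raise ValueError(f"unsupported name format: {name_format}")
    first, *middle, last = words
    return ParsedName(first, " ".join(middle) or None, last)
```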


Set up auxiliary files

To identify and process data, DataRight IQ depends on a set of auxiliary files. All you have to do is tell DataRight IQ where the files are located on your computer.

About the files

Here’s a brief description of how DataRight IQ uses these files.

File                   File name     File type        Contains                                                             Status
Rule                   drlrules.dat  Data file        DataRight IQ’s parsing rules.                                        Required
Pattern                drludpm.dat   Data file        DataRight IQ’s user-defined pattern information.                     Required
SSN                    drlssn.dat    Data file        DataRight IQ’s U.S. Social Security number information.              Required
Email                  drlemail.dat  Data file        DataRight IQ’s e-mail address information.                           Required
Int’l phone            drlphint.dat  Data file        DataRight IQ’s international (non-U.S. and Canadian) phone numbers.  Required
Parsing                parsing.dct   Dictionary file  DataRight IQ’s parsing information (name, firm, and title).          Required
Address                addrln.dct    Dictionary file  DataRight IQ’s address line information.                             Required
Last line              lastln.dct    Dictionary file  DataRight IQ’s last line information.                                Required
Firm line              firmln.dct    Dictionary file  DataRight IQ’s firm (company) information.                           Required
City                   city08.dir    Directory file   DataRight IQ’s city information.                                     Optional
ZCF                    zcf08.dir     Directory file   DataRight IQ’s U.S. ZIP Code information.                            Optional
Capitalization         pwcap.dct     Dictionary file  DataRight IQ’s capitalization information.                           Optional
Custom capitalization  [name].dct    Dictionary file  Your own additional capitalization information.                      Optional
Default ASCII FMT      [name].fmt    Format file      DataRight IQ’s default ASCII format.                                 Optional
Default DEF            [name].def    Definition file  DataRight IQ’s default.                                              Optional

Set up your output files

Although it is possible to post data back to your input file, most DataRight IQ users post records to an output file or files. When you set up a new output file, you must perform two separate but equally important tasks:

• Set up the format of the output file.
• Specify what data to put in each output field.

Create the format of the output file

If you create a new database for output, you must first define the format of that new file—the file type, the sequence of fields, field names, field lengths, and so on.

You can create the format of an output file in two ways: You can manually define the format of the file, or you can clone the format of an existing file. For more information about output format, see “Specify the output format you want” on page 124.

Post data into the output fields

After you define the format of the output file, you must specify the content—what information will be posted into each output field.

You can post raw data from the input file, data processed by DataRight IQ, and new data generated during processing.

Enable output posting

If you want to post data to an output file, make sure you enable output posting in the Execution section of the job file. If output posting is disabled, DataRight IQ ignores your instructions for creating and posting to output files.


Verify that your job is ready

Before you can run your DataRight IQ job, you must verify that it’s ready. When DataRight IQ verifies a job, it makes sure that you have provided all of the information required to run the job. DataRight IQ also makes sure that all your job settings are valid.

Verifier messages

During verification, DataRight IQ may issue two types of messages:

Message type   Description
Error          If DataRight IQ finds a problem that would prevent the job from running, it gives an error message.
Warning        If DataRight IQ finds a less serious problem, it gives a warning message. This means there is a possibility the job will produce unexpected results.

Batch verifier

When you start the batch-processing version of DataRight IQ (DataRight IQ Job), you do so by typing a command line at your operating-system prompt. Below is an example.

fldiq /v /lmy_job c:\pw\dtr_iq\jobs\my_job.diq

DataRight IQ automatically starts verifying that the job is ready for processing. It stops on the first serious error. After you correct the error, you start the verification process all over again.

Views verifier

DataRight IQ Views offers a handy way to verify jobs. Views can find and present more than one error at a time. You can jump directly to a trouble spot by selecting an error or warning message and clicking the Go To button.


Chapter 6: Overview of data parsing

You can use DataRight IQ to identify and isolate various data and data components from fields of information. We call this parsing.

What DataRight IQ parses

DataRight IQ can parse the following types of data:

• street or “land” addresses and last lines

• e-mail addresses

• Social Security numbers

• dates

• U.S. and international phone numbers

• user-defined patterns

• names and titles of persons

• firms

For a complete list of the input fields that DataRight IQ accepts, see Firstlogic’s Quick Reference documentation. For a more detailed description about how DataRight IQ parses each type of data, see the next chapter.


DataRight IQ uses a parsing dictionary and word patterns

To identify and parse data, DataRight IQ uses a parsing dictionary and then examines words and word patterns. It then isolates data into different types of fields.

Dictionary lookups

As DataRight IQ processes each field, it breaks each field into words. DataRight IQ looks up each word in a parsing dictionary (parsing.dct). The parsing dictionary helps DataRight IQ determine what type of data each word might be.

For example, the parsing dictionary tells DataRight IQ that the word Engineering could be part of a job title, firm name, or firm location.

Word patterns

After DataRight IQ performs dictionary lookups, it looks at the position and sequence of words in the field. DataRight IQ examines word patterns to help identify data components.

For example, the word Techtel is not in the parsing dictionary. However, if an input field contains Techtel Inc., DataRight IQ can correctly identify it as a firm name, because DataRight IQ knows that a word (or words) followed by Inc. is a firm name.

Isolate data components for output

After DataRight IQ identifies individual data elements, it isolates each component. For example, if given Techtel Inc., Dept. of Engineering as input, DataRight IQ can isolate Techtel Inc. as the firm name and Dept. of Engineering as the firm location.

Input:           Techtel Inc., Dept. of Engineering
Firm Name:       Techtel Inc.
Firm Location:   Dept. of Engineering
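The lookup-then-pattern approach can be sketched for firm lines. In this Python example, two small word sets stand in for the parsing dictionary. The only word pattern is "one or more words followed by a firm suffix such as Inc. is a firm name." The word lists, function name, and output keys are hypothetical.

```python
from typing import Dict

# Illustrative word lists; the real parsing dictionary is far larger.
FIRM_SUFFIXES = {"INC.", "INC", "CORP.", "CORP", "LLC"}
LOCATION_WORDS = {"DEPT.", "DEPT", "DEPARTMENT", "DIV.", "DIVISION"}


def parse_firm_line(line: str) -> Dict[str, str]:
    """Split a firm line into firm name, firm location, and leftover text."""
    words = line.replace(",", " ").split()
    for i, word in enumerate(words):
        # Word pattern: one or more words followed by a firm suffix is a firm name.
        if i > 0 and word.upper() in FIRM_SUFFIXES:
            firm = " ".join(words[: i + 1])
            rest = words[i + 1:]
            if rest and rest[0].upper() in LOCATION_WORDS:
                return {"firm_name": firm, "firm_location": " ".join(rest), "extra": ""}
            return {"firm_name": firm, "firm_location": "", "extra": " ".join(rest)}
    return {"firm_name": "", "firm_location": "", "extra": line}
```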


DataRight IQ uses rule-based parsing

One of the ways that you can modify DataRight IQ’s behavior is by creating or editing parsing rules in the rule file (drlrules.dat). The rule file controls how DataRight IQ parses name and firm data.

Rule-file organization

The rule file has a straightforward organization. It consists of a header, explanatory information, and parsing rules grouped by data type.

The header

The file’s header identifies the DataRight IQ rule file. You must not alter or delete the header.

DRL Rule File v1.0;
# DO NOT EDIT, MODIFY OR REMOVE THE ABOVE LINE!!!!! #

Explanatory information

You can add notes and explanations for rules in your rule file. These notes must be commented out by typing a pound sign (#) at the beginning of the note.

# The following types will be used in each example
#
# NAME_DESIGN = ATTN
# PRENAME = MR
# NAME_STRONG_FN = JOHN

Parsing rules by data type

Lastly, the rule file consists of rules grouped by data type. Groups of rules include:

• Name rules
• Dual name rules
• Firm rules
• Address Line rules
• Last Line rules
• Optional rules, not enabled by default

#######################################################
# NAME RULES #
# #
#######################################################

#######################################################
# Titles by themselves
# Admin
#
title1 = TITLE_ALONE;
options = begin : end;
action = DRL_PERSON conf: 40;
DRL_PERSON = 1 : DRL_TITLE : L;
end_action

Modifying the rule file

The rule file has hundreds of rules for many different possible combinations of data. These rules will likely satisfy the parsing needs of most DataRight IQ users. However, you can add to or change the rule file. Refer to the DataRight IQ Modifier’s Guide for details.
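Before you edit drlrules.dat by hand, it can help to confirm that the header is intact and to see which lines are rules rather than notes. The Python sketch below does only that. It is a hypothetical helper and does not interpret rule syntax. It is also not how DataRight IQ itself loads the file.

```python
from typing import List

HEADER = "DRL Rule File v1.0;"


def rule_lines(text: str) -> List[str]:
    """Check the rule-file header, then return the lines that are not notes."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith(HEADER):
        raise ValueError("rule-file header is missing or has been altered")
    body = []
    for line in lines[1:]:
        stripped = line.strip()
        # Notes start with '#'; blank lines carry no rules either.
        if stripped and not stripped.startswith("#"):
            body.append(stripped)
    return body
```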


Presumptive parsing

For name and firm information, when the rules don’t apply, DataRight IQ uses presumptive parsing.

When data is input on a name line or a firm line, DataRight IQ first tries its rule-based parsing with the name rules or firm rules, respectively. If the data doesn’t match a rule (and you have activated presumptive parsing in your setup), DataRight IQ uses presumptive parsing to make a best guess. With presumptive parsing, a name or firm is always parsed out of a name or firm line.

Because the input always goes to the rule set first, in some cases the rules match only part of the entry and parse that part out. The remaining data goes to the Extra output field.

Some examples

With rule-based parsing, data that matches a rule is parsed according to that rule.

In the example below, the name-line data “xb1wc so34bod2jc” is recognized as junk data when parsed by the rules (the data has numbers and/or no letters in it), so it is sent to the Extra output field.

Input (on a name line): xb1wc so34bod2jc

Output with rule-based parsing:
  Field        Data
  Extra1       xb1wc so34bod2jc

If you turn presumptive parsing on, the same data is parsed as a first and last name because it came in on a name line.

Input (on a name line): xb1wc so34bod2jc

Output with presumptive parsing:
  Field        Data
  First name   xb1wc
  Last name    so34bod2jc

If you have a legitimate name in your data (like “john smith” among the same junk data, below), parsing pulls out John as a first name, Smith as a last name, and the rest as Extra, regardless of whether you use rule-based parsing or presumptive parsing, because the data matched a parsing rule.

Input: xb1wc john smith so34bod2jc

Output with presumptive parsing on:
  Field        Data
  First name   John
  Last name    Smith
  Extra1       xb1wc so34bod2jc
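A toy name parser can illustrate the difference between rule-based and presumptive parsing. In the Python sketch below, a single made-up rule stands in for the real name rules: two adjacent all-letter words are a first and last name. The function name and output keys are hypothetical. The rules run first and leftover words go to Extra. Presumptive parsing guesses only when no rule matches.

```python
from typing import Dict


def is_name_word(word: str) -> bool:
    """Toy stand-in for the name rules: a name word contains only letters."""
    return word.isalpha()


def parse_name_line(line: str, presumptive: bool) -> Dict[str, str]:
    words = line.split()
    # 1. Rules first: this toy rule matches two adjacent name words (first, last).
    for i in range(len(words) - 1):
        if is_name_word(words[i]) and is_name_word(words[i + 1]):
            result = {"first": words[i].capitalize(), "last": words[i + 1].capitalize()}
            leftover = words[:i] + words[i + 2:]
            if leftover:
                result["extra"] = " ".join(leftover)  # unmatched words go to Extra
            return result
    # 2. No rule matched: presumptive parsing guesses first and last name
    #    (single-word lines are not handled in this sketch).
    if presumptive and len(words) >= 2:
        return {"first": words[0], "last": " ".join(words[1:])}
    # 3. Otherwise the whole line is treated as junk and sent to Extra.
    return {"extra": line}
```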


Turn presumptive parsing on

In Job, activate presumptive parsing for both name and firm lines in the Parsing Control box:

BEGIN Parsing Control =============================================

Parsing Mode (NONE/PARSE)............ = PARSE

Presumptive Parse Name Lines......... = Y

Presumptive Parse Firm Lines......... = Y

END

In Views, select the Parsing Setup group in the main window, then open the Parsing Control window. Select the Use Presumptive Parsing for Name Lines and Use Presumptive Parsing for Firm Lines options:


Parse discrete components and lines

Data files contain data that is arranged in many different formats. DataRight IQ accepts input data in several ways:

• discrete components
• data as whole lines

Discrete components

Each field in a discrete-component file contains one piece of information. The Prename field contains only a prename. There is a field for a first name, last name, postname, and so on, so that the input record looks like this:

Prename  First Name  Middle Name  Last Name  Postname  Title     Firm Name        Firm Location
Mr.      John        A.           Smith      Jr.       Engineer  Firstlogic Inc.  Dept. of Engrg.

These fields are, in essence, already parsed.

Because of the nature of discrete fields, DataRight IQ does not perform its own parsing. For example, if you tell DataRight IQ that a piece of data is a first name, DataRight IQ will not try to parse it as a last name, even if it looks like a last name.

Data as whole lines

DataRight IQ can accept data as whole lines. The name line contains a prename, first, middle, and last name, postname, and title. The address contains the street address, city, state, and ZIP Code. The e-mail line contains user, host, and domain information.

Name Line:   Mr. John A. Smith Jr., Engineer
Firm Line:   Firstlogic Inc., Dept. of Engrg.
Address:     100 Harborview Plz, La Crosse, WI 54601
e-mail:      ja.smith@firstlogic.com


Parse floating data

Sometimes a database contains fields that include different data from record to record. In other words, data floats between fields. DataRight IQ recognizes floating data in several types of fields:

• multiline field
• name/firm field
• “non-addr” field

Multiline

Some data entry systems provide lines where data entry operators can enter any data. There may be little or no consistency about the position of data within these lines. We call this a multiline format.

Here are some guidelines for multiline data:

• You may pass in multiple names on one line, for example, John and Mary Jones, or Bill Johnson and Craig Andrews.
• You may combine name and job-title data on the same line, for example, Robert Smith, Software Engineer.
• Separate firm from name or title data. Do not pass in firm data on the same line with name or job-title data (for example, John Smith, Engineering Services Inc.). DataRight IQ will usually parse the entire line as a firm name, because personal names commonly occur within firm names.

A multiline field may contain any of the following:

• A name line (may include a job title).
• A job title.
• A firm name or firm location or both.
• An address line.
• City, state, and/or ZIP Code.
• Other data such as a U.S. Social Security number, phone, date, e-mail address, and data that matches a user-defined pattern.

When DataRight IQ processes a multiline field, it identifies and isolates the data listed above. Any other data is sent to an output field called AP.Extra.

Name/firm field

A name/firm field is the same as a multiline field except that it never contains address data or any data from other parsers. A name/firm field may contain any of the following:

• a name line (may include a job title)
• a job title
• a firm name or firm location or both

When DataRight IQ processes a name/firm field, it identifies and isolates name, title, and firm data. Any other data (including address data or data from other parsers) is sent to an output field called AP.Extra.


Non-address field

A non-address field is the same as a name/firm field in that it does not contain any address data. However, it can contain data from other parsers. A non-address field may contain any of the following:

• any data that a name/firm field may contain
• an e-mail address
• a date
• a phone number
• data that matches a user-defined pattern (UDPM)
• a U.S. Social Security number (SSN)


DataRight IQ’s multiline parsing order

When input is on a multiline, DataRight IQ parses data in the following order:

Order   Parsed item
1       Street address and last line
2       E-mail address
3       U.S. Social Security number
4       Date
5       Phone number (U.S. or Canadian)
6       Phone number (International)
7       User-defined pattern
8       Name and title
9       Firm

Why order?

The order in which DataRight IQ parses your data is important, because if DataRight IQ identifies data as one thing before it can evaluate it as another, you may get unexpected results.

For example, if DataRight IQ identifies a nine-digit number as a U.S. Social Security number, then it won’t evaluate that data as a potential international phone number. Likewise, if you set up a custom pattern that looks for 5-digit numbers, anything recognized as a ZIP code is not going to make it through to get evaluated against your pattern.

When parsing, DataRight IQ looks through each record for different types of data, making a separate “pass” through the data for each type. If DataRight IQ finds something on one pass, it extracts that data, and on the next pass it examines only the data that remains. Once an item is recognized, it is not passed on to later steps.
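The pass-by-pass behavior can be illustrated with a short Python sketch. It is a simplified model, not DataRight IQ's implementation: the three regular expressions are hypothetical stand-ins for the e-mail parser, the SSN parser, and a user-defined nine-digit "account number" pattern, and each pass removes what it recognizes so that later passes see only the remaining text.

```python
import re

# Hypothetical stand-in patterns, listed in parsing order.
PASSES = [
    ("email", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("ssn", re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")),
    ("account_no", re.compile(r"\b\d{9}\b")),  # hypothetical user-defined pattern
]

def multipass_parse(line):
    """Run each pass in order; a match is extracted and hidden from later passes."""
    found = {}
    remaining = line
    for name, pattern in PASSES:
        match = pattern.search(remaining)
        if match:
            found[name] = match.group(0)
            # Remove the recognized data so the next pass sees only what is left.
            remaining = remaining[:match.start()] + remaining[match.end():]
    found["extra"] = " ".join(remaining.split())  # leftover data, like AP.Extra
    return found
```

With the input "Acct 123456789", the SSN pass claims the nine digits first, so the account-number pattern never sees them and "Acct" is left over as extra data.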


Modify how DataRight IQ parses

DataRight IQ parses better than DataRight, in part because it gives you more control over how it parses.

You can, of course, set the input fields and retrieve the output fields as usual. And with DataRight IQ you can create custom dictionaries just as you can with DataRight, by using Firstlogic’s User-Modifiable Dictionary (UMD) utility.

In addition, DataRight IQ lets you create or edit parsing rules in the rule file (drlrules.dat). The rule file controls how DataRight IQ parses name and firm data. You can also define patterns and rules for parsing data that DataRight IQ does not already parse.

Use pre-defined rules

DataRight IQ already provides hundreds of rules for many different possible combinations of data. These rules will likely satisfy the parsing needs of most users. However, you may encounter data that isn’t being parsed the way you want it to be. Or maybe you want to tweak a rule so that it returns a different confidence score. In situations like these, it is very handy to be able to edit the rule file.

Edit the rule file

The DataRight IQ rule file (drlrules.dat) controls how DataRight IQ parses groups of output type subcomponents for name and firm data. For more information on editing the rules by which DataRight IQ parses, see the DataRight IQ Modifier’s Guide.

Turn off parsing engines for multiline files

You can also directly control what DataRight IQ parses. For each input line of a multiline file, you can selectively turn off the parsing of addresses, names, firms, Social Security numbers, dates, phone numbers, user-defined patterns, and e-mail addresses.

In DataRight IQ Job, see the Multiline Parsing block.


Chapter 7: Details of data parsing

DataRight IQ can identify and isolate various data and data components from fields of information, as listed in the table below.

Fields that DataRight IQ parses For more information, see

street or “land” addresses and lastline page 66

e-mail addresses page 67

U.S. Social Security numbers page 69

dates page 72

phone numbers page 73

user-defined patterns page 76

names and titles of persons page 77

firms page 77

For a complete list of the input fields that DataRight IQ accepts, see Firstlogic’s Quick Reference for Views and Job-File Products documentation.

Library only

In addition, some parsing features in DataRight IQ Library are not included in Views and Job:

• Associate a title with a name on the same line (page 78)

• Associate a title with a name on a different line (page 79)

• Associating name lines with title lines (page 80)

Note: Some DataRight IQ implementations have inherent differences in how they operate. DataRight IQ’s Views and Library implementations, for example, process data differently. Views can accept databases and process batch files, while Library accepts data one record at a time.

Keep these differences in mind when you read DataRight IQ’s documentation. For example, when the documentation refers to accessing databases or to a batch process, this doesn’t apply to DataRight IQ Library.


Parse street addresses

DataRight IQ parses address data (sometimes called “street addresses” to distinguish them from other addresses, such as e-mail addresses).

DataRight IQ accepts U.S. and Puerto Rican address lines. DataRight IQ accepts city, state, and ZIP Code data either together on one line or as discrete fields when followed directly by each other.

Components:
La Crosse
WI
54601-4051

Whole lines:
100 Harborview Plaza
La Crosse WI 54601-4051

Note: If you have more than one address line (not counting city-state-ZIP), present all of your address lines to DataRight IQ as multiline (floating) fields. Don’t present one line as an address line and the other as a multiline field.


Parse e-mail addresses

When DataRight IQ parses input data that it determines is an e-mail address, it places the components of that data into specific fields for output. Below is an example of a simple e-mail address:

sales@firstlogic.com

DataRight IQ identifies the various data components (user name, host, and so on) by their relationships to each other, and then assigns the data to specific fields.

Fields used

DataRight IQ outputs the individual components of a parsed e-mail address: the user name, complete domain name, top domain, second domain, third domain, fourth domain, fifth domain, and host name.

For inputting and outputting e-mail address information, DataRight IQ uses the following fields:

Input fields: PW.Email1-6 and multiline fields

Output fields: AP.Email1-6, AP.EmailUser1-6, AP.EmailAllD1-6, AP.EmailTopD1-6, AP.Email2ndD1-6, AP.Email3rdD1-6, AP.Email4thD1-6, AP.Email5thD1-6, AP.EmailHost1-6, AP.EmailISP1-6

What DataRight IQ does

With DataRight IQ, you can do the following with an e-mail address:

• Parse the e-mail address, either in a field by itself or combined in a field with other data.

• Break the domain name down into sub-elements.

• Verify that an e-mail address is properly formatted.

• Flag the address for special handling (see “Flag addresses” on page 67).

Not verified

Several aspects of an e-mail address are not verified by DataRight IQ. DataRight IQ does not verify:

• whether the domain name (the portion to the right of the @ sign) is registered.

• whether an e-mail server is active at that address.

• whether the user name (the portion to the left of the @ sign) is registered on that e-mail server (if any).

• whether the personal name in the record can be reached at this e-mail address.

Flag addresses

You can flag e-mail addresses based on a list of criteria you create or maintain. For example, if you focus on B2B, you might want to flag consumer-oriented domain names such as hotmail.com, yahoo.com, or aol.com.

You flag addresses by matching them against a list of hosts and domain names in a file named drlemail.dat. The software sets e-mail or ISP as T (true) or F (false). Then, using a filter on the T/F output, you can post a flag (to the field AP.EmailISP1-6) that you can use to separate this data. You can then process these flagged addresses separately from other e-mail addresses.
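The flagging step can be sketched in Python as a lookup against a list of domains. This is an illustration under assumptions: the actual format of drlemail.dat is not described here, so the flag list below is a plain Python set of hypothetical entries, and matching the full domain or any parent domain is an assumed rule. The "T"/"F" result mirrors the value posted to AP.EmailISP.

```python
# Hypothetical flag list; DataRight IQ keeps its list in drlemail.dat,
# whose file format is not shown here.
CONSUMER_DOMAINS = {"hotmail.com", "yahoo.com", "aol.com"}

def isp_flag(email, flag_list=CONSUMER_DOMAINS):
    """Return "T" if the domain or any parent domain is on the list, else "F"."""
    domain = email.rsplit("@", 1)[-1].lower()
    labels = domain.split(".")
    # Check the full domain and each parent, e.g. mail.yahoo.com -> yahoo.com -> com.
    candidates = {".".join(labels[i:]) for i in range(len(labels))}
    return "T" if candidates & flag_list else "F"

def split_by_flag(addresses):
    """Separate flagged addresses so they can be processed on their own."""
    flagged = [a for a in addresses if isp_flag(a) == "T"]
    others = [a for a in addresses if isp_flag(a) == "F"]
    return flagged, others
```

For example, `split_by_flag(["jdoe@hotmail.com", "sales@firstlogic.com"])` puts the Hotmail address in the flagged group.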

E-mail components

The AP field in which DataRight IQ places each element depends on the element’s position in the e-mail address. DataRight IQ follows the Domain Name System (DNS) in determining the correct output field.

When DataRight IQ parses the following data:

expat@london.home.office.city.co.uk

it would assign these components to the following fields according to DNS.

Sample data                      Field                          Field description
expat                            EmailUser1                     The user name, or “addressee”: the person or department, for example, for whom the e-mail is intended.
london.home.office.city.co.uk    EmailAllD1                     The “all D” field: the entire domain.
uk                               EmailTopD1                     The top-level domain. In DNS, the highest level of the hierarchy after the root (the last “dot”). In a domain name, it is the portion that appears farthest to the right, often “com,” “org,” “gov,” and so on.
co, city, office, home           Email2ndD1 through Email5thD1  The elements between the host and the top-level domain.
london                           EmailHost1                     The element immediately to the right of the “at” symbol (@).

For example, with the input data expat@london.home.office.city.co.uk, DataRight IQ outputs each element in the following fields:

AP field         Output value
AP.Email1        expat@london.home.office.city.co.uk
AP.EmailUser1    expat
AP.EmailAllD1    london.home.office.city.co.uk
AP.EmailTopD1    uk
AP.Email2ndD1    co
AP.Email3rdD1    city
AP.Email4thD1    office
AP.Email5thD1    home
AP.EmailHost1    london
AP.EmailISP1     F
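The DNS-based assignment in the tables above can be reproduced with a short Python sketch. It is a simplified model, not DataRight IQ's parser. One assumption is flagged in the code: how DataRight IQ fills the 2nd through 5th domain fields for short domains (such as firstlogic.com) is not described here, so this sketch fills them only with elements that lie strictly between the host and the top-level domain.

```python
def parse_email(address):
    """Split an e-mail address into DNS-based components (simplified sketch)."""
    user, _, domain = address.partition("@")
    labels = domain.split(".")
    fields = {
        "EmailUser": user,
        "EmailAllD": domain,
        "EmailHost": labels[0],   # element immediately right of the @ sign
        "EmailTopD": labels[-1],  # rightmost element
    }
    # Assumption: 2nd..5th domains are the elements between host and top-level
    # domain, counted right to left; extra elements beyond four are ignored.
    middle = labels[1:-1]
    names = ("Email2ndD", "Email3rdD", "Email4thD", "Email5thD")
    for name, label in zip(names, reversed(middle)):
        fields[name] = label
    return fields
```

Applied to expat@london.home.office.city.co.uk, this yields the same values as the output table: co, city, office, and home for the 2nd through 5th domains.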


Parse Social Security number

DataRight IQ parses U.S. Social Security numbers (SSNs) that are either by themselves or on an input line surrounded by other text.

Fields used

For inputting and outputting U.S. Social Security number information, DataRight IQ uses the following fields:

Input fields: PW.SSN1-6 and multiline fields

Output fields: AP.SSN1-6

The six available AP.SSN fields (AP.SSN1-6) store output Social Security number data.

Example of data: 123-45-6789
Typical field length: 9-11 characters

Setting up SSN parse in DataRight IQ

DataRight IQ has two SSN options to set up for parsing:

• Specify the location of the file that contains SSN information.

• Select the type of delimiter to use.

SSN information file

Specify the location of the SSN information file in the Auxiliary Files block in DataRight IQ.

SSN delimiter

In the Standardization/Assignment Control block, you can choose the delimiter with which the SSN is output by setting the SSN Delimiter option.

In Job, a note at the end of the block lists the delimiters to choose from. In Views, choose from the options in a drop-down list.

How DataRight IQ parses Social Security numbers

DataRight IQ parses Social Security numbers in two steps:

1. Identifies a potential SSN by looking for any of three patterns:

Pattern        Digits per grouping                          Delimited by
nnnnnnnnn      9 consecutive digits                         n.a.
nnn nn nnnn    3, 2, and 4 (for area, group, and serial)    spaces
nnn-nn-nnnn    3, 2, and 4 (for area, group, and serial)    all supported delimiters

2. Performs a validity check on the first five digits only. Two outcomes of this validity check are possible:

Outcome    Description
Pass       DataRight IQ successfully parses the data, and the Social Security number comes out in an AP field.
Fail       DataRight IQ does not parse the data because it’s not a valid SSN as defined by the U.S. government, so the data comes out as Extra, unparsed data.
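The two steps can be sketched in Python. This is an illustration, not DataRight IQ's code. Several assumptions are marked in the comments: the set of "supported delimiters" is not listed here, so space, dash, slash, and period stand in for it; the area/group check is a caller-supplied function standing in for the lookup against the SSN file; and the output is always dash-delimited, although DataRight IQ lets you choose the delimiter.

```python
import re

# Step 1: the three candidate patterns. The delimiter set [ -/.] is an
# assumption; the backreference \2 requires the same delimiter (or none)
# between area and group and between group and serial.
SSN_CANDIDATE = re.compile(r"\b(\d{3})([ \-/.]?)(\d{2})\2(\d{4})\b")

def parse_ssn(text, area_group_ok):
    """Return (ssn, remaining_text), or (None, text) if no candidate passes.

    area_group_ok(area, group) is a hypothetical, caller-supplied check on
    the first five digits, standing in for the SSN-file lookup.
    """
    for match in SSN_CANDIDATE.finditer(text):
        area, _, group, serial = match.groups()
        # Step 2: validate area and group only; the serial just has to be digits.
        if area_group_ok(area, group):
            remaining = text[:match.start()] + text[match.end():]
            return f"{area}-{group}-{serial}", " ".join(remaining.split())
    # Fail: nothing is parsed, so the data stays behind as Extra.
    return None, text
```

Because of the backreference, a mixed form such as 123-45 6789 is not treated as a candidate.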

Check validity

When performing a validity check, DataRight IQ doesn’t verify that a particular 9-digit Social Security number has been issued, or that it’s the correct number for any named person. Instead, it validates only the first 5 digits (area and group). DataRight IQ doesn’t validate the last 4 digits (serial), except to confirm that they are digits.

SSA data

DataRight IQ’s validation of the first 5 digits is driven by a table from the Social Security Administration (http://www.ssa.gov/foia/highgroup.htm). That table is updated monthly as the SSA opens new groups. The rules and data that guide this check are available at http://www.ssa.gov/history/ssn/geocard.html.
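The SSA's high-group table records, for each area number, the highest group number issued so far. Within an area, the SSA issued group numbers in a fixed order: odd 01-09, then even 10-98, then even 02-08, then odd 11-99. A group is therefore valid if it comes at or before the area's high group in that order. The Python sketch below illustrates that rule only. The `high_groups` mapping is a hypothetical in-memory table, not the drlssn.dat format, and any values you pass it in examples are made up.

```python
# Order in which the SSA issued group numbers within an area:
# odd 01-09, even 10-98, even 02-08, odd 11-99.
GROUP_ORDER = (
    list(range(1, 10, 2))
    + list(range(10, 99, 2))
    + list(range(2, 9, 2))
    + list(range(11, 100, 2))
)
GROUP_RANK = {group: rank for rank, group in enumerate(GROUP_ORDER)}

def area_group_is_valid(area, group, high_groups):
    """Check the first five SSN digits against a high-group table.

    high_groups maps an area number (int) to the highest group issued for it.
    """
    area_num, group_num = int(area), int(group)
    if group_num == 0 or area_num not in high_groups:
        return False  # group 00 is never issued; unknown areas fail
    return GROUP_RANK[group_num] <= GROUP_RANK[high_groups[area_num]]
```

With a made-up entry {1: 10} (area 001, high group 10), group 07 passes because it was issued earlier in the sequence. Group 02 fails because the even numbers 02-08 are issued only after group 98.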

Update your SSN file

Firstlogic provides the Social Security Number (SSN) file (drlssn.dat) for DataRight IQ customers interested in parsing recently issued and existing U.S. Social Security numbers. The SSN file is updated monthly with the latest SSN information from the U.S. government. Firstlogic converts the data to a format that DataRight IQ can use and posts the data by the 5th of every month.

You can obtain the most current SSN file from the Firstlogic Customer Portal site at http://www.firstlogic.com/customer. From this site you can download the latest drlssn.dat file, which DataRight IQ uses to parse U.S. Social Security numbers.

Outputs valid SSNs

DataRight IQ outputs only Social Security numbers that pass its validation. If an apparent SSN fails validation, DataRight IQ does not pass on the number as a parsed but invalid Social Security number.

Other U.S. ID numbers

Your data may include other numbers used in the United States for governmental identification purposes. DataRight IQ’s capability is aimed at U.S. Social Security numbers, which are, in effect, Tax IDs for individuals. Other such numbers include the ITIN and the EIN.

Name    Description

ITIN    Individual Taxpayer Identification Number

This “cousin” to the Social Security number is what the IRS assigns to people who earn money and pay federal income taxes but who are not citizens (they are resident or non-resident aliens). An ITIN looks like an SSN except that it begins with the number 9.

DataRight IQ treats an ITIN as an invalid SSN. An ITIN might match the pattern, but it does not make it through the check against the SSN table, so an ITIN comes out as unparsed Extra data.

EIN     Employer Identification Number

Synonymous with a corporate Tax Identification Number (TIN) or Tax ID, this number is also 9 digits. However, its pattern is nn-nnnnnnn. Because of that, DataRight IQ’s SSN parser does not recognize an EIN.

Use UDPM to parse other patterns

If you need to parse patterns that aren’t covered by one of DataRight IQ’s usual parsing engines, use DataRight IQ’s UDPM (user-defined pattern matching) feature.


Parse dates

DataRight IQ recognizes dates in a variety of formats and breaks those dates into components.

Fields used

For inputting and outputting date information, DataRight IQ uses the following fields:

Input fields: PW.Date1-6 and multiline fields

Output fields: AP.Date1-6

Formats and delimiters

DataRight IQ supports the following formats and delimiters. That is, you can select any one of these formats to standardize dates.

Format         Example
yyyy*mm*dd     2004 01 27
yy*mm*dd       04 01 27
dd*mm*yyyy     27 01 2004
dd*mm*yy       27 01 04
mm*dd*yyyy     01 27 2004
mm*dd*yy       01 27 04
dd*mmm*yy      27 Jan 04
dd*mmm*yyyy    27 Jan 2004
mmm*dd*yyyy    Jan 27 2004
mmm*dd*yy      Jan 27 04
yyyymmdd       20040127
yymmdd         040127
ddmmyyyy       27012004
ddmmyy         270104
mmddyyyy       01272004
mmddyy         012704

Delimiter*    Description
<none>        no space
<space>       a space
/             forward slash
-             dash
\             backward slash
.             period

* Delimiters appear between date components only for formats that have delimiters (shown with * in the format table). You also have the option of using no delimiters (<none>).

DataRight IQ can parse up to six dates from your defined record. That is, DataRight IQ identifies one or more dates (up to six) in the input, breaks found dates into components, and makes dates available as output in either the original format or a user-selected standard format.
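Standardizing a date amounts to reading it in one format and writing it in another. The Python sketch below maps the format tokens from the table onto `datetime` codes. It illustrates the idea and is not DataRight IQ's converter. Two caveats: `%b` uses the process locale's month abbreviations (English under the default C locale), and Python's `%y` maps 00-68 to 2000-2068, which is not the same as DataRight IQ's user-set cutoff year.

```python
import re
from datetime import datetime

# Format tokens from the table mapped to strptime/strftime codes.
# "*" stands for the chosen delimiter.
TOKEN_CODES = {"yyyy": "%Y", "yy": "%y", "mmm": "%b", "mm": "%m", "dd": "%d"}

def to_strftime(fmt, delimiter):
    """Translate a format such as 'mm*dd*yy' plus '/' into '%m/%d/%y'."""
    def replace(match):
        token = match.group(0)
        return delimiter if token == "*" else TOKEN_CODES[token]
    # Longer tokens are listed first so "yyyy" is not read as two "yy" tokens.
    return re.sub(r"yyyy|yy|mmm|mm|dd|\*", replace, fmt)

def standardize_date(text, in_fmt, in_delim, out_fmt, out_delim):
    """Parse a date in one supported format and rewrite it in another."""
    parsed = datetime.strptime(text, to_strftime(in_fmt, in_delim))
    return parsed.strftime(to_strftime(out_fmt, out_delim))
```

For example, `standardize_date("01/27/04", "mm*dd*yy", "/", "yyyy*mm*dd", "-")` returns "2004-01-27".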


Parse phone numbers

DataRight IQ can parse both North American (U.S. and Canada) and international phone numbers. When DataRight IQ parses a phone number, it outputs the individual components of the number into the appropriate AP fields (see the examples below).

Fields used

For inputting and outputting phone number information, DataRight IQ uses the following fields:

Input fields: PW.Phone1-6 and multiline fields

Output fields (U.S. and Canada): AP.USPhone1-6, AP.USAreaCod1-6, AP.USPhonPre1-6, AP.USPhonLin1-6, AP.USPhonExt1-6, AP.USPhonTyp1-6

Output fields (international): AP.IntPhone1-6, AP.IntCtryCd1-6, AP.IntCityCd1-6, AP.IntPhNum1-6, AP.IntPhDesc1-6

U.S. versus international phone numbers

Phone numbering systems differ around the world. DataRight IQ recognizes phone numbers by their pattern and (for non-U.S. numbers) by their country code, too.

U.S. and Canada

What DataRight IQ calls U.S. phone numbers would more properly be called North American phone numbers. The Canadian phone number standard follows the same pattern as U.S. phone numbers. Because of this, when DataRight IQ parses a phone number from either the U.S. or Canada, it posts the data to AP.USPhoneX.

DataRight IQ searches for U.S. phone numbers by commonly used patterns such as: (234) 567-8901, 234-567-8901, and 2345678901.

DataRight IQ offers some reformatting options for output (such as your choice of delimiters). Below is an example with extension text:

Input data: (901) 234-5678 EXT 1234

Output data: 901-234-5678 Ext. 1234
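The North American example can be approximated with one regular expression that captures area code, prefix, line number, and an optional extension. This is a hypothetical Python sketch covering only the forms mentioned above. The `delimiter` and `ext_prefix` parameters are illustrative stand-ins for DataRight IQ's output options, not its actual option names.

```python
import re

# Hypothetical pattern for (234) 567-8901, 234-567-8901, 2345678901,
# each optionally followed by an extension such as "EXT 1234".
US_PHONE = re.compile(
    r"\(?(\d{3})\)?[-. ]*(\d{3})[-. ]?(\d{4})"
    r"(?:\s*(?:ext\.?|x)\s*(\d+))?",
    re.IGNORECASE,
)

def format_us_phone(text, delimiter="-", ext_prefix="Ext."):
    """Return the number rewritten with the chosen delimiter, or None."""
    match = US_PHONE.search(text)
    if not match:
        return None
    area, prefix, line, ext = match.groups()
    result = delimiter.join((area, prefix, line))
    if ext:
        result += f" {ext_prefix} {ext}"
    return result
```

`format_us_phone("(901) 234-5678 EXT 1234")` returns "901-234-5678 Ext. 1234", matching the example output.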

Europe and Pacific Rim

DataRight IQ searches for European and Pacific Rim numbers by pattern. The patterns used are stored in drlphint.dat, and they require that the country code appear at the beginning of the number. DataRight IQ doesn’t offer any options for reformatting international phone numbers. Also, DataRight IQ doesn’t cross-compare the phone number with the address to see whether the country and city codes in the phone number match the address.


Phone number components

Phone numbers consist of different output components depending on whether they’re U.S. or international numbers.

Individual components for U.S. phone numbers:

• area code

• prefix

• line number

• extension

• line type

Individual components for non-U.S. phone numbers:

• country code

• city code

• number

• description

Example of U.S. phone number

Say you have the following U.S. phone data for input:

Work (308)-555-8402 ext 34

DataRight IQ parses the data into the following AP fields:

AP field         Output value
AP.USAreaCod1    308
AP.USPhonPre1    555
AP.USPhonLin1    8402
AP.USPhonExt1    34
AP.USPhonTyp1    Work

Some of these fields (namely, area code, extension, and type) are optional. If your input doesn’t have appropriate values for these fields, DataRight IQ leaves them empty.

Phone type

There is a finite list of what can be returned as a phone type:

Phone type Possible input values

Business business, bus, work

Home home, hme, personal

Data data

Voice voice, vmail

Fax fax

BBS bbs

Cellular cell, cellular, mobile
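The phone type table is a keyword lookup, which can be sketched in Python as a reverse dictionary. This sketch returns the standardized type from the left column. The example earlier shows AP.USPhonTyp1 as "Work", so this table does not settle whether DataRight IQ returns the standardized name or the input word. Treat the return value here as an assumption.

```python
# Input words from the phone type table, keyed by the phone type.
PHONE_TYPES = {
    "Business": ("business", "bus", "work"),
    "Home": ("home", "hme", "personal"),
    "Data": ("data",),
    "Voice": ("voice", "vmail"),
    "Fax": ("fax",),
    "BBS": ("bbs",),
    "Cellular": ("cell", "cellular", "mobile"),
}
WORD_TO_TYPE = {word: ptype for ptype, words in PHONE_TYPES.items() for word in words}

def phone_type(text):
    """Return the type for the first recognized word, or None (field left empty)."""
    for word in text.lower().replace(":", " ").split():
        if word in WORD_TO_TYPE:
            return WORD_TO_TYPE[word]
    return None
```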


Example of non-U.S. phone number

Say you have the following international (non-U.S.) phone data for input:

61-9-0123-4567

DataRight IQ parses the data in the following AP fields (all data must be present for the phone data to be valid):

AP field Output value Note

AP.IntCtryCd1 61

AP.IntCityCd1 9

AP.IntPhNum1 0123-4567

AP.IntPhDesc1 Australia Populated based on the Country ID.

Important: DataRight IQ accepts international phone numbers only as they would be dialed from the U.S. For example, the number must start with the appropriate country code. Also, if presented on a line with other data, the international phone number must start the line.
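The international example can be sketched in Python as a pattern anchored at the start of the line, followed by a country lookup. The sketch rests on several assumptions. The real patterns live in drlphint.dat and are not shown here. The layout of country code, city code, and then the local number separated by dashes, periods, or spaces is inferred from the single example. The one-entry country table covers only the Australia example.

```python
import re

# Hypothetical country table; only the example's entry is included.
COUNTRY_NAMES = {"61": "Australia"}

# Assumed layout: country code, city code, local number, anchored at line start.
INTL_PHONE = re.compile(r"^\+?(\d{1,3})[-. ](\d{1,4})[-. ](\d[\d-]*\d)")

def parse_intl_phone(line):
    """Return the international phone components, or None if not recognized."""
    match = INTL_PHONE.match(line.strip())
    if not match or match.group(1) not in COUNTRY_NAMES:
        return None  # every component must be present and the country known
    country, city, number = match.groups()
    return {
        "IntCtryCd": country,
        "IntCityCd": city,
        "IntPhNum": number,
        "IntPhDesc": COUNTRY_NAMES[country],
    }
```

Because the pattern is anchored with `^`, "Call 61-9-0123-4567" is rejected: the number does not start the line.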

Formatting output data

DataRight IQ lets you specify the prefix (for example, EXT.) that indicates a phone number’s extension. For more information, see “Standard Phone Extension” on page 241.


Parse user-defined patterns

Parse any number or alphanumeric

With DataRight IQ you can parse data that’s outside the range of name, title, address, and so on. With DataRight IQ’s user-defined pattern matching (UDPM) feature, you can parse a wide variety of data such as:

• account numbers

• part numbers

• purchase orders

• invoice numbers

• VINs (vehicle identification numbers)

• driver license numbers

In other words, DataRight IQ can parse any kind of number or alphanumeric for which you can define a pattern.

Fields used

For inputting and outputting user-defined pattern information, DataRight IQ uses the following fields:

Input fields: PW.Pattern1-4 and multiline fields

Output field         Description
AP.Pattern1-4        The pattern
AP.PatnLabel1-4      The label for the pattern
AP.Patnsub1-4_1-5    The subpattern(s) of the pattern

The pattern label is created in the drludpm.dat file when the pattern is defined.

How it’s done

DataRight IQ is able to parse patterns through its user-defined pattern matching (UDPM) feature, which uses regular expressions. That is, you can set up data patterns to suit your data (such as part numbers), and DataRight IQ can parse your data according to those user-defined patterns.

DataRight IQ’s UDPM feature makes possible the parsing and extraction of virtually any kind of data that conforms to a pattern—any type of data pattern that can be expressed using regular expressions.
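Here is how a labeled pattern with subpatterns maps onto regular expressions, sketched in Python. This uses Python's `re` syntax, not the drludpm.dat file format. The two patterns (an EIN in the nn-nnnnnnn form and a made-up part-number layout) are hypothetical examples. Each label plays the role of AP.PatnLabel, and each capture group plays the role of a subpattern (AP.Patnsub). The limit of five subpatterns is inferred from the field name AP.Patnsub1-4_1-5.

```python
import re

# Hypothetical user-defined patterns: (label, regular expression).
# Capture groups stand in for subpatterns.
USER_PATTERNS = [
    ("EIN", re.compile(r"\b(\d{2})-(\d{7})\b")),
    ("PART_NO", re.compile(r"\b([A-Z]{3})-(\d{4})-([A-Z])\b")),
]

def match_user_patterns(text):
    """Return every match with its label and up to five subpatterns."""
    results = []
    for label, pattern in USER_PATTERNS:
        for match in pattern.finditer(text):
            results.append({
                "pattern": match.group(0),
                "label": label,
                "subpatterns": list(match.groups())[:5],
            })
    return results
```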

Define your pattern

When you create a user-defined pattern, you must include a carriage return/linefeed at the end of the line. All characters before the carriage return/linefeed, even blank spaces, are considered part of the pattern.

For more information

For more information on UDPM and setting up the patterns that DataRight IQ will parse, see the DataRight IQ Modifier’s Guide, which accompanies the DataRight IQ product. This ability is for advanced users of DataRight IQ. You should read and follow all warnings before changing how your product works.


Parse names and titles

DataRight IQ can parse name and title data.

A person’s name can consist of the following parts: prename, first name, middle name, last name, postname, and so on.

DataRight IQ can accept up to six names and titles as discrete components. DataRight IQ also accepts name and title data on partial lines or whole lines. The name line or multiline field may contain one or two names per line.

Components: Mr. John A. Smith Jr. Accountant

Partial lines: Mr. John A. Smith Jr., Accountant

Whole line: Mr. John A. Smith Jr., Accountant

Parse firms

DataRight IQ can parse firm and company data.

DataRight IQ accepts firm name and firm location data. A firm location is a department, building, mail stop, or other location within a company. DataRight IQ accepts firm names and firm locations as components or whole lines.

Components:
Firstlogic Inc.
Dept. of Accounting

Whole line:
Firstlogic Inc., Dept. of Accounting


Library only: Associate a title with a name on the same line

When a name and job title are on the same input line, by default DataRight IQ associates the title with a name in the line. This makes it easier for you to retrieve a personal name and the job title that goes with that name.

For Job and Views users: This functionality is controlled by several parameters in the Standardization/Assignment Control block. See “Tell DataRight IQ whether to associate data on title lines with data on name lines and between multilines” on page 239.

Multiple names and titles

If names and titles occur on the same line, DataRight IQ associates each title with the closest preceding name:

John Smith, Supervisor and Mary Smith, Mgr.

Name1             Title1        Name2             Title2
Mr. John Smith    Supervisor    Ms. Mary Smith    Mgr

Dual name and title

For dual names such as John and Mary Smith, by default DataRight IQ associates the job title with the first of the two names:

John and Mary Smith, Manager

Name1             Title1     Name2             Title2
Mr. John Smith    Manager    Ms. Mary Smith    (empty)


Library only: Associate a title with a name on a different line

Suppose your input records contain multiple unfielded names and job titles. In your output records, you want to create separate fields for each person and his or her job title. To do this, you need to match each job title with a person:

Line1                            Line2       Line3
John Smith and Ed Jones, Mgr.    Ann Rose    Software Engr.

Name1             Title1     Name2           Title2    Name3           Title3
Mr. John Smith    (empty)    Mr. Ed Jones    Mgr.      Ms. Ann Rose    Software Engr.

Enable interline association

We provide several options so that you can associate a name on one input line with a job title on another input line.

By default, DataRight IQ is conservative. If a name and title are not on the same line, DataRight IQ does not make any association between them. Instead, it parses the title as a separate, “nameless” person:

Mr. John Smith Manager

Name1             Title1     Name2      Title2
Mr. John Smith    (empty)    (empty)    Manager

If you prefer, DataRight IQ can associate a name on one line with a title on another.

Note: Regardless of how you set the title-association options, a title must follow a name in order for it to be associated with that name. If a title precedes a name, DataRight IQ will never associate the title with that name.

CEO Kathryn Jones

Name1      Title1    Name2            Title2
(empty)    CEO       Kathryn Jones    (empty)
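The association rules above (a title attaches to a preceding name and never to a following one, and an unattached title becomes a "nameless" person) can be modeled in a few lines of Python. This is a simplified sketch, not DataRight IQ Library's logic. It assumes that names and titles have already been identified and that interline association is enabled. It does not model the dual-name rule for inputs such as John and Mary Smith.

```python
def associate_titles(items):
    """Pair titles with names in a simplified model of the rules above.

    items: (kind, text) tuples in input order, where kind is "name" or "title".
    """
    persons = []
    for kind, text in items:
        if kind == "name":
            persons.append({"name": text, "title": None})
        elif persons and persons[-1]["name"] and persons[-1]["title"] is None:
            # The title attaches to the most recent name, if it has no title yet.
            persons[-1]["title"] = text
        else:
            # Otherwise the title becomes a separate, "nameless" person.
            persons.append({"name": None, "title": text})
    return persons
```

With the interline example (John Smith, Ed Jones, Mgr., Ann Rose, Software Engr.), Mgr. goes to Ed Jones, Software Engr. goes to Ann Rose, and John Smith gets no title. With CEO Kathryn Jones, CEO becomes a nameless person because no name precedes it.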

Library only: Associating name lines with title lines

You can associate name lines with their corresponding title lines—for example, DRL_INAMELINE1 with DRL_ITITLELINE1:

DRL_INAMELINE1    DRL_ITITLELINE1
Mr. Bob Smith     CEO

Name1             Title1
Mr. Bob Smith     CEO

Most users will probably want to set up this level of association.

Associating data that is input on a multiline

You can make an association even if the name or title (or both) occurs on a multiline:

DRL_ILINE1          DRL_ILINE2         DRL_ILINE3
Mr. Bill Johnson    Firstlogic Inc.    Software Engr.

Name1               Title1
Mr. Bill Johnson    Software Engr.

If you allow association on multilines, DataRight IQ can associate a name and title even if they are on nonconsecutive lines (as shown in the example).


Chapter 8: Process parsed data

After parsing data, DataRight IQ can process it in many ways. During processing, DataRight IQ can do the following:

• Standardize data in several ways

• Add new data

• Remove unwanted data

• Convert dates

• Post the data that you want


Standardize data

DataRight IQ can help you make data more consistent from record to record.

Correct inconsistent name format

You can use DataRight IQ to straighten out disorderly name-line fields. For example, you might be processing records in which most names are in First-Middle-Last order, but in some records the sequence is Last-First-Middle. DataRight IQ can help you put them all in the same format, because you can retrieve name components and post them to your output file in any sequence.

Input                 Output
John A. Smith, Jr.    John A. Smith, Jr.
Jones, Mary R.        Mary R. Jones

Note: If you post whole name lines (rather than components), DataRight IQ outputs them in FML (first-middle-last) name format.
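Posting components in a chosen order is what makes the two input layouts come out alike. The Python sketch below assumes the components have already been parsed into a dictionary, for example from "Jones, Mary R.". The parsing step itself is not shown, and the key names are hypothetical.

```python
def fml_name(components):
    """Post name components in first-middle-last order.

    components: a dict of already-parsed parts (hypothetical key names).
    """
    parts = [components.get(key) for key in ("prename", "first", "middle", "last")]
    name = " ".join(part for part in parts if part)
    if components.get("maturity"):
        name += ", " + components["maturity"]
    return name
```

`fml_name({"first": "Mary", "middle": "R.", "last": "Jones"})` returns "Mary R. Jones".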

Convert case

You can use DataRight IQ to convert case to upper, lower, or mixed case, or you can choose to preserve case. In mixed case, DataRight IQ intelligently cases words such as McDonald, Ph.D., and IBM. It can even recognize different casings for the same word. For example, DataRight IQ correctly cases CO as a state (Colorado) and Co as the abbreviation for the word “company.”

For more information, see “Convert case” on page 27.
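Mixed casing with exceptions can be sketched in Python as a dictionary of special cases checked before a default capitalization rule. The exception list is hypothetical and tiny compared with DataRight IQ's own rules. The `is_state` flag stands in for the context DataRight IQ would use to tell CO (Colorado) from Co (company).

```python
# Hypothetical exception list; real casing dictionaries are far larger.
SPECIAL_CASES = {"mcdonald": "McDonald", "ph.d.": "Ph.D.", "ibm": "IBM"}

def mixed_case(word, is_state=False):
    """Return the mixed-case form of a single word."""
    lower = word.lower()
    if lower == "co":
        # Same letters, different casing depending on what the word means.
        return "CO" if is_state else "Co"
    if lower in SPECIAL_CASES:
        return SPECIAL_CASES[lower]
    return word[:1].upper() + word[1:].lower()
```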

Standardize business words

To correct inconsistencies among records, DataRight IQ standardizes common job-title and firm words. For example, DataRight IQ takes abbreviations such as Internatl, Intrntl, and Interntl and converts them all to Intl. DataRight IQ can also convert firm names to acronyms. For example, given General Motors Corp. as input, DataRight IQ can produce GM as output.
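Both operations can be sketched in Python as table lookups. The standardization table is built only from the examples given. The list of corporate designators dropped before forming an acronym is an assumption, chosen so that General Motors Corp. becomes GM.

```python
# Hypothetical standardization table built from the examples above.
STANDARD_WORDS = {"internatl": "Intl", "intrntl": "Intl", "interntl": "Intl"}
# Assumed list of corporate designators to leave out of an acronym.
FIRM_DESIGNATORS = {"corp", "corp.", "inc", "inc.", "co", "co.", "ltd", "ltd."}

def standardize_words(text):
    """Replace known variant spellings with their standard form."""
    return " ".join(STANDARD_WORDS.get(word.lower(), word) for word in text.split())

def firm_acronym(firm):
    """Build an acronym from the firm name, ignoring corporate designators."""
    words = [w for w in firm.split() if w.lower() not in FIRM_DESIGNATORS]
    return "".join(w[0].upper() for w in words)
```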

Standardize and assign last-line data


DataRight IQ can also do some limited standardization and assignment of lastline data (city, state, ZIP):

• Assign city, state, or ZIP Code data if it is missing or not valid.

• Convert vanity city names to post-office city names. For example, given Hollywood as input, produce Los Angeles as output.

For details, see “Standardize data” on page 23.

Add new data

DataRight IQ not only helps you correct the data you already have—it also adds new data.

Gender codes

DataRight IQ assigns a highly descriptive gender code for each name (up to six per record). For example, DataRight IQ can assign a code to tell you that Reginald White is a male and Lynn Jones is probably a female.

For more information, see “Gender codes” on page 32.

Prenames

If a name has a strong or weak gender, DataRight IQ can add a prename. If the name is determined to be ambiguous, DataRight IQ does not add a prename. You can retrieve the prename as a separate component or as part of a name line:

Input                 Output                         Description
Dan and Anne McKay    Mr. Dan and Mrs. Anne McKay    Strong male and female
Alice McKay           Ms. Alice McKay                Strong female
Terry Fitsimmons      Mr. Terry Fitsimmons           Weak male
Pat Tubler            Pat Tubler                     Ambiguous

“Terry” is a weak male name, so it’s assigned “Mr.” “Pat” is ambiguous, so it is not assigned a prename.

For more information, see “Prenames” on page 34.
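The prename rule can be sketched in Python as a gender lookup followed by a choice of prename. The gender entries come from the table above. A real gender dictionary covers many thousands of names, and unknown names are treated as ambiguous here by assumption. The `in_couple` flag, which selects "Mrs." for the female half of a dual name, is inferred from the Dan and Anne McKay example.

```python
# Gender classifications taken from the prename examples above.
GENDER = {
    "dan": "strong male",
    "anne": "strong female",
    "alice": "strong female",
    "terry": "weak male",
    "pat": "ambiguous",
}

def prename(first_name, in_couple=False):
    """Return a prename for a strong or weak gender, or None if ambiguous."""
    gender = GENDER.get(first_name.lower(), "ambiguous")
    if gender in ("strong male", "weak male"):
        return "Mr."
    if gender in ("strong female", "weak female"):
        return "Mrs." if in_couple else "Ms."
    return None  # ambiguous names get no prename
```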

Salutations

You can use DataRight IQ to create salutations such as Dear Mr. McKay or Dear Dan. You can customize the greeting word (such as Dear or Hello) and the ending punctuation. You can also use alternate greetings under certain circumstances. For example, if the name is absent, you can use a title greeting, such as Dear VP of Sales.

For more information, see “Salutations” on page 37.
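The salutation options (greeting word, ending punctuation, and a title greeting when no name is available) can be sketched in Python. The function and its parameters are hypothetical illustrations, not DataRight IQ option names. The default colon is an assumption.

```python
def salutation(prename=None, last=None, first=None, title=None,
               greeting="Dear", punctuation=":", use_first_name=False):
    """Build a greeting line, falling back to a title greeting."""
    if use_first_name and first:
        addressee = first
    elif prename and last:
        addressee = f"{prename} {last}"
    elif title:
        addressee = title  # alternate greeting when the name is absent
    else:
        return None
    return f"{greeting} {addressee}{punctuation}"
```

For example, `salutation("Mr.", "McKay")` gives "Dear Mr. McKay:", and `salutation(title="VP of Sales")` gives "Dear VP of Sales:".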

Match standards

Given an input nickname, DataRight IQ can often produce a match-standard name. For example, given the input nickname Al, DataRight IQ can return names that are potential matches, such as Alan and Alphonse. If your database can perform multiway matching, you can use the match standards to improve matching performance.
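Here is how match standards can widen a comparison, sketched in Python: two first names are treated as potential matches if they are equal or if one is a match standard of the other. The nickname table is hypothetical. Only the Al entry comes from the text above.

```python
# Hypothetical nickname table; only the Al entry comes from the text.
MATCH_STANDARDS = {"al": ["Alan", "Alphonse"]}

def match_standards(first_name):
    """Return the match-standard names for a nickname, if any."""
    return MATCH_STANDARDS.get(first_name.lower(), [])

def could_match(name_a, name_b):
    """Treat two first names as potential matches if either is a standard of the other."""
    a, b = name_a.lower(), name_b.lower()
    if a == b:
        return True
    return (b in (s.lower() for s in match_standards(a))
            or a in (s.lower() for s in match_standards(b)))
```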

Data-quality scores and codes

DataRight IQ generates scores and codes describing the quality of the data found in input and output fields. DataRight IQ produces codes that identify what components were parsed from each record, the types of changes made to data components, and the parsing errors associated with specific record components.

For more information, see “Data-quality scores and codes” on page 165.


Remove unwanted data

You can use DataRight IQ to remove extraneous or unwanted data from a field.

Use search-and-replace

You can use the search-and-replace feature to remove unwanted data from a field. For example, you could remove designators from a name field:

Before                     After
John Smith, Trustee        John Smith
Anne Jones, Beneficiary    Anne Jones

For more information, see “Search-and-replace” on page 89.

Use standard functions

You can also use DataRight IQ functions to modify, convert, and manipulate data. For example, suppose a field in your database sometimes contains a parenthetical comment such as (Deliver to back door) at the end of the field.

You could use DataRight IQ functions to look for parentheses in the field, then delete the parentheses and everything between them.

Before                               After
123 Main St (back door)              123 Main St
345 10th Ave (deliver after 2:00)    345 10th Ave

You would use DataRight IQ functions to delete the parenthetical data. You’d first check whether the field contains parentheses. If it does, you would extract and keep everything except the parentheses and their contents. To do that, you would find the character position of the opening parenthesis and extract everything up to that point.

To do this, you would write an expression similar to this:

iif("(" $ DB.Field, substr(DB.Field, 1, at("(", DB.Field)-1), DB.Field)

For more information about what functions are, how they work, and how to use them, see Firstlogic’s Database Prep documentation.
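For readers more familiar with a general-purpose language, here is a Python counterpart of the expression above. It illustrates the same logic and is not Database Prep syntax. One difference is deliberate: the `rstrip()` call also drops the space left in front of the parenthesis, which the original expression keeps.

```python
def drop_parenthetical(field):
    """Python counterpart of the iif/substr/at expression."""
    pos = field.find("(")   # like at("(", field); -1 when absent
    if pos == -1:           # like the iif test "(" $ field
        return field
    # substr(field, 1, at(...) - 1) keeps everything before "(";
    # rstrip() additionally removes the trailing space.
    return field[:pos].rstrip()
```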


Convert dates

You can convert dates from 2-digit years to 4-digit years.

Suppose you manage a membership database for a professional organization. The database contains a date indicating when a person’s membership expires. The field has a 2-digit year—for example, 99-10-25. Some memberships expire in the 2000s—for example, 03-12-31 indicates that the membership expires on December 31, 2003.

You can use DataRight IQ to convert the dates from 2-digit years to 4-digit years:

Before      After
04-12-31    2004-12-31
98-04-25    1998-04-25
99-01-01    1999-01-01
05-03-22    2005-03-22

Convert 1900s and 2000s

You can convert dates to 4-digit years even if some dates are in the 2000s. Just tell DataRight IQ what year to use as the cutoff. For example, if you set the cutoff year to “05,” DataRight IQ converts the years 00 through 05 to 2000s (20xx) and the years 06 through 99 to 1900s (19xx).

Convert to and from almost any format

DataRight IQ can convert dates to and from almost any format:

• Convert almost any input format. For example, DataRight IQ supports formats such as MM/DD/YY (01/23/04), YY-MMM-DD (04-Jan-23), DDMMYY (230104), MMM DD, YY (Jan 23, 04), and more. (Refer to “Formats and delimiters” on page 72 for a list of all date formats.)

• Choose from a variety of date delimiters, from no delimiter at all to dashes or slashes. (Refer to “Formats and delimiters” on page 72 for a list of delimiters.)

Convert date-type or character-type fields

DataRight IQ doesn’t require that your dates be stored in date-type fields. DataRight IQ can convert the format of dates stored in character-type fields or date-type fields.

You can convert the format of dates, particularly in character-type fields. For example, you could convert from a DD-MM-YYYY format to a YYYY-MMMDD format.
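The cutoff rule and a format conversion can be sketched in Python. This is a minimal illustration under stated assumptions, not DataRight IQ's implementation: the helper names expand_year, convert_yy_mm_dd, and reformat_date are hypothetical, and the input is assumed to be YY-MM-DD text like the examples above. The year is split out by hand because Python's %y directive applies its own fixed century rule rather than a configurable cutoff.

```python
from datetime import date, datetime


def expand_year(yy: int, cutoff: int) -> int:
    """Years 00..cutoff become 20xx; years after the cutoff become 19xx."""
    return 2000 + yy if yy <= cutoff else 1900 + yy


def convert_yy_mm_dd(value: str, cutoff: int = 5) -> str:
    """Convert a YY-MM-DD string to YYYY-MM-DD using a cutoff year."""
    yy, mm, dd = (int(part) for part in value.split("-"))
    # date() raises ValueError for impossible dates such as 99-02-30.
    return date(expand_year(yy, cutoff), mm, dd).isoformat()


def reformat_date(value: str, in_fmt: str, out_fmt: str) -> str:
    """Re-express a date held as text in another format (4-digit years only)."""
    return datetime.strptime(value, in_fmt).strftime(out_fmt)


for raw in ["04-12-31", "98-04-25", "99-01-01", "05-03-22"]:
    print(raw, "->", convert_yy_mm_dd(raw))

print(reformat_date("23-01-2004", "%d-%m-%Y", "%Y/%m/%d"))
```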

Chapter 8: Process parsed data

Post the data that you want

DataRight IQ offers a variety of output data so that you can post the data you want into each output field.

For complete descriptions of DataRight IQ output fields, see Firstlogic’s Quick Reference. For an introductory discussion of output posting, see Firstlogic’s Database Prep documentation.

Name and title components and lines

For names and job titles, DataRight IQ offers individual components and individual name lines for up to six names. DataRight IQ also provides complete name lines containing the same name data as the input name line.

Input: Dr. Mary R. Smith, M.D. and Mr. Doug A. Jones, Jr., Vice President

Available for output:

Components
Prename 1: Dr.                Prename 2: Mr.
First Name 1: Mary            First Name 2: Doug
Middle Name 1: R.             Middle Name 2: A.
Last Name 1: Smith            Last Name 2: Jones
Maturity Postname 1:          Maturity Postname 2: Jr.
Other Postname 1: M.D.        Other Postname 2:
Title 1:                      Title 2: Vice President

Line for each name
Name 1: Dr. Mary R. Smith, M.D.
Name 2: Mr. Doug A. Jones Jr.

Name line
Dr. Mary R. Smith M.D. and Mr. Doug A. Jones Jr. Vice President

Firm components and lines

DataRight IQ provides two kinds of firm data: firm names and firm locations. A firm location is a department, mail stop, or other location within a company.

Input: Firstlogic Inc., Dept. of Accounting

Available for output:

Components
Firm: Firstlogic Inc.
Firm Location: Dept. of Accounting

Line
Firstlogic Inc. Dept. of Accounting

Address components and lines

DataRight IQ offers address and last-line components as well as address lines and a last line. Address-line components are offered for use in matching; we don’t recommend them for general use.

Input: 800 W Benton St Apt 6, PO box 123
       Tomah WI 54660-1474

Available for output:

Components
Primary Range: 800
Secondary Addr: Apt 6
Secondary Range: 6
PO Box Number: 123
City: Tomah
State: WI
ZIP: 54660
ZIP4: 1474

Lines
Address: 800 W Benton St Apt 6, PO box 123
Primary Address: 800 W Benton St
PO Box Line: PO box 123
Last Line: Tomah WI 54660-1474

Note: Similar data are available for rural-route addresses.

Parsed data: Standardized or unstandardized

DataRight IQ offers two kinds of parsed data: standardized and unstandardized.

• Standardized data is altered by DataRight IQ according to your settings and DataRight IQ’s standardization rules.

• Unstandardized data is identified and parsed into individual components or lines, but the data is not altered. Casing and spelling are left exactly as they appear in the input file.

Input: John Mckay Jr, ACCOUNTANT
       Firstlogic Incorp., dept. of accounting

Output           Standardized             Unstandardized
Name:            Mr. John McKay, Jr.      John Mckay Jr
Title:           Accountant               ACCOUNTANT
Firm:            Firstlogic Inc.          Firstlogic Incorp.
Firm Location:   Dept. of Accounting      dept. of accounting

You can retrieve standardized data from AP fields, and unstandardized data from APU fields. For a list of fields, see our Quick Reference.


New data

During processing, DataRight IQ generates new data.

Input: Doug Jones

New data:
Name: Mr. Doug Jones
Gender: 1 (strong male)
Prename: Mr.
Salutation: Dear Mr. Jones:
Match Std: Douglas

You can retrieve new data from DataRight IQ application (AP) fields. For a list of DataRight IQ application fields, see our Quick Reference.

Note: DataRight IQ also generates parsing-confidence scores and data-quality codes. For more information about scores and codes, see “Data-quality scores and codes” on page 165.

Raw data from the input file

You can copy raw data directly from the input file to the output file.

For example, suppose your input file contains data that you don’t want to process with DataRight IQ’s parsers, and you want that data to be exactly the same in the output file as in the input file. For each record, you can carry the data over from the input file to the output file unchanged.

To copy data directly from the input file, use the field name as it appears in your format file (FMT or DMT) or your dBASE3 file, prefixed with “DB.” For example, for the input field BIRTHDATE, you could post DB.BIRTHDATE.

Note: To post raw data from many fields at once, you can use the Copy Input Data To Output File option in the Post To Output File section of the job file. For setup details, see the Views online help.

Overcome field-naming conflicts

Suppose you’re processing two input files. One file has a field called PART_NO and another has a field called PART_ID. You want to post the part numbers to the output file. However, if you post DB.PART_NO you’ll get data from the first input file but not the second. Likewise, if you post DB.PART_ID, you’ll get data from the second file but not the first.

To overcome the difference in input field names (PART_NO versus PART_ID), present both input fields to DataRight IQ as PW.PART_NO. You could then post PW.PART_NO in the output file. Your DEF file entries would look like this:


DEF file 1:

PW.PART_NO = PART_NO

DEF file 2:

PW.PART_NO = PART_ID
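The effect of the two DEF files can be sketched in Python: each input file gets its own map from the shared PW field name to that file's input field, so the rest of the job only ever refers to PW.PART_NO. The dictionaries, the sample records, and the function present_record are hypothetical illustrations, not DataRight IQ structures.

```python
# Hypothetical per-file field maps mirroring the two DEF files:
# each maps the shared PW field name to that file's own input field.
DEF_FILE_1 = {"PW.PART_NO": "PART_NO"}
DEF_FILE_2 = {"PW.PART_NO": "PART_ID"}


def present_record(db_record: dict[str, str], def_map: dict[str, str]) -> dict[str, str]:
    """Return the PW view of one raw record, using its own file's DEF map."""
    return {pw_name: db_record[db_name] for pw_name, db_name in def_map.items()}


record_from_file_1 = {"PART_NO": "A-100", "DESC": "hex bolt"}
record_from_file_2 = {"PART_ID": "B-200", "DESC": "washer"}

# Posting PW.PART_NO now works for records from either file.
for record, def_map in [(record_from_file_1, DEF_FILE_1), (record_from_file_2, DEF_FILE_2)]:
    print(present_record(record, def_map)["PW.PART_NO"])
```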

Note: You can also use user-defined PW fields to overcome field-naming conflicts. For more information, see Firstlogic’s Database Prep documentation.

Chapter 9: Search-and-replace

DataRight IQ’s search-and-replace feature lets you modify data or filter records according to search-and-replace results.

There are many ways that you can apply the search-and-replace feature to your jobs. Here are four ways for you to use this feature:

• Convert coded data (see page 99)

• Remove unwanted data from a field (see page 100)

• Search and put (see page 101)

• Select a subset of records (see page 102)

In this chapter, read an overview of the search-and-replace feature (“Simple search and replace” on page 90), then learn how to set up your own search-and-replace (“How to use search-and-replace” on page 92).

This chapter ends with examples of the four ways to use search and replace (listed above), and a few other examples of ways that you can use this feature to enhance your job results (“Additional examples” on page 103).


Simple search and replace

When you use search and replace, you can search for any of the following and replace it with another value:

• a substring

• a word

• a pattern

• the entire contents of a field

For example, if a company changes its name, you could search for the old company name and replace it with the new company name. To do this type of traditional search-and-replace, you must perform the search-and-replace on a field in the input database (a DB field).

Substring

Use the string search-and-replace method to replace a string of characters that may be found next to or between other characters in a field. For example, you could replace extraneous punctuation marks in a field with spaces:

Search value   Replace value   Before       After
/              (space)         John/Jones   John Jones

Use the string search-and-replace method carefully. The search string is replaced wherever it is found, even if it is part of a word:

Search value   Replace value   Before               After
And            &               Gerald K. Anderson   Gerald K. &erson
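Python's str.replace behaves the way this method is described: it replaces every occurrence, including one inside a longer word. The minimal, case-sensitive sketch below reproduces both rows above; substring_replace is a hypothetical name.

```python
def substring_replace(value: str, search: str, replace: str) -> str:
    """String-method replacement: every occurrence is replaced, even inside a word."""
    return value.replace(search, replace)


print(substring_replace("John/Jones", "/", " "))           # John Jones
print(substring_replace("Gerald K. Anderson", "And", "&"))  # Gerald K. &erson
```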

Processing speed

A substring search slows processing. How much it slows depends on a number of variables, including file sizes and the size of the search-and-replace table.

Word

Use the word search-and-replace method to replace a word found within a field.

Search value   Replace value   Before           After
New            Renewal         New subscriber   Renewal subscriber
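A word search differs from a substring search in that the match must stand alone as a word. The sketch below assumes a word is bounded by anything other than a letter, digit, or underscore (Python's \w); DataRight IQ's own definition of a word boundary may differ, and word_replace is a hypothetical name.

```python
import re


def word_replace(value: str, search: str, replace: str) -> str:
    """Replace search only where it is not touching another word character."""
    pattern = re.compile(r"(?<!\w)" + re.escape(search) + r"(?!\w)")
    # A function replacement keeps any backslashes in `replace` literal.
    return pattern.sub(lambda _match: replace, value)


print(word_replace("New subscriber", "New", "Renewal"))     # Renewal subscriber
print(word_replace("Newark subscriber", "New", "Renewal"))  # Newark subscriber
```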


Pattern

Use the pattern search to search for patterns within a field that you can replace with a value. For example, a database contains vehicle identification numbers (VINs) that you want to replace with the make of the automobile.

Search value:  ^[A-Z0-9]([F])[A-Z0-9]{7}([A-Z])([A-Z])([A-Z0-9]{6})$
Replace value: Ford
Before:        1FMDU34X7TZA04833
After:         Ford
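Python's re module can check this kind of pattern. In the sketch below, the last group accepts letters as well as digits, because position 12 of the sample VIN is the letter A; a digits-only group ([0-9]{6}) would not match it. The name pattern_replace is hypothetical, and DataRight IQ's regular-expression dialect may differ from Python's in details.

```python
import re

# Second character F, then 7 characters, two letters, and 6 letters or digits.
FORD_VIN = re.compile(r"^[A-Z0-9]([F])[A-Z0-9]{7}([A-Z])([A-Z])([A-Z0-9]{6})$")


def pattern_replace(value: str, pattern: re.Pattern, replace: str) -> str:
    """Replace the whole field with `replace` when the pattern matches it."""
    return replace if pattern.search(value) else value


print(pattern_replace("1FMDU34X7TZA04833", FORD_VIN, "Ford"))  # Ford
print(pattern_replace("1GCEK19T3XE123456", FORD_VIN, "Ford"))  # unchanged
```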

Field

Use the field search-and-replace method to match and replace the entire contents of a field. The entire field must match the search value, and the entire field is replaced:

Search value   Replace value      Before             After
Occupant       Current Resident   Occupant           Current Resident
                                  Current Occupant   Current Occupant
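A field search amounts to an exact lookup of the whole field value, as this sketch shows. Returning unmatched fields unchanged corresponds to leaving unmatched data intact, which is an assumption here; a function can instead be set to return a default value. The name field_replace is hypothetical.

```python
def field_replace(value: str, table: dict[str, str]) -> str:
    """Field-method replacement: the entire field must equal a search value."""
    return table.get(value, value)  # unmatched fields are left unchanged


occupant_table = {"Occupant": "Current Resident"}
print(field_replace("Occupant", occupant_table))          # Current Resident
print(field_replace("Current Occupant", occupant_table))  # Current Occupant
```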


How to use search-and-replace

Setting up a search-and-replace process involves three main steps:

1. Create a search-and-replace table.

2. Create a search-and-replace function.

3. Use the search-and-replace function.

Create a search-and-replace table

First, you need to specify what to search for and how to replace it. You use a search-and-replace table to tell DataRight IQ each search value and its replacement value.

You can use an internal table created within the job file or an external table that resides in a separate file. (See “Step 1: Create a search-and-replace table” on page 93.)

Create a search-and-replace function

After you create a search-and-replace table, you need to specify how to conduct the search. You use a search-and-replace function to tell DataRight IQ how to conduct the search. (See “Step 2: Create a search-and-replace function” on page 95.)

Use the search-and-replace function

To actually perform a search-and-replace, you must use your search-and-replace function elsewhere in the job, anywhere a filter can be applied. Where you use the function determines when the search is conducted (on input or on output). When you use the function, you also specify which field to search and where to place the results. (See “Step 3: Use the search-and-replace function” on page 96.)


Step 1: Create a search-and-replace table

If you want to conduct a search-and-replace, the first step is to tell DataRight IQ exactly what values to search for and how to replace them. To do this, you need to create a search-and-replace table.

Search-and-replace table

Suppose you store prenames as 2-byte codes. You want to convert the codes to prenames. Create a table to tell DataRight IQ each code and its replacement value.

For example, if you wanted to convert prename codes to prenames, your search-and-replace table might look like this:

Search for   Replace with
01           Mr.
02           Mrs.
03           Ms.
04           Miss
05           Dr.
06           Rev.
07           Rabbi
08           Lt.
09           Col.
10           Gen.

Internal versus external tables

You can use two types of search-and-replace tables: internal and external.

• Use an internal table for a job-specific task. For example, if your input file contains prename codes unique to that file, you could use an internal table to convert the codes.

• Use an external table for a frequently performed task. For example, suppose your company uses the same prename codes in all its databases. Whenever you process a database, you need to convert the codes. You could create an external table and use it every time you need to convert the codes.

How to create an internal table

To create an internal table, set up the Create Internal Table section in the job file. For setup details, see the Views online help.


How to create an external table

An external table resides in a separate file and can be used over and over again for any DataRight IQ job.

• If you have DataRight IQ Views, the quickest way to create a new external table is to use the Search And Replace Wizard in the Tools menu.

• If you own DataRight IQ Job, create a database containing a search field and a replace field. For each record, type a search value and its replacement value. Then, use the User-Modifiable Dictionary (UMD) program to convert the database to an external search-and-replace table. For guidelines on how to format the database and use UMD, see the DataRight IQ Modifier’s Guide.

Table entries are independent

DataRight IQ does not search for values in the order in which they appear in the search-and-replace table. Instead, it stores search-and-replace values in a sequence that optimizes look-ups, searching from the longest string to the shortest. Set up your table so that each entry stands on its own, as if it were a separate search-and-replace action.

If you need to perform one search-and-replace before you perform another, set up two separate search-and-replace actions and nest one inside the other. For more information about nested functions, see the Database Prep manual.
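The longest-to-shortest rule can be illustrated with a single-pass regular-expression replacement in Python. This is one way to get the behavior described, not DataRight IQ's look-up structure: sorting the alternatives longest first makes the longest search value win at each position, and because re.sub never rescans replacement text, one entry's output is not searched again by another entry. The function name replace_longest_first and the sample tables are hypothetical.

```python
import re


def replace_longest_first(value: str, table: dict[str, str]) -> str:
    """Apply every table entry in one left-to-right pass, longest value first."""
    if not table:
        return value
    # Regex alternation takes the first alternative that matches at a position,
    # so listing longer search values first makes them win over shorter ones.
    alternatives = sorted(table, key=len, reverse=True)
    matcher = re.compile("|".join(re.escape(search) for search in alternatives))
    return matcher.sub(lambda match: table[match.group(0)], value)


abbreviations = {"Dept": "Department", "Dept.": "Department"}
print(replace_longest_first("Dept. of Accounting", abbreviations))  # Department of Accounting

chained = {"A": "B", "B": "C"}
print(replace_longest_first("AB", chained))  # BC, not CC
```

The second call shows why entries are independent: the B produced from A is not converted again to C. Getting that two-step effect requires two nested functions, as described above.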


Step 2: Create a search-and-replace function

A search-and-replace table is just the first step; it tells DataRight IQ each search value and its replacement value. You also have to specify how and when to conduct the search.

A search-and-replace function specifies the following information:

• The function name.

• Which search-and-replace table(s) to use.

• Whether to search the entire contents of a field, a word within a field, a substring within a field, or a pattern within a field. Options are field, word, string, and pattern.

• What action to take if the search field does not contain any of the search values from your table.

• Whether the search is case sensitive.

Name and build your function

To give DataRight IQ these instructions, you create a search-and-replace function and then use it elsewhere in the job. The function tells DataRight IQ how to conduct the search.

To create a search-and-replace function, open a Create Search/Replace Function window. In that window:

• Enter a name for the function.

• Tell DataRight IQ which search-and-replace table(s) to use.

• Choose whether to search the entire contents of a field, a word within a field, a substring within a field, or a pattern within a field.

• Choose whether to leave unmatched data intact or replace it with a default value.

• Choose whether DataRight IQ should ignore case.

Refer to “Create Search/Replace Function” on page 204 for descriptions of the parameters in this block. Views users may consult the Views online help.
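The settings listed above can be modeled as one small Python class, which makes it easier to see how they interact. This is a hypothetical model, not DataRight IQ's implementation: the class and attribute names are invented for illustration, the pattern method is not modeled, within-field searches try longer values first (following "Table entries are independent"), and case-insensitive comparison is approximated by lowercasing, which is reliable for ASCII text.

```python
import re
from dataclasses import dataclass


@dataclass
class SearchReplaceFunction:
    """Hypothetical model of the settings in a search-and-replace function."""

    name: str
    table: dict[str, str]         # search value -> replacement value
    method: str = "field"         # "field", "word", or "str" (pattern is not modeled)
    default_action: str = "orig"  # "orig" keeps unmatched data; "dflt" returns default_value
    default_value: str = ""
    case_insensitive: bool = False

    def _key(self, text: str) -> str:
        # Lowercasing approximates case-insensitive comparison for ASCII text.
        return text.lower() if self.case_insensitive else text

    def _unmatched(self, value: str) -> str:
        return self.default_value if self.default_action == "dflt" else value

    def __call__(self, value: str) -> str:
        if not self.table:
            return self._unmatched(value)
        lookup = {self._key(search): replace for search, replace in self.table.items()}

        if self.method == "field":
            # The entire field must equal a search value.
            return lookup.get(self._key(value), self._unmatched(value))

        # "word" and "str" search inside the field, trying longer values first.
        pattern = "|".join(re.escape(s) for s in sorted(self.table, key=len, reverse=True))
        if self.method == "word":
            pattern = rf"(?<!\w)(?:{pattern})(?!\w)"
        flags = re.IGNORECASE if self.case_insensitive else 0
        result, count = re.subn(
            pattern, lambda m: lookup[self._key(m.group(0))], value, flags=flags
        )
        return result if count else self._unmatched(value)


conv_pre = SearchReplaceFunction(
    "conv_pre", {"1": "Mr.", "2": "Mrs."}, method="field", default_action="dflt"
)
removeben = SearchReplaceFunction(
    "removeben", {"beneficiary": ""}, method="str", case_insensitive=True
)

print(repr(conv_pre("1")), repr(conv_pre("9")))
print(repr(removeben("Mr. John Doe, Beneficiary")))
```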


Step 3: Use the search-and-replace function

You can apply a search-and-replace function anywhere that you can apply a filter:

Job-file block: Post to Input File, Post to Output File
Purpose: Apply a function before posting modified data to an input or output file.
Function location in Views: Modify PW Field, Post Data setting
Function location in Job: Copy (source, destination)

Job-file block: Input File, Post to Output File, Input List Description
Purpose: Select a set of records for processing or for inclusion in an output file, list, or report.
Function location in Views: Filter setting
Function location in Job: Filter

Job-file block: Report: Parsing Error, Report: Change, Report: All Records
Purpose: Select a set of records for inclusion in a report.
Function location in Views: Report Options, Record Filter
Function location in Job: Record Filter

Apply your function on input

When you use the search-and-replace function in the Input File block, you are telling DataRight IQ to conduct the search on input. The search-and-replace function itself tells DataRight IQ how to conduct the search. When you actually use the function, you also tell DataRight IQ the following information:

• Which field do you want to search?

• Where do you want to place the results?

Using the example from “Search-and-replace table” on page 93, suppose your coded prename data is stored in a field called PREFIX. You want to convert the codes to prenames on input so that DataRight IQ can process the prename data.

In Views: Modify PW Field

Tell DataRight IQ what fields to search by using the Modify PW Field tool in the Input File block.

To access Modify PW Field, click Modify in the Input File window after you’ve entered the input file path and filename.

In the Modify PW Fields window, you specify where to place the results of the search-and-replace function (in our example, the Pre_Name field). Then you build the expression that you will use. The expression consists of your search-and-replace function and the database field PREFIX from the example.

(Figure: Modify PW Fields window, with callouts for where to place results, the search-and-replace function, and the field to search.)


After you set up Modify PW Fields, DataRight IQ will search the PREFIX field in your input file and place the search-and-replace results in the DataRight IQ input field PW.Pre_Name.

In Job

In Job, you only need to enter your expression in the Copy (source, destination) parameter:

*BEGIN Post to Input File ======================================

Input File (location & file name).... =

Copy (source,destination)............ =ConvertPre(DB.PREFIX), Pre_Name

END

Apply your function on output

When you use the search-and-replace function in the Post to Output File block, you are telling DataRight IQ to conduct the search on output.

In Views

(Figure: Post to Output File settings in Views, with callouts for where to place the results, the search-and-replace function, and the field to search.)

DataRight IQ will post the results of the PREFIX field (after it’s been run through the search-and-replace function) to the output field PRE_NAME.

In Job

Enter the function expression in the Copy parameter:

BEGIN Post to Output File =====================================

Output File (location & file name)... =

Existing File (APPEND/REPLACE)....... =

Output Filter (to 1024 chars)........ =

Nth Select Type (USER/AUTO/RANDOM)... = USER

User Nth Select (1.0 - ???).......... =

Maximum Number of Records to Output.. =

Character Encoding (See NOTE)........ =

Unicode Conversion Name.............. =

Copy Input Data to Output File (Y/N). = n

Copy (source,destination)............ = ConvertPre(DB.PREFIX), Pre_Name

END


Select a set of records for processing

Use a search-and-replace function to select a set of records for processing or for inclusion in an output file, list, or report. You can apply the function in these blocks:

• Input File

• Post to Output File

• Input List Description

• Report: Parsing Error

• Report: Change

• Report: All Records


In Views

In the report blocks, click Report Details to apply your function. In the other blocks, use the Filter feature.

In Job

The function is in the Record Filter parameter.

*BEGIN Report: Change ==========================================

Location and File Name/Printer Device =

Existing File (APPEND/REPLACE)....... =

Number of Copies (1 to 10)........... =

Case (UPPER/Mixed)................... =

**********Portions omitted for illustration *************

Record Filter (to 1024 chars)........ = ConvertPre(DB.PREFIX)

Nth Select Type (USER/AUTO/RANDOM)... = USER

User Nth Select (1.0 - ???).......... =

Max # of Records to Print............ = 500

Field Type (AP/CUSTOM)............... = AP

Custom Copy (src,len[,title])........ =

END


Convert coded data

You can use search-and-replace to convert data on input. For example, you could convert numeric prename codes to actual prename data (for example, 1 = Mr.) so that DataRight IQ could work with the prename data during processing.

You can also convert data before posting it to an output file. For example, to save storage space you might prefer to store prenames as 1-byte codes. You could use search-and-replace to convert prenames to codes, then post the codes to the output file.

Example

Suppose your input file has a PREFIX field containing coded prename data. You want to convert the codes to prenames. Since DataRight IQ uses prename data during processing, you want to convert the codes on input.

Create a search-and-replace table

First, set up a search-and-replace table showing each code and its corresponding prename:

BEGIN Create Internal Table ==============================

Internal Table Name (to 20 chars).... = Prename Table

Table Entry (search,replace)......... = 1, Mr.

Table Entry (search,replace)......... = 2, Mrs.

Table Entry (search,replace)......... = 3, Ms.

Table Entry (search,replace)......... = 4, Miss

Table Entry (search,replace)......... = 5, Dr.

Table Entry (search,replace)......... = 6, Rev.

END

Create a search-and-replace function

Next, set up a search-and-replace function. Specify the name of the table and the type of search to perform. For this example, we’ll do a field search because the code fills the entire 1-byte PREFIX field. We don’t want to leave any stray codes, so if DataRight IQ finds a code that is not in our table, we’ll replace it with a default value of nothing (indicated by two double-quotation marks with nothing in between).

BEGIN Create Search/Replace Function =====================

Function Name (to 10 chars).......... = conv_pre

Internal Table Name (to 20 chars).... = Prename Table

External Table (path & file name).... =

Search Priority (INTERNAL/EXTERNAL).. =

Search & Repl. Method (FIELD/WORD/STR)= field

Default Return Action (ORIG/DFLT).... = dflt

Default Return Value ................ = ""

Case Insensitive Search/Replace(Y/N)........... = N

END

Use the search-and-replace function

Finally, use the search-and-replace function elsewhere in the job. For our example, we want to convert the coded data as it is input, so we’ll use the search-and-replace function at the Modify PW Field parameter in the Input File block.

When we use the function, we specify which field to search (the search field) and where to place the results (the destination field):

Modify PW Field = conv_pre(DB.PREFIX), Pre_Name

Here conv_pre is the search-and-replace function, DB.PREFIX is the search field, and Pre_Name is the destination PW field.
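The whole conversion can be traced in a short Python sketch. Representing a record as a dictionary with separate "DB" (raw input) and "PW" (DataRight IQ input) parts is an assumption made for illustration, as are the helper modify_pw_field and the sample records; only the table values and the field and function names come from the example above.

```python
from typing import Callable

PRENAME_TABLE = {"1": "Mr.", "2": "Mrs.", "3": "Ms.", "4": "Miss", "5": "Dr.", "6": "Rev."}


def conv_pre(value: str) -> str:
    """Field search with a default value of nothing: unknown codes become ""."""
    return PRENAME_TABLE.get(value, "")


def modify_pw_field(record: dict, function: Callable[[str], str], db_field: str, pw_field: str) -> None:
    """Mimic 'Modify PW Field = function(DB.db_field), pw_field' for one record."""
    record["PW"][pw_field] = function(record["DB"][db_field])


records = [
    {"DB": {"PREFIX": "2", "NAME": "Mary Smith"}, "PW": {}},
    {"DB": {"PREFIX": "9", "NAME": "Pat Jones"}, "PW": {}},  # 9 is not in the table
]
for record in records:
    modify_pw_field(record, conv_pre, "PREFIX", "Pre_Name")
    print(record["DB"]["NAME"], "->", repr(record["PW"]["Pre_Name"]))
```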

Remove unwanted data from a field

You can use search-and-replace to remove unwanted data from a field. The search-and-replace feature lets you search for substrings or words within a field, or for the entire contents of a field. This means you can target very specific data to be removed from a field while leaving the rest of the field intact.

Example

Suppose you are processing a file in which the Name field in each record includes a trailing phrase, as in Mr. John Doe, beneficiary. As you input each record, you want to remove the phrase from the Name field.

Create a search-and-replace table

First, set up a search-and-replace table. Its only entry is the search value beneficiary with no replacement value, so each occurrence is replaced with nothing (an empty string):

BEGIN Create Internal Table ===============================

Internal Table Name (to 20 chars).... = ben_table

Table Entry (search,replace)......... = beneficiary

END

Create a search-and-replace function

Next, set up a search-and-replace function. Specify the name of the table and the type of search to perform. For this example, we need to do a substring search (str) because the search value may lie next to or between other characters. The function ignores case (Y), so the phrase is removed however it is capitalized, and it returns the original data (orig) for names that do not contain the phrase.

BEGIN Create Search/Replace Function =========================

Function Name (to 10 chars).......... = removeben

Internal Table Name (to 20 chars).... = ben_table

External Table (path & file name).... =

Search Priority (INTERNAL/EXTERNAL).. =

Search & Repl. Method (FIELD/WORD/STR)= str

Default Return Action (ORIG/DFLT).... = orig

Default Return Value ................ =

Case Insensitive Search/Replace(Y/N)..= Y

END

Use the search-and-replace function

Finally, use the search-and-replace function elsewhere in the job. For our example, we want to modify data on input, so we’ll use the search-and-replace function at the Modify PW Field parameter in the Input File block.


When we use the function, we specify which field to search (the search field) and where to place the results (the destination field):

Modify PW Field = removeben(DB.Name), Name_Line1

Here removeben is the search-and-replace function, DB.Name is the search field, and Name_Line1 is the destination PW field.
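The removeben settings (substring search, case insensitive, return the original data when nothing matches) can be sketched in Python as follows. This is an illustration, not the product's code, and the sample names are hypothetical.

```python
import re


def removeben(value: str) -> str:
    """Substring search, case insensitive; unmatched names are returned unchanged."""
    result, count = re.subn(re.escape("beneficiary"), "", value, flags=re.IGNORECASE)
    return result if count else value  # orig: no match, keep the original data


for name in ["Mr. John Doe, beneficiary", "Ms. Ann Lee, BENEFICIARY", "Dr. Sam Roe"]:
    print(repr(removeben(name)))
```

Because only the word beneficiary is in the table, the comma and space before it remain ("Mr. John Doe, "). Since longer search values are tried first, adding an entry such as ", beneficiary" to the table is one option to try if those characters should be removed too.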


Contents

Preface

About this guide: This guide is divided into three units: User’s Guide, Job-File Reference, and …

Conventions used: The following conventions are used throughout this guide and other Firstlogic …

Other related documentation

DataRight IQ Overview

What is parsing? DataRight IQ identifies and isolates data from other data. We call this parsing. It …

How DataRight IQ works

Job and Views: Below are the basic steps DataRight IQ (Job and Views) takes as it processes a …

Library: Below are the basic steps that DataRight IQ (library implementation) takes as it …

What DataRight IQ can do

Convert file format: If you rent or purchase data, or if you process data from multiple sources, …

Parse data: You can use DataRight IQ to identify and isolate a wide variety of data, even if …

Select records: DataRight IQ (Job and Views) offers advanced record-selection features on input …

Standardize data: DataRight IQ can standardize data to make your records more consistent. Some …

Assign gender and prenames

Create personalized greetings

Perform advanced search-and-replace

Split combined names: If your records include combined names, most often married couples or other …

Scan-and-split: The scan-and-split feature is designed to more precisely arrange data from fields …

Migrate data from a mainframe system

Identify and parse floating data

Remove extraneous information

Convert coded data: Suppose your database has several fields that contain coded data. You eventually …

Convert and parse dates

Prepare records for match processing

Parse data: DataRight IQ can parse (identify) individual data components and put them into …

Standardize firm names

Convert nonmailing city names

Provide match standards for name data

Example: This example shows how DataRight IQ can prepare records for matching.

Convert file type and format

Convert floating data to fielded data

Name format

Known name format: When the software knows the name format (and has compared the data to the …

Inconsistent name format

Address data

Limited last-line standardization

Convert nonmailing city names

Convert words to acronyms

Convert case

Intelligent mixed case: The basic rule of mixed case is to capitalize the first letter of the word and put the …

Other standardization options

Common job-title words

Address-line punctuation

Retain original punctuation

Phone number formats, and extensions

Formats for dates: Use DataRight IQ to change date formats. DataRight IQ offers many date formats …

Output standardized data

Gender codes

Add precise gender information

Reliable gender data: The name and gender data that DataRight IQ uses for determining gender is …

Gender codes and how to retrieve them

Prenames

Add prenames as separate components or in name lines

When and how DataRight IQ assigns prenames

Female prename: Normally, for a female name DataRight IQ assigns the prename Ms. For example, …

Prenames in output name lines

Salutations

Salutations: Include a salutation in your correspondence. DataRight IQ generates a salutation …

DataRight IQ uses sophisticated logic

Choose formal or casual salutations

Create the salutations you want

Create multiname salutations

Create a custom dictionary with UMD

How to use your custom dictionary

Improve casing results

Mixed-case results: If you use mixed case, the general rule is to capitalize the first letter of the word …

How DataRight IQ capitalizes in mixed case

Improve mixed-case results

Improve parsing results

How DataRight IQ uses the dictionary

Correct specific parsing behavior

Local names: DataRight IQ’s name data is based on an analysis of U.S. residents. As such, the …