Business objects UMD 2.56 User Manual

UMD

User’s Guide

Version 2.56 March 2003

UMD command line and configuration file

Query the dictionaries to investigate parsing and mixed-casing

Create custom capitalization dictionaries

Create search-and-replace tables

Notices
Published in the United States of America by Firstlogic, Inc., 100 Harborview Plaza, La Crosse, Wisconsin 54601-4071.
Customer Care
Technical help is free for customers who are current on their ESP. Advisors are available from 8 a.m. to 6 p.m. central time, Monday through Friday. When you call, have at hand the user’s manual and the version number of the product you are using. Call from a location where you can operate your software while speaking on the phone. To save time, fax or e-mail your questions, and an advisor will call or e-mail back with answers prepared. Or visit our Knowledge Base on the Customer Portal web site, where you can find answers on your own, right away, at any time of the day or night.
Our Customer Care group also manages our customer database and order processing. Call them for order status, shipment tracking, reporting damaged shipments or flawed media, changes in contact information, and so on.
Phone                    888-788-9004 in the U.S. and Canada; elsewhere 608-788-9000
Web site                 http://www.firstlogic.com/customer
E-mail                   customer@firstlogic.com
Product literature       888-215-6442, fax 608-788-1188, or information@firstlogic.com
Corporate receptionist   608-782-5000, or fax 608-788-1188
What do you think of this guide?
The Firstlogic Technical Publications group strives to bring you the most useful and accurate publications possible. Please give us your opinion about our documentation by filling out the brief survey at http://customer.firstlogic.com/surveys/default.asp.
We appreciate your feedback! Thank you!
Legal notices
© 2003 Firstlogic, Inc. All rights reserved. This publication and accompanying software are protected by U.S. copyright law and international treaties. No part of this publication or accompanying software may be copied, transferred, or distributed to any person without the express written permission of Firstlogic, Inc.
National ZIP+4 Directory © 2003 United States Postal Service. Firstlogic Directories © 2003 Firstlogic, Inc. All City, ZCF, state ZIP+4, regional ZIP+4, and supporting directories are also protected under the Firstlogic copyright. Firstlogic, Inc. holds a nonexclusive license to publish and sell ZIP+4 databases on optical and magnetic media. The price of the Firstlogic product is neither established, controlled, nor approved by the U.S. Postal Service.
Firstlogic, Inc., or any authorized dealer distributing this product, makes no warranty, expressed or implied, with respect to this computer software product or with respect to this manual or its contents, its quality, performance, merchantability, or fitness for any particular purpose or use. It is solely the responsibility of the purchaser to determine its suitability for a particular purpose or use. Firstlogic, Inc. will in no event be liable for direct, indirect, incidental, or consequential damages resulting from any defect or omission in this software product, this manual, the program disks, or related items and processes, including, but not limited to, any interruption of service, loss of business or anticipatory profit, even if Firstlogic, Inc. has been advised of the possibility of such damages. This statement of limited liability is in lieu of all other warranties or guarantees, expressed or implied, including warranties of merchantability or fitness for a particular purpose.
1L, IL (ball design), ACE, ACSpeed, DataJet, DocuRight, eDataQuality, Firstlogic, GeoCensus, i·d·Centric, IQ Insight, MailCoder, PostWare, Postalsoft, Postalsoft Address Dictionary, Postalsoft DeskTop Mailer, Postalsoft DeskTop PostalCoder, Postalsoft DeskTop Presort, Postalsoft Manifest Reporter, PrintForm, RapidKey, Total Rewards, and TrueName are registered trademarks of Firstlogic, Inc. DataRight, Entry Planner, FirstPrep, IRVE, iSummit, Label Studio, Match/Consolidate, Postalsoft Business Edition by Firstlogic, and TaxIQ are trademarks of Firstlogic, Inc. All other trademarks are the property of their respective owners.
Contents
Preface..............................................................................................................5
Chapter 1:
Custom parsing dictionaries ......................................................................... 7
Step 1: Query the dictionary.............................................................................9
Step 2: Create a parsing transaction file .........................................................13
Step 3: Put your entries in the transaction file................................................14
Step 4: Build your custom parsing dictionary ................................................16
Step 5: Maintain and update your custom dictionary .....................................17
Sample transaction: Add a new word .............................................................18
Sample transaction: Add a title phrase ...........................................................19
Sample transaction: Add a multiple-word firm name.....................................20
Sample transaction: Add a firm that looks like a personal name....................22
Sample transaction: Modify information codes..............................................23
Sample transaction: Modify standards and standard-types.............................24
Sample transaction: Add an acronym for acronym conversion......................25
Rules for working with match standards ........................................................26
Chapter 2:
Custom capitalization dictionaries............................................................. 29
Step 1: Create a capitalization transaction file................................................30
Step 2: Put your entries in the transaction file................................................31
Step 3: Build your custom capitalization dictionary.......................................32
Step 4: Update your custom dictionary ..........................................................33
Chapter 3:
Search-and-replace tables........................................................................... 35
Step 1: Create a search-and-replace transaction file.......................................36
Step 2: Build the search-and-replace table ...................................................38
Appendix A:
UMD configuration file, umd.cfg ................................................................39
Appendix B:
UMD command line......................................................................................41
Appendix C:
Information codes and standard-type codes ..............................................43
Index...............................................................................................................47

Preface

About this guide
This guide explains how to customize your DataRight program to suit your needs. This customization includes using the User-Modifiable Dictionary (UMD), which is a tool for viewing and customizing dictionary files.
This guide explains how to use the command-line version of UMD to create custom parsing dictionaries, custom capitalization dictionaries, and DataRight search-and-replace tables.
Related documents
Before using UMD, you should understand how your Firstlogic product uses the dictionaries. See your product documentation for details.
UMD Views
In UMD Views, the step-by-step process of creating and maintaining a dictionary is different from the process described in this manual.
If you use UMD Views, do not rely on this guide. Instead, get online tips, procedures, and information while you work:
• Click the Help button on the first screen of UMD Views.
• Press F1 from any UMD Views screen.
• Click the button in the upper right corner of any UMD Views screen, then click the item for which you want information.
Conventions
The following conventions are used throughout this manual.
Convention      Description
Bold            We use boldface type for file names and paths. When we’re explaining something that you would type on your computer, boldface indicates something that you should type exactly as shown; for example, “Type cd\dirs.”
Italics         We use italics for emphasis. When we’re explaining something that you would type on your computer, italics indicate an item for which you should substitute your own data or values; for example, “Type a name for your job, along with the .job extension (jobname.job).”
Menu commands   We indicate commands that you choose from menus in the following format: Menu Name | Command Name. For example, “Choose File | New.”
Changes         We use a change bar in the right margin to mark product changes since the last version.
(symbol)        We use this symbol to alert you to important information and potential problems.
(symbol)        We use this symbol to point out special cases that you should know about.
Chapter 1:

Custom parsing dictionaries

What is a parsing dictionary?
Our name-parsing technology identifies and parses name, title, and firm data. The parser looks up words in the parsing dictionary to get information. The parser then uses the dictionary information, as well as word patterns, to identify and parse name, title, and firm data.
The parsing dictionary contains entries for words and phrases. Each entry tells how the word or phrase might be used. For example, the dictionary indicates that the word Engineering can be used in a firm name (such as Smith Engineering, Inc.) or job title (such as VP of Engineering).
The dictionary also contains other information:
Acronyms          The dictionary contains the standard and acronymic forms of words. For example, the dictionary indicates that Inc. is the standardized form of Incorporated and IBM is the acronym for Intl Business Machines.
Match standards   The dictionary contains match standards (potential matches). For example, Patrick and Patricia are match standards for Pat.
Gender            The dictionary contains gender data. For example, it indicates that Anne is a feminine name and Mr. is a masculine prename.
Probabilities     The dictionary indicates the likelihood that a name is a first name rather than a last name. For example, there is a 40 percent chance that Martin is a first name rather than a last name.
Why create a custom dictionary?
Our base parsing dictionary contains thousands of name, title, and firm entries. You might tailor the dictionary to better suit your data. For example:
• You might customize the dictionary to correct specific parsing behavior. For example, given the name Mary Jones, CRNA, the word CRNA is parsed as a job title. In reality, CRNA is a postname (Certified Registered Nurse Anesthetist). To correct this, you could add CRNA to the parsing dictionary as a postname.
• You might tailor the dictionary to better suit your data by adding regional or ethnic names, special titles, or industry jargon. For example, if you process data for the real estate industry, you might add postnames such as CRS (Certified Residential Specialist) and ABR (Accredited Buyer Representative).
• If a specific title or firm name is parsed incorrectly, you can add an entry for the entire phrase. For example, the parser previously identified Hewlett Packard as a personal name, so we added Hewlett Packard to the dictionary as a firm name.
Overview of creating a dictionary
To create a custom parsing dictionary, follow these basic steps:
1. Use UMD Show to query our base parsing dictionary. Look for existing entries for the words you wish to add or change.
2. Put your custom entries in a transaction file. A transaction file is a database containing the additions and changes you wish to make to our dictionary.
3. Build your custom dictionary. UMD Build takes our base dictionary, makes the additions and changes specified in the transaction file, and creates the custom dictionary.
Figure: inputs and output of UMD Build
Source dictionary   Our parsing dictionary, parsing.dct
Transaction file    A database containing your additions and changes
Supporting files    Files that enable UMD to read the transaction file
UMD Build reads these inputs and produces:
Custom dictionary   A new dictionary containing entries from the source dictionary with your additions and changes
Qualifications
Preparing custom parsing dictionaries is a task for a data-management professional. If you employ UMD in all its capabilities, dictionary editing is almost an engineering task. Dictionary editing is not a clerical task.
A note about examples
The sample queries and transactions in this chapter are for example only. By the time you read this manual, the particular examples may have been added to our base parsing dictionaries, so your query results may differ from what is shown.

Step 1: Query the dictionary

Before you add a word to the dictionary, query our base parsing dictionary, parsing.dct, to see whether there is already an entry for the word.
Run UMD Show
To query a dictionary, run UMD Show. To run UMD Show, use the command line (see “UMD Show” on page 41). For example:
umd /s parsing.dct
UMD Show is interactive. You enter a query and UMD Show responds, either with data or with a message that your query was not found in the dictionary.
Querying a single word
To query a single word, type the word at the Enter> prompt. Do not include any punctuation. If the word is in the dictionary, UMD Show displays the dictionary entry:
C:\umd /s parsing.dct
Using a parsing Dictionary.
Enter a query, or press <Esc> to exit.
Enter> Beth
Usage: 99
Intl Code(s): USENGLISH
Info Code(s): NAME NAMEGEN5
Standard(s) for BETH:
- BETHANY NAME_MTC
- BETHEL NAME_MTC
- ELIZABETH NAME_MTC
For descriptions of the information and standard-type codes, see Appendix C.
If the word is not in the dictionary, UMD Show tells you the entry was not found:
C:\umd /s parsing.dct
Using a parsing Dictionary.
Enter a query, or press <Esc> to exit.
Enter> Michelangelo
Text not found in dictionary.
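If you capture UMD Show output for later review, the entry layout shown for Beth can be read into a small structure. The following is a minimal Python sketch, not part of UMD. It assumes the output looks exactly like the samples in this guide and that standard-type codes end in _STD, _MTC, or _ACR, as they do in those samples. The function name parse_show_entry and the keys of the returned dictionary are hypothetical.

```python
import re

# Standard-type codes in this guide's samples look like NAME_MTC or FIRM_STD.
CODE = re.compile(r"^[A-Z]+_(STD|MTC|ACR)$")

def parse_show_entry(text):
    """Parse one entry printed by UMD Show (layout assumed from the samples)."""
    entry = {"usage": None, "intl": [], "info": [], "standards": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Usage:"):
            entry["usage"] = int(line.split(":", 1)[1])
        elif line.startswith("Intl Code(s):"):
            entry["intl"] = line.split(":", 1)[1].split()
        elif line.startswith("Info Code(s):"):
            entry["info"] = line.split(":", 1)[1].split()
        elif line.startswith("- "):
            # A standard may have several words, followed by its type codes.
            tokens = line[2:].replace(",", " ").split()
            words = [t for t in tokens if not CODE.match(t)]
            codes = [t for t in tokens if CODE.match(t)]
            entry["standards"][" ".join(words)] = codes
    return entry

SAMPLE = """Usage: 99
Intl Code(s): USENGLISH
Info Code(s): NAME NAMEGEN5
Standard(s) for BETH:
- BETHANY NAME_MTC
- BETHEL NAME_MTC
- ELIZABETH NAME_MTC
"""

beth = parse_show_entry(SAMPLE)
```

Here beth["standards"] lists BETHANY, BETHEL, and ELIZABETH in the order displayed. The order matters because the lookup procedures in this chapter use the first match standard of each type.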
Querying a title phrase
To look up a multiple-word title, you must query the “lookup” form of the title, which is the same form as the parser would look up (see footnote 1):
Procedure                                                          Example
1. Start with the raw title.                                       Chief Executive Officer
2. Query each word and get the first title match standard
   (TITLE_MTC) for each. If an appropriate match standard
   does not exist, use the original word.                          Chf. Exec. Off.
3. Remove all punctuation. This is the form of the title
   that you should query.                                          Chf Exec Off
Get the first appropriate match standard for each word in the phrase:
C:\umd /s parsing.dct
Enter a query, or press <ESC> to exit.
Enter> Chief
Usage: 0
Intl Code(s): USENGLISH
Info Code(s): FIRMMISC PHRASE_WRD PREGEN3 PRENAME TITLE
Standard(s) for CHIEF:
- CHIEF FIRM_STD, PRENAME_STD, TITLE_STD
- CHF. FIRM_MTC, PRENAME_MTC, TITLE_MTC
Enter a query, or press <ESC> to exit.
Enter> Executive
Usage: 0
Intl Code(s): USENGLISH
Info Code(s): FIRMMISC PHRASE_WRD TITLE
Standard(s) for EXECUTIVE:
- EXEC. FIRM_MTC, FIRM_STD, TITLE_MTC, TITLE_STD
Enter a query, or press <ESC> to exit.
Enter> Officer
Usage: 0
Intl Code(s): USENGLISH
Info Code(s): FIRMMISC NAME NAMEGEN1 PHRASE_WRD PREGEN3 PREGEN TITLE
Standard(s) for OFFICER:
- OFFICER FIRM_STD, NAME_MTC, PRENAME_STD, TITLE_STD
- OFF. FIRM_MTC, PRENAME_MTC, TITLE_MTC
Query the lookup form of the phrase:
Enter a query, or press <ESC> to exit.
Enter> Chf Exec Off
Usage: 0
Intl Code(s): USENGLISH
Info Code(s): TITLE
Standard(s) for CHF EXEC OFF:
- CEO TITLE_ACR, TITLE_MTC, TITLE_STD
Footnote 1: If a line contains consecutive words that are marked as phrase words, the parser gets the first match standard for each word, removes any punctuation, and looks up the phrase.
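The three-step procedure for a title phrase can be sketched in a few lines of Python. This is only an illustration, not part of UMD: the function name title_lookup_form is hypothetical, and the mapping of words to their first TITLE_MTC standard is an assumption that you would fill in by hand from your own UMD Show queries.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT = str.maketrans("", "", string.punctuation)

def title_lookup_form(raw_title, first_title_mtc):
    """Build the lookup form of a title phrase.

    first_title_mtc maps an upper-case word to its first TITLE_MTC
    standard, as found by querying each word with UMD Show.
    """
    lookup_words = []
    for word in raw_title.split():
        key = word.translate(_PUNCT).upper()
        # Step 2: use the first title match standard, or the original word.
        standard = first_title_mtc.get(key, word)
        # Step 3: remove all punctuation.
        cleaned = standard.translate(_PUNCT)
        if cleaned:
            lookup_words.append(cleaned)
    return " ".join(lookup_words)

# Values copied by hand from the sample queries in this section.
FIRST_TITLE_MTC = {"CHIEF": "Chf.", "EXECUTIVE": "Exec.", "OFFICER": "Off."}

print(title_lookup_form("Chief Executive Officer", FIRST_TITLE_MTC))
```

With the mapping above, the script prints Chf Exec Off, which is the form to type at the Enter> prompt.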
Querying a multiple-word firm name
If you want to query a firm name that is also a personal name, such as Hewlett Packard or Johnson & Johnson, see “Querying a firm name that looks like a personal name” on page 12.
To look up a multiple-word firm name, you must query the “lookup” form of the firm name, which is the same form as the parser would look up (see footnote 2):
Procedure                                                          Example
1. Start with the raw firm name.                                   The General Motors Corporation
2. Remove the words and, or, of, the, and for.                     General Motors Corporation
3. Remove firm terminator words such as Corp, Inc, Ltd, Co, etc.   General Motors
4. Query each remaining word. Get the first firm match standard
   (FIRM_MTC) for each. If an appropriate match standard does
   not exist, use the original word.                               Gen. Motors
5. Remove all punctuation. This is the lookup form of the
   firm name.                                                      Gen Motors
Query the lookup form of the firm name:
C:\umd /s parsing.dct
Using a Parsing Dictionary.
Enter a query, or press <ESC> to exit.
Enter> Gen Motors
Usage: 0
Intl Code(s): USENGLISH
Info Code(s): FIRMMISC FIRMNAME
Standard(s) for GEN MOTORS:
- GM FIRM_ACR, FIRM_MTC, FIRM_STD
For descriptions of the information codes and standard-type codes, see Appendix C.
Footnote 2: If a line contains at least one word marked as a FIRMNAME word, the parser removes all noise words, gets the first firm match standard for each word in the line, removes any punctuation, and looks up the remaining line. (If all words are marked as NAME words, the process is different; see page 12.)
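The five-step procedure for a firm name can be sketched the same way. This Python example is an illustration only, not part of UMD. The function name firm_lookup_form is hypothetical, the terminator list contains only the words named in this chapter (the guide says the real list is longer), and the FIRM_MTC mapping is an assumption that you would fill in from your own queries.

```python
import string

_PUNCT = str.maketrans("", "", string.punctuation)

# Step 2 words, as listed in the procedure.
NOISE_WORDS = {"AND", "OR", "OF", "THE", "FOR"}
# Step 3 words: only the examples named in this chapter, not a full list.
FIRM_TERMINATORS = {"CORP", "INC", "LTD", "CO", "CORPORATION"}

def _key(word):
    """Upper-case form of a word without punctuation, used for comparisons."""
    return word.translate(_PUNCT).upper()

def firm_lookup_form(raw_name, first_firm_mtc):
    """Build the lookup form of a multiple-word firm name."""
    words = [w for w in raw_name.split() if _key(w)]
    words = [w for w in words if _key(w) not in NOISE_WORDS]        # step 2
    words = [w for w in words if _key(w) not in FIRM_TERMINATORS]   # step 3
    mapped = [first_firm_mtc.get(_key(w), w) for w in words]        # step 4
    return " ".join(m.translate(_PUNCT) for m in mapped)            # step 5

# Value copied by hand from the example in this section.
FIRST_FIRM_MTC = {"GENERAL": "Gen."}

print(firm_lookup_form("The General Motors Corporation", FIRST_FIRM_MTC))
```

With the mapping above, the script prints Gen Motors, matching the last row of the procedure table.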
Querying a firm name that looks like a personal name
Some firms are named after people, for example Hewlett Packard or Johnson and Johnson.
To look up this type of firm name, you must query the “lookup” form of the firm name, which is the same form of the name that the parser would look up (see footnote 3):
Procedure                                                          Example
1. Start with the raw firm name.                                   Johnson and Johnson Corp.
2. Remove all punctuation characters.                              Johnson and Johnson Corp
3. Remove the words and, or, of, the, and for.                     Johnson Johnson Corp
4. Remove all firm-terminator words, such as Inc, Ltd,
   Corporation, and so on. This is the lookup form of the
   firm name.                                                      Johnson Johnson
Query the lookup form of the firm name:
C:\umd /s parsing.dct
Using a Parsing Dictionary.
Enter a query, or press <ESC> to exit.
Enter> Johnson Johnson
Usage: 1
Intl Code(s): USENGLISH
Info Code(s): FIRMNAME
Standard(s) for JOHNSON JOHNSON:
- JOHNSON JOHNSON FIRM_MTC, FIRM_STD
Footnote 3: If all of the words in a line are identified as both FIRMNAME and NAME words, the parser removes noise words and punctuation, then looks to see whether the name is listed as a firm name. If so, the line is parsed as a firm name. If not, the line is parsed as a personal name.
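The procedure and the decision described in footnote 3 can be sketched as follows. This Python example is an illustration only, not the parser's actual code. The function names are hypothetical, the terminator list holds only the words named in this chapter, and firm_entries stands in for the firm names that you have confirmed in the dictionary. The sketch also assumes that every word in the line is already marked as both a FIRMNAME and a NAME word.

```python
import string

_PUNCT = str.maketrans("", "", string.punctuation)

NOISE_WORDS = {"AND", "OR", "OF", "THE", "FOR"}
# Only the terminators named in this chapter; the real list is longer.
FIRM_TERMINATORS = {"CORP", "INC", "LTD", "CO", "CORPORATION"}

def personal_firm_lookup_form(raw_name):
    """Lookup form for a firm name that looks like a personal name."""
    words = raw_name.translate(_PUNCT).split()                      # step 2
    words = [w for w in words if w.upper() not in NOISE_WORDS]      # step 3
    words = [w for w in words if w.upper() not in FIRM_TERMINATORS] # step 4
    return " ".join(words)

def classify_line(raw_name, firm_entries):
    """Return 'firm' if the lookup form is a known firm entry, else 'personal'."""
    lookup = personal_firm_lookup_form(raw_name).upper()
    return "firm" if lookup in firm_entries else "personal"

# Hypothetical set of firm-name entries confirmed with UMD Show.
FIRM_ENTRIES = {"JOHNSON JOHNSON"}

print(personal_firm_lookup_form("Johnson and Johnson Corp."))
print(classify_line("Johnson & Johnson", FIRM_ENTRIES))
```

The first call prints Johnson Johnson and the second prints firm. A line whose lookup form is not in the set would be treated as a personal name.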

Step 2: Create a parsing transaction file

A transaction file is a database that contains all the additions and changes that you want to make to the parsing dictionary. The first time you create a custom parsing dictionary, you must create a transaction file.
If you’re updating an existing custom dictionary, use your existing transaction file. Your dictionary will be easier to manage if you store your entries in one transaction file, rather than scattering them among many files.
Create a transaction database
The quickest, easiest way to create a transaction database file and its supporting files is to use the “output file” feature of UMD Show. (See “UMD Show” on page 41.)
1. Use UMD Show to query our base parsing dictionary, parsing.dct. Include the o option on the command line. Use the file name that you plan to use for your custom dictionary, but with the extension .trn (for example, my_parse.trn). To choose a file type, see footnote 4.
2. Query a word that is in the dictionary, such as Bob.
3. Press Enter to save the query to your output file. Press Esc to exit.
C:\umd /s parsing.dct /o my_parse.trn /d dBase3
Using a Parsing Dictionary
Enter a query, or press <ESC> to exit.
Enter> Bob
Usage: 99
Intl Code(s): USENGLISH
Info Code(s): NAME NAMEGEN1
Standard(s) for BOB:
- ROBERT NAME_MTC
Enter a query, or press <Enter> to save, or press <ESC> to exit
Enter> <Enter>
Previous query appended to C:\my_parse.trn.
Enter a query, or press <ESC> to exit.
Enter> <Esc>
UMD Show will create an output database file, for example my_parse.trn. You can use this database as your transaction file.
Keep supporting files with transaction file
When you create a transaction database as described above, UMD Show creates a supporting file such as my_parse.def. For ASCII and delimited transaction files, UMD Show also creates an additional supporting file such as my_parse.fmt or my_parse.dmt. To open and read the transaction file, UMD requires these files. If you move the transaction file to a new location, make sure you also move the corresponding supporting files.
Footnote 4: If you plan to use a database program or spreadsheet program to edit the file, we recommend creating a dBASE3 or ASCII file. If you plan to use a text editor or word processor to edit the file, we recommend creating a delimited file. However, be aware that our UMD Views program does not support updating of delimited files.
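Because UMD needs the supporting files to read the transaction file, it can help to move them as a set. The following Python sketch is one way to do that, under the assumption that the supporting files share the transaction file's base name and use the .def, .fmt, and .dmt extensions mentioned above. The function name move_transaction_set is hypothetical and is not part of UMD.

```python
from pathlib import Path
import shutil

# Supporting-file extensions named in this guide.
SUPPORT_EXTS = (".def", ".fmt", ".dmt")

def move_transaction_set(trn_path, dest_dir):
    """Move a transaction file and any supporting files found beside it.

    Returns the names of the files that were moved.
    """
    trn = Path(trn_path)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    candidates = [trn] + [trn.with_suffix(ext) for ext in SUPPORT_EXTS]
    moved = []
    for path in candidates:
        if path.exists():
            shutil.move(str(path), str(dest / path.name))
            moved.append(path.name)
    return moved
```

For example, move_transaction_set("my_parse.trn", "archive") would move my_parse.trn together with whichever of my_parse.def, my_parse.fmt, and my_parse.dmt exist in the same directory.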

Step 3: Put your entries in the transaction file

To add records to your transaction file, use a text editor or database program. For each record, provide the information described below.
For examples, see the sample transactions starting with “Sample transaction: Add a new word” on page 18.
Parsing transaction entries
Field       Data to enter
Action      Choose one:
            N  Create a new entry or overwrite the existing entry.
            A  Add information to an existing entry.
            C  Change the usage or gender data for an existing entry.
            D  Delete information from an existing entry, or delete the entry.
Primary     Type the word or phrase that you want to add or whose entry you want to modify. Fifty-four characters maximum, not case-sensitive, do not include any punctuation.
            For phrases and multiple-word firm names, use the “lookup” form. To get the lookup form, see “Querying a title phrase” on page 10 and “Querying a firm name that looks like a personal name” on page 12.
Secondary   Type one of the following:
            • The preferred standardized form of the Primary.
            • A match standard (for information and guidelines, see “Rules for working with match standards” on page 26).
            • The acronym form of the Primary.
Usage       For name data, indicate the likelihood on a scale of 0 to 100 that the name is a first name rather than a last name:
            • If the name is always a first name, type 100.
            • If the name is always a last name, or if the word is not a name, type 0.
Intl        Type USENGLISH.
Info        Type all information codes that apply, if not already in the dictionary. Put one space (no punctuation) between codes. For a list of information codes, see “Information codes” on page 43.
Stdtype     Type all standard-type codes that apply, if not already in the dictionary. Put one space (no punctuation) between codes. For a list of standard-type codes, see “Standard-type codes” on page 45.
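Before you build, you may want to check your records against the field rules above. The following Python sketch checks only the rules stated in the table (valid actions, the 54-character limit and no punctuation in Primary, a 0 to 100 Usage value, and single spaces between codes). It is an illustration, not UMD's own verification: the function name check_record and the use of a plain dictionary per record are assumptions, and UMD Build applies further checks that are not reproduced here.

```python
import re
import string

VALID_ACTIONS = {"N", "A", "C", "D"}
# Codes such as NAMEGEN5 or PHRASE_WRD, separated by exactly one space.
CODE_LIST = re.compile(r"[A-Z0-9_]+( [A-Z0-9_]+)*")

def check_record(rec):
    """Return a list of problems found in one transaction record (a dict)."""
    problems = []
    if rec.get("Action") not in VALID_ACTIONS:
        problems.append("Action must be N, A, C, or D")
    primary = rec.get("Primary", "")
    if len(primary) > 54:
        problems.append("Primary is longer than 54 characters")
    if any(ch in string.punctuation for ch in primary):
        problems.append("Primary contains punctuation")
    usage = str(rec.get("Usage", ""))
    if usage and not (re.fullmatch(r"[0-9]{1,3}", usage) and int(usage) <= 100):
        problems.append("Usage must be a whole number from 0 to 100")
    for field in ("Info", "Stdtype"):
        value = rec.get(field, "")
        if value and not CODE_LIST.fullmatch(value):
            problems.append(field + " codes must be separated by one space")
    return problems

# The CRNA record from the sample transaction later in this chapter.
crna = {"Action": "N", "Primary": "CRNA", "Secondary": "CRNA", "Usage": "0",
        "Intl": "USENGLISH", "Info": "HONPOST",
        "Stdtype": "HONPOST_STD HONPOST_MTC"}

print(check_record(crna))
```

For the CRNA record the list is empty. A Primary such as Chf. Exec. Off. would be reported because it still contains punctuation.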
Required fields
For each action, you must provide certain information. In the table below, a check mark (ü) means that you must provide information for that field.
Type of change
Action Pri-
Create a new entry
Add a standard to an existing entry
Add information to an existing entry
Delete an entire entry
Delete a standard D
Delete a standard-type code
Delete an infor­mation code
Change usage data for an exist­ing entry
Second-
mary
N
A
A
D
ary
ü
ü
ü
ü
ü
ü
ü
Usage Intl Info Std-
Note 1
ü
ü
Note 2
ü
ü
type
ü
ü
Note 3
Note 4
ü
ü
D
ü
Must be
blank
ü
Must be
ü
Note 5
ü
blank
D
C
ü
ü
ü
ü
Note 6
Change gender data for an exist-
C
ü
ü
Note 7
ing entry
1) Required only if the Info field contains NAME.
2) Required only if the necessary Info code is not already in the existing dictionary entry. For example, if you add the Stdtype code TITLE_MTC, you must specify the Info code TITLE unless one of those Info codes is already specified in the existing dictionary entry.
3) Required if the Info field contains anything besides PHRASE_WRD.
4) Must include all of the Info codes listed in the existing dictionary entry.
5) Must include all of the Stdtype codes listed in the existing dictionary entry.
6) This field is ignored. UMD automatically deletes dependent standard types.
7) PREGENx or NAMEGENx only. The existing dictionary entry must contain a corresponding gender code. For example, if the existing entry contains the gender code NAMEGEN1, you may change it to any other NAMEGENx code.

Step 4: Build your custom parsing dictionary

After you put all of your entries in your transaction file, use UMD Build to build your custom parsing dictionary.
UMD Build
To build your dictionary, run UMD Build. The easiest way to convey your instructions to UMD is through the UMD configuration file.
1. Open a copy of the configuration file umd.cfg.
2. Type entries for the UMD Build parameters. Specify our dictionary, parsing.dct, as your Source Dictionary.
For descriptions of the configuration-file parameters, see “UMD configuration file, umd.cfg” on page 39.
# UMD Show
Output File Name (path & file name) .... =
Output File Type (See NOTE) ............ =
#
# UMD Build
Dictionary Type (See NOTE) ............. = Parsing
Source Dictionary (path & dct) ......... = c:\pathname\parsing.dct
Transaction File Name (path & file name) = c:\pathname\my_parse.trn
Target Dictionary (path & dct) ......... = c:\pathname\my_parse.dct
Verify Input File Only (YES/NO) ........ = NO
Error Message Log File (path & name) ... = c:\pathname\my_parse.log
Work Directory (path) .................. = r:\temp
3. Save the configuration file.
4. Run UMD with the cfg option. For example:
umd /cfg my_parse.cfg
Verification and building
Before UMD builds your custom dictionary, it checks to make sure the entries in your transaction file are valid. If a validation error or warning occurs, look at the error log file. If an error occurred, fix your transaction file, then run UMD Build again.
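If you rebuild often, you might fill in a copy of the configuration file with a script instead of by hand. The following Python sketch replaces the value after the equals sign on lines whose label starts with a given text, and leaves comment lines and the dot leaders untouched. It is an illustration only: the function name set_cfg_values is hypothetical, and it assumes that each parameter occupies one line in the label = value layout shown in the sample above.

```python
def set_cfg_values(cfg_text, values):
    """Return cfg_text with new values for the labels named in `values`.

    `values` maps the start of a parameter label (for example
    "Source Dictionary") to the value to put after the equals sign.
    """
    out = []
    for line in cfg_text.splitlines():
        if "=" in line and not line.lstrip().startswith("#"):
            # Keep the label and its dot leaders exactly as they are.
            label, _, _old_value = line.partition("=")
            for key, new_value in values.items():
                if label.strip().startswith(key):
                    line = label + "= " + new_value
                    break
        out.append(line)
    return "\n".join(out) + "\n"

# A two-parameter excerpt in the layout of the sample configuration file.
TEMPLATE = (
    "# UMD Build\n"
    "Dictionary Type (See NOTE) ............. =\n"
    "Source Dictionary (path & dct) ......... =\n"
)

filled = set_cfg_values(TEMPLATE, {
    "Dictionary Type": "Parsing",
    "Source Dictionary": r"c:\pathname\parsing.dct",
})
print(filled)
```

You would then save the result under the name that you pass to the cfg option, for example my_parse.cfg.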
If the transaction file is free of errors, UMD builds your custom dictionary. During the build process, UMD takes the source dictionary, makes the changes and additions specified in your transaction file, and creates your custom dictionary.

Step 5: Maintain and update your custom dictionary

Use your existing transaction file
If you want to update your custom dictionary, put your changes and additions in your existing transaction file. Your custom dictionary will be much easier to manage if you accumulate all your entries in one transaction file, rather than scattering them among many files.
Keep your dictionary up to date
When you rebuild your parsing dictionary, always use our base parsing dictionary, parsing.dct, as the source dictionary.
Whenever we send you a new parsing.dct file, build an updated custom dictionary by running your transaction file against our new dictionary. This allows you to benefit from the additions and improvements that we have made to the base dictionary.
If you do not run your transaction file against each new base dictionary, the differences between our base dictionary and your custom dictionary will increase. This will affect your parsing results and impede our ability to provide technical support.

Sample transaction: Add a new word

Suppose your data file contains the name line Anne Smith, CRNA. You notice that the word CRNA is being parsed as a job title. However, CRNA is really a postname (Certified Registered Nurse Anesthetist).
When you query CRNA, you discover it is not in the parsing dictionary:
C:\umd /s parsing.dct
Using a Parsing Dictionary.
Enter a query, or press <ESC> to exit.
Enter> CRNA
Text not found in dictionary.
To add the word to the dictionary, you would add the following record to your transaction file:
Field name   Transaction entry
Action       N
Primary      CRNA
Secondary    CRNA
Usage        0
Intl         USENGLISH
Info         HONPOST
Stdtype      HONPOST_STD HONPOST_MTC
Capitalization
As you add words to the parsing dictionary, make a note of any words that have unusual mixed-case capitalization. To get the correct mixed-case capitalization, you must also add these words to your custom capitalization dictionary.
For example, if you add CRNA to the parsing dictionary, you should also add it to your custom capitalization dictionary. Otherwise, the mixed-casing will be Crna rather than CRNA.
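The effect can be pictured with a deliberately simplified Python sketch. It is not the product's mixed-casing logic, which this chapter does not describe. It only assumes that a word without a custom capitalization entry gets an initial capital followed by lower-case letters. The function name mixed_case and the dictionary custom_caps are hypothetical.

```python
def mixed_case(word, custom_caps):
    """Return the custom capitalization if one exists, else Initial-cap form."""
    return custom_caps.get(word.upper(), word.capitalize())

print(mixed_case("CRNA", {}))                # no custom entry
print(mixed_case("CRNA", {"CRNA": "CRNA"}))  # with a custom entry
```

The first call prints Crna and the second prints CRNA.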

Sample transaction: Add a title phrase

There is a lot of overlap between words that can be used in firm names and words that can be used in job titles. For example, the words Vice, President, and Marketing can all be used in firm names and in job titles. As a result, the parser may incorrectly identify Vice President of Marketing as a firm name rather than a title. To correct this kind of parsing behavior, you can add a title phrase to the dictionary.
Two main tasks
To add a title phrase to the dictionary, you must do two things:
• Make sure each word is in the dictionary and has the information codes PHRASE_WRD and TITLE.
• Enter the “lookup” form of the phrase so that the parser will find it (see “Querying a title phrase” on page 10). Otherwise, the entry will have no effect on parsing results.
To add a title phrase to the dictionary:
1. Query the “lookup” form of the phrase (see “Querying a title phrase” on page 10). For example, to add the phrase Vice President of Marketing to the dictionary, use the lookup form Vice Pres of Mktg.
2. If the phrase is not in the dictionary, create a new entry in your transaction file. Use the lookup form as the primary and secondary:
Field name   Transaction entry
Action       N
Primary      Vice Pres of Mktg
Secondary    Vice Pres of Mktg
Usage        0
Intl         USENGLISH
Info         TITLE
Stdtype      TITLE_STD TITLE_MTC
3. Query each word in the original phrase (e.g., Vice, President, of, and Marketing). Make sure each word meets the following requirements:
• It has the information code PHRASE_WRD.
• It has the information code TITLE.
• The first title match standard (TITLE_MTC) is the same as the word used in the phrase entry.
4. If a word is not in the dictionary or does not meet the requirements listed in Step 3, add the word (or modify it) by putting an entry in your transaction file.
For our example, the word President is in the dictionary but is not identified as a phrase word, so we need to mark it as a PHRASE_WRD. We also need to mark the word of as a TITLE word and a PHRASE_WRD.
Field name   Entry for ‘President’   Entry for ‘of’
Action       A                       A
Primary      President               of
Secondary    Pres.                   of
Usage
Intl
Info         PHRASE_WRD              PHRASE_WRD TITLE
Stdtype                              TITLE_MTC TITLE_STD
For best results, perform steps 3 and 4 for variant spellings and abbreviations of each word. For our example, we would check to make sure that Pres and Mktg are marked as phrase words. This enables the parser to recognize variant raw forms of the phrase (such as Vice Pres. of Marketing, Vice President of Mktg., and Vice Pres. of Mktg.) in addition to the original phrase Vice President of Marketing.
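To see which raw forms and which individual words are involved, you can enumerate the combinations. The following Python sketch is only a planning aid, not part of UMD. The function names and the list of variants per word are assumptions that you would replace with the spellings found in your own data.

```python
from itertools import product
import string

_PUNCT = str.maketrans("", "", string.punctuation)

def raw_variants(slots):
    """Every raw phrase that can be formed by picking one variant per word."""
    return [" ".join(choice) for choice in product(*slots)]

def words_to_check(slots):
    """Each distinct word, without punctuation, to query with UMD Show."""
    return sorted({w.translate(_PUNCT) for variants in slots for w in variants})

# One list of variants per word position in the phrase.
SLOTS = [["Vice"], ["President", "Pres."], ["of"], ["Marketing", "Mktg."]]

for phrase in raw_variants(SLOTS):
    print(phrase)
print(words_to_check(SLOTS))
```

For the example phrase this lists four raw forms and six words to query: Marketing, Mktg, Pres, President, Vice, and of.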

Sample transaction: Add a multiple-word firm name

If a multiple-word firm name such as Emery Worldwide is parsed incorrectly, you can add the firm name to the dictionary.
If a firm name looks like a personal name (see footnote 1), such as Hewlett Packard or Johnson & Johnson, the procedure is different from the one shown on this page. See “Sample transaction: Add a firm that looks like a personal name” on page 22.
Footnote 1: To the parser, a line “looks like” a personal name if all of the words in the line are marked as NAME words. For example, Check N Go looks like a personal name because the words Check, N, and Go are all NAME words.
Two main tasks
To add a multiple-word firm name to the dictionary, you must do two things:
• Make sure at least one of the words is in the dictionary and has the FIRMNAME information code.
• Enter the “lookup” form of the firm name so that the parser will find it (see “Step 1: Query the dictionary” on page 9). Otherwise, the entry will have no effect on parsing results.
To add a multi-word entry to the dictionary:
1. If the firm name looks like a personal name (for example, Hewlett Packard, Merrill Lynch, or Johnson & Johnson), see “Querying a firm name that looks like a personal name” on page 12.
2. In your transaction file, create a new entry for the “lookup” form of the firm name (see “Querying a multiple-word firm name” on page 11).
Field name   Transaction entry
Action       N
Primary      Emery Worldwide
Secondary    Emery Worldwide
Usage        0
Intl         USENGLISH
Info         FIRMNAME
Stdtype      FIRM_STD FIRM_MTC
3. Make sure that at least one of the words in the firm name (for example, Emery or Worldwide) meets the following requirements:
• The entry includes the FIRMNAME information code.
• The first firm match standard (FIRM_MTC) is the word that you used in your firm-name entry in step 2.
4. If none of the words meets the requirements in step 3, add an entry to your transaction file.
Most often, you’ll need to mark one of the words as a FIRMNAME word, as shown here.
Field name   Transaction entry
Action       A
Primary      Emery
Secondary    Emery
Usage
Intl
Info         FIRMNAME
Stdtype      FIRM_MTC FIRM_STD

Sample transaction: Add a firm that looks like a personal name

Many firms are named after people—for example, Hewlett Packard or Merrill Lynch. The parser often identifies these as personal names rather than firm
names. To correct this, you can add the firm name to the dictionary.
Two main tasks
If a firm name looks like a personal name (that is, all of the words in the line are marked as NAME words), you must do two things:
• Make sure each word is in the dictionary and has both the NAME and FIRMNAME information codes.
• Create an entry for the “lookup” form of the firm name.
To add a firm name that looks like a personal name:
1. Query the “lookup” form of the firm name (see “Querying a firm name that looks like a personal name” on page 12).
2. If the firm name is not in the dictionary, create a new entry in your transaction file. Use the lookup form as the primary and secondary.
Field name   Transaction entry
Action       N
Primary      Robert W. Baird
Secondary    Robert W. Baird
Usage        0
Intl         USENGLISH
Info         FIRMNAME
Stdtype      FIRM_MTC FIRM_STD
3. Query each word. Make sure it is in the dictionary and is identified as both a NAME and a FIRMNAME. If not, add the word (or modify it) by putting an entry in your transaction file. In our example, Robert W Baird, all three words are in the dictionary, but none has the FIRMNAME information code. For each word, we would put an entry in the transaction file to add the FIRMNAME information code, as shown here for Robert.
Field name   Transaction entry
Action       A
Primary      Robert
Secondary    Robert
Usage
Intl
Info         FIRMNAME
Stdtype      FIRM_MTC FIRM_STD

Sample transaction: Modify information codes

Suppose your data file contains the line John Smith PsyD. You notice that this line is parsed as a firm name rather than a personal name. Although the name of John Smith’s business might possibly be John Smith PsyD, you would prefer to parse this as a name rather than a firm.
When you query the dictionary, you notice that PsyD is listed as a firm word:
C:\umd /s parsing.dct
Using a Parsing Dictionary.
Enter a query, or press <ESC> to exit.
Enter> PsyD
Usage: 0
Intl Code(s): USENGLISH
Info Code(s): FIRMMISC
Standard(s) for PSYD:
- PSYD FIRM_MTC, FIRM_STD
In your custom dictionary, you could specify that PsyD is also an honorary postname (Doctor of Psychology). To do this, modify the existing entry to add the honorary-postname codes:
Field name   Transaction entry
Action       A
Primary      PsyD
Secondary    PsyD
Usage
Intl
Info         HONPOST
Stdtype      HONPOST_STD HONPOST_MTC
Notice that when you add a new information code, you must also specify at least one standard for that type of information. In this case, we specified PsyD as the standard and match standard for honorary postnames.

Sample transaction: Modify standards and standard-types

Suppose you want to standardize your data to make it as consistent as possible. In job titles, the word Engineer is standardized to Engr., but you would prefer to standardize it to Eng. instead.
In the dictionary, the title standard for Engineer is Engr.:
C:\umd /s parsing.dct
Using a Parsing Dictionary.
Enter a query, or press <ESC> to exit.
Enter> engineer
Usage: 0
Intl Code(s): USENGLISH
Info Code(s): FIRMMISC PHRASE_WRD TITLE
Standard(s) for ENGINEER:
- ENGINEER FIRM_MTC, FIRM_STD
- ENGR. TITLE_MTC, TITLE_STD
To change the title standard to Eng., you need to do two things:
• Add the standard Eng. and identify it as a title standard (TITLE_STD).
• Delete the TITLE_STD code from the standard Engr.
You would put these entries in your transaction file:
Field name   Entry to add ‘Eng.’     Entry to delete TITLE_STD code
             as a standard           from the standard ‘Engr.’
Action       A                       D
Primary      Engineer                Engineer
Secondary    Eng.                    Engr.
Usage
Intl
Info
Stdtype      TITLE_STD               TITLE_STD

Sample transaction: Add an acronym for acronym conversion

The name parser can convert a prename, postname, job title, firm name, or firm location to an acronym. The parser produces an acronym only when one is available in the parsing dictionary—it does not generate initials by algorithm or rule.
How the parser generates acronyms
Before looking for an acronym, the parser removes all punctuation and noise words and gets the first appropriate match standard for each word. You must use the same phrase that the parser will actually look up. Otherwise, the parser won’t find your entry and won’t generate the acronym.
To add an acronym:
Procedure (the example phrase is shown after each step)
1. Start with the raw phrase.
   Example: Certified Residential Specialist
2. Remove the words and, or, of, the, and for.
   Example: Certified Residential Specialist
3. For firm names, remove firm terminator words such as Corp, Inc, Ltd, Co, etc.
   Example: Certified Residential Specialist
4. Query each remaining word. Get the first appropriate match standard for each (see the note below). For example, if you are adding a firm match standard, get the first FIRM_MTC.
   Example: Cert. Residential Specialist
5. Remove all punctuation. This is the lookup form of the acronym phrase.
   Example: Cert Residential Specialist
Note: If the word is not in the dictionary, create a new entry for the word (see page 18). If the word is in the dictionary but does not list an appropriate match standard, create an entry to add the appropriate information code and match-standard type (see pages 23 and 24). For example, for the word Residential we would add the information code HONPOST and the standard-type code HONPOST_MTC.
To add the phrase to the dictionary, put an entry in your transaction file. Use the lookup form of the phrase as the primary, and use the acronym itself as the secondary:
Field name   Transaction entry
Action       N
Primary      Cert Residential Specialist
Secondary    CRS
Usage        0
Intl         USENGLISH
Info         HONPOST
Stdtype      HONPOST_ACR
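The five-step procedure for deriving the lookup form can be sketched in Python. This is only an illustration, not part of UMD. The function names and the SAMPLE_MATCH_STANDARDS table are hypothetical, and the table stands in for the match standards you would find by querying each word with UMD Show.

```python
import string

# Noise words listed in step 2 of the procedure.
NOISE_WORDS = {"and", "or", "of", "the", "for"}
# Firm terminator examples from step 3 (the manual's list ends with "etc.").
FIRM_TERMINATORS = {"corp", "inc", "ltd", "co"}

# Hypothetical stand-in for dictionary queries: uppercase word -> first
# appropriate match standard, as you would read it from UMD Show output.
SAMPLE_MATCH_STANDARDS = {
    "CERTIFIED": "Cert.",
    "RESIDENTIAL": "Residential",
    "SPECIALIST": "Specialist",
}


def strip_punct(word):
    """Remove all ASCII punctuation from a word."""
    return word.translate(str.maketrans("", "", string.punctuation))


def lookup_form(raw_phrase, match_standards, is_firm=False):
    """Apply steps 1-5 to a raw phrase and return the lookup form."""
    kept = []
    for word in raw_phrase.split():
        bare = strip_punct(word).lower()
        if bare in NOISE_WORDS:
            continue  # step 2
        if is_firm and bare in FIRM_TERMINATORS:
            continue  # step 3
        kept.append(word)
    # Step 4: substitute the match standard; keep the word if none is known.
    standards = [match_standards.get(strip_punct(w).upper(), w) for w in kept]
    # Step 5: remove punctuation.
    cleaned = (strip_punct(s) for s in standards)
    return " ".join(s for s in cleaned if s)
```

With the sample table, lookup_form("Certified Residential Specialist", SAMPLE_MATCH_STANDARDS) yields the primary used in the transaction entry above.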

Rules for working with match standards

Each entry in the parsing dictionary may include one or more match standards. You can use match standards to improve the performance of your matching or merge/purge software.
How match standards work
To simplify this discussion, we discuss match standards for personal names. However, match standards are also available for other types of data.
In the dictionaries, a match standard is a one-way relationship, a pointer from one name to another:
[Figure: diagram of one-way match-standard pointers among the names Alberto, Albert, Allen, Alan, Alfredo, Alfred, Alex, Alexander, Al, Alphonso, Alphonse, Alonzo, and Almon.]
•For the name Al, the match standards are Albert, Alan, Alfred, Alexander, Alphonse, and Almon.
•For the name Alberto, the match standard is Albert. (Likewise, for Allen the match standard is Alan; for Alfredo, Alfred; and so on.)
If two different names return the same match standard, you can use your matching software to do multiway comparisons and find a match. For example, since Alberto and Al both return Albert as a match standard, your matching software could match Alberto Smith to Al Smith.
Here are partial dictionary entries for the name Al and its direct match standards.
Primary Standard
ALBERT ALBERT
ALAN ALAN
ALFRED ALFRED
ALEXANDER ALEXANDER
ALPHONSE ALPHONSE
ALMON ALMON
AL ALBERT, ALAN, ALFRED, ALEXANDER, ALPHONSE
Notice that each match standard has its own entry, and that in that entry, the standard is the same as the primary.
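The multiway comparison described above can be sketched as a set intersection. This is a minimal illustration, not the behavior of any particular matching product. The MATCH_STANDARDS table is a hand-copied, hypothetical subset of the entries discussed in this section, and falling back to the name itself when a name is not in the table is an assumption made for the example.

```python
# Hypothetical subset of match standards, keyed by uppercase name.
MATCH_STANDARDS = {
    "AL": {"ALBERT", "ALAN", "ALFRED", "ALEXANDER", "ALPHONSE", "ALMON"},
    "ALBERTO": {"ALBERT"},
    "ALLEN": {"ALAN"},
    "ALFREDO": {"ALFRED"},
    "ALBERT": {"ALBERT"},
    "ALAN": {"ALAN"},
}


def standards_for(name):
    """Return the match standards for a name (assumed: itself if unknown)."""
    key = name.upper()
    return MATCH_STANDARDS.get(key, {key})


def first_names_match(a, b):
    """Two names match if they share at least one match standard."""
    return bool(standards_for(a) & standards_for(b))
```

Because the pointers are one-way, Alberto and Allen do not match each other even though each of them matches Al.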
Working with match standards
A word that you use as a match standard should have its own entry in the dictionary or in the transaction file (see note 6). In that entry, the word must be a match standard of itself. In other words, the match standard must be the same as the query word.
For example, you could use the word Dr as a match standard because it is in the dictionary and has itself, Dr, as a match standard:
Enter a query, or press <ESC> to exit.
Enter> Dr
Usage: 0
Intl Code(s): USENGLISH
Info Code(s): HONPOST_ALONE PREGEN3 PRENAME_ALONE
Standard(s) for DR:
- DR. HONPOST_MTC, HONPOST_STD, PRENAME_MTC, PRENAME_STD
If a word is a match standard of itself, you can use it as the same type of match standard for another word.
Field name   Transaction entry
Action       A
Primary      Doc
Secondary    DR.
Usage
Intl
Info         PRENAME PREGEN3
Stdtype      PRENAME_STD PRENAME_MTC
Spelling and punctuation. The spelling and punctuation of the Secondary in the transaction entry must exactly match the Standard in the existing dictionary entry.
6. Technically, you could also use a word as a match standard if that word does not have an entry in the dictionary. For example, you could use Michelangelo as a match standard even though Michelangelo is not in the dictionary. In practice, however, if you use a word as a match standard, you’ll probably also want that word to have its own entry in the dictionary, so we make that assumption in our guidelines.
Chapter 2:

Custom capitalization dictionaries

This chapter explains how to create and maintain custom capitalization dictionaries.
What is a capitalization dictionary?
In a custom capitalization dictionary, you can specify the correct casing for a word in different situations. For example, you can specify that when MCKAYE is used as a last name, the casing should be McKaye.
Why create a custom dictionary?
Most users find that our capitalization dictionary, pwcap.dct, produces good mixed-case results. However, if a word is not cased as you would like, you can enter that word in a custom capitalization dictionary.
For example, if you want the word TECHTEL to be cased as TechTel, you could add the word TechTel to your custom dictionary.
Most of our products allow you to use two capitalization dictionaries at once, so we expect that most users will employ our base dictionary “as is” and build their own, separate dictionary as an extension. When you use your dictionary, you can give it priority over ours by specifying your dictionary as Dictionary #2.
Create transactions, build your dictionary
For each entry you want to place in your capitalization dictionary, you will create a record, or transaction, in a database called a transaction file.
After you make all of your entries in the transaction file, you will run the UMD Build process. UMD Build reads the entries from your transaction file and creates your custom dictionary.
Query our dictionary or yours
You can look up words in our dictionary, pwcap.dct, or your custom dictionary. For example, if you want to see how we capitalize the word PHD, you can query the dictionary:
c:\umd /s pwcap.dct
Using a Capital Dictionary.
Enter a query, or press <ESC> to exit.
Enter> PHD
PHD is capitalized as follows:
-PhD is used with EVERY occurrence.
For more details about querying a capitalization dictionary, see “Querying your dictionary” on page 33.

Step 1: Create a capitalization transaction file

A transaction file is a database that contains all of your entries for a particular custom capitalization dictionary. Each entry in your transaction file will create one entry in your custom dictionary.
If you are working with an existing custom dictionary, use the existing transaction file for that dictionary. Do not create more than one transaction file for each custom dictionary.
Create a transaction database
The quickest, easiest way to create a transaction database and its supporting files is to use the “output file” feature of UMD Show. (See “UMD Show” on page 41.)
1. Use UMD Show to query our base capitalization dictionary, pwcap.dct. Include the o option on the command line. Use the base file name that you plan to use for your custom dictionary, but with the extension .trn (for example, my_cap.trn). See note 7 for advice on choosing a file type.
2. Query a word that is in the dictionary, such as PhD.
3. Press <Enter> to save the query to your output file. Then press <Esc> to exit.
C:\umd /s pwcap.dct /o my_cap.trn /d dBase3
Using a Capital Dictionary
Enter a query, or press <ESC> to exit.
Enter> phd
PHD is capitalized as follows:
-PhD is used with EVERY occurrence
Enter a query, or press <Enter> to save, or press <ESC> to exit
Enter> <Enter>
Previous query appended to C:\my_cap.trn.
Enter a query, or press <ESC> to exit.
Enter> <Esc>
UMD Show will create an output database file—for example, my_cap.trn. You can use this database as your transaction file. For instructions on adding your entries to the transaction file, see “Step 2: Put your entries in the transaction file” on page 31.
Keep supporting files with transaction file
When you create a transaction database as described above, UMD Show creates a supporting file such as my_cap.def. If the transaction file is ASCII or delimited-ASCII, UMD Show also creates an additional file such as my_cap.fmt or my_cap.dmt.
To open and read the transaction file, UMD needs these files. If you move the transaction file to a new location, make sure you also move the corresponding supporting files.
7. If you plan to use a database program or spreadsheet program to edit the file, we recommend creating a dBASE3 or ASCII file. If you plan to use a text editor or word processor to edit the file, we recommend creating a delimited file. However, be aware that our UMD Views Windows program does not support updating of delimited files.

Step 2: Put your entries in the transaction file

For each word you want to add to your custom capitalization dictionary, create a record in your transaction file. Use a text editor or a database program to add records to your transaction file.
Capitalization transaction entries
The table below describes what information to put in each field in your transaction file.
Field       Data to enter
Action      Choose one:
            N   Create a new entry.
            D   Delete the existing entry from the source dictionary (see note 1).
Primary     Type the word in the preferred casing, 54 characters maximum. Type a single word (no spaces). Do not include any punctuation.
Attribute   Specify when this casing should be used. Include all that apply, separated by one space (no punctuation):
            PRENAME       (short form PRE)    Prenames
            FIRSTNAME     (short form FIRS)   First names
            LASTNAME      (short form LAS)    Last names
            PRELASTNAME   (short form PREL)   Last-name prefixes
            POSTNAME      (short form POS)    Postnames
            TITLE         (short form TITL)   Job titles
            FIRM          (short form FIRM)   Firm data
            ADDRESS       (short form ADD)    Address lines (see note 2)
            CITY          (short form CIT)    City names (see note 2)
            STATE         (short form STA)    State names (see note 2)
            FINANCIAL     (short form FIN)    Financial terms
            EVERY         (short form EVE)    Every occurrence
            You may type the entire word or just the short form (for example, FIRSTNAME or FIRS).
            If the Action field contains D, you may leave this field blank.
Notes:
1. This command is used rarely, if ever. If you want to delete an entry from your custom dictionary, simply delete that record from your transaction file, then rebuild the dictionary. If you don’t like the casing for a word in our base dictionary, pwcap.dct, you don’t need to delete the entry from our dictionary. Instead, put the desired casing in your custom dictionary. When you process data, specify your dictionary as Dictionary #2 so that your entry will override ours.
2. Do not specify the ADDRESS, CITY, or STATE attribute unless the product that uses the dictionary has address-parsing capability.
Sample entries Here are some sample entries.
Action Primary Attribute
N        dos        PRELASTNAME
N        McCathie   EVERY
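Before you build, it can help to check entries against the field rules in the table above. The following Python sketch is a hypothetical helper, not a UMD feature. The short attribute forms are assumed from the table as reproduced in this guide, and requiring at least one attribute for N entries is an inference from the rule that the field may be blank for D entries.

```python
import string

# Full attribute names and their short forms (assumed from the table above).
ATTRIBUTES = {
    "PRENAME": "PRE", "FIRSTNAME": "FIRS", "LASTNAME": "LAS",
    "PRELASTNAME": "PREL", "POSTNAME": "POS", "TITLE": "TITL",
    "FIRM": "FIRM", "ADDRESS": "ADD", "CITY": "CIT", "STATE": "STA",
    "FINANCIAL": "FIN", "EVERY": "EVE",
}


def normalize_attribute(token):
    """Return the full attribute name, or None if the token is not recognized."""
    upper = token.upper()
    for full, short in ATTRIBUTES.items():
        if upper in (full, short):
            return full
    return None


def check_entry(action, primary, attributes=""):
    """Return a list of problems found in one capitalization entry."""
    problems = []
    if action not in ("N", "D"):
        problems.append("Action must be N or D")
    if not 1 <= len(primary) <= 54:
        problems.append("Primary must be 1 to 54 characters")
    if any(ch.isspace() for ch in primary):
        problems.append("Primary must be a single word (no spaces)")
    if any(ch in string.punctuation for ch in primary):
        problems.append("Primary must not include punctuation")
    tokens = attributes.split()
    if action == "N" and not tokens:
        problems.append("N entries need at least one attribute (assumed rule)")
    for token in tokens:
        if normalize_attribute(token) is None:
            problems.append(f"Unknown attribute: {token}")
    return problems
```

An empty list means the entry passed these checks. UMD Build performs its own verification, so treat this only as a quick pre-check.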

Step 3: Build your custom capitalization dictionary

After you put all of your entries in your transaction file, use UMD Build to build your custom capitalization dictionary.
UMD Build
To build your dictionary, run UMD Build. The easiest way to convey your instructions to UMD is through the UMD configuration file.
1. Open a copy of the configuration file umd.cfg.
2. Type your instructions in the UMD Build parameters. For descriptions of the parameters, see Appendix A.
# UMD Show
Output File Name (path & file name) .... =
Output File Type (See NOTE) ............ =
#
# UMD Build
Dictionary Type (see NOTE) ............. = Capital
Source Dictionary (path & dct) ......... =
Transaction File Name (path & file name) = c:\pathname\my_cap.trn
Target Dictionary (path & dct) ......... = c:\pathname\my_cap.dct
Verify Input File only (YES/NO) ........ = NO
Error Message Log File (path & name) ... = c:\pathname\my_cap.log
Work Directory (path) .................. = r:\temp
...
3. Save the configuration file. We recommend using the same base file name as your dictionary, but with the extension .cfg—for example, my_cap.cfg.
4. Run UMD with the cfg option. For example:
umd /cfg my_cap.cfg
During the build process, UMD reads the entries from your transaction file and creates your custom dictionary.
Tips:
We recommend that you use the same base file name for the transaction file and custom dictionary, and store both files in the same location.
We recommend that you accumulate all your custom entries in one transaction file and build your custom dictionary from the transaction file only. If you do this, you will not need to specify a source dictionary when you run UMD Build.
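If you maintain several dictionaries, you may prefer to fill in a copy of the configuration file with a script. The Python sketch below is one possible approach under stated assumptions. The function name fill_config is hypothetical, and the script only rewrites the text to the right of each equal sign, leaving the labels to the left untouched as the configuration file requires.

```python
def fill_config(template_text, values):
    """Return config text with values placed to the right of matching labels.

    values maps the leading words of a parameter label (for example,
    "Dictionary Type") to the value to write after the equal sign.
    Comment lines (starting with #) and unmatched lines are left unchanged.
    """
    out = []
    for line in template_text.splitlines():
        left, sep, _old_value = line.partition("=")
        if sep and not line.lstrip().startswith("#"):
            for label, value in values.items():
                if left.startswith(label):
                    line = f"{left}= {value}"
                    break
        out.append(line)
    return "\n".join(out) + "\n"


# Example: fill in part of the UMD Build section from a template string.
TEMPLATE = (
    "# UMD Build\n"
    "Dictionary Type (see NOTE) ............. =\n"
    "Transaction File Name (path & file name) =\n"
    "Target Dictionary (path & dct) ......... =\n"
)
filled = fill_config(TEMPLATE, {
    "Dictionary Type": "Capital",
    "Transaction File Name": r"c:\pathname\my_cap.trn",
    "Target Dictionary": r"c:\pathname\my_cap.dct",
})
```

In practice you would read the template from your copy of umd.cfg and write the result to a file such as my_cap.cfg.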

Step 4: Update your custom dictionary

If you want to add a new entry to your custom dictionary, edit your transaction file, then rebuild your custom dictionary.
Use your existing transaction file
To update an existing dictionary, add your new entries to your existing transaction file. Your custom dictionary will be much easier to manage if you accumulate all of your entries in one transaction file, rather than scattering them among many files.
To add a word, create a record for that word in the transaction file. To delete a word, delete the record for that word from the transaction file. For dBASE3 files, UMD supports non-destructive delete marking.
Rebuild your custom dictionary
After you add your new entries, run UMD Build as instructed on page 32. UMD will rebuild your custom dictionary based on your updated transaction file.
Querying your dictionary
You may wish to query your custom capitalization dictionary to see whether it contains a particular word. To query a dictionary, run UMD Show (see “UMD Show” on page 41). For example:
umd /s my_cap.dct
If you look up a word that is in the dictionary, UMD displays the preferred casing and tells you when that casing is used:
c:\umd /s my_cap.dct
Using a Capital Dictionary.
Enter a query, or press <ESC> to exit.
Enter> TECHTEL
TECHTEL is capitalized as follows:
-TechTel is used with FIRM occurrences.
Chapter 3:

Search-and-replace tables

You can use UMD to create search-and-replace tables for use with our DataRight product. The process consists of two main steps:
1. Create a search-and-replace transaction file
2. Build the search-and-replace table

Step 1: Create a search-and-replace transaction file

A search-and-replace transaction file is a database that contains all of your search-and-replace entries for a particular table. You can use an existing database, or you can create a new database.
If you are working with an existing table, use the existing transaction file for that table. Do not create more than one transaction file for each table. Your table will be much easier to manage if all your entries are in one transaction file, rather than scattered among many files.
Transaction file
A search-and-replace transaction file must be a fixed-length ASCII, dBASE3, or delimited-ASCII database (see note 1). You may create a new database or use an existing database.
The database must contain a field for the search value and a field for the replacement value. Each field may contain up to 128 characters. It is allowable for the file to contain additional fields, but those fields will be ignored.
Transaction entries
In your transaction file, enter the following information for each entry:
Field       Description
Primary     The search value. 128 characters maximum. Case, punctuation, leading spaces, and spaces between words will be respected.
Secondary   The replacement value. 128 characters maximum. Case, punctuation, leading spaces, and spaces between words will be respected. To indicate a replacement value of nothing, leave the Secondary field empty.
Here are a few sample entries:
Primary   Secondary
01        Mr.
02        Ms.
03        Mrs.
1. If you plan to use a database program or spreadsheet program to edit the file, we recommend creating a dBASE3 or fixed-length ASCII file. If you plan to use a text editor or word processor to edit the file, we recommend creating a delimited file. However, be aware that our UMD Views program does not support updating of delimited files.
Supporting files
To open and read a transaction file, UMD requires supporting files that describe the format of the file. You must create these supporting files and store them in the same location as your transaction file. To create the files, use a text editor or word processor (save the file as text-only).
Definition file (DEF)
The definition file defines the type of database and the names of the fields containing the search (primary) and replacement (secondary) values.
• On the first line, type Database Type =, then type the appropriate database type: ASCII, dBASE3, or Delimited.
• On the second line, type User:Primary =, then type the name of the field that contains the search value.
• On the third line, type User:Secondary =, then type the name of the field that contains the replacement value.
For example:
Database Type = dBase3
User:Primary = Search
User:Secondary = Replace
When you save the file, use the same base file name as your transaction file, but with the file extension .def. For example, if the transaction file is named table.drl, then name the definition file table.def.
If your database contains additional fields, do not define those fields in the DEF file.
Format file (FMT or DMT)
A format file describes the structure of the database. It lists the name, length, and data-type of each field (usually c for character).
Create a text file that contains one line for each field in the database. On each line, type the field name, length, and data-type, separated by commas.
If the database is delimited-ASCII, also specify the ASCII code values for the delimiters used in the file.
For example:
Search, 20, c
Replace, 20, c
Record Delimiter = 013 010
Field Delimiter = 044
Field Framing Character = 034
When you save the format file, use the same base file name as the transaction file. If the transaction file is fixed-length ASCII, use the file extension .fmt. If the transaction file is delimited ASCII, use the file extension .dmt. For example, if the transaction file is table.trn, then name the format file table.fmt or table.dmt.
If your database contains additional fields, you must define those fields in the format file.
For complete information about DEF, FMT, and DMT files, see our Database Prep manual.
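The supporting files are plain text, so they can also be generated by a short script. The Python sketch below is an illustration only. The function names are hypothetical, the field names Search and Replace are taken from the examples above, and the delimiter lines simply repeat the sample values shown for a delimited file (decimal ASCII 013 010 for carriage return and line feed, 044 for a comma, 034 for a double quote).

```python
import os


def def_file_text(db_type, search_field, replace_field):
    """Build the three lines of a definition (DEF) file."""
    return (
        f"Database Type = {db_type}\n"
        f"User:Primary = {search_field}\n"
        f"User:Secondary = {replace_field}\n"
    )


def format_file_text(fields, delimited=False):
    """Build a format file: one 'name, length, type' line per field.

    For delimited files, append the sample delimiter settings from this guide.
    """
    lines = [f"{name}, {length}, {dtype}" for name, length, dtype in fields]
    if delimited:
        lines += [
            "Record Delimiter = 013 010",
            "Field Delimiter = 044",
            "Field Framing Character = 034",
        ]
    return "\n".join(lines) + "\n"


def support_file_names(transaction_file, db_type):
    """Return the supporting file names that go with a transaction file."""
    base, _ext = os.path.splitext(transaction_file)
    names = [base + ".def"]
    kind = db_type.lower()
    if kind == "ascii":
        names.append(base + ".fmt")   # fixed-length ASCII
    elif kind == "delimited":
        names.append(base + ".dmt")   # delimited ASCII
    return names
```

For table.trn stored as a delimited file, support_file_names returns table.def and table.dmt, which you would save in the same location as the transaction file.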

Step 2: Build the search-and-replace table

After you put all of your entries in the transaction file, use UMD Build to build the search-and-replace table.
UMD Build
To build your table, run UMD Build. The easiest way to convey your instructions to UMD is through the UMD configuration file.
1. Open a copy of the configuration file umd.cfg.
2. Type your instructions in the UMD Build parameters.
# UMD Show
Output File Name (path & file name) .... =
Output File Type (See NOTE) ............ =
#
# UMD Build
Dictionary Type (see NOTE) ............. = Generic
Source Dictionary (path & dct) ......... =
Transaction File Name (path & file name) = c:\pathname\table.trn
Target Dictionary (path & dct) ......... = c:\pathname\table.dct
Verify Input File only (YES/NO) ........ = NO
Error Message Log File (path & name) ... = c:\pathname\table.log
Work Directory (path) .................. = r:\temp
...
For descriptions of the parameters, see “UMD configuration file, umd.cfg” on page 39.
3. Save the configuration file. We recommend you use the same base file name as your search-and-replace table, but with the extension .cfg—for example, table.cfg.
4. Run UMD with the cfg option. For example:
umd /cfg table.cfg
Tips:
• When you name your search-and-replace table, use the extension .dct.
• Use the same base file name for the transaction file and the search-and-replace table, and store both files in the same location. That way, you can tell at a glance which transaction file goes with which table.
• For each table, we recommend that you accumulate all of your entries in one transaction file. If you do this, you will not need to specify a “source” dictionary when you run UMD Build.
Appendix A:

UMD configuration file, umd.cfg

umd.cfg
Rather than type a long command line, you can use the UMD configuration file. Make a copy of umd.cfg and save it under a different file name. Then edit and use your copy.
# UMD Show
Output File Name (path & file name) .... =
Output File Type (See NOTE) ............ =
#
# UMD Build
Dictionary Type (see NOTE) ............. =
Source Dictionary (path & dct) ......... =
Transaction File Name (path & file name) =
Target Dictionary (path & dct) ......... =
Verify Input File only (YES/NO) ........ =
Error Message Log File (path & name) ... =
Work Directory (path) .................. =
#
# Dictionary Types:
#   parsing
#   generic
#   capital
#
# Output File Types:
#   delimited
#   ascii
#   dbase3
Guidelines for editing the configuration file
In the configuration file, do not edit anything to the left of the equal signs. To insert comments, prefix them with a pound sign (#). Complete either the UMD Show section or the UMD Build section, not both. (Exception: For UMD Show, specify the dictionary at the Source Dictionary parameter.)
Command line
When you run UMD, include the configuration file as a parameter on the UMD command line:
Platform            Command line
UNIX, VMS           umd -cfg cfg_file.cfg
Windows, Alpha NT   umd /cfg cfg_file.cfg
Output File Name, Output File Type
These parameters are for UMD Show mode only. If you want to record query results in an output file, enter the path and file name. Specify a file type of ASCII, dBASE3, or Delimited (see the note below).
Note: If you plan to use the output file as a transaction file, we recommend the following: If you plan to use a database program or spreadsheet program to edit the file, create a dBASE3 or ASCII file. If you plan to use a text editor or word processor to edit the file, create a delimited file. However, be aware that our UMD Views Windows program does not support updating of delimited files.
Dictionary Type
Enter the type of dictionary you want to modify. Possible dictionary types are Parsing, Capital, and Generic (search-and-replace).
Source Dictionary
If you are building a custom dictionary or table, enter the path and file name of the source dictionary:
• If you are creating a parsing dictionary, specify our parsing dictionary parsing.dct.
• If you are creating a capitalization dictionary or search-and-replace table, you should usually leave this blank.
If you are querying an existing dictionary (UMD Show), specify the path and file name of the dictionary you want to query.
Transaction File Name
Type the path and file name of the transaction file containing your entries.
Target Dictionary
Type the path and file name of the custom dictionary you want to create. If the file already exists, UMD overwrites the existing file (see the note on backup copies below). If you do not specify a target, UMD uses the source dictionary as the target.
Do not overwrite any of our base dictionaries. Instead, give your custom dictionary a separate name. Each time you install a software update, we overwrite our base dictionaries. If you use our file names for your dictionaries, your custom dictionaries may be overwritten.
Verify Input File Only
If you set this option to Yes, UMD checks all the entries in the transaction file but does not actually produce the target dictionary. This is handy if you want to verify during the day and run the build process during the night.
If you set this option to No, UMD checks the entries in the transaction file. If no verification errors occur, UMD builds the target dictionary.
Error Message Log File
We recommend that you specify an error log file. UMD will write any error or warning messages to the log file so you can review them later.
If you leave this parameter blank, UMD sends error and warning messages to the screen (standard output). If any messages scroll off the screen, you will not be able to retrieve them.
Work Directory
By default, UMD places its temporary work files in the current directory. If you would like to use some other location, specify a path.
To estimate the space required for work files, use this formula:
Work space = 4 x (size of transaction file + size of source dictionary)
Note on backup copies: Before overwriting an existing dictionary, UMD makes a backup copy of the existing file. For example, if the dictionary is named custom.dct, UMD creates a backup file named custom.001. The next time, UMD creates a backup named custom.002, and so on up to custom.999.
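The work-space formula given for the Work Directory parameter is simple arithmetic, sketched here in Python. The function names are hypothetical. The result is in whatever unit the input sizes use (bytes when the sizes come from the file system), and a source dictionary size of zero is assumed when no source dictionary is specified.

```python
import os


def work_space_bytes(transaction_size, source_dictionary_size=0):
    """Work space = 4 x (size of transaction file + size of source dictionary)."""
    return 4 * (transaction_size + source_dictionary_size)


def work_space_for_files(transaction_file, source_dictionary=None):
    """Apply the formula to files on disk, using their sizes in bytes."""
    source_size = os.path.getsize(source_dictionary) if source_dictionary else 0
    return work_space_bytes(os.path.getsize(transaction_file), source_size)
```

For example, a 2,000,000-byte transaction file and a 10,000,000-byte source dictionary give an estimate of 4 x 12,000,000 = 48,000,000 bytes.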
Appendix B:

UMD command line

You can use one of three command lines with UMD:
UMD Show, for querying an existing dictionary or table
UMD Build, for verifying and building your user-modifiable dictionary
UMD Config, for using the UMD configuration file.
UMD Show
You can query an existing dictionary or table by using the UMD Show command line.
Platform            Command line
UNIX, VMS           umd -s dct_file.dct [-o out_file] [-d db_type]
Windows, Alpha NT   umd /s dct_file.dct [/o out_file] [/d db_type]
Parameter        Description
s dct_file.dct   Path and file name of the dictionary to query.
o out_file       Path and file name of the output file. If you save a query, UMD writes it to this file. If the file already exists, UMD appends to the end of the file. Note: You can edit the output file and use it as a transaction file (see the note below).
d db_type        Database type for the output file. Choose one: dBASE3, ASCII (default), or Delimited.
Note: If you plan to use the output file as a transaction file, we recommend the following: If you plan to use a database program or spreadsheet program to edit the file, create a dBASE3 or ASCII file. If you plan to use a text editor or word processor to edit the file, create a delimited file. However, be aware that our UMD Views program does not support updating of delimited files.
UMD Build
If you prefer not to use the configuration file, you can place all the UMD Build parameters on the command line.

Platform             Command line
UNIX, VMS            umd dct_type -i trans [-s source] [-t target] [-e err_log] [-p work] [-v]
Windows, Alpha NT    umd dct_type /i trans [/s source] [/t target] [/e err_log] [/p work] [/v]

Parameter    Description
dct_type     Dictionary type: Parsing, Capital, or Generic (search-and-replace).
i trans      Path and file name of the transaction file containing your custom entries.
s source     Path and file name of the source dictionary to use as a base for your custom dictionary.
t target     Path and file name of the custom dictionary to create. If the file already exists, UMD will overwrite it. (1) If you do not specify a target, UMD uses the source dictionary as the target.
e err_log    Log file for validation warnings and errors. We recommend that you include this parameter.
p work       Path and directory to use for temporary storage of work files. To estimate space requirements, use this formula:
             Work space = 4 x (size of transaction file + size of source dictionary)
v            Verify only. If you include this option, UMD checks all the entries in the transaction file but does not actually produce the target dictionary.

1. Before overwriting an existing dictionary, UMD makes a backup copy of the existing file. For example, if the dictionary is named custom.dct, UMD creates a backup file named custom.001. The next time, UMD creates a backup named custom.002, and so on up to custom.999.

UMD Config
Rather than type the UMD Show or UMD Build command line, you can specify file names and options in the UMD configuration file (see “” on page 39).

To run UMD with the configuration file, use the following command:

Platform             Command line
UNIX, VMS            umd -cfg cfg_file.cfg
Windows, Alpha NT    umd /cfg cfg_file.cfg
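For UMD Build, the work-space formula and the backup naming pattern (custom.001, custom.002, and so on) are easy to check before a run. The following is a minimal sketch; the function names are hypothetical, and it assumes that sizes are given in bytes (the formula itself only says "size", so the result is in whatever unit you pass in).

```python
import os


def estimate_work_space(trans_size, source_size):
    """Work space = 4 x (size of transaction file + size of source dictionary)."""
    return 4 * (trans_size + source_size)


def backup_name(dictionary, generation):
    """Name of the Nth backup of a dictionary, e.g. custom.dct -> custom.001.

    Only the naming pattern is illustrated here; UMD itself chooses the
    generation number when it overwrites a dictionary.
    """
    if not 1 <= generation <= 999:
        raise ValueError("backups are numbered 001 through 999")
    stem, _extension = os.path.splitext(dictionary)
    return "%s.%03d" % (stem, generation)


# Example with made-up sizes: a 1,000-byte transaction file and a
# 5,000-byte source dictionary.
print(estimate_work_space(1000, 5000))
print(backup_name("custom.dct", 2))
```

To apply the estimate to real files, you could pass os.path.getsize() of the transaction file and of the source dictionary as the two sizes.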
Appendix C: Information codes and standard-type codes

Information codes
If you’re creating a parsing transaction entry, type the appropriate information codes (or “info codes”) in the Info field. Put one space (no punctuation) between codes.
If you’re using UMD Show to query a parsing dictionary, these are the codes shown in the Info Codes field.
Information code Description
FIRMINIT When used in a firm name, likely to be the first word in the firm name.
FIRMLOC A location within a firm (usually used for internal mail delivery), such as Department, Mailstop, Room, Building.
FIRMMISC A word used in firm names.
FIRMNAME This code is used for firm names that may be parsed incorrectly. For example, Hewlett Packard could be incorrectly parsed as a personal name, so Hewlett, Packard, and Hewlett Packard are all listed as Firm Name words.
FIRMTERM Likely to be the last word in a firm name, such as Inc, Corp, Ltd, and so on.
HONPOST A postname that signifies certification, academic degree, or affiliation, such as CPA, PhD, or USNR.
MATURPOST A maturity postname such as Jr or Sr.
NAME A first name or last name, such as John or Smith.
    Note: The entry must also include one of the NAMEGEN codes.
NAMEDESIG A name designator such as Attn or c/o.
NAMEGEN1-5 The gender of the name.
    • NAMEGEN1 94 to 100 percent chance the person is a man (e.g., Robert).
    • NAMEGEN2 70 to 93 percent chance the person is a man (e.g., Adrian).
    • NAMEGEN3 The name does not reliably indicate gender, or is a last name.
    • NAMEGEN4 70 to 93 percent chance the person is a woman (e.g., Lynn).
    • NAMEGEN5 94 to 100 percent chance the person is a woman (e.g., Anne).
    Note: The entry must also include the NAME code.
NAMESPEC A word that may appear in a name line, such as Family, Resident, Occupant.
NUMBER A number word, such as One, First, or 1st.
PHRASE_WRD A word that is part of a phrase. For example, the dictionary contains an entry for the phrase VP Mktg. Each word in the phrase (VP and Mktg) is marked as a PHRASE_WRD.
PREGEN1-5 The gender of a prename.
    • PREGEN1 Masculine. For example, Mr or Senor.
    • PREGEN3 Neutral. For example, Dr or Capt.
    • PREGEN5 Feminine. For example, Ms, Mrs, or Senora.
    Note: The entry must also include the PRENAME code.
PRELAST A last-name prefix, such as Van Allen or O’Connor.
PRENAME A prename, such as Mr, Ms, Senor, Senora, Dr, Capt.
    Note: The entry must also include one of the PREGEN codes.
REGION A geographical word such as North, Western, Minnesota, or NY.
TITLE A word used in a job title, such as Software or Engineer.
FINAN A financial term, such as Custodian or Trustee.
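The pairing rules in the notes above (NAME with a NAMEGEN code, PRENAME with a PREGEN code) and the one-space rule for the Info field can be checked before you build a dictionary. The following is a minimal sketch of such a check; check_info_codes is a hypothetical helper, not a UMD function, and it does not replace the validation that UMD Build performs.

```python
import re

NAMEGEN = {"NAMEGEN%d" % n for n in range(1, 6)}
PREGEN = {"PREGEN%d" % n for n in range(1, 6)}
CODE = re.compile(r"[A-Z0-9_]+")  # letters, digits, underscore (as in PHRASE_WRD)


def check_info_codes(info_field):
    """Return a list of problems found in the contents of an Info field."""
    problems = []
    codes = info_field.split(" ")  # exactly one space between codes
    for code in codes:
        if not CODE.fullmatch(code):
            problems.append("bad separator or code: %r" % code)
    present = set(codes)
    # NAME and NAMEGEN1-5 must appear together.
    if ("NAME" in present) != bool(present & NAMEGEN):
        problems.append("NAME and a NAMEGEN code must appear together")
    # PRENAME and PREGEN1-5 must appear together.
    if ("PRENAME" in present) != bool(present & PREGEN):
        problems.append("PRENAME and a PREGEN code must appear together")
    return problems


print(check_info_codes("NAME NAMEGEN1"))  # no problems
print(check_info_codes("PRENAME"))        # missing PREGEN code
```

The sketch checks only the form of each code and the two pairing rules; it does not verify that a code is one of those listed in the table.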
Standard-type codes
If you’re creating a parsing transaction entry, type the appropriate standard-type codes in the Stdtype field. Put one space (no comma) between codes.
If you’re using UMD Show to query a parsing dictionary, these are the codes shown next to each Standard.
In this table, we use the terminology Primary and Secondary. If you are using UMD Show, the Primary is the word you queried and the Secondary is the Standard.
Standard-type code Description
ALL_TEXT_TYPES If a text standard (STD) is not indicated for a particular type of data, use this standard as the default. Usually used with NUMBER and REGION words. For example, if a word is parsed as a firm name but the dictionary does not list a FIRM_STD for the word, then the ALL_TEXT_TYPES standard is used as the text standard.
    Note: ALL_TEXT_TYPES is used as a default text standard (STD) only. It is not used as a default match standard or acronym.
FIRM_ACR If the Primary is parsed as a firm name, use this Secondary as the acronym.
FIRM_MTC If the Primary is parsed as a firm name, use this Secondary as the match standard.
FIRM_STD If the Primary is parsed as a firm name, use this Secondary as the standardized form.
FIRMLOC_ACR If the Primary is parsed as a firm location, use this Secondary as the acronym.
FIRMLOC_MTC If the Primary is parsed as a firm location, use this Secondary as the match standard.
FIRMLOC_STD If the Primary is parsed as a firm location, use this Secondary as the standardized form.
HONPOST_ACR If the Primary is parsed as an honorary postname, use this Secondary as the acronym.
HONPOST_MTC If the Primary is parsed as an honorary postname, use this Secondary as the match standard.
HONPOST_STD If the Primary is parsed as an honorary postname, use this Secondary as the standardized form.
MATURPOST_MTC If the Primary is parsed as a maturity postname, use this Secondary as the match standard.
MATURPOST_STD If the Primary is parsed as a maturity postname, use this Secondary as the standardized form.
NAME_MTC If the Primary is parsed as a name, use this Secondary as the match standard.
NAMEDESIG_ACR If the Primary is parsed as a name designator, use this Secondary as the acronym.
NAMEDESIG_MTC If the Primary is parsed as a name designator, use this Secondary as the match standard.
NAMEDESIG_STD If the Primary is parsed as a name designator, use this Secondary as the standardized form.
NAMESPEC_ACR If the Primary is parsed as a name-special component, use this Secondary as the acronym.
NAMESPEC_MTC If the Primary is parsed as a name-special component, use this Secondary as the match standard.
NAMESPEC_STD If the Primary is parsed as a name-special component, use this Secondary as the standardized form.
PRELAST_MTC If the Primary is parsed as a last-name prefix, use this Secondary as the match standard.
PRELAST_STD If the Primary is parsed as a last-name prefix, use this Secondary as the standardized form.
PRENAME_ACR If the Primary is parsed as a prename, use this Secondary as the acronym.
PRENAME_MTC If the Primary is parsed as a prename, use this Secondary as the match standard.
PRENAME_STD If the Primary is parsed as a prename, use this Secondary as the standardized form.
TITLE_ACR If the Primary is parsed as a title, use this Secondary as the acronym.
TITLE_MTC If the Primary is parsed as a title, use this Secondary as the match standard.
TITLE_STD If the Primary is parsed as a title, use this Secondary as the standardized form.
FINAN_ACR If the Primary is parsed as a financial term, use this Secondary as the acronym.
FINAN_MTC If the Primary is parsed as a financial term, use this Secondary as the match standard.
FINAN_STD If the Primary is parsed as a financial term, use this Secondary as the standardized form.
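One way to read the table above is as a lookup keyed by the parsed type and the kind of standard (ACR, MTC, or STD), with ALL_TEXT_TYPES as a fallback for STD only. The following minimal sketch illustrates that reading. The function pick_standard, the dictionary layout, and the sample entries are all hypothetical; this is not a description of how the parser is implemented.

```python
def pick_standard(standards, parsed_as, kind):
    """Pick the Secondary for a Primary.

    standards: maps standard-type codes (e.g. "FIRM_STD") to Secondaries.
    parsed_as: how the Primary was parsed, e.g. "FIRM", "TITLE", "PRENAME".
    kind:      "ACR" (acronym), "MTC" (match standard), or "STD".
    """
    code = "%s_%s" % (parsed_as, kind)
    if code in standards:
        return standards[code]
    # ALL_TEXT_TYPES is a default for the text standard only,
    # never for a match standard or an acronym.
    if kind == "STD":
        return standards.get("ALL_TEXT_TYPES")
    return None


# Made-up entry for a REGION word, for illustration only.
entry = {"ALL_TEXT_TYPES": "North", "FIRM_MTC": "N"}
print(pick_standard(entry, "FIRM", "STD"))   # falls back to ALL_TEXT_TYPES
print(pick_standard(entry, "FIRM", "MTC"))   # uses FIRM_MTC
print(pick_standard(entry, "TITLE", "ACR"))  # no fallback for acronyms
```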
