Query the dictionaries to investigate parsing and mixed-casing
Create a custom parsing dictionary
Create custom capitalization dictionaries
Create search-and-replace tables
Notices
Published in the United States of America by Firstlogic, Inc., 100 Harborview Plaza,
La Crosse, Wisconsin 54601-4071.
Customer Care
Technical help is free for customers who are current on their ESP. Advisors are
available from 8 a.m. to 6 p.m. central time, Monday through Friday. When you call,
have at hand the user’s manual and the version number of the product you are using.
Call from a location where you can operate your software while speaking on the
phone. To save time, fax or e-mail your questions, and an advisor will call or e-mail
back with answers prepared. Or visit our Knowledge Base on the Customer Portal
web site, where you can find answers on your own, right away, at any time of the day
or night.
Our Customer Care group also manages our customer database and order processing.
Call them for order status, shipment tracking, reporting damaged shipments or flawed
media, changes in contact information, and so on.
Phone: 888-788-9004 in the U.S. and Canada; elsewhere 608-788-9000
Web site: http://www.firstlogic.com/customer
E-mail: customer@firstlogic.com
Product literature: 888-215-6442, fax 608-788-1188, or information@firstlogic.com
Corporate receptionist: 608-782-5000, or fax 608-788-1188
What do you think of this guide?
The Firstlogic Technical Publications group strives to bring you the most useful and
accurate publications possible. Please give us your opinion about our documentation
by filling out the brief survey at http://customer.firstlogic.com/surveys/default.asp
Legal notices
Firstlogic, Inc., or any authorized dealer distributing this product, makes no warranty, expressed or implied, with respect to
this computer software product or with respect to this manual or its contents, its quality, performance, merchantability, or
fitness for any particular purpose or use. It is solely the responsibility of the purchaser to determine its suitability for a
particular purpose or use. Firstlogic, Inc. will in no event be liable for direct, indirect, incidental, or consequential damages
resulting from any defect or omission in this software product, this manual, the program disks, or related items and
processes, including, but not limited to, any interruption of service, loss of business or anticipatory profit, even if Firstlogic,
Inc. has been advised of the possibility of such damages. This statement of limited liability is in lieu of all other warranties
or guarantees, expressed or implied, including warranties of merchantability or fitness for a particular purpose.
2UMD User’s Guide
1L, IL (ball design), ACE, ACSpeed, DataJet, DocuRight, eDataQuality, Firstlogic, GeoCensus, i·d·Centric, IQ Insight,
MailCoder, PostWare, Postalsoft, Postalsoft Address Dictionary, Postalsoft DeskTop Mailer, Postalsoft DeskTop
PostalCoder, Postalsoft DeskTop Presort, Postalsoft Manifest Reporter, PrintForm, RapidKey, Total Rewards, and
TrueName are registered trademarks of Firstlogic, Inc. DataRight, Entry Planner, FirstPrep, IRVE, iSummit, Label Studio,
Match/Consolidate, Postalsoft Business Edition by Firstlogic, and TaxIQ are trademarks of Firstlogic, Inc. All other
trademarks are the property of their respective owners.
About this guide
This guide explains how to customize your DataRight program to suit your
needs. This customization includes using the User-Modifiable Dictionary
(UMD), which is a tool for viewing and customizing dictionary files.
This guide explains how to use the command-line version of UMD to create
custom parsing dictionaries, custom capitalization dictionaries, and DataRight
search-and-replace tables.
Related documents
Before using UMD, you should understand how your Firstlogic product uses
the dictionaries. See your product documentation for details.
UMD Views
In UMD Views, the step-by-step process of creating and maintaining a
dictionary is different from the process described in this manual.
If you use UMD Views, do not rely on this guide. Instead, get online tips,
procedures, and information while you work:
•Click the Help button on the first screen of UMD Views.
•Press F1 from any UMD Views screen.
•Click the button in the upper right corner of any UMD Views screen,
then click the item for which you want information.
Preface5
Conventions
The following conventions are used throughout this manual.
Bold: We use boldface type for file names and paths. When we’re explaining something that you would type on your computer, boldface indicates something that you should type exactly as shown; for example, “Type cd\dirs.”
Italics: We use italics for emphasis. When we’re explaining something that you would type on your computer, italics indicate an item for which you should substitute your own data or values; for example, “Type a name for your job, along with the .job extension (jobname.job).”
Menu commands: We indicate commands that you choose from menus in the following format: Menu Name | Command Name. For example, “Choose File | New.”
Changes: We use a change bar in the right margin to mark product changes since the last version.
We use this symbol to alert you to important information and potential problems.
We use this symbol to point out special cases that you should know about.
Chapter 1: Custom parsing dictionaries
What is a parsing dictionary?
Our name-parsing technology identifies and parses name, title, and firm data.
The parser looks up each word in the parsing dictionary, then uses that
dictionary information, together with word patterns, to decide how to parse
the data.
The parsing dictionary contains entries for words and phrases. Each entry tells
how the word or phrase might be used. For example, the dictionary indicates
that the word Engineering can be used in a firm name (such as Smith Engineering, Inc.) or job title (such as VP of Engineering).
The dictionary also contains other information:
Acronyms: The dictionary contains the standard and acronymic forms of words. For example, the dictionary indicates that Inc. is the standardized form of Incorporated and IBM is the acronym for Intl Business Machines.
Match standards: The dictionary contains match standards (potential matches). For example, Patrick and Patricia are match standards for Pat.
Gender: The dictionary contains gender data. For example, it indicates that Anne is a feminine name and Mr. is a masculine prename.
Probabilities: The dictionary indicates the likelihood that a name is a first name rather than a last name. For example, there is a 40 percent chance that Martin is a first name rather than a last name.
Why create a custom dictionary?
Our base parsing dictionary contains thousands of name, title, and firm
entries. You might tailor the dictionary to better suit your data. For example:
•You might customize the dictionary to correct specific parsing behavior.
For example, given the name Mary Jones, CRNA, the word CRNA is
parsed as a job title. In reality, CRNA is a postname (Certified Registered Nurse Anesthetist). To correct this, you could add CRNA to the parsing
dictionary as a postname.
•You might tailor the dictionary to better suit your data by adding regional
or ethnic names, special titles, or industry jargon. For example, if you
process data for the real estate industry, you might add postnames such as
CRS (Certified Residential Specialist) and ABR (Accredited Buyer
Representative).
•If a specific title or firm name is parsed incorrectly, you can add an entry
for the entire phrase. For example, the parser previously identified
Hewlett Packard as a personal name, so we added Hewlett Packard to the
dictionary as a firm name.
Overview of creating a dictionary
To create a custom parsing dictionary, follow these basic steps:
1. Use UMD Show to query our base parsing dictionary. Look for
existing entries for the words you wish to add or change.
2. Put your custom entries in a transaction file. A transaction file is a
database containing the additions and changes you wish to make to our
dictionary.
3. Build your custom dictionary. UMD Build takes our base dictionary,
makes the additions and changes specified in the transaction file, and
creates the custom dictionary.
Figure: UMD Build combines the source dictionary (our parsing dictionary, parsing.dct), the transaction file (a database containing your additions and changes), and the supporting files (files that enable UMD to read the transaction file) to create a custom dictionary (a new dictionary containing entries from the source dictionary with your additions and changes).
Qualifications
Preparing custom parsing dictionaries is a task for a data-management
professional, not a clerical task. If you use all of UMD’s capabilities,
dictionary editing is almost an engineering task.
A note about examples
The sample queries and transactions in this chapter are for example only. By
the time you read this manual, the particular examples may have been added
to our base parsing dictionaries, so your query results may differ from what is
shown.
Step 1: Query the dictionary
Before you add a word to the dictionary, query our base parsing dictionary,
parsing.dct, to see whether there is already an entry for the word.
Run UMD Show
To query a dictionary, run UMD Show from the command line (see “UMD Show”
on page 41). For example:
umd /s parsing.dct
UMD Show is interactive. You enter a query and UMD Show responds, either
with data or a message that your query was not found in the dictionary.
Querying a single word
To query a single word, type the word at the Enter> prompt. Do not include
any punctuation. If the word is in the dictionary, UMD Show displays the
dictionary entry:
C:\umd /s parsing.dct
Using a parsing Dictionary.
Enter a query, or press <Esc> to exit.
Enter> Beth
Usage: 99
Intl Code(s): USENGLISH
Info Code(s): NAME NAMEGEN5
Standard(s) for BETH:
- BETHANY NAME_MTC
- BETHEL NAME_MTC
- ELIZABETH NAME_MTC
For descriptions of the information and standard-type codes, see Appendix C.
If the word is not in the dictionary, UMD Show tells you the entry was not
found:
C:\umd /s parsing.dct
Using a parsing Dictionary.
Enter a query, or press <Esc> to exit.
Enter> Michelangelo
Text not found in dictionary.
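If you query many words, it can help to capture UMD Show’s responses and turn them into records for review. The sketch below parses one response in the layout shown above. It assumes that each standard line has the form “- STANDARD TYPE, TYPE, ...” with whitespace between the standard and its first type; ShowEntry and parse_show_response are hypothetical helpers, not part of UMD.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumes the UMD Show response layout shown in this guide; not a UMD API.
# A standard can contain spaces ("JOHNSON JOHNSON"), so the pattern anchors
# on the trailing comma-separated codes that end in _MTC, _STD, or _ACR.
CODE = r"[A-Z]+_(?:MTC|STD|ACR)"
STANDARD_LINE = re.compile(rf"^-\s+(?P<std>.+?)\s+(?P<types>{CODE}(?:,\s*{CODE})*)$")


@dataclass
class ShowEntry:
    usage: Optional[int] = None
    intl: List[str] = field(default_factory=list)
    info: List[str] = field(default_factory=list)
    standards: Dict[str, List[str]] = field(default_factory=dict)


def parse_show_response(text: str) -> Optional[ShowEntry]:
    """Turn one UMD Show response into a ShowEntry, or None if not found."""
    if "Text not found in dictionary" in text:
        return None
    entry = ShowEntry()
    for raw_line in text.splitlines():
        line = raw_line.strip()
        label, _, value = line.partition(":")
        if label == "Usage":
            entry.usage = int(value)
        elif label == "Intl Code(s)":
            entry.intl = value.split()
        elif label == "Info Code(s)":
            entry.info = value.split()
        else:
            match = STANDARD_LINE.match(line)
            if match:
                entry.standards[match["std"]] = [
                    code.strip() for code in match["types"].split(",")
                ]
    return entry
```

Prompt lines such as “Enter> Beth” and “Standard(s) for BETH:” are simply ignored, so you can feed in a whole pasted response.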
Querying a title phrase
To look up a multiple-word title, you must query the “lookup” form of the
title, the same form that the parser would look up:
1. Start with the raw title. Example: Chief Executive Officer
2. Query each word and get the first title match standard (TITLE_MTC) for each. If an appropriate match standard does not exist, use the original word. Example: Chf. Exec. Off.
3. Remove all punctuation. Example: Chf Exec Off
This is the form of the title that you should query.
Get the first appropriate match standard for each word in the phrase:
C:\umd /s parsing.dct
Enter a query, or press <ESC> to exit.
Enter> Chief
Usage: 0
Intl Code(s): USENGLISH
Info Code(s): FIRMMISC PHRASE_WRD PREGEN3 PRENAME TITLE
Standard(s) for CHIEF:
- CHIEF FIRM_STD, PRENAME_STD, TITLE_STD
- CHF. FIRM_MTC, PRENAME_MTC, TITLE_MTC
Enter a query, or press <ESC> to exit.
Enter> Executive
Usage: 0
Intl Code(s): USENGLISH
Info Code(s): FIRMMISC PHRASE_WRD TITLE
Standard(s) for EXECUTIVE:
- EXEC. FIRM_MTC, FIRM_STD, TITLE_MTC, TITLE_STD
Enter a query, or press <ESC> to exit.
Enter> Officer
Usage: 0
Intl Code(s): USENGLISH
Info Code(s): FIRMMISC NAME NAMEGEN1 PHRASE_WRD PREGEN3 PRENAME TITLE
Standard(s) for OFFICER:
- OFF. FIRM_MTC, PRENAME_MTC, TITLE_MTC
Query the lookup form of the phrase:
Enter a query, or press <ESC> to exit.
Enter> Chf Exec Off
Usage: 0
Intl Code(s): USENGLISH
Info Code(s): TITLE
Standard(s) for CHF EXEC OFF:
- CEO TITLE_ACR, TITLE_MTC, TITLE_STD
Note: If a line contains consecutive words that are marked as phrase words, the parser gets the first match standard for each
word, removes any punctuation, and looks up the phrase.
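The three-step title procedure can be sketched in a few lines. In this sketch, FIRST_TITLE_MTC is a hypothetical stand-in for querying each word in UMD Show; it holds only the first TITLE_MTC values from the Chief, Executive, and Officer query results above. A word with no entry falls back to the original word, as step 2 describes.

```python
import string
from typing import Dict

_NO_PUNCT = str.maketrans("", "", string.punctuation)

# Hypothetical lookup table: first TITLE_MTC for each word, copied from the
# UMD Show results above. Keys are uppercase because lookups ignore case.
FIRST_TITLE_MTC = {"CHIEF": "Chf.", "EXECUTIVE": "Exec.", "OFFICER": "Off."}


def title_lookup_form(raw_title: str, first_title_mtc: Dict[str, str]) -> str:
    """Build the lookup form of a title phrase (steps 1 to 3)."""
    standardized = []
    for word in raw_title.split():
        key = word.translate(_NO_PUNCT).upper()
        # Step 2: first title match standard, or the original word if none.
        standardized.append(first_title_mtc.get(key, word))
    # Step 3: remove all punctuation, dropping tokens that were only punctuation.
    tokens = (w.translate(_NO_PUNCT) for w in standardized)
    return " ".join(t for t in tokens if t)


print(title_lookup_form("Chief Executive Officer", FIRST_TITLE_MTC))  # Chf Exec Off
```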
Querying a multiple-word firm name
If you want to query a firm name that is also a personal name, such as
Hewlett Packard or Johnson & Johnson, see “Querying a firm name that
looks like a personal name” on page 12.
To look up a multiple-word firm name, you must query the “lookup” form of
the firm name—the same form as the parser would look up:
1. Start with the raw firm name. Example: The General Motors Corporation
2. Remove the words and, or, of, the, and for. Example: General Motors Corporation
3. Remove firm terminator words such as Corp, Inc, Ltd, Co, etc. Example: General Motors
4. Query each remaining word. Get the first firm match standard (FIRM_MTC) for each. If an appropriate match standard does not exist, use the original word. Example: Gen. Motors
5. Remove all punctuation. This is the lookup form of the firm name. Example: Gen Motors
Query the lookup form of the firm name:
C:\umd /s parsing.dct
Using a Parsing Dictionary.
Enter a query, or press <ESC> to exit.
Enter> Gen Motors
Usage: 0
Intl Code(s): USENGLISH
Info Code(s): FIRMMISC FIRMNAME
Standard(s) for GEN MOTORS:
- GM FIRM_ACR, FIRM_MTC, FIRM_STD
For descriptions of the information codes and standard-type codes, see Appendix C.
Note: If a line contains at least one word marked as a FIRMNAME word, the parser removes all noise words, gets the first
firm match standard for each word in the line, removes any punctuation, and looks up the remaining line. (If all words are
marked as NAME words, the process is different; see page 12.)
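A similar sketch for multiple-word firm names follows. The noise-word and terminator sets hold only the examples named in the procedure (plus Corporation, which the example removes), so treat them as incomplete placeholders. FIRST_FIRM_MTC is hypothetical data standing in for UMD Show queries; only the Gen. value comes from the example.

```python
import string
from typing import Dict

_NO_PUNCT = str.maketrans("", "", string.punctuation)
NOISE_WORDS = {"AND", "OR", "OF", "THE", "FOR"}                  # step 2
# Step 3: partial, illustrative terminator list (the real list is longer).
FIRM_TERMINATORS = {"CORP", "CORPORATION", "INC", "LTD", "CO"}
# Hypothetical first FIRM_MTC values; "Gen." comes from the example above.
FIRST_FIRM_MTC = {"GENERAL": "Gen."}


def firm_lookup_form(raw_name: str, first_firm_mtc: Dict[str, str]) -> str:
    """Build the lookup form of a multiple-word firm name (steps 1 to 5)."""
    standardized = []
    for word in raw_name.split():
        key = word.translate(_NO_PUNCT).upper()
        if key in NOISE_WORDS or key in FIRM_TERMINATORS:
            continue  # steps 2 and 3
        standardized.append(first_firm_mtc.get(key, word))  # step 4
    tokens = (w.translate(_NO_PUNCT) for w in standardized)  # step 5
    return " ".join(t for t in tokens if t)


print(firm_lookup_form("The General Motors Corporation", FIRST_FIRM_MTC))  # Gen Motors
```

Note the order of operations: match standards are substituted before punctuation is removed, so “Gen.” becomes “Gen” only in the last step.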
Querying a firm name that looks like a personal name
Some firms are named after people, for example, Hewlett Packard or
Johnson and Johnson.
To look up this type of firm name, you must query the “lookup” form of the
firm name, the same form of the name that the parser would look up:
1. Start with the raw firm name. Example: Johnson and Johnson Corp.
2. Remove all punctuation characters. Example: Johnson and Johnson Corp
3. Remove the words and, or, of, the, and for. Example: Johnson Johnson Corp
4. Remove all firm-terminator words, such as Inc, Ltd, Corporation, and so on. Example: Johnson Johnson
This is the lookup form of the firm name.
Query the lookup form of the firm name:
C:\umd /s parsing.dct
Using a Parsing Dictionary.
Enter a query, or press <ESC> to exit.
Enter> Johnson Johnson
Usage: 1
Intl Code(s): USENGLISH
Info Code(s): FIRMNAME
Standard(s) for JOHNSON JOHNSON:
- JOHNSON JOHNSON FIRM_MTC, FIRM_STD
Note: If all of the words in a line are identified as both FIRMNAME and NAME words, the parser removes noise words and
punctuation, then looks to see whether the name is listed as a firm name. If so, the line is parsed as a firm name. If not, the
line is parsed as a personal name.
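The parsing rule described above can be modeled as a small decision function. This is a sketch under simplifying assumptions: word_codes and firm_entries are hypothetical stand-ins for dictionary data, the noise-word and terminator sets hold only the examples named in this section, and the NAME/FIRMNAME test is applied to the words that remain after noise words and terminators are removed. Unlike the multiple-word procedure, this lookup form removes punctuation first and does not substitute match standards.

```python
import string
from typing import Dict, Optional, Set

NOISE_WORDS = {"AND", "OR", "OF", "THE", "FOR"}
FIRM_TERMINATORS = {"INC", "LTD", "CORP", "CORPORATION"}  # partial, illustrative list


def personal_firm_lookup_form(raw_name: str) -> str:
    """Lookup form for a firm that looks like a personal name (steps 1 to 4)."""
    # Step 2: remove punctuation first; no match standards are substituted.
    no_punct = raw_name.translate(str.maketrans("", "", string.punctuation))
    # Steps 3 and 4: drop noise words and firm terminators.
    kept = [w for w in no_punct.split()
            if w.upper() not in NOISE_WORDS and w.upper() not in FIRM_TERMINATORS]
    return " ".join(kept)


def parse_as(raw_name: str, word_codes: Dict[str, Set[str]],
             firm_entries: Set[str]) -> Optional[str]:
    """Model the firm-or-person decision; None means other rules apply."""
    lookup = personal_firm_lookup_form(raw_name)
    words = lookup.upper().split()
    # Assumption: only the remaining words must be both NAME and FIRMNAME words.
    if not words or not all({"NAME", "FIRMNAME"} <= word_codes.get(w, set()) for w in words):
        return None
    return "firm" if lookup.upper() in firm_entries else "personal name"


codes = {"JOHNSON": {"NAME", "FIRMNAME"}}  # hypothetical dictionary data
print(parse_as("Johnson and Johnson Corp.", codes, {"JOHNSON JOHNSON"}))  # firm
```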
Step 2: Create a parsing transaction file
A transaction file is a database that contains all the additions and changes that
you want to make to the parsing dictionary. The first time you create a custom
parsing dictionary, you must create a transaction file.
If you’re updating an existing custom dictionary, use your existing
transaction file. Your dictionary will be easier to manage if you store your
entries in one transaction file, rather than scattering them among many
files.
Create a transaction database
The quickest, easiest way to create a transaction database file and its
supporting files is to use the “output file” feature of UMD Show. (See “UMD
Show” on page 41.)
1. Use UMD Show to query our base parsing dictionary, parsing.dct.
Include the /o option on the command line. Use the file name that you plan
to use for your custom dictionary, but with the extension .trn, for
example, my_parse.trn.
2. Query a word that is in the dictionary, such as Bob.
3. Press Enter to save the query to your output file, then press Esc to exit.
C:\umd /s parsing.dct /o my_parse.trn /d dBase3
Using a Parsing Dictionary
Enter a query, or press <ESC> to exit.
Enter> Bob
Usage: 99
Intl Code(s): USENGLISH
Info Code(s): NAME NAMEGEN1
Standard(s) for BOB:
- ROBERT NAME_MTC
Enter a query, or press <Enter> to save, or press <ESC> to exit.
Enter>
Previous query appended to C:\my_parse.trn.
Enter a query, or press <ESC> to exit.
Enter> <Esc>
UMD Show creates an output database file, for example, my_parse.trn. You
can use this database as your transaction file.
Keep supporting files with transaction file
When you create a transaction database as described above, UMD Show
creates a supporting file such as my_parse.def. For ASCII and delimited
transaction files, UMD Show also creates an additional supporting file such as
my_parse.fmt or my_parse.dmt. UMD requires these files to open and read the
transaction file. If you move the transaction file to a new location, make
sure you also move the corresponding supporting files.
Note: If you plan to use a database program or spreadsheet program to edit the file, we recommend creating a dBASE3 or
ASCII file. If you plan to use a text editor or word processor to edit the file, we recommend creating a delimited file.
However, be aware that our UMD Views program does not support updating of delimited files.
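Because UMD needs the supporting files to read a transaction file, a move or backup script should copy them as a set. The sketch below assumes, as in the examples above, that the supporting files share the transaction file’s base name and use the .def, .fmt, or .dmt extension; copy_transaction_set is a hypothetical helper, not a UMD feature.

```python
import shutil
from pathlib import Path
from typing import List

# Assumption based on the examples above: supporting files share the
# transaction file's base name. Which of .fmt or .dmt exists depends on the
# transaction file type, so each is copied only if it is present.
SUPPORTING_EXTENSIONS = (".def", ".fmt", ".dmt")


def copy_transaction_set(transaction: Path, destination: Path) -> List[Path]:
    """Copy a transaction file plus any same-named supporting files."""
    if not transaction.exists():
        raise FileNotFoundError(transaction)
    destination.mkdir(parents=True, exist_ok=True)
    sources = [transaction] + [transaction.with_suffix(ext) for ext in SUPPORTING_EXTENSIONS]
    copied = []
    for source in sources:
        if source.exists():
            copied.append(Path(shutil.copy2(source, destination / source.name)))
    return copied
```

For example, copy_transaction_set(Path("my_parse.trn"), Path("backup")) would copy my_parse.trn and my_parse.def, plus my_parse.fmt or my_parse.dmt if present.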
Step 3: Put your entries in the transaction file
To add records to your transaction file, use a text editor or database program.
For each record, provide the information described below.
For examples, see the sample transactions starting with “Sample transaction:
Add a new word” on page 18.
Parsing transaction entries
Action: Choose one:
N: Create a new entry or overwrite the existing entry.
A: Add information to an existing entry.
C: Change the usage or gender data for an existing entry.
D: Delete information from an existing entry, or delete the entry.
Primary: Type the word or phrase that you want to add or whose entry you want to modify. Fifty-four characters maximum; not case-sensitive; do not include any punctuation. For phrases and multiple-word firm names, use the “lookup” form. To get the lookup form, see “Querying a title phrase” on page 10, “Querying a multiple-word firm name” on page 11, and “Querying a firm name that looks like a personal name” on page 12.
Secondary: Type one of the following:
•The preferred standardized form of the Primary.
•A match standard (for information and guidelines, see “Rules for working with match standards” on page 26).
•The acronym form of the Primary.
Usage: For name data, indicate the likelihood, on a scale of 0 to 100, that the name is a first name rather than a last name. If the name is always a first name, type 100. If the name is always a last name, or if the word is not a name, type 0.
Intl: Type USENGLISH.
Info: Type all information codes that apply, if not already in the dictionary. Put one space (no punctuation) between codes. For a list of information codes, see “Information codes” on page 43.
Stdtype: Type all standard-type codes that apply, if not already in the dictionary. Put one space (no punctuation) between codes. For a list of standard-type codes, see “Standard-type codes” on page 45.
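If you generate transaction records with a script, checking the field rules above before running UMD Build can catch mistakes early. The sketch encodes only the rules stated in this table (the four action codes, a Primary of at most 54 characters with no punctuation, Usage from 0 to 100, USENGLISH, and codes separated by one space). TransactionEntry is an illustrative class; it does not mirror UMD’s file layout.

```python
import re
import string
from dataclasses import dataclass
from typing import List, Optional

ACTIONS = {"N", "A", "C", "D"}
CODE_LIST = re.compile(r"[A-Z0-9_]+( [A-Z0-9_]+)*")  # codes separated by one space


@dataclass
class TransactionEntry:
    """Illustrative record; field names follow the table, not UMD's file format."""
    action: str
    primary: str
    secondary: str = ""
    usage: Optional[int] = None
    intl: str = ""
    info: str = ""
    stdtype: str = ""

    def problems(self) -> List[str]:
        """Return the field rules from the table that this record breaks."""
        found = []
        if self.action not in ACTIONS:
            found.append(f"Action must be N, A, C, or D, not {self.action!r}")
        if len(self.primary) > 54:
            found.append("Primary is longer than 54 characters")
        if any(ch in string.punctuation for ch in self.primary):
            found.append("Primary must not contain punctuation")
        if self.usage is not None and not 0 <= self.usage <= 100:
            found.append("Usage must be from 0 to 100")
        if self.intl and self.intl != "USENGLISH":
            found.append("Intl must be USENGLISH")
        for name in ("info", "stdtype"):
            value = getattr(self, name)
            if value and not CODE_LIST.fullmatch(value):
                found.append(f"{name} codes must be separated by one space, with no punctuation")
        return found
```

Secondary is deliberately not checked for punctuation, because standards such as Eng. legitimately contain it.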
Required fields
For each action, you must provide certain information. The list below shows,
for each type of change, the Action code to use and the fields you must fill in.
Numbers in parentheses refer to the notes that follow.
Create a new entry (Action N): Primary, Secondary, Usage (Note 1), Intl, Info, Stdtype
Add a standard to an existing entry (Action A): Primary, Secondary, Info (Note 2), Stdtype
Add information to an existing entry (Action A): Primary, Secondary, Info, Stdtype (Note 3)
Delete an entire entry (Action D): Primary, Info (Note 4); Secondary must be blank
Delete a standard (Action D): Primary, Secondary, Stdtype (Note 5)
Delete a standard-type code (Action D): Primary, Secondary, Stdtype
Delete an information code (Action D): Primary, Info, Stdtype (Note 6); Secondary must be blank
Change usage data for an existing entry (Action C): Primary, Usage
Change gender data for an existing entry (Action C): Primary, Info (Note 7)
1) Required only if the Info field contains NAME.
2) Required only if the necessary Info code is not already in the existing
dictionary entry. For example, if you add the Stdtype code TITLE_MTC, you
must specify the Info code TITLE unless one of those Info codes is already
specified in the existing dictionary entry.
3) Required if the Info field contains anything besides PHRASE_WRD.
4) Must include all of the Info codes listed in the existing dictionary entry.
5) Must include all of the Stdtype codes listed in the existing dictionary entry.
6) This field is ignored. UMD automatically deletes dependent standard types.
7) PREGENx or NAMEGENx only. The existing dictionary entry must
contain a corresponding gender code. For example, if the existing entry
contains the gender code NAMEGEN1, you may change it to any other
NAMEGENx code.
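Notes 1 and 7 above are mechanical enough to check in code. The sketch below models Note 1 (for a new entry, Usage is required only when Info contains NAME) and Note 7 (a gender change must use a PREGENx or NAMEGENx code, and the existing entry must already have a code of the same kind). The function names and data shapes are assumptions for illustration.

```python
import re
from typing import List, Optional

# Illustrative checks for Notes 1 and 7; not part of UMD.
GENDER_CODE = re.compile(r"(PREGEN|NAMEGEN)\d+")


def usage_required(info: str) -> bool:
    """Note 1: for a new entry, Usage is required only if Info contains NAME."""
    return "NAME" in info.split()


def gender_change_error(new_code: str, existing_info: List[str]) -> Optional[str]:
    """Note 7: return a problem description, or None if the change is allowed."""
    match = GENDER_CODE.fullmatch(new_code.strip())
    if match is None:
        return "Info must be a PREGENx or NAMEGENx code"
    existing = {m.group(1) for m in map(GENDER_CODE.fullmatch, existing_info) if m}
    if match.group(1) not in existing:
        return f"existing entry has no {match.group(1)}x code to change"
    return None


# Following Note 7's example, NAMEGEN1 may be changed to another NAMEGENx code.
print(gender_change_error("NAMEGEN5", ["NAME", "NAMEGEN1"]))  # None
```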
Step 4: Build your custom parsing dictionary
After you put all of your entries in your transaction file, use UMD Build to
build your custom parsing dictionary.
UMD Build
To build your dictionary, run UMD Build. The easiest way to convey your
instructions to UMD is through the UMD configuration file.
1. Open a copy of the configuration file umd.cfg.
2. Type entries for the UMD Build parameters. Specify our dictionary,
parsing.dct, as your Source Dictionary.
For descriptions of the configuration-file parameters, see “” on page 39.
# UMD Show
Output File Name (path & file name) .... =
Output File Type (See NOTE) ............ =
#
# UMD Build
Dictionary Type (See NOTE) ............. = Parsing
Work Directory (path) .................. = r:\temp
3. Save the configuration file (in this example, my_parse.cfg).
4. Run UMD with the /cfg option. For example:
umd /cfg my_parse.cfg
Verification and building
Before UMD builds your custom dictionary, it checks to make sure the entries
in your transaction file are valid. If a validation error or warning occurs, look
at the error log file. If an error occurred, fix your transaction file, then run
UMD Build again.
If the transaction file is free of errors, UMD builds your custom dictionary.
During the build process, UMD takes the source dictionary, makes the
changes and additions specified in your transaction file, and creates your
custom dictionary.
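To make the build step concrete, here is a deliberately simplified, in-memory model of how transaction actions might change entries. It is a sketch under stated assumptions, not UMD Build’s algorithm: entries are plain Python dictionaries, validation is omitted, gender changes are not modeled, and a standard type is assumed to depend on the information code that forms its prefix (for example, TITLE_STD on TITLE). As with UMD Build, the source dictionary is left untouched and a custom copy is returned.

```python
from copy import deepcopy
from typing import Dict, Iterable


def new_entry(usage=0, intl="USENGLISH", info=(), standards=None) -> Dict:
    """A hypothetical in-memory stand-in for one dictionary entry."""
    return {"usage": usage, "intl": intl, "info": set(info),
            "standards": {s.upper(): set(t) for s, t in (standards or {}).items()}}


def apply(dictionary: Dict, record: Dict) -> None:
    """Apply one transaction record (a simplified model of N, A, C, and D)."""
    key = record["primary"].upper()
    secondary = record.get("secondary", "").upper()
    info = set(record.get("info", "").split())
    types = set(record.get("stdtype", "").split())
    action = record["action"]
    if action == "N":  # create a new entry or overwrite the existing one
        dictionary[key] = new_entry(record.get("usage", 0), record.get("intl", "USENGLISH"),
                                    info, {secondary: types} if secondary else {})
        return
    entry = dictionary[key]  # A, C, and D all modify an existing entry
    if action == "A":
        entry["info"] |= info
        if secondary:
            entry["standards"].setdefault(secondary, set()).update(types)
    elif action == "C" and record.get("usage") is not None:
        entry["usage"] = record["usage"]  # gender changes are not modeled here
    elif action == "D":
        if secondary:  # delete standard-type codes; drop the standard if none remain
            remaining = entry["standards"].get(secondary, set()) - types
            if remaining:
                entry["standards"][secondary] = remaining
            else:
                entry["standards"].pop(secondary, None)
        elif info >= entry["info"]:  # every Info code listed: delete the whole entry
            del dictionary[key]
        else:  # delete information codes and (assumed) dependent standard types
            entry["info"] -= info
            for std in list(entry["standards"]):
                kept = {t for t in entry["standards"][std] if t.rsplit("_", 1)[0] not in info}
                if kept:
                    entry["standards"][std] = kept
                else:
                    del entry["standards"][std]


def build(source: Dict, records: Iterable[Dict]) -> Dict:
    """Leave the source dictionary untouched and return a custom copy."""
    custom = deepcopy(source)
    for record in records:
        apply(custom, record)
    return custom
```

The test replays the Engineer example from later in this chapter: adding Eng. as a TITLE_STD and deleting TITLE_STD from Engr.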
Step 5: Maintain and update your custom dictionary
Use your existing transaction file
If you want to update your custom dictionary, put your changes and additions
in your existing transaction file. Your custom dictionary will be much easier to
manage if you accumulate all your entries in one transaction file, rather than
scattering them among many files.
When you rebuild your parsing dictionary, always use our base parsing
dictionary, parsing.dct, as the source dictionary.
Keep your dictionary up to date
Whenever we send you a new parsing.dct file, build an updated custom
dictionary by running your transaction file against our new dictionary. This
allows you to benefit from the additions and improvements that we have made
to the base dictionary.
If you do not run your transaction file against each new base dictionary, the
differences between our base dictionary and your custom dictionary will
increase. This will affect your parsing results and impede our ability to
provide technical support.
Sample transaction: Add a new word
Suppose your data file contains the name line Anne Smith, CRNA. You notice
that the word CRNA is being parsed as a job title. However, CRNA is really a
postname (Certified Registered Nurse Anesthetist).
When you query CRNA, you discover it is not in the parsing dictionary:
C:\umd /s parsing.dct
Using a Parsing Dictionary.
Enter a query, or press <ESC> to exit.
Enter> CRNA
Text not found in dictionary.
To add the word to the dictionary, you would add the following record to your
transaction file:
Action: N
Primary: CRNA
Secondary: CRNA
Usage: 0
Intl: USENGLISH
Info: HONPOST
Stdtype: HONPOST_STD HONPOST_MTC
Capitalization
As you add words to the parsing dictionary, make a note of any words that
have unusual mixed-case capitalization. To get the correct mixed-case
capitalization, you must also add these words to your custom capitalization
dictionary.
For example, if you add CRNA to the parsing dictionary, you should also add
it to your custom capitalization dictionary. Otherwise, the mixed-casing will
be Crna rather than CRNA.
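The reason is easy to see with a naive mixed-casing rule. The sketch below is an illustration only: it assumes a simple rule that capitalizes the first letter and lowercases the rest, and CUSTOM_CAPS is a hypothetical stand-in for your custom capitalization dictionary, not its file format.

```python
from typing import Dict

# Hypothetical custom capitalization entries, keyed by uppercase word.
CUSTOM_CAPS = {"CRNA": "CRNA", "PSYD": "PsyD"}


def mixed_case(word: str, custom_caps: Dict[str, str]) -> str:
    """Use a custom capitalization if one exists, else a naive first-letter rule."""
    return custom_caps.get(word.upper(), word.capitalize())


print(mixed_case("CRNA", {}))           # Crna
print(mixed_case("CRNA", CUSTOM_CAPS))  # CRNA
```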
Sample transaction: Add a title phrase
There is a lot of overlap between words that can be used in firm names and
words that can be used in job titles. For example, the words Vice, President,
and Marketing can all be used in firm names and in job titles. As a result, the
parser may incorrectly identify Vice President of Marketing as a firm name
rather than a title. To correct this kind of parsing behavior, you can add a title
phrase to the dictionary.
Two main tasks
To add a title phrase to the dictionary, you must do two things:
•Make sure each word is in the dictionary and has the information codes
PHRASE_WRD and TITLE.
•Enter the “lookup” form of the phrase so that the parser will find it (see
“Querying a title phrase” on page 10). Otherwise, the entry will have no
effect on parsing results.
To add a title phrase to the dictionary:
1. Query the “lookup” form of the phrase (see “Querying a title phrase” on
page 10). For example, to add the phrase Vice President of Marketing to
the dictionary, use the lookup form Vice Pres of Mktg.
2. If the phrase is not in the dictionary, create a new entry in your transaction
file. Use the lookup form as the primary and secondary:
Action: N
Primary: Vice Pres of Mktg
Secondary: Vice Pres of Mktg
Usage: 0
Intl: USENGLISH
Info: TITLE
Stdtype: TITLE_STD TITLE_MTC
3. Query each word in the original phrase (e.g., Vice, President, of, and
Marketing). Make sure each word meets the following requirements:
• It has the information code PHRASE_WRD.
• It has the information code TITLE.
• The first title match standard (TITLE_MTC) is the same as the word
used in the phrase entry.
4. If a word is not in the dictionary or does not meet the requirements listed
in Step 3, add the word (or modify it) by putting an entry in your
transaction file.
For our example, the word President is in the dictionary but is not
identified as a phrase word, so we need to mark it as a PHRASE_WRD.
We also need to mark the word of as a TITLE word and a
PHRASE_WRD.
Entry for ‘President’:
Action: A
Primary: President
Secondary: Pres.
Info: PHRASE_WRD
Entry for ‘of’:
Action: A
Primary: of
Secondary: of
Info: PHRASE_WRD TITLE
Stdtype: TITLE_MTC TITLE_STD
For best results, perform steps 3 and 4 for variant spellings and
abbreviations of each word. For our example, we would check to make
sure that Pres and Mktg are marked as phrase words. This enables the
parser to recognize variant raw forms of the phrase, such as Vice Pres. of Marketing, Vice President of Mktg., and Vice Pres. of Mktg., in addition
to the original phrase Vice President of Marketing.
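Steps 3 and 4 amount to a checklist you can run over your query results. In the sketch below, each word’s information codes and first TITLE_MTC are typed in by hand as hypothetical data matching the situation described for this example (President lacks PHRASE_WRD, and of lacks both codes); words_needing_entries reports which words still need a transaction entry. It relies on the fact that a title lookup form keeps every word, so the raw and lookup phrases line up word for word.

```python
import string
from typing import Dict, List, Optional, Set, Tuple

_NO_PUNCT = str.maketrans("", "", string.punctuation)
Entry = Tuple[Set[str], Optional[str]]  # (information codes, first TITLE_MTC)


def words_needing_entries(raw_phrase: str, lookup_phrase: str,
                          entries: Dict[str, Entry]) -> List[str]:
    """List the words of a title phrase that fail the step 3 requirements."""
    needs = []
    for raw, looked_up in zip(raw_phrase.split(), lookup_phrase.split()):
        info, first_mtc = entries.get(raw.upper(), (set(), None))
        ok = ({"PHRASE_WRD", "TITLE"} <= info
              and first_mtc is not None
              and first_mtc.translate(_NO_PUNCT).upper() == looked_up.upper())
        if not ok:
            needs.append(raw)
    return needs


# Hypothetical query results for the example phrase.
ENTRIES = {
    "VICE": ({"PHRASE_WRD", "TITLE"}, "Vice"),
    "PRESIDENT": ({"TITLE"}, "Pres."),
    "OF": (set(), None),
    "MARKETING": ({"PHRASE_WRD", "TITLE"}, "Mktg."),
}
print(words_needing_entries("Vice President of Marketing", "Vice Pres of Mktg", ENTRIES))
# ['President', 'of']
```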
Sample transaction: Add a multiple-word firm name
If a multiple-word firm name such as Emery Worldwide is parsed incorrectly,
you can add the firm name to the dictionary.
If a firm name looks like a personal name, such as Hewlett Packard or Johnson & Johnson, the procedure is different from the one shown on this
page. See “Sample transaction: Add a firm that looks like a personal
name” on page 22.
Note: To the parser, a line “looks like” a personal name if all of the words in the line are marked as NAME words. For
example, Check N Go looks like a personal name because the words Check, N, and Go are all NAME words.
Two main tasks
To add a multiple-word firm name to the dictionary, you must do two things:
•Make sure at least one of the words is in the dictionary and has the
FIRMNAME information code.
•Enter the “lookup” form of the firm name so that the parser will find it
(see “Step 1: Query the dictionary” on page 9). Otherwise, the entry will
have no effect on parsing results.
To add a multi-word entry to the dictionary:
1. If the firm name looks like a personal name—for example, Hewlett
Packard, Merrill Lynch, Johnson & Johnson—see “Querying a firm name
that looks like a personal name” on page 12.
2. In your transaction file, create a new entry for the “lookup” form of the
firm name (see “Querying a multiple-word firm name” on page 11).
Action: N
Primary: Emery Worldwide
Secondary: Emery Worldwide
Usage: 0
Intl: USENGLISH
Info: FIRMNAME
Stdtype: FIRM_STD FIRM_MTC
3. Make sure that at least one of the words in the firm name (for example,
Emery or Worldwide) meets the following requirements:
• The entry includes the FIRMNAME information code.
• The first firm match standard (FIRM_MTC) is the word that you used
in your firm-name entry in step 2.
4. If none of the words meets the requirements in step 3, add an entry to your
transaction file.
Most often, you’ll need to mark one of the words as a FIRMNAME word,
as shown here.
Action: A
Primary: Emery
Secondary: Emery
Info: FIRMNAME
Stdtype: FIRM_MTC FIRM_STD
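The requirement in step 3 is weaker than for title phrases: only one word needs to qualify. The sketch below checks that condition against hypothetical entry data, where each word maps to its information codes and first FIRM_MTC (None when the word has no firm match standard). has_firmname_word is an illustrative helper, not part of UMD.

```python
import string
from typing import Dict, Iterable, Optional, Set, Tuple

_NO_PUNCT = str.maketrans("", "", string.punctuation)


def has_firmname_word(raw_words: Iterable[str], lookup_form: str,
                      entries: Dict[str, Tuple[Set[str], Optional[str]]]) -> bool:
    """True if some word has FIRMNAME and its first FIRM_MTC is used in the lookup form."""
    lookup_words = {w.upper() for w in lookup_form.split()}
    for word in raw_words:
        info, first_firm_mtc = entries.get(word.upper(), (set(), None))
        if ("FIRMNAME" in info and first_firm_mtc is not None
                and first_firm_mtc.translate(_NO_PUNCT).upper() in lookup_words):
            return True
    return False


# Hypothetical entries before and after the "A" transaction for Emery.
before = {"EMERY": ({"NAME"}, None), "WORLDWIDE": ({"FIRMMISC"}, "Worldwide")}
after = dict(before, EMERY=({"NAME", "FIRMNAME"}, "Emery"))
print(has_firmname_word(["Emery", "Worldwide"], "Emery Worldwide", before))  # False
print(has_firmname_word(["Emery", "Worldwide"], "Emery Worldwide", after))   # True
```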
Sample transaction: Add a firm that looks like a personal name
Many firms are named after people—for example, Hewlett Packard or Merrill
Lynch. The parser often identifies these as personal names rather than firm
names. To correct this, you can add the firm name to the dictionary.
Two main tasks
If a firm name looks like a personal name, you must do two things:
•Make sure each word is in the dictionary and has both the NAME and
FIRMNAME information codes.
•Create an entry for the “lookup” form of the firm name.
To add a firm name that looks like a personal name:
1. Query the “lookup” form of the firm name (see “Querying a firm name
that looks like a personal name” on page 12).
2. If the firm name is not in the dictionary, create a new entry in your
transaction file. Use the lookup form as the primary and secondary.
Action: N
Primary: Robert W Baird
Secondary: Robert W Baird
Usage: 0
Intl: USENGLISH
Info: FIRMNAME
Stdtype: FIRM_MTC FIRM_STD
3. Query each word. Make sure it is in the dictionary and is identified as both
a NAME and a FIRMNAME. If not, add the word (or modify it) by
putting an entry in your transaction file. In our example, Robert W Baird,
all three words are in the dictionary, but none has the FIRMNAME
information code.
For each word, we would put an entry in the transaction file to add the
FIRMNAME information code, as shown here for Robert.
Action: A
Primary: Robert
Secondary: Robert
Info: FIRMNAME
Stdtype: FIRM_MTC FIRM_STD
Sample transaction: Modify information codes
Suppose your data file contains the line John Smith PsyD. You notice that this
line is parsed as a firm name rather than a personal name. Although the name
of John Smith’s business might possibly be John Smith PsyD, you would
prefer to parse this as a name rather than a firm.
When you query the dictionary, you notice that PsyD is listed as a firm word:
C:\umd /s parsing.dct
Using a Parsing Dictionary.
Enter a query, or press <ESC> to exit.
Enter> PsyD
Usage: 0
Intl Code(s): USENGLISH
Info Code(s): FIRMMISC
Standard(s) for PSYD:
- PSYD FIRM_MTC, FIRM_STD
In your custom dictionary, you could specify that PsyD is also an honorary
postname (Doctor of Psychology). To do this, modify the existing entry to add
the honorary-postname codes:
Action: A
Primary: PsyD
Secondary: PsyD
Info: HONPOST
Stdtype: HONPOST_STD HONPOST_MTC
Notice that when you add a new information code, you must also specify at
least one standard for that type of information. In this case, we specified PsyD
as the standard and match standard for honorary postnames.
Sample transaction: Modify standards and standard-types
Suppose you want to standardize your data to make it as consistent as
possible. In job titles, the word Engineer is standardized to Engr., but you
would prefer to standardize it to Eng. instead.
In the dictionary, the title standard for Engineer is Engr.:
C:\umd /s parsing.dct
Using a Parsing Dictionary.
Enter a query, or press <ESC> to exit.
Enter> engineer
Usage: 0
Intl Code(s): USENGLISH
Info Code(s): FIRMMISC PHRASE_WRD TITLE
Standard(s) for ENGINEER:
- ENGINEER FIRM_MTC, FIRM_STD
- ENGR. TITLE_MTC, TITLE_STD
To change the title standard to Eng., you need to do two things:
•Add the standard Eng. and identify it as a title standard (TITLE_STD).
•Delete the TITLE_STD code from the standard Engr.
You would put these entries in your transaction file:
Field name   Entry to add ‘Eng.’     Entry to delete TITLE_STD code
             as a standard           from the standard ‘Engr.’
Action       A                       D
Primary      Engineer                Engineer
Secondary    Eng.                    Engr.
Usage
Intl
Info
Stdtype      TITLE_STD               TITLE_STD
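To see why these two entries together give the result you want, it can help to model a dictionary entry the way UMD Show displays it: each standard followed by its standard-type codes. The Python below is a hypothetical sketch, not how UMD stores or processes entries. The helper name apply_transaction is invented, and it assumes that an A entry adds the listed codes to a standard (creating the standard if needed) while a D entry removes only the listed codes, dropping the standard once it has none left.

```python
def apply_transaction(entry, action, secondary, stdtypes):
    """Apply one A (add) or D (delete) transaction to a word's standards.

    entry maps each standard (uppercase, as UMD Show displays it) to the
    set of standard-type codes listed after it. Modeling assumption only.
    """
    codes = set(stdtypes.split())
    standard = secondary.upper()
    if action == "A":
        # Add the codes, creating the standard if it is not listed yet.
        entry.setdefault(standard, set()).update(codes)
    elif action == "D":
        # Remove only the listed codes; drop the standard once it has none.
        remaining = entry.get(standard, set()) - codes
        if remaining:
            entry[standard] = remaining
        else:
            entry.pop(standard, None)
    else:
        raise ValueError("this sketch handles only A and D actions")
    return entry


# The ENGINEER entry as shown by the UMD Show query above.
engineer = {
    "ENGINEER": {"FIRM_MTC", "FIRM_STD"},
    "ENGR.": {"TITLE_MTC", "TITLE_STD"},
}
apply_transaction(engineer, "A", "Eng.", "TITLE_STD")
apply_transaction(engineer, "D", "Engr.", "TITLE_STD")
print(engineer)
```

Under these assumptions, Engr. keeps its TITLE_MTC code, so it remains the title match standard while Eng. becomes the title text standard.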
Sample transaction: Add an acronym for acronym conversion
The name parser can convert a prename, postname, job title, firm name, or
firm location to an acronym. The parser produces an acronym only when one
is available in the parsing dictionary—it does not generate initials by
algorithm or rule.
How the parser generates acronyms
Before looking for an acronym, the parser removes all punctuation and noise
words and gets the first appropriate match standard for each word. You must
use the same phrase that the parser will actually look up; otherwise, the
parser won’t find your entry and won’t generate the acronym.
To add an acronym:
1. Start with the raw phrase.
   Example: Certified Residential Specialist
2. Remove the words and, or, of, the, and for.
   Example: Certified Residential Specialist
3. For firm names, remove firm terminator words such as Corp, Inc, Ltd, Co, etc.
   Example: Certified Residential Specialist
4. Query each remaining word. Get the first appropriate match standard for each.¹ For example, if you
   are adding a firm match standard, get the first FIRM_MTC.
   Example: Cert. Residential Specialist
5. Remove all punctuation. This is the lookup form of the acronym phrase.
   Example: Cert Residential Specialist
1. If the word is not in the dictionary, create a new entry for the word (see page 18). If the word is in the dictionary but
does not list an appropriate match standard, create an entry to add the appropriate information code and match-standard
type (see pages 23 and 24). For example, for the word Residential we would add the information code HONPOST and the
standard-type code HONPOST_MTC.
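The procedure above is mechanical enough to sketch in code. The Python below is a hypothetical illustration, not part of UMD: NOISE_WORDS, FIRM_TERMINATORS (a partial list, since step 3 ends with "etc."), and FIRST_MATCH_STANDARD are invented sample data standing in for what you would learn by querying parsing.dct, and bare and lookup_form are illustrative helper names. In this sketch, a word without a listed match standard keeps its own spelling; the footnote above explains how to give such words a standard instead.

```python
import re

# Hypothetical sample data: in practice, get these values by querying
# parsing.dct with UMD Show rather than hard-coding them.
NOISE_WORDS = {"AND", "OR", "OF", "THE", "FOR"}
FIRM_TERMINATORS = {"CORP", "INC", "LTD", "CO"}   # partial list
FIRST_MATCH_STANDARD = {"CERTIFIED": "Cert."}     # word -> first match standard


def bare(word):
    """Uppercase a word and strip its punctuation, for comparisons."""
    return re.sub(r"[^\w]", "", word).upper()


def lookup_form(phrase, is_firm=False):
    """Reduce a raw phrase to the form the parser looks up for an acronym."""
    words = phrase.split()                                      # step 1
    words = [w for w in words if bare(w) not in NOISE_WORDS]    # step 2
    if is_firm:                                                 # step 3
        words = [w for w in words if bare(w) not in FIRM_TERMINATORS]
    # Step 4: replace each word with its first appropriate match standard.
    words = [FIRST_MATCH_STANDARD.get(bare(w), w) for w in words]
    words = [re.sub(r"[^\w]", "", w) for w in words]            # step 5
    return " ".join(w for w in words if w)


print(lookup_form("Certified Residential Specialist"))  # Cert Residential Specialist
```

The string the function returns is what you would type as the Primary of the acronym entry.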
To add the phrase to the dictionary, put an entry in your transaction file. Use
the lookup form of the phrase as the primary, and use the acronym itself as the
secondary:
Field name   Transaction entry
Action       N
Primary      Cert Residential Specialist
Secondary    CRS
Usage        0
Intl         USENGLISH
Info         HONPOST
Stdtype      HONPOST_ACR
Rules for working with match standards
Each entry in the parsing dictionary may include one or more match standards.
You can use match standards to improve the performance of your matching or
merge/purge software.
How match standards work
To simplify this discussion, we discuss match standards for personal names.
However, match standards are also available for other types of data.
In the dictionaries, a match standard is a one-way relationship, a pointer from
one name to another:
[Figure: match-standard pointers among the names Al, Alberto, Albert, Allen, Alan, Alfredo, Alfred, Alex, Alexander, Alphonso, Alphonse, Alonzo, and Almon.]
•For the name Al, the match standards are Albert, Alan, Alfred, Alexander,
Alphonse, and Almon.
•For the name Alberto, the match standard is Albert. (Likewise, for Allen
the match standard is Alan; for Alfredo, Alfred; and so on.)
If two different names return the same match standard, you can use your
matching software to do multiway comparisons and find a match. For
example, since Alberto and Al both return Albert as a match standard, your
matching software could match Alberto Smith to Al Smith.
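The multiway comparison described above can be sketched as a set intersection. The Python below is a hypothetical illustration of the idea, not the algorithm of any particular matching product: MATCH_STANDARDS copies the example relationships, standards_for and may_match are invented helper names, and treating a name with no entry as its own standard is an assumption of the sketch.

```python
# Match standards from the example above: each name points one way to its
# standard(s), and each standard points to itself.
MATCH_STANDARDS = {
    "AL": {"ALBERT", "ALAN", "ALFRED", "ALEXANDER", "ALPHONSE", "ALMON"},
    "ALBERTO": {"ALBERT"},
    "ALLEN": {"ALAN"},
    "ALFREDO": {"ALFRED"},
    "ALBERT": {"ALBERT"},
    "ALAN": {"ALAN"},
    "ALFRED": {"ALFRED"},
}


def standards_for(name):
    """Return the match standards for a name.

    Assumption for this sketch: a name with no entry acts as its own standard.
    """
    key = name.upper()
    return MATCH_STANDARDS.get(key, {key})


def may_match(a, b):
    """Two names are match candidates if they share a match standard."""
    return bool(standards_for(a) & standards_for(b))


print(may_match("Alberto", "Al"))     # True: both return ALBERT
print(may_match("Alberto", "Allen"))  # False: ALBERT vs ALAN
```

Because the pointers are one-way, Albert never points back to Al; the match is found only because both names lead to the same standard.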
Here are partial dictionary entries for the name Al and its direct match
standards.
Primary      Standard
ALBERT       ALBERT
ALAN         ALAN
ALFRED       ALFRED
ALEXANDER    ALEXANDER
ALPHONSE     ALPHONSE
ALMON        ALMON
AL           ALBERT, ALAN, ALFRED, ALEXANDER, ALPHONSE, ALMON
Notice that each match standard has its own entry, and that in that entry, the
standard is the same as the primary.
Working with match standards
To use a word as a match standard, it should have its own entry in the
dictionary (or have its own entry in the transaction file).⁶ In that entry, the
word must be a match standard of itself; in other words, the match standard
must be the same as the query word.
For example, you could use the word Dr as a match standard because it is in
the dictionary and has itself, Dr, as a match standard:
Enter a query, or press <ESC> to exit.
Enter> Dr
Usage: 0
Intl Code(s): USENGLISH
Info Code(s): HONPOST_ALONE PREGEN3 PRENAME_ALONE
Standard(s) for DR:
- DR. HONPOST_MTC, HONPOST_STD, PRENAME_MTC, PRENAME_STD
If a word is a match standard of itself, you can use it as the same type of
match standard for another word:
Field name   Transaction entry
Action       A
Primary      Doc
Secondary    DR.
Usage
Intl
Info         PRENAME PREGEN3
Stdtype      PRENAME_STD PRENAME_MTC
Spelling and punctuation. The spelling and punctuation of the Secondary
in the transaction entry must exactly match the Standard in the existing
dictionary entry.
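A pre-check of these rules can be sketched in a few lines. The Python below is a hypothetical illustration, not a UMD feature: DICTIONARY is an invented stand-in for the Dr query shown above, query_key and usable_as_standard are illustrative names, and the assumption that queries ignore case and punctuation (Dr finds the entry whose standard is DR.) is inferred from that example. It follows the guideline that the Secondary has its own entry, so it does not model the footnote 6 case.

```python
import re

# Hypothetical excerpt of the UMD Show output for Dr, keyed by query word:
# each standard maps to the standard-type codes listed after it.
DICTIONARY = {
    "DR": {"DR.": {"HONPOST_MTC", "HONPOST_STD", "PRENAME_MTC", "PRENAME_STD"}},
}


def query_key(word):
    # Assumption: queries ignore case and punctuation (Dr finds DR.).
    return re.sub(r"[^\w]", "", word).upper()


def usable_as_standard(secondary, stdtype):
    """Check a proposed Secondary against the rules above."""
    entry = DICTIONARY.get(query_key(secondary))
    if entry is None:
        return False  # the word has no entry of its own
    # The Secondary must match an existing standard exactly, including
    # punctuation, and that standard must carry the same standard type.
    return stdtype in entry.get(secondary.upper(), set())


print(usable_as_standard("DR.", "PRENAME_MTC"))  # True
print(usable_as_standard("DR", "PRENAME_MTC"))   # False: punctuation differs
```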
6. Technically, you could also use a word as a match standard if that word does not have an entry in the dictionary—for
example, you could use Michelangelo as a match standard because Michelangelo is not in the dictionary. In practice,
however, if you use a word as a match standard, you’ll probably also want that word to have its own entry in the dictionary,
so we make that assumption in our guidelines.
Chapter 2:
Custom capitalization dictionaries
This chapter explains how to create and maintain custom capitalization
dictionaries.
What is a capitalization dictionary?
In a custom capitalization dictionary, you can specify the correct casing for a
word in different situations. For example, you can specify that when MCKAYE
is used as a last name, the casing should be McKaye.
Why create a custom dictionary?
Most users find that our capitalization dictionary, pwcap.dct, produces good
mixed-case results. However, if a word is not cased as you would like, you can
enter that word in a custom capitalization dictionary.
For example, if you want the word TECHTEL to be cased as TechTel, you
could add the word TechTel to your custom dictionary.
Most of our products allow you to use two capitalization dictionaries at once,
so we expect that most users will employ our base dictionary “as is” and build
their own, separate dictionary as an extension. When you use your dictionary,
you can give it priority over ours by specifying your dictionary as Dictionary
#2.
Create transactions, build your dictionary
For each entry you want to place in your capitalization dictionary, you will
create a record, or transaction, in a database called a transaction file.
After you make all of your entries in the transaction file, you will run the
UMD Build process. UMD Build reads the entries from your transaction file
and creates your custom dictionary.
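The interplay of attributes and dictionary priority can be pictured with a small model. The Python below is a hypothetical sketch, not how our products read .dct files: BASE and CUSTOM are invented in-memory stand-ins for pwcap.dct and your Dictionary #2 (the lowercase-c Mckaye base entry is made up purely to show an override), mixed_case is an illustrative name, and the assumption is that the higher-priority dictionary is searched first and the first entry whose attributes fit the data wins.

```python
# Hypothetical stand-ins for two capitalization dictionaries:
# uppercase word -> list of (casing, attributes) entries.
BASE = {                                   # plays the role of pwcap.dct
    "PHD": [("PhD", {"EVERY"})],
    "MCKAYE": [("Mckaye", {"EVERY"})],     # invented entry, for illustration
}
CUSTOM = {                                 # plays the role of Dictionary #2
    "MCKAYE": [("McKaye", {"LASTNAME"})],
    "TECHTEL": [("TechTel", {"FIRM"})],
}


def mixed_case(word, attribute, dictionaries):
    """Return the first casing whose attributes fit, searching the
    dictionaries in priority order; None means no dictionary applies."""
    for dictionary in dictionaries:
        for casing, attributes in dictionary.get(word.upper(), []):
            if attribute in attributes or "EVERY" in attributes:
                return casing
    return None


priority = [CUSTOM, BASE]  # Dictionary #2 searched first
print(mixed_case("MCKAYE", "LASTNAME", priority))  # McKaye
print(mixed_case("MCKAYE", "CITY", priority))      # Mckaye (from BASE)
print(mixed_case("PHD", "POSTNAME", priority))     # PhD
```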
Query our dictionary or yours
You can look up words in our dictionary, pwcap.dct, or your custom
dictionary. For example, if you want to see how we capitalize the word PHD,
you can query the dictionary:
c:\umd /s pwcap.dct
Using a Capital Dictionary.
Enter a query, or press <ESC> to exit.
Enter> PHD
PHD is capitalized as follows:
-PhD is used with EVERY occurrence.
For more details about querying a capitalization dictionary, see “Querying
your dictionary” on page 33.
Step 1: Create a capitalization transaction file
A transaction file is a database that contains all of your entries for a particular
custom capitalization dictionary. Each entry in your transaction file will create
one entry in your custom dictionary.
If you are working with an existing custom dictionary, use the existing
transaction file for that dictionary. Do not create more than one transaction
file for each custom dictionary.
Create a transaction database
The quickest, easiest way to create a transaction database and its supporting
files is to use the “output file” feature of UMD Show. (See “UMD Show” on
page 41.)
1. Use UMD Show to query our base capitalization dictionary, pwcap.dct.
Include the o option on the command line. Use the base file name that you
plan to use for your custom dictionary, but with the extension .trn (for
example, my_cap.trn).⁷
2. Query a word that is in the dictionary, such as PhD.
3. Press <Enter> to save the query to your output file. Then press <Esc> to
exit.
C:\umd /s pwcap.dct /o my_cap.trn /d dBase3
Using a Capital Dictionary
Enter a query, or press <ESC> to exit.
Enter> phd
PHD is capitalized as follows:
-PhD is used with EVERY occurrence
Enter a query, or press <Enter> to save, or press <ESC> to exit.
Enter> <Enter>
Previous query appended to C:\my_cap.trn.
Enter a query, or press <ESC> to exit.
Enter> <Esc>
UMD Show will create an output database file—for example, my_cap.trn.
You can use this database as your transaction file. For instructions on adding
your entries to the transaction file, see “Step 2: Put your entries in the
transaction file” on page 31.
Keep supporting files with transaction file
When you create a transaction database as described above, UMD Show
creates a supporting file such as my_cap.def. If the transaction file is ASCII
or delimited-ASCII, UMD Show also creates an additional file such as
my_cap.fmt or my_cap.dmt.
To open and read the transaction file, UMD needs these files. If you move the
transaction file to a new location, make sure you also move the corresponding
supporting files.
7. If you plan to use a database program or spreadsheet program to edit the file, we recommend creating a dBASE3 or
ASCII file. If you plan to use a text editor or word processor to edit the file, we recommend creating a delimited file.
However, be aware that our UMD Views Windows program does not support updating of delimited files.
Step 2: Put your entries in the transaction file
For each word you want to add to your custom capitalization dictionary, create
a record in your transaction file. Use a text editor or a database program to add
records to your transaction file.
Capitalization transaction entries
The table below describes what information to put in each field in your
transaction file.
Field       Data to enter
Action      Choose one:
            N  Create a new entry.
            D  Delete the existing entry from the source dictionary.¹
Primary     Type the word in the preferred casing, 54 characters maximum. Type a
            single word (no spaces). Do not include any punctuation.
Attribute   Specify when this casing should be used. Include all that apply, separated
            by one space (no punctuation):
            PRENAME       Prenames
            FIRSTNAME     First names
            LASTNAME      Last names
            PRELASTNAME   Last-name prefixes
            POSTNAME      Postnames
            TITLE         Job titles
            FIRM          Firm data
            ADDRESS       Address lines²
            CITY          City names²
            STATE         State names²
            FINANCIAL     Financial terms
            EVERY         Every occurrence
            You may type the entire word or just its abbreviation: PRE, FIRS, LAS,
            PREL, POS, TITL, FIRM, ADD, CIT, STA, FIN, or EVE (for example,
            FIRSTNAME or FIRS).
            If the Action field contains D, you may leave this field blank.
1. This command is used rarely, if ever. If you want to delete an entry from your custom dictionary, simply delete that
record from your transaction file, then rebuild the dictionary. If you don’t like the casing for a word in our base dictionary,
pwcap.dct, you don’t need to delete the entry from our dictionary. Instead, put the desired casing in your custom
dictionary. When you process data, specify your dictionary as Dictionary #2 so that your entry will override ours.
2. Do not specify the ADDRESS, CITY, or STATE attribute unless the product that uses the dictionary has address-parsing capability.
Sample entries
Here are some sample entries.
Action   Primary    Attribute
N        dos        PRELASTNAME
N        McCathie   EVERY
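Before running UMD Build, you could catch simple mistakes with a pre-check like the one below. This Python is a hypothetical sketch, not a UMD tool: check_entry is an invented name, the abbreviations are read from the attribute table above, and requiring an attribute for N entries is an inference from the note that only D entries may leave the field blank. UMD's own verification (the Verify Input File Only option) remains the authority.

```python
# Full attribute names and the abbreviations listed in the table above.
ATTRIBUTES = {
    "PRENAME": "PRE", "FIRSTNAME": "FIRS", "LASTNAME": "LAS",
    "PRELASTNAME": "PREL", "POSTNAME": "POS", "TITLE": "TITL",
    "FIRM": "FIRM", "ADDRESS": "ADD", "CITY": "CIT", "STATE": "STA",
    "FINANCIAL": "FIN", "EVERY": "EVE",
}
VALID_TOKENS = set(ATTRIBUTES) | set(ATTRIBUTES.values())


def check_entry(action, primary, attribute):
    """Return a list of problems with one capitalization transaction."""
    problems = []
    if action not in ("N", "D"):
        problems.append("Action must be N or D")
    if len(primary) > 54:
        problems.append("Primary is longer than 54 characters")
    if not primary.isalnum():
        problems.append("Primary must be one word with no spaces or punctuation")
    tokens = attribute.split()
    if action == "N" and not tokens:
        problems.append("Attribute is required for a new entry")
    for token in tokens:
        if token not in VALID_TOKENS:
            problems.append(f"unknown attribute {token!r}")
    return problems


for row in [("N", "dos", "PRELASTNAME"), ("N", "McCathie", "EVERY")]:
    print(row, check_entry(*row) or "OK")
```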
Step 3: Build your custom capitalization dictionary
After you put all of your entries in your transaction file, use UMD Build to
build your custom capitalization dictionary.
UMD Build
To build your dictionary, run UMD Build. The easiest way to convey your
instructions to UMD is through the UMD configuration file.
1. Open a copy of the configuration file umd.cfg.
2. Type your instructions in the UMD Build parameters. For descriptions of
the parameters, see Appendix A.
# UMD Show
Output File Name (path & file name) .... =
Output File Type (See NOTE) ............ =
#
# UMD Build
Dictionary Type (see NOTE) ............. = Capital
Source Dictionary (path & dct) ......... =
Transaction File Name (path & file name) = c:\pathname\my_cap.trn
Work Directory (path) .................. = r:\temp
...
3. Save the configuration file. We recommend using the same base file name
as your dictionary, but with the extension .cfg—for example, my_cap.cfg.
4. Run UMD with the cfg option. For example:
umd /cfg my_cap.cfg
During the build process, UMD reads the entries from your transaction file
and creates your custom dictionary.
Tips:
•We recommend that you use the same base file name for the transaction
file and custom dictionary, and store both files in the same location.
•We recommend that you accumulate all your custom entries in one
transaction file and build your custom dictionary from the transaction file
only. If you do this, you will not need to specify a source dictionary when
you run UMD Build.
Step 4: Update your custom dictionary
If you want to add a new entry to your custom dictionary, edit your transaction
file, then rebuild your custom dictionary.
Use your existing transaction file
To update an existing dictionary, add your new entries to your existing
transaction file. Your custom dictionary will be much easier to manage if you
accumulate all of your entries in one transaction file, rather than scattering
them among many files.
To add a word, create a record for that word in the transaction file. To delete a
word, delete the record for that word from the transaction file. For dBASE3
files, UMD supports non-destructive delete marking.
Rebuild your custom dictionary
After you add your new entries, run UMD Build as instructed on page 32.
UMD will rebuild your custom dictionary based on your updated transaction
file.
Querying your dictionary
You may wish to query your custom capitalization dictionary to see whether it
contains a particular word. To query a dictionary, run UMD Show (see “UMD
Show” on page 41). For example:
umd /s my_cap.dct
If you look up a word that is in the dictionary, UMD displays the preferred
casing and tells you when that casing is used:
c:\umd /s my_cap.dct
Using a Capital Dictionary.
Enter a query, or press <ESC> to exit.
Enter> TECHTEL
TECHTEL is capitalized as follows:
-TechTel is used with FIRM occurrences.
Chapter 3:
Search-and-replace tables
You can use UMD to create search-and-replace tables for use with our
DataRight product. The process consists of two main steps:
1. Create a search-and-replace transaction file
2. Build the search-and-replace table
Step 1: Create a search-and-replace transaction file
A search-and-replace transaction file is a database that contains all of your
search-and-replace entries for a particular table. You can use an existing
database, or you can create a new database.
If you are working with an existing table, use the existing transaction file for
that table. Do not create more than one transaction file for each table. Your
table will be much easier to manage if all your entries are in one transaction
file, rather than scattered among many files.
Transaction file
A search-and-replace transaction file must be a fixed-length ASCII, dBASE3,
or delimited-ASCII database.¹ You may create a new database or use an
existing database.
The database must contain a field for the search value and a field for the
replacement value. Each field may contain up to 128 characters. It is allowable
for the file to contain additional fields, but those fields will be ignored.
Transaction entries
In your transaction file, enter the following information for each entry:
leading spaces, and spaces between words will be respected. To indicate a replacement value of nothing, leave the Secondary field empty.
Here are a few sample entries:
Primary   Secondary
01        Mr.
02        Ms.
03        Mrs.
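If you keep your table as a delimited-ASCII file, a short script can confirm that the entries respect the rules above before you build. The Python below is a hypothetical sketch, not part of UMD or DataRight: load_entries is an invented name, and it assumes the search value is the first field and the replacement the second, whereas UMD actually identifies the fields by name through the DEF file. It keeps spaces exactly as typed and treats a missing or empty Secondary as "replace with nothing".

```python
import csv
import io

MAX_FIELD = 128  # characters allowed in each search or replacement value


def load_entries(text):
    """Parse delimited search-and-replace rows into a search -> replace dict.

    Assumes the search value is the first field and the replacement the
    second; any further fields are ignored. Spaces are kept exactly.
    """
    table = {}
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        search = row[0]
        replace = row[1] if len(row) > 1 else ""  # empty = replace with nothing
        if len(search) > MAX_FIELD or len(replace) > MAX_FIELD:
            raise ValueError(f"line {line_no}: value exceeds {MAX_FIELD} characters")
        table[search] = replace
    return table


sample = '"01","Mr."\r\n"02","Ms."\r\n"03","Mrs."\r\n'
print(load_entries(sample))
```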
1. If you plan to use a database program or spreadsheet program to edit the file, we recommend creating a dBASE3 or
fixed-length ASCII file. If you plan to use a text editor or word processor to edit the file, we recommend creating a
delimited file. However, be aware that our UMD Views program does not support updating of delimited files.
Supporting files
To open and read a transaction file, UMD requires supporting files that
describe the format of the file. You must create these supporting files and
store them in the same location as your transaction file. To create the files, use
a text editor or word processor (save the file as text-only).
Definition file (DEF)
The definition file defines the type of database and the names of the fields
containing the search (primary) and replacement (secondary) values. For example:
Database Type = dBase3
User:Primary = Search
User:Secondary = Replace
•On the first line, type Database Type =, then type the appropriate
database type: ASCII, dBASE3, or Delimited.
•On the second line, type User:Primary =, then type the name of the field
that contains the search value.
•On the third line, type User:Secondary =, then type the name of the field
that contains the replacement value.
When you save the file, use the same base file name as your transaction file,
but with the file extension .def. For example, if the transaction file is named
table.drl, then name the definition file table.def.
If your database contains additional fields, do not define those fields in the
DEF file.
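Because the DEF file is just three lines of text, you could also generate it with a script. The Python below is a hypothetical sketch: write_def is an invented helper, and the field names Search and Replace come from the example above. It writes the file under the same base name as the transaction file, with the extension .def, and accepts the three database types named in the list above regardless of capitalization (the example uses dBase3, the list dBASE3).

```python
from pathlib import Path

DATABASE_TYPES = {"ascii", "dbase3", "delimited"}


def write_def(transaction_file, database_type, search_field, replace_field):
    """Write the DEF file next to a search-and-replace transaction file."""
    if database_type.lower() not in DATABASE_TYPES:
        raise ValueError("Database Type must be ASCII, dBASE3, or Delimited")
    def_path = Path(transaction_file).with_suffix(".def")  # same base name
    def_path.write_text(
        f"Database Type = {database_type}\n"
        f"User:Primary = {search_field}\n"
        f"User:Secondary = {replace_field}\n"
    )
    return def_path
```

For example, write_def("table.drl", "dBase3", "Search", "Replace") would create table.def in the same directory.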
Format file (FMT or DMT)
A format file describes the structure of the database. It lists the name, length,
and data type of each field (usually c for character).
Create a text file that contains one line for each field in the database. On
each line, type the field name, length, and data type, separated by commas.
If the database is delimited-ASCII, also specify the ASCII code values for the
delimiters used in the file. For example:
Search, 20, c
Replace, 20, c
Record Delimiter = 013 010
Field Delimiter = 044
Field Framing Character = 034
The delimiter values are decimal ASCII codes: 013 010 is a carriage return
followed by a line feed, 044 is a comma, and 034 is a double quotation mark.
When you save the format file, use the same base file name as the transaction
file. If the transaction file is fixed-length ASCII, use the file extension .fmt. If
the transaction file is delimited ASCII, use the file extension .dmt. For
example, if the transaction file is table.trn, then name the format file
table.fmt or table.dmt.
If your database contains additional fields, you must define those fields in
the format file.
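To see how the example DMT values correspond to the bytes in a delimited file, here is a hypothetical Python sketch that writes a small delimited-ASCII transaction file and its DMT file together. write_delimited is an invented name; quoting every field with the framing character, letting Python's csv module double any embedded quotation marks, and refusing values longer than the declared field width are all assumptions of the sketch, not documented UMD behavior. The matching DEF file would specify Database Type = Delimited.

```python
import csv
from pathlib import Path


def write_delimited(transaction_file, entries, width=20):
    """Write a delimited-ASCII transaction file and its matching DMT file."""
    path = Path(transaction_file)
    for search, replace in entries:
        if len(search) > width or len(replace) > width:
            raise ValueError(f"value longer than the declared width {width}")
    # 013 010 = CR LF records, 044 = comma fields, 034 = double-quote framing.
    with path.open("w", newline="", encoding="ascii") as f:
        writer = csv.writer(f, delimiter=",", quotechar='"',
                            quoting=csv.QUOTE_ALL, lineterminator="\r\n")
        writer.writerows(entries)
    dmt_path = path.with_suffix(".dmt")  # same base name as the transaction file
    dmt_path.write_text(
        f"Search, {width}, c\n"
        f"Replace, {width}, c\n"
        "Record Delimiter = 013 010\n"
        "Field Delimiter = 044\n"
        "Field Framing Character = 034\n"
    )
    return dmt_path
```

For example, write_delimited("table.trn", [("01", "Mr."), ("02", "Ms.")]) would write the records "01","Mr." and "02","Ms." and create table.dmt.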
For complete information about DEF, FMT, and DMT files, see our Database Prep manual.
Step 2: Build the search-and-replace table
After you put all of your entries in the transaction file, use UMD Build to
build the search-and-replace table.
UMD Build
To build your table, run UMD Build. The easiest way to convey your
instructions to UMD is through the UMD configuration file.
1. Open a copy of the configuration file umd.cfg.
2. Type your instructions in the UMD Build parameters. For descriptions of
the parameters, see Appendix A.
# UMD Show
Output File Name (path & file name) .... =
Output File Type (See NOTE) ............ =
#
# UMD Build
Dictionary Type (see NOTE) ............. = Generic
Source Dictionary (path & dct) ......... =
Transaction File Name (path & file name) = c:\pathname\table.trn
Work Directory (path) .................. = r:\temp
...
3. Save the configuration file. We recommend you use the same base file
name as your search-and-replace table, but with the extension .cfg—for
example, table.cfg.
4. Run UMD with the cfg option. For example:
umd /cfg table.cfg
Tips:
•When you name your search-and-replace table, use the extension .dct.
•Use the same base file name for the transaction file and the
search-and-replace table, and store both files in the same location. That way,
you can tell at a glance which transaction file goes with which table.
•For each table, we recommend that you accumulate all of your entries
in one transaction file. If you do this, you will not need to specify a
“source” dictionary when you run UMD Build.
Appendix A:
UMD configuration file, umd.cfg
umd.cfg
Rather than type a long command line, you can use the UMD configuration
file. Make a copy of umd.cfg and save it under a different file name. Then edit
and use your copy.
In the configuration file, do not edit anything to the left of the equal signs. To
insert comments, prefix them with a pound sign (#). Complete either the UMD
Show section or the UMD Build section, not both. (Exception: For UMD
Show, specify the dictionary at the Source Dictionary parameter.)
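If you prepare many configuration files, you might fill in your copy with a script that, like a careful hand edit, changes only the text to the right of each equal sign and skips comment lines. The Python below is a hypothetical sketch: fill_cfg is an invented name, and it assumes each parameter line looks like the umd.cfg excerpts in this guide (a label, an optional parenthetical note, dot leaders, then =).

```python
import re


def fill_cfg(template, values):
    """Return a copy of umd.cfg text with selected parameters filled in.

    values maps a parameter name (the label without its parenthetical note
    and dot leaders, e.g. "Dictionary Type") to the value to set. Text to
    the left of each equal sign is left exactly as it was.
    """
    lines = []
    for line in template.splitlines():
        if "=" in line and not line.lstrip().startswith("#"):
            label, _ = line.split("=", 1)
            name = re.sub(r"\s*\(.*\)", "", label.rstrip(" .")).strip()
            if name in values:
                line = f"{label}= {values[name]}"
        lines.append(line)
    return "\n".join(lines) + "\n"
```

For example, you could read umd.cfg, call fill_cfg with {"Dictionary Type": "Generic", "Transaction File Name": r"c:\pathname\table.trn"}, and save the result as table.cfg. Remember to complete only the UMD Show section or the UMD Build section.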
Command line
When you run UMD, include the configuration file as a parameter on the
UMD command line:
Platform            Command line
UNIX, VMS           umd -cfg cfg_file.cfg
Windows, Alpha NT   umd /cfg cfg_file.cfg
Output File Name
Output File Type
These parameters are for UMD Show mode only. If you want to record query
results in an output file, enter the path and file name. Specify a file type of
ASCII, dBASE3, or Delimited.¹
1. If you plan to use the output file as a transaction file, we recommend the following: If you plan to use a database
program or spreadsheet program to edit the file, create a dBASE3 or ASCII file. If you plan to use a text editor or word
processor to edit the file, create a delimited file. However, be aware that our UMD Views Windows program does not
support updating of delimited files.
Dictionary Type
Enter the type of dictionary you want to modify. Possible dictionary types are
Parsing, Capital, and Generic (search-and-replace).
Source Dictionary
If you are building a custom dictionary or table, enter the path and file name of
the source dictionary:
•If you are creating a parsing dictionary, specify our parsing dictionary
parsing.dct.
•If you are creating a capitalization dictionary or search-and-replace table,
you should usually leave this blank.
If you are querying an existing dictionary (UMD Show), specify the path and
file name of the dictionary you want to query.
Transaction File Name
Type the path and file name of the transaction file containing your entries.
Target Dictionary
Type the path and file name of the custom dictionary you want to create. If the
file already exists, UMD overwrites the existing file.¹ If you do not specify a
target, UMD uses the source dictionary as the target.
Do not overwrite any of our base dictionaries. Instead, give your custom
dictionary a separate name. Each time you install a software update, we
overwrite our base dictionaries. If you use our file names for your
dictionaries, your custom dictionaries may be overwritten.
Verify Input File Only
If you set this option to Yes, UMD checks all the entries in the transaction file
but does not actually produce the target dictionary. This is handy if you want
to verify during the day and run the build process during the night.
If you set this option to No, UMD checks the entries in the transaction file. If
no verification errors occur, UMD builds the target dictionary.
Error Message Log File
We recommend that you specify an error log file. UMD will write any error or
warning messages to the log file so you can review them later.
If you leave this parameter blank, UMD sends error and warning messages to
the screen (standard output). If any messages scroll off the screen, you will not
be able to retrieve them.
Work Directory
By default, UMD places its temporary work files in the current directory. If
you would like to use some other location, specify a path.
To estimate the space required for work files, use this formula:
Work space = 4 x (size of transaction file + size of source dictionary)
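The formula is easy to apply with a script before a long build. The Python below is a hypothetical sketch: estimate_work_space and enough_room are invented names, the estimate is in bytes, and the check against free disk space is a convenience, not something UMD itself performs. When you leave Source Dictionary blank (as is usual for capitalization dictionaries and search-and-replace tables), only the transaction file counts.

```python
import os
import shutil


def estimate_work_space(transaction_file, source_dictionary=None):
    """Work space = 4 x (size of transaction file + size of source dictionary)."""
    total = os.path.getsize(transaction_file)
    if source_dictionary:  # a blank Source Dictionary adds nothing
        total += os.path.getsize(source_dictionary)
    return 4 * total


def enough_room(work_directory, transaction_file, source_dictionary=None):
    """Compare the estimate with the free space in the work directory."""
    needed = estimate_work_space(transaction_file, source_dictionary)
    return shutil.disk_usage(work_directory).free >= needed
```

For example, a 2 MB transaction file built against a 3 MB source dictionary needs roughly 4 x 5 MB = 20 MB of work space.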
1. Before overwriting an existing dictionary, UMD makes a backup copy of the existing file. For example, if the
dictionary is named custom.dct, UMD creates a backup file named custom.001. The next time, UMD creates a backup
named custom.002, and so on up to custom.999.
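If you clean up old backups with a script, it may help to predict which name comes next. The Python below is a hypothetical sketch: next_backup is an invented name, it assumes the next number is one higher than the highest existing backup (the note above does not say what happens when numbers are skipped), and it returns None after custom.999 because the manual does not describe UMD's behavior past that point.

```python
from pathlib import Path


def next_backup(dictionary):
    """Predict the next backup name (custom.001, custom.002, ... custom.999).

    Assumption: the next number is one higher than the highest existing one.
    """
    path = Path(dictionary)
    numbers = [
        int(p.suffix[1:])
        for p in path.parent.glob(path.stem + ".[0-9][0-9][0-9]")
    ]
    n = max(numbers, default=0) + 1
    return path.with_suffix(f".{n:03d}") if n <= 999 else None
```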
Appendix B:
UMD command line
You can use one of three command lines with UMD:
•UMD Show, for querying an existing dictionary or table
•UMD Build, for verifying and building your user-modifiable dictionary
•UMD Config, for using the UMD configuration file.
UMD Show
You can query an existing dictionary or table by using the UMD Show
command line.
Platform            Command line
UNIX, VMS           umd -s dct_file.dct [-o out_file] [-d db_type]
Windows, Alpha NT   umd /s dct_file.dct [/o out_file] [/d db_type]
Parameter        Description
s dct_file.dct   Path and file name of the dictionary to query.
o out_file¹      Path and file name of the output file. If you save a query,
                 UMD writes it to this file. If the file already exists, UMD
                 appends to the end of the file.
                 Note: You can edit the output file and use it as a transaction
                 file.
d db_type        Database type for the output file. Choose one: dBASE3,
                 ASCII (default), or Delimited.
1. If you plan to use the output file as a transaction file, we recommend the following: If you plan to use a
database program or spreadsheet program to edit the file, create a dBASE3 or ASCII file. If you plan to use a
text editor or word processor to edit the file, create a delimited file. However, be aware that our UMD Views
program does not support updating of delimited files.
UMD Build
If you prefer not to use the configuration file, you can place all the UMD
Build parameters on the command line.
Platform            Command line
UNIX, VMS           umd dct_type -i trans [-s source] [-t target] [-e err_log] [-p work] [-v]
Windows, Alpha NT   umd dct_type /i trans [/s source] [/t target] [/e err_log] [/p work] [/v]
Parameter   Description
dct_type    Dictionary type: Parsing, Capital, or Generic (search-and-replace).
i trans     Path and file name of the transaction file containing your custom
            entries.
s source    Path and file name of the source dictionary to use as a base for your
            custom dictionary.
t target    Path and file name of the custom dictionary to create. If the file
            already exists, UMD will overwrite it.¹ If you do not specify a target,
            UMD uses the source dictionary as the target.
e err_log   Log file for validation warnings and errors. We recommend that you
            include this parameter.
p work      Path and directory to use for temporary storage of work files. To
            estimate space requirements, use this formula:
            Work space = 4 x (size of transaction file + size of source dictionary)
v           Verify only. If you include this option, UMD checks all the entries
            in the transaction file but does not actually produce the target dictionary.
1. Before overwriting an existing dictionary, UMD makes a backup copy of the existing file. For example, if the
dictionary is named custom.dct, UMD creates a backup file named custom.001. The next time, UMD creates a
backup named custom.002, and so on up to custom.999.
UMD Config
Rather than type the UMD Show or UMD Build command line, you can
specify file names and options in the UMD configuration file (see Appendix A
on page 39).
To run UMD with the configuration file, use the following command:
Platform            Command line
UNIX, VMS           umd -cfg cfg_file.cfg
Windows, Alpha NT   umd /cfg cfg_file.cfg
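If you drive UMD Build from a script instead of a configuration file, you can assemble the UMD Build command line shown above as an argument list. The Python below is a hypothetical sketch: build_command is an invented name, and it only mirrors the syntax in the table above (options prefixed with / on Windows and Alpha NT, with - on UNIX and VMS). It does not run UMD.

```python
def build_command(dct_type, trans, source=None, target=None,
                  err_log=None, work=None, verify_only=False, windows=True):
    """Assemble a UMD Build command line as an argument list."""
    if dct_type not in ("Parsing", "Capital", "Generic"):
        raise ValueError("dct_type must be Parsing, Capital, or Generic")
    prefix = "/" if windows else "-"  # Windows, Alpha NT use /; UNIX, VMS use -
    args = ["umd", dct_type, prefix + "i", trans]
    for letter, value in (("s", source), ("t", target),
                          ("e", err_log), ("p", work)):
        if value:
            args += [prefix + letter, value]
    if verify_only:
        args.append(prefix + "v")
    return args


print(" ".join(build_command("Capital", "my_cap.trn", target="my_cap.dct",
                             err_log="my_cap.log")))
# umd Capital /i my_cap.trn /t my_cap.dct /e my_cap.log
```

If umd is on your PATH, you could pass the list to Python's subprocess.run with check=True; keeping the arguments in a list avoids quoting problems with paths that contain spaces.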
Appendix C:
Information codes and standard-type codes
Information codes
If you’re creating a parsing transaction entry, type the appropriate information
codes (or “info codes”) in the Info field. Put one space (no punctuation)
between codes.
If you’re using UMD Show to query a parsing dictionary, these are the codes
shown in the Info Codes field.
Information code   Description
FIRMINIT     When used in a firm name, likely to be the first word in the firm name.
FIRMLOC      A location within a firm (usually used for internal mail delivery), such as
             Department, Mailstop, Room, Building.
FIRMMISC     A word used in firm names.
FIRMNAME     This code is used for firm names that may be parsed incorrectly. For example,
             Hewlett Packard could be incorrectly parsed as a personal name, so Hewlett,
             Packard, and Hewlett Packard are all listed as Firm Name words.
FIRMTERM     Likely to be the last word in a firm name, such as Inc, Corp, Ltd, and so on.
HONPOST      A postname that signifies certification, academic degree, or affiliation, such as
             CPA, PhD, or USNR.
MATURPOST    A maturity postname such as Jr or Sr.
NAME         A first name or last name, such as John or Smith.
             Note: The entry must also include one of the NAMEGEN codes.
NAMEDESIG    A name designator such as Attn or c/o.
NAMEGEN1-5   The gender of the name.
             • NAMEGEN1  94 to 100 percent chance the person is a man (e.g., Robert).
             • NAMEGEN2  70 to 93 percent chance the person is a man (e.g., Adrian).
             • NAMEGEN3  The name does not reliably indicate gender, or is a last name.
             • NAMEGEN4  70 to 93 percent chance the person is a woman (e.g., Lynn).
             • NAMEGEN5  94 to 100 percent chance the person is a woman (e.g., Anne).
             Note: The entry must also include the NAME code.
NAMESPEC     A word that may appear in a name line, such as Family, Resident, or Occupant.
NUMBER       A number word, such as One, First, or 1st.
PHRASE_WRD   A word that is part of a phrase. For example, the dictionary contains an entry
             for the phrase VP Mktg. Each word in the phrase (VP and Mktg) is marked as a
             PHRASE_WRD.
PREGEN1-5    The gender of a prename.
             • PREGEN1  Masculine. For example, Mr or Senor.
             • PREGEN3  Neutral. For example, Dr or Capt.
             • PREGEN5  Feminine. For example, Ms, Mrs, or Senora.
             Note: The entry must also include the PRENAME code.
PRELAST      A last-name prefix, such as the Van in Van Allen or the O’ in O’Connor.
PRENAME      A prename, such as Mr, Ms, Senor, Senora, Dr, or Capt.
             Note: The entry must also include one of the PREGEN codes.
REGION       A geographical word such as North, Western, Minnesota, or NY.
TITLE        A word used in a job title, such as Software or Engineer.
FINAN        A financial term, such as Custodian or Trustee.
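The notes in this table pair certain codes, so you can check an Info field before building. The Python below is a hypothetical sketch, not a UMD validator: check_info_codes is an invented name, it reads "one of the NAMEGEN (or PREGEN) codes" as exactly one, and it accepts any number from 1 to 5 even though only PREGEN1, 3, and 5 are described above. It checks only transaction entries; UMD Show output can contain codes not listed here (for example, PRENAME_ALONE in the Dr query), which this sketch does not know about.

```python
NAMEGEN = {f"NAMEGEN{n}" for n in range(1, 6)}
PREGEN = {f"PREGEN{n}" for n in range(1, 6)}


def check_info_codes(info):
    """Check the paired-code notes in the table above for one Info field."""
    problems = []
    if "," in info:
        problems.append("separate codes with one space, not punctuation")
    codes = set(info.replace(",", " ").split())
    namegen, pregen = codes & NAMEGEN, codes & PREGEN
    if "NAME" in codes and len(namegen) != 1:
        problems.append("NAME needs one NAMEGEN code")
    if namegen and "NAME" not in codes:
        problems.append("NAMEGEN codes need the NAME code")
    if "PRENAME" in codes and len(pregen) != 1:
        problems.append("PRENAME needs one PREGEN code")
    if pregen and "PRENAME" not in codes:
        problems.append("PREGEN codes need the PRENAME code")
    return problems


print(check_info_codes("PRENAME PREGEN3"))  # []
print(check_info_codes("NAME"))             # ['NAME needs one NAMEGEN code']
```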
S t a n d a r d - t y p e c o d e s If you’re creating a parsing transaction entry, type the appropriate standard-
type codes in the Stdtype field. Put one space (no comma) between codes.
If you’re using UMD Show to query a parsing dictionary, these are the codes
shown next to each Standard.
In this table, we use the terminology Primary and Secondary. If you are
using UMD Show, the Primary is the word you queried and the Secondary
is the Standard.
Standard-type codeDescription
ALL_TEXT_TYPESIf a text standard (STD) is not indicated for a particular type of data, use this standard as the default.
Usually used with NUMBER and REGION words.
For example, if a word is parsed as a firm name but the dictionary does not list a FIRM_STD for the
word, then use the ALL_TEXT_TYPES standard as the text standard.
Note:
The ALL_TEXT_TYPES is used as a default text standard (STD) only. It is not used as a
default match standard or acronym.
FIRM_ACRIf the Primary is parsed as a firm name, use this Secondary as the acronym.
FIRM_MTCIf the Primary is parsed as a firm name, use this Secondary as the match standard.
FIRM_STDIf the Primary is parsed as a firm name, use this Secondary as the standardized form.
FIRMLOC_ACRIf the Primary is parsed as a firm location, use this Secondary as the acronym.
FIRMLOC_MTCIf the Primary is parsed as a firm location, use this Secondary as the match standard.
FIRMLOC_STDIf the Primary is parsed as a firm location, use this Secondary as the standardized form.
HONPOST_ACR         If the Primary is parsed as an honorary postname, use this Secondary as the acronym.
HONPOST_MTC         If the Primary is parsed as an honorary postname, use this Secondary as the match standard.
HONPOST_STD         If the Primary is parsed as an honorary postname, use this Secondary as the standardized form.
MATURPOST_MTC       If the Primary is parsed as a maturity postname, use this Secondary as the match standard.
MATURPOST_STD       If the Primary is parsed as a maturity postname, use this Secondary as the standardized form.
NAME_MTC            If the Primary is parsed as a name, use this Secondary as the match standard.
NAMEDESIG_ACR       If the Primary is parsed as a name designator, use this Secondary as the acronym.
NAMEDESIG_MTC       If the Primary is parsed as a name designator, use this Secondary as the match standard.
NAMEDESIG_STD       If the Primary is parsed as a name designator, use this Secondary as the standardized form.
NAMESPEC_ACR        If the Primary is parsed as a name-special component, use this Secondary as the acronym.
NAMESPEC_MTC        If the Primary is parsed as a name-special component, use this Secondary as the match standard.
NAMESPEC_STD        If the Primary is parsed as a name-special component, use this Secondary as the standardized form.
PRELAST_MTC         If the Primary is parsed as a last-name prefix, use this Secondary as the match standard.
PRELAST_STD         If the Primary is parsed as a last-name prefix, use this Secondary as the standardized form.
PRENAME_ACR         If the Primary is parsed as a prename, use this Secondary as the acronym.
PRENAME_MTC         If the Primary is parsed as a prename, use this Secondary as the match standard.
PRENAME_STD         If the Primary is parsed as a prename, use this Secondary as the standardized form.
TITLE_ACR           If the Primary is parsed as a title, use this Secondary as the acronym.
TITLE_MTC           If the Primary is parsed as a title, use this Secondary as the match standard.
TITLE_STD           If the Primary is parsed as a title, use this Secondary as the standardized form.
FINAN_ACR           If the Primary is parsed as a financial term, use this Secondary as the acronym.
FINAN_MTC           If the Primary is parsed as a financial term, use this Secondary as the match standard.
FINAN_STD           If the Primary is parsed as a financial term, use this Secondary as the standardized form.
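Each code in the table above combines a parse type (such as FIRM or TITLE) with a kind of Secondary: STD for the standardized form, MTC for the match standard, or ACR for the acronym. The ALL_TEXT_TYPES row adds a fallback that applies to STD only. The following sketch shows that lookup. It assumes a hypothetical in-memory entry that maps codes to Secondaries for one Primary word. The entry data, including the Secondary "MN", is illustrative only and does not describe how UMD stores its dictionaries.

```python
from typing import Dict, Optional

# Hypothetical in-memory view of one parsing-dictionary entry: for a single
# Primary word, each standard-type code maps to its Secondary (the Standard).
Entry = Dict[str, str]

KINDS = ("STD", "MTC", "ACR")  # standardized form, match standard, acronym


def resolve_secondary(entry: Entry, parsed_as: str, kind: str) -> Optional[str]:
    """Pick the Secondary for a Primary parsed as `parsed_as` (e.g. "FIRM").

    Returns None when this sketch finds no Secondary for the request.
    """
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {KINDS}, got {kind!r}")
    code = f"{parsed_as}_{kind}"
    if code in entry:
        return entry[code]
    # ALL_TEXT_TYPES is a default for the text standard (STD) only;
    # it never supplies a default match standard or acronym.
    if kind == "STD":
        return entry.get("ALL_TEXT_TYPES")
    return None


if __name__ == "__main__":
    # Illustrative REGION-style entry; the Secondary "MN" is hypothetical.
    minnesota = {"ALL_TEXT_TYPES": "MN"}
    print(resolve_secondary(minnesota, "FIRM", "STD"))  # MN (fallback used)
    print(resolve_secondary(minnesota, "FIRM", "MTC"))  # None (no fallback)
```

A None result only means that the sketch found no Secondary. This appendix does not say what UMD does in that case.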
Index
A
acronyms
how parser generates
adding
firm that looks like a personal name
multiple-word firm name, 20
new word, 18
title phrase, 19