BusinessObjects, Crystal Reports®, SAP BusinessObjects Rapid Mart™, SAP BusinessObjects Data Insight™, SAP BusinessObjects Desktop Intelligence™, SAP BusinessObjects
Rapid Marts®, SAP BusinessObjects Watchlist Security™, SAP BusinessObjects Web Intelligence®, and Xcelsius® are trademarks or registered trademarks of Business Objects, an SAP
company and/or affiliated companies in the United States and/or other countries. SAP® is a
registered trademark of SAP AG in Germany and/or other countries. All other names mentioned herein may be trademarks of their respective owners.
Match/Consolidate Extended Matching Reference
Preface
About this manual
This manual provides detailed information about Match/Consolidate (MCD)
extended matching. Use this guide as you set up and run extended-matching
jobs.
For conceptual information about matching records, refer to the User’s Guide to Record Matching. Read that guide first because it acquaints
you with the concepts of matching records and all the options that are
available to you.
Conventions
This document follows these conventions:
Convention      Description
Bold            We use bold type for file names, paths, emphasis, and text that you
                should type exactly as shown. For example, “Type cd\dirs.”
Italics         We use italics for emphasis and text for which you should substitute
                your own data or values. For example, “Type a name for your file,
                and the .txt extension (testfile.txt).”
Menu commands   We indicate commands that you choose from menus in the following
                format: Menu Name > Command Name. For example, “Choose File > New.”
!               We use this symbol to alert you to important information and
                potential problems.
(symbol)        We use this symbol to point out special cases that you should know
                about.
(symbol)        We use this symbol to draw your attention to tips that may be useful
                to you.
Documentation
Other documentation
Documents related to this manual include the following:
Document                   Description
System Administrator’s     Explains how to install your software.
Guide
Database Prep              Explains how to prepare input files for processing, including
                           how to create DEF, FMT, and DMT files.
Match/Consolidate          Explains the concepts behind name and address matching
User’s Guide to Record     software and provides examples of how to implement,
Matching                   analyze, and fine-tune match detection strategies for the
                           best results.
Match/Consolidate          Contains the operational how-to instructions for setting
Job-File Reference         up the Match/Consolidate job file.
Quick Reference            Contains descriptions of the input and output fields, and
                           the command line for the Match/Consolidate job file.
Access the latest documentation
You can access documentation in several places:
On your computer. Release notes, manuals, and other documents for
each product that you’ve installed are available in the Documentation
folder. Choose Start > Programs > Business Objects Applications >
Documentation.
On the SAP Service Market Place. Go to http://help.sap.com, and then
click the Business Objects tab. Here, you can search for your products’
documentation.
Chapter 1:
Introducing extended matching
Match/Consolidate (MCD) extended matching lets you customize your MCD job
much more than you can with Match/Consolidate’s standard matching. You can
specify matching criteria to the extent that you want. In other words, your setup
may be very simple, or very complex.
There are two kinds of extended matching: automatic matching and rule-based matching.
With automatic matching, you choose the type of matching you want to
perform, such as individual or firm, as described in “Match types” on page 8.
Then, MCD compares records based on the match type and other options you
selected.
With rule-based matching, you set up rules that MCD uses to compare
records. Each rule contains details about a particular field that you want to
compare and how you want the field compared.
Match types
Match types are a starting point for defining the kind of matching you want to
perform.
When you perform automatic extended matching, you choose a match
type at the Match Type parameter in the Auto Match Specification block. When
you perform rule-based extended matching, you start with the extended matching
template file that corresponds with the match type you want to use (for example,
firm.mpg).
When using match types, consider the following questions:
What fields do I want to compare? (Last name, firm, and so on.)
What end result do I want when the MCD job is complete? (One record per
family, per firm, and so on.)
You can use the following match types with MCD extended matching:
Match type        Description
Family            The purpose of the family match type is to determine whether two people
                  should be considered members of the same family, as reflected by their
                  record data. Match/Consolidate compares the last name and the address
                  data. A match means that the two records reflect members of the same
                  family.
                  The result of the match is one record per family.
Individual        The purpose of the individual match type is to determine whether two
                  records are for the same person, as reflected by their record data.
                  Match/Consolidate compares the first name, last name, and address data.
                  A match means that the two records reflect the same person.
                  The result of the match is one record per individual.
Resident          The purpose of the resident match type is to determine whether two
                  records should be considered members of the same residence, as reflected
                  by their record data. Match/Consolidate compares the address data. A
                  match means that the two records are members of the same household.
                  Contrast this match type with the family match type, which also compares
                  last-name data.
                  The result of the match is one record per residence.
Firm              The purpose of the firm match type is to determine whether two records
                  reflect the same firm. This match type involves comparisons of firm and
                  address data. A match means that the two records represent the same firm.
                  The result of the match is one record per firm.
Firm-Individual   The purpose of the firm-individual match type is to determine whether
                  two records are for the same person at the same firm, as reflected by
                  their record data. With this match type, MCD compares the first name,
                  last name, firm name, and address data. A match means that the two
                  records reflect the same person at the same firm.
                  The result of the match is one record per individual per firm.
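As a compact summary of the match-type descriptions, the following Python sketch records which data each match type compares and what result it produces. It is only an illustration: the names MatchType, MATCH_TYPES, and extra_fields are hypothetical and are not part of MCD or its job files.

```python
from typing import NamedTuple, Tuple


class MatchType(NamedTuple):
    compared: Tuple[str, ...]  # data compared for this match type
    result: str                # what one surviving record represents


# Hypothetical lookup table built from the match-type descriptions.
MATCH_TYPES = {
    "Family": MatchType(("last name", "address"),
                        "one record per family"),
    "Individual": MatchType(("first name", "last name", "address"),
                            "one record per individual"),
    "Resident": MatchType(("address",),
                          "one record per residence"),
    "Firm": MatchType(("firm", "address"),
                      "one record per firm"),
    "Firm-Individual": MatchType(("first name", "last name", "firm", "address"),
                                 "one record per individual per firm"),
}


def extra_fields(narrower: str, broader: str) -> set:
    """Return the data compared by `narrower` that `broader` does not compare."""
    return set(MATCH_TYPES[narrower].compared) - set(MATCH_TYPES[broader].compared)


# Family differs from Resident only by the last-name comparison.
print(extra_fields("Family", "Resident"))  # {'last name'}
```

A lookup like this can help when deciding between two match types, because it shows at a glance which additional data must agree for records to match.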
Rule-based matching
With rule-based extended matching, you can define precisely which fields MCD
should compare and how MCD should compare them. You control what qualifies
as a match by setting up rules.
First, you choose your match type by choosing the appropriate extended matching
file (refer to “Extended matching files” on page 10). Then, if you want, you can
adjust the settings to further customize your matching criteria. You can leave the
default settings if they meet your needs.
Blocks for rule-based matching (required and optional): Parsing and Key Options,
Rule Definition, Rule Match Spec, Prioritize Matches, Form Break Groups, Break
Field Definition, Additional Rule Definition blocks, Extended Match Criteria.
Automatic matching
Automatic matching requires you to select a match threshold and a match type
(refer to “Match types” on page 8). You can also edit various parameters to
fine-tune the process, if you so choose. However, it’s not necessary to perform a
field-by-field setup.
Automatic matching looks at the types of data fields in your keys and then selects
the appropriate fields to use for performing comparisons. For example, MCD would
compare name fields if you selected a family or individual match type, but not if
you selected a firm match type.
Blocks for automatic matching (required and optional): General, Parsing and Key
Field Options, Auto Match Spec, Prioritize Matches, Break Field Definition, Form
Break Groups, Auto Field option.
Match/Consolidate uses four thresholds: exact, tight, medium, and loose. Each
threshold uses percentages to determine how similar fields must be for MCD to
consider them duplicates.
For two fields to match using an exact threshold, the fields compared must be 100
percent alike. The percentages decrease as we move from exact to loose.
Extended matching files
Extended matching files defined
Extended matching files contain job-file blocks and parameters for extended
matching. We provide six extended matching file templates: one for
automatic matching and five for rule-based matching. You can use a rule-based
extended matching file template as-is, or save a copy of the file with a different
name and edit it as needed. Note that extended matching files must use the .mpg
extension.
File name      Description
Auto.mpg       Extended matching file for automatic matching. This file contains
               settings for each kind of match type (family, individual, firm,
               firm-individual, and resident). You must copy the desired match-type
               section from the auto.mpg file to create your own automatic
               extended matching file.
Family.mpg     Extended matching file for rule-based matching, using the family
               match type. Refer to “Match types” on page 8 for details about
               match types.
Indiv.mpg      Extended matching file for rule-based matching, using the individual
               match type. Refer to “Match types” on page 8 for details about
               match types.
Hhold.mpg      Extended matching file for rule-based matching, using the household,
               or resident, match type. Refer to “Match types” on page 8 for
               details about match types.
Firm.mpg       Extended matching file for rule-based matching, using the firm
               match type. Refer to “Match types” on page 8 for details about
               match types.
Firmindv.mpg   Extended matching file for rule-based matching, using the
               firm-individual match type. Refer to “Match types” on page 8 for
               details about match types.
Refer to “Extended matching blocks and parameters” on page 17 for detailed
descriptions of the extended matching file blocks and parameters.
Extended matching files location
When you install MCD, it installs the extended matching file templates in the
template subdirectory (pw\mpg\template, or postware\merge\template). The
extmatch.mpg file, which contains all of the extended matching blocks, is
installed in pw\mpg or postware\merge.
You can set up your extended matching job in one of two ways.
1. Set up extended matching in an extended matching file and refer to that file
from your job file at the Ext Match Blocks parameter in the Auxiliary Files
block (refer to “Enter the name and path of the extended matching file” on
page 14).
2. Copy the blocks from one of the extended matching templates directly into
your MCD job.
The Match Results report
The Match Results report shows the results of your extended matching. This may
help you if you get matching results that you did not expect or desire. Based on
information given in this report, you might decide to change settings in your
extended matching file.
Produce the report
To produce the Match Results report (shown below), you must include and
complete the Report: Match Results block in your MCD job (not in the extended
matching file). Refer to the Match/Consolidate Job-File Reference or Views help
file for details about setting up this block.
BEGIN Report: Match Results ===================================
Location and File Name/Printer Device =
Existing File (APPEND/REPLACE)....... =
Number of Copies (1 to 10)........... =
Case (UPPER/Upper and Lower)......... =
Page Header Line 1 (to 80 chars)..... =
Page Header Line 2 (to 80 chars)..... =
Page Header Line 3 (to 80 chars)..... =
Page Header Line 4 (to 80 chars)..... =
Printer Init, For Reports (see NOTE). =
Printer Reset, For Reports (see NOTE) =
Page Length (in lines)............... =
Page Width (in chars)................ =
Top Margin (in lines)................ =
Bottom Margin (in lines)............. =
Left Margin (in chars)............... =
Right Margin (in chars).............. =
Adv Match Set to Report (ALL/SELECT). =
Adv Select Match Set(s) (name[,...]). =
Adv Match Level (FINEST/ALL/SELECT).. =
Adv Select Match Level to Report..... =
END
The following shows the Views Match Results window.
Example report
The following is an example of a Match Results report.
Match Results Report & job Match/Consolidate X.XX Page 1
Match Set Name: Keyset1
Match Level: level2
Total Comparisons: 22663
Non Match Spec Results
Not Compared - Forced No Match: 1674
Not Compared - Already a Match: 5240
Not Compared - Compares Disabled: 0
Pre-compare Exit Match Decisions: 0
Pre-compare Exit No Match Decisions: 0
Match Post-compare Exit No Match Decisions: 0
No Match Post-compare Exit Match Decisions: 0
Match Spec Results
Match Spec Name: R_Hhold
Match Spec Type: rule
Match Attempts: 15749
Match Decisions: 2587
No Match Decisions: 13162
Undecided: 0
Rule Match Results Summary
Rule Rule Attempts Match No Match Percent
Number Field Type Made Decision Decision Decision
To enable extended matching, set the Matching Method parameter in the MCD job’s
Execution Options block to either Ext or Adv, as shown in the following job-file
block. In the MCD Execution Options window, set the Matching Method parameter
to Extended or Advanced.
BEGIN Execution ===============================================
Read Records & Create Match Sets(Y/N) = Y
Find Duplicates (Y/N/PREDICT)........ = N
Matching Method (STD/EXT/ADV)........ = EXT
Create Match/Consolidate File (Y/N).. = N
Create Multi-Occurrence File (Y/N)... = N
Create All-Duplicates File (Y/N)..... = N
Create Custom M/C File (Y/N)......... = N
Post to Input File (Y/N)............. = N
Group Post to Purged Files (Y/N)..... = N
Purge (Y/N/PREDICT).................. = N
Custom Purge (Y/N/PREDICT)........... = N
Create Reports (Y/N)................. = Y
Create Report Statistics Files (Y/N). = N
Save Work Files (Y/N)................ = Y
Warn Before File Overwrite (Y/N)..... = Y
Work File Directory (path)........... =
Sort Work File Directory (path)...... =
Create Backup File(s) (Y/N).......... = Y
Backup Directory (path).............. =
Maximum Work Buffer Size (kilobytes) = 4096
Sort Optimization (SPACE/SPEED)...... = SPEED
Virtual Machine (NONE/CREATE)........ = NONE
END
Enter the name and path of the extended matching file
If you choose to keep your extended match block in a separate file, you can also
enter the name and path of the extended matching file in the MCD job’s Auxiliary
Files block. For details about extended matching files, refer to “Extended
matching files” on page 10.
BEGIN Auxiliary Files =========================================
Address Line Dct (path & addrln.dct).. = *INSERT PATH HERE* addrln.dct
Last Line Dct (path & lastln.dct)..... = *INSERT PATH HERE* lastln.dct
City Directory (path & city08.dir).... = *INSERT PATH HERE* city08.dir
You can use one of the rule-based extended matching templates that we provide
as-is. Or you can create your own extended matching
file by copying blocks from one of the templates or extmatch.mpg.
Do not edit the parameter settings in the template files or extmatch.mpg. Instead,
you can save a copy of the original file with a different name, or copy the
extended matching blocks into your MCD job. You can then edit the parameter
settings as necessary. Installing software updates overwrites the extended
matching files.
Refer to “Extended matching blocks and parameters” on page 17 for detailed
descriptions of the extended matching blocks and parameters.
Weighted scoring in rule matching
Weighted scoring adjusts the similarity score for each match rule by
multiplying the similarity score by a weight. For example, if you match on two
fields, the sum of the two weighted values is the overall similarity score. Because
of this, the keys can fail on one rule and still match overall. For example,
a spelling error in a street name will not prevent a match.
Similarity score times weight
When a rule is used to compare two key fields, the Match engine generates a
similarity score from 0 to 100. That score is then multiplied by the weight percent
that you set. For example, if the weight percent is set to 20 and two first names
have a similarity score of 90, then the first name comparison would contribute 18
(20 percent of 90) to the overall weighted score. For a perfect match, the sum of
all key field comparisons would normally be 100.
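The arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration of "similarity score times weight," not MCD's implementation; the function names weighted_contribution and overall_score are hypothetical.

```python
def weighted_contribution(similarity: float, weight_percent: float) -> float:
    """Scale a 0-100 similarity score by a rule's weight percent."""
    if not 0 <= similarity <= 100:
        raise ValueError("similarity must be between 0 and 100")
    return similarity * weight_percent / 100


def overall_score(comparisons) -> float:
    """Sum the weighted contributions of (similarity, weight_percent) pairs."""
    return sum(weighted_contribution(s, w) for s, w in comparisons)


# A first-name similarity of 90 with a weight of 20 percent contributes 18.
print(weighted_contribution(90, 20))  # 18.0

# When the weights total 100 and every field is identical, the sum is 100.
print(overall_score([(100, 20), (100, 80)]))  # 100.0
```

Note that the overall score reaches 100 for a perfect match only when the weight percents of all rules add up to 100.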
Force no-match
For each match rule, you can set a maximum no dupe score, at or below which the
keys are never considered a match, even if the overall weighted score is above the
threshold that has been set for a match. This allows you to “force no-match.” For
example, you could decide that if the last name similarity score is less than or
equal to 50, then the keys should be called a no-match.
Force a match
For each match rule, you can set a minimum dupe similarity score, at or above which
the keys are always considered a match, even if the overall weighted score is
below the threshold that has been set for a match. This allows you to force a
match. For example, you might decide that if the social security similarity score is
100, then the keys should be called a match.
The following example illustrates how weighting works. Assume the minimum
overall weighted score required is 85.
Field              Weight (in percent)   Force no-match   Force a match
Street number      20
Street name        20
Secondary number   15
First name         20                    25
Last name          20                    50
Post name          5
If the first name similarity score is 25 or less, or if the last name score is 50 or
less, the keys will not match, even if the weighted score is 85 or greater.
But if the last name similarity score were 60, the keys would not be forced to a
no-match. Instead, the similarity score would be multiplied by the weight percent,
so that the last name comparison would contribute 12 (20 percent of 60) to the
overall weighted score.
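The example above can be worked through with a short Python sketch. It uses the weights, force no-match values, and minimum overall score of 85 from the example; everything else is an assumption made for illustration. In particular, the names Rule, RULES, weighted_total, and is_match are hypothetical, the similarity scores are invented inputs, and checking force no-match before force a match is an assumed order that this manual does not state.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Rule:
    field: str
    weight_percent: float
    force_no_match: Optional[float] = None  # at or below this score: never a match
    force_match: Optional[float] = None     # at or above this score: always a match


# Weights and force no-match values taken from the example table.
RULES = [
    Rule("street number", 20),
    Rule("street name", 20),
    Rule("secondary number", 15),
    Rule("first name", 20, force_no_match=25),
    Rule("last name", 20, force_no_match=50),
    Rule("post name", 5),
]


def weighted_total(scores: Dict[str, float]) -> float:
    """Sum of similarity score times weight percent over all rules."""
    return sum(scores[r.field] * r.weight_percent / 100 for r in RULES)


def is_match(scores: Dict[str, float], threshold: float = 85) -> bool:
    # Assumed order: force no-match is checked first, then force a match.
    for r in RULES:
        if r.force_no_match is not None and scores[r.field] <= r.force_no_match:
            return False
    for r in RULES:
        if r.force_match is not None and scores[r.field] >= r.force_match:
            return True
    return weighted_total(scores) >= threshold


identical = {r.field: 100 for r in RULES}

# Last name scores 60: it contributes 12, and the total of 92 still clears 85.
print(weighted_total({**identical, "last name": 60}))  # 92.0
print(is_match({**identical, "last name": 60}))        # True

# Last name scores 50: the force no-match value applies despite a total of 90.
print(is_match({**identical, "last name": 50}))        # False
```

The second case shows why a force no-match value is useful: the weighted total alone (90) would have been high enough to call the keys a match.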
Chapter 2:
Extended matching blocks and parameters
This chapter describes the extended matching blocks and parameters. You have
an option of placing extended matching reference blocks within your job file, or
you can set up extended matching in an external extended matching file. You can
then refer to that file from your job file at the Ext Match Blocks parameter in the
Auxiliary Files block (refer to “Enter the name and path of the extended matching
file” on page 14).
General
The extended matching file that you use must contain a General block. This
allows us to update the extended matching files in the future.
Do not enter any values at these two parameters in your extended matching file.
Match/Consolidate (MCD) takes the values entered in the MCD job-file
parameters instead. This block is only used for updating extended matching files
with the Edjob utility.
Parsing and Key Field Options
In the Parsing and Key Field Options block, you specify the layout of keys and
the standardization to perform on the data stored in keys. This block is required
for both automatic and rule-based extended matching.
Parsing and Key Options Name (to 20 chars)
This is an optional parameter that assigns a logical name for the block. The
Parsing and Key Options parameter in the Extended Matching Criteria block
references this name. Do not use the name in any other Parsing and Key Options
block.
Automatically Generate Key
This parameter controls whether MCD should automatically generate the length
of each key. Valid options for this parameter are listed below.
Option            Description
None              If you set this parameter to None, then the only key fields defined will
                  be the ones listed in the Key Length parameters, which appear at the
                  end of the Parsing and Key Options block.
Individual        MCD designs keys so that the results will yield one record per person
                  (first and last name) at an address.
Family            MCD designs keys so that the results will yield one record per family
                  (last name) at an address.
Firm              MCD designs keys so that the results will yield one record per firm.
Firm_Individual   MCD designs keys so that the results will yield one record per person
                  at a firm.
Resident          MCD designs keys so that the results will yield one record per
                  residence.
Name, Title, and Firm Parsing (Standard/Extended/None)
This parameter controls whether MCD uses its standard parsing routine or uses its
extended parsing routine for name, title, and firm data. You can expect better
matching results from the extended parsing routine, with an increase in
processing time.
Option     Description
Standard   If you set this parameter to Std (standard), MCD uses the name-related
           dictionaries listed in the four Std parameters in the Auxiliary Files block.
           Those four parameters must be filled out when this parameter is set to Std.
Extended   If you set this parameter to Ext (extended), MCD uses the extended
           multiline, firm, and parsing dictionaries listed in the Auxiliary Files block.
None       Select this option to disable parsing.
Standard Number of Names to Store (1-2)
This parameter applies to standard parsing. You can enter the number of people’s
names that you want to store in the key. For example, if a name line was Bill and
Mary Smith, you could keep both names or just one name (in this case, the first
name, Bill Smith).
Match/Consolidate will store one or two (depending on your choice) of each of
the following name fields in the key: Pre_Name, First_Name, Mid_Name. The
more names, name components, and name standards that you store, the more disk
space and processing time you will need.
Standard Number of First_Name Stds (0-1)
This parameter applies to standard parsing. You can choose to store the
standardized first name (such as Robert) along with the original first name (such
as Bob) in the key. If you enter 0 (zero), only the original first name data will be
stored in the key.
Note that this applies to the name which appears first. For example, for Billy and
Mary Smith, MCD stores only the standards for Billy.
Standard Number of Mid_Name Stds (0-1)
This parameter applies to standard parsing. You can choose to store the
standardized middle name (such as Robert) along with the original name (such as
Bob) in the key. If you enter 0 (zero), only the original middle name data is stored
in the key.
This applies to the name that appears first. For example, for Billy Bob and Mary
Sue Smith, MCD stores the standards for only the Mid_Name Bob.
Extended Number of Names to Store (1-3)
This parameter applies to extended parsing. You can enter the number of people’s
names that you want to store in the key. For example, if a name line was Bill and
Mary Smith, you could keep both names or just one name (in this case, the first
name, Bill Smith).
Depending on your choice at this parameter, MCD will store one, two, or three of
each of the following fields in the key: