SAP Match/Consolidate 8.00c Extended Matching Reference

April 2009
Copyright information
© 2009 SAP® BusinessObjects™. All rights reserved. SAP BusinessObjects and its logos, BusinessObjects, Crystal Reports®, SAP BusinessObjects Rapid Mart™, SAP BusinessObjects Data Insight™, SAP BusinessObjects Desktop Intelligence™, SAP BusinessObjects Rapid Marts®, SAP BusinessObjects Watchlist Security™, SAP BusinessObjects Web Intelligence®, and Xcelsius® are trademarks or registered trademarks of Business Objects, an SAP company and/or affiliated companies in the United States and/or other countries. SAP® is a registered trademark of SAP AG in Germany and/or other countries. All other names mentioned herein may be trademarks of their respective owners.

Contents

Preface
Chapter 1: Introducing extended matching
    Extended matching files
    The Match Results report
    Enable extended matching
    Weighted scoring in rule matching
Chapter 2: Extended matching blocks and parameters
    General
    Parsing and Key Field Options
    Form Break Groups
    Break Field Definition
    Auto Field Option
    Auto Match Spec
    Rule Definition
    Rule Match Spec
    Prioritize Matches
    Extended Match Criteria
Appendix A: Extended matching file (extmatch.mpg)
Index

Preface

About this manual

This manual provides detailed information about Match/Consolidate (MCD) extended matching. Use this guide as you set up and run extended-matching jobs.
For conceptual information about matching records, refer to the User’s Guide to Record Matching. Read that guide first, because it acquaints you with the concepts of matching records and all the options that are available to you.

Conventions

This document follows these conventions:
Convention       Description
Bold             We use bold type for file names, paths, emphasis, and text that you should type exactly as shown. For example, “Type cd\dirs.”
Italics          We use italics for emphasis and text for which you should substitute your own data or values. For example, “Type a name for your file, and the .txt extension (testfile.txt).”
Menu commands    We indicate commands that you choose from menus in the following format: Menu Name > Command Name. For example, “Choose File > New.”
!                We use this symbol to alert you to important information and potential problems.
(symbol)         We use this symbol to point out special cases that you should know about.
(symbol)         We use this symbol to draw your attention to tips that may be useful to you.
Documentation

Other documentation

Documents related to this manual include the following:
Document                                             Description
System Administrator’s Guide                         Explains how to install your software.
Database Prep                                        Explains how to prepare input files for processing, including how to create DEF, FMT, and DMT files.
Match/Consolidate User’s Guide to Record Matching    Explains the concepts behind name and address matching software and provides examples of how to implement, analyze, and fine-tune match detection strategies for the best results.
Match/Consolidate Job-File Reference                 Contains the operational how-to instructions for setting up the Match/Consolidate job file.
Quick Reference                                      Contains descriptions of the input and output fields, and the command line for the Match/Consolidate job file.

Access the latest documentation

You can access documentation in several places:
- On your computer. Release notes, manuals, and other documents for each product that you’ve installed are available in the Documentation folder. Choose Start > Programs > Business Objects Applications > Documentation.
- On the SAP Service Market Place. Go to http://help.sap.com, and then click the Business Objects tab. Here, you can search for your products’ documentation.
Chapter 1: Introducing extended matching
Match/Consolidate (MCD) extended matching lets you customize your MCD job much more than you can with Match/Consolidate’s standard matching. You can specify matching criteria to whatever extent you want, so your setup may be very simple or very complex.
There are two kinds of extended matching: automatic matching and rule-based matching.
- With automatic matching, you choose the type of matching you want to perform, such as individual or firm, as described in “Match types” on page 8. Then, MCD compares records based on the match type and other options you selected.
- With rule-based matching, you set up rules that MCD uses to compare records. Each rule contains details about a particular field that you want to compare and how you want the field compared.

Match types

Match types are a starting point for defining the kind of matching you want to perform.
- When you perform automatic extended matching, you choose a match type at the Match Type parameter in the Auto Match Specification block.
- When you perform rule-based extended matching, you start with the extended matching template file that corresponds with the match type you want to use (for example, firm.mpg).
When using match types, consider the following questions:
- What fields do I want to compare? (Last name, firm, and so on.)
- What end result do I want when the MCD job is complete? (One record per family, per firm, and so on.)
You can use the following match types with MCD extended matching:
Match type         Description
Family             The purpose of the family match type is to determine whether two people should be considered members of the same family, as reflected by their record data. Match/Consolidate compares the last name and the address data. A match means that the two records reflect members of the same family. The result of the match is one record per family.
Individual         The purpose of the individual match type is to determine whether two records are for the same person, as reflected by their record data. Match/Consolidate compares the first name, last name, and address data. A match means that the two records reflect the same person. The result of the match is one record per individual.
Resident           The purpose of the resident match type is to determine whether two records should be considered members of the same residence, as reflected by their record data. Match/Consolidate compares the address data. A match means that the two records are members of the same household. Contrast this match type with the family match type, which also compares last-name data. The result of the match is one record per residence.
Firm               The purpose of the firm match type is to determine whether two records reflect the same firm. This match type involves comparisons of firm and address data. A match means that the two records represent the same firm. The result of the match is one record per firm.
Firm-Individual    The purpose of the firm-individual match type is to determine whether two records are for the same person at the same firm, as reflected by their record data. With this match type, MCD compares the first name, last name, firm name, and address data. A match means that the two records reflect the same person at the same firm. The result of the match is one record per individual per firm.
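The comparisons described in the table above can be summarized as a small lookup. The following Python sketch is for illustration only: the names MATCH_TYPE_FIELDS, fields_compared, and extra_fields are hypothetical and are not part of MCD, and the field labels are informal descriptions taken from the table, not MCD field names.

```python
# Fields compared by each match type, as described in the match type table.
# All names in this sketch are illustrative; none of them belong to MCD.
MATCH_TYPE_FIELDS = {
    "Family": ("last name", "address"),
    "Individual": ("first name", "last name", "address"),
    "Resident": ("address",),
    "Firm": ("firm", "address"),
    "Firm-Individual": ("first name", "last name", "firm", "address"),
}


def fields_compared(match_type):
    """Return the fields that the given match type compares."""
    return MATCH_TYPE_FIELDS[match_type]


def extra_fields(narrow, broad):
    """Return the fields compared by `narrow` that `broad` does not compare."""
    return set(fields_compared(narrow)) - set(fields_compared(broad))


if __name__ == "__main__":
    # Family differs from Resident only by the last-name comparison.
    print(extra_fields("Family", "Resident"))
```

Comparing two match types this way makes the contrast in the table explicit: the family match type adds only the last-name comparison to what the resident match type already compares.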

Rule-based matching

With rule-based extended matching, you can define precisely which fields MCD should compare and how MCD should compare them. You control what qualifies as a match by setting up rules.
First, you choose your match type by choosing the appropriate extended matching file (refer to “Extended matching files” on page 10). Then, if you want, you can adjust the settings to further customize your matching criteria. You can leave the default settings if they meet your needs.
Required blocks            Optional blocks
Parsing and Key Options    Form Break Groups
Rule Definition            Break Field Definition
Rule Match Spec            Additional Rule Definition blocks
Prioritize Matches         Extended Match Criteria

Automatic matching

Automatic matching requires you to select a match threshold and a match type (refer to “Match types” on page 8). You can also edit various parameters to fine-tune the process, if you so choose. However, it’s not necessary to perform a field-by-field setup.
Automatic matching looks at the types of data fields in your keys and then selects the appropriate fields to use for performing comparisons. For example, MCD would compare name fields if you selected a family or individual match type, but not if you selected a firm match type.
Required blocks                  Optional blocks
General                          Break Field Definition
Parsing and Key Field Options    Form Break Groups
Auto Match Spec                  Auto Field option
Prioritize Matches
Match/Consolidate uses four thresholds: exact, tight, medium, and loose. Each threshold uses percentages to determine how similar fields must be for MCD to consider them duplicates.
For two fields to match using an exact threshold, the fields compared must be 100 percent alike. The percentages decrease as we move from exact to loose.

Extended matching files

Extended matching files defined

Extended matching files contain job-file blocks and parameters for extended matching. We provide you with six extended matching file templates: one for automatic matching and five for rule-based matching. You can use a rule-based extended matching file template as-is, or save a copy of the file with a different name and edit it as needed. Note that extended matching files must use the .mpg extension.
File name       Description
Auto.mpg        Extended matching file for automatic matching. This file contains settings for each kind of match type (family, individual, firm, firm-individual, and resident). You must copy the desired match-type section from the auto.mpg file to create your own automatic extended matching file.
Family.mpg      Extended matching file for rule-based matching, using the family match type. Refer to “Match types” on page 8 for details about match types.
Indiv.mpg       Extended matching file for rule-based matching, using the individual match type. Refer to “Match types” on page 8 for details about match types.
Hhold.mpg       Extended matching file for rule-based matching, using the household, or resident, match type. Refer to “Match types” on page 8 for details about match types.
Firm.mpg        Extended matching file for rule-based matching, using the firm match type. Refer to “Match types” on page 8 for details about match types.
Firmindv.mpg    Extended matching file for rule-based matching, using the firm-individual match type. Refer to “Match types” on page 8 for details about match types.
Refer to “Extended matching blocks and parameters” on page 17 for detailed descriptions of the extended matching file blocks and parameters.

Extended matching files location

When you install MCD, it installs the extended matching file templates in the template subdirectory (pw\mpg\template, or postware\merge\template). The extmatch.mpg file, which contains all of the extended matching blocks, is installed in pw\mpg or postware\merge.
You can set up your extended matching job in one of two ways.
1. Set up extended matching in an extended matching file and refer to that file from your job file at the Ext Match Blocks parameter in the Auxiliary Files block (refer to “Enter the name and path of the extended matching file” on page 14).
2. Copy the blocks from one of the extended matching templates directly into your MCD job.

The Match Results report

The Match Results report shows the results of your extended matching. This may help you if you get matching results that you did not expect or desire. Based on information given in this report, you might decide to change settings in your extended matching file.

Produce the report

To produce the Match Results report (shown below), you must include and complete the Report: Match Results block in your MCD job (not in the extended matching file). Refer to the Match/Consolidate Job-File Reference or Views help file for details about setting up this block.
BEGIN Report: Match Results ===================================
Location and File Name/Printer Device =
Existing File (APPEND/REPLACE)....... =
Number of Copies (1 to 10)........... =
Case (UPPER/Upper and Lower)......... =
Page Header Line 1 (to 80 chars)..... =
Page Header Line 2 (to 80 chars)..... =
Page Header Line 3 (to 80 chars)..... =
Page Header Line 4 (to 80 chars)..... =
Printer Init, For Reports (see NOTE). =
Printer Reset, For Reports (see NOTE) =
Page Length (in lines)............... =
Page Width (in chars)................ =
Top Margin (in lines)................ =
Bottom Margin (in lines)............. =
Left Margin (in chars)............... =
Right Margin (in chars).............. =
Adv Match Set to Report (ALL/SELECT). =
Adv Select Match Set(s) (name[,...]). =
Adv Match Level (FINEST/ALL/SELECT).. =
Adv Select Match Level to Report..... =
END
The following shows the Views Match Results window.

Example report

The following is an example of a match results report.

Match Results Report & job          Match/Consolidate X.XX          Page 1
-----------------------------------------------------------------------
Match Set Name: Keyset1
Match Level: level2
Total Comparisons: 22663

Non Match Spec Results
Not Compared - Forced No Match: 1674
Not Compared - Already a Match: 5240
Not Compared - Compares Disabled: 0
Pre-compare Exit Match Decisions: 0
Pre-compare Exit No Match Decisions: 0
Match Post-compare Exit No Match Decisions: 0
No Match Post-compare Exit Match Decisions: 0

Match Spec Results
Match Spec Name: R_Hhold
Match Spec Type: rule
Match Attempts: 15749
Match Decisions: 2587
No Match Decisions: 13162
Undecided: 0

Rule Match Results Summary
Rule      Rule          Attempts    Match       No Match    Percent
Number    Field Type    Made        Decision    Decision    Decision
1         Last_Name     15749       0           12725       56.15
2         Prim_Range    3024        0           351         1.55
3         PO_Box        2673        0           1           0.00
4         RR_Box        2672        0           14          0.06
5         Prim_Name     2658        0           29          0.13
6         RR_Number     2629        0           0           0.00
7         Address       2629        0           42          0.19
          WEIGHTED      2587        2587        0           11.42
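The figures in this example are consistent with a simple cascade: each rule is attempted only on the pairs that earlier rules left undecided, and each Percent Decision value is that rule's decisions divided by Total Comparisons. The following Python sketch reproduces the Attempts Made and Percent Decision columns from the other figures in the report. It is an observation about this sample report, not a documented description of how MCD computes it, and the function and variable names are hypothetical.

```python
# Figures copied from the example report.
TOTAL_COMPARISONS = 22663
NOT_COMPARED = 1674 + 5240  # forced no match + already a match

# (rule field type, match decisions, no-match decisions)
RULES = [
    ("Last_Name", 0, 12725),
    ("Prim_Range", 0, 351),
    ("PO_Box", 0, 1),
    ("RR_Box", 0, 14),
    ("Prim_Name", 0, 29),
    ("RR_Number", 0, 0),
    ("Address", 0, 42),
    ("WEIGHTED", 2587, 0),
]


def summarize(rules, total, not_compared):
    """Rebuild the Attempts Made and Percent Decision columns.

    Assumes (based only on the sample numbers) that undecided pairs pass to
    the next rule and that percentages are relative to total comparisons.
    """
    attempts = total - not_compared
    rows = []
    for name, match, no_match in rules:
        decided = match + no_match
        percent = round(100.0 * decided / total, 2)
        rows.append((name, attempts, match, no_match, percent))
        attempts -= decided  # only undecided pairs reach the next rule
    return rows


if __name__ == "__main__":
    for row in summarize(RULES, TOTAL_COMPARISONS, NOT_COMPARED):
        print("%-10s %6d %6d %6d %6.2f" % row)
```

Read this way, the report shows where pairs drop out: in this sample, most no-match decisions come from the Last_Name rule, and every match decision comes from the weighted score.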

Enable extended matching

To enable extended matching, set the Matching Method parameter in the MCD job’s Execution Options block to Ext or Adv, as shown in the following job-file block. In the MCD Execution Options window, set the Matching Method parameter to Extended or Advanced.
BEGIN Execution ===============================================
Read Records & Create Match Sets(Y/N) = Y
Find Duplicates (Y/N/PREDICT)........ = N
Matching Method (STD/EXT/ADV)........ = EXT
Create Match/Consolidate File (Y/N).. = N
Create Multi-Occurrence File (Y/N)... = N
Create All-Duplicates File (Y/N)..... = N
Create Custom M/C File (Y/N)......... = N
Post to Input File (Y/N)............. = N
Group Post to Purged Files (Y/N)..... = N
Purge (Y/N/PREDICT).................. = N
Custom Purge (Y/N/PREDICT)........... = N
Create Reports (Y/N)................. = Y
Create Report Statistics Files (Y/N). = N
Save Work Files (Y/N)................ = Y
Warn Before File Overwrite (Y/N)..... = Y
Work File Directory (path)........... =
Sort Work File Directory (path)...... =
Create Backup File(s) (Y/N).......... = Y
Backup Directory (path).............. =
Maximum Work Buffer Size (kilobytes) = 4096
Sort Optimization (SPACE/SPEED)...... = SPEED
Virtual Machine (NONE/CREATE)........ = NONE
END
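If you keep many job files, you may want to check this setting with a script. The following Python sketch reads the parameter lines of one block into a dictionary and tests whether Matching Method is EXT or ADV. The line pattern is an assumption inferred from the sample block above, not a documented job-file grammar, and parse_block and extended_matching_enabled are hypothetical helper names, not MCD utilities.

```python
import re

# Assumed line shape: "Label (options)....... = value". This pattern is
# inferred from the sample block and is not a documented grammar.
PARAM_LINE = re.compile(r"^(?P<label>.+?)[.\s]*=\s*(?P<value>.*)$")


def parse_block(text):
    """Return {label: value} for the parameter lines of one job-file block."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the BEGIN/END delimiters.
        if not line or line.startswith("BEGIN") or line == "END":
            continue
        found = PARAM_LINE.match(line)
        if found:
            # Drop the trailing "(...)" hint of valid options from the label.
            label = re.sub(r"\s*\([^)]*\)$", "", found.group("label"))
            params[label] = found.group("value").strip()
    return params


def extended_matching_enabled(params):
    """Return True when the Matching Method value is EXT or ADV."""
    return params.get("Matching Method", "").upper() in ("EXT", "ADV")


SAMPLE = """\
BEGIN Execution ===============================================
Find Duplicates (Y/N/PREDICT)........ = N
Matching Method (STD/EXT/ADV)........ = EXT
Maximum Work Buffer Size (kilobytes) = 4096
END
"""

if __name__ == "__main__":
    print(extended_matching_enabled(parse_block(SAMPLE)))
```

The sketch handles a single block at a time; a job file with several blocks would need to be split on the BEGIN and END lines first.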

Enter the name and path of the extended matching file

If you choose to keep your extended match block in a separate file, you can also enter the name and path of the extended matching file in the MCD job’s Auxiliary Files block. For details about extended matching files, refer to “Extended matching files” on page 10.
BEGIN Auxiliary Files =========================================
Address Line Dct (path & addrln.dct).. = *INSERT PATH HERE* addrln.dct
Last Line Dct (path & lastln.dct)..... = *INSERT PATH HERE* lastln.dct
City Directory (path & city08.dir).... = *INSERT PATH HERE* city08.dir
ZCF Directory (path & zcf08.dir)...... = *INSERT PATH HERE* zcf08.dir
Ext ZIP+4 Dir 1 (path & zip4??.dir)... = *INSERT PATH HERE* zip4??.dir
Ext ZIP+4 Dir 2 (path & zip4??.dir)... = *INSERT PATH HERE* zip4??.dir
Ext Rev ZIP+4 Dir (path & revzip4.dir).= *INSERT PATH HERE* revzip4.di
Ext Firm Line Dct (path & firmln.dct). = *INSERT PATH HERE* firmln.dct
Ext Cap Dct (path & pwcas.dct)........ = *INSERT PATH HERE* pwcas.dct
Std Prename Dct (path & prename.dct).. = *INSERT PATH HERE* prename.dc
Std Name Dct (path & name.dct)........ = *INSERT PATH HERE* name.dct
Std Pre-lastname (path & prelname.dct).= *INSERT PATH HERE* prelname.d
Std Postname (path & postname.dct).... = *INSERT PATH HERE* postname.d
Ext Multi-line Rules (mlrules.gcf).... = *INSERT PATH HERE* mlrules.gc
Ext Firm Rules (path & fprules.gcf)... = *INSERT PATH HERE* fprules.gc
Ext Parsing Dct (path & parsing.dct).. = *INSERT PATH HERE* parsing.dc
Std Match Percent Dct(path & file.dct).= *INSERT PATH HERE* matchpct.d
Ext Match Blocks (path & file name)... =
Default ASCII FMT (path & file.fmt)... =
Default DEF (path & file.def)......... =
END

Edit the extended matching file

You can use one of the rule-based extended matching templates that we provide without changing it. Or you can create your own extended matching file by copying blocks from one of the templates or extmatch.mpg.
Do not edit the parameter settings in the template files or extmatch.mpg, because installing software updates overwrites these files. Instead, save a copy of the original file with a different name, or copy the extended matching blocks into your MCD job. You can then edit the parameter settings as necessary.
Refer to “Extended matching blocks and parameters” on page 17 for detailed descriptions of the extended matching blocks and parameters.
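Saving a copy of a template under a different name can be scripted. The following Python sketch copies a template to a working directory and refuses a target name that lacks the .mpg extension or that repeats the template's own file name. The function name copy_template and the directory and file names in the usage example are hypothetical; substitute the template location from your own installation.

```python
import shutil
from pathlib import Path


def copy_template(template, job_dir, new_name):
    """Copy a template to job_dir under a new .mpg name; return the new path.

    The original template file is left untouched.
    """
    template = Path(template)
    job_dir = Path(job_dir)
    if not new_name.lower().endswith(".mpg"):
        raise ValueError("extended matching files must use the .mpg extension")
    if new_name.lower() == template.name.lower():
        raise ValueError("save the copy under a different name than the template")
    job_dir.mkdir(parents=True, exist_ok=True)
    target = job_dir / new_name
    shutil.copyfile(template, target)
    return target


if __name__ == "__main__":
    # Hypothetical locations and names, for illustration only.
    source = Path("template") / "family.mpg"
    if source.exists():
        print(copy_template(source, "jobs", "myfamily.mpg"))
```

You would then enter the path of the copied file at the Ext Match Blocks parameter in the Auxiliary Files block and edit the copy, not the template.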

Weighted scoring in rule matching

Weighted scoring adjusts the similarity score for each match rule by multiplying it by a weight. The sum of these weighted values is the overall similarity of the two keys. For example, if you match on two fields, the overall score is the sum of the two weighted field scores. Because of this, a comparison can score poorly on one rule and the keys can still match: a spelling error in a street name, for instance, will not by itself prevent a match.

Similarity score times weight

When a rule is used to compare two key fields, the Match engine generates a similarity score from 0 to 100. That score is then multiplied by the weight percent that you set. For example, if the weight percent is set to 20 and two first names have a similarity score of 90, then the first name comparison would contribute 18 (20 percent of 90) to the overall weighted score. For a perfect match, the sum of all key field comparisons would normally be 100.

Force no-match

For each match rule, you can set a maximum no-dupe score. If the rule’s similarity score is at or below this value, the keys are never considered a match, even if the overall weighted score is above the threshold that has been set for a match. This allows you to “force no-match.” For example, you could decide that if the last name similarity score is less than or equal to 50, then the keys should be called a no-match.

Force a match

For each match rule, you can set a minimum dupe similarity score. If the rule’s similarity score reaches this value, the keys are always considered a match, even if the overall weighted score is below the threshold that has been set for a match. This allows you to force a match. For example, you might decide that if the social security similarity score is 100, then the keys should be called a match.
The following example illustrates how weighting works. Assume the minimum overall weighted score required is 85.
Field               Weight (in percent)    Force no-match    Force a match
street number       20
street name         20
secondary number    15
first name          20                     25
last name           20                     50
post name           5
If the first name similarity score is 25 or less, or if the last name score is 50 or less, the keys will not match, even if the weighted score is 85 or greater.
If the last name similarity score is 60 instead, the force no-match value does not apply. The similarity score is multiplied by the weight percent, so the last name comparison contributes 12 (20 percent of 60) to the overall weighted score.
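The weighting example can be worked through in a few lines of Python. This is a minimal sketch of the arithmetic described in this section, not the Match engine's implementation. The function names are hypothetical, and the order in which the forced decisions are checked (force no-match first, then force a match) is an assumption, because this section does not say which one takes precedence.

```python
# Weights (in percent) and force no-match values from the example table.
WEIGHTS = {
    "street number": 20,
    "street name": 20,
    "secondary number": 15,
    "first name": 20,
    "last name": 20,
    "post name": 5,
}
FORCE_NO_MATCH = {"first name": 25, "last name": 50}  # at or below: no match
FORCE_MATCH = {}  # the example table sets no force-a-match values
THRESHOLD = 85    # minimum overall weighted score for a match


def weighted_score(scores):
    """Sum of (similarity score x weight percent) over all fields."""
    return sum(WEIGHTS[field] * scores[field] for field in WEIGHTS) / 100.0


def decide(scores):
    """Return 'match' or 'no match' for one pair of keys.

    Assumption: force no-match is checked before force a match.
    """
    for field, limit in FORCE_NO_MATCH.items():
        if scores[field] <= limit:
            return "no match"
    for field, limit in FORCE_MATCH.items():
        if scores[field] >= limit:
            return "match"
    return "match" if weighted_score(scores) >= THRESHOLD else "no match"


if __name__ == "__main__":
    scores = dict.fromkeys(WEIGHTS, 100)
    scores["last name"] = 60
    print(weighted_score(scores), decide(scores))
    scores["last name"] = 50
    print(weighted_score(scores), decide(scores))
```

With every other field at 100, a last name score of 60 gives a weighted score of 92, which is a match. A last name score of 50 gives a weighted score of 90, which is above the threshold, but the force no-match value of 50 still makes the result a no-match.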
Chapter 2: Extended matching blocks and parameters
This chapter describes the extended matching blocks and parameters. You can place the extended matching blocks within your job file, or you can set up extended matching in an external extended matching file. You can then refer to that file from your job file at the Ext Match Blocks parameter in the Auxiliary Files block (refer to “Enter the name and path of the extended matching file” on page 14).

General

The extended matching file that you use must contain a General block. This allows us to update the extended matching files in the future.
Job Description (to 80 chars)
Job Owner (to 20 chars)
Do not enter any values at these two parameters in your extended matching file. Match/Consolidate (MCD) takes the values entered in the MCD job-file parameters instead. This block is only used for updating extended matching files with the Edjob utility.

Parsing and Key Field Options

In the Parsing and Key Field Options block, you specify the layout of keys and the standardization to perform on the data stored in keys. This block is required for both automatic and rule-based extended matching.
Parsing and Key Options Name (to 20 chars)
This is an optional parameter that assigns a logical name for the block. The Parsing and Key Options parameter in the Extended Matching Criteria block references this name. Do not use the name in any other Parsing and Key Options block.
Automatically Generate Key
This parameter controls whether MCD should automatically generate the length of each key. Valid options for this parameter are listed below.
Option             Description
None               If you set this parameter to None, then the only key fields defined will be the ones listed in the Key Length parameters, which appear at the end of the Parsing and Key Options block.
Individual         MCD designs keys so that the results will yield one record per person (first and last name) at an address.
Family             MCD designs keys so that the results will yield one record per family (last name) at an address.
Firm               MCD designs keys so that the results will yield one record per firm.
Firm_Individual    MCD designs keys so that the results will yield one record per person at a firm.
Resident           MCD designs keys so that the results will yield one record per residence.
Name, Title, and Firm Parsing (Standard/Extended/None)
This parameter controls whether MCD uses its standard parsing routine or its extended parsing routine for name, title, and firm data. You can expect better matching results from the extended parsing routine, at the cost of increased processing time.
Option      Description
Standard    If you set this parameter to Std (standard), MCD uses the name-related dictionaries listed in the four Std parameters in the Auxiliary Files block. Those four parameters must be filled out when this parameter is set to Std.
Extended    If you set this parameter to Ext (extended), MCD uses the extended multi-line, firm, and parsing dictionaries listed in the Auxiliary Files block.
None        Select this option to disable parsing.
Standard Number of Names to Store (1-2)
This parameter applies to standard parsing. You can enter the number of people’s names that you want to store in the key. For example, if a name line was Bill and Mary Smith, you could keep both names or just one name (in this case, the first name, Bill Smith).
Match/Consolidate will store one or two (depending on your choice) of each of the following name fields in the key: Pre_Name, First_Name, Mid_Name. The more names, name components, and name standards that you store, the more disk space and processing time you will need.
Standard Number of First_Name Stds (0-1)
This parameter applies to standard parsing. You can choose to store the standardized first name (such as Robert) along with the original first name (such as Bob) in the key. If you enter 0 (zero), only the original first name data will be stored in the key.
Note that this applies to the name which appears first. For example, for Billy and Mary Smith, MCD stores only the standards for Billy.
Standard Number of Mid_Name Stds (0-1)
This parameter applies to standard parsing. You can choose to store the standardized middle name (such as Robert) along with the original name (such as Bob) in the key. If you enter 0 (zero), only the original middle name data is stored in the key.
This applies to the name that appears first. For example, for Billy Bob and Mary Sue Smith, MCD stores the standards for only the Mid_Name Bob.
Extended Number of Names to Store (1-3)
This parameter applies to extended parsing. You can enter the number of people’s names that you want to store in the key. For example, if a name line was Bill and Mary Smith, you could keep both names or just one name (in this case, the first name, Bill Smith).
Depending on your choice at this parameter, MCD will store one, two, or three of each of the following fields in the key:
Pre_Name      Oth_Post
First_Name    Gender
Mid_Name      SSN
Last_Name     Birthdate
Mat_Post      Title