SAP Match/Consolidate 8.00c Job-File Reference

Match/Consolidate

Job-File Reference

Match/Consolidate 8.00c
April 2009
Copyright information © 2009 SAP® BusinessObjects™. All rights reserved. SAP BusinessObjects and its logos,
BusinessObjects, Crystal Reports®, SAP BusinessObjects Rapid Mart™, SAP BusinessObjects Data Insight™, SAP BusinessObjects Desktop Intelligence™, SAP BusinessObjects Rapid Marts®, SAP BusinessObjects Watchlist Security™, SAP BusinessObjects Web Intelligence®, and Xcelsius® are trademarks or registered trademarks of Business Objects, an SAP company and/or affiliated companies in the United States and/or other countries. SAP® is a registered trademark of SAP AG in Germany and/or other countries. All other names mentioned herein may be trademarks of their respective owners.

Contents

Preface .............................................................................................................5
Chapter 1:
Introduction to Match/Consolidate Job ...................................................... 7
Install Match/Consolidate ................................................................................8
Run a Match/Consolidate job...........................................................................9
Steps in Match/Consolidate processing..........................................................11
Run the Match/Consolidate job......................................................................13
Create Match/Consolidate jobs ......................................................................15
Match/Consolidate command line..................................................................17
Messages during verification and processing.................................................18
Template job files...........................................................................................19
Chapter 2:
Job-file blocks and parameters .................................................................. 21
General ...........................................................................................................22
Execution........................................................................................................23
Auxiliary Files................................................................................................28
Unicode Conversion.......................................................................................31
Input File ........................................................................................................33
Input List Defaults..........................................................................................37
Input List Description.....................................................................................45
Super List Defaults.........................................................................................46
Super List Description....................................................................................47
Post to Input File ............................................................................................48
Custom Purge Input File(s) ............................................................................50
Match Qualification........................................................................................52
Match Criteria ................................................................................................54
Match Options................................................................................................59
Match Set Defaults.........................................................................................69
Match Set........................................................................................................70
Combine Match Sets ......................................................................................72
Group Posting.................................................................................................75
Create File for Output ....................................................................................78
Match/Consolidate Output File ......................................................................80
All-Duplicates Output File.............................................................................85
Multi-Occurrence Output File........................................................................90
Custom Match/Consolidate Output File.........................................................95
Custom Output Sorting.................................................................................100
Report Defaults ............................................................................................101
Report: Executive Summary
Report: Job Summary
Report: Match Results
Report: Input File Summary
Report: Input List Summary
Report: List Quality
Report: List-By-List Match
Report: Multi-List
Report: Posted Dupe Groups ....................................................................... 105
Report: List Match
Report: List Duplicates ................................................................................ 106
Report: Output File ......................................................................................107
Report: Purge by List...................................................................................109
Report: Duplicate Records...........................................................................110
Report: Sorted Records................................................................................ 111
Report: Unparsed Records...........................................................................114
Statistics Files ..............................................................................................115
Appendix A:
Master job file (master.mpg) ....................................................................117
Appendix B:
Tips and troubleshooting...........................................................................133
Error-messages.............................................................................................134
Fine-tune your processing............................................................................136
Create output files........................................................................................ 137
Index............................................................................................................ 139

Preface

Match/Consolidate Job

This manual provides detailed information about Match/Consolidate (MCD) jobs. Use it as a reference as you set up and run jobs.
For conceptual information about matching records, see the User’s Guide to Record Matching, which introduces the concepts of record matching and the matching options available to you.

Conventions

This document follows these conventions:
Convention    Description
Bold    We use bold type for file names, paths, emphasis, and text that you should type exactly as shown. For example, “Type cd\dirs.”
Italics    We use italics for emphasis and text for which you should substitute your own data or values. For example, “Type a name for your file, and the .txt extension (testfile.txt).”
Menu commands    We indicate commands that you choose from menus in the following format: Menu Name > Command Name. For example, “Choose File > New.”
!    We use this symbol to alert you to important information and potential problems.
[icon]    We use this symbol to point out special cases that you should know about.
[icon]    We use this symbol to draw your attention to tips that may be useful to you.
Documentation
Documents related to this manual include the following:
Document    Description
System Administrator’s Guide    Explains how to install your software.
Database Prep    Explains how to prepare input files for processing, including how to create DEF, FMT, and DMT files.
Match/Consolidate User’s Guide to Record Matching    Explains the concepts behind name and address matching software and provides examples of how to implement, analyze, and fine-tune match detection strategies for the best results.
Match/Consolidate Extended Matching Reference    Contains the operational how-to instructions for setting up extended matching.
Quick Reference    Contains descriptions of the input and output fields, and the command line for the Match/Consolidate job file.

Access the latest documentation

You can access documentation in several places:
On your computer. Release notes, manuals, and other documents for each product that you’ve installed are available in the Documentation folder. Choose Start > Programs > Business Objects Applications > Documentation.
On the SAP Service Market Place. Go to http://help.sap.com, and then click the Business Objects tab. Here, you can search for your products’ documentation.
Chapter 1: Introduction to Match/Consolidate Job
This chapter is a synopsis of the most important facts that you will need to get started with Match/Consolidate (MCD). If you have never used MCD software, consider reading the User’s Guide to Record Matching before using this reference and before running sample jobs.

Install Match/Consolidate

You must install MCD and the sample and template files before you can work with them. To install MCD, see the System Administrator’s Guide, which explains how to set up the computer and how to install the software. Follow those instructions thoroughly and carefully. Remember to set PATH and PW_PATH. See the Database Prep manual for information about databases and ASCII text files and how to prepare them for MCD.

Windows

When you run install setup for Windows operating systems, the program prompts you to select a drive, and to name a directory where you want MCD located. The default directory location is c:\pw; we recommend that you accept this default.
The installation program automatically creates subdirectories under \PW. If you’ve used another drive or directory name, change the path names accordingly. For details, see the System Administrator’s Guide.

C:\pw
    adm         Edjob and other utilities
    dirs        ZCF and other directories
    mpg         MCD executables and dictionaries
        work        Files for the MCD jobs
        template    MCD template job files
        samples     Files for Quick Start jobs
    mpgc        MCD Custom (optional)

UNIX

The UNIX directory structure is shown below. You must create the postware and adm directories. For details, see the System Administrator’s Guide.

postware
    adm         Edjob and other utilities
    dirs        ZCF and other directories
    merge       MCD executables and dictionaries
        work        Files for the MCD jobs
        template    MCD template job files
        samples     Files for Quick Start jobs
    mpgc        MCD Custom (optional)

Run a Match/Consolidate job

Match/Consolidate is a batch program. You prepare several files for input, and enter a command line at the operating-system prompt. Match/Consolidate begins operation, requiring little or no further input.
The illustration below shows what goes into, and comes out of, an MCD job. When you start the program, it first scans the job file for errors. If any are found, you must edit the job file to correct them, then enter the MCD command line again. When all errors have been found and corrected, processing begins. Match/Consolidate then presents messages about its progress.
[Illustration: a job file, input database(s), and supporting files go into Match/Consolidate; output database(s), reports, work files, and supporting files come out.]

Input

Here are the files that MCD requires:

File    Description
Job file    A job file tells MCD everything it needs to run: Where to find the input file(s), what sort of processing to do, which reports and outputs to create, and where to place them.
Input file(s)    The input file contains name, address, and other data used in the matching process.
FMT, DMT file    For some types of input files, you’ll need a format file. This describes the input file in a physical way: the sequence of fields, their names, their lengths, and type of data.
DEF file    Think of this file as a dictionary. The fields in the input file may have any names you like. The DEF file translates those names into particular names that MCD can recognize. For details about input files and their support files, see Database Prep.

Results

Here are the results that you may expect from MCD processing:

Result    Description
Reports    MCD prepares plenty of reports documenting the duplicate detection process. Check the reports to verify good results. Most users instruct MCD to save reports in files, so they can be read on-screen or sent to a printer.
Statistics files    MCD can generate statistics files containing information on the job. These files can be brought into a database, spreadsheet, or word processing program so you can create the job-related reports.
Output file(s)    MCD can create output files, such as databases of processed names and addresses. You control the format and the content of output files.
Input posting    MCD can place information back in the input records.
Input purge    MCD can delete unwanted records from the input file(s), or mark them for later removal.
Consolidation    MCD can consolidate information from matched records to form a best record for the database.

Work files

While processing, MCD stores internal information in work files. These files are not directly useful to you, but you do need to know a few things about work files:
The MCD installation program creates a work subdirectory. Set MCD (through the Work File Directory parameter in the Execution block) to place its work files in this subdirectory.
MCD gives you the option to save work files; otherwise, they are automatically deleted when the job completes.

Steps in Match/Consolidate processing

The simplest kind of MCD job—finding duplicate records and eliminating them—is a three-step process that automatically happens within the MCD operation.
1. Read records and create key file.
2. Find duplicate records.
3. Either delete unwanted records from the input file(s), or copy good records to an output file.

Read records and create key file

The first step is to read all of the input records. Match/Consolidate does two things to each input record. First, MCD decides whether the record should be included in the job. Perhaps the record has already been marked for deletion, for example. Or, you may set conditions that an input record must meet (called a filter) to be included in the job.
If the record is to be included, MCD then extracts data from the record and copies it into another file, called the key file. The key file is an internal work file. When searching for dupes, MCD will look at the data in the key file, not the data in the input file(s).
As data is copied from the input file, it may be modified. Names, firms, and city/ state/ZIP data may be standardized, and address data is parsed. When MCD prepares data for the key file, this is only for internal use, to improve dupe detection. It does not change the data in the input file, nor (if you choose to create one) the output file.
For more information about setting up input files and the key file, see the User’s Guide to Record Matching.
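
The first step can be pictured with a short sketch. It only illustrates the idea of filtering records and copying standardized data into a separate key list. It is not MCD’s actual key-file format: the record fields, the is_included filter, and the make_key standardization rules are all hypothetical.

```python
import re

def is_included(record):
    """Hypothetical filter: skip records already marked for deletion."""
    return not record.get("deleted", False)

def make_key(record):
    """Build a simplified match key; the input record itself is not changed."""
    # Keep only letters from the last name, in upper case.
    last = re.sub(r"[^A-Z]", "", record["last_name"].upper())
    # Upper-case the address and collapse runs of whitespace.
    addr = " ".join(record["address"].upper().split())
    # Compare on the five-digit ZIP Code only.
    return (last, addr, record["zip"][:5])

def build_key_file(records):
    """Return (record_number, key) pairs for the records that pass the filter."""
    return [(i, make_key(r)) for i, r in enumerate(records) if is_included(r)]

records = [
    {"last_name": "O'Brien", "address": "12  Main st", "zip": "54601-1234"},
    {"last_name": "OBRIEN", "address": "12 MAIN ST", "zip": "54601"},
    {"last_name": "Smith", "address": "9 Elm Rd", "zip": "54602", "deleted": True},
]
key_file = build_key_file(records)
```

In this sketch the first two records produce the same key even though they are typed differently, and the third record is left out by the filter. The original records are never modified, which mirrors the point that key-file preparation is for internal use only.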

Find duplicates

The second step is to find duplicates. You will choose the logic that MCD uses to decide whether records match.
When the program finds two or more records that match, it assembles them into a dupe group (duplicate record group). Once the search is complete and all dupe groups have been formed, MCD sorts each group to prioritize the records. Again, you will choose from one of the following prioritizing methods.

Prioritizing method    Description
Unique records    Records that did not match any other record
Master dupes    Records that ranked at the top of their dupe group
Subordinate dupes    Records that ranked second or lower in their dupe group
For most users, the unique records and the master dupes are the good records and the subordinate dupes are unwanted. The following table may help you to picture these groups. Normally, records above the line are kept, and those below it are dropped.

Kept (above the line)    Unique records and master dupes
Dropped (below the line)    Subordinate dupes

Note that all of step 2 is carried out in MCD work files. So far, MCD has not output or eliminated any records.

Purge dupes or create output

The third and final step of most MCD jobs is to prepare the output. You have a choice of two methods.
Method    Description
Purge    Delete subordinate dupes from the input file, leaving the unique records and master dupes intact.
Output    Copy unique records and master dupes from the input file(s) to an output file, omitting the subordinates, and leaving the input files intact.
Of course, there are plenty of ways to make an MCD job more complex to meet your particular needs. The User’s Guide to Record Matching explains how you can use the program’s features to get the results you want.
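
Steps 2 and 3 can be sketched in the same spirit. The example below is a simplified illustration, not MCD’s matching logic: it treats records with identical keys as matches, and it assumes a hypothetical priority rule in which the lowest record number ranks first in its dupe group.

```python
from collections import defaultdict

def find_dupe_groups(keyed):
    """Group record numbers whose keys are identical (a stand-in for match logic)."""
    groups = defaultdict(list)
    for rec_no, key in keyed:
        groups[key].append(rec_no)
    return list(groups.values())

def classify(groups, priority):
    """Split records into unique records, master dupes, and subordinate dupes."""
    unique, masters, subordinates = [], [], []
    for group in groups:
        if len(group) == 1:
            unique.append(group[0])
            continue
        ranked = sorted(group, key=priority)
        masters.append(ranked[0])        # top-ranked record of the dupe group
        subordinates.extend(ranked[1:])  # ranked second or lower
    return unique, masters, subordinates

keyed = [(0, "OBRIEN|12 MAIN ST"), (1, "SMITH|9 ELM RD"),
         (2, "OBRIEN|12 MAIN ST"), (3, "OBRIEN|12 MAIN ST")]
groups = find_dupe_groups(keyed)
# Hypothetical priority rule: the lowest record number ranks first.
unique, masters, subordinates = classify(groups, priority=lambda n: n)

# Output method: copy unique records and master dupes to a new file.
output_records = sorted(unique + masters)
# Purge method: these are the records that would be deleted from the input.
purge_records = sorted(subordinates)
```

Either method ends with the same set of surviving records. The difference is whether the input file is changed (Purge) or left intact while the good records are copied elsewhere (Output).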

Run the Match/Consolidate job

Sample job

We’ve sent you a sample MCD job that is ready to run. Running this job:
Verifies that the system is set up correctly.
Verifies that MCD is installed correctly.
Increases your confidence in working with MCD.
Gives you a starting point from which to develop jobs of your own.
There are two versions of the sample job file:
quikwin.mpg for Windows
quikunix.mpg for UNIX
You’ll find commands for running these jobs on the next page. If the software has been installed and the system set up correctly, MCD will display messages as it goes through these steps:
1. Starts the program and verifies the job file.
2. Reads input records.
3. Searches for dupes.
4. Creates an output file of desirable records.
5. Generates reports.
After you run the job, look at the files in the samples subdirectory.

Job description

It’s impossible for one sample to apply perfectly to all users. However, we’ve designed the sample job to be as simple, yet typical, as possible:
Small mailing list, all one state.
One input file of 1000 records, fixed-length ASCII text.
Family mailing (match on last name and address).
MCD output file (unique records and master dupes).
Basic set of reports.

Create jobs from our sample

You can use the Quick Start job as the basis for your own jobs. Copy the Quick Start job file and edit your copy. Look for the parameter entries in UPPER CASE; those are the ones you’ll probably want to change, especially path and file names.

Windows commands

Complete the following steps to run the Quick Start sample job. If you installed MCD to a drive or directory other than c:\pw, change the path name.
Open Windows Explorer before and after, so that you can compare what is input to MCD, and the files that MCD creates.
1. Choose Start > Run.
2. In the Run box, enter pwmpg c:\pw\mpg\samples\quikwin.mpg (Processing messages appear, and then processing is complete.)
Note that as an alternative to steps 1 and 2, you can enter the following command from a DOS prompt in the c:\pw\mpg directory: pwmpg samples\quikwin.mpg

UNIX commands

To run the sample job, type the commands shown below. We’re assuming that you’ve installed MCD in /usr/postware. If you’ve used another location, change the path name accordingly.
Run ls before and after, so that you may compare what’s input to MCD, and the files that MCD creates.
$ cd /usr/postware/merge/samples
$ ls
$ pwmpg quikunix.mpg (Processing messages appear. Then processing is complete.)
$ ls

Create Match/Consolidate jobs

To help you prepare your own MCD jobs, we provide the following samples. You can copy any of these files as a starting point. You will find them in the template directory (see “Install Match/Consolidate” on page 8).
! Before editing any of the sample jobs discussed below, make a copy with a different file name. When we ship software updates, we always ship new copies of the sample jobs. Ensure that the new version does not overwrite a file on which you’ve been working.

Quick Start

Now that you’ve run the Quick Start job and you know that it runs, you can adapt it to your own needs.

Templates

Templates are job files that are nearly ready to use; they require just a few minutes of editing from you. Each template is set up for a particular type of job.

Master

The master job file is called master.mpg. It contains one of each type of block that an MCD job file might contain. When you need to add new blocks to a job, you can copy them from master.mpg.

Resources

The resource file match.mpg is not a complete job file. Instead, it contains samples of the blocks that control matching. These samples will help you understand and select a strategy for matching. You can copy sample blocks into any other job file.
Similarly, the resource file group.mpg contains sample blocks for Group Posting. See the User’s Guide to Record Matching and Chapter 2, “Job-file blocks and parameters” on page 21 for descriptions and setup instructions.

Extended matching files

There are six extended matching files that you may refer to from the MCD job file. These files are named auto.mpg, family.mpg, firm.mpg, firmindv.mpg, hhold.mpg, and indiv.mpg. See the Extended Matching Reference manual for details about using these files.

Editing tips

To edit job files, you will need working knowledge of a good text editor or word-processing program. If you use a word processor, be sure to save job files as simple ASCII text.

Guidelines for editing job files

Consider the following points when editing job files:
Copy and edit the job files provided with MCD, or copy blocks between files as necessary.
Use the file name extension .mpg.
Some blocks are required, but most are optional. You can place blocks in any sequence.
Do not edit the BEGIN or END lines, block titles, or parameter names (anything to the left of the equal sign). There is only one exception to this rule: To make MCD ignore a block, insert an asterisk (*) before the word BEGIN.
! Never delete parameters or rearrange them within a block. There are a few blocks in which you can copy and repeat the last parameter as many times as you need. Only when the manual says it is okay should you change the number of parameter lines within a block.
! Never press the Enter key while typing a long parameter entry; let the entry wrap onto an additional line. If you press the Enter key, MCD counts the extra end-of-line marks as separate lines.
You can add comments at the beginning or end of the job file, and between blocks, but not within a block. Notes or comments might make it easier for others to understand and use the job file.
We recommend that you start all comment lines with an asterisk (*). We also recommend that you do not use the keywords BEGIN or END in comments. A comment can be anywhere in a job file after an END or before the next BEGIN.
Many parameters require some sort of entry. In the sample job files that we provide, many parameters have suggested entries already in place. There are some optional parameters that may be left blank.
Where space allows, parameter names are followed by clues or options in parentheses. Clues are shown in lower case; options are in upper case. Case doesn’t matter in the entry you type, but be sure to spell options exactly as shown, and do not abbreviate. Exception: At Y/N parameters, you may spell out Yes or No.
As programs are updated to new versions, parameters and blocks may be added or changed. Do not manually update the job files to the new version; instead, use the Edjob utility. For instructions about how to do this, see the Edjob User’s Guide.
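
As an illustration of these guidelines, the sketch below disables a block by inserting an asterisk before BEGIN and reads the entries of the remaining active blocks. The job-file fragment in it is hypothetical: this manual confirms the BEGIN and END lines, block titles, and the equal sign between a parameter name and its entry, but the exact layout of the BEGIN line is an assumption made for the example. Always work on a copy of a real job file.

```python
def disable_block(job_text, block_title):
    """Insert an asterisk before BEGIN on the line that opens the named block."""
    out = []
    for line in job_text.splitlines():
        stripped = line.lstrip()
        # Simple substring test; a title that is a prefix of another would also match.
        if stripped.startswith("BEGIN") and block_title in stripped:
            line = "*" + line
        out.append(line)
    return "\n".join(out)

def read_entries(job_text):
    """Collect 'name = entry' pairs from active blocks, keyed by the BEGIN line."""
    blocks, current = {}, None
    for line in job_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("BEGIN"):
            current = stripped
            blocks[current] = {}
        elif stripped.startswith("END"):
            current = None
        elif current is not None and "=" in stripped:
            name, entry = stripped.split("=", 1)
            blocks[current][name.strip()] = entry.strip()
    return blocks

# Hypothetical fragment; the BEGIN-line layout is assumed for illustration.
sample = """* Sample fragment for illustration only
BEGIN  General
Job Description = Test run
Job Owner = Pat
END
BEGIN  Execution
Read Records & Create Match Sets (Y/N) = Y
Find Duplicates (Y/N/Predict) = N
END
"""

active = read_entries(sample)
disabled = disable_block(sample, "Execution")
still_active = read_entries(disabled)
```

Because the disabled block no longer opens with the word BEGIN, read_entries skips everything up to its END line. Note that the sketch never edits parameter names, in keeping with the rule above.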

Match/Consolidate command line

Checklist

Before running MCD, make sure that you have finished the following prepwork:
Create an FMT file for each fixed-length ASCII input file.
Create a DMT file for each delimited ASCII input file.
Create a DEF file for each input file.
Complete the job file.
Verify that adequate disk and memory space are available.

Basic command line

To run the job, type the MCD command, followed by the path and name of the job file:
pwmpg [path] jobfile.mpg
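
If you launch jobs from a script, the basic command line can be assembled as in the following sketch. The build_command helper is hypothetical and uses no command-line options; it only pairs the pwmpg command with a job file and checks the .mpg extension recommended in this manual.

```python
import os

def build_command(job_file, program="pwmpg"):
    """Return the argument list for running an MCD job file."""
    if not job_file.lower().endswith(".mpg"):
        raise ValueError("job files use the .mpg extension")
    return [program, job_file]

cmd = build_command(os.path.join("samples", "quikunix.mpg"))
# To launch the job (requires MCD on the PATH), pass cmd to subprocess.run(cmd).
```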

Options

See the Quick Reference for a complete list of options for the MCD command line.

Messages during verification and processing

Verifier messages

The MCD job-file verifier is a part of the pwmpg program. It checks the job and input files for the following types of mistakes and/or omissions:
Missing PW field from DEF file.
Input file is defined as fixed-length ASCII, but FMT file cannot be found.
Missing execution block from job file.
Entry at Filter parameter is invalid because the syntax is wrong.
A verifier message may be either a warning or an error. When you receive a warning, you may choose to continue or to stop processing. An error is more serious, and processing must stop.
You may control the verifier by adding options to the command line. See the a, nos, and v options in the Quick Reference.

Correct errors

For command-line processing, the verifier catches only one error per run. (Note that MCD Views lists all warnings and errors at once.) When the verifier stops, you will be back at the operating-system prompt. Start the text editor, open the job file, and correct the error, then start pwmpg again. Continue this cycle until the job makes it all the way through verification without errors.

Process messages

Match/Consolidate reports progress by printing messages on the screen. The processing messages will refer to tasks that MCD performs: Reading input records, finding dupes, sorting dupes, creating reports and output files, and so on.
During each step, MCD reports progress as a percentage. When all the processing you requested is finished, the last message is as follows:
Processing completed
You might want to capture messages in a file for later reading; to do so, add something like this to the command line:
UNIX: > messagefile.log
Windows: /lmessagefile.log
This redirects the standard output to a file.
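
The same capture can be done from a script. The sketch below sends a command’s standard output to a log file and then checks whether the last message is the completion message quoted above. It is an illustration under assumptions: the run_and_log and finished_ok helpers are hypothetical, and a stand-in command replaces pwmpg so that the example runs without MCD installed.

```python
import os
import subprocess
import sys
import tempfile

def run_and_log(cmd, log_path):
    """Run a command, writing its standard output to log_path."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log)
    return result.returncode

def finished_ok(log_path):
    """Check whether the last non-blank line is the final MCD message."""
    with open(log_path) as log:
        lines = [line.strip() for line in log if line.strip()]
    return bool(lines) and lines[-1] == "Processing completed"

# Stand-in for ["pwmpg", "jobfile.mpg"]; it prints only the final message.
stand_in = [sys.executable, "-c", "print('Processing completed')"]

log_path = os.path.join(tempfile.mkdtemp(), "messagefile.log")
exit_code = run_and_log(stand_in, log_path)
```

For a real job, replace stand_in with the pwmpg command and job file, and read the log afterward to review the verifier and processing messages.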

Template job files

Match/Consolidate template job files are nearly ready to use; they require a few minutes of editing from you. You’ll find them on the system in the template subdirectory. Below is a list and description of each template.
! Before editing any of the templates, make a copy with a different file name. When we ship software updates, we always ship new copies of the templates. Ensure that a new version does not overwrite a file on which you’ve been working.
Most of the templates do not contain the crucial pair of blocks in which matching logic is set (Match Criteria and Match Options); they are not needed for extended matching. Note that the Match Options block and the Match Criteria block are required for standard matching. For more information, see “Matching Method (STD/EXT/ADV)” on page 24.
Select the matching logic from the options offered in the resource file match.mpg. Then, copy the appropriate pair of blocks into the job file. This way, you can adapt the template for the following mailing types:
Residential (one per address)
Family (one per last name)
Individual (one per person)

dedupe.mpg

We’ve received a client’s mailing list, to prepare it for a direct-mail campaign. The client has asked us to de-dupe the list. We run the file through MCD and delete dupes from it.

good_out.mpg

This job is just like the first, with one change: Instead of deleting dupes from the input file, we’re going to copy good records to a separate output file. As above, this template may be adapted for individual, household, residential, or business-to-business mailings.

suppress.mpg

Our company is a member of the Direct Mail Association. Four times per year we receive from the DMA an updated copy of their Mail Preference File. This is a list of almost three million people who have asked not to receive unsolicited mail advertising. We want to compare this list to our mailing list to make sure we don’t mail our catalogs and flyers to people who probably wouldn’t buy anyway.

fishing.mpg

A common variation on the suppression job is to use the house file as a suppression list, to prevent mailing to current customers. For example, a charitable foundation mails a special solicitation to rented lists. They want to make sure that no one on their current-donor list receives this mailing. Another example would be a cruise line with berths to fill, mailing a special discount offer that must not be received by anyone already booked.

multibuy.mpg

Our company mails special offers and coupons to frequent buyers; that is, people who show a pattern of repeat business. Today, we have to mail 10,000 copies of a rather expensive catalog, so we want to select our 10,000 best prospects for this mailing. To prepare this mailing, we’re going to merge several files: Our own customer list, plus lists that we’ve rented.
The total number of input records is about a dozen times the number of catalogs we have to mail, so we can afford to be choosy. We’re going to stipulate that, to receive this mailing, a person’s name must appear several times in our input files. And of those repeat buyers, we are going to take only the 10,000 highest incomes.

firm_pkg.mpg

We’re a magazine publisher, mailing technical and engineering titles at second class. We’re anxious to reduce our postage by using Presort to form firm packages whenever possible. However, Presort looks only for firm names that match exactly. Any variation in spelling or punctuation prevents Presort from finding a match. This is a problem for us, because our subscription-entry system does nothing to standardize company names. But we can use MCD to add a field that will work reliably in Presort.
Chapter 2: Job-file blocks and parameters
This chapter describes each parameter in the job file, listing parameters by block. We list the blocks in the order in which you are most likely to set up a job; see the table of contents for an alphabetical listing of each block.

General

When you open a new job, the optional General Information block appears. The purpose of this block is to stamp each job with the current version of Match/Consolidate (MCD). It also provides a location to label the job with a description and a job owner. The information included in this block appears in some reports.

Job description

Enter a name up to 80 characters for the job. This information is printed in the Job Summary Report. Use the following shortcuts (macros) in the description.
$job is converted to the base name (without path or extension) of the job file.
$date and $time are taken from the computer’s clock at the time you start the job. The date is nine characters long, in the format dd-mmm-yyyy. The time is ten characters long, in the hh:mm:ss format, with am or pm.
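
To picture how the macros behave, here is a minimal sketch of the substitution. It is not MCD’s implementation: the expand_macros helper is hypothetical, and the exact spacing and letter case of the date and time text are assumptions based on the formats named above.

```python
import os
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def expand_macros(description, job_file, now):
    """Replace $job, $date, and $time in a job description."""
    # Base name of the job file, without path or extension.
    base = os.path.splitext(os.path.basename(job_file))[0]
    # Assumed rendering of dd-mmm-yyyy.
    date_text = f"{now.day:02d}-{MONTHS[now.month - 1]}-{now.year}"
    # Assumed rendering of hh:mm:ss on a 12-hour clock, followed by am or pm.
    hour12 = now.hour % 12 or 12
    suffix = "am" if now.hour < 12 else "pm"
    time_text = f"{hour12:02d}:{now.minute:02d}:{now.second:02d}{suffix}"
    return (description.replace("$job", base)
                       .replace("$date", date_text)
                       .replace("$time", time_text))

text = expand_macros("$job run on $date at $time",
                     "/usr/postware/merge/samples/quikunix.mpg",
                     datetime(2009, 4, 15, 14, 5, 9))
```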

Job owner

Enter a name up to 20 characters of the job owner. Match/Consolidate prints this information in the Job Summary Report.

Execution

This block controls how MCD executes. Decisions made in this block determine the matching method, type of output file(s) created, and processing options.
Read Records & Create Match Sets (Y/N)

Reading records and creating the match sets is the first step in MCD processing. If you change the selection of match sets or their lengths, you must repeat this step (set this parameter to Yes again).

Option    Description
Yes    Enter Yes to read all input records and create match sets. This is a mandatory first step for all the execution processes that follow. If you intend to create, update, or use a reference file, set this option to Yes. On the first run, set this parameter to Yes and the other processes to No, and then run pwmpg. MCD creates the match set (depending on reference file status), and generates any reports.
No    If you want to rerun the job using saved work files, set this parameter to No. If you leave it set at Yes, MCD creates the match set again.

Find Duplicates (Y/N/Predict)

Once you have created the key file (the Read Records & Create Match Sets parameter process), or if you are working from an existing valid reference file rather than raw input data, you can perform this process to compare records, find duplicates, and form dupe groups.

Option    Description
Yes    To find duplicate records, set this parameter to Yes and then run pwmpg. MCD searches for dupes and creates any reports that are set up. You can check the reports before going on to the next execution process.
No    Once dupes have been found correctly, change this parameter to No to run from saved work files for subsequent output processes. If you leave it set at Yes, MCD continues searching for dupes.
Predict    You can enter Predict to save processing time and perhaps find more matches. When Find Duplicates is set to Predict, and you run the job, MCD forms break groups but does not perform the full matching process. Then you can check the Job Summary and adjust the break-group setup as needed.

Execute in Background

If you select this option, MCD programs operate in a background window. If not selected, the program operates in the active window, with messages, and so on, on top.
Matching Method (STD/EXT/ADV)
Use this parameter to choose a standard, extended, or advanced matching method.
Standard matching
This method lets you match on any fields within the database and set the appropriate match level per component. Standard matching also has default matching strategies for quick and easy setup.
Extended matching
This method is rule-based. MCD uses extended matching rules for settings that tell it how to search for duplicates. You can enter the path and name of an external extended matching file that you want to use at the EXT Match Blocks parameter in the Auxiliary File block. For more information about extended matching, see the Extended Matching Reference manual.
Advanced matching
You can implement this method with standard and/or extended matching. MCD can perform multilevel matching and multi-criteria matching, and lets you use the Match Set blocks and parameters. Advanced matching parameters start with the letters ADV and are active only when you set the Matching Method to ADV.
Use advanced matching if you are using multiple match strategies within one pass of a job or when combining more than one match set. If you want to perform advanced matching, you still need to choose a matching strategy (STD or EXT).
To help you decide whether to perform standard matching or extended matching, answer the questions below in order. Keep in mind that there are two different kinds of extended matching: automatic and rule-based.
Do I have existing jobs set up that I don’t want to change right now?
    Yes: Set the matching method to Std to use standard matching, with the Match Criteria and Match Options blocks (as you’ve done in the past).
    No: Go to the next question.
Do I want the quickest possible setup time, with slower processing time and less control over how Match/Consolidate performs matching?
    Yes: Set the matching method to Ext, and use automatic extended matching. See the Extended Matching Reference for details.
    No: Go to the next question.
Do I want to:
    control the order in which fields are compared and the relative importance of each field?
    compare fields such as social security number and gender, in addition to the usual name and address fields?
    control how Match/Consolidate handles comparisons with blank fields and abbreviations?
    Yes: Set the matching method to Ext, and use rule-based extended matching. See the Extended Matching Reference for details.
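The decision questions above can be expressed as a short Python sketch. This is only an illustration of the order in which the questions are asked; the function name and argument names are hypothetical, and the case where every answer is No is left undecided because the manual does not cover it.

```python
def choose_matching_method(keep_existing_jobs, want_quickest_setup, want_fine_control):
    """Walk the three questions in order and return (method, kind), or None.

    Hypothetical helper: the argument names are not MCD parameters.
    """
    if keep_existing_jobs:
        # Existing jobs stay on standard matching.
        return ("STD", "standard matching")
    if want_quickest_setup:
        # Quick setup, slower processing, less control.
        return ("EXT", "automatic extended matching")
    if want_fine_control:
        # Control over field order, extra fields, blanks and abbreviations.
        return ("EXT", "rule-based extended matching")
    # The manual gives no recommendation when every answer is No.
    return None

if __name__ == "__main__":
    print(choose_matching_method(False, False, True))
```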
Create Match/Consolidate File (Y/N)
Create Multi-Occurrence File (Y/N)
Create All-Duplicates File (Y/N)
Create Custom M/C File (Y/N)
These parameters control what kind of output file you create.
Option    Description
Yes       If you want to create an output file, enter Yes at one of the parameters that controls the output you want.
No        If you don’t want to create an output file, enter No at all four parameters.
Post to Input File (Y/N)
Group Post to Purged Files (Y/N)
Purge (Y/N/PREDICT)
Custom Purge (Y/N/PREDICT)
These parameters control the purging of data from the input file(s) and the posting of data to input file(s).
Option    Description
Yes       If you want to post data to, or purge dupes from, the input file(s), enter Yes at the appropriate parameter(s). You may choose either the Purge parameter or the Custom Purge parameter, but not both.
No        If you don’t want to post or purge the input files, enter No at all four parameters.
Predict   If you want to create a Purge by List prediction report without performing an actual Purge or Custom Purge, set the Purge or Custom Purge parameter to Predict.
Warning: At the end of any batch run in which you post or purge the input file(s), MCD deletes its work files. This means that you cannot come back later and create any reports or output files. If you want any reports or output files, set them up now before the purge. Work files are not lost when you set the Purge parameter to Predict.
Warning: You cannot set both the Purge and Custom Purge parameters to Yes. If you do, MCD purges dupes from the input files; however, it ignores the Custom Purge request.
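The post and purge rules above lend themselves to a pre-run check. The Python sketch below is an assumption-laden illustration: it represents the four parameters as a plain dictionary keyed by the parameter names in this section, which is not the job-file syntax, and the function name check_post_purge is hypothetical. Treating a missing parameter as No is also an assumption.

```python
POST_PURGE_PARAMS = ("Post to Input File", "Group Post to Purged Files",
                     "Purge", "Custom Purge")

def _is_yes(value):
    """True for Y or Yes in any letter case."""
    return str(value).strip().upper() in ("Y", "YES")

def check_post_purge(settings):
    """Return warning strings for a dict of post/purge settings (hypothetical helper)."""
    warnings = []
    # Missing parameters are treated as No (an assumption, not documented behavior).
    values = {name: settings.get(name, "N") for name in POST_PURGE_PARAMS}
    if _is_yes(values["Purge"]) and _is_yes(values["Custom Purge"]):
        warnings.append("Purge and Custom Purge are both Yes; "
                        "MCD ignores the Custom Purge request.")
    # Predict does not modify the input files, so it does not trigger this warning.
    if any(_is_yes(v) for v in values.values()):
        warnings.append("Work files are deleted after posting or purging; "
                        "set up reports and output files before this run.")
    return warnings

if __name__ == "__main__":
    for line in check_post_purge({"Purge": "Y", "Custom Purge": "Y"}):
        print(line)
```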
Create Reports (Y/N)
Create Report Statistics Files (Y/N)
Save Work Files (Y/N)
Unless you don’t want to save work files to be used in subsequent runs, set the Create Reports and Save Work Files parameters to Yes.
Option    Description
Yes       Set the Create Report Statistics Files parameter to Yes if you intend to use the statistics generated from the job to create the reports. You can then create the statistics reports by bringing the statistics files into a database, spreadsheet, or word processing program. Be sure to also set up the Statistics Files block.
No        Set this option to No if you don’t want to save work files to be used in subsequent runs.
Warn Before Overwrite (Y/N)
When MCD attempts to create a file, it might find that a file with the same name already exists. This parameter applies to all types of files that MCD might create, except work files. Note that MCD never warns before overwriting a work file.
Option    Description
Yes       MCD halts, sends a warning message, and waits for a response. You can stop the job, change the file name, and resume. Consider how setting this parameter to Yes will affect the processing of unattended batch jobs.
No        MCD deletes the existing file and overwrites it with a new one without warning.
Work File Directory (path)
Sort Work File Directory (path)
You can enter a path for the work file directory and the sort work directory at these parameters. This lets you store the two different kinds of files in separate directories, which you will find useful if there is not enough disk space to store both work files and sort files in the same directory.
The sort work directory holds temporary sort work files. Match/Consolidate uses work files for temporary data storage during MCD processes.
You may leave these parameters blank. If they are left blank, MCD writes the work files and sort work files to the current directory.
Create Backup File (Y/N)
Backup Directory (path)
Match/Consolidate can make a backup copy of the input file(s) before purging or posting to them.
Option Description
Yes       If you would like to create a backup copy of the input file(s), enter Yes. MCD creates the backup just before posting or purging the input file(s). The backup will have the same base name as the input file and the extension .bak. If you want the backup placed in a particular directory, enter that path and directory name at the Backup Directory parameter. Otherwise, MCD creates the backup copy in the same directory as the input file.
No        If you don’t have enough free disk space, or if you don’t feel that a backup is necessary, enter No at the Create Backup File parameter and leave the Backup Directory parameter blank.
If you elect to create a backup file, it will have the same base name as the input file, with the extension BAK, and will be placed in the same directory as the input file. This directory will not automatically appear in the box. If you want the backup placed in a particular directory, enter the path and directory name in that box.
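The backup naming rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MCD code: the function name backup_path and the sample file name list1.dbf are hypothetical, and the lowercase .bak extension follows the option table above.

```python
from pathlib import PurePath

def backup_path(input_file, backup_dir=""):
    """Return where a backup copy would be written (hypothetical helper).

    Same base name as the input file, extension .bak, placed in backup_dir
    when one is given and otherwise next to the input file.
    """
    source = PurePath(input_file)
    target_dir = PurePath(backup_dir) if backup_dir.strip() else source.parent
    return target_dir / (source.stem + ".bak")

if __name__ == "__main__":
    print(backup_path("data/list1.dbf"))             # backup beside the input file
    print(backup_path("data/list1.dbf", "backups"))  # backup in a separate directory
```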
Maximum Work Buffer Size (kilobytes)
Match/Consolidate runs faster when it has a large memory space. However, if you share memory space with other users, you can set this parameter to restrict how much memory MCD uses for work buffers. Remember that the size you set applies only to work buffers; this is only a portion of the total memory space occupied by the program.
This parameter controls the maximum amount of memory that MCD uses for buffers. For Windows users, the suggested setting for this parameter is 4096 kilobytes (four megabytes). If less memory is available, MCD uses less.
For UNIX users, use 1,024 kilobytes (one megabyte) as a starting point. With both operating systems, if the maximum size is set too large, disk swapping may occur, which decreases speed.
Sort Optimization (SPACE/SPEED)
Match/Consolidate does a lot of sorting: keys, break groups, dupe groups, and so on. Like any sorting program, MCD runs faster when it has adequate disk space. For information about estimating the disk space required for work files, see Appendix A of the User’s Guide to Record Matching.
Option Description
Space MCD uses less disk space but runs slower.
Speed MCD uses more disk space but runs faster.

Auxiliary Files

Auxiliary Files tell MCD where to find the necessary dictionaries, rule files, and directories for parsing and standardizing input data for matching purposes. Match/Consolidate performs standardization on the record’s key for matching purposes only. Data is never changed in MCD.
Match/Consolidate has default options that allow the user to set the auxiliary files. Once they are set in the defaults, they are available for all subsequent jobs. To set defaults, go to Options > Match Consolidate Defaults.

Standard processing
When you enable the standard option for Name, Title, and Firm Parsing or Address and Last-line Parsing in the Match Options block, MCD uses the following dictionaries and directories for parsing.
Dictionaries
Listed below are the six parsing dictionaries used for standard processing. Match/Consolidate uses the dictionaries to parse and identify specific data components.
Dictionary                              Function
Address line (Addrln.dct)               Parses address line components and standardizes the spelling of suffix and directional information.
Last line (Lastln.dct)                  Parses city, state, and ZIP components.
Standard Prename (Prename.dct)          Parses prename words such as Mr., Mrs., Dr.
Standard Name (Name.dct)                Parses name words.
Standard Pre-Lastname (Prelname.dct)    Parses a prefix, such as Von or Van, appearing as the first word in a compound last name.
Standard Postname (Postname.dct)        Parses postname words such as Sr., Jr., or PhD.
Match Level Dictionary
The matchpct.dct file is a table that sets, at three levels (loose, medium, and tight), the percentage two components must be alike to be considered matches. You can customize the dictionary, rename it, and then enter the renamed file at this parameter.
Dictionary                              Function
Standard Match Percent (Matchpct.dct)   Defines the values and sets match score adjustments.
Directories
The following table lists the two directories that MCD uses to verify and standardize city, state, and ZIP information.
Directory                               Function
City (City09.dir)                       Lists cities and the ZIP codes associated with them.
ZIP-City (ZCF09.dir)                    Lists ZIP codes and the cities and states associated with them.
Defaults
If input files are in the same format, it is not necessary to create separate format or definition files for each input file. Create one set of format and definition files as a default and enter its path and file name.
File Name                               Function
ASCII Format File (.fmt)                Provides the layout for the input file when no individual format file exists.
Definition File (.def)                  Defines the file when no individual definition file exists.

Extended processing
When you enable the extended option for Name, Title, and Firm Parsing and Address and Last-line Parsing in the Match Options block, MCD uses the following dictionaries and directories for parsing.
Dictionaries
Listed below are the three dictionaries MCD uses to parse and standardize name and firm data for extended processing. These are the same dictionaries that DataRight uses for parsing.
Dictionary                              Function
Capitalization (Pwcap.dct)              Standardizes special cases for capitalization such as VanBuren and McDonald’s.
Parsing (Parsing.dct)                   Parses prename, name designators, name specials, last name prefixes, first and last names, postnames, and titles.
Firm (Firmln.dct)                       Parses and standardizes firm data.
Rule Files
The following table lists the rule files that MCD uses to take into account data sequences and patterns for the identification of firm data and the components of multiline information.
Rules File                              Function
Firm Rules (Fprules.gcf)                Contains information on sequences of firm data and typical patterns that firms use.
Multiline Rules (Mlrules.gcf)           Contains information on data sequences and typical data patterns.
Directories
The following table lists the directories MCD uses to standardize address and lastline data. These are the same directories that ACE uses to standardize data.
Directory                               Function
ZIP4us.dir (ZIP+4 (1))                  A national directory of addresses that is used to verify and standardize address and lastline data.
(ZIP+4 (2))                             A user-defined custom directory that holds a smaller portion of the national ZIP+4 directory.
Reverse ZIP+4 (RevZIP+4)                A national directory of addresses with unique ZIP codes.
Matching file
The matching file identifies the path to the extended matching file that will be used for this job. Match/Consolidate, in its extended matching process, uses this file instead of the Matching Criteria and Matching Options windows. If you are performing standard matching, as opposed to extended matching, leave this setting blank. If you have set a Match Blocks default (at the Auxiliary Files tab), MCD enters that default here for each new job you create.
To set or change the entry, type the information into this field. Or, click the folder icon at the end of the field to browse to the file location. Once you have found and selected a file name, MCD automatically enters its path and name into the field.
To use extended matching, you must also select extended matching at the Matching Method setting of the Execution window.