BusinessObjects, Crystal Reports®, SAP BusinessObjects Rapid Mart™, SAP
BusinessObjects Data Insight™, SAP BusinessObjects Desktop Intelligence™, SAP
BusinessObjects Rapid Marts®, SAP BusinessObjects Watchlist Security™, SAP
BusinessObjects Web Intelligence®, and Xcelsius® are trademarks or registered trademarks of
Business Objects, an SAP company and/or affiliated companies in the United States and/or
other countries. SAP® is a registered trademark of SAP AG in Germany and/or other countries.
All other names mentioned herein may be trademarks of their respective owners.
This manual provides detailed information about the Match/Consolidate (MCD)
job file. Use it as a reference as you set up and run jobs.
For conceptual information about matching records, see the User’s Guide to Record Matching. That guide introduces the concepts of record matching
and the matching options available to you.
This document follows these conventions:
Convention          Description
Bold                We use bold type for file names, paths, emphasis, and text that you should type exactly as shown. For example, “Type cd\dirs.”
Italics             We use italics for emphasis and text for which you should substitute your own data or values. For example, “Type a name for your file, and the .txt extension (testfile.txt).”
Menu commands       We indicate commands that you choose from menus in the following format: Menu Name > Command Name. For example, “Choose File > New.”
!                   We use this symbol to alert you to important information and potential problems.
[symbol not shown]  We use this symbol to point out special cases that you should know about.
[symbol not shown]  We use this symbol to draw your attention to tips that may be useful to you.
Preface
Documentation
Documents related to this manual include the following:
Document                                          Description
System Administrator’s Guide                      Explains how to install your software.
Database Prep                                     Explains how to prepare input files for processing, including how to create DEF, FMT, and DMT files.
Match/Consolidate User’s Guide to Record Matching Explains the concepts behind name and address matching software and provides examples of how to implement, analyze, and fine-tune match detection strategies for the best results.
Match/Consolidate Extended Matching Reference     Contains the operational how-to instructions for setting up extended matching.
Quick Reference                                   Contains descriptions of the input and output fields, and the command line for the Match/Consolidate job file.
Access the latest documentation
You can access documentation in several places:
On your computer. Release notes, manuals, and other documents for
each product that you’ve installed are available in the Documentation
folder. Choose Start > Programs > Business Objects Applications >
Documentation.
On the SAP Service Market Place. Go to http://help.sap.com, and then
click the Business Objects tab. Here, you can search for your products’
documentation.
Match/Consolidate Job-File Reference
Chapter 1:
Introduction to Match/Consolidate Job
This chapter is a synopsis of the most important facts that you will need to get
started with Match/Consolidate (MCD). If you have never used MCD software,
consider reading the User’s Guide to Record Matching before using this reference
and before running sample jobs.
Install Match/Consolidate
You must install MCD and the sample and template files before you can work
with them. To install MCD, see the System Administrator’s Guide, which explains
how to set up the computer and how to install the software. Follow those
instructions thoroughly and carefully. Remember to set PATH and PW_PATH.
See the Database Prep manual for information about databases and ASCII text
files and how to prepare them for MCD.
Windows
When you run install setup for Windows operating systems, the program
prompts you to select a drive, and to name a directory where you want MCD
located. The default directory location is c:\pw; we recommend that you accept
this default.
The installation program automatically creates subdirectories under \PW. If
you’ve used another drive or directory name, change the path names accordingly.
For details, see the System Administrator’s Guide.
C:\pw
    dirs            ZCF and other directories
    adm             Edjob and other utilities
    mpg             MCD executables and dictionaries
        work        Files for the MCD jobs
        template    MCD template job files
        samples     Files for Quick Start jobs
    mpgc            MCD Custom (optional)
UNIX
The UNIX directory structure is shown below. You must create the postware and
adm directories. For details, see the System Administrator’s Guide.
postware
    dirs            ZCF and other directories
    adm             Edjob and other utilities
    merge           MCD executables and dictionaries
        work        Files for the MCD jobs
        template    MCD template job files
        samples     Files for Quick Start jobs
    mpgc            MCD Custom (optional)
Run a Match/Consolidate job
Match/Consolidate is a batch program. You prepare several files for input, and
enter a command line at the operating-system prompt. Match/Consolidate begins
operation, requiring little or no further input.
The illustration below shows what goes into, and comes out of, a MCD job. When
you start the program, it first scans the job file for errors. If any are found, you
must edit the job file to correct them, then enter the MCD command line again.
When all errors have been found and corrected, processing begins. Match/
Consolidate then presents messages about its progress.
[Illustration: the job file, input database(s), and supporting files go into Match/Consolidate; output database(s), reports, work files, and supporting files come out.]
Input
Here are the files that MCD requires:
File             Description
Job file         A job file tells MCD everything it needs to run: Where to find the input file(s), what sort of processing to do, which reports and outputs to create, and where to place them.
Input file(s)    The input file contains name, address, and other data used in the matching process.
FMT, DMT file    For some types of input files, you’ll need a format file. This describes the input file in a physical way: the sequence of fields, their names, their lengths, and type of data.
DEF file         Think of this file as a dictionary. The fields in the input file may have any names you like. The DEF file translates those names into particular names that MCD can recognize. For details about input files and their support files, see Database Prep.
Results
Here are the results that you may expect from MCD processing:
Result              Description
Reports             MCD prepares plenty of reports documenting the duplicate detection process. Check the reports to verify good results. Most users instruct MCD to save reports in files, so they can be read on-screen or sent to a printer.
Statistics files    MCD can generate statistics files containing information on the job. These files can be brought into a database, spreadsheet, or word processing program so you can create the job-related reports.
Output file(s)      MCD can create output files, such as databases of processed names and addresses. You control the format and the content of output files.
Input posting       MCD can place information back in the input records.
Input purge         MCD can delete unwanted records from the input file(s), or mark them for later removal.
Consolidation       MCD can consolidate information from matched records to form a best record for the database.
Work files
While processing, MCD stores internal information in work files. These files are
not directly useful to you, but you do need to know a few things about work files:
The MCD installation program creates a work subdirectory. Set MCD
(through the Work File Directory parameter in the Execution block) to place
its work files in this subdirectory.
MCD gives you the option to save work files; otherwise, they are
automatically deleted when the job completes.
Steps in Match/Consolidate processing
The simplest kind of MCD job—finding duplicate records and eliminating
them—is a three-step process that automatically happens within the MCD
operation.
1. Read records and create key file.
2. Find duplicate records.
3. Either delete unwanted records from the input file(s), or copy good records to an output file.
Read records and create key file
The first step is to read all of the input records. Match/Consolidate does two
things to each input record. First, MCD decides whether the record should be
included in the job. Perhaps the record has already been marked for deletion, for
example. Or, you may set conditions that an input record must meet (called a
filter) to be included in the job.
If the record is to be included, MCD then extracts data from the record and copies
it into another file, called the key file. The key file is an internal work file. When
searching for dupes, MCD will look at the data in the key file, not the data in the
input file(s).
As data is copied from the input file, it may be modified. Names, firms, and city/
state/ZIP data may be standardized, and address data is parsed. When MCD
prepares data for the key file, this is only for internal use, to improve dupe
detection. It does not change the data in the input file, nor (if you choose to create
one) the output file.
For more information about setting up input files and the key file, see the User’s Guide to Record Matching.
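The read step described above can be pictured with a short sketch. It is an illustration only, not MCD's internal logic: the record layout, the `deleted` flag, the `include` filter, and the `standardize` rule are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def standardize(value: str) -> str:
    """Hypothetical standardization: uppercase and collapse whitespace."""
    return " ".join(value.upper().split())


def build_key_file(records: List[Record],
                   include: Callable[[Record], bool],
                   key_fields: List[str]) -> List[Record]:
    """Return key entries for the records that pass the filter.

    The input records are never modified. Each key entry is a new dict
    that also remembers the position of its source record.
    """
    key_file = []
    for position, record in enumerate(records):
        # Skip records already marked for deletion or rejected by the filter.
        if record.get("deleted") == "Y" or not include(record):
            continue
        entry = {name: standardize(record.get(name, "")) for name in key_fields}
        entry["source"] = str(position)
        key_file.append(entry)
    return key_file


records = [
    {"last": "Smith ", "address": "12 main  st", "state": "WI", "deleted": "N"},
    {"last": "smith", "address": "12 Main St", "state": "WI", "deleted": "Y"},
    {"last": "Jones", "address": "9 Oak Ave", "state": "MN", "deleted": "N"},
]
keys = build_key_file(records, lambda r: r["state"] == "WI", ["last", "address"])
print(keys)
```

Only the first record reaches the key file: the second is marked for deletion and the third fails the filter. The input list itself is unchanged, which mirrors the point that key-file preparation is for internal use only.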
Find duplicates
The second step is to find duplicates. You will choose the logic that MCD uses to
decide whether records match.
When the program finds two or more records that match, it assembles them into a
dupe group (duplicate record group). Once the search is complete and all dupe
groups have been formed, MCD sorts each group to prioritize the records. Again,
you will choose from one of the following prioritizing methods.
Prioritizing method    Description
Unique records         Records that did not match any other record
Master dupes           Records that ranked at the top of their dupe group
Subordinate dupes      Records that ranked second or lower in their dupe group
For most users, the unique records and the master dupes are the good records and
the subordinate dupes are unwanted. The following table may help you to picture
these groups. Normally, records above the line are kept, and those below it are
dropped.
[Table: master dupes and unique records above the line; subordinate dupes below the line.]
Note that all of step 2 is carried out in MCD work files. So far, MCD has not
output or eliminated any records.
Purge dupes or create output
The third and final step of most MCD jobs is to prepare the output. You have a
choice of two methods.
Method    Description
Purge     Delete subordinate dupes from the input file, leaving the unique records and master dupes intact.
Output    Copy unique records and master dupes from the input file(s) to an output file, omitting the subordinates, and leaving the input files intact.
Of course, there are plenty of ways to make a MCD job more complex to meet
your particular needs. The User’s Guide to Record Matching explains how you
can use the program’s features to get the results you want.
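As a rough picture of steps 2 and 3, the sketch below groups records, labels them, and applies the Output method. It rests on two stated assumptions that are not MCD's actual behavior: records match only when their keys are identical, and the earliest record in a dupe group ranks first. In MCD you choose both the matching logic and the prioritizing method. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def classify(keys: List[Tuple[str, ...]]) -> Dict[int, str]:
    """Label each record index as 'unique', 'master', or 'subordinate'."""
    groups = defaultdict(list)
    for index, key in enumerate(keys):
        groups[key].append(index)  # assumption: equal keys mean a match
    labels = {}
    for members in groups.values():
        if len(members) == 1:
            labels[members[0]] = "unique"
        else:
            ranked = sorted(members)  # assumption: earliest record ranks first
            labels[ranked[0]] = "master"
            for index in ranked[1:]:
                labels[index] = "subordinate"
    return labels


def output_records(records: List[str], labels: Dict[int, str]) -> List[str]:
    """Output method: copy unique records and master dupes, in input order."""
    return [r for i, r in enumerate(records) if labels[i] != "subordinate"]


records = ["Smith, 12 Main St", "Jones, 9 Oak Ave", "SMITH, 12 MAIN ST"]
keys = [("SMITH", "12 MAIN ST"), ("JONES", "9 OAK AVE"), ("SMITH", "12 MAIN ST")]
labels = classify(keys)
kept = output_records(records, labels)
print(labels)
print(kept)
```

The Purge method would use the same labels but remove the subordinate records from the input list instead of copying the others to a new one.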
Run the Match/Consolidate job
Sample job
We’ve sent you a sample MCD job that is ready to run. Running this job:
Verifies that the system is set up correctly.
Verifies that MCD is installed correctly.
Increases your confidence in working with MCD.
Gives you a starting point from which to develop jobs of your own.
There are two versions of the sample job file:
quikwin.mpg for Windows
quikunix.mpg for UNIX
You’ll find commands for running these jobs on the next page. If the software has
been installed and the system set up correctly, MCD will display messages as it
goes through these steps:
1. Starts the program and verifies the job file.
2. Reads input records.
3. Searches for dupes.
4. Creates an output file of desirable records.
5. Generates reports.
After you run the job, look at the files in the samples subdirectory.
Job description
It’s impossible for one sample to apply perfectly to all users. However, we’ve
designed the sample job to be as simple, yet typical, as possible:
Small mailing list, all one state.
One input file of 1000 records, fixed-length ASCII text.
Family mailing (match on last name and address).
MCD output file (unique records and master dupes).
Basic set of reports.
Create jobs from our sample
You can use the Quick Start job as the basis for your own jobs. Copy the Quick
Start job file and edit your copy. Look for the parameter entries in UPPER CASE;
those are the ones you’ll probably want to change, especially path and file names.
Windows commands
Complete the following steps to run the Quick Start sample job. If you installed
MCD to a drive or directory other than c:\pw, change the path name.
Open Windows Explorer before and after, so that you can compare what is input
to MCD, and the files that MCD creates.
1. Choose Start > Run.
2. In the Run box, enter pwmpg c:\pw\mpg\samples\quikwin.mpg
(Processing messages appear, and then processing is complete.)
Note that as an alternative to steps 1 and 2, you can enter the following command
from a DOS prompt in the c:\pw\mpg directory:
pwmpg samples\quikwin.mpg
UNIX commands
To run the sample job, type the commands shown below. We’re assuming that
you’ve installed MCD in /usr/postware. If you’ve used another location, change
the path name accordingly.
Run ls before and after, so that you may compare what’s input to MCD, and the
files that MCD creates.
$ cd /usr/postware/merge/samples
$ ls
$ pwmpg quikunix.mpg
(Processing messages appear. Then processing is complete.)
$ ls
Create Match/Consolidate jobs
To help you prepare your own MCD jobs, we provide the following samples. You
can copy any of these files as a starting point. You will find them in the template
directory (see “Install Match/Consolidate” on page 8).
! Before editing any of the sample jobs discussed below, make a copy with a
different file name. When we ship software updates, we always ship new
copies of the sample jobs. Ensure that the new version does not overwrite a
file on which you’ve been working.
Quick Start
Now that you’ve run the Quick Start job and you know that it runs, you can adapt
it to your own needs.
Templates
Templates are job files that are nearly ready to use; they require just a few
minutes of editing from you. Each template is set up for a particular type
of job.
Master
The master job file is called master.mpg. It contains one of each type of block
that a MCD job file might contain. When you need to add new blocks to a job,
you can copy them from master.mpg.
Resources
The resource file match.mpg is not a complete job file. Instead, it contains
samples of the blocks that control matching. These samples will help you
understand and select a strategy for matching. You can copy sample blocks into
any other job file.
Similarly, the resource file group.mpg contains sample blocks for Group Posting.
See the User’s Guide to Record Matching and Chapter 2, “Job-file blocks and
parameters” on page 21 for descriptions and setup instructions.
Extended matching files
There are six extended matching files that you may refer to from the MCD job
file. These files are named auto.mpg, family.mpg, firm.mpg, firmindv.mpg, hhold.mpg, and indiv.mpg. See the Extended Matching Reference manual for
details about using these files.
Editing tips
To edit job files, you will need working knowledge of a good text editor or
word-processing program. If you use a word processor, be sure to save job files as
simple ASCII text.
Guidelines for editing job files
Consider the following points when editing job files:
Copy and edit the job files provided with MCD, or copy blocks between files
as necessary.
Use the file name extension .mpg.
Some blocks are required, but most are optional. You can place blocks in
any sequence.
Do not edit the BEGIN or END lines, block titles, or parameter names
(anything to the left of the equal sign). There is only one exception to this
rule: To make MCD ignore a block, insert an asterisk (*) before the word
BEGIN.
! Never delete parameters or rearrange them within a block. There are a few
blocks in which you can copy and repeat the last parameter as many times
as you need. Only when the manual says it is okay should you change the
number of parameter lines within a block.
! Never press the Enter key while typing a long parameter entry; let the entry
wrap onto an additional line. If you press the Enter key, MCD counts the
extra end-of-line marks as separate lines.
You can add comments at the beginning or end of the job file, and between
blocks, but not within a block. Notes or comments might make it easier for
others to understand and use the job file.
We recommend that you start all comment lines with an asterisk (*). We also
recommend that you do not use the keywords BEGIN or END in comments.
A comment can be anywhere in a job file after an END or before the next
BEGIN.
Many parameters require some sort of entry. In the sample job files that we
provide, many parameters have suggested entries already in place. There are
some optional parameters that may be left blank.
Where space allows, parameter names are followed by clues or options in
parentheses. Clues are shown in lower case; options are in upper case. Case
doesn’t matter in the entry you type, but be sure to spell options exactly as
shown, and do not abbreviate. Exception: At Y/N parameters, you may spell
out Yes or No.
As programs are updated to new versions, parameters and blocks may be
added or changed. Do not manually update the job files to the new version;
instead, use the Edjob utility. For instructions about how to do this, see the
Edjob User’s Guide.
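Several of the guidelines above lend themselves to a quick automated check before you run a job. The sketch below is a hypothetical helper, not part of MCD or Edjob, and it assumes a simplified job-file layout: a block opens with a line whose first word is BEGIN, closes with a line whose first word is END, and holds parameter lines of the form name = entry. The block titles in the sample text are illustrative only.

```python
from typing import List


def check_job_text(text: str) -> List[str]:
    """Return a list of guideline problems found in job-file text."""
    problems = []
    state = "outside"  # "outside", "block", or "ignored" (block disabled by *)
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        words = line.lstrip("*").split()
        first = words[0].upper() if words else ""
        is_marker = "=" not in line  # BEGIN and END lines carry no equal sign
        if state == "outside":
            if first == "BEGIN" and is_marker:
                # An asterisk before BEGIN makes MCD ignore the block.
                state = "ignored" if line.startswith("*") else "block"
            elif not line.startswith("*"):
                problems.append(f"line {number}: text outside a block should start with *")
        elif first == "END" and is_marker:
            state = "outside"
        elif state == "block":
            if line.startswith("*"):
                problems.append(f"line {number}: comment inside a block")
            elif "=" not in line:
                problems.append(f"line {number}: no equal sign (entry split by Enter?)")
    if state != "outside":
        problems.append("end of file: block has no END line")
    return problems


sample = """* Nightly de-dupe job
BEGIN General
Job Description = Nightly run
Job Owner = Pat
END
stray note
*BEGIN Old Block
Anything = here
END
BEGIN Execution
* not allowed here
Save Work Files (Y/N) = Y
END
"""
problems = check_job_text(sample)
for problem in problems:
    print(problem)
```

The check reports the uncommented note between blocks and the comment placed inside a block, and it skips the block that was disabled with an asterisk. It does not replace the MCD verifier, which knows the real block and parameter names.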
Match/Consolidate command line
Checklist
Before running MCD, make sure that you have finished the following prepwork:
Create an FMT file for each fixed-length ASCII input file.
Create a DMT file for each delimited ASCII input file.
Create a DEF file for each input file.
Complete the job file.
Verify that adequate disk and memory space are available.
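The file-related items in this checklist can be verified with a small script. The sketch below is a hypothetical helper. It assumes that each support file sits in the same directory as its input file and shares its base name, with the extensions .fmt, .dmt, and .def; check your own setup and Database Prep before relying on that convention.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_support_files(input_file: str, layout: str) -> List[str]:
    """List the expected support files that do not exist.

    layout is "fixed" (needs an FMT file) or "delimited" (needs a DMT
    file). Every input file also needs a DEF file.
    """
    if layout not in ("fixed", "delimited"):
        raise ValueError("layout must be 'fixed' or 'delimited'")
    base = Path(input_file)
    wanted = [".fmt" if layout == "fixed" else ".dmt", ".def"]
    return [str(base.with_suffix(ext)) for ext in wanted
            if not base.with_suffix(ext).exists()]


# Demonstration in a throwaway directory: the FMT file exists, the DEF does not.
with tempfile.TemporaryDirectory() as folder:
    data = Path(folder) / "list.dat"
    data.touch()
    data.with_suffix(".fmt").touch()
    missing = missing_support_files(str(data), "fixed")
    print([Path(name).name for name in missing])
```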
Basic command line
To run the job, type the MCD command, followed by the path and name of the
job file:
pwmpg [path] jobfile.mpg
Options
See the Quick Reference for a complete list of options for the MCD command
line.
Messages during verification and processing
Verifier messages
The MCD job-file verifier is a part of the pwmpg program. It checks the job and
input files for the following types of mistakes and/or omissions:
Missing PW field from DEF file.
Input file is defined as fixed-length ASCII, but FMT file cannot be found.
Missing execution block from job file.
Entry at Filter parameter is invalid because the syntax is wrong.
A verifier message may be either a warning or an error. When you receive a
warning, you may choose to continue or to stop processing. An error is more
serious, and processing must stop.
You may control the verifier by adding options to the command line. See the
a, nos, and v options in the Quick Reference.
Correct errors
For command-line processing, the verifier catches only one error per run; when it
stops, you will be back at the operating system prompt. (MCD Views, by contrast,
lists all warnings and errors at once.) Start the text editor, open the job file,
correct the error, and then start pwmpg again. Continue this cycle until the job makes
it all the way through verification without errors.
Process messages
Match/Consolidate reports progress by printing messages on the screen. The
processing messages will refer to tasks that MCD performs: Reading input
records, finding dupes, sorting dupes, creating reports and output files, and so on.
During each step, MCD reports progress as a percentage. When all the processing
you requested is finished, the last message is as follows:
Processing completed
You might want to capture messages in a file for later reading; to do so, add
something like this to the command line:
UNIX: > messagefile.log
Windows: /lmessagefile.log
This redirects the standard output to a file.
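If you start jobs from a script rather than by hand, the same capture can be done by redirecting the program's standard output yourself. The sketch below is one possible wrapper, not a supplied utility: `build_command` and `run_job` are hypothetical names, the only command-line form used is the basic one (pwmpg followed by the job file), and the meaning of the pwmpg exit code is not documented here, so the code simply returns it.

```python
import subprocess
from typing import List


def build_command(job_file: str, program: str = "pwmpg") -> List[str]:
    """Basic command line: the MCD command followed by the job file."""
    return [program, job_file]


def run_job(job_file: str, log_file: str, program: str = "pwmpg") -> int:
    """Run the job and write its screen messages to log_file.

    Standard error is folded into the same log. The exit code is returned
    as is; how pwmpg sets it is not covered by this sketch.
    """
    with open(log_file, "w") as log:
        completed = subprocess.run(build_command(job_file, program),
                                   stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode


print(build_command("samples/quikunix.mpg"))
```

A call such as run_job("samples/quikunix.mpg", "messagefile.log") requires pwmpg to be on the PATH. Afterward you can search the log for the final message, Processing completed.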
Template job files
Match/Consolidate template job files are nearly ready to use; they require a few
minutes of editing from you. You’ll find them on the system in the template
subdirectory. Below is a list and description of each template.
! Before editing any of the templates, make a copy with a different file name.
When we ship software updates, we always ship new copies of the
templates. Ensure that a new version does not overwrite a file on which
you’ve been working.
Most of the templates do not contain the crucial pair of blocks in which matching
logic is set (Match Criteria and Match Options); they are not needed for extended
matching. Note that the Match Options block and the Match Criteria block are
required for standard matching. For more information, see “Matching Method
(STD/EXT/ADV)” on page 24.
Select the matching logic from the options offered in the resource file
match.mpg. Then, copy the appropriate pair of blocks into the job file. This way,
you can adapt the template for the following mailing types:
Residential (one per address)
Family (one per last name)
Individual (one per person)
dedupe.mpg
We’ve received a client’s mailing list, to prepare it for a direct-mail campaign.
The client has asked us to de-dupe the list. We run the file through MCD and
delete dupes from it.
good_out.mpg
This job is just like the first, with one change: Instead of deleting dupes from the
input file, we’re going to copy good records to a separate output file. As above,
this template may be adapted for individual, household, residential, or business-to-business mailings.
suppress.mpg
Our company is a member of the Direct Mail Association. Four times per year we
receive from the DMA an updated copy of their Mail Preference File. This is a list
of almost three million people who have asked not to receive unsolicited mail
advertising. We want to compare this list to our mailing list to make sure we don’t
mail our catalogs and flyers to people who probably wouldn’t buy anyway.
fishing.mpg
A common variation on the suppression job is to use the house file as a
suppression list, to prevent mailing to current customers. For example, a
charitable foundation mails a special solicitation to rented lists. They want to
make sure that no one on their current-donor list receives this mailing. Another
example would be a cruise line with berths to fill, mailing a special discount offer
that must not be received by anyone already booked.
multibuy.mpg
Our company mails special offers and coupons to frequent buyers; that is, people
who show a pattern of repeat business. Today, we have to mail 10,000 copies of a
rather expensive catalog, so we want to select our 10,000 best prospects for this
mailing. To prepare this mailing, we’re going to merge several files: Our own
customer list, plus lists that we’ve rented.
The total number of input records is about a dozen times the number of catalogs
we have to mail, so we can afford to be choosy. We’re going to stipulate that, to
receive this mailing, a person’s name must appear several times in our input files.
And of those repeat buyers, we are going to take only the 10,000 highest incomes.
firm_pkg.mpg
We’re a magazine publisher, mailing technical and engineering titles at second
class. We’re anxious to reduce our postage by using Presort to form firm
packages whenever possible. However, Presort looks only for firm names that
match exactly. Any variation in spelling or punctuation prevents Presort from
finding a match. This is a problem for us, because our subscription-entry system
does nothing to standardize company names. But we can use MCD to add a field
that will work reliably in Presort.
Chapter 2:
Job-file blocks and parameters
This chapter describes each parameter in the job file, listing parameters by block.
We list the blocks in the order in which you are most likely to set up a job; see the
table of contents for an alphabetical listing of each block.
General
When you open a new job, the optional General Information block appears. The
purpose of this block is to stamp each job with the current version of
Match/Consolidate (MCD). It also provides a location to label the job with a description
and a job owner. The information included in this block appears in some reports.
Job description
Enter a name of up to 80 characters for the job. This information is printed in the Job
Summary Report. You can use the following shortcuts (macros) in the description.
$job is converted to the base name (without path or extension) of the job file.
$date and $time are taken from the computer’s clock at the time you start the
job. The date is nine characters long, in the format dd-mmm-yyyy. The time
is ten characters long, in the hh:mm:ss format, with am or pm.
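To see how these macros might expand, consider the sketch below. It imitates the substitution for illustration and is not MCD's own code: the function name is hypothetical, and the lowercase three-letter month, the zero-padded 12-hour clock, and the am/pm suffix written without a space are assumptions about details the description above leaves open.

```python
from datetime import datetime
from pathlib import PureWindowsPath

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def expand_macros(description: str, job_file: str, started: datetime) -> str:
    """Replace $job, $date, and $time in a job description."""
    # PureWindowsPath accepts both backslash and forward-slash separators.
    base = PureWindowsPath(job_file).stem
    date_text = f"{started.day:02d}-{MONTHS[started.month - 1]}-{started.year:04d}"
    hour12 = started.hour % 12 or 12
    suffix = "am" if started.hour < 12 else "pm"
    time_text = f"{hour12:02d}:{started.minute:02d}:{started.second:02d}{suffix}"
    result = description
    for macro, value in (("$job", base), ("$date", date_text), ("$time", time_text)):
        result = result.replace(macro, value)
    return result


text = expand_macros("$job run on $date at $time",
                     r"c:\pw\mpg\samples\quikwin.mpg",
                     datetime(2009, 3, 5, 14, 7, 9))
print(text)  # quikwin run on 05-mar-2009 at 02:07:09pm
```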
Job owner
Enter the name of the job owner, up to 20 characters. Match/Consolidate prints this
information in the Job Summary Report.
Execution
This block controls how MCD executes. Decisions made in this block determine
the matching method, type of output file(s) created, and processing options.
Read Records & Create Match Sets (Y/N)
Reading records and creating the match sets is the first step in MCD processing.
If you change the selection of match sets or their lengths, you must repeat this
step (set this parameter to Yes again).
Option    Description
Yes       Enter Yes to read all input records and create match sets. This is a mandatory first step for all the execution processes that follow.
          If you intend to create, update, or use a reference file, set this option to Yes. On the first run, set this parameter to Yes and the other processes to No, and then run pwmpg. MCD creates the match set (depending on reference file status), and generates any reports.
No        If you want to rerun the job using saved work files, set this parameter to No. If you leave it set at Yes, MCD creates the match set again.
Find Duplicates (Y/N/Predict)
Once you have created the key file (the Read Records & Create Match Sets
parameter process), or if you are working from an existing valid reference file
rather than raw input data, you can perform this process to compare records, find
duplicates, and form dupe groups.
Option    Description
Yes       To find duplicate records, set this parameter to Yes and then run pwmpg. MCD searches for dupes and creates any reports that are set up. You can check the reports before going on to the next execution process.
No        Once dupes have been found correctly, change this parameter to No to run from saved work files for subsequent output processes. If you leave it set at Yes, MCD continues searching for dupes.
Predict   You can enter Predict to save processing time and perhaps find more matches. When Find Duplicates is set to Predict, and you run the job, MCD forms break groups but does not perform the full matching process. Then you can check the Job Summary and adjust the break-group setup as needed.
Execute in Background
If you select this option, MCD programs operate in a background window. If not
selected, the program operates in the active window, with messages, and so on,
on top.
Matching Method (STD/EXT/ADV)
Use this parameter to choose a standard, extended, or advanced matching method.
Standard matching
This method lets you match on any fields within the database and set the
appropriate match level per component. Standard matching also has default
matching strategies for quick and easy setup.
Extended matching
This method is rule-based. MCD uses extended matching rules for settings that tell it
how to search for duplicates. You can enter the path and name of an external extended
matching file that you want to use at the EXT Match Blocks parameter in the
Auxiliary File block. For more information about extended matching, see the
Extended Matching Reference manual.
Advanced matching
You can implement this method with standard and/or extended matching. MCD
has the ability to perform multi-level matching and multi-criteria matching and
lets you use the Match Set blocks and parameters. Advanced matching
parameters start with the letters ADV and are active only when you set the
Matching Method to ADV.
Use advanced matching if you are using multiple match strategies within one pass
of a job or when combining more than one match set. If you want to perform
advanced matching, you still need to choose a matching strategy (STD or EXT).
To help you decide whether to perform standard matching or extended matching,
answer the following questions in order. Keep in mind that there are two
different kinds of extended matching: automatic and rule-based.
1. Do I have existing jobs set up that I don’t want to change right now?
   Yes: Set the matching method to Std to use standard matching, with the Match Criteria and Match Options blocks (as you’ve done in the past).
   No: Go to question 2.
2. Do I want the quickest possible setup time, with slower processing time and less control over how Match/Consolidate performs matching?
   Yes: Set the matching method to Ext, and use automatic extended matching. See the Extended Matching Reference for details.
   No: Go to question 3.
3. Do I want to:
   • control the order in which fields are compared and the relative importance of each field?
   • compare fields such as social security number and gender, in addition to the usual name and address fields?
   • control how Match/Consolidate handles comparisons with blank fields and abbreviations?
   Yes: Set the matching method to Ext, and use rule-based extended matching. See the Extended Matching Reference.
These parameters control what kind of output file you might create.
Option    Description
Yes       If you want to create an output file, enter Yes at one of the parameters that controls the output you want.
No        If you don’t want to create an output file, enter No at all four parameters.
Post to Input File (Y/N)
Group Post to Purged Files (Y/N)
Purge (Y/N/PREDICT)
Custom Purge (Y/N/PREDICT)
These parameters control the purging of data from the input file(s) and the posting
of data to input file(s).
Option    Description
Yes       If you want to post data to, or purge dupes from, the input file(s), enter Yes at the appropriate parameter(s). You may choose either the Purge parameter or the Custom Purge parameter, but not both.
No        If you don’t want to post or purge the input files, enter No at all four parameters.
Predict   If you want to create a Purge by List prediction report without performing an actual Purge or Custom Purge, set the Purge or Custom Purge parameter to Predict.
! At the end of any batch run in which you post or purge the input file(s),
MCD deletes its work files. This means that you cannot come back later
and create any reports or output files.
If you want any reports or output files, set them up now before the purge.
Work files are not lost when you set the Purge parameter to Predict.
! You cannot set both the Purge and Custom Purge parameters to Yes. If you
do, MCD purges dupes from the input files; however, it ignores the Custom
Purge request.
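The two cautions above are easy to overlook when a job file is edited by hand, so a pre-run check can help. The sketch below is a hypothetical helper, not an MCD feature. It assumes that you have already collected the four parameter entries into a dictionary keyed by parameter name; how you read them from the job file is left out.

```python
from typing import Dict, List

INPUT_PARAMETERS = ("Post to Input File", "Group Post to Purged Files",
                    "Purge", "Custom Purge")


def check_purge_settings(settings: Dict[str, str]) -> List[str]:
    """Return reminders for risky combinations of post and purge entries.

    Entries may be Y or N (or spelled out as Yes or No) or Predict, in any
    letter case. A missing parameter is treated as No.
    """
    def is_yes(name: str) -> bool:
        return settings.get(name, "N").strip().upper() in ("Y", "YES")

    notes = []
    if is_yes("Purge") and is_yes("Custom Purge"):
        notes.append("Purge and Custom Purge are both Yes; "
                     "MCD ignores the Custom Purge request.")
    if any(is_yes(name) for name in INPUT_PARAMETERS):
        notes.append("Input files will be posted to or purged; work files are "
                     "deleted at the end of the run, so set up reports and "
                     "output files now.")
    return notes


both = check_purge_settings({"Purge": "Y", "Custom Purge": "Yes"})
predict_only = check_purge_settings({"Purge": "Predict"})
print(len(both), len(predict_only))  # 2 0
```

Setting Purge to Predict produces no reminder because, as noted above, work files are not lost in that case.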
Create Reports (Y/N)
Create Report Statistics Files (Y/N)
Save Work Files (Y/N)
Unless you don’t want to save work files to be used in subsequent runs, set the Create
Reports and Save Work Files parameters to Yes.

Option     Description
Yes        Set the Create Report Statistics Files parameter to Yes if you intend to use
           the statistics generated from the job to create the reports. You can then
           create the statistics reports by bringing the statistics files into a database,
           spreadsheet, or word processing program. Be sure to also set up the
           Statistics Files block.
No         Set this option to No if you don’t want to save work files to be used in
           subsequent runs.

Warn Before Overwrite (Y/N)
When MCD attempts to create a file, it might find that a file with the same name
already exists. This parameter applies to all types of files that MCD might create,
except work files. Note that MCD never warns before overwriting a work file.

Option     Description
Yes        MCD halts, sends a warning message, and waits for a response. You can
           stop the job, change the file name, and resume. Consider how setting this
           parameter to Yes will affect the processing of unattended batch jobs.
No         MCD deletes the existing file and overwrites it with a new one without
           warning.

Work File Directory (path)
Sort Work File Directory (path)
You can enter a path for the work file directory and the sort work directory at
these parameters. This lets you store the two different kinds of files in separate
directories, which you will find useful if there is not enough disk space to store
both work files and sort files in the same directory.
The sort work directory holds temporary sort work files. Match/Consolidate uses
work files for temporary data storage during MCD processes.
You may leave these parameters blank. If they are left blank, MCD writes the
work files and sort work files to the current directory.
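
The blank-parameter behavior of the two directory settings can be pictured with a short Python sketch. It only mirrors the rule described above (blank means the current directory); the function and argument names are hypothetical and do not come from MCD.

```python
import os


def resolve_work_dirs(work_dir="", sort_work_dir=""):
    """Return (work file directory, sort work file directory).

    A blank setting falls back to the current directory, as described
    for the Work File Directory and Sort Work File Directory parameters.
    """
    cwd = os.getcwd()
    return (work_dir.strip() or cwd, sort_work_dir.strip() or cwd)
```

Pointing the two settings at different drives or file systems is one way to apply the disk-space advice above.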
Create Backup File (Y/N)
Backup Directory (path)
Match/Consolidate can make a backup copy of the input file(s) before purging or
posting to them.
Option     Description
Yes        If you would like to create a backup copy of the input file(s), enter Yes.
           MCD creates the backup just before posting or purging the input file(s).
           The backup will have the same base name as the input file and the
           extension .bak.
           If you want the backup placed in a particular directory, enter that path and
           directory name at the Backup Directory parameter. Otherwise, MCD creates
           the backup copy in the same directory as the input file.
No         If you don’t have enough free disk space, or if you don’t feel that a backup
           is necessary, enter No at the Create Backup File parameter and leave the
           Backup Directory parameter blank.
If you elect to create a backup file, it will have the same base name as the input
file, with the extension BAK, and will be placed in the same directory as the input
file. This directory will not automatically appear in the box. If you want the
backup placed in a particular directory, enter the path and directory name in that
box.
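
The backup naming rule can be illustrated with a small Python sketch. It shows only where a backup would be expected to land under the rule above; the function name is hypothetical, and the lowercase .bak spelling is an assumption, since the extension is written both as .bak and as BAK in this section.

```python
from pathlib import Path


def backup_path(input_file, backup_dir=""):
    """Return the expected location of the backup copy of input_file.

    Same base name as the input file, extension .bak, placed in
    backup_dir if one is given, otherwise next to the input file.
    """
    src = Path(input_file)
    target_dir = Path(backup_dir) if backup_dir.strip() else src.parent
    return target_dir / (src.stem + ".bak")
```

With a blank Backup Directory, an input file named `customers.dat` would therefore be backed up as `customers.bak` in the same directory.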
Maximum Work Buffer Size (kilobytes)
Match/Consolidate runs faster when it has a large memory space. However, if you
share memory space with other users, you can set this parameter to restrict how
much memory MCD uses for work buffers. Remember that the size you set
applies only to work buffers; this is only a portion of the total memory space
occupied by the program.
This parameter controls the maximum amount of memory that MCD uses for
buffers. For Windows users, the suggested setting for this parameter is 4096
kilobytes (four megabytes). If less memory is available, MCD uses less.
For UNIX users, use 1,024 kilobytes (one megabyte) as a starting point. With
both operating systems, if the maximum size is set too large, disk swapping may
occur, which decreases speed.
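
As a rough illustration, the starting values suggested above could be chosen per platform as follows. This Python sketch is not an MCD feature; the function name is hypothetical, and treating every non-Windows system as UNIX is an assumption.

```python
import platform


def suggested_buffer_kb(system=None):
    """Return a starting value, in kilobytes, for Maximum Work Buffer Size.

    4096 KB on Windows and 1024 KB elsewhere (assumed to be UNIX).
    These are starting points only; lower them if disk swapping occurs.
    """
    system = system or platform.system()
    return 4096 if system == "Windows" else 1024
```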
Sort Optimization (SPACE/SPEED)
Match/Consolidate does a lot of sorting: keys, break groups, dupe groups, and so
on. Like any sorting program, MCD runs faster when it has adequate disk space.
For information about estimating the disk space required for work files, see
Appendix A of the User’s Guide to Record Matching.
Option     Description
Space      MCD uses less disk space but runs slower.
Speed      MCD uses more disk space but runs faster.
Auxiliary Files
Auxiliary Files tell MCD where to find the necessary dictionaries, rule files, and
directories for parsing and standardizing input data for matching purposes.
Match/Consolidate performs standardization on the record’s key for matching
purposes only. Data is never changed in MCD.
Match/Consolidate has default options that allow the user to set the auxiliary
files. Once they are set in the defaults, they are available for all subsequent jobs.
To set defaults, go to Options > Match Consolidate Defaults.
Standard processing
When you enable the standard option for Name, Title, and Firm Parsing or
Address and Last-line Parsing in the Match Options block, MCD uses the
following dictionaries and directories for parsing.

Dictionaries
Listed below are the six parsing dictionaries used for standard processing.
Match/Consolidate uses the dictionaries to parse and identify specific data
components.

Dictionary                              Function
Address line (Addrln.dct)               Parses address line components and standardizes the
                                        spelling of suffix and directional information.
Last line (Lastln.dct)                  Parses city, state, and ZIP components.
Standard Prename (Prename.dct)          Parses prename words such as Mr., Mrs., Dr.
Standard Name (Name.dct)                Parses name words.
Standard Pre-Lastname (Prelname.dct)    Parses a prefix, such as Von or Van, appearing as the
                                        first word in a compound last name.
Standard Postname (Postname.dct)        Parses postname words such as Sr., Jr., or PhD.

Match Level Dictionary
The matchpct.dct file is a table that sets, at three levels (loose, medium, and
tight), the percentage two components must be alike to be considered matches.
Use this parameter to customize the dictionary and then enter the re-named file.

Dictionary                              Function
Standard Match Percent (Matchpct.dct)   Defines the values and sets match score adjustments.
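
To make the idea of match levels concrete, here is a small Python sketch of a three-level percentage threshold. It is an illustration only: the threshold values are invented for this example and are not the contents of matchpct.dct, and the similarity score comes from Python's `difflib.SequenceMatcher`, which stands in for MCD's own comparison logic.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds (percent alike). The real values are defined
# in the match percent dictionary and are not reproduced here.
MATCH_LEVELS = {"loose": 70.0, "medium": 80.0, "tight": 90.0}


def percent_alike(a, b):
    """Similarity of two components as a percentage (difflib stand-in)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(a, b, level="medium"):
    """True if the two components are alike enough at the given level."""
    return percent_alike(a, b) >= MATCH_LEVELS[level]
```

Under these assumed thresholds, a pair of components can match at the loose or medium level and still fail at the tight level, which is the behavior the three levels are meant to provide.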
Directories
The following table lists the two directories that MCD uses to verify and
standardize city, state, and ZIP information.

Directory                Function
City (City09.dir)        Lists cities and the ZIP codes associated with them.
ZIP-City (ZCF09.dir)     Lists ZIP codes and the cities and states associated with them.
Defaults
If input files are in the same format, it is not necessary to create separate format or
definition files for each input file. Create one set of format and definition files as
a default and enter its path and file name.

File Name                  Function
ASCII Format File (.fmt)   Provides the layout for the input file when no individual
                           format file exists.
Definition File (.def)     Defines the file when no individual definition file exists.
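
The fallback to a default format file can be sketched in Python as follows. This is an illustration under stated assumptions: it assumes that an individual format file shares the input file's base name and directory, and the function name and `exists` argument are hypothetical.

```python
from pathlib import Path


def pick_format_file(input_file, default_fmt, exists=Path.exists):
    """Return the format file to use for input_file.

    Prefers an individual .fmt file next to the input file (assumed
    naming rule); falls back to the default when none exists. The
    exists argument is only there so the rule can be tried without
    touching the disk.
    """
    own = Path(input_file).with_suffix(".fmt")
    return own if exists(own) else Path(default_fmt)
```

The same pattern would apply to definition files by swapping the .fmt suffix for .def.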
Extended processing
When you enable the extended option for Name, Title, and Firm Parsing and
Address and Last-line Parsing in the Match Options block, MCD uses the
following dictionaries and directories for parsing.

Dictionaries
Listed below are the three dictionaries MCD uses to parse and standardize name
and firm data for extended processing. These are the same dictionaries that
DataRight uses for parsing.

Dictionary                     Function
Capitalization (Pwcap.dct)     Standardizes special cases for capitalization such as
                               VanBuren and McDonald’s.
Parsing (Parsing.dct)          Parses prename, name designators, name specials, last
                               name prefixes, first and last names, postnames, and titles.
Firm (Firmln.dct)              Parses and standardizes firm data.

Rule Files
The following table lists the rule files that MCD uses to take into account data
sequences and patterns for the identification of firm data and the components of
multiline information.

Rules File                       Function
Firm Rules (Fprules.gcf)         Contains information on sequences of firm data and
                                 typical patterns that firms use.
Multiline Rules (Mlrules.gcf)    Contains information on data sequences and typical
                                 data patterns.
Directories
The following table lists the directories MCD uses to standardize address and
lastline data. These are the same directories that ACE uses to standardize data.

Directory                   Function
ZIP4us.dir (ZIP+4 (1))      A national directory of addresses that is used to verify
                            and standardize address and lastline data.
(ZIP+4 (2))                 A user-defined custom directory that holds a smaller
                            portion of the national ZIP+4 directory.
Reverse ZIP+4 (RevZIP+4)    A national directory of addresses with unique ZIP codes.
Matching file
The matching file identifies the path to the extended matching file that will be
used for this job. Match/Consolidate, in its extended matching process, uses this
file instead of the Matching Criteria and Matching Options windows. If you are
performing standard matching, as opposed to extended matching, leave this
setting blank. If you have set a Match Blocks default (at the Auxiliary Files tab),
MCD enters that default here for each new job you create.
To set or change the entry, type the information into this field. Or, click the folder
icon at the end of the field to browse to the file location. Once you have found and
selected a file name, MCD automatically enters its path and name into the field.
To use extended matching, you must also select extended matching at the
Matching Method setting of the Execution window.
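
The two settings involved in extended matching have to agree, which can be checked with a short Python sketch. The function name and the "standard" and "extended" string labels are assumptions made for this example; they are not MCD identifiers.

```python
def check_matching_setup(matching_method, matching_file):
    """Return a list of problems with the matching-related settings.

    matching_method: "standard" or "extended" (hypothetical labels).
    matching_file: path to the extended matching file, or blank.
    """
    method = matching_method.strip().lower()
    if method not in ("standard", "extended"):
        raise ValueError(f"unknown matching method: {matching_method!r}")

    has_file = bool(matching_file.strip())
    problems = []
    if method == "extended" and not has_file:
        problems.append("Extended matching is selected, but no matching file is set.")
    if method == "standard" and has_file:
        problems.append("Standard matching is selected; leave the matching file blank.")
    return problems
```

An empty list means the Matching Method setting and the matching file entry are consistent with each other.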