BusinessObjects, Crystal Reports®, SAP BusinessObjects Rapid Mart™, SAP
BusinessObjects Data Insight™, SAP BusinessObjects Desktop Intelligence™, SAP
BusinessObjects Rapid Marts®, SAP BusinessObjects Watchlist Security™, SAP
BusinessObjects Web Intelligence®, and Xcelsius® are trademarks or registered trademarks of
Business Objects, an SAP company and/or affiliated companies in the United States and/or
other countries. SAP® is a registered trademark of SAP AG in Germany and/or other countries.
All other names mentioned herein may be trademarks of their respective owners.
This guide explains how Match/Consolidate (MCD) programs perform record
matching. Beginning with an entry-level orientation on the basics of record
matching, this guide progresses through the common record matching
functions and an explanation of the features that comprise the current
technology of record matching.
Our examples and illustrations are based on actual MCD jobs set up and run
through the MCD Views program on a Windows NT platform. If you are not
using Views, look for similarly named parameters in the corresponding block
of your job file. We assume that you are familiar with your operating system
and have a general understanding of database management.
This document follows these conventions:
Convention        Description
Bold              We use bold type for file names, paths, emphasis, and text that you
                  should type exactly as shown. For example, “Type cd\dirs.”
Italics           We use italics for emphasis and text for which you should substitute
                  your own data or values. For example, “Type a name for your file,
                  and the .txt extension (testfile.txt).”
Menu commands     We indicate commands that you choose from menus in the following
                  format: Menu Name > Command Name. For example, “Choose File > New.”
!                 We use this symbol to alert you to important information and
                  potential problems.
(note symbol)     We use this symbol to point out special cases that you should know
                  about.
(tip symbol)      We use this symbol to draw your attention to tips that may be useful
                  to you.
Preface
Documentation
Documents related to this manual include the following:
Document                          Description
System Administrator’s Guide      Explains how to install your software.
Database Prep                     Explains how to prepare input files for processing,
                                  including how to create DEF, FMT, and DMT files.
Match/Consolidate Extended        Contains the operational how-to instructions for
Matching Reference                setting up extended matching.
Match Library Programmer’s        This is a reference manual for the Match Library.
Reference
Match/Consolidate Library         This is a reference for programmers working with
Reference                         the Match/Consolidate Library.
Quick Reference                   Contains descriptions of the input and output fields,
                                  and the command line for the MCD job file.
Access the latest documentation
You can access documentation in several places:
On your computer. Release notes, manuals, and other documents for
each product that you’ve installed are available in the Documentation
folder. Choose Start > Programs > Business Objects Applications >
Documentation.
On the SAP Service Marketplace. Go to http://help.sap.com, and then
click the Business Objects tab. Here, you can search for your products’
documentation.
Match/Consolidate User’s Guide
Chapter 1:
Fundamentals of record matching
This chapter explains some of the fundamentals of record matching. It describes
how to use Match/Consolidate (MCD) to match your records.
Terms
This guide references the following terms.
Term              Definition
Consolidation     Consolidation (or group posting) means copying or accumulating
Group posting     data from one matched record to another. Often, it means merging
Salvaging data    matched records to form a single best record. Some users migrate
                  information from one record to another, but do not specifically seek
                  to merge the records.
                  This is a follow-up process, which occurs after records are
                  identified as members of match groups.
Dupe group        The terms dupe group and match group are used interchangeably in
Match group       this guide. This refers to two or more records that were found to
                  match each other.
Match key         Name, address, or other data that is broken down into components,
                  standardized, and ready for comparison. For example:
                  Raw Data
                    Name_Line1 = George F Hayes
                    Address = 100 Main St #5
                    Last_line = Edna, MN 55424
                  Match Key
                    First_name for 8 characters = GEORGE
                    Mid_name for 3 characters = F
                    Last_name for 10 characters = HAYES
                    Prim_range for 10 characters = 100
                    Prim_name for 15 characters = MAIN
                    Suffix for 6 characters = ST
                    Sec_range for 6 characters = 5
                    ZIP for 5 characters = 55424
                  Actual Key = GEORGE F HAYES 100 MAIN ST 5 55424
Match field       Data that is part of the match key and is compared during the
                  matching process. The First_name data is one of the match fields in
                  the example above. Middle name (Mid_name) is another, as is
                  Last_name, and so on.
Break group       Sorting keys into groups of records that are likely to match. Break
                  groups speed the duplicate detection process by eliminating
                  comparisons of records that have no likelihood of matching. Only
                  records within the same break group are compared to one another.
Benefits of Match/Consolidate
The benefits of using MCD begin with record matching. That means comparing
name, address, and other customer data to find matching records. In other words,
deciding whether, within your rules, Record A and Record B represent the same
person, household, or company. We can help you get started with typical
matching rules; eventually you will probably want to adjust them or make new
rules.
Once you’ve identified pairs or groups of records that match, what do you want to
do? Eliminate redundant records? Migrate customer data from one file to
another? Consider the possibilities listed in the following tables.
Term                 Definition
Extended parsing     Apply parsing and standardization capabilities of ACE and
                     TrueName, to prepare the cleanest, most complete data for
                     match keys.
Extended matching    Highly tunable rule-based matching that lets you prioritize
                     match fields. You can prioritize your match fields and make
                     decisions for a match or non-match on a per-field basis.
Consolidation        We offer two approaches to consolidation. With each, you create
                     your own rules for comparing and consolidating records. You
                     can consolidate matched records into a best record, or migrate
                     data among your files.
Reference files      When you repeatedly match against the same static database,
                     there’s no need to regenerate match keys each time. Some people
                     call this feature durable or re-usable match keys.
Advanced matching    Advanced matching lets you find up to three levels of matches
                     in one pass and find associated matches between separate data
                     sets. For example, you can find families and individuals as well
                     as separate residents all in one pass and give a unique number
                     for each level on output. Association is finding persons who live
                     at different residences at different times of the year by using a
                     common data field.
Constant key         Constant key lets you create an ID that is unique to a record or
                     group of duplicate records. It is sequential, static, and it will not
                     change when records are updated or re-processed through
                     MCD.
                     When you append new records to the database, MCD tags any that
                     belong to a group with an existing ID with that same ID.
Feature                Options
Input purge or         Most users choose to send desirable records to an output
create output file     database. Or, if disk space is a concern, you can drop undesirable
                       records from the input database(s).
Multi-buyer            Let’s say you’re bringing together customer lists from several
                       other direct marketers or publishers. Your best prospects may
                       be the people whose names appear on two or more lists,
                       indicating they may be most receptive to your offer.
Custom sorting and     You can perform Nth-select and/or limit your output to a
selection              certain number of records. Within your maximum-records limit,
                       you can select your best prospects using a variety of custom
                       sorting strategies.
Business-to-business   MCD isn’t just for consumer marketing. For example, with the
                       proper setup and multiple passes, you can perform N-per-firm
                       selection; in other words, you can limit output so that only a
                       certain number of individuals in each company will receive
                       your offer. That helps you spend your advertising dollars most
                       effectively.
Group posting          When you’re working with several lists, take advantage of the
                       best of each list. Use the MCD group posting feature to salvage
                       the best data (data that’s missing from your records) from
                       those duplicate records that won’t be included in your final
                       output.
Suppression lists      You can work with suppression lists, such as your own
                       bad-account file or no-mail lists provided by the government
                       or direct-marketing association (DMA), to prevent wasted
                       mailings and offending consumers.
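To make the group posting idea in the table above more concrete, the following is a minimal Python sketch, not MCD's implementation. It assumes records are plain dictionaries and that "salvaging" means filling blank fields of the record you keep from the duplicates you drop. The function name post_group, the field names, and the sample phone and email values are hypothetical.

```python
def post_group(master, subordinates, fields):
    """Return a copy of master with blank fields filled from subordinate records.

    Assumption: the first non-blank value found among the subordinates wins.
    MCD lets you define your own posting rules; this is only one possible rule.
    """
    best = dict(master)
    for field in fields:
        if best.get(field):
            continue  # the master already has data for this field; keep it
        for sub in subordinates:
            value = sub.get(field)
            if value:
                best[field] = value
                break
    return best


# Hypothetical dupe group: one record to keep and two that will be dropped.
master = {"name": "GEORGE F HAYES", "phone": "", "email": ""}
subs = [
    {"name": "G HAYES", "phone": "555-0100", "email": ""},
    {"name": "GEORGE HAYES", "phone": "555-0199", "email": "ghayes@example.com"},
]
best = post_group(master, subs, ["phone", "email"])
```

The original master record is left unchanged, so you can compare the record before and after posting.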
Data, rules, and results
The keys to successful MCD use involve Data, Rules, and Results.
Data
Clean, complete name and address data will make a big difference in your
success. If you have data from several sources or from outside your organization,
then there may be issues about format and consistency. We can help. Use ACE,
TrueName, DataRight, or DataRight IQ tools to break data down into
components, correct errors and inconsistencies, and fill in missing data.
Rules
Rules refers to your matching rules: your criteria for when two records should
be called a match, and when they should not. You’ll need to think carefully about
which fields will be evaluated, how they will be compared, and any special or
exceptional circumstances that might override your normal criteria.
For Views users, within your match criteria, we provide five default sets of rules
to help you get started with individual, family, household, business, or
business-individual matching. We recommend that you start your learning and
testing with one of our rule sets, then adjust as necessary. That may mean a
cyclical process in which you run the search for matches, check your reports,
make rule changes, and run the search again.
Results
Consider the results, or outputs, that you want at the end of the process. Do you
want to create an output database? If so, plan your criteria for the records to be
included in that file. If you want to consolidate records, write lists of fields to
consolidate and how to evaluate or combine each source. Finally, think about
what reports you will need for yourself and your clients.
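The matching rules described under Rules can be pictured as a set of per-field decisions. The Python sketch below is an illustration only: it uses the standard library's difflib similarity ratio as a stand-in for MCD's own comparison logic, and the rule set, thresholds, and field names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical rule set: field name -> minimum similarity (0.0 to 1.0).
# A threshold of 1.0 means the field must agree exactly.
RULES = {"last_name": 0.85, "prim_range": 1.0, "prim_name": 0.85, "zip": 1.0}


def similarity(a, b):
    """Case-insensitive similarity ratio from difflib (not MCD's algorithm)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def is_match(rec_a, rec_b, rules=RULES):
    """Call two records a match only if every field passes its rule."""
    for field, threshold in rules.items():
        a, b = rec_a.get(field, ""), rec_b.get(field, "")
        if not a or not b:
            return False  # assumption: a blank field fails the rule
        if similarity(a, b) < threshold:
            return False
    return True


rec_1 = {"last_name": "HAYES", "prim_range": "100", "prim_name": "MAIN", "zip": "55424"}
rec_2 = {"last_name": "HAYS", "prim_range": "100", "prim_name": "MAIN", "zip": "55424"}
rec_3 = {"last_name": "SMITH", "prim_range": "100", "prim_name": "MAIN", "zip": "55424"}

hayes_vs_hays = is_match(rec_1, rec_2)
hayes_vs_smith = is_match(rec_1, rec_3)
```

Tuning in this sketch means editing RULES and re-running the comparison, which mirrors the cycle of running, checking reports, and adjusting rules.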
Chapter 2:
Record matching overview
This chapter summarizes the Match/Consolidate (MCD) process, and explains
how preparation, setup, and step-by-step execution of your job are vital to getting
the results you want from MCD.
Summary of the record matching process
To help you understand the MCD process, consider
the following five-step process:
1. Prepare your files for Match/Consolidate
2. Set up your Match/Consolidate job
3. Read records and create Match Sets
4. Find matching records
5. Process the match results
You perform the first two steps and Match/
Consolidate performs steps 3, 4, and 5.
Here we concentrate on the basics, so we ignore
many of the features that you can include to tailor
your MCD job to your job requirements. The other
chapters of this guide further explain these features.
One step at a time
This chapter describes the steps one at a time. As you better learn MCD and set
up your MCD jobs, you can do all the processing steps at once.
Match/Consolidate is a batch process. That means you set up an MCD job (define
what records to use and what to do with them), and then start that job. Match/
Consolidate runs the job according to the job settings, in one batch.
Checking your results
During the MCD batch process, your interaction is limited to reading progress
messages (if you so choose). However, once the process is complete, you can
check your results by reviewing MCD reports and/or output files.
Match/Consolidate can produce 16 different pre-formatted reports, containing
statistics about the process and actual record data for your analysis. In addition,
MCD can produce many statistics files in which you can find almost any data
pertinent to your MCD job.
Normally, you will create reports for every job (select the Create Reports option
at the Execution Options window). Carefully look at the appropriate reports. If
you don’t see the results you want, change your settings and re-process the job.
Do this at each step until you get the results you want.
Disk space for generated files
As it runs, MCD generates work files. If you run out of disk space for those files,
the program will stop. Note that, depending on your operating system, you may
get a variety of errors. For details on estimating disk space requirements, refer to
“Calculate the size of your work files” on page 263.
By guiding you through a job, this chapter provides
an overview of the three main Match/Consolidate
processes. For specific details about the processes,
refer to the remaining chapters of this guide.
Match/Consolidate reports are a valuable
source of information about your job.
Study them carefully to see if you should
adjust your job settings and rerun the process.
Note that your MCD job can be run
in one execution; it need not be run in separate
phases as shown in this illustration.
[Figure: Match/Consolidate process overview. Preparation (prepare your input
files for Match/Consolidate) is followed by three execution processes: Read
Records and Create Match Sets; Find Matching Records (dupes, uniques); and
Process the Match Results (post records, purge records, post data). Output
labels shown: Match/Consolidate, Multi-Occurrence, All Duplicates, and Custom
Match/Consolidate. Reports shown include the Input File Summary, Input List
Summary, Sorted Records report, Unparsed Records report, Duplicate Records
report, List Duplicates report, List Match reports, Output File report, Posted
Dupe Groups report, Purge by List report, and statistics files.]
About your first job
If you are new to MCD, we recommend that you make your first MCD job
simple, to familiarize yourself with the overall MCD job processes. As an
introductory job, and as a quick check to be sure your program is properly
installed, we supply a collection of files that are automatically copied to your
program's samples directory when you install MCD.
We provide the quik_mpg.dat file to serve as your database for the sample job.
We also provide the quik_mpg.def and quik_mpg.fmt support files for that
database. The database file contains 1000 records, each record having name and
address data. If you look through the file, you can find some blank fields, and you
may note that some of the records have addresses (or names, or both) similar to
those of other records.
Depending on your operating system, we provide a job file named quikunix.mpg
or quikwin.mpg. This job is preset to read the records of the quik_mpg.dat
database, process them to find duplicate records, and produce an MCD Output File.
If you have not used the standard directory structure prompted by the
installation program, then before you run the introductory job, you may have
to make some small changes to the Auxiliary Files settings, so your program
will be able to find the directories it needs for processing. Refer to your
System Administrator's Guide for additional details.
Subdirectories
In addition, many users prefer to keep their
jobs' output and reports in separate
subdirectories, with a directory structure
similar to the one shown in the figure.
[Figure: sample directory tree showing the directories PW, MPG, Samples,
Template, Work, Output, and Reports.]
If you want to separate your output and reports
like this, you'll have to do two things:
First, create the additional directories.
Then, in your job setup (quikunix.mpg
or quikwin.mpg), modify the file paths
that are set in your reports and output file
blocks to correspond to those directories.
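As a minimal sketch of the first of those two steps, the Python snippet below creates Output and Reports subdirectories under a base directory. The helper name make_job_dirs is hypothetical, and the demonstration runs in a temporary directory because the real location of your MCD installation is not known here. Editing the paths in the job file remains a manual step.

```python
from pathlib import Path
import tempfile


def make_job_dirs(base, names=("Output", "Reports")):
    """Create one subdirectory per name under base and return their paths."""
    created = []
    for name in names:
        path = Path(base) / name
        # parents=True also creates base if needed; exist_ok avoids an
        # error when the directory is already there.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Demonstration in a throwaway directory; substitute your own program
# directory (an assumption: something like .../PW/MPG) for real use.
with tempfile.TemporaryDirectory() as tmp:
    dirs = make_job_dirs(Path(tmp) / "MPG")
    created_names = sorted(p.name for p in dirs)
    all_exist = all(p.is_dir() for p in dirs)
```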
Prepare your files for Match/Consolidate
You need to have two types of files ready before
you run MCD:
Input files: the records you want in the job.
You can input up to 255 files for your MCD
job, and they can be of varying types, including
ASCII, dBASE3, EBCDIC, and delimited.
Supporting files: these files include the
DEF file, which interprets your input data for
MCD, and format files such as FMT, DMT, or EBC.
For details about input files and support files, refer to Database Prep.
Input files
The best way to prepare your input files for MCD is to standardize your input
data by using name and address correction software, like our DataRight,
TrueName, ACE, and IACE software. Standardized data increases the speed and
accuracy of the match process.
If your data is not standardized, MCD Job can perform extended parsing for name
and address data. Extended parsing produces results equivalent to those
derived from using DataRight, TrueName, ACE, and IACE
(U.S. engine) software. However, extended parsing is an extra-cost option, and it
may increase overall processing time. Note that data is standardized in the key
data for the purpose of matching only.
If you are running our sample job (quikunix.mpg or quikwin.mpg), then your
input file, quik_mpg.dat, is in your program's samples directory.
Re-run the same job
If you just changed settings and now want to re-run the same job, you may be able
to speed up the process by using reference files. For details, see “Re-use
processed input (key data) with reference files” on page 30.
Set up your Match/Consolidate job
Once you have prepared your data for MCD, you
need to set up your job so that the MCD program
will know:
Which input file records to include
How to parse data from the input record
What key data to store for each record
What makes a match or no-match
What result (output) to produce from this job
Which reports you want to create
Three options
If you are using MCD Views, you have the three options explained below for
setting up an MCD job. If you do not have Views, then you must use the third
alternative shown here.
1. Use the Views Wizard. The MCD Views job setup Wizard prompts you
through a setup for your job. The Wizard does not control all the features
available in MCD; however, it does get the job started with the input, output,
and processing options common for most MCD users.
Once you've initially set up your job through the Wizard, you can use Views
to add any additional sophistication needed to produce exactly the results you
want from MCD.
2. Design your job in Views. You can define and design your entire job setup
through Views. Use the Views windows to select the options and define the
setup parameters that produce the results you want.
Views currently includes standard, extended, and advanced matching
processes through Match Criteria and Match Options windows.
3. Copy and edit a job file. If you are already familiar with MCD or other
record matching programs, and like working in a text-only environment, you
may want to set up your job by directly editing the job file.
For the introductory sample job (quikunix.mpg or quikwin.mpg) your setup will
be minimal, to accommodate any differences from the standard MCD installation
(refer to “About your first job”).
Read records and create Match Sets
As you learn to use MCD, you may want to
consider performing the first process, that of
reading the input records and creating match sets,
as a separate step. Once you have developed a
better understanding of MCD, you may be more
comfortable combining all of the processes of your
job in one execution.
To run our sample job (quikunix.mpg or
quikwin.mpg) issue the MCD start command from
your command prompt, or, if using Views, open the sample job and run the job
from within Views. (Refer to our Quick Reference for command line options and
format requirements.)
The key file is a working file that MCD uses to hold the data that’s used in
placing, matching, and ranking (prioritizing) your records. You won’t read or use
this file; only the MCD process will.
The match process compares data from one record to corresponding data from
another record. However, comparing all the record data would take far too much
time for most purposes. Additionally, comparing some parts of the data might
actually be counterproductive.
Therefore, instead of using all the record data, your matching process uses key
data—data that you, the MCD user, identify as the significant parts of the record
to use for finding matches. That data is stored in the MCD key file.
Each key represents a record
The key file contains a string of data for each record to be processed. You identify
each field and the number of characters to use in the key. For example, you may
want to store 12 characters of the last name data, 30 characters of firm data, 10
characters of primary range data, and so on.
Raw Data
  Name_Line1 = George F Hayes
  Address = 100 Main St #5
  Last_line = Edna, MN 55424
Match Key
  First_name for 8 characters = GEORGE
  Mid_name for 3 characters = F
  Last_name for 10 characters = HAYES
  Prim_range for 10 characters = 100
  Prim_name for 15 characters = MAIN
  Suffix for 6 characters = ST
  Sec_range for 6 characters = 5
  ZIP for 5 characters = 55424
Actual Key = GEORGE F HAYES 100 MAIN ST 5 55424
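The key layout in the example above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the format of the real MCD key file: it assumes the name and address have already been parsed into components, that each component is upper-cased and truncated to its key length, and that short values are padded with spaces. KEY_LAYOUT and build_key are hypothetical names.

```python
# Field names and lengths taken from the Match Key example above.
KEY_LAYOUT = [
    ("First_name", 8), ("Mid_name", 3), ("Last_name", 10),
    ("Prim_range", 10), ("Prim_name", 15), ("Suffix", 6),
    ("Sec_range", 6), ("ZIP", 5),
]


def build_key(parsed, layout=KEY_LAYOUT):
    """Build a fixed-width key string from already-parsed components."""
    parts = []
    for field, length in layout:
        value = parsed.get(field, "").upper()[:length]  # standardize case, truncate
        parts.append(value.ljust(length))               # pad to the field width
    return "".join(parts)


parsed = {
    "First_name": "George", "Mid_name": "F", "Last_name": "Hayes",
    "Prim_range": "100", "Prim_name": "Main", "Suffix": "St",
    "Sec_range": "5", "ZIP": "55424",
}
key = build_key(parsed)
readable_key = " ".join(key.split())  # collapse padding for display
```

Because every field starts at a fixed position in this sketch, two keys can be compared field by field without re-parsing the original record.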
Match sets represent a match strategy
When you set up your matching criteria, which determine whether two records
will match, you define your matching strategy. Match/Consolidate collects the
records that it compares using this match strategy into a match set.
Match/Consolidate can evaluate more than one match set; however, this is an
advanced feature. For more information, refer to “Advanced matching” on
page 219. If you have defined only one match strategy in your Match/Consolidate
job, then MCD automatically creates the match set. Once the key data is
assembled, MCD can move to the next process: finding matching records.
Find matching records
Summary of the matching process
Once MCD has read your input records and created
the key file, it performs the next main processing
step: finding matching records. For detailed
information about matching, refer to “Engineer
your match setup” on page 195.
Normally, the match process step involves these three phases:
1. Match/Consolidate places records into small groups to avoid comparing
records that have no reasonable likelihood of matching. This process is often
referred to as forming break groups or sorting keys.
2. Next, MCD compares each key of a specific group to every other key in that
group. When two or more keys match, MCD identifies their records as
members of a dupe group (a duplicate record group). Note that the number of
records in a dupe group can vary widely, depending on the quality of your
data and your matching setup.
3. Then MCD sorts the keys of each group, to prioritize them and to categorize
each record as a unique record, or a master or subordinate “dupe.”
For the records of a break group, MCD assigns each to a group of records that it
has determined to match each other and then ranks each record within that dupe
group.
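The three phases of the match process can be sketched as follows. This is a simplified Python illustration, not MCD's algorithm: the break key, the match test, and the ranking rule are supplied by the caller and are hypothetical here, and the sketch assumes that matches chain together (if A matches B and B matches C, all three land in one dupe group).

```python
from collections import defaultdict
from itertools import combinations


def _find(parent, i):
    """Follow parent links to the representative of i's cluster."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def find_dupe_groups(records, break_key, is_match, rank):
    """Return dupe groups, each sorted so the best-ranked record comes first."""
    # Phase 1: place records into break groups.
    groups = defaultdict(list)
    for rec in records:
        groups[break_key(rec)].append(rec)

    dupe_groups = []
    for members in groups.values():
        # Phase 2: compare every record only with others in its break group.
        parent = list(range(len(members)))
        for i, j in combinations(range(len(members)), 2):
            if is_match(members[i], members[j]):
                parent[_find(parent, i)] = _find(parent, j)
        clusters = defaultdict(list)
        for i, rec in enumerate(members):
            clusters[_find(parent, i)].append(rec)
        # Phase 3: rank the records of each dupe group; singletons are uniques.
        for cluster in clusters.values():
            if len(cluster) > 1:
                dupe_groups.append(sorted(cluster, key=rank))
    return dupe_groups


records = [
    {"id": 1, "last": "HAYES", "zip": "55424", "name": "GEORGE F HAYES"},
    {"id": 2, "last": "HAYES", "zip": "55424", "name": "G HAYES"},
    {"id": 3, "last": "SMITH", "zip": "55424", "name": "M SMITH"},
    {"id": 4, "last": "HAYES", "zip": "01501", "name": "GEORGE HAYES"},
]
dupe_groups = find_dupe_groups(
    records,
    break_key=lambda r: r["zip"],                  # hypothetical break key
    is_match=lambda a, b: a["last"] == b["last"],  # hypothetical match rule
    rank=lambda r: -len(r["name"]),                # longest name ranks first
)
group_ids = [[r["id"] for r in group] for group in dupe_groups]
```

Record 4 shares a last name with records 1 and 2 but sits in a different break group, so it is never compared with them. That is the trade-off break groups make: fewer comparisons, at the risk of missing matches that straddle groups.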
Process the match results
After MCD has determined which records match,
you need to have MCD do something productive
with its conclusions. Normally, that something will
be one of the following:
Purge the input file.
Create a new output file.
Update existing records.
Whichever outcome you want, MCD checks each record of the job, one after
another. Match/Consolidate acts on the record based on the results of the
matching process and your choice of processing options.
Choose an output
For detailed information about available options, refer to “Purge input files or
create output files” on page 61. In most cases, the purpose of your job will make
obvious what output you need from MCD.
Most commonly, the results of an MCD job are used to produce
one of the following two output files. The introductory sample job that we
provide with MCD (quikunix.mpg or quikwin.mpg) is set up to produce the
MCD output file.
The MCD output file contains all the unique records as well as all master
records (master dupes). This type of output file could be used as a mailing
list.
The All Duplicates output file contains all the records that matched any
others. It will include all the records that were members of all the dupe
groups, but none of the unique records. This file might be used in further
database maintenance activities, or quality control functions. This type of
output file might have other uses, as well (refer to “Output file” on page 64).
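The difference between the two output files can be shown with a short Python sketch. It assumes each record carries an id and that each dupe group is a list of ids with the master record first; both are illustrative conventions, not MCD data structures, and split_outputs is a hypothetical name.

```python
def split_outputs(records, dupe_groups):
    """Return (mcd_output, all_duplicates) for the given dupe groups.

    dupe_groups: lists of record ids, with the master id first in each list.
    """
    in_group = {rid for group in dupe_groups for rid in group}
    masters = {group[0] for group in dupe_groups}
    # MCD output file: unique records plus the master of each dupe group.
    mcd_output = [r for r in records
                  if r["id"] not in in_group or r["id"] in masters]
    # All Duplicates output file: every member of every dupe group.
    all_duplicates = [r for r in records if r["id"] in in_group]
    return mcd_output, all_duplicates


records = [{"id": n} for n in range(1, 6)]
dupe_groups = [[1, 2], [4, 5]]  # record 3 matched nothing, so it is unique
mcd_output, all_duplicates = split_outputs(records, dupe_groups)
mcd_ids = [r["id"] for r in mcd_output]
dupe_ids = [r["id"] for r in all_duplicates]
```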
Match/Consolidate features
When you're done with your first, introductory job, you’ll probably be ready to
learn more about some of the features that you can incorporate in your MCD jobs,
such as lists and data posting. Here's where you'll find these subjects in this guide:
Task                                     MCD feature             Page number
Categorizing input records by source     Lists                   27
or field value
Logically including or excluding         Filters                 28
records, based on field data             Functions
Consolidating or copying data among      Group posting           147
matching (duplicate) records
Tracking what happens to (or with)       Super lists             41
records from various sources
Selecting the highest quality records    Multi-Level matching    224
for output                               Match sets              228
                                         Combined match set      236
                                         Nth select              224
                                         Custom sorting          72
You’ll probably also want to learn about these features.
Task                                     MCD feature             Page number
Finding more matching records            Key data                172
Speeding up the match process            Break groups            188
Controlling the match process            Standard matching       165
                                         Extended matching       171
                                         Advanced matching       170
Identifying the best of the matching     Ranking or prioritizing 48
records                                  records
Chapter 3:
Define your input files and lists
This chapter describes how to define your input. In this chapter, we explain how
to define and limit files to be used as input, how to re-use already processed files
as reference files, and how to characterize records through the use of input lists.
Match/Consolidate (MCD) uses the words file and list interchangeably. Even if
you do not set up lists, MCD considers each input file a list.
Input files and lists
Terms
The following table describes the various input files and lists.
Term               Description
Input file         Your records. The database you want MCD to process.
Reference file     A re-usable file that results from MCD reading input records.
List               A grouping of records based on a common data characteristic.
Normal list        A list of records that MCD should consider to be eligible records.
Suppression list   A list of records MCD uses to prevent matching records of other
                   lists from being sent to the output.
Special list       A list of records that should be treated as transparent, like seed
                   lists. They are not counted in determining how to characterize a
                   match group (for example, multi-list or single-list).
Super list         A group of lists. For example, a super list may be composed of
                   three lists rented from one broker.
Set up your lists
The following list summarizes how to set up lists using the MCD job file
and Views.
Input files and Reference files: Set up an input file block for each file you
want included in this job.
Lists: In your DEF file, define PW.List_ID. To manually set up lists, set up
one input list description block for each list. To automatically generate lists,
use the Input List Default block.
Select records based on a value in a field: In the DEF file, define
PW.List_ID as the field containing your list identification data. For example,
if you have a database field named List_Code that contains a useful value,
use PW.List_ID = List_Code.
Select records based on any criteria: In the Input List Description section of
your job, specify your selection criteria at the List Filter parameter.
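The two ways of selecting records for a list described above (by the value of a field, or by arbitrary criteria) can be pictured with the Python sketch below. It does not read DEF files or job files. It only mimics the effect of PW.List_ID = List_Code and of a List Filter on in-memory records. The function names and the List_Code values are hypothetical, and the names and states echo the sample input data.

```python
from collections import defaultdict


def assign_lists(records, list_id_field="List_Code"):
    """Group records into lists keyed by the value of one field."""
    lists = defaultdict(list)
    for record in records:
        lists[record.get(list_id_field, "")].append(record)
    return dict(lists)


def filter_list(records, list_filter):
    """Keep only the records for which the list filter returns True."""
    return [record for record in records if list_filter(record)]


records = [
    {"name": "MARY PETERS", "state": "MA", "List_Code": "A"},
    {"name": "ROBERT FINE", "state": "NJ", "List_Code": "B"},
    {"name": "TIM GLAZE", "state": "MA", "List_Code": "A"},
]
lists = assign_lists(records)                                   # by field value
ma_only = filter_list(records, lambda r: r["state"] == "MA")    # by criteria
```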
Input files
Before MCD can decide whether or not two records match, it must read those
records from your database file(s) and convert them into key data. Identify all
files that you want included in your MCD job.
Determine which
input file records to
include
Match/Consolidate processes records from your input files one at a time. First it
decides whether the record should be included in the job—perhaps the record has
been marked for deletion, for example. Or perhaps you want to limit the number
or type of records to use from an input file. You can set file by file limits on which
records should be used with these methods:
A starting record number.
A maximum number of records from the input file.
Filters that apply to records of this input file. Filters are formal, logical
statements that MCD can act on as it reads your input record. For example,
you might want to exclude or filter out any record that is not from a particular
state. Refer to filter information in the Quick Reference.
An input processing exit function.
[Figure: three input files with different record limits. First input file: no limits; use all records. Second input file: start at #100, maximum 3000. Third input file: use records that pass the filter; don't use the rest. Result: all input records.]
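As a rough illustration of the file-by-file limits described above, the following Python sketch applies a starting record number, a maximum record count, and a filter to one input file. The function and parameter names are hypothetical. The sketch also assumes that the maximum is counted from the starting record before the filter is applied; the actual order in MCD may differ. The input processing exit function is not modeled.

```python
from itertools import islice

def select_input_records(records, start=1, max_records=None, record_filter=None):
    """Yield the records of one input file that pass its limits.

    start         1-based number of the first record to read
    max_records   how many records to read from that point (None = no limit)
    record_filter function returning True for records to keep (None = keep all)
    """
    stop = None if max_records is None else (start - 1) + max_records
    for record in islice(records, start - 1, stop):
        if record_filter is None or record_filter(record):
            yield record

# Made-up input file: ten records numbered 1 to 10, alternating states.
input_file = [{"n": i, "state": "MA" if i % 2 == 0 else "NY"} for i in range(1, 11)]

everything = list(select_input_records(input_file))
window = list(select_input_records(input_file, start=3, max_records=4))
ma_in_window = list(
    select_input_records(input_file, start=3, max_records=4,
                         record_filter=lambda r: r["state"] == "MA")
)
```

With no limits every record is used; with `start=3` and `max_records=4` only records 3 through 6 are read, and the filter then keeps the Massachusetts records among them.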
Parse data from the input file
As it reads each input record, MCD identifies specific parts of the record,
such as first name, last name, address, and city. This is called
parsing. Later chapters explain the various parsing options.
The parsing process is only for internal program use, to improve the detection of
matching records. Match/Consolidate stores parsing results in working files that
MCD will use in creating the key file. Parsing does not actually change the data in
your input file, nor does it affect the data that will be in your output file (if you
choose to create one). For more information about matching records, refer to
“Find matching records” on page 23.
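The point that parsing produces separate working data and leaves the input untouched can be shown with a toy Python sketch. The parsing rule here (first word as first name, last word as last name) is a deliberate simplification invented for the example; it is not how MCD parses names.

```python
def parse_record(record):
    """Return parsed components as new working data.

    The input record itself is never modified.
    """
    parts = record["name"].split()
    return {
        "first_name": parts[0] if parts else "",
        "last_name": parts[-1] if len(parts) > 1 else "",
        "city": record["city"].strip().upper(),
    }

input_record = {"name": "Mary Peters", "city": "Chicopee "}
snapshot = dict(input_record)          # copy taken to show the input is unchanged
working = parse_record(input_record)   # parsed data lives in a separate structure
```

After the call, `working` holds the parsed components and `input_record` is still equal to `snapshot`.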
Re-use processed input (key data) with reference files
A reference file is a specialized work file that contains all the key data for an
input file. Create the reference file during your first MCD process. For
subsequent passes, MCD uses that reference file as the input data instead of using
its associated input file.
Reference files are controlled by settings (parameters) of the Input File block of
your MCD job setup. Refer to the Job-File Reference manual for details about
how to create them or use them.
When you can use a reference file rather than an input file, you save the time that
would have been spent repeatedly reading input data and creating key files.
Reference files can therefore be a valuable substitute for large, frequently used input
files, such as mailer suppression lists.
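The time saving works much like a cache. The Python sketch below is a loose analogy under stated assumptions: the key format, the JSON storage, and the rule "use the reference file if it exists" are all invented for illustration. In MCD, creating and using reference files is controlled by Input File block parameters, as noted above.

```python
import json
import os
import tempfile

def make_key(record):
    # Hypothetical key data: uppercased name without spaces, plus ZIP code.
    return "".join(record["name"].split()).upper() + "|" + record["zip"]

def load_key_data(input_records, reference_path):
    """Return (keys, source); build the reference file on the first pass."""
    if os.path.exists(reference_path):
        with open(reference_path) as f:
            return json.load(f), "reference file"
    keys = [make_key(r) for r in input_records]
    with open(reference_path, "w") as f:
        json.dump(keys, f)
    return keys, "input file"

records = [{"name": "Mary Peters", "zip": "01013"}]
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "suppress.ref")
    first_keys, first_source = load_key_data(records, path)
    # Second pass: the input records are not read at all.
    second_keys, second_source = load_key_data([], path)
```

The second call returns the same key data without touching the input records, which is the saving a reference file provides on later passes.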
For example, many mailers use the DMA’s MPS file, which lists about 3 million
people who don't want to receive direct mail. Including this file as input
suppresses these people from appearing on any mailing list produced by the MCD
job.
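The effect of a suppression file can be sketched in a few lines of Python. This is a simplified model, not MCD's matching logic: it treats two records as matching only when a hypothetical key (name plus ZIP code) is identical, whereas MCD's matching is far more flexible.

```python
def simple_key(record):
    # Hypothetical key: name without spaces, uppercased, plus ZIP code.
    return "".join(record["name"].split()).upper() + "|" + record["zip"]

def suppress(mailing_records, suppression_records, key):
    """Drop mailing records whose key also appears in the suppression list."""
    blocked = {key(r) for r in suppression_records}
    return [r for r in mailing_records if key(r) not in blocked]

# Made-up sample data.
mailing = [
    {"name": "Mary Peters", "zip": "01013"},
    {"name": "Robert Fine", "zip": "08648"},
]
do_not_mail = [{"name": "ROBERT FINE", "zip": "08648"}]

output = suppress(mailing, do_not_mail, simple_key)
```

Only the record that has no counterpart in `do_not_mail` reaches `output`.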
When using reference files, you can change your matching and breaking setup in
subsequent MCD passes or jobs. However, you must stay within the bounds of
the key data that was captured when the reference file was created. The reference
file can’t accommodate changes in the key data, or changes in list or input filter
restrictions that apply to that file.
Lists and priorities
Reference files inherit from their input file the settings that are used in their
corresponding input lists. (Lists are explained later in this chapter; priorities are
explained in the following chapter.) Therefore, a reference file would have to be
regenerated if your job includes the following:
List_ID — Changing to different List_ID field values. A reference file
inherits the List_ID of its input file, whether the List_ID is defined in the
DEF file as a constant or as a field. If the input file has no List_ID, then
neither will the reference file.
Priority field — Changing the priority field to a different field.
When you produce a reference file, generate the Job Summary report so that you
have a record of all the relevant job settings. In that job, also include any
options that you may want to use in later jobs that rely on this reference file.
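One way to use such a record of settings is to compare it with a later job before reusing the reference file. The Python sketch below assumes you keep the recorded settings as a simple dictionary. The setting names (`list_id`, `priority_field`, `match_threshold`) are hypothetical labels for this example, not MCD parameter names.

```python
def settings_requiring_regeneration(saved, job,
                                    locked=("list_id", "priority_field")):
    """Return the locked settings whose values differ between the job that
    created the reference file and the job that wants to reuse it."""
    return [name for name in locked if saved.get(name) != job.get(name)]

# Settings recorded when the reference file was created.
saved = {"list_id": "List_Code", "priority_field": "Priority", "match_threshold": 80}

# A later job that only changes matching setup can reuse the reference file.
job_ok = {"list_id": "List_Code", "priority_field": "Priority", "match_threshold": 90}

# A later job that switches the priority field cannot.
job_bad = {"list_id": "List_Code", "priority_field": "Rank", "match_threshold": 80}

ok_result = settings_requiring_regeneration(saved, job_ok)
bad_result = settings_requiring_regeneration(saved, job_bad)
```

An empty result means the reference file can be reused as is; any name in the result signals that the reference file would have to be regenerated.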
Purge an input file
When your MCD job includes input posting or group posting during an input file
purge, MCD will post to both the input file and its associated reference file. For
details about input file purging with reference files, refer to “Purge the input file”
on page 65.