SAP Match/Consolidate 8.00c User’s Guide to Record Matching

Match/Consolidate

User’s Guide to Record Matching

Match/Consolidate 8.00c
April 2009
Copyright information © 2009 SAP® BusinessObjects™. All rights reserved. SAP BusinessObjects and its logos,
BusinessObjects, Crystal Reports®, SAP BusinessObjects Rapid Mart™, SAP BusinessObjects Data Insight™, SAP BusinessObjects Desktop Intelligence™, SAP BusinessObjects Rapid Marts®, SAP BusinessObjects Watchlist Security™, SAP BusinessObjects Web Intelligence®, and Xcelsius® are trademarks or registered trademarks of Business Objects, an SAP company and/or affiliated companies in the United States and/or other countries. SAP® is a registered trademark of SAP AG in Germany and/or other countries. All other names mentioned herein may be trademarks of their respective owners.
2
Match/Consolidate User’s Guide

Contents

Preface .............................................................................................................7
Chapter 1:
Fundamentals of record matching............................................................... 9
Terms..............................................................................................................10
Benefits of Match/Consolidate.......................................................................11
Data, rules, and results ...................................................................................13
Chapter 2:
Record matching overview ......................................................................... 15
Summary of the record matching process ......................................................16
About your first job........................................................................................18
Prepare your files for Match/Consolidate ......................................................19
Set up your Match/Consolidate job................................................................20
Read records and create Match Sets...............................................................21
Find matching records....................................................................................23
Process the match results................................................................................24
Match/Consolidate features............................................................................25
Chapter 3:
Define your input files and lists.................................................................. 27
Input files and lists .........................................................................................28
Input files........................................................................................................29
Re-use processed input (key data) with reference files..................................30
Group your records with lists .........................................................................31
Use lists to control the matching process.......................................................32
List types ........................................................................................................33
Dupe search within this list ............................................................................35
List Break Priority..........................................................................................36
Three approaches to defining lists..................................................................37
When a record doesn’t fit into any list ...........................................................40
Create groups of lists (super lists)..................................................................41
Reports on your lists.......................................................................................42
Chapter 4:
Prioritize and suppress records ................................................................. 47
Record priorities and types.............................................................................48
Record priority and suppression.....................................................................50
Prioritize or suppress records based on list membership ...............................52
Penalize records that contain blank fields ......................................................54
Prioritize records based on the contents of one field .....................................56
Reports about record ranking and priorities...................................................58
Chapter 5:
Purge input files or create output files ...................................................... 61
Match/Consolidate results ..............................................................................62
Purge bad records or post good records .........................................................64
Contents
3
Purge the input file......................................................................................... 65
Create an output file or post data to the input file ......................................... 67
Data that you can post.................................................................................... 68
Choose the best records for your output file.................................................. 69
Custom sort your output records....................................................................72
Create a multi-buyer file................................................................................74
Create a multi-occurrence file........................................................................ 76
Select a sample of records .............................................................................77
Reports about your purging or output process............................................... 78
Chapter 6:
Reports and statistics files.......................................................................... 81
Introduction to reports and report files .......................................................... 82
Statistics files ................................................................................................. 84
How statistics files relate to Match/Consolidate reports ............................... 86
How Match/Consolidate counts intra-list and inter-list matches................... 88
Use super lists for report data ........................................................................91
Print reports....................................................................................................92
Duplicate Records Report (.dup) ................................................................... 93
Executive Summary Report (.exs) ................................................................. 95
Input File Summary Report (.ifs)................................................................... 96
Input List Summary Report (.ils)................................................................... 97
Job Summary Report (.mjs) ........................................................................... 98
List-by-List Match Report (.llm) ................................................................. 104
List Duplicates Reports (.ldr)....................................................................... 106
List Match Reports (.lm).............................................................................. 109
List Quality Report (.lqr) .............................................................................113
Match Results Report (.mrr) ........................................................................115
Multi-List Report (.mlr)............................................................................... 117
Output File Reports (.ofr) ............................................................................ 119
Posted Dupe Groups Report (.pdg).............................................................. 124
Purge by List Reports (.prl) ......................................................................... 125
Sorted Records Report (.sor) .......................................................................127
Unparsed Records Report (.unp) ................................................................. 133
Job statistics file...........................................................................................135
Input statistics file........................................................................................ 137
List match statistics file ............................................................................... 138
List statistics file .......................................................................................... 139
Output statistics file .....................................................................................140
Purge statistics file.......................................................................................141
Super list match statistics file ...................................................................... 142
Multi-buyer statistics file .............................................................................143
List subordinates statistics file..................................................................... 144
Chapter 7:
Use group posting to consolidate data..................................................... 145
The basics of group posting.........................................................................146
Introduction to group posting ......................................................................147
Post data sources and destinations ............................................................... 148
Group posting depends on your fields .........................................................149
Group posting more than once per destination record................................. 150
Example: post a new phone number............................................................ 151
Example: additive information .................................................................... 154
4
Match/Consolidate User’s Guide
Examples of group posting strategies...........................................................155
When group posting is all you want to do....................................................159
Group post with an input purge....................................................................160
Reports on group posting .............................................................................162
Chapter 8:
Record matching ....................................................................................... 163
Introduction ..................................................................................................164
Choose between standard and extended matching.......................................165
Factors that affect comparison time .............................................................167
Matching strategies ......................................................................................168
Implement a matching strategy ....................................................................169
Rule matching ..............................................................................................170
Automatic matching .....................................................................................171
Advanced matching......................................................................................173
Use reports to examine the matching process ..............................................174
Chapter 9:
Engineer key data...................................................................................... 175
Key files .......................................................................................................176
Include record keys only as needed..............................................................178
Define key fields ..........................................................................................180
Standardize key data for lastline information ..............................................181
Standardize key data for peoples’ names .....................................................183
Standardize key data for firm (company) names .........................................185
Chapter 10:
Engineer break groups.............................................................................. 187
Form break groups .......................................................................................188
Break strategies ............................................................................................190
Prioritize your break group records..............................................................192
Break-group analysis....................................................................................194
Chapter 11:
Engineer your match setup....................................................................... 195
Compare record keys: the driver record.......................................................196
What makes records match ..........................................................................198
Simscore .......................................................................................................199
How close is close enough ...........................................................................202
How record order affects comparisons.........................................................204
Control record comparisons .........................................................................206
Match with unparsed addresses, last lines, names, and firms ......................208
Matching options..........................................................................................210
How blank fields affect matching ................................................................216
Fine-tune your matching process .................................................................218
Chapter 12:
Advanced matching................................................................................... 219
Terms............................................................................................................220
Match Sets....................................................................................................222
Multi level matching ....................................................................................224
Combine match set.......................................................................................236
Contents
5
Chapter 13:
Constant Key ID........................................................................................ 253
Use Constant Key ID ...................................................................................254
Appendix A:
Match/Consolidate and Match programs................................................ 259
Product-line overview..................................................................................260
Appendix B:
Calculate the size of your work files......................................................... 263
Appendix C:
Analyze your matching strategies ............................................................ 267
Appendix D:
Match/Consolidate Wizard .......................................................................269
Index............................................................................................................ 277
6
Match/Consolidate User’s Guide

Preface

Purpose and contents of this manual

Conventions

This guide explains how Match/Consolidate (MCD) programs perform record matching. Beginning with an entry-level orientation on the basics of record matching, this guide progresses through the common record matching functions and an explanation of the features that comprise the current technology of record matching.
Our examples and illustrations are based on actual MCD jobs set up and run through the MCD Views program on a Windows NT platform. If you are not using Views, look for similarly named parameters in the corresponding block of your job file. We assume that you are familiar with your operating system and have a general understanding of database management.
This document follows these conventions:
Convention Description
Bold We use bold type for file names, paths, emphasis, and text that you
should type exactly as shown. For example, “Type
Italics We use italics for emphasis and text for which you should substi-
tute your own data or values. For example, “Type a name for your
Menu commands
file, and the
We indicate commands that you choose from menus in the follow­ing format: Menu Name > Command Name. For example, “Choose File > New.”
.txt
extension (
testfile
.txt
).”
cd\dirs
.”
!
We use this symbol to alert you to important information and potential problems.
We use this symbol to point out special cases that you should know about.
We use this symbol to draw your attention to tips that may be useful to you.
Preface
7
Documentation
Documents related to this manual include the following:
Document Description
Access the latest documentation
System Administrator’s Guide
Database Prep
Explains how to install your software.
Explains how to prepare input files for processing, including how to create DEF, FMT, and DMT files.
Match/Consolidate Extended Matching
Contains the operational how-to instructions for set­ting up extended matching.
Reference
Match Library Program-
This is a reference manual for the Match Library.
mer’s Reference
Match/Consolidate Library Reference
Quick Reference
This is a reference for programmers working with the Match/Consolidate Library.
Contains descriptions of the input and output fields, and the command line for the MCD job file.
You can access documentation in several places:
On your computer. Release notes, manuals, and other documents for
each product that you’ve installed are available in the Documentation folder. Choose Start > Programs > Business Objects Applications > Documentation.
On the SAP Service Market Place. Go to http://help.sap.com, and then
click the Business Objects tab. Here, you can search for your products’ documentation.
8
Match/Consolidate User’s Guide
Chapter 1: Fundamentals of record matching
This chapter explains some of the fundamentals of record matching. It describes how to use Match/Consolidate (MCD) to match your records.
Chapter 1: Fundamentals of record matching
9

Terms

This guide references the following terms.
Term Definition
Consolidation Group posting Salvaging data
Dupe group Match group
Match key Name, address, or other data that is broken down into components,
Raw Data Match Key
Name_Line1 = George F Hayes First_name for 8 characters = GEORGE
Address = 100 Main St #5 Mid_name for 3 characters = F
Last_line = Edna, MN 55424 Last_name for 10 characters = HAYES
Consolidation (or group posting) means copying or accumulating data from one matched record to another. Often, it means merging matched records to form a single best record. Some users migrate information from one record to another, but do not specifically seek to merge the records.
This is a follow-up process, which occurs after records are identi­fied as members of match groups.
The terms dupe group and match group are used interchangeably in this guide. This refers to two or more records that were found to match each other.
standardized, and ready for comparison. For example:
Prim_range for 10 characters = 100
Prim_name for 15 characters = MAIN
Suffix for 6 characters = ST
Sec_range for 6 characters = 5
ZIP for 5 characters = 55424
Actual Key = GEORGE F HAYES 100 MAIN ST 5 55424
Match field Data that is part of the match key and is compared during the match-
ing process. The First_name data is one of the match fields in the example above. Middle name (Mid_name) is another, and Last_name, etc.
Break group Sorting keys into groups of records that are likely to match. Break
groups speed the duplicate detection process by eliminating com­parisons of records that have no likelihood of matching. Only records within the same break group are compared to one another.
10
Match/Consolidate User’s Guide

Benefits of Match/Consolidate

The benefits of using MCD begins with record matching. That means comparing name, address, and other customer data to find matching records, In other words, deciding whether, within your rules, Record A and Record B represent the same person, household, or company. We can help you get started with typical matching rules; eventually you will probably want to adjust them or make new rules.
Once you’ve identified pairs or groups of records that match, what do you want to do? Eliminate redundant records? Migrate customer data from one file to another? Consider the following possibilities listed in the following table.
Term Definition
Extended parsing Apply parsing and standardization capabilities of ACE and
Extended matching Highly tunable rule-based matching that lets you prioritize
Consolidation We offer two approaches to consolidation. With each, you create
TrueName, to prepare the cleanest, most complete data for match keys.
match fields. You can prioritize your match fields and make decisions for a match or non match on a per-field basis.
your own rules for comparing and consolidating records. You can consolidate matched records into a best record, or migrate data among your files.
Reference files When you repeatedly match against the same static database,
there’s no need to regenerate match keys each time. Some peo­ple call this feature durable or re-usable match keys.
Advanced matching Advanced matching lets you find up to three levels of matches
in one pass and find associated matches between separate data sets. For example, you can find families and individuals as well as separate residents all in one pass and give a unique number for each level on output. Association is finding persons who live at different residents at different times of the year by using a common data field.
Constant key Constant key lets you create an ID that is unique to a record or
group of duplicate records. It is sequential, static, and it will not change when records are updated or re-processed through MCD.
When you append new records to the database, change when records are updated or re-processed through MCD tags any that belong to a group with an existing ID with that same ID.
Feature Options
Input purge or create output file
Most users choose to send desirable records to an output data­base. Or, if disk space is a concern, you can drop undesirable records from the input database(s).
Multi-buyer Let’s say you’re bringing together customer lists from several
other direct marketers or publishers. Your best prospects may be the people whose names appear on two or more lists, indi­cating they may be most receptive to your offer.
Chapter 1: Fundamentals of record matching
11
Feature Options
Custom sorting and selection
You can perform Nth-select and/or limit your output to a cer­tain number of records. Within your maximum-records limit, you can select your best prospects using a variety of custom sorting strategies.
Business-to-business MCD isn’t just for consumer marketing. For example, with the
proper setup and multiple passes, you can perform
N
-per-firm selection—in other words, you can limit output so that only a certain number of individuals in each company will receive your offer. That helps you spend your advertising dollars most effectively.
Group posting When you’re working with several lists, take advantage of the
best of each list. Use the MCD group posting feature to salvage the best data—data that’s missing from your records—from those duplicate records that won’t be included in your final output.
Suppression lists You can work with suppression lists—for example, your own
bad-account file, or no-mail lists provided by the government or direct-marketing association (DMA)—to prevent wasted mailings and offending consumers.
12
Match/Consolidate User’s Guide

Data, rules, and results

The keys to successful MCD use involves Data, Rules, and Results.

Data Clean, complete name and address data will make a big difference in your

success. If you have data from several sources or from outside your organization, then there may be issues about format and consistency. We can help. Use ACE, TrueName, DataRight, or DataRight IQ tools to break data down into components, correct errors and inconsistencies, and fill in missing data.

Rules Rules refers to your matching rules—your criteria for when two records should

be called a match, and when they should not. You’ll need to think carefully about which fields will be evaluated, how they will be compared, and any special or exceptional circumstances that might override your normal criteria.
For Views users within your match criteria, we provide five default sets of rules to help you get started with individual, family, household, business, or business­individual matching. We recommend that you start your learning and testing with one of our rule sets, then adjust as necessary. That may mean a cyclical process in which you run the search for matches, check your reports, make rule changes, and run the search again.

Results Consider the results, or outputs, that you want at the end of the process. Do you

want to create an output database? If so, plan your criteria for the records to be included in that file. If you want to consolidate records, write lists of fields to consolidate and how to evaluate or combine each source. Finally, think about what reports you will need for yourself and your clients.
Chapter 1: Fundamentals of record matching
13
14
Match/Consolidate User’s Guide
Chapter 2: Record matching overview
This chapter summarizes the Match/Consolidate (MCD) process, and explains how preparation, setup, and step-by-step execution of your job is vital to getting the results you want from MCD.
Chapter 2: Record matching overview
15

Summary of the record matching process

To help you understand the MCD process, consider the five-step process shown at right. You perform the first two steps and Match/ Consolidate performs steps 3, 4, and 5.
Here we concentrate on the basics, so we ignore many of the features that you can include to tailor your MCD job to your job requirements. The other chapters of this guide further explain these features.

One step at a time This chapter describes the steps one at a time. As you better learn MCD and set

1. Prepare your files for Match/Consolidate
2. Set up your Match/ Consolidate job
3. Read records and create Match Sets
4. Find matching records
5. Process the match results
up your MCD jobs, you can do all the processing steps at once.
Match/Consolidate is a batch process. That means you set up a MCD job (define what records to use and what to do with them), and then start that job. Match/ Consolidate runs the job according to the job settings, in one batch.

Checking your results During the MCD batch process, your interaction is limited to reading progress

messages (if you so choose). However, once the process is complete, you can check your results by checking MCD reports and/or output files.
Match/Consolidate can produce 16 different pre-formatted reports, containing statistics about the process and actual record data for your analysis. In addition, MCD can produce many statistics files in which you can find most any data pertinent to your MCD job.

Disk space for generated files

Normally, you will create reports for every job (select the Create Reports option at the Execution Options window). Carefully look at the appropriate reports. If you don’t see the results you want, change your settings and re-process the job. Do this at each step until you get the results you want.
As it runs, MCD generates work files. If you run out of disk space for those files, the program will stop. Note that, depending on your operating system, you may get a variety of errors. For details on estimating disk space requirements, refer to “Calculate the size of your work files” on page 263.
16
Match/Consolidate User’s Guide
C
Prepare your files for
A
Match/Consolidate
Input file
Input file
Input file
By guiding you through a job, this chapter provides an overview of the three main Match/Consolidate processes. For specific details about the processes, refer to the remaining chapters of this guide.
Match/Consolidate reports are a most valuable source of information about your job. Study them carefully to see if you should adjust your job settings and rerun the process.
Note that your MCD job can be run in one execution; it need not be run in separate phases as shown in this illustration.
Preparation
Execution Process:
Read Records and Create Match Sets
JOHN CASILLO CONSOLIDAION BEVERAGE 12 SAINT MARK ST AUBURN MA01501 ROBERT BRHDLEY WT. BRHDLEY & SONS ENTERPRISE 61 SUMMIT AVE SOUTH ADAMS MA01247 JOSEPHINE LAMER NEC INFORMATIN SYSTEMS 1414 MASSACHUSETTS AVE BOXBORO MA01719 MR BILL HANDRICH HELENA CHEMICAL CO PO BOX 220 HATFIELD MA01038
MR GREG HAMMOND, MGR CUST REL LISTA INTERNATIONAL 106 LOWLAND ST HOLLISTON MA01746 MARY PETERS UNIVERSAL PLASTICS CORP 165 FRONT ST CHICOPEE MA01013 HECTOR R RODRIGUEZ IMPRESOS ALFA AVE DEGETAU A-7 SAN ALFONSO CAGUAS PR CONSTANSA F FOSTER TRAULSEN & CO INC PO BOX 169 COLLEGE POINT NY11356
TIM GLAZE SHEPHERD INTELLIGENCE SYSTEMS 358 BAKER AVE CONCORD MA01742 CLAIRE MONAHAN ASTRA PHARM PRODUCTS 50 OTIS ST WESTBOROUGH MA01581 ROBERT FINE AMERICAN BILTRITE INC PO BOX 6146 TRENTON NJ08648 S DONGELO ACCO SWINGLINE 151 RADDIN RD GROTON MA01450 MR MOE L CURLY, SLS SUPV ROBERTS DISTRIBUTING CORP 372 PASCO RD SPRINGFIELD MA01119 LANCE R DUNHAM DIR ANGIOGRAPHIC DEVICES CORP 232 TAYLOR ST LITTLETON MA01460 MR PETER BEYETTE BROOKFRONT MEDICAL SERVICES 1459 NIAGARA FALLS BLVD BUFFALO NY14228 JAY SPUTNIK- MGR YANKEE AJOEIC ELEC CO 580 MAIN ST BOLTON MA01740 JAN PAINTER LUCAS GRASON STADLER INC 537 GREAT RD LITTLETON MA01460 BERNIE VITTI SANDOZ 59 ROUTE 10 EAST HANOVER NJ07936 LUIS PABON MILES PUERTO RICO INC CALL BOX 11848 SAN JUAN PR KAREN MCFADDEN VP ROCHE BIOMEDICAL LAB 17 WALDRON AVE GLEN ROCK NJ07452 MAUREEN DABERNARDI BRADFORD FURNITURE 23 BRADFORD ST CONCORD MA01742 JEANNE WEINTRAUB, MKTG COORD CHANNING L BETE CO 200 STAGE RD SOUTH DEERFIELD MA01373 MR BRADFORD W PHOENIX H M SPENCER INC BOX 14030 HOLYOKE MA MS SUZANNE MC KIERNAN THE HANOVER INSURANCE COMPA 100 SOUTH ST WORCESTER MA01605 AL DIGREGORIONSON AAA WATER QUALITY SYSTEMS 154 CENTRAL ST SOUTHBRIDGE MA01550 DENNIS R MILLS SCOTT CASTINGS CORP 461 TONAWANDA ST BUFFALO
ll input records
MSMI56MA55987 JCAS12SA01501 WSNI89LI56308 AMUT92KI56551 SWAL31SE44240 MZAS48FR44242 OLAR96SU06460 FDRA77MA14240 BJAD29HU80308
HEI11SA10158
key file
Execution Process:
Mary Jane Smith
Mary Smith
Set up your
Match/Consolidate job
Reports
Reports
Input File Summary Input List Summary Sorted Records report Unparsed Records report
Find Matching
Records
M. Smi th
Maryjane Smith
Dupes, uniques…
Reports
Reports
Input file
Match/Consolidate
Multi-Occurrence
All Duplicates
Custom Match/Consolidate
Execution Process:
Process the Match
Results
Post records
Purge records
Post data
Chapter 2: Record matching overview
Resultant
record
matching
data
Reports
Reports
Output File report Posted Dupe Groups report Purge by List report Statistics files
Duplicate Records report
List Duplicates report
List Match reports
Sorted Records report
…and more…
17

About your first job

If you are new to MCD, we recommend that you make your first MCD job simple, to familiarize yourself with the overall MCD job processes. As an introductory job, and as a quick check to be sure your program is properly installed, we supply a collection of files that are automatically copied to your program's samples directory when you install MCD.
We provide the quik_mpg.dat file to serve as your database for the sample job. We also provide the quik_mpg.def and quik_mpg.fmt support files for that database. The database file contains 1000 records, each record having name and address data. If you look through the file, you can find some blank fields, and you may note that some of the records have addresses (or names, or both) similar to those of other records.
Depending on your operating system, we provide a job file named quikunix.mpg or quikwin.mpg. This job is preset to read the records of the quik_mpg.dat database, process it to find duplicate records, and produce a MCD Output File.
If you have not used the standard directory structure prompted by the installation program, then before you run the introductory job, you may have to make some small changes to the Auxiliary Files settings, so your program will be able to find the directories it needs for processing. Refer to your System Administrator's Guide for additional details.

Subdirectories In addition, many users prefer to keep their

jobs' output and reports in separate subdirectories, with a directory structure similar to the one shown at right.
If you want to separate your output and reports like this, you'll have to do two things:
First, create the additional directories
Then, in your job setup, (quikunix.mpg
or quikwin.mpg) modify the file paths that are set in your reports and output file blocks to correspond to those directories.
PW
MPG
Samples
Template
Work
Output
Reports
18
Match/Consolidate User’s Guide

Prepare your files for Match/Consolidate

You need to have two types of files ready before you run MCD:
Input files—the records you want in the job.
You can input up to 255 files for your MCD job, and they can be of varying types, including ASCII, dBASE3, EBCDIC, and delimited.
Supporting files — These files include the
1. Prepare your files for
Match/Consolidate
2. Set up your Match/
Consolidate job
3. Read records and
create Match Sets
4. Find matching records
5. Process the match
results
DEF file, which interprets your input data for MCD, and format files such as FMT, DMT, or EBC.
For details about input files and support files, refer to Database Prep.

Input files The best way to prepare your input files for MCD is to standardize your input

data by using name and address correction software, like our DataRight, TrueName, ACE, and IACE software. Standardized data increases the speed and accuracy of the match process.
If your data is not standardized, MCD Job can perform extended parsing for name and address data. Using extended parsing produces results equivalent to those derived from using DataRight, TrueName, ACE, and IACE (U.S. engine) software. However extended parsing is an extra cost option, and it may increase overall processing time. Note that data is standardized in the key data for the purpose of matching only.
If you are running our sample job (quikunix.mpg or quikwin.mpg) then your input file, quik_mpg.dat, is in your program's samples directory.

Re-run the same job If you just changed settings and now want to re-run the same job, you may be able

to speed up the process by using reference files. For details, see “Re-use
processed input (key data) with reference files” on page 30.
Chapter 2: Record matching overview
19

Set up your Match/Consolidate job

Once you have prepared your data for MCD, you need to set up your job so that the MCD program will know:
Which input file records to include
1. Prepare your files for Match/Consolidate
2. Set up your Match/ Consolidate job
3. Read records and
How to parse data from the input record
What key data to store for each record
What makes a match or no-match
What result (output) to produce from this job
Which reports that you want to create
create Match Sets
4. Find matching records
5. Process the match results

Three options If you are using MCD Views, you have the three options explained below for

setting up a MCD job. If you do not have Views, then you must use the third alternative shown here.
1. Use the Views Wizard — The MCD Views job setup Wizard prompts you through a setup for your job. The Wizard does not control all the features available in MCD; however, it does get the job started with the input, output, and processing options common for most MCD users.
Once you've initially set up your job through the Wizard, you can use Views to add any additional sophistication needed to produce exactly the results you want from MCD.
2. Design your job in Views — You can define and design your entire job setup through Views. Use the Views windows to select the options and define the setup parameters that produce the results you want.
Views currently includes standard, extended, and advanced matching processes through Match Criteria and Match Options windows.
3. Copy and edit a job file — If you are already familiar with MCD or other record matching programs, and like working in a text-only environment, you may want to set up your job by directly editing the job file.
For the introductory sample job (quikunix.mpg or quikwin.mpg) your setup will be minimal, to accommodate any differences from the standard MCD installation (refer to page 18).
20
Match/Consolidate User’s Guide

Read records and create Match Sets

As you learn to use MCD, you may want to consider performing the first process, that of reading the input records and creating match sets, as a separate step. Once you have developed a better understanding of MCD, you may be more comfortable combining all of the processes of your job in one execution.
To run our sample job (quikunix.mpg or quikwin.mpg) issue the MCD start command from
1. Prepare your files for Match/Consolidate
2. Set up your Match/ Consolidate job
3. Read records and create Match Sets
4. Find matching records
5. Process the match results
your command prompt, or, if using Views, open the sample job and run the job from within Views. (Refer to our Quick Reference for command line options and format requirements.)
The key file is a working file that MCD uses to hold the data that’s used in placing, matching, and ranking (prioritizing) your records. You won’t read or use this file; only the MCD process will.
The match process compares data from one record to corresponding data from another record. However, comparing all the record data would take far too much time for most purposes. Additionally, comparing some parts of the data might actually be counterproductive.
Therefore, instead of using all the record data, your matching process uses key data—data that you, the MCD user, identify as the significant parts of the record to use for finding matches. That data is stored in the MCD key file.

Each key represents a record

The key file contains a string of data for each record to be processed. You identify each field and the length of characters to use in the key. For example, you may want to store 12 characters of the last name data, 30 characters of firm data, 10 characters of primary range data, and so on.
Raw Data Match Key
Name_Line1 = George F Hayes First_name for 8 characters = GEORGE
Address = 100 Main St #5 Mid_name for 3 characters = F
Last_line = Edna, MN 55424 Last_name for 10 characters = HAYES
Prim_range for 10 characters = 100
Prim_name for 15 characters = MAIN
Suffix for 6 characters = ST
Sec_range for 6 characters = 5
ZIP for 5 characters = 55424
Actual Key = GEORGE F HAYES 100 MAIN ST 5 55424
Chapter 2: Record matching overview
21

Match sets represent a match strategy

When you set up your matching criteria, which determines whether two records will match, you define your matching strategy. Match/Consolidate collects the records that it compares using this match strategy into a match set.
Match Consolidate can evaluate more than one match set; however, this is an advanced feature. For more information, refer to “Advanced matching” on
page 219. If you have defined only one match strategy in your Match Consolidate
job, then MCD automatically creates the match set. Once the key data is assembled for MCD, it can move to the next process: finding matching records.
22
Match/Consolidate User’s Guide

Find matching records

Summary of the matching process

Once MCD has read your input records and created the key file, it performs the next main processing step—finding matching records. For detailed information about matching, refer to “Engineer
your match setup” on page 195.
1. Prepare your files for
Match/Consolidate
2. Set up your Match/
Consolidate job
3. Read records and
create Match Sets
4. Find matching records
5. Process the match
results
Normally, the match process step involves these three phases:
1. Match/Consolidate places records into small groups to avoid comparing records that have no reasonable likelihood of matching. This process is often referred to as forming break groups or sorting keys.
2. Next, MCD compares each key of a specific group to every other key in that group. When two or more keys match, MCD identifies their records as members of a dupe groupa duplicate record group. Note that the number of records in a dupe group can vary widely, depending on the quality of your data and your matching setup.
3. Then MCD sorts the keys of each group, to prioritize them and to categorize each record as a unique record, or a master or subordinate “dupe.”
Raw Data Match Key
Name_Line1 = George F Hayes First_name for 8 characters = GEORGE
Address = 100 Main St #5 Mid_name for 3 characters = F
Last_line = Edna, MN 55424 Last_name for 10 characters = HAYES
Prim_range for 10 characters = 100
Prim_name for 15 characters = MAIN
Suffix for 6 characters = ST
Sec_range for 6 characters = 5
ZIP for 5 characters = 55424
Actual Key = GEORGE F HAYES 100 MAIN ST 5 55424
For the records of a break group, MCD assigns each to a group of records that it has determined to match each other and then ranks each record within that dupe group.
Chapter 2: Record matching overview
23

Process the match results

After MCD has determined which records match, you need to have MCD do something productive with its conclusions. Normally, that something will be one of the following:
Purge the input file.
Create a new output file.
Update existing records.
1. Prepare your files for Match/Consolidate
2. Set up your Match/ Consolidate job
3. Read records and create Match Sets
4. Find matching records
5. Process the match results
Whichever outcome you want, MCD checks each record of the job, one after another. Match/Consolidate acts on the record based on the results of the matching process and your choice of processing options.
For details about input and output files, refer to “Purge input files or create output
files” on page 61.

Choose an output For detailed information about available options, refer to “Purge input files or

create output files” on page 61. For your jobs, your job process will likely make
obvious what you need from MCD.
The most common use of MCD is to use the results of your MCD job to produce one of the following two output files. The introductory sample job that we provide with MCD (quikunix.mpg or quikwin.mpg) is set up to produce the MCD output file.
The MCD output file contains all the unique records as well as all master
records (master dupes). This type of output file could be used as a mailing list.
The All Duplicates output file contains all the records that matched any
others. It will include all the records that were members of all the dupe groups, but none of the unique records. This file might be used in further database maintenance activities, or quality control functions. This type of output file might have other uses, as well (refer to “Output file” on page 64).
24
Match/Consolidate User’s Guide

Match/Consolidate features

When you're done with your first, introductory job, you’ll probably be ready to learn more about some of the features that you can incorporate in your MCD jobs, s u c h a s l i s t s a n d d a t a p o s t i n g . H e r e ' s w h e r e y o u ' l l f i n d t h e s e s u b j e c t s i n t h i s g u i d e :
Task MCD feature Page
number
Categorizing input records by source or
Lists 27
field value
Logically including or excluding records, based on field data
Consolidating or copying data among
Filters
28
Functions
Group posting 147
matching (duplicate) records
Tracking what happens to (or with)
Super lists 41
records from various sources
Selecting the highest quality records for output
Multi-Level matching Match sets Combined match set Nth select
Custom sorting
224 228 236
224 72
You’ll probably also want to learn about these features.
Task MCD feature Page
number
Finding more matching records Key data 172
Speeding up the match process Break groups 188
Controlling the match process Standard matching
Extended matching Advanced matching
Identifying the best of the matching records
Ranking or prioritizing records
Chapter 2: Record matching overview
165 171 170
48
25
26
Match/Consolidate User’s Guide
Chapter 3: Define your input files and lists
This chapter describes how to define your input. In this chapter, we explain how to define and limit files to be used as input, how to re-use already processed files, as reference files, and how to characterize records through the use of input lists.
Match/Consolidate (MCD) uses the words file and list interchangeably. Even if you do not set up lists, MCD considers each input file a list.
Chapter 3: Define your input files and lists
27

Input files and lists

Terms The following table describes the various input files and lists.

Term Description
Input file Your records. The database you want MCD to process.
Reference file A re-usable file that results from MCD reading input records.
List A grouping of records based on a common data characteristic.
Normal list A list of records that MCD should consider to be eligible records.
Suppression list A list of records MCD uses to prevent matching records of other lists
from being sent to the output.
Special list A list of records that should be treated as transparent, like seed lists.
They are not counted in determining how to characterize a match group—for example, multi-list or single-list.
Super list A group of lists. For example, a super list may be comprised of three
lists rented from one broker.

Set up your lists The following list summarizes how to set up lists using the MCD Job-File

and Views.
Input files and Reference files — Set up an input file block for each file you
want included in this job.
Lists — In your DEF file, define PW.List_ID. To manually set up lists, set up
one input list description block for each list. To automatically generate lists, use the Input List Default block.
Select records based on a value in a field — In the DEF file, define
PW.List_ID as the field containing your list identification data. For example, if you have a database field named List_Code that contains a useful value, use PW.List_ID = List_Code.
Select records based on any criteria — In the Input List Description section of your job, specify your selection criteria at the List Filter parameter.
28
Match/Consolidate User’s Guide

Input files

J
J M
Before MCD can decide whether or not two records match, it must read those records from your database file(s) and convert them into key data. Identify all files that you want included in your MCD job.

Determine which input file records to include

Match/Consolidate processes records from your input files one at a time. First it decides whether the record should be included in the job—perhaps the record has been marked for deletion, for example. Or perhaps you want to limit the number or type of records to use from an input file. You can set file by file limits on which records should be used with these methods:
A starting record number.
A maximum number of records from the input file.
Filters that apply to records of this input file. Filters are formal, logical
statements that MCD can act on as it reads your input record. For example, you might want to exclude or filter out any record that is not from a particular state. Refer to filter information in the Quick Reference.
An input processing exit function.
OHN CASILLO CONSOLIDAION BEVERAGE 12 SAINT MARK ST AUBURN MA0150 1 ROBERT BRHDLEY WT. BRHDLEY & SO NS ENTER PRISE 61 SUMMIT AV E SOUTH ADAMS MA0124 7 JOSEPH INE LAME R NEC INFORM ATIN SYS TEMS 1414 MASSACH USETTS AVE BOXBOR O MA0171 9 MR BILL HANDRICH HELENA CHEMICAL CO PO BOX 220 HATFIELD MA0103 8 MR GREG HA MMOND, MGR CUST RE L LISTA IN TERNAT IONAL 106 LOWL AND ST HOLLIS TON MA0174 6 MARY PET ERS UNIVERSAL PL ASTICS CORP 165 FRONT ST CHICOPEE MA0101 3 HECTOR R RODRIGU EZ IMPRESOS ALF A AVE DEGETAU A- 7 SAN ALFO NSO CAGUAS PR CONSTA NSA F FOST ER TRAULSEN & CO IN C PO BOX 169 COLLEGE PO INT NY1135 6 TIM GLAZ E SHEPHERD INTELLI GENCE SY STEMS 35 8 BAKER AV E CONCORD MA0174 2 CLAIRE MONAHAN ASTRA PHARM PR ODUCTS 50 OTIS ST WESTBO ROUGH MA0158 1 ROBERT FINE AMERICAN BILTRITE INC PO BOX 6146 TRENTON NJ0864 8 S DONGEL O ACCO SWING LINE 151 RADD IN RD GROTON MA0145 0 MR MOE L CUR LY, SLS SU PV ROBERTS DIST RIBUTI NG CORP 372 PASC O RD SPRINGFIEL D MA0111 9
input file
No limits; use al l records
input file
Start at #100
Maximum 3000
input file
Use records that pass the filter; don’ t use the rest
LANCE R DU NHAM DIR ANGIOG RAPHIC DEVICES CORP 232 TAYLOR ST LITTLETON MA0146 0 MR PETER BEYETTE BROOKFRONT MEDICAL SERVICES 1459 NIAGARA FALLS BLVD BUFFALO NY1422 8 JAY SPUT NIK- MGR YANKEE AJO EIC ELEC CO 580 MAIN ST BOLTON MA0174 0 JAN PAIN TER LUCAS GRASON STADLER INC 537 GREAT RD LITTLETO N MA0146 0 BERNIE VITTI SANDOZ 59 ROUTE 10 EAST HANOVER NJ0793 6 LUIS PABON MILES PUERTO RICO INC CALL BOX 11848 SAN JUAN PR KAREN MC FADDEN VP ROCHE BI OMEDIC AL LAB 17 WALDR ON AVE GLEN ROC K
OHN CASILLO CONSOLIDAION BEVERAGE 12 SAINT MARK ST AUBURN MA0150 1 ROBERT BRHDLEY WT. BRHDLEY & SONS ENTERPRISE 61 SUMMIT AVE SOUTH ADAMS MA0124 7 JOSEPHINE LAMER NEC INFORMATIN SYSTEMS 1414 MASSACHUSETTS AVE BOXBORO MA0171 9 MR BILL HANDRICH HELENA CHEMICAL CO PO BOX 220 HATFIELD MA0103 8 MR GREG HA MMOND, MGR CUST RE L LISTA IN TERNAT IONAL 106 LOWL AND ST HOLLIS TON MA0174 6 MARY PET ERS UNIVERSAL PL ASTICS CORP 165 FRONT ST CHICOPEE MA0101 3 HECTOR R RODRIGU EZ IMPRESOS ALF A AVE DEGETAU A- 7 SAN ALFO NSO CAGUAS PR CONSTA NSA F FOST ER TRAULSEN & CO IN C PO BOX 169 COLLEGE PO INT NY1135 6 TIM GLAZ E SHEPHERD INTELLI GENCE SY STEMS 35 8 BAKER AV E CONCORD MA0174 2 CLAIRE MONAHAN ASTRA PHARM PR ODUCTS 50 OTIS ST WESTBO ROUGH MA0158 1 ROBERT FINE AMERICAN BILTRITE INC PO BOX 6146 TRENTON NJ0864 8 S DONGELO ACCO SWINGLINE 151 RADDIN RD GROTON MA0145 0 MR MOE L CUR LY, SLS SU PV ROBERTS DIST RIBUTI NG CORP 372 PASC O RD SPRINGFIEL D MA0111 9 LANCE R DU NHAM DIR ANGIOG RAPHIC DEVICES CORP 232 TAYLOR ST LITTLETON MA0146 0 MR PETER BEYETTE BROOKFRONT MEDICAL SERVICES 1459 NIAGARA FALLS BLVD BUFFALO NY1422 8 JAY SPUT NIK- MGR YANKEE AJO EIC ELEC CO 580 MAIN ST BOLTON MA0174 0 JAN PAIN TER LUCAS GRASON STADLER INC 537 GREAT RD LITTLETO N MA0146 0 BERNIE VITTI SANDOZ 59 ROUTE 10 EAST HANOVER NJ0793 6 LUIS PABON MILES PUERTO RICO INC CALL BOX 11848 SAN JUAN PR KAREN MCFADDEN VP ROCHE BIOMEDICAL LAB 17 WALDRON AVE GLEN ROCK NJ0745 2 MAUREEN DABERNARDI BRADFORD FURNITURE 23 BRADFORD ST CONCORD MA0174 2
JOHN CASILLO CONSOLIDAION BEVERAGE 12 SAINT MARK ST AUBURN MA0150 1
ROBERT BRHDLEY WT. BRHDLEY & SONS ENTERPRISE 61 SUMMIT AVE SOUTH ADAMS MA0124 7 JOSEPHINE LAMER NEC INFORMATIN SYSTEMS 1414 MASSACHUSETTS AVE BOXBORO MA0171 9 MR BILL HANDRICH HELENA CHEMICAL CO PO BOX 220 HATFIELD MA0103 8 MR GREG HA MMOND, MGR CUST RE L LISTA IN TERNAT IONAL 106 LOWL AND ST HOLLIS TON MA0174 6
MARY PET ERS UNIVERSAL PL ASTICS CORP 165 FRONT ST CHICOPEE MA0101 3 HECTOR R RODRIGU EZ IMPRESOS ALF A AVE DEGETAU A- 7 SAN ALFO NSO CAGUAS PR CONSTA NSA F FOST ER TRAULSEN & CO IN C PO BOX 169 COLLEGE PO INT NY1135 6 TIM GLAZ E SHEPHERD INTELLI GENCE SY STEMS 35 8 BAKER AV E CONCORD MA0174 2 CLAIRE MONAHAN ASTRA PHARM PR ODUCTS 50 OTIS ST WESTBO ROUGH MA0158 1 ROBERT FINE AMERICAN BILTRITE INC PO BOX 6146 TRENTON NJ0864 8 S DONGELO ACCO SWINGLINE 151 RADDIN RD GROTON MA0145 0 R MOE L CURLY, SLS SUPV ROBERTS DI STRIBU TING COR P 372 PASCO RD SPRINGFI ELD
JOHN C ASIL LO CONS OLID AION BEV ERAG E 12 SAI NT M ARK ST AUBU RN MA0150 1 ROBERT BRHDLEY WT. BRHDLEY & SO NS ENTER PRISE 61 SUMMIT AV E SOUTH ADAMS MA0124 7 JOSEPH INE LAME R NEC INFORM ATIN SYS TEMS 1414 MASSACH USETTS AVE BOXBOR O MA0171 9 MR BIL L HA NDRI CH HELE NA C HEMI CAL CO PO BOX 220 HATF IELD MA0103 8
MR GREG HA MMOND, MGR CUST RE L LISTA IN TERNAT IONAL 106 LOWL AND ST HOLLIS TON MA0174 6 MARY PET ERS UNIVERSAL PL ASTICS CORP 165 FRONT ST CHICOPEE MA0101 3 HECTOR R RODRIGU EZ IMPRESOS ALF A AVE DEGETAU A- 7 SAN ALFO NSO CAGUAS PR CONSTA NSA F FOST ER TRAULSEN & CO IN C PO BOX 169 COLLEGE PO INT NY1135 6
TIM GLAZ E SHEPHERD INTELLI GENCE SY STEMS 35 8 BAKER AV E CONCORD MA0174 2 CLAIRE MONAHAN ASTRA PHARM PR ODUCTS 50 OTIS ST WESTBO ROUGH MA0158 1 ROBERT FIN E AMER ICAN BIL TRIT E IN C PO BOX 614 6 TREN TON NJ0864 8 S DONGEL O ACCO SWING LINE 151 RADD IN RD GROTON MA0145 0 MR MOE L CUR LY, SLS SU PV ROBERTS DIST RIBUTI NG CORP 372 PASC O RD SPRINGFIEL D MA0111 9 LANCE R DU NHAM DIR ANGIOG RAPHIC DEVICES CORP 232 TAYLOR ST LITTLETON MA0146 0 MR PETER BEYETTE BROOKFRONT MEDICAL SERVICES 1459 NIAGARA FALLS BLVD BUFFALO NY1422 8 JAY SPUT NIK- MGR YANKEE AJO EIC ELEC CO 580 MAIN ST BOLTON MA0174 0 JAN PAIN TER LUCAS GRASON STADLER INC 537 GREAT RD LITTLETO N MA0146 0 BERNIE VIT TI SAND OZ 59 ROU TE 1 0 EAST HAN OVER NJ0793 6 LUIS PAB ON MILES PU ERTO RIC O INC CALL BOX 11848 SAN JUAN PR KAREN MC FADDEN VP ROCHE BI OMEDIC AL LAB 17 WALDR ON AVE GLEN ROC K NJ0745 2 MAUREE N DA BERN ARDI BRAD FORD FUR NITU RE 23 BRA DFOR D ST CONC ORD MA0174 2 JEANNE WEINTRA UB, MKTG COORD CHAN NING L BET E CO 200 STAGE RD SOUTH DEER FIELD MA0137 3 MR BRADF ORD W PHOE NIX H M SPENCE R INC BOX 1403 0 HOLYOKE MA MS SUZAN NE MC KIER NAN THE HANO VER INSU RANCE CO MPA 100 SOUT H ST WORCESTER MA0160 5 AL DIGRE GORION SON AAA WATER QUAL ITY SYST EMS 154 CENTRA L ST SOUTHBRI DGE MA0155 0 DENNIS R MILLS SCOTT CA STINGS CORP 461 TONAWA NDA ST BUFFALO NY1420 7
All input records

Parse data from the input file

When it reads your input record, MCD identifies specific parts of your input records, such as first name, last name, address, city, and so on. This is called parsing. Later chapters explain the various parsing options.
The parsing process is only for internal program use, to improve the detection of matching records. Match/Consolidate stores parsing results in working files that MCD will use in creating the key file. Parsing does not actually change the data in your input file, nor does it affect the data that will be in your output file (if you choose to create one). For more information about matching records, refer to “Find matching records” on page 23.
Chapter 3: Define your input files and lists
29

Re-use processed input (key data) with reference files

A reference file is a specialized work file that contains all the key data for an input file. Create the reference file during your first MCD process. For subsequent passes, MCD uses that reference file as the input data instead of using its associated input file.
Reference files are controlled by settings (parameters) of the Input File block of your MCD job setup. Refer to the Job-File Reference manual for details about how to create them or use them.
When you can use a reference file rather than an input file, you save the time that would have been spent repetitively reading input data and creating key files. As such, reference files can be a valuable substitute for large, frequently-used input files, such as mailer suppression lists.
For example, many mailers use the DMA’s MPS file, which lists about 3 million people who don't want to receive direct mail. Including this file as input suppresses these people from appearing on any mailing list produced by the MCD job.
When using reference files, you can change your matching and breaking setup in subsequent MCD passes or jobs. However, you must stay within the bounds of the key data that was captured when the reference file was created. The reference file can’t accommodate changes in the key data, or changes in list or input filter restrictions that apply to that file.

Lists and priorities Reference files inherit from their input file the settings that are used in their

corresponding input lists. (Lists are explained later in this chapter; priorities are explained in the following chapter.) Therefore, a reference file would have to be regenerated if your job includes the following:
List_ID — Changing to different List_ID field values. A reference file
inherits the List_ID of its input file, whether the List_ID is defined in the DEF file as a constant or as a field. If the input file has no List_ID, then neither will the reference file.
Priority field — Changing the priority field to a different field.
When you produce a reference file, generate the Job Summary report, for a record of all the relevant job settings, and include any options that you may want to include in jobs using this reference file.

Purge an input file When your MCD job includes input posting or group posting during an input file

purge, MCD will post to both the input file and its associated reference file. For details about input file purging with reference files, refer to “Purge the input file”
on page 65.
30
Match/Consolidate User’s Guide
Loading...
+ 252 hidden pages