BusinessObjects, Crystal Reports®, SAP BusinessObjects Rapid Mart™, SAP
BusinessObjects Data Insight™, SAP BusinessObjects Desktop Intelligence™, SAP
BusinessObjects Rapid Marts®, SAP BusinessObjects Watchlist Security™, SAP
BusinessObjects Web Intelligence®, and Xcelsius® are trademarks or registered trademarks of
Business Objects, an SAP company and/or affiliated companies in the United States and/or
other countries. SAP® is a registered trademark of SAP AG in Germany and/or other countries.
All other names mentioned herein may be trademarks of their respective owners.
Match/Consolidate Library Reference
Preface
About Match/Consolidate Library
This manual is a reference for programmers working with the Match/Consolidate Library. It explains how to make your application work with the library.
Each chapter contains explanations, code examples, call sequences, and reference pages about each of the function calls.
In this manual, we assume that you are already familiar with your programming language, your operating system, and with concepts of database management.
Conventions
This document follows these conventions:
Convention      Description
Bold            We use bold type for file names, paths, emphasis, and text that you should type exactly as shown. For example, “Type cd\dirs.”
Italics         We use italics for emphasis and text for which you should substitute your own data or values. For example, “Type a name for your file, and the .txt extension (testfile.txt).”
Menu commands   We indicate commands that you choose from menus in the following format: Menu Name > Command Name. For example, “Choose File > New.”
!               We use this symbol to alert you to important information and potential problems.
[symbol]        We use this symbol to point out special cases that you should know about.
[symbol]        We use this symbol to draw your attention to tips that may be useful to you.
Documentation
Other documentation
Documents related to this manual include the following:
Document                                              Description
System Administrator’s Guide                          Explains how to install your software.
Database Prep                                         Explains how to prepare input files for processing, including how to create DEF, FMT, and DMT files.
Match/Consolidate User’s Guide to Record Matching     Explains the concepts behind name and address matching software and provides examples of how to implement, analyze, and fine-tune match detection strategies for the best results.
Match/Consolidate Extended Matching Reference         Contains the operational how-to instructions for setting up extended matching.
Match Library Programmer’s Reference                  This is a reference manual for the Match Library.
Quick Reference                                       Contains descriptions of the input and output fields, and the command line for the Match/Consolidate job file.
Access the latest documentation
You can access documentation in several places:
On your computer. Release notes, manuals, and other documents for each
product that you’ve installed are available in the Documentation folder.
Choose Start > Programs > Business Objects Applications > Documentation.
On the SAP Service Market Place. Go to http://help.sap.com, and then
click the Business Objects tab. Here, you can search for your products’
documentation.
Chapter 1:
Install, compile, and link
This chapter provides information about installing the software, compiling your
application, and linking the compiled application with the Match/Consolidate
(MCD) libraries for Windows and UNIX systems.
Build on Windows
We provide 32-bit dynamically-linked libraries (DLLs), which you can use with
Windows NT 4.0, Windows 2000 Professional, Windows XP, and Windows 2000
Advanced Server.
Installation
If you need installation information, refer to the System Administrator’s Guide. Refer to pw\mplib for the MCD library.
Compilers and programming languages
See InfoSource for the latest compiler information.
Additional compilation flags
Beginning with Match/Consolidate release 7.31c, additional compilation flags are required. These flags are platform specific. For example, on Windows, add the following: /D "FL_WIN32"
For more information, see the sample build scripts.
Create sample programs
We provide a build file, read_dll.me, for the sample programs. Refer to this file for step-by-step instructions for creating a Visual C++ project to build the samples.
Build on UNIX
Installation
If you need installation information, refer to the System Administrator’s Guide. Refer to …/postware/mplib for the MCD library.
Compiler
See InfoSource for the latest compiler information.
Additional compilation flags
Beginning with Match/Consolidate release 7.31c, additional compilation flags are required. These flags are platform specific. For example, on Solaris 64-bit, add the following: -DFL_UNIX -DFL_UNIX_SOL
For more information, see the sample build scripts.
Create sample programs
Refer to buildmp for examples of how to build the sample programs. Modify this file to set your include and library directories.
Chapter 2:
Overview of the Match/Consolidate Library
This chapter provides information about the before-and-after of record matching
and the basic steps for working with Match/Consolidate (MCD) Library. It also
provides information about configuration files, sample programs, error handling
and progress callbacks, work files, and function calls.
Before-and-after of record matching
The MCD Library is a companion to Match Library. Match Library compares two
records and determines whether or not they match. MCD Library works with the
before-and-after of record matching.
[Figure: the record-matching workflow. Boxes: Condense records into essential data; Select a pair of records to compare; Compare the records; Use the results. Labels: MCD Library, Match Library.]
Theoretically, you could compare each complete, original record with every other
complete, original record, but that would take a very long time. To save time,
you’ll condense records into the data essential for matching and select for
comparison only records that have a reasonable chance of matching.
When you compare records, you’ll decide which data must match—and how
closely—for records to be considered a match. Theoretically, you could use your
complete, original records for comparisons, but such comparisons would be
prohibitively slow.
To make matching more efficient, use our Match Library to condense records so
that they contain only the data needed during the matching process. These
condensed records are called keys.
Original record           Key
FirstName: JoAnne
LastName: MacKiewicz      LastName: MacKiewicz
Str_Range: 1              Str_Range: 1
Str_Name: Brook           Str_Name: Brook
Str_Suffix: Rd            Str_Suffix: Rd
City: Leeds
State: MA
ZIP: 01053                ZIP: 01053
Use the MCD Library to collect keys into a file called a reference file. The
reference file contains the Match Library key data, plus additional information—
such as list identifiers—that you can add when you build the reference file.
Eliminate needless comparisons
Theoretically, you could compare every record to every other record, but that would take a long time, and many of those comparisons would be pointless. For example, a record with a ZIP Code of 01234 does not match a record with a ZIP Code of 98765, so it doesn't pay to compare them at all.
To eliminate comparisons between records that are unlikely to match, you can separate records into clusters called break groups. For example, you could separate records by ZIP Code and look for matches only within each ZIP Code group.
Forming break groups eliminates a huge number of unnecessary comparisons. For a more in-depth discussion, including examples showing how many comparisons can be eliminated with a minimal effect on results, refer to the User's Guide to Record Matching.
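To make the saving concrete, the following is a minimal sketch in C that counts pairwise comparisons with and without break groups. It is plain arithmetic and does not call the MCD Library; the function names count_all_pairs and count_pairs_within_groups are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of unordered pairs among n records: n * (n - 1) / 2. */
unsigned long long count_all_pairs(unsigned long long n)
{
    return n < 2 ? 0 : n * (n - 1) / 2;
}

/* Total pairs when comparisons are limited to records that share
   a break group. group_sizes[i] is the record count of group i. */
unsigned long long count_pairs_within_groups(const unsigned long long *group_sizes,
                                             size_t num_groups)
{
    assert(group_sizes != NULL || num_groups == 0);
    unsigned long long total = 0;
    for (size_t i = 0; i < num_groups; i++)
        total += count_all_pairs(group_sizes[i]);
    return total;
}
```

For example, 1,000 records compared exhaustively need 499,500 comparisons. If the same records fall into ten break groups of 100 records each, only 49,500 comparisons remain.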
You can also use lists to eliminate unnecessary comparisons. For example, if you
have a database which you know contains no duplicate records, you can assign all
the records in that file to a list, and tell MCD to cancel comparisons between
records within that list.
Use different matching rules for different lists
To control how records are compared, you can assign each record to a list. A list is a group of records that have some common characteristic: perhaps all of the records come from the same input database, or all have a common demographic code.
You can then use different matching rules with different lists. For example, the matching rules for comparisons within List A could be different from the matching rules for comparisons between List A and List B. When you pass a pair of records to the match engine, you can use list membership to dictate which match rules are used for the comparison.
Form groups of duplicate records
The MCD Library interprets the matching results and tracks information relevant to the duplicate-detection process. You can query these results and then handle the information however you choose.
Work with the Match/Consolidate Library
When you work with the MCD Library, you’ll follow these basic steps:
1. Create a file that contains only the record data needed for matching. Such a file is called a reference file, and the condensed records are called keys.
2. Define logical groups of records called lists. You’ll do this at the same time you create your reference files by putting a list identifier into each reference file or into each individual record key.
3. Separate the record keys into break groups. For example, you might separate records by the first three digits of the ZIP Code.
4. Within each break group, select pairs of record keys and pass them to the match engine. If you have more than one set of match rules, you can control which set of rules to use for each pair of records.
5. After the match engine compares the record keys, query the results.
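Steps 3 and 4 can be pictured with a small, self-contained C sketch. It does not use the MCD Library: the struct demo_key layout, the three-digit break rule, and both function names are assumptions made for illustration. In a real application the library forms the break groups and the Match Library does the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BREAK_KEY_LEN 3  /* assumption: break on the first three ZIP digits */

/* Hypothetical condensed record; not the library's key layout. */
struct demo_key {
    const char *last_name;
    const char *zip;
};

/* Two keys share a break group when their ZIP prefixes are equal. */
int same_break_group(const struct demo_key *a, const struct demo_key *b)
{
    assert(a != NULL && b != NULL);
    return strncmp(a->zip, b->zip, BREAK_KEY_LEN) == 0;
}

/* Count the pairs that would be handed to a match engine:
   only pairs whose keys fall in the same break group. */
size_t count_candidate_pairs(const struct demo_key *keys, size_t n)
{
    size_t pairs = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (same_break_group(&keys[i], &keys[j]))
                pairs++;
    return pairs;
}
```

With four keys whose ZIP Codes are 01053, 01057, 98765, and 01234, only the first two share the prefix 010, so one pair is selected for comparison instead of six.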
[Figure: the basic steps. Reference files (List 1 with Keys A to F, List 2 with Keys G to M) are separated into break groups; key pairs (for example, Key F and Key H) are passed to the Match Library for comparison; the result shown is “Not duplicate.”]
Configuration files make setup faster and easier
Match/Consolidate Library offers external text configuration files that contain
some of the settings and options to be used for your MCD session. Set the
parameters of the text configuration files, then call mp_cfg_open(). This function
sets your options and returns a session handle.
Advantages of
configuration files
Some advantages to using external text configuration files include the following:
Rather than making dozens of calls to set up a session handle, you make one
call to mp_cfg_open(). Compared with making direct calls to the
conventional API, this should reduce application code size, training time,
development time, errors, and testing time.
You can change breaking strategy, list definitions, and reference file setup
simply by editing the configuration files. Therefore, you can test different
scenarios without changing API calls and rebuilding.
With small modifications to your code, you can use one code base to support
many different MCD scenarios. For each scenario, make a different set of
text configuration files.
Configuration files are heavily commented so that users can edit them
without referring to printed documentation. In fact, all the potential options
and settings are included in the comments, so making additions and changes
to the configuration can be mostly copy-and-paste.
Preset MCD strategy
We include a set of configuration files already set for the most commonly-used
MCD strategy. Most users will be able to use these without further editing or
modification. Just specify the location and file name of the mp.cfg file in your
mp_cfg_open() call.
Five configuration
files
Match/Consolidate Library configuration parameters are distributed among five
configuration files, as described below.
Configuration file   File name     Description
Overall              mp.cfg        Specifies the paths and file names of the other configuration files.
Reference            mpref.cfg     Controls the format of the reference file, which contains the record keys.
List                 mplist.cfg    Defines logical groups of records.
Break                mpbreak.cfg   Identifies which fields to use to form break groups, and which characters in those fields to use for breaking.
Miscellaneous        mpmisc.cfg    Sets miscellaneous options such as the work-file directory.
Sample programs
We provide three sample programs and their source code.
Program     Description
reftest.c   Demonstrates how to create and load a reference file. In this sample program, data is read from a sample ASCII data file.
mptest.c    Demonstrates how to open a previously created reference file, set up lists and matching, set up progress callbacks, find break groups, find duplicates, and query duplicate results.
mptest2.c   Demonstrates how to use the MCD Library configuration files.
Modules
The sample programs use the following modules.
Module       Description
suppfncs.c   Contains supporting functions used by all of the sample programs.
errfncs.c    Demonstrates error handling.
Build the sample programs
For instructions on building the sample programs, see “Install, compile, and link” on page 9.
Error handling and progress callback
Nearly every function in the MCD Library returns a status code. It is important to
check the status code after every MCD Library function call. If MCD returns the
value MP_ERROR, your application must decide whether to exit or not, what
information to display, and so forth.
Error handler
The easiest way to handle error checking is to set up an error-handling function.
Call your error handler whenever MP_OK is not returned.
Inside your error handler, you can call mp_get_error_number() and
mp_get_error_messages() to obtain more information about the failed function
call.
You can call mp_get_error_info() rather than mp_get_error_number() and
mp_get_error_messages(). However, mp_get_error_info() may not work with
Visual Basic.
For sample code, refer to the errfncs.c module in the samples subdirectory.
One error handler for
MCD and Match
If you want to use just one error handler for MCD and Match Library function
calls, you need to detect whether the current error is MCD or Match Library to
determine which get_error_info() function to call. You can do this by checking
the prefix of the error-causing function (mp or mtc), or by setting a variable and
passing its value as a parameter when calling your error handler.
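The second approach, passing a value that identifies the library, can be sketched as follows. This is a self-contained illustration, not the library's API: the status values, the enum error_source, and the two demo_*_error_text() functions are hypothetical stand-ins for the real status constants and error-information calls, whose signatures are documented on the reference pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: illustrative values only; use the constants from the library headers. */
enum demo_status { DEMO_OK = 0, DEMO_ERROR = 1 };

/* Identifies which library produced the failing call. */
enum error_source { SOURCE_MCD, SOURCE_MATCH };

/* Hypothetical stand-ins for the two libraries' error-information calls. */
static const char *demo_mcd_error_text(void)   { return "MCD Library error"; }
static const char *demo_match_error_text(void) { return "Match Library error"; }

/* One handler for both libraries: the caller passes the source, and the
   handler picks the matching error-information routine.
   Returns 0 when there is nothing to report, 1 when an error was recorded. */
int handle_status(enum demo_status status, enum error_source source,
                  char *message, size_t message_size)
{
    assert(message != NULL && message_size > 0);
    if (status == DEMO_OK) {
        message[0] = '\0';
        return 0;
    }
    const char *text = (source == SOURCE_MCD) ? demo_mcd_error_text()
                                              : demo_match_error_text();
    snprintf(message, message_size, "%s", text);
    return 1;  /* the caller decides whether to exit or continue */
}
```

The handler only gathers the message. Whether to exit, log, or continue stays with the calling application, as described above.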
Progress callbacks
Certain MCD Library processing steps, such as breaking and finding duplicates, work with multiple records and potentially involve I/O and sorting.
If you wish to display progress information for MCD Library processing steps,
you can set up progress callbacks for the desired processing steps.
For sample code, refer to the suppfncs.c module in the samples subdirectory.
Work files
During the processes of creating break groups and finding duplicates, MCD
creates work files. You can specify where to store these files, what to call them,
and whether to delete them automatically.
Work files for break
groups
When finding break groups, MCD creates work files. By default, these files are
stored in the current directory. If you’d prefer to store them somewhere else, you
can specify a work directory.
You must specify a work-file session name. MCD uses the session name as the
base file name for work files. For example, if you assign a session name of “test,”
work files will have names such as test.dbk.
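The naming rule can be pictured with a short C sketch. The helper build_work_file_name is hypothetical and is not part of the MCD Library, which builds these names itself. The “/” separator is also an assumption, and only the .dbk extension is taken from the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Builds "<work_dir>/<session>.<ext>", or "<session>.<ext>" when work_dir
   is NULL or empty (the current directory).
   Returns 0 on success, -1 if the result does not fit in out. */
int build_work_file_name(char *out, size_t out_size, const char *work_dir,
                         const char *session, const char *ext)
{
    assert(out != NULL && session != NULL && ext != NULL);
    int n;
    if (work_dir == NULL || work_dir[0] == '\0')
        n = snprintf(out, out_size, "%s.%s", session, ext);
    else
        n = snprintf(out, out_size, "%s/%s.%s", work_dir, session, ext);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With a session name of “test”, an empty work directory, and the extension “dbk”, the helper produces test.dbk.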
Work files for
duplicate results
When finding duplicates, MCD builds two additional work files—one for
duplicate results, and one for key-file information. These two files can become
quite large, so you may want to store them in a separate location. You can specify
the path and file names for each of these two work files.
Sample code
To set work-file options, call mp_misc_set_work_file_info(). For sample code, refer to the suppfncs.c module in the samples subdirectory.
Function calls for initialization and termination
You may use some or all of the functionality of the MCD Library. You’ll call
these functions for any MCD application.
1. Call mp_init() to initialize the MCD Library.
   Initialize the Match Library before initializing the MCD Library. For more information, refer to your Match Library documentation.
2. If you are not using configuration files, call mp_set_keyfile() to set the path and file name of the installation key file, mplib.key. You must specify the location of this file before calling any other MCD functions.
   This key file is not related to the record keys stored in your reference file. This key file “unlocks” the library for use.
3. If you are using configuration files, call mp_cfg_open(). For details about configuration files, see “Configuration files make setup faster and easier” on page 17.
4. Set up an error-handling function. Inside your function, call mp_get_error_info(). For more information, see “Error handling and progress callback” on page 20.
5. Optional: Call mp_get_revision() to get version information for the MCD and Match Libraries. You will need this information before calling for technical support.
6. If you are not using configuration files, call the following functions to set miscellaneous options and work-file information:
   mp_misc_set_option_info()
   mp_misc_set_work_file_info()
7. If you want to establish exit functions (for example, to report progress at crucial points during processing), call mp_misc_set_exit_progress() or mp_misc_set_exit_blank_field() to register your exit functions with the MCD Library.
8. Perform processing. A complete, end-to-end job process would involve the following major steps:
   Create, open, or update reference files (Chapter 3).
   Create lists and super lists (Chapter 4).
   Form break groups (Chapter 5).
   Search for duplicates and retrieve results (Chapter 6).
9. If you are using configuration files, call mp_cfg_close().
10. Call mp_term() to terminate the MCD Library and free global memory allocated for it.
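The way the steps branch on configuration files can be summarized in a short C sketch. It deliberately does not call the library, because the function signatures belong on the reference pages. The hypothetical helper plan_session_calls only lists the names of the required calls in order, and it leaves out the optional calls (mp_get_revision() and the exit-function registrations) and the processing itself.

```c
#include <assert.h>
#include <stddef.h>

/* Fills plan[] with the names of the required calls, in the order given in
   the steps above. Returns the number of entries written (at most 5).
   This is an illustration of ordering only; it calls nothing. */
size_t plan_session_calls(int use_config_files, const char *plan[], size_t capacity)
{
    assert(plan != NULL && capacity >= 5);
    size_t n = 0;
    plan[n++] = "mp_init";                         /* step 1 */
    if (use_config_files) {
        plan[n++] = "mp_cfg_open";                 /* step 3 */
    } else {
        plan[n++] = "mp_set_keyfile";              /* step 2 */
        plan[n++] = "mp_misc_set_option_info";     /* step 6 */
        plan[n++] = "mp_misc_set_work_file_info";  /* step 6 */
    }
    /* step 8: processing happens here */
    if (use_config_files)
        plan[n++] = "mp_cfg_close";                /* step 9 */
    plan[n++] = "mp_term";                         /* step 10 */
    return n;
}
```

With configuration files the plan has four entries; without them it has five, because the key file, options, and work-file information are each set by a separate call.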
Chapter 3:
Reference files
This chapter introduces reference files and provides information about creating
reference files, adding keys to a reference file, and maintaining reference files.
Introduction to reference files
A reference file is a specialized work file that contains record keys. The record
keys are condensed versions of your original record data, containing only the data
needed to form break groups, determine list membership, perform matching, and
rank records within duplicate groups.
You’ll use the Match Library to define most of the key layout—namely, which
record data to include in the key. When you use MCD to create a reference file,
you’ll tell MCD which match key to use, and you might add useful MCD key
fields to each key.
Header and keys
A reference file consists of header data and record keys.
The header contains information about the key layout. You can also place a list
identifier and other user data in the header.
The record keys contain the record data that is needed to form break groups and
perform matching; you use the Match Library to define this portion of the key.
When you create a reference file, you can also add a record-mapping field,
record-priority field, and list identification field to each key.
Header: key layout, user “miscellaneous” field, list ID (if constant)
Keys: overhead data, record data, record-mapping field, record-priority field, list ID field (if variable)
Temporary or reusable
A reference file can be used as a temporary work space. For example, if some of
your duplicate-detection data comes from a transaction file which is always
changing, you can create a temporary reference file from your transaction data
each time you want to find duplicates.
Alternately, once you generate keys and store them in a reference file, you can
save the reference file and add, delete, update, and reuse the keys. However,
many programmers find it easier to regenerate the reference file each time they
find duplicates, rather than maintaining and updating keys in an existing
reference file.
Key data
A key contains record data plus other data such as a list identifier or record-mapping information. Keys must include all of the data you want to use to form break groups and perform matching.
You’ll use the Match Library to define the layout and content of the key. When
you use the MCD Library to create the reference file, you’ll specify which match
key to use. Optionally, you can add the MCD key fields listed below.
Key field           Description
List ID             Defines to which list a key belongs.
Field priority      Defines a record’s priority within a group of duplicate records.
Key miscellaneous   Stores user information such as a record identifier that allows you to map a key back to a particular record in your database.
Key length
Each key consists of overhead data, Match Library key fields, and MCD Library key fields. To calculate the length of each key, find the sum of the following items:
The size of the overhead data (16 bytes per key).
The length of each Match Library key field, including any alternates.
The length of each MCD key field.
Match/Consolidate pads each key so that the total key length is divisible by eight.
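The calculation can be written out as a small C sketch. The 16-byte overhead and the padding to a multiple of eight come from the text above; the function name padded_key_length and the field lengths in the example are hypothetical. The caller is assumed to include any alternates in the Match Library field lengths.

```c
#include <assert.h>
#include <stddef.h>

#define KEY_OVERHEAD_BYTES 16  /* overhead per key, as stated above */
#define KEY_ALIGNMENT 8        /* total length is padded to a multiple of 8 */

static size_t sum_lengths(const size_t *lengths, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += lengths[i];
    return total;
}

/* Overhead plus all Match Library and MCD key field lengths,
   rounded up to the next multiple of KEY_ALIGNMENT. */
size_t padded_key_length(const size_t *match_field_lengths, size_t num_match_fields,
                         const size_t *mcd_field_lengths, size_t num_mcd_fields)
{
    assert(match_field_lengths != NULL || num_match_fields == 0);
    assert(mcd_field_lengths != NULL || num_mcd_fields == 0);
    size_t raw = KEY_OVERHEAD_BYTES
               + sum_lengths(match_field_lengths, num_match_fields)
               + sum_lengths(mcd_field_lengths, num_mcd_fields);
    return (raw + KEY_ALIGNMENT - 1) / KEY_ALIGNMENT * KEY_ALIGNMENT;
}
```

For instance, Match Library fields of 20, 10, 20, 4, and 5 bytes plus MCD fields of 4 and 2 bytes add up to 16 + 59 + 6 = 81 bytes, which is padded to 88.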
List identifiers
There are two ways to store list identifiers:
If all keys in the reference file belong to the same list, you can store the list
ID once, in the reference-file header.
If the reference file contains keys from multiple lists, store the list ID in each
key, in the MCD list ID field.
Create a reference file
Configuration file
If you use the MCD configuration files, use the Reference configuration file,
mpref.cfg, to define MCD key fields and reference-file header information. For
instructions about completing the configuration parameters, see the comments in
the configuration file.
This configuration file replaces calls to mp_refcreate_set_option(). For more
information about using configuration files, see “Configuration files make setup
faster and easier” on page 17.
Sequence of function calls
To create an empty reference file, call the functions below. For sample code, refer to reftest.c.
1. Use the Match Library to define a match key. The match key defines the fields that hold record data (for example, personal-name data, address data, and so on). For more information, refer to the Match Library Programmer’s Reference manual.
2. Call mp_refcreate_open() to initialize the library and open a session to create a reference file.
3. Call mp_refcreate_set_mtc_key_id() to specify which match-key layout to use. You used the Match Library to define the match-key layout (step 1).
4. Call mp_refcreate_set_option() to reserve space for MCD key fields and set options. If you are using MCD configuration files, you do not need to make these calls.
5. Call mp_refcreate_close() to lock your settings and create the reference file.
Add keys to a reference file
To load data into a reference file, add keys. Use the Match Library to load record
data into each key. Use the MCD Library to load data into the MCD key fields.
To add keys to a reference file, call the functions below. For sample code, refer to reftest.c.
1. Call mp_ref_open() to open a previously created reference file.
2. Call mp_refqry_get_mtc_key_id() to get the match-key ID for the reference file. You will pass this ID as input to Match Library functions.
3. Enter a loop to load data into each key:
   Call mp_refmod_clear_key() to clear the internal key buffer before loading data.
   Use the Match Library to load record data into the Match key fields. For more information, see your Match Library manual.
   Call mp_refmod_set_data() to set data in MCD key fields. Call this function once for each MCD field.
   Call mp_refmod_add_key() to add the completed key to the reference file.
4. Call mp_ref_close() to close the reference file.
Maintain a reference file
Many programmers find that it’s easiest to regenerate reference files each time
they perform MCD processing, rather than trying to maintain the files. Often the
time needed to regenerate a reference file is minimal, especially when weighed
against the work of maintaining it.
In some cases, however, you may prefer to maintain an existing reference file. To
update an existing key, you need to find the key, modify the key fields as
necessary, and write the data back to the key.
To modify a reference file, use the mp_refmod_*() functions and the appropriate
Match Library functions. To query the contents of keys, use the mp_refqry_*()
functions and the appropriate Match Library functions. For more information
about working with Match Library key data, refer to the Match Library Programmer’s Reference manual.
Chapter 4:
Lists, match specifications, and match levels
This chapter provides an introduction to lists and explains how to set up lists, control matching within and between lists, give one list priority over another, and gather statistics for a group of lists.
Introduction to lists
A list is a group of records that are related in some way—for example, all of the
records might have come from the same input database. Lists give you added
control over the matching process:
Control which matching rules to use when comparing two records.
Cancel comparisons within a list if you know there are no duplicate records
within that list.
Assign one list priority over another. For example, you could favor a house
list over a rented list.
Prioritize records within a break group.
List membership
When you set up a list, you assign a list identifier for that list. A record is a
member of that list if the record has the same list identifier. There are two ways to
assign records to lists:
If all of the records (keys) in a reference file belong to the same list, you can
put the appropriate list ID in the file header.
Alternately, you can include a list ID field in your record key. The value in
the list ID field indicates to which list a particular record belongs.
Types of lists
You can define three different types of lists:
List type     Description
Normal        Contains good or eligible records (default).
Suppression   Contains records that should be suppressed. Suppression records and all records that match them should be removed from the output.
Special       Contains records that are not counted in the determination of whether a duplicate group is single-list or multiple-list.
If a record doesn’t belong to any list
If a record doesn’t belong to any of your defined lists, you can:
Action   Outcome
Ignore   Leave the record out of the job.
Abort    Return an error code.
Assign   Assign the record to a default list that you specify.
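The three outcomes can be sketched in C as a small lookup with a fallback. Everything in this block is hypothetical and local to the example: the enum names, the use of strings for list IDs, and the function resolve_list are assumptions, not the library's types. In a real application the library applies the default action you configure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names mirroring the Ignore, Abort, and Assign actions. */
enum demo_default_action { DEMO_ACTION_IGNORE, DEMO_ACTION_ABORT, DEMO_ACTION_ASSIGN };
enum demo_outcome { DEMO_USE_LIST, DEMO_SKIP_RECORD, DEMO_RETURN_ERROR };

/* Finds the list whose ID equals record_list_id. If none matches, applies
   the default action. On DEMO_USE_LIST, *list_index holds the chosen list. */
enum demo_outcome resolve_list(const char *const *list_ids, size_t num_lists,
                               const char *record_list_id,
                               enum demo_default_action action,
                               size_t default_list, size_t *list_index)
{
    assert(list_ids != NULL && record_list_id != NULL && list_index != NULL);
    for (size_t i = 0; i < num_lists; i++) {
        if (strcmp(list_ids[i], record_list_id) == 0) {
            *list_index = i;
            return DEMO_USE_LIST;
        }
    }
    switch (action) {
    case DEMO_ACTION_ASSIGN:
        assert(default_list < num_lists);
        *list_index = default_list;   /* record joins the default list */
        return DEMO_USE_LIST;
    case DEMO_ACTION_ABORT:
        return DEMO_RETURN_ERROR;     /* caller reports an error code */
    case DEMO_ACTION_IGNORE:
    default:
        return DEMO_SKIP_RECORD;      /* record is left out of the job */
    }
}
```

A record whose ID matches a defined list always lands in that list; the default action matters only for records that match none.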
Priority
You can use lists to influence how records are ranked within break groups and how records are ranked within duplicate groups.
Type of priority                         Description
Break priority                           After finding break groups, MCD sorts records within each group. Records are sorted first by break field, then by a “break-priority” value that you can set for each list. The top-ranked record in each break group becomes the driver record during matching.
List priority and apply blank priority   After finding duplicates, MCD sorts records within each duplicate group. There are two ways you can use lists to influence this sorting process. You can set a list-priority value for each list and have MCD sort by list priority; for example, you might favor a house list over a rented list. You can also tell MCD whether to apply your blank-priority setting to records from a particular list.
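The break-priority ordering (break field first, then break priority) can be illustrated with the standard C qsort() function. This sketch is not how the library is implemented: the struct demo_record layout is hypothetical, and it assumes that a lower break-priority value ranks first, which the text above does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record used only for this illustration. */
struct demo_record {
    char break_field[4];   /* e.g. first three ZIP digits, NUL-terminated */
    int  break_priority;   /* assumption: lower value ranks first */
    int  record_id;
};

/* Orders by break field, then by break priority within the same field. */
static int compare_for_break_order(const void *lhs, const void *rhs)
{
    const struct demo_record *a = lhs;
    const struct demo_record *b = rhs;
    int by_field = strcmp(a->break_field, b->break_field);
    if (by_field != 0)
        return by_field;
    return (a->break_priority > b->break_priority)
         - (a->break_priority < b->break_priority);
}

void sort_for_breaking(struct demo_record *records, size_t count)
{
    assert(records != NULL || count == 0);
    if (count > 0)
        qsort(records, count, sizeof records[0], compare_for_break_order);
}
```

After sorting, the first record of each run of equal break_field values is the top-ranked record of that break group, which corresponds to the driver record described above.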
Data salvaging
During matching, MCD can salvage data from a matching record and temporarily
place that data into the driver record for subsequent matching. You can turn this
feature on or off for each list.
Set up lists
There are two ways to set up lists: through the List configuration file or through
function calls.
Configuration file
If you are using the MCD configuration files, use the List configuration file,
mplist.cfg, to define lists. You’ll still use API calls to set up match specifications,
match levels, and super lists.
For instructions about completing the configuration parameters, see the
comments in the configuration file. The list configuration file replaces all of the
function calls listed below.
Function calls
If you are not using MCD configuration files, call these functions to define lists. For sample code, refer to suppfncs.c.
1. Call mp_list_set_num_lists() to set the number of lists and the number of match levels to use. You must define at least one list and one match level.
2. Enter a loop. Make the following calls for each list that you want to define.
   Call mp_list_set_list_name() to set the name of the list.
   Call mp_list_set_list_id() to set the list ID. A record key must have the same list ID to be considered a member of this list.
   Call mp_list_set_list_attr() to specify list attributes.
3. Call mp_list_set_default_action() to specify what action to take if a record key does not belong to any of your defined lists.
4. If you set the default action to MP_DLIST_ACTION_ASSIGN, call mp_list_set_default_list_num() to specify which list is the default list.
Control matching within and between lists
To control matching within and between lists, you can define multiple match
specifications and match levels. For example, you might perform household
matching, then perform individual matching within each household.
Match specification
A match specification defines matching rules to use within a list or between two lists. A match specification consists of a Match Library automatic-matching session or rule-matching sessions, or a callback function. For information about setting up an automatic matching session (auto_id) or a rule matching session (rule_id), refer to the Match Library Programmer’s Reference manual.
[Figure: the kinds of match specification. auto_id: a single auto_id. rule_id: rule_id1, rule_id2, ..., rule_idn. auto_id and rule_id: an auto_id followed by rule_id1, rule_id2, ..., rule_idn. callback: a user-supplied function returns any of the other three match specifications.]
By default, MCD compares all records from all lists using the first match
specification. If desired, you can set up specific match specifications to control
comparisons within or between particular lists. You can also cancel comparisons
within or between lists.
Match level
A match level consists of one or more match specifications. Duplicates are found
at each level, and then the results are made available for the next match level.
For example, you could use multiple match levels if you wanted to detect
duplicates at the household level and at the individual level within the household
level.
Input: John Smith, Robert Johnson, Mary Jones, Jonathon Smith, Jerry Doe, Anna Smith
Match level 1 (Household), duplicate group: John Smith, Jonathon Smith, Anna Smith
Match level 2 (Individual), duplicate group: John Smith, Jonathon Smith
Order is important
The order of the match levels is important because duplicates are found at each
level, and only the results are made available for the next level. Usually, you will
define your broadest match levels first, followed by more specific match levels.
In the example on the previous page, if the order of the match levels were
reversed, Anna Smith would not be included in the duplicate group at the
individual match level, and thus would not be available for duplicate detection at
the household match level.
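The effect of level order can be sketched in a few lines of C. This is a toy model, not MCD Library code: the Record type, the narrow() helper, and both match rules (same last name for a household, same first two letters of the first name for an individual) are assumptions made up for illustration.

```c
#include <string.h>

/* Toy record; real MCD keys carry many more fields. */
typedef struct {
    const char *first;
    const char *last;
} Record;

typedef int (*MatchFn)(const Record *, const Record *);

/* Hypothetical household rule: same last name. */
static int same_household(const Record *a, const Record *b)
{
    return strcmp(a->last, b->last) == 0;
}

/* Hypothetical individual rule: first names share their first two letters. */
static int same_individual(const Record *a, const Record *b)
{
    return strncmp(a->first, b->first, 2) == 0;
}

/*
 * Copy into out[] the indices from in[] whose records match the record at
 * in[0]. Returns the number of indices kept. Only the survivors of one
 * level are offered to the next level.
 */
static int narrow(const Record *recs, const int *in, int n_in,
                  MatchFn match, int *out)
{
    int n_out = 0;
    for (int i = 0; i < n_in; i++) {
        if (match(&recs[in[0]], &recs[in[i]]))
            out[n_out++] = in[i];
    }
    return n_out;
}
```

With the six input records from the example, narrowing by same_household and then by same_individual keeps three records and then two. Calling the two levels in the opposite order keeps only John and Jonathon after the first level, so Anna Smith is never offered to the household level.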
Considerations
If you use multiple match levels with many lists, memory requirements grow
quite rapidly. Also, extra match levels require more comparisons and processing,
so duplicate results are more complicated to retrieve and understand.
Function calls
To define match specifications and match levels, call the functions below. For
sample code, refer to suppfncs.c.
1. Call mp_list_set_num_lists() to set the number of lists and match levels that
   will be defined. You must define at least one match level.
   If you use configuration files, you do not need to call this function.
2. Call mp_list_set_num_match_specs() to set the number of match
   specifications that will be defined. You must define at least one match
   specification.
3. For each match specification, call mp_list_set_match_spec_ruleids(),
   mp_list_set_match_spec_autoid(), or mp_list_set_exit_match_spec() to
   define the match specification.
4. By default, MCD Library compares all records within or between all lists,
   using the first match specification. To change comparisons within or between
   lists, to set multiple match levels, or to cancel comparisons within or between
   lists, call mp_list_set_match_spec().
Give one list priority over another
For information about list priority, refer to the User’s Guide to Record Matching.
Master and subordinate records
During the matching process, MCD assembles matching records into duplicate
groups. Within each group MCD ranks records. As shown below, the best
record is called the “master” record, and all remaining records are subordinate
duplicates.
    Master record
    Subordinate duplicate 1
    Subordinate duplicate 2
    Subordinate duplicate 3
When you set up lists, you can give one list priority over another, and thus
control how records are ranked within duplicate groups.
List priority
You can use list priority to set a preference for one list over another. For example, if
you are processing a house list and a rented list, you might prefer to keep records
from the house list.
To control list priority, you assign penalty points to each list. The more penalty
points you assign a list, the lower its priority. For example, if you assign your
house list a score of 1 and the rented list a score of 10, records from the house list
will be ranked more highly. Remember: The lower the penalty score, the higher
the priority.
To set list priority, call mp_list_set_list_attr().
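The penalty rule can be illustrated with a small self-contained sketch. It is not library code: the GroupMember type and pick_master() are hypothetical names, and the sketch assumes that list penalty is the only ranking factor, which may not hold in MCD itself.

```c
#include <stddef.h>

/* One member of a duplicate group; list_index says which list it came from. */
typedef struct {
    int record_id;
    int list_index;
} GroupMember;

/*
 * Return the position of the member whose list has the lowest penalty score.
 * Ties keep the earliest member. Returns -1 for an empty group.
 */
static int pick_master(const GroupMember *members, size_t count,
                       const int *penalty_by_list)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        int p = penalty_by_list[members[i].list_index];
        if (best < 0 || p < penalty_by_list[members[best].list_index])
            best = (int)i;
    }
    return best;
}
```

With penalties of 1 for the house list and 10 for the rented list, a group that contains one house-list record and two rented-list records yields the house-list record as master.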
Gather statistics for a group of lists
A super list is a group of lists. You can use super lists to prepare a second set of
match statistics, combining the statistics for two or more regular lists.
For example, suppose you define five lists—two house lists and three rented lists.
You would get match statistics for each individual list. But suppose that you also
wanted a summary for the house lists and a summary for the rented lists. You
could create two super lists—one for the house lists, and one for the rented lists.
Super lists affect only the way that match statistics are reported. They do not
affect matching or record priority.
Function calls
To create a super list, call these functions:
1. Call mp_list_set_num_super_lists() to set the number of super lists that will
   be defined.
2. For each super list:
   Call mp_list_set_super_list_name() to set the super-list name.
   Call mp_list_set_super_list_num() to specify which lists belong to this
   super list.
Chapter 5:
Break groups
This chapter provides an introduction to breaking and explains how to set up
normal, adaptive, and automatic breaking. The chapter also provides information
about querying break groups.
Introduction to breaking
Eliminate needless
comparisons
Breaking is a judgment by you, the user, that certain records shouldn’t be
compared because there is no realistic probability that they would be considered
duplicates. For example, many users form break groups based partly on the first
three digits of the ZIP Code, because if the first three digits of the ZIP Code do
not match, there is virtually no chance that the records match.
During breaking, input keys are sorted into groups, based on the field data that
you identify for breaking. Then, when keys are compared during the search for
duplicate records, comparisons are made only within—not among—these groups.
This can drastically reduce the number of comparisons that MCD must make, and
thus substantially reduce processing time.
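The saving is easy to quantify. The sketch below is plain arithmetic, not library code; pair_count() and comparisons_with_breaking() are illustrative names. It computes an upper bound on comparisons, since the actual number depends on the matches found.

```c
/* Number of distinct pairs among n keys: n * (n - 1) / 2. */
static long pair_count(long n)
{
    return n < 2 ? 0 : n * (n - 1) / 2;
}

/* Upper bound on comparisons when keys are compared only within break groups. */
static long comparisons_with_breaking(const long *group_sizes, int num_groups)
{
    long total = 0;
    for (int i = 0; i < num_groups; i++)
        total += pair_count(group_sizes[i]);
    return total;
}
```

For 1,000 keys, one unbroken group allows up to 499,500 comparisons, while ten break groups of 100 keys each allow at most 49,500.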
Types of breaking
The MCD Library offers three types of breaking:
Type of breaking      Description
Normal breaking       You specify which fields to use for breaking.
Adaptive breaking     Helps you balance performance (which is optimal with small break
                      groups) and matching effectiveness (which is optimal with larger
                      break groups). You specify which fields to use for breaking and the
                      maximum break-group size. MCD breaks according to your settings,
                      but combines small break groups whenever possible.
Automatic breaking    You make a few settings and select a few options, and MCD decides
                      which fields to use for breaking.
Break-group priority
After forming break groups, MCD sorts the records within each group. Records
are sorted first by break field, then by a “break priority” setting that you can make
for each list. The first record in each list becomes the driver record for matching.
You can use break priority to increase the chances that records from a particular
list become the driver records for matching.
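The two-part sort order described above can be sketched with the standard qsort() function. The Key type and its fields are hypothetical, and the sketch assumes that a lower break-priority number sorts first; the manual does not state which direction MCD uses.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char break_key[16];   /* break-field data, for example "54601MAI" */
    int  break_priority;  /* per-list setting; lower sorts first (assumed) */
    int  record_id;
} Key;

/* Order by break field first, then by break priority. */
static int compare_keys(const void *pa, const void *pb)
{
    const Key *a = (const Key *)pa;
    const Key *b = (const Key *)pb;
    int c = strcmp(a->break_key, b->break_key);
    if (c != 0)
        return c;
    return (a->break_priority > b->break_priority) -
           (a->break_priority < b->break_priority);
}

static void sort_keys(Key *keys, size_t count)
{
    qsort(keys, count, sizeof keys[0], compare_keys);
}
```

After sorting, the keys that share a break key sit next to each other, and the key from the list with the preferred break priority comes first among them.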
Queries
After breaking, you can query break groups to determine the success of your
breaking strategy. You can also use break group queries to select certain break
groups for duplicate detection.
Set up normal breaking
For normal breaking, you specify which fields, and precisely which characters
from the field, to use for breaking. To use a field for breaking, that field should be
common to all of your reference files. For example, if one of your files does not
have a firm-name field, do not use the firm-name field for breaking.
When you choose a field for breaking, consider the quality of the input data. For
example, if you use unstandardized data, or a field that is sometimes blank, some
duplicates may be missed.
Unstandardized data
If possible, the field on which you break should contain standardized data.
Otherwise, typing errors or inconsistencies may cause matching records to be
placed into different break groups. Those records will never be compared and
thus never detected as duplicates.
For example, if you were to break on the first three characters of an
unstandardized first-name field, you would fail to catch the following records as
duplicates:
Richard Smith
100 Main St
Dick Smith
100 Main St
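A minimal sketch of why these two records never meet, assuming that breaking on the first three characters amounts to a plain prefix comparison. The helper name same_break_group() is hypothetical.

```c
#include <string.h>

/*
 * Nonzero when both field values fall into the same break group, that is,
 * when their first `len` characters are identical.
 */
static int same_break_group(const char *a, const char *b, size_t len)
{
    return strncmp(a, b, len) == 0;
}
```

Here same_break_group("Richard", "Dick", 3) is 0, so the two records land in different break groups and are never compared. The same test on the last name, same_break_group("Smith", "Smith", 3), is nonzero.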
Five-digit ZIP Codes
Some ZIP Codes serve Post Office boxes only. This is important if you are
matching business addresses, which sometimes use a street address and
sometimes use a PO box. If you broke on all five digits of ZIP, you would miss
these duplicates:
ACME Hardware
1234 Main St
La Crosse WI 54601
ACME Hardware
PO Box 100
La Crosse WI 54602
Blank fields
It is risky to break on any field that is empty in some records. For example, if you
were to break on the first three digits of the telephone number, you would fail to
catch the following duplicates:
Jane Smith
100 Main St
608.788.8154
Jane Smith
100 Main St
Match options
Some breaking strategies may defeat match options that you defined in the Match
Library.
For example, suppose you instructed the Match Library to search for acronym
matches in the firm field, to detect matches such as IBM and International
Business Machines. If you also used the firm field for breaking, acronyms and
spelled-out names would end up in separate break groups and thus would never
be compared. For example, you would miss these duplicates:
Rita Terranova
ETI
Rita Terranova
Eco Technologies Inc.
Configuration file
If you are using the MCD configuration file, you can use the break configuration
file, mpbreak.cfg, to set up normal breaking. For instructions about completing
the configuration parameters, see the comments in the configuration file.
The configuration file replaces calls to mp_break_set_info() and
mp_breakfld_set_info().
Sequence of function calls
Before breaking, you must specify the work directory and the MCD Library
session name so that work files can be built.
1. Optional: You may wish to write an exit function to report progress to the
   user during the process of finding break groups. Register your function by
   calling mp_misc_set_exit_progress() during initialization.
2. Call mp_break_set_info() to set the break mode and number of
   break fields.
   ! If you do not set the number of break fields, MCD assumes there are no
   break fields and places all records into one break group.
3. For each break field, enter a loop and make three calls to
   mp_breakfld_set_info() to set the break field type, start position, and field
   length.
4. Specify which reference file(s) to use. Call mp_input_set_num_refs() to set
   the number of reference files, then call mp_input_set_ref_num() once for
   each reference file to attach it to the session.
5. Call mp_break_find_groups() to form break groups.
Set up adaptive breaking
Good break groups are essential for optimum performance. If break groups are
too large, the job may take a long time to process. Or, if break groups are too
small, you can miss duplicates. With adaptive breaking, we balance performance
and matching effectiveness. We do this by keeping the size of each break group
below a specific value that you can set. The following is an example of how this
works.
You set up break fields just as you would with normal breaking, plus you set a
maximum break group size. Breaking is first done to the finest level possible
according to your settings, just as with normal breaking. But then, break fields are
adjusted in an attempt to combine small break groups into larger groups, thus
increasing your chances of catching duplicates.
For example, suppose that we have a maximum break group size of 40 record
keys, and we set breaking for all five digits of the ZIP Code and the first two
characters of the street name. First, our defined break groups would be created.
Then, wherever possible, small break groups would be combined into larger
groups without exceeding the 40-record maximum:
Original break groups                  Combined break groups
ZIP: 54601  Street: Ma   27 records
ZIP: 54603  Street: Me    5 records    ZIP: 546                  39 records
ZIP: 54660  Street: Na    7 records
ZIP: 54494  Street: Na   13 records    ZIP: 54494  Street: N     33 records
ZIP: 54494  Street: Ne   20 records
ZIP: 54494  Street: Ma   34 records    (unchanged)
ZIP: 60606  Street: Ma   42 records    (unchanged)
MCD can combine the first three break groups by adjusting the ZIP-Code
break and forming a “546” group.
If MCD tries to combine the next three break groups on the “54494” ZIP
Code, resultant groups would exceed the 40-record maximum. However,
MCD can combine the “54494 / Na” group and “54494 / Ne” group by
adjusting our street-name break and forming a “54494 / N” group.
MCD leaves the “54494 / Ma” group as it is. This group cannot be combined
with any other groups without exceeding the 40-record maximum.
The “60606 / Ma” group exceeds the 40-record maximum, even at the finest
break level. MCD leaves this group as it is. MCD never attempts to break a
group finer than your break-group settings.
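The size checks in this example can be reproduced with a short sketch. It only tests whether a candidate combination fits under the maximum; it does not claim to reproduce how MCD chooses which combinations to try. The BreakGroup type and the helper names are hypothetical.

```c
#include <string.h>

typedef struct {
    const char *zip;
    const char *street;
    int records;
} BreakGroup;

static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Total records in all groups whose ZIP and street begin with the given prefixes. */
static int combined_size(const BreakGroup *groups, int count,
                         const char *zip_prefix, const char *street_prefix)
{
    int total = 0;
    for (int i = 0; i < count; i++) {
        if (starts_with(groups[i].zip, zip_prefix) &&
            starts_with(groups[i].street, street_prefix))
            total += groups[i].records;
    }
    return total;
}

/* Nonzero when the candidate combination is non-empty and within max_keys. */
static int fits(const BreakGroup *groups, int count, const char *zip_prefix,
                const char *street_prefix, int max_keys)
{
    int size = combined_size(groups, count, zip_prefix, street_prefix);
    return size > 0 && size <= max_keys;
}
```

Applied to the seven groups above with a maximum of 40, the “546” combination holds 39 records and fits, the full “54494” combination holds 67 and does not, and the “54494 / N” combination holds 33 and fits.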
Queries
To determine which maximum break-group size to select, and more generally,
what kind of breaking results you might expect, you can do break group queries
before you actually find duplicates. This should allow you to fine tune your break
settings.
Configuration file
If you are using the MCD configuration file, you can use the break configuration
file, mpbreak.cfg, to set up adaptive breaking. For instructions about completing
the configuration parameters, see the comments in the configuration file.
The configuration file replaces calls to mp_break_set_info() and
mp_breakfld_set_info().
Function calls
Set up adaptive breaking as you would normal breaking (see “Set up normal
breaking” on page 39), but make an additional call to mp_break_set_info() to set
the maximum number of record keys that MCD can place into a combined break
group. Also, be sure to set the break mode to adaptive.
Set up automatic breaking
Automatic breaking is the easiest way to set up break groups. You make a few
settings, and MCD selects one or more break groups that are common to all of the
reference files. You can query the resultant break groups and accept them or
modify them.
Fields used
MCD limits automatic breaking to the following fields:
ZIP Code
Street name
Street range (house number)
Last name
To use automatic breaking, at least one of these fields must be common among all
of your reference files. You can set additional break groups if you wish.
Optional settings
You may set the type of matching strategy, the tightness, and the density. This
information helps MCD decide which break fields to select.
Sequence of function calls
Before breaking, you must specify the work directory and the MCD Library
session name so that work files can be built.
1. Optional: You may wish to write an exit function to report progress to the
   user during the process of finding break groups. Register your function by
   calling mp_misc_set_exit_progress() during initialization.
2. Call mp_break_set_auto() to set up automatic breaking.
3. Specify which reference file(s) to use. Call mp_input_set_num_refs() to set
   the number of reference files, then call mp_input_set_ref_num() once for
   each reference file to attach it to the session.
4. Call mp_break_find_groups() to form break groups.
Query break groups
After forming break groups, you can use queries to evaluate your break strategy
or to find duplicates one break group at a time.
Evaluate a breaking strategy
To evaluate your breaking strategy, you can get the following information:
Total number of break groups
Number of record keys in the largest break group
Break string for the largest break group
Break strings for a given break group
Break string lengths
Number of record keys found for a particular user query
When performing a query, you must set at least one break field, and the order of
the query fields must match the order of the break fields.
For example, if you broke on all five digits of ZIP Code and three characters of
street name, you could query on 1, 2, 3, 4, or 5 digits of ZIP Code, or 1, 2, 3, 4, or
5 digits of ZIP Code plus 1, 2, or 3 characters of street name. However, a query of
just 1, 2, or 3 characters of street name would not be valid.
Select a group with which to find duplicates
Break group queries allow you to select one group of records at a time with which
to subsequently find duplicates. You might take this approach to save time while
testing your matching setup, or simply to break the search job down into
manageable units.
For example, suppose you broke on five digits of ZIP and three characters of
street name. Some break groups such as “54601MAI” may contain very few keys,
so you might want to break “less fine” in order to catch more duplicates. You
could query on a three-digit ZIP such as “546”, get the query results back, and
decide whether to find duplicates in that group. If the query results said that the
“546” group contained a million keys, you would probably want to do another
query, perhaps on a five-digit ZIP such as “54601”.
You can also do multiple queries to form a group for a duplicate search. For
example, suppose you wanted to find duplicates in the “54601” break group, but you
also wanted to include the “ ” break group (blank ZIP Code). You could do two
queries and use the results to find duplicates.
When query results are acceptable, you can find duplicates by passing the
break-query identifiers to the dupe search function mp_dupesrch_break_queries().
If you use multiple queries to form a group and your query results contain
overlapping records, your dupe results may be difficult to interpret, because
you may end up comparing a record to itself.
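One way to picture a break-group query is as a prefix match against break strings, where each break string is the break-field data concatenated in break-field order. The sketch below models that idea only; the GroupInfo type, keys_matching_query(), and the sample key counts are invented for illustration and are not part of the MCD API.

```c
#include <string.h>

typedef struct {
    const char *break_string;  /* break fields concatenated in break-field order */
    long num_keys;             /* record keys in this break group */
} GroupInfo;

/* Sum the keys of every break group whose break string starts with `query`. */
static long keys_matching_query(const GroupInfo *groups, int count,
                                const char *query)
{
    size_t qlen = strlen(query);
    long total = 0;
    for (int i = 0; i < count; i++) {
        if (strncmp(groups[i].break_string, query, qlen) == 0)
            total += groups[i].num_keys;
    }
    return total;
}
```

In this model a query on “546” selects every group in that three-digit ZIP area, “54601” narrows the selection, and “54601MAI” selects a single group. A query on street name alone has no ZIP prefix to anchor it, which mirrors the rule that the first break field must be part of every query.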
Sequence of function calls
The sequence of function calls is as follows:
1. Form break groups.
2. Call mp_breakqry_get_largest_group(). It’s a quick way to detect a problem.
   If your largest break group is too large, your breaking strategy may need to be
   adjusted.
3. Get a list of all break strings, so you’ll know what to query:
   Call mp_breakqry_get_str_len() to get the break-string length. Allocate
   an array for the list.
   Call mp_breakqry_get_num_groups() to get the number of break groups
   that were found. Use this number to increment through a loop.
   Enter a loop. Inside the loop, call the function
   mp_breakqry_get_next_str() to get the next break string.
4. Call mp_breakqry_init() to set states and allocate memory for the query
   process.
5. Call mp_breakqry_set_qryfld() to select a break field and set the query value
   for this field.
6. Call mp_breakqry_do_query() to execute your query.
7. Call mp_breakqry_get_num_keys() to get the number of keys in the break
   group(s) that met your query.
8. Call mp_breakqry_term() to free memory allocated for the query.
Chapter 6:
Duplicate search and results
This chapter provides an introduction to the duplicate search and results function,
explains how Match/Consolidate (MCD) selects keys to compare, and lists the
sequence of function calls for finding and retrieving duplicates.
Introduction
This introduction provides an overview. For a more detailed discussion of the
matching process, refer to the User’s Guide to Record Matching.
Find duplicates
You can find duplicates in a batch process, which finds duplicates in all break
groups, or you can search selected break groups.
To find duplicates within all break groups in all specified reference files, use
batch mode.
To find duplicates within user-specified break groups only, use custom
break-group queries as discussed in “Query break groups” on page 44.
Query results
Match/Consolidate stores duplicate results in a work file. You can save these
results for later, or you can query them right away. Duplicate results are available
for each match level and can only be queried one match level at a time.
Use an exit routine to
override decisions
You can override a decision that the Match Library has made or will make about a
comparison.
For example, you can set a pair of keys to be duplicates or not duplicates before
the Match Library compares them. You can also override a decision after the
Match Library has already compared two keys.
If you set up a pre-comparison exit, MCD passes three pieces of information to
your exit function:
Match level
Key IDs of the records to be compared
User-settable return code
From within the exit routine, you can call the mp_dupexit_get_data() function to
retrieve user data from the desired key. You can also call the mtc_key_get_*()
functions to directly retrieve key-field data. Your function passes back a return
code to indicate the results (if any) of your match decision.
! Comparison exits are potentially expensive in terms of extra processing
time. The user exit could be called millions, even billions of times.
How Match/Consolidate selects keys to compare
The match feeder is the part of the MCD Library which selects pairs of keys from
the break groups and passes them to Match Library for duplicate detection. The
match feeder also selects which key will be the driver record, based on
break-group priority settings.
Lists determine key
pairs
Your list settings are critical in determining which key pairs will be selected and
how each pair will be compared. The list comparison settings tell the match
feeder which lists to compare within and between. The match levels and match
specifications determine which Match Library matching sessions will be used,
and how many levels of comparison will be performed.
Common format
Before passing keys to the Match Library, the match feeder puts the fields in a
common format. For example, if the first name field is 10 characters in one
reference file and 12 characters in another reference file, the match feeder selects
a common format of 12 characters for the first name field.
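As a rough illustration, the common width is the largest width among the reference files, and shorter values are brought up to that width. The sketch below assumes trailing-blank padding, which the manual does not specify; common_width() and pad_field() are hypothetical helpers, not MCD functions.

```c
#include <string.h>

/* The common width is the largest field width among the reference files. */
static size_t common_width(const size_t *widths, int count)
{
    size_t widest = 0;
    for (int i = 0; i < count; i++) {
        if (widths[i] > widest)
            widest = widths[i];
    }
    return widest;
}

/*
 * Copy src into dst, padded with trailing blanks to exactly `width`
 * characters plus a terminating NUL. dst must hold width + 1 bytes.
 * Longer input is truncated.
 */
static void pad_field(char *dst, const char *src, size_t width)
{
    size_t len = strlen(src);
    if (len > width)
        len = width;
    memcpy(dst, src, len);
    memset(dst + len, ' ', width - len);
    dst[width] = '\0';
}
```

For first-name fields of 10 and 12 characters, common_width() returns 12, and a value from either file can then be padded to 12 characters before comparison.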
Sequence of record
comparisons
The MCD Library finds duplicates by comparing records one pair at a time. To
find duplicates in a break group of just ten records, up to 45 comparisons could be
performed.
[Figure: 10 x 10 comparison grid for records 1 through 10. Legend:
Comparisons performed.
Comparisons not performed because MCD does not compare a record with itself.
Comparisons not performed because once MCD compares record 1 with record 2,
it does not compare record 2 with record 1.]
If you have more than one match level, even more comparisons are done. All
necessary comparisons are done at each match level.
Driver record
When the search begins, record #1 is called the driver record. It is compared with
each non-driver record (#2, #3, #4, and so on, to record #10). Then record #2
becomes the driver. It is compared with #3, #4, and so on. Finally, record #9
becomes the driver and is compared to record #10. Match/Consolidate uses this
process to proceed through all 45 comparisons.
Some comparisons
are skipped
Some comparisons are skipped. If a record is already tagged as a duplicate, it
loses its chance to be the driver. For example, suppose that on the very first
comparison records #1 and #2 match. In this case, record #2 is tagged as a
duplicate, and it never gets a chance to be the driver. Thus any comparisons that
would have been made with record #2 as the driver will be skipped.
We assume that if records #1 and #2 match, there is no need for further
comparisons with record #2. Suppose that the next comparison—#1 and #3—
yields a match. Then all three records would be in the same duplicate group. We
never actually compared record #2 with record #3, but we assume that if they
both match record #1, then they match each other.
[Figure: comparison grid for records 1 through 10 in which record 1 matches
records 2 and 3. Legend:
Comparisons performed.
Comparisons skipped because record 1 matches record 2.
Comparisons skipped because record 1 matches record 3.
M = Match found.
m = Match assumed because if record 1 matches record 2, and record 1 matches
record 3, then record 2 matches record 3.]
Some comparisons
are canceled
A comparison is canceled if any of the following is true:
The non-driver record has already been tagged as a duplicate.
Both records come from the same input list, and searching was deactivated
within that list.
The records are from two different lists, and searching was deactivated
between those two lists.
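The driver loop and the first cancellation rule can be simulated in a few lines. This is a sketch of the behavior described above, not the library's implementation: records are numbered from 0, the match decision is supplied as a hypothetical callback, and the list-based cancellation rules are left out.

```c
#define MAX_RECORDS 64

/* Returns nonzero when the driver and the other record are duplicates. */
typedef int (*MatchFn)(int driver, int other);

/*
 * Count the comparisons made in one break group of n records.
 * Returns -1 if n exceeds MAX_RECORDS.
 */
static int count_comparisons(int n, MatchFn match)
{
    int tagged[MAX_RECORDS] = {0};
    int comparisons = 0;

    if (n > MAX_RECORDS)
        return -1;
    for (int d = 0; d < n; d++) {
        if (tagged[d])
            continue;            /* a tagged duplicate never becomes the driver */
        for (int o = d + 1; o < n; o++) {
            if (tagged[o])
                continue;        /* canceled: non-driver is already a duplicate */
            comparisons++;
            if (match(d, o))
                tagged[o] = 1;
        }
    }
    return comparisons;
}

/* No records match: every pair is compared. */
static int never_match(int a, int b)
{
    (void)a;
    (void)b;
    return 0;
}

/* The first three records (indices 0, 1, 2) match one another. */
static int first_three_match(int a, int b)
{
    return a < 3 && b < 3;
}
```

With ten records and no matches, the loop makes all 45 comparisons. When the first three records match, records #2 and #3 are tagged during the first driver pass, never become drivers, and the total drops to 30.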
Sequence of function calls for finding and retrieving
duplicates
For sample code, refer to suppfncs.c.
Duplicate search
1. Optional: During initialization, you may set up exit functions that will be
   called during the duplicate-search process. Register your functions with
   MCD Library by calling mp_dupexit_get_data() and
   mp_dupexit_set_exit_compare(). Within your exit routine, call
   mp_dupexit_get_data() or mtc_key_get_*() to retrieve data.
2. Specify which reference file(s) to use.
   Call mp_input_set_num_refs() to set the number of reference files.
   For each reference file, call mp_input_set_ref_num() to attach the
   reference file to the session.
   If you previously specified which reference files to use (you must do this
   before you form break groups), you may get reference-file information by
   calling mp_input_get_num_refs() and mp_input_get_ref_num().
3. To search all break groups, call mp_dupesrch_batch(). This function sorts
   keys into break groups then compares records within each group.
   To search only selected break groups, do custom break-group queries and call
   mp_dupesrch_break_queries(), as discussed in “Query break groups” on
   page 44. You might take this approach to save time while testing your
   matching setup, or simply to break the search job down into manageable
   units.
Duplicate results
4. Call mp_duperes_set_match_level() to specify which match level to query.
5. Call mp_duperes_set_results_type() to specify which types of records to
   retrieve.
6. To increment through loops, you’ll need some numbers. Call
   mp_duperes_get_value() to get the number of duplicates found and the
   number of duplicate groups.
7. Call mp_duperes_get_keyid() to get the Match Library key ID so you can
   query Match Library key fields.
8. Call mp_duperes_get_value() and get the number of result keys available.
   Then, enter a loop to retrieve data from each key. Use the number of
   duplicates and the number of duplicate groups (from step 6) to increment
   through the loop.
   Call mp_duperes_set_key_num() to set the desired key.
   Call mp_duperes_get_value() to get information about matching results,
   such as which duplicate group the key belongs to.
   Call mp_duperes_get_data() to retrieve data from MCD key fields or the
   miscellaneous field in the reference-file header.
   Using the Match Library, call mtc_key_get_*() to get data from the
   Match Library key fields.
Chapter 7:
Match/Consolidate functions
This chapter describes each of the Match/Consolidate (MCD) Library functions.
mp_break_find_groups()
Synopsis
#include <mpbrkinf.h>
int mp_break_find_groups(mp_id);
MP_ID mp_id; Input: Session ID from mp_init()
Description
Call mp_break_find_groups() to find break groups according to the current user
settings. At least one reference file must be attached to the session or the key
work file must be defined. At least one break field must be defined before calling
this function. You may not call this function while finding duplicates.
If you use batch mode to search for duplicates, you do not need to call this
function. When you call mp_dupesrch_batch(), the mp_break_find_groups()
function will be called automatically.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_SYSTEM — System error
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value was specified
MP_ERR_NONCOMMON_BREAK_FIELD — Invalid break field type
MP_ERR_INV_REF_FILE — Invalid reference file
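Example
int status;
MP_ID mp_id; /* Session ID returned by mp_init() */
/* Find break groups. */
status = mp_break_find_groups(mp_id);
if (status != MP_OK) {
    /* Handle the error, for example by calling mp_get_error_number(). */
}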
mp_break_get_info()
Synopsis
#include <mpbrkinf.h>
int mp_break_get_info(mp_id, break_info, break_info_val);
MP_ID mp_id; Input: Session ID from mp_init()
int break_info; Input: Type of information (see page 56).
int *break_info_val; Output: Current setting (see page 56).
Description
Call mp_break_get_info() to query various break group setting information. For
descriptions of the types of break information you can request, and the results you
will receive, see mp_break_set_info() on page 56.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid.
Example
int status;
MP_ID mp_id; /* Session ID returned by mp_init() */
int break_info_val; /* Receives the current break mode */
/* Query the break mode setting. */
status = mp_break_get_info(mp_id, MP_BREAK_INFO_BREAK_MODE,
    &break_info_val);
mp_break_set_auto()
Synopsis
#include <mpbrkinf.h>
int mp_break_set_auto(mp_id, auto_type, tightness, density,
max_num_auto_breaks, num_user_breaks, auto_status);
MP_ID mp_id; Input: Session ID from mp_init()
int auto_type; Input: Type of automatic breaking
int tightness; Input: Matching tightness
int density; Input: Density of records
int max_num_auto_breaks; Input: Maximum number of automatic breaks
int num_user_breaks; Input: Maximum number of additional user breaks
int *auto_status; Output: Did auto breaking set any break fields?
Description
Call mp_break_set_auto() to set up automatic breaking. Match/Consolidate will
determine which fields to use for breaking based on the key layouts of the
reference files and the information you provide in this call. This function
overwrites any previous break-field settings.
After calling mp_break_set_auto(), you can query the settings it made, if any, and
set up additional break fields or modify the break fields that were set. The
maximum value for max_num_auto_breaks is MAX_AUTO_BREAK_FIELDS,
as defined in mpbrkinf.h.
Before calling this function, you must specify which reference files to use.
For more information, refer to mp_input_set_num_refs() and
mp_input_set_ref_num().
You can set the auto_type, tightness, and density. The more information you
provide, the more intelligent automatic breaking becomes.
Parameter    Values                      Description
auto_type    MP_AUTO_RESIDENT            Matching goal is to find one record per address.
             MP_AUTO_HOUSEHOLD           Matching goal is to find one record per household.
             MP_AUTO_INDIVIDUAL          Matching goal is to find one record per person.
             MP_AUTO_UNKNOWN             Matching goal is unknown (default).
tightness    MP_AUTO_TIGHT               Use strict matching when forming break groups.
             MP_AUTO_MEDIUM              Use medium matching when forming break groups.
             MP_AUTO_LOOSE               Use loose matching when forming break groups.
             MP_AUTO_UNKNOWN             Tightness of matching is unknown (default).
density      MP_AUTO_DENSITY_LOCAL       Most records are from a small regional area, for
                                         example, a city or state.
             MP_AUTO_DENSITY_NATIONAL    Records are distributed across the entire nation.
             MP_AUTO_UNKNOWN             Density of records is unknown (default).
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified
Example
int status;
MP_ID mp_id; /* Session ID returned by mp_init() */
int max_num_auto_breaks = 4; /* set no more than 4 auto break fields */
int num_user_breaks = 0; /* no additional user break fields */
int auto_status; /* receives whether auto breaking set any break fields */
/* Set up auto breaking. */
status = mp_break_set_auto(mp_id, MP_AUTO_INDIVIDUAL,
    MP_AUTO_LOOSE, MP_AUTO_DENSITY_NATIONAL, max_num_auto_breaks,
    num_user_breaks, &auto_status);
mp_break_set_info()
Synopsis
#include <mpbrkinf.h>
int mp_break_set_info(mp_id, break_info, break_info_val);
MP_ID mp_id; Input: Session ID from mp_init()
int break_info; Input: Type of information
int break_info_val; Input: Value
Description
Call mp_break_set_info() to set up normal or adaptive breaking. If you are using
automatic breaking, you do not need to call this function.
Values for break_info: MP_BREAK_INFO_NUM_FLDS
    Values for break_info_val: 0 to 16
    Description: Number of break fields set for this session.
Values for break_info: MP_BREAK_INFO_BREAK_MODE
    Values for break_info_val: MP_BREAK_MODE_NORMAL, MP_BREAK_MODE_ADAPTIVE
    Description: Break mode for this session. Default value is NORMAL.
Values for break_info: MP_BREAK_INFO_COMPBUF_SPANS
    Values for break_info_val: >0
    Description: Number of comparison-buffer spans allowed while finding
    duplicates for this session. Default value is 1.
Values for break_info: MP_BREAK_INFO_ADAPTIVE_3DG_ZIP
    Values for break_info_val: MP_TRUE, MP_FALSE
    Description: Indicates whether or not to automatically use the first three
    digits of the ZIP Code field. Applies to adaptive breaking only. Default
    value is TRUE.
Values for break_info: MP_BREAK_INFO_ADAPTIVE_MAXKEYS
    Values for break_info_val: 2 to LONG_MAX. Note that LONG_MAX is a C runtime
    library definition. If you are programming in another language, use the
    equivalent maximum for your system.
    Description: Maximum number of keys that can be combined into a single break
    group while finding adaptive break groups. If no value is specified, the
    maximum number of keys allowed is the number of keys that can fit in the
    work buffer. See mp_misc_set_option_info().
Configuration file
Rather than calling this function, you can set these options in the Break
configuration file.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified
Example
int status;
MP_ID mp_id;
/* Set the break mode to be adaptive breaking. */
status = mp_break_set_info(mp_id, MP_BREAK_INFO_BREAK_MODE,
MP_BREAK_MODE_ADAPTIVE);
mp_breakfld_get_info()
Synopsis
#include <mpbrkinf.h>
int mp_breakfld_get_info(mp_id, breakfld_num, breakfld_info,
breakfld_info_val);
MP_ID mp_id; Input: Session ID from mp_init()
int breakfld_num; Input: Break field number
int breakfld_info; Input: Type of information to get (see page 61)
int *breakfld_info_val; Output: Break field information
Description
Call mp_breakfld_get_info() to get break-field settings. For a list of the
information you can get, see mp_breakfld_set_info() on page 61.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_INVVAL — Value not valid
Example
int status;
MP_ID mp_id;
int breakfld_info_val;
/* Query the break field type of the first break field. */
status = mp_breakfld_get_info(mp_id, 1, MP_BREAKFLD_INFO_FLDTYPE,
&breakfld_info_val);
mp_breakfld_set_info()
Synopsis
#include <mpbrkinf.h>
int mp_breakfld_set_info(mp_id, breakfld_num, breakfld_info,
breakfld_info_val);
MP_ID mp_id; Input: Session ID from mp_init()
int breakfld_num; Input: Break field number
int breakfld_info; Input: Type of information to set (see table below)
int breakfld_info_val; Input: Value
Description
Call mp_breakfld_set_info() to set the following break-field information:
Value for breakfld_info      Values for breakfld_info_val          Description and valid settings
MP_BREAKFLD_INFO_FLDTYPE     For valid field types, see the        The type of Match Library key field.
                             Match Library header file mtckey.h.
MP_BREAKFLD_INFO_STARTPOS    1 to the length of the field          The starting break position in the field.
MP_BREAKFLD_INFO_FLDLEN      1 to (field length - STARTPOS + 1)    The length of the break field; in other words, the
                                                                   number of characters from the field to use for breaking.
Configuration file
Rather than calling this function, you can set these options in the Break configuration file.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified
Example
int status;
MP_ID mp_id;
/* Set the field type of the first break field. */
status = mp_breakfld_set_info(mp_id, 1,
MP_BREAKFLD_INFO_FLDTYPE,MTC_KEYFLD_ZIP);
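The STARTPOS and FLDLEN ranges listed for mp_breakfld_set_info() can be checked in your own code before you pass the values to the library. The following is a minimal sketch in plain C. Both break_range_is_valid() and copy_break_chars() are hypothetical helpers, not library functions, and they assume that positions are 1-based and that the field is a short null-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a library call): returns 1 if start_pos and
   break_len fall inside the documented ranges for a field of
   field_len characters, otherwise 0. Positions are 1-based. */
int break_range_is_valid(int field_len, int start_pos, int break_len)
{
    if (field_len < 1)
        return 0;
    if (start_pos < 1 || start_pos > field_len)
        return 0;
    if (break_len < 1 || break_len > field_len - start_pos + 1)
        return 0;
    return 1;
}

/* Hypothetical helper: copies the characters that such a setting would
   select from field into out, which holds out_size bytes including the
   terminating null. Returns the number of characters copied, or -1 if
   the range is not valid or out is too small. */
int copy_break_chars(const char *field, int start_pos, int break_len,
                     char *out, size_t out_size)
{
    int field_len;

    assert(field != NULL && out != NULL);
    field_len = (int)strlen(field);
    if (!break_range_is_valid(field_len, start_pos, break_len))
        return -1;
    if (out_size < (size_t)break_len + 1)
        return -1;
    memcpy(out, field + (start_pos - 1), (size_t)break_len);
    out[break_len] = '\0';
    return break_len;
}
```

For a five-character ZIP Code field, a start position of 1 and a length of 3 would select the first three digits, while a start position of 4 and a length of 3 would be rejected because only two characters remain.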
mp_breakqry_do_query()
Synopsis
#include <mpbquery.h>
int mp_breakqry_do_query(mp_breakqry_id);
MP_BREAKQRY_ID mp_breakqry_id; Input: Query session ID from
mp_breakqry_init()
Description
Call mp_breakqry_do_query() to execute a previously defined break-group
query. After the query has been performed, you can get the results of the query
and use the results to find duplicates.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_SYSTEM - System error
MP_ERR_IGNORED - Function called out of sequence
MP_ERR_INVVAL - Invalid value specified
MP_ERR_INV_WORK_FILE - One or more work files are corrupt
Example
int status;
MP_BREAKQRY_ID mp_breakqry_id;
/* Perform a break-group query. */
status = mp_breakqry_do_query(mp_breakqry_id);
mp_breakqry_get_largest_group()
Synopsis
#include <mpbquery.h>
int mp_breakqry_get_largest_group(mp_id, largest_bg_count, largest_bg_str);
MP_ID mp_id; Input: Session ID from mp_init()
unsigned long *largest_bg_count; Output: Size of largest break group
char *largest_bg_str; Output: Break string of largest break group
Description
Call mp_breakqry_get_largest_group() to query the size and break string of the
largest break group.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_INVFLD — Passed in a NULL pointer for largest_bg_str
Example
int status;
MP_ID mp_id;
unsigned long largest_bg_count;
char largest_bg_str[256];
/* Query the largest break group */
status = mp_breakqry_get_largest_group(mp_id, &largest_bg_count,
largest_bg_str);
mp_breakqry_get_next_str()
Synopsis
#include <mpbquery.h>
int mp_breakqry_get_next_str(mp_id, break_str_num, break_str);
MP_ID mp_id; Input: Session ID from mp_init()
int break_str_num; Input: Number of the break group whose break string to
query
char *break_str; Output: Break-string value
Description
Call mp_breakqry_get_next_str() to query the break strings of the break groups to
be used in duplicate detection. You must call mp_break_find_groups() before
calling this function.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_SYSTEM - System error
MP_ERR_IGNORED - Function called out of sequence
MP_ERR_INVVAL - Invalid value specified
MP_ERR_INV_WORK_FILE - Invalid work file encountered
Example
int status;
MP_ID mp_id;
int break_str_num = 1;
char break_str[256];
/* Get the break string of first break group */
status = mp_breakqry_get_next_str(mp_id, break_str_num, break_str);
mp_breakqry_get_num_groups()
Synopsis
#include <mpbquery.h>
int mp_breakqry_get_num_groups(mp_id, num_groups);
MP_ID mp_id; Input: Session ID from mp_init()
unsigned long *num_groups; Output: Number of break groups
Description
Call mp_breakqry_get_num_groups() to query the number of break groups to be
used in duplicate detection.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_SYSTEM — System error
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INV_WORK_FILE — Invalid work file encountered
Example
int status;
MP_ID mp_id;
unsigned long num_groups;
/* Get the number of break groups
that will be used in dupe detection */
status = mp_breakqry_get_num_groups(mp_id, &num_groups);
mp_breakqry_get_num_keys()
Synopsis
#include <mpbquery.h>
int mp_breakqry_get_num_keys(mp_breakqry_id, total_bgkeys);
MP_BREAKQRY_ID mp_breakqry_id; Input: Query session ID from
mp_breakqry_init()
unsigned long *total_bgkeys; Output: Number of keys found
Description
Call mp_breakqry_get_num_keys() to learn the number of keys found for your
break-group query. You can then decide if a query was too restrictive or not
restrictive enough. For example, if a query returns thousands of keys, you might
narrow the query before finding duplicates.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid.
Example
int status;
MP_BREAKQRY_ID mp_breakqry_id;
unsigned long total_bgkeys;
/* Get the number of keys found for a break-group query. */
status = mp_breakqry_get_num_keys(mp_breakqry_id, &total_bgkeys);
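The decision described for mp_breakqry_get_num_keys(), whether a query is too restrictive or not restrictive enough, can be written as a small check on the returned key count. This is only a sketch: classify_query_size() is a hypothetical helper, and the limits are values you would choose for your own data, not library settings.

```c
#include <assert.h>

/* Hypothetical helper, not a library call. Compares the key count
   reported by mp_breakqry_get_num_keys() with caller-chosen limits.
   Returns -1 if the query looks too restrictive, 1 if it looks too
   broad, and 0 if the count is within the limits. */
int classify_query_size(unsigned long total_bgkeys,
                        unsigned long min_keys,
                        unsigned long max_keys)
{
    assert(min_keys <= max_keys);
    if (total_bgkeys < min_keys)
        return -1;
    if (total_bgkeys > max_keys)
        return 1;
    return 0;
}
```

A result of 1 suggests narrowing the query, for example by supplying a longer break string, before you search for duplicates.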
mp_breakqry_get_str_len()
Synopsis
#include <mpbquery.h>
int mp_breakqry_get_str_len(mp_id, break_str_len);
MP_ID mp_id; Input: Session ID from mp_init()
int *break_str_len; Output: Length of break-group strings
Description
Call mp_breakqry_get_str_len() to query the length of the break strings to be
used in duplicate detection.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid.
Example
int status;
MP_ID mp_id;
int break_str_len;
/* Get the length of the break strings. */
status = mp_breakqry_get_str_len(mp_id, &break_str_len);
mp_breakqry_init()
Synopsis
#include <mpbquery.h>
int mp_breakqry_init(mp_id, mp_breakqry_id);
MP_ID mp_id; Input: Session ID from mp_init()
MP_BREAKQRY_ID *mp_breakqry_id; Output: Break-query session ID
Description
Call mp_breakqry_init() to initialize a session for querying break groups. You can
call this function multiple times to initialize multiple query sessions.
Before calling this function, you must attach at least one reference file to the
MCD Library session, have at least one break field defined, and call
mp_break_find_groups().
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_SYSTEM — System error
MP_ERR_IGNORED — Function called out of sequence
Example
int status;
MP_ID mp_id;
MP_BREAKQRY_ID mp_breakqry_id;
/* Initialize a break group query session */
status = mp_breakqry_init(mp_id, &mp_breakqry_id);
mp_breakqry_set_qryfld()
Synopsis
#include <mpbquery.h>
int mp_breakqry_set_qryfld(mp_breakqry_id, break_fld_num, break_pos,
data);
MP_BREAKQRY_ID mp_breakqry_id; Input: Query-session ID from
mp_breakqry_init()
int break_fld_num; Input: Break-field number
int break_pos; Input: Starting position within the break field
char *data; Input: Break-string value to query. Must be a null-terminated string.
Description
Call mp_breakqry_set_qryfld() to set a break field to be used as part of a
break-group query. You must set at least one break field to form a valid
break-group query. You must set query fields in the same order as the break
fields are defined. For example, if the first break field is ZIP Code, then the
only possible first query field is ZIP Code.
The break_pos must be between 1 and the break length of the break field. The
break-string value passed in as data can be the first 1 to n characters of a break
field starting at the break_pos.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified
MP_ERR_INV_QUERY — Invalid query defined
Example
int status;
MP_BREAKQRY_ID mp_breakqry_id;
int break_fld_num = 1;
int break_pos = 1;
char *data = "54601"; /* the break string to query */
/* Set a break field to use in a break group query */
status = mp_breakqry_set_qryfld(mp_breakqry_id, break_fld_num,
break_pos, data);
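The rules for break_pos and data can be illustrated outside the library. The sketch below assumes that a query field selects break strings by prefix comparison starting at break_pos, which is an interpretation of the description and not confirmed library behavior. query_prefix_matches() is a hypothetical helper that works on plain strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a library call. Returns 1 if the characters
   of break_value starting at the 1-based position break_pos begin with
   data, otherwise 0. Also returns 0 if break_pos is outside the break
   value or if data is empty or longer than the remaining characters. */
int query_prefix_matches(const char *break_value, int break_pos,
                         const char *data)
{
    size_t value_len;
    size_t data_len;

    assert(break_value != NULL && data != NULL);
    value_len = strlen(break_value);
    data_len = strlen(data);
    if (break_pos < 1 || (size_t)break_pos > value_len)
        return 0;
    if (data_len < 1 || data_len > value_len - (size_t)break_pos + 1)
        return 0;
    return strncmp(break_value + (break_pos - 1), data, data_len) == 0;
}
```

Under this assumption, the query string "546" at position 1 would select the break value "54601", and the query string "12" at position 5 would be rejected because only one character remains at that position.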
mp_breakqry_term()
Synopsis
#include <mpbquery.h>
int mp_breakqry_term(mp_breakqry_id);
MP_BREAKQRY_ID mp_breakqry_id; Input: Query session ID from
mp_breakqry_init()
Description
Call mp_breakqry_term() to terminate a break-query session and free all
information and memory associated with the break-query session.
You must call mp_breakqry_init() before calling mp_breakqry_term().
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_IGNORED — Function called out of sequence
Example
int status;
MP_BREAKQRY_ID mp_breakqry_id;
/* terminate a break group query session */
status = mp_breakqry_term(mp_breakqry_id);
mp_cfg_close()
Synopsis
#include <mpcfg.h>
int mp_cfg_close(mp_cfg_id);
MP_CFG_ID mp_cfg_id; Input: Session ID from mp_cfg_open()
Description
Call mp_cfg_close() to close a configuration-file session. Call mp_cfg_close() for
each session that was initialized with mp_cfg_open().
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_SYSTEM — System error
Example
int status;
MP_CFG_ID mp_cfg_id;
/* Close a Match/Consolidate Library configuration-file session. */
status = mp_cfg_close(mp_cfg_id);
mp_cfg_get_num_ref_files()
Synopsis
#include <mpcfg.h>
int mp_cfg_get_num_ref_files(mp_cfg_id, num_refs);
MP_CFG_ID mp_cfg_id; Input: Session ID from mp_cfg_open()
int *num_refs; Output: Number of reference files
Description
Call mp_cfg_get_num_ref_files() to query the number of reference files that were
set by the configuration-file session.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid.
Example
int status;
MP_CFG_ID mp_cfg_id;
int num_refs;
/* Query the number of reference files set by the cfg file session
*/
status = mp_cfg_get_num_ref_files(mp_cfg_id, &num_refs);
mp_cfg_get_ref_info()
Synopsis
#include <mpcfg.h>
int mp_cfg_get_ref_info(mp_cfg_id, refnum, mp_ref_id, refname,
mtc_key_id, key_misc_len, const_list_id, list_id_len);
MP_CFG_ID mp_cfg_id; Input: Session ID from mp_cfg_open()
int refnum; Input: Reference-file number
MP_REF_ID *mp_ref_id; Output: Reference session ID
char *refname; Output: Name of reference file
MTC_KEY_ID *mtc_key_id; Output: Match Library key ID for this reference
file
int *key_misc_len; Output: Length of key-miscellaneous field in file header
char *const_list_id; Output: Constant list ID from file header
int *list_id_len; Output: Length of list ID key field
Description
Call mp_cfg_get_ref_info() to get information about a reference file in a
configuration-file session.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid.
Example
int status;
MP_CFG_ID mp_cfg_id;
int refnum = 1; /* first reference file; assumes numbering starts at 1 */
MP_REF_ID mp_ref_id;
char refname[256];
MTC_KEY_ID mtc_key_id;
int key_misc_len;
char const_list_id[64];
int list_id_len;
/* Query info about the first ref set through the cfg file session.
   refnum is an input, so it is passed by value. */
status = mp_cfg_get_ref_info(mp_cfg_id, refnum, &mp_ref_id,
    refname, &mtc_key_id, &key_misc_len, const_list_id, &list_id_len);
mp_cfg_open()
Synopsis
#include <mpcfg.h>
int mp_cfg_open(mtc_cfg_id, mp_id, mp_cfg_id, cfg_file_name);
MTC_CFG_ID mtc_cfg_id; Input: Session ID from mtc_cfg_open()
MP_ID mp_id; Input: Session ID from mp_init()
MP_CFG_ID *mp_cfg_id; Output: Configuration-file session ID
char *cfg_file_name; Input: Name of Match/Consolidate "overall"
configuration file to open
Description
Call mp_cfg_open() to open a configuration-file session for the MCD Library.
Make one call to mp_cfg_open() for each session desired. You cannot call this
function unless you have successfully called mtc_cfg_open().
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_CFG_INVLINE — Invalid line in configuration file
MP_ERR_CFG_FILE_NOT_OPEN — Configuration file not open
MP_ERR_CFG_MISSING_ENTRIES — Required line missing from
configuration file
MP_ERR_CFG_REPEAT_CFG_FILE — Invalid repeated value in configuration
file
MP_ERR_CFG_MISSING_LABEL — File label missing from configuration file
MP_ERR_CFG_WRONG_ORDER — Configuration file labels in wrong order
MP_ERR_CFG_DEFAULT — Configuration file error
Example
int status;
MTC_CFG_ID mtc_cfg_id;
MP_ID mp_id;
MP_CFG_ID mp_cfg_id;
/* Open a Match/Consolidate Library cfg file session
where settings are in file named "mp.cfg" */
status = mp_cfg_open(mtc_cfg_id, mp_id, &mp_cfg_id, "mp.cfg");
mp_duperes_get_data()
Synopsis
#include <mpdupes.h>
int mp_duperes_get_data(mp_id, duperes_info, ret_res_info);
MP_ID mp_id; Input: Session ID from mp_init()
int duperes_info; Input: Key field from which to retrieve data
(see table below)
char *ret_res_info; Output: Requested data
Description
Call mp_duperes_get_data() to get data from an MCD key field.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_SYSTEM - System error
MP_ERR_IGNORED - Function called out of sequence
MP_ERR_INVFLD - Invalid field specified
MP_ERR_INVVAL - Invalid value specified
Example
int status;
MP_ID mp_id;
char ret_res_info[256];
/* Query the key miscellaneous data from a duplicate-result key */
status = mp_duperes_get_data(mp_id, MP_DUPERES_KEY_MISC,
ret_res_info);
mp_duperes_get_keyid()
Synopsis
#include <mpdupes.h>
int mp_duperes_get_keyid(mp_id, mtc_key_id);
MP_ID mp_id; Input: Session ID from mp_init()
MTC_KEY_ID *mtc_key_id; Output: Match key ID
Description
Call mp_duperes_get_keyid() to get the match-key ID needed to query the Match
Library key fields.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_SYSTEM — System error
MP_ERR_IGNORED — Function called out of sequence
Example
int status;
MP_ID mp_id;
MTC_KEY_ID mtc_key_id;
/* Get the mtc_key_id needed to query Match key fields */
status = mp_duperes_get_keyid(mp_id, &mtc_key_id);
mp_duperes_get_value()
Synopsis
#include <mpdupes.h>
int mp_duperes_get_value(mp_id, duperes_info, ret_res_info);
MP_ID mp_id; Input: Session ID from mp_init()
int duperes_info; Input: Information to retrieve (see table on next page)
unsigned long *ret_res_info; Output: Information
Description
Call mp_duperes_get_value() to query information about the duplicate results.
For a list of the information you can query, see the table below.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_SYSTEM - System error
MP_ERR_IGNORED - Function called out of sequence
MP_ERR_INVVAL - Invalid value specified
Example
int status;
MP_ID mp_id;
unsigned long ret_res_info = 0;
/* Query how many dupe results keys there are */
status = mp_duperes_get_value(mp_id, MP_DUPERES_NUM_KEYS,
    &ret_res_info);
Values for duperes_info             Description
MP_DUPERES_LARGEST_GROUP_COUNT      The total number of records in the largest duplicate group.
MP_DUPERES_NUM_DUPES                The total number of duplicates found.
MP_DUPERES_NUM_GROUPS               The total number of duplicate groups. Use this number to loop through the results.
MP_DUPERES_NUM_KEYS                 Number of result keys available for querying.
MP_DUPERES_GROUP_NUM                Group number of the duplicate group to which this record belongs.
MP_DUPERES_UNIQUE_NUM               Unique number for this record.
MP_DUPERES_GROUP_ORDER              This record's position within the duplicate group.
MP_DUPERES_GROUP_COUNT              Total number of records in the duplicate group to which this record belongs.
MP_DUPERES_DRIVER                   Was this record used as the driver record for comparisons?
MP_DUPERES_RULE_NUM                 Rule number of the rule used to make the match decision.
MP_DUPERES_RULE_SCORE               Score for the rule used to make the match decision.
MP_DUPERES_WEIGHTED_SCORE           Decision weighted score for this record.
MP_DUPERES_SUPER_COUNT              Number of super lists.
MP_DUPERES_SUPER_NUM                Super-list number of the super list to which this record belongs.
MP_DUPERES_LIST_COUNT               Number of lists.
MP_DUPERES_LIST_NUM                 List number of the list to which this record belongs.
MP_DUPERES_LIST_PRIORITY            The list-priority value for this record.
MP_DUPERES_LIST_TYPE                The type of list to which this record belongs.
MP_DUPERES_DUPE                     Is this record a subordinate duplicate?
MP_DUPERES_MASTER                   Is this record a master duplicate?
MP_DUPERES_MATCHSPEC_EXIT_CALLED    Was a user match exit called?
MP_DUPERES_COMPEXIT_DECISION        Did a user match exit make the match decision?
MP_DUPERES_MATCHSPEC_INDEX          Which match specification caused the decision?
MP_DUPERES_MATCH_LEVEL              Which match level is being queried?
mp_duperes_set_key_num()
Synopsis
#include <mpdupes.h>
int mp_duperes_set_key_num(mp_id, key_num);
MP_ID mp_id; Input: Session ID from mp_init()
unsigned long key_num; Input: Number of results key
Description
Call mp_duperes_set_key_num() to set the key number of the duplicate-results
key with which you want to work. The default is to query the first key.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_SYSTEM — System error
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified
MP_ERR_MTC_CALL — Internal Match Library error
Example
int status;
MP_ID mp_id;
unsigned long key_num = 10;
/* Query the 10th dupe results key */
status = mp_duperes_set_key_num(mp_id, key_num);
mp_duperes_set_match_level()
Synopsis
#include <mpdupes.h>
int mp_duperes_set_match_level(mp_id, level_num);
MP_ID mp_id; Input: Session ID from mp_init()
int level_num; Input: Match level number to work with
Description
Call mp_duperes_set_match_level() to set the match level at which to query
duplicate-results keys. The default is to query the first match level.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_SYSTEM — System error
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified
Example
int status;
MP_ID mp_id;
int level_num = 2;
/* Query at match level 2 */
status = mp_duperes_set_match_level(mp_id, level_num);
mp_duperes_set_results_type()
Synopsis
#include <mpdupes.h>
int mp_duperes_set_results_type(mp_id, duperes_type);
MP_ID mp_id; Input: Session ID from mp_init()
int duperes_type; Input: Type of duplicate-results keys with which to work
Description
Call mp_duperes_set_results_type() to set the type of duplicate results with
which you want to work. The default is to query duplicate keys only.
Values for duperes_type    Description
MP_DUPERES_ONLY_DUPES      Work with duplicates only (default).
MP_DUPERES_ALL_RECORDS     Work with duplicates and unique records.
Querying all records requires more processing time because there are more
records to mark.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_SYSTEM - System error
MP_ERR_IGNORED - Function called out of sequence
MP_ERR_INVVAL - Invalid value specified
Example
int status;
MP_ID mp_id;
/* Set the results keys to query to be all input keyfile records. */
status = mp_duperes_set_results_type(mp_id,
MP_DUPERES_ALL_RECORDS);
mp_dupesrch_batch()
Synopsis
#include <mpdupes.h>
int mp_dupesrch_batch(mp_id);
MP_ID mp_id; Input: Session ID from mp_init()
Description
Call mp_dupesrch_batch() to find duplicates according to all of the current user
settings. Match/Consolidate will find all duplicates in all reference files and
accumulate the results in a work file.
Before calling this function, you must attach at least one reference file to the
session (or create the key file), define all break-group settings, define at least one
list, define at least one match specification, and enable comparisons within or
between one or more lists.
You may not call mp_dupesrch_batch() while finding break groups or finding
duplicates.
The function mp_break_find_groups() will be called automatically as part of
the duplicate-detection process.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_SYSTEM - System error
MP_ERR_IGNORED - Function called out of sequence
MP_ERR_INVVAL - Invalid value was specified
MP_ERR_INV_WORK_FILE - Invalid work file encountered
MP_ERR_INV_LIST_ID - Invalid list ID encountered
MP_ERR_NONCOMMON_BREAK_FIELD - Invalid break field type
Example
int status;
MP_ID mp_id;
/* Find duplicates in all break groups. */
status = mp_dupesrch_batch(mp_id);
mp_dupesrch_break_queries()
Synopsis
#include <mpdupes.h>
int mp_dupesrch_break_queries(mp_id, num_queries, mp_breakqry_id_list);
MP_ID mp_id; Input: Session ID from mp_init()
int num_queries; Input: Number of break-group queries
MP_BREAKQRY_ID *mp_breakqry_id_list; Input: List of query IDs
Description
Call mp_dupesrch_break_queries() to find duplicates. Match/Consolidate finds
duplicates in each set of keys from the specified break-group queries, and
accumulates results in a work file.
Before calling this function, you must attach at least one reference file to the
session (or you must create the key work file). This function cannot be called
while finding break groups or finding duplicates.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_SYSTEM - System error
MP_ERR_IGNORED - Function called out of sequence
MP_ERR_INVVAL - Invalid value was specified
MP_ERR_INV_WORK_FILE - Invalid work file encountered
MP_ERR_INV_LIST_ID - Invalid list ID encountered
MP_ERR_NONCOMMON_BREAK_FIELD - Invalid break field type
Example
int status;
MP_ID mp_id;
int num_queries = 2; /* Two user queries */
MP_BREAKQRY_ID *mp_breakqry_id_list; /* List of query IDs */
/* malloc() and free() are declared in <stdlib.h> */
mp_breakqry_id_list = malloc(sizeof(MP_BREAKQRY_ID) * num_queries);
if (mp_breakqry_id_list != NULL) {
    mp_breakqry_id_list[0] = first_query_ID; /* ID returned from query */
    mp_breakqry_id_list[1] = second_query_ID; /* ID returned from query */
    /* Search for dupes on 2 groups of keys defined by break queries */
    status = mp_dupesrch_break_queries(mp_id, num_queries,
        mp_breakqry_id_list);
    free(mp_breakqry_id_list); /* release the list when finished */
}
mp_dupexit_get_data()
Synopsis
#include <mpdupes.h>
int mp_dupexit_get_data(mp_id, dupe_rec_type, datatype, ret_data);
MP_ID mp_id; Input: Session ID from mp_init()
int dupe_rec_type; Input: Record from which to get data
int datatype; Input: Type of data
char *ret_data; Output: User buffer to hold data
Description
Call mp_dupexit_get_data() from within your comparison-exit function to get
data from MCD key fields for either of the two records being matched. The data
is copied to the user’s buffer ret_data, which must be large enough to hold all the
data stored in the key field plus a NULL terminator.
Parameter        Values                       Description
dupe_rec_type    MP_DUP_DRIVER_RECORD         Driver record.
                 MP_DUP_SUBORDINATE_RECORD    Subordinate record.
datatype         MP_REF_DATA_KEY_MISC         Key miscellaneous field.
                 MP_REF_DATA_LIST_ID          List ID field.
                 MP_REF_DATA_PRIORITY_FLD     Field-priority field.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVFLD — Invalid user buffer
MP_ERR_INVVAL — Invalid value specified
Example
int status;
MP_ID mp_id;
char ret_data[255];
/* Query the key misc field of the subordinate key */
status = mp_dupexit_get_data(mp_id, MP_DUP_SUBORDINATE_RECORD,
MP_REF_DATA_KEY_MISC, ret_data);
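The buffer you pass as ret_data must hold the whole key field plus a terminating null. One way to size it is to allocate from a known field length instead of using a fixed array. The sketch below uses hypothetical helpers, alloc_field_buffer() and clear_field_buffer(), that are not part of the library. Where the length comes from is an assumption; the key_misc_len value returned by mp_cfg_get_ref_info() may be a suitable source for the key miscellaneous field.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates a zero-filled buffer with room for
   field_len characters plus the terminating null. Returns NULL if
   field_len is negative or if memory is not available. The caller
   releases the buffer with free(). */
char *alloc_field_buffer(int field_len)
{
    if (field_len < 0)
        return NULL;
    return calloc((size_t)field_len + 1, 1);
}

/* Hypothetical helper: clears the buffer before it is reused for the
   next record, so that stale data never follows a shorter value. */
void clear_field_buffer(char *buf, int field_len)
{
    assert(buf != NULL && field_len >= 0);
    memset(buf, 0, (size_t)field_len + 1);
}
```

A comparison exit can be called many times, so allocating the buffer once before the duplicate search and clearing it on each call avoids repeated allocation.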
mp_dupexit_set_exit_compare()
Synopsis
#include <mpdupes.h>
int mp_dupexit_set_exit_compare(mp_id, dupe_exit, compare_exit);
MP_ID mp_id; Input: Session ID from mp_init()
int dupe_exit; Input: Type of comparison exit (see table)
int (*compare_exit)(match_level, driver_mtc_key_id,
subordinate_mtc_key_id, dup_results)
int match_level; In: Match level
MTC_KEY_ID driver_mtc_key_id; In: Driver record key ID
MTC_KEY_ID subordinate_mtc_key_id; In: Subordinate record key ID
int *dup_results; Out: Match decision
Description
Call mp_dupexit_set_exit_compare() to set an exit routine to compare records
and decide whether or not they match. Match/Consolidate passes to you the
match level, the key ID of the driver record, and the key ID of the subordinate
record. Within your exit routine, you can call mp_dupexit_get_data() and
mtc_key_get_flddata() to retrieve data from the keys.
You can set up the following types of exits:
Values for dupe_exit          Description
MP_DUPEXIT_PRECOMP            Call your comparison exit before passing the keys to the Match Library.
MP_DUPEXIT_POSTCOMP_DUPE      Call your comparison exit if the Match Library finds two keys to be duplicates.
MP_DUPEXIT_POSTCOMP_NODUPE    Call your comparison exit if the Match Library finds that two keys are not duplicates.
Your exit routine must pass back one of the following match decisions:
Values for dup_results        Description
MP_DUP_RESULT_UNDECIDED       Undecided.
MP_DUP_RESULT_DUPE            The keys match.
MP_DUP_RESULT_NODUPE          The keys do not match.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified
Example
int status;
MP_ID mp_id;
int compare_exit(int, MTC_KEY_ID, MTC_KEY_ID, int *);
/* Set the pre compare user exit routine */
status = mp_dupexit_set_exit_compare(mp_id, MP_DUPEXIT_PRECOMP,
compare_exit);
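The decision logic inside a comparison exit can be kept in a small function that works only on data already read from the two keys. The sketch below shows one hypothetical policy: records that carry the same list ID are never treated as duplicates, and every other pair is left to the Match Library. The enum values are stand-ins so that the sketch compiles without the library headers, and decide_by_list_id() is not a library function. In a real exit routine, the two list IDs could be read with mp_dupexit_get_data() and MP_REF_DATA_LIST_ID, and the result would be mapped to MP_DUP_RESULT_NODUPE or MP_DUP_RESULT_UNDECIDED and stored through dup_results.

```c
#include <assert.h>
#include <string.h>

/* Stand-in decision codes so the sketch compiles without mpdupes.h.
   In a real exit routine, use MP_DUP_RESULT_UNDECIDED and
   MP_DUP_RESULT_NODUPE from the library header instead. */
enum sketch_decision {
    SKETCH_UNDECIDED,
    SKETCH_NODUPE
};

/* Hypothetical policy: two records from the same list are reported as
   non-duplicates; any other pair is left undecided so that the normal
   comparison still runs. */
enum sketch_decision decide_by_list_id(const char *driver_list_id,
                                       const char *subordinate_list_id)
{
    assert(driver_list_id != NULL && subordinate_list_id != NULL);
    if (strcmp(driver_list_id, subordinate_list_id) == 0)
        return SKETCH_NODUPE;
    return SKETCH_UNDECIDED;
}
```

Keeping the policy separate from the library calls makes it possible to test the decision on plain strings before registering the exit with MP_DUPEXIT_PRECOMP.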
mp_get_error_info()
Synopsis
#include <mp.h>
int mp_get_error_info(mp_id, errnum, stdmsg, detailmsg);
MP_ID mp_id; Input: Session ID from mp_init()
int *errnum; Output: Error number
char **stdmsg; Output: Pointer to buffer containing short error message
char **detailmsg; Output: Pointer to buffer containing detailed error message
Description
After an MCD function returns MP_ERROR, call mp_get_error_info() to get the
error number, a short error message, and a detailed error message. The maximum
length of an error-message string is MP_MAX_MSG_LEN, as defined in mp.h.
The message buffers may be overwritten the next time an MCD function is called.
Visual Basic
If you are programming in Visual Basic, do not call this function. Instead, call
mp_get_error_messages() and mp_get_error_number().
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid.
Example
MP_ID mp_id;
int retval, errnum;
char *stdmsg, *detailmsg;
/* call a function incorrectly to demonstrate error handling */
/* invalid session name error */
retval = mp_misc_set_work_file_info(mp_id,
MP_WORK_FILE_SESSION_NAME,
"");
/* check for an error */
if (retval != MP_OK) {
    if (retval == MP_ERROR) { /* error? */
        /* get the error information */
        retval = mp_get_error_info(mp_id, &errnum, &stdmsg,
                                   &detailmsg);
        if (retval == MP_OK) {
            /* print an error message */
            printf("mp_misc_set_work_file_info() failed: <%d><%s><%s>\n",
                   errnum, stdmsg, detailmsg);
        }
    }
    else { /* invalid id */
        printf("mp_misc_set_work_file_info() failed: Invalid ID!\n");
    }
}
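Because the message buffers may be overwritten the next time an MCD function is called, an application that wants to keep the messages (for example, to log them after cleanup calls) could copy them immediately. The sketch below shows that pattern. The struct and helper names are hypothetical. The typedef, the constant values, and the stubbed mp_get_error_info() are stand-ins so that the sketch compiles without mp.h; a real program would include mp.h and link against the library instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins so this sketch compiles without mp.h; the values are placeholders. */
typedef int MP_ID;
#define MP_OK 0
#define MP_ERROR 1
#define MP_INVID 2
#define MP_MAX_MSG_LEN 256

/* Stub with the parameter list from the synopsis. The real function is in the library. */
static int mp_get_error_info(MP_ID mp_id, int *errnum, char **stdmsg, char **detailmsg)
{
    static char short_msg[] = "stub short message";
    static char long_msg[] = "stub detailed message";
    if (mp_id < 0)
        return MP_INVID;
    *errnum = 7;
    *stdmsg = short_msg;
    *detailmsg = long_msg;
    return MP_OK;
}

/* Hypothetical container for a private copy of the error details. */
struct mp_error_snapshot {
    int errnum;
    char stdmsg[MP_MAX_MSG_LEN + 1];
    char detailmsg[MP_MAX_MSG_LEN + 1];
};

/* Copy the error number and both messages before any other library call is made. */
static int snapshot_mp_error(MP_ID mp_id, struct mp_error_snapshot *snap)
{
    char *stdmsg = NULL;
    char *detailmsg = NULL;
    int rc;

    assert(snap != NULL);
    rc = mp_get_error_info(mp_id, &snap->errnum, &stdmsg, &detailmsg);
    if (rc != MP_OK)
        return rc;

    /* snprintf truncates safely and always terminates the copy */
    snprintf(snap->stdmsg, sizeof snap->stdmsg, "%s", stdmsg ? stdmsg : "");
    snprintf(snap->detailmsg, sizeof snap->detailmsg, "%s", detailmsg ? detailmsg : "");
    return MP_OK;
}
```

The extra byte in each array is a precaution in case MP_MAX_MSG_LEN does not count the terminating null character.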
mp_get_error_messages()
Description
After an MCD function returns MP_ERROR, call mp_get_error_messages() to
retrieve the error messages. Subsequent MCD calls will reset the error messages.
The maximum length of an error-message string is MP_MAX_MSG_LEN, as
defined in mp.h.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid.
Example
MP_ID mp_id;
int retval;
char stdmsg[MP_MAX_MSG_LEN], detailmsg[MP_MAX_MSG_LEN];
/* call a function incorrectly to demonstrate error handling */
/* invalid session name error */
retval = mp_misc_set_work_file_info(mp_id,
MP_WORK_FILE_SESSION_NAME,
"");
/* check for an error */
if (retval != MP_OK) {
    if (retval == MP_ERROR) { /* error? */
        /* get the error messages */
        retval = mp_get_error_messages(mp_id, stdmsg, detailmsg);
        if (retval == MP_OK) {
            /* print an error message */
            printf("mp_misc_set_work_file_info() failed: <%s><%s>\n",
                   stdmsg, detailmsg);
        }
    }
    else { /* invalid id */
        printf("mp_misc_set_work_file_info() failed: Invalid ID!\n");
    }
}
mp_get_error_number()
Synopsis
#include <mp.h>
int mp_get_error_number(mp_id, errnum);
MP_ID mp_id; Input: Session ID from mp_init()
int *errnum; Output: Error number
Description
After an MCD function call returns MP_ERROR, call mp_get_error_number() to
get the error number. Subsequent MCD calls will reset the error.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid.
Example
MP_ID mp_id;
int retval, errnum;
/* call a function incorrectly to demonstrate error handling */
/* invalid session name error */
retval = mp_misc_set_work_file_info(mp_id,
MP_WORK_FILE_SESSION_NAME,
"");
/* check for an error */
if (retval != MP_OK) {
    if (retval == MP_ERROR) { /* error? */
        /* get the error number */
        retval = mp_get_error_number(mp_id, &errnum);
        if (retval == MP_OK) {
            /* print an error message */
            printf("mp_misc_set_work_file_info() failed: <%d>\n",
                   errnum);
        }
    }
    else { /* invalid id */
        printf("mp_misc_set_work_file_info() failed: Invalid ID!\n");
    }
}
mp_get_revision()
Synopsis
#include <mp.h>
int mp_get_revision(revision_str);
char *revision_str; Output: Revision string
Description
Call mp_get_revision() to get the version number of the MCD Library and all
associated Business Objects libraries. This function is especially useful if you
need to call us for technical support. Your buffer must be large enough to hold
the copied string. The maximum length of the string is
MP_MAX_REV_STR_LEN.
Returns
Returns MP_OK if successful; otherwise, returns MP_ERROR.
Example
int status;
char revision_str[MP_MAX_REV_STR_LEN];
/* Get the Match/Consolidate Library revision string */
status = mp_get_revision(revision_str);
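As a sketch of how the revision string might be captured for a support log, the helper below sizes its buffer from MP_MAX_REV_STR_LEN and formats one line of text. The helper name and the log wording are hypothetical. The constant values and the stubbed mp_get_revision() are stand-ins so that the sketch compiles without mp.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins so this sketch compiles without mp.h; the values are placeholders. */
#define MP_OK 0
#define MP_ERROR 1
#define MP_MAX_REV_STR_LEN 128

/* Stub with the parameter list from the synopsis. The real function is in the library. */
static int mp_get_revision(char *revision_str)
{
    strcpy(revision_str, "stub revision 0.0");
    return MP_OK;
}

/* Hypothetical helper: write one line suitable for a support log into out. */
static int format_revision_line(char *out, size_t out_size)
{
    /* one spare byte in case the limit does not count the terminator */
    char revision_str[MP_MAX_REV_STR_LEN + 1];

    assert(out != NULL && out_size > 0);
    if (mp_get_revision(revision_str) != MP_OK) {
        snprintf(out, out_size, "MCD Library revision: unavailable");
        return MP_ERROR;
    }
    snprintf(out, out_size, "MCD Library revision: %s", revision_str);
    return MP_OK;
}
```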
mp_init()
Synopsis
#include <mp.h>
int mp_init(mtc_id, mp_id);
MTC_ID mtc_id; Input: Match session ID from mtc_init()
MP_ID *mp_id; Output: Match/Consolidate session ID
Description
Call mp_init() to initialize an MCD Library session and the global data used by it.
Make one call to mp_init() for each MCD session.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Default settings could not be initialized
MP_ERR_SYSTEM — System error
MP_ERR_SRT_CALL — Sort library could not be initialized
Example
MTC_ID mtc_id;
MP_ID mp_id;
int status;
/* Initialize the Match/Consolidate Library */
status = mp_init(mtc_id, &mp_id);
mp_input_get_num_refs()
Synopsis
#include <mpinput.h>
int mp_input_get_num_refs(mp_id, num_refs);
MP_ID mp_id; Input: Session ID from mp_init()
int *num_refs; Output: Number of reference files
Description
Call mp_input_get_num_refs() to find out how many reference files will be input
into the MCD Library and used for finding break groups and duplicates.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid.
Example
int status;
MP_ID mp_id;
int num_refs;
/* Query the number of reference files currently set for input into
   the Match/Consolidate Library. */
status = mp_input_get_num_refs(mp_id, &num_refs);
mp_input_get_ref_num()
Synopsis
#include <mpinput.h>
int mp_input_get_ref_num(mp_ref_id, ref_num);
MP_REF_ID mp_ref_id; Input: Reference ID from mp_ref_open()
int *ref_num; Output: Reference-file number
Description
Call mp_input_get_ref_num() to get the file number for a reference file. This
number tells you where the file stands in the sequence of reference files that were
attached to the session by calling mp_input_set_ref_num().
If the reference ID is not found, then ref_num is set to 0 (zero).
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_IGNORED — Function called out of sequence
Example
int status;
MP_REF_ID mp_ref_id;
int ref_num;
/* Get reference file number from the mp_ref_id. */
status = mp_input_get_ref_num(mp_ref_id, &ref_num);
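Because a reference ID that is not found is reported through ref_num rather than through the return status, a caller may want to check both. The sketch below assumes that the call still returns MP_OK in the not-found case; this page does not confirm that. The helper name is hypothetical, and the typedef, the MP_OK value, and the stubbed mp_input_get_ref_num() are stand-ins so that the sketch compiles without mpinput.h.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so this sketch compiles without mpinput.h; the values are placeholders. */
typedef int MP_REF_ID;
#define MP_OK 0

/* Stub: pretends that reference ID 5 is file number 2 and knows no other IDs. */
static int mp_input_get_ref_num(MP_REF_ID mp_ref_id, int *ref_num)
{
    *ref_num = (mp_ref_id == 5) ? 2 : 0;
    return MP_OK;
}

/* Hypothetical helper.
   Returns 1 and stores the file number if the reference file is attached,
   0 if the reference ID was not found, and -1 if the library call failed. */
static int reference_position(MP_REF_ID mp_ref_id, int *position)
{
    int ref_num = 0;

    assert(position != NULL);
    if (mp_input_get_ref_num(mp_ref_id, &ref_num) != MP_OK)
        return -1;
    if (ref_num == 0)
        return 0;          /* not found: leave *position untouched */
    *position = ref_num;
    return 1;
}
```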
mp_input_set_num_refs()
Synopsis
#include <mpinput.h>
int mp_input_set_num_refs(mp_id, num_refs);
MP_ID mp_id; Input: Session ID from mp_init()
int num_refs; Input: Number of reference files (1–255)
Description
Call mp_input_set_num_refs() to set the number of reference files that you will
use as input to the MCD Library for this session. You must call
mp_input_set_num_refs() before finding break groups and before finding
duplicates.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_SYSTEM — System error
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified
Example
int status;
MP_ID mp_id;
int num_refs = 1;
/* Set the number of reference files that will be input into the
   Match/Consolidate Library. */
status = mp_input_set_num_refs(mp_id, num_refs);
mp_input_set_ref_num()
Synopsis
#include <mpinput.h>
int mp_input_set_ref_num(mp_ref_id, ref_num);
MP_REF_ID mp_ref_id; Input: Reference ID from mp_ref_open()
int ref_num; Input: Reference-file number
Description
Call mp_input_set_ref_num() to attach a reference file to the session. You must
set at least one reference file before finding break groups or finding duplicates.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified
Example
int status;
MP_REF_ID mp_ref_id;
int ref_num = 1;
/* Input a reference file into a Match/Consolidate Library session. */
status = mp_input_set_ref_num(mp_ref_id, ref_num);
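The two input calls can be combined as in the sketch below, which declares the number of reference files and then attaches each one in sequence. It rests on two assumptions that this page does not confirm: that mp_input_set_num_refs() is called before mp_input_set_ref_num(), and that reference-file numbers start at 1. The helper name is hypothetical. The typedefs, the constant values, and both stubbed library functions are stand-ins so that the sketch compiles without mpinput.h.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so this sketch compiles without mpinput.h; the values are placeholders. */
typedef int MP_ID;
typedef int MP_REF_ID;
#define MP_OK 0
#define MP_ERROR 1
#define MP_INVID 2

/* Stub state: remembers what the caller configured. */
static int stub_num_refs;
static int stub_ref_nums[256];

/* Stubs with the parameter lists from the synopses. The real functions are in the library. */
static int mp_input_set_num_refs(MP_ID mp_id, int num_refs)
{
    (void)mp_id;
    if (num_refs < 1 || num_refs > 255)   /* documented range is 1 to 255 */
        return MP_ERROR;
    stub_num_refs = num_refs;
    return MP_OK;
}

static int mp_input_set_ref_num(MP_REF_ID mp_ref_id, int ref_num)
{
    if (mp_ref_id < 0 || mp_ref_id >= 256)
        return MP_INVID;
    if (ref_num < 1 || ref_num > stub_num_refs)
        return MP_ERROR;
    stub_ref_nums[mp_ref_id] = ref_num;
    return MP_OK;
}

/* Hypothetical helper: attach count reference files, numbered 1..count in array order.
   Stops at the first failure and returns that status unchanged. */
static int attach_reference_files(MP_ID mp_id, const MP_REF_ID *ref_ids, int count)
{
    int i;
    int rc;

    assert(ref_ids != NULL);
    rc = mp_input_set_num_refs(mp_id, count);
    if (rc != MP_OK)
        return rc;
    for (i = 0; i < count; i++) {
        rc = mp_input_set_ref_num(ref_ids[i], i + 1);
        if (rc != MP_OK)
            return rc;
    }
    return MP_OK;
}
```

Returning the library status unchanged lets the caller apply the usual MP_ERROR and MP_INVID handling to the helper's result.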
mp_list_get_default_action()
Synopsis
#include <mplist.h>
int mp_list_get_default_action(mp_id, default_action);
MP_ID mp_id; Input: Session ID from mp_init()
int *default_action; Output: Default list action (see page 110).
Description
Call mp_list_get_default_action() to query the default list action. The default list
action determines what action is taken when a record does not belong to any of
your defined lists. For a list of default-action values, see page 110.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid.
Example
int status;
MP_ID mp_id;
int default_action;
/* Query the default list action */
status = mp_list_get_default_action(mp_id, &default_action);
mp_list_get_default_list_num()
Synopsis
#include <mplist.h>
int mp_list_get_default_list_num(mp_id, list_num);
MP_ID mp_id; Input: Session ID from mp_init()
int *list_num; Output: List number of default list
Description
Call mp_list_get_default_list_num() to find out which list is the default list.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid.
Example
int status;
MP_ID mp_id;
int list_num;
/* Query the default list number */
status = mp_list_get_default_list_num(mp_id, &list_num);
mp_list_get_list_attr()
Synopsis
#include <mplist.h>
int mp_list_get_list_attr(mp_id, list_num, list_attr_type, list_attr_value);
MP_ID mp_id; Input: Session ID from mp_init()
int list_num; Input: List number
int list_attr_type; Input: Type of list attribute (see the table on page 113).
int *list_attr_value; Output: Current setting for this list attribute (see page 113).
Description
Call mp_list_get_list_attr() to get the current setting for a particular list attribute.
For a list of the values for list_attr_type and list_attr_value, see page 113.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_INVVAL — Invalid value specified
Example
int status;
MP_ID mp_id;
int list_num = 1;
int list_attr_value;
/* Query the list type of this list */
status = mp_list_get_list_attr(mp_id, list_num, MP_LIST_ATTR_TYPE,
&list_attr_value);
mp_list_get_list_id()
Synopsis
#include <mplist.h>
int mp_list_get_list_id(mp_id, list_num, list_id);
MP_ID mp_id; Input: Session ID from mp_init()
int list_num; Input: List number
char *list_id; Output: List ID
Description
Call mp_list_get_list_id() to get the list ID for a particular list. You must provide
enough space to hold the entire list ID value plus a NULL terminator.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_INVVAL — Invalid value specified
Example
int status;
MP_ID mp_id;
int list_num = 1;
char list_id[256];
/* Query the list_ID of list number list_num. */
status = mp_list_get_list_id(mp_id, list_num, list_id);
mp_list_get_list_name()
Synopsis
#include <mplist.h>
int mp_list_get_list_name(mp_id, list_num, list_name);
MP_ID mp_id; Input: Session ID from mp_init()
int list_num; Input: List number
char *list_name; Output: Name of list
Description
Call mp_list_get_list_name() to get the name of a list. You must provide enough
space to hold the entire list name plus a NULL terminator. The maximum
list-name length is MP_MAX_LIST_NAME_LEN.
Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid;
otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info()
or mp_get_error_number() to retrieve one of the following possible error
numbers:
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified
Example
int status;
MP_ID mp_id;
int list_num = 1;
char list_name[MP_MAX_LIST_NAME_LEN + 1];
/* Query the list name of list number list_num. */
status = mp_list_get_list_name(mp_id, list_num, list_name);