BusinessObjects, Crystal Reports®, SAP BusinessObjects Rapid Mart™, SAP
BusinessObjects Data Insight™, SAP BusinessObjects Desktop Intelligence™, SAP
BusinessObjects Rapid Marts®, SAP BusinessObjects Watchlist Security™, SAP
BusinessObjects Web Intelligence®, and Xcelsius® are trademarks or registered trademarks of
Business Objects, an SAP company and/or affiliated companies in the United States and/or
other countries. SAP® is a registered trademark of SAP AG in Germany and/or other countries.
All other names mentioned herein may be trademarks of their respective owners.
Match/Consolidate Library Reference
Preface
About Match/Consolidate Library
This manual is a reference for programmers working with the Match/Consolidate Library. It explains how to make your application work with the library.
Each chapter contains explanations, code examples, call sequences, and reference pages about each of the function calls.
In this manual, we assume that you are already familiar with your programming language, your operating system, and concepts of database management.
Conventions
This document follows these conventions:
Convention          Description
Bold                We use bold type for file names, paths, emphasis, and text that you should type exactly as shown. For example, “Type cd\dirs.”
Italics             We use italics for emphasis and text for which you should substitute your own data or values. For example, “Type a name for your file, and the .txt extension (testfile.txt).”
Menu commands       We indicate commands that you choose from menus in the following format: Menu Name > Command Name. For example, “Choose File > New.”
!                   We use this symbol to alert you to important information and potential problems.
[special-case icon] We use this symbol to point out special cases that you should know about.
[tip icon]          We use this symbol to draw your attention to tips that may be useful to you.
Documentation
Other documentation
Documents related to this manual include the following:
Document                                            Description
System Administrator’s Guide                        Explains how to install your software.
Database Prep                                       Explains how to prepare input files for processing, including how to create DEF, FMT, and DMT files.
Match/Consolidate User’s Guide to Record Matching   Explains the concepts behind name and address matching software and provides examples of how to implement, analyze, and fine-tune match detection strategies for the best results.
Match/Consolidate Extended Matching Reference       Contains the operational how-to instructions for setting up extended matching.
Match Library Programmer’s Reference                This is a reference manual for the Match Library.
Quick Reference                                     Contains descriptions of the input and output fields, and the command line for the Match/Consolidate job file.
Access the latest documentation
You can access documentation in several places:
On your computer. Release notes, manuals, and other documents for each product that you’ve installed are available in the Documentation folder. Choose Start > Programs > Business Objects Applications > Documentation.
On the SAP Service Market Place. Go to http://help.sap.com, and then click the Business Objects tab. Here, you can search for your products’ documentation.
Chapter 1: Install, compile, and link
This chapter provides information about installing the software, compiling your
application, and linking the compiled application with the Match/Consolidate
(MCD) libraries for Windows and UNIX systems.
Build on Windows
We provide 32-bit dynamically-linked libraries (DLLs), which you can use with
Windows NT 4.0, Windows 2000 Professional, Windows XP, and Windows 2000
Advanced Server.
Installation
If you need installation information, refer to the System Administrator’s Guide. Refer to pw\mplib for the MCD library.
Compilers and programming languages
See InfoSource for the latest compiler information.
Additional compilation flags
Beginning with Match/Consolidate release 7.31c, additional compilation flags are required. These flags are platform specific. For example, on Windows, add the following: /D "FL_WIN32"
For more information, see the sample build scripts.
Create sample programs
We provide a build file, read_dll.me, for the sample programs. Refer to this file for step-by-step instructions for creating a Visual C++ project to build the samples.
Build on UNIX
Installation
If you need installation information, refer to the System Administrator’s Guide. Refer to …/postware/mplib for the MCD library.
Compiler
See InfoSource for the latest compiler information.
Additional compilation flags
Beginning with Match/Consolidate release 7.31c, additional compilation flags are required. These flags are platform specific. For example, on Solaris 64-bit, add the following: -DFL_UNIX -DFL_UNIX_SOL
For more information, see the sample build scripts.
Create sample programs
Refer to buildmp for examples of how to build the sample programs. Modify this file to set your include and library directories.
Chapter 2: Overview of the Match/Consolidate Library
This chapter provides information about the before-and-after of record matching
and the basic steps for working with Match/Consolidate (MCD) Library. It also
provides information about configuration files, sample programs, error handling
and progress callbacks, work files, and function calls.
Before-and-after of record matching
The MCD Library is a companion to Match Library. Match Library compares two
records and determines whether or not they match. MCD Library works with the
before-and-after of record matching.
[Figure: The MCD Library selects a pair of records to compare, the Match Library compares the records, and the MCD Library uses the results.]
Condense records into essential data
Theoretically, you could compare each complete, original record with every other
complete, original record, but that would take a very long time. To save time,
you’ll condense records into the data essential for matching and select for
comparison only records that have a reasonable chance of matching.
When you compare records, you’ll decide which data must match—and how
closely—for records to be considered a match. Theoretically, you could use your
complete, original records for comparisons, but such comparisons would be
prohibitively slow.
To make matching more efficient, use our Match Library to condense records so
that they contain only the data needed during the matching process. These
condensed records are called keys.
Original record          Key
FirstName: JoAnne
LastName: MacKiewicz     LastName: MacKiewicz
Str_Range: 1             Str_Range: 1
Str_Name: Brook          Str_Name: Brook
Str_Suffix: Rd           Str_Suffix: Rd
City: Leeds
State: MA
ZIP: 01053               ZIP: 01053
Use the MCD Library to collect keys into a file called a reference file. The
reference file contains the Match Library key data, plus additional information—
such as list identifiers—that you can add when you build the reference file.
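To make the idea of condensing concrete, here is a minimal sketch in C. It does not use the Match Library, which is what actually defines and builds keys. The demo_record and demo_key structures, the copy_field() and condense() functions, and all field sizes are hypothetical and exist only for this illustration. The key keeps the same fields as the example table above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical layout of a complete, original record (illustration only). */
typedef struct {
    char first_name[32];
    char last_name[32];
    char str_range[8];
    char str_name[32];
    char str_suffix[8];
    char city[32];
    char state[4];
    char zip[8];
} demo_record;

/* Hypothetical key: only the fields retained for matching. */
typedef struct {
    char last_name[32];
    char str_range[8];
    char str_name[32];
    char str_suffix[8];
    char zip[8];
} demo_key;

/* Copy one field, always leaving the destination NUL-terminated. */
static void copy_field(char *dst, size_t dst_size, const char *src)
{
    snprintf(dst, dst_size, "%s", src);
}

/* Build a condensed key from a full record. */
static demo_key condense(const demo_record *rec)
{
    demo_key key;
    memset(&key, 0, sizeof key);
    copy_field(key.last_name, sizeof key.last_name, rec->last_name);
    copy_field(key.str_range, sizeof key.str_range, rec->str_range);
    copy_field(key.str_name, sizeof key.str_name, rec->str_name);
    copy_field(key.str_suffix, sizeof key.str_suffix, rec->str_suffix);
    copy_field(key.zip, sizeof key.zip, rec->zip);
    return key;
}
```

Because the key drops the fields that play no part in matching, it is smaller than the record it came from, which is what makes later comparisons cheaper.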
Eliminate needless comparisons
Theoretically, you could compare every record to every other record, but that would take a long time. And many comparisons wouldn't make much sense. For example, a record with a ZIP Code of 01234 does not match a record with a ZIP Code of 98765, so it doesn't pay to compare them at all.
To eliminate comparisons between records that are unlikely to match, you can separate records into clusters called break groups. For example, you could separate records by ZIP Code and look for matches only within each ZIP Code group.
Forming break groups eliminates a huge number of unnecessary comparisons. For a more in-depth discussion, including examples showing how many comparisons can be eliminated with a minimal effect on results, refer to the User's Guide to Record Matching.
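The size of the saving can be estimated with simple arithmetic. The sketch below is not part of the MCD Library; pair_count() and grouped_pair_count() are hypothetical helper names. It counts the unordered pairs among n records, n(n-1)/2, and compares that with the total when pairs are formed only inside each break group.

```c
#include <stddef.h>

/* Number of unordered pairs among n records: n * (n - 1) / 2. */
static unsigned long long pair_count(unsigned long long n)
{
    return n < 2 ? 0 : n * (n - 1) / 2;
}

/* Total comparisons when records are compared only within their own
   break group. group_sizes[i] is the record count of break group i. */
static unsigned long long grouped_pair_count(const unsigned long long *group_sizes,
                                             size_t group_count)
{
    unsigned long long total = 0;
    for (size_t i = 0; i < group_count; i++)
        total += pair_count(group_sizes[i]);
    return total;
}
```

For example, 1,000 records compared exhaustively need 499,500 comparisons. Split evenly into four break groups of 250 records, they need 4 x 31,125 = 124,500 comparisons.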
You can also use lists to eliminate unnecessary comparisons. For example, if you
have a database which you know contains no duplicate records, you can assign all
the records in that file to a list, and tell MCD to cancel comparisons between
records within that list.
Use different matching rules for different lists
To control how records are compared, you can assign each record to a list. A list is a group of records that have some common characteristic: perhaps all of the records come from the same input database, or all have a common demographic code.
You can then use different matching rules with different lists. For example, the matching rules for comparisons within List A could be different from the matching rules for comparisons between List A and List B. When you pass a pair of records to the match engine, you can use list membership to dictate which match rules are used for the comparison.
Form groups of duplicate records
The MCD Library interprets the matching results and tracks information relevant to the duplicate-detection process. You can query these results and then handle the information however you choose.
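One way to picture list-dependent rule selection is a small lookup table indexed by the lists of the two records being compared. This is only a sketch under stated assumptions: the table, the rule-set numbers, and the select_rule_set() function are hypothetical and are not MCD Library calls. In this example, comparisons within List B are cancelled, as you might do for a list known to contain no duplicates.

```c
/* Hypothetical list indexes and special return values (illustration only). */
enum { DEMO_LIST_A = 0, DEMO_LIST_B = 1, DEMO_LIST_COUNT = 2 };
enum { DEMO_NO_COMPARE = -1, DEMO_BAD_LIST = -2 };

/* demo_rule_table[x][y] is the rule set used when a record from list x is
   compared with a record from list y. The table is symmetric. */
static const int demo_rule_table[DEMO_LIST_COUNT][DEMO_LIST_COUNT] = {
    /* A vs A, A vs B */ { 0, 1 },
    /* B vs A, B vs B */ { 1, DEMO_NO_COMPARE },
};

/* Return the rule set for a pair of records, DEMO_NO_COMPARE if the pair
   should be skipped, or DEMO_BAD_LIST if either list index is invalid. */
static int select_rule_set(int list_of_first, int list_of_second)
{
    if (list_of_first < 0 || list_of_first >= DEMO_LIST_COUNT ||
        list_of_second < 0 || list_of_second >= DEMO_LIST_COUNT)
        return DEMO_BAD_LIST;
    return demo_rule_table[list_of_first][list_of_second];
}
```

Keeping the decision in one table means a change in strategy touches data rather than control flow.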
Work with the Match/Consolidate Library
When you work with the MCD Library, you’ll follow these basic steps:
1. Create a file that contains only the record data needed for matching. Such a file is called a reference file, and the condensed records are called keys.
2. Define logical groups of records called lists. You’ll do this at the same time you create your reference files by putting a list identifier into each reference file or into each individual record key.
3. Separate the record keys into break groups. For example, you might separate records by the first three digits of the ZIP Code.
4. Within each break group, select pairs of record keys and pass them to the match engine. If you have more than one set of match rules, you can control which set of rules to use for each pair of records.
5. After the match engine compares the record keys, query the results.
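Steps 3 and 4 can be sketched in plain C without the library. The code below is an illustration under stated assumptions, not the MCD Library's implementation: demo_key, by_break(), and for_each_pair_in_break_groups() are hypothetical names, and the break value is assumed to be the first three characters of the ZIP Code. It sorts the keys by break value, then visits every pair inside each break group.

```c
#include <stdlib.h>
#include <string.h>

#define BREAK_LEN 3  /* assumed: break on the first three ZIP digits */

/* Hypothetical condensed key (illustration only). */
typedef struct {
    char id;
    char zip[6];
} demo_key;

/* Callback invoked once for each pair that would go to the match engine. */
typedef void (*pair_fn)(const demo_key *first, const demo_key *second, void *ctx);

/* Order keys by their break value. */
static int by_break(const void *a, const void *b)
{
    const demo_key *ka = a;
    const demo_key *kb = b;
    return strncmp(ka->zip, kb->zip, BREAK_LEN);
}

/* Sort keys into break groups, visit each pair within a group, and
   return the number of pairs visited. */
static size_t for_each_pair_in_break_groups(demo_key *keys, size_t count,
                                            pair_fn visit, void *ctx)
{
    size_t pairs = 0;
    qsort(keys, count, sizeof keys[0], by_break);
    for (size_t start = 0; start < count; ) {
        size_t end = start + 1;
        while (end < count && by_break(&keys[start], &keys[end]) == 0)
            end++;  /* extend the current break group */
        for (size_t i = start; i < end; i++) {
            for (size_t j = i + 1; j < end; j++) {
                if (visit != NULL)
                    visit(&keys[i], &keys[j], ctx);
                pairs++;
            }
        }
        start = end;
    }
    return pairs;
}
```

With five keys whose ZIP Codes begin with 010, 010, 010, 987, and 987, the function visits four pairs instead of the ten that an exhaustive comparison would need.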
[Figure: Keys from the reference files (List 1 and List 2) are separated into break groups. Within a break group, key pairs (for example, Key F and Key H) are passed to the Match Library for comparison, producing a result such as “Not duplicate.”]
Configuration files make setup faster and easier
Match/Consolidate Library offers external text configuration files that contain
some of the settings and options to be used for your MCD session. Set the
parameters of the text configuration files, then call mp_cfg_open(). This function
sets your options and returns a session handle.
Advantages of configuration files
Some advantages to using external text configuration files include the following:
Rather than making dozens of calls to set up a session handle, you make one
call to mp_cfg_open(). Compared with making direct calls to the
conventional API, this should reduce application code size, training time,
development time, errors, and testing time.
You can change breaking strategy, list definitions, and reference file setup
simply by editing the configuration files. Therefore, you can test different
scenarios without changing API calls and rebuilding.
With small modifications to your code, you can use one code base to support
many different MCD scenarios. For each scenario, make a different set of
text configuration files.
Configuration files are heavily commented so that users can edit them
without referring to printed documentation. In fact, all the potential options
and settings are included in the comments, so making additions and changes
to the configuration can be mostly copy-and-paste.
Preset MCD strategy
We include a set of configuration files already set for the most commonly used MCD strategy. Most users will be able to use these without further editing or modification. Just specify the location and file name of the mp.cfg file in your mp_cfg_open() call.
Five configuration files
Match/Consolidate Library configuration parameters are distributed among five configuration files, as described below.
Configuration file   File name     Description
Overall              mp.cfg        Specifies the paths and file names of the other configuration files.
Reference            mpref.cfg     Controls the format of the reference file, which contains the record keys.
List                 mplist.cfg    Defines logical groups of records.
Break                mpbreak.cfg   Identifies which fields to use to form break groups, and which characters in those fields to use for breaking.
Miscellaneous        mpmisc.cfg    Sets miscellaneous options such as the work-file directory.
Sample programs
We provide three sample programs and their source code.
Program      Description
reftest.c    Demonstrates how to create and load a reference file. In this sample program, data is read from a sample ASCII data file.
mptest.c     Demonstrates how to open a previously created reference file, set up lists and matching, set up progress callbacks, find break groups, find duplicates, and query duplicate results.
mptest2.c    Demonstrates how to use the MCD Library configuration files.
Modules
The sample programs use the following modules.
Module       Description
suppfncs.c   Contains supporting functions used by all of the sample programs.
errfncs.c    Demonstrates error handling.
Build the sample programs
For instructions on building the sample programs, see “Install, compile, and link” on page 9.
Error handling and progress callback
Nearly every function in the MCD Library returns a status code. It is important to
check the status code after every MCD Library function call. If MCD returns the
value MP_ERROR, your application must decide whether to exit or not, what
information to display, and so forth.
Error handler
The easiest way to handle error checking is to set up an error-handling function. Call your error handler whenever a function does not return MP_OK.
Inside your error handler, you can call mp_get_error_number() and
mp_get_error_messages() to obtain more information about the failed function
call.
You can call mp_get_error_info() rather than mp_get_error_number() and
mp_get_error_messages(). However, mp_get_error_info() may not work with
Visual Basic.
For sample code, refer to the errfncs.c module in the samples subdirectory.
One error handler for MCD and Match
If you want to use just one error handler for MCD and Match Library function
calls, you need to detect whether the current error is MCD or Match Library to
determine which get_error_info() function to call. You can do this by checking
the prefix of the error-causing function (mp or mtc), or by setting a variable and
passing its value as a parameter when calling your error handler.
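The prefix check can be sketched as follows. This is a minimal illustration, not code from errfncs.c: the error_source type and both functions are hypothetical, and the handler returns a descriptive label where a real handler would call the error-query function of the library it identified.

```c
#include <string.h>

/* Hypothetical tag for the library that reported the error. */
typedef enum { ERR_SRC_UNKNOWN, ERR_SRC_MCD, ERR_SRC_MATCH } error_source;

/* Classify a failed call by the prefix of its function name:
   "mtc" for Match Library, "mp" for MCD Library. */
static error_source source_from_function_name(const char *function_name)
{
    if (function_name == NULL)
        return ERR_SRC_UNKNOWN;
    if (strncmp(function_name, "mtc", 3) == 0)
        return ERR_SRC_MATCH;
    if (strncmp(function_name, "mp", 2) == 0)
        return ERR_SRC_MCD;
    return ERR_SRC_UNKNOWN;
}

/* Shared handler: decide which library's error information to query.
   The returned text stands in for the real query call. */
static const char *shared_error_handler(const char *function_name)
{
    switch (source_from_function_name(function_name)) {
    case ERR_SRC_MCD:
        return "query MCD Library error info";
    case ERR_SRC_MATCH:
        return "query Match Library error info";
    default:
        return "unknown error source";
    }
}
```

Passing an explicit error_source value to the handler instead of a function name is the second approach described above, and it avoids string comparisons altogether.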
Progress callbacks
Certain MCD Library processing steps, such as breaking and finding duplicates, work with multiple records and potentially involve I/O and sorting. If you wish to display progress information for MCD Library processing steps, you can set up progress callbacks for the desired processing steps.
For sample code, refer to the suppfncs.c module in the samples subdirectory.
Work files
During the processes of creating break groups and finding duplicates, MCD
creates work files. You can specify where to store these files, what to call them,
and whether to delete them automatically.
Work files for break groups
When finding break groups, MCD creates work files. By default, these files are
stored in the current directory. If you’d prefer to store them somewhere else, you
can specify a work directory.
You must specify a work-file session name. MCD uses the session name as the
base file name for work files. For example, if you assign a session name of “test,”
work files will have names such as test.dbk.
Work files for duplicate results
When finding duplicates, MCD builds two additional work files—one for
duplicate results, and one for key-file information. These two files can become
quite large, so you may want to store them in a separate location. You can specify
the path and file names for each of these two work files.
Sample code
To set work-file options, call mp_misc_set_work_file_info(). For sample code, refer to the suppfncs.c module in the samples subdirectory.
Function calls for initialization and termination
You may use some or all of the functionality of the MCD Library. You’ll call
these functions for any MCD application.
1. Call mp_init() to initialize the MCD Library.
   Initialize the Match Library before initializing the MCD Library. For more information, refer to your Match Library documentation.
2. If you are not using configuration files, call mp_set_keyfile() to set the path and file name of the installation key file, mplib.key. You must specify the location of this file before calling any other MCD functions.
   This key file is not related to the record keys stored in your reference file. This key file “unlocks” the library for use.
3. If you are using configuration files, call mp_cfg_open(). For details about configuration files, see “Configuration files make setup faster and easier” on page 17.
4. Set up an error-handling function. Inside your function, call mp_get_error_info(). For more information, see “Error handling and progress callback” on page 20.
5. Optional: Call mp_get_revision() to get version information for the MCD and Match Libraries. You will need this information before calling for technical support.
6. If you are not using configuration files, call the following functions to set miscellaneous options and work-file information:
   mp_misc_set_option_info()
   mp_misc_set_work_file_info()
7. If you want to establish exit functions (for example, to report progress at crucial points during processing), call mp_misc_set_exit_progress() or mp_misc_set_exit_blank_field() to register your exit functions with the MCD Library.
8. Perform processing. A complete, end-to-end job process would involve the following major steps:
   Create, open, or update reference files (Chapter 3).
   Create lists and super lists (Chapter 4).
   Form break groups (Chapter 5).
   Search for duplicates and retrieve results (Chapter 6).
9. If you are using configuration files, call mp_cfg_close().
10. Call mp_term() to terminate the MCD Library and free global memory allocated for it.
Chapter 3: Reference files
This chapter introduces reference files and provides information about creating
reference files, adding keys to a reference file, and maintaining reference files.
Introduction to reference files
A reference file is a specialized work file that contains record keys. The record
keys are condensed versions of your original record data, containing only the data
needed to form break groups, determine list membership, perform matching, and
rank records within duplicate groups.
You’ll use the Match Library to define most of the key layout—namely, which
record data to include in the key. When you use MCD to create a reference file,
you’ll tell MCD which match key to use, and you might add useful MCD key
fields to each key.
Header and keys
A reference file consists of header data and record keys.
The header contains information about the key layout. You can also place a list identifier and other user data in the header.
The record keys contain the record data that is needed to form break groups and perform matching; you use the Match Library to define this portion of the key. When you create a reference file, you can also add a record-mapping field, record-priority field, and list identification field to each key.
Header: Key layout, user “miscellaneous” field, list ID (if constant)
Keys: Overhead data, record data, record-mapping field, record-priority field, list ID field (if variable)
Temporary or reusable
A reference file can be used as a temporary work space. For example, if some of
your duplicate-detection data comes from a transaction file which is always
changing, you can create a temporary reference file from your transaction data
each time you want to find duplicates.
Alternatively, once you generate keys and store them in a reference file, you can
save the reference file and add, delete, update, and reuse the keys. However,
many programmers find it easier to regenerate the reference file each time they
find duplicates, rather than maintaining and updating keys in an existing
reference file.
Key data
A key contains record data plus other data such as a list identifier or record-mapping information. Keys must include all of the data you want to use to form break groups and perform matching.
You’ll use the Match Library to define the layout and content of the key. When you use the MCD Library to create the reference file, you’ll specify which match key to use. Optionally, you can add the MCD key fields listed below.
Key field           Description
List ID             Defines to which list a key belongs.
Field priority      Defines a record’s priority within a group of duplicate records.
Key miscellaneous   Stores user information such as a record identifier that allows you to map a key back to a particular record in your database.
Key length
Each key consists of overhead data, Match Library key fields, and MCD Library key fields. To calculate the length of each key, find the sum of the following items:
The size of the overhead data (16 bytes per key).
The length of each Match Library key field, including any alternates.
The length of each MCD key field.
Match/Consolidate pads each key so that the total key length is divisible by eight.
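The calculation above can be sketched in C. This is an illustration, not a library function: sum_lengths() and padded_key_length() are hypothetical names, field lengths are assumed to be expressed in bytes, and padding is assumed to round the total up to the next multiple of eight.

```c
#include <stddef.h>

enum { KEY_OVERHEAD_BYTES = 16, KEY_ALIGNMENT = 8 };

/* Add up an array of field lengths. */
static size_t sum_lengths(const size_t *lengths, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += lengths[i];
    return total;
}

/* Overhead plus Match Library fields (including alternates) plus MCD
   fields, rounded up so the result is divisible by eight. */
static size_t padded_key_length(const size_t *match_field_lengths, size_t match_count,
                                const size_t *mcd_field_lengths, size_t mcd_count)
{
    size_t raw = KEY_OVERHEAD_BYTES
               + sum_lengths(match_field_lengths, match_count)
               + sum_lengths(mcd_field_lengths, mcd_count);
    return (raw + KEY_ALIGNMENT - 1) / KEY_ALIGNMENT * KEY_ALIGNMENT;
}
```

For example, Match Library fields of 20 and 5 bytes plus one 4-byte MCD field give 16 + 25 + 4 = 45 bytes, which is padded to 48.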
List identifiers
There are two ways to store list identifiers:
If all keys in the reference file belong to the same list, you can store the list
ID once, in the reference-file header.
If the reference file contains keys from multiple lists, store the list ID in each
key, in the MCD list ID field.
Create a reference file
Configuration file
If you use the MCD configuration files, use the Reference configuration file,
mpref.cfg, to define MCD key fields and reference-file header information. For
instructions about completing the configuration parameters, see the comments in
the configuration file.
This configuration file replaces calls to mp_refcreate_set_option(). For more
information about using configuration files, see “Configuration files make setup
faster and easier” on page 17.
Sequence of function calls
To create an empty reference file, call the functions below. For sample code, refer to reftest.c.
1. Use the Match Library to define a match key. The match key defines the fields that hold record data (for example, personal-name data, address data, and so on). For more information, refer to the Match Library Programmer’s Reference manual.
2. Call mp_refcreate_open() to initialize the library and open a session to create a reference file.
3. Call mp_refcreate_set_mtc_key_id() to specify which match-key layout to use. You used the Match Library to define the match-key layout (step 1).
4. Call mp_refcreate_set_option() to reserve space for MCD key fields and set options. If you are using MCD configuration files, you do not need to make these calls.
5. Call mp_refcreate_close() to lock your settings and create the reference file.
Add keys to a reference file
To load data into a reference file, add keys. Use the Match Library to load record
data into each key. Use the MCD Library to load data into the MCD key fields.
To add keys to a reference file, call the functions below. For sample code, refer to reftest.c.
1. Call mp_ref_open() to open a previously created reference file.
2. Call mp_refqry_get_mtc_key_id() to get the match-key ID for the reference file. You will pass this ID as input to Match Library functions.
3. Enter a loop to load data into each key:
   Call mp_refmod_clear_key() to clear the internal key buffer before loading data.
   Use the Match Library to load record data into the Match key fields. For more information, see your Match Library manual.
   Call mp_refmod_set_data() to set data in MCD key fields. Call this function once for each MCD field.
   Call mp_refmod_add_key() to add the completed key to the reference file.
4. Call mp_ref_close() to close the reference file.
Maintain a reference file
Many programmers find that it’s easiest to regenerate reference files each time
they perform MCD processing, rather than trying to maintain the files. Often the
time needed to regenerate a reference file is minimal, especially when weighed
against the work of maintaining it.
In some cases, however, you may prefer to maintain an existing reference file. To
update an existing key, you need to find the key, modify the key fields as
necessary, and write the data back to the key.
To modify a reference file, use the mp_refmod_*() functions and the appropriate
Match Library functions. To query the contents of keys, use the mp_refqry_*()
functions and the appropriate Match Library functions. For more information
about working with Match Library key data, refer to the Match Library Programmer’s Reference manual.
Chapter 4: Lists, match specifications, and match levels
This chapter provides an introduction to lists and explains how to set up lists, control matching within and between lists, give one list priority over another, and gather statistics for a group of lists.
Introduction to lists
A list is a group of records that are related in some way—for example, all of the
records might have come from the same input database. Lists give you added
control over the matching process:
Control which matching rules to use when comparing two records.
Cancel comparisons within a list if you know there are no duplicate records
within that list.
Assign one list priority over another. For example, you could favor a house
list over a rented list.
Prioritize records within a break group.
List membership
When you set up a list, you assign a list identifier for that list. A record is a member of that list if the record has the same list identifier. There are two ways to assign records to lists:
If all of the records (keys) in a reference file belong to the same list, you can put the appropriate list ID in the file header.
Alternatively, you can include a list ID field in your record key. The value in the list ID field indicates to which list a particular record belongs.
Types of lists
You can define three different types of lists:
List type     Description
Normal        Contains good or eligible records (default).
Suppression   Contains records that should be suppressed. Suppression records and all records that match them should be removed from the output.
Special       Contains records that are not counted in the determination of whether a duplicate group is single-list or multiple-list.
If a record doesn’t belong to any list
If a record doesn’t belong to any of your defined lists, you can:
Action   Outcome
Ignore   Leave the record out of the job.
Abort    Return an error code.
Assign   Assign the record to a default list that you specify.
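The three actions can be pictured as a small decision function. The sketch below is an illustration only: the enumerations and the is_defined_list() and resolve_list() functions are hypothetical and do not reflect how the MCD Library represents these settings.

```c
#include <stddef.h>

/* Hypothetical settings and outcomes (illustration only). */
typedef enum { UNLISTED_IGNORE, UNLISTED_ABORT, UNLISTED_ASSIGN } unlisted_action;
typedef enum { RESOLVE_USE, RESOLVE_SKIP, RESOLVE_FAIL } resolve_result;

/* Return 1 if list_id is one of the defined lists, 0 otherwise. */
static int is_defined_list(int list_id, const int *defined_ids, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (defined_ids[i] == list_id)
            return 1;
    return 0;
}

/* Decide what to do with a record, given its list ID and the configured
   action for records that belong to no defined list. On RESOLVE_USE,
   *resolved_list_id holds the list the record should be treated as in. */
static resolve_result resolve_list(int record_list_id,
                                   const int *defined_ids, size_t count,
                                   unlisted_action action, int default_list_id,
                                   int *resolved_list_id)
{
    if (is_defined_list(record_list_id, defined_ids, count)) {
        *resolved_list_id = record_list_id;
        return RESOLVE_USE;
    }
    switch (action) {
    case UNLISTED_ASSIGN:
        *resolved_list_id = default_list_id;   /* Assign: use the default list */
        return RESOLVE_USE;
    case UNLISTED_ABORT:
        return RESOLVE_FAIL;                   /* Abort: report an error */
    case UNLISTED_IGNORE:
    default:
        return RESOLVE_SKIP;                   /* Ignore: leave the record out */
    }
}
```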