BusinessObjects, Crystal Reports®, SAP BusinessObjects Rapid Mart™, SAP BusinessObjects Data Insight™, SAP BusinessObjects Desktop Intelligence™, SAP BusinessObjects
Rapid Marts®, SAP BusinessObjects Watchlist Security™, SAP BusinessObjects Web Intelligence®, and Xcelsius® are trademarks or registered trademarks of Business Objects, an SAP
company and/or affiliated companies in the United States and/or other countries. SAP® is a
registered trademark of SAP AG in Germany and/or other countries. All other names mentioned herein may be trademarks of their respective owners.
This manual is a reference for C programmers who are working with Match/
Consolidate Custom. This manual explains how to make your application
work with Match/Consolidate Custom, and provides detailed reference pages
about each of the functions.
In writing this manual, we assume that you are already familiar with the C
programming language, your operating system, and with basic concepts of
database management, mail processing, and address processing.
This document follows these conventions:

Convention       Description
Bold             We use bold type for file names, paths, emphasis, and text that you
                 should type exactly as shown. For example, “Type cd\dirs.”
Italics          We use italics for emphasis and text for which you should substitute
                 your own data or values. For example, “Type a name for your file,
                 and the .txt extension (testfile.txt).”
Menu commands    We indicate commands that you choose from menus in the following
                 format: Menu Name > Command Name. For example, “Choose File > New.”
!                We use this symbol to alert you to important information and
                 potential problems.
(symbol)         We use this symbol to point out special cases that you should know
                 about.
(symbol)         We use this symbol to draw your attention to tips that may be useful
                 to you.
Preface
Documentation
Other documentation
Documents related to this manual include the following:

Document                        Description
System Administrator’s Guide    Explains how to install your software.
Database Prep                   Explains how to prepare input files for processing,
                                including how to create DEF, FMT, and DMT files.
Match/Consolidate User’s        Explains the concepts behind name and address matching
Guide to Record Matching        software and provides examples of how to implement,
                                analyze, and fine-tune match detection strategies for the
                                best results.
Match/Consolidate               Contains the operational how-to instructions for setting
Job-File Reference              up the Match/Consolidate job file.
Match/Consolidate Extended      Contains the operational how-to instructions for setting
Matching Reference              up extended matching.
Quick Reference                 Contains descriptions of the input and output fields, and
                                the command line for the Match/Consolidate job file.

Access the latest documentation
You can access Firstlogic documentation in several places:
On your computer. Release notes, manuals, and other documents for
each Firstlogic product that you’ve installed are available in the
Documentation folder. Choose Start > Programs > Firstlogic Applications > Documentation.
On the SAP Service Market Place. Go to http://help.sap.com, and then
click the Business Objects tab. Here, you can search for your products’
documentation.
Match/Consolidate Custom Programmer’s Reference
Chapter 1:
Introduction to Match/Consolidate Custom
This chapter explains the purpose of Match/Consolidate Custom. It provides
information about the exit points in Match/Consolidate job processing, how
applications use the Match/Consolidate Custom libraries, and about compiling
and linking.
Purpose of Match/Consolidate Custom
The purpose of using Match/Consolidate Custom, rather than the off-the-shelf
Match/Consolidate program, is two-fold:
Match/Consolidate Custom enables you to run Match/Consolidate jobs from
within your own C application. You might want to develop a proprietary user
interface, for example.
Match/Consolidate Custom gives you greater control over Match/
Consolidate processing. At selected points within the process, Match/
Consolidate calls your functions. Your routines, called exit functions, may
alter the results that Match/Consolidate would otherwise produce.
To use Match/Consolidate Custom, you
will create one or two layers of software:
one above our Match/Consolidate batch
application, and the other underneath.
(See the figure at the right.)
Match/Consolidate Custom includes
libraries of C functions for each layer. The
setup functions are called by your main
program; the exit-support functions
provide information to your exit routines.
Exit functions are optional. If you don’t
use them, you will obtain the same
results from Match/Consolidate that you
would obtain from our off-the-shelf
program.
[Figure: software layers, from top to bottom: Your application; Match/Consolidate Custom; Match/Consolidate batch application; Your exit functions; Match/Consolidate Custom exit support library.]
Match/Consolidate Custom is based on the batch-oriented Match/Consolidate
program. Match/Consolidate Custom is not appropriate for interactive
applications.
Exit points in Match/Consolidate job processing
Eight exit types
An exit function is a C function that is written by you and called by Match/
Consolidate. The eight types of exit functions are listed below and described in
full detail in “Writing exit functions” on page 13. You need not use the function
names listed here; however, we will use them in this manual to avoid confusion.
input_processing()
compare_before()
compare_after_dupe()
compare_after_nodupe()
dupe_group()
dupe_group_post()
output_processing()
parse()
Three exit states
As listed in the following table, an exit function may be called in any of three
states. Note that this refers to a state that Match/Consolidate is in, not a state of
your application.

State of exit function    Description
Initialization            Before a processing step begins, Match/Consolidate may call your
                          exit function in the init state. Such a call will be made only once, and
                          then usually to allocate memory or open files.
Process                   During processing, Match/Consolidate may call your exit function in
                          the process state. This call will be made once for each item to be
                          processed (input record, pair of records, dupe group, or output
                          record). Obviously, any exit function called in this state may have a
                          dramatic effect on the rate of processing.
Termination               After completing the step, Match/Consolidate may call your exit
                          function in the term state. Such a call will be made only once,
                          usually to free memory, close files, or write a report.
Unsupported exits
Match/Consolidate Custom does not support exits for reading a record from an
input file or writing a record to an output file. Neither does Match/Consolidate
Custom support exits during the creation of reports, posting to an input file, or
purging of dupes from input files.
How applications use the Match/Consolidate Custom libraries
Setup library
Your application will call just a few functions in the Match/Consolidate Custom
setup library. They are listed here in the order that you would call them:

Function             Description
mpc_init()           Initializes Match/Consolidate Custom and allocates memory
                     for Match/Consolidate Custom processing.
mpc_set_exit()       Signals to Match/Consolidate Custom that an exit function will
                     be called, and sets the address of that exit function. Because
                     there are eight types of exit functions, you may need to call
                     mpc_set_exit() up to eight times.
mpc_process_job()    Starts Match/Consolidate batch processing. This function
                     passes an entire command line to Match/Consolidate.
                     An alternative form of this call, mpc_process_jobv(), enables
                     you to pass command-line arguments from your main function
                     to Match/Consolidate.
mpc_term()           Halts Match/Consolidate Custom and frees all memory allocated
                     for Match/Consolidate Custom.
There are two more functions that pertain to group posting in the setup library.
For more information about group posting, see “Group posting” on page 27.
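The call order described for the setup library can be sketched as follows. This is only an illustration: this section does not show the real prototypes, so the argument lists, the constant values, the exit-type identifier DEMO_EXIT_INPUT, and the call_log array are all assumptions, and the four mpc_ functions are replaced by local stand-ins so that the sketch compiles without the library. Consult mpc.h and mpc_test.c for the actual declarations.

```c
#include <string.h>

/* ---- Hypothetical stand-ins so this sketch compiles on its own. ----
 * The real declarations come from mpc.h; the signatures, constant
 * values, and the exit-type identifier below are assumptions. */
#define MPC_OK            0
#define MPC_ERROR       (-1)
#define DEMO_EXIT_INPUT   1             /* hypothetical exit-type id */

typedef int (*demo_exit_fn)(int state);

char call_log[64];                      /* records the order of calls */

static int mpc_init(void) { strcat(call_log, "I"); return MPC_OK; }
static int mpc_set_exit(int type, demo_exit_fn fn)
{
    (void)type; (void)fn;
    strcat(call_log, "S");
    return MPC_OK;
}
static int mpc_process_job(const char *cmdline)
{
    strcat(call_log, "P");
    return (cmdline != NULL && cmdline[0] != '\0') ? MPC_OK : MPC_ERROR;
}
static int mpc_term(void) { strcat(call_log, "T"); return MPC_OK; }
/* ---- End of stand-ins. ---- */

/* A do-nothing exit function, registered below. */
static int input_processing(int state) { (void)state; return MPC_OK; }

/* Runs one job: initialize, register an exit, process, then terminate. */
int run_job(const char *cmdline)
{
    int status;

    if (mpc_init() != MPC_OK)
        return MPC_ERROR;

    status = mpc_set_exit(DEMO_EXIT_INPUT, input_processing);
    if (status == MPC_OK)
        status = mpc_process_job(cmdline);

    /* Terminate even after a failure so allocated memory is released. */
    if (mpc_term() != MPC_OK)
        status = MPC_ERROR;

    return status;
}
```

The sketch calls mpc_term() on every path after a successful mpc_init(), so a failed job still releases the memory that initialization allocated.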
Exit-support library
The library of exit-support functions is described in “Writing exit functions” on
page 13. There, you will find complete information about how to write exit
functions.
Errors
To retrieve information about an error during Match/Consolidate processing, you
may call mpc_get_error_info(). You may call this function from your main
program or from your exit functions.
Most Match/Consolidate Custom functions return MPC_OK if the function
completed successfully. If an error occurs, the global variable mpc_errno is set to
an error value and MPC_ERROR is returned. If your application detects
MPC_ERROR, it may call mpc_get_error_info() to get more information.
Sample program
To see an example of an application calling Match/Consolidate Custom, see the
sample program mpc_test.c.
The sample program is for use only as a learning tool. It is not a prototype or
product. Firstlogic does not support or authorize any use for commercial purposes
and disclaims any warranty regarding such use.
Compiling and linking
All copies of Match/Consolidate Custom are shipped compiled and ready to link;
source code is not available.
UNIX
We compile the Match/Consolidate Custom Library with cc(1).
On some UNIX platforms, you must link in specific operating system libraries.
For details about which system libraries to link in, see the sample build script,
build. For information about your operating system libraries, consult your
operating system manuals or vendor.
To build our sample application, see the build file. Read the instructions in the file
and edit it before using it. You can use the build file to build our sample program
(mpc_test). To build our mpc_test application, you would type the command
build mpc_test.
Windows
We compile using Microsoft Visual C++ (MSVC).
To build our sample applications, follow the guidelines in read_mpc.me.
Chapter 2:
Writing exit functions
This chapter provides information about exit functions and a table of exit-support
functions by exit point.
Introduction to exit functions
The Match/Consolidate Custom library allows your application to gain control of
job file processing at certain key intervals by way of exit functions. An exit
function is a callable function that is written by the user and is called by the
Match/Consolidate code. There are eight types of exit functions:
input_processing()
compare_before()
compare_after_dupe()
compare_after_nodupe()
dupe_group()
dupe_group_post()
output_processing()
parse()
Exit information
As listed in the following table, an exit function may be called in any of three
states. Note that this refers to a state that Match/Consolidate is in, not a state of
your application.
State of exit function    Description
Initialization            Before a processing step begins, Match/Consolidate may call your
                          exit function in the initialization (init) state. Such a call will be
                          made only once, and then usually to allocate memory or open files.
Process                   During processing, Match/Consolidate may call your exit function
                          in the process state. This call will be made once for each item to be
                          processed (input record, pair of records, dupe group, or output
                          record). Any exit function called in this state may have a dramatic
                          effect on the rate of processing.
Termination               After completing the step, Match/Consolidate may call your exit
                          function in the termination (term) state. Such a call will be made
                          only once, and then usually to free memory, close files, or write a
                          report.
It is your responsibility to free any memory that you allocate, and
close any files that you open. Ordinarily, this will be accomplished
by another call to your exit function in the termination state.
Match/Consolidate will not free memory or close files opened by
your application.
When you set up your exit function, you will also select the state(s) in which that
exit function will be called.
State passed as
argument
You will write up to eight exit functions, as listed on the previous page. You will
not write 24 exit functions (eight exit function types times three states). Instead,
Match/Consolidate Custom will pass the state when it calls your exit function,
and your exit function should act accordingly.
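As a minimal sketch of this idea, the function below handles all three states in a single switch. The exit-function signature, the DEMO_STATE_ constants, the return-code values, and the records_seen counter are assumptions made for illustration; the real state values and prototype are defined by the library headers and may differ.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins: the real state constants, return codes, and
 * exit-function signature are defined by mpc.h and may differ. */
#define MPC_OK      0
#define MPC_ERROR (-1)
enum demo_state { DEMO_STATE_INIT, DEMO_STATE_PROCESS, DEMO_STATE_TERM };

long *records_seen = NULL;       /* allocated in init, freed in term */

/* One exit function serves all three states and branches on the state. */
int input_processing(int state)
{
    switch (state) {
    case DEMO_STATE_INIT:        /* called once: acquire resources */
        records_seen = calloc(1, sizeof *records_seen);
        return records_seen ? MPC_OK : MPC_ERROR;

    case DEMO_STATE_PROCESS:     /* called once per item: keep it cheap */
        if (records_seen == NULL)
            return MPC_ERROR;
        ++*records_seen;
        return MPC_OK;

    case DEMO_STATE_TERM:        /* called once: report and release */
        if (records_seen != NULL)
            printf("records seen: %ld\n", *records_seen);
        free(records_seen);
        records_seen = NULL;
        return MPC_OK;

    default:                     /* unknown state: signal an error */
        return MPC_ERROR;
    }
}
```

Keeping the process branch small matters most, because it is the only branch that runs once per item.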
Data retrieval
Match/Consolidate Custom offers several get functions to retrieve fields or whole
records, or get information about the input file or fields. For details, see the
“Table of exit-support functions by exit point” on page 26.
Match/Consolidate can retrieve data much faster from its key file than from the
input files. For best performance, we recommend that whenever possible, you use
mpc_get_key_field() instead of mpc_get_db_field() or mpc_get_pw_field(). In
our early tests, retrieval from the key file has been about three times faster than
retrieval from input files.
For performance reasons, you may wish to create extra key fields for use by your
exit functions. These key fields need not be used in the Match/Consolidate
matching process. See the Quick Reference for information about the PW fields
Merg_Purg1 through Merg_Purg5.
Error handling
Most Match/Consolidate Custom functions return MPC_OK if the function
completed successfully. If an error occurs, the global variable mpc_errno is set to
an error value and MPC_ERROR is returned. If your exit function detects
MPC_ERROR, it may call mpc_get_error_info() to get more information. Values
for mpc_errno are defined in the header file mpc.h.
Return status
Your exit functions should return either of the integers MPC_OK or
MPC_ERROR. If Match/Consolidate detects an MPC_ERROR return, it will
shut down gracefully. It is your responsibility, in your exit functions, to report the
error to the user.
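The error-handling and return-status rules above might be combined as in the following sketch. The signature of mpc_get_error_info() is not shown in this section, so the buffer-filling form used here is purely an assumption, as are the constant values, the DEMO_KEY_FIELD identifier, the error value 42, and the fail_next_get test hook. The library calls are local stand-ins so that the sketch compiles on its own.

```c
#include <stdio.h>

/* Hypothetical stand-ins; real definitions come from mpc.h. The
 * signature of mpc_get_error_info() is an assumption. */
#define MPC_OK            0
#define MPC_ERROR       (-1)
#define DEMO_KEY_FIELD    1             /* hypothetical key-field id */

int mpc_errno = 0;
int fail_next_get = 0;                  /* test hook for the stand-in */

static int mpc_get_key_field(int field_id, char *buf)
{
    (void)field_id;
    if (fail_next_get) { mpc_errno = 42; return MPC_ERROR; }
    buf[0] = 'A'; buf[1] = '\0';
    return MPC_OK;
}

static void mpc_get_error_info(char *buf, size_t len)
{
    snprintf(buf, len, "stand-in error %d", mpc_errno);
}

/* Check every status; on failure, report to the user and hand
 * MPC_ERROR back so Match/Consolidate can shut down gracefully. */
int output_processing_step(void)
{
    char field[64];
    char info[128];

    if (mpc_get_key_field(DEMO_KEY_FIELD, field) != MPC_OK) {
        mpc_get_error_info(info, sizeof info);
        fprintf(stderr, "exit function failed (mpc_errno=%d): %s\n",
                mpc_errno, info);
        return MPC_ERROR;
    }
    return MPC_OK;
}
```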
Modifying Match/
Consolidate results
Your application should not modify Match/Consolidate results except as provided
for in the Match/Consolidate Custom exit-support library. In particular, do not
modify the Match/Consolidate work files.
A stated purpose of Match/Consolidate Custom is to enable you to modify Match/
Consolidate results. However, one of our design goals for the Match/Consolidate
Custom exit-support library is to preserve the integrity of Match/Consolidate
processing. That’s why there are some functions that you won’t find in the
library; for example, there is no mpc_put_key_field() function.
input_processing() exit function
If you declare an input_processing() exit function, it will be called during the
processing step called Read Records (see the Execution block in the Match/
Consolidate job file). The input_processing() exit function can perform three
tasks:
Exclude a record from all processing
Exclude a record from the dupe search
Modify a record
Exclude from all
processing
Your input_processing() exit function may determine that an input record should
not be included in any further processing. To carry out this decision, your
input_processing() exit function should call mpc_set_process() with the argument
MPC_DECISION_NO. Match/Consolidate will treat the record as if it had failed
the input filter.
If a record fails the input filter, you have no way of reversing that decision,
because your input_processing() exit function will not be called.
Exclude from the
dupe search
Your input_processing() exit function may determine that a record should be
excluded from the search for dupes. In other words, a record may be declared to
be unique. To carry out this decision, your input_processing() exit function
should call mpc_set_unique() with the argument MPC_DECISION_YES.
Modify a record
Your input_processing() exit function may modify an input record. For this
purpose, Match/Consolidate Custom offers several get functions to retrieve fields
or whole records, or get information about the input file or fields. For details, see
the “Table of exit-support functions by exit point” on page 26.
If your input_processing() exit function modifies a record, Match/Consolidate
Custom will write the modified record back into the input file and use it for all
subsequent processing.
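The first two tasks, excluding a record from all processing and declaring a record unique, can be sketched as below. The two decision rules (blank records, and records that begin with the tag "SEED") are invented for illustration only. The constant values, the demo_ variables, and the stand-in bodies are assumptions; only the call shapes mpc_set_process(MPC_DECISION_NO), mpc_set_unique(MPC_DECISION_YES), and mpc_get_record(buffer) follow this manual.

```c
#include <string.h>

/* Hypothetical stand-ins; real declarations and values come from mpc.h. */
#define MPC_OK             0
#define MPC_ERROR        (-1)
#define MPC_DECISION_NO    0
#define MPC_DECISION_YES   1

const char *demo_record = "";           /* record the stand-in returns */
int demo_process = MPC_DECISION_YES;    /* last value given to mpc_set_process */
int demo_unique  = MPC_DECISION_NO;     /* last value given to mpc_set_unique  */

static int mpc_get_record(char *buf) { strcpy(buf, demo_record); return MPC_OK; }
static int mpc_set_process(int d)    { demo_process = d; return MPC_OK; }
static int mpc_set_unique(int d)     { demo_unique = d;  return MPC_OK; }

/* Process-state body of an input_processing() exit function. */
int input_processing_record(void)
{
    char rec[1024];

    if (mpc_get_record(rec) != MPC_OK)
        return MPC_ERROR;

    /* Illustrative rule 1: drop blank records from all processing. */
    if (strspn(rec, " ") == strlen(rec))
        return mpc_set_process(MPC_DECISION_NO);

    /* Illustrative rule 2: records tagged "SEED" are never dupes. */
    if (strncmp(rec, "SEED", 4) == 0)
        return mpc_set_unique(MPC_DECISION_YES);

    return MPC_OK;                      /* no decision: process normally */
}
```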
Initialization call
You might want Match/Consolidate to call your input_processing() exit function
in the initialization state in order to allocate memory or open files. At the time of
this initialization call:
Match/Consolidate has opened the job file with exclusive access.
The input files have been opened with read/write access.
The DEF and FMT files have been read and closed.
Note that it is your responsibility to free any memory that you allocate and to
close any files that you open. Ordinarily, this will be accomplished by another
call to your input_processing() exit function in the termination state. Match/
Consolidate will not free memory or close files opened by your application.
Process call
At the Read Records step, Match/Consolidate Custom follows these steps:
1. Reads a record from the input file.
2. Runs the record through the input filter. If the record fails the input filter, the
input_processing() exit function will not be called.
3. Calls the input_processing() exit function in the process state, if one has been
set. Match/Consolidate performs the remaining steps only if your exit function
does not exclude the record from processing (see “Exclude from all processing”).
4. Determines to which list the record belongs.
5. Parses (and perhaps standardizes) the name and address data.
6. Generates key fields and stores them in the key file.
If an input_processing() exit function has been set, it is called after a record is run
through the input filter, but before Match/Consolidate assigns the record to a list.
Termination call
You might want Match/Consolidate to call your input_processing() exit function
in the termination state in order to generate a report, free memory, and close files.
At the time of this call, the job file and all input files are still open.
compare_before() exit function
The compare_before() exit function is called during the Find Duplicates step of
job processing.
Compare records
When called in the process state, your compare_before() exit function will be
presented a pair of records that are about to be compared. Your function may
determine that the pair of records are dupes. For example, you might decide that
if two records match on the Social Security Number, they are definitely a match,
no matter what the normal comparison might determine.
To carry out this decision, your exit function may call mpc_set_dupes(). As listed
in the following table, there are three possible arguments to this call:
Argument     Description
Yes          The records are accepted as a match and Match/Consolidate cancels its
             own comparison.
No           The records are accepted as a nonmatch and Match/Consolidate cancels
             its own comparison.
Undecided    This is the default state. Match/Consolidate will compare the records as
             usual, according to the match criteria and options set up in the job file.
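The Social Security Number example might look like the sketch below. It assumes that the number is available as a key field; the DEMO_KEY_SSN identifier, the DEMO_UNDECIDED marker, the constant values, the demo_ variables, and the stand-in bodies are all hypothetical. The call shapes mpc_set_current_rec(n), mpc_get_key_field(field, buffer), and mpc_set_dupes(MPC_DECISION_YES) follow the usage shown elsewhere in this chapter.

```c
#include <string.h>

/* Hypothetical stand-ins; real declarations and values come from mpc.h.
 * DEMO_KEY_SSN is an invented id for a key field holding the SSN. */
#define MPC_OK             0
#define MPC_ERROR        (-1)
#define MPC_DECISION_YES   1
#define DEMO_UNDECIDED   (-1)
#define DEMO_KEY_SSN       1

const char *demo_ssn[2] = { "", "" };   /* SSNs of the pair under test */
int demo_decision = DEMO_UNDECIDED;     /* last value given to mpc_set_dupes */
static int current = 1;                 /* which record of the pair is current */

static int mpc_set_current_rec(int n)
{
    if (n < 1 || n > 2) return MPC_ERROR;
    current = n;
    return MPC_OK;
}
static int mpc_get_key_field(int id, char *buf)
{
    (void)id;
    strcpy(buf, demo_ssn[current - 1]);
    return MPC_OK;
}
static int mpc_set_dupes(int d) { demo_decision = d; return MPC_OK; }

/* Process-state body of a compare_before() exit function. */
int compare_before_pair(void)
{
    char ssn1[16], ssn2[16];

    if (mpc_set_current_rec(1) != MPC_OK ||
        mpc_get_key_field(DEMO_KEY_SSN, ssn1) != MPC_OK ||
        mpc_set_current_rec(2) != MPC_OK ||
        mpc_get_key_field(DEMO_KEY_SSN, ssn2) != MPC_OK)
        return MPC_ERROR;

    /* Equal, non-empty SSNs: force a match and skip the normal compare. */
    if (ssn1[0] != '\0' && strcmp(ssn1, ssn2) == 0)
        return mpc_set_dupes(MPC_DECISION_YES);

    /* Otherwise stay undecided so the job-file criteria apply. */
    return MPC_OK;
}
```

Two empty fields are deliberately not treated as a match, so records that lack the number fall through to the normal comparison.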
Support functions
Your compare_before() function will deal with pairs of records. To manage this,
you may call mpc_get_num_current_rec() to determine which record of the pair
is the current record; this will return a 1 or 2. You may also call
mpc_set_current_rec() to select one record from the pair to be the current record.
Match/Consolidate Custom also offers several get functions to retrieve fields or
whole records, or get information about the input file or fields. For details, see the
“Table of exit-support functions by exit point” on page 26.
Match/Consolidate Custom lets you compare multiline unparsed addresses as
well. See “Multiline unparsed comparisons” for more information on comparing
addresses in Match/Consolidate Custom.
Performance
If your compare_before() exit function is to be called in the process state, please
note: Your exit function may be called millions, or even trillions, of times in a
single job. Exit processing may reduce the overall Match/Consolidate processing
rate, so keep this in mind when designing your exit function. Of course, the time
taken for exit processing may be partially offset if, because of your Yes or No
decision, Match/Consolidate cancels its normal comparison.
Excluded records
Match/Consolidate cancels some comparisons. Note that your compare_before()
exit function will not be called when Match/Consolidate cancels a comparison.
Multiline unparsed
comparisons
You can also compare unparsed records. Multiline unparsed comparisons, and
parsed to unparsed comparisons (if the records are in the same break group) are
possible. This includes comparison of foreign addresses.
Below is one example of how you may use the compare_before() exit function to
compare unparsed records. This comparison matches on first name and the
overall similarity of two records. The example given below could easily be
expanded to include more database or non-address key fields.
int status;             /* return status */
int percent;            /* match_percent */
char rec_buf1[1024];    /* buffer for whole record 1 */
char rec_buf2[1024];    /* buffer for whole record 2 */
char fname_buf1[20];    /* buffer for first name 1 */
char fname_buf2[20];    /* buffer for first name 2 */
char parse_buf1[2];     /* buffer for parse status of record 1 */
char parse_buf2[2];     /* buffer for parse status of record 2 */

/* get record 1 comparison info */
status = mpc_set_current_rec(1);
status = mpc_get_ap_field(MPG_AP_PARSE, parse_buf1);
status = mpc_get_key_field(MPG_KEY_FIRSTNAME, fname_buf1);
status = mpc_get_record(rec_buf1);

/* get record 2 comparison info */
status = mpc_set_current_rec(2);
status = mpc_get_ap_field(MPG_AP_PARSE, parse_buf2);
status = mpc_get_key_field(MPG_KEY_FIRSTNAME, fname_buf2);
status = mpc_get_record(rec_buf2);

/* UNPARSED RECORD MATCHING */
/* If one of the records is unparsed, first names must be 80% alike
   and the records must be at least 75% alike overall in this example.
*/
if (parse_buf1[0] != 32 || parse_buf2[0] != 32) {  /* one or both recs unparsed
                                                      (32 is an ASCII space) */
    percent = simscore(rec_buf1, strlen(rec_buf1), rec_buf2,
                       strlen(rec_buf2));
    if (percent >= 75 &&
        simscore(fname_buf1, strlen(fname_buf1), fname_buf2,
                 strlen(fname_buf2)) >= 80) {
        fprintf(stdout, "UNPARSED MATCH DETECTED!\n");
        status = mpc_set_dupes(MPC_DECISION_YES);  /* DUPE OVERRIDE */
    }
}
compare_after_dupe() exit function
Re-compare
duplicates
The compare_after_dupe() exit function is called during the Find Duplicates step
of job processing. When called in the process state, your compare_after_dupe()
function will be presented a pair of records that have been compared and found to
be dupes—either by the normal Match/Consolidate comparison, or by your
compare_before() exit function.
For example, you might set rather loose match criteria in the job file. This will
cause Match/Consolidate to err on the side of matching. Then your
compare_after_dupe() exit function could re-evaluate each “matching” pair, to
reduce false alarms (see the diagram below).
Your compare_after_dupe() exit function may determine, despite the previous
finding, that the pair of records are not dupes. To carry out this decision, your exit
function may call mpc_set_dupes().
                              To you, the records    To you, the records
                              are duplicates.        are not duplicates.
Match/Consolidate says
the records match.            Correct detection      False match
Match/Consolidate says
the records do not match.     Missed duplicate       Correct non-detection
If your compare_after_dupe() exit function is to be called in the process state,
please note: Your exit function may be called thousands of times in a single job.
Obviously, exit processing may reduce the overall Match/Consolidate processing
rate, perhaps dramatically. Please bear this in mind when designing your exit
function.
Support functions
Your compare_after_dupe() exit function will deal with pairs of records. To
manage this, you may call mpc_get_num_current_rec() to determine
which record of the pair is the current record; this will return a 1 or 2. You may
also call mpc_set_current_rec() to select one record from the pair to be the
current record.
Match/Consolidate Custom also offers several get functions to retrieve fields or
whole records, or get information about the input file or fields. For details, see the
“Table of exit-support functions by exit point” on page 26.
compare_after_nodupe() exit function
Re-compare
non-duplicates
The compare_after_nodupe() exit function is called during the Find Duplicates
step of job processing. When called in the process state, your
compare_after_nodupe() function will be presented a pair of records that have
been compared and found not to be dupes—either by the normal Match/
Consolidate comparison, or by your compare_before() exit function.
For example, you might set rather tight match criteria in the job file. This will
cause Match/Consolidate to err on the side of not matching. Then your
compare_after_nodupe() exit function could re-evaluate each nonmatching pair,
to reduce missed dupes (see the diagram below).
Your compare_after_nodupe() exit function may determine, despite the previous
finding, that the pair of records are dupes. To carry out this decision, your exit
function may call mpc_set_dupes().
                              To you, the records    To you, the records
                              are duplicates.        are not duplicates.
Match/Consolidate says
the records match.            Correct detection      False match
Match/Consolidate says
the records do not match.     Missed duplicate       Correct non-detection
If your compare_after_nodupe() exit function is to be called in the process state,
your exit function may be called thousands or even millions of times in a single
job. Obviously, exit processing may reduce the overall Match/Consolidate
processing rate, perhaps dramatically. Please bear this in mind when designing
your exit function.
Support functions
Your compare_after_nodupe() exit function will deal with pairs of records. To
manage this, you may call mpc_get_num_current_rec() to determine which
record of the pair is the current record; this will return a 1 or 2. You may also call
mpc_set_current_rec() to select one record from the pair to be the current record.
Match/Consolidate Custom also offers several get functions to retrieve fields or
whole records, or get information about the input file or fields. For details, see the
“Table of exit-support functions by exit point” on page 26.
dupe_group() exit function
The dupe_group() exit function is called after the Find Duplicates step is
completed and after the records in each dupe group have been prioritized. For
information about dupe groups and prioritization, see the User’s Guide to Record Matching.
Choose a new
master dupe
If the dupe_group() exit function is active, it is called once for each dupe group.
The dupe_group() exit function will be able to change the positions of records in
the dupe group. In theory, you may alter the sequence of subordinate dupes. More
typically, the purpose of calling mpc_set_group_pos() will be to choose a new
master dupe. You should not alter the data in the master dupe; that is the purpose
of the dupe_group_post() exit function.
For example, your program might determine, for whatever reason, that one of the
subordinate dupes should be elevated to be the master dupe. Be sure to make that
subordinate dupe the current record (see Support functions below).
Then to carry out your decision, call mpc_set_group_pos() to set the new position
of the current record in the dupe group. The position of other group members will
be adjusted as necessary.
If your dupe_group() exit function is to be called in the process state, note that it
is impossible to predict how many dupe groups there will be, or how many times
your function will be called. That depends on how the job file is set up and on the
input lists.
Support functions
When called in the process state, your dupe_group() exit function will process
groups of unknown size. One member of the group is the current record. Initially,
this is the first record in the group, the master dupe. Your exit functions may call
the following:
Function                      Description
mpc_get_num_recs()            Determine how many records are in the dupe group.
mpc_get_num_current_rec()     Determine which record in the group is the
                              current record.
mpc_set_current_rec()         Select one record from the group to be the
                              current record.
mpc_set_group_pos()           Set the new position of the current record in the
                              dupe group.
Match/Consolidate Custom also offers several get functions to retrieve fields or
whole records, or get information about the input file or fields. For details, see the
“Table of exit-support functions by exit point” on page 26.
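Choosing a new master dupe might be sketched as follows. The selection rule (promote the first group member that has a phone number) is invented for illustration. The return conventions, the 1-based positions, the DEMO_KEY_PHONE identifier, the demo_ variables, and the stand-in bodies are assumptions, because this section does not show the real prototypes.

```c
#include <string.h>

/* Hypothetical stand-ins; real declarations come from mpc.h. Return
 * conventions (for example, mpc_get_num_recs() returning the count) and
 * the DEMO_KEY_PHONE field id are assumptions. */
#define MPC_OK           0
#define MPC_ERROR      (-1)
#define DEMO_KEY_PHONE   1
#define DEMO_MAX         8

const char *demo_phone[DEMO_MAX];       /* phone key field of each member */
int demo_count = 0;                     /* members in the dupe group */
int demo_moved_from = 0, demo_moved_to = 0;
static int current = 1;                 /* current record in the group */

static int mpc_get_num_recs(void) { return demo_count; }
static int mpc_set_current_rec(int n)
{
    if (n < 1 || n > demo_count) return MPC_ERROR;
    current = n;
    return MPC_OK;
}
static int mpc_get_key_field(int id, char *buf)
{
    (void)id;
    strcpy(buf, demo_phone[current - 1]);
    return MPC_OK;
}
static int mpc_set_group_pos(int pos)
{
    demo_moved_from = current;
    demo_moved_to = pos;
    return MPC_OK;
}

/* Process-state body of a dupe_group() exit function: promote the first
 * member that has a phone number, unless it is already the master. */
int dupe_group_pick_master(void)
{
    char phone[32];
    int i, n = mpc_get_num_recs();

    for (i = 1; i <= n; i++) {
        if (mpc_set_current_rec(i) != MPC_OK ||
            mpc_get_key_field(DEMO_KEY_PHONE, phone) != MPC_OK)
            return MPC_ERROR;
        if (phone[0] != '\0') {
            /* Member i is the current record; position 1 is assumed
             * to be the master slot. */
            return (i == 1) ? MPC_OK : mpc_set_group_pos(1);
        }
    }
    return MPC_OK;                      /* nobody qualifies: keep order */
}
```

The sketch makes the chosen member the current record before it calls mpc_set_group_pos(), as the text above requires.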