SAP Match/Consolidate 8.00c Library Reference

April 2009
Copyright information © 2009 SAP® BusinessObjects™. All rights reserved. SAP BusinessObjects and its logos,
BusinessObjects, Crystal Reports®, SAP BusinessObjects Rapid Mart™, SAP BusinessObjects Data Insight™, SAP BusinessObjects Desktop Intelligence™, SAP BusinessObjects Rapid Marts®, SAP BusinessObjects Watchlist Security™, SAP BusinessObjects Web Intelligence®, and Xcelsius® are trademarks or registered trademarks of Business Objects, an SAP company and/or affiliated companies in the United States and/or other countries. SAP® is a registered trademark of SAP AG in Germany and/or other countries. All other names mentioned herein may be trademarks of their respective owners.

Contents

Preface

Chapter 1: Install, compile, and link
  Build on Windows
  Build on UNIX

Chapter 2: Overview of the Match/Consolidate Library
  Before-and-after of record matching
  Work with the Match/Consolidate Library
  Configuration files make setup faster and easier
  Sample programs show you how to call the library
  Error handling and progress callback
  Work files
  Function calls for initialization and termination

Chapter 3: Reference files
  Introduction to reference files
  Create a reference file
  Add keys to a reference file
  Maintain a reference file

Chapter 4: Lists, match specifications, and match levels
  Introduction to lists
  Set up lists
  Control matching within and between lists
  Give one list priority over another
  Gather statistics for a group of lists

Chapter 5: Break groups
  Introduction to breaking
  Set up normal breaking
  Set up adaptive breaking
  Set up automatic breaking
  Query break groups

Chapter 6: Duplicate search and results
  Introduction
  How Match/Consolidate selects keys to compare
  Sequence of function calls for finding and retrieving duplicates

Chapter 7: Match/Consolidate functions
  mp_break_find_groups()
  mp_break_get_info()
  mp_break_set_auto()
  mp_break_set_info()
  mp_breakfld_get_info()
  mp_breakfld_set_info()
  mp_breakqry_do_query()
  mp_breakqry_get_largest_group()
  mp_breakqry_get_next_str()
  mp_breakqry_get_num_groups()
  mp_breakqry_get_num_keys()
  mp_breakqry_get_str_len()
  mp_breakqry_init()
  mp_breakqry_set_qryfld()
  mp_breakqry_term()
  mp_cfg_close()
  mp_cfg_get_num_ref_files()
  mp_cfg_get_ref_info()
  mp_cfg_open()
  mp_duperes_get_data()
  mp_duperes_get_keyid()
  mp_duperes_get_value()
  mp_duperes_set_key_num()
  mp_duperes_set_match_level()
  mp_duperes_set_results_type()
  mp_dupesrch_batch()
  mp_dupesrch_break_queries()
  mp_dupexit_get_data()
  mp_dupexit_set_exit_compare()
  mp_get_error_info()
  mp_get_error_messages()
  mp_get_error_number()
  mp_get_revision()
  mp_init()
  mp_input_get_num_refs()
  mp_input_get_ref_num()
  mp_input_set_num_refs()
  mp_input_set_ref_num()
  mp_list_get_default_action()
  mp_list_get_default_list_num()
  mp_list_get_list_attr()
  mp_list_get_list_id()
  mp_list_get_list_name()
  mp_list_get_list_num_from_list_id()
  mp_list_get_match_spec()
  mp_list_get_num_lists()
  mp_list_get_num_lists_in_super()
  mp_list_get_num_match_levels()
  mp_list_get_num_super_lists()
  mp_list_get_super_name()
  mp_list_get_super_num()
  mp_list_get_super_status()
  mp_list_set_default_action()
  mp_list_set_default_list_num()
  mp_list_set_exit_match_spec()
  mp_list_set_list_attr()
  mp_list_set_list_id()
  mp_list_set_list_name()
  mp_list_set_match_spec()
  mp_list_set_match_spec_autoid()
  mp_list_set_match_spec_ruleids()
  mp_list_set_num_lists()
  mp_list_set_num_match_specs()
  mp_list_set_num_super_lists()
  mp_list_set_super_list_name()
  mp_list_set_super_list_num()
  mp_misc_get_blank_field_priority()
  mp_misc_get_keyfile_id()
  mp_misc_get_option_info()
  mp_misc_get_work_file_info()
  mp_misc_set_blank_field_priority()
  mp_misc_set_exit_blank_field()
  mp_misc_set_exit_progress()
  mp_misc_set_option_info()
  mp_misc_set_work_file_info()
  mp_ref_close()
  mp_ref_open()
  mp_refcreate_close()
  mp_refcreate_open()
  mp_refcreate_set_mtc_key_id()
  mp_refcreate_set_option()
  mp_refmod_add_key()
  mp_refmod_clear_key()
  mp_refmod_delete_key()
  mp_refmod_read_key()
  mp_refmod_set_data()
  mp_refmod_truncate()
  mp_refmod_update_key()
  mp_refqry_get_data()
  mp_refqry_get_mtc_key_id()
  mp_refqry_get_value()
  mp_term()

Appendix A: Configuration parameters and their corresponding API calls

Appendix B: Refmorph utility

Index

Preface

About Match/Consolidate Library

This manual is a reference for programmers working with the Match/Consolidate Library. It explains how to make your application work with the library. Each chapter contains explanations, code examples, call sequences, and reference pages about each of the function calls.
In this manual, we assume that you are already familiar with your programming language, your operating system, and the concepts of database management.

Conventions

This document follows these conventions:

Convention       Description
Bold             We use bold type for file names, paths, emphasis, and text that you should type exactly as shown. For example, “Type cd\dirs.”
Italics          We use italics for emphasis and text for which you should substitute your own data or values. For example, “Type a name for your file, and the .txt extension (testfile.txt).”
Menu commands    We indicate commands that you choose from menus in the following format: Menu Name > Command Name. For example, “Choose File > New.”
!                We use this symbol to alert you to important information and potential problems.
[icon]           We use this symbol to point out special cases that you should know about.
[icon]           We use this symbol to draw your attention to tips that may be useful to you.
Documentation
Other documentation

Documents related to this manual include the following:

Document                                             Description
System Administrator’s Guide                         Explains how to install your software.
Database Prep                                        Explains how to prepare input files for processing, including how to create DEF, FMT, and DMT files.
Match/Consolidate User’s Guide to Record Matching    Explains the concepts behind name and address matching software and provides examples of how to implement, analyze, and fine-tune match detection strategies for the best results.
Match/Consolidate Extended Matching Reference        Contains the operational how-to instructions for setting up extended matching.
Match Library Programmer’s Reference                 This is a reference manual for the Match Library.
Quick Reference                                      Contains descriptions of the input and output fields, and the command line for the Match/Consolidate job file.

Access the latest documentation

You can access documentation in several places:
- On your computer. Release notes, manuals, and other documents for each product that you’ve installed are available in the Documentation folder. Choose Start > Programs > Business Objects Applications > Documentation.
- On the SAP Service Market Place. Go to http://help.sap.com, and then click the Business Objects tab. Here, you can search for your products’ documentation.
Chapter 1: Install, compile, and link
This chapter provides information about installing the software, compiling your application, and linking the compiled application with the Match/Consolidate (MCD) libraries for Windows and UNIX systems.

Build on Windows

We provide 32-bit dynamically-linked libraries (DLLs), which you can use with Windows NT 4.0, Windows 2000 Professional, Windows XP, and Windows 2000 Advanced Server.

Installation

If you need installation information, refer to the System Administrator’s Guide.
Refer to pw\mplib for the MCD library.

Compilers and programming languages

See InfoSource for the latest compiler information.

Additional compilation flags

Beginning with Match/Consolidate release 7.31c, additional compilation flags are required. These flags are platform specific. For example, on Windows, add the following: /D "FL_WIN32"
For more information, see the sample build scripts.

Create sample programs

We provide a build file, read_dll.me, for the sample programs. Refer to this file for step-by-step instructions for creating a Visual C++ project to build the samples.

Build on UNIX

Installation

If you need installation information, refer to the System Administrator’s Guide.
Refer to …/postware/mplib for the MCD library.

Compiler

See InfoSource for the latest compiler information.

Additional compilation flags

Beginning with Match/Consolidate release 7.31c, additional compilation flags are required. These flags are platform specific. For example, on Solaris 64-bit, add the following: -DFL_UNIX -DFL_UNIX_SOL
For more information, see the sample build scripts.

Create sample programs

Refer to buildmp for examples of how to build the sample programs. Modify this file to set your include and library directories.
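
The platform flags named in this chapter are ordinary preprocessor definitions, so your own code can check which one was passed to the compiler. The following is a minimal sketch, not taken from the library headers: the function name mcd_build_platform is hypothetical, and only the macro names FL_WIN32, FL_UNIX, and FL_UNIX_SOL come from this manual. How the library headers themselves use these flags is not shown here.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the MCD Library): reports which of the
 * platform flags mentioned in this chapter was defined at compile time. */
static const char *mcd_build_platform(void)
{
#if defined(FL_WIN32)
    return "FL_WIN32";
#elif defined(FL_UNIX) && defined(FL_UNIX_SOL)
    return "FL_UNIX FL_UNIX_SOL";
#elif defined(FL_UNIX)
    return "FL_UNIX";
#else
    return "none";   /* no platform flag was supplied */
#endif
}
```

Printing the returned string at startup is one way to confirm that a build script passed the flags you intended.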
Chapter 2: Overview of the Match/Consolidate Library
This chapter provides information about the before-and-after of record matching and the basic steps for working with Match/Consolidate (MCD) Library. It also provides information about configuration files, sample programs, error handling and progress callbacks, work files, and function calls.

Before-and-after of record matching

The MCD Library is a companion to Match Library. Match Library compares two records and determines whether or not they match. MCD Library works with the before-and-after of record matching.

[Figure: MCD Library selects a pair of records to compare, Match Library compares the records, and MCD Library uses the results.]

Theoretically, you could compare each complete, original record with every other complete, original record, but that would take a very long time. To save time, you’ll condense records into the data essential for matching and select for comparison only records that have a reasonable chance of matching.

Condense records into essential data

When you compare records, you’ll decide which data must match, and how closely, for records to be considered a match. Theoretically, you could use your complete, original records for comparisons, but such comparisons would be prohibitively slow.
To make matching more efficient, use our Match Library to condense records so that they contain only the data needed during the matching process. These condensed records are called keys.

Original record          Key
FirstName: JoAnne
LastName: MacKiewicz     LastName: MacKiewicz
Str_Range: 1             Str_Range: 1
Str_Name: Brook          Str_Name: Brook
Str_Suffix: Rd           Str_Suffix: Rd
City: Leeds
State: MA
ZIP: 01053               ZIP: 01053

Use the MCD Library to collect keys into a file called a reference file. The reference file contains the Match Library key data, plus additional information, such as list identifiers, that you can add when you build the reference file.
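
To make the idea of condensing concrete, here is a minimal sketch that copies only the matching-relevant fields of the sample record into a smaller structure. The struct layouts, field sizes, and the function demo_condense are all hypothetical illustrations; real keys are built by Match Library, and their actual format is not described in this section.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical full record, mirroring the sample fields shown above. */
struct demo_record {
    char first_name[32];
    char last_name[32];
    char str_range[8];
    char str_name[32];
    char str_suffix[8];
    char city[32];
    char state[4];
    char zip[8];
};

/* Hypothetical condensed key: only the fields kept in the sample key. */
struct demo_key {
    char last_name[32];
    char str_range[8];
    char str_name[32];
    char str_suffix[8];
    char zip[8];
};

/* Copies the fields needed for matching and drops the rest. */
static void demo_condense(const struct demo_record *rec, struct demo_key *key)
{
    memset(key, 0, sizeof *key);
    snprintf(key->last_name, sizeof key->last_name, "%s", rec->last_name);
    snprintf(key->str_range, sizeof key->str_range, "%s", rec->str_range);
    snprintf(key->str_name, sizeof key->str_name, "%s", rec->str_name);
    snprintf(key->str_suffix, sizeof key->str_suffix, "%s", rec->str_suffix);
    snprintf(key->zip, sizeof key->zip, "%s", rec->zip);
}
```

The point of the sketch is only the size difference: the key carries five short fields instead of eight, so every later comparison touches less data.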

Eliminate needless comparisons

Theoretically, you could compare every record to every other record, but that would take a long time. And many comparisons wouldn’t make sense. For example, a record with a ZIP Code of 01234 does not match a record with a ZIP Code of 98765, so it doesn’t pay to compare them at all.
To eliminate comparisons between records that are unlikely to match, you can separate records into clusters called break groups. For example, you could separate records by ZIP Code and look for matches only within each ZIP Code group.
Forming break groups eliminates a huge number of unnecessary comparisons. For a more in-depth discussion, including examples showing how many comparisons can be eliminated with a minimal effect on results, refer to the User’s Guide to Record Matching.
You can also use lists to eliminate unnecessary comparisons. For example, if you have a database that you know contains no duplicate records, you can assign all the records in that database to a list, and tell MCD to cancel comparisons between records within that list.
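
The saving from break groups follows from simple arithmetic: n records produce n(n - 1)/2 possible pairs, so splitting the records into smaller groups shrinks the total sharply. The sketch below is an illustration of that arithmetic only; the function names are hypothetical and nothing here calls the MCD Library.

```c
#include <stddef.h>

/* Number of unordered pairs among n records: n * (n - 1) / 2. */
static unsigned long long demo_pair_count(unsigned long long n)
{
    return n < 2 ? 0 : n * (n - 1) / 2;
}

/* Total comparisons when records are compared only within their own
 * break group. group_sizes holds the record count of each group. */
static unsigned long long demo_grouped_pair_count(
    const unsigned long long *group_sizes, size_t num_groups)
{
    unsigned long long total = 0;
    for (size_t i = 0; i < num_groups; i++)
        total += demo_pair_count(group_sizes[i]);
    return total;
}
```

For 1,000 records, comparing every pair takes 499,500 comparisons. If the same records fall into ten break groups of 100 records each, the total drops to 49,500.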

Use different matching rules for different lists

To control how records are compared, you can assign each record to a list. A list is a group of records that have some common characteristic. For example, all of the records might come from the same input database, or all might have a common demographic code.
You can then use different matching rules with different lists. For example, the matching rules for comparisons within List A could be different from the matching rules for comparisons between List A and List B. When you pass a pair of records to the match engine, you can use list membership to dictate which match rules are used for the comparison.

Form groups of duplicate records

The MCD Library interprets the matching results and tracks information relevant to the duplicate-detection process. You can query these results and then handle the information however you choose.

Work with the Match/Consolidate Library

When you work with the MCD Library, you’ll follow these basic steps:
1. Create a file that contains only the record data needed for matching. Such a file is called a reference file, and the condensed records are called keys.
2. Define logical groups of records called lists. You’ll do this at the same time you create your reference files by putting a list identifier into each reference file or into each individual record key.
3. Separate the record keys into break groups. For example, you might separate records by the first three digits of the ZIP Code.
4. Within each break group, select pairs of record keys and pass them to the match engine. If you have more than one set of match rules, you can control which set of rules to use for each pair of records.
5. After the match engine compares the record keys, query the results.
[Figure: Keys A through F (List 1) and Keys G through M (List 2) from the reference files are separated into break groups. Within a break group, a key pair such as Key F and Key H is passed to Match Library for comparison, which returns a result such as “Not duplicate.”]
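
Step 3 above can be pictured with a small sketch that derives a break value from the first three digits of the ZIP Code and treats two keys as candidates for comparison only when their break values are equal. The helper names are hypothetical and are not MCD Library functions; the library forms break groups itself, and this sketch only shows the underlying idea.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `length` characters of `field`, starting at the
 * 1-based position `start`, into `out`. Returns 0 on success, or -1 if the
 * field is too short or the output buffer is too small. */
static int demo_break_value(const char *field, size_t start, size_t length,
                            char *out, size_t out_size)
{
    size_t field_len = strlen(field);

    if (start < 1 || length + 1 > out_size || start - 1 + length > field_len)
        return -1;
    memcpy(out, field + (start - 1), length);
    out[length] = '\0';
    return 0;
}

/* Two keys fall into the same break group when the first three digits of
 * their ZIP Codes are equal. Returns 1 if they do, 0 otherwise. */
static int demo_same_break_group(const char *zip_a, const char *zip_b)
{
    char a[4];
    char b[4];

    if (demo_break_value(zip_a, 1, 3, a, sizeof a) != 0 ||
        demo_break_value(zip_b, 1, 3, b, sizeof b) != 0)
        return 0;
    return strcmp(a, b) == 0;
}
```

With this rule, ZIP Codes 01053 and 01060 share the break value 010 and would be compared, while 01053 and 98765 would never be paired.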

Configuration files make setup faster and easier

Match/Consolidate Library offers external text configuration files that contain some of the settings and options to be used for your MCD session. Set the parameters of the text configuration files, then call mp_cfg_open(). This function sets your options and returns a session handle.

Advantages of configuration files

Some advantages to using external text configuration files include the following:
- Rather than making dozens of calls to set up a session handle, you make one call to mp_cfg_open(). Compared with making direct calls to the conventional API, this should reduce application code size, training time, development time, errors, and testing time.
- You can change breaking strategy, list definitions, and reference file setup simply by editing the configuration files. Therefore, you can test different scenarios without changing API calls and rebuilding.
- With small modifications to your code, you can use one code base to support many different MCD scenarios. For each scenario, make a different set of text configuration files.
- Configuration files are heavily commented so that users can edit them without referring to printed documentation. In fact, all the potential options and settings are included in the comments, so making additions and changes to the configuration can be mostly copy-and-paste.

Preset MCD strategy

We include a set of configuration files already set for the most commonly used MCD strategy. Most users will be able to use these without further editing or modification. Just specify the location and file name of the mp.cfg file in your mp_cfg_open() call.

Five configuration files

Match/Consolidate Library configuration parameters are distributed among five configuration files, as described below.
Configuration file    File name      Description
Overall               mp.cfg         Specifies the paths and file names of the other configuration files.
Reference             mpref.cfg      Controls the format of the reference file, which contains the record keys.
List                  mplist.cfg     Defines logical groups of records.
Break                 mpbreak.cfg    Identifies which fields to use to form break groups, and which characters in those fields to use for breaking.
Miscellaneous         mpmisc.cfg     Sets miscellaneous options such as the work-file directory.

Sample configuration file contents (mp_cfg_open() opens the overall file, which names the other four):

MP_Overall_Config_File
  MP_Reference_Config_File: mpref.cfg
  MP_List_Config_File: mplist.cfg
  MP_Break_Config_File: mpbreak.cfg
  MP_Misc_Config_File: mpmisc.cfg
  MP_Keyfile: mplib.key

MP_Reference_Config_File
  MP_Reference_File_Name: test.ref
  MP_Reference_Constant_List_ID: 1
  MP_Reference_List_ID_Field_Length:
  MP_Reference_Misc_Header_Field_Length: 10
  MP_Reference_Misc_Key_Field_Length: 10
  MP_Reference_Priority_Field_Length: 10

MP_List_Config_File
  MP_List_Number_of_Lists: 1
  MP_List_Number: 1
  MP_List_ID: 1
  MP_List_Name: List1
  MP_List_Priority: 1
  MP_List_Type: NORMAL
  MP_List_Compare_Within_This_List: YES
  MP_List_Break_Priority: 1
  MP_List_Apply_Blank_Priority: YES
  MP_List_Data_Salvage: YES
  MP_List_Default_List_Action: ASSIGN_DEFAULT
  MP_List_Default_List_Number: 1

MP_Break_Config_File
  MP_Break_Number_of_Break_Fields: 2
  MP_Break_Field_Name: MTC_KEYFLD_ZIP
  MP_Break_Starting_Position: 1
  MP_Break_Field_Length: 5
  MP_Break_Mode: NORMAL

MP_Misc_Config_File
  MP_Misc_Work_Dir: .
  MP_Misc_Sort_Dir: .
  MP_Misc_Session_Name: session1
  MP_Misc_Max_Work_Buffer: 1024
  MP_Misc_Sort_Option: SPEED
  MP_Misc_Dupe_File_Name: sample.dpd
  MP_Misc_Prioritize_Dupes: LIST_FIELD
  MP_Misc_Key_Work_File_Name: sample.dpk
  MP_Misc_Clean_Up_Work_Files_When_Done: YES
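
The sample parameters above all follow a "Name: value" shape. If you want to inspect or validate such a file in your own tooling before passing it to mp_cfg_open(), a line splitter along the following lines may be enough. This is a sketch under assumptions: demo_parse_cfg_line is a hypothetical helper, not a library function, and it does not handle the comment lines that the shipped files contain, because their syntax is not described in this section.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits one "Name: value" line at the first colon.
 * The value has surrounding whitespace removed and may be empty.
 * Returns 0 on success, or -1 if there is no colon, the name is empty,
 * or either output buffer is too small. */
static int demo_parse_cfg_line(const char *line,
                               char *name, size_t name_size,
                               char *value, size_t value_size)
{
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1;

    size_t name_len = (size_t)(colon - line);
    if (name_len == 0 || name_len >= name_size)
        return -1;

    /* Skip whitespace after the colon. */
    const char *start = colon + 1;
    while (*start != '\0' && isspace((unsigned char)*start))
        start++;

    /* Drop trailing whitespace, including any newline. */
    size_t value_len = strlen(start);
    while (value_len > 0 && isspace((unsigned char)start[value_len - 1]))
        value_len--;
    if (value_len >= value_size)
        return -1;

    memcpy(name, line, name_len);
    name[name_len] = '\0';
    memcpy(value, start, value_len);
    value[value_len] = '\0';
    return 0;
}
```

Splitting at the first colon keeps values that themselves contain a colon, such as a Windows drive path, intact.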

Sample programs show you how to call the library

Sample programs

We provide three sample programs and their source code.

Program       Description
reftest.c     Demonstrates how to create and load a reference file. In this sample program, data is read from a sample ASCII data file.
mptest.c      Demonstrates how to open a previously created reference file, set up lists and matching, set up progress callbacks, find break groups, find duplicates, and query duplicate results.
mptest2.c     Demonstrates how to use the MCD Library configuration files.

Modules

The sample programs use the following modules.

Module        Description
suppfncs.c    Contains supporting functions used by all of the sample programs.
errfncs.c     Demonstrates error handling.

Build the sample programs

For instructions on building the sample programs, see Chapter 1, “Install, compile, and link.”

Error handling and progress callback

Nearly every function in the MCD Library returns a status code. It is important to check the status code after every MCD Library function call. If MCD returns the value MP_ERROR, your application must decide whether to exit or not, what information to display, and so forth.

Error handler

The easiest way to handle error checking is to set up an error-handling function. Call your error handler whenever a function does not return MP_OK.
Inside your error handler, you can call mp_get_error_number() and mp_get_error_messages() to obtain more information about the failed function call.
You can call mp_get_error_info() rather than mp_get_error_number() and mp_get_error_messages(). However, mp_get_error_info() may not work with Visual Basic.
For sample code, refer to the errfncs.c module in the samples subdirectory.
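
The check-every-call pattern described here can be sketched without touching the library at all. In the sketch below, DEMO_OK and DEMO_ERROR are stand-ins for the library's MP_OK and MP_ERROR (whose actual values and type are defined in the library headers, not here), and demo_handle_error and demo_check are hypothetical application functions. See errfncs.c for the shipped sample.

```c
#include <stdio.h>

/* Stand-in status codes for illustration; the real MP_OK and MP_ERROR
 * come from the MCD Library headers. */
enum { DEMO_OK = 0, DEMO_ERROR = 1 };

static int demo_error_count = 0;

/* Hypothetical application error handler. A real handler would obtain
 * details here, for example through mp_get_error_number() and
 * mp_get_error_messages(), and decide whether to exit. */
static void demo_handle_error(const char *function_name)
{
    demo_error_count++;
    fprintf(stderr, "call to %s failed\n", function_name);
}

/* Checks one returned status code and routes any failure to the handler.
 * Returns 1 if the call succeeded, 0 otherwise. */
static int demo_check(int status, const char *function_name)
{
    if (status != DEMO_OK) {
        demo_handle_error(function_name);
        return 0;
    }
    return 1;
}
```

Wrapping each library call as demo_check(status, "function name") keeps the handling in one place, so a policy change (log and continue, or exit) is a one-line edit.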

One error handler for MCD and Match

If you want to use just one error handler for both MCD Library and Match Library function calls, you need to detect which library the current error came from so that you know which get_error_info() function to call. You can do this by checking the prefix of the error-causing function (mp or mtc), or by setting a variable and passing its value as a parameter when calling your error handler.
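
The prefix check mentioned above might look like the following sketch. It assumes that function names are passed to the handler as strings and that the prefixes appear as "mp_" and "mtc_" (the underscore after mtc is an assumption; this section only gives the prefixes mp and mtc). The enum and function names are hypothetical.

```c
#include <string.h>

/* Hypothetical classification of where an error originated. */
enum demo_error_source {
    DEMO_SOURCE_UNKNOWN,
    DEMO_SOURCE_MCD,     /* function name starts with "mp_"  */
    DEMO_SOURCE_MATCH    /* function name starts with "mtc_" */
};

/* Decides which library a failed function belongs to by its name prefix,
 * so that a shared handler can call the matching get_error_info() function. */
static enum demo_error_source demo_classify(const char *function_name)
{
    if (strncmp(function_name, "mp_", 3) == 0)
        return DEMO_SOURCE_MCD;
    if (strncmp(function_name, "mtc_", 4) == 0)
        return DEMO_SOURCE_MATCH;
    return DEMO_SOURCE_UNKNOWN;
}
```

Passing an explicit source value to the handler, the second option described above, avoids string comparisons but requires every call site to supply it.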

Progress callbacks

Certain MCD Library processing steps, such as breaking and finding duplicates, work with multiple records and potentially involve I/O and sorting. If you wish to display progress information for MCD Library processing steps, you can set up progress callbacks for the desired processing steps.
For sample code, refer to the suppfncs.c module in the samples subdirectory.

Work files

During the processes of creating break groups and finding duplicates, MCD creates work files. You can specify where to store these files, what to call them, and whether to delete them automatically.

Work files for break groups

When finding break groups, MCD creates work files. By default, these files are stored in the current directory. If you’d prefer to store them somewhere else, you can specify a work directory.
You must specify a work-file session name. MCD uses the session name as the base file name for work files. For example, if you assign a session name of “test,” work files will have names such as test.dbk.
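
To show how a session name turns into a work-file name such as test.dbk, here is a small sketch. The function demo_work_file_name is hypothetical, the "/" separator is an assumption, and the .dbk extension is simply the one used in the example above; the library chooses the actual names and extensions itself.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "<work_dir>/<session>.<extension>", or just
 * "<session>.<extension>" when no work directory is given (the current
 * directory). Returns 0 on success, or -1 if the result did not fit. */
static int demo_work_file_name(const char *work_dir, const char *session,
                               const char *extension,
                               char *out, size_t out_size)
{
    int written;

    if (work_dir == NULL || strlen(work_dir) == 0)
        written = snprintf(out, out_size, "%s.%s", session, extension);
    else
        written = snprintf(out, out_size, "%s/%s.%s",
                           work_dir, session, extension);

    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Because every work file shares the session name as its base, two jobs that run in the same work directory need different session names to avoid overwriting each other's files.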

Work files for duplicate results

When finding duplicates, MCD builds two additional work files—one for duplicate results, and one for key-file information. These two files can become quite large, so you may want to store them in a separate location. You can specify the path and file names for each of these two work files.

Sample code

To set work-file options, call mp_misc_set_work_file_info(). For sample code, refer to the suppfncs.c module in the samples subdirectory.

Function calls for initialization and termination

Whether you use some or all of the functionality of the MCD Library, you’ll call the following functions in any MCD application.
1. Call mp_init() to initialize the MCD Library.
Initialize the Match Library before initializing the MCD Library. For more information, refer to your Match Library documentation.
2. If you are not using configuration files, call mp_set_keyfile() to set the path and file name of the installation key file, mplib.key. You must specify the location of this file before calling any other MCD functions.
This key file is not related to the record keys stored in your reference file. This key file “unlocks” the library for use.
3. If you are using configuration files, call mp_cfg_open(). For details about configuration files, see “Configuration files make setup faster and easier” on page 17.
4. Set up an error-handling function. Inside your function, call mp_get_error_info(). For more information, see “Error handling and progress callback” on page 20.
5. Optional: Call mp_get_revision() to get version information for the MCD and Match Libraries. You will need this information before calling for technical support.
6. If you are not using configuration files, call the following functions to set miscellaneous options and work-file information:
mp_misc_set_option_info()
mp_misc_set_work_file_info()
7. If you want to establish exit functions—for example, to report progress at crucial points during processing—call mp_misc_set_exit_progress() or mp_misc_set_exit_blank_field() to register your exit functions with the MCD Library.
8. Perform processing. A complete, end-to-end job process would involve the following major steps:
Create, open, or update reference files (Chapter 3).
Create lists and super lists (Chapter 4).
Form break groups (Chapter 5).
Search for duplicates and retrieve results (Chapter 6).
9. If you are using configuration files, call mp_cfg_close().
10. Call mp_term() to terminate the MCD Library and free global memory allocated for it.
Chapter 3: Reference files
This chapter introduces reference files and provides information about creating reference files, adding keys to a reference file, and maintaining reference files.

Introduction to reference files

A reference file is a specialized work file that contains record keys. The record keys are condensed versions of your original record data, containing only the data needed to form break groups, determine list membership, perform matching, and rank records within duplicate groups.
You’ll use the Match Library to define most of the key layout—namely, which record data to include in the key. When you use MCD to create a reference file, you’ll tell MCD which match key to use, and you might add useful MCD key fields to each key.

Header and keys

A reference file consists of header data and record keys.
The header contains information about the key layout. You can also place a list identifier and other user data in the header.
The record keys contain the record data that is needed to form break groups and perform matching; you use the Match Library to define this portion of the key. When you create a reference file, you can also add a record-mapping field, record-priority field, and list identification field to each key.
Header: Key layout, user “miscellaneous” field, list ID (if constant)
Keys: Overhead data, record data, record-mapping field, record-priority field, list ID field (if variable)

Temporary or reusable

A reference file can be used as a temporary work space. For example, if some of your duplicate-detection data comes from a transaction file which is always changing, you can create a temporary reference file from your transaction data each time you want to find duplicates.
Alternately, once you generate keys and store them in a reference file, you can save the reference file and add, delete, update, and reuse the keys. However, many programmers find it easier to regenerate the reference file each time they find duplicates, rather than maintaining and updating keys in an existing reference file.

Key data

A key contains record data plus other data such as a list identifier or record-mapping information. Keys must include all of the data you want to use to form break groups and perform matching.
You’ll use the Match Library to define the layout and content of the key. When you use the MCD Library to create the reference file, you’ll specify which match key to use. Optionally, you can add the MCD key fields listed below.
Key field          Description
List ID            Defines to which list a key belongs.
Field priority     Defines a record’s priority within a group of duplicate records.
Key miscellaneous  Stores user information such as a record identifier that allows you to map a key back to a particular record in your database.

Key length

Each key consists of overhead data, Match Library key fields, and MCD Library key fields. To calculate the length of each key, find the sum of the following items:
The size of the overhead data (16 bytes per key).
The length of each Match Library key field, including any alternates.
The length of each MCD key field.
Match/Consolidate pads each key so that the total key length is divisible by eight.
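The key-length calculation can be sketched as below. The function name `padded_key_length()` and the field lengths you pass in are illustrative assumptions; only the 16-byte overhead and the padding to a multiple of eight come from the description above.

```c
#include <stddef.h>

#define KEY_OVERHEAD_BYTES 16u /* overhead data per key */
#define KEY_ALIGNMENT 8u       /* total key length must be divisible by 8 */

/* Sum the overhead, the Match Library field lengths (including alternates),
 * and the MCD field lengths, then round up to the next multiple of eight. */
size_t padded_key_length(const size_t *match_field_lens, size_t n_match,
                         const size_t *mcd_field_lens, size_t n_mcd)
{
    size_t total = KEY_OVERHEAD_BYTES;
    size_t i;

    for (i = 0; i < n_match; i++)
        total += match_field_lens[i];
    for (i = 0; i < n_mcd; i++)
        total += mcd_field_lens[i];
    return (total + KEY_ALIGNMENT - 1) / KEY_ALIGNMENT * KEY_ALIGNMENT;
}
```

For instance, Match Library fields of 20, 30, and 5 bytes plus MCD fields of 4 and 10 bytes give 16 + 55 + 14 = 85 bytes, which is padded to 88.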

List identifiers

There are two ways to store list identifiers:
If all keys in the reference file belong to the same list, you can store the list ID once, in the reference-file header.
If the reference file contains keys from multiple lists, store the list ID in each key, in the MCD list ID field.

Create a reference file

Configuration file

If you use the MCD configuration files, use the Reference configuration file, mpref.cfg, to define MCD key fields and reference-file header information. For instructions about completing the configuration parameters, see the comments in the configuration file.
This configuration file replaces calls to mp_refcreate_set_option(). For more information about using configuration files, see “Configuration files make setup faster and easier” on page 17.

Sequence of function calls

To create an empty reference file, call the functions below. For sample code, refer to reftest.c.
1. Use the Match Library to define a match key. The match key defines the fields that hold record data—for example, personal-name data, address data, and so on. For more information, refer to the Match Library Programmer’s Reference manual.
2. Call mp_refcreate_open() to initialize the library and open a session to create a reference file.
3. Call mp_refcreate_set_mtc_key_id() to specify which match-key layout to use. You used the Match Library to define the match-key layout (step 1).
4. Call mp_refcreate_set_option() to reserve space for MCD key fields and set options. If you are using MCD configuration files, you do not need to make these calls.
5. Call mp_refcreate_close() to lock your settings and create the reference file.

Add keys to a reference file

To load data into a reference file, add keys. Use the Match Library to load record data into each key. Use the MCD Library to load data into the MCD key fields.
To add keys to a reference file, call the functions below. For sample code, refer to reftest.c.
1. Call mp_ref_open() to open a previously created reference file.
2. Call mp_refqry_get_mtc_key_id() to get the match-key ID for the reference file. You will pass this ID as input to Match Library functions.
3. Enter a loop to load data into each key:
   Call mp_refmod_clear_key() to clear the internal key buffer before loading data.
   Use the Match Library to load record data into the Match key fields. For more information, see your Match Library manual.
   Call mp_refmod_set_data() to set data in MCD key fields. Call this function once for each MCD field.
   Call mp_refmod_add_key() to add the completed key to the reference file.
4. Call mp_ref_close() to close the reference file.

Maintain a reference file

Many programmers find that it’s easiest to regenerate reference files each time they perform MCD processing, rather than trying to maintain the files. Often the time needed to regenerate a reference file is minimal, especially when weighed against the work of maintaining it.
In some cases, however, you may prefer to maintain an existing reference file. To update an existing key, you need to find the key, modify the key fields as necessary, and write the data back to the key.
To modify a reference file, use the mp_refmod_*() functions and the appropriate Match Library functions. To query the contents of keys, use the mp_refqry_*() functions and the appropriate Match Library functions. For more information about working with Match Library key data, refer to the Match Library Programmer’s Reference manual.
Chapter 4: Lists, match specifications, and match levels
This chapter provides an introduction to lists and explains how to set up lists, control matching within and between lists, give one list priority over another, and gather statistics for a group of lists.

Introduction to lists

A list is a group of records that are related in some way—for example, all of the records might have come from the same input database. Lists give you added control over the matching process:
Control which matching rules to use when comparing two records.
Cancel comparisons within a list if you know there are no duplicate records within that list.
Assign one list priority over another. For example, you could favor a house list over a rented list.
Prioritize records within a break group.

List membership

When you set up a list, you assign a list identifier for that list. A record is a member of that list if the record has the same list identifier. There are two ways to assign records to lists:
If all of the records (keys) in a reference file belong to the same list, you can put the appropriate list ID in the file header.
Alternately, you can include a list ID field in your record key. The value in the list ID field indicates to which list a particular record belongs.

Types of lists

You can define three different types of lists:
List type    Description
Normal       Contains good or eligible records (default).
Suppression  Contains records that should be suppressed. Suppression records and all records that match them should be removed from the output.
Special      Contains records that are not counted in the determination of whether a duplicate group is single-list or multiple-list.

If a record doesn’t belong to any list

If a record doesn’t belong to any of your defined lists, you can:
Action  Outcome
Ignore  Leave the record out of the job.
Abort   Return an error code.
Assign  Assign the record to a default list that you specify.
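The membership rule and the three fallback actions can be modeled with a small lookup. Everything here is a hypothetical sketch: the enum names, the return codes, the zero-based list numbers, and the use of strings for list IDs are assumptions made for illustration, not the library's own types or constants.

```c
#include <string.h>

/* Hypothetical stand-ins for the Ignore, Abort, and Assign actions. */
typedef enum { UNLISTED_IGNORE, UNLISTED_ABORT, UNLISTED_ASSIGN } unlisted_action;

/* Hypothetical return codes for keys that match no defined list. */
enum { RESOLVE_IGNORED = -1, RESOLVE_ABORTED = -2 };

/* Return the zero-based number of the list whose ID equals key_list_id.
 * If no list matches, apply the chosen action for unlisted records. */
int resolve_list(const char *key_list_id, const char *const *list_ids,
                 int num_lists, unlisted_action action, int default_list)
{
    int i;

    for (i = 0; i < num_lists; i++)
        if (strcmp(key_list_id, list_ids[i]) == 0)
            return i;

    switch (action) {
    case UNLISTED_ASSIGN:
        return default_list;    /* place the record in the default list */
    case UNLISTED_ABORT:
        return RESOLVE_ABORTED; /* caller should stop with an error */
    default:
        return RESOLVE_IGNORED; /* caller should leave the record out */
    }
}
```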

Priority

You can use lists to influence how records are ranked within break groups and how records are ranked within duplicate groups.
Type of priority  Description
Break priority    After finding break groups, MCD sorts records within each group. Records are sorted first by break field, then by a “break-priority” value that you can set for each list. The top-ranked record in each break group becomes the driver record during matching.
List priority, Apply blank priority  After finding duplicates, MCD sorts records within each duplicate group. There are two ways you can use lists to influence this sorting process:
   You can set a list-priority value for each list and have MCD sort by list priority. For example, you might favor a house list over a rented list.
   You can also tell MCD whether to apply your blank-priority setting to records from a particular list.

Data salvaging

During matching, MCD can salvage data from a matching record and temporarily place that data into the driver record for subsequent matching. You can turn this feature on or off for each list.

Set up lists

There are two ways to set up lists: through the List configuration file or through function calls.

Configuration file

If you are using the MCD configuration files, use the List configuration file, mplist.cfg, to define lists. You’ll still use API calls to set up match specifications, match levels, and super lists.
For instructions about completing the configuration parameters, see the comments in the configuration file. The list configuration file replaces all of the function calls listed below.

Function calls

If you are not using MCD configuration files, call these functions to define lists. For sample code, refer to suppfncs.c.
1. Call mp_list_set_num_lists() to set the number of lists and the number of match levels to use. You must define at least one list and one match level.
2. Enter a loop. Make the following calls for each list that you want to define.
   Call mp_list_set_list_name() to set the name of the list.
   Call mp_list_set_list_id() to set the list ID. A record key must have the same list ID to be considered a member of this list.
   Call mp_list_set_list_attr() to specify list attributes.
3. Call mp_list_set_default_action() to specify what action to take if a record key does not belong to any of your defined lists.
4. If you set the default action to MP_DLIST_ACTION_ASSIGN, call mp_list_set_default_list_num() to specify which list is the default list.

Control matching within and between lists

To control matching within and between lists, you can define multiple match specifications and match levels. For example, you might perform household matching, then perform individual matching within each household.

Match specification

A match specification defines matching rules to use within a list or between two lists. A match specification consists of a Match Library automatic-matching session, one or more rule-matching sessions, both, or a callback function. For information about setting up an automatic matching session (auto_id) or a rule matching session (rule_id), refer to the Match Library Programmer’s Reference manual.
Type of match specification  Contents
auto_id                      auto_id
rule_id                      rule_id1, rule_id2, ..., rule_idn
auto_id and rule_id          auto_id, rule_id1, rule_id2, ..., rule_idn
callback                     A user-supplied function returns any of the three match specifications listed above.
By default, MCD compares all records from all lists using the first match specification. If desired, you can set up specific match specifications to control comparisons within or between particular lists. You can also cancel comparisons within or between lists.

Match level

A match level consists of one or more match specifications. Duplicates are found at each level, and then the results are made available for the next match level.
For example, you could use multiple match levels if you wanted to detect duplicates at the household level and at the individual level within the household level.
Input
John Smith
Robert Johnson
Mary Jones
Jonathon Smith
Jerry Doe
Anna Smith
Duplicate group, match level 1: Household
John Smith
Jonathon Smith
Anna Smith
Duplicate group, match level 2: Individual
John Smith
Jonathon Smith

Order is important

The order of the match levels is important because duplicates are found at each level, and only the results are made available for the next level. Usually, you will define your broadest match levels first, followed by more specific match levels.
In the household example above, if the order of the match levels were reversed, Anna Smith would not be included in the duplicate group at the individual match level, and thus would not be available for duplicate detection at the household match level.

Considerations

If you use multiple match levels with many lists, memory requirements grow quite rapidly. Extra match levels also require more comparisons and processing, and they make duplicate results more complicated to retrieve and understand.

Function calls

To define match specifications and match levels, call the functions below. For sample code, refer to suppfncs.c.
1. Call mp_list_set_num_lists() to set the number of lists and match levels that will be defined. You must define at least one match level.
If you use configuration files, you do not need to call this function.
2. Call mp_list_set_num_match_specs() to set the number of match specifications that will be defined. You must define at least one match specification.
3. For each match specification, call mp_list_set_match_spec_ruleids(), mp_list_set_match_spec_autoid(), or mp_list_set_exit_match_spec() to define the match specification.
4. By default, MCD Library compares all records within or between all lists, using the first match specification. To change comparisons within or between lists, to set multiple match levels, or to cancel comparisons within or between lists, call mp_list_set_match_spec().

Give one list priority over another

For information about list priority, refer to the User’s Guide to Record Matching.

Master and subordinate records

During the matching process, MCD assembles matching records into duplicate groups. Within each group MCD ranks records. As shown below, the best record is called the “master” record, and all remaining records are subordinate duplicates.
Master record
Subordinate duplicate 1
Subordinate duplicate 2
Subordinate duplicate 3
When you set up lists, you can give one list priority over another, and thus control how records are ranked within duplicate groups.

List priority

You can use list priority to set a preference for one list over another. For example, if you are processing a house list and a rented list, you might prefer to keep records from the house list.
To control list priority, you assign penalty points to each list. The more penalty points you assign a list, the lower its priority. For example, if you assign your house list a score of 1 and the rented list a score of 10, records from the house list will be ranked more highly. Remember: The lower the penalty score, the higher the priority.
To set list priority, call mp_list_set_list_attr().
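The effect of penalty points on ranking can be illustrated with an ordinary sort. This sketch is not how MCD ranks records internally: the `dup_record` structure is hypothetical, the tie-break on record ID is an assumption added only to make the order deterministic, and other ranking factors such as blank priority are ignored.

```c
#include <stdlib.h>

/* Hypothetical view of one record in a duplicate group. */
typedef struct {
    int record_id;    /* maps back to the original record */
    int list_penalty; /* penalty points assigned to the record's list */
} dup_record;

/* Lower penalty sorts first; ties fall back to record_id (an assumption). */
static int by_penalty(const void *a, const void *b)
{
    const dup_record *ra = a;
    const dup_record *rb = b;

    if (ra->list_penalty != rb->list_penalty)
        return (ra->list_penalty > rb->list_penalty) -
               (ra->list_penalty < rb->list_penalty);
    return (ra->record_id > rb->record_id) - (ra->record_id < rb->record_id);
}

/* After sorting, group[0] plays the role of the master record. */
void rank_duplicate_group(dup_record *group, size_t n)
{
    qsort(group, n, sizeof group[0], by_penalty);
}
```

With a house list at 1 penalty point and a rented list at 10, a house-list record in the group sorts ahead of every rented-list record.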

Gather statistics for a group of lists

A super list is a group of lists. You can use super lists to prepare a second set of match statistics, combining the statistics for two or more regular lists.
For example, suppose you define five lists—two house lists and three rented lists. You would get match statistics for each individual list. But suppose that you also wanted a summary for the house lists and a summary for the rented lists. You could create two super lists—one for the house lists, and one for the rented lists.
Super lists affect only the way that match statistics are reported. They do not affect matching or record priority.
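Conceptually, a super-list statistic is a roll-up of the same statistic from its member lists. The sketch below shows that roll-up for a single counter. The function name, the membership array, and the assumption that each list belongs to at most one super list are all illustrative; the library computes its own statistics.

```c
#include <stddef.h>

/* Add each list's count into the total of the super list it belongs to.
 * membership[i] is the zero-based super-list number for list i, or -1 if
 * list i belongs to no super list. */
void summarize_super_lists(const long *list_counts, const int *membership,
                           size_t num_lists, long *super_totals,
                           size_t num_super)
{
    size_t i;

    for (i = 0; i < num_super; i++)
        super_totals[i] = 0;
    for (i = 0; i < num_lists; i++) {
        int s = membership[i];
        if (s >= 0 && (size_t)s < num_super)
            super_totals[s] += list_counts[i];
    }
}
```

For the five-list example above, two house lists map to super list 0 and three rented lists map to super list 1, so each super list reports the sum of its members while the per-list figures stay unchanged.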

Function calls

To create a super list, call these functions:
1. Call mp_list_set_num_super_lists() to set the number of super lists that will be defined.
2. For each super list:
   Call mp_list_set_super_list_name() to set the super-list name.
   Call mp_list_set_super_list_num() to specify which lists belong to this super list.
Chapter 5: Break groups
This chapter provides an introduction to breaking and explains how to set up normal, adaptive, and automatic breaking. The chapter also provides information about querying break groups.

Introduction to breaking

Eliminate needless comparisons

Breaking is a judgment by you, the user, that certain records shouldn’t be compared because there is no realistic probability that they would be considered duplicates. For example, many users form break groups based partly on the first three digits of the ZIP Code, because if the first three digits of the ZIP Code do not match, there is virtually no chance that the records match.
During breaking, input keys are sorted into groups, based on the field data that you identify for breaking. Then, when keys are compared during the search for duplicate records, comparisons are made only within—not among—these groups. This can drastically reduce the number of comparisons that MCD must make, and thus substantially reduce processing time.
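A quick calculation shows why breaking reduces work so sharply. The sketch assumes, as a simplification, that every key in a group is compared with every other key in that group; MCD's actual comparison strategy may do less work than this upper bound.

```c
#include <stddef.h>

/* Number of distinct pairs among n keys: n * (n - 1) / 2. */
unsigned long long pair_count(unsigned long long n)
{
    return n < 2 ? 0 : n * (n - 1) / 2;
}

/* Total pairs when comparisons happen only within each break group. */
unsigned long long comparisons_with_breaking(const unsigned long long *sizes,
                                             size_t n_groups)
{
    unsigned long long total = 0;
    size_t i;

    for (i = 0; i < n_groups; i++)
        total += pair_count(sizes[i]);
    return total;
}
```

Under this assumption, 1,000 keys in a single group allow 499,500 pairs, while the same keys split into ten groups of 100 allow 49,500, a tenfold reduction.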

Types of breaking

The MCD Library offers three types of breaking:
Type of breaking    Description
Normal breaking     You specify which fields to use for breaking.
Adaptive breaking   Helps you balance performance (which is optimal with small break groups) and matching effectiveness (which is optimal with larger break groups). You specify which fields to use for breaking and the maximum break-group size. MCD breaks according to your settings, but combines small break groups whenever possible.
Automatic breaking  You make a few settings and select a few options, and MCD decides which fields to use for breaking.

Break-group priority

After forming break groups, MCD sorts the records within each group. Records are sorted first by break field, then by a “break priority” setting that you can make for each list. The first record in each break group becomes the driver record for matching.
You can use break priority to increase the chances that records from a particular list become the driver records for matching.

Queries

After breaking, you can query break groups to determine the success of your breaking strategy. You can also use break group queries to select certain break groups for duplicate detection.

Set up normal breaking

For normal breaking, you specify which fields, and precisely which characters from the field, to use for breaking. To use a field for breaking, that field should be common to all of your reference files. For example, if one of your files does not have a firm-name field, do not use the firm-name field for breaking.
When you choose a field for breaking, consider the quality of the input data. For example, if you use unstandardized data, or a field that is sometimes blank, some duplicates may be missed.

Unstandardized data

If possible, the field on which you break should contain standardized data. Otherwise, typing errors or inconsistencies may cause matching records to be placed into different break groups. Those records will never be compared and thus never detected as duplicates.
For example, if you were to break on the first three characters of an unstandardized first-name field, you would fail to catch the following records as duplicates:
Richard Smith
100 Main St

Dick Smith
100 Main St

Five-digit ZIP Codes

Some ZIP Codes serve Post Office boxes only. This is important if you are matching business addresses, which sometimes use a street address and sometimes use a PO box. If you broke on all five digits of ZIP, you would miss these duplicates:
ACME Hardware
1234 Main St
La Crosse WI 54601

ACME Hardware
PO Box 100
La Crosse WI 54602
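To see how the choice of break characters separates or unites the two ACME Hardware records, the sketch below builds a break string from the leading characters of a ZIP Code and a street name. The helper `make_break_string()` is hypothetical; it does not reflect how the library stores break strings, and it performs no case folding or padding.

```c
#include <stdio.h>

/* Concatenate the first zip_chars characters of zip and the first
 * street_chars characters of street into out. Returns 0 on success, or -1
 * on a negative length or if the result does not fit in the buffer. */
int make_break_string(char *out, size_t out_size, const char *zip,
                      int zip_chars, const char *street, int street_chars)
{
    int n;

    if (zip_chars < 0 || street_chars < 0)
        return -1;
    n = snprintf(out, out_size, "%.*s%.*s", zip_chars, zip,
                 street_chars, street);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Breaking on all five digits gives "54601" and "54602", so the two records land in different groups. Breaking on three digits gives "546" for both, so they can be compared.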

Blank fields

It is risky to break on any field that is empty in some records. For example, if you were to break on the first three digits of the telephone number, you would fail to catch the following duplicates:
Jane Smith
100 Main St
608.788.8154

Jane Smith
100 Main St

Match options

Some breaking strategies may defeat match options that you defined in the Match Library.
For example, suppose you instructed the Match Library to search for acronym matches in the firm field, to detect matches such as IBM and International Business Machines. If you also used the firm field for breaking, acronyms and spelled-out names would end up in separate break groups and thus would never be compared. For example, you would miss these duplicates:
Rita Terranova
ETI

Rita Terranova
Eco Technologies Inc.

Configuration file

If you are using the MCD configuration file, you can use the break configuration file, mpbreak.cfg, to set up normal breaking. For instructions about completing the configuration parameters, see the comments in the configuration file.
The configuration file replaces calls to mp_break_set_info() and mp_breakfld_set_info().

Sequence of function calls

Before breaking, you must specify the work directory and the MCD Library session name so that work files can be built.
1. Optional: You may wish to write an exit function to report progress to the user during the process of finding break groups. Register your function by calling mp_misc_set_exit_progress() during initialization.
2. Call mp_break_set_info() to set the break mode and number of break fields.
If you do not set the number of break fields, MCD assumes there are no break fields and places all records into one break group.
3. For each break field, enter a loop and make three calls to mp_breakfld_set_info() to set the break field type, start position, and field length.
4. Specify which reference file(s) to use. Call mp_input_set_num_refs() to set the number of reference files, then call mp_input_set_ref_num() once for each reference file to attach it to the session.
5. Call mp_break_find_groups() to form break groups.

Set up adaptive breaking

Good break groups are essential for optimum performance. If break groups are too large, the job may take a long time to process. Or, if break groups are too small, you can miss duplicates. With adaptive breaking, we balance performance and matching effectiveness. We do this by keeping the size of each break group below a specific value that you can set. The following is an example of how this works.
You set up break fields just as you would with normal breaking, plus you set a maximum break group size. Breaking is first done to the finest level possible according to your settings, just as with normal breaking. But then, break fields are adjusted in an attempt to combine small break groups into larger groups, thus increasing your chances of catching duplicates.
For example, suppose that we have a maximum break group size of 40 record keys, and we set breaking for all five digits of the ZIP Code and the first two characters of the street name. First, our defined break groups would be created. Then, wherever possible, small break groups would be combined into larger groups without exceeding the 40-record maximum:
Defined break group    Records  Result
ZIP 54601, Street Ma   27       Combined into ZIP 546 (39 records)
ZIP 54603, Street Me   5        Combined into ZIP 546 (39 records)
ZIP 54660, Street Na   7        Combined into ZIP 546 (39 records)
ZIP 54494, Street Na   13       Combined into ZIP 54494, Street N (33 records)
ZIP 54494, Street Ne   20       Combined into ZIP 54494, Street N (33 records)
ZIP 54494, Street Ma   34       Left as it is
ZIP 60606, Street Ma   42       Left as it is
MCD can combine the first three break groups by adjusting the ZIP-Code break and forming a “546” group.
If MCD tries to combine the next three break groups on the “54494” ZIP Code, resultant groups would exceed the 40-record maximum. However, MCD can combine the “54494 / Na” group and “54494 / Ne” group by adjusting our street-name break and forming a “54494 / N” group.
MCD leaves the “54494 / Ma” group as it is. This group cannot be combined with any other groups without exceeding the 40-record maximum.
The “60606 / Ma” group exceeds the 40-record maximum, even at the finest break level. MCD leaves this group as it is. MCD never attempts to break a group finer than your break-group settings.
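The size test behind each decision in this example can be checked with a few lines of arithmetic. The sketch covers only the size check; it does not reproduce how MCD chooses which break fields to shorten. The `can_combine()` helper is a hypothetical name.

```c
#include <stddef.h>

/* Return 1 if the groups can be merged without exceeding max_size, and
 * store the merged size in *combined when combined is not NULL.
 * Return 0 if the merged group would be too large. */
int can_combine(const unsigned *sizes, size_t n, unsigned max_size,
                unsigned *combined)
{
    unsigned total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (sizes[i] > max_size - total)
            return 0; /* adding this group would exceed the maximum */
        total += sizes[i];
    }
    if (combined != NULL)
        *combined = total;
    return 1;
}
```

With a maximum of 40, the groups of 27, 5, and 7 keys fit (39). The groups of 13, 20, and 34 do not (67), but 13 and 20 alone do (33).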

Queries

To determine which maximum break-group size to select, and more generally, what kind of breaking results you might expect, you can do break group queries before you actually find duplicates. This should allow you to fine tune your break settings.

Configuration file

If you are using the MCD configuration file, you can use the break configuration file, mpbreak.cfg, to set up adaptive breaking. For instructions about completing the configuration parameters, see the comments in the configuration file.
The configuration file replaces calls to mp_break_set_info() and mp_breakfld_set_info().

Function calls

Set up adaptive breaking as you would normal breaking (see “Set up normal breaking” on page 39), but make an additional call to mp_break_set_info() to set the maximum number of record keys that MCD can place into a combined break group. Also, be sure to set the break mode to adaptive.

Set up automatic breaking

Automatic breaking is the easiest way to set up break groups. You make a few settings, and MCD selects one or more break groups that are common to all of the reference files. You can query the resultant break groups and accept them or modify them.

Fields used MCD limits automatic breaking to the following fields:

ZIP Code
Street name
Street range (house number)
Last name
To use automatic breaking, at least one of these fields must be common among all of your reference files. You can set additional break groups if you wish.

Optional settings

You may set the type of matching strategy, the tightness, and the density. This information helps MCD decide which break fields to select.

Sequence of function calls

Before breaking, you must specify the work directory and the MCD Library session name so that work files can be built.
1. Optional: You may wish to write an exit function to report progress to the user during the process of finding break groups. Register your function by calling mp_misc_set_exit_progress() during initialization.
2. Call mp_break_set_auto() to set up automatic breaking.
3. Specify which reference file(s) to use. Call mp_input_set_num_refs() to set the number of reference files, then call mp_input_set_ref_num() once for each reference file to attach it to the session.
4. Call mp_break_find_groups() to form break groups.

Query break groups

After forming break groups, you can use queries to evaluate your break strategy or to find duplicates one break group at a time.

Evaluate a breaking strategy

To evaluate your breaking strategy, you can get the following information:
Total number of break groups
Number of record keys in the largest break group
Break string for the largest break group
Break strings for a given break group
Break string lengths
Number of record keys found for a particular user query
When performing a query, you must set at least one break field, and the order of the query fields must match the order of the break fields.
For example, if you broke on all five digits of ZIP Code and three characters of street name, you could query on 1, 2, 3, 4, or 5 digits of ZIP Code, or 1, 2, 3, 4, or 5 digits of ZIP Code plus 1, 2, or 3 characters of street name. However, a query of just 1, 2, or 3 characters of street name would not be valid.

Select a group with which to find duplicates

Break group queries allow you to select one group of records at a time with which to subsequently find duplicates. You might take this approach to save time while testing your matching setup, or simply to break the search job down into manageable units.
For example, suppose you broke on five digits of ZIP and three characters of street name. Some break groups such as “54601MAI” may contain very few keys, so you might want to break “less fine” in order to catch more duplicates. You could query on a three-digit ZIP such as “546”, get the query results back, and decide whether to find duplicates in that group. If the query results said that the “546” group contained a million keys, you would probably want to do another query, perhaps on a five-digit ZIP such as “54601”.
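The idea of querying on a shorter ZIP prefix and checking how many keys come back can be modeled as a prefix match over break strings. This is a simplified sketch with made-up data: the `break_group` structure and `count_keys_for_prefix()` are hypothetical, and it handles only queries on the leading characters of the break string, which is consistent with the rule that query fields follow the break-field order.

```c
#include <string.h>

/* Hypothetical summary of one break group. */
typedef struct {
    const char *break_string; /* for example "54601MAI" */
    unsigned long num_keys;   /* keys placed in this group */
} break_group;

/* Count the keys in all groups whose break string starts with prefix. */
unsigned long count_keys_for_prefix(const break_group *groups, size_t n,
                                    const char *prefix)
{
    size_t plen = strlen(prefix);
    unsigned long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (strncmp(groups[i].break_string, prefix, plen) == 0)
            total += groups[i].num_keys;
    return total;
}
```

If the count for "546" turns out to be unmanageably large, a second call with "54601" narrows the selection, mirroring the two-step query described above.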
You can also do multiple queries to form a group for a duplicate search. For example, suppose you wanted to find dupes in the “54601” break group, but you also wanted to include the “ ” break group (blank ZIP Code). You could do two queries and use the results to find duplicates.
When query results are acceptable, you can find duplicates by passing the break-query identifiers to the dupe search function mp_dupesrch_break_queries().
If you use multiple queries to form a group and your query results contain overlapping records, your dupe results may be difficult to interpret, because you may end up comparing a record to itself.

Sequence of function calls

The sequence of function calls is as follows:
1. Form break groups.
2. Call mp_breakqry_get_largest_group(). It’s a quick way to detect a problem.
If your largest break group is too large, your breaking strategy may need to be adjusted.
3. Get a list of all break strings, so you’ll know what to query:
   - Call mp_breakqry_get_str_len() to get the break-string length. Allocate an array for the list.
   - Call mp_breakqry_get_num_groups() to get the number of break groups that were found. Use this number to increment through a loop.
   - Enter a loop. Inside the loop, call the function mp_breakqry_get_next_str() to get the next break string.
4. Call mp_breakqry_init() to set states and allocate memory for the query
process.
5. Call mp_breakqry_set_qryfld() to select a break field and set the query value
for this field.
6. Call mp_breakqry_do_query() to execute your query.
7. Call mp_breakqry_get_num_keys() to get the number of keys in the break
group(s) that met your query.
8. Call mp_breakqry_term() to free memory allocated for the query.
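Step 3 of the sequence above might be coded roughly as follows. This is a sketch under stated assumptions, not the library's sample code. The three mp_breakqry_*() signatures follow the Synopsis entries in Chapter 7, but the stand-in definitions, the MP_ID typedef, the value of MP_OK, and the canned break strings exist only so that the sketch compiles without the library. list_break_strings() and free_break_strings() are hypothetical helper names. The sketch also assumes that break strings are null-terminated and that break-group numbers start at 1, as in the mp_breakqry_get_next_str() example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* --- Stand-ins (assumptions). With the real library, delete this part
 *     and include <mpbquery.h> instead. --- */
typedef int MP_ID;
#define MP_OK 0
#define DEMO_NUM_GROUPS 3
static const char *const demo_groups[DEMO_NUM_GROUPS] =
    { "54601MAI", "54601OAK", "54603MAI" };

int mp_breakqry_get_str_len(MP_ID mp_id, int *break_str_len)
{ (void)mp_id; *break_str_len = 8; return MP_OK; }

int mp_breakqry_get_num_groups(MP_ID mp_id, unsigned long *num_groups)
{ (void)mp_id; *num_groups = DEMO_NUM_GROUPS; return MP_OK; }

int mp_breakqry_get_next_str(MP_ID mp_id, int break_str_num, char *break_str)
{
    (void)mp_id;
    if (break_str_num < 1 || break_str_num > DEMO_NUM_GROUPS)
        return -1;                      /* any value other than MP_OK */
    strcpy(break_str, demo_groups[break_str_num - 1]);
    return MP_OK;
}
/* --- End of stand-ins. --- */

/* Releases a list returned by list_break_strings(). */
void free_break_strings(char **list, unsigned long n)
{
    unsigned long i;
    if (list == NULL) return;
    for (i = 0; i < n; i++) free(list[i]);
    free(list);
}

/* Returns a malloc'ed array of break strings, or NULL on failure. */
char **list_break_strings(MP_ID mp_id, unsigned long *num_groups_out)
{
    int len = 0;
    unsigned long n = 0, i;
    char **list;

    assert(num_groups_out != NULL);
    if (mp_breakqry_get_str_len(mp_id, &len) != MP_OK) return NULL;
    if (mp_breakqry_get_num_groups(mp_id, &n) != MP_OK) return NULL;

    list = calloc(n ? n : 1, sizeof *list);
    if (list == NULL) return NULL;

    for (i = 0; i < n; i++) {
        list[i] = calloc((size_t)len + 1, 1);   /* room for the terminator */
        if (list[i] == NULL ||
            mp_breakqry_get_next_str(mp_id, (int)(i + 1), list[i]) != MP_OK) {
            free_break_strings(list, i + 1);
            return NULL;
        }
    }
    *num_groups_out = n;
    return list;
}
```

The returned list can then be scanned to choose values for mp_breakqry_set_qryfld().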
Chapter 6: Duplicate search and results
This chapter provides an introduction to the duplicate search and results function, explains how Match/Consolidate (MCD) selects keys to compare, and lists the sequence of function calls for finding and retrieving duplicates.

Introduction

This introduction provides an overview. For a more detailed discussion of the matching process, refer to the User’s Guide to Record Matching.

Find duplicates

You can find duplicates in a batch process, which finds duplicates in all break groups, or you can search selected break groups.
To find duplicates within all break groups in all specified reference files, use batch mode.
To find duplicates within user-specified break groups only, use custom break-group queries as discussed in “Query break groups” on page 44.

Query results

Match/Consolidate stores duplicate results in a work file. You can save these results for later, or you can query them right away. Duplicate results are available for each match level and can only be queried one match level at a time.

Use an exit routine to override decisions

You can override a decision that the Match Library has made or will make about a comparison.
For example, you can set a pair of keys to be duplicates or not duplicates before the Match Library compares them. You can also override a decision after the Match Library has already compared two keys.
If you set up a pre-comparison exit, MCD passes three pieces of information to your exit function:
Match level
Key IDs of the records to be compared
User-settable return code
From within the exit routine, you can call the mp_dupexit_get_data() function to retrieve user data from the desired key. You can also call the mtc_key_get_*() functions to directly retrieve key-field data. Your function passes back a return code to indicate the results (if any) of your match decision.
Caution: Comparison exits are potentially expensive in terms of extra processing time. The user exit could be called millions, even billions, of times.

How Match/Consolidate selects keys to compare

The match feeder is the part of the MCD Library that selects pairs of keys from the break groups and passes them to the Match Library for duplicate detection. The match feeder also selects which key will be the driver record, based on break-group priority settings.

Lists determine key pairs

Your list settings are critical in determining which key pairs will be selected and how each pair will be compared. The list comparison settings tell the match feeder which lists to compare within and between. The match levels and match specifications determine which Match Library matching sessions will be used, and how many levels of comparison will be performed.

Common format

Before passing keys to the Match Library, the match feeder puts the fields in a common format. For example, if the first name field is 10 characters in one reference file and 12 characters in another reference file, the match feeder selects a common format of 12 characters for the first name field.

Sequence of record comparisons

The MCD Library finds duplicates by comparing records one pair at a time. To find duplicates in a break group of just ten records, up to 45 comparisons could be performed.
[Figure: grid of records 1 through 10 compared against records 1 through 10. Legend: comparisons performed; comparisons not performed because MCD does not compare a record with itself; comparisons not performed because once MCD compares record 1 with record 2, it does not compare record 2 with record 1.]
If you have more than one match level, even more comparisons are done. All necessary comparisons are done at each match level.
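The figure of 45 comes from counting each unordered pair once: n(n - 1)/2 for n records. The sketch below computes that upper bound. max_comparisons() is a hypothetical helper, and multiplying by the number of match levels is an assumption that treats every pair as compared at every level, so the result is a ceiling rather than a prediction.

```c
#include <assert.h>

/* Upper bound on comparisons for one break group of n records:
 * every unordered pair once, repeated at each match level.
 * Note: the product can overflow unsigned long for very large groups. */
unsigned long max_comparisons(unsigned long n, unsigned int match_levels)
{
    assert(match_levels >= 1);
    if (n < 2)
        return 0;
    return (n * (n - 1) / 2) * match_levels;
}
```

For a break group of ten records, this gives 45 at one match level and 90 at two. A group of 1,000 records gives 499,500 at one level, which shows why an oversized break group is worth detecting early.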

Driver record

When the search begins, record #1 is called the driver record. It is compared with each non-driver record (#2, #3, #4, and so on, to record #10). Then record #2 becomes the driver. It is compared with #3, #4, and so on. Finally, record #9 becomes the driver and is compared to record #10. Match/Consolidate uses this process to proceed through all 45 comparisons.

Some comparisons are skipped

Some comparisons are skipped. If a record is already tagged as a duplicate, it loses its chance to be the driver. For example, suppose that on the very first comparison records #1 and #2 match. In this case, record #2 is tagged as a duplicate, and it never gets a chance to be the driver. Thus any comparisons that would have been made with record #2 as the driver will be skipped.
We assume that if records #1 and #2 match, there is no need for further comparisons with record #2. Suppose that the next comparison—#1 and #3— yields a match. Then all three records would be in the same duplicate group. We never actually compared record #2 with record #3, but we assume that if they both match record #1, then they match each other.
[Figure: the same grid of records 1 through 10, with M marking a match found and m marking a match assumed. Legend: comparisons performed; comparisons skipped because record 1 matches record 2; comparisons skipped because record 1 matches record 3; match assumed because if record 1 matches record 2, and record 1 matches record 3, then record 2 matches record 3.]
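The driver-record walk and the skipping rule can be simulated in a few lines. This is an illustrative model only, not the match feeder's implementation: count_comparisons(), match_fn, and the two sample match rules are hypothetical, records are numbered from 0, and list-based cancellation (searching deactivated within or between lists) is not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RECORDS 64

/* Caller-supplied match test: nonzero if records a and b match. */
typedef int (*match_fn)(size_t a, size_t b);

/* Walks one break group of n records and returns the number of
 * comparisons actually performed. */
unsigned long count_comparisons(size_t n, match_fn is_match)
{
    int is_dupe[MAX_RECORDS] = {0};
    unsigned long performed = 0;
    size_t driver, other;

    assert(n <= MAX_RECORDS && is_match != NULL);

    for (driver = 0; driver + 1 < n; driver++) {
        if (is_dupe[driver])
            continue;           /* a tagged record never becomes the driver */
        for (other = driver + 1; other < n; other++) {
            if (is_dupe[other])
                continue;       /* non-driver already tagged as a duplicate */
            performed++;
            if (is_match(driver, other))
                is_dupe[other] = 1;
        }
    }
    return performed;
}

/* Sample match rules used only for this demonstration. */
int no_matches(size_t a, size_t b) { (void)a; (void)b; return 0; }
int first_three_match(size_t a, size_t b) { return a < 3 && b < 3; }
```

With ten records and no matches, the walk performs all 45 comparisons. If the first record matches the second and third, those two never become drivers, and the count drops to 30.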

Some comparisons are canceled

A comparison is canceled if any of the following is true:
The non-driver record has already been tagged as a duplicate.
Both records come from the same input list, and searching was deactivated
within that list.
The records are from two different lists, and searching was deactivated
between those two lists.

Sequence of function calls for finding and retrieving duplicates

For sample code, refer to suppfncs.c.

Duplicate search

1. Optional: During initialization, you may set up exit functions that will be called during the duplicate-search process. Register your functions with the MCD Library by calling mp_dupexit_get_data() and mp_dupexit_set_exit_compare(). Within your exit routine, call mp_dupexit_get_data() or mtc_key_get_*() to retrieve data.
2. Specify which reference file(s) to use.
Call mp_input_set_num_refs() to set the number of reference files.
For each reference file, call mp_input_set_ref_num() to attach the
reference file to the session.
If you previously specified which reference files to use (you must do this before you form break groups), you may get reference-file information by calling mp_input_get_num_refs() and mp_input_get_ref_num().
3. To search all break groups, call mp_dupesrch_batch(). This function sorts
keys into break groups then compares records within each group.
To search only selected break groups, do custom break-group queries and call mp_dupesrch_break_queries(), as discussed in “Query break groups” on
page 44. You might take this approach to save time while testing your
matching setup, or simply to break the search job down into manageable units.

Duplicate results

4. Call mp_duperes_set_match_level() to specify which match level to query.

5. Call mp_duperes_set_results_type() to specify which types of records to
retrieve.
6. To increment through loops, you’ll need some numbers. Call
mp_duperes_get_value() to get the number of duplicates found and the number of duplicate groups.
7. Call mp_duperes_get_keyid() to get the Match Library key ID so you can
query Match Library key fields.
8. Call mp_duperes_get_value() and get the number of result keys available.
Then, enter a loop to retrieve data from each key. Use the number of duplicates and the number of duplicate groups (from step 6) to increment through the loop.
Call mp_duperes_set_key_num() to set the desired key.
Call mp_duperes_get_value() to get information about matching results, such as which duplicate group the key belongs to.
Call mp_duperes_get_data() to retrieve data from MCD key fields or the
miscellaneous field in the reference-file header.
Using the Match Library, call mtc_key_get_*() to get data from the
Match Library key fields.
Chapter 7: Match/Consolidate functions
This chapter describes each of the Match/Consolidate (MCD) Library functions.

mp_break_find_groups()

Synopsis
#include <mpbrkinf.h>

int mp_break_find_groups(mp_id);
MP_ID mp_id;    Input: Session ID from mp_init()

Description
Call mp_break_find_groups() to find break groups according to the current user settings. At least one reference file must be attached to the session or the key work file must be defined. At least one break field must be defined before calling this function. You may not call this function while finding duplicates.
If you use batch mode to search for duplicates, you do not need to call this function. When you call mp_dupesrch_batch(), the mp_break_find_groups() function will be called automatically.

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_SYSTEM - System error
MP_ERR_IGNORED - Function called out of sequence
MP_ERR_INVVAL - Invalid value was specified
MP_ERR_NONCOMMON_BREAK_FIELD - Invalid break field type
MP_ERR_INV_REF_FILE - Invalid reference file

Example
int status;
MP_ID mp_id;

/* Find break groups. */
status = mp_break_find_groups(mp_id);

mp_break_get_info()

Synopsis
#include <mpbrkinf.h>

int mp_break_get_info(mp_id, break_info, break_info_val);
MP_ID mp_id;            Input: Session ID from mp_init()
int break_info;         Input: Type of information (see page 56).
int *break_info_val;    Output: Current setting (see page 56).

Description
Call mp_break_get_info() to query break-group settings. For descriptions of the types of break information you can request, and the results you will receive, see mp_break_set_info() on page 56.

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid.

Example
int status;
MP_ID mp_id;
int break_info;
int break_info_val;

/* Query the break mode setting */
status = mp_break_get_info(mp_id, MP_BREAK_INFO_BREAK_MODE, &break_info_val);

mp_break_set_auto()

Synopsis
#include <mpbrkinf.h>

int mp_break_set_auto(mp_id, auto_type, tightness, density, max_num_auto_breaks, num_user_breaks, auto_status);
MP_ID mp_id;                Input: Session ID from mp_init()
int auto_type;              Input: Type of automatic breaking
int tightness;              Input: Matching tightness
int density;                Input: Density of records
int max_num_auto_breaks;    Input: Maximum number of automatic breaks
int num_user_breaks;        Input: Maximum number of additional user breaks
int *auto_status;           Output: Did auto breaking set any break fields?

Description
Call mp_break_set_auto() to set up automatic breaking. Match/Consolidate will determine which fields to use for breaking based on the key layouts of the reference files and the information you provide in this call. This function overwrites any previous break-field settings.
After calling mp_break_set_auto(), you can query the settings it made, if any, and set up additional break fields or modify the break fields that were set. The maximum value for max_num_auto_breaks is MAX_AUTO_BREAK_FIELDS, as defined in mpbrkinf.h.
Before calling this function, you must specify which reference files to use. For more information, refer to mp_input_set_num_refs() and mp_input_set_ref_num().
You can set the auto_type, tightness, and density. The more information you provide, the more intelligent automatic breaking becomes.

Parameter    Values                      Description
auto_type    MP_AUTO_RESIDENT            Matching goal is to find one record per address.
             MP_AUTO_HOUSEHOLD           Matching goal is to find one record per household.
             MP_AUTO_INDIVIDUAL          Matching goal is to find one record per person.
             MP_AUTO_UNKNOWN             Matching goal is unknown (default).
tightness    MP_AUTO_TIGHT               Use strict matching when forming break groups.
             MP_AUTO_MEDIUM              Use medium matching when forming break groups.
             MP_AUTO_LOOSE               Use loose matching when forming break groups.
             MP_AUTO_UNKNOWN             Tightness of matching is unknown (default).
density      MP_AUTO_DENSITY_LOCAL       Most records are from a small regional area, for example, a city or state.
             MP_AUTO_DENSITY_NATIONAL    Records are distributed across the entire nation.
             MP_AUTO_UNKNOWN             Density of records is unknown (default).

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_IGNORED - Function called out of sequence
MP_ERR_INVVAL - Invalid value specified

Example
int status;
MP_ID mp_id;
int max_num_auto_breaks = 4;    /* set no more than 4 auto break fields */
int num_user_breaks = 0;        /* no additional user break fields */
int auto_status;

/* Set up auto breaking. The density constant is spelled as in the table above. */
status = mp_break_set_auto(mp_id, MP_AUTO_INDIVIDUAL, MP_AUTO_LOOSE,
                           MP_AUTO_DENSITY_NATIONAL, max_num_auto_breaks,
                           num_user_breaks, &auto_status);

mp_break_set_info()

Synopsis
#include <mpbrkinf.h>

int mp_break_set_info(mp_id, break_info, break_info_val);
MP_ID mp_id;           Input: Session ID from mp_init()
int break_info;        Input: Type of information
int break_info_val;    Input: Value

Description
Call mp_break_set_info() to set up normal or adaptive breaking. If you are using automatic breaking, you do not need to call this function.

Values for break_info             Values for break_info_val                         Description
MP_BREAK_INFO_NUM_FLDS            0 to 16                                           Number of break fields set for this session.
MP_BREAK_INFO_BREAK_MODE          MP_BREAK_MODE_NORMAL or MP_BREAK_MODE_ADAPTIVE    Break mode for this session. Default value is NORMAL.
MP_BREAK_INFO_COMPBUF_SPANS       >0                                                Number of comparison-buffer spans allowed while finding duplicates for this session. Default value is 1.
MP_BREAK_INFO_ADAPTIVE_3DG_ZIP    MP_TRUE or MP_FALSE                               Indicates whether or not to automatically use the first three digits of the ZIP Code field. Applies to adaptive breaking only. Default value is TRUE.
MP_BREAK_INFO_ADAPTIVE_MAXKEYS    2 to LONG_MAX                                     Maximum number of keys that can be combined into a single break group while finding adaptive break groups. If no value is specified, the maximum number of keys allowed is the number of keys that can fit in the work buffer. See mp_misc_set_option_info().

Note that LONG_MAX is a C run-time library definition. If you are programming in another language, use the equivalent maximum for your system.

Configuration file
Rather than calling this function, you can set these options in the Break configuration file.

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_IGNORED - Function called out of sequence
MP_ERR_INVVAL - Invalid value specified

Example
int status;
MP_ID mp_id;

/* Set the break mode to be adaptive breaking. */
status = mp_break_set_info(mp_id, MP_BREAK_INFO_BREAK_MODE, MP_BREAK_MODE_ADAPTIVE);

mp_breakfld_get_info()

Synopsis
#include <mpbrkinf.h>

int mp_breakfld_get_info(mp_id, breakfld_num, breakfld_info, breakfld_info_val);
MP_ID mp_id;               Input: Session ID from mp_init()
int breakfld_num;          Input: Break field number
int breakfld_info;         Input: Type of information to get (see page 61)
int *breakfld_info_val;    Output: Break field information

Description
Call mp_breakfld_get_info() to get break-field settings. For a list of the information you can get, see mp_breakfld_set_info() on page 61.

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_INVVAL - Value not valid

Example
int status;
MP_ID mp_id;
int breakfld_info_val;

/* Query the break field type of the first break field. */
status = mp_breakfld_get_info(mp_id, 1, MP_BREAKFLD_INFO_FLDTYPE, &breakfld_info_val);

mp_breakfld_set_info()

Synopsis
#include <mpbrkinf.h>

int mp_breakfld_set_info(mp_id, breakfld_num, breakfld_info, breakfld_info_val);
MP_ID mp_id;              Input: Session ID from mp_init()
int breakfld_num;         Input: Break field number
int breakfld_info;        Input: Type of information to set (see table below)
int breakfld_info_val;    Input: Value

Description
Call mp_breakfld_set_info() to set the following break-field information:

Value for breakfld_info      Values for breakfld_info_val                                       Description and valid settings
MP_BREAKFLD_INFO_FLDTYPE     For valid field types, see the Match Library header file mtckey.h.    The type of Match Library key field.
MP_BREAKFLD_INFO_STARTPOS    1 to the length of the field                                       The starting break position in the field.
MP_BREAKFLD_INFO_FLDLEN      1 to (field length - STARTPOS + 1)                                 The length of the break field; in other words, the number of characters from the field to use for breaking.

Configuration file
Rather than calling this function, you can set these options in the Break configuration file.
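The STARTPOS and FLDLEN ranges in the table interact: the later the break starts, the fewer characters remain. The sketch below applies both limits and copies out the break portion of a field value. It is illustrative only. extract_break_part() is a hypothetical helper, and it uses the length of the sample string as a stand-in for the defined length of the key field.

```c
#include <assert.h>
#include <string.h>

/* Copies the break portion of field into out, which must hold at least
 * fld_len + 1 bytes. start_pos is 1-based, as for
 * MP_BREAKFLD_INFO_STARTPOS. Returns 1 on success, or 0 if the settings
 * fall outside the documented ranges. */
int extract_break_part(const char *field, int start_pos, int fld_len, char *out)
{
    int field_len;

    assert(field != NULL && out != NULL);
    field_len = (int)strlen(field);

    if (start_pos < 1 || start_pos > field_len)
        return 0;
    if (fld_len < 1 || fld_len > field_len - start_pos + 1)
        return 0;

    memcpy(out, field + (start_pos - 1), (size_t)fld_len);
    out[fld_len] = '\0';
    return 1;
}
```

For the value "MAIN", a start position of 1 and a length of 3 yield "MAI". For "54601", a start position of 4 leaves room for at most 2 characters, so a length of 3 is rejected.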

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_IGNORED - Function called out of sequence
MP_ERR_INVVAL - Invalid value specified

Example
int status;
MP_ID mp_id;

/* Set the field type of the first break field. */
status = mp_breakfld_set_info(mp_id, 1, MP_BREAKFLD_INFO_FLDTYPE, MTC_KEYFLD_ZIP);

mp_breakqry_do_query()

Synopsis
#include <mpbquery.h>

int mp_breakqry_do_query(mp_breakqry_id);
MP_BREAKQRY_ID mp_breakqry_id;    Input: Query session ID from mp_breakqry_init()

Description
Call mp_breakqry_do_query() to execute a previously defined break-group query. After the query has been performed, you can get the results of the query and use the results to find duplicates.

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_SYSTEM - System error
MP_ERR_IGNORED - Function called out of sequence
MP_ERR_INVVAL - Invalid value specified
MP_ERR_INV_WORK_FILE - One or more work files are corrupt

Example
int status;
MP_BREAKQRY_ID mp_breakqry_id;

/* Perform a break-group query. */
status = mp_breakqry_do_query(mp_breakqry_id);

mp_breakqry_get_largest_group()

Synopsis
#include <mpbquery.h>

int mp_breakqry_get_largest_group(mp_id, largest_bg_count, largest_bg_str);
MP_ID mp_id;                        Input: Session ID from mp_init()
unsigned long *largest_bg_count;    Output: Size of largest break group
char *largest_bg_str;               Output: Break string of largest break group

Description
Call mp_breakqry_get_largest_group() to query the size and break string of the largest break group.

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_INVFLD - Passed in a NULL pointer for largest_bg_str

Example
int status;
MP_ID mp_id;
unsigned long largest_bg_count;
char largest_bg_str[256];

/* Query the largest break group */
status = mp_breakqry_get_largest_group(mp_id, &largest_bg_count, largest_bg_str);

mp_breakqry_get_next_str()

Synopsis
#include <mpbquery.h>

int mp_breakqry_get_next_str(mp_id, break_str_num, break_str);
MP_ID mp_id;          Input: Session ID from mp_init()
int break_str_num;    Input: Number of the break group whose break string to query
char *break_str;      Output: Break-string value

Description
Call mp_breakqry_get_next_str() to query the break strings of the break groups to be used in duplicate detection. You must call mp_break_find_groups() before calling this function.

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_SYSTEM - System error
MP_ERR_IGNORED - Function called out of sequence
MP_ERR_INVVAL - Invalid value specified
MP_ERR_INV_WORK_FILE - Invalid work file encountered

Example
int status;
MP_ID mp_id;
int break_str_num = 1;
char break_str[256];

/* Get the break string of first break group */
status = mp_breakqry_get_next_str(mp_id, break_str_num, break_str);

mp_breakqry_get_num_groups()

Synopsis
#include <mpbquery.h>

int mp_breakqry_get_num_groups(mp_id, num_groups);
MP_ID mp_id;                  Input: Session ID from mp_init()
unsigned long *num_groups;    Output: Number of break groups

Description
Call mp_breakqry_get_num_groups() to query the number of break groups to be used in duplicate detection.

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_SYSTEM - System error
MP_ERR_IGNORED - Function called out of sequence
MP_ERR_INV_WORK_FILE - Invalid work file encountered

Example
int status;
MP_ID mp_id;
unsigned long num_groups;

/* Get the number of break groups that will be used in dupe detection */
status = mp_breakqry_get_num_groups(mp_id, &num_groups);

mp_breakqry_get_num_keys()

Synopsis
#include <mpbquery.h>

int mp_breakqry_get_num_keys(mp_breakqry_id, total_bgkeys);
MP_BREAKQRY_ID mp_breakqry_id;    Input: Query session ID from mp_breakqry_init()
unsigned long *total_bgkeys;      Output: Number of keys found

Description
Call mp_breakqry_get_num_keys() to learn the number of keys found for your break-group query. You can then decide if a query was too restrictive or not restrictive enough. For example, if a query returns thousands of keys, you might narrow the query before finding duplicates.

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid.

Example
int status;
MP_BREAKQRY_ID mp_breakqry_id;
unsigned long total_bgkeys;

/* Get the number of keys found for a break-group query. */
status = mp_breakqry_get_num_keys(mp_breakqry_id, &total_bgkeys);

mp_breakqry_get_str_len()

Synopsis
#include <mpbquery.h>

int mp_breakqry_get_str_len(mp_id, break_str_len);
MP_ID mp_id;           Input: Session ID from mp_init()
int *break_str_len;    Output: Length of break-group strings

Description
Call mp_breakqry_get_str_len() to query the length of the break strings to be used in duplicate detection.

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid.

Example
int status;
MP_ID mp_id;
int break_str_len;

/* Get the length of the break strings. */
status = mp_breakqry_get_str_len(mp_id, &break_str_len);

mp_breakqry_init()

Synopsis
#include <mpbquery.h>

int mp_breakqry_init(mp_id, mp_breakqry_id);
MP_ID mp_id;                       Input: Session ID from mp_init()
MP_BREAKQRY_ID *mp_breakqry_id;    Output: Break-query session ID

Description
Call mp_breakqry_init() to initialize a session for querying break groups. You can call this function multiple times to initialize multiple query sessions.
Before calling this function, you must attach at least one reference file to the MCD Library session, have at least one break field defined, and call mp_break_find_groups().

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_SYSTEM - System error
MP_ERR_IGNORED - Function called out of sequence

Example
int status;
MP_ID mp_id;
MP_BREAKQRY_ID mp_breakqry_id;

/* Initialize a break group query session */
status = mp_breakqry_init(mp_id, &mp_breakqry_id);

mp_breakqry_set_qryfld()

Synopsis
#include <mpbquery.h>

int mp_breakqry_set_qryfld(mp_breakqry_id, break_fld_num, break_pos, data);
MP_BREAKQRY_ID mp_breakqry_id;    Input: Query-session ID from mp_breakqry_init()
int break_fld_num;                Input: Break-field number
int break_pos;                    Input: Starting position within the break field
char *data;                       Input: Break-string value to query. Must be a null-terminated string.

Description
Call mp_breakqry_set_qryfld() to set a break field to be used as part of a break-group query. You must set at least one break field to form a valid break-group query. You must set query fields in the same order as the break fields are defined. For example, if the first break field is ZIP Code, then the only possible first query field is ZIP Code.
The break_pos must be between 1 and the break length of the break field. The break-string value passed in as data can be the first 1 to n characters of a break field starting at the break_pos.
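The limits on break_pos and data can be checked before the call. The following is a minimal sketch, not a library function: qryfld_args_ok() is a hypothetical helper, and it assumes that n means the number of characters remaining in the break field from break_pos onward.

```c
#include <assert.h>
#include <string.h>

/* break_len: break length of the break field.
 * break_pos: 1-based starting position within the break field.
 * data:      null-terminated query value.
 * Returns 1 if the combination fits the documented limits, 0 otherwise. */
int qryfld_args_ok(int break_len, int break_pos, const char *data)
{
    size_t data_len;

    assert(data != NULL);
    if (break_pos < 1 || break_pos > break_len)
        return 0;
    data_len = strlen(data);
    if (data_len < 1)
        return 0;
    /* Characters available from break_pos to the end of the break field. */
    return data_len <= (size_t)(break_len - break_pos + 1);
}
```

For a five-character ZIP Code break field, "546" at position 1 and "01" at position 4 both pass, while "601" at position 4 is one character too long.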

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_IGNORED - Function called out of sequence
MP_ERR_INVVAL - Invalid value specified
MP_ERR_INV_QUERY - Invalid query defined

Example
int status;
MP_BREAKQRY_ID mp_breakqry_id;
int break_fld_num = 1;
int break_pos = 1;
char *data = "54601";    /* the break string to query */

/* Set a break field to use in a break group query */
status = mp_breakqry_set_qryfld(mp_breakqry_id, break_fld_num, break_pos, data);

mp_breakqry_term()

Synopsis
#include <mpbquery.h>

int mp_breakqry_term(mp_breakqry_id);
MP_BREAKQRY_ID mp_breakqry_id;    Input: Query session ID from mp_breakqry_init()

Description
Call mp_breakqry_term() to terminate a break-query session and free all information and memory associated with the break-query session.
You must call mp_breakqry_init() before calling mp_breakqry_term().

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_IGNORED - Function called out of sequence

Example
int status;
MP_BREAKQRY_ID mp_breakqry_id;

/* Terminate a break group query session */
status = mp_breakqry_term(mp_breakqry_id);

mp_cfg_close()

Synopsis
#include <mpcfg.h>

int mp_cfg_close(mp_cfg_id);
MP_CFG_ID mp_cfg_id;    Input: Session ID from mp_cfg_open()

Description
Call mp_cfg_close() to close a configuration-file session. Call mp_cfg_close() for each session that was initialized with mp_cfg_open().

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_SYSTEM - System error

Example
int status;
MP_CFG_ID mp_cfg_id;

/* Close a Match/Consolidate Library configuration-file session. */
status = mp_cfg_close(mp_cfg_id);

mp_cfg_get_num_ref_files()

Synopsis
#include <mpcfg.h>

int mp_cfg_get_num_ref_files(mp_cfg_id, num_refs);
MP_CFG_ID mp_cfg_id;    Input: Session ID from mp_cfg_open()
int *num_refs;          Output: Number of reference files

Description
Call mp_cfg_get_num_ref_files() to query the number of reference files that were set by the configuration-file session.

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid.

Example
int status;
MP_CFG_ID mp_cfg_id;
int num_refs;

/* Query the number of reference files set by the cfg file session */
status = mp_cfg_get_num_ref_files(mp_cfg_id, &num_refs);

mp_cfg_get_ref_info()

Synopsis
#include <mpcfg.h>

int mp_cfg_get_ref_info(mp_cfg_id, refnum, mp_ref_id, refname, mtc_key_id, key_misc_len, const_list_id, list_id_len);
MP_CFG_ID mp_cfg_id;       Input: Session ID from mp_cfg_open()
int refnum;                Input: Reference-file number
MP_REF_ID *mp_ref_id;      Output: Reference session ID
char *refname;             Output: Name of reference file
MTC_KEY_ID *mtc_key_id;    Output: Match Library key ID for this reference file
int *key_misc_len;         Output: Length of key-miscellaneous field in file header
char *const_list_id;       Output: Constant list ID from file header
int *list_id_len;          Output: Length of list ID key field

Description
Call mp_cfg_get_ref_info() to get information about a reference file in a configuration-file session.

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid.

Example
int status;
MP_CFG_ID mp_cfg_id;
int refnum = 1;    /* first reference file, assuming numbering starts at 1 */
MP_REF_ID mp_ref_id;
char refname[256];
MTC_KEY_ID mtc_key_id;
int key_misc_len;
char const_list_id[64];
int list_id_len;

/* Query info about the first ref set through the cfg file session. */
/* refnum is an input, so it is passed by value. */
status = mp_cfg_get_ref_info(mp_cfg_id, refnum, &mp_ref_id, refname,
                             &mtc_key_id, &key_misc_len, const_list_id,
                             &list_id_len);

mp_cfg_open()

Synopsis
#include <mpcfg.h>

int mp_cfg_open(mtc_cfg_id, mp_id, mp_cfg_id, cfg_file_name);
MTC_CFG_ID mtc_cfg_id;    Input: Session ID from mtc_cfg_open()
MP_ID mp_id;              Input: Session ID from mp_init()
MP_CFG_ID *mp_cfg_id;     Output: Match/Consolidate configuration-file session ID
char *cfg_file_name;      Input: Name of Match/Consolidate “overall” configuration file to open

Description
Call mp_cfg_open() to open a configuration-file session for the MCD Library. Make one call to mp_cfg_open() for each session desired. You cannot call this function unless you have successfully called mtc_cfg_open().

Returns
Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_SYSTEM - System error
MP_ERR_CFG_INVHEADER - Invalid configuration file header
MP_ERR_CFG_INVLINE - Invalid line in configuration file
MP_ERR_CFG_FILE_NOT_OPEN - Configuration file not open
MP_ERR_CFG_MISSING_ENTRIES - Required line missing from configuration file
MP_ERR_CFG_REPEAT_CFG_FILE - Invalid repeated value in configuration file
MP_ERR_CFG_MISSING_LABEL - File label missing from configuration file
MP_ERR_CFG_WRONG_ORDER - Configuration file labels in wrong order
MP_ERR_CFG_DEFAULT - Configuration file error

Example
int status;
MTC_CFG_ID mtc_cfg_id;
MP_ID mp_id;
MP_CFG_ID mp_cfg_id;

/* Open a Match/Consolidate Library cfg file session where settings are in file named "mp.cfg" */
status = mp_cfg_open(mtc_cfg_id, mp_id, &mp_cfg_id, "mp.cfg");

mp_duperes_get_data()

Synopsis #include <mpdupes.h>

int mp_duperes_get_data(mp_id, duperes_info, ret_res_info);
MP_ID mp_id; Input: Session ID from mp_init()
int duperes_info; Input: Key field from which to retrieve data (see table below)
char *ret_res_info; Output: Requested data

Description Call mp_duperes_get_data() to get data from an MCD key field.

Value for duperes_info Description
MP_DUPERES_KEY_MISC Key-miscellaneous field.
MP_DUPERES_PRIORITY_FLD_VALUE Field-priority field.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_SYSTEM — System error
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVFLD — Invalid field specified
MP_ERR_INVVAL — Invalid value specified

Example

int status;
MP_ID mp_id;
char ret_res_info[256];

/* Query the key miscellaneous data from a duplicate-result key */
status = mp_duperes_get_data(mp_id, MP_DUPERES_KEY_MISC, ret_res_info);

mp_duperes_get_keyid()

Synopsis #include <mpdupes.h>

int mp_duperes_get_keyid(mp_id, mtc_key_id);
MP_ID mp_id; Input: Session ID from mp_init()
MTC_KEY_ID *mtc_key_id; Output: Match key ID

Description Call mp_duperes_get_keyid() to get the match-key ID needed to query the Match Library key fields.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_SYSTEM — System error
MP_ERR_IGNORED — Function called out of sequence

Example

int status;
MP_ID mp_id;
MTC_KEY_ID mtc_key_id;

/* Get the mtc_key_id needed to query Match key fields */
status = mp_duperes_get_keyid(mp_id, &mtc_key_id);

mp_duperes_get_value()

Synopsis #include <mpdupes.h>

int mp_duperes_get_value(mp_id, duperes_info, ret_res_info);
MP_ID mp_id; Input: Session ID from mp_init()
int duperes_info; Input: Information to retrieve (see table below)
unsigned long *ret_res_info; Output: Information

Description Call mp_duperes_get_value() to query information about the duplicate results. For a list of the information you can query, see the table below.

Values for duperes_info Description
MP_DUPERES_LARGEST_GROUP_COUNT The total number of records in the largest duplicate group.
MP_DUPERES_NUM_DUPES The total number of duplicates found.
MP_DUPERES_NUM_GROUPS The total number of duplicate groups. Use this number to loop through the results.
MP_DUPERES_NUM_KEYS Number of result keys available for querying.
MP_DUPERES_GROUP_NUM Group number of the duplicate group to which this record belongs.
MP_DUPERES_UNIQUE_NUM Unique number for this record.
MP_DUPERES_GROUP_ORDER This record’s position within the duplicate group.
MP_DUPERES_GROUP_COUNT Total number of records in the duplicate group to which this record belongs.
MP_DUPERES_DRIVER Was this record used as the driver record for comparisons?
MP_DUPERES_RULE_NUM Rule number of the rule used to make the match decision.
MP_DUPERES_RULE_SCORE Score for the rule used to make the match decision.
MP_DUPERES_WEIGHTED_SCORE Decision weighted score for this record.
MP_DUPERES_SUPER_COUNT Number of super lists.
MP_DUPERES_SUPER_NUM Super-list number of the super list to which this record belongs.
MP_DUPERES_LIST_COUNT Number of lists.
MP_DUPERES_LIST_NUM List number of the list to which this record belongs.
MP_DUPERES_LIST_PRIORITY The list-priority value for this record.
MP_DUPERES_LIST_TYPE The type of list to which this record belongs.
MP_DUPERES_DUPE Is this record a subordinate duplicate?
MP_DUPERES_MASTER Is this record a master duplicate?
MP_DUPERES_MATCHSPEC_EXIT_CALLED Was a user match exit called?
MP_DUPERES_COMPEXIT_DECISION Did a user match exit make the match decision?
MP_DUPERES_MATCHSPEC_INDEX Which match specification caused the decision?
MP_DUPERES_MATCH_LEVEL Which match level is being queried?

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_SYSTEM — System error
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified

Example

int status;
MP_ID mp_id;
unsigned long ret_res_info = 0;

/* Query how many dupe results keys there are */
status = mp_duperes_get_value(mp_id, MP_DUPERES_NUM_KEYS, &ret_res_info);
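
Combined with mp_duperes_set_key_num(), this function can be used to walk the duplicate results one key at a time. The following is a minimal sketch of that loop, not code taken from the library's sample programs. It assumes that key numbers start at 1, which the mp_duperes_set_key_num() example suggests but this reference does not state outright. So that the sketch compiles on its own, the library is replaced by small stand-in definitions: the MP_ID typedef, the constant values, and the fake result data are placeholders, and the helper name walk_result_keys() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* ---- Stand-ins so the sketch compiles without the library. ----
 * In a real program, include <mpdupes.h> instead. All values and
 * behavior below are placeholders, not the library's. */
typedef int MP_ID;
enum { MP_OK = 0, MP_ERROR = 1 };
enum { MP_DUPERES_NUM_KEYS = 1, MP_DUPERES_GROUP_NUM = 2 };

static const unsigned long stub_groups[] = { 1, 1, 2, 2, 2 }; /* fake results */
static const unsigned long stub_num_keys = 5;
static unsigned long stub_current = 1;   /* default: first key */

static int mp_duperes_set_key_num(MP_ID mp_id, unsigned long key_num)
{
    (void)mp_id;
    if (key_num < 1 || key_num > stub_num_keys)
        return MP_ERROR;
    stub_current = key_num;
    return MP_OK;
}

static int mp_duperes_get_value(MP_ID mp_id, int duperes_info,
                                unsigned long *ret_res_info)
{
    (void)mp_id;
    assert(ret_res_info != NULL);
    if (duperes_info == MP_DUPERES_NUM_KEYS) {
        *ret_res_info = stub_num_keys;
        return MP_OK;
    }
    if (duperes_info == MP_DUPERES_GROUP_NUM) {
        *ret_res_info = stub_groups[stub_current - 1];
        return MP_OK;
    }
    return MP_ERROR;
}
/* ---- End of stand-ins ---- */

/* Visit every result key and print its duplicate-group number.
 * Returns the number of keys visited, or -1 on the first failure. */
long walk_result_keys(MP_ID mp_id)
{
    unsigned long num_keys = 0, key_num, group_num = 0;

    if (mp_duperes_get_value(mp_id, MP_DUPERES_NUM_KEYS, &num_keys) != MP_OK)
        return -1;

    /* Assumption: key numbers run from 1 to num_keys. */
    for (key_num = 1; key_num <= num_keys; key_num++) {
        if (mp_duperes_set_key_num(mp_id, key_num) != MP_OK)
            return -1;
        if (mp_duperes_get_value(mp_id, MP_DUPERES_GROUP_NUM,
                                 &group_num) != MP_OK)
            return -1;
        printf("key %lu is in duplicate group %lu\n", key_num, group_num);
    }
    return (long)num_keys;
}
```

In a real program the -1 return would be the point at which to call mp_get_error_info() or mp_get_error_number(), before any other library call resets the error.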

mp_duperes_set_key_num()

Synopsis #include <mpdupes.h>

int mp_duperes_set_key_num(mp_id, key_num);
MP_ID mp_id; Input: Session ID from mp_init()
unsigned long key_num; Input: Number of results key

Description Call mp_duperes_set_key_num() to set the key number of the duplicate-results key with which you want to work. The default is to query the first key.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_SYSTEM — System error
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified
MP_ERR_MTC_CALL — Internal Match Library error

Example

int status;
MP_ID mp_id;
unsigned long key_num = 10;

/* Query the 10th dupe results key */
status = mp_duperes_set_key_num(mp_id, key_num);

mp_duperes_set_match_level()

Synopsis #include <mpdupes.h>

int mp_duperes_set_match_level(mp_id, level_num);
MP_ID mp_id; Input: Session ID from mp_init()
int level_num; Input: Match level number to work with

Description Call mp_duperes_set_match_level() to set the match level at which to query duplicate-results keys. The default is to query the first match level.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_SYSTEM — System error
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified

Example

int status;
MP_ID mp_id;
int level_num = 2;

/* Query at match level 2 */
status = mp_duperes_set_match_level(mp_id, level_num);

mp_duperes_set_results_type()

Synopsis #include <mpdupes.h>

int mp_duperes_set_results_type(mp_id, duperes_type);
MP_ID mp_id; Input: Session ID from mp_init()
int duperes_type; Input: Type of duplicate-results keys with which to work

Description Call mp_duperes_set_results_type() to set the type of duplicate results with which you want to work. The default is to query duplicate keys only.

Values for duperes_type Description

MP_DUPERES_ONLY_DUPES Work with duplicates only (default).
MP_DUPERES_ALL_RECORDS Work with duplicates and unique records.
Querying all records requires more processing time because there are more records to mark.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_SYSTEM — System error
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified

Example

int status;
MP_ID mp_id;

/* Set the results keys to query to be all input key-file records. */
status = mp_duperes_set_results_type(mp_id, MP_DUPERES_ALL_RECORDS);

mp_dupesrch_batch()

Synopsis #include <mpdupes.h>

int mp_dupesrch_batch(mp_id);
MP_ID mp_id; Input: Session ID from mp_init()

Description Call mp_dupesrch_batch() to find duplicates according to all of the current user settings. Match/Consolidate will find all duplicates in all reference files and accumulate the results in a work file.
Before calling this function, you must attach at least one reference file to the session (or create the key file), define all break-group settings, define at least one list, define at least one match specification, and enable comparisons within or between one or more lists.
You may not call mp_dupesrch_batch() while finding break groups or finding duplicates.
The function mp_break_find_groups() will be called automatically as part of the duplicate-detection process.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_SYSTEM — System error
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value was specified
MP_ERR_INV_WORK_FILE — Invalid work file encountered
MP_ERR_INV_LIST_ID — Invalid list ID encountered
MP_ERR_NONCOMMON_BREAK_FIELD — Invalid break field type
MP_ERR_INV_REF_FILE — Invalid reference file specified
MP_ERR_MTC_CALL — Internal Match Library error

Example

int status;
MP_ID mp_id;

/* Find duplicates in all break groups. */
status = mp_dupesrch_batch(mp_id);
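
A batch search is usually followed by a query of the overall totals. The sketch below runs the search and then reads two of the summary values listed for mp_duperes_get_value(). It is an illustration under stated assumptions, not the library's documented workflow: it assumes the totals can be read immediately after a successful mp_dupesrch_batch(), and it replaces the library with stand-in definitions so that it compiles on its own. The MP_ID typedef, the constant values, and the returned counts are placeholders, and search_and_summarize() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Stand-ins so the sketch compiles without the library. ----
 * In a real program, include <mpdupes.h> instead. All values and
 * behavior below are placeholders, not the library's. */
typedef int MP_ID;
enum { MP_OK = 0, MP_ERROR = 1 };
enum { MP_DUPERES_NUM_DUPES = 1, MP_DUPERES_NUM_GROUPS = 2 };

static int stub_searched = 0;

static int mp_dupesrch_batch(MP_ID mp_id)
{
    (void)mp_id;
    stub_searched = 1;          /* pretend the search ran */
    return MP_OK;
}

static int mp_duperes_get_value(MP_ID mp_id, int duperes_info,
                                unsigned long *ret_res_info)
{
    (void)mp_id;
    assert(ret_res_info != NULL);
    if (!stub_searched)
        return MP_ERROR;        /* no results before a search */
    switch (duperes_info) {
    case MP_DUPERES_NUM_DUPES:  *ret_res_info = 7; return MP_OK;
    case MP_DUPERES_NUM_GROUPS: *ret_res_info = 3; return MP_OK;
    default:                    return MP_ERROR;
    }
}
/* ---- End of stand-ins ---- */

/* Run the batch search, then fetch the duplicate and group totals.
 * Returns MP_OK, or the first status that is not MP_OK. */
int search_and_summarize(MP_ID mp_id, unsigned long *num_dupes,
                         unsigned long *num_groups)
{
    int status = mp_dupesrch_batch(mp_id);
    if (status != MP_OK)
        return status;

    status = mp_duperes_get_value(mp_id, MP_DUPERES_NUM_DUPES, num_dupes);
    if (status != MP_OK)
        return status;

    return mp_duperes_get_value(mp_id, MP_DUPERES_NUM_GROUPS, num_groups);
}
```

Stopping at the first failed call keeps the library's error state intact for mp_get_error_info() or mp_get_error_number().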

mp_dupesrch_break_queries()

Synopsis #include <mpdupes.h>

int mp_dupesrch_break_queries(mp_id, num_queries, mp_breakqry_id_list);
MP_ID mp_id; Input: Session ID from mp_init()
int num_queries; Input: Number of break-group queries
MP_BREAKQRY_ID *mp_breakqry_id_list; Input: List of query IDs

Description Call mp_dupesrch_break_queries() to find duplicates. Match/Consolidate finds duplicates in each set of keys from the specified break-group queries, and accumulates results in a work file.
Before calling this function, you must attach at least one reference file to the session (or you must create the key work file). This function cannot be called while finding break groups or finding duplicates.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_SYSTEM — System error
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value was specified
MP_ERR_INV_WORK_FILE — Invalid work file encountered
MP_ERR_INV_LIST_ID — Invalid list ID encountered
MP_ERR_NONCOMMON_BREAK_FIELD — Invalid break field type
MP_ERR_INV_REF_FILE — Invalid reference file specified
MP_ERR_MTC_CALL — Internal Match Library error

Example

int status;
MP_ID mp_id;
int num_queries = 2;                  /* Two user queries */
MP_BREAKQRY_ID *mp_breakqry_id_list;  /* List of query IDs */

/* malloc() requires <stdlib.h>; check the result before using it */
mp_breakqry_id_list = malloc(sizeof(MP_BREAKQRY_ID) * num_queries);
if (mp_breakqry_id_list != NULL) {
    mp_breakqry_id_list[0] = first_query_ID;   /* ID returned from query */
    mp_breakqry_id_list[1] = second_query_ID;  /* ID returned from query */

    /* Search for dupes on 2 groups of keys defined by break queries */
    status = mp_dupesrch_break_queries(mp_id, num_queries,
                                       mp_breakqry_id_list);
}

mp_dupexit_get_data()

Synopsis #include <mpdupes.h>

int mp_dupexit_get_data(mp_id, dupe_rec_type, datatype, ret_data);
MP_ID mp_id; Input: Session ID from mp_init()
int dupe_rec_type; Input: Record from which to get data
int datatype; Input: Type of data
char *ret_data; Output: User buffer to hold data

Description Call mp_dupexit_get_data() from within your comparison-exit function to get data from MCD key fields for either of the two records being matched. The data is copied to the user’s buffer ret_data, which must be large enough to hold all the data stored in the key field plus a NULL terminator.

Parameter Values Description
dupe_rec_type MP_DUP_DRIVER_RECORD Driver record.
dupe_rec_type MP_DUP_SUBORDINATE_RECORD Subordinate record.
datatype MP_REF_DATA_KEY_MISC Key miscellaneous field.
datatype MP_REF_DATA_LIST_ID List ID field.
datatype MP_REF_DATA_PRIORITY_FLD Field-priority field.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVFLD — Invalid user buffer
MP_ERR_INVVAL — Invalid value specified

Example

int status;
MP_ID mp_id;
char ret_data[255];

/* Query the key misc field of the subordinate key */
status = mp_dupexit_get_data(mp_id, MP_DUP_SUBORDINATE_RECORD,
                             MP_REF_DATA_KEY_MISC, ret_data);

mp_dupexit_set_exit_compare()

Synopsis #include <mpdupes.h>

int mp_dupexit_set_exit_compare(mp_id, dupe_exit, compare_exit);
MP_ID mp_id; Input: Session ID from mp_init()
int dupe_exit; Input: Type of comparison exit (see table)
int (*compare_exit)(match_level, driver_mtc_key_id, subordinate_mtc_key_id, dup_results)
int match_level; In: Match level
MTC_KEY_ID driver_mtc_key_id; In: Driver record key ID
MTC_KEY_ID subordinate_mtc_key_id; In: Subordinate record key ID
int *dup_results; Out: Match decision

Description Call mp_dupexit_set_exit_compare() to set an exit routine to compare records and decide whether or not they match. Match/Consolidate passes to you the match level, the key ID of the driver record, and the key ID of the subordinate record. Within your exit routine, you can call mp_dupexit_get_data() and mtc_key_get_flddata() to retrieve data from the keys.
You can set up the following types of exits:
Values for dupe_exit Description
MP_DUPEXIT_PRECOMP Call your comparison exit before passing the keys to the Match Library.
MP_DUPEXIT_POSTCOMP_DUPE Call your comparison exit if the Match Library finds that two keys are duplicates.
MP_DUPEXIT_POSTCOMP_NODUPE Call your comparison exit if the Match Library finds that two keys are not duplicates.
Your exit routine must pass back one of the following match decisions:
Values for dup_results Description
MP_DUP_RESULT_UNDECIDED Undecided.
MP_DUP_RESULT_DUPE The keys match.
MP_DUP_RESULT_NODUPE The keys do not match.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified

Example

int status;
MP_ID mp_id;
int compare_exit(int, MTC_KEY_ID, MTC_KEY_ID, int *);

/* Set the pre-comparison user exit routine */
status = mp_dupexit_set_exit_compare(mp_id, MP_DUPEXIT_PRECOMP, compare_exit);
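
The example above registers an exit routine but does not show one. The sketch below is one possible pre-comparison exit, given only as an illustration. Its matching policy (two records with the same list ID are never duplicates; everything else is left undecided) is made up for the example. Because the exit routine is not passed the session ID, the sketch keeps it in a file-scope variable, g_mp_id, which is a hypothetical name. This section does not say what the exit routine's own return value means, so the sketch returns MP_OK as a placeholder. The library itself is replaced by stand-in definitions so that the sketch compiles on its own; the typedefs, constant values, buffer size, and fake list IDs are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ---- Stand-ins so the sketch compiles without the library. ----
 * In a real program, include <mpdupes.h> instead. All values and
 * behavior below are placeholders, not the library's. */
typedef int MP_ID;
typedef int MTC_KEY_ID;
enum { MP_OK = 0, MP_ERROR = 1 };
enum { MP_DUP_DRIVER_RECORD = 1, MP_DUP_SUBORDINATE_RECORD = 2 };
enum { MP_REF_DATA_LIST_ID = 1 };
enum { MP_DUP_RESULT_UNDECIDED = 0, MP_DUP_RESULT_DUPE = 1,
       MP_DUP_RESULT_NODUPE = 2 };

static const char *stub_driver_list = "HOUSE";   /* fake list IDs */
static const char *stub_sub_list = "RENTED";

static int mp_dupexit_get_data(MP_ID mp_id, int dupe_rec_type,
                               int datatype, char *ret_data)
{
    (void)mp_id;
    assert(ret_data != NULL);
    if (datatype != MP_REF_DATA_LIST_ID)
        return MP_ERROR;
    strcpy(ret_data, dupe_rec_type == MP_DUP_DRIVER_RECORD
                         ? stub_driver_list : stub_sub_list);
    return MP_OK;
}
/* ---- End of stand-ins ---- */

static MP_ID g_mp_id;   /* session ID, saved when the session is set up */

/* Pre-comparison exit: veto matches between records from the same list. */
int same_list_exit(int match_level, MTC_KEY_ID driver_mtc_key_id,
                   MTC_KEY_ID subordinate_mtc_key_id, int *dup_results)
{
    char driver_list[256], sub_list[256];   /* size is an assumption */

    (void)match_level;
    (void)driver_mtc_key_id;
    (void)subordinate_mtc_key_id;

    /* Default: let the Match Library decide. */
    *dup_results = MP_DUP_RESULT_UNDECIDED;

    if (mp_dupexit_get_data(g_mp_id, MP_DUP_DRIVER_RECORD,
                            MP_REF_DATA_LIST_ID, driver_list) != MP_OK ||
        mp_dupexit_get_data(g_mp_id, MP_DUP_SUBORDINATE_RECORD,
                            MP_REF_DATA_LIST_ID, sub_list) != MP_OK)
        return MP_OK;   /* could not read the keys: stay undecided */

    if (strcmp(driver_list, sub_list) == 0)
        *dup_results = MP_DUP_RESULT_NODUPE;

    return MP_OK;       /* placeholder return value */
}
```

Such a routine would be registered with mp_dupexit_set_exit_compare(mp_id, MP_DUPEXIT_PRECOMP, same_list_exit) after storing the session ID in g_mp_id.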

mp_get_error_info()

Synopsis #include <mp.h>

int mp_get_error_info(mp_id, errnum, stdmsg, detailmsg);
MP_ID mp_id; Input: Session ID from mp_init()
int *errnum; Output: Error number
char **stdmsg; Output: Pointer to buffer containing short error message
char **detailmsg; Output: Pointer to buffer containing detailed error message

Description After an MCD function returns MP_ERROR, call mp_get_error_info() to get the error number, a short error message, and a detailed error message. The maximum length of an error-message string is MP_MAX_MSG_LEN, as defined in mp.h. The message buffers may be overwritten the next time an MCD function is called.

Visual Basic If you are programming in Visual Basic, do not call this function. Instead, call mp_get_error_messages() and mp_get_error_number().

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid.

Example

MP_ID mp_id;
int retval, errnum;
char *stdmsg, *detailmsg;

/* call a function incorrectly to demonstrate error handling */
/* invalid session name error */
retval = mp_misc_set_work_file_info(mp_id, MP_WORK_FILE_SESSION_NAME, "");

/* check for an error */
if (retval != MP_OK) {
    if (retval == MP_ERROR) {   /* error? */
        /* get the error information */
        retval = mp_get_error_info(mp_id, &errnum, &stdmsg, &detailmsg);
        if (retval == MP_OK) {
            /* print an error message */
            printf("mp_misc_set_work_file_info() failed: <%d><%s><%s>\n",
                   errnum, stdmsg, detailmsg);
        }
    } else {                    /* invalid id */
        printf("mp_misc_set_work_file_info() failed: Invalid ID!\n");
    }
}

mp_get_error_messages()

Synopsis #include <mp.h>

int mp_get_error_messages(mp_id, stdmsg, detailmsg);
MP_ID mp_id; Input: Session ID from mp_init()
char *stdmsg; Output: Short error message
char *detailmsg; Output: Detailed error message

Description After an MCD function returns MP_ERROR, call mp_get_error_messages() to retrieve the error messages. Subsequent MCD calls will reset the error messages. The maximum length of an error-message string is MP_MAX_MSG_LEN, as defined in mp.h.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid.

Example

MP_ID mp_id;
int retval;
char stdmsg[MP_MAX_MSG_LEN], detailmsg[MP_MAX_MSG_LEN];

/* call a function incorrectly to demonstrate error handling */
/* invalid session name error */
retval = mp_misc_set_work_file_info(mp_id, MP_WORK_FILE_SESSION_NAME, "");

/* check for an error */
if (retval != MP_OK) {
    if (retval == MP_ERROR) {   /* error? */
        /* get the error messages */
        retval = mp_get_error_messages(mp_id, stdmsg, detailmsg);
        if (retval == MP_OK) {
            /* print an error message */
            printf("mp_misc_set_work_file_info() failed: <%s><%s>\n",
                   stdmsg, detailmsg);
        }
    } else {                    /* invalid id */
        printf("mp_misc_set_work_file_info() failed: Invalid ID!\n");
    }
}

mp_get_error_number()

Synopsis #include <mp.h>

int mp_get_error_number(mp_id, errnum);
MP_ID mp_id; Input: Session ID from mp_init()
int *errnum; Output: Error number

Description After an MCD function call returns MP_ERROR, call mp_get_error_number() to get the error number. Subsequent MCD calls will reset the error.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid.

Example

MP_ID mp_id;
int retval, errnum;

/* call a function incorrectly to demonstrate error handling */
/* invalid session name error */
retval = mp_misc_set_work_file_info(mp_id, MP_WORK_FILE_SESSION_NAME, "");

/* check for an error */
if (retval != MP_OK) {
    if (retval == MP_ERROR) {   /* error? */
        /* get the error number */
        retval = mp_get_error_number(mp_id, &errnum);
        if (retval == MP_OK) {
            /* print an error message */
            printf("mp_misc_set_work_file_info() failed: <%d>\n", errnum);
        }
    } else {                    /* invalid id */
        printf("mp_misc_set_work_file_info() failed: Invalid ID!\n");
    }
}
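
The same status check recurs after almost every library call, so it can be convenient to move it into one helper. The sketch below shows one way to do that. The helper name report_status() and its return convention are hypothetical and not part of the library. The library is replaced by stand-in definitions so that the sketch compiles on its own: the MP_ID typedef, the values of MP_OK, MP_ERROR, and MP_INVID, and the error number 42 are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* ---- Stand-ins so the sketch compiles without the library. ----
 * In a real program, include <mp.h> instead. All values and
 * behavior below are placeholders, not the library's. */
typedef int MP_ID;
enum { MP_OK = 0, MP_ERROR = 1, MP_INVID = 2 };

static int mp_get_error_number(MP_ID mp_id, int *errnum)
{
    (void)mp_id;
    assert(errnum != NULL);
    *errnum = 42;               /* made-up error number */
    return MP_OK;
}
/* ---- End of stand-ins ---- */

/* Returns 1 if status is MP_OK. Otherwise writes a diagnostic to
 * stderr, naming the function that failed, and returns 0. */
int report_status(MP_ID mp_id, const char *func_name, int status)
{
    int errnum = 0;

    if (status == MP_OK)
        return 1;

    if (status == MP_ERROR &&
        mp_get_error_number(mp_id, &errnum) == MP_OK)
        fprintf(stderr, "%s failed: <%d>\n", func_name, errnum);
    else if (status == MP_INVID)
        fprintf(stderr, "%s failed: Invalid ID!\n", func_name);
    else
        fprintf(stderr, "%s failed: status %d\n", func_name, status);

    return 0;
}
```

The helper should be called directly after the call being checked, for example report_status(mp_id, "mp_init()", status), because subsequent MCD calls reset the error.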

mp_get_revision()

Synopsis #include <mp.h>

int mp_get_revision(revision_str);
char *revision_str; Output: Revision string

Description Call mp_get_revision() to get the version number of the MCD Library and all associated Business Objects libraries. This function is especially useful if you need to call us for technical support. Your buffer must be large enough to hold the string that is copied into it. The maximum length of the string is MP_MAX_REV_STR_LEN.

Returns Returns MP_OK if successful, otherwise MP_ERROR.

Example

int status;
char revision_str[MP_MAX_REV_STR_LEN];

/* Get the Match/Consolidate Library revision string */
status = mp_get_revision(revision_str);

mp_init()

Synopsis #include <mp.h>

int mp_init(mtc_id, mp_id);
MTC_ID mtc_id; Input: Match session ID from mtc_init()
MP_ID *mp_id; Output: Match/Consolidate session ID

Description Call mp_init() to initialize an MCD Library session and the global data used by it. Make one call to mp_init() for each MCD session.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Default settings could not be initialized
MP_ERR_SYSTEM — System error
MP_ERR_SRT_CALL — Sort library could not be initialized

Example

MTC_ID mtc_id;
MP_ID mp_id;
int status;

/* Initialize the Match/Consolidate Library */
status = mp_init(mtc_id, &mp_id);

mp_input_get_num_refs()

Synopsis #include <mpinput.h>

int mp_input_get_num_refs(mp_id, num_refs);
MP_ID mp_id; Input: Session ID from mp_init()
int *num_refs; Output: Number of reference files

Description Call mp_input_get_num_refs() to find out how many reference files will be input into the MCD Library and used for finding break groups and duplicates.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid.

Example

int status;
MP_ID mp_id;
int num_refs;

/* Query the number of reference files currently set for input
   into the Match/Consolidate Library. */
status = mp_input_get_num_refs(mp_id, &num_refs);

mp_input_get_ref_num()

Synopsis #include <mpinput.h>

int mp_input_get_ref_num(mp_ref_id, ref_num);
MP_REF_ID mp_ref_id; Input: Reference ID from mp_ref_open()
int *ref_num; Output: Reference-file number

Description Call mp_input_get_ref_num() to get the file number for a reference file. This number tells you where the file stands in the sequence of reference files that were attached to the session by calling mp_input_set_ref_num(). If the reference ID is not found, then ref_num is set to 0 (zero).

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_IGNORED — Function called out of sequence

Example

int status;
MP_REF_ID mp_ref_id;
int ref_num;

/* Get reference file number from the mp_ref_id. */
status = mp_input_get_ref_num(mp_ref_id, &ref_num);

mp_input_set_num_refs()

Synopsis #include <mpinput.h>

int mp_input_set_num_refs(mp_id, num_refs);
MP_ID mp_id; Input: Session ID from mp_init()
int num_refs; Input: Number of reference files (1–255)

Description Call mp_input_set_num_refs() to set the number of reference files that you will use as input to the MCD Library for this session. You must call mp_input_set_num_refs() before finding break groups and before finding duplicates.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_SYSTEM — System error
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified

Example

int status;
MP_ID mp_id;
int num_refs = 1;

/* Set the number of reference files that will be input into
   the Match/Consolidate Library. */
status = mp_input_set_num_refs(mp_id, num_refs);

mp_input_set_ref_num()

Synopsis #include <mpinput.h>

int mp_input_set_ref_num(mp_ref_id, ref_num);
MP_REF_ID mp_ref_id; Input: Reference ID from mp_ref_open()
int ref_num; Input: Reference-file number

Description Call mp_input_set_ref_num() to attach a reference file to the session. You must set at least one reference file before finding break groups or finding duplicates.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified

Example

int status;
MP_REF_ID mp_ref_id;
int ref_num = 1;

/* Input a reference file into a Match/Consolidate Library session. */
status = mp_input_set_ref_num(mp_ref_id, ref_num);
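
When a session uses several reference files, mp_input_set_num_refs() and mp_input_set_ref_num() are typically called together. The sketch below declares the number of files and then attaches each one in turn. It rests on two assumptions that this reference does not state explicitly: that the count is set before the individual files are attached, and that reference-file numbers run from 1 up to that count. The helper name attach_reference_files() is hypothetical. The library is replaced by stand-in definitions so that the sketch compiles on its own; the typedefs, constant values, and stub behavior are placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Stand-ins so the sketch compiles without the library. ----
 * In a real program, include <mpinput.h> instead. All values and
 * behavior below are placeholders, not the library's. */
typedef int MP_ID;
typedef int MP_REF_ID;
enum { MP_OK = 0, MP_ERROR = 1 };

static int stub_num_refs = 0;   /* count declared for the session */
static int stub_attached = 0;   /* files attached so far */

static int mp_input_set_num_refs(MP_ID mp_id, int num_refs)
{
    (void)mp_id;
    if (num_refs < 1 || num_refs > 255)   /* documented range: 1-255 */
        return MP_ERROR;
    stub_num_refs = num_refs;
    stub_attached = 0;
    return MP_OK;
}

static int mp_input_set_ref_num(MP_REF_ID mp_ref_id, int ref_num)
{
    (void)mp_ref_id;
    if (ref_num < 1 || ref_num > stub_num_refs)
        return MP_ERROR;
    stub_attached++;
    return MP_OK;
}
/* ---- End of stand-ins ---- */

/* Declare how many reference files the session uses, then attach
 * each one in order. ref_ids holds IDs obtained from mp_ref_open().
 * Returns MP_OK, or the first status that is not MP_OK. */
int attach_reference_files(MP_ID mp_id, const MP_REF_ID *ref_ids, int count)
{
    int status, i;

    assert(ref_ids != NULL);

    status = mp_input_set_num_refs(mp_id, count);
    if (status != MP_OK)
        return status;

    for (i = 0; i < count; i++) {
        /* Assumption: file numbers run from 1 to count. */
        status = mp_input_set_ref_num(ref_ids[i], i + 1);
        if (status != MP_OK)
            return status;
    }
    return MP_OK;
}
```

Returning at the first failure leaves the library's error information available to mp_get_error_info() or mp_get_error_number().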

mp_list_get_default_action()

Synopsis #include <mplist.h>

int mp_list_get_default_action(mp_id, default_action);
MP_ID mp_id; Input: Session ID from mp_init()
int *default_action; Output: Default list action (see page 110).

Description Call mp_list_get_default_action() to query the default list action. The default list action determines what action is taken when a record does not belong to any of your defined lists. For a list of default-action values, see page 110.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid.

Example

int status;
MP_ID mp_id;
int default_action;

/* Query the default list action */
status = mp_list_get_default_action(mp_id, &default_action);

mp_list_get_default_list_num()

Synopsis #include <mplist.h>

int mp_list_get_default_list_num(mp_id, list_num);
MP_ID mp_id; Input: Session ID from mp_init()
int *list_num; Output: List number of default list

Description Call mp_list_get_default_list_num() to find out which list is the default list.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid.

Example

int status;
MP_ID mp_id;
int list_num;

/* Query the default list number */
status = mp_list_get_default_list_num(mp_id, &list_num);

mp_list_get_list_attr()

Synopsis #include <mplist.h>

int mp_list_get_list_attr(mp_id, list_num, list_attr_type, list_attr_value);
MP_ID mp_id; Input: Session ID from mp_init()
int list_num; Input: List number
int list_attr_type; Input: Type of list attribute (see the table on page 113)
int *list_attr_value; Output: Current setting for this list attribute (see page 113)

Description Call mp_list_get_list_attr() to get the current setting for a particular list attribute. For a list of the values for list_attr_type and list_attr_value, see page 113.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_INVVAL — Invalid value specified

Example

int status;
MP_ID mp_id;
int list_num = 1;
int list_attr_value;

/* Query the list type of this list */
status = mp_list_get_list_attr(mp_id, list_num, MP_LIST_ATTR_TYPE,
                               &list_attr_value);

mp_list_get_list_id()

Synopsis #include <mplist.h>

int mp_list_get_list_id(mp_id, list_num, list_id);
MP_ID mp_id; Input: Session ID from mp_init()
int list_num; Input: List number
char *list_id; Output: List ID

Description Call mp_list_get_list_id() to get the list ID for a particular list. You must provide enough space to hold the entire list ID value plus a NULL terminator.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_INVVAL — Invalid value specified

Example

int status;
MP_ID mp_id;
int list_num = 1;
char list_id[256];

/* Query the list ID of list number list_num. */
status = mp_list_get_list_id(mp_id, list_num, list_id);

mp_list_get_list_name()

Synopsis #include <mplist.h>

int mp_list_get_list_name(mp_id, list_num, list_name);
MP_ID mp_id; Input: Session ID from mp_init()
int list_num; Input: List number
char *list_name; Output: Name of list

Description Call mp_list_get_list_name() to get the name of a list. You must provide enough space to hold the entire list name plus a NULL terminator. The maximum list-name length is MP_MAX_LIST_NAME_LEN.

Returns Returns MP_OK if successful or MP_INVID if the session ID is not valid; otherwise, returns MP_ERROR. Your error handler may call mp_get_error_info() or mp_get_error_number() to retrieve one of the following possible error numbers:
MP_ERR_IGNORED — Function called out of sequence
MP_ERR_INVVAL — Invalid value specified

Example

int status;
MP_ID mp_id;
int list_num = 1;
char list_name[MP_MAX_LIST_NAME_LEN + 1];

/* Query the list name of list number list_num. */
status = mp_list_get_list_name(mp_id, list_num, list_name);