SAP Match/Consolidate 8.00c Library Reference

Match/Consolidate

Library Reference

Match/Consolidate 8.00c
April 2009
Copyright information
© 2009 SAP® BusinessObjects™. All rights reserved. SAP BusinessObjects and its logos, BusinessObjects, Crystal Reports®, SAP BusinessObjects Rapid Mart™, SAP BusinessObjects Data Insight™, SAP BusinessObjects Desktop Intelligence™, SAP BusinessObjects Rapid Marts®, SAP BusinessObjects Watchlist Security™, SAP BusinessObjects Web Intelligence®, and Xcelsius® are trademarks or registered trademarks of Business Objects, an SAP company and/or affiliated companies in the United States and/or other countries. SAP® is a registered trademark of SAP AG in Germany and/or other countries. All other names mentioned herein may be trademarks of their respective owners.

Contents

Preface .............................................................................................................7
Chapter 1:
Install, compile, and link............................................................................... 9
Build on Windows..........................................................................................10
Build on UNIX...............................................................................................11
Chapter 2:
Overview of the Match/Consolidate Library............................................ 13
Before-and-after of record matching..............................................................14
Work with the Match/Consolidate Library ....................................................16
Configuration files make setup faster and easier ...........................................17
Sample programs show you how to call the library.......................................19
Error handling and progress callback.............................................................20
Work files.......................................................................................................21
Function calls for initialization and termination ............................................22
Chapter 3:
Reference files.............................................................................................. 23
Introduction to reference files ........................................................................24
Create a reference file ....................................................................................26
Add keys to a reference file............................................................................27
Maintain a reference file ................................................................................28
Chapter 4:
Lists, match specifications, and match levels............................................ 29
Introduction to lists.........................................................................................30
Set up lists ......................................................................................................32
Control matching within and between lists....................................................33
Give one list priority over another .................................................................35
Gather statistics for a group of lists................................................................36
Chapter 5:
Break groups................................................................................................ 37
Introduction to breaking .................................................................................38
Set up normal breaking ..................................................................................39
Set up adaptive breaking ................................................................................41
Set up automatic breaking..............................................................................43
Query break groups ........................................................................................44
Chapter 6:
Duplicate search and results....................................................................... 47
Introduction ....................................................................................................48
How Match/Consolidate selects keys to compare..........................................49
Sequence of function calls for finding and retrieving duplicates...................51
Chapter 7:
Match/Consolidate functions ..................................................................... 53
mp_break_find_groups() ...............................................................................54
mp_break_get_info() ..................................................................................... 55
mp_break_set_auto() ..................................................................................... 56
mp_break_set_info()...................................................................................... 58
mp_breakfld_get_info() ................................................................................. 60
mp_breakfld_set_info() .................................................................................61
mp_breakqry_do_query() .............................................................................. 62
mp_breakqry_get_largest_group()................................................................. 63
mp_breakqry_get_next_str()..........................................................................64
mp_breakqry_get_num_groups()................................................................... 65
mp_breakqry_get_num_keys() ......................................................................66
mp_breakqry_get_str_len()............................................................................67
mp_breakqry_init() ........................................................................................ 68
mp_breakqry_set_qryfld() ............................................................................. 69
mp_breakqry_term() ...................................................................................... 70
mp_cfg_close() .............................................................................................. 71
mp_cfg_get_num_ref_files() ......................................................................... 72
mp_cfg_get_ref_info()................................................................................... 73
mp_cfg_open()...............................................................................................74
mp_duperes_get_data().................................................................................. 75
mp_duperes_get_keyid() ...............................................................................76
mp_duperes_get_value()................................................................................77
mp_duperes_set_key_num()..........................................................................79
mp_duperes_set_match_level() ..................................................................... 80
mp_duperes_set_results_type() .....................................................................81
mp_dupesrch_batch().....................................................................................82
mp_dupesrch_break_queries()....................................................................... 83
mp_dupexit_get_data() .................................................................................. 84
mp_dupexit_set_exit_compare() ...................................................................85
mp_get_error_info()....................................................................................... 87
mp_get_error_messages()..............................................................................88
mp_get_error_number()................................................................................. 89
mp_get_revision().......................................................................................... 90
mp_init() ........................................................................................................91
mp_input_get_num_refs() .............................................................................92
mp_input_get_ref_num() ............................................................................... 93
mp_input_set_num_refs()..............................................................................94
mp_input_set_ref_num() ...............................................................................95
mp_list_get_default_action()......................................................................... 96
mp_list_get_default_list_num()..................................................................... 97
mp_list_get_list_attr().................................................................................... 98
mp_list_get_list_id()...................................................................................... 99
mp_list_get_list_name() .............................................................................. 100
mp_list_get_list_num_from_list_id() ..........................................................101
mp_list_get_match_spec()...........................................................................102
mp_list_get_num_lists() .............................................................................. 103
mp_list_get_num_lists_in_super()............................................................... 104
mp_list_get_num_match_levels()................................................................105
mp_list_get_num_super_lists()....................................................................106
mp_list_get_super_name()...........................................................................107
mp_list_get_super_num().............................................................................108
mp_list_get_super_status()...........................................................................109
mp_list_set_default_action()........................................................................110
mp_list_set_default_list_num()....................................................................111
mp_list_set_exit_match_spec()....................................................................112
mp_list_set_list_attr()...................................................................................113
mp_list_set_list_id().....................................................................................115
mp_list_set_list_name() ...............................................................................116
mp_list_set_match_spec()............................................................................117
mp_list_set_match_spec_autoid()................................................................118
mp_list_set_match_spec_ruleids()...............................................................119
mp_list_set_num_lists() ...............................................................................120
mp_list_set_num_match_specs() .................................................................121
mp_list_set_num_super_lists().....................................................................122
mp_list_set_super_list_name().....................................................................123
mp_list_set_super_list_num() ......................................................................124
mp_misc_get_blank_field_priority() ...........................................................125
mp_misc_get_keyfile_id() ...........................................................................126
mp_misc_get_option_info() .........................................................................127
mp_misc_get_work_file_info()....................................................................128
mp_misc_set_blank_field_priority()............................................................129
mp_misc_set_exit_blank_field()..................................................................130
mp_misc_set_exit_progress().......................................................................131
mp_misc_set_option_info() .........................................................................133
mp_misc_set_work_file_info() ....................................................................135
mp_ref_close() .............................................................................................136
mp_ref_open()..............................................................................................137
mp_refcreate_close()....................................................................................138
mp_refcreate_open() ....................................................................................139
mp_refcreate_set_mtc_key_id()...................................................................140
mp_refcreate_set_option() ...........................................................................141
mp_refmod_add_key() .................................................................................143
mp_refmod_clear_key() ...............................................................................144
mp_refmod_delete_key() .............................................................................145
mp_refmod_read_key()................................................................................146
mp_refmod_set_data() .................................................................................147
mp_refmod_truncate()..................................................................................149
mp_refmod_update_key() ............................................................................150
mp_refqry_get_data()...................................................................................151
mp_refqry_get_mtc_key_id().......................................................................152
mp_refqry_get_value().................................................................................153
mp_term().....................................................................................................155
Appendix A:
Configuration parameters and their corresponding API calls ..............157
Appendix B:
Refmorph utility .........................................................................................161
Index ............................................................................................................163

Preface

About Match/Consolidate Library

This manual is a reference for programmers working with the Match/Consolidate Library. It explains how to make your application work with the library. Each chapter contains explanations, code examples, call sequences, and reference pages for each of the function calls.
In this manual, we assume that you are already familiar with your programming language, your operating system, and database management concepts.

Conventions

This document follows these conventions:
Convention      Description
Bold            We use bold type for file names, paths, emphasis, and text that you should type exactly as shown. For example, “Type cd\dirs.”
Italics         We use italics for emphasis and text for which you should substitute your own data or values. For example, “Type a name for your file, and the .txt extension (testfile.txt).”
Menu commands   We indicate commands that you choose from menus in the following format: Menu Name > Command Name. For example, “Choose File > New.”
!               We use this symbol to alert you to important information and potential problems.
(icon)          We use this symbol to point out special cases that you should know about.
(icon)          We use this symbol to draw your attention to tips that may be useful to you.
Documentation

Other documentation
Documents related to this manual include the following:
Document                                            Description
System Administrator’s Guide                        Explains how to install your software.
Database Prep                                       Explains how to prepare input files for processing, including how to create DEF, FMT, and DMT files.
Match/Consolidate User’s Guide to Record Matching   Explains the concepts behind name and address matching software and provides examples of how to implement, analyze, and fine-tune match detection strategies for the best results.
Match/Consolidate Extended Matching Reference       Contains the operational how-to instructions for setting up extended matching.
Match Library Programmer’s Reference                This is a reference manual for the Match Library.
Quick Reference                                     Contains descriptions of the input and output fields, and the command line for the Match/Consolidate job file.

Access the latest documentation
You can access documentation in several places:
On your computer. Release notes, manuals, and other documents for each product that you’ve installed are available in the Documentation folder. Choose Start > Programs > Business Objects Applications > Documentation.
On the SAP Service Market Place. Go to http://help.sap.com, and then click the Business Objects tab. Here, you can search for your products’ documentation.
Chapter 1: Install, compile, and link
This chapter provides information about installing the software, compiling your application, and linking the compiled application with the Match/Consolidate (MCD) libraries for Windows and UNIX systems.

Build on Windows

We provide 32-bit dynamically-linked libraries (DLLs), which you can use with Windows NT 4.0, Windows 2000 Professional, Windows XP, and Windows 2000 Advanced Server.

Installation
If you need installation information, refer to the System Administrator’s Guide.

Refer to pw\mplib for the MCD library.

Compilers and programming languages
See InfoSource for the latest compiler information.

Additional compilation flags
Beginning with Match/Consolidate release 7.31c, additional compilation flags are required. These flags are platform specific. For example, on Windows, add the following: /D "FL_WIN32"
For more information, see the sample build scripts.

Create sample programs
We provide a build file, read_dll.me, for the sample programs. Refer to this file for step-by-step instructions for creating a Visual C++ project to build the samples.

Build on UNIX

Installation
If you need installation information, refer to the System Administrator’s Guide.

Refer to …/postware/mplib for the MCD library.

Compiler
See InfoSource for the latest compiler information.

Additional compilation flags
Beginning with Match/Consolidate release 7.31c, additional compilation flags are required. These flags are platform specific. For example, on Solaris 64-bit, add the following: -DFL_UNIX -DFL_UNIX_SOL
For more information, see the sample build scripts.

Create sample programs
Refer to buildmp for examples of how to build the sample programs. Modify this file to set your include and library directories.
Chapter 2: Overview of the Match/Consolidate Library
This chapter provides information about the before-and-after of record matching and the basic steps for working with the Match/Consolidate (MCD) Library. It also provides information about configuration files, sample programs, error handling and progress callbacks, work files, and function calls.

Before-and-after of record matching

The MCD Library is a companion to the Match Library. The Match Library compares two records and determines whether they match. The MCD Library handles the steps before and after record matching.

[Figure: The MCD Library selects a pair of records to compare; the Match Library compares the records; the MCD Library uses the results.]

Theoretically, you could compare each complete, original record with every other complete, original record, but that would take a very long time. To save time, you’ll condense records into the data essential for matching and select for comparison only records that have a reasonable chance of matching.

Condense records into essential data

When you compare records, you’ll decide which data must match, and how closely, for records to be considered a match. Comparisons between complete, original records would be prohibitively slow.
To make matching more efficient, use our Match Library to condense records so that they contain only the data needed during the matching process. These condensed records are called keys.
Original record           Key
FirstName: JoAnne
LastName: MacKiewicz      LastName: MacKiewicz
Str_Range: 1              Str_Range: 1
Str_Name: Brook           Str_Name: Brook
Str_Suffix: Rd            Str_Suffix: Rd
City: Leeds
State: MA
ZIP: 01053                ZIP: 01053
Use the MCD Library to collect keys into a file called a reference file. The reference file contains the Match Library key data, plus additional information, such as list identifiers, that you can add when you build the reference file.
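To make the idea of a key concrete, here is a minimal C sketch that copies only the matching fields from the example record into a smaller structure and tags it with a list identifier. The types demo_record and demo_match_key and the function demo_condense() are hypothetical. In practice the Match Library builds the keys, and this sketch does not reproduce its key format.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical original record holding the fields from the example above. */
typedef struct {
    char first_name[32];
    char last_name[32];
    char str_range[12];
    char str_name[32];
    char str_suffix[8];
    char city[32];
    char state[4];
    char zip[6];
} demo_record;

/* Hypothetical key: only the fields used for matching, plus a list
   identifier of the kind you can add when you build a reference file. */
typedef struct {
    char last_name[32];
    char str_range[12];
    char str_name[32];
    char str_suffix[8];
    char zip[6];
    int  list_id;
} demo_match_key;

/* Copies only the matching fields from a record into a new key.
   snprintf() truncates safely if a source field is longer than the key field. */
demo_match_key demo_condense(const demo_record *rec, int list_id)
{
    demo_match_key key;
    memset(&key, 0, sizeof key);
    snprintf(key.last_name, sizeof key.last_name, "%s", rec->last_name);
    snprintf(key.str_range, sizeof key.str_range, "%s", rec->str_range);
    snprintf(key.str_name, sizeof key.str_name, "%s", rec->str_name);
    snprintf(key.str_suffix, sizeof key.str_suffix, "%s", rec->str_suffix);
    snprintf(key.zip, sizeof key.zip, "%s", rec->zip);
    key.list_id = list_id;
    return key;
}
```

FirstName, City, and State are dropped, as in the table. Which fields a real key keeps depends on your Match Library setup.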

Eliminate needless comparisons

Theoretically, you could compare every record to every other record, but that would take a long time. And many comparisons wouldn’t make sense. For example, a record with a ZIP Code of 01234 does not match a record with a ZIP Code of 98765, so it doesn’t pay to compare them at all.
To eliminate comparisons between records that are unlikely to match, you can separate records into clusters called break groups. For example, you could separate records by ZIP Code and look for matches only within each ZIP Code group.
Forming break groups eliminates a huge number of unnecessary comparisons. For a more in-depth discussion, including examples showing how many comparisons can be eliminated with a minimal effect on results, refer to the User’s Guide to Record Matching.
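Here is a worked illustration using made-up numbers. Comparing every pair among n records takes n(n - 1)/2 comparisons, so 1,000 records need 499,500 comparisons. If the same records fall into ten break groups of 100 records each, only 10 × 4,950 = 49,500 comparisons are needed. The helper functions below (hypothetical names) compute both figures. They only count comparisons and do not call the MCD Library.

```c
#include <stddef.h>

/* Number of distinct pairs among n records: n(n - 1) / 2. */
long long demo_pairs(long long n)
{
    return n < 2 ? 0 : n * (n - 1) / 2;
}

/* Total comparisons when records are compared only within their break group. */
long long demo_pairs_in_groups(const long long *group_sizes, size_t num_groups)
{
    long long total = 0;
    for (size_t i = 0; i < num_groups; i++)
        total += demo_pairs(group_sizes[i]);
    return total;
}
```

Real break groups are rarely equal in size, and one large group can dominate the total. That is why the choice of break fields matters.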
You can also use lists to eliminate unnecessary comparisons. For example, if you have a database that you know contains no duplicate records, you can assign all the records in that file to a list and tell MCD not to compare records within that list.
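The following sketch shows the list test described above: a pair of keys from the same list is skipped when that list is flagged as containing no duplicates. The demo_list type and demo_should_compare() are hypothetical and loosely modeled on the MP_List_Compare_Within_This_List configuration parameter. MCD applies its own list logic internally. List indexes here are zero-based for simplicity.

```c
#include <stdbool.h>

/* Hypothetical per-list settings. */
typedef struct {
    int  list_id;
    bool compare_within;   /* false: the list is known to contain no duplicates */
} demo_list;

/* Returns true if two keys, identified by their list indexes, should be
   passed to the match engine. In this sketch, pairs from different lists
   are always compared; pairs from the same list are compared only if that
   list allows it. */
bool demo_should_compare(const demo_list *lists, int list_a, int list_b)
{
    if (list_a != list_b)
        return true;
    return lists[list_a].compare_within;
}
```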

Use different matching rules for different lists

To control how records are compared, you can assign each record to a list. A list is a group of records that have some common characteristic: perhaps all of the records come from the same input database, or all have a common demographic code.
You can then use different matching rules with different lists. For example, the matching rules for comparisons within List A could be different from the matching rules for comparisons between List A and List B. When you pass a pair of records to the match engine, you can use list membership to dictate which match rules are used for the comparison.

Form groups of duplicate records

The MCD Library interprets the matching results and tracks information relevant to the duplicate-detection process. You can query these results and then handle the information however you choose.

Work with the Match/Consolidate Library

When you work with the MCD Library, you’ll follow these basic steps:
1. Create a file that contains only the record data needed for matching. Such a file is called a reference file, and the condensed records are called keys.
2. Define logical groups of records called lists. You’ll do this at the same time you create your reference files by putting a list identifier into each reference file or into each individual record key.
3. Separate the record keys into break groups. For example, you might separate records by the first three digits of the ZIP Code.
4. Within each break group, select pairs of record keys and pass them to the match engine. If you have more than one set of match rules, you can control which set of rules to use for each pair of records.
5. After the match engine compares the record keys, query the results.
[Figure: Keys in reference files (List 1: Keys A through F; List 2: Keys G through M) are separated into break groups. Key pairs within each break group are passed to the Match Library for comparison, which returns a result for each pair (for example, “Not duplicate”).]
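Steps 3 and 4 can be sketched in plain C as follows, assuming a simplified key that holds only a ZIP Code. The code sorts the keys so that keys sharing the first three ZIP Code digits are adjacent, then passes every pair within each break group to a comparison callback. All names are hypothetical. In a real application, the MCD Library forms the break groups and the Match Library performs the comparisons.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_BREAK_LEN 3  /* break on the first three ZIP Code digits */

/* Hypothetical, simplified key: a record ID and a five-digit ZIP Code. */
typedef struct {
    int  id;
    char zip[6];
} demo_key;

/* qsort() comparison: orders keys by their break key (ZIP Code prefix). */
static int demo_cmp_break_key(const void *a, const void *b)
{
    const demo_key *ka = a;
    const demo_key *kb = b;
    return strncmp(ka->zip, kb->zip, DEMO_BREAK_LEN);
}

/* Sorts keys into break groups, then calls compare() (if not NULL) once for
   every pair of keys in the same group. Returns the number of pairs. */
long demo_compare_within_break_groups(demo_key *keys, size_t n,
        void (*compare)(const demo_key *, const demo_key *))
{
    long pairs = 0;
    size_t start = 0;

    qsort(keys, n, sizeof *keys, demo_cmp_break_key);
    while (start < n) {
        /* Find the end of the current break group. */
        size_t end = start + 1;
        while (end < n && demo_cmp_break_key(&keys[start], &keys[end]) == 0)
            end++;
        /* Compare every pair within the group, never across groups. */
        for (size_t i = start; i < end; i++) {
            for (size_t j = i + 1; j < end; j++) {
                if (compare != NULL)
                    compare(&keys[i], &keys[j]);
                pairs++;
            }
        }
        start = end;
    }
    return pairs;
}
```

For example, ZIP Codes 01053, 01060, and 01001 form one group (three pairs), and 98765 and 98701 form another (one pair). That gives four comparisons instead of the ten needed to compare all five keys with each other.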

Configuration files make setup faster and easier

Match/Consolidate Library offers external text configuration files that contain some of the settings and options to be used for your MCD session. Set the parameters of the text configuration files, then call mp_cfg_open(). This function sets your options and returns a session handle.

Advantages of configuration files

Some advantages to using external text configuration files include the following:
Rather than making dozens of calls to set up a session handle, you make one call to mp_cfg_open(). Compared with making direct calls to the conventional API, this should reduce application code size, training time, development time, errors, and testing time.
You can change breaking strategy, list definitions, and reference file setup simply by editing the configuration files. Therefore, you can test different scenarios without changing API calls and rebuilding.
With small modifications to your code, you can use one code base to support many different MCD scenarios. For each scenario, make a different set of text configuration files.
Configuration files are heavily commented so that users can edit them without referring to printed documentation. In fact, all the potential options and settings are included in the comments, so making additions and changes to the configuration is mostly a matter of copy and paste.

Preset MCD strategy
We include a set of configuration files already set up for the most commonly used MCD strategy. Most users will be able to use these without further editing or modification. Just specify the location and file name of the mp.cfg file in your mp_cfg_open() call.

Five configuration files

Match/Consolidate Library configuration parameters are distributed among five configuration files, as described below.
Configuration file   File name     Description
Overall              mp.cfg        Specifies the paths and file names of the other configuration files.
Reference            mpref.cfg     Controls the format of the reference file, which contains the record keys.
List                 mplist.cfg    Defines logical groups of records.
Break                mpbreak.cfg   Identifies which fields to use to form break groups, and which characters in those fields to use for breaking.
Miscellaneous        mpmisc.cfg    Sets miscellaneous options such as the work-file directory.
[Figure: mp_cfg_open() reads the overall configuration file (mp.cfg), which names the other four configuration files. Sample contents of each file:]

mp.cfg (MP_Overall_Config_File)
    MP_Reference_Config_File: mpref.cfg
    MP_List_Config_File: mplist.cfg
    MP_Break_Config_File: mpbreak.cfg
    MP_Misc_Config_File: mpmisc.cfg
    MP_Keyfile: mplib.key

mpref.cfg (MP_Reference_Config_File)
    MP_Reference_File_Name: test.ref
    MP_Reference_Constant_List_ID: 1
    MP_Reference_List_ID_Field_Length:
    MP_Reference_Misc_Header_Field_Length: 10
    MP_Reference_Misc_Key_Field_Length: 10
    MP_Reference_Priority_Field_Length: 10

mplist.cfg (MP_List_Config_File)
    MP_List_Number_of_Lists: 1
    MP_List_Number: 1
    MP_List_ID: 1
    MP_List_Name: List1
    MP_List_Priority: 1
    MP_List_Type: NORMAL
    MP_List_Compare_Within_This_List: YES
    MP_List_Break_Priority: 1
    MP_List_Apply_Blank_Priority: YES
    MP_List_Data_Salvage: YES
    MP_List_Default_List_Action: ASSIGN_DEFAULT
    MP_List_Default_List_Number: 1

mpbreak.cfg (MP_Break_Config_File)
    MP_Break_Number_of_Break_Fields: 2
    MP_Break_Field_Name: MTC_KEYFLD_ZIP
    MP_Break_Starting_Position: 1
    MP_Break_Field_Length: 5
    MP_Break_Mode: NORMAL

mpmisc.cfg (MP_Misc_Config_File)
    MP_Misc_Work_Dir: .
    MP_Misc_Sort_Dir: .
    MP_Misc_Session_Name: session1
    MP_Misc_Max_Work_Buffer: 1024
    MP_Misc_Sort_Option: SPEED
    MP_Misc_Dupe_File_Name: sample.dpd
    MP_Misc_Prioritize_Dupes: LIST_FIELD
    MP_Misc_Key_Work_File_Name: sample.dpk
    MP_Misc_Clean_Up_Work_Files_When_Done: YES

Sample programs show you how to call the library

Sample programs
We provide three sample programs and their source code.
Program      Description
reftest.c    Demonstrates how to create and load a reference file. In this sample program, data is read from a sample ASCII data file.
mptest.c     Demonstrates how to open a previously created reference file, set up lists and matching, set up progress callbacks, find break groups, find duplicates, and query duplicate results.
mptest2.c    Demonstrates how to use the MCD Library configuration files.

Modules
The sample programs use the following modules.
Module       Description
suppfncs.c   Contains supporting functions used by all of the sample programs.
errfncs.c    Demonstrates error handling.

Build the sample programs
For instructions on building the sample programs, see “Install, compile, and link” on page 9.

Error handling and progress callback

Nearly every function in the MCD Library returns a status code. It is important to check the status code after every MCD Library function call. If MCD returns the value MP_ERROR, your application must decide whether to exit, what information to display, and so forth.

Error handler
The easiest way to handle error checking is to set up an error-handling function.

Call your error handler whenever MP_OK is not returned.
Inside your error handler, you can call mp_get_error_number() and mp_get_error_messages() to obtain more information about the failed function call.
You can call mp_get_error_info() instead of mp_get_error_number() and mp_get_error_messages(). However, mp_get_error_info() may not work with Visual Basic.
For sample code, refer to the errfncs.c module in the samples subdirectory.
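One common way to check every call is a small macro that evaluates each call once and invokes the handler when the status is not OK. In this sketch, DEMO_OK and DEMO_ERROR are hypothetical stand-ins for MP_OK and MP_ERROR (whose numeric values are not assumed), and demo_fake_call() simulates a library call that succeeds or fails.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the MCD status codes. Use MP_OK and MP_ERROR
   from the MCD header in real code; their values are not assumed here. */
typedef enum { DEMO_OK, DEMO_ERROR } demo_status;

/* Counts how many failures the handler has seen (for illustration only). */
static int demo_error_count = 0;

/* Hypothetical application error handler. In a real application, this is
   where you would call mp_get_error_number() and mp_get_error_messages()
   and then decide whether to continue or exit. */
static void demo_error_handler(const char *call_text, demo_status status)
{
    demo_error_count++;
    fprintf(stderr, "error: %s returned status %d\n", call_text, (int)status);
}

/* Evaluates a call once, checks its status, and calls the handler on
   anything other than the OK status. */
#define DEMO_CHECK(call)                            \
    do {                                            \
        demo_status demo_s_ = (call);               \
        if (demo_s_ != DEMO_OK)                     \
            demo_error_handler(#call, demo_s_);     \
    } while (0)

/* Hypothetical library call used only to exercise the macro. */
static demo_status demo_fake_call(int should_fail)
{
    return should_fail ? DEMO_ERROR : DEMO_OK;
}
```

Because the macro passes the call's source text to the handler, the handler can report which call failed without extra bookkeeping.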

One error handler for MCD and Match
If you want to use just one error handler for both MCD and Match Library function calls, your handler must detect whether the current error came from MCD or from the Match Library to determine which get_error_info() function to call. You can do this by checking the prefix of the error-causing function (mp or mtc), or by setting a variable and passing its value as a parameter when you call your error handler.
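Here is a minimal sketch of the prefix check. It assumes MCD function names start with mp_ (as every function listed in this manual does) and Match Library function names start with mtc. demo_library_of() is a hypothetical helper. Your single handler would then call the MCD or Match Library error-information functions, depending on the result.

```c
#include <string.h>

/* Which library reported the error. */
typedef enum { DEMO_LIB_UNKNOWN, DEMO_LIB_MCD, DEMO_LIB_MATCH } demo_library;

/* Classifies a failed call by the prefix of its function name:
   "mp_" for the MCD Library, "mtc" for the Match Library (assumed naming). */
demo_library demo_library_of(const char *function_name)
{
    if (function_name == NULL)
        return DEMO_LIB_UNKNOWN;
    if (strncmp(function_name, "mp_", 3) == 0)
        return DEMO_LIB_MCD;
    if (strncmp(function_name, "mtc", 3) == 0)
        return DEMO_LIB_MATCH;
    return DEMO_LIB_UNKNOWN;
}
```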

Progress callbacks
Certain MCD Library processing steps, such as breaking and finding duplicates, work with multiple records and potentially involve I/O and sorting. If you want to display progress information for MCD Library processing steps, you can set up progress callbacks for the desired processing steps.
For sample code, refer to the suppfncs.c module in the samples subdirectory.
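The callback pattern can be sketched as follows. The callback signature (records done, total records, caller context) is a hypothetical simplification. The signature that mp_misc_set_exit_progress() actually expects is defined on its reference page. demo_process_records() only simulates a long-running step so that the callback can be exercised.

```c
#include <stddef.h>

/* Hypothetical progress-callback type: records processed so far, total
   records, and a caller-supplied context pointer. */
typedef void (*demo_progress_fn)(long done, long total, void *context);

/* Simulates a long processing step, such as breaking or finding duplicates.
   Reports progress every `interval` records (if interval > 0) and once at the end. */
void demo_process_records(long total, long interval,
                          demo_progress_fn progress, void *context)
{
    for (long done = 1; done <= total; done++) {
        /* Real per-record work (I/O, sorting, comparing) would happen here. */
        if (progress != NULL &&
            ((interval > 0 && done % interval == 0) || done == total))
            progress(done, total, context);
    }
}

/* Example state and callback: counts reports and remembers the last percentage. */
typedef struct {
    int calls;
    int last_percent;
} demo_progress_state;

void demo_track_progress(long done, long total, void *context)
{
    demo_progress_state *state = context;
    state->calls++;
    state->last_percent = (int)(done * 100 / total);
}
```

With 250 records and an interval of 100, the callback fires at 100, 200, and 250 records. In a real application, it would typically update a progress bar or log line.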

Work files

During the processes of creating break groups and finding duplicates, MCD creates work files. You can specify where to store these files, what to call them, and whether to delete them automatically.

Work files for break groups

When finding break groups, MCD creates work files. By default, these files are stored in the current directory. If you’d prefer to store them somewhere else, you can specify a work directory.
You must specify a work-file session name. MCD uses the session name as the base file name for work files. For example, if you assign a session name of “test,” work files will have names such as test.dbk.
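Your application may need to predict where a work file will appear, for example to check disk space or to clean up after an interrupted run. A sketch like the following composes the path from the work directory, session name, and extension. It assumes the “session.extension” pattern from the example above (.dbk is the only extension this section names). It uses / as the directory separator, which most Windows file APIs also accept. demo_work_file_path() is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Builds "<work_dir>/<session>.<ext>" into out. An empty work_dir means the
   current directory. Returns 0 on success, or -1 if out is too small. */
int demo_work_file_path(char *out, size_t out_size, const char *work_dir,
                        const char *session, const char *ext)
{
    size_t dir_len = strlen(work_dir);
    /* Add a separator only if the directory is non-empty and lacks one. */
    const char *sep = (dir_len == 0 || work_dir[dir_len - 1] == '/' ||
                       work_dir[dir_len - 1] == '\\') ? "" : "/";
    int n = snprintf(out, out_size, "%s%s%s.%s", work_dir, sep, session, ext);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```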

Work files for duplicate results

When finding duplicates, MCD builds two additional work files—one for duplicate results, and one for key-file information. These two files can become quite large, so you may want to store them in a separate location. You can specify the path and file names for each of these two work files.

Sample code
To set work-file options, call mp_misc_set_work_file_info(). For sample code, refer to the suppfncs.c module in the samples subdirectory.

Function calls for initialization and termination

Whether you use some or all of the functionality of the MCD Library, you’ll call these functions in any MCD application.
1. Call mp_init() to initialize the MCD Library.
Initialize the Match Library before initializing the MCD Library. For more information, refer to your Match Library documentation.
2. If you are not using configuration files, call mp_set_keyfile() to set the path and file name of the installation key file, mplib.key. You must specify the location of this file before calling any other MCD functions.
This key file is not related to the record keys stored in your reference file. This key file “unlocks” the library for use.
3. If you are using configuration files, call mp_cfg_open(). For details about configuration files, see “Configuration files make setup faster and easier” on page 17.
4. Set up an error-handling function. Inside your function, call mp_get_error_info(). For more information, see “Error handling and progress callback” on page 20.
5. Optional: Call mp_get_revision() to get version information for the MCD and Match Libraries. You will need this information when you contact technical support.
6. If you are not using configuration files, call the following functions to set miscellaneous options and work-file information:
- mp_misc_set_option_info()
- mp_misc_set_work_file_info()
7. If you want to establish exit functions—for example, to report progress at crucial points during processing—call mp_misc_set_exit_progress() or mp_misc_set_exit_blank_field() to register your exit functions with the MCD Library.
8. Perform processing. A complete, end-to-end job involves the following major steps:
- Create, open, or update reference files (Chapter 3).
- Create lists and super lists (Chapter 4).
- Form break groups (Chapter 5).
- Search for duplicates and retrieve results (Chapter 6).
9. If you are using configuration files, call mp_cfg_close().
10. Call mp_term() to terminate the MCD Library and free global memory allocated for it.
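The branching between the configuration-file and no-configuration-file paths is easy to lose in the numbered steps, so the sketch below records only the order of the calls. It deliberately lists function names as strings rather than calling them, because this section does not give their signatures; `init_term_sequence()` and `step_index()` are hypothetical helpers, and the optional steps 5 and 7 are omitted.

```c
#include <stddef.h>
#include <string.h>

/* Fills `out` (room for at least 10 entries) with the order of the calls
   in steps 1-10 above and returns the number of entries. */
static size_t init_term_sequence(int use_cfg_files, const char **out, size_t max)
{
    const char *steps[10];
    size_t n = 0;

    steps[n++] = "Match Library initialization"; /* must precede mp_init() */
    steps[n++] = "mp_init";
    steps[n++] = use_cfg_files ? "mp_cfg_open" : "mp_set_keyfile";
    steps[n++] = "install error handler (calls mp_get_error_info)";
    if (!use_cfg_files) {
        steps[n++] = "mp_misc_set_option_info";
        steps[n++] = "mp_misc_set_work_file_info";
    }
    steps[n++] = "processing (Chapters 3-6)";
    if (use_cfg_files)
        steps[n++] = "mp_cfg_close";
    steps[n++] = "mp_term";

    for (size_t i = 0; i < n && i < max; ++i)
        out[i] = steps[i];
    return n;
}

/* Position of a call in the sequence, or -1 if it does not appear. */
static int step_index(const char **seq, size_t n, const char *name)
{
    for (size_t i = 0; i < n; ++i)
        if (strcmp(seq[i], name) == 0)
            return (int)i;
    return -1;
}
```

Note that mp_set_keyfile() and the mp_misc_set_*() calls appear only on the path without configuration files, and mp_term() is always last.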
Chapter 3: Reference files
This chapter introduces reference files and explains how to create them, add keys to them, and maintain them.

Introduction to reference files

A reference file is a specialized work file that contains record keys. The record keys are condensed versions of your original record data, containing only the data needed to form break groups, determine list membership, perform matching, and rank records within duplicate groups.
You’ll use the Match Library to define most of the key layout, namely which record data to include in the key. When you use MCD to create a reference file, you’ll tell MCD which match key to use, and you can optionally add MCD key fields to each key.

Header and keys

A reference file consists of header data and record keys.
The header contains information about the key layout. You can also place a list identifier and other user data in the header.
The record keys contain the record data that is needed to form break groups and perform matching; you use the Match Library to define this portion of the key. When you create a reference file, you can also add a record-mapping field, record-priority field, and list identification field to each key.

Part     Contents
Header   Key layout, user “miscellaneous” field, list ID (if constant)
Keys     Overhead data, record data, record-mapping field, record-priority field, list ID field (if variable)

Temporary or reusable

A reference file can be used as a temporary work space. For example, if some of your duplicate-detection data comes from a transaction file that is always changing, you can create a temporary reference file from your transaction data each time you want to find duplicates.
Alternatively, once you generate keys and store them in a reference file, you can save the reference file and add, delete, update, and reuse the keys. However, many programmers find it easier to regenerate the reference file each time they find duplicates than to maintain and update keys in an existing reference file.

Key data

A key contains record data plus other data such as a list identifier or record-mapping information. Keys must include all of the data you want to use to form break groups and perform matching.
You’ll use the Match Library to define the layout and content of the key. When you use the MCD Library to create the reference file, you’ll specify which match key to use. Optionally, you can add the MCD key fields listed below.
Key field          Description
List ID            Defines to which list a key belongs.
Field priority     Defines a record’s priority within a group of duplicate records.
Key miscellaneous  Stores user information, such as a record identifier, that allows you to map a key back to a particular record in your database.

Key length

Each key consists of overhead data, Match Library key fields, and MCD Library key fields. To calculate the length of each key, find the sum of the following items:
- The size of the overhead data (16 bytes per key).
- The length of each Match Library key field, including any alternates.
- The length of each MCD key field.
Match/Consolidate pads each key so that the total key length is divisible by eight.
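As a worked example of the calculation above, the sketch below sums the 16-byte overhead and the field lengths, then rounds up to a multiple of eight. MCD performs this calculation itself; `mcd_key_length()` and the field lengths in the usage note are hypothetical.

```c
#include <stddef.h>

#define MCD_KEY_OVERHEAD 16  /* overhead bytes per key, as stated above */

/* Overhead + Match Library field lengths (including alternates)
   + MCD key field lengths, padded up to a multiple of eight. */
static size_t mcd_key_length(const size_t *match_fields, size_t n_match,
                             const size_t *mcd_fields, size_t n_mcd)
{
    size_t total = MCD_KEY_OVERHEAD;
    for (size_t i = 0; i < n_match; ++i)
        total += match_fields[i];
    for (size_t i = 0; i < n_mcd; ++i)
        total += mcd_fields[i];
    return (total + 7) / 8 * 8;  /* round up to the next multiple of 8 */
}
```

For example, Match Library fields of 30, 40, and 10 bytes plus MCD fields of 4 and 2 bytes give 16 + 80 + 6 = 102 bytes, which is padded to 104.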

List identifiers

There are two ways to store list identifiers:
- If all keys in the reference file belong to the same list, you can store the list ID once, in the reference-file header.
- If the reference file contains keys from multiple lists, store the list ID in each key, in the MCD list ID field.
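Conceptually, the two storage options mean a key's list is resolved from the header when the header carries a constant list ID, and from the key's own list ID field otherwise. The sketch below models that rule with hypothetical in-memory structs; it is not MCD's reference-file format, and integer list IDs are an assumption.

```c
/* Hypothetical in-memory view of a reference-file header. */
typedef struct {
    int has_header_list_id;  /* nonzero if one list ID applies to every key */
    int header_list_id;
} ref_header;

/* Hypothetical key; list_id models the MCD list ID field. */
typedef struct {
    int list_id;
} ref_key;

/* The header list ID wins when present; otherwise use the key's field. */
static int key_list_id(const ref_header *hdr, const ref_key *key)
{
    return hdr->has_header_list_id ? hdr->header_list_id : key->list_id;
}
```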

Create a reference file

Configuration file

If you use the MCD configuration files, use the Reference configuration file, mpref.cfg, to define MCD key fields and reference-file header information. For instructions about completing the configuration parameters, see the comments in the configuration file.
This configuration file replaces calls to mp_refcreate_set_option(). For more information about using configuration files, see “Configuration files make setup faster and easier” on page 17.

Sequence of function calls

To create an empty reference file, call the functions below. For sample code, refer to reftest.c.
1. Use the Match Library to define a match key. The match key defines the fields that hold record data—for example, personal-name data, address data, and so on. For more information, refer to the Match Library Programmer’s Reference manual.
2. Call mp_refcreate_open() to initialize the library and open a session to create a reference file.
3. Call mp_refcreate_set_mtc_key_id() to specify which match-key layout to use. You used the Match Library to define the match-key layout (step 1).
4. Call mp_refcreate_set_option() to reserve space for MCD key fields and set options. If you are using MCD configuration files, you do not need to make these calls.
5. Call mp_refcreate_close() to lock your settings and create the reference file.

Add keys to a reference file

To load data into a reference file, add keys. Use the Match Library to load record data into each key. Use the MCD Library to load data into the MCD key fields.
To add keys to a reference file, call the functions below. For sample code, refer to reftest.c.
1. Call mp_ref_open() to open a previously created reference file.
2. Call mp_refqry_get_mtc_key_id() to get the match-key ID for the reference file. You will pass this ID as input to Match Library functions.
3. Enter a loop to load data into each key:
- Call mp_refmod_clear_key() to clear the internal key buffer before loading data.
- Use the Match Library to load record data into the Match key fields. For more information, see your Match Library manual.
- Call mp_refmod_set_data() to set data in MCD key fields. Call this function once for each MCD field.
- Call mp_refmod_add_key() to add the completed key to the reference file.
4. Call mp_ref_close() to close the reference file.
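The clear, fill, and add pattern in step 3 can be illustrated with a small mock. Everything here is hypothetical: `mock_ref` and the `mock_*()` functions only stand in for mp_refmod_clear_key(), mp_refmod_set_data(), and mp_refmod_add_key(), whose real signatures are not given in this section, and a name string stands in for the Match Library key fields.

```c
#include <string.h>

#define KEY_SIZE 64   /* hypothetical fixed key size */
#define MAX_KEYS 100
#define NAME_LEN 48   /* bytes reserved for the stand-in record data */

typedef struct {
    unsigned char buffer[KEY_SIZE];          /* internal key buffer */
    unsigned char keys[MAX_KEYS][KEY_SIZE];  /* stored keys */
    int count;
} mock_ref;

static void mock_clear_key(mock_ref *r)      /* stands in for mp_refmod_clear_key() */
{
    memset(r->buffer, 0, KEY_SIZE);
}

static int mock_set_data(mock_ref *r, size_t offset, const void *data, size_t len)
{                                            /* stands in for mp_refmod_set_data() */
    if (offset + len > KEY_SIZE)
        return -1;
    memcpy(r->buffer + offset, data, len);
    return 0;
}

static int mock_add_key(mock_ref *r)         /* stands in for mp_refmod_add_key() */
{
    if (r->count >= MAX_KEYS)
        return -1;
    memcpy(r->keys[r->count++], r->buffer, KEY_SIZE);
    return 0;
}

/* Step 3: clear the buffer, fill it, and add it, once per record. */
static int load_keys(mock_ref *r, const char *const *names,
                     const int *list_ids, int n)
{
    for (int i = 0; i < n; ++i) {
        size_t len = strlen(names[i]);
        if (len >= NAME_LEN)
            len = NAME_LEN - 1;  /* truncate to the reserved area */
        mock_clear_key(r);
        /* Stand-in for loading Match Library key fields. */
        if (mock_set_data(r, 0, names[i], len) != 0)
            return -1;
        /* Stand-in for one MCD key field (a list ID). */
        if (mock_set_data(r, NAME_LEN, &list_ids[i], sizeof list_ids[i]) != 0)
            return -1;
        if (mock_add_key(r) != 0)
            return -1;
    }
    return 0;
}
```

Clearing the buffer first matters: without it, a shorter record would inherit trailing bytes from the previous, longer one.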

Maintain a reference file

Many programmers find that it’s easiest to regenerate reference files each time they perform MCD processing, rather than trying to maintain the files. Often the time needed to regenerate a reference file is minimal, especially when weighed against the work of maintaining it.
In some cases, however, you may prefer to maintain an existing reference file. To update an existing key, you need to find the key, modify the key fields as necessary, and write the data back to the key.
To modify a reference file, use the mp_refmod_*() functions and the appropriate Match Library functions. To query the contents of keys, use the mp_refqry_*() functions and the appropriate Match Library functions. For more information about working with Match Library key data, refer to the Match Library Programmer’s Reference manual.
Chapter 4: Lists, match specifications, and match levels
This chapter provides an introduction to lists and explains how to set up lists, control matching within and between lists, give one list priority over another, and gather statistics for a group of lists.

Introduction to lists

A list is a group of records that are related in some way—for example, all of the records might have come from the same input database. Lists give you added control over the matching process:
- Control which matching rules to use when comparing two records.
- Cancel comparisons within a list if you know there are no duplicate records within that list.
- Assign one list priority over another. For example, you could favor a house list over a rented list.
- Prioritize records within a break group.

List membership

When you set up a list, you assign a list identifier for that list. A record is a member of that list if the record has the same list identifier. There are two ways to assign records to lists:
- If all of the records (keys) in a reference file belong to the same list, you can put the appropriate list ID in the file header.
- Alternatively, you can include a list ID field in your record key. The value in the list ID field indicates to which list a particular record belongs.

Types of lists

You can define three different types of lists:

List type    Description
Normal       Contains good or eligible records (default).
Suppression  Contains records that should be suppressed. Suppression records and all records that match them should be removed from the output.
Special      Contains records that are not counted in the determination of whether a duplicate group is single-list or multiple-list.
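The Special rule in the table above can be made concrete: when deciding whether a duplicate group is single-list or multiple-list, records from Special lists are skipped and only the distinct remaining lists are counted. The sketch below is a hypothetical model of that rule, assuming small integer list IDs (0 to 31) that index a table of list types; it is not MCD's internal implementation.

```c
typedef enum { LIST_NORMAL, LIST_SUPPRESSION, LIST_SPECIAL } list_type;

#define MAX_LISTS 32  /* hypothetical: list IDs are integers 0..31 */

/* Number of distinct non-Special lists represented in a duplicate group. */
static int count_counted_lists(const int *ids, int n, const list_type *type_of)
{
    int seen[MAX_LISTS] = {0};
    int distinct = 0;
    for (int i = 0; i < n; ++i) {
        int id = ids[i];
        if (id < 0 || id >= MAX_LISTS)
            continue;                 /* out of range for this sketch */
        if (type_of[id] == LIST_SPECIAL)
            continue;                 /* Special lists are not counted */
        if (!seen[id]) {
            seen[id] = 1;
            ++distinct;
        }
    }
    return distinct;
}

/* A group is multiple-list if more than one counted list is present. */
static int is_multiple_list_group(const int *ids, int n, const list_type *type_of)
{
    return count_counted_lists(ids, n, type_of) > 1;
}
```

With lists 0 and 1 Normal and list 2 Special, a group drawn from lists 0, 0, and 2 is single-list, while a group drawn from lists 0, 1, and 2 is multiple-list.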
If a record doesn’t belong to any list

If a record doesn’t belong to any of your defined lists, you can:

Action   Outcome
Ignore   Leave the record out of the job.
Abort    Return an error code.
Assign   Assign the record to a default list that you specify.
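The three actions in the table above can be sketched as a single decision function. The enum values, status codes, and `resolve_list()` are hypothetical; in MCD you choose the action through the library's list setup rather than writing this logic yourself.

```c
#include <stddef.h>

/* Hypothetical action setting for records that match no defined list. */
typedef enum { NO_LIST_IGNORE, NO_LIST_ABORT, NO_LIST_ASSIGN } no_list_action;

/* Hypothetical status codes for this sketch. */
enum { REC_ACCEPTED = 0, REC_SKIPPED = 1, REC_ERROR = -1 };

/* On REC_ACCEPTED, *out_id holds the list the record belongs to. */
static int resolve_list(int list_id, const int *known_ids, size_t n_known,
                        no_list_action action, int default_list, int *out_id)
{
    for (size_t i = 0; i < n_known; ++i) {
        if (known_ids[i] == list_id) {
            *out_id = list_id;
            return REC_ACCEPTED;
        }
    }
    switch (action) {
    case NO_LIST_IGNORE:
        return REC_SKIPPED;          /* leave the record out of the job */
    case NO_LIST_ABORT:
        return REC_ERROR;            /* return an error code */
    case NO_LIST_ASSIGN:
        *out_id = default_list;      /* assign to the default list */
        return REC_ACCEPTED;
    }
    return REC_ERROR;
}
```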