Business objects DATA QUALITY MANAGEMENT SDK 4.0 User Manual

Developer Guide

SAP BusinessObjects Data Quality Management SDK 4.0 (14.0.0.1)

2010-12-09

© 2010 SAP AG. All rights reserved.

SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP Business ByDesign, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects S.A. in the United States and in other countries. Business Objects is an SAP company.

All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary.

These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

Contents

Chapter 1: Overview
1.1 Data Quality Management SDK overview
1.1.1 Relationship to Data Services
1.1.2 EmDQ

Chapter 2: Installing Data Quality Management SDK
2.1 Upgrading
2.2 To install the SDK on Windows
2.3 To install the SDK on Unix

Chapter 3: Directory data
3.1 Directory Data
3.1.1 Directory listing and update schedule
3.1.2 U.S. Directory expiration
3.1.3 Where to copy directories
3.1.4 To install and set up SAP Download Manager
3.1.5 To download directory files
3.1.6 To extract directory files

Chapter 4: Cleansing packages
4.1 To install Data Cleanse cleansing packages

Chapter 5: Samples
5.1 Getting started with the samples
5.2 Sample program files
5.3 Building the sample
5.3.1 To build the samples
5.4 Running the samples
5.4.1 To run a sample

Chapter 6: API Reference for C++
6.1 C++ API reference overview
6.1.1 ToLatin1
6.2 CertifiedReportGenerator
6.3 DataRecordSchema
6.4 Date
6.5 DateTime
6.6 EmdqException
6.7 InputDataRecord
6.8 MessageHandler
6.9 MultiRecordTransform
6.10 MultiRecordTransformHelper
6.11 OutputDataRecord
6.12 ProgressHandler
6.13 RecordTransform
6.14 RecordTransformHelper
6.15 StatisticsHandler
6.16 StatisticsSchema
6.17 Time
6.18 TransformFactory

Chapter 7: API Reference for Java
7.1 Java API reference overview
7.2 CertifiedReportGenerator
7.3 DataRecordSchema
7.4 EmdqException
7.5 InputDataRecord
7.6 MessageHandler
7.7 MultiRecordTransform
7.8 MultiRecordTransformHelper
7.9 OutputDataRecord
7.10 ProgressHandler
7.11 RecordTransform
7.12 RecordTransformHelper
7.13 StatisticsHandler
7.14 StatisticsSchema
7.15 TransformFactory

Chapter 8: API Reference for .Net
8.1 .Net API reference overview
8.2 EmDQException
8.3 LogHandler
8.4 MultiRecordProgressHandler
8.5 MultiRecordTransform
8.6 MultiRecordTransformHelper
8.7 RecordTransform
8.8 RecordTransformHelper
8.9 TransformFactory

Chapter 9: Address cleanse concepts
9.1 Address cleanse basics
9.2 Set up the reference files

Chapter 10: USA Regulatory Address Cleanse
10.1 USA Regulatory Address Cleanse overview
10.2 USPS DPV®
10.2.1 Benefits of DPV
10.2.2 DPV security
10.2.3 DPV monthly directories
10.2.4 Required information in the job setup
10.2.5 DPV output fields
10.2.6 Non certified mode
10.2.7 DPV performance
10.2.8 DPV locking
10.2.9 Unlocking DPV
10.2.10 DPV No Stats indicators
10.2.11 DPV Vacant indicators
10.3 USPS eLOT®
10.4 Early Warning System (EWS)
10.4.1 Overview of EWS
10.4.2 EWS directory
10.5 SuiteLink™
10.5.1 Benefits of SuiteLink
10.5.2 How SuiteLink works
10.5.3 SuiteLink directory
10.5.4 Improve processing speed
10.6 LACSLink®

10.6.1 Benefits of LACSLink
10.6.2 How LACSLink works
10.6.3 Conditions for address processing
10.6.4 LACSLink directory files
10.6.5 Required information in the job setup
10.6.6 Reasons for errors
10.6.7 LACSLink output fields
10.6.8 Memory usage and caching for LACSLink processing
10.6.9 LACSLink® security
10.6.10 Unlocking LACSLink
10.6.11 USPS Form 3553
10.7 USPS RDI®
10.7.1 How RDI works
10.7.2 RDI directory files
10.7.3 RDI output field
10.7.4 CASS Statement, USPS Form 3553
10.8 Z4Change (USA Regulatory Address Cleanse)
10.8.1 Enable Z4Change for faster processing
10.8.2 Z4Change and USPS rules
10.8.3 Z4Change directory
10.9 Introduction to suggestion lists
10.9.1 Breaking ties
10.9.2 More information is needed
10.9.3 CASS rule
10.10 USPS certifications
10.10.1 To complete USPS certifications
10.10.2 Static directories
10.10.3 CASS self-certification
10.10.4 NCOALink certification
10.11 NCOALink (USA Regulatory Address Cleanse)
10.11.1 The importance of move updating
10.11.2 Benefits of NCOALink
10.11.3 How NCOALink works
10.11.4 Software performance
10.11.5 Address not known (ANKLink)
10.11.6 Getting started with NCOALink
10.11.7 What to expect from the USPS and SAP BusinessObjects
10.11.8 About NCOALink directories
10.11.9 About the NCOALink daily delete file
10.11.10 Output file strategies
10.11.11 Improving NCOALink processing performance

10.11.12 NCOALink log files
10.12 Multiple data source statistics reporting
10.12.1 Data_Source_ID field
10.12.2 USPS Form 3553 and group reporting

Chapter 11: USA Regulatory Address Cleanse Reference
11.1 USA Regulatory Address Cleanse
11.2 System group
11.3 Report and analysis
11.4 Transform performance
11.5 Reference files
11.6 Assignment options
11.7 Standardization options
11.8 Z4 Change options
11.9 CASS Report options
11.10 Suggestion List options
11.10.1 Suggestion List output options
11.10.2 Suggestion list components
11.11 Non Certified options
11.12 USPS license information options
11.12.1 Required options for USPS License Information
11.13 NCOALink options
11.13.1 Processing options
11.13.2 Report Options
11.13.3 Output options
11.13.4 Processing Acknowledgment Form (PAF) Details
11.13.5 Service provider options
11.13.6 Contact Details

Chapter 12: Global Address Cleanse
12.1 Supported countries (Global Address Cleanse)
12.2 Process Japanese addresses
12.2.1 Standard Japanese address format
12.2.2 Special Japanese address formats
12.3 Process Chinese addresses
12.3.1 Chinese address format
12.3.2 Sample Chinese address

Chapter 13: Global Address Cleanse Reference
13.1 Global Address Cleanse

13.2 System group
13.3 Report and analysis
13.4 Reference files
13.5 Country ID Options (Global Address Cleanse)
13.6 Engines
13.7 Standardization Options
13.8 Canada engine
13.8.1 Canada engine Options
13.8.2 Canada engine Report Options
13.8.3 Canada engine Suggestion List Options
13.9 Global Address Country Options
13.10 Global Address Engine Report Options
13.10.1 Report options for Australia
13.10.2 Report options for New Zealand
13.11 USA engine
13.11.1 USA engine Options
13.11.2 USA engine Suggestion Lists Options

Chapter 14: Data Cleanse
14.1 About Cleansing Data
14.2 Ranking and prioritizing parsing engines
14.3 About parsing data
14.3.1 About parsing phone numbers
14.3.2 About parsing dates
14.3.3 About parsing Social Security numbers
14.3.4 About parsing Email addresses
14.3.5 About parsing street addresses
14.4 About standardizing data
14.5 About assigning gender descriptions and prenames
14.6 Prepare records for matching
14.7 Cleansing packages and transforms
14.8 About Japanese data
14.8.1 Text width in output fields
14.8.2 Process Japanese data

Chapter 15: Data Cleanse Reference
15.1 Data Cleanse
15.2 System group
15.3 Cleansing Package
15.4 Engines

15.5 Person standardization options
15.5.1 Gender standardization options
15.6 Firm standardization options
15.7 Other standardization options
15.8 Input word breaker
15.9 Date options
15.10 Parser configuration

Chapter 16: Geocoder
16.1 Geocoding
16.1.1 POI and address geocoding
16.1.2 POI and address reverse geocoding

Chapter 17: Geocoder Reference
17.1 Geocoder
17.2 Directories
17.3 System group
17.4 Geocoder options
17.4.1 Report and analysis
17.4.2 Reference files

Chapter 18 Match ..... 249
18.1 Matching strategies ..... 249
18.1.1 Match samples ..... 249
18.2 Match components ..... 250
18.3 Physical and logical sources ..... 251
18.4 Using sources ..... 252
18.4.1 Source types ..... 253
18.4.2 Source groups ..... 253
18.5 Prepare data for matching ..... 254
18.5.1 Fields to include for matching ..... 255
18.6 Compare tables ..... 255
18.7 Data Salvage ..... 255
18.7.1 Data salvaging and initials ..... 256
18.8 Overview of match criteria ..... 258
18.9 Matching methods ..... 259
18.9.1 Similarity score ..... 259
18.9.2 Rule-based method ..... 260
18.9.3 Weighted-scoring method ..... 261
18.9.4 Combination method ..... 262


18.10 Matching business rules ..... 263
18.10.1 Matching on strings, abbreviations, and initials ..... 263
18.10.2 Extended abbreviation matching ..... 263
18.10.3 Name matching ..... 264
18.10.4 Numeric data matching ..... 265
18.10.5 Blank field matching ..... 267
18.10.6 Multiple field (cross-field) comparison ..... 269
18.11 Group statistics ..... 269
18.12 Input source select records ..... 270
Chapter 19 Match Reference ..... 273
19.1 Match XML ..... 273
19.2 System group ..... 273
19.2.1 MatchSettings ..... 274
19.3 Report and analysis ..... 275
19.4 Match control ..... 275
19.4.1 Match levels group ..... 276
19.4.2 Input fields group ..... 276
19.5 Match level group ..... 277
19.6 Match criteria standard keys ..... 277
19.7 Match criteria key layout ..... 281
19.8 Compare table group ..... 286
19.9 Compare match criteria group ..... 288
19.9.1 Standard key match options ..... 290
19.9.2 Criteria definition group ..... 293
19.10 Post match processing group ..... 302
19.11 Group statistics group ..... 303
19.12 Input sources ..... 306
19.12.1 Source groups ..... 309
19.12.2 Input sources / Input fields ..... 310
19.13 Field algorithm numeric difference group ..... 311
19.14 Field algorithm numeric percent difference group ..... 312
19.15 Field algorithm geo proximity group ..... 312
19.16 Input source group statistics group ..... 313
19.17 Input source select record group ..... 314

Chapter 20 Data Quality fields ..... 317
20.1 Input fields ..... 317
20.2 Output fields ..... 318
20.3 Data type support ..... 319


20.4 Data Cleanse fields ..... 321
20.4.1 Input fields ..... 321
20.4.2 Output fields ..... 323
20.5 Geocoder fields ..... 329
20.5.1 Input fields ..... 329
20.5.2 Output fields ..... 331
20.6 Global Address Cleanse fields ..... 337
20.6.1 Input fields ..... 337
20.6.2 Output Fields ..... 341
20.6.3 Global Address Cleanse Suggestion List fields ..... 353
20.7 USA Regulatory Address Cleanse fields ..... 359
20.7.1 Input fields ..... 359
20.7.2 Output fields ..... 362
20.8 Match output fields ..... 381
Chapter 21 Data Quality Appendix ..... 387
21.1 Address Cleanse reference ..... 387
21.2 Country ISO codes and assignment engines ..... 387
21.3 Information codes (Global Address Cleanse) ..... 405
21.4 Status Codes (USA Regulatory Address Cleanse) ..... 408
21.5 Quality codes (Global Address Cleanse) ..... 412
21.6 Status codes (Global Address Cleanse) ..... 413
21.7 About ShowA and ShowL (USA and Canada) ..... 418
21.7.1 USA ShowA command line options ..... 419
21.7.2 Canada ShowA command line options ..... 421
21.7.3 Canada ShowL command line options ..... 422
21.8 Geocoder reference ..... 424
Chapter 22 Glossary ..... 425
Index ..... 437


Overview

1.1 Data Quality Management SDK overview

The Data Quality Management SDK provides a framework and APIs that allow you to write applications that use SAP BusinessObjects Data Quality technology, such as parsing, standardization, correction, and matching of data. You can use it to create applications that target the specific Data Quality functionality you want to employ with an in-process integration.

1.1.1 Relationship to Data Services

This product provides functionality similar to SAP BusinessObjects Data Services, but deploys that technology as an API.

The Data Quality Management SDK provides a lighter footprint than Data Services. This product requires no server components (either from SAP or a third party) or user interface to access the Data Quality functionality.

Many customers nevertheless choose to use this product in conjunction with Data Services. You can use the Data Services release with the same version number to configure transform options in the Data Services Designer and create a configuration XML file for use with this SDK. To create the file, right-click a transform in the Data Services Designer and select Export for DQM SDK. For more information on using the Data Services Designer, see the Data Services documentation.

When you use Data Services as a configuration tool for the Data Quality Management SDK, Data Services does not support the creation of a change log for changes to the configuration. That is, you can employ the Data Services central repository concept to manage changes to the Data Quality transforms, but no change log is created. Instead, the developer must implement a change log within a custom application created using the SDK.

1.1.2 EmDQ


Throughout this product, the letters “emdq” (also cased as “EmDQ”) appear in naming conventions. You can see this convention in namespaces, folder names, and file names. Because the Data Quality Management SDK is an embedded, in-line data quality solution, you can read the letters emdq as Embedded Data Quality.


Installing Data Quality Management SDK

Installing Data Quality Management SDK is as simple as running a self-extracting executable.

2.1 Upgrading

• If you are upgrading this product from the previous release, you can install this version while the existing version still exists on the same machine. You should not overwrite the files from the previous version. The default location for where the installation routine places the files is different in this version.

• This product provides a new method to the TransformFactory class, UpgradeTransformSettings(), that makes the transform settings built from the previous version of this product compatible with this version.

• For information about using UpgradeTransformSettings(), see the TransformFactory class documentation for C++, Java, or .Net.

2.2 To install the SDK on Windows

Before installing this product, download the appropriate package file (named *.exe) from the SAP Service Marketplace.

Run the executable file. The "Welcome" screen appears.

Tip:

If the installer does not start by running the executable, you can begin the installation routine by running setup.exe, which is contained in the archive.

Click Next. The "License Agreement" screen appears.

After reading and indicating that you accept the license agreement, click Next. The "Specify the destination folder" screen appears.

After choosing a folder to install the files for this product, click Next.


The "Start Installation" screen appears.

Click Next. The installation routine extracts and places the files for this product in the folder you specified, until the "Install Complete" screen appears.

Click Finish to dismiss the installer.

The files for the SDK are now installed.

You must also install the Addressing Directories and Cleansing Packages before using the address correction and data cleanse functionality of this product, or the sample applications.

2.3 To install the SDK on Unix

Before installing this product, download the appropriate package file (named *.tgz) from the SAP Service Marketplace.

Unpack the *.tgz file. The files required for installation are copied to your system.

Run setup.sh. The "Destination Path" screen appears.

Type a destination path for the installation.

Note:

You must choose a different path than the default (which is the current working directory). The "Welcome" screen appears.

Press Enter to dismiss the "Welcome" screen. The "License Agreement" screen appears.

Press Enter to accept the license agreement. The installation routine places the files for this product in the path you specified until the installation is complete.

The files for the SDK are now installed.

You must also install the Addressing Directories and Cleansing Packages before using the address correction and data cleanse functionality of this product, or the sample applications.


Directory data

3.1 Directory Data

To correct addresses and assign codes with SAP BusinessObjects Data Quality Management SDK, the transforms rely on directories, or databases. When this product uses the directories, it’s similar to the way that you use the telephone directory. A telephone directory is a large table in which you look up something you know—someone’s name—and locate something that you don’t know—their phone number.

Depending on which option you own, some disks or online packages that you receive may contain extra files in addition to your directories. You may not need to use all of these reference files depending on which transforms or options you use. For example, you may see an Extract folder. If you do not need these extra files, do not copy them to your computer. For information about extra folders, see the ReadMe.txt file included with the reference files.

3.1.1 Directory listing and update schedule


Directory type | Directory filename | Approximate size | Updated: Monthly (M), Bimonthly (B), Quarterly (Q)
ZIP4 and Auxiliary Directories | zip4us.dir, cityxx.dir, zcfxx.dir, revzip4.dir, zip4us.rev, zip4us.shs | 699 MB, 2 MB, 1 MB, 97 MB, 4 MB | M, B
Early Warning System Directory | ewyymmdd.dir | 1 MB | Weekly
DPV Data | dpv_path | 653 MB | M
Enhanced Line of Travel Directory | elot.dir | 486 MB | M
Canada engine - Address Data | canada.dir, cancity.dir, canfsa.dir, canpci.dir | 42 MB | M
Australia engine - Address Data | apc.dir, aucity.dir, aus.dir | 200 MB | M, Q
Global Address engine and EMEA Engine - Data | all files | up to 12.2 GB (for all countries) | Q
Centroid Level Geo Data | cgeox.dir | 720 MB | Q
Address Level Geo Data | ageox.dir | 4.67 GB | Q
Geocoder | geo_addr_ca_vendorx.dir, geo_cent_ca_vendorx.dir, geo_addr_fr_vendorx.dir, geo_cent_fr_vendorx.dir, geo_addr_us_vendorx<num>.dir, geo_cent_us_vendorx<num>.dir | Canada: 1.6 GB, Canada: 1 MB, France: 1.6 GB, France: 6 MB, USA: < 2 GB |
Japan engine - Address Data | ga_region_jp_paf.dir, ga_loc12_jp_paf.dir, ga_loc34_jp_paf.dir, ga_dp_jp_paf.dir | 248 MB |
LACSLink | all files | 461 MB |
Z4Change Data | z4change.dir | 199 MB | M

Note:

You will receive files only for those countries your company has purchased.

3.1.2 U.S. Directory expiration

We publish and distribute the ZIP4 and supporting directory files under a non-exclusive license from the USPS. The USPS requires that our software disable itself when a user attempts to use expired directories.


If you do not install new directories as you receive them, the software issues a warning in the log files when the directories are due to expire within 30 days. To ensure that your projects are based on up-to-date directory data, it's recommended that you heed the warning and install the latest directories.

Note:

Incompatible or out-of-date directories can render the software unusable. The directories are lookup files used by SAP BusinessObjects solution portfolio software. The system administrator must install monthly or bimonthly directory updates to ensure that they are compatible with the current software.

Expiration schedule

You can choose to receive updated U.S. national directories on a monthly or bimonthly basis. Bimonthly updates are distributed during the even months. Directory expiration guidelines are:

• ZIP4 and Auxiliary Directories expire on the first day of the fourth month after directory creation. When running in Non-Certified mode, ZIP4 and Auxiliary directories expire on the first day of the fourteenth month after directory creation.

• LACSLink directories expire 105 days after directory creation.
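The expiration guidelines above reduce to simple calendar arithmetic. The following C++ sketch is illustrative only and is not part of the SDK: the names Date, FirstOfMonthAfter, and AddDays are hypothetical, and it assumes that "the fourth month after directory creation" means four calendar months after the creation month (so a directory created in December expires on April 1).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for this sketch; not part of the SDK.
struct Date {
    int year;
    int month;  // 1..12
    int day;    // 1..31
};

inline bool operator==(const Date& a, const Date& b) {
    return a.year == b.year && a.month == b.month && a.day == b.day;
}

// Days since 1970-01-01 for a proleptic Gregorian date.
inline int64_t DaysFromCivil(int y, int m, int d) {
    y -= m <= 2;
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const int64_t yoe = y - era * 400;                                   // [0, 399]
    const int64_t doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  // [0, 365]
    const int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
    return era * 146097 + doe - 719468;
}

// Inverse of DaysFromCivil.
inline Date CivilFromDays(int64_t z) {
    z += 719468;
    const int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    const int64_t doe = z - era * 146097;
    const int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    const int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    const int64_t mp = (5 * doy + 2) / 153;
    const int d = static_cast<int>(doy - (153 * mp + 2) / 5 + 1);
    const int m = static_cast<int>(mp < 10 ? mp + 3 : mp - 9);
    const int y = static_cast<int>(yoe + era * 400) + (m <= 2);
    return Date{y, m, d};
}

// First day of the month that lies `months` calendar months after the
// month of `created` (assumed reading of the expiration rule).
inline Date FirstOfMonthAfter(const Date& created, int months) {
    assert(created.month >= 1 && created.month <= 12 && months >= 0);
    const int index = (created.month - 1) + months;  // zero-based month count
    return Date{created.year + index / 12, index % 12 + 1, 1};
}

// The date `days` days after `start`.
inline Date AddDays(const Date& start, int days) {
    return CivilFromDays(DaysFromCivil(start.year, start.month, start.day) + days);
}
```

Under these assumptions, a directory created on 2010-12-09 gives FirstOfMonthAfter(created, 4) = 2011-04-01, FirstOfMonthAfter(created, 14) = 2012-02-01 for Non-Certified mode, and AddDays(created, 105) = 2011-03-24 for LACSLink.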

3.1.2.1 U.S. National and Auxiliary files

The self-extracting U.S. National and Auxiliary files are named as follows.


Directory name | Zip file name
2004-2008 U.S. National directory | us_dirs_2004.exe
U.S. Address-level GeoCensus | us_ageo1_2.exe, us_ageo3_4.exe, us_ageo5_6.exe, us_ageo7_8.exe, us_ageo9_10.exe
U.S. Centroid-level GeoCensus | us_cgeo.exe, us_cgeo1.exe, us_cgeo2.exe

3.1.3 Where to copy directories

We recommend that you install the directory files in the reference_data folder for each transform created during the Data Quality Management SDK installation. By default, the software looks for directories in <LINK_DIR>\DataQuality\reference_data (Windows) or <LINK_DIR>/DataQuality/reference_data (Unix). If you place your directories in a different location, you must change the individual reference file option values in the XML files.

3.1.4 To install and set up SAP Download Manager

Before you can download directory files, you need to install and set up SAP Download Manager.

To install and set up SAP Download Manager:

Access the SAP Service Marketplace (SMP): http://service.sap.com/bosap-support

Select Downloads.

Select Download Basket.

Click the Get Download Manager button.

Follow the steps to install and set up the Download Manager.


3.1.5 To download directory files

The directories are available for download from the SAP Service Marketplace (SMP). To download directories:

Access the SAP Service Marketplace (SMP) site: http://service.sap.com/bosap-support

Select Software Downloads.

From the left pane, select Downloads > SAP Software Distribution Center > Installations and Upgrades > My Company's Application Components.

A list of your company's applications and any license-free products or components appears.

Select the files you want to download and add them to the Download Basket. The files you select are placed in the Download Basket.

To access the Download Basket, click Download Basket.

To access the Download Manager documentation, click Get Download Manager.

Follow the steps included in the Download Manager documentation to download the directory files.

3.1.6 To extract directory files

The steps listed here describe how to install the zipped directories using Info-Zip. If you use a different unzip tool, see the unzip procedure included with that tool.

Copy the self-extracting directory files manually from the download package to the \temporary\ folder.

Locate and double-click the file. The files are extracted and placed in the \temporary\ folder.

Copy the directory files from the \temporary\ folder to the location where you keep your directories.

Copy the zipped directory files manually from the location of the extracted files to the location where you keep your directories.

Type unzip filename.zip -d outputfolder (for ZIP4US, type unzip us_dirs_2004.zip -d "/SAP BusinessObjects/SAP BusinessObjects Data Quality Management SDK/linux_x86_32/DataQuality/gac").

Repeat these steps for each required file.


Cleansing packages

4.1 To install Data Cleanse cleansing packages

Before you can install cleansing packages, you must have successfully installed this product and downloaded SBOP DQM Cleansing Packages for the appropriate platform from the SAP Service Marketplace.

Installing a cleansing package prepares your system to use Data Cleanse to control parsing of person and firm data for the specific cleansing package.

You must install the cleansing packages to the same relative path as the directory holding the Data Cleanse reference files.

Go to the directory where you downloaded cleansing packages from the SAP Service Marketplace, and run the setup file (setup.exe or setup.sh) to start the installer.

The DQM Cleansing Package installer starts.

Read the explanatory text on screen.

Review and accept the license agreement.

Choose DQM SDK.

Specify the destination folder.

Note:

For Windows, the destination folder is determined by the location of the SDK installation and cannot be altered. For Unix, you must ensure that the destination directory is in the same relative path as the installation of the SDK.

Choose the operating system for the machine on which you are installing the cleansing packages.

Choose the cleansing packages that you want to install.

Click the Disk Cost button (or, in Unix, select Disk Cost) to learn how much disk space is needed for this installation and how much disk space is available.

Proceed until installation is complete.

The cleansing packages that you selected are now installed and available for you to use.


Samples

5.1 Getting started with the samples

The best way to get started with this SDK is to examine, build, and run the provided sample programs.

The installation routine places folders containing the sample program files under each supported operating system and language (for example, <install_path>\windows_32\Java\samples). The \samples folder contains files needed for integrating the public API and running Data Quality transforms built using this SDK.

5.2 Sample program files

The following is the folder structure for the source code, build, and run script the samples use to demonstrate a given Data Quality transform.

• cpp\ - contains files needed to integrate this SDK into a C++ environment.

    • inc\ - contains the SDK public headers you will include in your code.

    • lib\ - contains the SDK public libraries you need to link against for C++ code. You need not link the certifiedreportgenerator.lib library unless you intend to use the certified report generator.

    • samples\ - contains C++ sample drivers for several Data Quality transforms.

• dotNet\ - contains files needed to integrate the SDK into a .Net environment.

    • bin\ - contains the SDK public library you need to link against for .Net code.

    • samples\ - contains .Net sample drivers for several Data Quality transforms.

• Java\ - contains files needed to integrate the SDK into a Java environment.

    • bin\ - contains the SDK public library you need to link against for Java code.

    • samples\ - contains Java sample drivers for several Data Quality transforms.

• bin\ - contains many of the shared libraries and binaries needed to run the Data Quality transforms included in this package. This directory must be in your PATH and shared library load environment variable (such as LD_LIBRARY_PATH on Linux) for the shared objects and other required files to be loaded properly. The run scripts for the included samples set these variables for you.

• DataQuality\ - contains many files required for the Data Quality transforms to run.

• redist\ - contains the MSVC VS 2005 redistributable package that you must have installed to run the windows executables.

• xsd\ - contains all Data Quality transform configuration file XSD files. Note that the xsi:schemaLocation element in your configuration xml files must be able to locate these XSD files.

5.3 Building the sample

All the build scripts included in a samples folder assume you are running from a command prompt with your compiler paths set up correctly. On Windows, for C++ and .Net builds, the devenv and dumpbin executables from Visual Studio 2005 SP1 or later should be available in your PATH environment variable. Likewise, on Unix platforms, the appropriate compiler for that platform should be available.

For Java projects, JAVA_HOME must be set to a compatible JDK location so that the javac executable can be found.

For .Net projects, ensure the <install_path>\<platform>\bin folder is added to your PATH environment variable prior to launching Visual Studio. The samples require these libraries to be found to create an instance of an SDK transform object.

5.3.1 To build the samples

To build all samples for a particular programming language, use the build.bat (Windows) or build.sh (Unix) script.

Navigate to your desired language folder.

Navigate to the samples folder.

Run the build.bat (Windows) or build.sh (Unix) script to build all samples.

The script builds the samples.

Example:

To build all C++ samples on Windows 32 bit, navigate to <install_path>\windows_32\cpp\samples, and type build.bat.

To build a specific sample, navigate one level further to the transform's subdirectory and run the build.bat (Windows) or build.sh (Unix) script within that subdirectory.

5.4 Running the samples

All of the run scripts included in a samples folder assume you are running from a command prompt.


For Java projects, JAVA_HOME must be set to a compatible JDK or JRE location so that the Java executable can be found.

5.4.1 To run a sample

Before running the samples, you must have installed the address directory reference files and cleansing packages, and built the sample.

Each run script takes at least the configuration XML file as the first command line argument. Multi-record transforms such as Match also require you to list the input .txt file as the second argument to the run script.

Navigate to the sample transform directory.

Run the run.bat (Windows) or run.sh (Unix) script with the necessary command line arguments to set up your environment and run the sample.

Example:

To run the Global Address Cleanse C++ sample on Windows 32 bit, navigate to the folder <install_location>/windows_32/cpp/samples/gac and type run.bat EmDQ_GlobalAddressCleanse.xml.

To run the Match C++ sample on Windows 32 bit, navigate to the folder <install_location>/windows_32/cpp/samples/match and type run.bat EmDQ_NameAddressMatch.xml MatchNameAddrUSSingleSource.txt.
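The argument convention described above (configuration XML file first, input .txt file second for multi-record transforms such as Match) can be checked before any transform is created. The following C++ sketch shows one way a driver might validate its arguments. SampleArgs and ParseSampleArgs are hypothetical names used only for illustration; this is not how the shipped samples are known to parse their command line.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type for this sketch; not part of the SDK or its samples.
struct SampleArgs {
    bool ok;                // true when all required arguments are present
    std::string configXml;  // first argument: configuration XML file
    std::string inputTxt;   // second argument: input .txt file (multi-record only)
    std::string error;      // reason for failure when ok is false
};

// args holds the command line arguments without the program name.
inline SampleArgs ParseSampleArgs(const std::vector<std::string>& args,
                                  bool multiRecordTransform) {
    SampleArgs result{false, "", "", ""};
    if (args.empty()) {
        result.error = "missing configuration XML file";
        return result;
    }
    result.configXml = args[0];
    if (multiRecordTransform) {
        if (args.size() < 2) {
            result.error = "missing input .txt file";
            return result;
        }
        result.inputTxt = args[1];
    }
    result.ok = true;
    return result;
}
```

In a driver, the vector would typically be built from argv + 1 through argv + argc, and a result with ok set to false would print the error and a usage line before exiting.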


API Reference for C++

6.1 C++ API reference overview

This section details the API for the C++ implementation.

The C++ namespace is EmDQ.

6.1.1 ToLatin1

The ToLatin1 method converts Unicode data to standard Latin1 data. It is defined in the file utility.h.

char* ToLatin1(const uint16_t* str, char* dstBuf, int32_t dstBufLength, char invalidCharReplacement = '?');

Parameters:
• str [IN]: The string to convert
• dstBuf [OUT]: The buffer to receive Latin1 chars and a NULL terminator
• dstBufLength [IN]: The size of dstBuf in bytes
• invalidCharReplacement [IN]: The character to be used to replace invalid Latin1 characters (a 0 means to drop invalid characters)

This method converts the UCS2 characters of the str object to Latin1 characters and copies the Latin1 characters into the passed-in buffer. A NULL terminator is always added, so there must be room in the buffer for the entire string plus a NULL terminator.

A UCS2 value greater than 255 is considered to be an invalid Latin1 character. An invalid Latin1 character is replaced with the invalidCharReplacement value. If this value is set to 0, then the invalid Latin1 character is deleted.

No locale-specific conversions are made.

Returns the dstBuf parameter.
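The conversion rules above can be tried in isolation with a small stand-in. The following is only a sketch: ToLatin1Sketch is a hypothetical name, not the SDK function from utility.h, and its handling of a buffer that is too small (it truncates) is an assumption, since this guide only states that the buffer must be large enough.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in, NOT the SDK's ToLatin1: it only mirrors the
// documented rules so they can be exercised without the SDK.
char* ToLatin1Sketch(const uint16_t* str, char* dstBuf, int32_t dstBufLength,
                     char invalidCharReplacement = '?') {
    assert(str != nullptr && dstBuf != nullptr && dstBufLength > 0);
    int32_t out = 0;
    // Assumption: stop early (truncate) if the buffer is too small.
    for (; *str != 0 && out < dstBufLength - 1; ++str) {
        if (*str <= 255) {
            dstBuf[out++] = static_cast<char>(*str);  // valid Latin1: copy as is
        } else if (invalidCharReplacement != 0) {
            dstBuf[out++] = invalidCharReplacement;   // invalid: substitute
        }
        // invalidCharReplacement == 0: the invalid character is dropped
    }
    dstBuf[out] = '\0';  // a NULL terminator is always added
    return dstBuf;       // the function returns its dstBuf argument
}
```

With the default replacement, a UCS2 value such as 0x4E2D comes out as '?'; passing 0 as the last argument removes it from the result instead.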

6.2 CertifiedReportGenerator

Class CertifiedReportGenerator is an implementation of the public StatisticsHandler interface that can generate the certified mailing reports: Statement of Address Accuracy (SERP), Address Matching Processing Summary (AMAS), and Coding Accuracy Support System (CASS) 3553.

void SetReportFile(const char* fileName, REPORT_TYPE report);

Parameters:
• fileName [IN]: A valid filename and path where the report is to be generated
• report [IN]: Which report to write to fileName

This method tells CertifiedReportGenerator the path and file name to use when creating the report. Valid report types are REPORT_3553, REPORT_AMAS, and REPORT_SERP.

If the specified file exists, the previous version of the file is overwritten. If the path to the specified file does not exist, the file is not created and an error occurs.

You must call this method once for each report you want generated, before using any transform.

Related Topics

• StatisticsHandler

6.3 DataRecordSchema

Class DataRecordSchema defines the layout of a Data Record.

int GetFieldCount();

This method returns the number of fields defined in the data record.

int GetFieldIndex(const uint16_t* fieldName);

Parameters:
• fieldName [IN]: The name of the field to get

This method returns the field index of the fieldName field. Field names are treated case-insensitive (that is, NAME is equivalent to name). Returns the field index (0-based) that can be used in the other methods that have a field index as a parameter; otherwise, if fieldName is invalid, a value of -1 is returned.

int GetFieldIndex(const char* fieldName);

Parameters:
• fieldName [IN]: The name of the field to get

This method returns the field index of the fieldName field. Field names are treated case-insensitive (that is, NAME is equivalent to name). Returns the field index (0-based) that can be used in the other methods that have a field index as a parameter; otherwise, if fieldName is invalid, a value of -1 is returned.

int GetFieldLength(int fieldIndex);

Parameters:
• fieldIndex [IN]: The field for which to get information (0-based)

This method returns the length of the fieldIndex field; otherwise, on a non-fatal error, returns 0.
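The lookup conventions of GetFieldIndex and GetFieldLength (case-insensitive names, -1 for an unknown name, 0 on a non-fatal error) can be sketched with a stand-in class. SchemaSketch and FieldSketch are hypothetical names used only for illustration; the real DataRecordSchema is supplied by a transform and is not built by application code this way.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in types, not SDK classes.
struct FieldSketch {
    std::string name;
    int length;
};

class SchemaSketch {
public:
    explicit SchemaSketch(std::vector<FieldSketch> fields) : fields_(std::move(fields)) {}

    int GetFieldCount() const { return static_cast<int>(fields_.size()); }

    // Case-insensitive lookup; -1 signals an unknown field name.
    int GetFieldIndex(const char* fieldName) const {
        assert(fieldName != nullptr);
        for (std::size_t i = 0; i < fields_.size(); ++i) {
            if (EqualsIgnoreCase(fields_[i].name, fieldName)) return static_cast<int>(i);
        }
        return -1;
    }

    // 0 signals a non-fatal error, here an out-of-range index.
    int GetFieldLength(int fieldIndex) const {
        if (fieldIndex < 0 || fieldIndex >= GetFieldCount()) return 0;
        return fields_[static_cast<std::size_t>(fieldIndex)].length;
    }

private:
    static bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
        if (a.size() != b.size()) return false;
        for (std::size_t i = 0; i < a.size(); ++i) {
            if (std::tolower(static_cast<unsigned char>(a[i])) !=
                std::tolower(static_cast<unsigned char>(b[i]))) return false;
        }
        return true;
    }

    std::vector<FieldSketch> fields_;
};
```

In application code the practical consequence is to check the returned index against -1 before passing it to any method that takes a field index.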

const uint16_t* GetFieldName(int fieldIndex);

Parameters:
• fieldIndex [IN]: The field for which to get information (0-based)

This method returns the name of the fieldIndex field; otherwise, on a non-fatal error, returns 0.

bool GetFieldName(int fieldIndex, char* buffer, int bufferSize, char unicodeReplacement = '?');

Parameters:
• fieldIndex [IN]: The field for which to get information (0-based)
• buffer [OUT]: The buffer where the field name is to be placed
• bufferSize [IN]: The size of the buffer
• unicodeReplacement [IN]: The character that is substituted when a character is encountered that cannot be represented as Latin1; set to 0 to remove the character instead

This method gets the name of the fieldIndex field. Returns TRUE if successful; otherwise, on a non-fatal error, returns FALSE.

DATATYPE GetDataType(int fieldIndex, bool& status);

Parameters:
• fieldIndex [IN]: The field for which to get information (0-based)
• status [OUT]: TRUE upon success; FALSE if a non-fatal error has occurred

This method gets the datatype of the field.

6.4 Date

Class Date represents a datatype. It is used to hold a date value for a record field.

Default constructor: Date();

Copy constructor: Date(const Date& rhs);

bool SetDate(const char* dateStr);

Parameters:
• dateStr [IN]: The date in the format YYYYMMDD

This method sets the date of this object. The string must be in the format YYYYMMDD, where YYYY is the year, MM is the month and DD is the day. The year can range from 0 to 9999. The month can range from 1 to 12. The day can range from 1 to 31. If the string is not formatted correctly, the date value is not changed. Returns TRUE if the date is valid; otherwise, it returns FALSE.

bool SetDate(int year, int month, int day);

Parameters:
• year [IN]: The year value from 0 to 9999
• month [IN]: The month value from 1 to 12
• day [IN]: The day value from 1 to 31

This method sets the date of this object. Returns TRUE if the date is valid; otherwise, it returns FALSE.

bool SetDay(int day);

Parameters:
• day [IN]: The day value from 1 to 31

This method sets the day of this object. The day can range from 1 to 28-31, depending on the month. Returns TRUE if the day is valid for the month and year; otherwise, it returns FALSE.

bool SetMonth(int month);

Parameters:
• month [IN]: The month value from 1 to 12

This method sets the month of this object. The month can range from 1 to 12. Returns TRUE if the month is valid for the day and year; otherwise, it returns FALSE.

bool SetYear(int year);

Parameters:
• year [IN]: The year value from 0 to 9999

This method sets the year of this object. The year can range from 0 to 9999. Returns TRUE if the year is valid for the day and month; otherwise, it returns FALSE.

void GetDate(char* dateStr, int bufferSize) const;

Parameters:
• dateStr [OUT]: The date value in YYYYMMDD format
• bufferSize [IN]: The size of the dateStr destination buffer

This method gets the date value of this object. The date is returned as a string with the format YYYYMMDD, where YYYY is the year, MM is the month and DD is the day.

int GetDay() const;

This method gets the day value of this object. It returns the day value from 1 to 31.

int GetMonth() const;

This method gets the month value of this object. It returns the month value from 1 to 12.

int GetYear() const;

This method gets the year value of this object. It returns the year value from 0 to 9999.

6.5 DateTime

Class DateTime represents a datatype. It is used to hold a date and time value for a record field. This class inherits from both the Date and the Time class, so the methods of those classes are also available.

Default constructor: DateTime();

Copy constructor: DateTime(const DateTime& rhs);

bool SetDateTime(const char* dateTimeStr);

Parameters:
• dateTimeStr [IN]: The date time in the format YYYYMMDDHHMMSSF

This method sets the date time of this object. The string must be in the format YYYYMMDDHHMMSSF, where YYYY is the year, the first MM is the month, DD is the day, HH is the hours, the second MM is the minutes, SS is the seconds, and F is an optional fraction digit, which can be repeated. The year can range from 0 to 9999. The month can range from 1 to 12. The day can range from 1 to 31. The hours can range from 0 to 23. The minutes can range from 0 to 59. The seconds can range from 0 to 59. Each optional fraction digit can range from 0 to 9. If the string is not formatted correctly, the date time value is not changed.

Returns TRUE if the date time is valid; otherwise, it returns FALSE.
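To make the YYYYMMDDHHMMSSF layout concrete, the sketch below splits such a string into its fields and applies only the ranges listed above. ParseDateTimeSketch and DateTimeParts are hypothetical names, not SDK members. Whether the SDK also rejects impossible dates such as February 31 is not stated here, so this sketch does not check that.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper types, not part of the SDK.
struct DateTimeParts {
    int year = 0, month = 0, day = 0, hours = 0, minutes = 0, seconds = 0;
    std::string fraction;  // zero or more optional fraction digits
};

static int Digits(const std::string& s, std::size_t pos, std::size_t len) {
    return std::stoi(s.substr(pos, len));
}

bool ParseDateTimeSketch(const char* dateTimeStr, DateTimeParts& out) {
    assert(dateTimeStr != nullptr);
    const std::string s(dateTimeStr);
    if (s.size() < 14) return false;  // YYYYMMDDHHMMSS is mandatory
    for (char c : s) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    }
    DateTimeParts p;
    p.year = Digits(s, 0, 4);  // four digits already imply 0 to 9999
    p.month = Digits(s, 4, 2);
    p.day = Digits(s, 6, 2);
    p.hours = Digits(s, 8, 2);
    p.minutes = Digits(s, 10, 2);
    p.seconds = Digits(s, 12, 2);
    p.fraction = s.substr(14);
    if (p.month < 1 || p.month > 12) return false;
    if (p.day < 1 || p.day > 31) return false;
    if (p.hours > 23 || p.minutes > 59 || p.seconds > 59) return false;
    out = p;  // the previous value is kept unless the whole string is valid
    return true;
}
```

The final assignment mirrors the documented behavior that a badly formatted string leaves the existing value unchanged.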

void GetDateTime(char* dateTimeStr, int bufferSize) const;

Parameters:
• dateTimeStr [OUT]: The date time in the format YYYYMMDDHHMMSSF
• bufferSize [IN]: The number of characters of the dateTimeStr buffer

This method gets the date time value of this object. The date time will be returned as a string with the following format YYYYMMDDHHMMSSF, where YYYY is the year, MM is the month, DD is the day, HH is the hours, MM is the minutes, SS is the seconds, and F is an optional one digit of the fraction, which can be repeated.

6.6 EmdqException

Class EmdqException is the exception class thrown by all public interfaces of this product. This class is required for processing. It is defined in the file exception.h.

virtual const uint16_t* GetMessage() const = 0;

This method returns this object's message in UCS2 characters.

virtual const char* GetMessageId() const = 0;

This method returns the message ID of this exception object. The message ID is in the format CCCNNN, where C is an alpha character and N is a numeric character. The CCC represents the source of the error. The NNN is the message number. For example, REC001 is the first message for the DataRecord class.
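When logging or filtering exceptions, it can be handy to separate the two halves of the CCCNNN message ID. The following is a minimal sketch of that split; SplitMessageId and MessageIdParts are hypothetical names and are not provided by the SDK.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper types, not part of the SDK.
struct MessageIdParts {
    std::string source;  // CCC: the source of the error
    int number = 0;      // NNN: the message number
};

// Splits an ID of the form CCCNNN; returns false if the ID does not match.
bool SplitMessageId(const char* messageId, MessageIdParts& out) {
    assert(messageId != nullptr);
    const std::string id(messageId);
    if (id.size() != 6) return false;
    for (int i = 0; i < 3; ++i) {
        if (!std::isalpha(static_cast<unsigned char>(id[i]))) return false;
    }
    for (int i = 3; i < 6; ++i) {
        if (!std::isdigit(static_cast<unsigned char>(id[i]))) return false;
    }
    out.source = id.substr(0, 3);
    out.number = std::stoi(id.substr(3));
    return true;
}
```

For the REC001 example above, this yields the source REC and the message number 1.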

6.7 InputDataRecord

Class InputDataRecord is the main interface to the Input Data Record functionality. It inherits from the superclass DataRecord. This class is required for processing.

void Clear();

This method clears all of the fields of the data record. Each character field has a data length of 0.

void SetFieldData(int fieldIndex, const uint16_t* fieldData, int fieldDataLength = -1);

Parameters:
• fieldIndex [IN]: The field to set (0-based)
• fieldData [IN]: The field data to set
• fieldDataLength [IN]: The number of UCS2 characters in fieldData

This method sets the data of the fieldIndex field of the data record. fieldDataLength UCS2 characters are copied from the fieldData buffer to the specified data record field.

If the field is set as null, the null will be cleared.

If fieldDataLength is -1, which is the default, then fieldData is assumed to be NULL terminated and its length will be calculated.
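The default-length rule for UCS2 data can be written out as a few lines of code. This is only an illustration of the rule described above; ResolveUcs2Length is a hypothetical helper, not an SDK function, and the SDK applies the rule internally.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: how many UCS2 characters would be copied for a given
// fieldDataLength argument.
int ResolveUcs2Length(const uint16_t* fieldData, int fieldDataLength = -1) {
    assert(fieldData != nullptr);
    if (fieldDataLength != -1) return fieldDataLength;  // caller supplied a length
    int n = 0;
    while (fieldData[n] != 0) ++n;  // count characters up to the NULL terminator
    return n;
}
```

An explicit length is the safer choice when the source buffer is not NULL terminated, for example when a field is cut out of a larger fixed-width record.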

void SetFieldData(int fieldIndex, const char* fieldData, int fieldDataLength = -1);

Parameters:
• fieldIndex [IN]: The field to set (0-based)
• fieldData [IN]: The field data to set
• fieldDataLength [IN]: The number of Latin1 characters in fieldData

This method sets the data of the fieldIndex field of the data record. fieldDataLength Latin1 characters are copied from the fieldData buffer to the specified data record field.

If the data is longer than the specified field's length, the data is truncated. If the data is shorter than the specified field's length, the data is copied left-justified into the field.

If the field is set as null, the null will be cleared.

If fieldDataLength is -1, which is the default, then strlen() is used to determine the length of fieldData.

void SetFieldData(int fieldIndex, const Date& fieldData);

Parameters:
• fieldIndex [IN]: The field to set (0-based)
• fieldData [IN]: The field data to set

This method sets the data of the fieldIndex field of the data record.

If the field is set as null, the null will be cleared.

If the field datatype is not compatible with Date, an exception is thrown.

void SetFieldData(int fieldIndex, const DateTime& fieldData);

Parameters:
• fieldIndex [IN]: The field to set (0-based)
• fieldData [IN]: The field data to set

This method sets the data of the fieldIndex field of the data record.

If the field is set as null, the null will be cleared.

If the field datatype is not compatible with DateTime, an exception is thrown.

void SetFieldData(int fieldIndex, const Time& fieldData);

Parameters:
• fieldIndex [IN]: The field to set (0-based)
• fieldData [IN]: The field data to set

This method sets the data of the fieldIndex field of the data record.

If the field is set as null, the null will be cleared.

If the field datatype is not compatible with Time, an exception is thrown.

void SetFieldData(int fieldIndex, double fieldData);

Parameters:
• fieldIndex [IN]: The field to set (0-based)
• fieldData [IN]: The field data to set

This method sets the data of the fieldIndex field of the data record.

If the field is set as null, the null will be cleared.

If the field datatype is not compatible with double, an exception is thrown.

void SetFieldData(int fieldIndex, int fieldData);

Parameters:
• fieldIndex [IN]: The field to set (0-based)
• fieldData [IN]: The field data to set

This method sets the data of the fieldIndex field of the data record.

If the field is set as null, the null will be cleared.

If the field datatype is not compatible with int, an exception is thrown.

void SetFieldNull(int fieldIndex);

Parameters:
• fieldIndex [IN]: The field to set to NULL (0-based)

This method sets the field to NULL.

6.8 MessageHandler

Class MessageHandler is a callback class to handle messages from a transform. This class is required for processing and its interface must be implemented by the integrating application.

virtual bool HandleMessage(MESSAGETYPE messageType, const char* messageId, const uint16_t* message) = 0;

Parameters:
• messageType [IN]: The type of message to handle
• messageId [IN]: The ID of message to handle
• message [IN]: The message to handle

Implement this method to handle a message. This method is passed a message for the implementor to handle. Returns true upon success; returns false upon error and stops processing.

6.9 MultiRecordTransform

Class MultiRecordTransform is the record processing Transform class for processing multiple records. This class is required for processing. Some of the methods listed here are inherited from the Transform class.

Instances of MultiRecordTransform cannot be instantiated directly. You must use the CreateMultiRecordTransform methods in TransformFactory to create valid MultiRecordTransform instances.

MultiRecordTransform objects are used to represent Match.

MultiRecordTransformHelper* CreateHelper();

It will also share the transform's handlers (log, statistics, etc). So the advantage of using one or more helper objects instead of creating multiple, identical transforms is the savings on resources and the production of only one set of statistics. Returns a pointer to a newly created helper object.

This method is not thread safe.

void DestroyHelper(MultiRecordTransformHelper* helper);

Parameters:
• helper [IN]: The helper to be destroyed

This method is used to destroy a helper that is no longer needed. All helpers of a transform must be destroyed before that transform is destroyed.

void LoadInputDataRecord(InputDataRecord* record);

Parameters:
• record [IN]: The input data record to load

This method loads an input data record into this transform. The input data record is copied, so the passed-in input data record is available for use upon return from this method. Do not attempt to use an input data record that belongs to a different transform.

void Process();

This method processes the input data records that were loaded into this transform. When this method returns, the output data records with the posted results are ready to be unloaded.

void SetProgressHandler(ProgressHandler* handler);

Parameters:
• handler [IN]: The progress event handler

This method sets the progress event handler. A progress event handler is called by the transform as it processes the loaded input records. A progress event handler is optional. This method saves a shallow copy of the passed-in progress event handler. It is the application's responsibility to not delete the event handler until this transform has been destroyed.

const OutputDataRecord* UnloadOutputDataRecord();

This method unloads the next available output data record from this transform. There should be one output data record for each input data record that was loaded. An output data record becomes available after the transform has finished processing the input data record and posting results to the output data record.

Normally output records will be available for unloading after Process() has been called. But it is possible for a transform to make an output record available for unloading immediately after the call to LoadInputDataRecord(). The application is free to call this method at any time to see if there are any output records available for unloading.

Once Process() is called, all current output records must be unloaded before any new input records are loaded for processing.

Returns the next available output record; returns 0 if no records are available.

void ClearRecords();

This method clears all input and output records, and readies the transform to process again. Call this method only after you have loaded your input records, processed, and extracted your output.
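The load, process, unload, and clear sequence described in this section can be sketched with a mock. MockMultiRecordTransform below is a stand-in written only for illustration: it reduces records to std::string and its "processing" is a placeholder. A real MultiRecordTransform comes from TransformFactory and exchanges InputDataRecord and OutputDataRecord objects.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Mock for illustration only; not an SDK class.
class MockMultiRecordTransform {
public:
    // The record is copied, so the caller may reuse its own record afterwards.
    void LoadInputDataRecord(const std::string& record) { loaded_.push_back(record); }

    void Process() {
        for (const std::string& r : loaded_) ready_.push_back("processed:" + r);
        loaded_.clear();
    }

    // Returns the next available output record, or nullptr (0) if none is left.
    const std::string* UnloadOutputDataRecord() {
        if (ready_.empty()) return nullptr;
        current_ = ready_.front();
        ready_.pop_front();
        return &current_;
    }

    void ClearRecords() {
        loaded_.clear();
        ready_.clear();
    }

private:
    std::vector<std::string> loaded_;
    std::deque<std::string> ready_;
    std::string current_;
};

// Load a batch, process it, drain every output record, then reset.
std::vector<std::string> RunBatch(MockMultiRecordTransform& transform,
                                  const std::vector<std::string>& batch) {
    for (const std::string& r : batch) transform.LoadInputDataRecord(r);
    transform.Process();
    std::vector<std::string> results;
    while (const std::string* out = transform.UnloadOutputDataRecord()) {
        results.push_back(*out);  // copy what is needed before the next unload
    }
    assert(results.size() == batch.size());  // one output record per input record
    transform.ClearRecords();
    return results;
}
```

The loop drains the output completely before anything new is loaded, which matches the requirement that all current output records be unloaded once Process() has been called.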

ProgressHandler* GetProgressHandler();

This method returns a pointer to the current progress event handler.

void SetStatisticsHandler(StatisticsHandler* handler);

This method sets the statistics event handler. A statistics event handler is called whenever a transform wishes to output statistics. Normally, statistics are output when the transform is terminating (see the method DestroyTransform()). A statistics event handler is optional. If omitted, the statistics are not output.

Parameters:
• handler [IN]: The statistics event handler

This method saves a shallow copy of the passed-in statistics event handler to be used for handling statistics events. It is the application's responsibility not to delete the event handler until this transform has been destroyed.

InputDataRecord* GetInputDataRecord();

This method returns the input data record that holds this transform's input fields. An application can use this record to pass data to this transform. The application only has to get the input record once. The same input record may be used repeatedly.

StatisticsHandler* GetStatisticsHandler();

This method returns a pointer to the current statistics event handler.

const DataRecordSchema* GetInputSchema() const;

This method returns the schema of the input data record.

const DataRecordSchema* GetOutputSchema() const;

This method returns the schema of the output data record.

const StatisticsSchema* GetStatisticsSchema(int schemaIndex) const;

Parameters:
• schemaIndex [IN]: The index of the statistics schema for which to get information

This method returns the statistics schema at index schemaIndex.

int GetStatisticsSchemaCount() const;

This method returns the number of statistics schemas defined for this transform. If statistics are not enabled, or if the transform does not provide statistics, the count returned will be 0.

6.10 MultiRecordTransformHelper

Class MultiRecordTransformHelper is the helper class for the multi-record processing Transform class.

void LoadInputDataRecord(InputDataRecord* record);

Parameters:
• record [IN]: The input data record to load

This method loads an input data record into this transform. The input data record is copied, so the passed-in input data record is available for use upon return from this method.

This method cannot process an input data record owned by a different transform.

void Process();

This method processes the input data records that were loaded into this transform. When this method returns, the output data records with the posted results are ready to be unloaded.

const OutputDataRecord* UnloadOutputDataRecord();

Normally output records are available for unloading after Process() has been called. However, a transform can make an output record available for unloading immediately after the call to LoadInputDataRecord(). The application is free to call this method at any time to check if there are any output records available for unloading.

Once Process() is called, all current output records must be unloaded before any new input records are loaded for processing.

Returns the next available output record; returns 0 if no records are available.

void ClearRecords();

Clears all input and output records and makes the transform ready to process again.

Call this method only after you have loaded your input records, processed, and extracted your output.

InputDataRecord* GetInputDataRecord();

This method returns the input data record that holds this transform's input fields. An application can use this record to pass data to this transform. The application needs to get the input record only once. The same input record may be used repeatedly.

6.11 OutputDataRecord

Class OutputDataRecord is the main interface to the Output Data Record functionality. It inherits its methods from the superclass DataRecord. This class is required for processing.

void GetFieldData(int fieldIndex, uint16_t* buffer, int bufferSize);

Parameters:
• fieldIndex [IN]: The field to get (0-based)
• buffer [OUT]: Holds UCS2 data and a NULL terminator
• bufferSize [IN]: The size of buffer

This method gets the data of the fieldIndex field of the data record. The data is copied to the buffer, including a NULL terminator. bufferSize indicates the number of UCS2 characters that will fit into the buffer.

If the data and NULL terminator are longer than the specified buffer length, the data is truncated. The NULL terminator is always copied.

If the field fieldIndex is NULL, no processing happens.
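The buffer rules above (truncate when the data does not fit, but always write the NULL terminator) amount to the following copy logic. CopyUcs2Truncating is a hypothetical helper used only to illustrate the rule; the SDK performs the copy itself inside GetFieldData.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an SDK function. bufferSize counts UCS2
// characters, including the slot used by the NULL terminator.
void CopyUcs2Truncating(const uint16_t* data, int dataLength,
                        uint16_t* buffer, int bufferSize) {
    assert(data != nullptr && buffer != nullptr && bufferSize > 0);
    // Leave one slot for the terminator; truncate the data if necessary.
    const int n = dataLength < bufferSize - 1 ? dataLength : bufferSize - 1;
    for (int i = 0; i < n; ++i) buffer[i] = data[i];
    buffer[n] = 0;  // the NULL terminator is always copied
}
```

To avoid truncation, a caller can size the buffer as GetFieldDataLength(fieldIndex) + 1 characters before calling GetFieldData.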

void GetFieldData(int fieldIndex, char* buffer, int bufferSize, char unicodeReplacement = '?');

Parameters:
• fieldIndex [IN]: The field to get (0-based)
• buffer [OUT]: Holds Latin1 data and a NULL terminator
• bufferSize [IN]: The size of buffer
• unicodeReplacement [IN]: The character that is substituted if a character is encountered that cannot be represented as a Latin1 character; set this parameter to 0 if you want characters that cannot be represented as Latin1 to be removed

This method gets the data of the fieldIndex field of the data record. The data is copied to the buffer, including a NULL terminator. bufferSize indicates the number of Latin1 characters that will fit into the buffer.

All UCS2 values above 255 are converted or dropped. All UCS2 values <= 255 are saved as is. Locality and/or codepage are not considered.

If the data and NULL terminator are longer than the specified buffer length, the data is truncated. The NULL terminator is always copied.

If the field fieldIndex is NULL, no processing happens.

void GetFieldData(int fieldIndex, uint16_t* buffer, int bufferSize, int& numCopied);

Parameters:
• fieldIndex [IN]: The field to get (0-based)
• buffer [OUT]: Holds UCS2 data (no NULL terminator is written)
• bufferSize [IN]: The size of buffer
• numCopied [OUT]: The number of UCS2 characters copied to buffer

This method gets the data of the fieldIndex field of the data record. The data is copied to the buffer. bufferSize indicates the number of UCS2 characters that will fit into buffer. numCopied is set to the number of UCS2 characters copied into buffer. The buffer is not NULL-terminated.

If the data is longer than the specified buffer length, the data is truncated.

If the field fieldIndex is NULL, no processing happens.

void GetFieldData(int fieldIndex, Date& output);

Parameters:
• fieldIndex [IN]: The field to get (0-based)
• output [OUT]: The resulting Date

This method gets the data of the fieldIndex field of the data record. The data is copied to output.

If the field datatype is not compatible with Date, an exception is thrown.

If the field fieldIndex is NULL, no processing happens.

void GetFieldData(int fieldIndex, Time& output);

Parameters:
• fieldIndex [IN]: The field to get (0-based)
• output [OUT]: The resulting Time

This method gets the data of the fieldIndex field of the data record. The data is copied to output.

If the field datatype is not compatible with Time, an exception is thrown.

If the field fieldIndex is NULL, no processing happens.

void GetFieldData(int fieldIndex, DateTime& output);

Parameters:
• fieldIndex [IN]: The field to get (0-based)
• output [OUT]: The resulting DateTime

This method gets the data of the fieldIndex field of the data record. The data is copied to output.

If the field datatype is not compatible with DateTime, an exception is thrown.

If the field fieldIndex is NULL, no processing happens.

void GetFieldData(int fieldIndex, double& output);

Parameters:
• fieldIndex [IN]: The field to get (0-based)
• output [OUT]: The resulting double

This method gets the data of the fieldIndex field of the data record. The data is copied to output.

If the field datatype is not compatible with double, an exception is thrown.

If the field fieldIndex is NULL, no processing happens.

void GetFieldData(int fieldIndex, int& output);

Parameters:
• fieldIndex [IN]: The field to get (0-based)
• output [OUT]: The resulting int

This method gets the data of the fieldIndex field of the data record. The data is copied to output.

If the field datatype is not compatible with int, an exception is thrown.

If the field fieldIndex is NULL, no processing happens.

bool IsFieldNull(int fieldIndex);

Parameters:
• fieldIndex [IN]: The field to check (0-based)

This method determines if the field fieldIndex is NULL. Returns TRUE if the field is NULL; otherwise, it returns FALSE.

int GetFieldDataLength(int fieldIndex);

Parameters:
• fieldIndex [IN]: The field to get (0-based)

This method returns the number of characters in the field fieldIndex of the data record, or it returns 0 if the field is NULL.

6.12 ProgressHandler

Class ProgressHandler is a callback class to show the progress of a MultiRecordTransform and allow the handler to end processing.
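As a rough picture of how an application might use this callback, the sketch below declares a stand-in base class from the method descriptions in this section and derives a handler that cancels processing past a threshold. Everything in the sketch namespace is hypothetical: the real ProgressHandler ships with the SDK headers, and the default interval of 1 second used here is an assumption, not a documented value.

```cpp
#include <cassert>

namespace sketch {

// Stand-in written from this section's descriptions; not the SDK class.
class ProgressHandler {
public:
    virtual ~ProgressHandler() {}
    virtual bool HandleProgress(double percentDone) = 0;

    bool SetProgressInterval(int interval) {
        if (interval <= 0) return false;  // invalid values are ignored
        interval_ = interval;
        return true;
    }
    int GetProgressInterval() { return interval_; }

private:
    int interval_ = 1;  // assumed default, in seconds
};

// Example handler: remembers the latest percentage and cancels past a limit.
class CancellingHandler : public ProgressHandler {
public:
    explicit CancellingHandler(double cancelAt) : cancelAt_(cancelAt) {}

    bool HandleProgress(double percentDone) override {
        assert(percentDone >= 0.0 && percentDone <= 100.0);
        lastSeen_ = percentDone;
        return percentDone < cancelAt_;  // returning false stops processing
    }
    double lastSeen() const { return lastSeen_; }

private:
    double cancelAt_;
    double lastSeen_ = 0.0;
};

}  // namespace sketch
```

A handler like this would be registered through SetProgressHandler on a MultiRecordTransform and must outlive that transform, because only a shallow copy of the pointer is stored.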

virtual bool HandleProgress(double percentDone) = 0;

Parameters:
• percentDone [IN]: The percent done (0.0 - 100.0)

This method shows the percentage of completion for the current set of records being processed.

Returns TRUE on success; otherwise, returns FALSE and stops processing.

bool SetProgressInterval(int interval);

Parameters:
• interval [IN]: The number of seconds to wait between calls to Progress()

This method specifies the interval the transform should wait between calls to the Progress() method. The interval is in seconds and should be greater than 0. Interval values less than or equal to 0 are invalid and will be ignored.

Returns TRUE on success; otherwise, returns FALSE to indicate an invalid interval.

int GetProgressInterval();

This method returns the current progress interval in seconds.

6.13 RecordTransform

Class RecordTransform is the record processing Transform class for processing single records. This class is required for processing. Some of the methods listed here are inherited from the Transform class.

Instances of RecordTransform cannot be instantiated directly. You must use the CreateRecordTransform methods in TransformFactory to create valid RecordTransform instances.

RecordTransform objects are used to represent Data Cleanse, USA Regulatory Address Cleanse, Global Address Cleanse, and Geocoder.

RecordTransformHelper* CreateHelper();

transform. The helper shares some of the transform's resources. It also shares the transform's handlers (for example, for logs and statistics). So the advantage of using one or more helper objects instead of creating multiple, identical transforms is the savings on resources and the production of only one set of statistics.

void DestroyHelper(RecordTransformHelper* helper);

Parameters:
• helper [IN]: The helper to be destroyed

This method is used to destroy a helper that is no longer needed. All helpers of a transform must be destroyed before that transform is destroyed.

const OutputDataRecord* Process(InputDataRecord* record);

Parameters:
• record [IN]: The input data record to process

This method processes the input data record owned by this transform. The input data record can be obtained by calling GetInputDataRecord(). The input data record should be loaded with data before being passed to this method. This method will read the fields of the input data record and post results to an output data record. The output data record is returned. The results can be queried from the output data record. Do not attempt to process an input data record owned by a different transform. Returns output data record on success; returns 0 on a nonfatal error.

void SetStatisticsHandler(StatisticsHandler* handler);

Parameters:
• handler [IN]: The statistics event handler

InputDataRecord* GetInputDataRecord();

StatisticsHandler* GetStatisticsHandler();

This method returns a pointer to the current statistics event handler.

const DataRecordSchema* GetInputSchema() const;

This method returns the schema of the input data record.

const DataRecordSchema* GetOutputSchema() const;

This method returns the schema of the output data record.

const StatisticsSchema* GetStatisticsSchema(int schemaIndex) const;

Parameters:
• schemaIndex [IN]: The index of the statistics schema for which to get information

This method returns the statistics schema at index schemaIndex.

int GetStatisticsSchemaCount() const;

This method returns the number of statistics schemas defined for this transform. If statistics are not enabled, or if the transform does not provide statistics, the count returned will be 0.

6.14 RecordTransformHelper

Class RecordTransformHelper is the helper class for the record processing Transform class.

const OutputDataRecord* Process(InputDataRecord* record);

Parameters:
• record [IN]: The input data record to process

Returns the output data record. The results can be queried from the output data record.

This method cannot process an input data record owned by a different transform.

InputDataRecord* GetInputDataRecord();

6.15 StatisticsHandler

Class StatisticsHandler is a callback class to handle statistics records. This interface must be implemented by the integrating application.

virtual bool HandleStatistics(const OutputDataRecord* record) = 0;

Parameters:
• record [IN]: The output statistics record to output

This method is passed an output record that holds statistics information. The application may query the record to determine which statistics table the record belongs to.

The record pointer passed to this method should not be saved. The pointer becomes invalid after this method returns. The application must query and save any field data from the record that it intends to keep.

Returns TRUE upon success; otherwise, returns FALSE, which produces an error and stops processing.

int GetRecordsRemainingCount() const;

This method gets the number of output records that remain to be passed to the Output() method. If the transform has a block of records to send, the transform calls SetRecordsRemainingCount() before each call to Output() to indicate how many records are left to send. The application may use this information to buffer the records instead of processing them individually.

Returns the number of additional records ready to be output.

const StatisticsSchema* GetStatisticsSchema() const;

This method returns the statistics schema.

6.16 StatisticsSchema

Class StatisticsSchema defines the layout of a statistics table.

DATATYPE GetFieldDataType(int fieldIndex, bool& status) const = 0;

Parameter          Description
fieldIndex [IN]    The field on which to get information (0-based)
status [OUT]       TRUE on success; FALSE on a nonfatal error

This method gets the data type of the fieldIndex field.

const uint16_t* GetTableName();

This method returns the name of the table that this schema describes.

bool GetTableName(char* buffer, int bufferSize, char unicodeReplacement = '?');


Parameter                  Description
buffer [OUT]               The buffer to hold the table name
bufferSize [IN]            The size of the buffer
unicodeReplacement [IN]    The character substituted if a character is encountered that cannot be represented as Latin1; set to 0 if you want characters that cannot be converted to be removed.

This method returns the name of the table that this schema describes.

bool AllowNull(int fieldIndex);

Parameter          Description
fieldIndex [IN]    The field on which to get information (0-based)

This method indicates whether the fieldIndex field allows a NULL value. Returns TRUE if fieldIndex allows a NULL value or if fieldIndex is invalid; otherwise, returns FALSE.

bool IsPrimaryKey(int fieldIndex);

Parameter          Description
fieldIndex [IN]    The field on which to get information (0-based)

This method indicates whether the fieldIndex field is a primary key. Returns TRUE if fieldIndex is a primary key; otherwise, returns FALSE.

DataRecordSchema::DATATYPE GetDataType(int fieldIndex, bool& status);

Parameter          Description
fieldIndex [IN]    The field on which to get information (0-based)
status [OUT]       FALSE if a non-fatal error has occurred

This method gets the datatype for the field.

6.17 Time

Class Time represents a datatype. It is used to hold a time value for a record field.

bool SetTime(const char* timeStr);

Parameter       Description
timeStr [IN]    The time in the format HHMMSSF

This method sets the time of this object. The string must be in the format HHMMSSF, where HH is the hour, MM is the minutes, SS is the seconds, and F is an optional digit of the fraction, which can be repeated. The hours can range from 0 to 23. The minutes can range from 0 to 59. The seconds can range from 0 to 59. Each optional fraction digit can range from 0 to 9. If the string is not formatted correctly, the time value is not changed. Returns TRUE if the time is valid; otherwise, it returns FALSE.

bool SetHours(int hours);

Parameter     Description
hours [IN]    The hours value from 0 to 23

This method sets the hours of this object. The hours can range from 0 to 23. Returns TRUE if the hours value is valid; otherwise, it returns FALSE.

bool SetMinutes(int minutes);

Parameter       Description
minutes [IN]    The minutes value from 0 to 59

This method sets the minutes of this object. The minutes can range from 0 to 59. Returns TRUE if the minutes value is valid; otherwise, it returns FALSE.

bool SetSeconds(int seconds);

Parameter       Description
seconds [IN]    The seconds value from 0 to 59

This method sets the seconds of this object. The seconds can range from 0 to 59. Returns TRUE if the seconds value is valid; otherwise, it returns FALSE.

bool SetFractionOfSeconds(double fraction);

Parameter        Description
fraction [IN]    The fractional seconds value.

This method sets the fractional seconds of this object. The range must be 0.0 <= fraction < 1.0. Returns TRUE if the fraction value is valid; otherwise, it returns FALSE.

void GetTime(char* timeStr, int bufferLength) const;

Parameter            Description
timeStr [OUT]        The time value in HHMMSSF format
bufferLength [IN]    The size of the timeStr buffer

This method gets the time value of this object. The time is returned as a string with the format HHMMSSF, where HH is the hours, MM is the minutes, SS is the seconds, and F is one digit of the fraction, which can be repeated.

int GetHours() const;

This method gets the hours value of this object. It returns the hours value from 0 to 23.

int GetMinutes() const;

This method gets the minutes value of this object. It returns the minutes value from 0 to 59.

int GetSeconds() const;

This method gets the seconds value of this object. It returns the seconds value from 0 to 59.

double GetFractionOfSeconds() const;

This method gets the fractional seconds of this object. The fractional seconds can range from 0 to < 1.

6.18 TransformFactory

Class TransformFactory is used to create a Transform. This class is required for processing.

MultiRecordTransform* CreateMultiRecordTransform(const char* transformSettings, int transformSettingsBufferSize);

Parameter                       Description
transformSettingsBuffer [IN]    The transform settings buffer
transformSettingsBufferSize     The number of bytes in transformSettingsBuffer

This method creates a multi-record transform, using the XML found in transformSettingsBuffer.

transformSettingsBuffer is a buffer of bytes. The encoding is determined automatically or by the encoding XML attribute.

If you are passing UCS2 data (2 byte characters) to this method, then the encoding attribute in the XML must either not exist, or be UCS2/UTF16. If you are passing Latin1 (1 byte characters) to this method, the encoding attribute in the XML must either not exist, or be UTF-8/Latin1.

Returns a pointer to the created multi-record transform.

RecordTransform* CreateRecordTransform(const char* transformSettings, int transformSettingsBufferSize);

Parameter                       Description
transformSettingsBuffer [IN]    The transform settings buffer
transformSettingsBufferSize     The number of bytes in transformSettingsBuffer

This method creates a record transform, using the XML found in transformSettingsBuffer.

transformSettingsBuffer is a buffer of bytes. The encoding is determined automatically or by the encoding XML attribute.

If you are passing UCS2 data (2 byte characters) to this method, then the encoding attribute in the XML must either not exist, or be UCS2/UTF16. If you are passing Latin1 (1 byte characters) to this method, the encoding attribute in the XML must either not exist, or be UTF-8/Latin1.

Returns a pointer to the created record transform.

void DestroyTransform(Transform* transform);

Parameter         Description
transform [IN]    The transform to destroy

This method destroys a record transform or a multi-record transform. Destroying a transform may cause final statistics to be passed to the statistics event handler.

const char* UpgradeTransformSettings(const char* transformSettings, int transformSettingsLength, int& upgradedSettingsLength);

Parameter                       Description
transformSettings [IN]          The transform settings buffer
transformSettingsLength [IN]    The number of bytes in transformSettings
upgradedSettingsLength [OUT]    The actual length in bytes of the upgraded XML

This method upgrades the XML transform settings found in transformSettings. The transformSettings parameter is a buffer of bytes. The encoding is determined automatically or by the encoding XML attribute. If you are passing UCS2 data (2 byte characters) to this method, then the encoding attribute in the XML must either not exist, or must be set to UCS2 or UTF16. If you are passing Latin1 data (1 byte characters) to this method, then the encoding attribute in the XML must either not exist, or must be set to UTF-8 or Latin1.

Once the XML is successfully parsed, the version is checked. If the XML is current, then the pointer to the passed in buffer is returned. If the XML is not current, then the XML is upgraded with the latest version string and other transform-specific changes. The upgraded XML is then stored as a string into an internal buffer and that internal buffer is returned. The number of bytes in the returned buffer is stored in upgradedSettingsLength.

Returns the pointer to the passed in buffer if the XML is current; otherwise, returns the internal buffer that holds the updated XML. If the internal buffer is returned, the data in the buffer should be copied out of the buffer before any other calls to this object.

bool ValidateTransformSettings(const char* transformSettings, int transformSettingsBufferSize);

Parameter                       Description
transformSettingsBuffer [IN]    The transform settings buffer
transformSettingsBufferSize     The number of bytes in transformSettingsBuffer

This method validates the XML for the transform found in transformSettingsBuffer.

transformSettingsBuffer is a buffer of bytes. The encoding is determined automatically or by the encoding XML attribute.

void SetMessageHandler(MessageHandler* handler);

Parameter       Description
handler [IN]    The log event handler

This method sets the log event handler. A log event handler is called whenever a transform needs to output log information. A log event handler is required.

This method saves a shallow copy of the passed-in log event handler to be used for logging messages. The object will be used by each subsequently created Transform. If each created transform needs its own log event handler, the application must call this method with a new log event handler before each new transform is created. It is the application's responsibility not to delete the event handler until all transforms that are using it have been destroyed.

void SetLocale(const char* locale);

Parameter      Description
locale [IN]    The locale to use

This method sets the locale to use for messages produced by Transforms. If the locale is not supported, a warning will be logged to the MessageHandler set using SetMessageHandler and all messages will default to en_US.

MessageHandler* GetMessageHandler();

This method returns a pointer to the current log event handler.

const char* GetLocale() const;

This method gets the locale that is currently being used. If the locale set by a call to SetLocale is supported, this method will return that value. If the locale set using SetLocale was not supported, the default locale of en_US will be returned.

static const char* GetVersion();

This method gets the version of the Data Quality Management SDK being used.


API Reference for Java

7.1 Java API reference overview

This section details the API for the Java implementation.

The package for the Java API is com.sap.emdq.

7.2 CertifiedReportGenerator

Class CertifiedReportGenerator is an implementation of the public StatisticsHandler interface that can generate the certified mailing reports: the Statement of Address Accuracy (SERP), the Address Matching Processing Summary (AMAS), and the U.S. Coding Accuracy Support System (CASS) 3553 report.

CertifiedReportGenerator()

This method is the constructor and must be run before use of the Certified Report Generator.

void Destroy()

This method is required to create the reports. Only call this method after all processing is done. The object will no longer be valid after this call.

boolean handleStatistics(OutputDataRecord record)

Parameter    Description
record       The statistics record to output

This method implements the StatisticsHandler interface.

void setReportFile(ReportType reportType, String reportFile)

Parameter     Description
reportType    Which report to write to reportFile
reportFile    A valid filename and path where the report is to be generated

This method tells CertifiedReportGenerator the path and file name to use when creating a report. Valid reportType options are Cass3553Report, AmasReport, and SerpReport.

If the specified file exists, the previous version of the file is overwritten. If the path to the specified file does not exist, the file is not created and an error occurs.

This method must be called for each report you want generated, prior to using any transform.

Throws EmdqException.

Related Topics

• StatisticsHandler
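Because setReportFile reports an error when the path to the report file does not exist, an application may want to create the target directory before registering each report. The following is a minimal sketch that uses only java.nio.file. The helper class ReportPaths, the method prepare, and the file name are illustrative assumptions and are not part of the SDK; the setReportFile call itself appears only as a comment, and the ReportType.Cass3553Report qualifier in it is assumed from the option names listed above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; it is not part of the SDK.
final class ReportPaths {
    private ReportPaths() {}

    // Creates the directory that will contain reportFile (if it is missing)
    // and returns the absolute path to pass to setReportFile.
    static String prepare(String reportFile) {
        Path path = Paths.get(reportFile).toAbsolutePath();
        Path parent = path.getParent();
        try {
            if (parent != null) {
                Files.createDirectories(parent);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return path.toString();
    }
}

// Intended use, assuming a CertifiedReportGenerator named generator:
// generator.setReportFile(ReportType.Cass3553Report,
//         ReportPaths.prepare("reports/cass3553.txt"));
```

The helper only prepares the directory; the report file itself is still created (or overwritten) by the generator.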

7.3 DataRecordSchema

Class DataRecordSchema defines the layout of a Data Record.

int getFieldCount()

This method gets the field count.

Returns the number of fields defined in the data record.

Throws EmdqException.

int getFieldIndex(String fieldName)

Parameter         Description
fieldName [IN]    The name of the field to get

This method returns the field index of the fieldName field. Field names are treated as case-insensitive (that is, NAME is equivalent to name).

Returns the field index (0-based) that can be used in the other methods that have a field index as a parameter; if fieldName is invalid, returns -1.

Throws EmdqException.

int getFieldLength(int fieldIndex)

Parameter          Description
fieldIndex [IN]    The field for which to get information (0-based)

This method gets the length of the field fieldIndex.

Returns the length of the fieldIndex field; otherwise, on a non-fatal error, returns 0.

Throws EmdqException.

String getFieldName(int fieldIndex)

Parameter          Description
fieldIndex [IN]    The field for which to get information (0-based)

This method gets the name of the field fieldIndex.

Returns the name of the fieldIndex field; otherwise, on a non-fatal error, returns 0.

Throws EmdqException.

DataType getDataType(int fieldIndex)

Parameter          Description
fieldIndex [IN]    The field for which to get information (0-based)

This method gets the datatype of the field fieldIndex.

Returns the datatype of the fieldIndex field; otherwise, on a non-fatal error, returns 0.

Throws EmdqException.

7.4 EmdqException

Class EmdqException is the exception class thrown by all public interfaces of this product. This class is required for processing.

public String getMessageId()


7.5 InputDataRecord

Class InputDataRecord is the main interface to the Input Data Record functionality. It inherits from the superclass DataRecord. This class is required for processing.

void clear()

This method clears all of the fields of the data record. Each character field has a data length of 0.

Throws EmdqException.

void setStringData(int fieldIndex, String fieldData)

Parameter          Description
fieldIndex [IN]    The field to set (0-based)
fieldData [IN]     The field data to set

This method sets the data of the fieldIndex field of the data record. The data is copied from the fieldData buffer.

Throws EmdqException.

void setDateData(int fieldIndex, Calendar fieldData)

Parameter          Description
fieldIndex [IN]    The field to set (0-based)
fieldData [IN]     The field data to set

This method sets the data of the fieldIndex field of the data record. If the field is currently set to null, the null is cleared.

Throws EmdqException if the field datatype is not compatible with Date.

void setDateTimeData(int fieldIndex, Calendar fieldData)

Parameter          Description
fieldIndex [IN]    The field to set (0-based)
fieldData [IN]     The field data to set

This method sets the data of the fieldIndex field of the data record. If the field is currently set to null, the null is cleared.

Throws EmdqException if the field datatype is not compatible with DateTime.

void setTimeData(int fieldIndex, Calendar fieldData)

Parameter          Description
fieldIndex [IN]    The field to set (0-based)
fieldData [IN]     The field data to set

This method sets the data of the fieldIndex field of the data record. If the field is currently set to null, the null is cleared.

Throws EmdqException if the field datatype is not compatible with Time.

void setDoubleData(int fieldIndex, double fieldData)

Parameter          Description
fieldIndex [IN]    The field to set (0-based)
fieldData [IN]     The field data to set

This method sets the data of the fieldIndex field of the data record. If the field is currently set to null, the null is cleared.

Throws EmdqException if the field datatype is not compatible with double.

void setIntegerData(int fieldIndex, int fieldData)

Parameter          Description
fieldIndex [IN]    The field to set (0-based)
fieldData [IN]     The field data to set

This method sets the data of the fieldIndex field of the data record. If the field is currently set to null, the null is cleared.

Throws EmdqException if the field datatype is not compatible with int.

void setFieldNull(int fieldIndex)

Parameter          Description
fieldIndex [IN]    The field to set to NULL (0-based)

This method sets the field to NULL.

Throws EmdqException.
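An application that reads values from a source where a Java null means "no value" can route each value to either a setter or setFieldNull. The sketch below illustrates that idiom only. StringFieldSink is a stand-in interface declared here so the code compiles without the SDK (the real InputDataRecord methods also throw EmdqException), and NullAwareSetter and ArraySink are hypothetical names. That setStringData clears a previous null is an assumption carried over from the behavior documented for the other setters.

```java
// Stand-in for the two InputDataRecord methods used here; not the SDK type.
interface StringFieldSink {
    void setStringData(int fieldIndex, String fieldData);
    void setFieldNull(int fieldIndex);
}

// Hypothetical helper; it is not part of the SDK.
final class NullAwareSetter {
    private NullAwareSetter() {}

    // Maps a Java null onto setFieldNull and any other value onto setStringData.
    static void setOrNull(StringFieldSink record, int fieldIndex, String value) {
        if (value == null) {
            record.setFieldNull(fieldIndex);
        } else {
            // Assumed to clear a previous null, as documented for the other setters.
            record.setStringData(fieldIndex, value);
        }
    }
}

// In-memory test double that mimics the assumed null behavior.
final class ArraySink implements StringFieldSink {
    final String[] data;
    final boolean[] isNull;

    ArraySink(int fieldCount) {
        data = new String[fieldCount];
        isNull = new boolean[fieldCount];
    }

    @Override
    public void setStringData(int fieldIndex, String fieldData) {
        data[fieldIndex] = fieldData;
        isNull[fieldIndex] = false;
    }

    @Override
    public void setFieldNull(int fieldIndex) {
        data[fieldIndex] = null;
        isNull[fieldIndex] = true;
    }
}
```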


7.6 MessageHandler

Class MessageHandler is a callback class to handle messages from a transform. This class is required for processing and its interface must be implemented by the integrating application.

MessageHandler()

This method is the protected scope constructor. It must be called before the extended class is used.

abstract boolean handleMessage(MessageType type, String messageId, String message)

Parameter    Description
type         The type of message to handle
messageId    The ID of the message to handle
message      The message to handle

This method handles log messages produced by this product. Set delegate object on TransformFactory using the LogHandler property.

Returns true upon success; returns false upon error and stops processing.
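As an illustration of the handleMessage contract, the sketch below writes each message to a sink and returns false only when the message cannot be written. MessageType and MessageHandler are stand-ins declared here so the code compiles on its own; the enum values are hypothetical, and the real types come from com.sap.emdq. AppendingMessageHandler is an illustrative name, not an SDK class.

```java
import java.io.IOException;

// Stand-ins for the SDK types; the enum values are hypothetical.
enum MessageType { INFO, WARNING, ERROR }

abstract class MessageHandler {
    protected MessageHandler() {}

    abstract boolean handleMessage(MessageType type, String messageId, String message);
}

// Writes each message as one line; reports failure if the sink cannot be written.
final class AppendingMessageHandler extends MessageHandler {
    private final Appendable sink;

    AppendingMessageHandler(Appendable sink) {
        super(); // the protected constructor runs before the subclass is used
        this.sink = sink;
    }

    @Override
    boolean handleMessage(MessageType type, String messageId, String message) {
        try {
            sink.append(type.name()).append(' ')
                .append(messageId).append(": ")
                .append(message).append('\n');
            return true;  // handled; processing continues
        } catch (IOException e) {
            return false; // could not log; processing stops
        }
    }
}
```

Passing a StringBuilder as the sink keeps messages in memory; any other Appendable, such as a Writer, can be used the same way.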

7.7 MultiRecordTransform

Instances of MultiRecordTransform cannot be instantiated directly. You must use the createMultiRecordTransform method in TransformFactory to create valid MultiRecordTransform instances.

MultiRecordTransform objects are used to represent Match.

MultiRecordTransformHelper createHelper()

This method is used to create a helper for this transform. A helper is used to process records, just like the transform itself does. Typically a helper is run in a different thread than the transform. The helper will have all the same settings as this transform. The helper will share some of the transform's resources. It will also share the transform's handlers (log, statistics, etc). So the advantage of using one or more helper objects instead of creating multiple, identical transforms is the savings on resources and the production of only one set of statistics.

Returns a newly created helper object.


This method is not thread safe.

Throws EmdqException.

void destroyHelper(MultiRecordTransformHelper helper)

Parameter      Description
helper [IN]    The helper to be destroyed

This method is used to destroy a helper that is no longer needed. All helpers of a transform must be destroyed before that transform is destroyed.

Throws EmdqException.

void loadInputDataRecord(InputDataRecord record)

Parameter      Description
record [IN]    The input data record to load

Throws EmdqException.

void process()

This method processes the input data records that were loaded into this transform. When this method returns, the output data records with the posted results are ready to be unloaded.

Throws EmdqException.

void setProgressHandler(ProgressHandler handler)

Parameter       Description
handler [IN]    The progress event handler

Throws EmdqException.

OutputDataRecord unloadOutputDataRecord()

Normally output records will be available for unloading after process() has been called, but it is possible for a transform to make an output record available for unloading immediately after the call to loadInputDataRecord(). The application is free to call this method at any time to see if there are any output records available for unloading.

Once process() is called, all current output records must be unloaded before any new input records are loaded for processing.

Returns the next available output record; returns null if no records are available.

Throws EmdqException.

void clearRecords()

This method clears all input and output records, and readies the transform to process again. Call this method only after you have loaded your input records, processed, and extracted your output.

Throws EmdqException.
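The load, process, unload, and clear sequence described for this class can be sketched as a single batch routine. The code below is an illustration under stated assumptions: MultiRecordLike is a stand-in interface that exposes only the four documented method names, records are modeled as String so the sketch compiles without the SDK, and BatchRunner and FakeMultiRecord are hypothetical names. With the real API, the input record would come from getInputDataRecord() and each call may throw EmdqException.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Stand-in exposing only the four documented methods used below.
interface MultiRecordLike {
    void loadInputDataRecord(String record);
    void process();
    String unloadOutputDataRecord(); // null when no record is available
    void clearRecords();
}

// Hypothetical driver; it is not part of the SDK.
final class BatchRunner {
    private BatchRunner() {}

    // Runs one batch: load every input, process, drain all outputs, then clear.
    static List<String> runBatch(MultiRecordLike transform, List<String> inputs) {
        List<String> outputs = new ArrayList<>();
        for (String input : inputs) {
            transform.loadInputDataRecord(input);
        }
        transform.process();
        String out;
        while ((out = transform.unloadOutputDataRecord()) != null) {
            outputs.add(out); // unload everything before loading new input
        }
        transform.clearRecords();
        return outputs;
    }
}

// In-memory test double: "processing" upper-cases each loaded record.
final class FakeMultiRecord implements MultiRecordLike {
    private final Deque<String> pending = new ArrayDeque<>();
    private final Deque<String> ready = new ArrayDeque<>();

    @Override public void loadInputDataRecord(String record) { pending.add(record); }

    @Override public void process() {
        while (!pending.isEmpty()) {
            ready.add(pending.poll().toUpperCase());
        }
    }

    @Override public String unloadOutputDataRecord() { return ready.poll(); }

    @Override public void clearRecords() { pending.clear(); ready.clear(); }
}
```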

ProgressHandler getProgressHandler()

This method returns the current progress event handler.

Throws EmdqException.

void setStatisticsHandler(StatisticsHandler handler)

Parameter       Description
handler [IN]    The statistics event handler

Throws EmdqException.

InputDataRecord getInputDataRecord()

This method returns the input data record that holds this transform's input fields. An application can use this record to pass data to this transform. The application has to get the input record only once. The same input record may be used repeatedly.

Throws EmdqException.

StatisticsHandler getStatisticsHandler()

This method returns the current statistics event handler.

Throws EmdqException.

DataRecordSchema getInputSchema()


This method returns the schema of the input data record.

Throws EmdqException.

DataRecordSchema getOutputSchema()

This method returns the schema of the output data record.

Throws EmdqException.

StatisticsSchema getStatisticsSchema(int schemaIndex)

Parameter           Description
schemaIndex [IN]    The statistics schema for which to get information (0-based)

This method returns the statistics schema at index schemaIndex.

Throws EmdqException.

int getStatisticsSchemaCount()

This method returns the number of statistics schemas defined for this transform. If statistics are not enabled, or if the transform does not provide statistics, the count returned will be 0.

Throws EmdqException.

7.8 MultiRecordTransformHelper

Class MultiRecordTransformHelper is the helper class for the multi-record processing Transform class.

void loadInputDataRecord(InputDataRecord record)

Parameter      Description
record [IN]    The input data record to load

This method loads an input data record into this transform. The input data record is copied, so the passed-in input data record is available for use upon return from this method.

This method cannot process an input data record owned by a different transform.

Throws EmdqException.

MultiRecordTransformHelper();             Default Constructor

virtual ~MultiRecordTransformHelper();    Default Destructor

void process()

This method processes the input data records that were loaded into this transform. When this method returns, the output data records with the posted results are ready to be unloaded.


Throws EmdqException.

OutputDataRecord unloadOutputDataRecord()

Once process() is called, all current output records must be unloaded before any new input records are loaded for processing.

Returns the next available output record; returns null if no records are available.

Throws EmdqException.

void clearRecords()

Clears all input records set and makes the transform ready to process again.

Call this method only after you have loaded your input records, processed, and extracted your output.

Throws EmdqException.

InputDataRecord getInputDataRecord()

7.9 OutputDataRecord

Class OutputDataRecord is the main interface to the Output Data Record functionality. It inherits its methods from the superclass DataRecord. This class is required for processing.

OutputDataRecord(long internalPtr)

Parameter      Description
internalPtr    The long value representing the C++ pointer to the OutputDataRecord object in native code

This method is the package scope constructor of this class.

String getStringData(int fieldIndex)

Parameter          Description
fieldIndex [IN]    The field to get (0-based)

This method gets the data of the fieldIndex field of the data record. In the process a new string is created and returned containing the data of the field.

Returns the field data on success; otherwise, returns null if there is no data.

Throws EmdqException.

Calendar getDateData(int fieldIndex)

Parameter          Description
fieldIndex [IN]    The field to get (0-based)

This method gets the Date data from a field and returns it as a copy in a Calendar object. If the field datatype is not compatible with Date, an exception is thrown.

Any time information stored within the Calendar object will be invalid.

Returns the field data. If the fieldIndex field is null, no processing occurs.

Throws EmdqException.

Calendar getTimeData(int fieldIndex)

Parameter          Description
fieldIndex [IN]    The field to get (0-based)

This method gets the Time data from a field and returns it as a copy in a Calendar object. If the field datatype is not compatible with Time, an exception is thrown.

Any time information stored within the Calendar object will be invalid.

Returns the field data. If the fieldIndex field is null, no processing occurs.

Throws EmdqException.

Calendar getDateTimeData(int fieldIndex)

Parameter          Description
fieldIndex [IN]    The field to get (0-based)

This method gets the DateTime data from a field and returns it as a copy in a Calendar object. If the field datatype is not compatible with DateTime, an exception is thrown.

Returns the field data. If the fieldIndex field is null, no processing occurs.

Throws EmdqException.

double getDoubleData(int fieldIndex)

Parameter          Description
fieldIndex [IN]    The field to get (0-based)

This method gets the double data from a field and returns it. If the field datatype is not compatible with double, an exception is thrown.

Returns the field data. If the fieldIndex field is null, no processing occurs.

Throws EmdqException.

int getIntData(int fieldIndex)

Parameter          Description
fieldIndex [IN]    The field to get (0-based)

This method gets the int data from a field and returns it. If the field datatype is not compatible with int, an exception is thrown.

Returns the field data. If the fieldIndex field is null, no processing occurs.

Throws EmdqException.

boolean isFieldNull(int fieldIndex)

Parameter          Description
fieldIndex [IN]    The field to get (0-based)

This method determines if the field fieldIndex is null.

Returns TRUE if the field is null; otherwise, it returns FALSE.

Throws EmdqException.

int getFieldDataLength(int fieldIndex)

Parameter          Description
fieldIndex [IN]    The field to get (0-based)
output [OUT]       The resulting int

This method gets the field data length.

Returns the number of characters in the field fieldIndex of the data record; returns -1 if fieldIndex is invalid.

Throws EmdqException.
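The primitive getters cannot signal a NULL field through their return value, so one cautious pattern is to call isFieldNull before reading. The sketch below shows that pattern for int fields. IntFieldSource is a stand-in interface for the two OutputDataRecord methods involved, declared here so the code compiles without the SDK; FieldReader and ArrayRecord are hypothetical names, and the real methods also throw EmdqException.

```java
import java.util.OptionalInt;

// Stand-in for the two OutputDataRecord methods used here; not the SDK type.
interface IntFieldSource {
    boolean isFieldNull(int fieldIndex);
    int getIntData(int fieldIndex);
}

// Hypothetical helper; it is not part of the SDK.
final class FieldReader {
    private FieldReader() {}

    // Returns the int value of a field, or an empty OptionalInt when the field is NULL.
    static OptionalInt readInt(IntFieldSource record, int fieldIndex) {
        if (record.isFieldNull(fieldIndex)) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(record.getIntData(fieldIndex));
    }
}

// In-memory test double; a null array element models a NULL field.
final class ArrayRecord implements IntFieldSource {
    private final Integer[] values;

    ArrayRecord(Integer... values) {
        this.values = values;
    }

    @Override
    public boolean isFieldNull(int fieldIndex) {
        return values[fieldIndex] == null;
    }

    @Override
    public int getIntData(int fieldIndex) {
        return values[fieldIndex];
    }
}
```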


7.10 ProgressHandler

Class ProgressHandler is a callback class to show the progress of a MultiRecordTransform and allow the handler to end processing.

protected abstract boolean handleProgress(double percentDone)

Parameter           Description
percentDone [IN]    The percent done (0.0 - 100.0)

This method shows the percentage of completion for the current set of records being processed. Returns TRUE to continue processing; otherwise, returns FALSE to stop processing.
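The return value of handleProgress gives the application a way to cancel a long-running batch. The sketch below keeps a cancel flag that another thread (for example, a user interface) can set, and returns false once it is set. ProgressHandler is declared here as a stand-in with only the documented callback so the code compiles without the SDK; CancellableProgress, cancel, and lastPercent are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-in for the SDK class, reduced to the documented callback.
abstract class ProgressHandler {
    protected abstract boolean handleProgress(double percentDone);
}

// Hypothetical handler: remembers the last percentage and supports cancellation.
final class CancellableProgress extends ProgressHandler {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    private volatile double lastPercent = 0.0;

    // May be called from any thread to request that processing stop.
    void cancel() {
        cancelled.set(true);
    }

    double lastPercent() {
        return lastPercent;
    }

    @Override
    protected boolean handleProgress(double percentDone) {
        lastPercent = percentDone;
        return !cancelled.get(); // false asks the transform to stop processing
    }
}
```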

boolean setProgressInterval(int interval)

Parameter        Description
interval [IN]    The number of seconds to wait between calls to handleProgress()

This method specifies the interval the transform should wait between calls to handleProgress(). The interval is in seconds and must be greater than 0.

Throws EmdqException.

int getProgressInterval()

This method returns the current progress interval in seconds.

Throws EmdqException.

7.11 RecordTransform

Instances of RecordTransform cannot be instantiated directly. You must use the createRecordTransform method in TransformFactory to create valid RecordTransform instances.

RecordTransform objects are used to represent Data Cleanse, USA Regulatory Address Cleanse, Global Address Cleanse, and Geocoder.

RecordTransform(long internalPtr)

Parameter      Description
internalPtr    The long value representing the C++ pointer to the RecordTransform object in native code

This method is the package scoped constructor for RecordTransform. It should be called only from TransformFactory.

RecordTransformHelper createHelper()

This method is used to create a helper for this transform. A helper is used to process records. Typically a helper is run in a different thread than the transform. The helper has all the same settings as this transform. The helper shares some of the transform's resources. It also shares the transform's handlers (for example, for logs and statistics). So the advantage of using one or more helper objects instead of creating multiple, identical transforms is the savings on resources and the production of only one set of statistics.

Returns a newly created helper object.

Throws EmdqException.

void destroyHelper(RecordTransformHelper helper)

Parameter      Description
helper [IN]    The helper to be destroyed

This method is used to destroy a helper that is no longer needed. All helpers of a transform must be destroyed before that transform is destroyed.

Throws EmdqException.

OutputDataRecord process(InputDataRecord record)

Parameter      Description
record [IN]    The input data record to process

Values found within the returned OutputDataRecord are valid only until process() is called again.

Returns the output data record on success; returns null on a nonfatal error.

Throws EmdqException.

void setStatisticsHandler(StatisticsHandler handler)

Parameter       Description
handler [IN]    The statistics event handler

Throws EmdqException.

InputDataRecord getInputDataRecord()

Throws EmdqException.
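Because values in the record returned by process() are valid only until the next call to process(), an application that needs results later should copy them out immediately rather than keep the record. The sketch below models that constraint with a test double that reuses a single output object. ReusedOutput, FakeRecordTransform, and CopyOut are hypothetical names used only for illustration; they are not SDK types, and trimming stands in for whatever the real transform does.

```java
import java.util.ArrayList;
import java.util.List;

// Test double: one output object, overwritten by every process() call.
final class ReusedOutput {
    private String value;

    void set(String value) {
        this.value = value;
    }

    String getStringData(int fieldIndex) {
        return value;
    }
}

// Test double for a record transform; "processing" trims the input.
final class FakeRecordTransform {
    private final ReusedOutput output = new ReusedOutput();

    ReusedOutput process(String input) {
        output.set(input.trim());
        return output; // the same instance is returned every time
    }
}

// Hypothetical helper showing the copy-out pattern.
final class CopyOut {
    private CopyOut() {}

    static List<String> cleanse(FakeRecordTransform transform, List<String> inputs) {
        List<String> results = new ArrayList<>();
        for (String input : inputs) {
            ReusedOutput out = transform.process(input);
            results.add(out.getStringData(0)); // copy the value now; do not keep out
        }
        return results;
    }
}
```

Holding on to the record instead of its values would expose the application to the overwritten state after the next call.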

StatisticsHandler getStatisticsHandler()

This method returns the current statistics event handler.

Throws EmdqException.

DataRecordSchema getInputSchema()

This method returns the schema of the input data record.

Throws EmdqException.

DataRecordSchema getOutputSchema()

This method returns the schema of the output data record.

Throws EmdqException.

StatisticsSchema getStatisticsSchema(int schemaIndex)

Parameter           Description
schemaIndex [IN]    The statistics schema for which to get information (0-based)

This method returns the statistics schema at index schemaIndex.

Throws EmdqException.

int getStatisticsSchemaCount()

This method returns the number of statistics schemas defined for this transform. If statistics are not enabled, or if the transform does not provide statistics, the count returned will be 0.

Throws EmdqException.


7.12 RecordTransformHelper

Class RecordTransformHelper is the helper class for the record processing Transform class.

OutputDataRecord process(InputDataRecord record)

Parameter      Description
record [IN]    The input data record to process

Returns the output data record. The results can be queried from the output data record.

This method cannot process an input data record owned by a different transform.

Throws EmdqException.

InputDataRecord getInputDataRecord()

Throws EmdqException.

7.13 StatisticsHandler

Class StatisticsHandler is a callback class to handle statistics records. This interface must be implemented by the integrating application.

abstract boolean handleStatistics(OutputDataRecord record);

Parameter      Description
record [IN]    The output statistics record to output

This method is passed an output record that holds statistics information. The application may query the record to determine which statistics table the record belongs to.

Returns TRUE upon success; otherwise, returns FALSE, which produces an error and stops processing.

int getRecordsRemainingCount()

Returns the number of additional records ready to be output.

StatisticsSchema getStatisticsSchema()

This method returns the statistics schema.

7.14 StatisticsSchema

Class StatisticsSchema defines the layout of a statistics table.

DataType getFieldDataType(int fieldIndex)

Parameter          Description
fieldIndex [IN]    The field on which to get information (0-based)

This method gets the data type of the fieldIndex field.

Throws EmdqException.

String getTableName()

This method returns the name of the table that this schema describes.

Throws EmdqException.

boolean allowNull(int fieldIndex)

Parameter          Description
fieldIndex [IN]    The field on which to get information (0-based)

This method indicates whether the fieldIndex field allows a NULL value. Returns TRUE if fieldIndex allows a NULL value or if fieldIndex is invalid; otherwise, returns FALSE.

Throws EmdqException.

boolean isPrimaryKey(int fieldIndex)

Parameter          Description
fieldIndex [IN]    The field on which to get information (0-based)

This method indicates whether the fieldIndex field is a primary key. Returns TRUE if fieldIndex is a primary key; otherwise, returns FALSE.

Throws EmdqException.

7.15 TransformFactory

Class TransformFactory is used to create a Transform. This class is required for processing.

synchronized MultiRecordTransform createMultiRecordTransform(String transformSettings)

Parameter                 Description
transformSettings [IN]    The transform settings buffer

This method creates a multi-record transform, using the XML found in transformSettings.

Returns a handle to the created multi-record transform.

Throws EmdqException.

synchronized RecordTransform createRecordTransform(String transformSettings)

Parameter                 Description
transformSettings [IN]    The transform settings buffer

This method creates a record transform, using the XML found in transformSettings.

Returns a handle to the created record transform.

Throws EmdqException.

void destroyTransform(Transform transform)

Parameter         Description
transform [IN]    The transform to destroy

This method destroys a record transform or a multi-record transform. Destroying a transform may cause final statistics to be passed to the statistics event handler. You must call this method when you are finished using a transform instance.

Throws EmdqException.
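The requirement to call destroyTransform when you are finished with a transform can be met with a try/finally block. The following is a minimal sketch, not SDK code: SketchTransform and SketchFactory are hypothetical stand-ins that only mirror the method names documented in this section, records are represented as plain strings, and exceptions are assumed to be unchecked. Substitute the real SDK types and exception handling in an actual integration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a record transform; NOT an SDK class.
interface SketchTransform {
    String process(String inputRecord);
}

// Hypothetical stand-in that mirrors the factory method names documented above.
interface SketchFactory {
    boolean validateTransformSettings(String transformSettings);
    SketchTransform createRecordTransform(String transformSettings);
    void destroyTransform(SketchTransform transform);
}

final class TransformLifecycle {
    private TransformLifecycle() {}

    /** Validates the settings, processes every record, and always destroys the transform. */
    static List<String> run(SketchFactory factory, String settingsXml, List<String> records) {
        if (!factory.validateTransformSettings(settingsXml)) {
            throw new IllegalArgumentException("transform settings failed validation");
        }
        SketchTransform transform = factory.createRecordTransform(settingsXml);
        try {
            List<String> results = new ArrayList<>();
            for (String record : records) {
                results.add(transform.process(record));
            }
            return results;
        } finally {
            // Runs on success and on failure, so the transform is never leaked.
            factory.destroyTransform(transform);
        }
    }
}
```

Placing the destroy call in the finally block means it still runs when processing throws, which matters here because destroying a transform may be what delivers the final statistics.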

synchronized String upgradeTransformSettings(String transformSettings)

Parameter                 Description
transformSettings [IN]    The transform settings buffer

This method upgrades the transform settings. It upgrades a transform’s XML settings found in transformSettings and returns them as a String.

Throws EmdqException.

synchronized boolean validateTransformSettings(String transformSettings)

Parameter                 Description
transformSettings [IN]    The transform settings buffer

This method validates the XML for the transform found in transformSettings. Returns TRUE if the transform settings had no error; otherwise, returns FALSE.

Throws EmdqException.

void setMessageHandler(MessageHandler logEventHandler)

Parameter               Description
logEventHandler [IN]    The log event handler

This method sets the log event handler. A log event handler is called whenever a transform wishes to output log information. A log event handler is required.

public void setLocale(String locale)

Parameter      Description
locale [IN]    The locale to use

public MessageHandler getMessageHandler()

This method returns the current log event handler.

String getLocale()

static const char* GetVersion();

This method gets the version of the Data Quality Management SDK being used.

API Reference for .Net

8.1 .Net API reference overview

This section details the API for the .Net implementation.

The namespace for the .Net API is Sap.Emdq.

Ensure the <install_location>\<platform>\bin folder is added to your PATH environment variable prior to launching Visual Studio.

In Visual Studio, set the application type of the application that integrates this product to x86 for 32-bit applications or x64 for 64-bit applications. The default value of “Either” is not sufficient.

8.2 EmDQException

Class EmDQException is the exception class thrown by all public interfaces of this product. This class is derived from the Exception class, and the text of the message can be found in the Message member. This class is required for processing.

property System::String^ MessageId

property System::String^ Message

This property is a member of the base class, Exception. It contains the content of the message.

8.3 LogHandler

Class LogHandler is a delegate object used to pass messages from the SDK to the integrating application.

delegate bool LogHandler(LogMessageType type, string messageId, string message)

Parameter    Description
type         The type of message to handle
messageId    The ID of the message to handle
message      The message to handle

This method passes messages from the SDK to the integrating application.

8.4 MultiRecordProgressHandler

Class MultiRecordProgressHandler is a delegate object used to pass progress information from the SDK to the integrating application.

delegate bool MultiRecordProgressHandler(double percentDone)

Parameter      Description
percentDone    The percent of progress

This method indicates progress information.

8.5 MultiRecordTransform

MultiRecordTransform cannot be instantiated directly. You must use the CreateMultiRecordTransform method in TransformFactory to create valid MultiRecordTransform instances.

MultiRecordTransform objects are used to represent Match.

System::Data::DataTable^ Process(System::Data::DataTable^ input);

Parameter    Description
input        The collection of input data records to process

This method processes the input data records that were loaded into this transform. When this method returns, the output data records with the posted results are ready to be unloaded.

To monitor the progress of processing, register a delegate with the RowChanged event of the input DataTable.

MultiRecordTransformHelper^ CreateHelper();

Returns a newly created helper object.

This method is not thread safe.

void DestroyHelper(MultiRecordTransformHelper^ helper);

Parameter      Description
helper [IN]    The helper to be destroyed

This method is used to destroy a helper that is no longer needed. All helpers of a transform must be destroyed before that transform is destroyed.

property System::Data::DataTable^ InputSchema

This property returns the schema of the input data record.

property System::Data::DataTable^ OutputSchema

This property returns the schema of the output data record.

property System::Data::DataSet^ StatisticsSchemas

This property gets the set of statistics tables that will be populated.

Statistics are received from the SDK by adding a DataRowChangeEventHandler delegate to each of the statistics tables contained in the StatisticsSchemas data set. The method associated with the delegate is called each time statistics are generated by a transform. This is generally done when the transform terminates.

property MultiRecordProgressHandler^ ProgressHandler

This property is a pointer to the progress handler delegate that will receive progress status from the SDK.

property int ProgressInterval

This property contains the interval, in seconds, at which progress is reported from the SDK.

8.6 MultiRecordTransformHelper

Class MultiRecordTransformHelper is a shared resource based processing object that is a clone of a MultiRecordTransform.

System::Data::DataTable^ Process(System::Data::DataTable^ input);

Parameter    Description
input        The collection of input data records to process

This method processes the input data records that were loaded into this transform. When this method returns, the output data records with the posted results are ready to be unloaded.

8.7 RecordTransform

RecordTransform cannot be instantiated directly. You must use the CreateRecordTransform method in TransformFactory to create valid RecordTransform instances.

RecordTransform objects are used to represent Data Cleanse, USA Regulatory Address Cleanse, Global Address Cleanse, and Geocoder.

void Process(System::Data::DataRow^ input, System::Data::DataRow^ output);

Parameter      Description
record [IN]    The input data record to process

RecordTransformHelper^ CreateHelper();

This method is used to create a helper for this transform. A helper is used to process records, just like the transform itself does. Typically a helper is run in a different thread than the transform. The helper has all the same settings as this transform. The helper shares some of the transform's resources. It also shares the transform's handlers (for example, for logs and statistics). So the advantage of using one or more helper objects instead of creating multiple, identical transforms is the savings on resources and the production of only one set of statistics.

void DestroyHelper(RecordTransformHelper^ helper);

Parameter      Description
helper [IN]    The helper to be destroyed

This method is used to destroy a helper that is no longer needed. All helpers of a transform must be destroyed before that transform is destroyed.

property System::Data::DataTable^ InputSchema;

This property returns the schema of the input data record.

property System::Data::DataTable^ OutputSchema;

This property returns the schema of the output data record.

property System::Data::DataSet^ StatisticsSchemas;

This property gets the set of statistics tables that will be populated.

8.8 RecordTransformHelper

Class RecordTransformHelper is the helper class for the record processing Transform class.

void Process(System::Data::DataRow^ input, System::Data::DataRow^ output);

Returns the output data record or nullptr on a nonfatal error. The results can be queried from the output data record.

This method cannot process an input data record owned by a different transform.

8.9 TransformFactory

Class TransformFactory is used to create a Transform. This class is required for processing.

In the .Net implementation, this class also contains the methods used by the Certified Report Generator.

property LogHandler^ LoggerHandler;

Parameter       Description
handler [IN]    The log event handler

Set the log event handler. A log event handler is called whenever a transform wishes to output log information. A log event handler is required.

property System::String^ Locale;

Parameter      Description
locale [IN]    The locale to use

Sets the locale to use for messages produced by Transforms. If the locale is not supported, a warning is logged to the LogHandler set and all messages will default to en_US.

MultiRecordTransform^ CreateMultiRecordTransform(System::String^ transformSettings);

Parameter                 Description
transformSettings [IN]    The transform settings buffer

This method creates a multi-record transform, using the XML found in transformSettings.

Returns a handle to the created multi-record transform.

RecordTransform^ CreateRecordTransform(System::String^ transformSettings);

Parameter                 Description
transformSettings [IN]    The transform settings buffer

This method creates a record transform, using the XML found in transformSettings.

Returns a handle to the created record transform.

bool ValidateTransformSettings(System::String^ transformSettings);

Parameter                 Description
transformSettings [IN]    The transform settings buffer

This method validates the XML transform settings found in transformSettings.

Returns TRUE if the transform settings had no errors; otherwise, returns FALSE.

System::String^ UpgradeTransformSettings(System::String^ transformSettings);

Parameter                 Description
transformSettings [IN]    The transform settings

This method upgrades the transform settings. It upgrades a transform’s XML settings found in transformSettings and returns them as a String.

void DestroyTransform(Transform^ transform);

Parameter         Description
transform [IN]    The transform to destroy

This method disposes of a transform instance. It ensures that operations that can only be performed once the transform has finished processing, such as producing statistics, are carried out.

You must call this method when you are finished using a transform instance.

void SetSerpReport(System::String^ fileName);

Parameter        Description
fileName [IN]    The filename and path to where the report is to be generated

This method generates certified mailing reports. You must call this method prior to creating a transform if you want the SERP report generated.

void SetAmasReport(System::String^ fileName);

Parameter        Description
fileName [IN]    The filename and path to where the report is to be generated

This method generates certified mailing reports. You must call this method prior to creating a transform if you want the AMAS report generated.

void SetCass3553Report(System::String^ fileName);

Parameter        Description
fileName [IN]    The filename and path to where the report is to be generated

This method generates certified mailing reports. You must call this method prior to creating a transform if you want the CASS 3553 report generated.

Address cleanse concepts

This product allows you to create applications that use many address cleanse features, from basic parsing and standardizing to more advanced concepts unique to only some transforms.

9.1 Address cleanse basics

Address cleanse provides a corrected, complete, and standardized form of your original address data. With the USA Regulatory Address Cleanse transform and for some countries with the Global Address Cleanse transform, address cleanse can also correct or add postal codes.

What happens during address cleanse?

The USA Regulatory Address Cleanse transform and the Global Address Cleanse transform cleanse your data in the following ways:

• Verify that the locality, region, and postal codes agree with one another. If your data has just a locality and region, the transform usually can add the postal code and vice versa (depending on the country).

• Standardize the way the address line looks. For example, they can add or remove punctuation and abbreviate or spell out the primary type (depending on what you want).

• Identify undeliverable addresses, such as vacant lots and condemned buildings (USA records only).

• Assign diagnostic codes to indicate why addresses were not assigned or how they were corrected.

Reports

The USA Regulatory Address Cleanse transform provides data for the creation of the USPS Form 3553 (required for CASS) and the NCOALink Summary Report. The Global Address Cleanse transform provides data for the creation of the Canadian SERP—Statement of Address Accuracy Report, the Australia Post’s AMAS report, and the New Zealand SOA Report.

9.2 Set up the reference files

The USA Regulatory Address Cleanse transform and the Global Address Cleanse transform and engines rely on directories (reference files) in order to cleanse your data.

Directories

To correct addresses and assign codes, the address cleanse transforms rely on databases called postal directories.

Besides the basic address directories, there are many specialized directories that the USA Regulatory Address Cleanse transform uses:

• DPV®

• Early Warning System (EWS)

• eLOT®

• GeoCensus

• LACSLink®

• NCOALink®

• RDI™

• SuiteLink™

• Z4Change

These features help extend address cleansing beyond the basic parsing and standardizing.

Define directory file locations

In the transform settings, you must specify where your directory (reference) files are located so that the transform or engine can find them.

Caution:

Incompatible or out-of-date directories can render the software unusable. The system administrator must install weekly, monthly or bimonthly directory updates for the USA Regulatory Address Cleanse Transform; monthly directory updates for the Australia and Canada engines; and quarterly directory updates for the Global Address engine to ensure that they are compatible with the current software.

Related Topics

• Directory Data

USA Regulatory Address Cleanse

10.1 USA Regulatory Address Cleanse overview

The USA Regulatory Address Cleanse transform identifies, parses, validates, and corrects USA address data according to the U.S. Coding Accuracy Support System (CASS). This transform supports the generation of data that can be used to generate the USPS Form 3553 and can output many useful codes to your records. You can also run in a non-certification mode as well as produce suggestion lists.

Note:

If an input record has characters not included in the Latin1 code page, the USA Regulatory Address Cleanse transform will not process that data. Instead, the software sends the mapped input record to the corresponding standardized output field (if applicable). No other output fields will be populated for that record. If your Unicode database has valid U.S. addresses from the Latin1 character set, the transform processes as normal.

If you perform both data cleansing and matching, the USA Regulatory Address Cleanse transform typically should process the data before the Data Cleanse transform, as well as any of the Match transforms.

The following sections describe the configurations for the USA Regulatory Address Cleanse XML. You can find examples of the XML configurations with the samples installed with the product.

10.2 USPS DPV®

DPV is a USPS product developed to assist users in validating the accuracy of their address information. DPV compares Postcode2 information against the DPV directories to identify known addresses and potential problems that may cause an address to become undeliverable.

DPV is available for U.S. data in the USA Regulatory Address Cleanse transform only.

You can enable DPV in the Assignment options section of the USA Regulatory Address Cleanse configuration file.

Note:

DPV processing is required for CASS certification. If you are not processing for CASS certification, you can choose to run your jobs in non-certified mode and still enable DPV.

Caution:

If you choose to disable DPV processing, the software will not generate the CASS-required documentation and your mailing will not be eligible for postal discounts.

Related Topics

• Assignment options

10.2.1 Benefits of DPV

DPV can be beneficial in the following areas:

• Mailing: DPV helps to screen out undeliverable-as-addressed (UAA) mail and helps to reduce mailing costs.

• Information quality: DPV increases the level of data accuracy by verifying an address down to the individual house, suite, or apartment instead of only the block face.

• Increased assignment rate: DPV may increase assignment rate through the use of DPV tie-breaking to resolve a tie when other tie-breaking methods are not conclusive.

• Preventing mail-order fraud: DPV can eliminate shipping of merchandise to individuals who place fraudulent orders by verifying valid delivery addresses and Commercial Mail Receiving Agencies (CMRA).

10.2.2 DPV security

The USPS has instituted processes that monitor the use of DPV. Each company that purchases the DPV functionality is required to sign a legal agreement stating that it will not attempt to misuse the DPV product. If a user abuses the DPV product, the USPS has the right to prohibit the user from using DPV in the future.

10.2.2.1 False positive addresses

The USPS has added security to prevent DPV abuse by including false positive addresses within the DPV directories. If the software finds a false positive address in the data, DPV can lock processing based on your provider level.

Related Topics

• DPV locking

10.2.3 DPV monthly directories

DPV directories are shipped monthly with the USPS directories in accordance with USPS guidelines.

The directories expire in 105 days. The date on the DPV directories must be the same date as the Address directory.
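The two date rules above can be checked before a job starts. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, the 105 days are assumed to be counted from the directory date, and day 105 itself is assumed to be still valid. How the directory date is read from the files is not covered here.

```java
import java.time.LocalDate;

// Hypothetical helper; not part of the SDK.
final class DpvDirectoryDates {
    // Validity period stated in the text above.
    static final int VALID_DAYS = 105;

    private DpvDirectoryDates() {}

    /** True when 'today' is later than the directory date plus 105 days (assumed counting rule). */
    static boolean isExpired(LocalDate directoryDate, LocalDate today) {
        return today.isAfter(directoryDate.plusDays(VALID_DAYS));
    }

    /** True when the DPV directories carry the same date as the Address directory. */
    static boolean datesMatch(LocalDate dpvDate, LocalDate addressDirectoryDate) {
        return dpvDate.equals(addressDirectoryDate);
    }
}
```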

Do not rename any of the files. DPV will not run if the file names are changed. The following is a list of the DPV directories:

• dpva.dir

• dpvb.dir

• dpvc.dir

• dpvd.dir

• dpv_vacant.dir

• dpv_no_stats.dir
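Because DPV does not run if any of these files is renamed, an application may want to confirm that all six are present before it starts a job. The following is a minimal sketch; the class name and the idea of a single directory argument are assumptions for illustration, and the file names are taken from the list above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical pre-flight check; not part of the SDK.
final class DpvDirectoryCheck {
    // File names copied from the list of DPV directories above.
    static final List<String> REQUIRED_FILES = Collections.unmodifiableList(Arrays.asList(
            "dpva.dir", "dpvb.dir", "dpvc.dir", "dpvd.dir",
            "dpv_vacant.dir", "dpv_no_stats.dir"));

    private DpvDirectoryCheck() {}

    /** Returns the required file names that are not present as regular files in the directory. */
    static List<String> missingFiles(Path directory) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED_FILES) {
            if (!Files.isRegularFile(directory.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

An empty result means every expected file name was found; the check says nothing about whether the file contents are current.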

10.2.4 Required information in the job setup

When you set up for DPV processing, the following options in the USPS License Information group are required:

• Customer Company Name

• Customer Company Address

• Customer Company Locality

• Customer Company Region

• Customer Company Postcode1

• Customer Company Postcode2

10.2.5 DPV output fields

Several output fields are available for reporting DPV processing results:

Field           Description

DPV_CMRA        The DPV Commercial Mail Receiving Agency (CMRA) component that is generated for this record.
                L = The address triggered DPV locking.
                N = The address is not a CMRA.
                Y = The address is a valid CMRA.
                blank = A blank output value indicates that Enable DPV is set to No, DPV processing is currently locked, or the transform cannot assign the input address.

DPV_Footnote    DPV footnotes are required for CASS. The footnotes contain the following information:
                AA = Input address matches to the postcode2 file.
                A1 = Input address does not match to the postcode2 file.
                BB = All input address field values match to DPV.
                CC = Input address primary number matches to DPV, but the secondary number does not match (the secondary is present but invalid).
                F1 = Input address matches a military address.
                G1 = Input address matches a general delivery address.
                M1 = Input address primary number is missing.
                M3 = Input address primary number is invalid.
                N1 = Input address primary number matches to DPV but the address is missing the secondary number.
                P1 = Input address is missing the RR or HC Box number.
                P3 = Input address has an invalid PO, RR, or HC number.
                RR = Input address matches to CMRA.
                R1 = Input address matches to CMRA, but the secondary number is not present.
                U1 = Input address matches a unique address.

DPV_NoStats     No Stats indicator. No Stats means that the address is a vacant property, it receives mail as a part of a drop, or it does not have an established delivery yet.
                Y = Address is flagged as No Stats in DPV data.
                N = Address is not flagged as No Stats.
                blank = Address was not looked up.
                Note: The US Addressing report contains DPV No Stats counts in the DPV Summary section.

DPV_Status      The DPV status component that is generated for this record.
                D = The primary range is a confirmed delivery point, but the secondary range was not available on input.
                L = The address triggered DPV locking.
                N = The address is not a valid delivery point.
                S = The primary range is a valid delivery point, but the parsed secondary range is not valid in the DPV directory.
                Y = The address is a confirmed delivery point. The primary range and secondary range (if present) are valid.
                blank = A blank output value indicates that Enable DPV is set to No, DPV processing is currently locked, or the transform cannot assign the input address.

DPV_Vacant      Vacant address indicator.
                Y = Address is vacant.
                N = Address is not vacant.
                blank = Address was not looked up.
                Note: The US Addressing report contains DPV Vacant counts in the DPV Summary section.
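An integrating application typically has to turn the single-character DPV_Status value into something a user or downstream rule can act on. The following is a minimal sketch of such a lookup; the class name is hypothetical, the value is assumed to arrive as a String, and the descriptions paraphrase the DPV_Status codes listed above.

```java
// Hypothetical decoder for DPV_Status output values; not part of the SDK.
final class DpvStatus {
    private DpvStatus() {}

    /** Maps a DPV_Status value to a short description based on the codes listed above. */
    static String describe(String code) {
        if (code == null || code.trim().isEmpty()) {
            return "Blank: DPV disabled, DPV processing locked, or address not assigned";
        }
        switch (code.trim()) {
            case "D": return "Primary range confirmed; secondary range not available on input";
            case "L": return "Address triggered DPV locking";
            case "N": return "Not a valid delivery point";
            case "S": return "Primary range valid; parsed secondary range not valid in the DPV directory";
            case "Y": return "Confirmed delivery point";
            default:  return "Unrecognized DPV_Status value: " + code;
        }
    }

    /** True only for the fully confirmed status, Y. */
    static boolean isConfirmedDeliveryPoint(String code) {
        return code != null && "Y".equals(code.trim());
    }
}
```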

10.2.6 Non certified mode

End users who are not CASS customers can set up jobs with DPV disabled and still have a Postcode2 added to addresses. The non-CASS option, Assign Postcode2 to Non DPV, enables the software to assign a Postcode2 when an address does not DPV-confirm.

Caution:

When DPV processing is disabled, the software does not generate the CASS-required documentation and the mailing is not eligible for postal discounts.

Related Topics

• Non Certified options

10.2.7 DPV performance

DPV processing adds to overall processing time. The additional time varies with the operating system, system configuration, and other variables that may be unique to your operating environment.

You can decrease the time required for DPV processing by loading DPV directories into system memory before processing.

10.2.7.1 Memory usage

You may need to install additional memory on your computer for DPV processing. We recommend a minimum of 768 MB to process with DPV enabled.

To determine the amount of memory required to run with DPV enabled, check the size of the DPV directories (recently about 600 MB) and add that to the amount of memory required to run the software.

The directory size is subject to change each time new DPV directories are installed, depending on the amount of new data in each directory release.

Make sure that your computer has enough memory available before performing DPV processing.

To find the amount of disk space required to cache the directories, see the Supported Platforms document in the SAP BusinessObjects Support portal.
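The memory estimate described in this section is a simple sum: the size of the DPV directory files plus the memory the software itself needs. The following is a minimal sketch of that calculation; the class name is hypothetical, the glob pattern dpv*.dir is an assumption based on the DPV file names listed in this chapter, and the software's own memory requirement is passed in by the caller because this guide does not give a figure for it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sizing helper; not part of the SDK.
final class DpvMemoryEstimate {
    private DpvMemoryEstimate() {}

    /** Sums the sizes, in bytes, of files matching dpv*.dir in the given directory. */
    static long totalDirectoryBytes(Path dpvDirectory) {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dpvDirectory, "dpv*.dir")) {
            long total = 0;
            for (Path file : files) {
                total += Files.size(file);
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** DPV directory size plus the caller-supplied memory needed to run the software. */
    static long estimateRequiredBytes(Path dpvDirectory, long softwareBytes) {
        return totalDirectoryBytes(dpvDirectory) + softwareBytes;
    }
}
```

Because the directory size changes with each release, the estimate is best recomputed after every directory update rather than stored as a constant.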

10.2.7.2 Cache DPV directories

To better manage memory usage when you have enabled DPV processing, choose to cache the DPV directories.

Related Topics

• Transform performance

10.2.7.3 Running multiple jobs with DPV

When running multiple DPV jobs and loading directories into memory, you should add a 10-second pause between jobs to allow time for the memory to be released. For more information about setting this properly, see your operating system manual.

If you don't add a 10-second pause between jobs, there may not be enough time for your system to release the memory used for caching the directories from the first job. The next job waiting to process may produce an error or access the directories from disk if there is not enough memory to cache directories, resulting in performance degradation.
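When the jobs are launched from your own application, the pause can be built into the loop that runs them. The following is a minimal sketch; the class name is hypothetical and each job is represented as a Runnable, which is an assumption about how the application wraps its DPV jobs. Pass 10000 milliseconds to apply the 10-second pause recommended above.

```java
import java.util.List;

// Hypothetical job runner; not part of the SDK.
final class SequentialDpvJobs {
    private SequentialDpvJobs() {}

    /**
     * Runs the jobs one after another, sleeping pauseMillis between consecutive jobs
     * so that memory used for cached directories has time to be released.
     * Returns the number of jobs that ran; stops early if the thread is interrupted.
     */
    static int runAll(List<Runnable> jobs, long pauseMillis) {
        int completed = 0;
        for (int i = 0; i < jobs.size(); i++) {
            if (i > 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return completed;
                }
            }
            jobs.get(i).run();
            completed++;
        }
        return completed;
    }
}
```

No pause is added before the first job or after the last one, since the wait only matters between two jobs that both cache the directories.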

10.2.8 DPV locking

False-positive addresses are included in the DPV directories as a security precaution. If the software detects a false-positive address during processing, it marks the record as a false-positive and discontinues DPV processing.

Before releasing the mailing list that contains the false positive address, the mailer is required to send the DPV log files containing the false positive addresses to the USPS.

Related Topics

• False positive addresses

10.2.8.1 Stop processing alternative

The NCOALink end user, DPV user, or LACSLink user may use the stop processing alternative in the case of the software locking files.

The stop processing alternative allows you to bypass any future DPV or LACSLink directory locks. It is not an option in the software; it is a key code that you obtain from SAP BusinessObjects Business User Support.

First you must obtain the proper permissions from the USPS, and then provide proof of permission to SAP BusinessObjects Business User Support. Business User Support will provide a key code that disables the DPV or LACSLink directory locking.

For NCOALink end users with the stop processing alternative keycode entered into the SAP License Manager, the software takes the following actions when it detects a false positive address during DPV processing:

• Marks the record as a false positive.

• Generates a DPV log file containing the false positive address.

• Notes the path to the DPV log files in the error log.

• Generates a US Regulatory Locking Report containing the path to the DPV log files.

• Continues DPV processing without interruption (however, you are required to notify the USPS that a false positive address was detected).

10.2.8.2 Reasons for errors

If a job setup is missing information in the USPS License Information group, and you have DPV and/or LACSLink enabled in your job, you will get error messages based on these specific situations:

• missing required parameters

• unwritable log file directory

Missing required parameters

When your job setup does not include the required options in the USPS License Information group, and you have DPV and/or LACSLink enabled, the software issues an error.

Unwritable log file directory

If you haven't specified a log file path, or if the path that you specified is not writable, the software issues an error.
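The unwritable log directory case can be caught before the job starts. The following is a minimal sketch of such a check; the class name is hypothetical, the log path is assumed to be available to the application as a string, and the returned messages are illustrative rather than the text of the errors the software issues.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check; not part of the SDK.
final class LogPathCheck {
    private LogPathCheck() {}

    /** Returns a description of the problem with the log path, or null if it looks usable. */
    static String problem(String logPath) {
        if (logPath == null || logPath.trim().isEmpty()) {
            return "no log file path specified";
        }
        Path dir = Paths.get(logPath);
        if (!Files.isDirectory(dir)) {
            return "log file path is not an existing directory: " + dir;
        }
        if (!Files.isWritable(dir)) {
            return "log file path is not writable: " + dir;
        }
        return null;
    }
}
```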

10.2.8.3 DPV false positive logs

The software generates a false-positive log file any time it encounters a false positive record, regardless of how the job is set up. The software creates a separate log file for each mailing list that contains a false positive. If multiple false positives exist within one mailing list, the software writes them all to the same log file.

Note:

When the software locks because false-positive log files were created, end users must contact SAP BusinessObjects Business User Support to unlock the file.

Caution:

NCOALink limited and full service providers must not process additional lists for a customer that has given them a list that contains a false-positive record. The mailing list cannot be released until the USPS approves it.

Related Topics

• To notify the USPS of DPV locking addresses

• To retrieve the DPV unlock code

10.2.8.4 DPV log file location

The software stores DPV log files in the directory specified for the USPS Log Path in the Reference Files group.

Log file naming convention

The software automatically names DPV false positive logs like this: dpvl####.log, where #### is a number between 0001 and 9999. For example, the first log file generated is dpvl0001.log, the next one is dpvl0002.log, and so on.

Note:

When you have set the degree of parallelism to greater than 1, the software generates one log per thread. During a job run, if the software encounters only one false positive record, one log will be generated. However, if it encounters more than one false positive record and the records are processed on different threads, then the software will generate one log for each thread that processes a false positive record.
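The naming convention can be reproduced in code when an application needs to locate or recognize these logs, for example to collect them for submission. The following is a minimal sketch; the class and method names are hypothetical, and only the dpvl####.log pattern with numbers from 0001 to 9999 comes from the text above.

```java
import java.util.Locale;

// Hypothetical helper for DPV false positive log names; not part of the SDK.
final class DpvLogNames {
    private DpvLogNames() {}

    /** Builds the log file name for a sequence number between 1 and 9999. */
    static String fileName(int sequence) {
        if (sequence < 1 || sequence > 9999) {
            throw new IllegalArgumentException("sequence must be between 1 and 9999: " + sequence);
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "dpvl%04d.log", sequence);
    }

    /** True when the name follows the dpvl####.log pattern with a number from 0001 to 9999. */
    static boolean isDpvLog(String name) {
        return name != null
                && name.matches("dpvl\\d{4}\\.log")
                && !name.equals("dpvl0000.log");
    }
}
```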

10.2.8.5 Submit to USPS

All NCOALink service providers must submit the false-positive log to the USPS NCSC (National Customer Support Center) via email (dsf2stop@email.usps.gov), along with the mailer's name, the total number of addresses processed, and the number of addresses matched. Use “DPV False Positive” as the subject line.

The NCSC uses this information to determine whether the list can be returned to the mailer.

Tip:

When the USPS releases the list that contained the locked record, you should delete the corresponding log file.

10.2.8.6 End users

If a lock occurs, end users of applications made with this product must contact SAP Business User Support to unlock DPV processing.

10.2.8.7 To notify the USPS of DPV locking addresses

Follow these steps only if you have received an alert that DPV false positive addresses are present in your address list and you are either an NCOALink service provider or an NCOALink end user with stop processing alternative enabled.

Send an email to the USPS at dsf2stop@usps.gov. Include the following:

• Write “DPV False Positive” as the subject line.

• Attach the dpvl####.log file or files from the job, where #### is a number between 0001 and 9999.

After the USPS has released the list that contained the locked or false positive record, the corresponding log files should be deleted.

Remove the record that caused the lock from the database.

Related Topics

• DPV security

10.2.9 Unlocking DPV

If DPV locking occurs, NCOALink full and limited-service providers must email the DPV False Positive log file to the USPS NCSC (National Customer Support Center) to obtain approval and the necessary information to unlock the list.

The software also locks DPV processing. You must contact SAP BusinessObjects Business User Support to obtain an unlock code.

10.2.9.1 To retrieve the DPV unlock code

Go to the SAP Service Marketplace (SMP) at http://service.sap.com/message and log a message using the component “BOJ-EIM-DS”.

Attach the dpvx.txt file and the log file named dpvl####.log, where #### is a number between 0001 and 9999, to your message.

The dpvx.txt file is located in the DPV directory referenced in the job. The log file is located in the directory specified for the USPS Log Path option in the USA Regulatory Address Cleanse transform.

Note:

If your files cannot be attached to the original message, include the unlock information in the message instead.

SAP Business User Support sends you an unlock file named dpvw.txt. Replace the existing dpvw.txt file with the new file.

Open your database and remove the record causing the lock.

Note:

Keep in mind that you can only use the unlock code one time. If the software detects another false-positive, you will need to retrieve a new DPV unlock code. Be sure to remove the record that is causing the lock from the database.
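Replacing the unlock file can be scripted. The following sketch is not part of the SDK; it assumes that dpvw.txt lives in the DPV directory referenced in the job, and the function name and the backup copy (dpvw.txt.bak) are choices made for this illustration only.

```python
import shutil
from pathlib import Path

def install_unlock_file(new_unlock_file, dpv_dir):
    """Replace the existing dpvw.txt in dpv_dir with the file from support.

    A copy of the old file is kept as dpvw.txt.bak; this backup is a
    precaution added by this sketch, not a documented requirement.
    """
    target = Path(dpv_dir) / "dpvw.txt"
    if target.exists():
        shutil.copy2(target, target.with_name("dpvw.txt.bak"))
    shutil.copy2(new_unlock_file, target)
    return target
```

Removing the record that caused the lock from your database remains a separate, manual step.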

10.2.10 DPV No Stats indicators

The USPS uses No Stats indicators to mark addresses that fall under the No Stats category. The software uses the No Stats table when you have DPV enabled in a job. The USPS puts No Stats addresses in three categories:

• Addresses that do not have delivery established yet.

• Addresses that receive mail as part of a drop.

• Addresses that have been vacant for a certain period of time.

10.2.10.1 No Stats table

You must install the No Stats table (dpv_no_stats.dir) before the software performs DPV processing. The No Stats table is supplied by SAP BusinessObjects with the DPV directory install.

The software automatically checks for the No Stats table in the directory folder that you indicate in your job setup. The software performs DPV processing based on the install status of the directory.

dpv_no_stats.dir | Type of processing | Results
Installed | DPV | The software automatically outputs No Stats indicators when you include the DPV_NoStats output field in your job.
Not installed | DPV | The software automatically skips the No Stats processing and does not issue an error message. The software performs DPV processing but does not populate the DPV_NoStats output field.
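Because the software does not issue an error when the No Stats table is missing, you may want to check for it yourself before a run. The following is a minimal sketch, not an SDK call; the function name and the returned keys are hypothetical, and only the file name dpv_no_stats.dir comes from this guide.

```python
from pathlib import Path

def no_stats_behavior(dpv_directory):
    """Describe the expected behavior based on the install status
    of dpv_no_stats.dir in the given directory folder."""
    installed = (Path(dpv_directory) / "dpv_no_stats.dir").is_file()
    return {
        "no_stats_table_installed": installed,
        "dpv_processing": True,             # DPV runs in both cases
        "dpv_nostats_populated": installed, # field stays empty if not installed
        "error_message": False,             # no error is issued either way
    }
```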

10.2.10.2 No Stats output field

Use the DPV_NoStats output field to post No Stats indicator information to an output file.

A No Stats indicator means that the address is a vacant property, receives mail as part of a drop, or does not yet have delivery established.

Related Topics

• DPV output fields

10.2.11 DPV Vacant indicators

The software provides vacant information in output fields and reports using DPV vacant counts. The USPS DPV vacant lookup table is supplied by SAP BusinessObjects with the DPV directory install.

The USPS uses DPV vacant indicators to mark addresses that fall under the vacant category. The software uses DPV vacant indicators when you have DPV enabled in your job.

Tip:

The USPS defines vacant as any delivery point that was active in the past, but is currently not occupied (usually over 90 days) and is not currently receiving mail delivery. The address could receive delivery again in the future. Vacant does not apply to seasonal addresses.

10.2.11.1 DPV address-attribute output fields

Vacant indicators for the assigned address are available in the DPV_Vacant output field.

Note:

The US Addressing report contains DPV Vacant counts in the DPV Summary section.

Related Topics

• DPV output fields

10.3 USPS eLOT®

eLOT is available for U.S. records in the USA Regulatory Address Cleanse transform only.

eLOT takes line of travel one step further. The original LOT narrowed the mail carrier's delivery route to the block face level (Postcode2 level) by discerning whether an address resided on the odd or even side of a street or thoroughfare.

eLOT narrows the mail carrier's delivery route walk sequence to the house (delivery point) level. This allows you to sort your mailings to a more precise level.

You can enable eLOT in the Assignment options section of the USA Regulatory Address Cleanse configuration file.

Related Topics

• Assignment options

• Set up the reference files

10.4 Early Warning System (EWS)

EWS helps reduce the amount of misdirected mail caused when valid delivery points are created between national directory updates. EWS is available for U.S. records in the USA Regulatory Address Cleanse transform only.

You can enable EWS in the Assignment options section of the USA Regulatory Address Cleanse configuration file.

Related Topics

• Assignment options

10.4.1 Overview of EWS

The EWS feature is the solution to the problem of misdirected mail caused by valid delivery points that appear between national directory updates. For example, suppose that 300 Main Street is a valid address and that 300 Main Avenue does not exist. A mail piece addressed to 300 Main Avenue is assigned to 300 Main Street on the assumption that the sender is mistaken about the correct suffix.

Now consider that construction is completed on a house at 300 Main Avenue. The new owner signs up for utilities and mail, but it may take a couple of months before the delivery point is listed in the national directory. All the mail intended for the new house at 300 Main Avenue will be misdirected to 300 Main Street until the delivery point is added to the national directory.

The EWS feature solves this problem by using an additional directory that informs CASS users of the existence of 300 Main Avenue long before it appears in the national directory. When you use EWS processing, the previously misdirected address now defaults to a 5-digit assignment.
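The 300 Main Avenue scenario can be expressed as a small decision function. This is an illustrative sketch only, not the transform's actual assignment logic; the directory sets, the `nearest_match` callback, and the level labels are hypothetical.

```python
def assign(address, national_directory, ews_directory, nearest_match):
    """Return (assigned_address, assignment_level) for one address."""
    if address in national_directory:
        return address, "full"
    if address in ews_directory:
        # Known to exist, but not yet in the national directory:
        # do not move it to a similar address; default to 5 digits.
        return address, "5-digit"
    # Otherwise assume the sender made a mistake (for example, a wrong suffix).
    return nearest_match(address), "full"

national = {"300 MAIN ST"}
ews = {"300 MAIN AVE"}
guess_suffix = lambda addr: "300 MAIN ST"  # toy stand-in for inexact matching

without_ews = assign("300 MAIN AVE", national, set(), guess_suffix)
with_ews = assign("300 MAIN AVE", national, ews, guess_suffix)
```

Without the EWS data the mail piece is reassigned to 300 Main Street; with it, the address is kept and receives only a 5-digit assignment.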

10.4.2 EWS directory

The EWS directory contains four months of rolling data. Each week, the USPS adds new data and drops a week's worth of old data. The USPS then publishes the latest EWS data. Each Friday, SAP BusinessObjects converts the data to our format (EWyymmdd.zip) and posts it on the SAP Business User Support site at https://service.sap.com/bosap-downloads-usps.
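If you automate the download, the EWyymmdd.zip naming pattern can be built and parsed as shown below. This sketch assumes that yymmdd is a two-digit year, month, and day; this guide does not state which date the name encodes, so treat the interpretation as an assumption. The function names are hypothetical.

```python
from datetime import date, datetime

def ews_filename(file_date):
    """Build a file name of the form EWyymmdd.zip for the given date."""
    return "EW" + file_date.strftime("%y%m%d") + ".zip"

def ews_file_date(filename):
    """Parse the date back out of an EWyymmdd.zip file name."""
    return datetime.strptime(filename, "EW%y%m%d.zip").date()

name = ews_filename(date(2010, 12, 10))  # a Friday
```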

10.5 SuiteLink™

SuiteLink is an extra option in the USA Regulatory Address Cleanse transform. SuiteLink uses a USPS directory that contains multiple files of specially indexed address information like secondary numbers and unit designators for locations identified as high-rise business default buildings.

With SuiteLink you can build accurate and complete addresses by adding suite numbers to high-rise business addresses. With the secondary address information added to your addresses, more of your pieces are sorted by delivery sequence and delivered with accuracy and speed.

SuiteLink is not required when you process with CASS enabled. However, the USPS requires that NCOALink full service providers offer SuiteLink processing as an option to their customers.

You can enable SuiteLink in the Assignment options section of the USA Regulatory Address Cleanse configuration file.

Related Topics

• Assignment options

10.5.1 Benefits of SuiteLink

Businesses that depend on Web-site, mail, or in-store orders from customers will find that SuiteLink is a powerful money-saving tool. Businesses whose customers are located in buildings that house several businesses will also appreciate having their marketing materials, bank statements, and orders delivered right to the customer's door.

The addition of secondary number information to your addresses allows for the most efficient and cost-effective delivery sequencing and postage discounts.

10.5.2 How SuiteLink works

The software uses the data in the SuiteLink directories to add suite numbers to an address. The software matches a company name, a known high-rise address, and the CASS-certified postcode2 in your database to data in SuiteLink. When there is a match, the software creates a complete business address that includes the suite number.
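The matching step can be pictured as a keyed lookup. The sketch below is a toy model and does not reflect the format of the real SuiteLink directory; the dictionary layout and the function name are hypothetical, and the sample entry is taken from the example in this section.

```python
# Toy stand-in for SuiteLink data: (firm, high-rise address,
# postcode1, postcode2) -> (unit designator, secondary number).
SUITELINK_TOY = {
    ("TELERA", "910 E HAMILTON AVE", "95008", "0610"): ("STE", "200"),
}

def add_suite(firm, primary_address, postcode1, postcode2,
              directory=SUITELINK_TOY):
    """Return the address line with a suite number appended when the
    firm, address, and postcode2 match an entry; otherwise unchanged."""
    key = (firm.upper().strip(), primary_address.upper().strip(),
           postcode1, postcode2)
    match = directory.get(key)
    if match is None:
        return primary_address
    designator, number = match
    return f"{key[1]} {designator} {number}"

line = add_suite("Telera", "910 E Hamilton Ave", "95008", "0610")
```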

Example: SuiteLink

This example shows a record that is processed through SuiteLink, and the output record with the assigned suite number.

The input record contains:

• Firm name (in FIRM input field)

• Known high-rise address

• CASS-certified postcode2

The SuiteLink directory contains:

• secondary numbers

• unit designators

The output record contains:

• the correct suite number

Input record | Output record
Telera | TELERA
910 E Hamilton Ave Fl2 | 910 E HAMILTON AVE STE 200
Campbell CA 95008 0610 | CAMPBELL CA 95008 0625

10.5.3 SuiteLink directory

The SuiteLink directory is distributed monthly. You must use the SuiteLink directory with a ZIP+4 directory labeled for the same month. For example, the December 2010 SuiteLink directory can be used with only the December 2010 ZIP+4 directory.

Caution:

SuiteLink will be disabled if you are running your job in non-certified mode (Non Certified Options > Disable Certification).

You cannot use a SuiteLink directory that is older than 60 days based on its release date. The software warns you 15 days before the directory expires. As with all directories, the software won't process your records with an expired SuiteLink directory.
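The SuiteLink directory rules in section 10.5.3 (same month as the ZIP+4 directory, 60-day expiration, warning 15 days before expiration) can be checked ahead of a run with a sketch like the following. It is not an SDK call; the function names are hypothetical, and the exact boundary handling (for example, whether day 60 itself is still valid) is an assumption.

```python
from datetime import date, timedelta

EXPIRY_DAYS = 60    # directory unusable when older than this
WARNING_DAYS = 15   # warning window before expiration

def suitelink_directory_status(release_date, today):
    """Return 'ok', 'warning', or 'expired' for a SuiteLink directory."""
    expires_on = release_date + timedelta(days=EXPIRY_DAYS)
    days_left = (expires_on - today).days
    if days_left < 0:
        return "expired"
    if days_left <= WARNING_DAYS:
        return "warning"
    return "ok"

def same_month(suitelink_release, zip4_release):
    """SuiteLink and ZIP+4 directories must be labeled for the same month."""
    return (suitelink_release.year, suitelink_release.month) == (
        zip4_release.year, zip4_release.month)

release = date(2010, 12, 1)
status_early = suitelink_directory_status(release, date(2010, 12, 10))
status_late = suitelink_directory_status(release, date(2011, 1, 20))
status_over = suitelink_directory_status(release, date(2011, 2, 15))
```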

10.5.4 Improve processing speed

You can increase SuiteLink processing speed by loading the SuiteLink directories into memory. To activate this option, go to the Transform Performance group and set the Cache SuiteLink Directories option to Yes.

10.6 LACSLink®

LACSLink is a USPS product that is available for U.S. records with the USA Regulatory Address Cleanse transform only. LACSLink processing is required for CASS certification.

LACSLink updates addresses when the physical address does not move but the address has changed, for example, when the municipality changes rural route addresses to street-name addresses. Rural route conversions make it easier for police, fire, ambulance, and postal personnel to locate a rural address. LACSLink also converts addresses when streets are renamed or post office boxes are renumbered.

LACSLink technology ensures that the data remains private and secure, and at the same time gives you easy access to the data. LACSLink is an integrated part of address processing; it is not an extra step. To obtain the new addresses, you must already have the old address data.

You can enable LACSLink in the Assignment options section of the USA Regulatory Address Cleanse configuration file.

Related Topics

• Assignment options

• Memory usage and caching for LACSLink processing

• LACSLink® security

10.6.1 Benefits of LACSLink

LACSLink processing is required for all CASS customers (beginning with CASS Cycle L).

If you process your data without LACSLink enabled, you won't get the CASS-required reports or postal discounts.

10.6.2 How LACSLink works

LACSLink provides a new address when one is available. LACSLink follows these steps when processing an address:

The USA Regulatory Address Cleanse transform standardizes the input address.

The transform looks for a matching address in the LACSLink data.

If a match is found, the transform outputs the LACSLink-converted address and other LACSLink information.
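The three steps above can be sketched as a simple pipeline. This is a toy model, not the transform's implementation; the `standardize` helper (uppercase and whitespace cleanup only), the dictionary standing in for LACSLink data, and the result keys are all hypothetical. The sample addresses come from the conversion example in this chapter, with the postcode add-on omitted.

```python
def standardize(line):
    """Toy stand-in for step 1: uppercase and collapse whitespace."""
    return " ".join(line.upper().split())

# Toy stand-in for LACSLink data: old address -> new address.
LACSLINK_TOY = {
    ("RR2 BOX 204", "DU BOIS PA 15801"): ("463 SHOWERS RD", "DU BOIS PA 15801"),
}

def lacslink_convert(address_line, last_line, data=LACSLINK_TOY):
    key = (standardize(address_line), standardize(last_line))  # step 1
    match = data.get(key)                                      # step 2
    if match is None:
        return {"converted": False, "address": key}
    return {"converted": True, "address": match}               # step 3

result = lacslink_convert("rr2  box 204", "Du Bois PA 15801")
```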

10.6.3 Conditions for address processing

The transform does not process all of your addresses with LACSLink when it is enabled. Here are the conditions under which your data is passed into LACSLink processing:

• The address is found in the address directory, and it is flagged as a LACS-convertible record within the address directory.

• The address is found in the address directory, and, even though a rural route or highway contract default assignment was made, the record wasn't flagged as LACS convertible.

• The address is not found in the address directory, but the record contains enough information to be sent into LACSLink.

For example, the following table shows an address that was found in the address directory as a LACS-convertible address.

Original address | After LACSLink conversion
RR2 BOX 204 | 463 SHOWERS RD
DU BOIS PA 15801 | DU BOIS PA 15801-66675
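The three conditions listed in this section can be written as a single predicate. The sketch below is illustrative only; the `LookupResult` fields are hypothetical names for information that the transform determines internally.

```python
from dataclasses import dataclass

@dataclass
class LookupResult:
    found_in_directory: bool
    lacs_convertible: bool = False   # flagged in the address directory
    rr_or_hc_default: bool = False   # rural route / highway contract default
    enough_info: bool = False        # usable even without a directory match

def goes_to_lacslink(r):
    """Return True when the record would be passed into LACSLink."""
    if r.found_in_directory:
        # Condition 1 (flagged) or condition 2 (default assignment, not flagged).
        return r.lacs_convertible or r.rr_or_hc_default
    # Condition 3: not found, but enough information is present.
    return r.enough_info
```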

10.6.4 LACSLink directory files
