Kofax Getting Started with Ascent Xtrata Pro User Manual

Getting Started with Ascent Xtrata Pro
Version 3.0
10300602-000 Revision A
Copyright © 2002-2006 LCI GmbH. All rights reserved. Printed in USA.
Portions, Copyright 2006 Kofax Image Products, Inc. All Rights Reserved.
The information contained in this document is the property of LCI GmbH. Neither receipt nor possession hereof confers or transfers any right to reproduce or disclose any part of the contents hereof, without the prior written consent of LCI GmbH. No patent liability is assumed, however, with respect to the use of the information contained herein.
Trademarks
Kofax, Ascent, Ascent Capture, and Ascent Capture Internet Server are registered trademarks; and Xtrata and VRS are trademarks of Kofax Image Products, Inc.
The software is subject to copyright © 2002-2006 LCI GmbH.
ABBYY® FineReader® Engine 7.0 © ABBYY Software Ltd. 2004, ABBYY FineReader—the keenest eye in OCR.
ABBYY, FINEREADER and ABBYY FineReader are registered trademarks of ABBYY Software Ltd.
Chinese, Japanese, Korean recognition:
Technologies from NewSoft Inc. are used to recognize Chinese, Japanese and Korean texts:
Recore®, NewSoft®, Presto!®.
All other product names and logos are trade and service marks of their respective companies.
Disclaimer
The instructions and descriptions contained in this document were accurate at the time of printing. However, succeeding products and documents are subject to change without notice. Therefore, Kofax Image Products, Inc. assumes no liability for damages incurred directly or indirectly from errors, omissions, or discrepancies between the product and this document.
An attempt has been made to state all allowable values where applicable throughout this document. Any values or parameters used beyond those stated may have unpredictable results.
Contents
How to Use This Guide
Introduction
How This Guide is Organized
Related Documentation
Training
Kofax Technical Support
Overview
Introduction
Ascent Xtrata Pro
Capture Flow
Ascent Xtrata Pro Project Builder
Ascent Xtrata Pro Synchronization
Ascent Xtrata Pro Knowledge Base Administration
Ascent Xtrata Pro Server
Ascent Xtrata Pro Validation
Ascent Xtrata Pro Statistics Viewer
Ascent Xtrata Pro Technology
Classification
Document Separation
Extraction
Online Learning
OCR and Script Integration
Release Script
Statistical Information
Ascent Xtrata Pro User's Guide v
Validation
Invoice Processing
Special Invoice Processing Technology
Knowledge Bases
Templates
Group Locators
Project Builder
Introduction
License Activation
Project Level Fields
Classification
Extraction
Validation
Managing Projects
Creating a new Project
Loading an Existing Project
Saving a Project
Project Properties
Testing and Optimizing a Project
Invoice Projects
Training Documents for Extraction
Templates
Test and Optimize an Invoice Project
Setting Up Classification
Layout Classifier
Adaptive Feature Classifier
Instruction Classifier
Setting Up Extraction
Adding Fields and Locators
Setting Up Document Separation
Testing Document Separation
Setting Up Validation
Classification
Introduction
Concept of Classification
Classification Engines and Learning by Example
Definition of Classes and the Class Tree
Adding Classes
Class Hierarchy
Class Properties
Classification Options
Multipage Evaluation
Hierarchical Evaluation and Other Classification Rules
Layout Classifier
Concept and Application
Set Up
Layout Classifier Properties
Image Clustering
Adaptive Feature Classifier
Concept
Set Up
Properties
Thresholds, Precision, and Recall
Auto Optimization
Result Matrix
Instruction Classifier
Concept
Set Up
Using the Instruction Classifier With the Adaptive Feature Classifier
Testing Content Classification
Managing Views
Extraction
Introduction
Locators and Fields
Managing Fields
Confidences
Field Inheritance
Field Formatting
Locators
Basic Concept of Locators
Managing Locators
Exporting and Importing Locators
Locator Methods
Assign Locators to Field
Alternatives
Regions
Testing Locators
Field Group Locators
Amount Group Locator
Invoice Group Locator
Order Group Locator
Setting Up Field Group Locators
Knowledge Bases
OCR and OMR Profiles
Recognition Engines
OCR Substitution
Script Programming
Address Evaluator
Concept
Properties
Advanced Zone Locator
Concept
Properties
Barcode Locator
Concept
Properties
Classification Locator
Concept
Properties
Using the Classification Locator
Database Evaluator
Concept
Properties
Database Locator
Concept
Setting Up a Database
Using the Database Locator
Speed Considerations
Format Locator
Concept
Regular Expressions
Formats
Format Templates
Keywords
Dictionaries
Invoice Header Locator
Concept
Properties
OCR Voting Evaluator
Concept
Properties
Relation Evaluator
Concept
Properties
Script Locator
Concept
Properties
Standard Evaluator
Concept
Properties
Table Locator
Concept
Table Models and Global Columns
Language Packages
Setting up Table Locator
Methods of Finding Tables
Comparing the Methods
Manual Mode
Order Numbers
Zone Locator
Concept
Properties
Set Up Validation
Introduction
Setting Up Validation
Step 1 Set Up Classification and Extraction
Step 2 Set up Validation with Project Builder
Step 3 Add Ascent Xtrata Pro Validation to a Batch Class
Set Up Validation within Ascent Xtrata Pro Project Builder
Extraction
Field Properties
Field Formatter
Validation Methods
Validation Rules
Sequence of Validation Rules
Validation Sequence
Validation Forms
Validation Test
Validation Script Events
Validation Design User Interface
User Interface Elements
Menu Bar
Toolbar
Form Elements
Document Viewer
InPlace Editor
Validation Form and Form Elements Properties
General Dialog Boxes
Define Tab Sequence Dialog Box
Default Font Settings Dialog Box
Validation Sample
Step 1: Set up Classification and Extraction Project
Step 2: Define Validation
Define Validation Methods
Validation Rules
Validation Form
Project Builder User Interface
Introduction
User Interface Elements
Initial View
Project Panel
Project Panel for Invoice Projects
Classification Design Panel
Classification Result Panel
Extraction Design Panel
Extraction Result Panel
Validation Rules Panel
Result Matrix Panel
Test Folder Panel
Training Set (Classification) Panel
Training Set (Extraction) Panel
Selection Panel
New Samples Panel
Document Viewer
General Dialog Boxes
Add Classification View Dialog Box
Advanced Zone Locator Zone Settings Dialog Box
Application Language Dialog Box
Class Based Precision and Recall Dialog Box
Classification Results Dialog Box
Class Properties Dialog Box
Create new class and table locator Dialog Box
Dictionary Options Dialog Box
Field Formatter Properties Dialog Boxes
Field Properties Dialog Box
Filter Options Dialog Box
Fuzzy Database Options Dialog Box
Global Columns Settings Dialog Box
Instruction Properties Dialog Box
New Field Formatter Dialog Box
New Validation Method Dialog Box
OCR Substitution Dialog Box
Open Test Folder Dialog Box
Project Properties Dialog Box
Project Settings Dialog Box
Read Protection Password Dialog Box
Recognition Engine’s Properties Dialog Box
Script Code Dialog Box
Table Model Properties Dialog Box
Validation Methods Properties Dialog Boxes
View Table for Field Dialog Box
View Properties Dialog Box
Write Protection Password Dialog Box
Zone Locator Zone Settings Dialog Box
Zone Locator Zone Profile Settings Dialog Boxes
General Invoice Dialog Boxes
Create Knowledge Base Dialog Box
Select Knowledge Base Dialog Box
Create Knowledge Base Activation Code Dialog Box
Edit Document Dialog Box
Import Knowledge Base Dialog Box
Insert Knowledge Base Activation Code Dialog Box
Knowledge Base Activation Dialog Box
Move Training Document Dialog Box
Locator Properties Dialog Boxes
User Interface Elements
Address Evaluator Properties Dialog Box
Advanced Zone Locator Properties Dialog Box
Barcode Locator Properties Dialog Box
Classification Locator Properties Dialog Box
Database Evaluator Properties Dialog Box
Database Locator Properties Dialog Box
Format Locator Properties Dialog Box
Invoice Header Locator Properties Dialog Box
OCR Voting Evaluator Properties Dialog Box
Relation Evaluator Properties Dialog Box
Script Locator Properties Dialog Box
Standard Evaluator Properties Dialog Box
Table Locator Properties Dialog Box
Zone Locator Properties Dialog Box
Invoice Locator Properties Dialog Boxes
Amount Group Locator Properties Dialog Box
Invoice Group Locator Properties Dialog Box
Order Group Locator Properties Dialog Box
Setup a Batch Class in Ascent Capture
Introduction
Adding Ascent Xtrata Pro to a Batch Class
Batch Class Considerations
Synchronizing Projects
Recognition Server
Publishing Batch Classes
Importing/Exporting Batch Classes
Synchronize Project with Batch Class
Open Synchronization Tool
Extended Synchronization Settings
Assigning Classes to Form Types
Assigning Extraction Fields to Index Fields of Document Classes
Perform Synchronization
Adding Ascent Xtrata Pro Validation to a Batch Class
Using the Release Script
Processing Batches
Introduction
Ascent Capture 7.0 Features
Multiprocessor Support
High Availability Support
Ascent Capture Internet Server (ACIS) Support
Processing Batches with Ascent Xtrata Pro Server
Processing Batches with Ascent Xtrata Pro Batch Processing Service
Ascent Xtrata Pro Batch Processing Service Performance Monitoring
Quick Tour of the Ascent Xtrata Pro Server User Interface
Polling Interval
Understanding the Log File
Ascent Xtrata Pro Validation
Introduction
Quick Tour of the User Interface
User Interface Elements
Settings Dialog Box
Select Folder Class Dialog Box
Application Language Dialog Box
Adjusting the User Interface
Processing Batches with Ascent Xtrata Pro Validation
Validate a Document
Batches with No Invalid Documents
Batch Editing
Show Field Contents in Batch Tree
Online Learning
Character Level Editing
Shortcut Keys
Read-Only Fields
Force Valid Field
Assign a Document Class
Reject Documents or Pages
Table Indexing
Security Boost
Shortcuts
Statistics Viewer
Introduction
Quick Tour of the User Interface
Elements
Reports
Actual Reports
Historical Reports
Report Conditions
Index
How to Use This Guide
Introduction
This guide contains information about using Ascent Xtrata Pro. It is provided for system administrators, operators, project developers, and other personnel who are setting up and using Ascent Xtrata Pro components for use with Ascent Capture.
This guide assumes that you have a thorough understanding of Windows standards and interfaces, and Ascent Capture.
How This Guide is Organized
This guide includes the following chapters:
Chapter 1 – Overview introduces the components installed with Ascent Xtrata Pro and the key features provided with the product.
Chapter 2 – Project Builder describes how to create new projects with Ascent Xtrata Pro Project Builder and introduces some of its interfaces and panels. It also includes some high-level general procedures for setting up classification, extraction, and validation.
Chapter 3 – Classification contains details about setting up classification projects.
Chapter 4 – Extraction contains details about setting up extraction projects.
Chapter 5 – Setting Up Validation contains details about setting up validation in projects, including instructions for designing custom validation forms.
Chapter 6 – Project Builder User Interface provides information about Project Builder user interface items and various dialog boxes.
Chapter 7 – Setting Up a Batch Class in Ascent Capture explains how to add Ascent Xtrata Pro components to Ascent Capture batch classes and use the Synchronization tool to synchronize the project classes and fields with Ascent Capture.
Chapter 8 – Processing Batches describes the general operation of Ascent Xtrata Pro Server and provides information about its user interface.
Chapter 9 – Ascent Xtrata Pro Validation describes the general operation of the Ascent Xtrata Pro Validation module.
Chapter 10 – Statistics Viewer describes the general operation of the Ascent Xtrata Pro Statistics Viewer module.
Related Documentation
In addition to this Getting Started with Ascent Xtrata Pro guide, the following documentation is available.
Installation Guide for Ascent Xtrata Pro
This installation guide is provided as a separate document in the Ascent Xtrata Pro software case.
Using the Ascent Xtrata Pro Knowledge Base Administration Module
This guide contains information about training, creating, and otherwise managing knowledge bases for invoice projects.
Ascent Xtrata Pro Online Help
Ascent Xtrata Pro online help is available from the application components as follows:
From any of the Ascent Xtrata Pro components, click the Help button on the toolbar or select Help | Contents (or Index) from the menu bar.
From any dialog box, click the Help button to display context-sensitive help information for the dialog box.
Scripting Online Help
Information about scripting is available from the Help menu of any Project Builder interface that allows you to write or access scripts. Select Help and then the desired help component.
Ascent Xtrata Pro Release Notes
Late-breaking product information is available from the release notes. You should read the release notes carefully, as they contain information that may not be included in other Ascent Xtrata Pro documentation.
Training
Kofax offers a variety of training options that will help you make the most of your software. Visit the Kofax Web site at www.kofax.com for complete details about the available training options and schedules.
Kofax Technical Support
For additional technical information about Kofax products, visit the Kofax Web site at www.kofax.com and select an appropriate option from the Support menu. The Kofax Support pages provide product-specific information, such as current revision levels, the latest drivers and software patches, online documentation and user manuals, updates to product release notes (if any), technical tips, and an extensive searchable knowledgebase.
The Kofax Web site also contains information that describes support options for Kofax products. Please review the site for details about the available support options.
If you need to contact Kofax Technical Support, please have the following information available:
Ascent Xtrata Pro software version
Ascent Capture and ACI Server software versions
Operating system and service pack version
Network and client configuration
Copies of your error log files
Scanner make and model
Scanner engine (board) type
Special/custom configuration or integration information
Chapter 1
Overview
Introduction
This chapter introduces the components installed with Ascent Xtrata Pro, as well as their key features.
The rest of this guide describes these components in more detail, and explains how to incorporate Ascent Xtrata Pro into your Ascent Capture processing flow.
Ascent Xtrata Pro
Ascent Xtrata Pro is a complete system for processing structured, semi-structured, and unstructured documents within the Ascent Capture framework. Ascent Capture’s document and data capture capabilities are enhanced by advanced intelligent document processing. Ascent Xtrata Pro provides methods for hierarchical, content-based classification, and the free-form field extraction of arbitrary, mixed, and unstructured documents.
Ascent Xtrata Pro adds the following components to your Ascent Capture system:
Ascent Xtrata Pro Project Builder lets you set up, store, and test Ascent Xtrata Pro projects that contain all the information required to process documents.
Ascent Xtrata Pro Synchronization tool is a setup component that is integrated into the Ascent Capture Administration module as a custom panel. It is used for linking Ascent Capture document classes and index fields to classes and fields in the Ascent Xtrata Pro project.
Ascent Xtrata Pro Knowledge Base Administration is used to train documents and manage knowledge bases for a given project. In this module, fields cannot be added to the project and locator settings cannot be changed.
Ascent Xtrata Pro Server processes batches in the Ascent Capture workflow by performing document classification and data extraction. The Server module uses the definitions stored in a project and executes them when processing batches for a linked batch class.
Ascent Xtrata Pro Validation provides enhanced validation functionality. It allows for validating and manually correcting documents that contain invalid classification and/or extraction results. Problem documents can be flagged for additional training.
Ascent Xtrata Pro Statistics Viewer is used to show statistical data gathered by Ascent Xtrata Pro Server.
Ascent Xtrata Pro XDoc Browser is used to view the contents of XDoc files. These files contain a textual representation of the contents, structure, and extraction results from image files. Ascent Xtrata Pro uses XDoc files internally when processing batches.
Ascent Xtrata Pro Image Classifier is a utility that you can use to classify and cluster documents without using the Project Builder.
Once Ascent Xtrata Pro is installed, you can add Ascent Xtrata Pro Server and Ascent Xtrata Pro Validation to any batch class already defined in the Ascent Capture Administration module. Typically, Ascent Xtrata Pro Server is placed directly after the Scan module and replaces the Recognition Server in the Ascent Capture workflow. Documents are classified and processed for data extraction and then routed to the Ascent Xtrata Pro Validation module and/or the Release module.
Capture Flow
An overview of a typical Ascent Capture workflow that includes Ascent Xtrata Pro Server is shown below.
Figure 1-1. Typical Capture Workflow with Ascent Xtrata Pro Server and Validation
First, documents are prepared for scanning. There is no need to sort the documents, but the pages must be smoothed and all staples and/or clips removed. Then, using a professional scanner with VRS, batches of documents are scanned into Ascent Capture. Ascent Xtrata Pro Server processes the documents and provides the classification and recognition results. Invalid results are reviewed, and if necessary, corrected in the Ascent Xtrata Pro Validation module.
Optionally, documents in the batch can be routed to either the Ascent Capture Recognition Server or Ascent Xtrata Pro Server to perform advanced forms processing. After all the documents are validated and verified either by Ascent Capture Validation or Ascent Xtrata Pro Validation, the batch is passed to the Release module and exported to the final repository.
Ascent Xtrata Pro Project Builder
Ascent Xtrata Pro Project Builder is a standalone program intended for system administrators, operators, project developers, and other skilled individuals who are setting up Ascent Xtrata Pro projects. Project Builder allows for defining the hierarchical structure of classes (categories of documents) and adding sample documents and classification instructions to these classes. Extraction rules and fields can be defined for each class.
Note that for invoice projects there is, by definition, only one class (the invoice class). Consequently, class related settings are not displayed and are handled automatically by the program.
A project created with Project Builder is stored in its own project folder. The folder includes the project file and a number of additional files that contain everything needed to manage and execute the project. This project folder is portable; if desired, it can be copied to another location and used from there.
Project Builder supports robust features for interactively testing project settings during configuration and maintenance. Thorough testing, using your own sets of test documents, is vitally important for evaluating the behavior of defined rules and learned document samples. The settings can then be adjusted (and retested) until the desired results are achieved.
Test documents can be displayed in an integrated document viewer. A test set may contain any number of .tif, .txt, or .xdc files placed in one or more designated folders. (.xdc is a proprietary file format used by Ascent Xtrata Pro that contains textual and geometric information extracted from a .tif file by the built-in Optical Character Recognition (OCR) engine.)
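As an illustration only, the following Python sketch gathers the files of a test set from one or more folders, keeping only the three file types named above. It is not part of the product. The function name is an assumption, and because this guide does not state whether subfolders are searched, the sketch looks only at the top level of each folder.

```python
from pathlib import Path

# File types this guide lists for test sets.
TEST_EXTENSIONS = {".tif", ".txt", ".xdc"}

def collect_test_files(folders):
    """Return the test documents found directly in the given folders, sorted."""
    files = []
    for folder in folders:
        for path in Path(folder).iterdir():
            # Compare extensions case-insensitively (e.g. ".TIF" also counts).
            if path.is_file() and path.suffix.lower() in TEST_EXTENSIONS:
                files.append(path)
    return sorted(files)
```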
Project Builder has flexible features you can use to test classification results for the entire test set or extraction results for a single document. Test results are displayed in the Classification Results or Extraction Results panels for quick review. Alternatively, you can view the results directly in the Document Viewer when the document is displayed.
The results are also displayed in a result matrix, which provides a three-dimensional column graph of the classification results. This matrix provides an immediate, highly visual assessment of classification quality.
Figure 1-2. Classification Result Matrix for a News Group Project of Nine Classes
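The idea behind a result matrix can be sketched in a few lines of Python. The sketch below counts, for a small made-up test set, how many documents of each expected class were assigned to each class. Entries on the diagonal are correct classifications. The class names and data are hypothetical, and the product's own matrix is a graphical display, not this text output.

```python
from collections import Counter

def result_matrix(expected, assigned):
    """Count documents per (expected class, assigned class) pair."""
    counts = Counter(zip(expected, assigned))
    classes = sorted(set(expected) | set(assigned))
    # Rows are expected classes, columns are assigned classes.
    return classes, [[counts[(e, a)] for a in classes] for e in classes]

# Hypothetical test set of five documents.
expected = ["invoice", "invoice", "letter", "letter", "order"]
assigned = ["invoice", "letter", "letter", "letter", "order"]

classes, matrix = result_matrix(expected, assigned)
for name, row in zip(classes, matrix):
    print(f"{name:8} {row}")
```

In this example one invoice is misclassified as a letter, which shows up as the off-diagonal count in the first row.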
Ascent Xtrata Pro Synchronization
Once classes and fields are defined in the Ascent Xtrata Pro project, they must be mapped to Ascent Capture document classes, form types, and index fields.
Ascent Capture document classes, form types, and index fields can be set up in Ascent Capture as usual. The batch class does not need sample pages, index zones, or other recognition settings because these items are set up in Project Builder.
A project can be synchronized with any batch class that contains Ascent Xtrata Pro Server as a queue. To facilitate the synchronization process, the Ascent Xtrata Pro Synchronization tool has an easy-to-use and efficient interface for linking Ascent Xtrata Pro project elements with corresponding elements in the Ascent Capture batch class.
The Synchronization tool is available from the Ascent Capture batch class context menu so long as Ascent Xtrata Pro Server is set up as a queue.
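Conceptually, synchronization is a mapping from project elements to batch class elements. The following Python sketch models that mapping as a dictionary and lists the project fields that have not been linked yet. All class, field, and index field names are hypothetical, and the sketch does not reflect how the Synchronization tool stores its links.

```python
# Hypothetical project: one class with three extraction fields.
project_fields = {"Invoice": ["InvoiceNumber", "InvoiceDate", "Total"]}

# Links from (project class, project field) to an Ascent Capture index field.
field_links = {
    ("Invoice", "InvoiceNumber"): "Invoice No",
    ("Invoice", "InvoiceDate"): "Invoice Date",
}

def unlinked_fields(project_fields, field_links):
    """List project fields that have no index field assigned yet."""
    return [
        (cls, field)
        for cls, fields in project_fields.items()
        for field in fields
        if (cls, field) not in field_links
    ]

print(unlinked_fields(project_fields, field_links))
```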
Ascent Xtrata Pro Knowledge Base Administration
Once a project is set up, the Knowledge Base Administration module is used to train the project, as well as manage training sets and knowledge bases. For complete information on this application, refer to the Using the Ascent Xtrata Pro Knowledge Base Administration Module guide that is included with your product.
Ascent Xtrata Pro Server
Ascent Xtrata Pro Server is a custom module that performs document classification, OCR, and data extraction. Once installed, it can be added to the list of processing queues for any Ascent Capture batch class.
Ascent Xtrata Pro Server normally runs as an unattended module. Statistical data and error messages are available through a log file. A user interface shows the status of the batch, the document, and the recognition results for the current document.
Ascent Xtrata Pro Server can be started manually for one batch from the Ascent Capture Batch Manager or run as a polling server that automatically processes all batches that are ready for it. For each batch, the project associated with its batch class is automatically loaded by the Server as needed.
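The polling behavior described above can be pictured with the following Python sketch. It is a simplified model, not the Server's implementation: the two callbacks, the batch tuples, and the default interval are assumptions made for illustration. The point it shows is that a project is loaded the first time a batch of its batch class arrives and is reused afterward.

```python
import time

class PollingServer:
    def __init__(self, fetch_ready_batches, load_project, interval_seconds=5.0):
        self.fetch_ready_batches = fetch_ready_batches  # hypothetical callback
        self.load_project = load_project                # hypothetical callback
        self.interval_seconds = interval_seconds
        self._projects = {}  # batch class name -> loaded project

    def project_for(self, batch_class):
        """Load the project for a batch class on first use, then reuse it."""
        if batch_class not in self._projects:
            self._projects[batch_class] = self.load_project(batch_class)
        return self._projects[batch_class]

    def poll_once(self):
        """Pair every ready batch with the project of its batch class."""
        processed = []
        for batch_id, batch_class in self.fetch_ready_batches():
            processed.append((batch_id, self.project_for(batch_class)))
        return processed

    def run(self, cycles):
        """Poll a fixed number of times, waiting between polls."""
        for _ in range(cycles):
            self.poll_once()
            time.sleep(self.interval_seconds)
```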
The Server can run as an application, where it has a graphical user interface, or it can run in the background as a Windows service. Start the Server in application mode from either the Windows Start button or the Ascent Capture Batch Manager. To start the Server as a service automatically every time the computer starts, select Control Panel | Administrative Tools | Services, find “Ascent Xtrata Pro Batch Processing Service,” and change the starting mode from “manual” to “automatic.”
To monitor the service, a performance counter named “Ascent Xtrata Pro Batch Processing Service” is added to the Microsoft Windows monitoring system. To add the performance counter, select Start | Control Panel | Administrative Tools | Performance and start the monitoring system. From the context menu, click “Add Counters” and type “Ascent Xtrata Pro Batch Processing Service”.
The Ascent Xtrata Pro Server (including when running as a service) supports multiprocessor CPUs. Parallel document processing supports up to four services. For example, while processing a batch, the Server can allocate multiple processors so that each one is dedicated to a single document.
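The following Python sketch illustrates the general idea of processing the documents of a batch in parallel with at most four workers. It uses a thread pool as a stand-in for the separate services mentioned above. The function names and the placeholder work done per document are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # this guide states a limit of four parallel services

def process_document(name):
    """Placeholder for classification and extraction of one document."""
    return name, len(name)

def process_batch(documents):
    """Process documents concurrently; each worker handles one at a time."""
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        # map() returns results in input order even though work overlaps.
        return list(pool.map(process_document, documents))

print(process_batch(["a.tif", "bb.tif", "ccc.tif"]))
```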
The Server collects statistical data on all documents as they are processed and saves this information in the XDocument (XDoc). A release script retrieves the data from the XDoc and stores it in a database. The statistics are also updated based on changes that occur during validation.
The Server collects the following statistics:
Number of pages/documents per day/month.
Recognition rates (correct, reject, error) per field and per document.
Processing time per page.
Field and Document statistics grouped by index field or classification result.
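As a rough illustration of the recognition rate statistic, the Python sketch below computes the share of correct, rejected, and erroneous results per field from a list of outcomes. The input format and function name are assumptions. The Server's actual data model is not described in this guide.

```python
from collections import Counter, defaultdict

OUTCOMES = ("correct", "reject", "error")

def recognition_rates(results):
    """results: iterable of (field name, outcome) pairs, outcome in OUTCOMES."""
    per_field = defaultdict(Counter)
    for field, outcome in results:
        per_field[field][outcome] += 1
    rates = {}
    for field, counts in per_field.items():
        total = sum(counts.values())
        # A Counter returns 0 for outcomes that never occurred.
        rates[field] = {o: counts[o] / total for o in OUTCOMES}
    return rates
```

For example, a field with two correct results, one reject, and one error yields rates of 0.5, 0.25, and 0.25.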
The statistics feature offers the following capabilities:
Cleanup of obsolete data within a specified time span.
Collection of data grouped by index field for each classification result.
Automatic archiving of data older than a month.
Ascent Xtrata Pro Validation
Ascent Xtrata Pro Validation is a custom module that can be used in conjunction with Ascent Xtrata Pro Server for Ascent Capture batches. It provides an interface for validating and manually correcting classification and extraction results returned by the Server.
Ascent Xtrata Pro Statistics Viewer
The Ascent Xtrata Pro Statistics Viewer is a standalone application that displays the statistical data gathered by the Ascent Xtrata Pro Server and the Ascent Xtrata Pro Validation module. The statistics contain information about speed as well as about recognition accuracy.
Ascent Xtrata Pro Technology
The following sections give a short overview of the processing capabilities of Ascent Xtrata Pro. The capabilities are documented in detail in the following chapters.
Classification
Classification is the process of determining the category (class) of a document by identifying its relevant characteristics. The features used for classifying a document can be geometrical or textual. The Ascent Xtrata Pro classification engine can use either of these characteristics to make the best determination.
Classification Hierarchy
In most organizations, the manual classification of documents follows a hierarchical scheme. First, the main category of a document is determined and then classification is refined and performed in greater detail over several steps until the final result (the type of document) is obtained.
With Ascent Xtrata Pro you can replicate your legacy classification hierarchy when using automatic classification, thereby ensuring familiar results. This type of hierarchical evaluation is designed to traverse the full extent of the classification tree defined for a project. Different classification methods can be used at each level of the hierarchy. Extraction can be defined for any class in the tree and is inherited by any sub nodes of that class.
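The inheritance of extraction definitions along the class tree can be modeled as follows. This Python sketch is an illustration only. The class and field names are hypothetical and the data structure is an assumption, not the project file format.

```python
class DocumentClass:
    def __init__(self, name, fields=(), parent=None):
        self.name = name
        self.own_fields = list(fields)
        self.parent = parent

    def all_fields(self):
        """Fields inherited from ancestors come first, then the class's own."""
        inherited = self.parent.all_fields() if self.parent else []
        return inherited + self.own_fields

# Hypothetical three-level hierarchy.
root = DocumentClass("Correspondence", ["Sender", "Date"])
invoice = DocumentClass("Invoice", ["InvoiceNumber", "Total"], parent=root)
credit = DocumentClass("CreditNote", ["OriginalInvoice"], parent=invoice)

print(credit.all_fields())
```

The sub node CreditNote ends up with the fields of both of its ancestors in addition to its own.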
Layout Classification
Layout classification uses the geometric structure of a document to classify it. This structure is learned automatically from a single sample page that serves as a prototype for the geometric analysis. If the class contains documents of several distinct layouts, layout classification can be used to match new documents with the appropriate class.
Typically, layout classification is used for identifying forms in a batch. However, it can also be used for recognizing the sender of a letter if the sender’s document layout is unique. For example, this might be the case for formal letters and invoices.
Content Classification
Content classification uses the textual content of a document to classify it. This type of classification is trained with several dozen sample documents per class. The Adaptive Feature Classifier (AFC) automatically determines the features that are relevant for a class. Because the AFC is fault tolerant and evaluates words as well as other features, even information with OCR or typing errors can be used to correctly classify a document. The sample documents are analyzed and a classification pattern is automatically created for use during production.
Instruction Classification
Instruction classification uses explicit rules about a document to classify it. These rules consist of words and phrases that can be combined using Boolean operations. Negative instructions can be used to inhibit placing a document into a class. When used in conjunction with the AFC, these explicit instructions can be used to handle exceptions.
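A minimal Python sketch of this kind of rule, assuming simple case-insensitive phrase matching, is shown below. The helper names and the sample rules are hypothetical. The actual instruction syntax is described in the Classification chapter.

```python
def contains(phrase):
    """Rule that holds when the phrase occurs in the text (case-insensitive)."""
    return lambda text: phrase.lower() in text.lower()

def all_of(*rules):
    """Boolean AND of several rules."""
    return lambda text: all(rule(text) for rule in rules)

def any_of(*rules):
    """Boolean OR of several rules."""
    return lambda text: any(rule(text) for rule in rules)

def classify(text, positive, negative=None):
    """Match if the positive rule holds and no negative rule vetoes it."""
    if negative is not None and negative(text):
        return False
    return positive(text)

# Hypothetical instructions for an invoice class.
is_invoice = all_of(contains("invoice"),
                    any_of(contains("total due"), contains("amount due")))
not_reminder = contains("payment reminder")

print(classify("Invoice 4711. Total due: 100.00", is_invoice, not_reminder))
```

A text that mentions "payment reminder" is kept out of the class by the negative instruction even when the positive rule would match.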
Document Separation
Ascent Xtrata Pro is capable of separating multi-page .tif images into single documents or grouping loose pages into multi-page documents.
Although disabled by default, document separation can be enabled as a project-level setting in Project Builder. A variety of options are available for defining how Ascent Xtrata Pro Server handles unclassified pages. When the feature is enabled, Ascent Xtrata Pro Server performs document separation before extraction.
For details about setting up document separation, see Chapter 2 – Project Builder.
Extraction
Extraction is the act of processing a document, usually with an OCR engine, to identify information from an image file and preserve that information as text.
For classified documents, a class-specific extraction algorithm is applied to the index fields for that class. Ascent Xtrata Pro provides several complementary extraction methods for both finding relevant information in a document, and for filling the index fields with the extracted items.
Extraction is not performed for unclassified documents.
Locators
Extraction methods, which are called locators, are available as integrated components that can be configured for any class or at the project level.
Locators are attached to one or more fields that store the results of the locator algorithm. Locators and fields are inherited by classes in accordance with their position in the class tree.
Evaluators
In addition to the locators, various evaluators are available. Evaluators work on the results of locators and do not directly retrieve data from the document.
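The division of labor between locators and evaluators can be sketched in Python as follows. Two simple locators read the document text and return candidate values with a confidence. The evaluator then combines those candidates without looking at the document itself. The function names, the date pattern, and the confidence values are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict

def format_locator(text):
    """Locator: find strings that look like dd.mm.yyyy dates."""
    return [(m.group(), 0.5) for m in re.finditer(r"\b\d{2}\.\d{2}\.\d{4}\b", text)]

def keyword_locator(text, keyword="Invoice date:"):
    """Locator: take the token that directly follows a keyword."""
    m = re.search(re.escape(keyword) + r"\s*(\S+)", text)
    return [(m.group(1), 0.8)] if m else []

def best_candidate(*locator_results):
    """Evaluator: works only on locator output, never on the document."""
    scores = defaultdict(float)
    for results in locator_results:
        for value, confidence in results:
            scores[value] += confidence
    return max(scores, key=scores.get) if scores else None

text = "Order date: 01.02.2006   Invoice date: 15.03.2006"
field_value = best_candidate(format_locator(text), keyword_locator(text))
print(field_value)
```

Both locators propose 15.03.2006, so its combined score is highest and the evaluator selects it for the field.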
Online Learning
The New Samples working mode is available within Project Builder. This working mode shows documents that have been returned from validation. These documents can be added to either a classification or extraction training set so that they may optimize the extraction of tables and invoice header locators.
In order to make online learning available for a batch class, the Ascent Capture Release module must be added to the list of queues for the batch class.
OCR and Script Integration
In addition to the classification and extraction methods provided with Ascent Xtrata Pro, Project Builder also provides access to OCR settings and an editor for the built-in script engine.
OCR Integration
To process unstructured documents and locate arbitrary content, the complete document must be processed by the OCR engine before any of the extraction methods can be applied. The OCR results are stored in a structured representation of the document that is saved as an .xdc (XDoc) file. All subsequent algorithms operate on the XDoc representation of the original file.
OCR is integrated transparently into Project Builder and Ascent Xtrata Pro Server. It is also performed automatically during runtime, and only on demand. This means that it is only done when the full text results of a page are needed. For example, when extraction is restricted to the first page of the document, and none of the classification methods require more than one page, OCR is only performed on the first page.
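On-demand OCR can be pictured as a cache that is filled page by page. In the Python sketch below, a page is recognized the first time its text is requested and the result is reused afterward. The class name and the `ocr_engine` callable are hypothetical stand-ins and do not represent the product's interfaces.

```python
class LazyOcrDocument:
    def __init__(self, pages, ocr_engine):
        self._pages = pages            # e.g. page image file names
        self._ocr_engine = ocr_engine  # hypothetical callable: page -> text
        self._text = {}                # page index -> cached OCR text

    def page_text(self, index):
        """Run OCR for a page only the first time its text is requested."""
        if index not in self._text:
            self._text[index] = self._ocr_engine(self._pages[index])
        return self._text[index]

    @property
    def pages_recognized(self):
        """Indexes of the pages that have been through OCR so far."""
        return sorted(self._text)

calls = []

def fake_ocr(page):
    calls.append(page)
    return f"text of {page}"

doc = LazyOcrDocument(["page1.tif", "page2.tif", "page3.tif"], fake_ocr)
doc.page_text(0)  # extraction restricted to the first page
doc.page_text(0)  # second request is served from the cache
print(doc.pages_recognized)
```

Pages two and three are never passed to the OCR callable because nothing asked for their text.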
Ascent Xtrata Pro is delivered with the ABBYY® FineReader® 8.0 OCR engine. An additional language package for Asian languages for ABBYY® FineReader® and an additional recognition engine, KADMOS 4.2®, developed by Recognition GmbH, are available. The language package and additional recognition engines such as KADMOS 4.2® must be licensed separately.
Script Integration
A VBA-compatible script engine is built into Ascent Xtrata Pro. This engine can be used to extend the capabilities of the classification, extraction, and validation methods. The script is called when specific events occur before and after classification. In the scripting environment, the complete Ascent Xtrata Pro object document model is available to the script programmer.
Release Script
The Xtrata Pro Statistics release script lets you configure the settings for online learning and statistical information.
To make online learning and statistical information available, the standard Ascent Capture Release module must be added to the list of queues for the batch class and the Xtrata Pro Statistics release script must be added to each Ascent Capture document class in the batch class.
For further details about release scripts, see the Ascent Capture documentation.
Statistical Information
The statistics database contains information about server performance and recognition accuracy. For a period of time, detailed statistical information is available for each field and document. After a user-configurable number of days, this detailed information is aggregated into daily average values.
You can set the number of days in the properties dialog box for the release script.
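The roll-up of detailed records into daily averages might look like the following sketch. The record layout, the roll_up function, and the keep_days parameter are assumptions made for illustration. The actual schema of the statistics database is not described here.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean
from typing import Dict, List, Tuple

# One detailed record: (day, field name, recognition accuracy in [0, 1]).
Record = Tuple[date, str, float]


def roll_up(records: List[Record], today: date,
            keep_days: int) -> Tuple[List[Record], Dict[date, float]]:
    """Keep recent records in detail and average older ones per day."""
    cutoff = today - timedelta(days=keep_days)
    recent = [r for r in records if r[0] >= cutoff]
    by_day: Dict[date, List[float]] = defaultdict(list)
    for day, _field_name, accuracy in records:
        if day < cutoff:
            by_day[day].append(accuracy)
    daily_avg = {day: mean(values) for day, values in by_day.items()}
    return recent, daily_avg


records = [
    (date(2006, 3, 1), "Total", 1.0),
    (date(2006, 3, 1), "Date", 0.5),
    (date(2006, 3, 9), "Total", 1.0),
]
recent, daily_avg = roll_up(records, today=date(2006, 3, 10), keep_days=7)
```

With a retention of seven days, the two records from 1 March are reduced to a single daily average, and the record from 9 March stays available in detail.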
Recognition accuracy statistics are available at the field level and as an average value for each document. The statistical information can also be grouped by the classification result or by the value of another field. For example, recognition accuracy or OCR computing time can be tracked for a field and then grouped by supplier or by Ascent Capture document class.
The group value is set in the properties dialog box for the release script.
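Grouping by a field value can be sketched as follows. The row layout, the column names (supplier, accuracy, ocr_seconds), and the average_by_group function are hypothetical. They only illustrate the idea of averaging a metric per group value.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, Iterable, List, Mapping


def average_by_group(rows: Iterable[Mapping[str, Any]],
                     group_key: str, metric: str) -> Dict[str, float]:
    """Average one metric over all rows that share a group value."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        groups[str(row[group_key])].append(float(row[metric]))
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical per-document statistics for one tracked field.
rows = [
    {"supplier": "Acme", "accuracy": 1.0, "ocr_seconds": 2.0},
    {"supplier": "Acme", "accuracy": 0.5, "ocr_seconds": 4.0},
    {"supplier": "Globex", "accuracy": 0.75, "ocr_seconds": 1.0},
]
accuracy_by_supplier = average_by_group(rows, "supplier", "accuracy")
ocr_time_by_supplier = average_by_group(rows, "supplier", "ocr_seconds")
```

The same function serves both examples named above: passing a different group_key, such as a document class column, changes the grouping without changing the code.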
Validation
Before you can use the Ascent Xtrata Pro Validation module to correct documents, validation must be set up in the Ascent Xtrata Pro Project Builder. Validation thresholds, methods, and rules must also be assigned.