Portions, Copyright 2006 Kofax Image Products, Inc. All Rights Reserved.
The information contained in this document is the property of LCI GmbH. Neither
receipt nor possession hereof confers or transfers any right to reproduce or disclose
any part of the contents hereof, without the prior written consent of LCI GmbH. No
patent liability is assumed, however, with respect to the use of the information
contained herein.
Trademarks
Kofax, Ascent, Ascent Capture, and Ascent Capture Internet Server are registered
trademarks; and Xtrata and VRS are trademarks of Kofax Image Products, Inc.
ABBYY, FINEREADER and ABBYY FineReader are registered trademarks of ABBYY
Software Ltd.
Chinese, Japanese, Korean recognition:
Technologies from NewSoft Inc. are used to recognize Chinese, Japanese and Korean
texts:
Recore®, NewSoft®, Presto! ®.
All other product names and logos are trade and service marks of their respective
companies.
Disclaimer
The instructions and descriptions contained in this document were accurate at the
time of printing. However, succeeding products and documents are subject to change
without notice. Therefore, Kofax Image Products, Inc. assumes no liability for
damages incurred directly or indirectly from errors, omissions, or discrepancies
between the product and this document.
An attempt has been made to state all allowable values where applicable throughout
this document. Any values or parameters used beyond those stated may have
unpredictable results.
Contents
How to Use This Guide
Introduction
How This Guide is Organized
Related Documentation
This guide contains information about using Ascent Xtrata Pro. It is provided for
system administrators, operators, project developers, and other personnel who are
setting up and using Ascent Xtrata Pro components for use with Ascent Capture.
This guide assumes that you have a thorough understanding of Windows standards
and interfaces, and Ascent Capture.
How This Guide is Organized
This guide includes the following chapters:
• Chapter 1 – Overview introduces the components installed with Ascent
Xtrata Pro and the key features provided with the product.
• Chapter 2 – Project Builder describes how to create new projects with Ascent
Xtrata Pro Project Builder and introduces some of its interfaces and panels. It
also includes some high-level general procedures for setting up classification,
extraction, and validation.
• Chapter 3 – Classification contains details about setting up classification
projects.
• Chapter 4 – Extraction contains details about setting up extraction projects.
• Chapter 5 – Setting Up Validation contains details about setting up
validation in projects, including instructions for designing custom validation
forms.
• Chapter 6 – Project Builder User Interface provides information about
Project Builder user interface items and various dialog boxes.
Ascent Xtrata Pro User's Guide xv
• Chapter 7 – Setting Up a Batch Class in Ascent Capture explains how to add
Ascent Xtrata Pro components to Ascent Capture batch classes and use the
Synchronization tool to synchronize the project classes and fields with Ascent
Capture.
• Chapter 8 – Processing Batches describes the general operation of Ascent
Xtrata Pro Server and provides information about its user interface.
• Chapter 9 – Ascent Xtrata Pro Validation describes the general operation of
the Ascent Xtrata Pro Validation module.
• Chapter 10 – Statistics Viewer describes the general operation of the Ascent
Xtrata Pro Statistics Viewer module.
Related Documentation
In addition to this Getting Started with Ascent Xtrata Pro guide, the following
documentation is available.
Installation Guide for Ascent Xtrata Pro
This installation guide is provided as a separate document in the Ascent Xtrata Pro
software case.
Using the Ascent Xtrata Pro Knowledge Base Administration Module
This guide contains information about training, creating, and otherwise managing
knowledge bases for invoice projects.
Ascent Xtrata Pro Online Help
Ascent Xtrata Pro online help is available from the application components as
follows:
• From any of the Ascent Xtrata Pro components, click the Help button on
the toolbar or select Help|Contents (or Index) from the menu bar.
• From any dialog box, click the Help button to display context-sensitive help
information for the dialog box.
Scripting Online Help
Information about scripting is available from the Help menu of any Project Builder
interface that allows you to write or access scripts. Select Help and then the desired
help component.
Ascent Xtrata Pro Release Notes
Late-breaking product information is available from the release notes. You should
read the release notes carefully, as they contain information that may not be included
in other Ascent Xtrata Pro documentation.
Training
Kofax offers a variety of training options that will help you make the most of your
software. Visit the Kofax Web site at www.kofax.com for complete details about the
available training options and schedules.
Kofax Technical Support
For additional technical information about Kofax products, visit the Kofax Web site
at www.kofax.com and select an appropriate option from the Support menu. The
Kofax Support pages provide product-specific information, such as current revision
levels, the latest drivers and software patches, online documentation and user
manuals, updates to product release notes (if any), technical tips, and an extensive
searchable knowledge base.
The Kofax Web site also contains information that describes support options for
Kofax products. Please review the site for details about the available support options.
If you need to contact Kofax Technical Support, please have the following
information available:
• Ascent Xtrata Pro software version
• Ascent Capture and ACI Server software versions
• Operating system and service pack version
• Network and client configuration
• Copies of your error log files
• Scanner make and model
• Scanner engine (board) type
• Special/custom configuration or integration information
Introduction
This chapter introduces the components installed with Ascent Xtrata Pro, as well as
their key features.
The rest of this guide describes these components in more detail, and explains how
to incorporate Ascent Xtrata Pro into your Ascent Capture processing flow.
Ascent Xtrata Pro
Ascent Xtrata Pro is a complete system for processing structured, semi-structured,
and unstructured documents within the Ascent Capture framework. Ascent
Capture’s document and data capture capabilities are enhanced by advanced
intelligent document processing. Ascent Xtrata Pro provides methods for
hierarchical, content-based classification, and the free-form field extraction of
arbitrary, mixed, and unstructured documents.
Chapter 1
Overview
Ascent Xtrata Pro adds the following components to your Ascent Capture system:
• Ascent Xtrata Pro Project Builder lets you set up, store, and test Ascent
Xtrata Pro projects that contain all the information required to process
documents.
• Ascent Xtrata Pro Synchronization tool is a setup component that is
integrated into the Ascent Capture Administration module as a custom panel.
It is used for linking Ascent Capture document classes and index fields to
classes and fields in the Ascent Xtrata Pro project.
• Ascent Xtrata Pro Knowledge Base Administration is used to train
documents and manage knowledge bases for a given project. In this module,
fields cannot be added to the project and locator settings cannot be changed.
• Ascent Xtrata Pro Server processes batches in the Ascent Capture workflow
by performing document classification and data extraction. The Server
module uses the definitions stored in a project and executes them when
processing batches for a linked batch class.
• Ascent Xtrata Pro Validation provides enhanced validation functionality. It
allows for validating and manually correcting documents that contain invalid
classification and/or extraction results. Problem documents can be flagged
for additional training.
• Ascent Xtrata Pro Statistics Viewer is used to show statistical data gathered
by Ascent Xtrata Pro Server.
• Ascent Xtrata Pro XDoc Browser is used to view the contents of XDoc files.
These files contain a textual representation of the contents, structure, and
extraction results from image files. Ascent Xtrata Pro uses XDoc files
internally when processing batches.
• Ascent Xtrata Pro Image Classifier is a utility that you can use to classify and
cluster documents without using the Project Builder.
Once Ascent Xtrata Pro is installed, you can add Ascent Xtrata Pro Server and Ascent
Xtrata Pro Validation to any batch class already defined in the Ascent Capture
Administration module. Typically, Ascent Xtrata Pro Server is placed directly after
the Scan module and replaces the Recognition Server in the Ascent Capture
workflow. Documents are classified and processed for data extraction and then
routed to the Ascent Xtrata Pro Validation module and/or the Release module.
Capture Flow
An overview of a typical Ascent Capture workflow that includes Ascent Xtrata Pro
Server is shown below.
Figure 1-1. Typical Capture Workflow with Ascent Xtrata Pro Server and Validation
First, documents are prepared for scanning. There is no need to sort the documents,
but the pages must be smoothed and all staples and/or clips removed. Then, using a
professional scanner with VRS, batches of documents are scanned into Ascent
Capture. Ascent Xtrata Pro Server processes the documents and provides the
classification and recognition results. Invalid results are reviewed, and if necessary,
corrected in the Ascent Xtrata Pro Validation module.
Optionally, documents in the batch can be routed to either the Ascent Capture
Recognition Server or Ascent Xtrata Pro Server to perform advanced forms
processing. After all the documents are validated and verified either by Ascent
Capture Validation or Ascent Xtrata Pro Validation, the batch is passed to the
Release module and exported to the final repository.
Ascent Xtrata Pro Project Builder
Ascent Xtrata Pro Project Builder is a standalone program intended for system
administrators, operators, project developers, and other skilled individuals who are
setting up Ascent Xtrata Pro projects. Project Builder allows for defining the
hierarchical structure of classes (categories of documents) and adding sample
documents and classification instructions to these classes. Extraction rules and fields
can be defined for each class.
Note that for invoice projects there is, by definition, only one class (the invoice class).
Consequently, class related settings are not displayed and are handled automatically
by the program.
A project created with Project Builder is stored in its own project folder. The folder
includes the project file and a number of additional files that contain everything
needed to manage and execute the project. This project folder is portable; if desired,
it can be copied to another location and used from there.
Project Builder supports robust features for interactively testing project settings
during configuration and maintenance. Thorough testing, using your own sets of test
documents, is vitally important for evaluating the behavior of defined rules and
learned document samples. The settings can then be adjusted (and retested) until the
desired results are achieved.
Test documents can be displayed in an integrated document viewer. A test set may
contain any number of .tif, .txt, or .xdc files placed in one or more designated folders.
(.xdc is a proprietary file format used by Ascent Xtrata Pro that contains textual and
geometric information extracted from a .tif file by the built-in Optical Character
Recognition (OCR) engine.)
Project Builder has flexible features you can use to test classification results for the
entire test set or extraction results for a single document. Test results are displayed in
the Classification Results or Extraction Results panels for quick review. Or, you can
directly view the results in the Document Viewer when the document is displayed.
The results are also displayed in a result matrix, which provides a three-dimensional
column graph of the classification results. This matrix provides an immediate, highly
visual assessment of classification quality.
Figure 1-2. Classification Result Matrix for a News Group Project of Nine Classes
Ascent Xtrata Pro Synchronization
Once classes and fields are defined in the Ascent Xtrata Pro project, they must be
mapped to Ascent Capture document classes, form types, and index fields.
Ascent Capture document classes, form types, and index fields can be set up in
Ascent Capture as usual. The batch class does not need sample pages, index zones,
or other recognition settings because these items are set up in Project Builder.
A project can be synchronized with any batch class that contains Ascent Xtrata Pro
Server as a queue. To facilitate the synchronization process, the Ascent Xtrata Pro
Synchronization tool has an easy-to-use and efficient interface for linking Ascent
Xtrata Pro project elements with corresponding elements in the Ascent Capture batch
class.
The Synchronization tool is available from the Ascent Capture batch class context
menu so long as Ascent Xtrata Pro Server is set up as a queue.
Ascent Xtrata Pro Knowledge Base Administration
Once a project is set up, the Knowledge Base Administration module is used to train
the project, as well as manage training sets and knowledge bases. For complete
information on this application, refer to the Using the Ascent Xtrata Pro Knowledge Base Administration Module guide that is included with your product.
Ascent Xtrata Pro Server
Ascent Xtrata Pro Server is a custom module that performs document classification,
OCR, and data extraction. Once installed, it can be added to the list of processing
queues for any Ascent Capture batch class.
Ascent Xtrata Pro Server normally runs as an unattended module. Statistical data
and error messages are available through a log file. A user interface shows the status
of the batch, the document, and the recognition results for the current document.
Ascent Xtrata Pro Server can be started manually for one batch from the Ascent
Capture Batch Manager or run as a polling server that automatically processes all
batches that are ready for it. For each batch, the project associated with its batch class
is automatically loaded by the Server as needed.
The Server can run as an application, where it has a graphical user interface, or it can
run in the background as a Windows service. Start the Server in application mode
from either the Windows Start button or the Ascent Capture Batch Manager. To
start the Server as a service automatically every time the computer starts, select
Control Panel | Administrative Tools | Services, find “Ascent Xtrata Pro Batch
Processing Service,” and change the starting mode from “manual” to “automatic.”
To monitor the service, a performance counter named “Ascent Xtrata Pro Batch
Processing Service” is added to the Microsoft Windows monitoring system. To add the
performance counter, select Start | Control Panel | Administrative
Tools | Performance and start the monitoring system. From the context menu, click
“Add Counters” and type “Ascent Xtrata Pro Batch Processing Service”.
The Ascent Xtrata Pro Server (including when running as a service) supports multiprocessor CPUs. Parallel document processing supports up to four services. For
example, while processing a batch, the Server can allocate multiple processors so that
each one is dedicated to a single document.
The Server collects statistical data on all documents as they are processed and saves
this information in the XDocument (XDoc). A release script retrieves the data from
the XDoc and stores it in a database. The statistics are also updated based on changes
that occur during validation.
The Server collects the following statistics:
• Number of pages/documents per day/month.
• Recognition rates (correct, reject, error) per field and per document.
• Processing time per page.
• Field and Document statistics grouped by index field or classification result.
The statistics feature offers the following capabilities:
• Cleanup of obsolete data within a specified time span.
• Collection of data grouped by index field for each classification result.
• Automatic archiving of data older than a month.
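The per-field recognition rates listed above can be illustrated with a short Python sketch. The record layout used here (a field name paired with an outcome string) and the field names are assumptions made for this example only; they are not the format that Ascent Xtrata Pro stores in the XDoc or in the statistics database.

```python
from collections import Counter, defaultdict

# Hypothetical outcome labels, taken from the "correct, reject, error" wording above.
OUTCOMES = ("correct", "reject", "error")


def recognition_rates(records):
    """Return {field: {outcome: fraction}} for (field, outcome) pairs."""
    counts = defaultdict(Counter)
    for field, outcome in records:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        counts[field][outcome] += 1

    rates = {}
    for field, tally in counts.items():
        total = sum(tally.values())
        rates[field] = {o: tally[o] / total for o in OUTCOMES}
    return rates


# Illustrative data only; field names are invented for the example.
sample = [
    ("InvoiceNumber", "correct"),
    ("InvoiceNumber", "correct"),
    ("InvoiceNumber", "reject"),
    ("InvoiceNumber", "error"),
    ("TotalAmount", "correct"),
]
rates = recognition_rates(sample)
print(rates["InvoiceNumber"])
```

The same tally could be kept per document instead of per field by changing the key used for grouping.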
Ascent Xtrata Pro Validation
Ascent Xtrata Pro Validation is a custom module that can be used in conjunction
with Ascent Xtrata Pro Server for Ascent Capture batches. It provides an interface for
validating and manually correcting classification and extraction results returned by
the Server.
Ascent Xtrata Pro Statistics Viewer
The Ascent Xtrata Pro Statistics Viewer is a standalone application that displays the
statistical data gathered by the Ascent Xtrata Pro Server and the Ascent Xtrata Pro
Validation module. The statistics contain information about speed as well as about
recognition accuracy.
Ascent Xtrata Pro Technology
The following sections give a short overview of the processing capabilities of Ascent
Xtrata Pro. The capabilities are documented in detail in the following chapters.
Classification
Classification is the process of determining the category (class) of a document by
identifying its relevant characteristics. The features used for classifying a document
can be geometrical or textual. The Ascent Xtrata Pro classification engine can use
either of these characteristics to make the best determination.
Classification Hierarchy
In most organizations, the manual classification of documents follows a hierarchical
scheme. First, the main category of a document is determined and then classification
is refined and performed in greater detail over several steps until the final result (the
type of document) is obtained.
With Ascent Xtrata Pro you can replicate your legacy classification hierarchy when
using automatic classification, thereby ensuring familiar results. This type of
hierarchical evaluation is designed to traverse the full extent of the classification tree
defined for a project. Different classification methods can be used at each level of the
hierarchy. Extraction can be defined for any class in the tree and is inherited by any
subnodes of that class.
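The idea of refining a classification level by level can be sketched in Python as follows. This is only an illustration of a hierarchical walk: the `ClassNode` type, the class names, and the keyword tests are hypothetical and do not reflect how Ascent Xtrata Pro implements its classification tree.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ClassNode:
    """One node of a hypothetical classification tree."""
    name: str
    # Picks the name of a child class for the text, or None if nothing fits.
    choose_child: Optional[Callable[[str], Optional[str]]] = None
    children: Dict[str, "ClassNode"] = field(default_factory=dict)


def classify(root: ClassNode, text: str) -> List[str]:
    """Walk from the root towards a leaf and return the path of class names."""
    path = [root.name]
    node = root
    while node.choose_child is not None:
        child_name = node.choose_child(text)
        if child_name is None or child_name not in node.children:
            break  # stop at the most specific class that could be decided
        node = node.children[child_name]
        path.append(node.name)
    return path


# Each level may use a different method; simple keyword tests stand in here.
def top_level(text):
    lowered = text.lower()
    if "invoice" in lowered:
        return "Invoices"
    if "dear" in lowered:
        return "Letters"
    return None


def letter_level(text):
    return "Complaints" if "complaint" in text.lower() else "Inquiries"


letters = ClassNode("Letters", letter_level, {
    "Complaints": ClassNode("Complaints"),
    "Inquiries": ClassNode("Inquiries"),
})
root = ClassNode("Project", top_level, {
    "Invoices": ClassNode("Invoices"),
    "Letters": letters,
})

print(classify(root, "Dear Sir, this is a complaint about my order."))
```

Because every node carries its own decision function, a layout-based test could be used at one level and a text-based test at the next, which mirrors the statement that different methods can be used at each level.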
Layout Classification
Layout classification uses the geometric structure of a document to classify it. This
structure is learned automatically from a single sample page that serves as a
prototype for the geometric analysis. If the class contains documents of several
distinct layouts, layout classification can be used to match new documents with the
appropriate class.
Typically, layout classification is used for identifying forms in a batch. But, it can
also be used for recognizing the sender of a letter, if the sender’s document layout is
unique. For example, this might be the case for formal letters and invoices.
Content Classification
Content classification uses the textual content of a document to classify it. This type
of classification is trained with several dozen sample documents per class. The
Adaptive Feature Classifier (AFC) automatically determines the features that are
relevant for a class. Because the AFC is fault tolerant and evaluates words as well as
other features, even information with OCR or typing errors can be used to correctly
classify a document. The sample documents are analyzed and a classification pattern
is automatically created for use during production.
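To show why a classifier that looks at more than whole words can tolerate OCR errors, the following Python sketch compares character trigrams. It is explicitly not the Adaptive Feature Classifier, whose internals are not described in this guide; it is a much simpler stand-in technique, and the sample texts and class names are invented for the example.

```python
def trigrams(text):
    """Set of character trigrams of the lower-cased text."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}


def train(samples_by_class):
    """Build one trigram profile per class from its sample texts."""
    return {
        label: set().union(*(trigrams(s) for s in samples))
        for label, samples in samples_by_class.items()
    }


def classify_text(profiles, text):
    """Return the class whose profile covers the largest share of the text's trigrams."""
    grams = trigrams(text)
    if not grams:
        return None  # text too short to compare

    def score(label):
        return len(grams & profiles[label]) / len(grams)

    return max(profiles, key=score)


# Illustrative training samples; a real project would use many more per class.
profiles = train({
    "invoice": [
        "invoice number total amount due",
        "invoice date and total amount payable",
    ],
    "letter": [
        "dear sir or madam thank you for your letter",
        "dear customer we regret the delay",
    ],
})

# Typical OCR confusions: "l" for "i", "rn" for "m", "1" for "l".
noisy = "lnvoice nurnber tota1 arnount due"
print(classify_text(profiles, noisy))
```

Although no word in the noisy text except "due" is spelled correctly, most of its trigrams still occur in the invoice samples, so the text is assigned to that class.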
Instruction Classification
Instruction classification uses explicit rules about a document to classify it. These
rules consist of words and phrases that can be combined using Boolean operations.
Negative instructions can be used to inhibit placing a document into a class. When
used in conjunction with the AFC, these explicit instructions can be used to handle
exceptions.
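A minimal Python sketch of such rules is shown here. The helper names (`contains`, `all_of`, `any_of`, `matches`) and the sample phrases are hypothetical; the actual instruction syntax used in Project Builder is described in the classification chapter and is not reproduced here.

```python
def contains(phrase):
    """Rule that is true when the phrase occurs in the text (case-insensitive)."""
    needle = phrase.lower()
    return lambda text: needle in text.lower()


def all_of(*rules):
    """Boolean AND of several rules."""
    return lambda text: all(rule(text) for rule in rules)


def any_of(*rules):
    """Boolean OR of several rules."""
    return lambda text: any(rule(text) for rule in rules)


def matches(text, positive, negative=None):
    """A class matches if its positive rule holds and no negative rule vetoes it."""
    if negative is not None and negative(text):
        return False
    return positive(text)


# Hypothetical "Invoice" class: requires the word "invoice" plus a total or an
# amount-due phrase, and is inhibited for documents that mention a reminder.
invoice_rule = all_of(contains("invoice"),
                      any_of(contains("total"), contains("amount due")))
reminder_veto = contains("reminder")

is_invoice = matches("Invoice 4711, amount due: 120.00", invoice_rule, reminder_veto)
is_reminder = matches("Reminder for invoice 4711, total 120.00", invoice_rule, reminder_veto)
print(is_invoice, is_reminder)
```

The second document satisfies the positive rule but is still rejected, which is how a negative instruction keeps a document out of a class.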
Document Separation
Ascent Xtrata Pro is capable of separating multi-page .tif images into single
documents or grouping loose pages into multi-page documents.
Although disabled by default, document separation can be enabled as a project-level
setting in Project Builder. A variety of options are available for defining how Ascent
Xtrata Pro Server handles unclassified pages. When the feature is enabled, Ascent
Xtrata Pro Server performs document separation before extraction.
For details about setting up document separation, see Project Builder.
Extraction
Extraction is the act of processing a document, usually with an OCR engine, to
identify information from an image file and preserve that information as text.
For classified documents, a class-specific extraction algorithm is applied to the index
fields for that class. Ascent Xtrata Pro provides several complementary extraction
methods for both finding relevant information in a document, and for filling the
index fields with the extracted items.
Extraction is not performed for unclassified documents.
Locators
Extraction methods, which are called locators, are available as integrated components
that can be configured for any class or at the project level.
Locators are attached to one or more fields that store the results of the locator
algorithm. Locators and fields are inherited by classes in accordance with their
position in the class tree.
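Inheritance along the class tree can be pictured with the following Python sketch. The `DocClass` type, the field names, and the locator names are all hypothetical, and the sketch assumes that a subclass may override an inherited field, which this guide does not state explicitly.

```python
class DocClass:
    """Hypothetical project class that inherits fields from its parent."""

    def __init__(self, name, parent=None, fields=None):
        self.name = name
        self.parent = parent
        self.own_fields = dict(fields or {})  # field name -> locator name

    def effective_fields(self):
        """Merge fields from the root down; entries defined lower in the tree win."""
        inherited = self.parent.effective_fields() if self.parent else {}
        return {**inherited, **self.own_fields}


# A field defined at the project level is visible in every class below it.
project = DocClass("Project", fields={"Date": "DateLocator"})
invoices = DocClass("Invoices", parent=project, fields={"Total": "AmountLocator"})
credit_notes = DocClass("CreditNotes", parent=invoices,
                        fields={"Date": "CreditDateLocator"})

print(invoices.effective_fields())
print(credit_notes.effective_fields())
```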
Evaluators
In addition to the locators, various evaluators are available. Evaluators work on the
results of locators and do not directly retrieve data from the document.
Online Learning
The New Samples working mode is available within Project Builder. This working
mode shows documents that have been returned from validation. These documents
can be added to either a classification or extraction training set so that they may
optimize the extraction of tables and invoice header locators.
In order to make online learning available for a batch class, the Ascent Capture
Release module must be added to the list of queues for the batch class.
OCR and Script Integration
In addition to the classification and extraction methods provided with Ascent Xtrata
Pro, Project Builder also provides access to OCR settings and an editor for the built-in
script engine.
OCR Integration
To process unstructured documents and locate arbitrary content, the complete
document must be processed by the OCR engine before any of the extraction
methods can be applied. The OCR results are stored in a structured representation of
the document that is saved as an .xdc (XDoc) file. All subsequent algorithms operate
on the XDoc representation of the original file.
OCR is integrated transparently into Project Builder and Ascent Xtrata Pro Server. It
is also performed automatically during runtime, and only on demand. This means
that it is only done when the full text results of a page are needed. For example,
when extraction is restricted to the first page of the document, and none of the
classification methods require more than one page, OCR is only performed on the
first page.
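The on-demand behavior described above can be sketched in Python as a per-page cache. The `LazyDocument` class and the `fake_ocr` function are stand-ins invented for this example; they are not part of the Ascent Xtrata Pro object model, and no real OCR engine is called.

```python
class LazyDocument:
    """Runs a stand-in OCR function per page only when the text is requested."""

    def __init__(self, page_images, ocr_page):
        self._images = page_images
        self._ocr_page = ocr_page
        self._text = {}  # cache: page index -> recognized text

    def page_text(self, index):
        # Recognize the page on first access, then reuse the cached result.
        if index not in self._text:
            self._text[index] = self._ocr_page(self._images[index])
        return self._text[index]

    @property
    def pages_recognized(self):
        return sorted(self._text)


calls = []


def fake_ocr(image):
    """Placeholder for an OCR call; records which pages were processed."""
    calls.append(image)
    return f"text of {image}"


doc = LazyDocument(["page1.tif", "page2.tif", "page3.tif"], fake_ocr)
first = doc.page_text(0)        # triggers recognition of page 1 only
first_again = doc.page_text(0)  # served from the cache
print(doc.pages_recognized, len(calls))
```

When extraction and classification only ever ask for the first page, the remaining pages are never passed to the OCR function.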
Ascent Xtrata Pro is delivered with the ABBYY® FineReader® 8.0 OCR engine. An
additional language package for Asian languages for ABBYY® FineReader® and an
additional recognition engine, KADMOS 4.2®, developed by Recognition GmbH, are
available. The language package and any additional recognition engines, such as
KADMOS 4.2®, must be licensed separately.
Script Integration
A VBA-compatible script engine is built into Ascent Xtrata Pro. This engine can be
used to extend the capabilities of the classification, extraction, and validation
methods. The script is called when specific events occur before and after
classification. In the scripting environment, the complete Ascent Xtrata Pro object
document model is available to the script programmer.
Release Script
The Xtrata Pro Statistics release script lets you configure the settings for online
learning and statistical information.
To make online learning and statistical information available, the standard Ascent
Capture Release module must be added to the list of queues for the batch class and
the Xtrata Pro Statistics release script must be added to each Ascent Capture
document class in the batch class.
For further details about release scripts, see the Ascent Capture documentation.
Statistical Information
The statistics database contains information about server performance and
recognition accuracy. For a period of time, statistical information is available for each
field and document. After a user configurable number of days, this detailed
information will be accumulated into average daily values.
You can set the number of days in the properties dialog box for the release script.
Recognition accuracy statistics are available at the field level and as an average value
for each document. Furthermore, it is possible to group the statistical information by
the classification result or by other field values. You can then further evaluate the
statistical data by grouping it according to the value of that field. For example,
recognition accuracy or OCR computing time can be tracked for a field and then
grouped by supplier or by Ascent Capture document class.
The group value is set in the properties dialog box for the release script.
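The combination of a retention period and a group value can be illustrated with a short Python sketch. The record layout, the `accumulate` function, and the supplier names are assumptions for this example; the actual database schema and the settings of the release script are not described in this chapter.

```python
from collections import defaultdict
from datetime import date, timedelta


def accumulate(records, today, keep_days):
    """Keep recent records as-is; average older ones per (day, group)."""
    cutoff = today - timedelta(days=keep_days)
    recent = [r for r in records if r["day"] >= cutoff]

    buckets = defaultdict(list)
    for r in records:
        if r["day"] < cutoff:
            buckets[(r["day"], r["group"])].append(r["accuracy"])

    daily = {key: sum(vals) / len(vals) for key, vals in buckets.items()}
    return recent, daily


# Illustrative per-document accuracy values, grouped by a hypothetical supplier field.
today = date(2006, 6, 30)
records = [
    {"day": date(2006, 6, 1), "group": "Supplier A", "accuracy": 0.5},
    {"day": date(2006, 6, 1), "group": "Supplier A", "accuracy": 1.0},
    {"day": date(2006, 6, 1), "group": "Supplier B", "accuracy": 0.75},
    {"day": date(2006, 6, 29), "group": "Supplier A", "accuracy": 1.0},
]
recent, daily = accumulate(records, today, keep_days=7)
print(len(recent), daily)
```

Here `keep_days` plays the role of the user-configurable number of days, and the `group` key plays the role of the group value.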
Validation
Before you can use the Ascent Xtrata Pro Validation module to correct documents,
validation must be set up in the Ascent Xtrata Pro Project Builder. Furthermore,
validation thresholds must be assigned, as well as validation methods and rules.