Third-party software is copyrighted and licensed from Kofax’s suppliers.
This product is protected by U.S. Patent No. 5,159,667.
THIS SOFTWARE CONTAINS CONFIDENTIAL INFORMATION AND TRADE
SECRETS OF KOFAX, INC. USE, DISCLOSURE OR REPRODUCTION IS
PROHIBITED WITHOUT THE PRIOR EXPRESS WRITTEN PERMISSION OF
KOFAX, INC.
Kofax, the Kofax logo, INDICIUS, Ascent Capture, Kofax Capture, VirtualReScan, the
“VRS VirtualReScan” logo, and VRS are trademarks or registered trademarks of
Kofax, Inc. in the U.S. and other countries. All other trademarks are the trademarks
or registered trademarks of their respective owners.
U.S. Government Rights Commercial software. Government users are subject to the
Kofax, Inc. standard license agreement and applicable provisions of the FAR and its
supplements.
You agree that you do not intend to and will not, directly or indirectly, export or
transmit the Software or related documentation and technical data to any country to
which such export or transmission is restricted by any applicable U.S. regulation or
statute, without the prior written consent, if required, of the Bureau of Export
Administration of the U.S. Department of Commerce, or such other governmental
entity as may have jurisdiction over such export or transmission. You represent and
warrant that you are not located in, under the control of, or a national or resident of
any such country.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED
CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY
IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE
EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Contents
How to Use This Guide
Step 8: Publish Batch Class
Step 9: Process Batch
Getting Started Guide (Classification and Separation)
Introduction
This guide introduces INDICIUS and describes how it is used to automatically
separate pages into documents and to classify documents. It starts with brief
installation instructions which are followed by a tutorial. The tutorial will guide you
through processing batches using the pre-installed Mortgage Applications example.
The guide then describes how each of the modules is configured, enabling you to
create a new set of configurations for classifying mortgage application documents
(and shows how to assign this configuration to Kofax Capture). The final tutorial steps
through creating an alternative configuration, this time to include automatic
separation of pages into documents.
How to Use This Guide
This guide assumes that you have a thorough understanding of Windows standards,
applications, and interfaces.
This guide is for people who need an introduction to INDICIUS, specifically
automatically classifying and separating documents. It is beneficial to people who
will be:
Configuring INDICIUS to process documents on the Kofax Capture platform.
Administering or supporting an INDICIUS solution.
Read the entire guide sequentially. It includes several tutorials which need to be
completed in order. The tutorials require INDICIUS to be installed including the
INDICIUS examples.
If you need more detailed information on configuring a module, open the INDICIUS Help and read the relevant “How to configure” book. Additional details of all the
documentation provided with INDICIUS are included in the section Related
Documentation.
Related Documentation
The following documentation is included with INDICIUS.
Each PDF guide can be opened by clicking Start on the taskbar to display the menu,
and selecting All Programs | INDICIUS | Documentation.
The INDICIUS Help can be opened from the same menu, but can also be opened from
the Help menu within the tools. Pressing F1 within Definer and Script Editor will
open the topic for the feature being used.
Installation Guide (.pdf)
This guide is written for those installing INDICIUS, either on a development
computer (where a solution is configured or tested) or on a production computer.
The guide explains:
System requirements.
Licensing requirements.
The procedure for installing INDICIUS.
How to customize modules running as dedicated applications.
How to install the unattended modules as Windows services.
User's Guide (.pdf)
The User's Guide (.pdf) is written for keyboard operators (keyers) who will be using
the attended modules on a production computer, and for those using all of the
modules on a development computer.
The guide explains:
What each INDICIUS module is used for.
How to operate each module.
Getting Started Guides
These guides are written for people who need an introduction to INDICIUS. The
guides are useful as a starting point for those who will be configuring or
administering INDICIUS, or those using the keying modules. The guides are
self-contained; however, each focuses on configuring a different document processing
solution.
Getting Started Guide (Fixed-Form) (.pdf)
The Getting Started Guide (Fixed-Form) (.pdf) focuses on configuring a solution to
extract data from fixed-form (structured) documents.
The guide explains:
How to extract data from single page documents of a known document type,
using the installed Order Forms example.
How the tools, concepts and configuration files relate to the setup in Kofax
Capture.
How to replicate the Order Forms configuration by following detailed
procedures.
Getting Started Guide (Free-Form) (.pdf)
The Getting Started Guide (Free-Form) (.pdf) focuses on configuring a solution to extract
data from free-form (semi-structured or unstructured) documents.
The guide explains:
How to extract data from single page documents of a known document type,
using the installed Solicitors Letters example.
How the tools, concepts and configuration files relate to the setup in Kofax
Capture.
How to replicate the Solicitors Letters configuration by following detailed
procedures.
INDICIUS Help
The INDICIUS Help is written for those configuring a solution and for system
administrators, and assumes those reading it have read the Getting Started Guides or
attended an INDICIUS training course. This assumption is made so that the
INDICIUS Help can provide the most accurate and detailed information across every
aspect of the product.
The INDICIUS Help explains:
How to configure the INDICIUS modules to process a document set.
How to use the module setup dialogs to assign a configuration to a batch
class.
The integration of INDICIUS within the Kofax Capture platform.
How to set up and monitor an efficient production environment
(Administration Help).
The INDICIUS Help also contains a reference section which includes:
Definition file parameters used by the Recognition and Correction modules.
Script objects, hooks, methods and properties used by all of the modules.
Visual Basic Scripting Help (.chm)
The Visual Basic Scripting Help (.chm) is provided for further information on VB
scripting.
Chapter 1
Overview
Introduction
This chapter introduces some of the concepts of data capture and key points of
INDICIUS.
What Does INDICIUS Add to Kofax Capture?
INDICIUS is a set of modules that provide additional automatic recognition
(classification, separation and extraction) as well as advanced keying (indexing and
validation) functionality to Kofax Capture.
Kofax Capture scans paper-based documents, creating a series of scanned image
files. Alternatively, Kofax Capture Import Connector – Email can retrieve emails
(including attachments) from a server. Kofax Capture then routes the files through
INDICIUS, a set of modules that separate pages into documents, classify documents
and extract information. Within INDICIUS these classification, separation and
extraction results are presented for review by keyboard operators. The accurate,
validated data and images can then be exported to a back-end system using Kofax
Capture and/or INDICIUS depending on the requirements of the system.
Features of INDICIUS
This guide covers two key features of INDICIUS: classification and separation.
What is Classification?
Classification is the process of assigning a type to each document, either to export to
the final repository or to use during extraction. INDICIUS can be configured to
classify documents directly or as a result of page classification and document
separation.
Classification Methods
Classification can be done using one or more of the following methods:
Image Classification: Classification based on the overall layout and structure
of a page, including lines, boxes, logos and placement of text.
Text Classification: Classification based on detailed analysis of the text
content of a page or document.
Rules-Based Classification: Classification performed by searching for specific
data or keywords, independent of layout.
Templated Classification: Classification determined by the presence of one
or more marks, barcodes or items of text in pre-defined locations.
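As a rough illustration of the rules-based method listed above (not INDICIUS's actual implementation or configuration format), the following Python sketch assigns a document type by searching page text for keywords, regardless of where they appear on the page. The rule table, the function name classify_by_keywords and the keyword choices are all hypothetical; only the document type names are taken from the example set used in this guide.

```python
# Hypothetical keyword rules: document type -> keywords that must all appear.
# These keywords are illustrative assumptions, not INDICIUS configuration.
RULES = {
    "Truth In Lending": ["truth in lending", "annual percentage rate"],
    "Appraisal Report": ["appraisal", "market value"],
    "Loan Application": ["loan application"],
}


def classify_by_keywords(page_text, rules=RULES):
    """Return the first document type whose keywords all occur in the text.

    Matching is case-insensitive and ignores layout. Returns None when no
    rule matches, which a real system would treat as "unclassified".
    """
    text = page_text.lower()
    for doc_type, keywords in rules.items():
        if all(keyword in text for keyword in keywords):
            return doc_type
    return None


if __name__ == "__main__":
    sample = "UNIFORM RESIDENTIAL APPRAISAL REPORT ... estimated market value ..."
    print(classify_by_keywords(sample))
```

A first-match strategy keeps the sketch short; a production rule set would also need a way to resolve pages that satisfy more than one rule.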
What is Separation?
Document separation methods provide an automated approach to identifying the
boundaries between multiple documents in a single batch.
Separation Methods
Document separation is determined from the page classification results using either
of the following methods:
Rules-based document separation: One or more rules specify when new
documents are created; for example, if a page of type A is seen, create a
document of type X.
Advanced document separation: A probabilistic method that ascertains the
most likely document structure from the page classifications and their
confidence scores. Because it is probabilistic, this method is robust to
variation in documents and to misclassifications.
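To make the rules-based method concrete, here is a minimal Python sketch. It is not the INDICIUS rule syntax. It walks a list of page classification results and starts a new document whenever a page type appears in a rule table, following the "if a page of type A is seen, create a document of type X" pattern. The START_RULES table, the extra page type B and the separate_pages function are hypothetical.

```python
# Hypothetical rules: page type that starts a document -> document type created.
START_RULES = {
    "A": "X",
    "B": "Y",
}


def separate_pages(page_types, start_rules=START_RULES):
    """Group classified pages into documents.

    A page whose type is in start_rules opens a new document of the mapped
    type; any other page is appended to the current document. Pages seen
    before the first start page go into a document of type None.
    """
    documents = []
    for index, page_type in enumerate(page_types):
        if page_type in start_rules or not documents:
            documents.append({"type": start_rules.get(page_type), "pages": []})
        documents[-1]["pages"].append(index)
    return documents


if __name__ == "__main__":
    for doc in separate_pages(["A", "other", "other", "B", "other"]):
        print(doc)
```

The sketch trusts every page classification; it has no notion of confidence scores, which is the gap the advanced method is described as addressing.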
Classification and Separation of Documents in Production
The Recognition and Document Review modules (along with Kofax Capture Scan)
are used to classify and separate documents.
Recognition
Classification and separation are done in the same processing step, in an instance of
the Recognition module. A single solution would do one of the following:
Document Classification
Page Classification and Separation (resulting in document classification)
If extraction is also being done as part of the queue, an additional instance of the
Recognition module named INDICIUS Recognition (Classification and Separation) is
used for classification and separation.
This leaves the standard instance of Recognition available for extraction.
Note Data extraction is generally done in the standard instance of Recognition, once
all document types have been determined (and manually reviewed if needed).
Document Review
Document Review is usually used after Recognition (Classification and Separation)
to review the automatic classification results. Within Document Review, a user can
confirm any types that Recognition is uncertain about, fix any validation failures (for
example, by changing document type) and review the batch.
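The hand-off between automatic classification and manual review can be pictured as a confidence threshold. The Python sketch below is an assumption made for illustration only: this guide does not describe how Recognition scores documents or what cut-off it applies, so the 0.8 threshold, the 0 to 1 score scale and the needs_review function are hypothetical.

```python
# Hypothetical cut-off; the real criteria used by Recognition are not described here.
REVIEW_THRESHOLD = 0.8


def needs_review(doc_type, confidence, threshold=REVIEW_THRESHOLD):
    """Return True when a classification result should be shown to an operator.

    A document is flagged when it has no type at all or when the confidence
    of its best type falls below the threshold.
    """
    return doc_type is None or confidence < threshold


if __name__ == "__main__":
    results = [
        ("Loan Application", 0.97),
        ("Appraisal Report", 0.41),
        (None, 0.0),
    ]
    for doc_type, confidence in results:
        status = "review" if needs_review(doc_type, confidence) else "accepted"
        print(doc_type, confidence, status)
```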
Scan
As well as obtaining the images, Kofax Capture Scan is used prior to the INDICIUS
modules to do one of the following:
Establish document boundaries using patch code separators or fixed pages
(for a Document Classification solution).
Place imported pages in a single document (for a Page Classification and
Separation solution).
What is the System Architecture for an INDICIUS Solution?
The system architecture is determined by the capacity of the system: high volume or
low volume.
High Volume, Distributed Environment
In high volume environments it is typical to have multiple stations processing
batches, with each station dedicated to running a specific module.
Low Volume, Single Station Environment
In lower volume environments it is possible to run batches through all the modules
on a single station, using Kofax Capture Batch Manager.
Configuring a Classification and Separation Solution
Configuration (that is, setting up the INDICIUS modules to process particular
documents) is a two-step process:
Configure the INDICIUS modules using the INDICIUS configuration tools
and a set of sample documents.
Assign the configuration to a batch class using Kofax Capture
Administration.
The primary INDICIUS tool used for configuration is Transformation Studio. The
tutorial in this guide will step you through the configuration process.
The Tutorial
This guide includes a tutorial on processing documents using the classification and
separation functionality in INDICIUS.
The tutorial works through processing and configuring two solutions:
Document Classification
Page Classification and Separation
The Example Documents
The tutorial uses a set of example mortgage application documents with the
following document types:
Appraisal Report
Header
Funding Transmittal
Redemption
Initial Escrow
Request for Tax Form
Tax Escrow
Truth In Lending
Loan Application
Chapter 2
Installing INDICIUS
Introduction
This chapter provides instructions for installing INDICIUS using the installation
wizard (standard installation).
To install INDICIUS the following items are required:
1 A computer satisfying the system requirements as described in the
Installation Guide (.pdf).
2 An INDICIUS installation CD.
3 A Kofax Capture license hardware key with INDICIUS features enabled.
Note Kofax Capture must be pre-installed and licensed.
INDICIUS is installed to its own program folder. By default this location, referred to
as <Installation Path>, is:
C:\Program Files\INDICIUS\.
Installing INDICIUS for the First Time
Standard Installation
To install INDICIUS
1 Place the INDICIUS installation CD into the CD-ROM drive.
The main installation screen will display.
2 Select INDICIUS and follow the on-screen instructions.
3 To install Document Review, select INDICIUS Document Review and follow
the on-screen instructions.
4 To install Transformation Studio, select Transformation Studio and follow
the on-screen instructions.
Licensing
The Kofax Capture license hardware key controls:
The number of stations of each module that can be run simultaneously.
Any optional features for each module.
The page throughput of Recognition and Scripted Export.
Note The tools (except Recognition Test Tool) are not licensed through the hardware
key.
For more information on licensing, refer to the Installation Guide (.pdf).
Chapter 3
Processing
Introduction
This chapter will introduce you to the INDICIUS modules as they are used in
production. You will use a pre-configured example solution to experience how the
modules run.
The Mortgage Applications Example
The INDICIUS installation includes an example configuration that demonstrates
some of the processing features of classification and separation in INDICIUS. The
example includes INDICIUS module configurations (assigned to pre-defined batch
classes) for capturing data from a set of example images.
The example contains two different configurations which demonstrate two different
methods for using INDICIUS. The first demonstrates document classification, and
the second demonstrates page classification and separation (resulting in document
classification).
Setting Up the Classification and Separation Instance of
Recognition
This section describes how to set up the classification and separation
instance of Recognition.
Registering the Additional Instance
The following steps will guide you through registering an additional instance of
Recognition to be used for classification and separation.
To register the Classification and Separation instance of Recognition
1 Click Start on the taskbar to display the menu, and select All Programs |
Kofax Capture 8.0 | Administration.
2 Select Tools | Custom Module Manager... to open the Custom Module
Manager window.
3 Click Add to open the file selection window and select the following file:
The Import/Export window will display showing the progress of the
unpacking operation.
5 When unpacking has completed, click OK.
The Import window displays showing “Mortgage Apps” in the Available
Batch Classes list.
6 Double-click “Mortgage Apps” to add it to the Selected Batch Classes list.
7 Click Import.
The Import/Export window will display showing the progress of the import
operation.
8 When import has completed, click OK.
9 Repeat the previous steps to import the following batch class:
<Installation Path>\examples\Mortgage Applications\Mortgage Apps with
Separation.cab.
Important For this batch class (or whichever batch class you import last)
select the “Do not import duplicates” option on the Import window. If you
do not select this option, Kofax Capture will rename the pre-configured
document classes during import and the example will not function correctly.
10 Select File | Publish.
The Publish window will display.
11 Press and hold Ctrl and click to select the following batch classes:
Mortgage Apps
Mortgage Apps with Separation
12 Click Publish.
The progress of the publishing operation will be logged in the Results panel.
Note It is normal for a warning to be generated when the batch class is
published. This is because Index Fields are defined to hold exported data but
neither Kofax Capture Validation nor Kofax Capture Recognition Server are
included in the batch class. This warning can be ignored, as the index fields
will in fact be populated by the INDICIUS modules.
Troubleshooting
In a client-server installation of Kofax Capture, an error may be generated when the
batch class is published.
The batch class is configured to store images in the Kofax Capture images folder, and
an error is raised if this folder exists elsewhere on the network. If this is the case, use
the procedure below to specify a different images folder.
To specify a different images folder
1 On the Batch panel, select the “Mortgage Apps” batch class.
2 Right click on the selection to display the menu, and select Properties.
3 The Batch Class Properties window is displayed.
4 Change the image folder by directly entering a path in the “Image folder:”
box, or click Browse to navigate to a folder on disk.
5 Click OK.
6 Repeat the previous steps for the “Mortgage Apps with Separation” batch
class.
7 Select File | Publish.
The Publish window will display.
8 Press and hold Ctrl and click to select the following batch classes:
Mortgage Apps
Mortgage Apps with Separation
9 Click Publish.
The progress of the publishing operation will be logged in the Results panel.
10 When publishing has been completed, click Close.
Viewing the Modules
To view the modules included in the example batch classes
1 On the Batch panel, select the “Mortgage Apps” batch class.
2 Right click on the selection to display the menu, and select Properties.
3 The Batch Class Properties window is displayed.
4 Select the Queues tab.
The modules included in the batch class are displayed in the Selected Queues
list.
5 Click OK.
6 Optionally, repeat the previous steps for the “Mortgage Apps with
Separation” batch class.
Both batch classes contain the same modules, but these modules are
configured differently in each case.
Document Classification
In the document classification example, INDICIUS Recognition (Classification and
Separation) and INDICIUS Document Review are used to classify mortgage
applications. Document boundaries are established prior to INDICIUS Recognition
(Classification and Separation) using patch code separators in Kofax Capture Scan.
The complete set of Kofax Capture and INDICIUS modules used is:
This section will step you through classifying documents automatically using the
example. This document will first take you through typical processing in a low
volume scenario, using Kofax Capture Batch Manager. It will then step you through
the same process, but opening each module as a dedicated application.
Tutorial: Process the Batch from Batch Manager
Create a Batch
To create an example batch in Batch Manager
1 Start Batch Manager by clicking Start on the taskbar to display the menu, and
selecting:
All Programs | Kofax Capture 8.0 | Batch Manager.
2 Select File | New Batch or click Create Batch on the toolbar.
3 Make sure the “Mortgage Apps” batch class is selected.
Figure 3-1. Create Batch Window
4 Enter a name for the new batch in the “Name:” box, for example “Mortgage
Applications 1”.
5 Click Save.
6 Click Close.
Your batch is displayed in the list. The Queue column indicates that the
batch is ready to be processed by Kofax Capture Scan.
Import Images and Establish Document Boundaries
To import images and establish document boundaries
1 Make sure the name of the new batch is highlighted and select File | Process
Batch or click Process Batch on the toolbar.
The batch is opened in Kofax Capture Scan.
2 Click Scan Batch to display an Import window.
3 Select the images in the following location.
4 Click Open to import the images and establish document boundaries.
Kofax Capture Scan detects the patch code separators placed between each
document and uses this to create the full batch structure (shown in the tree
view).
5 Select Batch | Close and click Yes to the message box.
In Batch Manager, the Queue column indicates that the batch is ready to be
processed by the Classification and Separation instance of Recognition.
Classify the Documents
Use the Classification and Separation instance of the Recognition module to
automatically classify the documents. Any documents that cannot be recognized
confidently will be displayed in the next stage, in Document Review.
To classify the documents, click Process Batch on the toolbar in Batch
Manager.
Recognition will automatically begin processing the batch. Information messages
will be displayed and the “Docs processed” count should increment. When “Docs
processed” reaches 12, Recognition will close.
In Batch Manager, the Queue column indicates that the batch is ready to be
processed by INDICIUS Document Review.
Review the Classification Results
Use INDICIUS Document Review to confirm any document types that Recognition is
unsure about, or fix problems with any documents which are failing validation rules.
To review the automatic classification results
1 Click Process Batch on the toolbar in Batch Manager.
2 Wait for the batch to be automatically loaded into Document Review.
Document Review will launch with the batch open in the module’s
Document Classification view. This is used to quickly set any missing
document types or confirm any that are uncertain. If nothing can be quickly
fixed in this view, the problems can be overridden and will then display in
the Review view where you can see all the documents in the batch.
Figure 3-2. The Batch Loaded in Document Review
3 Select Document Classification | Override Problem to ignore the problem for
now.
A message will display stating that there are no more problem documents to
display in Document Classification view so the batch will now be shown in
Review.
Note You can also press F7 to override a problem.
Figure 3-3. Transition from Document Classification view to Review
4 Click OK on the message.
The batch will open in Review, with the overridden document displayed. It
has failed a validation rule (shown in the yellow message); a problem which
must be fixed before the batch can be closed.
Figure 3-4. The Batch in Review
In Review, you can see all the documents in the batch. Although the best
classification result is “Appraisal Report,” the document is poor quality and
appears to be upside down.
5 Press F3 to rotate the image by 180°.
You can see that this is a very poorly scanned Truth In Lending document.
The next document in the batch is also a Truth In Lending.
6 Click the + buttons to the left of the two documents to expand them and
display the thumbnails.
Figure 3-5. The Two Documents with Thumbnail Images
7 Compare the two documents.
They appear to be the same document (they have the same loan number in
the top left). The first of the two documents can therefore be deleted.
8 Right click on the problem document to display the context menu