Nuance OMNIPAGE PRO 14, SCANSOFT OMNIPAGE PRO 14, OmniPage Pro - 14.0 User Manual


LEGAL NOTICES

Copyright © 2003 ScanSoft, Inc. All rights reserved. No part of this publication may be transmitted, transcribed, reproduced, stored in any retrieval system or translated into any language or computer language in any form or by any means, mechanical, electronic, magnetic, optical, chemical, manual, or otherwise, without prior written consent from ScanSoft, Inc., 9 Centennial Drive, Peabody, Massachusetts 01960. Printed in the United States of America and in Ireland.

The software described in this book is furnished under license and may be used or copied only in accordance with the terms of such license.

IMPORTANT NOTICE

ScanSoft, Inc. provides this publication "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability or fitness for a particular purpose. Some states or jurisdictions do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you. ScanSoft reserves the right to revise this publication and to make changes from time to time in the content hereof without obligation of ScanSoft to notify any person of such revision or changes.

TRADEMARKS AND CREDITS

ScanSoft, OmniPage, OmniPage Pro, PaperPort, Pagis, True Page, Direct OCR, RealSpeak and ASR-1600 are registered trademarks or trademarks of ScanSoft, Inc., in the United States and/or other countries. All other company names or product names referenced herein may be the trademarks of their respective holders.

THIRD PARTY LICENSES/NOTICES

Please see acknowledgements/notices at the end of this guide.

ScanSoft, Inc.
9 Centennial Drive
Peabody, MA 01960
U.S.A.

ScanSoft Belgium BVBA
Guldensporenpark 32
BE-9820 Merelbeke
Belgium

Part Number 50-281A-10119

CONTENTS

WELCOME    7
  Using this Guide    8
  Getting online Help    9
  Online HTML Help    9
  Context-Sensitive Help    9
  Tech Notes    10
  Glossary    10
  When to go online    10

1 INSTALLATION AND SETUP    11
  System requirements    12
  Installing OmniPage Pro    13
  Setting up your scanner with OmniPage Pro    14
  How to start the program    16
  Registering your software    17
  New features in OmniPage Pro 14    17

2 INTRODUCTION    19
  What is optical character recognition    20
  OmniPage Pro’s OCR capabilities    20
  Documents in OmniPage Pro    21
  Basic processing steps    21
  The OmniPage Desktop    22
  The Menu bar    23
  The Toolbars    23
  The Image Panel    24
  The Text Editor    24
  The OmniPage Toolbox    25

OmniPage Pro User’s Guide

iii

 

  Managing documents    26
  Thumbnails    26
  Document Manager    27
  Customizing Document Manager columns    28
  Deleting pages from a document    28
  Printing a document    29
  Closing a document    29
  OmniPage Documents    29
  Why save to OPD    30
  How to save to OPD    30
  How to load an OPD    31
  Settings    31

3 PROCESSING DOCUMENTS    33
  Quick Start Guide    34
  Loading and recognizing sample image files    34
  Scanning and recognizing a single page    34
  Processing overview    36
  Automatic processing    38
  Stopping and restarting automatic processing    39
  Manual processing    40
  Combined processing    41
  Processing with workflows    43
  Processing from other applications    44
  How to set up Direct OCR    44
  How to use Direct OCR    45
  How to use OmniPage Pro with PaperPort    46
  Processing with the Batch Manager    47
  Defining the source of page images    48
  Input from image files    48
  Input from scanner    49
  Scanning with an ADF    50
  Scanning without an ADF    51
  Describing the layout of the document    51
  Zones and backgrounds    53
  Automatic zoning    53
  Manual zoning    54
  Zone types and properties    55
  Working with zones    57
  Speed zoning    59
  Table grids in the image    59
  Using zone templates    61

4 PROOFING AND EDITING    63
  The editor display and views    64
  Proofreading OCR results    65
  Verifying text    67
  User dictionaries    68
  Languages    69
  Training    69
  Manual training    70
  IntelliTrain    70
  Training files    71
  Text and image editing    73
  On-the-fly editing    75
  Reading text aloud    76

5 SAVING AND EXPORTING    79
  Saving OmniPage Documents    80
  Export Results button    80
  Saving original images    81
  Saving recognition results    82
  Selecting a formatting level    83
  Selecting converter options    85
  Using multiple converters    86
  Saving to PDF    88
  Converting from PDF    89
  Copying pages to Clipboard    89
  Sending pages by mail    90
  Other export targets    92


6 WORKFLOWS    93
  Workflows    94
  Sample workflows    94
  Running workflows    96
  Workflow Assistant    98
  Creating workflows    98
  Modifying workflows    101
  Batch Manager    101
  Creating new jobs    102
  Modifying jobs    103
  Managing and running jobs    103
  Watched folders    104
  Barcode driven workflows    106
  Voice recognition    107

7 TECHNICAL INFORMATION    109
  Troubleshooting    110
  Solutions to try first    110
  Testing OmniPage Pro    111
  Increasing memory resources    112
  Increasing disk space    112
  Text does not get recognized properly    113
  Problems with fax recognition    114
  System or performance problems during OCR    114
  Supported file types    115
  File types for opening and saving images    115
  File types for saving recognition results    116
  Uninstalling the software    118


Welcome

Welcome to this OmniPage Pro® text recognition program, and thank you for choosing our software! The following documentation has been provided to help you get started and give you an overview of the program.

This User’s Guide

This guide introduces you to using OmniPage Pro 14. It includes installation and setup instructions, a description of the program’s commands and working areas, task-oriented instructions, ways to customize and control processing, and technical information. The guide is presented in PDF format, allowing you to use hyperlink jumps on cross-references and other navigation tools in your PDF viewer.

Online Help

OmniPage Pro’s online Help contains information on features, settings, and procedures. The online Help is provided as HTML help, and has been designed for quick and easy information retrieval. Comprehensive context-sensitive help aims to provide just enough assistance to let you keep working without delay. See “Getting online Help” on page 9.

Readme File

The Readme file contains last-minute information about the software. Please read it before using OmniPage Pro. To open this HTML file, choose Readme in the OmniPage Pro Installer or afterwards in the Help menu.

Scanning and other information

ScanSoft’s web site at www.scansoft.com provides timely information on the program. The Scanner Guide contains updated information about supported scanners and related issues; ScanSoft tests the 25 most widely used scanner models. Access ScanSoft’s web site from the OmniPage Pro Installer or afterwards from the Help menu.

Using this Guide

This guide is written with the assumption that you know how to work in the Microsoft Windows environment. Please refer to your Windows documentation if you have questions about how to use dialog boxes, menu commands, scroll bars, drag and drop functionality, shortcut menus, and so on.

We also assume you are familiar with your scanner and its supporting software, and that the scanner is installed and working correctly before it is set up with OmniPage Pro 14. Please refer to the scanner’s own documentation as necessary.

The following conventions are used in this guide:

Bold
Introduces new terms and presents sub-headings.

Italic
Names topics in the online Help system. Presents longer option texts in dialog boxes.

Non-serif
Presents file names: sample.tif

Note
A note presents an item of additional information.

Tip
A tip presents ideas for using program features to accomplish specific tasks.

Office icon
OmniPage Pro 14 Office is a version of the product designed for more intensive use and is tailored to office environments. Its added features are denoted through the guide by this symbol. For a concise listing, see “New features in OmniPage Pro 14” on page 17.


Getting online Help

In addition to using this guide, you can use OmniPage Pro’s online Help to learn about features, settings, and procedures. Online Help is available after you install OmniPage Pro.

Online HTML Help

Open OmniPage Pro’s online Help at its top level by choosing Help Topics at the top of the Help menu. This allows you to see topics arranged in a Table of Contents, search an alphabetical list of keywords or make full-text searches through the topics. Other items in the Help menu provide access to useful topics or web pages.

Press F1 as you are working with the program to see an online help topic relating to the current screen area, dialog box or warning message.

Context-Sensitive Help

You can get concise on-the-spot information in a popup window about a particular OmniPage Pro menu item, toolbar button, screen area or dialog box, in the following ways:

Click the Help tool in the Standard toolbar to get the help icon. Click this on any item on the desktop outside a dialog box or warning message.

Press Shift + F1 to get the same help icon. Use Shift + F1 to get context-sensitive help for shortcut menu items.

Click the question mark button in the upper right corner of a dialog box and then click an item in the dialog box to see the popup window.

Some dialog boxes or warning messages have their own Help button, or a help text. Click the button or the text to get information on the dialog or message box.

Click anywhere to remove a context-sensitive popup Help window.


Tech Notes

ScanSoft’s web site at www.scansoft.com contains Tech Notes on commonly reported issues using OmniPage Pro 14. Web pages may also offer assistance on the installation process and troubleshooting.

Glossary

This guide does not include a glossary. The online Help has a comprehensive glossary, with its own alphabetical index and a table of contents. Please consult it if you want to find the meaning of a term used in this guide or in the program.

When to go online

This guide concentrates on providing background understanding of program features and suggests what they can be useful for. The online Help provides mainly numbered procedures. Turn to online Help for the following items or for greater detail on the following topics:

Keyboard guide

Settings guidelines

Manual training

Export Converter options

Using the Text Editor

On-the-fly zoning and editing


Chapter 1

Installation and setup

This chapter provides information on installing and starting OmniPage Pro 14. It presents the following topics:

System requirements

Installing OmniPage Pro

Setting up your scanner with OmniPage Pro

How to start the program

Registering your software

New features in OmniPage Pro 14


System requirements

The minimum requirements to install and run OmniPage Pro 14 are:

A computer with an Intel® Pentium® III processor or equivalent

Microsoft® Windows® 98 (from second edition), Windows Me, Windows NT® 4.0 (from Service Pack 6), Windows 2000 (from Service Pack 2), Windows XP or Windows Server 2003

Microsoft Internet Explorer 5.01 with at least Service Pack 2

128MB of memory (RAM), 256MB recommended

135MB of free hard disk space for application and sample files plus 40-45MB working space during installation. Additionally:

20-67 MB per RealSpeak™ module (343 MB for 9 languages)

2 MB per ASR speech recognition language (15MB for 7 languages) *

18 MB for ScanSoft PDF Converter *

3.4 MB for ScanSoft PDF Printer Driver *

5MB for Microsoft Installer (MSI) if not present (it is included in most Windows operating systems)

Up to 5MB for system updates

An SVGA monitor with 256 colors, but preferably 16-bit color (called High Color in Windows 2000 and Medium Color in XP) and a resolution of at least 800 x 600 pixels

A CD-ROM drive for installation

A Windows compatible pointing device

A compatible scanner with its own scanner driver software, if you plan to scan documents. See the Scanner Guide at ScanSoft’s web site (www.scansoft.com) for a list of supported scanners

Web access is needed for product registration, Scanner Wizard database updating and obtaining live updates for the program.

*Supplied with OmniPage Pro 14 Office only.

Performance and speed will be enhanced if your computer’s processor, memory, and available disk space exceed minimum requirements.
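The disk space figures listed above can be added up to get a rough worst-case estimate. The following Python sketch is only an illustration: it assumes the listed figures simply add together for a complete installation, and the function and constant names are hypothetical, not part of OmniPage Pro.

```python
# Disk-space figures in MB, copied from the requirements list above.
BASE_MB = 135             # application and sample files
INSTALL_WORKING_MB = 45   # upper bound of the 40-45 MB working space
REALSPEAK_ALL_MB = 343    # all 9 RealSpeak language modules
ASR_ALL_MB = 15           # all 7 ASR languages (Office only)
PDF_CONVERTER_MB = 18     # ScanSoft PDF Converter (Office only)
PDF_PRINTER_MB = 3.4      # ScanSoft PDF Printer Driver (Office only)
MSI_MB = 5                # needed only if Microsoft Installer is absent
SYSTEM_UPDATES_MB = 5     # upper bound for system updates


def estimate_disk_mb(office: bool, msi_present: bool = True) -> float:
    """Return a worst-case disk estimate in MB for a complete installation."""
    total = BASE_MB + INSTALL_WORKING_MB + REALSPEAK_ALL_MB + SYSTEM_UPDATES_MB
    if office:
        total += ASR_ALL_MB + PDF_CONVERTER_MB + PDF_PRINTER_MB
    if not msi_present:
        total += MSI_MB
    return round(total, 1)


if __name__ == "__main__":
    print("Standard edition:", estimate_disk_mb(office=False), "MB")
    print("Office edition:", estimate_disk_mb(office=True), "MB")
```

A custom installation that leaves out language modules would need less space than this estimate.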


Installing OmniPage Pro

OmniPage Pro 14’s installation program takes you through installation with instructions on every screen.

Before installing OmniPage Pro:

Close all other applications, especially anti-virus programs.

Log into your computer with administrator privileges if you are installing on Windows NT, 2000, XP or Server 2003.

If you own a previous version of OmniPage Pro, or if you are upgrading from demonstration software or an OmniPage Special Edition, the installer asks your consent to uninstall that product.

To install OmniPage Pro:

1. Insert OmniPage Pro’s CD-ROM in the CD-ROM drive. The installation program should start automatically. If it does not start, locate your CD-ROM drive in Windows Explorer and double-click the Autorun.exe program at the top-level of the CD-ROM.

2. Choose a language to use during installation. Accept the End-User License Agreement and enter the serial number shown on the CD envelope.

3. Choose a complete or a custom installation. A complete installation installs all RealSpeak™ Text-to-Speech language modules (currently 9). In OmniPage Pro 14 Office, up to 7 ASR-1600™ Speech Recognition modules are installed. Custom installation lets you exclude or add modules. To exclude a module, click its down arrow and select ‘This feature will be installed when required’.

4. Follow the instructions on each screen to install the software. All files needed for scanning are copied automatically during installation.

Sometimes uninstalling and then reinstalling OmniPage Pro will solve a problem. See “Uninstalling the software” on page 118.

You can use the Control Panel’s Add/Remove Programs facility to add or remove RealSpeak or ASR modules later. You will need your installation CD for this.


Setting up your scanner with OmniPage Pro

All files needed for scanner setup and support are copied automatically during the program’s installation, but no scanner setup occurs at installation time. Before using OmniPage Pro 14 for scanning, your scanner should be installed with its own scanner driver software and tested for correct functionality. Scanner driver software is not included with OmniPage Pro.

Scanner setup is done through the Scanner Setup Wizard. You can start this yourself, as described below. Otherwise, it appears when you first attempt to perform scanning. Proceed as follows:

Choose Start > All Programs > ScanSoft OmniPage Pro 14.0 > Scanner Wizard

or click the Setup button in the Scanner panel of the Options dialog box.

or choose Scan in the Get Page drop-down list in the OmniPage Toolbox and click the Get Page button.

The Scanner Setup Wizard starts. If you have a web connection, the first panel invites you to update the scanner database supplied with the wizard. Choose Yes or No and click on Next.

Choose ‘Select and test scanner or digital camera’, then click Next. If you have a single installed scanner, it appears, along with any scanners previously set up with OmniPage Pro. If the required scanner is not listed, click Add Scanner...

You see a list of all detected scanner drivers in the checkmarked categories. This can include network devices. Select one and click OK. To install a second device, you must run the Scanner Wizard again.

The wizard reports whether the chosen scanner model already has settings in the scanner database. If it does, you do not need to test it. If it does not, you should test it. Click on Next.

If you chose not to test, click Finish. If you chose testing, click Next to have the scanner connection tested. If the connection is in order, you see a menu of further tests. Choose which testing steps you want to run. The Basic test scan is recommended.


By default OmniPage Pro uses its own scanning interface, located in the Scanner panel of the Options dialog box. If you want to use your scanner’s own interface instead, choose Advanced settings and select this. Choose Modify hints only if you are experienced in configuring scanners or have been advised by Technical Support to do so.

Click Next to start the tests. For the Basic scan test, insert a test page into your scanner. The wizard will scan using your scanner manufacturer’s software. Click on Next. Your scanner’s native user-interface will appear.

Click on Scan to begin the sample scan.

If necessary, click on Missing Image… or Improper Orientation... and make the appropriate selections.

Once the image appears correctly in the window, click on Next.

Move through the remaining requested tests, following the instructions on the screen.

When all the requested tests have been completed successfully, the Scanner Wizard reports this and invites you to click Finish.

You have successfully configured your scanner to work with OmniPage Pro 14!

To change the scanner settings at a later time, or to set up or remove a scanner, reopen the Scanner Setup Wizard from the Windows Start menu or from the Scanner panel of the Options dialog box.

To test and repair an improperly functioning scanner, open the wizard and select ‘Test the current scanner or digital camera’ in the second panel, then work through the procedure described above, using any advice received from Technical Support.

To specify a different default scanner, open the wizard to reach the list of scanners that have been set up. Move the highlight to the desired scanner and be sure to close the wizard with Finish.

To get updated settings for your current scanner, open the wizard, request a fresh database download in the first screen, then choose ‘Use current settings with current device’, click Next and then Finish.


How to start the program

To start OmniPage Pro 14 do one of the following:

Click Start in the Windows taskbar and choose All Programs > ScanSoft OmniPage Pro 14.0 > OmniPage Pro 14.0.

Double-click the OmniPage Pro icon in the program’s installation folder or on the Windows desktop if placed there.

Double-click an OmniPage Document (OPD) icon or file name; the clicked document is loaded into the program. See “OmniPage Documents” on page 29.

Right-click one or more image file icons or file names for a shortcut menu. Select Open With... OmniPage Pro application. The images are loaded into the program.

On opening, OmniPage Pro’s title screen is displayed and then its desktop. See “The OmniPage Desktop” on page 22 for an introduction to the program’s main working areas.

There are several ways of running the program with a limited interface:

Use the Batch Manager program. Click Start in the Windows taskbar and choose All Programs > ScanSoft OmniPage Pro 14.0 > OmniPage Batch Manager. See page 47.

Click Acquire Text from the File menu of an application registered with the Direct OCR™ facility. See “How to set up Direct OCR” on page 44.

Right-click on one or more image file icons or file names for a shortcut menu. Select OmniPage Pro 14 and choose a target format or a workflow from its sub-menu. The files will be processed according to the workflow instructions. See page 96.

Click the OmniPage icon on the taskbar. Choose a workflow to start the program and run the workflow. In OmniPage Pro 14 Office, voice selection of workflow is possible.

Use OmniPage Pro 14 with ScanSoft’s PaperPort® or Pagis® document management products, to add OCR services. See “How to use OmniPage Pro with PaperPort” on page 46.


Registering your software

ScanSoft’s online registration runs at the end of installation. Please ensure web access is available. We provide an easy electronic form that can be completed in less than five minutes. When the form is filled, click Submit. If you did not register the software during installation, you will be periodically invited to register later. You can go to www.scansoft.com to register online. Click on Support and from the main support screen choose Register in the left-hand column. For a statement on the use of your registration data, please see ScanSoft’s Privacy Policy.

New features in OmniPage Pro 14

The OmniPage® product family is augmented by OmniPage Pro 14. If you are upgrading, you may not need to consult this guide very much. Here are some main areas of innovation compared to OmniPage Pro 12. Features unique to OmniPage Pro 14 Office have the Office icon:

Higher accuracy
A new recognition and parsing engine, four years in development, delivers even better OCR accuracy. See page 31.

Improved layout retention
This engine also delivers superior page layout retention with True Page and Flowing Page formatting levels. Text flows better round irregular pictures. See page 83.

Workflows and instant access
Save workflows to perform recurring tasks without having to take care of individual settings each time. Run workflows instantly from your taskbar. See page 93.

Drag-and-drop recognition
Select a zone and drag it into the working area of a target application. Graphics are pasted as image, text is recognized and pasted. See page 89.

Adjustable recognition speed
Recognition performance can be optimized for greater speed or greater accuracy. On good quality documents even faster processing gives good results. See page 31.

Speed zoning
Do manual zoning quickly. See auto-detected zones and double-click on any of them to turn them into real zones. See page 59.

Financial dictionary
An English financial dictionary is added to the existing legal and medical professional dictionaries, available for several languages. See page 31.

Character validation
Validate individual accented letters for recognition, in addition to those enabled by the language choice. See page 31.

Bullets and numbering
Bulleted and numbered paragraphs can be detected. Bullets and numbering can be inserted, removed and edited in the Text Editor. See page 73.

More portable OPDs
Save to OmniPage Document (Extended) to have training files, user dictionaries or zone templates embedded in the OPD. See page 29.


Colored backgrounds
Get better recognition of text printed on color or shaded backgrounds. There is also improved noise removal. See page 49.

Resolution control
Choose the resolution for saved page images and for images embedded in recognized pages. See page 85.

Improved proofing system
The two parts of words hyphenated at line ends are now joined. The image viewer and the verifier display both image parts. See page 65.

Concurrent saving
Create multiple converters to save to more than one file type in one step: for example save page images and recognized pages. See page 86.

Audio book publishing
Save recognized texts as WAV audio files. Transfer these to CD to have scanned documents read aloud anytime - even on the move. See page 116.

Voice read-back
ScanSoft RealSpeak as the Text-to-Speech solution provides better quality voices in more languages for having text read aloud. See page 76.

Batch Manager
Redesign of the previous Schedule OCR facility gives more control and better overview for unattended processing of pre-defined jobs. See page 101.

Smart Folders
Jobs can take input from watched folders, with better handling of multiple folders. Background processing runs whenever image files are sent to these folders. Recurring jobs are supported. See page 104.

Barcode cover pages
Start a workflow by placing a barcode cover page in a scanner on top of your document. Cover page image files can start image file workflows. See page 106.

Greater PDF support
Generate tagged, signed and encrypted Portable Document Files. Allow reading of tags when PDF files are opened to improve layout retention. See page 88.

Open PDF files in MS Word
ScanSoft enables PDF files to be converted to Word documents when working in Microsoft Word, without using OmniPage Pro. See page 89.

Export to Office 2003
Support for Microsoft Word 2003 (Word ML). In OmniPage Pro 14 Office support is added for Microsoft Reader (.LIT) and maintained for the XML and eBook file types. See page 116.

Voice control
Start workflows and control proofing corrections by voice commands in a number of languages, using the included ScanSoft ASR-1600 voice recognition modules. See page 107.

SharePoint, DMS and FTP
Take image files from Microsoft SharePoint, any ODMA-compliant Document Management System (DMS) or an FTP site and export files to these locations for storage or group use. See page 92.

Print to PDF functionality
Create searchable, editable PDF files from text files, such as Word documents. OmniPage Pro installs a PDF printer driver that becomes available in all your print-capable applications. See page 88.

A more complete list of features and differences appears in online Help.


Chapter 2

Introduction

You probably use your computer for business correspondence, preparing reports, handling data and an ever-increasing number of other uses. The challenge is that, in spite of the digital revolution, certain sources of information still circulate in printed, paper form and cannot be used immediately in a computer.

For example, if you want to incorporate information from a magazine article in a report you are preparing, you somehow have to get the text from the article into your computer. Painstakingly retyping the article is not an appealing solution.

This chapter introduces you to the solution: optical character recognition (OCR). It describes how OmniPage Pro 14 uses OCR technology to transform text from scanned pages or image files into editable text for use in your favorite computer applications.

We present the following topics:

What is optical character recognition

Documents in OmniPage Pro

Basic processing steps

The OmniPage Desktop

Managing documents

OmniPage Documents

Settings


What is optical character recognition

Optical character recognition is the process of extracting text from an image. This image can result from scanning a paper document or opening an electronic image file. Images do not have editable text characters; they have many tiny dots (pixels) that together form character shapes. These present a picture of the text on a page.

During OCR, OmniPage Pro analyzes the character shapes in an image and defines solutions to produce editable text. After OCR, you can save the resulting text to a variety of word-processing, desktop publishing or spreadsheet applications.
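To make the idea of recognizing character shapes from pixels concrete, here is a deliberately tiny Python sketch. It is not how OmniPage Pro works internally; it assumes a naive approach in which each character is a 5 x 3 grid of pixels and the best match is the stored shape that agrees with the input on the most pixels. All names in it are hypothetical.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Each glyph is a 5x3 bitmap; '#' is a dark pixel, '.' is a light one.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def pixel_matches(a, b):
    """Count the positions where two equally sized bitmaps agree."""
    return sum(
        pa == pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize(glyph):
    """Return the template character that agrees with the glyph on most pixels."""
    return max(TEMPLATES, key=lambda ch: pixel_matches(TEMPLATES[ch], glyph))


# A slightly noisy "T": one extra dark pixel in the second row.
noisy_t = ["###", "##.", ".#.", ".#.", ".#."]

if __name__ == "__main__":
    print(recognize(noisy_t))
```

The point of the sketch is that the input is only a picture of a letter; the editable character is chosen by comparing shapes.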

OmniPage Pro’s OCR capabilities

In addition to text recognition, OmniPage Pro can retain the following elements of a document through the OCR process.

Graphics

Photos, logos, and drawings are examples of graphics.

Text formatting

Font types, sizes and styles (such as bold, italic and underlines) are examples of character formatting. Indents, tabs, margins and line spacing are examples of paragraph formatting.

Page formatting

Column structure, table formats, and placement of graphics and headings are examples of page formatting.

The graphics, text and page formatting elements that OmniPage Pro retains are determined by the settings you select. Refer to the Settings Guidelines in the online Help for more information about selecting settings.

OmniPage Pro recognizes only machine-generated characters, such as offset-printed, laser-printed or typewritten text. However, it can retain handwritten text, such as a signature, as a graphic.


Documents in OmniPage Pro

OmniPage Pro 14 handles documents one at a time. When you acquire your first image (from scanner or from file) a new document is started. Further acquired images are added to the same document, until you save and close it.

A document in OmniPage Pro consists of one image for each document page. After you perform OCR, the document will also contain recognized text, displayed in the Text Editor, possibly along with graphics and tables. See “The OmniPage Desktop” on page 22.

Basic processing steps

There are three ways of handling documents: with automatic, manual or workflow processing. See “Automatic processing” on page 38, “Manual processing” on page 40 and “Workflows” on page 94. The basic steps for all processing methods are broadly the same:

1. Bring a set of images into OmniPage Pro.

You can scan a paper document with or without an Automatic Document Feeder (ADF) or load one or more image files. The resulting images can appear as thumbnails in the Image Panel along with the image of the first page entered. The document pages are summarized in the Document Manager. See “Defining the source of page images” on page 48.

2. Perform OCR to generate editable text.

During OCR, OmniPage Pro creates zones around elements on the page that will be processed, and then interprets text characters or graphics in each zone. Manual and template zoning are also possible. After OCR, you can check and correct errors in the document using the OCR Proofreader and edit the document in the Text Editor.

3. Export the document to the desired location.

You can save your document to a specified file name and type, place it on the Clipboard, send it as a mail attachment or publish it. You can save it as an OmniPage Document (OPD) as described later. You can save the same document repeatedly to different destinations, different file types, with different settings and levels of formatting. See “Saving and exporting” on page 79.
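The three steps above can be pictured as a small data model. The Python sketch below is purely illustrative: the `Document` and `Page` classes, the `fake_ocr` function and the file names are hypothetical stand-ins and do not correspond to any OmniPage Pro programming interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Page:
    image: str                  # stand-in for a page image, e.g. a file name
    text: Optional[str] = None  # set once the page has been recognized


@dataclass
class Document:
    pages: List[Page] = field(default_factory=list)

    def acquire(self, images):
        """Step 1: each acquired image becomes a new page of the document."""
        self.pages.extend(Page(image) for image in images)

    def recognize(self, ocr: Callable[[str], str]):
        """Step 2: run OCR on every page that has no text yet."""
        for page in self.pages:
            if page.text is None:
                page.text = ocr(page.image)

    def export(self) -> str:
        """Step 3: join the recognized pages into one output string."""
        return "\n".join(p.text for p in self.pages if p.text is not None)


def fake_ocr(image: str) -> str:
    """Placeholder recognizer; a real engine would analyze the pixels."""
    return f"text of {image}"


doc = Document()
doc.acquire(["page1.tif", "page2.tif"])
doc.recognize(fake_ocr)
result = doc.export()
```

Because recognition skips pages that already have text, further images can be acquired into the same document and recognized later without redoing earlier pages.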


Figure: The OmniPage Desktop. Callouts: the Standard toolbar; the OmniPage Toolbox; the Image toolbar; the page navigation buttons; the thumbnails, which show a picture of each page in the document (the current page has an “eye” icon, and one callout marks a page that has been recognized); and the buttons to show or hide the Document Manager, Text Editor and the Image Panel’s thumbnails and current page display (this can also be done from the View menu).

The OmniPage Desktop

The OmniPage Desktop has a title bar and a menu bar along the top and a status bar along the bottom. It has three main working areas, separated by splitters: the Document Manager, the Image Panel and the Text Editor. Each has close, maximize and restore buttons top right. The Image Panel has an Image toolbar and the Text Editor has a Formatting toolbar.

Callouts in the desktop figure: the Formatting toolbar; the splitters, which you drag to resize the working areas; the Image Panel, which is displaying the image of the current page together with its zones (it can display the current page, thumbnails, or both); the Text Editor view buttons, which offer three formatting levels; and the Text Editor, which is displaying the recognition results from the current page in True Page view.


We show the program with a three-page document. Page one is the current page, which has been recognized and proofed. Page two has been recognized but not proofed yet. Page three has been acquired and manually zoned, but not recognized yet. The icons at the bottom of the thumbnail images show page status.

Status bar buttons let you show or hide the main screen areas and move to other pages in the document. A right mouse click in any screen area brings up a shortcut menu with the most useful commands for that area.

The Menu bar

For concise information on any menu item, click the context-sensitive help button and then click a menu item. A popup text explains the purpose of the menu item. Click anywhere to close the popup.

The Toolbars

The program has three main toolbars; all can be floated. Use the View menu to show, hide or customize them. Context-sensitive help explains the purpose of all tools. Two further toolbars govern specific tasks.

| Toolbar | Default location | Other docking locations | Purpose |
|---|---|---|---|
| Standard | Horizontal under Menu bar | Any edge of the OmniPage Desktop | Performing basic program functions. See page 29 and page 65. |
| Image | Vertically to left of current page image | Vertically to right of current page image | Image, zoning and table operations. See page 53 and page 59. |
| Formatting | Horizontal at top of Text Editor | None | Formatting recognized text in the Text Editor. See page 73. |
| Verifier | Hover the cursor over the verifier window to see this floating toolbar. | | Controlling the location and appearance of the verifier. See page 67. |
| Reorder | Click the Change reading order tool. This toolbar replaces the Formatting toolbar. | | Modifying the order of elements in recognized pages. See page 73. |


The Image Panel

When this displays the current page image, the Image toolbar is available. All page images have a background value: process or ignore. Zones can be manually drawn on page images, or can be placed automatically after recognition. There are five zone types: Process, Ignore, Text, Table, Graphics. Areas inside process zones and on a process background outside other zones have zones automatically drawn and their zone types determined during processing. See “Zones and backgrounds” on page 53.
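The interplay of zone types and page background can be sketched as a small lookup. This is only an illustration of the rule stated above, not OmniPage Pro's implementation: the `Zone` class, the pixel coordinates and the function `is_auto_zoned` are hypothetical, and the handling of overlapping zones (the zone listed last wins) is an assumption made for the sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ZoneType(Enum):
    PROCESS = "process"
    IGNORE = "ignore"
    TEXT = "text"
    TABLE = "table"
    GRAPHICS = "graphics"


class Background(Enum):
    PROCESS = "process"
    IGNORE = "ignore"


@dataclass(frozen=True)
class Zone:
    """A rectangular zone on the page image (hypothetical model)."""
    kind: ZoneType
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


def is_auto_zoned(x: int, y: int, zones: List[Zone], background: Background) -> bool:
    """True if the point lies in an area where zones are drawn automatically."""
    covering = [zone for zone in zones if zone.contains(x, y)]
    if not covering:
        # Outside all zones, the page background decides.
        return background is Background.PROCESS
    # Assumption: when zones overlap, the one listed last applies.
    return covering[-1].kind is ZoneType.PROCESS
```

A point inside a Text, Table, Graphics or Ignore zone is never auto-zoned in this sketch, because its handling has already been fixed by the zone type.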

If the current page image is hidden, the thumbnails appear in rows to make the best use of the available space.

The Text Editor

This displays recognition results in any of three formatting levels:

No Formatting view (NF)

Retain Fonts and Paragraphs view (RFP)

True Page (TP)

The True Page® formatting level retains page layout using text, table and picture boxes, and frames. It can display multicolumn areas, to show text blocks that can be treated as flowing columns at export time. True Page is also an export formatting level, along with Flowing Page that retains page layout without boxes and frames. See page 64.

In both the Image Panel and the Text Editor, the shortcut menu (right mouse click) allows you to zoom in or out on the display. The Standard toolbar also provides a zoom control.


The OmniPage Toolbox

This Toolbox lets you drive the processing. By default it is located along the top of the OmniPage Desktop, just above the working areas. It can be floated and also be docked along the bottom of the desktop.

[Figure: the OmniPage Toolbox. Callouts: Start/Stop button; Get Page button; Perform OCR button; Export Results button; Get Pages drop-down list; Layout Description drop-down list; Export Results drop-down list; Workflow drop-down list with two sample workflows and a user-defined one.]

Automatic processing is started, and can be stopped and re-started with the Start button when “1-2-3” is selected in the Workflow drop-down list. See “Automatic processing” on page 38.

Manual processing allows you to process documents page-by-page and step-by-step. Start each step (again with “1-2-3” selected) with the three large buttons: the Get Page button (1), the Perform OCR button (2) and the Export Results button (3). See “Manual processing” on page 40.

You can switch between automatic and manual processing any time the program is not busy with processing. That means you can switch between them while you are working within a document. You can automatically process some pages, then add more pages with manual processing. After processing a stack of pages automatically, you can inspect the results and then go back to reprocess certain pages manually. This procedure is described in chapter 3. See “Combined processing” on page 41.

Workflow processing is designed for performing repeated tasks efficiently. Select New Workflow... in the Workflow drop-down list and click Start. The Workflow Assistant helps you define a workflow that can be saved for repeated use. See “Workflow Assistant” on page 98.


Managing documents

Document management can be done by thumbnails in the Image Panel or by the Document Manager, situated along the bottom of the OmniPage Desktop. Both summarize the pages in the document and are synchronized. Our pictures show these with the same seven-page document. Pages 1 and 2 are selected and page 4 is the current page, that is, the one shown in the Image Panel. Page status is shown as follows:

| Page | Status | Page image has been... |
|---|---|---|
| 1 | Acquired | acquired but has not yet been recognized. |
| 2 | Recognized | recognized, but not proofread, or proofing was interrupted on the page. |
| 3 | Recognized, proofed | recognized, and proofing has reached the end of the page. |
| 4 | Modified | recognized with at least one editing or formatting change made in the Text Editor. |
| 5 | Modified, proofed | recognized, edited in the Text Editor, and proofing has reached the end of the page. |
| 6 | Pending | acquired, maybe recognized; some zone changes are stored but not yet processed. |
| 7 | Saved | recognized and saved at least once. |
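The page statuses listed above can be read as a function of a few yes/no facts about a page. The following sketch is an illustration only: the function `page_status` and its parameters are hypothetical, and the order in which the conditions are tested (Pending first, then Saved) is an assumption, since the manual does not say which status wins when several apply.

```python
def page_status(recognized: bool,
                proofed: bool = False,
                modified: bool = False,
                pending_zone_changes: bool = False,
                saved: bool = False) -> str:
    """Return a status label for a page, following the status list above."""
    if pending_zone_changes:
        # Zone changes are stored but not yet processed.
        return "Pending"
    if not recognized:
        return "Acquired"
    if saved:
        return "Saved"
    base = "Modified" if modified else "Recognized"
    return f"{base}, proofed" if proofed else base
```

For example, `page_status(True, proofed=True)` gives `"Recognized, proofed"`, and the same page after an edit in the Text Editor gives `"Modified, proofed"`.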

 

 

 

 

Thumbnails

These present a set of numbered thumbnail images, one for each page in the document. Scroll to see pages as necessary. The current page has an ‘eye’ icon. You can select multiple pages in the document; these have a distinctive appearance. Use thumbnails for page operations, as follows:

Jump to a page: Click the thumbnail of the desired page.

Reorder a page: Click the thumbnail of the page you want to move and drag it above the desired page number. Pages are renumbered automatically.

Delete a page: Select the thumbnail of the page you want to delete and press the Delete key.

Select multiple pages: Hold down the Shift key and click two thumbnails to select all pages between and including them. Hold down the Ctrl key as you click thumbnails to add pages to a selection one by one. Then you can move or delete the selected pages as a group, or send them to (re)recognition. You can also export selected pages.

Get information on an image by hovering the cursor over it with Image Info enabled in the Image Panel shortcut menu. A popup text displays the image size in pixels and in the program’s unit of measurement. Image resolution is also shown.

Document Manager

This provides an overview of your document as a table. Each row represents one page. Columns present statistical or status information for each page, and (where appropriate) document totals. The picture shows columns that a user has specified.

[Figure: the Document Manager. Callouts: Move the cursor onto the page’s status icon to see a thumbnail of the page. Enter comments or searchable keywords here.]

The current page is shown with an ‘eye’ icon. You can use the Document Manager for page operations, as follows:

Jump to a page: Click the leftmost part of the page row or double click anywhere in its row.

Reorder a page: Click the row of the page you want to move and drag it to the desired location. An indicator on the left shows where the page will be inserted. Pages are renumbered automatically.

Delete a page: Select the row of the page you want to delete and press the Delete key.

Select multiple pages: Hold down the Shift key and click two page rows to select all pages between and including them. Hold down the Ctrl key as you click rows to add pages to a selection one by one. Then you can move or delete the selected pages as a group, or send them to (re)recognition. You can also export selected pages.


When multiple pages are being selected, the page set as current does not change. All selected pages are highlighted.
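The selection behavior described for thumbnails and Document Manager rows can be sketched as a small model. The class `PageSelection` and its method names are hypothetical and exist only for this illustration. One assumption is labeled in the code: a Shift selection replaces any earlier selection.

```python
class PageSelection:
    """Tracks the current page and the selected pages of a document."""

    def __init__(self, page_count: int) -> None:
        self.page_count = page_count
        self.current = 1          # the page shown in the Image Panel
        self.selected = set()     # page numbers, starting at 1

    def _check(self, page: int) -> None:
        if not 1 <= page <= self.page_count:
            raise ValueError(f"no such page: {page}")

    def jump_to(self, page: int) -> None:
        """Plain click: make the page current."""
        self._check(page)
        self.current = page

    def shift_select(self, first: int, second: int) -> None:
        """Shift-click two pages: select them and all pages between."""
        self._check(first)
        self._check(second)
        low, high = sorted((first, second))
        # Assumption: the new range replaces the previous selection.
        self.selected = set(range(low, high + 1))

    def ctrl_add(self, page: int) -> None:
        """Ctrl-click: add one page to the selection."""
        self._check(page)
        self.selected.add(page)
```

Neither `shift_select` nor `ctrl_add` touches `current`, which reflects the statement that the current page does not change while multiple pages are being selected.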

Customizing Document Manager columns

You can specify which columns of information you want to see in the Document Manager. Click Customize Columns... in the View menu for the following dialog box:

[Figure: the Customize Columns dialog box. Callouts: This item is highlighted. Click a checkbox to select the item. Highlight an item and use these arrows to change the order of columns. Image sizes are expressed in pixels. Define a width for the highlighted item.]

Define which columns should appear, their widths, and column order. The topic Customizing Document Manager columns in online Help clarifies what is presented in each column. You can change column widths easily in the Document Manager; just drag the column dividers in the title bar.

Deleting pages from a document

Page deletions must be confirmed and can be undone. Delete the current page only with the item Delete Current Page in the Edit menu. Delete all selected pages in the Document Manager or from the thumbnails by pressing the Delete key or using the shortcut menu command Clear.


Printing a document

You can print the document with the Print item in the File menu. Choose whether to print images or text (that is, recognition results as they appear in the Text Editor). You can print all pages or a range of pages. The Print tool in the Standard toolbar prints images or text, depending on whether the Image Panel or the Text Editor is active.

Closing a document

Choose Close in the File menu to close a document. You are prompted to save your document if you have not saved it or you have modified it since the last save. See the next section on saving the document as an OmniPage Document (*.opd). You will also be prompted to save unsaved training data if you selected ‘Prompt to save training data when closing document’ in the Proofing panel of the Options dialog box.

OmniPage Documents

The OmniPage Document is the program’s proprietary file type; it has the extension .opd. You save the document to the OPD file type if you want to work with it again in OmniPage Pro during a future session. You can then process unfinished pages, add more pages and proof or edit recognition results.

An OmniPage Document contains the original page images (deskewed and pre-processed) with any zones placed on them. After recognition, the OPD also contains the recognition results. Recognized characters are stored along with their coordinate and confidence data. This preserves the links between image and text, so that verification and proofing remain available when the OPD is reopened in future sessions.

When you save an OmniPage Document, the current settings (and unsaved training) are also saved. When you open an OmniPage Document, its settings are applied, replacing those existing in the program.


Why save to OPD

You do not have to save your documents to the OPD file type. You would typically do this for the following reasons:

You cannot finish working with the document in the current session.

You want to pass the document to other users who have OmniPage Pro. For example, you can pass an OPD file to a specialist for proofing. In an office network, you may have one scanner generating images for recognition and proofing at several workstations.

You want to build up an archive of recognized documents whose original images remain accessible. The recognized texts allow searching by keywords and other document retrieval techniques.

Save recognition results from your OPD files before installing any OmniPage Pro upgrade. Older OPD files may not be upward compatible with newer OPD file formats, or only the images may be retained when the files are upgraded. When you open an OPD created by OmniPage Pro 10, only images are loaded. When you open an OPD created by OmniPage Pro 11, images and recognized pages are loaded, but no zones are retained. Images, recognized pages and zones are all retained in OPD files originating from OmniPage Pro 12.
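The compatibility rules in the previous paragraph amount to a small lookup table. The sketch below restates them; the names `RETAINED_BY_SOURCE_VERSION` and `retained_content` are hypothetical, and versions the manual does not mention are deliberately rejected rather than guessed.

```python
# What survives when an older OPD is opened, keyed by the OmniPage Pro
# version that created it (taken from the paragraph above).
RETAINED_BY_SOURCE_VERSION = {
    10: {"images"},
    11: {"images", "recognized pages"},
    12: {"images", "recognized pages", "zones"},
}


def retained_content(source_version: int) -> set:
    """Return the parts of an OPD that are kept for the given source version."""
    try:
        return set(RETAINED_BY_SOURCE_VERSION[source_version])
    except KeyError:
        raise ValueError(
            f"the manual gives no rule for OmniPage Pro {source_version}"
        ) from None
```

For example, `retained_content(11)` contains the images and the recognized pages but not the zones, so zones would have to be redrawn before re-recognition.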

How to save to OPD

Saving to OPD is done from the File menu, or by using the Save button in the Standard toolbar. The title bar shows the OmniPage Document file name. If you intend to create an OPD, you can save it to this file type at an early stage, for protection. Then use the Save button to save it periodically as you work. Save it again at the end of your session.

When you close the document or exit the program, you will be prompted to save the document as an OPD. You can include one or more saves to the OPD file type in a workflow, along with steps to save images or recognition results to other file types. See “Creating workflows” on page 98.

When saving, you have two file type choices: OmniPage Document or OmniPage Document (Extended). The latter allows you to embed a user dictionary, training file or zone template file in the OPD. This can increase file size considerably but makes the OPD more portable. To embed any of these items, load them before the save to the OmniPage Document (Extended) file type.

How to load an OPD

Select Open OPD... from the File menu. The file type OmniPage Document includes both normal and extended OPDs. Choose the required file. An embedded user dictionary, training file or zone template can be resaved to a named file. Opening an OmniPage Document is also available as a workflow step.

Settings

The Options dialog box is the central location for OmniPage Pro settings. Access it from the Standard toolbar or the Tools menu. Context-sensitive help provides information on each setting. In overview, the settings panels are:

OCR

Use this to specify recognition languages, additional characters and a user or professional dictionary. Click the checkbox before a language to select or deselect it. Multiple selection is possible; select only languages appearing in the document to be recognized. The top items are the recently selected languages. Key in the first letters of a language to jump to it. You can also choose to optimize processing for speed or accuracy, define a reject character, handle font matching and provide a custom layout description. See page 51.

Scanner

Use this to define page size and orientation for scanning. You can also make brightness and contrast settings and define options for scanning multi-page documents, with or without an Automatic Document Feeder (ADF). You can change scanner setup settings, install a new scanner or change the default scanner. See “Input from scanner” on page 49. This panel is not available if you requested display of your scanner’s native TWAIN interface when you set up your scanner. See “Setting up your scanner with OmniPage Pro” on page 14.

Direct OCR

This feature provides OCR services directly from your favorite word processor or similar application. Use this panel to register and unregister applications for Direct OCR and to enable or disable this service. You can also specify automatic or manual zoning and whether proofreading is desired or not. See “How to set up Direct OCR” on page 44.

Process

Use this to define where new images should be placed in the document, to request prompting for more pages when scanning, to specify two-page scanning for handling books, and other settings.

Proofing

Use this to define whether proofreading should begin automatically after recognition. Define also whether IntelliTrain should run, and use it to load or work with a training file. See “Proofreading OCR results” on page 65.

General

Change the interface language here. Enable an OmniPage icon on your taskbar that will list your workflows for quick start processing. Enable automatic detection of online updates and other settings.

Text Editor

Use this to show or hide some features in the Text Editor, to define the unit of measurement to be used and to turn word wrapping on or off. See “Text and image editing” on page 73.

Some settings have an effect only on future recognition. Examples are the recognition languages, a training file or scanner brightness. These settings should be correctly adjusted before you start processing. To have changes in these settings applied to already recognized pages, you will have to re-recognize them. Other settings are implemented immediately in all existing pages. Examples are Text Editor settings like word wrap or measurement units.
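The distinction between the two kinds of settings can be sketched as follows. The setting names and the function `pages_to_rerecognize` are hypothetical and cover only the examples named above; any other setting is treated as unclassified rather than guessed.

```python
# Hypothetical setting names; the manual gives these only as examples.
FUTURE_RECOGNITION_ONLY = {"recognition_languages", "training_file", "scanner_brightness"}
APPLIED_IMMEDIATELY = {"word_wrap", "measurement_units"}


def pages_to_rerecognize(changed_settings, recognized_pages):
    """Return the already recognized pages that must be re-recognized
    for the changed settings to take effect on them."""
    changed = set(changed_settings)
    unknown = changed - FUTURE_RECOGNITION_ONLY - APPLIED_IMMEDIATELY
    if unknown:
        raise ValueError(f"unclassified settings: {sorted(unknown)}")
    if changed & FUTURE_RECOGNITION_ONLY:
        # These settings only influence future recognition.
        return sorted(recognized_pages)
    # Text Editor settings take effect at once in all existing pages.
    return []
```

Changing only word wrap leaves nothing to redo, while changing the recognition languages marks every recognized page for re-recognition.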


Chapter 3

Processing documents

This tutorial chapter describes different ways you can process a document and also provides information on key parts of this processing.

Quick Start Guide

Processing overview

Automatic processing

Manual processing

Combined processing

Processing with workflows

Processing from other applications (Direct OCR, PaperPort)

Processing with the Batch Manager

The detailed topics are:

Defining the source of page images

Describing the layout of the document

Zones and backgrounds

Automatic zoning

Manual zoning

Zone types and properties

Working with zones

Speed zoning

Table grids in the image

Using zone templates


Quick Start Guide

This topic takes you step-by-step through the basic OCR process.

Loading and recognizing sample image files

You will find sample image files in the program folder, both single-page and multi-page files. First try reading these files using the procedure presented below, except for the references to a scanner. See “Input from image files” on page 48. The results provide you with a benchmark of the recognition quality you should expect from your own files of comparable quality.

Next, try scanning a page from your scanner.

Scanning and recognizing a single page

Turn your scanner on and be sure it is working correctly. Choose a page with good-quality clear text for this test.

We assume OmniPage Pro’s default settings are set and that your document is in the language you specified for interface language during installation. Open the Options dialog box from the Tools menu and choose Use Defaults if you are not using the program for the first time.

You will process the document automatically and save the recognition results to a file. You will proof the document but will not edit it inside the Text Editor.

| Step | What you do | What happens |
|---|---|---|
| 1 | Set up your scanner using the Scanner Wizard, if this is not already done. | Configures OmniPage Pro to work with your scanner. |
| 2 | Select Start > All Programs > ScanSoft OmniPage Pro 14.0 > OmniPage Pro 14.0. | Opens OmniPage Pro on your computer. |
| 3 | Place the document correctly in your scanner. | |
| 4 | From the Get Page drop-down list, select a scan option for your document: black-and-white, grayscale or color. | Allows you to determine how pictures or colored texts and backgrounds will look in the exported document. Color scanning needs a color scanner. |
| 5 | From the Layout Description drop-down list, check that Automatic is selected. For a wide range of documents, this is the best choice. | Tells the program to place zones on the page and decide their properties automatically. |
| 6 | From the Export Results drop-down list, check that Save to File is selected. | This means you will be able to name your export file after you have proofed the document. |
| 7 | Make sure 1-2-3 is selected in the Workflow drop-down list. Click the Start button. | OmniPage Pro will start to scan in your document. A thumbnail appears with a progress indicator. The OCR Proofreader appears. |
| 8 | Use the OCR Proofreader to modify words that the program suspects have not been recognized correctly. | The OCR Proofreader operates like a spell checker in a word processing program, but with added OCR-specific features. It removes markings from words you proof. |
| 9 | Click in the Text Editor. Select Text Editor views one after another, to see how the page appears in each view. | Each Text Editor view defines a formatting level. This helps you decide which level to choose at saving time. |
| 10 | Click Resume to restart proofing. When the message OCR Proofreading is complete appears, click on OK. | This ends the OCR Proofreader process. The Save to File dialog box will appear. |
| 11 | Choose a file name, file type, path and a formatting level to save your recognized document. Click on OK. | By default, Save and Launch is enabled, so your document will be automatically opened in the word processing program associated with the file type that you selected. |
| 12 | Inspect the document in your word processing program. | You have successfully used OmniPage Pro 14 to recognize your document and open it in your target application! |

If you succeeded in getting good results from the sample image files, but not from the scanned page, check your scanner installation and settings: in particular brightness and image resolution. See “Input from scanner” on page 49. This provides a model of optimum brightness. See also the online Help topics Setting up your scanner and Scanner troubleshooting.


Processing overview

The following flow diagram summarizes the processing steps:

Get Pages: from file (page 48), from scanner (page 49), other (page 48).

Describe page layout (page 51).

Zoning: auto-zoning (page 53), manual zoning (page 54), apply a template (page 61).

Perform OCR with current settings (page 31).

Proofread (page 65). Verify and edit (page 67).

Export pages: to file (page 82), to Clipboard (page 89), via Mail (page 90), other (page 92).

Here is an overview of the processing methods you can use. You will find step-by-step guidance for each of them in the following pages.

Automatic

A fast and easy way to process documents is to let OmniPage Pro do it automatically for you. Select settings in the Options dialog box and in the OmniPage Toolbox drop-down lists and then click Start. It will take each page through the whole process from beginning to end, when possible running in parallel. It will typically auto-zone the pages.

Manual

Manual processing gives you more precise control over the way your pages are handled. You can process the document page-by-page with different settings for each page. The program also stops between each step: acquiring images, performing recognition, exporting. This lets you, for instance, draw zones manually or change recognition language(s). You start each step by clicking the three buttons on the OmniPage Toolbox.

Combined

You can process a document automatically and view results in the Text Editor. If most pages are in order, but a few have not turned out as expected, you can switch to manual processing to adjust settings and re-recognize just those problem pages. Alternatively, you can acquire images with manual processing, draw zones on some or all of them, and then send all pages to automatic processing.

Workflow

A workflow consists of a series of steps and their settings. Typically it will include a recognition step, but it does not have to. Workflows are listed in the Workflow drop-down list – sample workflows plus any you create. You can choose to place an OmniPage icon on your taskbar. Its shortcut menu lists your workflows. Click a workflow to launch OmniPage Pro and have it run.

Let the Workflow Assistant guide you in creating new workflows. It provides a choice of steps and the settings they need. After each step icon is selected and its settings (if any) defined, you get a new set of step icons to choose from. When ready, you can save the workflow for future use, but this is not compulsory. You can use the Assistant just to get more guidance when doing automatic processing. See “Workflow Assistant” on page 98.

In other applications

You can use the Direct OCR feature to call on the recognition services of OmniPage Pro while working in your usual word-processor or similar application. OmniPage Pro also automatically links itself to ScanSoft’s PaperPort and Pagis document management programs.

At a later time

You can schedule OCR jobs or other processing jobs to be performed automatically at a later time, when you may not even be present at your computer. This is done through the Batch Manager. When you choose New Job, the Workflow Assistant appears, with a slightly modified set of choices and settings. The main difference is its closing panel that allows you to specify a starting time, a recurring job or watched folder instructions.

A Batch Manager job is basically a workflow with timing instructions added. See “Batch Manager” on page 101.
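The relationship between a workflow and a Batch Manager job can be sketched as two small data structures. This is an illustration of the description above, not a file format or programming interface of OmniPage Pro: the classes `Step`, `Workflow` and `BatchJob`, their fields and the step names used are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Step:
    """One workflow step together with its settings."""
    name: str
    settings: Dict[str, str] = field(default_factory=dict)


@dataclass
class Workflow:
    """A series of steps; a recognition step is typical but optional."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def has_recognition(self) -> bool:
        return any(step.name == "Perform OCR" for step in self.steps)


@dataclass
class BatchJob:
    """A workflow with timing instructions added."""
    workflow: Workflow
    start_time: Optional[datetime] = None
    recurring: bool = False
    watched_folder: Optional[str] = None
```

A job is then just a workflow wrapped with a starting time, a recurrence flag or a watched folder, for example `BatchJob(Workflow("Nightly scan", [Step("Get Pages"), Step("Perform OCR")]), recurring=True)`.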


Automatic processing

Automatic processing provides an efficient way of handling documents, especially larger ones. First you select all settings needed, then you can use the Start button in the OmniPage Toolbox to process a new document from start to finish or to restart and finish processing on an open document.

[Figure: the OmniPage Toolbox. Callouts: Start button; Get Page button; Perform OCR button; Export Results button; Workflow drop-down list; Get Pages drop-down list (some items appear only in OmniPage Pro 14 Office, others only if the source is available); Layout Description drop-down list; Export Results drop-down list.]

1. Make sure 1-2-3 is selected in the Workflow drop-down list.

2. Select the desired Get Page setting in the drop-down list. You define the document source, which can be from image files or from a scanner. See “Defining the source of page images” on page 48.

3. Select a setting from the Layout Description drop-down list, as shown above. This guides the program in auto-zoning the pages. You describe the incoming pages or specify a zone template file. See “Describing the layout of the document” on page 51.

4. Select a setting from the Export Results drop-down list. You can save pages (current, selected, all) to file, copy them to Clipboard, send
