Visioneer PROOCR100 User Manual


Pro OCR User’s Guide

file:///C|/VisioneerDoc/Main.html [1/20/2003 4:21:09 PM]

Contents

Chapter 1: Introducing Visioneer Pro OCR 100

Chapter 2: Learning Pro OCR Basics

Chapter 3: Getting Documents

Chapter 4: Locating Text and Graphics

Chapter 5: Setting Recognize Options and Proofing a Recognized Document

Chapter 6: Saving and Printing Documents

Chapter 7: Creating and Processing Deferred and Batch Jobs

Chapter 8: Tips for Getting the Best Results

Glossary

Chapter 1 Introducing Visioneer Pro OCR 100

This chapter introduces you to the Pro OCR application and to the concept of optical character recognition (OCR).

Why Pro OCR

Pro OCR is an Optical Character Recognition (OCR) application. An OCR application converts images of text, such as those obtained from scanning a document or receiving a fax through your fax modem, into editable text. For example, when a scanner scans a page of text, it sees black and white areas on the page. The scanner converts what it sees into an image and stores the image on the computer. To transform a scanned text image into something a word processing or spreadsheet application can recognize as characters, you need an OCR (optical character recognition) application, such as Pro OCR.

Every day you may spend a lot of time retyping printed text or numbers from hard copy documents. By using Pro OCR and a scanner as an input device, you can eliminate much of this retyping.

Features and Highlights of Pro OCR

Many existing OCR products can typically recognize only 200–300 plain, nonstylized typefaces. Pro OCR’s recognition technology can recognize over 2,000 typefaces.

Most basic OCR applications inspect the scanned page image, attempt to recognize the dots on the page as characters, and transform the image into a plain text file. Pro OCR does all of these basic tasks, but it can also get the entire page into your word processor or spreadsheet as is—retaining the shape, form, type, and spacing, as well as the content, of the input page. Pro OCR provides:

■ The ability to read one or more pages of text, including graphics. Pro OCR reads pages directly from your scanner, or it reads TIFF, PCX, and DCX files. Pro OCR can automatically locate pictures and embed them in your document. You can also export pictures separately in a number of file formats.

■ Speed and accuracy of recognition. With most documents, Pro OCR is faster than, and as accurate as, a good typist.

■ Numeric regions. You can specify that a given region on a page can contain only numbers. Numeric regions help Pro OCR make sure that numbers are always recognized as numbers and never mistakenly identified as letters.

■ Recognition and retention of fonts, characters, styles, and page formatting. Pro OCR recognizes and retains the differences between serif and sans-serif fonts, styles such as bold, underline, and subscript, and formatting such as columns, tables, and indents.

■ Deferred and batch processing. You can perform procedures that need your attention or interaction (for example, locating), and then do the time-consuming steps that don’t need interaction (for example, recognizing) at another time.

■ Internet readiness. Pro OCR supports the HTML export format. You can convert an image file directly to an HTML page and upload it to a Web site.

■ Proofing options. Pro OCR has a number of proofing options. You can also send recognized text directly to your word processor.

■ Save features. With Pro OCR you can save recognized text in a wide variety of word processor and spreadsheet file formats. Pro OCR works with imperfect input pages that may have skewed lines of text, touching or broken characters, and fuzzy characters.



Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowed under the copyright laws.

AnyPort, AutoFix, AutoLaunch, FormTyper, MicroChrome, PaperEnable, PaperLaunch, PaperPort, PaperPort Deluxe, PaperPort ix, PaperPort Links, PaperPort mx, PaperPort PowerBar, PaperPort 3000, PaperPort 6000, PaperPort vx, PaperPortation, PaperPort Strobe, Pro OCR, ScanDirect, SimpleSearch, SharpPage, and Visioneer are trademarks of Visioneer, Inc. PaperPort, Paper-driven, and the Visioneer logo are registered trademarks of Visioneer, Inc.

Microsoft is a U.S. registered trademark of Microsoft Corporation. Windows is a trademark of Microsoft Corporation. TextBridge is a registered trademark of Xerox Corporation. ZyINDEX is a registered trademark of ZyLAB International, Inc. ZyINDEX toolkit portions, Copyright © 1990–1996, ZyLAB International, Inc. All Rights Reserved. All other products mentioned herein may be trademarks of their respective companies.

Information is subject to change without notice and does not represent a commitment on the part of Visioneer, Inc. The software described is furnished under a licensing agreement. The software may be used or copied only in accordance with the terms of such an agreement. It is against the law to copy the software on any medium except as specifically allowed in the licensing agreement. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or information storage and retrieval systems, or translated to another language, for any purpose other than the licensee’s personal use and as specifically allowed in the licensing agreement, without the express written permission of Visioneer, Inc.

Part Number: 05-0340-000

Restricted Rights Legend

Use, duplication, or disclosure is subject to restrictions as set forth in contract subdivision (c)(1)(ii) of the Rights in Technical Data and Computer Software Clause 52.227-FAR14. Material scanned by this product may be protected by governmental laws and other regulations, such as copyright laws. The customer is solely responsible for complying with all such laws and regulations.


Visioneer’s Limited Product Warranty

If you find physical defects in the materials or the workmanship used in making the product described in this document, Visioneer will repair, or at its option, replace, the product at no charge to you, provided you return it (postage prepaid, with proof of your purchase from the original reseller) during the 12-month period after the date of your original purchase of the product.

THIS IS VISIONEER’S ONLY WARRANTY AND YOUR EXCLUSIVE REMEDY CONCERNING THE PRODUCT. ALL OTHER REPRESENTATIONS, WARRANTIES OR CONDITIONS, EXPRESS OR IMPLIED, WRITTEN OR ORAL, INCLUDING ANY WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT, ARE EXPRESSLY EXCLUDED. AS A RESULT, EXCEPT AS SET OUT ABOVE, THE PRODUCT IS SOLD “AS IS” AND YOU ARE ASSUMING THE ENTIRE RISK AS TO THE PRODUCT’S SUITABILITY TO YOUR NEEDS, ITS QUALITY AND ITS PERFORMANCE.

IN NO EVENT WILL VISIONEER BE LIABLE FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES RESULTING FROM ANY DEFECT IN THE PRODUCT OR FROM ITS USE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

All exclusions and limitations in this warranty are made only to the extent permitted by applicable law and shall be of no effect to the extent in conflict with the express requirements of applicable law.

FCC Radio Frequency Interference Statement

This equipment has been tested and found to comply with the limits for a Class B digital device, pursuant to Part 15 of the FCC Rules. These limits are designed to provide reasonable protection against interference in a residential installation. This equipment generates, uses, and can radiate radio frequency energy and, if not installed and used in accordance with the instructions, may cause harmful interference to radio communications. However, there is no guarantee that interference will not occur in a particular installation. If this equipment does cause harmful interference to radio or television reception, which can be determined by turning the equipment off and on, the user is encouraged to try to correct the interference by one or more of the following measures:

■ Reorient or relocate the receiving antenna.


■ Increase the separation between the equipment and receiver.

■ Connect the equipment into an outlet on a circuit different from that to which the receiver is connected.

■ Consult the dealer or an experienced radio/TV technician for help.

This equipment has been certified to comply with the limits for a Class B computing device, pursuant to FCC Rules. In order to maintain compliance with FCC regulations, shielded cables must be used with this equipment. Operation with nonapproved equipment or unshielded cables is likely to result in interference to radio and TV reception. The user is cautioned that changes and modifications made to the equipment without the approval of the manufacturer could void the user's authority to operate this equipment.

This device complies with part 15 of the FCC Rules. Operation is subject to the following two conditions: (1) This device may not cause harmful interference, and (2) this device must accept any interference received, including interference that may cause undesired operation.



Glossary

A4 Letter page size

accelerator key

ADF

alphanumeric word

ASCII

As Single Column locating method

Auto OCR

Auto brightness

automatic document feeder (ADF)

automatic processing

background noise

backup

backwards compatible

bit image

bitmap

bitmapped character

bold text

brightness

broken character


built-in dictionary

CCITT

character

character format

character identification error

character image

character recognition

character style

clipboard

column information

compression

confidence

consistent document

copyrighted document

deferred job

deferred processing

degraded image

dialog box

desktop

document area

dots per inch (dpi)


dpi

draft quality text

driver

exporting

export format

file extension

file formats

file type

fine resolution

flatbed scanner

font

font family

font mapping

format retention

Gallery

Get Page

grayscale image

hard page breaks

heavy character

I-beam pointer


icon

illegible character

illegible character symbol

image view

input file formats

insertion point

italic text

justification

kerning

landscape orientation

layout

layout analysis error

Legal page size

Lenient suspect threshold

letter quality text

line break

Locate

locate region

locating

locating method


menu bar

multi-column text

monospaced font

monospaced font mapping

newspaper style columns

Normal locating method

Normal suspect threshold

numeric region

OCR

On-Screen Verifier™

Optical Character Recognition (OCR)

order of text regions

orientation

output file formats

page controls

page format

page image

page number box

page orientation

page size


page source

PCX

picture element

picture region

pixel

pixel-for-pixel

plain text

portrait orientation

printer font

Pro OCR Deferred format

Pro OCR format

Pro OCR process

Pro OCR window

Proof

proportionally spaced font

recognition accuracy

Recognize

recognized text

recognizing

region style

resolution


Rich Text Format (RTF)

RTF

sans serif

sans serif font mapping

scanner

scanner driver

scanning

screen font

scroll bars

serif

serif font mapping

settings file

sheetfed scanner

side-by-side columns

single-bit image

single-step processing

skewed text

spell checking

standard resolution

status bar


status display area

Stringent suspect threshold

stroke weight

Style ribbon

stylized font

subscript text

superscript text

supplementary dictionaries

suspect character

suspect threshold

Tag Image File Format

template

template matching

Template locating method

text quality

text region

text style

text view

throughput

TIFF

touching characters


typeface

type quality

type size

type style

underline text

User Defined page size

user dictionary

view selector

window

Windows

word wrap

zoom controls


Glossary

A4 Letter page size
An A4 page measures 8.27" x 11.69" (210 x 297 mm).

accelerator key
In Windows applications, a keyboard shortcut to a menu command.

ADF
See automatic document feeder (ADF).

alphanumeric word
A word made up of the alphabetic and numeric characters (A–Z, a–z, 0–9) in a character set. Excludes punctuation and other symbol characters.
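As an illustration only (not Pro OCR code), the definition above can be written as a check that every character falls in A–Z, a–z, or 0–9. Python's built-in `str.isalnum()` also accepts non-ASCII letters and digits, so this sketch spells out the ASCII set explicitly; the function name `is_alphanumeric_word` is hypothetical.

```python
import string

# Only ASCII letters A-Z, a-z and digits 0-9 count, per the glossary definition.
ALPHANUMERIC = set(string.ascii_letters + string.digits)

def is_alphanumeric_word(word: str) -> bool:
    """Return True if word is non-empty and contains only A-Z, a-z, 0-9."""
    return bool(word) and all(ch in ALPHANUMERIC for ch in word)

if __name__ == "__main__":
    for w in ["PROOCR100", "Pro-OCR", "page2", "café"]:
        # The hyphen (punctuation) and the accented letter both disqualify a word.
        print(w, is_alphanumeric_word(w))
```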

ASCII
Acronym for American Standard Code for Information Interchange (pronounced “ASK-ee”). A standard that assigns a unique binary number to each text and control character. ASCII code is used for representing text inside a computer and for transmitting text between computers or between a computer and a peripheral device.
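Python's built-in `ord()` and `chr()` functions expose this character-to-number mapping, which makes a quick way to see the codes ASCII assigns. This is a generic illustration and is not specific to Pro OCR.

```python
# Each character in "OCR" has a fixed ASCII code (shown in decimal and 7-bit binary).
for ch in "OCR":
    print(ch, ord(ch), format(ord(ch), "07b"))

# chr() reverses the mapping.
print(chr(65))        # A

# Control characters have codes too: line feed is 10.
print(ord("\n"))      # 10

# Encoding a string as ASCII gives one byte per character.
data = "Pro OCR".encode("ascii")
print(list(data))     # [80, 114, 111, 32, 79, 67, 82]
```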

As Single Column locating method
One of Pro OCR’s three locating methods. Use it when you want Pro OCR to read a page as a single column, from left margin to right margin, ignoring any column or paragraph spacing. Most commonly used for pages in which there is no clear column or paragraph structure.

Auto OCR
Clicking this button starts automatic processing, which uses Get Page, Locate, and Recognize according to the current Gallery settings.

Auto brightness
A feature of some scanners, by which brightness is adjusted automatically while the page is scanned.

automatic document feeder (ADF)
Built-in or optional equipment for a scanner that lets you automatically scan stacks of pages instead of having to place them one at a time on the flatbed. Sometimes it’s difficult to control the proper alignment of pages using an automatic document feeder. Compare with flatbed scanner and sheetfed scanner.


automatic processing
A method for using Pro OCR with minimal intervention. Automatic processing involves choosing the appropriate Gallery settings before using Auto Start to read in one or more image files or scan in one or more pages. Once page images have been acquired, automatic processing Locates and Recognizes each page image in succession. Automatic processing is best suited to documents that require the same Gallery settings (Page Size, Brightness, Locate method, etc.). Compare with single-step processing.

background noise
Non-character or non-graphic information in a page image that adversely affects optical recognition. Background noise includes the shading that results from scanning colored paper stock, extraneous marks, dirt, or ink bleed. Problems with background noise can be reduced by using the brightness setting in Pro OCR to compensate for the type of noise on the page.

backup (n.)
A copy of a disk or of a file on a disk. It’s a good idea to make backups of all your important disks and to use the copies for everyday work, keeping the originals in a safe place.

backwards compatible
The ability of an application to open files created with earlier versions of that application.

bit image
A collection of bits in memory that represents a two-dimensional surface. For example, the screen is a visible bit image.

bitmap
1. A set of bits that represents the graphic image of an original document in memory.

2. A set of bits that represents the positions and states of a corresponding set of items, such as pixels. Used by the computer to construct graphic images and fonts. See also bit image.
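To make the second sense concrete: in a one-bit (black-and-white) page image each pixel needs a single bit, so eight pixels fit in one byte. The sketch below packs and unpacks one row that way. The most-significant-bit-first order, the 1 = black convention, and the function names `pack_row` and `unpack_row` are assumptions for illustration, not a description of Pro OCR's internal format.

```python
def pack_row(pixels):
    """Pack a list of 0/1 pixels into bytes, 8 pixels per byte, most significant bit first.
    The last byte is padded with 0 bits if the row length is not a multiple of 8."""
    out = bytearray()
    for start in range(0, len(pixels), 8):
        chunk = pixels[start:start + 8]
        chunk = chunk + [0] * (8 - len(chunk))  # pad the final byte
        value = 0
        for bit in chunk:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)

def unpack_row(data, width):
    """Recover the first `width` pixels from packed bytes."""
    bits = []
    for byte in data:
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits[:width]

# A 10-pixel row: 1 = black dot, 0 = white (assumed convention).
row = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]
packed = pack_row(row)
print(packed.hex())                       # 7f80
assert unpack_row(packed, len(row)) == row
```

Ten pixels need two bytes here; the six unused bits in the second byte are padding.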


bitmapped character
A character image made up of a pattern of dots that exists in a computer file or in memory as a bitmap. A computer cannot interpret bitmapped characters as text. In order for a computer to use bitmapped characters in a word processor or spreadsheet, the characters must first be interpreted by an OCR application and translated into ASCII text.

bold text
Text with the bold attribute looks like this. See also text style.

brightness
The relative amount of light or darkness reflected from an image. A scanner’s brightness control is used in Pro OCR to adjust for pages that are either too light or too dark.
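One common way to turn gray levels into black-and-white pixels is a fixed threshold: anything darker than the threshold becomes black. The sketch below shows that generic technique and why a darker setting can rescue faint or broken strokes. Whether a particular scanner's brightness control works this way is an assumption, and the threshold values and the function name `binarize` are illustrative only.

```python
def binarize(gray_row, threshold=128):
    """Convert 8-bit gray levels (0 = black, 255 = white) to 1-bit pixels (1 = black).
    Raising the threshold classifies more pixels as black, which darkens the result."""
    return [1 if level < threshold else 0 for level in gray_row]

# A faint character stroke (gray levels around 140) on light paper (around 220).
row = [220, 215, 140, 135, 145, 218, 222]
print(binarize(row))                 # default threshold misses the faint stroke
print(binarize(row, threshold=160))  # a darker setting keeps it
```

Pushing the threshold too high has the opposite effect: background shading starts turning black, which is how background noise and heavy characters appear.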

broken character
A character with one or more missing pieces, such as a missing serif, stem, or cross bar. For example, a broken lower case ‘e’ might not have a fully closed loop, which could cause it to be misrecognized. Problems with broken characters can be reduced by using the brightness setting in Pro OCR to darken the image when scanning. Compare with heavy character and touching characters.

built-in dictionary
The dictionary that Pro OCR automatically loads and uses whenever Recognize is done. The built-in dictionary is used to enhance Pro OCR’s recognition accuracy and also to find misspelled words in the document. Compare with supplementary dictionaries and user dictionary.

CCITT
Abbreviation for Consultative Committee on International Telegraphy and Telephony, an international committee that sets standards and makes recommendations for international communication. One of the standards set by CCITT is for the compression of image files. Pro OCR employs CCITT-standard compression methods. See also compression and TIFF.

character
Any symbol that has a widely understood meaning and thus can convey information, including alphabetic, numeric, symbolic, and punctuation elements.


character format
Font and style information applied to characters. Character format information includes the font name and type size, as well as attributes such as underline, bold, italic, or some combination of these properties. Compare with page format.

character identification error
An incorrectly recognized bitmapped character. There are two kinds of character identification errors: substitutions and rejects. A character substitution occurs when a character is incorrectly recognized as another. A reject character results from the inability of the OCR application to interpret a character image with sufficient confidence. In such cases, recognition is not attempted and the character is flagged as illegible. Compare with layout analysis error.
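When you know what a page actually says, the two kinds of errors can be tallied by comparing the recognized text with the correct text character by character. The sketch below assumes a hypothetical illegible character symbol `~` and assumes no layout analysis errors, so the two strings line up one-to-one; the function name `count_errors` is also hypothetical.

```python
ILLEGIBLE = "~"  # hypothetical illegible character symbol

def count_errors(expected: str, recognized: str):
    """Compare recognized text with the known correct text, character by character.
    Assumes both strings have the same length (no dropped or merged characters).
    Returns (substitutions, rejects)."""
    if len(expected) != len(recognized):
        raise ValueError("texts must be the same length for this simple comparison")
    substitutions = rejects = 0
    for want, got in zip(expected, recognized):
        if got == ILLEGIBLE:
            rejects += 1          # character flagged as illegible
        elif got != want:
            substitutions += 1    # character recognized as the wrong character
    return substitutions, rejects

# The digit 1 was read as the letter l (a substitution); the last 0 was rejected.
print(count_errors("Invoice 1040", "Invoice l04~"))  # (1, 1)
```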

character image
An arrangement of bits that defines a character in a font.

character recognition
The OCR process in which bitmapped character images are interpreted and translated into ASCII computer codes.

character style
See type style.

clipboard
In Windows applications, temporary storage for text that is cut or copied from a document. Text saved in the clipboard may be pasted back into the same or another document.

column information
Part of Pro OCR’s page format information. Column information includes the location of the column on the page, the width of the column, and its left and right margins.

compression
Electronic method for reducing the size of a file without losing any information in the file. Compressed TIFF files take up significantly less disk space than uncompressed files. See also TIFF and CCITT.
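CCITT's one-dimensional scheme for fax images starts from the observation that a scanned line is mostly long runs of white with short runs of black, and records the run lengths instead of the individual pixels (the real standard then encodes those lengths with fixed Huffman code tables, which this sketch omits). The simplified run-length round trip below shows why such compression loses no information: the original row is rebuilt exactly. The function names `encode_runs` and `decode_runs` are hypothetical.

```python
def encode_runs(row):
    """Encode a row of 0/1 pixels (0 = white, 1 = black) as alternating run lengths.
    As in CCITT one-dimensional coding, the first run is always white (it may be 0 long)."""
    runs = []
    current, length = 0, 0
    for pixel in row:
        if pixel == current:
            length += 1
        else:
            runs.append(length)
            current, length = pixel, 1
    runs.append(length)
    return runs

def decode_runs(runs):
    """Rebuild the pixel row from alternating white/black run lengths."""
    row, color = [], 0
    for length in runs:
        row.extend([color] * length)
        color ^= 1
    return row

row = [0] * 20 + [1] * 3 + [0] * 15 + [1] * 2 + [0] * 8
runs = encode_runs(row)
print(runs)                      # [20, 3, 15, 2, 8]
assert decode_runs(runs) == row  # lossless: the exact row comes back
```

Five numbers stand in for 48 pixels here; on a mostly blank page the saving is far larger.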

confidence
In Pro OCR, a measure of the certainty of an unknown character’s identity. Above a certain confidence level, a character is automatically recognized. At lower confidence levels, a character may either be recognized, but flagged as a suspect character, or not recognized and flagged as an illegible character.
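The three outcomes can be pictured as two cut-off points on a confidence score. The sketch below is purely illustrative: the 0 to 100 scale, the threshold values, and the function name `classify` are assumptions, not Pro OCR's documented behavior.

```python
# Hypothetical cut-off values on a 0-100 confidence scale; Pro OCR's actual
# scale and thresholds are not documented here.
RECOGNIZED_AT = 90
SUSPECT_AT = 60

def classify(confidence: float) -> str:
    """Map a confidence score to the three outcomes described in the glossary."""
    if confidence >= RECOGNIZED_AT:
        return "recognized"
    if confidence >= SUSPECT_AT:
        return "suspect"      # recognized, but flagged for proofing
    return "illegible"        # not recognized; shown with the illegible character symbol

for score in (97, 72, 40):
    print(score, classify(score))
```

Moving the suspect cut-off up or down would correspond, loosely, to choosing a more stringent or more lenient suspect threshold.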


consistent document
A set of pages or image files where the same Gallery settings apply to each page in the document. Pro OCR’s Auto Start feature can be used to best effect when a document is consistent.

copyrighted document
Most published or printed materials and documents are copyrighted. It is illegal to use a computer and Pro OCR to copy, store, or reproduce, on paper or electronically, any copyrighted documents without the permission of the copyright holder.

deferred job
A file that contains one or more partially processed pages for Pro OCR to finish processing later on. See also Pro OCR Deferred format and deferred processing.

deferred processing
Provides the ability to individually specify Get Page, Locate, and Recognize settings for particular pages when necessary, while still being able to automatically process a job at a later time.

degraded image
An image that contains broken characters, touching characters, and/or background noise. See broken character, touching characters, and background noise.

dialog box
In Windows applications, the standard pop-up box that is displayed to communicate with the user when a command requires some further action. Some dialog boxes are informational.

desktop
Your working environment on the computer: the menu bar and the background area on the screen. You can have a number of documents or windows on the desktop open at the same time.

document area
The main part of the application window in Pro OCR. The document area shows one page of the current document at a time using the selected View Size setting.


dots per inch (dpi)
A measure of the visual resolution of a display or output device. Monitor screens typically have resolutions in the range of 70 to 75 dpi. Most common laser printers have a resolution of 300 dpi. The lower the resolution of a page in dots per inch, the lower the visual quality of characters on that page. Pro OCR can quickly and accurately recognize characters scanned in at resolutions down to 200 dpi.
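Because dpi counts dots along each inch, a page's size in pixels is its size in inches multiplied by the resolution, and a one-bit (black-and-white) image needs one bit per pixel. The sketch below works this out for a US Letter page (8.5" x 11"); the page size is an example choice, the function names are hypothetical, and the byte count ignores any file header and compression.

```python
def page_pixels(width_in: float, height_in: float, dpi: int):
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)

def one_bit_bytes(width_px: int, height_px: int) -> int:
    """Uncompressed size of a 1-bit image, with each row padded to a whole byte."""
    bytes_per_row = (width_px + 7) // 8
    return bytes_per_row * height_px

for dpi in (200, 300):
    w, h = page_pixels(8.5, 11, dpi)
    print(f"{dpi} dpi: {w} x {h} pixels, {one_bit_bytes(w, h):,} bytes uncompressed")
```

At 200 dpi the page is 1700 x 2200 pixels. Going from 200 to 300 dpi multiplies the pixel count by (300/200)² = 2.25, which is why higher resolutions produce much larger page images.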

dpi
See dots per inch (dpi).

draft quality text
On 9-pin dot matrix printers, the low resolution printing option. Draft quality text is monospaced and made up of visible dots that do not touch. In Pro OCR, click the Draft Quality button in the Recognize section of the Gallery to improve recognition on draft quality dot matrix text. Compare with letter quality text.

driver
See scanner driver.

exporting
Saving a document in an external format, such as a word processor, spreadsheet, text, or standard image file. An exported document is created for use outside of Pro OCR.

export format
Pro OCR can save and export documents in a variety of specific word processor and spreadsheet formats. You choose the export format in the Save As dialog box.

file extension
In the MS-DOS operating system, file names conventionally consist of a base and a file extension, for example SAMPLE.TXT. In this example, “SAMPLE” is the base, and the file extension is “.TXT”. File extensions are used to identify the type of file. In this example, the file extension indicates that this is a text (ASCII) file.
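Python's standard `os.path.splitext()` splits a name at its last period, so it reproduces the base/extension split described above. This is a generic illustration, not part of Pro OCR.

```python
import os.path

base, extension = os.path.splitext("SAMPLE.TXT")
print(base)       # SAMPLE
print(extension)  # .TXT

# Compare extensions case-insensitively, since MS-DOS file names are not case-sensitive.
is_text = extension.upper() == ".TXT"
print(is_text)    # True
```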

file formats
See input file formats and output file formats.

file type
Different applications create different file types. Some file types are application-specific. Other file types are generic. The file type indicates what kind of information is contained in the file and what format the information is in. Most applications can only open files of certain file types.


fine resolution
A term associated with fax modems, referring to the highest resolution of the image files typically produced by these devices. Fine resolution is approximately 200 x 200 dpi, which is adequate for reliable recognition.

flatbed scanner
Scanner with a glass plate on which pages are placed face down. Although such scanners can only read one page at a time, they can support a variety of paper sizes and it’s easier to control the proper alignment of a page. Compare with automatic document feeder (ADF) and sheetfed scanner.

font
All characters (letters, numbers, and symbols) in one size and style of a font family. 12 point Helvetica Bold Italic is a font. “Font” is sometimes incorrectly used instead of “font family” or “typeface.” See also font family and typeface.

font family
The complete set of variations of a particular typeface. For example, Helvetica is a font family. It contains a variety of typefaces, such as Helvetica, Helvetica Bold, Helvetica Italic, and Helvetica Bold Italic. See also font and typeface.

font mapping
Set in the Display Options dialog box. Tells Pro OCR which fonts to use to display recognized text. Also specifies which fonts to use in documents that are exported to Windows-based word processors.

format retention
The ability to retain the layout of a page, including margins, paragraph and column widths, and tabs and indents. Pro OCR preserves as much page format information as export formats support.

Gallery
The Pro OCR toolbar. All settings for the Get Page, Locate, and Recognize stages of the Pro OCR process are set in the Gallery. Common Pro OCR processes (Auto Start, and single-step Get Page, Locate, and Recognize) can be initiated from the Gallery.

Get Page
Single-step Gallery function. It is also the first stage of the Pro OCR process. Scans one page from a scanner or reads one file, using the current Get Page settings.


grayscale image
An image format where individual pixels can be expressed with more than a single bit, allowing the image to contain true shades of gray. Pro OCR will not open grayscale images. Compare with single-bit image.

hard page breaks
Special formatting that you put in manually in a text or word processor document. Most word processors and text editors automatically create soft page breaks unless you explicitly specify hard page breaks. In Pro OCR, you can force the output application to preserve the page breaks of the input document by clicking the “Insert Hard Page Breaks” checkbox when you are in the Save As Options dialog box.

heavy character
In Pro OCR, a character that is printed too dark or thick, so that the representation obscures detail and reduces confidence in the identity of that character.

I-beam pointer
A mouse pointer shape that resembles an upper-case “I”. When the pointer has this shape, you can select text. See also insertion point.

icon
An image that graphically represents an object, a concept, or a message. Screen icons can represent disks, documents, application programs, or other things you can select and open. In an application such as Pro OCR, icons are also used to represent various settings in the Gallery, Style ribbon, and Status bar.

illegible character
A character that Pro OCR cannot recognize with adequate certainty. Illegible characters in a document are highlighted and displayed with the specified illegible character symbol in the text view. See also suspect character.

illegible character symbol
The symbol Pro OCR uses to display illegible characters in the text view. Set in the Display Preferences dialog box. See also illegible character.

image view The view that displays the bitmapped image of a page. Used to locate regions of text or graphics, and for viewing the original scanned image of a page during proofing and editing.

input file formats Pro OCR can read documents saved by other applications in TIFF, PCX, and DCX formats, as well as documents saved in its own proprietary TIFF format. See also PCX and TIFF.
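The three input formats can usually be told apart by the first bytes of the file: TIFF files begin with `II*\0` (little-endian) or `MM\0*` (big-endian), DCX files begin with the 32-bit number 987654321, and PCX files begin with the byte 0x0A. The following Python sketch checks these signatures. It is not part of Pro OCR, the function names are hypothetical, and it cannot tell a file in Pro OCR’s own TIFF variation apart from any other TIFF file.

```python
import struct

# DCX files start with this magic number, stored as a little-endian 32-bit integer.
DCX_MAGIC = 987654321


def sniff_image_format(header: bytes) -> str:
    """Guess whether a file header belongs to a TIFF, DCX, or PCX file."""
    # TIFF: byte-order mark ("II" little-endian or "MM" big-endian) followed by 42.
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "TIFF"
    # DCX: a multipage PCX container that begins with the DCX magic number.
    if len(header) >= 4 and struct.unpack("<I", header[:4])[0] == DCX_MAGIC:
        return "DCX"
    # PCX: manufacturer byte 0x0A, then a version byte, then encoding byte 1 (RLE).
    if len(header) >= 3 and header[0] == 0x0A and header[2] == 0x01:
        return "PCX"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first bytes of a file on disk and guess its format."""
    with open(path, "rb") as f:
        return sniff_image_format(f.read(8))


print(sniff_image_format(b"II*\x00\x08\x00\x00\x00"))  # TIFF
```

You might run `sniff_file` on a file before choosing Open File from the Get Page drop-down list, to confirm that it is in one of the formats Pro OCR reads.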

insertion point The place in a text file where text is inserted or deleted. Indicated by a blinking vertical bar.

italic text Text with the italic attribute looks like this. See also text style.

justification Alignment of text to the left, right, or both margins of a column or page. Text may be left-justified, right-justified, center-justified, or fully justified (both left- and right-justified). Pro OCR preserves justification.

kerning A measure of the spacing between characters. In tightly kerned text, the letters are very close together, which can cause letters to touch when the page is scanned. See also touching characters.

landscape orientation When you hold a page of text to read it, it is in landscape orientation when the page is wider than it is tall. Compare with portrait orientation.

layout The relative position of elements on a page, such as margins, columns, graphics, titles and sections.

layout analysis error The result of an OCR product’s inability to correctly organize recognized text into words, lines, and paragraphs on the page. There are two common kinds of layout analysis errors: incorrectly interpreting the flow of text on a page, and incorrectly grouping or separating side-by-side paragraphs. Layout analysis errors can be more troublesome than character identification errors, particularly with documents that have complex layouts. Compare with character identification error.

Legal page size See page size.

Lenient suspect threshold Tells Pro OCR to highlight only the suspect characters it is very uncertain of. Very few characters are marked as suspect, compared to when the suspect threshold is set to normal or stringent. Use it when you’re dealing with documents containing fonts that you know from experience are recognized accurately, or when you’re less concerned with double-checking. Set in the Display Options dialog box. Compare with Normal suspect threshold and Stringent suspect threshold.

letter quality text Text made up of characters that are fully formed with dots that are touching. Compare with draft quality text.

line break The point at the edge of a line of text where the text flows onto the next line.

Locate Single-step Gallery function. It is also the second stage of the Pro OCR process. Specifies which text will be recognized on a page by creating or applying locate regions on the page according to the current Locate and Pictures settings. The current Locate setting may be Normal, As Single Column, or Template. The current Pictures setting may be Locate Text and Pictures or Locate Text Only.

locate region Defines an area on the page image in the image view and the text view. The text and picture kinds of locate regions may be defined automatically or manually. All three types of locate regions may be manually defined using the locate region drawing feature, or may be recalled using the Template locating method. See also text region, numeric region, picture region, and Template locating method.

locating The process in Pro OCR of specifying which areas of a page will be recognized, by creating or applying locate regions on the page.

locating method Tells Pro OCR how to locate regions for processing on a page. The three locating methods are Normal, As Single Column, and Template. See also Normal locating method, As Single Column locating method, and Template locating method.

menu A list of choices from which the user can choose. Menus appear when you point to and click a menu title in the menu bar, or a pop-up menu title in a window or dialog box.

menu bar The horizontal strip at the top of a window that contains menu titles.

multi-column text Text that is formatted into more than one column on a single page. Examples include phone books and newspapers.

monospaced font Also known as a fixed pitch font. A typeface, such as Courier, in which each character takes up the same amount of horizontal space. The output from most typewriters is monospaced. Compare with proportionally spaced font.

monospaced font mapping The font chosen for displaying monospaced text characters in text views. Set in the Display Options dialog box. Compare with sans serif font mapping and serif font mapping.

newspaper style columns Also known as “snaked” or winding columns. A column format where the text flows down the vertical length of the column before moving to the top of the next column. As the name suggests, this type of column is commonly found in newspaper and magazine articles. This glossary is formatted in newspaper style columns. The flow of text in newspaper style columns is best suited for the Normal locate setting in Pro OCR.
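The difference between newspaper-style flow and reading straight across the page can be shown with a simplified model. In the Python sketch below, each column is assumed to be already split into lines, and the names are hypothetical; this is not Pro OCR’s layout analysis. `newspaper_order` reads each column from top to bottom, which is the flow the Normal locating method is suited to, while `row_order` reads across the page one row at a time, roughly what you want when a table is treated as a single column.

```python
# Hypothetical two-column page; each column is a list of its lines, top to bottom.
left_column = ["L1", "L2", "L3"]
right_column = ["R1", "R2", "R3"]


def newspaper_order(columns):
    """Read each column from top to bottom before moving to the next column."""
    return [line for column in columns for line in column]


def row_order(columns):
    """Read straight across the page, joining side-by-side lines row by row.

    Assumes the columns have the same number of lines (zip stops at the shortest)."""
    return [" ".join(row) for row in zip(*columns)]


print(newspaper_order([left_column, right_column]))
# ['L1', 'L2', 'L3', 'R1', 'R2', 'R3']
print(row_order([left_column, right_column]))
# ['L1 R1', 'L2 R2', 'L3 R3']
```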

Normal locating method One of Pro OCR’s three locating methods. Use it for most kinds of input, including many tables and forms. Creates text regions based on column or paragraph spacing. Compare with As Single Column locating method and Template locating method.

Normal suspect threshold Tells Pro OCR to highlight suspect characters that it is somewhat uncertain of. More characters are marked as suspect than when a lenient suspect threshold is used. Use it with clean, clear, typeset documents when most of the words in the document are probably in the dictionaries. Set in the Display Options dialog box. Compare with Lenient suspect threshold and Stringent suspect threshold.
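One way to picture the suspect thresholds is as confidence cutoffs: the stricter the threshold, the higher the confidence a character needs to avoid being marked as suspect. The Python sketch below assumes that each recognized character has a confidence score between 0 and 1. The scores, the cutoff values, and the function name are all hypothetical, since this guide does not describe how Pro OCR measures confidence.

```python
# Hypothetical cutoffs for illustration only; Pro OCR's real values are not documented here.
CUTOFFS = {"lenient": 0.50, "normal": 0.75, "stringent": 0.90}


def suspect_positions(confidences, threshold="normal"):
    """Return the indexes of characters whose confidence falls below the cutoff."""
    cutoff = CUTOFFS[threshold]
    return [i for i, score in enumerate(confidences) if score < cutoff]


# Hypothetical confidence scores for five recognized characters.
scores = [0.98, 0.62, 0.40, 0.85, 0.95]
for name in ("lenient", "normal", "stringent"):
    print(name, suspect_positions(scores, name))
# lenient [2]
# normal [1, 2]
# stringent [1, 2, 3]
```

As in the glossary entries, the lenient setting marks the fewest characters and a stringent setting marks the most.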

numeric region Defines a numeric area on the page image in Image View and Text View. Numeric regions may be defined using Pro OCR’s manual region drawing feature, or may be recalled using the Template locating method. Compare with text region and picture region. See also Template locating method.

OCR See Optical Character Recognition (OCR).

On-Screen Verifier™ Pops up in the document area to display a section of the page image corresponding to the current text selection in the text view. The on-screen verifier is displayed automatically when proofing, and can also be shown or hidden by choosing the Show/Hide On-Screen Verifier command from the Edit menu.

Optical Character Recognition (OCR) The process by which a computer converts scanned text images into editable text characters.

order of text regions Shown in the image view after locating, by an arrow from the center of each text region to the top center of the next text region. Text is output to application files in the order in which text regions are specified.

orientation Determines the angle or rotation of the page. Pro OCR lets you choose between portrait and landscape orientation. See also portrait orientation and landscape orientation.

output file formats Pro OCR can save documents in a variety of formats, including ASCII, a multitude of export formats, the Pro OCR format, and Pro OCR Deferred format. See also export format, Pro OCR format, and Pro OCR Deferred format.

page controls Contains the previous and next page arrows and the page number box. Click the previous page arrow or the next page arrow to move from page to page in a document. See also page number box.

page format The layout of the page, including its margins, paragraph and column widths, and tabs and indents. Pro OCR preserves nearly all page format information. What page format information is preserved in saved application files depends on the application format.

page image The bitmapped image of a scanned page, displayed in the image view in Pro OCR.

page number box Shows which page is being viewed and how many pages are in the document. Double-click it to go to a specific page. See also page controls.

page orientation See orientation.

page size The width and height to use when getting a page from a scanner within Pro OCR. There are three pre-defined page sizes: US Letter, US Legal, and A4 Letter. There is also an option for user-defined page sizes.

page source Pro OCR can get pages from a file or the selected scanner. You can draw pages from either source at any time.

PCX A common graphic file format on MS-DOS computers. Some scanners produce PCX files. Pro OCR can read single PCX files produced by many scanners, fax cards, and graphics applications. A variation of the PCX format is DCX, a multipage PCX file. Pro OCR can also read DCX files.

picture element See pixel.

picture region Defines a picture area on the page image. Picture regions may be defined manually or by using the Locate button with “Locate Text and Pictures” selected.

pixel A single unit (or dot) of screen, printer or image resolution. The number of pixels (or dots) per inch determines the resolution of an image. Most scanners and laser printers offer resolutions of at least 300 pixels (or dots) per inch.
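The pixel dimensions of a scanned page follow directly from its size and the scanning resolution: multiply each dimension in inches by the number of pixels per inch. The Python sketch below applies this to the three predefined page sizes at 300 pixels per inch, using the standard dimensions of US Letter (8.5 × 11 in), US Legal (8.5 × 14 in), and A4 (210 × 297 mm). The function name `page_pixels` is hypothetical.

```python
# Page sizes in inches; A4 is converted from millimeters at 25.4 mm per inch.
PAGE_SIZES_IN = {
    "US Letter": (8.5, 11.0),
    "US Legal": (8.5, 14.0),
    "A4": (210 / 25.4, 297 / 25.4),
}


def page_pixels(width_in, height_in, dpi):
    """Return (width, height) in pixels for a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


for name, (w, h) in PAGE_SIZES_IN.items():
    print(name, page_pixels(w, h, 300))
# US Letter (2550, 3300)
# US Legal (2550, 4200)
# A4 (2480, 3508)
```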

pixel-for-pixel A large magnified image view (approximately 400%) of the page. Lets you inspect the quality of the image. Each screen pixel corresponds to one image pixel.

plain text Text with no special attributes or styling, such as bold, italic, or underline.

portrait orientation When you hold a page of text to read it, it is in portrait orientation when the page is taller than it is wide. Compare with landscape orientation.

printer font The representation of a font or typeface used for printing by a printer. See also font, font family, and typeface. Compare with screen font.

Pro OCR Deferred format One of Pro OCR’s output file formats. Saves a document with the current state of Get Page, Locate, and Recognize for every page. When the document is processed using Process Deferred Job, the saved information is retained and only those processes and pages that have not already been specified are completed using the current Gallery settings.
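The behavior of Process Deferred Job can be modeled as completing, page by page, only the stages that are still missing. The Python sketch below is a simplified model, not Pro OCR code: it assumes each page records the set of stages already done, and the names `STAGES` and `process_deferred` are hypothetical.

```python
# Simplified, hypothetical model of a deferred job; not Pro OCR code.
STAGES = ["Get Page", "Locate", "Recognize"]


def process_deferred(pages):
    """Run only the stages each page has not completed yet.

    `pages` maps a page number to the set of stages already completed; the
    sets are updated in place. Returns the stages that had to run per page.
    (In Pro OCR these would run with the current Gallery settings; this model
    only records them as done.)"""
    ran = {}
    for page, completed in pages.items():
        remaining = [stage for stage in STAGES if stage not in completed]
        completed.update(remaining)
        ran[page] = remaining
    return ran


job = {1: {"Get Page", "Locate", "Recognize"}, 2: {"Get Page"}, 3: set()}
print(process_deferred(job))
# {1: [], 2: ['Locate', 'Recognize'], 3: ['Get Page', 'Locate', 'Recognize']}
```

Page 1 is left untouched, which matches the glossary’s point that saved work is retained rather than redone.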

Pro OCR format Pro OCR’s native/internal file format. The Pro OCR format is a proprietary variation of the Group 4 TIFF format. Documents at various stages of processing may be saved in this format and opened later for additional processing.

Pro OCR process The five-stage process that translates printed text or image files into an output form suitable for use in other applications. The five stages of the Pro OCR process are Get Page, Locate, Recognize, Proof/Edit, and Save/Export.

Pro OCR window The main window for interacting with Pro OCR. Contains the title bar, menu bar, gallery, scroll bars, Status bar, and document area.

Proof The fourth stage in the Pro OCR process, where any suspect and illegible characters or misspelled words can be examined and corrected, if necessary. This command moves the insertion point to the next piece of text in the text view, according to the Proofing Options. The Proofing Options configure Proof to view suspect or illegible characters, misspelled words, punctuation, numbers, alphanumeric words, or entire lines at a time. Use the Tab key as a keyboard shortcut.

proportionally spaced font Also known as a variable pitch font. A typeface in which each character takes up an amount of horizontal space consistent with its relative physical width; for example, an “i” needs less space than a “w.” Times Roman and Helvetica are two common proportionally spaced typefaces. Compare with monospaced font.

recognition accuracy A measure of the degree to which OCR output conforms to the individual characters in the input document. Recognition accuracy is a percentage expression of the number of correct character identifications in relation to the total number of characters in the page or document. This measure is often used as the primary criterion in evaluating OCR performance, even though it does not account for layout analysis errors. Compare with throughput.
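Expressed as a formula, recognition accuracy is the number of correctly identified characters divided by the total number of characters, times 100. A minimal Python version follows; the function name is hypothetical. Nothing in it measures whether the text came out in the right order, which is why layout analysis errors go unnoticed by this measure.

```python
def recognition_accuracy(correct_chars: int, total_chars: int) -> float:
    """Return the percentage of characters identified correctly.

    Only character identifications are counted; text placed in the wrong
    order (a layout analysis error) does not lower this number."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return 100.0 * correct_chars / total_chars


print(recognition_accuracy(1990, 2000))  # 99.5
```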

Recognize Single-step Gallery function. It is also the third stage of the Pro OCR process. The process in Pro OCR in which bitmapped text images are converted into editable text. Recognizes text defined by the text regions on the current page according to the current Recognize setting.

recognized text The initial result of OCR processing. Once an image has been recognized, the resultant text can be proofed/edited and exported to other applications.

recognizing The process in Pro OCR in which character images are converted into digital computer character codes (ASCII equivalents).

region style The type of a locate region, either text, numeric or picture.

Learning Pro OCR Basics

Chapter 2 Learning Pro OCR Basics

This chapter gets you started with Pro OCR. It introduces you to the Pro OCR window features, tells you the basic steps that you use when you work with Pro OCR, and provides several tutorial examples that you can complete to practice with Pro OCR.

TIP: If you use PaperPort software or scanners, see the Working with PaperPort document that came with Pro OCR. It provides tips and other information about using Pro OCR with these Visioneer products.

The Basic Steps

When you use Pro OCR, you convert an image of text and save it in an editable format. To complete this conversion, you perform the following basic steps:

1. Get Page: acquire pages either from a scanner or by opening an image file.

2. Locate: indicate which text on the page you want to recognize, and which pictures (if any) to retain.

3. Recognize: convert the image to text.

4. Proof: check for incorrectly identified and unidentifiable characters and make changes to recognized text.

5. Save: save the text to a variety of application formats.

Often, you complete the first three steps automatically by clicking the Auto OCR button. However, you can also perform each step individually, or combine automatic and individual processing by using the deferred and finish processing features.

file:///C|/VisioneerDoc/html/02learn.htm (1 of 33) [1/20/2003 4:21:15 PM]

Starting Pro OCR

The following procedure helps you to get acquainted with Pro OCR and make sure that everything is set up correctly.

TIP: In addition to the following procedure, Visioneer provides two other ways to start and use Pro OCR: 1) From the Windows Start menu, choose Programs, and then choose Visioneer OCR Wizard. 2) If you use PaperPort software, start PaperPort and then choose the Pro OCR link.

To start Pro OCR and select processing options:

1. From the Windows Start menu, choose Programs, and then choose Visioneer. From the Visioneer menu, choose Visioneer Pro OCR 100.

The Pro OCR window appears. It includes pull-down menus, the Gallery toolbar, the Style ribbon, and the Status bar.

Feature Does this...

Pull-down menus Contains commands and options that you use to set process options and initiate actions. Many of the commands in the pull-down menus are also available by using the Gallery buttons and their drop-down lists.

Gallery toolbar Lets you change common settings, start Auto OCR, or individually perform any of the basic steps required to convert an image to text. Several Gallery buttons have drop-down lists from which you can select options.

Style bar Makes it easy to choose various style attributes for selected regions and text. The Region Type options are available in image view and the Text Style options are available in text view.

Status bar Contains controls with which you choose how to view pages (text or image view) and which pages to view. The Status bar also contains a status display area to keep you informed of Pro OCR’s progress.

Zoom controls Magnifies or reduces the view of the document.

View controls Displays the page in a landscape or portrait orientation.

Page controls Displays the previous or next page.

Suspects or Illegibles Displays the number of suspect or illegible characters in the document.

Selecting a TWAIN-Compliant Scanner

Before you scan an item with Pro OCR, make sure the scanner software is installed, and the scanner can scan images into your computer. Pro OCR works with many TWAIN-compliant devices. You can select the TWAIN device in the Pro OCR software.

NOTE: If you are using Pro OCR with Visioneer’s PaperPort software or scanners, see the Working with PaperPort document that came with Pro OCR instead of the following procedure. If you are using a scanner that is not TWAIN-compliant, you cannot scan directly into Pro OCR. Instead, use your scanner’s software to save the scanned page as a TIFF file, and then use the Pro OCR Get File command. For more information, see “Getting Pages from an Image File” in Chapter 3.

To select a scanner:

1. Choose Select Scanner from the Tools menu.

The Select Source dialog box appears.

Figure 2-1: Select Source Dialog Box

NOTE: If the scanner driver you want is not shown, make sure that the scanner is properly connected to the computer and that both the scanner and the computer are plugged in, turned on, and operating correctly.

2. In the Select Source dialog box, select the TWAIN scanner driver you want to use with Pro OCR.

3. Click Select.

The scanner you selected is available until you select a different one. You don’t have to repeat this procedure unless you want to select a different scanner.

Learning About the Gallery Toolbar

The Gallery toolbar contains buttons for starting the various steps of the Pro OCR process, including the Auto OCR button. The buttons numbered one through four are also important because you can select different options from drop-down lists before processing a document. For example, you can tell Pro OCR whether the document is one column or multiple columns. The options you select from these buttons affect the way that Auto OCR processes a document.

NOTE: Often you will use Auto OCR to complete processing. However, sometimes it is better to perform each step individually. (This is also referred to as manual or single-step operation.) For example, you use the single-step procedures when you want to manually define locate regions, create a template, redo a step, recognize text with different type quality settings, or scan pages that have mixed orientations (portrait and landscape).

Button Does this...

Auto OCR Performs Steps 1, 2, and 3 (Get, Locate, and Recognize) of the OCR process. Before you click this button, select processing options from the Get, Locate, and Recognize drop-down lists.

Get Page Scans a page or opens an image file.

Locate Locates areas of text, pictures, and numbers and determines how text flows on the page.

Recognize Converts areas of the page into editable text.

Proof Checks the document for errors.

Save As Saves the converted document in a variety of formats, such as text, Rich Text Format (RTF), or HTML.

You can select options with the Gallery buttons by using the drop-down list next to each button.

To select an option from a Gallery drop-down list:

1. Click the arrow next to the Gallery button you want.

The drop-down list for the button appears. The following figure shows the Locate button with the drop-down list displayed.

2. Select the option you want.

A checkmark appears next to the option you selected.

Tutorial Examples

Now that you know the basic steps, you can practice them using the sample documents that came with Pro OCR. The Pro OCR software comes configured and ready to use, so you don’t have to change the various options. You can find copies of the pages that you scan for the tutorials in the back of the Getting Started Guide. You can also find sample files in the Pro OCR directory.

NOTE: If you don’t have a scanner, you can still complete the exercises that require scanning: use the Get File command instead, and select the sample file from the Pro OCR directory.

Example 1: Using Auto OCR to Scan a One-Page Simple Document and Save It in Pro OCR Format

This example shows how to convert (recognize) the text in a one-page document. You can find a ready-to-use sample in the back of the Getting Started Guide.

Selecting Gallery Options

Pro OCR processes a document using the options that are set in each drop-down list associated with a button of the Gallery toolbar.

To set Gallery options for this example:

1. From the Get Page drop-down list, choose Use Scanner.

2. From the Locate drop-down list, choose Locate Text Only and Single Columns Only.

3. From the Recognize drop-down list, choose Degraded or Fax Quality.

Starting Auto OCR

By clicking the Auto OCR button, you can perform the first three steps of the OCR process, that is, Get Page, Locate, and Recognize.

To process a simple document without any graphics:

1. Remove Sample A from the back of the Getting Started Guide. The document is a simple business letter.

2. Place the document on the scanner.

3. Click the Auto OCR button.

When you click Auto OCR, your scanner software dialog box appears.

4. Use the scanner software as you usually do to scan a page.

5. After the scanner has scanned the page, Pro OCR displays a dialog box that asks if you want to scan another page.

6. Click End.

Pro OCR continues with the second step, locating text regions on the page.

A progress bar moves down the page. When Pro OCR finishes locating, it displays text boxes indicating located text regions, with arrows connecting each text region to the next. Pro OCR outputs text in the order in which the arrows connect the text regions.

In the next step, Pro OCR recognizes the located text. While Pro OCR is recognizing, a progress bar again moves down the page.

When Pro OCR finishes recognizing the text, the Recognition Completed dialog box appears.

7. Click OK.

The document appears in the text view. You use the text view to proof the document and correct any errors.

Usually at this point you proof the document. For now, just save it.

Saving a Document

You can save the processed document to disk in different formats. For example, if you want to open the document again in Pro OCR, you select the Pro OCR format.

To save the document:

1. Choose Save from the File menu, or click the Save As button on the Gallery toolbar.

The Save As dialog box appears.

2. Choose Pro OCR from the Save As drop-down list.

By saving the document in this format, you can edit the pages later within Pro OCR. If you save in another file format, you must open it in an application that supports that format.

3. Type in a name for the file in the File Name box.

4. Click Save.

The text and format information of the document is saved in the format you’ve selected.

5. Choose Close from the File menu.

You just completed your first job using Pro OCR. Many of the jobs for which you use Pro OCR are as quick and simple as this one. You can now continue by completing the rest of the examples in this guide.

Example 2: Opening a File and Saving It in a Word Processor Format

Instead of getting and processing a document from a scanner, you can also process a file that was saved on disk. You can use this procedure to read TIFF, PCX, or DCX files produced by Pro OCR or other applications.

Opening a File

For this example, use the file, SAMPLEB.TIF, in the Pro OCR directory. This is a document that has a graphic. Because of the difference between this document and the one used in the previous example, you will change the options in the Gallery toolbar. Although this document has a graphic, let’s assume you don’t want to save the graphic.

You can either set the options before each step or set them all at once. In this example, you’ll set them as you go along.

To set the OCR options and get a file from disk:

1. Select Open File from the Get Page drop-down list.

2. Click the Get Page button in the Gallery toolbar.

The Get Page dialog box appears.

3. In the Pro OCR directory, select the file SAMPLEB.TIF.

4. Click Get.

The sample file is read in and the progress bar moves down the page.

Locating the Regions in a Document

For Pro OCR to properly convert areas of a document, you must locate the regions of the page that will be recognized. There are three types of regions: text, numeric, and picture. A picture region, for example, is one that contains any kind of graphic, illustration, photograph, drawing, or picture. The contents of a picture region cannot be recognized, but can be saved as an image. The Locate options you specify tell Pro OCR which types of regions are in the document.

To specify the regions to locate:

1. Select Locate Text Only and Single Column from the Locate drop-down list.

If you did want to save the graphics in a document, you would select Locate Text and Pictures. Sometimes, you want the graphics so that you can recreate an exact duplicate of the document you are processing.

2. Click the Locate button in the Gallery toolbar.

Pro OCR goes through the document and identifies the different regions. Arrows appear on the document showing the flow of the information.

Recognizing the Document

The third step is to actually convert or recognize the text in a document. Pro OCR reads the text and displays the actual characters.

Before recognizing the document, you should specify the quality of the image text. You can do this by using the Recognize drop-down list.

To recognize the document:

1. Select Degraded or Fax Quality from the Recognize button drop-down list.

2. Click the Recognize button in the Gallery toolbar.

Pro OCR displays a bar that moves through the document as Pro OCR recognizes the text. When the process finishes, you see the document with text only.

Proofing the Document

After a document is recognized, it appears in the text view. In this view, you can proof the document for errors and make changes when you find problems.

When you proof, you can:

■ Inspect recognized text and edit it if necessary.

■ Search for misspelled words, numbers, punctuation, symbols, and alphanumeric words.

■ Change font style information.

NOTE: You can change the proofing options by choosing Options from the Tools menu.

To proof the document:

1. Click the Proof button in the Gallery toolbar, or press the Tab key.

Pro OCR starts at the current insertion point, if there is one. Otherwise, it starts at the top of the current page.

Pro OCR highlights the first word it does not recognize and displays the suspect text in the On-Screen Verifier.

The On-Screen Verifier is a pop-up window that displays the part of the page image corresponding to selected text.

TIP: For a close-up of the text, click the image to increase the magnification.

2. If the text is wrong, select the text and type the correct text.

3. Click the Proof button in the Gallery toolbar again or press the Tab key.

Pro OCR displays the next suspect entry.

4. Repeat the previous steps until you have checked the entire document.

5. If you want to change the font style, select the text, and click the Style option.

Saving the Document

Saving the document places a permanent copy of it on disk.

To save the document:

1. Choose Save from the File menu, or click the Save As button in the Gallery toolbar.

The Save As dialog box appears.

2. Type the file name in the File Name box.

3. Select MS Word for Windows from the Save as drop-down list.

You can save documents in many popular formats, including Rich Text Format (RTF), plain text, and Microsoft Excel.

4. Click Save.

5. Choose Close from the File menu.

Example 3: Scanning a Document of Multi-Column Text

This example introduces you to processing multi-column text such as newspapers, magazine articles, and multi-column books (but not tables), where you want the text to be recognized column by column.

To scan multi-column text and save in Pro OCR format:

1. Put Sample Document C in the scanner. You can find a copy of this document in the back of the Getting Started Guide.

Make sure to place the document in the correct orientation and to align it.

2. Select Locate Text Only and Multiple Columns from the Locate drop-down list in the Gallery toolbar.

Locate Text Only prevents Pro OCR from locating any picture element in the document to be scanned.

3. Select Use Scanner from the Get Page drop-down list in the Gallery toolbar.

4. Click Auto OCR in the Gallery toolbar.

Your scanner software dialog box appears.

5. Use the scanner software as you usually do to scan the document.

After the sample document is scanned, it appears in Pro OCR.

A dialog box appears asking whether you want to scan additional pages. For this example, you won’t scan any additional pages.

6. Click End.

Automatic processing continues with locating and then recognizing.

While Pro OCR recognizes the page, notice the boxes indicating located text regions around each column, and the arrows connecting each text region to the next. Note that by using Locate Text Only, the graphic element in the sample was not located and so a box does not appear around it.

Pro OCR outputs text in the order in which the arrows connect the text regions. For this example, notice how the boxes are drawn and connected.

When Pro OCR finishes recognizing, the Recognition Completed dialog box appears.

7. Click OK.

The document appears in the text view.

To save the document:

1. Choose Save As from the File menu, or click the Save As button in the Gallery toolbar.

The Save As dialog box appears.

2. Select Pro OCR from the Save As Type drop-down list.

The Pro OCR format saves all available information in the document.

3. Type in a name for the file in the File Name box.

4. Click Save.

Both the image of the scanned page and the recognized text are saved. Always save files in the Pro OCR format when you want to reopen them in Pro OCR.

NOTE: To reopen a file saved in the Pro OCR format, use the Open command from the File menu. If you use Get Page, Pro OCR only restores the page image. The Open command restores all the saved information, including any recognized text and proofing information.

5. Choose Close from the File menu.

For information about other file formats, see Chapter 6, “Saving and Printing Documents.”

Example 4: Scanning a Document With Tables and Saving in a Spreadsheet Format

This example introduces you to processing of multi-column text in tables, where you want the text to be recognized as all one text block and not broken into columns. You can use this procedure whenever you want to recognize tables and other documents that you don’t want broken into columns.

To scan multicolumn table text and save in spreadsheet format:

1. Select Single Columns Only and Locate Text Only from the Locate drop-down list in the Gallery toolbar.

2. Put Sample Document D in the scanner.

Make sure to place it in the correct orientation and to align it.

3. Click Auto OCR.

Pro OCR displays your scanner software.

4. Use the scanner software as you usually do to scan the document.

After you scan the sample document, it appears in the Pro OCR window.

A dialog box appears asking if you want to scan additional pages. For this example, you won’t be scanning any additional pages.

5. Click End.

Pro OCR locates and then recognizes the page.

Notice that the text regions are not drawn separately around each column. By using the Single Column locating method, you force Pro OCR to ignore columns and tell it to read the page from left to right, top to bottom.

When Pro OCR is finished recognizing the page, the Recognition Completed dialog box appears.

6. Click OK.

Pro OCR displays the document in the text view.
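The difference between the Multiple Columns and Single Column locating methods can be illustrated with a small Python sketch. It is not Pro OCR's algorithm: the `Line` type, its `x` and `y` coordinates (in inches from the top-left corner of the page), and the `row_tolerance` parameter are hypothetical, and the column regions are assumed to be connected left to right, as the arrows were in Example 3.

```python
# Hypothetical sketch of two reading orders; not Pro OCR's implementation.
from dataclasses import dataclass


@dataclass
class Line:
    """A hypothetical located line of text and its position on the page (inches)."""
    text: str
    x: float  # left edge, measured from the left side of the page
    y: float  # top edge, measured from the top of the page


def multiple_columns_order(regions: list[list[Line]]) -> list[str]:
    """Multiple Columns: read the regions (columns) in the order they are
    connected, and the lines inside each region from top to bottom."""
    ordered = []
    for region in regions:
        ordered.extend(ln.text for ln in sorted(region, key=lambda ln: ln.y))
    return ordered


def single_column_order(lines: list[Line], row_tolerance: float = 0.05) -> list[str]:
    """Single Column: ignore columns and read left to right, top to bottom.
    A line whose top lies within row_tolerance inches of the first line of
    the current row is treated as part of that row."""
    rows: list[list[Line]] = []
    for ln in sorted(lines, key=lambda ln: (ln.y, ln.x)):
        if rows and ln.y - rows[-1][0].y <= row_tolerance:
            rows[-1].append(ln)
        else:
            rows.append([ln])
    return [" ".join(ln.text for ln in sorted(row, key=lambda ln: ln.x)) for row in rows]


# A two-column table: labels on the left, values on the right.
cells = [Line("Alpha", 1.0, 2.0), Line("Beta", 1.0, 2.2),
         Line("10", 4.0, 2.0), Line("20", 4.0, 2.2)]
print(multiple_columns_order([cells[:2], cells[2:]]))  # ['Alpha', 'Beta', '10', '20']
print(single_column_order(cells))                      # ['Alpha 10', 'Beta 20']
```

In this sketch, reading column by column separates each label from its value, while reading row by row keeps each table row together, which is the behavior this example relies on.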

To save the document:

1. Choose Save As from the File menu, or click the Save As button in the Gallery toolbar.

The Save As dialog box appears.

2. Choose Microsoft Excel from the Save as Type drop-down list.

Notice that some options are already selected.

TIP: To change these options, click the Options button.

3. Type in a name for the file in the File Name box.

4. Click Save.

Pro OCR saves the text and format information of the document in the format you have selected.

5. Choose Close from the File menu.

NOTE: If you don’t save a version of this file in the Pro OCR format, you cannot open it again in Pro OCR. You can open the version that you just saved in any spreadsheet application that supports the Microsoft Excel format.

Example 5: Scanning and Saving a Document with Pictures

This example shows you how to scan a document with photographs or line drawings and save it in a word processor file format.

To scan and save a document with pictures:

1. Select Multiple Columns and Locate Text and Pictures from the Locate drop-down list in the Gallery toolbar.

2. Put Sample Document C in the scanner. You can find this document in the back of the Getting Started Guide.

3. Click Auto OCR.

Pro OCR displays your scanner software.

4. Use the scanner software as you usually do to scan a document.

After you scan the sample document, it appears in the Pro OCR window.

When scanning is done, a dialog box appears asking if you want to scan additional pages. For this example, you won’t be scanning any additional pages.

5. Click End.

Automatic processing continues with the Locate and Recognize steps.

The Recognition Completed dialog box appears.

6. Click OK.

The document appears in the text view. Notice that the graphic image appears and has a picture region drawn around it.

To save the document:

1. Choose Save As from the File menu, or click the Save As button in the Gallery toolbar.

The Save As dialog box appears.

2. Choose Rich Text Format (RTF) from the Save as Type drop-down list.

RTF allows you to save the pictures along with the text in the exported file.

NOTE: As an alternative, you can save in the format of an application that you have, such as Ami Pro, Word for Windows, or WordPerfect 5.x.

3. Select the Save Pictures option.

4. Choose Embed in Export File from the Save Pictures drop-down list.

This option embeds the pictures in the RTF file along with the text.

5. Type a name for the file.

6. Click Save.

The picture from the scanned page is now saved within the RTF file along with the recognized text. If you open this file in a word processor that supports pictures in RTF files, you see the recognized text and the pictures.

7. Choose Close from the File menu.

Example 6: Locating a Document Using a Template

At times, you don’t want to recognize all the text on a page. For example, in this exercise the sample page has a header and a footer that you don’t want to recognize or save. The sample template in this example is designed to create a text region around just the body text during the Locate step. The title and copyright in the footer are not recognized (saving time during recognition) and are not displayed (saving you the time of searching for and deleting them).

In this example, you use a supplied template that you can use for your own documents as well. You can also create your own templates, to customize Pro OCR for the kinds of pages that you typically use.

To use a template:

1. Choose Template from the Locate drop-down list in the Gallery toolbar.

2. Choose Select Template from the File menu.

The Select Template dialog box appears.

3. In the Temp folder, find and select the file TEMPB.TPL.

4. Click Open.

Pro OCR displays the name of the template you selected next to Template in the Locate drop-down list.

5. Select Open File from the Get Page drop-down list.

6. Click the Get Page button in the Gallery toolbar.

7. In the Pro OCR directory, select the file SAMPLEB.TIF and click the Get button.

The sample file is read in.

8. Click the Locate button.

Notice that text boxes are drawn around just the body text on the page. This is the text region defined by the template. Only the text within this text region is recognized.

9. Click the Recognize button.

After recognition is complete, the document appears in the text view, where you can review it. Notice that the title and the copyright in the footer were not recognized. If you save this page in an application or text format, only the displayed text is saved.

10. Save and close the document.

Use the same procedures described in the earlier examples.

Example 7: Scanning a Document with Mixed Tables and Manually Locating Regions

This example shows you how to scan and manually locate a document with a table that has some rows or columns suitable for numeric regions and other rows or columns suitable for text regions.

To scan and locate a document with mixed tables:

1. Put Sample Document D in the scanner.

Make sure to place it in the correct orientation and to align it.

2. Select Single Column from the Locate drop-down list.

3. Click the Get Page button.

Pro OCR begins getting the page from the scanner and displays your scanner software.

4. Use your scanner software as you usually do.

Pro OCR scans in the page and then displays it in the image view.

5. Choose Zoom Out from the View menu, or click the Zoom Out button on the Status bar.

You can reduce or enlarge the document on the screen by using the Zoom In or Zoom Out features.

To select regions manually:

1. Scroll the page up a short distance so that the table labeled “ZBOL Mining Production, 1998” is fully visible on your screen.

2. Move the pointer just above and to the left of the first column header, titled “Mineral.”

3. Press and hold the mouse button; then drag down and to the right until the box following the pointer encloses all of the column headers.

4. Release the mouse button.

You have just manually located a text region.

5. Move the pointer just above and to the left of the item labeled “Gold.”

6. Press and hold the mouse button; then drag down and to the right until the box following the pointer encloses the first column of the table.

The box should enclose the items from “Gold” through “Cobalt.”

TIP: If you make a mistake, select the region and press Del.

7. Release the mouse button.

You have just manually located another text region. Note that an arrow appears that connects this text region to the first text region you defined for the table headers.

8. Move the pointer just above and to the left of the first number column.

9. Using the same steps you used to create the text regions, drag the mouse until the box following it encloses all three columns of numbers and release the mouse button.

Make sure the entire image of the number columns is enclosed by the new region you have defined.

10. Choose Numeric from the Style menu.

The locate region you just defined becomes a numeric region.

To make a table from the selected regions:

1. Choose Select All from the Edit menu.

Pro OCR selects all of the locate regions you defined.

2. Choose Make Table from the Edit menu.

Pro OCR creates a table from the selected locate regions.

3. Click the Recognize button in the Gallery toolbar.

Pro OCR recognizes the page image using the locate regions you defined in the previous steps.

After Pro OCR is finished recognizing, the page appears in the text view.

4. Choose Close from the File menu.

A message appears asking if you want to save the document. Close the document without saving it.

You have completed this example.

Getting Documents

Chapter 3 Getting Documents

This chapter tells you how to get (acquire) documents with Pro OCR. It assumes that you have completed the procedures in “Starting Pro OCR” and “Selecting a TWAIN-Compliant Scanner” in Chapter 2.

In this chapter you learn:

■ The basic steps for getting a page

■ How to get a page using a scanner

■ How to get a page from a file

Getting a Page—The Basic Steps

There are two ways to get a page: use Auto OCR to get a page automatically, or perform an individual Get Page. In each case, you need to select the source (your scanner or an image file) that you want to use to get the page. If you select a scanner, you also need to select a few other options. The following procedure gives the basic steps for getting a page. For more detailed information, see “Getting Pages From a Scanner” and “Getting Pages from an Image File,” later in this chapter.

To get a page:

1. Select a source from the Get Page drop-down list in the Gallery toolbar.

2. If you select Use Scanner, select scanner options as described in “Setting Scanning Options,” later in this chapter.

3. Click Get Page or Auto OCR depending on which process you want to use.

4. If you select Use Scanner, scan the document using your scanner. If you select Open File, open the file you want to use.

Getting Pages From a Scanner

You can use a scanner to get one page at a time by using the Get Page button, or use a scanner with Auto OCR to get multiple pages automatically. This section tells you how to:

■ Set scanning options

■ Get one page using Get Page

■ Get pages with Auto OCR

Setting Scanning Options

For scanner settings such as resolution, brightness, and page orientation, see the documentation that came with your scanner. You can set the following processing options in Pro OCR by choosing Options from the Tools menu:

■ Straightening Skewed Images. Automatically straightens type that is skewed (crooked) on a page. When text on a page is badly skewed, Pro OCR may have trouble correctly locating paragraph boundaries. Recognition may also be affected, resulting in many illegible characters.

NOTE: Processing with the Straighten Skewed Images option selected takes longer than processing the same page with this option not selected. However, recognition is usually much better on skewed type if the page image has been straightened. You may want to experiment on skewed pages to see when to use the Straighten Skewed Images option. Pro OCR is preset to not straighten skewed images.

■ Splitting one A3 page. For scanners that can scan 11 by 17 inch pages, you can scan two facing pages of bound material at once, and Pro OCR automatically splits the image into two pages.

■ Auto Orientation. Automatically selects Portrait or Landscape orientation for the page.

By default, Pro OCR does not select these settings for you.
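The Splitting one A3 page option can be pictured as cutting the scanned spread into two page images. The Python sketch below is a simplification under stated assumptions: the image is a hypothetical list of pixel rows, and the cut is made exactly at the horizontal midpoint. This guide does not describe how Pro OCR actually chooses where to split.

```python
# Hypothetical sketch of splitting a two-page spread; not Pro OCR's implementation.
Image = list[list[int]]  # rows of pixels; 1 = black, 0 = white (assumed representation)


def split_spread(image: Image) -> tuple[Image, Image]:
    """Split a two-page spread into left and right page images at the midpoint."""
    if not image or not image[0]:
        raise ValueError("the image has no pixels")
    mid = len(image[0]) // 2
    left = [row[:mid] for row in image]   # left-hand page
    right = [row[mid:] for row in image]  # right-hand page
    return left, right


left, right = split_spread([[1, 0, 0, 1],
                            [0, 1, 1, 0]])
print(left)   # [[1, 0], [0, 1]]
print(right)  # [[0, 1], [1, 0]]
```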

To set Get Page processing options:

1. Choose Options from the Tools menu.

The Options dialog box appears with the Processing tab selected.

2. Select the options that you want to use.

3. Click OK.

Selecting a Scanner as the Source

When you get pages from a scanner by using Auto OCR, a deferred job, or Get Page, one or more page images are read in from the scanner. Pages are scanned according to the current page size, orientation, brightness, and scanning settings selected in your scanner’s software.

When you read in additional pages from a scanner, new page images are added to the active document. You can read up to 999 pages into a document, as long as you have enough available disk space.

To select a scanner as the source:

■ Select Use Scanner from the Get Page drop-down list in the Gallery.

NOTE: If you did not previously select a scanner, the Select Scanner dialog box appears, letting you select one now. (You can also select a scanner by choosing Select Scanner from the Tools menu.)

Getting a Page Using a Scanner

During the single-step Get Page operation, you scan only one side of one page at a time. You cannot automatically read stacks of pages or double-sided pages. Instead, you must manually feed pages that you want to be read. The procedure is the same whether you use an automatic document feeder or a flatbed scanner.

When you scan in a page using Get Page, the new page is added after the current page. If you want to add pages to the end of a document, make sure the last page of the document is displayed before you do Get Page. To insert a page after any other page, make sure the appropriate page is displayed. Go to the page, if necessary, and then use Get Page to insert the new page after it. You can also use single-step Get Page to replace a current page.
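The rules for where Get Page puts a new page (after the current page, which is then displayed) and the 999-page limit can be modeled in a short Python sketch. The `Document` class and its methods are hypothetical illustrations and are not part of Pro OCR.

```python
# Hypothetical model of Get Page page placement; not Pro OCR's implementation.
MAX_PAGES = 999  # the per-document page limit stated in this guide


class Document:
    """A document as an ordered list of page images plus the displayed page."""

    def __init__(self) -> None:
        self.pages: list[str] = []
        self.current = -1  # index of the displayed page; -1 while the document is empty

    def go_to(self, index: int) -> None:
        """Display the page at the given index."""
        if not 0 <= index < len(self.pages):
            raise IndexError("no such page")
        self.current = index

    def get_page(self, image: str) -> None:
        """Insert a newly scanned page after the current page and display it."""
        if len(self.pages) >= MAX_PAGES:
            raise ValueError(f"a document can hold at most {MAX_PAGES} pages")
        self.current += 1
        self.pages.insert(self.current, image)


doc = Document()
doc.get_page("page 1")
doc.get_page("page 2")    # added at the end, because page 1 was displayed
doc.go_to(0)              # display page 1 again
doc.get_page("new page")  # inserted between page 1 and page 2
print(doc.pages)          # ['page 1', 'new page', 'page 2']
```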

To get one page from a scanner using Get Page:

1. Make sure you have set scan options as described in “Setting Scanning Options,” earlier in this chapter, and select Use Scanner from the Get Page drop-down list.

2. If you are adding pages to a document that already contains pages, make sure the page you want the new page to follow is displayed in Pro OCR.

New pages are added after the current page.

3. Place one page on the flatbed or in the ADF.

Make sure the page is oriented correctly for your scanner and the orientation you have selected.

You can put in as many pages as the ADF will hold, but Pro OCR will only scan one page at a time using Get Page.

4. Click Get Page.

The Get Page button is highlighted to indicate that Pro OCR is getting pages. In the status display area, a meter bar indicates that Pro OCR is scanning the page.

Pro OCR scans the page on the flatbed or the first page in the ADF, using the current brightness, page size, orientation, and scanning resolution settings. After the page is read in, it is displayed at the previous magnification.

NOTE: To find the most appropriate brightness setting for a page, use Get Page to scan the same page as many times as necessary. You can change the level of brightness in your scanner’s software.

To scan additional pages:

1. Place another page on the flatbed.

2. Click Get Page.

Repeat steps 1 and 2 for each additional page you want to scan.

NOTE: After Get Page is completed, whether or not pages have been located or recognized, you can save files in Pro OCR format or in any of the other image output file formats. For more information about saving, see Chapter 6, “Saving and Printing Documents.”

Using Auto OCR with Scanners

This section tells you how to use Auto OCR with a flatbed scanner or Automatic Document Feed (ADF) scanner.

NOTE: When scanning pages, make sure that pages are placed as straight as possible. Pages skewed more than 2° may result in the incorrect sorting and grouping of text lines unless the Straighten Skewed Images processing option is selected. Also note that pages skewed more than 0.5° may jam in an ADF.
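To check a page against the 2° and 0.5° limits, you can measure how far one line of text rises or falls across its width. The Python sketch below shows the arithmetic; the function names and the 8-inch line width in the example are illustrative assumptions, not values from Pro OCR.

```python
# Skew arithmetic for checking a page against the limits above (illustrative only).
import math


def skew_degrees(drift_in: float, width_in: float) -> float:
    """Skew angle of a text line that rises or falls drift_in inches over width_in inches."""
    return math.degrees(math.atan2(drift_in, width_in))


def drift_at(angle_deg: float, width_in: float) -> float:
    """Rise or fall, in inches, of a line width_in inches wide skewed by angle_deg."""
    return width_in * math.tan(math.radians(angle_deg))


for limit in (2.0, 0.5):
    print(f"{limit}° over an 8-inch line: {drift_at(limit, 8.0):.2f} inch")
# 2.0° over an 8-inch line: 0.28 inch
# 0.5° over an 8-inch line: 0.07 inch
```

By this arithmetic, an 8-inch line that drifts more than about 0.28 inch from end to end is skewed more than 2°, so the Straighten Skewed Images option is worth trying on that page.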

Using Auto OCR with a Flatbed Scanner

To use Auto OCR with a flatbed scanner, complete the following procedure.

NOTE: You cannot scan double-sided pages automatically when using a flatbed scanner. You should place the pages on the scanner’s bed in the order in which you want the text to be read.

To automatically process one or more pages with a flatbed scanner:

1. Make sure you have set scan options as described in “Setting Scanning Options,” earlier in this chapter, and select Use Scanner from the Get Page drop-down list.

2. Check the Locate and Recognize options to make sure they are set the way you want them.

3. Place the first page on the flatbed.

Make sure the page is oriented correctly for your scanner and the page orientation you have selected in the Gallery.

4. To scan more than one page, choose Options from the Tools menu, and then select the Enable Auto OCR Dialogs processing option.

5. Click Auto OCR.

The scanner software appears.

6. Use the software as you usually do.

Pro OCR begins getting pages.

If the Enable Auto OCR Dialogs processing option is not selected, scanning is completed. Pro OCR begins locating and then recognizing.

If the Enable Auto OCR Dialogs processing option is selected, Pro OCR asks for additional pages to scan after it finishes reading in the current page.

7. If you want to get additional pages, place another page on the flatbed.

Pro OCR scans the additional page on the flatbed and displays the dialog box again, asking for the next page. Repeat this step for as many additional pages as you want to scan.

8. If you do not want to scan more pages, click End.

Scanning is completed. Pro OCR displays the page you’ve scanned in the image view and then begins locating and recognizing.

Using Auto OCR With a Scanner with an ADF

Complete the following procedure to use Auto OCR with scanners that have an ADF.

NOTE: To use an ADF scanner with Pro OCR, you need the Pro OCR ISIS upgrade. For more information, visit Visioneer’s Web site at www.Visioneer.com.

To automatically process one or more pages with a scanner that has an ADF:

1. Make sure you have set scan options as described in “Setting Scanning Options,” earlier in this chapter, and select Use Scanner from the Get Page drop-down list.

2. Place one or more pages in the ADF.

Make sure the pages are oriented correctly for your scanner and the page orientation you have selected in the Gallery.

3. To scan more than one page, choose Options from the Tools menu, click the Processing tab, and then select the Enable Auto OCR Dialogs processing option.

4. Check the Locate and Recognize options to make sure they are set the way you want them.

5. Click Auto OCR.

Pro OCR begins getting pages.

If the Enable Auto OCR Dialogs processing option is not selected, scanning is completed. Pro OCR begins locating and then recognizing.

If the Enable Auto OCR Dialogs processing option is selected, Pro OCR asks for additional pages to scan.

6. If you want to scan another stack of pages, place the next stack of pages in the ADF.

Pro OCR scans the additional pages in the ADF and displays the dialog box again, asking for additional pages to scan. If you need to scan the second side of a stack of double-sided pages, see the next procedure, “To scan the second side of double-sided pages.”

Repeat this step for as many additional stacks of pages as you want to scan.

7. If you’ve scanned all the pages you need for this job, click End.

Scanning is completed. Pro OCR displays the first page of the scanned stack in the image view. Pro OCR then begins locating and recognizing.

To scan the second side of double-sided pages:

1. When you’re finished scanning the first side, turn the entire stack of pages in the ADF over and replace them in the ADF.

Make sure that you don’t change the order of pages, and that you replace them in the proper orientation. If your double-sided document contains more pages than your ADF can handle, you’ll need to separate the document into smaller stacks. After scanning the first side of a smaller stack, scan the flip side of the same stack before continuing with the next stack.

2. Click Flip in the dialog box that appears.

Pro OCR scans the second side of each page using the current brightness, page size, orientation, and scanning resolution settings.

3. When you’re finished, click End.

Scanning is completed. Pro OCR finishes getting pages and displays the first page of the scanned stack in the image view. The pages from both sides of the stack are placed in the correct page order.
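Pro OCR puts the two sides in page order for you. The Python sketch below shows one way such sequencing can work. It is a hypothetical illustration, and it assumes that turning the stack over makes the second sides come back in reverse order (the `backs_reversed` flag), which depends on how your ADF feeds pages.

```python
# Hypothetical sketch of duplex sequencing; not Pro OCR's implementation.
def sequence_duplex(fronts: list[str], backs: list[str], backs_reversed: bool = True) -> list[str]:
    """Interleave first-side and second-side scans of one stack into page order."""
    if len(fronts) != len(backs):
        raise ValueError("each sheet needs both a first-side and a second-side image")
    if backs_reversed:
        backs = backs[::-1]  # assumed: the flipped stack returns the last sheet's back first
    pages: list[str] = []
    for front, back in zip(fronts, backs):
        pages.extend([front, back])
    return pages


# Three sheets: the first pass scans pages 1, 3, 5; the flipped stack returns 6, 4, 2.
print(sequence_duplex(["p1", "p3", "p5"], ["p6", "p4", "p2"]))
# ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']
```

When a long document is split into smaller stacks, as step 1 describes, each stack would be sequenced this way and the results appended in stack order.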

Getting Pages from an Image File

Typically, Pro OCR obtains the image of a page by working directly with your scanner. You can, however, also use Pro OCR with image files you scanned or created using other applications. There are several common sources for obtaining image files, other than scanning with Pro OCR:

■ Scanner applications not supported by Pro OCR

■ Fax-modem applications

■ High-resolution paint programs

Pro OCR can read the following image file formats:

■ TIFF (Uncompressed, PackBits, Group 3, Group 3 modified, Group 4)

■ PCX

■ DCX

Pro OCR can open black-and-white (one-bit) single-page or multiple-page image files. Pro OCR does not open grayscale (greater than one-bit) or color image files.

However, not every file in these formats is supported, because some applications implement the formats in nonstandard ways. If you try to open a file that Pro OCR doesn’t recognize, Pro OCR displays a warning message.
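If you want to check files before reading them in, you can look at their first bytes. The Python sketch below is a hypothetical pre-check, not part of Pro OCR. It recognizes the standard TIFF, DCX, and PCX signatures and, for PCX, whether the header describes a one-bit image. It does not check TIFF bit depth or compression, so Pro OCR may still reject a file that passes.

```python
# Hypothetical pre-check for supported image files; Pro OCR performs its own checks.
import struct
from typing import Optional

DCX_MAGIC = 987654321  # DCX files begin with this 32-bit little-endian identifier


def image_format(header: bytes) -> Optional[str]:
    """Return 'TIFF', 'DCX', or 'PCX' based on a file's leading bytes, or None."""
    if header[:4] in (b"II*\x00", b"MM\x00*"):  # little- and big-endian TIFF
        return "TIFF"
    if len(header) >= 4 and struct.unpack("<I", header[:4])[0] == DCX_MAGIC:
        return "DCX"
    if header[:1] == b"\x0a":  # PCX manufacturer byte
        return "PCX"
    return None


def pcx_is_one_bit(header: bytes) -> bool:
    """True if a 128-byte PCX header describes a black-and-white (one-bit) image:
    1 bit per pixel (byte 3) in a single color plane (byte 65)."""
    return len(header) >= 66 and header[3] == 1 and header[65] == 1


def check_file(path: str) -> str:
    """Describe whether a file looks like an image type listed in this section."""
    with open(path, "rb") as f:
        header = f.read(128)
    kind = image_format(header)
    if kind is None:
        return "not a TIFF, PCX, or DCX file"
    if kind == "PCX" and not pcx_is_one_bit(header):
        return "PCX file, but not one-bit black and white"
    return f"{kind} file"
```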

Selecting a File as the Source and Getting Pages

The following procedure tells you how to select and open a file as the source for Get Page.

To select and open files:

1. Select Open File from the Get Page drop-down list.

A checkmark appears next to it when selected.

2. Click the Get Page button in the Gallery toolbar.

The Get Page dialog box appears.

3. Select the file and click Get.

The file is read in and the progress bar moves down the page.

Getting Files From Other Scanner Applications

Pro OCR supports many of the most popular scanners directly. However, if you don’t have a scanner that Pro OCR supports directly, you may still be able to use Pro OCR with the scanner application you do have. Most scanner applications save to one of the image file formats that Pro OCR supports.

To get pages from a non-supported scanner:

1. Scan a page using a scanner application that is compatible with your scanner.

2. Save the page in an image file format that Pro OCR supports.

3. Select Open File from the Get Page drop-down list.

4. Click Auto OCR.

5. Find and select the file(s) that you want to process.

6. Click Add and then click Get.

Pro OCR automatically processes the image file(s) according to the controls in the Locate and Recognize rows of the Gallery.

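To read in the file without processing it automatically, complete steps 1 through 3 above, and then:

1. Click Get Page.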

2. Find and select the file that you want to process.

3. Click Add and then click Get.

Pro OCR reads in the specified file. You can continue with any combination of the single-step Locate and Recognize operations, followed by Finish Processing, or save the document in the Pro OCR Deferred format and finish processing it later using Process Deferred Jobs.

After the page is read in, Pro OCR treats the page as if it had scanned it.

Getting Fax-modem Files

Pro OCR can also open fax-modem files, if they have been saved in one of the supported input file formats.

Many fax-modems have both a Standard and a High-Resolution (or Fine) setting. The Standard setting typically transmits at 204 x 98 dpi. The High-Resolution setting typically transmits at 204 x 196 dpi. Pro OCR may not recognize fax-modem files transmitted at the Standard setting as accurately as those transmitted at High-Resolution.

To get a fax-modem file, use the same procedure as described in the previous section.

NOTE: For the best possible recognition, use the highest resolution your fax-modem can produce.
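The two fax settings differ only in vertical resolution: both use 204 dpi across the page, but Standard captures 98 rows per inch and High-Resolution captures 196. A quick Python calculation shows roughly how many scan rows one line of 10-point type spans at each setting. The point size is only an example, and actual letter shapes are somewhat smaller than the nominal point size.

```python
# Scan rows per line of type at the two typical fax resolutions (illustrative only).
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def rows_spanned(point_size: float, vertical_dpi: float) -> float:
    """Approximate number of scan rows covered by one line of type of point_size."""
    return point_size / POINTS_PER_INCH * vertical_dpi


for name, dpi in (("Standard", 98), ("High-Resolution", 196)):
    print(f"{name}: {rows_spanned(10, dpi):.1f} rows for 10-point type")
# Standard: 13.6 rows for 10-point type
# High-Resolution: 27.2 rows for 10-point type
```

Having twice as many rows per character gives recognition more detail to work with, which helps explain why High-Resolution faxes usually recognize more accurately.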

Using Auto OCR With a File

You can specify one or more image files for the Get Page step, and then have Pro OCR automatically locate and recognize them. If you’ve selected the Enable Auto OCR Dialogs processing option, you can also select one or more additional files after reading the initial files and before locating and recognizing begin. Pro OCR can process most standard black and white TIFF, PCX, and DCX files.

To automatically process from a file:

1. Select Open File from the Get Page drop-down list in the Gallery toolbar.

2. Check the Locate and Recognize options to make sure they are set the way you want them.

3. Click Auto OCR.

The Get Page dialog box appears.

4. Find and select the files you want.

To get just one file, click the file name.

To get multiple files, click the Advanced button. The dialog box expands. Click the file that you want to get and then click the Add button. The file names appear in the Selected Files list in the lower half of the dialog box. You can add available files from as many directories and disks as necessary. Files are displayed in the Selected Files list in the order in which you add them.

NOTE: To remove a file from the Selected Files list, select the file name and click the Remove button. To remove all selected files, click Remove All.

5. Click Get.

Pro OCR reads in the selected file or files. As a page is read in, the Get Page button is highlighted, and the progress bar moves down the page.

Each page is read in and displayed in the image view at 25% magnification (zoom level).

If the Enable Auto OCR Dialogs processing option is not selected, when all pages have been read Pro OCR finishes getting pages and displays the first page in the image view. Pro OCR then locates and recognizes each page.

If the Enable Auto OCR Dialogs processing option is selected, when all pages have been read the Get Page dialog box is again displayed.

6. To add pages from an additional file or files to the end of your current document, repeat steps 4 and 5 as often as necessary.

Each time you read in another file, the new pages are read in and added to the end of the current document. You can read up to 999 pages into a document, as long as you have enough available disk space.

7. When you’re done reading files, click Finished.

When you click Finished, the file reading step completes, and locating and then recognizing begin. For more information about locating, see Chapter 4, “Locating Text and Graphics.” For more information about recognizing and proofing, see Chapter 5, “Setting Recognize Options and Proofing a Recognized Document.”

NOTE: When you use Auto OCR, the locate and recognize steps occur automatically.

More About Enabling Auto OCR Dialogs

By default, after you’ve used Auto OCR to scan pages or to read in one or more files, Pro OCR displays a dialog box that prompts you to continue in one of several ways:

■ Scan another page or stack of pages

■ Scan the second side of a page or stack

■ Open additional files

This lets you read in and process multiple files or stacks of pages as one document.

However, it also means that you have to click Finish to proceed with automatic processing after the Get Page step is done. If instead you want the Auto OCR process to continue without interruption, you can prevent the dialog box from reappearing by deselecting the Enable Auto OCR Dialogs option.

NOTE: You can’t process more than a single stack of pages or set of files when Enable Auto OCR Dialogs is deselected.

To enable/disable Auto OCR dialog boxes:

1. From the Tools menu, choose Options.

The Options dialog box appears with the Processing tab selected.

2. To enable the dialog boxes, select Enable Auto OCR Dialogs. To disable the dialog boxes, deselect the option.

Saving and Printing Documents

Chapter 6 Saving and Printing Documents

This chapter describes the input file formats and output file formats that Pro OCR supports and tells you how to save documents in a variety of these formats.

Saving Documents and Other Pro OCR Items

You can save the following documents and items:

■ Documents (in various file formats)

■ Templates (text, numeric, picture, and table region definitions and ordering information)

■ Gallery settings and selected processing, display, and proofing options

Saving a Document

Documents are not saved automatically. You save a document using Save or Save As from the File menu. If you close or exit Pro OCR without saving a document, a message prompts you to save the current document.

After you get a document, you can save it at any or all of the stages of the Pro OCR process: after locating, recognizing, or proofing. If a document does not contain recognized text, you can save it only in Pro OCR or Pro OCR Deferred format, or in one of the standard image file formats.

NOTE: When you save to formats other than Pro OCR or Pro OCR Deferred, you must still save the document in one of the Pro OCR formats to be able to use it again in Pro OCR.

To save an open document:

1. Choose Save As from the File menu.

The Save As dialog box appears.

If the document has been saved previously, its name is displayed and selected in the File Name box. If the document has not been saved yet, the File Name box is selected and contains the default file name UNTITLED.XXX. Pro OCR adjusts the file extension, represented here as XXX, according to the document format you select in the Save as Type drop-down list.

2. Type a new file name, if necessary.

3. Choose a document format from the Save as Type drop-down list.

If this is a new file, the last used document format is displayed. If this is a previously saved file, the previously saved document format is displayed.

You can choose from the following document formats:

■ Pro OCR document file formats

■ Standard image file formats

■ Standard text file formats

■ Word processor and spreadsheet file formats

For more information about the different file formats, see “Supported Output File Formats” later in this chapter.

4. If you want to save any pictures in the document, select the Save Pictures option and choose a picture format from the Picture Format drop-down list.

NOTE: Saving pictures in a document is different from saving the entire page image. Save the page image using one of the image file formats presented in the Save as Type drop-down list. For more information, see “Saving Pictures” later in this chapter.

5. If you want to embed any pictures into the document when it is saved, choose Embed in Export File from the Picture Format drop-down list.

The embedding option is only available for the following document formats:

■ MS Word for Windows

■ Rich Text Format (RTF)

■ WordPerfect 5.0 and 5.1

If you want to save the page images only, choose one of the image file formats from the Save as Type drop-down list.

Choosing an image file format tells Pro OCR to save only the page images from the active document.

NOTE: When saving the document in one of the standard TIFF formats, you can choose whether to save all pages in one file, to split on blank pages or to save one page per file. When saving to the PCX format, you must save one page per file.

6. To select the formatting information that will be exported to the format currently chosen in the Save as Type drop-down list, click the Options button to open the Save As Options dialog box.

Most formats have additional options. If there are no options available for the format you’ve selected, the Options button is dimmed.

The Save As Options dialog box has the following sets of options:

■ Whether page breaks should be inserted between pages

■ Whether formatting should be preserved, completely discarded, or only partly preserved

■ Whether all pages in the document should be saved as a single file or as multiple files

If you choose to preserve only certain formatting, you can select which of the following to save:

■ Style

■ Font (typeface)

■ Point size

■ Justification

■ Number of columns

■ Line spacing

■ Paragraph indentation

■ Page size

■ Margin sizes

Choose one of the Split Document options to either keep all pages in one file or split the document into multiple files:

■ All Pages in One File: Choose this option to save all the pages in the document in one file.

■ Split on Blank Pages: Choose this option when you want Pro OCR to save a stack of documents into separate files.

To use this option, before you scan the stack of pages, put a blank page after the last page of each document you want Pro OCR to save as a separate file. For more information about saving multiple documents, see “Saving Multiple Documents as Separate Files” and “Saving Multiple Page Images as Separate Image Files” later in this chapter.

NOTE: For Split on Blank Pages to work properly, make sure to use the Recognize operation on every blank page.

Pro OCR saves each stack of pages up to a blank page as a separate file, using the name you specified followed by a sequential three-digit numeric identifier, followed by the appropriate extension. For example, if you name the current document BOOK, and then save it to Excel 2.x format with “Split on Blank Pages” selected, Pro OCR will save the first file (up to the first blank page) as BOOK001.XLS, the next file as BOOK002.XLS, and so on.

■ One Page Per File: Many image editing programs can support only one image page per file. If you save in PCX format, Pro OCR automatically selects this option, because a PCX file can only have one page.

When you use this option, Pro OCR automatically creates one file for each page. Pro OCR saves each file using the name you specified followed by a sequential three-digit numeric identifier, followed by the appropriate extension. For example, if you name the current document IMAGE, and then save it in a TIFF format with “One Page Per File” selected, Pro OCR saves the first page image as IMAGE001.TIF, the next page image as IMAGE002.TIF, and so on.

7. If you opened the Save As Options dialog box, click OK to close it.

The Save As dialog box reappears.

8. Click OK.

The document is saved according to the selected options.

If you try to save the document with a name that has already been used, a dialog box asks if you want to replace the existing document. Click No to return to working with the document. Click Yes to replace the document.

NOTE: When you want to open a document in an image editing program, save it in one of the image file formats. Any locate regions that have been applied or created are not saved. If the document has been recognized, the recognized text is not saved.

Saving Multiple Documents as Separate Files

Often you’ll have many documents on which you want to perform Get Page, Locate, and Recognize at one time, but you want the recognized files saved as separate documents. Pro OCR makes it easy for you to process a large stack of separate documents as one and still keep them separate when you save them. You can do this when you’re saving to a text format, to any of the image output formats, or to any export format.

To save multiple multipage documents as separate files using the Split on Blank Pages option:

1. Before you put the pages in the scanner, separate the documents by putting a blank piece of paper between each document and the next.

2. Process the pages as you would normally.

3. When you save the document, choose Save As from the File menu.

The Save As dialog box appears.

4. Click the Options button.

The Save As Options dialog box appears.

5. Select the Split on Blank Pages option and click OK.

Pro OCR saves each stack of pages up to a blank page as a separate file, using the name you specified followed by a sequential numeric identifier, followed by the appropriate extension. For example, if you name the current document BOOK, and then save it to Excel 2.x format with the Split on Blank Pages option selected, Pro OCR will save the first file (up to the first blank page) as BOOK001.XLS, the next file as BOOK002.XLS, and so on.

6. Click OK.
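The Split on Blank Pages behavior described in this procedure can be modeled roughly as shown below. This is a minimal Python sketch under stated assumptions, not Pro OCR's actual implementation: each page is represented by its recognized text, a page counts as blank when that text is empty or only whitespace, the blank separator page is not written to any file, and runs of consecutive blank pages do not produce empty files. The functions split_on_blank_pages and numbered_names are hypothetical names used only for illustration.

```python
from typing import List


def split_on_blank_pages(pages: List[str]) -> List[List[str]]:
    """Group pages into separate documents, using blank pages as separators.

    Assumption: a page is modeled as its recognized text, and a page is
    blank when that text is empty or whitespace only.
    """
    documents: List[List[str]] = []
    current: List[str] = []
    for page in pages:
        if page.strip():
            current.append(page)
        elif current:
            # A blank page closes the current document; the blank page
            # itself is not saved. Extra blank pages are skipped (assumption).
            documents.append(current)
            current = []
    if current:
        documents.append(current)
    return documents


def numbered_names(base: str, extension: str, count: int) -> List[str]:
    """Build BASE001.EXT, BASE002.EXT, ... for `count` output files."""
    return [f"{base}{i:03d}.{extension}" for i in range(1, count + 1)]


# Two documents separated by a blank page, saved as BOOK to Excel 2.x (XLS).
pages = ["Chapter 1 text", "more text", "", "Chapter 2 text", ""]
docs = split_on_blank_pages(pages)
for name, doc in zip(numbered_names("BOOK", "XLS", len(docs)), docs):
    print(name, len(doc), "page(s)")
# BOOK001.XLS 2 page(s)
# BOOK002.XLS 1 page(s)
```

Placing the blank sheet after the last page of each document, as step 1 asks, is what lets a single pass over the stack recover the document boundaries.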

To save multiple single-page documents as separate files using the One Page Per File option:

1. Process the pages as you usually would.

2. When you save the document, choose Save As from the File menu.

The Save As dialog box appears.

3. Click the Options button.

The Save As Options dialog box appears.

4. Select the One Page Per File option and click OK.

Pro OCR saves each page as a separate file, using the name you specified followed by a sequential numeric identifier, followed by the appropriate extension. For example, if you name the current document PAGES, and then save it to RTF format with the “One Page Per File” option selected, Pro OCR will save the first page as PAGES001.RTF, the next page as PAGES002.RTF, and so on.

5. Click OK.

Saving Multiple Page Images as Separate Image Files

In addition to Pro OCR format, you can save a document in a number of image output formats. Usually, you’ll save a copy of your document in one of these graphic formats when the document you’re processing has illustrations that you want to save and use in other applications. Because many image-processing programs cannot process multipage image files, you’ll probably want to save multipage documents one image per file.

To save multiple pages as separate image files:

1. Process the pages as you usually would.

2. When you save the document, choose Save As from the File menu.

The Save As dialog box appears.

3. Click the Options button.

The Save As Options dialog box appears.

4. Select the One Page Per File option and click OK.

When you name the file, choose a file name of up to five characters. If the file name is longer, Pro OCR truncates it to five characters.

Pro OCR saves each page image as a separate file, using the name you specified followed by a sequential numeric identifier, followed by the appropriate extension. For example, if you name the current document IMAGE, and then save it to PCX format with “One Page Per File” selected, Pro OCR will save the first page image as IMAGE001.PCX, the next page image as IMAGE002.PCX, and so on.
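The naming rule for separate image files can be sketched in a few lines. In this minimal Python sketch, page_image_names is a hypothetical helper, not part of Pro OCR: it truncates the name to five characters, as this section describes, and then appends a three-digit page counter and the extension. The manual does not say why the limit is five characters; a plausible reason (an assumption) is that five characters plus three digits exactly fill the eight-character base name allowed by DOS-style 8.3 file names.

```python
from typing import List


def page_image_names(name: str, extension: str, page_count: int) -> List[str]:
    """Hypothetical model of One Page Per File naming for image formats.

    The base name is cut to five characters, then a three-digit
    sequential number and the extension are appended for each page.
    """
    base = name[:5]  # longer names are truncated to five characters
    return [f"{base}{page:03d}.{extension}" for page in range(1, page_count + 1)]


# A three-page document named IMAGE saved to PCX:
print(page_image_names("IMAGE", "PCX", 3))
# ['IMAGE001.PCX', 'IMAGE002.PCX', 'IMAGE003.PCX']

# A longer name is truncated before numbering:
print(page_image_names("INVOICES", "PCX", 2))
# ['INVOI001.PCX', 'INVOI002.PCX']
```

Choosing a name of five characters or fewer, as recommended above, avoids the truncation shown in the second example.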

Saving Templates

Save a template when you’ve defined locate regions that can be applied to other page images. You can use a single template to identify the locate regions on all pages to be recognized, or use different templates to identify locate regions on different pages.
