Pro OCR User’s Guide
Chapter 1
Introducing Visioneer Pro OCR 100
This chapter introduces you to the Pro OCR application and to the concept of
optical character recognition (OCR).
Why Pro OCR
Pro OCR is an Optical Character Recognition (OCR) application. An OCR
application converts images of text, such as those obtained from scanning a
document or receiving a fax through your fax-modem, into editable text. For
example, when a scanner scans a page of text, it sees black and white areas on the
page. The scanner converts what it sees into an image and stores the image on the
computer. To transform a scanned text image into something a word processing or
spreadsheet application can recognize as characters, you need an OCR (optical
character recognition) application, such as Pro OCR.
Every day you may spend a lot of time retyping printed text or numbers from hard
copy documents. By using Pro OCR and a scanner as an input device, you can
eliminate much of this retyping.
Features and Highlights of Pro OCR
Many existing OCR products can typically recognize 200–300 plain, nonstylized
typefaces. Using its recognition technology, Pro OCR can recognize over 2,000
typefaces.
Most basic OCR applications inspect the scanned page image, attempt to recognize
the dots on the page as characters, and transform the image into a plain text file. Pro
OCR does all of these basic tasks, but it can also bring the entire page into your word
processor or spreadsheet as is, retaining the shape, form, type, and spacing, as well
as the content, of the input page.
Pro OCR provides:
■ The ability to read one or more pages of text including graphics. Pro
OCR reads pages directly from your scanner, or it reads TIFF, PCX, and
DCX files. Pro OCR can automatically locate pictures and embed them in
your document. You can also export pictures separately in a number of file
formats.
■ Speed and accuracy of recognition. With most documents, Pro OCR is
faster than, and as accurate as, a good typist.
■ Numeric regions. You can specify that a given region on a page can contain
only numbers. Numeric regions help Pro OCR make sure that numbers are
always recognized as numbers and never mistakenly identified as letters.
■ Recognition and retention of fonts, characters, styles, and page
formatting. Pro OCR recognizes and retains the differences between serif
and sans-serif fonts, styles such as bold, underline, and subscript, and
formatting such as columns, tables, and indents.
■ Deferred and batch processing. You can perform procedures that need
your attention or interaction (for example, locating), and then do the
time-consuming steps that don’t need interaction (for example, recognizing) at
another time.
■ Internet readiness. Pro OCR supports the HTML export format. You can
convert an image file directly to an HTML page and upload it to a Web site.
■ Proofing options. Pro OCR has a number of proofing options. You can also
send recognized text directly to your word processor.
■ Save features. With Pro OCR you can save recognized text in a wide
variety of word processor and spreadsheet file formats. Pro OCR works with
imperfect input pages that may have skewed lines of text, touching or broken
characters, and fuzzy characters.
Reproduction, adaptation, or translation without prior written permission is
prohibited, except as allowed under the copyright laws.
AnyPort, AutoFix, AutoLaunch, FormTyper, MicroChrome, PaperEnable,
PaperLaunch, PaperPort, PaperPort Deluxe, PaperPort ix, PaperPort Links,
PaperPort mx, PaperPort PowerBar, PaperPort 3000, PaperPort 6000, PaperPort vx,
PaperPortation, PaperPort Strobe, Pro OCR, ScanDirect, SimpleSearch, SharpPage,
and Visioneer are trademarks of Visioneer, Inc. PaperPort, Paper-driven, and the
Visioneer logo are registered trademarks of Visioneer, Inc.
Information is subject to change without notice and does not represent a
commitment on the part of Visioneer, Inc. The software described is furnished
under a licensing agreement. The software may be used or copied only in
accordance with the terms of such an agreement. It is against the law to copy the
software on any medium except as specifically allowed in the licensing agreement.
No part of this document may be reproduced or transmitted in any form or by any
means, electronic or mechanical, including photocopying, recording, or information
storage and retrieval systems, or translated to another language, for any purpose
other than the licensee’s personal use and as specifically allowed in the licensing
agreement, without the express written permission of Visioneer, Inc.
Part Number: 05-0340-000
Restricted Rights Legend
Use, duplication, or disclosure is subject to restrictions as set forth in contract
subdivision (c)(1)(ii) of the Rights in Technical Data and Computer Software
Clause 52.227-FAR14. Material scanned by this product may be protected by
governmental laws and other regulations, such as copyright laws. The customer is
solely responsible for complying with all such laws and regulations.
Visioneer’s Limited Product Warranty
If you find physical defects in the materials or the workmanship used in making the
product described in this document, Visioneer will repair, or at its option, replace,
the product at no charge to you, provided you return it (postage prepaid, with proof
of your purchase from the original reseller) during the 12-month period after the
date of your original purchase of the product.
THIS IS VISIONEER’S ONLY WARRANTY AND YOUR EXCLUSIVE
REMEDY CONCERNING THE PRODUCT, ALL OTHER
REPRESENTATIONS, WARRANTIES OR CONDITIONS, EXPRESS OR
IMPLIED, WRITTEN OR ORAL, INCLUDING ANY WARRANTY OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR
NON-INFRINGEMENT, ARE EXPRESSLY EXCLUDED. AS A RESULT, EXCEPT
AS SET OUT ABOVE, THE PRODUCT IS SOLD “AS IS” AND YOU ARE
ASSUMING THE ENTIRE RISK AS TO THE PRODUCT’S SUITABILITY TO
YOUR NEEDS, ITS QUALITY AND ITS PERFORMANCE.
IN NO EVENT WILL VISIONEER BE LIABLE FOR DIRECT, INDIRECT,
SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES RESULTING
FROM ANY DEFECT IN THE PRODUCT OR FROM ITS USE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
All exclusions and limitations in this warranty are made only to the extent permitted
by applicable law and shall be of no effect to the extent in conflict with the express
requirements of applicable law.
FCC Radio Frequency Interference Statement
This equipment has been tested and found to comply with the limits for a class B
digital device, pursuant to part 15 of the FCC Rules. These limits are designed to
provide reasonable protection against interference in a residential installation. This
equipment generates, uses and can radiate radio frequency energy and, if not
installed and used in accordance with the instructions, may cause harmful
interference to radio communications. However, there is no guarantee that
interference will not occur in a particular installation. If this equipment does cause
harmful interference to radio or television reception, which can be determined by
turning the equipment off and on, the user is encouraged to try to correct the
interference by one or more of the following measures:
■ Reorient or relocate the receiving antenna.
■ Increase the separation between the equipment and receiver.
■ Connect the equipment into an outlet on a circuit different from that to
which the receiver is connected.
■ Consult the dealer or an experienced radio/TV technician for help.
This equipment has been certified to comply with the limits for a class B computing
device, pursuant to FCC Rules. In order to maintain compliance with FCC
regulations, shielded cables must be used with this equipment. Operation with
non-approved equipment or unshielded cables is likely to result in interference to radio
and TV reception. The user is cautioned that changes and modifications made to the
equipment without the approval of the manufacturer could void the user's authority to
operate this equipment.
This device complies with part 15 of the FCC Rules. Operation is subject to the
following two conditions: (1) This device may not cause harmful interference, and
(2) this device must accept any interference received, including interference that
may cause undesired operation.
Table of Contents
Chapter 1: Introducing Visioneer Pro OCR 100
Chapter 2: Learning Pro OCR Basics
Chapter 3: Getting Documents
Chapter 4: Locating Text and Graphics
Chapter 5: Setting Recognize Options and Proofing a Recognized Document
Chapter 6: Saving and Printing Documents
Chapter 7: Creating and Processing Deferred and Batch Jobs
Chapter 8: Tips for Getting the Best Results
Glossary
Glossary
built-in dictionary
CCITT
character
character format
character identification error
character image
character recognition
character style
clipboard
column information
compression
confidence
consistent document
copyrighted document
deferred job
deferred processing
degraded image
dialog box
desktop
document area
dots per inch (dpi)
dpi
draft quality text
driver
exporting
export format
file extension
file formats
file type
fine resolution
flatbed scanner
font
font family
font mapping
format retention
Gallery
Get Page
grayscale image
hard page breaks
heavy character
I-beam pointer
icon
illegible character
illegible character symbol
image view
input file formats
insertion point
italic text
justification
kerning
landscape orientation
layout
layout analysis error
Legal page size
Lenient suspect threshold
letter quality text
line break
Locate
locate region
locating
locating method
menu
menu bar
multi-column text
monospaced font
monospaced font mapping
newspaper style columns
Normal locating method
Normal suspect threshold
numeric region
OCR
On-Screen Verifier™
Optical Character Recognition (OCR)
order of text regions
orientation
output file formats
page controls
page format
page image
page number box
page orientation
page size
page source
PCX
picture element
picture region
pixel
pixel-for-pixel
plain text
portrait orientation
printer font
Pro OCR Deferred format
Pro OCR format
Pro OCR process
Pro OCR window
Proof
proportionally spaced font
recognition accuracy
Recognize
recognized text
recognizing
region style
resolution
Rich Text Format (RTF)
RTF
sans serif
sans serif font mapping
scanner
scanner driver
scanning
screen font
scroll bars
serif
serif font mapping
settings file
sheetfed scanner
side-by-side columns
single-bit image
single-step processing
skewed text
spell checking
standard resolution
status bar
status display area
Stringent suspect threshold
stroke weight
Style ribbon
stylized font
subscript text
superscript text
supplementary dictionaries
suspect character
suspect threshold
Tag Image File Format
template
template matching
Template locating method
text quality
text region
text style
text view
throughput
TIFF
touching characters
typeface
type quality
type size
type style
underline text
User Defined page size
user dictionary
view selector
window
Windows
word wrap
zoom controls
Glossary
A4 Letter page size: An A4 size page measures 8.33" x 11.66".
accelerator key: In Windows applications, a keyboard shortcut to a menu
command.
ADF: See automatic document feeder (ADF).
alphanumeric word: A word made up of the alphabetic and numeric characters
(A–Z, a–z, 0–9) in a character set. Excludes punctuation and
other symbol characters.
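The definition of an alphanumeric word can be checked mechanically. The following Python sketch is only an illustration; the function name `is_alphanumeric_word` is invented for this example and is not part of Pro OCR.

```python
import re

# Matches one or more characters drawn only from A-Z, a-z, and 0-9.
ALPHANUMERIC = re.compile(r"[A-Za-z0-9]+")

def is_alphanumeric_word(word):
    """Return True when every character in the word is an ASCII letter or digit."""
    return ALPHANUMERIC.fullmatch(word) is not None

# Words containing punctuation or other symbols are not alphanumeric words.
for word in ["OCR100", "fax-modem", "2,000"]:
    print(word, is_alphanumeric_word(word))
```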
ASCII: Acronym for American Standard Code for Information
Interchange (pronounced “ASK-ee”). A standard that assigns
a unique binary number to each text and control character.
ASCII code is used for representing text inside a computer
and for transmitting text between computers or between a
computer and a peripheral device.
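As a small illustration of how ASCII assigns a number to each character, the Python sketch below lists the decimal code and the 7-bit binary form for each character in a string. The helper name `ascii_codes` is chosen for this example only.

```python
def ascii_codes(text):
    """Return (character, decimal code, 7-bit binary string) for each ASCII character."""
    rows = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            # Standard ASCII defines codes 0 through 127 only.
            raise ValueError(f"{ch!r} is not an ASCII character")
        rows.append((ch, code, format(code, "07b")))
    return rows

for ch, code, bits in ascii_codes("OCR"):
    print(ch, code, bits)
```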
As Single Column locating method: One of Pro OCR’s three locating methods. Use it when you
want Pro OCR to read a page as a single column, from left
margin to right margin, ignoring any column or paragraph
spacing. Most commonly used for pages in which there is no
clear column or paragraph structure.
Auto OCR: Clicking this button starts automatic processing, which uses
Get Page, Locate, and Recognize according to the current
gallery settings.
Auto brightness: A feature of some scanners, by which brightness is adjusted
automatically while the page is scanned.
automatic document feeder (ADF): Built-in or optional equipment for a scanner that lets you
automatically scan stacks of pages instead of having to place
them one at a time on the flatbed. Sometimes it’s difficult to
control the proper alignment of pages using an automatic
document feeder. Compare with flatbed scanner and sheetfed scanner.
automatic processing: A method for using Pro OCR with minimal intervention.
Automatic processing involves setting appropriate Gallery
settings before using Auto Start to read in one or more image
files or scan in one or more pages. Once page images have
been acquired, automatic processing Locates and Recognizes
each page image in succession. Automatic processing is best
suited to documents that require the same Gallery settings
(Page Size, Brightness, Locate method, etc.). Compare with
single-step processing.
background noise: Non-character or non-graphic information in a page image
that adversely affects optical recognition. Background noise
includes the shading that results from scanning colored paper
stock, extraneous marks, dirt or ink bleed. Problems with
background noise can be reduced by using the brightness
setting in Pro OCR to compensate for the type of noise on the
page.
backup (n.): A copy of a disk or of a file on a disk. It’s a good idea to
make backups of all your important disks and to use the
copies for everyday work, keeping the originals in a safe
place.
backwards compatible: The ability of an application to open files created with earlier
versions of that application.
bit image: A collection of bits in memory that represents a two-dimensional
surface. For example, the screen is a visible bit
image.
bitmap: 1. A set of bits that represents the graphic image of an
original document in memory.
2. A set of bits that represents the positions and states of a
corresponding set of items, such as pixels. Used by the
computer to construct graphic images and fonts. See also bit
image.
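To make the idea of a bitmap concrete, the Python sketch below stores a small single-bit image of a letter as rows of bits and draws it as text. The 5 x 7 pattern and the names `LETTER_T`, `to_bitmap`, and `render` are invented for this illustration; they do not describe how Pro OCR stores page images.

```python
# A hypothetical 5-pixel-wide, 7-pixel-tall single-bit image of the letter "T".
# Each string is one row; "1" is a black pixel and "0" is a white pixel.
LETTER_T = [
    "11111",
    "00100",
    "00100",
    "00100",
    "00100",
    "00100",
    "00100",
]

def to_bitmap(rows):
    """Convert rows of '0'/'1' characters into a list of lists of integer bits."""
    return [[int(bit) for bit in row] for row in rows]

def render(bitmap):
    """Draw the bitmap as text so the pixel pattern is visible."""
    return "\n".join("".join("#" if bit else "." for bit in row) for row in bitmap)

bitmap = to_bitmap(LETTER_T)
print(render(bitmap))
```

To the computer this is only a grid of on and off pixels; nothing in the data says that the shape is a “T”. Assigning that meaning is the job of character recognition.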
bitmapped character: A character image made up of a pattern of dots that exists in a
computer file or in memory as a bitmap. Bitmapped
characters cannot be interpreted by a computer. In order for a
computer to use bitmapped characters in a word processor or
spreadsheet, the characters must first be interpreted by an
OCR application and translated into ASCII text.
bold text: Text with the bold attribute looks like this. See also text
style.
brightness: The relative amount of light or darkness reflected from an
image. A scanner’s brightness control is used in Pro OCR to
adjust for pages that are either too light or too dark.
broken character: A character with one or more missing pieces, such as a
missing serif, stem, or cross bar. For example, a broken lower
case ‘e’ might not have a fully closed loop, which could
cause it to be misrecognized. Problems with broken
characters can be reduced by using the brightness setting in
Pro OCR to darken the image when scanning. Compare with
heavy character and touching characters.
built-in dictionary: The dictionary that Pro OCR automatically loads and uses
whenever Recognize is run. The built-in dictionary is used
to enhance Pro OCR’s recognition accuracy and also to find
misspelled words in the document. Compare with
supplementary dictionaries and user dictionary.
CCITT: Abbreviation for Consultative Committee on International
Telegraphy and Telephony; an international committee that
sets standards and makes recommendations for international
communication. One of the standards set by CCITT is for the
compression of image files. Pro OCR employs CCITT-standard
compression methods. See also compression and TIFF.
character: Any symbol that has a widely understood meaning and thus
can convey information, including alphabetic, numeric,
symbolic, and punctuation elements.
character format: Font and style information applied to characters. Character
format information includes the font name and type size, as well as
attributes such as underline, bold, italic, or some combination
of these properties. Compare with page format.
character identification error: An incorrectly recognized bitmapped character. There are
two kinds of character identification errors: substitutions
and rejects. A character substitution occurs when a character
is incorrectly recognized as another. A reject character results
from the inability of the OCR application to interpret a
character image with sufficient confidence. In such cases,
recognition is not attempted and the character is flagged as
illegible. Compare with layout analysis error.
character image: An arrangement of bits that defines a character in a font.
character recognition: The OCR process in which bitmapped character images are
interpreted and translated into ASCII computer codes.
character style: See type style.
clipboard: In Windows applications, temporary storage for text that is
cut or copied from a document. Text saved in the clipboard
may be pasted back into the same or another document.
column information: Part of Pro OCR’s page format information. Column
information includes the location of the column on the page,
the width of the column, and its left and right margins.
compression: Electronic method for reducing the size of a file without
losing any information in the file. Compressed TIFF files
take up significantly less disk space than uncompressed files.
See also TIFF and CCITT.
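The Python sketch below illustrates compression that loses no information, using plain run-length encoding on one row of single-bit pixels. This is a deliberately simplified stand-in, not the CCITT method that Pro OCR employs; the function names are invented for this example.

```python
def rle_encode(bits):
    """Collapse a string of '0'/'1' pixels into (pixel value, run length) pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            # Same value as the previous pixel: extend the current run.
            runs[-1] = (bit, runs[-1][1] + 1)
        else:
            # Value changed: start a new run of length 1.
            runs.append((bit, 1))
    return runs

def rle_decode(runs):
    """Rebuild the original pixel string from (pixel value, run length) pairs."""
    return "".join(bit * length for bit, length in runs)

# One row of a scanned page: white margin, a short black stroke, more white.
row = "0" * 8 + "1" * 4 + "0" * 10
encoded = rle_encode(row)
print(encoded)

# Decoding restores the row exactly, so no information is lost.
assert rle_decode(encoded) == row
```

Scanned text pages contain long runs of white pixels, which is why run-based schemes tend to shrink them substantially.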
confidence: In Pro OCR, a measure of the certainty of an unknown
character’s identity. Above a certain confidence level, a
character is automatically recognized. At lower confidence
levels, a character may either be recognized, but flagged as a
suspect character, or not recognized and flagged as an
illegible character.
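The three outcomes described above can be sketched as a simple threshold test. In the Python example below, the 0-to-1 confidence scale, both threshold values, and the function name `classify` are assumptions made for illustration; this guide does not state the scale or cutoffs that Pro OCR uses.

```python
# Both thresholds are invented for illustration; the guide does not state
# the values or the scale that Pro OCR uses internally.
RECOGNIZE_THRESHOLD = 0.90   # at or above this: recognized without a flag
SUSPECT_THRESHOLD = 0.50     # at or above this, but below the first: suspect

def classify(confidence):
    """Map a confidence score between 0 and 1 to one of the three outcomes."""
    if confidence >= RECOGNIZE_THRESHOLD:
        return "recognized"
    if confidence >= SUSPECT_THRESHOLD:
        return "suspect"
    return "illegible"

for score in (0.97, 0.72, 0.31):
    print(score, classify(score))
```

In this sketch, lowering `RECOGNIZE_THRESHOLD` flags fewer characters as suspect and raising it flags more, which is one plausible way to picture the Lenient and Stringent suspect threshold settings.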
consistent document: A set of pages or image files where the same Gallery settings
apply to each page in the document. Pro OCR’s Auto Start
feature can be used to best effect when a document is
consistent.
copyrighted document: Most published or printed materials and documents are
copyrighted. It is illegal to use a computer and Pro OCR to
copy, store, or reproduce, on paper or electronically, any
copyrighted documents without the permission of the
copyright holder.
deferred job: A file that contains one or more partially processed pages for
Pro OCR to finish processing later on. See also Pro OCR
Deferred format and deferred processing.
deferred processing: Provides the ability to individually specify Get Page, Locate,
and Recognize settings for particular pages when necessary,
while still being able to automatically process a job at a later
time.
degraded image: An image that contains broken characters, touching
characters and/or background noise. See broken character,
touching characters and background noise.
dialog box: In Windows applications, the standard pop-up box that is
displayed to communicate with the user when a command
requires some further action. Some dialog boxes are
informational.
desktop: Your working environment on the computer, consisting of the menu bar
and the background area on the screen. You can have a
number of documents or windows on the desktop open at the
same time.
document area: The main part of the application window in Pro OCR. The
document area shows one page of the current document at a
time using the selected View Size setting.
dots per inch (dpi): A measure of the visual resolution of a display or output
device. Monitor screens typically have resolutions in the
range of 70 to 75 dpi. Most common laser printers have a
resolution of 300 dpi. The lower the resolution of a page in
dots per inch, the lower the visual quality of characters on
that page. Pro OCR can quickly and accurately recognize
characters scanned in at resolutions down to 200 dpi.
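Resolution determines how many pixels a scanned page contains. The Python sketch below works through the arithmetic for a page scanned as a single-bit image; the 8.5" x 11" page size and the function names are assumptions chosen for this example.

```python
def page_pixels(width_in, height_in, dpi):
    """Return the (width, height) of a scanned page in pixels."""
    return round(width_in * dpi), round(height_in * dpi)

def single_bit_bytes(width_px, height_px):
    """Uncompressed size with one bit per pixel, each row padded to whole bytes."""
    bytes_per_row = (width_px + 7) // 8
    return bytes_per_row * height_px

# Compare the 200 dpi lower limit mentioned above with a 300 dpi scan.
for dpi in (200, 300):
    width_px, height_px = page_pixels(8.5, 11, dpi)
    size = single_bit_bytes(width_px, height_px)
    print(f"{dpi} dpi: {width_px} x {height_px} pixels, {size} bytes uncompressed")
```

Because both dimensions scale with dpi, raising the resolution from 200 to 300 dpi multiplies the pixel count by 2.25.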
dpi: See dots per inch (dpi).
draft quality text: On 9-pin dot matrix printers, the low resolution printing
option. Draft quality text is monospaced and made up of
visible dots that do not touch. In Pro OCR, click the Draft
Quality button in the Recognize section of the Gallery to
improve recognition on draft quality dot matrix text.
Compare with letter quality text.
driver: See scanner driver.
exporting: Saving a document in an external format, such as a word
processor, spreadsheet, text or standard image file. An
exported document is created for use outside of Pro OCR.
export format: Pro OCR can save and export documents in a variety of
specific word processor and spreadsheet formats. The
specific export format is specified in the Save As dialog box.
file extension: In the MS-DOS operating system, file names conventionally
consist of a base and a file extension, for example
SAMPLE.TXT. In this example, “SAMPLE” is the base, and
the file extension is “.TXT”. File extensions are used to
identify the type of file. In this example, the file extension
indicates that this is a text (ASCII) file.
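The split between base and file extension can be shown with Python's standard `os.path.splitext` function. The wrapper name `split_name` is chosen for this example only.

```python
import os.path

def split_name(file_name):
    """Split a file name into its base and its extension (including the dot)."""
    base, extension = os.path.splitext(file_name)
    return base, extension

base, extension = split_name("SAMPLE.TXT")
print(base)       # the base part of the name
print(extension)  # the extension, with its leading dot
```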
file formats: See input file formats and output file formats.
file type: Different applications create different file types. Some file
types are application-specific. Other file types are generic.
The file type indicates what kind of information is contained
in the file and what format the information is in. Most
applications can only open files of certain file types.
fine resolution: A term associated with fax modems, referring to the highest
resolution of the image files typically produced by these
devices. Fine resolution is approximately 200 x 200 dpi,
which is adequate for reliable recognition.
flatbed scanner: Scanner with a glass plate on which pages are placed face
down. Although such scanners can only read one page at a
time, they can support a variety of paper sizes and it’s easier
to control the proper alignment of a page. Compare with
automatic document feeder (ADF) and sheetfed scanner.
font: All characters (letters, numbers, and symbols) in one size and
style of a font family. 12 point Helvetica Bold Italic is a font.
“Font” is sometimes incorrectly used instead of “font family”
or “typeface.” See also font family and typeface.
font family: The complete set of variations of a particular typeface. For
example, Helvetica is a font family. It contains a variety of
typefaces including, for example, Helvetica, Helvetica Bold,
Helvetica Italic, Helvetica Bold Italic. See also font and typeface.
font mapping: Set in the Display Options dialog box. Tells Pro OCR which
fonts to use to display recognized text. Also specifies which
fonts to use in documents that are exported to Windows-based
word processors.
format retention: The ability to retain the layout of a page, including margins,
paragraph and column widths, and tabs and indents. Pro OCR
preserves as much page format information as export formats
support.
Gallery: The Pro OCR toolbar. All settings for the Get Page, Locate,
and Recognize stages of the Pro OCR process are set in the
Gallery. Common Pro OCR processes (Auto Start, and
single-step Get Page, Locate, and Recognize) can be
initiated from the Gallery.
Get Page: Single-step Gallery function. It is also the first stage of the
Pro OCR process. Scans one page from a scanner or reads
one file, using the current Get Page settings.
grayscale image: An image format where individual pixels can be expressed
with more than a single bit, allowing the image to contain
true shades of gray. Pro OCR will not open grayscale images.
Compare with single-bit image.
hard page breaks: Special formatting that you put in manually in a text or word
processor document. Most word processors and text editors
automatically create soft page breaks unless you explicitly
specify hard page breaks. In Pro OCR, you can force the
output application to preserve the page breaks of the input
document by clicking the “Insert Hard Page Breaks”
checkbox when you are in the Save As Options dialog box.
heavy character: In Pro OCR, a character that is printed too dark or thick, so
that the representation obscures detail and reduces confidence
in the identity of that character.
I-beam pointer: A mouse pointer shape that resembles an upper-case “I”.
When the pointer has this shape, you can select text. See also
insertion point.
icon: An image that graphically represents an object, a concept, or
a message. Screen icons can represent disks, documents,
application programs, or other things you can select and
open. In an application such as Pro OCR, icons are also used
to represent various settings in the Gallery, Style ribbon, and
status bar.
illegible character: A character that Pro OCR cannot recognize with adequate
certainty. Illegible characters in a document are highlighted
and displayed with the specified illegible character symbol in
the text view. See also suspect character.
illegible character symbol: The symbol Pro OCR uses to display illegible characters in
the text view. Set in the Display Preferences dialog box. See
also illegible character.
image view: The view that displays the bitmapped image of a page. Used
to locate regions of text or graphics, and for viewing the
original scanned image of a page during proofing and editing.
input file formats: Pro OCR can read documents saved by other applications in
TIFF, PCX and DCX formats, as well as those documents
saved in its own proprietary TIFF format. See also PCX and TIFF.
insertion point: The place in a text file where text is inserted or deleted.
Indicated by a blinking vertical bar.
italic text: Text with the italic attribute looks like this. See also text style.
justification: Alignment of text to the left, right, or both margins of a
column or page. Text may be left-justified, right-justified,
center-justified, or fully justified (both left- and right-justified).
Pro OCR preserves justification.
kerning: A measure of the spacing between characters. In tightly
kerned text, the letters are very close together, which can
cause letters to touch when the page is scanned. See also
touching characters.
landscape orientation: When you hold a page of text to read it, it is in landscape
orientation when the page is wider than it is tall. Compare
with portrait orientation.
layout: The relative position of elements on a page, such as margins,
columns, graphics, titles and sections.
layout analysis error: The result of an OCR product’s inability to correctly organize
recognized text into words, lines and paragraphs on the page.
There are two common kinds of layout analysis errors:
incorrectly interpreting the flow of text on a page and
incorrectly grouping or separating side-by-side paragraphs.
Layout analysis errors can be more troublesome than
character identification errors, particularly with documents
having complex layouts. Compare with character identification error.
Legal page size: See page size.
Lenient suspect threshold Tells Pro OCR to only highlight suspect characters it is very
uncertain of. Very few characters are marked as suspect,
compared to when the suspect threshold is set to normal or
stringent. Use it when you’re dealing with documents
containing fonts that you know from experience have been
recognized accurately or when you’re less concerned with
double-checking. Set in the Display Options dialog box.
Compare with
Normal suspect threshold and Stringent
suspect threshold.
letter quality text Text made up of characters that are fully formed with dots
that are touching. Compare with
draft quality text.
line break The point at the edge of a line of text where the text flows
onto the next line.
Locate Single-step Gallery function. It is also the second stage of the
Pro OCR process. Specifies which text will be recognized on
a page by creating or applying locate regions on the page
according to the current Locate and Pictures settings. The
current Locate setting may be either Normal, As Single
Column, or Template. The current Pictures setting may be
Locate Text and Pictures or Locate Text Only.
locate region Defines an area on the page image in the image view and the
text view. The text and picture kinds of locate regions may be
defined automatically or manually. All three types of locate
regions may be manually defined using the locate region
drawing feature, or may be recalled using the Template
locating method. See also
text region, numeric region,
picture region, and Template locating method.
locating The process in Pro OCR for specifying which locate regions
will be recognized on a page by creating or applying locate
regions on the page.
locating method Tells Pro OCR how to locate regions for processing on a
page. The three locating methods are Normal, As Single
Column, and Template. See also
Normal locating method,
As Single Column locating method, and Template locating
method.
menu A list of choices from which the user can choose. Menus
appear when you point to and click a menu title in the menu
bar, or a pop-up menu title in a window or dialog box.
menu bar The horizontal strip at the top of a window that contains
menu titles.
multi-column text Text that is formatted into more than one column on a single
page. Examples include phone books and newspapers.
monospaced font Also known as a fixed pitch font. A typeface, such as
Courier, in which each character takes up the same amount of
horizontal space. The output from most typewriters is
monospaced. Compare with proportionally spaced font.
monospaced font mapping The font chosen for displaying monospaced text characters in
text views. Set in the Display Options dialog box. Compare
with
sans serif font mapping and serif font mapping.
newspaper style columns Also known as “snaked” or winding columns. A column
format where the text flows down the vertical length of the
column before moving to the top of the next column. As the
name suggests, this type of column is commonly found in
newspaper and magazine articles. This glossary is formatted
in newspaper style columns. The flow of text in newspaper
style columns is best suited for the Normal locate setting in
Pro OCR.
Normal locating method One of Pro OCR’s three locating methods. Use it for most
kinds of input, including many tables and forms. Creates text
regions based on column or paragraph spacing. Compare with
As Single Column locating method and Template locating
method.
Normal suspect threshold Tells Pro OCR to highlight suspect characters that it is
somewhat uncertain of. More characters are marked as
suspect than when a lenient suspect threshold is used. Use it
with clean, clear, typeset documents when most of the words
in the document are probably in the dictionaries. Set in the
Display Options dialog box. Compare with
Lenient suspect
threshold and Stringent suspect threshold.
numeric region Defines a numeric area on the page image in Image View and
Text View. Numeric regions may be defined using Pro
OCR’s manual region drawing feature, or may be recalled
using the Template locating method. Compare with text
region and picture region. See also Template locating
method.
OCR See
Optical Character Recognition (OCR).
On-Screen Verifier™ Pops up in the document area to display a section of the page
image corresponding to the current text selection in the text
view. The on-screen verifier is displayed automatically when
proofing, and can also be shown or hidden by choosing the
Show/Hide On-Screen Verifier command from the Edit
menu.
Optical Character Recognition (OCR) The process by which a computer converts scanned text
images into editable text characters.
order of text regions Shown by an arrow from the center of a text region to the top
center of the next text region, in Image View after Locating
has been done. Text is output to application files in the order
in which text regions are specified.
orientation Determines the angle or rotation of the page. Pro OCR allows
you to choose between portrait or landscape orientation. See
also
portrait orientation and landscape orientation.
output file formats Pro OCR can save documents in a variety of formats,
including ASCII, a multitude of export formats, the Pro OCR
format, and Pro OCR Deferred format. See also
export
format, Pro OCR format, and Pro OCR Deferred format.
page controls Contains the previous and next page arrows and the page
number box. Click the previous page arrow or the next page
arrow to move from page to page in a document. See also
page number box.
page format The layout of the page, including its margins, paragraph and
column widths, and tabs and indents. Pro OCR preserves
nearly all page format information. What page format
information is preserved in saved application files depends on
the application format.
page image The bitmapped image of a scanned page, displayed in the
image view in Pro OCR.
page number box Shows which page is being viewed and how many pages are
in the document. Double-click it to go to a specific page. See
also page controls.
page orientation See
orientation.
page size The width and height to use when getting a page from a
scanner within Pro OCR. There are three pre-defined page
sizes: US Letter, US Legal, and A4 Letter. There is also an
option for user-defined page sizes.
page source Pro OCR can get pages from a file or the selected scanner.
You can draw pages from either source at any time.
PCX A common graphic file format on MS-DOS computers. Some
scanners produce PCX files. Pro OCR can read single PCX
files produced by many scanners, fax cards, and graphics
applications. A variation of the PCX format is DCX—a multipage PCX file. Pro OCR can also read DCX files.
picture element See
pixel.
picture region Defines a picture area on the page image. Picture regions may
be defined manually or by using the Locate button with
“Locate Text and Pictures” selected.
pixel A single unit (or dot) of screen, printer or image resolution.
The number of pixels (or dots) per inch determines the
resolution of an image. Most scanners and laser printers offer
resolutions of at least 300 pixels (or dots) per inch.
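To see how resolution translates into pixel counts, consider the following minimal Python sketch. The function name `page_pixels` and the 8.5 x 11 inch US Letter dimensions are assumptions chosen for illustration; they are not part of Pro OCR.

```python
def page_pixels(width_in, height_in, dpi):
    """Return (width, height) in pixels for a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)

# A US Letter page (8.5 x 11 inches) scanned at 300 dpi
print(page_pixels(8.5, 11, 300))
```

Doubling the resolution doubles both dimensions, so the image holds four times as many pixels.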
pixel-for-pixel A large magnified image view (approximately 400%) of the
page. Lets you inspect the quality of the image. Each screen
pixel corresponds to one image pixel.
plain text Text with no special attributes or styling, such as bold, italic,
or underline.
portrait orientation When you hold a page of text to read it, it is in portrait
orientation when the page is taller than it is wide. Compare
with
landscape orientation.
printer font The representation of a font or typeface used for printing by a
printer. See also
font, font family, and typeface. Compare
with
screen font.
Pro OCR Deferred format One of Pro OCR’s output file formats. Saves a document
with the current state of Get Page, Locate, and Recognize for
every page. When the document is processed using Process
Deferred Job, the saved information is retained and only
those processes and pages that have not already been
specified are completed using the current Gallery settings.
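The idea behind a deferred job can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not Pro OCR's file format or code; the name `remaining_stages` and the dictionary layout are hypothetical.

```python
# The three stages whose state a deferred document records for every page.
STAGES = ("Get Page", "Locate", "Recognize")

def remaining_stages(pages):
    """Map each page number to the stages that still need to run.

    `pages` maps a page number to the set of stages already completed.
    """
    return {
        number: [stage for stage in STAGES if stage not in done]
        for number, done in pages.items()
    }

job = {1: {"Get Page", "Locate", "Recognize"}, 2: {"Get Page"}, 3: set()}
print(remaining_stages(job))
```

In this example, page 1 is left alone, page 2 still needs Locate and Recognize, and page 3 needs all three stages.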
Pro OCR format Pro OCR’s native/internal file format. The Pro OCR format is
a proprietary variation of the Group 4 TIFF format.
Documents at various stages of processing may be saved in
this format and opened later for additional processing.
Pro OCR process The five-stage process that translates printed text or image
files into an output form suitable for use in other applications.
The five steps of the Pro OCR process are: Get Page, Locate,
Recognize, Proof/Edit and Save/Export.
Pro OCR window The main window for interacting with Pro OCR. Contains the
title bar, menu bar, gallery, scroll bars, Status bar, and
document area.
Proof The fourth stage in the Pro OCR process, where any suspect
and illegible characters or misspelled words can be examined
and corrected, if necessary. This command moves the
insertion point to the next piece of text in the text view,
according to the Proofing Options. The Proofing Options
configure Proof to view suspect or illegible characters,
misspelled words, punctuation, numbers, alphanumeric
words, or entire lines at a time. Use the Tab key as a
keyboard shortcut.
proportionally spaced font Also known as a variable pitch font. Typeface in which each
character takes up an amount of horizontal space consistent
with its relative physical width; for example, an “i” needs less space
than a “w.” Times Roman and Helvetica are two common
proportionally spaced typefaces. Compare with monospaced
font.
recognition accuracy A measure of the degree to which OCR output conforms to
the individual characters in the input document. Recognition
accuracy is a percentage expression of the number of correct
character identifications in relation to the total number of
characters in the page or document. This measure is often
used as the primary criterion in evaluating OCR performance,
even though it does not account for layout analysis errors.
Compare with
throughput.
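As a worked example of this percentage, here is a minimal Python sketch. The function name and the sample counts are illustrative assumptions, not figures measured with Pro OCR.

```python
def recognition_accuracy(correct_chars, total_chars):
    """Return recognition accuracy as a percentage of total characters."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return 100.0 * correct_chars / total_chars

# A page of 2,000 characters with 1,980 identified correctly
print(recognition_accuracy(1980, 2000))
```

Note that even a high percentage can leave a noticeable number of errors: in this example, 20 characters on the page still need correcting.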
Recognize Single-step Gallery function. It is also the third stage of the
Pro OCR process. The process in Pro OCR in which
bitmapped text images are converted into editable text.
Recognizes text defined by the text regions on the current
page according to the current Recognize setting.
recognized text The initial result of OCR processing. Once an image has been
recognized, the resultant text can be proofed/edited and
exported to other applications.
recognizing The process in Pro OCR in which character images are
converted into digital computer character codes (ASCII
equivalents).
region style The type of a locate region, either text, numeric or picture.
See also
locate region, text region, numeric region, and
picture region.
resolution Density of pixels in an output device such as a screen display
or printer, or in an input device such as a scanner. Usually
specified in dots per inch. See also
dots per inch (dpi).
Rich Text Format (RTF) An output file format for word processors that preserves most
page format and font information. One of Pro OCR’s export
file formats. Many Windows-based word processors can read
files in RTF, although they have varying levels of support for
RTF.
RTF See Rich Text Format (RTF).
sans serif Designation for font families in which the characters do not
have serifs, which are the small strokes at the ends of
characters. Common sans serif font families include
Helvetica, Avant Garde, and Univers. Compare with serif.
sans serif font mapping The font chosen for displaying sans serif text characters in
text views. Set in the Display Options dialog box. Compare
with
serif font mapping and monospaced font mapping.
scanner A peripheral device that can convert (or digitize) the image of
a page into digital form for use by a computer. A scanner is
similar to a photocopier, but instead of producing a hard copy
result on paper it sends its results electronically over a cable
hooked up to a computer.
scanner driver The system file that identifies a scanner to the system. It
typically contains the I/O address of the scanner and specific
information about the scanner’s characteristics.
scanning The act of using a scanner to convert (or digitize) the image
of a page into digital form for use by a computer.
screen font The representation of a font or typeface used for display on a
screen. See also
font, font mapping, and typeface. Compare
with
printer font.
scroll bars A Pro OCR window contains two scroll bars—the vertical
scroll bar and the horizontal scroll bar—that enable you to
move around on a page beyond the screen boundaries, when
necessary.
serif The small decorative stroke at the ends of characters in some
typefaces. Also, the designation for font families in which the
characters have serifs. Common serif font families include
Times Roman, Palatino, and Garamond. Compare with
sans
serif.
serif font mapping The font chosen for displaying serif text characters in text
views. Set in the Display Preferences dialog box. Compare
with
sans serif font mapping and monospaced font
mapping.
settings file A file, saved by choosing Save Settings from the File menu,
that saves the current gallery, processing preferences, display
preferences, proofing preferences, and selected scanner
information in a named settings file. To use a settings file,
retrieve it by choosing Retrieve Settings from the File menu.
sheetfed scanner Scanner with an integral sheetfeeder, but no flatbed, on which
pages are placed and fed through the scanner. Although they
can scan multiple pages at a time, sheetfed scanners often
support only a small range of paper sizes and it’s difficult to
control the proper alignment of a page. Compare with flatbed
scanner and automatic document feeder (ADF).
side-by-side columns Also known as “bound” columns. A column format where the
text flows as in a table, left to right, by column groups. Side-by-side columns are commonly found in tables and
documents where the text reads left-to-right, then top to
bottom. The flow of text in side-by-side columns is best
suited for the As Single Column locate setting in Pro OCR.
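The difference between side-by-side columns and newspaper style columns comes down to reading order. The following Python sketch shows the two orders on a small grid of regions. The region tuples and function names are hypothetical and only illustrate the concept; they do not describe how Pro OCR orders regions internally.

```python
# Each region is (column, row, text); the grid positions are illustrative.
regions = [(0, 0, "A1"), (1, 0, "B1"), (0, 1, "A2"), (1, 1, "B2")]

def newspaper_order(regions):
    """Read down each column before moving to the top of the next column."""
    return [text for col, row, text in sorted(regions, key=lambda r: (r[0], r[1]))]

def side_by_side_order(regions):
    """Read left to right across a row before moving down to the next row."""
    return [text for col, row, text in sorted(regions, key=lambda r: (r[1], r[0]))]

print(newspaper_order(regions))
print(side_by_side_order(regions))
```

Choosing the wrong order for a page is what produces the text-flow kind of layout analysis error.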
single-bit image Also referred to as line art. An image format where individual
pixels are expressed as a single bit—either black or white.
Compare with
grayscale image.
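To make the single-bit idea concrete, the following Python sketch reduces grayscale values to black or white. The cutoff of 128 and the function name are assumptions for illustration; scanners and applications choose their own cutoff, typically through a brightness setting.

```python
def to_single_bit(gray_rows, threshold=128):
    """Map 0-255 grayscale values to 1 (black) or 0 (white)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray_rows]

gray = [[0, 200, 90], [255, 127, 128]]
print(to_single_bit(gray))
```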
single-step processing A method for using Pro OCR with maximum control over
individual pages. Single-step processing involves selecting
Gallery settings for individual pages in a document, and
manually launching Get Page, Locate and Recognize. Single-step processing is best suited to documents that require
different Gallery settings (Page Size, Brightness, Locate
method, etc.) on different pages. Compare with
automatic
processing.
skewed text Text that is not horizontal in the page image. The most
common cause of skewed text is scanning a page in crooked.
Sometimes, text may be skewed on the input page. Pro OCR
can accurately recognize text skewed up to 2°. If text is
skewed more than that, Pro OCR may have difficulty in
properly locating text regions. Problems with skewed pages
(up to 15°) can be eliminated by selecting the Straighten
Skewed Images setting in the Processing Options dialog box.
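The 2° and 15° limits can be turned into a simple check. The following Python sketch estimates skew from two points on a line of text and suggests an action. The function names are hypothetical, and the advice to rescan beyond 15° is an assumption rather than a statement from this guide.

```python
import math

def skew_degrees(x1, y1, x2, y2):
    """Angle of the line through two baseline points, in degrees from horizontal."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

def skew_advice(angle):
    """Suggest an action based on the limits given in this glossary entry."""
    magnitude = abs(angle)
    if magnitude <= 2:
        return "recognize as is"
    if magnitude <= 15:
        return "select Straighten Skewed Images"
    return "rescan the page"  # assumption: not stated in the guide

# A baseline that rises 35 pixels over a run of 1000 pixels
angle = skew_degrees(0, 0, 1000, 35)
print(round(angle, 2), skew_advice(angle))
```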
spell checking Pro OCR automatically checks spelling during the Recognize
step using its built-in dictionary and the current user
dictionary. After Pro OCR finishes recognizing, you can
check the spelling in a document using the user-configured
Proof command.
standard resolution A term associated with FAX modems, referring to the default
resolution of the image files produced by these devices.
Standard resolution is approximately 200 x 100 dpi, which
may be insufficient for reliable recognition.
status bar The panel of controls located along the bottom edge of the
Pro OCR window. The status bar contains the view size
selector, page indicator, view selector, and status display
area.
status display area At the right end of the status bar. The status display area
shows the percentage of the current process that is completed.
After recognition this area shows the number of suspect and
illegible characters in the current page.
Stringent suspect threshold Tells Pro OCR to highlight all suspect characters. Use it
when accuracy is important and when there are many words
in the document that are not in the dictionaries. Set in the
Display Options dialog box. Compare with Lenient suspect
threshold and Normal suspect threshold.
stroke weight A measure of the average distance between the edges of the
lines in a character. Certain typefaces have heavier stroke
weights than others. A bold typeface has a heavier stroke
weight than a Roman typeface.
Style ribbon The panel located just beneath the Gallery inside of the Pro
OCR window. The Style ribbon makes it quicker and easier
to find and choose various style attributes for locate regions
and selected text. See also
region style and text style.
stylized font A font with exaggerated serifs and embellishments and/or
extraneous lines. Stylized fonts are a problem for the so-called omnifont (feature extraction) systems because these
fonts do not adhere to generic character format rules required
by omnifont technology. Zapf Chancery is a stylized font.
subscript text Text with the subscript attribute is below the baseline like this.
superscript text Text with the superscript attribute is above the baseline like this.
supplementary dictionaries Optional dictionaries that can be used during spell checking
in Pro OCR. There are four supplementary dictionaries
included with Pro OCR: geographical, legal, medical, and an
expanded dictionary. Compare with built-in dictionary and
user dictionary.
suspect character A character that Pro OCR recognized with less than total
confidence. Suspect characters in a document are highlighted
in the text view. Compare with
illegible character. See also
suspect threshold.
suspect threshold Pro OCR has three thresholds for highlighting suspect
characters: Stringent, Normal, and Lenient. Each suspect
character has a confidence value associated with it. Setting
the suspect threshold determines the minimum confidence
value used to highlight suspect characters. A lenient threshold
displays only the suspect characters with the lowest
confidence values, while a stringent threshold displays all
suspect characters.
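The relationship between the three thresholds can be sketched as a confidence cutoff. In the following Python example, the 0.0 to 1.0 confidence scale, the cutoff values, and the function name are all assumptions made for illustration; this guide does not state the values Pro OCR actually uses.

```python
# Hypothetical cutoffs on a 0.0-1.0 confidence scale (not Pro OCR's real values).
CUTOFFS = {"lenient": 0.50, "normal": 0.80, "stringent": 1.00}

def highlighted(chars, threshold):
    """Return the characters whose confidence falls below the cutoff."""
    cutoff = CUTOFFS[threshold]
    return [ch for ch, confidence in chars if confidence < cutoff]

page = [("m", 0.95), ("r", 0.70), ("n", 0.40), ("e", 1.00)]
for name in ("lenient", "normal", "stringent"):
    print(name, highlighted(page, name))
```

With these sample values, the lenient setting highlights only the least certain character, while the stringent setting highlights every character recognized with less than total confidence.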
Tag Image File Format See
TIFF.
template A previously saved file that defines and applies the locate
regions on the pages of a document.
template matching An older OCR technology where the application is trained by
the user to recognize certain fonts by providing whole-character samples to be referenced against an unknown
character until a suitable match is found. In practice, limited
to recognizing a few specific fonts (typeface and point size).
Template locating method One of Pro OCR’s three locating methods. Use it to specify
preset locate regions. Compare with
Normal locating
method and As Single Column locating method.
text quality See
type quality.
text region Defines a text area on the page image in the image view and
the text view. Only text within defined text regions is
recognized. Text regions may be defined manually or by
using Pro OCR’s automatic locating settings.
text style A piece of text’s attributes or styling, such as bold, italic, or
underline. Use the Style menu or the style ribbon to set these
attributes. See also bold text, italic text, underline text, and
Style ribbon.
text view The view that displays the recognized text from the page
image. You can proof and edit recognized text in the text
view.
throughput A measure of the total time required to reproduce printed
documents. This effort measurement accounts for scanning
time, recognition accuracy, error correction and format
retention. Throughput is a more illuminating measure of OCR
effectiveness than the simplistic recognition accuracy
criterion commonly used to evaluate OCR performance.
Compare with
recognition accuracy.
TIFF (Tag Image File Format) Standard graphic file format for
saving high-resolution bitmapped images. Pro OCR can read
most single-bit TIFF files produced by many scanners and
applications. Pro OCR also saves to its own proprietary TIFF
format. See also
Pro OCR format.
touching characters Character elements of an image where the spacing of the
characters is insufficient to easily determine proper character
boundaries. For example, in a document with touching
characters, it may be difficult to differentiate between the
letter pair “rn” and the character “m.” Problems with
touching characters can be reduced by using the brightness
setting in Pro OCR to lighten the image.
typeface One style within a font family. For example, Helvetica Bold
Italic is a typeface. See also
font and font family.
type quality A quality of printed matter. Pro OCR offers a choice between
Letter Quality or Draft Quality. See also
letter quality text
and
draft quality text.
type size The vertical height measurement of type, commonly
expressed in points (72 points=1 inch). Pro OCR recognizes
and preserves type ranging in size from 5 points to 64 points.
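Because 72 points equal 1 inch, you can estimate how many pixels tall a given type size is in a scanned image. The following Python sketch does the arithmetic; the function name and the 300 dpi resolution are assumptions for illustration.

```python
POINTS_PER_INCH = 72

def type_height_pixels(points, dpi):
    """Approximate height in pixels of type of the given point size."""
    return points * dpi / POINTS_PER_INCH

# The smallest and largest supported sizes, plus a common body size
for size in (5, 12, 64):
    print(size, "pt is about", round(type_height_pixels(size, 300), 1), "pixels at 300 dpi")
```

At lower resolutions, small type is represented by very few pixels, which is one reason low-resolution images may be harder to recognize reliably.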
type style The variations in characters, including font characteristics
such as bold and italic, and styling characteristics such as
underlining. Pro OCR recognizes and preserves many type
style characteristics.
underline text
Text with the underline attribute looks like this. See also text
style.
User Defined page size One of Pro OCR’s page size options when scanning. You
may set the page size from 1" x 1" up to the limits of your
scanner.
user dictionary A dictionary file that the user may add words to. It is used
along with the built-in dictionary to assist in recognition and
to mark possible misspelled words. Compare with
built-in
dictionary and supplementary dictionaries.
view selector The second set of controls from the left in the Status bar. Use
it to quickly change between the image view and the text
view. One of the two view icons is highlighted to indicate
which view you’re currently in.
window An area that displays information on a desktop; you view a
document through a window. You can open or close a
window, move it around on the desktop, and sometimes
change its size, scroll through it, and edit its contents.
Windows The application interface manufactured by Microsoft
Corporation that provides a graphical user interface (GUI)
based upon a desktop, windows, menus, and icons.
word wrap The automatic continuation of text from the end of one line to
the beginning of the next. Word wrap lets you avoid pressing
the Return key at the end of each line as you type. For
example, when you input text in most word processors, lines
of type are automatically “wrapped” to the next line when
they won’t fit within the current line margins. If you change
the margins, or the type size, or the spacing between words in
a document, lines are often re-wrapped. When you save
documents in any export format, text lines are wrapped in the
output file. When you save documents in ASCII format, you
can prevent lines from wrapping and preserve specific line
breaks by selecting the option to preserve line breaks in the
Save As Options dialog box.
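You can see re-wrapping in action with Python's standard `textwrap` module. This sketch only illustrates word wrap in general; it says nothing about how Pro OCR or any word processor implements it.

```python
import textwrap

text = "Word wrap lets you avoid pressing the Return key at the end of each line as you type."

# The same text wrapped to two different line widths
narrow = textwrap.wrap(text, width=30)
wide = textwrap.wrap(text, width=50)

print("\n".join(narrow))
print()
print("\n".join(wide))
```

Changing the width changes where the lines break, but the words themselves and their order stay the same.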
zoom controls The first set of controls at the left end of the status bar. Use
them to easily change between magnification (zoom) levels.
Contents
Chapter 2: Learning Pro OCR Basics
The Basic Steps
Starting Pro OCR
Selecting a TWAIN-Compliant Scanner
Learning About the Gallery Toolbar
Tutorial Examples
Example 1: Using Auto OCR to Scan a One-Page Simple Document and Save It in Pro OCR Format
Example 2: Opening a File and Saving It in a Word Processor Format
Example 3: Scanning a Document of Multi-Column Text
Example 4: Scanning a Document With Tables and Saving in a Spreadsheet Format
Example 5: Scanning and Saving a Document with Pictures
Example 6: Locating a Document Using a Template
Example 7: Scanning a Document with Mixed Tables and Manually Locating Regions
This chapter gets you started with Pro OCR. It introduces you to the Pro OCR
window features, tells you the basic steps that you use when you work with Pro OCR,
and provides several tutorial examples that you can complete to practice with Pro
OCR.
TIP: If you use PaperPort software or scanners, see the Working with PaperPort
document that came with Pro OCR. It provides tips and other information about using
Pro OCR with these Visioneer products.
The Basic Steps
When you use Pro OCR, you convert an image of text and save it in an editable format.
To complete this conversion you perform the following basic steps:
1. Get Page—acquire pages either from a scanner or by opening an image file.
2. Locate—indicate which text on the page you want to recognize, and which
pictures (if any) to retain.
3. Recognize—convert the image to text.
4. Proof—check for incorrectly identified and unidentifiable characters and make
changes to recognized text.
5. Save—save the text to a variety of application formats.
Often, you complete the first three steps automatically by clicking the Auto OCR
button; however, you can also perform each step individually. You can also combine
automatic and individual processing by using the deferred and finish
processing features.
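The five basic steps, and the portion that Auto OCR covers, can be modeled in a short Python sketch. The `Stage` enumeration and the function name are hypothetical and serve only to summarize the steps listed above.

```python
from enum import IntEnum

class Stage(IntEnum):
    """The five basic steps, numbered as in the list above."""
    GET_PAGE = 1
    LOCATE = 2
    RECOGNIZE = 3
    PROOF = 4
    SAVE = 5

# Auto OCR performs the first three steps in one action.
AUTO_OCR = [Stage.GET_PAGE, Stage.LOCATE, Stage.RECOGNIZE]

def steps_left_after_auto_ocr():
    """Return the steps you still perform yourself after Auto OCR finishes."""
    return [stage for stage in Stage if stage not in AUTO_OCR]

print([stage.name for stage in steps_left_after_auto_ocr()])
```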
Starting Pro OCR
The following procedure helps you to get acquainted with Pro OCR and make sure
that everything is set up correctly.
TIP: In addition to the following procedure, Visioneer provides two other ways to
start and use Pro OCR: 1) From the Windows Start menu, choose Programs, and then
choose Visioneer OCR Wizard. 2) If you use PaperPort software, start PaperPort and
then choose the Pro OCR link.
To start Pro OCR and select processing options:
1. From the Windows Start menu, choose Programs, and then choose
Visioneer. From the Visioneer menu, choose Visioneer Pro OCR 100.
The Pro OCR window appears. It includes pull-down menus, the Gallery
toolbar, the Style ribbon, and the Status bar.
Feature Does this...
Pull-down menus Contain commands and options that you use to set process
options and initiate actions. Many of the commands in the
pull-down menus are also available by using the Gallery
buttons and the Gallery button drop-down lists.
Gallery toolbar Lets you change common settings, start Auto OCR, or
individually perform any of the basic steps required to
convert an image to text. Several Gallery buttons have
drop-down lists from which you can select options.
Style ribbon Makes it easy to choose various style attributes for selected
regions and text. The Region Type options are available in
image view and the Text Style options are available in text
view.
Status bar Contains controls with which you choose how to view
pages (text or image view) and which pages to view. The
Status bar also contains a status display area to keep you
informed of Pro OCR’s progress.
Zoom controls Magnifies or reduces the view of the document.
View controls Displays the page in a landscape or portrait orientation.
Page controls Displays the previous or next page.
Suspects or Illegibles Displays the number of suspect or illegible characters in
the document.
Selecting a TWAIN-Compliant Scanner
Before you scan an item with Pro OCR, make sure the scanner software is installed,
and the scanner can scan images into your computer. Pro OCR works with many
TWAIN-compliant devices. You can select the TWAIN device in the Pro OCR
software.
NOTE: If you are using Pro OCR with Visioneer’s PaperPort software or scanners,
see the Working with PaperPort document that came with Pro OCR, instead of the
following procedure. If you are using a scanner that is not TWAIN-compliant, you
cannot scan directly to Pro OCR. Instead, use your scanner’s software to save the
scanned file in TIFF format, and then use the Pro OCR Get File command. For more
information, see “Getting Pages from an Image File” in Chapter 3.
To select a scanner:
1. Choose Select Scanner from the Tools menu.
The Select Source dialog box appears.
Figure 2-1: Select Source Dialog Box
NOTE: If the scanner driver you want is not shown, make sure that the
scanner is properly connected to the computer and that both the scanner and
the computer are plugged in, turned on, and operating correctly.
2. In the Select Source dialog box, select the TWAIN scanner driver you want to
use with Pro OCR.
3. Click Select.
The scanner you selected is available until you select a different one. You
don’t have to repeat this procedure unless you want to select a different
scanner.
Learning About the Gallery Toolbar
The Gallery toolbar contains buttons for starting the various steps of the Pro OCR
process, including the Auto OCR button. The buttons numbered one through four are
also important because you can select different options from drop-down lists before
processing a document. For example, you can tell Pro OCR whether the document is
one column or multiple columns. The options you select from these buttons affect the
way that Auto OCR processes a document.
NOTE: Often you will use Auto OCR to complete processing. However, sometimes it
is better to perform each step individually. (This is also referred to as manual or single-step
operation.) For example, you use the single-step procedures when you want to
manually define locate regions, create a template, redo a step, recognize with different type
quality settings, or scan pages that have mixed orientations (portrait and landscape).
Button       Does this...
Auto OCR     Performs Steps 1, 2, and 3 (Get, Locate, and Recognize) of the OCR
             process. Before you click this button, select processing options
             from the Get, Locate, and Recognize drop-down lists.
Get Page     Scans a page or opens an image file.
Locate       Locates areas of text, pictures, and numbers and determines how
             text flows on the page.
Recognize    Converts areas of the page into editable text.
Proof        Checks the document for errors.
Save As      Saves the converted document in a variety of formats, such as
             text, Rich Text Format (RTF), or HTML.
You can select options with the Gallery buttons by using the drop-down list next to
each button.
To select an option from a Gallery drop-down list:
1. Click the arrow next to the Gallery button you want.
The drop-down list for the button appears. The following figure shows the
Locate button with the drop-down list displayed.
2. Select the option you want.
A checkmark appears next to the option you selected.
Tutorial Examples
Now that you know the basic steps, you can practice them using the sample documents
that came with Pro OCR. The Pro OCR software comes configured and ready to use
so that you don’t have to change the various options. You can find copies of the pages
that you scan for the tutorials in the back of the Getting Started Guide. You can also
find sample files in the Pro OCR directory.
NOTE: If you don’t have a scanner, you can still complete the exercises that
require scanning. Use the Get File command instead and select the sample file from
the Pro OCR directory.
Example 1: Using Auto OCR to Scan a One-Page Simple Document
and Save It in Pro OCR Format
This example shows how to convert (recognize) the text in a one-page document. You
can find a ready-to-use sample in the back of the Getting Started Guide.
Selecting Gallery Options
Pro OCR processes a document using the options that are set in each drop-down list
associated with a button of the Gallery toolbar.
To set Gallery options for this example:
1. From the Get Page drop-down list, choose Use Scanner.
2. From the Locate drop-down list, choose Locate Text Only and Single Columns Only.
3. From the Recognize drop-down list, choose Degraded or Fax Quality.
Starting Auto OCR
By clicking the Auto OCR button, you can perform the first three steps of the OCR
process, that is, Get Page, Locate, and Recognize.
To process a simple document without any graphics:
1. Remove Sample A from the back of the Getting Started Guide. The document
is a simple business letter.
2. Place the document on the scanner.
3. Click the Auto OCR button.
When you click Auto OCR, your scanner software dialog box appears.
4. Use the scanner software as you usually do to scan a page.
5. After the scanner has scanned the page, Pro OCR displays a dialog box that
asks if you want to scan another page.
6. Click End.
Pro OCR continues with the second task to locate text regions on the page.
A progress bar moves down the page. When Pro OCR finishes locating, it
displays text boxes indicating located text regions, with arrows connecting
each text region to the next. Pro OCR outputs text in the order in which the
arrows connect the text regions.
In the next step, Pro OCR recognizes the located text. While Pro OCR is
recognizing, again a progress bar moves down the page.
When Pro OCR finishes recognizing the text, the Recognition Completed
dialog box appears.
7. Click OK.
The document appears in the text view. You use the text view to proof the
document and correct any errors.
Usually at this point you proof the document. For now, just save it.
Saving a Document
You can save the processed document to disk in different formats. For example, if you
want to open the document again in Pro OCR, you select the Pro OCR format.
To save the document:
1. Choose Save from the File menu, or click the Save As button on the Gallery
toolbar.
The Save As dialog box appears.
2. Choose Pro OCR from the Save As drop-down list.
By saving the document in this format, you can edit the pages later within Pro
OCR. If you save in another file format, you must open it in an application that
supports that format.
3. Type in a name for the file in the File Name box.
4. Click Save.
The text and format information of the document is saved in the format you’ve
selected.
5. Choose Close from the File menu.
You just completed your first job using Pro OCR. Many of the jobs for which
you use Pro OCR are as quick and simple as this one. You can now continue
by completing the rest of the examples in this guide.
Example 2: Opening a File and Saving It in a Word Processor Format
Instead of getting and processing a document from a scanner, you can also process a
file that was saved on disk. You can use this procedure to read TIFF, PCX, or DCX
files produced by Pro OCR or other applications.
Opening a File
For this example, use the file, SAMPLEB.TIF, in the Pro OCR directory. This is a
document that has a graphic. Because of the difference between this document and the
one used in the previous example, you will change the options in the Gallery toolbar.
Although this document has a graphic, let’s assume you don’t want to save the
graphic.
You can either set the options before each step or set them all at once. In this example,
you’ll set them as you go along.
To set the OCR options and get a file from disk:
1. Select Open File from the Get Page drop-down list.
2. Click the Get Page button in the Gallery toolbar.
The Get Page dialog box appears.
3. In the Pro OCR directory, select the file SAMPLEB.TIF.
4. Click Get.
The sample file is read in and the progress bar moves down the page.
Locating the Regions in a Document
For Pro OCR to properly convert areas of a document, you must locate the regions of
the page that will be recognized. There are three types of regions: text, numeric, and
picture. For example, a picture region is one that contains any kind of graphic,
illustration, photograph, drawing, or picture. The contents of a picture region cannot
be recognized, but can be saved as an image. By specifying the Locate options, Pro
OCR knows what types of regions are in the document.
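The three region types can be pictured with a small data model. The Python sketch below is an illustration only, under the assumption that a region is a rectangle tagged with a type; RegionType, Region, and can_be_recognized are hypothetical names and do not correspond to anything exposed by Pro OCR.

```python
from dataclasses import dataclass
from enum import Enum


class RegionType(Enum):
    TEXT = "text"
    NUMERIC = "numeric"
    PICTURE = "picture"


@dataclass
class Region:
    """A located rectangle on the page image (coordinates in pixels, assumed)."""
    kind: RegionType
    left: int
    top: int
    right: int
    bottom: int


def can_be_recognized(region: Region) -> bool:
    """Picture regions cannot be recognized; they can only be saved as images."""
    return region.kind is not RegionType.PICTURE
```

In this model, choosing Locate Text Only simply means that no picture regions are created during the Locate step.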
To specify the regions to locate:
1. Select Locate Text Only and Single Column from the Locate drop-down list.
If you did want to save the graphics in a document, you would select Locate
Text and Pictures. Sometimes, you want the graphics so that you can recreate
an exact duplicate of the document you are processing.
2. Click the Locate button in the Gallery toolbar.
Pro OCR goes through the document and recognizes the different regions.
Arrows appear on the document showing the flow of the information.
Recognizing the Document
The third step is to actually convert or recognize the text in a document. Pro OCR
reads the text and displays the actual characters.
Before recognizing the document, you should specify the quality of the image text.
You can do this by using the Recognize drop-down list.
To recognize the document:
1. Select Degraded or Fax Quality from the Recognize button drop-down list.
2. Click the Recognize button in the Gallery toolbar.
Pro OCR displays a bar that moves through the document as Pro OCR
recognizes the text. When the process finishes, you see the document with text
only.
Proofing the Document
After a document is recognized, it appears in the text view. In this view, you can proof
the document for errors and make changes to the document when you find problems.
When you proof, you can:
■ Inspect recognized text and edit it if necessary.
■ Search for misspelled words, numbers, punctuation, symbols, and
alphanumeric words.
■ Change font style information.
NOTE: You can change the proofing options by choosing Options from the Tools
menu.
To proof the document:
1. Click the Proof button in the Gallery toolbar, or press the Tab key.
Pro OCR starts at the current insertion point, if there is one. Otherwise, it starts
at the top of the current page.
Pro OCR highlights the first word it does not recognize and displays the
suspect text in the On-Screen Verifier.
The On-Screen Verifier is a pop-up window that displays the part of the page
image corresponding to selected text.
TIP: For a close-up of the text, click the image to increase the magnification.
2. If the text is wrong, select the text and type the correct text.
3. Click the Proof button in the Gallery toolbar again or press the Tab key.
Pro OCR displays the next suspect entry.
4. Repeat the previous steps until you have checked the entire document.
5. If you want to change the font style, select the text, and click the Style option.
Saving the Document
Saving the document places a permanent copy of it on disk.
To save the document:
1. Choose Save from the File menu, or click the Save As button in the Gallery
toolbar.
The Save As dialog box appears.
2. Type the file name in the File Name box.
3. Select MS Word for Windows from the Save as drop-down list.
You can save documents in many popular formats, including Rich Text Format
(RTF), plain text, and Microsoft Excel.
4. Click Save.
5. Choose Close from the File menu.
Example 3: Scanning a Document of Multi-Column Text
This example introduces you to processing multi-column text, such as newspapers,
magazine articles, and multi-column books (but not tables), where you want the text to
be recognized column by column.
To scan multi-column text and save in Pro OCR format:
1. Put Sample Document C in the scanner. You can find a copy of this document
in the back of the Getting Started Guide.
Make sure to place the document in the correct orientation and to align it.
2. Select Locate Text Only and Multiple Columns from the Locate drop-down
list in the Gallery toolbar.
Locate Text Only prevents Pro OCR from locating any picture element in the
document to be scanned.
3. Select Use Scanner from the Get Page drop-down list in the Gallery toolbar.
4. Click Auto OCR in the Gallery toolbar.
Your scanner software dialog box appears.
5. Use the scanner software as you usually do to scan the document.
After you scan the sample document, it appears in Pro OCR.
A dialog box appears asking for additional pages to scan. For this example,
you won’t scan any additional pages.
6. Click End.
Automatic processing continues with locating and then recognizing.
While Pro OCR recognizes the page, notice the boxes indicating located text
regions around each column, and the arrows connecting each text region to the
next. Note that by using Locate Text Only, the graphic element in the sample
was not located and so a box does not appear around it.
Pro OCR outputs text in the order in which the arrows connect the text regions.
For this example, notice how the boxes are drawn and connected.
When Pro OCR finishes recognizing, the Recognition Completed dialog box
appears.
7. Click OK.
The document appears in the text view.
To save the document:
1. Choose Save As from the File menu, or click the Save As button in the Gallery
toolbar.
The Save As dialog box appears.
2. Select Pro OCR from the Save As Type drop-down list.
The Pro OCR format saves all available information in the document.
3. Type in a name for the file in the File Name box.
4. Click Save.
Both the image of the scanned page and the recognized text are saved. Always
save files in the Pro OCR format when you want to reopen them in Pro OCR.
NOTE: To reopen a file saved in the Pro OCR format, use the Open
command from the File menu. If you use Get Page, Pro OCR only restores the
page image. The Open command restores all the saved information, including
any recognized text and proofing information.
5. Choose Close from the File menu.
For information about other file formats, see Chapter 6, “Saving and Printing
Documents.”
Example 4: Scanning a Document With Tables and Saving in a
Spreadsheet Format
This example introduces you to processing of multi-column text in tables, where you
want the text to be recognized as all one text block and not broken into columns. You
can use this procedure whenever you want to recognize tables and other documents
that you don’t want broken into columns.
To scan multi-column table text and save in spreadsheet format:
1. Select Single Columns Only and Locate Text Only from the Locate drop-down
list in the Gallery toolbar.
2. Put Sample Document D in the scanner.
Make sure to place it in the correct orientation and to align it.
3. Click Auto OCR.
Pro OCR displays your scanner software.
4. Use the scanner software as you usually do to scan the document.
After you scan the sample document, it appears in the Pro OCR window.
A dialog box appears, asking if you want to scan additional pages. For this
example, you won’t be scanning any additional pages.
5. Click End.
Pro OCR locates and then recognizes the page.
Notice that the text regions are not drawn separately around each column. By
using the Single Column locating method, you force Pro OCR to ignore
columns and tell it to read the page from left to right, top to bottom.
When Pro OCR is finished recognizing the page, the Recognition Completed
dialog box appears.
6. Click OK.
Pro OCR displays the document in the text view.
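The difference between the two locating methods used in Examples 3 and 4 can be sketched in a few lines of Python. This is a simplified model, not the way Pro OCR works internally: each word is assumed to be an (x, y, text) tuple with y growing downward, column_width is a hypothetical fixed column size, and the numbers in the usage note are made-up sample data.

```python
def single_column_order(words):
    """Read left to right, top to bottom, ignoring columns (suits tables)."""
    return [text for x, y, text in sorted(words, key=lambda w: (w[1], w[0]))]


def column_by_column_order(words, column_width):
    """Read each column top to bottom before moving to the next column."""
    ordered = sorted(words, key=lambda w: (w[0] // column_width, w[1], w[0]))
    return [text for x, y, text in ordered]
```

For a two-row table such as [(0, 10, "Mineral"), (200, 10, "Tons"), (0, 30, "Gold"), (200, 30, "12")], the first function keeps each row together (Mineral, Tons, Gold, 12), while the second with column_width=200 reads down the first column before the second (Mineral, Gold, Tons, 12). This is why tables are processed with the Single Column method.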
To save the document:
1. Choose Save As from the File menu, or click the Save As button in the Gallery
toolbar.
The Save As dialog box appears.
2. Choose Microsoft Excel from the Save as Type drop-down list.
Notice that the following options are already selected.
TIP: To change these options, click the Options button.
3. Type in a name for the file in the File Name box.
4. Click Save.
Pro OCR saves the text and format information of the document in the format
you have selected.
5. Choose Close from the File menu.
NOTE: If you don’t save a version of this file in the Pro OCR format, you cannot
open it again in Pro OCR. You can open the version that you just saved in any
spreadsheet application that supports the Microsoft Excel format.
Example 5: Scanning and Saving a Document with Pictures
This example shows you how to scan a document with photographs or line drawings
and save it in a word processor file format.
To scan and save a document with pictures:
1. Select Multiple Columns and Locate Text and Pictures from the Locate
drop-down list in the Gallery toolbar.
2. Put Sample Document C in the scanner. You can find this document in the
back of the Getting Started Guide.
3. Click Auto OCR.
Pro OCR displays your scanner software.
4. Use the scanner software as you usually do to scan a document.
After scanning the sample document, it appears in the Pro OCR window.
Pro OCR begins getting the page from the scanner. When the scanning is done,
a dialog box appears asking if you want to scan additional pages. For this
example, you won’t be scanning any additional pages.
5. Click End.
Automatic processing continues with the Locate and Recognize steps.
The Recognition Completed dialog box appears.
6. Click OK.
The document appears in the text view. Notice that the graphic image appears
and has a picture region drawn around it.
To save the document:
1. Choose Save As from the File menu, or click the Save As button in the Gallery
toolbar.
The Save As dialog box appears.
2. Choose Rich Text Format (RTF) from the Save as Type drop-down list.
RTF allows you to save the pictures along with the text in the exported file.
NOTE: As an alternative, you can save in a format for an application that you
have, such as Ami Pro, Word for Windows, or WordPerfect 5.x.
3. Select the Save Pictures option.
4. Choose Embed in Export File from the Save Pictures drop-down list.
This format embeds the pictures into the RTF file along with the text.
5. Type a name for the file.
6. Click Save.
The picture from the scanned page is now saved within the RTF file along with
the recognized text. If you open this file in a word processor that supports
pictures in RTF files, you see the recognized text and the pictures.
7. Choose Close from the File menu.
Example 6: Locating a Document Using a Template
At times, you don’t want to recognize all the text on a page. For example, in this
exercise the sample page has a header and a footer that you don’t want to recognize or
save. The sample template in this example is designed to create a text region around
just the body text during the Locate step. The title and copyright in the footer are not
recognized (saving time during recognition) and are not displayed (saving you the
time of searching for and deleting them).
In this example, you use a supplied template that you can use for your own documents
as well. You can also create your own templates, to customize Pro OCR for the kinds
of pages that you typically use.
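Conceptually, a template acts as a filter on the page: only content that falls inside the template's regions goes on to recognition. The Python sketch below illustrates that idea under simple assumptions. Rectangles are (left, top, right, bottom) tuples, words are (x, y, text) tuples, and the function names are hypothetical; the actual template file format is not described in this guide and is not modeled here.

```python
def inside(rect, x, y):
    """Return True if the point (x, y) lies within rect = (left, top, right, bottom)."""
    left, top, right, bottom = rect
    return left <= x <= right and top <= y <= bottom


def apply_template(template_rects, words):
    """Keep only the words that fall inside at least one template region."""
    return [text for x, y, text in words
            if any(inside(rect, x, y) for rect in template_rects)]
```

With a single region that covers only the body of the page, a header near the top and a footer near the bottom are dropped before recognition, which is the behavior this example demonstrates.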
To use a template:
1. Choose Template from the Locate drop-down list in the Gallery toolbar.
2. Choose Select Template from the File menu.
The Select Template dialog box appears.
3. In the Temp folder, find and select the file TEMPB.TPL.
4. Click Open.
Pro OCR displays the name of the template you selected next to Template in
the Locate drop-down list.
5. Select Open File from the Get Page drop-down list.
6. Click the Get Page button in the Gallery toolbar.
7. In the Pro OCR directory, select the file SAMPLEB.TIF and click the Get
button.
The sample file is read in.
8. Click the Locate button.
Notice that text boxes are drawn around just the body text on the page. This is
the text region defined by the template. Only the text within this text region is
recognized.
9. Click the Recognize button.
After recognition is completed, the document appears in the text view, where you
can review it. Notice that the title and the copyright in the footer were not
recognized. If you save this page in an application or text format, only the
displayed text is saved.
10. Save and close the document.
Use the same procedures described in the earlier examples.
Example 7: Scanning a Document with Mixed Tables and Manually
Locating Regions
This example shows you how to scan and manually locate a document with a table
that has some rows or columns suitable for numeric regions and other rows or
columns suitable for text regions.
To scan and locate a document with mixed tables:
1. Put Sample Document D in the scanner.
Make sure to place it in the correct orientation and to align it.
2. Select Single Column from the Locate drop-down list.
3. Click the Get Page button.
Pro OCR begins getting the page from the scanner and displays your scanner
software.
4. Use your scanner software as you usually do.
Pro OCR scans in the page and then displays it in the image view.
5. Choose Zoom Out from the View menu, or click the Zoom Out button on the
Status bar.
You can reduce or enlarge the document on the screen by using the Zoom In or
Zoom Out features.
To select regions manually:
1. Scroll the page up a short distance so that the table labeled “ZBOL Mining
Production, 1998” is fully visible on your screen.
2. Move the pointer just above and to the left of the first column header, titled
“Mineral.”
3. Press and hold the mouse button; then drag down and to the right until the box
following the pointer encloses all of the column headers.
4. Release the mouse button.
You have just manually located a text region.
5. Move the pointer just above and to the left of the item labeled “Gold.”
6. Press and hold the mouse button; then drag down and to the right until the box
following the pointer encloses the first column of the table.
The box should enclose the items from “Gold” through “Cobalt.”
TIP: If you make a mistake, select the region and press Del.
7. Release the mouse button.
You have just manually located another text region. Note that an arrow appears
that connects this text region to the first text region you defined for the table
headers.
8. Move the pointer just above and to the left of the first number column.
9. Using the same steps you used to create the text regions, drag the mouse until
the box following it encloses all three columns of numbers and release the
mouse button.
Make sure the entire image of the number columns is enclosed by the new
region you have defined.
10. Choose Numeric from the Style menu.
The locate region you just defined becomes a numeric region.
To make a table from the selected regions:
1. Choose Select All from the Edit menu.
Pro OCR selects all of the locate regions you defined.
2. Choose Make Table from the Edit menu.
Pro OCR creates a table from the selected locate regions.
3. Click the Recognize button in the Gallery toolbar.
Pro OCR recognizes the page image using the locate regions you defined in the
previous steps.
After Pro OCR is finished recognizing, the page appears in the text view.
You have completed this example. A message appears asking if you want to
save the document.
Chapter 3
Getting Documents
This chapter tells you how to get (acquire) documents with Pro OCR. It is assumed
that you completed the procedures in “Starting Pro OCR” and “Selecting a
TWAIN-Compliant Scanner” in Chapter 2.
In this chapter you learn:
■ The basic steps for getting a page
■ How to get a page using a scanner
■ How to get a page from a file
Getting a Page—The Basic Steps
There are two ways to get a page: use Auto OCR to get a page automatically, or
perform an individual Get Page. In each case you need to select the source (your
scanner or an image file) that you want to use to get the page. If you select a
scanner, you also need to select a few other options. The following procedure tells
you the basic steps to get a page. For more detailed information, see “Getting Pages
From a Scanner” and “Getting Pages from an Image File,” later in this chapter.
To get a page:
1. Select a source from the Get Page drop-down list, in the Gallery toolbar.
2. If you select Use Scanner, select scanner options as described in “Setting
Scanning Options,” later in this chapter.
3. Click Get Page or Auto OCR depending on which process you want to use.
4. If you select Use Scanner, scan the document using your scanner. If you select
Open File, open the file you want to use.
Getting Pages From a Scanner
You can use a scanner to get one page at a time by using the Get Page button, or use a
scanner with Auto OCR to get multiple pages automatically. This section tells you
how to:
■ Set scanning options
■ Get one page using Get Page
■ Get pages with Auto OCR
Setting Scanning Options
To change settings for your scanner, such as the resolution, brightness, and page
orientation, see the documentation that came with your scanner. You can set the
following processing options in Pro OCR by choosing Options from the Tools menu:
■ Straightening Skewed Images. Automatically straightens type that is skewed
(crooked) on a page. When text on a page is badly skewed, Pro OCR may
have trouble correctly locating paragraph boundaries. Recognition may also
be affected, resulting in many illegible characters.
NOTE: Processing with the Straighten Skewed Images option selected takes
longer than processing the same page with this option not selected. However,
recognition is usually much better on skewed type if the page image has been
straightened. You may want to experiment on skewed pages to see when to
use the Straighten Skewed Images option. Pro OCR is preset to not straighten
skewed images.
■ Splitting one A3 page. For scanners that can scan an 11 by 17 inch page, you
can scan bound material, and Pro OCR automatically splits the image into
two pages.
■ Auto Orientation. Automatically selects Portrait or Landscape orientation for
the page.
By default, Pro OCR does not select these settings for you.
To set Get Page Processing options:
1. Choose Options from the Tools menu.
The Options dialog box appears with the Processing tab selected.
2. Select the options that you want to use.
3. Click OK.
Selecting a Scanner as the Source
When you get pages from a scanner by using Auto OCR, a deferred job, or Get Page,
one or more page images are read in from the scanner. Pages are scanned according
to the current page size, orientation, brightness, and scanning settings selected in
your scanner’s software.
When you read in additional pages from a scanner, new page images are added to the
active document. You can read up to 999 pages into a document, as long as you have
enough available disk space.
To select a scanner as the source:
■Select Use Scanner from the Get Page drop-down list in the Gallery.
NOTE: If you did not previously select a scanner, the Select Scanner dialog box
appears, letting you select one now. (You can also select a scanner by choosing
Select Scanner from the Tools menu.)
Getting a Page Using a Scanner
During the single-step Get Page operation, you scan only one side of one page at a
time. You cannot automatically read stacks of pages or double-sided pages. Instead,
you must manually feed pages that you want to be read. The procedure is the same
whether you use an automatic document feeder or a flatbed scanner.
When you scan in a page using Get Page, the new page is added after the current
page. If you want to add pages to the end of a document, make sure the last page of
the document is displayed before you do Get Page. To insert a page after any other
page, make sure the appropriate page is displayed. Go to the page, if necessary, and
then use Get Page to insert the new page after it. You can also use single-step Get
Page to replace a current page.
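The insertion rule can be modeled with a short Python sketch. It assumes, for illustration only, that a document is an ordered list of page images with one current (displayed) page and that a newly scanned page becomes the current page; the Document class and its method names are hypothetical. The 999-page limit is the one stated in “Selecting a Scanner as the Source.”

```python
class Document:
    MAX_PAGES = 999  # limit stated earlier in this chapter

    def __init__(self):
        self.pages = []
        self.current = -1  # index of the displayed page; -1 while the document is empty

    def go_to(self, index):
        """Display an existing page so that the next Get Page inserts after it."""
        if not 0 <= index < len(self.pages):
            raise IndexError("no such page")
        self.current = index

    def get_page(self, image):
        """Insert the new page after the current page (assumed to become current)."""
        if len(self.pages) >= self.MAX_PAGES:
            raise ValueError("a document can hold at most 999 pages")
        self.current += 1
        self.pages.insert(self.current, image)
```

For example, after three calls to get_page the pages are in scan order. If you then go to the first page and get another page, the new page lands in second position, not at the end.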
To get one page from a scanner using Get Page:
1. Make sure you have set scan options as described in “Setting Scanning
Options,” earlier in this chapter, and select Use Scanner from the Get
Page drop-down list.
2. If you are adding pages to pages that you have already scanned, make sure the
page that the new page should follow is displayed in Pro OCR.
New pages are added after the current page.
3. Place one page on the flatbed or place one page in the ADF.
Make sure the page is oriented correctly for your scanner and the orientation
you have selected.
You can put in as many pages as the ADF will hold, but Pro OCR will only
scan one page at a time using Get Page.
4. Click Get Page.
The Get Page button is highlighted to indicate that Pro OCR is getting pages.
In the status display area, a meter bar indicates that Pro OCR is scanning the
page.
Pro OCR scans the page on the flatbed or the first page in the ADF, using the
current brightness, page size, orientation, and scanning resolution settings.
After the single page is read in, it appears using the previous magnification.
NOTE: To find the most appropriate brightness setting for a page, use Get Page to
scan the same page as many times as necessary. You can change the level of
brightness in your scanner’s software.
To scan additional pages:
1. Place another page on the flatbed.
2. Click Get Page.
Repeat steps 1 and 2 for each additional page you want to scan.
NOTE: After Get Page is completed, whether or not pages have been located or
recognized, you can save files in Pro OCR format or in any of the other image
output file formats. For more information about saving, see Chapter 6, “Saving and
Printing Documents.”
Using Auto OCR with Scanners
This section tells you how to use Auto OCR with a flatbed scanner or a scanner that
has an automatic document feeder (ADF).
NOTE: When scanning pages, make sure that pages are placed as straight as
possible. Pages skewed more than 2° may result in the incorrect sorting and grouping
of text lines unless the Straighten Skewed Images processing option is selected. Also
note that pages skewed more than 0.5° may jam in an ADF.
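The two skew limits in this note can be expressed as a simple check. The Python sketch below only restates the thresholds given above (2° and 0.5°); the function name, parameters, and warning texts are hypothetical, and measuring the skew angle itself is outside the scope of the sketch.

```python
SORTING_LIMIT_DEG = 2.0   # above this, text lines may be sorted or grouped incorrectly
ADF_JAM_LIMIT_DEG = 0.5   # above this, the page may jam in an ADF


def skew_warnings(skew_deg, straighten_enabled, using_adf):
    """Return the warnings that apply to a page skewed by skew_deg degrees."""
    angle = abs(skew_deg)  # direction of the skew does not matter
    warnings = []
    if angle > SORTING_LIMIT_DEG and not straighten_enabled:
        warnings.append("text lines may be sorted or grouped incorrectly")
    if using_adf and angle > ADF_JAM_LIMIT_DEG:
        warnings.append("page may jam in the ADF")
    return warnings
```

Note that selecting Straighten Skewed Images removes only the first warning. It cannot prevent a badly skewed page from jamming in the feeder.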
Using Auto OCR with a Flatbed Scanner
To use Auto OCR with a flatbed scanner, complete the following procedure.
NOTE: You cannot scan double-sided pages automatically when using a flatbed
scanner. You should place the pages on the scanner’s bed in the order in which you
want the text to be read.
To automatically process one or more pages with a flatbed scanner:
1. Make sure you have set scan options as described in “Setting Scanning
Options,” earlier in this chapter, and select Use Scanner from the Get
Page drop-down list.
2. Check the Locate and Recognize options to make sure they are set the way
you want them.
3. Place the first page on the flatbed.
Make sure the page is oriented correctly for your scanner and the page
orientation you have selected in the Gallery.
4. To scan more than one page, choose Options from the Tools menu, and then
select the Enable Auto OCR Dialogs processing option.
5. Click Auto OCR.
The scanner software appears.
6. Use the software as you usually do.
Pro OCR begins getting pages:
If the Enable Auto OCR Dialogs processing option is not selected, scanning is
completed. Pro OCR begins locating and then recognizing.
If the Enable Auto OCR Dialogs processing option is selected, Pro OCR asks
for additional pages to scan after it finishes reading in the current page:
7. If you want to get additional pages, place another page on the flatbed.
Pro OCR scans the additional page on the flatbed and displays the dialog box
again, asking for the next page. Repeat this step for as many additional pages
as you want to scan.
8. If you do not want to scan more pages, click End.
Scanning is completed. Pro OCR displays the page you’ve scanned in the
image view. Pro OCR then begins locating and then recognizing.
Using Auto OCR with a Scanner with an ADF
Complete the following procedure to use Auto OCR with scanners that have an ADF.
NOTE: To use an ADF scanner with Pro OCR, you need the Pro OCR ISIS upgrade.
For more information, visit Visioneer’s Web site at www.Visioneer.com.
To automatically process one or more pages with a scanner that has an ADF:
1. Make sure you have set scan options as described in “Setting Scanning
Options,” previously in this document, and select Use Scanner from the Get
Page drop-down list.
2. Place one or more pages in the ADF.
Make sure the pages are oriented correctly for your scanner and the page
orientation you have selected in the Gallery.
3. To scan more than one page, choose Options from the Tools menu, click the
Processing tab, and then select the Enable Auto OCR Dialogs processing
option.
4. Check the Locate and Recognize options to make sure they are set the way
you want them.
5. Click Auto OCR.
Pro OCR begins getting pages.
If the Enable Auto OCR Dialogs processing option is not selected, scanning is
completed. Pro OCR begins locating and then recognizing.
If the Enable Auto OCR Dialogs processing option is selected, Pro OCR asks
for additional pages to scan.
6. If you want to scan another stack of pages, place the next stack of pages in the
ADF.
Pro OCR scans the additional pages in the ADF and displays the dialog box
again, asking for additional pages to scan. If you need to scan the second side
of a stack of double-sided pages, see the next procedure, “To scan the second
side of double-sided pages.”
Repeat this step for as many additional stacks of pages as you want to scan.
7. If you’ve scanned all the pages you need for this job, click End.
Scanning is completed. Pro OCR displays the first page of the scanned stack
in the image view. Pro OCR then begins locating and recognizing.
To scan the second side of double-sided pages:
1. When you’re finished scanning the first side, turn the entire stack of pages
over and place it back in the ADF.
Make sure that you don’t change the order of pages, and that you replace them
in the proper orientation. If your double-sided document contains more pages
than your ADF can handle, you’ll need to separate the document into smaller
stacks. After scanning the first side of a smaller stack, scan the flip side of the
same stack before continuing with the next stack.
2. Click Flip in the dialog box that appears.
Pro OCR scans the second side of each page using the current brightness,
page size, orientation, and scanning resolution settings.
3. When you’re finished, click End.
Scanning is completed. Pro OCR finishes getting pages and displays the first
page of the scanned stack in the image view. The double-sided pages are
sequenced in the correct page order.
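The resequencing that Flip makes necessary can be pictured with a short Python
sketch. This is not Pro OCR’s actual implementation. It assumes that turning
the whole stack over causes the second sides to be scanned in reverse sheet
order, which depends on how your ADF feeds pages; the function name and the
page labels are hypothetical.

```python
def sequence_double_sided(front_scans, back_scans):
    """Interleave first-side and second-side scans into reading order.

    front_scans: first sides in the order they were scanned.
    back_scans:  second sides in the order they were scanned after the
                 stack was turned over (assumed to be reverse sheet order).
    """
    if len(front_scans) != len(back_scans):
        raise ValueError("each sheet needs one first side and one second side")

    # Undo the reversal so that backs line up with their fronts.
    backs_in_sheet_order = list(reversed(back_scans))

    ordered = []
    for front, back in zip(front_scans, backs_in_sheet_order):
        ordered.append(front)
        ordered.append(back)
    return ordered


if __name__ == "__main__":
    # Three sheets carrying pages 1-6: fronts are 1, 3, 5; the flipped
    # stack feeds the backs as 6, 4, 2.
    print(sequence_double_sided([1, 3, 5], [6, 4, 2]))
```

The sketch also shows why you must not change the order of the pages when you
turn the stack over: the pairing of first and second sides relies on it.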
Getting Pages from an Image File
Typically, Pro OCR obtains the image of a page by working directly with your
scanner. You can, however, also use Pro OCR with image files you scanned or
created using other applications. There are several common sources for obtaining
image files, other than scanning with Pro OCR:
■ Scanner applications not supported by Pro OCR
■ Fax-modem applications
■ High-resolution paint programs
Pro OCR can read the following image file formats:
■ TIFF (Uncompressed, PackBits, Group 3, Group 3 modified, Group 4)
■ PCX
■ DCX
Pro OCR can open black-and-white (one-bit) single-page or multiple-page image
files. Pro OCR does not open grayscale (greater than one-bit) or color image files.
Not all instances of the above files from every application are supported, however,
because specific implementations of these formats are not necessarily standard. If
you try to open a file of a type that Pro OCR doesn’t recognize, Pro OCR displays a
warning message.
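If you prepare image files with other tools, a quick check before opening them
can save a round trip. The Python sketch below is one possible pre-check and is
not part of Pro OCR. It tests only the file extension (the extension list is an
assumption) and, for PCX files, the header bytes that the PCX format uses for
bits per pixel and number of color planes. It does not inspect TIFF or DCX
contents.

```python
import os

# Assumed extensions for the formats listed above.
SUPPORTED_EXTENSIONS = {".tif", ".tiff", ".pcx", ".dcx"}


def has_supported_extension(filename):
    """True if the file name ends in one of the assumed extensions."""
    return os.path.splitext(filename)[1].lower() in SUPPORTED_EXTENSIONS


def is_one_bit_pcx(header):
    """True if the first 128 bytes of a PCX file describe a one-bit image.

    In a PCX header, byte 0 is the 0x0A signature, byte 3 holds the bits
    per pixel per plane, and byte 65 holds the number of color planes.
    """
    if len(header) < 128 or header[0] != 0x0A:
        return False
    bits_per_pixel = header[3]
    planes = header[65]
    return bits_per_pixel == 1 and planes == 1


if __name__ == "__main__":
    sample = bytearray(128)
    sample[0], sample[3], sample[65] = 0x0A, 1, 1
    print(has_supported_extension("SCAN001.PCX"), is_one_bit_pcx(bytes(sample)))
```

A file that passes this check may still be rejected, because, as noted above,
not every implementation of these formats is standard.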
Selecting a File as the Source and Getting Pages
The following procedure tells you how to select and open a file as the source for Get
Page.
To select and open files:
1. Select Open File from the Get Page drop-down list.
A checkmark appears next to it when selected.
2. Click the Get Page button in the Gallery toolbar.
The Get Page dialog box appears.
3. Select the file and click Get.
The file is read in and the progress bar moves down the page.
Getting Files From Other Scanner Applications
Pro OCR supports many of the most popular scanners directly. However, if you don’t
have a scanner that Pro OCR supports directly, you may still be able to use Pro OCR
with the scanner application you do have. Most scanner applications save to one of
the image file formats that Pro OCR supports.
To get pages from a non-supported scanner:
1. Scan a page using a scanner application that is compatible with your scanner.
2. Save the page in an image file format that Pro OCR supports.
3. Select Open File from the Get Page drop-down list.
4. Click Auto OCR.
5. Find and select the file(s) that you want to process.
6. Click Add and then click Get.
Pro OCR automatically processes the image file(s) according to the controls
in the Locate and Recognize rows of the Gallery.
OR
1. Click Get Page.
2. Find and select the file that you want to process.
3. Click Add and then click Get.
Pro OCR reads in the specified file. You can continue with any combination
of the single-step Locate and Recognize operations, followed by Finish
Processing, or save it in the Pro OCR Deferred format and finish processing it
later on using Process Deferred Jobs.
After the page is read in, Pro OCR treats the page as if it had scanned it.
Getting Fax-modem Files
Pro OCR can also open fax-modem files, if they have been saved in one of the
supported input file formats.
Many fax-modems have both a Standard and a High-Resolution (or Fine) setting.
The Standard setting typically transmits characters at 204 x 98 dpi. The
High-Resolution setting typically transmits at 204 x 196 dpi. Fax-modem files
transmitted at the Standard setting may not be recognized by Pro OCR as
accurately as those transmitted at High-Resolution.
To get a fax-modem file, use the same procedure as described in the previous section.
NOTE: It is recommended that you use the highest resolution a fax-modem can
produce for the best possible recognition.
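To see why the Standard setting is harder to recognize, it helps to work out
how many scan lines cover a character. The Python sketch below is illustrative
arithmetic only. It assumes 10-point type and the usual definition of a point
as 1/72 inch; neither value comes from Pro OCR itself.

```python
POINTS_PER_INCH = 72.0

# Vertical resolutions quoted above for the two fax settings.
STANDARD_VERTICAL_DPI = 98
HIGH_RESOLUTION_VERTICAL_DPI = 196


def pixels_for_points(point_size, dpi):
    """Number of pixels that span a given type size at a given resolution."""
    return point_size / POINTS_PER_INCH * dpi


if __name__ == "__main__":
    for label, dpi in (("Standard", STANDARD_VERTICAL_DPI),
                       ("High-Resolution", HIGH_RESOLUTION_VERTICAL_DPI)):
        height = pixels_for_points(10, dpi)
        print(f"{label}: 10-point type is about {height:.1f} pixels tall")
```

Under these assumptions, a line of 10-point type is roughly 14 pixels tall at
the Standard setting and roughly 27 pixels tall at High-Resolution, so each
character carries about twice as much vertical detail.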
Using Auto OCR With a File
You can specify one or more image files for the Get Page step, and then have Pro
OCR automatically locate and recognize them. If you’ve selected the Enable Auto
OCR Dialogs processing option, you can also select one or more additional files after
reading the initial files and before locating and recognizing begin. Pro OCR can
process most standard black and white TIFF, PCX, and DCX files.
To automatically process from a file:
1. Select Open File from the Get Page drop-down list in the Gallery toolbar.
2. Check the Locate and Recognize options to make sure they are set the way
you want them.
3. Click Auto OCR.
The Get Page dialog box appears.
4. Find and select the files you want.
To get just one file, click the file name.
To get multiple files, click the Advanced button. The dialog box expands.
Click the file that you want to get and then click the Add button. The file
names appear in the Selected Files list in the lower half of the dialog box. You
can add available files from as many directories and disks as necessary. Files
are displayed in the Selected Files list in the order in which you add them.
NOTE: To remove a file from the Selected Files list, select the file name and
click the Remove button. To remove all selected files, click Remove All.
5. Click Get.
Pro OCR reads in the selected file or files. As a page is read in, the Get Page
button is highlighted, and the progress bar moves down the page.
Each page is read in and displayed in the image view at 25% magnification
(zoom level).
If the Enable Auto OCR Dialogs processing option is not selected, when all
pages have been read Pro OCR finishes getting pages and displays the first
page in the image view. Pro OCR then locates and recognizes each page.
If the Enable Auto OCR Dialogs processing option is selected, when all pages
have been read the Get Page dialog box is again displayed.
6. To add pages from an additional file or files to the end of your current
document, repeat steps 4 and 5 as often as necessary.
Each time you read in another file, the new pages are read in and added to the
end of the current document. You can read up to 999 pages into a document,
as long as you have enough available disk space.
7. When you’re done reading files, click Finished.
When you click End, the file reading step completes, and locating and then
recognizing begins. For more information about locating, see Chapter 4,
“Locating Text and Graphics.” For more information about recognizing and
proofing, see Chapter 5, “Setting Recognize Options and Proofing a
Recognized Document.”
NOTE: When you use Auto OCR, the locate and recognize steps occur
automatically.
More About Enabling Auto OCR Dialogs
By default, after you’ve used Auto OCR to scan pages or to read in one or more files,
Pro OCR displays a dialog box that prompts you to continue in one of several ways:
■ Scan another page or stack of pages
■ Scan the second side of a page or stack
■ Open additional files
This lets you read in and process multiple files or stacks of pages as one document.
However, it also means that you have to click Finish to proceed with automatic
processing after the Get Page step is done. If instead you want the Auto OCR process
to continue without interruption, you can prevent the dialog box from reappearing by
deselecting the Enable Auto OCR Dialogs option.
NOTE: You can’t process more than a single stack of pages or set of files when
Enable Auto OCR Dialogs is deselected.
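The effect of the Enable Auto OCR Dialogs option can be summarized in a short
Python sketch. It models only the behavior described in this section and is not
Pro OCR code. The function name is hypothetical, and each “batch” stands for
one stack of pages or one set of files that you would supply at the prompt.

```python
def pages_in_document(batches, dialogs_enabled):
    """Return the pages that end up in the document after the Get Page step.

    batches: list of lists of page labels, in the order you would supply them.
    dialogs_enabled: state of the Enable Auto OCR Dialogs option.
    """
    document = []
    for batch in batches:
        document.extend(batch)
        if not dialogs_enabled:
            # No prompt appears, so Get Page ends after the first batch and
            # locating and recognizing start immediately.
            break
        # With dialogs enabled, the prompt reappears after each batch; the
        # loop ending stands in for ending the Get Page step at that prompt.
    return document


if __name__ == "__main__":
    stacks = [["p1", "p2"], ["p3"]]
    print(pages_in_document(stacks, dialogs_enabled=True))
    print(pages_in_document(stacks, dialogs_enabled=False))
```

With the option selected, both stacks become one three-page document. With it
deselected, only the first stack is processed.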
To enable/disable Auto OCR dialog boxes:
1. From the Tools menu, choose Options.
The Options dialog box appears with the Processing options.
2. To enable the dialogs, select Enable Auto OCR Dialogs. To disable the
dialogs, deselect Enable Auto OCR Dialogs.
Chapter 6
Saving and Printing Documents
This chapter describes the input file formats and output file formats that Pro OCR
supports and tells you how to save documents in a variety of these formats.
Saving Documents and Other Pro OCR Items
You can save the following documents and items:
■ Documents (in various file formats)
■ Templates (text, numeric, picture, and table region definitions and ordering
information)
■ Gallery settings and selected processing, display, and proofing options
Saving a Document
Documents are not saved automatically. You save a document using Save or Save
As from the File menu. If you close or exit Pro OCR without saving a document, a
message prompts you to save the current document.
After you get a document, you can save it at any or all of the various stages of the
Pro OCR process: after locating, recognizing, or proofing. If a document does
not contain recognized text, you can save it only in Pro OCR format, in Pro OCR
Deferred format, or in one of the standard image file formats.
NOTE: When you save to formats other than Pro OCR or Pro OCR Deferred, you
must still save the document in one of the Pro OCR formats to be able to use it
again in Pro OCR.
To save an open document:
1. Choose Save As from the File menu.
The Save As dialog box appears.
If the document has been saved previously, the name of the document is
displayed and selected in the File Name box. If the document has not already
been saved, the File Name box is selected and contains the default file name:
UNTITLED.XXX. Pro OCR adjusts the file extension represented here as
XXX according to the document format you select in the Save as Type
drop-down list.
2. Type a new file name, if necessary.
3. Choose a document format from the Save as Type drop-down list.
If this is a new file, the last used document format is displayed. If this is a
previously saved file, the previously saved document format is displayed.
You can choose from the following document formats:
■ Pro OCR document file formats
■ Standard image file formats
■ Standard text file formats
■ Word processor and spreadsheet file formats
For more information about the different file formats, see “Supported Output
File Formats” later in this chapter.
4. If you want to save any pictures in the document, select the Save Pictures
option and choose a picture format from the Picture Format drop-down list.
NOTE: Saving pictures in a document is different from saving the entire
page image. Save the page image using one of the image file formats
presented in the Save as Type drop-down list. For more information, see
“Saving Pictures” later in this chapter.
5. If you want to embed any pictures into the document when it is saved,
choose Embed in Export File from the Picture Format drop-down list.
The embedding option is only available for the following document formats:
■ MS Word for Windows
■ Rich Text Format (RTF)
■ WordPerfect 5.0 and 5.1
OR
If you want to save the page images only, choose one of the other picture
formats from the Save as Type drop-down list.
Choosing a picture format tells Pro OCR to save only the page images from
the active document.
NOTE: When saving the document in one of the standard TIFF formats,
you can choose whether to save all pages in one file, to split on blank pages,
or to save one page per file. When saving to the PCX format, you must save
one page per file.
6. To select the formatting information that will be exported to the format
currently chosen in the Save as Type drop-down list, click the Options
button to open the Save As Options dialog box.
Most formats have additional options. If there are no options available for
the format you’ve selected, the Options button is dimmed.
The Save As Options dialog box lets you specify:
■ Whether page breaks should be inserted between pages
■ Whether formatting should be preserved or completely discarded, or whether
only certain formatting should be preserved
■ Whether all pages in the document should be saved as a single file or as
separate files for each page
If you decide to only save certain formatting, you can select from the
following formatting to be saved:
■ Style
■ Font (typeface)
■ Point size
■ Justification
■ Number of columns
■ Line spacing
■ Paragraph indentation
■ Page size
■ Margin sizes
Choose one of the Split Document options to either keep all pages in one file
or split the document into multiple files:
■ All Pages in One File: Choose this option to save all the pages in the
document in one file.
■ Split on Blank Pages: Choose this option when you want Pro OCR
to save a stack of documents into separate files.
To use this option, before you scan the stack of pages, put a blank page after
the last page of each document you want Pro OCR to save as a separate file.
For more information about saving multiple documents, see “Saving
Multiple Documents as Separate Files” and “Saving Multiple Page
Images as Separate Image Files” later in this chapter.
NOTE: For Split on Blank Pages to work properly, make sure to use the
Recognize operation on every blank page.
Pro OCR saves each stack of pages up to a blank page as a separate file,
using the name you specified followed by a sequential three-digit numeric
identifier, followed by the appropriate extension. For example, if you name
the current document BOOK, and then save it to Excel 2.x format with “Split
on Blank Pages” selected, Pro OCR will save the first file (up to the first
blank page) as BOOK001.XLS, the next file as BOOK002.XLS, and so on.
■ One Page Per File: Many image editing programs can support only
one image page per file. If you save in PCX format, Pro OCR
automatically selects this option, because a PCX file can only have
one page.
When you use this option, Pro OCR automatically creates one file for each
page. Pro OCR saves each file using the name you specified followed by a
sequential three-digit numeric identifier, followed by the appropriate
extension. For example, if you name the current document IMAGE, and then
save it in a TIFF format with “One Page Per File” selected, Pro OCR saves
the first page image as IMAGE001.TIF, the next page image as
IMAGE002.TIF, and so on.
7. If you opened the Save As Options dialog box, click OK to close it.
The Save As dialog box reappears.
8. Click OK.
The document is saved according to the selected options.
If you try to save the document with a name that has already been used, a
dialog box asks if you want to replace the existing document. Click No to
return to working with the document. Click Yes to replace the document.
NOTE: When you want to open a document in an image editing program, save it in
one of the image file formats. Any locate regions that have been applied or created
are not saved. If the document has been recognized, the recognized text is not
saved.
Saving Multiple Documents as Separate Files
Often you’ll have many documents on which you want to do Get Page, Locate, and
Recognize at one time, but you want the recognized files saved as separate
documents. Pro OCR makes it easy for you to process a large stack of separate
documents as one and still keep them separate when you save them. You can do
this when you’re saving to a text format, the various image output formats, or to
any export format.
To save multiple multipage documents as separate files using the split option:
1. Before you put the pages in the scanner, separate the documents by putting a
blank piece of paper between each document and the next.
2. Process the pages as you would normally.
3. When you save the document, choose Save As from the File menu.
The Save As dialog box appears.
4. Click the Options button.
The Save As Options dialog box appears.
5. Select the Split on Blank Pages option and click OK.
Pro OCR saves each stack of pages up to a blank page as a separate file,
using the name you specified followed by a sequential numeric identifier,
followed by the appropriate extension. For example, if you name the current
document BOOK, and then save it to Excel 2.x format with the Split on Blank
Pages option selected, Pro OCR will save the first file (up to the first blank
page) as BOOK001.XLS, the next file as BOOK002.XLS, and so on.
6. Click OK.
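The grouping that Split on Blank Pages performs can be sketched in a few lines
of Python. This is a simplified model and not Pro OCR’s implementation. It
assumes that the blank separator pages themselves are left out of the saved
files and that two blank pages in a row do not produce an empty file; the guide
does not state either point. The function and parameter names are hypothetical.

```python
def split_on_blank_pages(pages, is_blank):
    """Group pages into documents, starting a new one after each blank page.

    pages:    recognized pages in scan order.
    is_blank: function that returns True for a blank separator page.
    """
    documents = []
    current = []
    for page in pages:
        if is_blank(page):
            # A blank page closes the document collected so far.
            if current:
                documents.append(current)
                current = []
        else:
            current.append(page)
    if current:
        # Pages after the last blank page form the final document.
        documents.append(current)
    return documents


if __name__ == "__main__":
    stack = ["memo p1", "memo p2", "", "invoice p1", "", "letter p1"]
    for group in split_on_blank_pages(stack, lambda text: text.strip() == ""):
        print(group)
```

In this model a page counts as blank only if its recognized text is empty,
which is one way to read the earlier note that Recognize must be used on every
blank page for the split to work.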
To save multiple single-page documents as separate files using the one page
option:
1. Process the pages as you usually would.
2. When you save the document, choose Save As from the File menu.
The Save As dialog box appears.
3. Click the Options button.
The Save As Options dialog box appears.
4. Select the One Page Per File option and click OK.
Pro OCR saves each page as a separate file, using the name you
specified followed by a sequential numeric identifier, followed by the
appropriate extension. For example, if you name the current document
PAGES, and then save it to RTF format with the “One Page Per File” option
selected, Pro OCR will save the first page as PAGES001.RTF, the
next page as PAGES002.RTF, and so on.
5. Click OK.
Saving Multiple Page Images as Separate Image Files
In addition to Pro OCR format, you can save a document in a number of image
output formats. Usually, you’ll save a copy of your document in one of these
graphic formats when the document you’re processing has illustrations that you
want to save and use in other applications. Because many image-processing
programs cannot process multipage image files, you’ll probably want to save
multipage documents one image per file.
To save multiple pages as separate image files:
1. Process the pages as you usually would.
2. When you save the document, choose Save As from the File menu.
The Save As dialog box appears.
3. Click the Options button.
The Save As Options dialog box appears.
4. Select the One Page Per File option and click OK.
When you name the file, choose a file name of up to five characters. If the
file name is longer, Pro OCR truncates it to five characters.
Pro OCR saves each page image as a separate file, using the name you
specified followed by a sequential numeric identifier, followed by the
appropriate extension. For example, if you name the current document
IMAGE, and then save it to PCX format with “One Page Per File” selected,
Pro OCR will save the first page image as IMAGE001.PCX, the next page
image as IMAGE002.PCX, and so on.
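The naming rule described above (a name of up to five characters, a sequential
three-digit identifier, and the extension) can be reproduced with a small
Python sketch, which is handy if you need to predict the file names from a
script. The function name and parameters are hypothetical; only the rule itself
comes from this guide.

```python
def split_file_names(base_name, extension, page_count, max_name_length=5):
    """Predict the file names used when a document is split into files.

    base_name:  the name typed in the Save As dialog box.
    extension:  extension for the chosen format, without the dot.
    page_count: number of files that will be written.
    """
    # Names longer than five characters are truncated to five.
    stem = base_name[:max_name_length]
    return [f"{stem}{number:03d}.{extension}"
            for number in range(1, page_count + 1)]


if __name__ == "__main__":
    print(split_file_names("IMAGE", "PCX", 2))
    print(split_file_names("REPORTS", "TIF", 1))
```

Five characters plus three digits gives an eight-character name, so a longer
name such as REPORTS is cut to REPOR before the identifier is added.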
Saving Templates
Save a template when you’ve defined locate regions that can be applied to other
page images. A template may be used to identify the locate regions on all pages to
be recognized. Or, you can use different templates to identify locate regions on
different pages.