Visioneer PROOCR100 User Manual

Pro OCR User’s Guide

Contents
Chapter 1: Introducing Visioneer Pro OCR 100
Chapter 2: Learning Pro OCR Basics
Chapter 3: Getting Documents
Chapter 4: Locating Text and Graphics
Chapter 5: Setting Recognize Options and Proofing a Recognized Document
Chapter 6: Saving and Printing Documents
Chapter 7: Creating and Processing Deferred and Batch Jobs
Chapter 8: Tips for Getting the Best Results
Glossary

Chapter 1 Introducing Visioneer Pro OCR 100
This chapter introduces you to the Pro OCR application and to the concept of optical character recognition (OCR).
Why Pro OCR
Pro OCR is an Optical Character Recognition (OCR) application. An OCR application converts images of text, such as those obtained from scanning a document or receiving a fax through your fax-modem, into editable text. For example, when a scanner scans a page of text, it sees black and white areas on the page. The scanner converts what it sees into an image and stores the image on the computer. To transform a scanned text image into something a word processing or spreadsheet application can recognize as characters, you need an OCR (optical character recognition) application, such as Pro OCR.
Every day you may spend a lot of time retyping printed text or numbers from hard copy documents. By using Pro OCR and a scanner as an input device, you can eliminate much of this retyping.
Features and Highlights of Pro OCR
Many existing OCR products can typically recognize 200–300 plain, nonstylized typefaces. Pro OCR can recognize over 2,000 typefaces.
Most basic OCR applications inspect the scanned page image, attempt to recognize the dots on the page as characters, and transform the image into a plain text file. Pro OCR does all of these basic tasks, but it can also get the entire page into your word processor or spreadsheet as is—retaining the shape, form, type, and spacing, as well as the content, of the input page. Pro OCR provides:
The ability to read one or more pages of text including graphics. Pro OCR reads pages directly from your scanner, or it reads TIFF, PCX, and DCX files. Pro OCR can automatically locate pictures and embed them in your document. You can also export pictures separately in a number of file formats.
Speed and accuracy of recognition. With most documents, Pro OCR is faster than, and as accurate as, a good typist.
Numeric regions. You can specify that a given region on a page can contain only numbers. Numeric regions help Pro OCR make sure that numbers are always recognized as numbers and never mistakenly identified as letters.
Recognition and retention of fonts, characters, styles, and page formatting. Pro OCR recognizes and retains the differences between serif and sans-serif fonts, styles such as bold, underline, and subscript, and formatting such as columns, tables, and indents.
Deferred and batch processing. You can perform procedures that need your attention or interaction (for example, locating), and then do the time-consuming steps that don’t need interaction (for example, recognizing) at another time.
Internet readiness. Pro OCR supports the HTML export format. You can convert an image file directly to an HTML page and upload it to a Web site.
Proofing options. Pro OCR has a number of proofing options. You can also send recognized text directly to your word processor.
Save features. With Pro OCR you can save recognized text in a wide variety of word processor and spreadsheet file formats.
Pro OCR works with imperfect input pages that may have skewed lines of text, touching or broken characters, and fuzzy characters.
© Copyright 1998 Visioneer, Inc. Reach us at www.visioneer.com.

Copyright Information
Pro OCR User’s Guide for Windows. Copyright ©1998 Visioneer, Inc. All rights reserved.
Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowed under the copyright laws.
AnyPort, AutoFix, AutoLaunch, FormTyper, MicroChrome, PaperEnable, PaperLaunch, PaperPort, PaperPort Deluxe, PaperPort ix, PaperPort Links, PaperPort mx, PaperPort PowerBar, PaperPort 3000, PaperPort 6000, PaperPort vx, PaperPortation, PaperPort Strobe, Pro OCR, ScanDirect, SimpleSearch, SharpPage, and Visioneer are trademarks of Visioneer, Inc. PaperPort, Paper-driven, and the Visioneer logo are registered trademarks of Visioneer, Inc.
Microsoft is a U.S. registered trademark of Microsoft Corporation. Windows is a trademark of Microsoft Corporation. TextBridge is a registered trademark of Xerox Corporation. ZyINDEX is a registered trademark of ZyLAB International, Inc. ZyINDEX toolkit portions, Copyright © 1990–1996, ZyLAB International, Inc. All Rights Reserved. All other products mentioned herein may be trademarks of their respective companies.
Information is subject to change without notice and does not represent a commitment on the part of Visioneer, Inc. The software described is furnished under a licensing agreement. The software may be used or copied only in accordance with the terms of such an agreement. It is against the law to copy the software on any medium except as specifically allowed in the licensing agreement. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or information storage and retrieval systems, or translated to another language, for any purpose other than the licensee’s personal use and as specifically allowed in the licensing agreement, without the express written permission of Visioneer, Inc.
Part Number: 05-0340-000
Restricted Rights Legend
Use, duplication, or disclosure is subject to restrictions as set forth in contract subdivision (c)(1)(ii) of the Rights in Technical Data and Computer Software Clause 52.227-FAR14. Material scanned by this product may be protected by governmental laws and other regulations, such as copyright laws. The customer is solely responsible for complying with all such laws and regulations.
Visioneer’s Limited Product Warranty
If you find physical defects in the materials or the workmanship used in making the product described in this document, Visioneer will repair, or at its option, replace, the product at no charge to you, provided you return it (postage prepaid, with proof of your purchase from the original reseller) during the 12-month period after the date of your original purchase of the product.
THIS IS VISIONEER’S ONLY WARRANTY AND YOUR EXCLUSIVE REMEDY CONCERNING THE PRODUCT. ALL OTHER REPRESENTATIONS, WARRANTIES OR CONDITIONS, EXPRESS OR IMPLIED, WRITTEN OR ORAL, INCLUDING ANY WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE EXPRESSLY EXCLUDED. AS A RESULT, EXCEPT AS SET OUT ABOVE, THE PRODUCT IS SOLD “AS IS” AND YOU ARE ASSUMING THE ENTIRE RISK AS TO THE PRODUCT’S SUITABILITY TO YOUR NEEDS, ITS QUALITY AND ITS PERFORMANCE.
IN NO EVENT WILL VISIONEER BE LIABLE FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES RESULTING FROM ANY DEFECT IN THE PRODUCT OR FROM ITS USE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
All exclusions and limitations in this warranty are made only to the extent permitted by applicable law and shall be of no effect to the extent in conflict with the express requirements of applicable law.
FCC Radio Frequency Interference Statement
This equipment has been tested and found to comply with the limits for a Class B digital device, pursuant to part 15 of the FCC Rules. These limits are designed to provide reasonable protection against interference in a residential installation. This equipment generates, uses, and can radiate radio frequency energy and, if not installed and used in accordance with the instructions, may cause harmful interference to radio communications. However, there is no guarantee that interference will not occur in a particular installation. If this equipment does cause harmful interference to radio or television reception, which can be determined by turning the equipment off and on, the user is encouraged to try to correct the interference by one or more of the following measures:
Reorient or relocate the receiving antenna.
Increase the separation between the equipment and receiver.
Connect the equipment into an outlet on a circuit different from that to which the receiver is connected.
Consult the dealer or an experienced radio/TV technician for help.
This equipment has been certified to comply with the limits for a Class B computing device, pursuant to FCC Rules. In order to maintain compliance with FCC regulations, shielded cables must be used with this equipment. Operation with non-approved equipment or unshielded cables is likely to result in interference to radio and TV reception. The user is cautioned that changes and modifications made to the equipment without the approval of the manufacturer could void the user's authority to operate this equipment.
This device complies with part 15 of the FCC Rules. Operation is subject to the following two conditions: (1) This device may not cause harmful interference, and (2) this device must accept any interference received, including interference that may cause undesired operation.

Glossary
A4 Letter page size
accelerator key
ADF
alphanumeric word
ASCII
As Single Column locating method
Auto OCR
Auto brightness
automatic document feeder (ADF)
automatic processing
background noise
backup
backwards compatible
bit image
bitmap
bitmapped character
bold text
brightness
broken character
built-in dictionary
CCITT
character
character format
character identification error
character image
character recognition
character style
clipboard
column information
compression
confidence
consistent document
copyrighted document
deferred job
deferred processing
degraded image
dialog box
desktop
document area
dots per inch (dpi)
dpi
draft quality text
driver
exporting
export format
file extension
file formats
file type
fine resolution
flatbed scanner
font
font family
font mapping
format retention
Gallery
Get Page
grayscale image
hard page breaks
heavy character
I-beam pointer
icon
illegible character
illegible character symbol
image view
input file formats
insertion point
italic text
justification
kerning
landscape orientation
layout
layout analysis error
Legal page size
Lenient suspect threshold
letter quality text
line break
Locate
locate region
locating
locating method
menu
menu bar
multi-column text
monospaced font
monospaced font mapping
newspaper style columns
Normal locating method
Normal suspect threshold
numeric region
OCR
On-Screen Verifier™
Optical Character Recognition (OCR)
order of text regions
orientation
output file formats
page controls
page format
page image
page number box
page orientation
page size
page source
PCX
picture element
picture region
pixel
pixel-for-pixel
plain text
portrait orientation
printer font
Pro OCR Deferred format
Pro OCR format
Pro OCR process
Pro OCR window
Proof
proportionally spaced font
recognition accuracy
Recognize
recognized text
recognizing
region style
resolution
Rich Text Format (RTF)
RTF
sans serif
sans serif font mapping
scanner
scanner driver
scanning
screen font
scroll bars
serif
serif font mapping
settings file
sheetfed scanner
side-by-side columns
single-bit image
single-step processing
skewed text
spell checking
standard resolution
status bar
status display area
Stringent suspect threshold
stroke weight
Style ribbon
stylized font
subscript text
superscript text
supplementary dictionaries
suspect character
suspect threshold
Tag Image File Format
template
template matching
Template locating method
text quality
text region
text style
text view
throughput
TIFF
touching characters
typeface
type quality
type size
type style
underline text
User Defined page size
user dictionary
view selector
window
Windows
word wrap
zoom controls

Glossary
A4 Letter page size An A4 size page measures 8.33" x 11.66".
accelerator key In Windows applications, a keyboard shortcut to a menu command.
ADF See automatic document feeder (ADF).
alphanumeric word A word made up of the alphabetic and numeric characters (A–Z, a–z, 0–9) in a character set. Excludes punctuation and other symbol characters.
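As a rough illustration of the alphanumeric word definition, the following Python sketch tests whether a word uses only the characters A–Z, a–z, and 0–9. The function name is_alphanumeric_word is hypothetical and is not part of Pro OCR; the manual does not describe how Pro OCR itself performs this check.

```python
import re

# Pattern for one or more characters from A-Z, a-z, 0-9 (assumed character set).
_ALPHANUMERIC = re.compile(r"[A-Za-z0-9]+")


def is_alphanumeric_word(word: str) -> bool:
    """Hypothetical helper: True when the word has no punctuation or symbols."""
    return _ALPHANUMERIC.fullmatch(word) is not None


print(is_alphanumeric_word("Invoice42"))  # True
print(is_alphanumeric_word("fax-modem"))  # False (contains a hyphen)
```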
ASCII Acronym for American Standard Code for Information Interchange (pronounced “ASK-ee”). A standard that assigns a unique binary number to each text and control character. ASCII code is used for representing text inside a computer and for transmitting text between computers or between a computer and a peripheral device.
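To make the idea of "a unique binary number for each character" concrete, this short Python sketch prints the ASCII code of each letter in a sample word, in decimal and as a 7-bit binary number. The sample text is arbitrary and the sketch is only an illustration; it is unrelated to how Pro OCR stores text internally.

```python
text = "OCR"

# ord() returns the character code; for these letters it is the ASCII code.
codes = [ord(ch) for ch in text]

# ASCII is a 7-bit code, so each number fits in seven binary digits.
bits = [format(code, "07b") for code in codes]

print(codes)  # [79, 67, 82]
print(bits)   # ['1001111', '1000011', '1010010']

# chr() reverses the mapping and restores the original text.
restored = "".join(chr(code) for code in codes)
print(restored)  # OCR
```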
As Single Column locating method One of Pro OCR’s three locating methods. Use it when you want Pro OCR to read a page as a single column, from left margin to right margin, ignoring any column or paragraph spacing. Most commonly used for pages in which there is no clear column or paragraph structure.
Auto OCR Clicking this button starts automatic processing, which uses Get Page, Locate, and Recognize according to the current Gallery settings.
Auto brightness A feature of some scanners, by which brightness is adjusted automatically while the page is scanned.
automatic document feeder (ADF) Built-in or optional equipment for a scanner that lets you automatically scan stacks of pages instead of having to place them one at a time on the flatbed. Sometimes it’s difficult to control the proper alignment of pages using an automatic document feeder. Compare with flatbed scanner and sheetfed scanner.
automatic processing A method for using Pro OCR with minimal intervention. Automatic processing involves setting appropriate Gallery settings before using Auto Start to read in one or more image files or scan in one or more pages. Once page images have been acquired, automatic processing Locates and Recognizes each page image in succession. Automatic processing is best suited to documents that require the same Gallery settings (Page Size, Brightness, Locate method, etc.). Compare with single-step processing.
background noise Non-character or non-graphic information in a page image that adversely affects optical recognition. Background noise includes the shading that results from scanning colored paper stock, extraneous marks, dirt or ink bleed. Problems with background noise can be reduced by using the brightness setting in Pro OCR to compensate for the type of noise on the page.
backup (n.) A copy of a disk or of a file on a disk. It’s a good idea to make backups of all your important disks and to use the copies for everyday work, keeping the originals in a safe place.
backwards compatible The ability of an application to open files created with earlier versions of that application.
bit image A collection of bits in memory that represents a two-dimensional surface. For example, the screen is a visible bit image.
bitmap 1. A set of bits that represents the graphic image of an original document in memory.
2. A set of bits that represents the positions and states of a corresponding set of items, such as pixels. Used by the computer to construct graphic images and fonts. See also bit image.
bitmapped character A character image made up of a pattern of dots that exists in a computer file or in memory as a bitmap. A computer cannot use bitmapped characters directly as text. Before a word processor or spreadsheet can use them, the characters must first be interpreted by an OCR application and translated into ASCII text.
bold text Text with the bold attribute (set in a heavier weight than the surrounding text). See also text style.
brightness The relative amount of light or darkness reflected from an image. A scanner’s brightness control is used in Pro OCR to adjust for pages that are either too light or too dark.
broken character A character with one or more missing pieces, such as a missing serif, stem, or cross bar. For example, a broken lowercase ‘e’ might not have a fully closed loop, which could cause it to be misrecognized. Problems with broken characters can be reduced by using the brightness setting in Pro OCR to darken the image when scanning. Compare with heavy character and touching characters.
built-in dictionary The dictionary that Pro OCR automatically loads and uses whenever Recognize runs. The built-in dictionary is used to enhance Pro OCR’s recognition accuracy and also to find misspelled words in the document. Compare with supplementary dictionaries and user dictionary.
CCITT Abbreviation for the International Telegraph and Telephone Consultative Committee, an international committee that sets standards and makes recommendations for international communication. One of the standards set by CCITT is for the compression of image files. Pro OCR employs CCITT-standard compression methods. See also compression and TIFF.
character Any symbol that has a widely understood meaning and thus can convey information, including alphabetic, numeric, symbolic, and punctuation elements.
character format Font and style information applied to characters. Character format information includes the font name and type size, as well as attributes such as underline, bold, italic, or some combination of these properties. Compare with page format.
character identification error An incorrectly recognized bitmapped character. There are two kinds of character identification errors: substitutions and rejects. A character substitution occurs when a character is incorrectly recognized as another. A reject character results from the inability of the OCR application to interpret a character image with sufficient confidence. In such cases, recognition is not attempted and the character is flagged as illegible. Compare with layout analysis error.
character image An arrangement of bits that defines a character in a font.
character recognition The OCR process in which bitmapped character images are interpreted and translated into ASCII computer codes.
character style See type style.
clipboard In Windows applications, temporary storage for text that is cut or copied from a document. Text saved in the clipboard may be pasted back into the same or another document.
column information Part of Pro OCR’s page format information. Column information includes the location of the column on the page, the width of the column, and its left and right margins.
compression Electronic method for reducing the size of a file without losing any information in the file. Compressed TIFF files take up significantly less disk space than uncompressed files. See also TIFF and CCITT.
confidence In Pro OCR, a measure of the certainty of an unknown character’s identity. Above a certain confidence level, a character is automatically recognized. At lower confidence levels, a character may either be recognized, but flagged as a suspect character, or not recognized and flagged as an illegible character.
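The three outcomes described in the confidence entry can be sketched as a simple threshold test. This is only an illustration: the 0-to-1 confidence scale, the two threshold values, and the function name classify_character are assumptions made for the example. The manual does not state how Pro OCR measures confidence or where its thresholds lie.

```python
def classify_character(confidence: float,
                       suspect_threshold: float = 0.90,
                       illegible_threshold: float = 0.50) -> str:
    """Hypothetical classifier; both thresholds are assumed example values."""
    if confidence >= suspect_threshold:
        return "recognized"   # accepted automatically
    if confidence >= illegible_threshold:
        return "suspect"      # recognized, but flagged for proofing
    return "illegible"        # not recognized, shown as the illegible symbol


for value in (0.97, 0.72, 0.20):
    print(value, classify_character(value))
# 0.97 recognized
# 0.72 suspect
# 0.2 illegible
```

Raising the assumed suspect_threshold in this sketch flags more characters as suspect, which is one way to picture the Stringent and Lenient suspect threshold settings listed in the glossary.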
consistent document A set of pages or image files where the same Gallery settings apply to each page in the document. Pro OCR’s Auto Start feature can be used to best effect when a document is consistent.
copyrighted document Most published or printed materials and documents are copyrighted. It is illegal to use a computer and Pro OCR to copy, store, or reproduce, on paper or electronically, any copyrighted documents without the permission of the copyright holder.
deferred job A file that contains one or more partially processed pages for Pro OCR to finish processing later on. See also Pro OCR Deferred format and deferred processing.
deferred processing Provides the ability to individually specify Get Page, Locate, and Recognize settings for particular pages when necessary, while still being able to automatically process a job at a later time.
degraded image An image that contains broken characters, touching characters and/or background noise. See broken character, touching characters and background noise.
dialog box In Windows applications, the standard pop-up box that is displayed to communicate with the user when a command requires some further action. Some dialog boxes are informational.
desktop Your working environment on the computer: the menu bar and the background area on the screen. You can have a number of documents or windows on the desktop open at the same time.
document area The main part of the application window in Pro OCR. The document area shows one page of the current document at a time using the selected View Size setting.
dots per inch (dpi) A measure of the visual resolution of a display or output device. Monitor screens typically have resolutions in the range of 70 to 75 dpi. Most common laser printers have a resolution of 300 dpi. The lower the resolution of a page in dots per inch, the lower the visual quality of characters on that page. Pro OCR can quickly and accurately recognize characters scanned in at resolutions down to 200 dpi.
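The relationship between dpi and image size is simple arithmetic: pixels equal inches multiplied by dots per inch. The Python sketch below works this out for an 8.5" x 11" page at the two resolutions mentioned in this entry. The page size and the function name scan_size_in_pixels are assumptions chosen for the example, not Pro OCR settings.

```python
def scan_size_in_pixels(width_in: float, height_in: float, dpi: int) -> tuple:
    """Return (width, height) in pixels for a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


# An 8.5" x 11" page (assumed) at 200 dpi and at 300 dpi.
for dpi in (200, 300):
    print(dpi, scan_size_in_pixels(8.5, 11, dpi))
# 200 (1700, 2200)
# 300 (2550, 3300)
```

Going from 200 to 300 dpi multiplies the pixel count by 2.25, which is why higher resolutions give cleaner character shapes at the cost of larger image files.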
dpi See dots per inch (dpi).
draft quality text On 9-pin dot matrix printers, the low resolution printing option. Draft quality text is monospaced and made up of visible dots that do not touch. In Pro OCR, click the Draft Quality button in the Recognize section of the Gallery to improve recognition on draft quality dot matrix text. Compare with letter quality text.
driver See scanner driver.
exporting Saving a document in an external format, such as a word processor, spreadsheet, text or standard image file. An exported document is created for use outside of Pro OCR.
export format Pro OCR can save and export documents in a variety of specific word processor and spreadsheet formats. The specific export format is specified in the Save As dialog box.
file extension In the MS-DOS operating system, file names conventionally consist of a base and a file extension, for example SAMPLE.TXT. In this example, “SAMPLE” is the base, and the file extension is “.TXT”. File extensions are used to identify the type of file. In this example, the file extension indicates that this is a text (ASCII) file.
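The split between base and extension described in the file extension entry can be reproduced with Python's standard os.path.splitext function, shown here on the same SAMPLE.TXT example. This is a small illustration only and has no connection to Pro OCR's own file handling.

```python
import os.path

# splitext() separates the base name from the final extension, dot included.
base, extension = os.path.splitext("SAMPLE.TXT")

print(base)       # SAMPLE
print(extension)  # .TXT
```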
file formats See input file formats and output file formats.
file type Different applications create different file types. Some file types are application-specific. Other file types are generic. The file type indicates what kind of information is contained in the file and what format the information is in. Most applications can only open files of certain file types.
fine resolution A term associated with fax modems, referring to the highest resolution of the image files typically produced by these devices. Fine resolution is approximately 200 x 200 dpi, which is adequate for reliable recognition.
flatbed scanner Scanner with a glass plate on which pages are placed face down. Although such scanners can only read one page at a time, they can support a variety of paper sizes and it’s easier to control the proper alignment of a page. Compare with automatic document feeder (ADF) and sheetfed scanner.
font All characters (letters, numbers, and symbols) in one size and style of a font family. 12 point Helvetica Bold Italic is a font. “Font” is sometimes incorrectly used instead of “font family” or “typeface.” See also font family and typeface.
font family The complete set of variations of a particular typeface. For example, Helvetica is a font family. It contains a variety of typefaces including, for example, Helvetica, Helvetica Bold, Helvetica Italic, Helvetica Bold Italic. See also font and typeface.
font mapping Set in the Display Options dialog box. Tells Pro OCR which fonts to use to display recognized text. Also specifies which fonts to use in documents that are exported to Windows-based word processors.
format retention The ability to retain the layout of a page, including margins, paragraph and column widths, and tabs and indents. Pro OCR preserves as much page format information as export formats support.
Gallery The Pro OCR toolbar. All settings for the Get Page, Locate, and Recognize stages of the Pro OCR process are set in the Gallery. Common Pro OCR processes (Auto Start, and single-step Get Page, Locate, and Recognize) can be initiated from the Gallery.
Get Page Single-step Gallery function. It is also the first stage of the Pro OCR process. Scans one page from a scanner or reads one file, using the current Get Page settings.
grayscale image An image format where individual pixels can be expressed with more than a single bit, allowing the image to contain true shades of gray. Pro OCR will not open grayscale images. Compare with single-bit image.
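To show the difference between a grayscale image and a single-bit image, the Python sketch below reduces a tiny grid of gray values to one bit per pixel with a fixed threshold. Everything here is an assumption for illustration: the 0 to 255 gray scale, the threshold of 128, the convention that 1 means black, and the function name to_single_bit. Pro OCR does not perform this conversion, so a grayscale file would have to be converted in another program first.

```python
def to_single_bit(gray_rows, threshold=128):
    """Hypothetical helper: map each gray value (0-255) to 1 (black) or 0 (white)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray_rows]


# A 2 x 3 grayscale sample: 0 is black, 255 is white (assumed convention).
gray = [
    [0, 200, 255],
    [90, 128, 30],
]

print(to_single_bit(gray))  # [[1, 0, 0], [1, 0, 1]]
```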
hard page breaks Special formatting that you put in manually in a text or word
processor document. Most word processors and text editors automatically create soft page breaks unless you explicitly specify hard page breaks. In Pro OCR, you can force the output application to preserve the page breaks of the input document by clicking the “Insert Hard Page Breaks” checkbox when you are in the Save As Options dialog box.
heavy character In Pro OCR, a character that is printed too dark or thick, so
that the representation obscures detail and reduces confidence in the identity of that character.
I-beam pointer A mouse pointer shape that resembles an upper-case “I”.
When the pointer has this shape, you can select text. See also
insertion point.
icon An image that graphically represents an object, a concept, or
a message. Screen icons can represent disks, documents, application programs, or other things you can select and open. In an application such as Pro OCR, icons are also used to represent various settings in the gallery, Style ribbon, and Status bar.
illegible character A character that Pro OCR cannot recognize with adequate
certainty. Illegible characters in a document are highlighted and displayed with the specified illegible character symbol in the text view. See also
suspect character.
illegible character symbol The symbol Pro OCR uses to display illegible characters in
the text view. Set in the Display Preferences dialog box. See also
illegible character.
image view The view that displays the bitmapped image of a page. Used
to locate regions of text or graphics, and for viewing the original scanned image of a page during proofing and editing.
input file formats Pro OCR can read documents saved by other applications in
TIFF, PCX and DCX formats, as well as those documents saved in its own proprietary TIFF format. See also
PCX and
TIFF.
insertion point The place in a text file where text is inserted or deleted.
Indicated by a blinking vertical bar.
italic text Text with the italic attribute looks like this. See also
text
style.
justification Alignment of text to the left, right, or both margins of a
column or page. Text may be left-justified, right-justified, center-justified, or fully justified (both left- and right-justified). Pro OCR preserves justification.
kerning A measure of the spacing between characters. In tightly
kerned text, the letters are very close together, which can cause letters to touch when the page is scanned. See also
touching characters.
landscape orientation A page is in landscape orientation when, held for reading, it
is wider than it is tall. Compare with
portrait orientation.
layout The relative position of elements on a page, such as margins,
columns, graphics, titles and sections.
layout analysis error The result of an OCR product’s inability to correctly organize
recognized text into words, lines and paragraphs on the page. There are two common kinds of layout analysis errors: incorrectly interpreting the flow of text on a page, and incorrectly grouping or separating side-by-side paragraphs. Layout analysis errors can be more troublesome than character identification errors, particularly with documents having complex layouts. Compare with
character
identification error.
Legal page size See
page size.
Lenient suspect threshold Tells Pro OCR to highlight only the suspect characters it is very
uncertain of. Fewer characters are marked as suspect than when the suspect threshold is set to Normal or Stringent. Use it for documents containing fonts that you know from experience are recognized accurately, or when you are less concerned with double-checking. Set in the Display Options dialog box. Compare with
Normal suspect threshold and Stringent
suspect threshold.
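The effect of the three suspect thresholds can be sketched as follows. This is an illustration only, not Pro OCR's actual method: the confidence scale (0.0 to 1.0), the cutoff values, and the function name are all hypothetical. The sketch shows only that Lenient marks the fewest characters and Stringent the most.

```python
# Hypothetical cutoffs for illustration. Pro OCR's real confidence scale
# and threshold values are not given in this guide.
SUSPECT_CUTOFF = {"lenient": 0.50, "normal": 0.75, "stringent": 0.90}


def suspect_characters(chars, threshold="normal"):
    """Return the characters whose confidence falls below the chosen cutoff."""
    cutoff = SUSPECT_CUTOFF[threshold]
    return [ch for ch, confidence in chars if confidence < cutoff]


# Each recognized character is paired with an assumed confidence value.
page = [("H", 0.99), ("e", 0.85), ("l", 0.70), ("1", 0.40)]

for name in ("lenient", "normal", "stringent"):
    print(name, suspect_characters(page, name))
```

With these assumed values, the lenient setting marks one character, normal marks two, and stringent marks three.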
letter quality text Text made up of characters that are fully formed with dots
that are touching. Compare with
draft quality text.
line break The point at the edge of a line of text where the text flows
onto the next line.
Locate Single-step Gallery function. It is also the second stage of the
Pro OCR process. Specifies which text will be recognized on a page by creating or applying locate regions on the page according to the current Locate and Pictures settings. The current Locate setting may be either Normal, As Single Column, or Template. The current Pictures setting may be Locate Text and Pictures or Locate Text Only.
locate region Defines an area on the page image in the image view and the
text view. There are three types of locate regions: text, numeric, and picture. Text and picture regions may be defined automatically or manually. All three types may be manually defined using the locate region drawing feature, or may be recalled using the Template locating method. See also
text region, numeric region,
picture region, and Template locating method.
locating The process in Pro OCR of specifying which areas of a page
will be recognized, by creating or applying locate regions on the page.
locating method Tells Pro OCR how to locate regions for processing on a
page. The three locating methods are Normal, As Single Column, and Template. See also
Normal locating method, As Single Column locating method, and Template locating method.