Xerox TEXTBRIDGE PRO 98 User Manual

TextBridge Pro 98 User’s Guide
COPYRIGHT INFORMATION
Copyright © 1997 by Scansoft, Inc., a Xerox Company. All rights reserved. No part of this publication may be transmitted, transcribed, reproduced, stored in any retrieval system or translated into any language or computer language in any form or by any means, mechanical, electronic, magnetic, optical, chemical, manual, or otherwise, without the prior written consent of Scansoft, Inc., 9 Centennial Drive, Peabody, Massachusetts 01960. Printed in the United States of America.
The software described in this book is furnished under license and may be used or copied only in accordance with the terms of such license.
IMPORTANT NOTICE
Scansoft, Inc. provides this publication “as is” without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability or fitness for a particular purpose. Some states or jurisdictions do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you. Scansoft reserves the right to revise this publication and to make changes from time to time in the content hereof without obligation of Scansoft to notify any person of such revision or changes.
TRADEMARKS AND CREDITS
TextBridge is a registered trademark, and Smart Zones, Instant Access OCR, and Custom Proof are trademarks, of Scansoft, Inc., a Xerox Company. Xerox, The Document Company, and the Stylized X are trademarks of Xerox Corp.
Excel, Word, and Windows are trademarks of Microsoft Corp.
WordPerfect is a registered trademark of WordPerfect Corp.
Other terms used in this manual are the trademarks of their respective holders.
Portions of this product copyright © 1990–1997, Pixel Translations, Inc.
Portions of this product copyright © 1994–1997, Mastersoft Corp.
Designed, written, and illustrated by Lois West and Jim Cahill
© SCANSOFT, INC.
9 Centennial Drive Peabody, Massachusetts 01960
TextBridge Pro 98 User’s Guide
Part Number 00–09066–00 August 1997
CONTENTS
PREFACE
About This User’s Guide
Organization of this user’s guide
Documentation conventions
Related Documentation
Technical Support
1 INTRODUCTION TO TEXTBRIDGE
Features and Benefits
New Productivity features in TextBridge Pro 98
Other TextBridge features
Characteristics of Documents TextBridge can recognize
What Comes with TextBridge
Scanners Supported
System Requirements
Installing TextBridge
Setting Up TextBridge Instant Access
Uninstalling TextBridge
Input Image File Formats Supported
Output Text File Formats Supported
Where to Go From Here
2 OCR AND TEXTBRIDGE
What is TextBridge OCR?
Page types
Page sources
Recomposition
Running TextBridge
Standalone Application
Instant Access
TextBridge Functionality
Before You Start to OCR
Using TextBridge to OCR
Automatic Processing
Manual Processing
Selecting Page Type and Source
Previewing the Page
Zoning the Page
Proofreading the Document
Saving the Document
Improving Page Recognition with Settings
Page Type Settings
Scanner Settings
Processing Settings
Text Document Settings
Saving a Document in a PDF File
Improving OCR with Training
Where to Go From Here
3 LEARNING TO USE TEXTBRIDGE
Starting TextBridge
Using the Help System
Using the Sample Documents
Session 1: Processing a Simple Document Using Auto Processing
Session 2: Using Instant Access OCR
Session 3: Processing a Complex Document Using Manual Processing
Session 4: Processing Text, Pictures, and a Table
Session 5: Training OCR and Using the Page toolbar
Where to Go From Here
INDEX
PREFACE
ScanSoft, Inc., a Xerox Company, welcomes you to TextBridge® Pro 98 for Windows 95™ and Windows NT. (Hereinafter TextBridge Pro 98 will be referred to as “TextBridge.”)
Before going on to find out more about TextBridge, please read this preface because it describes these important items:
About this user’s guide
Related documentation
Technical support
ABOUT THIS USER’S GUIDE
This user’s guide includes introductory information designed primarily for non-technical users as well as information designed for more technical users. It assumes that you are familiar with the management and operation of your computer and Windows.
The documentation that comes with TextBridge should provide all the information you need to operate TextBridge. TextBridge documentation includes this user’s guide, a Help system, and Release Notes. ScanSoft invites your comments about the information provided in the documentation. Please make sure to register your software and provide any comments to ScanSoft.
Organization of this user’s guide
This user’s guide is designed as a reference tool to provide basic information about TextBridge. It is organized as follows:
Chapter 1, “Introduction to TextBridge,” discusses TextBridge’s features. It also describes: documents TextBridge can recognize, what comes with TextBridge, supported scanners, system requirements, installation, setting up Instant Access, uninstalling TextBridge, and input and output file formats.
Chapter 2, “OCR and TextBridge,” provides an explanation of the concepts of document recognition and OCR and the basic functionality of TextBridge.
Chapter 3, “Learning to Use TextBridge,” walks you through several practice sessions designed to provide a firm basis on which to learn and use the important features of TextBridge.
This user’s guide also provides a comprehensive index for you to quickly locate the information you need.
Documentation conventions
As described in Table P–1, TextBridge documentation uses certain graphical elements and formatting to emphasize information and give more meaning to text.
Table P–1. Documentation Conventions
bold            Introduces a new term or the first use of an important term in a chapter. Sometimes used to denote strong in-line emphasis.
italic          Denotes titles of other user’s guides or books and generic representations of file name entries in examples; for example, filename
monospace       Denotes text that appears on the computer screen such as examples, menu text, and messages plus actual file names.
“ ” (quotes)    Denotes titles of chapters and sections in this user’s guide.
Note            Introduces information of note about the current subject.
Tip             Introduces tips that provide useful information about a procedural step or system function.
RELATED DOCUMENTATION
TextBridge provides a comprehensive set of printed and online documentation designed to assist you in learning and operating the product. The documentation provided with TextBridge covers all aspects of installation and operation.
In addition to this TextBridge Pro 98 User’s Guide, refer to the following documentation for more information:
Online Release Notes—After you install TextBridge, read the online Release Notes first. These provide the most up-to-date information about TextBridge. The Release Notes automatically appear in the TextBridge 98 folder. Simply point to Release Notes in that folder to open and read them.
Help—An extensive online Help system comes with TextBridge. The Help provides you with information about the software in general; the menus, commands, and tools; step-by-step procedures; and a glossary.
TextBridge online electronic documentation—This includes an electronic version of this TextBridge Pro 98 User’s Guide in Adobe Acrobat format (.pdf). The documentation resides on the compact disk in the directory TextBridgePro Documents. Please refer to the Release Notes in that directory for information about using the online documentation.
Multimedia Guided Tour—The Guided Tour provides you with an introduction to TextBridge.
Note You may need to refer to additional publications, such as the manufacturer’s documentation for your scanner.
TECHNICAL SUPPORT
If you should experience problems with TextBridge that you cannot resolve with the documentation and software, contact TextBridge Technical Support.
You can contact TextBridge Technical Support by the Internet, telephone, or fax.
This information will assist Technical Support in solving the problem:
Your software version number
(This is on the back of the CD-ROM case and in the Help menu under About TextBridge.)
Your software serial number
(This is the serial number on the back of the TextBridge CD-ROM case and in the Help menu under About TextBridge.)
Your scanner make and model
A description of the steps that led up to the problem
If TextBridge generated an error message, a verbatim description of the error message or its number
Internet and electronic mail addresses
You can also contact Technical Support and get information about TextBridge on the Internet at the addresses in the following list:
TextBridge site: www.textbridge.com
The TextBridge Web site provides a link to Technical Support with Frequently Asked Questions, technical information bulletins, and a problem report form.
E-mail in the United States, Canada, or the Pacific Rim:
Technical Support: textbridge_support@xis.xerox.com
Upgrade information: textbridge_sales@xis.xerox.com
E-mail from European countries and the Middle East:
Technical Support: uk_support@xis.xerox.com
Upgrade information: xisuk@xis.xerox.com
Telephone and fax numbers
Call one of the following telephone numbers or send a fax describing the problem to one of the fax numbers.
In the United States, Canada, or the Pacific Rim:
Telephone: 978–977–0764
Fax: 978–977–2434
From European countries and the Middle East:
Xerox Scansoft Ltd. in England:
Telephone: +44 (0) 1923 209140
Fax: +44 (0) 1923 208446
1
INTRODUCTION TO TEXTBRIDGE
Welcome to ScanSoft’s TextBridge™ Pro 98, optical character recognition (OCR) software for Microsoft Windows™ 95 and Windows NT. (Hereinafter TextBridge Pro 98 will be referred to as “TextBridge.”)
This chapter provides an introduction to TextBridge including:
Features and benefits
What comes with TextBridge
Scanners supported
System requirements
Installing TextBridge
Setting up TextBridge Instant Access
Uninstalling TextBridge
Input image file formats supported
Output text file formats supported
OCR is a technology that enables you to reproduce the paper documents you use every day into fully editable text on your computer. TextBridge even retains the layout of the original document when possible.
You can use TextBridge to convert printed documents from fax machines, photocopiers, and dot matrix and laser printers to electronic documents for your word processor or text application as well as documents for some database, desktop publishing, and spreadsheet software. TextBridge OCR can also recognize page image files from scanners as well as fax machines and other sources.
FEATURES AND BENEFITS
Using Xerox’s latest document recognition technology, DocuRT™, TextBridge OCR produces a fully-editable electronic document that retains the original document layout, complete with text and pictures (Figure 1–1). TextBridge understands your original document format and keeps the layout the same, including columns, headers, footers, pictures, and picture captions. This feature is supported only if your text application supports pictures and layout. For example, this feature is supported in Microsoft Word and WordPerfect but not in Notepad.
Figure 1–1. TextBridge document recomposition (original document and recomposed document in word processor)
TextBridge offers many productivity features. Whether you need to capture a simple one-page letter, a magazine article, a spreadsheet, or a long transcript, TextBridge can save you valuable time and effort. In addition, TextBridge provides all the capabilities that experienced OCR users have come to expect.
New Productivity features in TextBridge Pro 98
TextBridge offers these major features:
Improved OCR accuracy. Dramatically saves time and eliminates retyping.
Instant Access™. You can start TextBridge within most Windows text programs. After recognizing and converting the page image to text, TextBridge then automatically pastes recognized data (text and pictures) directly into the text program’s open document.
Document recomposition. TextBridge offers true document recomposition to retain your original page layout. It reproduces multiple columns, cell tables, and pictures in the same location as they are in your original document.
When you specify output to Microsoft Word™ or WordPerfect format, TextBridge can retain the original document layout in fully-editable form, even for pages containing tables, line art, reverse video, drop caps, insets, and pictures. When you edit the document, the original text flow is maintained.
When you specify output to Microsoft Excel™ or Lotus 1-2-3 format, spreadsheets and cell tables retain their original layout as cell tables, not tabbed columns. When you edit the table information, the lines move to fit exactly as you would expect.
TextBridge supports formats for the following applications that retain page layout:
Excel 3.0–5.0
Excel Mac 3.0–7.0
Excel Office 97
HTML
HTML Editor
HTML Netscape
Lotus 1-2-3
Quattro Pro
Word 6.0
Word 7.0 in Office 95
Word for Windows 2.x
Word Office 97
WordPerfect 6.1, 7.0, and 8.0
TextBridge wizard. An easy-to-use wizard guides you through each step of the TextBridge process, including page type selection and recomposition options.
Page type templates. TextBridge provides many predesigned page type templates to make processing more efficient. Templates automatically provide appropriate settings for the type of page you want to process. For example, there is a magazine page type and a letter page type that automatically activate settings for improved results. Page types incorporate three page settings: page size, page layout, and print quality. You do not have to go through a complicated process of determining and specifying settings for common types of pages.
Built-in Proofreader™. After document recognition, you can use TextBridge’s built-in proofreader to view and accept or correct any words that TextBridge suspects may not be recognized accurately.
Automatic zoning and zone editing. TextBridge automatically zones your page into text, picture, and table zones.
Zone editing. You can edit the automatic zones to further refine the zoning. Use zone editing to increase the accuracy and efficiency of page processing by reshaping zones and renumbering them.
Adobe Acrobat PDF output. You can output the document in Adobe Acrobat Portable Document Format (PDF), which can be viewed on either a PC or Macintosh computer.
Dynamic OCR training. You can train OCR to improve recognition accuracy as the job progresses. Use dynamic training with difficult documents, such as faxes or multi-generation photocopies. TextBridge enables you to interact with the OCR process and view, then accept or correct, its automatic recognition decisions. The software actually learns special symbols and words.
Guided Tour. Multimedia introduction to TextBridge including a guide to major features.
ToolTips and What’s This? Help. Instant context-sensitive information about commands, dialog boxes, and buttons on the interface.
Other TextBridge features
In addition to the features listed in the previous section, TextBridge provides these other productivity features:
Windows 95 certification
Microsoft Office 97 certification
MMX support
Broad scanner support. TextBridge supports most popular desktop scanners. It provides many built-in Image and Scanner Interface Specification (ISIS) drivers supporting a number of scanners. It supports the TWAIN device interface standard. It also supports the Text Enhancement Technology and Auto Area Segmentation features of the EPSON® scanner family. It also supports some Windows NT scanners.
Image processing. TextBridge accepts a wide range of images from a variety of sources for processing. Specifically, the program imports and recognizes online document images in BMP, Delrina, PCX, DCX, TIFF, and XIF formats originating from fax modems and other sources. For more information, see the “Input Image File Formats Supported” section in this chapter.
OLE drag and drop. TextBridge supports Windows OLE standard drag and drop operations. For example, you can drag an image from an OLE-compliant program onto TextBridge.
Clipboard support. TextBridge can import and recognize images from the Clipboard.
Deferred processing. TextBridge enables you to scan all pages of a document to a TIFF file, then later open the image file for document recognition.
Output text formats including HTML. TextBridge supports a number of output text formats, including word processor, desktop publishing, portable document, spreadsheet, HTML, and database formats. Now you can process your text for publication on the Web. For more information, see the “Output Text File Formats Supported” section in this chapter.
Preview with manual zoning. TextBridge provides a set of tools for previewing page images before processing them. You can view a page before continuing with processing. You can manually define areas of page images as zones to be processed and capture only the text, tables, or pictures you want. You can also edit the automatic zoning by adjusting the text, table, and picture zones.
Zone templates with re-usable data. After you create a set of zones, TextBridge lets you save and reload the zone templates for new jobs. In this way you can consistently process or ignore specific areas on the same type of pages and save time.
Re-usable training data. After you interactively train OCR, you can save the training data in a file. You can reload this training file for similar documents of the same page type. Using this training file assures the highest recognition accuracy without having to repeat the training.
Custom dictionaries. To improve recognition accuracy further, you can create specialized word lists (scientific terminology, proper names, acronyms, and so on) within TextBridge or in ASCII files and load them into TextBridge.
Two-sided document processing. If your scanner has a sheet feeder, you can scan the fronts (odd sides) of the pages first, then flip the stack and scan the reverse (even) sides. When scanning and recognition are complete, TextBridge automatically collates the text.
With these features, you can import virtually any paper document or document image file to your computer. TextBridge attains the highest degree of OCR accuracy and provides the output in fully editable form in your favorite text program.
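To make the two-sided document processing feature described above more concrete, the following Python sketch shows one way that two scanning passes could be collated into reading order. It is an illustration only, not TextBridge's actual method. The function name collate_two_sided and the backs_reversed parameter are hypothetical, and the sketch assumes that flipping the whole stack causes the even sides to be scanned last page first.

```python
def collate_two_sided(fronts, backs, backs_reversed=True):
    """Interleave odd-side and even-side pages into reading order.

    fronts: pages from the first pass (odd sides), in scan order.
    backs:  pages from the second pass (even sides), in scan order.
    backs_reversed: assumption that flipping the stack reverses the
        order in which the even sides come out of the sheet feeder.
    """
    if len(fronts) != len(backs):
        raise ValueError("both passes must contain the same number of pages")
    if backs_reversed:
        backs = list(reversed(backs))
    collated = []
    for front, back in zip(fronts, backs):
        collated.append(front)  # odd page
        collated.append(back)   # the even page printed on its reverse
    return collated


# Three sheets: fronts scanned as pages 1, 3, 5; after flipping the
# stack, the backs are scanned as pages 6, 4, 2.
print(collate_two_sided(["p1", "p3", "p5"], ["p6", "p4", "p2"]))
# ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']
```

If your sheet feeder delivers the even sides in the same order as the fronts, pass backs_reversed=False.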
Characteristics of Documents TextBridge can recognize
TextBridge includes a number of advances developed by the Xerox Desktop Document Systems (DDS) division and by the Palo Alto Research Center (PARC), where modern computer interfaces were invented.
Consequently, TextBridge provides highly accurate OCR and format retention results on the widest range of documents. TextBridge can recognize:
Documents printed on typewriters, phototypesetters, and impact, ink-jet, dot-matrix, and laser printers
Photocopied, degraded, or dirty documents
Documents with single- or multiple-column layouts
Spreadsheets or cell tables
Paper documents with black and white, grayscale, or color pictures including photos and line art
Page image files with black and white pictures
Note After processing with TextBridge, all pictures are output as black and white or grayscale. However, TextBridge can recognize and retain color and grayscale pictures in XIF files.
Online single- or multiple-page images from fax modems and other sources
Hard-copy faxes
Documents with point sizes ranging from 5-point to 72-point type in practically any typeface
TextBridge software in English, French, German, Italian, or Spanish
Documents composed in English, French, German, Italian, or Spanish
Note TextBridge versions shipped in international markets can recognize an even greater number of languages: Danish, Dutch, Finnish, Norwegian, Portuguese, and Swedish.
WHAT COMES WITH TEXTBRIDGE
TextBridge comes with the following items:
One installation CD-ROM. The CD-ROM includes software programs, scanner drivers, language packs, sample page image files, release notes, Help, and User’s Guide in PDF format.
A printed User’s Guide.
A software registration card.
Note Be sure to register electronically or complete and return the software registration card. Registration entitles you to free customer support and assures that you are kept up-to-date on new software releases and other information related to TextBridge and the ScanSoft family of software.
In the US, the mailing address is on the registration card. In the UK, the mailing address for registration is:
XIS Support Department (PSC)
Willow Grange
Church Road
Watford
Hertfordshire WD1 3QA
Check to be sure you have all the items listed above. If any item is missing from your TextBridge package, call your authorized ScanSoft dealer.
For information about contacting ScanSoft, refer to the Preface of this manual or the Help system.
SCANNERS SUPPORTED
Note Install your scanner before you install TextBridge.
TextBridge works with many popular desktop scanners using:
Built-in ISIS drivers provided by Pixel Translations Inc.
The TWAIN standard, which lets TextBridge work with virtually any fully TWAIN-compliant device that provides a binary image in a supported size and resolution.
Text Enhancement Technology and Auto Area Segmentation features of the EPSON® scanner family.
The list of scanners supported by TextBridge is always growing. Check the online Release Notes and the TextBridge Web site at www.textbridge.com to find the latest list of supported scanners. If your scanner is not in this list, call ScanSoft to check whether your scanner is supported.
Scanners require a system-level driver or a TWAIN source driver, which is provided by the scanner or interface card manufacturer. Consult the scanner documentation for details about installing your scanner, interface card, and driver.
After installing your scanner, test that the scanner is functioning. Refer to the scanner manufacturer’s documentation to answer any questions about the scanner.
Note Your scanner must be working independently of TextBridge prior to connecting it to TextBridge.
In general, it is recommended that you turn on your scanner before you turn on your PC.
Next, install the TextBridge software.
SYSTEM REQUIREMENTS
To install and run TextBridge 98, your Windows-compatible PC must be equipped with the following:
An Intel (or compatible) 80486 or Pentium™ microprocessor
VGA, SVGA, or multi-sync color monitor
A minimum of 16 megabytes of random access memory (RAM) for Windows 95 and Windows NT
Microsoft Windows™ 95 or Windows NT 4.0
A hard disk with a minimum of 20 megabytes (20 MB) of free space in which to install TextBridge. This enables installation of all TextBridge software and one language pack. Please allow one megabyte (1 MB) for each additional language pack you intend to install.
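As a worked example of the disk space requirement above, the following Python sketch adds up the free space to allow for a given number of language packs. The 20 MB and 1 MB figures come from the requirement itself; the function name required_disk_space_mb is hypothetical and is not part of TextBridge.

```python
BASE_INSTALL_MB = 20         # all TextBridge software plus one language pack
EXTRA_LANGUAGE_PACK_MB = 1   # allowance for each additional language pack


def required_disk_space_mb(language_packs=1):
    """Return the free hard disk space (in MB) to allow for installation."""
    if language_packs < 0:
        raise ValueError("language_packs cannot be negative")
    # The first language pack is already covered by the base figure.
    additional_packs = max(language_packs - 1, 0)
    return BASE_INSTALL_MB + EXTRA_LANGUAGE_PACK_MB * additional_packs


print(required_disk_space_mb(1))  # 20
print(required_disk_space_mb(5))  # 24
```

For example, installing five language packs means allowing about 24 MB of free space.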
INSTALLING TEXTBRIDGE
After you have installed your scanner and checked that it is working, you are ready to install TextBridge software.
This section provides procedures to install TextBridge.
Note If you want TextBridge to run on both Windows 95 and NT with a dual boot, install TextBridge twice.
Before you begin installation, exit from any open applications so that only Windows is running. There should be no applications listed in the task bar and no floating toolbars.
To install TextBridge for Windows 95/NT 4.0:
1. Insert the TextBridge CD into your CD-ROM drive.
An autorun program on the CD-ROM launches the TextBridge setup program automatically. (You can also use the Windows Explorer to open the drive, and double-click the autorun.exe program at the top level of the CD-ROM.)
The TextBridge 98 Setup dialog box appears (Figure 1–2).
Figure 1–2. TextBridge Pro 98 Setup dialog box
2. Read the information in the Setup dialog box, then click Next.
3. Read the Software License Agreement (Figure 1–3), then click Yes to proceed with installation.
Figure 1–3. Software License Agreement dialog box
If you click No because you do not accept the license agreement, the TextBridge setup closes.
4. The Setup Installation Type dialog box appears (Figure 1–4). Complete this screen, as follows.
Figure 1–4. Setup Installation Type dialog box
Choose one type of installation:
Typical (Recommended for most users.)
Compact (Used for minimum installation.)
Custom (Used to select OCR language packs to install.)
Accept the default destination directory, or Browse for a different directory. (It is recommended that you install TextBridge in the default directory.)
Click Next to install the TextBridge files onto your destination disk directory.
5. After the files are copied onto your system, the installation of TextBridge is complete.
All of the TextBridge program is now installed.
6. A Scanner Setup dialog box asks if you want to set up your scanner. Select Yes (Figure 1–5).
Figure 1–5. Scanner Setup dialog box
Click No if you are not using a scanner or are not sure at this time.
7. The Scanner Setup dialog box appears (Figure 1–6). Complete this screen as follows:
Figure 1–6. Scanner Setup dialog box
• If you will not be using TextBridge with a scanner, select No scanner, then click OK. If you want to use a scanner at a later time, you can open this dialog box from Select Scanner in the File menu.
• Select your scanner, then click OK.
• If applicable, click Configure to further define your scanner configuration. Refer to your scanner documentation for details about scanner configuration settings.
For some scanners, a dialog box appears that lets you define settings including Port Address, SCSI ID Number, Transfer Mode, and Scanning Speed.
For other scanners, a dialog box appears with the following message:
This scanner’s configuration is set using the system-level driver.
When you are finished specifying scanner configuration settings, click OK to save the new settings.
• Click OK in the Scanner Setup dialog box.
8. Complete the electronic registration.
Follow the instructions in the registration dialog box.
Figure 1–7. TextBridge Product Registration dialog box
If your PC is not set up for electronic registration, please fill in the registration card, and mail it.
9. The Setup Complete dialog box appears. Specify when you want to restart your PC, then click OK.
Figure 1–8. Setup Complete dialog box
Restarting is necessary to complete TextBridge setup. It is recommended that you restart immediately. However, if you want to perform other activities before restarting, you can click No.
Congratulations! TextBridge setup is now complete, and your new software is installed on your PC.
SETTING UP TEXTBRIDGE INSTANT ACCESS
When you restart your PC, you can use the TextBridge Instant Access Control Panel dialog box to set up Instant Access (Figure 1–9). To set up TextBridge Instant Access from your other programs, use the following procedure:
1. On the Windows task bar, click Start.
2. Point to Programs, then point to the TextBridge Pro 98 folder.
3. Click Instant Access Control Panel.
The TextBridge Instant Access Control Panel dialog box appears.
Figure 1–9. TextBridge Instant Access Control Panel
4. Check one or more programs in the list.
5. Click OK.
More information is available if you click the Help button in the dialog box.
TextBridge will now be available in the File menu of the program(s) you checked if they are installed on your PC.
UNINSTALLING TEXTBRIDGE
To restore your PC to the state it was in before you installed TextBridge, use the following procedure:
1. Close all active applications, including TextBridge.
2. On the Windows task bar, click Start.
3. Point to Programs, then point to the TextBridge Pro 98 folder.
4. Click TextBridge Uninstall.
The TextBridge Uninstall dialog box appears.
5. Click Yes to continue the uninstall process.
TextBridge automatically uninstalls.
Click No to exit the uninstall process.
6. The Uninstall Complete dialog box appears. Click OK to restart your computer.
With the above steps completed, TextBridge is completely uninstalled from your PC.
INPUT IMAGE FILE FORMATS SUPPORTED
The source of page images for TextBridge can be your scanner or it can be image files. TextBridge can recognize the following types of image file formats:
Image File Format                                   File Name Extension
Windows bitmap                                      .bmp
PCX                                                 .pcx
Multi-page PCX used in some fax programs            .dcx
Tag image file format (including Alacrity TIFF)     .tif, .ala
Delrina WinFax fax image files                      .fxr, .fxd, .fxm, .fxs
Extended image file                                 .xif
All the previous image files must be black and white with the exception of .xif, which can contain color or grayscale images. TextBridge can process images in resolutions from 72 to 900 dots per inch. However, you will not receive noticeably better OCR on images with resolutions higher than 400 dpi. In addition, you may encounter memory errors or at least slower processing time. It is recommended that you scan at 300 or 400 dpi.
Note This list is subject to change. Refer to the online Release Notes for the latest information.
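The file format and resolution guidance in this section can be summarized as a simple pre-check. The following Python sketch is an illustration only: the function check_input_image and its messages are hypothetical and are not part of TextBridge. The extensions and resolution limits are taken from the table and paragraph above, and the sketch treats any resolution below 300 dpi as below the recommended setting.

```python
import os

# File name extensions listed in the input image file format table.
SUPPORTED_INPUT_EXTENSIONS = {
    ".bmp", ".pcx", ".dcx", ".tif", ".ala",
    ".fxr", ".fxd", ".fxm", ".fxs", ".xif",
}
MIN_DPI, MAX_DPI = 72, 900                 # range TextBridge can process
RECOMMENDED_LOW, RECOMMENDED_HIGH = 300, 400


def check_input_image(filename, dpi):
    """Return a list of notes about a page image; an empty list means no concerns."""
    notes = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_INPUT_EXTENSIONS:
        notes.append("file format %r is not in the supported list" % extension)
    if not MIN_DPI <= dpi <= MAX_DPI:
        notes.append("resolution is outside the 72 to 900 dpi range")
    elif dpi > RECOMMENDED_HIGH:
        notes.append("above 400 dpi: no noticeably better OCR, slower processing possible")
    elif dpi < RECOMMENDED_LOW:
        notes.append("below the recommended 300 to 400 dpi")
    return notes


print(check_input_image("page.TIF", 300))   # []
print(check_input_image("fax.dcx", 600))
print(check_input_image("photo.jpg", 1200))
```

The check does not look inside the file, so it cannot confirm that an image is black and white as the section requires for every format except .xif.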
OUTPUT TEXT FILE FORMATS SUPPORTED
TextBridge can convert its recognized text to files for the following programs:
Program                                     File Name Extension
Adobe PDF/Normal                            .pdf
Adobe PDF/Image Only                        .pdf
Adobe PDF/Image and Text Only               .pdf
Ami Pro 2.0 and 3.0                         .sam
ASCII Smart, Standard, and Stripped         .txt
dBase IV                                    .dbf
DCA/RFT                                     .rft
DisplayWrite 5                              .rft
Excel 3.0, 4.0, and 5.0                     .xls
Excel for the Macintosh 3.0 to 7.0          .xls
Excel 97                                    .xls
FrameMaker                                  .mif
HTML                                        .htm
HTML Editor                                 .htm
HTML Netscape                               .htm
Interleaf                                   .wps
Lotus 1-2-3                                 .wk1
Lotus Word Pro                              .lwp
MSWorks                                     .rtf
MultiMate Advantage                         .doc
PostScript                                  .ps
Professional Write 2.0 and 2.2              .doc
Quattro Pro for Windows                     .wb2
Rich Text Format                            .rtf
RTF for the Macintosh                       .rtf
Windows Write                               .wri
Word for Windows 2.x                        .rtf
Word 6.0 and 7.0                            .rtf
Word 97                                     .rtf
WordPerfect 4.2 and 5.1                     .wpf
WordPerfect 6.0, 6.1, and 7.0               .wpd
WordStar                                    .doc
Note This list is subject to change. Refer to the Release Notes for the latest information.
WHERE TO GO FROM HERE
To learn how TextBridge recognizes a document and how you prepare TextBridge to do this, read Chapter 2. This chapter explains the basic concepts and functions of the software.
To learn how you use TextBridge to process simple and complex documents, refer to Chapter 3. It also explains how to view, zone, train, and proofread your document in TextBridge and edit your document in your word processor.
The online Help system provides a complete reference to the user interface, including window areas, menus, commands, and tools as well as overview information and features, step-by-step procedures for using the software, tips, and a glossary.
2
This chapter provides information about the process of page recognition. Use this chapter to learn about optical character recognition (OCR), recomposition, and other concepts that will help you use TextBridge effectively.
This chapter provides information about OCR and TextBridge including:
What is TextBridge OCR?
Running TextBridge
TextBridge functionality
Before you start to OCR
Using TextBridge to OCR
Automatic processing
Manual processing
Improving Page Recognition with Settings
Improving OCR with Training
Page recognition refers to the process during which a page is analyzed, and characters and words are saved as text. Optical character recognition is the technology that converts documents that you can read into documents that your computer can read. Recomposition is the technology that reproduces the formatting of text and the layout of the page, including the positioning of text, pictures, and tables.
WHAT IS TEXTBRIDGE OCR?
TextBridge is OCR software that turns paper documents or page image data into image documents then into text documents on your PC. Page image data is electronic information about the pages of a document that comes from a source such as your scanner or fax software. This data becomes an image document and is stored in an image file. Text documents are files containing information about the text and pictures in your document. A text document contains one or more pages and is expressed in text form and stored in a text file. You can open, edit, reformat, and republish this information.
Page types
TextBridge can recognize a wide variety of pages. All you need to do is specify settings to control page processing. To make this easier and quicker, TextBridge gives you a set of page types. These are common types of pages that you match with your original pages. Each page type comes with default settings that are most often used to process pages of that type. Using page types makes it quick and easy for you to perform page recognition. You just select the page type that most closely matches your original page.
Figure 2–1. Page types in Start dialog box
Page types incorporate three settings: page size, print type, and page layout. The page types to choose from and their characteristics are described in the following table:
Page Type                 Page Size    Print Type    Page Layout
Any Page                  Letter       Any           Any
Any Page (Fax quality)    Letter       Fax           Any
Legal Document            Legal        Good          Single column
Magazine Page             Letter       Good          Multi-column
Memo or Letter            Letter       Good          Single column
Newspaper                 Legal        Newspaper     Multi-column
Spreadsheet or Table      Letter       Good          Table and one-column text
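One way to picture a page type is as a named bundle of three default settings. The Python sketch below simply restates the page type table as a lookup; the names PageTypeDefaults, PAGE_TYPES, and defaults_for are invented for this illustration and do not correspond to anything inside TextBridge.

```python
from typing import NamedTuple


class PageTypeDefaults(NamedTuple):
    """The three settings that a page type incorporates."""
    page_size: str
    print_type: str
    page_layout: str


# Values copied from the page type table; the structure itself is illustrative.
PAGE_TYPES = {
    "Any Page": PageTypeDefaults("Letter", "Any", "Any"),
    "Any Page (Fax quality)": PageTypeDefaults("Letter", "Fax", "Any"),
    "Legal Document": PageTypeDefaults("Legal", "Good", "Single column"),
    "Magazine Page": PageTypeDefaults("Letter", "Good", "Multi-column"),
    "Memo or Letter": PageTypeDefaults("Letter", "Good", "Single column"),
    "Newspaper": PageTypeDefaults("Legal", "Newspaper", "Multi-column"),
    "Spreadsheet or Table": PageTypeDefaults(
        "Letter", "Good", "Table and one-column text"
    ),
}


def defaults_for(page_type):
    """Look up the default settings for a page type name."""
    return PAGE_TYPES[page_type]


if __name__ == "__main__":
    print(defaults_for("Newspaper"))
```

Selecting a page type amounts to choosing one row of this lookup; you can still override any of the three values afterward.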
In addition to the settings connected with page types, there are several other settings that control page processing. These have default settings. The scanner resolution is set at 300 dpi. Page layout, pictures, and paragraph styles are automatically retained. You can modify these settings by using the Page Type tab in the Settings dialog box.
OCR and TextBridge Pro 2–3
Figure 2–2. Page Type tab in Settings dialog box
Page sources
You can get pages to process from your scanner or from page images. Use your scanner as a source to input documents on paper to TextBridge, which then takes the scanned images, performs OCR, converts the recognized text and pictures to the text file format of your choice, and stores it on your PC. Alternatively, use TextBridge to recognize and convert page images stored in image files that come from fax modems or other sources. Refer to Figure 2–1, which illustrates where to select your page source.
Recomposition
TextBridge recomposition lets you keep the layout of the original page. When you select Retain page layout, TextBridge recomposes the layout, while maintaining full ability to edit in the output file. After recomposition, text, pictures, and tables are in the same position in relation to each other as in the original page. You can see the results of recomposition when you print the page or look at it in layout view, if your word processor supports these elements.
Retain pictures keeps pictures in the saved document. If you do not select retain page layout, pictures are saved at the end of the document. If you select retain page layout, the pictures are in the same position in relation to the text and each other as they were in the original page when you print the page or view it so that you can see the layout.
Format with paragraph styles makes it possible for you to see the specific styles assigned to text by TextBridge. Formatting styles include indentation, font size and style, underline, bold, and italic. Paragraph styles have names that begin with “TxBr” followed by two sets of numbers separated by a letter. The first number represents the number of times you have pasted into the application. The letter represents information about the style. The letter “c” means that the paragraph is centered, “p” means a regular paragraph, and “t” means a tab table. The second number represents a specific style. For example, in the Style box in Word, you can see and use the styles assigned by TextBridge such as TxBr_3c11. The number of consecutive Instant Access pastes into Word is 3, the paragraph is centered, and the style number is 11.
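The naming scheme for paragraph styles can be made concrete with a small parser. This Python sketch is only an illustration: it assumes the underscore shown in the example TxBr_3c11 is always present and that only the letters c, p, and t occur, and the function name parse_style_name is hypothetical.

```python
import re

# "TxBr", an underscore (as in the example TxBr_3c11), the paste count,
# one letter describing the style, and the style number.
STYLE_NAME = re.compile(r"TxBr_(\d+)([cpt])(\d+)")

KINDS = {
    "c": "centered paragraph",
    "p": "regular paragraph",
    "t": "tab table",
}


def parse_style_name(name):
    """Split a TextBridge-style paragraph style name into its parts.

    Returns None when the name does not follow the assumed pattern.
    """
    match = STYLE_NAME.fullmatch(name)
    if match is None:
        return None
    paste_count, letter, style_number = match.groups()
    return {
        "paste_count": int(paste_count),
        "kind": KINDS[letter],
        "style_number": int(style_number),
    }


if __name__ == "__main__":
    print(parse_style_name("TxBr_3c11"))
```

For TxBr_3c11 the sketch reports a paste count of 3, a centered paragraph, and style number 11, matching the description above.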
TextBridge can output formatted text and graphics to word processor and spreadsheet file formats, as well as some database, desktop publishing, and electronic document file formats.
It is important to note that in reconstructing the layout of the original document, TextBridge is limited by the composition capabilities of the text application. For example, there are some complex magazine pages originally created with a publishing program for which you will not get identical output in your word processor. Even the most powerful word processors do not have some of the composition capabilities of publishing software.
In addition, some complex, free-form layouts defeat TextBridge’s recomposition capabilities. For these types of documents, it is often best to preview pages and manually zone text and image zones that you want to capture.
For some documents, you may want only the text in simple galley (one-column) form. In this case, you would not want to retain the layout. The output document will have a single column of all the text in the original document. If you choose to retain paragraph styles, the text formatting but not the page layout will be retained. For example, the final document will have paragraphs and headings in styles like the original document and in the order of the original document. If you choose to retain pictures, the pictures will be at the end of the document. If you use zone ordering, you can number the zones in the order in which you want them to be in the final document.
Note TextBridge is not designed to recognize and retain the layout of forms, including forms designed with fill in the blanks, check boxes, or vertical and horizontal lines separating fields of information.
RUNNING TEXTBRIDGE
You can run TextBridge as a standalone application or invoke it from within another application with Instant Access.
Standalone Application
When you start TextBridge from the Start menu, it operates as a standalone application and runs independently of any other application. TextBridge recognizes pages and saves them in the output format that you specify. You can then open the file in the application that uses the format you specified.
Instant Access
Instant Access gives you direct access to TextBridge from applications such as Microsoft Word. Programs with Instant Access have a TextBridge command in the File menu. Clicking TextBridge in the File menu starts TextBridge, which recognizes pages and pastes them directly into the open document in the program.
You operate TextBridge just as if you had started the standalone TextBridge software. Running TextBridge through Instant Access differs from running it standalone in these ways:
The options to process automatically or manually and to retain pictures and retain layout are available in the Start dialog box.
Deferred processing is not available.
TextBridge automatically determines which format to use based on the application being used.
The Instant Access Control Panel, which is available from the Start menu, enables you to specify which programs have Instant Access to TextBridge. The programs in Figure 2–3 automatically have Instant Access.
Figure 2–3. Instant Access Control Panel
TEXTBRIDGE FUNCTIONALITY
You can perform the activities in the following list with TextBridge:
Select a page type for the type of page to process.
Select page type settings for the entire document or change settings on a page-by-page basis.
View the scanned page and delete the page if desired.
Scan pages but defer OCR until later.
View the text and picture zoning.
Adjust the zoning, use a zone template, or entirely redo the zoning.
Select a portion of a page to process.
Train OCR on one or more pages.
View the OCR results.
Proofread the OCR results.
Save the recognition results in one or more file formats.
BEFORE YOU START TO OCR
The following checklist will take you through the most important questions to ask before you start to process a document.
1. Is my document coming from my scanner or an image?
2. What type of page is the document?
3. Is this document a good candidate for OCR?
4. Do I want to retain the original layout of the document?
5. Does the original document have pictures? If so, do I want to retain the pictures?
6. Do I want to save the document as a PDF document?
7. Do I want to stop to train OCR?
8. Are there any other settings I want to check and change?
The rest of this chapter provides information that helps you to answer these questions.
USING TEXTBRIDGE TO OCR
The next two sections provide information on automatic OCR processing as well as other more advanced processes that are options in manual processing. Refer to the Help system for the step-by-step procedures for these activities.
TextBridge provides an easy-to-use interface and a powerful set of built-in capabilities. You can use it in a number of ways to do OCR, depending on the complexity of the document to be recognized. You can use TextBridge to OCR in automatic mode or manual mode.
You can process all the pages automatically in the automatic mode or interact with the process in the manual mode.
You can preview and zone pages before OCR.
You can train OCR to achieve the highest possible accuracy.
AUTOMATIC PROCESSING
When you use TextBridge’s automatic processing, TextBridge processes pages automatically with very little interaction with you. In automatic mode, once you select the page type, TextBridge automatically recognizes your page(s). TextBridge only stops for you to add more pages and to proofread the results of recognition. You may request to train OCR, but this is optional.
The following steps describe the automatic process of using TextBridge for page recognition. Refer to the Help system for the step-by-step procedures for these activities.
1. Click the Auto Process button.
2. Select a page type.
3. Select the source of the page image, either your scanner or image file.
4. Click OK.
5. TextBridge processes all the pages in your scanner or the selected image file(s).
6. If scanning, click the More Pages button to add another page to the final document. (Optional)
7. If scanning, click the No More button when there are no more pages to add.
8. TextBridge recognizes the page(s) including character, picture, and format recognition.
9. Proofread the results of character recognition.
10. Save the text and picture(s) in a file format of your choice.
During the automatic process, you will interact with these screens.
Figure 2–4. Click the Auto Process button in the TextBridge window
Figure 2–5. Start the automatic process using the Start dialog box
Figure 2–6. Add Pages to Scanner dialog box
Figure 2–7. Proofread the results of recognition using the Proofread toolbar
Figure 2–8. Save the document using the Save As dialog box
MANUAL PROCESSING
TextBridge is powerful OCR software that enables you to get professional results from page recognition. Page recognition is a complex process, and it can require your interaction with TextBridge to get the best output. There are a number of opportunities during the page recognition process that allow you to enhance the results for the particular document and future similar documents. During manual mode, you lead TextBridge through processing a document.
During manual processing you can stop the page recognition process to perform the activities in the following list:
Preview the page
Zone the page manually
Proofread the document
The following steps describe the manual process of using TextBridge for page recognition. Refer to the Help system for the step-by-step procedures for these activities.
1. Click the Get Page button.
2. Select a page type.
3. Select the source of the page image, either your scanner or an image file.
4. Click OK.
5. TextBridge processes the first page image.
6. Preview the page, including zoning.
7. Click the Recognize Page button.
8. TextBridge recognizes the page, including character, picture, and format recognition.
9. Proofread the results of recognition.
10. Add more pages to the document. (Optional)
11. Save the text and picture(s) in a file format of your choice.
Each of these activities is explained in the next sections.
Selecting Page Type and Source
When you start processing a new document, the TextBridge — Start dialog box appears, and you can perform the actions in the following list:
Indicate whether pages are from your scanner or an image file.
Select the Page type that best matches your original page(s).
View and change the settings for the page type you selected.
Figure 2–9. Start the manual process using the Start dialog box
Note You can use the optional Page toolbar to select the page type and source rather than the Start dialog box. From the Toolbars dialog box in the View menu, choose the Page toolbar.
Figure 2–10. Page toolbar
Previewing the Page
After TextBridge gets a page of a document and before it begins page recognition, you preview the page. You commonly use Preview to check the contents, brightness, orientation, and quality of the page and delete unwanted pages from the document. After you check the page, zone it.
Figure 2–11. Preview the page using the Preview toolbar
Processing stops after TextBridge gets each page and displays the acquired image of the original page. At this point, you can perform one or more of the activities in the following list:
Check that this is the page you want.
Check the quality of the scanned page.
Rotate the page to turn the page upright, if necessary.
Use the Zoom commands to magnify or reduce the page view.
Delete the page from the document.
Adjust the settings for processing the page.
Cancel the process by creating a new file or opening another file.
Look at the properties of the page.
Continue processing the page.
You can use the Preview toolbar or View menu commands to examine and orient the acquired page.
Zoning the Page
During preview and before recognition can begin, the page must be zoned. TextBridge can automatically zone the page, or you can zone the page manually. An acquired page is divided into one or more zones. There are three types of zones: text, table, and picture.
Text zone       Contains text and can be normal or reverse (white characters on a black background).
Table zone      Contains tables divided into cells. Tables can be ruled or unruled.
Picture zone    Contains any graphic art such as line art and photographs, and halftone images, which you see as shades of gray.
Note A form is not a table. TextBridge does not OCR forms and does not retain the original layout of most forms.
Each type of zone has a different transparent color so you can easily distinguish among them. TextBridge assigns colors to each type of zone. You can change the assigned colors in the Options dialog box. TextBridge also orders zones for output. TextBridge assigns a number to every text and table zone in the following order: headers, text including titles and headings, insets and picture captions, and footers. You can change the order of the zones during preview. This is useful when you want to output the document without retaining the page layout and reorder the paragraphs in the output document.
Only those parts of the page that are marked with zones are recognized by TextBridge. If you want to recognize only part of a page, mark only that portion. TextBridge does OCR on text and table zones and converts both to text. OCR is not done on picture zones. Picture zones are saved as part of the output. They are not saved as separate files.
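The default zone numbering described in this section can be sketched as a simple sort. The Python example below is an illustration under stated assumptions, not a description of how TextBridge works internally: the role labels (header, text, inset, caption, footer), the dictionary layout, and the function name number_zones are all invented here. Picture zones are skipped because only text and table zones are numbered.

```python
# Output order from the text: headers, text (including titles and headings),
# insets and picture captions, then footers. Role names are hypothetical.
ROLE_ORDER = {"header": 0, "text": 1, "inset": 2, "caption": 2, "footer": 3}


def number_zones(zones):
    """Assign output numbers to text and table zones.

    zones is a list of dicts with keys 'name', 'kind' ('text', 'table', or
    'picture'), and 'role'. Returns (number, name) pairs in output order.
    """
    numbered_kinds = ("text", "table")
    candidates = [z for z in zones if z["kind"] in numbered_kinds]
    # sorted() is stable, so zones with the same role keep their page order.
    ordered = sorted(candidates, key=lambda z: ROLE_ORDER[z["role"]])
    return [(i, z["name"]) for i, z in enumerate(ordered, start=1)]


if __name__ == "__main__":
    page = [
        {"name": "page number", "kind": "text", "role": "footer"},
        {"name": "photo", "kind": "picture", "role": "text"},
        {"name": "body", "kind": "text", "role": "text"},
        {"name": "running title", "kind": "text", "role": "header"},
        {"name": "photo caption", "kind": "text", "role": "caption"},
    ]
    for number, name in number_zones(page):
        print(number, name)
```

Changing the zone order during preview corresponds to replacing these computed numbers with your own.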
When you stop to preview the page, you can display the zones automatically generated by TextBridge. You can adjust these zones before continuing the zoning process and recognizing the page. You can also manually zone the page. Use the zoning tools in the Preview toolbar (text marker, table marker, picture marker, and erase marker) like highlighting markers to create and adjust zones.
Figure 2–12. Zone the page using the Preview toolbar
You can perform these activities related to zones:
Use TextBridge automatic zoning.
Mark text, table, and picture zones.
Zone only part of a page.
View and adjust the text and picture zoning by adjusting the size, merging, or splitting the zones.
Drag a selected zone to adjust its position.
Delete zones so that text, tables, or pictures are not included in the final document.
Display the attributes of the current zone.
Change a zone from one type such as text to another type such as table.
Enlarge or reduce the page to view the zones, using the Zoom In and Zoom Out buttons.
You can also perform these less common activities related to zones:
Use the same zoning for subsequent pages of the same document.
Save the current set of zones (including their size, location, and type) as a template and use the zone template on other documents.
Change the colors used to highlight different types of zones.
Create polygonal zones with intersecting rectangular zones.
Select zone order. Select zones and assign a number to each zone to determine the order in which zones are output to the text document. However, if you choose to Retain Page Layout, TextBridge ignores the zone order.
You can use the Preview toolbar to quickly perform many of these activities or use commands in the File, Edit and View menus. After you complete the preview, tell TextBridge to recognize the page.
Proofreading the Document
In automatic mode, after TextBridge recognizes all the pages of a document, you can proofread the recognition results. TextBridge displays the first page for proofreading after all the pages of a document have been recognized and before you save the document. In manual mode, TextBridge stops for you to proofread after it recognizes each page. The page is laid out like the original page. Pictures found by OCR are displayed in the same location as in the original page.
Note Pictures in an XIF file are not shown, and placeholders are displayed in their place. Proofreading is not available if you are creating a PDF file.
Figure 2–13. Proofreading the page using the Proofread toolbar
Words that TextBridge suspects may not have been recognized correctly are color coded. Suspect words are identified by one color and unrecognized characters are highlighted in another color. By default, the suspect words are blue, and the current word in the Suspect box is yellow in the view area and Original Image window. Use the Proofread toolbar to correct words.
You can add corrected words to the user dictionary, which can improve recognition in subsequent pages of the same document and subsequent documents. The user dictionary is most useful for non-standard words that you frequently need to recognize such as proper nouns and technical words.
While you are still in proofread mode, you can add pages to the final document by getting a page using either the automatic or manual process.
Saving the Document
After you finish proofreading the document, you are ready to save it. Once you save the document, you cannot add any more pages to it. You can specify the location, name, and type of format of the output document. TextBridge converts the document to the format of your choice and saves it. You can save the same document more than once using the Save As command. For example, you can save the document as text only and then save it with pictures and layout.
Figure 2–14. Saving the page using the Save As dialog box
After you save the document, the image of the document remains on the screen until you begin a new job.
IMPROVING PAGE RECOGNITION WITH SETTINGS
There are a number of settings that you select in TextBridge at the beginning of the recognition process to help it recognize a document with more accuracy. Many of these options are related to the manual processes described in the previous section. Use the Settings dialog box to specify which options of the software you want to use.
Page Type Settings
Usually, you will want to use the settings automatically assigned to a page type. However, it is possible for you to change these settings.
You can view and change the settings for a page type in the Page Type tab of the Settings dialog box. Check the settings to be sure they are the best ones for processing the original page.
Figure 2–15. Page Type tab in the Settings dialog box
On the Page Type tab, you can choose the following page type settings for the specific page type you select:
Select the page layout of the original page:
  Any layout
  Single column with or without pictures and tables
  Multi-column with or without pictures and tables
  Table for pages with a table or spreadsheet and single-column text
When you select Any layout, TextBridge automatically determines the page layout. Use Any layout when pages in your document have different layouts or when your pages have complex layouts that do not fit the above layouts.
Set the print type of the document to be processed:
  Any
  Good
  Fax
  Dot matrix
  Newspaper
When you select Any, TextBridge automatically determines the print type.
Set the page size to reflect the actual size of the original page:
  Letter
  Legal
  A4 for European documents
  Business Card
Scanner Settings
You can view and change the settings for your scanner in the Scanner tab of the Settings dialog box. For ISIS and TWAIN scanners that have been configured to allow TextBridge to control the scanner, check the scanner settings to be sure they are the best ones for processing the original page.
Figure 2–16. Scanner tab in the Settings dialog box
On the Scanner tab:
Choose the resolution value to reflect the actual resolution of your scanner. For most documents, use 300 dpi. For 8 point or smaller text, use 400 dpi for the best results.
Change the brightness based on whether your original page has light or dark text and pictures. For example, if the text and pictures on the original page are light, darken the brightness control.
Check the box to use the Automatic Document Feeder if your scanner has this feature and you are scanning multiple pages.
If you use an Epson scanner, check Text Enhancement Technology to enhance the scanned image for text recognition.
If you use an Epson scanner, check Auto Area segmentation to enhance the scanned image for improved picture quality.
Processing Settings
You can view and change the settings for processing in the Processing tab of the Settings dialog box. Check the settings to be sure they are the best ones for processing the original page.
Figure 2–17. Settings dialog box and Processing tab
On the Processing tab:
Select the primary language of the document, such as English, German, French, Italian, Spanish, or another supported language.
Choose one or more of the following options to apply during page processing for special documents, if you have created any of these: a user dictionary, training data, and zone template.
Set the page orientation for the way text and images are printed on the original page:
  Any orientation
  Portrait
  Landscape
If you select Any orientation, TextBridge automatically determines the page orientation.
Check Train OCR if you want to train TextBridge OCR for recognition.
Text Document Settings
You can view and change the settings for the output document in the Text Document tab in the Settings dialog box. Check the settings to be sure they are the best ones for processing the original page.
Figure 2–18. Text Document tab in the Settings dialog box
On the Text Document tab, select default settings for saving the results:
Specify one or more recomposition settings to reflect the output results you want based on the original page:
  Retain the layout of the original page.
  Retain the pictures on the original page.
  Format with paragraph styles.
If you select format with paragraph styles, the text in the output document is formatted with the paragraph styles of the original page rather than automatically formatted with a single style.
Note The output page looks the same whether format with paragraph styles is checked or not; however, in your word processor there are paragraph and table styles assigned to the text. You can use the styles from the original document if you modify the text in the output document.
Specify how you want to save the results of document processing:
  Save the output as one document in one file.
  Save each page as a separate document in a separate file.
  Save a new document whenever a blank page is found in the original document.
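The three ways of saving results can be pictured as three ways of grouping pages into documents. The Python sketch below is only an illustration: it treats a page as a string of recognized text, treats an empty or whitespace-only string as a blank page, and drops the blank separator pages from the output. Those choices, the mode names, and the function name split_pages are assumptions, not documented TextBridge behavior.

```python
def split_pages(pages, mode):
    """Group pages into output documents (illustrative sketch only).

    mode is one of the hypothetical names 'one_file', 'file_per_page',
    or 'split_on_blank'. Returns a list of documents, each a list of pages.
    """
    if mode == "one_file":
        return [list(pages)]
    if mode == "file_per_page":
        return [[page] for page in pages]
    if mode == "split_on_blank":
        documents, current = [], []
        for page in pages:
            if page.strip() == "":
                # Assumed: a blank page ends the current document
                # and is not itself saved.
                if current:
                    documents.append(current)
                    current = []
            else:
                current.append(page)
        if current:
            documents.append(current)
        return documents
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    scanned = ["memo p1", "memo p2", "", "invoice p1"]
    print(split_pages(scanned, "split_on_blank"))
```

With the blank-page option, inserting a blank sheet between jobs in the document feeder is a simple way to separate several documents in one scanning pass.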
Specify where you want to save the results of document processing:
  Save the document on the Desktop.
  Save the document on the a: drive.
  Save the document in My Documents.
  Save the document in Text Documents.
  Save the document in a location you determine by using the Browse button or typing the location.
Specify the type of format in which to save the results from the list of options.
Note The save as type and the recomposition settings are related. If there is a conflict between the recomposition and save as type settings, some settings will automatically be unavailable to prevent a conflict.
Specify the default name of the scanned document to save. The default name is automatically the name of the current page type for pages from your scanner and the name of the image file for pages from an image file.
Type in another name, if desired.
If you are saving more than one document, each document has the same base name appended with an integer in parentheses. For example, Magazine Page (2). If you want to rename the file, you can rename it in the Save As dialog box.
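The naming rule for multiple documents can be sketched in a few lines of Python. The manual gives only the example Magazine Page (2), so two details here are assumptions: that numbering starts at 1, and that a single document keeps the plain base name. The function name output_names is hypothetical.

```python
def output_names(base_name, count):
    """Return default document names for 'count' saved documents.

    Assumed: a single document keeps the plain base name, and multiple
    documents are numbered from 1 in parentheses.
    """
    if count == 1:
        return [base_name]
    return [f"{base_name} ({number})" for number in range(1, count + 1)]


if __name__ == "__main__":
    print(output_names("Magazine Page", 3))
```

Any of these default names can still be changed in the Save As dialog box.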
You can change settings for retain picture, retain page layout, and file location later in the Save As dialog box, if you desire.
Saving a Document in a PDF File
TextBridge saves the results of page recognition in a format compatible with common text software such as your word processor or spreadsheet. In addition, TextBridge can save the results of page recognition in the Adobe Acrobat Portable Document Format (PDF). This format lets you share documents among different types of computers, such as PC or Macintosh. TextBridge can output documents in the PDF formats in the following list:
Acrobat PDF Normal with no word images
Acrobat PDF Normal with suspect word images
Acrobat PDF Normal with highly suspect word images
Acrobat PDF Image only
Acrobat PDF Image and text
After you select the type of page to OCR and before you begin processing, use the Text Document tab in the Settings dialog box to set the Save As type to the type of PDF that you want. When you save the results, you must keep the PDF format that you specified before beginning the job.
You can automatically or manually zone pages, as you can with any other format. Retain pictures and page layout are not applicable, since both are automatically retained.
Note The Proofreader is not available when you select PDF output. Also, you cannot output PDF files if you use Instant Access to TextBridge.
Figure 2–19. Text document tab in the Settings dialog box with Adobe PDF format selected
IMPROVING OCR WITH TRAINING
To assure the highest possible accuracy, TextBridge provides an interactive training capability that can aid in word recognition on pages with the same fonts and print quality. This feature enables you to participate in the OCR process, verifying correctly recognized words and correcting any recognition errors. You can select the Train OCR command at the beginning of a job or during a job and use it in automatic or manual mode. Interactive training is especially effective for documents with poor quality originals, such as faxes and multi-generation photocopies.
Figure 2–20. Training using the Training toolbar
Training level
You can set training level options to control the sensitivity of the training process. The level determines how frequently suspect words are displayed for your input. You can request to see any word that is slightly suspect or only those words that are highly suspect. In the first case you will review more words than in the second case.
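The effect of the training level can be illustrated with a confidence cutoff. The Python sketch below is purely hypothetical: TextBridge does not document a numeric confidence score, so the scores, the cutoff values, the level names, and the function name words_to_review are all invented for this example.

```python
# Hypothetical cutoffs: a word is shown for review when its confidence
# falls below the cutoff for the chosen training level.
CUTOFFS = {
    "slightly_suspect": 0.9,   # more sensitive: stops on more words
    "highly_suspect": 0.5,     # less sensitive: stops on fewer words
}


def words_to_review(words, level):
    """Return the words that would be displayed for your input.

    words is a list of (text, confidence) pairs with confidence in [0, 1].
    """
    cutoff = CUTOFFS[level]
    return [text for text, confidence in words if confidence < cutoff]


if __name__ == "__main__":
    recognized = [("the", 0.99), ("rn", 0.80), ("c1ear", 0.30)]
    print(words_to_review(recognized, "slightly_suspect"))
    print(words_to_review(recognized, "highly_suspect"))
```

In this sketch, every word flagged at the highly suspect level is also flagged at the slightly suspect level, so the more sensitive level never shows fewer words.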
As you correct or accept TextBridge’s recognition decisions, you also train TextBridge to improve its own accuracy rate for later pages of the same document. During this process, TextBridge compiles information about the character shapes, styles, and sizes found in the document being recognized. You can accept or correct each suspect word until you are satisfied that TextBridge is sufficiently trained. Then you can turn interactive training off, and TextBridge will recognize the rest of the document.
You can use the Training toolbar to correct words. You can add corrected words to the user dictionary, which can improve recognition in subsequent pages of the same document and subsequent documents. The user dictionary is most useful for non-standard words that you frequently need to recognize such as proper nouns and technical words. You can save and later reload the user dictionary to assure that documents containing the same words are recognized with the same high degree of accuracy.
You can save and later reload training data to assure that other similar documents of the same font are recognized with the same high degree of accuracy. Training data is only effective for documents with the same fonts, characters, and print quality.
WHERE TO GO FROM HERE
To learn how to use TextBridge to process simple and complex documents, refer to Chapter 3. It also explains how to start TextBridge, use the Help system and sample documents, use Instant Access, and view, zone, train, and proofread your document in TextBridge.
The online Help system provides a complete reference to the user interface, including window areas, menus, commands, and tools as well as overview information and features, step-by-step procedures for using the software, tips, and a glossary.
CHAPTER 3: LEARNING TO USE TEXTBRIDGE
The previous chapters have introduced you to TextBridge and document recognition. This chapter provides step-by-step instructions to teach you how to use the most important capabilities of TextBridge.
The learning sessions build on each other and assume that you understand the procedures explained in the previous sessions. It’s best to do them in order or skim through prior sessions to familiarize yourself with the steps. Each learning session begins with introductory information including a list of what you will learn followed by step-by-step procedures and explanations.
The topics presented in this chapter are in the following list:
• Starting TextBridge
• Using the Help system
• Using the sample documents
• Processing a simple document using auto processing
• Using Instant Access OCR
• Processing a complex document using manual processing
• Processing text, pictures, and a table
• Training OCR and using the Page toolbar
STARTING TEXTBRIDGE
There are two ways to start TextBridge. You can start TextBridge as a standalone application or as Instant Access from any Windows-based text application.
In this section you will learn to start TextBridge as a standalone application.
To start TextBridge:
1. On the Windows task bar, click Start.
2. Point to Programs, then point to the TextBridge Pro 98 folder.
3. Click the TextBridge Pro 98 icon.
The TextBridge Pro 98 main window appears.
Note: For these learning sessions you will be selecting the page type and page source from the Start dialog box. The instructions for these sessions assume that your screen looks like the following figure.
Figure 3–1. TextBridge Pro 98 main window (callouts: Menu bar, Main toolbar, Process toolbar, View area, Status bar)
If your screen does not look like this, in the View menu, select Toolbars. In the Toolbars dialog box, select Start dialog box in the Select Page Type and Source From box.
Figure 3–2. Toolbars dialog box
USING THE HELP SYSTEM
TextBridge is designed to be easy to learn and use. The online Help system provides general information about the program, step-by-step procedures for using the program, a glossary, and a complete reference to the user interface, including window areas, menus, commands, and tools.
In this section, you will learn to:
• Get information from the Help system, including What’s This? Help.
• Use the Help Topics window.
What you want to know about | How to get Help
Item in a dialog box or menu | Click the ? button, then click the item; or right-click the item, then click What’s This? in the shortcut menu; or select the item, then press F1 or Shift+F1.
Entire dialog box | Click the Help button in the dialog box.
How to do something | Click TextBridge Help in the Help menu, then click Step-by-Step Procedures in the Contents tab or use the Index.
General information | Click TextBridge Help in the Help menu, then click About TextBridge Pro 98 in the Contents tab or use the Index.
Meaning of a word | Click Help in the menu bar and, in the Contents tab, click Glossary or use the Index.
A concept not listed in the Contents or Index | Click TextBridge Help in the Help menu, then click Find and follow the directions.
You can get Help by using the main Help Topics window shown in the following figure.
Figure 3–3. Help Topics: TextBridge Pro 98 Help window
From the Help Topics window, you can get the help you want by performing one of the activities in the following list:
• Select a topic from a book in the Contents tab.
• Select a topic from the Index tab.
• Search for information about a specific word or phrase using the Find tab.
• Jump from one topic to a related topic.
USING THE SAMPLE DOCUMENTS
In this section, you will learn about the sample documents and how to open a sample document.
Use the sample documents provided on the installation CD with the learning sessions in this chapter. You can find the five sample documents in the installation folder in the following location:
C:\Program Files\TextBridge Pro 98\Images\Samples
This is the default location for these files; however, you may have installed them in another location. The sample documents are stored in TIFF format and are named:
• Letter
• BookWise
• Markplan
• Scanning
• Plexis
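If you want to refer to the sample files from a script, the full paths can be built from the default folder and the five names listed above. This is a minimal sketch: it assumes the default installation folder and assumes that every sample uses the .tif extension, as Letter.tif does; the function name sample_paths is made up for this example.

```python
from pathlib import PureWindowsPath

# Default location given in this guide; change it if you installed elsewhere.
SAMPLES_DIR = PureWindowsPath(r"C:\Program Files\TextBridge Pro 98\Images\Samples")

# The five sample document names listed in this guide.
SAMPLE_NAMES = ["Letter", "BookWise", "Markplan", "Scanning", "Plexis"]


def sample_paths(folder: PureWindowsPath = SAMPLES_DIR) -> list:
    """Build the full path of each sample (assumes a .tif extension)."""
    return [folder / f"{name}.tif" for name in SAMPLE_NAMES]


for path in sample_paths():
    print(path)
```

PureWindowsPath only manipulates path text, so the sketch runs on any system and does not check whether the files actually exist.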
The sample documents provide a cross-section of the page types that TextBridge can process:
• Memo or Letter
• Magazine Page
• Any Page
• Spreadsheet or Table
The sample documents are designed to provide you with documents on which to learn and to highlight the capabilities of the application. In each of the learning sessions, you are asked to use a specific sample document.
In this session, you will learn to open a sample document. For this session, use Letter.tif.
Figure 3–4. Letter sample document
To find and open a sample document:
1. Click the Auto Process button.
The Start dialog box appears.
Figure 3–5. Start dialog box with Any Page and Image file selected
2. In the Start dialog box:
• Click Any Page in the Page type box.
• Click Image file in the Page source box.
• Click OK.
The Open dialog box appears. The default folder Samples is open. The sample TIFF files are listed in the Open dialog box.
Figure 3–6. Open dialog box with Letter.tif selected (callout: Select an image file)
If Samples is not the open folder, access the sample documents folder in the following location from the Look In: box in the Open dialog box:
C:\Program Files\TextBridge Pro 98\Images\Samples
This is the default location unless you installed TextBridge in another place.
3. In the Open dialog box, double-click a file name to open it.
In this case, double-click Letter.tif.
TextBridge gets the page as shown in the following figure.
Figure 3–7. TextBridge - Getting Page dialog box
TextBridge automatically zones the page and identifies text, tables, and pictures as shown in the Zoning dialog box.
Figure 3–8. TextBridge - Zoning dialog box
TextBridge automatically recognizes the characters and page layout as shown in the Recognizing dialog box.
Figure 3–9. TextBridge - Recognizing dialog box
After TextBridge reads the page image and processes it, it stops for you to proofread the page.
Figure 3–10. TextBridge - Proofread window
For this lesson, you just want to get back to where you started without saving the document.
4. Click the New command in the File menu to discard the current page.
A dialog box appears and tells you that the current page will not be saved.
5. Click OK.
You return to the original TextBridge screen.
Now you know how to find and open a sample document .tif file.
Proceed to the learning sessions to work with TextBridge, and familiarize yourself with using its capabilities.
SESSION 1: PROCESSING A SIMPLE DOCUMENT USING AUTO PROCESSING
TextBridge provides a range of powerful features. However, TextBridge is also designed to be very easy to use. For many documents, you can use default settings and automatically process a document.
For this learning session, use the sample document named Letter. This document has a single column of text and a logo.
In this session you’ll learn to:
• Use Auto Process
• Use the Start dialog box
• Select the Memo or Letter page type
• Open an image file
• Save a document after recognition
When you select Memo or Letter as the page type, TextBridge automatically specifies the following settings:
• Single column page layout
• Good print type
• Letter size
TextBridge also uses the following default settings:
• Scanner resolution 300 dpi
• Scanner brightness at normal
• English language
• Default user dictionary
• No training data
• No zone template
• Portrait orientation
• Retain page layout
• Retain pictures
• Format with paragraph styles
• One file for all pages
• Save in the Text Documents folder of the TextBridge folder
• Save as standard format
Refer to Chapter 2 and Help for more information about these settings.
When retain page layout and retain pictures are set, TextBridge recomposition keeps the layout of the original page when you save and open your document in a format for an application that supports recomposition, such as Word, WordPerfect, and Excel.
When retain page layout is selected, TextBridge recomposes the layout of the document, including text and pictures, while maintaining full editability in the output file. After recomposition, text, pictures, and tables are in the same position in relation to each other as in the original page when you print the page or look at it in layout view, if your word processor supports these elements.
With retain pictures selected, pictures are included in the final document. If retain page layout is not selected, pictures are placed at the end of the text rather than the same position in relation to the text that they were in the original page.
If format with paragraph styles is selected, text in the final document will be assigned specific styles with “TxBr” followed by two sets of numbers separated by a letter. The first number represents the number of times you have pasted into the application. The letter represents information about the style. The letter “c” means that the paragraph is centered, “p” means a regular paragraph, and “t” means a tab table. The second number represents a specific style. For example, TxBr_3c11 means that the number of consecutive Instant Access pastes into Word is 3, the paragraph is centered, and the style number is 11.
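The naming scheme described above can be decoded mechanically. The following sketch is not part of TextBridge; it assumes that every style name follows the pattern of the TxBr_3c11 example (an underscore after "TxBr") and that only the three letters described above occur. The names TxBrStyle and parse_txbr_style are made up for this illustration.

```python
import re
from typing import NamedTuple


class TxBrStyle(NamedTuple):
    paste_count: int   # number of Instant Access pastes into the application
    kind: str          # meaning of the letter code
    style_number: int  # the specific style


# Letter codes as described in this guide.
KINDS = {"c": "centered paragraph", "p": "regular paragraph", "t": "tab table"}

# Assumed pattern, modeled on the example name TxBr_3c11.
_PATTERN = re.compile(r"TxBr_(\d+)([cpt])(\d+)")


def parse_txbr_style(name: str) -> TxBrStyle:
    """Split a style name such as 'TxBr_3c11' into its three parts."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognized TxBr style name: {name!r}")
    count, letter, number = match.groups()
    return TxBrStyle(int(count), KINDS[letter], int(number))


print(parse_txbr_style("TxBr_3c11"))
```

For TxBr_3c11 the sketch reports a paste count of 3, a centered paragraph, and style number 11, which matches the worked example in the text.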
To process a simple document, use the following procedure:
1. Start TextBridge.
The TextBridge Pro 98 main window appears.
2. On the Process toolbar, click the Auto Process button.
The Start dialog box appears.
Figure 3–11. Start dialog box with Memo or Letter and Image file selected
3. In the Start dialog box:
• Click Memo or Letter in the Page type box.
• Select Image file in the Page source box.
• Click OK.
The Open dialog box appears.
Figure 3–12. Open dialog box with Letter.tif selected (callout: Select an image file)
4. In the Open dialog box, double-click the sample document, Letter.tif.
TextBridge reads the image file, and automatically performs OCR on it, as indicated by the feedback display in the view area of the main window. It stops to display the page for you to proofread. For this lesson, just continue and save the document.
5. Click the Save button to save the document.
The Save As dialog box appears.
Figure 3–13. Save As dialog box (callouts: Accept the default name, or type a new name; Select the output format; Click Save)
6. In the Save As dialog box, complete the following steps:
• In the Save in list, select the folder in which to save the text file.
• In the File name box, type a file name.
• In the Save as type list, select the output format for your word processor or other text application.
• Check that Retain pictures and Retain page layout are selected.
• Click the Save button.
TextBridge formats and saves the document.
The status bar at the bottom of the screen confirms that you have saved the document. The Proofread window remains open. To process another document, click the Auto Process button or the Get Page button.
Be sure to notice where the document is saved so that you can find it easily. The save location originally defaults to Text Documents in the TextBridge Pro 98 folder in Program Files. You can check or change the default on the Text Document tab of the Settings dialog box.
7. Open the file in your word processor or other text application.
Unless you specified otherwise, open the file in the Text Documents folder of the TextBridge Pro 98 folder in the Program Files folder. You can use the shortcut to this location.
The document is saved with an .rtf extension if your text application is Word. In the Open dialog box of your word processor, check that you can see files of this type listed.
Compare the recognized document in your word processor with the picture of the sample document, Letter.tif.
Figure 3–14. Letter sample document
With a word processor such as Word or WordPerfect in the page layout view, the recognized document should have the same or similar layout as the TIFF image or sample document. The difference is that now you have formatted, fully editable text, just as if you had typed it in yourself. At this point, you could spell check the document and make any other changes in your word processor.
SESSION 2: USING INSTANT ACCESS OCR
You can use TextBridge Instant Access OCR to run TextBridge from within another application, such as a word processor. To use Instant Access, simply start TextBridge from within an application, such as Word or WordPerfect. During Instant Access OCR, TextBridge processes a document then pastes it into the open document in your text application.
For this learning session, use the sample document named Markplan. This document has a single column of text, a title, headings, and bullet lists. The procedure is similar to processing a simple document.
In this session you’ll learn to:
• Use TextBridge Instant Access in your word processor.
• Select the Any Page page type.
When you select Any Page as the page type, TextBridge automatically specifies the following settings:
• Any page layout
• Any print type
• Letter size
TextBridge also uses the following default settings:
• Scanner resolution 300 dpi
• Scanner brightness at normal
• English language
• Default user dictionary
• No training data
• No zone template
• Portrait orientation
• Auto process all pages
• Retain pictures
• Retain page layout
• Format with paragraph styles
• One file for all pages
• Save as standard format
• Save in Text Documents folder
Refer to Chapter 2 and Help for more information about these settings.
If TextBridge is still running from the previous learning session, exit from TextBridge. This will let you run TextBridge from your word processor. You cannot have more than one copy of TextBridge running at the same time.
Before you run TextBridge as Instant Access, you may need to use the Instant Access Control Panel to choose which applications have Instant Access to TextBridge. TextBridge automatically provides Instant Access for the applications listed in the control panel, as shown by the check mark.
If you want to examine the status of Instant Access, click Start on the Windows taskbar, then Programs, then TextBridge Pro 98, and Instant Access Control Panel. You can also access the Instant Access Control Panel from the main TextBridge application in the File menu by clicking Instant Access Control Panel. Help provides additional information about Instant Access.
Figure 3–15. TextBridge Instant Access Control Panel (callouts: Check one or more programs; Click OK)
The Enable access to TextBridge list shows the text applications from which TextBridge can be invoked. The list includes applications commonly used with TextBridge and applications that are currently running. If your application does not appear in this list, close the TextBridge Instant Access Control Panel, start your application, and reopen the TextBridge Instant Access Control Panel. Your application should now appear in the list.
Click on applications in the list to check or uncheck them. Click All to check all items in the list. Click None to uncheck all items in the list. Instant Access to TextBridge will be available from all checked applications.
Click OK to close the Instant Access Control Panel and save any changes you specified.
To use Instant Access OCR from your word processor, use the following procedure:
1. Start your word processor, and open a new document.
2. In the File menu, click the TextBridge... command.
Figure 3–16. TextBridge... command in File menu
The Start dialog box appears. Notice that the Start dialog box is slightly different than the Start dialog box in the standalone version of TextBridge. Processing and Output boxes have been added.
Figure 3–17. Start dialog box for Instant Access
3. In the Start dialog box:
• In the Page type box, click Any Page.
• In the Page Source box, select Image file.
• In the Processing box, select Auto process all pages.
• In the Output box, select Retain pictures and Retain layout.
• Click OK.
If your application does not support Retain pictures and Retain layout, these will not be available to select.
The Open dialog box appears.
Figure 3–18. Open dialog box with Markplan.tif selected (callout: Select an image file)
4. In the Open dialog box, double-click the sample document, Markplan.
TextBridge reads the image file, and automatically performs OCR on it, as indicated by the feedback display in the view area of the main window. After acquiring and recognizing the page, TextBridge pastes the document into the open document in your word processor.
If TextBridge cannot do this, you will get a message that tells you, “TextBridge is unable to paste text automatically. Select Paste from the Edit menu to manually copy the recognized text from the clipboard.” You may have to open a new document before you can use the Paste command. After you follow these instructions, the processed document is pasted in the open document of your word processor.
Compare the recognized document in your word processor with the reproduction of the sample document, Markplan.tif.
Figure 3–19. Markplan sample document
With a word processor such as Word or WordPerfect in the page layout view, the recognized document should have the same or similar layout as the TIFF image or sample document. The difference is that now you have formatted, fully editable text.
If this document continues to a second page, delete any additional spacing that was inserted into the document.
You can save the document or make any changes you’d like to the document just as if you’d typed it yourself. For example, you can spell check it and save it with your changes.
SESSION 3: PROCESSING A COMPLEX DOCUMENT USING MANUAL PROCESSING
For more complex documents such as magazine articles, you often can use TextBridge in automatic mode. However, simply using a few additional steps in manual mode can sometimes produce a more accurate result in less time.
For this learning session, use the sample document named BookWise. This document has multiple columns, a dropped capital letter, headings, paragraphs, bullet lists, and reversed video text.
In this session you’ll learn to:
• Use manual processing with the Get Page button.
• Select Magazine Page type.
• Zone a page.
• Use the Zoom button.
• Add a word to the user dictionary.
• Proofread a page.
• Retain page layout.
• Save a page.
• Edit the document in your word processor.
Refer to Chapter 2 and Help to learn more about zoning and proofreading.
When you select Magazine Page as the page type, TextBridge automatically specifies the following settings:
• Multi-column page layout
• Good print type
• Letter size
TextBridge also uses the following default settings:
• Scanner resolution 300 dpi
• Scanner brightness at normal
• English language
• Default user dictionary
• No training data
• No zone template
• Portrait orientation
• Retain pictures
• Retain page layout
• Format with paragraph styles
• One file for all pages
• Save as standard format
• Save in Text Documents folder
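The three page types used in these sessions differ only in page layout and print type, while the remaining defaults are shared. The sketch below records that relationship as data. It is an illustration only: the key names, the PagePreset class, and the settings_for function are made up for this example and do not correspond to any TextBridge file format or interface. The Spreadsheet or Table page type is left out because its settings are not listed in these sessions.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PagePreset:
    page_layout: str
    print_type: str
    paper_size: str


# Settings that each page type specifies, as listed in Sessions 1 to 3.
PAGE_PRESETS = {
    "Memo or Letter": PagePreset("single column", "good", "letter"),
    "Any Page": PagePreset("any", "any", "letter"),
    "Magazine Page": PagePreset("multi-column", "good", "letter"),
}

# Defaults listed for all three page types (key names are hypothetical).
SHARED_DEFAULTS = {
    "scanner_resolution_dpi": 300,
    "scanner_brightness": "normal",
    "language": "English",
    "user_dictionary": "default",
    "training_data": None,
    "zone_template": None,
    "orientation": "portrait",
    "retain_page_layout": True,
    "retain_pictures": True,
    "format_with_paragraph_styles": True,
    "one_file_for_all_pages": True,
    "save_folder": "Text Documents",
    "save_format": "standard",
}


def settings_for(page_type: str) -> dict:
    """Combine the shared defaults with the chosen page type's preset."""
    return {**SHARED_DEFAULTS, **asdict(PAGE_PRESETS[page_type])}


print(settings_for("Magazine Page")["page_layout"])
```

Keeping the shared defaults separate from the per-type presets makes it easy to see that choosing a page type changes only the layout and print-type assumptions.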
Run the standalone version of TextBridge from the Start button for this learning session.
1. Start the TextBridge standalone version.
2. Click the Get Page button.
The Start dialog box appears.
Figure 3–20. Start dialog box with Magazine Page and Image file selected
3. In the Start dialog box:
Click Magazine Page in the Page Type box.
The settings are automatically set to multi-column page layout, good print type, and letter size. Note: If your text application does not retain page layout, the page will be a single column of text (also referred to as galley text) followed by pictures.
Click Image file under Page source.
The application knows where to get the page.
Click OK.
The Open dialog box appears.
4. In the Open dialog box, double-click BookWise.tif.
TextBridge gets the page, displays it, and displays the Preview toolbar so that you can preview it.
The page you see should be a two-column magazine article beginning with a drop cap.
If this is not the correct page, in the File menu, click New. Click OK to close the current document. You can begin again by selecting Get Page.
5. Click the Locate Zones button.
TextBridge automatically zones the page. TextBridge locates areas on the page to recognize and designates each area as text, table, or picture. TextBridge then stops for you to check and change the zones. All zones on this page are text zones.
Figure 3–21. Zoned magazine page (callouts: Preview toolbar, Text zone)
6. Check the results of automatic zoning.
There should be six text zones. The numbers on the text zones reflect the order in which these zones will appear in the final document, even if you decide not to retain page layout.
• Click the Zoom In and Zoom Out buttons to enlarge and reduce the page to examine the zones, if necessary.
TextBridge magnifies the page.
• Modify automatic zoning, if necessary.
If a zone is not assigned the desired type, right-click the zone. In the shortcut menu, move the pointer to Zone Type and click the type of zone you desire.
Reverse video text must be in a separate text zone that includes no regular text. If the reverse video text is not in one zone by itself with its own zone number, rezone the reverse video text.
One way to separate the reverse video text from the regular text is to use the Erase Markup tool. Determine which area of the page you want to include in the reversed video text zone.
To divide one zone into two zones:
• Click the Erase Markup button.
• Erase the area of the page that connects the regular text to the reversed video text.
Press and hold the mouse at the upper left corner of the area you want to erase. Drag the mouse diagonally across the area to erase. When you have defined the area, release the mouse. The area is erased and becomes white, which means it is no longer included in a zone.
When the zones are accurate, continue with the next step, which is page recognition.
7. Click the Recognize Page button.
TextBridge performs OCR and recognizes the page.
TextBridge stops for you to proofread the results of recognition and displays the Proofread toolbar. You can fix words not recognized correctly and add them to the dictionary for improved recognition.
Figure 3–22. Proofreading a page (callouts: Proofread toolbar, Suspect word)
9. Change any words that were not accurately recognized using the Proofread toolbar.
• Examine the word in the Suspect box.
If you want a closer look at the word as it appears in the original page, click the Word Image button on the Proofread toolbar.
• If the suspect word is the word you want, click the Accept button.
TextBridge removes the suspect highlighting and continues to the next suspect word.
or
• If the suspect word is not the word you want, type the word you want in the Should Be box.
• Click the Add to Dictionary button if you want the TextBridge dictionary to store a word for future recognition.
As discussed in the section on proofreading in Chapter 2, the dictionary is most useful for non-standard words that you frequently need to recognize, such as proper nouns and technical words.
• Click the Accept button.
TextBridge continues to the next suspect word.
or
• If you want to go on to the next suspect word, click the Find Next button.
TextBridge does not remove the suspect highlighting and finds the next suspect word.
• If you see a word in the text that you want to correct, click it. The word appears in the Suspect box, and you can edit it.
Repeat this process until you check every suspect word and either accept it as is or change it then accept it. You can save at any time.