The software described in this book is furnished under license and may
be used or copied only in accordance with the terms of such license.
IMPORTANT NOTICE
ScanSoft, Inc. provides this publication “as is” without warranty of any
kind, either express or implied, including but not limited to the implied
warranties of merchantability or fitness for a particular purpose. Some
states or jurisdictions do not allow disclaimer of express or implied
warranties in certain transactions; therefore, this statement may not
apply to you. ScanSoft reserves the right to revise this publication and to
make changes from time to time in the content hereof without obligation
of ScanSoft to notify any person of such revision or changes.
TRADEMARKS AND CREDITS
TextBridge is a registered trademark, and Smart Zones, Instant Access
OCR, and Custom Proof are trademarks, of ScanSoft, Inc., a Xerox
Company. Xerox, The Document Company, and the Stylized X are
trademarks of Xerox Corp.
Excel, Word, and Windows are trademarks of Microsoft Corp.
WordPerfect is a registered trademark of WordPerfect Corp.
Other terms used in this manual are the trademarks of their respective
holders.
Using the Help System
Using the Sample Documents
Session 1: Processing a Simple Document Using Auto Processing
Session 2: Using Instant Access OCR
Session 3: Processing a Complex Document Using Manual Processing
Session 4: Processing Text, Pictures, and a Table
Session 5: Training OCR and Using the Page toolbar
Where to Go From Here
INDEX
TextBridge Pro 98 User’s Guide
PREFACE
ScanSoft, Inc., a Xerox Company, welcomes you to TextBridge®
Pro 98 for Windows 95™ and Windows NT. (Hereinafter
TextBridge Pro 98 will be referred to as “TextBridge.”)
Before going on to find out more about TextBridge, please read
this preface because it describes these important items:
◆About this user’s guide
◆Related documentation
◆Technical support
ABOUT THIS USER’S GUIDE
This user’s guide includes introductory information designed
primarily for non-technical users as well as information designed
for more technical users. It assumes that you are familiar with
the management and operation of your computer and Windows.
The documentation that comes with TextBridge should provide
all the information you need to operate TextBridge. TextBridge
documentation includes this user’s guide, a Help system, and
Release Notes. ScanSoft invites your comments about the
information provided in the documentation. Please make sure to
register your software and provide any comments to ScanSoft.
Organization of this user’s guide
This user’s guide is designed as a reference tool to provide basic
information about TextBridge. It is organized as follows:
◆Chapter 1, “Introduction to TextBridge,” discusses TextBridge’s
features. It also describes: documents TextBridge can recognize,
what comes with TextBridge, supported scanners, system
requirements, installation, setting up Instant Access, uninstalling
TextBridge, and input and output file formats.
◆Chapter 2, “OCR and TextBridge,” provides an explanation of the
concepts of document recognition and OCR and the basic
functionality of TextBridge.
◆Chapter 3, “Learning to Use TextBridge,” walks you through
several practice sessions designed to provide a firm basis on
which to learn and use the important features of TextBridge.
This user’s guide also provides a comprehensive index for you to
quickly locate the information you need.
Documentation conventions
As described in Table P–1, TextBridge documentation uses
certain graphical elements and formatting to emphasize
information and give more meaning to text.
Table P–1. Documentation Conventions
bold            Introduces a new term or the first use of an
                important term in a chapter. Sometimes used
                to denote strong in-line emphasis.
italic          Denotes titles of other user’s guides or books
                and generic representations of file name entries
                in examples; for example, filename
monospace       Denotes text that appears on the computer
                screen such as examples, menu text, and
                messages plus actual file names.
“ ” (quotes)    Denotes titles of chapters and sections in this
                user’s guide.
☞               Introduces tips that provide useful information
                about a procedural step or system function.
Note            Introduces information of note about the
                current subject.
RELATED DOCUMENTATION
TextBridge provides a comprehensive set of printed and online
documentation designed to assist you in learning and operating
the product. The documentation provided with TextBridge covers
all aspects of installation and operation.
In addition to this TextBridge Pro 98 User’s Guide, refer to the
following documentation for more information:
◆Online Release Notes—After you install TextBridge, read the
online Release Notes first. These provide the most up-to-date
information about TextBridge. The Release Notes appear automatically
in the TextBridge 98 folder. Simply point to Release Notes in that
folder to open and read them.
◆Help—An extensive online Help system comes with TextBridge.
The Help provides you with information about the software in
general; the menus, commands, and tools; step-by-step
procedures; and a glossary.
◆TextBridge online electronic documentation—This includes
an electronic version of this TextBridge Pro 98 User’s Guide in
Adobe Acrobat format (.pdf). The documentation resides on the
compact disk in the directory TextBridgePro Documents.
Please refer to the Release Notes in that directory for information
about using the online documentation.
◆Multimedia Guided Tour—The Guided Tour provides you with
an introduction to TextBridge.
Note: You may need to refer to additional publications, such as the
manufacturer’s documentation for your scanner.
TECHNICAL SUPPORT
If you should experience problems with TextBridge that you
cannot resolve with the documentation and software, contact
TextBridge Technical Support.
You can contact TextBridge Technical Support over the Internet or
by telephone or fax.
The following information will help Technical Support solve the
problem:
◆Your software version number
(This is on the back of the CD-ROM case and in the Help menu
under About TextBridge.)
◆Your software serial number
(This is the serial number on the back of the TextBridge CD-ROM
case and in the Help menu under About TextBridge.)
◆Your scanner make and model
◆A description of the steps that led up to the problem
◆If TextBridge generated an error message, a verbatim description
of the error message or its number
Internet and electronic mail addresses
You can also contact Technical Support and get information about
TextBridge on the Internet at the addresses in the following list:
◆TextBridge site: www.textbridge.com
The TextBridge Web site provides a link to Technical Support
with Frequently Asked Questions, technical information
bulletins, and a problem report form.
E-mail in the United States, Canada, or the Pacific Rim:
E-mail from European countries and the Middle East:
◆Technical Support: uk_support@xis.xerox.com
◆Upgrade information: xisuk@xis.xerox.com
Telephone and fax numbers
Call one of the following telephone numbers or send a fax
describing the problem to one of the fax numbers.
In the United States, Canada, or the Pacific Rim:
☎ Telephone: 978–977–0764
Fax: 978–977–2434
From European countries and the Middle East:
Xerox Scansoft Ltd. in England:
☎ Telephone: +44 (0) 1923 209140
Fax: +44 (0) 1923 208446
1
INTRODUCTION TO TEXTBRIDGE
Welcome to ScanSoft’s TextBridge™ Pro 98, optical character
recognition (OCR) software for Microsoft Windows™ 95 and
Windows NT. (Hereinafter TextBridge Pro 98 will be referred to
as “TextBridge.”)
This chapter provides an introduction to TextBridge including:
◆Features and benefits
◆What comes with TextBridge
◆Scanners supported
◆System requirements
◆Installing TextBridge
◆Setting up TextBridge Instant Access
◆Uninstalling TextBridge
◆Input image file formats supported
◆Output text file formats supported
OCR is a technology that enables you to convert the paper
documents you use every day into fully editable text on your
computer. TextBridge even retains the layout of the original
document when possible.
You can use TextBridge to convert printed documents from fax
machines, photocopiers, and dot matrix and laser printers to
electronic documents for your word processor or text
application as well as documents for some database, desktop
publishing, and spreadsheet software. TextBridge OCR can also
recognize page image files from scanners as well as fax
machines and other sources.
FEATURES AND BENEFITS
Using Xerox’s latest document recognition technology, DocuRT™,
TextBridge OCR produces a fully-editable electronic document
that retains the original document layout, complete with text and
pictures (Figure 1–1). TextBridge understands your original
document format and keeps the layout the same, including
columns, headers, footers, pictures, and picture captions. This
feature is supported only if your text application supports
pictures and layout. For example, this feature is supported in
Microsoft Word and WordPerfect but not in Notepad.
Figure 1–1. TextBridge document recomposition (original document and
recomposed document in word processor)
TextBridge offers many productivity features. Whether you need
to capture a simple one-page letter, a magazine article, a
spreadsheet, or a long transcript, TextBridge can save you
valuable time and effort. In addition, TextBridge provides all the
capabilities that experienced OCR users have come to expect.
New Productivity features in TextBridge Pro 98
TextBridge offers these major features:
◆Improved OCR accuracy. Dramatically saves time and
eliminates retyping.
◆Instant Access™. You can start TextBridge within most
Windows text programs. After recognizing and converting the
page image to text, TextBridge then automatically pastes
recognized data (text and pictures) directly into the text
program’s open document.
◆Recomposition. TextBridge uses recomposition to retain your
original page layout. It reproduces multiple columns, cell tables,
and pictures in the same location as they are in your original
document.
• When you specify output to Microsoft Word™ or WordPerfect
format, TextBridge can retain the original document layout in
fully-editable form, even for pages containing tables, line art,
reverse video, drop caps, insets, and pictures. When you edit
the document, the original text flow is maintained.
• When you specify output to Microsoft Excel™ or Lotus 1-2-3
format, spreadsheets and cell tables retain their original
layout as cell tables, not tabbed columns. When you edit the
table information, the lines move to fit exactly as you would
expect.
TextBridge supports formats for the following applications
that retain page layout:
• Excel 3.0–5.0
• Excel Mac 3.0–7.0
• Excel Office 97
• HTML
• HTML Editor
• HTML Netscape
• Lotus 1-2-3
• Quattro Pro
• Word 6.0
• Word 7.0 in Office 95
• Word for Windows 2.x
• Word Office 97
• WordPerfect 6.1, 7.0, and 8.0
◆TextBridge wizard. An easy-to-use wizard guides you through
each step of the TextBridge process, including page type selection
and recomposition options.
◆Page type templates. TextBridge provides many predesigned
page type templates to make processing more efficient. Templates
automatically provide appropriate settings for the type of page
you want to process. For example, there is a magazine page type
and a letter page type that automatically activate settings for
improved results. Page types incorporate three page settings:
page size, page layout, and print quality. You do not have to go
through a complicated process of determining and specifying
settings for common types of pages.
◆Built-in Proofreader™. After document recognition, you can use
TextBridge’s built-in proofreader to view and accept or correct
any words that TextBridge suspects may not be recognized
accurately.
◆Automatic zoning and zone editing. TextBridge automatically
zones your page into text, picture, and table zones.
◆Zone editing. You can edit the automatic zones to further refine
the zoning. Use zone editing to increase the accuracy and
efficiency of page processing by reshaping zones and renumbering
them.
◆Adobe Acrobat PDF output. You can output the document in
Adobe Acrobat Portable Document Format (PDF), which can be
viewed on either a PC or Macintosh computer.
◆Dynamic OCR training. You can train OCR to improve
recognition accuracy as the job progresses. Use dynamic training
with difficult documents, such as faxes or multi-generation
photocopies. TextBridge enables you to interact with the OCR
process and view, then accept or correct, its automatic recognition
decisions. The software actually learns special symbols and
words.
◆Guided Tour. Multimedia introduction to TextBridge including
a guide to major features.
◆ToolTips and What’s This? Help. Instant context-sensitive
information about commands, dialog boxes, and buttons on the
interface.
Other TextBridge features
In addition to the features listed in the previous section,
TextBridge provides these other productivity features:
◆Windows 95 certification
◆Microsoft Office 97 certification
◆MMX support
◆Broad scanner support. TextBridge supports most popular
desktop scanners. It provides many built-in Image and Scanner
Interface Standard (ISIS) drivers supporting a number of
scanners. It supports the TWAIN device interface standard. It
also supports the Text Enhancement Technology and Auto Area
Segmentation features of the EPSON® scanner family. It also
supports some Windows NT scanners.
◆Image processing. TextBridge accepts a wide range of images
from a variety of sources for processing. Specifically, the program
imports and recognizes online document images in BMP, Delrina,
PCX, DCX, TIFF, and XIF formats originating from fax modems
and other sources. For more information, see the “Input Image
File Formats Supported” section in this chapter.
◆OLE drag and drop. TextBridge supports Windows OLE
standard drag and drop operations. For example, you can drag an
image from an OLE-compliant program onto TextBridge.
◆Clipboard support. TextBridge can import and recognize
images from the Clipboard.
◆Deferred processing. TextBridge enables you to scan all pages
of a document to a TIFF file, then later open the image file for
document recognition.
◆Output text formats including HTML. TextBridge supports a
number of output text formats, including word processor, desktop
publishing, portable document, spreadsheet, HTML, and
database formats. Now you can process your text for publication
on the Web. For more information, see the “Output Text File
Formats Supported” section in this chapter.
◆Preview with manual zoning. TextBridge provides a set of
tools for previewing page images before processing them. You can
view a page before continuing with processing. You can manually
define areas of page images as zones to be processed and capture
only the text, tables, or pictures you want. You can also edit the
automatic zoning by adjusting the text, table, and picture zones.
◆Zone templates with re-usable data. After you create a set of
zones, TextBridge lets you save and reload the zone templates for
new jobs. In this way you can consistently process or ignore
specific areas on the same type of pages and save time.
◆Re-usable training data. After you interactively train OCR,
you can save the training data in a file. You can reload this
training file for similar documents of the same page type. Using
this training file assures the highest recognition accuracy without
having to repeat the training.
◆Custom dictionaries. To improve recognition accuracy further,
you can create specialized word lists (scientific terminology,
proper names, acronyms, and so on) within TextBridge or in
ASCII files and load them into TextBridge.
◆Two-sided document processing. If your scanner has a sheet
feeder, you can scan the fronts (odd sides) of the pages first, then
flip the stack and scan the reverse (even) sides. When scanning
and recognition are complete, TextBridge automatically collates
the text.
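The collation step in two-sided processing can be pictured with a short sketch. This is only an illustration, not a description of how TextBridge works internally. It assumes that flipping the stack reverses its order, so the back of the last sheet is scanned first; the function name collate_two_sided is hypothetical.

```python
from typing import List, TypeVar

T = TypeVar("T")


def collate_two_sided(fronts: List[T], backs_as_scanned: List[T]) -> List[T]:
    """Interleave front and back sides into reading order.

    Assumption: the stack was flipped as a whole before the second pass,
    so the back sides arrive in reverse sheet order.
    """
    if len(fronts) != len(backs_as_scanned):
        raise ValueError("each sheet needs one front side and one back side")
    # Undo the reversal caused by flipping the stack.
    backs = list(reversed(backs_as_scanned))
    pages: List[T] = []
    for front, back in zip(fronts, backs):
        pages.extend([front, back])
    return pages


if __name__ == "__main__":
    fronts = ["page 1", "page 3", "page 5"]
    backs = ["page 6", "page 4", "page 2"]  # order after flipping the stack
    print(collate_two_sided(fronts, backs))
```

If your sheet feeder delivers the back sides in the same order as the fronts, the reversal step would not apply.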
With these features, you can import virtually any paper
document or document image file to your computer. TextBridge
attains the highest degree of OCR accuracy and provides the
output in fully editable form in your favorite text program.
Characteristics of Documents TextBridge can recognize
TextBridge includes a number of advances developed by the
Xerox Desktop Document Systems (DDS) division and by the Palo
Alto Research Center (PARC), where modern computer interfaces
were invented.
Consequently, TextBridge provides highly accurate OCR and
format retention results on the widest range of documents.
TextBridge can recognize:
◆Documents printed on typewriters, phototypesetters, and impact,
ink-jet, dot-matrix, and laser printers
◆Photocopied, degraded, or dirty documents
◆Documents with single- or multiple-column layouts
◆Spreadsheets or cell tables
◆Paper documents with black and white, grayscale, or color
pictures including photos and line art
◆Page image files with black and white pictures
Note: After processing with TextBridge, all pictures are output as black
and white or grayscale. However, TextBridge can recognize and
retain color and grayscale pictures in XIF files.
◆Online single- or multiple-page images from fax modems and
other sources
◆Hard-copy faxes
◆Documents with point sizes ranging from 5-point to 72-point type
in practically any typeface
◆TextBridge software in English, French, German, Italian, or
Spanish
◆Documents composed in English, French, German, Italian, or
Spanish
Note: TextBridge versions shipped in international markets can
recognize an even greater number of languages: Danish, Dutch,
Finnish, Norwegian, Portuguese, and Swedish.
WHAT COMES WITH TEXTBRIDGE
TextBridge comes with the following items:
◆One installation CD-ROM. The CD-ROM includes software
programs, scanner drivers, language packs, sample page image
files, release notes, Help, and User’s Guide in PDF format.
◆A printed User’s Guide.
◆A software registration card.
Note: Be sure to register electronically or complete and return the
software registration card. Registration entitles you to free
customer support and assures that you are kept up-to-date on
new software releases and other information related to
TextBridge and the ScanSoft family of software.
In the US, the mailing address is on the registration card. In the
UK, the mailing address for registration is:
XIS Support Department (PSC)
Willow Grange
Church Road
Watford
Hertfordshire
WD1 3QA
Check to be sure you have all the items listed above. If any item
is missing from your TextBridge package, call your authorized
ScanSoft dealer.
For information about contacting ScanSoft, refer to the Preface of
this manual or the Help system.
SCANNERS SUPPORTED
Note: Install your scanner before you install TextBridge.
TextBridge works with many popular desktop scanners using:
◆Built-in ISIS drivers provided by Pixel Translations Inc.
◆The TWAIN standard, which lets TextBridge work with virtually
any fully TWAIN-compliant device that provides a binary image
in a supported size and resolution.
◆Text Enhancement Technology and Auto Area Segmentation
features of the EPSON® scanner family.
The full list of scanners supported by TextBridge is always
growing. Check the online Release Notes and the TextBridge Web
site at www.textbridge.com to find the latest list of supported
scanners. If your scanner is not in this list, call ScanSoft to
check further whether your scanner is supported.
Scanners require a system-level driver or a TWAIN source driver,
which is provided by the scanner or interface card manufacturer.
Consult the scanner documentation for details about installing
your scanner, interface card, and driver.
After installing your scanner, test that the scanner is functioning.
Refer to the scanner manufacturer’s documentation to answer
any questions about the scanner.
Note: Your scanner must be working independently of TextBridge prior
to connecting it to TextBridge.
In general, it is recommended that you turn on your scanner
before you turn on your PC.
Next, install the TextBridge software.
SYSTEM REQUIREMENTS
To install and run TextBridge 98, your Windows-compatible PC
must be equipped with the following:
◆An Intel (or compatible) 80486 or Pentium™ microprocessor
◆VGA, SVGA, or multi-sync color monitor
◆A minimum of 16 megabytes of random access memory (RAM) for
Windows 95 and Windows NT
◆Microsoft Windows™ 95 or Windows NT 4.0
◆A hard disk with a minimum of 20 megabytes (20 MB) of free
space in which to install TextBridge. This enables installation of
all TextBridge software and one language pack. Please allow one
megabyte (1 MB) for each additional language pack you intend to
install.
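The disk space requirement above is simple arithmetic: 20 MB covers the software and one language pack, and each additional language pack adds 1 MB. The following sketch works that out. The function and constant names are hypothetical, and the rule that at least one language pack is always installed is an assumption made for the example.

```python
BASE_INSTALL_MB = 20        # all TextBridge software plus one language pack
EXTRA_LANGUAGE_PACK_MB = 1  # each additional language pack


def required_disk_mb(language_packs: int = 1) -> int:
    """Return the minimum free disk space in MB for the given number of language packs."""
    if language_packs < 1:
        # Assumption: the base figure always includes one language pack.
        raise ValueError("the base installation includes one language pack")
    return BASE_INSTALL_MB + EXTRA_LANGUAGE_PACK_MB * (language_packs - 1)


if __name__ == "__main__":
    for packs in (1, 3, 5):
        print(f"{packs} language pack(s): {required_disk_mb(packs)} MB")
```

For example, installing all five language packs listed for this version would need 24 MB under these figures.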
INSTALLING TEXTBRIDGE
After you have installed your scanner and checked that it is
working, you are ready to install TextBridge software.
This section provides procedures to install TextBridge.
Note: If you want TextBridge to run on both Windows 95 and NT with a
dual boot, install TextBridge twice.
Before you begin installation, exit from any open applications so
that only Windows is running. There should be no applications
listed in the task bar and no floating toolbars.
To install TextBridge for Windows 95/NT 4.0:
1. Insert the TextBridge CD into your CD-ROM drive.
An autorun program on the CD-ROM launches the TextBridge
setup program automatically. (You can also use the Windows
Explorer to open the drive, and double-click the autorun.exe
program at the top level of the CD-ROM.)
The TextBridge 98 Setup dialog box appears (Figure 1–2).
Figure 1–2. TextBridge Pro 98 Setup dialog box
2. Read the information in the Setup dialog box, then click
Next.
3. Read the Software License Agreement (Figure 1–3), then
click Yes to proceed with installation.
Figure 1–3. Software License Agreement dialog box
☞ If you click No because you do not accept the license
agreement, the TextBridge setup closes.
4. The Setup Installation Type dialog box appears
(Figure 1–4). Complete this screen, as follows.
Figure 1–4. Setup Installation Type dialog box
•Choose one type of installation:
• Typical (Recommended for most users.)
• Compact (Used for minimum installation.)
• Custom (Used to select OCR language packs to install.)
•Accept the default destination directory, or Browse for a
different directory. (It is recommended that you install
TextBridge in the default directory.)
•Click Next to install the TextBridge files onto your
destination disk directory.
5. After the files are copied onto your system, the installation
of TextBridge is complete.
All TextBridge program files are now installed.
6. A Scanner Setup dialog box asks if you want to set up your
scanner. Select Yes (Figure 1–5).
Figure 1–5. Scanner Setup dialog box
Click No if you are not using a scanner or are not sure at this
time.
7. The Scanner Setup dialog box appears (Figure 1–6).
Complete this screen as follows:
Figure 1–6. Scanner Setup dialog box
• If you will not be using TextBridge with a scanner, select No
scanner, then click OK. If you want to use a scanner at a later
time, you can open this dialog box by choosing Select Scanner
from the File menu.
• Select your scanner, then click OK.
• If applicable, click Configure to further define your scanner
configuration. Refer to your scanner documentation for details
about scanner configuration settings.
For some scanners, a dialog box appears that lets you define
settings including Port Address, SCSI ID Number, Transfer
Mode, and Scanning Speed.
For other scanners, a dialog box appears with the following
message:
This scanner’s configuration is set using the
system-level driver.
When you are finished specifying scanner configuration settings,
click OK to save the new settings.
• Click OK in the Scanner Setup dialog box.
8. Complete the electronic registration.
Follow the instructions in the registration dialog box.
If your PC is not set up for electronic registration, please fill in
the registration card, and mail it.
9. The Setup Complete dialog box appears. Specify when you
want to restart your PC, then click OK.
Figure 1–8. Setup Complete dialog box
☞ Restarting is necessary to complete TextBridge setup. It is
recommended that you restart immediately. However, if you
want to perform other activities before restarting, you can
click No.
Congratulations! TextBridge setup is now complete, and your
new software is installed on your PC.
SETTING UP TEXTBRIDGE INSTANT ACCESS
When you restart your PC, you can use the TextBridge Instant
Access Control Panel dialog box to set up Instant Access (Figure
1–9). To set up TextBridge Instant Access from your other
programs, use the following procedure:
1. On the Windows task bar, click Start.
2. Point to Programs, then point to the TextBridge Pro 98
folder.
3. Click Instant Access Control Panel.
The TextBridge Instant Access Control Panel dialog box appears.
Figure 1–9. TextBridge Instant Access Control Panel
4. Check one or more programs in the list.
5. Click OK.
More information is available if you click the Help button in the
dialog box.
TextBridge will now be available in the File menu of the
program(s) you checked if they are installed on your PC.
UNINSTALLING TEXTBRIDGE
To restore your PC to the state it was in before you installed
TextBridge, use the following procedure:
1. Close all active applications, including TextBridge.
2. On the Windows task bar, click Start.
3. Point to Programs, then point to the TextBridge Pro 98
folder.
4. Click TextBridge Uninstall.
The TextBridge Uninstall dialog box appears.
5. Click Yes to continue the uninstall process.
TextBridge automatically uninstalls.
Click No to exit the uninstall process.
6. The Uninstall Complete dialog box appears. Click OK to
restart your computer.
With the above steps completed, TextBridge is completely
uninstalled from your PC.
INPUT IMAGE FILE FORMATS SUPPORTED
The source of page images for TextBridge can be your scanner or
it can be image files. TextBridge can recognize the following types
of image file formats:
Image File Format                              File Name Extension
Windows bitmap                                 .bmp
PCX                                            .pcx
Multi-page PCX used in some fax programs       .dcx
Tag image file format (including Alacrity)     .tif, .ala
All the previous image files must be black and white with the
exception of .xif, which can contain color or grayscale images.
TextBridge can process images in resolutions from 72 to 900 dots
per inch. However, you will not receive noticeably better OCR on
images with resolutions higher than 400 dpi. In addition, you
may encounter memory errors or at least slower processing time.
It is recommended that you scan at 300 or 400 dpi.
Note: This list is subject to change. Refer to the online Release Notes
for the latest information.
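The format and resolution guidance in this section can be expressed as a small pre-flight check that you might run on image files before opening them. This is a sketch only. The function check_input and its warning texts are hypothetical and are not part of TextBridge; the extension list combines the table above with the .xif format mentioned in the text.

```python
import os

# Extensions from the table above, plus .xif as mentioned in the text.
SUPPORTED_INPUT_EXTENSIONS = {".bmp", ".pcx", ".dcx", ".tif", ".ala", ".xif"}

MIN_DPI = 72            # lowest resolution TextBridge can process
MAX_DPI = 900           # highest resolution TextBridge can process
HIGH_DPI_THRESHOLD = 400  # above this, no noticeable OCR improvement


def check_input(path: str, dpi: int) -> list:
    """Return a list of warnings for an image file; an empty list means no concerns."""
    warnings = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_INPUT_EXTENSIONS:
        warnings.append(f"unsupported extension: {ext or '(none)'}")
    if not MIN_DPI <= dpi <= MAX_DPI:
        warnings.append(f"{dpi} dpi is outside the {MIN_DPI}-{MAX_DPI} dpi range")
    elif dpi > HIGH_DPI_THRESHOLD:
        warnings.append(
            f"{dpi} dpi: no noticeable OCR gain above {HIGH_DPI_THRESHOLD} dpi; "
            "memory errors or slower processing are possible"
        )
    return warnings


if __name__ == "__main__":
    for name, dpi in [("letter.tif", 300), ("fax.dcx", 600), ("photo.gif", 300)]:
        print(name, dpi, check_input(name, dpi) or "OK")
```

The check looks only at the file name and a resolution you supply. It does not open the image or verify that it is black and white.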
OUTPUT TEXT FILE FORMATS SUPPORTED
TextBridge can convert its recognized text to files for the
following programs:
Program                                     File Name Extension
Adobe PDF/Normal                            .pdf
Adobe PDF/Image Only                        .pdf
Adobe PDF/Image and Text Only               .pdf
Ami Pro 2.0 and 3.0                         .sam
ASCII Smart, Standard, and Stripped         .txt
dBase IV                                    .dbf
DCA/RFT                                     .rft
DisplayWrite 5                              .rft
Excel 3.0, 4.0, and 5.0                     .xls
Excel for the Macintosh 3.0 to 7.0          .xls
Excel 97                                    .xls
FrameMaker                                  .mif
HTML                                        .htm
HTML Editor                                 .htm
HTML Netscape                               .htm
Interleaf                                   .wps
Lotus 1-2-3                                 .wk1
Lotus Word Pro                              .lwp
MS Works                                    .rtf
MultiMate Advantage                         .doc
PostScript                                  .ps
Professional Write 2.0 and 2.2              .doc
Quattro Pro for Windows                     .wb2
Rich Text Format                            .rtf
RTF for the Macintosh                       .rtf
Windows Write                               .wri
Word for Windows 2.x                        .rtf
Word 6.0 and 7.0                            .rtf
Word 97                                     .rtf
WordPerfect 4.2 and 5.1                     .wpf
WordPerfect 6.0, 6.1, and 7.0               .wpd
WordStar                                    .doc
Note: This list is subject to change. Refer to the Release Notes for the
latest information.
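Several programs in this table share a file name extension, so the extension alone does not tell you which output format a file was saved in. The sketch below groups a subset of the table by extension to make this visible. The dictionary holds only some of the rows, and the function name programs_by_extension is hypothetical.

```python
from collections import defaultdict

# A subset of the table above: program name -> file name extension.
OUTPUT_FORMATS = {
    "Adobe PDF/Normal": ".pdf",
    "Excel 97": ".xls",
    "HTML": ".htm",
    "Lotus 1-2-3": ".wk1",
    "MultiMate Advantage": ".doc",
    "Rich Text Format": ".rtf",
    "Word 6.0 and 7.0": ".rtf",
    "Word 97": ".rtf",
    "WordPerfect 6.0, 6.1, and 7.0": ".wpd",
    "WordStar": ".doc",
}


def programs_by_extension(formats: dict) -> dict:
    """Group program names by the extension of the file they are written to."""
    grouped = defaultdict(list)
    for program, extension in formats.items():
        grouped[extension].append(program)
    return {extension: sorted(names) for extension, names in grouped.items()}


if __name__ == "__main__":
    for extension, programs in sorted(programs_by_extension(OUTPUT_FORMATS).items()):
        print(extension, "->", ", ".join(programs))
```

When you keep output files for several programs in one directory, a naming scheme that records the target program may help you tell them apart.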
WHERE TO GO FROM HERE
To learn how TextBridge recognizes a document and how you
prepare TextBridge to do this, read Chapter 2. This chapter
explains the basic concepts and functions of the software.
To learn how you use TextBridge to process simple and complex
documents, refer to Chapter 3. It also explains how to view, zone,
train, and proofread your document in TextBridge and edit your
document in your word processor.
The online Help system provides a complete reference to the user
interface, including window areas, menus, commands, and tools
as well as overview information and features, step-by-step
procedures for using the software, tips, and a glossary.
2
OCR AND TEXTBRIDGE
This chapter provides information about the process of page
recognition. Use this chapter to learn about optical character
recognition (OCR), recomposition, and other concepts that
will help you use TextBridge effectively.
It covers the following topics:
◆What is TextBridge OCR?
◆Running TextBridge
◆TextBridge functionality
◆Before you start to OCR
◆Using TextBridge to OCR
◆Automatic processing
◆Manual processing
◆Improving Page Recognition with Settings
◆Improving OCR with Training
Page recognition refers to the process during which a page is
analyzed, and characters and words are saved as text. Optical
character recognition is the technology that converts
documents that you can read into documents that your computer
can read. Recomposition is the technology that reproduces the
formatting of text and the layout of the page, including the
positioning of text, pictures, and tables.
WHAT IS TEXTBRIDGE OCR?
TextBridge is OCR software that turns paper documents or page
image data into image documents then into text documents on
your PC. Page image data is electronic information about the
pages of a document that comes from a source such as your
scanner or fax software. This data becomes an image document
and is stored in an image file. Text documents are files
containing information about the text and pictures in your
document. A text document contains one or more pages and is
expressed in text form and stored in a text file. You can open,
edit, reformat, and republish this information.
Page types
TextBridge can recognize a wide variety of pages. All you need to
do is specify settings to control page processing. To make this
easier and quicker, TextBridge gives you a set of page types.
These are common types of pages that you match with your
original pages. Each page type comes with default settings that
are most often used to process pages of that type. Using page
types makes it quick and easy for you to perform page
recognition. You just select the page type that most closely
matches your original page.
Figure 2–1. Page types in Start dialog box
Page types incorporate three settings: page size, print type, and
page layout. The page types to choose from and their
characteristics are described in the following table:
Page Type                Page Size   Print Type   Page Layout
Any Page                 Letter      Any          Any
Any Page (Fax quality)   Letter      Fax          Any
Legal Document           Legal       Good         Single column
Magazine Page            Letter      Good         Multi-column
Memo or Letter           Letter      Good         Single column
Newspaper                Legal       Newspaper    Multi-column
Spreadsheet or Table     Letter      Good         Table and one-column text
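The page type table can be read as a simple lookup from a page type to its three default settings. The sketch below is illustrative only: `PageTypeDefaults`, `PAGE_TYPES`, and `defaults_for` are hypothetical names, not part of TextBridge, and falling back to "Any Page" for an unknown type is an assumption. The values are transcribed from the table.

```python
from typing import NamedTuple


class PageTypeDefaults(NamedTuple):
    page_size: str
    print_type: str
    page_layout: str


# Hypothetical lookup table; values transcribed from the page type table.
PAGE_TYPES = {
    "Any Page": PageTypeDefaults("Letter", "Any", "Any"),
    "Any Page (Fax quality)": PageTypeDefaults("Letter", "Fax", "Any"),
    "Legal Document": PageTypeDefaults("Legal", "Good", "Single column"),
    "Magazine Page": PageTypeDefaults("Letter", "Good", "Multi-column"),
    "Memo or Letter": PageTypeDefaults("Letter", "Good", "Single column"),
    "Newspaper": PageTypeDefaults("Legal", "Newspaper", "Multi-column"),
    "Spreadsheet or Table": PageTypeDefaults(
        "Letter", "Good", "Table and one-column text"
    ),
}


def defaults_for(page_type: str) -> PageTypeDefaults:
    """Return the default settings for a page type.

    Unknown page types fall back to "Any Page" (an assumption made
    for this sketch, not documented TextBridge behavior).
    """
    return PAGE_TYPES.get(page_type, PAGE_TYPES["Any Page"])
```

Choosing the closest page type therefore amounts to picking the row whose size, print type, and layout best describe the original page.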
In addition to the settings connected with page types, there are
several other settings that control page processing. These have
default settings. The scanner resolution is set at 300 dpi. Page
layout, pictures, and paragraph styles are automatically retained.
You can modify these settings by using the Page Type tab in the
Settings dialog box.
OCR and TextBridge Pro2–3
Figure 2–2. Page Type tab in Settings dialog box
Page sources
You can get pages to process from your scanner or from page
images. Use your scanner as a source to input documents on
paper to TextBridge, which then takes the scanned images,
performs OCR, converts the recognized text and pictures to the
text file format of your choice, and stores it on your PC.
Alternatively, use TextBridge to recognize and convert page
images stored in image files that come from fax modems or other
sources. Refer to Figure 2–1, which illustrates where to select
your page source.
Recomposition
TextBridge recomposition lets you keep the layout of the
original page. When you select Retain page layout, TextBridge
recomposes the layout, while maintaining full ability to edit in
the output file. After recomposition, text, pictures, and tables are
in the same position in relation to each other as in the original
page. You can see the results of recomposition when you print the
page or look at it in layout view, if your word processor supports
these elements.
Retain pictures keeps pictures in the saved document. If you do
not select retain page layout, pictures are saved at the end of the
document. If you select retain page layout, the pictures are in the
same position in relation to the text and each other as they were
in the original page when you print the page or view it so that
you can see the layout.
Format with paragraph styles makes it possible for you to see
the specific styles assigned to text by TextBridge. Formatting
styles include indentation, font size and style, underline, bold,
and italic. Paragraph styles have names that begin with “TxBr”
followed by two sets of numbers separated by a letter. The first
number represents the number of times you have pasted into the
application. The letter represents information about the style.
The letter “c” means that the paragraph is centered, “p” means a
regular paragraph, and “t” means a tab table. The second number
represents a specific style. For example, in the Style box in Word,
you can see and use the styles assigned by TextBridge such as
TxBr_3c11. The number of consecutive Instant Access pastes into
Word is 3, the paragraph is centered, and the style number is 11.
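The naming scheme for paragraph styles can be illustrated with a small parser. This is a sketch under stated assumptions: the pattern is inferred from the single example TxBr_3c11, so the underscore after "TxBr" and the restriction to the letters c, p, and t are assumptions, and `parse_style_name` is a hypothetical helper rather than anything supplied with TextBridge.

```python
import re
from typing import NamedTuple

# Pattern inferred from the example "TxBr_3c11"; the underscore is an
# assumption based on that one example.
STYLE_RE = re.compile(r"^TxBr_(\d+)([cpt])(\d+)$")

# Meaning of the letter, as described in the text.
KINDS = {"c": "centered paragraph", "p": "regular paragraph", "t": "tab table"}


class StyleName(NamedTuple):
    paste_count: int   # number of pastes into the application
    kind: str          # what the letter stands for
    style_number: int  # the specific style


def parse_style_name(name: str) -> StyleName:
    """Split a TextBridge-style paragraph style name into its parts."""
    match = STYLE_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized style name: {name!r}")
    pastes, letter, number = match.groups()
    return StyleName(int(pastes), KINDS[letter], int(number))
```

For example, `parse_style_name("TxBr_3c11")` yields a paste count of 3, a centered paragraph, and style number 11, which matches the example in the text.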
TextBridge can output formatted text and graphics to word
processor and spreadsheet file formats, as well as some database,
desktop publishing, and electronic document file formats.
It is important to note that in reconstructing the layout of the
original document, TextBridge is limited by the composition
capabilities of the text application. For example, there are some
complex magazine pages originally created with a publishing
program for which you will not get identical output in your word
processor. Even the most powerful word processors do not have
some of the composition capabilities of publishing software.
In addition, some complex, free-form layouts defeat TextBridge’s
recomposition capabilities. For these types of documents, it is
often best to preview pages and manually zone text and image
zones that you want to capture.
For some documents, you may want only the text in simple galley
(one-column) form. In this case, you would not want to retain the
layout. The output document will have a single column of all the
text in the original document. If you choose to retain paragraph
styles, the text formatting but not the page layout will be
retained. For example, the final document will have paragraphs
and headings in styles like the original document and in the
order of the original document. If you choose to retain pictures,
the pictures will be at the end of the document. If you use zone
ordering, you can number the zones in the order in which you
want them to be in the final document.
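The galley behavior described above can be sketched as a small ordering function. This is only an illustration: `galley_order` and the (kind, content) pairs are hypothetical, and the sketch assumes the zones arrive already in reading order or in the order you numbered them.

```python
def galley_order(zones, retain_pictures=True):
    """Order zones for single-column output when layout is not retained.

    `zones` is a list of (kind, content) pairs, where kind is "text",
    "table", or "picture". The representation is an assumption made
    for this sketch.
    """
    body = [zone for zone in zones if zone[0] != "picture"]
    pictures = [zone for zone in zones if zone[0] == "picture"]
    # Without page layout, retained pictures go to the end of the
    # document; pictures that are not retained are left out.
    return body + pictures if retain_pictures else body
```

With retained pictures, a page zoned as text, picture, text comes out as text, text, picture.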
Note: TextBridge is not designed to recognize and retain the layout of
forms, including forms designed with fill-in-the-blank fields, check
boxes, or vertical and horizontal lines separating fields of
information.
RUNNING TEXTBRIDGE
You can run TextBridge as a standalone application or invoke it
from within an application with Instant Access.
Standalone Application
When you start TextBridge from the Start menu, it operates as a
standalone application and runs independently of any other
application. TextBridge recognizes pages and saves them in the
output format that you specify. You can then open the file in the
application that uses the format you specified.
Instant Access
Instant Access gives you direct access to TextBridge from
applications such as Microsoft Word. Programs with Instant
Access have a TextBridge command in the File menu. Clicking
TextBridge in the File menu starts TextBridge, which recognizes
pages and pastes them directly into the open document in the
program.
You operate TextBridge just as if you had started the standalone
TextBridge software. Running TextBridge through Instant Access
differs from running it standalone in the following ways:
◆The options to process automatically or manually and to retain
pictures and retain layout are available in the Start dialog box.
◆Deferred processing is not available.
◆TextBridge automatically determines which format to use based
on the application being used.
The Instant Access Control Panel, which is available from the
Start menu, enables you to specify which programs have Instant
Access to TextBridge. The programs in Figure 2–3 automatically
have Instant Access.
Figure 2–3. Instant Access Control Panel
TEXTBRIDGE FUNCTIONALITY
You can perform the activities in the following list with
TextBridge:
◆Select a page type for the type of page to process.
◆Select page type settings for the entire document or change
settings on a page-by-page basis.
◆View the scanned page and delete the page if desired.
◆Scan pages but defer OCR until later.
◆View the text and picture zoning.
◆Adjust the zoning, use a zone template, or entirely redo the
zoning.
◆Select a portion of a page to process.
◆Train OCR on one or more pages.
◆View the OCR results.
◆Proofread the OCR results.
◆Save the recognition results in one or more file formats.
BEFORE YOU START TO OCR
The following checklist will take you through the most important
questions to ask before you start to process a document.
1. Is my document coming from my scanner or an image?
2. What type of page is the document?
3. Is this document a good candidate for OCR?
4. Do I want to retain the original layout of the document?
5. Does the original document have pictures? If so, do I want to
retain the pictures?
6. Do I want to save the document as a PDF document?
7. Do I want to stop to train OCR?
8. Are there any other settings I want to check and change?
The rest of this chapter provides information that helps you to
answer these questions.
USING TEXTBRIDGE TO OCR
The next two sections provide information on automatic OCR
processing as well as other more advanced processes that are
options in manual processing. Refer to the Help system for the
step-by-step procedures for these activities.
TextBridge provides an easy-to-use interface and a powerful set
of built-in capabilities. You can use it in a number of ways to do
OCR, depending on the complexity of the document to be
recognized. You can use TextBridge to OCR in automatic mode or
manual mode.
◆You can process all the pages automatically in the automatic
mode or interact with the process in the manual mode.
◆You can preview and zone pages before OCR.
◆You can train OCR to achieve the highest possible accuracy.
AUTOMATIC PROCESSING
When you use TextBridge’s automatic processing, TextBridge
processes pages automatically with very little interaction with
you. In automatic mode, once you select the page type,
TextBridge automatically recognizes your page(s). TextBridge
only stops for you to add more pages and to proofread the results
of recognition. You may request to train OCR, but this is optional.
The following steps describe the automatic process of using
TextBridge for page recognition. Refer to the Help system for the
step-by-step procedures for these activities.
1. Click the Auto Process button.
2. Select a page type.
3. Select the source of the page image, either your scanner or
image file.
4. Click OK.
5. TextBridge processes all the pages in your scanner or the
selected image file(s).
6. If scanning, click the More Pages button to add another
page to the final document. (Optional)
7. If scanning, click the No More button when there are no
more pages to add.
8. TextBridge recognizes the page(s) including character,
picture, and format recognition.
9. Proofread the results of character recognition.
10. Save the text and picture(s) in a file format of your choice.
During the automatic process, you will interact with these
screens.
Figure 2–4. Click the Auto Process button in the TextBridge
window
Figure 2–5. Start the automatic process using the Start dialog
box
Figure 2–6. Add Pages to Scanner dialog box (click No More to
proceed when ready)
Figure 2–7. Proofread the results of recognition using the
Proofread toolbar (click Save when finished proofreading)
Figure 2–8. Save the document using the Save As dialog box
MANUAL PROCESSING
TextBridge is powerful OCR software that enables you to get
professional results from page recognition. Page recognition is a
complex process, and it can require your interaction with
TextBridge to get the best output. There are a number of
opportunities during the page recognition process that allow you
to enhance the results for the particular document and future
similar documents. During manual mode, you lead TextBridge
through processing a document.
During manual processing you can stop the page recognition
process to perform the activities in the following list:
◆Preview the page
◆Zone the page manually
◆Proofread the document
The following steps describe the manual process of using
TextBridge for page recognition. Refer to the Help system for the
step-by-step procedures for these activities.
1. Click the Get Page button.
2. Select a page type.
3. Select the source of the page image, either your scanner or
an image file.
4. Click OK.
5. TextBridge processes the first page image.
6. Preview the page, including zoning.
7. Click the Recognize Page button.
8. TextBridge recognizes the page, including character,
picture, and format recognition.
9. Proofread the results of recognition.
10. Add more pages to the document. (Optional)
11. Save the text and picture(s) in a file format of your choice.
Each of these activities is explained in the next sections.
Selecting Page Type and Source
When you start processing a new document, the TextBridge —
Start dialog box appears, and you can perform the actions in the
following list:
◆Indicate whether pages are from your scanner or an image file.
◆Select the Page type that best matches your original page(s).
◆View and change the settings for the page type you selected.
Figure 2–9. Start the manual process using the Start dialog box
Note: You can use the optional Page toolbar to select the page type and
source rather than the Start dialog box. From the Toolbars dialog
box in the View menu, choose the Page toolbar.
Figure 2–10. Page toolbar
Previewing the Page
After TextBridge gets a page of a document and before it begins
page recognition, you preview the page. You commonly use
Preview to check the contents, brightness, orientation, and
quality of the page and delete unwanted pages from the
document. After you check the page, zone it.
Figure 2–11. Preview the page using the Preview toolbar
Processing stops after TextBridge gets each page and displays the
acquired image of the original page. At this point, you can
perform one or more of the activities in the following list:
◆Check that this is the page you want.
◆Check the quality of the scanned page.
◆Rotate the page to turn the page upright, if necessary.
◆Use the Zoom commands to magnify or reduce the page view.
◆Delete the page from the document.
◆Adjust the settings for processing the page.
◆Cancel the process by creating a new file or opening another file.
◆Look at the properties of the page.
◆Continue processing the page.
You can use the Preview toolbar or View menu commands to
examine and orient the acquired page.
Zoning the Page
During preview and before recognition can begin, the page must
be zoned. TextBridge can automatically zone the page, or you can
zone the page manually. An acquired page is divided into one or
more zones. There are three types of zones: text, table, and
picture.
Text zone      Contains text and can be normal or reverse (white
               characters on a black background).
Table zone     Contains tables divided into cells. Tables can be
               ruled or unruled.
Picture zone   Contains any graphic art such as line art,
               photographs, and halftone images, which you see
               as shades of gray.
Note: A form is not a table. TextBridge does not OCR forms and does
not retain the original layout of most forms.
TextBridge assigns each type of zone a different transparent
color so you can easily distinguish among them. You can change
the assigned colors in the Options dialog box. TextBridge also
orders zones for output. TextBridge
assigns a number to every text and table zone in the following
order: headers, text including titles and headings, insets and
picture captions, and footers. You can change the order of the
zones during preview. This is useful when you want to output the
document without retaining the page layout and reorder the
paragraphs in the output document.
Only those parts of the page that are marked with zones are
recognized by TextBridge. If you want to recognize only part of a
page, mark only that portion. TextBridge does OCR on text and
table zones and converts both to text. OCR is not done on picture
zones. Picture zones are saved as part of the output. They are not
saved as separate files.
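The default zone numbering described in this section (headers, then text, then insets and picture captions, then footers, with picture zones left unnumbered) can be sketched as a sort. This is a minimal illustration, not TextBridge's actual algorithm: the `Zone` class, the role labels, and the tie-break by position on the page are all assumptions.

```python
from dataclasses import dataclass

# Output rank per role, following the order given in the text. The role
# labels themselves are hypothetical names chosen for this sketch.
ROLE_RANK = {"header": 0, "text": 1, "inset": 2, "caption": 2, "footer": 3}


@dataclass
class Zone:
    kind: str  # "text", "table", or "picture"
    role: str  # one of the ROLE_RANK keys
    top: int   # distance from the top of the page (assumed units)
    left: int  # distance from the left edge of the page


def number_zones(zones):
    """Return (number, zone) pairs for text and table zones in output order."""
    # Only text and table zones receive a number; picture zones do not.
    numbered = [zone for zone in zones if zone.kind in ("text", "table")]
    # Ordering within one role by position (top, then left) is an assumption.
    numbered.sort(key=lambda zone: (ROLE_RANK[zone.role], zone.top, zone.left))
    return list(enumerate(numbered, start=1))
```

Reordering zones during preview corresponds to overriding these numbers by hand, which only affects output when the page layout is not retained.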
When you stop to preview the page, you can display the zones
automatically generated by TextBridge. You can adjust these
zones before continuing the zoning process and recognizing the
page. You can also manually zone the page. Use the zoning tools
in the Preview toolbar (the text marker, table marker, picture
marker, and erase marker) as you would use highlighting markers
to create and adjust zones.
Figure 2–12. Zone the page using the Preview toolbar
You can perform these activities related to zones:
◆Use TextBridge automatic zoning.
◆Mark text, table, and picture zones.
◆Zone only part of a page.
◆View and adjust the text and picture zoning by adjusting the size,
merging, or splitting the zones.
◆Drag a selected zone to adjust its position.
◆Delete zones so that text, tables, or pictures are not included in
the final document.
◆Display the attributes of the current zone.
◆Change a zone from one type such as text to another type such as
table.
◆Enlarge or reduce the page to view the zones, using the Zoom In
and Zoom Out buttons.
You can also perform these less common activities related to
zones:
◆Use the same zoning for subsequent pages of the same document.
◆Save the current set of zones (including their size, location, and
type) as a template and use the zone template on other
documents.
◆Change the colors used to highlight different types of zones.
◆Create polygonal zones with intersecting rectangular zones.
◆Select zone order. Select zones and assign a number to each zone
to determine the order in which zones are output to the text
document. However, if you choose to Retain Page Layout,
TextBridge ignores the zone order.
You can use the Preview toolbar to quickly perform many of these
activities or use commands in the File, Edit and View menus.
After you complete the preview, tell TextBridge to recognize the
page.
Proofreading the Document
In automatic mode, after TextBridge recognizes all the pages of a
document, you can proofread the recognition results. TextBridge
displays the first page for proofreading after all the pages of a
document have been recognized and before you save the
document. In manual mode, TextBridge stops for you to proofread
after it recognizes each page. The page is laid out like the original
page. Pictures found by OCR are displayed in the same location
as in the original page.
Note: Pictures in a XIF file are not shown; placeholders are
displayed in their place. Proofreading is not available if you are
creating a PDF file.
Figure 2–13. Proofreading the page using the Proofread toolbar
Words that TextBridge suspects may not have been recognized
correctly are color coded. Suspect words are identified by one
color and unrecognized characters are highlighted in another
color. By default, the suspect words are blue, and the current
word in the Suspect box is yellow in the view area and Original
Image window. Use the Proofread toolbar to correct words.
You can add corrected words to the user dictionary, which can
improve recognition in subsequent pages of the same document
and subsequent documents. The user dictionary is most useful for
non-standard words that you frequently need to recognize such as
proper nouns and technical words.
While you are still in proofread mode, you can add pages to the
final document by getting a page using either the automatic or
manual process.
Saving the Document
After you finish proofreading the document, you are ready to save
it. Once you save the document, you cannot add any more pages
to it. You can specify the location, name, and type of format of the
output document. TextBridge converts the document to the
format of your choice and saves it. You can save the same
document more than once using the Save As command. For
example, you can save the document as text only and then save it
with pictures and layout.
Figure 2–14. Saving the page using the Save As dialog box
After you save the document, the image of the document remains
on the screen until you begin a new job.
IMPROVING PAGE RECOGNITION WITH SETTINGS
There are a number of settings that you select in TextBridge at
the beginning of the recognition process to help it recognize a
document with more accuracy. Many of these options are related
to the manual processes described in the previous section. Use
the Settings dialog box to specify which options of the software
you want to use.
Page Type Settings
Usually, you will want to use the settings automatically assigned
to a page type. However, it is possible for you to change these
settings.
You can view and change the settings for a page type in the Page
Type tab of the Settings dialog box. Check the settings to be sure
they are the best ones for processing the original page.
Figure 2–15. Page Type tab in the Settings dialog box
On the Page Type tab, you can choose the following page type
settings for the specific page type you select:
◆Select the page layout of the original page:
♦ Any layout
♦ Single column with or without pictures and tables
♦ Multi-column with or without pictures and tables
♦ Table for pages with a table or spreadsheet and single-column
text.
When you select Any layout, TextBridge automatically
determines the page layout. Use Any layout when pages in your
document have different layouts or when your pages have
complex layouts that do not fit the above layouts.
◆Set the print type of the document to be processed.
♦ Any
♦ Good
♦ Fax
♦ Dot matrix
♦ Newspaper
When you select Any, TextBridge automatically determines the
print type.
◆Set the page size to reflect the actual size of the original page:
♦ Letter
♦ Legal
♦ A4 for European documents
♦ Business Card
Scanner Settings
You can view and change the settings for your scanner in the
Scanner tab of the Settings dialog box. For ISIS and TWAIN
scanners that have been configured to allow TextBridge to control
the scanner, check the scanner settings to be sure they are the
best ones for processing the original page.
Figure 2–16. Scanner tab in the Settings dialog box
On the Scanner tab:
◆Choose the resolution value to reflect the actual resolution of your
scanner. For most documents, use 300 dpi. For 8 point or smaller
text, use 400 dpi for the best results.
◆Change the brightness based on whether your original page has
light or dark text and pictures. For example, if the text and
pictures on the original page are light, set the brightness
control darker.
◆Check the box to use the Automatic Document Feeder if your
scanner has this feature and you are scanning multiple pages.
◆If you use an Epson scanner, check Text Enhancement
Technology to enhance the scanned image for text recognition.
◆If you use an Epson scanner, check Auto Area segmentation to
enhance the scanned image for improved picture quality.
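The resolution guidance in the list above reduces to a one-line rule. The helper below is a hypothetical illustration of that rule only; `recommended_dpi` is not a TextBridge function, and it assumes you know the smallest text size on the page in points.

```python
def recommended_dpi(smallest_point_size: float) -> int:
    """Suggest a scanner resolution from the smallest text size on a page.

    Follows the guidance in the text: 400 dpi for 8 point or smaller
    text, 300 dpi for most other documents.
    """
    return 400 if smallest_point_size <= 8 else 300
```

A page set in 10 point body text with 7 point footnotes would therefore be scanned at 400 dpi if the footnotes matter.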
Processing Settings
You can view and change the settings for processing in the
Processing tab of the Settings dialog box. Check the settings to be
sure they are the best ones for processing the original page.
Figure 2–17. Settings dialog box and Processing tab
On the Processing tab:
◆Select the primary language of the document, such as English,
German, French, Italian, Spanish, or another available language.
◆Choose one or more of the following options to apply during page
processing for special documents, if you have created any of
these: a user dictionary, training data, and zone template.
◆Set the page orientation for the way text and images are printed
on the original page:
♦ Any orientation
♦ Portrait
♦ Landscape
If you select Any orientation, TextBridge automatically
determines the page orientation.
◆Check Train OCR if you want to train TextBridge OCR for
recognition.
Text Document Settings
You can view and change the settings for the output document in
the Text Document tab in the Settings dialog box. Check the
settings to be sure they are the best ones for processing the
original page.
Figure 2–18. Text Document tab in the Settings dialog box
On the Text Document tab select default settings for saving the
results:
◆Specify one or more recomposition settings to reflect the output
results you want based on the original page:
♦ Retain the layout of the original page.
♦ Retain the pictures on the original page.
♦ Format with paragraph styles
If you select format with paragraph styles, the text in the output
document is formatted with the paragraph styles of the original
page rather than automatically formatted with a single style.
Note: The output page looks the same whether format with paragraph
styles is checked or not; however, in your word processor there
are paragraph and table styles assigned to the text. You can use
the styles from the original document if you modify the text in the
output document.
◆Specify how you want to save the results of document processing:
♦ Save the output as one document in one file.
♦ Save each page as a separate document in a separate file.
♦ Save a new document whenever a blank page is found in the
original document.
◆Specify where you want to save the results of document
processing:
♦ Save the document on the Desktop.
♦ Save the document on the a: drive.
♦ Save the document in My Documents.
♦ Save the document in Text Documents.
♦ Save the document in a location you determine by using the
Browse button or typing the location.
◆Specify the type of format in which to save the results from the
list of options.
Note: The save as type and the recomposition settings are related. If
there is a conflict between the recomposition and save as type
settings, some settings will automatically be unavailable to
prevent a conflict.
◆Specify the default name of the scanned document to save.
♦ The default name is automatically the name of the current
page type for pages from your scanner and the name of the
image file for pages from an image file.
♦ Type in another name, if desired.
If you are saving more than one document, each document has
the same base name appended with an integer in parentheses.
For example, Magazine Page (2). If you want to rename the file,
you can rename it in the Save As dialog box.
You can change settings for retain picture, retain page layout,
and file location later in the Save As dialog box, if you desire.
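Two of the save behaviors described in this section, starting a new document at each blank page and numbering documents that share a base name, can be sketched as follows. Both functions are hypothetical illustrations. Representing a blank page as an empty string, dropping blank pages from the output, and leaving the first document unnumbered are assumptions; the guide shows only the form "Magazine Page (2)".

```python
def split_at_blank_pages(pages):
    """Group pages into documents, starting a new one at each blank page.

    Blank pages are represented as empty or whitespace-only strings
    (an assumption) and are not included in any document.
    """
    documents, current = [], []
    for page in pages:
        if page.strip():
            current.append(page)
        elif current:
            documents.append(current)
            current = []
    if current:
        documents.append(current)
    return documents


def default_names(base_name, count):
    """Name documents 'Base', 'Base (2)', 'Base (3)', and so on.

    Leaving the first document unnumbered is an assumption.
    """
    return [
        base_name if index == 1 else f"{base_name} ({index})"
        for index in range(1, count + 1)
    ]
```

Scanning a stack with blank separator sheets using the Magazine Page type would, under these assumptions, produce one named document per group of pages between separators.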
Saving a Document in a PDF File
TextBridge saves the results of page recognition in a format
compatible with common text software such as your word
processor or spreadsheet. In addition, TextBridge can save the
results of page recognition in the Adobe Acrobat Portable
Document Format (PDF). This format lets you share documents
among different types of computers, such as PC or Macintosh.
TextBridge can output documents in the PDF formats in the
following list:
◆Acrobat PDF Normal with no word images
◆Acrobat PDF Normal with suspect word images
◆Acrobat PDF Normal with highly suspect word images
◆Acrobat PDF Image only
◆Acrobat PDF Image and text
After you select the type of page to OCR and before you begin
processing, use the Text Document tab in the Settings dialog box
to set the Save As type to the type of PDF that you want. When
you save the results, you must keep the PDF format that you
specified before beginning the job.
You can automatically or manually zone pages, as you can with
any other format. Retain pictures and page layout are not
applicable, since both are automatically retained.
Note: The Proofreader is not available when you select PDF output.
Also, you cannot output PDF files if you use Instant Access to
TextBridge.
Figure 2–19. Text document tab in the Settings dialog box with
Adobe PDF format selected
IMPROVING OCR WITH TRAINING
To ensure the highest possible accuracy, TextBridge provides an
interactive training capability that can aid in word recognition
on pages with the same fonts and print quality. This feature
enables you to participate in the OCR process, verifying correctly
recognized words and correcting any recognition errors. You can
select the Train OCR command at the beginning of a job or
during a job and use it in automatic or manual mode. Interactive
training is especially effective for documents with poor quality
originals, such as faxes and multi-generation photocopies.
Figure 2–20. Training using the Training toolbar
Training level
You can set training level options to control the sensitivity of
the training process. The level determines how frequently suspect
words are displayed for your input. You can request to see any
word that is slightly suspect or only those words that are highly
suspect. In the first case you will review more words than in
the second case.
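The effect of the training level can be pictured as a confidence threshold. The sketch below is purely illustrative: TextBridge does not document how it scores words, so the numeric confidences, the threshold values, and the `words_to_review` helper are all assumptions.

```python
from typing import List, Tuple

# Hypothetical thresholds: a word is shown for review when its recognition
# confidence falls below the threshold for the chosen training level.
TRAINING_LEVELS = {"slightly suspect": 0.9, "highly suspect": 0.5}


def words_to_review(words: List[Tuple[str, float]], level: str) -> List[str]:
    """Return the words whose confidence is below the level's threshold."""
    threshold = TRAINING_LEVELS[level]
    return [word for word, confidence in words if confidence < threshold]
```

Under this model, asking to see every slightly suspect word always shows at least as many words as asking for only the highly suspect ones, because every highly suspect word also falls below the higher threshold.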
As you correct or accept TextBridge’s recognition decisions, you
also train TextBridge to improve its own accuracy rate for later
pages of the same document. During this process, TextBridge
compiles information about the character shapes, styles, and
sizes found in the document being recognized. You can accept or
correct each suspect word until you are satisfied that TextBridge
is sufficiently trained. Then you can turn interactive training off,
and TextBridge will recognize the rest of the document.
You can use the Training toolbar to correct words. You can add
corrected words to the user dictionary, which can improve
recognition in subsequent pages of the same document and
subsequent documents. The user dictionary is most useful for
non-standard words that you frequently need to recognize such as
proper nouns and technical words. You can save and later reload
the user dictionary to ensure that documents containing the same
words are recognized with the same high degree of accuracy.
You can save and later reload training data to ensure that other
similar documents of the same font are recognized with the same
high degree of accuracy. Training data is only effective for
documents with the same font(s), characters, and print quality.
WHERE TO GO FROM HERE
To learn how to use TextBridge to process simple and complex
documents, refer to Chapter 3. It also explains how to start
TextBridge; use the Help system, the sample documents, and
Instant Access; and view, zone, train, and proofread your
document in TextBridge.
The online Help system provides a complete reference to the user
interface, including window areas, menus, commands, and tools
as well as overview information and features, step-by-step
procedures for using the software, tips, and a glossary.
3
LEARNING TO USE
TEXTBRIDGE
The previous chapters have introduced you to TextBridge and
document recognition. This chapter provides step-by-step
instructions to teach you how to use the most important
capabilities of TextBridge.
The learning sessions build on each other and assume that you
understand the procedures explained in the previous sessions. It’s
best to do them in order or skim through prior sessions to
familiarize yourself with the steps. Each learning session begins
with introductory information including a list of what you will
learn followed by step-by-step procedures and explanations.
The topics presented in this chapter are in the following list:
♦ Starting TextBridge
♦ Using the Help system
♦ Using the sample documents
♦ Processing a simple document using auto processing
♦ Using Instant Access OCR
♦ Processing a complex document using manual processing
♦ Processing text, pictures, and a table
♦ Training OCR and using the Page toolbar
STARTING TEXTBRIDGE
There are two ways to start TextBridge. You can start
TextBridge as a standalone application or as Instant
Access from any Windows-based text application.
In this section you will learn to start TextBridge as a
standalone application.
To start TextBridge:
1. On the Windows task bar, click Start.
2. Point to Programs, then point to the TextBridge Pro 98
folder.
3. Click the TextBridge Pro 98 icon.
The TextBridge Pro 98 main window appears.
Note: For these learning sessions you will be selecting the page type
and page source from the Start dialog box. The instructions for
these sessions assume that your screen looks like the following
figure.
Figure 3–1. TextBridge Pro 98 main window (labeled parts: menu
bar, Main toolbar, Process toolbar, view area, and status bar)
If your screen does not look like this, in the View menu, select
Toolbars. In the Toolbars dialog box, select Start dialog box in the
Select Page Type and Source From box.
Figure 3–2. Toolbars dialog box
Learning to Use TextBridge Pro3–3
USING THE HELP SYSTEM
TextBridge is designed to be easy to learn and use. The
online Help system provides general information about
the program, step-by-step procedures for using the
program, a glossary, and a complete reference to the user
interface, including window areas, menus, commands,
and tools.
☞In this section, you will learn to:
♦ Get information from the Help system, including What’s This?
Help.
♦ Use the Help Topics window.
What you want to know about     How to get Help

Item in a dialog box or menu    Click the ? button, then click the item; or
                                right-click the item, then click What’s This?
                                in the shortcut menu; or
                                select the item, then press F1 or Shift+F1.

Entire dialog box               Click the Help button in the dialog box.

How to do something             Click TextBridge Help in the Help menu, then
                                click Step-by-Step Procedures in the Contents
                                tab or use the Index.

General information             Click TextBridge Help in the Help menu, then
                                click About TextBridge Pro 98 in the Contents
                                tab or use the Index.

Meaning of a word               Click Help in the menu bar and, in the
                                Contents tab, click Glossary or use the Index.

A concept not listed in the     Click TextBridge Help in the Help menu, then
Contents or Index               click Find and follow the directions.

You can get Help by using the main Help Topics window shown in
the following figure.
Figure 3–3. Help Topics: TextBridge Pro 98 Help window
From the Help Topics window, you can get the help you want by
performing one of the activities in the following list:
♦ Select a topic from a book in the Contents tab.
♦ Select a topic from the Index tab.
♦ Search for information about a specific word or phrase using the
Find tab.
♦ Jump from one topic to a related topic.
USING THE SAMPLE DOCUMENTS
In this section, you will learn about the sample documents and
how to open a sample document.
Use the sample documents provided on the installation CD with
the learning sessions in this chapter. You can find the five sample
documents in the installation folder in the following location:
C:\Program Files\TextBridge Pro 98\Images\Samples
This is the default location for these files; however, you may have
installed them in another location. The sample documents are
stored in TIFF format and are named:
Letter BookWise Markplan Scanning Plexis
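If you are not sure whether the sample documents were installed in the default location, a short script can check for them. The following is only a sketch in Python, which is not part of TextBridge. The function name missing_samples is hypothetical, and the .tif extension is confirmed in this chapter only for Letter, BookWise, and Markplan; it is assumed for the other two names.

```python
from pathlib import Path

# Default folder quoted in this guide; yours differs if you installed elsewhere.
DEFAULT_SAMPLES_DIR = Path(r"C:\Program Files\TextBridge Pro 98\Images\Samples")

# Sample names as listed in this guide. The ".tif" extension is an assumption
# for Scanning and Plexis (the guide only says the files are in TIFF format).
SAMPLE_NAMES = ["Letter", "BookWise", "Markplan", "Scanning", "Plexis"]


def missing_samples(folder, names=SAMPLE_NAMES, extension=".tif"):
    """Return the sample names that have no matching file in folder."""
    folder = Path(folder)
    # Compare in lower case because Windows file names are case-insensitive.
    present = {p.name.lower() for p in folder.iterdir()} if folder.is_dir() else set()
    return [name for name in names if (name + extension).lower() not in present]


if __name__ == "__main__":
    missing = missing_samples(DEFAULT_SAMPLES_DIR)
    print("All samples found." if not missing else "Missing: " + ", ".join(missing))
```

If the script reports missing files, look for the Samples folder under the location where you installed TextBridge.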
The sample documents provide a cross-section of the page types
that TextBridge can process:
♦ Memo or Letter
♦ Magazine Page
♦ Any Page
♦ Spreadsheet or Table
The sample documents are designed to provide you with
documents on which to learn and to highlight the capabilities of
the application. In each of the learning sessions, you are asked to
use a specific sample document.
☞In this session, you will learn to open a sample document. For
this session, use Letter.tif.
Figure 3–4. Letter sample document
To find and open a sample document:
1.Click the Auto Process button.
The Start dialog box appears.
Figure 3–5. Start dialog box with Any Page and Image file
selected
2.In the Start dialog box:
•Click Any Page in the Page type box.
•Click Image file in the Page source box.
•Click OK.
The Open dialog box appears. The default folder Samples is
open. The sample TIFF files are listed in the Open dialog box.
Figure 3–6. Open dialog box with Letter.tif selected (callout:
Select an image file)
If Samples is not the open folder, access the sample documents
folder in the following location from the Look In: box in the Open
dialog box:
C:\Program Files\TextBridge Pro 98\Images\Samples
This is the default location unless you installed TextBridge in
another place.
3.In the Open dialog box, double-click a file name to open it.
In this case, double-click Letter.tif.
TextBridge gets the page as shown in the following figure.
Figure 3–7. TextBridge - Getting Page dialog box
TextBridge automatically zones the page and identifies text,
tables, and pictures as shown in the Zoning dialog box.
Figure 3–8. TextBridge - Zoning dialog box
TextBridge automatically recognizes the characters and page
layout as shown in the Recognizing dialog box.
Figure 3–9. TextBridge - Recognizing dialog box
After TextBridge reads the page image and processes it, it stops
for you to proofread the page.
Figure 3–10. TextBridge - Proofread window
For this lesson, you just want to get back to where you started
without saving the document.
4.Click the New command in the File menu to discard the
current page.
A dialog box appears and tells you that the current page will not
be saved.
5.Click OK.
You return to the original TextBridge screen.
Now you know how to find and open a sample document .tif file.
Proceed to the learning sessions to work with TextBridge, and
familiarize yourself with using its capabilities.
SESSION 1: PROCESSING A SIMPLE DOCUMENT USING AUTO
PROCESSING
TextBridge provides a range of powerful features. However,
TextBridge is also designed to be very easy to use. For many
documents, you can use default settings and automatically
process a document.
☞For this learning session, use the sample document named
Letter. This document has a single column of text and a logo.
In this session you’ll learn to:
♦ Use Auto Process
♦ Use the Start dialog box
♦ Select the Memo or Letter page type
♦ Open an image file
♦ Save a document after recognition
When you select Memo or Letter as the page type, it
automatically specifies the following settings:
♦ Single column page layout
♦ Good print type
♦ Letter size
TextBridge also uses the following default settings:
♦ Scanner resolution 300 dpi
♦ Scanner brightness at normal
♦ English language
♦ Default user dictionary
♦ No training data
♦ No zone template
♦ Portrait orientation
♦ Retain page layout
♦ Retain pictures
♦ Format with paragraph styles
♦ One file for all pages
♦ Save in the Text Documents folder of the TextBridge folder
♦ Save as standard format
Refer to Chapter 2 and Help for more information about these
settings.
When retain page layout and retain pictures are set, TextBridge
recomposition keeps the layout of the original page when you
save and open your document in a format for an application that
supports recomposition, such as Word, WordPerfect, and Excel.
When retain page layout is selected, TextBridge recomposes the
layout of the document, including text and pictures, while
maintaining full editability in the output file. After
recomposition, text, pictures, and tables are in the same position
in relation to each other as in the original page when you print
the page or look at it in layout view, if your word processor
supports these elements.
With retain pictures selected, pictures are included in the final
document. If retain page layout is not selected, pictures are
placed at the end of the text rather than in the position they
occupied relative to the text on the original page.
If format with paragraph styles is selected, text in the final
document is assigned styles whose names begin with “TxBr” followed by
two numbers separated by a letter. The first number
represents the number of times you have pasted into the
application. The letter represents information about the style:
“c” means that the paragraph is centered, “p” means a
regular paragraph, and “t” means a tab table. The second number
identifies a specific style. For example, TxBr_3c11 means that
the number of consecutive Instant Access pastes into Word is 3,
the paragraph is centered, and the style number is 11.
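The naming scheme for these styles can be decoded mechanically. The following Python sketch is an illustration only and is not part of TextBridge. The function name decode_style_name is hypothetical, and the pattern (including the underscore after “TxBr”) is an assumption based on the single example TxBr_3c11 given above.

```python
import re

# Pattern inferred from the example "TxBr_3c11"; the underscore is an
# assumption taken from that one example.
STYLE_PATTERN = re.compile(r"^TxBr_(\d+)([A-Za-z])(\d+)$")

# Only these three letters are described in this guide.
STYLE_LETTERS = {
    "c": "centered paragraph",
    "p": "regular paragraph",
    "t": "tab table",
}


def decode_style_name(name):
    """Split a TxBr style name into (paste_count, kind, style_number).

    Returns None if the name does not follow the assumed pattern.
    """
    match = STYLE_PATTERN.match(name)
    if match is None:
        return None
    paste_count, letter, style_number = match.groups()
    kind = STYLE_LETTERS.get(letter, "undocumented letter " + repr(letter))
    return int(paste_count), kind, int(style_number)


print(decode_style_name("TxBr_3c11"))  # (3, 'centered paragraph', 11)
```

Any style name that does not match the assumed pattern, such as a style you created yourself in your word processor, returns None.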
To process a simple document, use the following procedure:
1.Start TextBridge.
TextBridge appears.
2.On the Process toolbar, click the Auto Process button.
The Start dialog box appears.
Figure 3–11. Start dialog box with Memo or Letter and Image file
selected
3.In the Start dialog box:
•Click Memo or Letter in the Page type box.
•Select Image file in the Page source box.
•Click OK.
The Open dialog box appears.
Figure 3–12. Open dialog box with Letter.tif selected (callout:
Select an image file)
4.In the Open dialog box, double-click the sample document,
Letter.tif.
TextBridge reads the image file, and automatically performs OCR
on it, as indicated by the feedback display in the view area of the
main window. It stops to display the page for you to proofread.
For this lesson, just continue and save the document.
5.Click the Save button to save the document.
The Save As dialog box appears.
Figure 3–13. Save As dialog box (callouts: Accept the default name,
or type a new name; Select the output format; Click Save)
6.In the Save As dialog box, complete the following steps:
•In the Save in list, select the folder in which to save
the text file.
•In the File name box, type a file name.
•In the Save as type list, select the output format for
your word processor or other text application.
•Check that Retain pictures and Retain page layout
are selected.
•Click the Save button.
TextBridge formats and saves the document.
The status bar at the bottom of the screen confirms that you have
saved the document. The Proofread window remains open. To
process another document, click the Auto Process button or the
Get Page button.
☞ Be sure to notice where the document is saved so that you can
find it easily. The save location originally defaults to Text
Documents in the TextBridge Pro 98 folder in Program Files. You
can check or change the default on the Text Document tab of the
Settings dialog box.
7.Open the file in your word processor or other text
application.
Unless you specified otherwise, open the file in the Text
Documents folder of the TextBridge Pro 98 folder in the Program
Files folder. You can use the shortcut to this location.
☞The document is saved with an .rtf extension if your text
application is Word. In the Open dialog box of your word
processor, check that you can see files of this type listed.
Compare the recognized document in your word processor with
the picture of the sample document, Letter.tif.
Figure 3–14. Letter sample document
With a word processor such as Word or WordPerfect in the page
layout view, the recognized document should have the same or
similar layout as the TIFF image or sample document. The
difference is that now you have formatted, fully editable text, just
as if you had typed it in yourself. At this point, you could spell
check the document and make any other changes in your word
processor.
SESSION 2: USING INSTANT ACCESS OCR
You can use TextBridge Instant Access OCR to run TextBridge
from within another application, such as a word processor. To use
Instant Access, simply start TextBridge from within an
application, such as Word or WordPerfect. During Instant Access
OCR, TextBridge processes a document then pastes it into the
open document in your text application.
☞For this learning session, use the sample document named
Markplan. This document has a single column of text, a title,
headings, and bullet lists. The procedure is similar to processing
a simple document.
In this session you’ll learn to:
♦ Use TextBridge Instant Access in your word processor.
♦ Select the Any Page page type.
When you select Any Page as the page type, it automatically
specifies the following settings:
♦ Any page layout
♦ Any print type
♦ Letter size
TextBridge also uses the following default settings:
♦ Scanner resolution 300 dpi
♦ Scanner brightness at normal
♦ English language
♦ Default user dictionary
♦ No training data
♦ No zone template
♦ Portrait orientation
♦ Auto process all pages
♦ Retain pictures
♦ Retain page layout
♦ Format with paragraph styles
♦ One file for all pages
♦ Save as standard format
♦ Save in Text Documents folder
Refer to Chapter 2 and Help for more information about these
settings.
☞If TextBridge is still running from the previous learning session,
exit from TextBridge. This will let you run TextBridge from your
word processor. You cannot have more than one copy of
TextBridge running at the same time.
Before you run TextBridge as Instant Access, you may need to use
the Instant Access Control Panel to choose which applications
have Instant Access to TextBridge. TextBridge automatically
provides Instant Access for the applications listed in the control
panel, as shown by the check mark.
If you want to examine the status of Instant Access, click Start on
the Windows taskbar, then Programs, then TextBridge Pro 98,
and Instant Access Control Panel. You can also access the Instant
Access Control Panel from the main TextBridge application in the
File menu by clicking Instant Access Control Panel. Help provides
additional information about Instant Access.
Figure 3–15. TextBridge Instant Access Control Panel (callouts:
Check one or more programs; Click OK)
The Enable access to TextBridge list shows the text applications
from which TextBridge can be invoked. The list includes
applications commonly used with TextBridge and applications
that are currently running. If your application does not appear in
this list, close the TextBridge Instant Access Control Panel, start
your application, and reopen the TextBridge Instant Access
Control Panel. Your application should now appear in the list.
Click on applications in the list to check or uncheck them. Click
All to check all items in the list. Click None to uncheck all items
in the list. Instant Access to TextBridge will be available from all
checked applications.
Click OK to close the Instant Access Control Panel and save any
changes you specified.
To use Instant Access OCR from your word processor, use the
following procedure:
1.Start your word processor, and open a new document.
2.In the File menu, click the TextBridge... command.
Figure 3–16. TextBridge... command in File menu
The Start dialog box appears. Notice that the Start dialog box is
slightly different than the Start dialog box in the standalone
version of TextBridge. Processing and Output boxes have been
added.
Figure 3–17. Start dialog box for Instant Access
3.In the Start dialog box:
•In the Page type box, click Any Page.
•In the Page Source box, select Image file.
•In the Processing box, select Auto process all pages.
•In the Output box, select Retain pictures and Retain
layout.
•Click OK.
If your application does not support Retain pictures and Retain
layout, these will not be available to select.
The Open dialog box appears.
Figure 3–18. Open dialog box with Markplan.tif selected (callout:
Select an image file)
4.In the Open dialog box, double-click the sample document,
Markplan.tif.
TextBridge reads the image file, and automatically performs OCR
on it, as indicated by the feedback display in the view area of the
main window. After acquiring and recognizing the page,
TextBridge pastes the document into the open document in your
word processor.
If TextBridge cannot do this, you will get a message that tells
you, “TextBridge is unable to paste text automatically. Select
Paste from the Edit menu to manually copy the recognized text
from the clipboard.” You may have to open a new document before
you can use the Paste command. After you follow these
instructions, the processed document is pasted in the open
document of your word processor.
Compare the recognized document in your word processor with
the reproduction of the sample document, Markplan.tif.
Figure 3–19. Markplan sample document
With a word processor such as Word or WordPerfect in the page
layout view, the recognized document should have the same or
similar layout as the TIFF image or sample document. The
difference is that now you have formatted, fully editable text.
☞If this document continues to a second page, delete any additional
spacing that was inserted into the document.
You can save the document or make any changes you’d like to the
document just as if you’d typed it yourself. For example, you can
spell check it and save it with your changes.
SESSION 3: PROCESSING A COMPLEX DOCUMENT USING MANUAL
PROCESSING
For more complex documents such as magazine articles, you often
can use TextBridge in automatic mode. However, simply using a
few additional steps in manual mode can sometimes produce a
more accurate result in less time.
☞For this learning session, use the sample document named
BookWise. This document has multiple columns, a dropped
capital letter, headings, paragraphs, bullet lists, and reversed
video text.
In this session you’ll learn to:
♦ Use manual processing with the Get Page button.
♦ Select Magazine Page type.
♦ Zone a page.
♦ Use the Zoom button.
♦ Add a word to the user dictionary.
♦ Proofread a page.
♦ Retain page layout.
♦ Save a page.
♦ Edit the document in your word processor.
Refer to Chapter 2 and Help to learn more about zoning and
proofreading.
When you select Magazine Page as the page type, it
automatically specifies the following settings:
♦ Multi-column page layout
♦ Good print type
♦ Letter size
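The page types used in Sessions 1 through 3 differ only in the layout and print type they preselect. As a rough summary, the presets listed in the session introductions can be written as a small lookup table. The Python sketch below is illustrative only and is not a TextBridge feature. The names PagePreset, PAGE_TYPE_PRESETS, and differing_settings are hypothetical, and the Spreadsheet or Table page type is left out because its settings are not listed in these sessions.

```python
from typing import NamedTuple


class PagePreset(NamedTuple):
    """Settings that a page type selects automatically, per this chapter."""
    layout: str
    print_type: str
    page_size: str


# Values copied from the introductions to Sessions 1, 2, and 3.
PAGE_TYPE_PRESETS = {
    "Memo or Letter": PagePreset("single column", "good", "letter"),
    "Any Page": PagePreset("any", "any", "letter"),
    "Magazine Page": PagePreset("multi-column", "good", "letter"),
}


def differing_settings(first, second):
    """Return the names of the preset fields that differ between two page types."""
    a, b = PAGE_TYPE_PRESETS[first], PAGE_TYPE_PRESETS[second]
    return [field for field in PagePreset._fields
            if getattr(a, field) != getattr(b, field)]


print(differing_settings("Memo or Letter", "Magazine Page"))  # ['layout']
```

Seen this way, choosing Magazine Page instead of Memo or Letter changes only the page layout setting; the print type and page size stay the same.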
TextBridge also uses the following default settings:
♦ Scanner resolution 300 dpi
♦ Scanner brightness at normal
♦ English language
♦ Default user dictionary
♦ No training data
♦ No zone template
♦ Portrait orientation
♦ Retain pictures
♦ Retain page layout
♦ Format with paragraph styles
♦ One file for all pages
♦ Save as standard format
♦ Save in Text Documents folder
Run the standalone version of TextBridge from the Start button
for this learning session.
1.Start the TextBridge standalone version.
2.Click the Get Page button.
The Start dialog box appears.
Figure 3–20. Start dialog box with Magazine Page and Image file
selected
3.In the Start dialog box:
•Click Magazine Page in the Page Type box.
The settings are automatically set to multi-column page layout,
good print type, and letter size. Note: If your text application does
not retain page layout, the page will be a single column of text
(also referred to as galley text) followed by pictures.
•Click Image file under Page source.
This tells the application where to get the page.
•Click OK.
The Open dialog box appears.
4.Double-click BookWise.tif.
TextBridge gets the page, displays it, and displays the Preview
toolbar so that you can preview it.
The page you see should be a two-column magazine article
beginning with a drop cap.
☞If this is not the correct page, in the File menu, click New. Click
OK to close the current document. You can begin again by
selecting Get Page.
5.Click the Locate Zones button.
TextBridge automatically zones the page. TextBridge locates
areas on the page to recognize and designates each area as text,
table, or picture. TextBridge then stops for you to check and
change the zones. All zones on this page are text zones.
Figure 3–21. Zoned magazine page (callouts: Preview toolbar,
Text zone)
6.Check the results of automatic zoning.
There should be six text zones. The numbers on the text zones
reflect the order in which these zones will appear in the final
document, even if you decide not to retain page layout.
•Click the Zoom In and Zoom Out buttons to enlarge
and reduce the page to examine the zones, if
necessary.
Zoom In Zoom Out
TextBridge magnifies the page.
•Modify automatic zoning, if necessary.
If a zone is not assigned the desired type, right-click the zone. In
the shortcut menu, move the pointer to Zone Type and click the
type of zone you desire.
☞Reverse video text must be in a separate text zone that includes
no regular text. If the reverse video text is not in one zone by
itself with its own zone number, rezone the reverse video text.
One way to separate the reverse video text from the regular text
is to use the Erase Markup tool. Determine which area of the
page you want to include in the reversed video text zone.
To divide one zone into two zones:
• Click the Erase Markup button.
• Erase the area of the page that connects the regular text to
the reversed video text.
Press and hold the mouse button at the upper left corner of the
area you want to erase. Drag the mouse diagonally across the area
to erase. When you have defined the area, release the mouse
button. The area is erased and becomes white, which means it is no
longer included in a zone.
When the zones are accurate, continue with the next step, which
is page recognition.
7.Click the Recognize Page button.
TextBridge performs OCR and recognizes the page.
TextBridge stops for you to proofread the results of recognition
and displays the Proofread toolbar. You can fix words not
recognized correctly and add them to the dictionary for improved
recognition.
Figure 3–22. Proofreading a page (callouts: Proofread toolbar,
Suspect word)
9.Change any words that were not accurately recognized
using the Proofread toolbar.
• Examine the word in the Suspect box.
If you want a closer look at the word as it appears in the
original page, click the Word Image button on the Proofread
toolbar.
• If the suspect word is the word you want, click the
Accept button.
TextBridge removes the suspect highlighting and continues to
the next suspect word.
or
• If the suspect word is not the word you want, type the
word you want in the Should Be box.
• Click the Add to Dictionary button if you want the
TextBridge dictionary to store a word for future
recognition.
☞As discussed in the section on proofreading in Chapter 2, the
dictionary is most useful for non-standard words that you
frequently need to recognize such as proper nouns and
technical words.
• Click the Accept button.
TextBridge continues to the next suspect word.
or
• If you want to go on to the next suspect word, click
the Find Next button.
TextBridge does not remove the suspect highlighting and
finds the next suspect word.
• If you see a word in the text that you want to correct,
click it. The word appears in the Suspect box, and you
can edit it.
Repeat this process until you have checked every suspect word and
either accepted it as is or changed it and then accepted it. You
can save at any time.