
COPYRIGHT INFORMATION

Copyright © 1995–2000 by ScanSoft, Inc. All rights reserved. No part of this publication may be transmitted, transcribed, reproduced, stored in any retrieval system or translated into any language or computer language in any form or by any means, mechanical, electronic, magnetic, optical, chemical, manual, or otherwise, without the prior written consent of ScanSoft, Inc., 9 Centennial Drive, Peabody, Massachusetts 01960.

The software described in this book is furnished under license and may be used or copied only in accordance with the terms of such license.

IMPORTANT NOTICE

ScanSoft, Inc. provides this publication “as is” without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability or fitness for a particular purpose. Some states or jurisdictions do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you. ScanSoft reserves the right to revise this publication and to make changes from time to time in the content hereof without obligation of ScanSoft to notify any person of such revision or changes.

TRADEMARKS AND CREDITS

TextBridge is a registered trademark, and Smart Zones, Instant Access OCR, and Custom Proof are trademarks, of ScanSoft, Inc.

Excel and Word are trademarks; Windows and FrontPage are registered trademarks of Microsoft Corp.

WordPerfect is a registered trademark of WordPerfect Corp.

Other terms used in this manual are the trademarks of their respective holders.

Animated character designed by Dreamlight Incorporated. www.dreamlight.com.

Portions of this product copyright © 1994–2000, Inso Corporation.

Authors: Lois West and Beth Paddock

© SCANSOFT, INC.

9 Centennial Drive

Peabody, Massachusetts 01960

TextBridge Pro Millennium Business Edition User’s Guide

Part Number 00–009733–00

April 2000

CONTENTS

PREFACE
About This User’s Guide
Organization of this user’s guide
Documentation conventions
Related Documentation
Technical Support

1 INTRODUCTION TO TEXTBRIDGE
Basic OCR Concepts
Features and Benefits
New Features
Enhanced Features
Other Features
Documents TextBridge Can Recognize
Input Image File Formats
Output Text File Formats
Output Image File Formats
Where to Go From Here

2 INSTALLING AND SETTING UP TEXTBRIDGE
What Comes with TextBridge
Supported Scanners
Installing and Testing Your Scanner
System Requirements
Before Installing TextBridge
Uninstalling a Previous Version of TextBridge
Using TextBridge with Pagis
Learning about TextBridge before you install it
Installing TextBridge
Scanner Setup
Setting Up Instant Access to TextBridge
Updating your TextBridge Software
Uninstalling TextBridge Pro Millennium Business Edition
Where to Go From Here

3 OCR AND BASIC TEXTBRIDGE OPERATIONS
What is TextBridge OCR?
Page types
Page sources
Recomposition
Running TextBridge Standalone and Instant Access
Standalone Program
Instant Access
Improving Page Recognition with Settings
Page Type Settings
Text Document Settings
Recognizing Other Languages
Language Installation
Language Processing
Where to Go From Here

4 LEARNING TO USE TEXTBRIDGE
Before Beginning to Process a Document
Using TextBridge to Process a Document
Starting TextBridge
Using Automatic Processing
Using Manual Processing
Performing Basic Operations
Selecting the Page Source
Selecting the Page Type
Previewing the Page
Zoning the Page
Proofreading the Document
Saving the Document
Getting Help While Using TextBridge
Using the Welcome Window
Using the Show Me How Window
Using Tips
Getting Information from Help
Using the TextBridge Web Site
Where to Go From Here

5 SAMPLE SESSIONS WITH TEXTBRIDGE
Using the Sample Documents
Session 1: Recognizing a Simple Document Using Auto Processing
Session 2: Using Instant Access to TextBridge
Session 3: Recognizing a Complex Document Using Manual Processing
Session 4: Processing Text, Pictures, and a Table
Where to Go From Here

6 ADVANCED SAMPLE SESSIONS
Session 1: Processing a Document to Use in a Database
Session 2: Using Zone Templates and Page Types
Session 3: Training TextBridge OCR
Where to Go From Here

INDEX

PREFACE

ScanSoft, Inc. welcomes you to TextBridge Pro Millennium Business Edition for Windows® 95, 98, 2000, and Windows NT® 4.0.

The documentation that comes with TextBridge provides you with the information you need to operate TextBridge. The documentation includes this user’s guide, a Help system, and Release Notes. ScanSoft invites your comments about the information provided in the documentation.

The documentation is part of an extensive user assistance program designed to provide the information you need to understand and use TextBridge. The section “Getting Help While Using TextBridge” in Chapter 4 provides further information about user assistance.

Before going on to find out more about TextBridge, please read this preface because it describes these important items:

About this user’s guide

Related documentation

Technical support

ABOUT THIS USER’S GUIDE

This user’s guide is a reference tool that provides information about TextBridge. It is for users with a wide range of computer experience. It assumes that you are familiar with the management and operation of your computer and Windows.

This manual is provided in both print and electronic form. The entire user’s guide is provided as a digital document in Adobe® Portable Document Format (PDF).


To view the user’s guide in PDF format, you need Adobe Acrobat Reader, which is installed with TextBridge unless you already have it on your PC. You can access the user’s guide from the installation menu and the TextBridge Help menu, or you can open it from Adobe Acrobat Reader. After you open it, you can view it on your PC and print all or part of it using Adobe Acrobat Reader.

Organization of this user’s guide

This user’s guide is organized as follows:

The Table of Contents at the beginning of the user’s guide describes the basic information in this book and helps you to find general information quickly.

This Preface describes the documentation provided with TextBridge and technical support.

Chapter 1 “Introduction to TextBridge” discusses TextBridge features. It also describes basic OCR concepts, documents TextBridge can recognize, supported scanners, the input file formats TextBridge can read, and the output file formats to which TextBridge can save the recognized text.

Chapter 2 “Installing and Setting Up TextBridge” describes what comes with TextBridge, system requirements, installation, Instant Access setup, and uninstalling TextBridge.

Chapter 3 “OCR and Basic TextBridge Operations” explains the basic TextBridge functions that enable it to recognize your documents.

Chapter 4 “Learning to Use TextBridge” describes the basic processes of using TextBridge.

Chapter 5 “Sample Sessions with TextBridge” walks you through several practice sessions designed to help you learn and use the important features of TextBridge.

Chapter 6 “Advanced Sample Sessions” describes more complex and less frequent uses of TextBridge.

The Index provides a comprehensive list of topics to assist you in quickly locating the specific information you need.


Documentation conventions

TextBridge documentation uses certain graphical elements and formatting to emphasize information and give more meaning to text.

Table 1: Documentation Conventions

Convention | Meaning
bold | Introduces a new term or the first use of an important term in a chapter. It is sometimes used to denote strong emphasis.
italic | Denotes titles of other user’s guides or books and generic representations of file name entries in examples; for example, filename.
monospace | Denotes text that appears on the computer screen, such as examples, menu text, and messages, plus actual file names.
“ ” (quotes) | Denotes titles of chapters and sections in this user’s guide.
Tip | Introduces tips that provide useful information about a procedural step or system function.
Note | Introduces information of note about the current subject.

RELATED DOCUMENTATION

TextBridge provides a comprehensive set of printed and digital documentation designed to assist you in learning and operating the product. The documentation provided with TextBridge covers all aspects of installation and operation.

Note Information provided in individual documents is not duplicated in other documents except for basic information about TextBridge. If you do not find the information you want in a particular document, please check another. For example, if you do not find information you want in this user’s guide, look for it in the Help system.


Refer to the documentation in the following list for information:

Online Release Notes. Before or after you install TextBridge, read the Release Notes. These provide the most up-to-date information about TextBridge. They describe technical information, including specifics about using a particular scanner. Release Notes also include information unavailable at the time that the user’s guide and Help were finalized. During installation you can access the Release Notes from the installation menu. After installation you can access the Release Notes from the TextBridge Program menu in the Start menu.

Help. The Help system provides you with detailed information about using TextBridge. It includes instructions on how to get started in TextBridge, step-by-step procedures for most operations and user tips. Context-sensitive Help is always available by pressing F1 from any menu command or dialog box.

Online User’s Guide. An online version of the complete user’s guide is provided in Adobe Acrobat format (.pdf). You can access the user’s guide from the installation menu and the TextBridge Help menu, or you can open it from Adobe Acrobat Reader.

Printed User’s Guide. A printed version of the user’s guide is provided. The user’s guide provides you with basic information about OCR and how TextBridge performs OCR, information about installing TextBridge, information about how to run TextBridge and improve its performance, and tutorials to help you learn the basic-to-advanced use of TextBridge.

Note You may also need to refer to additional publications, such as the manufacturer’s documentation for your scanner.

TECHNICAL SUPPORT

If you should experience problems with TextBridge that you cannot resolve on your own using the documentation and software, contact TextBridge Technical Support at the following Web site:

www.scansoft.com


The ScanSoft Web site provides a link to TextBridge pages, including Technical Support with Frequently Asked Questions, technical information bulletins, and a problem report form.

Before sending a problem report form to ScanSoft Technical Support, be sure to visit FastTrack, ScanSoft’s electronic support system on the web site. Using an Intelligent Expert Reasoning™ methodology, FastTrack delivers intuitive self-service via the Internet and allows you to successfully resolve your problems online 24 hours a day, seven days a week. Software downloads, product upgrades and product updates are also available in FastTrack.

Additional information about contacting TextBridge Technical Support is provided in the TextBridge Help menu.

If you must contact ScanSoft Technical Support, the following information will help in solving the problem:

Your software version number

(This is on the back of the CD envelope and in the Help menu under About TextBridge.)

Your software serial number

(This is the serial number on the back of the TextBridge CD-ROM envelope and in the Help menu under About TextBridge.)

Your scanner make and model

A description of the steps that led up to the problem

If TextBridge generated an error message, the exact text of the error message or its number, and when it appeared.


1 INTRODUCTION TO TEXTBRIDGE

Welcome to ScanSoft’s TextBridge Pro Millennium, optical character recognition (OCR) software for Microsoft Windows® 95, 98, 2000 and Windows NT® 4.0.

This chapter provides an introduction to TextBridge including:

Basic OCR concepts

Features and benefits

Characteristics of documents TextBridge can recognize

Input image file formats

Output text file formats

Output image file formats

BASIC OCR CONCEPTS

OCR technology enables you to convert paper documents into fully editable text with images on your computer. Originally, OCR technology performed simple recognition of text characters, numbers, and symbols. Today, TextBridge OCR performs full document recognition: it recognizes text plus formatting such as headlines, multiple columns, tables, and running headers and footers, and it captures photographs and line drawings. TextBridge also retains the layout of the original document as closely as possible.

You can use TextBridge to scan and convert printed pages to text documents for your word processor, spreadsheet program, web browser, database program, or other text application. Pages can come from most sources, including computer printers, fax machines, photocopiers, magazines, and newspapers. Pages can be black and white or color. TextBridge can also recognize standard page image files from fax modems, image applications, and other sources.


Using the latest document recognition technology from ScanSoft, TextBridge OCR uses its recomposition capability to produce a fully editable electronic document with the original pictures and document layout (Figure 1–1).

Figure 1–1. TextBridge document recomposition (panels: original document; recomposed document in word processor)

In most cases, TextBridge understands your original document’s format and maintains the layout, including columns, headers, footers, pictures, and picture captions. Pictures can be black and white, grayscale, or color.


Recomposition is possible only if your text program supports pictures and layout. For example, recomposition is supported in Microsoft Word and Corel WordPerfect but not in Notepad. Forms and documents created in desktop publishing programs are usually too complex for TextBridge to recompose or for your word processor to reproduce. In such cases, the text and pictures are retained but the full layout is not.

FEATURES AND BENEFITS

TextBridge offers many features designed to make it easy to use and increase your productivity. Whether you need to capture a simple one-page letter, a magazine article, a spreadsheet, or a long transcript, TextBridge can save you valuable time and effort. In addition, TextBridge provides all the capabilities that experienced OCR users expect.

With TextBridge, you can import most paper documents or document image files to your computer. TextBridge attains the highest degree of OCR accuracy and provides the output in fully editable form in your favorite program. Many of these features and benefits are described in more detail in the user’s guide and Help.

New Features

TextBridge Pro Millennium Business Edition offers these new features to increase your productivity:

Windows 2000 Certification. Makes use of the latest Windows technology to assure a consistent user experience and a more reliable and manageable application.

Updated scanner support. Includes the latest Scanner Wizard hint file for easy setup of popular scanners.

Live updates. Stay up-to-date with the latest product changes from the ScanSoft, Inc. web site.

Instant Access™ to FrontPage® 2000 and The Print Shop® ProPublisher 2000. Use TextBridge Instant Access to scan, recognize, and paste text and pictures directly into your FrontPage and Print Shop documents.


Enhanced Features

In addition to the new features, TextBridge offers the following features from previous versions, which have been enhanced in this release:

Instant Access. Start TextBridge within most Windows text programs such as Word or Excel. After recognizing and converting the page, TextBridge then automatically pastes recognition data (text and pictures) directly into the program’s open document.

OCR accuracy. Dramatically save time and eliminate retyping.

Color and grayscale pictures and text. Recognition and output of color and grayscale pictures. Recognition of color text and text on a color or shaded background and output of black on white or white on black.

Table recomposition. Advanced analytical capability results in very accurate table reformatting. Ability to edit the entire table as well as individual cells for improved recognition. Cell table recomposition is supported even if you do not choose to retain layout.

Flexible multi-page document handling. Ability to view and manipulate the pages of a document using the page thumbnails. Zone multiple pages before recognition. Process the pages of a document in any order. Delete, rearrange, and re-recognize individual pages. You can also control the output.

Extensive language recognition. Ability to recognize many Eastern, Central, and Western European languages.

Multiple language recognition. Ability to recognize multiple languages on the same page if all languages belong to the same language group.

Usability and user assistance. Enhanced ease of use including a redesigned user interface and extensive user assistance. User assistance includes a multimedia assistant, information screens, context-sensitive tips, status area messages, Help system, and printed and online documentation.

TextBridge Assistant. An easy-to-use assistant guides you through each step of the most common TextBridge activities, such as how to scan a page and send it to Word, recognize an image file, and recognize just part of a page.


Convenient batch processing. The ability to select multiple files and process each file separately plus the ability to schedule processing for a specific time in the future.

Integration with e-mail programs. Send recognized output directly to popular programs such as Lotus cc:Mail, Microsoft Outlook, and America Online (AOL).

Integration with the latest scanners. TextBridge works with the most recent scanners. The Release Notes and the ScanSoft Web site at www.scansoft.com provide the latest information about supported scanners and about getting your scanner to work with TextBridge.

HTML 4.0 output and WYSIWYG capability. Output files in the latest version of HTML and preserve the original look using cascading style sheets.

Dual page scanning. Scan both pages of an open book at the same time but handle them as two separate pages.

Easy database importing. Use of standard delimited text file output that allows you to import data into many databases.

ToolTips and What’s This? Help. Instant context-sensitive information about commands, dialog boxes, and buttons on the interface.

Document recomposition. TextBridge offers true document recomposition to retain your original page layout. It reproduces multiple columns, tables, and pictures and keeps them in the same location as they are in your original document.

For example, when you specify output to the Microsoft Word or Corel WordPerfect® format, TextBridge can retain the original document layout in fully editable form, even for pages containing tables, line art, reverse video, drop caps, insets, and pictures. When you edit the document, the original text flow is maintained.

When you specify output to the Microsoft Excel or Lotus 1-2-3 format, spreadsheets and cell tables retain their original layout as cell tables, not tabbed columns. When you edit the table information, the lines move to fit.


TextBridge supports layout-retaining formats for the following programs:

Internet Explorer

Netscape

Word 6.0, 7.0, 97, and 2000

WordPerfect 6.0, 6.1, 7.0, 8.0, and 9.0

Any word processor that supports RTF

Retaining pictures is independent of retaining layout. Some text programs retain pictures even though they do not retain layout.

Page Types. TextBridge provides many predesigned Page Types to make processing easier and more efficient. You do not have to go through a complicated process of determining and specifying settings for common types of pages. These Page Types automatically provide appropriate settings for the type of page you want to process. For example, there is a Letter page type and a Magazine page type that automatically activate settings for improved results for letters and pages from magazines.

Automatic zoning. TextBridge automatically zones your page into text, picture, and table zones. You do not need to zone the page manually.

Zone editing. You can edit the automatically recognized zones to further refine the zoning. Use zone editing to increase the accuracy and efficiency of page processing by reshaping zones, specifying the language, and renumbering zones.

Built-in Proofreader. After document recognition, you can use the built-in proofreader to view and accept or correct any words that TextBridge suspects may not be recognized accurately. The proofreader provides suggestions from which you may choose.

Dynamic OCR training. You can train TextBridge’s OCR to improve recognition accuracy as the job progresses. Use dynamic training with difficult documents, such as faxes or multigeneration photocopies. TextBridge enables you to interact with the OCR process by viewing then accepting or correcting its automatic recognition decisions. The software actually learns special symbols and words.


Output files to the latest version of programs. These include Microsoft Word 2000, Excel 2000, FrontPage 2000, WordPerfect 9.0, and Adobe FrameMaker 5.0.

Custom dictionaries. To improve recognition accuracy further, you can create specialized word lists (scientific terminology, proper names, acronyms, and so on) within TextBridge or in ASCII text files and load them into TextBridge. You can also use your Microsoft Word or Office custom dictionary with TextBridge.

Other Features

In addition to the features listed in the previous sections, TextBridge provides these other features:

Broad scanner support. TextBridge supports most popular desktop scanners that use the TWAIN device interface standard.

Image processing. TextBridge accepts a wide range of images from a variety of sources for processing. Specifically, the program imports and recognizes online document images in BMP, PCX, DCX, TIFF, and XIF formats that originate from fax modems and other sources. For more information, see the “Input Image File Formats” section in this chapter.

Deferred processing. TextBridge enables you to scan all the pages of a document to a TIFF or XIF file, then later open the image file for document recognition. You can also save all the pages to a multi-page image file or save each page as a separate file.

Output text file formats including HTML. TextBridge supports a number of output text file formats, including word processor, desktop publishing, spreadsheet, HTML, and database formats. Now you can process your text for publication on the Web.

Preview of page images. TextBridge provides a set of tools for previewing page images before processing them. You can manually define areas of page images as zones to be processed and capture only the text, tables, or pictures you want. You can also edit the automatic zoning by adjusting the text, table, and picture zones.


Zone templates. After you create a set of zones, TextBridge lets you save and reload zone templates for new jobs. In this way you can consistently process or ignore specific areas on the same type of pages and save time without rezoning each page.

Re-usable training data. After you interactively train OCR, you can save the training data in a file. You can reload this training file for similar documents of the same page type. Using this training file assures the highest recognition accuracy without your having to repeat the training.

Two-sided document processing. If your scanner has a sheet feeder, you can scan the fronts (odd sides) of the pages first, then flip the stack, and scan the reverse (even) sides. When scanning and recognition are complete, TextBridge automatically collates the text and keeps it in the original order.
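The collation step in two-sided processing amounts to interleaving two lists of pages. The Python sketch below is a hypothetical illustration, not TextBridge's implementation. It assumes one back side was scanned for every front side, and its `backs_reversed` flag reflects an assumption that flipping the stack feeds the backs last page first; whether that holds depends on your sheet feeder.

```python
from typing import List, TypeVar

T = TypeVar("T")


def collate_two_sided(fronts: List[T], backs: List[T],
                      backs_reversed: bool = True) -> List[T]:
    """Interleave separately scanned front and back sides into reading order.

    fronts: odd-numbered sides (1, 3, 5, ...) in the order they were scanned.
    backs: even-numbered sides in the order they were scanned after flipping the stack.
    backs_reversed: assumption that the flipped stack delivers the backs last page first.
    """
    if len(backs) != len(fronts):
        raise ValueError("expected one back side for every front side")
    evens = list(reversed(backs)) if backs_reversed else list(backs)
    pages: List[T] = []
    for front, back in zip(fronts, evens):
        pages.extend([front, back])  # page n (front) is followed by page n + 1 (its back)
    return pages


if __name__ == "__main__":
    print(collate_two_sided(["p1", "p3", "p5"], ["p6", "p4", "p2"]))
    # ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']
```

If your feeder delivers the backs in front-to-back order instead, pass `backs_reversed=False`.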

Documents TextBridge Can Recognize

TextBridge includes a number of advances developed by ScanSoft, Inc. and at the Xerox Palo Alto Research Center (PARC). Consequently, TextBridge provides highly accurate OCR and format retention on the widest range of documents. TextBridge can recognize documents with the characteristics in the following list:

Documents printed on typewriters, phototypesetters, and impact, ink-jet, dot-matrix, and laser printers

Photocopied, degraded, or dirty documents

Documents with single- or multiple-column layouts

Spreadsheets and cell tables

Paper documents with black and white, grayscale, and color pictures including photos and line art

Page image files with black and white, grayscale, and color pictures

Dual page documents, such as bound books

Multi-page documents

Single- or multiple-page images from fax modems and other sources


Hard-copy faxes

Documents with type sizes from 5 to 72 points in practically any typeface

Documents written in any of many Eastern, Central, or Western European languages, including documents that combine several languages from the same language group

INPUT IMAGE FILE FORMATS

The source of page images for TextBridge can be your scanner or it can be image files. TextBridge can recognize the following types of image file formats:

Image File Format | File Name Extension
Windows bitmap | .bmp
PCX | .pcx
Multi-page PCX used in some fax programs | .dcx
Tag image file format (including Alacrity TIFF) | .tif, .ala
Delrina WinFax fax image files | .fxr, .fxd, .fxm, .fxs
eXtended image file | .xif

Image files can be black and white (binary), grayscale, or color. TextBridge can process images at resolutions from 72 to 900 dots per inch (dpi). Recognition results are generally better from grayscale images than from binary images. For the most accurate results on typical documents, we recommend scanning grayscale images at 200 dpi. For difficult documents, scanning grayscale images at 300 dpi can give better results; however, this requires more processing time.
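The trade-off between 200 and 300 dpi can be made concrete with a little arithmetic: raising the resolution by a factor of 1.5 multiplies the number of pixels by 1.5² = 2.25, because both page dimensions grow. The Python sketch below uses a US Letter page (8.5 by 11 inches) purely as an example size; the idea that processing time grows roughly with pixel count is an assumption, not a documented TextBridge figure.

```python
def pixel_count(width_in: float, height_in: float, dpi: int) -> int:
    """Pixels in a scan of a page of the given size (in inches) at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi)


# Example page size (US Letter); substitute your own page dimensions.
WIDTH_IN, HEIGHT_IN = 8.5, 11.0

at_200 = pixel_count(WIDTH_IN, HEIGHT_IN, 200)  # 1700 x 2200 pixels
at_300 = pixel_count(WIDTH_IN, HEIGHT_IN, 300)  # 2550 x 3300 pixels

if __name__ == "__main__":
    print(at_200, at_300, at_300 / at_200)  # 3740000 8415000 2.25
```

At one byte per grayscale pixel, that is roughly 3.7 MB versus 8.4 MB of uncompressed image data per page.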

Note Refer to the ScanSoft Web site at www.scansoft.com for the latest list of supported input image file formats.


OUTPUT TEXT FILE FORMATS

TextBridge can convert its recognized text and pictures to files for the following programs and formats:

Programs and Formats | File Name Extension
Adobe Acrobat Portable Document Format (PDF) | .pdf
Ami Pro 2.0 and 3.0 | .sam
dBase IV | .dbf
DisplayWrite 5 | .rft
Excel 97 and 2000 | .xls
Excel 3.0, 4.0, and 5.0 | .xls
Excel for the Macintosh 3.0 to 7.0 | .xls
FrameMaker | .mif
HTML WYSIWYG | .htm
HTML | .htm
Interleaf | .wps
Lotus 1-2-3 | .wk1
Lotus Word Pro | .lwp
MultiMate Advantage II | .doc
PostScript | .ps
Professional Write 2.0 and 2.2 | .doc
Quattro Pro for Windows | .wb2
RFT-DCA | .rft
Rich Text Format (RTF) | .rtf
RTF for the Macintosh | .rtf
Text | .txt
Text with line breaks | .txt
Text DOS format | .txt
Text with line breaks DOS format | .txt
Text comma-delimited | .csv
Text tab-delimited | .txt
Word 2.x (RTF) | .doc
Word 6.0 and 7.0 (RTF) | .doc
Word 97 and 2000 (RTF) | .doc
WordPerfect 4.2 and 5.1 | .wpf
WordPerfect 6.0, 6.1, 7.0, and 8.0 | .wpd
WordStar | .wsd
Works | .rtf
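The comma-delimited (.csv) and tab-delimited (.txt) text formats are the standard delimited output that makes database importing easy. As a hedged illustration, the Python sketch below reads such output with the standard `csv` module and loads it into an in-memory SQLite database. The sample data, the table name, and the helper names `read_delimited` and `load_into_sqlite` are hypothetical, and the sketch assumes the first recognized row holds the column headings; check a real output file before relying on that.

```python
import csv
import io
import sqlite3
from typing import List, Tuple


def read_delimited(text: str, delimiter: str = ",") -> Tuple[List[str], List[List[str]]]:
    """Split delimited text into a header row and data rows, skipping blank lines.

    Assumes the first non-blank row contains column headings.
    """
    rows = [row for row in csv.reader(io.StringIO(text, newline=""), delimiter=delimiter) if row]
    if not rows:
        return [], []
    return rows[0], rows[1:]


def quote_identifier(name: str) -> str:
    """Quote a table or column name for SQLite, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_into_sqlite(conn: sqlite3.Connection, table: str,
                     header: List[str], rows: List[List[str]]) -> None:
    """Create a table and insert the rows as text values.

    Every row must have as many fields as the header, or SQLite raises an error.
    """
    columns = ", ".join(quote_identifier(name) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {quote_identifier(table)} ({columns})")
    conn.executemany(f"INSERT INTO {quote_identifier(table)} VALUES ({placeholders})", rows)
    conn.commit()


if __name__ == "__main__":
    sample_csv = "Item,Qty,Price\nPaper,10,4.50\nToner,2,79.00\n"
    header, rows = read_delimited(sample_csv)
    conn = sqlite3.connect(":memory:")
    load_into_sqlite(conn, "supplies", header, rows)
    print(conn.execute('SELECT COUNT(*) FROM "supplies"').fetchone())  # (2,)
```

For tab-delimited output, call `read_delimited(text, delimiter="\t")`. To read a real file, open it with `newline=""`; the character encoding TextBridge writes is not stated here, so treat the encoding you pass to `open` as an assumption to verify.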


PDF files can be transferred and shared across computer platforms. PDF, a format originally developed by Adobe Systems, Inc., can be viewed with Adobe Acrobat Reader. The following table lists the PDF file types you can create with TextBridge and the equivalent Adobe names:

TextBridge | Adobe Acrobat 4.0 | Adobe Capture 3.0
PDF Image Only | PDF Image Only | (none)
PDF Image & Hidden Text | PDF Original Image with Hidden Text | PDF Searchable Image
PDF Normal | PDF Normal | PDF formatted Text and Graphics
PDF Normal without word images | (none) | (none)

The TextBridge PDF file types have the following characteristics:

PDF Image Only. A bitmap picture of the page(s). The document can be viewed but not searched.

PDF Image & Hidden Text. A picture of the page(s) in the foreground with the recognized text hidden behind it. The document can be viewed and searched. Use this type when you need to have searchable text but must keep the original scanned image of each page for legal or archival purposes. Only the zoned text is recognized and saved for searching. Gray or color pages are converted to black and white.

PDF Normal. The page(s) contain actual text and pictures, if any. Suspect words are put on the page as word images, not text, to ensure page accuracy. The page(s) can be viewed and searched, including suspect words. PDF Normal files are generally much smaller than PDF Image type files.

PDF Normal Without Word Images. The same as PDF Normal, except all suspect words are converted to actual text – even if the word recognition confidence is low.

Note PDF Image Only and PDF Image & Hidden Text files are always output as a full page in black and white, regardless of how they are scanned or zoned.
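The characteristics above reduce to a simple choice. As an illustration only, the following Python sketch encodes that choice as a function. TextBridge has no such programming interface; PdfType, choose_pdf_type, and its parameters are hypothetical names used only for this example.

```python
from enum import Enum


class PdfType(Enum):
    """TextBridge PDF output types, named as in this guide."""
    IMAGE_ONLY = "PDF Image Only"
    IMAGE_AND_HIDDEN_TEXT = "PDF Image & Hidden Text"
    NORMAL = "PDF Normal"
    NORMAL_WITHOUT_WORD_IMAGES = "PDF Normal Without Word Images"


def choose_pdf_type(keep_page_image: bool, searchable: bool,
                    suspect_words_as_text: bool = False) -> PdfType:
    """Pick a PDF type from the characteristics described in this guide.

    keep_page_image: keep a bitmap picture of each scanned page
        (for example, for legal or archival purposes).
    searchable: the saved text must be searchable.
    suspect_words_as_text: convert suspect (low-confidence) words to text
        instead of placing them on the page as word images.
    """
    if keep_page_image:
        # Only the two image types keep the scanned picture of the page;
        # of these, only Image & Hidden Text can be searched.
        return PdfType.IMAGE_AND_HIDDEN_TEXT if searchable else PdfType.IMAGE_ONLY
    # The Normal types contain actual text and pictures and can be searched.
    if suspect_words_as_text:
        return PdfType.NORMAL_WITHOUT_WORD_IMAGES
    return PdfType.NORMAL


print(choose_pdf_type(keep_page_image=True, searchable=True).value)
# prints: PDF Image & Hidden Text
```

The sketch checks the page-image requirement first because only the two image types keep the original scanned picture of each page. Both of those types are always saved as full pages in black and white.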


Note PDF formats are available only for languages in the American/European language group. Refer to “Recognizing Other Languages” in Chapter 3, “OCR and Basic TextBridge Operations,” for more information about language groups.

Microsoft Word (RTF) format is also accepted by a number of other applications, including ClarisWorks®, Adobe® PageMaker®, and WordPad. See the documentation for your application for more information about importing files in RTF format.

Note Refer to the ScanSoft Web site at www.scansoft.com for the latest list of supported output file formats.

OUTPUT IMAGE FILE FORMATS

With deferred processing, TextBridge can save its scanned documents as image files in the following formats:

Programs and Formats      File Name Extension
Tag Image File Format     .tif
eXtended Image File       .xif

We recommend XIF for deferred processing, as this format retains the full fidelity of any scan, producing an image ideal for OCR.


WHERE TO GO FROM HERE

To learn how to install and set up TextBridge on your system, go to Chapter 2.

To learn how TextBridge recognizes a document and how you prepare TextBridge to do this, read Chapter 3. This chapter explains the basic concepts and functions of the software.

To learn how you use TextBridge to process simple and complex documents, refer to Chapter 4. It also explains how to view, zone, train, and proofread your document in TextBridge and edit your document in your word processor.

Chapters 5 and 6 provide sample sessions that are step-by-step tutorials. Chapter 5 shows you how to use auto processing and Instant Access, how to recognize a document with a complex layout, and how to process a document with text, pictures, and a table.

Chapter 6 shows you how to process a document for a database, how to use advanced settings for zones and page types, and how to train TextBridge’s OCR.

The Help system provides a complete reference to the TextBridge user interface. Help includes overview information on key features, getting started instructions, step-by-step procedures for most operations, and user tips. Typical user questions are answered in a “How Do I?” section, and the Troubleshooting section helps you solve problems. Context-sensitive Help is always available: press F1 from any menu command or dialog box.


2 INSTALLING AND SETTING UP TEXTBRIDGE

This chapter describes the TextBridge software installation and setup procedures. Specifically, it covers these topics:

What comes with TextBridge

Supported scanners

Installing and testing your scanner

System requirements

Before installing TextBridge

Installing TextBridge

Scanner setup

Setting up Instant Access to TextBridge

Updating your TextBridge software

Uninstalling TextBridge Pro Millennium Business Edition

To get started quickly, proceed to the installation procedure on page 2–7.

WHAT COMES WITH TEXTBRIDGE

TextBridge comes with the following items:

One installation CD-ROM. The CD-ROM includes software programs, language packs, sample document image files, release notes, Help files, online user’s guide in Adobe PDF format, and Adobe Acrobat Reader.

A printed user’s guide to get you started.

Check to be sure that you have all the items listed above. If any item is missing from your TextBridge package, call your authorized ScanSoft dealer. To contact ScanSoft, visit the ScanSoft Web site at www.scansoft.com or refer to the TextBridge Help system.


Note Be sure to register electronically, or print and return the software registration form. Registration qualifies you for technical support and ensures that you are kept up to date on new software releases and other information related to TextBridge and the ScanSoft family of products.

SUPPORTED SCANNERS

TextBridge works with many popular desktop scanners using your scanner's TWAIN interface. You can use TextBridge with any fully TWAIN-compliant scanner that provides a binary, grayscale, or color image in a supported size and resolution.

Exceptions include some TWAIN driver designs (for example, the Hewlett-Packard Scanjet 5100C scanner with the PrecisionScan TWAIN interface), triple-pass scanners, and Visioneer sheetfed scanners.

Depending upon the design of your TWAIN driver, you may not be able to scan in color with TextBridge. If you have a triple-pass scanner, use it in single pass, black and white mode only.

If you have a Visioneer sheetfed scanner, use the Visioneer PaperPort software, and drag and drop an image from it onto TextBridge or your word processor.

TextBridge installs an ISIS driver to support Hewlett-Packard Scanjet 5100C scanners. Other ISIS drivers previously installed on your system are accessible through TextBridge and may work; however, ScanSoft supports only the HP Scanjet 5100C ISIS driver.

The list of scanners supported by TextBridge is always growing. Check the ScanSoft Web site at www.scansoft.com for the most up-to-date list of supported scanners.

Note Install your scanner before you install TextBridge.

Scanners require a TWAIN source driver or an ISIS driver, which are provided by the scanner or interface card manufacturer. Consult the scanner documentation for details about installing your scanner, interface card, and driver.


After installing your scanner, test that the scanner is functioning. Refer to the scanner manufacturer’s documentation to answer any questions about the scanner.

Note Your scanner must work independently of TextBridge before you connect it to TextBridge.

In general, we recommend that you turn on your scanner before you turn on your PC.

Next, install and test your scanner.

INSTALLING AND TESTING YOUR SCANNER

Refer to the manufacturer’s detailed instructions for installing your scanner; they provide the most precise information for setting it up. The basic steps for installing a scanner are:

1. Install the correct scanner interface card (if one is necessary) in the PC bus. Note that many scanners simply plug into the PC’s parallel port, universal serial bus (USB) port, or occasionally the standard serial port.

2. Hook up your scanner to the interface card or standard port with the correct cable, turn on the scanner, and then turn on your PC.

3. Install the system-level scanner driver (.sys) file or TWAIN source driver on your PC’s hard disk, as directed by the scanner documentation.

4. Test the scanner using software tools provided by the scanner manufacturer.

If your scanner runs independently of TextBridge, you can be sure that it is functioning correctly. Setting it up to run with TextBridge should then be simple.

5. After the scanner is functioning, go on to install TextBridge and link your scanner to it.


SYSTEM REQUIREMENTS

To install and run TextBridge, your Windows-compatible PC must be equipped with the following:

An Intel (or compatible) 80486 or Pentium microprocessor. We recommend a Pentium for the best performance.

A VGA, SVGA, or multi-sync color monitor.

A minimum of 24 megabytes (MB) of random access memory (RAM) for Windows 95 and 98; a minimum of 32 MB for Windows 2000 or Windows NT. We recommend 64 MB for the best performance.

Microsoft Windows 95, 98, or 2000 or Windows NT 4.0.

A hard disk with a minimum of 40 MB of free space in which to install TextBridge.

Another version of TextBridge is available for the Macintosh computer. Refer to the ScanSoft Web site at www.scansoft.com for further information. TextBridge will not run on Windows 3.1, Windows NT 3.51, or OS/2.
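If you want to check the free disk space requirement from a script before you run setup, the following Python sketch uses the standard library function shutil.disk_usage. It is not part of TextBridge and assumes that a Python 3 interpreter is available; has_enough_space is a hypothetical helper name, and the sketch treats 1 MB as 1,048,576 bytes.

```python
import shutil

# Minimum free space listed under System Requirements.
MIN_FREE_MB = 40
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def has_enough_space(path: str, min_free_mb: int = MIN_FREE_MB) -> bool:
    """Return True if the drive containing `path` has at least
    `min_free_mb` megabytes of free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= min_free_mb * BYTES_PER_MB
```

For example, calling has_enough_space("C:\\") checks drive C:, where C:\ is a placeholder for the drive you plan to install to. The sketch checks disk space only; it does not check memory, processor, or Windows version.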

BEFORE INSTALLING TEXTBRIDGE

After you install your scanner and check that it is working properly, you are ready to complete other preparations for installing TextBridge and learn more about TextBridge.

Uninstalling a Previous Version of TextBridge

If you have an older version of TextBridge, we recommend that you uninstall it before installing TextBridge Pro Millennium Business Edition. You can still keep certain customized files you have created, such as custom page types and user dictionaries.

Note Uninstalling the older version of TextBridge later may require you to reinstall TextBridge Pro Millennium Business Edition to restore full operation.


If an older version of TextBridge is installed when you insert the TextBridge CD-ROM into your CD-ROM drive, a dialog box appears and recommends that you uninstall it. You can uninstall older versions of TextBridge to save disk space; however, you are not required to do so. If you do not uninstall an older version before installing the new version, you can uninstall it later.

To uninstall a previous version of TextBridge, use the following procedure:

1. Close all active applications, including TextBridge.

2. On the Windows task bar, click Start.

3. Point to Programs, and then point to the TextBridge folder.

4. Click TextBridge Uninstall.

The TextBridge Uninstall dialog box appears.

5. Click Yes to continue the uninstall process, or click No to quit.

TextBridge proceeds with the uninstall. When it is finished, the Uninstall Complete dialog box appears.

6. Click OK to restart your computer.

With these steps finished, TextBridge is removed from your PC.

If you have saved any user dictionary, training, zone template, or text files in the TextBridge folder, the uninstall process does not delete them. You can use your user dictionary files with the new version of TextBridge. Just move them to the Windows folder

...All Users\Application Data\TextBridge\Bin\User Dictionaries

Zone Templates created with TextBridge 9.0 can be used with TextBridge Pro Millennium Business Edition. Just move them to the Windows folder

...All Users\Application Data\TextBridge\Bin\Zone Templates


Training data created with TextBridge 9.0 can be used with TextBridge Pro Millennium Business Edition. Just move them to the Windows folder

...All Users\Application Data\TextBridge\Bin\Training Data

Training data and zone templates created with versions of TextBridge earlier than TextBridge 9.0 cannot be used with this version of TextBridge and can be deleted. You can delete the entire TextBridge folder after you have moved any files that you want to keep.

Using TextBridge with Pagis

Pagis, from ScanSoft, is a color scanning software suite that includes TextBridge. It enables you to scan, copy, fax, view and edit, index, search, and manage electronic documents.

If you have Pagis Pro 2.0 or later installed, Pagis will use the latest version of TextBridge available on your PC.

If you have an earlier version of Pagis (e.g., Pagis SE or Pagis Pro 97), continue to use the previous version of TextBridge with Pagis.

Learning about TextBridge before you install it

When you insert the TextBridge CD-ROM into your CD-ROM drive, an autorun program on the CD-ROM launches TextBridge setup. You can learn more about TextBridge at this point, before you install the program.

After setup starts, select one of the options in the following list:

Install TextBridge Pro Millennium Business Edition. Starts the setup program so that you can install the components of TextBridge.

View Release Notes. The Release Notes appear for you to read and review before you install TextBridge. The Release Notes provide information about TextBridge that was not available when the user’s guide and Help system were finalized. The Release Notes may include special installation instructions, known issues, in-depth information about using TextBridge with specific scanners and other programs, and other technical information.


View Online Documentation. If Adobe Acrobat Reader is not already installed on your PC, TextBridge starts Acrobat’s installation program. The complete online user’s guide appears for you to read and review.

Browse the CD. Windows Explorer opens the TextBridge CD for you to view the folders and files that come with the TextBridge installation program.

Visit ScanSoft’s Web site. Your Web browser goes to the ScanSoft Web page where there is additional information about TextBridge and other ScanSoft products. To use this, you must have a Web browser and a connection to the Internet.

Exit. Quit the TextBridge autorun program.

INSTALLING TEXTBRIDGE

This section provides procedures to install TextBridge.

Note If you want TextBridge to run on more than one version of Windows on a dual-boot system, install TextBridge separately under each operating system.

Before you begin installation, quit any open applications so that only Windows is running. If you typically run programs in the background, close them as well. There should be no applications listed in the task bar and no floating toolbars on the Windows desktop. To close programs, you can press CTRL+ALT+DEL to open the Close Program dialog box.

To install TextBridge:

1. Insert the TextBridge CD into your CD-ROM drive.

An autorun program on the CD-ROM launches the TextBridge setup program. (If the setup program does not start, use Windows Explorer to open the CD-ROM drive, and then double-click autorun.exe.)

The TextBridge setup program menu appears.


2. Click Install TextBridge Pro Millennium Business Edition.

Follow the onscreen prompts and instructions to install TextBridge Pro Millennium Business Edition.

Congratulations! TextBridge setup is now complete, and your new software is installed on your PC.

Note Updates to your TextBridge software may be available on the ScanSoft Web site. Refer to “Updating Your TextBridge Software” later in this chapter for more information.

SCANNER SETUP

The first time you attempt to scan after installing TextBridge, TextBridge automatically runs the Scanner Setup wizard. You can also run the Scanner Setup wizard yourself from the TextBridge program group in the Windows Start menu. Check that the proper driver for your scanner is selected.

If your scanner is not listed as supported on the ScanSoft Web site, be sure to run the scanner test in the Scanner Setup wizard. The scanner test verifies that your scanner works with TextBridge and determines the optimal settings. You can also run the scanner test if you are experiencing problems with your scanner.

To run the Scanner Setup wizard:

1. On the Windows task bar, click Start.

2. Point to Programs, point to the TextBridge Pro Millennium BE folder, and then click Scanner Setup.

Scanner Setup is also available from the TextBridge Tools menu.

Follow the instructions in the Scanner Setup wizard to set up or test your scanner.


SETTING UP INSTANT ACCESS TO TEXTBRIDGE

Instant Access enables you to use TextBridge directly from a number of other programs, such as Word. With Instant Access you can select TextBridge from the File menu of another program. TextBridge starts, recognizes your pages, and then pastes the results at the cursor in the open document.

TextBridge automatically includes Instant Access to many of the applications on your PC. You can use the TextBridge Instant Access Control Panel to view and specify which applications have Instant Access to TextBridge. Open the Instant Access Control Panel from the Start menu or from the TextBridge Tools menu.

Applications commonly used with Instant Access to TextBridge are listed in the Instant Access Control Panel. If a program from which you want to use Instant Access is not in the list, close the control panel, open the program, and then open the control panel again. The program is now included in the list and can be selected.

To provide Instant Access to TextBridge from an application, use the following procedure:

1. On the Windows task bar, click Start.

2. Point to Programs, point to the TextBridge Pro Millennium BE folder, and then click Instant Access Control Panel.

The TextBridge Instant Access Control Panel dialog box appears. TextBridge automatically lists the programs from which Instant Access is available as well as any programs that are currently open. TextBridge excludes any programs known not to work with Instant Access.

3. Click one or more programs in the list to check or uncheck them. Click All to check all programs. Click None to uncheck all programs.

4. Click OK. TextBridge is then available from the File menu in all the checked programs.


UPDATING YOUR TEXTBRIDGE SOFTWARE

You can get live updates to TextBridge from the ScanSoft Web site. These updates can include new scanner support, software patches, and other updates.

To update TextBridge:

1. On the TextBridge Help menu, point to ScanSoft on the Web, and then click TextBridge Updates.

If your computer is set up for Internet access, your Web browser opens at the ScanSoft Web site.

2. Check for updates to your TextBridge software.

3. If your version of TextBridge is not completely up to date, follow the displayed instructions to install the updates.

UNINSTALLING TEXTBRIDGE PRO MILLENNIUM BUSINESS EDITION

To restore your PC to the state it was in before you installed TextBridge Pro Millennium Business Edition, use the following procedure:

1. Close all active applications, including TextBridge.

2. On the Windows task bar, click Start.

3. Point to Settings, and then click Control Panel.

4. In Control Panel, double-click Add/Remove Programs.

5. In the Add/Remove Programs Properties dialog box, select TextBridge Pro Millennium Business Edition, and then click Add/Remove.

The TextBridge Uninstall dialog box appears.

6. Click Yes to continue the uninstall process.

Respond to any prompts as necessary.


7. The Uninstall Complete dialog box appears. Click OK to restart your computer.

When you complete these steps, TextBridge is uninstalled from your PC.

If you have created any files in the TextBridge folder, your files and the TextBridge folder are not deleted by the uninstall process. You can delete the entire TextBridge folder and its contents after you have moved any files that you want to keep to another location.

WHERE TO GO FROM HERE

To learn how TextBridge recognizes a document and how you prepare TextBridge to do this, read Chapter 3. This chapter explains the basic concepts and functions of the software.

To learn how you use TextBridge to process simple and complex documents, refer to Chapter 4. It also explains how to view, zone, train, and proofread your document in TextBridge and edit your document in your word processor.

Chapters 5 and 6 provide sample sessions that are step-by-step tutorials. Chapter 5 shows you how to use auto processing and Instant Access, how to recognize a document with a complex layout, and how to process a document with text, pictures, and a table.

Chapter 6 shows you how to process a document for a database, how to use advanced settings for zones and page types, and how to train TextBridge’s OCR.

The Help system provides a complete reference to the TextBridge user interface. Help includes overview information on key features, getting started instructions, step-by-step procedures for most operations, and user tips. Typical user questions are answered in a “How Do I?” section, and the Troubleshooting section helps you solve problems. Context-sensitive Help is always available: press F1 from any menu command or dialog box.


3 OCR AND BASIC TEXTBRIDGE OPERATIONS

This chapter provides information about the process of page recognition. Use this chapter to learn about optical character recognition (OCR), page recognition, recomposition, and operations that help you use TextBridge effectively, including automatic and manual processing, page types, and recognition settings.

Topics in this chapter include:

What is TextBridge OCR?

Running TextBridge standalone and Instant Access

Improving page recognition with settings

Recognizing other languages

Improving OCR with training

Page recognition, or optical character recognition (OCR), is the technology that converts documents that you can read into documents that your computer can read. Recomposition is the technology that reproduces the formatting of text and the layout of the page, including the positioning of text, pictures, and tables.

WHAT IS TEXTBRIDGE OCR?

TextBridge is OCR software that turns paper documents or page image files into text documents on your PC. Page image data is electronic information about the pages of a document that comes from a source such as your scanner or fax software. This data becomes an image document and is stored in an image file. Text documents are files containing information about the text and pictures in your document. A text document contains one or more pages and is expressed in text form and stored in a text file. You can open, edit, reformat, and republish this information.


Page types

TextBridge can recognize a wide variety of pages. All you need to do is select the page type that most closely matches your original page. TextBridge gives you common page types with settings that are used most often to process pages of that type. You can also define your own page types to handle processing of other types of pages.

Using page types makes it quick and easy for you to perform page recognition. You can modify these page types or create new page types and save them for future use.

Page type settings include: page orientation, page layout, print type, scanner brightness, scanner color and resolution, document language, training data, and user dictionary. The page types to choose from and some of their characteristics are described in the following table:

Page Type          Scan Size   Print Type   Page Layout     Picture Output
Any Page (b&w)     Letter      Any          Any             Gray
Any Page (color)   Letter      Any          Any             Color
Book (Dual page)   Scan max.   Any          Any             B & W
Business Card      Card        Good         Single column   Color
Fax                Letter      Fax          Any             Gray
Legal              Legal       Good         Single column   B & W
Letter             Letter      Good         Single column   B & W
Magazine (b & w)   Letter      Good         Multi-column    Gray
Magazine (color)   Letter      Good         Multi-column    Color
Newspaper          A3          Newspaper    Multi-column    Gray
Table              Letter      Good         Table           Gray


Figure 3–1. Original Page tab in Page Type Settings dialog box

The page type also specifies Scanner Settings that control how pages of this type are scanned. The scan page size is set according to the Size setting. Your scanner’s capabilities, together with the Print Type and Picture Output settings, determine the scan resolution and whether scanning is done in color, grayscale, or black and white.

Scanning grayscale (or color) rather than black and white can improve text recognition on pages with difficult-to-recognize text. However, grayscale scanning is slower than black and white scanning.

Page sources

You can get pages to process from your scanner or from page images. Use your scanner to input paper documents to TextBridge, which then performs OCR on the scanned images, converts the recognized text and pictures to the text file format of your choice, and stores the result on your PC. Alternatively, use TextBridge to recognize and convert page images stored in image files that come from fax modems or other sources.


Recomposition

TextBridge recomposition lets you keep the layout of the original page. When you select Retain page layout in the Save As dialog box, TextBridge recomposes the layout while keeping the output file fully editable (except for HTML WYSIWYG output). After recomposition, text, pictures, and tables are in the same positions relative to each other as in the original page. You can see the results of recomposition when you print the page or view it in layout view, if your word processor supports these elements.

You can retain page layout when outputting to Word or WordPerfect. HTML WYSIWYG output preserves page layout when viewed in Internet Explorer or Netscape.

It is important to note that in reconstructing the layout of the original document, TextBridge is limited by the composition capabilities of the target text program. For example, some complex magazine pages originally created with a publishing program cannot be reproduced identically in your word processor, because even the most powerful word processors lack some of the composition capabilities of publishing software.

In addition, some complex, free-form layouts defeat TextBridge’s recomposition capabilities. For these types of documents, it is often best to preview the pages and manually draw text and image zones around the areas that you want to capture.

Retain pictures keeps pictures in the saved document if the document format supports pictures. If you do not select Retain page layout, pictures are saved at the beginning or end of the document, depending on your word processor. If you select Retain page layout, pictures are in the same positions relative to the text and to each other as in the original page when you print the page or view its layout.

Format with paragraph styles lets you see the formatting styles that TextBridge assigns to paragraphs of text. Paragraph styles have names that begin with “TxBr”. Formatting styles include indentation, font size and style, underline, bold, and italic. Paragraph styles make it easier to change formatting when you open the output document in your word processor. Paragraph styles are available only if your word processor or other text program supports them.
