ABBYY FineReader - 8.0 User Manual

Optical Character Recognition Program
ABBYY® FineReader
Version 8.0 User’s Guide
ABBYY FineReader 8.0 User’s Guide
Information in this document is subject to change without notice and does not bear any commitment on the part of ABBYY. The software described in this document is supplied under a license agreement. The software may only be used or copied in strict accordance with the terms of the agreement. It is a breach of the "On legal protection of software and databases" law of the Russian Federation and of international law to copy the software onto any medium unless specifically allowed in the license agreement or nondisclosure agreements. No part of this document may be reproduced or transmitted in any form or by any means, electronic or other, for any purpose, without the express written permission of ABBYY.
© 2005 ABBYY Software. All rights reserved. © 1987–2003 Adobe Systems Incorporated. Adobe® PDF Library is licensed from Adobe Systems Incorporated. Microsoft Reader Content Software Development Kit © 2004 Microsoft Corporation, One Microsoft Way, Redmond, Washington 98052–6399 U.S.A. All rights reserved. Fonts Newton, Pragmatica, Courier © 2001 ParaType, Inc. Font OCR–v–GOST © 2003 ParaType, Inc. © 1999–2000 Image Power, Inc. and the University of British Columbia, Canada. © 2001–2002 Michael David Adams. All rights reserved. ABBYY, the ABBYY Logo, Scan&Read, ABBYY FineReader are either registered trademarks or trademarks of ABBYY Software Ltd. Adobe, the Adobe Logo, the Adobe PDF Logo and Adobe PDF Library are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries. Microsoft, Outlook, Excel, PowerPoint, Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. All other trademarks are the property of their respective owners.
Contents
Welcome!
What’s New in ABBYY FineReader 8.0
Chapter 1 Working with ABBYY FineReader
Installing and Starting ABBYY FineReader
Acquiring the Image
Page Layout Analysis
Recognition
Checking and Editing Text
Saving into External Applications and Formats
Working with Batches
Automated Tasks
Chapter 2 ABBYY Screenshot Reader
Installing and Starting ABBYY Screenshot Reader
ABBYY Screenshot Reader Toolbar
Capturing Text and Tables from the Computer Screen
Making Screenshots
Additional Options
Chapter 3 ABBYY Hot Folder & Scheduling
Installing and Running ABBYY Hot Folder & Scheduling
ABBYY Hot Folder & Scheduling Main Window
Setting Up a Hot Folder
Hot Folder Log File
Additional Options for ABBYY Hot Folder & Scheduling
Appendix
Supported Document Saving Formats
Supported Image Formats
Hot Keys
Glossary
Welcome!
Thank you for purchasing ABBYY FineReader! Electronic documents are becoming increasingly prevalent. However, business contracts, books and periodicals are still printed, and millions of people use ABBYY FineReader to convert hard–copy documents into electronic formats. ABBYY FineReader gives you the edge by providing full control over printed information: you can quickly transform any printed text or PDF file into an editable format and re–use its content. ABBYY FineReader will help you:
collect information from various sources and draw up a report
edit a paper document or fax
write an article, a thesis or a paper for publication
publish newspaper and book clippings on the Web
extract text from a PDF file and make changes to it
ABBYY FineReader is very easy to use. Even if you are a novice to optical character recognition, you will get results in a matter of minutes. And if you are an OCR professional, you can have full control over all the OCR settings and parameters. This User’s Guide will introduce you to the features and commands of ABBYY FineReader and help you teach your computer to “read.” Welcome to OCR!
What’s New in ABBYY FineReader 8.0
Compared to the previous version, ABBYY FineReader 8.0 introduces a variety of improvements and new features to increase your productivity when working with scanned documents, images, PDF files, and faxes. Technology improvements in this version provide better reading of images taken by digital cameras, more accurate recognition of low–resolution faxes and paper documents, better handling of document layouts, and enhanced security features in PDF files. New features such as scheduled operation, recognition of screenshots and an automation manager for processing recurring document tasks have been added to increase your productivity even more. Detailed information on major product improvements and new features is given below. Features available only in ABBYY FineReader 8.0 Corporate Edition are marked accordingly.
Up to 30 percent accuracy improvement on low resolution documents and faxes
While ABBYY FineReader has traditionally delivered highly accurate recognition results on documents of good quality, documents of lower–than–expected quality may still arrive at your desk. Most often these are faxes or paper documents that were scanned at a resolution lower than recommended for OCR. ABBYY FineReader 8.0 handles such documents better, delivering up to a 30 percent improvement in recognition accuracy.
Processing images taken by digital cameras
When you are on the go and no scanner is available, you may still capture documents with a digital camera and recognize them later on your desktop PC. Now ABBYY FineReader 8.0 includes new adaptive recognition technology for better OCR of camera images.
Security options for PDF files
The new version of ABBYY FineReader supports PDF security settings and allows you to set document Open and Permissions passwords and to select other security options for PDF files. You can select RC4–based 40–bit or 128–bit encryption or the newest AES (Advanced Encryption Standard)–based 128–bit encryption.
Creating tagged PDF files
A new option for saving tagged PDF files in ABBYY FineReader 8.0 makes it possible to create PDF files that will be easier to read when used on devices with limited screen sizes, for example handheld devices.
Automation Manager
This new feature allows faster processing of repeated document tasks by grouping them into sets of consecutive operations that can then be called by one click of a button. Several predefined automated tasks are available and it is also possible to create your own customized automated tasks and share them with colleagues.
Support for hyperlinks
The new version recognizes hyperlinks, such as links to Web sites and e–mail addresses, and reconstructs them in output documents. You can also add new hyperlinks into recognized documents.
Fast mode recognition
With ABBYY FineReader 8.0 you can recognize documents 2 to 2.5 times faster using a new fast recognition mode. This mode is recommended for documents with simple layouts and good printing and scanning quality. For more complex documents, the accuracy mode should be preferred. However, recognition accuracy obtained in fast mode will be sufficient in many cases, for example when converting paper documents into searchable PDF files.
Saving to Microsoft Reader e–book (LIT) format
Now you can save recognition results in Microsoft Reader’s LIT e–book format that makes such documents suitable for reading on handheld devices and PDAs.
Defining document–related properties
ABBYY FineReader 8.0 allows for defining additional document properties like Title, Author, Subject and Keywords, and saving this data to PDF, DOC/RTF, XLS, HTML, Word XML and LIT file formats. These properties can be used by the operating system and other software for indexing and search purposes.
Extended language and dictionary support
The total number of supported languages is now 179. Dictionary support and spell–check functions are available for 36 languages. Legal and medical dictionaries for the English and German languages are now included in the main English and German recognition dictionaries – there is no need to select specialized recognition languages to work with specialized text.
Opening multi–page PDF and TIFF files
If you do not need the entire document converted, you can open only selected pages of your multi–page PDF or TIFF files in ABBYY FineReader 8.0.
ABBYY Screenshot Reader
(available in ABBYY FineReader 8.0 Professional Edition after registration, available by default in ABBYY FineReader 8.0 Corporate Edition) This simple and easy–to–use utility allows you to grab a part of the screen and recognize the text in the captured image. The utility also allows you to save captured screen areas to a file or to the clipboard.
ABBYY Hot Folder & Scheduling
(available only in ABBYY FineReader 8.0 Corporate Edition) The previously available feature for automatically scanning folders for incoming images and processing these images has been extended in the new version of ABBYY FineReader. Scheduled processing capabilities have been added to allow you to utilize your computer for document conversion purposes when it is not occupied by your normal activities, for example at night time.
Chapter 1 Working with ABBYY FineReader
Chapter Contents:
Installing and Starting ABBYY FineReader
Acquiring the Image
Page Layout Analysis
Recognition
Checking and Editing Text
Saving into External Applications and Formats
Working with Batches
Automated Tasks
Network Document Processing
Installing and Starting ABBYY FineReader
This chapter provides detailed instructions on installing ABBYY FineReader, outlines the system requirements of the program and offers instructions for installing the program on workstations and networks. ABBYY FineReader 8.0 includes a specialized installation program that automates the setup process. To ensure proper installation, always use the ABBYY FineReader CD–ROM for installation.
Software and Hardware Requirements
ABBYY FineReader 8.0 requires the following:
1. A PC with Intel® Pentium®/Celeron®/Xeon™, AMD K6/Athlon™/Duron™/Sempron™ or compatible processor (500 MHz or higher).
2. Operating System: Microsoft® Windows® Server 2003, Microsoft® Windows® XP, Microsoft® Windows® 2000. (To work with localized interfaces, corresponding language support is required.)
3. Memory: 128 MB RAM. In a multiprocessor system, an additional 16 MB of RAM is required for each additional processor.
4. Hard disk space: 250 MB for typical program installation and 100 MB for program operation.
5. A TWAIN–compatible scanner, digital camera or fax–modem.
6. A video card and monitor (min. resolution 800×600).
7. A keyboard, mouse or other pointing device.
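The memory requirement in item 3 is simple arithmetic and can be sketched in a few lines of Python. The function name below is hypothetical and not part of any ABBYY tool. The sketch assumes that the 128 MB base figure covers the first processor.

```python
BASE_RAM_MB = 128          # minimum stated for the first processor (assumption: base covers one CPU)
EXTRA_RAM_PER_CPU_MB = 16  # stated addition for each additional processor


def minimum_ram_mb(processor_count: int) -> int:
    """Return the minimum RAM, in MB, for a system with the given number of processors."""
    if processor_count < 1:
        raise ValueError("processor_count must be at least 1")
    return BASE_RAM_MB + EXTRA_RAM_PER_CPU_MB * (processor_count - 1)


print(minimum_ram_mb(1))  # 128
print(minimum_ram_mb(4))  # 176
```

For example, a four–processor system needs 128 MB plus 3 × 16 MB, or 176 MB in total.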
Installing ABBYY FineReader
The installation program will guide you through installation of ABBYY FineReader. Please close all applications prior to installing ABBYY FineReader.
To install ABBYY FineReader:
1. Insert the ABBYY FineReader 8.0 CD–ROM into the CD–ROM drive. The installation program will be launched automatically.
2. Follow the installation instructions.
If the installation program does not launch automatically:
1. Click the Start button on the Taskbar and select the Settings/Control Panel item.
2. Double–click the Add/Remove Programs icon.
3. Select the Install/Uninstall tab and click the Install button.
4. Follow the installation program instructions.
Installation options
During the installation, you will be asked to select one of the two installation options:
Typical (recommended) – This option installs all components of the program, including all recognition languages. You will be prompted to choose a single interface language during installation.
Custom installation – This option allows you to choose to install only specific components of the program, including all available recognition languages.
Consult the readme.htm file on the ABBYY FineReader CD–ROM if you encounter an error message.
Note: If you wish to retain your user dictionaries and patterns from a previously installed version of ABBYY FineReader, do not uninstall the older version of the program prior to installing the new version. All existing user dictionaries and patterns will then be available for use in the latest version.
Starting ABBYY FineReader 8.0
To start ABBYY FineReader:
Select the ABBYY FineReader 8.0 Professional Edition (Corporate Edition) item in the Start/Programs menu.
Click the ABBYY FineReader button on the Microsoft Word toolbar.
In Windows Explorer, right–click the file you wish to open. In the local menu, select the Open with ABBYY FineReader command.
Note: Make sure your scanner is connected to your computer, plugged in, and turned on before you start ABBYY FineReader. To install a scanner after installing the program, please consult the user guide supplied with the scanner for installation instructions. If you do not have a scanner, you can still recognize image files using ABBYY FineReader 8.0. You will find sample image files in the ABBYY FineReader/Demo folder on the program CD–ROM.
Installation on a Network Server or Workstation
Only the system administrator may install ABBYY FineReader 8.0 Corporate Edition on a network server. There are two stages to the installation. First, the program is installed on the server. Then, from the server, the program can be installed on workstations using one of the following methods:
using Active Directory
using Microsoft Systems Management Server (SMS)
using Task Scheduler
from the command line
manually in interactive mode
To install ABBYY FineReader 8.0 Corporate Edition on the server:
1. Insert the ABBYY FineReader CD–ROM into the CD–ROM drive.
2. Run Adminsetup.exe from the ABBYY FineReader CD–ROM.
The System Administrator’s Guide (which can be found in the Administrator’s Guide folder on the server where ABBYY FineReader is installed) provides additional information about installing ABBYY FineReader on workstations, working with the License Manager and working with the program in a local area network.
Acquiring the Image
The quality of the source image greatly affects recognition quality. In this chapter, you will learn how to scan documents for best results, how to open and read saved images (see the list of supported image formats in the "Supported Image Formats" section), and how to process images to improve recognition quality (by eliminating scanning "dust" etc.).
Scanning
ABBYY FineReader communicates with the scanner through a TWAIN interface. The TWAIN standard, which was adopted in 1992, is a universal standard that unifies the interaction between a computer image input device (such as a scanner) and an external application. ABBYY FineReader communicates with a scanner through a TWAIN driver in two ways:
through the ABBYY FineReader interface. In this case, use the Scanner Settings dialog and select Use ABBYY FineReader interface;
through the scanner's TWAIN interface. In this case, use the scanner's TWAIN dialog to set scanning options; select Use TWAIN–source interface.
Each mode has its advantages and disadvantages.
Using the TWAIN source interface makes the “preview image” option available so that you can set the scanning area and tune the brightness precisely, and see how these changes affect the previewed image. Every scanner has a unique TWAIN driver dialog. Consult your scanner’s documentation for precise instructions on using the TWAIN dialog.
Using the ABBYY FineReader interface provides access to a couple of additional features: a) the ability to scan multiple pages with a scanner that does not have an automatic document feeder (ADF); and b) the ability to access scanning options in the batch template file (*.fbt) and use them for other batches.
Switching between modes is easy:
Select the Scan/Open tab in the Options dialog (menu Tools>Options), then select the interface: either Use TWAIN–Source interface or Use ABBYY FineReader interface.
Note:
1. The Use ABBYY FineReader interface option may be unavailable (or disabled) in certain scanner models.
2. If you wish to see the Scanner Settings dialog in Use ABBYY FineReader interface mode, select the Display options dialog before scanning item on the Scan/Open tab (Tools>Options).
Important: Consult your scanner's documentation to ensure it is set up correctly. After connecting the scanner to the computer, install a TWAIN driver and/or the scanner software.
To start scanning:
Click the 1–Scan button or select the Scan item in the File menu. The Image window containing a scanned image of the page will appear in ABBYY FineReader's main window. To scan multiple pages in a row, select the Scan multiple images option on the Scan/Open tab in the Options dialog.
Note: To open this dialog, select the Options... item from the 1–Scan button menu.
If scanning does not begin immediately, one of two dialogs will open:
The scanner's TWAIN–Source dialog. In this dialog, check the scanning options and click the OK button (depending on your scanner model, it may be called Done, Scan, Final, etc.) to start scanning.
The Scanner Settings dialog. In this dialog, check the scanning options and click the OK button to start scanning.
Tip: To start recognition immediately after the source images are scanned, use the Scan&Read option:
Click the arrow at the right of the Scan&Read button and select the Scan&Read item in the local menu.
ABBYY FineReader will scan and read the images. The scanned image will appear in the Image window and the recognition results will be displayed in the Text window of the main window.
Setting Scanning Parameters
Recognition quality depends greatly on the quality of the scanned image. You can improve the image quality by altering the main scanning parameters: resolution, scan mode and brightness.
The main scanning parameters are:
Resolution – use 300 dpi resolution for regular texts (font size 10 pts or greater) and 400–600 dpi resolution for texts set in smaller font sizes (9 pts or less).
Scan mode gray. Scanning in grayscale mode is best for recognition purposes. During grayscale scanning, brightness is adjusted automatically.
Scan mode black and white. Black and white scanning maximizes scanning speed but may result in the loss of some character information. This may lower recognition quality in documents of medium and low print quality.
Scan mode color. Select this mode for documents that contain pictures, colored text or colored backgrounds, so that you can retain the original colors. In all other cases, gray scan mode is preferable.
Brightness – a medium brightness value of around 50% should suffice in most cases. Some documents scanned in black and white mode may require additional brightness tuning.
Note: Scanning at 400 to 600 dpi resolution (instead of the default 300 dpi) or scanning in grayscale or color (instead of black & white) mode takes more time. Some scanners may take up to four times longer to scan at 600 dpi than at 300 dpi.
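The resolution guidance above can be sketched as a small Python helper. The function name is hypothetical and not part of ABBYY FineReader. The guide does not say how to treat font sizes between 9 and 10 pts, so the sketch assumes that anything smaller than 10 pts counts as small text.

```python
from typing import Tuple


def recommended_dpi_range(font_size_pt: float) -> Tuple[int, int]:
    """Return the (lowest, highest) recommended scan resolution in dpi.

    Regular text (10 pts or greater) is scanned at 300 dpi.
    Smaller text is scanned at 400 to 600 dpi.
    Assumption: sizes between 9 and 10 pts are treated as small text.
    """
    if font_size_pt <= 0:
        raise ValueError("font_size_pt must be positive")
    if font_size_pt >= 10:
        return (300, 300)
    return (400, 600)


print(recommended_dpi_range(12))  # (300, 300)
print(recommended_dpi_range(8))   # (400, 600)
```

Keep in mind that the higher resolutions take longer to scan, so use them only when the text is actually small.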
To set scanning parameters:
To scan images using the ABBYY FineReader TWAIN interface, click the Scanner Settings button on the Scan/Open tab in the Options dialog (menu Tools>Options). The Scanner Settings dialog will open. Select the appropriate scanning options from the dialog.
If you wish to scan your images using the TWAIN–Source interface, your scanner's TWAIN dialog will open automatically when you click the 1–Scan button. Set the scanning parameters in the dialog. Scanning options may have different names depending on the scanner model. For example, brightness may be labeled with the word "threshold", a "sun" symbol, or a black and white circle. Consult your scanner documentation for a full description of available options.
Tips on Brightness Tuning
To be recognized, a scanned image must be legible. Check the legibility of the image in the Zoom window.
Figure: an example of an image that is appropriate for OCR.
If you see that the scanned image is compromised (characters are glued or torn), consult the table below to find ways to improve image quality.
Your image looks like this:                   Possible remedy:
characters are "torn" or very light           Lower the brightness (to make the image darker). Scan in gray mode (to activate brightness autotuning).
characters are distorted, glued, or filled    Increase the brightness (to make the image brighter). Scan in gray mode (to activate brightness autotuning).
Scanning Multi–Page Documents
ABBYY FineReader offers a specialized scanning mode (Scan Multiple Images) for more convenient scanning of a large number of pages. To enable this mode, select the Scan multiple images option on the Scan/Open tab of the Options dialog (menu Tools>Options). However, note the following:
If you use the ABBYY FineReader TWAIN interface, scanning will be continuous, i.e. when one page is finished, the program will automatically start on the next.
If you use the TWAIN–Source interface, the TWAIN–dialog of the scanner will remain open after scanning a page so that the next page can be placed onto the scanner immediately.
The process of scanning a large number of pages depends on whether you are using a scanner with an Automatic Document Feeder (ADF) or one without.
ADF Scanning:
1. If you are using the ABBYY FineReader interface, select the Use automatic document feeder option in the Scanner Settings dialog (to open this dialog, click the Scanner Settings button on the Scan/Open tab of the Options dialog) and the Scan multiple images option on the Scan/Open tab in the Options dialog (menu Tools>Options...), then click 1–Scan to start scanning.
2. If you are using the TWAIN–Source interface, select the Use automatic document feeder option in the TWAIN dialog of your scanner (remember that each scanner may name this option differently; consult your scanner documentation for details) and the Scan multiple images option on the Scan/Open tab in the Options dialog (menu Tools>Options), then click 1–Scan to start scanning.
Non–ADF Scanning
If you are using the ABBYY FineReader interface, select the Scan multiple images option on the Scan/Open tab in the Options dialog (menu Tools>Options...) and then click 1–Scan to start scanning.
If you are using a flatbed scanner without an ADF and the ABBYY FineReader interface, there are two ways to increase its efficiency:
Set a pause value (i.e. the time that will elapse between the scanning of one page and the next). To do this, select the Pause between pages option and then set the pause value (in seconds) in the Scanner Settings dialog (to open this dialog, click the Scanner Settings button on the Scan/Open tab of the Options dialog). The scanner will pause for the predefined time before scanning the next page to allow you to place the next page onto the scanner. After the pause, scanning continues automatically.
Select the Stop between pages option in the Scanner Settings dialog (to open this dialog, click the Scanner Settings button on the Scan/Open tab of the Options dialog). Each time a page scan is completed, a dialog will ask you if you wish to continue scanning. Click the Yes button to continue scanning or No to end the process.
When you have finished scanning your pages, select the Stop Scanning item in the File menu.
If you are using the TWAIN–Source interface:
Select the Scan multiple images option on the Scan/Open tab in the Options dialog (menu Tools>Options...) and then click 1–Scan to start scanning. The TWAIN dialog of your scanner will open. Click the Scan (Final, or other) button to start scanning.
Scan a page, insert the next page into your scanner and click the Scan button in the TWAIN dialog of your scanner to continue scanning. When all pages have been scanned, click the Close or other scanner–specific button in the TWAIN dialog of your scanner.
Tip: To have greater control over the quality of your scanned images, make sure that the Open image during scanning option in the Scan/Open group in the Legacy Options dialog is selected. (To open the Legacy Options dialog, click the Legacy Options... button on the General tab in the Options dialog.) This option opens each scanned page in the Image window immediately after it has been scanned. If a scanned page is unsatisfactory, halt the scanning process by clicking Stop Scanning in the File menu, and then re–scan the page.
Solving Scanning Problems: Your Scanner does not Support TWAIN
Even if your scanner is not TWAIN–compatible, you can still continue using ABBYY FineReader! Just do the following:
1. Create a new batch and open it. (If a batch is already open, skip this step.)
2. Set the correct recognition parameters (recognition language, document type, print type).
3. Select Start Background Recognition in the Process menu.
4. Scan the document you want to read using any image acquisition program your scanner works with. Do not close ABBYY FineReader. Save the scanned image in the folder where you saved the open ABBYY FineReader batch; the file name should be 0001.TIF. ABBYY FineReader will pick up the image automatically and read it. Note: If the batch already contains pages, the first scanned image file should be named not 0001.TIF but XXXX.TIF, where XXXX is the number of batch pages plus one. For example, if there are 10 pages in your batch, the first scanned image file name should be 0011.TIF. If you scan one more file, it should be named 0012.TIF, and so on.
5. Scan the second document and save it as 0002.TIF, etc.
6. Press F5 to update the page list.
7. Select Stop Background Recognition in the Process menu to stop recognition.
In this way, ABBYY FineReader will read, one by one, all the pages you save to the batch folder.
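The file naming rule in step 4 can be sketched in Python as follows. The function names are hypothetical and not part of ABBYY FineReader. The sketch assumes a four–digit, zero–padded page number as in the examples above. The guide does not describe batches with more than 9999 pages, so the sketch rejects them.

```python
from typing import List

MAX_PAGE_NUMBER = 9999  # assumption: names have exactly four digits, as in 0001.TIF


def next_page_filename(pages_in_batch: int) -> str:
    """Return the file name for the next scanned image: batch page count plus one."""
    if pages_in_batch < 0:
        raise ValueError("pages_in_batch must not be negative")
    page_number = pages_in_batch + 1
    if page_number > MAX_PAGE_NUMBER:
        raise ValueError("page number does not fit into four digits")
    return f"{page_number:04d}.TIF"


def page_filenames(pages_in_batch: int, new_pages: int) -> List[str]:
    """Return the file names for several consecutive new scans."""
    return [next_page_filename(pages_in_batch + i) for i in range(new_pages)]


print(next_page_filename(0))   # 0001.TIF
print(page_filenames(10, 2))   # ['0011.TIF', '0012.TIF']
```

An empty batch starts at 0001.TIF, and a batch that already holds 10 pages continues with 0011.TIF and 0012.TIF, matching the example in step 4.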
Opening Images and PDF Files
You can recognize image files without using a scanner (see the list of supported image formats under "Supported Image Formats"). To open an image:
Click on the downward–pointing arrow to the right of the 1–Scan button and select the Open Image item in the local menu. An Open caption will replace the Scan caption on the button.
Select Open Image from the File menu.
In Windows Explorer, right–click the image file you want to open and select Open with ABBYY FineReader from the local menu. If ABBYY FineReader is running, the image will be added to the current batch. Otherwise, the program will be launched and the most recently used batch opened before the image is added.
In Microsoft Outlook or Windows Explorer, click on the image file you want to open and drag it onto the minimized ABBYY FineReader window. The image will be added to the current batch and opened in the Image window.
Select one or several images in the Open dialog. The selected images will be displayed in the Batch window, and the last selected image displayed in the Image and Zoom windows. All selected images are copied into the batch folder. See the "General Information on Working with Batches" section for more information on batch organization and a description of how pages are displayed within batches.
Tip: If you want the opened images to be recognized right away, select the Open&Read mode:
1. Select the Open&Read item in the Process menu or just press CTRL+SHIFT+D. The Open dialog will open.
2. Select the images for recognition in the Open dialog.
Opening PDF files
The author of a PDF file can limit access to the file, for example by protecting it with a password or by restricting certain features such as extracting text and graphics. Accessing these restricted features without permission would violate the author's copyright, so ABBYY FineReader will ask you for a password when you open such a file.
Scanning Dual Pages
When scanning a bound document (e.g. a book), it is easiest to scan both facing pages at once as a dual page. Recognition quality improves, however, if the dual page is split into two separate pages after scanning, so that layout analysis, de–skewing (if necessary), and recognition are performed on each page individually. To split a dual page:
Select the Split Dual Pages option on the Scan/Open tab (Tools>Options menu) prior to scanning.
This command splits each dual page into two batch pages. See "General Information on Working with Batches" for more information on batches. Note: If a dual page has been split incorrectly, deselect the Split dual pages checkbox and re–scan the dual page, or add the page images to the batch again. Then split the image manually using the Split Image dialog (Image>Split Image).
Adding Business Card Images to a Batch
The most efficient way of inputting business cards is to fit as many cards as possible onto the scanner plate. After input, though, each card should be recognized as a separate page (particularly if de–skewing has been done). You may choose either automatic or manual splitting tools to separate the business card image into individual cards. Note: This process requires that the cards be arranged in a specific order. Consult the “Working with Business Cards” section in the ABBYY FineReader Tutorial for more information.
To split the image:
1. Select the image in the Batch window.
2. Select Split Image from the Image menu to open the Split Image dialog.
3. Click on Split business cards.
Note:
1. This process removes the split page from the batch and replaces it with individual card images. For more detailed information, see the "General Information on Working with Batches" section.
2. If the image has been split incorrectly, try to split the image manually by using the Add vertical separator/Add horizontal separator button.
3. In order to delete all separators, click the Remove all separators button.
4. To move a separator, switch to Select separator mode (click the corresponding button).
5. To delete a separator, switch to Select separator mode (click the corresponding button) and move the separator outside the image.
Using a Digital Camera to Photograph Texts
Taking photos of documents requires some skill and practice. In this section you will learn how to set up your camera so that you can get document photos suitable for OCR. For more detailed information about the settings of your particular camera, please refer to the documentation supplied with the camera.
Before taking shots...
1. Make sure that the page fits entirely within the frame and no unwanted objects are visible.
2. Make sure that lighting is evenly distributed across the page and there are no dark areas or shadows.
3. Straighten out the page if required and position the camera parallel to the plane of the document so that the lens points at the center of the text being photographed.
Digital Camera Requirements
Minimum Requirements
2–megapixel sensor
Variable focus lens (fixed–focus cameras, common in cellphones and hand–held devices, will usually produce images unsuitable for OCR)
Recommended Requirements
5–megapixel sensor
Flash Disable mode
Manual aperture control or aperture priority mode
Manual focusing
An anti–shake system, otherwise the use of a tripod is recommended
Optical zoom
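To see how sensor size relates to OCR resolution, the following Python sketch estimates the effective resolution of a photographed page. It rests on assumptions that are not stated in this guide: the page is A4 and fills the whole frame, and the pixel dimensions used (1600 x 1200 for about 2 megapixels, 2592 x 1944 for about 5 megapixels) are typical values rather than figures from ABBYY. The function name effective_dpi is hypothetical.

```python
A4_INCHES = (8.27, 11.69)  # short side, long side of an A4 page (assumed page size)


def effective_dpi(pixels_long, pixels_short, page_inches=A4_INCHES):
    """Estimate dots per inch when the page exactly fills the frame.

    The lower of the two per-axis values is returned, because it limits
    the detail available to OCR.
    """
    page_short, page_long = page_inches
    return min(pixels_long / page_long, pixels_short / page_short)


two_mp = effective_dpi(1600, 1200)   # assumed dimensions of a 2-megapixel frame
five_mp = effective_dpi(2592, 1944)  # assumed dimensions of a 5-megapixel frame
print(f"2 MP: about {two_mp:.0f} dpi")
print(f"5 MP: about {five_mp:.0f} dpi")
```

Under these assumptions, a 2–megapixel frame of a full A4 page stays below 200 dpi, while a 5–megapixel frame comes out slightly above 200 dpi. Smaller pages, or photographing only part of a page, give proportionally higher values.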
How to Photograph Texts
Lighting
Make sure there is enough light – daylight is recommended. In the case of artificial lighting, use two light sources positioned so as to avoid shadows.
Positioning the Camera
The use of a tripod is highly recommended. The best results are obtained when shooting at the maximum optical zoom. The lens must be positioned parallel to the plane of the document and look towards the center of the text. At full optical zoom, the distance between the camera and the document must be sufficient to fit the entire document into the frame. Usually this distance will be 50–60 cm.
Flash
If there is enough light, turn off the flash to avoid the glare of the page and sharp shadows. In poor lighting conditions, try using the flash from a distance of about 50 cm, but even then using additional illumination is recommended.
Important! Glare is worst when the flash is used on documents printed on glossy paper.
Shooting Mode
Aperture
In poor lighting conditions, open the aperture as wide as the camera allows (usually about f/3.5–5.6). In bright daylight, smaller apertures will produce sharper images.
ISO Speed
In poor lighting conditions, be sure to select a higher ISO setting.
Focus
Autofocus may not work properly in poor lighting conditions. If this is the case, focus the camera manually.
White Balance
If your camera allows, use a white sheet of paper to set white balance. Otherwise, select the white balance mode which best suits the current lighting conditions.
Additional Recommendations
Insufficient lighting will cause the camera to use longer exposure times, which may reduce the sharpness of the resulting picture. Try the following:
Enable the anti–shake system, if available.
Use autorelease to prevent the shaking of the camera caused by pressing the shutter release button.
What do I do if...
The picture is too dark and low–contrast
Try using additional light sources. Otherwise, open up the aperture.
The picture is not sharp enough
Autofocus may not work properly in poor lighting or when trying to photograph the document from a close distance. In poor lighting conditions, try using an additional light source. When photographing a document from a close distance, try using the Macro (or Close–Up) mode. Otherwise, focus the camera manually if manual focus is supported by your camera. If only a part of the picture is blurred, try a smaller aperture. Increase the distance between the document and the camera and use the maximum zoom. Focus on a point somewhere in between the center and a border of the image.
The flash causes a glare in the center of the picture
Turn off the flash. Otherwise, try photographing from a greater distance.
Working with an Image
Despeckle image
The recognized image may contain a large amount of "dust" (i.e. excess dots) if the original is of medium–to–low print quality. Dust that lies close to character outlines may adversely affect recognition quality. To decrease the number of dots:
Select Despeckle Image from the Image>Image Adjustment menu.
To despeckle a particular block:
Select Despeckle Block from the Image>Image Adjustment menu.
Note: Despeckling may decrease recognition quality if the original document is very faint or contains a light font. Very small characters, such as periods or commas, and parts of very thin characters may disappear.
Change image resolution
Image resolution shows the fineness of detail that can be distinguished in an image and is measured in dots per inch (dpi). ABBYY FineReader shows best OCR performance when vertical and horizontal resolution is the same and is in the range from 50 to 3200 dpi. The recommended range is 200–600 dpi, and the recommended setting is 300 dpi. If image resolution is too low or too high, this may have an adverse effect on the quality of OCR. Some image formats, e.g. *.bmp files, have no resolution. Sometimes an image may have non–standard resolution, e.g. 204*96 dpi, which may also adversely affect the quality of OCR.
ABBYY FineReader checks the resolution of each image and corrects it if required, leaving the image dimensions unchanged. Images whose resolution has been corrected by the program are marked with an icon in the Batch window. Place the mouse cursor over such an image to see a pop–up tip. If OCR quality for a particular image is poor, changing its resolution may help improve the quality. To change the resolution of an image marked with this icon:
In the Batch window, select the marked image whose resolution you wish to change. If the pop–up tip says that the image has invalid resolution, select the Correct Resolution command from the Image menu.
In the dialog that opens, either select the type of the image (scanned image, faxed image, or screenshot) or select Other resolution and type in the exact resolution of the image.
Select Selected images to change the resolution of the selected images. Select All images in batch to change the resolution of all the images in the batch. The latter option is recommended for images obtained from one and the same source.
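The resolution guidelines in this section (equal horizontal and vertical resolution, 50 to 3200 dpi overall, 200–600 dpi recommended) can be expressed as a small check. The Python sketch below is an illustration only; the function name resolution_issues and the wording of the messages are hypothetical and do not reflect how ABBYY FineReader itself reports problems.

```python
SUPPORTED_DPI = (50, 3200)    # overall range given in this section
RECOMMENDED_DPI = (200, 600)  # recommended range given in this section


def resolution_issues(x_dpi, y_dpi):
    """Return a list of reasons why a resolution may hurt OCR quality."""
    issues = []
    if x_dpi != y_dpi:
        issues.append("horizontal and vertical resolution differ")
    lowest, highest = min(x_dpi, y_dpi), max(x_dpi, y_dpi)
    if lowest < SUPPORTED_DPI[0] or highest > SUPPORTED_DPI[1]:
        issues.append("outside the supported 50-3200 dpi range")
    elif lowest < RECOMMENDED_DPI[0] or highest > RECOMMENDED_DPI[1]:
        issues.append("outside the recommended 200-600 dpi range")
    return issues


print(resolution_issues(300, 300))  # the recommended setting
print(resolution_issues(204, 96))   # the non-standard example from the text
```

With the recommended 300 dpi setting the list is empty, while the non–standard 204*96 dpi example is flagged both for unequal resolution and for falling outside the recommended range.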
Straighten text lines
When scanning very thick books, the text close to the binding may be distorted. Similarly, when photographing text with a digital camera, the text close to the margin may be distorted. To remedy such distortions:
Select Tools>Options and click the Scan/Open tab. Under Image processing, select Straighten text lines.
Note: Straightening text lines may take some time.
Invert image
Some scanners invert images (turning black into white and vice versa) during scanning. You may wish to apply the Invert Image option to create a uniform or standard appearance (e.g. a black font against a white background) among the documents. To do this:
Select Invert Image from the Image>Image Adjustment menu.
Note: If you scan or open inverted images, select the Invert image item in the Scan/Open group in the Legacy Options dialog prior to adding these images to the batch. To open the Legacy Options dialog, click the Legacy Options... button on the General tab in the Options dialog.
Rotate or flip image
Recognition quality relies on the image having a standard orientation (the text should be read from top to bottom and all lines should be horizontal). ABBYY FineReader automatically detects page orientation during the recognition stage. If the program detects page orientation incorrectly, clear Detect image orientation (during recognition) on the Scan/Open tab and rotate the image manually. To do this:
Click the corresponding button or select Rotate Clockwise from the Image>Rotate/Flip Image menu to rotate the image 90° clockwise.
Click the corresponding button or select Rotate Counter–Clockwise from the Image>Rotate/Flip Image menu to rotate the image 90° counter–clockwise.
Select Rotate Upside Down from the Image>Rotate/Flip Image menu to rotate the image 180°.
To flip the image:
horizontally (around the vertical axis) – select Flip Horizontal from the Image>Rotate/Flip Image menu,
vertically (around the horizontal axis) – select Flip Vertical from the Image>Rotate/Flip Image menu.
Clear block
You may choose to skip recognizing a particular image area or eradicate large areas of dust on the image by erasing them. To do this:
Select the erasing tool, then hold down the left mouse button and drag over the image area you want to erase. Release the button to erase the selected image area.
Crop image
Sometimes a scanned image may have dark borders. You can crop the unwanted black areas before running OCR. You can also use the Crop Image tool to reduce the image to a standard paper size, such as A4 or A5.
1. In the Image window, select the cropping tool (or select the Crop Image command from the Image menu).
2. The image will be displayed in a Crop Image window and its borders indicated by black lines.
In the drop–down list to the left, you can select the scale at which the image is displayed in the window;
To crop the image, rest the mouse cursor on the colored border and drag it to the desired location. Alternatively, you can rest the mouse cursor in one of the corners and drag it diagonally. The part of the image that will be removed will be displayed in gray. Click the Crop button;
To reduce the image to a standard paper size, select the desired paper size from the Crop to list to the right;
To skip cropping the current image and go to the next one, click the Skip button;
Clear the Move to next image box if you do not wish ABBYY FineReader to automatically move to the next image once you are finished working with the current image.
Note:
1. We recommend cropping an image before you have drawn blocks and recognized the image.
2. You can change the color of the image borders used in the Crop window. In order to do this, go to the View tab of the Options dialog (menu Tools>Options). In the Appearance group, select the Crop Image Block from the list and click the Color field. In the Color dialog, choose the desired color.
Increase/Decrease image scale
Select the zoom–in or zoom–out tool on the Image bar (in the Image window) and click on the image. The image scale will double or halve, respectively.
Right–click the image and select Scale. Choose the desired scale (by percentage) from the local menu.
Get image information
You can view several parameters of your image: its width and height in pixels, its vertical and horizontal resolution in dots per inch (dpi), and the image type. To do this:
Right–click on the image and select the Properties item from the local menu. In the dialog that opens, select the Image tab.
Print image
You can print the image in the Image window, the pages selected in the Batch window, or all batch page images. To do this:
Select File>Print>Image. The Print dialog will open. Set the desired printing parameters (the printer to be used, number of pages to be printed, the number of copies etc.).
Undo the last action
Click the Undo button on the Standard bar.
Tip: To reverse an Undo action, click the Redo button on the Standard bar.
Page Numbering
A number is assigned to each scanned page. The default number is the number of the last batch page plus one. You may set page numbers manually if you want to retain the original page numbers in the document or if you want to scan pages according to page number.
To specify page numbers:
Select Ask for page number before adding page to the batch on the Scan/Open tab (Tools>Options menu).
To scan a large number of double–sided pages according to page number:
1. Select Ask for page number before adding page to the batch on the Scan/Open tab (Tools>Options).
2. Specify a number for the first scanned page in the Page number dialog, then select Odd and even separately in the Page numbering field. Select an order for the pages: ascending or descending to reflect the way in which the double–sided pages have been entered into the automatic document feeder (i.e. whether the last page or the first page has been placed on top).
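The Python sketch below illustrates one plausible reading of the Odd and even separately setting: each scanned side receives a number two higher (ascending) or two lower (descending) than the previous one, starting from the number entered in the Page number dialog. This is an assumption about the numbering, not a documented algorithm, and the function name page_numbers is hypothetical.

```python
def page_numbers(first_number, sides_scanned, ascending=True):
    """List the page numbers given to sides scanned in one pass.

    Assumes numbers advance in steps of two, so odd and even pages
    can be scanned in two separate passes through the feeder.
    """
    step = 2 if ascending else -2
    return [first_number + i * step for i in range(sides_scanned)]


# First pass: front sides of three sheets, first page on top.
print(page_numbers(1, 3))
# Second pass: back sides after flipping the stack, last page on top.
print(page_numbers(6, 3, ascending=False))
```

Under this reading, a six–page document scanned in two passes receives the numbers 1, 3, 5 and then 6, 4, 2, so every page ends up with its original number.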
Batch Image Options
Select Convert color and gray images to black and white to scan images in grayscale using the TWAIN–Source interface. The scanned images will not retain color pictures or colored fonts or backgrounds. This option reduces the amount of disk space needed to store scanned images.
Note: This option can be found in the Legacy Options dialog. To open this dialog box, click the Legacy Options... button on the General tab in the Options dialog.
Page Layout Analysis
Before starting the recognition process, ABBYY FineReader must know which image areas it needs to recognize. To achieve this, the page layout analysis process identifies text blocks, picture blocks, table blocks, and barcode blocks. In this chapter you will learn more about: when manual page analysis is necessary; what block types are available; how to edit blocks drawn using automatic layout analysis; and how to streamline the layout analysis with block templates.
General Information on Page Layout Analysis
Page layout analysis can be done either automatically or manually. In most cases, ABBYY FineReader manages the complex task of analyzing page layout by itself. Start automatic analysis by clicking on the 2–Read button. Recognition and layout analysis are performed simultaneously.
Click the 2–Read button to start reading the open image. To change the button mode, click the arrow to the right of it and select the required item in the local menu.
Note: Stand–alone page layout analysis is also available (Process>Read>Analyze Layout menu). This process may be needed at times, but often this approach provides inferior page layout analysis, since coupled layout analysis/recognition uses information acquired during recognition to improve layout analysis.
You may opt to draw blocks manually if:
1. Only a part of a page needs to be recognized;
2. Automatic layout analysis has drawn blocks incorrectly.
Tip:
In some cases, the quality of the automatic layout analysis can be improved by changing the page layout analysis options. To view the current layout analysis options, go to the Read tab, Tools>Options menu.
If the application has drawn some blocks incorrectly, it is often faster to edit the incorrect blocks with the block editing tools than to delete the blocks and draw them again manually.
Block Types
Blocks are image areas enclosed in frames. Blocks tell the system which image areas should be recognized and in what order. The blocks also influence how the original page layout is retained. The differently colored frames indicate different types of blocks. The frame colors of the blocks can be changed on the View tab of the Options dialog (Tools>Options menu) in the Appearance group. Select the required block type in the Item field and the desired color in the Color field.
The following block types are available:
Recognition Area – this is used for automatic recognition and analysis. After the 2–Read button is clicked, all blocks of this type will be automatically analyzed and recognized.
Text – this is used for text image areas and should only contain single–column text. If there are pictures within the text, draw separate blocks around them.
Table – this is used for table image areas or for areas of text that are structured in a table. When the application reads this type of block, it draws vertical and horizontal separators inside the block to form a table. This block is represented as a table in the output text. You can draw and edit tables manually.
Picture – this is used for image areas that contain pictures. This type of block may enclose an actual picture or any other object that should be displayed as a picture (e.g. a section of text).
Barcode – this is used for barcode image areas. If your document contains a barcode that should be displayed as a series of numbers and letters rather than as a picture, draw a separate block for the barcode and set the block type to barcode.
Note: If you wish ABBYY FineReader to read barcodes on your documents automatically, make sure that the Look for barcodes option is selected in the Read group in the Legacy Options dialog, otherwise clear this option. (To open the Legacy Options dialog, click the Legacy Options... button on the General tab in the Options dialog.)
Barcode types
Code 3 of 9
Check Code 3 of 9
Code 3 of 9 without asterisk
Codabar
Code 93
Code 128
EAN 8
EAN 13
IATA 2 of 5
Interleaved 2 of 5
Check Interleaved 2 of 5
Matrix 2 of 5
Postnet
Industrial 2 of 5
UCC–128
UPC–A
UPC–E
PDF417
Automatic Page Layout Analysis Options
As a part of automatic page layout analysis the following types of blocks are drawn: text, table, picture, and barcode. To start automatic layout analysis (and text recognition), click the 2–Read button. Before clicking this button, however, select the table analysis options.
Click the 2–Read button to start the recognition of an open image. To change the button mode, click the arrow to the right of it and select the required item in the local menu.
Table analysis options
Usually, the application divides tables into rows and columns automatically. If additional tuning of table options is needed, open the Legacy Options dialog and in the Read group select the desired item. (To open the Legacy Options dialog, click the Legacy Options... button on the General tab in the Options dialog.) Change these options if:
automatic page layout analysis has drawn the table rows and columns incorrectly;
the document contains a large number of simple tables of the same type (i.e. there are no merged cells or there is always only one line of text per cell).
1. Use the One line of text per cell in table option if your table has no (or minimal) black separators and each cell has only a single line of text. For example:
Kilometers      Miles
1               0.62
5               3.1
– this table has only one line of text per cell
Physical        t, degrees
phenomenon      centigrade
Water boiling   100
point
Water freezing  0
point
– this table has more than one line of text per cell (the text in some cells wraps onto a second line)
2. Use the No merged cells in table option if your table has no merged cells in it. For example:
Temperature
Degrees centigrade    Degrees Kelvin
–273                  0
100                   373
– the Temperature cell is a merged cell (it spans both columns)
Note: Do not select One line of text per cell in table and/or No merged cells in table if the text contains tables with differing structures. Selecting these options may result in errors during layout analysis and may adversely affect recognition quality.
Drawing and Editing Blocks Manually
To create a new block:
1. Select one of the following tools:
– to draw a recognition area;
– to draw a text block;
– to draw a picture block;
– to draw a table block.
2. Position the mouse at the point where you want a corner of your block to be. Hold down the left mouse button and drag the mouse pointer to the point where you want the opposite block corner to be.
3. Release the mouse button.
A frame will enclose the selected image area. You may change the drawn block type to any of the following: Recognition Area, Text, Table, Picture, or Barcode. To change a block type:
Right–click the block and select the Change Block Type item followed by the corresponding block type in the local menu.
Modifying blocks
To move block borders:
1. Click the block border and hold down the left mouse button. The mouse pointer will become a two–headed arrow.
2. Drag the pointer in the desired direction.
3. Release the mouse button.
Note: If you click a block corner, you can move both horizontal and vertical borders of the block simultaneously.
To add a rectangular block part:
1. Select the tool for adding a block part.
2. Click the block you wish to add to. Press and hold down the left mouse button and drag the mouse pointer diagonally. Select the desired image area and release the button. The resulting rectangle will be added to the block.
3. If necessary, move the block border.
To cut out a portion of a rectangular block:
1. Select the tool for cutting out a block part.
2. Click on the portion of the block you wish to cut. Press and hold down the left mouse button then drag the mouse pointer diagonally. Select the desired area and release the button. The selected rectangle will be cut from the block.
3. If necessary, move the block border.
Note:
1. You can alter block borders by adding new nodes (splitting points). Use the mouse to move split border segments in any direction. To add a new node, press SHIFT, place the mouse pointer where you want the new node (the pointer will become a cross) and click on the border. A new node will be created.
2. ABBYY FineReader imposes certain limitations on block form. To be successfully recognized, text lines within blocks must be unbroken. To enforce these requirements, ABBYY FineReader automatically corrects block borders as parts are added or deleted. For example, if you delete a portion from the top or bottom of a block, a whole block corner will automatically be cut. Similarly, if you try to cut off a part from between the two upper or lower corners, the application will cut the right block corner (upper or lower) as well. The program will also forbid operations that involve moving the segments that form the block borders.
To select a block or a group of blocks:
Select the block selection tool and click on the desired block, or press the left mouse button and draw a rectangle around all the blocks you want to select.
Note: You can also select one or more blocks using the block drawing tools. To select several blocks at once, hold down SHIFT or CTRL with one of the block drawing tools activated and drag the arrow over the blocks you want to select. To invert the selection (i.e. to select an unselected block or vice versa), hold down the CTRL key while one of the block drawing tools is activated and drag the arrow over the desired blocks.
To move blocks:
Hold down ALT with one of the block drawing tools activated and move the blocks.
To renumber blocks:
1. Select the block renumbering tool.
2. Click the blocks in the desired order. The contents of blocks will be displayed in the output text in the same order.
Note: If you renumber blocks on a previously recognized image, the recognized text in the draft mode of the Text window will be re–arranged to reflect the new numbering.
To delete a block:
Select the block deletion tool and click the block you wish to delete, or
Select the blocks you wish to delete and press DEL on the keyboard.
Note: If you delete a previously recognized block, its associated text in the Text window will be deleted as well.
To delete all image blocks:
Select the Delete ALL Blocks and Text item in the Image menu.
Note: If you delete blocks on an image that has already been recognized, the recognized text in the Text window will also be deleted.
Editing a table
To edit a table, select one of the following tools on the Image toolbar:
– to add a vertical separator;
– to add a horizontal separator;
– to remove a separator.
To merge several cells:
Select the Merge Cells item in the Image>Table Cells menu.
To split previously merged cells:
Select the Split Cells item in the Image>Table Cells menu.
To merge table rows (the division into columns is retained):
Select the Merge Rows item in the Image>Table Cells menu.
Manual Table Layout Analysis
Tip: If automatic table layout analysis has incorrectly drawn table rows and columns, editing the automatic analysis results instead of deleting all the blocks and re–drawing them manually is usually more efficient. Editing a table manually: Use the following Image toolbar tools to edit a table:
– Add a vertical separator
– Add a horizontal separator