Foxit PDF Compressor Instruction Manual

Foxit Europe GmbH
1. Introduction ............................................................................................................ 1
2. Installation ............................................................................................................. 2
System Requirements ......................................................................................... 2
Requirements of the Born Digital Conversion Option ............................................ 2
Network Requirements ....................................................................................... 3
Service Requirements for OCR ............................................................................. 3
Additional OCR Languages ................................................................................... 3
Custom OCR Dictionary ....................................................................................... 4
Location of temporary files ................................................................................. 4
3. Concept of PDF Compressor .................................................................................... 5
Graphical User Interface ..................................................................................... 5
Job List ....................................................................................................... 5
Job Entry and its Properties ........................................................................ 6
Default Properties ....................................................................................... 6
PDF Compressor Service ..................................................................................... 6
Born Digital Conversion ...................................................................................... 6
4. Getting Started ....................................................................................................... 8
Starting the GUI .................................................................................................. 8
Adding Entries to the List .................................................................................... 8
Running Jobs ...................................................................................................... 9
Removing List Entries .......................................................................................... 9
5. Setting up Job Entries ........................................................................................... 10
Configuring General Job Properties .................................................................... 10
Job List Processing Order .......................................................................... 11
Priority Processing Order .......................................................................... 11
List Processing Order ................................................................................ 12
Job List Processing .................................................................................... 12
Time Scheduled Processing and Processing Timeout ................................... 12
Configuring Advanced Job Options ............................................................ 13
Configuring Input Data ...................................................................................... 13
Input Filter Options ................................................................................... 17
Configuring Advanced Input Options .......................................................... 20
Configuring Data Output ................................................................................... 25
Configuring E-Mail Conversion Options ...................................................... 28
Configuring Advanced Output Options ....................................................... 30
Configuring Post-Processing ............................................................................... 37
Actions upon Successful Processing ........................................................... 38
Actions upon Processing Failure ................................................................ 39
Post-processing command execution .......................................................... 40
Configuring OCR and Barcode Recognition ......................................................... 43
Advanced OCR Options ............................................................................. 45
Barcode Options ....................................................................................... 48
Configuring File and Data Embedding ................................................................ 49
Options for Embedding XMP-Metadata ...................................................... 51
Options for Embedding Bookmarks ............................................................ 52
Options for Embedding XML OCR results ................................................... 52
Options for File Embedding ....................................................................... 53
Configuring Header and Footer ................................................................. 55
Embedding Watermarks ............................................................................ 56
Configuring Document Compression .................................................................. 58
www.foxitsoftware.com info@luratech.com
Configuring Advanced Compression Options .............................................. 60
Setting up the Default Properties ...................................................................... 63
Template String Syntax Description ................................................................... 64
Escape Sequences ............................................................................................. 65
Modifiers .................................................................................................. 66
Modifier Examples .................................................................................... 67
Examples .................................................................................................. 67
Conditional Substitution ............................................................................ 67
Examples .................................................................................................. 68
Regular Expression Substitution ................................................................. 68
Regular Expression Substitution Example ................................................... 69
6. Managing the Job List ........................................................................................... 70
Adding, Deleting and Copying Entries ................................................................ 70
Changing the Sequence of Job Entries ............................................................... 71
Starting and Stopping Job Entries ...................................................................... 71
Individual Entries ...................................................................................... 71
All Entries ................................................................................................. 71
Monitoring Jobs ................................................................................................ 71
Importing and Exporting Job settings ................................................................. 72
7. Automatic Job List Processing ................................................................................ 73
Finding and Processing Job Lists ........................................................................ 73
Job List File Priorities ................................................................................ 74
Job List File Syntax ............................................................................................ 74
8. Submitting Jobs via the PDF Compressor API .......................................................... 77
API Job Lifecycle ............................................................................................... 78
The C/C++ API .................................................................................................. 78
The .NET API ..................................................................................................... 78
API Demo Applications ...................................................................................... 79
9. Administration and Licensing ................................................................................. 80
General Configuration Settings .......................................................................... 80
Log Files ................................................................................................... 80
Log File Analysis ........................................................................................ 81
Online check for updates .......................................................................... 82
Job List Processing .................................................................................... 82
Priority Processing .................................................................................... 82
Language .................................................................................................. 83
Managing Licenses ............................................................................................ 83
Trial Mode ........................................................................................................ 84
License ............................................................................................................. 84
Caveats ..................................................................................................... 85
Updating Licenses ..................................................................................... 85
Moving Licenses ....................................................................................... 85
External License Monitoring ...................................................................... 86
Managing the PDF Compressor Service .............................................................. 87
Needed Privileges ..................................................................................... 88
Choosing the Service Account ................................................................... 88
Setting up the Service Priority ................................................................... 89
Taskbar Status Icon ................................................................................... 89
CPU Configuration .................................................................................... 89
Born Digital Options .................................................................................. 90
10. Tips and Tricks .................................................................................................... 93
11. Troubleshooting .................................................................................................. 94
12. License Agreement .............................................................................................. 96
13. Support and Contact ......................................................................................... 102

Chapter 1. Introduction

PDF Compressor is an industrial strength solution for automated document conversion and compression. Through its scalability it is suited for processing small amounts of data as well as large quantities of input.
The PDF Compressor compresses and converts scanned documents to PDF or PDF/A. Using the Born Digital Module, further digital input formats such as MS Office™ documents, PDF and e-mails in MSG or EML format (including attachments) can be converted to PDF/A.
The outstanding document compression provided by the PDF Compressor greatly reduces file sizes while ensuring prime document quality in an ISO standard compliant format suitable for long-term archiving.
The fully integrated OCR engine provides for full-text searchable PDF compression results. Created documents can automatically be handed over to downstream systems for various further business processes.
PDF Compressor can generate PDF files for different requirements, e.g. files that avoid newer PDF features in order to remain compatible with Adobe™ Reader™ 5.0 (or other older viewers), or archivable documents that meet the PDF/A standard (PDF/A-1, PDF/A-2, PDF/A-3).
PDF Compressor offers the following features:
• Scalable high quality compression of color document images
• Applying OCR to create full-text searchable PDFs from scanned pages
• Conversion from PDF to PDF/A
• Optimization of images or scanned pages inside existing PDF documents
• Merging, splitting and renaming of documents
• 1D- and 2D-Barcode recognition and splitting, renaming and PDF bookmark creation by barcode
• Combining digital documents with scanned pages, e.g. appending certificates and credentials to an online application
• Unified conversion of digitally created and scanned documents to PDF/A for long-term archiving
• Automatic input ingestion from watched input folders
• Time-triggered starting and stopping of jobs
• Fully automated processing in Job List Processing mode

Chapter 2. Installation

Run the self-extracting installer and follow its instructions. The installer should be named PDFC_Setup_<xXX>_v<N.N.NN.NNN>.exe, where xXX corresponds to the system architecture (either "x86" or "x64") and N.N.NN.NNN corresponds to the product’s version number (e.g. 7.0.0.156).
Note: You need Administrator rights to install PDF Compressor.
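The naming scheme above can be checked mechanically, for example when a deployment script has to pick the right installer. The following is only a sketch: the function name parse_installer_name is hypothetical, and it assumes that each version component is simply one or more digits.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Pattern for PDFC_Setup_<xXX>_v<N.N.NN.NNN>.exe as described above.
# Assumption: every version component consists of one or more digits.
INSTALLER_RE = re.compile(
    r"^PDFC_Setup_(?P<arch>x86|x64)_v(?P<version>\d+(?:\.\d+){3})\.exe$"
)


class InstallerInfo(NamedTuple):
    arch: str                 # "x86" or "x64"
    version: Tuple[int, ...]  # e.g. (7, 0, 0, 156)


def parse_installer_name(filename: str) -> Optional[InstallerInfo]:
    """Return architecture and version, or None if the name does not match."""
    match = INSTALLER_RE.match(filename)
    if match is None:
        return None
    version = tuple(int(part) for part in match.group("version").split("."))
    return InstallerInfo(match.group("arch"), version)


print(parse_installer_name("PDFC_Setup_x64_v7.0.0.156.exe"))
```

Parsing the version into a tuple of integers makes it possible to compare two installer versions numerically rather than as strings.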

System Requirements

• Windows™ 32 Bit or 64 Bit, Vista / Windows 7 / Windows 8 / Server 2008 / Server 2012
• CPU: Intel / AMD or compatible x86 or x64 processors; single-core and multi-core processors are supported
• CPU Speed: Minimum 1 GHz, 2 GHz or more recommended
• RAM: Minimum 1 GB per licensed processor core, 2 GB per core recommended, more than 2 GB for conversion of very large documents on 64 bit systems
• Disk space required for installation: 1 GB
• Microsoft™ .NET Framework 4.0
Note: These are minimum installation requirements. Required resources for production environments should be clarified with Foxit Europe.
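As a quick illustration of the per-core RAM rule in the list above, the following sketch computes the minimum and recommended RAM for a given number of licensed cores. The function name is hypothetical, and the result covers only the stated installation minimums, not production sizing.

```python
from typing import Tuple


def ram_requirements_gb(licensed_cores: int) -> Tuple[int, int]:
    """Return (minimum, recommended) RAM in GB for the licensed core count.

    Based on the rule above: 1 GB per licensed core minimum,
    2 GB per licensed core recommended.
    """
    if licensed_cores < 1:
        raise ValueError("at least one licensed core is required")
    return licensed_cores * 1, licensed_cores * 2


minimum, recommended = ram_requirements_gb(4)
print(f"4 cores: minimum {minimum} GB, recommended {recommended} GB")
```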

Requirements of the Born Digital Conversion Option

In addition to the general system requirements, the following conditions must be met for Born Digital document conversion of the PDF Compressor standard version.
To convert PDF to PDF/A no preparations are necessary.
For successful conversion of other born digital file formats the following conditions must be met:
1. Windows™ 64 Bit, Windows 7 / Windows 8 / Server 2008 R2 / Server 2012
2. The necessary office applications have to be installed.
• For conversion of Microsoft Office™ documents, the appropriate components of Office 2007 or higher must be installed such that all Office document types submitted for conversion can be opened. Ensure that the Microsoft Office applications can save as PDF (for Office 2007 prior to SP2, the 'Microsoft Save as PDF or XPS' Add-In must be installed).
• For conversion of ODF documents (Open Document Format), the appropriate components of OpenOffice.org 3.2 must be installed.
3. The service "PDF Compressor" must run under a dedicated user account.
• Use the button "Configure system for office conversion" in File→Options→Conversion Options to set the logon settings and necessary system configurations.
• The account can be a local or a domain user account, but it should have administrative privileges on the computer.
• The account must have sufficient permissions to access the actual files to convert and to access the office applications via (D)COM interface.
4. Manual preparation before first automatic conversion:
• Log in with the dedicated user account (set up for the service) and start each of the Office applications used for conversion.
• Verify that all required input file types can be opened properly in Microsoft Office™ and/or OpenOffice.org. This procedure is required to complete the Office installation and to ensure its completeness.
• For Microsoft Office™ test the "Save as PDF" feature.
• Ensure that all startup dialogs requiring user input are deactivated.
• In the office application options, the trust center settings should disable macros and other active elements (ActiveX) without notification.
• Automatic updates of references, fields and external content during document opening or printing must be deactivated.
• For some conversions it is necessary to have a default printer installed.

Network Requirements

PDF Compressor can run as a stand-alone application without any network, but typically it will access net shares by reading and/or writing files. The system administrator must ensure that the account used to run the PDF Compressor’s service has the corresponding access rights. See “Managing the PDF Compressor Service” (p. 87) for more details.

Service Requirements for OCR

The OCR function is implemented by using the ABBYY™ FineReader™ Engine SDK. To use the OCR function it is necessary that the ABBYY SDK 11 Licensing Service is running. The service is set up during installation and should not be disabled.

Additional OCR Languages

In addition to the languages installed, more OCR languages are available. Please contact Foxit Europe Support (support@luratech.com) to receive the files necessary for the installation of additional OCR languages. PDF Compressor supports OCR for almost any language with Latin script as well as some languages with Greek and Cyrillic characters. With an additional license, the recognition of Asian languages (Chinese, Japanese, Korean) and of Arabic or Gothic text type is also available.
There is a separate setup package for the additional OCR languages. It provides a number of feature add-ons and language bundles to choose from. Please note that the additional OCR languages must match the version of the OCR engine of the installed PDF Compressor. Therefore, when updating PDF Compressor to a version with a newer OCR engine, a warning is shown that the installed additional OCR languages must also be updated. Please contact support in that case to get the new package.

Custom OCR Dictionary

In addition to the standard OCR languages, PDF Compressor supports a user-supplied custom dictionary. This dictionary can, for example, contain words that do not belong to any particular language, such as place or brand names. A custom dictionary has to be provided in the form of an ABBYY .amd file. Foxit Europe support can assist you with creating such a file.
If a file "PDFCUserDictionary.amd" is present in the "ocr\Data\ExtendedDictionaries" directory in the PDF Compressor’s installation folder, it is ingested as a custom dictionary and will be applied to all jobs for which OCR is activated. As a consequence, special words contained in this dictionary should be recognized better by the OCR. Nevertheless, you have to select one or more of the standard OCR languages for each job with OCR.
If an additional file "PDFCUserAlphabet.txt" is present in the "ocr\Data\ExtendedDictionaries" folder, its contents are taken as additional characters for the custom dictionary. In this way, recognition of special characters (e.g. letters with diacritic marks) that are not part of the base language but appear in the custom dictionary can be enabled. This alphabet need not contain the standard letters of the base language(s), only the additional characters. The "PDFCUserAlphabet.txt" file must be UTF-8 encoded.
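Before relying on a custom dictionary, it can help to verify that both files are where PDF Compressor expects them and that the alphabet file really is UTF-8. The following sketch does that; the function name check_custom_dictionary is hypothetical, the installation folder must be supplied by the caller, and the check does not validate the contents of the .amd file.

```python
from pathlib import Path

# Relative location of the custom dictionary files, as described above.
DICT_SUBDIR = Path("ocr") / "Data" / "ExtendedDictionaries"


def check_custom_dictionary(install_dir):
    """Report whether the custom dictionary files exist and are usable.

    install_dir is the PDF Compressor installation folder (caller-supplied).
    """
    folder = Path(install_dir) / DICT_SUBDIR
    dictionary = folder / "PDFCUserDictionary.amd"
    alphabet = folder / "PDFCUserAlphabet.txt"

    result = {
        "dictionary_present": dictionary.is_file(),
        "alphabet_present": alphabet.is_file(),
        "alphabet_utf8": None,  # stays None when there is no alphabet file
    }
    if result["alphabet_present"]:
        try:
            # Decoding the raw bytes fails if the file is not valid UTF-8.
            alphabet.read_bytes().decode("utf-8")
            result["alphabet_utf8"] = True
        except UnicodeDecodeError:
            result["alphabet_utf8"] = False
    return result
```

A result with alphabet_utf8 set to False usually means the file was saved in a legacy code page and should be re-saved as UTF-8.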

Location of temporary files

PDF Compressor may store temporary files during processing. These files will be located in a subfolder "LT_PDF_Compressor" in the temporary folder of the service account. The path of the temporary folder is specified by the following environment variables: TMP, TEMP or USERPROFILE. If none of these variables is specified for the user or system the Windows directory is used instead. Alternatively the environment variable LT_PDFCOMP_TMP may be used to specify a temporary folder for PDF Compressor.
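The lookup order described above can be sketched as follows, which may help when diagnosing where temporary files end up for a given service account. This is an illustration, not the product's implementation. It assumes that LT_PDFCOMP_TMP takes precedence and is used as given (the text does not say whether the "LT_PDF_Compressor" subfolder is appended in that case), and that the Windows directory can be taken from the SystemRoot variable.

```python
import os
from pathlib import PureWindowsPath

SUBFOLDER = "LT_PDF_Compressor"


def resolve_temp_folder(env=None):
    """Return the folder in which temporary files would be expected.

    env is a mapping of environment variables (defaults to os.environ).
    """
    env = os.environ if env is None else env

    # Assumption: the dedicated override wins and is used unchanged.
    override = env.get("LT_PDFCOMP_TMP")
    if override:
        return PureWindowsPath(override)

    # Order taken from the description above: TMP, TEMP, then USERPROFILE.
    for name in ("TMP", "TEMP", "USERPROFILE"):
        value = env.get(name)
        if value:
            return PureWindowsPath(value) / SUBFOLDER

    # Fallback: the Windows directory (assumed to be %SystemRoot%).
    return PureWindowsPath(env.get("SystemRoot", r"C:\Windows")) / SUBFOLDER


print(resolve_temp_folder({"TEMP": r"C:\Users\svc\AppData\Local\Temp"}))
```

Passing the environment as a mapping keeps the function testable on any platform; call it without arguments to inspect the environment of the current process.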

Chapter 3. Concept of PDF Compressor

The PDF Compressor is designed to convert files automatically without any user interaction. The PDF Compressor, after being configured, will run in the background. You do not need to be logged in on the system any longer. Even after a shutdown and reboot of your system, PDF Compressor will continue to process its jobs.
This kind of functionality results from PDF Compressor consisting of two basic components:
• A graphical user interface (GUI) that allows you to configure and monitor the system.
• An underlying Windows™ service, which performs the conversion jobs.
However, the GUI can be used as any ordinary Windows™ based application without even noticing the underlying Windows™ service.

Graphical User Interface

Job List

The job list is the main element of the GUI. When you first start the PDF Compressor you will see an empty list. Before you can do anything you need to add an entry to this list. This can be done by creating a new entry or by simply dragging & dropping files or folders onto the list window (compare “Adding Entries to the List” (p. 8)).
An entry has many properties. One of them is the input file or folder you want to process; others include all the parameters that can be configured for the file conversion. Since you have a list with a virtually unlimited number of entries, you can set up different types of file conversions, each with its own, independent set of parameters.
You can configure entries to access different net shares within your network, or you might use them to create different versions of PDF files (high resolution, high quality up to low resolution, high compression) from the same input files.
The complete job list can be stored to a file or restored from a file to enable backup functionality or the management of completely different setups.

Job Entry and its Properties

Within each entry you define its properties:
• what kind of input is accessed (a file, all files within a folder, what file types, etc.),
• how it is accessed (once or frequently to make a folder a hot folder),
• where to put the output,
• what to do in case of an error,
• what additional data (metadata) is to be added,
• how to compress the data (quality, resolution, PDF compatibility, etc.).
Each entry can be started and stopped individually, allowing you to run individual file conversions according to your needs.

Default Properties

Since most of the time you will not want to create job entries with completely different settings, you can configure Default Properties for new entries. Whenever you create a new entry the settings of the default entry will be used as a template.
With a properly chosen set of default properties, converting files to PDF is achieved by simply dragging & dropping files onto the list and pressing the start button.

PDF Compressor Service

Normally you will not notice the underlying Windows™ service. It just processes the files you set up within the GUI. Since the GUI is independent of the service, you can configure and start one or more job entries, close the GUI and log off if you want. When you open the GUI again, you will be informed of the progress of your running jobs.

Born Digital Conversion

The standard version of PDF Compressor comprises an optional component for conversion of Born Digital files, such as Microsoft Office™ and OpenOffice™ documents and e-mail messages. You can choose whether to install this component, since using this component requires an additional license.
An installed and licensed Born Digital component provides for the following types of conversions:
1. Microsoft Office™, OpenOffice™, RTF, HTML and plain text documents are converted to PDF or PDF/A. The distinction between these input formats is specified via the options on the Input tab of the Properties dialog (see “Configuring Input Data” (p. 13)).
The exact list of format types depends on your installation and configuration of the MS Office and OpenOffice applications. Generally supported applications and formats include Microsoft Word, Excel, PowerPoint and Outlook as well as OpenOffice Writer, Calc and Impress.
2. PDF Compressor also supports the conversion of e-mail messages saved in the EML and MSG formats. Attachments of the supported input file types are extracted and converted, as well.
3. PDF documents can be converted to PDF or PDF/A. Again, the distinction between these output formats is specified via the options on the Output tab of the Properties dialog (see “Configuring Data Output” (p. 25)). Note that the conversion of PDF does not involve the MS Office or OpenOffice applications.
Certain options are available to improve the stability of the Born Digital conversion, e.g. by restarting it automatically. For details see “Born Digital Options” (p. 90).

Chapter 4. Getting Started

This section describes the basic steps to convert scanned document files to PDF files. More details can be found under Chapter 5, Setting up Job Entries (p. 10).

Starting the GUI

To start the GUI, either double-click the PDF Compressor taskbar icon or use the Windows™ start menu entry Foxit → PDF Compressor → PDF Compressor.
When you start the GUI for the first time you will see an empty list.

Adding Entries to the List

There are several ways to add new entries. The most straightforward one is opening the Windows™ Explorer and dragging & dropping a file or a folder onto the list window. The new entry will be set up to convert the given file (or all files within the given folder) to PDF. This output will be placed next to the corresponding input file.
Note: Changing the default properties can change this behavior. The explanations given here refer to the Factory Default settings.

Running Jobs

There are several ways to start jobs represented by the list entries. Here we merely describe one of them:
Select the entry you want to start and press the start button on the toolbar. The status of the entry will change from stopped via starting to working, and back to stopped when the job is finished.
While the job is running, you can abort it by pressing the stop button on the toolbar. It may take some seconds until the conversion has been aborted. Any incomplete PDF output file will be removed.

Removing List Entries

You can close the GUI without saving your current job list. The list will reappear in exactly the same way when you open the GUI again. To remove an entry from the list, select it and press the delete button on the toolbar.
Note: The current list is instantly stored within the system to make it available to the underlying PDF Compressor service.

Chapter 5. Setting up Job Entries

You can edit a job entry’s properties once it has been added to the list (see “Adding Entries to the List” (p. 8)). The properties dialog window can be opened by any of the following actions:
• Double-click the corresponding entry within the list.
• Use the right-click context menu associated with the list entry.
• Select the entry and use the toolbar’s properties button.
• Select the entry and use the menu Entry→Properties.
Within the properties dialog you can change the name of the entry to make it easier for you to identify the corresponding job. The entry name is also used within the log file (see “Log Files” (p. 80)) to identify information, warnings and errors.

Configuring General Job Properties

The General tab lets you configure general properties of the selected job. You can define the priority of the job here and you can configure a delayed start and periodical stop and restart of the job.

Job List Processing Order

The Job priority can be set to values from 1 (highest) to 10 (lowest). Lower numbers correspond to higher priority. This setting is only available when the menu File→Priority Processing Order is enabled. The toolbar button can also be used to switch Priority Processing mode on and off.

Priority Processing Order

If Priority Processing Order is enabled, jobs with higher priorities are processed before lower priority jobs. All jobs with the same priority are executed concurrently by processing their job units in interleaved order: once a unit is done, the next unit of the next job with the same priority is processed. When more than one CPU core is licensed and enabled, more than one job unit will be processed at the same time. The list of progress bars located in the lower part of the main window shows information for each CPU core, including the job units that are currently being processed.
Note: All entries of the job list must be stopped to change the Priority Processing mode.
A job unit contains all input files that contribute to one output PDF document. If input merging is enabled (see “Configuring Input Data” (p. 13)), this includes all files that are merged into a single document, otherwise it is just one single input file.
Hot folder jobs (see “Configuring Input Data” (p. 13)) are considered done when their input folder is empty. Once new input files are detected, a hot folder job becomes active again and is processed according to its priority setting.
Priority Processing Order allows setting up complex job dependencies. High priority hot folder jobs can be used to dynamically suspend other lower priority jobs. The priority of a job can be changed at any time, even when it is running. Thus it allows you to start an interim job to be processed immediately without stopping a currently running job.
Please find more information on priority processing options under “Priority Processing” (p. 82).
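To make the interleaving rule concrete, the following sketch simulates the order in which job units would be processed on a single CPU core. It is a simplified model, not the service's scheduler: the function name and the (name, priority, units) tuples are hypothetical, and hot folders, priority changes at run time and multiple cores are ignored.

```python
from collections import deque
from itertools import groupby


def processing_order(jobs):
    """Return the order of (job name, unit) pairs on a single core.

    jobs is a list of (name, priority, units) tuples, where priority runs
    from 1 (highest) to 10 (lowest) and units is a list of job units.
    """
    order = []
    # Lower numbers mean higher priority; sorted() is stable, so jobs with
    # equal priority keep their relative position from the input list.
    ranked = sorted(jobs, key=lambda job: job[1])
    for _, group in groupby(ranked, key=lambda job: job[1]):
        # Jobs without units have nothing to process and are skipped.
        queue = deque((name, deque(units)) for name, _, units in group if units)
        while queue:
            # Take one unit from the next job, then move that job to the
            # back of the queue so same-priority jobs are interleaved.
            name, units = queue.popleft()
            order.append((name, units.popleft()))
            if units:
                queue.append((name, units))
    return order


jobs = [
    ("A", 2, ["a1", "a2"]),
    ("B", 1, ["b1"]),
    ("C", 2, ["c1", "c2", "c3"]),
]
for name, unit in processing_order(jobs):
    print(name, unit)
```

In this model job B runs first because it has the highest priority; jobs A and C then alternate unit by unit until A is exhausted, after which C's remaining units follow.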

List Processing Order

Priority Processing Order is the preferred way to control the order in which jobs are processed. For backward compatibility List Processing Order is still available. To enable List Processing Order, disable Priority Processing Order from the File menu.
In List Processing Order mode, all jobs are processed in the order they appear in the job list. Priorities are ignored and jobs are executed sequentially. There is no way to dynamically suspend a running job.

Job List Processing

In Job List Processing mode (see Chapter 7, Automatic Job List Processing (p. 73)) PDF Compressor dynamically reads job lists from the job list folder. In API Mode, jobs can be passed via a programmatic interface (see Chapter 8, Submitting Jobs via the PDF Compressor API (p. 77)). These modes permit controlling PDF Compressor externally in a workflow setup.
For Job List Processing, the List Processing Order mode can be used to enforce sequential processing of the job lists and the jobs contained. In normal Priority Processing Order mode the jobs are processed in parallel and new job lists are added as soon as at least one processor core becomes idle and there are no more jobs pending.

Time Scheduled Processing and Processing Timeout

Settings for Time Scheduled Processing are enabled by selecting their check boxes. The following parameters can then be set for a time-scheduled job:
1. The desired start time is entered in the Start job at field. You can either type in a date and time or pick it from the calendar widget, which appears when clicking on the arrow to the right of the entry. The job is then started at this point in time.
2. The Duration specifies the time span for which the job should run. You can select days, hours or minutes to specify the duration. Once this time has elapsed the job is stopped.
Footnote 2: Please note that a PDF input file might be split into multiple output files. For further information, see “Configuring Data Output” (p. 25).
www.foxitsoftware.com info@luratech.com
3. The Restart interval denotes a second time span, after which the stopped job is started again for the given duration. This value can only be given in days. Once the job has been started, such re-starts occur periodically, unless the configuration is changed.
Note: The Duration and Restart interval settings can be enabled independently. A job can therefore run just once for a given duration without a restart. Conversely, a job can be restarted periodically without a fixed duration, on the assumption that it finishes once it has processed all of its current input.
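The interplay of Start job at, Duration and Restart interval can be pictured with a small sketch. It is not part of PDF Compressor; the function name `run_windows` is hypothetical, and it assumes that the restart interval is counted from the moment the job is stopped, which the description above suggests but does not state explicitly.

```python
from datetime import datetime, timedelta
from itertools import islice


def run_windows(start, duration, restart_interval):
    """Yield (start, stop) pairs for a time-scheduled job.

    Assumption (not confirmed by the manual): the restart interval is
    measured from the time the job was stopped, not from its start time.
    """
    while True:
        stop = start + duration
        yield start, stop
        # The stopped job is started again after the restart interval.
        start = stop + restart_interval


# Example: start at 22:00, run for 6 hours, restart 1 day after each stop.
windows = list(
    islice(
        run_windows(
            datetime(2024, 1, 1, 22, 0),
            timedelta(hours=6),
            timedelta(days=1),
        ),
        3,
    )
)
for begin, end in windows:
    print(begin.isoformat(), "->", end.isoformat())
```

If the actual reference point for the restart interval differs in your installation, only the last line of the loop needs to change.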
With the Processing Timeout option you can set a maximum duration for the processing of a job unit. (The definition of job unit is given above under “Priority Processing” (p. 82).) If processing of a job unit takes longer than the specified duration, the processing of this unit is aborted. Such an abort is considered an error and is recorded in the log file. Any configured post-processing steps also treat the abort as an error.

Configuring Advanced Job Options

The Advanced Job Options dialog lets you configure additional error handling and Foxit Europe maintenance settings.
1. If the option Continue job on critical error is turned on, PDF Compressor will not stop a hot folder job when a critical error occurs (such as "input folder does not exist" or "output folder cannot be created"). Instead, the job goes idle and tries again later.
2. The Foxit Maintenance entry should not be used unless advised by the Foxit Europe Support Team. The Key must be left blank for normal processing.

Configuring Input Data

The Input tab lets you configure which input documents should be converted to PDF format.
Currently TIFF, JPEG, JP2 (JPEG 2000), BMP, PNM, GIF, PNG, PDF and JPM (JPEG 2000 part 6) formats are accepted for input files. With the optional Born Digital Module, additional input formats - such as Microsoft Office and saved e-mail messages - can be converted.
Multipage TIFF and PDF files are recognized and converted to multipage PDF documents. Moreover you can set up input resampling to adjust the resolution of your data.
1. Choose File if you want to convert a single file, or Directory if you want to convert all files within the given folder. The browse button lets you select the file or directory.
You cannot use mapped network drives to specify the input file or directory, because the underlying PDF Compressor Service has no knowledge of mapped network drives. Instead, specify the network share using the syntax \\host\share\dir. See “Choosing the Service Account” (p. 88) for setting access rights in this case.
2. When Directory is selected, you have the following options:
a. Enable Include subfolders if the whole directory sub tree should be converted. Otherwise only files directly within the given folder will be processed.
b. Enable Delete empty folders if you have selected Include subfolders and want to clean up the input sub tree. This can be useful if you configure a setup that moves or deletes all processed input files (see also “Configuring Post-Processing” (p. 37)). Once all files of the given job have been processed, the output folder will be examined and all empty subfolders will be deleted. If the input folder is a hot folder (see next item), the cleanup process will be performed whenever the hot folder becomes idle, i.e. when no more files are found to be processed at the moment.
c. Enable Check every <number> seconds to make the given folder a hot folder, i.e. to let the job check that folder frequently for newly generated files (e.g. from an external scanner or capture software). A job that has this option enabled will never stop automatically, since it keeps monitoring the given folder. You have to stop it manually.
Note: This option only makes sense if you delete successfully converted input files or move them into another folder. See also “Configuring Post-Processing” (p. 37).
d. Enable Merge all pages if all pages from all processed files of a folder should be converted to a single PDF output document. The output file will be named <name_of_the_input_folder>.pdf. The input files of a folder will be processed in alphabetical, case-sensitive order. This defines the page order of the output PDF.
This option can be used to merge pages from scanners that put numbered single page files (such as page000.tif, page001.tif, etc.) into a single folder.
e. Input file formats can be selected from the given list. The format is derived from the file’s extension.
TIFF enables the conversion of *.tif and *.tiff files. Currently PDF Compressor supports bilevel, grayscale and RGB images. CMYK and CIELab color as well as palettized images are not supported yet.
JPEG enables the conversion of *.jpg files.
JPEG 2000 enables the conversion of JPEG 2000 files. The extensions *.jp2, *.jpf, *.jpx, *.j2k, *.jpc and *.j2c are recognized.
PDF enables the conversion of PDF files. If Always is chosen under Rasterize PDF Input, the input will be rasterized to the resolution specified on the Advanced tab of the Advanced Input Options dialog (see “Configuring Advanced Input Options” (p. 20)) before applying the MRC compression technique. If Automatic has been chosen for Rasterize PDF Input, PDF files may be rasterized, e.g. if conversion to PDF/A is not possible otherwise.
BMP enables the conversion of *.bmp files. Palettized images as well as some exotic variants are not supported yet.
PNM enables the conversion of *.ppm and *.pgm files.
GIF enables the conversion of *.gif files. For animated GIF files only the first frame is converted and a warning is logged.
PNG enables the conversion of *.png files.
• The formats below Born Digital are only available if the Born Digital conversion option is installed. Any combination of these input document types can be converted to PDF or PDF/A. Refer to “Born Digital Conversion” (p. 6) and “Requirements of the Born Digital Conversion Option” (p. 2) for details on Born Digital Conversion. The following list shows the file extensions matching the input formats.
MS Word : *.rtf, *.doc, *.docx, *.docm
MS Excel : *.xls, *.xlsx, *.xlsm
MS PowerPoint : *.ppt, *.pps, *.pptx, *.pptm, *.ppsx
MS Project : *.mpp
MS Visio : *.vsd, *.vdx, *.vsw
E-Mail : *.msg, *.eml, *.emlx
OpenOffice Writer : *.odt, *.sxw
OpenOffice Calc : *.ods, *.sxc
OpenOffice Impress : *.odp, *.sxi
Text files : *.txt
HTML : *.htm, *.html
Extra Extensions List serves to supply a list of additional file extensions to process. This entry is only enabled, when Born Digital Conversion is installed. The individual extensions entered must be separated by blank spaces.
3. The Advanced Input Options button opens a dialog for configuring more input options. See “Configuring Advanced Input Options” (p. 20) for details.
4. If you configure multiple PDF Compressor installations to work with the same input directory (typically a net share), you need to enable Use lock files for PDF Compressor synchronization. This ensures that input files currently processed by one PDF Compressor are not accessed by another PDF Compressor. This is implemented by the use of lock files *.lock and *.dlock that PDF Compressor uses to lock certain input files or all files of a directory (when Merge all pages is enabled).
Note: The PDF Compressor Service must have write access to the input directories when using lock files.
Important note: Files ending with *.lock and *.dlock will be overwritten without notice when using lock files. You should not use these extensions for your own files.
Note: A file PDF_Compressor.ulock can be used as a user defined lock for all files of a directory. In contrast to the lock files *.lock and *.dlock, this file can be managed by the user. If Use lock files for PDF Compressor synchronization is enabled, PDF Compressor will never process any file it finds within a directory that contains a file PDF_Compressor.ulock.
This can be used to configure a hot folder with Merge all pages enabled: The external process creates a new directory underneath the PDF Compressor hot folder containing a file PDF_Compressor.ulock and copies all files that need to be merged afterwards. When the last file copy is done, the external process removes PDF_Compressor.ulock. Next time PDF Compressor inspects the hot folder, the processing of the given files begins.
5. The Rasterize PDF Input setting offers different modes for handling PDF input documents:
a. Automatic is the default setting. In this mode PDF Compressor will automatically determine in which cases to convert typeset or vector contents of PDF input files into raster graphics. It will try to avoid such conversions, but will resort to them as a fallback option for documents that cannot be converted otherwise.
b. Always - this mode will convert all pages of all PDF input documents to raster graphics before compressing or converting them. For PDF input this mode is equivalent to the former MRC Compression mode.
c. With the Never setting the PDF Compressor will not try to rasterize any non-raster input. In this way you avoid unintentional conversion of typeset or vector content into raster graphics. On the other hand, this may result in failure to convert certain kinds of PDF input files to PDF/A.
Note: With the Automatic or Never setting chosen, raster graphics already present in the input documents may still be resampled or compressed, if Color image optimization or Bitonal image optimization are selected on the Compression tab. (Such graphics may e.g. be scanned pages or large images inside a PDF document.)
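The user defined lock file workflow described under item 4 (a hot folder with Merge all pages enabled) could be driven by an external process along the following lines. This is a minimal sketch, not a Foxit tool: the function `deliver_batch` and all folder names are hypothetical, and only the file name PDF_Compressor.ulock is taken from this manual. It assumes that Use lock files for PDF Compressor synchronization is enabled for the job.

```python
import shutil
from pathlib import Path

ULOCK_NAME = "PDF_Compressor.ulock"  # user defined lock file named in the manual


def deliver_batch(hot_folder, batch_name, source_files):
    """Copy a set of files into a new subfolder of the hot folder.

    The subfolder is protected by a user defined lock file while the
    copy is in progress, so the files are not picked up prematurely.
    """
    batch_dir = Path(hot_folder) / batch_name
    batch_dir.mkdir(parents=True, exist_ok=False)

    # Create the lock before any input file is placed in the folder.
    ulock = batch_dir / ULOCK_NAME
    ulock.touch()

    for src in source_files:
        shutil.copy2(src, batch_dir / Path(src).name)

    # Remove the lock only after the last copy has finished. If a copy
    # fails, the exception propagates and the lock stays in place, so an
    # incomplete batch is never released for processing.
    ulock.unlink()
    return batch_dir


# Hypothetical usage (paths are placeholders):
# deliver_batch(r"\\host\share\hotfolder", "scan_0001",
#               ["page000.tif", "page001.tif"])
```

Leaving the lock file in place after a failed copy is a deliberate choice in this sketch: the operator can repair the batch and remove the lock manually.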

Input Filter Options

File Extensions
On the File Extensions tab of the Input Filter Options dialog you can specify additional input file extensions, so that files with these non-standard extensions are processed by PDF Compressor, too.
Each line of the list widget on the File Extensions tab shows one of PDF Compressor’s input file types. You can find the file extensions configured by default for each of these types under “Configuring Input Data” (p. 13). For the selected job additional file extensions can be associated with each file type in the list’s right hand column. For the selected job each additional extension may only be associated with one input file type.
File Names
On the File Names tab of the Input Filter Options dialog you can specify a file name pattern to include only input files with matching names in the processing by PDF Compressor. Alternatively, you can exclude the matched files from processing.
1. Depending on the selection of either Process matching or Skip matching, potential input files that match the filter criteria are either included in the processing or skipped.
2. In the File Name Filter Pattern (may contain * and ?) entry you specify a filter pattern, against which the names of all potential input files are matched. This pattern may contain the special characters '*' and '?'.
These characters have the same significance as with the Windows™ file search. The question mark '?' matches an arbitrary single character and the asterisk '*' matches an arbitrary sequence of characters, which may be empty. The pattern is matched only against the file’s base name, i.e. without the file extension, and all alphabetical characters are matched in a case-insensitive fashion.
Examples: The filter pattern 'Client-????-??' matches the file names 'Client-0000-12.jpeg', 'Client-9900-00.jpeg' and 'client-ABCD-XY.jpg'. It will not match the names 'Client-0000-1.jpg' or 'Client-ABC-XY.jpeg'.
The filter pattern 'File*' matches the file names 'file.jpg', 'File_0123.jpg', 'FILE-A-B-C-D.jpg' and 'File-2016-08.jpg'. It will not match the names 'Fille.jp2', 'Fil.jpeg' or '2016_File.jpg'.
The filter pattern '??_scan*' matches the file names '00_SCAN.TIF', 'ab_Scan.jpg', '09_scan_2014.jpg' and 'XY_Scan_2014.jpeg'. It will not match the names 'scan2014.jpg' or '00_SCA.jpeg'.
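To check a filter pattern before entering it, the matching rules described above can be reproduced with a short script. This is only a sketch of the documented rules ('?' for one character, '*' for any sequence, base name only, case-insensitive); the function names are hypothetical, the base name is assumed to be everything before the last dot, and the actual implementation inside PDF Compressor may differ in edge cases.

```python
import os
import re


def pattern_to_regex(pattern):
    """Translate a filter pattern with '*' and '?' into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # arbitrary, possibly empty, sequence
        elif ch == "?":
            parts.append(".")    # exactly one arbitrary character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def matches(pattern, file_name):
    """Match the pattern against the base name (file name without extension)."""
    base, _extension = os.path.splitext(file_name)
    return pattern_to_regex(pattern).fullmatch(base) is not None


for name in ["Client-0000-12.jpeg", "client-ABCD-XY.jpg", "Client-0000-1.jpg"]:
    print(name, matches("Client-????-??", name))
```

The regular expression is anchored with `fullmatch`, so a pattern such as 'File*' does not match '2016_File.jpg'.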
PDF Input Options
On the Input Filter tab of the Input Filter Options dialog you can configure options that ex­clude certain input files from being processed by PDF Compressor.
1. The Bypass PDF files smaller than … kB option serves to skip the processing of input files smaller than the given file size. This can e.g. be used to suppress the processing of already compressed PDF files.
2. The Metadata Filter options allow you to skip processing of PDF input files depending on the PDF metadata. Skipped files will not be processed to output files. Instead they are passed directly to input file post-processing (see “Configuring Post-Processing” (p. 37)).
a. Depending on the selection of either Process matching or Skip matching, PDF files that match the filter criteria are either included in or omitted from the processing.
b. The filter criterion is matched in case the Metadata key to check is found in the PDF Metadata and its value matches the Search text below.
c. Depending on the option Match whole value the Search text is either compared to the complete value or searched as part of the metadata value.
d. Skipped input files can be copied to a directory by enabling Copy skipped files to directory.
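The decision made by the Metadata Filter can be summarized in a few lines. The following sketch only illustrates the rules listed above; the function names and the mode strings are hypothetical, the metadata is modeled as a plain dictionary of strings, and the comparison is assumed to be case-sensitive, which the manual does not specify.

```python
def metadata_matches(metadata, key, search_text, match_whole_value):
    """Return True if the filter criterion is met for one PDF file."""
    if key not in metadata:
        return False  # the key must be present in the PDF metadata
    value = metadata[key]
    if match_whole_value:
        return value == search_text  # compare against the complete value
    return search_text in value      # search as part of the value


def should_process(metadata, key, search_text, match_whole_value, mode):
    """Apply Process matching / Skip matching to the match result."""
    matched = metadata_matches(metadata, key, search_text, match_whole_value)
    if mode == "process_matching":
        return matched
    if mode == "skip_matching":
        return not matched
    raise ValueError(f"unknown mode: {mode!r}")


info = {"Producer": "PDF Compressor 8", "Title": "Invoice"}
# Skip files that were already produced by PDF Compressor.
print(should_process(info, "Producer", "PDF Compressor", False, "skip_matching"))
```

In this example the file is skipped, because the Producer value contains the search text and Skip matching is selected.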

Configuring Advanced Input Options

Clicking the Advanced Input Options button brings up a dialog with further settings concerning input options.
Advanced Input Options
1. Enable Read only up to <number> pages from input file to restrict the number of pages read from each input file to the given limit. When you merge all files from an input directory (see “Configuring Input Data” (p. 13)) to a single PDF output file, setting the maximum number of input pages to 1 can be used to create documents that only contain the cover pages of these files.
2. Select Case insensitive sorting of files and folders if you want the input file processing to be in case insensitive alphanumeric order. Otherwise, the ordering will be case sensitive (with capital letters lexically smaller than lower case letters and therefore "Ac" coming before "ab").
3. Enable Read local copy of each input file to solve problems related to the location of input files on a network share. The files will be temporarily copied to the local temporary folder where they will be opened and read. This is especially useful when the performance for transmitting small parts of data in the network is low.
4. Default Resolution: If the input image file does not contain any resolution information, the value (in dpi) given here will be used. In addition this value is used when an input PDF file is rasterized (PDF files do not have specific resolution information in general).
a. Enable Optimized dpi for PDF input to automatically choose an optimal dpi value for the rendering of each PDF input page. The optimal value is calculated from the resolutions of all images embedded in the PDF input page. If there are no images on the page, the setting of the Default Resolution will be used. Otherwise the highest resolution of the detected images is calculated and clipped to the resolution range given by the Resampling option of the Input tab. We recommend enabling this option only when the input PDF documents originate from a scan process.
b. Born digital PDF documents may contain images of small physical size (e.g. less than an inch in either dimension) that nevertheless contain many pixels (e.g. 1000 x 1000 pixels), thus yielding an unrealistic resolution (more than 1000 dpi in the example). For born digital documents you should either use a fixed Default Resolution value without enabling Optimized dpi for PDF input, or specify a narrow resolution range in the Resampling settings.
5. Resampling lets you change the input’s resolution. There are two reasons, why you might want to change the input’s resolution:
a. If the input resolution is too low (below 150 dpi), the PDF Compressor output might have a quality that is too low for your needs (although the file size will be extremely small). Changing the resolution e.g. to 300 dpi will produce better results.
b. If your original scans are high resolution (e.g. more than 600 dpi), you might want to produce a lower resolution PDF output.
The Minimal and Maximal dpi settings address these requirements. Whenever the resolution of the input files is outside the given [min, max] range, the resolution will be changed by resampling to the minimum value or the maximum value.
Note: Please bear in mind that the Resampling setting only affects grayscale and color images. The resolution of bilevel or B/W images will not be changed.
In case of JPEG 2000 input, the ability to decode a JPEG 2000 file to a lower resolution will be used. This enables fast access to huge JPEG 2000 images, when only a lower resolution is needed.
6. When Directly create PDF pages from JPEG input without recoding is enabled, JPEG input images are not submitted to MRC or JPEG 2000 compression but are directly embedded as pages in the PDF output document. In this fashion repeated compression of such input documents is avoided. On the other hand these pages are not reduced in size.
7. Skip JPEG validation when adding images without recoding omits all integrity checks for JPEG images that are embedded directly into PDF pages. This involves the risk of creating corrupt PDF output by incorporating corrupt JPEG input images.
8. With the Use orientation stored in TIFF input files option the orientation information optionally contained in TIFF input files determines the orientation of the corresponding pages in the PDF output document.
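The way Optimized dpi for PDF input, Default Resolution and the Resampling range interact (items 4 and 5) can be sketched as follows. The function names are hypothetical and the code only restates the rules given above; it is not the product's implementation.

```python
def effective_dpi(pixels, inches):
    """Resolution of an embedded image along one dimension."""
    return pixels / inches


def clamp(value, minimum, maximum):
    """Clip a value to the closed range [minimum, maximum]."""
    return max(minimum, min(value, maximum))


def optimized_dpi(image_dpis, default_dpi, min_dpi, max_dpi):
    """Choose the rendering resolution for one PDF input page.

    image_dpis: resolutions of all images embedded in the page.
    """
    if not image_dpis:
        # No images on the page: fall back to the Default Resolution.
        return default_dpi
    # Highest image resolution, clipped to the Resampling range.
    return clamp(max(image_dpis), min_dpi, max_dpi)


# A 1000-pixel-wide image placed at half an inch yields 2000 dpi,
# which is then clipped to the upper bound of the range.
small_image = effective_dpi(1000, 0.5)
print(optimized_dpi([200, small_image], default_dpi=300, min_dpi=150, max_dpi=600))
```

The example shows why a narrow Resampling range is advisable for born digital documents: a single small, pixel-dense image would otherwise dictate the rendering resolution of the whole page.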
PDF Input Options
The PDF Input tab of the Advanced Input Options dialog lets you configure options that are relevant for opening PDF input files.
1. The Document Open Password is used to open password protected PDF files. The given password is stored within PDF Compressor in an encrypted way. Note that this encryption does not meet the highest security demands. The given password is ignored, when the input PDF file is not password protected.
2. In addition to the password needed to open it, a PDF document can be protected by a Change Permissions Password. This password is used whenever the PDF access permissions need to be changed.
3. PDF input rasterization options are available unless Rasterize PDF Input has been set to Never:
a. The Interpolate images switch influences the rasterization of PDF input files. PDF input files may contain embedded images that need to be scaled for rasterization. Interpolate images should be turned on to obtain best image quality. However if an input PDF file contains only bitonal image content, and this PDF is rasterized to the same resolution as the embedded images' resolution, the image interpolation can be turned off to speed up an OCR process.
b. Default PDF Page Size lets you select a page size that is used only if the input file is a PDF file that is missing its page size information, but has to be rasterized. Normally all pages of a PDF file carry page size information. However some applications create ill-formed PDF files without page size information. This option has been added to deal with such malformed PDF files.
Select user defined at the very end of the list to enter a user defined page size by specifying the width and the height of a page.
c. Selecting the Alternative rasterization method can be useful when input PDF files contain very large pages which lead to out-of-memory errors. The alternative rasterization method consumes significantly less memory but is slower.
4. The option Restrict PDF Conversion - Use only PDF Optimization for converting PDF input files is only enabled when the Born Digital Module is installed. If selected it serves to suppress the attempt to use Born Digital Conversion functionality when converting PDF input files. Instead these files are only processed using PDF Optimization features.
Note: This option primarily serves to guarantee backward compatibility with jobs set up with pre-7.0 versions of PDF Compressor, which used the PDF Optimization mode for converting PDF input. In general it is preferable to use the conversion provided by the Born Digital Module.
Office Input Options
On the Office Input tab you can configure options for the Born Digital conversion of Mi­crosoft Office input files.
1. If the Use Excel page layout option is selected, the print page layout as specified inside the Excel input files is used when converting this file to PDF. This option is disabled by default, such that each non-empty sheet inside an Excel input file is converted to one PDF output page.
2. When Process MS Office documents with OpenOffice is chosen, all office input files are converted using OpenOffice. Obviously, an OpenOffice installation must be present in this case. This option is switched off by default and each input document is processed using its native application.
3. The option Remove tagging information ensures that structure information generated by office applications during PDF export is discarded. Structure information is important for accessible documents and should generally be kept. However, if file size is critical, this option can help to create smaller PDF output.
Blank Page Detection Options
On the Blank Page Detection tab you can configure options for the Blank Page Detection of input files. Pages that are detected as blank are always exempt from text recognition (OCR).
1. The option Detect blank pages serves to enable or disable blank page detection.
2. In the Excluded margins box, you can choose how much of the margins of all pages should not be part of the blank page detection. This way, it is possible to exclude invariable page headers or footers or punched holes, for example. You can also choose between the units millimeter (mm) and hundredths of an inch (1/100 inch).
3. The blank page detection segments the area of the page to be worked on into tiles of configurable size. The detection is done tile per tile. If in the end all tiles are classified as blank the whole page is considered blank. The unit for the Tile size value is 'pt', which is equal to 1/72 inch.
4. The analysis of a tile’s content is block-based. The size of a block should be set to the size of the smallest meaningful symbol, e.g. the size of a punctuation dot. Depending on the page image’s resolution a block spans multiple pixels. The unit for the Block size is 'pt' which is equal to 1/72 inch.
5. For each block a value is calculated that represents how much the block’s color intensity differs from other blocks in the tile. Each block that has a deviance that exceeds a certain value is considered non-blank. This value is influenced by the Sensitivity property. The sensitivity is a value in the range of 1 through 100. A low sensitivity causes only strong colored structures to be counted as being non-blank. In contrast, a high sensitivity also takes pale textures into account.
6. A tile is considered non-blank if the number of non-blank blocks exceeds a certain threshold. The Threshold is given in per mill (one tenth of a percent) defining the maximum fraction of non-blank blocks in relation to blank blocks for a full size tile to still be classified as blank.
7. The option Exclude blank pages from output defines whether detected blank pages are included in the output. Checking this option results in the exclusion of detected blank pages.
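The units and thresholds used by the blank page detection can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not the detection algorithm itself: the function names are hypothetical, the per-block analysis controlled by Sensitivity is not modeled, and the Threshold is interpreted as the share of non-blank blocks among all blocks of a tile, which is one possible reading of the description above.

```python
def pt_to_pixels(size_pt, dpi):
    """Convert a size in points (1/72 inch) to pixels at a given resolution."""
    return max(1, round(size_pt / 72 * dpi))


def tile_is_blank(non_blank_blocks, total_blocks, threshold_per_mill):
    """A tile stays blank while its non-blank share does not exceed the threshold.

    Assumption: the share is taken relative to all blocks of the tile.
    """
    if total_blocks == 0:
        return True
    # Integer comparison avoids rounding issues with per mill fractions.
    return non_blank_blocks * 1000 <= threshold_per_mill * total_blocks


def page_is_blank(tiles, threshold_per_mill):
    """The page is blank only if every tile is classified as blank.

    tiles: iterable of (non_blank_blocks, total_blocks) pairs.
    """
    return all(tile_is_blank(n, t, threshold_per_mill) for n, t in tiles)


# A block size of 2 pt covers about 8 pixels at 300 dpi.
print(pt_to_pixels(2, 300))
# One speck in a tile of 1000 blocks is tolerated at 5 per mill.
print(page_is_blank([(0, 1000), (1, 1000)], threshold_per_mill=5))
```

The conversion illustrates why the same Block size setting spans more pixels on high-resolution scans than on low-resolution ones.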

Configuring Data Output

The Output tab lets you configure PDF output options including the exact version of PDF to generate and the location where generated PDF files are placed.
1. Select Place output next to input file, and the generated PDF file will be placed into the same directory as the input file. Select Place output in directory and specify a folder by