Configuring Advanced OCR Options
All PDF files imported into Continia Document Capture are OCR-processed according to the settings that apply at the time of processing. If you make no adjustments, the default settings are used, but you can customize some of the more advanced settings to suit your needs. To find out how, see the sections below.
To configure OCR settings
You can configure the way incoming documents are OCR-processed by following these steps:
- Choose the search icon, enter Document Categories, and then choose the related link.
- Open the relevant document category. For example, to open the purchase document category, select the PURCHASE line (not the PURCHASE code itself), and then select Edit in the action bar.
- On the OCR Processing FastTab, configure the settings as needed. For more information and recommendations, see Details and recommended settings below.
Details and recommended settings
The table below contains tips and recommendations for each of the fields you can customize by following the steps above:
Field | Details and recommendations |
---|---|
TIFF Image Resolution | In this field, you can enter the number of dots per inch (DPI) that Document Capture uses when storing OCR-processed files as TIFF files. The value must be at least 150 DPI; anything below this returns an error. The higher the value, the better the resolution. However, note that very high values result in correspondingly large image files that take a long time to load in the user interface. For this reason, we recommend that you select 300 DPI, which ensures good resolution and acceptable file sizes. |
TIFF Image Colour Mode | Here, you can specify the color mode of the TIFF files that all imported PDF files are converted into. Choose one of the options available in the field. |
Max. number of pages to process per file | This field allows you to specify how many pages are OCR-processed for each imported file, which lets you reduce the import time and thereby optimize the import process. The last three pages of any imported file are always processed, as they typically contain essential information. Note that Document Capture imposes an overall limit of 500 pages on document import, meaning that no document longer than 500 pages can be imported into Document Capture, regardless of the value you enter in this field. |
OCR Languages | In this field, you can add all the languages whose character sets should be recognized by Document Capture when OCR-processing incoming documents. We recommend that you limit the number of activated languages to the ones generally used in the documents you import (typically only your own native language and, if relevant, English), as enabling too many languages is likely to lower the overall quality of character recognition. |
Process PDF files with XML files | With this toggle, you can enable the import of XML files embedded in PDFs (such as ZUGFeRD, Factur-X, and XRechnung). For more information, see Enabling the Import of PDF Files with Embedded XML Files (ZUGFeRD, XRechnung). |
For information on the remaining customizable fields on the OCR Processing FastTab, which all relate to the automatic splitting of documents, see Splitting documents automatically.
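
To make the numeric rules in the table easier to follow, here is a small illustrative sketch in Python. It is not part of Document Capture and does not call any Continia API: the names `OcrSettings`, `validate`, and `pages_to_process` are hypothetical, as is the sample page limit of 10. It also assumes that the page limit counts from the start of the file and that the last three pages are added on top of it, which is one reading of the field description rather than confirmed behavior.

```python
from dataclasses import dataclass

MIN_DPI = 150              # values below this return an error (see table)
MAX_IMPORT_PAGES = 500     # overall import limit stated in the table
ALWAYS_PROCESSED_TAIL = 3  # the last three pages are always processed


@dataclass
class OcrSettings:
    """Hypothetical stand-in for the fields on the OCR Processing FastTab."""
    tiff_dpi: int = 300            # recommended value from the table
    max_pages_per_file: int = 10   # illustrative value, not a documented default


def validate(settings: OcrSettings) -> None:
    """Reject a resolution below the documented minimum."""
    if settings.tiff_dpi < MIN_DPI:
        raise ValueError(f"TIFF Image Resolution must be at least {MIN_DPI} DPI")


def pages_to_process(total_pages: int, settings: OcrSettings) -> list[int]:
    """Return the 1-based page numbers that would be OCR-processed.

    Assumption: the first `max_pages_per_file` pages are processed,
    plus the last three pages of the file.
    """
    if total_pages > MAX_IMPORT_PAGES:
        raise ValueError(f"Documents longer than {MAX_IMPORT_PAGES} pages cannot be imported")
    head_end = min(settings.max_pages_per_file, total_pages)
    tail_start = max(1, total_pages - ALWAYS_PROCESSED_TAIL + 1)
    head = set(range(1, head_end + 1))
    tail = set(range(tail_start, total_pages + 1))
    return sorted(head | tail)
```

Under these assumptions, a 20-page file with a limit of 5 would have its first five and last three pages processed, while a 501-page file would be rejected no matter what limit is entered.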