OCR execution with mixed fonts
This is the technical documentation for the Goobi plugin for performing OCR with mixed fonts. The images to be processed must first be marked with the "OCR Page Selection" plugin.
Overview
Identifier: intranda_step_mixedocr
Licence: GPL 2.0 or newer
Last change: 25.07.2024 11:56:35
Introduction
This documentation describes the installation, configuration and use of a plugin for OCR with mixed fonts. This plugin is only useful in combination with the "OCR Page Selection" plugin.
Installation
The following files must be installed to use the plugin:
The first file contains the actual plugin. The second file is the configuration file of the plugin.
To use the plugin, it must be correctly installed, configured and integrated into the desired workflow steps. The plugin for manually selecting the pages (intranda_step_ocrselector) is also required.
Overview and functionality
After the plugin has been installed and configured, it must be added to a workflow step in Goobi workflow.
The plugin is usually executed automatically, so the Automatic task checkbox should be selected for the step. The plugin intranda_step_mixedocr must also be selected under Plugin for step.
Configuration
The content of the configuration file plugin_intranda_step_mixedocr.xml must be structured as follows:
Several configurations for different projects and workflow steps are possible. They are distinguished by the project and step elements. The wildcard * can be used to match all projects or all steps. The actual configuration then takes place within the config elements.
The template element specifies the template that the TaskManager should use, and itmUrl is the URL of the TaskManager endpoint that accepts new jobs. The callbackBaseUrl must be a URL that is reachable from the TaskManager and points to the Goobi installation where the plugin is installed; it is needed to close the step after the OCR has completed successfully. The element useOrigDir determines whether the master images or the derivatives are used for the OCR. The entry serverType is the value entered in the intranda license server for the server to be used for the OCR. This value can be requested from intranda; it can be omitted when another OCR provider is used.
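The elements described above can be read into a typed settings object. The following Python sketch is illustrative only: the element names come from this documentation, but the class and function names, the assumption that useOrigDir contains true or false, and the loopback check on callbackBaseUrl are hypothetical and not part of the plugin.

```python
import ipaddress
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class MixedOcrSettings:
    template: str               # TaskManager template to use
    itm_url: str                # TaskManager endpoint that accepts new jobs
    callback_base_url: str      # Goobi URL the TaskManager reports back to
    use_orig_dir: bool          # True: master images, False: derivatives
    server_type: Optional[str]  # license-server value; None if omitted


def read_settings(block: ET.Element) -> MixedOcrSettings:
    """Read one <config> block using the element names from the documentation."""

    def required(tag: str) -> str:
        value = (block.findtext(tag) or "").strip()
        if not value:
            raise ValueError(f"missing <{tag}> in plugin configuration")
        return value

    # Assumption: useOrigDir holds "true"/"false"; anything else counts as false.
    use_orig = (block.findtext("useOrigDir") or "").strip().lower() == "true"
    # serverType may be omitted when another OCR provider is used.
    server_type = (block.findtext("serverType") or "").strip() or None
    return MixedOcrSettings(
        template=required("template"),
        itm_url=required("itmUrl"),
        callback_base_url=required("callbackBaseUrl"),
        use_orig_dir=use_orig,
        server_type=server_type,
    )


def image_source(settings: MixedOcrSettings) -> str:
    """Name the image set the OCR would run on."""
    return "master images" if settings.use_orig_dir else "derivatives"


def callback_warning(url: str) -> Optional[str]:
    """Hypothetical heuristic: a loopback callback host is usually unreachable
    from a TaskManager running on another machine."""
    host = urlparse(url).hostname or ""
    try:
        loopback = ipaddress.ip_address(host).is_loopback
    except ValueError:  # not an IP literal, e.g. a regular host name
        loopback = host == "localhost"
    if loopback:
        return f"callbackBaseUrl {url!r} uses a loopback host"
    return None
```

For example, a block with useOrigDir set to false and no serverType yields server_type None, and image_source returns "derivatives". A callbackBaseUrl such as http://localhost:8080/goobi/ triggers the warning, because unless the TaskManager runs on the same host as Goobi, localhost refers to the TaskManager's own machine.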
In addition to this plugin-specific configuration, access must be enabled in the file /opt/digiverso/goobi/config/goobi_rest.xml so that the TaskManager can report the successful processing of the jobs to the plugin: