OCR execution with mixed fonts
This documentation describes the installation, configuration and use of the Goobi plugin for performing OCR with mixed fonts. The plugin is only useful in combination with the "OCR Page Selection" plugin, which is used to mark the images beforehand.
Details | Text |
---|---|
Identifier | intranda_step_mixedocr |
Source code | |
Licence | GPL 2.0 or newer |
Compatibility | Goobi workflow 3.0.4 and newer |
Documentation date | 04.03.2019 |
The plugin requires Goobi workflow version 3.0.4 or higher. It must be correctly installed and configured, and it must be correctly integrated into the desired workflow steps. In addition, the plugin for the manual selection of the pages is required.
The following files must be installed to use the plugin:
/opt/digiverso/goobi/plugins/step/plugin_intranda_step_mixedocr.jar
/opt/digiverso/goobi/config/plugin_intranda_step_mixedocr.xml
The first file contains the actual plugin. The second file is the configuration file of the plugin.
The configuration file plugin_intranda_step_mixedocr.xml must be structured as follows:
<config_plugin>
<!--
The configuration block to use is selected in this order:
1.) project name and step name both match
2.) step name matches and project is *
3.) project name matches and step name is *
4.) project name and step name are both *
-->
<config>
<!-- which projects this block applies to (can be more than one, otherwise use *) -->
<project>*</project>
<step>*</step>
<template>template.xml</template>
<itmUrl>http://localhost:8080/itm/service</itmUrl>
<!-- this must be without a trailing slash -->
<callbackBaseUrl>http://localhost:8080/goobi</callbackBaseUrl>
<useOrigDir>false</useOrigDir>
<serverType>intranda-tesseract</serverType>
</config>
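<!--
Illustrative sketch, not part of the original configuration. It assumes
that several config blocks may coexist, as the matching order in the
comment at the top implies. With that assumption, a block naming both a
project and a step would be selected before the * block above.
The project name "Example_Project" and the step name "OCR" are
hypothetical. All other values are copied from the block above.
-->
<config>
<project>Example_Project</project>
<step>OCR</step>
<template>template.xml</template>
<itmUrl>http://localhost:8080/itm/service</itmUrl>
<!-- this must be without a trailing slash -->
<callbackBaseUrl>http://localhost:8080/goobi</callbackBaseUrl>
<useOrigDir>false</useOrigDir>
<serverType>intranda-tesseract</serverType>
</config>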