4.5. Full-text recognition - OCR

The intranda TaskManager can run text recognition (OCR) automatically when it is connected to Goobi through this plugin. Depending on the configuration, you can define which data formats are transferred back to Goobi from the OCR engine.

Starting the plugin

The TaskClient call for an OCR job is similar to the calls for other job types::

    /usr/bin/java -jar /opt/digiverso/itm/bin/TaskClient.jar
        -itm http://localhost:8080/itm/service
        -s {tifpath}
        -d {processpath}
        -e
        -gid {processid}
        -i {stepid}
        -T {processtitle}
        -f {process.Schrifttyp}
        -n template.xml
        -l ${metas.DocLanguage}
        -st intranda-abbyy


The command parameters are explained in the following table:


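+--------------------+-------------------------+--------------------------------------------------+
| Parameter          | Possible Goobi variable | Description                                      |
+====================+=========================+==================================================+
| -itm               |                         | URL to intranda TaskManager interface            |
+--------------------+-------------------------+--------------------------------------------------+
| -e, --returnError  |                         | If set, the TaskClient ends with an error code   |
|                    |                         | to prevent the workflow from continuing          |
|                    |                         | automatically                                    |
+--------------------+-------------------------+--------------------------------------------------+
|                    | 0 – 10                  | Priority with which this job is executed         |
+--------------------+-------------------------+--------------------------------------------------+
| -gid               | {processid}             | Goobi process ID                                 |
+--------------------+-------------------------+--------------------------------------------------+
| -i                 | {stepid}                | ID of the workflow step that launches the call   |
+--------------------+-------------------------+--------------------------------------------------+
| -T, --title        | {processtitle}          | Title of the Goobi process for which the call is |
|                    |                         | launched                                         |
+--------------------+-------------------------+--------------------------------------------------+
| -t, --jobtype      |                         | Job type                                         |
+--------------------+-------------------------+--------------------------------------------------+
| -n, --templatename |                         | Name of a previously generated configuration file|
+--------------------+-------------------------+--------------------------------------------------+
| -s, --source       | {tifpath}               | Path to the media directory of the process       |
+--------------------+-------------------------+--------------------------------------------------+
| -d, --destination  | {processpath}           | Path to the main process directory               |
+--------------------+-------------------------+--------------------------------------------------+
| -f, --fontType     | e.g. {product.fonttype} | Typeface; Fraktur and Antiqua are supported      |
|                    | or {process.fonttype}   |                                                  |
+--------------------+-------------------------+--------------------------------------------------+
| -l, --language     | ${metas.DocLanguage}    | Language of the text being recognised            |
+--------------------+-------------------------+--------------------------------------------------+
| -st                |                         | OCR server type                                  |
+--------------------+-------------------------+--------------------------------------------------+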

Operation of the plugin

When the intranda TaskManager receives a new OCR job, it generates one or more tickets depending on the number of images to be recognised. Typically, a ticket covers up to 500 images requiring up to 10 GB of storage. Each ticket comprises a list of image files and an instruction specifying the OCR output format.
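The way a job is divided into tickets can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the TaskManager's actual implementation: the `Ticket` class and the `split_into_tickets` function are hypothetical names, the two limits are the typical values quoted above, and 10 GB is read here as 10 GiB.

```python
from dataclasses import dataclass, field

MAX_IMAGES = 500           # typical per-ticket image limit quoted in the text
MAX_BYTES = 10 * 1024**3   # typical per-ticket storage limit (assumption: 10 GB read as GiB)


@dataclass
class Ticket:
    """Hypothetical ticket: image file names plus the requested OCR output formats."""
    images: list = field(default_factory=list)
    output_formats: tuple = ()
    size_bytes: int = 0


def split_into_tickets(images, output_formats, max_images=MAX_IMAGES, max_bytes=MAX_BYTES):
    """Group (name, size_in_bytes) pairs into tickets that respect both limits.

    An image that is larger than max_bytes still gets a ticket of its own.
    """
    tickets = []
    current = Ticket(output_formats=tuple(output_formats))
    for name, size in images:
        too_many = len(current.images) >= max_images
        too_big = bool(current.images) and current.size_bytes + size > max_bytes
        if too_many or too_big:
            # Close the current ticket and start a new one.
            tickets.append(current)
            current = Ticket(output_formats=tuple(output_formats))
        current.images.append(name)
        current.size_bytes += size
    if current.images:
        tickets.append(current)
    return tickets


if __name__ == "__main__":
    # 1200 images of 5 MiB each: here the image limit, not the size limit, forces the split.
    scans = [(f"{i:08d}.tif", 5 * 1024**2) for i in range(1, 1201)]
    for number, ticket in enumerate(split_into_tickets(scans, ["txt", "xml"]), start=1):
        print(number, len(ticket.images), ticket.size_bytes)
```

Checking both limits before an image is added keeps every ticket within bounds without a second pass over the list.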

Tickets are processed individually. The intranda TaskManager loads the ticket and the corresponding images into the OCR input folder, which can be a local folder, a mounted folder or a WebDAV folder. The OCR application monitors this folder and begins text recognition once all the data has been transferred.

The intranda TaskManager then monitors both the error folder for possible errors and the OCR application’s control folder for result messages. When a suitable control file is received, all the data belonging to the ticket is gathered in the output folder and downloaded. The files are then saved, according to their file suffix, into individual sub-folders of the ocr sub-folder within the corresponding process folder.
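The final step, distributing the downloaded results by file suffix, might look roughly like the following Python sketch. The function name `sort_results_by_suffix` is hypothetical, and naming each sub-folder after the lower-cased suffix is an assumption made for illustration; the actual folder names depend on the TaskManager and Goobi configuration.

```python
import shutil
from pathlib import Path


def sort_results_by_suffix(output_dir, ocr_dir):
    """Move each result file into ocr_dir/<suffix>/ and return {suffix: [file names]}."""
    output_dir, ocr_dir = Path(output_dir), Path(ocr_dir)
    moved = {}
    for path in sorted(output_dir.iterdir()):
        if not path.is_file() or not path.suffix:
            continue  # skip sub-directories and files without a suffix
        suffix = path.suffix.lstrip(".").lower()  # ".XML" -> "xml"
        target_dir = ocr_dir / suffix             # assumption: sub-folder named after the suffix
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
        moved.setdefault(suffix, []).append(path.name)
    return moved
```

Called with the output folder of a ticket and the ocr sub-folder of a process, the function returns a mapping from each suffix to the files it moved, which could be used to check that every requested output format was delivered.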
