4.16. Creation of a PDF document for an entire work

This plugin creates a PDF document for a Goobi operation and saves it in the file system. It uses the structure of the METS file of the Goobi process for the table of contents and the image files referenced in the METS file for the page contents.

Depending on the configuration, existing single-page PDFs - if available - can also be used to generate the overall PDF. This makes the creation of the PDF much faster, since no conversion is necessary and full texts can be taken over from the single page PDFs.

Alternatively, full texts can also be written from ALTO documents into the PDF. This is also possible when creating from single-page PDFs, but is suppressed if these already contain full texts.

Starting the plugin

The creation of a PDF is triggered via the TaskClient by the call:

/usr/bin/java -jar /opt/digiverso/itm/bin/TaskClient.jar 
    -itm http://localhost:8080/itm/service 
    -s {processpath} 
    -d {processpath}/ocr/{processtitle}_pdf 
    -e 
    -i {stepid} 
    -T {processtitle} 
    -t PdfCreation 
    -n template 
    -gid {processid}

Parameters

The command parameters are explained in the following table:

Parameter

Possible Goobi variable

Meaning

-itm

http://localhost/itm/service

URL to intranda TaskManager interface

-e, --returnError

-

If entered, the TaskClient will end with an error code to prevent the workflow from continuing automatically

-p

0 – 10

Priority to execute this job

-gid

{processid}

Goobi process ID

-i

{stepid}

ID for the workflow step that launches the call

-T, --title

{processtitle}

Goobi process title for which the call is launched

-t, --jobtype

PdfCreation

Job type

-n, --templatename

template

This parameter is not used by the plugin.

-s, --source

{processpath}

Path to the folder containing the ABBYY XML files

-d, --destination

{processpath}/ocr/{processtitle}_pdf

Path to the folder in which the ebook is to be stored

Einzelseiten-PDFs und ALTO-Dateien werden aus dem Unterordner ocr des source-Ordners - innerhalb dessen sich die Datei meta.xml befindet - aus den Unterordnern <Vorgangsname>_pdf bzw. <Vorgangsname>_alto geladen. Die Dateien müssen, bis auf die Endung, so benannt sein wie die Bilddateien in der METS-Datei.

Ist der Zielordner, in den die PDF-Datei geschrieben wird, identisch mit dem Ordner der Einzelseiten-PDFs, so wird letzterer umbenannt in <Vorgangsname>_pdf_abbyy.

Single page PDFs and ALTO files are loaded from the subfolder ocr of the source folder - within which the meta.xml file is located - from the subfolders <processtitle>_pdf or <processtitle>_alto. The files must be named like the image files in the METS file, except for the extension.

If the destination folder where the PDF file is written to is identical to the folder of the single-page PDFs, the latter is renamed to <processtitle>_pdf_abbyy.

Configuration of the plugin

The plugin can be used without its own configuration file. In this case, single page PDFs and full text ALTO files are used for the creation, if available. The intrandaContentServer configuration file provided by Goobi is used for all other parameters of PDF creation.

If a different behaviour is required, the following configuration file must be created:

/opt/digiverso/itm/config/pdfCreationConfig.xml

This file has the following content:

<config_plugin>
    <usePdfDirectory>[true/false]</usePdfDirectory>
    <useAltoDirectory>[true/false]</useAltoDirectory>
    <contentServerConfigPath>
        /path/to/my/content/server/config.xml
    </contentServerConfigPath>
</config_plugin>

The parameter usePdfDirectory can be used to switch the use of single-page PDFs on or off; the parameter useAltoDirectory can be used to switch the use of ALTO files for integrating full texts.

The contentServerConfigPath parameter can be used to specify a path to an alternative ContentServer configuration if a different behavior is desired than for PDF creation in Goobi.

Stopping the plugin

After stopping the plugin, a currently running PDF creation will be stopped before the plugin is terminated. Any jobs remaining in the queue will be processed after restarting the plugin.

The PDF creation is performed by the TaskManager plugin. So if the TaskManager itself is terminated, the creation is aborted immediately. However, the job is then neither on the waiting list, nor is it considered finished or aborted. In this case, the job must be manually aborted and restarted.

Last updated