4.15. PDF generation for the Goobi viewer

This TaskManager plugin creates PDF files based on METS/MODS documents and writes them to the file system. PDF file creation jobs are created through the TaskClient and can be monitored through a REST interface. When a job is completed, a status message is sent to a configured URL.

The plugin is designed so that interrupted jobs are picked up again and completed after the plugin or the entire TaskManager has been switched off in the meantime. The same applies after an unexpected termination of the server.

To communicate with the TaskManager, a Goobi viewer of version 3.1 or later is required. The viewer must have access to a TaskClient, which may already be included in the Goobi viewer installation.

Starting the plugin

Unlike the other TaskManager plugins, this plugin is not started via the usual TaskClient call, but programmatically from Java code. The call in the source code looks like this:

OcrClient.createPost(taskManagerUrl,
        metsFilePath, targetPath, "", "", priority, logId,
        title, dataRepository, "VIEWERPDF", downloadIdentifier,
        "noServerTypeInTaskClient", "", "", "", "", false);

| Parameter | Example value | Meaning |
| --- | --- | --- |
| taskManagerUrl | http://localhost/itm/service | The URL of the intranda TaskManager interface. |
| metsFilePath | /opt/digiverso/viewer/indexed_mets/12345.xml | The absolute path of the METS file in the file system. |
| targetPath | /opt/digiverso/viewer/download_pdf/emos5dm.pdf | The absolute path of the PDF file to be written in the download directory. |
| priority | 10 | The priority in the TaskManager queue. |
| logId | LOG_0003 | The METS ID of the structural element to be used. This value can be null or empty; in this case, the top-level structural element of the work is used. |
| title | 12345_LOG_0003 | The title of the PDF job. It can be used instead of the identifier in the REST status queries. |
| dataRepository | /opt/digiverso/viewer/ | The absolute file path to the viewer repository with the required content. |
| downloadIdentifier | | The identifier of the PDF job. |
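The constraints in the table (absolute paths, an optional logId) can be checked before the call is made. The following is a minimal sketch only: ViewerPdfArguments and its methods are hypothetical names that are not part of the TaskClient, and the absolute-path check assumes a Unix-like file system as in the example values.

```java
import java.nio.file.Paths;

/** Hypothetical helper, not part of the TaskClient: checks arguments before the call. */
final class ViewerPdfArguments {

    private ViewerPdfArguments() {
    }

    /** Returns the path unchanged if it is absolute, otherwise rejects it. */
    static String requireAbsolute(String name, String path) {
        if (path == null || !Paths.get(path).isAbsolute()) {
            throw new IllegalArgumentException(name + " must be an absolute path: " + path);
        }
        return path;
    }

    /** Maps a missing logId to "", which selects the top-level structural element. */
    static String normalizeLogId(String logId) {
        return logId == null ? "" : logId.trim();
    }
}
```

The checked values would then be passed as metsFilePath, targetPath and logId to the OcrClient.createPost call shown above.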

Configuration of the plugin

The plugin configuration folder of the intranda TaskManager must contain the configuration file viewerpdfconfig.xml with the following content:

<config_plugin>
    <jobsInProcessing max="1"></jobsInProcessing>
    <contentServerconfig>/opt/digiverso/itm/plugins/config/config_contentServer.xml</contentServerconfig>
    <viewerServletUrl>http://localhost:8082/viewer/harvest</viewerServletUrl>
</config_plugin>

The individual configuration elements have the following meaning:

| Parameter | Meaning |
| --- | --- |
| jobsInProcessing.max | The maximum number of PDF generations that can be processed simultaneously. |
| contentServerconfig | The configuration file to be used for the content server. This should be either the content server configuration file of the viewer itself or a copy of it. |
| viewerServletUrl | The URL to which status updates are sent when a PDF generation finishes or fails. This is usually the URL of the harvester interface of the viewer. |

You may need to save a copy of the Goobi viewer content server configuration to the location specified in the plugin configuration.
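To check a configuration file before deploying it, the three values can be read with the XML parser that ships with the JDK. This is a minimal sketch and not the way the plugin itself loads its configuration; the class name ViewerPdfConfig and its fields are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical holder for the three values of viewerpdfconfig.xml. */
final class ViewerPdfConfig {
    final int maxJobs;
    final String contentServerConfig;
    final String viewerServletUrl;

    private ViewerPdfConfig(int maxJobs, String contentServerConfig, String viewerServletUrl) {
        this.maxJobs = maxJobs;
        this.contentServerConfig = contentServerConfig;
        this.viewerServletUrl = viewerServletUrl;
    }

    /** Parses the XML text; any missing element or invalid number is reported as an error. */
    static ViewerPdfConfig parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            Element jobs = (Element) root.getElementsByTagName("jobsInProcessing").item(0);
            int max = Integer.parseInt(jobs.getAttribute("max").trim());
            return new ViewerPdfConfig(max, text(root, "contentServerconfig"),
                    text(root, "viewerServletUrl"));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid viewerpdfconfig.xml", e);
        }
    }

    /** Returns the trimmed text of the first element with the given tag name. */
    private static String text(Element root, String tag) {
        return root.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}
```

Trimming the element text means that a path wrapped onto its own line inside contentServerconfig is read the same way as a path written on a single line.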

Status queries via REST API

The status of a PDF job can be queried via a REST interface. The URLs specified below must be preceded by the URL of the REST interface of the TaskManager. This usually corresponds to the URL of the TaskManager, supplemented by /rest. The following queries are supported:

Details of the job: /viewerpdf/info/{id}

This returns general information about the PDF job from the TaskManager database as a JSON object. The object has the following properties:

| Property | Meaning |
| --- | --- |
| pdfID | The identifier of the PDF job. |
| title | The title of the PDF. |
| pages | The number of pages of the PDF. |
| size | The total size in bytes of all images required for the PDF, or of all required single-page PDFs if they exist and can be used for PDF generation. |
| startDate | The date on which the job was received. |
| endDate | The date on which the job was completed, if applicable. |

Position in the PDF queue: /viewerpdf/numJobsUntil/{id}

This returns, as plain text, the number of PDF jobs queued before the job with the identifier {id}.

Number of pages before the job: /viewerpdf/numPagesBefore/{id}

This returns, as plain text, the total number of pages of all PDF jobs queued before the job with the identifier {id}.

Jobs before the job: /viewerpdf/jobsUntil/{id}

This returns a JSON object with a list of all PDF jobs that are queued before the job with the identifier {id}. The list consists of objects with the same properties as those returned by the job details query.

In all requests, the title of the requested PDF file can be used instead of the identifier. In this case, the most recent job with this title is used.
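The query URLs above can be assembled as in the following sketch. The class ViewerPdfStatusUrls is a hypothetical helper, and the base URL http://localhost/itm used in the usage note is an assumption; the actual TaskManager URL depends on the installation. The identifier or title is percent-encoded so that it is safe to use as a path segment.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds the status query URLs described above. */
final class ViewerPdfStatusUrls {
    private final String restBase;

    /** @param taskManagerUrl URL of the TaskManager; "/rest" is appended to it */
    ViewerPdfStatusUrls(String taskManagerUrl) {
        String base = taskManagerUrl.endsWith("/")
                ? taskManagerUrl.substring(0, taskManagerUrl.length() - 1)
                : taskManagerUrl;
        this.restBase = base + "/rest";
    }

    URI info(String idOrTitle) {
        return build("info", idOrTitle);
    }

    URI numJobsUntil(String idOrTitle) {
        return build("numJobsUntil", idOrTitle);
    }

    URI numPagesBefore(String idOrTitle) {
        return build("numPagesBefore", idOrTitle);
    }

    URI jobsUntil(String idOrTitle) {
        return build("jobsUntil", idOrTitle);
    }

    private URI build(String query, String idOrTitle) {
        // URLEncoder produces form encoding, so spaces ("+") are rewritten as "%20"
        // to obtain a valid path segment; a literal "+" is already encoded as "%2B".
        String segment = URLEncoder.encode(idOrTitle, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(restBase + "/viewerpdf/" + query + "/" + segment);
    }

    /** Parses the plain-text body returned by numJobsUntil and numPagesBefore. */
    static int parseCount(String body) {
        return Integer.parseInt(body.trim());
    }
}
```

With the assumed base URL, new ViewerPdfStatusUrls("http://localhost/itm").info("12345_LOG_0003") yields http://localhost/itm/rest/viewerpdf/info/12345_LOG_0003. The resulting URI can be requested with any HTTP client.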

Feedback on completion

When a PDF job finishes, whether successfully or not, a GET request with the following parameters is sent to the viewerServletUrl configured in the TaskManager plugin:

| Parameter | Meaning |
| --- | --- |
| identifier | The identifier of the job. |
| status | The status of the job as a string. This is either READY if the PDF was created successfully, or ERROR if the job was terminated due to an error. |
| message | This parameter is only added in the case of status=ERROR and contains the error message with which the job was aborted. |

If a job is manually canceled in TaskManager, no completion message is sent.
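On the receiving side, the three parameters can be read from the query string of the GET request. The following sketch shows one possible way to do this with plain JDK classes; PdfJobNotification is a hypothetical class and not part of the Goobi viewer. Inside a servlet, the same values would normally be obtained from the request parameters instead of being parsed by hand.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical representation of the completion message sent by the plugin. */
final class PdfJobNotification {
    enum Status { READY, ERROR }

    final String identifier;
    final Status status;
    final String message; // null unless status is ERROR

    private PdfJobNotification(String identifier, Status status, String message) {
        this.identifier = identifier;
        this.status = status;
        this.message = message;
    }

    /** Parses a raw query string such as "identifier=abc&status=READY". */
    static PdfJobNotification fromQuery(String query) {
        Map<String, String> params = new HashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip fragments without a parameter name
            }
            params.put(decode(pair.substring(0, eq)), decode(pair.substring(eq + 1)));
        }
        String identifier = params.get("identifier");
        String status = params.get("status");
        if (identifier == null || status == null) {
            throw new IllegalArgumentException("identifier and status are required");
        }
        // valueOf rejects anything other than READY or ERROR with an IllegalArgumentException
        Status parsed = Status.valueOf(status);
        String message = parsed == Status.ERROR ? params.get("message") : null;
        return new PdfJobNotification(identifier, parsed, message);
    }

    private static String decode(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }
}
```

Since no message is sent when a job is canceled manually, a receiver should not treat a missing notification as proof that a job is still running.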

Stopping the plugin

If the plugin is stopped, the current PDF generation is aborted once the PDF page currently being processed has been created. The current PDF is then considered not yet generated and is generated again the next time the plugin is started. This also applies to an unexpected stop of the TaskManager or the entire Tomcat server.

Because the PDF is created within the TaskManager plugin itself, stopping the TaskManager also stops the PDF generation immediately. The interrupted PDF is likewise regenerated after the TaskManager is restarted.