Liechtenstein Volksblatt Importer
This workflow plugin enables the mass import of individual newspaper editions of the Liechtensteiner Volksblatt.
Introduction
This workflow plugin reads metadata from file names and from a configuration file in order to create or update processes and their metadata. It was originally developed for importing newspaper issues of the Liechtensteiner Volksblatt, but it can also be used for other imports as long as the page names follow the same pattern as `001_vbhp_4c_2019-01-11`, where the first three digits indicate the serial number of the page within its issue and the final date is the date of the issue. The description text in between does not matter as long as it does not match the regular expression `\d{4}-\d{2}-\d{2}`, which is reserved for the issue date.
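The naming pattern described above can be illustrated with a small shell sketch. It is not part of the plugin: the function name `parse_page_name` is hypothetical, and the sketch assumes that the serial number is followed by an underscore and that the issue date is the last part of the name before any file extension.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the plugin: split a page file name
# such as 001_vbhp_4c_2019-01-11.pdf into serial number and issue date.
parse_page_name() {
    base=$(basename "$1")   # drop any leading directories
    name=${base%.*}         # drop the file extension, if there is one
    # Print "<serial> <date>" when the name matches the pattern and
    # print nothing otherwise. The text between the serial number and
    # the date is ignored.
    printf '%s\n' "$name" |
        sed -n -E 's/^([0-9]{3})_.*([0-9]{4}-[0-9]{2}-[0-9]{2})$/\1 \2/p'
}
```

Calling `parse_page_name 001_vbhp_4c_2019-01-11.pdf` prints `001 2019-01-11`, while a name that does not follow the pattern produces no output.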
Overview
Details | |
---|---|
Identifier | intranda_workflow_liechtenstein_volksblatt_importer |
Source code | |
Licence | GPL 2.0 or newer |
Documentation date | 16.11.2023 |
Installation
To install the plugin, the following two files must be installed:
To configure how the plugin should behave, various values can be adjusted in the configuration file. The configuration file is usually located here:
Configuration
The plugin is configured in the configuration file, which may look like the following example:
The individual parameters are used as follows:
Value | Description |
---|---|
| Path to the folder containing the separated newspaper pages for the import. |
| Name of the production template to be used. |
| If the value is 'true', the files are deleted from the import folder as soon as they have been successfully imported; otherwise, all files in the import folder remain unchanged. |
| A separate metadata entry is created from each element specified here. It accepts six attributes, where |
Using the plugin
The file format of the newspaper pages to be imported does not matter to the plugin, as all metadata to be saved is read directly from the file names and from the configuration file. The page files are distributed to the master folders of the corresponding Goobi processes.
The file formats in the file links created by this plugin in the METS file are changed to `tiff` and `jpg`, as only these can be rendered correctly by the metadata editor. If the pages cannot be viewed correctly after import, the files may need to be converted first. If PDF files are to be imported, such a step could look like this:

1. Install the package `pdftoppm`, if not already done.
2. Create a script file under the name `/opt/digiverso/goobi/scripts/script_convertPdfToTiff.sh`.
3. Create a workflow step in the workflow with the path to the script: `/opt/digiverso/goobi/scripts/script_convertPdfToTiff.sh "{origpath}"`
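The content of the conversion script is not given here, so the following is only a minimal sketch of what it might contain. It assumes that the script receives the resolved `{origpath}` folder as its first argument, that each PDF holds exactly one newspaper page, and that a resolution of 300 dpi is wanted. The function name `convert_pdfs_to_tiff` is hypothetical.

```shell
#!/bin/sh
# Assumed content for script_convertPdfToTiff.sh, not the original script.
# Converts every PDF in the given folder into a TIFF with the same base name.
convert_pdfs_to_tiff() {
    folder=$1
    for pdf in "$folder"/*.pdf; do
        # Skip the unexpanded pattern when the folder contains no PDF files.
        [ -e "$pdf" ] || continue
        # -singlefile renders only the first page and writes "<prefix>.tif",
        # which fits the assumption of one page per PDF.
        # -r 300 is an assumed resolution; adjust as needed.
        pdftoppm -tiff -r 300 -singlefile "$pdf" "${pdf%.pdf}" || return 1
    done
}

# The workflow step passes the folder path as the first argument.
if [ "$#" -gt 0 ]; then
    convert_pdfs_to_tiff "$1"
fi
```

The sketch leaves the original PDF files in place after conversion. Whether they should be removed or moved afterwards depends on the workflow and is not covered here.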