Metadata Cleanup
Goobi Step Plugin for Manipulating and Cleaning Metadata for the City Archive Kiel

Introduction

This documentation describes the installation, configuration and use of the Step Plugin for cleaning metadata for the Kiel City Archive. Specifically, the plugin automatically evaluates metadata that needs to be split into individual fields, such as the scale information of historical maps. The plugin also copies the corresponding image files into the master folder of the respective process.
Details
Identifier: intranda_step_kiel_archive_cleanup
Licence: GPL 2.0 or later
Compatibility: Goobi workflow 2022.05
Documentation date: 10.06.2022

How the plug-in works

The plugin is usually executed fully automatically within the workflow. It first checks whether the configuration file contains a block that matches the current workflow in terms of project name and workflow step. If so, it opens the METS file and applies the necessary changes. Finally, the plugin determines which images carry a prefix in their file name that matches a value read from the METS file, and copies them into the master folder of the process.
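The final copy step can be illustrated with a minimal Python sketch. It is not the plugin's actual implementation (the plugin itself is delivered as a .jar file): the function name `copy_images_with_prefix` is hypothetical, and the sketch assumes that "prefix" means the file name starts with the value read from the METS file.

```python
import shutil
from pathlib import Path


def copy_images_with_prefix(import_folder, prefix, master_folder):
    """Copy all files whose name starts with `prefix` into `master_folder`.

    Assumption: the match is a plain "starts with" comparison on the file name.
    Returns the sorted list of copied file names.
    """
    source = Path(import_folder)
    target = Path(master_folder)
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for path in sorted(source.iterdir()):
        # Only regular files whose name begins with the prefix are considered.
        if path.is_file() and path.name.startswith(prefix):
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied
```

In this sketch, a non-empty result corresponds to the case in which images were found for the process.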

Operation of the plugin

This plugin is integrated into the workflow in such a way that it is executed automatically. Manual interaction with the plugin is not necessary. To use it within a step of the workflow, it should be configured as shown in the screenshot below.
Integration of the plug-in into the workflow

Installation

The plugin consists of the following files to be installed:
plugin_intranda_step_kiel_archive_cleanup.jar
plugin_intranda_step_kiel_archive_cleanup.xml
The first file must be installed at the following path:
/opt/digiverso/goobi/plugins/step/plugin_intranda_step_kiel_archive_cleanup.jar
The second file is the configuration file, which must be located at the following path:
/opt/digiverso/goobi/plugins/config/plugin_intranda_step_kiel_archive_cleanup.xml

Configuration

The configuration of the plugin is done via the configuration file plugin_intranda_step_kiel_archive_cleanup.xml and can be adjusted during operation. The following is an example configuration file:
<config_plugin>
    <!--
        order of configuration is:
          1.) project name and step name matches
          2.) step name matches and project is *
          3.) project name matches and step name is *
          4.) project name and step name are *
    -->

    <config>
        <!-- which projects to use for (can be more than one, otherwise use *) -->
        <project>*</project>
        <step>*</step>

        <!-- folder where to import images from -->
        <importFolder>/opt/digiverso/import/kiel/</importFolder>

        <!-- METS field which contains the map ID that can be used to automatically find the images for the process -->
        <fieldForImagePrefix>UnitID</fieldForImagePrefix>

        <!-- Name of workflow steps which shall be deactivated if image files were found -->
        <stepToSkipIfImagesAvailable>Bilder einspielen</stepToSkipIfImagesAvailable>

        <!-- METS field that contains width, length and scale to be split into individual fields -->
        <size field="SizeSourcePrint"/>

        <!-- METS fields to create from the split size field and terms to use for splitting the size field (used as "startsWith") -->
        <sizeWidth field="MapWidth" term="Breite"/>
        <sizeLength field="MapLength" term="Länge"/>
        <sizeScale field="MapScale" term="Maßstab"/>

    </config>

</config_plugin>
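The order of precedence listed in the comment at the top of the example configuration can be sketched in Python as follows. This is an illustration under stated assumptions, not the plugin's own code: the function `select_config`, the project name "Maps", the step name "Metadata cleanup" and the second import folder are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration with a catch-all block and a specific block.
EXAMPLE = """
<config_plugin>
    <config>
        <project>*</project>
        <step>*</step>
        <importFolder>/opt/digiverso/import/kiel/</importFolder>
    </config>
    <config>
        <project>Maps</project>
        <step>Metadata cleanup</step>
        <importFolder>/opt/digiverso/import/maps/</importFolder>
    </config>
</config_plugin>
"""


def select_config(root, project, step):
    """Return the best matching <config> element, or None if nothing matches."""

    def values(block, tag):
        # <project> and <step> may occur several times per block.
        return {(el.text or "").strip() for el in block.findall(tag)}

    # Candidate (project, step) pairs in the documented order of precedence.
    candidates = [(project, step), ("*", step), (project, "*"), ("*", "*")]
    for wanted_project, wanted_step in candidates:
        for block in root.findall("config"):
            if (wanted_project in values(block, "project")
                    and wanted_step in values(block, "step")):
                return block
    return None


root = ET.fromstring(EXAMPLE)
block = select_config(root, "Maps", "Metadata cleanup")
print(block.findtext("importFolder"))
```

With this hypothetical configuration, the specific block is chosen for the project "Maps" and the step "Metadata cleanup", while any other combination falls back to the block in which both values are `*`.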
Parameter: Explanation
<project>: Determines the project for which the current <config> block applies. The name of the project is used here. This parameter can occur several times per <config> block.
<step>: Determines the workflow steps for which the current <config> block applies. The name of the workflow step is used here. This parameter can occur several times per <config> block.
<importFolder>: Specifies the directory from which the images are copied.
<fieldForImagePrefix>: Specifies the metadata field of the METS file whose value is used as the file name prefix for selecting the images to be copied.
<stepToSkipIfImagesAvailable>: Name of the workflow step that is deactivated if image files were found.
<size>: Specifies the METS field that contains width, length and scale and is to be split into individual fields.
<sizeWidth>: Specifies the field to be generated for the width (attribute field) and the term used to recognise the width in the size field (attribute term, applied as "starts with").
<sizeLength>: Specifies the field to be generated for the length and the term used to recognise it.
<sizeScale>: Specifies the field to be generated for the scale and the term used to recognise it.
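How the size field might be split using the configured terms can be sketched in Python. The source only states that the terms are used as "startsWith"; everything else here is an assumption made for illustration: the function name `split_size`, the semicolon as separator between the parts, the sample value, and the choice to store only the text after the term.

```python
# Target fields and their terms, taken from the example configuration.
TERMS = {
    "MapWidth": "Breite",
    "MapLength": "Länge",
    "MapScale": "Maßstab",
}


def split_size(raw, terms=TERMS, separator=";"):
    """Split a combined size value into individual fields.

    Assumptions: parts are separated by `separator`, and each part starts
    with one of the configured terms. The text after the term (without a
    leading colon or spaces) becomes the value of the new field.
    """
    result = {}
    for part in raw.split(separator):
        part = part.strip()
        for field, term in terms.items():
            if part.startswith(term):
                result[field] = part[len(term):].lstrip(" :").strip()
                break
    return result


# Hypothetical content of the SizeSourcePrint field.
print(split_size("Breite: 45 cm; Länge: 60 cm; Maßstab: 1:5000"))
```

Parts that do not start with any of the configured terms are ignored in this sketch, so a value without recognisable terms yields no new fields.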