Data Poller

Goobi Administration plugin for periodic updating of existing METS files with content from a data query

Overview

Introduction

This documentation describes the installation, configuration and use of the Administration Plugin for automated repeated retrieval of data (e.g. from a catalog) to update records in Goobi workflow.

Installation

The plugin consists in total of the following files to be installed

plugin_intranda_administration_data-poller.jar
plugin_intranda_administration_data-poller-GUI.jar

These files must be installed in the correct directories so that they are in the following paths after installation:

/opt/digiverso/goobi/plugins/administration/plugin_intranda_administration_data-poller.jar
/opt/digiverso/goobi/plugins/GUI/plugin_intranda_administration_data-poller-GUI.jar

In addition, there is a configuration file that must be located at the following location:

/opt/digiverso/goobi/config/plugin_intranda_administration_data_poller.xml

Overview and functionality

The Data Poller plugin is automatically activated by Goobi. Its runtime starts at the configured start time and repeats according to the configured number of hours, e.g. every 24 hours, i.e. once a day.

If a user wants to have access to the plugin's user interface in addition to this automatic feature, he must belong to a user group that has been granted the following plugin-specific permission for this purpose:

Plugin_Goobi_DataPoller

To assign this right, the desired user group must first have the permission entered in the right pane.

If the permission for the user group should be newly entered, the user must first log in to Goobi again to be able to use this permission level. Afterwards he can click on the plugin Data Poller in the menu Administration and there he can also manually re-trigger an update of the data records by means of a query at any time.

Automatic backups

If the plugin finds updated metadata for an operation and therefore updates the METS file, it will first automatically create a backup of the current METS file meta.xml and if relevant also of meta_anchor.xml. The backup is stored next to the updated METS file.

Logging within the journal

The updates of the metadata by the plugin usually take place fully automatically in the background. In order to be able to understand nevertheless at any time for a data record, what happened with this in the meantime, the events are logged. For each process for which there were changes from this plugin, detailed entries are therefore automatically inserted within the ‚journal'. In addition to the timestamp, these contain, among other things, an exact listing of the changed metadata fields together with their contents. Thus, it is possible at any time to trace the previous or the new value as well.

Configuration

The configuration of the plugin is done via the configuration file plugin_intranda_administration_data_poller.xml and can be adjusted during operation. The following is an example configuration file:

<?xml version="1.0" encoding="UTF-8"?>
<config_plugin>

	<!-- multiple different rules can be defined for individual use cases.
        you can specify a start time and a delay in hours. the rule will only be executed if
        enabled is true. A rule can be of type filter or hotfolder. If the type hotfolder is
        used you must specify the path inside a path element in the rule.
    -->
	<rule title="SampleProject" enabled="false" startTime="22:00:00" delay="24">

		<!-- filter which items to run through please notice that filters that contain blanks
        need to be surrounded by quotation marks -->
		<filter>project:SampleProject</filter>
		<!--
		<filter>"project:Manuscript items"</filter>
        <path>/opt/digiverso/goobi/import/</path>
        -->

		<!-- which catalogue to use (GBV, Wiener, CBL Adlib ...) -->
		<catalogue>Wiener</catalogue>
		
		<!-- which catalogue field to use and which identifier to use for the 
		catalogue request (use standard variable replacer compatible value here) -->
		<catalogueField fieldName="12" fieldValue="$(meta.CatalogIDDigital)" />

		<!-- define if existing structure subelements shall be kept (true),
        otherwise a complete new mets file is created and overwrites the
        existing one (false) -->
		<mergeRecords>true</mergeRecords>

		<!-- define if children shall be analysed as well. If a sub element contains an 
		identifier, the metadata will get imported as well -->
		<analyseSubElements>true</analyseSubElements>

		<!-- execute an automatic export of updated records;
        this is only executed if mergeRecords is set to true -->
		<exportUpdatedRecords>false</exportUpdatedRecords>

       <!-- fieldList: Must have a mode attribute which can contain either blacklist or whitelist as a value.
            blacklist: All fields are updated except the defined ones. This is a potential dangerous setting!
            whitelist: Only the definied fields are updated. All others are skipped. 
            field: Use the internal metadata names from the ruleset as field definition
        -->
         <fieldList mode="blacklist">
            <field>viewerinstance</field>
            <field>singleDigCollection</field>
            <field>pathimagefiles</field>
            <field>_urn</field>
            <field>_representative</field>
         </fieldList>
        
		<!-- alwaysExecuteStepList: specify steps that shall be performed after each run of the rule
            step: name of the step that shall be executed
         -->
        <alwaysExecuteStepList>
            <step>resize images</step>
       </alwaysExecuteStepList>

        <!-- internal timestamp for the plugin to know when the rule was last executed -->
        <lastRun>1551731078691</lastRun>

	</rule>

	<rule type="filter" title="Archive project" enabled="false" startTime="21:00:00" delay="48">
		<filter>project:Archive</filter>
		<catalogue>K10Plus</catalogue>
		<catalogueField fieldName="12" fieldValue="$(meta.CatalogIDDigital)" />
		<mergeRecords>true</mergeRecords>
		<analyseSubElements>true</analyseSubElements>
		<exportUpdatedRecords>false</exportUpdatedRecords>
        <fieldList mode="whitelist">
		  <field>Author</field>
		  <field>PublicationYear</field>
		  <field>Subject</field>
		  <field>CreatorsAllOrigin</field>
        </fieldList>
	</rule>

	<rule type="filter" title="Manuscript project" enabled="false" startTime="23:00:00" delay="24">
		<filter>"project:Manuscript items"</filter>
		<catalogue>K10Plus</catalogue>
		<catalogueField fieldName="12" fieldValue="$(meta.CatalogIDDigital)" />
		<mergeRecords>true</mergeRecords>
		<analyseSubElements>true</analyseSubElements>
		<exportUpdatedRecords>false</exportUpdatedRecords>
        <fieldList mode="blacklist">
		  <field>TitleDocMain</field>
		  <field>CatalogueIDDigital</field>
		  <field>DocLanguage</field>
        </fieldList>fieldList mode="blacklist">
	</rule>

</config_plugin>

Attributes of the rule element

Subelements of the rule element

Additional elements/parameters - rule type filter

Additional elements/parameters - rule type hotfolder

Last updated