Libsafe Integration
This is the technical documentation for integrating Libsafe long-term archiving.
Overview
Identifier
intranda_step_bagcreation,intranda_step_bagsubmission
Repository
Licence
GPL 2.0 or newer
Last change
25.07.2024 11:50:00
Introduction
This documentation describes the installation, configuration and use of the plugin for ingesting into the Libsafe long-term archiving system.
With this Goobi plugin, the metadata objects available in Goobi and additional descriptive documents can be combined into an E-ARK BagIt package and transferred to the Libsafe server.
Installation
The following files must be installed in order to use Libsafe Ingest:
Two new steps must be added to the workflow. The first is an automatic step that creates the E-ARK-based BagIt Submission Information Package (SIP); intranda_step_bagcreation must be selected as its plugin. A second automatic step is then required to handle the actual data delivery; it uses the intranda_step_bagsubmission plugin.
Overview and functionality
This plugin is integrated into the workflow so that it is executed automatically. Manual interaction with the plugin is not necessary. To use it within a workflow step, it should be configured as shown in the screenshot below.
Long-term archiving consists of several sub-steps:
Folder structure
Firstly, the file and folder structure required for the SIP is created.
A metadata folder and a representations folder are created within a root folder. The metadata folder contains the subfolders descriptive and other, which store MODS files and other formats such as the DFG viewer extensions. The representations folder contains one subfolder per format, each with a data subfolder in which the files are located.
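The layout described above can be sketched in a few lines of Python. This is only an illustration of the folder structure, not the plugin's own code; the function name and the format names passed to it (for example media or ocr) are hypothetical.

```python
from pathlib import Path
from typing import Sequence


def create_sip_skeleton(root: Path, formats: Sequence[str]) -> None:
    """Create the empty SIP folder layout below `root` (illustrative only)."""
    # metadata/descriptive holds the MODS files, metadata/other the remaining formats
    for sub in ("descriptive", "other"):
        (root / "metadata" / sub).mkdir(parents=True, exist_ok=True)
    # one folder per format below representations, each with its own data subfolder
    for fmt in formats:
        (root / "representations" / fmt / "data").mkdir(parents=True, exist_ok=True)
```

Calling create_sip_skeleton(Path("bag"), ["media", "ocr"]) would create bag/metadata/descriptive, bag/metadata/other, bag/representations/media/data and bag/representations/ocr/data.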
Metadata
Each format is described in its own METS file, which lists the files in that format's data folder and contains a fileGroup and a structMap.
The metadata is described in MODS. The descriptive folder contains a separate file for each structural element. This file contains all metadata for which an export mapping has been defined in the rule set. Some metadata is not meant to appear in the regular export but must still be archived, so additional export parameters that apply only to the Libsafe export can be defined in the configuration file.
Technical or administrative metadata is stored in the other folder. Finally, a METS file is created in the root folder, which references the other METS, MODS and AMD files that were created.
SIP creation
The prepared data is now packaged as a SIP BagIt. For this purpose, a checksum is calculated for every file and listed in the file manifest-sha256.txt. The file bagit.txt contains information about the bag version and the encoding, and bag-info.txt contains information about the creator of the bag, the size, payload and the creation date, as well as some information about the transmission of the ingest status back to Goobi.
Finally, the tagmanifest-sha256.txt file is created. It contains the names and checksums of the three files mentioned above.
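The relationship between the two manifests can be illustrated with a short Python sketch. It is not the plugin's implementation. It assumes that bagit.txt and bag-info.txt already exist in the bag root, that every other file below the root is listed in manifest-sha256.txt, and that each manifest line has the form "checksum, two spaces, relative path"; the function names are hypothetical.

```python
import hashlib
from pathlib import Path

# The three files whose checksums end up in the tag manifest (as described above)
TAG_FILES = ("manifest-sha256.txt", "bagit.txt", "bag-info.txt")
TAG_MANIFEST = "tagmanifest-sha256.txt"


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifests(bag_root: Path) -> None:
    """Write manifest-sha256.txt and tagmanifest-sha256.txt into bag_root."""
    # Assumption: everything except the top-level tag files is listed in the manifest
    top_level_tags = {bag_root / name for name in TAG_FILES + (TAG_MANIFEST,)}
    listed = sorted(
        p for p in bag_root.rglob("*") if p.is_file() and p not in top_level_tags
    )
    lines = [f"{sha256_of(p)}  {p.relative_to(bag_root).as_posix()}" for p in listed]
    (bag_root / "manifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")

    # The tag manifest is written last because it includes the manifest's checksum
    tag_lines = [f"{sha256_of(bag_root / name)}  {name}" for name in TAG_FILES]
    (bag_root / TAG_MANIFEST).write_text("\n".join(tag_lines) + "\n", encoding="utf-8")
```

The ordering matters: the tag manifest can only be computed once manifest-sha256.txt, bagit.txt and bag-info.txt have their final content.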
Tar generation
The previously prepared folders and files are combined into a tar file and saved in the process folder.
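As a rough illustration of this step, the following Python sketch packs a bag folder into an uncompressed tar file. It assumes that the bag's root folder carries the bag name and uses the _bag.tar suffix mentioned under data delivery; the function name and the paths are hypothetical.

```python
import tarfile
from pathlib import Path


def pack_bag(bag_root: Path, process_folder: Path) -> Path:
    """Pack bag_root into <bag name>_bag.tar inside process_folder."""
    target = process_folder / f"{bag_root.name}_bag.tar"
    # Mode "w" writes a plain tar without compression
    with tarfile.open(target, "w") as archive:
        # arcname keeps the bag folder as the single top-level entry of the archive
        archive.add(bag_root, arcname=bag_root.name)
    return target
```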
Data delivery
Data is delivered via SFTP upload. For this purpose, the previously created SIP file is uploaded to the remote server. Alternatively, the data can be exported to a local directory on the server or a network share. The file name consists of the bag name followed by the suffix _bag.tar.
Feedback to Goobi
The status is reported back to Goobi via REST API calls. There are various endpoints for providing the individual pieces of information. The REST API can handle XML or JSON. To select the format, the Accept header must be set for GET requests and the Content-Type header for all other requests, in each case to application/xml or application/json. If no format is specified, JSON is used by default.
Authentication can be carried out in two ways. Either the required methods are enabled in goobi_rest.xml for an IP address, in which case requests from that one server are accepted, or an API token is generated. Individual methods can then be authorised for this API token without IP address restrictions. In that case, authentication takes place via the HTTP header Authorization: Basic <TOKEN>.
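The header rules above can be summarised in a small Python helper based on urllib.request. This is a sketch under assumptions: the function name is hypothetical, the URL in the usage note is a placeholder, and the token header uses the standard HTTP spelling Authorization.

```python
import urllib.request
from typing import Optional


def build_request(url: str, token: str, method: str = "GET",
                  body: Optional[bytes] = None, fmt: str = "json") -> urllib.request.Request:
    """Build a request with the headers described above (illustrative only)."""
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    media_type = f"application/{fmt}"
    # Token authentication as described in the text; no IP restriction involved
    headers = {"Authorization": f"Basic {token}"}
    if method == "GET":
        # GET requests state the desired response format
        headers["Accept"] = media_type
    else:
        # All other requests declare the format of the body they send
        headers["Content-Type"] = media_type
    return urllib.request.Request(url, data=body, headers=headers, method=method)
```

A call such as build_request("https://goobi.example.org/api/process/4711/journal", "TOKEN", method="POST", body=b"{}") only builds the request object; it would still have to be sent with urllib.request.urlopen. The host and the /api prefix in this URL are placeholders, not values taken from the plugin.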
The processid is required for all requests. This information is transmitted in two places. Firstly, it is part of the metadata and can be found in the MODS file in the field <mods:identifier type="GOOBI">. Alternatively, it is transmitted in the field Process-ID in bag-info.txt.
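On the receiving side, reading the process ID from bag-info.txt could look like the following Python sketch. It assumes the usual "Label: value" line format of bag-info.txt; the function name is hypothetical.

```python
from typing import Optional


def read_process_id(bag_info_text: str) -> Optional[str]:
    """Return the value of the Process-ID field, or None if it is missing."""
    for line in bag_info_text.splitlines():
        # Split only at the first colon so values containing colons stay intact
        label, separator, value = line.partition(":")
        if separator and label.strip() == "Process-ID":
            return value.strip()
    return None
```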
Transmission of the Libsafe ID
To make the generated Libsafe ID known in Goobi, a POST request must be sent to /process/<process id>/metadata.
Success/error message
A message in the process journal can be created via a POST request to /process/<process id>/journal.
The variables USERNAME and MESSAGE can contain any text; TYPE must be one of the values error, warn, info or debug.
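A client may want to check the TYPE value before sending the request. The following Python sketch does that and assembles the three values into a dictionary. The key names userName, type and message are purely hypothetical placeholders, since the actual field names of the journal endpoint are not given here.

```python
# Allowed values for TYPE, as listed above
ALLOWED_TYPES = ("error", "warn", "info", "debug")


def journal_entry(username: str, message: str, entry_type: str) -> dict:
    """Validate TYPE and collect the values for a journal request body."""
    if entry_type not in ALLOWED_TYPES:
        raise ValueError(f"TYPE must be one of {ALLOWED_TYPES}, got {entry_type!r}")
    # NOTE: the key names are hypothetical; check the REST API for the real ones
    return {"userName": username, "type": entry_type, "message": message}
```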
Status change
To complete the ingest process in Goobi, the ID of the step to be closed must be known. This ID can be determined via the REST API by making a GET request for all steps of the process.
The correct step and its ID can be found in the response using either steptitle or status. A PUT request can then finalise the step:
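Before that PUT request is sent, the matching step has to be picked out of the GET response. A minimal Python sketch of this lookup follows. It assumes the JSON response has already been parsed into a list of dictionaries that carry the steptitle and status fields mentioned above; the function name is hypothetical, and the concrete status values depend on the API.

```python
from typing import Dict, List, Optional


def find_steps(steps: List[Dict], steptitle: Optional[str] = None,
               status: Optional[str] = None) -> List[Dict]:
    """Return all steps matching the given steptitle and/or status."""
    return [
        step for step in steps
        # A criterion that is None is ignored, so either field can be used alone
        if (steptitle is None or step.get("steptitle") == steptitle)
        and (status is None or step.get("status") == status)
    ]
```

The ID needed for the PUT request is then read from the matching entry.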
Configuration
The plugin is configured in the file plugin_intranda_step_bagcreation.xml, which is explained here:
The <config> area can be repeated as often as required. This allows different metadata configurations or ingests to different destinations for different projects.
The sub-elements <project> and <step> are used to check whether the current block should be used for the current step. The system first looks for an entry that contains both the project name and the step name. If there is none, it looks for an entry with the step name and * as the project, which stands for any project. If no such entry is found either, it looks for an entry with the project name and * as the step. Otherwise the default block is used, in which both <project> and <step> contain *.
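The lookup order can be written down as a short Python sketch. It works on a hypothetical in-memory form of the <config> blocks, a list of dictionaries with project and step keys, and is meant only to make the precedence explicit; it is not the plugin's own code.

```python
from typing import Dict, List, Optional


def select_config(blocks: List[Dict], project: str, step: str) -> Optional[Dict]:
    """Pick the configuration block for a project and step, most specific first."""
    lookup_order = [
        (project, step),  # 1. exact project and exact step
        ("*", step),      # 2. any project, exact step
        (project, "*"),   # 3. exact project, any step
        ("*", "*"),       # 4. default block
    ]
    for wanted_project, wanted_step in lookup_order:
        for block in blocks:
            if block["project"] == wanted_project and block["step"] == wanted_step:
                return block
    return None
```

Note that a block for any project with a matching step name takes precedence over a block for the matching project with any step.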
The various <mets:fileGrp> elements are defined here. Each filegroup corresponds to a file format that is taken into account during delivery. Each defined element has the attributes folder, fileGrpName, prefix, suffix and mimeType, as well as the optional useOriginalFileExtension.
The folder to be used is specified in folder. First, a check is made to see whether the folder exists and contains files. If so, a folder named after the fileGrpName is created in the SIP folder structure. This value is also used as USE within the METS file. The individual <mets:file> entries within the fileGroup are composed of prefix, the actual file name and suffix:
Optionally, useOriginalFileExtension="true" can be used to specify that the file extension and MIMETYPE are determined automatically for each individual file. This works both for files directly in the specified folder and for files in subfolders.
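How the plugin detects these values internally is not described here. As a stand-in, the following Python sketch shows the general idea using the standard mimetypes module: it walks a folder including its subfolders and derives extension and MIME type per file from the file name. The function name and the fallback type application/octet-stream are assumptions.

```python
import mimetypes
from pathlib import Path
from typing import List, Tuple


def describe_files(folder: Path) -> List[Tuple[str, str, str]]:
    """Return (relative path, extension, MIME type) for every file below folder."""
    result = []
    # rglob also descends into subfolders, matching the behaviour described above
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            mime_type, _encoding = mimetypes.guess_type(path.name)
            result.append((
                path.relative_to(folder).as_posix(),
                path.suffix,
                mime_type or "application/octet-stream",  # assumed fallback
            ))
    return result
```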
The individual parameters, which are also known from the project configuration, are configured next. Because this export may require different values than the regular export to the Goobi viewer, they can be set separately here:
The individual parameters and their function are described in the Goobi workflow manual.
The <submissionParameter> section contains information about the owner of the data, which is written to bag-info.txt.
In addition to these fields, the bag-info.txt file also contains a range of other information, such as the creation date, the size of the set and the Oxum. These values do not need to be configured because they are determined automatically.
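For orientation: in the BagIt specification, the Payload-Oxum value has the form OctetCount.StreamCount, that is, the total number of bytes and the number of files in the payload. A minimal Python sketch of this calculation follows. Which folder counts as payload is an assumption left to the caller, and the function name is hypothetical.

```python
from pathlib import Path


def payload_oxum(payload_root: Path) -> str:
    """Return '<total bytes>.<file count>' for all files below payload_root."""
    files = [p for p in payload_root.rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return f"{total_bytes}.{len(files)}"
```

For a folder containing one file of 5 bytes and one file of 3 bytes, the function returns 8.2.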
The <additionalMetadata> section is used to extend the rule set. A mapping can be added here for metadata, corporate bodies, persons or groups for which no export mapping is provided in the rule set because this information should not be published in the regular export to the Goobi viewer.
The syntax is identical to the MODS mapping in the rule set.
The last step is to configure the access data for the SFTP transfer.
Authentication can be carried out using either a username and password or a private/public key. To authenticate using a password, the <keyfile> field remains empty. Otherwise, the key configured there is used.
<hostname> and <port> describe the access to the remote server. A target folder on the server can be specified using <remoteFolder> if the upload is not to take place in the root directory. <knownHostsFile> contains the path to a known_hosts file, which must contain a fingerprint of the host.