As a workflow management application for the library environment, Goobi has to be able to deal with a wide range of specific configurations and project-specific requirements. To this end, it has been designed in line with established conventions. These cover individual directory structures and the way Goobi uses these structures in different areas of the application. This section outlines the directory structures that have proven most effective and explains how external storage is integrated into the system.
Depending on the way Goobi has been installed, the import directory will contain a range of data, mostly on a temporary basis. By way of example, import plug-ins use this directory to enter metadata and associated digital content in order to create processes. The respective import plug-ins are also responsible for deleting files that are no longer needed.
The config directory contains all the Goobi configuration files that do not have to be located within the application itself. These are listed below:
Depending on the specific installation, the config directory may also contain other configuration files in addition to those related to the application’s core components. Accordingly, we recommend that you also use this central configuration directory to store configurations for individual plug-ins that provide additional functionality.
For subsequent ease of maintenance, the paths and file names relating to the configuration of any new Goobi plug-ins that may be developed should also adhere to this convention.
Within Goobi, the UGH class library is used to process metadata, map PICA imports and generate METS files. In order to manage the huge variety of configuration options, UGH uses a mechanism known as rulesets. The rulesets directory is the central storage location for these rulesets. It allows you to make individual configurations available for different projects and types of publication.
A range of scripts can be made available centrally in the scripts directory. These scripts can be used within the workflow to automate certain tasks.
The Goobi installation path may vary depending on your installation. Typically, the base path for web applications on an Ubuntu Linux system within an Apache Tomcat servlet container is shown below:
Accordingly, the Goobi application is located on the following path within the file system:
Goobi workflow can be operated with S3-compatible storage. Note that a local file system is still required to store the metadata. This means that the files meta.xml, meta_anchor.xml and their backups, which exist for each process, continue to be stored in the file system. All other data, such as images and OCR results, are stored in the S3 storage area.
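The split between local and S3 storage can be expressed as a simple file name check. The following Python sketch is only an illustration and is not part of Goobi: the function name `stays_on_local_filesystem` is hypothetical, and it assumes that backups of meta_anchor.xml carry the same numeric suffix as the meta.xml backups (meta.xml.1, meta.xml.2 and so on).

```python
import re

# meta.xml, meta_anchor.xml and their numbered backups remain local.
# Assumption: meta_anchor.xml backups use the same ".<number>" suffix
# as the meta.xml backups (e.g. meta.xml.1).
_LOCAL_METADATA = re.compile(r"^meta(_anchor)?\.xml(\.\d+)?$")


def stays_on_local_filesystem(relative_path: str) -> bool:
    """Return True if the file is a metadata file that is kept locally."""
    # Only the file name matters, so strip any leading directories.
    file_name = relative_path.replace("\\", "/").rsplit("/", 1)[-1]
    return _LOCAL_METADATA.match(file_name) is not None


if __name__ == "__main__":
    for candidate in ("meta.xml", "meta_anchor.xml.2", "images/example_media/0001.tif"):
        target = "local file system" if stays_on_local_filesystem(candidate) else "S3"
        print(f"{candidate}: {target}")
```

Everything the check rejects, such as image files and OCR results, would be a candidate for the S3 bucket.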
To run Goobi with S3 as storage, the following two settings must be set within the configuration file goobi_config.properties:
Goobi workflow uses the AWS Java SDK internally. This means that the credentials for accessing the storage system are read either from $HOME/.aws or from environment variables. If another S3 provider is to be used instead of AWS, the connection can be configured in more detail. This requires a few more settings within the same configuration file:
Using S3 as a storage system should, in principle, work with all S3-compatible APIs. Both Amazon S3 and MinIO were used while developing the S3 functionality.
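Before switching a system to S3, it can help to confirm that credentials are present in one of the two places mentioned above. The following Python sketch is a diagnostic aid under stated assumptions, not part of Goobi: the function name `find_credential_sources` is hypothetical, and it checks only the standard AWS environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and the file $HOME/.aws/credentials. It does not validate the credentials themselves.

```python
import os
from pathlib import Path


def find_credential_sources(env=None, home=None):
    """Return the places where AWS-style credentials appear to be available."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    sources = []
    # Standard AWS environment variables; both are needed for key-based access.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment")
    # Shared credentials file in the home directory of the current user.
    if (home / ".aws" / "credentials").is_file():
        sources.append("credentials-file")
    return sources


if __name__ == "__main__":
    found = find_credential_sources()
    print("Credential sources found:", ", ".join(found) if found else "none")
```

The check has to run as the operating system user that runs the servlet container, because both $HOME and the environment depend on that user.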
As a web-based application, Goobi has its own structure and is located on a defined path in the file system independently of the servlet container being used. This section explains how to organise the directory structures within which Goobi saves its data and the different configuration files.
The base path for all digitisation software in the Goobi environment is:
The following directories are usually located on this base path:
The logs directory is the main directory for log files. Goobi log files are also stored here (assuming the system is properly configured). The other directories listed above relate to frequently used applications (e.g. viewer for the Goobi viewer, itm for the intranda Task Manager and goobi for Goobi).
The base path for Goobi is:
In most cases, this base path will accommodate the following folder structure (see below for details of each sub-directory):
Depending on the way Goobi has been installed, the plugins directory may contain a number of plug-ins that perform imports or call Web API commands. Depending on the task, the compiled plug-ins are located in either of the directories shown below:
Goobi uses a mechanism called XSLT transformation to generate dockets as PDF files. This involves generating PDF documents from existing XML files on the basis of XSLT files located centrally in the xslt directory.
Most digitisation projects involve handling very large volumes of data. In most cases, this makes it necessary to link external storage capacity to the server. This can be done in a number of ways. We recommend that the external storage is linked to the following folder in the directory tree:
This means that all Goobi data can be found in a central location.
Two solutions for integrating external storage are explained in schematic form below. We do not recommend linking via CIFS as this can affect performance and functionality. Furthermore, CIFS does not allow you to create symbolic links or set read-only permissions.
The following information is required if you wish to integrate external storage via an NFS share:
• exporting server
• exporting directory
You can then add the storage to the directory tree via NFS. It is a good idea to add an entry into the file /etc/fstab that automatically sets up the link when the system starts up. This entry could be as follows:
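As an illustration of how such an entry is put together, the following Python sketch builds one /etc/fstab line from the exporting server and the exporting directory. It is a sketch under stated assumptions: the function name `build_nfs_fstab_entry` is hypothetical, and the server name, export path and mount point in the example are placeholders that must be replaced with the values for your installation. The mount option defaults is only a starting point.

```python
def build_nfs_fstab_entry(server, export_dir, mount_point, options="defaults"):
    """Build one /etc/fstab line: device, mount point, type, options, dump, pass."""
    for value in (server, export_dir, mount_point, options):
        # fstab fields are separated by whitespace, so none may contain any.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"fstab fields must be non-empty and free of whitespace: {value!r}")
    # NFS devices are written as <server>:<exported directory>; the trailing
    # "0 0" disables dump backups and file system checks for this mount.
    return f"{server}:{export_dir} {mount_point} nfs {options} 0 0"


if __name__ == "__main__":
    # All three values are placeholders, not defaults taken from Goobi.
    print(build_nfs_fstab_entry("nfs.example.org", "/export/goobi", "/mnt/goobi-storage"))
```

The resulting line can be tested with mount -a before the system is rebooted.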
Another way of integrating external storage is to attach it to the virtual machine as an independent device. The devices can be iSCSI targets or SAN LUNs. They are subsequently combined into a logical volume in the virtual machine using LVM. The result is an aggregated storage unit based on a number of devices.
The metadata sub-directory is the central directory for storing metadata and digital content generated by Goobi. For each Goobi process, it contains a directory with the name of the process ID. Directories for individual Goobi processes are structured as follows:
Depending on the configuration, the central metadata file meta.xml may be accompanied by other back-up files, e.g. meta.xml.1, meta.xml.2, meta.xml.3.
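When inspecting a process directory from a script, the back-up files can be picked out by their numeric suffix. The following Python sketch is a minimal illustration: the function name `sort_metadata_backups` is hypothetical, and it makes no assumption about whether a lower number means a newer or an older back-up.

```python
import re

# Matches meta.xml.1, meta.xml.2, ... but not meta.xml itself.
_BACKUP = re.compile(r"^meta\.xml\.(\d+)$")


def sort_metadata_backups(file_names):
    """Return the meta.xml back-up names, ordered by their numeric suffix."""
    numbered = []
    for name in file_names:
        match = _BACKUP.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    # Sort numerically so that meta.xml.10 comes after meta.xml.2.
    return [name for _, name in sorted(numbered)]


if __name__ == "__main__":
    names = ["meta.xml.10", "meta.xml", "meta.xml.2", "meta_anchor.xml", "meta.xml.1"]
    print(sort_metadata_backups(names))
```

In practice the input would typically come from os.listdir() on the directory of a single process.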
Within the workflow, the images directory is accessible for a limited period to various users. Its structure is shown below:
When you are working with digital content, the most important directory is the one ending in _media. The directory beginning with master_ is normally used to store all master images in their unmodified state. The other directories are intended to be freely accessible within the workflow and can be added to whenever necessary. Both the directory ending in source and the directory ending in _media are copied when exporting to the presentation system (e.g. intranda viewer).
The images directory may be accompanied by an ocr directory. This contains all the OCR results that are generated within the workflow and added to the process. There is a separate directory for each format of the OCR results.
Smaller versions of the images in images can be saved in the thumbs folder, which Goobi uses to display the images in low resolution. This considerably increases the speed of image display for larger images. For each subfolder of images, one or more subfolders can be created in thumbs with the same name as the images subfolder, extended by an additional underscore _ and a size specification in pixels. This size specification must correspond to the maximum height and width of the images in the respective subfolder. The file names of the images in the thumbs subfolder must correspond to those of the images in the corresponding images subfolder, but with the file extension .jpg.
If there are matching images in thumbs for an image file in images, these are automatically used in Goobi to display thumbnails and zoomable images when zoomed out.
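The naming convention for thumbnails can be summarised as a path mapping. The following Python sketch is an illustration only: the function name `thumbnail_path` is hypothetical, the folder and file names in the example are placeholders, and it assumes that the thumbs folder sits next to the images folder inside the process directory.

```python
from pathlib import PurePosixPath


def thumbnail_path(image_path, size):
    """Map .../images/<subfolder>/<file> to .../thumbs/<subfolder>_<size>/<stem>.jpg."""
    if size <= 0:
        raise ValueError("size must be a positive number of pixels")
    path = PurePosixPath(image_path)
    parts = path.parts
    if len(parts) < 3 or parts[-3] != "images":
        raise ValueError("expected a path of the form .../images/<subfolder>/<file>")
    # Everything before "images" is treated as the process directory.
    process_dir = PurePosixPath(*parts[:-3])
    # Same subfolder name plus "_<size>", same file name with a .jpg extension.
    return process_dir / "thumbs" / f"{parts[-2]}_{size}" / f"{path.stem}.jpg"


if __name__ == "__main__":
    print(thumbnail_path("images/example_media/0001.tif", 800))
```

Calling the function once for each size yields one thumbs subfolder per size for the same images subfolder.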
The validation directory is used in cases where automatic validation (e.g. of the images) is performed on the Goobi server and in the workflows.
A sub-folder is created within this directory each time validation is performed. This makes it possible to retain older validation results. As illustrated below, the name generated for each sub-folder contains the date, time and type of validation.
If Goobi is being used with the intranda TaskManager, a taskmanager directory will also be found within the folder. This is where TaskManager stores temporary data to perform tasks that require lengthy processing. Depending on the configuration, it is also used to permanently store and maintain all the ticket and template files created each time the TaskManager is called. The directory is made up as follows:
Again depending on the installation, for each Goobi process there is an import directory, which is used by import plug-ins to store original source files for the process in question. Catalogue dataset files and other source files that have been manually read in and imported can be stored here and used in scripts as part of workflow processing. The folder structure could be as follows: