1 von 8

Global directory structure

As a web-based application, Goobi has its own structure and is located on a defined path in the file system independently of the servlet container being used. This section explains how to organise the directory structures within which Goobi saves its data and the different configuration files.

The base path for all digitisation software in the Goobi environment is:

The following directories are usually located on this base path:

The logs directory is the main directory for log files. Goobi log files are also stored here (assuming the system is properly configured). The other directories listed above relate to frequently used applications (e.g. viewer for the Goobi viewer, itm for the intranda Task Manager and goobi for Goobi.

The base path for Goobi is:

In most cases, this base path will accommodate the following folder structure (see below for details of each sub-directory):

‘config’ sub-directory

The config directory contains all the Goobi configuration files that do not have to be located within the application itself. These are listed below:

Depending on the specific installation, the config directory may also contain other configuration files in addition to those related to the application’s core components. Accordingly, we recommend that you also use this central configuration directory to store configurations for individual plug-ins that provide additional functionality.

For subsequent ease of maintenance, the paths and file names relating to the configuration of any new Goobi plug-ins that may be developed should also adhere to this convention.

‘import’ sub-directory

Depending on the way Goobi has been installed, the import directory will contain a range of data, mostly on a temporary basis. By way of example, import plug-ins use this directory to enter metadata and associated digital content in order to create processes. The respective import plug-ins are also responsible for deleting files that are no longer needed.

‘metadata’ sub-directory

The metadata sub-directory is the central directory for storing metadata and digital content generated by Goobi. For each Goobi process, it contains a directory with the name of the process ID. Directories for individual Goobi processes are structured as follows:

1234/
1234/meta.xml
1234/images/
1234/ocr/
1234/import/
1234/validation/
1234/taskmanager/
1234/thumbs/

Depending on the configuration, the central metadata file meta.xml may be accompanied by other back-up files, e.g. meta.xml.1, meta.xml.2, meta.xml.3

Images

Within the workflow, the images directory is accessible for a limited period to various users. Its structure is shown below:

1234/images/abc_media/
1234/images/abc_jpg/
1234/images/abc_source/
1234/images/master_abc_media/

When you are working with digital content, the most important directory is the one ending in _media. The directory beginning with master_ is normally used to store all master images in unmanipulated status. The other directories are intended to be freely accessible within the workflow and can be added to whenever necessary. Both the directory ending in source and the directory ending in _media are copied when exporting to the presentation system (e.g. intranda viewer).

OCR

The images directory may be accompanied by an ocr directory. This contains all the OCR results that are generated within the workflow and added to the process. There is a separate directory for each format of the OCR results.

1234/ocr/abc_alto/
1234/ocr/abc_doc/
1234/ocr/abc_pdf/
1234/ocr/abc_wc/
1234/ocr/abc_xml/

Thumbnails

Smaller versions of the images in images can be saved in the thumbs folder, which Goobi uses to display the images in low resolution. This considerably increases the speed of image display for larger images. For each subfolder of images, one or more subfolders can be created in thumbs with the same name as the images subfolder, extended by an additional underscore _ and a size specification in pixels. This size specification must correspond to the maximum height and width of the images in the respective subfolder. The file names of the images in the thumbs subfolder must correspond to those of the images in the corresponding images subfolder, but with the file extension .jpg.

1234/thumbs/abc_media_800/
1234/thumbs/abc_media_3000/
1234/thumbs/master_abc_media_800/
1234/thumbs/master_abc_media_3000/

If there are matching images in thumbs for an image file in images, these are automatically used in Goobi to display thumbnails and zoomable images when zoomed out.

Validation

The validation directory is used in cases where automatic validation (e.g. of the images) is performed on the Goobi server and in the workflows.

A sub-folder is created within this directory each time validation is performed. This makes it possible to retain older validation results. As illustrated below, the name generated for each sub-folder contains the date, time and type of validation.

1234/validation/2012-11-20_11-20-01_jpylyzer/
1234/validation/2012-11-20_12-02-13_jhove/
1234/validation/2012-11-23_08-12-56_jpylyzer/

If Goobi is being used with the intranda TaskManager, a taskmanager directory will also be found within the folder. This is where TaskManager stores temporary data to perform tasks that require lengthy processing. Depending on the configuration, it is also used to permanently store and maintain all the ticket and template files created each time the TaskManager is called. The directory is made up as follows:

1234/taskmanager/2012-11-23_08-12-56_jp2validate/
1234/taskmanager/2012-11-25_14-38-15_iii-create_jpeg/

Import

Again depending on the installation, for each Goobi process there is an import directory, which is used by import plug-ins to store original source files for the process in question. Catalogue dataset files and other source files that have been manually read in and imported can be stored here and used in scripts as part of workflow processing. The folder structure could be as follows:\

1234/import/abc.mrc
1234/import/abc.original.pdf
1234/import/eod/

‘plugins’ sub-directory

Depending on the way Goobi has been installed, the plugins directory may contain a number of plug-ins that perform imports or call Web API commands. Depending on the task, the compiled plug-ins are located in either of the directories shown below:

‘rulesets’ sub-directory

Within Goobi, the UGH class library is used to process metadata, map PICA imports and generate METS files. In order to manage the huge variety of configuration options, UGH uses a mechanism known as rulesets. The rulesets directory is the central storage location for these rulesets. It allows you to make individual configurations available for different projects and types of publication.

‘scripts’ sub-directory

A range of scripts can be made available centrally in the scripts directory. These scripts can be used within the workflow to automate certain tasks.

‘xslt’ sub-directory

Goobi uses a mechanism called XSLT transformation to generate dockets as PDF files. This involves generating PDF documents from existing xml files. This is done on the basis of xslt files located centrally in the xslt directory.