2.2 Directories

For operation, the Goobi viewer Indexer requires a number of configured folders from which it reads files or in which it stores files. Folders that do not exist are created automatically, but the corresponding path settings must always be present in the configuration.

<init>
     <hotFolder>/opt/digiverso/viewer/hotfolder/</hotFolder>
     <!--hotFolder>/opt/digiverso/viewer/reindex_hotfolder/</hotFolder-->
     <tempFolder>/opt/digiverso/indexer/temp/</tempFolder>
     <viewerHome>/opt/digiverso/viewer/</viewerHome>
     <dataRepositories>
          <strategy>SingleRepositoryStrategy</strategy>
          <maxRecords>10000</maxRecords>
          <dataRepository buffer="10G">/opt/digiverso/viewer/data/1</dataRepository>
          [...]
     </dataRepositories>
     <mediaFolder>media</mediaFolder>
     <pyramidTiffFolder>ptif</pyramidTiffFolder>
     <altoFolder>alto</altoFolder>
     <fulltextFolder>fulltext</fulltextFolder>
     <fulltextCrowdsourcingFolder>fulltext_crowd</fulltextCrowdsourcingFolder>
     <wcFolder>wc</wcFolder>
     <pagePdfFolder>pdf</pagePdfFolder>
     <sourceContentFolder>source</sourceContentFolder>
     <userGeneratedContentFolder>ugc</userGeneratedContentFolder>
     <annotationFolder>annotations</annotationFolder>
     <indexedMets>indexed_mets</indexedMets>
     <indexedLido>indexed_lido</indexedLido>
     <indexedDenkXweb>indexed_denkxweb</indexedDenkXweb>
     <indexedDublinCore>indexed_dublincore</indexedDublinCore>
     <successFolder>/opt/digiverso/viewer/success/</successFolder>
     <updatedMets>/opt/digiverso/viewer/updated_mets/</updatedMets>
     <deletedMets>/opt/digiverso/viewer/deleted_mets/</deletedMets>
     <errorMets>/opt/digiverso/viewer/error_mets/</errorMets>
     <origLido>/opt/digiverso/viewer/orig_lido/</origLido>
     <origDenkXweb>/opt/digiverso/viewer/orig_denkxweb/</origDenkXweb>
</init>

The parameters are explained in detail in the following table:

Setting

Description

hotFolder

Content to be indexed is placed in this folder. The Goobi viewer Indexer checks the folder for new XML files at short intervals. New files are indexed one after another (provided they are in a supported data format) and then removed from the hotfolder. The element can be specified twice, and the order in the configuration file matters: the first entry is the default hotfolder, while the second can be used for reindexing and other lower-priority work.
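As a sketch, assuming the paths from the example configuration above, a setup with both hotfolders active could look like this. The order of the two elements determines which one is the default; other settings are omitted.

```xml
<init>
    <!-- First entry: default hotfolder for regular indexing -->
    <hotFolder>/opt/digiverso/viewer/hotfolder/</hotFolder>
    <!-- Second entry: hotfolder for reindexing and lower-priority work -->
    <hotFolder>/opt/digiverso/viewer/reindex_hotfolder/</hotFolder>
    <!-- other settings omitted -->
</init>
```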

tempFolder

Folder for temporary files.

viewerHome

Base path of the Goobi viewer Core directory structure in the file system.

dataRepositories/strategy

There are three possible strategies:

  1. SingleRepositoryStrategy (default)

  2. MaxRecordNumberStrategy (maximum number of records per data repository)

  3. RemainingSpaceStrategy (smallest sufficient disk space)

The default configuration with the SingleRepositoryStrategy writes all data to a single folder.

dataRepositories/maxRecords

The maximum number of records that a data repository can contain. The default value is 10000. This value is only evaluated for the MaxRecordNumberStrategy.
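A sketch of a MaxRecordNumberStrategy setup, assuming two data repositories; the second path (/opt/digiverso/viewer/data/2) is illustrative only. Each repository then holds at most the number of records set in maxRecords.

```xml
<dataRepositories>
    <strategy>MaxRecordNumberStrategy</strategy>
    <!-- Each data repository may contain at most this many records -->
    <maxRecords>10000</maxRecords>
    <!-- Illustrative absolute paths; adjust to your system -->
    <dataRepository>/opt/digiverso/viewer/data/1</dataRepository>
    <dataRepository>/opt/digiverso/viewer/data/2</dataRepository>
</dataRepositories>
```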

dataRepositories/dataRepository

This element can be repeated any number of times; each occurrence defines one data repository. The full path to the data repository must be specified. Each data repository contains a complete folder structure for media files, XML, full texts, etc., which is created automatically.

The optional attribute buffer defines an amount of disk space that should remain unused; it is intended primarily for the RemainingSpaceStrategy. Sizes can be specified in bytes, megabytes (uppercase "M" after the number) or gigabytes (uppercase "G" after the number). The default value is 0 bytes.
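A sketch of a RemainingSpaceStrategy setup with a different buffer per repository. The second path and both buffer sizes are illustrative assumptions; maxRecords is left out here on the assumption that it can be omitted because it is only evaluated for the MaxRecordNumberStrategy.

```xml
<dataRepositories>
    <strategy>RemainingSpaceStrategy</strategy>
    <!-- Keep 10 gigabytes unused on this repository -->
    <dataRepository buffer="10G">/opt/digiverso/viewer/data/1</dataRepository>
    <!-- Keep 500 megabytes unused on this repository -->
    <dataRepository buffer="500M">/opt/digiverso/viewer/data/2</dataRepository>
</dataRepositories>
```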

mediaFolder

This folder stores the media files (images, video and audio) of an indexed object, each in a subfolder named after the identifier of the respective object. The media files must always be available, since the Goobi viewer loads them from this folder. This folder is searched for or created relative to dataRepositoriesHome (for dataRepositories/enabled = true) or viewerHome (for dataRepositories/enabled = false). For this reason, the value may only contain the folder name, not an absolute path.
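The difference between a folder name and an absolute path can be illustrated with mediaFolder; the same rule applies to the other settings whose descriptions state it. The absolute path shown is only an example of what not to do.

```xml
<!-- Correct: only the folder name, resolved relative to the base directory -->
<mediaFolder>media</mediaFolder>

<!-- Not allowed: an absolute path -->
<!-- <mediaFolder>/opt/digiverso/viewer/media/</mediaFolder> -->
```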

altoFolder

ALTO XML files are stored in this folder. These files contain detailed OCR results and can be used to extract both full texts and word coordinates. This folder is searched for or created relative to dataRepositoriesHome (for dataRepositories/enabled = true) or viewerHome (for dataRepositories/enabled = false). For this reason, the value may only contain the folder name, not an absolute path.

altoCrowdsourcingFolder

This folder also contains ALTO XML files, but these originate from the crowdsourcing functions of the Goobi viewer. If an ALTO document from crowdsourcing exists for a page, it is indexed instead of the OCR document.
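The example configuration at the top of this section does not list altoCrowdsourcingFolder. A sketch of how it could be added next to the OCR ALTO folder, assuming it takes a folder name like the other relative folder settings; the value alto_crowd is a hypothetical name chosen to mirror fulltext_crowd.

```xml
<altoFolder>alto</altoFolder>
<!-- Hypothetical folder name; crowdsourced ALTO is indexed in preference to OCR ALTO -->
<altoCrowdsourcingFolder>alto_crowd</altoCrowdsourcingFolder>
```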

fulltextFolder

Plain-text full-text files are stored here after indexing, each in a subfolder named after the identifier of the object. Although they are not required for the operation of the Goobi viewer (the full texts are fully indexed), they can be reused when an object is re-indexed: if no full-text folder is found in the hotfolder, the Goobi viewer Indexer looks for an existing full-text folder from a previous indexing run. Please note: if an ALTO document is also available for a page, it takes precedence when indexing full texts. This folder is searched for or created relative to dataRepositoriesHome (for dataRepositories/enabled = true) or viewerHome (for dataRepositories/enabled = false). For this reason, the value may only contain the folder name, not an absolute path.

fulltextCrowdsourcingFolder

This folder also contains plain-text full-text files, but these originate from the crowdsourcing functions of the Goobi viewer. If a full-text document from crowdsourcing is available for a page, it is indexed instead of the OCR document.

wcFolder

The TEI word coordinate files are stored here after indexing, each in a subfolder named after the identifier of the respective object. Although they are not required for the operation of the Goobi viewer (the word coordinates are fully indexed), they can be reused when an object is re-indexed: if no word coordinates folder is found in the hotfolder, the Goobi viewer Indexer looks for an existing word coordinates folder from a previous indexing run. Please note: if an ALTO document is also available for a page, it takes precedence for generating word coordinates. This folder is searched for or created relative to dataRepositoriesHome (for dataRepositories/enabled = true) or viewerHome (for dataRepositories/enabled = false). For this reason, the value may only contain the folder name, not an absolute path.

pagePdfFolder

Pre-rendered PDF files for the individual pages of the object are stored here after indexing, each in a subfolder named after the identifier of the respective object. If these files are present, PDF documents for the object can be generated considerably faster. This folder is searched for or created relative to dataRepositoriesHome (for dataRepositories/enabled = true) or viewerHome (for dataRepositories/enabled = false). For this reason, the value may only contain the folder name, not an absolute path.

sourceContentFolder

Files to be offered for direct download with the object (e.g. born-digital materials) are stored here, each in a subfolder named after the identifier of the respective object. For each file in this folder, a download link is displayed for the respective object. This folder is searched for or created relative to dataRepositoriesHome (for dataRepositories/enabled = true) or viewerHome (for dataRepositories/enabled = false). For this reason, the value may only contain the folder name, not an absolute path.

userGeneratedContentFolder

This folder stores XML documents containing user-generated content from the crowdsourcing functions of the Goobi viewer. They are used to display and search this content during normal operation. This folder is searched for or created relative to dataRepositoriesHome (for dataRepositories/enabled = true) or viewerHome (for dataRepositories/enabled = false). For this reason, the value may only contain the folder name, not an absolute path.

annotationFolder

Contains JSON Web Annotations created, for example, in a crowdsourcing campaign.

indexedMets

The METS files are stored here after indexing. They are not required for general operation of the Goobi viewer, but must be available if a document is requested via the METS resolver. This folder is searched for or created relative to dataRepositoriesHome (for dataRepositories/enabled = true) or viewerHome (for dataRepositories/enabled = false). For this reason, the value may only contain the folder name, not an absolute path.

indexedLido

The LIDO files of individual objects are stored here after indexing. They are not required for general operation of the Goobi viewer, but must be available if a document is requested via the LIDO resolver.

indexedDenkXweb

The DenkXweb files of individual monuments are stored here after indexing.

indexedDublinCore

The Dublin Core files of records created in the Admin Backend are stored here after indexing.

updatedMets

When a METS or LIDO file is reindexed, the previous version of the file is archived here, with the timestamp of the reindexing appended to the file name.

For the Goobi viewer, this folder is not relevant, but must still exist.

deletedMets

If an object is deleted from the index, the relevant METS or LIDO file is stored here.

For the Goobi viewer, this folder is not relevant, but must still exist.

successFolder

This folder stores files that signal to Goobi that indexing was successful. Based on these files, Goobi determines the outcome of indexing a process and reports it to the user.

For the Goobi viewer, this folder is not relevant, but must still exist.

errorMets

If indexing an object fails, the relevant METS or LIDO file is stored here. In addition, the error message generated by the Goobi viewer Indexer is written to a log file, which is stored in the same folder. Based on these files, Goobi determines the outcome of indexing a process and reports it to the user.

For the Goobi viewer, this folder is not relevant, but must still exist.

origLido

The original LIDO files, as found in the hotfolder, are stored here. A single file may contain thousands of objects, which are first split into individual LIDO records. The original files are not required for operation and are kept only for archiving.

For the Goobi viewer, this folder is not relevant, but must still exist.

origDenkXweb

The original DenkXweb files, as found in the hotfolder, are stored here. A single file may contain thousands of monuments, which are split into individual DenkXweb records. The original files are not required for operation and are kept only for archiving.

For the Goobi viewer, this folder is not relevant, but must still exist.
