2.2 Directories
The Goobi viewer Indexer requires a number of configured folders from which it reads files or in which it stores them. Folders that do not exist are created automatically, but the path configuration entries themselves must not be missing.
The parameters are explained in detail in the following table:
Setting
Description
hotFolder
Content to be indexed is placed in this folder. The Goobi viewer Indexer checks the folder for new XML files at short intervals. New files are indexed one after the other (provided they are in a supported data format) and removed from the hotfolder. The element can be specified twice, and the order in the configuration file is significant: the first entry is the default hotfolder, the second can be used for reindexing and other lower-priority work.
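The priority between the two hotfolders can be illustrated with a minimal Python sketch. It is not the Indexer's implementation: the function names, the `index_record` callback and the alphabetical ordering within a folder are assumptions made for illustration only.

```python
from pathlib import Path


def next_file_to_index(hotfolders):
    """Return the next XML file to process, or None if all hotfolders are empty.

    `hotfolders` is ordered as in the configuration file: the default
    hotfolder first, the lower-priority one second.
    """
    for folder in hotfolders:
        # Assumption: files within one folder are taken in alphabetical order.
        candidates = sorted(Path(folder).glob("*.xml"))
        if candidates:
            return candidates[0]
    return None


def drain(hotfolders, index_record):
    """Index files one after the other and remove each one afterwards."""
    processed = []
    while (path := next_file_to_index(hotfolders)) is not None:
        index_record(path)  # hypothetical callback standing in for real indexing
        path.unlink()       # the file is removed from the hotfolder
        processed.append(path.name)
    return processed
```

In this sketch, a file in the second hotfolder is only picked up once the first hotfolder contains no more XML files, which mirrors the "lower priority" role described above.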
tempFolder
Folder for temporary files.
viewerHome
Base path of the Goobi viewer Core directory structure in the file system.
dataRepositories/strategy
There are three possible strategies:
SingleRepositoryStrategy (default)
MaxRecordNumberStrategy (maximum number of records per data repository)
RemainingSpaceStrategy (smallest sufficient disk space)
The default configuration uses the SingleRepositoryStrategy, which writes everything to a single folder.
dataRepositories/maxRecords
The maximum number of records that a data repository can contain. The default value is 10000. This value is only evaluated for the MaxRecordNumberStrategy.
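As a rough illustration of how a record limit per repository could drive the choice of target repository, consider the following Python sketch. The function name, the input structure and the "first repository with room" rule are assumptions; the Indexer's actual selection logic may differ.

```python
def select_repository_by_record_count(record_counts, max_records=10000):
    """Return the first repository that still has room, or None if all are full.

    `record_counts` maps a repository path to the number of records it
    already holds, in configuration order (dicts preserve insertion order).
    """
    for path, count in record_counts.items():
        if count < max_records:
            return path
    return None
```

With hypothetical repositories `/data/repo1` (already holding 10000 records) and `/data/repo2` (holding 42), the sketch selects `/data/repo2`.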
dataRepositories/dataRepository
This element may exist any number of times and defines the individual data repositories. The full path to the data repository must be entered. Each data repository contains a complete folder structure for media files, XML, full texts, and so on. These are created automatically.
With the optional attribute buffer, an amount of storage space that should remain unused can be defined, which is intended especially for the RemainingSpaceStrategy. Sizes can be specified in bytes, megabytes (capital "M" after the number) or gigabytes (capital "G" after the number). The default value is 0 bytes.
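The buffer notation and the idea of "smallest sufficient disk space" can be sketched in Python as follows. This is an illustration under stated assumptions, not the Indexer's code: the function names are hypothetical, "M" and "G" are assumed to be binary multiples (1024² and 1024³ bytes), and the free space per repository is passed in rather than measured.

```python
import re

# Assumption: "M" and "G" denote binary multiples of a byte.
UNITS = {"": 1, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_buffer(value):
    """Convert a buffer value such as "0", "500M" or "2G" to bytes."""
    match = re.fullmatch(r"(\d+)([MG]?)", value.strip())
    if match is None:
        raise ValueError(f"unsupported buffer value: {value!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit]


def select_repository_by_space(repositories, required_bytes):
    """Pick the repository with the smallest usable space that still suffices.

    `repositories` is a list of (path, free_bytes, buffer_value) tuples.
    The buffer is subtracted from the free space before comparing.
    """
    usable = []
    for path, free_bytes, buffer_value in repositories:
        available = free_bytes - parse_buffer(buffer_value)
        if available >= required_bytes:
            usable.append((available, path))
    return min(usable)[1] if usable else None
```

In this reading, a repository with 3 GB free and a 1 GB buffer offers 2 GB of usable space, so it would be preferred over a repository with 10 GB free for a record that needs 1.5 GB.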
mediaFolder
This folder stores the media files (images, video and audio) of an indexed object. They are stored in a subfolder named after the identifier of the respective object. The media files must always be available, as the Goobi viewer loads them from this folder. This folder is searched for or created relative to dataRepositoriesHome (for dataRepositories/enabled = true) or viewerHome (for dataRepositories/enabled = false). For this reason, the value may only contain the folder name and no absolute path.
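The rule for relative folders, which recurs for several settings in this table, can be expressed as a small Python sketch. The function names and all paths are hypothetical, and the code only illustrates the rule as described here rather than the Indexer's actual behaviour.

```python
from pathlib import PurePosixPath


def resolve_folder(name, repositories_enabled, data_repositories_home, viewer_home):
    """Resolve a folder name such as "media" against the applicable base path."""
    if PurePosixPath(name).is_absolute():
        raise ValueError(f"{name!r} must be a folder name, not an absolute path")
    base = data_repositories_home if repositories_enabled else viewer_home
    return PurePosixPath(base) / name


def object_folder(resolved_folder, identifier):
    """Files of one object live in a subfolder named after its identifier."""
    return resolved_folder / identifier
```

For example, with dataRepositories/enabled = false and a hypothetical viewerHome of `/opt/viewer`, the name `media` resolves to `/opt/viewer/media`, and the media files of an object with the identifier `PPN123` would be expected in `/opt/viewer/media/PPN123`.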
altoFolder
ALTO XML files are stored in this folder. These files contain detailed OCR results and can be used to extract both full texts and word coordinates. This folder is searched for or created relative to dataRepositoriesHome (for dataRepositories/enabled = true) or viewerHome (for dataRepositories/enabled = false). For this reason, the value may only contain the name and no absolute path.
altoCrowdsourcingFolder
This folder also contains ALTO XML files. However, these come from the crowdsourcing functions of the Goobi viewer. This means that if an ALTO document from crowdsourcing exists for a page, it is indexed, and not the document from OCR.
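The precedence of crowdsourced ALTO over OCR ALTO can be sketched as follows. The folder names `alto` and `alto_crowd`, the per-object subfolder inside the crowdsourcing folder and the function name are assumptions for illustration.

```python
from pathlib import Path


def alto_file_for_page(root, identifier, page_file,
                       alto_folder="alto", crowd_folder="alto_crowd"):
    """Return the ALTO file to index for one page, or None if there is none.

    A crowdsourced document takes precedence over the OCR document.
    """
    crowd = Path(root) / crowd_folder / identifier / page_file
    ocr = Path(root) / alto_folder / identifier / page_file
    if crowd.is_file():
        return crowd
    if ocr.is_file():
        return ocr
    return None
```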
fulltextFolder
The plain-text full-text files are stored here after indexing. The files are stored in a subfolder named after the identifier of the object. Although they are not required for the operation of the Goobi viewer (the full texts are indexed completely), they can be reused for any re-indexing of an object: if no full-text folder is found in the hotfolder, the Goobi viewer Indexer searches for an existing full-text folder from a previous indexing. Please note: if an ALTO document is also available for a page, it is preferred for indexing full texts. This folder is searched for or created relative to dataRepositoriesHome (for dataRepositories/enabled = true) or viewerHome (for dataRepositories/enabled = false). For this reason, the value may only contain the name and no absolute path.
fulltextCrowsourcingFolder
This folder also contains simple full text files. However, these come from the crowdsourcing functions of the Goobi viewer. This means that if a full-text document from crowdsourcing is available for a page, it is indexed, and not the document from OCR.
wcFolder
The TEI word coordinate files are stored here after indexing. They are stored in a subfolder named after the identifier of the respective object. Although they are not required for the operation of the Goobi viewer (the word coordinates are indexed completely), they can be reused for any re-indexing of an object: if no word coordinates folder is found in the hotfolder, the Goobi viewer Indexer searches for an existing word coordinates folder from a previous indexing. Please note: if an ALTO document is also available for a page, it is preferred for generating word coordinates. This folder is searched for or created relative to dataRepositoriesHome (for dataRepositories/enabled = true) or viewerHome (for dataRepositories/enabled = false). For this reason, the value may only contain the name and no absolute path.
pagePdfFolder
Pre-rendered PDF files for the individual pages of the object are stored here after indexing. They are stored in a subfolder named after the identifier of the respective object. If these files are present, the generation of PDF documents for the object in question can be considerably accelerated. This folder is searched for or created relative to dataRepositoriesHome (for dataRepositories/enabled = true) or viewerHome (for dataRepositories/enabled = false). For this reason, the value may only contain the name and no absolute path.
sourceContentFolder
Files that are to be offered for direct download for the object (e.g. born-digital materials) are stored here. They are stored in a subfolder named after the identifier of the respective object. For each file located here, a download link is displayed for the respective object. This folder is searched for or created relative to dataRepositoriesHome (for dataRepositories/enabled = true) or viewerHome (for dataRepositories/enabled = false). For this reason, the value may only contain the name and no absolute path.
userGeneratedContentFolder
XML documents containing user-generated content from the crowdsourcing functions of the Goobi viewer are stored here. They are used to display and search this content in normal operation. This folder is searched for or created relative to dataRepositoriesHome (for dataRepositories/enabled = true) or viewerHome (for dataRepositories/enabled = false). For this reason, the value may only contain the name and no absolute path.
annotationFolder
Contains JSON WebAnnotations that were created using a crowdsourcing campaign, for example.
indexedMets
The METS files are stored here after indexing. They are not required for general operation of the Goobi viewer, but must be available if a document is requested via the METS resolver. This folder is searched for or created relative to dataRepositoriesHome (for dataRepositories/enabled = true) or viewerHome (for dataRepositories/enabled = false). For this reason, the value may only contain the name and no absolute path.
indexedLido
The LIDO files of individual objects are stored here after indexing. They are not required for general operation of the Goobi viewer, but must be available if a document is requested via the LIDO resolver.
indexedDenkXweb
The DenkXweb files of individual monuments are stored here after indexing.
indexedDublinCore
The Dublin Core files of records created in the admin backend are stored here after indexing.
updatedMets
If a METS or LIDO file is reindexed, the previous version of this file is archived here. The time stamp of the respective reindexing is appended to the file name.
For the Goobi viewer, this folder is not relevant, but must still exist.
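How a timestamp might be appended to an archived file name can be sketched in Python. The exact naming scheme is not specified here, so the separator, the timestamp format and the position before the file extension are all assumptions, as is the function name.

```python
from datetime import datetime
from pathlib import PurePosixPath


def archived_name(file_name, reindexed_at):
    """Build an archive name carrying the time of the re-indexing.

    Assumption: the timestamp is placed between the base name and the
    extension, formatted as year, month, day, hour, minute, second.
    """
    path = PurePosixPath(file_name)
    stamp = reindexed_at.strftime("%Y%m%d%H%M%S")
    return f"{path.stem}_{stamp}{path.suffix}"
```

Under these assumptions, a hypothetical `PPN123.xml` re-indexed on 2 January 2024 at 03:04:05 would be archived as `PPN123_20240102030405.xml`.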
deletedMets
If an object is deleted from the index, the relevant METS or LIDO file is stored here.
For the Goobi viewer, this folder is not relevant, but must still exist.
successFolder
Files that signal to Goobi that indexing was successful are stored here. On the basis of these files, Goobi learns the outcome of the indexing of a process and reports it to the user.
For the Goobi viewer, this folder is not relevant, but must still exist.
errorMets
If the indexing of an object fails, the relevant METS or LIDO file is stored here. In addition, the error message generated by the Goobi viewer Indexer is written to a log file and also stored there. On the basis of these files, Goobi learns the outcome of the indexing of a process and reports this to the user.
For the Goobi viewer, this folder is not relevant, but must still exist.
origLido
This is where the original LIDO files, as found in the hotfolder, are stored. These may contain thousands of objects, which are initially split into individual LIDO data records. The original files are not necessary for operation and are only used for archiving.
For the Goobi viewer, this folder is not relevant, but must still exist.
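Splitting a multi-record LIDO file into individual records can be illustrated with Python's standard XML library. The sketch assumes records are `lido` elements in the `http://www.lido-schema.org` namespace and are identified by their `lidoRecID` child; the function name and the choice of identifier are assumptions and do not describe how the Indexer names the resulting files.

```python
import xml.etree.ElementTree as ET

LIDO_NS = "http://www.lido-schema.org"


def split_lido(xml_text):
    """Return a dict mapping each record ID to the serialized record."""
    root = ET.fromstring(xml_text)
    records = {}
    # Walk every lido element in the file, e.g. inside a lidoWrap container.
    for record in root.iter(f"{{{LIDO_NS}}}lido"):
        # Assumption: the first lidoRecID identifies the record.
        record_id = record.findtext(f"{{{LIDO_NS}}}lidoRecID")
        records[record_id] = ET.tostring(record, encoding="unicode")
    return records
```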
origDenkXweb
The original DenkXweb files, as found in the hotfolder, are stored here. These files may contain thousands of monuments, which are split into individual DenkXweb data records. The original files are not necessary for operation and are only used for archiving.
For the Goobi viewer, this folder is not relevant, but must still exist.