2.2 Directories
The Goobi viewer Indexer requires a number of configured folders from which it reads files or in which it stores them. Folders that do not exist are created automatically, but the path configurations themselves must not be missing.
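The rule above (missing folders are created, missing path settings are an error) can be illustrated with a minimal Python sketch. This is not the Indexer's actual implementation: the function name `prepare_folders`, the plain dictionary of settings, and the choice of which settings are checked are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical subset of path settings treated as mandatory in this sketch;
# the names are taken from the settings table, the selection is an assumption.
REQUIRED_SETTINGS = ("hotFolder", "tempFolder", "viewerHome")


def prepare_folders(settings: dict[str, str]) -> list[Path]:
    """Reject missing path settings, create missing folders, return those created."""
    missing = [name for name in REQUIRED_SETTINGS if not settings.get(name)]
    if missing:
        # A missing path configuration is an error and cannot be repaired.
        raise ValueError(f"missing path configuration: {', '.join(missing)}")

    created = []
    for name in REQUIRED_SETTINGS:
        folder = Path(settings[name])
        if not folder.is_dir():
            # A configured folder that does not exist yet is simply created.
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created
```

The check runs before any folder is created, so an incomplete configuration leaves the file system untouched.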
The parameters are explained in detail in the following table:
Setting | Description |
hotFolder | Content to be indexed is placed in this folder. The Goobi viewer Indexer checks the folder for new XML files at short intervals. New files are indexed one after the other (provided they are in a supported data format) and then removed from the hotfolder. The element can be specified twice, and the order in the configuration file matters: the first entry is the default hotfolder, the second can be used for reindexing and lower-priority work. |
tempFolder | Folder for temporary files. |
viewerHome | Basic path of the Goobi viewer Core directory structure in the file system. |
dataRepositories/strategy | There are three possible strategies:
The default configuration with the |
dataRepositories/maxRecords | The maximum number of records that a data repository can contain. The default value is |
dataRepositories/dataRepository | This element may exist any number of times and defines the individual data repositories. The full path to the data repository must be entered. Each data repository contains a complete folder structure for media files, XML, full texts, and so on. These are created automatically. With the optional attribute |
mediaFolder | This folder stores the media files (images, video and audio) of an indexed object. They are stored in a subfolder named after the identifier of the respective object. The media files must always be available, as the Goobi viewer loads them from this folder. This folder is searched for or created relative to |
altoFolder | ALTO XML files are stored in this folder. These files contain detailed OCR results and can be used to extract both full texts and word coordinates. This folder is searched for or created relative to |
altoCrowdsourcingFolder | This folder also contains ALTO XML files. However, these come from the crowdsourcing functions of the Goobi viewer. This means that if an ALTO document from crowdsourcing exists for a page, it is indexed, and not the document from OCR. |
fulltextFolder | The plain-text full-text files are stored here after indexing. They are stored in a subfolder named after the identifier of the object. Although they are not required for the operation of the Goobi viewer (the full texts are indexed completely), they can be reused when an object is re-indexed: if no full-text folder is found in the hotfolder, the Goobi viewer Indexer searches for an existing full-text folder from a previous indexing. Please note: if an ALTO document is also available for a page, it is preferred for indexing full texts. This folder is searched for or created relative to |
fulltextCrowdsourcingFolder | This folder also contains plain full-text files. However, these come from the crowdsourcing functions of the Goobi viewer. This means that if a full-text document from crowdsourcing is available for a page, it is indexed instead of the document from OCR. |
wcFolder | The TEI word coordinates files are stored here after indexing. They are stored in a subfolder named after the identifier of the respective object. Although they are not required for the operation of the Goobi viewer (the word coordinates are indexed completely), they can be reused when an object is re-indexed: if no word coordinates folder is found in the hotfolder, the Goobi viewer Indexer searches for an existing word coordinates folder from a previous indexing. Please note: if an ALTO document is also available for a page, it is preferred for generating word coordinates. This folder is searched for or created relative to |
pagePdfFolder | Pre-rendered PDF files for the individual pages of the object are stored here after indexing. They are stored in a subfolder named after the identifier of the respective object. If these files are present, the generation of PDF documents for the object in question can be considerably accelerated. This folder is searched for or created relative to |
sourceContentFolder | Files that are to be offered for direct download for the object (e.g. born-digital materials) are stored here. They are stored in a subfolder named after the identifier of the respective object. For each file located here, a download link is displayed for the respective object. This folder is searched for or created relative to |
userGeneratedContentFolder | XML documents containing user-generated content from the crowdsourcing functions of the Goobi viewer are stored here. They are used to display and search this content in normal operation. This folder is searched for or created relative to |
annotationFolder | Contains JSON WebAnnotations that were created using a crowdsourcing campaign, for example. |
indexedMets | The METS files are stored here after indexing. They are not required for general operation of the Goobi viewer, but must be available if a document is requested via the METS resolver. This folder is searched for or created relative to |
indexedLido | The LIDO files of individual objects are stored here after indexing. They are not required for general operation of the Goobi viewer, but must be available if a document is requested via the LIDO resolver. |
indexedDenkXweb | The DenkXweb files of individual monuments are stored here after indexing. |
indexedDublinCore | The Dublin Core files of records created in the Admin Backend are stored here after indexing. |
updatedMets | If a METS or LIDO file is reindexed, the previous version of this file is archived here. The time stamp of the respective reindexing is appended to the file name. For the Goobi viewer, this folder is not relevant, but must still exist. |
deletedMets | If an object is deleted from the index, the relevant METS or LIDO file is stored here. For the Goobi viewer, this folder is not relevant, but must still exist. |
successFolder | Files that signal to Goobi that indexing was successful are stored here. On the basis of these files, Goobi learns the outcome of the indexing of a process and reports it to the user. For the Goobi viewer, this folder is not relevant, but must still exist. |
errorMets | If the indexing of an object fails, the relevant METS or LIDO file is stored here. In addition, the error message generated by the Goobi viewer Indexer is written to a log file and also stored there. On the basis of these files, Goobi learns the outcome of the indexing of a process and reports this to the user. For the Goobi viewer, this folder is not relevant, but must still exist. |
origLido | This is where the original LIDO files, as found in the hotfolder, are stored. These may contain thousands of objects, which are initially split into individual LIDO data records. The original files are not necessary for operation and are only used for archiving. For the Goobi viewer, this folder is not relevant, but must still exist. |
origDenkXweb | This is where the original DenkXweb files, as found in the hotfolder, are stored. These files may contain thousands of monuments, which are split into individual DenkXweb data records. The original files are not necessary for operation and are only used for archiving. For the Goobi viewer, this folder is not relevant, but must still exist. |
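The hotFolder entry in the table above states that the order of the configured hotfolders matters, with the first entry acting as the default and the second reserved for lower-priority work. The following Python sketch shows one way such a priority could be realised. It is an illustration only: the function name `next_record_file`, the alphabetical ordering within a folder, and the rule that the second hotfolder is only consulted when the first is empty are assumptions, not documented Indexer behaviour.

```python
from pathlib import Path
from typing import Optional


def next_record_file(hotfolders: list[Path]) -> Optional[Path]:
    """Return the next XML file to index, honouring the configured folder order.

    `hotfolders` is assumed to be listed in configuration order, i.e. the
    default hotfolder first and the lower-priority hotfolder second.
    """
    for folder in hotfolders:
        # Only XML files are picked up; other files are ignored here.
        candidates = sorted(p for p in folder.glob("*.xml") if p.is_file())
        if candidates:
            # Assumption: files within one folder are taken in name order.
            return candidates[0]
    return None  # nothing to do until the next polling interval
```

A polling loop would call this function at short intervals, index the returned file and then remove it from its hotfolder, so that work in the second folder is only reached once the default hotfolder has been drained.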