2.8 Indexing records

In order to index a record, the desired XML file (for example in METS/MODS, LIDO or TEI format) must be saved in the hotfolder. A separate XML file is required for each record to be imported.

Optionally, further folders can be stored next to the XML file so that these can be taken into account during indexing.

Folder suffix

File name suffix

Function

_media

.jpg, .tif, .png, .jp2, .mp4, .avi, .mpg, .wav, ...

Media. Images, Video and Audio

_txt

.txt

Plain text files

_alto

.xml

ALTO

_neralto

.xml

ALTO with enriched Named Entity Tags. If _alto and _neralto are present, the latter is preferred.

_xml

.xml

ABBYY XML

_pdf

.pdf

(pre-rendered) PDF pages

_src

*.*

Files that are to be offered directly for download

_annotations

*.json

Web annotations

_cms

*.xml

Texts from the CMS

_downloadImages

-/-

The folder serves as an indicator to automatically download the images linked in the record during the indexing process. The functionality is currently implemented for the formats METS/MODS, LIDO and DenkXweb.

The folders must have the file name of the XML file to be indexed (without its extension, but with the corresponding suffix). The following is an example of a directory structure in which the directory names are marked in bold:

  • hotfolder/

    • PPN123456789.xml

    • PPN123456789_media/

      • 00000001.jpg

      • 00000002.jpg

    • PPN123456789_alto/

      • 00000001.xml

      • 00000002.xml

    • AC987654321.xml

    • AC987654321_media/

      • prefix_0001.jp2

      • prefix_0002.jp2

      • prefix_0003.jp2

    • AC987654321_src/

      • additional_document.docx

File names in folders must always have the file name of the corresponding file in the media folder, for example for the image 00000001.jpg the ALTO file is 00000001.xml.

Since the Goobi viewer indexer starts indexing as soon as an XML file is found, indexing may be complete before the data folders have been copied. In this case the folders are not considered and remain in the hotfolder. Therefore, record XML files should not be copied into the hotfolder until the copying of the corresponding data folders has been completed.

If Goobi workflow is not used for exporting data to the hotfolder, make sure that the configuration meets the requirements described above.

Last updated