2.8 Indexing records
To index a record, save its XML file (for example in METS/MODS, LIDO or TEI format) in the hotfolder. Each record to be imported requires its own XML file.
Optionally, additional data folders can be placed next to the XML file so that their contents are taken into account during indexing. The folder suffix determines how the contents are processed:
| Folder suffix | File name suffix | Function |
|---|---|---|
| _media | .jpg, .tif, .png, .jp2, .mp4, .avi, .mpg, .wav, ... | Media files: images, video and audio |
| _txt | .txt | Plain text files |
| _alto | .xml | ALTO |
| _neralto | .xml | ALTO with enriched Named Entity Tags. If |
| _xml | .xml | ABBYY XML |
| _pdf | .pdf | Pre-rendered PDF pages |
| _src | *.* | Files that are to be offered directly for download |
| _annotations | *.json | Web annotations |
| _cms | *.xml | Texts from the CMS |
| _downloadImages | -/- | The folder serves as a flag: if it is present, the images linked in the record are downloaded automatically during indexing. This is currently implemented for the METS/MODS, LIDO and DenkXweb formats. |
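The naming rule behind this table can be illustrated with a small helper script. The sketch below is a hypothetical helper, not part of the Goobi viewer indexer: it assumes a data folder name is simply the record name followed by one of the suffixes listed above, and splits a folder name back into these two parts.

```python
from typing import Optional, Tuple

# Data folder suffixes from the table above. Hypothetical helper constant;
# the indexer's own internal handling may differ.
DATA_FOLDER_SUFFIXES = {
    "_media": "media files (images, video, audio)",
    "_txt": "plain text files",
    "_alto": "ALTO",
    "_neralto": "ALTO with named entity tags",
    "_xml": "ABBYY XML",
    "_pdf": "pre-rendered PDF pages",
    "_src": "files offered for download",
    "_annotations": "web annotations",
    "_cms": "texts from the CMS",
    "_downloadImages": "flag: download linked images",
}


def split_data_folder(name: str) -> Optional[Tuple[str, str]]:
    """Split a folder name such as 'PPN123456789_alto' into (record name, suffix).

    Returns None if the name does not end with a known suffix
    or if nothing is left for the record name.
    """
    for suffix in DATA_FOLDER_SUFFIXES:
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)], suffix
    return None
```

For example, `split_data_folder("PPN123456789_alto")` returns `("PPN123456789", "_alto")`, while a record file name such as `PPN123456789.xml` yields `None`.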
Each folder must be named after the XML file to be indexed, without its extension but followed by the corresponding suffix (for example, PPN123456789_media for PPN123456789.xml). In the following example directory structure, directory names end with a slash:
hotfolder/
    PPN123456789.xml
    PPN123456789_media/
        00000001.jpg
        00000002.jpg
    PPN123456789_alto/
        00000001.xml
        00000002.xml
    AC987654321.xml
    AC987654321_media/
        prefix_0001.jp2
        prefix_0002.jp2
        prefix_0003.jp2
    AC987654321_src/
        additional_document.docx
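A script that prepares records for the hotfolder could derive the expected folder names from the record XML file as in the structure above. The following is a minimal sketch under the assumption that folder names follow the pattern `<XML file name without .xml><suffix>`; `expected_folder` and `data_folders_for` are hypothetical helper names.

```python
from pathlib import Path
from typing import List

# Suffixes from the table of data folders (hypothetical helper constant).
SUFFIXES = ("_media", "_txt", "_alto", "_neralto", "_xml", "_pdf",
            "_src", "_annotations", "_cms", "_downloadImages")


def expected_folder(xml_file: Path, suffix: str) -> Path:
    """Return the data folder path for a record: <name without .xml><suffix>."""
    return xml_file.with_name(xml_file.stem + suffix)


def data_folders_for(xml_file: Path) -> List[str]:
    """List the suffixes of all data folders that exist next to the record XML file."""
    return [s for s in SUFFIXES if expected_folder(xml_file, s).is_dir()]
```

For PPN123456789.xml in the example above, `data_folders_for` would report the `_media` and `_alto` folders.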
Files inside the data folders must have the same name (without extension) as the corresponding file in the media folder. For example, the ALTO file for the image 00000001.jpg must be named 00000001.xml.
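Before export, this rule can be checked by comparing base names. The sketch below is a hypothetical pre-export check that works on plain lists of file names. It assumes the rule applies to page-level folders such as _alto or _txt; folders like _src, which hold arbitrary downloads, would not be checked this way.

```python
from pathlib import PurePath
from typing import Iterable, List


def unmatched_files(media_names: Iterable[str], data_names: Iterable[str]) -> List[str]:
    """Return the data files whose base name has no counterpart in the media folder.

    Base names are compared without extension, so 00000001.xml matches 00000001.jpg.
    """
    media_stems = {PurePath(name).stem for name in media_names}
    return sorted(name for name in data_names if PurePath(name).stem not in media_stems)
```

An empty result means every file in the data folder has a matching media file; for instance, `00000003.xml` would be reported if the media folder only contains `00000001.jpg` and `00000002.jpg`.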
The Goobi viewer indexer starts indexing as soon as it finds an XML file. Indexing may therefore finish before the data folders have been copied completely; in that case the folders are ignored and remain in the hotfolder. Therefore, copy a record's XML file into the hotfolder only after all of its data folders have been copied.
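A custom export script can respect this ordering by copying the data folders first and the XML file last. The following is a minimal sketch, not the procedure used by Goobi workflow. It assumes that the indexer only reacts to files ending in `.xml`, so the XML is first written under a temporary `.part` name and then renamed in one step; `copy_record_to_hotfolder` and the `.part` extension are hypothetical.

```python
import os
import shutil
from pathlib import Path

# Suffixes from the table of data folders (hypothetical helper constant).
SUFFIXES = ("_media", "_txt", "_alto", "_neralto", "_xml", "_pdf",
            "_src", "_annotations", "_cms", "_downloadImages")


def copy_record_to_hotfolder(xml_file: Path, hotfolder: Path) -> Path:
    """Copy a record's data folders into the hotfolder, then its XML file last."""
    xml_file, hotfolder = Path(xml_file), Path(hotfolder)
    # 1. Copy every data folder (<record name><suffix>) that exists next to the XML file.
    for suffix in SUFFIXES:
        folder = xml_file.with_name(xml_file.stem + suffix)
        if folder.is_dir():
            # copytree raises FileExistsError if the folder already exists in the hotfolder.
            shutil.copytree(folder, hotfolder / folder.name)
    # 2. Copy the XML under a temporary name, then rename it in one step.
    #    Assumption: the indexer ignores files that do not end in ".xml".
    tmp = hotfolder / (xml_file.name + ".part")
    target = hotfolder / xml_file.name
    shutil.copy2(xml_file, tmp)
    os.replace(tmp, target)  # atomic on POSIX when both paths are on the same file system
    return target
```

Writing the XML under a temporary name also prevents the indexer from reading a partially copied XML file, in addition to enforcing the folder-first order.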
If Goobi workflow is not used to export data to the hotfolder, make sure that the export configuration meets the requirements described above.