`docType` controls which structure types the entries extracted from the PDF table of contents receive in the METS file. The parent element is the main element under which all other table of contents entries are placed. If it is omitted, all entries are added directly to the main element of the METS file.

The `images` tag controls the resolution (in DPI) and the output format of the extracted images.

In `properties`, process properties are written depending on the result of the extraction. The configuration given here as an example writes the process property `OCRDone` with the value `YES` if full text was found in the PDF and with the value `NO` if the PDF file contained no full text. This is particularly helpful if the workflow is to be adjusted afterwards, for example to skip an OCR step when full text already exists.
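
The decision behind the `OCRDone` property can be illustrated with a minimal Python sketch. It is not part of the plugin or of Goobi workflow: the function names `ocr_done_property` and `needs_ocr_step` are hypothetical, and only the property name and its two values are taken from the description above.

```python
def ocr_done_property(fulltext_found: bool) -> dict[str, str]:
    """Return the process property to write after the extraction.

    Hypothetical helper: mirrors the rule described in the text
    (YES if the PDF contained full text, NO otherwise).
    """
    return {"OCRDone": "YES" if fulltext_found else "NO"}


def needs_ocr_step(properties: dict[str, str]) -> bool:
    """Decide whether a later OCR step should still run.

    Assumption: a missing property is treated like NO, so OCR runs
    unless the extraction explicitly recorded existing full text.
    """
    return properties.get("OCRDone") != "YES"


# Usage: a PDF with embedded full text lets the workflow skip OCR.
props = ocr_done_property(fulltext_found=True)
print(props, "OCR needed:", needs_ocr_step(props))
```

Treating a missing property as "OCR still needed" is a design choice of this sketch, so that processes which never passed through the extraction are not skipped by accident.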