config_contentServer.xml

The file config_contentServer.xml contains the technical settings for the content server used in Goobi. The same configuration file can be used for both Goobi Workflow and Goobi Viewer.

The file is usually located at:

/opt/digiverso/goobi/config/config_contentServer.xml

A typical configuration file looks as follows:

<?xml version="1.0" encoding="UTF-8" ?>
<config>
	<localConfigPath value="/opt/digiverso/config/config_contentServer.xml" />
	<defaultResolution
		value="600" />
	<connectionTimeout>2000</connectionTimeout>
	<openJpeg>
		<binaries>/opt/digiverso/OpenJpeg</binaries>
	</openJpeg>
	<imageToPdfSizeFactor tiff="0.35" jpg="1.0" />
	<maxFileLength
		file=""
		value="0" />
	<scaling
		quality="QUALITY"
		thumbnailQuality="SPEED"
		maxStepSize="50" />
	<defaultFileNames>
		<image
			value="image_$datetime"
			sendAsAttachment="false" />
		<pdf
			value="ContentServer_$datetime"
			sendAsAttachment="true" />
	</defaultFileNames>
	<defaultRepositoryPathImages
		value="file:///opt/digiverso/viewer/media/" />
	<defaultRepositoryPathPdf
		usage="true"
		value="file:///opt/digiverso/viewer/pdf/" />
	<defaultRepositoryPathAlto
		usage="false"
		value="file:///opt/digiverso/viewer/alto/"
		fontFile="font.ttf" />
	<defaultRepositoryPathMets
		value="file:///opt/digiverso/viewer/mets/" />
	<defaultImageConfig defaultFormat="jpeg" />
	<defaultPdfConfig
		pagesize="A4"
		imageDpi="0"
		imageScale="1.0"
		imageScaleToBox=""
		imageCompression="0"
		convertToGrayscale="false"
		writeAsPdfA="false"
		metsFileGroup="PRESENTATION"
		metsFileUrlRemove="file:\/\/\/opt\/digiverso\/viewer\/media\/" />
	<defaultHighlightColor
		valueRed="255"
		valueGreen="255"
		valueBlue="0"
		valueAlpha="255" />
	<imageTypeSettings>
		<type format="png">
			<settings scaleWithScalr="false"></settings>
		</type>
		<type compression="jpeg" colorType="grayscale">
			<settings forceConvertToBuffered="true"></settings>
		</type>
		<type format="jpeg">
			<settings scaleWithScalr="true"></settings>
		</type>
		<type format="tiff" embeddedColorProfile="true">
			<settings scaleWithScalr="false"></settings>
		</type>
		<type format="jpg2000">
			<settings scaleWithScalr="false" allowSubSampling="true"></settings>
		</type>
		<type
			format="default"
			colorType="default"
			compression="default"
			embeddedColorProfile="both"
			minSize="0"
			maxSize="0">
			<settings
				allowRenderWithJAI="false"
				scaleWithScalr="true"
				mergeWithJAI="false"
				forceConvertToBuffered="false"
				forceConvertToRGB="false"
				forwardDirectlyIfPossible="true"
				preferredImageReader="com.github.jaiimageio"
				preferredImageWriter="com.github.jaiimageio">
			</settings>
		</type>
	</imageTypeSettings>
	<watermark
		use="false"
		scale="true"
		convertColorSpace="false"
		scaleToPercent="6"
		configFile="file:///opt/digiverso/viewer/config/config_imageFooter.xml" />
	<errorWaterMark
		title="Error"
		titleFontSize="20" 
		messageFontSize="14"
		messageMaxLineLength="60" />
	<pdfTitlePage
		use="true"
		templateFolder="file:///opt/digiverso/viewer/config/PDFTitlePage/"
		defaultTemplate="default"
		configFile=""
		fontFile="" />
	<pdfChapterTitlePages
		use="true"
		templateFolder="file:///opt/digiverso/viewer/config/PDFTitlePage/"
		defaultTemplate="default" />
	<singlePdfTitlePage
		use="false"
		templateFolder="file:///opt/digiverso/viewer/config/PDFTitlePage/"
		defaultTemplate="default" />
	<restapi use="false">
		<iiif>
			<attribution></attribution>
			<logo></logo>
			<license></license>
		</iiif>
		<discloseContentLocation>true</discloseContentLocation>
	</restapi>
	<contentCache
		useCache="false"
		size="100"
		useShortFileNames="false"
		path=""
		cachePartialImages="false" />
	<thumbnailCache
		useCache="false"
		size="100"
		useShortFileNames="false"
		path="" />
	<pdfCache
		useCache="true"
		size="100"
		useShortFileNames="false" />
	<memoryUsage
		maximalParallelImageRequests="0"
		maximalParallelPdfRequests="0"
		lowMemoryShreshold="1000000000"
		memoryUnit="MB"
		timeoutUnit="s"
		triggerGCAfterAction="true">
		<image>
			<maxParallelRequests>20</maxParallelRequests>		
			<lowMemoryThreshold>200</lowMemoryThreshold>
			<lowMemoryTimeout>8</lowMemoryTimeout>
		</image>
		<pdf>
			<maxParallelRequests>8</maxParallelRequests>		
			<lowMemoryThreshold>500</lowMemoryThreshold>
			<lowMemoryTimeout>10</lowMemoryTimeout>
		</pdf>
		<metsPdf>
			<maxParallelRequests>4</maxParallelRequests>		
			<lowMemoryThreshold>1000</lowMemoryThreshold>
			<lowMemoryTimeout>20</lowMemoryTimeout>
		</metsPdf>
	</memoryUsage>
	<S3>
		<useCustom>true</useCustom>
		<Endpoint>http://192.168.178.124:9000</Endpoint>
		<AccessKeyID>24JW1VB99T8MAU94TBIC</AccessKeyID>
		<SecretAccessKey>OTrwBXGVBacVdXNSI7SKKrX8b+CqwANa5ngLZ4lB</SecretAccessKey>
	</S3>
</config>

Data types

In this configuration file, settings are made with different data types. For an overview, all types used are explained briefly in the following table:

General settings and default values

In addition, the imageToPdfSizeFactor element accepts arbitrary parameters for file types that need a special factor when scaling from that file type to a PDF file (see example). The image file extension is used as the parameter name and the scaling factor as its value. A factor of 1.0 keeps the image size unchanged; values below 1.0 reduce the image and values above 1.0 enlarge it. The value 0 should not be used.
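As a small sketch of the extension-to-factor mapping described above, the fragment below adds a third file type. The tiff and jpg values come from the sample configuration; the png parameter and its factor of 0.5 are assumptions chosen purely for illustration.

```xml
<!-- Each parameter name is an image file extension, each value a scaling factor. -->
<!-- png="0.5" is a hypothetical addition and not part of the sample configuration. -->
<imageToPdfSizeFactor tiff="0.35" jpg="1.0" png="0.5" />
```

With these values, JPEG images would keep their size, while TIFF and PNG images would be reduced when converted to PDF.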

The size of image files can be limited on the content server. The following parameters can be used in the maxFileLength element:

For different application purposes it may be useful to scale images differently. The following parameters can be used in the scaling element:

For downloading images generated in the content server, a default file name can be specified here. The following parameters can be used in the image and pdf elements within the defaultFileNames element:

Default paths for the locations of different file types can be specified for incomplete requests to the content server. The following parameters can be used in the defaultRepositoryPathImages, defaultRepositoryPathPdf, defaultRepositoryPathAlto, and defaultRepositoryPathMets elements:

The image data type initially includes all image file types which are not specified in detail. Therefore, defaultImageConfig can be used to make settings for image files. The following parameters can be used in the defaultImageConfig element:

PDF files are configured separately because they have special properties that image file formats do not share. The following parameters can be used in the defaultPdfConfig element:

A color can be specified to mark different image elements. The following parameters can be used in the defaultHighlightColor element:

The following table shows some simple color examples:

Here, alpha="255" means full opacity (the color covers the image area completely) and alpha="0" means no opacity (the color is invisible).
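To illustrate the effect of the alpha channel, the following sketch defines a yellow highlight that is only partly opaque. The value valueAlpha="128" is an assumption chosen for illustration; the remaining values are taken from the sample configuration.

```xml
<!-- Yellow highlight (red + green, no blue). -->
<!-- valueAlpha="128" is an illustrative value between 0 (invisible) and 255 (fully opaque). -->
<defaultHighlightColor
	valueRed="255"
	valueGreen="255"
	valueBlue="0"
	valueAlpha="128" />
```

A partly transparent highlight like this should leave the underlying image content visible beneath the marked area.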

<localConfigPath value="/opt/digiverso/config/config_contentServer.xml" />
<defaultResolution
	value="600" />
<connectionTimeout>2000</connectionTimeout>
<openJpeg>
	<binaries>/opt/digiverso/OpenJpeg</binaries>
</openJpeg>
<imageToPdfSizeFactor tiff="0.35" jpg="1.0" />
<maxFileLength
	file=""
	value="0" />
<scaling
	quality="QUALITY"
	thumbnailQuality="SPEED"
	maxStepSize="50" />
<defaultFileNames>
	<image
		value="image_$datetime"
		sendAsAttachment="false" />
	<pdf
		value="ContentServer_$datetime"
		sendAsAttachment="true" />
</defaultFileNames>
<defaultRepositoryPathImages
	value="file:///opt/digiverso/viewer/media/" />
<defaultRepositoryPathPdf
	usage="true"
	value="file:///opt/digiverso/viewer/pdf/" />
<defaultRepositoryPathAlto
	usage="false"
	value="file:///opt/digiverso/viewer/alto/"
	fontFile="font.ttf" />
<defaultRepositoryPathMets
	value="file:///opt/digiverso/viewer/mets/" />
<defaultImageConfig defaultFormat="jpeg" />
<defaultPdfConfig
	pagesize="A4"
	imageDpi="0"
	imageScale="1.0"
	imageScaleToBox=""
	imageCompression="0"
	convertToGrayscale="false"
	writeAsPdfA="false"
	metsFileGroup="PRESENTATION"
	metsFileUrlRemove="file:\/\/\/opt\/digiverso\/viewer\/media\/" />
<defaultHighlightColor
	valueRed="255"
	valueGreen="255"
	valueBlue="0"
	valueAlpha="255" />

Image file types

Within the imageTypeSettings element, settings for any number of image file types can be defined. For each image file type a type element with the sub-element settings is specified.

For file types (type) the following parameters can be used:

Note: The minSize and maxSize parameters are read from the configuration as floating point numbers but are later processed as long integers. Therefore, only integer values should be specified.
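The note above could be applied as in the following sketch. The size limits shown are hypothetical, and the unit in which they are interpreted is not stated here, so the numbers only demonstrate the integer notation.

```xml
<!-- Hypothetical type entry: minSize and maxSize are written as plain integers. -->
<!-- Notations such as 1.5 or 1.0E8 should be avoided, because the values are processed as long integers. -->
<type format="tiff" minSize="0" maxSize="100000000">
	<settings scaleWithScalr="false"></settings>
</type>
```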

For file type settings (settings) the following parameters can be used:

<imageTypeSettings>
	<type format="png">
		<settings scaleWithScalr="false"></settings>
	</type>
	<type compression="jpeg" colorType="grayscale">
		<settings forceConvertToBuffered="true"></settings>
	</type>
	<type format="jpeg">
		<settings scaleWithScalr="true"></settings>
	</type>
	<type format="tiff" embeddedColorProfile="true">
		<settings scaleWithScalr="false"></settings>
	</type>
	<type format="jpg2000">
		<settings scaleWithScalr="false" allowSubSampling="true"></settings>
	</type>
	<type
		format="default"
		colorType="default"
		compression="default"
		embeddedColorProfile="both"
		minSize="0"
		maxSize="0">
		<settings
			allowRenderWithJAI="false"
			scaleWithScalr="true"
			mergeWithJAI="false"
			forceConvertToBuffered="false"
			forceConvertToRGB="false"
			forwardDirectlyIfPossible="true"
			preferredImageReader="com.github.jaiimageio"
			preferredImageWriter="com.github.jaiimageio">
		</settings>
	</type>
</imageTypeSettings>

Watermarks

When automatically processing and checking image files, the content server can add watermarks. A watermark is an image or part of an image that is inserted into the processed image as a kind of identity information (for example, about the author). Watermarks are configured with the watermark element, which accepts the following parameters:

Error watermarks display errors as an overlay directly in the image files instead of showing them on a subsequent error web page. They are configured with the errorWaterMark element, which accepts the following parameters:

<watermark
	use="false"
	scale="true"
	convertColorSpace="false"
	scaleToPercent="6"
	configFile="file:///opt/digiverso/viewer/config/config_imageFooter.xml" />
<errorWaterMark
	title="Error"
	titleFontSize="20" 
	messageFontSize="14"
	messageMaxLineLength="60" />

PDF title pages

The content server can generate three different types of artificial title pages and insert them at appropriate places in multi-page PDF files. Artificial title pages are pages that contain some meta information about the document or chapter or document section that follows. Metadata is read from METS files of the corresponding process. Additionally, it is also possible to display image content linked in the METS file. The layout as well as static contents of the metadata pages are specified by XML documents, so-called templates, which can be customized according to individual needs.

With pdfTitlePage a unique title page can be generated for the entire PDF document. It is inserted before the first page and can only contain information about the entire work and the top structural element contained in the PDF file.

The pdfChapterTitlePages element can be used to insert title pages before each chapter or structure element and can contain information about the respective structure element and the overall work. For example, these pages can contain information about the document structure (tables of contents, chapters, subchapters, appendices, etc.).

The singlePdfTitlePage element can be used to include additional, individual title pages in the PDF document, which provide information about special places in a book, for example. It can contain only information about the whole work.

.fo template files can be used to generate additional PDF title pages. They can be specified either in server requests or in the following XML elements.

Each folder specified in templateFolder must contain, for every active title page, the XML file with the file extension .fo that is named in defaultTemplate, as well as the file fop.xconf, which contains further settings for the conversion to PDF with Apache FOP. Details can be found in the Apache FOP documentation.
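The folder requirement can be sketched as follows. The file name default.fo is an assumption derived from defaultTemplate="default" plus the .fo extension; the folder path is the one used in the sample configuration.

```xml
<!-- Assumed contents of the template folder for defaultTemplate="default":
       /opt/digiverso/viewer/config/PDFTitlePage/default.fo   (title page template)
       /opt/digiverso/viewer/config/PDFTitlePage/fop.xconf    (Apache FOP settings)
-->
<pdfTitlePage
	use="true"
	templateFolder="file:///opt/digiverso/viewer/config/PDFTitlePage/"
	defaultTemplate="default"
	configFile=""
	fontFile="" />
```

If several title page types are active and share one folder, as in the sample configuration, a single fop.xconf in that folder would presumably serve all of them.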

<pdfTitlePage
	use="true"
	templateFolder="file:///opt/digiverso/viewer/config/PDFTitlePage/"
	defaultTemplate="default"
	configFile=""
	fontFile="" />
<pdfChapterTitlePages
	use="true"
	templateFolder="file:///opt/digiverso/viewer/config/PDFTitlePage/"
	defaultTemplate="default" />
<singlePdfTitlePage
	use="false"
	templateFolder="file:///opt/digiverso/viewer/config/PDFTitlePage/"
	defaultTemplate="default" />

REST API

The REST API can be used to retrieve information about image files from the content server. The parameters attribution, logo and license are additional details that can optionally be set in the image metadata of the returned image files.
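A filled-in variant might look like the following sketch. The attribution text and the logo and license URLs are placeholders invented for illustration, and the assumption that logo and license take URLs is not confirmed here.

```xml
<!-- Hypothetical values: replace attribution, logo and license with your institution's data. -->
<restapi use="true">
	<iiif>
		<attribution>Example Library</attribution>
		<logo>https://example.org/logo.png</logo>
		<license>https://creativecommons.org/publicdomain/mark/1.0/</license>
	</iiif>
	<discloseContentLocation>true</discloseContentLocation>
</restapi>
```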

<restapi use="false">
	<iiif>
		<attribution></attribution>
		<logo></logo>
		<license></license>
	</iiif>
	<discloseContentLocation>true</discloseContentLocation>
</restapi>

Cache storage

Caches temporarily store image data so that it does not have to be recalculated for each (possibly identical) request. Different caches are used for different file types and can be configured in more detail with the following XML elements: contentCache for a general image file cache, pdfCache for a PDF file cache, and thumbnailCache for a thumbnail cache.
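As a sketch of how a cache could be switched on, the fragment below enables the thumbnail cache. The path value is a hypothetical location, and both its notation (file URL rather than plain path) and the unit of size are assumptions, since neither is specified in this section.

```xml
<!-- Hypothetical example: enable the thumbnail cache with an explicit storage location. -->
<!-- The path below is an assumed value and must point to a folder the server can write to. -->
<thumbnailCache
	useCache="true"
	size="100"
	useShortFileNames="false"
	path="file:///opt/digiverso/viewer/cache/thumbnails/" />
```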

<contentCache
	useCache="false"
	size="100"
	useShortFileNames="false"
	path=""
	cachePartialImages="false" />
<thumbnailCache
	useCache="false"
	size="100"
	useShortFileNames="false"
	path="" />
<pdfCache
	useCache="true"
	size="100"
	useShortFileNames="false" />

Performance

The memoryUsage element can be used to specify some memory and runtime constraints for the content server. The memoryUsage element contains general settings and further subelements for specific file types. The following parameters can be used for memoryUsage:

The memoryUnit parameter accepts several values, some of which are misleading. The following table shows which values can be used and which exact numeric values they correspond to internally.

There are also several possible values for the timeoutUnit parameter. The following table shows in each case how these are interpreted internally:

With the sub-elements image, pdf and metsPdf special settings can be made for the corresponding file types. The following parameters can be used equally for all subelements:

<memoryUsage
	maximalParallelImageRequests="0"
	maximalParallelPdfRequests="0"
	lowMemoryShreshold="1000000000"
	memoryUnit="MB"
	timeoutUnit="s"
	triggerGCAfterAction="true">
	<image>
		<maxParallelRequests>20</maxParallelRequests>		
		<lowMemoryThreshold>200</lowMemoryThreshold>
		<lowMemoryTimeout>8</lowMemoryTimeout>
	</image>
	<pdf>
		<maxParallelRequests>8</maxParallelRequests>		
		<lowMemoryThreshold>500</lowMemoryThreshold>
		<lowMemoryTimeout>10</lowMemoryTimeout>
	</pdf>
	<metsPdf>
		<maxParallelRequests>4</maxParallelRequests>		
		<lowMemoryThreshold>1000</lowMemoryThreshold>
		<lowMemoryTimeout>20</lowMemoryTimeout>
	</metsPdf>
</memoryUsage>

S3 storage

The S3 element can optionally be used to include S3 cloud storage to offload data.

<S3>
	<useCustom>true</useCustom>
	<Endpoint>http://192.168.178.124:9000</Endpoint>
	<AccessKeyID>24JW1VB99T8MAU94TBIC</AccessKeyID>
	<SecretAccessKey>OTrwBXGVBacVdXNSI7SKKrX8b+CqwANa5ngLZ4lB</SecretAccessKey>
</S3>
