Importing records from an Excel file
This is the technical documentation for the plugin for importing Excel files.
Overview
Introduction
This documentation describes the installation, configuration and use of the plugin for mass importing data sets from Excel files.
Installation
The plugin must be installed in the following folder:
There is also a configuration file, which must be placed at the following location:
Overview and functionality
To use the import, the mass import area must be opened for a production template, and the plugin intranda_import_excel must be selected in the File upload import tab. An Excel file can then be uploaded and imported.
The import then takes place row by row. A new process is created for each row and the configured rules are applied. If a valid record has been created and the generated process title has not yet been assigned, the process is actually created and saved.
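The row-by-row logic described above can be pictured with the following minimal Python sketch. It is not the plugin's implementation. The Process class, the import_rows function and the rule callback are hypothetical stand-ins for the configured rules and for Goobi's own process handling.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical stand-in for a Goobi process built from one Excel row."""
    title: str
    metadata: dict = field(default_factory=dict)


def import_rows(rows, apply_rules, existing_titles):
    """Create one process per row, skipping invalid records and duplicate titles.

    rows: iterable of (row_number, {column heading: cell value}).
    apply_rules: hypothetical callback returning a Process, or None if no
    valid record could be built from the row.
    existing_titles: process titles that are already assigned.
    """
    created, skipped = [], []
    taken = set(existing_titles)
    for row_number, row in rows:
        process = apply_rules(row)
        if process is None or process.title in taken:
            skipped.append(row_number)  # invalid record or title already assigned
            continue
        taken.add(process.title)        # later rows may not reuse this title
        created.append(process)         # the real plugin would save it here
    return created, skipped


def make_process(row):
    """Trivial example rule: the ID column becomes the title."""
    return Process(title=row["ID"]) if row["ID"] else None


rows = [(2, {"ID": "A1"}), (3, {"ID": ""}), (4, {"ID": "A1"}), (5, {"ID": "B7"})]
created, skipped = import_rows(rows, make_process, existing_titles={"B7"})
print([p.title for p in created], skipped)  # ['A1'] [3, 4, 5]
```

Row 3 fails because no valid record can be built, row 4 because its title was taken by row 2, and row 5 because its title already exists in the system.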
Configuration
The configuration is done via the file plugin_intranda_import_excel.xml. This file can be adapted during operation.
Individual configurability
It is possible to create a global configuration for all production templates as well as individual settings for particular production templates. For this reason, the config element can be repeated in the XML file. When mass import is started in Goobi, the system always looks for the config block whose template element contains the name of the selected production template. If no such entry exists, the default configuration is used. It is marked with * in the template element.
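A minimal sketch of this lookup in Python, assuming a configuration shaped roughly as shown. Only the config, template and publicationType element names are taken from this documentation. The root element name, the template names and the publicationType values are hypothetical, and the real file contains many more elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened configuration. Only <config>, <template>
# and <publicationType> are element names taken from this documentation.
CONFIG_XML = """
<config_plugin>
    <config>
        <template>*</template>
        <publicationType>Monograph</publicationType>
    </config>
    <config>
        <template>Manuscript_workflow</template>
        <publicationType>Manuscript</publicationType>
    </config>
</config_plugin>
"""


def select_config(root, template_name):
    """Return the <config> block for the template, else the '*' default block."""
    default = None
    for config in root.findall("config"):
        names = [t.text.strip() for t in config.findall("template") if t.text]
        if template_name in names:
            return config               # an exact match always wins
        if "*" in names and default is None:
            default = config            # remember the default, keep searching
    return default


root = ET.fromstring(CONFIG_XML.strip())
print(select_config(root, "Manuscript_workflow").findtext("publicationType"))  # Manuscript
print(select_config(root, "Another_template").findtext("publicationType"))     # Monograph
```

Because the loop keeps searching after it has seen the * block, the order of the config blocks in the file does not affect which one is chosen.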
Collection
The optional collection element can be used to define a collection that is assigned to all records. In addition, collections can also be selected in the user interface, or the collection can be imported as part of the Excel file or from the catalogue.
Catalogue import
The next four elements useOpac, opacName, opacHeader and searchField control whether a catalogue query is performed during the import. If useOpac contains the value true, such a query is performed using the catalogue and search field configured in these elements. The name of the catalogue must correspond to an entry in the Goobi configuration file goobi_projects.xml. It can either be defined statically in the opacName element or read dynamically from the respective row of the Excel file (from the column named in opacHeader). The structure type is detected automatically from the OPAC data.
If no OPAC is used, however, the structure type of the processes to be created must be specified in the publicationType element. The name used here must exist in the ruleset. If the OPAC is used, this element is not evaluated.
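The decision logic of these elements can be summarised in a short Python sketch. The dictionary keys reuse the documented element names. The following are assumptions made for illustration: the dict-based configuration, the catalogue names and search field value, and the rule that a non-empty opacHeader cell takes precedence over opacName.

```python
def catalogue_settings(config, row):
    """Decide how one row is imported (catalogue query or fixed structure type).

    config: dict keyed by the documented element names (a simplification).
    row: dict mapping column headings to cell values.
    """
    if str(config.get("useOpac", "false")).strip().lower() != "true":
        # No catalogue query: publicationType must name a type from the ruleset.
        return {"query": False, "structureType": config["publicationType"]}

    # Assumption: a non-empty cell in the opacHeader column overrides opacName.
    catalogue = None
    header = config.get("opacHeader")
    if header:
        catalogue = str(row.get(header) or "").strip() or None
    if catalogue is None:
        catalogue = config.get("opacName")

    # The structure type will come from the OPAC record; publicationType is ignored.
    return {"query": True, "catalogue": catalogue, "searchField": config.get("searchField")}


# Usage with hypothetical catalogue names and search field.
config = {"useOpac": "true", "opacName": "DefaultCatalogue",
          "opacHeader": "Catalogue", "searchField": "12",
          "publicationType": "Monograph"}
print(catalogue_settings(config, {"Catalogue": "OtherCatalogue"})["catalogue"])  # OtherCatalogue
print(catalogue_settings(config, {"Catalogue": ""})["catalogue"])               # DefaultCatalogue
print(catalogue_settings({"useOpac": "false", "publicationType": "Monograph"}, {}))
```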
Line range
The following elements describe the structure of the Excel file to be imported.
rowHeader defines the row containing the column headings that are later used for the mapping. This is usually the first row, but it can differ, for example when the headings span several rows.
rowDataStart and rowDataEnd define the range of rows that contains the data. Usually these are the rows directly following rowHeader, but if the file uses special formatting such as blank rows, those rows can be excluded this way.
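The following Python sketch shows how the three row settings might select the data from a worksheet. The worksheet is modelled as a plain list of rows, and row numbers are assumed to be 1-based as displayed in Excel. The data_rows function is hypothetical, because the plugin itself reads the Excel file directly.

```python
def data_rows(sheet, row_header, row_data_start, row_data_end):
    """Yield (row_number, {heading: value}) for every row in the data range.

    sheet: list of rows, each a list of cell values.
    Row numbers are assumed to be 1-based, as displayed in Excel.
    """
    headings = sheet[row_header - 1]
    last = min(row_data_end, len(sheet))      # do not read past the sheet
    for number in range(row_data_start, last + 1):
        yield number, dict(zip(headings, sheet[number - 1]))


# A sheet with a title row, a blank formatting row and a footer.
sheet = [
    ["Inventory list"],                  # row 1: title, not part of the data
    ["ID", "Title", "Year"],             # row 2: rowHeader
    [],                                  # row 3: blank formatting row
    ["A1", "First volume", "1901"],      # row 4: rowDataStart
    ["A2", "Second volume", "1902"],     # row 5: rowDataEnd
    ["Total: 2"],                        # row 6: footer, not part of the data
]
for number, record in data_rows(sheet, row_header=2, row_data_start=4, row_data_end=5):
    print(number, record)
```

Setting rowDataStart to 4 skips the blank row 3, and setting rowDataEnd to 5 keeps the footer out of the import.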
Identifier
The identifierHeaderName element contains the heading of the column that holds an identifier. This value is used internally to identify the rows and is also used for the OPAC query. In addition, it is used to generate the process title if no other rule has been specified for the process title.
Process title
The processTitleRule element is used to generate the process title. The same options are available here as in the Goobi configuration file goobi_projects.xml.
The processTitleRule element can also be given the attribute replacewith. All special characters in the generated title are then replaced with the character specified there (e.g. replacewith="_").
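A minimal Python sketch of title generation with replacewith, including the fallback to the identifier column. The rule format (a list of literal texts and column references) is a hypothetical simplification, not the syntax of goobi_projects.xml. "Special characters" is assumed to mean every character other than ASCII letters and digits.

```python
import re


def process_title(row, rule, identifier_header, replacewith=None):
    """Build a process title for one row.

    rule: hypothetical list of ("text", literal) and ("column", heading)
    parts; if it is empty or None, the identifier column is used instead.
    replacewith: if given, replaces every character that is not an ASCII
    letter or digit (assumed definition of "special character").
    """
    if rule:
        title = "".join(
            value if kind == "text" else str(row.get(value, ""))
            for kind, value in rule
        )
    else:
        title = str(row[identifier_header])   # fallback: identifier column
    if replacewith is not None:
        # A function as replacement avoids re's backslash-escape handling.
        title = re.sub(r"[^A-Za-z0-9]", lambda _match: replacewith, title)
    return title


row = {"ID": "Ms 12/3"}
print(process_title(row, [("text", "ms_"), ("column", "ID")], "ID", "_"))  # ms_Ms_12_3
print(process_title(row, None, "ID"))                                      # Ms 12/3
print(process_title(row, None, "ID", replacewith="-"))                     # Ms-12-3
```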
Transfer of images
The elements imageFolderHeaderName, imageFolderPath and moveFiles can be used to import images in addition to metadata. imageFolderHeaderName contains the heading of the Excel column that holds the names of the folders containing the images. These cells can contain either an absolute or a relative path.
If a relative path is specified, the imageFolderPath element must contain the root path to the images. The moveFiles element controls whether the images are copied or moved.
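The path handling for the image transfer could look like the following Python sketch. transfer_images is a hypothetical helper. It assumes that the images lie directly in the named folder, so subfolders are ignored, and it uses Python's shutil module for copying and moving.

```python
import shutil
import tempfile
from pathlib import Path


def transfer_images(cell_value, image_folder_path, target_dir, move_files=False):
    """Copy or move the files of the image folder named in an Excel cell.

    cell_value: absolute path, or path relative to image_folder_path.
    Only files directly inside the folder are transferred (assumption).
    Returns the names of the transferred files.
    """
    source = Path(cell_value)
    if not source.is_absolute():
        source = Path(image_folder_path) / source   # imageFolderPath is the root
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    transferred = []
    for image in sorted(p for p in source.iterdir() if p.is_file()):
        destination = target / image.name
        if move_files:
            shutil.move(str(image), str(destination))  # moveFiles = true
        else:
            shutil.copy2(image, destination)           # moveFiles = false
        transferred.append(destination.name)
    return transferred


# Usage in a temporary directory with two dummy image files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "images"
    (root / "book1").mkdir(parents=True)
    for name in ("0001.tif", "0002.tif"):
        (root / "book1" / name).write_bytes(b"dummy")
    print(transfer_images("book1", root, Path(tmp) / "copy"))  # relative path, copied
    print(transfer_images(str(root / "book1"), None, Path(tmp) / "moved", move_files=True))
    print(sorted(p.name for p in (root / "book1").iterdir()))  # [] after moving
```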
Execution using GoobiScript
The runAsGoobiScript element controls whether an import is processed asynchronously in the background via the GoobiScript queue or directly within the user session. Which setting makes sense depends on the import: if images are imported as well, or if the Excel file contains many records, it is usually better to run the import as GoobiScript.
Note:
If the identifierHeaderName column does not contain a unique identifier or has not been configured, the runAsGoobiScript option cannot be used.
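The precondition in this note can be checked before an import is started. The Python sketch below is hypothetical. It also assumes that an empty identifier cell counts as "not unique".

```python
from collections import Counter


def can_run_as_goobiscript(rows, identifier_header):
    """Check the precondition for runAsGoobiScript on already-read rows.

    rows: list of dicts mapping column headings to cell values.
    Assumption: an empty identifier cell counts as "not unique".
    """
    if not identifier_header:
        return False                      # identifierHeaderName not configured
    ids = []
    for row in rows:
        value = row.get(identifier_header)
        ids.append("" if value is None else str(value).strip())
    if any(i == "" for i in ids):
        return False                      # missing identifier in some row
    return all(count == 1 for count in Counter(ids).values())


print(can_run_as_goobiscript([{"ID": "A1"}, {"ID": "A2"}], "ID"))  # True
print(can_run_as_goobiscript([{"ID": "A1"}, {"ID": "A1"}], "ID"))  # False
print(can_run_as_goobiscript([{"ID": "A1"}], None))                # False
```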
Configuration of the individual Excel columns
The elements metadata, person and group can be used to import individual columns as metadata or as process properties. Each of these elements has a number of attributes and sub-elements.
Import metadata
The metadata element is used to generate descriptive metadata.
The headerName attribute contains the column heading. The rule only applies if the Excel file contains a column with this heading and the cell is not empty. At least one of the two attributes ugh and name must be present. The ugh attribute can contain the name of a metadata type. If it does, and the metadata type is allowed for the configured publication type, a new metadata entry is created. The name attribute creates a process property with this name.
The docType attribute becomes relevant if a multi-volume work or a journal has been imported from the catalogue. It controls whether the field belongs to the overall work or to the individual volume.
If, in addition to the content, there is another column containing authority data identifiers or URIs, the heading of this column can be entered in the normdataHeaderName attribute.
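How a single metadata rule might be applied to one row is sketched below in Python. The attribute names come from this documentation. The following are illustrative assumptions: the dictionary shapes, the allowed_types set, the metadata type names TitleDocMain and PlaceOfPublication, and the example authority URI.

```python
def apply_metadata_rule(row, rule, allowed_types):
    """Apply one <metadata> rule to one row.

    rule: dict with the documented attributes headerName, ugh, name and
    normdataHeaderName (ugh, name and normdataHeaderName are optional here).
    allowed_types: metadata types the ruleset allows for the publication type.
    Returns (metadata entries, process properties).
    """
    metadata, properties = [], []
    raw = row.get(rule["headerName"])
    value = "" if raw is None else str(raw).strip()
    if not value:
        return metadata, properties   # column missing or cell empty: rule does not apply

    authority = None
    if rule.get("normdataHeaderName"):
        authority = str(row.get(rule["normdataHeaderName"]) or "").strip() or None

    if rule.get("ugh") in allowed_types:
        metadata.append({"type": rule["ugh"], "value": value, "authority": authority})
    if rule.get("name"):
        properties.append({"name": rule["name"], "value": value})
    return metadata, properties


row = {"Title": "Faust", "Place": "Weimar",
       "Place URI": "https://example.org/place/1", "Notes": ""}
allowed = {"TitleDocMain", "PlaceOfPublication"}
print(apply_metadata_rule(row, {"headerName": "Title", "ugh": "TitleDocMain", "name": "Title"}, allowed))
print(apply_metadata_rule(row, {"headerName": "Place", "ugh": "PlaceOfPublication",
                                "normdataHeaderName": "Place URI"}, allowed))
print(apply_metadata_rule(row, {"headerName": "Notes", "name": "Notes"}, allowed))  # ([], [])
```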
Import of persons
The person element can be used to create persons automatically.
Persons differ from normal metadata in that they consist of a first name and a last name. If these are in two different columns, the elements firstnameFieldHeader and lastnameFieldHeader are used. If the names are in a single column, the nameFieldHeader field is used. In this case, the system checks whether the value contains only the last name or whether it has to be split. splitChar sets the character or character sequence at which the value is split. The attribute firstNameIsFirstPart specifies whether the name is stored as First name Last name or as Last name First name.
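The name handling can be sketched in Python as follows. split_person is hypothetical, and two behaviours are assumptions. First, a value without the split character is treated as a last name only. Second, for First name Last name values the split happens at the last occurrence of splitChar, and for Last name First name values at the first occurrence, so that multi-part first names stay together.

```python
def split_person(row, cfg):
    """Return (first_name, last_name) for one <person> rule.

    cfg keys mirror the documented settings firstnameFieldHeader,
    lastnameFieldHeader, nameFieldHeader, splitChar and firstNameIsFirstPart.
    """
    def cell(header):
        value = row.get(header) if header else None
        return "" if value is None else str(value).strip()

    if not cfg.get("nameFieldHeader"):
        # Two separate columns for first and last name.
        return cell(cfg.get("firstnameFieldHeader")), cell(cfg.get("lastnameFieldHeader"))

    value = cell(cfg["nameFieldHeader"])
    split_char = cfg.get("splitChar")
    if not split_char or split_char not in value:
        return "", value                           # assumption: last name only

    if cfg.get("firstNameIsFirstPart", False):
        first, last = value.rsplit(split_char, 1)  # "First name Last name"
    else:
        last, first = value.split(split_char, 1)   # "Last name First name"
    return first.strip(), last.strip()


single = {"nameFieldHeader": "Name", "splitChar": ", "}
print(split_person({"Name": "Goethe, Johann Wolfgang"}, single))  # ('Johann Wolfgang', 'Goethe')
print(split_person({"Name": "Homer"}, single))                    # ('', 'Homer')
spaced = {"nameFieldHeader": "Name", "splitChar": " ", "firstNameIsFirstPart": True}
print(split_person({"Name": "Johann Wolfgang Goethe"}, spaced))   # ('Johann Wolfgang', 'Goethe')
two_cols = {"firstnameFieldHeader": "First", "lastnameFieldHeader": "Last"}
print(split_person({"First": "Ada", "Last": "Lovelace"}, two_cols))  # ('Ada', 'Lovelace')
```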
Import of metadata groups
Metadata groups can be created using the group element.
A metadata group consists of several metadata entries and persons. The individual sub-elements are configured in the same way as individual metadata and persons.
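A group bundles metadata and persons, so building one can be sketched by applying the individual rules and collecting the results. The following are all assumptions: the build_group function and its dictionary format (with persons given in separate first and last name columns), the group and metadata type names, and the decision to skip groups without any values.

```python
def build_group(row, group):
    """Build one metadata group from one row, or None if every cell is empty.

    group: hypothetical dict with the group type ("ugh"), a list of
    metadata rules (headerName, ugh) and a list of person rules
    (firstnameFieldHeader, lastnameFieldHeader, ugh).
    """
    def cell(header):
        value = row.get(header)
        return "" if value is None else str(value).strip()

    entries = []
    for rule in group.get("metadata", []):
        value = cell(rule["headerName"])
        if value:                                    # empty cells are skipped
            entries.append({"type": rule["ugh"], "value": value})
    for rule in group.get("persons", []):
        first = cell(rule["firstnameFieldHeader"])
        last = cell(rule["lastnameFieldHeader"])
        if first or last:
            entries.append({"type": rule["ugh"], "firstName": first, "lastName": last})
    if not entries:
        return None                                  # assumption: no empty groups
    return {"type": group["ugh"], "entries": entries}


provenance = {
    "ugh": "Provenance",
    "metadata": [{"headerName": "Provenance date", "ugh": "ProvenanceDate"}],
    "persons": [{"firstnameFieldHeader": "Owner first name",
                 "lastnameFieldHeader": "Owner last name", "ugh": "ProvenanceOwner"}],
}
row = {"Provenance date": "1850", "Owner first name": "Anna", "Owner last name": "Schmidt"}
print(build_group(row, provenance))
print(build_group({}, provenance))  # None
```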