3.1 Classical

Introduction

The following installation guide for the Goobi viewer refers to Ubuntu Linux 22.04. It is written as a step-by-step guide from top to bottom, meaning that settings and configurations build on each other. If the order is not followed, certain commands may fail.

VIEWER.EXAMPLE.ORG is used as the domain name in this manual, please adapt this to your own DNS name.

For productive use, a virtual machine with at least 4 CPUs and 8GB RAM is recommended.

Preparation

First, log on to the server on which the Goobi viewer is to be installed and obtain root rights:

ssh VIEWER.EXAMPLE.ORG
sudo -i

Then generate a password for the Goobi viewer database and a token and store them as a variable in the session. The DNS name is also stored there:

export VIEWER_HOSTNAME=VIEWER.EXAMPLE.ORG
export PW_SQL_VIEWER=SECRETPASSWORD
export VIEWER_USERNAME=goobi@intranda.com
export VIEWER_USERPASS=SECRETPASSWORD
export TOKEN=$(uuidgen)
export install=/tmp/install

Now install the following packages:

apt -y install git patch openjdk-17-jdk-headless tomcat9 mariadb-server apache2 ttf-mscorefonts-installer unzip zookeeperd python3-passlib python3-bcrypt libopenjp2-7

At the end of the preparation, a temporary directory for the installation must be created and the Goobi viewer Core Config Repository must be cloned. This contains the files necessary for the installation:

mkdir -p $install
cd $install
git clone https://github.com/intranda/goobi-viewer-core-config.git

It is recommended to have set a DNS entry for the server at this time.

Variables and aliases

Add the following aliases to /root/.bash_aliases:

cat << "EOF" >>/root/.bash_aliases
alias cata='journalctl -u tomcat9 -n 1000 -f'
alias ind='tail -n 1000 -f /opt/digiverso/logs/indexer.log'
alias vl='tail -n 1000 -f /opt/digiverso/logs/viewer.log'
EOF

Adopt the changes in the current session:

. /root/.bashrc

Create directory structure

The following commands create the necessary folder structure:

mkdir -p /opt/digiverso/{indexer,logs,viewer/{abbyy,cmdi,deleted_mets,hotfolder,media,orig_lido,orig_denkxweb,success,ugc,alto,cms_media,error_mets,indexed_lido,mix,pdf,tei,updated_mets,cache,config/{PDFTitlePage,watermark},fulltext,indexed_mets,oai/token,ptif,themes,wc}}
chown -R tomcat: /opt/digiverso/{indexer,logs,viewer}

Configure services

Tomcat

SYSTEMD_EDITOR=tee systemctl edit tomcat9 << "EOF"
[Service]
LogsDirectoryMode=755
CacheDirectoryMode=755
ProtectSystem=true
NoNewPrivileges=true
ReadWritePaths=
EOF

In /etc/default/tomcat9 adjust the memory under -Xmx to the available machine memory, choose reasonable garbage collector options and use urandom for faster Tomcat startup:

patch /etc/default/tomcat9 << "EOF"
@@ -1,16 +1,23 @@
 # The home directory of the Java development kit (JDK). You need at least
 # JDK version 8. If JAVA_HOME is not set, some common directories for
 # OpenJDK and the Oracle JDK are tried.
-#JAVA_HOME=/usr/lib/jvm/java-8-openjdk
+JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64/

 # You may pass JVM startup parameters to Java here. If you run Tomcat with
 # Java 8 instead of 9 or newer, add "-XX:+UseG1GC" to select a suitable GC.
 # If unset, the default options will be: -Djava.awt.headless=true
-JAVA_OPTS="-Djava.awt.headless=true"

 # To enable remote debugging uncomment the following line.
 # You will then be able to use a Java debugger on port 8000.
 #JAVA_OPTS="${JAVA_OPTS} -agentlib:jdwp=transport=dt_socket,address=8000,server=y,suspend=n"
+JAVA_OPTS="-Djava.awt.headless=true -Xmx4g -Xms4g"
+JAVA_OPTS="${JAVA_OPTS} -XX:+UseG1GC"
+JAVA_OPTS="${JAVA_OPTS} -XX:+ParallelRefProcEnabled"
+JAVA_OPTS="${JAVA_OPTS} -XX:+DisableExplicitGC"
+JAVA_OPTS="${JAVA_OPTS} -Djava.security.egd=file:/dev/./urandom"
+JAVA_OPTS="${JAVA_OPTS} -Dfile.encoding='utf-8'"
+JAVA_OPTS="${JAVA_OPTS} --add-exports=java.desktop/sun.awt.image=ALL-UNNAMED"
+
+UMASK=0022

 # Java compiler to use for translating JavaServer Pages (JSPs). You can use all
 # compilers that are accepted by Ant's build.compiler property.
@@ -20,4 +27,4 @@
 #SECURITY_MANAGER=true

 # Whether to compress logfiles older than today's
-#LOGFILE_COMPRESS=1
+LOGFILE_COMPRESS=1
EOF

Further configurations are made in /etc/tomcat9/server.xml. This sets up a HTTP connector (with correct proxyName) and an AJP connector on localhost. A Crawler Session Manager Valve is also activated:

sed -e "s|VIEWER.EXAMPLE.ORG|${VIEWER_HOSTNAME}|g" << "EOF" | patch /etc/tomcat9/server.xml
@@ -66,58 +66,22 @@
          APR (HTTP/AJP) Connector: /docs/apr.html
          Define a non-SSL/TLS HTTP/1.1 Connector on port 8080
     -->
-    <Connector port="8080" protocol="HTTP/1.1"
-               connectionTimeout="20000"
-               redirectPort="8443" />
-    <!-- A "Connector" using the shared thread pool-->
-    <!--
-    <Connector executor="tomcatThreadPool"
-               port="8080" protocol="HTTP/1.1"
-               connectionTimeout="20000"
-               redirectPort="8443" />
-    -->
-    <!-- Define an SSL/TLS HTTP/1.1 Connector on port 8443
-         This connector uses the NIO implementation. The default
-         SSLImplementation will depend on the presence of the APR/native
-         library and the useOpenSSL attribute of the AprLifecycleListener.
-         Either JSSE or OpenSSL style configuration may be used regardless of
-         the SSLImplementation selected. JSSE style configuration is used below.
-    -->
-    <!--
-    <Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
-               maxThreads="150" SSLEnabled="true">
-        <SSLHostConfig>
-            <Certificate certificateKeystoreFile="conf/localhost-rsa.jks"
-                         type="RSA" />
-        </SSLHostConfig>
-    </Connector>
-    -->
-    <!-- Define an SSL/TLS HTTP/1.1 Connector on port 8443 with HTTP/2
-         This connector uses the APR/native implementation which always uses
-         OpenSSL for TLS.
-         Either JSSE or OpenSSL style configuration may be used. OpenSSL style
-         configuration is used below.
-    -->
-    <!--
-    <Connector port="8443" protocol="org.apache.coyote.http11.Http11AprProtocol"
-               maxThreads="150" SSLEnabled="true" >
-        <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol" />
-        <SSLHostConfig>
-            <Certificate certificateKeyFile="conf/localhost-rsa-key.pem"
-                         certificateFile="conf/localhost-rsa-cert.pem"
-                         certificateChainFile="conf/localhost-rsa-chain.pem"
-                         type="RSA" />
-        </SSLHostConfig>
-    </Connector>
-    -->
+      <Connector address="127.0.0.1" port="8080" protocol="HTTP/1.1"
+              server=" "
+              connectionTimeout="20000"
+              maxThreads="400"
+              URIEncoding="UTF-8"
+              enableLookups="false"
+              disableUploadTimeout="true"
+              proxyName="VIEWER.EXAMPLE.ORG"
+              proxyPort="80" />
+
+      <Connector address="127.0.0.1" port="8009" protocol="AJP/1.3"
+              secretRequired="false"
+              connectionTimeout="20000"
+              maxThreads="400"
+              URIEncoding="UTF-8" />

-    <!-- Define an AJP 1.3 Connector on port 8009 -->
-    <!--
-    <Connector protocol="AJP/1.3"
-               address="::1"
-               port="8009"
-               redirectPort="8443" />
-    -->

     <!-- An Engine represents the entry point (within Catalina) that processes
          every request.  The Engine implementation for Tomcat stand alone
@@ -160,9 +124,14 @@
         <!-- Access log processes all example.
              Documentation at: /docs/config/valve.html
              Note: The pattern used is equivalent to using pattern="common" -->
+      <!--
         <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
                prefix="localhost_access_log" suffix=".txt"
-               pattern="%h %l %u %t &quot;%r&quot; %s %b" />
+              pattern="%h %l %u %t &quot;%r&quot; %s %b" />
+      -->
+      <Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
+              crawlerUserAgents=".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*|.*Apache-HttpClient.*|.*[Ss]pider.*|.*[Cc]rawler.*|.*nagios.*|.*Yandex.*|.*facebookexternalhit.*|.*bytedance.com.*|.*Turnitin.*|.*GoogleOther.*|.*python-requests.*|.*check_http.*"
+              sessionInactiveInterval="60"/>

       </Host>
     </Engine>
EOF

Disable session persistence in /etc/tomcat9/context.xml by running the following patch:

patch /etc/tomcat9/context.xml << "EOF"
@@ -25,7 +25,8 @@
     <WatchedResource>${catalina.base}/conf/web.xml</WatchedResource>
 
     <!-- Uncomment this to disable session persistence across Tomcat restarts -->
-    <!--
     <Manager pathname="" />
-    -->
+
+    <!-- Set mode for the JSESSONID cookie. Google authentication needs "lax" -->
+    <CookieProcessor sameSiteCookies="strict" />
 </Context>
EOF

Apache

Apache must be set up to make the viewer available externally. Activate the following modules for this:

a2enmod proxy_ajp
a2enmod proxy_http
a2enmod proxy_wstunnel
a2enmod rewrite
a2enmod expires
a2enmod headers
a2enmod http2

Place the following file at /etc/apache2/sites-available/${VIEWER_HOSTNAME}.conf:

sed -e "s|VIEWER.EXAMPLE.ORG|${VIEWER_HOSTNAME}|g" << "EOF" >/etc/apache2/sites-available/${VIEWER_HOSTNAME}.conf
<VirtualHost *:80>
        ServerAdmin support@intranda.com
        ServerName VIEWER.EXAMPLE.ORG
        DocumentRoot /var/www

        ## make sure rewrite is enabled
        RewriteEngine On 

        ## search engines: do not follow certain urls
        RewriteCond %{HTTP_USER_AGENT} ^.*bot.*$ [NC,OR]
        RewriteCond %{HTTP_USER_AGENT} ^.*Yandex.*$ [NC,OR]
        RewriteCond %{HTTP_USER_AGENT} ^.*spider.*$ [NC,OR]
        RewriteCond %{HTTP_USER_AGENT} ^.*rawler.*$ [NC]
        ReWriteRule ^(.*);jsessionid=[A-Za-z0-9]+(.*)$ $1$2 [L,R=301]

        RewriteCond %{HTTP_USER_AGENT} ^.*bot.*$ [NC,OR]
        RewriteCond %{HTTP_USER_AGENT} ^.*Yandex.*$ [NC,OR]
        RewriteCond %{HTTP_USER_AGENT} ^.*spider.*$ [NC,OR]
        RewriteCond %{HTTP_USER_AGENT} ^.*rawler.*$ [NC]
        ReWriteRule ^(.*)viewer/!(.*)$ $1viewer/$2 [L,R=301]

        RewriteCond %{HTTP_USER_AGENT} ^.*bot.*$ [NC,OR]
        RewriteCond %{HTTP_USER_AGENT} ^.*Yandex.*$ [NC,OR]
        RewriteCond %{HTTP_USER_AGENT} ^.*spider.*$ [NC,OR]
        RewriteCond %{HTTP_USER_AGENT} ^.*rawler.*$ [NC]
        ReWriteRule ^(.*)/[Ll][Oo][Gg]_(.*)$ $1/ [L,R=301]

        ## compress output
        <IfModule mod_deflate.c>
                AddOutputFilterByType DEFLATE text/plain text/html text/xml
                AddOutputFilterByType DEFLATE text/css text/javascript
                AddOutputFilterByType DEFLATE application/xml application/xhtml+xml
                AddOutputFilterByType DEFLATE application/rss+xml
                AddOutputFilterByType DEFLATE application/javascript application/x-javascript
                AddOutputFilterByType DEFLATE application/json
        </IfModule>

        ## general proxy settings
        ProxyPreserveHost On
        <Proxy *>
                Require local
        </Proxy>

        ## CORS for IIIF
        Header set Access-Control-Allow-Origin "*"
        Header always set Access-Control-Allow-Methods "GET, OPTIONS"
        Header always set Access-Control-Max-Age "600"
        Header always set Access-Control-Allow-Headers "Authorization, Content-Type"
        Header always set Access-Control-Expose-Headers "Content-Security-Policy, Location"

        RewriteCond %{REQUEST_METHOD} OPTIONS
        RewriteRule ^(.*)$ $1 [R=200,L]

        # make sure ETag headers are forwarded correctly
        # Post Apache 2.4 have a look at
        # https://httpd.apache.org/docs/trunk/mod/mod_deflate.html#deflatealteretag
        RequestHeader edit "If-None-Match" '(.*)-gzip"$' '$1", $1-gzip"'

        ## Enable WebSockets to check concurrent access
        RewriteCond %{HTTP:Upgrade} websocket [NC]
        RewriteCond %{HTTP:Connection} upgrade [NC]
        RewriteRule /?(.*) ws://localhost:8080/$1 [P,L]

        ## Security
        # If set to 'none' the Matomo iFrame does not work anymore
        Header set Content-Security-Policy "frame-ancestors 'self';"
        Header always append X-Frame-Options DENY
        Header always set X-XSS-Protection "1; mode=block"
        Header always set X-Content-Type-Options nosniff

        ## Viewer
        redirect 301 /index.html http://VIEWER.EXAMPLE.ORG/viewer/
        redirect 301 /viewer http://VIEWER.EXAMPLE.ORG/viewer/

        ProxyPassMatch ^/viewer/(.*)$ ajp://localhost:8009/viewer/$1 retry=0
        <LocationMatch ^/viewer/(.*)$>
                ProxyPassReverse ajp://localhost:8009/viewer/$1

                <IfModule mod_expires.c>
                        ExpiresActive on

                        ExpiresByType image/jpg "access plus 1 months"
                        ExpiresByType image/gif "access plus 1 months"
                        ExpiresByType image/jpeg "access plus 1 months"
                        ExpiresByType image/png "access plus 1 months"

                        ExpiresByType font/ttf "access plus 1 year"
                        ExpiresByType application/x-font-woff "access plus 1 year"
                        ExpiresByType application/vnd.ms-fontobject "access plus 1 year"
                </IfModule>

                Require all granted  
        </LocationMatch>


        ## solr
        redirect 301 /solr http://VIEWER.EXAMPLE.ORG/solr/
        <Location /solr/>
                Require local
                ProxyPass http://localhost:8983/solr/ retry=0
                ProxyPassReverse http://localhost:8983/solr/
        </Location>


        ## logging
        CustomLog /var/log/apache2/VIEWER.EXAMPLE.ORG_access.log combined
        ErrorLog /var/log/apache2/VIEWER.EXAMPLE.ORG_error.log

        # Possible values include: debug, info, notice, warn, error, crit, alert, emerg.
        LogLevel warn

</VirtualHost>
EOF

Then a bit of hardening of Apache for production use:

patch /etc/apache2/conf-available/security.conf << "EOF"
@@ -22,7 +22,7 @@
 # Set to one of:  Full | OS | Minimal | Minor | Major | Prod
 # where Full conveys the most information, and Prod the least.
 #ServerTokens Minimal
-ServerTokens OS
+ServerTokens Prod
 #ServerTokens Full
 
 #
@@ -33,7 +33,7 @@
 # Set to "EMail" to also include a mailto: link to the ServerAdmin.
 # Set to one of:  On | Off | EMail
 #ServerSignature Off
-ServerSignature On
+ServerSignature Off
 
 #
 # Allow TRACE method
EOF

Enable the Goobi viewer vhost, disable the default vhost and restart the Apache web server:

a2dissite 000-default
a2ensite ${VIEWER_HOSTNAME}.conf
systemctl restart apache2.service

Finally create a robots.txt:

sed -e "s|VIEWER.EXAMPLE.ORG|${VIEWER_HOSTNAME}|g" << "EOF" >>/var/www/robots.txt
User-agent: *
Disallow: /viewer/content*action=pdf
Disallow: /viewer/rest/pdf/*
Disallow: /viewer/api/v1/records/*/pdf/
Disallow: /viewer/*.pdf$
Disallow: /viewer/search/
Disallow: /viewer/tags/
Disallow: /viewer/term/
Disallow: /viewer/oai
Disallow: /viewer/nextHit/
Disallow: /viewer/prevHit/
Disallow: /viewer/login/
Disallow: /viewer/crowd/
Disallow: /viewer/error/
Disallow: /viewer/searchadvanced/

Sitemap: https://VIEWER.EXAMPLE.ORG/viewer/sitemap_index.xml

Crawl-delay: 10
EOF

If UFW is used, Apache must be enabled:

ufw allow Apache\ Full

MySQL / MariaDB

The Goobi viewer requires a database and its own user. This is created with the following command:

mysql -e "CREATE DATABASE viewer;
CREATE USER 'viewer'@'localhost' IDENTIFIED BY '$PW_SQL_VIEWER';
GRANT ALL PRIVILEGES ON viewer.* TO 'viewer'@'localhost' WITH GRANT OPTION;
FLUSH PRIVILEGES;"

The database schema is created automatically the first time the application is started.

NFS

This point is only relevant if Goobi workflow is/is also installed and this is not done on the same machine.

Then the /opt/digiverso/viewer folder must be exported to the Goobi workflow server. NFS is used for this. The adjustments for this are here:

apt install nfs-kernel-server -y

Export of the hotfolder:

export IP_GOOBI=1.2.3.4  # IP-Adresse des Goobi workflow Servers
echo "/opt/digiverso/viewer/hotfolder ${IP_GOOBI}/255.255.255.255(rw,sync,no_subtree_check,all_squash,anonuid=$(id -u tomcat),anongid=$(id -g tomcat))" >> /etc/exports
exportfs -av

If UFW is used, TCP 2049 must be enabled for NFSv4:

ufw allow from $IP_GOOBI proto tcp to any port 2049

The adjustments for Goobi workflow can be found in the installation instructions there:

Installation

The installation of the required components is described below.

Apache Solr

Download Solr and unzip the installation script:

cd $install
wget https://archive.apache.org/dist/solr/solr/9.5.0/solr-9.5.0.tgz
tar -xzf solr-9.5.0.tgz solr-9.5.0/bin/install_solr_service.sh --strip-components=2

Now create the folder /opt/digiverso/solr and install it there:

mkdir -p /opt/digiverso/solr/
./install_solr_service.sh solr-9.5.0.tgz -i /opt/digiverso/solr -d /opt/digiverso/solr -u solr -s solr -p 8983 -n

Now adjust the limits for the user solr:

cat << "EOF" >/etc/security/limits.d/solr.conf
solr hard nofile 65535
solr soft nofile 65535
solr hard nproc 65535
solr soft nproc 65535
EOF

To use streaming expressions Solr is not installed in the standalone version but in the SolrCloud variant. However, we only use one node. For the configuration Zookeeper is used:

Make sure that Zookeeper only listens on localhost:

patch /etc/zookeeper/conf/zoo.cfg << "EOF"
@@ -15,6 +15,7 @@

 # the port at which the clients will connect
 clientPort=2181
+clientPortAddress=127.0.0.1

 # specify all zookeeper servers
 # The fist port is used by followers to connect to the leader
EOF
systemctl restart zookeeper

If dataDir in /etc/zookeeper/conf/zoo.cfg points to /tmp/, change dataDir to a persistent location, e.g. dataDir=/var/lib/zookeeper-data. Create folder and set persmissions: mkdir /var/lib/zookeeper-data; chown zookeeper: /var/lib/zookeeper-data; chmod 750 /var/lib/zookeeper-data; systemctl restart zookeeper

Then make the installation known to Solr and adjust the memory, the garbage collector options and the log level. Furthermore, the Solr should only listen on localhost:

patch /etc/default/solr.in.sh << "EOF"
@@ -32,7 +32,7 @@
 #SOLR_START_WAIT="$SOLR_STOP_WAIT"
 
 # Increase Java Heap as needed to support your indexing / query needs
-#SOLR_HEAP="512m"
+SOLR_HEAP="2048m"
 
 # Expert: If you want finer control over memory options, specify them directly
 # Comment out SOLR_HEAP if you are using this though, that takes precedence
@@ -50,25 +50,19 @@
 #  -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime"
 
 # These GC settings have shown to work well for a number of common Solr workloads
-#GC_TUNE=" \
-#-XX:+ExplicitGCInvokesConcurrent \
-#-XX:SurvivorRatio=4 \
-#-XX:TargetSurvivorRatio=90 \
-#-XX:MaxTenuringThreshold=8 \
-#-XX:+UseConcMarkSweepGC \
-#-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-#-XX:+CMSScavengeBeforeRemark \
-#-XX:PretenureSizeThreshold=64m \
-#-XX:+UseCMSInitiatingOccupancyOnly \
-#-XX:CMSInitiatingOccupancyFraction=50 \
-#-XX:CMSMaxAbortablePrecleanTime=6000 \
-#-XX:+CMSParallelRemarkEnabled \
-#-XX:+ParallelRefProcEnabled        etc.
+GC_TUNE=" \
+-XX:+ExplicitGCInvokesConcurrent \
+-XX:SurvivorRatio=4 \
+-XX:TargetSurvivorRatio=90 \
+-XX:MaxTenuringThreshold=8 \
+-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
+-XX:PretenureSizeThreshold=64m \
+-XX:+ParallelRefProcEnabled"
 
 # Set the ZooKeeper connection string if using an external ZooKeeper ensemble
 # e.g. host1:2181,host2:2181/chroot
 # Leave empty if not using SolrCloud
-#ZK_HOST=""
+ZK_HOST="127.0.0.1:2181"
 
 # Set to true if your ZK host has a chroot path, and you want to create it automatically.
 #ZK_CREATE_CHROOT=true
@@ -129,7 +123,7 @@
 
 # Changes the logging level. Valid values: ALL, TRACE, DEBUG, INFO, WARN, ERROR, FATAL, OFF. Default is INFO
 # This is an alternative to changing the rootLogger in log4j2.xml
-#SOLR_LOG_LEVEL=INFO
+SOLR_LOG_LEVEL=ERROR
 
 # Location where Solr should write logs to. Absolute or relative to solr start dir
 #SOLR_LOGS_DIR=logs
EOF

Also ensure that Solr is only started after Zookeeper:

patch /etc/init.d/solr << EOF
@@ -16,7 +16,7 @@
 
 ### BEGIN INIT INFO
 # Provides: solr
-# Required-Start:    \$remote_fs \$syslog
+# Required-Start:    \$remote_fs \$syslog zookeeper
 # Required-Stop:     \$remote_fs \$syslog
 # Default-Start:     2 3 4 5
 # Default-Stop:      0 1 6
EOF
systemctl daemon-reload

Now make the Solr script executable:

chmod 755 /opt/digiverso/solr/solr/bin/solr

Now the configuration is stored in a new configset:

cd /opt/digiverso/solr/solr/server/solr/configsets/
cp -a _default/ goobiviewer
cd goobiviewer/conf/
rm managed-schema.xml
wget -O schema.xml https://raw.githubusercontent.com/intranda/goobi-viewer-indexer/master/goobi-viewer-indexer/src/main/resources/other/schema.xml
cp lang/stopwords_de.txt lang/stopwords.txt
wget https://raw.githubusercontent.com/intranda/goobi-viewer-indexer/master/goobi-viewer-indexer/src/main/resources/other/mapping-ISOLatin1Accent.txt
patch solrconfig.xml << "EOF"
@@ -36,6 +36,7 @@
        affect both how text is indexed and queried.
   -->
   <luceneMatchVersion>9.9</luceneMatchVersion>
+  <schemaFactory class="ClassicIndexSchemaFactory"/>
 
   <!-- <lib/> directives can be used to instruct Solr to load any Jars
        identified and use them to resolve any "plugins" specified in
@@ -72,7 +73,7 @@
        The example below can be used to load a Solr Module along
        with their external dependencies.
     -->
-    <!-- <lib dir="${solr.install.dir:../../../..}/modules/ltr/lib" regex=".*\.jar" /> -->
+    <lib dir="${solr.install.dir:../../../..}/modules/analysis-extras/lib" regex=".*\.jar" />
 
   <!-- an exact 'path' can be used instead of a 'dir' to specify a
        specific jar file.  This will cause a serious error to be logged
@@ -666,7 +667,7 @@
   <!-- Shared parameters for multiple Request Handlers -->
   <initParams path="/update/**,/query,/select,/spell">
     <lst name="defaults">
-      <str name="df">_text_</str>
+      <str name="df">DEFAULT</str>
     </lst>
   </initParams>
 
@@ -939,7 +940,7 @@
   </updateProcessor>
 
   <!-- The update.autoCreateFields property can be turned to false to disable schemaless mode -->
-  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:true}"
+  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:false}"
            processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
     <processor class="solr.LogUpdateProcessorFactory"/>
     <processor class="solr.DistributedUpdateProcessorFactory"/>
EOF
chown -R solr: *

Upload the new configset to Zookeeper now:

cd /opt/digiverso/solr/solr/
sudo -u solr bin/solr zk upconfig -n goobiviewer -d server/solr/configsets/goobiviewer/

The JTS library is still needed for the search for geocoordinates:

wget -O /opt/digiverso/solr/solr/server/solr-webapp/webapp/WEB-INF/lib/jts-core-1.17.0.jar https://github.com/locationtech/jts/releases/download/1.17.0/jts-core-1.17.0.jar
chown solr: /opt/digiverso/solr/solr/server/solr-webapp/webapp/WEB-INF/lib/jts-core-1.17.0.jar

Finally, start the service and create a new collection for the Goobi viewer with its configuration files:Solr in Tomcat:

systemctl start solr
sudo -u solr bin/solr create -c collection1 -n goobiviewer

Goobi viewer Indexer

First the application is downloaded. Then copy the required files to /opt/digiverso/indexer/, check the Tomcat port and replace C: with /opt if necessary. Finally the systemd service unit is activated:

wget -O /opt/digiverso/indexer/solrIndexer.jar https://github.com/intranda/goobi-viewer-indexer/releases/latest/download/solrIndexer.jar
wget -O /opt/digiverso/indexer/config_indexer.xml https://raw.githubusercontent.com/intranda/goobi-viewer-indexer/master/goobi-viewer-indexer/src/main/resources/config_indexer.xml
sed -e 's|<solrUrl>.*</solrUrl>|<solrUrl>http://localhost:8983/solr/collection1</solrUrl>|' -e 's|C:/|/|g' -i /opt/digiverso/indexer/config_indexer.xml
wget -O /etc/systemd/system/solrindexer.service https://raw.githubusercontent.com/intranda/goobi-viewer-indexer/master/goobi-viewer-indexer/src/main/resources/other/solrindexer.service
systemctl enable solrindexer.service

Goobi viewer

Stop the Tomcat service:

systemctl stop tomcat9

Then put the following file to /etc/tomcat9/Catalina/localhost/viewer.xml:

sed -e "s|PW_SQL_VIEWER|${PW_SQL_VIEWER}|g" << "EOF" >/etc/tomcat9/Catalina/localhost/viewer.xml
<?xml version='1.0' encoding='UTF-8'?>
<Context>
    <Resources>
        <PreResources 
            className="org.apache.catalina.webresources.DirResourceSet"
            base="/opt/digiverso/viewer/themes/goobi-viewer-theme-reference/goobi-viewer-theme-reference/WebContent/resources/themes/"
            webAppMount="/resources/themes" />  
    </Resources>
    <Resource
        name="viewer" 
        auth="Container" 
        factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" 
        type="javax.sql.DataSource" 
        driverClassName="org.mariadb.jdbc.Driver" 
        username="viewer" 
        password="PW_SQL_VIEWER" 
        maxActive="100" 
        maxIdle="30" 
        minIdle="4" 
        maxWait="10000" 
        testOnBorrow="true" 
        testWhileIdle="true" 
        validationQuery="SELECT SQL_NO_CACHE 1" 
        removeAbandoned="true" 
        removeAbandonedTimeout="600" 
        url="jdbc:mariadb://localhost/viewer?characterEncoding=UTF-8&amp;autoReconnectForPools=true" />
    <Valve 
        className="org.apache.catalina.valves.ErrorReportValve" 
        showReport="false" 
        showServerInfo="false" />
</Context>
EOF

Now adjust the file rights so that the file is not deleted with an update:

chown -R root:tomcat /etc/tomcat9/Catalina
chmod -R g-w /etc/tomcat9/Catalina

The theme is integrated as an external theme:

mkdir -p /opt/digiverso/viewer/themes/
cd /opt/digiverso/viewer/themes/
git clone https://github.com/intranda/goobi-viewer-theme-reference.git

Now download the Goobi viewer and place it in the expected location in the file system:

wget -O /var/lib/tomcat9/webapps/viewer.war https://github.com/intranda/goobi-viewer-theme-reference/releases/latest/download/viewer.war

Finally move the last configuration files and restart the Tomcat service:

mv $install/goobi-viewer-core-config/goobi-viewer-core-config/src/main/resources/install/* /opt/digiverso/viewer/config/
cp /opt/digiverso/solr/solr/server/solr/configsets/goobiviewer/conf/lang/stopwords.txt /opt/digiverso/viewer/config/
chown -R tomcat: /opt/digiverso/viewer/
systemctl start tomcat9

Goobi viewer Connector

For the installation of the OAI and SRU interface, the application must first be placed in the expected location in the file system, and rights have to be adjusted:

wget -O /opt/digiverso/viewer/oai/MARC21slimUtils.xsl https://raw.githubusercontent.com/intranda/goobi-viewer-connector/master/goobi-viewer-connector/src/main/resources/MARC21slimUtils.xsl
wget -O /opt/digiverso/viewer/oai/MODS2MARC21slim.xsl https://raw.githubusercontent.com/intranda/goobi-viewer-connector/master/goobi-viewer-connector/src/main/resources/MODS2MARC21slim.xsl
chown -R tomcat: /opt/digiverso/viewer/oai/

Finally, place a local config_oai.xml in the file system:

sed -e "s|VIEWER.EXAMPLE.ORG|${VIEWER_HOSTNAME}|g" << "EOF" >/opt/digiverso/viewer/config/config_oai.xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
        <identifyTags>
                <baseURL useInRequestElement="true">http://VIEWER.EXAMPLE.ORG/viewer/oai</baseURL>
                <adminEmail>support@intranda.com</adminEmail>
        </identifyTags>
        <solr>
                <solrUrl>http://localhost:8983/solr/collection1</solrUrl>
        </solr>
        <urnResolverUrl>http://VIEWER.EXAMPLE.ORG/viewer/resolver?urn=</urnResolverUrl>
        <piResolverUrl>http://VIEWER.EXAMPLE.ORG/viewer/piresolver?id=</piResolverUrl>
</config>
EOF

Settings

Cronjob

Cronjobs must be set up for regular tasks:

sed -e "s|VIEWER.EXAMPLE.ORG|${VIEWER_HOSTNAME}|g" -e "s|TOKEN|${TOKEN}|g" << "EOF" >/etc/cron.d/intranda-goobiviewer
PATH=/usr/bin:/bin:/usr/sbin/
MAILTO=admin@intranda.com

#
# Regular cron jobs for the Goobi viewer
#

## Optimize the Solr search index once a month
@monthly            root    curl -s 'http://localhost:8983/solr/collection1/update?optimize=true&waitFlush=false'
EOF

config_viewer.xml

In the local config_viewer.xml different settings have to be stored:

sed -e "s|VIEWER.EXAMPLE.ORG|${VIEWER_HOSTNAME}/viewer|g" -e "s|TOKEN|${TOKEN}|g" << "EOF" >/opt/digiverso/viewer/config/config_viewer.xml
<?xml version="1.0" encoding="UTF-8" ?>
<config>
        <urls>
                <metadata>
                        <sourcefile>http://VIEWER.EXAMPLE.ORG/sourcefile?id=</sourcefile>
                        <marc>http://VIEWER.EXAMPLE.ORG/oai?verb=GetRecord&amp;metadataPrefix=marcxml&amp;identifier=</marc>
                        <dc>http://VIEWER.EXAMPLE.ORG/oai?verb=GetRecord&amp;metadataPrefix=oai_dc&amp;identifier=</dc>
                        <ese>http://VIEWER.EXAMPLE.ORG/oai?verb=GetRecord&amp;metadataPrefix=europeana&amp;identifier=</ese>
                </metadata>

                <download>http://VIEWER.EXAMPLE.ORG/download/</download>
                <rest>http://VIEWER.EXAMPLE.ORG/api/v1/</rest>
                <solr>http://localhost:8983/solr/collection1</solr>
        </urls>

        <viewer>
                <theme mainTheme="reference" discriminatorField="">
                        <rootPath>/opt/digiverso/goobi-viewer-theme-reference/goobi-viewer-theme-reference/WebContent/resources/themes/</rootPath>
                </theme>
        </viewer>

        <rss>
                <numberOfItems>50</numberOfItems>
                <title>Goobi viewer RSS Feed</title>
                <description>new items</description>
                <copyright>(c) Goobi viewer using institution </copyright>
        </rss>

        <webapi>        
                <authorization>
                        <token>TOKEN</token>
                </authorization>
        </webapi>
</config>
EOF

Create user account

The Goobi viewer has a backend. The following command inserts a test account with the user name goobi@intranda.com and the password specified in the beginning into the database:

VIEWER_USERPASS_HASH=$(python3 -c "from passlib.hash import bcrypt; print(bcrypt.using(rounds=10, ident='2a').hash('${VIEWER_USERPASS}'))")
mysql viewer -e "INSERT INTO users (active,email,password_hash,score,superuser) VALUES (1,'${VIEWER_USERNAME}','${VIEWER_USERPASS_HASH}',0,1);"

Last updated