The following installation guide for the Goobi viewer refers to Ubuntu Linux 20.04. It is written as a step-by-step guide from top to bottom, meaning that settings and configurations build on each other. If the order is not followed, certain commands may fail.
VIEWER.EXAMPLE.ORG is used as the domain name throughout this manual; please replace it with your own DNS name.
For productive use, a virtual machine with at least 4 CPUs and 8GB RAM is recommended.
Preparation
First, log on to the server on which the Goobi viewer is to be installed and obtain root rights:
ssh VIEWER.EXAMPLE.ORG
sudo -i
Then generate a password for the Goobi viewer database as well as a token, and store them as variables in the session. The DNS name is also stored there:
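The command block for this step is not reproduced here; a minimal sketch, assuming the variable names VIEWER_HOSTNAME, PW_SQL_VIEWER and TOKEN (they match the references used in later commands of this guide) and using openssl rand as one possible way to generate random secrets:

```shell
export VIEWER_HOSTNAME=VIEWER.EXAMPLE.ORG      # adapt to your own DNS name
export PW_SQL_VIEWER=$(openssl rand -hex 16)   # password for the viewer database
export TOKEN=$(openssl rand -hex 16)           # token for the REST API
```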
At the end of the preparation, create a temporary directory for the installation and clone the Goobi viewer Core Config repository, which contains the files necessary for the installation:
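A minimal sketch of this step, assuming /tmp/install as the temporary directory (later commands in this guide reference it as $install) and the public repository location on GitHub:

```shell
export install=/tmp/install
mkdir -p $install
cd $install
# repository URL is an assumption; adapt it to your actual checkout location
git clone https://github.com/intranda/goobi-viewer-core-config.git
```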
In /etc/default/tomcat9 adjust the memory under -Xmx to the available machine memory, choose reasonable garbage collector options and use urandom for faster Tomcat startup:
patch /etc/default/tomcat9 << "EOF"
@@ -11,6 +11,15 @@
 # To enable remote debugging uncomment the following line.
 # You will then be able to use a Java debugger on port 8000.
 #JAVA_OPTS="${JAVA_OPTS} -agentlib:jdwp=transport=dt_socket,address=8000,server=y,suspend=n"
+JAVA_OPTS="-Djava.awt.headless=true -Xmx4g -Xms4g"
+JAVA_OPTS="${JAVA_OPTS} -XX:+UseG1GC"
+JAVA_OPTS="${JAVA_OPTS} -XX:+ParallelRefProcEnabled"
+JAVA_OPTS="${JAVA_OPTS} -XX:+DisableExplicitGC"
+JAVA_OPTS="${JAVA_OPTS} -XX:+CMSClassUnloadingEnabled"
+JAVA_OPTS="${JAVA_OPTS} -Djava.security.egd=file:/dev/./urandom"
+JAVA_OPTS="${JAVA_OPTS} -Dfile.encoding='utf-8'"
+
+UMASK=0022
 
 # Java compiler to use for translating JavaServer Pages (JSPs). You can use all
 # compilers that are accepted by Ant's build.compiler property.
@@ -20,4 +29,4 @@
 #SECURITY_MANAGER=true
 
 # Whether to compress logfiles older than today's
-#LOGFILE_COMPRESS=1
+LOGFILE_COMPRESS=1
EOF
Further configurations are made in /etc/tomcat9/server.xml. This disables the appContextProtection for the JreMemoryLeakPreventionListener, sets up an HTTP connector (with the correct proxyName) and an AJP connector on localhost, and activates a Crawler Session Manager Valve:
sed -e "s|VIEWER.EXAMPLE.ORG|${VIEWER_HOSTNAME}|g" << "EOF" | patch /etc/tomcat9/server.xml
@@ -66,59 +66,22 @@
APR (HTTP/AJP) Connector: /docs/apr.html
Define a non-SSL/TLS HTTP/1.1 Connector on port 8080
-->
- <Connector port="8080" protocol="HTTP/1.1"
+ <Connector address="127.0.0.1" port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
- redirectPort="8443" />
- <!-- A "Connector" using the shared thread pool-->
- <!--
- <Connector executor="tomcatThreadPool"
- port="8080" protocol="HTTP/1.1"
+ redirectPort="8443"
+ maxThreads="400"
+ URIEncoding="UTF-8"
+ enableLookups="false"
+ disableUploadTimeout="true"
+ proxyName="VIEWER.EXAMPLE.ORG"
+ proxyPort="80" />
+
+ <Connector address="127.0.0.1" port="8009" protocol="AJP/1.3"
+ secretRequired="false"
connectionTimeout="20000"
- redirectPort="8443" />
- -->
- <!-- Define an SSL/TLS HTTP/1.1 Connector on port 8443
- This connector uses the NIO implementation. The default
- SSLImplementation will depend on the presence of the APR/native
- library and the useOpenSSL attribute of the
- AprLifecycleListener.
- Either JSSE or OpenSSL style configuration may be used regardless of
- the SSLImplementation selected. JSSE style configuration is used below.
- -->
- <!--
- <Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
- maxThreads="150" SSLEnabled="true">
- <SSLHostConfig>
- <Certificate certificateKeystoreFile="conf/localhost-rsa.jks"
- type="RSA" />
- </SSLHostConfig>
- </Connector>
- -->
- <!-- Define an SSL/TLS HTTP/1.1 Connector on port 8443 with HTTP/2
- This connector uses the APR/native implementation which always uses
- OpenSSL for TLS.
- Either JSSE or OpenSSL style configuration may be used. OpenSSL style
- configuration is used below.
- -->
- <!--
- <Connector port="8443" protocol="org.apache.coyote.http11.Http11AprProtocol"
- maxThreads="150" SSLEnabled="true" >
- <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol" />
- <SSLHostConfig>
- <Certificate certificateKeyFile="conf/localhost-rsa-key.pem"
- certificateFile="conf/localhost-rsa-cert.pem"
- certificateChainFile="conf/localhost-rsa-chain.pem"
- type="RSA" />
- </SSLHostConfig>
- </Connector>
- -->
+ maxThreads="400"
+ URIEncoding="UTF-8" />
- <!-- Define an AJP 1.3 Connector on port 8009 -->
- <!--
- <Connector protocol="AJP/1.3"
- address="::1"
- port="8009"
- redirectPort="8443" />
- -->
<!-- An Engine represents the entry point (within Catalina) that processes
every request. The Engine implementation for Tomcat stand alone
@@ -161,9 +124,14 @@
<!-- Access log processes all example.
Documentation at: /docs/config/valve.html
Note: The pattern used is equivalent to using pattern="common" -->
+ <!--
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
prefix="localhost_access_log" suffix=".txt"
pattern="%h %l %u %t "%r" %s %b" />
+ -->
+ <Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
+ crawlerUserAgents=".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*|.*Apache-HttpClient.*|.*[Ss]pider.*|.*[Cc]rawler.*|.*nagios.*|.*Yandex.*"
+ sessionInactiveInterval="60"/>
</Host>
</Engine>
EOF
Disable session persistence in /etc/tomcat9/context.xml by running the following patch:
patch /etc/tomcat9/context.xml << "EOF"
@@ -25,7 +25,5 @@
<WatchedResource>${catalina.base}/conf/web.xml</WatchedResource>
<!-- Uncomment this to disable session persistence across Tomcat restarts -->
- <!--
<Manager pathname="" />
- -->
</Context>
EOF
Apache
Apache must be set up to make the viewer available externally. Activate the following modules for this:
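The module list itself is not reproduced above; a typical set for an AJP reverse-proxy setup might look as follows (an assumption — adapt it to the modules your vhost configuration actually uses):

```shell
# enable proxy and rewrite modules typically needed for the viewer vhost
a2enmod proxy_ajp proxy_http rewrite headers expires ssl
systemctl restart apache2
```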
The Goobi viewer requires a database and its own user. This is created with the following command:
mysql -e "CREATE DATABASE viewer;
  CREATE USER 'viewer'@'localhost' IDENTIFIED BY '$PW_SQL_VIEWER';
  GRANT ALL PRIVILEGES ON viewer.* TO 'viewer'@'localhost' WITH GRANT OPTION;
  FLUSH PRIVILEGES;"
The database schema is created automatically the first time the application is started.
NFS
This section is only relevant if Goobi workflow is also installed and does not run on the same machine.
In that case, the /opt/digiverso/viewer folder must be exported to the Goobi workflow server. NFS is used for this; the required adjustments follow:
apt install nfs-kernel-server -y
Export of the hotfolder:
export IP_GOOBI=1.2.3.4    # IP address of the Goobi workflow server
echo "/opt/digiverso/viewer/hotfolder ${IP_GOOBI}/255.255.255.255(rw,sync,no_subtree_check,all_squash,anonuid=$(id -u tomcat),anongid=$(id -g tomcat))" >> /etc/exports
exportfs -av
If UFW is used, TCP 2049 must be enabled for NFSv4:
ufw allow from $IP_GOOBI proto tcp to any port 2049
The adjustments for Goobi workflow can be found in the installation instructions there:
The installation of the required components is described below.
Apache Solr
Download Solr 8.5.2 and extract the installation script from the archive:
cd $install
wget http://archive.apache.org/dist/lucene/solr/8.5.2/solr-8.5.2.tgz
tar -xzf solr-8.5.2.tgz solr-8.5.2/bin/install_solr_service.sh --strip-components=2
Now create the folder /opt/digiverso/solr and install it there:
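The installation call itself could look like the following sketch, using documented install_solr_service.sh options (-i sets the installation directory, -n skips the automatic service start):

```shell
mkdir -p /opt/digiverso/solr
bash ./install_solr_service.sh solr-8.5.2.tgz -i /opt/digiverso/solr -n
```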
cat << "EOF" > /etc/security/limits.d/solr.conf
solr hard nofile 65535
solr soft nofile 65535
solr hard nproc 65535
solr soft nproc 65535
EOF
To be able to use streaming expressions, Solr is installed not in the standalone version but in the SolrCloud variant; however, only a single node is used. ZooKeeper is used for the configuration:
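ZooKeeper is not part of the Solr archive; on Ubuntu 20.04 it can be installed from the distribution repositories (the package name is an assumption — verify it on your system):

```shell
apt install zookeeperd -y
```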
Make sure that Zookeeper only listens on localhost:
patch /etc/zookeeper/conf/zoo.cfg << "EOF"
@@ -15,6 +15,7 @@
 # the port at which the clients will connect
 clientPort=2181
+clientPortAddress=127.0.0.1
 
 # specify all zookeeper servers
 # The fist port is used by followers to connect to the leader
EOF
systemctl restart zookeeper
Then make the installation known to Solr and adjust the memory, the garbage collector options and the log level. Furthermore, Solr should only listen on localhost:
patch /etc/default/solr.in.sh << "EOF"
@@ -30,3 +30,3 @@
 # Increase Java Heap as needed to support your indexing / query needs
-#SOLR_HEAP="512m"
+SOLR_HEAP="2048m"
@@ -48,15 +48,15 @@
 # These GC settings have shown to work well for a number of common Solr workloads
-#GC_TUNE=" \
-#-XX:SurvivorRatio=4 \
-#-XX:TargetSurvivorRatio=90 \
-#-XX:MaxTenuringThreshold=8 \
-#-XX:+UseConcMarkSweepGC \
-#-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-#-XX:+CMSScavengeBeforeRemark \
-#-XX:PretenureSizeThreshold=64m \
-#-XX:+UseCMSInitiatingOccupancyOnly \
-#-XX:CMSInitiatingOccupancyFraction=50 \
-#-XX:CMSMaxAbortablePrecleanTime=6000 \
-#-XX:+CMSParallelRemarkEnabled \
-#-XX:+ParallelRefProcEnabled \
+GC_TUNE=" \
+-XX:SurvivorRatio=4 \
+-XX:TargetSurvivorRatio=90 \
+-XX:MaxTenuringThreshold=8 \
+-XX:+UseConcMarkSweepGC \
+-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
+-XX:+CMSScavengeBeforeRemark \
+-XX:PretenureSizeThreshold=64m \
+-XX:+UseCMSInitiatingOccupancyOnly \
+-XX:CMSInitiatingOccupancyFraction=50 \
+-XX:CMSMaxAbortablePrecleanTime=6000 \
+-XX:+CMSParallelRemarkEnabled \
+-XX:+ParallelRefProcEnabled"
 #-XX:-OmitStackTraceInFastThrow etc.
@@ -66,3 +66,3 @@
 # Leave empty if not using SolrCloud
-#ZK_HOST=""
+ZK_HOST="127.0.0.1:2181"
@@ -73,3 +73,3 @@
 # for production SolrCloud environments to control the hostname exposed to cluster state
-#SOLR_HOST="192.168.1.1"
+SOLR_HOST="127.0.0.1"
@@ -115,3 +115,3 @@
 # This is an alternative to changing the rootLogger in log4j2.xml
-#SOLR_LOG_LEVEL=INFO
+SOLR_LOG_LEVEL=ERROR
@@ -212,3 +212,3 @@
 # host checking can be disabled by using the system property "solr.disable.shardsWhitelist"
-#SOLR_OPTS="$SOLR_OPTS -Dsolr.shardsWhitelist=http://localhost:8983,http://localhost:8984"
+SOLR_OPTS="$SOLR_OPTS -Dsolr.disable.shardsWhitelist"
@@ -234,2 +234,3 @@
 SOLR_PORT="8983"
+SOLR_OPTS="$SOLR_OPTS -Djetty.host=127.0.0.1"
EOF
Now make the Solr script executable:
chmod 755 /opt/digiverso/solr/solr/bin/solr
Now the configuration is stored in a new configset:
cd /opt/digiverso/solr/solr/server/solr/configsets/
cp -a _default/ goobiviewer
cd goobiviewer/conf/
rm managed-schema
wget -O schema.xml https://raw.githubusercontent.com/intranda/goobi-viewer-indexer/master/goobi-viewer-indexer/src/main/resources/other/schema.xml
cp lang/stopwords_de.txt lang/stopwords.txt
wget https://raw.githubusercontent.com/intranda/goobi-viewer-indexer/master/goobi-viewer-indexer/src/main/resources/other/mapping-ISOLatin1Accent.txt
patch solrconfig.xml << "EOF"
@@ -21,6 +21,8 @@
this file, see http://wiki.apache.org/solr/SolrConfigXml.
-->
<config>
+ <schemaFactory class="ClassicIndexSchemaFactory"/>
+
<!-- In all configuration below, a prefix of "solr." for class names
is an alias that causes solr to search appropriate packages,
including org.apache.solr.(search|update|request|core|analysis)
@@ -57,6 +59,10 @@
<lib dir="./lib" />
-->
+ <lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lib" regex=".*\.jar" />
+ <lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lucene-libs" regex=".*\.jar" />
+ <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-analysis-extras-\d.*\.jar" />
+
<!-- A 'dir' option by itself adds any files found in the directory
to the classpath, this is useful for including all jars in a
@@ -773,7 +779,7 @@
<initParams path="/update/**,/query,/select,/spell">
<lst name="defaults">
- <str name="df">_text_</str>
+ <str name="df">DEFAULT</str>
</lst>
</initParams>
@@ -1108,7 +1114,7 @@
</updateProcessor>
<!-- The update.autoCreateFields property can be turned to false to disable schemaless mode -->
- <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:true}"
+ <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:false}"
processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
<processor class="solr.LogUpdateProcessorFactory"/>
<processor class="solr.DistributedUpdateProcessorFactory"/>
EOF
chown -R solr. *
First, download the application. Then copy the required files to /opt/digiverso/indexer/, rename indexerconfig_solr.xml to solr_indexerconfig.xml, check the Tomcat port and replace C: with /opt if necessary. Finally, enable the systemd service unit:
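A hypothetical sketch of these steps — the file names, the location of the downloaded artifacts and the service unit name are assumptions and must be adapted to the actual release:

```shell
mkdir -p /opt/digiverso/indexer
cd /opt/digiverso/indexer
# copy the previously downloaded application and its configuration
cp $install/solrIndexer.jar .                              # hypothetical file name
cp $install/indexerconfig_solr.xml solr_indexerconfig.xml  # rename as described above
# fix Windows-style paths in the configuration
sed -i -e 's|C:|/opt|g' solr_indexerconfig.xml
systemctl enable solrindexer.service                       # hypothetical unit name
```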
Adjust the settings to the correct URL so that the resolver links work, and make sure that the port of the Tomcat on which the Solr service runs is 8080:
sed -e "s|VIEWER.EXAMPLE.ORG|${VIEWER_HOSTNAME}|g" -e "s|TOKEN|${TOKEN}|g" << "EOF" > /etc/cron.d/intranda-goobiviewer
PATH=/usr/bin:/bin:/usr/sbin/
MAILTO=support@intranda.com
#
# Regular cron jobs for the Goobi viewer
#
## This REST call triggers the email notification about new search hits for users,
## that enabled notifications for saved searches
42 8,12,17 * * *  root  curl -s -H "Content-Type: application/json" -H "token:TOKEN" -d '{"type":"NOTIFY_SEARCH_UPDATE"}' http://localhost:8080/viewer/api/v1/tasks/ 1>/dev/null

## This REST call creates an XML sitemap for the Goobi viewer instance. Please always
## call it on its external URL because otherwise the protocol (http/https) might not
## be detected correctly
18 1 * * *  root  curl -s -H "Content-Type: application/json" -H "token:TOKEN" -d '{"type":"UPDATE_SITEMAP"}' http://localhost:8080/viewer/api/v1/tasks/ 1>/dev/null

## These two scripts pull the theme git repository regularly. The @daily part is only
## a reminder for the 1-minute schedule
#*/1 * * * *  root  cd /opt/digiverso/viewer/themes/goobi-viewer-theme-reference; git pull | grep -v -e "Already up.to.date." -e "Bereits aktuell."
#@daily  root  echo "Please look at the git checkout interval for the Goobi viewer theme" | mail -s "Reference: Theme repository is checked out every minute" support@intranda.com

## Optimize the Solr search index once a month
@monthly  root  curl -s 'http://localhost:8080/solr/collection1/update?optimize=true&waitFlush=false'
EOF
config_viewer.xml
Various settings must be stored in the local config_viewer.xml:
The Goobi viewer has a backend. The following command inserts a test account with the user name goobi@intranda.com and the password specified at the beginning into the database:
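As the command itself is not reproduced above, here is a sketch under assumed table and column names; $BCRYPT_HASH stands for a bcrypt hash of the password chosen during preparation:

```shell
# table and column names are assumptions about the viewer database schema
mysql viewer -e "INSERT INTO users (active, email, password_hash, superuser)
                 VALUES (1, 'goobi@intranda.com', '$BCRYPT_HASH', 1);"
```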