4.7. Internet Archive download

Together with the intranda TaskManager, Goobi can also harvest volumes from the Internet Archive efficiently. Two requirements must be met. First, the WebDavCommunicator plugin must be present in TaskManager’s plugin folder. The default path is shown below:


Second, the Internet Archive download plugin itself must also be copied to the plugin folder. The default path is:


Starting the plugin

The call to the Internet Archive harvesting is configured within Goobi in a workflow step as follows:

/usr/bin/java -jar /opt/digiverso/itm/bin/TaskClient.jar 
    -itm http://localhost/itm/service 
    -s http://archive.org/download/${meta.CatalogIDDigital} 
    -d {imagepath}/source/ 
    -n template 
    -e -i {stepid} 
    -T {processtitle} 
    -gid {processid} 


The command parameters are explained in the following table:

Operation of the plugin

To begin with, the plugin receives a URL for a volume in the Internet Archive together with a target directory. It then downloads the files shown below from the URL into the target directory:


If a download fails, that job’s priority will be reduced, and TaskManager will attempt to download the next job. Failed attempts are reported to Goobi, and a warning message is then recorded in the process log.
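The download step described above can be illustrated with a minimal Python sketch. It is not the plugin's actual code: the function name download_volume, its parameters and the injectable fetch callable are assumptions made for illustration, and the list of file names must be supplied by the caller.

```python
import os
import urllib.request
from urllib.parse import quote


def download_volume(source_url, target_dir, filenames,
                    fetch=urllib.request.urlretrieve):
    """Fetch each named file from source_url into target_dir.

    Hypothetical helper: returns the names of the files that could
    not be downloaded, so the caller can report the failure.
    """
    os.makedirs(target_dir, exist_ok=True)
    failed = []
    for name in filenames:
        # Build the file URL below the volume URL, e.g.
        # http://archive.org/download/<identifier>/<file name>
        url = source_url.rstrip("/") + "/" + quote(name)
        destination = os.path.join(target_dir, name)
        try:
            fetch(url, destination)
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses
            failed.append(name)
    return failed
```

A non-empty return value would correspond to a failed download attempt, which in the real system is reported to Goobi and lowers the job's priority.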

Initially, every job in TaskManager is assigned priority level 10. The higher the priority, the sooner a job will be processed. If a job’s priority is reduced to zero, four more download attempts will be made. If all of these fail, the job will be regarded as unsuccessful and will be sent back to Goobi with an error message.
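The priority rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not TaskManager's implementation: the function run_jobs, the callback attempt_download and the reduction of the priority by one level per failure are hypothetical; only the initial priority of 10 and the four further attempts at priority zero come from the text.

```python
import heapq
import itertools

INITIAL_PRIORITY = 10       # from the text
ATTEMPTS_AT_ZERO = 4        # from the text
PRIORITY_STEP = 1           # assumption: one level per failed attempt


def run_jobs(job_ids, attempt_download):
    """Process jobs by priority; attempt_download(job_id) returns True on success."""
    counter = itertools.count()  # tie-breaker among jobs of equal priority
    # heapq is a min-heap, so the priority is stored negated.
    queue = [(-INITIAL_PRIORITY, next(counter), job_id, ATTEMPTS_AT_ZERO)
             for job_id in job_ids]
    heapq.heapify(queue)
    results = {}
    while queue:
        neg_priority, _, job_id, attempts_left = heapq.heappop(queue)
        priority = -neg_priority
        if attempt_download(job_id):
            results[job_id] = "done"
            continue
        if priority > 0:
            # Failed attempt: lower the priority and try other jobs first.
            priority = max(0, priority - PRIORITY_STEP)
        else:
            # Priority is already zero: use up one of the remaining attempts.
            attempts_left -= 1
            if attempts_left == 0:
                results[job_id] = "failed"  # would be sent back with an error
                continue
        heapq.heappush(queue, (-priority, next(counter), job_id, attempts_left))
    return results
```

Under these assumptions, a job that never succeeds is attempted ten times while its priority falls from 10 to zero, then four more times, before it is marked as failed.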
