How can I install Apache Tika 1.20 on Ubuntu 18.04 / Ubuntu 16.04? Apache Tika is an open source toolkit that detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). Tika is useful for search engine indexing, content analysis, translation, and similar tasks.

What is new in Apache Tika 1.20

  • Upgrade to POI 4.0.1
  • Upgrade to PDFBox 2.0.13
  • Integrate/parameterize new angles handling in PDFBox
  • Prevent content within <style> and <script> elements from being written to the ToTextContentHandler
  • Switch child to parent communication to a shared memory-mapped file in tika-server's -spawnChild mode
  • Bulk upgrade of dependencies
  • Upgrade jaxb-runtime and javax.activation
  • Improve language id efficiency in tika-eval
  • Remove duplication of notes in PPT slides
  • Upgrade sqlite “provided” dependency to 3.25.2

In this post, we will discuss the installation of Apache Tika on Ubuntu 18.04 / Ubuntu 16.04 LTS.

Apache Tika dependencies

To build and install Apache Tika on Ubuntu 18.04 / Ubuntu 16.04 LTS, you need:

  • Java Development Kit (JDK), since the Maven build compiles the sources and a JRE alone is not enough
  • Apache Maven

We will install these dependencies first, then download and build Tika on Ubuntu 18.04 / Ubuntu 16.04.

Step 1: Update your Ubuntu system

Start by ensuring you’re running an updated Ubuntu Desktop / Server.

sudo apt update
sudo apt -y upgrade
sudo apt -y install wget curl vim unzip

Step 2: Install Java on Ubuntu 18.04 / Ubuntu 16.04

As of Tika 1.19, building with Java 11 is supported. You can install Java 11 on Ubuntu 18.04 / Ubuntu 16.04 LTS using our previous guide below.

How to Install Java 11 on Ubuntu 18.04 /16.04 / Debian 9

To install Java 8 instead, use the commands below:

sudo add-apt-repository ppa:webupd8team/java
sudo apt update
sudo apt install oracle-java8-set-default

Confirm the installed version of Java (on Java 8, run java -version instead, because the --version flag was only added in Java 9):

$ java --version
java 11.0.1 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
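
If you prefer to script this check, the sketch below compares the installed Java against a minimum version. It is only an illustration: java_major is a hypothetical helper name, and the minimum of 8 is an assumption taken from the Java versions this guide covers.

```shell
#!/usr/bin/env bash
# java_major is a hypothetical helper: it reduces a Java version string
# ("1.8.0_191", "11.0.1") to its major number ("8", "11").
java_major() {
  local v="${1#1.}"        # legacy scheme: drop the leading "1." (1.8 -> 8)
  echo "${v%%[.-]*}"       # keep everything before the first "." or "-"
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr and works on both Java 8 and Java 11.
  installed="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
  major="$(java_major "$installed")"
  # Assumed minimum for this guide: Java 8.
  if [ "${major:-0}" -ge 8 ] 2>/dev/null; then
    echo "Java $installed (major $major) should be fine for the build"
  else
    echo "Java $installed looks too old for this guide"
  fi
else
  echo "java not found on PATH"
fi
```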

Step 3: Install Apache Maven

Install Apache Maven by following our guide:

Install Latest Apache Maven on Ubuntu 18.04 /16.04 / Debian 9
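
Before moving on to the build, it can help to confirm that the tools used in the next step are on your PATH. The following is a minimal sketch; missing_tools is a hypothetical helper, and the list of tools is an assumption based on the commands used in this guide.

```shell
#!/usr/bin/env bash
# missing_tools is a hypothetical helper: it prints each named command
# that cannot be found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# javac ships with the JDK, mvn with Apache Maven;
# wget and unzip are used in the download step.
missing="$(missing_tools javac mvn wget unzip)"
if [ -z "$missing" ]; then
  echo "All build tools found"
else
  echo "Still missing:" $missing
fi
```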

Step 4: Download and Install Apache Tika

Download the Apache Tika source release from the Downloads page. The commands below fetch version 1.20 from the Apache archive.

export VER="1.20"
wget https://archive.apache.org/dist/tika/tika-${VER}-src.zip

Unzip the downloaded file.

unzip tika-${VER}-src.zip

Change to the new folder and run mvn install:

cd tika-${VER}
mvn install
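
The full build also runs the project's test suite, which can take a while. If you only need the built artifacts, Maven's standard -DskipTests property skips running the tests. The wrapper below is a sketch under that assumption; build_tika is a hypothetical helper and not part of Tika.

```shell
#!/usr/bin/env bash
# build_tika is a hypothetical wrapper around the Maven build.
# It must be run from the unpacked tika-${VER} source directory.
build_tika() {
  if [ ! -f pom.xml ]; then
    echo "No pom.xml here; cd into the Tika source directory first" >&2
    return 1
  fi
  if ! command -v mvn >/dev/null 2>&1; then
    echo "mvn not found on PATH" >&2
    return 2
  fi
  # -DskipTests compiles the tests but does not run them.
  mvn clean install -DskipTests
}
```

To use it, change into the source directory first (cd tika-${VER}) and then call build_tika.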

Wait for the installation to finish, then test Tika from within its base directory.
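
One way to run that test is to call the tika-app jar that the Maven build leaves under tika-app/target. The sketch below assumes that output path and the VER variable from Step 4; tika_app_jar is a hypothetical helper.

```shell
#!/usr/bin/env bash
# tika_app_jar is a hypothetical helper: given the source directory and a
# version, it prints the expected path of the tika-app jar if it exists.
tika_app_jar() {
  local jar="$1/tika-app/target/tika-app-$2.jar"
  [ -f "$jar" ] && echo "$jar"
}

VER="${VER:-1.20}"
if jar="$(tika_app_jar . "$VER")" && command -v java >/dev/null 2>&1; then
  java -jar "$jar" --version          # print the Tika version
  java -jar "$jar" --detect pom.xml   # detect the media type of a file
else
  echo "tika-app jar not found; run this from the tika-${VER} directory after the build"
fi
```

Replace pom.xml with any document you want to inspect, and use --text or --metadata in place of --detect to extract its text or metadata.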

Reference:

http://tika.apache.org/1.20/gettingstarted.html
