Apache Hadoop 3.3.1 Installation Steps on Ubuntu (Part 1)

Hadoop Install

In this tutorial, we will walk through the complete process of installing Hadoop 3.3.1 on Ubuntu 20.

Supported Java Versions

  • Apache Hadoop 3.3 and later support Java 8 and Java 11 (runtime only)
  • Compile Hadoop with Java 8. Compiling Hadoop with Java 11 is not supported; see HADOOP-16795, "Java 11 compile support" (status: open)
  • Apache Hadoop 3.0.x to 3.2.x supports only Java 8
  • Apache Hadoop 2.7.x to 2.10.x supports both Java 7 and Java 8
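The support list above can be written as a small lookup. This is only a sketch: the function name supported_java is made up for illustration, and the patterns simply mirror the bullets above.

```shell
# Print the Java major versions that a given Hadoop release runs on,
# following the support list above. supported_java is a hypothetical helper.
supported_java() {
    case "$1" in
        2.7.*|2.8.*|2.9.*|2.10.*) echo "7 8" ;;
        3.0.*|3.1.*|3.2.*)        echo "8" ;;
        3.*)                      echo "8 11" ;;  # 3.3 and later; Java 11 is runtime only
        *)                        echo "unknown"; return 1 ;;
    esac
}

supported_java 3.3.1   # prints: 8 11
```

For the release installed in this tutorial, Java 8 is the safe choice because it covers both compiling and running Hadoop.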

Required software for Linux includes:

  • Java must be installed. Recommended Java versions are described at HadoopJavaVersions.
  • ssh must be installed and sshd must be running if you plan to use the optional start and stop scripts, which manage remote Hadoop daemons.

Steps for Installing Java 8 on Ubuntu

Step 1 – Install Java 8 on Ubuntu

OpenJDK 8 is available in the default Apt repositories, so you can install Java 8 on an Ubuntu system with the following commands.

$ sudo apt update
$ sudo apt install openjdk-8-jdk -y

Step 2 – Verify Java Installation

Java 8 should now be installed on your system. Verify the installed and currently active version with the following command.

$ java -version
openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1ubuntu1-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)
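If you want to check the version in a script rather than by eye, the first line of that output can be parsed. The sketch below assumes the banner format shown above; java_major is a hypothetical helper name. Note that `java -version` writes to standard error, so the live call redirects it.

```shell
# Extract the Java major version from the first line of `java -version`.
# java_major is a hypothetical helper, not part of Java or Hadoop.
java_major() {
    version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$version" in
        1.*) version=${version#1.}; echo "${version%%.*}" ;;  # legacy scheme: 1.8.0_252 -> 8
        *)   echo "${version%%.*}" ;;                         # newer scheme: 11.0.11 -> 11
    esac
}

# Using the sample banner from this tutorial:
java_major 'openjdk version "1.8.0_252"'   # prints: 8

# Using the real JVM, if one is installed:
if command -v java >/dev/null 2>&1; then
    java_major "$(java -version 2>&1 | head -n 1)"
fi
```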

Step 3 – Set Up the JAVA_HOME and JRE_HOME Variables

Now that Java is installed on your Linux system, you need to set the JAVA_HOME and JRE_HOME environment variables.

Edit the system-wide profile file, /etc/profile:

$ sudo nano /etc/profile

Add the following lines at the end of the file:

JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
JRE_HOME=$JAVA_HOME/jre
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export JAVA_HOME
export JRE_HOME
export PATH
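The JAVA_HOME path above depends on where the package was installed. As a rough cross-check, it can be derived from the resolved location of the java binary. This sketch assumes the usual OpenJDK 8 layout on Ubuntu, where the binary sits under jre/bin; java_home_from_binary is a hypothetical helper name.

```shell
# Derive a JAVA_HOME candidate from the full path of the java binary.
# java_home_from_binary is a hypothetical helper for illustration.
java_home_from_binary() {
    home=${1%/bin/java}   # drop the trailing /bin/java
    home=${home%/jre}     # drop a trailing /jre, if present (JDK 8 layout)
    printf '%s\n' "$home"
}

# With the path used in this tutorial:
java_home_from_binary /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
# prints: /usr/lib/jvm/java-8-openjdk-amd64

# With the real binary, if Java is installed:
if command -v java >/dev/null 2>&1; then
    java_home_from_binary "$(readlink -f "$(command -v java)")"
fi
```

Changes to /etc/profile take effect at the next login. To load them into the current shell, run `. /etc/profile` and then `echo $JAVA_HOME`.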

Steps for Installing ssh on Ubuntu

Secure Shell (SSH) is a cryptographic network protocol for operating network services securely over an unsecured network. Typical applications include remote command-line, login, and remote command execution, but any network service can be secured with SSH.

Install ssh on your system using the command below:

$ sudo apt-get install ssh

Type the password for the sudo user and then press Enter.

Install pdsh on your system using the command below:

$ sudo apt-get install pdsh

Type ‘Y’ and then press Enter to continue with the installation process.
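After both installs finish, a quick check can confirm that the tools are on your PATH and that sshd appears to be running. This is a minimal sketch: have_cmd is a hypothetical helper, and the pgrep check only looks for a process named sshd.

```shell
# Return success if the named command is available on PATH.
# have_cmd is a hypothetical helper for illustration.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report which of the required tools can be found.
for tool in ssh sshd pdsh; do
    if have_cmd "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done

# Rough check that an sshd process exists.
if have_cmd pgrep && pgrep -x sshd >/dev/null 2>&1; then
    echo "sshd appears to be running"
else
    echo "sshd does not appear to be running"
fi
```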

Open the .bashrc file in your home directory with the nano editor using the following command:

$ nano ~/.bashrc

Now set the PDSH_RCMD_TYPE environment variable to ssh
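The exact line from the original screenshot is not reproduced in the text, so the following assumes the common form `export PDSH_RCMD_TYPE=ssh`. The sketch adds the line only if it is not already present; ensure_line is a hypothetical helper, and it is demonstrated on a scratch file rather than on your real .bashrc.

```shell
# Append a line to a file unless an identical line already exists.
# ensure_line is a hypothetical helper for illustration.
ensure_line() {
    file=$1
    line=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file; running it twice still yields one line.
demo_rc=$(mktemp)
ensure_line "$demo_rc" 'export PDSH_RCMD_TYPE=ssh'
ensure_line "$demo_rc" 'export PDSH_RCMD_TYPE=ssh'
cat "$demo_rc"   # prints: export PDSH_RCMD_TYPE=ssh
```

To apply it for real, you would call `ensure_line "$HOME/.bashrc" 'export PDSH_RCMD_TYPE=ssh'` and then open a new terminal or run `. ~/.bashrc`.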


Steps for Installing Hadoop on Ubuntu

  • Create a directory, for example:

$ mkdir -p /home/bigdata/hadoop

  • Move to the hadoop directory:

$ cd /home/bigdata/hadoop

Download Hadoop. The download link varies by country, so get it from the Hadoop website: https://hadoop.apache.org/releases.html

A new web page will open. Copy the download link from it.

In the Ubuntu terminal, type:

$ wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz

Then extract the archive, move into the extracted directory, and confirm your location:

$ tar xvf hadoop-3.3.1.tar.gz
$ cd hadoop-3.3.1
$ pwd
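The download and extract steps can be collected into one script with the version in a variable. This is a sketch: the URL pattern is copied from the wget command above and may not hold for other releases or mirrors, and the names hadoop_archive, hadoop_url and fetch_hadoop are hypothetical. fetch_hadoop is defined but not called here, so nothing is downloaded until you run it yourself.

```shell
HADOOP_VERSION=3.3.1

# Archive file name for a given version.
hadoop_archive() {
    printf 'hadoop-%s.tar.gz\n' "$1"
}

# Download URL, assuming the same pattern as the wget command above.
hadoop_url() {
    printf 'https://dlcdn.apache.org/hadoop/common/hadoop-%s/%s\n' "$1" "$(hadoop_archive "$1")"
}

# Create the target directory, download, extract, and enter the result.
fetch_hadoop() {
    version=$1
    dest=$2
    mkdir -p "$dest" || return 1
    cd "$dest" || return 1
    wget "$(hadoop_url "$version")" || return 1
    tar xvf "$(hadoop_archive "$version")" || return 1
    cd "hadoop-$version" && pwd
}

# Show the URL that would be fetched.
hadoop_url "$HADOOP_VERSION"
```

Calling `fetch_hadoop "$HADOOP_VERSION" /home/bigdata/hadoop` would then repeat the manual steps above.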

Edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:

$ cd etc/hadoop
$ nano hadoop-env.sh

Set the Java path in hadoop-env.sh by defining the JAVA_HOME variable there.
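The screenshot with the exact setting is not reproduced in the text. A reasonable assumption is that the line reuses the JAVA_HOME path from Step 3, that is, `export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64`. The sketch below sets that line without opening an editor; set_java_home is a hypothetical helper, it relies on GNU sed's -i option (as found on Ubuntu), and it is demonstrated on a scratch file rather than the real hadoop-env.sh.

```shell
# Set JAVA_HOME in an env file: replace an existing uncommented export,
# or append one if none exists. set_java_home is a hypothetical helper.
set_java_home() {
    env_file=$1
    java_home=$2
    if grep -q '^export JAVA_HOME=' "$env_file"; then
        sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file"
    else
        printf 'export JAVA_HOME=%s\n' "$java_home" >> "$env_file"
    fi
}

# Demonstration on a scratch file that starts with a commented-out setting.
demo_env=$(mktemp)
printf '# export JAVA_HOME=\n' > "$demo_env"
set_java_home "$demo_env" /usr/lib/jvm/java-8-openjdk-amd64
set_java_home "$demo_env" /usr/lib/jvm/java-8-openjdk-amd64
grep '^export JAVA_HOME=' "$demo_env"
# prints: export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```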

By Bhavesh