With this tutorial, we will learn the complete process to install Apache Spark 3.2.0 on Ubuntu 20.
Spark runs on Java 8/11, Scala 2.12, Python 3.6+ and R 3.5+. Python 3.6 support is deprecated as of Spark 3.2.0. Java 8 prior to version 8u201 support is deprecated as of Spark 3.2.0. For the Scala API, Spark 3.2.0 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).
Steps for Installing Apache Spark
Step 1 – Create a directory for example
Step 2 – Move to Apache Spark directory
Step 3 – Download Apache Spark (Link will change with respect to country so please get the download link from Apache Spark website ie https://spark.apache.org/downloads.html)
Step 4 – Extract this tar file
$tar -xzf spark-3.2.0-bin-hadoop3.2.tgz
Step 5 – Open the bashrc files in the nano editor using the following command:
edit .bashrc file located in the user’s home directory and add the following parameters:
Now load the environment variables to the opened session by running the below command
Spark comes with several sample programs. Scala, Java, Python and R examples are in the examples/src/main directory. To run one of the Java or Scala sample programs, use bin/run-example [params] in the top-level Spark directory. (Behind the scenes, this invokes the more general spark-submit script for launching applications). For example,
$run-example SparkPi 10
You can also run Spark interactively through a modified version of the Scala shell. This is a great way to learn the framework.
./bin/spark-shell –master local
The –master option specifies the master URL for a distributed cluster, or local to run locally with one thread, or local[N] to run locally with N threads. You should start by using local for testing. For a full list of options, run Spark shell with the –help option.
Spark Web UI
Open Internet Explorer and type the below URL for Spark Web UI