Bigdata Hadoop

Installing Apache Spark 3 in Local Mode – Command Line (Single Node Cluster) on Windows 10

In this tutorial, we will set up a single-node Spark cluster and run it in local mode using the command line.
Step 1) Let's start by getting the Spark binary. You can download it from the links below:
Download Spark link: https://spark.apache.org/
Windows Utils link: https://github.com/steveloughran/winutils
Step 2) Click on Download.
Step 3) A new web page will open.
i) Choose Spark release 3.0.3.
ii) Choose the package type Pre-built for Apache Hadoop 2.7.
Step 4) Click on Download Spark: spark-3.0.3-bin-hadoop2.7.tgz
Step 5) A new web page will open.
Step 6) Click on the link to download.
Step 7)…
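The download steps come down to two things. First, pick the right package name. Second, after extraction, point a few environment variables at the right folders. The Python sketch below shows both. It rests on two assumptions. The Apache archive URL layout (archive.apache.org/dist/spark/spark-<release>/) is used instead of the mirror link the download page picks. The folder paths are hypothetical examples. Hadoop code running on Windows looks for winutils.exe under %HADOOP_HOME%\bin, which is why HADOOP_HOME is set alongside SPARK_HOME.

```python
from pathlib import PureWindowsPath

# Assumption: past Spark releases stay available under the Apache archive,
# so this root is used instead of the mirror link chosen on the download page.
ARCHIVE_ROOT = "https://archive.apache.org/dist/spark"


def spark_package(release: str, hadoop: str) -> str:
    """File name of a pre-built package, e.g. spark-3.0.3-bin-hadoop2.7.tgz."""
    return f"spark-{release}-bin-hadoop{hadoop}.tgz"


def spark_archive_url(release: str, hadoop: str) -> str:
    """Archive URL for a given Spark release and Hadoop build."""
    return f"{ARCHIVE_ROOT}/spark-{release}/{spark_package(release, hadoop)}"


def windows_env(spark_dir: str, hadoop_dir: str) -> dict:
    """SPARK_HOME and HADOOP_HOME for a local-mode setup on Windows.

    hadoop_dir is a hypothetical folder whose bin subfolder holds the
    winutils.exe taken from the winutils repository.
    """
    return {
        "SPARK_HOME": str(PureWindowsPath(spark_dir)),
        "HADOOP_HOME": str(PureWindowsPath(hadoop_dir)),
    }


def path_entries(env: dict) -> list:
    """Folders to append to PATH so spark-shell and winutils.exe can be found."""
    return [str(PureWindowsPath(env[key]) / "bin") for key in ("SPARK_HOME", "HADOOP_HOME")]


if __name__ == "__main__":
    print(spark_archive_url("3.0.3", "2.7"))
    env = windows_env(r"C:\spark\spark-3.0.3-bin-hadoop2.7", r"C:\hadoop")
    for key, value in env.items():
        print(f"{key}={value}")
    print("PATH additions:", ";".join(path_entries(env)))
```

The script only prints values. You still set them yourself as user environment variables (System Properties > Environment Variables) and open a new command prompt afterwards.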
Install Apache Spark On Ubuntu

With this tutorial, we will learn the complete process to install Apache Spark 3.2.0 on Ubuntu 20.
Prerequisite: Spark runs on Java 8/11, Scala 2.12, Python 3.6+ and R 3.5+. Python 3.6 support is deprecated as of Spark 3.2.0. Support for Java 8 versions prior to 8u201 is deprecated as of Spark 3.2.0. For the Scala API, Spark 3.2.0 uses Scala 2.12, so you will need to use a compatible Scala version (2.12.x).
Steps for Installing Apache Spark
Step 1 - Create a directory, for example: $ mkdir /home/bigdata/apachespark
Step 2 - Move to the Apache Spark directory: $ cd /home/bigdata/apachespark
Step 3 - Download…
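Before downloading, it can help to confirm that the Java and Python versions match the prerequisites listed above. The sketch below assumes two banner formats for `java -version`: the legacy `1.8.0_292` style and the newer `11.0.11` style. It checks only Java and Python, not Scala or R. The warning texts are illustrative and are not Spark's own messages.

```python
import re
import subprocess
import sys


def parse_java_version(banner: str) -> tuple:
    """Turn the first line of `java -version` into (major, update).

    Handles the legacy "1.8.0_292" scheme and the newer "11.0.11" scheme.
    """
    match = re.search(r'version "([^"]+)"', banner)
    if not match:
        raise ValueError(f"unrecognised java banner: {banner!r}")
    raw = match.group(1)
    if raw.startswith("1."):
        # Java 8 and earlier: 1.<major>.0_<update>
        major = int(raw.split(".")[1])
        update = int(re.match(r"\d+", raw.split("_", 1)[1]).group()) if "_" in raw else 0
    else:
        # Java 9 and later: <major>.<minor>.<security>
        parts = raw.split(".")
        major = int(re.match(r"\d+", parts[0]).group())
        update = int(re.match(r"\d+", parts[2]).group()) if len(parts) > 2 else 0
    return major, update


def check_spark_32_prereqs(java_banner: str, python_version=tuple(sys.version_info[:2])) -> list:
    """Return warnings for the Spark 3.2.0 Java/Python prerequisites (empty list = OK)."""
    warnings = []
    major, update = parse_java_version(java_banner)
    if major not in (8, 11):
        warnings.append(f"Java {major} is not one of the listed versions (8/11)")
    elif major == 8 and update < 201:
        warnings.append(f"Java 8u{update} is deprecated; use 8u201 or later")
    if python_version < (3, 6):
        warnings.append("Python 3.6+ is required")
    elif python_version == (3, 6):
        warnings.append("Python 3.6 support is deprecated")
    return warnings


def local_java_banner() -> str:
    """First line of `java -version`; the JVM prints this banner to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True, check=True)
    return result.stderr.splitlines()[0]
```

On the target machine, `print(check_spark_32_prereqs(local_java_banner()))` should print an empty list when both versions are in the supported range.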
Apache Hive Installation Steps on Ubuntu

With this tutorial, we will learn the complete process to install Apache Hive 3.1.2 on Ubuntu 20. The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command-line tool and a JDBC driver are provided to connect users to Hive.
Steps for Installing Apache Hive on Ubuntu
Step 1 - Create a directory, for example: $ mkdir /home/bigdata/apachehive
Step 2 - Move to the Apache Hive directory: $ cd /home/bigdata/apachehive
Step 3 - Download Apache Hive (the link will change depending on your country, so please get the download link from…
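After the download, the archive still has to be unpacked inside the new directory. The Python sketch below combines Step 1 with that extraction, roughly `mkdir -p` followed by `tar -xzf ... -C`. It assumes the archive was saved as apache-hive-3.1.2-bin.tar.gz. The path check is a basic guard against entries that escape the target folder, not full protection against every malicious archive.

```python
import tarfile
from pathlib import Path


def install_archive(archive: str, target_dir: str) -> Path:
    """Create target_dir (Step 1) and unpack a .tar.gz into it.

    Returns the top-level folder from the archive, e.g. .../apache-hive-3.1.2-bin.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    root = target.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        if not members:
            raise ValueError(f"{archive} is empty")
        for member in members:
            # Refuse entries such as "../x" or absolute paths that would land outside target_dir.
            dest = (root / member.name).resolve()
            if dest != root and root not in dest.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(root)
    return root / members[0].name.split("/")[0]
```

For example, `install_archive("apache-hive-3.1.2-bin.tar.gz", "/home/bigdata/apachehive")` returns the extracted Hive folder. HIVE_HOME is then typically pointed at that folder.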
Apache Hadoop 3.3.1 Installation Steps on Ubuntu (Part 1)

With this tutorial, we will learn the complete process to install Hadoop 3.3.1 on Ubuntu 20.
Supported Java Versions
Apache Hadoop 3.3 and later supports Java 8 and Java 11 (runtime only). Please compile Hadoop with Java 8. Compiling Hadoop with Java 11 is not supported: HADOOP-16795 - Java 11 compile support (OPEN).
Apache Hadoop 3.0.x to 3.2.x supports only Java 8.
Apache Hadoop 2.7.x to 2.10.x supports both Java 7 and 8.
Required software for Linux includes:
Java must be installed. Recommended Java versions are described at HadoopJavaVersions.
ssh must be installed and sshd must be running to use the Hadoop scripts that…
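The version list above can be encoded as a small lookup. Add a quick check that something is listening where sshd normally runs, and the two prerequisites can be tested before installation. This is a sketch under a few assumptions. Version strings look like "3.3.1". sshd uses its default port 22. An open port only shows that some service accepts connections there, not that it is sshd.

```python
import socket


def runtime_java(hadoop_version: str) -> set:
    """Java major versions the tutorial lists as supported at runtime.

    Encodes only the versions listed in the tutorial:
      3.3 and later   -> Java 8 and 11 (compile with Java 8 only, see HADOOP-16795)
      3.0.x to 3.2.x  -> Java 8
      2.7.x to 2.10.x -> Java 7 and 8
    """
    major, minor = (int(part) for part in hadoop_version.split(".")[:2])
    if (major, minor) >= (3, 3):
        return {8, 11}
    if (3, 0) <= (major, minor) <= (3, 2):
        return {8}
    if major == 2 and 7 <= minor <= 10:
        return {7, 8}
    raise ValueError(f"Hadoop {hadoop_version} is outside the ranges listed")


def can_run(hadoop_version: str, java_major: int) -> bool:
    """True if the Java major version is in the listed runtime set."""
    return java_major in runtime_java(hadoop_version)


def port_open(host: str = "localhost", port: int = 22, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port.

    sshd listens on port 22 by default; this does not prove the listener is sshd.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_run("3.3.1", 11)` is True, while `port_open()` returning False suggests sshd is not running yet.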
Apache Hadoop 3.3.1 Installation Steps on Ubuntu (Part 2)

Use the following properties in the respective files.
File: nano etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
File: nano etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
File: nano etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>
File: nano etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>
Now check that you can ssh to the localhost without…
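Instead of pasting each block in nano, the four files can also be generated. This helps when the same single-node setup is repeated on several machines. The Python sketch below copies the property values from the files above and writes them with xml.etree.ElementTree. It assumes `conf_dir` is $HADOOP_HOME/etc/hadoop. It overwrites the existing files, so keep a backup if you have already edited them. `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Property values copied from the pseudo-distributed configuration above.
CONFIGS = {
    "core-site.xml": {"fs.defaultFS": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": "1"},
    "mapred-site.xml": {
        "mapreduce.framework.name": "yarn",
        "mapreduce.application.classpath": (
            "$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:"
            "$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*"
        ),
    },
    "yarn-site.xml": {
        "yarn.nodemanager.aux-services": "mapreduce_shuffle",
        "yarn.nodemanager.env-whitelist": (
            "JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,"
            "CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,"
            "HADOOP_MAPRED_HOME"
        ),
    },
}


def render(properties: dict) -> str:
    """Render a Hadoop <configuration> document from name -> value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; Python 3.9+
    return ET.tostring(root, encoding="unicode")


def write_configs(conf_dir: str) -> None:
    """Write every file in CONFIGS into conf_dir, replacing existing files."""
    for filename, properties in CONFIGS.items():
        Path(conf_dir, filename).write_text(render(properties) + "\n", encoding="utf-8")


def read_properties(path: str) -> dict:
    """Parse a *-site.xml back into a dict, e.g. to double-check a hand-edited file."""
    root = ET.parse(path).getroot()
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
```

`read_properties` is also handy on its own for spotting stray spaces or characters in a value you typed by hand.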
Top 1000+ Big Data Interview Questions and Answers

With more companies turning to big data to run their business, the demand for talent is at an all-time high. What does that mean for you? Better opportunities if you want to work in any big data-related field or organization.…
Apache Hadoop 3.3.0 Single Node Installation on Windows 10 Part 1

Download link for Apache Hadoop 3.3.0: https://hadoop.apache.org/releases.html
Click on Binary; it will open a new page: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz (this link may change based on your location)
Download link for Java SE Development Kit 8: https://www.oracle.com/in/java/technologies/javase/javase-jdk8-downloads.html
Register or log in. If you have already registered, the download will begin.
We will have the files below in the Downloads folder.
Installing and Configuring Java
Step 1: Create an empty folder named Java in the C drive.
Step 2: Go to the download location.
Step 3: Double-click on the setup file.
Click on Next.
Click on Next.
Click on Next.
Click on Change. Make sure you change…
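Step 1 installs Java into a new C:\Java folder rather than the default location. The truncated step does not give the reason. A common one is that Hadoop's Windows .cmd scripts can fail when JAVA_HOME contains a space, as in C:\Program Files. The Python sketch below checks a JAVA_HOME string for that and two other frequent mistakes. The rules are assumptions, and the example JDK folder name is hypothetical. The function only inspects the string and does not touch the disk.

```python
from pathlib import PureWindowsPath


def check_java_home(java_home: str) -> list:
    """Return problems found in a Windows JAVA_HOME value (empty list = looks fine)."""
    problems = []
    path = PureWindowsPath(java_home)
    if not path.drive:
        problems.append("JAVA_HOME should be an absolute path with a drive letter")
    if " " in java_home:
        # e.g. C:\Program Files\Java\... ; a space-free folder such as C:\Java avoids this
        problems.append("JAVA_HOME contains a space; Hadoop's .cmd scripts may fail")
    if path.name.lower() == "bin":
        problems.append("JAVA_HOME should point at the JDK folder, not its bin subfolder")
    return problems
```

For example, `check_java_home(r"C:\Java\jdk1.8.0_281")` returns an empty list, while the same JDK under C:\Program Files is flagged.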
Apache Hadoop 3.3.0 Single Node Installation on Windows 10 Part 2

We have downloaded the Hadoop installation files. We need to move (that is, cut and paste) the archive:
From: Downloads location
To: C:\hadoop-3.3.0.tar (in the C drive)
Extract the hadoop-3.3.0.tar files in the C drive using extraction software (WinZip, WinRAR or 7-Zip).
Now we will have the following in the C drive.
Now open the folder C:\hadoop-3.3.0\etc\hadoop. We need to edit 5 files.
File C:/Hadoop-3.3.0/etc/hadoop/core-site.xml: paste the XML paragraph below and save this file.
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
File C:/Hadoop-3.3.0/etc/hadoop/mapred-site.xml: paste the XML paragraph below and save this file.
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Create the folder "data" under "C:\Hadoop-3.3.0".
1) Create the folder "datanode" under "C:\Hadoop-3.3.0\data".
2) Create…
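The folder steps can be scripted so they are identical on every machine. The Python sketch below creates the "data" folder and its subfolders under the Hadoop directory. Only "datanode" is named in the visible part of the steps. Any other folder names from the rest of the tutorial are passed in through the hypothetical `subfolders` parameter rather than guessed here.

```python
from pathlib import Path


def create_data_dirs(hadoop_home: str, subfolders=("datanode",)) -> list:
    """Create <hadoop_home>/data and each named subfolder under it.

    Existing folders are left untouched (exist_ok=True), so re-running is safe.
    """
    data = Path(hadoop_home) / "data"
    created = []
    for folder in [data, *(data / name for name in subfolders)]:
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

On the Windows machine from this tutorial, this would be called as `create_data_dirs(r"C:\Hadoop-3.3.0")`, adding further names to `subfolders` as the full steps require.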
Customer Complaints Analysis Part 1

In this article, we will analyze consumer complaints about financial products and services that US citizens filed with the US government, using big data technology. We will go through the step-by-step execution of the project.
Problem Statement: Analyze the data in the Hadoop ecosystem to:
1. Get the number of complaints filed for each company.
2. Get the number of complaints filed under each product.
3. Get the total number of complaints filed from a particular location.
4. Get the list of companies, grouped by location, that have no timely response.
Attribute Information or Dataset Details:
Data: Input format - .CSV
Public dataset available at the website below: https://catalog.data.gov/dataset/consumer-complaint-database…
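Before running the Hadoop job, it can help to see what the four questions compute on a small extract of the CSV. The Python sketch below is a local illustration of the same aggregations, not the Hadoop implementation the article builds. It assumes the header names "Company", "Product", "State" and "Timely response?". It also treats "location" as the State column. Check both assumptions against the header row of the file you download.

```python
import csv
from collections import Counter, defaultdict

# Assumed header names in the Consumer Complaint Database CSV export;
# compare them with the first row of the downloaded file.
COMPANY, PRODUCT, LOCATION, TIMELY = "Company", "Product", "State", "Timely response?"


def load(path: str) -> list:
    """Read the complaints CSV into a list of dicts keyed by header name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def analyse(rows):
    """Return the four problem-statement results for an iterable of row dicts."""
    by_company, by_product, by_location = Counter(), Counter(), Counter()
    no_timely = defaultdict(set)  # location -> companies with a "No" timely response
    for row in rows:
        by_company[row[COMPANY]] += 1
        by_product[row[PRODUCT]] += 1
        by_location[row[LOCATION]] += 1
        if row[TIMELY].strip().lower() == "no":
            no_timely[row[LOCATION]].add(row[COMPANY])
    grouped = {loc: sorted(companies) for loc, companies in no_timely.items()}
    return by_company, by_product, by_location, grouped
```

With a hypothetical local copy named complaints.csv, `company, product, location, late = analyse(load("complaints.csv"))` followed by `company.most_common(10)` lists the ten most-complained-about companies. The whole file is read into memory, which is exactly the limitation a Hadoop-based approach avoids for the full dataset.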
Customer Complaints Analysis Part 2

Execution of the shell script
Output
Chart view in Excel
1. Get the number of complaints filed for each company. (Complains_by_Company.csv)
2. Get the number of complaints filed under each product. (Complains_by_Product.csv)
3. Get the total number of complaints filed from a particular location. (Complains_by_Location.csv)
4. Get the list of companies, grouped by location, that have no timely response. (Complains_by_Response_No.csv)
(Note: To generate the charts below, we need to use the PivotChart option in Excel in order to group the data.)
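The PivotChart step groups the no-timely-response results by location. The same cross-tabulation can be produced before opening Excel. The Python sketch below assumes the results are available as (location, company) pairs. The exact column layout of Complains_by_Response_No.csv is not shown here, so reading that file into pairs is left to you. The output CSV has locations as rows and companies as columns, which is what the pivot displays.

```python
import csv
from collections import Counter


def pivot(pairs):
    """Cross-tabulate (location, company) pairs into a count matrix.

    Rows are locations, columns are companies; missing combinations count as 0.
    """
    counts = Counter(pairs)
    locations = sorted({loc for loc, _ in counts})
    companies = sorted({comp for _, comp in counts})
    header = ["Location", *companies]
    table = [[loc, *(counts[(loc, comp)] for comp in companies)] for loc in locations]
    return header, table


def write_pivot(path: str, pairs) -> None:
    """Write the pivot table as CSV so Excel can chart it directly."""
    header, table = pivot(pairs)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(table)
```

Excel's PivotChart remains the more flexible option for interactive regrouping. The script is mainly useful for producing the same grouped view repeatably, for example after each run of the shell script.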