Blog

Apache Hadoop 3.3.1 Installation Steps on Ubuntu (Part 1)

In this tutorial, we will walk through the complete process of installing Hadoop 3.3.1 on Ubuntu 20.

Supported Java Versions

- Apache Hadoop 3.3 and later supports Java 8 and Java 11 (runtime only). Please compile Hadoop with Java 8. Compiling Hadoop with Java 11 is not supported: see HADOOP-16795, "Java 11 compile support" (status: OPEN).
- Apache Hadoop 3.0.x to 3.2.x supports only Java 8.
- Apache Hadoop 2.7.x to 2.10.x supports both Java 7 and Java 8.

Required software for Linux includes:

- Java must be installed. Recommended Java versions are described at HadoopJavaVersions.
- ssh must be installed and sshd must be running to use the Hadoop scripts that…
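The Java compatibility notes in this excerpt can be encoded as a small lookup. The following is a minimal shell sketch and not part of the original post: the function name hadoop_supported_java is hypothetical, and it covers only the release lines listed in the excerpt, reporting everything else as unknown.

```shell
# Hypothetical helper (not from the original post): print the Java major
# versions supported at runtime by a Hadoop release, per the notes above.
# Expects a version of the form MAJOR.MINOR or MAJOR.MINOR.PATCH.
hadoop_supported_java() {
  version="$1"
  case "$version" in
    *.*) ;;                                   # needs at least MAJOR.MINOR
    *) echo "unknown"; return 1 ;;
  esac
  major="${version%%.*}"                      # "3.3.1" -> "3"
  rest="${version#*.}"                        # "3.3.1" -> "3.1"
  minor="${rest%%.*}"                         # "3.1"   -> "3"
  case "$major" in ''|*[!0-9]*) echo "unknown"; return 1 ;; esac
  case "$minor" in ''|*[!0-9]*) echo "unknown"; return 1 ;; esac

  if [ "$major" -eq 3 ] && [ "$minor" -ge 3 ]; then
    echo "8 11"                               # 3.3 and later: Java 8 and 11
  elif [ "$major" -eq 3 ]; then
    echo "8"                                  # 3.0.x to 3.2.x: Java 8 only
  elif [ "$major" -eq 2 ] && [ "$minor" -ge 7 ] && [ "$minor" -le 10 ]; then
    echo "7 8"                                # 2.7.x to 2.10.x: Java 7 and 8
  else
    echo "unknown"                            # not covered by the notes
    return 1
  fi
}

hadoop_supported_java 3.3.1   # prints: 8 11
```

Remember that Java 11 is listed for runtime only, so a build script would still want to pin Java 8 for compilation.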

Apache Hadoop 3.3.1 Installation Steps on Ubuntu (Part 2)

Set the following properties in the respective files.

File: etc/hadoop/core-site.xml (open it with nano etc/hadoop/core-site.xml)

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

File: etc/hadoop/hdfs-site.xml (open it with nano etc/hadoop/hdfs-site.xml)

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

File: etc/hadoop/mapred-site.xml (open it with nano etc/hadoop/mapred-site.xml)

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
      </property>
    </configuration>

File: etc/hadoop/yarn-site.xml (open it with nano etc/hadoop/yarn-site.xml)

    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
      </property>
    </configuration>

Now check that you can ssh to the localhost without…
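After editing these files, it can help to confirm that a property actually holds the value you intended. The following is a minimal shell sketch and not part of the original post: hadoop_conf_value is a hypothetical helper, and it assumes each name and value element sits on a single line, as in the listings in this excerpt. It is not a general XML parser.

```shell
# Hypothetical helper (not from the original post): print the value of one
# property from a Hadoop *-site.xml file.
# Assumes every <name>...</name> and <value>...</value> fits on one line.
hadoop_conf_value() {
  # $1: path to the XML file, $2: property name to look up
  awk -v key="$2" '
    /<name>/ {
      n = $0
      sub(/.*<name>[ \t]*/, "", n)       # drop everything up to <name>
      sub(/[ \t]*<\/name>.*/, "", n)     # drop </name> and what follows
    }
    /<value>/ {
      v = $0
      sub(/.*<value>[ \t]*/, "", v)
      sub(/[ \t]*<\/value>.*/, "", v)
      if (n == key) { print v; found = 1; exit }
    }
    END { exit !found }                  # non-zero status if key is missing
  ' "$1"
}

# Demo on a scratch copy of the core-site.xml content shown above.
tmp_conf=$(mktemp)
cat > "$tmp_conf" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
hadoop_conf_value "$tmp_conf" fs.defaultFS   # prints: hdfs://localhost:9000
rm -f "$tmp_conf"
```

Once Hadoop itself is installed and on the PATH, hdfs getconf -confKey fs.defaultFS reports the value Hadoop actually resolves, which is the more authoritative check.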

Installing Apache Superset on Ubuntu (Linux) Machine

Installing Superset from Scratch

In Ubuntu 20.04 the following command will ensure that the required dependencies are installed:

    sudo apt-get install build-essential libssl-dev libffi-dev python3-dev python3-pip libsasl2-dev libldap2-dev

Python Virtual Environment

We highly recommend installing Superset inside of a virtual environment.

    pip install virtualenv

You can create and activate a virtual environment using:

    # virtualenv is shipped in Python 3.6+ as venv instead of pyvenv.
    # See https://docs.python.org/3.6/library/venv.html
    python3 -m venv venv
    . venv/bin/activate

Installing and Initializing Superset

First, start by installing apache-superset:

    pip install apache-superset

Then, you need to initialize the database:

    superset db upgrade

Finish installing by running through the…

Installing Apache Cassandra on Ubuntu (Linux) Machine

Installing the binary tarball

1. Verify the version of Java installed. For example:

   Command:

       $ java -version

   Result:

       openjdk version "1.8.0_222"
       OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
       OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

2. Download the binary tarball from one of the mirrors on the Apache Cassandra Download site. For example, to download Cassandra 4.0.1:

       $ curl -OL https://dlcdn.apache.org/cassandra/4.0.1/apache-cassandra-4.0.1-bin.tar.gz

   The mirrors only host the latest versions of each major supported release. To download an earlier version of Cassandra, visit the Apache Archives.

3. OPTIONAL: Verify the integrity of the downloaded tarball using one of the methods here. For example, to verify…

Installing Java on Ubuntu (Linux) Machine

Steps for Installing Java 8 on Ubuntu

Step 1 – Install Java 8 on Ubuntu

OpenJDK 8 is available in the default Apt repositories. You can install Java 8 on an Ubuntu system using the following commands:

    sudo apt update
    sudo apt install openjdk-8-jdk -y

Step 2 – Verify Java Installation

You have successfully installed Java 8 on your system. Let's verify the installed and currently active version using the following command:

    java -version
    openjdk version "1.8.0_252"
    OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1ubuntu1-b09)
    OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

Step 3 – Set Up the JAVA_HOME and JRE_HOME Variables

As you have installed…
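The version check in Step 2 can also be scripted, for example to confirm that Java 8 is the active version before continuing. The following is a minimal shell sketch and not part of the original post: java_major_version is a hypothetical helper that extracts the major version from a banner line like the one shown in Step 2, handling both the legacy "1.8.0_252" scheme and the newer "11.0.11" scheme.

```shell
# Hypothetical helper (not from the original post): extract the Java major
# version from the first line of `java -version` output.
java_major_version() {
  # $1: a banner line such as: openjdk version "1.8.0_252"
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    '')  return 1 ;;                          # no quoted version string found
    1.*) ver="${ver#1.}"                      # legacy: "1.8.0_252" -> "8.0_252"
         echo "${ver%%[!0-9]*}" ;;            # keep leading digits -> "8"
    *)   echo "${ver%%[!0-9]*}" ;;            # modern: "11.0.11" -> "11"
  esac
}

java_major_version 'openjdk version "1.8.0_252"'   # prints: 8
```

On a live system the banner line could come from java -version 2>&1 | head -n 1, since the JVM writes its version banner to standard error rather than standard output.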

Customer Segmentation using Machine Learning in Apache Spark

Customer segmentation is the practice of dividing a company's customers into groups that reflect similarities among customers in each group. The goal of segmenting customers is to decide how to relate to customers in each segment in order to maximize the value of each customer to the business.

Problem Statement or Business Problem

In this project, we will perform one of the most essential applications of machine learning: customer segmentation. We will implement customer segmentation in Apache Spark and Scala, which is useful whenever you need to find your best customers. Customer Segmentation is one of the most important applications of unsupervised…

Apache Zeppelin with Apache Spark Installation on Ubuntu

Installation Steps for Apache Zeppelin on Ubuntu

Prerequisite: Java 7 or Java 8 must be installed on the Ubuntu operating system.

The first step is to download the latest version of Apache Zeppelin and save it in a folder of your choice.
Link: http://zeppelin.apache.org/download.html

The second step is to extract the downloaded .tgz archive. We stored the downloaded file in /home/bigdata/apachezeppelin/ (we created the apachezeppelin folder manually with the command mkdir apachezeppelin).

    $ cd /home/bigdata/apachezeppelin/
    $ pwd
    /home/bigdata/apachezeppelin
    $ ls -ltr
    total 683072
    -rw-rw-r-- 1 bigdata bigdata 699455687 Aug 15 11:27 zeppelin-0.9.0-bin-netinst.tgz
    $ tar -xvzf zeppelin-0.9.0-bin-netinst.tgz
    zeppelin-0.9.0-bin-netinst/…

Machine Learning Project – Creating Movies Recommendation Engine using Apache Spark

Movies are loved by everyone, irrespective of age, gender, race, color, or geographical location. A recommendation system is a filtering program whose prime goal is to predict the “rating” or “preference” of a user towards a domain-specific item or set of items. Recommendation systems encompass a class of techniques and algorithms that can suggest “relevant” items to users. They predict future behavior based on past data through a multitude of techniques.

Problem Statement or Business Problem

In this project, we will generate the top 10 movie recommendations for each user as well as the top 10 user recommendations for each movie.

Attribute Information…

Top 1000+ Big Data Interview Question and Answers

With more companies turning to big data to run their business, the demand for talent is at an all-time high. What does that mean for you? It translates to better opportunities if you want to get employed in any of the big data-related fields. In the era of big data, companies are turning more and more towards big data to run their operations. That means better prospects for employment in any big data-related organization. There is a huge demand for talent in the big data era, with more and more companies utilizing big data to run their operations.…

Machine Learning Project on Sales Prediction or Sale Forecast

Sales forecasting is the process of estimating future sales. Accurate sales forecasts enable companies to make informed business decisions and predict short-term and long-term performance. Companies can base their forecasts on past sales data, industry-wide comparisons, and economic trends. It is easier for established companies to predict future sales based on years of past business data. Newly founded companies have to base their forecasts on less-verified information, such as market research and competitive intelligence to forecast their future business. Sales forecasting gives insight into how a company should manage its workforce, cash flow, and resources. In addition to helping a…