Blog

Installing Apache Cassandra on Ubuntu (Linux) Machine

Installing the binary tarball

1. Verify the version of Java installed. For example:

   Command
   $ java -version

   Result
   openjdk version "1.8.0_222"
   OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
   OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

2. Download the binary tarball from one of the mirrors on the Apache Cassandra Download site. For example, to download Cassandra 4.0.1:

   $ curl -OL https://dlcdn.apache.org/cassandra/4.0.1/apache-cassandra-4.0.1-bin.tar.gz

   The mirrors only host the latest versions of each major supported release. To download an earlier version of Cassandra, visit the Apache Archives.

3. OPTIONAL: Verify the integrity of the downloaded tarball using one of the methods here. For example, to verify…
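The excerpt is cut off before it shows how the tarball is verified, so the following is only a minimal sketch of one common approach: comparing a SHA-256 digest of the downloaded file with a published checksum. The function names, the choice of SHA-256, and the throwaway demo file are assumptions made for illustration; they are not taken from the post.

```python
import hashlib
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare the file's digest with a published checksum string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demonstration on a throwaway file (hypothetical data, not a real tarball).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example payload")
    demo_path = tmp.name

expected = hashlib.sha256(b"example payload").hexdigest()
print(matches_checksum(demo_path, expected))  # prints True
```

For a real download, the path would point at the tarball and the expected value would come from the checksum file published next to it.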
Installing Java on Ubuntu (Linux) Machine

Steps for Installing Java 8 on Ubuntu

Step 1 – Install Java 8 on Ubuntu
OpenJDK 8 is available in the default Apt repositories. You can install Java 8 on an Ubuntu system using the following commands.

   sudo apt update
   sudo apt install openjdk-8-jdk -y

Step 2 – Verify Java Installation
You have successfully installed Java 8 on your system. Verify the installed and currently active version using the following command.

   java -version
   openjdk version "1.8.0_252"
   OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1ubuntu1-b09)
   OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

Step 3 – Set Up JAVA_HOME and JRE_HOME Variables
As you have installed…
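When the version check needs to be automated, the major version can be read out of the `java -version` text. The sketch below is one possible way to do that; the helper name `java_major_version` is hypothetical, and it assumes the version string appears in double quotes as in the output shown above.

```python
import re


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output.

    Legacy strings such as "1.8.0_252" report the major version in the
    second field; newer strings such as "11.0.2" report it in the first.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)
    return int(first)


sample = 'openjdk version "1.8.0_252"'
print(java_major_version(sample))  # prints 8
```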
Customer Segmentation using Machine Learning in Apache Spark

Customer segmentation is the practice of dividing a company's customers into groups that reflect similarities among the customers in each group. The goal is to decide how to relate to the customers in each segment in order to maximize the value of each customer to the business.

Problem Statement or Business Problem

In this project, we will perform one of the most essential applications of machine learning: customer segmentation. We will implement it in Apache Spark with Scala, an approach you can apply whenever you need to find your best customers. Customer Segmentation is one of the most important applications of unsupervised…
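To make the idea of grouping similar customers concrete, here is a small sketch of k-means clustering in plain Python. The excerpt does not say which algorithm the project uses, so k-means is an assumption, and the customer features (annual spend, visits per month) and starting centroids are made-up illustration data. The post itself works in Spark with Scala; this sketch only shows the underlying idea.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign points to the nearest centroid, then
    move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    labels = [min(range(len(centroids)),
                  key=lambda i: dist(point, centroids[i]))
              for point in points]
    return centroids, labels


# Hypothetical customers: (annual spend, visits per month)
customers = [(100.0, 1.0), (120.0, 2.0), (110.0, 1.0),
             (900.0, 9.0), (950.0, 10.0), (1000.0, 8.0)]
centroids, labels = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Each label is a segment index, so customers with label 1 form the high-spend group in this toy data.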
Apache Zeppelin with Apache Spark Installation on Ubuntu

Installation Steps for Apache Zeppelin on Ubuntu

Prerequisite: Java 7 or Java 8 must be installed on the Ubuntu operating system.

The first step is to download the latest version of Apache Zeppelin and save it in a folder.
Link: http://zeppelin.apache.org/download.html

The second step is to extract the downloaded tar file (i.e., the .tgz file). We stored the downloaded file in /home/bigdata/apachezeppelin/, a folder we created manually with the command mkdir apachezeppelin.

   bigdata@bigdata:~$ cd /home/bigdata/apachezeppelin/
   bigdata@bigdata:~/apachezeppelin$ pwd
   /home/bigdata/apachezeppelin
   bigdata@bigdata:~/apachezeppelin$ ls -ltr
   total 683072
   -rw-rw-r-- 1 bigdata bigdata 699455687 Aug 15 11:27 zeppelin-0.9.0-bin-netinst.tgz
   bigdata@bigdata:~/apachezeppelin$ tar -xvzf zeppelin-0.9.0-bin-netinst.tgz
   zeppelin-0.9.0-bin-netinst/…
Machine Learning Project – Creating Movies Recommendation Engine using Apache Spark

Movies are loved by everyone irrespective of age, gender, race, color, or geographical location. A recommendation system is a filtering program whose prime goal is to predict the “rating” or “preference” of a user towards a domain-specific item. Recommendation systems encompass a class of techniques and algorithms that can suggest “relevant” items to users. They predict future behavior based on past data through a multitude of techniques.

Problem Statement or Business Problem

In this project, we will generate the top 10 movie recommendations for each user as well as the top 10 user recommendations for each movie.

Attribute Information…
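The two outputs named in the problem statement are the same ranking step applied in two directions. As a rough sketch, assuming predicted ratings are already available as a dictionary keyed by (user, movie), the top-N lists can be built as below. The scores, the names, and the `top_n` helper are hypothetical; in the project they would come from a trained model, which the excerpt does not describe.

```python
import heapq
from collections import defaultdict


def top_n(scores, n, by="user"):
    """Rank items per key from {(user, movie): predicted_rating}.

    by="user"  -> top-n movies for each user
    by="movie" -> top-n users for each movie
    """
    grouped = defaultdict(list)
    for (user, movie), rating in scores.items():
        key, item = (user, movie) if by == "user" else (movie, user)
        grouped[key].append((rating, item))
    return {key: [item for _, item in heapq.nlargest(n, pairs)]
            for key, pairs in grouped.items()}


# Hypothetical predicted ratings
scores = {
    ("u1", "A"): 4.5, ("u1", "B"): 3.0, ("u1", "C"): 5.0,
    ("u2", "A"): 2.0, ("u2", "B"): 4.0, ("u2", "C"): 1.0,
}
print(top_n(scores, 2, by="user"))   # prints {'u1': ['C', 'A'], 'u2': ['B', 'A']}
print(top_n(scores, 1, by="movie"))  # prints {'A': ['u1'], 'B': ['u2'], 'C': ['u1']}
```

With n set to 10 and real predictions, the same call yields the top 10 lists described above.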
Top 1000+ Big Data Interview Question and Answers

With more companies turning to big data to run their business, the demand for talent is at an all-time high. What does that mean for you? It translates to better opportunities if you want to work in any big data-related field. In the era of big data, companies increasingly rely on big data to run their operations, which means better employment prospects at any big data-related organization. There is huge demand for talent, with more and more companies using big data to run their operations.…
Machine Learning Project on Sales Prediction or Sale Forecast

Sales forecasting is the process of estimating future sales. Accurate sales forecasts enable companies to make informed business decisions and predict short-term and long-term performance. Companies can base their forecasts on past sales data, industry-wide comparisons, and economic trends. It is easier for established companies to predict future sales based on years of past business data. Newly founded companies have to base their forecasts on less-verified information, such as market research and competitive intelligence. Sales forecasting gives insight into how a company should manage its workforce, cash flow, and resources. In addition to helping a…
Apache Hadoop 3.3.0 Single Node Installation on Windows 10 Part 1

Download Link for Apache Hadoop 3.3.0
URL: https://hadoop.apache.org/releases.html
Click on the Binary link; it will open a new page:
https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
(This link may change based on your location.)

Download Link for Java SE Development Kit 8
https://www.oracle.com/in/java/technologies/javase/javase-jdk8-downloads.html
Register or log in. If you have already registered, the download will begin.

We will have the below files in the Downloads folder.

Installing and Configuring Java
Step 1: Create an empty folder named Java in the C drive.
Step 2: Go to the download location.
Step 3: Double-click on the setup file. Click on Next. Click on Next. Click on Next. Click on Change. Make sure you change…
Apache Hadoop 3.3.0 Single Node Installation on Windows 10 Part 2

We have downloaded the Hadoop installation files. We need to move (that is, cut and paste) the archive:

From: Downloads location
To: C:\hadoop-3.3.0.tar

In the C drive, extract the hadoop-3.3.0.tar file using extraction software (WinZip, WinRAR, or 7-Zip). Now we will have the following in the C drive.

Now open the folder C:\hadoop-3.3.0\etc\hadoop. We need to edit 5 files.

File C:/Hadoop-3.3.0/etc/hadoop/core-site.xml: paste the below XML and save the file.

   <configuration>
     <property>
       <name>fs.default.name</name>
       <value>hdfs://localhost:9000</value>
     </property>
   </configuration>

File C:/Hadoop-3.3.0/etc/hadoop/mapred-site.xml: paste the below XML and save the file.

   <configuration>
     <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
     </property>
   </configuration>

Create folder "data" under "C:\Hadoop-3.3.0"
1) Create folder "datanode" under "C:\Hadoop-3.3.0\data"
2) Create…
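Both files shown above share one shape: a configuration element holding property entries, each with a name and a value. As a hedged sketch of that structure, the snippet below generates the same two documents from Python dictionaries using the standard library. The helper name `build_site_xml` is hypothetical, and writing the result to the files under C:\Hadoop-3.3.0\etc\hadoop is left out on purpose.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Render a dict of Hadoop properties as a <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Property names and values copied from the excerpt above.
core_site = build_site_xml({"fs.default.name": "hdfs://localhost:9000"})
mapred_site = build_site_xml({"mapreduce.framework.name": "yarn"})
print(core_site)
print(mapred_site)
```

Note that current Hadoop documentation lists `fs.default.name` as a deprecated alias of `fs.defaultFS`; the sketch keeps the name used in the post.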
Machine Learning Project on Mushroom Classification whether it’s edible or poisonous Part 1

A mushroom, or toadstool, is the fleshy, spore-bearing fruiting body of a fungus, typically produced above ground on soil or on its food source.

Problem Statement or Business Problem

In this project, looking at the various properties of a mushroom, we will predict whether the mushroom is edible or poisonous.

Attribute Information or Dataset Details:

To make this easier to follow, the properties are listed one by one.

classes: edible=e, poisonous=p
cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
bruises: bruises=t, no=f
odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
gill-attachment: attached=a, descending=d, free=f, notched=n
gill-spacing: close=c, crowded=w, distant=d
gill-size: broad=b, narrow=n
gill-color: black=k, brown=n, buff=b, chocolate=h, gray=g,…
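Because every attribute is stored as a single-letter code, a lookup table is a handy way to read records back as words. The sketch below covers only a few of the attributes listed above, with codes copied from that list; the `decode` helper and the sample record are hypothetical.

```python
# Code tables copied from the attribute list above (subset only).
CODES = {
    "classes": {"e": "edible", "p": "poisonous"},
    "cap-shape": {"b": "bell", "c": "conical", "x": "convex",
                  "f": "flat", "k": "knobbed", "s": "sunken"},
    "bruises": {"t": "bruises", "f": "no"},
    "gill-size": {"b": "broad", "n": "narrow"},
}


def decode(record):
    """Translate single-letter codes into readable labels."""
    return {attr: CODES[attr][code] for attr, code in record.items()}


# Hypothetical record for illustration
sample = {"classes": "p", "cap-shape": "x", "bruises": "t", "gill-size": "n"}
print(decode(sample))
# prints {'classes': 'poisonous', 'cap-shape': 'convex', 'bruises': 'bruises', 'gill-size': 'narrow'}
```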