Apache Spark

Customer Segmentation using Machine Learning in Apache Spark

Customer segmentation is the practice of dividing a company's customers into groups that reflect similarities among the customers in each group. The goal is to decide how to relate to customers in each segment in order to maximize the value of each customer to the business.

Problem Statement or Business Problem

In this project, we will perform one of the most essential applications of machine learning: customer segmentation. We will implement customer segmentation in Apache Spark and Scala, an approach you can use whenever you need to find your best customers. Customer Segmentation is one of the most important applications of unsupervised…

Apache Zeppelin with Apache Spark Installation on Ubuntu

Installation Steps for Apache Zeppelin on Ubuntu

Prerequisite: Java 7 or Java 8 must be installed on the Ubuntu operating system.

The first step is to download the latest version of Apache Zeppelin and save it in a folder.
Link: http://zeppelin.apache.org/download.html

The second step is to extract the downloaded tar file (.tgz). We stored the downloaded file in /home/bigdata/apachezeppelin/ (we created the apachezeppelin folder manually with the command mkdir apachezeppelin).

$ cd /home/bigdata/apachezeppelin/
$ pwd
/home/bigdata/apachezeppelin
$ ls -ltr
total 683072
-rw-rw-r-- 1 bigdata bigdata 699455687 Aug 15 11:27 zeppelin-0.9.0-bin-netinst.tgz
$ tar -xvzf zeppelin-0.9.0-bin-netinst.tgz
zeppelin-0.9.0-bin-netinst/…

Machine Learning Project – Creating Movies Recommendation Engine using Apache Spark

Movies are loved by everyone irrespective of age, gender, race, color, or geographical location. A recommendation system is a filtering program whose prime goal is to predict the “rating” or “preference” of a user towards a domain-specific item or items. Recommendation systems encompass a class of techniques and algorithms that can suggest “relevant” items to users. They predict future behavior from past data.

Problem Statement or Business Problem

In this project, we will generate the top 10 movie recommendations for each user as well as the top 10 user recommendations for each movie.

Attribute Information…

Top 1000+ Big Data Interview Question and Answers

With more companies turning to big data to run their business, the demand for talent is at an all-time high. What does that mean for you? It translates to better opportunities if you want to work in any big data-related field.…

Machine Learning Project on Sales Prediction or Sale Forecast

Sales forecasting is the process of estimating future sales. Accurate sales forecasts enable companies to make informed business decisions and predict short-term and long-term performance. Companies can base their forecasts on past sales data, industry-wide comparisons, and economic trends. It is easier for established companies to predict future sales based on years of past business data. Newly founded companies have to base their forecasts on less-verified information, such as market research and competitive intelligence to forecast their future business. Sales forecasting gives insight into how a company should manage its workforce, cash flow, and resources. In addition to helping a…

Machine Learning Project on Mushroom Classification whether it’s edible or poisonous Part 1

A mushroom, or toadstool, is the fleshy, spore-bearing fruiting body of a fungus, typically produced above ground on soil or on its food source.

Problem Statement or Business Problem

In this project, we will predict whether a mushroom is edible or poisonous by looking at its various properties.

Attribute Information or Dataset Details:

For readability, the properties are listed one by one.

classes: edible=e, poisonous=p
cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
bruises: bruises=t, no=f
odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
gill-attachment: attached=a, descending=d, free=f, notched=n
gill-spacing: close=c, crowded=w, distant=d
gill-size: broad=b, narrow=n
gill-color: black=k, brown=n, buff=b, chocolate=h, gray=g, green=r, orange=o, pink=p, purple=u, red=e, white=w, yellow=y
stalk-shape: enlarging=e, tapering=t
stalk-root: bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z,…

Machine Learning Project on Mushroom Classification whether it’s edible or poisonous Part 2

Collecting all String Columns into an Array

%scala
var StringfeatureCol = Array("class", "capshape", "capsurface", "capcolor", "bruises", "odor",
  "gillattachment", "gillspacing", "gillsize", "gillcolor", "stalkshape", "stalkroot",
  "stalksurfaceabovering", "stalksurfacebelowring", "stalkcolorabovering", "stalkcolorbelowring",
  "veiltype", "veilcolor", "ringnumber", "ringtype", "sporeprintcolor", "population", "habitat")

StringIndexer encodes a string column of labels to a column of label indices.

Example of StringIndexer

%scala
import org.apache.spark.ml.feature.StringIndexer

val df = spark.createDataFrame(
  Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c"))
).toDF("id", "category")

df.show()

val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")

val indexed = indexer.fit(df).transform(df)
indexed.show()

Output:
+---+--------+
| id|category|
+---+--------+
|  0|       a|
|  1|       b|
|  2|       c|
|  3|…
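To make the encoding rule concrete, here is a minimal plain-Scala sketch (no Spark required) of the index assignment that StringIndexer performs by default: labels are ranked by descending frequency, so the most frequent label gets index 0.0. The object name LabelIndexSketch and its fit and transform helpers are hypothetical names chosen for illustration, not Spark APIs, and breaking frequency ties alphabetically is an assumption of this sketch.

```scala
object LabelIndexSketch {
  // Learn a label -> index mapping from the observed labels.
  // Most frequent label gets 0.0; ties are broken alphabetically
  // (an assumption of this sketch).
  def fit(labels: Seq[String]): Map[String, Double] =
    labels
      .groupBy(identity)
      .map { case (label, occurrences) => (label, occurrences.size) }
      .toSeq
      .sortBy { case (label, count) => (-count, label) }
      .zipWithIndex
      .map { case ((label, _), index) => label -> index.toDouble }
      .toMap

  // Replace each label with its learned index.
  // Throws NoSuchElementException for a label that was not seen in fit.
  def transform(labels: Seq[String], mapping: Map[String, Double]): Seq[Double] =
    labels.map(mapping)
}
```

For the six categories used in the example above, a appears three times, c twice, and b once, so fit yields a -> 0.0, c -> 1.0, b -> 2.0.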

Machine Learning Pipeline Application on Power Plant. (Part 1)

This is an end-to-end project that performs Extract-Transform-Load and Exploratory Data Analysis on a real-world dataset, and then applies several different machine learning algorithms to solve a supervised regression problem on the dataset. Our goal is to accurately predict power output given a set of environmental readings from various sensors in a natural gas-fired power generation plant.

Background

Power generation is a complex process, and understanding and predicting power output is an important element in managing a plant and its connection to the power grid. The operators of a regional power grid create predictions of power demand based on historical…

Machine Learning Pipeline Application on Power Plant. (Part 2)

Visualize Your Data

To understand our data, we will look for correlations between the features and the label. This can be important when choosing a model. For example, if the features and the label are linearly correlated, a linear model such as Linear Regression can do well; if the relationship is strongly non-linear, more complex models such as Decision Trees can be better. We can use Databricks' built-in visualization to view each of our predictors against the label column as a scatter plot and see how they correlate.

Exploratory Data Analysis (EDA) is an approach/philosophy for…
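As a rough numeric companion to the scatter plots, the strength of a linear relationship between one feature and the label can be summarized with the Pearson correlation coefficient. The following is a minimal plain-Scala sketch; the object name CorrelationSketch and the sample readings in the usage note are made up for illustration and are not taken from the power plant dataset.

```scala
object CorrelationSketch {
  // Pearson correlation coefficient between two equally long series.
  // Values near +1 or -1 suggest a strong linear relationship;
  // values near 0 suggest little or no linear relationship.
  // Returns NaN if either series is constant (zero variance).
  def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.nonEmpty && xs.length == ys.length, "series must be non-empty and equally long")
    val n = xs.length
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    // Sum of products of deviations from the means (unnormalized covariance).
    val cov = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    // Sums of squared deviations (unnormalized variances).
    val varX = xs.map(x => (x - meanX) * (x - meanX)).sum
    val varY = ys.map(y => (y - meanY) * (y - meanY)).sum
    cov / math.sqrt(varX * varY)
  }
}
```

For instance, CorrelationSketch.pearson(Seq(1.0, 2.0, 3.0, 4.0), Seq(2.0, 4.0, 6.0, 8.0)) is 1.0 because the second series is an exact linear function of the first. A coefficient close to zero does not rule out a non-linear relationship, which is why the scatter plots remain useful.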

Machine Learning Project – Predict Forest Cover Part 1

In this project, we will predict Forest Cover based on various attributes (cartographic variables) of the Forest. Hence, this is a classification problem.

Problem Statement or Business Problem

The task is to predict the forest cover type from these cartographic variables, which makes it a classification problem.

Attribute Information or Dataset Details:

Given are the attribute name, attribute type, measurement unit, and a brief description. The forest cover type is the classification target. The order of this listing corresponds to the order of numerals along the rows of the database.

Name | Data Type | Measurement | Description
Elevation | quantitative | meters | Elevation in meters
Aspect | quantitative | azimuth | Aspect in degrees…