Apache Spark Machine Learning

Life Expectancy Prediction using Machine Learning – Part 1

Project idea – The idea behind this ML project is to build a model that predicts life expectancy and to run a statistical analysis of the factors that influence it. Problem Statement or Business Problem: Many past studies of the factors affecting life expectancy considered demographic variables, income composition, and mortality rates, but the effect of immunization and the human development index was not taken into account. Also, some past research used linear regression on a data set covering only one year for all the countries. Hence, this gives motivation to…

Life Expectancy Prediction using Machine Learning – Part 2

Scatter plots of Life_Expectancy VS: Adult_Mortality, Infant_Deaths, Alcohol, Percentage_Expenditure, Hepatitis_B, Under_Five_Deaths, Polio, Total_Expenditure, Diphtheria, HIV_AIDS, GDP, Population, Thinness_1_19_years, Thinness_5_9_years, Income_Composition_of_Resources, Schooling. Further scatter plots: Schooling VS Adult_Mortality, Schooling VS Income_Composition_of_Resources, Adult_Mortality VS Income_Composition_of_Resources. Collecting all String Columns into…

Predicting Possible Loan Default Using Machine Learning

Project idea – The idea behind this ML project is to build a model for loan prediction based on customer behavior and to determine the risk factor. Problem Statement or Business Problem: About Company: Wonderful Dream Housing Finance company deals in all home loans. Problem: The company wants to automate the assessment of the loan risk factor based on detailed customer behavior. A loan default occurs when a borrower takes money from a bank and does not repay the loan. Details are Income, Age, Experience, Married/Single, House_Ownership, Car Ownership, Profession, City,…

Machine Learning Project – Loan Approval Prediction

Project idea – The idea behind this ML project is to build a model for a home loan company that validates customer eligibility for a loan. Problem Statement or Business Problem: About Company: Wonderful Dream Housing Finance company deals in all home loans. They have a presence across all urban, semi-urban, and rural areas. A customer first applies for a home loan, after which the company validates the customer's eligibility for the loan. Problem: The company wants to automate the loan eligibility process (in real time) based on the customer details provided while filling in the online application form. These details are Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount,…

Customer Segmentation using Machine Learning in Apache Spark

Customer segmentation is the practice of dividing a company's customers into groups that reflect similarities among customers in each group. The goal of segmenting customers is to decide how to relate to customers in each segment in order to maximize the value of each customer to the business. Problem Statement or Business Problem: In this project, we will perform one of the most essential applications of machine learning – Customer Segmentation. We will implement customer segmentation in Apache Spark and Scala, which is useful whenever you need to find your best customers. Customer Segmentation is one of the most important applications of unsupervised…

Machine Learning Project – Creating Movies Recommendation Engine using Apache Spark

Movies are loved by everyone irrespective of age, gender, race, color, or geographical location. A recommendation system is a filtering program whose prime goal is to predict the “rating” or “preference” of a user towards a domain-specific item. Recommendation systems encompass a class of techniques and algorithms that can suggest “relevant” items to users. They predict future behavior based on past data through a multitude of techniques. Problem Statement or Business Problem: In this project, we will generate the top 10 movie recommendations for each user as well as the top 10 user recommendations for each movie. Attribute Information…
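The "top 10 per user" and "top 10 per movie" step described above can be sketched in plain Scala, assuming predicted scores are already available. The `TopNSketch` object, the `Prediction` case class, and the sample scores are hypothetical names and data introduced for illustration; the sketch does not reproduce how the project itself computes its scores.

```scala
object TopNSketch {
  // Hypothetical record: one predicted score for a (user, movie) pair.
  final case class Prediction(userId: Int, movieId: Int, score: Double)

  // For each user, keep the n highest-scoring movie ids, best first.
  def topMoviesPerUser(preds: Seq[Prediction], n: Int): Map[Int, Seq[Int]] =
    preds.groupBy(_.userId).map { case (user, ps) =>
      user -> ps.sortBy(p => -p.score).take(n).map(_.movieId)
    }

  // For each movie, keep the n highest-scoring user ids, best first.
  def topUsersPerMovie(preds: Seq[Prediction], n: Int): Map[Int, Seq[Int]] =
    preds.groupBy(_.movieId).map { case (movie, ps) =>
      movie -> ps.sortBy(p => -p.score).take(n).map(_.userId)
    }

  def main(args: Array[String]): Unit = {
    // Made-up scores, only to exercise the two functions.
    val preds = Seq(
      Prediction(1, 10, 4.5), Prediction(1, 11, 3.0), Prediction(1, 12, 5.0),
      Prediction(2, 10, 2.0), Prediction(2, 12, 4.0)
    )
    println(topMoviesPerUser(preds, 2))
    println(topUsersPerMovie(preds, 10))
  }
}
```

With `n = 10` the same two functions give the two rankings named in the problem statement; users or movies with fewer than ten scored pairs simply get a shorter list.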

Machine Learning Project on Sales Prediction or Sale Forecast

Sales forecasting is the process of estimating future sales. Accurate sales forecasts enable companies to make informed business decisions and predict short-term and long-term performance. Companies can base their forecasts on past sales data, industry-wide comparisons, and economic trends. It is easier for established companies to predict future sales based on years of past business data. Newly founded companies have to base their forecasts on less-verified information, such as market research and competitive intelligence to forecast their future business. Sales forecasting gives insight into how a company should manage its workforce, cash flow, and resources. In addition to helping a…

Machine Learning Project on Mushroom Classification whether it’s edible or poisonous Part 1

A mushroom, or toadstool, is the fleshy, spore-bearing fruiting body of a fungus, typically produced above ground on soil or on its food source. Problem Statement or Business Problem: In this project, looking at the various properties of a mushroom, we will predict whether the mushroom is edible or poisonous. Attribute Information or Dataset Details: To make them easier to read, the properties are listed one by one.
classes: edible=e, poisonous=p
cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
bruises: bruises=t, no=f
odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
gill-attachment: attached=a, descending=d, free=f, notched=n
gill-spacing: close=c, crowded=w, distant=d
gill-size: broad=b, narrow=n
gill-color: black=k, brown=n, buff=b, chocolate=h, gray=g,…

Machine Learning Project on Mushroom Classification whether it’s edible or poisonous Part 2

Collecting all String Columns into an Array

    %scala
    var StringfeatureCol = Array("class", "capshape", "capsurface", "capcolor", "bruises", "odor",
      "gillattachment", "gillspacing", "gillsize", "gillcolor", "stalkshape", "stalkroot",
      "stalksurfaceabovering", "stalksurfacebelowring", "stalkcolorabovering", "stalkcolorbelowring",
      "veiltype", "veilcolor", "ringnumber", "ringtype", "sporeprintcolor", "population", "habitat")

StringIndexer encodes a string column of labels to a column of label indices.

Example of StringIndexer

    %scala
    import org.apache.spark.ml.feature.StringIndexer

    val df = spark.createDataFrame(
      Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c"))
    ).toDF("id", "category")
    df.show()

    val indexer = new StringIndexer()
      .setInputCol("category")
      .setOutputCol("categoryIndex")

    val indexed = indexer.fit(df).transform(df)
    indexed.show()

Output:

    +---+--------+
    | id|category|
    +---+--------+
    |  0|       a|
    |  1|       b|
    |  2|       c|
    |  3|…
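The indexing rule behind the StringIndexer example can be sketched without Spark. With its default ordering, StringIndexer gives the most frequent label index 0; the plain-Scala sketch below mimics that rule for illustration only and is not Spark's implementation. `LabelIndexSketch` and `fitIndex` are hypothetical names, and the alphabetical tie-break is an assumption (the sample data has no ties).

```scala
object LabelIndexSketch {
  // Map each distinct label to an index: the most frequent label gets 0.0,
  // the next most frequent 1.0, and so on.
  // Assumption: labels with equal counts are ordered alphabetically.
  def fitIndex(labels: Seq[String]): Map[String, Double] =
    labels
      .groupBy(identity)
      .map { case (label, occurrences) => (label, occurrences.size) }
      .toSeq
      .sortBy { case (label, count) => (-count, label) }
      .zipWithIndex
      .map { case ((label, _), idx) => label -> idx.toDouble }
      .toMap

  def main(args: Array[String]): Unit = {
    // Same category values as the DataFrame in the example.
    val categories = Seq("a", "b", "c", "a", "a", "c")
    val index = fitIndex(categories)
    // Print id, category, and the index assigned to that category.
    categories.zipWithIndex.foreach { case (category, id) =>
      println(s"$id $category ${index(category)}")
    }
  }
}
```

Here "a" occurs three times, "c" twice, and "b" once, so the sketch assigns a to 0.0, c to 1.0, and b to 2.0.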

Machine Learning Pipeline Application on Power Plant. (Part 1)

This is an end-to-end project that performs Extract-Transform-Load and Exploratory Data Analysis on a real-world dataset, and then applies several different machine learning algorithms to solve a supervised regression problem on the dataset. Our goal is to accurately predict power output given a set of environmental readings from various sensors in a natural gas-fired power generation plant. Background: Power generation is a complex process, and understanding and predicting power output is an important element in managing a plant and its connection to the power grid. The operators of a regional power grid create predictions of power demand based on historical…