Data Visualization

Machine Learning Project on Mushroom Classification: Whether It’s Edible or Poisonous, Part 2

Collecting all String columns into an array:

%scala
val StringfeatureCol = Array("class", "capshape", "capsurface", "capcolor", "bruises", "odor",
  "gillattachment", "gillspacing", "gillsize", "gillcolor", "stalkshape", "stalkroot",
  "stalksurfaceabovering", "stalksurfacebelowring", "stalkcolorabovering", "stalkcolorbelowring",
  "veiltype", "veilcolor", "ringnumber", "ringtype", "sporeprintcolor", "population", "habitat")

StringIndexer encodes a string column of labels to a column of label indices.

Example of StringIndexer:

%scala
import org.apache.spark.ml.feature.StringIndexer

val df = spark.createDataFrame(
  Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c"))
).toDF("id", "category")
df.show()

val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")

val indexed = indexer.fit(df).transform(df)
indexed.show()

Output:
+---+--------+
| id|category|
+---+--------+
|  0|       a|
|  1|       b|
|  2|       c|
|  3|…
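To make the indexing rule concrete, here is a small plain-Scala sketch (no Spark dependency) of what StringIndexer does with its default "frequencyDesc" ordering: the most frequent label gets index 0.0, the next most frequent 1.0, and so on. It assumes ties are broken alphabetically, as recent Spark versions do; `FrequencyIndexer` and its methods are hypothetical names for illustration, not part of Spark's API.

```scala
// Hypothetical plain-Scala illustration of StringIndexer's default
// "frequencyDesc" ordering (not Spark's implementation):
// most frequent label -> 0.0, ties assumed to be broken alphabetically.
object FrequencyIndexer {

  // "fit": learn a label -> index mapping from the observed values
  def fit(values: Seq[String]): Map[String, Double] = {
    val counts: Map[String, Int] =
      values.groupBy(identity).map { case (label, occurrences) => label -> occurrences.size }
    counts.toSeq
      .sortBy { case (label, count) => (-count, label) } // frequency desc, then alphabetical
      .map(_._1)
      .zipWithIndex
      .map { case (label, idx) => label -> idx.toDouble }
      .toMap
  }

  // "transform": replace every value with its learned index
  // (an unseen label throws NoSuchElementException)
  def transform(values: Seq[String], mapping: Map[String, Double]): Seq[Double] =
    values.map(mapping)
}
```

For the six-row `category` column above, `fit` maps a to 0.0, c to 1.0 and b to 2.0, so the `categoryIndex` column would read 0.0, 2.0, 1.0, 0.0, 0.0, 1.0. An unseen label makes `transform` throw, which loosely mirrors StringIndexer's default `handleInvalid = "error"` setting.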
Machine Learning Pipeline Application on a Power Plant (Part 1)

This is an end-to-end project that performs Extract-Transform-Load (ETL) and Exploratory Data Analysis (EDA) on a real-world dataset and then applies several different machine learning algorithms to solve a supervised regression problem on that dataset. Our goal is to accurately predict power output from a set of environmental readings taken by various sensors in a natural gas-fired power generation plant.

Background

Power generation is a complex process, and understanding and predicting power output is an important element in managing a plant and its connection to the power grid. The operators of a regional power grid create predictions of power demand based on historical…
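Since the end goal is a supervised regression, a minimal plain-Scala sketch of the simplest such model may help: ordinary least squares with a single feature, fitted in closed form, plus the root mean squared error (RMSE) often used to score regression models. The object name `SimpleOLS` and the toy inputs in the usage are assumptions for illustration; this is not how Spark MLlib's LinearRegression is implemented.

```scala
// Hypothetical sketch: ordinary least squares with one feature,
// fitting y ≈ intercept + slope * x in closed form. Not Spark MLlib code.
object SimpleOLS {

  final case class Fit(intercept: Double, slope: Double) {
    def predict(x: Double): Double = intercept + slope * x
  }

  def fit(xs: Seq[Double], ys: Seq[Double]): Fit = {
    require(xs.size == ys.size && xs.size >= 2, "need at least two paired points")
    val n = xs.size.toDouble
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    // slope = sum((x - meanX) * (y - meanY)) / sum((x - meanX)^2)
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    require(sxx != 0.0, "feature has zero variance")
    val slope = sxy / sxx
    // the fitted line always passes through (meanX, meanY)
    Fit(meanY - slope * meanX, slope)
  }

  // Root mean squared error of the fitted line on the given points
  def rmse(fit: Fit, xs: Seq[Double], ys: Seq[Double]): Double =
    math.sqrt(xs.zip(ys).map { case (x, y) => math.pow(fit.predict(x) - y, 2) }.sum / xs.size)
}
```

With toy points that lie exactly on y = 2x + 1, `SimpleOLS.fit(Seq(1.0, 2.0, 3.0, 4.0), Seq(3.0, 5.0, 7.0, 9.0))` recovers slope 2 and intercept 1 with an RMSE of 0; real sensor data will of course leave a non-zero error.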
Machine Learning Pipeline Application on a Power Plant (Part 2)

Visualize Your Data

To understand our data, we will look for correlations between the features and the label. This can be important when choosing a model. For example, if the features and the label are linearly correlated, a linear model such as Linear Regression can do well; if the relationship is strongly non-linear, more complex models such as Decision Trees may do better. We can use Databricks' built-in visualization to plot each predictor against the label column as a scatter plot and see how each predictor correlates with the label. Exploratory Data Analysis (EDA) is an approach/philosophy for…
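The scatter plots show correlation visually; the Pearson coefficient r puts a number on it. Values near +1 or -1 indicate a strong linear relationship, while a value near 0 rules out only a linear one (a strongly non-linear relationship can still exist, which is where tree-based models help). Below is a minimal plain-Scala sketch; in Spark itself, `df.stat.corr(col1, col2)` returns the Pearson coefficient for two numeric columns. The `Correlation` object is a hypothetical helper.

```scala
// Hypothetical plain-Scala helper: Pearson correlation coefficient r in [-1, 1]
object Correlation {

  def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.size == ys.size && xs.nonEmpty, "need equally sized, non-empty columns")
    val n = xs.size.toDouble
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    // co-variation of the two columns around their means
    val cov = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    // spread of each column around its own mean
    val sx = math.sqrt(xs.map(x => (x - meanX) * (x - meanX)).sum)
    val sy = math.sqrt(ys.map(y => (y - meanY) * (y - meanY)).sum)
    require(sx > 0.0 && sy > 0.0, "correlation is undefined for a constant column")
    cov / (sx * sy)
  }
}
```

A predictor that rises in lockstep with the label gives r close to 1.0, and one that falls in lockstep gives r close to -1.0.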
Machine Learning Project – Predict Forest Cover Part 1

In this project, we will predict forest cover type based on various attributes (cartographic variables) of the forest. Hence, this is a classification problem.

Problem Statement or Business Problem

In this project, we'll predict forest cover type based on various attributes (cartographic variables) of the forest, which makes this a classification problem.

Attribute Information or Dataset Details:

Given below are the attribute name, attribute type, measurement unit, and a brief description. The forest cover type is the target of the classification problem. The order of this listing corresponds to the order of numerals along the rows of the database.

Name | Data Type | Measurement | Description
Elevation | quantitative | meters | Elevation in meters
Aspect | quantitative | azimuth | Aspect in degrees…
Machine Learning Project – Predict Forest Cover Part 2

Define the Pipeline

A predictive model often requires multiple stages of feature preparation. A pipeline consists of a series of transformer and estimator stages that typically prepare a DataFrame for modeling and then train a predictive model.

Split the Data

It is common practice when building machine learning models to split the source data, using some of it to train the model and reserving some to test the trained model. In this project, you will use 70% of the data for training and reserve 30% for testing.

%scala
val splits = ForestDF.randomSplit(Array(0.7, 0.3))
val train = splits(0)
val test =…
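Note that `randomSplit` assigns rows at random, so the 70/30 proportions are approximate and the split can change between runs unless a seed is passed, for example `ForestDF.randomSplit(Array(0.7, 0.3), 42L)`. The plain-Scala sketch below imitates that per-row behaviour under a fixed seed; `RandomSplit.split` is a hypothetical helper, not Spark code.

```scala
import scala.util.Random

// Hypothetical plain-Scala imitation of a seeded 70/30 random split:
// each row independently goes to train with probability trainFraction.
object RandomSplit {

  def split[A](rows: Seq[A], trainFraction: Double, seed: Long): (Seq[A], Seq[A]) = {
    require(trainFraction >= 0.0 && trainFraction <= 1.0, "trainFraction must be in [0, 1]")
    val rng = new Random(seed)
    // one uniform draw per row; the same seed gives the same draws
    val draws = Seq.fill(rows.size)(rng.nextDouble())
    val (train, test) = rows.zip(draws).partition { case (_, d) => d < trainFraction }
    (train.map(_._1), test.map(_._1))
  }
}
```

Because every row is drawn independently, splitting 1,000 rows with a 0.7 fraction yields roughly, not exactly, 700 training rows, and calling it again with the same seed reproduces the same split.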
Machine Learning Project: Predict Whether It Will Rain Tomorrow in Australia

Machine Learning Project for Predicting Whether It Will Rain Tomorrow in Australia

Problem Statement or Business Problem

In this project we will be working with a dataset indicating whether it rained the next day in Australia (Yes or No). This target column is Yes if the rainfall for that day was 1 mm or more. We will try to create a model that predicts this value from the available data.

Attribute Information or Dataset Details:

Date - The date of observation
Location - The common name of the location of the weather station
MinTemp - The minimum temperature in degrees Celsius
MaxTemp - The maximum temperature in…
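The target definition above can be written down directly. This plain-Scala sketch assumes the rainfall amount is available in millimetres per day; it turns that amount into the Yes/No flag using the 1 mm threshold, converts Yes/No into the 1.0/0.0 numeric label a classifier needs, and builds a next-day target by shifting one day forward. `RainLabel` and its methods are hypothetical, and the shift assumes consecutive days for a single location with no gaps.

```scala
// Hypothetical plain-Scala helpers for the rain target (not part of the dataset or Spark)
object RainLabel {

  // Threshold from the project description: a day is rainy at 1 mm or more
  val RainyThresholdMm: Double = 1.0

  // Yes/No flag for one day's rainfall in millimetres
  def rainFlag(rainfallMm: Double): String =
    if (rainfallMm >= RainyThresholdMm) "Yes" else "No"

  // Numeric label expected by most classifiers: Yes -> 1.0, No -> 0.0
  def toLabel(flag: String): Double = flag.trim match {
    case "Yes" => 1.0
    case "No"  => 0.0
    case other => throw new IllegalArgumentException(s"unexpected flag: $other")
  }

  // Target for day i is the flag of day i + 1 (assumes consecutive days, one location);
  // the last day has no "tomorrow" and is dropped
  def rainTomorrow(dailyRainfallMm: Seq[Double]): Seq[String] =
    dailyRainfallMm.drop(1).map(rainFlag)
}
```

For three consecutive days with 0.0, 2.5 and 0.2 mm of rain, `rainTomorrow` gives Yes for the first day and No for the second, and the last day gets no target.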
Predict Ads Click – Practice Data Analysis and Logistic Regression Prediction

Machine Learning Project to Predict Ad Clicks Based on the Available Attributes

Problem Statement or Business Problem

In this project we will be working with a dataset indicating whether or not a particular internet user clicked on an advertisement. We will try to create a model that predicts whether or not a user will click on an ad based on that user's features.

Attribute Information or Dataset Details:

'Daily Time Spent on Site': consumer time on site in minutes
'Age': customer age in years
'Area Income': average income of the consumer's geographical area
'Daily Internet Usage': average minutes a day…
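Since the model here is logistic regression, a small plain-Scala sketch of how it turns a user's features into a click probability may help: a weighted sum of the features passed through the sigmoid function, then compared with a threshold (0.5 is the default threshold in Spark ML's LogisticRegression). The object name and the weights used in the usage are made up for illustration; a real model learns its weights from the data.

```scala
// Hypothetical plain-Scala sketch of logistic regression prediction (not Spark code)
object LogisticSketch {

  // Squashes any real number into (0, 1)
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // P(click = 1 | features) = sigmoid(intercept + sum of weight * feature)
  def probability(intercept: Double, weights: Seq[Double], features: Seq[Double]): Double = {
    require(weights.size == features.size, "one weight per feature")
    val z = intercept + weights.zip(features).map { case (w, x) => w * x }.sum
    sigmoid(z)
  }

  // Predict 1.0 (clicked) when the probability exceeds the threshold, else 0.0
  def predict(p: Double, threshold: Double = 0.5): Double =
    if (p > threshold) 1.0 else 0.0
}
```

A weighted sum of exactly 0 maps to a probability of 0.5, which sits on the decision boundary; larger positive sums push the probability towards 1 and a "clicked" prediction.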
Drug Classification Part 1

For a beginner in machine learning, this is a great opportunity to try some techniques for predicting which drug is likely to be suitable for a patient.

Problem Statement or Business Problem

The target feature is Drug type.

The feature sets are:
Age
Sex
Blood Pressure Levels (BP)
Cholesterol Levels
Na to Potassium Ratio

The main challenge here is not just the feature and target sets but also the approach a beginner takes to solving this type of problem. So best of luck.

Attribute Information or Dataset Details:
Age
Sex
BP
Cholesterol
Na_to_K
Drug

Technology Used
Apache Spark
Spark SQL
Apache Spark MLlib
Scala
DataFrame-based…
Drug Classification Part 2

Split the Data

It is common practice when building machine learning models to split the source data, using some of it to train the model and reserving some to test the trained model. In this project, you will use 70% of the data for training and reserve 30% for testing.

%scala
val splits = DrugsFinalDF.randomSplit(Array(0.7, 0.3))
val train = splits(0)
val test = splits(1)
val train_rows = train.count()
val test_rows = test.count()
println("Training Rows: " + train_rows + " Testing Rows: " + test_rows)

Prepare the Training Data

To train the Classification model, you need a training data set that…
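Preparing the training data mostly means turning each row into a single numeric feature vector, which is what Spark's VectorAssembler does once the string columns have been indexed. The plain-Scala sketch below assembles one patient record that way. It assumes the category values F/M, HIGH/LOW/NORMAL and HIGH/NORMAL for Sex, BP and Cholesterol; the `Patient` class and the index maps are hypothetical stand-ins for the real DataFrame and fitted StringIndexers.

```scala
// Hypothetical plain-Scala sketch of assembling a numeric feature vector (not Spark code)
object FeatureAssembly {

  // Raw values of the Drug dataset's feature columns (the Drug column is the label)
  final case class Patient(age: Int, sex: String, bp: String, cholesterol: String, naToK: Double)

  // Hypothetical category -> index maps; in Spark these would come from fitted StringIndexers
  val sexIndex: Map[String, Double] = Map("F" -> 0.0, "M" -> 1.0)
  val bpIndex: Map[String, Double] = Map("HIGH" -> 0.0, "LOW" -> 1.0, "NORMAL" -> 2.0)
  val cholesterolIndex: Map[String, Double] = Map("HIGH" -> 0.0, "NORMAL" -> 1.0)

  // Concatenate numeric and indexed columns into one feature vector, in a fixed column order
  def assemble(p: Patient): Vector[Double] = Vector(
    p.age.toDouble,
    sexIndex(p.sex),
    bpIndex(p.bp),
    cholesterolIndex(p.cholesterol),
    p.naToK
  )
}
```

With these hypothetical maps, an illustrative patient aged 40, sex M, BP LOW, cholesterol NORMAL and Na_to_K 12.5 becomes the vector (40.0, 1.0, 1.0, 1.0, 12.5). The column order must stay the same for every row, because the model only sees positions, not names.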
Prediction task is to determine whether a person makes over 50K a year Part 1

Machine Learning Project to Predict Whether a Person Makes Over 50K a Year

Problem Statement or Business Problem

The prediction task is to determine whether a person makes over 50K a year (income classification).

Attribute Information or Dataset Details:
age: integer
workclass: string
fnlwgt: integer
education: string
education-num: integer
marital-status: string
occupation: string
relationship: string
race: string
sex: string
capital-gain: integer
capital-loss: integer
hours-per-week: integer
native-country: string
income: string

Technology Used
Apache Spark
Spark SQL
Apache Spark MLlib
Scala
DataFrame-based API
Databricks Notebook

Challenges
Convert string data to numeric format so we can process the data with the Apache Spark ML library.

Introduction
Welcome to this project on predicting whether a person makes over 50K a year in Apache Spark Machine Learning…
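One common way to handle the string-to-numeric challenge for nominal columns such as workclass or race is to index them first and then one-hot encode the index, so the model does not read a false ordering into the index values. A minimal plain-Scala sketch of one-hot encoding follows. It mirrors the `dropLast` option of Spark ML's OneHotEncoder (default true), but returns a dense vector where Spark produces a sparse one; `OneHot.encode` is a hypothetical helper.

```scala
// Hypothetical plain-Scala sketch of one-hot encoding a category index (not Spark code)
object OneHot {

  // Encode index `idx` out of `numCategories` categories.
  // With dropLast = true the last category becomes the all-zeros vector,
  // so the output has numCategories - 1 slots.
  def encode(idx: Int, numCategories: Int, dropLast: Boolean = true): Vector[Double] = {
    require(idx >= 0 && idx < numCategories, s"index $idx out of range")
    val size = if (dropLast) numCategories - 1 else numCategories
    Vector.tabulate(size)(i => if (i == idx) 1.0 else 0.0)
  }
}
```

For a column with three categories, index 0 becomes (1.0, 0.0) and index 2 becomes (0.0, 0.0) with the default `dropLast`; with `dropLast = false`, index 2 becomes (0.0, 0.0, 1.0). Dropping the last slot avoids a column that is fully determined by the others, which matters for linear models such as logistic regression.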