This is an end-to-end project that performs Extract-Transform-Load (ETL) and Exploratory Data Analysis (EDA) on a real-world dataset, then applies several machine learning algorithms to solve a supervised regression problem on that dataset. Our goal is to accurately predict power output given a set of environmental readings from various sensors in a natural gas-fired power generation plant.
Background
Power generation is a complex process, and understanding and predicting power output is an important element in managing a plant and its connection to the power grid. The operators of a regional power grid create predictions of power demand based on historical…
Visualize Your Data
To understand our data, we will look for correlations between the features and the label. This can be important when choosing a model. For example, if the features and the label are linearly correlated, a linear model such as Linear Regression can do well; if the relationship is strongly non-linear, more complex models such as Decision Trees can be a better fit. We can use Databricks' built-in visualization to plot each predictor against the label column as a scatter plot and see how the two correlate. Exploratory Data Analysis (EDA) is an approach/philosophy for…
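The linear-correlation check described above can also be done numerically rather than by eye. The following is a minimal sketch in plain Scala, not the notebook's own code: the object name `CorrelationSketch`, the function `pearson`, and the sample values in the usage note are all hypothetical. On a Spark DataFrame, `df.stat.corr(col1, col2)` returns the same coefficient for two numeric columns.

```scala
object CorrelationSketch {
  /** Pearson correlation coefficient between two equally sized samples.
    * Values near +1 or -1 suggest a linear relationship; values near 0 suggest none.
    * Returns NaN if either sample is constant (zero variance). */
  def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.size == ys.size && xs.size > 1,
      "need two samples of equal size with at least 2 points")
    val n = xs.size
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    // Sum of products of deviations from the means (unnormalized covariance)
    val cov = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    // Sums of squared deviations (unnormalized variances)
    val varX = xs.map(x => math.pow(x - meanX, 2)).sum
    val varY = ys.map(y => math.pow(y - meanY, 2)).sum
    cov / math.sqrt(varX * varY)
  }
}
```

For example, `CorrelationSketch.pearson(Seq(1.0, 2.0, 3.0, 4.0), Seq(2.0, 4.0, 6.0, 8.0))` measures a perfectly linear pair of made-up series. A coefficient close to zero does not rule out a non-linear relationship, which is why the scatter plots remain useful.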
In this project, we will predict Forest Cover based on various attributes (cartographic variables) of the forest. Hence, this is a classification problem.
Problem Statement or Business Problem
In this project, we will predict Forest Cover based on various attributes (cartographic variables) of the forest. Hence, this is a classification problem.
Attribute Information or Dataset Details
Given are the attribute name, data type, measurement unit, and a brief description. The forest cover type is the classification target. The order of this listing corresponds to the order of numerals along the rows of the database.
Name | Data Type | Measurement | Description
Elevation | quantitative | meters | Elevation in meters
Aspect | quantitative | azimuth | Aspect in degrees…
Define the Pipeline
A predictive model often requires multiple stages of feature preparation. A pipeline consists of a series of transformer and estimator stages that typically prepare a DataFrame for modeling and then train a predictive model.
Split the Data
It is common practice when building machine learning models to split the source data, using some of it to train the model and reserving some to test the trained model. In this project, you will use 70% of the data for training, and reserve 30% for testing.
%scala
val splits = ForestDF.randomSplit(Array(0.7, 0.3))
val train = splits(0)
val test =…
Machine Learning Project for Predicting Whether It Will Rain Tomorrow in Australia
Problem Statement or Business Problem
In this project we will work with a data set indicating whether it rained the next day in Australia (Yes or No). This column is Yes if the rain for that day was 1 mm or more. We will try to create a model that predicts this from the available data.
Attribute Information or Dataset Details
Date - The date of observation
Location - The common name of the location of the weather station
MinTemp - The minimum temperature in degrees Celsius
MaxTemp - The…
Machine Learning Project to Predict Ad Clicks Based on the Available Attributes
Problem Statement or Business Problem
In this project we will work with a data set indicating whether or not a particular internet user clicked on an advertisement. We will try to create a model that predicts whether or not a user will click on an ad based on the features of that user.
Attribute Information or Dataset Details
'Daily Time Spent on Site': consumer time on site in minutes
'Age': customer age in years
'Area Income': Avg. income of geographical area of consumer
'Daily Internet Usage': Avg.…
For a beginner in machine learning, this is a great opportunity to try some techniques for predicting which drug might be appropriate for a patient.
Problem Statement or Business Problem
The target feature is Drug type. The feature sets are:
Age
Sex
Blood Pressure Levels (BP)
Cholesterol Levels
Na to Potassium Ratio
The main problem here is not just the feature sets and target sets but also the approach taken to solve these types of problems as a beginner. So best of luck.
Attribute Information or Dataset Details
Age
Sex
BP
Cholesterol…
Split the Data
It is common practice when building machine learning models to split the source data, using some of it to train the model and reserving some to test the trained model. In this project, you will use 70% of the data for training, and reserve 30% for testing.
%scala
val splits = DrugsFinalDF.randomSplit(Array(0.7, 0.3))
val train = splits(0)
val test = splits(1)
val train_rows = train.count()
val test_rows = test.count()
println("Training Rows: " + train_rows + " Testing Rows: " + test_rows)
Prepare the Training Data
To train the Classification model, you need a training data set that…
Machine Learning Project to Predict Whether a Person Makes Over 50K a Year
Problem Statement or Business Problem
The prediction task is to determine whether a person makes over 50K a year (income classification).
Attribute Information or Dataset Details
age: integer
workclass: string
fnlwgt: integer
education: string
education-num: integer
marital-status: string
occupation: string
relationship: string
race: string
sex: string
capital-gain: integer
capital-loss: integer
hours-per-week: integer
native-country: string
income: string
Technology Used
Apache Spark
Spark SQL
Apache Spark MLlib
Scala
DataFrame-based API
Databricks Notebook
Challenges
Convert string data to a numeric format so that the Apache Spark ML library can process it.
Introduction
Welcome to this project on predicting whether a person makes over 50K a year…
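The string-to-numeric challenge named above is usually handled by assigning each distinct category a numeric index. The following is a minimal sketch in plain Scala of that idea, not the project's actual code: `IndexSketch`, `fitIndex`, and `encode` are hypothetical names, and ordering labels by descending frequency is chosen here because it matches the default ordering of Spark ML's `StringIndexer`.

```scala
object IndexSketch {
  /** Build a label -> index mapping, most frequent label first.
    * Ties are broken alphabetically so the mapping is deterministic. */
  def fitIndex(values: Seq[String]): Map[String, Double] =
    values
      .groupBy(identity)                 // label -> all its occurrences
      .toSeq
      .sortBy { case (label, occurrences) => (-occurrences.size, label) }
      .map { case (label, _) => label }
      .zipWithIndex                      // position in the ordering is the index
      .map { case (label, i) => label -> i.toDouble }
      .toMap

  /** Replace each label with its index; throws on a label not seen by fitIndex. */
  def encode(values: Seq[String], index: Map[String, Double]): Seq[Double] =
    values.map(index)
}
```

A column such as `workclass` would be passed to `fitIndex` once on the training data, and the resulting mapping reused to encode both the training and the test rows, so that the same category always receives the same number.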
Split the Data
It is common practice when building machine learning models to split the source data, using some of it to train the model and reserving some to test the trained model. In this project, you will use 70% of the data for training, and reserve 30% for testing.
%scala
val splits = IncomeFinalDF.randomSplit(Array(0.7, 0.3))
val train = splits(0)
val test = splits(1)
val train_rows = train.count()
val test_rows = test.count()
println("Training Rows: " + train_rows + " Testing Rows: " + test_rows)
Prepare the Training Data
To train the Classification model, you need a training data set that…
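The 70/30 split in these excerpts is called without a seed, so the row counts can differ from run to run. The following is a minimal sketch in plain Scala of a reproducible split, assuming the rows fit in a local collection: `SplitSketch` and `trainTestSplit` are hypothetical names, not part of the notebooks. In Spark itself, `randomSplit` has an overload that takes a seed as a second argument, which serves the same purpose on a DataFrame.

```scala
import scala.util.Random

object SplitSketch {
  /** Shuffle the rows with a fixed seed, then cut at trainFraction.
    * The same seed and input always yield the same (train, test) pair. */
  def trainTestSplit[A](rows: Seq[A], trainFraction: Double, seed: Long): (Seq[A], Seq[A]) = {
    require(trainFraction > 0.0 && trainFraction < 1.0,
      "trainFraction must be strictly between 0 and 1")
    val shuffled = new Random(seed).shuffle(rows)
    // Number of rows that go to the training set
    val cut = math.round(rows.size * trainFraction).toInt
    shuffled.splitAt(cut)
  }
}
```

Unlike this sketch, which cuts at an exact position, Spark's `randomSplit` samples rows according to the weights, so the resulting sizes are only approximately 70% and 30%.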