Scala

Predict Ads Click – Practice Data Analysis and Logistic Regression Prediction

Machine learning project to predict ad clicks based on the available attributes. Problem Statement or Business Problem: In this project we will work with a data set indicating whether or not a particular internet user clicked on an advertisement. We will try to create a model that predicts whether or not a user will click on an ad based on that user's features. Attribute Information or Dataset Details: 'Daily Time Spent on Site': consumer time on site in minutes; 'Age': customer age in years; 'Area Income': average income of the consumer's geographical area; 'Daily Internet Usage': Avg. minutes a day…

Drug Classification Part 1

For a beginner in machine learning, this is a great opportunity to try some techniques for predicting which drug might be suitable for a patient. Problem Statement or Business Problem: The target feature is Drug type. The feature sets are: Age, Sex, Blood Pressure Levels (BP), Cholesterol Levels, Na to Potassium Ratio. The main problem here is not just the feature sets and target sets but also the approach a beginner takes in solving these types of problems. So best of luck. Attribute Information or Dataset Details: Age, Sex, BP, Cholesterol, Na_to_K, Drug. Technology Used: Apache Spark, Spark SQL, Apache Spark MLlib, Scala, DataFrame-based…

Drug Classification Part 2

Split the Data: It is common practice when building machine learning models to split the source data, using some of it to train the model and reserving some to test the trained model. In this project, you will use 70% of the data for training and reserve 30% for testing.

%scala
val splits = DrugsFinalDF.randomSplit(Array(0.7, 0.3))
val train = splits(0)
val test = splits(1)
val train_rows = train.count()
val test_rows = test.count()
println("Training Rows: " + train_rows + " Testing Rows: " + test_rows)

Prepare the Training Data: To train the Classification model, you need a training data set that…
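The 70/30 split described in this excerpt can be illustrated without Spark. The following is a minimal plain-Scala sketch, not the post's code: `SplitSketch` and its `randomSplit` helper are hypothetical names introduced here, and the sketch assumes each row is assigned to the training set independently with probability 0.7. Under that assumption the two parts are only approximately 70% and 30% of the rows, and fixing the seed makes the split repeatable.

```scala
import scala.util.Random

// Hypothetical helper for illustration only; not part of Spark or the post.
object SplitSketch {
  // Tag every row once with a random draw, then separate the tagged rows.
  // Each row lands in the training part with probability trainWeight.
  def randomSplit[A](rows: Vector[A], trainWeight: Double, seed: Long): (Vector[A], Vector[A]) = {
    val rng = new Random(seed) // fixed seed: same split on every run
    val tagged = rows.map(row => (row, rng.nextDouble() < trainWeight))
    val (trainTagged, testTagged) = tagged.partition(_._2)
    (trainTagged.map(_._1), testTagged.map(_._1))
  }

  def main(args: Array[String]): Unit = {
    val (train, test) = randomSplit((1 to 1000).toVector, 0.7, 42L)
    // The counts are close to 700 and 300, but not guaranteed to be exact.
    println("Training Rows: " + train.size + " Testing Rows: " + test.size)
  }
}
```

Spark's `randomSplit` also accepts a seed as a second argument, for example `randomSplit(Array(0.7, 0.3), 42L)`, which helps when the printed row counts need to be reproducible between runs.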

Prediction task is to determine whether a person makes over 50K a year Part 1

Machine learning project to predict whether a person makes over 50K a year. Problem Statement or Business Problem: The prediction task is to determine whether a person makes over 50K a year (income classification). Attribute Information or Dataset Details: age: integer; workclass: string; fnlwgt: integer; education: string; education-num: integer; marital-status: string; occupation: string; relationship: string; race: string; sex: string; capital-gain: integer; capital-loss: integer; hours-per-week: integer; native-country: string; income: string. Technology Used: Apache Spark, Spark SQL, Apache Spark MLlib, Scala, DataFrame-based API, Databricks Notebook. Challenges: Convert String data to numeric format so we can process the data in the Apache Spark ML library. Introduction: Welcome to this project on predicting whether a person makes over 50K a year in Apache Spark Machine Learning…

Prediction task is to determine whether a person makes over 50K a year Part 2

Split the Data: It is common practice when building machine learning models to split the source data, using some of it to train the model and reserving some to test the trained model. In this project, you will use 70% of the data for training and reserve 30% for testing.

%scala
val splits = IncomeFinalDF.randomSplit(Array(0.7, 0.3))
val train = splits(0)
val test = splits(1)
val train_rows = train.count()
val test_rows = test.count()
println("Training Rows: " + train_rows + " Testing Rows: " + test_rows)

Prepare the Training Data: To train the Classification model, you need a training data set that…

Classifying gender based on personal preferences Part 1

Gender is a social construct. The way males and females are treated differently from birth moulds their behaviour and personal preferences into what society expects for their gender. This small dataset is designed to give an idea of whether a person's gender can be predicted with an accuracy significantly above 50% based on their personal preferences. Attribute Information or Dataset Details: FavoriteColor, FavoriteMusicGenre, FavoriteBeverage, FavoriteSoftDrink, Gender. Technology Used: Apache Spark, Spark SQL, Apache Spark MLlib, Scala, DataFrame-based API, Databricks Notebook. Introduction: Welcome to this project on classifying gender based on personal preferences in Apache Spark Machine Learning using the Databricks platform community edition server, which allows you to execute your Spark code, free…

Classifying gender based on personal preferences Part 2

Collecting all String Columns into an Array

%scala
var StringfeatureCol = Array("FavoriteColor", "FavoriteMusicGenre", "FavoriteBeverage", "FavoriteSoftDrink", "Gender");

StringIndexer encodes a string column of labels to a column of label indices.

Example of StringIndexer

%scala
import org.apache.spark.ml.feature.StringIndexer

val df = spark.createDataFrame(
  Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c"))
).toDF("id", "category")

df.show()

val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")

val indexed = indexer.fit(df).transform(df)
indexed.show()

Output:

+---+--------+
| id|category|
+---+--------+
|  0|       a|
|  1|       b|
|  2|       c|
|  3|       a|
|  4|       a|
|  5|       c|
+---+--------+

+---+--------+-------------+
| id|category|categoryIndex|
+---+--------+-------------+
|  0|       a|          0.0|
|  1|…
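To make the indexing rule concrete, here is a minimal plain-Scala sketch of how label indices can be assigned by descending frequency, which is StringIndexer's default ordering. It is not Spark's implementation: `StringIndexSketch` and `fitIndex` are hypothetical names introduced for illustration, and breaking frequency ties alphabetically is an assumption made here to keep the result deterministic.

```scala
// Hypothetical helper for illustration only; not Spark's StringIndexer.
object StringIndexSketch {
  // Assign 0.0 to the most frequent label, 1.0 to the next, and so on.
  // Assumption: labels with equal counts are ordered alphabetically.
  def fitIndex(labels: Seq[String]): Map[String, Double] =
    labels
      .groupBy(identity)
      .toSeq
      .map { case (label, occurrences) => (label, occurrences.size) }
      .sortBy { case (label, count) => (-count, label) }
      .zipWithIndex
      .map { case ((label, _), index) => label -> index.toDouble }
      .toMap

  def main(args: Array[String]): Unit = {
    // Same category values as the DataFrame in the example above.
    val categories = Seq("a", "b", "c", "a", "a", "c")
    val index = fitIndex(categories)
    categories.foreach(c => println(s"$c -> ${index(c)}"))
  }
}
```

With the six category values from the example, "a" occurs three times, "c" twice, and "b" once, so the sketch maps "a" to 0.0, "c" to 1.0, and "b" to 2.0.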

Mobile Price Classification

Machine learning project for mobile price classification based on the available attributes. Problem Statement or Business Problem: Bob has started his own mobile company. He wants to give a tough fight to big companies like Apple and Samsung. He does not know how to estimate the price of the mobiles his company creates. In this competitive mobile phone market, you cannot simply assume things. To solve this problem he collects sales data on mobile phones from various companies. Bob wants to find some relation between the features of a mobile phone (e.g., RAM, internal memory) and its selling price. But he…

Predicting the Cellular Localization Sites of Proteins in Yeast

Machine learning project for predicting the cellular localization sites of proteins in yeast based on the available attributes. Data Set Information: Sequence Name: accession number for the SWISS-PROT database; mcg: McGeoch's method for signal sequence recognition; gvh: von Heijne's method for signal sequence recognition; alm: score of the ALOM membrane spanning region prediction program; mit: score of discriminant analysis of the amino acid content of the N-terminal region (20 residues long) of mitochondrial and non-mitochondrial proteins; erl: presence of "HDEL" substring (thought to act as a signal for retention in the endoplasmic reticulum lumen), binary attribute; pox: peroxisomal targeting signal in the C-terminus; vac: Score of discriminant analysis…

YouTube Spam Comment Prediction

Machine learning project for YouTube spam comment prediction: a study of classifying YouTube comments as spam based on the available attributes. Data Set Information: COMMENT_ID: String; AUTHOR: String; DATE: String; CONTENT: String; CLASS: Double. Technology Used: Apache Spark, Spark SQL, Apache Spark MLlib, Scala, DataFrame-based API, Databricks Notebook. Challenges: Process comma-separated values files (i.e., files with a .csv extension) with a user-defined schema for the data. Convert String data to numeric format so we can process the data in the Apache Spark ML library. Introduction: Welcome to this project on creating a prediction model to identify spam comments in Apache Spark Machine Learning using the Databricks platform community edition server, which allows…