For a beginner in machine learning, this project is a good opportunity to try some techniques for predicting which drug might be suitable for a patient.

Problem Statement or Business Problem

The target feature is Drug type. The feature set is: Age, Sex, Blood Pressure Levels (BP), Cholesterol Levels, Na to Potassium Ratio.

The main problem here is not just the choice of features and target but also the approach a beginner takes to solving this type of problem. So best of luck.

Attribute Information or Dataset Details: Age, Sex, BP, Cholesterol…
Split the Data

It is common practice when building machine learning models to split the source data, using some of it to train the model and reserving some to test the trained model. In this project, you will use 70% of the data for training, and reserve 30% for testing.

%scala
    val splits = DrugsFinalDF.randomSplit(Array(0.7, 0.3))
    val train = splits(0)
    val test = splits(1)
    val train_rows = train.count()
    val test_rows = test.count()
    println("Training Rows: " + train_rows + " Testing Rows: " + test_rows)

Prepare the Training Data

To train the Classification model, you need a training data set that…
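As a minimal sketch of the split described above: `randomSplit` samples rows at random, so the two parts are only approximately 70% and 30% of the data, and passing a seed as the second argument makes the split repeatable between runs. The DataFrame `demoDF` and the seed value 42 are assumptions made for illustration; they stand in for the project's own data and are not part of the original notebook.

```scala
import org.apache.spark.sql.SparkSession

// In a Databricks notebook a session already exists and getOrCreate() reuses it;
// the local master is only an assumption for running this sketch elsewhere.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("split-sketch")
  .getOrCreate()

// Hypothetical stand-in for the project's DataFrame: 1000 rows, one "id" column.
val demoDF = spark.range(1000).toDF("id")

// The second argument is a seed, so the same rows land in each part on every run.
val splits = demoDF.randomSplit(Array(0.7, 0.3), 42L)
val train = splits(0)
val test = splits(1)

val trainRows = train.count()
val testRows = test.count()

// The weights are sampling probabilities, so expect roughly (not exactly) 700 and 300.
println("Training Rows: " + trainRows + " Testing Rows: " + testRows)
```

Every row ends up in exactly one of the two parts, so the two counts always add up to the size of the source data even though the individual counts vary with the seed.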
Machine Learning Project to predict whether a person makes over 50K a year

Problem Statement or Business Problem

The prediction task is to determine whether a person makes over 50K a year (income classification).

Attribute Information or Dataset Details:
age: integer
workclass: string
fnlwgt: integer
education: string
education-num: integer
marital-status: string
occupation: string
relationship: string
race: string
sex: string
capital-gain: integer
capital-loss: integer
hours-per-week: integer
native-country: string
income: string

Technology Used: Apache Spark, Spark SQL, Apache Spark MLlib, Scala, DataFrame-based API, Databricks Notebook

Challenges

Convert String data to Numeric format so we can process the data in the Apache Spark ML Library.

Introduction

Welcome to this project on predicting whether a person makes over 50K a year…
Split the Data

It is common practice when building machine learning models to split the source data, using some of it to train the model and reserving some to test the trained model. In this project, you will use 70% of the data for training, and reserve 30% for testing.

%scala
    val splits = IncomeFinalDF.randomSplit(Array(0.7, 0.3))
    val train = splits(0)
    val test = splits(1)
    val train_rows = train.count()
    val test_rows = test.count()
    println("Training Rows: " + train_rows + " Testing Rows: " + test_rows)

Prepare the Training Data

To train the Classification model, you need a training data set that…
Gender is a social construct. The way males and females are treated differently from birth moulds their behaviour and personal preferences into what society expects for their gender. This small dataset is designed to give an idea of whether a person's gender can be predicted with an accuracy significantly above 50% based on their personal preferences.

Attribute Information or Dataset Details: FavoriteColor, FavoriteMusicGenre, FavoriteBeverage, FavoriteSoftDrink, Gender

Technology Used: Apache Spark, Spark SQL, Apache Spark MLlib, Scala, DataFrame-based API, Databricks Notebook

Introduction

Welcome to this project on gender prediction in Apache Spark Machine Learning using the Databricks platform community edition server…
Collecting all String Columns into an Array

%scala
    var StringfeatureCol = Array("FavoriteColor", "FavoriteMusicGenre", "FavoriteBeverage", "FavoriteSoftDrink", "Gender");

StringIndexer encodes a string column of labels to a column of label indices.

Example of StringIndexer

%scala
    import org.apache.spark.ml.feature.StringIndexer

    val df = spark.createDataFrame(
      Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c"))
    ).toDF("id", "category")
    df.show()

    val indexer = new StringIndexer()
      .setInputCol("category")
      .setOutputCol("categoryIndex")
    val indexed = indexer.fit(df).transform(df)
    indexed.show()

Output:

    +---+--------+
    | id|category|
    +---+--------+
    |  0|       a|
    |  1|       b|
    |  2|       c|
    |  3|       a|
    |  4|       a|
    |  5|       c|
    +---+--------+

    +---+--------+-------------+
    | id|category|categoryIndex|
    +---+--------+-------------+
    |  0|       a|          0.0|
    |  1|…
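One way to apply StringIndexer to every column in an array like the one collected above is to build one indexer per column and run them together as a Pipeline. The following is only a sketch under stated assumptions, not necessarily how the original notebook proceeds: the three sample rows, the name `sampleDF`, and the `_indexed` suffix for output columns are all made up for illustration.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.SparkSession

// In a Databricks notebook a session already exists and getOrCreate() reuses it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("indexer-sketch")
  .getOrCreate()

// Hypothetical sample rows; the values are invented, only the column names
// come from the dataset description.
val sampleDF = spark.createDataFrame(Seq(
  ("Blue", "Rock", "Tea", "Cola", "F"),
  ("Red", "Pop", "Coffee", "Cola", "M"),
  ("Blue", "Pop", "Tea", "Lemonade", "F")
)).toDF("FavoriteColor", "FavoriteMusicGenre", "FavoriteBeverage", "FavoriteSoftDrink", "Gender")

val stringFeatureCols = Array("FavoriteColor", "FavoriteMusicGenre",
  "FavoriteBeverage", "FavoriteSoftDrink", "Gender")

// One StringIndexer per string column; the "_indexed" suffix is an assumed naming choice.
val indexers = stringFeatureCols.map { colName =>
  new StringIndexer()
    .setInputCol(colName)
    .setOutputCol(colName + "_indexed")
}

// Fit all indexers in one pass and append the numeric columns to the DataFrame.
val indexedDF = new Pipeline()
  .setStages(indexers)
  .fit(sampleDF)
  .transform(sampleDF)

indexedDF.show()
```

The original string columns are kept alongside the new numeric ones, so the mapping from label to index can still be inspected after the transformation.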
Machine Learning Project for Mobile Price Classification based on the available attributes

Problem Statement or Business Problem

Bob has started his own mobile company. He wants to compete with big companies like Apple and Samsung. He does not know how to estimate the price of the mobiles his company creates. In this competitive mobile phone market, you cannot simply assume things. To solve this problem, he collects sales data of mobile phones from various companies. Bob wants to find some relation between the features of a mobile phone (e.g., RAM, internal memory) and its selling price. But he…
Machine Learning Project Predicting the Cellular Localization Sites of Proteins in Yeast based on the available attributes

Data Set Information

Sequence Name: Accession number for the SWISS-PROT database.
mcg: McGeoch's method for signal sequence recognition.
gvh: von Heijne's method for signal sequence recognition.
alm: Score of the ALOM membrane spanning region prediction program.
mit: Score of discriminant analysis of the amino acid content of the N-terminal region (20 residues long) of mitochondrial and non-mitochondrial proteins.
erl: Presence of "HDEL" substring (thought to act as a signal for retention in the endoplasmic reticulum lumen). Binary attribute.
pox: Peroxisomal targeting signal in…
Machine Learning Project YouTube Spam Comment Prediction. A study of classifying YouTube comments as spam based on the available attributes

Data Set Information

COMMENT_ID: String
AUTHOR: String
DATE: String
CONTENT: String
CLASS: Double

Technology Used: Apache Spark, Spark SQL, Apache Spark MLlib, Scala, DataFrame-based API, Databricks Notebook

Challenges

Process a comma-separated values file (i.e., a file with the .csv extension) using a user-defined schema for the data.
Convert String data to Numeric format so we can process the data in the Apache Spark ML Library.

Introduction

Welcome to this project on creating a prediction model to identify spam comments in Apache Spark Machine…
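The first challenge listed above, reading a CSV file with a user-defined schema, can be sketched as follows. The column names and types come from the data set description; everything else is an assumption made so the example runs on its own: the two sample rows are invented, and the file is written to a temporary path rather than loaded from the project's actual data location.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

// In a Databricks notebook a session already exists and getOrCreate() reuses it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("csv-schema-sketch")
  .getOrCreate()

// Schema matching the columns and types listed in the data set description.
val commentSchema = StructType(Array(
  StructField("COMMENT_ID", StringType, true),
  StructField("AUTHOR", StringType, true),
  StructField("DATE", StringType, true),
  StructField("CONTENT", StringType, true),
  StructField("CLASS", DoubleType, true)
))

// Hypothetical two-row file, written to a temp path so the sketch is self-contained.
val csvPath = Files.createTempFile("comments", ".csv")
val csvText =
  "COMMENT_ID,AUTHOR,DATE,CONTENT,CLASS\n" +
  "c1,alice,2015-01-01,Nice video,0\n" +
  "c2,bob,2015-01-02,Check out my channel,1\n"
Files.write(csvPath, csvText.getBytes("UTF-8"))

// Supplying the schema skips type inference and fixes CLASS as a Double up front.
val commentsDF = spark.read
  .option("header", "true")
  .schema(commentSchema)
  .csv(csvPath.toUri.toString)

commentsDF.printSchema()
commentsDF.show(false)
```

Declaring the schema explicitly avoids the extra pass over the file that schema inference needs, and it guarantees that the label column arrives in the numeric type the ML stages expect.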
Machine Learning Project Animal Classification. A study of animal classification: identify the type of an animal (7 types) based on the available attributes

Data Set Information

A simple database containing 17 Boolean-valued attributes. The "type" attribute appears to be the class attribute. Here is a breakdown of which animals are in which type:

Class# -- Set of animals:
====== ====================================================
1 -- (41) aardvark, antelope, bear, boar, buffalo, calf, cavy, cheetah, deer, dolphin, elephant, fruitbat, giraffe, girl, goat, gorilla, hamster, hare, leopard, lion, lynx, mink, mole, mongoose, opossum, oryx, platypus, polecat, pony, porpoise, puma, pussycat, raccoon, reindeer, seal, sealion,…