Apache Spark

Life Expectancy Prediction using Machine Learning – Part 1

Project idea – The idea behind this ML project is to build a model for life expectancy and to perform a statistical analysis of the factors influencing it. Problem Statement or Business Problem: Many past studies of the factors affecting life expectancy considered demographic variables, income composition and mortality rates, but the effect of immunization and the human development index was not taken into account. Also, some of the past research used linear regression on a data set covering one year for all countries. Hence, this gives motivation to…

Life Expectancy Prediction using Machine Learning – Part 2

Scatter plots of Life_Expectancy versus: Adult_Mortality, Infant_Deaths, Alcohol, Percentage_Expenditure, Hepatitis_B, Under_Five_Deaths, Polio, Total_Expenditure, Diphtheria, HIV_AIDS, GDP, Population, Thinness_1_19_years, Thinness_5_9_years, Income_Composition_of_Resources, Schooling. Additional scatter plots: Schooling versus Adult_Mortality, Schooling versus Income_Composition_of_Resources, Adult_Mortality versus Income_Composition_of_Resources. Collecting all String Columns into…

Medical Appointment Data Analysis

Project idea – The idea behind this analysis project is to examine why a person makes a doctor's appointment, receives all the instructions, and then does not show up. Who is to blame? Problem Statement or Business Problem: A person makes a doctor's appointment, receives all the instructions, and is a no-show. Who is to blame? In this tutorial we will try to analyze why some patients do not show up for their medical appointments, and whether the data we have reveals reasons for that. We will try to find correlations between the available attributes and whether the patient shows up or not.…

Predicting Possible Loan Default Using Machine Learning

Project idea – The idea behind this ML project is to build a model for loan prediction based on customer behavior and to determine the risk factor. Problem Statement or Business Problem: About Company: Wonderful Dream Housing Finance company deals in all home loans. This ML project builds a model for loan prediction based on customer behavior. Problem: The company wants to automate loan risk assessment based on customer details and behavior. A loan default occurs when a borrower takes money from a bank and does not repay the loan. Details are Income, Age, Experience, Married/Single, House_Ownership, Car Ownership, Profession, City,…

Installing Apache Spark 3 in Local Mode – Command Line (Single Node Cluster) on Windows 10

In this tutorial, we will set up a single node Spark cluster and run it in local mode using the command line.
Step 1) Download the Spark binary from the link below.
Download Spark link: https://spark.apache.org/
Windows Utils link: https://github.com/steveloughran/winutils
Step 2) Click on Download.
Step 3) A new web page will open.
i) Choose a Spark release as 3.0.3
ii) Choose a package type as Pre-built for Apache Hadoop 2.7
Step 4) Click on Download Spark spark-3.0.3-bin-hadoop2.7.tgz
Step 5) A new web page will open.
Step 6) Click on the link to download.
Step 7)…

Machine Learning Project – Loan Approval Prediction

Project idea – The idea behind this ML project is to build a model for a home loan company to validate customer eligibility for a loan. Problem Statement or Business Problem: About Company: Wonderful Dream Housing Finance company deals in all home loans. They have a presence across all urban, semi-urban and rural areas. A customer first applies for a home loan, after which the company validates the customer's eligibility. Problem: The company wants to automate the loan eligibility process (in real time) based on the customer details provided in the online application form. These details are Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount,…

Practice Test to prepare for Apache Spark Certification – Databricks Certification exam.

Databricks is an American enterprise software company founded by the creators of Apache Spark. It combines the best of data warehouses and data lakes into a lakehouse architecture. The company has also created Delta Lake, MLflow and Koalas, open source projects that span data engineering, data science and machine learning. Databricks develops a web-based platform for working with Spark that provides automated cluster management and IPython-style notebooks. Gartner has classified Databricks as a Leader in its latest Magic Quadrant for Data Science and Machine Learning Platforms. General information: Exam length: The exam…

Healthcare Analytics for Beginners Part 1

Health care analytics covers the analysis activities that can be undertaken using data collected from four areas within healthcare: claims and cost data; pharmaceutical and research and development (R&D) data; clinical data (collected from electronic health records (EHRs)); and patient behavior and sentiment data. Data Description: PatientProfile.csv – This file contains patient profile details such as PatientID, OnlineFollower, social media details, Income, Education, Age, FirstInteractionDate, CityType and Employer_Category. More info on the patient_profiles file: Patient_ID: Unique identifier for each patient. This ID is not sequential in nature and cannot be used in the model. Online_Follower: Whether a patient follows…

Healthcare Analytics for Beginners Part 2

Patient's Age
Patient's Income
Patient's Occupation
All in One Scatter Plot

Loading Data into DataFrame

%scala
// File location and type
val file_location = "/FileStore/tables/First_Health_Camp_Attended.csv"
val file_type = "csv"

// CSV options
val infer_schema = "true"
val first_row_is_header = "true"
val delimiter = ","

// The applied options are for CSV files. For other file types, these will be ignored.
val First_Health_Camp_Attended = spark.read.format(file_type)
  .option("inferSchema", infer_schema)
  .option("header", first_row_is_header)
  .option("sep", delimiter)
  .load(file_location)

display(First_Health_Camp_Attended)

Count of Data (Total Records)

%scala
First_Health_Camp_Attended.count()
res12: Long = 6218

Displaying Statistics of Data

%scala
display(First_Health_Camp_Attended.describe())

Print Schema of Data

%scala
First_Health_Camp_Attended.printSchema()
root
 |-- Patient_ID: integer (nullable…

Install Apache Spark On Ubuntu

In this tutorial, we will learn the complete process to install Apache Spark 3.2.0 on Ubuntu 20.
Prerequisite: Spark runs on Java 8/11, Scala 2.12, Python 3.6+ and R 3.5+. Python 3.6 support is deprecated as of Spark 3.2.0. Support for Java 8 versions prior to 8u201 is deprecated as of Spark 3.2.0. For the Scala API, Spark 3.2.0 uses Scala 2.12, so you will need a compatible Scala version (2.12.x).
Steps for Installing Apache Spark
Step 1 - Create a directory, for example:
$ mkdir /home/bigdata/apachespark
Step 2 - Move to the Apache Spark directory:
$ cd /home/bigdata/apachespark
Step 3 - Download…