Healthcare Analytics for Beginners Part 2

Patient's Age

Patient's Income

Patient's Occupation

All in One Scatter Plot

Loading Data into a DataFrame

%scala

// File location and type
val file_location = "/FileStore/tables/First_Health_Camp_Attended.csv"
val file_type = "csv"

// CSV options
val infer_schema = "true"
val first_row_is_header = "true"
val delimiter = ","

// The applied options are for CSV files. For other file types, these will be ignored.
val First_Health_Camp_Attended = spark.read.format(file_type)
  .option("inferSchema", infer_schema)
  .option("header", first_row_is_header)
  .option("sep", delimiter)
  .load(file_location)

display(First_Health_Camp_Attended)

Count of Data (Total Records)

%scala

First_Health_Camp_Attended.count()

res12: Long = 6218

Displaying Statistics of Data

%scala

display(First_Health_Camp_Attended.describe())
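
describe() also summarizes the two ID columns, whose mean and standard deviation carry little meaning. As a minimal sketch, assuming Spark 2.3 or later (where Dataset.summary() is available), the statistics can be limited to the two measure columns; summary() adds the 25%, 50%, and 75% percentiles to what describe() reports. The name measureStats is illustrative and not part of the original notebook.

%scala

// Illustrative name (not from the original notebook): measureStats.
// Keep only the measure columns, then compute count, mean, stddev, min,
// quartiles, and max for each of them.
val measureStats = First_Health_Camp_Attended
  .select("Donation", "Health_Score")
  .summary()

display(measureStats)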

Printing the Schema of the Data

%scala

First_Health_Camp_Attended.printSchema()

root
 |-- Patient_ID: integer (nullable = true)
 |-- Health_Camp_ID: integer (nullable = true)
 |-- Donation: integer (nullable = true)
 |-- Health_Score: double (nullable = true)
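
With inferSchema enabled, Spark makes an extra pass over the file to guess column types. Since the schema is now known from the output above, one possible alternative is to declare it explicitly when reading. The sketch below assumes the column names and types printed above are correct for the whole file; campSchema and typedCamp are illustrative names that do not appear in the original notebook.

%scala

import org.apache.spark.sql.types.{DoubleType, IntegerType, StructField, StructType}

// Assumption: names and types are copied from the printSchema() output above.
val campSchema = StructType(Seq(
  StructField("Patient_ID", IntegerType, nullable = true),
  StructField("Health_Camp_ID", IntegerType, nullable = true),
  StructField("Donation", IntegerType, nullable = true),
  StructField("Health_Score", DoubleType, nullable = true)
))

// Read the same CSV with the declared schema instead of inferring it.
val typedCamp = spark.read.format("csv")
  .option("header", "true")
  .option("sep", ",")
  .schema(campSchema)
  .load("/FileStore/tables/First_Health_Camp_Attended.csv")

typedCamp.printSchema()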

Creating a Temp View to Run Spark SQL Queries on the Data

%scala

First_Health_Camp_Attended.createOrReplaceTempView("First_Health_Camp_Attended")
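
Once the view is registered, it can be queried by name through spark.sql. The following is a small illustrative query, not one from the original notebook: it lists the ten records with the largest donations. The name topDonations is an assumption made for this sketch.

%scala

// Illustrative name (not from the original notebook): topDonations.
// Query the temp view by the name passed to createOrReplaceTempView.
val topDonations = spark.sql("""
  SELECT Patient_ID, Health_Camp_ID, Donation, Health_Score
  FROM First_Health_Camp_Attended
  ORDER BY Donation DESC
  LIMIT 10
""")

display(topDonations)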

Exploratory Data Analysis

Donation Distribution
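
One way a donation distribution like this could be produced is to count records per donation amount and chart the result with the bar chart option under display. This is a sketch under that assumption, not necessarily the query behind the original plot; donationDistribution and the records alias are illustrative names.

%scala

// Illustrative name (not from the original notebook): donationDistribution.
// Count how many attendance records exist for each donation amount.
val donationDistribution = spark.sql("""
  SELECT Donation, COUNT(*) AS records
  FROM First_Health_Camp_Attended
  GROUP BY Donation
  ORDER BY Donation
""")

display(donationDistribution)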

Health Score vs. Donation
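
To look at how the two measures relate, one option is to average Health_Score for each donation amount and, as a single summary number, compute the Pearson correlation between the columns. This is a hedged sketch rather than the query used for the original chart; scoreByDonation, avg_health_score, and donationScoreCorr are illustrative names.

%scala

// Illustrative names (not from the original notebook):
// scoreByDonation, donationScoreCorr.
// Average health score and record count for each donation amount.
val scoreByDonation = spark.sql("""
  SELECT Donation,
         AVG(Health_Score) AS avg_health_score,
         COUNT(*) AS records
  FROM First_Health_Camp_Attended
  GROUP BY Donation
  ORDER BY Donation
""")

display(scoreByDonation)

// Pearson correlation coefficient between the two numeric columns.
val donationScoreCorr = First_Health_Camp_Attended.stat.corr("Donation", "Health_Score")
println(s"Pearson correlation (Donation, Health_Score): $donationScoreCorr")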

Histogram of Health Score
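
A histogram can be approximated in Spark SQL by assigning each Health_Score to a fixed-width bin and counting the records per bin. The bin width of 0.1 below is an assumption chosen for illustration (the range of Health_Score is not stated in this post), and binWidth and healthScoreHistogram are illustrative names.

%scala

// Assumption: 0.1 is an arbitrary bin width; adjust it to the actual score range.
val binWidth = 0.1

// Illustrative name (not from the original notebook): healthScoreHistogram.
// FLOOR(score / width) * width gives the lower edge of the bin a score falls in.
val healthScoreHistogram = spark.sql(s"""
  SELECT FLOOR(Health_Score / $binWidth) * $binWidth AS bin_start,
         COUNT(*) AS records
  FROM First_Health_Camp_Attended
  WHERE Health_Score IS NOT NULL
  GROUP BY FLOOR(Health_Score / $binWidth) * $binWidth
  ORDER BY bin_start
""")

display(healthScoreHistogram)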

By Bhavesh