Apache Spark Analytics

Healthcare Analytics for Beginners Part 1

Healthcare analytics covers the analysis activities that can be undertaken using data collected from four areas within healthcare: claims and cost data; pharmaceutical and research and development (R&D) data; clinical data collected from electronic health records (EHRs); and patient behavior and sentiment data.

Data Description

PatientProfile.csv: This file contains patient profile details such as PatientID, OnlineFollower, Social media details, Income, Education, Age, FirstInteractionDate, CityType and Employer_Category.

More info on the patient_profiles file:

Patient_ID: Unique identifier for each patient. This ID is not sequential and cannot be used in the model.
Online_Follower: Whether a patient follows…
Read More
Healthcare Analytics for Beginners Part 2

Patient's Age
Patient's Income
Patient's Occupation
All in One Scatter Plot

Loading Data into DataFrame

%scala
// File location and type
val file_location = "/FileStore/tables/First_Health_Camp_Attended.csv"
val file_type = "csv"

// CSV options
val infer_schema = "true"
val first_row_is_header = "true"
val delimiter = ","

// The applied options are for CSV files. For other file types, these will be ignored.
val First_Health_Camp_Attended = spark.read.format(file_type)
  .option("inferSchema", infer_schema)
  .option("header", first_row_is_header)
  .option("sep", delimiter)
  .load(file_location)

display(First_Health_Camp_Attended)

Count of Data (Total Records)

%scala
First_Health_Camp_Attended.count()

res12: Long = 6218

Displaying Statistics of Data

%scala
display(First_Health_Camp_Attended.describe())

Print Schema of Data

%scala
First_Health_Camp_Attended.printSchema()

root
 |-- Patient_ID: integer (nullable…
Read More
Marketing Analytics Part 1

Marketing analytics consists of both qualitative and quantitative, structured and unstructured data used to drive strategic decisions in relation to brand and revenue outcomes.

Overall goal

You're a marketing analyst and you've been told by the Chief Marketing Officer that recent marketing campaigns have not been as effective as they were expected to be. You need to analyze the data set to understand this problem and propose data-driven solutions.

Section 01: Exploratory Data Analysis

Are there any null values or outliers? How will you wrangle/handle them?
Are there any variables that warrant transformations?
Are there any useful variables that you can engineer with the given data?
Do…
Read More
Marketing Analytics Part 2

Are there any useful variables that you can engineer with the given data?

Review the list of features below that we can engineer:

The total number of dependents in the home ('Dependents') can be engineered from the sum of 'Kidhome' and 'Teenhome'
The year of becoming a customer ('Year_Customer') can be engineered from 'Dt_Customer'
The total amount spent ('TotalMnt') can be engineered from the sum of all features containing the keyword 'Mnt'
The total purchases ('TotalPurchases') can be engineered from the sum of all features containing the keyword 'Purchases'
The total number of campaigns accepted ('TotalCampaignsAcc') can be engineered from the sum of…
Read More
Marketing Analytics Part 3

NumStorePurchases VS MntGoldProds
MntFishProducts Distribution
Campaign 1
Campaign 2
Campaign 3
Campaign 4
Campaign 5

Section 03: Data Visualization

Products VS Amount Spent
Purchases

Conclusion

Recall the overall goal: You're a marketing analyst and you've been told by the Chief Marketing Officer that recent marketing campaigns have not been as effective as they were expected to be. You need to analyze the data set to understand this problem and propose data-driven solutions.

Summary of actionable findings to improve advertising campaign success:

Advertising campaign acceptance is positively correlated with income and negatively correlated with having kids/teens
Suggested action: Create two streams of targeted advertising campaigns,…
Read More
Sentiment Analysis on Demonetization in India using Apache Spark

In this article, I explore the sentiments of people in India during demonetization. Even with a small dataset, I could still gain a lot of valuable insights. I used Spark SQL and the built-in graphs provided by Databricks.

India is the second-most populous country in the world, with over 1.271 billion people, more than a sixth of the world's population. Let us find out the views of different people on demonetization by analyzing tweets from Twitter.

Attribute Information or Dataset Details: Table Created in Databricks Environment

Technology Used

Apache Spark
Spark SQL
DataFrame-based API
Databricks Notebook Free Account…
Read More
Analytics on India census using Apache Spark Part 1

In this article, I explore census data for India to understand changes in the country's demographics, population growth, religion distribution, gender distribution, and sex ratio. Even with a small dataset, I could still gain a lot of valuable insights about the country. I used Spark SQL and the built-in graphs provided by Databricks.

India is the second-most populous country in the world, with over 1.271 billion people, more than a sixth of the world's population. Already containing 17.5% of the world's population, India is projected to be the world's most populous country by 2025, surpassing China, its population reaching…
Read More
Analytics on India census using Apache Spark Part 2

Code for Spark SQL to get Population Density in terms of Districts
Code for Spark SQL to get Scheduled Castes (SCs) Population per State
Code for Spark SQL to get What percentage of the states are actually literate in India?
Code for Spark SQL to get States which have a Literacy rate less than 50%
Code for Spark SQL to get Male and Female Literacy rate as per State
Code for Spark SQL to get Literacy Rate as per Type of Education for every State
Code for Spark SQL to get Male and Female Percentage as per State
Code for Spark SQL to…
Read More
Analytics on India census using Apache Spark Part 3

Code for Spark SQL to get Status of Electricity Facility by State
Code for Spark SQL to get Education Facility in India as per State
Code for Spark SQL to get Medical Facility in India
Code for Spark SQL to get Bus Transportation per State
Code for Spark SQL to get Road Status in India
Code for Spark SQL to get Residence Status in India by State
Read More
Bidding Auction Data Analytics in Apache Spark

In this article, I explore bidding auction data analysis. The dataset comes from eBay-like online auctions. Even with a small dataset, I could still gain a lot of valuable insights. I used Spark RDDs in Databricks.

In this activity, we will load data into Apache Spark and inspect the data using Spark.

In this section, we use the SparkContext method, textFile, to load the data into a Resilient Distributed Dataset (RDD). Our dataset is a .csv file that consists of online auction data. Each auction has an auction id associated with it and…
Read More