Databricks

Medical Appointment Data Analysis

Project idea – The idea behind this analysis project is to understand why a person who makes a doctor's appointment and receives all the instructions still does not show up. Who is to blame?

Problem Statement or Business Problem

Problem: A person makes a doctor's appointment, receives all the instructions, and does not show up. Who is to blame? In this tutorial we will try to analyze why some patients do not show up for their medical appointments, and whether the data we have reveals reasons for that. We will look for correlations between the different attributes we have and whether the patient shows up or not.…
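A minimal sketch of a first step toward such correlations: compare the no-show rate across the values of a single attribute. It uses plain Scala collections so it runs without a cluster. The `Appointment` type and its `smsReceived`/`noShow` fields are hypothetical stand-ins for the dataset's real columns, and the sample rows are made up. On a Spark DataFrame, the equivalent would be a `groupBy` on the attribute followed by an average of a 0/1 no-show column.

```scala
// Hypothetical appointment record; field names are illustrative assumptions
final case class Appointment(smsReceived: Boolean, noShow: Boolean)

object NoShowByAttribute {
  // Fraction of appointments that were no-shows for each value of `key`
  def noShowRate[K](rows: Seq[Appointment], key: Appointment => K): Map[K, Double] =
    rows.groupBy(key).map { case (k, group) =>
      k -> group.count(_.noShow).toDouble / group.size
    }

  def main(args: Array[String]): Unit = {
    // Made-up sample rows, for illustration only
    val sample = Seq(
      Appointment(smsReceived = true, noShow = false),
      Appointment(smsReceived = true, noShow = true),
      Appointment(smsReceived = true, noShow = false),
      Appointment(smsReceived = false, noShow = true),
      Appointment(smsReceived = false, noShow = false),
      Appointment(smsReceived = false, noShow = true)
    )
    for ((sms, rate) <- noShowRate(sample, (a: Appointment) => a.smsReceived))
      println(s"smsReceived=$sms no-show rate=$rate")
  }
}
```

A clear gap between the groups' rates would suggest the attribute is associated with showing up; the same function works for any other attribute by passing a different key.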

Predicting Possible Loan Default Using Machine Learning

Project idea – The idea behind this ML project is to build a model that predicts loan default based on customer behavior and determines the risk factor.

Problem Statement or Business Problem

About Company: Wonderful Dream Housing Finance company deals in all home loans.

Problem: The company wants to automate the assessment of the loan risk factor based on customer behavior details. A loan default occurs when a borrower takes money from a bank and does not repay the loan. The customer details are Income, Age, Experience, Married/Single, House_Ownership, Car Ownership, Profession, City,…

Machine Learning Project – Loan Approval Prediction

Project idea – The idea behind this ML project is to build a model that helps a home loan company validate customer eligibility for a loan.

Problem Statement or Business Problem

About Company: Wonderful Dream Housing Finance company deals in all home loans. It has a presence across urban, semi-urban and rural areas. A customer first applies for a home loan, after which the company validates the customer's eligibility for the loan.

Problem: The company wants to automate the loan eligibility process (in real time) based on the customer details provided while filling in the online application form. These details are Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount,…

Practice Test to prepare for Apache Spark Certification – Databricks Certification exam.

Databricks, founded by the creators of Apache Spark, combines the best of data warehouses and data lakes into a lakehouse architecture. It is an American enterprise software company, and it has also created Delta Lake, MLflow and Koalas, open-source projects that span data engineering, data science and machine learning. Databricks develops a web-based platform for working with Spark that provides automated cluster management and IPython-style notebooks. Gartner has named Databricks a Leader in its latest Magic Quadrant for Data Science and Machine Learning Platforms.

General information:
Exam length: The exam…

Healthcare Analytics for Beginners Part 1

Health care analytics covers the analysis activities that can be undertaken using data collected from four areas within healthcare: claims and cost data, pharmaceutical and research and development (R&D) data, clinical data (collected from electronic health records (EHRs)), and patient behavior and sentiment data.

Data Description

PatientProfile.csv – This file contains patient profile details such as PatientID, OnlineFollower, social media details, Income, Education, Age, FirstInteractionDate, CityType and Employer_Category.

More info on the patient_profiles file:
Patient_ID – Unique identifier for each patient. This ID is not sequential in nature and cannot be used in the model.
Online_Follower – Whether a patient follows…

Healthcare Analytics for Beginners Part 2

Plots: Patient's Age; Patient's Income; Patient's Occupation; All in One Scatter Plot

Loading Data into DataFrame

%scala
// File location and type
val file_location = "/FileStore/tables/First_Health_Camp_Attended.csv"
val file_type = "csv"

// CSV options
val infer_schema = "true"
val first_row_is_header = "true"
val delimiter = ","

// The applied options are for CSV files. For other file types, these will be ignored.
val First_Health_Camp_Attended = spark.read.format(file_type)
  .option("inferSchema", infer_schema)
  .option("header", first_row_is_header)
  .option("sep", delimiter)
  .load(file_location)

display(First_Health_Camp_Attended)

Count of Data (Total Records)

%scala
First_Health_Camp_Attended.count()

res12: Long = 6218

Displaying Statistics of Data

%scala
display(First_Health_Camp_Attended.describe())

Print Schema of Data

%scala
First_Health_Camp_Attended.printSchema()

root
 |-- Patient_ID: integer (nullable…

Marketing Analytics Part 1

Marketing analytics uses both qualitative and quantitative, structured and unstructured data to drive strategic decisions about brand and revenue outcomes.

Overall goal
You're a marketing analyst and you've been told by the Chief Marketing Officer that recent marketing campaigns have not been as effective as they were expected to be. You need to analyze the data set to understand this problem and propose data-driven solutions.

Section 01: Exploratory Data Analysis
Are there any null values or outliers? How will you wrangle/handle them?
Are there any variables that warrant transformations?
Are there any useful variables that you can engineer with the given data?
Do…
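For the first question, a minimal sketch of one common approach: count the missing values in a numeric column and flag outliers with the 1.5 * IQR rule (Tukey's fences). This is a swapped-in technique, not necessarily the one this series uses. It is written in plain Scala, and the input (a single numeric column with gaps, modeled as `Option[Double]`, such as an income column) and its sample values are hypothetical.

```scala
object NullAndOutlierCheck {
  // Percentile by linear interpolation between closest ranks, p in [0, 1]
  // (the same definition NumPy and R use by default)
  def percentile(sorted: IndexedSeq[Double], p: Double): Double = {
    require(sorted.nonEmpty, "need at least one value")
    val pos = p * (sorted.size - 1)
    val lo = math.floor(pos).toInt
    val hi = math.ceil(pos).toInt
    sorted(lo) + (pos - lo) * (sorted(hi) - sorted(lo))
  }

  // Returns (number of missing values, values outside the 1.5 * IQR fences)
  def inspect(column: Seq[Option[Double]]): (Int, Seq[Double]) = {
    val present = column.flatten
    val missing = column.size - present.size
    val sorted = present.sortWith(_ < _).toIndexedSeq
    val q1 = percentile(sorted, 0.25)
    val q3 = percentile(sorted, 0.75)
    val iqr = q3 - q1
    val lowFence = q1 - 1.5 * iqr
    val highFence = q3 + 1.5 * iqr
    (missing, present.filter(v => v < lowFence || v > highFence))
  }

  def main(args: Array[String]): Unit = {
    // Made-up income values with one gap, for illustration only
    val income = Seq(Some(42000.0), None, Some(51000.0), Some(47000.0), Some(250000.0), Some(39000.0))
    val (missing, outliers) = inspect(income)
    println(s"missing=$missing outliers=$outliers")
  }
}
```

On a Spark DataFrame, the quartiles could instead come from `df.stat.approxQuantile(column, Array(0.25, 0.75), relativeError)`. Whether flagged rows are then dropped, capped or imputed is a separate decision.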

Marketing Analytics Part 2

Are there any useful variables that you can engineer with the given data?
Review the list of feature names below, which we can engineer:
The total number of dependents in the home ('Dependents') can be engineered from the sum of 'Kidhome' and 'Teenhome'.
The year of becoming a customer ('Year_Customer') can be engineered from 'Dt_Customer'.
The total amount spent ('TotalMnt') can be engineered from the sum of all features containing the keyword 'Mnt'.
The total purchases ('TotalPurchases') can be engineered from the sum of all features containing the keyword 'Purchases'.
The total number of campaigns accepted ('TotalCampaignsAcc') can be engineered from the sum of…
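A minimal sketch of the first four engineered features, in plain Scala rather than Spark. It assumes each customer row is available as a map from column name to numeric value plus the raw `Dt_Customer` text, and it assumes that text is an ISO `yyyy-MM-dd` date; the `CustomerRow` and `EngineeredFeatures` types are hypothetical. The truncated fifth feature ('TotalCampaignsAcc') is left out because its full definition is not shown here.

```scala
import java.time.LocalDate

// Hypothetical customer row: numeric columns keyed by their original names,
// plus the raw Dt_Customer text (an ISO yyyy-MM-dd date is assumed here)
final case class CustomerRow(numeric: Map[String, Double], dtCustomer: String)

final case class EngineeredFeatures(
    dependents: Double,
    yearCustomer: Int,
    totalMnt: Double,
    totalPurchases: Double
)

object FeatureEngineering {
  // Sum every numeric column whose name contains the given keyword
  private def sumWhereNameContains(row: CustomerRow, keyword: String): Double =
    row.numeric.iterator.collect { case (name, value) if name.contains(keyword) => value }.sum

  def engineer(row: CustomerRow): EngineeredFeatures =
    EngineeredFeatures(
      // Throws NoSuchElementException if either column is absent
      dependents = row.numeric("Kidhome") + row.numeric("Teenhome"),
      yearCustomer = LocalDate.parse(row.dtCustomer).getYear,
      totalMnt = sumWhereNameContains(row, "Mnt"),
      totalPurchases = sumWhereNameContains(row, "Purchases")
    )
}
```

On a Spark DataFrame the same features would typically be added with `withColumn`, for example `year(to_date(col("Dt_Customer"), fmt))` for the year, where `fmt` is whatever date pattern the file actually uses.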

Marketing Analytics Part 3

Plots: NumStorePurchases vs MntGoldProds; MntFishProducts Distribution; Campaign 1; Campaign 2; Campaign 3; Campaign 4; Campaign 5

Section 03: Data Visualization

Plots: Products vs Amount Spent; Purchases

Conclusion

Recall the overall goal: You're a marketing analyst and you've been told by the Chief Marketing Officer that recent marketing campaigns have not been as effective as they were expected to be. You need to analyze the data set to understand this problem and propose data-driven solutions.

Summary of actionable findings to improve advertising campaign success:
Advertising campaign acceptance is positively correlated with income and negatively correlated with having kids/teens.
Suggested action: Create two streams of targeted advertising campaigns,…
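The finding above rests on correlation. Assuming the Pearson coefficient is the measure behind it, here is a plain-Scala sketch of that statistic; the input columns (for example income and the number of campaigns accepted per customer) are hypothetical. On a Spark DataFrame, `df.stat.corr("colA", "colB")` returns the Pearson coefficient of two numeric columns directly.

```scala
object Correlation {
  // Pearson correlation coefficient between two equally long numeric columns;
  // returns NaN if either column is constant (zero variance)
  def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.size == ys.size && xs.size > 1, "need two columns of equal length > 1")
    val n = xs.size
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    val cov = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val varX = xs.map(x => (x - meanX) * (x - meanX)).sum
    val varY = ys.map(y => (y - meanY) * (y - meanY)).sum
    cov / math.sqrt(varX * varY)
  }
}
```

A value near +1 means the two columns rise together and a value near -1 means one falls as the other rises, which is the sense in which acceptance is described as positively correlated with income and negatively correlated with having kids/teens.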

Machine Learning Project – Creating Movies Recommendation Engine using Apache Spark

Movies are enjoyed by people of every age, gender, race, color and geographical location. A recommendation system is a filtering program whose prime goal is to predict the “rating” or “preference” of a user towards a domain-specific item. Recommendation systems encompass a class of techniques and algorithms that can suggest “relevant” items to users. They predict future behavior from past data using a multitude of techniques.

Problem Statement or Business Problem

In this project, we will generate the top 10 movie recommendations for each user, as well as the top 10 user recommendations for each movie.

Attribute Information…
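Producing the top 10 lists is a ranking step applied to predicted scores. In Spark MLlib, a fitted `ALSModel` exposes `recommendForAllUsers(10)` and `recommendForAllItems(10)` for exactly this; whether this project uses ALS is an assumption here. The plain-Scala sketch below shows only the ranking step: given hypothetical `Prediction(userId, movieId, score)` values from any model, it keeps the top n movies per user and, by swapping roles, the top n users per movie.

```scala
object TopNRecommendations {
  // A predicted preference score for one (user, movie) pair (hypothetical type)
  final case class Prediction(userId: Int, movieId: Int, score: Double)

  // Keep the n highest-scoring candidates for each key;
  // ties are broken by the smaller candidate id so results are deterministic
  private def topN(preds: Seq[Prediction], n: Int)(
      key: Prediction => Int, candidate: Prediction => Int): Map[Int, Seq[Int]] =
    preds.groupBy(key).map { case (k, rows) =>
      val ranked = rows.sortWith { (a, b) =>
        if (a.score != b.score) a.score > b.score else candidate(a) < candidate(b)
      }
      k -> ranked.take(n).map(candidate)
    }

  // Top n movie ids for every user
  def topMoviesPerUser(preds: Seq[Prediction], n: Int): Map[Int, Seq[Int]] =
    topN(preds, n)(_.userId, _.movieId)

  // Top n user ids for every movie
  def topUsersPerMovie(preds: Seq[Prediction], n: Int): Map[Int, Seq[Int]] =
    topN(preds, n)(_.movieId, _.userId)
}
```

Calling `topMoviesPerUser(preds, 10)` and `topUsersPerMovie(preds, 10)` gives the two lists the problem statement asks for; users or movies with fewer than 10 scored pairs simply get shorter lists.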