Blog

Predicting Possible Loan Default Using Machine Learning

Predicting Possible Loan Default Using Machine Learning

Project idea – The idea behind this ML project is to build a model for a Loan Prediction Based on Customer Behavior and determine the risk factor. Problem Statement or Business Problem About CompanyWonderful Dream Housing Finance company deals in all home loans. this ML project is to build a model for a Loan Prediction Based on Customer BehaviorProblemCompany wants to automate the loan risk factor based on customer detail behavior. A loan default occurs when a borrower takes money from a bank and does not repay the loan. Details are Income, Age, Experience, Married/Single, House_Ownership, Car Ownership, Profession, City,…
Read More
Installing Apache Spark 3  in Local Mode – Command Line (Single Node Cluster) on Windows 10

Installing Apache Spark 3  in Local Mode – Command Line (Single Node Cluster) on Windows 10

In this tutorial, we will set up a single node Spark cluster and run it in local mode using the command line.Step 1) Let's start getting the spark binary you can download the spark binary from the below linkDownload Spark link: https://spark.apache.org/Windows Utils link: https://github.com/steveloughran/winutilsStep 2) Click on Download Step 3) A new Web page will get open i) Choose a Spark release as 3.0.3ii) Choose a package type as Pre-built for Apache Hadoop 2.7 Step 4) Click on Download Spark spark-3.0.3-bin-hadoop2.7.tgz Step 5) A new Web Page will get open Step 6) Click on the link to download Step 7)…
Read More
Machine Learning Project – Loan Approval Prediction

Machine Learning Project – Loan Approval Prediction

Project idea – The idea behind this ML project is to build a model for a Home Loan Company to validates the customer eligibility for loan. Problem Statement or Business Problem About CompanyWonderful Dream Housing Finance company deals in all home loans. They have presence across all urban, semi urban and rural areas. Customer first apply for home loan after that company validates the customer eligibility for loan.ProblemCompany wants to automate the loan eligibility process (real time) based on customer detail provided while filling online application form. These details are Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount,…
Read More
Practice Test to prepare for Apache Spark Certification – Databricks Certification exam.

Practice Test to prepare for Apache Spark Certification – Databricks Certification exam.

Databricks is founded by the creators of Apache Spark, Databricks combines the best of data warehouses and data lakes into a lakehouse architecture. Databricks is an American enterprise software company founded by the creators of Apache Spark. The company has also created Delta Lake, MLflow and Koalas, open source projects that span data engineering, data science and machine learning. Databricks develops a web-based platform for working with Spark, that provides automated cluster management and IPython-style notebooks. Gartner has classified Databricks as a leader in the last quadrant for Data Science and Machine Learning platforms. General information: Exam length: The exam…
Read More
Healthcare Analytics for Beginners Part 1

Healthcare Analytics for Beginners Part 1

Health care analytics is the health care analysis activities that can be undertaken as a result of data collected from four areas within healthcare; claims and cost data, pharmaceutical and research and development (R&D) data, clinical data (collected from electronic medical records (EHRs)), and patient behavior and sentiment data. Data Description PatientProfile.csv – This file contains Patient profile details like PatientID, OnlineFollower, Social media details, Income, Education, Age, FirstInteractionDate, CityType and Employer_Category More Info On patient_profiles file. Patient_ID Unique Identifier for each patient. This ID is not sequential in nature and can not be used in model Online_Follower Whether a patient follows…
Read More
Healthcare Analytics for Beginners Part 2

Healthcare Analytics for Beginners Part 2

Patient's Age Patient's Income Patient's Occupation All in One Scatter Plot Loading Data into DataFrame %scala // File location and type val file_location = "/FileStore/tables/First_Health_Camp_Attended.csv" val file_type = "csv" // CSV options val infer_schema = "true" val first_row_is_header = "true" val delimiter = "," // The applied options are for CSV files. For other file types, these will be ignored. val First_Health_Camp_Attended = spark.read.format(file_type) .option("inferSchema", infer_schema) .option("header", first_row_is_header) .option("sep", delimiter) .load(file_location) display(First_Health_Camp_Attended) Count of Data (Total Records) %scala First_Health_Camp_Attended.count() res12: Long = 6218 Displaying Statistics of Data %scala display(First_Health_Camp_Attended.describe()) Print Schema of Data %scala First_Health_Camp_Attended.printSchema() root |-- Patient_ID: integer (nullable…
Read More
Practice Apache Superset without Installing

Practice Apache Superset without Installing

Apache Superset is a modern data exploration and visualization platform. Superset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts.PresetPreset Cloud is a fully hosted, hassle free cloud service for Apache Superset™. Get started for free today!www.preset.ioWe should start with Starter Plan Hassle free Superset in the cloud, best for small teams.Free: Forever, for up to 5 usersFeatures:Unlimited dashboards and chartsNo-code chart builderCollaborative SQL editorOver 40 visualization typesChart and dashboard cachehttps://youtu.be/49ItnEXsN7M
Read More
Install Apache Spark On Ubuntu

Install Apache Spark On Ubuntu

With this tutorial, we will learn the complete process to install Apache Spark 3.2.0 on Ubuntu 20.  Prerequisite:  Spark runs on Java 8/11, Scala 2.12, Python 3.6+ and R 3.5+. Python 3.6 support is deprecated as of Spark 3.2.0. Java 8 prior to version 8u201 support is deprecated as of Spark 3.2.0. For the Scala API, Spark 3.2.0 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x). Steps for Installing Apache Spark Step 1 - Create a directory for example $mkdir /home/bigdata/apachespark Step 2 - Move to Apache Spark directory $cd /home/bigdata/apachespark Step 3 - Download…
Read More
Marketing Analytics Part 1

Marketing Analytics Part 1

Marketing analytics consists of both qualitative and quantitative, structured and unstructured data used to drive strategic decisions in relation to brand and revenue outcomes. Overall goalYou're a marketing analyst and you've been told by the Chief Marketing Officer that recent marketing campaigns have not been as effective as they were expected to be. You need to analyze the data set to understand this problem and propose data-driven solutions.Section 01: Exploratory Data Analysis Are there any null values or outliers? How will you wrangle/handle them?Are there any variables that warrant transformations?Are there any useful variables that you can engineer with the given data?Do…
Read More
Marketing Analytics Part 2

Marketing Analytics Part 2

Are there any useful variables that you can engineer with the given data?Review a list of the feature names below, from which we can engineer:The total number of dependents in the home ('Dependents') can be engineered from the sum of 'Kidhome' and 'Teenhome'The year of becoming a customer ('Year_Customer') can be engineered from 'Dt_Customer'The total amount spent ('TotalMnt') can be engineered from the sum of all features containing the keyword 'Mnt'The total purchases ('TotalPurchases') can be engineered from the sum of all features containing the keyword 'Purchases' The total number of campaigns accepted ('TotalCampaignsAcc') can be engineered from the sum of…
Read More