Bhavesh

60 Posts
Healthcare Analytics for Beginners Part 1

Healthcare analytics is the analysis of data collected from four areas within healthcare: claims and cost data; pharmaceutical and research and development (R&D) data; clinical data collected from electronic health records (EHRs); and patient behavior and sentiment data.

Data Description
PatientProfile.csv: This file contains patient profile details such as PatientID, OnlineFollower, social media details, Income, Education, Age, FirstInteractionDate, CityType and Employer_Category.

More info on the patient_profiles file:
Patient_ID: Unique identifier for each patient. This ID is not sequential in nature and cannot be used in the model.
Online_Follower: Whether a patient follows…
Healthcare Analytics for Beginners Part 2

[Plots: Patient's Age; Patient's Income; Patient's Occupation; All in One Scatter Plot]

Loading Data into DataFrame

    %scala
    // File location and type
    val file_location = "/FileStore/tables/First_Health_Camp_Attended.csv"
    val file_type = "csv"

    // CSV options
    val infer_schema = "true"
    val first_row_is_header = "true"
    val delimiter = ","

    // The applied options are for CSV files. For other file types, these will be ignored.
    val First_Health_Camp_Attended = spark.read.format(file_type)
      .option("inferSchema", infer_schema)
      .option("header", first_row_is_header)
      .option("sep", delimiter)
      .load(file_location)

    display(First_Health_Camp_Attended)

Count of Data (Total Records)

    %scala
    First_Health_Camp_Attended.count()

    res12: Long = 6218

Displaying Statistics of Data

    %scala
    display(First_Health_Camp_Attended.describe())

Print Schema of Data

    %scala
    First_Health_Camp_Attended.printSchema()

    root
     |-- Patient_ID: integer (nullable…
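As a rough plain-Python analogue of the Spark read above (not the post's own method), the same three choices, header row, delimiter and type inference, can be sketched with the standard csv module. The sample rows and the Health_Score column are made up for illustration; only the Patient_ID column name comes from the schema shown.

```python
import csv
import io

# Made-up sample standing in for First_Health_Camp_Attended.csv;
# only the Patient_ID column name comes from the post.
SAMPLE = """Patient_ID,Health_Score
1001,0.05
1002,0.43
1003,0.10
"""

def infer(value):
    """Try int, then float, else keep the string (a crude stand-in for inferSchema)."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

def load_csv(text, delimiter=","):
    """Read CSV text whose first row is a header into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{key: infer(val) for key, val in row.items()} for row in reader]

rows = load_csv(SAMPLE)
print(len(rows))  # record count, comparable to .count()
print({key: type(val).__name__ for key, val in rows[0].items()})  # inferred types
```

This only mirrors the idea of the options; Spark infers types per column across the whole file, whereas this sketch converts each value on its own.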
Practice Apache Superset without Installing

Apache Superset is a modern data exploration and visualization platform. Superset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts.

Preset: Preset Cloud is a fully hosted, hassle-free cloud service for Apache Superset™. Get started for free today! (www.preset.io)

We should start with the Starter Plan: hassle-free Superset in the cloud, best for small teams.
Free: forever, for up to 5 users
Features:
- Unlimited dashboards and charts
- No-code chart builder
- Collaborative SQL editor
- Over 40 visualization types
- Chart and dashboard cache

https://youtu.be/49ItnEXsN7M
Install Apache Spark On Ubuntu

In this tutorial, we will walk through the complete process of installing Apache Spark 3.2.0 on Ubuntu 20.

Prerequisites: Spark runs on Java 8/11, Scala 2.12, Python 3.6+ and R 3.5+. Python 3.6 support is deprecated as of Spark 3.2.0. Support for Java 8 versions prior to 8u201 is deprecated as of Spark 3.2.0. For the Scala API, Spark 3.2.0 uses Scala 2.12, so you will need a compatible Scala version (2.12.x).

Steps for Installing Apache Spark
Step 1 - Create a directory, for example: $ mkdir /home/bigdata/apachespark
Step 2 - Move to the Apache Spark directory: $ cd /home/bigdata/apachespark
Step 3 - Download…
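The prerequisites listed above can be checked before installing. The following is a minimal sketch, assuming version strings in the usual `java -version` style (for example `1.8.0_191` or `11.0.12`); the function names and the status labels are illustrative, not part of Spark.

```python
import re
import sys

def python_status(version=None):
    """Classify a (major, minor) Python version against the Spark 3.2.0 prerequisites."""
    major, minor = (version or sys.version_info)[:2]
    if (major, minor) < (3, 6):
        return "not listed"
    if (major, minor) == (3, 6):
        return "deprecated"
    return "ok"

def java_status(version_string):
    """Classify a Java version string such as '1.8.0_191' or '11.0.12'."""
    match = re.match(r"^(\d+)(?:\.(\d+))?(?:\.\d+)?(?:_(\d+))?", version_string)
    if not match:
        return "unknown"
    major = int(match.group(1))
    if major == 1:  # legacy scheme: 1.8.0_191 means Java 8 update 191
        major = int(match.group(2) or 0)
    update = int(match.group(3) or 0)
    if major == 8:
        return "deprecated" if update < 201 else "ok"
    return "ok" if major == 11 else "not listed"

print(python_status())           # status of the running interpreter
print(java_status("1.8.0_191"))  # a Java 8 build older than 8u201
```

In practice the Java string would come from the output of `java -version`; parsing that output is left out to keep the sketch short.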
Marketing Analytics Part 1

Marketing analytics draws on both qualitative and quantitative, structured and unstructured data to drive strategic decisions about brand and revenue outcomes.

Overall goal
You're a marketing analyst and you've been told by the Chief Marketing Officer that recent marketing campaigns have not been as effective as they were expected to be. You need to analyze the data set to understand this problem and propose data-driven solutions.

Section 01: Exploratory Data Analysis
- Are there any null values or outliers? How will you wrangle/handle them?
- Are there any variables that warrant transformations?
- Are there any useful variables that you can engineer with the given data?
- Do…
Marketing Analytics Part 2

Are there any useful variables that you can engineer with the given data?

Review the list of features below that we can engineer:
- The total number of dependents in the home ('Dependents') can be engineered from the sum of 'Kidhome' and 'Teenhome'
- The year of becoming a customer ('Year_Customer') can be engineered from 'Dt_Customer'
- The total amount spent ('TotalMnt') can be engineered from the sum of all features containing the keyword 'Mnt'
- The total purchases ('TotalPurchases') can be engineered from the sum of all features containing the keyword 'Purchases'
- The total number of campaigns accepted ('TotalCampaignsAcc') can be engineered from the sum of…
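The first four engineered features above can be sketched in plain Python on a single record. This is an illustration rather than the post's own code: the sample values are made up, 'NumWebPurchases' is a hypothetical second purchases column, and the 'Dt_Customer' date format (YYYY-MM-DD) is an assumption.

```python
from datetime import datetime

def engineer_features(row, date_format="%Y-%m-%d"):
    """Return a copy of one customer record with the engineered columns added."""
    out = dict(row)
    out["Dependents"] = row["Kidhome"] + row["Teenhome"]
    out["Year_Customer"] = datetime.strptime(row["Dt_Customer"], date_format).year
    # Sum every original column whose name contains the keyword.
    out["TotalMnt"] = sum(val for key, val in row.items() if "Mnt" in key)
    out["TotalPurchases"] = sum(val for key, val in row.items() if "Purchases" in key)
    return out

# Made-up record; only a few columns are shown.
customer = {
    "Kidhome": 1, "Teenhome": 1, "Dt_Customer": "2014-06-16",
    "MntFishProducts": 20, "MntGoldProds": 15,
    "NumStorePurchases": 4, "NumWebPurchases": 3,
}
features = engineer_features(customer)
print(features["Dependents"], features["Year_Customer"],
      features["TotalMnt"], features["TotalPurchases"])
```

The keyword sums iterate over the original record, not the output, so an engineered column such as 'TotalMnt' is never counted into a later sum.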
Marketing Analytics Part 3

[Plots: NumStorePurchases VS MntGoldProds; MntFishProducts Distribution; Campaign 1; Campaign 2; Campaign 3; Campaign 4; Campaign 5]

Section 03: Data Visualization
[Plots: Products VS Amount Spent; Purchases]

Conclusion
Recall the overall goal: You're a marketing analyst and you've been told by the Chief Marketing Officer that recent marketing campaigns have not been as effective as they were expected to be. You need to analyze the data set to understand this problem and propose data-driven solutions...

Summary of actionable findings to improve advertising campaign success:
- Advertising campaign acceptance is positively correlated with income and negatively correlated with having kids/teens.
  Suggested action: Create two streams of targeted advertising campaigns,…
Apache Hive Installation Steps on Ubuntu

In this tutorial, we will walk through the complete process of installing Apache Hive 3.1.2 on Ubuntu 20.

The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.

Steps for Installing Apache Hive on Ubuntu
Step 1 - Create a directory, for example: $ mkdir /home/bigdata/apachehive
Step 2 - Move to the Apache Hive directory: $ cd /home/bigdata/apachehive
Step 3 - Download Apache Hive (the link varies by country, so please get the download link from…
Apache Hadoop 3.3.1 Installation Steps on Ubuntu (Part 1)

In this tutorial, we will walk through the complete process of installing Hadoop 3.3.1 on Ubuntu 20.

Supported Java Versions
- Apache Hadoop 3.3 and later supports Java 8 and Java 11 (runtime only). Please compile Hadoop with Java 8. Compiling Hadoop with Java 11 is not supported: HADOOP-16795 - Java 11 compile support (OPEN).
- Apache Hadoop 3.0.x to 3.2.x supports only Java 8.
- Apache Hadoop 2.7.x to 2.10.x supports both Java 7 and 8.

Required software for Linux includes:
- Java must be installed. Recommended Java versions are described at HadoopJavaVersions.
- ssh must be installed and sshd must be running to use the Hadoop scripts that…
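The Java support list above can be written as a small lookup. This is a sketch for illustration; the function name is hypothetical and it only covers the version ranges stated in the list.

```python
def supported_java(hadoop_version):
    """Map a Hadoop version string to the Java versions listed for it above."""
    major, minor = (int(part) for part in hadoop_version.split(".")[:2])
    if (major, minor) >= (3, 3):
        return [8, 11]  # Java 11 is runtime only; compile with Java 8
    if (3, 0) <= (major, minor) <= (3, 2):
        return [8]
    if (2, 7) <= (major, minor) <= (2, 10):
        return [7, 8]
    return []  # version not covered by the list above

print(supported_java("3.3.1"))
```

An empty list means the version falls outside the ranges in the list, not that it is known to be incompatible.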
Apache Hadoop 3.3.1 Installation Steps on Ubuntu (Part 2)

Use the following properties in the respective files.

File: nano etc/hadoop/core-site.xml

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

File: nano etc/hadoop/hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

File: nano etc/hadoop/mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
      </property>
    </configuration>

File: nano etc/hadoop/yarn-site.xml

    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
      </property>
    </configuration>

Now check that you can ssh to the localhost without…
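All four files above share the same configuration/property/name/value layout, so they could also be generated rather than typed by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name build_site_xml is hypothetical, and writing the result to disk is left out.

```python
import xml.etree.ElementTree as ET

def build_site_xml(properties):
    """Render a dict of name -> value pairs as a Hadoop *-site.xml string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")

# Values taken from the core-site.xml and hdfs-site.xml snippets above.
core_site = build_site_xml({"fs.defaultFS": "hdfs://localhost:9000"})
hdfs_site = build_site_xml({"dfs.replication": "1"})
print(core_site)
```

Property values are passed as strings because Hadoop reads every value as text, including numeric ones such as dfs.replication.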