Agenda
This script introduces advanced data analysis with SQL, a necessary tool for every data scientist, data engineer, and machine learning engineer who needs access to data. The idea underlying SQL is similar to that of other data analysis tools (Excel, Pandas), so it should feel intuitive to anyone with experience working with data.

Loading Data into https://sqliteonline.com/
Open the website in a browser.
Click on File and select Open DB.
Select the file database.sqlite, which is downloaded from the download section and…
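To make the comparison with Pandas and Excel concrete, here is a minimal sketch of a SQL aggregation run through Python's built-in sqlite3 module. It uses an in-memory database rather than the website, and the table name sales, its columns, and the sample rows are hypothetical; they are not the contents of database.sqlite.

```python
import sqlite3


def total_by_region(rows):
    """Load rows into an in-memory SQLite table and sum amount per region."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table, created only for this illustration.
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany(
            "INSERT INTO sales (region, amount) VALUES (?, ?)", rows
        )
        # GROUP BY plays the role of a pivot table in Excel
        # or groupby() in Pandas.
        cursor = conn.execute(
            "SELECT region, SUM(amount) AS total "
            "FROM sales GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data.
sample_rows = [("east", 10.0), ("west", 5.0), ("east", 2.5)]
totals = total_by_region(sample_rows)
print(totals)
```

The same SELECT statement could be pasted into the query editor on the website once a table with matching columns is loaded.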

System Requirements:
Java Runtime Environment - Java 1.8 or later
Memory - Sufficient memory for configurations used by sources, channels or sinks
Disk Space - Sufficient disk space for configurations used by channels or sinks
Directory Permissions - Read/Write permissions for directories used by agent

The first step is to create a Flume folder. Make a flume directory in /home/dataengineer/:
    mkdir flume
    cd flume
Go to the website https://flume.apache.org/ and click on download. On the page that opens, click on apache-flume-1.11.0-bin.tar.gz. This opens https://www.apache.org/dyn/closer.lua/flume/1.11.0/apache-flume-1.11.0-bin.tar.gz; copy the link shown there. Then type the command below:
    wget https://dlcdn.apache.org/flume/1.11.0/apache-flume-1.11.0-bin.tar.gz
You will be able to…

Step 1: Update/Upgrade Package Repository
    sudo apt update
    sudo apt upgrade
Step 2: Install MySQL
    sudo apt install mysql-server
When asked if you want to continue with the installation, answer Y and hit ENTER.
Note: If you only want to connect to a remote MySQL server instead of hosting a database on your machine, install only the MySQL Client by running:
    sudo apt install mysql-client
Step 3: Check if MySQL Service Is Running
    sudo systemctl status mysql
Step 4: Log in to MySQL Server
    sudo mysql -u root

Step 1) Create a sqoop directory with the command below so that we can download Apache Sqoop.
    mkdir sqoop
Step 2) Download the stable version of Apache Sqoop (i.e. Apache Sqoop 1.4.7 as of 2022).
Website URL: https://archive.apache.org/dist/sqoop/1.4.7/
    wget https://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
Step 3) Extract the downloaded file using the tar command.
    tar -xvzf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
Step 4) Edit the .bashrc file in your home directory.
    nano ~/.bashrc
Step 5) Add the following lines to the .bashrc file and save it.
    export SQOOP_HOME="/home/dataengineer/sqoop/sqoop-1.4.7.bin__hadoop-2.6.0"
    export PATH=$PATH:$SQOOP_HOME/bin
Step 6) Run the command below at the command prompt so the .bashrc changes take effect.
    source ~/.bashrc
Step 7) Check the installed Sqoop version using the command below.
    sqoop version…

Project idea – The idea behind this project is to analyze vehicle sales data and generate a Vehicle Sales Report, diving into data on popular vehicles along dimensions such as Total Revenue, Total Products Sold, Quarterly Revenue, Total Items Sold (By Product Line), Quarterly Revenue (By Product Line), and Overall Sales (By Product Line).

Problem Statement or Business Problem
Visualize vehicle sales data and generate a report from it. Dive into the vehicle data using the following dimensions:
Total Revenue
Total Products Sold
Quarterly Revenue
Total Items Sold (By Product Line)
Quarterly Revenue (By Product Line)
Overall Sales (By Product Line)
Proportion of Monthly Revenue…
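As a rough illustration of how a few of these dimensions could be computed, here is a minimal sketch in plain Python. The record layout (order date, product line, quantity, unit price), the sample rows, and the assumption that revenue equals quantity times unit price are all hypothetical; the project's actual dataset and tooling are not shown in this excerpt.

```python
from collections import defaultdict
from datetime import date


def quarter_of(d):
    """Return the calendar quarter (1-4) of a date."""
    return (d.month - 1) // 3 + 1


def summarize(orders):
    """Aggregate orders into a few report dimensions.

    Each order is (order_date, product_line, quantity, unit_price).
    Revenue is assumed to be quantity * unit_price.
    """
    total_revenue = 0.0
    total_items = 0
    quarterly = defaultdict(float)        # (year, quarter) -> revenue
    by_line_quarter = defaultdict(float)  # (line, year, quarter) -> revenue
    for order_date, product_line, quantity, unit_price in orders:
        revenue = quantity * unit_price
        key = (order_date.year, quarter_of(order_date))
        total_revenue += revenue
        total_items += quantity
        quarterly[key] += revenue
        by_line_quarter[(product_line,) + key] += revenue
    return {
        "total_revenue": total_revenue,
        "total_items": total_items,
        "quarterly_revenue": dict(quarterly),
        "quarterly_revenue_by_line": dict(by_line_quarter),
    }


# Hypothetical sample orders.
sample_orders = [
    (date(2003, 1, 15), "Classic Cars", 2, 100.0),
    (date(2003, 2, 10), "Motorcycles", 1, 50.0),
    (date(2003, 5, 3), "Classic Cars", 3, 100.0),
]
report = summarize(sample_orders)
print(report["total_revenue"], report["total_items"])
```

Keying the quarterly totals by (year, quarter) keeps the same quarter of different years separate, which matters once the data spans more than one year.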

Project idea – The idea behind this project is to analyze video game sales and dive into data on popular video games along dimensions such as Year, Platform, Publisher and Genre.

Problem Statement or Business Problem
Visualize sales and platform data on video games that sold more than 100k copies. Dive into data on popular video games using the following dimensions:
Year
Platform
Publisher
Genre

Attribute Information or Dataset Details:
rank: integer (nullable = true)
name: string (nullable = true)
platform: string (nullable = true)
year: string (nullable = true)
genre: string (nullable = true)
publisher: string (nullable = true)
na_sales: double (nullable = true)
eu_sales: double (nullable = true)
jp_sales:…

Project idea – The idea behind this analysis project is to analyze Slack usage.

Problem Statement or Business Problem
Slack is a messaging program designed specifically for the office, but it has also been adopted for personal use. Slack includes persistent chat rooms (channels) organized by topic, private groups, and direct messaging. In addition to these online communication features, Slack integrates with other software. In this tutorial we will try to analyze usage of the Slack software.

Technology Used
Apache Spark
Spark SQL
Scala
DataFrame-based API
Databricks Notebook

Introduction
Welcome to this project on Slack data analysis in Apache Spark, using the Databricks platform community edition server…

Project idea – The idea behind this ML project is to build a model for life expectancy and to perform statistical analysis on the factors influencing it.

Problem Statement or Business Problem
Many past studies have examined factors affecting life expectancy, considering demographic variables, income composition and mortality rates. However, the effect of immunization and the human development index was not taken into account. Also, some of the past research applied linear regression to a data set covering only one year for all the countries. Hence, this gives motivation to…

Scatter Plot (Life_Expectancy VS Adult_Mortality)
Scatter Plot (Life_Expectancy VS Infant_Deaths)
Scatter Plot (Life_Expectancy VS Alcohol)
Scatter Plot (Life_Expectancy VS Percentage_Expenditure)
Scatter Plot (Life_Expectancy VS Hepatitis_B)
Scatter Plot (Life_Expectancy VS Under_Five_Deaths)
Scatter Plot (Life_Expectancy VS Polio)
Scatter Plot (Life_Expectancy VS Total_Expenditure)
Scatter Plot (Life_Expectancy VS Diphtheria)
Scatter Plot (Life_Expectancy VS HIV_AIDS)
Scatter Plot (Life_Expectancy VS GDP)
Scatter Plot (Life_Expectancy VS Population)
Scatter Plot (Life_Expectancy VS Thinness_1_19_years)
Scatter Plot (Life_Expectancy VS Thinness_5_9_years)
Scatter Plot (Life_Expectancy VS Income_Composition_of_Resources)
Scatter Plot (Life_Expectancy VS Schooling)
Scatter Plot (Schooling VS Adult_Mortality)
Scatter Plot (Schooling VS Income_Composition_of_Resources)
Scatter Plot (Adult_Mortality VS Income_Composition_of_Resources)

Collecting all String Columns into…

Project idea – The idea behind this analysis project is to analyze why a person makes a doctor's appointment, receives all the instructions, and then does not show up. Who is to blame?

Problem Statement or Business Problem
A person makes a doctor's appointment, receives all the instructions, and does not show up. Who is to blame? In this tutorial we will try to analyze why some patients do not show up for their medical appointments and whether the data we have suggests reasons for that. We will try to find correlations between the different attributes we have and whether the patient shows up or not. The dataset…
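One simple way to start looking for such relationships is to compare the no-show rate across the values of a single attribute. The sketch below does this in plain Python; the field names reminder_sent and no_show and the sample records are hypothetical, since the dataset's actual columns are not shown in this excerpt.

```python
from collections import defaultdict


def no_show_rate_by(records, attribute):
    """Return {attribute value: fraction of appointments that were no-shows}.

    Each record is a dict with the given attribute and a boolean
    'no_show' field (hypothetical field names).
    """
    totals = defaultdict(int)
    missed = defaultdict(int)
    for record in records:
        value = record[attribute]
        totals[value] += 1
        if record["no_show"]:
            missed[value] += 1
    return {value: missed[value] / totals[value] for value in totals}


# Hypothetical sample appointments.
appointments = [
    {"reminder_sent": True, "no_show": True},
    {"reminder_sent": True, "no_show": False},
    {"reminder_sent": False, "no_show": False},
    {"reminder_sent": False, "no_show": False},
]
rates = no_show_rate_by(appointments, "reminder_sent")
print(rates)
```

A large gap between the rates for different values suggests the attribute is worth a closer look, though it does not by itself show that the attribute causes the no-show.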