Bhavesh

77 Posts
YouTube Data Analysis Part-1

In this article, we will see how to analyze YouTube data using Big Data technology, walking through the project step by step. YouTube is an American online video-sharing platform headquartered in San Bruno, California. Three former PayPal employees, Chad Hurley, Steve Chen, and Jawed Karim, created the service in February 2005. Problem Statement: 1) Find the top 5 categories in which the most videos are uploaded. 2) Find the top 10 rated videos. 3) Find the top 10 most viewed videos. 4) Find the top 10 rated videos in each category. 5) Find…
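The first few questions in the problem statement are all "count, then rank" or "sort, then take the top N" queries. The following is a minimal Python sketch of that idea, not the project's actual implementation. The record layout `(video_id, category, rating, views)`, the sample rows, and the function names are hypothetical, since the excerpt does not show the dataset's columns.

```python
from collections import Counter

# Hypothetical sample records: (video_id, category, rating, views).
# The real dataset's columns are not shown in the excerpt; this layout is an assumption.
videos = [
    ("v1", "Music", 4.8, 1200),
    ("v2", "Comedy", 4.1, 900),
    ("v3", "Music", 3.9, 3000),
    ("v4", "Sports", 4.5, 150),
    ("v5", "Music", 4.9, 700),
    ("v6", "Comedy", 2.7, 50),
]


def top_categories(records, n=5):
    """Return the n categories with the most uploaded videos as (category, count)."""
    counts = Counter(category for _, category, _, _ in records)
    return counts.most_common(n)


def top_by(records, field_index, n=10):
    """Return the n records with the largest value in the given field."""
    return sorted(records, key=lambda r: r[field_index], reverse=True)[:n]


def top_rated_per_category(records, n=10):
    """Group records by category, then keep the n highest-rated in each group."""
    grouped = {}
    for record in records:
        grouped.setdefault(record[1], []).append(record)
    return {category: top_by(rows, 2, n) for category, rows in grouped.items()}


if __name__ == "__main__":
    print(top_categories(videos))           # categories ranked by upload count
    print(top_by(videos, 2))                # ranked by rating (index 2)
    print(top_by(videos, 3))                # ranked by views (index 3)
    print(top_rated_per_category(videos))   # per-category rating ranking
```

On a cluster, the same grouping and ranking would typically be expressed in MapReduce, Pig, or Hive rather than in-memory Python.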
Analyze social bookmarking sites to find insights Part 1

In this article, we will analyze social bookmarking sites to find insights using Big Data technology. The data comprises information gathered from bookmarking sites, which allow you to bookmark, review, rate, and search links on any topic. The data is in XML format and contains various categories defining it and the ratings linked with it. Problem Statement: Analyze the data in the Hadoop ecosystem to: Fetch the data into the Hadoop Distributed File System and analyze it with the help of MapReduce, Pig, and Hive…
Generate Analytics from a Product based Company Web Log Part 1

In this article, we explore generating analytics from a product-based company's web log. Even with a small amount of data, we can still gain a lot of valuable insights. Problem Statement: Generate analytics based on the data in the Hadoop ecosystem: 1. Load weblog data into HDFS using the HDFS client. 2. Develop a Pig program to load the log and perform analytics on IP, Category-1, Category-2, page, status_code. 2.1. Count of page views by individual user, i.e. [IP, count(*)]. 2.2. Top / bottom 2: category-1 / category-2 / page / users (exclude status codes other than 200). Top 2 and bottom 2 records: Category, total_number_viewspage,…
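Items 2.1 and 2.2 above can be illustrated with a small Python sketch. This is only an in-memory illustration of the counting logic, not the Pig program from the article. The tuple layout `(ip, category1, category2, page, status_code)` follows the fields named in the excerpt, but the sample rows and function names are hypothetical.

```python
from collections import Counter

# Hypothetical log rows: (ip, category1, category2, page, status_code).
log = [
    ("10.0.0.1", "electronics", "phones", "/p/1", 200),
    ("10.0.0.1", "electronics", "laptops", "/p/2", 200),
    ("10.0.0.2", "books", "fiction", "/p/3", 404),
    ("10.0.0.2", "electronics", "phones", "/p/1", 200),
    ("10.0.0.3", "books", "fiction", "/p/3", 200),
    ("10.0.0.1", "toys", "puzzles", "/p/4", 200),
    ("10.0.0.1", "electronics", "phones", "/p/1", 500),
]


def views_by_ip(rows):
    """2.1: count of page views per individual user, i.e. [IP, count(*)]."""
    return Counter(row[0] for row in rows)


def top_and_bottom(rows, field_index, n=2):
    """2.2: top n and bottom n values of one field, keeping only status 200 rows."""
    ok_rows = [row for row in rows if row[4] == 200]
    counts = Counter(row[field_index] for row in ok_rows)
    # Sort by descending count; break ties alphabetically so the result is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n], ranked[-n:]


if __name__ == "__main__":
    print(views_by_ip(log))
    print(top_and_bottom(log, 1))  # category-1
    print(top_and_bottom(log, 3))  # page
```

Whether the status-code filter should also apply to the per-user count in 2.1 is not stated in the excerpt; the sketch applies it only to 2.2.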
Sensex Log Data Processing (PDF File Processing in Map Reduce) Part 1

In this article, we will see how to process a Sensex log (share market data) in PDF format using Big Data technology, walking through the project step by step. Problem Statement: Analyze the data in the Hadoop ecosystem to: Take the complete PDF input data on HDFS. Develop a MapReduce use case to get the below filtered results from the HDFS input data (Excel data):
If TYPE OF TRADING is --> 'SIP'
  - OPEN_BALANCE > 25000 & FLTUATION_RATE > 10 --> store "HighDemandMarket"
  - CLOSING_BALANCE < 22000 & FLTUATION_RATE in between 20 - 30 --> store "OnGoingMarketStretegy"
If TYPE OF…
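The two 'SIP' rules visible in the excerpt can be sketched as a small Python filter. This is an illustrative assumption-laden sketch, not the MapReduce job from the article: the dictionary keys and the function name are hypothetical, the 20 - 30 range is assumed to be inclusive, and a record matching both rules is assumed to go to both outputs. Rules for other trading types are cut off in the excerpt, so they are not guessed at here.

```python
def classify_sip(record):
    """Return the output names a record is stored under, per the two visible SIP rules.

    Assumptions (not confirmed by the excerpt): the 20-30 range is inclusive,
    and a record that satisfies both rules is written to both outputs.
    """
    if record["type_of_trading"] != "SIP":
        # Rules for other trading types are truncated in the excerpt.
        return []
    outputs = []
    if record["open_balance"] > 25000 and record["fluctuation_rate"] > 10:
        outputs.append("HighDemandMarket")
    if record["closing_balance"] < 22000 and 20 <= record["fluctuation_rate"] <= 30:
        outputs.append("OnGoingMarketStretegy")
    return outputs


if __name__ == "__main__":
    sample = {
        "type_of_trading": "SIP",
        "open_balance": 26000,
        "closing_balance": 21000,
        "fluctuation_rate": 25,
    }
    print(classify_sip(sample))
```

In a MapReduce job, a mapper could apply the same checks per record and route matches to named outputs.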
Sensex Log Data Processing (PDF File Processing in Map Reduce) Part 2

Apache Pig Script

SENSEX.pig

A = LOAD '/hdfs/bhavesh/SENSEX/OUTPUT/HighDemandMarket-r-00000' using PigStorage('\t') as (Sid:int,Sname:chararray,Ttrading:chararray,Sloc:chararray,OBal:int,CBal:int,Frate:int);
disHM = DISTINCT A;
orHM = ORDER disHM by Sid;
STORE orHM into '/hdfs/bhavesh/SENSEX/HM' using PigStorage(',');

A = LOAD '/hdfs/bhavesh/SENSEX/OUTPUT/ReliableProducts-r-00000' using PigStorage('\t') as (Sid:int,Sname:chararray,Ttrading:chararray,Sloc:chararray,OBal:int,CBal:int,Frate:int);
disRP = DISTINCT A;
orRP = ORDER disRP by Sid;
STORE orRP into '/hdfs/bhavesh/SENSEX/RP' using PigStorage(',');

A = LOAD '/hdfs/bhavesh/SENSEX/OUTPUT/OtherProducts-r-00000' using PigStorage('\t') as (Sid:int,Sname:chararray,Ttrading:chararray,Sloc:chararray,OBal:int,CBal:int,Frate:int);
disOP = DISTINCT A;
orOP = ORDER disOP by Sid;
STORE orOP into '/hdfs/bhavesh/SENSEX/OP' using PigStorage(',');

A = LOAD '/hdfs/bhavesh/SENSEX/OUTPUT/WealthyProducts-r-00000' using PigStorage('\t') as (Sid:int,Sname:chararray,Ttrading:chararray,Sloc:chararray,OBal:int,CBal:int,Frate:int);
disWP = DISTINCT A;
orWP = ORDER disWP by Sid;
STORE orWP into '/hdfs/bhavesh/SENSEX/WP' using PigStorage(',');

A…
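Each stanza of the Pig script does the same thing: load a tab-separated MapReduce output, drop duplicate rows, order by `Sid`, and store the result comma-separated. The Python sketch below mirrors that pipeline on in-memory lines for illustration only. The function name and sample rows are hypothetical, and the secondary sort on the full row is an addition to keep the output deterministic; Pig's `ORDER ... by Sid` does not guarantee an order among equal `Sid` values.

```python
def distinct_ordered(lines):
    """Mimic LOAD ('\\t') -> DISTINCT -> ORDER by Sid -> STORE (',') on text lines."""
    # Split on tabs and deduplicate whole rows, like DISTINCT.
    rows = {tuple(line.rstrip("\n").split("\t")) for line in lines if line.strip()}
    # Sid is the first field and is declared as int in the Pig schema.
    ordered = sorted(rows, key=lambda row: (int(row[0]), row))
    # Re-join with commas, like PigStorage(',').
    return [",".join(row) for row in ordered]


if __name__ == "__main__":
    # Hypothetical rows following the schema
    # (Sid, Sname, Ttrading, Sloc, OBal, CBal, Frate).
    sample = [
        "102\tBeta\tSIP\tPune\t26000\t27000\t12\n",
        "101\tAlpha\tSIP\tMumbai\t30000\t31000\t15\n",
        "102\tBeta\tSIP\tPune\t26000\t27000\t12\n",
    ]
    for out_line in distinct_ordered(sample):
        print(out_line)
```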
Sentiment Analysis on Demonetization in India using Apache Spark

In this article, we explore the sentiments of people in India during demonetization. Even with a small amount of data, I could still gain a lot of valuable insights. I have used Spark SQL and the built-in graphs provided by Databricks. India is the second-most populous country in the world, with over 1.271 billion people, more than a sixth of the world's population. Let us find out the views of different people on demonetization by analyzing tweets from Twitter. Attribute Information or Dataset Details: Table created in the Databricks environment. Technology Used: Apache Spark, Spark SQL, DataFrame-based API, Databricks Notebook Free Account…
Analytics on India census using Apache Spark Part 1

In this article, we explore census data for India to understand changes in India's demographics, population growth, religion distribution, gender distribution, and sex ratio. Even with a small amount of data, I could still gain a lot of valuable insights about the country. I have used Spark SQL and the built-in graphs provided by Databricks. India is the second-most populous country in the world, with over 1.271 billion people, more than a sixth of the world's population. Already containing 17.5% of the world's population, India is projected to be the world's most populous country by 2025, surpassing China, its population reaching…