Apache Hadoop

Generate Analytics from a Product based Company Web Log Part 1

In this article, we explore generating analytics from the web log of a product-based company. Even with a small data set, we can still gain a lot of valuable insights.

Problem Statement: Generate analytics based on the data in the Hadoop ecosystem:

1. Load the weblog data into HDFS using the HDFS client.
2. Develop a Pig program to load the log and perform analytics on IP, Category-1, Category-2, page, status_code.
   2.1. Count of page views by individual user, i.e. [IP, count(*)]
   2.2. Top / bottom 2: category-1 / category-2 / page / users (exclude status codes other than 200)
        Top 2 and bottom 2 records: Category, total_number_viewspage,…
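The two analytics in step 2 can be sketched in plain Python to make the expected results concrete. This is only an illustration of the logic, not the article's Pig program: the comma-separated field layout, the sample rows, and the names `parse_line`, `views_per_ip`, and `top_bottom` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical sample rows; assumed layout: ip,category1,category2,page,status_code
SAMPLE_LOG = [
    "10.0.0.1,electronics,phones,/p/1,200",
    "10.0.0.1,electronics,laptops,/p/2,200",
    "10.0.0.2,books,fiction,/p/3,404",
    "10.0.0.2,books,fiction,/p/3,200",
    "10.0.0.3,toys,puzzles,/p/4,200",
    "10.0.0.1,toys,puzzles,/p/4,200",
]


def parse_line(line):
    """Split one log line into a dict of named fields (assumed layout)."""
    ip, cat1, cat2, page, status = line.strip().split(",")
    return {
        "ip": ip,
        "category1": cat1,
        "category2": cat2,
        "page": page,
        "status": int(status),
    }


def views_per_ip(records):
    """Step 2.1: count page views per user, i.e. [IP, count(*)]."""
    return Counter(r["ip"] for r in records)


def top_bottom(records, field, n=2):
    """Step 2.2: top n and bottom n values of `field`, keeping only status 200."""
    counts = Counter(r[field] for r in records if r["status"] == 200)
    # Sort by descending count; ties are broken alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n], ranked[-n:]


records = [parse_line(line) for line in SAMPLE_LOG]
print(views_per_ip(records))
print(top_bottom(records, "category1"))
```

The same `top_bottom` call can be repeated with `"category2"`, `"page"`, or `"ip"` as the field. In Pig, the equivalent shape would be a FILTER on the status code followed by GROUP, COUNT, and ORDER.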

Sensex Log Data Processing (PDF File Processing in Map Reduce) Part 1

In this article, we will see how to process a Sensex log (share market) in PDF format using Big Data technology, walking step by step through the execution of the project.

Problem Statement: Analyse the data in the Hadoop ecosystem to:

1. Take the complete PDF input data on HDFS.
2. Develop a MapReduce use case to get the below-filtered results from the HDFS input data (Excel data):

   If TYPE OF TRADING is --> 'SIP'
      - OPEN_BALANCE > 25000 & FLTUATION_RATE > 10 --> store "HighDemandMarket"
      - CLOSING_BALANCE < 22000 & FLTUATION_RATE IN BETWEEN 20 - 30 --> store "OnGoingMarketStretegy"
   If TYPE OF…
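The two 'SIP' rules visible in the excerpt can be written out as a small Python sketch to show how a record maps to an output name. This is not the article's MapReduce job. The dictionary keys, the sample records, and the function name `matching_outputs` are hypothetical, and "IN BETWEEN 20 - 30" is assumed to be inclusive. Rules for other trading types are cut off in the excerpt, so the sketch returns nothing for them.

```python
def matching_outputs(record):
    """Return the output names whose visible 'SIP' rule the record satisfies."""
    outputs = []
    if record["type_of_trading"] != "SIP":
        # Rules for other trading types are truncated in the excerpt.
        return outputs
    if record["open_balance"] > 25000 and record["fluctuation_rate"] > 10:
        outputs.append("HighDemandMarket")
    # Assumption: the 20 - 30 range includes both endpoints.
    if record["closing_balance"] < 22000 and 20 <= record["fluctuation_rate"] <= 30:
        outputs.append("OnGoingMarketStretegy")
    return outputs


# Hypothetical records, for illustration only.
high_demand = {
    "type_of_trading": "SIP",
    "open_balance": 30000,
    "closing_balance": 28000,
    "fluctuation_rate": 15,
}
ongoing = {
    "type_of_trading": "SIP",
    "open_balance": 20000,
    "closing_balance": 21000,
    "fluctuation_rate": 25,
}

print(matching_outputs(high_demand))
print(matching_outputs(ongoing))
```

The function returns a list because the two conditions are not mutually exclusive: a record with a high opening balance, a low closing balance, and a rate between 20 and 30 satisfies both. The excerpt does not say which rule takes precedence in that case.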

Sensex Log Data Processing (PDF File Processing in Map Reduce) Part 2

Apache Pig Script

SENSEX.pig

A = LOAD '/hdfs/bhavesh/SENSEX/OUTPUT/HighDemandMarket-r-00000' using PigStorage('\t') as (Sid:int,Sname:chararray,Ttrading:chararray,Sloc:chararray,OBal:int,CBal:int,Frate:int);
disHM = DISTINCT A;
orHM = ORDER disHM by Sid;
STORE orHM into '/hdfs/bhavesh/SENSEX/HM' using PigStorage(',');

A = LOAD '/hdfs/bhavesh/SENSEX/OUTPUT/ReliableProducts-r-00000' using PigStorage('\t') as (Sid:int,Sname:chararray,Ttrading:chararray,Sloc:chararray,OBal:int,CBal:int,Frate:int);
disRP = DISTINCT A;
orRP = ORDER disRP by Sid;
STORE orRP into '/hdfs/bhavesh/SENSEX/RP' using PigStorage(',');

A = LOAD '/hdfs/bhavesh/SENSEX/OUTPUT/OtherProducts-r-00000' using PigStorage('\t') as (Sid:int,Sname:chararray,Ttrading:chararray,Sloc:chararray,OBal:int,CBal:int,Frate:int);
disOP = DISTINCT A;
orOP = ORDER disOP by Sid;
STORE orOP into '/hdfs/bhavesh/SENSEX/OP' using PigStorage(',');

A = LOAD '/hdfs/bhavesh/SENSEX/OUTPUT/WealthyProducts-r-00000' using PigStorage('\t') as (Sid:int,Sname:chararray,Ttrading:chararray,Sloc:chararray,OBal:int,CBal:int,Frate:int);
disWP = DISTINCT A;
orWP = ORDER disWP by Sid;
STORE orWP into '/hdfs/bhavesh/SENSEX/WP' using PigStorage(',');

A…