Generate Analytics from a Product-Based Company's Web Log: Part 1

In this article, we explore generating analytics from a product-based company's web log. Even with a small dataset, we can still gain valuable insights.

Problem Statement: Generate analytics from the weblog data using the Hadoop ecosystem:

  1. Load weblog data into HDFS using the HDFS client
  2. Develop a Pig program to load the log and perform analytics on IP, Category-1, Category-2, page, and status_code

2.1. Count of page views by individual user, i.e. [IP, count(*)]
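The per-user count in step 2.1 is a group-by on the IP field. The following is a minimal plain-Python sketch of that logic, meant for checking results on a small sample rather than as the article's Pig script. The field order (ip, category1, category2, page, status_code), the whitespace delimiter, and the sample rows are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample rows. The field order (ip, category1, category2, page,
# status_code) and the whitespace delimiter are assumptions, not taken from
# the real dataset.
SAMPLE_LOG = """\
10.0.0.1 electronics phones /iphone 200
10.0.0.2 electronics phones /iphone 200
10.0.0.1 books fiction /dune 404
10.0.0.3 books fiction /dune 200
10.0.0.1 electronics laptops /thinkpad 200
"""


def page_views_per_ip(log_text):
    """Return a Counter mapping each IP to its number of log lines."""
    counts = Counter()
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        counts[fields[0]] += 1
    return counts


views = page_views_per_ip(SAMPLE_LOG)
for ip, count in sorted(views.items()):
    print(ip, count)
```

Every request is counted here regardless of status code, since step 2.1 does not ask for a status filter.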

2.2. Top / bottom 2: category-1 / category-2 / page / users (exclude status codes other than 200)

Top 2 and bottom 2 records

  • category, total_number_of_views
  • page, total_number_of_views
  • IP, total_number_of_views
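The top 2 / bottom 2 reports in step 2.2 share one pattern: keep only status 200 rows, count by a field, then sort in both directions. Below is a hedged plain-Python sketch of that pattern. The tuple layout, the sample rows, and the alphabetical tie-break are assumptions for illustration. The source does not say how ties should be resolved.

```python
from collections import Counter

# Assumed column positions: (ip, category1, category2, page, status_code).
IP, CATEGORY_1, CATEGORY_2, PAGE, STATUS = range(5)

# Hypothetical sample rows, not taken from the real dataset.
SAMPLE_ROWS = [
    ("10.0.0.1", "electronics", "phones", "/iphone", "200"),
    ("10.0.0.2", "electronics", "phones", "/iphone", "200"),
    ("10.0.0.3", "electronics", "phones", "/iphone", "200"),
    ("10.0.0.1", "books", "fiction", "/dune", "200"),
    ("10.0.0.2", "books", "fiction", "/dune", "200"),
    ("10.0.0.1", "toys", "puzzles", "/cube", "200"),
    ("10.0.0.4", "toys", "puzzles", "/cube", "404"),
    ("10.0.0.1", "garden", "tools", "/rake", "500"),
]


def top_and_bottom(rows, field_index, n=2):
    """Return (top n, bottom n) (value, views) pairs over status 200 rows."""
    counts = Counter(row[field_index] for row in rows if row[STATUS] == "200")
    # Ties are broken alphabetically so the output is deterministic (assumption).
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    bottom = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))[:n]
    return top, bottom


for name, index in [("category-1", CATEGORY_1), ("category-2", CATEGORY_2),
                    ("page", PAGE), ("ip", IP)]:
    top, bottom = top_and_bottom(SAMPLE_ROWS, index)
    print(name, "top:", top, "bottom:", bottom)
```

Values that only appear with a non-200 status (such as the `garden` category in this sample) drop out entirely, so they never show up in the bottom 2.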

2.3. Total page views / category-wise page views / unique page views

  • page, total_number_of_views
  • category, total_views
  • page, total_number_of_unique_views
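The three summaries in step 2.3 can be computed in one pass over the rows. In this plain-Python sketch, "unique views" is taken to mean the number of distinct IPs per page, and no status filter is applied. Both are assumptions, since the source does not define either. The tuple layout and sample rows are also illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows in the assumed layout
# (ip, category1, category2, page, status_code).
SAMPLE_ROWS = [
    ("10.0.0.1", "electronics", "phones", "/iphone", "200"),
    ("10.0.0.1", "electronics", "phones", "/iphone", "200"),
    ("10.0.0.2", "electronics", "phones", "/iphone", "200"),
    ("10.0.0.2", "electronics", "laptops", "/thinkpad", "200"),
    ("10.0.0.3", "books", "fiction", "/dune", "404"),
]


def view_summaries(rows):
    """Return (views per page, views per category-1, unique views per page)."""
    page_views = Counter()
    category_views = Counter()
    visitors_by_page = defaultdict(set)
    for ip, category1, _category2, page, _status in rows:
        page_views[page] += 1
        category_views[category1] += 1
        visitors_by_page[page].add(ip)  # a set keeps each IP only once
    unique_page_views = {
        page: len(ips) for page, ips in visitors_by_page.items()
    }
    return page_views, category_views, unique_page_views


page_views, category_views, unique_page_views = view_summaries(SAMPLE_ROWS)
print(dict(page_views))
print(dict(category_views))
print(unique_page_views)
```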

2.4. Count of status code = 200 / 404 / 400 / 500

  • status_code, count
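The status-code report in step 2.4 is a count restricted to the four codes listed. This is a small plain-Python sketch of that logic, under the assumption that the status code is the last field of each row and is stored as text. The sample rows are hypothetical.

```python
from collections import Counter

# The four codes named in step 2.4.
STATUS_CODES = ("200", "404", "400", "500")

# Hypothetical sample rows in the assumed layout
# (ip, category1, category2, page, status_code).
SAMPLE_ROWS = [
    ("10.0.0.1", "electronics", "phones", "/iphone", "200"),
    ("10.0.0.2", "electronics", "phones", "/iphone", "200"),
    ("10.0.0.3", "books", "fiction", "/dune", "404"),
    ("10.0.0.1", "toys", "puzzles", "/cube", "500"),
    ("10.0.0.4", "toys", "puzzles", "/cube", "302"),
]


def status_code_counts(rows):
    """Return {status_code: count} for the four codes of interest."""
    counts = Counter(row[-1] for row in rows)
    # Codes that never occur are reported as 0; other codes are ignored.
    return {code: counts.get(code, 0) for code in STATUS_CODES}


result = status_code_counts(SAMPLE_ROWS)
for code, count in result.items():
    print(code, count)
```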

  3. Load results into tables in a MySQL database using Sqoop.
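Loading a result set into MySQL with Sqoop typically comes down to one `sqoop export` call per output directory. The sketch below only assembles and prints such a command and executes nothing. The table name, HDFS path, JDBC URL, and user name are hypothetical placeholders, and the tab delimiter assumes the Pig job stored its output with the default `PigStorage` separator.

```python
import shlex


def build_sqoop_export(table, export_dir, jdbc_url, username, delimiter="\t"):
    """Return the argument list for a `sqoop export` run (nothing is executed)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "-P",  # prompt for the password instead of passing it on the command line
        "--table", table,            # target MySQL table, which must already exist
        "--export-dir", export_dir,  # HDFS directory holding the result files
        "--input-fields-terminated-by", delimiter,
    ]


# All values below are hypothetical placeholders.
args = build_sqoop_export(
    table="status_code_counts",
    export_dir="/user/hadoop/weblog/output/status_code_counts",
    jdbc_url="jdbc:mysql://localhost/weblog",
    username="analyst",
)
print(" ".join(shlex.quote(arg) for arg in args))
```

Building the command as a list keeps each flag and its value together, which makes it straightforward to reuse the same function for every result table.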

Attribute Information or Dataset Details:

Data: Input Format – Text

Technology Used

  1. Apache Hadoop (HDFS)
  2. Apache Pig
  3. Apache Hive
  4. MySQL
  5. Shell Script
  6. Apache Sqoop
  7. Linux

Apache Pig Code


Shell Script

By Bhavesh