Customer Complaints Analysis Part 1

In this article, we will analyze consumer complaints about financial products and services that US citizens filed with the US government, using Big Data technology. We will walk through the execution of the project step by step.

Problem Statement: Analyze the data in the Hadoop ecosystem to:

  1. Get the number of complaints filed for each company.
  2. Get the number of complaints filed under each product.
  3. Get the total number of complaints filed from a particular location.
  4. Get the list of companies, grouped by location, that gave no timely response.

Attribute Information or Dataset Details:

Data: Input Format – .CSV

The public dataset is available at the website below:

https://catalog.data.gov/dataset/consumer-complaint-database

(Note: The original data contains some additional commas inside field values. These commas need to be removed before processing.)
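One way to do this cleanup is sketched below. It is not the preprocessing used in the original project; it assumes the extra commas sit inside double-quoted fields, and the function name `strip_quoted_commas` is made up for illustration. The helper drops every comma found between double quotes, and drops the quotes themselves, so that each line can later be split on plain commas.

```shell
# strip_quoted_commas [FILE...]: remove commas that appear inside
# double-quoted fields, and remove the quotes themselves.
# Assumption: a quoted field never spans more than one line.
strip_quoted_commas() {
    awk '{
        out = ""
        in_quotes = 0
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "\"") {            # toggle quoted state, drop the quote
                in_quotes = !in_quotes
                continue
            }
            if (c == "," && in_quotes)  # comma inside a field: drop it
                continue
            out = out c
        }
        print out
    }' "$@"
}
```

For example, `strip_quoted_commas complaints.csv > complaints_clean.csv` (both file names are placeholders) writes a cleaned copy. Records whose quoted text contains line breaks would need a real CSV parser instead.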

Technology Used

  1. Apache Hadoop (HDFS)
  2. Apache Pig
  3. Apache Hive
  4. Shell Script
  5. Microsoft Excel
  6. Linux

Flow Chart – Processing Logic (Customer Complaints data in Hadoop Ecosystem)

Apache Pig Script

The purpose of the Apache Pig script is to address the problems below:

  1. Get the number of complaints filed for each company.
  2. Get the number of complaints filed under each product.
  3. Get the total number of complaints filed from a particular location.
  4. Get the list of companies, grouped by location, that gave no timely response.

Four output files will be created.
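The same aggregations can be cross-checked on a small local extract before running the full job. The sketch below is not the article's Pig script. It is a shell and awk helper that assumes the file has a header row and that commas inside fields have already been removed. The function names are hypothetical, the column numbers are supplied by the caller, and the timely-response flag is assumed to hold the literal value `No`.

```shell
# count_by_column FILE COL: print "value,count" for column COL (1-based),
# skipping the header row. Covers the per-company, per-product and
# per-location counts, depending on which column is chosen.
count_by_column() {
    awk -F',' -v col="$2" '
        NR > 1 { count[$col]++ }
        END    { for (v in count) print v "," count[v] }
    ' "$1" | sort
}

# no_timely_response FILE COMPANY_COL LOCATION_COL TIMELY_COL:
# print distinct "location,company" pairs whose timely flag is "No".
no_timely_response() {
    awk -F',' -v c="$2" -v s="$3" -v t="$4" '
        NR > 1 && $t == "No" { print $s "," $c }
    ' "$1" | sort -u
}
```

For instance, if the company name were in column 8 of a cleaned sample file, `count_by_column sample.csv 8` would list each company with its complaint count, which can then be compared against the corresponding Pig output.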

Apache Pig Code – Customer_Complain_Analysis.pig

Shell Script

The purpose of this shell script is to perform cleanup (delete existing output files), execute the Pig script that runs the Customer Complaints analysis, and store the resulting files in CSV format.
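The three steps could be arranged roughly as in the sketch below. It is an illustration, not the original Customer_Complain_Analysis.sh. The HDFS and local paths, the four output directory names, and the `output_dir` Pig parameter are all assumptions. It also assumes the Pig script stores comma-delimited results, so that merging the part files yields CSV. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical locations and names; the article does not list the originals.
PIG_SCRIPT="${PIG_SCRIPT:-Customer_Complain_Analysis.pig}"
HDFS_OUT="${HDFS_OUT:-/user/hadoop/complaints/output}"
LOCAL_OUT="${LOCAL_OUT:-./results}"
OUTPUTS="by_company by_product by_location no_timely_response"

# Echo the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run_analysis() {
    # 1. Cleanup: delete output left by an earlier run (-f ignores a missing path).
    run hdfs dfs -rm -r -f "$HDFS_OUT" || return 1

    # 2. Execute the Pig script, passing the output location as a parameter.
    run pig -param "output_dir=$HDFS_OUT" -f "$PIG_SCRIPT" || return 1

    # 3. Merge each result's part files into one local CSV file.
    run mkdir -p "$LOCAL_OUT" || return 1
    for name in $OUTPUTS; do
        run hdfs dfs -getmerge "$HDFS_OUT/$name" "$LOCAL_OUT/$name.csv" || return 1
    done
}
```

Calling `run_analysis` as the last line of the script executes the whole sequence. Deleting the old output first matters because Pig refuses to store into a directory that already exists.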

Shell Script Code – Customer_Complain_Analysis.sh

By Bhavesh