Download a recent stable release from the Apache Pig releases page: https://pig.apache.org/releases.html
Click Download. A new page opens (https://www.apache.org/dyn/closer.cgi/pig).
Click the link marked in the image below. A new page opens.
Click the latest folder. Another page opens.
Download the file shown in the image.
We downloaded the file to the directory /home/dataengineer/apachepig/
Unpack the file using the command below:
tar -xvzf pig-0.17.0.tar.gz
Add /pig-n.n.n/bin to your path. Use export (bash, sh, ksh) or setenv (tcsh, csh). For example:
$ export PATH=/<my-path-to-pig>/pig-n.n.n/bin:$PATH
Executing the Pig help command
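The unpack and PATH steps above can be collected into one small script. This is a minimal sketch, not an official installer: the variable names PIG_PARENT, PIG_VERSION, and PIG_HOME are introduced here for illustration, and their defaults simply reuse the directory and version mentioned in this walkthrough.

```shell
#!/bin/sh
# Assumed locations: adjust to wherever you saved the download.
PIG_PARENT="${PIG_PARENT:-/home/dataengineer/apachepig}"
PIG_VERSION="${PIG_VERSION:-0.17.0}"
PIG_HOME="$PIG_PARENT/pig-$PIG_VERSION"
PIG_ARCHIVE="$PIG_PARENT/pig-$PIG_VERSION.tar.gz"

# Unpack only when the archive exists and has not been extracted yet.
if [ -f "$PIG_ARCHIVE" ] && [ ! -d "$PIG_HOME" ]; then
    tar -xzf "$PIG_ARCHIVE" -C "$PIG_PARENT"
fi

# Prepend Pig's bin directory to PATH once, even if the script is re-run.
case ":$PATH:" in
    *":$PIG_HOME/bin:"*) ;;
    *) PATH="$PIG_HOME/bin:$PATH"; export PATH ;;
esac

# Run the help command only if the pig launcher is actually reachable.
if command -v pig >/dev/null 2>&1; then
    pig -help
else
    echo "pig not found on PATH; check PIG_HOME=$PIG_HOME" >&2
fi
```

Because the script changes PATH, source it (for example `. ./setup_pig.sh`, a hypothetical file name) rather than executing it if you want the change to persist in the current shell.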
With more companies turning to big data to run their business, the demand for talent is at an all-time high. What does that mean for you? It translates to better opportunities if you want to work in any big data-related field.…
In this article, we analyze consumer complaints about financial products and services that US citizens filed with the US government, using big data technology. We walk through the project's execution step by step.
Problem Statement: Analyze the data in the Hadoop ecosystem to:
Get the number of complaints filed for each company.
Get the number of complaints filed under each product.
Get the total number of complaints filed from a particular location.
Get the list of companies, grouped by location, that gave no timely response.
Attribute Information or Dataset Details:
Data: Input format - .CSV
The public dataset is available at https://catalog.data.gov/dataset/consumer-complaint-database…
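Before running the Hadoop jobs, the four aggregations in the problem statement can be sanity-checked locally on a few rows. The sketch below uses awk rather than Pig or Hive, so it is a stand-in for the project's pipeline, not a copy of it. The four-column layout and the sample rows are made up for illustration; the real dataset has more columns and may contain quoted fields with embedded commas, which this naive comma split does not handle.

```shell
#!/bin/sh
# Hypothetical sample: company,product,state,timely_response (not the real schema).
sample_csv=$(mktemp)
cat > "$sample_csv" <<'EOF'
company,product,state,timely_response
Acme Bank,Mortgage,CA,Yes
Acme Bank,Credit card,NY,No
Zenith Credit,Mortgage,CA,Yes
EOF

# Count rows per distinct value of a 1-based column, skipping the header row.
# $1 = CSV file, $2 = column number. Output lines look like "value,count".
count_by_column() {
    awk -F',' -v col="$2" \
        'NR > 1 { n[$col]++ } END { for (k in n) print k "," n[k] }' "$1" | sort
}

by_company=$(count_by_column "$sample_csv" 1)
by_product=$(count_by_column "$sample_csv" 2)
by_state=$(count_by_column "$sample_csv" 3)

# Companies per location where the response was not timely ("state,company").
no_timely=$(awk -F',' 'NR > 1 && $4 == "No" { print $3 "," $1 }' "$sample_csv" | sort)

echo "$by_company"
rm -f "$sample_csv"
```

Each variable holds one small result set, mirroring the one-file-per-question layout of the project's output.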
Execution of the shell script
Output
Chart view in Excel
1. Get the number of complaints filed for each company. (Complains_by_Company.csv)
2. Get the number of complaints filed under each product. (Complains_by_Product.csv)
3. Get the total number of complaints filed from a particular location. (Complains_by_Location.csv)
4. Get the list of companies, grouped by location, that gave no timely response. (Complains_by_Response_No.csv)
(Note: To generate the chart below, use Excel's PivotChart option to group the data.)
In this article, we analyze YouTube data using big data technology and walk through the project's execution step by step. YouTube is an American online video-sharing platform headquartered in San Bruno, California. Three former PayPal employees (Chad Hurley, Steve Chen, and Jawed Karim) created the service in February 2005.
Problem Statement:
1) Find the top 5 categories with the most uploaded videos.
2) Find the top 10 rated videos.
3) Find the top 10 most viewed videos.
4) Find the top 10 rated videos in each category.
5) Find…
In this article, we analyze data from social bookmarking sites to find insights using big data technology. A bookmarking site lets you bookmark, review, rate, and search links on any topic. The data is in XML format and contains the categories that describe each link and the ratings associated with it.
Problem Statement: Analyze the data in the Hadoop ecosystem to:
Fetch the data into the Hadoop Distributed File System and analyze it with the help of MapReduce, Pig, and Hive…
Execution of the shell script
MapReduce output (XML converted to a comma-separated file)
Apache Pig script execution
The Apache Pig script generates 4 output files
Apache Hive execution
Apache Hive output
Apache Sqoop execution
Apache Sqoop output in RDBMS tables
In this article, we explore generating analytics for a product-based company from its web log. Even with a small amount of data, we can still gain valuable insights.
Problem Statement: Generate analytics based on the data in the Hadoop ecosystem:
1. Load the weblog data into HDFS using the HDFS client.
2. Develop a Pig program to load the log and perform analytics on IP, Category-1, Category-2, page, status_code.
2.1. Count of page views by individual user, i.e. [IP, count(*)]
2.2. Top / Bottom 2: category-1 / category-2 / page / users (exclude status codes other than 200)
Top 2 and bottom 2 records Category, total_number_viewspage,…
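The counting and top/bottom ranking described in items 2.1 and 2.2 can be prototyped locally before writing the Pig program. This is a rough sketch using awk and sort, not the article's Pig script. It assumes a whitespace-separated log with the five fields in the order IP, Category-1, Category-2, page, status_code; that layout and the sample lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical log lines: ip category1 category2 page status_code
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 books fiction /a.html 200
10.0.0.1 books fiction /a.html 200
10.0.0.2 books fiction /a.html 200
10.0.0.2 toys games /b.html 200
10.0.0.3 toys games /b.html 200
10.0.0.3 toys games /c.html 200
10.0.0.1 toys games /c.html 404
EOF

# 2.1: page views per user, printed as "ip count" and sorted by ip.
views_by_ip=$(awk '{ n[$1]++ } END { for (k in n) print k, n[k] }' "$log" | sort)

# 2.2: views per page, keeping only requests with status code 200 ("count page").
page_counts=$(awk '$5 == 200 { n[$4]++ } END { for (k in n) print n[k], k }' "$log")

# Rank by count: descending for the top 2, ascending for the bottom 2.
top2_pages=$(echo "$page_counts" | sort -k1,1nr | head -n 2)
bottom2_pages=$(echo "$page_counts" | sort -k1,1n | head -n 2)

echo "$top2_pages"
rm -f "$log"
```

The same filter, group, count, and sort pattern applies to category-1, category-2, and users by changing which field is used as the array key. The sample avoids tied counts; with ties, the order among equal counts would need an explicit secondary sort key.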
Shell script execution
Apache Hive output
MySQL output