Apache Hive

Installing Apache Druid on the Local Machine

Installing Apache Druid on the Local Machine

Apache Druid is a real-time analytics database designed for fast slice-and-dice analytics ("OLAP" queries) on large data sets. Most often, Druid powers use cases where real-time ingestion, fast query performance, and high uptime are important.Druid is commonly used as the database backend for GUIs of analytical applications, or for highly-concurrent APIs that need fast aggregations. Druid works best with event-oriented data.Common application areas for Druid include:Clickstream analytics including web and mobile analyticsNetwork telemetry analytics including network performance monitoringServer metrics storageSupply chain analytics including manufacturing metricsApplication performance metricsDigital marketing/advertising analyticsBusiness intelligence/OLAP Prerequisites You can follow these steps on a relatively modest…
Read More
Apache Hive Installation Steps on Ubuntu

Apache Hive Installation Steps on Ubuntu

With this tutorial, we will learn the complete process to install Apache Hive 3.1.2 on Ubuntu 20.The Apache Hive  data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.Steps for Installing Hadoop on UbuntuStep 1 - Create a directory for example $mkdir /home/bigdata/apachehive Step 2 - Move to hadoop directory $cd /home/bigdata/apachehive Step 3 - Download Apache Hive (Link will change with respect to country so please get the download link from…
Read More
Top 1000+ Big Data Interview Question and Answers

Top 1000+ Big Data Interview Question and Answers

With more companies turning to big data to run their business, the demand for talent is at an all-time high. What does that mean for you? It just translates to better opportunities if you want to get employed in any of the big data-related fields. In the era of big data, companies are turning more and more towards using big data to operate their operations. It means better prospects for employment in any big data-related organization. There is a huge demand for talent in the big data era, with more and more companies utilizing big data to run their operations.…
Read More
Analyze social bookmarking sites to find insights Part 1

Analyze social bookmarking sites to find insights Part 1

In this article, we will Analyze social bookmarking sites to find insights using Big Data Technology, Data comprises of the information gathered from sites that are bookmarking sites and allow you to bookmark, review, rate, on a specific topic. A bookmarking site allows you to bookmark, review, rate, search various links on any topic. The data is in XML format and contains various categories defining it and the ratings linked with it. Problem Statement: Analyse the data in Hadoop Eco-system to: Fetch the data into Hadoop Distributed File System and analyze it with the help of MapReduce, Pig, and Hive…
Read More
Sensex Log Data Processing (PDF File Processing in Map Reduce) Part 1

Sensex Log Data Processing (PDF File Processing in Map Reduce) Part 1

In this article, We will see how to process Sensex Log (Share Market) which is in PDF format using Big Data Technology, We will see step by step process execution of the project. Problem Statement: Analyse the data in Hadoop Eco-system to: Take the complete PDF Input data on HDFSDevelop a Map-Reduce Use Case to get the below-filtered results from the HDFS Input data(Excel data)   If TYPE OF TRADING is -->'SIP'        - OPEN_BALANCE > 25000 & FLTUATION_RATE > 10  --> store "HighDemandMarket"        -CLOSING_BALANCE<22000 & FLTUATION_RATE IN BETWEEN 20 - 30  --> store "OnGoingMarketStretegy"   If TYPE OF…
Read More
Sensex Log Data Processing (PDF File Processing in Map Reduce) Part 2

Sensex Log Data Processing (PDF File Processing in Map Reduce) Part 2

Apache Pig Script​ SENSEX.pig A = LOAD '/hdfs/bhavesh/SENSEX/OUTPUT/HighDemandMarket-r-00000' using PigStorage('\t') as (Sid:int,Sname:chararray,Ttrading:chararray,Sloc:chararray,OBal:int,CBal:int,Frate:int); disHM = DISTINCT A; orHM = ORDER disHM by Sid; STORE orHM into '/hdfs/bhavesh/SENSEX/HM' using PigStorage(','); A = LOAD '/hdfs/bhavesh/SENSEX/OUTPUT/ReliableProducts-r-00000' using PigStorage('\t') as (Sid:int,Sname:chararray,Ttrading:chararray,Sloc:chararray,OBal:int,CBal:int,Frate:int); disRP = DISTINCT A; orRP = ORDER disRP by Sid; STORE orRP into '/hdfs/bhavesh/SENSEX/RP' using PigStorage(','); A = LOAD '/hdfs/bhavesh/SENSEX/OUTPUT/OtherProducts-r-00000' using PigStorage('\t') as (Sid:int,Sname:chararray,Ttrading:chararray,Sloc:chararray,OBal:int,CBal:int,Frate:int); disOP = DISTINCT A; orOP = ORDER disOP by Sid; STORE orOP into '/hdfs/bhavesh/SENSEX/OP' using PigStorage(','); A = LOAD '/hdfs/bhavesh/SENSEX/OUTPUT/WealthyProducts-r-00000' using PigStorage('\t') as (Sid:int,Sname:chararray,Ttrading:chararray,Sloc:chararray,OBal:int,CBal:int,Frate:int); disWP= DISTINCT A; orWP = ORDER disWP by Sid; STORE orWP into '/hdfs/bhavesh/SENSEX/WP' using PigStorage(','); A…
Read More