In this article, we will see how to process a Sensex log (share market data) in PDF format using Big Data technology, walking through the project execution step by step.

Problem Statement: Analyse the data in the Hadoop ecosystem to:

- Take the complete PDF input data on HDFS.
- Develop a MapReduce use case to get the below filtered results from the HDFS input data (Excel data):

If TYPE OF TRADING is 'SIP':
- OPEN_BALANCE > 25000 and FLTUATION_RATE > 10 --> store in "HighDemandMarket"
- CLOSING_BALANCE < 22000 and FLTUATION_RATE between 20 and 30 --> store in "OnGoingMarketStretegy"

If TYPE OF…
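The routing rules above can be sketched in plain Python. This is only an illustration of the mapper-side logic, not the actual Hadoop MapReduce job; the column names follow the problem statement, and the dictionary keys are hypothetical stand-ins for the parsed record fields:

```python
def classify(record):
    """Route a parsed record to an output bucket per the problem statement.

    record: dict keyed by the dataset's column names. Only the rules shown
    in the excerpt are implemented; rules for other trading types are
    truncated in the original text.
    """
    if record["TYPE_OF_TRADING"] == "SIP":
        if record["OPEN_BALANCE"] > 25000 and record["FLTUATION_RATE"] > 10:
            return "HighDemandMarket"
        if record["CLOSING_BALANCE"] < 22000 and 20 <= record["FLTUATION_RATE"] <= 30:
            return "OnGoingMarketStretegy"
    return None  # remaining rules are elided in the excerpt

# A SIP record with a high open balance and fluctuation rate:
print(classify({"TYPE_OF_TRADING": "SIP", "OPEN_BALANCE": 30000,
                "CLOSING_BALANCE": 26000, "FLTUATION_RATE": 15}))  # HighDemandMarket
```

In the real job, each returned bucket name would correspond to a MultipleOutputs target so the reducer writes files like HighDemandMarket-r-00000, which the Pig script below consumes.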
Apache Pig Script: SENSEX.pig

-- Deduplicate and sort each MapReduce output, then store it as CSV
A = LOAD '/hdfs/bhavesh/SENSEX/OUTPUT/HighDemandMarket-r-00000' USING PigStorage('\t')
    AS (Sid:int, Sname:chararray, Ttrading:chararray, Sloc:chararray, OBal:int, CBal:int, Frate:int);
disHM = DISTINCT A;
orHM = ORDER disHM BY Sid;
STORE orHM INTO '/hdfs/bhavesh/SENSEX/HM' USING PigStorage(',');

A = LOAD '/hdfs/bhavesh/SENSEX/OUTPUT/ReliableProducts-r-00000' USING PigStorage('\t')
    AS (Sid:int, Sname:chararray, Ttrading:chararray, Sloc:chararray, OBal:int, CBal:int, Frate:int);
disRP = DISTINCT A;
orRP = ORDER disRP BY Sid;
STORE orRP INTO '/hdfs/bhavesh/SENSEX/RP' USING PigStorage(',');

A = LOAD '/hdfs/bhavesh/SENSEX/OUTPUT/OtherProducts-r-00000' USING PigStorage('\t')
    AS (Sid:int, Sname:chararray, Ttrading:chararray, Sloc:chararray, OBal:int, CBal:int, Frate:int);
disOP = DISTINCT A;
orOP = ORDER disOP BY Sid;
STORE orOP INTO '/hdfs/bhavesh/SENSEX/OP' USING PigStorage(',');

A = LOAD '/hdfs/bhavesh/SENSEX/OUTPUT/WealthyProducts-r-00000' USING PigStorage('\t')
    AS (Sid:int, Sname:chararray, Ttrading:chararray, Sloc:chararray, OBal:int, CBal:int, Frate:int);
disWP = DISTINCT A;
orWP = ORDER disWP BY Sid;
STORE orWP INTO '/hdfs/bhavesh/SENSEX/WP' USING PigStorage(',');

A…
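Each block of the Pig script performs the same pass: DISTINCT to drop duplicate rows, ORDER BY Sid to sort, and a comma-separated STORE. The equivalent logic, sketched in plain Python over a list of tuples (an illustration only, not Pig execution; the sample rows are made up to match the schema):

```python
# Records shaped like the Pig schema: (Sid, Sname, Ttrading, Sloc, OBal, CBal, Frate)
records = [
    (2, "StockB", "SIP", "Pune", 27000, 21000, 12),
    (1, "StockA", "SIP", "Mumbai", 30000, 25000, 15),
    (2, "StockB", "SIP", "Pune", 27000, 21000, 12),  # duplicate row
]

# DISTINCT removes duplicate rows; ORDER BY Sid sorts on the first field
deduped_sorted = sorted(set(records), key=lambda r: r[0])

# Emit comma-separated lines, like PigStorage(',')
for row in deduped_sorted:
    print(",".join(map(str, row)))
```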
In this article, we explore the sentiments of people in India during demonetization. Even with small data, I could still gain a lot of valuable insights. I have used Spark SQL and the built-in graphs provided by Databricks. India is the second-most populous country in the world, with over 1.271 billion people, more than a sixth of the world's population. Let us find out the views of different people on demonetization by analyzing tweets from Twitter.

Attribute Information or Dataset Details:

Table Created in Databricks Environment

Technology Used:
- Apache Spark
- Spark SQL
- DataFrame-based API
- Databricks Notebook
- Free Account…
In this article, we explore Census data for India to understand changes in India's demographics: population growth, religion distribution, gender distribution, sex ratio, etc. Even with small data, I could still gain a lot of valuable insights about the country. I have used Spark SQL and the built-in graphs provided by Databricks. India is the second-most populous country in the world, with over 1.271 billion people, more than a sixth of the world's population. Already containing 17.5% of the world's population, India is projected to be the world's most populous country by 2025, surpassing China, its population reaching…
Code for Spark SQL to get Population Density in terms of Districts
Code for Spark SQL to get Scheduled Castes (SCs) Population per State
Code for Spark SQL to get What percentage of the states are actually literate in India?
Code for Spark SQL to get States which have a Literacy Rate less than 50%
Code for Spark SQL to get Male and Female Literacy Rate per State
Code for Spark SQL to get Literacy Rate as per Type of Education for every State
Code for Spark SQL to get Male and Female Percentage per State
Code for Spark SQL to…
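As an illustration of the kind of query involved, here is a sketch of the "literacy rate less than 50%" query. Spark SQL is largely ANSI-compatible, so the same SELECT shape runs against an in-memory SQLite table below; the table and column names (census, State, Literacy_Rate) are hypothetical stand-ins for the actual Databricks table:

```python
import sqlite3

# Hypothetical census table; in Databricks this would be a registered temp view
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census (State TEXT, Literacy_Rate REAL)")
conn.executemany("INSERT INTO census VALUES (?, ?)",
                 [("StateA", 47.5), ("StateB", 82.3), ("StateC", 49.1)])

# The same shape of query you would run via spark.sql(...) in a notebook cell
query = "SELECT State, Literacy_Rate FROM census WHERE Literacy_Rate < 50 ORDER BY Literacy_Rate"
low_literacy = conn.execute(query).fetchall()
print(low_literacy)  # [('StateA', 47.5), ('StateC', 49.1)]
```

In a Databricks notebook the result of `spark.sql(query)` can be passed straight to `display()` to use the built-in charts mentioned above.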
Code for Spark SQL to get Status of Electricity Facility by State
Code for Spark SQL to get Education Facility in India per State
Code for Spark SQL to get Medical Facility in India
Code for Spark SQL to get Bus Transportation per State
Code for Spark SQL to get Road Status in India
Code for Spark SQL to get Residence Status in India by State
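The facility-by-state queries above are per-state aggregations. A sketch of the electricity-status query, again using an in-memory SQLite table so it is runnable here (the table and column names villages, State, Has_Electricity are assumptions; in Databricks the identical GROUP BY would run through spark.sql):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE villages (State TEXT, Has_Electricity INTEGER)")
conn.executemany("INSERT INTO villages VALUES (?, ?)",
                 [("StateA", 1), ("StateA", 0), ("StateB", 1), ("StateB", 1)])

# Count electrified villages per state -- the same GROUP BY works in Spark SQL
query = """
SELECT State, SUM(Has_Electricity) AS electrified, COUNT(*) AS total
FROM villages GROUP BY State ORDER BY State
"""
result = conn.execute(query).fetchall()
print(result)  # [('StateA', 1, 2), ('StateB', 2, 2)]
```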
Click on Create a Blank Notebook as shown in the image below. Specify the file name and select the cluster we created earlier. A notebook is a collection of runnable cells (commands); when you use a notebook, you are primarily developing and running cells. The supported magic commands are: %python, %r, %scala, and %sql. Additionally:
%sh: allows you to execute shell code in your notebook.
%fs: allows you to use dbutils filesystem commands.
%md: allows you to include various types of documentation, including text, images, and mathematical formulas and equations.
For more details, please refer to the Databricks documentation.
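For example, a %sh cell simply runs ordinary shell commands on the cluster's driver node. A minimal sketch of such a cell (the commands are illustrative only):

```shell
# Contents of a %sh notebook cell (the %sh line is the first line of the cell)
# Inspect the working directory and available disk space on the driver
pwd
df -h | head -n 3
```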
What is the Databricks Community Edition? The Databricks Community Edition is the free version of Databricks' cloud-based big data platform. It allows users to access a micro-cluster as well as a cluster manager and notebook environment. All users can share their notebooks and host them free of charge with Databricks. Link for Databricks Community Edition: https://community.cloud.databricks.com/login.html Open the above link in any up-to-date browser; we recommend Google Chrome for the best experience. Click on Sign up as shown in the image. A new page opens as shown in the image below. Fill in all the required details as applicable…
Once you log in to Databricks Community Edition, the left tab has a Clusters button as shown in the image; click on it. As soon as you click the Clusters button, a new web page opens as shown in the image below. As soon as you click Create Cluster, another web page opens as shown in the image below. The steps to launch a Spark cluster are as follows: specify the cluster name [you can specify any cluster name; for all our projects we will use SparkCluster], then click on Create Cluster. Please make a note: Free 15GB Memory:…
Loading Data into Databricks: Click on Import and Explore Data. A popup opens; select the file you want to upload into Databricks. Once you click on Drop files, a new popup opens, then a new web page opens and the file is uploaded into the Databricks environment. Make sure you see the tick mark, which indicates the file was uploaded successfully; copy the file location and refer to this file in your notebook.
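Uploaded files land under /FileStore/tables/, and the copied location is what you pass to the reader in a notebook cell. As a minimal illustration of parsing such a comma-separated file, here is a plain-Python sketch over an in-memory sample (the file name and column names are hypothetical):

```python
import csv
import io

# In a Databricks notebook you would read the uploaded file with, e.g.:
#   df = spark.read.csv("/FileStore/tables/<your_file>.csv", header=True)
# Here we parse an equivalent in-memory CSV sample with the stdlib csv module.
sample = io.StringIO("State,Population\nStateA,1000\nStateB,2500\n")
reader = csv.DictReader(sample)
rows = list(reader)
print(rows[0]["State"], rows[0]["Population"])  # StateA 1000
```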