Analytics on India census using Apache Spark Part 1

In this article, I explore census data for India to understand the country's demographics: population growth, religion distribution, gender distribution, sex ratio, and more. Even with a small dataset, I could gain a lot of valuable insight about the country. I used Spark SQL and the built-in graphs provided by Databricks.

India is the second-most populous country in the world, with over 1.271 billion people, about 17.5% of the world’s population (more than a sixth). It is projected to become the world’s most populous country by 2025, surpassing China, with its population reaching 1.6 billion by 2050. Its population growth rate is 1.2%.

Attribute Information or Dataset Details:

Table Created in Databricks Environment

Technology Used

  1. Apache Spark
  2. Spark SQL 
  3. DataFrame-based API
  4. Databricks Notebook

Free Account creation in Databricks

Creating a Spark Cluster

Basics about Databricks notebook

Code for Spark SQL to get India's States with Number of Districts
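
A minimal sketch of such a query in PySpark, under stated assumptions: the table name `india_census` and the column names `State` and `District` are hypothetical placeholders, so substitute the names from your own table. The first function builds the Spark SQL text, the second runs it, and the third shows one way to express the same count with the DataFrame-based API.

```python
# Assumed names: replace them with the table and columns in your workspace.
CENSUS_TABLE = "india_census"   # hypothetical table name
STATE_COL = "State"             # hypothetical column holding the state name
DISTRICT_COL = "District"       # hypothetical column holding the district name


def districts_per_state_sql(table=CENSUS_TABLE,
                            state_col=STATE_COL,
                            district_col=DISTRICT_COL):
    """Build a Spark SQL query that counts the distinct districts in each state."""
    return (
        f"SELECT {state_col}, COUNT(DISTINCT {district_col}) AS num_districts "
        f"FROM {table} "
        f"GROUP BY {state_col} "
        f"ORDER BY num_districts DESC"
    )


def districts_per_state(spark, table=CENSUS_TABLE):
    """Run the query on an existing SparkSession and return the result DataFrame."""
    return spark.sql(districts_per_state_sql(table))


def districts_per_state_df(spark, table=CENSUS_TABLE,
                           state_col=STATE_COL, district_col=DISTRICT_COL):
    """Same count with the DataFrame API: de-duplicate pairs, then count per state."""
    return (
        spark.table(table)
        .select(state_col, district_col)
        .distinct()                      # one row per (state, district) pair
        .groupBy(state_col)
        .count()                         # adds a column named "count"
        .orderBy("count", ascending=False)
    )
```

In a Databricks notebook the `spark` session is already defined, so a cell can call `display(districts_per_state(spark))` and then switch the result view from a table to a chart.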

Plot Option for Chart

By Bhavesh