Bidding Auction Data Analytics in Apache Spark

In this article, we explore bidding auction data analysis using a dataset of eBay-like online auctions. Even with this small dataset, I could still gain a lot of valuable insights. I used Spark RDDs in Databricks.
In this activity, we load data into Apache Spark and inspect it using Spark transformations and actions. We use the SparkContext method textFile to load the data into a Resilient Distributed Dataset (RDD).
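A minimal sketch of the loading step with Spark's Java API. It assumes a local run (master `local[*]`) and a CSV path passed as the first program argument. The class name `LoadAuctionData` and its helper methods are hypothetical. In a managed environment such as Databricks, a SparkContext is usually already available, so the `localContext` helper would not be needed there.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LoadAuctionData {

    // Builds a SparkContext that runs locally on all available cores.
    static JavaSparkContext localContext() {
        SparkConf conf = new SparkConf()
                .setAppName("AuctionAnalysis")
                .setMaster("local[*]");
        return new JavaSparkContext(conf);
    }

    // textFile is lazy: it only records where the data lives.
    // Each element of the resulting RDD is one line of the CSV file (one bid).
    static JavaRDD<String> loadBids(JavaSparkContext sc, String path) {
        return sc.textFile(path);
    }

    public static void main(String[] args) {
        // args[0] is the path to the auction CSV file.
        try (JavaSparkContext sc = localContext()) {
            JavaRDD<String> inputRDD = loadBids(sc, args[0]);
            // count() is an action, so this is where the file is actually read.
            System.out.println("Lines read: " + inputRDD.count());
        }
    }
}
```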

Our dataset is a .csv file that consists of online auction data. Each auction has an auction id associated with it and can have multiple bids. Each row represents a bid. For each bid, we have the following information:

Attribute Information or Dataset Details:

We load this data into Spark as an RDD.
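Each line still has to be split into its fields before it can be queried. The sketch below is minimal. It assumes the file has no header row and no quoted fields that contain commas, so a plain `split(",")` is enough. `ParseBids` and the sample rows are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ParseBids {

    // Turns every CSV line into an array of column values.
    // Assumes no header row and no commas inside quoted fields.
    static JavaRDD<String[]> parse(JavaRDD<String> lines) {
        return lines.map(line -> line.split(","));
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ParseBids").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Made-up rows, for illustration only.
            List<String> sample = Arrays.asList(
                    "1001,10.0,0.5,alice,5,1.0,25.0,xbox,3",
                    "1001,12.5,1.2,bob,12,1.0,25.0,xbox,3");
            // cache() keeps the parsed rows in memory, since several queries reuse them.
            JavaRDD<String[]> auctionRDD = parse(sc.parallelize(sample)).cache();
            System.out.println(Arrays.toString(auctionRDD.first()));
        }
    }
}
```

If the real file starts with a header line, it can be dropped with a `filter` before parsing.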

Objectives
• Load data into Spark
• Use transformations and actions to inspect the data
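The second objective depends on one distinction. A transformation such as `filter` only describes a new RDD. An action such as `count` runs the computation and returns a result to the driver. The following is a minimal sketch. It assumes the item type is in zero-based column 7 of a hypothetical layout (`auctionid, bid, bidtime, bidder, bidderrate, openbid, price, item, daystolive`). Adjust the index to match the real file.

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class TransformVsAction {

    // Assumed zero-based column index of the item type (hypothetical layout).
    static final int ITEM = 7;

    // Transformation: returns a new RDD description; no job runs here.
    static JavaRDD<String[]> bidsForItem(JavaRDD<String[]> bids, String item) {
        return bids.filter(fields -> fields[ITEM].equals(item));
    }

    // Action: count() launches a job and returns the number of matching bids.
    static long countBidsForItem(JavaRDD<String[]> bids, String item) {
        return bidsForItem(bids, item).count();
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("TransformVsAction").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Made-up sample rows, for illustration only.
            JavaRDD<String[]> bids = sc.parallelize(Arrays.asList(
                    "1001,10.0,0.5,alice,5,1.0,25.0,xbox,3",
                    "1002,3.0,0.1,carol,0,1.0,8.0,palm,7"))
                    .map(line -> line.split(","));
            System.out.println("xbox bids: " + countBidsForItem(bids, "xbox"));
        }
    }
}
```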

What transformations and actions would you use in each case?

  1. How do you see the first element of the inputRDD?
  2. What do you use to see the first 5 elements of the RDD?
  3. What is the total number of bids?
  4. What is the total number of distinct items that were auctioned?
  5. What is the total number of item types that were auctioned?
  6. What is the total number of bids per item type?
  7. Across all auctioned items, what is the minimum number of bids?
  8. Across all auctioned items, what is the maximum number of bids?
  9. Across all auctioned items, what is the average number of bids?
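The first five questions need only simple actions (`first`, `take`, `count`) and the `map` and `distinct` transformations. The following is a minimal Java sketch. It assumes the same hypothetical column layout, with the auction id in column 0 and the item type in column 7. It also assumes there is no header row and that each auction id corresponds to one auctioned item. Class and method names are illustrative.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class BasicAuctionQueries {

    // Assumed zero-based column positions (hypothetical layout).
    static final int AUCTION_ID = 0;
    static final int ITEM = 7;

    // Q3: each row is one bid, so the number of bids is the row count.
    static long totalBids(JavaRDD<String[]> bids) {
        return bids.count();
    }

    // Q4: distinct auction ids, assuming one item per auction.
    static long distinctItems(JavaRDD<String[]> bids) {
        return bids.map(f -> f[AUCTION_ID]).distinct().count();
    }

    // Q5: distinct item types.
    static long distinctItemTypes(JavaRDD<String[]> bids) {
        return bids.map(f -> f[ITEM]).distinct().count();
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("BasicAuctionQueries").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // args[0] is the path to the auction CSV file.
            JavaRDD<String> inputRDD = sc.textFile(args[0]);
            JavaRDD<String[]> auctionRDD = inputRDD.map(line -> line.split(",")).cache();

            // Q1: first() returns the first element.
            System.out.println(inputRDD.first());
            // Q2: take(5) returns the first five elements as a java.util.List.
            inputRDD.take(5).forEach(System.out::println);

            System.out.println("Total bids: " + totalBids(auctionRDD));
            System.out.println("Distinct items: " + distinctItems(auctionRDD));
            System.out.println("Item types: " + distinctItemTypes(auctionRDD));
        }
    }
}
```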

Input file (contains 10,654 records)

The Spark code is written in Java.
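The following is a minimal sketch of the aggregation questions (6 to 9). It is not the author's original program. It uses the usual key/count pattern: map each bid to a `(key, 1)` pair and sum the ones with `reduceByKey`. It assumes the same hypothetical column layout (auction id in column 0, item type in column 7). Here, `stats()` returns a `StatCounter` that holds the minimum, maximum and mean number of bids per auction, all from a single pass. Calling `min()`, `max()` and `mean()` separately would run three jobs.

```java
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.util.StatCounter;
import scala.Tuple2;

public class BidAggregates {

    // Assumed zero-based column positions (hypothetical layout).
    static final int AUCTION_ID = 0;
    static final int ITEM = 7;

    // Q6: (item type, number of bids).
    static JavaPairRDD<String, Integer> bidsPerItemType(JavaRDD<String[]> bids) {
        return bids.mapToPair(f -> new Tuple2<>(f[ITEM], 1))
                .reduceByKey((a, b) -> a + b);
    }

    // (auction id, number of bids), the basis for Q7 to Q9.
    static JavaPairRDD<String, Integer> bidsPerAuction(JavaRDD<String[]> bids) {
        return bids.mapToPair(f -> new Tuple2<>(f[AUCTION_ID], 1))
                .reduceByKey((a, b) -> a + b);
    }

    // Q7 to Q9: min, max and mean of the per-auction bid counts in one pass.
    static StatCounter bidCountStats(JavaRDD<String[]> bids) {
        return bidsPerAuction(bids).values().mapToDouble(Integer::doubleValue).stats();
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("BidAggregates").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // args[0] is the path to the auction CSV file.
            JavaRDD<String[]> auctionRDD = sc.textFile(args[0])
                    .map(line -> line.split(","))
                    .cache();

            // collect() brings the small per-type result back to the driver.
            List<Tuple2<String, Integer>> perType = bidsPerItemType(auctionRDD).collect();
            perType.forEach(t -> System.out.println(t._1() + ": " + t._2()));

            StatCounter stats = bidCountStats(auctionRDD);
            System.out.println("Min bids per auction: " + stats.min());
            System.out.println("Max bids per auction: " + stats.max());
            System.out.println("Mean bids per auction: " + stats.mean());
        }
    }
}
```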


Output

By Bhavesh