Machine Learning Project on Mushroom Classification: Edible or Poisonous? Part 1

A mushroom, or toadstool, is the fleshy, spore-bearing fruiting body of a fungus, typically produced above ground on soil or on its food source.

Problem Statement or Business Problem

In this project, we use the observable properties of a mushroom to predict whether it is edible or poisonous.

Attribute Information or Dataset Details:

For clarity, the properties are listed one by one, each with its single-letter codes.

  • classes: edible=e, poisonous=p
  • cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
  • cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
  • cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
  • bruises: bruises=t, no=f
  • odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
  • gill-attachment: attached=a, descending=d, free=f, notched=n
  • gill-spacing: close=c, crowded=w, distant=d
  • gill-size: broad=b, narrow=n
  • gill-color: black=k, brown=n, buff=b, chocolate=h, gray=g, green=r, orange=o, pink=p, purple=u, red=e, white=w, yellow=y
  • stalk-shape: enlarging=e, tapering=t
  • stalk-root: bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r, missing=?
  • stalk-surface-above-ring: fibrous=f, scaly=y, silky=k, smooth=s
  • stalk-surface-below-ring: fibrous=f, scaly=y, silky=k, smooth=s
  • stalk-color-above-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
  • stalk-color-below-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
  • veil-type: partial=p, universal=u
  • veil-color: brown=n, orange=o, white=w, yellow=y
  • ring-number: none=n, one=o, two=t
  • ring-type: cobwebby=c, evanescent=e, flaring=f, large=l, none=n, pendant=p, sheathing=s, zone=z
  • spore-print-color: black=k, brown=n, buff=b, chocolate=h, green=r, orange=o, purple=u, white=w, yellow=y
  • population: abundant=a, clustered=c, numerous=n, scattered=s, several=v, solitary=y
  • habitat: grasses=g, leaves=l, meadows=m, paths=p, urban=u, waste=w, woods=d
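Because every value in the dataset is a single-letter code, a small lookup table can make query results easier to read. The following is a minimal sketch in plain Scala that covers only the classes and odor attributes from the list above. The names `classLabels`, `odorLabels`, and `decode` are hypothetical helpers introduced for illustration; they are not part of the dataset or of Spark.

```scala
// Code-to-label tables copied from the attribute list above.
val classLabels: Map[String, String] = Map("e" -> "edible", "p" -> "poisonous")

val odorLabels: Map[String, String] = Map(
  "a" -> "almond", "l" -> "anise", "c" -> "creosote", "y" -> "fishy",
  "f" -> "foul", "m" -> "musty", "n" -> "none", "p" -> "pungent", "s" -> "spicy"
)

// Hypothetical helper: translate a code, falling back to the raw code if it is unknown.
def decode(labels: Map[String, String], code: String): String =
  labels.getOrElse(code, code)
```

For example, `decode(odorLabels, "l")` returns `"anise"`, while an unlisted code is returned unchanged.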

The following image shows the mushroom parts mentioned above. (Image credit: Infovisual)

  • Cap: The cap is the top of the mushroom (and often looks sort of like a small umbrella). Mushroom caps can come in a variety of colors but most often are brown, white, or yellow.
  • Gills, Pores, or Teeth: These structures appear under the mushroom’s cap. They look similar to a fish’s gills.
  • Ring: The ring (sometimes called the annulus) is the remaining structure of the partial veil after the gills have pushed through.
  • Stem or Stipe: The stem is the tall structure that holds the cap high above the ground.
  • Volva: The volva is the protective veil that remains after the mushroom has sprouted up from the ground. As the fungus grows, it breaks through the volva.
  • Spores: Microscopic reproductive units, comparable to seeds; they are usually released into the air and fall on a substrate to produce a new mushroom.

Technology Used

  1. Apache Spark
  2. Spark SQL
  3. Apache Spark MLlib
  4. Scala
  5. DataFrame-based API
  6. Databricks Notebook

Introduction

Welcome to this project on predicting whether a mushroom is edible or poisonous with Apache Spark machine learning. We use the Databricks Community Edition, which lets you run your Spark code on Databricks servers free of charge; you only need to register with an email address.

In this project, we explore Apache Spark and Machine Learning on the Databricks platform.

I am a firm believer that the best way to learn is by doing. That’s why I haven’t included any purely theoretical lectures in this tutorial: you will learn everything on the way and be able to put it into practice straight away. Seeing the way each feature works will help you learn Apache Spark machine learning thoroughly by heart.

We’re going to look at how to set up a Spark cluster and get started with it. We’ll then look at how to take data coming into that cluster, process it using a machine learning model, and generate output in the form of a prediction. That is essentially what we will learn about predictive models.

In this project, we will predict whether mushrooms are edible or poisonous.

We will learn:

  • Preparing the data for processing.
  • The basic flow of data in Apache Spark: loading data and working with it. Along the way, you will see why Apache Spark is well suited to machine learning jobs.
  • The basics of Databricks notebooks, using the free Community Edition server.
  • Defining the machine learning pipeline.
  • Training a machine learning model.
  • Testing a machine learning model.
  • Evaluating a machine learning model (i.e., examining the predicted and actual values).

The goal is to provide you with practical tools that will be beneficial for you in the future. While doing that, you’ll develop a model with real-world use.

I am really excited you are here, and I hope you will follow all the way to the end of the project. The article is straightforward and easy to follow: we will show you each line of code step by step and explain what it does and why we are doing it.

Free Account creation in Databricks

Creating a Spark Cluster

Basics about Databricks notebook

Loading Data into Databricks Environment

Download Data

Load Data into a DataFrame Using a User-Defined Schema

Scala
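As a rough sketch of this step, the schema can be built from the attribute list and passed to the CSV reader. Everything named here is an assumption made for illustration: the column names (the attribute names with hyphens replaced by underscores so they work in Spark SQL without backticks), the helper `loadMushrooms`, and the presence of a header row in the file. It is not necessarily the schema used in the original notebook.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumed column names: the 23 attributes listed earlier, hyphens replaced by underscores.
val mushroomColumns: Seq[String] = Seq(
  "classes", "cap_shape", "cap_surface", "cap_color", "bruises", "odor",
  "gill_attachment", "gill_spacing", "gill_size", "gill_color",
  "stalk_shape", "stalk_root",
  "stalk_surface_above_ring", "stalk_surface_below_ring",
  "stalk_color_above_ring", "stalk_color_below_ring",
  "veil_type", "veil_color", "ring_number", "ring_type",
  "spore_print_color", "population", "habitat"
)

// Every attribute is a single-letter code, so each column is a nullable string.
val mushroomSchema: StructType =
  StructType(mushroomColumns.map(name => StructField(name, StringType, nullable = true)))

// Hypothetical helper: read a CSV file that has a header row, applying the explicit schema
// instead of letting Spark infer one.
def loadMushrooms(spark: SparkSession, path: String): DataFrame =
  spark.read
    .option("header", "true")
    .schema(mushroomSchema)
    .csv(path)
```

In a Databricks notebook, where `spark` is already defined, this could be called as `val mushrooms = loadMushrooms(spark, path)`, with `path` pointing at wherever the uploaded file was stored.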

Print the Schema of the DataFrame

Scala
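A minimal sketch of this step, assuming the DataFrame from the loading step is passed in: `printSchema()` prints one line per column with its type and nullability. The helper name `allColumnsAreStrings` is hypothetical; it adds a quick check that the user-defined schema was applied as expected.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StringType

// Hypothetical helper: print the schema tree, then confirm every column is a string,
// which is what the single-letter codes in this dataset call for.
def allColumnsAreStrings(df: DataFrame): Boolean = {
  df.printSchema()
  df.schema.fields.forall(_.dataType == StringType)
}
```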

Statistics of Data

Scala
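One possible way to get summary statistics is sketched below. `describe()` is the standard Spark call; for string columns its mean and standard deviation rows are not meaningful, so the count, min, and max rows are the useful ones. Since all attributes here are categorical, the sketch also counts the distinct codes per column. The helper names `basicStats` and `distinctCounts` are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, countDistinct}

// Summary rows (count, mean, stddev, min, max) for every column.
def basicStats(df: DataFrame): DataFrame = df.describe()

// A single-row DataFrame holding the number of distinct codes observed in each column.
def distinctCounts(df: DataFrame): DataFrame =
  df.select(df.columns.map(c => countDistinct(col(c)).alias(c)): _*)
```

Calling `.show()` on either result prints it; a column with a single distinct value carries no information for a classifier.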

Create Temporary View so we can perform Spark SQL on Data

Scala
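A minimal sketch of this step: registering the DataFrame as a temporary view makes it queryable by name from Spark SQL for the lifetime of the Spark session. The view name `mushrooms` and the helper `registerView` are assumptions made for illustration.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: expose the DataFrame to Spark SQL under the given name.
// createOrReplaceTempView overwrites any existing temporary view with the same name.
def registerView(df: DataFrame, viewName: String = "mushrooms"): Unit =
  df.createOrReplaceTempView(viewName)
```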

Spark SQL

Scala
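Once the temporary view exists, `spark.sql` runs a query against it and returns a DataFrame. The sketch below counts mushrooms per class. It assumes a view named `mushrooms` with a column named `classes`; both names and the helper `classCounts` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: number of rows per class (e = edible, p = poisonous).
def classCounts(spark: SparkSession, viewName: String = "mushrooms"): DataFrame =
  spark.sql(
    s"""SELECT classes, COUNT(*) AS total
       |FROM $viewName
       |GROUP BY classes
       |ORDER BY classes""".stripMargin)
```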

Exploratory Data Analysis or EDA

Bruises Counts with Mushroom Types

SQL
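A query along the following lines could produce these counts. It is a sketch that assumes a temporary view named `mushrooms` with columns `classes` and `bruises`; it is wrapped in `spark.sql` so it can run from a Scala cell, and the helper name `bruisesByClass` is hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: how many edible and poisonous mushrooms bruise (t) or do not (f).
def bruisesByClass(spark: SparkSession, viewName: String = "mushrooms"): DataFrame =
  spark.sql(
    s"""SELECT classes, bruises, COUNT(*) AS total
       |FROM $viewName
       |GROUP BY classes, bruises
       |ORDER BY classes, bruises""".stripMargin)
```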

Mushroom Cap Color Quantity

SQL
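Counting rows per value of a single attribute can be sketched as a small parameterized query. The helper `countsFor`, the view name `mushrooms`, and the column name `cap_color` in the usage note are assumptions; the column name is interpolated directly into the SQL text, so it should only ever come from trusted code, not from user input.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: number of rows for each value of one column, most frequent first.
def countsFor(spark: SparkSession, column: String, viewName: String = "mushrooms"): DataFrame =
  spark.sql(
    s"""SELECT $column, COUNT(*) AS total
       |FROM $viewName
       |GROUP BY $column
       |ORDER BY total DESC""".stripMargin)
```

With the assumed names, `countsFor(spark, "cap_color")` gives the quantity per cap color, and the same helper works for other attributes such as odor.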

Edible and Poisonous Mushrooms Based on Cap Color

SQL
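One way to compare edible and poisonous counts side by side is a cross-tabulation. The sketch below uses the DataFrame API's `pivot` rather than SQL, which is a substitution made here for compactness and not necessarily what the original cell did. The helper `classCrossTab` and the default column name `classes` are assumptions.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: one row per value of `column`, with separate count columns
// "e" (edible) and "p" (poisonous). Combinations that never occur are filled with 0.
def classCrossTab(df: DataFrame, column: String, classColumn: String = "classes"): DataFrame =
  df.groupBy(column)
    .pivot(classColumn, Seq("e", "p"))
    .count()
    .na.fill(0L)
    .orderBy(column)
```

Listing the pivot values explicitly spares Spark an extra pass over the data to discover them and fixes the order of the output columns.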

Mushroom Odor and Quantity

SQL

Edible and Poisonous Mushrooms Based on Odor

SQL

Mushroom Population Type Percentage

SQL
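A percentage breakdown can be sketched by dividing each group's count by the total row count. This version uses the DataFrame API rather than SQL, a substitution made here for illustration; the helper name `percentageFor` and the column name `population` in the usage note are assumptions.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, round}

// Hypothetical helper: share of all rows held by each value of `column`, in percent,
// rounded to two decimals and sorted from the largest share to the smallest.
def percentageFor(df: DataFrame, column: String): DataFrame = {
  val total = df.count() // total number of rows, computed once up front
  df.groupBy(column)
    .count()
    .withColumn("percentage", round(col("count") * 100.0 / total, 2))
    .orderBy(col("percentage").desc)
}
```

With the assumed column names, `percentageFor(mushrooms, "population")` gives the population-type shares, and the same call with `"habitat"` gives the habitat shares.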

Edible & Poisonous Mushroom Population Type Percentage

SQL
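To break the percentages down by class, one option is a window partitioned by class, so that the shares sum to 100 within the edible group and within the poisonous group. That interpretation of the heading is an assumption, as are the helper name `percentageWithinClass` and the default column name `classes`. The sketch uses the DataFrame API rather than SQL.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, round, sum}

// Hypothetical helper: for each class, the share (in percent) of each value of `column`.
def percentageWithinClass(
    df: DataFrame,
    column: String,
    classColumn: String = "classes"): DataFrame = {
  val perClass = Window.partitionBy(classColumn) // one partition per class
  df.groupBy(classColumn, column)
    .count()
    .withColumn("percentage", round(col("count") * 100.0 / sum("count").over(perClass), 2))
    .orderBy(classColumn, column)
}
```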

Mushroom Habitat Type Percentage

SQL

Edible & Poisonous Mushroom Habitat Type Percentage

SQL
By Bhavesh