Marketing Analytics Part 1

Marketing analytics consists of both qualitative and quantitative, structured and unstructured data used to drive strategic decisions in relation to brand and revenue outcomes.

Overall goal

You’re a marketing analyst and you’ve been told by the Chief Marketing Officer that recent marketing campaigns have not been as effective as they were expected to be. You need to analyze the data set to understand this problem and propose data-driven solutions.

Section 01: Exploratory Data Analysis

Are there any null values or outliers? How will you wrangle/handle them?
Are there any variables that warrant transformations?
Are there any useful variables that you can engineer with the given data?
Do you notice any patterns or anomalies in the data? Can you plot them?

Section 02: Statistical Analysis

Please run statistical tests to answer these questions & propose data-driven action recommendations to your CMO. Make sure to interpret your results with non-statistical jargon so your CMO can understand your findings.

What factors are significantly related to the number of store purchases?
Does US fare significantly better than the Rest of the World in terms of total purchases?
Your supervisor insists that people who buy gold are more conservative. Therefore, people who spent an above average amount on gold in the last 2 years would have more in store purchases. Justify or refute this statement using an appropriate statistical test
Is there a significant relationship between geographical regional and success of a campaign?

Section 03: Data Visualization

Please plot and visualize the answers to the below questions.

Which marketing campaign is most successful?
What does the average customer look like for this company?
Which products are performing best?
Which channels are underperforming?

Load Source Data

The is a CSV file of 2240 observations (customers) with 28 variables related to marketing data. More specifically, the variables provide insights about:

Customer profiles
Products purchased
Campaign success (or failure)
Channel performance

Data Level Details

ID – Customer’s unique identifier
Year_Birth – Customer’s birth year
Education – Customer’s education level
Marital_Status – Customer’s marital status
Income – Customer’s yearly household income
Kidhome – Number of children in customer’s household
Teenhome – Number of teenagers in customer’s household
Dt_Customer – Date of customer’s enrollment with the company
Recency – Number of days since customer’s last purchase
MntWines – Amount spent on wine in the last 2 years
MntFruits – Amount spent on fruits in the last 2 years
MntMeatProducts – Amount spent on meat in the last 2 years
MntFishProducts – Amount spent on fish in the last 2 years
MntSweetProducts – Amount spent on sweets in the last 2 years
MntGoldProds – Amount spent on gold in the last 2 years
NumDealsPurchases – Number of purchases made with a discount
NumWebPurchases – Number of purchases made through the company’s web site
NumCatalogPurchases – Number of purchases made using a catalogue
NumStorePurchases – Number of purchases made directly in stores
NumWebVisitsMonth – Number of visits to company’s web site in the last month
AcceptedCmp3 – 1 if customer accepted the offer in the 3rd campaign, 0 otherwise
AcceptedCmp4 – 1 if customer accepted the offer in the 4th campaign, 0 otherwise
AcceptedCmp5 – 1 if customer accepted the offer in the 5th campaign, 0 otherwise
AcceptedCmp1 – 1 if customer accepted the offer in the 1st campaign, 0 otherwise
AcceptedCmp2 – 1 if customer accepted the offer in the 2nd campaign, 0 otherwise
Response – 1 if customer accepted the offer in the last campaign, 0 otherwise
Complain – 1 if customer complained in the last 2 years, 0 otherwise
Country – Customer’s location

Technology Used

Apache Spark
Spark SQL
Apache Spark MLLib
Scala
DataFrame-based API
Databricks Notebook

Introduction

Welcome to this project on creating Marketing Analytics Report using Apache Spark on Databricks platform community edition server which allows you to execute your spark code, free of cost on their server just by registering through email id.

In this project, we explore Apache Spark, Spark SQL and Graphs (Data Visualization) on the Databricks platform.

I am a firm believer that the best way to learn is by doing. That’s why I haven’t included any purely theoretical lectures in this tutorial: you will learn everything on the way and be able to put it into practice straight away. Seeing the way each feature works will help you learn Apache Spark thoroughly by heart.

We’re going to look at how to set up a Spark Cluster and get started with that. And we’ll look at how we can then use that Spark Cluster to take data coming into that Spark Cluster, a process that data using Apache Spark, and generate some sort of output in the form of a Graphs and Data Visualization. That’s pretty much what we’re going to learn about the Marketing Analytics.

In this project, we are a marketing analyst and you’ve been told by the Chief Marketing Officer that recent marketing campaigns have not been as effective as they were expected to be. You need to analyze the data set to understand this problem and propose data-driven solutions.

We will learn:

Preparing the Data for Processing.
Basics flow of data in Apache Spark, loading data, and working with data, this tutorial shows you how Apache Spark is perfect for Data Analytics job.
Learn the basics of Databricks notebook by enrolling in Free Community Edition Server
Exploratory Data Analysis
Statistical Analysis
Data Visualization

The goal is to provide you with practical tools that will be beneficial for you in the future. While doing that, you’ll develop a Analytics Report with a real use opportunity.
I am really excited you are here, I hope you are going to follow all the way to the end of the Project. It is fairly straight forward fairly easy to follow through the article we will show you step by step each line of code & we will explain what it does and why we are doing it.

Free Account creation in Databricks

Creating a Spark Cluster

Basics about Databricks notebook

Download Data

Loading Data into DataFrame

%scala

// File location and type
val file_location = "/FileStore/tables/marketing_data.csv"
val file_type = "csv"

// CSV options
val infer_schema = "true"
val first_row_is_header = "true"
val delimiter = ","

// The applied options are for CSV files. For other file types, these will be ignored.
val marketingDF = spark.read.format(file_type)
.option("inferSchema", infer_schema)
.option("header", first_row_is_header)
.option("sep", delimiter)
.load(file_location)

display(marketingDF)

Count of Data (Total Records)

%scala

marketingDF.count()

res2: Long = 2240

Displaying Statistics of Data

%scala

display(marketingDF.describe())

Print Schema of Data

%scala

marketingDF.printSchema()

root
|-- ID: integer (nullable = true)
|-- Year_Birth: integer (nullable = true)
|-- Education: string (nullable = true)
|-- Marital_Status: string (nullable = true)
|-- Income : string (nullable = true)
|-- Kidhome: integer (nullable = true)
|-- Teenhome: integer (nullable = true)
|-- Dt_Customer: string (nullable = true)
|-- Recency: integer (nullable = true)
|-- MntWines: integer (nullable = true)
|-- MntFruits: integer (nullable = true)
|-- MntMeatProducts: integer (nullable = true)
|-- MntFishProducts: integer (nullable = true)
|-- MntSweetProducts: integer (nullable = true)
|-- MntGoldProds: integer (nullable = true)
|-- NumDealsPurchases: integer (nullable = true)
|-- NumWebPurchases: integer (nullable = true)
|-- NumCatalogPurchases: integer (nullable = true)
|-- NumStorePurchases: integer (nullable = true)
|-- NumWebVisitsMonth: integer (nullable = true)
|-- AcceptedCmp3: integer (nullable = true)
|-- AcceptedCmp4: integer (nullable = true)
|-- AcceptedCmp5: integer (nullable = true)
|-- AcceptedCmp1: integer (nullable = true)
|-- AcceptedCmp2: integer (nullable = true)
|-- Response: integer (nullable = true)
|-- Complain: integer (nullable = true)
|-- Country: string (nullable = true)

Remove Spaces from Income Column Name (i.e. Rename a Column)

%scala

val marketingdataDF = marketingDF.withColumnRenamed(" Income ", "Income")

Print Schema

%scala

marketingdataDF.printSchema();

Output:

root
|-- ID: integer (nullable = true)
|-- Year_Birth: integer (nullable = true)
|-- Education: string (nullable = true)
|-- Marital_Status: string (nullable = true)
|-- Income: string (nullable = true)
|-- Kidhome: integer (nullable = true)
|-- Teenhome: integer (nullable = true)
|-- Dt_Customer: string (nullable = true)
|-- Recency: integer (nullable = true)
|-- MntWines: integer (nullable = true)
|-- MntFruits: integer (nullable = true)
|-- MntMeatProducts: integer (nullable = true)
|-- MntFishProducts: integer (nullable = true)
|-- MntSweetProducts: integer (nullable = true)
|-- MntGoldProds: integer (nullable = true)
|-- NumDealsPurchases: integer (nullable = true)
|-- NumWebPurchases: integer (nullable = true)
|-- NumCatalogPurchases: integer (nullable = true)
|-- NumStorePurchases: integer (nullable = true)
|-- NumWebVisitsMonth: integer (nullable = true)
|-- AcceptedCmp3: integer (nullable = true)
|-- AcceptedCmp4: integer (nullable = true)
|-- AcceptedCmp5: integer (nullable = true)
|-- AcceptedCmp1: integer (nullable = true)
|-- AcceptedCmp2: integer (nullable = true)
|-- Response: integer (nullable = true)
|-- Complain: integer (nullable = true)
|-- Country: string (nullable = true)

Remove $ and , signs from Income Column Values

%scala

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.functions.regexp_replace

val FinalmarketingdataDF = marketingdataDF.withColumn("Income", regexp_replace(col("Income"),"[$,]",""))

display(FinalmarketingdataDF)

Converting Income column (String Type) to (Double Type)

%scala

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

val Finalmarketingdataframe = FinalmarketingdataDF.withColumn("Income",col("Income").cast(DoubleType))

Print Schema

%scala

Finalmarketingdataframe.printSchema()

Output:

root
|-- ID: integer (nullable = true)
|-- Year_Birth: integer (nullable = true)
|-- Education: string (nullable = true)
|-- Marital_Status: string (nullable = true)
|-- Income: double (nullable = true)
|-- Kidhome: integer (nullable = true)
|-- Teenhome: integer (nullable = true)
|-- Dt_Customer: string (nullable = true)
|-- Recency: integer (nullable = true)
|-- MntWines: integer (nullable = true)
|-- MntFruits: integer (nullable = true)
|-- MntMeatProducts: integer (nullable = true)
|-- MntFishProducts: integer (nullable = true)
|-- MntSweetProducts: integer (nullable = true)
|-- MntGoldProds: integer (nullable = true)
|-- NumDealsPurchases: integer (nullable = true)
|-- NumWebPurchases: integer (nullable = true)
|-- NumCatalogPurchases: integer (nullable = true)
|-- NumStorePurchases: integer (nullable = true)
|-- NumWebVisitsMonth: integer (nullable = true)
|-- AcceptedCmp3: integer (nullable = true)
|-- AcceptedCmp4: integer (nullable = true)
|-- AcceptedCmp5: integer (nullable = true)
|-- AcceptedCmp1: integer (nullable = true)
|-- AcceptedCmp2: integer (nullable = true)
|-- Response: integer (nullable = true)
|-- Complain: integer (nullable = true)
|-- Country: string (nullable = true)

Marketing analytics consists of both qualitative and quantitative, structured and unstructured data used to drive strategic decisions in relation to brand and revenue outcomes.

Overall goal

Section 01: Exploratory Data Analysis

Section 02: Statistical Analysis

Section 03: Data Visualization

Load Source Data

Data Level Details

Technology Used

Introduction

Free Account creation in Databricks

Creating a Spark Cluster

Basics about Databricks notebook

Basics about Databricks notebook

Download Data

Loading Data into DataFrame

Count of Data (Total Records)

Displaying Statistics of Data

Print Schema of Data

Remove Spaces from Income Column Name (i.e. Rename a Column)

Print Schema

Remove $ and , signs from Income Column Values

Converting Income column (String Type) to (Double Type)

Print Schema

Display Data

Creating Temp View so we can run Spark SQL Queries on data

Exploratory Data Analysis

Income Distribution

Box Plot for Income

Box Plot for Year Birth

Box Plot for Kid Home

Box Plot for Teen Home

Displaying Data

Marketing analytics consists of both qualitative and quantitative, structured and unstructured data used to drive strategic decisions in relation to brand and revenue outcomes.

Overall goal

Section 01: Exploratory Data Analysis

Section 02: Statistical Analysis

Section 03: Data Visualization

Load Source Data

Data Level Details

Technology Used

Introduction

Free Account creation in Databricks

Creating a Spark Cluster

Basics about Databricks notebook

Basics about Databricks notebook

Download Data

Loading Data into DataFrame

Count of Data (Total Records)

Displaying Statistics of Data

Print Schema of Data

Remove Spaces from Income Column Name (i.e. Rename a Column)

Print Schema

Remove $ and , signs from Income Column Values

Converting Income column (String Type) to (Double Type)

Print Schema

Display Data

Creating Temp View so we can run Spark SQL Queries on data

Exploratory Data Analysis

Income Distribution

Box Plot for Income

Box Plot for Year Birth

Box Plot for Kid Home

Box Plot for Teen Home

Displaying Data

Related Posts