Are there any useful variables that you can engineer with the given data?
Review a list of the feature names below, from which we can engineer:
- The total number of dependents in the home (‘Dependents’) can be engineered from the sum of ‘Kidhome’ and ‘Teenhome’
- The year of becoming a customer (‘Year_Customer’) can be engineered from ‘Dt_Customer’
- The total amount spent (‘TotalMnt’) can be engineered from the sum of all features containing the keyword ‘Mnt’
- The total purchases (‘TotalPurchases’) can be engineered from the sum of all features containing the keyword ‘Purchases’
- The total number of campaigns accepted (‘TotalCampaignsAcc’) can be engineered from the sum of all features containing the keywords ‘Cmp’ and ‘Response’ (the latest campaign)
Deriving Some useful Data
%scala
import spark.implicits._
import org.apache.spark.sql.functions._
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
val derivedDF = Finalmarketingdataframe.select($"ID", $"Income", $"Kidhome" + $"Teenhome" as "Dependents", year(to_timestamp($"Dt_Customer", "MM/dd/yy")) as "Year_Customer", $"MntWines" + $"MntFruits" + $"MntMeatProducts" + $"MntFishProducts" + $"MntSweetProducts" + $"MntGoldProds" as "TotalMnt", $"NumDealsPurchases" + $"NumWebPurchases" + $"NumCatalogPurchases" + $"NumStorePurchases" as "TotalPurchases", $"AcceptedCmp1" + $"AcceptedCmp2" + $"AcceptedCmp3" + $"AcceptedCmp4" + $"AcceptedCmp5" as "TotalCampaignsAcc", $"Country")
import spark.implicits._
import org.apache.spark.sql.functions._
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
val derivedDF = Finalmarketingdataframe.select($"ID", $"Income", $"Kidhome" + $"Teenhome" as "Dependents", year(to_timestamp($"Dt_Customer", "MM/dd/yy")) as "Year_Customer", $"MntWines" + $"MntFruits" + $"MntMeatProducts" + $"MntFishProducts" + $"MntSweetProducts" + $"MntGoldProds" as "TotalMnt", $"NumDealsPurchases" + $"NumWebPurchases" + $"NumCatalogPurchases" + $"NumStorePurchases" as "TotalPurchases", $"AcceptedCmp1" + $"AcceptedCmp2" + $"AcceptedCmp3" + $"AcceptedCmp4" + $"AcceptedCmp5" as "TotalCampaignsAcc", $"Country")
Display Derived Data
![](https://projectsbasedlearning.com/wp-content/uploads/2021/11/Derived-Data-1024x576.jpg)
Creating Temp View So we can perform Spark SQL
%scala
derivedDF.createOrReplaceTempView("DerivedData");
derivedDF.createOrReplaceTempView("DerivedData");
Scatter Plot TotalMnt VS Income
![](https://projectsbasedlearning.com/wp-content/uploads/2021/11/TotalMnt-VS-Income-1024x576.jpg)
NumDealsPurchases VS Dependents
![](https://projectsbasedlearning.com/wp-content/uploads/2021/11/NumDealsPurchases-VS-Dependents-1024x576.jpg)
TotalCampaignsAcc VS Income
![](https://projectsbasedlearning.com/wp-content/uploads/2021/11/TotalCampaignsAcc-VS-Income-1024x576.jpg)
Dependents VS TotalCampaignsAcc
![](https://projectsbasedlearning.com/wp-content/uploads/2021/11/Dependents-VS-TotalCampaignsAcc-1024x576.jpg)
Scatter plot NumWebPurchases VS NumWebVisitsMonth
![](https://projectsbasedlearning.com/wp-content/uploads/2021/11/Scatter-plot-NumWebPurchases-VS-NumWebVisitsMonth-1024x576.jpg)
Scatter Plot NumDealsPurchases VS NumWebVisitsMonth
![](https://projectsbasedlearning.com/wp-content/uploads/2021/11/Scatter-Plot-NumDealsPurchases-VS-NumWebVisitsMonth-1024x576.jpg)
Section 02: Statistical Analysis
![](https://projectsbasedlearning.com/wp-content/uploads/2021/11/Statistical-Analysis-1024x576.jpg)
Total Number of Purchases by Country
![](https://projectsbasedlearning.com/wp-content/uploads/2021/11/Total-Number-of-Purchases-by-Country-1024x576.jpg)
Total Amount Spent by Country
![](https://projectsbasedlearning.com/wp-content/uploads/2021/11/Total-Amount-Spent-by-Country-1024x576.jpg)