Which useful variables can be engineered from the given data?
The following features can be derived from the existing columns:
- The total number of dependents in the home (‘Dependents’) can be engineered from the sum of ‘Kidhome’ and ‘Teenhome’
- The year of becoming a customer (‘Year_Customer’) can be engineered from ‘Dt_Customer’
- The total amount spent (‘TotalMnt’) can be engineered from the sum of all features containing the keyword ‘Mnt’
- The total purchases (‘TotalPurchases’) can be engineered from the sum of all features containing the keyword ‘Purchases’
- The total number of campaigns accepted (‘TotalCampaignsAcc’) can be engineered from the sum of all features containing the keyword ‘Cmp’, plus ‘Response’ (the latest campaign)
Deriving Some Useful Data
%scala
import spark.implicits._
import org.apache.spark.sql.functions._

// Use the legacy (pre-Spark 3) date parser for the "MM/dd/yy" pattern below.
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")

val derivedDF = Finalmarketingdataframe.select(
  $"ID",
  $"Income",
  ($"Kidhome" + $"Teenhome").as("Dependents"),
  year(to_timestamp($"Dt_Customer", "MM/dd/yy")).as("Year_Customer"),
  ($"MntWines" + $"MntFruits" + $"MntMeatProducts" + $"MntFishProducts" + $"MntSweetProducts" + $"MntGoldProds").as("TotalMnt"),
  ($"NumDealsPurchases" + $"NumWebPurchases" + $"NumCatalogPurchases" + $"NumStorePurchases").as("TotalPurchases"),
  // 'Response' (the latest campaign) is included, matching the feature definition above.
  ($"AcceptedCmp1" + $"AcceptedCmp2" + $"AcceptedCmp3" + $"AcceptedCmp4" + $"AcceptedCmp5" + $"Response").as("TotalCampaignsAcc"),
  $"Country"
)
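The feature list describes each total as a sum over "all features containing a keyword". As a minimal sketch of that idea, the sums can also be built from the column names instead of being written out by hand. The helper `sumWhere`, the toy DataFrame `toyDF`, and its values are hypothetical and only stand in for the real data; the sketch matches on prefixes and suffixes rather than on `contains`, so that an already derived column such as ‘TotalMnt’ would not be counted twice.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Reuses the notebook's session if one exists; otherwise starts a local one.
val spark = SparkSession.builder().master("local[1]").appName("keyword-sum-sketch").getOrCreate()
import spark.implicits._

// Toy rows with a subset of the dataset's column names (values are made up).
val toyDF = Seq(
  (1, 100, 20, 3, 4, 0, 1),
  (2, 50, 5, 1, 2, 1, 0)
).toDF("ID", "MntWines", "MntFruits", "NumWebPurchases", "NumStorePurchases", "AcceptedCmp1", "Response")

// Hypothetical helper: add up every column whose name satisfies the predicate.
// reduce throws on an empty selection, so a typo in the predicate fails loudly.
def sumWhere(df: DataFrame)(keep: String => Boolean): Column =
  df.columns.filter(keep).map(col).reduce(_ + _)

val totals = toyDF.select(
  $"ID",
  sumWhere(toyDF)(_.startsWith("Mnt")).as("TotalMnt"),
  sumWhere(toyDF)(_.endsWith("Purchases")).as("TotalPurchases"),
  sumWhere(toyDF)(c => c.startsWith("AcceptedCmp") || c == "Response").as("TotalCampaignsAcc")
)
totals.show()
```

Note that `+` on columns yields null whenever any operand is null, so rows with missing values would need `coalesce` or a prior fill step; the explicit version above behaves the same way.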
Display Derived Data
data:image/s3,"s3://crabby-images/c9e99/c9e99fcb08f76a0ede61df8b4cec76b3d1dab159" alt=""
Creating a Temp View So We Can Run Spark SQL
%scala
derivedDF.createOrReplaceTempView("DerivedData")
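Once a temp view is registered, it can be queried with plain SQL through `spark.sql`. The sketch below assumes a toy stand-in for the derived data and registers it under a hypothetical view name, `DerivedDataSketch`, so that it does not overwrite the real `DerivedData` view; the rows are made up. It selects the two columns a plot such as TotalMnt versus Income would need.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the notebook's session if one exists; otherwise starts a local one.
val spark = SparkSession.builder().master("local[1]").appName("temp-view-sketch").getOrCreate()
import spark.implicits._

// Toy stand-in for derivedDF (values are made up).
val toyDerived = Seq(
  (1, 50000.0, 1, 800, "SP"),
  (2, 30000.0, 2, 150, "CA"),
  (3, 72000.0, 0, 1200, "SP")
).toDF("ID", "Income", "Dependents", "TotalMnt", "Country")

// Hypothetical view name, chosen to avoid clobbering the real DerivedData view.
toyDerived.createOrReplaceTempView("DerivedDataSketch")

// Pull only the columns needed for the plot, skipping rows without an income.
val scatterInput = spark.sql(
  """SELECT Income, TotalMnt
    |FROM DerivedDataSketch
    |WHERE Income IS NOT NULL
    |ORDER BY Income""".stripMargin)
scatterInput.show()
```

A temp view lives only for the current Spark session, so it has to be recreated after a cluster restart.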
Scatter Plot TotalMnt VS Income
data:image/s3,"s3://crabby-images/f2133/f2133adf9c81c161db2ece03ba5fd8b43e719cba" alt=""
NumDealsPurchases VS Dependents
data:image/s3,"s3://crabby-images/87b47/87b47ddadbb43abbac0c93c056e0c882369a0c93" alt=""
TotalCampaignsAcc VS Income
data:image/s3,"s3://crabby-images/2e1af/2e1aff118d6284a97df3cc52dd13d1352ad2f977" alt=""
Dependents VS TotalCampaignsAcc
data:image/s3,"s3://crabby-images/90520/9052029c1981daa27a19285d36b5621ee24cb355" alt=""
Scatter Plot NumWebPurchases VS NumWebVisitsMonth
data:image/s3,"s3://crabby-images/b6428/b6428a9fcffc91233abd1e810e098565dd40f0c7" alt=""
Scatter Plot NumDealsPurchases VS NumWebVisitsMonth
data:image/s3,"s3://crabby-images/07df0/07df002ffe8712c0ed31a210184bbacbc7557372" alt=""
Section 02: Statistical Analysis
data:image/s3,"s3://crabby-images/6a1b9/6a1b94440b739850ce8b0a9a07628e48d33bbf41" alt=""
Total Number of Purchases by Country
data:image/s3,"s3://crabby-images/d51db/d51db82b83d82f81b4267fb2d14eb2d2b9839762" alt=""
Total Amount Spent by Country
data:image/s3,"s3://crabby-images/00ed9/00ed9b50cc53c4372a3f0113c29e02dc5705bf2f" alt=""
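The source shows only the resulting plots for the two by-country totals, not the queries behind them. One way such totals could be computed, sketched here under the assumption that they are simple sums per ‘Country’ over the derived columns, is a `groupBy` with `sum`. The toy DataFrame `toyDerived`, its values, and the output column names `Purchases` and `AmountSpent` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// Reuses the notebook's session if one exists; otherwise starts a local one.
val spark = SparkSession.builder().master("local[1]").appName("by-country-sketch").getOrCreate()
import spark.implicits._

// Toy stand-in for derivedDF (values are made up).
val toyDerived = Seq(
  (1, "SP", 10, 800),
  (2, "CA", 5, 150),
  (3, "SP", 20, 1200)
).toDF("ID", "Country", "TotalPurchases", "TotalMnt")

// One row per country with both totals, largest purchase count first.
val byCountry = toyDerived
  .groupBy($"Country")
  .agg(
    sum($"TotalPurchases").as("Purchases"),
    sum($"TotalMnt").as("AmountSpent")
  )
  .orderBy($"Purchases".desc)
byCountry.show()
```

Raw totals favor countries with more customers, so a per-customer average may be a useful companion figure when comparing countries.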