Machine Learning Project for Glass Identification. Problem Statement or Business Problem: From the USA Forensic Science Service; 6 types of glass, defined in terms of their oxide content (e.g. Na, Fe, K). The study of the classification of types of glass was motivated by criminological investigation. Glass left at the scene of a crime can be used as evidence, if it is correctly identified! Attribute Information or Dataset Details: Id number: 1 to 214; RI: refractive index; Na: Sodium (unit measurement: weight percent in corresponding oxide, as are attributes 4-10); Mg: Magnesium; Al: Aluminum; Si: Silicon; K: Potassium; Ca:…

Machine Learning Project: Predicting the age of abalone from physical measurements. Abalone is a common name for any of a group of small to very large sea snails, marine gastropod molluscs in the family Haliotidae. Other common names are ear shells, sea ears, and muttonfish or muttonshells in Australia, ormer in the UK, perlemoen in South Africa, and paua in New Zealand. Problem Statement or Business Problem: Predict the age of abalone from physical measurements. The age of abalone is determined by cutting the shell through the cone,…

Histogram for Sex and Age: %sql select Sex, Age from AbaloneData; (Plot Option) Age Distribution: %sql select count(Sex), Sex from AbaloneData group by Sex; Histogram for Length in mm in Abalone: %sql select Length_in_mm from AbaloneData; Histogram for Height in mm in Abalone: %sql select Height_in_mm from AbaloneData; Histogram for Rings in Abalone: %sql select Rings from AbaloneData; Creating a Regression Model: In this tutorial, you will implement a regression model that uses features of abalone to predict its age from physical measurements. Import Spark SQL and Spark ML Libraries: First, import the libraries you will…

In this article, we will analyze consumer complaints about financial products and services, recorded by the US government from US citizens, using Big Data technology. We will walk through the step-by-step execution of the project. Problem Statement: Analyze the data in the Hadoop ecosystem to: Get the number of complaints filed for each company. Get the number of complaints filed under each product. Get the total number of complaints filed from a particular location. Get the list of companies, grouped by location, which have no timely response. Attribute Information or Dataset Details: Data: Input Format - .CSV. Public dataset available at below website…

Execution of the Shell script. Output: Chart view in Excel. 1. Get the number of complaints filed for each company. (Complains_by_Company.csv) 2. Get the number of complaints filed under each product. (Complains_by_Product.csv) 3. Get the total number of complaints filed from a particular location. (Complains_by_Location.csv) 4. Get the list of companies, grouped by location, which have no timely response. (Complains_by_Response_No.csv) (Note: To generate the chart below, we need to use the PivotChart option in order to group the data.)
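The four outputs listed above are simple group-and-count aggregations. As a rough illustration only, here is a minimal Python sketch of the same logic on a tiny in-memory CSV. It is not the article's shell script or Hadoop job; the column names (Company, Product, State, Timely response?), the sample rows, and the summarize helper are all assumptions made up for this example.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample rows for illustration only. The column names are
# assumptions and are not taken from the article or its dataset.
SAMPLE_CSV = """Company,Product,State,Timely response?
Acme Bank,Mortgage,CA,Yes
Acme Bank,Credit card,NY,No
Beta Credit,Mortgage,CA,No
Beta Credit,Mortgage,TX,Yes
Acme Bank,Mortgage,CA,No
"""


def summarize(csv_text):
    """Return the four aggregations named in the problem statement."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # 1-3: complaint counts per company, per product, and per location.
    by_company = Counter(row["Company"] for row in rows)
    by_product = Counter(row["Product"] for row in rows)
    by_location = Counter(row["State"] for row in rows)

    # 4: companies with at least one untimely response, grouped by location.
    untimely = defaultdict(set)
    for row in rows:
        if row["Timely response?"].strip().lower() == "no":
            untimely[row["State"]].add(row["Company"])
    untimely_by_location = {
        state: sorted(companies) for state, companies in untimely.items()
    }

    return by_company, by_product, by_location, untimely_by_location


by_company, by_product, by_location, untimely_by_location = summarize(SAMPLE_CSV)
print(by_company.most_common())
print(untimely_by_location)
```

In a Hadoop setting the same four results would typically come from GROUP BY queries in Hive or GROUP/COUNT statements in Pig; the sketch only shows the shape of the computation.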

In this article, we will see how to analyze YouTube data using Big Data technology, walking through the step-by-step execution of the project. YouTube is an American online video-sharing platform headquartered in San Bruno, California. Three former PayPal employees, Chad Hurley, Steve Chen, and Jawed Karim, created the service in February 2005. Problem Statement: 1) Find the top 5 categories in which the most videos are uploaded. 2) Find the top 10 rated videos. 3) Find the top 10 most viewed videos. 4) Find the top 10 rated videos in each category. 5) Find…

In this article, we will analyze social bookmarking sites to find insights using Big Data technology. The data comprises information gathered from bookmarking sites, which allow you to bookmark, review, rate, and search various links on any topic. The data is in XML format and contains various categories defining it and the ratings linked with it. Problem Statement: Analyze the data in the Hadoop ecosystem to: Fetch the data into the Hadoop Distributed File System and analyze it with the help of MapReduce, Pig, and Hive…

In this article, we explore generating analytics for a product-based company using web logs. Even with a small amount of data, we can still gain a lot of valuable insights. Problem Statement: Generate analytics based on the data in the Hadoop ecosystem: 1. Load weblog data into HDFS using the HDFS client. 2. Develop a Pig program to load the log and perform analytics on IP, Category-1, Category-2, page, status_code. 2.1. Count of page views by individual user, i.e. [IP, count(*)]. 2.2. Top / Bottom 2: category-1 / category-2 / page / users (exclude status codes other than 200). Top 2 and bottom 2 records: Category, total_number_viewspage,…