Bhavesh

82 Posts
Apache Hadoop 3.3.0 Single Node Installation on Windows 10 Part 1

Download link for Apache Hadoop 3.3.0: https://hadoop.apache.org/releases.html
Click the binary link; it opens a mirror page, https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz (this link may change based on your location).

Download link for Java SE Development Kit 8: https://www.oracle.com/in/java/technologies/javase/javase-jdk8-downloads.html
Register or log in. If you have already registered, the download will begin. The downloaded files will now be in the Downloads folder.

Installing and Configuring Java
Step 1: Create an empty folder named Java on the C drive.
Step 2: Go to the download location.
Step 3: Double-click the setup file. Click Next three times, then click Change. Make sure you change…
Read More
Apache Hadoop 3.3.0 Single Node Installation on Windows 10 Part 2

We have downloaded the Hadoop installation files. Move (cut and paste) the archive:
From: Downloads location
To: C:\hadoop-3.3.0.tar

Extract hadoop-3.3.0.tar on the C drive using extraction software (WinZip, WinRAR or 7-Zip). The extracted hadoop-3.3.0 folder will now be on the C drive.

Now open the folder C:\hadoop-3.3.0\etc\hadoop. We need to edit 5 files.

Open C:/Hadoop-3.3.0/etc/hadoop/core-site.xml, paste the XML below, and save the file.
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Open C:/Hadoop-3.3.0/etc/hadoop/mapred-site.xml, paste the XML below, and save the file.
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Create folder "data" under "C:\Hadoop-3.3.0"
1) Create folder "datanode" under "C:\Hadoop-3.3.0\data"
2) Create…
Read More
Machine Learning Project on Mushroom Classification whether it’s edible or poisonous Part 1

A mushroom, or toadstool, is the fleshy, spore-bearing fruiting body of a fungus, typically produced above ground on soil or on its food source.

Problem Statement or Business Problem
In this project, looking at the various properties of a mushroom, we will predict whether the mushroom is edible or poisonous.

Attribute Information or Dataset Details:
To make them easier to understand, the properties are listed one by one.
classes: edible=e, poisonous=p
cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
bruises: bruises=t, no=f
odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
gill-attachment: attached=a, descending=d, free=f, notched=n
gill-spacing: close=c, crowded=w, distant=d
gill-size: broad=b, narrow=n
gill-color: black=k, brown=n, buff=b, chocolate=h, gray=g,…
Read More
Machine Learning Project on Mushroom Classification whether it’s edible or poisonous Part 2

Collecting all String Columns into an Array

%scala
var StringfeatureCol = Array("class", "capshape", "capsurface", "capcolor", "bruises", "odor",
  "gillattachment", "gillspacing", "gillsize", "gillcolor", "stalkshape", "stalkroot",
  "stalksurfaceabovering", "stalksurfacebelowring", "stalkcolorabovering", "stalkcolorbelowring",
  "veiltype", "veilcolor", "ringnumber", "ringtype", "sporeprintcolor", "population", "habitat")

StringIndexer encodes a string column of labels to a column of label indices.

Example of StringIndexer

%scala
import org.apache.spark.ml.feature.StringIndexer

val df = spark.createDataFrame(
  Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c"))
).toDF("id", "category")

df.show()

val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")

val indexed = indexer.fit(df).transform(df)

indexed.show()

Output:
+---+--------+
| id|category|
+---+--------+
|  0|       a|
|  1|       b|
|  2|       c|
|  3|…
Read More
Machine Learning Pipeline Application on Power Plant. (Part 1)

This is an end-to-end project that performs Extract-Transform-Load and Exploratory Data Analysis on a real-world dataset, and then applies several different machine learning algorithms to solve a supervised regression problem on the dataset. Our goal is to accurately predict power output given a set of environmental readings from various sensors in a natural gas-fired power generation plant.

Background
Power generation is a complex process, and understanding and predicting power output is an important element in managing a plant and its connection to the power grid. The operators of a regional power grid create predictions of power demand based on historical…
Read More
Machine Learning Pipeline Application on Power Plant. (Part 2)

Visualize Your Data
To understand our data, we will look for correlations between features and the label. This can be important when choosing a model. For example, if features and a label are linearly correlated, a linear model like Linear Regression can do well; if the relationship is very non-linear, more complex models such as Decision Trees can be better. We can use Databricks' built-in visualization to view each of our predictors in relation to the label column as a scatter plot and see the correlation between the predictors and the label.

Exploratory Data Analysis (EDA) is an approach/philosophy for…
Read More
Machine Learning Project – Predict Forest Cover Part 1

In this project, we will predict Forest Cover based on various attributes (cartographic variables) of the Forest. Hence, this is a classification problem.

Problem Statement or Business Problem
In this project, we'll predict Forest Cover based on various attributes (cartographic variables) of the Forest. Hence, this is a classification problem.

Attribute Information or Dataset Details:
Given is the attribute name, attribute type, the measurement unit, and a brief description. The forest cover type is the classification problem. The order of this listing corresponds to the order of numerals along the rows of the database.

Name | Data Type | Measurement | Description
Elevation | quantitative | meters | Elevation in meters
Aspect | quantitative | azimuth | Aspect in degrees…
Read More
Machine Learning Project – Predict Forest Cover Part 2

Define the Pipeline
A predictive model often requires multiple stages of feature preparation. A pipeline consists of a series of transformer and estimator stages that typically prepare a DataFrame for modeling and then train a predictive model.

Split the Data
It is common practice when building machine learning models to split the source data, using some of it to train the model and reserving some to test the trained model. In this project, you will use 70% of the data for training, and reserve 30% for testing.

%scala
val splits = ForestDF.randomSplit(Array(0.7, 0.3))
val train = splits(0)
val test =…
Read More
Machine Learning Project Predict Will it Rain Tomorrow in Australia

Machine Learning Project for Predicting Whether It Will Rain Tomorrow in Australia

Problem Statement or Business Problem
In this project we will be working with a data set indicating whether it rained the next day in Australia (Yes or No). This column is Yes if the rain for that day was 1mm or more. We will try to create a model that predicts this from the available data.

Attribute Information or Dataset Details:
Date - The date of observation
Location - The common name of the location of the weather station
MinTemp - The minimum temperature in degrees Celsius
MaxTemp - The…
Read More
Predict Ads Click – Practice Data Analysis and Logistic Regression Prediction

Machine Learning Project for Predicting Ad Clicks Based on the Available Attributes

Problem Statement or Business Problem
In this project we will be working with a data set indicating whether or not a particular internet user clicked on an advertisement. We will try to create a model that predicts whether or not they will click on an ad based on the features of that user.

Attribute Information or Dataset Details:
'Daily Time Spent on Site': consumer time on site in minutes
'Age': customer age in years
'Area Income': Avg. income of geographical area of consumer
'Daily Internet Usage': Avg.…
Read More