Linux

Installing Apache Flume on Ubuntu

System Requirements:

- Java Runtime Environment: Java 1.8 or later
- Memory: sufficient memory for configurations used by sources, channels or sinks
- Disk Space: sufficient disk space for configurations used by channels or sinks
- Directory Permissions: read/write permissions for directories used by the agent

The first step is to create a folder for Flume. Make a flume directory in /home/dataengineer/:

    mkdir flume
    cd flume

Go to the website https://flume.apache.org/ and click Download. On the page that opens, click apache-flume-1.11.0-bin.tar.gz. Another page opens (https://www.apache.org/dyn/closer.lua/flume/1.11.0/apache-flume-1.11.0-bin.tar.gz); copy the link shown there. Then run the command below:

    wget https://dlcdn.apache.org/flume/1.11.0/apache-flume-1.11.0-bin.tar.gz

You will be able to…
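The download step can also be scripted. The following is a minimal sketch, not part of the original post: the function names `flume_url` and `install_flume` are hypothetical, and it assumes that other Flume versions follow the same URL layout as the 1.11.0 link and that the tarball is unpacked with `tar` right after the download.

```shell
# Build the download URL for a given Flume version.
# Assumption: other versions use the same layout as the 1.11.0 link.
flume_url() {
    version="$1"
    printf 'https://dlcdn.apache.org/flume/%s/apache-flume-%s-bin.tar.gz\n' "$version" "$version"
}

# Download and unpack Flume into a target directory (hypothetical helper).
# The subshell keeps the caller's working directory unchanged.
install_flume() {
    version="$1"
    target_dir="$2"
    mkdir -p "$target_dir" || return 1
    (
        cd "$target_dir" || exit 1
        wget "$(flume_url "$version")" || exit 1
        tar -xzf "apache-flume-$version-bin.tar.gz"
    )
}

# Usage (needs network access):
#   install_flume 1.11.0 /home/dataengineer/flume
```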

MySQL client and Server Installation

Step 1: Update/Upgrade Package Repository

    sudo apt update
    sudo apt upgrade

Step 2: Install MySQL

    sudo apt install mysql-server

When asked if you want to continue with the installation, answer Y and hit ENTER.

Note: If you only want to connect to a remote MySQL server instead of hosting a database on your machine, install only the MySQL Client by running:

    sudo apt install mysql-client

Step 3: Check if MySQL Service Is Running

    sudo systemctl status mysql

Step 4: Log in to MySQL Server

    sudo mysql -u root
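For scripts, the status check in Step 3 can be done without reading the status output by eye. This is a small sketch, assuming the service is named `mysql` as in the steps above; the function names `mysql_is_running` and `report_mysql` are hypothetical.

```shell
# Return 0 if the mysql service is active, non-zero otherwise.
# "systemctl is-active --quiet" prints nothing and reports via its exit code.
mysql_is_running() {
    systemctl is-active --quiet mysql
}

# Print a one-line summary of the service state.
report_mysql() {
    if mysql_is_running; then
        echo "MySQL is running"
    else
        echo "MySQL is not running"
    fi
}

# Usage:
#   report_mysql
```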

Apache Hive Installation Steps on Ubuntu

In this tutorial, we will learn the complete process to install Apache Hive 3.1.2 on Ubuntu 20.

The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.

Steps for Installing Hive on Ubuntu

Step 1 - Create a directory, for example:

    $ mkdir /home/bigdata/apachehive

Step 2 - Move to the Hive directory:

    $ cd /home/bigdata/apachehive

Step 3 - Download Apache Hive (the link varies by country, so please get the download link from…

Installing Apache Pig on Ubuntu

Download a recent stable release from the Apache download site: https://pig.apache.org/releases.html

Click Download. A new page opens (https://www.apache.org/dyn/closer.cgi/pig).
Click the link as marked in the image. A new page opens.
Click the latest folder. Another page opens.
Download the file as shown in the image.

We downloaded the file to the directory /home/dataengineer/apachepig/. Unpack the file using the command below:

    tar -xvzf pig-0.17.0.tar.gz

Add /pig-n.n.n/bin to your path. Use export (bash, sh, ksh) or setenv (tcsh, csh). For example:

    $ export PATH=/<my-path-to-pig>/pig-n.n.n/bin:$PATH

Executing Pig Help Command
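The PATH step can be made safe to run more than once. The sketch below is illustrative only: `prepend_path` is a hypothetical helper, and it assumes the tarball unpacked into a pig-0.17.0 directory under /home/dataengineer/apachepig/.

```shell
# Prepend a directory to PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Assumption: pig-0.17.0.tar.gz was unpacked in /home/dataengineer/apachepig/
PIG_HOME=/home/dataengineer/apachepig/pig-0.17.0
prepend_path "$PIG_HOME/bin"

# If the pig launcher is reachable, print its help text.
if command -v pig >/dev/null 2>&1; then
    pig -help
fi
```

Placing these lines in a shell startup file such as ~/.bashrc keeps the setting across sessions without growing PATH on every login.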

Apache Hadoop 3.3.1 Installation Steps on Ubuntu (Part 1)

In this tutorial, we will learn the complete process to install Hadoop 3.3.1 on Ubuntu 20.

Supported Java Versions

- Apache Hadoop 3.3 and later supports Java 8 and Java 11 (runtime only). Please compile Hadoop with Java 8. Compiling Hadoop with Java 11 is not supported: HADOOP-16795 - Java 11 compile support (open).
- Apache Hadoop 3.0.x to 3.2.x supports only Java 8.
- Apache Hadoop 2.7.x to 2.10.x supports both Java 7 and 8.

Required software for Linux includes:

- Java must be installed. Recommended Java versions are described at HadoopJavaVersions.
- ssh must be installed and sshd must be running to use the Hadoop scripts that…
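The version rules above can be checked before installing. This is a rough sketch under stated assumptions: the helper names `java_major` and `hadoop33_supports` are hypothetical, and the version string is assumed to look like the quoted value printed by `java -version` (for example "1.8.0_252" or "11.0.2").

```shell
# Convert a Java version string to its major version number.
# Legacy "1.x" strings such as 1.8.0_252 map to x; newer ones use the first field.
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 ;;
    esac
}

# Succeeds if the given Java version can run Hadoop 3.3 and later
# (Java 8 or Java 11, per the list above).
hadoop33_supports() {
    case "$(java_major "$1")" in
        8|11) return 0 ;;
        *)    return 1 ;;
    esac
}

# Usage: java prints its version to stderr, so redirect before parsing.
#   ver=$(java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')
#   hadoop33_supports "$ver" && echo "Java $ver is supported"
```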

Apache Hadoop 3.3.1 Installation Steps on Ubuntu (Part 2)

Set the following properties in the respective files.

File: nano etc/hadoop/core-site.xml

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

File: nano etc/hadoop/hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

File: nano etc/hadoop/mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
      </property>
    </configuration>

File: nano etc/hadoop/yarn-site.xml

    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
      </property>
    </configuration>

Now check that you can ssh to the localhost without…
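If you prefer not to edit the files by hand, a here-document can write them. The sketch below covers only core-site.xml; `write_core_site` is a hypothetical helper name, and the configuration directory is passed in as an argument because its location depends on where Hadoop was unpacked.

```shell
# Write a single-node core-site.xml into the given configuration directory.
# The quoted 'EOF' marker stops the shell from expanding anything in the XML.
write_core_site() {
    conf_dir="$1"
    cat > "$conf_dir/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}

# Usage, run from the Hadoop installation directory:
#   write_core_site etc/hadoop
```

Note that this overwrites any existing core-site.xml, so back up the file first if it already holds other properties.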

Installing Apache Superset on Ubuntu (Linux) Machine

Installing Superset from Scratch

In Ubuntu 20.04 the following command will ensure that the required dependencies are installed:

    sudo apt-get install build-essential libssl-dev libffi-dev python3-dev python3-pip libsasl2-dev libldap2-dev

Python Virtual Environment

We highly recommend installing Superset inside of a virtual environment.

    pip install virtualenv

You can create and activate a virtual environment using:

    # virtualenv is shipped in Python 3.6+ as venv instead of pyvenv.
    # See https://docs.python.org/3.6/library/venv.html
    python3 -m venv venv
    . venv/bin/activate

Installing and Initializing Superset

First, start by installing apache-superset:

    pip install apache-superset

Then, you need to initialize the database:

    superset db upgrade

Finish installing by running through the…
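To avoid installing Superset into the system Python by accident, a script can refuse to run outside a virtual environment. This is a minimal sketch: the function names `in_virtualenv` and `install_superset` are hypothetical, and it relies on the activate script exporting the `VIRTUAL_ENV` variable.

```shell
# Succeeds only when a virtual environment is active.
# The activate script exports VIRTUAL_ENV; empty or unset means no venv.
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

# Install Superset and initialize its database, but only inside a venv.
install_superset() {
    if ! in_virtualenv; then
        echo "Activate a virtual environment first: . venv/bin/activate" >&2
        return 1
    fi
    pip install apache-superset && superset db upgrade
}

# Usage:
#   . venv/bin/activate
#   install_superset
```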

Installing Apache Cassandra on Ubuntu (Linux) Machine

Installing the binary tarball

1. Verify the version of Java installed. For example:

   Command

       $ java -version

   Result

       openjdk version "1.8.0_222"
       OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
       OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

2. Download the binary tarball from one of the mirrors on the Apache Cassandra Download site. For example, to download Cassandra 4.0.1:

       $ curl -OL https://dlcdn.apache.org/cassandra/4.0.1/apache-cassandra-4.0.1-bin.tar.gz

   The mirrors only host the latest versions of each major supported release. To download an earlier version of Cassandra, visit the Apache Archives.

3. OPTIONAL: Verify the integrity of the downloaded tarball using one of the methods here. For example, to verify…

Installing Java on Ubuntu (Linux) Machine

Steps for Installing Java 8 on Ubuntu

Step 1 – Install Java 8 on Ubuntu

OpenJDK 8 is available in the default Apt repositories. You can install Java 8 on an Ubuntu system using the following commands.

    sudo apt update
    sudo apt install openjdk-8-jdk -y

Step 2 – Verify Java Installation

You have successfully installed Java 8 on your system. Let’s verify the installed and currently active version using the following command.

    java -version

    openjdk version "1.8.0_252"
    OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1ubuntu1-b09)
    OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

Step 3 – Setup JAVA_HOME and JRE_HOME Variable

As you have installed…

Customer Complaints Analysis Part 1

In this article, we will use Big Data technology to analyze consumer complaints about financial products and services, recorded by the US government from US citizens. We will walk through the execution of the project step by step.

Problem Statement: Analyze the data in the Hadoop ecosystem to:

- Get the number of complaints filed for each company.
- Get the number of complaints filed under each product.
- Get the total number of complaints filed from a particular location.
- Get the list of companies, grouped by location, that have no timely response.

Attribute Information or Dataset Details:

Data: Input Format - .CSV

Public dataset available at the website below: https://catalog.data.gov/dataset/consumer-complaint-database…
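Before moving the data into Hadoop, the per-company and per-product counts can be sanity-checked on a small sample with standard shell tools. This is only a sketch and not the Hadoop approach the article uses: `count_by_column` is a hypothetical helper, the column positions are placeholders, and it assumes fields contain no quoted commas, which a real CSV export may violate.

```shell
# Count rows per distinct value of one column in a simple CSV.
# The header row is skipped; output is "count,value", largest count first.
# Assumption: no quoted commas inside fields; otherwise use a real CSV parser.
count_by_column() {
    column="$1"
    file="$2"
    awk -F, -v col="$column" '
        NR > 1 { count[$col]++ }
        END    { for (k in count) print count[k] "," k }
    ' "$file" | sort -t, -k1,1nr -k2,2
}

# Usage (hypothetical file name and column number):
#   count_by_column 2 complaints_sample.csv
```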