Data Analysis using SQL

Agenda

This script serves as an introduction to advanced data analysis using SQL, a necessary tool for every data scientist, data engineer, and machine learning engineer who needs access to data. The idea underlying SQL is fairly similar to that of other data analysis tools (Excel, Pandas), so it should feel intuitive to anyone with experience working with data.

Loading Data into https://sqliteonline.com/

1. Open the website in a browser.
2. Click on File and select Open DB.
3. Select the file database.sqlite, which is downloaded from the download section and…
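To make the comparison with tools such as Pandas concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It does not use the post's database.sqlite: the `sales` table, its columns, and the sample rows are hypothetical, and the database lives in memory. It only illustrates that a SQL GROUP BY plays the same role as a group-and-aggregate step in a spreadsheet or DataFrame.

```python
import sqlite3


def average_by_group(rows):
    """Return [(region, average_amount)] computed with a SQL GROUP BY.

    `rows` is a list of (region, amount) tuples. The table and column
    names are hypothetical and exist only for this illustration.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, AVG(amount) FROM sales "
            "GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("north", 10.0), ("north", 30.0), ("south", 5.0)]
    print(average_by_group(sample))
```

To run the same kind of query against a database file instead, `sqlite3.connect` accepts a file path in place of ":memory:".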

Practice Test to prepare for Apache Spark Certification – Databricks Certification exam.

Databricks is an American enterprise software company founded by the creators of Apache Spark. It combines the best of data warehouses and data lakes into a lakehouse architecture. The company has also created Delta Lake, MLflow and Koalas, open source projects that span data engineering, data science and machine learning. Databricks develops a web-based platform for working with Spark that provides automated cluster management and IPython-style notebooks. Gartner has classified Databricks as a leader in its Magic Quadrant for Data Science and Machine Learning platforms.

General information: Exam length: The exam…

Practice Apache Superset without Installing

Apache Superset is a modern data exploration and visualization platform. Superset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts.

Preset: Preset Cloud is a fully hosted, hassle-free cloud service for Apache Superset™. Get started for free today! (www.preset.io)

We should start with the Starter Plan: hassle-free Superset in the cloud, best for small teams.
Free: forever, for up to 5 users
Features:
- Unlimited dashboards and charts
- No-code chart builder
- Collaborative SQL editor
- Over 40 visualization types
- Chart and dashboard cache

https://youtu.be/49ItnEXsN7M

Installing Apache Superset on Ubuntu (Linux) Machine

Installing Superset from Scratch

In Ubuntu 20.04 the following command will ensure that the required dependencies are installed:

    sudo apt-get install build-essential libssl-dev libffi-dev python3-dev python3-pip libsasl2-dev libldap2-dev

Python Virtual Environment

We highly recommend installing Superset inside of a virtual environment.

    pip install virtualenv

You can create and activate a virtual environment using:

    # virtualenv is shipped in Python 3.6+ as venv instead of pyvenv.
    # See https://docs.python.org/3.6/library/venv.html
    python3 -m venv venv
    . venv/bin/activate

Installing and Initializing Superset

First, start by installing apache-superset:

    pip install apache-superset

Then, you need to initialize the database:

    superset db upgrade

Finish installing by running through the…

Installing Apache Cassandra on Ubuntu (Linux) Machine

Installing the binary tarball

1. Verify the version of Java installed. For example:

   Command
       $ java -version
   Result
       openjdk version "1.8.0_222"
       OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
       OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

2. Download the binary tarball from one of the mirrors on the Apache Cassandra Download site. For example, to download Cassandra 4.0.1:

       $ curl -OL https://dlcdn.apache.org/cassandra/4.0.1/apache-cassandra-4.0.1-bin.tar.gz

   The mirrors only host the latest versions of each major supported release. To download an earlier version of Cassandra, visit the Apache Archives.

3. OPTIONAL: Verify the integrity of the downloaded tarball using one of the methods here. For example, to verify…
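As a general illustration of the optional integrity check (not the specific method the original post goes on to describe), the following Python sketch computes a file's SHA-256 digest and compares it with a published value. It assumes a SHA-256 digest is available for the tarball; the function names are hypothetical helpers written for this example.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    The file is read in chunks so that a large tarball does not have
    to fit in memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Return True if the file's digest equals the published digest.

    Surrounding whitespace and letter case in `published_hex` are
    ignored, since digests are often copied from a text file.
    """
    return sha256_of_file(path) == published_hex.strip().lower()
```

Calling `matches_published_digest("apache-cassandra-4.0.1-bin.tar.gz", digest)` with the digest string copied from the download site would return True only for an unmodified download.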

Installing Java on Ubuntu (Linux) Machine

Steps for Installing JAVA 8 on Ubuntu

Step 1 – Install Java 8 on Ubuntu
OpenJDK 8 is available under the default Apt repositories. You can install Java 8 on an Ubuntu system using the following commands.

    sudo apt update
    sudo apt install openjdk-8-jdk -y

Step 2 – Verify Java Installation
You have successfully installed Java 8 on your system. Let’s verify the installed and currently active version using the following command.

    java -version
    openjdk version "1.8.0_252"
    OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1ubuntu1-b09)
    OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

Step 3 – Setup JAVA_HOME and JRE_HOME Variable
As you have installed…

Apache Zeppelin with Apache Spark Installation on Ubuntu

Installation Steps for Apache Zeppelin on Ubuntu

Prerequisite: Java 7 or Java 8 must be installed on the Ubuntu operating system.

The first step is to download the latest version of Apache Zeppelin and save it in a folder.
Link: http://zeppelin.apache.org/download.html

The second step is to extract the downloaded tar file (i.e. the .tgz). We stored the downloaded tar file in /home/bigdata/apachezeppelin/ (we created the apachezeppelin folder manually using the command mkdir apachezeppelin).

    bigdata@bigdata:~$ cd /home/bigdata/apachezeppelin/
    bigdata@bigdata:~/apachezeppelin$ pwd
    /home/bigdata/apachezeppelin
    bigdata@bigdata:~/apachezeppelin$ ls -ltr
    total 683072
    -rw-rw-r-- 1 bigdata bigdata 699455687 Aug 15 11:27 zeppelin-0.9.0-bin-netinst.tgz
    bigdata@bigdata:~/apachezeppelin$ tar -xvzf zeppelin-0.9.0-bin-netinst.tgz
    zeppelin-0.9.0-bin-netinst/…

Basics about Databricks notebook

Click on Create a Blank Notebook as shown in the image below. Specify the file name and select the cluster which we have created.

A notebook is a collection of runnable cells (commands). When you use a notebook, you are primarily developing and running cells.

The supported magic commands are: %python, %r, %scala, and %sql. Additionally:
%sh: Allows you to execute shell code in your notebook.
%fs: Allows you to use dbutils filesystem commands.
%md: Allows you to include various types of documentation, including text, images, and mathematical formulas and equations.

For more details, please refer to the Databricks Documentation.

Free Account creation in Databricks Community Edition

What is the Databricks Community Edition?
The Databricks Community Edition is the free version of the Databricks cloud-based big data platform. It allows users to access a micro-cluster as well as a cluster manager and notebook environment. All users can share their notebooks and host them free of charge with Databricks.

Link for Databricks Community Edition: https://community.cloud.databricks.com/login.html

Open the above link in any recent browser; we recommend Google Chrome for a better experience.
Click on Sign up as shown in the image.
A new page will open as shown in the image below. Fill in all the required details as applicable…

Provisioning a Spark Cluster or Creating a Spark Cluster

Once you log in to Databricks Community Edition, the left tab has a Clusters button as shown in the image. Click on it. As soon as you click the Clusters button, a new page will open as shown in the image below. As soon as you click Create Cluster, a new page will open as shown in the image below.

The steps for launching a Spark cluster are as follows:
Specify the cluster name [you can specify any cluster name; for all of our projects we will use SparkCluster].
Click on Create Cluster.

Please note: Free 15GB Memory:…