Tuesday, October 26, 2021

Univariate, Bivariate and Multivariate Analysis by EDA

# Data science life cycle:

Every data science beginner, working professional, student or practitioner follows a few steps while working on a project. I will walk you through these steps in simple terms.


# 1. Hypothesis definition:- A proposed explanation used as a starting point for further investigation.

Ex:- Company A wants to release a raincoat (the product) in summer, and is unsure whether to release it or not. (I know it's a bad idea, but let's go with it for the sake of understanding.)


# 2. Data acquisition:- Collecting the required data.

Ex:- Collecting the last 10 years of data for a certain region.


# 3. Exploratory Data Analysis (EDA):-

Analysing the collected data using a few techniques (we will see them below).

Ex:- Data scientists analyse the collected (existing) data and decide which features/metrics to consider for model building.


# 4. Model building:-

This is where machine learning comes in.

Ex:- Using the metrics chosen during EDA, a machine learning model predicts whether the product will be successful if it goes to market.


# 5. Result report:-

The EDA and model building steps produce results, which are summarised in a report.

Ex:- The results from all the steps above decide whether to start production or not.

# 6. Final product:-

Based on the result, we get a product.

Ex:- If the result is positive, company A can start production. If the result is negative, A won't start production.

# Exploratory Data Analysis:-

By definition, exploratory data analysis is an approach to analysing data sets to summarise their main characteristics, often with visual methods.

In other words, we analyse the data we collected to find the important metrics/features, using some nice and pretty visualisations.

Everyone makes decisions in life by weighing a few points about the situation. To make such decisions more accurately, a data scientist performs EDA on the data.


# Exploratory Data Analysis is mainly performed using the following methods:


# Univariate analysis:-

Univariate analysis provides summary statistics for each field in the raw data set, that is, a summary of one variable at a time. Ex:- CDF, PDF, box plot, violin plot. (Don't worry, we will see below what each of them is.)
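
As a rough illustration of univariate analysis, here is a minimal sketch that estimates a PDF and CDF from a histogram and computes the quartiles that a box plot draws. The values are synthetic (generated with NumPy, not taken from any real dataset), and the bin count of 10 is an arbitrary choice. The "PDF" here is expressed as per-bin probabilities rather than a true density.

```python
import numpy as np

# Synthetic values standing in for one numeric column (an assumption, not real data).
rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=500)

# Count values per bin, then normalise so the bin probabilities sum to 1.
counts, bin_edges = np.histogram(values, bins=10)
pdf = counts / counts.sum()

# The CDF is the running total of the per-bin probabilities; it ends at 1.
cdf = np.cumsum(pdf)

# The numbers a box plot is built from: first quartile, median, third quartile.
q1, median, q3 = np.percentile(values, [25, 50, 75])

print("PDF per bin:", np.round(pdf, 3))
print("CDF per bin:", np.round(cdf, 3))
print("Q1, median, Q3:", round(q1, 1), round(median, 1), round(q3, 1))
```

Only one variable is involved at any point, which is what makes this univariate.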


# Bivariate analysis:-

Bivariate analysis is performed to find the relationship between each variable in the dataset and the target variable of interest, or more generally the relationship between any two variables. Ex:- Box plot and violin plot of one variable, split by the other.
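
To make the idea of relating one variable to a target concrete, here is a small sketch using pandas. It computes, for each target class, the quartiles that a grouped box plot would display. The column names `feature` and `target` and the data itself are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic two-column data: one numeric feature and one class label (assumed names).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "feature": np.concatenate([rng.normal(40, 5, 100), rng.normal(60, 5, 100)]),
    "target": ["no"] * 100 + ["yes"] * 100,
})

# Quartiles of the feature within each target class,
# i.e. the numbers behind a box plot drawn per class.
summary = df.groupby("target")["feature"].describe()[["25%", "50%", "75%"]]
print(summary)
```

If the quartiles of the two classes barely overlap, the feature is likely to be useful for separating them.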


# Multivariate analysis:-

Multivariate analysis is performed to understand interactions between different fields in the dataset, that is, interactions among more than two variables. Ex:- Pair plot and 3D scatter plot.
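
A pair plot shows every pair of variables against each other. A simple numerical counterpart, sketched below, is the pairwise correlation matrix. This is a stand-in for the plot rather than the plot itself, and the three variables `x`, `y` and `z` are synthetic.

```python
import numpy as np

# Three synthetic variables (assumptions for illustration only).
rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = 2 * x + rng.normal(scale=0.5, size=300)  # built to depend strongly on x
z = rng.normal(size=300)                     # generated independently of x and y

# One column per variable; rowvar=False tells NumPy that columns are variables.
data = np.column_stack([x, y, z])
corr = np.corrcoef(data, rowvar=False)

# Entry [i, j] is the correlation between variable i and variable j.
print(np.round(corr, 2))
```

Each off-diagonal entry corresponds to one panel of a pair plot: values near 1 or -1 suggest a strong linear relationship, values near 0 suggest none.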

Let's download a data set from Kaggle (home for data scientists). You can download it and learn more about it here → Haberman dataset.
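
Once downloaded, the file could be loaded along these lines. This is a sketch under assumptions: it assumes a headerless CSV with four columns, and the column names and the helper `load_haberman` are chosen here for illustration. The rows in `sample` are made-up placeholders so the snippet runs without the download.

```python
import io
import pandas as pd

# Assumed column names; the downloaded file is assumed to have no header row.
# "status" is assumed to use 1 = survived 5 years or longer, 2 = died within 5 years.
COLUMNS = ["age", "year", "nodes", "status"]

def load_haberman(source):
    """Read a headerless four-column CSV into a DataFrame (hypothetical helper)."""
    return pd.read_csv(source, header=None, names=COLUMNS)

# Made-up placeholder rows in the assumed layout, not real records.
sample = io.StringIO("45,60,3,1\n52,62,0,1\n61,65,12,2\n")
df = load_haberman(sample)

print(df.shape)
print(df["status"].value_counts())
```

To use the real data, pass the path of the downloaded file instead of `sample`, for example `load_haberman("haberman.csv")`, where the file name is an assumption.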

Full Python code is available at

