Data Science: Data Reduction Techniques In Data Pre-Processing

2 min read · Oct 28, 2021

Data Reduction:

Data mining often involves very large volumes of data, which makes analysis slower and harder. Data reduction techniques address this by producing a smaller representation of the dataset that preserves most of the information relevant to the analysis. The goal is to improve storage efficiency and to lower the cost of both storage and analysis.

Dimensionality Reduction:

Dimensionality reduction shrinks the data by encoding it in a more compact form. The reduction can be lossless or lossy: it is lossless if the original data can be reconstructed exactly from the reduced form, and lossy if only an approximation can be recovered. Two effective methods of dimensionality reduction are wavelet transforms and PCA (Principal Component Analysis).
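To make the lossless versus lossy distinction concrete, here is a minimal sketch of a one-level Haar wavelet transform in plain Python. The sample signal, the helper names (`haar_forward`, `haar_inverse`), and the threshold value are illustrative assumptions, not taken from the linked repository.

```python
import math

def haar_forward(signal):
    """One level of the orthonormal Haar transform on an even-length list."""
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Rebuild the signal from approximation and detail coefficients."""
    s = math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

# Hypothetical sample data
signal = [4.0, 4.2, 7.0, 7.1, 1.0, 3.0, 5.0, 5.0]
approx, detail = haar_forward(signal)

# Lossless: keep every coefficient, so the original comes back
# (up to floating-point rounding).
lossless = haar_inverse(approx, detail)

# Lossy: drop small detail coefficients (the threshold is an arbitrary choice).
threshold = 0.5
kept_detail = [d if abs(d) >= threshold else 0.0 for d in detail]
lossy = haar_inverse(approx, kept_detail)

max_err_lossless = max(abs(x - y) for x, y in zip(signal, lossless))
max_err_lossy = max(abs(x - y) for x, y in zip(signal, lossy))
print("lossless max error:", max_err_lossless)
print("lossy max error:", max_err_lossy)
```

In the lossy case only the non-zero coefficients need to be stored, which is where the size reduction comes from; the price is a small reconstruction error.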

Principal Component Analysis:

Principal component analysis (PCA) is a technique for reducing the dimensionality of large datasets, increasing interpretability while minimizing information loss. For many machine learning applications, it helps to be able to visualize your data. Visualizing 2- or 3-dimensional data is not that challenging, but data with 4 or more dimensions cannot be plotted directly. You can use PCA to reduce 4-dimensional data to 2 or 3 dimensions so that you can plot it and, hopefully, understand it better.
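As a rough illustration of that 4-to-2 reduction, the sketch below runs PCA with NumPy's SVD on a synthetic 4-feature dataset. The data, the random seed, and the variable names are assumptions made for the example; they are not the dataset or code from the linked repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, 4 features driven by 2 hidden factors plus noise
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 4))

# PCA works on mean-centered data
X_centered = X - X.mean(axis=0)

# Rows of Vt are the principal directions, ordered by decreasing variance
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
components = Vt[:2]

# Project the 4-dimensional samples onto the first two components
X_2d = X_centered @ components.T

# Share of the total variance captured by each component
explained_ratio = S**2 / np.sum(S**2)

print("reduced shape:", X_2d.shape)
print("variance kept by 2 components:", explained_ratio[:2].sum())
```

The two columns of `X_2d` can then be passed to any plotting library as x and y coordinates. How much variance two components retain depends entirely on the data, so it is worth checking `explained_ratio` before trusting the plot.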

Variance Threshold:

Variance Threshold is a simple baseline approach to feature selection. It removes all features whose variance doesn’t meet a given threshold. By default, it removes all zero-variance features, that is, features that have the same value in every sample. Our dataset has no zero-variance features, so it is unaffected here.
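The idea can be sketched in a few lines of standard-library Python. The helper `variance_threshold` and the toy rows below are hypothetical and only mirror the behaviour described above; in practice scikit-learn provides this as `VarianceThreshold` in `sklearn.feature_selection`.

```python
from statistics import pvariance

def variance_threshold(rows, threshold=0.0):
    """Keep only the columns whose population variance exceeds the threshold."""
    columns = list(zip(*rows))
    variances = [pvariance(col) for col in columns]
    keep = [i for i, v in enumerate(variances) if v > threshold]
    reduced = [[row[i] for i in keep] for row in rows]
    return reduced, keep, variances

# Hypothetical data: the middle column is constant (zero variance)
rows = [
    [1.0, 0.0, 2.0],
    [2.0, 0.0, 2.1],
    [3.0, 0.0, 1.9],
    [4.0, 0.0, 2.0],
]

# Default threshold: only the constant column is dropped
reduced, keep, variances = variance_threshold(rows)

# A higher (arbitrary) threshold also drops the nearly constant last column
reduced_strict, keep_strict, _ = variance_threshold(rows, threshold=0.1)

print("kept columns (default):", keep)
print("kept columns (threshold=0.1):", keep_strict)
```

Because variance depends on the scale of each feature, any non-zero threshold is only meaningful when the features are measured on comparable scales.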

You can check out the code below!

GitHub - dhrumildalwadi/DS_PR3

github.com

Written by Dhrumil Dalwadi