0% found this document useful (0 votes)

215 views

Introduction To Dimensionality Reduction

Dimensionality reduction is a technique used to reduce the number of features in a dataset while retaining important information. It can be done to reduce model complexity, improve algorithm performance, or aid visualization. Common techniques are principal component analysis (PCA), singular value decomposition (SVD), and linear discriminant analysis (LDA), which project data onto a lower-dimensional space while preserving important information. Dimensionality reduction is performed during preprocessing to improve model performance, though it can discard useful information, so care is needed when applying these techniques.

Uploaded by

Shobha Kumari Choudhary

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

215 views

Introduction To Dimensionality Reduction

Uploaded by

Shobha Kumari Choudhary

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 5

Introduction to Dimensionality

Reduction
Machine Learning: As discussed in this article, machine learning is nothing but a
field of study which allows computers to “learn” like humans without any need of
explicit programming.
What is Predictive Modeling: Predictive modeling is a probabilistic process that
allows us to forecast outcomes, on the basis of some predictors. These predictors are
basically features that come into play when deciding the final result, i.e. the outcome
of the model.
Dimensionality reduction is the process of reducing the number of features (or
dimensions) in a dataset while retaining as much information as possible. This can
be done for a variety of reasons, such as to reduce the complexity of a model, to
improve the performance of a learning algorithm, or to make it easier to visualize the
data. There are several techniques for dimensionality reduction, including principal
component analysis (PCA), singular value decomposition (SVD), and linear
discriminant analysis (LDA). Each technique uses a different method to project the
data onto a lower-dimensional space while preserving important information.
What is Dimensionality Reduction?
Dimensionality reduction is a technique used to reduce the number of features in a
dataset while retaining as much of the important information as possible. In other
words, it is a process of transforming high-dimensional data into a lower-
dimensional space that still preserves the essence of the original data.
In machine learning, high-dimensional data refers to data with a large number of
features or variables. The curse of dimensionality is a common problem in machine
learning, where the performance of the model deteriorates as the number of features
increases. This is because the complexity of the model increases with the number of
features, and it becomes more difficult to find a good solution. In addition, high-
dimensional data can also lead to overfitting, where the model fits the training data
too closely and does not generalize well to new data.
Dimensionality reduction can help to mitigate these problems by reducing the
complexity of the model and improving its generalization performance. There are
two main approaches to dimensionality reduction: feature selection and feature
extraction.
Feature Selection:
Feature selection involves selecting a subset of the original features that are most
relevant to the problem at hand. The goal is to reduce the dimensionality of the
dataset while retaining the most important features. There are several methods for
feature selection, including filter methods, wrapper methods, and embedded
methods. Filter methods rank the features based on their relevance to the target
variable, wrapper methods use the model performance as the criteria for selecting
features, and embedded methods combine feature selection with the model training
process.
Feature Extraction:
Feature extraction involves creating new features by combining or transforming the
original features. The goal is to create a set of features that captures the essence of
the original data in a lower-dimensional space. There are several methods for feature
extraction, including principal component analysis (PCA), linear discriminant
analysis (LDA), and t-distributed stochastic neighbor embedding (t-SNE). PCA is a
popular technique that projects the original features onto a lower-dimensional space
while preserving as much of the variance as possible.
Why is Dimensionality Reduction important in Machine Learning and
Predictive Modeling?
An intuitive example of dimensionality reduction can be discussed through a simple
e-mail classification problem, where we need to classify whether the e-mail is spam
or not. This can involve a large number of features, such as whether or not the e-mail
has a generic title, the content of the e-mail, whether the e-mail uses a template, etc.
However, some of these features may overlap. In another condition, a classification
problem that relies on both humidity and rainfall can be collapsed into just one
underlying feature, since both of the aforementioned are correlated to a high degree.
Hence, we can reduce the number of features in such problems. A 3-D classification
problem can be hard to visualize, whereas a 2-D one can be mapped to a simple 2-
dimensional space, and a 1-D problem to a simple line. The below figure illustrates
this concept, where a 3-D feature space is split into two 2-D feature spaces, and
later, if found to be correlated, the number of features can be reduced even further.

Components of Dimensionality Reduction

There are two components of dimensionality reduction:
 Feature selection: In this, we try to find a subset of the original set of
variables, or features, to get a smaller subset which can be used to model
the problem. It usually involves three ways:
1. Filter
2. Wrapper
3. Embedded
 Feature extraction: This reduces the data in a high dimensional space to
a lower dimension space, i.e. a space with lesser no. of dimensions.
Methods of Dimensionality Reduction
The various methods used for dimensionality reduction include:
 Principal Component Analysis (PCA)
 Linear Discriminant Analysis (LDA)
 Generalized Discriminant Analysis (GDA)
Dimensionality reduction may be both linear and non-linear, depending upon the
method used. The prime linear method, called Principal Component Analysis, or
PCA, is discussed below.
Principal Component Analysis
This method was introduced by Karl Pearson. It works on the condition that while
the data in a higher dimensional space is mapped to data in a lower dimension space,
the variance of the data in the lower dimensional space should be maximum.

It involves the following steps:

 Construct the covariance matrix of the data.
 Compute the eigenvectors of this matrix.
 Eigenvectors corresponding to the largest eigenvalues are used to
reconstruct a large fraction of variance of the original data.
Hence, we are left with a lesser number of eigenvectors, and there might have been
some data loss in the process. But, the most important variances should be retained
by the remaining eigenvectors.
Advantages of Dimensionality Reduction
 It helps in data compression, and hence reduced storage space.
 It reduces computation time.
 It also helps remove redundant features, if any.
 Improved Visualization: High dimensional data is difficult to visualize,
and dimensionality reduction techniques can help in visualizing the data in
2D or 3D, which can help in better understanding and analysis.
 Overfitting Prevention: High dimensional data may lead to overfitting in
machine learning models, which can lead to poor generalization
performance. Dimensionality reduction can help in reducing the
complexity of the data, and hence prevent overfitting.
 Feature Extraction: Dimensionality reduction can help in extracting
important features from high dimensional data, which can be useful in
feature selection for machine learning models.
 Data Preprocessing: Dimensionality reduction can be used as a
preprocessing step before applying machine learning algorithms to reduce
the dimensionality of the data and hence improve the performance of the
model.
 Improved Performance: Dimensionality reduction can help in improving
the performance of machine learning models by reducing the complexity
of the data, and hence reducing the noise and irrelevant information in the
data.
Disadvantages of Dimensionality Reduction
 It may lead to some amount of data loss.
 PCA tends to find linear correlations between variables, which is
sometimes undesirable.
 PCA fails in cases where mean and covariance are not enough to define
datasets.
 We may not know how many principal components to keep- in practice,
some thumb rules are applied.
 Interpretability: The reduced dimensions may not be easily interpretable,
and it may be difficult to understand the relationship between the original
features and the reduced dimensions.
 Overfitting: In some cases, dimensionality reduction may lead to
overfitting, especially when the number of components is chosen based on
the training data.
 Sensitivity to outliers: Some dimensionality reduction techniques are
sensitive to outliers, which can result in a biased representation of the
data.
 Computational complexity: Some dimensionality reduction techniques,
such as manifold learning, can be computationally intensive, especially
when dealing with large datasets.

Important points:

 Dimensionality reduction is the process of reducing the number of features

in a dataset while retaining as much information as possible.
This can be done to reduce the complexity of a model, improve the
performance of a learning algorithm, or make it easier to visualize the
data.
 Techniques for dimensionality reduction include: principal component
analysis (PCA), singular value decomposition (SVD), and linear
discriminant analysis (LDA).
 Each technique projects the data onto a lower-dimensional space while
preserving important information.
 Dimensionality reduction is performed during pre-processing stage before
building a model to improve the performance
 It is important to note that dimensionality reduction can also discard useful
information, so care must be taken when applying these techniques.
This article is contributed by Anannya Uberoi. If you like GeeksforGeeks and
would like to contribute, you can also write an article
using write.geeksforgeeks.org or mail your article to review-
[email protected]. See your article appearing on the GeeksforGeeks main
page and help other Geeks. Please write comments if you find anything incorrect, or
if you want to share more information about the topic discussed above.

(Ebook) Generative Deep Learning by David Foster ISBN 9781492041948, 1492041947 - The ebook is available for instant download, no waiting required
100% (1)
(Ebook) Generative Deep Learning by David Foster ISBN 9781492041948, 1492041947 - The ebook is available for instant download, no waiting required
54 pages
CRISC Four Domains Brief
No ratings yet
CRISC Four Domains Brief
6 pages
Apache Cassandra Administrator Associate - Exam Practice Tests
From Everand
Apache Cassandra Administrator Associate - Exam Practice Tests
Cristian Scutaru
No ratings yet
Bayesian Learning Unit 3 PDF
No ratings yet
Bayesian Learning Unit 3 PDF
18 pages
Introduction To The Philosophy of The Human Person: Intersubjectivity
100% (2)
Introduction To The Philosophy of The Human Person: Intersubjectivity
29 pages
Dimensionality Reduction
No ratings yet
Dimensionality Reduction
4 pages
Lab Program
100% (1)
Lab Program
15 pages
Machine Learning Notes
No ratings yet
Machine Learning Notes
6 pages
Support Vector Machine - Explanation
No ratings yet
Support Vector Machine - Explanation
12 pages
ML Lab Final R22
No ratings yet
ML Lab Final R22
67 pages
21CS54 Aiml Module3 PPT
No ratings yet
21CS54 Aiml Module3 PPT
102 pages
Question Bank Module-1: Department of Computer Applications 18mca53 - Machine Learning
No ratings yet
Question Bank Module-1: Department of Computer Applications 18mca53 - Machine Learning
7 pages
Unit-3-Second Chapter
No ratings yet
Unit-3-Second Chapter
9 pages
Deep Learning (RCS-086) ppt-1 of Unit-1
100% (2)
Deep Learning (RCS-086) ppt-1 of Unit-1
14 pages
Tangent Prop and Manifold Tangent Classifier are b
No ratings yet
Tangent Prop and Manifold Tangent Classifier are b
4 pages
ML UNIT-2 Notes
No ratings yet
ML UNIT-2 Notes
15 pages
Unit - 3 ML
No ratings yet
Unit - 3 ML
17 pages
Unit - IV - DIMENSIONALITY REDUCTION AND GRAPHICAL MODELS
No ratings yet
Unit - IV - DIMENSIONALITY REDUCTION AND GRAPHICAL MODELS
59 pages
Pattern Recognition and Anomaly Detection Lab
No ratings yet
Pattern Recognition and Anomaly Detection Lab
3 pages
Unit -3-NNDL- Notes
No ratings yet
Unit -3-NNDL- Notes
17 pages
ML Mid Sem Question Bank
No ratings yet
ML Mid Sem Question Bank
11 pages
JNTUK R20 ML UNIT-I Final
No ratings yet
JNTUK R20 ML UNIT-I Final
22 pages
ML Unit 2
No ratings yet
ML Unit 2
90 pages
FIND-S Algorithm: Machine Learning 15CSL76
No ratings yet
FIND-S Algorithm: Machine Learning 15CSL76
3 pages
Video Tutorial: Machine Learning 17CS73
100% (2)
Video Tutorial: Machine Learning 17CS73
27 pages
JNTUK R20 B.Tech CSE 4-1 Deep Learning Techniques Unit 2 Notes
No ratings yet
JNTUK R20 B.Tech CSE 4-1 Deep Learning Techniques Unit 2 Notes
51 pages
Unit V Graphical Models
No ratings yet
Unit V Graphical Models
23 pages
Types of Classification Algorithm
No ratings yet
Types of Classification Algorithm
27 pages
SVM PRESENTATION
No ratings yet
SVM PRESENTATION
34 pages
Designing A Learning System
No ratings yet
Designing A Learning System
12 pages
Artificial Intelligence Module 5
No ratings yet
Artificial Intelligence Module 5
23 pages
ML Lab Observation
100% (1)
ML Lab Observation
44 pages
Artificial Intelligence: Adversarial Search
No ratings yet
Artificial Intelligence: Adversarial Search
36 pages
Loss Functions
No ratings yet
Loss Functions
37 pages
Unit 3
No ratings yet
Unit 3
99 pages
Chandigarh Group of Colleges College of Engineering Landran, Mohali
No ratings yet
Chandigarh Group of Colleges College of Engineering Landran, Mohali
47 pages
ML Question Bank - Beena Kapadia
No ratings yet
ML Question Bank - Beena Kapadia
3 pages
UNIT-4
No ratings yet
UNIT-4
79 pages
Unit 4
100% (1)
Unit 4
57 pages
Final Twitter - Sentiment - Analysis - Report
100% (1)
Final Twitter - Sentiment - Analysis - Report
14 pages
Machine Learning Unit 5
No ratings yet
Machine Learning Unit 5
43 pages
ML Question Bank
No ratings yet
ML Question Bank
29 pages
CS 3 - Problem Solving Agent
No ratings yet
CS 3 - Problem Solving Agent
80 pages
Feature Creation in Data Mining
No ratings yet
Feature Creation in Data Mining
5 pages
18AI61
No ratings yet
18AI61
3 pages
AI-UNIT-2 PPT
No ratings yet
AI-UNIT-2 PPT
135 pages
Machine Learning: Presentation
100% (2)
Machine Learning: Presentation
23 pages
ML Unit 1
No ratings yet
ML Unit 1
44 pages
CP5191 Machine Learning Techniques L T P C3 0 0 3
No ratings yet
CP5191 Machine Learning Techniques L T P C3 0 0 3
7 pages
CS3351 AIML UNIT 5 NOTES EduEngg
No ratings yet
CS3351 AIML UNIT 5 NOTES EduEngg
35 pages
1) Aim: Demonstration of Preprocessing of Dataset Student - Arff
No ratings yet
1) Aim: Demonstration of Preprocessing of Dataset Student - Arff
26 pages
Deep Learning Question Bank(2024-25)
No ratings yet
Deep Learning Question Bank(2024-25)
2 pages
Machine Learning: in Telugu
No ratings yet
Machine Learning: in Telugu
14 pages
Searching Sorting Notes Handwritten
No ratings yet
Searching Sorting Notes Handwritten
29 pages
Machine Learning Lab Manual 7
100% (1)
Machine Learning Lab Manual 7
8 pages
Unit 1 - Machine Learning
No ratings yet
Unit 1 - Machine Learning
21 pages
UNIT IV (Well Posed Leaning Problems)
100% (1)
UNIT IV (Well Posed Leaning Problems)
16 pages
ML Question Bank 2024
No ratings yet
ML Question Bank 2024
2 pages
Curse of Dimensionality
No ratings yet
Curse of Dimensionality
9 pages
Game Playing: Adversarial Search
No ratings yet
Game Playing: Adversarial Search
66 pages
Textbook of Engineering Chemistry
From Everand
Textbook of Engineering Chemistry
C. Parameswara Murthy
No ratings yet
Dimensionality Reduction
No ratings yet
Dimensionality Reduction
5 pages
Activation+Functions
No ratings yet
Activation+Functions
15 pages
K Means Clustering
No ratings yet
K Means Clustering
11 pages
Implementing PCA in Python With Scikit
No ratings yet
Implementing PCA in Python With Scikit
6 pages
Difference Between K Means and Hierarchical Clustering
No ratings yet
Difference Between K Means and Hierarchical Clustering
2 pages
Overview of Data Cleaning
No ratings yet
Overview of Data Cleaning
17 pages
Linear Equations-2
No ratings yet
Linear Equations-2
2 pages
SQL Sequences
No ratings yet
SQL Sequences
3 pages
SQL Query Processing10
No ratings yet
SQL Query Processing10
3 pages
SQL UNION Clause
No ratings yet
SQL UNION Clause
3 pages
SQL Part 1
No ratings yet
SQL Part 1
4 pages
SQL WITH Clause
No ratings yet
SQL WITH Clause
3 pages
A320 Family LGCIU Troubleshooting Guidelines
No ratings yet
A320 Family LGCIU Troubleshooting Guidelines
2 pages
TD Esc 02 de en 13 015 Rev000a Description Cold Climate
No ratings yet
TD Esc 02 de en 13 015 Rev000a Description Cold Climate
4 pages
Life Insurance Cover Letter
100% (1)
Life Insurance Cover Letter
8 pages
Meditation and Yoga As Alternative Therapy For Primary Dysmenorrhea
No ratings yet
Meditation and Yoga As Alternative Therapy For Primary Dysmenorrhea
6 pages
Introduction to Basics of Pharmacology And Toxicology Volume 2 Essentials of Systemic Pharmacology From Principles to Practice 1st Edition by Abialbon Paul, Nishanthi Anandabaskar, Jayanthi Mathaiyan, Gerard Marshall Raj 9813360089 9789813360082 instant download
100% (1)
Introduction to Basics of Pharmacology And Toxicology Volume 2 Essentials of Systemic Pharmacology From Principles to Practice 1st Edition by Abialbon Paul, Nishanthi Anandabaskar, Jayanthi Mathaiyan, Gerard Marshall Raj 9813360089 9789813360082 instant download
47 pages
Manual WaterCAD V8i - Guia Del Usuario (Ingles) (0601-0800) PDF
No ratings yet
Manual WaterCAD V8i - Guia Del Usuario (Ingles) (0601-0800) PDF
200 pages
Class 12 Physics 2023-24 Notes Chapter 2 - Electrostatic Potential and Capacitance
No ratings yet
Class 12 Physics 2023-24 Notes Chapter 2 - Electrostatic Potential and Capacitance
32 pages
Logical Division Paragraph - Fix
No ratings yet
Logical Division Paragraph - Fix
13 pages
An Evaluation of The National Museum Dumaguete
No ratings yet
An Evaluation of The National Museum Dumaguete
3 pages
BUS 405 original
No ratings yet
BUS 405 original
103 pages
Model Question For International Business, 2 Semester Masters in Business Studies
No ratings yet
Model Question For International Business, 2 Semester Masters in Business Studies
2 pages
126 en PDF
No ratings yet
126 en PDF
0 pages
Cebu Car Rental - Rent A Car in Cebu - Newest Cars at Lowest Rates.
No ratings yet
Cebu Car Rental - Rent A Car in Cebu - Newest Cars at Lowest Rates.
20 pages
Soil Liquefaction
No ratings yet
Soil Liquefaction
35 pages
India The Emerging Giant A Panagariya
No ratings yet
India The Emerging Giant A Panagariya
5 pages
Child Study Notes Ecd
No ratings yet
Child Study Notes Ecd
15 pages
English 10-Q3-Module 2-Writing An Independent Critique
No ratings yet
English 10-Q3-Module 2-Writing An Independent Critique
42 pages
ACRS Product Range
No ratings yet
ACRS Product Range
10 pages
Flushing Record To Be Updated and Review
No ratings yet
Flushing Record To Be Updated and Review
10 pages
SIGNIFICANT FIGURES Lesson Plan
50% (2)
SIGNIFICANT FIGURES Lesson Plan
4 pages
Nisterin Dirbana
No ratings yet
Nisterin Dirbana
56 pages
Tally.Prime
No ratings yet
Tally.Prime
2 pages
Manufacturing Engineering and Technology 7th Edition Serope Kalpakjian - The ebook with rich content is ready for you to download
100% (1)
Manufacturing Engineering and Technology 7th Edition Serope Kalpakjian - The ebook with rich content is ready for you to download
61 pages
Learner Activity Sheet/Worksheets in Understanding Culture Society and Politics
100% (1)
Learner Activity Sheet/Worksheets in Understanding Culture Society and Politics
5 pages
Protable Computer Group HW Department: First International Computer, Inc
No ratings yet
Protable Computer Group HW Department: First International Computer, Inc
54 pages
NC Tech Specs
No ratings yet
NC Tech Specs
5 pages
Thesis - Chapter 3
No ratings yet
Thesis - Chapter 3
5 pages
The Cawdor Complex
0% (1)
The Cawdor Complex
42 pages

Uploaded by

Uploaded by

Introduction to Dimensionality

Components of Dimensionality Reduction

It involves the following steps:

 Dimensionality reduction is the process of reducing the number of features

You might also like