LDA 01 Linear Discriminant Analysis

Linear discriminant analysis (LDA) is a technique used for classification and feature selection. It works by finding the linear combination of features that best separates two or more classes of objects. LDA finds directions, called discriminant functions, that maximize the separation between the classes. It assumes normally distributed data and equal class covariance matrices. LDA is commonly used for binary classification problems and extends naturally to problems with more than two classes. It is generally preferred over logistic regression when the normality assumption holds.

Linear discriminant analysis

• Need for a classification model
• An introduction to the classification setting through a case study
• Formal definition of LDA
• LDA for feature selection – using R
• LDA for classification – using R
• Mahalanobis distance, the linear discriminant function & Bayes theorem
• LDA vs PCA
• Industrial usage of LDA
• Handling special situations
Need for a classification model
BUSINESS SCENARIO
Business scenario – need for a model?
• Say there are 100,000 prospects
• Say 1,000 of them take up the product
• Business is unhappy with such a poor response rate
• Think of it – if $2 is the cost of a mailer, then mailing all 100,000 prospects costs $200,000 for 1,000 new customers, i.e. $200 per customer acquisition, right?
• Can we find a base where, by working on a smaller number of prospects (say 20,000), we can still get almost all the responders (say 900)?
• Note – there is no possibility of an exact match in real-life scenarios
• There is also only a very rare possibility of getting all the responders by working on part of the population
• The target is to get almost all the responders by working on only a small portion of the population
So the target is …
• To get almost all the responders by working on only part of the population
• Split the population (N, containing K responders) into two parts:
  – X% of the population N, containing Y% of all responders K, with Y > X
  – the remaining (1 − X)% of the population, containing (1 − Y)% of the responders
• Note the RGB concept:
  ✓ Green – the benchmark response rate
  ✓ Red – a higher response rate
  ✓ Blue – a lower response rate
• Work on the red / blue sections – the higher-response / lower-response parts of the population
Purpose of
LDA
Two purposes
• One: independent-variable selection (also called feature selection)
  – At times there are 1000s of independent variables, and one needs a quick selection of variables to make the problem manageable
  – A highly recommended technique for numeric variables, as it is computationally very efficient
  – One can then proceed to other techniques – like logistic regression / decision trees
• Two: classification of data
  – From k predictor variables, can we say what the value of the dependent variable will be?
  – A very useful technique for a binary dependent variable
  – A preferred technique for more than two classes
A classical case
OWNERS VS NON-OWNERS
Income and lot size

• A riding-mower manufacturer is interested in knowing who is likely to buy a riding mower and who is not
• The data shows income and lot size for owners and non-owners
Income and lot size

• Is it possible to find a line here that separates owners from non-owners?
• What is the use of the line?
• One can say: if the equation gives a value higher than the threshold line, then owner, else non-owner.
Income and lot size

• What is the misclassification here?
• What is the misclassification error rate?
• 3/24
Income and lot size

• What will be the quality of the best line that separates the two types?
• The one that gives the least misclassification error
• How can you simplify the process of checking "higher than the threshold value"?
Usage simplification
• Generate a score for each category
• The category with the highest score gives the predicted category
• Can be extended to more than two classes
• E.g. equipment buyers, entertainment buyers, and both-buyer customers, as in the sketch below

[Figure: bar chart of class scores per person – Person 01: both; Person 02: equipment; Person 03: entertainment; Person 04: equipment]
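A minimal sketch of this decision rule in R, with purely hypothetical score values for one customer:

    scores <- c(Equipment = 2.1, Entertainment = 1.4, Both = 3.0)  # hypothetical class scores
    names(which.max(scores))  # assign the class with the highest score -> "Both"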
Formal definition of
LINEAR DISCRIMINANT ANALYSIS
Linear discriminant analysis
• Is all about using numeric data to decide on a categorical outcome.
• It is very useful when it is reasonable to believe that the independent variables are normally distributed.
• It assumes that the correlation structure among predictors within a class is the same across all classes
  – So x1, x2, x3, …, xn have the same correlations when Y = 0 or Y = 1
  – Extended case: x1, x2, x3, …, xn have the same correlations when Y = "High Risk", Y = "Medium Risk" or Y = "Low Risk"
• It can easily be extended to more than two categories.
• Logistic regression and probit regression are more similar to LDA than ANOVA is, as they also explain a categorical variable by the values of continuous independent variables.
• When it is unreasonable to expect normality of the independent variables, prefer logistic regression over LDA.
Linear discriminant analysis
• If you think about it, from historical data we know
  – The probability of the dependent variable (Y) taking a particular class
  – And the conditional probability of observing X, for a given Y
• Example – suppose you know the distribution of height for each gender of Masters students in a particular zone

[Figure: overlapping height distributions, one per gender]

• Now, if you know the height, can you predict the gender?
• For extreme cases it is easy, but in the overlap region one goes for the group that gives the least misclassification.
• With new data, X is known. It is all about finding the probability of Y given X. This is obtained using Bayes theorem.
[Figure: two candidate directions D1 and D2; P1 and P2 mark the projected group means along D1]

• Consider various directions, such as directions D1 and D2 shown in the figure.
• One way to identify a good linear discriminant function is to choose, among all possible directions, the one with the property that when we project the observations (drop perpendicular lines from them) onto a line in the chosen direction, the group means of the projections (the feet of the perpendiculars, e.g. P1 and P2 in direction D1) are separated by the maximum possible distance.
[Figure: the same directions D1 and D2, plus a threshold line D3]

• Among all possible directions, we choose the one for which the projected group means (e.g. P1 and P2 in direction D1) are separated by the maximum possible distance.
• Which one appears to do the job better, D1 or D2?
• Once you have found the line, you can always use a line perpendicular to it to define the threshold. D3 is such a threshold line here.
When to apply
WHICH TECHNIQUE
Techniques at a glance

• Categorical independent variable, categorical dependent variable:
  chi-square for contingency table / classification tree (a type of decision tree)
• Categorical independent variable, numeric dependent variable:
  ANOVA / dummy-variable regression / regression tree
• Numeric independent variable, categorical dependent variable:
  logistic regression / linear discriminant analysis / probit regression / classification tree / support vector machine / artificial neural network
• Numeric independent variable, numeric dependent variable:
  linear regression / regression tree (a type of decision tree)
LDA for
FEATURE SELECTION
Fisher's ratio

[Figure: all data split into non-responders (Y=0) and responders (Y=1), with a mean and a variance computed for each group]

• Calculate the mean of the variable for each group
• Calculate the variance of the variable for each group
Fisher's ratio
Fisher's ratio is a measure of the (linear) discriminating power of a variable:

  F = (m1 − m2)² / (V1 + V2)

with m1 and m2 being the means of class 1 and class 2, and V1 and V2 the variances.
• Fisher's ratio will always be positive (why?)
• The greater the difference between m1 and m2, the greater the value of Fisher's ratio
• i.e. greater between-group variance means a greater Fisher's ratio
• The smaller V1 and V2 are, the greater the value of Fisher's ratio
• i.e. smaller within-group variance means a greater Fisher's ratio
Fisher's ratio

  F = (m1 − m2)² / (V1 + V2)

with m1 and m2 being the means of class 1 and class 2, and V1 and V2 the variances.

Quiz – which variable / scenario will have the higher Fisher's ratio?

[Figure: pairs of class distributions with different separations and spreads]
Rationale for Fisher's ratio
• What is the rationale behind this ratio? Let's see graphically.

[Figure: two pairs of class distributions with different within-group variances]

✓ Which one shows a clearer distinction between the two groups?
✓ So the smaller the sum of the variances, the clearer the distinction
Rationale for Fisher's ratio

[Figure: two pairs of class distributions with different mean separations]

• The greater the difference between the group means, the clearer the distinction between the two sets
Calculate
FISHER'S RATIO
Fisher's ratio
✓ One can select the variables having the higher Fisher's ratios
✓ This is standardization in some sense (why?)
✓ Steps to calculate Fisher's ratio for data with predictors X1, X2, X3, … and binary Y:
  1. Split the data into the Y=0 and Y=1 subsets
  2. For each variable, compute the group means and variances, then the ratio:

  Var   Mean (Y=0)   Var (Y=0)   Mean (Y=1)   Var (Y=1)   Fisher's ratio
  X1    MX10         VX10        MX11         VX11        (MX11 − MX10)² / (VX10 + VX11)
  X2    MX20         VX20        MX21         VX21        (MX21 − MX20)² / (VX20 + VX21)
  X3    …            …           …            …           …

Demo using R and Excel (a sketch follows below)
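A minimal R sketch of these steps, assuming a data frame df whose columns are numeric predictors plus a 0/1 response column Y (the names df and Y are assumptions):

    fisher_ratio <- function(x, y) {
      m0 <- mean(x[y == 0]); m1 <- mean(x[y == 1])  # group means
      v0 <- var(x[y == 0]);  v1 <- var(x[y == 1])   # group variances
      (m1 - m0)^2 / (v0 + v1)                       # (MX1 - MX0)^2 / (VX0 + VX1)
    }
    ratios <- sapply(df[setdiff(names(df), "Y")], fisher_ratio, y = df$Y)
    sort(ratios, decreasing = TRUE)                 # select the top-ranked variables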
Feel the formula and method

• Avoid this method when more than 20% of observations have missing values for the numeric independent variable, in either the dependent-variable = 0 or = 1 population
• Picking variables with higher Fisher's ratios will
  – maximize the between-group variance
  – minimize the within-group variance
• It is computationally undemanding
  – You are just creating two datasets, for responders and non-responders
  – And then calculating means and variances
  – Statistical procedures are optimized for exactly this
LDA for
CLASSIFICATION
Intuitively

[Figure: scatter of non-responders (Y=0) and responders (Y=1), with a new object to classify]

• Calculate the distance of the new object from the mean of each population
• Assign it to the group to which it is closest
Step by step
1. Calculate the distance of the new object from each center
2. Use a function of the distance (rather than the direct distance) that minimizes the misclassification error
3. Generate a score for each class
4. The score is like the probability of belonging to the class
5. Assign the object to the group for which it has the highest score
How do we calculate
DISTANCE
Euclidean distance

• The distance between objects i and j is given by the Euclidean distance:

  d(i, j) = sqrt(|x_i1 − x_j1|² + |x_i2 − x_j2|² + … + |x_ip − x_jp|²)

• Properties
  – d(i, j) ≥ 0
  – d(i, i) = 0
  – d(i, j) = d(j, i)
  – d(i, j) ≤ d(i, k) + d(k, j)

• In two dimensions, for points (U1, V1) and (U2, V2):

  D = ((U1 − U2)² + (V1 − V2)²)^(1/2)
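A quick check of the formula in R, using the first two records of the table on the next slide (income in $ 000's, lot size in 000's sq ft):

    pts <- rbind(c(75.0, 19.6), c(52.8, 20.8))
    dist(pts)                                  # built-in Euclidean distance
    sqrt((75.0 - 52.8)^2 + (19.6 - 20.8)^2)    # the same value from the formula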
Euclidean distance issue

The same data in three different unit systems:

  A: Income ($ 000's) / Lot size (000's sq ft):
     75.0 / 19.6, 52.8 / 20.8, 64.8 / 17.2, 43.2 / 20.4, 84.0 / 17.6, 49.2 / 17.6
  B: Income ($) / Lot size (000's sq ft):
     75,000 / 19.6, 52,800 / 20.8, 64,800 / 17.2, 43,200 / 20.4, 84,000 / 17.6, 49,200 / 17.6
  C: Income ($ 000's) / Lot size (sq ft):
     75.0 / 19,600, 52.8 / 20,800, 64.8 / 17,200, 43.2 / 20,400, 84.0 / 17,600, 49.2 / 17,600

• Do you think the three tables will give the same Euclidean distances between the objects?
• Table B will compute distances primarily on the basis of income
• Table C will compute distances primarily on the basis of lot size
• So Euclidean distance is affected by scale
• What is the way out?
• Standardization: z = (x − x_mean) / x_std_deviation
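A sketch of the fix in R, computing distances on the raw and the z-standardized versions of table C (the unit mix where lot size dominates):

    d <- data.frame(Income  = c(75.0, 52.8, 64.8, 43.2, 84.0, 49.2),
                    LotSize = c(19600, 20800, 17200, 20400, 17600, 17600))
    dist(d)         # dominated by lot size, the larger-scale variable
    dist(scale(d))  # z-standardization puts both variables on an equal footing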
Euclidean distance issue
• But even after standardization, one more issue remains
• Take a look at the table below:

  Income ($ 000's)   Lot size (000's sq ft)   Saving ($ 000's)
  75.0               20                       27.9
  52.8               21                       23.3
  64.8               17                       28.6
  43.2               20                       19.3
  84.0               18                       34.8
  49.2               18                       23.0

• Do you suspect the variables are a little correlated here?
• If variables are correlated, then aren't you counting the same impact multiple times?
• You need a better method – one that removes the scaling impact as well as the collinearity impact of the variables.
• Mahalanobis distance is that technique.
What is
MAHALANOBIS DISTANCE
Mahalanobis distance
• The Mahalanobis distance measure does the following:
  – it transforms the variables into uncorrelated variables
  – and makes their variances equal to 1,
  – and then calculates the simple Euclidean distance.
• The formula, with x the observation vector, m the mean vector, and S the variance-covariance matrix of the variables (X1, X2, …, Xn), is:

  D² = (x − m)ᵀ S⁻¹ (x − m)
Mahalanobis distance
• Intuitively: take the vector of variables (x1, x2, x3, …, xn), measure its distance from the mean vector, and divide by the covariance matrix.
• Isn't it like multivariate standardization?
  – Univariate standardization: z = (x − m) / s
  – Multivariate standardization: D² = (x − m)ᵀ (var-cov matrix)⁻¹ (x − m)
• Please note the formula measures D², and that is why you divide by the variance, not the standard deviation.
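A minimal sketch using the built-in stats::mahalanobis() on the income / lot size / saving table above; note it returns the squared distance D², matching the remark about dividing by the variance:

    d <- data.frame(Income  = c(75.0, 52.8, 64.8, 43.2, 84.0, 49.2),
                    LotSize = c(20, 21, 17, 20, 18, 18),
                    Saving  = c(27.9, 23.3, 28.6, 19.3, 34.8, 23.0))
    mahalanobis(d, center = colMeans(d), cov = cov(d))  # D^2 of each row from the mean vector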
Impact

[Figure: two charts illustrating the impact]
An example
• Can be seen at
• http://www.jennessent.com/arcview/mahalanobis_description.htm
• The example is about bivariate data with X and Y as variables, where:

  Variable   Mean   Std deviation
  X          500    79.32
  Y          500    79.25

• For an observation with X = 410 and Y = 400, it shows the calculation of the Mahalanobis distance
• Let's see the Mahalanobis calculation
A screen grab

[Screen grab from the linked page, showing the step-by-step Mahalanobis calculation]
Next steps
DISCRIMINANT FUNCTION
Steps
1. Calculate the distance of the new object from each center
2. Use a function of the distance (rather than the direct distance) that minimizes the misclassification error
3. Generate a score for each class
4. The score is like the probability of belonging to the class
5. Assign the object to the group for which it has the highest score
6. We will use R to do all of this

• Discriminant analysis finds a set of linear combinations of the variables whose values are
  ✓ as close as possible within groups and
  ✓ as far apart as possible between groups.
• The linear combinations are called discriminant functions
• For k classes, we need k − 1 discriminant functions (why?)
• Discriminant functions are obtained using Bayes theorem.
Linear discriminant function
• Is expected to reduce overlap
• Which results in a lower misclassification error
• The two main approaches for doing so are:
  – reduce the within-group variance
  – maximize the difference between the means
One such example
http://www.ismll.uni-hildesheim.de/lehre/ml-08w/skript/classification1.pdf

[Figure: two projections of the same data]

• Left (projection onto the thick green line joining the centers of the two populations): almost uniform distribution, higher overlap
• Right (the line joining the means is rotated to a better direction): close to normal distribution, low overlap
Bayes theorem
BY EXAMPLE
A scenario
• A firm has found that the overall probability of a gas leakage is 2% (so 98% no leakage).
• They install a sensor to detect leakage.
• The sensor sounds the alarm 99% of the time when there is a leakage (1% no alarm).
• However, it also sounds the alarm 5% of the time when there is no leakage (95% no alarm).
• If the alarm has sounded, then what is the probability of a leakage?
Calculate
• P(leakage) = 2%, P(alarm | leakage) = 99%, P(alarm | no leakage) = 5%
• Probability of alarm = 0.02 × 0.99 + 0.98 × 0.05 = 0.0693
• Probability of leakage given the alarm has sounded =
  – 0.02 × 0.99 / (0.02 × 0.99 + 0.98 × 0.05)
  – = 0.0198 / 0.0693 = 1/3.5 (or 28.6%)
• So out of every seven alarms, how many will be false alarms?
• Let's discuss intuitively
• Prior & posterior probability – the probability of an outcome before and after the event. Here, prior = 2%, posterior = 28.6%.
• What is the event here?
• What were the prior probability and the posterior probability after an alarm?
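The same arithmetic as a few lines of R, just to verify the numbers:

    p_leak        <- 0.02  # prior probability of leakage
    p_alarm_leak  <- 0.99  # P(alarm | leakage)
    p_alarm_clear <- 0.05  # P(alarm | no leakage)
    p_alarm <- p_leak * p_alarm_leak + (1 - p_leak) * p_alarm_clear  # 0.0693
    p_leak * p_alarm_leak / p_alarm                                  # posterior ~ 0.286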
In equation format
• P(leakage) = 2%, P(alarm | leakage) = 99%, P(alarm | no leakage) = 5%
• Probability of alarm = 0.02 × 0.99 + 0.98 × 0.05 = 0.0693
• Probability of leakage given the alarm:

  P(leakage (L) | alarm (A)): P(L|A) = P(A|L) · P(L) / (P(A|L) · P(L) + P(A|NL) · P(NL))
In equation format

  P(L|A) = P(A|L) · P(L) / (P(A|L) · P(L) + P(A|NL) · P(NL))

More generally, suppose that B1, B2, B3, …, Bn partition the outcomes of an experiment and that A is another event. For any number k with 1 <= k <= n, we have the formula:

  P(Bk | A) = P(A | Bk) · P(Bk) / Σ_{i=1..n} P(A | Bi) · P(Bi)

[Figure: event A overlapping a partition B1, B2, B3, B4]
Demo for two-class LDA
USING R
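A minimal sketch of such a demo with MASS::lda (the standard R function); the data frame mowers with columns Income, LotSize and a factor Ownership is an assumed stand-in for the riding-mower data:

    library(MASS)
    fit  <- lda(Ownership ~ Income + LotSize, data = mowers)  # fit the discriminant function
    pred <- predict(fit, mowers)
    table(Predicted = pred$class, Actual = mowers$Ownership)  # misclassification table
    head(pred$posterior)                                      # per-class scores (posterior probabilities)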
What is the difference between
LDA AND PCA
Quiz
• What is the difference between LDA & PCA?
  – PCA works on the Xs alone
  – LDA works on the Xs with respect to Y

http://stackoverflow.com/questions/33576963/dimensions-reduction-in-matlab-using-pca
Quiz
• What is the difference between LDA & PCA?

  LDA                                         PCA
  Discovers the relationship between the      Discovers relationships among the
  dependent & independent variables           independent variables
  Used for variable reduction based on the    Used for reducing variables based on the
  strength of the relationship between the    collinearity of the independent variables
  independent and dependent variables
  Used for prediction of classes              –
  Finds the direction that maximizes the      Finds the direction that maximizes the
  difference between the two classes          variance in the data
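A side-by-side sketch on the built-in iris data: prcomp() never sees the class labels, while MASS::lda() is driven by them:

    library(MASS)
    pca <- prcomp(iris[, 1:4], scale. = TRUE)  # directions of maximum variance (Species unused)
    lda_fit <- lda(Species ~ ., data = iris)   # directions of maximum class separation
    pca$rotation[, 1]                          # loadings of the first principal component
    lda_fit$scaling[, 1]                       # coefficients of the first linear discriminant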
Extension of LDA
TO MORE THAN TWO CLASSES
IRIS data
Data details

  setosa   versicolor   virginica
  50       50           50

[Figure: scatter plot of my_iris$Petal.Width vs my_iris$Petal.Length, by species]

  Species      mean_pet_length   mean_pet_Width
  1 setosa     1.462             0.246
  2 versicolor 4.260             1.326
  3 virginica  5.552             2.026
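A sketch of the three-class extension on this data; with k = 3 species, lda() returns k − 1 = 2 discriminant functions:

    library(MASS)
    fit <- lda(Species ~ Petal.Length + Petal.Width, data = iris)
    fit$scaling                                            # two discriminant functions (LD1, LD2)
    pred <- predict(fit)
    table(Predicted = pred$class, Actual = iris$Species)   # confusion matrix for the 3 classes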


Industrial use of LDA
CLASSIFICATION – SAME APPLIES TO CA, LOGISTIC ETC.
Analytical tool for reducing attrition
• LDA / classification tree / logistic regression is a very handy tool in this respect
• Note: the objective function is binary here, and the tool is generic in nature
• It can help the business know which profiles have a high probability of attrition
• So that efforts can be made to keep those customers
Analytical tool for cross-sell
• LDA / classification tree / logistic regression is a very handy tool in this respect
• Same chart, but the objective function is different
• The tool is generic and has wide usage
• It can help the business know which profiles have a high probability of taking the cross-sell product
• So that effort can be optimized for better gains
Campaigns
• Campaigns are an important marketing tool
• Here you give a targeted, i.e. customer-specific, offer
• And track the results to know where you gained a good response
• Classification methods can be used to predict the profiles where we can get the best response.
Handling special cases
IN LDA
For biased sampling
• If the frequency of the dependent variable in the sample does not reflect reality,
• then one needs to incorporate the prior (real) probabilities of class membership:
  – Add log(pj) to the classification function for class j
  – where pj is the probability that a case belongs to class j
• Example
  – Say the sample data contains 50% owners and 50% non-owners
  – However, the actual population has 20% owners and 80% non-owners; then adjust the scores with ln(0.8) and ln(0.2):

    posterior.N   posterior.Y   adjusted score.N   adjusted score.Y
    0.152         0.848         ln(0.8) + 0.152    ln(0.2) + 0.848

  – If needed, one can derive new_posterior.N = posterior.N / (posterior.N + posterior.Y) so the adjusted values again sum to 1
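In R, the cleanest route is the prior argument of MASS::lda, which performs exactly this adjustment; a sketch reusing the assumed mowers data frame (priors must follow the order of the factor levels, here taken as non-owner then owner):

    library(MASS)
    # trained on a 50/50 sample, but scored with the real 80/20 class priors
    fit <- lda(Ownership ~ Income + LotSize, data = mowers, prior = c(0.8, 0.2))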
For unequal misclassification costs
• At times, misclassification errors have different costs
• Example – better to double-check a possible cancer patient than to let one go when in doubt
• One needs to incorporate the costs of classification errors:
  – Add log(Cj) to the classification function for class j
  – where Cj is the misclassification cost for class j
  – If the Cj are not known, then add an assumed ln(C1/C2) to one group and ln(1) = 0 to the other
• Example
  – If you have even a little suspicion of cancer, you would like to be doubly sure
  – Say you want to be 50 times more sure when saying "no cancer, good health" than when saying "yes, may have cancer, a worrisome situation"; then adjust the scores with ln(50):

    posterior.N   posterior.Y   adjusted score.N   adjusted score.Y
    0.88          0.12          0.88               ln(50) + 0.12

  – Here again, new_posterior.N & new_posterior.Y can be derived to make the sum = 1
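A hedged sketch of the cost adjustment, following the slide's recipe; the score values and class names are illustrative only:

    score     <- c(no_cancer = 0.88, cancer = 0.12)           # unadjusted class scores
    score_adj <- score + c(no_cancer = 0, cancer = log(50))   # add ln(50) to the costly class
    names(which.max(score_adj))                               # cost-adjusted classification
    score_adj / sum(score_adj)                                # rescale so the values sum to 1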
