Maximum Likelihood Estimation (MLE)

1) Maximum likelihood estimation (MLE) involves choosing parameters that maximize the likelihood function to obtain estimates. 2) Under regularity conditions like independent and identically distributed observations, the MLE is a consistent estimator that converges in probability to the true parameter values as the sample size increases. 3) The MLE is a function of sufficient statistics and has desirable properties like invariance to transformations of parameters.

Uploaded by

Juan Eduardo

Econ 620

Maximum Likelihood Estimation (MLE)

Definition of MLE
• Consider a parametric model in which the joint distribution of $Y = (y_1, y_2, \dots, y_n)$ has a density $\ell(Y; \theta)$ with respect to a dominating measure $\mu$, where $\theta \in \Theta \subset \mathbb{R}^p$.

Definition 1 A maximum likelihood estimator of $\theta$ is a solution to the maximization problem

$$\max_{\theta \in \Theta} \ell(y; \theta)$$

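• Since the solution to an optimization problem is invariant to a strictly increasing transformation of the objective function, an MLE can equivalently be obtained as a solution to the problem

$$\max_{\theta \in \Theta} \log \ell(y; \theta) = \max_{\theta \in \Theta} L(y; \theta)$$

where $L(y; \theta) = \log \ell(y; \theta)$ denotes the log-likelihood function.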
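The equivalence of the two maximization problems can be checked numerically. The sketch below is only an illustration under assumptions that are not in the notes: an exponential density $f(y; \theta) = \theta e^{-\theta y}$, a made-up five-point sample, and a coarse grid standing in for a compact parameter space.

```python
import math

# Hypothetical sample, assumed to come from f(y; theta) = theta * exp(-theta * y).
sample = [0.8, 1.3, 0.2, 2.1, 0.6]

def likelihood(theta, ys):
    """Joint density of an i.i.d. exponential sample, viewed as a function of theta."""
    value = 1.0
    for y in ys:
        value *= theta * math.exp(-theta * y)
    return value

def log_likelihood(theta, ys):
    """Logarithm of the joint density above."""
    return sum(math.log(theta) - theta * y for y in ys)

# Coarse grid over the (assumed) compact parameter space [0.01, 5.00].
grid = [k / 100 for k in range(1, 501)]

argmax_lik = max(grid, key=lambda t: likelihood(t, sample))
argmax_loglik = max(grid, key=lambda t: log_likelihood(t, sample))

print(argmax_lik, argmax_loglik)
```

Both searches should select the same grid point, because the logarithm is strictly increasing and therefore preserves the ordering of the objective values.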

Proposition 2 (Sufficient condition for existence) If the parameter space $\Theta$ is compact and if the likelihood function $\theta \mapsto \ell(y; \theta)$ is continuous on $\Theta$, then there exists an MLE.

Proposition 3 (Sufficient condition for uniqueness of MLE) If the parameter space $\Theta$ is convex and if the likelihood function $\theta \mapsto \ell(y; \theta)$ is strictly concave in $\theta$, then the MLE is unique when it exists.

• If the observations on $Y$ are i.i.d. with density $f(y_i; \theta)$ for each observation, then we can write the likelihood function as

$$\ell(y; \theta) = \prod_{i=1}^{n} f(y_i; \theta) \;\Rightarrow\; L(y; \theta) = \sum_{i=1}^{n} \log f(y_i; \theta)$$
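One practical reason to work with the sum of log densities rather than the product of densities is numerical: the product underflows quickly as $n$ grows. The following sketch illustrates this with simulated data. The exponential model, the rate value 2.0, the seed, and the sample size are all assumptions made for the illustration.

```python
import math
import random

rng = random.Random(0)      # fixed seed so the run is reproducible
theta_true = 2.0            # assumed "true" rate used to simulate the data
ys = [rng.expovariate(theta_true) for _ in range(5000)]

# Likelihood as a product of densities theta * exp(-theta * y):
# with thousands of factors the product underflows to 0.0 in floating point.
lik = 1.0
for y in ys:
    lik *= theta_true * math.exp(-theta_true * y)

# Log-likelihood as a sum of log densities: stays finite.
loglik = sum(math.log(theta_true) - theta_true * y for y in ys)

# For this model the first-order condition n/theta - sum(y) = 0
# can be solved in closed form.
theta_hat = len(ys) / sum(ys)

print(lik, loglik, theta_hat)
```

The closed-form estimate `n / sum(y)` is specific to the exponential model assumed here; in general the first-order condition has to be solved numerically.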

Properties of MLE
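Proposition 4 (Functional invariance of MLE) Suppose $g : \Theta \to \Lambda$ is a bijective function, where $\Lambda \subset \mathbb{R}^q$, and $\hat\theta$ is an MLE of $\theta$. Then $\hat\lambda = g(\hat\theta)$ is an MLE of $\lambda \in \Lambda$.

⇒ By definition of MLE, we have

$$\hat\theta \in \Theta \quad \text{and} \quad \ell(y; \hat\theta) \ge \ell(y; \theta), \quad \forall \theta \in \Theta$$

or equivalently,

$$\hat\lambda \in \Lambda \quad \text{and} \quad \ell\left(y; g^{-1}(\hat\lambda)\right) \ge \ell\left(y; g^{-1}(\lambda)\right), \quad \forall \lambda \in \Lambda$$

which implies that $\hat\lambda = g(\hat\theta)$ is an MLE of $\lambda$ in a model with density $\ell\left(y; g^{-1}(\lambda)\right)$.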
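As a small illustration of functional invariance, consider reparametrizing an exponential model from its rate $\theta$ to its mean $\lambda = g(\theta) = 1/\theta$. The model, the sample, and the grid below are assumptions chosen for the sketch; they do not come from the notes.

```python
import math

ys = [0.5, 1.5, 0.25, 0.75, 1.0]   # hypothetical sample; sum = 4.0, n = 5

def loglik_rate(theta):
    """Log-likelihood in the rate parametrization f(y; theta) = theta * exp(-theta * y)."""
    return sum(math.log(theta) - theta * y for y in ys)

def g(theta):
    """Bijective reparametrization on (0, inf): rate theta -> mean lam = 1 / theta."""
    return 1.0 / theta

def g_inv(lam):
    """Inverse map: mean lam -> rate theta."""
    return 1.0 / lam

def loglik_mean(lam):
    """Log-likelihood of the reparametrized model, i.e. evaluated at g^{-1}(lam)."""
    return loglik_rate(g_inv(lam))

grid = [k / 100 for k in range(1, 501)]
theta_hat = max(grid, key=loglik_rate)   # grid MLE of theta
lam_hat = max(grid, key=loglik_mean)     # grid MLE of lam, maximized directly

print(theta_hat, lam_hat, g(theta_hat))
```

Maximizing directly over $\lambda$ and transforming the maximizer over $\theta$ should give the same value, which is what the proposition asserts.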

Proposition 5 (Relationship with sufficiency) MLE is a function of every sufficient statistic.

⇒ Let $S(Y)$ be a sufficient statistic. By the factorization theorem for sufficient statistics, the density function can be written as $\ell(y; \theta) = \Psi(S(y); \theta)\, h(y)$, i.e., $L(y; \theta) = \log \Psi(S(y); \theta) + \log h(y)$. Hence maximizing $\ell(y; \theta)$ with respect to $\theta$ is equivalent to maximizing $\log \Psi(S(y); \theta)$ with respect to $\theta$. Therefore, the MLE depends on $Y$ only through $S(Y)$.
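A minimal sketch of this argument, assuming a Bernoulli model (not used in the notes) in which $S(y) = \sum_i y_i$ is sufficient and $h(y) = 1$: two samples that differ only in ordering share the same value of $S$, and so they produce the same log-likelihood and the same MLE.

```python
import math

def loglik(p, ys):
    """Bernoulli log-likelihood computed observation by observation."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in ys)

def loglik_from_statistic(p, s, n):
    """The same log-likelihood written through S(y) = sum(y): log Psi(S(y); p)."""
    return s * math.log(p) + (n - s) * math.log(1.0 - p)

sample_a = [1, 1, 0, 0, 1, 0, 0, 0]
sample_b = [0, 0, 0, 1, 0, 1, 0, 1]   # different ordering, same sum (3 out of 8)

grid = [k / 1000 for k in range(1, 1000)]
mle_a = max(grid, key=lambda p: loglik(p, sample_a))
mle_b = max(grid, key=lambda p: loglik(p, sample_b))

print(mle_a, mle_b)
```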

• To discuss the asymptotic properties of MLE, which are the reason we study and use MLE in practice, we need some so-called regularity conditions. These conditions should be verified, not taken for granted, before we use MLE. In practice, though, they are difficult, and often impossible, to check.

Regularity Conditions
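1. The variables $Y_i$, $i = 1, 2, \dots$ are independent and identically distributed with density $f(y; \theta)$.
2. The parameter space $\Theta$ is compact.
3. The true but unknown parameter value $\theta_0$ is identified, i.e.

$$\theta_0 = \arg\max_{\theta \in \Theta} E_{\theta_0} \log f(Y_i; \theta)$$

4. The log-likelihood function

$$L(y; \theta) = \sum_{i=1}^{n} \log f(y_i; \theta)$$

is continuous in $\theta$.
5. $E_{\theta_0} \log f(Y; \theta)$ exists.
6. The log-likelihood function is such that $\frac{1}{n} L(y; \theta)$ converges almost surely (in probability) to $E_{\theta_0} \log f(Y_i; \theta)$ uniformly in $\theta \in \Theta$, i.e.,

$$\sup_{\theta \in \Theta} \left| \frac{1}{n} L(y; \theta) - E_{\theta_0} \log f(Y_i; \theta) \right| \to 0 \quad \text{almost surely (in probability)},$$

that is, the supremum falls below any given $\delta > 0$ as $n \to \infty$.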

Proposition 6 Under 1 - 6, there exists a sequence of MLEs converging almost surely (in probability) to the true parameter value $\theta_0$. That is, the MLE is a consistent estimator.

⇒ 1, 2 and 4 ensure the existence of an MLE $\hat\theta_n$ (compactness and continuity, as in Proposition 2). It is obtained by maximizing $L(y; \theta)$ or, equivalently, $\frac{1}{n} L(y; \theta)$. Since $\frac{1}{n} L(y; \theta) = \frac{1}{n} \sum_{i=1}^{n} \log f(y_i; \theta)$ can be interpreted as the sample mean of the random variables $\log f(y_i; \theta)$, which are i.i.d., the objective function converges almost surely (in probability) to $E_{\theta_0} \log f(Y; \theta)$ by the strong (weak) law of large numbers. Furthermore, the uniform strong law of large numbers (condition 6) implies that the maximizer $\hat\theta_n$ of $\frac{1}{n} \sum_{i=1}^{n} \log f(y_i; \theta)$ converges to the solution to the limit problem

$$\max_{\theta \in \Theta} E_{\theta_0} \log f(Y; \theta)$$

i.e.,

$$\max_{\theta \in \Theta} \int_{\mathcal{Y}} \log f(y; \theta)\, f(y; \theta_0)\, dy$$

Finally, the identifiability condition 3 ensures that the solution to the limit problem is $\theta_0$, so $\hat\theta_n$ converges to $\theta_0$.
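Consistency can be illustrated with a small simulation. The sketch below assumes an exponential model with true rate 2.0 (an arbitrary choice) and uses the closed-form MLE $n / \sum_i y_i$. The natural parameter space $(0, \infty)$ is not compact, so condition 2 holds only if $\Theta$ is restricted to a closed bounded interval containing the true rate.

```python
import random

rng = random.Random(12345)   # fixed seed for reproducibility
theta0 = 2.0                 # assumed true rate of the exponential model

def mle_rate(n):
    """MLE of the exponential rate from n simulated draws: n / sum(y)."""
    total = sum(rng.expovariate(theta0) for _ in range(n))
    return n / total

# Absolute estimation error for increasing sample sizes.
errors = {n: abs(mle_rate(n) - theta0) for n in (10, 100, 10_000, 200_000)}
for n, err in errors.items():
    print(n, round(err, 4))
```

The error for any single sample size is random, so the sequence need not decrease monotonically, but it should become small for the largest samples.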

More regularity conditions for asymptotic distribution

2'. $\theta_0 \in \mathrm{Int}(\Theta)$.
7. The log-likelihood function $L(y; \theta)$ is twice continuously differentiable in a neighborhood of $\theta_0$.
8. Integration and differentiation operators are interchangeable.
9. The matrix

$$I(\theta_0) = E_{\theta_0} \left[ -\frac{\partial^2 \log f(Y; \theta_0)}{\partial\theta\, \partial\theta'} \right],$$

called the information matrix, exists and is non-singular.

• The additional assumptions enable us to use differential methods to obtain the MLE and its asymptotic distribution.

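Lemma 7

$$E_{\theta_0} \left[ \frac{\partial \log f(Y; \theta_0)}{\partial\theta} \right] = 0.$$

⇒

$$
\begin{aligned}
E_{\theta_0} \left[ \frac{\partial \log f(Y; \theta_0)}{\partial\theta} \right]
&= \int \frac{\partial \log f(y; \theta_0)}{\partial\theta}\, f(y; \theta_0)\, dy \\
&= \int \frac{1}{f(y; \theta_0)} \frac{\partial f(y; \theta_0)}{\partial\theta}\, f(y; \theta_0)\, dy
 = \int \frac{\partial f(y; \theta_0)}{\partial\theta}\, dy
\end{aligned}
$$

However,

$$\int f(y; \theta_0)\, dy = 1 \quad \text{by definition.}$$

Hence, differentiating with respect to $\theta$ gives

$$\frac{\partial}{\partial\theta} \int f(y; \theta_0)\, dy = \int \frac{\partial f(y; \theta_0)}{\partial\theta}\, dy = 0$$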
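Lemma 8

$$E_{\theta_0} \left[ \frac{\partial \log f(Y; \theta_0)}{\partial\theta} \frac{\partial \log f(Y; \theta_0)}{\partial\theta'} \right] = E_{\theta_0} \left[ -\frac{\partial^2 \log f(Y; \theta_0)}{\partial\theta\, \partial\theta'} \right]$$

⇒

$$
\begin{aligned}
E_{\theta_0} \left[ \frac{\partial^2 \log f(Y; \theta_0)}{\partial\theta\, \partial\theta'} \right]
&= \int \frac{\partial^2 \log f(y; \theta_0)}{\partial\theta\, \partial\theta'}\, f(y; \theta_0)\, dy
 = \int \frac{\partial}{\partial\theta} \left[ \frac{\partial \log f(y; \theta_0)}{\partial\theta'} \right] f(y; \theta_0)\, dy \\
&= \int \frac{\partial}{\partial\theta} \left[ \frac{1}{f(y; \theta_0)} \frac{\partial f(y; \theta_0)}{\partial\theta'} \right] f(y; \theta_0)\, dy \\
&= \int \left[ -\frac{1}{\left(f(y; \theta_0)\right)^2} \frac{\partial f(y; \theta_0)}{\partial\theta} \frac{\partial f(y; \theta_0)}{\partial\theta'} + \frac{1}{f(y; \theta_0)} \frac{\partial^2 f(y; \theta_0)}{\partial\theta\, \partial\theta'} \right] f(y; \theta_0)\, dy \\
&= -\int \frac{1}{f(y; \theta_0)} \frac{\partial f(y; \theta_0)}{\partial\theta}\, \frac{1}{f(y; \theta_0)} \frac{\partial f(y; \theta_0)}{\partial\theta'}\, f(y; \theta_0)\, dy + \int \frac{\partial^2 f(y; \theta_0)}{\partial\theta\, \partial\theta'}\, dy \\
&= -\int \frac{\partial \log f(y; \theta_0)}{\partial\theta} \frac{\partial \log f(y; \theta_0)}{\partial\theta'}\, f(y; \theta_0)\, dy
 = -E_{\theta_0} \left[ \frac{\partial \log f(Y; \theta_0)}{\partial\theta} \frac{\partial \log f(Y; \theta_0)}{\partial\theta'} \right]
\end{aligned}
$$

The last line follows from the fact that $\int \frac{\partial^2 f(y; \theta_0)}{\partial\theta\, \partial\theta'}\, dy = 0$.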
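Both lemmas can be verified exactly in a model with finite support. The sketch below assumes a Bernoulli density $f(y; p) = p^y (1-p)^{1-y}$ with $p_0 = 0.3$. Neither the model nor the value comes from the notes. The expectations are computed by summing over the two support points.

```python
def score(y, p):
    """First derivative of log f(y; p) with respect to p."""
    return y / p - (1 - y) / (1 - p)

def hessian(y, p):
    """Second derivative of log f(y; p) with respect to p."""
    return -y / p**2 - (1 - y) / (1 - p) ** 2

def expect(fn, p):
    """Exact expectation under Bernoulli(p): weighted sum over y in {0, 1}."""
    return (1 - p) * fn(0, p) + p * fn(1, p)

p0 = 0.3
mean_score = expect(score, p0)                               # Lemma 7: E[score]
outer_product = expect(lambda y, p: score(y, p) ** 2, p0)    # E[score^2]
neg_hessian = -expect(hessian, p0)                           # E[-hessian]

print(mean_score, outer_product, neg_hessian)
```

In this model both sides of the information equality equal $1 / (p_0 (1 - p_0))$, which is the Fisher information of a single Bernoulli observation.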

Proposition 9 Under 1, 2', 3 - 9, a sequence of MLEs, $\hat\theta_n$, satisfies

$$\sqrt{n} \left( \hat\theta_n - \theta_0 \right) \xrightarrow{d} N\left( 0, I(\theta_0)^{-1} \right)$$

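⇒ A Taylor series expansion of the first order condition around the true value of $\theta$, $\theta_0$, yields

$$\frac{\partial L(\hat\theta_n)}{\partial\theta} = \frac{\partial L(\theta_0)}{\partial\theta} + \frac{\partial^2 L(\theta^*)}{\partial\theta\, \partial\theta'} \left( \hat\theta_n - \theta_0 \right)$$

where $\theta^*$ is on the line segment connecting $\hat\theta_n$ and $\theta_0$. From the first order condition, we have

$$0 = \frac{\partial L(\theta_0)}{\partial\theta} + \frac{\partial^2 L(\theta^*)}{\partial\theta\, \partial\theta'} \left( \hat\theta_n - \theta_0 \right)$$

Therefore,

$$\sqrt{n} \left( \hat\theta_n - \theta_0 \right) = -\left[ \frac{1}{n} \frac{\partial^2 L(\theta^*)}{\partial\theta\, \partial\theta'} \right]^{-1} \frac{1}{\sqrt{n}} \frac{\partial L(\theta_0)}{\partial\theta}$$

As $n \to \infty$,

$$-\frac{1}{n} \frac{\partial^2 L(\theta^*)}{\partial\theta\, \partial\theta'} = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial^2 \log f(Y_i; \theta^*)}{\partial\theta\, \partial\theta'}$$

converges almost surely to

$$I(\theta_0) = E_{\theta_0} \left[ -\frac{\partial^2 \log f(Y; \theta_0)}{\partial\theta\, \partial\theta'} \right]$$

by the strong law of large numbers and the fact that $\theta^* \xrightarrow{a.s.} \theta_0$. Moreover,

$$
\begin{aligned}
\frac{1}{\sqrt{n}} \frac{\partial L(\theta_0)}{\partial\theta}
&= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\partial \log f(Y_i; \theta_0)}{\partial\theta} \\
&= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left[ \frac{\partial \log f(Y_i; \theta_0)}{\partial\theta} - E_{\theta_0} \frac{\partial \log f(Y; \theta_0)}{\partial\theta} \right]
\end{aligned}
$$

which converges in distribution to

$$N\left( 0, I(\theta_0) \right)$$

by the central limit theorem. We have used Lemma 7 and Lemma 8 here to get the asymptotic distribution of $\frac{1}{\sqrt{n}} \frac{\partial L(\theta_0)}{\partial\theta}$. Then,

$$\sqrt{n} \left( \hat\theta_n - \theta_0 \right) \xrightarrow{d} N\left( 0, I(\theta_0)^{-1} \right)$$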
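The limiting variance can be checked by Monte Carlo. The sketch below assumes an exponential model $f(y; \theta) = \theta e^{-\theta y}$, for which $I(\theta) = 1/\theta^2$, so the asymptotic variance of $\sqrt{n}(\hat\theta_n - \theta_0)$ is $\theta_0^2$. The rate, sample size, number of replications, and seed are arbitrary choices for the illustration.

```python
import math
import random
import statistics

rng = random.Random(2024)   # fixed seed for reproducibility
theta0 = 2.0                # assumed true rate
n = 400                     # sample size in each replication
reps = 2000                 # number of Monte Carlo replications

def mle_rate(n):
    """Closed-form MLE of the exponential rate: n / sum(y)."""
    return n / sum(rng.expovariate(theta0) for _ in range(n))

# Draws of the normalized estimation error sqrt(n) * (theta_hat - theta0).
z = [math.sqrt(n) * (mle_rate(n) - theta0) for _ in range(reps)]

info = 1.0 / theta0**2      # I(theta0) for this model
z_mean = statistics.mean(z)
z_var = statistics.variance(z)

print(z_mean, z_var, 1.0 / info)
```

The sample variance of the draws should be close to $I(\theta_0)^{-1} = 4$, with a small positive mean reflecting the finite-sample bias of the estimator.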

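• The asymptotic distribution by itself is of no practical use, since it requires evaluating the information matrix at the true parameter value. However, we can consistently estimate the asymptotic variance of the MLE by evaluating the information matrix at the MLE, i.e.,

$$\sqrt{n} \left( \hat\theta_n - \theta_0 \right) \xrightarrow{d} N\left( 0, I(\hat\theta_n)^{-1} \right)$$

Another expression, slightly misleading but commonly used in practice, is

$$\hat\theta_n \xrightarrow{d} N\left( \theta_0, \frac{1}{n} I(\hat\theta_n)^{-1} \right) = N\left( \theta_0, \left[ n I(\hat\theta_n) \right]^{-1} \right)$$

where $n I(\hat\theta_n) = -\frac{\partial^2 L(\hat\theta_n)}{\partial\theta\, \partial\theta'}$. We can also use the approximation

$$n I(\hat\theta_n) \approx \sum_{i=1}^{n} \frac{\partial \log f(y_i; \hat\theta_n)}{\partial\theta} \frac{\partial \log f(y_i; \hat\theta_n)}{\partial\theta'}$$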
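The two estimates of $n I(\hat\theta_n)$ can be computed side by side. This sketch again assumes an exponential model, where the score of one observation is $1/\theta - y_i$ and the second derivative of the log-likelihood is $-n/\theta^2$. The sample is made up.

```python
import math

ys = [0.5, 1.5, 0.25, 0.75, 1.0, 2.0, 0.1, 0.9]   # hypothetical sample
n = len(ys)
theta_hat = n / sum(ys)   # MLE of the exponential rate

# Hessian-based estimate: minus the second derivative of L at theta_hat.
info_hessian = n / theta_hat**2

# Outer-product estimate: sum of squared scores evaluated at theta_hat.
info_opg = sum((1.0 / theta_hat - y) ** 2 for y in ys)

# Standard errors from the commonly used form N(theta0, [n I(theta_hat)]^{-1}).
se_hessian = math.sqrt(1.0 / info_hessian)
se_opg = math.sqrt(1.0 / info_opg)

print(theta_hat, se_hessian, se_opg)
```

With only eight observations the two standard errors can differ noticeably; they are equivalent only asymptotically.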

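Proposition 10 Let $g$ be a continuously differentiable function of $\theta \in \mathbb{R}^p$ with values in $\mathbb{R}^q$. Then, under the assumptions of Proposition 9,

(i) $g(\hat\theta_n)$ converges almost surely to $g(\theta_0)$.

(ii) $\sqrt{n} \left( g(\hat\theta_n) - g(\theta_0) \right) \xrightarrow{d} N\left( 0, \frac{dg(\theta_0)}{d\theta'}\, I(\theta_0)^{-1}\, \frac{dg(\theta_0)'}{d\theta} \right)$

⇒ The first claim is a straightforward application of Slutsky's theorem. For the second claim, we do a Taylor expansion of $g(\hat\theta_n)$ around $\theta_0$ to get

$$g(\hat\theta_n) = g(\theta_0) + \frac{dg(\theta^*)}{d\theta'} \left( \hat\theta_n - \theta_0 \right)$$

where $\theta^*$ is on the line segment connecting $\hat\theta_n$ and $\theta_0$. Hence,

$$\sqrt{n} \left( g(\hat\theta_n) - g(\theta_0) \right) = \frac{dg(\theta^*)}{d\theta'}\, \sqrt{n} \left( \hat\theta_n - \theta_0 \right)$$

Note that, as $n \to \infty$, we have

$$\frac{dg(\theta^*)}{d\theta'} \xrightarrow{a.s.} \frac{dg(\theta_0)}{d\theta'} \quad \text{and} \quad \sqrt{n} \left( \hat\theta_n - \theta_0 \right) \xrightarrow{d} N\left( 0, I(\theta_0)^{-1} \right)$$

The claim follows immediately.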
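A scalar sketch of claim (ii), assuming an exponential model with rate $\theta$ and the transformation $g(\theta) = 1/\theta$ (the mean). Here $I(\theta)^{-1} = \theta^2$ and $dg/d\theta = -1/\theta^2$, so the delta-method variance is $1/\theta^2$. The sample is hypothetical, and the unknown $\theta_0$ is replaced by the MLE when computing the standard error.

```python
import math

ys = [0.5, 1.5, 0.25, 0.75, 1.0, 2.0, 0.1, 0.9]   # hypothetical sample
n = len(ys)
theta_hat = n / sum(ys)   # MLE of the exponential rate

def g(theta):
    """Transformation of interest: the mean of the exponential distribution."""
    return 1.0 / theta

def dg(theta):
    """Derivative of g with respect to theta."""
    return -1.0 / theta**2

info_inv = theta_hat**2                              # I(theta)^{-1} at theta_hat
avar_g = dg(theta_hat) * info_inv * dg(theta_hat)    # delta-method asymptotic variance
se_g = math.sqrt(avar_g / n)                         # standard error of g(theta_hat)

print(g(theta_hat), se_g)
```

In this example $g(\hat\theta_n)$ is the sample mean and the resulting standard error reduces to $\bar{y} / \sqrt{n}$, which matches the usual standard error of a sample mean when the standard deviation equals the mean, as it does for the exponential distribution.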
