Lecture 23: Conditional Expectation
Let X and Y be discrete random variables with joint probability mass function p_{X,Y}(x, y). The conditional probability mass function was defined in previous lectures as
$$p_{X|Y}(x|y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}, \qquad p_Y(y) > 0.$$
Define
$$\psi(y) = E[X|Y = y] = \sum_x x\, p_{X|Y}(x|y),$$
which changes with y. The random variable ψ(Y) is the conditional expectation of X given Y and is denoted by E[X|Y].
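For concreteness, here is a minimal Python sketch that computes p_{X|Y}(x|y) and the values E[X|Y = y] from a joint pmf; the particular joint pmf used below is a hypothetical choice, for illustration only.

    # Minimal sketch: conditional pmf and E[X|Y = y] from a (hypothetical) joint pmf.
    from collections import defaultdict

    p_xy = {(0, 0): 0.2, (1, 0): 0.3, (0, 1): 0.4, (1, 1): 0.1}  # p_{X,Y}(x, y), illustrative values

    # Marginal pmf of Y: p_Y(y) = sum_x p_{X,Y}(x, y)
    p_y = defaultdict(float)
    for (x, y), p in p_xy.items():
        p_y[y] += p

    # psi(y) = E[X | Y = y] = sum_x x * p_{X,Y}(x, y) / p_Y(y)
    E_X_given_Y = {
        y: sum(x * p for (x, yy), p in p_xy.items() if yy == y) / p_y[y]
        for y in p_y
    }
    print(E_X_given_Y)  # one value psi(y) for each y with p_Y(y) > 0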
Let X and Y be continuous random variables with joint probability density function f_{X,Y}(x, y). Recall the conditional probability density function
$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}, \qquad f_Y(y) > 0.$$
As before, define
$$\psi(y) = E[X|Y = y] = \int_{-\infty}^{\infty} x\, f_{X|Y}(x|y)\, dx.$$
The random variable ψ(Y) is the conditional expectation of X given Y and is denoted by E[X|Y].
Example 1: Find E[Y|X] if the joint probability density function is f_{X,Y}(x, y) = 1/x for 0 < y ≤ x ≤ 1.
Solution:
$$f_X(x) = \int_0^x \frac{1}{x}\, dy = 1, \qquad 0 \le x \le 1,$$
$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{1}{x}, \qquad 0 < y \le x,$$
$$E[Y|X = x] = \int_0^x y\, f_{Y|X}(y|x)\, dy = \int_0^x \frac{y}{x}\, dy = \frac{x}{2}.$$
The conditional expectation is therefore E[Y|X] = X/2.
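A quick Monte Carlo sanity check of this example (a minimal sketch; the sample size and the slice width around x0 are arbitrary choices): since f_X(x) = 1 on (0, 1) and f_{Y|X}(y|x) = 1/x on (0, x), we can sample X uniformly on (0, 1), sample Y uniformly on (0, X), and compare the empirical conditional mean of Y near a point x0 with x0/2.

    # Minimal Monte Carlo sketch for Example 1: E[Y | X = x] should be close to x/2.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    x = rng.uniform(0.0, 1.0, size=n)   # X with density f_X(x) = 1 on (0, 1)
    y = rng.uniform(0.0, x)             # given X = x, Y is uniform on (0, x), i.e. f_{Y|X}(y|x) = 1/x

    x0, half_width = 0.6, 0.01          # look at a thin slice around x0
    mask = np.abs(x - x0) < half_width
    print(y[mask].mean(), x0 / 2)       # the two numbers should be close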
Law of iterated expectation: E[E[Y|X]] = E[Y]. For discrete random variables X and Y,
$$\begin{aligned}
E_X\big[E[Y|X]\big] &= \sum_x p_X(x)\, E[Y|X = x] \\
&= \sum_x p_X(x) \sum_y y\, p_{Y|X}(y|x) \\
&= \sum_x p_X(x) \sum_y y\, \frac{p_{X,Y}(x, y)}{p_X(x)} \\
&= \sum_{x,y} y\, p_{X,Y}(x, y) \\
&= \sum_y y \sum_x p_{X,Y}(x, y) \\
&= \sum_y y\, p_Y(y) \\
&= E[Y].
\end{aligned}$$
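A minimal numerical check of this identity, using the same kind of hypothetical joint pmf as in the earlier sketch:

    # Minimal sketch: verify E[E[Y|X]] = E[Y] for a small (hypothetical) joint pmf.
    p_xy = {(0, 0): 0.2, (1, 0): 0.3, (0, 1): 0.4, (1, 1): 0.1}  # p_{X,Y}(x, y), illustrative values

    p_x = {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p                  # marginal p_X(x)

    def E_Y_given_X(x):
        # E[Y | X = x] = sum_y y * p_{X,Y}(x, y) / p_X(x)
        return sum(y * p for (xx, y), p in p_xy.items() if xx == x) / p_x[x]

    lhs = sum(p_x[x] * E_Y_given_X(x) for x in p_x)   # E_X[E[Y|X]]
    rhs = sum(y * p for (x, y), p in p_xy.items())    # E[Y]
    print(lhs, rhs)                                   # the two values agree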
Similarly, the law of iterated expectation can be proved for jointly continuous random variables.
Application of the law of iterated expectation:
Let
$$S_N = \sum_{i=1}^{N} X_i,$$
where {X_1, ..., X_N} are independent and identically distributed random variables and N is a non-negative integer-valued random variable independent of X_i for every i ∈ {1, ..., N}. From the law of iterated expectation, E[S_N] = E_N[E[S_N|N]]. Consider
$$E[S_N \mid N = n] = E\!\left[\sum_{i=1}^{N} X_i \,\Big|\, N = n\right] \tag{23.1}$$
$$= E\!\left[\sum_{i=1}^{n} X_i \,\Big|\, N = n\right]. \tag{23.2}$$
As N is independent of the X_i,
$$E\!\left[\sum_{i=1}^{n} X_i \,\Big|\, N = n\right] = E\!\left[\sum_{i=1}^{n} X_i\right] = n\, E[X].$$
Hence E[S_N|N] = N E[X], and taking expectations, E[S_N] = E[N] E[X].
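The identity E[S_N] = E[N] E[X] can be checked with a short simulation; in the sketch below, the choices N ~ Poisson(3) and X_i ~ Exponential(mean 2) are arbitrary and only for illustration.

    # Minimal sketch: random sum S_N = X_1 + ... + X_N with N independent of the X_i.
    import numpy as np

    rng = np.random.default_rng(0)
    trials = 100_000
    N = rng.poisson(lam=3.0, size=trials)                                 # E[N] = 3
    S = np.array([rng.exponential(scale=2.0, size=n).sum() for n in N])   # E[X] = 2

    print(S.mean(), N.mean() * 2.0)   # E[S_N] should be close to E[N] * E[X] = 6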
Theorem: For any function g, E[E[Y|X] g(X)] = E[Y g(X)]. For discrete random variables this follows from a computation similar to the one above:
$$E\big[E[Y|X]\, g(X)\big] = \sum_x p_X(x)\, g(x)\, E[Y|X = x] = \sum_{x,y} y\, g(x)\, p_{X,Y}(x, y) = E[Y g(X)].$$
Exercise: Prove E[Y g(X)] = E[E[Y |X]g(X)] if X and Y are jointly continuous random variables.
This theorem implies that
$$E\big[(Y - E[Y|X])\, g(X)\big] = 0. \tag{23.3}$$
The conditional expectation E[Y|X] can be viewed as an estimator of Y given X, and Y − E[Y|X] is then the estimation error of this estimator. The above theorem implies that the estimation error is uncorrelated with every function of X.
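A minimal numerical illustration of (23.3), reusing the hypothetical joint pmf from the earlier sketches together with an arbitrary function g:

    # Minimal sketch: check E[(Y - E[Y|X]) g(X)] = 0 for a small (hypothetical) joint pmf.
    p_xy = {(0, 0): 0.2, (1, 0): 0.3, (0, 1): 0.4, (1, 1): 0.1}

    p_x = {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p

    def E_Y_given_X(x):
        return sum(y * p for (xx, y), p in p_xy.items() if xx == x) / p_x[x]

    g = lambda x: 3.0 * x + 1.0   # an arbitrary function of X

    err = sum((y - E_Y_given_X(x)) * g(x) * p for (x, y), p in p_xy.items())
    print(err)                    # zero up to floating-point error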
Observe that in this lecture we have not dealt with conditional expectation in a general framework. Instead, we have defined it separately for discrete and for jointly continuous random variables. In a more general development of the topic, (23.3) is in fact taken as the defining property of the conditional expectation. Specifically, one can prove the existence and uniqueness (up to sets of measure zero) of a σ(X)-measurable random variable ψ(X) that satisfies E[(ψ(X) − Y) g(X)] = 0 for every bounded measurable g(X). Such a ψ(X) is then defined as the conditional expectation E[Y|X]. For a more detailed discussion, refer to Chapter 9 of [1].
Minimum Mean Square Error Estimator:
We have seen that E[Y|X] is an estimator of Y given X. The next theorem shows that it is an optimal estimator of Y given X, in the sense that the conditional expectation minimizes the mean-squared error.
Theorem: For any function g, E[(Y − g(X))^2] ≥ E[(Y − E[Y|X])^2].
Proof:
$$\begin{aligned}
E\big[(Y - g(X))^2\big] &= E\big[(Y - E[Y|X])^2\big] + E\big[(E[Y|X] - g(X))^2\big] \\
&\quad + 2\, E\big[(Y - E[Y|X])(E[Y|X] - g(X))\big] \\
&\ge E\big[(Y - E[Y|X])^2\big].
\end{aligned}$$
This is because E[(Y − E[Y|X])(E[Y|X] − g(X))] = 0 (by (23.3)) and E[(E[Y|X] − g(X))^2] ≥ 0. Indeed, from (23.3) we know that E[(E[Y|X] − Y) h(X)] = 0 for any function h(X); here take h(X) = E[Y|X] − g(X).
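As a minimal simulation sketch of this optimality (the model Y = X^2 + noise below is an arbitrary illustrative choice, picked so that E[Y|X] = X^2 is known in closed form), the conditional mean achieves a smaller mean-squared error than another estimator, e.g. the best constant:

    # Minimal sketch: E[Y|X] minimizes the mean-squared error.
    # Illustrative model: X ~ N(0, 1), Y = X**2 + noise, hence E[Y|X] = X**2 and E[Y] = 1.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    x = rng.standard_normal(n)
    y = x**2 + rng.standard_normal(n)

    mse_cond = np.mean((y - x**2) ** 2)    # estimator E[Y|X] = X^2
    mse_const = np.mean((y - 1.0) ** 2)    # another estimator g(X) = E[Y] = 1
    print(mse_cond, mse_const)             # mse_cond is the smaller of the two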
From (23.3) we observe that the estimation error Y − E[Y|X] is orthogonal to every measurable function of X. In the Hilbert space of square-integrable random variables, E[Y|X] can be viewed as the projection of Y onto the subspace L^2(σ(X)) of σ(X)-measurable random variables. As depicted in Figure 23.1, it is quite intuitive that the conditional expectation (which is the projection of Y onto the subspace) minimizes the mean-squared error among all random variables from the subspace L^2(σ(X)).
Figure 23.1: E[Y|X] as the projection of Y onto the subspace L^2(σ(X)).
23.1 Exercises
1. Prove the law of iterated expectation for jointly continuous random variables.
2. (i) The joint PMF of random variables X and Y is given in the table below.

                X = 0    X = 1
       Y = 0     1/5      2/5
       Y = 1     2/5       0

   Let Z = E[X|Y] and V = Var(X|Y). Find the PMF of Z and of V, and compute E[Z] and E[V].
   (ii) Consider a sequence of i.i.d. random variables {Z_i} with P(Z_i = 0) = P(Z_i = 1) = 1/2. Using this sequence, define a new sequence of random variables {X_n} as follows:
   X_0 = 0,
   X_1 = 2Z_1 − 1, and
   X_n = X_{n−1} + (1 + Z_1 + ... + Z_{n−1})(2Z_n − 1) for n ≥ 2.
   Show that E[X_{n+1} | X_0, X_1, ..., X_n] = X_n a.s. for all n.
3. (a) [MIT OCW problem set] The number of people that enter a pizzeria in a period of 15 minutes is a (nonnegative integer) random variable K with known moment generating function M_K(s). Each person who comes in buys a pizza. There are n types of pizzas, and each person is equally likely to choose any type of pizza, independently of what anyone else chooses. Give a formula, in terms of M_K(·), for the expected number of different types of pizzas ordered.
   (b) John takes a taxi home every day after work. Every evening, he waits by the road for a taxi, but each taxi that comes by is occupied with probability 0.8, independently of the others. He counts the number of taxis he misses until he gets an unoccupied one. Once inside the taxi, he rolls a fair six-faced die a number of times equal to the number of taxis he missed, and gives the driver a tip equal to the sum of the outcomes of the rolls. Find the expected tip that John gives every day.
References
[1] D. Williams, Probability with Martingales, Cambridge University Press, Fourteenth Printing, 2011.