Tute1 Questions
Lecturer: Yi He
2 November, 2023
Question 1

(a) Prove that the following two optimization problems (page 8 of the lecture slides) generate the same hard-margin SVM classifier.
Optimization Problem 1

maximize_{w, w_0, M}   M
subject to   Y_i (w^T X_i + w_0) ≥ M   ∀i,
             ∥w∥ = 1.
Optimization Problem 2

minimize_{β, β_0}   (1/2) β^T β
subject to   Y_i (β^T X_i + β_0) ≥ 1   ∀i.
Assume that Optimization Problem 1 has a solution and that the margin M is positive.
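One way to relate the two problems (a sketch under the stated assumption M > 0, not the full argument asked for): rescale the variables of Problem 1 by setting β = w / M and β_0 = w_0 / M. Then

Y_i (w^T X_i + w_0) ≥ M  with  ∥w∥ = 1   ⟺   Y_i (β^T X_i + β_0) ≥ 1  with  ∥β∥ = 1/M,

so maximizing M is the same as minimizing ∥β∥, i.e. minimizing (1/2) β^T β in Problem 2.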
(b) Henceforth we only consider the second optimization problem. Differentiate the Lagrange function

L(β, β_0, λ) = (1/2) β^T β + Σ_{i=1}^{n} λ_i (1 − Y_i (β^T X_i + β_0)),   λ_i ≥ 0,

with respect to β and β_0.
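Where the differentiation leads (the standard stationarity conditions, stated here only as a sketch of the expected answer):

∂L/∂β = β − Σ_{i=1}^{n} λ_i Y_i X_i = 0   ⟹   β = Σ_{i=1}^{n} λ_i Y_i X_i,
∂L/∂β_0 = − Σ_{i=1}^{n} λ_i Y_i = 0   ⟹   Σ_{i=1}^{n} λ_i Y_i = 0.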
(c) Let S = {i : λ̂_i ≠ 0} be the so-called support set, where λ̂_i is the optimal solution of λ_i in part (b). By the slackness condition (which you do not need to prove) we also know that Y_i (β̂^T X_i + β̂_0) = 1 for all i ∈ S, where (β̂, β̂_0) denotes the optimal solution of the second optimization problem.
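A small numerical illustration of the support set (a sketch assuming NumPy and scikit-learn are available; a soft-margin SVC with a very large C stands in for the hard-margin problem, and the indices in clf.support_ play the role of S):

    # Approximate a hard-margin SVM with a soft-margin SVC and a very large C,
    # then check that the constraint is active on the support set S.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))    # class +1
    X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(20, 2))  # class -1
    X = np.vstack([X_pos, X_neg])
    Y = np.array([1] * 20 + [-1] * 20)

    clf = SVC(kernel="linear", C=1e10).fit(X, Y)  # large C ~ hard margin
    beta, beta_0 = clf.coef_.ravel(), clf.intercept_[0]

    # For i in S the constraint should be active: Y_i (beta^T X_i + beta_0) = 1.
    margins = Y * (X @ beta + beta_0)
    print("support set S:", clf.support_)
    print("margins on S (close to 1):", margins[clf.support_])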
Question 2

Consider a two-class classification problem with a target Y ∈ {−1, 1} and features X ∈ R^d. Define p(x) = P(Y = 1 | X = x) for all x ∈ R^d.
(a) Derive the conditional expected hinge loss E[ℓ_H(f(X), Y) | X = x] of a prediction rule f.

(b) Show that the Bayes classifier C^Bayes(x) = sign(2p(x) − 1) minimizes the conditional expected hinge loss in part (a) over all candidate prediction rules f.
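Assuming the usual hinge loss ℓ_H(f(x), y) = max(0, 1 − y f(x)) (an assumption here; use the definition from the lectures), conditioning on the two possible values of Y gives the expansion

E[ℓ_H(f(X), Y) | X = x] = p(x) max(0, 1 − f(x)) + (1 − p(x)) max(0, 1 + f(x)),

which is the quantity to be minimized over f in part (b).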
(c) Show that C^Bayes(x) also minimizes the population classification error.
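A sketch of the quantity involved: for a classifier C taking values in {−1, 1}, the conditional classification error can be written as

P(C(X) ≠ Y | X = x) = p(x) 1[C(x) = −1] + (1 − p(x)) 1[C(x) = 1],

and the population classification error is its expectation over X.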
Question 3

Consider again the two-class classification problem from Question 2. Let σ denote the sigmoid function given by

σ(a) = 1 / (1 + exp(−a)),   a ∈ R.
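Two elementary properties of σ are used repeatedly below (stated here without proof):

σ(−a) = 1 − σ(a)   and   σ(a) > 1/2  ⟺  a > 0.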
(a) Prove that the bivariate regression function for the one-hot encoded target satisfies the representation

(P(Y = 1 | X = x), P(Y = −1 | X = x)) = (σ(a(x)), σ(−a(x))),

with

a(x) = log [ P(Y = 1 | X = x) / P(Y = −1 | X = x) ].
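A sketch of one direction: exponentiating a(x) and writing p(x) = P(Y = 1 | X = x) gives

exp(a(x)) = p(x) / (1 − p(x))   ⟹   p(x) = exp(a(x)) / (1 + exp(a(x))) = σ(a(x)),

and then P(Y = −1 | X = x) = 1 − σ(a(x)) = σ(−a(x)).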
(c) Prove that the Bayes classifier equals the sign of the log-odds function a(x), that is,

C^Bayes(x) = sign(a(x)) =  1,   if a(x) > 0,
                          −1,   if a(x) < 0.
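The key observation (a sketch): the log-odds are positive exactly when class 1 is the more probable class,

a(x) > 0  ⟺  P(Y = 1 | X = x) > P(Y = −1 | X = x)  ⟺  p(x) > 1/2,

which is precisely the condition under which the Bayes classifier predicts 1.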
Question 4

Consider the cross-entropy loss between an estimate ŷ ∈ P_K and a true distribution y^o ∈ P_K (the probability simplex), defined by

ℓ_CE(ŷ, y^o) = − Σ_{k=1}^{K} y_k^o log ŷ_k.

Note that ℓ_CE(ŷ, y^o) ≠ ℓ_CE(y^o, ŷ).
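A concrete illustration of the asymmetry with K = 2: take y^o = (1, 0) and ŷ = (1/2, 1/2). Then

ℓ_CE(ŷ, y^o) = −(1 · log(1/2) + 0 · log(1/2)) = log 2,

whereas, using the convention log 0 = −∞,

ℓ_CE(y^o, ŷ) = −((1/2) · log 1 + (1/2) · log 0) = +∞.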
(a) Prove that log x ≤ x − 1 for all x > 0, where the equality holds if and only if x = 1.
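One possible route (a calculus sketch): consider g(x) = x − 1 − log x on (0, ∞). Then

g'(x) = 1 − 1/x,

which is negative on (0, 1) and positive on (1, ∞), so g attains its minimum at x = 1 with g(1) = 0; hence g(x) ≥ 0 everywhere, with equality only at x = 1.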
(b) Use part (a) to prove that the cross-entropy loss is minimal at the true distribution y^o, that is,

ℓ_CE(ŷ, y^o) ≥ ℓ_CE(y^o, y^o).
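The key rearrangement (a sketch of how part (a) enters):

ℓ_CE(ŷ, y^o) − ℓ_CE(y^o, y^o) = Σ_{k} y_k^o log(y_k^o / ŷ_k) ≥ Σ_{k : y_k^o > 0} (y_k^o − ŷ_k) = 1 − Σ_{k : y_k^o > 0} ŷ_k ≥ 0,

using log x ≤ x − 1 with x = ŷ_k / y_k^o (terms with y_k^o = 0 contribute zero to the left-hand side).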
(c) Use the concavity of the logarithm function to verify that the cross-entropy loss is convex in ŷ, meaning that for any two estimates ŷ, ỹ ∈ P_K and λ ∈ (0, 1),

ℓ_CE(λŷ + (1 − λ)ỹ, y^o) ≤ λ ℓ_CE(ŷ, y^o) + (1 − λ) ℓ_CE(ỹ, y^o).
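The key step (a sketch): concavity of the logarithm gives, for each component k,

log(λŷ_k + (1 − λ)ỹ_k) ≥ λ log ŷ_k + (1 − λ) log ỹ_k;

multiplying by −y_k^o ≤ 0 reverses the inequality, and summing over k yields the claimed convexity.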
(d) Let Y ∈ {1, . . . , K} be a target variable from the true distribution y^o, so that P(Y = k) = y_k^o. Show that the log-likelihood of the parameters ŷ given Y is

log L(ŷ | Y) = Σ_{k=1}^{K} 1[Y = k] log ŷ_k.
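A sketch of the computation: under the model with class probabilities ŷ, the likelihood of a single observation Y is

L(ŷ | Y) = ŷ_Y = Π_{k=1}^{K} ŷ_k^{1[Y = k]},

and taking logarithms gives the stated sum.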
(e) Compare the expression of the log-likelihood function in (d) with the cross-entropy loss. Can you find a relation between the method of maximum likelihood and minimizing the cross-entropy?
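One way to make the relation precise (a sketch): take the expectation of the log-likelihood from (d) under the true distribution y^o,

E[log L(ŷ | Y)] = Σ_{k=1}^{K} P(Y = k) log ŷ_k = Σ_{k=1}^{K} y_k^o log ŷ_k = −ℓ_CE(ŷ, y^o),

so maximizing the expected log-likelihood over ŷ is the same as minimizing the cross-entropy loss against y^o.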