Nonparametric Regression
1 Introduction
The linear regression model is
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \epsilon.$$
In nonparametric regression we drop the linearity assumption and simply write
$$Y = m(X) + \epsilon$$
where $E(\epsilon) = 0$ and $m(x) = E(Y \mid X = x)$. This follows since $\epsilon = Y - m(X)$ and $E(\epsilon) = E(E(\epsilon \mid X)) = E(m(X) - m(X)) = 0$.
Example 1 Figure 1 shows data on bone mineral density. The plots show the relative change in bone density over two consecutive visits, for men and women. The smooth estimates of the regression functions suggest that a growth spurt occurs two years earlier for females. In this example, Y is the change in bone mineral density and X is age.
$$Y = m(x_1, x_2, x_3, x_4) + \epsilon. \qquad (2)$$
[Figure 1: Bone mineral density data. Relative change in BMD plotted against age, for females (top) and males (bottom).]
The estimator $\widehat{m}$ typically involves smoothing the data in some way. The main challenge is to determine how much smoothing to do. When the data are oversmoothed, the bias term is large and the variance is small. When the data are undersmoothed the opposite is true.
[Figure 2: The outcome plotted against each of the four covariates Age, Bmi, Map and Tc.]
This is called the bias–variance tradeoff. Minimizing risk corresponds to balancing bias and
variance.
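One way to make this precise (a standard decomposition, not spelled out at this point in the notes): the pointwise mean squared error of an estimator $\widehat{m}$ satisfies
$$E\left\{(\widehat{m}(x) - m(x))^2\right\} = \underbrace{\left(E\,\widehat{m}(x) - m(x)\right)^2}_{\text{bias}^2} + \underbrace{\mathrm{Var}(\widehat{m}(x))}_{\text{variance}},$$
so more smoothing lowers the variance at the cost of more bias, and vice versa. These terms are studied more carefully in Section 7.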
3 The Regressogram
We will begin by assuming there is only one covariate X. For simplicity, assume that $0 \leq X \leq 1$. The simplest nonparametric estimator of $m$ is the regressogram. Let $k$ be an integer. Divide $[0, 1]$ into $k$ bins $B_1, \ldots, B_k$ of equal width $1/k$. Let $n_j$ be the number of observations $X_i$ falling in bin $B_j$ and define
$$\bar{Y}_j = \frac{1}{n_j} \sum_{X_i \in B_j} Y_i.$$
Finally, we define $\widehat{m}(x) = \bar{Y}_j$ for all $x \in B_j$. We can write this as
$$\widehat{m}(x) = \sum_{j=1}^{k} \bar{Y}_j \, I(x \in B_j).$$
regressogram = function(x,y,left,right,k,plotit,xlab="",ylab="",sub=""){
     ### assumes the data are on the interval [left,right]
     n = length(x)
     B = seq(left,right,length=k+1)                      ### bin boundaries
     WhichBin = findInterval(x,B,rightmost.closed=TRUE)  ### put x = right in the last bin
     N = tabulate(WhichBin,nbins=k)                      ### bin counts (zeros for empty bins)
     m.hat = rep(0,k)
     for(j in 1:k){
          if(N[j]>0) m.hat[j] = mean(y[WhichBin == j])
     }
     if(plotit==TRUE){
          a = min(c(y,m.hat))
          b = max(c(y,m.hat))
          plot(B,c(m.hat,m.hat[k]),lwd=3,type="s",
               xlab=xlab,ylab=ylab,ylim=c(a,b),col="blue",sub=sub)
          points(x,y)
     }
     return(list(bins=B,m.hat=m.hat))
}
pdf("regressogram.pdf")
par(mfrow=c(2,2))
### simulated example
n = 100
x = runif(n)
y = 3*sin(8*x) + rnorm(n,0,.3)
plot(x,y,pch=20)
out = regressogram(x,y,left=0,right=1,k=5,plotit=TRUE)
out = regressogram(x,y,left=0,right=1,k=10,plotit=TRUE)
out = regressogram(x,y,left=0,right=1,k=20,plotit=TRUE)
dev.off()
### Bone mineral density versus age for men and women (Figure 4)
pdf("bmd.pdf")
par(mfrow=c(2,2))
library(ElemStatLearn)
attach(bone)
age.female = age[gender == "female"]; density.female = spnbmd[gender == "female"]
age.male = age[gender == "male"]; density.male = spnbmd[gender == "male"]
out = regressogram(age.male,density.male,left=min(age.male),right=max(age.male),k=10,plotit=TRUE,xlab="Age",ylab="Density",sub="Male")            ### repeat with k = 20
out = regressogram(age.female,density.female,left=min(age.female),right=max(age.female),k=10,plotit=TRUE,xlab="Age",ylab="Density",sub="Female")  ### repeat with k = 20
dev.off()
From Figures 3 and 4 you can see two things. First, we need a way to choose $k$. (The answer will be cross-validation.) But also, $\widehat{m}(x)$ is very unsmooth. To get a smoother estimator, we will use kernel smoothing.
The idea of kernel smoothing is very simple. To estimate $m(x)$ we will take a local average of the $Y_i$'s. In other words, we average all $Y_i$ such that $|X_i - x| \leq h$ where $h$ is some small number called the bandwidth. We can write this estimator as
$$\widehat{m}(x) = \frac{\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right) Y_i}{\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)}$$
where
$$K(z) = \begin{cases} 1 & \text{if } |z| \leq 1 \\ 0 & \text{if } |z| > 1. \end{cases}$$
The function $K$ is called the boxcar kernel.
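As a concrete illustration (my own sketch, not code from the notes; the function name boxcar.smooth is made up), the boxcar local average can be computed directly:

boxcar.smooth = function(x,y,grid,h){
     ### average the y's whose x lies within h of each grid point
     sapply(grid, function(g){
          in.window = abs(x - g) <= h
          if(any(in.window)) mean(y[in.window]) else NA   ### NA if the window is empty
     })
}
### example on simulated data like that used for the regressogram
n = 100; x = runif(n); y = 3*sin(8*x) + rnorm(n,0,.3)
grid = seq(0,1,length=200)
plot(x,y,pch=20)
lines(grid,boxcar.smooth(x,y,grid,h=.05),col="blue",lwd=3)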
[Figure 3: The simulated example. The data and the regressogram fits with k = 5, 10 and 20 bins.]
Figure 4: Bone density example. Men and women. Left plots use k = 10. Right plots use
k = 20.
[Figure 5: The female bone density data (density.female versus age.female) with boxcar kernel estimates.]
As you can see in Figure 5, this gives much smoother estimates than the regressogram. But we can improve this even further by replacing K with a smoother function.
This leads us to the following definition. A one-dimensional smoothing kernel is any smooth function $K$ such that $K(x) \geq 0$ and
$$\int K(x)\,dx = 1, \quad \int x K(x)\,dx = 0 \quad \text{and} \quad \sigma_K^2 \equiv \int x^2 K(x)\,dx > 0. \qquad (7)$$
Let $h > 0$ be a positive number, called the bandwidth. The Nadaraya–Watson kernel estimator is defined by
$$\widehat{m}(x) \equiv \widehat{m}_h(x) = \frac{\sum_{i=1}^{n} Y_i K\left(\frac{x - X_i}{h}\right)}{\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)} = \sum_{i=1}^{n} Y_i \ell_i(x) \qquad (8)$$
where $\ell_i(x) = K((x - X_i)/h) / \sum_j K((x - X_j)/h)$.
Thus $\widehat{m}(x)$ is a local average of the $Y_i$'s. It can be shown that there is an optimal kernel, called the Epanechnikov kernel. But the choice of kernel $K$ is not too important. Estimates obtained by using different kernels are usually numerically very similar. This observation is confirmed by theoretical calculations which show that the risk is very insensitive to the choice of kernel. What matters much more is the choice of bandwidth $h$, which controls the amount of smoothing. Small bandwidths give very rough estimates while larger bandwidths give smoother estimates. A common choice for the kernel is the normal density:
$$K(z) \propto e^{-z^2/2}.$$
This does not mean that we are assuming that the data are Normal. We are just using this as a convenient way to define the smoothing weights.
[Figure: The female bone density data (density.female versus age.female) with kernel regression estimates; four panels.]
The fitted value at $X_i$ is $\widehat{Y}_i = \widehat{m}(X_i) = \sum_{j=1}^{n} \ell_j(X_i) Y_j$, where $\ell_j(x) = K((x - X_j)/h) / \sum_t K((x - X_t)/h)$. Let $\widehat{Y} = (\widehat{Y}_1, \ldots, \widehat{Y}_n)$, $Y = (Y_1, \ldots, Y_n)$ and let $L$ be the $n \times n$ matrix with entries $L_{ij} = \ell_j(X_i)$. We then see that
$$\widehat{Y} = LY. \qquad (10)$$
The matrix $L$ is called the smoothing matrix. It is like the hat matrix but it is not a projection matrix. We call
$$\nu = \mathrm{tr}(L) = \sum_i L_{ii}$$
the effective degrees of freedom. As the bandwidth $h$ gets smaller, $\nu$ gets larger. In other words, small bandwidths correspond to more complex models.
The equation $\widehat{Y} = LY$ means that we can write each $\widehat{Y}_i$ as a linear combination of the $Y_i$'s. Because of this, we say that kernel regression is a linear smoother. This does not mean that $\widehat{m}(x)$ is linear. It just means that each $\widehat{Y}_i$ is a linear combination of the $Y_i$'s.
There are many other linear smoothers besides kernel regression estimators but we will focus on the kernel estimator for now. Remember: a linear smoother does not mean linear regression.
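Here is a small sketch (my own code, not from the notes) of how the smoothing matrix $L$ and $\nu = \mathrm{tr}(L)$ can be computed for the Gaussian (normal) kernel:

smoothing.matrix = function(x,h){
     ### L[i,j] = K((x[i]-x[j])/h) / sum_t K((x[i]-x[t])/h), Gaussian kernel
     n = length(x)
     L = matrix(0,n,n)
     for(i in 1:n){
          w = dnorm(x[i],x,h)
          L[i,] = w/sum(w)
     }
     return(L)
}
x = runif(50)
nu = sum(diag(smoothing.matrix(x,h=.1)))   ### effective degrees of freedom; grows as h shrinks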
6 Choosing h by Cross-Validation
It turns out that there is a handy shortcut formula for CV, which is
$$CV(h) = \frac{1}{n} \sum_i \left( \frac{Y_i - \widehat{m}_h(X_i)}{1 - L_{ii}} \right)^2. \qquad (11)$$
Figure 7: Cross validation (black) and generalized cross validation (red dotted line) versus
h and versus effective degrees of freedom. The bottom left plot is the kernel regression
estimator using the bandwidth that minimizes the cross-validation score.
A common approximation replaces each $L_{ii}$ in (11) by its average $\nu/n$, giving
$$GCV(h) = \frac{1}{n} \sum_i \frac{(Y_i - \widehat{m}_h(X_i))^2}{(1 - \nu/n)^2},$$
which is called generalized cross validation.
Figure 7 shows the cross-validation score and the generalized cross validation score. I plotted it versus $h$ and then versus the effective degrees of freedom $\nu$. The bottom left plot is the kernel regression estimator using the bandwidth that minimizes the cross-validation score.
kernel = function(x,y,grid,h){
     ### kernel regression estimator at a grid of values
     n = length(x)
     k = length(grid)
     m.hat = rep(0,k)
     for(i in 1:k){
          w = dnorm(grid[i],x,h)        ### Gaussian kernel weights
          m.hat[i] = sum(y*w)/sum(w)
     }
     return(m.hat)
}

kernel.fitted = function(x,y,h){
     ### fitted values and diagonal of the smoothing matrix
     n = length(x)
     m.hat = rep(0,n)
     S = rep(0,n)
     for(i in 1:n){
          w = dnorm(x[i],x,h)
          w = w/sum(w)
          m.hat[i] = sum(y*w)
          S[i] = w[i]                   ### L_{ii}
     }
     return(list(fitted=m.hat,S=S))
}
CV = function(x,y,H){
     ### H is a vector of bandwidths
     n = length(x)
     k = length(H)
     cv = rep(0,k)
     nu = rep(0,k)
     gcv = rep(0,k)
     for(i in 1:k){
          tmp = kernel.fitted(x,y,H[i])
          cv[i] = mean(((y - tmp$fitted)/(1-tmp$S))^2)       ### equation (11)
          nu[i] = sum(tmp$S)                                 ### effective degrees of freedom
          gcv[i] = mean((y - tmp$fitted)^2)/(1-nu[i]/n)^2    ### generalized cross validation
     }
     return(list(cv=cv,gcv=gcv,nu=nu))
}
pdf("crossval.pdf")
par(mfrow=c(2,2))
bone = read.table("BoneDensity.txt",header=TRUE)
attach(bone)
H = seq(.1,5,length=20)
out = CV(age.female,density.female,H)
plot(H,out$cv,type="l",lwd=3,xlab="Bandwidth",ylab="Cross-validation Score")
lines(H,out$gcv,lty=2,col="red",lwd=3)
plot(out$nu,out$cv,type="l",lwd=3,xlab="Effective Degrees of Freedom",ylab="Cross-valida
lines(out$nu,out$gcv,lty=2,col="red",lwd=3)
j = which.min(out$cv)
h = H[j]
grid = seq(min(age.female),max(age.female),length=100)
m.hat = kernel(age.female,density.female,grid,h)
plot(age.female,density.female,xlab="Age",ylab="Density")
lines(grid,m.hat,lwd=3,col="blue")
dev.off()
7 Analysis of the Kernel Estimator
The prediction risk at $x$ can be decomposed as
$$\tau^2 + b_n^2(x) + v_n(x)$$
where $\tau^2$ is the unavoidable error, $b_n(x) = E(\widehat{m}_h(x)) - m(x)$ is the bias and $v_n(x) = \mathrm{Var}(\widehat{m}_h(x))$ is the variance. The first term is unavoidable. It is the second two terms that we can try to make small. That is, we would like to minimize the integrated mean squared error
$$IMSE = \int b_n^2(x) p(x)\,dx + \int v_n(x) p(x)\,dx.$$
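The next paragraph refers to a formula for the optimal bandwidth; a standard sketch of that calculation (under the usual smoothness assumptions, with constants left unspecified since they are not derived in these notes) is the following. The IMSE of the kernel estimator behaves like
$$IMSE(h) \approx C_1 h^4 + \frac{C_2}{nh},$$
where $C_1$ depends on the curvature of $m$ (through its second derivative) and $C_2$ depends on $\sigma^2$ and $\int K^2(z)\,dz$. Setting the derivative with respect to $h$ to zero gives
$$h_n = \left(\frac{C_2}{4 C_1 n}\right)^{1/5} \propto n^{-1/5},$$
and plugging $h_n$ back in shows that $IMSE(h_n) = O(n^{-4/5})$.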
What do we learn from this? First, the optimal bandwidth gets smaller as the sample size increases. Second, the IMSE goes to 0 as $n$ gets larger. But it goes to 0 more slowly than for other estimators you are used to. For example, if you estimate the mean $\mu$ with the sample mean $\bar{Y}$ then $E(\bar{Y} - \mu)^2 = \sigma^2/n$. Roughly speaking, the mean squared error of parametric estimators is something like $1/n$ but kernel estimators (and other nonparametric estimators) behave like $(1/n)^{4/5}$, which goes to 0 more slowly. Slower convergence is the price of being nonparametric.
The formula we derived for $h_n$ is interesting for helping our understanding of the theoretical behavior of $\widehat{m}$. But we can't use that formula in practice because those constants involve quantities that depend on the unknown function $m$. So we use cross-validation to choose $h$ in practice.
8 Variability Bands
We would like to get some idea of how accurate our estimator is. To estimate the standard
error of m
b h (x) we are going to use a tool called the bootstrap. In 402, you will see that the
bootstrap is a very general tool. Here we will only use it to get standard errors for our kernel
estimator.
(a) Draw n observations, with replacement, from the original data {Z1 , . . . , Zn }. Call
these observations Z1∗ , . . . , Zn∗ . This is called a bootstrap sample.
(b) Compute a kernel estimator m b ∗j (x) from the bootstrap sample. (Note that we
b ∗j (x) at every x.)
compute m
b ∗1 (x), . . . , m
4. At each x, let se(x) be the standard deviation of the numbers m b ∗B (x).
kernel = function(x,y,grid,h){
     ### kernel regression estimator at a grid of values
     n = length(x)
     k = length(grid)
     m.hat = rep(0,k)
     for(i in 1:k){
          w = dnorm(grid[i],x,h)
          m.hat[i] = sum(y*w)/sum(w)
     }
     return(m.hat)
}
boot = function(x,y,grid,h,B){
     ### pointwise standard error for kernel regression using the bootstrap
     k = length(grid)
     n = length(x)
     M = matrix(0,k,B)
     for(j in 1:B){
          I = sample(1:n,size=n,replace=TRUE)    ### bootstrap sample of observations
          xx = x[I]
          yy = y[I]
          M[,j] = kernel(xx,yy,grid,h)           ### kernel fit on the bootstrap sample
     }
     s = sqrt(apply(M,1,var))                    ### sd over bootstrap fits at each grid point
     return(s)
}
bone = read.table("BoneDensity.txt",header=TRUE)
attach(bone)
age.female = age[gender == "female"]
density.female = spnbmd[gender == "female"]
h = .7
grid = seq(min(age.female),max(age.female),length=100)
plot(age.female,density.female)
mhat = kernel(age.female,density.female,grid,h)
lines(grid,mhat,lwd=3)
B = 1000
se = boot(age.female,density.female,grid,h,B)
lines(grid,mhat+2*se,lwd=3,lty=2,col="red")
lines(grid,mhat-2*se,lwd=3,lty=2,col="red")
Suppose now that the covariate is $d$-dimensional, $X_i = (x_{i1}, \ldots, x_{id})^T$. The regression equation is
$$Y = m(X_1, \ldots, X_d) + \epsilon. \qquad (13)$$
A problem that occurs with smoothing methods is the curse of dimensionality. Estimation gets harder very quickly as the dimension of the observations increases.
[Figure: The female bone density data (age.female versus density.female) with the kernel estimate and bootstrap variability bands from the code above.]
The IMSE of most nonparametric regression estimators has the form $n^{-4/(4+d)}$. This implies that the sample size needed for a given level of accuracy increases exponentially with $d$. The reason for this phenomenon is that smoothing involves estimating a function $m(x)$ using data points in a local neighborhood of $x$. But in a high-dimensional problem, the data are very sparse, so local neighborhoods contain very few points.
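A quick numerical illustration of this sparsity (my own sketch, not from the notes): with $n$ uniform covariates on $[0,1]^d$, the expected number of points within a cube of half-width $h$ around a central point is roughly $n(2h)^d$, which collapses as $d$ grows.

### how many of n uniform points fall in a cube of half-width h around (0.5,...,0.5)?
n = 1000
h = .1
for(d in c(1,2,5,10)){
     X = matrix(runif(n*d),n,d)
     in.cube = apply(abs(X - .5) <= h, 1, all)
     cat("d =",d,": points in the neighborhood =",sum(in.cube),"\n")
}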
9.1 Kernels
We now proceed as in the univariate case. For example, we use cross-validation to estimate
h.
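The notes do not write out the multivariate estimator here; one common version (a sketch, with names of my own choosing) uses a product kernel, multiplying one-dimensional Gaussian weights across coordinates:

kernel.multi = function(X,y,x0,h){
     ### Nadaraya-Watson estimate at the point x0, with one bandwidth per coordinate
     ### X: n x d matrix, x0: vector of length d, h: vector of d bandwidths
     w = rep(1,nrow(X))
     for(j in 1:ncol(X)) w = w * dnorm((X[,j] - x0[j])/h[j])
     sum(w*y)/sum(w)
}
### simulated example with d = 2
n = 200
X = matrix(runif(2*n),n,2)
y = sin(5*X[,1]) + X[,2]^2 + rnorm(n,0,.2)
kernel.multi(X,y,x0=c(.5,.5),h=c(.1,.1))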
The additive model is
$$Y = \alpha + \sum_{j=1}^{d} m_j(X_j) + \epsilon.$$
The additive model is not well defined as we have stated it. We can add any constant to $\alpha$ and subtract the same constant from one of the $m_j$'s without changing the regression function. This problem can be fixed in a number of ways; the simplest is to set $\widehat{\alpha} = \bar{Y}$ and then regard the $m_j$'s as deviations from $\bar{Y}$. In this case we require that $\sum_{i=1}^{n} \widehat{m}_j(X_i) = 0$ for each $j$.
There is a simple algorithm called backfitting for turning any one-dimensional regression
smoother into a method for fitting additive models. This is essentially a coordinate descent,
Gauss-Seidel algorithm.
Initialization: set $\widehat{\alpha} = \bar{Y}$ and set initial guesses for $\widehat{m}_1, \ldots, \widehat{m}_d$. Now iterate the following steps until convergence. For $j = 1, \ldots, d$ do:
• Compute $\widetilde{Y}_i = Y_i - \widehat{\alpha} - \sum_{k \neq j} \widehat{m}_k(X_i)$, $i = 1, \ldots, n$.
• Apply a one-dimensional smoother to the data $(x_{ij}, \widetilde{Y}_i)$, $i = 1, \ldots, n$, take $\widehat{m}_j$ to be the resulting estimate, and center it so that $\sum_i \widehat{m}_j(X_i) = 0$.
• end do.
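A minimal backfitting sketch (my own code, using R's built-in ksmooth as the one-dimensional smoother and a fixed number of sweeps instead of a formal convergence check; in practice the gam function from mgcv, used in the example below, handles the smoothing and centering automatically):

backfit = function(X,y,h,niter=20){
     ### X: n x d matrix of covariates, h: vector of d bandwidths
     ### returns alpha-hat and the estimated m_j's evaluated at the data points
     n = nrow(X); d = ncol(X)
     alpha = mean(y)
     m = matrix(0,n,d)                                    ### m[i,j] = estimate of m_j at X[i,j]
     for(it in 1:niter){
          for(j in 1:d){
               r = y - alpha - rowSums(m[,-j,drop=FALSE])      ### partial residuals
               ord = order(X[,j])
               sm = ksmooth(X[ord,j],r[ord],kernel="normal",bandwidth=h[j],x.points=X[ord,j])
               m[ord,j] = sm$y - mean(sm$y)                    ### smooth, then center
          }
     }
     return(list(alpha=alpha,m=m))
}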
9.3 Example
library(mgcv)
D = read.table("CarData.txt",header=TRUE)
### data from Consumer Reports (1990)
attach(D)
names(D)
# [1] "Price"   "Mileage" "Weight"  "Disp"    "HP"
pairs(D)
out = gam(Price ~ s(Mileage) + s(Weight) + s(Disp) + s(HP))
summary(out)
#Family: gaussian
#Link function: identity
#
#Formula:
#Price ~ s(Mileage) + s(Weight) + s(Disp) + s(HP)
#
#Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 12615.7 307.3 41.06 <2e-16 ***
#---
#Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
#
#Approximate significance of smooth terms:
# edf Ref.df F p-value
#s(Mileage) 4.328 5.314 1.897 0.12416
#s(Weight) 1.000 1.000 7.857 0.00723 **
#s(Disp) 1.000 1.000 11.354 0.00146 **
#s(HP) 4.698 5.685 3.374 0.00943 **
#---
#Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
#
#R-sq.(adj) = 0.66 Deviance explained = 72.4%
#GCV = 7.0853e+06 Scale est. = 5.6652e+06 n = 60
#
plot(out,lwd=3)
r = resid(out)
plot(Mileage,r);abline(h=0)
plot(Weight,r);abline(h=0)
plot(Disp,r);abline(h=0)
plot(HP,r);abline(h=0)
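As a small usage note (my own addition; the covariate values below are made up for illustration), predictions from the fitted additive model are obtained with predict:

newcar = data.frame(Mileage=25,Weight=2800,Disp=150,HP=110)   ### hypothetical car
predict(out,newdata=newcar,se.fit=TRUE)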
[Figure: Pairs plot of Price, Mileage, Weight, Disp and HP for the car data.]
[Figure: Estimated smooth terms s(Mileage, 4.33), s(Weight, 1), s(Disp, 1) and s(HP, 4.7) from the fitted additive model.]
[Figure: Residuals r plotted against Mileage, Weight, Disp and HP.]