Stata_Session_10_2
Ruslan Aliyev
We must be careful here. Both invpc and price have upward trends.
twoway (scatter linvpc year) (scatter lprice year) (lfit linvpc year) (lfit lprice year)
[Figure: scatter plots of log(invpc) and log(price) against year, each with its fitted linear trend]
______________________________________________________________________________________________________________
No part of this material may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording,
or other electronic or mechanical methods, without the prior written permission of the publisher.
Econometrics II School of Business
Ruslan Aliyev
The previous regression results show a spurious relationship between invpc and price due to the fact that both
variables are trending upward over time. To account for the trending behavior of the variables, we add a time
trend:
regress linvpc t lprice
      Source |       SS       df       MS              Number of obs =      42
-------------+------------------------------           F(  2,    39) =   10.08
       Model |  .415945108     2  .207972554           Prob > F      =  0.0003
    Residual |  .804674927    39   .02063269           R-squared     =  0.3408
-------------+------------------------------           Adj R-squared =  0.3070
       Total |  1.22062003    41   .02977122           Root MSE      =  .14364
The story is much different now: the estimated price elasticity is negative and not statistically different from zero.
The time trend is statistically significant, and its coefficient implies an approximate 1% increase in invpc per year,
on average. From this analysis, we cannot conclude that real per capita housing investment is influenced at all by
price. There are other factors, captured in the time trend, that affect invpc, but we have not modeled these.
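The mechanism behind the spurious result can be illustrated outside Stata with a small, hypothetical example. The Python sketch below uses synthetic data (not the housing data): x and y both trend upward, but their deviations from trend are unrelated by construction. Regressing y on x alone gives a sizable slope, while adding t, as in regress linvpc t lprice, drives the coefficient on x to zero. The helper ols() is written here for illustration only; it solves the OLS normal equations directly and is not part of any library.

```python
# Hypothetical synthetic data (not the housing data): x and y share an upward
# trend, but their deviations from that trend are unrelated by construction.


def ols(y, cols):
    """OLS coefficients [const, b1, ..., bk] of y on a constant and the columns in cols."""
    X = [[1.0] + [float(c[i]) for c in cols] for i in range(len(y))]
    k = len(X[0])
    # Normal equations: (X'X) b = X'y
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    v = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p], v[i], v[p] = A[p], A[i], v[p], v[i]
        for r in range(k):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
                v[r] -= f * v[i]
    return [v[i] / A[i][i] for i in range(k)]


n = 40
t = list(range(1, n + 1))
# Deviation patterns repeat every 4 periods; over each block they are orthogonal
# to a constant, to t and to each other.
dev_x = [(1, -1, -1, 1)[(s - 1) % 4] for s in t]
dev_y = [(1, -3, 3, -1)[(s - 1) % 4] for s in t]
x = [s + d for s, d in zip(t, dev_x)]          # plays the role of lprice
y = [0.5 * s + d for s, d in zip(t, dev_y)]    # plays the role of linvpc

slope_no_trend = ols(y, [x])[1]        # analogue of: regress y x
slope_with_trend = ols(y, [t, x])[2]   # analogue of: regress y t x
print(f"coefficient on x without trend: {slope_no_trend:.4f}")
print(f"coefficient on x with trend:    {slope_with_trend:.4f}")
```

With these data the first coefficient equals 2665/5370 (about 0.496), and the second is zero up to rounding error. The deviation patterns were chosen to be exactly orthogonal to a constant, to t and to each other, which makes the contrast exact rather than approximate.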
We can also add a linear time trend to the fertility equation (fertil3.dta) estimated earlier.
regress gfr pe ww2 pill t
      Source |       SS       df       MS              Number of obs =      72
-------------+------------------------------           F(  4,    67) =   32.84
       Model |  18441.2357     4  4610.30894           Prob > F      =  0.0000
    Residual |  9406.65967    67  140.397905           R-squared     =  0.6622
-------------+------------------------------           Adj R-squared =  0.6420
       Total |  27847.8954    71  392.223879           Root MSE      =  11.849
The coefficient on pe is more than triple the estimate from the model without a time trend, and it is much more
statistically significant. Interestingly, pill is not significant once we allow for a linear trend. As the estimated
coefficient on t indicates, gfr was falling, on average, over this period, other factors being equal.
twoway (scatter gfr year) (lfit gfr year)
[Figure: gfr plotted against year with its fitted linear trend]
Since the general fertility rate exhibited both upward and downward trends during the period from 1913 through
1984, we can check how robust the estimated effect of pe is by using a quadratic trend:
regress gfr pe ww2 pill t c.t#c.t, vsquish
The coefficient on pe is even larger and more statistically significant. Now, pill has the expected negative effect
and is marginally significant, and both trend terms are statistically significant. The quadratic trend is a flexible
way to account for the unusual trending behavior of gfr.
5. Detrending
We can apply linear detrending to the previous example: each variable is regressed on t, and its residuals are saved as the detrended series:
regress gfr t
predict gfr_dt, residuals
regress pe t
predict pe_dt, residuals
regress ww2 t
predict ww2_dt, residuals
regress pill t
predict pill_dt, residuals
regress gfr_dt pe_dt ww2_dt pill_dt
As we can see, the regression with detrended variables yields exactly the same slope coefficients as the regression
that includes a time trend variable. This is an instance of the Frisch-Waugh-Lovell theorem: partialling t out of
every variable first and then regressing gives the same slopes as including t directly. The R-squared and adjusted
R-squared, however, differ between the two models. A trending dependent variable has an inflated total variation
(SST), and in the regression with a time trend much of that variation is explained by t, so the R-squared measures
are high. In the regression with detrended variables the trend has already been removed from gfr, so R-squared and
adjusted R-squared are smaller; they measure how much of the variation around the trend is explained by pe, ww2 and pill.
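Both claims, identical slopes and a lower R-squared after detrending, can be checked numerically. The Python sketch below is hypothetical: synthetic trending series x1, x2 and y stand in for pe, pill and gfr, and the helpers ols(), residuals() and r_squared() are written here for illustration rather than taken from a library.

```python
import math

# Hypothetical trending series standing in for gfr (y), pe (x1) and pill (x2).


def ols(y, cols):
    """OLS coefficients [const, b1, ..., bk] of y on a constant and the columns in cols."""
    X = [[1.0] + [float(c[i]) for c in cols] for i in range(len(y))]
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    v = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p], v[i], v[p] = A[p], A[i], v[p], v[i]
        for r in range(k):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
                v[r] -= f * v[i]
    return [v[i] / A[i][i] for i in range(k)]


def residuals(y, cols):
    """Residuals from regressing y on a constant and cols (like predict ..., residuals)."""
    b = ols(y, cols)
    return [y[i] - b[0] - sum(bj * c[i] for bj, c in zip(b[1:], cols)) for i in range(len(y))]


def r_squared(y, cols):
    """R-squared = 1 - SSR/SST for the regression of y on a constant and cols."""
    e = residuals(y, cols)
    ybar = sum(y) / len(y)
    return 1 - sum(ei ** 2 for ei in e) / sum((yi - ybar) ** 2 for yi in y)


t = list(range(1, 73))
x1 = [0.3 * s + 5 * math.sin(0.5 * s) for s in t]
x2 = [float(s % 7) for s in t]
y = [2 + 0.8 * a - 1.5 * c - 0.4 * s + 3 * math.cos(0.9 * s) for s, a, c in zip(t, x1, x2)]

# (1) Include the trend as a regressor, like: regress y x1 x2 t
b_trend = ols(y, [x1, x2, t])
r2_trend = r_squared(y, [x1, x2, t])

# (2) Detrend each variable on t first, then regress, like: regress y_dt x1_dt x2_dt
y_dt, x1_dt, x2_dt = (residuals(v, [t]) for v in (y, x1, x2))
b_dt = ols(y_dt, [x1_dt, x2_dt])
r2_dt = r_squared(y_dt, [x1_dt, x2_dt])

print("slopes with trend:   ", [round(b, 6) for b in b_trend[1:3]])
print("slopes on detrended: ", [round(b, 6) for b in b_dt[1:3]])
print(f"R-squared with trend: {r2_trend:.4f}, detrended: {r2_dt:.4f}")
```

The slopes agree up to rounding error and the constant of the detrended regression is essentially zero. Both regressions leave the same residuals, so the lower detrended R-squared comes entirely from the smaller SST of the detrended dependent variable.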
We can also see how linear detrending works by plotting the original and the detrended gfr series:
twoway (line gfr t) (line gfr_dt t)
We can also apply nonlinear (here, quadratic) detrending to the previous example:
regress gfr t c.t#c.t
predict gfr_dtt, residuals
regress pe t c.t#c.t
predict pe_dtt, residuals
regress ww2 t c.t#c.t
predict ww2_dtt, residuals
regress pill t c.t#c.t
predict pill_dtt, residuals
regress gfr_dtt pe_dtt ww2_dtt pill_dtt
Now the coefficients from the regression with detrended variables are exactly the same as the coefficients from
the regression that includes t and t squared.
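Again, let's see visually how quadratic and cubic detrending change the gfr series.
* gfr_dtt (quadratic residuals) was created above, so it is plotted directly
twoway (line gfr t) (line gfr_dtt t)
* cubic detrending; gfr_dttt is a new variable name chosen here
regress gfr t c.t#c.t c.t#c.t#c.t
predict gfr_dttt, residuals
twoway (line gfr t) (line gfr_dttt t)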
[Figure: two panels, each plotting gfr (births per 1000 women 15-44) and the detrending residuals against the time trend t = 1,...,72]
Since gfr exhibits cyclical behavior (periods of decline and increase alternate over the sample), the model with a
cubic time trend detrends it better than the quadratic one.
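The gain from a cubic over a quadratic trend can be quantified with the residual sum of squares. The hypothetical Python sketch below detrends a synthetic cyclical series of 72 periods (not the gfr data) with quadratic and cubic trends. Time is rescaled to s = t/72 to keep the normal equations well conditioned; rescaling t does not change the residuals. Because the quadratic model is nested in the cubic one, the cubic residual sum of squares can never be larger, so the interesting question is how much smaller it is. The helpers ols(), residuals() and detrend() are illustrative.

```python
import math

# Hypothetical cyclical series over 72 periods (not the gfr data).


def ols(y, cols):
    """OLS coefficients [const, b1, ..., bk] of y on a constant and the columns in cols."""
    X = [[1.0] + [float(c[i]) for c in cols] for i in range(len(y))]
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    v = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p], v[i], v[p] = A[p], A[i], v[p], v[i]
        for r in range(k):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
                v[r] -= f * v[i]
    return [v[i] / A[i][i] for i in range(k)]


def residuals(y, cols):
    """Residuals from regressing y on a constant and cols."""
    b = ols(y, cols)
    return [y[i] - b[0] - sum(bj * c[i] for bj, c in zip(b[1:], cols)) for i in range(len(y))]


n = 72
t = list(range(1, n + 1))
s = [ti / n for ti in t]   # rescaled time: same residuals as using t, better conditioning
y = [90 + 25 * math.sin(2 * math.pi * ti / 60) for ti in t]


def detrend(series, degree):
    """Residuals after removing a polynomial time trend of the given degree."""
    powers = [[si ** p for si in s] for p in range(1, degree + 1)]
    return residuals(series, powers)


ssr = {d: sum(e ** 2 for e in detrend(y, d)) for d in (2, 3)}
print(f"quadratic detrending: residual sum of squares = {ssr[2]:.1f}")
print(f"cubic detrending:     residual sum of squares = {ssr[3]:.1f}")
```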
6. Seasonality
The data in gasoline.dta are used to illustrate the seasonality problem. The data contain monthly gasoline sales in
the USA. First, we declare the time variable with the tsset command.
tsset t
time variable: t, 1983m1 to 2009m11
delta: 1 month
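As we can see, Stata automatically recognizes the data as monthly and reports the time span of the sample. In the
following commands, pay attention to how the time interval is specified with the tin() function. Stata stores dates in
a special format (monthly dates are integers counting months since 1960m1, displayed as 1990m1 and so on), so time
periods require a distinct approach: tin() lets us write the interval in its displayed form.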
line gasoline t if tin(1990m1, 1990m12)
line gasoline t if tin(1991m1, 1991m12)
line gasoline t if tin(1992m1, 1992m12)
line gasoline t if tin(2000m1, 2000m12)
[Figure: four panels plotting monthly gasoline sales (gasoline) against t for 1990, 1991, 1992 and 2000]
Even visually, one can identify a seasonal pattern that recurs in the series across different years.
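The date arithmetic behind tsset, tin() and, later, month(dofm(t)) can be sketched in a few lines. Stata stores monthly (%tm) dates as integers counting months since 1960m1 (1960m1 = 0) and only displays them as 1990m1 and so on. The Python functions below are hypothetical analogues of Stata's tm(), month(dofm()), year(dofm()) and tin(), written for illustration.

```python
# Hypothetical analogues of Stata's monthly-date functions (illustration only).
# Stata stores a %tm date as the number of months since 1960m1, so 1960m1 = 0.


def tm(year, month):
    """Like tm(YYYYmM): months elapsed since 1960m1."""
    return (year - 1960) * 12 + (month - 1)


def month_of(t):
    """Like month(dofm(t)): calendar month, 1 to 12."""
    return t % 12 + 1


def year_of(t):
    """Like year(dofm(t)): calendar year."""
    return 1960 + t // 12


def tin(t, start, end):
    """Like tin(start, end) for a monthly time variable: start <= t <= end."""
    return start <= t <= end


first, last = tm(1983, 1), tm(2009, 11)   # the range reported by tsset
print(first, last, last - first + 1)      # 276 598 323

# tsappend, add(36) extends t by 36 months past the last observation
new_first, new_last = last + 1, last + 36
print(year_of(new_first), month_of(new_first), year_of(new_last), month_of(new_last))  # 2009 12 2012 11

# The observations selected by: line gasoline t if tin(1990m1, 1990m12)
selected = [t for t in range(first, last + 1) if tin(t, tm(1990, 1), tm(1990, 12))]
print(len(selected), [month_of(t) for t in selected][:3])   # 12 [1, 2, 3]
```

The 36 appended months therefore run from 2009m12 to 2012m11, which is exactly the interval used later in tin(2009m12, 2012m11).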
The original gasoline series can be divided into two parts: the seasonal part (gasoline_hat, the values fitted by the
seasonal model; red line) and the seasonally adjusted part (gasoline_ds, the residuals; blue line):
twoway (line gasoline_ds t) (line gasoline_hat t)
[Figure: gasoline_ds (blue) and gasoline_hat (red) plotted against t]
We can also plot all three pieces together: the actual values (green line) are the sum of two parts, 1) the seasonal
part (red line, the values fitted by the model) and 2) the deseasonalized values (blue line, the residuals).
[Figure: gasoline (green), gasoline_hat (red) and gasoline_ds (blue) plotted against t]
Now let's use this model to predict future values of gasoline sales. Since we do not have other explanatory
variables, the model explains the variation in gasoline sales only with the seasonal (monthly) dummies.
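With a constant and eleven month dummies, the fitted value for each observation is simply the sample mean of the series in that calendar month, so a forecast from this model repeats the same twelve monthly averages every year. The Python sketch below checks this on a hypothetical monthly series (synthetic, not the gasoline data). The helper ols() is illustrative, md mirrors the dummies created by tabulate mon, generate(md), and December is assumed to be the omitted base month.

```python
# Hypothetical monthly series: 5 years starting in January (not the gasoline data).


def ols(y, cols):
    """OLS coefficients [const, b1, ..., bk] of y on a constant and the columns in cols."""
    X = [[1.0] + [float(c[i]) for c in cols] for i in range(len(y))]
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    v = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p], v[i], v[p] = A[p], A[i], v[p], v[i]
        for r in range(k):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
                v[r] -= f * v[i]
    return [v[i] / A[i][i] for i in range(k)]


n = 60
month = [i % 12 + 1 for i in range(n)]
pattern = [-3000, -3500, -500, 500, 1500, 2500, 3000, 2800, 500, 0, -1500, -2300]
y = [60000 + pattern[m - 1] + 400 * ((7 * i) % 5 - 2) for i, m in enumerate(month)]

# md1..md11 mirror the dummies from: tabulate mon, generate(md); December is the base
md = [[1.0 if m == j else 0.0 for m in month] for j in range(1, 12)]
b = ols(y, md)   # b[0] = constant, b[j] = coefficient on md{j}

y_hat = [b[0] + (b[m] if m <= 11 else 0.0) for m in month]   # seasonal part (like gasoline_hat)
y_ds = [yi - fi for yi, fi in zip(y, y_hat)]                 # seasonally adjusted part (like gasoline_ds)

month_mean = {m: sum(yi for yi, mi in zip(y, month) if mi == m) / month.count(m) for m in range(1, 13)}
print("fitted, Jan-Dec:", [round(f) for f in y_hat[:12]])
print("means,  Jan-Dec:", [round(month_mean[m]) for m in range(1, 13)])
```

The two printed rows coincide, which is why the dummy-only forecast has no trend: it can only repeat the historical monthly averages.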
We can add extra periods to the data with the tsappend command. We then have to extend the month dummies to cover the
additional periods.
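tsappend, add(36)
* drop month variables created earlier (if any) so they can be re-created for all periods
capture drop mon
capture drop md1-md12
g mon = month(dofm(t))
tabulate mon, generate(md)
regress gasoline md1 md2 md3 md4 md5 md6 md7 md8 md9 md10 md11
predict gasoline_n
line gasoline_n t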
[Figure: fitted values (gasoline_n) from the model with monthly dummies, plotted against t]
We can combine the actual values up to 2009m11 with the predicted values afterwards:
gen gasoline_nn=gasoline
replace gasoline_nn=gasoline_n if tin(2009m12, 2012m11)
line gasoline_nn t
[Figure: gasoline_nn (actual values through 2009m11, seasonal-dummy forecasts afterwards) plotted against t]
These predictions ignore the non-seasonal trend observed in the original series. It is more appropriate to allow for
both a nonlinear (here, quadratic) trend and seasonality when making predictions.
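The logic of combining seasonal dummies with a quadratic trend, and of splicing actual values with forecasts as in gasoline_nn, can be sketched as follows. The Python example below is hypothetical: it generates a noise-free monthly series from a quadratic trend plus a fixed seasonal pattern, estimates the model on the first 48 months only, and forecasts 36 appended months. Because the synthetic data contain no noise, the forecasts should reproduce the known continuation, which makes the sketch easy to check. Time is measured in years (s = t/12) to keep the normal equations well conditioned; the helpers ols(), truth() and regressors() are illustrative.

```python
# Hypothetical, noise-free monthly data: quadratic trend plus a fixed seasonal pattern.


def ols(y, cols):
    """OLS coefficients [const, b1, ..., bk] of y on a constant and the columns in cols."""
    X = [[1.0] + [float(c[i]) for c in cols] for i in range(len(y))]
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    v = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p], v[i], v[p] = A[p], A[i], v[p], v[i]
        for r in range(k):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
                v[r] -= f * v[i]
    return [v[i] / A[i][i] for i in range(k)]


n_obs, n_add = 48, 36                       # 4 observed years, 36 appended months
t = list(range(1, n_obs + n_add + 1))
month = [(ti - 1) % 12 + 1 for ti in t]     # t = 1 is a January
pattern = [-3000, -3500, -500, 500, 1500, 2500, 3000, 2800, 500, 0, -1500, -2300]


def truth(ti, m):
    """Data-generating process; s = t/12 is time in years."""
    s = ti / 12
    return 50000 + 2500 * s - 150 * s ** 2 + pattern[m - 1]


def regressors(ti, m):
    """md1..md11 (December is the base), then the trend terms s and s^2."""
    s = ti / 12
    return [1.0 if m == j else 0.0 for j in range(1, 12)] + [s, s ** 2]


y = [truth(ti, m) for ti, m in zip(t[:n_obs], month[:n_obs])]   # only these are observed
rows = [regressors(ti, m) for ti, m in zip(t, month)]
cols = [[row[k] for row in rows[:n_obs]] for k in range(13)]     # estimation sample only
b = ols(y, cols)

# Like predict: fitted values for every period, including the appended ones
y_n = [b[0] + sum(bk * xk for bk, xk in zip(b[1:], row)) for row in rows]
# Like gasoline_nn: actual values in-sample, forecasts afterwards
y_nn = y + y_n[n_obs:]
print([round(v) for v in y_nn[n_obs:n_obs + 12]])
```

Unlike the dummy-only model, the forecasts here keep following the trend, so successive Januaries (or Julys) in the appended period differ from one another.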
drop gasoline_n
regress gasoline md1 md2 md3 md4 md5 md6 md7 md8 md9 md10 md11 t c.t#c.t
predict gasoline_n
line gasoline_n t
[Figure: fitted values (gasoline_n) from the model with monthly dummies and a quadratic trend, plotted against t]
Again, we combine the actual values up to 2009m11 with the improved predictions afterwards.
* gasoline_nn already exists from the previous forecast, so overwrite it with replace
replace gasoline_nn=gasoline
replace gasoline_nn=gasoline_n if tin(2009m12, 2012m11)
line gasoline_nn t
[Figure: gasoline_nn (actual values through 2009m11, trend-and-seasonality forecasts afterwards) plotted against t]