
Econometrics II School of Business

Ruslan Aliyev

STATA NOTES 10.2

Basic Regression Analysis with Time Series Data

4. Trending Time Series


The data in hseinv are annual observations on housing investment and a housing price index in the United States
for 1947 through 1988. Let invpc denote real per capita housing investment (in thousands of dollars) and price a
housing price index (equal to 1 in 1982). First, we estimate a static model of log(invpc) on log(price):
regress linvpc lprice
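      Source |       SS       df       MS              Number of obs =      42
-------------+------------------------------           F(  1,    40) =   10.53
       Model |  .254364468     1  .254364468           Prob > F      =  0.0024
    Residual |  .966255566    40  .024156389           R-squared     =  0.2084
-------------+------------------------------           Adj R-squared =  0.1886
       Total |  1.22062003    41   .02977122           Root MSE      =  .15542

------------------------------------------------------------------------------
      linvpc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lprice |   1.240943   .3824192     3.24   0.002     .4680452    2.013841
       _cons |  -.5502345   .0430266   -12.79   0.000    -.6371945   -.4632746
------------------------------------------------------------------------------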

We must be careful here. Both invpc and price have upward trends.
twoway (scatter linvpc year) (scatter lprice year) (lfit linvpc year) (lfit lprice year)
[Figure: log(invpc) and log(price) plotted against year, 1947-1988, each with its fitted linear trend]

______________________________________________________________________________________________________________
No part of this material may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording,
or other electronic or mechanical methods, without the prior written permission of the publisher.


The positive and significant elasticity in the previous regression may reflect a spurious relationship: both invpc
and price trend upward over time, so the regression can pick up the common trend rather than an effect of price.
To account for the trending behavior of the variables, we add a linear time trend t:
regress linvpc t lprice
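      Source |       SS       df       MS              Number of obs =      42
-------------+------------------------------           F(  2,    39) =   10.08
       Model |  .415945108     2  .207972554           Prob > F      =  0.0003
    Residual |  .804674927    39   .02063269           R-squared     =  0.3408
-------------+------------------------------           Adj R-squared =  0.3070
       Total |  1.22062003    41   .02977122           Root MSE      =  .14364

------------------------------------------------------------------------------
      linvpc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           t |   .0098287   .0035122     2.80   0.008     .0027246    .0169328
      lprice |  -.3809612   .6788352    -0.56   0.578    -1.754035    .9921125
       _cons |  -.9130595   .1356133    -6.73   0.000    -1.187363   -.6387557
------------------------------------------------------------------------------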

The story is much different now: the estimated price elasticity is negative and not statistically different from zero.
The time trend is statistically significant, and its coefficient implies an approximate 1% increase in invpc per year,
on average. From this analysis, we cannot conclude that real per capita housing investment is influenced at all by
price. There are other factors, captured in the time trend, that affect invpc, but we have not modeled these.
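The reversal above is the classic symptom of a spurious regression on trending data. A minimal Python sketch (standard library only) reproduces the pattern under stated assumptions: two hypothetical simulated series that share nothing but a linear upward trend. Regressing one on the other gives a large slope; adding the trend, the analogue of `regress linvpc t lprice`, pushes the slope toward zero. The helper `ols` is an illustrative normal-equations solver written for this sketch, not a library function, and the trend is rescaled to t/n only to keep the arithmetic well conditioned.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y (Gauss-Jordan with pivoting)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 400
t = [i + 1 for i in range(n)]
# Hypothetical series: independent noise around two unrelated upward trends
x = [0.02 * ti + rng.gauss(0, 0.05) for ti in t]
y = [0.01 * ti + rng.gauss(0, 0.05) for ti in t]

# Static model, like: regress y x
b_static = ols([[1.0, xi] for xi in x], y)
# Add a (rescaled) linear trend, like: regress y t x
b_trend = ols([[1.0, ti / n, xi] for ti, xi in zip(t, x)], y)

print(f"slope without trend: {b_static[1]:.3f}")  # near 0.5, driven only by the shared trend
print(f"slope with trend:    {b_trend[2]:.3f}")   # near 0, the true partial effect
```

Rescaling t changes only the coefficient on the trend itself; the coefficient on x is the same as with the raw t.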
We can also add a linear time trend to the fertility equation (fertil3.dta) estimated earlier.
regress gfr pe ww2 pill t
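      Source |       SS       df       MS              Number of obs =      72
-------------+------------------------------           F(  4,    67) =   32.84
       Model |  18441.2357     4  4610.30894           Prob > F      =  0.0000
    Residual |  9406.65967    67  140.397905           R-squared     =  0.6622
-------------+------------------------------           Adj R-squared =  0.6420
       Total |  27847.8954    71  392.223879           Root MSE      =  11.849

------------------------------------------------------------------------------
         gfr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          pe |   .2788778   .0400199     6.97   0.000     .1989978    .3587578
         ww2 |  -35.59228   6.297377    -5.65   0.000     -48.1619   -23.02266
        pill |   .9974479    6.26163     0.16   0.874    -11.50082    13.49571
           t |  -1.149872   .1879038    -6.12   0.000    -1.524929   -.7748145
       _cons |   111.7694   3.357765    33.29   0.000     105.0673    118.4716
------------------------------------------------------------------------------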

The coefficient on pe is more than triple the estimate from the model without a time trend, and it is much more
statistically significant. Interestingly, pill is not significant once we allow for a linear trend. The negative
coefficient on t indicates that gfr was falling, on average, over this period, other factors being equal.
twoway (scatter gfr year) (lfit gfr year)


[Figure: gfr (births per 1000 women 15-44) plotted against year, 1913 to 1984, with its fitted linear trend]

Since the general fertility rate trended both upward and downward between 1913 and 1984, we check how robust the
estimated effect of pe is by using a quadratic trend instead:
regress gfr pe ww2 pill t c.t#c.t, vsquish

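      Source |       SS       df       MS              Number of obs =      72
-------------+------------------------------           F(  5,    66) =   35.09
       Model |  20236.3981     5  4047.27961           Prob > F      =  0.0000
    Residual |  7611.49734    66  115.325717           R-squared     =  0.7267
-------------+------------------------------           Adj R-squared =  0.7060
       Total |  27847.8954    71  392.223879           Root MSE      =  10.739

------------------------------------------------------------------------------
         gfr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          pe |   .3478126   .0402599     8.64   0.000     .2674311     .428194
         ww2 |  -35.88028   5.707921    -6.29   0.000    -47.27651   -24.48404
        pill |  -10.11972   6.336094    -1.60   0.115    -22.77014    2.530696
           t |  -2.531426   .3893863    -6.50   0.000    -3.308861   -1.753991
     c.t#c.t |   .0196126    .004971     3.95   0.000     .0096876    .0295377
       _cons |   124.0919   4.360738    28.46   0.000     115.3854    132.7984
------------------------------------------------------------------------------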

The coefficient on pe is even larger and more statistically significant. Now pill has the expected negative effect
and is marginally significant (the two-sided p-value is 0.115, so about 0.06 against the one-sided alternative of a
negative effect), and both trend terms are statistically significant. The quadratic trend is a flexible way to
account for the unusual trending behavior of gfr.
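With a negative coefficient on t and a positive one on t squared, the estimated trend is U-shaped. Setting the derivative of the trend part, -2.531426 + 2(0.0196126)t, to zero gives its turning point. The short Python computation below does this with the estimates from the table, assuming (as the gfr plot suggests) that t = 1 corresponds to 1913. It describes only the trend component with pe, ww2 and pill held fixed, not the path of gfr itself.

```python
# Estimates from: regress gfr pe ww2 pill t c.t#c.t
b_t = -2.531426   # coefficient on t
b_tt = 0.0196126  # coefficient on c.t#c.t (t squared)

# d(b_t*t + b_tt*t^2)/dt = b_t + 2*b_tt*t = 0  ->  t* = -b_t / (2*b_tt)
t_star = -b_t / (2 * b_tt)
year_star = 1913 + (t_star - 1)  # assumes t = 1 in 1913 (sample 1913-1984, t = 1,...,72)

print(round(t_star, 1), round(year_star))  # → 64.5 1977
```

So, other factors fixed, the fitted trend falls until roughly the mid-1970s and turns up only near the end of the sample; extrapolating that upturn beyond 1984 would be risky.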


5. Detrending
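We can apply linear detrending to the fertility example: regress each variable on t, save the residuals, and then
regress the detrended gfr on the detrended explanatory variables: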
regress gfr t
predict gfr_dt, residuals
regress pe t
predict pe_dt, residuals
regress ww2 t
predict ww2_dt, residuals
regress pill t
predict pill_dt, residuals
regress gfr_dt pe_dt ww2_dt pill_dt

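      Source |       SS       df       MS              Number of obs =      72
-------------+------------------------------           F(  3,    68) =   25.00
       Model |  10374.1845     3  3458.06148           Prob > F      =  0.0000
    Residual |  9406.65981    68  138.333233           R-squared     =  0.5245
-------------+------------------------------           Adj R-squared =  0.5035
       Total |  19780.8443    71   278.60344           Root MSE      =  11.762

------------------------------------------------------------------------------
      gfr_dt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pe_dt |   .2788778   .0397245     7.02   0.000     .1996088    .3581468
      ww2_dt |  -35.59228   6.250901    -5.69   0.000    -48.06576    -23.1188
     pill_dt |   .9974471   6.215418     0.16   0.873    -11.40523    13.40012
       _cons |  -7.33e-08   1.386108    -0.00   1.000    -2.765935    2.765935
------------------------------------------------------------------------------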

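As we can see, the regression with detrended variables gives exactly the same coefficients as the regression that
includes a time trend, and the same sum of squared residuals (9406.66). This is an application of the
Frisch-Waugh-Lovell theorem. The standard errors differ slightly because the detrended regression does not count the
degree of freedom used up by detrending (68 residual degrees of freedom instead of 67).
The R-squared and adjusted R-squared, however, differ noticeably. The trend inflates the total variation (SST) of the
dependent variable: with t included as a regressor, SST = 27847.90 and the variation explained by the trend counts
toward R-squared (0.6622). After detrending, SST falls to 19780.84, so the same SSR yields a lower R-squared (0.5245)
and adjusted R-squared (0.5035). The R-squared from the detrended regression is arguably the better measure of how
well pe, ww2, and pill explain gfr.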
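The coefficient equality can be checked numerically. A minimal Python sketch on hypothetical simulated data (not fertil3.dta) compares, for a linear and a quadratic trend, the slope on x from a regression that includes the trend terms with the slope from regressing detrended y on detrended x. The helpers `ols`, `residuals` and `poly` are illustrative, and the trend is rescaled to s = t/n only for numerical stability.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y (Gauss-Jordan with pivoting)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def residuals(X, y):
    """Residuals from regressing y on the columns of X (the analogue of predict, residuals)."""
    b = ols(X, y)
    return [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]


def poly(s, degree):
    """Constant plus trend terms: 1, s, s^2, ..., s^degree."""
    return [s ** p for p in range(degree + 1)]


rng = random.Random(0)
n = 72
s = [(i + 1) / n for i in range(n)]  # rescaled time trend
# Hypothetical regressor and outcome that both trend
x = [3 * si + rng.gauss(0, 1) for si in s]
y = [10 - 5 * si + 8 * si * si + 2 * xi + rng.gauss(0, 1) for si, xi in zip(s, x)]

results = {}
for degree in (1, 2):
    # (1) Trend terms included as regressors, like: regress y x t (c.t#c.t)
    b_full = ols([poly(si, degree) + [xi] for si, xi in zip(s, x)], y)
    # (2) Detrend y and x separately, then regress residuals on residuals
    trend = [poly(si, degree) for si in s]
    y_dt, x_dt = residuals(trend, y), residuals(trend, x)
    b_dt = ols([[1.0, xd] for xd in x_dt], y_dt)
    results[degree] = (b_full[-1], b_dt[1])
    print(degree, round(b_full[-1], 6), round(b_dt[1], 6))
```

The detrending regressions must include the constant (the s^0 term) for the equality to hold; the intercept of the residual regression is then zero up to rounding, like the -7.33e-08 reported by Stata.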
We can also see how linear detrending works by plotting the original and the detrended gfr series:
twoway (line gfr t) (line gfr_dt t)


We can also apply nonlinear, i.e. quadratic, detrending to the same example:
regress gfr t c.t#c.t
predict gfr_dtt, residuals
regress pe t c.t#c.t
predict pe_dtt, residuals
regress ww2 t c.t#c.t
predict ww2_dtt, residuals
regress pill t c.t#c.t
predict pill_dtt, residuals
regress gfr_dtt pe_dtt ww2_dtt pill_dtt

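      Source |       SS       df       MS              Number of obs =      72
-------------+------------------------------           F(  3,    68) =   34.21
       Model |  11489.0424     3  3829.68079           Prob > F      =  0.0000
    Residual |  7611.49738    68  111.933785           R-squared     =  0.6015
-------------+------------------------------           Adj R-squared =  0.5839
       Total |  19100.5397    71  269.021686           Root MSE      =   10.58

------------------------------------------------------------------------------
     gfr_dtt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      pe_dtt |   .3478126   .0396634     8.77   0.000     .2686655    .4269597
     ww2_dtt |  -35.88028   5.623355    -6.38   0.000    -47.10151   -24.65905
    pill_dtt |  -10.11972   6.242221    -1.62   0.110    -22.57588    2.336434
       _cons |   1.61e-08    1.24685     0.00   1.000    -2.488051    2.488051
------------------------------------------------------------------------------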

Again, the coefficients from the regression with detrended variables are exactly the same as those from the
regression that includes t and t squared.
Let's also see visually how quadratic and cubic detrending change the gfr series. Since gfr_dtt was already created
above, it must be dropped before predict can create it again:
drop gfr_dtt
regress gfr t c.t#c.t
predict gfr_dtt, residuals
twoway (line gfr t) (line gfr_dtt t)

regress gfr t c.t#c.t c.t#c.t#c.t
predict gfr_dttt, residuals
twoway (line gfr t) (line gfr_dttt t)
[Figure, two panels plotted against the time trend t = 1,...,72: gfr (births per 1000 women 15-44) with the residuals
from quadratic detrending (left) and from cubic detrending (right)]

Since gfr exhibits cyclical behavior, rising and falling more than once over the sample, the cubic trend, which can
change direction twice, detrends the series better than the quadratic one.
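One caveat: the cubic trend nests the quadratic (which nests the linear), so the in-sample sum of squared residuals after detrending can never increase as trend terms are added. A short Python sketch shows this on a hypothetical series that rises and falls (not the gfr data); `ssr_after_detrending` and `ols` are illustrative helpers. A smaller in-sample SSR alone does not prove that the higher-order trend is the right model, which is why the visual check of the residuals is useful.

```python
import math


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y (Gauss-Jordan with pivoting)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def ssr_after_detrending(y, degree):
    """Sum of squared residuals after regressing y on 1, s, ..., s^degree (s = rescaled t)."""
    n = len(y)
    X = [[((i + 1) / n) ** p for p in range(degree + 1)] for i in range(n)]
    b = ols(X, y)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yi in zip(X, y))


# Hypothetical cyclical series over 72 periods (not the gfr data)
y = [90 + 30 * math.sin(2 * math.pi * (i + 1) / 50) for i in range(72)]
ssr = {d: ssr_after_detrending(y, d) for d in (1, 2, 3)}
for d, v in ssr.items():
    print(f"degree {d}: SSR = {v:.1f}")
```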


6. Seasonality
The data in gasoline.dta, monthly gasoline sales in the USA, are used to illustrate seasonality. First, we declare the
time variable with the tsset command:
tsset t
time variable: t, 1983m1 to 2009m11
delta: 1 month

As we can see, Stata recognizes the data as monthly and reports the sample period. In the following commands, note
how a time interval is specified with the tin() function: because Stata stores time variables as numbers with a
special display format, tin() lets us write the interval endpoints as dates such as 1990m1.
line gasoline t if tin(1990m1, 1990m12)
line gasoline t if tin(1991m1, 1991m12)
line gasoline t if tin(1992m1, 1992m12)
line gasoline t if tin(2000m1, 2000m12)

[Figure, four panels: monthly gasoline sales (vertical axis: gasoline; horizontal axis: t) in 1990, 1991, 1992 and 2000]

Even visually, one can identify a similar seasonal pattern in each of these years.
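The tin() conditions used above work because Stata stores a monthly date as an integer count of months since 1960m1 (so 1960m1 is 0) and only displays it as 1990m1 and so on. The Python sketch below mimics that encoding and a closed-interval filter in the spirit of tin(); `tm` and `tin` here are hypothetical helpers, not Stata's own implementation.

```python
import re


def tm(label):
    """Monthly date as months elapsed since 1960m1, e.g. '1960m1' -> 0 (Stata's %tm encoding)."""
    match = re.fullmatch(r"(\d{4})m(\d{1,2})", label)
    if match is None:
        raise ValueError(f"not a monthly date: {label!r}")
    year, month = int(match.group(1)), int(match.group(2))
    return (year - 1960) * 12 + (month - 1)


def tin(t, start, end):
    """True when monthly date t lies in the closed interval [start, end], like Stata's tin()."""
    return tm(start) <= t <= tm(end)


sample = range(tm("1983m1"), tm("2009m11") + 1)  # the span reported by tsset
in_1990 = [t for t in sample if tin(t, "1990m1", "1990m12")]
print(tm("1983m1"), len(sample), len(in_1990))  # → 276 323 12
```

The 323 months in the sample match the number of observations in the month-dummy regression on gasoline.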


Regressing gasoline sales on monthly dummies (m1 through m11 are included, so the twelfth month is the omitted base
category) yields:
regress gasoline m1 m2 m3 m4 m5 m6 m7 m8 m9 m10 m11
      Source |       SS       df       MS              Number of obs =     323
-------------+------------------------------           F( 11,   311) =    4.29
       Model |   636131968    11  57830178.9           Prob > F      =  0.0000
    Residual |  4.1919e+09   311  13478698.6           R-squared     =  0.1318
-------------+------------------------------           Adj R-squared =  0.1010
       Total |  4.8280e+09   322  14993811.3           Root MSE      =  3671.3

------------------------------------------------------------------------------
    gasoline |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          m1 |  -4309.025   1008.773    -4.27   0.000    -6293.908   -2324.142
          m2 |  -1928.984   1008.773    -1.91   0.057    -3913.867     55.8983
          m3 |  -671.3583   1008.773    -0.67   0.506    -2656.241    1313.524
          m4 |   -829.025   1008.773    -0.82   0.412    -2813.908    1155.858
          m5 |  -210.0881   1008.773    -0.21   0.835    -2194.971    1774.795
          m6 |   1102.401   1008.773     1.09   0.275    -882.4817    3087.284
          m7 |   644.1563   1008.773     0.64   0.524    -1340.726    2629.039
          m8 |   1003.964   1008.773     1.00   0.320    -980.9188    2988.847
          m9 |  -822.2439   1008.773    -0.82   0.416    -2807.127    1162.639
         m10 |  -817.0473   1008.773    -0.81   0.419     -2801.93    1167.835
         m11 |  -1260.781   1008.773    -1.25   0.212    -3245.663     724.102
       _cons |   59735.28    720.008    82.96   0.000     58318.57    61151.98
------------------------------------------------------------------------------
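Because this regression has a constant and a dummy for every month except the base month, its fitted values are simply the monthly sample means: the constant is the mean of the base month, and each dummy coefficient is that month's mean minus the base-month mean. So _cons = 59735.28 is average sales in the omitted month. A Python sketch of this equivalence on hypothetical data; `dummy_coefficients` is an illustrative helper that computes the coefficients from group means rather than by running a regression.

```python
from statistics import mean


def dummy_coefficients(values, months, base=12):
    """Coefficients of y on a constant and month dummies, with month `base` omitted.

    Relies on the fact that OLS with a constant plus a full set of group dummies
    (minus one base group) fits the group means exactly.
    """
    means = {m: mean(v for v, mm in zip(values, months) if mm == m) for m in range(1, 13)}
    cons = means[base]
    return cons, {m: means[m] - cons for m in range(1, 13) if m != base}


# Hypothetical data: three years of monthly sales with a January dip and a July peak
months = [m for _ in range(3) for m in range(1, 13)]
values = [100 + 10 * (m == 7) - 5 * (m == 1) + year for year in range(3) for m in range(1, 13)]

cons, coef = dummy_coefficients(values, months)
print(cons, coef[1], coef[7], coef[3])
```

Here the base-month mean is 101, January lies 5 below it, July 10 above it, and months without a special effect get a coefficient of zero.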

We can use the following commands to deseasonalize the series:
predict gasoline_ds, residual
predict gasoline_hat

These commands split the original gasoline series into two parts: the seasonal part, i.e. the fitted values
gasoline_hat (red line), and the seasonally adjusted part, i.e. the residuals gasoline_ds (blue line):
twoway (line gasoline_ds t) (line gasoline_hat t)
[Figure: seasonally adjusted gasoline sales (Residuals) and the seasonal component (Fitted values), plotted against t,
1983m1 to 2009m11]


We can also put all three pieces together: the actual values (green line) are the sum of two parts, 1) the seasonal
part (red line, the modeled fitted values) and 2) the deseasonalized values (blue line, the residuals).

twoway (line gasoline_ds t) (line gasoline_hat t) (line gasoline t)


[Figure: Residuals, Fitted values and actual gasoline sales plotted against t, 1983m1 to 2009m11]
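The decomposition is an identity: each fitted value is its calendar month's mean, the residual is the deviation from that mean, and the two add back to the actual value. The residuals average zero, so one common convention (an assumption here, not a step in these notes) adds the overall sample mean back to express the seasonally adjusted series in the original units. A Python sketch on hypothetical data:

```python
from statistics import mean

# Hypothetical monthly sales for two years; observation 0 is a January
sales = [55, 57, 59, 59, 60, 61, 60, 61, 59, 59, 58, 60,
         56, 58, 60, 60, 61, 62, 61, 62, 60, 60, 59, 61]
months = [i % 12 + 1 for i in range(len(sales))]

# Seasonal part: the fitted value for each observation is its calendar month's mean
month_mean = {m: mean(s for s, mm in zip(sales, months) if mm == m) for m in range(1, 13)}
fitted = [month_mean[m] for m in months]        # like: predict gasoline_hat
resid = [s - f for s, f in zip(sales, fitted)]  # like: predict gasoline_ds, residual

# Optional convention: add the overall mean back so the adjusted series is in sales units
overall = mean(sales)
adjusted = [r + overall for r in resid]

print(all(abs(f + r - s) < 1e-9 for f, r, s in zip(fitted, resid, sales)))  # True
print(round(mean(resid), 9), round(mean(adjusted), 3), round(overall, 3))
```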

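Now let's use this model to predict future values of gasoline sales. Since there are no other explanatory variables,
the model explains the variation in gasoline sales using only the seasonal dummies.
We can add extra periods to the data with the tsappend command. The existing month dummies are missing for the
added periods, so we rebuild them from the time variable: month(dofm(t)) returns the calendar month of each monthly
date, and tabulate with the generate() option creates one dummy per month.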
tsappend, add(36)
g mon = month(dofm(t))
tabulate mon, generate(md)
regress gasoline md1 md2 md3 md4 md5 md6 md7 md8 md9 md10 md11
predict gasoline_n
line gasoline_n t
[Figure: fitted values from the month-dummy model plotted against t, including the 36 appended months]
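With only month dummies, the prediction for every appended month is that calendar month's fitted mean, so the forecast repeats the same 12-month pattern indefinitely. The Python sketch below mirrors month(dofm(t)) with arithmetic on the months-since-1960m1 encoding and builds the 36-month forecast from the coefficients reported in the m1-m11 regression, assuming md1 to md11 coincide with m1 to m11 so the estimates are unchanged. `month_of` is a hypothetical helper.

```python
# Coefficients from: regress gasoline m1 ... m11 (month 12 is the omitted base)
cons = 59735.28
coef = {1: -4309.025, 2: -1928.984, 3: -671.3583, 4: -829.025, 5: -210.0881,
        6: 1102.401, 7: 644.1563, 8: 1003.964, 9: -822.2439, 10: -817.0473,
        11: -1260.781, 12: 0.0}


def month_of(t):
    """Calendar month (1-12) of a monthly date t counted from 1960m1, like month(dofm(t))."""
    return t % 12 + 1


last = (2009 - 1960) * 12 + (11 - 1)       # 2009m11, the last observed month
future = [last + h for h in range(1, 37)]  # like: tsappend, add(36)
forecast = [cons + coef[month_of(t)] for t in future]

print(month_of(future[0]), month_of(future[-1]))  # → 12 11 (2009m12 and 2012m11)
print(round(forecast[0], 2), round(forecast[1], 3))
```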


We can use actual values through 2009m11 and predicted values for the 36 appended months (2009m12 to 2012m11).
gen gasoline_nn=gasoline
replace gasoline_nn=gasoline_n if tin(2009m12, 2012m11)
line gasoline_nn t

[Figure: gasoline_nn plotted against t, actual sales through 2009m11 followed by month-dummy predictions]

These predictions do not take into account the non-seasonal trend observed in the original series. It is more
appropriate to allow for a nonlinear (here quadratic) trend together with seasonality when making predictions:
drop gasoline_n
regress gasoline md1 md2 md3 md4 md5 md6 md7 md8 md9 md10 md11 t c.t#c.t
predict gasoline_n
line gasoline_n t
[Figure: fitted values from the model with month dummies and a quadratic trend, plotted against t]
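Adding t and t squared lets the forecast continue the estimated trend while keeping the seasonal pattern. A minimal Python sketch on hypothetical, noise-free data generated from a quadratic trend plus fixed monthly effects: OLS on a constant, the two trend terms and 11 month dummies recovers the generating values, and the out-of-sample predictions follow both components. The time index is rescaled to s = i/n purely for numerical stability, and `ols`, `row` and `truth` are illustrative helpers.

```python
def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y (Gauss-Jordan with pivoting)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


n = 323  # in-sample months, as in gasoline.dta; observation 0 is taken to be a January
season = {1: -4000.0, 2: -2000.0, 3: -700.0, 4: -800.0, 5: -200.0, 6: 1100.0,
          7: 600.0, 8: 1000.0, 9: -800.0, 10: -800.0, 11: -1300.0, 12: 0.0}  # hypothetical


def row(i):
    """Regressors for observation i: constant, s, s^2 and dummies for months 1-11."""
    s, m = i / n, i % 12 + 1
    return [1.0, s, s * s] + [1.0 if m == k else 0.0 for k in range(1, 12)]


def truth(i):
    """Hypothetical sales: quadratic trend plus a fixed monthly effect, no noise."""
    s = i / n
    return 55000.0 + 9000.0 * s - 4000.0 * s * s + season[i % 12 + 1]


b = ols([row(i) for i in range(n)], [truth(i) for i in range(n)])
forecast = [sum(bj * xj for bj, xj in zip(b, row(i))) for i in range(n, n + 36)]

print([round(v, 2) for v in b[:3]])  # intercept and trend terms: about 55000, 9000, -4000
print(round(forecast[0]), round(truth(n)))
```

With real, noisy data the coefficients are only estimates, and a quadratic trend extrapolated three years ahead can bend sharply, so such forecasts deserve caution.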


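We can again use actual values through 2009m11 and the improved predicted values afterwards. Since gasoline_nn already
exists from the previous step, we drop it before creating it again:
drop gasoline_nn
gen gasoline_nn=gasoline
replace gasoline_nn=gasoline_n if tin(2009m12, 2012m11)
line gasoline_nn t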

[Figure: gasoline_nn plotted against t, actual sales through 2009m11 followed by predictions from the model with a
quadratic trend and month dummies]
