CH 01 Wooldridge 6e PPT Updated

Why do we study econometrics? This is about chapter 1 of Econometrics.

Uploaded by

My Tran Ha

Course Convenor/Lecturer:

Anh-Tuan Doan, Ph.D.


[email protected]

All emails should include “Applied Econometrics – class – student name” at the start of the subject line.

Without this course code, I will not reply to your email.

Learning resources
Prescribed textbook:
Wooldridge, J.M. (2013). Introductory Econometrics: A Modern Approach, 5th or 6th Edition, South-Western Cengage Learning.

Recommended readings:
Gujarati, D.N. (2009). Essentials of Econometrics, McGraw-Hill/Irwin.
Gujarati, D.N. (2008). Basic Econometrics, McGraw-Hill.
Ramanathan, R. (2002). Introductory Econometrics with Applications, Harcourt College Publishers.

Assessment
Assessment number, assessment item and due date, value (/100):
1. Class attendance & participation (Individual): 10%
2. Mid-term exam (closed book, 15-minute reading + 1-hour writing) (Individual), due Session 10: 20%
3. Two home assignments (Individual), due Sessions 10 and 20: 20%
4. Final exam (closed book, 15-minute reading + 2-hour writing) (Individual), due TBA: 50%

Chapter 1: The Nature of Econometrics and Economic Data

© 2016 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use. © kentoh/Shutterstock.
What is Econometrics?

Why do we study econometrics?
• Rare in economics (and many other areas without
labs!) to have experimental data
• Need to use non-experimental, or observational data to
make inferences
• Important to be able to apply economic theory to real
world data

The Nature of Econometrics
and Economic Data
● What is econometrics?
• Econometrics = use of statistical methods to analyze economic data
• Econometricians typically analyze nonexperimental data

● Typical goals of econometric analysis


• Estimating relationships between economic variables
• Testing economic theories and hypotheses
• Forecasting economic variables
• Evaluating and implementing government and business policy

The Nature of Econometrics
and Economic Data
● Steps in econometric analysis
• 1) Economic model (this step is often skipped)
• 2) Econometric model

● Economic models
• May be micro- or macromodels
• Often use optimizing behaviour, equilibrium modeling, …
• Establish relationships between economic variables
• Examples: demand equations, pricing equations, …

Why is it so important?
• An empirical analysis uses data to test a theory or to
estimate a relationship
• A formal economic model can be tested
• Theory may be ambiguous as to the effect of some
policy change – can use econometrics to evaluate the
program

RESEARCH PROCESS

The Nature of Econometrics
and Economic Data
● Economic model of crime (Becker (1968))
• Derives equation for criminal activity based on utility maximization

$y = f(x_1, x_2, x_3, x_4, x_5, x_6, x_7)$

where
y = hours spent in criminal activities
x_1 = “wage” of criminal activities
x_2 = wage for legal employment
x_3 = other income
x_4 = probability of getting caught
x_5 = probability of conviction if caught
x_6 = expected sentence
x_7 = age

• Functional form of relationship not specified


• Equation could have been postulated without economic modeling

The Nature of Econometrics
and Economic Data
● Model of job training and worker productivity
• What is effect of additional training on worker productivity?
• Formal economic theory not really needed to derive equation:

$wage = f(educ, exper, training)$

where
wage = hourly wage
educ = years of formal education
exper = years of workforce experience
training = weeks spent in job training

• Other factors may be relevant, but these are the most important (?)

The Nature of Econometrics
and Economic Data
● Econometric model of criminal activity
• The functional form has to be specified
• Variables may have to be approximated by other quantities

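$crime = \beta_0 + \beta_1 wage + \beta_2 othinc + \beta_3 freqarr + \beta_4 freqconv + \beta_5 avgsen + \beta_6 age + u$

where
crime = measure of criminal activity
wage = wage for legal employment
othinc = other income
freqarr = frequency of prior arrests
freqconv = frequency of conviction
avgsen = average sentence length after conviction
age = age
u = unobserved determinants of criminal activity, e.g. moral character, wage in criminal activity, family background, …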

The Nature of Econometrics
and Economic Data
● Econometric model of job training and worker productivity

$wage = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 training + u$

where
wage = hourly wage
educ = years of formal education
exper = years of workforce experience
training = weeks spent in job training
u = unobserved determinants of the wage, e.g. innate ability, quality of education, family background, …

● Most of econometrics deals with the specification of the error u

● Econometric models may be used for hypothesis testing
• For example, the parameter β3 (the coefficient on training) represents the “effect of training on wage”
• How large is this effect? Is it different from zero?
Why does an econometric model always include an error term (u)?
• We cannot identify exactly all the factors that affect the dependent variable Y

• Not all necessary data elements are included

• Measurement errors arise in data collection

The Nature of Econometrics
and Economic Data
● Econometric analysis requires data

● Different kinds of economic data sets


• Cross-sectional data
• Time series data
• Pooled cross sections
• Panel/Longitudinal data

● Econometric methods depend on the nature of the data used


• Use of inappropriate methods may lead to misleading results

The Nature of Econometrics
and Economic Data
● Cross-sectional data sets
• Sample of individuals, households, firms, cities, states, countries,
or other units of interest at a given point of time/in a given period
• Cross-sectional observations are more or less independent
• For example, pure random sampling from a population
• Sometimes pure random sampling is violated, e.g. units refuse to
respond in surveys, or if sampling is characterized by clustering
• Cross-sectional data typically encountered in applied
microeconomics

The Nature of Econometrics
and Economic Data
● Cross-sectional data set on wages and other characteristics

[Table: columns are observation number, hourly wage, years of education, years of experience, and indicator variables (1 = yes, 0 = no)]
The Nature of Econometrics
and Economic Data
● Cross-sectional data on growth rates and country
characteristics

[Table: columns include average growth rate of real per capita GDP, government consumption as a percentage of GDP, and adult secondary education rates]

The Nature of Econometrics
and Economic Data
● Time series data
• Observations of a variable or several variables over time
• For example, stock prices, money supply, consumer price index,
gross domestic product, annual homicide rates, automobile sales, …
• Time series observations are typically serially correlated
• Ordering of observations conveys important information
• Data frequency: daily, weekly, monthly, quarterly, annually, …
• Typical features of time series: trends and seasonality
• Typical applications: applied macroeconomics and finance

The Nature of Econometrics
and Economic Data
● Time series data on minimum wages and related variables

[Table: columns include average minimum wage for the given year, average coverage rate, unemployment rate, and gross national product]

The Nature of Econometrics
and Economic Data
● Pooled cross sections
• Two or more cross sections are combined in one data set
• Cross sections are drawn independently of each other
• Pooled cross sections often used to evaluate policy changes
• Example:
• Evaluate effect of change in property taxes on house prices
• Random sample of house prices for the year 1993
• A new random sample of house prices for the year 1995
• Compare before/after (1993: before reform, 1995: after reform)

The Nature of Econometrics
and Economic Data
● Pooled cross sections on housing prices
[Table: columns include property tax, size of house in square feet, number of bedrooms, and number of bathrooms; rows are grouped into "before reform" and "after reform"]

The Nature of Econometrics
and Economic Data
● Panel or longitudinal data
• The same cross-sectional units are followed over time
• Panel data have a cross-sectional and a time series dimension
• Panel data can be used to account for time-invariant unobservables
• Panel data can be used to model lagged responses
• Example:
• City crime statistics; each city is observed in two years
• Time-invariant unobserved city characteristics may be modeled
• Effect of police on crime rates may exhibit time lag

The Nature of Econometrics
and Economic Data
● Two-year panel data on city crime statistics

[Table: each city has two time series observations; annotations mark the number of police in 1986 and the number of police in 1990]

The Nature of Econometrics
and Economic Data
● Causality and the notion of ceteris paribus
Definition of the causal effect of x on y:

“How does variable y change if variable x is changed but all other relevant factors are held constant?”

● Most economic questions are ceteris paribus questions

● It is important to define which causal effect one is interested in

● It is useful to describe how an experiment would have to be designed to infer the causal effect in question

The Nature of Econometrics
and Economic Data
● Causal effect of fertilizer on crop yield
• “By how much will the production of soybeans increase if one
increases the amount of fertilizer applied to the ground”
• Implicit assumption: all other factors that influence crop yield such
as quality of land, rainfall, presence of parasites etc. are held fixed

● Experiment:
• Choose several one-acre plots of land; randomly assign different
amounts of fertilizer to the different plots; compare yields
• Experiment works because amount of fertilizer applied is unrelated
to other factors influencing crop yields

The Nature of Econometrics
and Economic Data
● Measuring the return to education
• “If a person is chosen from the population and given another
year of education, by how much will his or her wage increase?”
• Implicit assumption: all other factors that influence wages such as
experience, family background, intelligence etc. are held fixed

● Experiment:
• Choose a group of people; randomly assign different amounts of
education to them (infeasible!); compare wage outcomes
• Problem without random assignment: amount of education is
related to other factors that influence wages (e.g. intelligence)

The Nature of Econometrics
and Economic Data
● Effect of law enforcement on city crime level
• “If a city is randomly chosen and given ten additional police officers,
by how much would its crime rate fall?”
• Alternatively: “If two cities are the same in all respects, except that
city A has ten more police officers than city B, by how much would
the two cities‘ crime rates differ?”

● Experiment:
• Randomly assign number of police officers to a large number of
cities
• In reality, number of police officers will be determined by crime rate
(simultaneous determination of crime and number of police)

The Nature of Econometrics
and Economic Data
● Effect of the minimum wage on unemployment
• “By how much (if at all) will unemployment increase if the minimum
wage is increased by a certain amount (holding other things fixed)?”

● Experiment:
• Government randomly chooses minimum wage each year and
observes unemployment outcomes
• Experiment will work because level of minimum wage is unrelated
to other factors determining unemployment
• In reality, the level of the minimum wage will depend on political
and economic factors that also influence unemployment

The Nature of Econometrics
and Economic Data
● Testing predictions of economic theories
• Economic theories are not always stated in terms of causal effects
• For example, the expectations hypothesis states that long term
interest rates equal compounded expected short term interest rates

• An implication is that the interest rate of a three-month T-bill
should be equal to the expected interest rate for the first three
months of a six-month T-bill; this can be tested using econometric
methods

Statistics Review
– Mean – Standard Deviation
– Median – Histograms
– Mode – Skewness
– Range – Kurtosis
– Interquartile Range – Covariance
– Variance – Correlation Coefficient

Measures of central tendency – Review

• Measures of the location of the middle or the center of a distribution
• Mean
• Median
• Mode

Mean – Review

• Mean – Average value of a distribution; most commonly used measure of central tendency

• Median – This is the value of a variable such that half of the observations are above and half are below this value, i.e., this value divides the distribution into two groups of equal size

• Mode – This is the most frequently occurring value in the distribution
An Example Data Set
• Daily low temperatures recorded in Chapel Hill
(01/18-01/31, 2005, °F)
Jan. 18 – 11 Jan. 25 – 25
Jan. 19 – 11 Jan. 26 – 33
Jan. 20 – 25 Jan. 27 – 22
Jan. 21 – 29 Jan. 28 – 18
Jan. 22 – 27 Jan. 29 – 19
Jan. 23 – 14 Jan. 30 – 30
Jan. 24 – 11 Jan. 31 – 27
• For these 14 values, we will calculate all three
measures of central tendency - the mean, median,
and mode
Mean – Review
• Mean – Most commonly used measure of central tendency
• Procedures
• (1) Sum all the values in the data set
• (2) Divide the sum by the number of values in the data set:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• Watch for outliers
Mean – Review
• (1) Sum all the values in the data set
→ 11 + 11 + 11 + 14 + 18 + 19 + 22 + 25 + 25 + 27 + 27 + 29 + 30 + 33 = 302
• (2) Divide the sum by the number of values in the data set
→ Mean = $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ = 302/14 = 21.57
• Is this a good measure of central tendency for this data set?
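The two steps for the mean can be sketched in a few lines of Python. This is only an illustration; the variable names (`temps`, `total`, `mean`) are chosen here and do not come from the slides.

```python
# Daily low temperatures (°F), Chapel Hill, Jan. 18-31, 2005 (from the example data set)
temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

total = sum(temps)   # step (1): sum all the values
n = len(temps)       # number of values in the data set
mean = total / n     # step (2): divide the sum by n

print(total, n, round(mean, 2))  # 302 14 21.57
```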
Median – Review
• Median – half of the values are above it and half below
• (1) Sort the data in ascending order
• (2) Find the value with an equal number of values above and below it
• (3) Odd number of observations → the [(n-1)/2]+1 value from the lowest
• (4) Even number of observations → average the (n/2) and [(n/2)+1] values
• (5) Use the median with asymmetric distributions, particularly with outliers
Median – Review
• (1) Sort the data in ascending order:
→ 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33
• (2) Find the value with an equal number of values above and below it
→ Even number of observations → average the (n/2) and [(n/2)+1] values
→ (14/2) = 7; [(14/2)+1] = 8
→ (22+25)/2 = 23.5 (°F)
• Is this a good measure of central tendency for this data?
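The odd/even rule for the median could be written as follows. This is a minimal sketch; the function name `median` is an illustrative choice, and positions in the slides are 1-based while Python indexes from 0.

```python
def median(values):
    """Median using the odd/even position rule from the slides."""
    ordered = sorted(values)          # step (1): sort ascending
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd n: the [(n-1)/2]+1 value (1-based) is index n//2 (0-based)
        return ordered[mid]
    # even n: average the (n/2) and [(n/2)+1] values (1-based)
    return (ordered[mid - 1] + ordered[mid]) / 2

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
print(median(temps))  # 23.5
```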
Mode – Review
• Mode – This is the most frequently occurring value
in the distribution
• (1) Sort the data in ascending order
• (2) Count the instances of each value
• (3) Find the value that has the most occurrences
• If more than one value occurs an equal number of
times and these exceed all other counts, we have
multiple modes
• Use the mode for multi-modal data
Mode – Review
• (1) Sort the data in ascending order:
→ 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33
• (2) Count the instances of each value:
→ 11 (3x), 14 (1x), 18 (1x), 19 (1x), 22 (1x), 25 (2x), 27 (2x), 29 (1x), 30 (1x), 33 (1x)
• (3) Find the value that has the most occurrences
→ mode = 11 (°F)
• Is this a good measure of the central tendency of this data set?
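Counting occurrences and picking the most frequent value might look like this in Python. The helper name `modes` is hypothetical; it returns a list so that multi-modal data are handled as well.

```python
from collections import Counter

def modes(values):
    """Return every value that attains the highest count (sorted)."""
    counts = Counter(values)                  # step (2): count instances of each value
    top = max(counts.values())                # highest number of occurrences
    return sorted(v for v, c in counts.items() if c == top)  # step (3)

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
print(modes(temps))  # [11]
```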
Measures of Dispersion – Review
• Measures of dispersion are concerned with the
distribution of values around the mean in data:
– Range

– Interquartile range

– Variance

– Standard deviation

An Example Data Set
• Daily low temperatures recorded in Chapel Hill
(01/18-01/31, 2005, °F)
Jan. 18 – 11 Jan. 25 – 25
Jan. 19 – 11 Jan. 26 – 33
Jan. 20 – 25 Jan. 27 – 22
Jan. 21 – 29 Jan. 28 – 18
Jan. 22 – 27 Jan. 29 – 19
Jan. 23 – 14 Jan. 30 – 30
Jan. 24 – 11 Jan. 31 – 27
• For these 14 values, we will calculate all measures
of dispersion
Range – Review
• Range – The difference between the largest and
the smallest values
• (1) Sort the data in ascending order
• (2) Find the largest value
→ max
• (3) Find the smallest value
→ min
• (4) Calculate the range
→ range = max - min
• Vulnerable to the influence of outliers
Range – Review
• Range – The difference between the largest and
the smallest values
• (1) Sort the data in ascending order
→ 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33
• (2) Find the largest value
→ max = 33
• (3) Find the smallest value
→ min = 11
• (4) Calculate the range
→ range = 33 – 11 = 22
Interquartile Range – Review

• Interquartile range – The difference between the 75th and 25th percentiles
• (1) Sort the data in ascending order
• (2) Find the 25th percentile – the (n+1)/4 observation
• (3) Find the 75th percentile – the 3(n+1)/4 observation
• (4) Interquartile range is the difference between these two percentiles
Interquartile Range – Review
• (1) Sort the data in ascending order
→ 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33
• (2) Find the 25th percentile – the (n+1)/4 observation
→ (14+1)/4 = 3.75 → 11+(14-11)*0.75 = 13.25
• (3) Find the 75th percentile – the 3(n+1)/4 observation
→ 3(14+1)/4 = 11.25 → 27+(29-27)*0.25 = 27.5
• (4) Interquartile range is the difference between these two percentiles
→ 27.5 – 13.25 = 14.25
Variance – Review

• Variance is the sum of squared deviations from the mean, divided by the population size or, for a sample, by the sample size minus one:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
Variance – Review
• (1) Calculate the mean
→ $\bar{x}$
• (2) Calculate the deviation for each value
→ $x_i - \bar{x}$
• (3) Square each of the deviations
→ $(x_i - \bar{x})^2$
• (4) Sum the squared deviations
→ $\sum_i (x_i - \bar{x})^2$
• (5) Divide the sum of squares by (n-1) for a sample
→ $\sum_i (x_i - \bar{x})^2 / (n-1)$
Variance – Review
• (1) Calculate the mean
→ $\bar{x} = 21.57$
• (2) Calculate the deviation for each value
→ $x_i - \bar{x}$
Jan. 18 (11 – 21.57) = -10.57 Jan. 25 (25 – 21.57) = 3.43
Jan. 19 (11 – 21.57) = -10.57 Jan. 26 (33 – 21.57) = 11.43
Jan. 20 (25 – 21.57) = 3.43 Jan. 27 (22 – 21.57) = 0.43
Jan. 21 (29 – 21.57) = 7.43 Jan. 28 (18 – 21.57) = -3.57
Jan. 22 (27 – 21.57) = 5.43 Jan. 29 (19 – 21.57) = -2.57
Jan. 23 (14 – 21.57) = -7.57 Jan. 30 (30 – 21.57) = 8.43
Jan. 24 (11 – 21.57) = -10.57 Jan. 31 (27 – 21.57) = 5.43
Variance – Review
• (3) Square each of the deviations (squares computed from the unrounded deviations)
→ $(x_i - \bar{x})^2$
Jan. 18 (-10.57)^2 = 111.76 Jan. 25 (3.43)^2 = 11.76
Jan. 19 (-10.57)^2 = 111.76 Jan. 26 (11.43)^2 = 130.61
Jan. 20 (3.43)^2 = 11.76 Jan. 27 (0.43)^2 = 0.18
Jan. 21 (7.43)^2 = 55.18 Jan. 28 (-3.57)^2 = 12.76
Jan. 22 (5.43)^2 = 29.47 Jan. 29 (-2.57)^2 = 6.61
Jan. 23 (-7.57)^2 = 57.33 Jan. 30 (8.43)^2 = 71.04
Jan. 24 (-10.57)^2 = 111.76 Jan. 31 (5.43)^2 = 29.47

• (4) Sum the squared deviations
→ $\sum_i (x_i - \bar{x})^2$ = 751.43

Variance – Review

• (5) Divide the sum of squares by (n-1) for a sample
→ $\sum_i (x_i - \bar{x})^2 / (n-1)$ = 751.43 / (14-1) = 57.8

• The variance of the Tmin data set (Chapel Hill) is 57.8

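Steps (1) to (5) translate directly into Python. The sketch below uses a hypothetical helper `sample_variance` and cross-checks it against `statistics.variance` from the standard library, which also divides by n-1.

```python
import statistics

def sample_variance(values):
    """Sample variance: sum of squared deviations divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n                                # step (1)
    squared_devs = [(x - mean) ** 2 for x in values]      # steps (2) and (3)
    sum_of_squares = sum(squared_devs)                    # step (4)
    return sum_of_squares / (n - 1)                       # step (5)

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
s2 = sample_variance(temps)
print(round(s2, 1))                          # 57.8
print(round(statistics.variance(temps), 1))  # 57.8
```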
Standard Deviation – Review
• Standard deviation is equal to the square root of the variance
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
• Compared with variance, standard deviation has a scale closer to that used for the mean and the original data
Standard Deviation – Review
• (1) Calculate the mean
→ $\bar{x}$
• (2) Calculate the deviation for each value
→ $x_i - \bar{x}$
• (3) Square each of the deviations
→ $(x_i - \bar{x})^2$
• (4) Sum the squared deviations
→ $\sum_i (x_i - \bar{x})^2$
• (5) Divide the sum of squares by (n-1) for a sample
→ $\sum_i (x_i - \bar{x})^2 / (n-1)$
• (6) Take the square root of the resulting variance
→ $\sqrt{\sum_i (x_i - \bar{x})^2 / (n-1)}$
Standard Deviation – Review

• (1) – (5)
→ $s^2 = 57.8$
• (6) Take the square root of the variance
→ $s = \sqrt{57.8} = 7.6$
• The standard deviation (s) of the Tmin data set (Chapel Hill) is 7.6 (°F)
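Step (6) is a single square root once the variance is available. A small sketch, using the standard library's `statistics.variance` and `statistics.stdev` (both use the n-1 divisor):

```python
import math
import statistics

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

variance = statistics.variance(temps)   # steps (1)-(5): sample variance
std_dev = math.sqrt(variance)           # step (6): square root of the variance

print(round(std_dev, 1))                  # 7.6
print(round(statistics.stdev(temps), 1))  # 7.6
```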
Histograms – Review
• We may also summarize our data by constructing
histograms, which are vertical bar graphs
• A histogram is used to graphically summarize
the distribution of a data set
• A histogram divides the range of values in a data
set into intervals
• Over each interval is placed a bar whose height
represents the percentage of data values in the
interval.
Building a Histogram – Review
• (1) Develop an ungrouped frequency table
→ 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33

Value Frequency
11 3
14 1
18 1
19 1
22 1
25 2
27 2
29 1
30 1
33 1
Building a Histogram – Review
• 2. Construct a grouped frequency table
→ Select a set of classes

Class Frequency
11-15 4
16-20 2
21-25 3
26-30 4
31-35 1

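The grouped frequency table can be reproduced, and drawn as a rough text histogram, with a few lines of Python. The class boundaries are those on the slide; the names `classes` and `frequencies` are illustrative.

```python
temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

# Classes taken from the grouped frequency table (inclusive bounds)
classes = [(11, 15), (16, 20), (21, 25), (26, 30), (31, 35)]

# Count how many observations fall in each class
frequencies = {
    f"{lo}-{hi}": sum(lo <= t <= hi for t in temps)
    for lo, hi in classes
}

# Text histogram: one '#' per observation in the class
for label, count in frequencies.items():
    print(f"{label:>5} | {'#' * count} ({count})")
```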
Building a Histogram – Review
• 3. Plot the frequencies of each class

Skewness – Review
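• Skewness measures the degree of asymmetry exhibited by the data
• Positive skewness – More observations below the mean than above it
• Negative skewness – A small number of low observations and a large number of high ones
$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$
For the example data set: Skewness = -0.2069 (this value is obtained when s is computed with divisor n, s ≈ 7.33; with the sample standard deviation s = 7.6 the same formula gives about -0.185)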
Skewness – Review

Source: http://library.thinkquest.org/10030/3smodsas.htm

Skewness – Review
$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$

[Figure: positively skewed distribution, with mode < median < mean]

Skewness > 0 (Positively skewed)

Skewness – Review
$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$

[Figure: negatively skewed distribution, with mean < median < mode; regions labelled A and B]

Skewness < 0 (Negatively skewed)

Skewness – Review

$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$

Source: http://mathworld.wolfram.com/NormalDistribution.html

Skewness = 0 (symmetric distribution)


Skewness – Review

Skewness = -0.2069 (Negatively skewed)
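The skewness formula can be evaluated as sketched below. The function name and the `ddof` argument are assumptions made for illustration: `ddof=0` computes s with divisor n, which reproduces the value -0.2069 quoted for the example data, while `ddof=1` uses the sample standard deviation (divisor n-1) and gives about -0.185.

```python
import math

def skewness(values, ddof=0):
    """Sum of cubed deviations divided by n * s**3.
    ddof=0: s uses divisor n; ddof=1: s uses divisor n - 1."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - ddof))
    return sum((x - mean) ** 3 for x in values) / (n * s ** 3)

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
print(round(skewness(temps), 4))          # -0.2069
print(round(skewness(temps, ddof=1), 4))  # -0.1851
```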


Kurtosis – Review
• Kurtosis measures how peaked the histogram is
• Leptokurtic: a high degree of peakedness
– Values of kurtosis over 0
• Platykurtic: flat histograms
– Values of kurtosis less than 0
$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} - 3$
For the example data set: $\sum_i (x_i - \bar{x})^4 / (n s^4)$ = 1.6891 (with s computed using divisor n), so Kurtosis = 1.6891 - 3 = -1.3109 < 0
Kurtosis – Review
$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} - 3$

Kurtosis – Review

Source: http://espse.ed.psu.edu/Statistics/Chapters/Chapter3/Chap3.html
Kurtosis – Review
• Platykurtic – When the kurtosis < 0, the
frequencies throughout the curve are closer to
equal (i.e., the curve is flatter and wider)
• Thus, negative kurtosis indicates a relatively flat
distribution
• Leptokurtic – When the kurtosis > 0, there are
high frequencies in only a small part of the curve
(i.e., the curve is more peaked)
• Thus, positive kurtosis indicates a relatively
peaked distribution
Kurtosis – Review

Kurtosis = 1.6891 - 3 = -1.3109 < 0 (Platykurtic)

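A quick check of the kurtosis formula (with the "- 3" term) for the example data is sketched below. The function name and the `ddof` switch are illustrative assumptions. With s computed using divisor n, the ratio before subtracting 3 is 1.6891, so the kurtosis as defined by the formula is about -1.31, which points to a relatively flat (platykurtic) distribution.

```python
import math

def excess_kurtosis(values, ddof=0):
    """Sum of fourth-power deviations divided by n * s**4, minus 3.
    ddof=0: s uses divisor n; ddof=1: s uses divisor n - 1."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - ddof))
    return sum((x - mean) ** 4 for x in values) / (n * s ** 4) - 3

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
k = excess_kurtosis(temps)
print(round(k + 3, 4))  # 1.6891 (ratio before subtracting 3)
print(round(k, 4))      # -1.3109
```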
Covariance – Review
• Random variables X and Y have a joint distribution
• The covariance between X and Y is
$\mathrm{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sigma_{XY}$
• The covariance is a measure of the linear association
between X and Y; its units are units of X × units of Y
• cov(X,Y) > 0 means a positive relation between X and Y
• If X and Y are independently distributed, then cov(X,Y) = 0
(but not vice versa!!)

An Example - Covariance
• The covariance between Test Score and STR is negative:

so is the correlation…
Correlation Coefficient – Review
• The correlation coefficient is defined in terms of the
covariance:

$\mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = r_{XY}$

• –1 ≤ corr(X,Y) ≤ 1
• corr(X,Y) = 1 means perfect positive linear association
• corr(X,Y) = –1 means perfect negative linear association
• corr(X,Y) = 0 means no linear association

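Covariance and correlation can be computed from their definitions as sketched below. The toy data (`x`, `y_up`, `y_down`) are hypothetical and chosen only so that the perfect positive and perfect negative cases are easy to verify by hand; the covariance uses the 1/n divisor. The same functions could be applied to paired columns such as wage and educ.

```python
import math

def covariance(x, y):
    """Average product of deviations from the means (divisor n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

def correlation(x, y):
    """Covariance divided by the product of the standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical toy data, not from the slides
x = [1, 2, 3, 4]
y_up = [2, 4, 6, 8]      # rises exactly linearly with x
y_down = [8, 6, 4, 2]    # falls exactly linearly with x

print(covariance(x, y_up))     # 2.5
print(correlation(x, y_up))    # 1.0
print(correlation(x, y_down))  # -1.0
```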
The correlation coefficient measures linear association

Exercise
Obs 1 2 3 4 5 6 7 8 9 10 11 12 13
wage 3.1 3.2 3 6 5.3 8.8 11 5 3.6 18 6.3 8.1 8.8
educ 11 12 11 8 12 16 18 12 12 17 16 13 12

Calculate:
– Mean
– Median
– Mode
– Range
– Interquartile Range
– Variance
– Standard Deviation
– Skewness
– Kurtosis
– Covariance
– Correlation Coefficient
Introduction to STATA
• Data input
• Manual data entry
• Copy Excel file and paste
• Import data
• Obtain summary statistics
• Data transformation
