Lecture 4 - Ch4 - s2-5

This document contains lecture notes on engineering statistics. It discusses measurement error, including bias and random error. It then introduces the binomial distribution and how it can model counting experiments with a fixed number of trials having a constant probability of success. The notes provide examples of how to calculate probabilities using the binomial probability mass function. Finally, it covers the normal distribution, standard units, and areas under the standard normal curve.

IE 228

Engineering Statistics
Lecture 4

Binomial and Normal Distributions


(Sections 4.2 & 4.5)

Topics to learn
1. Errors in measuring process, uncertainty, bias
2. Binomial distribution
3. Sample proportion and success probability
4. Normal distribution
5. Standard units and standard normal distribution

McGraw-Hill ©2014 by The McGraw-Hill Companies, Inc. All rights reserved.


Errors in measuring process, uncertainty, bias

Introduction
• Any measuring procedure contains error.
• Thus, measured values generally differ
somewhat from the true values that are being
measured.
• The errors in the measurements produce errors
in calculated values (like the mean).

Definition: When error in measurement produces error in calculated
values, we say that error is propagated from the measurements to
the calculated value.

Measurement Error
• A geologist weighs a rock on a scale and gets
the following measurements:
251.3 252.5 250.8 251.1 250.4
• These measurements differ from one another,
and it is unlikely that any of them is equal to
the true mass of the rock.
• The error in the measured value is the
difference between a measured value and the
true value.

Parts of Error
• We think of the error of the measurement as being
composed of two parts:
– Systematic error (bias)
– Random error
• Bias is the part of the error that is the same for every
measurement.
For example, a scale that always gives you a reading that is too low.

• Random error is error that varies from measurement to
measurement and averages out to zero in the long run.


• Any measurement can be considered to be the
sum of the true value plus contributions from
each of the components of error:

Measured value = true value + bias + random error
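
The decomposition above can be illustrated with a small simulation. This is only a sketch: the true value, bias, and spread are hypothetical numbers, and the random error is assumed to be normally distributed, which the slides do not require.

```python
import random
import statistics

def simulate_measurements(true_value, bias, sigma, n, seed=0):
    """Return n simulated readings: true value + bias + random error."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    # Assumption: random error is drawn from a normal distribution with mean 0.
    return [true_value + bias + rng.gauss(0.0, sigma) for _ in range(n)]

# Hypothetical scale: true value 250.0, constant bias +1.2, random error sd 0.8.
readings = simulate_measurements(true_value=250.0, bias=1.2, sigma=0.8, n=10_000)
mean_reading = statistics.mean(readings)

# The random errors average out, so the mean error is close to the bias alone.
print(round(mean_reading - 250.0, 2))
```

Averaging many readings removes most of the random error but leaves the bias untouched, which is why the two components are treated separately.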

Two Aspects of the Measuring Process: Accuracy and Precision
• We are interested in accuracy.
– Accuracy is determined by bias.
– The smaller the bias, the more accurate the measuring
process.
– If the bias is zero, the measuring process is said to be
unbiased.
• We are also interested in precision.
– Precision refers to the degree to which repeated
measurements of the same quantity tend to agree with
each other.
– If repeated measurements come out nearly the same
every time, the precision is high.

More on Error
• A measured value is a random variable with mean μ
and standard deviation σ.
• The bias in the measuring process is the difference
between the mean measurement and the true value:
Bias = μ − true value
• The uncertainty in the measuring process is the
standard deviation σ.
Uncertainty = σ
• The smaller the bias, the more accurate the measuring
process.
• The smaller the uncertainty, the more precise the
measuring process.

FIGURE 3.1
(a) Both bias and uncertainty are small.
(b) Bias is large; uncertainty is small.
(c) Bias is small; uncertainty is large.
(d) Both bias and uncertainty are large.

Note: We can estimate the uncertainty from the set of repeated
measurements, but without knowing the true value, we cannot
estimate the bias.
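
As a rough illustration of this note, the five rock measurements from the Measurement Error slide can be summarized with Python's standard library. The sample standard deviation is used here as the estimate of the uncertainty; the bias cannot be computed because the true mass is not known.

```python
import statistics

# The five repeated measurements of the rock from the Measurement Error slide.
readings = [251.3, 252.5, 250.8, 251.1, 250.4]

sample_mean = statistics.mean(readings)
# Sample standard deviation (n - 1 in the denominator) estimates the uncertainty.
uncertainty = statistics.stdev(readings)

print(f"mean = {sample_mean:.2f}, estimated uncertainty = {uncertainty:.2f}")
# prints: mean = 251.22, estimated uncertainty = 0.79
```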

Binomial Distribution and Normal Distribution

Section 4.2: The Binomial Distribution
If a total of n Bernoulli trials are conducted, and
• the trials are independent,
• each trial has the same success probability p, and
• X is the number of successes in the n trials,
then X has the binomial distribution with parameters n and p,
denoted X ~ Bin(n, p).

In-class exercise
Example 4
A fair coin is tossed 10 times. Let X be the
number of heads that appear. What is the
distribution of X?
Solution:

Another Use of the Binomial
• Assume that a finite population contains items
of two types, successes and failures, and that a
simple random sample is drawn from the
population.
• Then if the sample size is no more than 5% of
the population, the binomial distribution may
be used to model the number of successes. (*)

(*) Recall from the discussion of independence that, when
drawing a sample from a finite, tangible population, the sample
items may be treated as independent if the population is very
large compared to the size of the sample.

In-class exercise

Example 5
A lot contains several thousand components,
10% of which are defective. Seven components
are sampled from the lot. Let X represent the
number of defective components in the sample.
What is the distribution of X?
Solution:

Binomial R.V.: pmf
• If X ~ Bin(n, p), the probability mass
function (pmf) of X is
$$p(x) = P(X = x) = \begin{cases} \dfrac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, & x = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$
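
The pmf can be evaluated directly with `math.comb`. The sketch below uses n = 7 and p = 0.1, on the assumption that these match the defective-component setting of Example 5; the function name `binomial_pmf` is only illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p); zero for x outside 0, 1, ..., n."""
    if not isinstance(x, int) or x < 0 or x > n:
        return 0.0
    # comb(n, x) equals n! / (x! (n - x)!)
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Assumed setting: 7 components sampled, each defective with probability 0.1.
probs = [binomial_pmf(x, 7, 0.1) for x in range(8)]
for x, px in enumerate(probs):
    print(x, round(px, 4))
```

The probabilities over x = 0, ..., 7 sum to 1, which is a quick sanity check on any pmf calculation.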

In-class exercise

Solution:
We use the pmf equation with n = 10 and p = 0.4.

The pmf is
$$p(x) = \frac{10!}{x!\,(10-x)!}\,(0.4)^{x}(0.6)^{10-x}, \quad x = 0, 1, \ldots, 10$$

In-class exercise

Solution:
Using the probability mass function,

Binomial R.V.: mean and variance
If X ~ Bin(n, p), then
• Mean: $\mu_X = np$
• Variance: $\sigma_X^2 = np(1 - p)$
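
These two formulas can be checked numerically by summing over the pmf. The sketch below does this for n = 10 and p = 0.4; the helper name `binomial_mean_var` is hypothetical.

```python
from math import comb

def binomial_mean_var(n, p):
    """Mean and variance of Bin(n, p), computed by summing over the pmf."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(pmf))
    var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
    return mean, var

mean, var = binomial_mean_var(10, 0.4)
# Compare with np = 4 and np(1 - p) = 2.4.
print(round(mean, 6), round(var, 6))
```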

More on the Binomial
• Assume n independent Bernoulli trials are conducted.
• Each trial has probability of success p.
• Let Y1, …, Yn be defined as follows: Yi = 1 if the ith
trial results in success, and Yi = 0 otherwise. (Each of
the Yi has the Bernoulli(p) distribution.)
• Now, let X represent the number of successes among
the n trials. So, X = Y1 + … + Yn.
• This shows that a binomial random variable can be
expressed as a sum of Bernoulli random variables.
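
A short simulation makes the sum-of-Bernoullis view concrete. This is a sketch with an arbitrary seed and hypothetical values n = 10, p = 0.5; it builds each binomial count by adding up individual 0/1 trials.

```python
import random

def bernoulli_trials(n, p, rng):
    """Return Y1, ..., Yn, where each Yi is 1 with probability p and 0 otherwise."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(1)  # seeded for reproducibility
ys = bernoulli_trials(10, 0.5, rng)
x = sum(ys)  # X = Y1 + ... + Yn
print(ys, "->", x)

# Repeating the experiment many times, the average count should be near np = 5.
counts = [sum(bernoulli_trials(10, 0.5, rng)) for _ in range(5000)]
average = sum(counts) / len(counts)
print(round(average, 2))
```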

Using a Sample Proportion to Estimate a Success Probability

sample proportion: $\hat{p} = X/n$, where X is the number of successes in n trials

Uncertainty in the Sample Proportion
• It is important to realize that the sample
proportion $\hat{p}$ is just an estimate of the success
probability p, and in general, is not equal to p.
• If another sample were taken, the value of $\hat{p}$
would probably come out differently. In other
words, there is uncertainty in $\hat{p}$.
• For $\hat{p}$ to be useful, we must compute its bias
and its uncertainty.
Let n denote the sample size, and let X denote
the number of successes, where X ∼ Bin(n, p).
• The bias is the difference $\mu_{\hat{p}} - p$. Since
$\hat{p} = X/n$, it follows (from Equation (2.41) in Section 2.5)
that
$$\mu_{\hat{p}} = \mu_{X/n} = \frac{\mu_X}{n} = \frac{np}{n} = p$$
• Since $\mu_{\hat{p}} = p$, $\hat{p}$ is unbiased; in other words,
its bias is 0.
• The uncertainty is the standard deviation $\sigma_{\hat{p}}$.
• From Equation (4.6), the standard deviation of X is
$$\sigma_X = \sqrt{np(1-p)}$$
• Since $\hat{p} = X/n$, it follows (from Equation (2.43) in Section
2.5) that
$$\sigma_{\hat{p}} = \sigma_{X/n} = \frac{\sigma_X}{n} = \sqrt{\frac{p(1-p)}{n}}$$
• In practice, when computing the uncertainty in $\hat{p}$, we
don’t know the success probability p, so we approximate
it with $\hat{p}$.

Estimate of p
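If X ~ Bin(n, p), then the sample proportion $\hat{p} = X/n$ is
used to estimate the success probability p.
Note:
• Bias is the difference $\mu_{\hat{p}} - p$.
• $\hat{p}$ is unbiased.
• The uncertainty in $\hat{p}$ is
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
In practice, when computing $\sigma_{\hat{p}}$, we substitute $\hat{p}$ for p,
since p is unknown.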
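
The estimate and its uncertainty take only a few lines to compute. The counts below (12 successes in 80 trials) are hypothetical and are not taken from the lecture; the sample proportion is substituted for p, as the slide describes.

```python
from math import sqrt

def sample_proportion(successes, n):
    """Return the sample proportion and its estimated uncertainty."""
    p_hat = successes / n
    # p is unknown in practice, so p_hat is plugged in for p.
    uncertainty = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, uncertainty

# Hypothetical data: 12 successes observed in 80 independent trials.
p_hat, sigma_p_hat = sample_proportion(12, 80)
print(f"p_hat = {p_hat:.3f} +/- {sigma_p_hat:.3f}")
```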

Section 4.5: The Normal Distribution
• The normal distribution (also called the
Gaussian distribution) is by far the most
commonly used distribution in statistics.
• This distribution provides a good model for
many, although not all, continuous
populations.
• The normal distribution is continuous rather
than discrete.
• The mean of a normal population may have
any value, and the variance may have any
positive value.

Normal R.V.: pdf, mean, and variance
The probability density function of a normal
population with mean μ and variance σ² is given
by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty$$
If X ~ N(μ, σ²), then the mean and variance of X
are given by
$$\mu_X = \mu, \qquad \sigma_X^2 = \sigma^2$$


68-95-99.7% Rule
This figure represents a plot of the normal probability density function
with mean μ and standard deviation σ. Note that the curve is
symmetric about μ, so that μ is the median as well as the mean. It is
also the case that, for any normal population:
• About 68% of the population is in the interval μ ± σ.
• About 95% of the population is in the interval μ ± 2σ.
• About 99.7% of the population is in the interval μ ± 3σ.
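
The three percentages can be reproduced numerically. The sketch below relies on the identity P(|Z| < k) = erf(k/√2) for a standard normal Z, which is a standard fact rather than something stated in the slides.

```python
from math import erf, sqrt

def within_k_sd(k):
    """Proportion of any normal population within k standard deviations of its mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))
# prints:
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The result does not depend on μ or σ, which is the property that motivates standard units.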

Standard Units
• The proportion of a normal population that is
within a given number of standard deviations
of the mean is the same for any normal
population.
• For this reason, when dealing with normal
populations, we often convert from the units in
which the population items were originally
measured to standard units.
• Standard units tell how many standard
deviations an observation is from the
population mean.

Standard Normal Distribution
• In general, we convert to standard units by subtracting the
mean and dividing by the standard deviation.
• Thus, if x is an item sampled from a normal population
with mean μ and variance σ², the standard unit
equivalent of x is the number z, where
$$z = \frac{x - \mu}{\sigma}$$
• The number z is sometimes called the “z-score” of x.
• The z-score is an item sampled from a normal population
with mean 0 and standard deviation 1.
• This normal population is called the standard normal
population.

In-class exercise
Example 13
Aluminum sheets used to make beverage cans have thicknesses (in
thousandths of an inch) that are normally distributed with mean 10
and standard deviation 1.3.
A particular sheet is 10.8 thousandths of an inch thick. Find the
z-score.
Solution:

In-class exercise
Example 13 cont.
The thickness of a certain sheet has a z-score of
1.7. Find the thickness of the sheet in the
original units of thousandths of inches.
Solution:
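
Both parts of Example 13 are one-line conversions between original and standard units. The sketch below uses the mean 10 and standard deviation 1.3 from the example; the function names are illustrative only.

```python
def z_score(x, mu, sigma):
    """Convert x from original units to standard units."""
    return (x - mu) / sigma

def from_z_score(z, mu, sigma):
    """Convert a z-score back to the original units."""
    return mu + z * sigma

# Sheet thickness in thousandths of an inch: mean 10, standard deviation 1.3.
z = z_score(10.8, mu=10, sigma=1.3)
thickness = from_z_score(1.7, mu=10, sigma=1.3)
print(round(z, 2), round(thickness, 2))
# prints: 0.62 12.21
```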

Finding Areas Under the Normal Curve
• The proportion of a normal population that lies
within a given interval is equal to the area
under the normal probability density above
that interval.
• This would suggest integrating the normal pdf,
but this integral does not have a closed form
solution.

• So, the areas under the standard normal curve
(mean 0, variance 1) are approximated
numerically and are available in a standard
normal table or z table, given in Table A.2.
• We can convert any normal into a standard
normal so that we can compute areas under the
curve.
• The table gives the area in the left-hand tail of
the curve. Other areas can be calculated by
subtraction or by using the fact that the total
area under the curve is 1.

In-class exercise
Example 14
1) Find the area under the normal curve to the left of z = 0.47.
Solution:

In-class exercise
Example 14
2) Find the area under the normal curve to the right of z = 1.38.
Solution:

In-class exercise
Example 15
1) Find the area under the normal curve between
z = 0.71 and z = 1.28.
Solution:

In-class exercise
Example 15
2) What z-score corresponds to the 75th percentile of a normal
curve?
Solution:
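
As a cross-check on table lookups, the areas in Examples 14 and 15 can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch of one way to get the numbers, not the table-based method the lecture uses; small differences from Table A.2 come from rounding z to two decimals.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

left_of_047 = Z.cdf(0.47)             # area to the left of z = 0.47
right_of_138 = 1 - Z.cdf(1.38)        # total area is 1, so subtract the left tail
between = Z.cdf(1.28) - Z.cdf(0.71)   # difference of two left-tail areas
z_75th = Z.inv_cdf(0.75)              # z-score with 75% of the area to its left

print(round(left_of_047, 4), round(right_of_138, 4),
      round(between, 4), round(z_75th, 2))
# prints: 0.6808 0.0838 0.1386 0.67
```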

In-class exercise

Solution:
The following figure presents the probability density function of
the N(50, 5²) distribution.

The shaded area represents P(42 < X < 52), the probability that a
randomly chosen battery has a lifetime between 42 and 52 hours.

In-class exercise

Solution:
Therefore, the 40th percentile of battery lifetimes is 48.75 hours.
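
The two battery-lifetime results can be checked in software, assuming lifetimes follow the N(50, 5²) distribution named in the solution. The sketch uses `statistics.NormalDist`; the percentile it returns differs slightly from 48.75 because the table-based answer rounds the z-score to −0.25.

```python
from statistics import NormalDist

# Assumed model from the solution slide: lifetimes ~ N(50, 5^2), in hours.
lifetime = NormalDist(mu=50, sigma=5)

# P(42 < X < 52): difference of two left-tail areas.
p_between = lifetime.cdf(52) - lifetime.cdf(42)

# 40th percentile: the lifetime with 40% of the population below it.
pct40 = lifetime.inv_cdf(0.40)

print(round(p_between, 4), round(pct40, 2))
# prints: 0.6006 48.73
```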

Using 2.51 and its z-score 1.96:

Estimating the Parameters
• If X1, …, Xn are a random sample from a N(μ, σ²)
distribution,
– μ is estimated with the sample mean $\bar{X}$, and
– σ² is estimated with the sample variance s².
• As with any sample mean, the uncertainty in $\bar{X}$ is
$\sigma/\sqrt{n}$, which we replace with $s/\sqrt{n}$ if σ is
unknown.
• The mean $\bar{X}$ is an unbiased estimator of μ.
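
These estimates are straightforward to compute. The sample below is hypothetical and only illustrates the arithmetic; `estimate_normal_params` is an illustrative name, not a library function.

```python
import statistics
from math import sqrt

def estimate_normal_params(sample):
    """Return (sample mean, sample variance, uncertainty in the sample mean)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 denominator
    return x_bar, s**2, s / sqrt(n)

# Hypothetical sample of five observations.
x_bar, s_squared, se_mean = estimate_normal_params([9.8, 10.4, 10.1, 9.6, 10.6])
print(round(x_bar, 2), round(s_squared, 2), round(se_mean, 3))
# prints: 10.1 0.17 0.184
```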

Linear Functions of Normal Random Variables
Let X ~ N(μ, σ²) and let a ≠ 0 and b be constants.
Then
$$aX + b \sim N(a\mu + b,\ a^2\sigma^2)$$
Linear Combinations
Let X1, X2, …, Xn be independent and normally distributed with
means μ1, μ2, …, μn and variances σ1², σ2², …, σn².
Let c1, c2, …, cn be constants, and c1X1 + c2X2 + … + cnXn be a
linear combination. Then
$$c_1X_1 + c_2X_2 + \cdots + c_nX_n \sim N(c_1\mu_1 + c_2\mu_2 + \cdots + c_n\mu_n,\ c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \cdots + c_n^2\sigma_n^2)$$

Example 16
A chemist measures the temperature of a
solution in °C. The measurement is denoted C,
and is normally distributed with mean 40 °C and
standard deviation 1 °C. The measurement is
converted to °F by the equation F = 1.8C +
32. What is the distribution of F?
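
The rule for linear functions of a normal random variable can be applied mechanically to Example 16. The sketch below wraps the rule in a hypothetical helper, `linear_transform`, built on `statistics.NormalDist`; it is one way to organize the calculation, not the worked solution from the lecture.

```python
from statistics import NormalDist

def linear_transform(dist, a, b):
    """Distribution of aX + b when X is normal and a != 0."""
    # Mean becomes a*mu + b; standard deviation becomes |a|*sigma.
    return NormalDist(mu=a * dist.mean + b, sigma=abs(a) * dist.stdev)

C = NormalDist(mu=40, sigma=1)    # temperature in degrees Celsius
F = linear_transform(C, 1.8, 32)  # F = 1.8C + 32

print(round(F.mean, 2), round(F.variance, 2))
# prints: 104.0 3.24
```

Note that the additive constant 32 shifts the mean but has no effect on the variance.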

Distributions of Functions of Normals
Let X1, X2, …, Xn be independent and normally distributed with
mean μ and variance σ². Then
$$\bar{X} \sim N\left(\mu,\ \frac{\sigma^2}{n}\right)$$
Let X and Y be independent, with X ~ N(μ_X, σ_X²) and
Y ~ N(μ_Y, σ_Y²). Then
$$X + Y \sim N(\mu_X + \mu_Y,\ \sigma_X^2 + \sigma_Y^2)$$
$$X - Y \sim N(\mu_X - \mu_Y,\ \sigma_X^2 + \sigma_Y^2)$$

How Can I Tell Whether My Data Come from
a Normal Population?
• In practice, we often have a sample from some population, and
we must use the sample to decide whether the population
distribution is approximately normal.
• If the sample is reasonably large:
– The sample histogram may give a good indication.
– Samples from normal populations have histograms that look
something like the normal density function (peaked in the
center, and decreasing more or less symmetrically on either side).
– Probability plots provide another good way of determining
whether a reasonably large sample comes from a
population that is approximately normal.
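
The points of a normal probability plot can be computed without any plotting library. The sketch below pairs each sorted observation with the standard normal quantile at position (i − 0.5)/n; that plotting position is an assumption (conventions vary), and the five-value sample is only for illustration, since a real check needs a reasonably large sample.

```python
from statistics import NormalDist

def normal_plot_points(sample):
    """Pair each sorted value with the standard normal quantile at (i - 0.5)/n."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(xs, start=1)]

# Illustrative sample only; the points would be plotted as (quantile, value).
points = normal_plot_points([251.3, 252.5, 250.8, 251.1, 250.4])
for q, x in points:
    print(round(q, 3), x)
```

If the plotted points fall close to a straight line, the sample is consistent with a normal population; strong curvature or isolated points far from the line suggest otherwise.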

• For small samples, it can be difficult to tell whether the normal
distribution is appropriate.
• One important fact is this: Samples from normal populations
rarely contain outliers. Therefore the normal distribution
should generally not be used for data sets that contain outliers.
• This is especially true when the sample size is small.
– Unfortunately, for small data sets that do not contain
outliers, it is difficult to determine whether the population
is approximately normal.
– In general, some knowledge of the process that generated
the data is needed.

End of Lecture