Ece 069 Complete
Productivity Tip: “The best things in life make you sweaty.”– Edgar Allan Poe
A. LESSON PREVIEW/REVIEW
1) Introduction (2 mins)
Did you ever ask yourself why you must place your lotto bets at an official outlet? One of the
sources of our government's funds for health care is the income from lotto games. Remember that
Swertres Lotto is an arrangement of 3 digits in which the digits may be repeated; hence, 10 x 10 x 10 =
1,000 three-digit numbers can be generated. Note that if you bet P10 on a certain number and that
number wins, you win P6,500. This means that if you bet on all the possible three-digit numbers, you are
sure to win, but you will win only P6,500 out of the P10,000 you bet. Wow, thank you for
helping the charity projects of our government.
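The counting in this introduction can be checked with a short Python sketch. It enumerates every three-digit Swertres number (repetition allowed) and computes the net result of betting P10 on each one, using the P10 bet and P6,500 prize stated above; the function and variable names are illustrative only.

```python
from itertools import product

def all_swertres_numbers():
    """Return every 3-digit string from '000' to '999' (digits may repeat)."""
    return ["".join(digits) for digits in product("0123456789", repeat=3)]

numbers = all_swertres_numbers()
bet_per_number = 10   # pesos bet on each number, as in the text
prize = 6500          # pesos won by the single winning number

total_bet = bet_per_number * len(numbers)
net = prize - total_bet

print(len(numbers))   # 1000
print(total_bet)      # 10000
print(net)            # -3500
```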
In the first column of the table below, write what you already know to answer the questions in the second column.
What is permutation?
What is combination?
B. MAIN LESSON
FUNDAMENTAL PRINCIPLE OF COUNTING
If some procedure can be performed in n1 different ways, and if, following this procedure, a second
procedure can be performed in n2 different ways, and if, following this second procedure, a third
procedure can be performed in n3 different ways, and so forth, then the number of ways the procedures
can be performed in the order indicated is the product n1 x n2 x n3 x . . .
Example. Suppose a car number plate contains three distinct English letters followed by three non-
repeated digits. How many different car number plates can be printed?
Let the boxes below contain the 3 distinct letters followed by the 3 non-repeated digits.
Box 1 (1st letter): 26 different ways
Box 2 (2nd letter): 25 different ways
Box 3 (3rd letter): 24 different ways
Box 4 (1st digit): 10 different ways
Box 5 (2nd digit): 9 different ways
Box 6 (3rd digit): 8 different ways
Note that there are 26 letters in the English alphabet and 10 digits in our number system, so the first
box could be filled in 26 different ways, and since the 3 letters used are distinct, the succeeding 2 boxes
could be filled in 25 and 24 different ways, respectively. Then the 4th box could be filled in 10 different
ways and, again, since the digits should not be repeated, the succeeding boxes could be filled in 9 and 8
different ways, respectively. Therefore, according to the Fundamental Principle of Counting, the total
number of car number plates that could be printed in this setup is 26 x 25 x 24 x 10 x 9 x 8 = 11,232,000.
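A minimal Python sketch of the Fundamental Principle of Counting: the helper `fundamental_principle` (a name chosen here for illustration) multiplies the number of ways each box can be filled, and a brute-force listing on a smaller alphabet confirms that multiplying agrees with counting every arrangement one by one.

```python
from itertools import permutations
from math import prod

def fundamental_principle(choices):
    """Multiply the number of ways each successive box can be filled."""
    return prod(choices)

# 3 distinct letters (26, 25, 24) followed by 3 non-repeated digits (10, 9, 8)
plate_choices = [26, 25, 24, 10, 9, 8]
total_plates = fundamental_principle(plate_choices)
print(f"{total_plates:,}")  # 11,232,000

# Brute-force cross-check on a smaller case: 2 distinct letters from A-D,
# then 2 non-repeated digits from 0-2, giving 4 x 3 x 3 x 2 = 72 arrangements.
small_count = len(list(permutations("ABCD", 2))) * len(list(permutations("012", 2)))
print(small_count == fundamental_principle([4, 3, 3, 2]))  # True
```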
FACTORIAL NOTATION, n!
n! = 1 x 2 x 3 x 4 x … x (n – 2) x (n – 1) x n
0! = 1
1! = 1
Example:
4! = 1 x 2 x 3 x 4 = 24
6! = 6 x 5 x 4! = 6 x 5 x 24 = 720
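A small Python sketch of the factorial definition above, cross-checked against the standard library's `math.factorial`; the function name `factorial` belongs to this example only.

```python
import math

def factorial(n: int) -> int:
    """Compute n! = 1 x 2 x ... x n iteratively, with 0! = 1."""
    if n < 0:
        raise ValueError("n! is defined here only for n >= 0")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

for n in (0, 1, 4, 6):
    assert factorial(n) == math.factorial(n)  # agrees with the standard library
    print(f"{n}! = {factorial(n)}")

# The shortcut used above: 6! = 6 x 5 x 4!
print(factorial(6) == 6 * 5 * factorial(4))  # True
```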
PERMUTATION
An arrangement of a set of n objects in a specified order. The number of permutations of n objects taken
r at a time is

$$P(n, r) = {}_{n}P_{r} = \frac{n!}{(n-r)!}$$
Example. Find the number of permutations of 6 objects, say a, b, c, d, e, f, taken three at a time. Using
the formula, the number of permutations (that is, the number of arrangements) of these 6 letters taken 3
at a time is

$$P(6, 3) = \frac{6!}{(6-3)!} = \frac{720}{6} = 120$$

Thus, you can arrange these 6 letters, three at a time, in 120 different ways.
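The permutation formula can be checked in Python, assuming Python 3.8 or later for `math.perm`. The sketch below (the function name `P` is chosen to mirror the notation) applies n!/(n - r)! and also lists every ordered arrangement of three of the letters a to f with `itertools.permutations`.

```python
import math
from itertools import permutations

def P(n: int, r: int) -> int:
    """Number of ordered arrangements of r objects taken from n: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

print(P(6, 3))                               # 120
arrangements = list(permutations("abcdef", 3))
print(len(arrangements))                     # 120, by listing every arrangement
print(arrangements[:3])                      # the first few ordered triples
print(P(6, 3) == math.perm(6, 3))            # True
```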
The number of permutations of n objects of which n1 are alike, n2 are alike, . . ., nr are alike is

$$\frac{n!}{n_1!\,n_2!\,n_3!\cdots n_r!}$$
Example. How many ways could you arrange the letters of the word PROBABILITY?
Note that some of the letters of the word PROBABILITY are repeated.
Let: n = the total number of letters to be arranged = 11
n1 = the number of letter “P” = 1
n2 = the number of letter “R” = 1
n3 = the number of letter “O” = 1
n4 = the number of letter “B” ( repeated) = 2
n5 = the number of letter “A” = 1
n6 = the number of letter “I” ( repeated) = 2
n7 = the number of letter “L” = 1
n8 = the number of letter “T” = 1
n9 = the number of letter “Y” = 1
So, the total number of ways to arrange the letters of the word PROBABILITY is 9,979,200:

$$\frac{n!}{n_1!\,n_2!\cdots n_9!} = \frac{11!}{1!\,1!\,1!\,2!\,1!\,2!\,1!\,1!\,1!} = \frac{39{,}916{,}800}{4} = 9{,}979{,}200$$
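The formula for arrangements with repeated objects can be sketched in Python by counting each letter with `collections.Counter`; the helper name `distinct_arrangements` is hypothetical. A brute-force check on a short word confirms the division by the repeat counts.

```python
import math
from collections import Counter
from itertools import permutations

def distinct_arrangements(word: str) -> int:
    """n! divided by the factorial of each letter's repeat count."""
    denominator = 1
    for count in Counter(word).values():
        denominator *= math.factorial(count)
    return math.factorial(len(word)) // denominator

print(distinct_arrangements("PROBABILITY"))  # 9979200

# Brute-force check on a short word: 'BOB' has 3!/2! = 3 distinct arrangements.
print(distinct_arrangements("BOB") == len(set(permutations("BOB"))))  # True
```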
COMBINATION
An arrangement of a set of n objects where order does not matter. This means the arrangement of the 3
letters (a, b, c) is the same as (b, c, a), (c, a, b), (a, c, b), (b, a, c) or (c, b, a). As long as the
elements in the arrangement are the same, they count as only one combination; as permutations, however,
they are 6 different arrangements, since order matters.
$$C(n, r) = {}_{n}C_{r} = \frac{P(n, r)}{r!} = \frac{n!}{r!\,(n-r)!}$$
Example. Find the number of combinations of 6 objects, say a, b, c, d, e, f, taken three at a time.
Using the formula, you can select 3 of the 6 letters, in any order, in 20 ways:

$$C(6, 3) = \frac{P(6, 3)}{3!} = \frac{6!}{3!\,(6-3)!} = \frac{720}{6 \times 6} = 20$$
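A Python sketch of the combination formula (the name `C` mirrors the notation above; `math.comb` and `math.perm` need Python 3.8 or later). It also lists the unordered selections with `itertools.combinations` and checks the relationship P(6, 3) = 3! x C(6, 3), which is why the formula divides by r!.

```python
import math
from itertools import combinations

def C(n: int, r: int) -> int:
    """Unordered selections of r objects from n: n! / (r! (n - r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(C(6, 3))                                 # 20
print(len(list(combinations("abcdef", 3))))    # 20, by listing every selection

# Each selection of 3 letters can be ordered in 3! = 6 ways,
# so the permutations outnumber the combinations by a factor of r!.
print(math.perm(6, 3) == math.factorial(3) * C(6, 3))  # True
print(C(6, 3) == math.comb(6, 3))                      # True
```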
2) Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking).
b. How many ways are there to select 3 candidates from 8 equally qualified recent graduates for
an opening in an accounting firm?
c. A teacher forms a committee whose members come from her class of 18 boys and 15 girls.
How many different committees of 5 members, of which 3 are girls and 2 are boys, can be
formed?
What is permutation?
What is combination?
C. LESSON WRAP-UP
You can readily use your scientific calculator to calculate the number of arrangements for permutation and
combination problems.
For permutation calculations, use the nPr key of your calculator.
For example, to calculate P(n, r) with n = 10 and r = 5, press 10, then the nPr key, then 5, then the
equals sign. The result, 30,240, is the number of permutations.
For combination calculations, use the nCr key of your calculator.
For example, to calculate C(n, r) with n = 10 and r = 5, press 10, then the nCr key, then 5, then the
equals sign. The result, 252, is the number of combinations.
FAQs
What are the other applications of the Fundamental Principle of Counting, permutation and combination
concepts?
The Fundamental Principle of Counting, permutation and combination concepts are needed in solving
probability problems.
KEY TO CORRECTION
Activity #3
a) $$P(15, 5) = \frac{15!}{(15-5)!} = 360{,}360 \text{ ways}$$
b) $$C(8, 3) = \frac{8!}{3!\,(8-3)!} = 56 \text{ ways}$$
c) Total number of committees formed = C(15, 3) × C(18, 2) = 455 × 153 = 69,615
d) Total number of plans available = 5 × 3 × 2 × 2 = 60.
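The answers in this key can be reproduced with Python's `math.perm` and `math.comb` (Python 3.8 or later). Item (d)'s factors are copied directly from the key, since the question itself is not shown here.

```python
import math

answers = {
    "a": math.perm(15, 5),                     # P(15, 5)
    "b": math.comb(8, 3),                      # C(8, 3)
    "c": math.comb(15, 3) * math.comb(18, 2),  # 3 of 15 girls and 2 of 18 boys
    "d": 5 * 3 * 2 * 2,                        # product of the factors in the key
}
for item, value in answers.items():
    print(f"{item}) {value:,}")
```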
Activity #5
a) permutation problem
b) combination problem
c) combination problem
A. LESSON PREVIEW/REVIEW
1) Introduction (2 mins)
Probability is used to describe the likelihood that an event will happen. When probabilities are used to
describe the occurrence of a particular event, you are projecting the likelihood of that event happening.
For example, when a classmate states "I think the probability of a quiz tomorrow is about 40%", they are
describing what they think is the probability of that particular event.
Probability is a measure of certainty about a particular outcome. For instance, if we toss a coin, we expect
it to land heads half the time. When we roll a die with 6 faces, we expect to get a 6 one time out of six.
The probability of a coin coming up heads is 0.5, and the probability of a die coming up 4 is 1/6. Something
that is certain has a probability of 1, whereas something that is impossible has a probability of 0.
In the first column of the table below, write what you already know to answer the questions in the second column.
B. MAIN LESSON
DEFINITION OF TERMS
SAMPLE SPACE
The set of all possible outcomes of a statistical experiment is called a sample space (represented by the
symbol S). Each outcome in a sample space is called an element or a member of the sample space, or
simply a sample point.
Example 1. Consider the experiment of tossing a die. If we are interested in the number that shows on the
top face, then the sample space is S = {1, 2, 3, 4, 5, 6}.
EVENT
An event is a subset of a sample space. For example, if A is the event that the number showing when a die
is tossed is divisible by 3, then A = {3, 6}.
PROBABILITY
If an experiment can result in any one of N different equally likely outcomes, and if exactly n of these
outcomes correspond to event A, then the probability of event A is
P(A) = n / N
Example. Calculate the probability of getting a Jack in 1 draw from a well-shuffled deck of cards.
Note that there are 4 Jacks (Jack of Hearts, Clubs, Spades and Diamonds) in a deck of cards, and the
total number of cards in the deck is 52, so using the equation above, the probability of picking a Jack
from the deck is

P(Jack) = (number of Jacks in the deck) / (total number of cards in the deck) = 4 / 52 ≈ 0.0769.
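A minimal sketch of the classical definition P(A) = n/N in Python, assuming all outcomes are equally likely. Events are modelled as Python sets, so an event is literally a subset of the sample space; the helper name `probability` and the card labels are illustrative.

```python
from fractions import Fraction

def probability(event: set, sample_space: set) -> Fraction:
    """P(A) = n / N for equally likely outcomes, where A is a subset of S."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

# Die example: S = {1, ..., 6}, A = the outcomes divisible by 3
S = set(range(1, 7))
A = {x for x in S if x % 3 == 0}
print(A, probability(A, S))              # {3, 6} 1/3

# Card example: 4 Jacks in a 52-card deck
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Clubs", "Spades", "Diamonds"]
deck = {(rank, suit) for rank in ranks for suit in suits}
jacks = {card for card in deck if card[0] == "J"}
p_jack = probability(jacks, deck)
print(p_jack, round(float(p_jack), 4))   # 1/13 0.0769
```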
Properties of Probability
P(A) - the probability of the event A
P(S) - the probability of the sample space
1) Positiveness. For every event A, 0 ≤ P(A) ≤ 1. This means that the probability of an event is never
negative and never greater than 1.
2) Probability of a sure event: P(S) = 1.
3) If ∅ is the empty set, then P(∅) = 0.
Example. Let E be the event that at least 5 cars are serviced. Then P(E) = 1 – P(Eᶜ), where Eᶜ is the
event that fewer than 5 cars are serviced.
Since P(Eᶜ) = 0.12 + 0.19 = 0.31, P(E) = 1 – 0.31 = 0.69.
4) If A1, A2, … is a sequence of mutually exclusive events, then P(A1 U A2 U …) = P(A1) + P(A2) + …
Example : If the probabilities are, respectively, 0.09, 0.15, 0.21, and 0.23 that a person purchasing a
new automobile will choose the color green, white, red, or blue, what is the probability
that a given buyer will purchase a new automobile that comes in one of those colors?
Let G, W, R, and B be the events that a buyer selects, respectively, a green, white, red, or
blue automobile. Since these four events are mutually exclusive, the probability is
P(G U W U R U B) = P(G)+ P (W )+ P (R)+ P (B)
= 0.09 + 0.15 + 0.21 + 0.23 = 0.68
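The complement rule and the additive rule for mutually exclusive events can be checked with exact arithmetic. This sketch uses `fractions.Fraction` so that decimal probabilities such as 0.12 are not distorted by floating-point rounding; the probabilities are the ones given in the two examples above.

```python
from fractions import Fraction

# Complement rule: P(E) = 1 - P(E^c), with P(E^c) = 0.12 + 0.19 from the car-service example
p_fewer_than_5 = Fraction("0.12") + Fraction("0.19")
p_at_least_5 = 1 - p_fewer_than_5
print(p_at_least_5, float(p_at_least_5))  # 69/100 0.69

# Additive rule for the mutually exclusive colour choices
colour_probs = {
    "green": Fraction("0.09"),
    "white": Fraction("0.15"),
    "red": Fraction("0.21"),
    "blue": Fraction("0.23"),
}
p_any_colour = sum(colour_probs.values())
print(p_any_colour, float(p_any_colour))  # 17/25 0.68
```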
Conditional Probability
Conditional probabilities are calculated when we need to know the likelihood of event A happening given
that event B has already happened. We say that event A is conditional on event B. Conditional
probabilities are marked not by a keyword but by a key symbol, the vertical bar (|). They are written
P(A | B), which is read "the probability of A given B".
Let A and E be events in a sample space S with P(E) > 0. The probability that event A occurs once E has
occurred, in other words the conditional probability of A given E, written P(A | E), is defined as follows:

$$P(A \mid E) = \frac{P(A \cap E)}{P(E)}$$

When all outcomes are equally likely, this becomes

$$P(A \mid E) = \frac{\text{number of elements in } A \cap E}{\text{number of elements in } E} = \frac{\text{number of ways } A \text{ and } E \text{ can occur}}{\text{number of ways } E \text{ can occur}}$$
Example. Find the probability of drawing a 4 from a shuffled deck of cards given that you have already
drawn a 7 from the deck (without replacing it). Once the 7 has been drawn, only 51 of the 52 cards remain,
and all four 4s are still among them, so the probability is calculated like this:
P(4 | 7) = 4/51 ≈ 0.0784.
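The counting form of the conditional-probability definition can be verified by brute force. The sketch below assumes a standard 52-card deck and two draws without replacement; it lists every ordered pair of cards and divides the number of pairs in A ∩ E by the number of pairs in E.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["H", "C", "S", "D"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# All ordered two-card draws without replacement, each equally likely
draws = list(permutations(deck, 2))

first_is_7 = [d for d in draws if d[0][0] == "7"]           # event E
seven_then_four = [d for d in first_is_7 if d[1][0] == "4"]  # event A and E

# P(A | E) = (number of outcomes in A and E) / (number of outcomes in E)
p = Fraction(len(seven_then_four), len(first_is_7))
print(p, round(float(p), 4))  # 4/51 0.0784
```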
INDEPENDENT EVENTS
Two events A and B are independent if and only if P(A | B) = P(A) and P(B | A) = P(B).
It follows that P(A ∩ B) = P(A) · P(B).
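To illustrate the product rule, here is a small Python check on a single draw from a 52-card deck. The events "the card is a spade" and "the card is a king" are an example chosen here, not taken from the lesson; they turn out to be independent, and the helper name `prob` is illustrative.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["H", "C", "S", "D"]
deck = {(rank, suit) for rank in ranks for suit in suits}

def prob(event):
    """Classical probability of an event (a set of cards) for one fair draw."""
    return Fraction(len(event), len(deck))

spade = {card for card in deck if card[1] == "S"}
king = {card for card in deck if card[0] == "K"}

print(prob(spade), prob(king), prob(spade & king))         # 1/4 1/13 1/52
print(prob(spade & king) == prob(spade) * prob(king))      # True: independent
print(prob(spade & king) / prob(king) == prob(spade))      # True: P(spade | king) = P(spade)
```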
2) Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking).
a) Two cards are drawn at random from an ordinary pack of 52 cards. Find the probability that (i) both are
spades and (ii) one is a spade and one is a heart.
b) Three light bulbs are chosen at random from a box containing 15 bulbs, of which 5 are defective. Find
the probability that (i) none is defective, (ii) exactly one is defective, and (iii) at least one is
defective.
d) A white marble appears in drawing a single marble from an urn containing 4 white, 3 red and 5 blue
marbles.
C. LESSON WRAP-UP
Skill in solving probability problems allows you to predict the likelihood that an event will occur. Do
you have a situation in mind that you would like to be in in the future? What do you think is your
probability of success?
KEY TO CORRECTION
Activity #3
a)
i) P(both cards are spades) = (number of ways to choose 2 of the 13 spades in the deck) / (total number of
ways to choose 2 cards from the deck)

$$= \frac{{}_{13}C_2}{{}_{52}C_2} = \frac{78}{1326} = \frac{1}{17}$$
ii) $$P(\text{one is a spade and one is a heart}) = \frac{{}_{13}C_1 \cdot {}_{13}C_1}{{}_{52}C_2} = \frac{169}{1326} \approx 0.1275$$
Note: 13C1 is the number of ways to choose one spade out of the 13 spades in the deck, and it is
also the number of ways to choose one heart out of the 13 hearts in the deck.
b)
i) P(none is defective) = (number of ways to choose 3 non-defective bulbs out of the 10 non-defective bulbs) /
(total number of ways to choose 3 bulbs from the box of 15 bulbs)

$$= \frac{{}_{10}C_3}{{}_{15}C_3} = \frac{120}{455} \approx 0.2637$$
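ii) $$P(\text{exactly one is defective}) = \frac{{}_{5}C_1 \cdot {}_{10}C_2}{{}_{15}C_3} = \frac{225}{455} \approx 0.4945$$
Note: 5C1 is the number of ways to choose one defective bulb out of the 5 defective bulbs, and
10C2 is the number of ways to choose 2 non-defective bulbs out of the 10 non-defective bulbs.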
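iii) P(at least one is defective) = P(1 is defective and 2 are not defective) +
P(2 are defective and 1 is not defective) + P(all 3 are defective)

$$= \frac{{}_{5}C_1 \cdot {}_{10}C_2 + {}_{5}C_2 \cdot {}_{10}C_1 + {}_{5}C_3}{{}_{15}C_3} = \frac{225 + 100 + 10}{455} = \frac{335}{455} \approx 0.7363$$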
Note: The complement of "at least one is defective" is "none is defective", so by the complement
property of probability,
P(at least one is defective) = 1 - P(none is defective)

$$= 1 - \frac{{}_{10}C_3}{{}_{15}C_3} = 1 - 0.2637 = 0.7363$$
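The probabilities in this key can be recomputed exactly with `math.comb` and `fractions.Fraction`. The sketch also evaluates part (ii) of item (b) and confirms that the direct sum in (iii) equals the complement shortcut.

```python
from fractions import Fraction
from math import comb

# a) Two cards drawn from an ordinary 52-card deck
both_spades = Fraction(comb(13, 2), comb(52, 2))
spade_and_heart = Fraction(comb(13, 1) * comb(13, 1), comb(52, 2))

# b) Three bulbs from a box of 15, of which 5 are defective and 10 are not
total = comb(15, 3)
none_defective = Fraction(comb(10, 3), total)
exactly_one = Fraction(comb(5, 1) * comb(10, 2), total)
at_least_one = Fraction(
    comb(5, 1) * comb(10, 2) + comb(5, 2) * comb(10, 1) + comb(5, 3), total
)

for label, p in [
    ("a(i)   both spades", both_spades),
    ("a(ii)  one spade, one heart", spade_and_heart),
    ("b(i)   none defective", none_defective),
    ("b(ii)  exactly one defective", exactly_one),
    ("b(iii) at least one defective", at_least_one),
]:
    print(f"{label}: {p} = {float(p):.4f}")

# The complement shortcut gives the same answer as the direct sum
print(at_least_one == 1 - none_defective)  # True
```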
Activity #5
a) P(an even number appears in a toss of a fair die) = 3/6 = 0.5
b) P (a king appears in drawing a single card from an ordinary pack of 52 cards) = 4 / 52
c) P( at least one tail appears in the toss of three fair coins) = 7/8
d) P(a white marble appears) = 4/12
Productivity Tip:
“Never give up on a dream just because of the time it will take to accomplish it. The time will pass anyway.”– E. Nightingale
A. LESSON PREVIEW/REVIEW
1) Introduction (2 mins)
Once you have laid down your research problem, one question you may raise is where you could acquire
your data. First, you have to clearly define your target population. The population is the source of all
the possible data that you will need in your study, and data from the whole population give a more
accurate answer to your research question. However, in some studies it is impossible or too expensive to
acquire complete data, so you may take data from a sample drawn from your target population.
B. MAIN LESSON
There are several methods of determining the sample size of your study. To simplify our discussion, let us
use a sample size of about 20% of the entire population, or you may use Slovin's formula.
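Slovin's formula is commonly written n = N / (1 + Ne²), where N is the population size and e is the margin of error. A minimal Python sketch follows; rounding the result up to the next whole respondent is a common convention assumed here, and the function names are hypothetical.

```python
import math

def slovin(N: int, e: float) -> int:
    """Slovin's formula n = N / (1 + N * e**2), rounded up to a whole respondent."""
    return math.ceil(N / (1 + N * e ** 2))

def twenty_percent(N: int) -> int:
    """The simpler rule used in this lesson: about 20% of N, rounded up."""
    return math.ceil(N * 20 / 100)

print(slovin(150, 0.05))    # 110
print(twenty_percent(150))  # 30
```

The two rules can give quite different sizes for a small population, so it is worth stating in your study which one you used.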
[Figure: samples taken from a population]
SAMPLING METHODS
Example. Suppose that in your study you want to know the average weight of the third-year engineering
students of your school, and you know that the total number of such students is N = 150. Since you are
to conduct the data collection in one day only, you may estimate the average weight from a sample drawn
from the population; let us say your sample size is n = 30. How, then, will you choose your sample using
the different methods of sampling?
SIMPLE RANDOM SAMPLING. A simple random sample is one in which each element of the population
has an equal and independent chance of being included in the sample. Here, each of the 150 members of
the population is assigned a number from 1 to 150. Then, to choose the 30 samples, you may generate
30 different random numbers from 1 to 150; each generated number is matched to the member of the
population assigned that number, and the matched members make up your sample.
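A sketch of this procedure in Python, using `random.Random.sample`, which draws distinct values so that no member is chosen twice. The optional `seed` (an addition here, for reproducibility) fixes the random numbers; the function name is illustrative.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Return n distinct member numbers from 1..N, each equally likely to be chosen."""
    rng = random.Random(seed)  # seed is optional, for reproducible draws
    return sorted(rng.sample(range(1, N + 1), n))

sample = simple_random_sample(150, 30, seed=1)
print(len(sample), sample[:5])
```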
SYSTEMATIC SAMPLING. Again, you assign a number to every member of the population, then sort the
members according to their assigned numbers. Your sample then consists of every kth member of the sorted
list, where k = N/n = 150/30 = 5; that is, every 5th member of your population.
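A minimal Python sketch of systematic sampling with interval k = N // n. Choosing the starting member at random from 1 to k is a common practice assumed here (the lesson does not specify the start); the function name is hypothetical.

```python
import random

def systematic_sample(N, n, start=None):
    """Select every k-th member of the numbered list 1..N, where k = N // n."""
    k = N // n                        # sampling interval: 150 // 30 = 5
    if start is None:
        start = random.randint(1, k)  # assumed random start between 1 and k
    return list(range(start, N + 1, k))[:n]

chosen = systematic_sample(150, 30, start=5)
print(len(chosen), chosen[:4], chosen[-1])  # 30 [5, 10, 15, 20] 150
```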
STRATIFIED SAMPLING. Here, the population is divided into strata (groups of members with similar
characteristics), and samples are randomly chosen from each stratum so that a total of 30 samples is
obtained. In our problem above, you may group your population by sex (male or female) or by course.
Note that the criterion used to group the population into strata should be relevant to your study. Then,
from the strata, you may choose randomly and proportionately a total of 30 students as your sample.
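A sketch of stratified sampling with proportional allocation, assuming a hypothetical split of the 150 students into 90 males and 60 females (the lesson gives no actual stratum sizes). Each stratum contributes n x (stratum size / N) members, chosen by simple random sampling within the stratum.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional allocation: each stratum gives round(n * size / N) members,
    drawn by simple random sampling within that stratum."""
    rng = random.Random(seed)
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = round(n * len(members) / N)  # proportional share of the sample
        sample[name] = sorted(rng.sample(members, share))
    return sample

# Hypothetical strata: member numbers 1-90 are male, 91-150 are female
strata = {"male": list(range(1, 91)), "female": list(range(91, 151))}
chosen = stratified_sample(strata, 30, seed=7)
print({name: len(ids) for name, ids in chosen.items()})  # {'male': 18, 'female': 12}
```

Because each share is rounded, the shares may not add up exactly to n for other stratum sizes; in that case one stratum's share would need a small adjustment.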
CLUSTER SAMPLING. In cluster sampling, you group the population, for example according to
geographical location. If your school has 2 campuses and the distance between them is considerable,
then you group the population by campus. Then, from the 3rd-year engineering students of each campus,
the 30 samples are chosen at random and proportionately.
MULTISTAGE SAMPLING. This is a complex form of cluster sampling. Here, the population is grouped (the
first grouping is similar to that of cluster sampling), each group is divided into subgroups, and so forth;
then the samples are taken randomly and proportionately from the final stage of the grouping process.
CONVENIENCE SAMPLING. Here, samples are not selected at random. The 30 samples are selected
based on the convenience of the researcher or on who is favourable to the researcher.
QUOTA SAMPLING. Here, the population is grouped as in stratified sampling, but the samples taken from
each group are not randomly selected.
JUDGMENT SAMPLING. Here, the selection of the sample depends entirely on the judgment of the
researcher.
SNOWBALL SAMPLING. This sampling method is usually used when the members who can provide the data
you need are rare or hard to find. Here, you may ask your existing sample members to refer you to others.
Since the data are collected through referrals, this sampling method is also called chain-referral sampling.
The mean as well as the standard deviation of the sample is also used to estimate the population mean
(µ) and the population standard deviation (σ).
Mean, x̄, where

$$\bar{x} = \frac{\sum x}{n}$$
Median – the middle item in the data set after sorting,
Mode – the most frequent item in the data set.
Measures of Variability
Range – the difference between the highest and lowest value in the data set.
Variance, s² – the sum of the squared differences between each value and the mean of the data set,
divided by n – 1:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Standard Deviation , s – the square root of the variance.
Example. A tire manufacturer tested the life, in months, of 6 randomly chosen tire samples. The test
results were as follows: 48 53 45 61 57 61
From this data, find the mean life, median, mode, range, variance and the standard deviation.
Mean.

$$\bar{x} = \frac{\sum x}{n} = \frac{48 + 53 + 45 + 61 + 57 + 61}{6} = \frac{325}{6} \approx 54.2$$
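Median. When n is an even number, the median is the average of the two middle items after sorting:
45 48 53 57 61 61, so the median = (53 + 57)/2 = 55.
Mode. The most frequent item is 61 (it appears twice), so the mode = 61.
Range = 61 – 45 = 16.
Variance.

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{(45 - 54.2)^2 + (48 - 54.2)^2 + (53 - 54.2)^2 + (57 - 54.2)^2 + (61 - 54.2)^2 + (61 - 54.2)^2}{6 - 1} \approx 45$$

Standard Deviation, s = √45 ≈ 6.7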
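These measures can be computed with Python's standard `statistics` module; `statistics.variance` and `statistics.stdev` use the n - 1 denominator, matching the formula for s² above. The wrapper name `summarize` is illustrative.

```python
import statistics

def summarize(data):
    """Sample statistics as defined in this lesson (the variance divides by n - 1)."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sum((x - mean)^2) / (n - 1)
        "std_dev": statistics.stdev(data),      # square root of the variance
    }

tires = [48, 53, 45, 61, 57, 61]
for name, value in summarize(tires).items():
    print(f"{name}: {value:.2f}")
```

The same function can be used to check Activity 3B by calling `summarize([4, 9, 3, 6, 4])`.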
2) Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking)
A. Suppose you want to know the average final rating of the students in the College of Engineering in
the subject Differential Equations from school year 2015 to 2019. Discuss how you will choose your
sample using the following techniques:
a. Simple random sampling
b. Systematic sampling
c. Stratified sampling
d. Convenience sampling
B. From this given set of data, 4, 9, 3, 6 & 4 find the mean, median , mode and the standard deviation.
C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
FAQs
In what data collection method is sampling used?
Sampling is usually used when you conduct the survey method of data collection.
KEY TO CORRECTION
Activity #3
A)
Identify all students enrolled in Differential equation from your college for SY 2015 to 2019.
Then have a count of these students, since these will constitute your population.
Compute the sample size. You may use Slovin's formula with e = 0.05.
Assign a number to each member of the population.
a. In simple random sampling, generate n ( sample size) random numbers. Each random number
generated is matched to the member of the population of that assigned number. This will be
your sample for your study.
b. In systematic sampling, you sort the members of your population in increasing or decreasing order of
their assigned numbers. Then, every (N/n)th member of your population is a sample for your study.
c. In stratified sampling, you may group your population by school year; then, from each group, a
random and proportionate share of the n samples is chosen.
d. In convenience sampling, you may choose, at your convenience, n members of your population
as your samples.
B)
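Mean,

$$\bar{x} = \frac{\sum x}{n} = \frac{4 + 9 + 3 + 6 + 4}{5} = 5.2$$

Median = 4
Mode = 4
Standard deviation,

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{(4 - 5.2)^2 + (9 - 5.2)^2 + (3 - 5.2)^2 + (6 - 5.2)^2 + (4 - 5.2)^2}{5 - 1}} \approx 2.39$$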
Activity #5
a) POPULATION
b) SAMPLE
c) 0.05 and 0.01
d) simple random
e) 286
A. LESSON PREVIEW/REVIEW
1) Introduction (2 mins)
Data collection requires careful preparation. The first question is "Why am I doing this research?".
Research is a process of systematic inquiry that seeks answers to a problem. In your specific engineering
profession, you may be confronted with several problems that could be answered through thorough
research. Hence, to answer your inquiry through research, you first pose your research problem. For
example, your research problem may be to design electronic circuits that trace cyber crimes, or to
improve an existing solar panel design. Once the research problem is set up, you may outline how to
carry out your research, and this leads you to data collection.
In the first column of the table below, write what you already know to answer the questions in the second column.
What is statistics?
B. MAIN LESSON
Statistics is a science which deals with data. It deals with the systematic collection, presentation, analysis
and interpretation of numerical data. Hence, statistics is a very useful tool in fields of study that deal
with intelligent decision-making.
There are two categories of statistics, these are:
a) DESCRIPTIVE STATISTICS – refers to the collection and presentation of data.
b) INFERENTIAL STATISTICS – refers to the analysis and interpretation of data.
STATISTICS is a systematic . . .
COLLECTION OF DATA → PRESENTATION OF DATA → ANALYSIS OF DATA → INTERPRETATION OF DATA
Data are information, which are usually facts or numbers collected to answer research problems or
research investigations. There are two types of data and these are :
a) PRIMARY DATA – these are collected by the researcher and used for the first time in his investigation
or research. This data is originally obtained by the researcher through surveys, interviews, documents,
experimentation and direct observation. Primary data is more costly to obtain than secondary data.
SURVEY THROUGH INTERVIEW – The interviewer asks the interviewee (the respondent) questions to collect
data. The questions and responses during an interview may be oral or verbal.
SURVEY THROUGH QUESTIONNAIRES – Questions are typed or written down and sent to the respondents for
their responses. After giving the required responses, the respondents give the questionnaire back to the
researcher.
DOCUMENTS – The researcher collects documents to extract some data. For example, if the registrar's
office would like to tabulate the addresses of the students, then enrollment forms (documents) will be
used to extract the data.
b) SECONDARY DATA – these data are collected by others and then used by the researcher. Sources of
secondary data include books, personal sources, journals, newspapers, websites, government records, etc.
Secondary data are more readily available than primary data. While it is advisable to use primary data,
in cases where primary data are too expensive to obtain, we prefer to use secondary data in our
research.
Sources of SECONDARY DATA: books, websites/internet, newspapers, radio/TV and reports.
Data collection is one of the stages in conducting research. It is the process of collecting or gathering
data or information to find answers to a research problem. Data may come from the population (which
refers to all the data required in an investigation or research) or from a sample (which refers to a part
of the population). Furthermore, the research problem gives the researcher an idea of what type of data
should be collected, be it primary data or secondary data.
2) Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking)
A. Fill in the blanks to complete the following sentences. The choices are found below.
Choices :
primary data secondary newspapers experiments
a. ___________ are information, which are usually facts or numbers collected to answer research
problems or research investigations.
b. If you are the researcher and ____ ___________ data are collected by the researcher themselves
for their own study.
c. ___ ____ ___ data are data already collected by other researchers, which you then use for your
study.
d. Data from ____ _____________ is an example of primary data.
e. Data from _____ _______________ is an example of secondary data.
B. Form groups of five. Then, as a team, discuss a possible research problem that could be answered
within 10 minutes without going out of the classroom, then conduct the data collection from at least
20% of the population. A sample problem could be: "I would like to know the weights of my classmates
in this class." You may use a survey through interview or a survey through questionnaire to get the
weights of at least 20% of your classmates. (Note: The research problem should be approved by the
teacher. Answers may vary depending on the research problem.)
What is statistics?
b. I would like to know the final grades in school of my 3 brothers and 1 sister.
c. I would like to be updated daily on the world's COVID-19 cases from April 2020 to July 2020.
C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
KEY TO CORRECTION
Activity #3
A) a) data
b) primary
c) secondary
d) experiments
e) newspaper
Activity #5
a) Answer. An experiment may be done to collect the data leading to the identification of which
machine is more efficient.
b) Answer. The data used to know the final grades may come from the report cards of the siblings
(primary data), or the grades may be downloaded from the internet (secondary data).
c) Answer. The data used may come from the internet (secondary data) or from newspapers
(secondary data).
A. LESSON PREVIEW/REVIEW
1) Introduction (2 mins)
Once data are collected, they should be summarized and presented in a meaningful manner. There are
a number of options available for presenting data, so a basic understanding of the desired result or form
helps you choose the correct form of representation. There should be enough samples available to get a
meaningful analysis and result.
In the first column of the table below, write what you already know to answer the questions in the second column.
What is a histogram ?
B. MAIN LESSON
Once data are collected, they should be summarized and presented in a meaningful manner.
Example 1. The manager of a department store has been receiving customer feedback saying that
customers wait a long time to be served by the sales representatives. The manager observes the waiting
times of 20 customers and lists the following observations.
In textual presentation of data, the data are presented in the form of words, sentences and
paragraphs. Textual presentation is used by researchers to present qualitative data, which cannot be
presented in graphical or tabular form.
From the example, state in words how you would describe the data. Here, you may describe the data set
as, "The 20 observations give a minimum waiting time of 31 seconds and a maximum waiting time of 54
seconds. The average (mean) waiting time is 41.65 seconds. The most frequent waiting time is 46
seconds."
In tabular presentation, the data are arranged in rows and columns and positioned to facilitate
comprehension and understanding. In other words, the data are presented in a meaningful table.
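Table 1. Frequency Distribution Table of the Waiting Time of 20 Department Store Customers
Waiting time (seconds)    Frequency (number of customers)    Percentage (%)
30 – 35                   3                                  15
36 – 40                   5                                  25
41 – 45                   6                                  30
46 – 50                   5                                  25
51 – 54                   1                                  5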
Table 1 is also called the frequency distribution table of the waiting time of twenty department store
customers. The table shows that most of the customers wait for 41 to 45 seconds, since this time range
has the highest percentage (30%). Furthermore, the data show that only 5% wait for 51 seconds or more.
Example 2.
Graphic representation is another way of analyzing numerical data. A graph is a sort of chart through
which statistical data are represented in the form of lines or curves.
Line graph
The simplest method of graphical presentation.
The data is represented in the form of straight lines.
Each line represents an observation, and its height represents the magnitude of that observation.
The distance between lines is uniform.
Bar graph
Presents grouped data with rectangular bars whose height is proportional to the size of each
group.
The width of the bars and the space between them are kept constant.
The independent variable is shown on the x – axis and the dependent variable is shown on the
y-axis.
Pie Chart
A circular statistical graph which is divided into slices to illustrate numerical proportion.
In a pie chart, the arc length of each slice is proportional to the quantity it represents.
[Figures a, b and c]
Histogram
A graphical representation that organizes a group of data points into user-specified ranges.
It is similar in appearance to a bar graph.
The histogram summarizes the data set into an easy visual interpretation. In a histogram, the
y – axis represents the frequency ( the number of counts or percentage of occurrence of the
data in the set ).
The x – axis represents the outcomes.
Approximates the distribution of numerical data.
Divide the entire range of values (outcomes) into a series of equal intervals, called bins.
Count how many values (the frequency) fall into each interval. The intervals do not overlap.
Plot the frequency versus the equal intervals using rectangles (height proportional to frequency)
with no spaces between the rectangles.
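The binning and counting steps above can be sketched in Python. Because the lesson's 20 waiting-time observations are not reproduced in this text, the data below are hypothetical; each interval is taken as [a, a + width), an assumption that keeps the intervals from overlapping, and a row of `#` characters stands in for each bar. The function name is illustrative.

```python
def frequency_table(data, lower, width, num_bins):
    """Count how many values fall in each equal-width interval [a, a + width).

    A value on a boundary goes to the higher interval, so intervals never overlap.
    """
    counts = [0] * num_bins
    for x in data:
        i = int((x - lower) // width)
        if not 0 <= i < num_bins:
            raise ValueError(f"{x} lies outside the binned range")
        counts[i] += 1
    rows = []
    for i, f in enumerate(counts):
        a = lower + i * width
        rows.append((a, a + width, f, 100 * f / len(data)))
    return rows

# Hypothetical waiting times in seconds (not the lesson's observations)
waiting = [31, 34, 38, 40, 41, 42, 44, 45, 46, 46, 47, 52]
for a, b, f, pct in frequency_table(waiting, lower=30, width=5, num_bins=5):
    print(f"[{a}, {b}) s: {f:2d} {'#' * f} ({pct:.0f}%)")
```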
2) Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking).
A researcher is conducting an experiment to determine the life of the car batteries that their company
produces. The following lives (in years) of 25 car batteries were observed.
What is a histogram ?
[Figures 1 and 2]
C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
You can use the Analysis ToolPak of Microsoft Excel to plot your histogram.
Follow the steps below:
Open your Excel window, then enter your data in one column (Column A), and enter the class limits
(bin values) of your data in another column (Column C). You may use the data of Example 1.
On the Data tab, in the Analysis group, click Data Analysis.
Note: Install the Analysis ToolPak add-in if the Data Analysis button is not found.
Select Histogram and click OK.
Select the input range (Column A) and then select the bin range (Column C).
Click Output Range and select where your histogram output will be printed.
Check Chart Output.
Click OK.
Properly label the graph.
To remove the space between the bars, right-click a bar, click Format Data Series and change the Gap
Width to 0%.
KEY TO CORRECTION
Activity #3
a)
Table a. Frequency Distribution Table of the Life of Car Batteries
Life of car batteries, x (years)    Frequency, f (number of batteries)    Percentage
b)
Activity #5
1) textual
2) frequency distribution
3) line graph
4) histogram
5) 1
Productivity Tip: Anyone who has never made a mistake has never tried anything new. – A. Einstein
A. LESSON PREVIEW/REVIEW
1) Introduction (2 mins)
A random experiment is an experiment whose outcome cannot be predicted with certainty before the
experiment is run. Let us recall the concept of the sample space. The sample space (also called the possibility
space) of an experiment is the set of all possible outcomes or results of that experiment. A sample
space is usually denoted using set notation, and the possible outcomes are listed as the elements of
the set.
Fill in the first column of what you know to answer the questions on the second column of the table below.
B. MAIN LESSON
A random experiment is an experiment whose outcome cannot be predicted with certainty before the
experiment is run.
The sample space (also called the possibility space) of an experiment is the set of all possible outcomes or
results of that experiment. A sample space is usually denoted using set notation, and the possible
outcomes are listed as the elements of the set.
Random variable is a function that assigns a real number to each outcome in the sample space of a
random experiment.
Example.
Tossing a coin twice is a random experiment since you cannot predict the outcome with certainty.
If we let S be the sample space, then S = {HH, HT, TH, TT}. We call HH, HT, TH, and TT the
elements of the sample space; in statistics, each of these elements is termed a sample point.
If we let X be the number of heads that appear, then X takes the values 0, 1, or 2: X = 0 for the
outcome TT, X = 1 for the outcomes HT and TH, and X = 2 for the outcome HH. Here, X is a random
variable whose possible values are {0, 1, 2}.
Experiment
Outcome (sample point):  HH  HT  TH  TT
X:                       2   1   1   0
Random Variables
Example 1: Tossing a coin, as in the above experiment, has a countable number of outcomes. Hence,
the random variable X is a discrete random variable.
Example 2: If we let X be the height of students in our class, then X may take uncountably many
values; that is, X may equal 156.0 cm, 156.1 cm, 156.11 cm, and so on.
Here X is a continuous random variable.
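For a discrete random variable X, the probability mass function is
$$ f(x) = \begin{cases} P(X = x), & x \in S \\ 0, & x \notin S \end{cases} $$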
From Example 1:
Experiment outcome (sample point):  HH  HT  TH  TT
X:                                  2   1   1   0
Probability Distribution
Tabular
x:                0    1    2
f(x) = P(X = x):  1/4  1/2  1/4
Graphical (histogram)
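A short Python sketch can rebuild this tabular distribution by listing the sample space of two tosses and counting heads. It assumes a fair coin, so each of the four sample points has probability 1/4; exact fractions are used so the result matches the table.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of tossing a fair coin twice: HH, HT, TH, TT.
sample_space = ["".join(t) for t in product("HT", repeat=2)]

# Random variable X maps each sample point to its number of heads.
X = {outcome: outcome.count("H") for outcome in sample_space}

# Each sample point is equally likely, so f(x) = (# points with X = x) / 4.
counts = Counter(X.values())
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

print(X)    # {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}
print(pmf)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```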
Continuous Random Variable. Let X be a continuous random variable. The probability density function
(pdf) of X is a function f(x) ≥ 0, with $\int_{-\infty}^{\infty} f(x)\,dx = 1$, such that for any two numbers a and b with a ≤ b,
$$ P(a \le X \le b) = \int_a^b f(x)\,dx $$
The mean, variance and standard deviation of Random Variables & Samples
The mean is the average of the values of the random variable, weighted by their probabilities. The mean of a random variable is
also called the expected value, E(X) or μ. The variance, Var(X) or σ², is the expected squared deviation from the mean, E[(X − μ)²].
The standard deviation is denoted by σ. It is the positive square root of the variance. The standard
deviation is measured in the same units as the random variable, while the variance is measured in
squared units; hence, the standard deviation is often the preferred measure.
Example 3. The discrete random variable X has the following probability distribution.
x:     15    20    25    30
f(x):  0.30  0.25  0.28  0.17
What are the expected value, the variance, and the standard deviation of the random
variable X?
Mean
$$ E(X) = \mu = \sum x\, f(x) $$
$$ E(X) = (15)(0.30) + (20)(0.25) + (25)(0.28) + (30)(0.17) = 4.5 + 5.0 + 7.0 + 5.1 = 21.6 $$
Variance
$$ \operatorname{Var}(X) = \sigma^2 = \sum (x - \mu)^2 f(x) $$
$$ \sigma^2 = (15 - 21.6)^2(0.30) + (20 - 21.6)^2(0.25) + (25 - 21.6)^2(0.28) + (30 - 21.6)^2(0.17) = 28.94 $$
Standard Deviation
$$ \sigma = \sqrt{\sigma^2} = \sqrt{28.94} \approx 5.38 $$
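The arithmetic of Example 3 can be checked with a few lines of Python. The sketch applies the definitions μ = Σ x f(x) and σ² = Σ (x − μ)² f(x), and also the equivalent shortcut σ² = E(X²) − μ², which should give the same variance.

```python
import math

# Probability distribution from Example 3.
xs = [15, 20, 25, 30]
fs = [0.30, 0.25, 0.28, 0.17]
assert abs(sum(fs) - 1.0) < 1e-12  # a valid distribution sums to 1

mean = sum(x * f for x, f in zip(xs, fs))                 # E(X) = sum of x f(x)
var = sum((x - mean) ** 2 * f for x, f in zip(xs, fs))   # sum of (x - mu)^2 f(x)
var_shortcut = sum(x * x * f for x, f in zip(xs, fs)) - mean ** 2  # E(X^2) - mu^2
sd = math.sqrt(var)

print(f"E(X) = {mean:.2f}, Var(X) = {var:.2f}, sd = {sd:.2f}")
# E(X) = 21.60, Var(X) = 28.94, sd = 5.38
```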
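Example 4. What are the expected value, the variance, and the standard deviation of the continuous random variable X
with the following probability distribution?
$$ f(x) = \begin{cases} \dfrac{3}{x^4}, & x \ge 1 \\ 0, & \text{otherwise} \end{cases} $$
(The constant 3 makes the total area equal 1: $\int_1^{\infty} 3x^{-4}\,dx = 1$.)
Mean
$$ E(X) = \int_{-\infty}^{\infty} x\, f(x)\,dx = \int_{-\infty}^{1} x\,(0)\,dx + \int_{1}^{\infty} x \cdot \frac{3}{x^4}\,dx = \left[-\frac{3}{2x^2}\right]_1^{\infty} = 1.5 $$
Variance
$$ \operatorname{Var}(X) = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = E(X^2) - [E(X)]^2 $$
$$ E(X^2) = \int_{-\infty}^{1} x^2\,(0)\,dx + \int_{1}^{\infty} x^2 \cdot \frac{3}{x^4}\,dx = \left[-\frac{3}{x}\right]_1^{\infty} = 3.0 $$
$$ \sigma^2 = E(X^2) - [E(X)]^2 = 3.0 - (1.5)^2 = 0.75 $$
Standard Deviation
$$ \sigma = \sqrt{0.75} \approx 0.87 $$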
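As a numerical check of Example 4, the integrals over [1, ∞) can be approximated in plain Python. The sketch assumes the pdf f(x) = 3/x⁴ for x ≥ 1 (the constant 3 is what makes the total area equal 1), maps [1, ∞) onto (0, 1] with the substitution x = 1/u, and applies the midpoint rule; the helper name `integrate_from_1_to_inf` is our own.

```python
def integrate_from_1_to_inf(g, n=10_000):
    """Approximate the integral of g over [1, inf) with the midpoint rule.

    Substituting x = 1/u turns it into the integral of g(1/u) / u**2 over (0, 1].
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h          # midpoint of the i-th subinterval
        total += g(1.0 / u) / u ** 2
    return total * h

def pdf(x):
    # Assumed pdf for Example 4: f(x) = 3 / x**4 for x >= 1, else 0.
    return 3.0 / x ** 4 if x >= 1 else 0.0

area = integrate_from_1_to_inf(pdf)                       # should be close to 1
mean = integrate_from_1_to_inf(lambda x: x * pdf(x))      # E(X)
ex2 = integrate_from_1_to_inf(lambda x: x * x * pdf(x))   # E(X^2)
var = ex2 - mean ** 2                                     # E(X^2) - [E(X)]^2
print(f"area = {area:.4f}, E(X) = {mean:.4f}, E(X^2) = {ex2:.4f}, Var = {var:.4f}")
```

Checking that the area is 1 before computing moments is a quick way to catch a pdf with a wrong constant.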
2. Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking) .
A. Let X be the number of candies inside a 1-pound candy box. In randomly selected boxes, the
number of candies has the following distribution.
B. Find the mean and the variance of the continuous random variable X having the following distribution:
$$ f(x) = \begin{cases} 2x, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases} $$
C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
The descriptive statistics of the population, i.e., μ, σ, . . . , are termed parameters, and the
descriptive statistics of the sample, i.e., x̄, s², . . . , are termed statistics.
KEY TO CORRECTION
Activity #3
a) The average number of candies = E(X)
$$ E(X) = \sum x\, f(x) = (100)(0.03) + (105)(0.91) + (110)(0.06) = 105.15 \approx 105.2 $$
b) (The constant 2 makes the total area $\int_0^1 2x\,dx = 1$.)
Mean,
$$ E(X) = \int_0^1 x\, f(x)\,dx = \int_0^1 x\,(2x)\,dx = \left[\frac{2x^3}{3}\right]_0^1 = \frac{2}{3} $$
Variance, $\sigma^2 = E(X^2) - [E(X)]^2$, where
$$ E(X^2) = \int_0^1 x^2\,(2x)\,dx = \left[\frac{x^4}{2}\right]_0^1 = \frac{1}{2} $$
$$ \sigma^2 = \frac{1}{2} - \left(\frac{2}{3}\right)^2 = \frac{1}{18} $$
Activity #5
1) µ
2) variance of the population
3) discrete random variable and continuous random variable
4) a
A. LESSON PREVIEW/REVIEW
1) Introduction (2 mins)
Recall some of the data collection methods you have used in the past days. Identify at least 2
methods and, beside each, write your purpose for collecting the data. (Answers may vary.)
Fill in the first column of what you know to answer the questions on the second column of the table below.
B. MAIN LESSON
Survey
A survey could be done through a questionnaire or an interview.
One method of conducting research is through a survey. Data collected from a survey are primary data and
may also be quantitative data. A survey may be in the form of an oral interview or a questionnaire.
A questionnaire is a paper-and-pencil instrument that is administered to the respondents.
Of the two broad types of surveys, interviews are more personal. Questionnaires do not provide the
freedom to ask follow-up questions to explore the respondents' answers. An interview involves two
persons: the researcher as the interviewer and the respondent as the interviewee. An interview may be done
face to face, by phone, or online.
The success of a survey starts with detailed planning. Before you conduct a survey, you need to identify
the purpose of the survey, its goals and objectives, the detailed questions, and other important details
involved in using the survey method.
2) Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking). Form groups of 5.
Instruction to the group: For the given problem, follow the steps above to conduct the survey.
What are the 2 most difficult math subjects for the 2nd and 3rd year engineering students of my school?
Come up with a research problem and plan the data collection through a survey using the steps outlined
above.
C. LESSON WRAP-UP
Have you answered any surveys already? List at least 3 and determine whether each survey was
done through a questionnaire or an interview. (Answers may vary.)
Examples: 1. Students evaluating their teachers - questionnaire
2. Students giving feedback to the office staff - questionnaire
3. Teachers requiring the students to answer some questions orally after each class
discussion - interview
KEY TO CORRECTION
Activity #3
Note: answers may vary.
Step 1. What is the purpose of the survey?
This study may be conducted by a student organization that would like to know the priority math
subjects to include in its "peer-tutorial sessions", whose objective is to encourage
their peers to continue with their engineering courses.
Step 2. Decide on the target group.
The target groups in this study are the 2nd and 3rd year engineering students of my school.
Step 3. How do you reach your target group?
The researcher may visit the students during their math classes and discuss the proposed study with them.
Step 4. Break down the purpose and limit the scope.
The researcher may break down the purpose by topic, starting with the more difficult topics.
Step 5. What questions should the survey contain? Write a draft of the questions.
Sample questions to be answered by the respondents may be:
a. What are your 2 favorite math subjects?
b. Which 2 math subjects do you find difficult to pass?
c. From b, which specific topics do you find difficult to comprehend?
A. LESSON PREVIEW/REVIEW
1) Introduction (2 mins)
Before conducting a research experiment, researchers come up with a research design. Experimental
research design serves as an instruction manual on how the experiment is conducted. The design helps
the researcher stay on track and makes sure all bases are being properly covered to ensure the
experiment's validity. Designed experiments can achieve manufacturing cost savings by minimizing process
variation and reducing rework, scrap, and the need for inspection.
B. MAIN LESSON
In planning an experiment to collect research data, the following must already be defined.
The research problem and the research objectives. Formulate the research question or a problem
statement.
The responses and the factors. The variables of interest, in relation to your research problem or
objectives, should be identified. Indicate the independent and the dependent variables. Make some
predictions or hypotheses about the possible outcome (the dependent variable or the response) when the
independent variables (the factors) are manipulated. A combination of factor levels is termed a
treatment.
For example, if you designed an experiment to determine how quickly a cup of hot chocolate
cools, then the manipulated independent variable is time and the measured dependent variable is
temperature.
The experimental research design. This is the process of planning an experiment to test the researcher's
hypothesis. The relationship between two variables, the dependent and the independent variable, is
determined. Data collected in experimental research are usually quantitative.
Control Group. The group in the experimental design that is not exposed to the treatment. The difference between the
performance of the control group and that of the treatment group measures the effect of the full treatment on the
treatment group.
Before-and-after without control design: In such a design a single test group or area is selected and
the dependent variable is measured before the introduction of the treatment. The treatment is then
introduced and the dependent variable is measured again after the treatment has been introduced. The
effect of the treatment would be equal to the level of the phenomenon after the treatment minus the
level of the phenomenon before the treatment.
The main difficulty of such a design is that, with the passage of time, considerable extraneous variation
may enter into the measured treatment effect.
After-only with control design: In this design two groups or areas (test area and control area) are
selected and the treatment is introduced into the test area only. The dependent variable is then
measured in both the areas at the same time. Treatment impact is assessed by subtracting the value of
the dependent variable in the control area from its value in the test area.
The basic assumption in such a design is that the two areas are identical with respect to their
behavior towards the phenomenon considered. If this assumption is not true, there is the possibility of
extraneous variation entering into the treatment effect. However, data can be collected in such a
design without the introduction of problems with the passage of time. In this respect the design is
superior to before-and-after without control design.
Before-and-after with control design: In this design two areas are selected and the dependent
variable is measured in both the areas for an identical time-period before the treatment. The
treatment is then introduced into the test area only, and the dependent variable is measured in both
for an identical time-period after the introduction of the treatment. The treatment effect is determined
by subtracting the change in the dependent variable in the control area from the change in the
dependent variable in test area.
This design is superior to the above two designs for the simple reason that it avoids extraneous
variation resulting both from the passage of time and from non-comparability of the test and control
areas. But at times, due to lack of historical data, time, or a comparable control area, we may
prefer to select one of the first two informal designs stated above.
This document is a property of PHINMA EDUCATION
ECE 069: Engineering Data Analysis
Students’ Activity Sheet LESSON #8
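The three effect calculations described above reduce to simple differences. The Python sketch below encodes them; the function names and the sample measurements are hypothetical, chosen only to show how the three formulas differ.

```python
def before_after_no_control(test_before, test_after):
    # Effect = level after treatment - level before treatment (single test area).
    return test_after - test_before

def after_only_with_control(test_after, control_after):
    # Effect = test-area value - control-area value, both measured after treatment.
    return test_after - control_after

def before_after_with_control(test_before, test_after, control_before, control_after):
    # Effect = change in the test area - change in the control area.
    return (test_after - test_before) - (control_after - control_before)

# Hypothetical measurements of the dependent variable (e.g., weekly sales).
test_before, test_after = 120.0, 150.0
control_before, control_after = 118.0, 128.0

print(before_after_no_control(test_before, test_after))           # 30.0
print(after_only_with_control(test_after, control_after))         # 22.0
print(before_after_with_control(test_before, test_after,
                                control_before, control_after))   # 20.0
```

In this made-up case the control area also rose by 10 units over time, so the design without a control would credit that time-related change to the treatment.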
Observational Study. Here, data are collected by observing and measuring the characteristics of interest without assigning treatments to the subjects.
Simulations: This procedure uses mathematical, physical, or computer models to replicate a real-life
process or situation. It is frequently used when the actual situation is too expensive, dangerous, or
impractical to reproduce in real life. This method is commonly used in engineering and operations
research for learning purposes and sometimes as a tool to estimate possible outcomes of real
research.
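As a small illustration of a computer simulation, the Python sketch below replicates the experiment of tossing two fair coins many times and estimates the probability of getting exactly one head, which should come out near the theoretical value 1/2. The number of repetitions and the random seed are arbitrary choices.

```python
import random

def simulate_one_head(trials, seed=None):
    """Estimate P(exactly one head in two fair coin tosses) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(2))  # toss two coins
        if heads == 1:
            hits += 1
    return hits / trials

estimate = simulate_one_head(100_000, seed=2024)
print(f"Simulated P(X = 1) = {estimate:.3f} (theory: 0.500)")
```

Increasing the number of trials makes the estimate settle closer to the theoretical probability.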
2. Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking) .
Read the abstract of a research paper and answer the following questions.
1. What is the research problem of the paper being presented?
2. What are the dependent (response) variable and the independent variables (factors)?
3. What is the control group and the experimental or the treatment group?
Abstract— Since the ancient times, many researches and advancements were carried to enhance the physical and
mechanical properties of concrete. Fiber reinforced concrete is one among those advancements which offers a
convenient, practical and economical method for overcoming micro cracks and similar type of deficiencies. Since
concrete is weak in tension hence some measures must be adopted to overcome this deficiency. Human hair is generally
strong in tension; hence it can be used as a fiber reinforcement material. Human hair Fiber is an alternative non-
degradable matter available in abundance and at cheap cost. It also reduces environmental problems. Also addition of
human hair fibers enhances the binding properties, micro cracking control, Imparts ductility and also increases swelling
resistance. The experimental findings in our studies would encourage future research in the direction for long term
performance to extending this cost of effective type of fibers for use in structural applications. Experiments were
conducted on concrete cubes, cylinders and beams of standard sizes with addition of various percentages of human hair
fiber i.e., 0%, 0.5%, 1% and 1.5% by weight of cement, fine & coarse aggregate and results were compared with those of
plain cement concrete of M-20 grade. For each percentage of human hair added in concrete, four cubes, three cylinders
and three beams were tested for their respective mechanical properties at curing periods of 3 , 7 and 28 days. Optimum
hair fiber content was obtained as 1.5% by weight of cement.
C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
Think of a possible research problem in which you will conduct an experiment to collect the data. Share
it with your classmates and keep this research problem for our next session.
KEY TO CORRECTION
Activity #3
1) Answer. The use of human hair as a fiber reinforcement material in concrete.
2) Answer. The mechanical properties of the concrete are the dependent variable or the response in the experiment. The
percentage of human hair added (0%, 0.5%, 1% and 1.5% by weight of cement) is the independent variable or factor;
the curing period (3, 7 and 28 days) is a second factor.
3) Answer. The plain cement concrete of M-20 grade (0% human hair) is the control group. The concrete specimens
with 0.5%, 1% and 1.5% human hair by weight of cement are the experimental or treatment groups.
Activity #5
a) factors
b) research problem
c) Quantitative data
d) experimental design
Productivity Tip: “Tomorrow becomes never. No matter how small the task, take the first step now!” – T. Ferriss
A. LESSON PREVIEW/REVIEW
1) Introduction (3 mins)
Let us perform a binomial experiment. To do this, toss a one-peso coin 20 times and tally whether a head
or a tail appears on each toss. Fill in the table below with the results of your
statistical experiment.
Fill in the first column of what you know to answer the questions on the second column of the table below.
B. MAIN LESSON
Statistical experiments are experiments that have three things in common: the experiment has more
than one possible outcome, each possible outcome can be specified ahead of time, and each outcome
depends on chance. An example is flipping a coin: there are two possible outcomes (more than one
outcome), the outcomes can be specified in advance (either a head or a tail), and the outcome is uncertain
(it depends on chance).
Probability Distribution. A probability distribution is a table or an equation that links each
outcome of a statistical experiment with its probability of occurrence. In some cases, the probability
distribution is represented as a graph. The outcomes of the experiment are represented by a random
variable.
For example, consider flipping a coin two times. An outcome of the experiment might be the number of heads
that we see in the two flips. If we let X be the number of heads that come up, then X is
termed the random variable, which could take the value X = 1 (of the two flips, only
one head appears, so a tail appears on the other), X = 2 (two heads appear, so no tails
appear), or X = 0 (no head appears, so two tails appear).
Let the outcome be the number of heads that you see in flipping two coins, represented by the random
variable X. Note that the possible outcomes of this experiment are {HH, HT, TH, TT}. Below is the
probability distribution of the above statistical experiment.
Probability Distribution of tossing a coin 2 times
The binomial distribution is the distribution of the number of successes in a series of n independent and identically distributed
Bernoulli trials. A Bernoulli trial is a random experiment with only two possible outcomes: success or failure. For
example, flipping a coin is a Bernoulli trial: each trial can take only one of two values
(heads or tails), each trial has the same probability of success (the probability of flipping a head is 0.5), and the
result of one trial does not influence the result of another. The Bernoulli distribution is a special case of the
binomial distribution where the number of trials n = 1. So, repeatedly flipping a coin is considered a
binomial experiment.
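The probability distribution of a random variable X that has a binomial distribution can be represented as
$$ P(X = x) = {}_{n}C_{x}\, p^{x} (1 - p)^{n - x}, \quad x = 0, 1, \ldots, n $$
where n = the number of trials, x = the number of successes, and p = the probability of success on each trial.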
The mean is also termed the expected value or the average of the outcomes: Mean = np, where n =
the total number of trials and p = the probability of success. For example, the expected number of heads in
100 tosses of a fair coin is 100 × 0.5 = 50.
The median is the middle value of the outcomes sorted in increasing or decreasing order. There is no
single formula for the median of a binomial distribution; however, several special results have been
established. For example, if np is an integer, then the mean, median, and mode coincide and equal np.
This document is a property of PHINMA EDUCATION
ECE 069: Engineering Data Analysis
Students’ Activity Sheet LESSON # 9
Standard deviation is a measure of the dispersion of a data set about its mean. Dispersion helps you
interpret the variability of the data, i.e., how homogeneous or heterogeneous the data are. In simple
terms, it shows how closely the data cluster around the mean: the greater the standard deviation, the more the
values deviate from the mean. For a binomial experiment, the standard deviation σ is
$$ \sigma = \sqrt{n p (1 - p)} $$
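The binomial formulas can be evaluated directly with Python's standard library (`math.comb` requires Python 3.8 or later). The sketch below is an illustration, not part of the required activity; it evaluates the coin-toss probabilities for n = 5, p = 0.5 and the spare-part mean and standard deviation for n = 5, p = 0.9.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p**x * (1 - p)**(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p))."""
    return n * p, math.sqrt(n * p * (1 - p))

# Five tosses of a fair coin.
p_two = binomial_pmf(2, 5, 0.5)                                  # exactly 2 heads
p_at_most_two = sum(binomial_pmf(x, 5, 0.5) for x in range(3))   # 0, 1 or 2 heads
print(p_two, p_at_most_two)   # 0.3125 0.5

# Five inspections with a 90% pass rate.
mu, sigma = binomial_mean_sd(5, 0.9)
print(f"mean = {mu:.1f}, sd = {sigma:.3f}")   # mean = 4.5, sd = 0.671
```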
2. Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking) .
B. In the automobile spare-part production of your company, 90% of parts pass final inspection (and 10% fail and
need to be fixed). What are the mean and the standard deviation of the number of parts that will pass in the next 5
inspections?
a. From your binomial experiment game (page 1, introduction) and from the data of 5 of your classmates, fill
in the table below:
i) ii)
c. In tossing a die, what is the probability that a non-zero number will appear?
C. LESSON WRAP-UP
KEY TO CORRECTION
Activity #3
A.
a) From the formula;
b) Here, n = 5, x = 2, and p = 0.5.
Using the equation above,
$$ P(\text{exactly 2 heads}) = {}_{5}C_{2}\,(0.5)^{2}(1 - 0.5)^{5-2} = 0.3125 $$
c) Here, n = 5, x = 0, and p = 0.5.
Using the equation above,
$$ P(\text{no head}) = {}_{5}C_{0}\,(0.5)^{0}(1 - 0.5)^{5-0} = 0.03125 $$
d) P(at most 2 heads, i.e., 2 or fewer) = P(no head) + P(exactly 1 head) + P(exactly 2 heads)
P(no head) = 0.03125
P(exactly 2 heads) = 0.3125
$$ P(\text{exactly 1 head}) = {}_{5}C_{1}\,(0.5)^{1}(1 - 0.5)^{5-1} = 0.15625 $$
P(at most 2 heads) = 0.03125 + 0.15625 + 0.3125 = 0.5
B. The mean, or "expected value", is μ = np = 5 × 0.9 = 4.5. This means we expect 4.5 parts to pass out of
the 5 inspections.
The standard deviation is $\sigma = \sqrt{np(1-p)} = \sqrt{5 \times 0.9 \times (1 - 0.9)} \approx 0.671$. This means the number of
parts that pass in a set of 5 inspections typically differs from the mean of 4.5 by about 0.671 parts.
Activity #5
a) answers may vary
b) i. P ( X = 2 ) = 3/ 8
ii. P ( X = 2 ) = 0.1
c) P (non-zero) = 6/6 = 1
Productivity Tip:
“In life, people tend to wait for good things to come to them, and by waiting, they miss out.” – Neil Strauss
A. LESSON PREVIEW/REVIEW
1) Introduction (2 mins)
The Poisson distribution is a discrete distribution. It is named after Simeon-Denis Poisson (1781-1840), a
French mathematician who published its essentials in a paper in 1837.
The Poisson distribution is a limiting case of the binomial distribution: as the number of trials n approaches
infinity and the probability of success p approaches zero, with np = λ held constant, the binomial distribution
approaches the Poisson distribution. The Poisson distribution is an important probability distribution that
models rare events. It is asymmetric, meaning it is always skewed toward the right.
B. MAIN LESSON
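The Poisson distribution gives the probability of a number of events in an interval generated by a Poisson
process:
$$ P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots $$
The distribution is defined by the rate parameter λ, which is the expected number of events in the interval.
The most probable number of events is the integer part of λ (when λ is a whole number, λ and λ − 1 are equally likely).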
Applications of the Poisson distribution can be found in many fields including:
Asymptotic Poisson model of seismic risk for large earthquakes.
Number of decays in a given time interval in a radioactive sample.
The number of photons emitted in a single laser pulse.
The number of yeast cells used when brewing Guinness beer. This example was used by William Sealy
Gosset (1876–1937).
The number of phone calls arriving at a call centre within a minute. This example was described by A.K.
Erlang (1878–1929).
The number of failures of a machine in one month.
Example. A particular river overflows once every 25 years on average. Find the probability that there are
x = 2 overflows in a 25-year interval.
Here, λ = 1 and x = 2; hence,
$$ P(X = 2) = \frac{\lambda^{x} e^{-\lambda}}{x!} = \frac{1^{2} e^{-1}}{2!} \approx 0.1839 $$
This document is a property of PHINMA EDUCATION
ECE 069: Engineering Data Analysis
Students’ Activity Sheet LESSON #10
Example. Some vehicles pass through a junction on a busy road at an average rate of 300 per hour.
a. What is the average number of vehicles passing per minute?
b. What is the probability that no vehicle will pass in a given minute?
c. What is the expected number of vehicles passing in three minutes?
Solution.
a. The average number of vehicles passing per minute is λ = 300/60 = 5.
b. Using the formula $P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$,
$$ P(X = 0) = \frac{5^{0} e^{-5}}{0!} \approx 0.00674 $$
c. The expected number of vehicles passing in three minutes = 3λ = 3 × 5 = 15.
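Both Poisson examples above can be checked with a small probability function. The Python sketch below relies only on the formula P(X = x) = λ^x e^(−λ) / x!, and it also shows that λ scales with the length of the interval (5 per minute becomes 15 per three minutes).

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = lam**x * exp(-lam) / x! for a Poisson random variable."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

# River: one overflow per 25 years on average, so lam = 1 for a 25-year interval.
p_two_overflows = poisson_pmf(2, 1.0)

# Vehicles: 300 per hour -> lam = 300 / 60 = 5 per minute.
lam_per_minute = 300 / 60
p_none_in_minute = poisson_pmf(0, lam_per_minute)
expected_in_three_minutes = 3 * lam_per_minute

print(f"{p_two_overflows:.4f}")    # 0.1839
print(f"{p_none_in_minute:.5f}")   # 0.00674
print(expected_in_three_minutes)   # 15.0
```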
Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking)
1. Twenty cars were examined for defective surface coating. The frequency of the number of defective
surface coatings per car was as follows:
If a car is chosen at random, what is the probability that it has 3 or more defective surface coatings?
2. If electric power failures occur according to a Poisson distribution with an average of 5 failures every
20 weeks, calculate the probability that there will be no more than one failure during a particular week.
A company makes electric motors. The number of defective motors follows a Poisson distribution. The probability
that an electric motor is defective is 0.01. What is the probability that a sample of 100 electric motors will
contain exactly 3 defective motors?
C. LESSON WRAP-UP
KEY TO CORRECTION
Activity #3
1)
Total number of defective surface coatings = 0 + 3 + 10 + 6 + 16 + 5 + 6 = 46; hence, λ = 46/20 = 2.3.
You may use the complement rule of probability. Here,
P(3 or more defective surface coatings) = 1 − P(fewer than 3 defective surface coatings)
$$ P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2) $$
Using the formula $P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$,
$$ P(X < 3) = \frac{2.3^{0} e^{-2.3}}{0!} + \frac{2.3^{1} e^{-2.3}}{1!} + \frac{2.3^{2} e^{-2.3}}{2!} \approx 0.59604 $$
Hence, P(X ≥ 3) = 1 − 0.59604 = 0.40396.
You may also estimate the probability using the histogram, a graph of $P(X = x) = \frac{2.3^{x} e^{-2.3}}{x!}$.
2)
Here, the average number of power failures per week is λ = 5/20 = 0.25.
Using the formula $P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$,
$$ P(X \le 1) = P(X = 0) + P(X = 1) = e^{-0.25}(1 + 0.25) \approx 0.9735 $$
Activity #5
Average number of defective motors in 100 motors, λ = 0.01 × 100 = 1; and x = 3, then
$$ P(X = 3) = \frac{\lambda^{x} e^{-\lambda}}{x!} = \frac{1^{3} e^{-1}}{3!} \approx 0.0613 $$
Or you may estimate the probability from the histogram (a plot of P(X = x) vs. x) shown below.
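Because the motor problem is really a binomial situation (n = 100 motors, p = 0.01), it is a natural place to see how good the Poisson approximation is. The Python sketch below compares the exact binomial probability with the Poisson value using λ = np, and reuses the Poisson helper for the complement calculation of key item 1 (λ = 2.3). The helper names are our own.

```python
import math

def poisson_pmf(x, lam):
    return lam ** x * math.exp(-lam) / math.factorial(x)

def binomial_pmf(x, n, p):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 100, 0.01
exact = binomial_pmf(3, n, p)        # exact binomial probability
approx = poisson_pmf(3, n * p)       # Poisson approximation with lam = np = 1
print(f"binomial: {exact:.4f}, Poisson: {approx:.4f}")

# Complement rule with lam = 2.3: P(X >= 3) = 1 - [P(0) + P(1) + P(2)]
p_three_or_more = 1 - sum(poisson_pmf(x, 2.3) for x in range(3))
print(f"P(X >= 3) = {p_three_or_more:.5f}")
```

The two motor probabilities agree to about two decimal places (both roughly 0.061), which is why the Poisson distribution is a convenient stand-in when n is large and p is small.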
Productivity Tip: “Hard work keeps the wrinkles out of the mind and spirit.”– Helena Rubinstein
A. LESSON PREVIEW/REVIEW
1) Introduction (2 mins)
The normal distribution is the most commonly used probability distribution. It is also known as the
Gaussian distribution. A random variable that follows a normal distribution is said to be normally
distributed. If we know a random variable is normally distributed, then we can use the known properties
of the normal distribution to calculate the probability that the variable takes certain values. Random variables
representing height and intelligence are approximately normally distributed.
B. MAIN LESSON
A standard normal distribution is a normal distribution with mean μ = 0 and standard deviation σ = 1. Substituting these
values into the above equation gives the pdf of the standard normal distribution:
$$ f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2} $$
The number of standard deviations from the mean is called the standard score or z-score. A
positive z-score indicates that the raw score is higher than the mean. For example, if a z-score is
equal to +1, the raw score is 1 standard deviation above the mean. A negative z-score reveals that the raw score is below
the mean. For example, if a z-score is equal to −2, the raw score is 2 standard deviations below the mean.
$$ z = \frac{x - \mu}{\sigma}, \qquad f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2} $$
where x is the raw score, μ is the mean, and σ is the standard deviation.
Example 1.
The heights of adult males are normally distributed with a mean of 1.7 meters and a standard deviation
of 0.20 meter. What are the corresponding standard scores if the heights of these adults are x1 = 1.4 meters and
x2 = 1.6 meters?
Solution: For x1 = 1.4 meters, the corresponding z-score is
$$ z = \frac{x - \mu}{\sigma} = \frac{1.4 - 1.7}{0.2} = -1.5 $$
For x2 = 1.6 meters, $z = \frac{1.6 - 1.7}{0.2} = -0.5$.
The Standard Normal Probability Distribution Curve ( mean , µ = 0 and standard deviation, σ = 1.0 ).
Note that the total area under the probability distribution curve is equal to 1.
Solution: Referring to the Areas Under the Normal Curve ( Statistical Table).
a) The area between z = −1.5 and z = −0.5 is 0.24173. This is also the probability that adult males
have heights between x = 1.4 meters and x = 1.6 meters. Mathematically, P(1.4 ≤ x ≤ 1.6) = P(−1.5 ≤ z ≤ −0.5) = 0.24173.
b) The number of adult males having a height between x = 1.4 meters and 1.6 meters is about 97 (400 × 0.24173).
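The table look-up in Example 1 can be reproduced with `statistics.NormalDist` from Python's standard library (Python 3.8+), which provides the standard normal CDF. The sketch assumes, as in part b, a group of 400 adult males.

```python
from statistics import NormalDist

mu, sigma = 1.7, 0.20           # mean and standard deviation of heights (m)
z1 = (1.4 - mu) / sigma         # standard score of 1.4 m
z2 = (1.6 - mu) / sigma         # standard score of 1.6 m

std_normal = NormalDist()       # mean 0, standard deviation 1
area = std_normal.cdf(z2) - std_normal.cdf(z1)   # P(z1 <= Z <= z2)

print(f"z1 = {z1:.2f}, z2 = {z2:.2f}, area = {area:.5f}")
print(round(400 * area))        # expected number of men out of 400
```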
The normal curve below shows the probability distribution of z ( in percentage ) given its standard deviation.
Source : https://www.mathsisfun.com/data/standard-normal-distribution.html
Example 2.
A machine produces electrical components. 99.7% of the components have lengths between 1.176 cm
and 1.224 cm. Assuming the data are normally distributed, what are the mean and the standard deviation?
Solution:
At 99.7%, z = ±2.97.
Since $z = \frac{x - \mu}{\sigma}$, then $x = \mu + z\sigma$:
$$ 1.176 = \mu - 2.97\sigma \quad \text{(eqn. 1)} $$
$$ 1.224 = \mu + 2.97\sigma \quad \text{(eqn. 2)} $$
From eqn. 1 and eqn. 2, solve for the mean μ and the standard deviation σ.
These equations give the following solution: μ = 1.20 cm; σ ≈ 0.008 cm.
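Example 2 can also be solved numerically. The Python sketch below uses `NormalDist().inv_cdf` to find the z-value that leaves 0.15% in each tail (central area 99.7%), then solves eqn. 1 and eqn. 2: the mean is the midpoint of the limits and σ is the half-width divided by z. The same z-value gives the acceptance limits for Activity 3 b (μ = 3.1 m, σ = 0.005 m).

```python
from statistics import NormalDist

central = 0.997
z = NormalDist().inv_cdf(0.5 + central / 2)   # about 2.97

lower, upper = 1.176, 1.224                   # 99.7% of lengths lie here (cm)
mu = (lower + upper) / 2                      # adding eqn. 1 and eqn. 2
sigma = (upper - lower) / (2 * z)             # subtracting eqn. 1 from eqn. 2
print(f"z = {z:.2f}, mean = {mu:.3f} cm, sd = {sigma:.4f} cm")

# Activity 3 b: limits = mu -/+ z * sigma
a, b = 3.1 - z * 0.005, 3.1 + z * 0.005
print(f"limits: {a:.4f} m to {b:.4f} m")
```

Using the unrounded z (about 2.968) instead of 2.97 changes σ only in the fourth decimal place.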
2) Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking).
b. A company makes parts for a machine. The lengths of the parts must be within certain limits or they will
be rejected. A large number of parts were measured, and the mean and standard deviation were
calculated as 3.1 m and 0.005 m, respectively. Assuming the data are normally distributed and 99.7%
were accepted, what are the limits?
C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
FAQs
Many events are normally distributed, or very close to it. For a large sample size N, the distribution of
the sample mean of non-normal random variables approaches a normal distribution (the central limit theorem).
KEY TO CORRECTION
Activity #3
a)
Given: μ = 255 grams; σ = 2.5 grams; x = 250 grams
Solution:
Solving for the standard score, $z = \frac{x - \mu}{\sigma} = \frac{250 - 255}{2.5} = -2.0$
From the table at z = −2.0, area = 0.02275. Mathematically, P(x < 250) = 0.02275.
Thus, the percentage of coffee that is underweight is 2.275% (0.02275 × 100).
b)
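Given: μ = 3.1 meters; σ = 0.005 meter; P(a ≤ x ≤ b) = 0.997
Solution:
At 99.7% probability of acceptance, z = ±2.97. Since $z = \frac{x - \mu}{\sigma}$, then $x = \mu + z\sigma$ (eqn. 3):
$$ a = 3.1 - 2.97(0.005) = 3.08515 \text{ m}, \qquad b = 3.1 + 2.97(0.005) = 3.11485 \text{ m} $$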
This document is a property of PHINMA EDUCATION
ECE 069: Engineering Data Analysis
Students’ Activity Sheet LESSON #11
Activity #5
Given: μ = 43 years; σ = 14 years; x1 = 22 years and x2 = 57 years
Solution:
At x = 22, $z = \frac{22 - 43}{14} = -1.5$
From the table, at z = −1.5, the area under the standard normal curve (to the left of z) is 0.06681.
At x = 57, $z = \frac{57 - 43}{14} = 1.0$
From the table, at z = 1.0, the area under the standard normal curve (to the left of z) is 0.84134.
Hence, the area between x = 22 and x = 57 is 0.77453 (0.84134 − 0.06681). This area also
represents the probability that x is between 22 and 57. Thus, of the 5000, about 3873 (5000 × 0.77453)
are between 22 and 57 years old.
1) Introduction (2 mins)
Some questions concern the time you need to wait before a given event occurs. If this waiting time is
unknown, it is often appropriate to think of it as a random variable having an exponential distribution.
Further, the waiting time has an exponential distribution if the probability that the event occurs during a
short time interval is proportional to the length of that interval. For example, you may ask: how long will a
piece of machinery work without breaking down?
B. MAIN LESSON
The exponential distribution is the probability distribution of the time between two successive events
that occur independently and continuously at a constant average rate. An exponential random variable
has more small values and fewer large values.
The assumption of a constant rate is rarely satisfied in real-world scenarios; however, if the time
interval is selected so that the rate is roughly constant, then you can approximate the random
variable as following an exponential distribution.
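$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$
where: X is a non-negative continuous random variable
λ = the rate parameter (a constant)
The probability that X is at most x (the cumulative distribution function) is
$P(X \le x) = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}$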
The mean, median, and variance of the exponential random variable X:
The mean or expected value of X is $E(X) = \dfrac{1}{\lambda}$; hence, λ = 1/mean,
the median is $\text{median}(X) = \dfrac{\ln 2}{\lambda}$, and
the variance of X is $\text{var}(X) = \dfrac{1}{\lambda^2}$.
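The formulas above can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the rate λ = 0.5 is an arbitrary value chosen for the check. It integrates the density with a simple midpoint rule and confirms that the result matches the closed-form $1 - e^{-\lambda x}$. It also confirms that the CDF equals 1/2 at ln(2)/λ.

```python
import math

def exp_pdf(x: float, lam: float) -> float:
    """f(x) = lam * e^(-lam x) for x >= 0, and 0 for x < 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x: float, lam: float) -> float:
    """P(X <= x) = 1 - e^(-lam x) for x >= 0, and 0 for x < 0."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

def exp_summary(lam: float) -> tuple:
    """Return (mean, median, variance) = (1/lam, ln 2 / lam, 1/lam^2)."""
    return 1 / lam, math.log(2) / lam, 1 / lam**2

def integrate_pdf(x: float, lam: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f from 0 to x."""
    h = x / steps
    return sum(exp_pdf((i + 0.5) * h, lam) for i in range(steps)) * h

lam = 0.5  # illustrative rate, not from the text
print(exp_cdf(3.0, lam), integrate_pdf(3.0, lam))   # closed form vs numeric
print(exp_summary(lam))                             # mean 2.0, median ~1.386, variance 4.0
print(exp_cdf(exp_summary(lam)[1], lam))            # CDF at the median is 0.5
```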
Sample Problem. On average, a certain computer has a lifetime of 10 years. Assume that the life of the computer is exponentially distributed.
b. What is the probability that a computer has a life of less than 7 years?
Let X be the random variable representing the life of the computer. Here, $\lambda = \dfrac{1}{10} = 0.10$.
$P(X < 7) = \int_0^7 0.10\,e^{-0.10x}\,dx = \left[-e^{-0.10x}\right]_0^7 = 1 - e^{-0.7} = 1 - 0.497 = 0.503$
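c. What is the probability that a computer has a life of more than 10 years?
Use the complement: $P(X > 10) = 1 - P\big((X > 10)^c\big)$, where $P\big((X > 10)^c\big) = P(X \le 10)$.
$P(X \le 10) = \int_0^{10} 0.10\,e^{-0.10x}\,dx = \left[-e^{-0.10x}\right]_0^{10} = 1 - e^{-1.0} = 1 - 0.368 = 0.632$
Hence, $P(X > 10) = 1 - 0.632 = 0.368$.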
d. What is the probability that a computer has a life of more than 7 years but less than 10 years?
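The three questions in this sample problem can be checked with the closed-form CDF $1 - e^{-\lambda x}$ and λ = 1/10. This minimal Python sketch assumes only that formula. Part d is not worked in the text, so its value here is a computed illustration: the area between 7 and 10 is the difference of the two CDF values.

```python
import math

lam = 1 / 10   # rate = 1 / mean lifetime (10 years)

def cdf(x: float) -> float:
    """P(X <= x) for an exponential lifetime with rate lam."""
    return 1 - math.exp(-lam * x)

p_less_7 = cdf(7)                 # part b: P(X < 7)
p_more_10 = 1 - cdf(10)           # part c: P(X > 10), via the complement
p_7_to_10 = cdf(10) - cdf(7)      # part d: P(7 < X < 10)

print(f"b. {p_less_7:.3f}  c. {p_more_10:.3f}  d. {p_7_to_10:.3f}")
```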
3) Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking)
Problem. Suppose that the lifetime (x) of a certain model of car battery follows an exponential distribution with a mean life of 5 years.
a. What is the probability distribution of the life of the car battery?
b. Plot the probability density function, f(x), versus the lifetime of the car battery (x).
c. What is the probability that the life of the battery will be greater than 2 years?
d. What is the probability that the life of the battery is greater than 2 years but less than 4 years?
e. What is var(X)?
3. A conversation follows an exponential distribution, $f(x) = \lambda e^{-\lambda x}$, with a mean time of 3 minutes.
a) Find the probability that the conversation will be more than 5 minutes.
e 5
5
1 3
5
a.
e b. e 15 c. e 3
d.
3 3
b) Find the probability that the conversation will be less than 5 minutes.
e 5
5
1
5
a.
1 e 3 b. 1 e 15 c. 1 e 3
d.
1
3 3
C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Answer. The amount of time (beginning now) until an earthquake occurs; the amount of time, in months, that a car battery lasts. The exponential distribution is widely used to study the amount of time a product lasts (the field of reliability).
KEY TO CORRECTION
Activity #3
Solution :
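a. $\lambda = \dfrac{1}{\text{mean}} = \dfrac{1}{5} = 0.20$; hence, the probability density function of the battery life is $f(x) = 0.20e^{-0.20x}$ for x ≥ 0.
b. Plot of $f(x) = 0.20e^{-0.20x}$ versus x.
e. $\text{var}(X) = \dfrac{1}{\lambda^2} = \dfrac{1}{0.20^2} = 25$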
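The key does not show parts c and d. Both follow from the survival function $P(X > x) = e^{-\lambda x}$ with λ = 0.20. This minimal Python sketch assumes that formula and also recomputes part e. It does not draw the plot for part b.

```python
import math

mean_life = 5.0          # years
lam = 1 / mean_life      # 0.20 per year

def survival(x: float) -> float:
    """P(X > x) = e^(-lam x) for an exponential lifetime."""
    return math.exp(-lam * x)

p_c = survival(2)                  # c. P(X > 2)
p_d = survival(2) - survival(4)    # d. P(2 < X < 4)
var = 1 / lam**2                   # e. var(X) = 1 / lambda^2

print(f"c. {p_c:.4f}  d. {p_d:.4f}  e. {var:.0f}")
```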
Activity #5
1) a
2) b
3) a) c
b) c
1) Introduction (2 mins)
When a parameter is being estimated, the estimate can be either a single number or it can be a range of
scores. When the estimate is a single number, the estimate is called a "point estimate"; when the estimate
is a range of scores, the estimate is called an interval estimate. Confidence intervals are used for interval
estimates.
Fill in the first column of what you know to answer the questions on the second column of the table below.
B. MAIN LESSON
Estimation
Refers to the process by which one makes inferences about a population based on information
obtained from the sample.
Statistic
Refers to any measurable quantity calculated from the sample. A statistic could be the sample mean, $\bar{x}$; the sample standard deviation, s; the sample variance, s²; . . .
Parameter
Refers to the descriptive measures of the population, for example the population mean, µ; the population standard deviation, σ; the population variance, σ²; . . .
Estimator
A quantity calculated from the sample data that is used to give information about an unknown quantity in the population. For example, the sample mean, $\bar{x}$, is an estimator of the population mean, µ.
Estimate
It is the particular value of an estimator that is obtained from a particular sample of data and used to estimate the value of the parameter. In the preceding example, if the sample mean is $\bar{x} = 3.5$, then we may say that 3.5 is the estimate of the parameter, the population mean, µ.
Sampling Distribution
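The distribution of a point estimator (statistic) is called its sampling distribution.
Let $X_1, X_2, \ldots, X_n$ be independent random variables, each normally distributed with mean µ and variance σ².
The sample mean: $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$
The mean of the sampling distribution: $\mu_{\bar{X}} = \dfrac{\mu + \mu + \cdots + \mu}{n} = \dfrac{n\mu}{n} = \mu$
The variance of the sampling distribution: $\sigma^2_{\bar{X}} = \dfrac{\sigma^2 + \sigma^2 + \cdots + \sigma^2}{n^2} = \dfrac{n\sigma^2}{n^2} = \dfrac{\sigma^2}{n}$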
Example. An electronics company manufactures resistors that have a mean resistance of 120 ohms and a standard deviation of 12 ohms. If the distribution of resistance is normal, find the mean, the variance, and the standard deviation of the sampling distribution for n = 25 resistors.
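The example gives no worked solution. Applying the sampling-distribution formulas, the mean is µ, the variance is σ²/n, and the standard deviation is σ/√n. This short Python sketch simply evaluates those formulas for the resistor data.

```python
import math

mu, sigma, n = 120.0, 12.0, 25     # ohms, ohms, resistors per sample

mean_xbar = mu                     # mean of the sampling distribution = mu
var_xbar = sigma**2 / n            # variance = sigma^2 / n
sd_xbar = math.sqrt(var_xbar)      # standard deviation = sigma / sqrt(n)

print(f"mean = {mean_xbar:.1f} ohms, variance = {var_xbar:.2f} ohms^2, SD = {sd_xbar:.2f} ohms")
```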
The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger, even when the shape of the population distribution is unknown. A common rule of thumb is that the approximation is good for sample sizes over 30.
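A small simulation can make the Central Limit Theorem concrete. The sketch below assumes a hypothetical, strongly skewed exponential population with mean 10 and standard deviation 10; these values are not from the text. It draws 5000 samples of size 40 and keeps each sample mean. It then checks that the means center on the population mean, that their spread is about σ/√n, and that about 95% of them fall within ±1.96 standard errors, as a normal curve predicts.

```python
import random
import statistics
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical skewed population: exponential with mean 10 (its SD is also 10).
pop_mean = pop_sd = 10.0
n, reps = 40, 5_000

# Draw `reps` samples of size n and keep each sample's mean.
sample_means = [
    statistics.fmean(random.expovariate(1 / pop_mean) for _ in range(n))
    for _ in range(reps)
]

se = pop_sd / n ** 0.5                    # theoretical SD of the sample mean
z = NormalDist().inv_cdf(0.975)           # about 1.96
share = sum(abs(m - pop_mean) <= z * se for m in sample_means) / reps

print(f"mean of sample means = {statistics.fmean(sample_means):.2f} (population mean {pop_mean})")
print(f"SD of sample means   = {statistics.stdev(sample_means):.2f} (sigma/sqrt(n) = {se:.2f})")
print(f"share within ±1.96 SE = {share:.3f} (a normal curve gives 0.950)")
```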
A good estimator has a small bias, where the bias of an estimator $\hat{\theta}$ is $E(\hat{\theta}) - \theta$. When the bias is zero, the point estimator is said to be unbiased.
2. Consistency
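Consistency describes how close the point estimator gets to the value of the parameter as the sample size increases. An estimator $\hat{\theta}$ is consistent if, as n → ∞,
$E(\hat{\theta}) \to \theta$ and $\text{Var}(\hat{\theta}) \to 0$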
3. Relative Efficiency
The absolute efficiency of an estimator is the ratio of the minimum possible variance to the estimator's actual variance.
An unbiased estimator is called efficient if its variance equals the minimum variance for all values of the population parameter.
If two competing estimators are both unbiased, the one with the smaller variance (for a given sample size) is said to be relatively more efficient. An estimator $\hat{\theta}_1$ is said to be more efficient than another estimator $\hat{\theta}_2$ for θ if $\text{Var}(\hat{\theta}_1) < \text{Var}(\hat{\theta}_2)$.
Standard Error
Standard error is a measure of the precision of a statistic. It is equal to the standard deviation of the sampling distribution of that statistic.
The standard error tells you how accurate the mean of any given sample from that population is likely
to be compared to the true population mean. When the standard error increases, i.e. the
means are more spread out, it becomes more likely that any given mean is an inaccurate
representation of the true population mean.
Example. In a certain property investment company with an international presence, workers have a
mean hourly wage of P 125 with a population standard deviation of P 5. Given a sample size of 30,
estimate and interpret the SE of the sample mean.
$SE = \dfrac{\sigma}{\sqrt{n}} = \dfrac{5}{\sqrt{30}} \approx 0.91$
Interpretation. If we draw many samples of size 30 from the population, their mean hourly wages will center on P 125, and a typical sample mean will differ from P 125 by about P 0.91.
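The interpretation can be checked by simulation. This sketch computes σ/√n and then draws 20,000 samples of 30 wages each. It assumes the wages are normally distributed, which the example does not state. The standard deviation of the resulting sample means should come out close to the standard error.

```python
import math
import random
import statistics

sigma, n = 5.0, 30
se = sigma / math.sqrt(n)              # standard error of the mean
print(f"SE = {se:.2f}")

# Simulation check (assumes normally distributed wages with mean 125)
random.seed(7)
means = [statistics.fmean(random.gauss(125, sigma) for _ in range(n))
         for _ in range(20_000)]
print(f"mean of sample means = {statistics.fmean(means):.2f}")
print(f"SD of sample means   = {statistics.stdev(means):.2f}")
```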
2) Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking).
Multiple Choice. Encircle the best answer.
1. A sampling distribution is the probability distribution of which of the following? . . .
a. sample b. sample statistic c. population d. population parameter
2. What is the best description of a point estimate?
a. any value from the sample to estimate a parameter.
b. a sample statistic used to estimate a parameter.
c. the margin of error to estimate the parameter.
d. the population mean.
4. The difference between the expected value of the estimator and the true value of the parameter is the . . .
a. bias b. error c. contradiction d. difference
5. A random sample of 100 engineering students is asked how much they spend per meal on weekdays. The average spent is found to be P70. What is the point estimate of the population mean?
a. P 100 b. P 90 c. P80 d. P 70
7. In an application to estimate the mean number of kilometers students commute to school each day,
the following are given: n = 20; $\bar{x} = 4.33$; s = 3.50
The point estimate for the true population mean is:
a. 1.638 b. 4.33 + 1.638 c. 4.33 d. 3.50
9. A random sample of 340 people in Carmen showed that 66 listened to an FM radio Station A. Based
on this sample information, what is the point estimate for the proportion of people in Carmen that listen
to Station A?
a. 340 b. 0.194 c. 66 d. 0.66
This document is a property of PHINMA EDUCATION
ECE 069: Engineering Data Analysis
Students’ Activity Sheet LESSON #13
Given this information, what is the point estimate for the population mean?
C. LESSON WRAP-UP
The point estimate of a population parameter is obtained from the sample. The smaller the bias of the point estimator and the larger the sample size, the closer its mean value is to the parameter being estimated.
KEY TO CORRECTION
Activity #3
1) b
2) b
3) a
4) a
5) d
6) c
7) c
8) a
9) b
10) a
Activity #5
1) 243
1) Introduction (2 mins)
When a parameter is being estimated, the estimate can be either a single number or it can be a range of
scores. When the estimate is a single number, the estimate is called a "point estimate"; when the estimate
is a range of scores, the estimate is called an interval estimate.
B. MAIN LESSON
Confidence Intervals
Prediction Intervals
Tolerance Intervals
Confidence Interval
A confidence interval is a range of values that is likely to contain the population mean.
Confidence intervals are the best-known and most often used statistical intervals. They express the uncertainty associated with estimating a population parameter. The confidence level describes how the procedure performs when repeated: if you drew many samples and computed an interval from each, the confidence level is the percentage of those intervals that would contain the true parameter. Confidence levels are expressed as percentages (for example, 90%, 95%, or 99%).
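Standard Error = $\dfrac{\sigma}{\sqrt{n}}$ (or $\dfrac{s}{\sqrt{n}}$ when σ is estimated by the sample standard deviation s)
When the population standard deviation σ is known:
Lower boundary of the confidence interval: $\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
Upper boundary of the confidence interval: $\bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
When σ is unknown and s is used instead (t distribution with n − 1 degrees of freedom):
Lower boundary of the confidence interval: $\bar{x} - t_{\alpha/2}\dfrac{s}{\sqrt{n}}$
Upper boundary of the confidence interval: $\bar{x} + t_{\alpha/2}\dfrac{s}{\sqrt{n}}$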
Example 1. We have a sample of 20 observations from a normal distribution with a standard deviation of 0.20 and a sample mean of 4.5. We want a 95% confidence level. What are the lower and upper boundaries of the confidence interval?
From the z-score table, at a 95% confidence level, α = 0.05, so α/2 = 0.025 and the corresponding z value is 1.96.
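Example 1 stops at the z value. The sketch below finishes the arithmetic with the known-σ formula $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. It uses Python's `statistics.NormalDist` in place of the printed z table, so z comes out as 1.95996 rather than the rounded 1.96.

```python
import math
from statistics import NormalDist

n, sigma, xbar = 20, 0.20, 4.5
confidence = 0.95

alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}, about 1.96
margin = z * sigma / math.sqrt(n)            # z_{alpha/2} * sigma / sqrt(n)

lower, upper = xbar - margin, xbar + margin
print(f"z = {z:.2f}; 95% CI = ({lower:.3f}, {upper:.3f})")
```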
Example 2. The sample mean result is 25%. Calculate a confidence interval for this estimate if the margin of error is 3.2%.
Lower boundary of the confidence interval:
$\bar{x} - MOE = 25\% - 3.2\% = 21.8\%$
Upper boundary of the confidence interval:
$\bar{x} + MOE = 25\% + 3.2\% = 28.2\%$
Therefore, the confidence interval is (21.8%, 28.2%).
Prediction Intervals
A prediction interval is an estimate of an interval in which a future observation will fall, with a certain probability, given what has already been observed. For example, with a 95% prediction interval of [10, 15], you are 95% confident that the next new observation will fall within this range.
Tolerance Intervals
A tolerance interval covers a specified proportion of the population at a given confidence level.
For example, with 95% confidence, at least 85% of the batteries will last between 100 and 120 hours.
2. Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking) .
Problem Solving. The QC manager of a light bulb factory needs to estimate the average lifetime of a large
shipment of bulbs made at the factory. The lifetime of these light bulbs is normally distributed with a
standard deviation of 100 hours. A random sample of 64 bulbs from the shipment results in a sample
mean lifetime of 350 hours.
Given: σ = 100 hrs.; $\bar{x}$ = 350 hrs.; n = 64
(a) Find a 95% confidence interval for the mean lifetime ( µ ) for the entire shipment.
(b) Suppose that the standard deviation was 80 rather than 100 hours. Recalculate your confidence
interval from Part (a). Is it narrower or wider than your solution to (a)?
You may now answer the third column of table in activity 1 based on what you know now.
What I Know QUESTION What I Learned
Problem Solving. A bottling company fills thousands of 12 oz bottles with soda to the same level. A random sample of bottles was taken from the processing line, containing the following amounts of soda (in oz): 11.8; 12.1; 11.2; 12.0; 11.8; 11.7; 11.9. Assuming the content is normally distributed with a standard deviation of 0.01 oz, find the 95% confidence interval for the mean amount of soda in the bottles.
C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Increasing the sample size decreases the width of confidence intervals, because it decreases the
standard error.
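The statement "the 95% confidence interval for the population mean is (250, 300)" is not equivalent to the statement "there is a 95% probability that the population mean is between 250 and 300." The 95% refers to the long-run proportion of such intervals that contain the population mean, not to a probability for one computed interval.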
FAQs
Which is more accurate, a 95% confidence interval or a 99% confidence interval?
The 99% confidence interval is more likely to contain the population parameter, but for the same data it is wider (less precise) than the 95% interval.
KEY TO CORRECTION
Activity #3
a)
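At a 95% confidence level, $z_{\alpha/2}$ = 1.96.
At 95% confidence, for σ = 100:
$\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = 350 \pm 1.96\dfrac{100}{\sqrt{64}} = 350 \pm 24.5$, i.e., from 325.5 to 374.5 hours.
b)
At 95% confidence, for σ = 80:
$\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = 350 \pm 1.96\dfrac{80}{\sqrt{64}} = 350 \pm 19.6$, i.e., from 330.4 to 369.6 hours. This interval is narrower than the one in (a).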
Activity #5
From the z-score table, at a 95% confidence level, α = 0.05, so α/2 = 0.025 and the corresponding z value is 1.96.
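The key gives only the z value for the bottling problem. This minimal Python sketch completes the interval with the known-σ formula: it takes the sample mean of the seven listed amounts and uses the stated σ = 0.01 oz. The resulting numbers are computed here, not taken from the key.

```python
import math
import statistics
from statistics import NormalDist

amounts = [11.8, 12.1, 11.2, 12.0, 11.8, 11.7, 11.9]   # oz
sigma = 0.01                                          # given population SD, oz
n = len(amounts)

xbar = statistics.fmean(amounts)                      # sample mean
z = NormalDist().inv_cdf(0.975)                       # 95% confidence, about 1.96
margin = z * sigma / math.sqrt(n)                     # z * sigma / sqrt(n)

print(f"xbar = {xbar:.4f} oz; 95% CI = ({xbar - margin:.4f}, {xbar + margin:.4f})")
```

With σ this small the interval is very narrow. The interval width depends only on σ, n and z, not on how spread out the seven listed values are.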
1) Introduction (2 mins)
When you conduct research, you are trying to discover something new. An improved process? A more accessible raw material? . . . Along the way, several questions will come up, so you form hypotheses to answer them. This session will guide you on how to test your hypotheses.
Fill in the first column of what you know to answer the questions on the second column of the table below.
What is a z – test ?
B. MAIN LESSON
HYPOTHESIS TESTING
Hypothesis Testing is a statistical test used to determine whether the hypothesis assumed for the
sample of data stands true for the entire population or not.
Hypothesis testing is also used when you are comparing two or more groups.
The purpose of hypothesis testing is to determine whether there is enough statistical evidence in
favor of a hypothesis about a parameter.
A hypothesis should be simple and specific. There are two types of statistical hypotheses, the null hypothesis and the alternative hypothesis. The null and alternative hypotheses are contradictory, so you must examine the evidence to decide whether there is enough of it to reject the null hypothesis.
NULL HYPOTHESIS
Denoted as H0. H0 always contains a symbol with an equal sign in it (=, ≤, or ≥).
A statement that there is no relationship between two measured phenomena or no association among groups.
A null hypothesis states that there is no statistically significant relationship between the two variables in the hypothesis. It is the hypothesis that the researcher is trying to disprove.
It is a statement of no difference between sample means or proportions. It may also be a statement of no difference between a sample mean and a population mean. In other words, the difference equals 0.
ALTERNATIVE HYPOTHESIS
Denoted as H1.
It is a claim about the population that contradicts H0.
H1 never contains a symbol with an equal sign in it (≠, <, or >).
The hypothesis that one is trying to establish; it can be "statistically proved" by a rejection of the null hypothesis.
Example. Write the null hypothesis and the alternative hypothesis in the following statements.
2. The mean number of cars a person owns in his lifetime is not more than 5.
Null Hypothesis, Ho : µ ≤ 5 cars
Alternative Hypothesis, H1 : µ > 5 cars
3. Seventy percent of the first-year engineering students have no failing grades this school year.
Null Hypothesis, Ho : p = 0.70
Alternative Hypothesis, H1 : p ≠ 0.70
This document is a property of PHINMA EDUCATION
ECE 069: Engineering Data Analysis
Students’ Activity Sheet LESSON # 15
LEVEL OF SIGNIFICANCE
Denoted by α .
Measures the strength of the evidence that must be present in your sample before rejecting the null hypothesis.
It is the probability of rejecting the null hypothesis when it is in fact true, that is, the probability of a Type I error (Type I error = α).
Usual values of α are 0.05, 0.02, or 0.01.
TEST STATISTIC
A test statistic is a random variable that is calculated from sample data and used in a hypothesis
test.
The test statistic is used to determine whether to reject the null hypothesis. It compares your data with what is expected under the null hypothesis.
The test statistic is used to calculate the p-value.
Examples of test statistics: the z-statistic for a z-test and the t-statistic for a t-test.
P-VALUE
The probability, assuming the null hypothesis is true, of obtaining a sample result at least as extreme as the one actually observed.
A p-value of 0.05 indicates that you would have only a 5% chance of drawing a sample at least as extreme as the one tested if the null hypothesis were actually true.
If the p-value is less than the significance level, we reject the null hypothesis.
The p-value is the area under the curve beyond the observed test statistic, in the tail (or tails) specified by H1.
Example. If the observed value is z = 1.51 (calculated value), then from the statistical table the area to the left of z = 1.51 is 0.93448, so the right-tailed p-value of the sample is 0.06552 (1 - 0.93448).
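The table lookup in this example can be reproduced from the standard normal CDF. This minimal sketch uses Python's `statistics.NormalDist` in place of the table. It computes both the right-tailed p-value (H1 contains ">") and the two-tailed p-value (H1 contains "≠") for z = 1.51. For a two-tailed test, the area in one tail is doubled.

```python
from statistics import NormalDist

z_obs = 1.51
std_normal = NormalDist()

p_right = 1 - std_normal.cdf(z_obs)              # H1 contains ">"
p_two = 2 * (1 - std_normal.cdf(abs(z_obs)))     # H1 contains "≠"

print(f"right-tailed p = {p_right:.5f}")
print(f"two-tailed p   = {p_two:.5f}")
```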
Statistical significance plays a pivotal role in statistical hypothesis testing. It is used to determine
whether the null hypothesis should be rejected or retained. The null hypothesis is the default
assumption that nothing happened or changed. [37] For the null hypothesis to be rejected, an observed result has to be statistically significant, i.e., the observed p-value is less than the pre-specified significance level, α.
Source : https://www.google.com/
o If H1 contains "<", then conduct a left-tailed test. Compare the calculated test statistic with the critical value of the test statistic at the given α. If the calculated test statistic > the critical value, then you do not reject the null hypothesis; otherwise, reject it.
o If H1 contains "≠", then conduct a two-tailed test. Compare the calculated test statistic with the critical values of the test statistic at the given α/2. If the calculated test statistic (if negative) > the lower critical value, or the calculated test statistic (if positive) < the upper critical value, then you do not reject the null hypothesis; if it falls beyond either critical value, reject it.
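The critical-value rules above can be written as a small function. This is a sketch only: the name `reject_h0` and the `tail` labels are hypothetical, and critical values come from `statistics.NormalDist` instead of a printed z table. The right-tailed case (H1 contains ">") is included for completeness: reject when the statistic exceeds the upper critical value.

```python
from statistics import NormalDist

def reject_h0(z: float, alpha: float, tail: str) -> bool:
    """Critical-value rule for a z-test; tail is 'left', 'right' or 'two'."""
    std_normal = NormalDist()
    if tail == "left":                               # H1 contains "<"
        return z < std_normal.inv_cdf(alpha)         # critical value is negative
    if tail == "right":                              # H1 contains ">"
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "two":                                # H1 contains "≠"
        crit = std_normal.inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
        return z < -crit or z > crit
    raise ValueError("tail must be 'left', 'right' or 'two'")

print(reject_h0(-2.67, 0.05, "two"))    # beyond -1.96, so H0 is rejected
print(reject_h0(-1.50, 0.05, "left"))   # not below -1.645, so H0 is not rejected
print(reject_h0(2.00, 0.05, "right"))   # above 1.645, so H0 is rejected
```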
Problem. A manufacturer of electric lamps is testing a new production method that will be considered
acceptable if the lamps produced by this method result in a normal population with an average life of 2,400
hours and a standard deviation equal to 300. A sample of 100 lamps produced by this method has an
average life of 2,320 hours. Can the hypothesis of validity for the new manufacturing process be accepted
with a risk equal to or less than 5%?
Step 1. State the Null Hypothesis and the Alternative Hypothesis
Null Hypothesis : Ho : µ = 2,400 hours
Alternative Hypothesis : H1 : µ ≠ 2,400 hours
Step 2. Level of Significance, α = 0.05
Step 3. Calculate the test statistic, z – statistic.
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{2{,}320 - 2{,}400}{300/\sqrt{100}} = -2.67$ (note: $\sigma/\sqrt{n}$ is called the standard error of the statistic)
Step 4. The calculated z-statistic (-2.67) is less than the critical value of the z-statistic at α/2 (-1.96), so there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis. The test is statistically significant, suggesting that the mean life of these electric lamps is not equal to the 2,400 hours required for the new method to be acceptable.
Further, at z = -2.67, from the statistical table, the two-tailed area is (0.0038 × 2) = 0.0076. This area is the p-value of the sample, and since the p-value is less than α (0.05), we reject the null hypothesis.
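The whole lamp test can be run end to end with the p-value approach. In this minimal Python sketch, `statistics.NormalDist` replaces the z table. Because z is kept unrounded (-2.6667), the two-tailed p-value comes out near 0.0077 rather than the table-based 0.0076. The decision is the same.

```python
import math
from statistics import NormalDist

mu0, sigma, n, xbar = 2400, 300, 100, 2320
alpha = 0.05

se = sigma / math.sqrt(n)                          # standard error = 30 hours
z = (xbar - mu0) / se                              # about -2.67
p_value = 2 * NormalDist().cdf(-abs(z))            # two-tailed, H1: mu != 2400

decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f} -> {decision}")
```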
2. Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking) .
A rental car company claims the mean time to rent a car on their website is 60 seconds with a standard deviation of 30 seconds. A random sample of 36 customers attempted to rent a car on the website. The mean time to rent was 70 seconds. Is there enough evidence that the mean time to rent is more than 60 seconds at the 0.05 level of significance?
What is a z – test ?
2. A test is conducted; Ho : µ = 40, H1 : µ > 40. The z-test statistic is 1.5. What is the p-value of this test?
a. 0.9332 b. 0.0668 c. 0.05 d. 0.01
3. A test is conducted; Ho : µ = 40, H1 : µ < 40. The z-test statistic is -1.5. The correct decision is . . .
a. Reject H0 at both α = 0.05 and α = 0.01.
b. Reject H0 at α = 0.05 but do not reject Ho at α = 0.01.
c. Reject H0 at both α = 0.05 and α = 0.10.
d. Do not reject H0 at α = 0.05 but reject Ho at α = 0.10.
5. The z – test is used to test the sample mean in the following case. . .
a. sample standard deviation is known.
b. sample size is less than 30.
c. sample size is more than 30.
d. data are not normally distributed.
C. LESSON WRAP-UP
Statistics is about data and it is the interpretation of the data that we are interested in. In hypothesis testing
we are trying to interpret or draw conclusions about the population using data coming from the sample.
Further, hypothesis testing evaluates two mutually exclusive statements about a population to determine which statement is better supported by the sample data.
KEY TO CORRECTION
Activity #3
Conduct a one-sample, right-tailed z-test: $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{70 - 60}{30/\sqrt{36}} = 2.0$.
Step 4. The calculated z-statistic (2.0) is greater than the critical value of the z-statistic at α (1.645), so there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis, suggesting that the mean time to rent a car is more than 60 seconds.
Further, at z = 2.0 the corresponding p-value of the sample is 0.02275, and since the p-value is less than α (0.05), we reject the null hypothesis in favor of the alternative hypothesis.
Activity #5
1) b
2) b
3) d
4) a
5) c
1) Introduction (2 mins)
⮚ A t-test is a statistical test used to determine if there is a significant difference between the means of
two groups.
B. MAIN LESSON
⮚ t -Test Assumptions
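● The sample is a representative, randomly selected portion of the total population.
● The data are normally distributed.
● The hypothesized population mean is known, but the population standard deviation is unknown (it is estimated by the sample standard deviation, s).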
Types of t – test
⮚ One Sample t-test. This tests the mean of a single group against a known population mean.
⮚ Independent Sample t-test. This compares the means of two independent groups of samples.
⮚ Paired Sample t-test. This compares the means of the same group at different times.
⮚ The One Sample t - Test is commonly used to test the statistical difference between a sample mean
and a known or hypothesized value of the mean in the population.
⮚ t-statistic: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
where $\bar{x}$ = sample mean
s = sample standard deviation
µ = population mean
n = sample size
Example. Test the hypothesis at α = 0.05 that taking a vitamin capsule makes an individual smarter. The average IQ of an individual is 100. To test the hypothesis, 12 engineering students took the same vitamin capsule for one year, and then an IQ test was given to these students. The results are 116, 111, 101, 120, 99, 94, 106, 115, 107, 101, 110, and 92.
Further, using the p-value approach, at t = 2.35 and degrees of freedom = 11, the p-value is between 1% and 2.5%, hence less than the level of significance, α = 0.05 (or 5%), so we reject the null hypothesis.
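The t value quoted above can be recomputed from the twelve IQ scores. This minimal Python sketch uses the standard-library `statistics` module. `statistics.stdev` divides by n - 1, as the formula for s requires. The standard library has no t distribution, so the critical value still comes from a t table (about 1.796 for a one-tailed test at α = 0.05 with 11 degrees of freedom).

```python
import math
import statistics

iq = [116, 111, 101, 120, 99, 94, 106, 115, 107, 101, 110, 92]
mu0 = 100                                    # average IQ under H0

n = len(iq)
xbar = statistics.fmean(iq)                  # sample mean
s = statistics.stdev(iq)                     # sample SD (divides by n - 1)
t = (xbar - mu0) / (s / math.sqrt(n))        # one-sample t-statistic

print(f"n = {n}, xbar = {xbar:.1f}, s = {s:.3f}, t = {t:.2f}, df = {n - 1}")
```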
⮚ This test compares the means of two groups of samples that are selected independently of each other.
⮚ There are two types of independent sample t-test.
● Equal Variance (Pooled variance t-test) with degrees of freedom, df = n1 + n2 – 2.
● Unequal Variance (Separate variance t-test) with degrees of freedom,
Degrees of freedom = $\dfrac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
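⮚ The t-statistic (pooled variance):
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_o}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
where: $\bar{x}_1$ & $\bar{x}_2$ = mean of sample 1 and sample 2
$s_1^2$ & $s_2^2$ = variance of sample 1 and sample 2
n1 & n2 = size of sample 1 and sample 2
Do = µ1 − µ2 (a number that is deduced from the statement of the situation).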
Example. An experiment was performed to compare the abrasive wear of two materials. Ten pieces of
material 1 ( group 1) and ten pieces of material 2 (group 2) were tested. The test on material 1 gave an
average wear of 85 units with a sample standard deviation of 4, and the test on material 2 gave an average
wear of 81 with a sample standard deviation of 5. Can we conclude at 0.05 level of significance that
abrasive wear of material 1 is greater than that of material 2 ? Assume the populations are normally
distributed and with equal variances.
Step 4. The calculated or observed value of the t-statistic (1.98) is greater than the critical value of the t-statistic (1.734 at α = 0.05 and degrees of freedom = 18), or we may say that the observed value of the t-statistic is in the rejection region. Hence, we reject the null hypothesis in favor of the alternative hypothesis. This suggests that the abrasive wear of material 1 is greater than the abrasive wear of material 2.
Further, at t = 1.98 and degrees of freedom = 18, the p-value is between 2.5% and 5%, hence less than the level of significance, α = 0.05 (or 5%), so we reject the null hypothesis.
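The observed t for the abrasive-wear example can be recomputed from the summary statistics with the pooled-variance formula. In this sketch, D_o = 0 is assumed because H0 states equal means. The one-tailed critical value for 18 degrees of freedom (about 1.734) still comes from a t table.

```python
import math

n1, xbar1, s1 = 10, 85.0, 4.0   # material 1
n2, xbar2, s2 = 10, 81.0, 5.0   # material 2
d0 = 0.0                        # hypothesized difference mu1 - mu2 under H0

df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df          # pooled variance
t = ((xbar1 - xbar2) - d0) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

print(f"pooled variance = {sp2:.2f}, t = {t:.2f}, df = {df}")
```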
Example. Assume that we are taking a diagonal measurement of billboards purchased by a company. Group 1 of samples includes 20 billboards, while group 2 includes 10 billboards.
Statistical Data : Group 1 : mean diagonal measurement = 21.6 inches ; variance = 17.1
Group 2 : mean diagonal measurement = 19.4 inches ; variance = 1.4
Can we conclude that the mean of group 1 is greater than that of group 2?
Step 1. State the Null Hypothesis and the Alternative Hypothesis
Null Hypothesis : Ho : µ1 = µ2
Alternative Hypothesis : H1 : µ1 > µ2 ( One tail test )
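Step 2. At level of significance, α = 0.05, and
Degrees of freedom = $\dfrac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} = \dfrac{\left(\frac{17.1}{20} + \frac{1.4}{10}\right)^2}{\frac{(17.1/20)^2}{20 - 1} + \frac{(1.4/10)^2}{10 - 1}} \approx 24$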
Further, using the p-value approach, at $t = \dfrac{21.6 - 19.4}{\sqrt{17.1/20 + 1.4/10}} \approx 2.21$ and degrees of freedom = 24, the p-value is between 1% and 2.5%, hence less than the level of significance, α = 0.05 (or 5%), so we reject the null hypothesis.
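Both the separate-variance t-statistic and its Welch–Satterthwaite degrees of freedom can be recomputed from the billboard summary data. This minimal Python sketch treats the given variances as sample variances s². It rounds the degrees of freedom down, a common conservative convention, which gives the 24 used above.

```python
import math

n1, xbar1, var1 = 20, 21.6, 17.1   # group 1 (var1 = s1^2)
n2, xbar2, var2 = 10, 19.4, 1.4    # group 2 (var2 = s2^2)

a, b = var1 / n1, var2 / n2                               # s^2 / n for each group
t = (xbar1 - xbar2) / math.sqrt(a + b)                    # separate-variance t
df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))   # Welch-Satterthwaite

print(f"t = {t:.3f}, df = {df:.2f} (round down to {math.floor(df)})")
```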
Paired t - Test
⮚ A paired t-test is used when we are interested in the difference between two variables for the
same subject. Often the two variables are separated by time or something other than time.
⮚ Compares the means of two related groups of samples.
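⮚ The t-statistic, with degrees of freedom df = n − 1, where D is the difference within each pair:
$t = \dfrac{\sum D}{\sqrt{\dfrac{n\sum D^2 - \left(\sum D\right)^2}{n - 1}}}$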
Compare the fuel economy of two cars, where the cars in each pair are operated using different types of gasoline (Type 1 gasoline and Type 2 gasoline).
Step 2. At level of significance, α = 0.05, and degrees of freedom (n − 1) equal to 8, the critical value of t is t_crit = 1.86.
$t = \dfrac{\sum D}{\sqrt{\dfrac{n\sum D^2 - \left(\sum D\right)^2}{n - 1}}} = \dfrac{1.3}{\sqrt{\dfrac{9(0.41) - (1.3)^2}{9 - 1}}} = 2.6$
Step 4. The calculated or observed value of the t-statistic (2.6) is greater than the critical value of the t-statistic (1.86 at α = 0.05 and degrees of freedom = 8), or we may say that the observed value of the t-statistic is in the rejection region. Hence, we reject the null hypothesis in favor of the alternative hypothesis. This suggests that Type 1 gasoline is a more economical fuel than Type 2 gasoline.
Further (using the p-value approach), at t = 2.6 and degrees of freedom = 8, the p-value is between 1% and 2.5%, hence less than the level of significance, α = 0.05 (or 5%), so we reject the null hypothesis.
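The shortcut formula for the paired t-statistic, which uses only ΣD and ΣD², is algebraically the same as the familiar form $\bar{D}/(s_D/\sqrt{n})$. This minimal Python sketch checks that claim with the fuel-economy sums (n = 9, ΣD = 1.3, ΣD² = 0.41).

```python
import math

n = 9          # number of paired cars
sum_d = 1.3    # ΣD, sum of the paired differences
sum_d2 = 0.41  # ΣD², sum of the squared differences

# Shortcut formula used in the solution
t = sum_d / math.sqrt((n * sum_d2 - sum_d**2) / (n - 1))

# Same statistic in the familiar form: mean difference / (s_D / sqrt(n))
d_bar = sum_d / n
s_d = math.sqrt((sum_d2 - sum_d**2 / n) / (n - 1))
t_alt = d_bar / (s_d / math.sqrt(n))

print(f"t = {t:.2f}, t (alternative form) = {t_alt:.2f}, df = {n - 1}")
```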
2. Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking) .
The table gives the observations of the control group and the treatment group. Use a paired t-test at the 0.05 level of significance to determine whether there is a significant difference between the means of the two groups.
Sample No.   Control Group   Treatment Group
1            3               20
2            3               13
3            3               13
4            12              20
5            15              29
6            16              32
7            17              23
8            19              20
9            23              25
10           24              15
11           32              30
Answer Sheet:
2. In testing the differences between the means of two independent populations, the null hypothesis
is . . .
a. H o : 1 2 1 b. H o : 1 2 0 c. H o : 1 2 0 d. H o : 1 2 1
4. A one-sample t-test was conducted to test the IQ of engineering students. The observed t-statistic in the study with 15 samples at the 0.05 level of significance is 2.0. What is the p-value of this study?
a. between 0.025 and 0.05.
b. greater than 0.05.
c. less than 0.025.
d. none of the above
5. Two different alloys are being considered for making lead-free solder used in the wave soldering
process for printed circuit boards. A crucial characteristic of solder is its melting point, which is
known to follow a Normal distribution. A study was conducted using a random sample of 21
pieces of solder made from each of the two alloys. In each sample, the temperature at which
each of the 21 pieces melted was determined. The mean and standard deviation of the sample
for Alloy 1 were $\bar{x}_1$ = 218.9ºC and s1 = 2.7ºC; for Alloy 2 the results were $\bar{x}_2$ = 215.5ºC and s2 = 3.6ºC. Suppose we test H0: µ1 = µ2 against Ha: µ1 ≠ µ2. In this study, what are the degrees of freedom?
a. 21 b. 20 c. 40 d. 42
C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
⮚ The t-test is the commonly used statistical test for comparing the means of two groups of samples.
⮚ You can also use the Data Analysis tool in the Data tab of Microsoft Excel to find the critical value of t and the observed value of t.
⮚ Let us use Microsoft Excel to find the critical value of t and the observed t for the data in Activity 3 above.
The table gives the observations of the control group and the treatment group. Use a paired t-test at the 0.05 level of significance to determine whether the means of the two groups differ significantly.
Sample No.   Control Group   Treatment Group
1            3               20
2            3               13
3            3               13
4            12              20
5            15              29
6            16              32
7            17              23
8            19              20
9            23              25
10           24              15
11           32              30
Steps:
● Enter the control group and the treatment group columns in excel.
● Click Data, then Data Analysis (available once the Analysis ToolPak add-in is enabled), then t-Test: Paired Two Sample for Means, then press OK.
● Enter the Variable 1 Range, then the Variable 2 Range, enter Alpha, click New Worksheet Ply, then OK.
KEY TO CORRECTION
Activity #3
Follow the steps in hypothesis testing.
$t = \dfrac{\sum D}{\sqrt{\dfrac{n\sum D^2 - \left(\sum D\right)^2}{n - 1}}} = \dfrac{-73}{\sqrt{\dfrac{11(1{,}131) - (-73)^2}{11 - 1}}} \approx -2.74$
Sample No.   Control Group   Treatment Group   D     D²
1            3               20                -17   289
2            3               13                -10   100
3            3               13                -10   100
4            12              20                -8    64
5            15              29                -14   196
6            16              32                -16   256
7            17              23                -6    36
8            19              20                -1    1
9            23              25                -2    4
10           24              15                9     81
11           32              30                2     4
Total                                          -73   1,131
Step 4. The calculated or observed value of the t-statistic is -2.74 (|t| = 2.74, since a two-tailed test is conducted), which is greater in magnitude than the critical value of the t-statistic (2.228 at α = 0.05 and degrees of freedom = 10), so the observed value of the t-statistic is in the rejection region. Hence, we reject the null hypothesis in favor of the alternative hypothesis. This suggests that the control group and the treatment group do not have equal means.
Further (using the p-value approach), at |t| = 2.74 and degrees of freedom = 10, the area in one tail is between 1% and 2.5%, so the two-tailed p-value is between 2% and 5%, hence less than the level of significance, α = 0.05 (or 5%), so we reject the null hypothesis.
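The answer key can be reproduced from the raw control and treatment data. This minimal Python sketch forms D = control - treatment for each sample, as the key table does. It then evaluates the shortcut formula and cross-checks the result with the mean-difference form. The two-tailed critical value (2.228 for 10 degrees of freedom) still comes from a t table.

```python
import math
import statistics

control = [3, 3, 3, 12, 15, 16, 17, 19, 23, 24, 32]
treatment = [20, 13, 13, 20, 29, 32, 23, 20, 25, 15, 30]

# D = control - treatment for each sample
d = [c - tr for c, tr in zip(control, treatment)]
n = len(d)
sum_d = sum(d)
sum_d2 = sum(x * x for x in d)

# Shortcut formula from the solution
t_stat = sum_d / math.sqrt((n * sum_d2 - sum_d**2) / (n - 1))

# Equivalent form: mean difference divided by its standard error
t_check = statistics.fmean(d) / (statistics.stdev(d) / math.sqrt(n))

print(f"ΣD = {sum_d}, ΣD² = {sum_d2}, t = {t_stat:.2f} (check {t_check:.2f}), df = {n - 1}")
```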
Activity #5
1) c
2) b
3) c
4) a
5) b
1) Introduction (2 mins)
Simple linear regression is a statistical method for obtaining a formula to predict the scores on one
variable from the scores on a second variable. The variable we are predicting is called the criterion
variable and is referred to as Y. The variable we are basing our predictions on is called the predictor
variable and is referred to as X. When there is only one predictor variable, the prediction method is
called simple regression.
In simple linear regression, the predictions of Y when plotted as a function of X form a straight line.
Linear regression consists of finding the best-fitting straight line through the points. The best-fitting line
is called a regression line.
B. MAIN LESSON
The parameters $\beta_0$ (the intercept of the regression line) and $\beta_1$ (the coefficient of $X_i$, or the slope of the regression line) are estimated by minimizing the sum of the squared residuals. This procedure is known as the method of least squares.
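$$\text{minimize } \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(Y_i - \beta_0 - \beta_1 X_i\right)^2$$

$$\hat{\beta}_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \qquad \text{(Equation 1)}$$

$$\hat{\beta}_0 = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - \hat{\beta}_1 \sum_{i=1}^{n} x_i\right) \qquad \text{(Equation 2)}$$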
We then substitute the values of $\hat{\beta}_0$ and $\hat{\beta}_1$ into the equation below to obtain the regression line equation.

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X \qquad \text{(Equation 3)}$$
Simple linear regression assumes that the relationship between the independent and dependent variables is linear; that is, the line of best fit through the data points is a straight line (rather than a curve).
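A minimal sketch of Equations 1 and 2 in Python, using the data of Example 1 in this lesson. It also checks the least-squares property numerically: moving the intercept or the slope away from the fitted values makes the sum of squared residuals larger. The function names `fit_line` and `sse` are illustrative, not part of the module.

```python
# Example 1 data: ambient temperature x (°F) and power consumption y (BTU).
x = [27, 45, 72, 58, 31, 60, 31, 74]
y = [250, 285, 320, 295, 265, 298, 267, 321]


def fit_line(x, y):
    """Least-squares intercept and slope from Equations 1 and 2."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # Equation 1
    b0 = (sy - b1 * sx) / n                         # Equation 2
    return b0, b1


def sse(b0, b1, x, y):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = fit_line(x, y)
best = sse(b0, b1, x, y)
# Nudging either coefficient away from the least-squares values increases the SSE.
for d0, d1 in [(1, 0), (-1, 0), (0, 0.05), (0, -0.05)]:
    assert sse(b0 + d0, b1 + d1, x, y) > best
print(f"b0 = {b0:.2f}, b1 = {b1:.4f}, SSE = {best:.1f}")
```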
Correlation Coefficient, r
One of the most commonly used correlation coefficients is Pearson's correlation coefficient, r.
The correlation coefficient, r, measures the strength and direction of the linear relationship between the response variable and the explanatory variable.
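$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \qquad \text{(Equation 4)}$$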
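Equation 4 is a computational shortcut; it gives the same value as the definition of Pearson's r written with deviations from the means. The sketch below checks this on the Example 1 data; the function names `pearson_r` and `pearson_r_deviation` are assumptions made for illustration.

```python
import math

x = [27, 45, 72, 58, 31, 60, 31, 74]
y = [250, 285, 320, 295, 265, 298, 267, 321]


def pearson_r(x, y):
    """Equation 4: computational form of Pearson's r."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def pearson_r_deviation(x, y):
    """Same coefficient written with deviations from the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(x, y)
assert math.isclose(r, pearson_r_deviation(x, y))  # both forms agree
print(f"r = {r:.4f}")
```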
Coefficient of Determination, r²
The coefficient of determination is the square of the correlation coefficient.
It is the proportion of variation in the response variable explained by the regression model.
The most common interpretation of the coefficient of determination is how well the regression model fits
the observed data. For example, a coefficient of determination of 60% means that 60% of the variation in the response variable is explained by the regression model. Generally, a higher coefficient indicates a better fit for the model.
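The "proportion of variation explained" reading of r² can be checked numerically: for a least-squares line, 1 − SSE/SST equals r², where SSE is the sum of squared residuals and SST is the total variation of y about its mean. This sketch uses the Example 1 data; the labels SSE and SST are standard terms, not taken from this module.

```python
import math

x = [27, 45, 72, 58, 31, 60, 31, 74]
y = [250, 285, 320, 295, 265, 298, 267, 321]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

sxx = sum((a - mx) ** 2 for a in x)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
syy = sum((b - my) ** 2 for b in y)  # total variation in y (SST)

b1 = sxy / sxx     # least-squares slope (same value as Equation 1)
b0 = my - b1 * mx  # least-squares intercept (same value as Equation 2)
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained variation

r_squared = 1 - sse / syy  # share of the variation in y explained by the line
r = sxy / math.sqrt(sxx * syy)
assert math.isclose(r_squared, r ** 2)
print(f"r^2 = {r_squared:.3f}")
```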
Example 1. A study was done on the effect of ambient temperature, x, on the electric power consumed, y, by an industrial plant. Other factors were held constant. The data collected from the experiment are shown below. Find the equation of the regression line and estimate the electric power consumption when x = 70 °F.
Trial   y (BTU)   x (°F)
  1       250       27
  2       285       45
  3       320       72
  4       295       58
  5       265       31
  6       298       60
  7       267       31
  8       321       74
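From the table, $n = 8$, $\sum x_i = 398$, $\sum y_i = 2{,}301$, $\sum x_i y_i = 117{,}851$, and $\sum x_i^2 = 22{,}300$. By Equation 1,

$$\hat{\beta}_1 = \frac{8(117{,}851) - (398)(2{,}301)}{8(22{,}300) - (398)^2} = \frac{27{,}010}{19{,}996} \approx 1.35$$

and by Equation 2,

$$\hat{\beta}_0 = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - \hat{\beta}_1 \sum_{i=1}^{n} x_i\right) = \frac{1}{8}\left(2{,}301 - 1.35(398)\right) \approx 220.5$$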
Substituting the values of $\hat{\beta}_0$ and $\hat{\beta}_1$ into Equation 3, the regression line equation is

$$\hat{y} = 220.5 + 1.35x$$
To predict the power consumption at x = 70 °F, we substitute this value into the regression line equation:
$\hat{y} = 220.5 + 1.35(70) = 315$ BTU.
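The sums behind Example 1 and the prediction at 70 °F can be reproduced with the sketch below. The intercept uses the slope rounded to 1.35, as the example does, and `predict` is a hypothetical helper that evaluates the rounded regression line.

```python
x = [27, 45, 72, 58, 31, 60, 31, 74]          # ambient temperature, °F
y = [250, 285, 320, 295, 265, 298, 267, 321]  # power consumption, BTU
n = len(x)

sum_x, sum_y = sum(x), sum(y)              # 398 and 2,301
sum_xy = sum(a * b for a, b in zip(x, y))  # 117,851
sum_x2 = sum(a * a for a in x)             # 22,300

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # Equation 1: about 1.35
b0 = (sum_y - round(b1, 2) * sum_x) / n                         # Equation 2 with b1 rounded: about 220.46


def predict(temp_f, intercept=220.5, slope=1.35):
    """Evaluate the rounded regression line y-hat = 220.5 + 1.35 x (hypothetical helper)."""
    return intercept + slope * temp_f


print(f"{predict(70):.1f} BTU")  # 315.0 BTU
```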
The value r ≈ 0.99 (0.987 before rounding) indicates a very high positive relationship between electric power consumption and ambient temperature: power consumption increases as ambient temperature increases. Furthermore, the coefficient of determination, r² = (0.987)² ≈ 0.97, indicates that about 97% of the variation in electric power consumption is explained by its linear relationship with ambient temperature.
2) Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking)
Given below is a data set on x and y. Let y be the response variable and x be the predictor variable.
Find the regression line equation and the value of the correlation coefficient, r. Interpret
your result.
x y
0 2
1 3
2 5
3 4
4 6
2. If r² = 0.99, how confident are you in using the regression line to estimate the response variable given
the predictor variable?
a. not confident c. the relationship is weak to predict
b. very confident d. the relationship cannot be predicted
3. If the correlation coefficient is 0.90, the percentage of variation in the response variable explained by
the variation in the predictor variable is . . .
a. 0.90 % b. 90% c. 81% d. 0.81%
5. Larger values of r² suggest that the observations are more closely grouped about the . . . .
a. average value of the independent variables.
b. average value of the dependent variable.
c. least squares line.
d. none of the above.
C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
[Progress tracker: Period 1, Period 2, and Period 3, covering Sessions 1 to 17]
Form groups of three. Search for a problem (with given data points) related to your profession
that uses regression analysis. Solve for the regression line and the correlation coefficient, then
interpret your results.
KEY TO CORRECTION
Activity #3
Extending the columns of the preceding table.
        x    y    xy    x²    y²
        0    2     0     0     4
        1    3     3     1     9
        2    5    10     4    25
        3    4    12     9    16
        4    6    24    16    36
Sum    10   20    49    30    90
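By Equation 1,

$$\hat{\beta}_1 = \frac{5(49) - (10)(20)}{5(30) - (10)^2} = \frac{45}{50} = 0.9$$

and by Equation 2,

$$\hat{\beta}_0 = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - \hat{\beta}_1 \sum_{i=1}^{n} x_i\right) = \frac{1}{5}\left(20 - 0.9(10)\right) = 2.2$$

Hence, the regression line equation is $\hat{y} = 2.2 + 0.9x$.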
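By Equation 4,

$$r = \frac{5(49) - (10)(20)}{\sqrt{\left[5(30) - 10^2\right]\left[5(90) - 20^2\right]}} = \frac{45}{50} = 0.90$$

The value r = 0.90 indicates a very high positive relationship between the y and x variables. Furthermore, the coefficient of determination, r² = 0.90² = 0.81, indicates that 81% of the variation in y is explained by its linear relationship with x.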
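The Activity 3 answer key can be verified with a few lines of Python built directly from Equations 1, 2, and 4; the variable names are illustrative.

```python
import math

x = [0, 1, 2, 3, 4]
y = [2, 3, 5, 4, 6]
n = len(x)
sum_x, sum_y = sum(x), sum(y)              # 10, 20
sum_xy = sum(a * b for a, b in zip(x, y))  # 49
sum_x2 = sum(a * a for a in x)             # 30
sum_y2 = sum(b * b for b in y)             # 90

num = n * sum_xy - sum_x * sum_y           # 45, shared by Equations 1 and 4
b1 = num / (n * sum_x2 - sum_x ** 2)       # Equation 1: 45 / 50
b0 = (sum_y - b1 * sum_x) / n              # Equation 2: (20 - 9) / 5
r = num / math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))  # Equation 4
print(f"y-hat = {b0:.1f} + {b1:.1f}x, r = {r:.2f}, r^2 = {r * r:.2f}")
# y-hat = 2.2 + 0.9x, r = 0.90, r^2 = 0.81
```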
Activity #5
1) c
2) b
3) c
4) c
5) c