Quantitative Analysis For Business (A)
COHR1202
INTRODUCTION
Descriptive Statistics
discussion, analysis, and interpretations. It utilizes numerical and graphical methods to
look for patterns in a data set, to summarize the information revealed in a data set, and
to present the information in a convenient form.
EXAMPLE 1: A poll found that 49% of the people in a survey knew the name of the first
book of the Bible. The statistic 49 describes the number out of every 100 persons who
knew the answer.
Inferential Statistics:
Inferential statistics, also known as inductive statistics, goes beyond describing a given
problem situation by means of collecting, summarizing, and meaningfully presenting the
related data. Instead, it consists of methods that are used for drawing inferences, or making
broad generalizations, about a totality of observations on the basis of knowledge about a part
of that totality. The totality of observations about which an inference may be drawn, or a
generalization made, is called a population or a universe. The part of the totality which is
observed for data collection and analysis to gain knowledge about the population is
called a sample. Inferential statistics therefore covers any decision, estimate, prediction, or
generalization about a population that is based on a sample: generalizing from a sample to a
population, estimating unknown parameters, drawing conclusions, and making decisions.
Types of Variables
Qualitative → Quality
EXAMPLES: Gender, religious affiliation, type of automobile owned, state of birth, and eye
color.
Data Coding
Often, we assign arbitrary numerical values to qualitative data for ease of computer
entry and analysis. But these assigned numerical values are simply codes: they cannot
be meaningfully added, subtracted, multiplied, or divided. For example, we might code
Democrat = 1, Republican = 2, and Independent = 3. Coding an attribute as a number
does not make the data numerical. For example, 1 = Bachelor’s, 2 = Master’s, 3 =
Doctorate.
For example,
1 = employed, 0 = not employed
1 = married, 0 = not married
1 = male, 0 = female
1 = female, 0 = male
The coding itself has no numerical value so binary variables are attribute data.
Qualitative data can be subclassified as either nominal data or ordinal data. The
categories of an ordinal data set can be ranked or meaningfully ordered, but the
categories of a nominal data set cannot be ordered.
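The point about codes can be illustrated with a minimal Python sketch. The list of responses is hypothetical (it is not data from these notes); only the coding scheme Democrat = 1, Republican = 2, Independent = 3 comes from the text. Counting categories is meaningful, while the mean of the codes has no interpretation.

```python
from collections import Counter

# Coding scheme from the text: 1 = Democrat, 2 = Republican, 3 = Independent
CODES = {1: "Democrat", 2: "Republican", 3: "Independent"}

# Hypothetical coded survey responses (illustrative only)
responses = [1, 3, 2, 1, 1, 3, 2, 1]

# Counting how often each category occurs is meaningful for coded data.
counts = Counter(CODES[code] for code in responses)
print(counts.most_common())

# The mean of the codes can be computed, but it describes no category at all.
mean_of_codes = sum(responses) / len(responses)
print(mean_of_codes)
```

The frequency count is the appropriate summary here; the value 1.75 returned for the mean does not correspond to any party.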
Length, height, area, volume, weight, speed, time, temperature, humidity, sound
levels, cost, members, ages, etc.
Quantitative → Quantity
For example,
- Number of auto insurance claims filed in March (e.g., X = 114 claims).
- Can be broken down into two types – discrete or continuous data.
DISCRETE DATA
CONTINUOUS DATA
In contrast, continuous data consists of numerical values that are not restricted to
specific numbers. Such data is called continuous because there are no gaps between
feasible values. A continuous variable is the one that can assume any value between
any two points on a line segment, thus representing an interval of values. The values
are quite precise and close to each other, yet distinguishably different. All
characteristics such as weight, length, height, thickness, velocity, temperature, tensile
strength, etc., represent continuous variables. A continuous variable is a numerical variable
that can have any value within an interval, and any continuous interval contains infinitely
many possible values (e.g., 426 < X < 428). Quantitative variables can be classified as either
discrete or continuous.
LEVELS OF MEASUREMENT
At the nominal level of measurement, numbers or other symbols are assigned to a set
of categories for the purpose of naming, labeling, or classifying the observations.
Gender is an example of a nominal level variable. Using the numbers 1 and 2, for
instance, we can classify our observations into the categories "females" and "males,"
with 1 representing females and 2 representing males. We could use any of a variety of
symbols to represent the different categories of a nominal variable; however, when
numbers are used to represent the different categories, we do not imply anything about
the magnitude or quantitative difference between the categories.
If the categories (or values) of a variable can be rank-ordered, and if the measurements
for all the cases are expressed in the same units, then an interval-ratio level of
measurement has been achieved. Examples of variables measured at the interval-ratio
level are age, income, and exam scores. With all these variables we can compare
values not only in terms of which is larger or smaller, but also in terms of how much
larger or smaller one is compared with another. In some discussions of levels of
measurement you will see a distinction made between interval-ratio variables that have
a natural zero point (where zero means the absence of the property) and those
variables that have zero as an arbitrary point. For example, weight and length have a
natural zero point, whereas temperature has an arbitrary zero point. Variables with a
natural zero point are also called ratio variables. In statistical practice, however, ratio
variables are subjected to operations that treat them as interval and ignore their ratio
properties. Therefore, no distinction between these two types is made in this text.
Data, or facts, may be derived from several sources. Data can be classified as primary
data and secondary data. Primary data is data gathered for the first time by the
researcher; secondary data is data taken by the researcher from secondary sources,
internal or external. Primary data are those data which do not already exist in any form, and
thus have to be collected for the first time from the primary source(s). By their very nature,
these data require fresh and first-time collection covering the whole population or a sample
drawn from it. The researcher must thoroughly search secondary data sources before
commissioning any efforts for collecting primary data. There are many advantages in
searching for and analyzing secondary data before attempting the collection of primary data. In
some cases, the secondary data itself may be sufficient to solve the problem. Usually
the cost of gathering secondary data is much lower than the cost of organizing primary
data. Moreover, secondary data has several supplementary uses. It also helps to plan
the collection of primary data, in case it becomes necessary. Secondary data is of two
kinds, internal and external. Secondary data – whether internal or external – is data
already collected by others, for purposes other than the solution of the problem on
hand.
The advantages of the secondary data collection method are: 1) it saves time that would
otherwise be spent collecting data, and 2) it usually provides a larger database than
would be possible to collect on one’s own. The disadvantage is that the researcher cannot
personally check the data, so its reliability may be questioned.
PRIMARY data is data that you collect yourself using such methods as:
direct observation
Surveys
Interviews
Primary data can be relied on because you know where it came from and what was
done to it. There's a lot more secondary data than primary data, and secondary data
is a whole lot cheaper and easier to acquire than primary data. The problem is that
often the reliability, accuracy and integrity of the data is uncertain. Who collected it?
Can they be trusted? Did they do any pre-processing of the data? Is it biased? How
old is it? Where was it collected? Can the data be verified, or does it have to be
taken on faith? Often secondary data has been pre-processed to give totals or
averages and the original details are lost so you can't verify it by replicating the
methods used by the original data collectors.
In short, primary data is expensive and difficult to acquire, but it's trustworthy.
Secondary data is cheap and easy to collect, but must be treated with caution.
IMPORTANCE OF STATISTICS
There are three major functions in any business enterprise in which the statistical
methods are useful. These are as follows:
i. The planning of operations: This may relate to either special projects or to
the recurring activities of a firm over a specified period.
ii. The setting up of standards: This may relate to the size of employment,
volume of sales, fixation of quality norms for the manufactured product, norms
for the daily output, and so forth.
iii. The function of control: This involves comparison of actual production
achieved against the norm or target set earlier. In case the production has
fallen short of the target, it gives remedial measures so that such a deficiency
does not occur again.
LIMITATIONS OF STATISTICS
Statistics has a number of limitations; the pertinent ones are as follows:
i. There are certain phenomena or concepts where statistics cannot be used.
This is because these phenomena or concepts are not amenable to
measurement. For example, beauty, intelligence, courage cannot be
quantified. Statistics has no place in all such cases where quantification is not
possible.
ii. Statistics reveal the average behavior, the normal or the general trend. Applying
the 'average' concept to an individual or a particular situation may lead to a
wrong conclusion and sometimes may be disastrous. For example, one may be
misguided when told that the average depth of a river from one bank to the other
is four feet, when there may be some points in between where its depth is far
more than four feet. On this understanding, one may enter those points having
greater depth, which may be hazardous.
iii. Since statistics are collected for a particular purpose, such data may not be
relevant or useful in other situations or cases. For example, secondary data
(i.e., data originally collected by someone else) may not be useful for the
other person.
iv. Statistics are not 100 per cent precise as is Mathematics or Accountancy.
Those who use statistics should be aware of this limitation.
v. In statistical surveys, sampling is generally used as it is not physically
possible to cover all the units or elements comprising the universe. The
results may not be appropriate as far as the universe is concerned. Moreover,
different surveys based on the same size of sample but different sample units
may yield different results.
vi. At times, association or relationship between two or more variables is studied
in statistics, but such a relationship does not indicate a 'cause and effect'
relationship. It simply shows the similarity or dissimilarity in the movement of
the two variables. In such cases, it is the user who has to interpret the results
carefully, pointing out the type of relationship obtained.
SELF-TEST QUESTIONS
1. Define Statistics. Explain its types, and importance to trade, commerce and business.
2. “Statistics is all-pervading”. Elucidate this statement.
3. Write a note on the scope and limitations of Statistics.
4. What are the major limitations of Statistics? Explain with suitable examples.
5. Distinguish between descriptive Statistics and inferential Statistics.
A simple bar chart is used to represent data involving only one variable classified on a spatial,
quantitative or temporal basis. In a simple bar chart, we make bars of equal width but variable
length, i.e. the magnitude of a quantity is represented by the height or length of the bars.
[Table: Years; Profit (million $)]
A multiple bar diagram represents two or more sets of inter-related data
(it facilitates comparison between more than one phenomenon). The
technique of the simple bar chart is used to draw this diagram, but the difference is that we
use different shades, colors, or dots to distinguish between different phenomena. We
draw multiple bar charts when the total of the different phenomena is meaningless.
Example:
Draw a multiple bar chart to represent the import and export of Canada (values in $) for the years 1991
to 1995.
Multiple bar Charts showing the imports and exports of Canada in 1991-1995
Component Bar Chart
A sub-divided or component bar chart is used to represent data in which the total
magnitude is divided into different parts or components.
In this diagram, first we make simple bars for each class taking the total magnitude in that
class, and then divide these simple bars into parts in the ratio of the various components.
This type of diagram shows the variation in different components within each class as
well as between different classes. The sub-divided bar diagram is also known as a component
bar chart or stacked chart.
Example: The table below shows the quantity in hundred kgs of Wheat, Barley and Oats produced on a
certain farm during the years 1991 to 1994.
Construct a component bar chart to illustrate this data
Solution:
To make the component bar chart, first of all we have to take year wise total production.
Percentage Component Bar Chart
A sub-divided bar chart may be drawn on a percentage basis. To do so, we express each component as a
percentage of its respective total. In drawing a percentage bar chart, bars of length equal to 100 for
each class are drawn in the first step and sub-divided in proportion to the percentages of their
components in the second step. The diagram so obtained is called a percentage component bar chart or
percentage stacked bar chart. This type of chart is useful for comparing components while holding the
totals constant.
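The percentage step can be sketched in Python as follows. The production figures are hypothetical, since the table from the example is not reproduced in these notes; the function name `percentage_components` is likewise only illustrative.

```python
# Hypothetical production figures in hundred kg (not the figures from the example)
production = {
    1991: {"Wheat": 30, "Barley": 20, "Oats": 10},
    1992: {"Wheat": 40, "Barley": 25, "Oats": 15},
}

def percentage_components(components):
    """Express each component as a percentage of its class total."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}

percentages = {year: percentage_components(c) for year, c in production.items()}
for year, pct in percentages.items():
    # Each bar has length 100; these are the lengths of its sub-divisions.
    print(year, {name: round(value, 1) for name, value in pct.items()})
```

Every bar sums to 100, so only the relative shares differ from year to year.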
Example:
The table below shows the quantity in hundred kgs of Wheat, Barley and Oats produced on a certain farm during the
years 1991 to 1994.
Solution: The necessary computations for the construction of the percentage bar chart are given
below:
Item: Wheat, Barley, Oats, Total
PIE CHARTS
A pie chart can be used to compare the relation between the whole and its components. A pie chart is a
circular diagram in which the areas of the sectors of a circle represent the components. To construct a
pie chart (sector diagram), we draw a circle with radius equal to the square root of the total. The total
angle of the circle is 360°.
Example:
The following table gives the details of monthly budget of a family. Represent these figures by a suitable
diagram.
Items: Food, Clothing, House Rent, Miscellaneous, Total
Solution table columns: Item, Expenditure ($), Angle of Sector, Cumulative Angle
(rows: Food, Clothing, House Rent, Miscellaneous, Total)
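The sector angles of a pie chart follow from the rule angle = (component / total) × 360°. The sketch below applies this rule to a hypothetical monthly budget, because the amounts in the example table are not available in these notes.

```python
from itertools import accumulate

# Hypothetical monthly budget in dollars (illustrative amounts only)
budget = {"Food": 600, "Clothing": 100, "House Rent": 400, "Miscellaneous": 100}

total = sum(budget.values())

# Angle of each sector: share of the total multiplied by 360 degrees
angles = {item: amount / total * 360 for item, amount in budget.items()}

# Running total of the angles, as in the "Cumulative Angle" column
cumulative = list(accumulate(angles.values()))

for (item, angle), cum in zip(angles.items(), cumulative):
    print(f"{item:<14}{budget[item]:>6}{angle:>8.1f}{cum:>8.1f}")
```

The last cumulative angle should come to 360°, which is a quick check on the arithmetic.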
“A single value which can represent the whole set of data is called an average”. Because an
average tends to lie at, or indicate, the centre of the distribution, it is called a measure of
central tendency; because averages also locate the general position of the data, they are
sometimes called measures of location.
Desirable Qualities of a Good Average:
Introduction
The word average is commonly used in day-to-day conversations. For example, we may
say that Albert is an average boy in a high school class; we may talk of an average
American, average income, etc. When it is said, "Albert is an average student," it means
that he is neither very good nor very bad, but a mediocre student. However, in
statistics the term average has a different meaning.
(2) Median
(3) Mode
However, the most common measures of central tendency or location are the arithmetic
mean, median and mode.
Arithmetic Mean
Arithmetic mean is the amount secured by dividing the sum of values of the items in a
series by their number. Equivalently, the arithmetic average may be defined as the aggregate
of a series of items divided by their number.
Thus, we add all observations (values of all items) together and divide this sum by the
number of observations (or items).
Ungrouped Data
Suppose we have n observations (or measures) \(x_1, x_2, x_3, \ldots, x_n\); then the arithmetic
mean is
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} \]
We shall use the symbol \(\bar{x}\) (pronounced as x bar) to denote the arithmetic mean. Since
we have to write the sum of observations very frequently, we use the usual symbol \(\Sigma\)
(pronounced as sigma) to denote the sum. The symbol \(x_i\) will be used to denote, in
general, the i-th observation. Then the sum \(x_1 + x_2 + x_3 + \cdots + x_n\) will be represented
by \(\sum_{i=1}^{n} x_i\), or simply \(\sum x_i\).
Example A variable takes the values as given below. Calculate the arithmetic mean of
110, 117, 129, 195, 95, 100, 100, 175, 250 and 750.
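\[ \bar{x} = \frac{\sum x_i}{n} = \frac{110 + 117 + 129 + 195 + 95 + 100 + 100 + 175 + 250 + 750}{10} = \frac{2021}{10} = 202.1 \]
where n = 10.
With A = assumed mean = 175, the short-cut method gives \(\bar{x} = A + \bar{u}\).
Calculations:
\(u_i = x_i - A\): -65, -58, -46, +20, -80, -75, -75, 0, +75, +575
\(\sum u_i = 670 - 399 = 271\)
\(\bar{u} = 271/10 = 27.1\)
\(\bar{x} = 175 + 27.1 = 202.1\)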
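The worked example can be checked with a short Python sketch that computes the mean both directly and with the assumed mean A = 175. The variable names are illustrative.

```python
values = [110, 117, 129, 195, 95, 100, 100, 175, 250, 750]
n = len(values)

# Direct method: sum of the observations divided by their number
direct_mean = sum(values) / n

# Short-cut method: deviations u_i = x_i - A from an assumed mean A
A = 175
deviations = [x - A for x in values]
shortcut_mean = A + sum(deviations) / n

print(sum(values), sum(deviations))  # 2021 271
print(direct_mean, shortcut_mean)
```

Both methods give the same mean whatever value is chosen for A; a convenient A only makes the hand calculation shorter.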
Solution:
n = 5
Arithmetic mean =
Short-cut Method:
Sometimes the values of x are very big, and in that case the short-cut method is used to
simplify the calculation. For this, first assume a mean (called the assumed
mean). Let it be A. Now find the deviations of all the values of x from A. We get a
new variable \(u_i = x_i - A\).
Now find \(\bar{u} = \frac{\sum u_i}{n}\); then \(\bar{x} = A + \bar{u}\).
Family:                     A    B    C    D    E    F    G    H    I    J
Expenditure (in dollars):   300  700  100  750  500  80   120  250  100  370
\[ I = \frac{H - L}{k} \]
where
I is the class interval
H is the highest observed value
L is the lowest observed value
k is the number of classes
Unequal classes are not normally used because they make computation of the mean,
median and the mode difficult. In a frequency distribution the class interval (CI) is the
difference between two consecutive lower limits.
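As a rough illustration of how a class interval leads to a frequency distribution, the following Python sketch computes I = (H - L)/k, rounds it up, and tallies the observations. The scores and the choice k = 3 are hypothetical, and rounding the width up is an assumption made here so that all values are covered.

```python
import math

# Hypothetical raw scores and number of classes (not data from these notes)
scores = [52, 55, 61, 63, 64, 70, 72, 78, 85, 93]
k = 3

H, L = max(scores), min(scores)
width = math.ceil((H - L) / k)  # class interval I = (H - L) / k, rounded up

frequencies = [0] * k
for s in scores:
    # Class j runs from L + j*width up to, but excluding, L + (j+1)*width;
    # any value at or beyond the last lower limit is counted in the last class.
    j = min((s - L) // width, k - 1)
    frequencies[j] += 1

for j, f in enumerate(frequencies):
    lower = L + j * width
    print(f"{lower}-<{lower + width}: {f}")
```

The frequencies should always add up to the number of observations, which is a useful check after tallying.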
Class interval   Tally     Frequency (f)   Midpoint (m)   fm
51-<59           III       3               55             165
59-<67           IIIII I   6               63             378
67-<75           IIII      4               71             284
75-<83           III       3               79             237
83-<91           II        2               87             174
91-<99           II        2               95             190
Total                      20                             1428
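\[ \bar{X} = \frac{\sum_{i=1}^{n} f_i m_i}{\sum_{i=1}^{n} f_i} \]
where \(f_i\) is the class frequency and \(m_i\) is the class midpoint.
\[ \bar{x} = \frac{1428}{20} = 71.4 \]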
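The grouped-data calculation can be reproduced with a short Python sketch built from the class limits and frequencies of the table above; the midpoints and the products fm are recomputed rather than typed in.

```python
# (lower limit, upper limit, frequency) for each class of the table
classes = [(51, 59, 3), (59, 67, 6), (67, 75, 4), (75, 83, 3), (83, 91, 2), (91, 99, 2)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

sum_fm = sum(f * m for f, m in zip(freqs, midpoints))  # sum of f * m
sum_f = sum(freqs)                                     # total frequency
grouped_mean = sum_fm / sum_f

print(sum_fm, sum_f, grouped_mean)
```

With the midpoint 79 and frequency 3, the fourth class contributes 237 to the total of 1428, and the mean is 1428/20 = 71.4.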
In the calculation of the arithmetic mean, the importance of all the items was considered to be equal.
However, there may be situations in which the items under consideration are not of equal
importance. For example, we may want to find the average marks per subject of a student who appeared in
different subjects like Mathematics, Statistics, Physics and Biology. These subjects do not have
equal importance.
The arithmetic mean computed by considering relative importance of each item is called weighted
arithmetic mean. To give importance to each item under consideration, we assign a number called
weight to each item in proportion to its relative importance.
\[ \bar{X}_w = \frac{\sum wx}{\sum w} \]
Example: A student obtained 40, 50, 60, 80, and 45 marks in the subjects of Maths, Statistics,
Physics, Chemistry and Biology. Assuming weights 5, 2, 4, 3, and 1 respectively for the above subjects,
calculate the weighted arithmetic mean.
Subjects     Marks Obtained   Weight
Math         40               5
Statistics   50               2
Physics      60               4
Chemistry    80               3
Biology      45               1
Total        275              15
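A minimal Python sketch of the weighted mean for this example is given below. It assumes that the five marks and five weights pair with the five subjects in the order Maths, Statistics, Physics, Chemistry, Biology, which is how the subject list in the table reads.

```python
# Assumed pairing of marks and weights with the five subjects, in the order listed
marks = {"Maths": 40, "Statistics": 50, "Physics": 60, "Chemistry": 80, "Biology": 45}
weights = {"Maths": 5, "Statistics": 2, "Physics": 4, "Chemistry": 3, "Biology": 1}

sum_wx = sum(weights[s] * marks[s] for s in marks)  # sum of w * x
sum_w = sum(weights.values())                       # sum of w
weighted_mean = sum_wx / sum_w

simple_mean = sum(marks.values()) / len(marks)      # unweighted mean, for comparison
print(sum_wx, sum_w, weighted_mean, simple_mean)
```

Under this pairing the weighted mean is 825/15 = 55. It happens to equal the simple mean here; in general the two differ.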
It is capable of further algebraic treatment such as finding the sum of the values of the
observations, if the mean and the total number of the observations are given; finding the
combined arithmetic mean when different groups are given etc.
Demerits
It is affected by outliers or extreme values. For example, the average (A.) mean of 10,
Due to the outlier 500 the A. mean of the four numbers is raised to 137.5. In such a
case A. mean is not a good representative of the given data.
It often gives absurd results, such as 4.4 children per family.
It cannot be calculated when open-end class intervals are present in the data.
i. The sum of the deviations of the individual items from the arithmetic
mean is always zero. Because the deviations in the positive direction
balance those in the negative direction, the arithmetic mean is regarded as a
measure of central tendency.
ii. The sum of the squared deviations of the individual items from the
arithmetic mean is always minimum. In other words, the sum of the
squared deviations taken from any value other than the arithmetic
mean will be higher.
iii. As the arithmetic mean is based on all the items in a series, a change
in the value of any item will lead to a change in the value of the
arithmetic mean.
iv. In the case of highly skewed distribution, the arithmetic mean may get
distorted on account of a few items with extreme values. In such a
case, it may cease to be the representative characteristic of the
distribution.
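Properties (i) and (ii) can be checked numerically. The sketch below reuses the ten values from the arithmetic mean example; the alternative centres tried (the mean shifted by 10, and the value 175) are arbitrary choices for illustration.

```python
values = [110, 117, 129, 195, 95, 100, 100, 175, 250, 750]
mean = sum(values) / len(values)

# Property (i): deviations from the mean sum to zero (up to rounding error)
sum_dev = sum(x - mean for x in values)

def sum_sq_dev(centre):
    """Sum of squared deviations of the values from a chosen centre."""
    return sum((x - centre) ** 2 for x in values)

# Property (ii): any centre other than the mean gives a larger sum of squares
smaller_at_mean = all(
    sum_sq_dev(mean) < sum_sq_dev(other) for other in (mean - 10, mean + 10, 175)
)
print(abs(sum_dev) < 1e-9, smaller_at_mean)
```

Property (iv) is visible in the same data: the single value 750 pulls the mean up to 202.1, above eight of the ten observations.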
THE MEDIAN
It is the value of the size of the central item of the arranged data (data arranged in
ascending or descending order). Thus, it is the value of the middle item and divides
the series into two equal parts. The median is defined as the value of the middle item (or the
mean of the values of the two middle items) when the data are arranged in an
ascending or descending order of magnitude.
The median is that value of the variable which divides the group into two equal parts,
one part comprising all values greater and the other all values lesser than the median.
For example, the daily wages of 7 workers are 5, 7, 9, 11, 12, 14 and 15 dollars. This
series contains 7 terms. The fourth term i.e. $11 is the median.
Set the individual series either in the ascending (increasing) or in the descending
(decreasing) order of the size of its items or observations.
The median = size of the \(\left(\frac{n+1}{2}\right)\)th observation
Example The following figures represent the number of books issued at the counter of a
Statistics library on 11 different days. 96, 180, 98, 75, 270, 80, 102, 100, 94, 75 and
200. Calculate the median.
Solution:
Arrange the data in the ascending order as 75, 75, 80, 94, 96, 98, 100, 102, 180, 200,
270.
With n = 11, the median is the size of the (11 + 1)/2 = 6th observation, which is 98.
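The procedure for ungrouped data can be sketched in Python as follows. The function `median` is written out here for illustration; it sorts the data and picks the middle item, or the mean of the two middle items when n is even.

```python
def median(values):
    """Middle item of the sorted data, or the mean of the two middle items."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

books = [96, 180, 98, 75, 270, 80, 102, 100, 94, 75, 200]
wages = [5, 7, 9, 11, 12, 14, 15]

print(median(books))  # 98
print(median(wages))  # 11
```

For the library data the 6th of the 11 ordered values is 98, and for the wages example the 4th of the 7 values is 11.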
2468, 591, 437, 20, 213, 143, 1490, 407, 284, 176, 263, 19, 181, 777, 387, 302, 213,
204, 153, 733, 391, 176 178, 122, 532, 360, 65, 260, 193, 92, 672, 258, 239, 160, 147,
151.
Solution:
20, 65, 92, 131, 142, 143, 147, 151, 153, 160, 169, 176, 178, 181, 193, 204, (213, 39),
258, 263, 260, 384, 302, 360, 387, 391, 407, 437, 522, 591, 672, 733, 777, 1490, 2488.
Example 2: The following data give the savings bank accounts balances of nine sample
households selected in a survey. The figures are in rupees.
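Steps:
Determine the particular class in which the value of the median lies. Use N/2 as the rank
of the median, where N is the total frequency.
After ascertaining the class in which the median lies, the following formula is used for
determining the exact value of the median.
\[ \text{Median} = l + \frac{\frac{N}{2} - c.f.}{f} \times i \]
where l = lower limit of the median class, the class in which the middle item of the
distribution lies,
c.f. = cumulative frequency of the class preceding the median class,
f = simple frequency of the median class,
i = width of the median class.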
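The interpolation can be sketched in Python using the standard formula Median = l + ((N/2 - c.f.) / f) × i, where l is the lower limit of the median class, c.f. the cumulative frequency before it, f its frequency and i its width. These symbol names are assumptions of this sketch. Because the data for the median examples are not reproduced in these notes, the frequency distribution from the grouped arithmetic mean example is reused for illustration.

```python
# (lower limit, upper limit, frequency), reused from the grouped mean example
classes = [(51, 59, 3), (59, 67, 6), (67, 75, 4), (75, 83, 3), (83, 91, 2), (91, 99, 2)]

def grouped_median(classes):
    """Interpolated median for (lower, upper, frequency) classes in ascending order."""
    n = sum(f for _, _, f in classes)
    rank = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cumulative + f >= rank:
            # The median class is found: interpolate within it
            return lower + (rank - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("no classes given")

print(grouped_median(classes))  # 69.0
```

Here N = 20, so the rank is 10; the median class is 67-<75 with c.f. = 9 and f = 4, giving 67 + (1/4) × 8 = 69.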
It should be noted that while interpolating the median value of frequency distribution it is assumed that
the variable is continuous and that there is an orderly and even distribution of items within each class.
Example Calculate the median for the following and verify it graphically.
Solution:
Median =
Therefore, Median
Sometimes the series is given in the descending order of magnitude. In this situation
convert the series in the ascending order of magnitude and then using the regular
formula, the median can be calculated or the series can be put in the descending order
of the magnitude and an alternative formula be used to calculate the median.
It is not affected by extreme values like the arithmetic mean. For example, 5 persons
have their incomes $2000, $2500, $2600, $3000, $5000. The median would be $2600
while the arithmetic mean would be $3020.
It can be used for qualitative studies.
Even if the extreme values are unknown, median can be calculated if one knows the
number of items.
Demerits of Median
1) The median can also be determined graphically whereas the arithmetic mean
cannot be ascertained in this manner.
2) As it is not influenced by the extreme values, it is preferred in case of a
distribution having extreme values.
3) In case of the qualitative data where the items are not counted or measured but
are scored or ranked, it is the most appropriate measure of central tendency.
Mode
It is the size of that item which possesses the maximum frequency. The value of the
variable which occurs most frequently in a distribution is called the mode. It is the most
common value and the point of maximum density.
Ungrouped Data
Individual series: The mode of this series can be obtained by mere inspection. The
number which occurs most often is the mode.
Example Locate mode in the data 7, 12, 8, 5, 9, 6, 10, 9, 4, 9, 9
If in any series two or more numbers have the maximum frequency, then the mode will
be difficult to calculate. Such series are called bi-modal, tri-modal or multi-modal
series.
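Locating the mode by inspection amounts to counting how often each value occurs. A minimal Python sketch follows; the function `modes` returns every value with the maximum frequency, so a bi-modal series yields two values. The second data set is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value that attains the maximum frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [7, 12, 8, 5, 9, 6, 10, 9, 4, 9, 9]
print(modes(data))             # [9]
print(modes([1, 1, 2, 2, 3]))  # [1, 2], a bi-modal series (illustrative data)
```

In the example series the value 9 occurs four times and every other value once, so the mode is 9.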
Grouped Data
Steps:
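\[ \text{Mode} = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times (L_2 - L_1) \]
Where
\(L_1\) and \(L_2\) are the lower and upper limits of the modal class,
\(f_1\) is the frequency of the modal class,
\(f_0\) is the frequency of the class preceding the modal class,
\(f_2\) is the frequency of the class following the modal class.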
Solution:
Here the maximum frequency is 12, corresponding to the class interval (35 -< 40) which
is the modal class.
Therefore, by interpolation,
\[ \text{Mode} = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times (L_2 - L_1) = 35 + \frac{12 - 8}{24 - 8 - 7} \times 5 = 35 + 2.22 = 37.22 \]
Mode = $37.22
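The interpolation for the mode can be written as a small Python function. The parameter names mirror the symbols in the formula; the figures are those of the worked example (modal class 35-<40 with frequency 12, and the frequencies 8 and 7 used in the calculation above).

```python
def grouped_mode(l1, l2, f1, f0, f2):
    """Mode = L1 + (f1 - f0) / (2*f1 - f0 - f2) * (L2 - L1)."""
    return l1 + (f1 - f0) / (2 * f1 - f0 - f2) * (l2 - l1)

mode = grouped_mode(l1=35, l2=40, f1=12, f0=8, f2=7)
print(round(mode, 2))  # 37.22
```

The result always lies inside the modal class, closer to the neighbouring class with the higher frequency.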
MERITS OF MODE
It is simple to calculate.
It is easy to understand. Everyone is used to the idea of the average size of a garment, an
average American, etc.
Unlike the arithmetic mean, it is always a value that can be found in the series.
It is not necessary to know all the items. What we need is the point of maximum
frequency.
DEMERITS
(i) When a distribution is skewed to the right, then mean > median > mode. Generally,
income distribution is skewed to the right, where a large number of families have
relatively low income and a small number of families have extremely high income. In
such a case, the mean is pulled up by the extreme high incomes and the relation among
these three measures is as shown in the figure. Here, we find that mean > median >
mode.
(ii) When a distribution is skewed to the left, then mode> median> mean. This is
because here mean is pulled down below the median by extremely low values. This is
shown as in the figure.
(iii) Given the mean and median of a unimodal distribution, we can determine whether it
is skewed to the right or left. When mean> median, it is skewed to the right; when
median> mean, it is skewed to the left. It may be noted that the median is always in the
middle between mean and mode.
Geometric Mean
It is another measure of central tendency. The geometric mean is the nth positive root of the
product of n positive given values.
It is useful in finding the average change in percentages, ratios, indexes and growth rates over time. The
geometric mean is most suitable in the following cases: averaging rates of change, the
compound interest formula, discounting, and capitalization. It is useful for quantities such as GDP, whose
growth compounds, with each period's value building on the previous one. The GM is always less than or
equal to, and never more than, the arithmetic mean. All the data values must be positive for one to use
the GM.
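If we have a series of n positive values in which \(x_1, x_2, x_3, \ldots, x_k\) are repeated
\(f_1, f_2, f_3, \ldots, f_k\) times respectively, then
\[ G.M. = \left( x_1^{f_1} \, x_2^{f_2} \cdots x_k^{f_k} \right)^{1/n} \]
Where \(n = f_1 + f_2 + \cdots + f_k\).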
Example:
Find the Geometric Mean of the values 10, 5, 15, 8, 12
Solution:
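The calculation for the values 10, 5, 15, 8, 12 can be sketched in Python in two equivalent ways: as the n-th root of the product, and through logarithms. The function names are illustrative, and `math.prod` requires Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.prod(values) ** (1 / len(values))

def geometric_mean_logs(values):
    """Same quantity via logarithms: antilog of the mean of log10(x)."""
    return 10 ** (sum(math.log10(v) for v in values) / len(values))

data = [10, 5, 15, 8, 12]
gm = geometric_mean(data)
am = sum(data) / len(data)
print(round(gm, 2), round(geometric_mean_logs(data), 2), am)
```

The product of the five values is 72000, so the geometric mean is its fifth root, which is below the arithmetic mean of 10, as the text states it must be.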
Example:
Find the Geometric Mean of the following Data
Solution:
Using the formula of geometric mean for grouped data, geometric mean in this case will
become:
The method explained above for the calculation of the geometric mean is useful when the
number of values in the given data is small. When a set of data contains a large number of
values, we need an alternative way of computing the geometric mean, which uses logarithms:
\[ G.M. = \text{antilog}\left( \frac{\sum f \log x}{\sum f} \right) \]
Example:
Find the Geometric Mean for the following
distribution of students’ marks:
Marks             0-<30   30-<50   50-<80   80-<100
No. of Students   20      30       40       10
Solution
Marks     f    x    f log x
0-<30     20   15   20 log 15 = 23.5218
30-<50    30   40   30 log 40 = 48.0618
50-<80    40   65   40 log 65 = 72.5165
80-<100   10   90   10 log 90 = 19.5424
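The logarithmic method for grouped data can be sketched in Python as below. One assumption is made loudly here: the class 30-<50 with 30 students does not appear as a row in the extracted solution table and is inferred from the list of frequencies (20, 30, 40, 10) and the neighbouring class limits.

```python
import math

# (lower limit, upper limit, number of students)
# The 30-<50 class with frequency 30 is an inferred row (see the note above).
classes = [(0, 30, 20), (30, 50, 30), (50, 80, 40), (80, 100, 10)]

n = sum(f for _, _, f in classes)

# f * log10(midpoint) for each class, as in the last column of the table
f_log_x = [f * math.log10((lo + hi) / 2) for lo, hi, f in classes]

# Geometric mean = antilog of (sum of f log x) / (sum of f)
gm = 10 ** (sum(f_log_x) / n)

print([round(v, 4) for v in f_log_x])
print(round(gm, 2))
```

The first, third and fourth entries of `f_log_x` reproduce the tabulated values 23.5218, 72.5165 and 19.5424.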
ADVANTAGES OF G. M.
i. Geometric mean is based on each and every observation in the data set.
ii. It is rigidly defined.
iii. It is more suitable while averaging ratios and percentages as also in calculating growth rates.
iv. As compared to the arithmetic mean, it gives more weight to small values and less weight to
large values. As a result of this characteristic, the geometric mean is generally less than
the arithmetic mean. At times it may be equal to the arithmetic mean.
v. It is capable of algebraic manipulation. If the geometric means of two or more series are known
along with their respective frequencies, then a combined geometric mean can be calculated
by using logarithms.
LIMITATIONS OF G.M.
In view of the limitations mentioned above, the geometric mean is not frequently used.
HARMONIC MEAN
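The harmonic mean is defined as the reciprocal of the arithmetic mean of the
reciprocals of individual observations. It is mainly used to find the average speed.
For ungrouped data:
\[ HM = \frac{n}{\sum \frac{1}{x}} \]
For grouped data:
\[ HM = \frac{\sum f}{\sum \frac{f}{x}} \]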
Calculate the harmonic mean of the numbers: 13.5, 14.5, 14.8, 15.2 and 16.1
Solution:
The harmonic mean is calculated as below:
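A minimal Python sketch of this calculation, using the definition of the harmonic mean as the number of observations divided by the sum of their reciprocals, is shown below. The function name is illustrative.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return len(values) / sum(1 / v for v in values)

data = [13.5, 14.5, 14.8, 15.2, 16.1]
hm = harmonic_mean(data)
am = sum(data) / len(data)
print(round(hm, 2), round(am, 2))
```

The harmonic mean comes out slightly below the arithmetic mean of 14.82, because smaller observations receive more weight.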
Example 2
The main advantage of the harmonic mean is that it is based on all observations in a
distribution and is amenable to further algebraic treatment. When we desire to give
greater weight to smaller observations and less weight to the larger observations, then
the use of harmonic mean will be more suitable. As against these advantages, there are
certain limitations of the harmonic mean. First, it is difficult to understand as well as
difficult to compute. Second, it cannot be calculated if any of the observations is zero or
negative. Third, it is only a summary figure, which may not be an actual observation in
the distribution.
It is worth noting that the harmonic mean is never greater than the geometric mean,
which in turn is never greater than the arithmetic mean; the three are equal only when all
the observations are equal. This is because the harmonic mean assigns
lesser importance to higher values. Since the harmonic mean is based on reciprocals, and
the reciprocals of higher values are lower than those of lower values,
it is a lower average than the arithmetic mean as well as the geometric mean.
Example 1: It takes ship A 10 days to cross the Pacific Ocean; ship B takes 15 days
and ship C takes 20 days. (i) What is the average number of days taken by a ship to
cross the Pacific Ocean? (ii) What is the average number of days taken by a cargo to
cross the Pacific Ocean when the ships are hired for 60 days?
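One way to read this example, offered as an interpretation rather than as the worked solution, is sketched in Python below: part (i) averages over ships and uses the arithmetic mean, while part (ii) averages over crossings made in a fixed hiring period and leads to the harmonic mean.

```python
days = {"A": 10, "B": 15, "C": 20}

# (i) Average crossing time per ship: arithmetic mean
per_ship = sum(days.values()) / len(days)

# (ii) Each ship is hired for 60 days, so faster ships complete more crossings
hire = 60
crossings = {ship: hire / d for ship, d in days.items()}  # 6, 4 and 3 crossings
per_cargo = hire * len(days) / sum(crossings.values())    # ship-days per crossing

# The same figure is the harmonic mean of the crossing times
harmonic = len(days) / sum(1 / d for d in days.values())

print(per_ship, round(per_cargo, 2), round(harmonic, 2))
```

Over 60 days the three ships make 13 crossings in 180 ship-days, about 13.85 days per cargo, compared with 15 days per ship.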
MEASURES OF DISPERSION
The measures of central tendencies (i.e. means) indicate the general magnitude of the
data and locate only the center of a distribution of measures. They do not establish the
degree of variability or the spread out or scatter of the individual items and their
deviation from (or the difference with) the means.
Students   Group X   Group Y   Group Z
1          50        45        30
2          50        50        45
3          50        55        75
mean       50        50        50
Thus, the three groups have the same mean, i.e. 50. In fact, the medians of groups X and Y
are also equal. If one were to conclude that the students from the three groups are of
equal capabilities, that would be totally wrong. Close examination reveals that in
group X all students have marks equal to the mean, students from group Y are very close
to the mean, but in the third group Z the marks are widely scattered. It is thus clear that
a measure of central tendency alone is not sufficient to describe the data.
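The comparison of the three groups can be reproduced with a few lines of Python. The spread is measured here by the difference between the largest and smallest mark, which is the simplest way to show that equal means can hide very different scatter.

```python
groups = {"X": [50, 50, 50], "Y": [45, 50, 55], "Z": [30, 45, 75]}

summary = {}
for name, marks in groups.items():
    mean = sum(marks) / len(marks)
    spread = max(marks) - min(marks)  # largest mark minus smallest mark
    summary[name] = (mean, spread)
    print(name, mean, spread)
```

All three means are 50, while the spreads are 0, 10 and 45 marks respectively.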
Definition of dispersion: The arithmetic mean of the deviations of the values of the
individual items from the measure of a particular central tendency used. In measuring
dispersion, it is imperative to know the amount of variation (absolute measure) and the
degree of variation (relative measure). In the former case we consider the range, mean
deviation, standard deviation etc. In the latter case we consider the coefficient of range,
the coefficient of mean deviation, the coefficient of variation etc.
MEASURES OF DISPERSION
RANGE
In any statistical series, the difference between the largest and the smallest values is
called as the range.
The range is based on the two extreme observations. It gives no weight to the central
values of the data. It is a poor measure of dispersion and does not give a good picture
of the overall spread of the observations with respect to the center of the observations.
Let us consider three groups of the data which have the same range:
Coefficient of Range: The relative measure of the range. It is used in the comparative
In all the three groups the range is 50 – 30 = 20. In group A there is concentration of
observations in the center. In group B the observations are clustered at the two
extremes, and in group C the observations are almost equally distributed in the interval
from 30 to 50. The range fails to explain these differences in the three groups of data.
This defect in range cannot be removed even if we calculate the coefficient of range
which is a relative measure of dispersion. If we calculate the range of a sample, we
cannot draw any inferences about the range of the population.
Coefficient of Range:
It is a relative measure of dispersion and is based on the value of the range. It is also
called the range coefficient of dispersion. It is defined as:
\[ \text{Coefficient of Range} = \frac{H - L}{H + L} \]
where H is the highest value and L is the lowest value. The range H - L is standardized
by the total H + L.
Let us take two sets of observations. Set A contains the marks of five students in
Financial Mathematics out of 25 marks and set B contains the marks of the same students
in QUAB out of 100 marks.
Set A: 10, 15, 18, 20, 20
Set B: 30, 35, 40, 45, 50
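The values of the range and the coefficient of range are calculated as:
Set A (Financial Mathematics): Range = 20 - 10 = 10; Coefficient of Range = 10/30 = 0.33
Set B (QUAB): Range = 50 - 30 = 20; Coefficient of Range = 20/80 = 0.25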
In set A the range is 10 and in set B the range is 20. At first sight there seems to be
greater dispersion in set B, but this is not true. The range of 20 in set B is for
large observations and the range of 10 in set A is for small observations. Thus 20 and
10 cannot be compared directly because their base is not the same: marks in Financial
Mathematics are out of 25 and marks in QUAB are out of 100. It therefore makes no sense
to compare 10 with 20. When we convert these two values into coefficients of range, we
see that the coefficient of range for set A is greater than that for set B. Thus there is
greater dispersion or variation in set A. The marks of the students in QUAB are more
stable than their marks in Financial Mathematics.
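The comparison of set A and set B can be checked with a short Python sketch. The function names are illustrative and are not part of the notes; the data are the two sets of marks given above.

```python
def value_range(values):
    """Range: highest value minus lowest value (H - L)."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative measure of the range: (H - L) / (H + L)."""
    high, low = max(values), min(values)
    return (high - low) / (high + low)


set_a = [10, 15, 18, 20, 20]  # Financial Mathematics, marks out of 25
set_b = [30, 35, 40, 45, 50]  # QUAB, marks out of 100

for name, marks in (("Set A", set_a), ("Set B", set_b)):
    print(name, value_range(marks), round(coefficient_of_range(marks), 3))
```

Set B has the larger range, but set A has the larger coefficient of range, which is the point made in the text.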
Question 1
The following are the wages of 8 workers in a factory. Find the range and the
coefficient of range. Wages ($): 1400, 1450, 1520, 1380, 1485, 1495, 1575, 1440.
Example 2:
Find the range of the weight of the students of a university.
Weights (Kg)
Number of Students
Solution:
Method 1:
Here H, the upper class boundary of the highest class, is 74.5, and
L, the lower class boundary of the lowest class, is 59.5.
Range = H - L = 74.5 - 59.5 = 15
\[ \text{Coefficient of Range} = \frac{H - L}{H + L} = \frac{15}{134} = 0.1119 \]
MEAN ABSOLUTE DEVIATION
The mean absolute deviation (MD) measures the amount by which the values in a sample vary
from their mean. The MD is the arithmetic mean of the absolute values of the deviations
from the mean.
\[ MD = \frac{\sum |x_i - \bar{x}|}{n} \]
where: $x_i$ = value of each observation
$\bar{x}$ = sample mean
$n$ = number of observations
$|\ |$ indicates the absolute value
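E.g. the numbers of TVs sold per day at OK Entumbane are: 20, 40, 60, 70, and 30.
Calculate the mean absolute deviation.
The mean is 220/5 = 44, the absolute deviations from the mean are 24, 4, 16, 26 and 14,
and MD = 84/5 = 16.8. This shows that the number of TVs sold at OK Entumbane deviates on
average by about 17 TVs from the mean of 44 TVs per day.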
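The TV sales calculation can be reproduced with a minimal Python sketch. The function name is illustrative; the data are the five daily sales figures from the example.

```python
def mean_absolute_deviation(values):
    """Arithmetic mean of the absolute deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


tv_sales = [20, 40, 60, 70, 30]  # TVs sold at OK Entumbane

mean_sales = sum(tv_sales) / len(tv_sales)
md = mean_absolute_deviation(tv_sales)
print(mean_sales, md)
```

The mean is 44 and the mean absolute deviation is 16.8, which the text rounds to 17.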
Mean Absolute Deviation for grouped data
\[ MD = \frac{\sum f_i |m_i - \bar{x}|}{n} \]
$f_i$ - class frequency
$m_i$ - class midpoint
$n$ - total frequency
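A minimal Python sketch of the grouped-data calculation follows. The frequency table is hypothetical (it is not taken from these notes), and the sketch assumes the mean of grouped data is computed as the frequency-weighted mean of the class midpoints.

```python
def grouped_mean(midpoints, frequencies):
    """Frequency-weighted mean of the class midpoints."""
    n = sum(frequencies)
    return sum(f * m for f, m in zip(frequencies, midpoints)) / n


def grouped_mean_absolute_deviation(midpoints, frequencies):
    """MD for grouped data: sum of f * |m - mean| divided by n."""
    n = sum(frequencies)
    mean = grouped_mean(midpoints, frequencies)
    return sum(f * abs(m - mean) for f, m in zip(frequencies, midpoints)) / n


# Hypothetical frequency table: class midpoints and class frequencies.
midpoints = [62, 67, 72]
frequencies = [5, 10, 5]

print(grouped_mean(midpoints, frequencies))
print(grouped_mean_absolute_deviation(midpoints, frequencies))
```

For this table the mean is 67 and the mean absolute deviation is 50/20 = 2.5.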
VARIANCE
Variance is defined as the arithmetic mean of the squared deviations from the mean (for a
sample, the sum of squared deviations is divided by n - 1 rather than n). It is
non-negative and is zero only when all observations are the same. A small variance
indicates that the observations have values close to the mean, i.e. the observations are
more clustered around the mean, and therefore the mean is more representative as a
measure of central tendency. The variance is difficult to interpret because it is
expressed in squared units.
\[ \text{Sample Variance: } s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]
where $x_i$ is the i-th observation
$\bar{x}$ is the sample mean
$n$ is the sample size
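As a rough illustration, the sample variance can be computed in Python as follows. The sketch reuses the TV sales figures from the mean absolute deviation example; the function name is illustrative, and the result is cross-checked against `statistics.variance` from the standard library, which also divides by n - 1.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


tv_sales = [20, 40, 60, 70, 30]  # same data as the TV sales example
identical = [7, 7, 7, 7]         # hypothetical data with no variation

print(sample_variance(tv_sales))             # squared deviations sum to 1720
print(statistics.variance(tv_sales))         # library cross-check
print(sample_variance(identical))            # zero when all values are equal
```

The squared deviations sum to 1720, so the sample variance is 1720/4 = 430, in units of TVs squared.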
STANDARD DEVIATION
It is the square root of the arithmetic mean of the squared deviations of the various
values from their arithmetic mean, i.e. the square root of the variance. For a sample it
is denoted by s. The standard deviation is in the same units as the original
observations. If the original observations are in grams, the value of the standard
deviation will also be in grams.
\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \]
For grouped data:
\[ s = \sqrt{\frac{\sum f_i (m_i - \bar{x})^2}{n - 1}} \]
where $n$ is the sample size
$f_i$ is the class frequency
$m_i$ is the class midpoint
$\bar{x}$ is the sample mean
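The grouped-data standard deviation can be sketched in Python as below. The frequency table is hypothetical, and the sketch assumes the sample mean of grouped data is the frequency-weighted mean of the class midpoints.

```python
import math


def grouped_standard_deviation(midpoints, frequencies):
    """Square root of sum f * (m - mean)^2 divided by n - 1."""
    n = sum(frequencies)
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    squared = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints))
    return math.sqrt(squared / (n - 1))


# Hypothetical frequency table: class midpoints and class frequencies.
midpoints = [62, 67, 72]
frequencies = [5, 10, 5]

sd = grouped_standard_deviation(midpoints, frequencies)
print(round(sd, 2))
```

Here the weighted squared deviations sum to 250 and n - 1 = 19, so the standard deviation is the square root of 250/19, in the same units as the midpoints.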
Merits:
Demerits:
EXAMPLE: The following marks were scored by students in a company law exam.
Mean = 47.29
\[ s = \sqrt{148.3195} = 12.18 \]
CO-EFFICIENT OF VARIATION
\[ C.V. = \frac{s}{\bar{x}} \times 100 \]
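Example: Calculate the standard deviation and its coefficient of variation from the
following data.
Calculations:
i) \[ \sum (x_i - \bar{x})^2 = 446 \]
ii) \[ sd = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{446}{10}} = 6.7 \]
iii) \[ C.V. = \frac{sd}{\bar{x}} \times 100 = \frac{6.7}{15} \times 100 \approx 45\% \]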
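The three steps of this calculation can be checked with a short Python sketch. It uses only the summary figures shown in the calculation (sum of squared deviations 446, n - 1 = 10, mean 15); the sample size n = 11 is inferred from n - 1 = 10 and the variable names are illustrative.

```python
import math

sum_squared_deviations = 446  # step i) in the calculation
n = 11                        # inferred from n - 1 = 10
mean = 15                     # sample mean used in step iii)

# Step ii): sample standard deviation.
sd = math.sqrt(sum_squared_deviations / (n - 1))

# Step iii): coefficient of variation as a percentage of the mean.
cv = sd / mean * 100

print(round(sd, 1), round(cv))
```

Rounded as in the notes, this gives a standard deviation of 6.7 and a coefficient of variation of about 45%.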
Percentile
Percentiles divide the ordered values into 100 parts of equal size, each comprising 1% of
the observations. The median is the 50th percentile.
The nth percentile is the value (or size) such that n% of the values in the whole data
set lie below it. For example, a score that is 7% from the topmost score would be at the
93rd percentile, as it is above 93% of the other scores.
PERCENTILE RANGE
It is used as one of the measures of dispersion. For a set of data it is defined as
P90 - P10, where P90 and P10 are the 90th and 10th percentiles respectively. The
semi-percentile range,
QUARTILE
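Quartiles divide the ordered values into four parts of equal size, each comprising 25% of
the observations. The median is the second quartile, below which 50% of the values fall.
The first and the third quartiles are referred to as hinges. For N ordered observations,
the positions of the quartiles are:
\[ Q_1 \text{ position} = \frac{N + 1}{4} \]
\[ Q_2 \text{ position} = \frac{2(N + 1)}{4} \]
\[ Q_3 \text{ position} = \frac{3(N + 1)}{4} \]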
INTERQUARTILE RANGE
If we concentrate on two extreme values (as in the case of range), we don’t get any idea
about the scatter of the data within the range (i.e. the two extreme values). If we discard
these two values the limited range thus available might be more informative. For this
reason the concept of the inter-quartile range was developed. It is the range which
includes the middle 50% of the distribution. Here 1/4 (one quarter) of the lower end and
1/4 (one quarter) of the upper end of the observations are excluded.
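Now the lower quartile (Q1) is the 25th percentile and the upper quartile (Q3) is the 75th
percentile. NB: the 50th percentile is the middle quartile (Q2), which is in fact the
median. Thus, symbolically,
\[ \text{Inter-quartile range} = Q_3 - Q_1 \]
Therefore the quartile deviation (semi-inter-quartile range) is
\[ \text{Q.D. (SIQR)} = \frac{Q_3 - Q_1}{2} \]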
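The quartile positions and the inter-quartile range can be sketched in Python as follows. The data set is hypothetical. The sketch assumes the usual definitions (inter-quartile range as Q3 - Q1 and quartile deviation as half of that), and it uses linear interpolation when a position is not a whole number, which is a common convention rather than something stated in these notes.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in sorted data, interpolating linearly."""
    lower_index = int(position)        # whole part of the position
    fraction = position - lower_index  # fractional part of the position
    if lower_index < 1:
        return ordered[0]
    if lower_index >= len(ordered):
        return ordered[-1]
    lower = ordered[lower_index - 1]
    upper = ordered[lower_index]
    return lower + fraction * (upper - lower)


def quartiles(values):
    """Q1, Q2, Q3 using positions (N+1)/4, 2(N+1)/4 and 3(N+1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    return tuple(value_at_position(ordered, k * (n + 1) / 4) for k in (1, 2, 3))


data = [30, 12, 20, 40, 15, 25, 18]  # hypothetical observations, N = 7

q1, q2, q3 = quartiles(data)
iqr = q3 - q1                # range of the middle 50% of the data
quartile_deviation = iqr / 2  # semi-inter-quartile range

print(q1, q2, q3, iqr, quartile_deviation)
```

With N = 7 the positions are 2, 4 and 6, so the quartiles are the 2nd, 4th and 6th ordered values: 15, 20 and 30.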
part is mirror image of the other
It may happen that two distributions have the same mean and standard deviations.
For example, see the following diagram.
Although the two distributions have the same means and standard deviations, they
are not identical. They differ in symmetry. The left-hand distribution is
symmetrical, whereas the distribution on the right is asymmetrical, or
skewed. For a symmetrical distribution, values at equal distances on either
side of the mode have equal frequencies. Thus the mode, median and mean all
coincide. Its curve rises slowly, reaches a maximum (peak) and falls equally slowly.
But for a skewed distribution, the mean, mode and median do not coincide.
Skewness is positive or negative according to whether the mean and median lie to
the right or to the left of the mode.
A positively skewed distribution curve rises rapidly, reaches the maximum and falls
slowly. A positively skewed distribution is characterized by many outliers in the
upper region, or right tail. It is skewed to the right because of the relatively long
upper tail. A negatively skewed distribution has a disproportionately large number
of outliers that fall within its lower tail.
TESTS OF SKEWNESS
1. The values of the mean, median and mode do not coincide. The greater the difference
between them, the greater the skewness.
2. Quartiles are not equidistant from the median, i.e. (Q3 - Me) is not equal to (Me - Q1).
3. The sum of positive deviations from the median is not equal to the sum of the negative
deviations.
4. Frequencies are not equally distributed at points of equal deviation from the mode.
5. When the data are plotted on a graph, they do not give the normal bell-shaped form.
\[ \text{Skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}} \]
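A minimal Python sketch of this skewness measure follows. It assumes the usual Pearson form, in which three times the difference between the mean and the median is divided by the standard deviation. The data sets are hypothetical and the function name is illustrative.

```python
import statistics


def pearson_skewness(values):
    """3 * (mean - median) / standard deviation (sample SD, n - 1 divisor)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)
    return 3 * (mean - median) / sd


symmetric = [1, 2, 3, 4, 5]                # mean equals median
right_skewed = [1, 2, 2, 3, 3, 3, 4, 15]   # long upper tail
left_skewed = [-15, -4, -3, -3, -3, -2, -2, -1]  # mirror image, long lower tail

for name, data in (("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)):
    print(name, round(pearson_skewness(data), 3))
```

The coefficient is zero for the symmetric data, positive when the mean is pulled above the median by a long upper tail, and negative in the mirror-image case.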
KURTOSIS
Kurtosis is a measure of whether the data are peaked or flat relative to a normal
distribution. That is, data sets with high kurtosis tend to have a distinct peak near the
mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to
have a flat top near the mean rather than a sharp peak. A uniform distribution would be
the extreme case.
Kurtosis is the degree of flatness or ’peakedness’ in the region of the mode of a
frequency curve. It is measured relative to the ’peakedness’ of the normal curve. It
tells us the extent to which a distribution is more peaked or flat-topped than the normal
curve. If the curve is more peaked than the normal curve it is called ’Leptokurtic’. In
this case items are more clustered about the mode. If the curve is more flat-topped than
the normal curve, it is ’Platykurtic’. The normal curve itself is known as ’Mesokurtic’.
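These notes give no formula for kurtosis, so the following Python sketch is an assumption rather than the notes' own method. It uses the common moment-based measure, the fourth central moment divided by the square of the second central moment, minus 3, so that a normal curve scores about zero. The data sets and function names are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (assumed measure)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


def describe(values):
    """Label the shape relative to the normal curve (value 0)."""
    k = excess_kurtosis(values)
    if k > 0:
        return "leptokurtic"
    if k < 0:
        return "platykurtic"
    return "mesokurtic"


flat = list(range(1, 11))                 # evenly spread, flat-topped
peaked = [1, 5, 5, 5, 5, 5, 5, 5, 5, 9]   # clustered at the mode

print(describe(flat), round(excess_kurtosis(flat), 3))
print(describe(peaked), round(excess_kurtosis(peaked), 3))
```

The evenly spread data give a negative value (flat-topped), while the data clustered about the mode give a positive value (peaked).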