BS101 StudyGuide 1 2021 Final
Business Statistics
(BS101B)
This module forms a compulsory core module for the following undergraduate academic
programmes:
SECTION B
SECTION C
Word of Welcome
Welcome to the exciting world of Business Statistics, an important discipline within the broad
field of marketing. Statistical analysis has not only gained status as a science discipline but has
also gained wide acceptance as a critical tool to generate information for effective business
decision making.
Below is a brief summary of the BCom in Marketing and Management Science, indicating
where Business Statistics (BS101B) fits in.
Programme Purpose
The purpose of this qualification is to provide candidates in the private, public and voluntary
sectors with comprehensive and in-depth knowledge of the principles, major theories, and
paradigms, skills, methods, and technology of the science and profession of the field of
marketing, management, supply chain, sales and project management. The aim is to
promote sustainable growth and development and to maximise prosperity in all sectors of the
economy and society at large.
To develop competent leaders with applied economic, management, supply chain, project
management, sales and marketing skills as well as generic cross-functional knowledge and
skills to steer sustainable development, growth and prosperity in the most appropriate
direction.
To provide students who want to enrol for advanced studies in management, supply chain,
project management, sales and marketing, with a sound academic base, to apply their skills
and for further advancement in careers and academic studies in the field of marketing, sales,
supply chain, project management, and management science.
Programme Outcomes
Programme Exit-Level Outcomes:
Students must demonstrate an integrated understanding of a broad scope of
management knowledge and how it practically applies to the disciplines of marketing,
sales management, supply chain and project management.
To demonstrate a comprehensive understanding of the knowledge regarding
economics, financial management, research as applied to marketing, sales, supply chain
Module purpose:
The task of statistical analysis is to help generate accurate information for major decision
makers in the world of business. The required information is often used to design a marketing
strategy, and for this reason, the collected information will assist in identifying marketing
opportunities and threats, formulating marketing plans and actions, and evaluating and
improving overall marketing performance.
Statistics is the science of collecting, organising and interpreting numerical facts, which we
call data (Moore & McCabe, 1989). In other words, statistics make sense of numbers. We read
and hear about statistics every day: Temba Bavuma’s average batting score, the results of the
municipal elections, the average summer rainfall in Gauteng, the monthly range in the oil
price. The understanding of statistics is important in many professions, and marketing is no
exception. Business decisions are based on the results of market research, so you can
appreciate how important an understanding of statistics is going to be to you.
Statistics is a fascinating subject that you can apply to everyday situations. During this learning
programme you will become familiar with terms and concepts that you may have shied away
from until now. Statistics allow us to use data to gain insight into day-to-day simple or
complex problems. Data on their own are meaningless without the ability to understand
them. At the end of this programme you will be able to critically interpret reported statistics,
both within the business environment and in the popular press, as well as analyse data using
the appropriate methodology.
Although the basis of statistics is a series of complex formulae, you are not required to
memorise these formulae. However, you will be expected to use the formulae extensively
during analysis, as well as to think about the reasoning behind the particular statistical
methods.
b. Prescribed IMM Graduate School Study Guide for BS101B, dated January
2020.
NB: the prescribed book forms the foundation of knowledge required to master all learning
outcomes. All the examples and required chapters must be attempted. The study guide
provides additional summaries, examples and information to unlock these concepts.
You will need a basic calculator that is typically used at schools and
universities, similar to the one displayed in the picture. It can perform a
variety of functions, including fraction calculations, percentage calculations,
scientific calculations, and statistical calculations. It is a CASIO fx 82 ZA Plus.
This type of calculator is adequate for the basic business calculations that you
will have to perform during this course.
You are registered for this module on a distance learning basis and you are expected to work
on your own 70% of the time. However, this does not mean that you are completely on your
own. Please use the available IMM Graduate School Student Support resources to help you
during your studies.
The IMM Graduate School is committed to assisting students with all queries and has
introduced [email protected] to answer all general queries. This is supported by a
ticketing system that issues students with a unique ticket number and allows us to
track the progress of queries and to ensure prompt responses and swift resolution times.
NB: Please ensure that all module specific questions and queries are still posted on the
module specific discussion forums, available on eLearn. Do not leave your queries to the last
day before you write your examination or before the assignment submission due dates.
You are required to regularly visit eLearn as it is an essential source of information that is
continuously updated with topical material, additional guidance, messages and tutorial
letters.
eLibrary is an excellent place for you to read additional material on your own. This tool
will be extremely valuable when conducting research for your assignments / projects
/ research reports. For access to the virtual library, please follow the instructions
available on eLearn.
Information Centres - the IMM Graduate School has libraries in all Student Support
Centres with textbooks and additional materials that could help you in your
assignments when you need to reference additional sources. For opening times at
eMasterclasses - in our on-going efforts to support our students, the IMM Graduate
School hosts online tutorials in all our modules for additional guidance and support.
Subject matter experts share their knowledge through the use of a presentation or
video conferencing addressing learning outcomes, assignment and examination
preparation, etc., giving ample opportunity for student feedback and interaction.
eDiscussion Forum - join group forums for discussions, to post questions, discuss
subject content and to receive updates on specific modules.
The Journal of Strategic Marketing - the official publication of the IMM Institute of Marketing
Management, which keeps you up-to-date with the latest news and trends of what is
happening in the industry. Another publication is the Strategic Marketing Africa magazine,
which addresses the unique marketing challenges and opportunities in Africa. These
magazines are released quarterly and could assist you in providing examples to use in
assessments to back up your theoretical knowledge. Both of these magazines are available
electronically on eLearn.
At this point you should understand the learning process already explained, as well as what
Business Statistics is all about and you should be ready to start your journey towards the
successful completion of your module.
Prescribed textbook
BS101B study guide
IMM Graduate School eLearn platform
IMM Graduate School eLibrary platform
Let’s recap what the relevant module learning outcome is for this study unit:
Let’s recap what the relevant study unit learning outcome is for this study unit
Data is readily available from a variety of sources and of varying quality and quantity, but raw
data on its own does not help marketers. Statistical analysis is used to process the raw data into useful
information that can be used to make decisions. This, together with marketing and research
tools, enables researchers to reach valid conclusions regarding the marketing and business
problems or issues being explored. The course must, therefore, be considered a preparation
for the practical issues which confront marketers when making decisions. (Shipham, 2012)
This is why applied statistics is regarded as a decision support tool.
The first year Statistics students at the Tshwane University of Technology conducted a study
of a random sample of 50 students to measure the average daily amount spent at the canteen.
This information will be used to help the canteen set its pricing to be in line with the needs of
the students.
1. What is the population? (1)
2. What is the sample? (1)
3. What type of data are the average amounts spent? (1)
Did you complete all the relevant revision exercises and check your answers
against the answers provided?
At this point, you should be able to: (list study unit outcomes again)
Let’s recap what the relevant module learning outcome is for this study unit:
Describe and understand how data are distributed, how data can be summarised in
statistical terms, and how this summarised data can be used by marketing managers
to aid decision-making.
Let’s recap what the relevant study unit learning outcome is for this study unit
At the end of this section of the study unit, you should be able to use different graphical
presentations in order to summarise and interpret data with reference to a particular
marketing decision-making situation.
1. Introduction
For example: A retail clothing store chain surveys 600 Port Elizabeth residents to identify if
they would use a new mail order catalogue to purchase their clothing. The survey will result
in a data set of 600 values. The descriptive statistics will summarise this information to make
it useful for the marketing department of the clothing chain.
Statistical results are most often displayed by means of graphs in annual reports, newspaper
articles, research studies, etc.
This method is far more effective for communicating results than writing the words and
numbers.
Graphic displays of data are useful for presentation of results so that they can be quickly
and easily understood by clients, suppliers or colleagues.
Graphic displays of data are useful for analysis of data as it allows for easy comparison
of results.
This study unit deals with the most common ways in which data can be summarised
graphically, methods used to produce each type, when to use each type, and limitations.
2. Charts
Charts are one type of graph most commonly used to summarise or describe qualitative data
collected in marketing research studies. Qualitative data are essentially converted into
‘count’ data (totals for each category of observation) before they can be displayed in a chart.
This is known as a frequency distribution. The two types of charts described here are pie
charts and bar charts.
3. Pie charts
Pie charts are commonly used to present relative frequency distributions for qualitative data
(most often, categorical data). To draw a pie chart, first draw a circle. Use the relative
frequencies (percentages) to subdivide the circle into segments.
An example of the construction of a pie chart, and its interpretation, based on an example in
Wegner (2020):
TABLE 2.1
Grocery store preference of shoppers
Store         Number of shoppers   Percentage
Checkers      10                   33%
Pick n Pay    17                   57%
Spar          3                    10%
Total         30                   100%
Figure 2.1
Grocery store preference of shoppers
(Pie chart: Checkers 33%, Pick n Pay 57%, Spar 10%)
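The segment sizes of the pie chart can be checked with a short calculation. The following is a minimal sketch in Python, assuming the store counts of 10, 17 and 3 shoppers shown in the total column of Table 2.2; the variable names are illustrative only.

```python
# Frequency counts per store (30 shoppers in total).
counts = {"Checkers": 10, "Pick n Pay": 17, "Spar": 3}

total = sum(counts.values())

# Relative frequency = category count / total count, shown as a whole percentage.
percentages = {store: round(100 * f / total) for store, f in counts.items()}

# Each pie segment takes the same share of the 360 degrees in a circle.
angles = {store: 360 * f / total for store, f in counts.items()}

for store in counts:
    print(f"{store}: {percentages[store]}% -> {angles[store]:.0f} degrees")
```

The percentages give the labels of the segments, while the angles show how much of the circle each segment should cover when the chart is drawn by hand.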
4. Bar charts
A simple bar chart can be constructed from the same frequency distribution used to construct
a pie chart. The same information is displayed by means of vertical columns (bars), rather
than segments. The categories are given on the horizontal (x) axis and the frequencies are
given on the vertical (y) axis. Using the same data in Table 2.1:
Figure 2.2
Grocery store preference of shoppers
(Bar chart: the stores Checkers, Pick n Pay and Spar on the x-axis; frequency on the y-axis)
A component bar chart (more commonly called a stacked bar chart) allows you to summarise
data that have two or more variables.
For example, shoppers' preference for Checkers, Pick n Pay or Spar (variable 1 = store
preference) divided up into males and females (variable 2 = gender). The chart looks similar
to a simple bar chart, but the components of each bar are stacked on top of each other. A
multiple bar chart is very similar to a stacked bar chart, but the components are presented
side by side rather than as a single bar with two layers, so differences between categories are
displayed more clearly. Using the same data as in Table 2.1, with the gender variable
added:
TABLE 2.2
Checkers 7 3 10
Pick n Pay 10 7 17
Spar 2 1 3
Total 19 11 30
Figure 2.3
(Stacked bar chart by gender, Male and Female; x-axis: Store preference, with the categories Checkers, Pick n Pay and Spar)
Interpretation: Figure 2.3 still shows that the majority of female shoppers surveyed prefer
shopping at Pick n Pay. The majority of male shoppers also prefer shopping at Pick n Pay. This
study tells us that Pick n Pay is the most preferred store out of the three store options given
for both males and females. Checkers is the second most preferred store for both males and
females.
Tip: A graph is meaningless unless it has a title, and both x- and y-axis labels (with units
specified, if appropriate).
5. Frequency distributions
A frequency distribution is a tabular summary of a set of data showing the frequency (or
number) of items in each of several non-overlapping categories, such as gender, age group,
etc. The sum of the frequencies always equals the total number of elements in the data set.
The sum of the relative frequencies always equals 1 (or 100%).
Table 2.1 and Table 2.2 are both examples of a frequency distribution.
6. Histograms
A histogram looks similar to a bar chart but there are no spaces between the columns, and
the intervals are continuous. The class frequencies are plotted on the y-axis and the class
intervals, which are of equal width, on the x-axis. Usually, spaces are created on the left and
right hand sides of the graph.
Example from Wegner (2020): We are still interested in finding out more about shoppers who
buy groceries from our three stores under investigation – Checkers, Pick n Pay and Spar. Now
we would like to investigate the age of the shoppers. The results of this study are summarised
in the frequency distribution below:
TABLE 2.3
Frequency distribution – age of shoppers
Age (years)             Class boundaries   Number of shoppers   Percentage
(variable under study)                     (frequency)          (relative frequency)
20 - 29                 19.5 - 29.5        6                    20%
30 - 39                 29.5 - 39.5        9                    30%
40 - 49                 39.5 - 49.5        8                    27%
50 - 59                 49.5 - 59.5        4                    13%
60 - 69                 59.5 - 69.5        3                    10%
Total                                      30                   100%
Figure 2.4
(Histogram; x-axis: Age in years, class boundaries 19.5 - 29.5 to 59.5 - 69.5; y-axis: Number of shoppers; column heights 6, 9, 8, 4 and 3)
Interpretation: Figure 2.4 shows that the age group with the highest frequency is the 30 to 39
year olds, followed closely by the 40 to 49 year olds. This tells us that you are more likely to
find a shopper aged between 30 and 49 than a shopper older than 60. The
store marketers can use this information to target their marketing mix to suit the majority of
their customers, who are aged between 30 and 49 years old. Alternatively, the stores could decide to
appeal to an older market by changing their marketing mix to appeal more to customers aged
50 and older, in the hope of increasing their market share.
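The percentages in a frequency distribution such as Table 2.3 can be verified from the class frequencies. The sketch below, in Python, takes the five frequencies from the table and checks that the relative frequencies add up to 100%; the variable names are illustrative only.

```python
# Class intervals and frequencies from Table 2.3 (age of shoppers).
age_classes = ["20-29", "30-39", "40-49", "50-59", "60-69"]
frequencies = [6, 9, 8, 4, 3]

n = sum(frequencies)  # total number of shoppers

# Relative frequency of each class, rounded to a whole percentage.
relative = [round(100 * f / n) for f in frequencies]

for label, f, pct in zip(age_classes, frequencies, relative):
    print(f"{label}: {f} shoppers ({pct}%)")

# The class with the highest frequency (the tallest column in the histogram).
modal_class = age_classes[frequencies.index(max(frequencies))]
print("Class with the highest frequency:", modal_class)
print("Sum of percentages:", sum(relative))
```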
7. Frequency polygon
A frequency polygon is a graph in which the mid-points of the tops of the columns in a histogram are
connected by straight lines. This, in essence, converts a histogram to a line graph.
(Frequency polygon; x-axis: Age in years, class boundaries 19.5 - 29.5 to 59.5 - 69.5; plotted frequencies 6, 9, 8, 4 and 3)
10. Other graphs
Three other types of graphs are described, all essentially line graphs, namely, the line graph,
the Lorenz curve and the z-curve.
Note: As mentioned earlier in this study guide, MS Excel has an excellent graphing facility.
Data can be easily displayed, using any one of many types of graphs. It is relatively easy to
personalise your graphs (colours, font size, annotations, etc.) once you have mastered the
software.
At the end of this section of the study unit, you should be able to identify, describe, calculate
and interpret the appropriate central and non-central location measures for grouped and
ungrouped data. You should also be able to calculate and interpret the various measures of
spread. Meaningful interpretation of results is the key to effective marketing decision-making.
1. Introduction
Frequency tables and graphs provide only an approximate indication of the spread of the
data. For information to be valuable to marketing decision-making, we need more precise
information, specifically if we want to make comparisons between groups. Measures of
central location (or central tendency) are additional alternatives for summarising data, and
this chapter serves as an introduction to the world of statistical measurements. The three
measures discussed in Wegner (2020) are the mean, the median and the mode.
Dispersion or ‘spread’ describes the variability of the data which is important to know when
comparing different data sets, and in order to draw meaningful conclusions from analysis of
data. Various measures of dispersion are discussed in Wegner (2020).
When relating statistics to a marketing problem analysis, it is important to know which ones
to use and which to ignore, and so avoid presenting meaningless statistics in presentations
and reports. It is important that you be able to interpret the measurements in the context of
decision making. You do not have to remember the formulae themselves as all statistical
packages will calculate the statistics for you. Make sure you are able to use the stats function
on your calculator.
Note: If the measures are calculated for data from a sample, they are called sample statistics.
If they are calculated for data from a population, they are called population parameters.
2. Arithmetic mean (average) ($\bar{x}$)
Size (n): 20
Formula: $\bar{x} = \frac{\sum f x_i}{n} = \frac{350}{20} = 17.5$
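The grouped-data mean weights each class midpoint by its class frequency. The sketch below, in Python, reproduces the total of 350 and the mean of 17.5 from the ordered data on days spent on market research. It assumes two class intervals, (11-20) and (21-30): the first appears in the grouped median calculation, while the second is an assumption made here to cover the remaining values.

```python
# Ordered data: days spent on market research by 20 entrepreneurs.
days = [13, 15, 15, 16, 18, 18, 18, 18, 18, 19,
        20, 20, 20, 20, 20, 20, 21, 21, 22, 24]

# Ungrouped mean: sum of all values divided by the number of values.
ungrouped_mean = sum(days) / len(days)

# Assumed class intervals (lower limit, upper limit) for the grouped version.
classes = [(11, 20), (21, 30)]

# Count how many observations fall in each class.
freqs = [sum(lo <= x <= hi for x in days) for lo, hi in classes]

# Class midpoint = (lower limit + upper limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Grouped mean = sum(f * x_i) / n, where x_i is the class midpoint.
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
grouped_mean = sum_fx / sum(freqs)

print("Ungrouped mean:", ungrouped_mean)
print("Frequencies:", freqs, "Sum of f * x_i:", sum_fx)
print("Grouped mean:", grouped_mean)
```

The grouped mean (17.5) differs from the ungrouped mean (18.8) because every value in a class is treated as if it were equal to the class midpoint.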
3. Mode ($M_o$)
Order data = 13; 15; 15; 16; 18; 18; 18; 18; 18; 19; 20; 20; 20; 20; 20; 20; 21; 21; 22; 24
Mode = 20 (most often occurring value)
Interpretation: Most often, an entrepreneur will spend 20 days on market research.
Order data = 13; 15; 15; 16; 18; 18; 18; 18; 18; 19; 20; 20; 20; 20; 20; 20; 21; 21; 22; 24
Median = 19.50
Interpretation: Half of the entrepreneurs spend 19.50 days or less on market research,
while the other half spend 19.50 days or more on market research.
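The mode and median of the ungrouped data can be checked with Python's standard `statistics` module. This is a minimal sketch using the ordered data above; the variable names are illustrative only.

```python
import statistics

# Ordered data: days spent on market research by 20 entrepreneurs.
days = [13, 15, 15, 16, 18, 18, 18, 18, 18, 19,
        20, 20, 20, 20, 20, 20, 21, 21, 22, 24]

mode = statistics.mode(days)      # most frequently occurring value
median = statistics.median(days)  # middle value of the ordered data

# With an even number of values, the median is the average of the two
# middle values: here the 10th and 11th observations.
n = len(days)
middle_pair = (days[n // 2 - 1], days[n // 2])

print("Mode:", mode)
print("Middle pair:", middle_pair, "Median:", median)
```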
Median class: n / 2 = 20 / 2 = 10, so the 10th observation falls within the class (11-20)
Formula: $M_e = O_{me} + \frac{c\left[\frac{n}{2} - f(<)\right]}{f_{me}} = 11 + \frac{10\left[\frac{20}{2} - 0\right]}{16} = 11 + 6.25 = 17.25$
Interpretation: Half of the entrepreneurs spend 17.25 days or less on market research,
while the other half spend 17.25 days or more on market research.
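The grouped median formula can be written as a small function. In the sketch below, in Python, the function and parameter names are hypothetical; the inputs are those of the worked example: lower limit 11, class width 10, n = 20, no observations below the median class, and 16 observations in the median class.

```python
def grouped_median(lower_limit, class_width, n, cum_freq_before, class_freq):
    """Median for grouped data: O_me + c * (n/2 - f(<)) / f_me."""
    return lower_limit + class_width * (n / 2 - cum_freq_before) / class_freq


median = grouped_median(lower_limit=11, class_width=10, n=20,
                        cum_freq_before=0, class_freq=16)
print("Grouped median:", median)
```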
5. Quartiles
Quartiles are non-central measures that divide an ordered data set into four equal parts
(Wegner, 2020). Make sure that you understand the difference between quartile position
and the quartile value.
Order data = 13; 15; 15; 16; 18; 18; 18; 18; 18; 19; 20; 20; 20; 20; 20; 20; 21; 21; 22; 24
Middle quartile (Q2) = (n+1) / 2 = 21 / 2 = 10.5th position = 19 + ((20 - 19) x 0.5) = 19 + 0.50 = 19.50
Interpretation: 25% of the entrepreneurs spend up to 18 days on market research, 50% will
spend up to 19.50 days on market research, and 75% will spend up to 20 days on market
research.
* The formula and process for calculating the quartiles for grouped data do not need to be
known.
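The difference between a quartile position and a quartile value can be illustrated with a short function. The sketch below, in Python, assumes the position rule k(n+1)/4 for the k-th quartile, which agrees with the (n+1)/2 position used for Q2 above; the function name is hypothetical, and it is intended for data sets with at least three values.

```python
def quartile(sorted_data, k):
    """Return (position, value) of the k-th quartile, k = 1, 2 or 3.

    Assumed position rule: k * (n + 1) / 4, counted from 1.
    """
    n = len(sorted_data)
    position = k * (n + 1) / 4
    lower_index = int(position)        # whole part of the position
    fraction = position - lower_index  # decimal part of the position
    lower_value = sorted_data[lower_index - 1]
    upper_value = sorted_data[min(lower_index, n - 1)]
    # Interpolate between the two neighbouring values.
    value = lower_value + fraction * (upper_value - lower_value)
    return position, value


days = [13, 15, 15, 16, 18, 18, 18, 18, 18, 19,
        20, 20, 20, 20, 20, 20, 21, 21, 22, 24]

for k in (1, 2, 3):
    position, value = quartile(days, k)
    print(f"Q{k}: position {position}, value {value}")
```

For this data set the positions are 5.25, 10.5 and 15.75, and the corresponding values are 18, 19.50 and 20 days.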
Three additional measures of central location are described (the geometric mean, the
harmonic mean and the weighted average). These are used less often and will not be
examined.
Geometric mean: Used when the data represents percentage changes, such as indexes
and growth rates.
Weighted average: In some studies, the importance of some data is more than for others.
In these cases, data are assigned weights based on their importance.
6. Range
This is the simplest measure of dispersion and is usually written as ‘highest value (max) –
lowest value (min)’. For example, the range of weights of boys in the previous example is
131kg – 55kg = 76 kg. The range is influenced by extreme data values (outliers) and is not
widely used. However, it is very useful in identifying data outliers which you may want to
exclude from your analysis, or which may be incorrectly recorded (what would you think if
one of the boys weighed 313kg?).
7. Variance ($s^2$)
Variance is a measure of dispersion that utilises all the data values and, as such, it is extremely
powerful and the most commonly used measure. It is based on the difference between each
data value and the mean.
The standard deviation is the square root of the variance. It fluctuates very little from one
sample to the next taken from the same population and is the most important statistical
measure of spread. As such, you will see and use standard deviation frequently throughout
the remainder of this study guide, and it is important that you have a clear understanding of
the concept.
The standard deviation is a measure of absolute variability in a data set. The coefficient of
variation is a relative measure of variability, used when comparing two samples that have
vastly different means. In such a case, you would not get an accurate picture of the relative
dispersion in the two data sets by comparing the two standard deviations.
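The measures of spread described above can be computed together. The sketch below, in Python, uses the data on days spent on market research as an illustration (the choice of data set is an assumption made here) and the standard `statistics` module, whose `variance` and `stdev` functions divide by n - 1, i.e. they treat the data as a sample.

```python
import statistics

# Ordered data: days spent on market research by 20 entrepreneurs.
days = [13, 15, 15, 16, 18, 18, 18, 18, 18, 19,
        20, 20, 20, 20, 20, 20, 21, 21, 22, 24]

data_range = max(days) - min(days)    # highest value - lowest value
mean = statistics.mean(days)
variance = statistics.variance(days)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(days)      # square root of the sample variance

# Coefficient of variation: standard deviation as a percentage of the mean.
cv = 100 * std_dev / mean

print("Range:", data_range)
print("Variance:", round(variance, 2))
print("Standard deviation:", round(std_dev, 2))
print("Coefficient of variation (%):", round(cv, 2))
```

Because the coefficient of variation is expressed relative to the mean, it can be compared across data sets whose means differ widely, which the standard deviation on its own cannot.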
$x_i$   $\bar{x}$   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$
The shape of the data, which you can observe from the frequency polygon is important for
you, since it influences the choice of measurements you are going to use to describe the data.
Skewness looks at the relationship between the mean, mode and median, and what that
relationship tells us about the data. There are three types of skewness. You must be able to
interpret the coefficient of skewness.
Symmetrical distribution
In this case, the mean, median and mode are all equal. The coefficient of skewness is equal
to 0.
Negatively skewed distribution (skewed to the left)
In this case, the mode has the highest value, followed by the median and then the mean. The
coefficient of skewness is a negative value below 0. This means that the majority of items /
individuals act in a certain way (mode) and the rest act less often, e.g. the majority of students
study every day, but a few study less than 3 days a week.
Positively skewed distribution (skewed to the right)
In this case, the mode has the lowest value, followed by the median and then the mean. The coefficient
of skewness is a positive value above 0. This means that the majority of items / individuals
act in a certain way (mode) and the rest act more often, e.g. the majority of students study
only 3 days a week, but a few study more than 3 days a week.
Interpretation:
In the example used previously, where we investigated the number of days an entrepreneur
spends on market research, the mean = 18.80 days, the median = 19.50 days and the mode =
20 days. This data is almost symmetrical in distribution because the three measures of central
location are close to one another, but because the mode is the highest value (20 days) and the
mean is the lowest (18.80 days), the data set is said to be slightly skewed to the left (negatively skewed).
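The sign of the coefficient of skewness can be checked numerically. The sketch below, in Python, assumes Pearson's second coefficient of skewness, 3 x (mean - median) / standard deviation, which may differ from the formula used in the prescribed textbook; the variable names are illustrative only.

```python
import statistics

# Ordered data: days spent on market research by 20 entrepreneurs.
days = [13, 15, 15, 16, 18, 18, 18, 18, 18, 19,
        20, 20, 20, 20, 20, 20, 21, 21, 22, 24]

mean = statistics.mean(days)
median = statistics.median(days)
std_dev = statistics.stdev(days)  # sample standard deviation

# Assumed formula: Pearson's second coefficient of skewness.
skewness = 3 * (mean - median) / std_dev

print("Mean:", mean, "Median:", median)
print("Coefficient of skewness:", round(skewness, 3))
```

The coefficient is negative here because the mean lies below the median, which places the longer tail of the distribution on the side of the lower values.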
Data can be summarised in terms of five descriptive measurements that are also used in an
easy-to-view graph, the box plot. Make sure that you are able to draw and interpret the box
plot. A box plot is drawn up based on a five-number summary table, and then graphically
summarises that information:
Min   Q1   Me      Q3   Max
13    18   19.50   20   24
Interpretation: The range of the data set is 13 to 24 days. The middle value is 19.50 days. 25%
of the time an entrepreneur will spend up to 18 days on market research and 75% of the time
an entrepreneur will spend up to 20 days on market research. Based on the median, half of
the entrepreneurs will spend 19.50 days or less on market research, and the other half will
spend 19.50 days or more on market research.
A marketing manager for a popular hair product manufacturer has decided that he needs to
make better decisions regarding the choice of media to reach the target market. He has hired
you to conduct a study to identify the most popular media among the company’s target
market.
The following is a table of responses to the question: “From what medium do you get your
daily news?” Interviewers also recorded the age and gender of the respondents.
Initials   Age (years)   Gender   Medium
BN         45            M        TV
RF         34            M        Newspaper
DF         87            M        Newspaper
SD         23            F        TV
IK         18            M        Newspaper
HY         25            F        TV
NM         26            F        TV
MK         43            M        Newspaper
JK         67            M        TV
LO         40            F        Radio
GB         23            M        Newspaper
DP         46            F        TV
SA         23            F        Radio
EW         55            F        Radio
CD         21            F        TV
1. Summarise the data in three charts (use a combination of pie and simple bar charts),
depicting: 1) gender, 2) medium and 3) age (hint: divide the data into five age groups).
(13)
2. Calculate the range of the age of the respondents. Does this range show outliers? (3)
3. Calculate the mean ages of the respondents. Is this the most appropriate measure of
central location? Why/why not? (3)
4. Calculate the standard deviation of the ages. What does this tell you? (6)
6. Is the choice of medium influenced by age? Describe any 2 methods that you used to get
your answer. (3)
(1) Gender: A pie chart is a good choice for depicting gender, as there are only two
categories (male and female); however, a bar chart could also be used to summarise the data.
47% M
53% F
(2) Medium: A Bar chart was used to summarise the medium, however a Pie chart
could also be used.
The mean can be influenced by outliers and is not always the best measure of
central location.
3. Variance $= \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{5363.6}{14} = 383.11 \text{ years}^2$
Interpretation: The variance shows that the average squared deviation from the mean is
383.11 years squared. According to the standard deviation (the square root of the variance),
the average deviation is 19.57 years from the mean of 38.4 years. This means the ages of the
respondents vary by almost 20 years.
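The figures in this answer can be reproduced from the ages in the table of responses. The following is a minimal sketch in Python; the variable names are illustrative only.

```python
import statistics

# Ages of the 15 respondents, in the order given in the table of responses.
ages = [45, 34, 87, 23, 18, 25, 26, 43, 67, 40, 23, 46, 23, 55, 21]

n = len(ages)
mean_age = statistics.mean(ages)
age_range = max(ages) - min(ages)

# Sum of squared deviations from the mean.
sum_sq = sum((x - mean_age) ** 2 for x in ages)

variance = sum_sq / (n - 1)  # sample variance
std_dev = variance ** 0.5    # standard deviation

print("Mean age:", mean_age)
print("Range:", age_range)
print("Sum of squared deviations:", round(sum_sq, 1))
print("Variance:", round(variance, 2))
print("Standard deviation:", round(std_dev, 2))
```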
4. TV is the most popular medium amongst women
Newspaper is the most popular medium amongst men
The relative frequency distribution can be used to determine the most popular
mediums by gender.
5. No, the choice of medium is not influenced by age. The following measures could
be used:
Did you complete all the relevant revision exercises and check your answers
against the answers provided?
At this point, you should be able to: (list study unit outcomes again)
This and the following study units are all about inferential statistics. The process of statistical
inference enables us, as marketers, to answer specific questions with a known degree of
confidence. This allows us to be more confident in our decision-making. Study Unit 3
introduces the concept of probability and describes the methods for generalising results. In
most circumstances, we obtain results from a sample of the target population.
Let’s recap what the relevant module learning outcome is for this study unit:
Understand the concept of probability and know which methods are used for generalising
results.
Let’s recap what the relevant study unit learning outcome is for this study unit
The student should be able to describe the properties and concepts of probability and
probability distributions, and define and apply the rules of the different types of probability.
Define the different types of probabilities. | Describe the properties and concepts of probability and probability distributions. Define and apply the rules of the different types of probability. Interpret probabilities and make decisions based on the results. | Chapter 4
Describe the properties and concepts of a probability in relation to marketing decision-making problem. | Explain that a probability is a value between 0 and 1. | Chapter 4
Apply the rules of probability and describe the complement of an event and the process for determining its probability. | Understand the different rules of probabilities, i.e. “less than”, “more than”, “at least”, “no more than”. | Chapter 4
Describe two common probability distributions, i.e., Binomial and Poisson distributions. | Describe the two common probability distributions (Binomial and Poisson distributions). | Chapter 5
At the end of this chapter, you should be able to describe the properties and concepts of
probability, and define and apply the rules of the different types of probability. You should
also be able to interpret probabilities and make decisions based on the results.
1. Introduction
Although the rest of your Statistical Analysis studies deal with inferential statistics, please
remember that descriptive techniques should always be applied to your data, as a first step
in analysis. This section is about probability, although you will be relieved to know that there
is not much about probability theory, an independent and ongoing field of research! The
purpose is to teach you how to apply probability theory and for this you need to understand
the basic concepts and different types of probability.
2. Types of Probabilities
Probabilities are loosely divided into subjective and objective types, but the latter is the type
used in inferential statistics. The first section focuses on empirical probabilities (calculated
from data that have been collected).
For example, if you flip a coin, the probability of flipping heads is the same as the probability
of flipping tails, i.e. 50% or 0.5 or ½. If you have a crooked coin, with two heads, the
probability of flipping heads would be 1 (100%) and the probability of flipping tails would be
0 (0%).
Data may fall into only one category, or into more than one category.
For example: A clothing store asks you to conduct research into the split between male and
female customers, and who of those customers utilise extended shopping hours. The
following results were collected from a random sample of 100 shoppers:
When looking only at gender, a respondent is either male or female, meaning the data
will only fall into a single category, e.g. 61% are female.
However, when looking at gender and likelihood to use extended shopping hours,
respondents fall into two categories, e.g. 67% are females who would use extended
shopping hours.
This leads up to the concept of statistical independence, meaning the occurrence of one event
does not influence the occurrence of a second event. The most important thing to remember
is that the probability of two independent events is equal to the product of the probabilities
of the separate events. For example:
Flipping heads (of a coin) the first time will have no effect on the probability of flipping
heads (or tails) the second time = statistical independence.
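The product rule for independent events can be checked by listing all the outcomes of two coin flips. The sketch below, in Python, assumes a fair coin; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Probability of heads on a single flip of a fair coin.
p_heads = Fraction(1, 2)

# Product rule for independent events: P(A and B) = P(A) x P(B).
p_both_rule = p_heads * p_heads

# Check by counting: list every outcome of two flips (HH, HT, TH, TT).
outcomes = list(product("HT", repeat=2))
p_both_count = Fraction(outcomes.count(("H", "H")), len(outcomes))

print("By the product rule:", p_both_rule)
print("By counting outcomes:", p_both_count)
```

Both approaches give 1/4, because the result of the first flip does not change the probabilities for the second flip.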
Example: a study done on the companies listed on the JSE classified the companies into
groups according to industry sector and company size. This data is summarised in the cross-
tabulation below (Wegner, 2020):
Sector      Small   Medium   Large   Total
Mining      3       8        30      41
Financial   9       21       42      72
Service     10      6        8       24
Retail      14      13       6       33
Total       36      48       86      170
Marginal probability: what are the chances of a company listed on the JSE being
medium in size? 48 out of 170 = 48 / 170 x 100 = 28.24 %.
Joint probability: what are the chances of a company listed on the JSE being both
medium in size and in the retail industry? 13 out of 170 = 13 / 170 x 100 = 7.65 %.
Conditional probability: what are the chances of a JSE listed company that is already
known to be medium in size, being in the retail industry? 13 retail companies out of the
48 medium sized companies = 13 / 48 x 100 = 27.08 %.
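The three probabilities above can be computed directly from the cross-tabulation. In the sketch below, in Python, the Medium column is confirmed by the total of 48 quoted above; the labels Small and Large for the other two columns are assumptions made here for illustration.

```python
# Cross-tabulation of JSE companies: sector (rows) by company size (columns).
# The "Small" and "Large" labels are assumed; "Medium" matches the text.
table = {
    "Mining":    {"Small": 3,  "Medium": 8,  "Large": 30},
    "Financial": {"Small": 9,  "Medium": 21, "Large": 42},
    "Service":   {"Small": 10, "Medium": 6,  "Large": 8},
    "Retail":    {"Small": 14, "Medium": 13, "Large": 6},
}

grand_total = sum(sum(row.values()) for row in table.values())
medium_total = sum(row["Medium"] for row in table.values())
retail_medium = table["Retail"]["Medium"]

# Marginal probability: P(Medium).
marginal = medium_total / grand_total

# Joint probability: P(Medium and Retail).
joint = retail_medium / grand_total

# Conditional probability: P(Retail | Medium) = P(Medium and Retail) / P(Medium).
conditional = joint / marginal

print(f"Marginal:    {100 * marginal:.2f}%")
print(f"Joint:       {100 * joint:.2f}%")
print(f"Conditional: {100 * conditional:.2f}%")
```

Dividing the joint probability by the marginal probability gives the same result as dividing 13 by 48, since the grand total of 170 cancels out.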
$n_1 \times n_2 \times n_3 \times \dots$
$n!$ = n factorial = $n(n-1)(n-2)(n-3)\dots$
k = number of objects
$n!$ = n factorial = $n(n-1)(n-2)(n-3)\dots$
$r!$ = r factorial = $r(r-1)(r-2)(r-3)\dots$
r = number of objects
n = total number of objects
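The counting rules that use these symbols can be tried out with Python's standard `math` module. The sketch below assumes the usual definitions of permutations, n! / (n - r)!, and combinations, n! / (r!(n - r)!); the numbers used (5 objects, 3 selected, and 3 x 4 x 2 choices) are hypothetical and chosen only for illustration.

```python
import math

# Multiplication rule: n1 x n2 x n3 possible outcomes
# (hypothetical example: 3 colours, 4 sizes and 2 styles).
choices = [3, 4, 2]
total_outcomes = math.prod(choices)

n = 5  # total number of objects (hypothetical)
r = 3  # number of objects selected (hypothetical)

n_factorial = math.factorial(n)  # 5 x 4 x 3 x 2 x 1

# Permutations: ordered selections of r objects from n.
permutations = math.perm(n, r)   # n! / (n - r)!

# Combinations: unordered selections of r objects from n.
combinations = math.comb(n, r)   # n! / (r! (n - r)!)

print(total_outcomes, n_factorial, permutations, combinations)
```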
At the end of this section, you should be able to understand the concept of probability
distributions, and know when to apply two different types, how to describe and compute
them, as well as how to apply them to marketing decision-making.
1. Introduction
The probability distribution for a random variable describes how the probabilities are
distributed over the values of the random variable. No matter what the type of random
variable, it will have an associated probability distribution, namely, a list of the possible
outcomes and their associated probabilities. For a discrete random variable, x, the probability
distribution is defined by the probability function, f(x). This section deals with the two most
commonly used discrete probability functions, i.e. Binomial and Poisson probability
distributions.
This section outlines the four properties of a binomial distribution. Remember that bi means
‘two’, so any random variable with two possible outcomes (success = p, or failure = q or 1 - p)
will have a binomial distribution. Think of flipping a coin and consider that a head is a success
and a tail is a failure (especially if you are betting!). The formulas may rather be complicated,
Note: A binomial probability distribution will always refer to a probability of success or failure.
n = number of trials
x = number of successes
n – x = number of failures
The Marketing Manager of a chain of Chicken Licken restaurants would like to know which
area of the business is most successful – take-away or eat in. He hires you to do some
marketing research.
If there is a 25% probability of a customer wanting a take-away and there are 10 customers
in the restaurant, what are the chances that 7 of them will want a take-away pizza?
n = 10
x = 7
n – x = 3
p = 0.25 (probability that a customer wants a take-away)
q = 1 – p = 0.75
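As a minimal sketch, the probability asked for in this example can be computed with the standard binomial formula, P(x) = nCx × p^x × (1 – p)^(n – x). The function name `binomial_probability` is an assumption made for this illustration; it is not taken from the prescribed textbook.

```python
from math import comb

def binomial_probability(n: int, x: int, p: float) -> float:
    """P(exactly x successes in n independent trials, each with success probability p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# 10 customers, 25% chance that each wants a take-away, exactly 7 take-aways.
prob = binomial_probability(n=10, x=7, p=0.25)
print(f"P(7 out of 10) = {prob:.4f}")
```

The result is a very small probability (well below 1%), which makes sense: with p = 0.25 you would expect only 2 or 3 take-away customers out of 10.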
The Poisson distribution is often used when dealing with the number of occurrences of an
event over a specified interval of time or space, such as the number of Big Macs sold in one
hour, or the number of raisins in a muffin. It is impossible to know, beforehand, what the
maximum number can be. Two assumptions must be met when using a Poisson probability
function to describe the number of occurrences of a random variable:
the probability of the occurrence of the event is the same for any two intervals of equal
length, and
the occurrence of the event in any interval is independent of the occurrence in any
other interval.
Note: A Poisson probability distribution will always refer to an “average” or “mean” value.
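$$P(x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}$$
P(x) = P(number of occurrences (x) out of n trials)
x = 7
e = 2.71828
λ = 5 (the mean number of occurrences)
$$P(7 \text{ customers out of } 10) = \frac{2.71828^{-5} \times 5^{7}}{7!} = 0.104445 = 10.44\% \text{ chance}$$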
Interpretation: On average 5 out of 10 customers want take-away pizza, but there is a 10.44%
chance that exactly 7 out of 10 customers will want a take-away pizza.
Use for marketing decision-making:
Knowing more about purchase trends – values far from the mean are relatively unlikely
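The Poisson calculation above can be reproduced with a few lines of Python. This is a sketch only: the function name `poisson_probability` is an assumption, and the second calculation uses the mean of 4.5 sales per 10 flyers from the revision exercises later in this study unit.

```python
from math import exp, factorial

def poisson_probability(x: int, mean: float) -> float:
    """P(exactly x occurrences) when the average number of occurrences is `mean`."""
    return exp(-mean) * mean**x / factorial(x)

# Worked example: mean of 5, exactly 7 occurrences.
p_seven = poisson_probability(7, 5)
print(f"P(x = 7) = {p_seven:.4f}")

# "More than 2" is found through the complement: 1 - [P(0) + P(1) + P(2)].
p_more_than_two = 1 - sum(poisson_probability(x, 4.5) for x in range(3))
print(f"P(x > 2) = {p_more_than_two:.4f}")
```

The complement is used for "more than 2" because a Poisson variable has no fixed maximum, so the upper tail cannot be added up term by term.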
5. Probability rules
Probability of 1 out of 5: P (x = 1)
Probability of 4 out of 5: P (x = 4)
Probability of less than 1 out of 5: P (x = 0)
Probability of less than 3 out of 5: P (x = 0) + P (x = 1) + P (x = 2)
Probability of more than 1 out of 5: 1 – (P(x = 0) + P(x = 1))
Probability of at least 2 out of 5: 1 – (P (x = 0) + P (x =1))
Probability of no more than 2 out of 5: P (x = 0) + P (x = 1) + P (x = 2)
Probability of between 0 and 2 out of 5: P (x = 1)
Probability of between 0 and 2 (inclusive) out of 5: P (x = 0) + P(x = 1) + P (x = 2)
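The rules above translate directly into sums over a list of individual probabilities. The sketch below uses a binomial variable with n = 5; the success probability p = 0.3 is a hypothetical value chosen only to have numbers to add up.

```python
from math import comb

n, p = 5, 0.3  # p is a hypothetical success probability, for illustration only

# pmf[x] holds P(x) for x = 0, 1, ..., 5
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

less_than_3 = pmf[0] + pmf[1] + pmf[2]       # P(x < 3)
more_than_1 = 1 - (pmf[0] + pmf[1])          # P(x > 1)
at_least_2 = 1 - (pmf[0] + pmf[1])           # P(x >= 2), same event as "more than 1"
no_more_than_2 = pmf[0] + pmf[1] + pmf[2]    # P(x <= 2)
between_0_and_2 = pmf[1]                     # strictly between 0 and 2
between_0_and_2_inclusive = pmf[0] + pmf[1] + pmf[2]

print(f"P(x < 3)  = {less_than_3:.4f}")
print(f"P(x >= 2) = {at_least_2:.4f}")
```

For a whole-number variable, "more than 1" and "at least 2" describe the same outcomes, which is why both use the same complement.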
It is bell shaped
It is symmetrical about a central value, µ (population mean)
The tails of the distribution never touch the x axis
It is described by two parameters: population mean (µ) and population standard
deviation (σ)
The area under the curve equals 1 – this corresponds to the complete sample space
of a random experiment. This means it represents the sum of probabilities associated
with the variable being studied, or 100%
Due to the symmetry, the area above µ is 0.5
The probability associated with a particular range of x values is described by the area
under the curve, between the limits of the x range
A random variable that has a normal distribution with a mean (µ) of 0 and a standard deviation
(σ) of 1 is said to have a standard normal probability distribution. The letter z is commonly
used to denote this particular normal random variable.
Again, please remember the concept, but know that your statistical package will ‘look up’ the
values for you.
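As an illustration of what a statistical package "looks up", the sketch below uses `NormalDist` from Python's standard `statistics` module. The z-value of 1.96 is chosen because it is the 95% limit used in the confidence interval sections of this guide.

```python
from statistics import NormalDist

z_dist = NormalDist(mu=0, sigma=1)  # standard normal distribution

area_below = z_dist.cdf(1.96)                         # area to the left of z = 1.96
area_between = z_dist.cdf(1.96) - z_dist.cdf(-1.96)   # area between -1.96 and +1.96
z_limit = z_dist.inv_cdf(0.975)                       # z-value with 97.5% of the area below it

print(f"Area below z = 1.96:       {area_below:.4f}")
print(f"Area between -1.96, +1.96: {area_between:.4f}")
print(f"z for a 0.975 lower area:  {z_limit:.2f}")
```

About 95% of the area lies between z = -1.96 and z = +1.96, leaving 2.5% in each tail.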
At the end of this chapter, you should understand the purpose of inferential statistics, the
difference between a sample and a population, the different types of sampling and reasons
for choosing a particular type in a marketing research study. You should also understand the
concept of a sampling distribution and explain its role in inferential statistics.
1. Introduction
Inferential statistics allows the results obtained or measured from a sample (e.g. how much
50 South African grocery store shoppers spend) to be used to estimate the true parameter of
the population from which the sample was chosen (e.g. the mean (µ) spend of all South
African grocery shoppers). In this chapter you will learn about different sampling techniques
and sampling distributions.
2. Sampling
When the population whose parameter you want to measure (e.g. mean (µ) spend of all South
African grocery shoppers) is large, a sample, or subset of the particular population, is usually
chosen from that population. Often it is too expensive or otherwise not feasible to measure
the parameter in the entire population. It would cost thousands of Rands to ask all single
mothers living in Gauteng, with one child who is younger than 10 years, what their household
income is. It would also be impossible to identify the entire population, even if you did have
the time and the money. In such a case, you would select a sample, and measure the
sample statistic (the mean (𝑥̅ ) household income of the sample of single mothers), and
then extrapolate, using inferential statistics, to estimate the population parameter (the mean
(µ) household income of all such single mothers).
This is so that you know, immediately, whether the measure comes from the sample or the
population.
3. Sampling methods
Only a sample selected using probability sampling methods can be used for inferential
statistics – as the sample needs to represent the population for the results of the study to be
generalised for the population. Simple random sampling is the easiest and most commonly
used method. Random numbers can be used to select a sample in this way. Such a sample
can be selected by drawing numbers from a hat. However, random number tables are
available in most statistical books, and they can be generated using statistical software.
Selection of the Lotto numbers is a bit like choosing random numbers from a ‘hat’. Each
number from one to 49 has an equal chance (probability) of being chosen.
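Simple random sampling with software can be sketched as follows, using Python's standard `random` module. The seed, the choice of six Lotto numbers, and the sampling frame of 500 numbered shoppers are all assumptions made for this illustration.

```python
import random

rng = random.Random(2021)  # fixed seed so that the example is repeatable (arbitrary value)

# Lotto-style draw: every number from 1 to 49 has an equal chance of being chosen.
# Drawing six numbers is an assumption made for this illustration.
lotto_draw = rng.sample(range(1, 50), k=6)

# Simple random sample of 10 shoppers from a hypothetical sampling frame of 500 shoppers.
shopper_ids = list(range(1, 501))
sample_ids = rng.sample(shopper_ids, k=10)

print(sorted(lotto_draw))
print(sorted(sample_ids))
```

`sample` draws without replacement, so no number or shopper can be selected twice, just as a number drawn from a hat is not put back.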
If all possible samples of size n are chosen from a population and a statistic (e.g. sample mean
(𝑥̅ )) is calculated for each sample, all the values (e.g. sample mean for each sample) have a
certain distribution known as the sampling distribution of the statistic. In other words, the
sampling distribution is the distribution of values of the statistic in a large number of samples
Again, remember that, although the formulas are complicated, it is important to understand
the underlying concepts, but you do not need to memorise them.
A sampling distribution shows the relationship of the sample statistic (mean of the sample
(𝑥̅ )) and its population parameter (true mean of the population (µ)). From this, the level of
confidence in estimating the population parameter from a single sample statistic can be
established.
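The idea of a sampling distribution can be made concrete with a small simulation. This is a sketch under stated assumptions: the population of 10 000 spend values is invented (uniformly spread between R 50 and R 500), and 2 000 samples of size 50 stand in for "all possible samples".

```python
import random
from statistics import mean, pstdev, stdev

rng = random.Random(1)  # fixed seed so that the example is repeatable

# Hypothetical population: spend (in Rand) of 10 000 grocery shoppers.
population = [rng.uniform(50, 500) for _ in range(10_000)]
mu = mean(population)       # population mean
sigma = pstdev(population)  # population standard deviation

# Draw many samples of size n and record the mean of each sample.
n = 50
sample_means = [mean(rng.sample(population, n)) for _ in range(2_000)]

print(f"Population mean:                 {mu:.2f}")
print(f"Mean of the sample means:        {mean(sample_means):.2f}")
print(f"Standard deviation of the means: {stdev(sample_means):.2f}")
print(f"sigma / sqrt(n):                 {sigma / n**0.5:.2f}")
```

The sample means cluster around the population mean, and their spread is close to σ/√n, the standard error that appears in the confidence interval formulas.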
The measure of central location for a continuous variable is the mean; the measure of central
location for a categorical variable is the proportion. The concept is very similar to that
described in the previous section. A sampling distribution shows the relationship of the
sample proportion (p) and its population parameter (π). From this, the level of confidence in
estimating the population parameter from a single sample proportion can be established.
Often, two sample means (𝑥̅ 1 – 𝑥̅ 2) are used to measure the difference between two population
means (µ1 - µ2), e.g. the difference in the mean turnover of a sample of Gauteng shoe stores
(𝑥̅ 1) and a sample of Western Province shoe stores (𝑥̅ 2) would be used to estimate the
difference between the mean turnover of all the shoe stores in Gauteng (µ1) and all the shoe
stores in Western Province (µ2).
Often, two sample proportions (p1 – p2) are used to measure the difference between two
populations (π1 - π2), e.g. the difference in the proportion of male shoppers in a Gauteng golf
equipment store and a Western Province golf equipment store would be compared to the
proportion of men in all the golf equipment stores in Gauteng (π1) and all the golf equipment
stores in Western Province (π2).
1. A survey was conducted by the canteen management to identify the bread preferences
of their customers at a local college. This information will be used to help redesign the
menu to be more appealing to the students. The survey produced the following results:
50 eat white untoasted bread
12 eat white and brown untoasted bread, and brown toasted bread
2. A clothing store distributes flyers on a certain road in the Eastern Cape. There are, on
average, 4.5 sales per 10 flyers handed out.
a. What type of probability distribution is this? (1)
c. Determine the probability that there will be more than 2 sales for every 10 flyers
handed out. (8)
d. What does this information tell us about the effectiveness of this method of
advertising? (1)
3. In the same clothing store there is a 70% chance that a customer will be under the age of
50.
a. What type of probability distribution is this? (1)
b. If 10 customers are randomly surveyed, what is the probability that less than 3
customers will be over 50 years old? (8)
c. If 15 customers are randomly surveyed, what is the probability that more than 2
customers will be under 50 years old? (8)
d. What does this information tell us about customers of this store? (1)
1.
a. 266 students
b. White untoasted bread: (50 + 30 + 24 + 12) / 266 = 116 / 266 = 43.61%
c. Brown bread: (60 + 72 + 30 + 24 + 18 + 12) / 266 = 216 / 266 = 81.20%
d. Untoasted bread: (50 + 60 + 30 + 18 + 24 + 12) / 266 = 194 / 266 = 72.93%
e. 81% of the respondents prefer brown bread and 72.93% of respondents prefer
untoasted bread, therefore the menu needs to be adjusted to have more brown
untoasted options.
2.
a. Poisson Probability distribution
b. n = 10
d. This is an effective way of marketing as there is a high probability (82.64%) that there
will be more than 2 sales for every 10 flyers handed out.
3.
a. Binomial Probability distribution
b. n = 10
x = 0, 1, 2
p = 0.7 = probability of a customer under the age of 50
1 – p = 1 – 0.7 = 0.3 = probability of a customer over the age of 50
Did you complete all the relevant revision exercises and check your answers
against the answers provided?
Let’s recap what the relevant module learning outcome is for this study unit:
In most cases data are gathered from a sample, rather than from a population. The student
should be able to understand the different methods available for choosing an appropriate
sample. These methods are discussed in this study unit:
The sampling distribution describes the relationship between the sample statistic and
the population parameter.
At the end of this chapter, you should understand the concept of confidence intervals, as well
as be able to calculate and interpret them for use in marketing decision-making.
1. Introduction
Confidence intervals and point estimations enable us to determine the ‘accuracy’ of our
measured sample statistic in terms of the true population parameter.
2. Point estimation
A point estimate is a single sample statistic (e.g. the sample mean or sample proportion) that
is used directly as the estimate of the population parameter, without any stated degree of
confidence. In other words, the mean or proportion of the sample surveyed is generalised for
the population under study. The point estimate is highly unlikely to equal the true population
parameter exactly and is therefore seldom used on its own.
Remember that you are using sample statistics in order to make certain inferences about the
population. A confidence interval estimate is a range of values within which the population
parameter is expected to lie. The key word is expected and therefore you calculate the
The width of the confidence interval is dependent on various aspects. The narrower the
confidence interval, the more precise the interval estimate.
The level of confidence is expressed by the z-values (z-limits) from a standard normal
probability distribution.
The z-limits identify the number of standard errors on either side of the sample mean (the
point estimate) that the confidence limits must span in order to cover the true population
mean with the chosen probability, e.g. z = +/- 1.96 for 95% confidence.
Greater confidence is associated with lower precision (a wider range within which an
acceptable answer may fall), and vice versa.
In order to calculate the confidence interval estimate for the population mean you require
three statistical measures. You do not have to memorise the formula, but make sure you are
able to use the formula in order to calculate the limits and that you are able to interpret the
results.
$$\bar{x} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}}$$
Example from Wegner (2020): A survey of a random sample of 300 grocery shoppers in
Kimberley found that the mean value of their grocery purchases was R 78. Assume that the
population standard deviation of grocery purchases is R 21.
Find the 95% confidence limits for the average value of a grocery purchase by all grocery
shoppers in Kimberley.
Solution:
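Calculation: $$\bar{x} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}}$$
$$= 78 - 1.96\frac{21}{\sqrt{300}} \; ; \; 78 + 1.96\frac{21}{\sqrt{300}}$$
$$= 75.62 \; ; \; 80.38$$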
Interpretation: There is a 95% chance that the population parameter will fall between the
upper and lower limits calculated. I am 95% confident that the average amount spent by a
shopper in Kimberley is between R 75.62 and R 80.38.
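The Kimberley calculation can be reproduced with the sketch below. The function name `z_confidence_interval` is an assumption made for this illustration; the z-limit is obtained from `NormalDist` in Python's standard library rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence limits for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95% confidence
    margin = z * sigma / sqrt(n)                        # z-limit times the standard error
    return sample_mean - margin, sample_mean + margin

lower, upper = z_confidence_interval(sample_mean=78, sigma=21, n=300)
print(f"95% confidence limits: R {lower:.2f} to R {upper:.2f}")
```

Calling the function with `confidence=0.99` gives a wider interval for the same data, which illustrates the trade-off between confidence and precision.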
When the standard deviation of the population is unknown, interval estimation of the
population mean is based upon a probability distribution known as the t-distribution. The t-
distribution is known as a robust distribution as it can be used to provide satisfactory results
for many possible population distributions. This is the most common distribution used in
calculating confidence intervals.
The calculation of a confidence interval estimate is described for situations where the
population standard deviations are unknown and the sample size is small. In this case you
make use of the t-distribution in order to calculate the level of confidence. When the sample
size is greater than 40 you can use the z-limits as an approximation to the t-limits.
$$\bar{x} - t_{(n-1)}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{(n-1)}\frac{s}{\sqrt{n}}$$
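As the population standard deviation is not known, the z-values can't be used. The t-limits are
used instead. They depend on the degrees of freedom, which are found by using (n – 1). For
example, at the 95% level of confidence:
Sample size (n)   Degrees of freedom (n – 1)   t-limits
 6                 5                           +/- 2.571
11                10                           +/- 2.228
25                24                           +/- 2.064
41                40                           +/- 2.021
61                60                           +/- 2.000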
Example: A clothing store analysed the value of a random sample of 25 credit card purchases.
The sample mean was found to be R 170 and the sample standard deviation was R 22.
Set the 95% confidence limits for the actual mean value of credit card purchases made at this
store.
Solution:
n (sample size) = 25
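Calculation: $$\bar{x} - t_{(n-1)}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{(n-1)}\frac{s}{\sqrt{n}}$$
$$= 170 - 2.064\frac{22}{\sqrt{25}} \; ; \; 170 + 2.064\frac{22}{\sqrt{25}}$$
$$= 160.92 \; ; \; 179.08$$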
Interpretation: There is a 95% chance that the population parameter will fall between the
upper and lower limits calculated. I am 95% confident that the actual mean value of credit
card purchases at the store lies between R 160.92 and R 179.08.
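The credit card example can be checked with the sketch below. Python's standard library has no t-table, so the t-limit (2.064 for 24 degrees of freedom at 95% confidence) is passed in by hand. The function name `t_confidence_interval` is an assumption made for this illustration.

```python
from math import sqrt

def t_confidence_interval(sample_mean, s, n, t_limit):
    """Confidence limits for the population mean when sigma is unknown.

    t_limit must be read from a t-table using n - 1 degrees of freedom.
    """
    margin = t_limit * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# n = 25 gives 24 degrees of freedom; the 95% t-limit from the table is 2.064.
lower, upper = t_confidence_interval(sample_mean=170, s=22, n=25, t_limit=2.064)
print(f"95% confidence limits: R {lower:.2f} to R {upper:.2f}")
```

Using the z-limit of 1.96 instead of 2.064 would give a slightly narrower interval, which would overstate the precision of a sample of only 25 purchases.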
The statistical analyses you are going to use depend on the type of data you have sampled. If
you have sampled categorical data, the appropriate measure of central location is the
proportion. Again it is important that you understand the concept of a confidence interval
and correctly interpret the calculated interval.
For a small sample (t-limits): $$p - t\sqrt{\frac{pq}{n}} \le \pi \le p + t\sqrt{\frac{pq}{n}}$$
For a large sample (z-limits): $$p - z\sqrt{\frac{pq}{n}} \le \pi \le p + z\sqrt{\frac{pq}{n}}$$
Example: A recent survey of 240 randomly selected street vendors in Johannesburg showed
that 84 of them felt that local by-laws still hampered their trading (84 out of 240 = a
proportion of 0.35).
Find the 90% confidence limits for the true proportion, 𝜋, of all Johannesburg street vendors
who believe that local by-laws still hamper their trading.
p = 84 / 240 = 0.35
q = 1 − 0.35 = 0.65
z-value = +/- 1.645
Calculation: $$p - z\sqrt{\frac{pq}{n}} \le \pi \le p + z\sqrt{\frac{pq}{n}}$$
$$= 0.35 - 1.645\sqrt{\frac{0.35(1 - 0.35)}{240}} \; ; \; 0.35 + 1.645\sqrt{\frac{0.35(1 - 0.35)}{240}}$$
$$= 0.299 \; ; \; 0.401$$
Interpretation: There is a 90% chance that the population parameter will fall between the
upper and lower limits calculated. I am 90% confident that the true proportion of Johannesburg
street vendors who believe local by-laws hamper their trading is between 29.9% and 40.1%.
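A short sketch of the street vendor calculation follows. It uses the large-sample (z-limit) version of the formula; the function name `proportion_confidence_interval` is an assumption made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence):
    """Large-sample (z-limit) confidence limits for a population proportion."""
    p = successes / n                                    # sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.645 for 90% confidence
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

lower, upper = proportion_confidence_interval(successes=84, n=240, confidence=0.90)
print(f"90% confidence limits: {lower:.3f} to {upper:.3f}")
```

Passing the raw counts (84 out of 240) rather than the rounded proportion avoids carrying rounding errors into the limits.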
1. Introduction
Hypothesis tests are often used to validate claims about products or about statistics. This
section describes the hypothesis tests that can be conducted on the four different population
parameters discussed in the previous two sections.
Formulation of the null hypothesis (𝐻0 ) (and, conversely, the alternative hypothesis) is the
key element of this process. It is important that you fully understand and be able to clearly
formulate and articulate null hypotheses. Basically, the null hypothesis states that the
difference between the hypothesised population parameter and the true population
parameter is zero (i.e. there is no difference). The alternative hypothesis (𝐻1 ) states that
there is a difference. However, it is important to establish whether this difference is a ‘greater
than’ difference, a ‘smaller than’ difference, or merely a difference in any direction. This will
determine which test you use (a one-sided or two-sided test).
Note that in hypothesis testing, it is always the null hypothesis that is tested, rather than the
alternative hypothesis. Once you have performed the calculation, you will either accept the
null hypothesis or reject it, based on where your sample statistic falls in relation to the area
of acceptance. It is incorrect to say that you accept the alternative hypothesis.
Statistical conclusions can be reached using two approaches. Most statistical programmes
will produce a p-value (approach two) for any test of significance. Therefore it is important
that you understand how to make use of approach two, the p-value method. For
example, if a hypothesis test produces a p-value of less than 0.05 (at a 5% level of
significance, i.e. a 95% level of confidence) then it can be said the difference is statistically
significant and the null hypothesis is rejected. A p-value of 0.05 or greater means that the
difference is not statistically significant and the null hypothesis is accepted.
Make sure you are able to apply the five steps of hypothesis testing to any example.
Understand the interpretation of the p-value.
$$z_{calc} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
Example: The Grocery Retailers Association of South Africa (GRASA) believes that the average
amount spent on groceries by Cape Town shoppers on each visit to the supermarket is R 175.
To test this belief, the association commissioned Market Research e-Africa to conduct a
survey among a random sample of 360 grocery shoppers at supermarkets in Cape Town.
Based on the survey, the average value of grocery purchases was R 182.40. Assume that the
population of grocery purchase values is normally distributed with a standard deviation of R
67.50. Can GRASA conclude that grocery shoppers spend R 175, on average, on each visit to
a supermarket? Conduct a test at the 5% level of significance.
Solution:
1) Hypotheses:
H0: There is no difference between the population parameter and the sample statistic; 𝜇
= R 175
H1: There is a difference between the population parameter and the sample statistic; 𝜇
≠ R 175
2) Region of acceptance:
Accept if the value falls between -1.96 and 1.96 (two-tailed test); reject if it is greater
than 1.96 or less than -1.96.
3) Calculation: $$z = \frac{182.4 - 175}{67.5 / \sqrt{360}} = \frac{7.4}{3.558} = 2.08$$
4) Compare value with region of acceptance:
Reject the Null hypothesis as 2.08 falls outside the area of acceptance (-1.96 to 1.96).
5) Conclusion:
This shows that the average amount spent by grocery shoppers in Cape Town is not R175,
and that GRASA’s claims cannot be supported based on the sample results.
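The GRASA test can be sketched in Python as shown below, covering both the region-of-acceptance approach and the p-value approach. The function name `one_sample_z_test` is an assumption made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu0 when sigma is known."""
    z_calc = (sample_mean - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)           # about 1.96 for alpha = 5%
    p_value = 2 * (1 - NormalDist().cdf(abs(z_calc)))      # area in both tails beyond z_calc
    return z_calc, z_crit, p_value

z_calc, z_crit, p_value = one_sample_z_test(sample_mean=182.40, mu0=175, sigma=67.50, n=360)
decision = "reject H0" if abs(z_calc) > z_crit else "accept H0"

print(f"z-calc = {z_calc:.2f}, z-limits = +/- {z_crit:.2f}, p-value = {p_value:.4f}")
print(decision)
```

Both approaches lead to the same decision: the calculated z-value falls outside the z-limits, and the p-value is smaller than the 5% level of significance.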
Apply the five steps of hypothesis testing to examples if the population standard deviation is
unknown. When this is the case, you replace the population standard deviation with the
sample standard deviation. When the population standard deviation is unknown and the
sample size is less than or equal to 40 then you always make use of the t-statistic.
$$t_{calc} = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
Example adapted from Wegner (2020): SARS officials think that on average it takes 45
minutes or less for the typical South African to complete their tax return via e-filing. To test
this, SARS conducted a study on 12 tax-paying South Africans to assess how long the e-filing
system takes. They found that the average completion time was 41.5 minutes, with a sample
standard deviation of 9.04 minutes. Can SARS conclude that taxpayers take, on average, 45
minutes or less to complete their tax returns via e-filing? Conduct a test at the 5% level of
significance.
Solution:
1) Hypotheses:
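H0: The mean time taken to complete a tax return via e-filing is 45 minutes or more; 𝜇 ≥ 45
minutes.
H1: The mean time taken is less than 45 minutes, as SARS believes; 𝜇 < 45
minutes.
2) Region of acceptance:
α = 5%; degrees of freedom = n – 1 = 11; t-limit (one-tailed) = -1.796
Accept if the value falls at or above -1.796 (one-tailed test); reject if less than.
3) Calculation: $$t = \frac{41.5 - 45}{9.04 / \sqrt{12}} = \frac{-3.5}{2.6096} = -1.341$$
4) Compare value with region of acceptance:
Accept the Null hypothesis as -1.341 falls within the area of acceptance (at or above -1.796).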
5) Conclusion:
This survey revealed that the average amount of time taken to complete a tax return via
e-filing is not significantly less than 45 minutes. Thus SARS’ claim cannot be supported at the
5% level of significance.
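The SARS calculation can be sketched as follows. Python's standard library does not provide t-limits, so the one-tailed 5% t-limit for 11 degrees of freedom (-1.796, read from a standard t-table) is entered by hand; this value and the function name `t_statistic` are assumptions of this illustration.

```python
from math import sqrt

def t_statistic(sample_mean, mu0, s, n):
    """Calculated t-value for a one-sample test when sigma is unknown."""
    return (sample_mean - mu0) / (s / sqrt(n))

t_calc = t_statistic(sample_mean=41.5, mu0=45, s=9.04, n=12)

# One-tailed (lower-tail) t-limit for alpha = 5% and n - 1 = 11 degrees of freedom,
# taken from a standard t-table.
t_crit = -1.796

decision = "reject H0" if t_calc < t_crit else "accept H0"
print(f"t-calc = {t_calc:.3f}, t-limit = {t_crit}")
print(decision)
```

Because the test is one-tailed, only the lower tail matters: the null hypothesis would be rejected only if the calculated t-value fell below the t-limit.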
Follow the example in the case when a claim is made about the central value of a categorical
variable. In this case we refer to this measurement as a sample proportion. You will notice
that the same five steps are followed as in the case of hypothesis testing for the mean.
Example from Wegner (2020): A mobile phone service provider, Cell D Mobile, claims that it
has 15% of the prepaid mobile phone market. A competitor, who commissioned a market
research company to conduct a survey amongst prepaid mobile phone users, challenged this
claim. The market research company randomly sampled 360 prepaid mobile users and found
that 42 users subscribe to Cell D Mobile.
Test, at a 1% level of significance, Cell D Mobile’s claim that they have 15% share of the
prepaid market.
Solution:
1) Hypotheses:
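H0: 𝜋 = 0.15; Cell D Mobile holds 15% of the prepaid market
H1: 𝜋 ≠ 0.15; Cell D Mobile does not hold 15% of the prepaid market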
2) Region of acceptance:
α = 1%; z-limits = +/- 2.58
Accept if the value falls between -2.58 and 2.58; reject if greater than 2.58 or less than -2.58.
3) Calculation: p = 42 / 360 = 0.1167
$$z = \frac{0.1167 - 0.15}{\sqrt{\frac{0.15(1 - 0.15)}{360}}} = \frac{-0.0333}{0.0188} = -1.771$$
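4) Compare value with region of acceptance:
Accept the Null hypothesis as -1.771 falls within the area of acceptance (-2.58 to 2.58).
5) Conclusion:
The sample does not provide enough evidence, at a 1% level of significance, to reject Cell D
Mobile’s claim that they hold 15% of the prepaid market.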
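The Cell D Mobile test can be sketched in the same way. The function name `one_sample_proportion_test` is an assumption made for this illustration; note that the standard error is based on the hypothesised proportion (0.15), as in the calculation above.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_test(successes, n, pi0, alpha):
    """Two-tailed z-test of H0: pi = pi0 for a single proportion."""
    p = successes / n                              # sample proportion
    standard_error = sqrt(pi0 * (1 - pi0) / n)     # uses the hypothesised proportion
    z_calc = (p - pi0) / standard_error
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 2.58 for alpha = 1%
    return z_calc, z_crit

z_calc, z_crit = one_sample_proportion_test(successes=42, n=360, pi0=0.15, alpha=0.01)
decision = "reject H0" if abs(z_calc) > z_crit else "accept H0"

print(f"z-calc = {z_calc:.2f}, z-limits = +/- {z_crit:.2f}")
print(decision)
```

Accepting the null hypothesis means only that the sample gives too little evidence against the 15% claim; it does not prove that the market share is exactly 15%.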
After studying this section you should be able to perform hypothesis tests for two population
problems when different kinds of data are surveyed.
1. Introduction
In this chapter you make use of exactly the same principles and procedures you have used in
the previous section, except now you apply them to comparing parameters of two populations.
You have to know the five steps of hypothesis testing; the main difference in this
section is the test statistics that are calculated in order to test the null hypothesis.
These kinds of hypotheses are executed to make decisions regarding real marketing
problems, such as:
There are different assumptions that you need to be aware of when performing these kinds
of hypothesis tests. Work through the given examples and make sure you are able to perform
the steps. If you look at all the formulas and calculations, you may get discouraged, but
remember that all of the tests follow the same five steps. The greatest challenge for you is to
know what the applicable assumptions are for that specific problem and which test statistic
to choose in order to perform the hypothesis test. It may benefit you to use a flow diagram
with all the different combination parameters and assumptions with the relevant test statistic
to use.
$$z_{calc} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Example from Wegner (2020): The marketing manager of PQ Printers wants to be sure that they
are offering their customers the best possible service, and since part of their marketing
offering is a “speedy same-day delivery”, the marketing manager wants to be sure they are
using the quickest couriers to deliver to their customers. PQ Printers currently uses Courier A,
but wants to compare their delivery times to Courier B. PQ Printers assessed the delivery times
of the last 60 times that they had used Courier A to deliver, and found that the sample mean
delivery time was 42 minutes, with a population standard deviation of 14 minutes. Courier B
was tested 48 times, and it was recorded that the sample mean delivery time was 38 minutes,
with a population standard deviation of 10 minutes. Test, at a 5% level of significance, the
claim that there is no difference between the mean delivery times of the two companies.
Solution:
1) Hypotheses:
H0: 𝜇1 − 𝜇2 = 0; the delivery companies are the same
H1: 𝜇1 − 𝜇2 ≠ 0; the delivery companies are not the same
2) Region of acceptance:
Accept if the value falls between - 1.96 and + 1.96; reject if greater than 1.96, or less than
-1.96.
5) Conclusion:
This means that the marketing manager of PQ Printers can safely continue to use Courier A,
as there is not enough evidence to suggest that Courier B delivers any faster, at a 5% level of
significance.
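The calculation and comparison steps for the courier example are not written out above, so the sketch below shows one way to carry them out with the two-sample z formula. The function name `two_sample_z_test` is an assumption made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(mean1, sigma1, n1, mean2, sigma2, n2, alpha=0.05):
    """Two-tailed z-test of H0: mu1 - mu2 = 0 when both sigmas are known."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z_calc = (mean1 - mean2) / standard_error   # (mu1 - mu2) is 0 under H0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_calc, z_crit

# Courier A: 60 deliveries, mean 42 minutes, sigma 14 minutes.
# Courier B: 48 deliveries, mean 38 minutes, sigma 10 minutes.
z_calc, z_crit = two_sample_z_test(42, 14, 60, 38, 10, 48)
decision = "reject H0" if abs(z_calc) > z_crit else "accept H0"

print(f"z-calc = {z_calc:.2f}, z-limits = +/- {z_crit:.2f}")
print(decision)
```

The calculated z-value falls inside the region of acceptance, which agrees with the conclusion above that there is not enough evidence of a difference between the two couriers.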
$$z_{calc} = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\hat{\pi}(1 - \hat{\pi})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Example from Wegner (2020): After a recent AIDS awareness campaign, the Department of
National Health commissioned a market research company to conduct a survey on its
effectiveness. Their brief was to establish whether the recall rate of teenagers differed from
that of young adults. The market research company interviewed a random sample of 640
teenagers and 420 young adults. It was found that 362 teenagers and 260 young adults were
able to recall the AIDS awareness slogan. Test, at the 5% level of significance, the hypothesis
that there is an equal recall rate between teenagers and young adults.
1) Hypotheses:
H0: 𝜋1 − 𝜋2 = 0; the recall rates are equal
H1: 𝜋1 − 𝜋2 ≠ 0; the recall rates are not equal
2) Region of acceptance:
Accept the Null hypothesis as the calculated value of -1.73 falls within the area of
acceptance (-1.96 to 1.96).
5) Conclusion:
This means that at a 5% level of significance, there is no significant difference between the
recall rates of teenagers and young adults regarding the AIDS awareness slogan.
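The two-proportion formula can be applied to the recall data as sketched below. The sketch assumes that the pooled proportion is the total number of people who recalled the slogan divided by the total number interviewed; the function name `two_proportion_z_test` is also an assumption made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """Two-tailed z-test of H0: pi1 - pi2 = 0 using a pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # assumed definition of the pooled proportion
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z_calc = (p1 - p2) / standard_error
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_calc, z_crit

# 362 of 640 teenagers and 260 of 420 young adults recalled the slogan.
z_calc, z_crit = two_proportion_z_test(362, 640, 260, 420)
decision = "reject H0" if abs(z_calc) > z_crit else "accept H0"

print(f"z-calc = {z_calc:.2f}, z-limits = +/- {z_crit:.2f}")
print(decision)
```

The calculated value falls between the z-limits, so the decision is to accept the null hypothesis of equal recall rates.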
1. Introduction
One often ends up with a table with ‘counts’ of a value, such as the number of males and
females who prefer Willards or Simba potato chips. This information would be displayed in a
cross-tabulation, i.e. a table with two rows and two columns of numbers. The chi-square
If two variables are independent, then there is no association between them. If they are not
independent and there is some relationship between them, then the next step in the analysis
is to study the nature of the relationship. The chi-square statistic can be used to determine if
there is an association. This section describes the test and the five steps to be followed. Two
assumptions are made about the data, namely, a simple random sample of size n has been
selected from a large population, and the sample size is reasonably large.
This test is used mainly to confirm the normality of a data set (i.e. are the data normally
distributed?) but can be used to confirm any underlying probability model. The observed
frequency distribution is compared to the expected frequency distribution. Again, the steps
in the procedure are described with the aid of examples.
Note: The chi-square approximation of the test statistic for the goodness-of-fit test is, strictly
speaking, only applicable if the number of sample observations is large and all the expected
frequencies are at least five.
$$\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$$
𝑓𝑜 is the observed frequency
𝑓𝑒 is the expected frequency
Example:
Metrorail Commuter Service is studying the daily commuting patterns of workers into
the central business district of Cape Town. A study conducted seven years ago found
that 40% of commuters used trains, 25% used cars, 20% used taxis and 15% used buses.
1) Hypotheses:
H0: 𝒇𝒐 = 𝒇𝒆; commuting patterns today are the same as they were seven years ago.
2) Region of acceptance:
Accept if the value falls at or below 7.815; reject if higher than 7.815.
If the critical value is not given, use the chi-square table with the degrees of freedom:
df = k – m – 1 = 4 (train, car, taxi, bus) – 0 – 1 = 3
α = 5%; critical value = 7.815.
3) Calculation: χ²-calc = 9.956
4) Compare value with region of acceptance:
Reject the Null hypothesis as 9.956 falls outside the area of acceptance (at or below 7.815)
5) Conclusion:
The results of the research show that there has been a change in commuting patterns and
Metrorail should take the new trends into consideration.
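The mechanics of the goodness-of-fit calculation can be sketched as follows. The observed counts in this sketch are HYPOTHETICAL (the survey data for the Metrorail example is not reproduced here), so the result will not match the 9.956 quoted above; only the expected percentages and the critical value of 7.815 come from the example.

```python
def chi_square_statistic(observed, expected_proportions):
    """Sum of (fo - fe)^2 / fe over all categories."""
    total = sum(observed)
    expected = [proportion * total for proportion in expected_proportions]
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Expected pattern from seven years ago: train, car, taxi, bus.
expected_proportions = [0.40, 0.25, 0.20, 0.15]

# HYPOTHETICAL observed counts for 400 commuters, invented for this illustration.
observed = [150, 120, 70, 60]

chi_calc = chi_square_statistic(observed, expected_proportions)
chi_crit = 7.815  # critical value for alpha = 5% and df = 3, as given in the example

decision = "reject H0" if chi_calc > chi_crit else "accept H0"
print(f"chi-square calc = {chi_calc:.3f}, critical value = {chi_crit}")
print(decision)
```

With these invented counts the calculated value stays below the critical value, so the null hypothesis would be accepted. In the Metrorail example the calculated value of 9.956 exceeds 7.815, so the null hypothesis is rejected.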
1. A large South African general dealer hired a marketing research company to research
the effectiveness of introducing a new drill at a discounted price. A representative sample
of 120 stores in the chain was chosen, and the stores were randomly split in two equal
groups of 60 stores. These stores did not advertise, and displayed their merchandise in
similar ways. A new kind of drill was introduced in all 120 stores. Group A introduced the
drill at the special low price of R 599, with the price increasing to R 649 after two weeks.
Group B introduced the drill at the regular price of R 649. Total sales of the drills were
computed for each store for the first two weeks; the results are given below.
Group               Mean sales   Coefficient of variation
Group A (R 599)     50           5.8%
Group B (R 649)     66           15.5%
Difference          -16
a. Define “population” as it relates to marketing research, and state what the population
of this study would be. (2)
b. Define “sample” as it relates to marketing research, and state what the sample of this
study consists of. (2)
g. Calculate the confidence limits for the actual mean sales of the new drills under group
A’s discounted price of R 599 (4)
h. Interpret the above answer in terms of the marketing research conducted (2)
i. If you were a store manager for one of the group A stores, what inventory level of the
new drills would you stock? Explain your answer (2)
a. Population – the collection of all the observations of a random variable under study.
In this study the population would be all stores in the chain selling the new drills.
b. Sample – a representative subset of the population on which observations are made.
In this study the sample would be the 120 stores chosen to be in the study.
c. Variable under study – the selling price of the new drills
d. Standard deviation (CV x Mean):
Group A: 5.8% x 50 = 2.9
Group B: 15.5% x 66 = 10.23
e. 95% confident
f. Critical value for 95% = 1.96
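g. $$\bar{x} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}}$$
$$50 - 1.96\frac{2.9}{\sqrt{60}} \le \mu \le 50 + 1.96\frac{2.9}{\sqrt{60}}$$
$$49.27 \le \mu \le 50.73$$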
Time to do a progress check to determine whether you have gone through all the required
content, completed all the exercises
Did you complete all the relevant revision exercises and check your answers
against the answers provided?
A critical aspect of managing any organisation is planning for the future, and developing
appropriate strategies. Although good judgement and intuition are invaluable when making
decisions, there are several statistical methods that can help predict many future aspects of
a business operation.
Shipham (2012) comments that modern tools such as “key performance indicators” can be
used to support the identification of problems and opportunities in the marketing
environment. Figures such as product sales, by unit and value, and customer databases with
information like who customers are and what types of products they are interested in, allow
an assessment of trends. These trends can pinpoint a need for research and management
intervention.
This study unit explains some of the methods. Essentially, these are the issues that you will
be facing in the ‘real world’. Everything that you have learnt so far is applied to the concepts
in these three last chapters.
Let’s recap what the relevant module learning outcome is for this study unit:
Use statistical methods to predict future aspects of a business operation.
At the end of the first section of this study unit, the student should:
At the end of the second section of this study unit, the student should:
Understand the purpose of index numbers and be able to calculate different indices.
Be able to distinguish between different weighting methods, discuss pitfalls of index
number construction, and revise the base period of a series of index numbers.
Be able to interpret the results and make marketing decisions based on them.
At the end of the final section of this study unit, the student should:
Be able to identify the components of a time series, compute the trend and seasonal
influence in a time series, de-seasonalise a time series, and forecast future values.
Be able to analyse this information to make good marketing decisions.
1. Introduction
Regression analysis is merely a way of looking at the relationship between variables, and is
used when two or more variables are involved. For example, while income levels often
determine the brands a market segment purchases, and while age usually also determines
the brands a market segment purchases, it is difficult to picture what their independent
contributions are. This is where linear regression becomes a useful tool. If one can quantify
the contribution of income and age (the independent variables) to brand purchased (the
dependent variable), by constructing a linear model, then one can predict the brand
purchases of someone if his or her income level and age are known. However, this section
addresses regression analysis using only one independent variable as a predictor of a
dependent variable. The strength of the linear association between these two variables is
known as correlation.
Regression analysis is far more complex than is presented in this chapter; most
textbooks deal with it comprehensively if you wish to learn more about it.
As regression analysis aims to produce a linear model, the regression line is the straight line
that best fits the data. The sample variance around the regression line is
a measure of the spread of the observed y values around that line (refer to Study Unit
2). The mathematical method of determining the regression line is known as the least
squares method. We are fortunate to live in a period of technological advancement,
and can rely on statistical software to perform the calculations for us. You will nevertheless
have to be able to calculate the coefficients of the regression line using the least squares method.
Extrapolation is the process of estimating values of y using values of x which lie outside
the range of x values used in the construction of the estimated regression line.
Extrapolation can lead to unreliable or meaningless results.
If the x values used to construct the regression line lie between 8 and 15, then 8≤x≤15.
This means that you should only estimate y for values of x that fall within that range. Estimates
for values of x outside that range may not be reliable.
Example: estimate the amount of electricity generated based on tonnes of coal usage.
Use the range 8≤x≤15.
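$$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
$$b_0 = \frac{\sum y - b_1 \sum x}{n}$$
$$y = a + b(x)$$
where a = b0 (the intercept) and b = b1 (the slope).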
Example Wegner (2020): Music Technologies, an electronics retail company in Durban, has
recorded the number of flat-screen TVs sold each week and the number of advertisements
placed weekly for a period of 12 weeks.
Ads (x)        4  4  3  2  5  2  4  3  5  5  3  4
TVs sold (y)   26 28 24 18 35 24 36 25 31 37 30 32
For a class project, you are required to calculate the linear regression equation:
Solution:
x       y      xy     x²
4       26     104    16
4       28     112    16
3       24     72     9
2       18     36     4
5       35     175    25
2       24     48     4
4       36     144    16
3       25     75     9
5       31     155    25
5       37     185    25
3       30     90     9
4       32     128    16
Total:
44      346    1324   174
$$b = \frac{(12)(1324) - (44)(346)}{(12)(174) - (44)^2} = \frac{664}{152} = 4.368 \approx 4.37$$
$$a = \frac{346 - 4.368(44)}{12} = 12.817 \approx 12.82$$
$$y = 12.82 + 4.37x$$
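Using the results from the project, predict how many flat-screen TVs would be sold if 10 ads
were placed:
$$x = 10$$
$$y = 12.82 + 4.37(10) = 56.52$$
Approximately 57 TVs would be sold. Note that x = 10 lies outside the range of the observed
data (2 to 5 ads), so this estimate is an extrapolation and should be treated with caution.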
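The least squares calculation for the Music Technologies example can be reproduced in Python as a check on the manual working. This is a minimal sketch; the function name `least_squares` and the variable names are illustrative assumptions, and the code simply applies the two coefficient formulas given earlier in this section.

```python
def least_squares(x, y):
    """Return (a, b) for the line y = a + b*x using the least squares formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

# Weekly data from the example: ads placed (x) and flat-screen TVs sold (y)
ads = [4, 4, 3, 2, 5, 2, 4, 3, 5, 5, 3, 4]
tvs = [26, 28, 24, 18, 35, 24, 36, 25, 31, 37, 30, 32]

a, b = least_squares(ads, tvs)
print(f"y = {a:.2f} + {b:.2f}x")

# Predict sales for 10 ads and flag whether this is an extrapolation
x_new = 10
is_extrapolation = not (min(ads) <= x_new <= max(ads))
prediction = a + b * x_new
print(f"Predicted TVs: {prediction:.2f} (extrapolation: {is_extrapolation})")
```

With unrounded coefficients the prediction is 56.5; the 56.52 in the worked answer comes from rounding a and b to two decimals before substituting.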
3. Correlation analysis
The correlation coefficient measures the strength of the linear association between two
quantitative variables (but does not indicate that one causes the other). It is denoted by r,
which can take values from -1 to +1. A value of -1 or +1 denotes perfect correlation (negative
and positive, respectively); a value of 0 denotes no linear correlation.
[Figure: two scatter plots against daily temperature (horizontal axis, 0 to 40). One panel shows ice cream sales and the other hot chocolate sales (vertical axes, 0 to 1000).]
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2] \times [n\sum y^2 - (\sum y)^2]}}$$
Example: Find the correlation between daily temperature and ice cream sales.
(x) Temp 20 25 24 27 22 28 28 25 19
(y) Sales 50 60 55 69 52 82 78 57 45
Solution:
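(x)     (y)     (x²)    xy      (y²)
20      50      400     1000    2500
25      60      625     1500    3600
24      55      576     1320    3025
27      69      729     1863    4761
22      52      484     1144    2704
28      82      784     2296    6724
28      78      784     2184    6084
25      57      625     1425    3249
19      45      361     855     2025
Total:
218     548     5368    13587   34672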
$$r = \frac{9(13587) - (218)(548)}{\sqrt{[9(5368) - (218)^2] \times [9(34672) - (548)^2]}} = \frac{2819}{3042.0835} = 0.92667 \approx 0.93$$
Interpretation: There is a strong positive linear relationship between the temperature and
ice cream sales (0.92667 is almost +1). This means that warmer days tend to go with higher
sales, although correlation alone does not show that temperature causes the sales. Store
owners can use weather forecasts to anticipate sales and order stock accordingly. It may also
suggest scope to adjust ice cream prices in warmer weather.
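The correlation coefficient for the temperature and ice cream sales data can be verified with a short Python sketch. The function name `pearson_r` is an illustrative choice; the code applies the formula for r given above and nothing more.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient r computed from the sums used in the formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data from the example: daily temperature (x) and ice cream sales (y)
temperature = [20, 25, 24, 27, 22, 28, 28, 25, 19]
sales = [50, 60, 55, 69, 52, 82, 78, 57, 45]

r = pearson_r(temperature, sales)
print(f"r = {r:.5f}")
```

A value of r close to +1 or -1 describes the strength of the linear association only; it does not by itself establish cause and effect.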
1. Introduction
Price changes have a direct effect on the daily living expenses of every person. Everyone has
to make provision for price increases due to inflation. The rising cost of living leads to a
demand for higher salaries, which increases production costs, which in turn increases prices
(a vicious circle, isn’t it?). Indices that track changes in prices are known as price indices.
Likewise, indices can be calculated for changes in other things, such as the
availability of electricity (which we see through load shedding) or the number of residents in
a town. These are known as quantity indices. The aim of calculating the various types of
indices is to monitor price and quantity changes over time.
Note: The plural of index is indices, although Wegner (2020) uses the word indexes.
2. Price indices
$$\text{Price index} = \frac{p_1}{p_0} \times 100$$
p1 = more recent price
p0 = older price
Example: In January 2017, a 330 ml can of SuperCola sells for R 9.50. In January 2018, a 330 ml
can of SuperCola sells for R 12.50.
Solution: $$\frac{12.50}{9.50} \times 100 = 131.58\% \approx 132\%$$
Interpretation: There was a 32% increase in the price of a 330 ml can of SuperCola over the
12 month period.
$$\text{Laspeyres price index} = \frac{\sum (p_1 \times q_0)}{\sum (p_0 \times q_0)} \times 100$$
p1 = more recent price
p0 = older price
q0 = older quantity
$$\text{Paasche price index} = \frac{\sum (p_1 \times q_1)}{\sum (p_0 \times q_1)} \times 100$$
p1 = more recent price
p0 = older price
q1 = more recent quantity
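The two weighted price index formulas above can be sketched in Python as follows. The basket used here is hypothetical (two items with made-up prices and quantities, not data from this study guide), and the function names are illustrative. The only difference between the two functions is which period's quantities are used as weights.

```python
def laspeyres_price_index(p0, p1, q0):
    """Weighted price index using base-period quantities (q0) as weights."""
    numerator = sum(p * q for p, q in zip(p1, q0))
    denominator = sum(p * q for p, q in zip(p0, q0))
    return numerator / denominator * 100

def paasche_price_index(p0, p1, q1):
    """Weighted price index using current-period quantities (q1) as weights."""
    numerator = sum(p * q for p, q in zip(p1, q1))
    denominator = sum(p * q for p, q in zip(p0, q1))
    return numerator / denominator * 100

# Hypothetical two-item basket (illustrative values only)
p0 = [2.00, 5.00]   # older prices
p1 = [3.00, 5.50]   # more recent prices
q0 = [10, 4]        # older quantities
q1 = [8, 5]         # more recent quantities

laspeyres = laspeyres_price_index(p0, p1, q0)   # 52 / 40 x 100
paasche = paasche_price_index(p0, p1, q1)       # 51.5 / 41 x 100
print(f"Laspeyres: {laspeyres:.2f}, Paasche: {paasche:.2f}")
```

In this hypothetical basket the two indices differ because buyers shifted towards the item whose price rose less, which lowers the Paasche figure relative to the Laspeyres figure.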
Example: as a marketing student, you are interested in how price increases have affected the
most basic items in your local grocery store. In order to get a better understanding of this,
you will calculate the Laspeyres and Paasche price indices.
p0 × q0    p1 × q0    p1 × q1    p0 × q1
72.15      77.70      84         78
Solution: Laspeyres price index: $$\frac{554.86}{511.81} \times 100 = 108.41\%$$
Interpretation: There was an 8% increase in the price of toiletry items between 2017 and
2018.
Solution: Paasche price index: $$\frac{478.94}{442.34} \times 100 = 108.27\%$$
Interpretation: There was an 8% increase in the price of toiletry items between 2017 and
2018.
3. Quantity indices
The concept of quantity indices is very similar to that of price indices, as are the methods used
for their calculation. The formulas given in the formula sheet in the examination are all price
index formulas. If you want to compare quantities rather than prices, you have only to replace
the p’s with q’s, and the q’s with p’s in the price index formulas to obtain the formulas for
quantity index numbers. Remember the subscripts (numbers) remain the same.
$$\text{Quantity index} = \frac{q_1}{q_0} \times 100$$
q1 = more recent quantity
q0 = older quantity
$$\text{Laspeyres quantity index} = \frac{\sum (q_1 \times p_0)}{\sum (q_0 \times p_0)} \times 100$$
q1 = more recent quantity
q0 = older quantity
p0 = older price
$$\text{Paasche quantity index} = \frac{\sum (q_1 \times p_1)}{\sum (q_0 \times p_1)} \times 100$$
q1 = more recent quantity
q0 = older quantity
p1 = more recent price
Example:
2016 2017
Regarding soap, what was the quantity change between 2016 and 2017?
Solution: $$\frac{40}{37} \times 100 = 108.11\%$$
Interpretation: There was an 8% increase in the quantity of soap sold in the 12 months.
72.15 77.70 84 78
Solution: Laspeyres quantity index: $$\frac{442.34}{511.81} \times 100 = 86.43\%$$
Interpretation: There was a 14% decrease in the quantity of toiletry items sold between 2016
and 2017. This may be due to the 8% increase in price.
Solution: Paasche quantity index: $$\frac{478.94}{554.86} \times 100 = 86.32\%$$
Interpretation: There was a 14% decrease in the quantity of toiletry items sold between 2016
and 2017. This may be due to the 8% increase in price.
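All four weighted indices for the toiletry basket are built from the same four column totals, which the short Python sketch below makes explicit. The totals are those quoted in the worked solutions; the variable names are illustrative. Note that the sum of q0 × p0 is the same number as the sum of p0 × q0.

```python
# Column totals quoted in the worked solutions for the toiletry basket
sum_p0q0 = 511.81
sum_p1q0 = 554.86
sum_p1q1 = 478.94
sum_p0q1 = 442.34

indices = {
    "Laspeyres price": sum_p1q0 / sum_p0q0 * 100,
    "Paasche price": sum_p1q1 / sum_p0q1 * 100,
    "Laspeyres quantity": sum_p0q1 / sum_p0q0 * 100,
    "Paasche quantity": sum_p1q1 / sum_p1q0 * 100,
}

for name, value in indices.items():
    change = value - 100
    direction = "increase" if change > 0 else "decrease"
    print(f"{name}: {value:.2f} ({abs(change):.0f}% {direction})")
```

Each index is read relative to 100: a value above 100 is an increase on the base period and a value below 100 is a decrease.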
1) The purpose of the index that is to be calculated - a clear understanding of the scope
of the proposed index is needed to determine the following factors.
3) What weighting method should be used – this reflects the relative importance of each
item.
4) Which base year should be selected - choice of the base year must also be carefully
considered. If we compare the price of motor cars in January 2006 with that in January
1998, 1998 is taken to be the base year as it is furthest in the past. However, this
choice of base year (furthest in the past) may not always be appropriate for a number
The base of a series of price indices can be shifted from one time period to another by
multiplying each price index by an adjustment factor:
$$\text{Adjustment factor} = \frac{100}{\text{Old index of the new base period}}$$
It is used for transforming monetary values into real values, relative to the base year.
Example Wegner (2020): Consider the following price index series with 2007 as the base year
(2007 = 100).
Year 2005 2006 2007 2008 2009 2010 2011
Price index 78 87 100 106 125 138 144
As a marketing student you have learnt about consumer price sensitivity and the impact it has
on demand. The financial manager of the company you work for wants to increase prices, and
the marketing manager has asked you for a report on the price index series using 2009 as the
base year, as this is the year that a new competitor entered the market.
Interpretation: The revised price index shows that since the new competitor entered the
market in 2009, prices have been increasing. An increase in the prices of the company’s
product needs to be consistent with, or below this, or customers will become sensitive to the
fact that your prices are increasing more than the competitor’s and demand may decrease.
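The revised index series for this example can be worked out by applying the adjustment factor to every value. The Python sketch below does this for the series given above, assuming the adjustment factor is 100 divided by the old index of the new base year (2009, where the old index is 125). The function name `rebase` is illustrative.

```python
def rebase(index_series, new_base_year):
    """Shift an index series so that new_base_year equals 100."""
    factor = 100 / index_series[new_base_year]
    return {year: round(value * factor, 1) for year, value in index_series.items()}

# Price index series from the example (2007 = 100)
series_2007_base = {2005: 78, 2006: 87, 2007: 100, 2008: 106,
                    2009: 125, 2010: 138, 2011: 144}

series_2009_base = rebase(series_2007_base, 2009)
for year, value in series_2009_base.items():
    print(year, value)
```

With 2009 as the base, the 2011 value of 115.2 shows that prices rose by about 15% in the two years after the competitor entered the market.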
1. Introduction
Suppose you are the marketing assistant of a B2B IT consulting company, and you are asked
to provide estimates of computer sales for the four quarters of 2018. Your estimates will
affect the number of orders, inventory policies, sales quotas, etc. It is essential that you
provide good estimates so that the management team can plan efficiently and effectively.
Poor planning will result in increased costs, and perhaps even the loss of your, and the other
staff members’, end of year bonuses. Good estimates can be calculated by reviewing past
sales figures, and the accompanying trends. Do sales peak in January, or at the beginning of
the tax year; do they fall in the third quarter? Such a review of historical data allows for better
predictions of future sales. This historical sales data is known as a time series. Specifically, a
Example: A shoe store supplies you with the following sales data:
Jan May Sep Jan May Sep Jan May Sep Jan
2014 2014 2014 2015 2015 2015 2016 2016 2016 2017
145 73 93 166 83 106 189 94 15 207
[Figure: line chart of the shoe store sales data, January 2014 to January 2017; vertical axis from 0 to 200.]
Interpretation:
Trend – overall increase in sales
Cyclical variations – period of steady growth
Seasonal variation – sales peak at January every year
Irregular variation – abnormal drop in sales during September 2016
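By de-seasonalising a time series, we remove the seasonal effects, which results in
a different set of figures. The multiplicative time series model is a very simple equation
that expresses each observed value as the product of the four components identified above:
$$y = T \times C \times S \times I$$
where T is the trend, C the cyclical variation, S the seasonal variation and I the irregular
variation.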
Much of the variation is often described by the trend and seasonal components of the time
series and these are the components that will be addressed in more detail.
4. Trend analysis
When we de-seasonalise a time series, we smooth out the curve so that the underlying trend
(for example, a linear trend) is easier to see. Two methods for isolating the trend in a time
series are the moving average method and the regression analysis method.
The moving average is obtained by replacing each observed value with the average of that
value and its neighbouring values. The concept is illustrated in the following diagram, where a
three-year moving average is calculated for a series of values collected during the period 2013 to 2017.
2013   y1
2014   y2   (y1 + y2 + y3)/3
2015   y3   (y2 + y3 + y4)/3
2016   y4   (y3 + y4 + y5)/3
2017   y5
We can use the de-seasonalised time series to identify trend.
Example Wegner (2020): The table below shows the number of fire insurance claims received
by an insurance company in each four-month period from 2008 to 2011. You have just been
hired as an intern in the marketing research department, and your first task is to comment on
the claims trend over the past four years.
Period P1 P2 P3 P1 P2 P3 P1 P2 P3 P1 P2 P3
Claims 7 3 5 9 7 9 12 4 10 13 9 10
Calculate a three-period moving average for the number of insurance claims received:
Year   Period   Claims   Three-period total   Moving average
2008   P1       7
       P2       3        7+3+5 = 15           15 / 3 = 5
       P3       5        3+5+9 = 17           17 / 3 = 5.67
2009   P1       9        5+9+7 = 21           21 / 3 = 7
       P2       7        9+7+9 = 25           25 / 3 = 8.33
       P3       9        7+9+12 = 28          28 / 3 = 9.33
2010   P1       12       9+12+4 = 25          25 / 3 = 8.33
       P2       4        12+4+10 = 26         26 / 3 = 8.67
       P3       10       4+10+13 = 27         27 / 3 = 9
2011   P1       13       10+13+9 = 32         32 / 3 = 10.67
       P2       9        13+9+10 = 32         32 / 3 = 10.67
       P3       10
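The three-period moving average for the claims data can also be calculated with a short Python sketch. The function name `centred_moving_average` is illustrative, and the sketch assumes an odd window so that each average lines up with the middle period, as in the table above.

```python
def centred_moving_average(values, window=3):
    """Moving average aligned with the middle period of each window.

    Assumes an odd window. Positions where the window does not fit
    (the first and last periods for window=3) are left as None.
    """
    half = window // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        result[i] = sum(values[i - half:i + half + 1]) / window
    return result

# Fire insurance claims per four-month period, 2008 P1 to 2011 P3
claims = [7, 3, 5, 9, 7, 9, 12, 4, 10, 13, 9, 10]

averages = centred_moving_average(claims)
for claim, average in zip(claims, averages):
    print(claim, "-" if average is None else f"{average:.2f}")
```

The first and last periods have no moving average because there is no earlier or later value to complete the window.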
Comment on the trend: there is an overall increase in the number of claims between 2008
and 2011.
A trend line can also be fitted using the least squares method (see section one of this study
unit on linear regression).
In this case, the dependent variable, y, is the actual time series and the independent variable,
x, is time. As the “name” of the time period is not a numeric value, each time period is
numbered. Two methods can be used to do this, namely, the sequential numbering method
and the zero-sum method.
Sales (y)   Year   Sequential numbering (x)   Zero-sum method (x)
578         2011   1                          -2
593         2012   2                          -1
620         2013   3                          0
647         2014   4                          1
671         2015   5                          2
                                              ∑x = 0
Now that there are numerical values for each time period, the data can be used in the normal
least squares regression formula:
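$$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
$$b_0 = \frac{\sum y - b_1 \sum x}{n}$$
$$y = a + b(x)$$
where a = b0 (the intercept) and b = b1 (the slope).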
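To see how the two numbering methods affect the trend line, the sales figures from the table above can be run through the least squares formulas in Python. This is a sketch under stated assumptions: the zero-sum codes are taken to be equally spaced values centred on zero (-2 to 2 for five periods), and the function name `least_squares` is illustrative.

```python
def least_squares(x, y):
    """Return (a, b) for the trend line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

sales = [578, 593, 620, 647, 671]   # 2011 to 2015
sequential = [1, 2, 3, 4, 5]        # sequential numbering
zero_sum = [-2, -1, 0, 1, 2]        # assumed zero-sum coding, sums to 0

a_seq, b_seq = least_squares(sequential, sales)
a_zero, b_zero = least_squares(zero_sum, sales)

# Forecast for 2016: the next period is x = 6 (sequential) or x = 3 (zero-sum)
forecast_seq = a_seq + b_seq * 6
forecast_zero = a_zero + b_zero * 3
print(f"Sequential: y = {a_seq:.1f} + {b_seq:.1f}x, forecast {forecast_seq:.1f}")
print(f"Zero-sum:   y = {a_zero:.1f} + {b_zero:.1f}x, forecast {forecast_zero:.1f}")
```

The slope is the same under both codings and only the intercept changes, so the forecast for any given year is identical. With the zero-sum coding the sum of x is zero, which simplifies the manual calculation: the intercept is simply the mean of y.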
5. Seasonal analysis
The technique described in this section can be applied to any time interval (weekly, monthly,
quarterly data, etc.). Calculating seasonal indices is the most important aspect of the analysis
of seasonal fluctuation. Using monthly data, for example, the seasonal index will contain 12
figures, one for each month of the year, and will be used as a measure of activity for each
month relative to the average activity over a year. The four steps used to calculate the 4-
point moving average for quarterly data are the same as for the trend analysis above.
2) Constructing forecasts of time series values, using the trend line and
incorporating the seasonal influence to predict how the trend will continue into
the future.
1. As the marketing assistant for a chain of One-stop coffee shops, you have been asked
to analyse the pricing of different coffee brands between 2013 and 2017. Consider
the table below, which gives prices of coffee over a 5-year period
Brand 2013 2014 2015 2016 2017
Ricoffy 10.59 12.99 13.99 16.59 17.99
Nescafe 5.29 6.99 7.59 7.99 9.29
Jacobs 11.99 12.49 12.99 13.49 16.68
Frisco 4.79 4.99 7.99 8.39 8.99
a. Discuss issues that should be considered before choosing the base year. (2)
b. Calculate the price index for each coffee, using 2015 as the base year. (20)
c. Comment on the results. (1)
2. The following data show the age (in years) and the selling price (in R1000) of used
cars with the same engine capacity and make at eight different second-hand car
dealers in Port Elizabeth. You are a market analyst of a well-known car magazine, and
are compiling an article on the following:
Age 1 6 4 2 5 4 1 2
Price 41.2 10.3 24.3 38.7 8.7 26.1 38.7 36.2
3. Greyhound suspects that there is a direct link between advertising expenditure and
the number of passengers who choose Greyhound. The expenditure on advertising
and the number of passengers who have ridden with Greyhound over the last 12
months are shown below:
4. The turnover figures for a bicycle manufacturer for a 10-year period are shown
below. You have been hired as a marketing consultant to assist the manufacturer
with the following:
a. Calculate the five-period moving average for this time series. (6)
b. Comment on turnover trend of the company for the 10-year period. (1)
1.
a. The base year should not be too far in the past as many products change, there
might be products discontinued and others newly introduced. The base year
should also be a period of economic and political stability.
b. $$\text{Price index} = \frac{p_1}{p_0} \times 100$$
Brand 2013 2014 2015 2016 2017
Ricoffy 76 93 100 119 129
Nescafe 70 92 100 105 122
Jacobs 92 96 100 104 128
Frisco 60 62 100 105 113
c. Ricoffy had the highest increase since 2015: its price increased by 29%
between 2015 and 2017.
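The index table for the coffee brands can be checked with the Python sketch below, which divides each price by the brand's 2015 price and multiplies by 100. The prices are those given in the question; the dictionary layout and variable names are illustrative choices.

```python
years = [2013, 2014, 2015, 2016, 2017]
prices = {
    "Ricoffy": [10.59, 12.99, 13.99, 16.59, 17.99],
    "Nescafe": [5.29, 6.99, 7.59, 7.99, 9.29],
    "Jacobs":  [11.99, 12.49, 12.99, 13.49, 16.68],
    "Frisco":  [4.79, 4.99, 7.99, 8.39, 8.99],
}

base_position = years.index(2015)   # 2015 is the base year (index = 100)

# Price index = price in given year / price in base year x 100
indices = {
    brand: [round(p / series[base_position] * 100) for p in series]
    for brand, series in prices.items()
}

for brand, values in indices.items():
    print(brand, values)
```

Reading across a row shows how each brand's price has moved relative to its own 2015 price; the indices do not compare price levels between brands.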
$$a = \frac{\sum y - b\sum x}{n}$$
$$a = \frac{224.2 - (-6.59 \times 25)}{8}$$
$$a = 48.62$$
$$y = -6.59x + 48.62$$
3. a.
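$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
$$b = \frac{(12 \times 3174.09) - (160.6 \times 227.6)}{(12 \times 2266.8) - 160.6^2}$$
$$b = 1.46$$
$$a = \frac{\sum y - b\sum x}{n}$$
$$a = \frac{227.6 - (1.46 \times 160.6)}{12}$$
$$a = -0.57$$
$$y = 1.46x - 0.57$$
b.
$$y = 1.46x - 0.57$$
$$y = (1.46 \times 19) - 0.57$$
$$y = 27.17 \approx R27170$$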
Time to do a progress check to determine whether you have gone through all the required
content, completed all the exercises.
Did you complete all the relevant revision exercises and check your answers
against the answers provided?
At this point, you should be able to: (list study unit outcomes again)
Are you ready to tackle the questions relevant to Study Unit 5 in the Exam?
You have now covered every module outcome and the associated learning activities or
exercises relating to this module.
Are you ready for your final assessment?
Did you compare your answers with those answers provided in the study guide?
Are you more aware of the same basic principles applied around you in your work
environment or day-to-day life?
Did you complete all your assignments before the due date and ensure it reached the
IMM Graduate School in time?
Consult eLearn for more exam tips on how to study and how to prepare yourself for
the exams.
The overall content of this study guide is based on the prescribed textbook of this module:
Wegner, T., 2020. Applied Business Statistics: Methods and Excel-based Applications. 5th ed.
Cape Town: Juta. (CD-ROM included).
Alphabetical list
http://www.academia.edu/6334673/19_Growth_Hacker_Quotes-
Thoughts_on_the_Future_of_Marketing
[Accessed: 15 August 2014]
Anderson, D.R., Sweeney, D.J., and Williams, T.A., 1991. An Introduction to Management
Science – Quantitative Approaches to Decision Making. 6th ed. St Paul: West Publishing
Company.
http://en.wikipedia.org/wiki/Ronald_Fisher
[Accessed: 15 August 2014]
http://www.math.wpi.edu/Course_Materials/SAS/quotes.html
[Accessed: 15 August 2014]
Shipham, S.O., 2012. Basic Marketing Research Study Guide. IMM Graduate School.
http://stats.stackexchange.com/questions/726/famous-statistician-quotes
[Accessed: 15 August 2014]
http://todayinsci.com/QuotationsCategories/S_Cat/Statistics-Quotations.htm
[Accessed: 15 August 2014]
Wiid, J. and Diggines, C. 2013. Marketing Research. 2nd Ed., Cape Town: Juta.
Population
Definition: The collection of all the observations of a random variable under study and
about which one is trying to draw conclusions in practice.
Example: Population – all visitors to the Pretoria Zoo

Sampling unit
Definition: The item / individual being measured or counted with respect to the random
variable(s) under study.
Example: Sampling unit – each visitor who is questioned in the survey

Sample
Definition: A subset of the population on which observations are made or measurements are
taken. A sample is used when a census is too expensive, time-consuming or impossible.
Example: Sample – the 100 visitors who were selected to represent the population

Interval data (Level of measurement)
Definition: Associated with quantitative data, scaled with order (ranking) and distance.
Example: Data type – interval, as the data is quantitative and the animals are ranked.
Copyright 2020
In terms of the Copyright Act 98 of 1978, no part of this study material may be reproduced,
be stored in retrieval system, be transmitted or used in any form or be published,
redistributed or screened by any means (electronic, mechanical, photocopying, recording or
otherwise) without the written permission of the IMM Graduate School. However,
permission to use any material in this work that was derived from other sources must be
obtained from the original sources.