
Study Guide

Business Statistics
(BS101B)

This module forms a compulsory core module for the following undergraduate academic
programmes:

• BBA in Marketing Management
• BCom in Marketing and Management Science

Notional Hours: 200
Credits: 20
NQF: 6
Weeks: 14

©IMM Graduate School Study Guide (BS101B) Page 1 of 125


Published by IMM Graduate School © Copyright Reserved
January 2021 Revised Edition



Contents

SECTION A: GENERAL INFORMATION
Word of Welcome
Programme Structure - BBA in Marketing Management
Programme Structure – Bachelor of Commerce in Marketing and Management Science
Business Statistics (BS101B) Overall purpose and module outcomes
IMM Graduate School Alumnus – Chesney McCall-Peat
Study tips
Planning your Business Statistics studies
Student Support at your fingertips
Your Learning Process Checklist

SECTION B
A. Study Unit 1 – Chapter 1
B. Study Unit 1 – Specific Outcomes
C. Study Unit 1 - Assessment Criteria
D. Introduction to Business Statistics
E. Study Unit 1 - Revision Exercises
F. Study Unit 1 – Revision Exercises Solutions
G. Study Unit 1 - Progress check

Study Unit 2 – Exploratory data analysis
A. Study Unit 2 - Chapter 2
B. Study Unit 2: Module Outcomes
C. Study Unit 2 – Specific Outcomes
D. Study Unit 2 - Assessment Criteria
E. Study Unit 2 – Objective 1
F. Study Unit 2 – Objective 2
G. Study Unit 2 – Revision Exercises



H. Study Unit 2 – Revision Exercises Solutions
I. Study Unit 2 - Progress check

Study Unit 3 – The foundation of statistical inference: Sampling
A. Study Unit 3 - Chapters 4 – 6
B. Study Unit 3: Module Outcomes
C. Study Unit 3 – Specific Outcomes
D. Study Unit 3 – Assessment Criteria
D. Study Unit 3 – Objective 1 (Chapter 4)
E. Study Unit 3 – Objective 2 (Chapter 5)
F. Study Unit 3 – Objective 3 (Chapter 6)
G. Study Unit 3 – Revision Exercises
H. Study Unit 3 – Revision Exercises Solutions
J. Study Unit 3 - Progress check

SECTION C

Study Unit 4 – Making statistical inferences
A. Study Unit 4 – Chapters 7 – 10
B. Study Unit 4: Module Outcomes
C. Study Unit 4 – Specific Outcomes
D. Study Unit 4 - Assessment Criteria
E. Study Unit 4 – Objective 1 (Chapter 7)
F. Study Unit 4 – Objective 2 (Chapter 8)
G. Study Unit 4 – Objective 3 (Chapter 9)
H. Study Unit 4 – Objective 4 (Chapter 10)
I. Study Unit 4 – Revision Exercises
J. Study Unit 4 – Revision Exercises Solutions
K. Study Unit 4 - Progress check

Study Unit 5 – Statistical models for forecasting
A. Study Unit 5 – Chapters 12 – 15
B. Study Unit 5: Module Outcomes
C. Study Unit 5 – Specific Outcomes



D. Study Unit 5 – Assessment Criteria
D. Study Unit 5 - Objective 1 (Chapter 12)

SECTION D

E. Study Unit 5 - Objective 2 (Chapter 14)
F. Study Unit 5 - Objective 3 (Chapter 15)
G. Study Unit 5 – Revision Exercises
H. Study Unit 5 – Revision Exercises Solutions
I. Study Unit 5 - Progress check

SECTION E – REVISION & EXAM PREPARATION
Final Progress Check

Reference list

Glossary



SECTION A: GENERAL INFORMATION

Word of Welcome

Welcome to the exciting world of Business Statistics, an important discipline within the broad
field of marketing. Statistical analysis has not only gained status as a science discipline but has
also gained wide acceptance as a critical tool to generate information for effective business
decision making.



Programme Structure - BBA in Marketing Management
Herewith a brief summary of the BBA in Marketing Management programme, indicating where
Business Statistics (BS101B) fits in.

Overall Programme Purpose and Outcomes


Once you have successfully completed the modules and achieved the module outcomes
covered within the BBA in Marketing Management programme, you will have achieved the
programme’s purpose and outcomes.



Programme Purpose
To empower qualifiers with graduate-level knowledge, specific skills and applied competence
in the field of Marketing Management to enable them to pursue practical and rewarding
careers in the marketing business environment. The purpose of the qualification is also to
provide graduates with competence in marketing, business management and financial
management. Further, the purpose of the qualification is to assist and enable students to
develop their intellectual capacity and their understanding of the business and marketing
environment, to think critically and innovatively, and to build a foundation for further
specialisation in the field of marketing.

Programme Exit-Level Outcomes

1. Master advanced knowledge of marketing principles and basic application skills in
marketing-related fields.
2. Demonstrate a broad understanding of business management knowledge and of the functional
areas within an organisation; recognise and appreciate the interdependencies between these
functional areas and how they apply to the business environment. Furthermore, be able to
take a strategic view of an organisation and align its strategies with its objectives.
3. Select, apply and evaluate typical methods and procedures to assist in making informed
marketing decisions.
4. Demonstrate a broad understanding of economics and how it applies not only to the
business world but also to everyday life.
5. Solve marketing problems in various types of organisations, such as retail-driven, service-
related, business-to-business, government-related and non-profit organisations (NPOs).
6. Demonstrate a broad understanding of financial management knowledge and how it
applies to the marketing and business environment.
7. Produce a strategic marketing and business plan and be able to evaluate the success of the
plan.



8. Produce and communicate information in a business environment by applying appropriate
communication skills, including the correct treatment of intellectual property, copyright
and plagiarism.
9. Demonstrate an advanced understanding of the economic context and systems within
which organisations operate and be able to link it to marketing opportunities.
10. Understand the scope of responsibilities that go with a management position in the
marketing field, and understand the accountability to senior management in an
organisation.
11. Understand and apply principles of ethics to practical marketing scenarios.



Programme Structure – Bachelor of Commerce in Marketing and
Management Science

Herewith a brief summary of the BCom in Marketing and Management Science indicating
where Business Statistics (BS101B) fits in.



Programme Overview
Once you have successfully completed the modules and achieved the module outcomes
covered within the BCom in Marketing and Management Science programme, you will have
achieved the programme’s purpose and outcomes.

Programme Purpose
The purpose of this qualification is to provide candidates in the private, public and voluntary
sectors with comprehensive and in-depth knowledge of the principles, major theories,
paradigms, skills, methods and technology of the science and profession of marketing,
management, supply chain, sales and project management, in order to promote sustainable
growth and development and to maximise prosperity in all sectors of the economy and in
society at large.

To develop competent leaders with applied economic, management, supply chain, project
management, sales and marketing skills as well as generic cross-functional knowledge and
skills to steer sustainable development, growth and prosperity in the most appropriate
direction.

To provide students who want to enrol for advanced studies in management, supply chain,
project management, sales and marketing with a sound academic base from which to apply their
skills and to advance further in their careers and academic studies in the fields of marketing,
sales, supply chain, project management and management science.

Programme Outcomes
Programme Exit-Level Outcomes:

• Students must demonstrate an integrated understanding of a broad scope of management
knowledge and how it practically applies to the disciplines of marketing, sales management,
supply chain and project management.
• Students must demonstrate a comprehensive understanding of economics, financial
management and research as applied to marketing, sales, supply chain and project
management activities in relation to the organisation and the business environment in
general.
• Students must be able to collect, analyse, organise and critically evaluate relevant
economic, financial, marketing and project-related information to make sound decisions in
the organisation.
• Students must demonstrate the ability to identify, analyse, evaluate and critically reflect on
complex problems related to sales, marketing, operations and supply chain in the
organisation, with the aim of finding evidence-based solutions.
• Students must evaluate, apply and integrate sales, marketing, supply chain and project
management knowledge and skills, and general business principles, in real-life situations,
taking into account societal, ethical and cultural considerations.
• Students must access, process and manage information, demonstrating the ability to
develop appropriate processes of information gathering for a given context or use, and to
independently validate, evaluate and manage sources of information.
• Students must use appropriate academic, professional and occupational discourse to
produce and communicate information in a business environment, demonstrating their
understanding and their own ideas and opinions on business science, marketing, sales,
project management and supply chain related matters, while respecting conventions around
intellectual property, copyright and plagiarism.
• Students must critically analyse contemporary business information and evaluate the
potential future outcomes of sales, marketing, supply chain and project management
decisions.
• Students must show an understanding of the scope of responsibilities required of a
management position in the sales, marketing, supply chain, human resources, operations and
project management functions, and understand the accountability to senior management in an
organisation.



Business Statistics (BS101B) Overall purpose and module outcomes

Module purpose:
The task of statistical analysis is to help generate accurate information for major decision
makers in the world of business. This information is often used to design a marketing
strategy: it assists in identifying marketing opportunities and threats, formulating marketing
plans and actions, and evaluating and improving overall marketing performance.

Statistics is the science of collecting, organising and interpreting numerical facts, which we
call data (Moore & McCabe, 1989). In other words, statistics makes sense of numbers. We read
and hear about statistics every day: Temba Bavuma’s batting average, the results of the
municipal elections, the average summer rainfall in Gauteng, the monthly range in the oil
price. An understanding of statistics is important in many professions, and marketing is no
exception. Business decisions are based on the results of market research, so you can
appreciate how important an understanding of statistics is going to be to you.

Statistics is a fascinating subject that you can apply to everyday situations. During this learning
programme you will become familiar with terms and concepts that you may have shied away
from until now. Statistics allows us to use data to gain insight into both simple and complex
day-to-day problems. Data on their own are meaningless unless we are able to understand
them. At the end of this programme you will be able to critically interpret reported statistics,
both within the business environment and in the popular press, and to analyse data using
the appropriate methodology.

Although the basis of statistics is a series of complex formulae, you are not required to
memorise these formulae. However, you will be expected to use the formulae extensively
during analysis, as well as to think about the reasoning behind the particular statistical
methods.
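For reference, the basic formulae for ungrouped sample data that later study units build on can be written as below. This is a sketch in common textbook notation, assuming $n$ observations $x_1, \ldots, x_n$; the symbols used in the prescribed textbook may differ slightly. The coefficient of variation ($CV$) expresses the standard deviation as a percentage of the mean.

```latex
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad
s^2 = \frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n - 1}, \qquad
s = \sqrt{s^2}, \qquad
CV = \frac{s}{\bar{x}} \times 100\%
```

For example, for the values 2, 4 and 6: $\bar{x} = 12/3 = 4$, $s^2 = (4 + 0 + 4)/2 = 4$, $s = 2$ and $CV = 2/4 \times 100\% = 50\%$.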



Module outcomes:

By the end of the module, students should be able to:

• Understand the importance of business statistics in marketing management decision
making.
• Know the difference between a sample and a population (community). Use specific
methods to choose an appropriate sample.
• Describe and understand how data are distributed, how data can be summarised in
statistical terms, and how this summarised data can be used by marketing managers to
aid decision-making.
• Understand the concept of probability and know which methods are used for generalising
results.
• Use statistical inference to answer specific questions with a known degree of confidence.
• Use statistical methods to predict future aspects of a business operation.
• Appreciate and describe the practical considerations of the use of statistics to address
business problems.

IMM Graduate School Alumnus – Chesney McCall-Peat

Business statistics is the science of good decision making in the face of
uncertainty, and is used in many disciplines, such as financial analysis,
production, and operations, including services improvement, and
marketing research. When I first began business statistics my initial
thought on the subject was that this has no impact on my life and I am only
learning it to get my qualification, but once I started learning more, I
realised that it can also be used in everyday marketing activities and I was suddenly captivated
by the integrated knowledge behind the subject. Some of the work is a bit confusing in the
beginning but with the aid of a great study guide as well as the textbook it has led to my full
understanding of stats.

Chesney McCall-Peat, 2014



Study tips

Planning your Business Statistics studies


The IMM Graduate School has designed student pacers for each module. These pacers will
assist you in planning your studies to ensure you cover the entire syllabus and to schedule
your studies at manageable intervals. Distance learning requires careful planning and
scheduling of your studies and the student pacer will provide you with a guideline on how to
plan and not fall behind. Adhering to the student pacer will give you a good start towards
achieving the targets set for each module, help you plan ahead so that you submit your
assignments on or before the due date, and ensure that you have sufficient time to study for
your exam.

Prescribed learning material and additional support


You need to make use of the following prescribed material throughout your studies of this
module:

a. Wegner, T. (2020). Applied Business Statistics – Methods and Excel-based
Applications. 5th Edition, Cape Town: Juta.

b. Prescribed IMM Graduate School Study Guide for BS101B, dated January
2020.

NB: The prescribed textbook forms the foundation of the knowledge required to master all
learning outcomes. All the required chapters and their examples must be attempted. The study
guide provides additional summaries, examples and information to unlock these concepts.



Calculators

You will need a basic scientific calculator of the type typically used at schools and
universities, similar to the one displayed in the picture (a CASIO fx-82ZA PLUS). It can
perform a variety of functions, including fraction, percentage, scientific and statistical
calculations.

This type of calculator is adequate for the basic business calculations that you
will have to perform during this course.

You will need this calculator for the examination.



Student Support at your fingertips

You are registered for this module on a distance learning basis and you are expected to work
on your own 70% of the time. However, this does not mean that you are completely on your
own. Please use the available IMM Graduate School Student Support resources to help you
during your studies.

The IMM Graduate School is committed to assisting students with all queries and has
introduced [email protected] to answer all general queries. This is supported by a
ticketing system that issues each student with a unique ticket number, which enables us to
track the progress of queries and ensure prompt responses and swift resolution.

NB: Please ensure that all module-specific questions and queries are still posted on the
module-specific discussion forums, available on eLearn. Do not leave your queries until the
last day before you write your examination or before the assignment submission due dates.

You are required to regularly visit eLearn as it is an essential source of information that is
continuously updated with topical material, additional guidance, messages and tutorial
letters.

Other support resources available to you are:

• eLibrary is an excellent place for you to read additional material on your own. This tool
will be extremely valuable when conducting research for your assignments, projects
and research reports. For access to the virtual library, please follow the instructions
available on eLearn.

• Information Centres – the IMM Graduate School has libraries in all Student Support
Centres with textbooks and additional materials that could help you in your
assignments when you need to reference additional sources. For opening times,
please enquire at your Student Support Centre. You have access to free
Internet at the Information Centre.

• eMasterclasses – in our ongoing efforts to support our students, the IMM Graduate
School hosts online tutorials in all our modules for additional guidance and support.
Subject matter experts share their knowledge through presentations or video
conferencing, addressing learning outcomes, assignment and examination
preparation, etc., giving ample opportunity for student feedback and interaction.

• eDiscussion Forum – join group forums to hold discussions, post questions, discuss
subject content and receive updates on specific modules.

• The Journal of Strategic Marketing – the official publication of the IMM Institute of
Marketing Management, which keeps you up to date with the latest news and trends in the
industry. Another publication is the Strategic Marketing Africa magazine, which addresses
the unique marketing challenges and opportunities in Africa. These publications are released
quarterly and could provide examples to use in assessments to back up your theoretical
knowledge. Both are available electronically on eLearn.



Your Learning Process Checklist

At this point you should understand the learning process already explained, as well as what
Business Statistics is all about and you should be ready to start your journey towards the
successful completion of your module.

Checklist (mark each item as done, still to do, or still to buy or access):

• Do you have access to all the prescribed and additional learning material?
  - Prescribed textbook
  - BS101B study guide
  - IMM Graduate School eLearn platform
  - IMM Graduate School eLibrary platform
• Do you have a quiet place to study?
• Do you have support from your close family / friends / colleagues?
• Do you know who to contact at the IMM Graduate School when needed?



SECTION B

Study Unit 1 – Introduction to business statistics

A. Study Unit 1 – Chapter 1

“Because of his poor eyesight…he was tutored in mathematics without the aid of paper and pen,
which developed his ability to visualise problems…”
-Wikipedia, on Sir Ronald Fisher (Father of Modern Statistics)

Study Unit 1: Module Outcomes

Let’s recap the relevant module learning outcomes for this study unit:

• Understand the importance of business statistics in marketing management decision
making.
• Know the difference between a sample and a population (community). Use specific
methods to choose an appropriate sample.

B. Study Unit 1 – Specific Outcomes

Let’s recap the specific learning outcomes for this study unit.

After completing this study unit, you should be able to:

• Understand the importance of Business Statistics in marketing management decision
making.
• Differentiate between inferential and descriptive statistical problems.



C. Study Unit 1 – Assessment Criteria

Each study unit outcome (what the student should be able to do) is listed with its assessment
criteria (how you will know that the outcome has been achieved) and the relevant chapter in
the prescribed textbook.

1. Understand the importance of business statistics in marketing management decision
   making (Chapter 1)
   • Describe the business statistics principles used in marketing management decision
     making.
   • Describe how business statistics principles are used in marketing management
     decision making.
2. Know the difference between a sample and a population (community) (Chapter 1)
   • Differentiate between a sample and a population.

D. Introduction to Business Statistics

Business Statistics in Marketing Management
Successful marketing satisfies the wants and needs of the customer. In order to do this,
marketers need to know who their customers are, and what their wants and needs are. This
is the role of marketing research: “The systematic collection, analysis and interpretation of
information about all marketing problems by means of marketing recognised scientific
methods to provide information that marketing management can use in decision-making.”
(Wiid & Diggines, 2010).

Data is readily available from a variety of sources and in varying quality and quantity, but raw
data on its own does not help marketers. Statistical analysis is used to process raw data into
useful information that can be used to make decisions. This, together with marketing and
research tools, enables researchers to reach valid conclusions regarding the marketing and
business problems or issues being explored. This course should therefore be considered
preparation for the practical issues that confront marketers when making decisions (Shipham,
2012). This is why applied statistics is regarded as a decision support tool.
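To illustrate how raw data becomes usable information, the minimal sketch below tallies hypothetical survey responses into counts and percentages using Python's `collections.Counter`. The store names and answers are invented for illustration only; a spreadsheet pivot table could produce a similar summary.

```python
from collections import Counter

# Hypothetical raw survey responses: each entry is one respondent's preferred store.
responses = ["Store A", "Store B", "Store A", "Store C", "Store A",
             "Store B", "Store A", "Store C", "Store B", "Store A"]


def frequency_table(data):
    """Return (category, count, percentage) rows, most frequent category first."""
    counts = Counter(data)
    n = len(data)
    return [(category, count, 100 * count / n)
            for category, count in counts.most_common()]


for category, count, pct in frequency_table(responses):
    print(f"{category:<8} {count:>3} {pct:6.1f}%")
```

Here ten raw answers reduce to three rows: Store A chosen by 5 respondents (50%), Store B by 3 (30%) and Store C by 2 (20%).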



Statistics and Computers
There are many statistical packages available, from Microsoft Excel and Microsoft Access (part
of the MS Office suite) to more advanced software such as Epistat, Stata and SAS. Many
calculators also have built-in statistical functions. It will be an added advantage if you have
such a calculator for the exam; make sure that you are able to use the ‘stat’ function on the
calculator. MS Excel has excellent graphics capabilities, enabling personalised presentation of
results as histograms, pie charts and line graphs (amongst others). This feature is also useful
for displaying results generated using other software.
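A calculator's stat mode or Excel's built-in functions return summary measures such as the mean and standard deviation directly. As a rough analogue, the sketch below uses Python's built-in `statistics` module on hypothetical sales figures (invented for illustration). Note the distinction between `statistics.stdev` (sample standard deviation, dividing by n − 1) and `statistics.pstdev` (population standard deviation, dividing by n); many scientific calculators likewise report both a sample and a population standard deviation, so check which one a question requires.

```python
import statistics

# Hypothetical daily sales (units) recorded over five days.
sales = [4, 8, 6, 5, 7]


def summarise(data):
    """Return common descriptive measures for ungrouped data."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return {
        "n": len(data),
        "mean": mean,
        "range": max(data) - min(data),
        "sample_variance": statistics.variance(data),  # divides by n - 1
        "sample_sd": s,
        "population_sd": statistics.pstdev(data),      # divides by n
        "cv_percent": s / mean * 100,                  # coefficient of variation
    }


for name, value in summarise(sales).items():
    print(f"{name}: {round(value, 4)}")
```

For these five values the mean is 6, the range is 4, the sample variance is 2.5 (sample standard deviation about 1.58) and the coefficient of variation is roughly 26%.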

E. Study Unit 1 - Revision Exercises

First-year Statistics students at the Tshwane University of Technology conducted a study
of a random sample of 50 students to measure the average daily amount spent at the canteen.
This information will be used to help the canteen set its prices in line with the needs of
the students.
1. What is the population? (1)
2. What is the sample? (1)
3. What type of data are the average amounts spent? (1)

F. Study Unit 1 – Revision Exercises Solutions

1. The population is all the students at the Tshwane University of Technology. ✓
2. The sample is the 50 students selected for the study. ✓
3. Quantitative data ✓
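The distinction between the population and the sample in this exercise can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the population size of 5 000 students and the spending values are simulated, hypothetical figures (no real university data is used), and simple random sampling without replacement via `random.Random.sample` is assumed as the selection method.

```python
import random
import statistics

# Hypothetical population: every registered student, identified by a student number.
# The population size and spending values below are illustrative assumptions only.
POPULATION_SIZE = 5000
SAMPLE_SIZE = 50

rng = random.Random(2021)  # fixed seed so the example is reproducible
population_ids = list(range(1, POPULATION_SIZE + 1))

# Simulated average daily canteen spend (in rand) for each student in the population.
daily_spend = {sid: round(rng.uniform(10, 80), 2) for sid in population_ids}

# Simple random sample: every student has the same chance of selection,
# and nobody is selected twice (sampling without replacement).
sample_ids = rng.sample(population_ids, SAMPLE_SIZE)

sample_mean = statistics.mean(daily_spend[sid] for sid in sample_ids)
population_mean = statistics.mean(daily_spend.values())

print(f"Sample mean (statistic):     R{sample_mean:.2f}")
print(f"Population mean (parameter): R{population_mean:.2f}")
```

The sample mean is a statistic that estimates the population mean (a parameter). Drawing a different sample, for example by changing the seed, generally gives a slightly different estimate; this idea is developed further in Study Unit 3 on sampling.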

G. Study Unit 1 - Progress check

You have come to the end of Study Unit 1.

Time to do a progress check to determine whether you have gone through all the required
content and completed all the exercises.

Your progress checklist (answer YES or NO):

• Did you read through each study unit outcome?
• Did you go through all the learning material?
• Did you complete all the relevant revision exercises and check your answers against the
answers provided?

At this point, you should be able to:

• Understand the importance of Business Statistics in marketing management decision
making.
• Differentiate between inferential and descriptive statistical problems.



Study Unit 2 – Exploratory data analysis

A. Study Unit 2 - Chapter 2

“You need the kind of objectivity that makes you forget everything you’ve heard, clear the
table, and do a factual study like a scientist would”
- Steve Wozniak, Co-founder, Apple

B. Study Unit 2: Module Outcomes

Let’s recap what the relevant module learning outcome is for this study unit:

• Describe and understand how data are distributed, how data can be summarised in
statistical terms, and how this summarised data can be used by marketing managers
to aid decision-making.

C. Study Unit 2 – Specific Outcomes

Let’s recap the specific learning outcomes for this study unit.

After completing this study unit, you should be able to:

• Understand how data are distributed.
• Describe how data can be summarised by using graphical presentations and other
statistical terms.
• Know how this summarised data can be used by marketing managers to assist them in
decision-making.
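To make "summarising data" concrete, the sketch below groups hypothetical numeric data into equal-width classes and reports each class's frequency, relative frequency and cumulative frequency, the same columns a hand-drawn frequency distribution table would contain (the percentage frequency is simply the relative frequency multiplied by 100). The ages, the lower limit of 15 and the class width of 10 are illustrative assumptions, and the convention that each class includes its lower limit but excludes its upper limit is one common choice, not necessarily the one used in the prescribed textbook.

```python
# Hypothetical ages of 20 survey respondents (illustrative data only).
ages = [23, 35, 41, 28, 19, 52, 33, 47, 26, 38,
        44, 31, 29, 56, 22, 37, 49, 27, 34, 40]


def grouped_frequency_table(data, lower, width, classes):
    """Group numeric data into equal-width classes [a, a + width).

    Returns rows of (class label, frequency, relative frequency, cumulative frequency).
    """
    n = len(data)
    rows = []
    cumulative = 0
    for k in range(classes):
        a = lower + k * width
        b = a + width
        freq = sum(1 for x in data if a <= x < b)  # lower limit included, upper excluded
        cumulative += freq
        rows.append((f"{a} - <{b}", freq, freq / n, cumulative))
    return rows


for label, f, rel, cum in grouped_frequency_table(ages, lower=15, width=10, classes=5):
    print(f"{label:<10} f={f:<3} rel={rel:.2f}  cum={cum}")
```

With these assumptions the frequencies are 3, 7, 6, 3 and 1; the cumulative frequencies rise to 20, the total number of respondents, and the relative frequencies sum to 1.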



D. Study Unit 2 - Assessment Criteria

Study unit outcomes (the student should be able to…), with the assessment criteria for each
(how will you know if the student has achieved the learning outcome?) and the relevant
chapter in the prescribed textbook:

1. Transform raw, unprocessed data into organised data. (Chapter 2)
   • Present categorical data in a frequency distribution table.
   • Calculate the absolute and percentage frequencies, the relative frequency and the
     cumulative frequency of each category.
   • Group numeric data and present it in a frequency distribution table.
   • Calculate the absolute and percentage frequencies, and the relative and cumulative
     frequencies, of each interval.
2. Use different graphical presentations in order to summarise and interpret data with
   reference to a particular marketing decision-making situation. (Chapter 2)
   • Differentiate between different types of graphs and charts.
   • Identify the most appropriate type of graph or chart given a specific dataset and/or
     scenario.
   • Draw and interpret different graphs and charts.
3. Construct histograms, frequency polygons and ogives from frequency distributions and
   relative frequency distributions. (Chapter 2)
   • Differentiate graphs according to the type of data they are used to represent.
   • Draw them from the information given.
4. Interpret the findings from each graphic representation. (Chapter 2)
   • Interpret the graphs in line with the various scenarios given.
5. Identify and compute the various measures of central tendency for both grouped and
   ungrouped data. (Chapter 2)
   • Identify, describe, calculate and interpret the appropriate central and non-central
     location measures for grouped and ungrouped data.
   • Calculate and interpret the various measures of dispersion.
6. Identify and compute the various measures of dispersion appropriate for the different
   data types for ungrouped data. (Chapter 2)
   • Calculate the range, variance, standard deviation and coefficient of variation.
7. Understand why the various measures are valuable and know how to use the various
   measures in marketing decision-making. (Chapter 2)
   • Describe the utility of each type of measure of dispersion.
   • Interpret measures of dispersion.

E. Study Unit 2 – Objective 1

At the end of this section of the study unit, you should be able to use different graphical
presentations in order to summarise and interpret data with reference to a particular
marketing decision-making situation.

1. Introduction

Descriptive statistics summarise large sets of raw data so that they are easier to understand
and use. For example, a retail clothing store chain surveys 600 Port Elizabeth residents to find
out whether they would use a new mail order catalogue to purchase their clothing. The survey
will result in a data set of 600 values, and descriptive statistics will summarise this information
to make it useful for the marketing department of the clothing chain.

Statistical results are most often displayed by means of graphs in annual reports, newspaper
articles, research studies, etc. Graphs usually communicate results more effectively than words
and numbers alone:

• Graphic displays of data are useful for presenting results, so that they can be quickly
and easily understood by clients, suppliers or colleagues.
• Graphic displays of data are useful for analysing data, as they allow for easy comparison
of results.

This study unit deals with the most common ways in which data can be summarised
graphically, the methods used to produce each type, when to use each type, and their
limitations.

2. Charts

Charts are the type of graph most commonly used to summarise or describe qualitative data
collected in marketing research studies. Qualitative data must first be converted into
‘count’ data (totals for each category of observation) before they can be displayed in a chart.
This table of counts is known as a frequency distribution. The two types of charts described
here are pie charts and bar charts (simple, component and multiple). Bar charts are also
sometimes referred to as bar graphs.

3. Pie charts

Pie charts are commonly used to present relative frequency distributions for qualitative data
(most often, categorical data). To draw a pie chart, first draw a circle. Then use the relative
frequencies (percentages) to subdivide the circle into segments: the angle of each segment is
its relative frequency multiplied by 360°.

An example of the construction of a pie chart, and its interpretation, based on an example in
Wegner (2020):

TABLE 2.1
Grocery store preference of shoppers

Preferred store          Number of shoppers    Percentage
(variable under study)   (frequency)           (relative frequency)
Checkers                 10                    33.33% (10 / 30 × 100)
Pick n Pay               17                    56.67% (17 / 30 × 100)
Spar                     3                     10.00% (3 / 30 × 100)
Total                    30                    100%

Figure 2.1
Grocery store preference of shoppers (pie chart with segments for Checkers 33%, Pick n Pay 57%
and Spar 10%)


Interpretation: Figure 2.1 shows that the largest percentage of shoppers (56.67%) prefer
shopping at Pick n Pay. A smaller, but still sizeable, group of shoppers prefer shopping at
Checkers (33.33%). This information can be used to evaluate the brand equity of different
grocery store chains and to identify opportunities for more Pick n Pay stores in the area. It may
also lead to further research into why more customers prefer Pick n Pay. The marketing
department of Spar should also think of ways to improve the way shoppers feel about its stores.
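The segment sizes in Figure 2.1 follow directly from the relative frequencies: each category's share of the circle is its relative frequency multiplied by 360°. The short Python sketch below, built around a hypothetical helper called `pie_segments`, reproduces the percentages in Table 2.1 and the matching segment angles. It is only an illustration of the arithmetic, not a prescribed method.

```python
def pie_segments(frequencies):
    """Return {category: (percentage, angle in degrees)} for a pie chart."""
    total = sum(frequencies.values())
    segments = {}
    for category, count in frequencies.items():
        relative = count / total                 # relative frequency (a proportion)
        segments[category] = (relative * 100,    # share of the whole, as a percentage
                              relative * 360)    # share of the 360-degree circle
    return segments


# Frequencies from Table 2.1 (30 shoppers)
table_2_1 = {"Checkers": 10, "Pick n Pay": 17, "Spar": 3}

for store, (percentage, angle) in pie_segments(table_2_1).items():
    print(f"{store:<11} {percentage:6.2f}%  {angle:6.1f} degrees")
```

Checkers works out to 120 degrees, Pick n Pay to 204 degrees and Spar to 36 degrees, which together fill the full 360-degree circle.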

4. Bar charts

A simple bar chart can be constructed from the same frequency distribution used to construct
a pie chart. The same information is displayed by means of vertical columns (bars), rather
than segments. The categories are given on the horizontal (x) axis and the frequencies are
given on the vertical (y) axis. Using the same data in Table 2.1:

Figure 2.2
Grocery store preference of shoppers (simple bar chart: Checkers 10, Pick n Pay 17 and Spar 3
on the x-axis; number of shoppers on the y-axis, from 0 to 18 in steps of 2)


Tip: make sure that the scaling of your graph is correct. In the graph above, the numbers along
the vertical (y) axis are evenly spaced, moving up in steps of two.

A component bar chart (more commonly called a stacked bar chart) allows you to summarise
data on two or more variables at the same time.

For example, shoppers’ preference for Checkers, Pick n Pay or Spar (variable 1 = store
preference) can be divided up into males and females (variable 2 = gender). The chart looks
similar to a simple bar chart, but the components of each bar are stacked on top of each other.
A multiple bar chart is very similar to a stacked bar chart, but the components are presented
side by side rather than as a single bar with two layers. Differences between the components
are displayed more clearly in this type of bar chart. Using the same data as in Table 2.1, with
the gender variable added:

TABLE 2.2
Grocery store preference of shoppers

Preferred store          Number of female       Number of male         Total
(variable under study)   shoppers (frequency)   shoppers (frequency)
Checkers                 7                      3                      10
Pick n Pay               10                     7                      17
Spar                     2                      1                      3
Total                    19                     11                     30


Figure 2.3
Grocery store preference of shoppers (stacked bar chart; x-axis: store preference; y-axis:
number of shoppers, 0 to 20; female and male components: Checkers 7 and 3, Pick n Pay 10 and
7, Spar 2 and 1)

Interpretation: Figure 2.3 shows that the majority of female shoppers surveyed (10 of 19) prefer
shopping at Pick n Pay. The majority of male shoppers (7 of 11) also prefer shopping at Pick n
Pay. This study tells us that Pick n Pay is the most preferred store out of the three store options
given, for both males and females. Checkers is the second most preferred store for both males
and females.

Tip: A graph is meaningless unless it has a title, and both x- and y-axis labels (with units
specified, if appropriate).
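To see how the layers of a stacked bar are built up, the Python sketch below (using a hypothetical helper, `stacked_segments`) works out where each component of Figure 2.3 starts and ends on the y-axis, using the counts in Table 2.2. The top of each bar equals that store's total frequency. In a multiple bar chart, each component would instead start at 0 and stand side by side.

```python
# Table 2.2: number of female and male shoppers per preferred store
table_2_2 = {
    "Checkers":   {"Female": 7,  "Male": 3},
    "Pick n Pay": {"Female": 10, "Male": 7},
    "Spar":       {"Female": 2,  "Male": 1},
}


def stacked_segments(counts, order=("Female", "Male")):
    """Return (component, bottom, top) tuples for one stacked bar."""
    segments, bottom = [], 0
    for component in order:
        top = bottom + counts[component]     # each layer starts where the previous one ended
        segments.append((component, bottom, top))
        bottom = top
    return segments


for store, counts in table_2_2.items():
    print(store, stacked_segments(counts))
```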

5. Frequency distributions

A frequency distribution is a tabular summary of a set of data showing the frequency (or
number) of items in each of several non-overlapping categories, such as gender, age group,
etc. The sum of the frequencies always equals the total number of elements in the data set.
The sum of the relative frequencies always equals 1 (or 100%).

Table 2.1 and Table 2.2 are both examples of a frequency distribution.
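As a minimal sketch of how a frequency distribution is built from raw categorical responses, the Python code below counts each category and adds relative and cumulative frequencies. The raw responses are rebuilt from the Table 2.1 counts (one entry per shopper), and `frequency_table` is a hypothetical helper name. Note that the order of categories, and therefore the cumulative column, is an arbitrary choice for nominal data such as store names.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(responses):
    """Rows of (category, frequency, relative frequency, cumulative frequency)."""
    counts = Counter(responses)              # absolute frequency of each category
    n = len(responses)                       # total number of observations
    categories = sorted(counts)              # alphabetical order (an arbitrary choice)
    running_totals = accumulate(counts[c] for c in categories)
    return [(c, counts[c], counts[c] / n, cum)
            for c, cum in zip(categories, running_totals)]


# Raw responses rebuilt from the Table 2.1 counts (one entry per shopper)
responses = ["Checkers"] * 10 + ["Pick n Pay"] * 17 + ["Spar"] * 3

for category, f, rel, cum in frequency_table(responses):
    print(f"{category:<11} {f:>3} {rel:6.2%} {cum:>3}")
```

The frequencies add up to 30 and the relative frequencies add up to 1 (100%), as the text above states they always should.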

6. Histograms

A histogram looks similar to a bar chart, but there are no spaces between the columns, and
the intervals are continuous. The class frequencies are plotted on the y-axis and the class
intervals, which are of equal width, on the x-axis. Usually, a space is left on the left- and
right-hand sides of the graph.

Class / interval – the groups into which the data are divided
Class frequency – the number of data values that fall within that particular class
Classes are what distinguish grouped data from ungrouped data (a simple list of
values)
Class boundaries – in order for the columns of the histogram not to have spaces between them,
each class is adjusted to make the intervals continuous. For whole-number data, this adjustment
is made by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit of each class
(e.g. the class 20 – 29 becomes 19.5 – 29.5).

Example from Wegner (2020): We are still interested in finding out more about shoppers who
buy groceries from our three stores under investigation – Checkers, Pick n Pay and Spar. Now
we would like to investigate the age of the shoppers. The results of this study are summarised
in the frequency distribution below:

TABLE 2.3
Frequency distribution – age of shoppers

Age (years)              Class boundaries   Number of shoppers   Percentage
(variable under study)                      (frequency)          (relative frequency)
20 – 29                  19.5 – 29.5        6                    20%
30 – 39                  29.5 – 39.5        9                    30%
40 – 49                  39.5 – 49.5        8                    27%
50 – 59                  49.5 – 59.5        4                    13%
60 – 69                  59.5 – 69.5        3                    10%
Total                                       30                   100%
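The class-boundary rule and the percentage column of Table 2.3 can be checked with the short Python sketch below. The helper `class_boundaries` is hypothetical; it assumes whole-number class limits, so half of the gap of 1 between classes (0.5) is taken off the lower limit and added to the upper limit.

```python
def class_boundaries(lower_limit, upper_limit, gap=1):
    """Widen stated class limits by half the gap between classes (0.5 for whole numbers)."""
    half_gap = gap / 2
    return lower_limit - half_gap, upper_limit + half_gap


# Table 2.3: stated age classes (years) and their frequencies
age_classes = [((20, 29), 6), ((30, 39), 9), ((40, 49), 8), ((50, 59), 4), ((60, 69), 3)]
n = sum(f for _, f in age_classes)          # 30 shoppers

for (low, high), f in age_classes:
    lb, ub = class_boundaries(low, high)
    print(f"{low}-{high}: boundaries {lb}-{ub}, frequency {f}, {f / n:.0%}")
```

The last class (60 – 69) holds 3 of the 30 shoppers, which is 10% of the total.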



Figure 2.4
Histogram – age of shoppers (x-axis: age in years, classes 19.5 – 29.5, 29.5 – 39.5,
39.5 – 49.5, 49.5 – 59.5 and 59.5 – 69.5; y-axis: number of shoppers; frequencies 6, 9, 8, 4
and 3)

Interpretation: Figure 2.4 shows that the age group with the highest frequency is the 30 to 39
year olds, followed closely by the 40 to 49 year olds. This tells us that you are more likely to
find a shopper aged between 30 and 49 than a shopper aged 60 or older. The store marketers
can use this information to target their marketing mix to suit the majority of their customers,
who are aged between 30 and 49 years old. Alternatively, the stores could decide to appeal to
an older market by changing their marketing mix to appeal more to customers aged 50 and
older, in the hope of increasing their market share.

7. Frequency polygon

A frequency polygon is a graph in which the midpoints of the tops of the columns of a histogram
are joined by straight lines. In essence, this converts a histogram into a line graph.
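The points of a frequency polygon are simply (class midpoint, frequency) pairs. The Python sketch below computes them for the age data in Table 2.3, using a hypothetical helper `polygon_points`. It also follows a common convention (an assumption here, not stated in this guide) of closing the polygon on the x-axis with zero-frequency points one class width beyond the first and last classes; it assumes equal class widths.

```python
def polygon_points(boundaries, frequencies):
    """(midpoint, frequency) pairs, closed with zero-frequency points at both ends."""
    width = boundaries[0][1] - boundaries[0][0]          # assumes equal class widths
    mids = [(lb + ub) / 2 for lb, ub in boundaries]      # class midpoints
    points = list(zip(mids, frequencies))
    # Convention: anchor the polygon on the x-axis one class width beyond each end
    return [(mids[0] - width, 0)] + points + [(mids[-1] + width, 0)]


boundaries = [(19.5, 29.5), (29.5, 39.5), (39.5, 49.5), (49.5, 59.5), (59.5, 69.5)]
frequencies = [6, 9, 8, 4, 3]                            # Table 2.3
print(polygon_points(boundaries, frequencies))
```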



Figure 2.5
Frequency polygon – age of shoppers (x-axis: age in years, classes 19.5 – 29.5 to 59.5 – 69.5;
y-axis: number of shoppers)

8. Cumulative frequency distribution

This is often generated automatically by statistical software when constructing frequency
tables. It adds up (accumulates) the frequencies class by class in a separate column, so that
each entry shows the number of observations in that class and all the classes below it. This is
clearly illustrated in Wegner (2020).

9. Cumulative frequency polygon (ogive)

This is a graphic representation of the cumulative frequency distribution. Either numbers or
percentages can be plotted; the graph will have the same shape either way.
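A cumulative frequency column is just a running total of the frequencies. The Python sketch below builds it for the age data in Table 2.3 with `itertools.accumulate`, converts it to percentages, and pairs each value with the upper class boundary, which is the usual x-value for a "less than" ogive.

```python
from itertools import accumulate

# Table 2.3: upper class boundaries and frequencies for the age of shoppers
upper_boundaries = [29.5, 39.5, 49.5, 59.5, 69.5]
frequencies = [6, 9, 8, 4, 3]

cumulative = list(accumulate(frequencies))      # running totals: 6, 15, 23, 27, 30
n = cumulative[-1]                              # the last running total equals n
cumulative_pct = [100 * c / n for c in cumulative]

# Each point of the ogive pairs an upper boundary with its cumulative value
for boundary, c, pct in zip(upper_boundaries, cumulative, cumulative_pct):
    print(f"younger than {boundary}: {c} shoppers ({pct:.1f}%)")
```

Plotting either `cumulative` or `cumulative_pct` against the boundaries gives the same shape, only the y-axis scale changes.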

10. Other graphs

Three other types of graphs are described, all essentially line graphs, namely, the line graph,
the Lorenz curve and the z-curve.

Note: As mentioned earlier in this study guide, MS Excel has an excellent graphing facility.
Data can be easily displayed, using any one of many types of graphs. It is relatively easy to
personalise your graphs (colours, font size, annotations, etc.) once you have mastered the
software.



F. Study Unit 2 – Objective 2

At the end of this section of the study unit, you should be able to identify, describe, calculate
and interpret the appropriate central and non-central location measures for grouped and
ungrouped data. You should also be able to calculate and interpret the various measures of
spread. Meaningful interpretation of results is the key to effective marketing decision-making.

1. Introduction

Frequency tables and graphs provide only an approximate indication of how the data are
distributed. For information to be valuable to marketing decision-making, we need more precise
information, particularly if we want to make comparisons between groups. Measures of
central location (or central tendency) are additional alternatives for summarising data, and
this section serves as an introduction to the world of statistical measurements. The three
measures discussed in Wegner (2020) are the mean, the median and the mode.

Dispersion or ‘spread’ describes the variability of the data, which is important to know when
comparing different data sets and in order to draw meaningful conclusions from the analysis of
data. Various measures of dispersion are discussed in Wegner (2020).

When applying statistics to the analysis of a marketing problem, it is important to know which
ones to use and which to ignore, so that you avoid presenting meaningless statistics in
presentations and reports. It is important that you are able to interpret the measurements in
the context of decision-making. You do not have to remember the formulae themselves, as
statistical packages will calculate the statistics for you. Make sure you are able to use the stats
function on your calculator.

Note: If the measures are calculated for data from a sample, they are called sample statistics.
If they are calculated for data from a population, they are called population parameters.

2. Arithmetic mean (average) (x̄)


You have all calculated averages at some time, and averages are quoted by the popular media
on a daily basis, e.g. average car sales for the first quarter of 2020, average rainfall in Gauteng
for the months September 2019 to April 2020. The mean is merely the statistical term for the
average. It is the most important numerical measure of central location. Wegner (2020)
provides examples of methods for calculating the mean for grouped and ungrouped data.

Mean for ungrouped data:


A study of how many days new and upcoming African entrepreneurs spend on market research
recorded the following numbers of days for a sample of 20 entrepreneurs.
Data: 16; 20; 13; 19; 24; 22; 18; 18; 15; 20; 21; 21; 18; 20; 18; 20; 15; 20; 18; 20
Sample size (n): 20
Formula: x̄ = ∑x / n = (16 + 20 + 13 + 19 + 24 + 22 + 18 + 18 + 15 + 20 + 21 + 21 + 18 + 20 + 18 + 20
+ 15 + 20 + 18 + 20) / 20
= 376 / 20
= 18.80 days

Interpretation: On average, the entrepreneurs in this sample spend 18.80 days on market research.
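The calculation above can be checked in a few lines of Python: sum the values, divide by n, and compare with the standard library's `statistics.mean`.

```python
from statistics import mean

# Days spent on market research by 20 entrepreneurs (from the example above)
days = [16, 20, 13, 19, 24, 22, 18, 18, 15, 20, 21, 21, 18, 20, 18, 20, 15, 20, 18, 20]

total = sum(days)            # sum of x = 376
n = len(days)                # n = 20
print(total / n)             # 18.8
print(mean(days))            # statistics.mean gives the same result
```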

Mean for grouped data:


A study of how many days new and upcoming African entrepreneurs spend on market research
grouped the numbers of days for a sample of 20 entrepreneurs as follows.

Research days   Number of entrepreneurs   Class midpoint   f × xi
                (frequency, f)            (xi)
1 – 10          0                         5.5              0 × 5.5 = 0
11 – 20         16                        15.5             16 × 15.5 = 248
21 – 30         4                         25.5             4 × 25.5 = 102
Total           20                                         350

Sample size (n): 20
Formula: x̄ = ∑fxi / n = 350 / 20 = 17.50

Interpretation: On average, the entrepreneurs in this sample spend 17.50 days on market research.
This grouped mean differs slightly from the ungrouped mean (18.80 days) because grouping
assumes that every value in a class is equal to the class midpoint.
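For grouped data, each class contributes its frequency multiplied by its midpoint. The Python sketch below (with a hypothetical helper, `grouped_mean`) computes the midpoint of each class from its stated limits and reproduces ∑fxi / n = 350 / 20.

```python
def grouped_mean(classes):
    """Mean of grouped data: sum(f * midpoint) / n, with midpoint = (lower + upper) / 2."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


# (lower limit, upper limit, frequency) for the research-days classes
research_days = [(1, 10, 0), (11, 20, 16), (21, 30, 4)]
print(grouped_mean(research_days))   # 17.5
```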

3. Mode (𝑴𝒐 )



Data outliers (extreme values) can distort the mean. For example, the mean weight of seven
boys weighing 55kg, 73kg, 73kg, 89kg, 88kg, 69kg and 131kg is 82.57kg. However, if we
omitted the seventh boy from the sample, the mean weight would be 74.5kg; this value is
much lower once the relatively high weight of 131kg is removed. In this case, the mean is a poor
description of the central location of the data. A better indication of the central
location for these data is the mode, namely, the data value that occurs with the greatest
frequency (73kg). The mode is also an important measure for qualitative data (e.g. the most
preferred car out of a choice of five).
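The effect of the outlier can be reproduced with Python's `statistics` module (this sketch assumes Python 3.8 or later for `multimode`). It computes the mean with and without the 131kg value, and shows that the mode is the same in both cases.

```python
from statistics import mean, multimode

weights = [55, 73, 73, 89, 88, 69, 131]     # boys' weights in kg (example above)

print(round(mean(weights), 2))              # 82.57: pulled up by the 131kg outlier
print(mean(weights[:-1]))                   # 74.5: the mean without the seventh boy
print(multimode(weights))                   # [73]: the mode ignores the outlier
```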

Mode for ungrouped data:


A study of how many days new and upcoming African entrepreneurs spend on market research
recorded the following numbers of days for a sample of 20 entrepreneurs.
Data: 16; 20; 13; 19; 24; 22; 18; 18; 15; 20; 21; 21; 18; 20; 18; 20; 15; 20; 18; 20
Sample size (n): 20

Ordered data: 13; 15; 15; 16; 18; 18; 18; 18; 18; 19; 20; 20; 20; 20; 20; 20; 21; 21; 22; 24
Mode = 20 (the most frequently occurring value; it appears six times)
Interpretation: The most common number of days that an entrepreneur in this sample spends on
market research is 20.
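Finding the mode amounts to counting how often each value occurs. The Python sketch below uses `collections.Counter` to do the counting and keeps every value that shares the highest count, so it would also reveal a bimodal data set.

```python
from collections import Counter

# Days spent on market research by 20 entrepreneurs (from the example above)
days = [16, 20, 13, 19, 24, 22, 18, 18, 15, 20, 21, 21, 18, 20, 18, 20, 15, 20, 18, 20]

counts = Counter(days)                        # frequency of each distinct value
highest = max(counts.values())                # the largest frequency (6)
modes = [value for value, f in counts.items() if f == highest]

print(modes)                                  # [20]
print(counts.most_common(2))                  # [(20, 6), (18, 5)]
```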



Mode for grouped data:
A study of how many days new and upcoming African entrepreneurs spend on market research
grouped the numbers of days for a sample of 20 entrepreneurs as follows.

Research days   Number of entrepreneurs   Cumulative frequency
                (frequency)               (f<)
1 – 10          0                         0
11 – 20         16                        0 + 16 = 16
21 – 30         4                         16 + 4 = 20
Total           20

Sample size (n): 20

Modal class: (11 – 20), as it is the class with the highest frequency

Formula: Mo = Omo + c(fm – fm-1) / (2fm – fm-1 – fm+1)
where Omo is the lower limit of the modal class, c is the class width, fm is the frequency of the
modal class, and fm-1 and fm+1 are the frequencies of the classes immediately before and after it.

Mo = 11 + 10(16 – 0) / (2 × 16 – 0 – 4) = 11 + 160 / 28 = 11 + 5.71 = 16.71

Interpretation: The most frequently occurring number of days spent on market research is 16.71
days. This is similar to the mean of 17.5 days.

4. Median (𝑴𝒆)

Like the mode, the median is not influenced by data outliers. In the weights example above, the
median (the value of the middle ‘item’) is the same as the mode (73kg).

Median for ungrouped data:


A study of how many days new and upcoming African entrepreneurs spend on market research
recorded the following numbers of days for a sample of 20 entrepreneurs.
Data: 16; 20; 13; 19; 24; 22; 18; 18; 15; 20; 21; 21; 18; 20; 18; 20; 15; 20; 18; 20
Sample size (n): 20

Ordered data: 13; 15; 15; 16; 18; 18; 18; 18; 18; 19; 20; 20; 20; 20; 20; 20; 21; 21; 22; 24

Median position = (n + 1) / 2 = (20 + 1) / 2 = 21 / 2 = 10.5th position (halfway between the
10th value, 19, and the 11th value, 20)
Median = 19 + (20 – 19) × 0.50 = 19 + 0.50 = 19.50 days
Interpretation: Half of the entrepreneurs spend 19.50 days or less on market research,
while the other half spend 19.50 days or more on market research.
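The positional reasoning above can be checked in Python. The sketch sorts the data, shows the two values either side of the 10.5th position, and compares the result with `statistics.median`, which averages the two middle values when n is even.

```python
from statistics import median

# Days spent on market research by 20 entrepreneurs (from the example above)
days = [16, 20, 13, 19, 24, 22, 18, 18, 15, 20, 21, 21, 18, 20, 18, 20, 15, 20, 18, 20]

ordered = sorted(days)
n = len(ordered)
position = (n + 1) / 2                      # 10.5th position
print(position, ordered[9], ordered[10])    # 10.5 19 20 (lists count from 0)
print(median(days))                         # 19.5
```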

Median for grouped data:

Research days   Number of entrepreneurs   Cumulative frequency
                (frequency)               (f<)
1 – 10          0                         0
11 – 20         16                        0 + 16 = 16
21 – 30         4                         16 + 4 = 20
Total           20

Sample size (n): 20

Median class: n / 2 = 20 / 2 = 10, so the median is the 10th value, which falls within the class
(11 – 20) (the cumulative frequency rises from 0 to 16 in this class).

Formula: Me = Ome + c[(n/2) – f(<)] / fme = 11 + 10[(20/2) – 0] / 16 = 11 + 6.25 = 17.25
where Ome is the lower limit of the median class, c is the class width, f(<) is the cumulative
frequency of the classes before the median class and fme is the frequency of the median class.

Interpretation: Half of the entrepreneurs spend 17.25 days or less on market research,
while the other half spend 17.25 days or more on market research.
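The grouped mode and grouped median formulas can be sketched together in Python. The helpers `grouped_mode` and `grouped_median` are hypothetical names. Like the worked examples, they use the lower class limit (11, not the boundary 10.5) and assume whole-number classes such as 11 – 20, so the class width is upper − lower + 1 = 10.

```python
def grouped_mode(classes):
    """Mo = Omo + c(fm - fm-1) / (2fm - fm-1 - fm+1) for the class with the highest frequency."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                           # index of the modal class
    lower, upper, f_m = classes[m]
    c = upper - lower + 1                                 # class width, e.g. 11-20 -> 10
    f_before = freqs[m - 1] if m > 0 else 0               # frequency of the class below
    f_after = freqs[m + 1] if m + 1 < len(freqs) else 0   # frequency of the class above
    return lower + c * (f_m - f_before) / (2 * f_m - f_before - f_after)


def grouped_median(classes):
    """Me = Ome + c[(n/2) - f(<)] / fme, with f(<) the cumulative frequency below the median class."""
    n = sum(f for _, _, f in classes)
    cumulative = 0
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:                       # first class reaching the (n/2)th value
            c = upper - lower + 1
            return lower + c * (n / 2 - cumulative) / f
        cumulative += f


# (lower limit, upper limit, frequency) for the research-days classes
research_days = [(1, 10, 0), (11, 20, 16), (21, 30, 4)]
print(round(grouped_mode(research_days), 2))   # 16.71
print(grouped_median(research_days))           # 17.25
```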

5. Quartiles

Quartiles are non-central measures that divide an ordered data set into four equal parts
(Wegner, 2020). Make sure that you understand the difference between quartile position
and the quartile value.

Quartiles for ungrouped data:


A study of how many days new and upcoming African entrepreneurs spend on market research
recorded the following numbers of days for a sample of 20 entrepreneurs.
Data: 16; 20; 13; 19; 24; 22; 18; 18; 15; 20; 21; 21; 18; 20; 18; 20; 15; 20; 18; 20
Sample size (n): 20

Ordered data: 13; 15; 15; 16; 18; 18; 18; 18; 18; 19; 20; 20; 20; 20; 20; 20; 21; 21; 22; 24

Lower quartile (Q1) position = (n + 1) / 4 = 21 / 4 = 5.25th position;
Q1 = 18 + (18 – 18) × 0.25 = 18 + 0 = 18

Middle quartile (Q2) position = (n + 1) / 2 = 21 / 2 = 10.5th position;
Q2 = 19 + (20 – 19) × 0.5 = 19 + 0.50 = 19.50

Upper quartile (Q3) position = 3(n + 1) / 4 = 63 / 4 = 15.75th position;
Q3 = 20 + (20 – 20) × 0.75 = 20 + 0 = 20

Interpretation: 25% of the entrepreneurs spend up to 18 days on market research, 50% spend up
to 19.50 days on market research, and 75% spend up to 20 days on market research.
* You do not need to know the formula and process for calculating the quartiles for grouped
data.
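The quartile steps above (find the position, then interpolate between the neighbouring ordered values) can be sketched in Python as follows. `value_at_position` is a hypothetical helper. As a cross-check, the standard library's `statistics.quantiles` (Python 3.8+), whose default 'exclusive' method uses the same (n + 1)-based positions, returns the same three values.

```python
from statistics import quantiles


def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position using linear interpolation."""
    whole = int(position)                    # e.g. 5 for the 5.25th position
    fraction = position - whole              # e.g. 0.25
    value = ordered[whole - 1]               # the 5th value (lists count from 0)
    if fraction:
        value += (ordered[whole] - value) * fraction   # move part of the way to the next value
    return value


# Days spent on market research by 20 entrepreneurs (from the example above)
days = [16, 20, 13, 19, 24, 22, 18, 18, 15, 20, 21, 21, 18, 20, 18, 20, 15, 20, 18, 20]
ordered = sorted(days)
n = len(ordered)

q1 = value_at_position(ordered, (n + 1) / 4)         # 5.25th position
q2 = value_at_position(ordered, (n + 1) / 2)         # 10.5th position
q3 = value_at_position(ordered, 3 * (n + 1) / 4)     # 15.75th position
print(q1, q2, q3)                                    # 18.0 19.5 20.0
print(quantiles(days, n=4))                          # [18.0, 19.5, 20.0]
```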

Other measures of central location

Three additional measures of central location are described (the geometric mean, the
harmonic mean and the weighted average). These are used less often and will not be
examined.

Geometric mean: Used when the data represent percentage changes, such as indexes
and growth rates.

Harmonic mean: Used when the data represent rates or ratios, such as speeds (km per hour) or
prices per unit.

Weighted average: In some studies, some data values are more important than others.
In these cases, each value is assigned a weight based on its importance, and the average is
calculated as ∑(weight × value) / ∑weight.
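These measures will not be examined, but a brief Python sketch may make the differences concrete. All input values below are hypothetical, chosen only for illustration, and the sketch assumes Python 3.8 or later for `statistics.geometric_mean` and `math.prod`.

```python
from math import isclose, prod
from statistics import geometric_mean, harmonic_mean

# Hypothetical growth: +10% in year 1 and +20% in year 2 (growth factors 1.10 and 1.20)
growth_factors = [1.10, 1.20]
avg_factor = geometric_mean(growth_factors)
print(f"average growth per year: {(avg_factor - 1) * 100:.2f}%")   # about 14.89%
# Growing at the average rate for two years gives the same total growth
assert isclose(avg_factor ** 2, prod(growth_factors))

# Hypothetical speeds (km per hour) over two equal distances
speeds = [60, 40]
print(harmonic_mean(speeds))       # about 48 km/h, not the arithmetic mean of 50 km/h

# Hypothetical weighted average: two values with importance weights
values = [70, 80]
weights = [0.4, 0.6]
weighted = sum(w * x for w, x in zip(weights, values)) / sum(weights)
print(weighted)                    # about 76
```

The growth example shows why the arithmetic mean (15%) slightly overstates average compound growth, which is the situation the geometric mean is designed for.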

6. Range

This is the simplest measure of dispersion and is calculated as ‘highest value (max) –
lowest value (min)’. For example, the range of the boys’ weights in the previous example is
131kg – 55kg = 76kg. The range is influenced by extreme data values (outliers) and is not
widely used. However, it is very useful in identifying data outliers, which you may want to
exclude from your analysis or which may have been incorrectly recorded (what would you think
if one of the boys weighed 313kg?).
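A quick Python sketch of the range, using a hypothetical helper `data_range`, shows how strongly a single extreme value drives it: removing the 131kg boy shrinks the range from 76kg to 34kg.

```python
def data_range(values):
    """Range = highest value (max) minus lowest value (min)."""
    return max(values) - min(values)


weights = [55, 73, 73, 89, 88, 69, 131]                    # boys' weights in kg
print(data_range(weights))                                 # 76
print(data_range([w for w in weights if w != 131]))        # 34 once the outlier is excluded
```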

7. Variance (𝒔𝟐 )

Variance is a measure of dispersion that uses all the data values and, as such, it is a
powerful and commonly used measure. It is based on the difference between each data value and
the mean (the deviation about the mean): for a sample, the variance is the sum of the squared
deviations divided by n – 1. The variance is one of the most commonly used measures of spread;
therefore, make sure you understand this concept.

8. Standard deviation (𝒔)

The standard deviation is the square root of the variance, so it is expressed in the same units
as the original data. It is relatively stable from one sample to the next taken from the same
population and is the most important statistical measure of spread. As such, you will see and
use the standard deviation frequently throughout the remainder of this study guide, and it is
important that you have a clear understanding of the concept.

9. Coefficient of variation (𝑪𝑽)

The standard deviation is a measure of absolute variability in a data set. The coefficient of
variation is a relative measure of variability, used when comparing two samples that have
vastly different means. In such a case, you would not get an accurate picture of the relative
dispersion in the two data sets by comparing the two standard deviations.



Variance, Standard deviation and Coefficient of variation for ungrouped data:
The travelling times (in minutes) to college of 10 marketing students are given in the table
below:
Data: 18; 26; 15; 17; 7; 27; 24; 17; 10; 8     Sample size (n): 10     Mean (x̄): 16.9 minutes

xi     x̄       (xi – x̄)    (xi – x̄)²
18     16.9    1.1         1.21
26     16.9    9.1         82.81
15     16.9    -1.9        3.61
17     16.9    0.1         0.01
7      16.9    -9.9        98.01
27     16.9    10.1        102.01
24     16.9    7.1         50.41
17     16.9    0.1         0.01
10     16.9    -6.9        47.61
8      16.9    -8.9        79.21
                            ∑ = 464.9

Variance (s²) = ∑(xi – x̄)² / (n – 1) = 464.9 / 9 = 51.66 minutes²

Standard deviation (s) = √s² = √51.66 = 7.19 minutes

Coefficient of variation (CV) = (s / x̄) × 100 = (7.19 / 16.9) × 100 ≈ 43%
Interpretation: The variance of 51.66 minutes² is roughly the average squared deviation from the
mean; because it is in squared units, it is difficult to interpret directly. The standard deviation
is easier to interpret: a typical travelling time lies about 7.19 minutes from the mean of 16.9
minutes. This means that a marketing student needs to plan for a variation of about 7 minutes in
the travelling time to college. A coefficient of variation of 43% shows that the travelling times
vary considerably relative to the mean, so it would be risky for a student to allow only
17 minutes to get to college.
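The worked example can be reproduced with Python's `statistics` module. `variance` and `stdev` are the sample versions (they divide by n − 1), matching the formula above. Without intermediate rounding, the coefficient of variation comes out at about 42.5%, which the example rounds to 43%.

```python
from statistics import mean, stdev, variance

times = [18, 26, 15, 17, 7, 27, 24, 17, 10, 8]   # travelling times in minutes

x_bar = mean(times)                              # 16.9
s2 = variance(times)                             # sample variance: divides by n - 1
s = stdev(times)                                 # square root of the sample variance
cv = s / x_bar * 100                             # coefficient of variation, in %

print(round(s2, 2), round(s, 2), round(cv, 1))   # 51.66 7.19 42.5
```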



10. Measures of skewness

The shape of the data distribution, which you can observe from the frequency polygon, is
important because it influences the choice of measures you use to describe the data.
Skewness looks at the relationship between the mean, median and mode, and what that
relationship tells us about the data. There are three possible shapes: symmetrical, skewed to
the right and skewed to the left. You must be able to interpret the coefficient of skewness.

Symmetrical distribution

In this case, the mean, median and mode are all equal. The coefficient of skewness is equal
to 0.

Skewed to the right distribution

In this case, the mode has the lowest value, followed by the median and then the mean, which is
pulled upwards by a few high values. The coefficient of skewness is a positive value above 0.
This means that the majority of items / individuals act in a certain way (mode) and a few act
more often, e.g. the majority of students study only 3 days a week, but a few study more than
3 days a week.

Skewed to the left distribution

In this case, the mode has the highest value, followed by the median and then the mean, which
is pulled downwards by a few low values. The coefficient of skewness is a negative value below 0.
This means that the majority of items / individuals act in a certain way (mode) and a few act
less often, e.g. the majority of students study every day, but a few study less than 3 days a
week.

Interpretation:

In the example used previously, where we investigated the number of days an entrepreneur
spends on market research, the mean = 18.80 days, the median = 19.50 days and the mode =
20 days. This data is almost symmetrical in distribution because the three measures of central
location are close to one another, but because the mode is the highest value (20 days) and the
mean is the lowest value (18.80 days), the data set is said to be slightly skewed to the left
(negatively skewed).
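The guide does not state which formula it uses for the coefficient of skewness. As an assumption, the Python sketch below uses Pearson's coefficient of skewness, Sk = 3(x̄ − Me) / s, which is one common version; check which formula your prescribed textbook uses before relying on the exact value. For the market research data, the coefficient is negative, which agrees with the mean lying below the median.

```python
from statistics import mean, median, stdev

# Days spent on market research by 20 entrepreneurs (from the example above)
days = [16, 20, 13, 19, 24, 22, 18, 18, 15, 20, 21, 21, 18, 20, 18, 20, 15, 20, 18, 20]

x_bar = mean(days)          # 18.8
me = median(days)           # 19.5
s = stdev(days)             # sample standard deviation, about 2.61

# Assumed formula: Pearson's coefficient of skewness
sk = 3 * (x_bar - me) / s
print(round(sk, 2))         # -0.81: negative, so slightly skewed to the left
```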



11. The box plot

Data can be summarised in terms of five descriptive measurements that are also used in an
easy-to-view graph, the box plot. Make sure that you are able to draw and interpret the box
plot. A box plot is drawn up based on a five-number summary table, and then graphically
summarises that information:

Five-number summary table for the number of days African entrepreneurs spend on research
Minimum value (lowest number)     13
Q1                                18
Median                            19.50
Q3                                20
Maximum value (highest number)    24

Box Plot – number of days African entrepreneurs spend on research (box from 18 (Q1) to 20 (Q3)
with the median marked at 19.50 (Me); whiskers extending to 13 (Min) and 24 (Max))

Interpretation: The data range from 13 to 24 days (a range of 11 days). The middle value is
19.50 days. 25% of the entrepreneurs spend up to 18 days on market research and 75% spend up
to 20 days on market research. Based on the median, half of the entrepreneurs spend 19.50 days
or less on market research, and the other half spend 19.50 days or more on market research.
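The five-number summary behind the box plot can be computed in one function. The sketch below defines a hypothetical helper, `five_number_summary`, which uses the same (n + 1) position method with interpolation as the quartile example, so it should match the table above.

```python
def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum) using the (n + 1) position method."""
    ordered = sorted(values)
    n = len(ordered)

    def at(position):
        # Interpolate between neighbouring values for fractional (1-based) positions
        whole = int(position)
        fraction = position - whole
        value = ordered[whole - 1]
        if fraction:
            value += (ordered[whole] - value) * fraction
        return value

    return (ordered[0], at((n + 1) / 4), at((n + 1) / 2), at(3 * (n + 1) / 4), ordered[-1])


# Days spent on market research by 20 entrepreneurs (from the example above)
days = [16, 20, 13, 19, 24, 22, 18, 18, 15, 20, 21, 21, 18, 20, 18, 20, 15, 20, 18, 20]
print(five_number_summary(days))   # (13, 18.0, 19.5, 20.0, 24)
```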



G. Study Unit 2 – Revision Exercises

A marketing manager for a popular hair product manufacturer has decided that he needs to
make better decisions regarding the choice of media to reach the target market. He has hired
you to conduct a study to identify the most popular media among the company’s target
market.

The following is a table of responses to the question: “From what medium do you get your
daily news?” Interviewers also recorded the age and gender of the respondents.

Initials   Age (years)   Gender   Medium
BN         45            M        TV
RF         34            M        Newspaper
DF         87            M        Newspaper
SD         23            F        TV
IK         18            M        Newspaper
HY         25            F        TV
NM         26            F        TV
MK         43            M        Newspaper
JK         67            M        TV
LO         40            F        Radio
GB         23            M        Newspaper
DP         46            F        TV
SA         23            F        Radio
EW         55            F        Radio
CD         21            F        TV

1. Summarise the data in three charts (use a combination of pie and simple bar charts),
depicting: 1) gender, 2) medium and 3) age (hint: divide the data into five age groups).
(13)

2. Calculate the range of the age of the respondents. Does this range show outliers? (3)

3. Calculate the mean age of the respondents. Is this the most appropriate measure of
central location? Why/why not? (3)
4. Calculate the standard deviation of the ages. What does this tell you? (6)


5. What is the most popular medium amongst women? What is the most popular medium
amongst men? What measure told you this? (3)

6. Is the choice of medium influenced by age? Describe any 2 methods that you used to get
your answer. (3)

H. Study Unit 2 – Revision Exercises Solutions

(1) Gender: A pie chart is a good choice for depicting gender, as there are only two categories
(male and female); however, a bar chart could also be used to summarise the data.

Pie chart – Gender of respondents (M: 7 of 15 = 47%; F: 8 of 15 = 53%)

(2) Medium: A bar chart was used to summarise the medium; however, a pie chart
could also be used.

Bar chart – Medium of daily news source for respondents (Newspaper 5, TV 7, Radio 3)


(3) Age: A histogram was used to summarise age, as classes were created for the ages of the
respondents.

Histogram – Age of respondents (x-axis: age classes 15.5 – 30.5, 30.5 – 45.5, 45.5 – 60.5,
60.5 – 75.5 and 75.5 – 90.5; frequencies 7, 4, 2, 1 and 1)

2. Range = Max – Min = 87 – 18 = 69 years

Yes, the age of 87 can be seen as an outlier.

3. Mean (x̄) = ∑x / n = 576 / 15 = 38.40 years

The mean can be influenced by outliers (such as the respondent aged 87) and is therefore not
always the best measure of central location. In this case, the median (34 years) may be a more
appropriate measure.

4. Variance (s²) = ∑(x – x̄)² / (n – 1) = 5363.6 / 14 = 383.11 years²

Standard deviation (s) = √s² = √383.11 = 19.57 years

Interpretation: The variance of 383.11 years² is roughly the average squared deviation from the
mean. The standard deviation is easier to interpret: a typical respondent’s age lies about 19.57
years from the mean of 38.4 years. This means that the ages of the respondents vary by almost
20 years.

5. TV is the most popular medium amongst women.
Newspaper is the most popular medium amongst men.
The relative frequency distribution of medium by gender can be used to determine the most
popular medium for each gender: the medium with the highest frequency (the mode) is the most
popular.

6. No, the choice of medium does not appear to be influenced by age. The following measures
could be used:

- Mean: The average age for each medium does not vary widely, which means that there is not a
distinct age group of people who use each medium. The averages are as follows: TV – 36,
Radio – 39, Newspaper – 41.
- Range: The range calculated for each medium is fairly large, which means that both young and
old people make use of each medium. The ranges are as follows: TV – 46, Radio – 32,
Newspaper – 69.
Note: These aren’t the only measures that can justify your answer; any logical statistical
measure can be used to justify your answer.
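To check the solutions above, the Python sketch below recomputes them directly from the revision exercise table: the histogram class counts, the range, mean, sample variance and standard deviation of the ages, the most popular medium for each gender, and the mean age and age range for each medium. It uses only the standard library.

```python
from collections import Counter
from statistics import mean, stdev, variance

# (initials, age, gender, medium) from the revision exercise table
responses = [
    ("BN", 45, "M", "TV"), ("RF", 34, "M", "Newspaper"), ("DF", 87, "M", "Newspaper"),
    ("SD", 23, "F", "TV"), ("IK", 18, "M", "Newspaper"), ("HY", 25, "F", "TV"),
    ("NM", 26, "F", "TV"), ("MK", 43, "M", "Newspaper"), ("JK", 67, "M", "TV"),
    ("LO", 40, "F", "Radio"), ("GB", 23, "M", "Newspaper"), ("DP", 46, "F", "TV"),
    ("SA", 23, "F", "Radio"), ("EW", 55, "F", "Radio"), ("CD", 21, "F", "TV"),
]
ages = [age for _, age, _, _ in responses]

# Question 1(3): frequencies for the age classes used in the histogram
bounds = [(15.5, 30.5), (30.5, 45.5), (45.5, 60.5), (60.5, 75.5), (75.5, 90.5)]
class_counts = [sum(low < age < high for age in ages) for low, high in bounds]
print(class_counts)                                      # [7, 4, 2, 1, 1]

# Questions 2 to 4: range, mean, sample variance and standard deviation
print(max(ages) - min(ages))                             # 69
print(mean(ages))                                        # 38.4
print(round(variance(ages), 2), round(stdev(ages), 2))   # 383.11 19.57

# Question 5: the most popular medium (the mode) for each gender
for gender in ("F", "M"):
    media = Counter(m for _, _, g, m in responses if g == gender)
    print(gender, media.most_common(1))   # F -> [('TV', 5)], M -> [('Newspaper', 5)]

# Question 6: mean age (1 decimal) and age range for each medium
for medium in ("TV", "Radio", "Newspaper"):
    group = [age for _, age, _, m in responses if m == medium]
    print(medium, round(mean(group), 1), max(group) - min(group))
```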

I. Study Unit 2 - Progress check

You have come to the end of Study Unit 2.


Time to do a progress check to determine whether you have gone through all the required
content and completed all the exercises.

Your progress checklist:

Progress checklist                                                              YES / NO?

Did you read through each study unit outcome?

Did you go through all the learning material?

Did you complete all the relevant revision exercises and check your answers
against the answers provided?

At this point, you should be able to:
• understand how data are distributed;
• describe how data can be summarised by using graphical presentations and other statistical
terms; and
• know how this summarised data can be used by marketing managers to assist them in
decision-making.


Study Unit 3 – The foundation of statistical inference: Sampling

A. Study Unit 3 - Chapters 4 – 6

This and the following study units are all about inferential statistics. The process of statistical
inference enables us, as marketers, to answer specific questions with a known degree of
confidence. This allows us to be more confident in our decision-making. Study Unit 3
introduces the concept of probability and describes the methods for generalising results. In
most circumstances, we obtain results from a sample of the target population.

“It is difficult to understand why statisticians commonly limit their inquiries to Averages, and
do not revel in more comprehensive views.”
- Sir Francis Galton

B. Study Unit 3: Module Outcomes

Let’s recap what the relevant module learning outcome is for this study unit:

Understand the concept of probability and know which methods are used for generalising
results.

C. Study Unit 3 – Specific Outcomes

Let’s recap the specific learning outcomes for this study unit:

The student should be able to describe the properties and concepts of probability and probability distributions, and define and apply the rules of the different types of probability. They should also be able to interpret probabilities and make decisions based on the results.
This study unit covers two common discrete probability distributions, the Binomial distribution and the Poisson distribution. Students should know when to apply each of them, how to describe and compute them, and how to apply them to marketing decision-making.
The final part of this study unit serves as preparatory reading for Study Unit 4. The student should understand the purpose of inferential statistics, the difference between a sample and a population, the different types of sampling and the reasons for choosing a particular type in a marketing research study.

D. Study Unit 3 – Assessment Criteria

Study Unit Outcomes (The student should be able to…) | Assessment Criteria (How will you know if the student has achieved the learning outcome?) | Relevant Chapter in the Prescribed Textbook

Define the different types of probabilities. | Describe the properties and concepts of probability and probability distributions. Define and apply the rules of the different types of probability. Interpret probabilities and make decisions based on the results. | Chapter 4

Describe the properties and concepts of a probability in relation to a marketing decision-making problem. | Explain that a probability is a value between 0 and 1. | Chapter 4

Apply the rules of probability and describe the complement of an event and the process for determining its probability. | Understand the different rules of probabilities, i.e. “less than”, “more than”, “at least”, “no more than”. | Chapter 4

Describe two common probability distributions, i.e. the Binomial and Poisson distributions. | Describe the two common probability distributions (Binomial and Poisson distributions). | Chapter 5

Recognise when to apply each of these distributions in marketing decision-making. | Identify when to use the Binomial distribution and when to use the Poisson probability distribution, given a specific dataset and/or scenario. | Chapter 5

Compute probabilities using each distribution. | Compute probabilities using the Poisson and Binomial distributions. | Chapter 5

Interpret probability findings and apply them to a marketing problem. | Interpret probabilities and make decisions based on results. | Chapter 5

D. Study Unit 3 – Objective 1 (Chapter 4)

At the end of this chapter, you should be able to describe the properties and concepts of
probability, and define and apply the rules of the different types of probability. You should
also be able to interpret probabilities and make decisions based on the results.

1. Introduction

Although the rest of your Statistical Analysis studies deal with inferential statistics, please
remember that descriptive techniques should always be applied to your data, as a first step
in analysis. This section is about probability, although you will be relieved to know that there
is not much about probability theory, an independent and ongoing field of research! The
purpose is to teach you how to apply probability theory and for this you need to understand
the basic concepts and different types of probability.

2. Types of Probabilities

Probabilities are loosely divided into subjective and objective types; objective probabilities are the type used in inferential statistics. The first section focuses on empirical probabilities (calculated from data that have been collected).

3. Basic properties of a probability

Probabilities are always stated as a value from 0 to 1:

- 0 is a probability of 0%, i.e. no chance of occurring;
- 1 is a probability of 100%, i.e. the event is certain to occur.

For example, if you flip a coin, the probability of flipping heads is the same as the probability
of flipping tails, i.e. 50% or 0.5 or ½. If you have a crooked coin, with two heads, the
probability of flipping heads would be 1 (100%) and the probability of flipping tails would be
0 (0%).

4. Basic probability concepts

Data may fall into only one category, or into more than one category.

For example: A clothing store asks you to conduct research into the split between male and female customers, and into which of those customers would use extended shopping hours. The following results were collected from a random sample of 100 shoppers:

Summary Table of Customer Attitudes towards Extended Shopping Hours

Gender | Number of respondents | Would shop during extended shopping hours (% of that gender) | Would not shop during extended shopping hours (% of that gender)
Male | 39 | 12% | 88%
Female | 61 | 67% | 33%

- When looking only at gender, a respondent is either male or female, meaning the data will only fall into a single category, e.g. 61% are female.

- However, when looking at gender and likelihood to use extended shopping hours, respondents fall into two categories at once. For example, 67% of the female respondents would use extended shopping hours, so about 41% of all respondents (0.61 × 0.67 ≈ 0.41) are females who would use extended shopping hours.
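The sketch below is a minimal Python illustration that uses the rounded percentages from the table. Because those percentages are calculated within each gender, the share of all respondents who fall into both categories equals the proportion of that gender multiplied by the within-gender percentage:

```python
# Figures from the extended-shopping-hours table (random sample of 100 shoppers).
# The percentages are within each gender, so they are conditional probabilities.
p_male, p_female = 39 / 100, 61 / 100
p_shop_given_male, p_shop_given_female = 0.12, 0.67

# Joint probability = marginal probability x conditional probability
p_male_and_shop = p_male * p_shop_given_male
p_female_and_shop = p_female * p_shop_given_female

print(f"Male and would shop:   {p_male_and_shop:.2%}")    # 4.68%
print(f"Female and would shop: {p_female_and_shop:.2%}")  # 40.87%
```

The table percentages are rounded, so these joint figures are approximate.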

This leads to the concept of statistical independence: two events are independent when the occurrence of one event does not influence the occurrence of the other. The most important thing to remember is that the probability of two independent events both occurring is equal to the product of the probabilities of the separate events. For example:

- Flipping heads (on a coin) the first time has no effect on the probability of flipping heads (or tails) the second time = statistical independence.

- The probability of flipping heads twice in a row is 0.5 (the probability of flipping heads the first time) multiplied by 0.5 (the probability of flipping heads the second time), which equals 0.25, or 25%.
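You can check the multiplication rule for independent events by listing every equally likely outcome of two fair coin flips. Here is a short Python sketch that uses only the standard library:

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair coin flips: HH, HT, TH, TT (all equally likely)
outcomes = list(product("HT", repeat=2))

# Proportion of outcomes with heads on both flips
favourable = [o for o in outcomes if o == ("H", "H")]
p_two_heads = Fraction(len(favourable), len(outcomes))

# Multiplication rule for independent events: P(H) x P(H)
p_rule = Fraction(1, 2) * Fraction(1, 2)

print(outcomes)
print(p_two_heads, p_rule)  # 1/4 1/4
```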

5. Computation of objective probabilities

The three types of objective probabilities are:

1. Marginal probability – the probability of only a single event occurring.
2. Joint probability – the probability of two events occurring simultaneously.
3. Conditional probability – the probability of one event occurring, given information about the occurrence of another.

Example: a study done on the companies listed on the JSE classified the companies into
groups according to industry sector and company size. This data is summarised in the cross-
tabulation below (Wegner, 2020):

Cross-tabulation table – JSE companies by sector and size (company size in columns)

Sector | Small | Medium | Large | Row total
Mining | 3 | 8 | 30 | 41
Financial | 9 | 21 | 42 | 72
Service | 10 | 6 | 8 | 24
Retail | 14 | 13 | 6 | 33
Column total | 36 | 48 | 86 | 170

Marginal probability: what are the chances of a company listed on the JSE being medium in size? 48 out of 170 = 48 / 170 × 100 = 28.24%.
Joint probability: what are the chances of a company listed on the JSE being both medium in size and in the retail sector? 13 out of 170 = 13 / 170 × 100 = 7.65%.
Conditional probability: what are the chances of a JSE-listed company that is already known to be medium in size being in the retail sector? 13 retail companies out of the 48 medium-sized companies = 13 / 48 × 100 = 27.08%.
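The three calculations above can be reproduced directly from the cross-tabulation. This is a minimal Python sketch; the dictionary layout is only one convenient way to hold the table:

```python
# Cross-tabulation of JSE companies by sector and size (Wegner, 2020)
table = {
    "Mining":    {"Small": 3,  "Medium": 8,  "Large": 30},
    "Financial": {"Small": 9,  "Medium": 21, "Large": 42},
    "Service":   {"Small": 10, "Medium": 6,  "Large": 8},
    "Retail":    {"Small": 14, "Medium": 13, "Large": 6},
}

grand_total = sum(sum(row.values()) for row in table.values())   # 170
medium_total = sum(row["Medium"] for row in table.values())      # 48

marginal = medium_total / grand_total                   # P(Medium)
joint = table["Retail"]["Medium"] / grand_total         # P(Retail and Medium)
conditional = table["Retail"]["Medium"] / medium_total  # P(Retail | Medium)

print(f"{marginal:.4f} {joint:.4f} {conditional:.4f}")  # 0.2824 0.0765 0.2708
```

The conditional probability is the joint probability divided by the marginal probability: (13/170) ÷ (48/170) = 13/48.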

6. Counting rules

There are three basic counting rules:

1. Multiplication rule – this can be applied in two ways:
a. The total number of ways in which n objects can be arranged (ordered) is $n! = n(n-1)(n-2)\cdots(2)(1)$, read as "n factorial".
b. If a particular random process has $n_1$ possible outcomes in the first trial, $n_2$ possible outcomes in the second trial, and so on, then the total number of possible outcomes is $n_1 \times n_2 \times n_3 \times \cdots$

2. Permutations – the number of distinct ways in which k objects selected from a group of n objects can be arranged, where the order matters. Each possible outcome (arrangement) is called a permutation.

$${}^{n}P_{k} = \frac{n!}{(n-k)!}$$

n! = n factorial = n(n − 1)(n − 2)(n − 3)…
k = number of objects selected
n = total number of objects

3. Combinations – the number of different ways of selecting a subset of r objects from a group of n objects, where the ordering is not important. Each possible selection is called a combination.

$${}^{n}C_{r} = \frac{n!}{r!\,(n-r)!}$$

n! = n factorial = n(n − 1)(n − 2)(n − 3)…
r! = r factorial = r(r − 1)(r − 2)(r − 3)…
r = number of objects selected
n = total number of objects
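Python's standard `math` module includes `factorial`, `perm` and `comb` (Python 3.8 or later), which apply these counting rules directly. The product-display numbers below are hypothetical and are used only for illustration:

```python
import math

# Hypothetical example: a display with space for k = 3 of n = 5 new products
n, k = 5, 3
r = k  # the combinations formula uses r for the number of objects selected

arrangements = math.factorial(n)  # n!: ways to order all 5 products
permutations = math.perm(n, k)    # n!/(n-k)!: ordered selections of 3
combinations = math.comb(n, r)    # n!/(r!(n-r)!): unordered selections of 3

# Multiplication rule (b): a process with 3, then 4, then 2 possible outcomes
total_outcomes = 3 * 4 * 2

print(arrangements, permutations, combinations, total_outcomes)  # 120 60 10 24
```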

E. Study Unit 3 – Objective 2 (Chapter 5)

At the end of this section, you should be able to understand the concept of probability
distributions, and know when to apply two different types, how to describe and compute
them, as well as how to apply them to marketing decision-making.

1. Introduction

A random variable is a numerical description of the outcome of an experiment. For example, the number of goals scored in a soccer match is a random variable with possible values of 0, 1, 2, 3, …; the number of customers eating at KFC on a particular day is a random variable with possible values of 0, 1, 2, 3, …. Even when flipping a coin, the random variable (the outcome) can be designated as 1 (for heads) or 2 (for tails). These are examples of discrete random variables. A continuous random variable, such as height or weight, however, can take on any value in a certain interval (for example, a height in metres can be 1.60, 1.605, 1.6053 or any other value in between, limited only by how precisely it is measured).

2. Discrete probability distributions

The probability distribution for a random variable describes how the probabilities are
distributed over the values of the random variable. No matter what the type of random
variable, it will have an associated probability distribution, namely, a list of the possible
outcomes and their associated probabilities. For a discrete random variable, x, the probability
distribution is defined by the probability function, f(x). This section deals with the two most
commonly used discrete probability functions, i.e. Binomial and Poisson probability
distributions.

3. The Binomial probability distribution

This section outlines the binomial distribution. Remember that bi means ‘two’: a binomial random variable counts the number of successes in n trials, where each trial has only two possible outcomes (success, with probability p, or failure, with probability q = 1 − p). A binomial distribution has four properties: there is a fixed number of trials, n; each trial has only two possible outcomes; the probability of success, p, is the same for every trial; and the trials are independent of one another. Think of flipping a coin and consider that a head is a success and a tail is a failure (especially if you are betting!). The formulas may seem rather complicated, but don’t be concerned: you will not be asked to remember them, because they will be given in the exam. It is therefore essential that you understand the concepts behind the formulas and how to interpret your results.

Note: A binomial probability distribution will always refer to a probability of success or failure.

$$P(x \text{ successes in } n \text{ trials}) = {}^{n}C_{x}\, p^{x} (1-p)^{n-x}$$

n = number of trials
x = number of successes
n – x = number of failures
p = probability of success in one trial
1 – p = q = probability of failure in one trial

The Marketing Manager of a chain of Chicken Licken restaurants would like to know which area of the business is more successful: take-away or eat-in. He hires you to do some marketing research.

If there is a 25% probability of a customer wanting a take-away and there are 10 customers in the restaurant, what are the chances that exactly 7 of them will want a take-away?

n = 10
x = 7
n – x = 3
p = 0.25 = probability of a customer wanting a take-away
1 – p = 1 – 0.25 = 0.75 = probability of a customer not wanting a take-away

$$P(7 \text{ out of } 10 \text{ customers}) = {}^{10}C_{7} \times 0.25^{7} \times 0.75^{10-7} = 0.0031 \approx 0.31\%$$

Interpretation: There is only about a 0.31% chance (almost no chance) that 7 out of 10 customers will want a take-away.

Use for marketing decision-making:

- Having a better understanding of the customer – more customers eat in the restaurant
- Knowing more about purchase trends – fewer customers buy take-aways
- Knowing which services are more popular than others – eat-in
- Knowing which areas of the business are busier – the restaurant
- Knowing which areas to promote more – take-aways
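You can check the binomial calculation with a few lines of Python. This is a minimal sketch of the formula given above. `binomial_pmf` is a hypothetical helper name and is not part of any library:

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial random variable: nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Chicken Licken example: 10 customers, 25% chance that each wants a take-away
prob = binomial_pmf(7, n=10, p=0.25)
print(f"{prob:.4f}")  # 0.0031
```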

4. The Poisson probability distribution

The Poisson distribution is often used when dealing with the number of occurrences of an event over a specified interval of time or space, such as the number of Big Macs sold in one hour, or the number of raisins in a muffin. There is no fixed upper limit on the number of occurrences; it is impossible to know beforehand what the maximum number could be. Two assumptions must be met when using a Poisson probability function to describe the number of occurrences of a random variable:

- the probability of the occurrence of the event is the same for any two intervals of equal length, and
- the occurrence of the event in any interval is independent of the occurrence in any other interval.

Note: A Poisson probability distribution will always refer to an “average” or “mean” value.

$$P(x \text{ occurrences in an interval}) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}$$

x = number of occurrences for which a probability is required
e = mathematical constant = 2.71828
λ = mean or average number of occurrences per interval


You are asked by the same Marketing Manager of the chain of Chicken Licken restaurants to do some more marketing research. If, on average, 5 out of every 10 customers want a take-away and there are 10 customers in the restaurant, what are the chances that 7 of them will want a take-away?

x = 7
e = 2.71828
λ = 5

$$P(7 \text{ customers out of } 10) = \frac{2.71828^{-5} \times 5^{7}}{7!} = 0.104445 \approx 10.44\% \text{ chance}$$

Interpretation: On average, 5 out of 10 customers want a take-away, but there is a 10.44% chance that exactly 7 out of 10 customers will want a take-away.
Use for marketing decision-making:

- Knowing more about purchase trends – the number of take-away customers is unlikely to vary much from the mean
- Knowing how to plan – there is only a 10.44% chance that additional resources would be needed for 7 take-away customers
- Knowing which areas of the business will be busier – it is unlikely that the take-away side will increase to 7 customers
- Knowing which areas to promote more – would extra marketing increase the chances of more customers wanting a take-away?
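The Poisson formula can be sketched in Python in a similar way. `poisson_pmf` is a hypothetical helper. It uses `math.exp` instead of the rounded value e = 2.71828, which makes no difference at four decimal places:

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson random variable with mean lam: e^(-lam) * lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)

# Take-away example: on average 5 take-away customers per 10 customers
prob = poisson_pmf(7, lam=5)
print(f"{prob:.4f}")  # 0.1044
```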

5. Probability rules

Probability of 1 out of 5: P (x = 1)
Probability of 4 out of 5: P (x = 4)
Probability of less than 1 out of 5: P (x = 0)
Probability of less than 3 out of 5: P (x = 0) + P (x = 1) + P (x = 2)
Probability of more than 1 out of 5: 1 – (P(x = 0) + P(x = 1))
Probability of at least 2 out of 5: 1 – (P (x = 0) + P (x =1))
Probability of no more than 2 out of 5: P (x = 0) + P (x = 1) + P (x = 2)
Probability of between 0 and 2 out of 5: P (x = 1)
Probability of between 0 and 2 (inclusive) out of 5: P (x = 0) + P(x = 1) + P (x = 2)
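These rules apply to any discrete distribution. The sketch below assumes a hypothetical binomial variable with n = 5 and p = 0.4; these values are not taken from the guide. It shows that the complement shortcut gives the same answer as adding up the individual outcomes:

```python
from math import comb

# Hypothetical binomial variable: n = 5 trials, p = 0.4 chance of success in each
n, p = 5, 0.4

def P(x: int) -> float:
    """Binomial probability P(X = x) for the n and p above."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

less_than_3 = P(0) + P(1) + P(2)
no_more_than_2 = P(0) + P(1) + P(2)      # same outcomes as "less than 3"
more_than_1 = 1 - (P(0) + P(1))          # complement of "1 or fewer"
at_least_2 = 1 - (P(0) + P(1))           # same outcomes as "more than 1"
between_0_and_2 = P(1)                   # exclusive: only x = 1
between_0_and_2_inclusive = P(0) + P(1) + P(2)

# The complement shortcut matches summing x = 2, 3, 4, 5 directly
direct = sum(P(x) for x in range(2, n + 1))
print(round(more_than_1, 4), round(direct, 4))
```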

6. The normal probability distribution

The normal probability distribution is characterised by the following:

- It is bell-shaped.
- It is symmetrical about a central value, µ (the population mean).
- The tails of the distribution never touch the x-axis.
- It is described by two parameters: the population mean (µ) and the population standard deviation (σ).
- The area under the curve equals 1 – this corresponds to the complete sample space of a random experiment. This means it represents the sum of the probabilities associated with the variable being studied, or 100%.
- Due to the symmetry, the area above µ is 0.5 (and so is the area below µ).
- The probability associated with a particular range of x values is described by the area under the curve between the limits of that x range.
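Python's `statistics.NormalDist` (Python 3.8 or later) lets you see some of these properties numerically. The mean of 50 and standard deviation of 10 below are hypothetical:

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=10)  # hypothetical normal variable: mean 50, sd 10

area_above_mean = 1 - X.cdf(50)        # symmetry: half the area lies above the mean
area_40_to_60 = X.cdf(60) - X.cdf(40)  # probability = area between the two limits
total_area = X.cdf(150) - X.cdf(-50)   # +/- 10 sd covers effectively the whole curve

print(area_above_mean, round(area_40_to_60, 4), round(total_area, 6))  # 0.5 0.6827 1.0
```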

It is very important that you understand and remember these concepts.

7. The standard normal probability distribution

A random variable that has a normal distribution with a mean (µ) of 0 and a standard deviation
(σ) of 1 is said to have a standard normal probability distribution. The letter z is commonly
used to denote this particular normal random variable.

Again, please remember the concept, but know that your statistical package will ‘look up’ the
values for you.
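Any normal value x can be converted to the standard normal scale with z = (x − µ)/σ. The probability is then read from the standard normal distribution. Here is a minimal Python sketch with hypothetical values µ = 50, σ = 10 and x = 70:

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardise x: the number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

Z = NormalDist()                 # standard normal: mean 0, standard deviation 1
X = NormalDist(mu=50, sigma=10)  # hypothetical normal variable

z = z_score(70, mu=50, sigma=10)  # (70 - 50) / 10 = 2.0
print(z, round(Z.cdf(z), 4), round(X.cdf(70), 4))  # 2.0 0.9772 0.9772
```

Both calls give the same area, which is why a single standard normal (z) table is enough for any normal distribution.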

F. Study Unit 3 – Objective 3 (Chapter 6)

At the end of this chapter, you should understand the purpose of inferential statistics, the
difference between a sample and a population, the different types of sampling and reasons
for choosing a particular type in a marketing research study. You should also understand the
concept of a sampling distribution and explain its role in inferential statistics.

Inferential statistics – used to generalise sample findings to the whole population.
Sample – a subset that represents the population about which one is trying to draw conclusions.
Population – all observations of a random variable under study about which one is trying to draw conclusions.

1. Introduction

Inferential statistics allows the results obtained or measured from a sample (e.g. how much
50 South African grocery store shoppers spend) to be used to estimate the true parameter of
the population from which the sample was chosen (e.g. the mean (µ) spend of all South
African grocery shoppers). In this chapter you will learn about different sampling techniques
and sampling distributions.

2. Sampling

When the population whose parameter you want to measure (e.g. the mean (µ) spend of all South African grocery shoppers) is large, a sample, or subset of the particular population, is usually chosen from that population. Often it is too expensive or otherwise not feasible to measure the parameter in the entire population. It would cost thousands of Rands to ask all single mothers living in Gauteng, with one child who is younger than 10 years, what their household income is. It would also be impossible to identify the entire population, even if you did have the time and the money. In such a case, you would select a sample, measure the sample statistic (the mean (x̄) household income of the sample of single mothers), and then extrapolate, using inferential statistics, to estimate the population parameter (the mean (µ) household income of all single mothers with one child younger than 10 years and living in Gauteng). It is important to remember that a representative sample will have characteristics that closely reflect those of the population from which it is selected.

Remember that different symbols are used for:

- a sample statistic (e.g. x̄, written with a Roman letter)
- and its population parameter (e.g. µ, written with a Greek letter)

This is so that you know, immediately, whether the measure comes from the sample or the population.

3. Sampling methods

There are two different types of sampling:

- Non-probability sampling – observations / respondents are not selected randomly; not every observation / respondent has an equal chance of being selected.
- Probability sampling – observations / respondents are selected randomly; every observation / respondent has a known chance of being selected (an equal chance in simple random sampling).

Only a sample selected using probability sampling methods can be used for inferential statistics, as the sample needs to represent the population for the results of the study to be generalised to the population. Simple random sampling is the easiest and most commonly used method. Random numbers can be used to select a sample in this way; such a sample can be selected by drawing numbers from a hat. In practice, random number tables are available in most statistics books, and random numbers can be generated using statistical software. Selection of the Lotto numbers is a bit like choosing random numbers from a ‘hat’: each number from one to 49 has an equal chance (probability) of being chosen.
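Statistical software draws a simple random sample in much the same way as drawing numbers from a hat. This is a minimal Python sketch that assumes a hypothetical sampling frame of 500 numbered customers. `random.Random.sample` draws without replacement, and every customer has an equal chance of being selected:

```python
import random

# Hypothetical sampling frame: customers numbered 1 to 500
frame = list(range(1, 501))

rng = random.Random(2021)         # fixed seed so the same sample can be reproduced
sample = rng.sample(frame, k=50)  # 50 customers drawn without replacement

print(sorted(sample))
```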

4. The sampling distribution

If all possible samples of size n are chosen from a population and a statistic (e.g. the sample mean (x̄)) is calculated for each sample, all the values (e.g. the sample mean of each sample) have a certain distribution known as the sampling distribution of the statistic. In other words, the sampling distribution is the distribution of values of the statistic in a large number of samples from the same population. It describes how the sample statistic varies around the population parameter. From this, the level of confidence in estimating the population parameter from a single sample statistic can be established. The sampling distributions of several common statistics are approximately normal (remember the normal probability distribution from the previous section).

Again, remember that, although the formulas are complicated, it is important to understand the underlying concepts, but you do not need to memorise them.
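A sampling distribution can also be approximated by simulation. The sketch below makes the following assumptions:

- a hypothetical population of 10 000 grocery shoppers;
- spend that is normally distributed, with a mean of R1 500 and a standard deviation of R400.

It draws 2 000 samples of 50 shoppers each. It then compares the spread of the sample means with σ/√n, the standard error of the mean. That is a standard result, included here only for comparison:

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: monthly grocery spend (R) of 10 000 shoppers
population = [rng.gauss(1500, 400) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 50
# Draw 2 000 samples of size n and record the mean of each sample
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(2_000)]

print(f"population mean      {mu:8.2f}")
print(f"mean of sample means {statistics.fmean(sample_means):8.2f}")
print(f"sigma / sqrt(n)      {sigma / n ** 0.5:8.2f}")
print(f"sd of sample means   {statistics.stdev(sample_means):8.2f}")
```

The sample means cluster around the population mean, with a much smaller spread than the individual spends.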

5. Sampling distribution of the sample mean

A sampling distribution shows the relationship of the sample statistic (mean of the sample
(𝑥̅ )) and its population parameter (true mean of the population (µ)). From this, the level of
confidence in estimating the population parameter from a single sample statistic can be
established.

6. Sampling distribution of the sample proportion

The measure of central location for a continuous variable is the mean; the measure of central
location for a categorical variable is the proportion. The concept is very similar to that
described in the previous section. A sampling distribution shows the relationship of the
sample proportion (p) and its population parameter (π). From this, the level of confidence in
estimating the population parameter from a single sample proportion can be established.

7. Sampling distribution of the difference between two sample means

Often, the difference between two sample means (x̄1 – x̄2) is used to estimate the difference between two population means (µ1 – µ2). For example, the difference between the mean turnover of a sample of Gauteng shoe stores (x̄1) and a sample of Western Cape shoe stores (x̄2) would be used to estimate the difference between the mean turnover of all the shoe stores in Gauteng (µ1) and all the shoe stores in the Western Cape (µ2).

8. Sampling distribution of the difference between two sample proportions

Often, the difference between two sample proportions (p1 – p2) is used to estimate the difference between two population proportions (π1 – π2). For example, the difference between the proportion of male shoppers in a sample of Gauteng golf equipment stores and in a sample of Western Cape golf equipment stores would be used to estimate the difference between the proportion of male shoppers in all the golf equipment stores in Gauteng (π1) and in all the golf equipment stores in the Western Cape (π2).

G. Study Unit 3 – Revision Exercises

1. A survey was conducted by the canteen management of a local college to identify the bread preferences of their student customers. This information will be used to help redesign the menu to be more appealing to the students. The survey produced the following results (each count refers to students who eat only the bread types listed):
- 50 eat white untoasted bread only
- 60 eat brown untoasted bread only
- 72 eat brown toasted bread only
- 30 eat white untoasted and brown toasted bread only
- 24 eat brown untoasted and white untoasted bread only
- 18 eat brown toasted and brown untoasted bread only
- 12 eat white untoasted, brown untoasted and brown toasted bread

a) How many students were involved in the study (n)? (1)
b) What percentage (proportion) of students eat white untoasted bread? (2)
c) What percentage (proportion) of students eat brown bread? (2)
d) What percentage (proportion) of students eat untoasted bread? (2)
e) How should the menu be changed, based on this information? (1)

2. A clothing store distributes flyers on a certain road in the Eastern Cape. There are, on average, 4.5 sales per 10 flyers handed out.
a. What type of probability distribution is this? (1)
b. Determine the probability that there will be at least 2 sales for every 10 flyers handed out. (6)
c. Determine the probability that there will be more than 2 sales for every 10 flyers handed out. (8)
d. What does this information tell us about the effectiveness of this method of advertising? (1)

3. In the same clothing store there is a 70% chance that a customer will be under the age of
50.
a. What type of probability distribution is this? (1)

b. If 10 customers are randomly surveyed, what is the probability that less than 3
customers will be over 50 years old? (8)

c. If 15 customers are randomly surveyed, what is the probability that more than 2
customers will be under 50 years old? (8)

d. What does this information tell us about customers of this store? (1)

H. Study Unit 3 – Revision Exercises Solutions

1.
a. 50 + 60 + 72 + 30 + 24 + 18 + 12 = 266 students
b. White untoasted bread: $\frac{50+30+24+12}{266} = \frac{116}{266} = 43.61\%$
c. Brown bread: $\frac{60+72+30+24+18+12}{266} = \frac{216}{266} = 81.20\%$
d. Untoasted bread: $\frac{50+60+30+18+24+12}{266} = \frac{194}{266} = 72.93\%$
e. 81.20% of the respondents eat brown bread and 72.93% of the respondents eat untoasted bread; therefore the menu needs to be adjusted to include more brown untoasted options.
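You can check the Venn-diagram arithmetic in this solution in Python. This sketch treats each count in the question as one exclusive region of the diagram, which is the reading that gives a total of 266. The labels W, BU and BT are only shorthand:

```python
# Exclusive Venn-diagram regions: W = white untoasted,
# BU = brown untoasted, BT = brown toasted
regions = {
    frozenset({"W"}): 50,
    frozenset({"BU"}): 60,
    frozenset({"BT"}): 72,
    frozenset({"W", "BT"}): 30,
    frozenset({"W", "BU"}): 24,
    frozenset({"BT", "BU"}): 18,
    frozenset({"W", "BU", "BT"}): 12,
}

n = sum(regions.values())  # total number of students

def share(*groups: str) -> float:
    """Proportion of students in at least one of the given groups."""
    return sum(c for r, c in regions.items() if r & set(groups)) / n

print(n)                           # 266
print(f"{share('W'):.2%}")         # 43.61%  white untoasted
print(f"{share('BU', 'BT'):.2%}")  # 81.20%  any brown bread
print(f"{share('W', 'BU'):.2%}")   # 72.93%  any untoasted bread
```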
2.
a. Poisson probability distribution

b. Interval = 10 flyers
λ = 4.5
e = 2.71828
x = 2, 3, 4, … (there is no fixed upper limit)
$$P(X \ge 2) = 1 - P(X < 2) = 1 - \big(P(X=0) + P(X=1)\big)$$
$$= 1 - \left(\frac{e^{-4.5}\,4.5^{0}}{0!} + \frac{e^{-4.5}\,4.5^{1}}{1!}\right)$$
$$= 1 - (0.0111 + 0.04999) = 0.9389 \approx 93.89\%$$

c. Interval = 10 flyers
λ = 4.5
e = 2.71828
x = 3, 4, 5, … (there is no fixed upper limit)
$$P(X > 2) = 1 - P(X \le 2) = 1 - \big(P(X=0) + P(X=1) + P(X=2)\big)$$
$$= 1 - \left(\frac{e^{-4.5}\,4.5^{0}}{0!} + \frac{e^{-4.5}\,4.5^{1}}{1!} + \frac{e^{-4.5}\,4.5^{2}}{2!}\right)$$
$$= 1 - (0.0111 + 0.04999 + 0.1125) = 0.8264 \approx 82.64\%$$

d. This is an effective way of marketing, as there is a high probability (82.64%) that there will be more than 2 sales for every 10 flyers handed out.
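Here is a quick check of parts (b) and (c) in Python. It is a sketch, and `poisson_pmf` is a hypothetical helper that implements the Poisson formula:

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 4.5  # average number of sales per 10 flyers

at_least_2 = 1 - (poisson_pmf(0, lam) + poisson_pmf(1, lam))
more_than_2 = 1 - sum(poisson_pmf(x, lam) for x in range(3))  # x = 0, 1, 2

print(f"{at_least_2:.4f} {more_than_2:.4f}")  # 0.9389 0.8264
```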

3.
a. Binomial probability distribution

b. n = 10
x = 0, 1, 2 (number of customers over the age of 50)
p = 0.3 = probability of a customer being over the age of 50 (success)
1 – p = 1 – 0.3 = 0.7 = probability of a customer being under the age of 50

$$P(X < 3) = P(X=0) + P(X=1) + P(X=2)$$
$$= {}^{10}C_{0} \times 0.3^{0} \times 0.7^{10-0} + {}^{10}C_{1} \times 0.3^{1} \times 0.7^{10-1} + {}^{10}C_{2} \times 0.3^{2} \times 0.7^{10-2}$$
$$= 0.0282 + 0.1211 + 0.2335 = 0.3828 \approx 38.28\%$$

c. n = 15
x = 3, 4, 5, …, 15 (number of customers under the age of 50)
p = 0.7 = probability of a customer being under the age of 50 (success)
1 – p = 1 – 0.7 = 0.3 = probability of a customer being over the age of 50

$$P(X > 2) = 1 - \big[P(X=0) + P(X=1) + P(X=2)\big]$$
$$= 1 - \big[{}^{15}C_{0} \times 0.7^{0} \times 0.3^{15-0} + {}^{15}C_{1} \times 0.7^{1} \times 0.3^{15-1} + {}^{15}C_{2} \times 0.7^{2} \times 0.3^{15-2}\big]$$
$$= 1 - (0.000000 + 0.000001 + 0.000008) = 0.999991 \approx 100\%$$

d. There is a very high chance (almost 100%) that more than 2 out of 15 customers will be under the age of 50, which tells us that most of this store's customers are younger than 50.
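Parts (b) and (c) can be checked the same way with a hypothetical `binomial_pmf` helper. Each part defines "success" differently: over 50 in part (b) and under 50 in part (c). The value of p therefore changes between the two parts:

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# (b) X = number of customers over 50 among 10, p = 0.3; P(X < 3)
b = sum(binomial_pmf(x, 10, 0.3) for x in range(3))

# (c) X = number of customers under 50 among 15, p = 0.7; P(X > 2)
c = 1 - sum(binomial_pmf(x, 15, 0.7) for x in range(3))

print(f"{b:.4f} {c:.6f}")  # 0.3828 0.999991
```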

J. Study Unit 3 - Progress check

You have come to the end of Study Unit 3.


Time to do a progress check to determine whether you have gone through all the required content and completed all the exercises.

Your progress checklist (answer YES or NO):

- Did you read through each study unit outcome?
- Did you go through all the learning material?
- Did you complete all the relevant revision exercises and check your answers against the answers provided?

At this point, you should be able to: (list study unit outcomes again)

SECTION C

Study Unit 4 – Making statistical inferences

A. Study Unit 4 – Chapters 7 – 10:

“Statistical thinking will one day be as necessary a qualification for efficient citizenship as the ability to read and write.”
- H.G. Wells

B. Study Unit 4: Module Outcomes

Let’s recap what the relevant module learning outcome is for this study unit:

Use statistical inference to answer specific questions with a known degree of confidence.

C. Study Unit 4 – Specific Outcomes

Let’s recap the specific learning outcomes for this study unit:

In most cases data are gathered from a sample, rather than from a population. The student should understand the different measures available to choose an appropriate sample and to draw conclusions from it. These methods are discussed in this study unit:

- The sampling distribution describes the relationship between the sample statistic and the population parameter.
- The confidence interval gives a measure of the confidence with which we can say that the population parameter lies between two values. This refers to how confident we are that results can be generalised to the population. Confidence intervals and their calculations are also discussed in this study unit.
- Inferential statistics often requires the student to make decisions based on minimal information (from a sample). Hypothesis testing is the means by which decision-making can be done using scientific methods.

D. Study Unit 4 - Assessment Criteria

Study Unit Outcomes (The student should be able to…) | Assessment Criteria (How will you know if the student has achieved the learning outcome?) | Relevant Chapter in the Prescribed Textbook

1. Describe the purpose of inferential statistics. | Describe the concept of a sampling distribution. Explain the role of a sampling distribution in inferential statistics. | Chapter 6

2. Understand the concept of a sampling distribution and explain the role of a sampling distribution in inferential statistics. | Explain that a sampling distribution shows the relationship between the sample statistic (the mean of the sample (x̄)) and its population parameter (the true mean of the population (µ)); from this, the level of confidence in estimating the population parameter from a single sample statistic can be established. Explain that a sampling distribution shows the relationship between the sample proportion (p) and its population parameter (π). | Chapter 6

3. Understand the concept of point estimation and confidence interval estimation. | Explain that a point estimate extrapolates the sample statistic to the population directly, without any degree of confidence. | Chapter 7

4. Compute and interpret confidence intervals for various sample statistics. | Explain that when the standard deviation of the population is unknown, interval estimation of the population mean is based upon a probability distribution known as the t-distribution. Explain that when the standard deviation of the population is known, interval estimation of the population mean is based upon a probability distribution known as the z-distribution. Apply knowledge of the characteristics of the t- and z-distributions by identifying the distribution type given a specific dataset and/or scenario. Calculate a confidence interval using either the t- or z-table. | Chapter 7

5. Understand the concept of classical statistical hypothesis testing. | Describe the five steps to be followed in the process of hypothesis testing: (1) formulate the hypotheses – null and alternative; (2) determine the area of acceptance and rejection; (3) compute the sample statistic; (4) compare the sample statistic with the area of acceptance; (5) draw statistical and management conclusions. | Chapter 8

6. Perform the classical statistical hypothesis testing process. | Formulate appropriate null and alternative hypotheses. Perform the following hypothesis tests on marketing problems: tests concerning means and proportions; tests concerning differences between two means; tests concerning differences between two proportions; tests concerning small samples and unknown population standard deviations (Student t-distribution). | Chapter 8

7. Interpret the results of a hypothesis test. | Interpret the results and draw management and statistical conclusions. | Chapter 8

8. Understand the concept of the chi-square statistic. | Explain that in a chi-square goodness-of-fit test, the observed frequency distribution is compared to the expected frequency distribution. | Chapter 10

9. Perform independence of association hypothesis tests. | Use the chi-square statistic to determine if there is an association between two categorical variables. | Chapter 10

10. Perform goodness-of-fit hypothesis tests. | Use the five steps in hypothesis testing to perform a goodness-of-fit hypothesis test. | Chapter 10

E. Study Unit 4 – Objective 1 (Chapter 7)

At the end of this chapter, you should understand the concept of confidence intervals, as well
as be able to calculate and interpret them for use in marketing decision-making.

1. Introduction

Point estimates and confidence intervals both use a measured sample statistic to estimate the true population parameter; a confidence interval also tells us how ‘accurate’ that estimate is likely to be.

2. Point estimation

A point estimate extrapolates the sample statistic directly to the population, without stating any level of confidence. In other words, the mean or proportion of the sample surveyed is taken as the value for the whole population under study. Because a point estimate is highly unlikely to equal the true population parameter exactly, it is seldom used on its own.

3. Confidence interval estimation

Remember that you are using sample statistics in order to make inferences about the population. A confidence interval estimate is a range of values within which the population parameter is expected to lie. The key word is expected, and therefore you calculate the interval with a specific level of confidence. The confidence interval estimate is usually calculated using a confidence coefficient of 0.95. In other words, when we calculate a 95% confidence interval, we are 95% confident that the interval, or range of values, will include the population mean.

4. Confidence intervals and the corresponding z-limits

The width of the confidence interval depends on the level of confidence chosen, the sample size and the variability (standard deviation) of the data. The narrower the confidence interval, the more precise the interval estimate.

When the population standard deviation is known, the confidence interval is expressed using z-values from the standard normal probability distribution. The z-limits give the number of standard errors on either side of the sample mean (the point estimate) that are needed for the interval to have the stated probability of covering the true population mean. Greater confidence is associated with lower precision (a wider range within which an acceptable answer may fall), and vice versa.

90% confidence: z = ±1.645
95% confidence: z = ±1.96
99% confidence: z = ±2.58
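The z-limits above are the points of the standard normal distribution that leave the stated confidence in the middle and split the remainder equally between the two tails. As a minimal sketch (assuming Python 3.8 or later for the standard-library `statistics.NormalDist` class; the helper name `z_limit` is illustrative only), they can be reproduced rather than read off a table:

```python
from statistics import NormalDist

def z_limit(confidence: float) -> float:
    """Return the two-sided z-limit for a confidence level such as 0.95."""
    alpha = 1 - confidence                       # total area left in the two tails
    return NormalDist().inv_cdf(1 - alpha / 2)   # upper critical value of N(0, 1)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = ±{z_limit(level):.3f}")
```

The 99% value prints as 2.576, which the z-table rounds to 2.58.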

5. Confidence interval for a population mean when the population standard deviation is known

In order to calculate the confidence interval estimate for the population mean you require three statistical measures: the sample mean (x̄), the population standard deviation (σ) and the sample size (n), together with the z-value for the chosen level of confidence. You do not have to memorise the formula, but make sure you are able to use it to calculate the limits and that you are able to interpret the results.
$$\bar{x} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}}$$

The most important aspect of this section is the interpretation of the confidence interval. It
is important that you be able to interpret a confidence interval correctly and accurately.

Example from Wegner (2020): A survey of a random sample of 300 grocery shoppers in
Kimberley found that the mean value of their grocery purchases was R 78. Assume that the
population standard deviation of grocery purchases is R 21.

Find the 95% confidence limits for the average value of a grocery purchase by all grocery
shoppers in Kimberley.

Solution:

n (sample size) = 300


𝒙̅ (sample mean) = 78
σ (population standard deviation) = 21
z – value = +/- 1.96

Calculation: $$\bar{x} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}}$$
$$= 78 - 1.96\frac{21}{\sqrt{300}}\ ;\ 78 + 1.96\frac{21}{\sqrt{300}}$$
= (78 – 2.376); (78 + 2.376)
= 75.62; 80.38

Interpretation: The method used to construct this interval captures the population mean in 95% of samples, so I am 95% confident that the average amount spent by a shopper in Kimberley is between R 75.62 and R 80.38.
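To check the arithmetic in the Kimberley example, here is a small sketch of the known-σ formula. The function name `z_confidence_interval` is a hypothetical helper, and z = 1.96 is passed in as in the worked solution:

```python
import math

def z_confidence_interval(mean: float, sigma: float, n: int, z: float = 1.96):
    """Confidence limits for a population mean when sigma is known."""
    standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n)
    margin = z * standard_error             # z standard errors either side of x-bar
    return mean - margin, mean + margin

# Kimberley grocery example: n = 300, x-bar = R 78, sigma = R 21, 95% confidence
lower, upper = z_confidence_interval(mean=78, sigma=21, n=300)
print(f"R {lower:.2f} <= mu <= R {upper:.2f}")
```

The margin works out to about 2.376, matching the hand calculation.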

The Student t-distribution

When the standard deviation of the population is unknown, interval estimation of the population mean is based upon a probability distribution known as the t-distribution. The t-distribution is known as a robust distribution, as it provides satisfactory results for many possible population distributions. Because the population standard deviation is seldom known in practice, this is the most common distribution used in calculating confidence intervals for a mean.

6. Confidence interval estimation for the population mean when the population standard deviation is unknown

This section describes how to calculate a confidence interval estimate when the population standard deviation is unknown and the sample size is small. In this case the sample standard deviation (s) replaces σ, and you use the t-distribution to find the t-limits for the chosen level of confidence. When the sample size is greater than 40, you can use the z-limits as an approximation to the t-limits.
$$\bar{x} - t_{(n-1)}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{(n-1)}\frac{s}{\sqrt{n}}$$

As the population standard deviation is not known, the z-values can't be used. The t-values are used instead; they are read off the t-tables using the degrees of freedom, which are found by calculating (n – 1):

Sample size (n) | Degrees of freedom (n – 1) | t-limits for 95% confidence (read off t-tables)
6 | 5 | ±2.571
11 | 10 | ±2.228
25 | 24 | ±2.064
41 | 40 | ±2.021
61 | 60 | ±2.000
121 | 120 | ±1.980

Example: A clothing store analysed the value of a random sample of 25 credit card purchases.
The sample mean was found to be R 170 and the sample standard deviation was R 22.

Set the 95% confidence limits for the actual mean value of credit card purchases made at this
store.

Solution:

• n (sample size) = 25
• x̄ (sample mean) = 170
• s (sample standard deviation) = 22
• t-value: degrees of freedom = n – 1 = 25 – 1 = 24, so t = ±2.064 (Table 2, Appendix 1)

Calculation: $$\bar{x} - t_{(n-1)}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{(n-1)}\frac{s}{\sqrt{n}}$$
$$= 170 - 2.064\frac{22}{\sqrt{25}}\ ;\ 170 + 2.064\frac{22}{\sqrt{25}}$$
= (170 – 9.08); (170 + 9.08)
= 160.92; 179.08

Interpretation: The method used to construct this interval captures the population mean in 95% of samples, so I am 95% confident that the actual mean value of credit card purchases at the store lies between R 160.92 and R 179.08.
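The t-based interval has the same shape as the z-based one; only the multiplier changes. In the sketch below (the helper name `t_confidence_interval` is hypothetical), the t-limit is passed in as read off the t-table at n – 1 degrees of freedom. This keeps the code within the Python standard library. If SciPy happens to be installed, `scipy.stats.t.ppf(0.975, 24)` should return approximately the same 2.064.

```python
import math

def t_confidence_interval(mean: float, s: float, n: int, t_crit: float):
    """Confidence limits for a population mean when sigma is unknown.

    t_crit is read off the t-table at n - 1 degrees of freedom.
    """
    margin = t_crit * s / math.sqrt(n)   # t standard errors either side of x-bar
    return mean - margin, mean + margin

# Clothing store example: n = 25, x-bar = R 170, s = R 22, t = 2.064 (24 df, 95%)
lower, upper = t_confidence_interval(mean=170, s=22, n=25, t_crit=2.064)
print(f"R {lower:.2f} <= mu <= R {upper:.2f}")
```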

7. Confidence interval for the population proportion

The statistical analyses you are going to use depend on the type of data you have sampled. If
you have sampled categorical data, the appropriate measure of central location is the
proportion. Again it is important that you understand the concept of a confidence interval
and correctly interpret the calculated interval.

For a small sample (t-limits):
$$p - t\sqrt{\frac{pq}{n}} \le \pi \le p + t\sqrt{\frac{pq}{n}}$$

For a large sample (z-limits):
$$p - z\sqrt{\frac{pq}{n}} \le \pi \le p + z\sqrt{\frac{pq}{n}}$$

Example: A recent survey of 240 randomly selected street vendors in Johannesburg showed that 84 of them felt that local by-laws still hampered their trading (84 out of 240 gives the sample proportion).

Find the 90% confidence limits for the true proportion, π, of all Johannesburg street vendors who believe that local by-laws still hamper their trading.

Solution:

• n (sample size) = 240 (large sample, so use z-limits)
• p (sample proportion) = 84 / 240 = 0.35
• q = 1 − 0.35 = 0.65
• z-value = ±1.645

Calculation: $$p - z\sqrt{\frac{pq}{n}} \le \pi \le p + z\sqrt{\frac{pq}{n}}$$
$$= 0.35 - 1.645\sqrt{\frac{0.35(1 - 0.35)}{240}}\ ;\ 0.35 + 1.645\sqrt{\frac{0.35(1 - 0.35)}{240}}$$
= (0.35 – 0.0507); (0.35 + 0.0507)
= 0.299; 0.401

Interpretation: The method used to construct this interval captures the population proportion in 90% of samples, so I am 90% confident that the true proportion of Johannesburg street vendors who believe local by-laws hamper their trading is between 29.9% and 40.1%.
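The large-sample (z-limit) proportion interval can be sketched the same way. This is a minimal illustration, assuming Python 3.8 or later for `statistics.NormalDist`; the function name `proportion_confidence_interval` is hypothetical. The code computes z ≈ 1.6449 exactly instead of using the table value 1.645, which does not change the rounded limits.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes: int, n: int, confidence: float = 0.90):
    """Large-sample (z-limit) confidence limits for a population proportion."""
    p = successes / n                                    # sample proportion
    q = 1 - p
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.645 for 90%
    margin = z * math.sqrt(p * q / n)
    return p - margin, p + margin

# Street vendor example: 84 of 240 vendors, 90% confidence
lower, upper = proportion_confidence_interval(successes=84, n=240)
print(f"{lower:.1%} <= pi <= {upper:.1%}")
```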

F. Study Unit 4 – Objective 2 (Chapter 8)


At the end of this chapter, you should understand and be able to perform and interpret
hypothesis tests.

1. Introduction

Hypothesis tests are often used to test claims about products or about population parameters. This section describes the hypothesis tests that can be conducted on the four different population parameters discussed in the previous two sections.

2. The process of hypothesis testing

The five steps to be followed in the process of hypothesis testing:

1.) Formulate the hypotheses – null and alternative

2.) Determine the area of acceptance and rejection

3.) Compute the sample statistic

4.) Compare the sample statistic with the area of acceptance

5.) Draw statistical and management conclusions

Formulation of the null hypothesis (H0) (and, conversely, the alternative hypothesis) is the key element of this process. It is important that you fully understand, and are able to clearly formulate and articulate, null hypotheses. Basically, the null hypothesis states that the difference between the hypothesised population parameter and the true population parameter is zero (i.e. there is no difference). The alternative hypothesis (H1) states that there is a difference. However, it is important to establish whether this difference is a ‘greater than’ difference, a ‘smaller than’ difference, or merely a difference in either direction. This will determine which test you use (a one-sided or two-sided test).

Note that in hypothesis testing, it is always the null hypothesis that is tested, rather than the
alternative hypothesis. Once you have performed the calculation, you will either accept the
null hypothesis or reject it, based on where your sample statistic falls in relation to the area
of acceptance. It is incorrect to say that you accept the alternative hypothesis.

Statistical conclusions can be reached using two approaches: comparing the sample statistic with the critical value(s) that bound the area of acceptance (approach one), or using the p-value (approach two). Most statistical programmes will produce a p-value for any test of significance, so it is important that you understand how to use approach two. For example, if a hypothesis test produces a p-value of less than 0.05 (at a 5% level of significance, i.e. 95% confidence), then the difference is said to be statistically significant and the null hypothesis is rejected. A p-value of 0.05 or greater means that the difference is not statistically significant and the null hypothesis is accepted.

Significance level 10% (90% confidence): z = ±1.645 (two-tailed)
Significance level 5% (95% confidence): z = ±1.96 (two-tailed)
Significance level 1% (99% confidence): z = ±2.58 (two-tailed)
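The two decision approaches always agree for a two-tailed z-test: |z| exceeds the critical value exactly when the p-value falls below the significance level. The sketch below illustrates this, assuming Python 3.8 or later for `statistics.NormalDist`. The function name `two_tailed_decision` and the two z-values it is run on are hypothetical, chosen only to show one ‘accept’ and one ‘reject’ outcome.

```python
from statistics import NormalDist

def two_tailed_decision(z_calc: float, alpha: float = 0.05) -> dict:
    """Apply both decision approaches to a two-tailed z-test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)           # approach one: critical value
    p_value = 2 * (1 - std_normal.cdf(abs(z_calc)))      # approach two: p-value
    return {
        "z_crit": z_crit,
        "p_value": p_value,
        "reject_by_critical_value": abs(z_calc) > z_crit,
        "reject_by_p_value": p_value < alpha,
    }

# Two hypothetical test statistics, used only to illustrate the rule
for z in (1.50, 2.50):
    print(z, two_tailed_decision(z))
```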

3. Hypothesis test for a single population mean (σ is known)

Make sure you are able to apply the five steps of hypothesis testing to any example.
Understand the interpretation of the p-value.

$$z_{calc} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

Example: The Grocery Retailers Association of South Africa (GRASA) believes that the average
amount spent on groceries by Cape Town shoppers on each visit to the supermarket is R 175.
To test this belief, the association commissioned Market Research e-Africa to conduct a
survey among a random sample of 360 grocery shoppers at supermarkets in Cape Town.
Based on the survey, the average value of grocery purchases was R 182.40. Assume that the
population of grocery purchase values is normally distributed with a standard deviation of R
67.50. Can GRASA conclude that grocery shoppers spend R 175, on average, on each visit to
a supermarket? Conduct a test at the 5% level of significance.

Solution:

n (sample size) = 360
x̄ (sample mean) = 182.40
μ (population mean) = 175
σ (population standard deviation) = 67.5
z-value = ±1.96
Two-tailed test, because the claim being tested is that the average spend is exactly R 175.

1) Hypotheses:

H0: The population mean is equal to the claimed value; μ = R 175

H1: The population mean is not equal to the claimed value; μ ≠ R 175
2) Region of acceptance:

α = 5%: z = ±1.96

Accept if the value falls between -1.96 and 1.96 (two-tailed test); reject if it is greater than 1.96 or less than -1.96.
3) Calculation: $$z = \frac{182.4 - 175}{67.5/\sqrt{360}} = \frac{7.4}{3.558} = 2.08$$

4) Compare value with region of acceptance and conclusion:

2.08 is bigger than 1.96

Reject the Null hypothesis as 2.08 falls outside the area of acceptance (-1.96 to 1.96).

5) Conclusion:

At the 5% level of significance, the sample evidence indicates that the average amount spent by grocery shoppers in Cape Town is not R 175, so GRASA’s claim cannot be supported.
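The GRASA test can be checked with a few lines of code, and the same run also gives the p-value used in approach two. This is a sketch only, assuming Python 3.8 or later for `statistics.NormalDist`; the function name `z_test_mean` is hypothetical.

```python
import math
from statistics import NormalDist

def z_test_mean(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """z-statistic for a single mean when the population sigma is known."""
    standard_error = sigma / math.sqrt(n)
    return (x_bar - mu0) / standard_error

# GRASA example: n = 360, x-bar = R 182.40, mu = R 175, sigma = R 67.50
z_calc = z_test_mean(x_bar=182.40, mu0=175, sigma=67.50, n=360)
p_value = 2 * (1 - NormalDist().cdf(abs(z_calc)))   # two-tailed p-value
decision = "Reject H0" if abs(z_calc) > 1.96 else "Accept H0"
print(f"z = {z_calc:.2f}, p-value = {p_value:.4f}: {decision}")
```

The p-value comes out at about 0.0375. This is below 0.05, so both approaches lead to the same decision.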

4. Hypothesis test for a single population mean (σ is unknown)

Apply the five steps of hypothesis testing to examples if the population standard deviation is
unknown. When this is the case, you replace the population standard deviation with the
sample standard deviation. When the population standard deviation is unknown and the
sample size is less than or equal to 40 then you always make use of the t-statistic.

$$t_{calc} = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

Example adapted from Wegner (2020): SARS officials think that, on average, it takes 45 minutes or less for the typical South African to complete their tax return via e-filing. To test this, SARS conducted a study of 12 tax-paying South Africans to assess how long the e-filing system takes. They found that the average completion time was 41.5 minutes, with a sample standard deviation of 9.04 minutes. Can SARS conclude that taxpayers take, on average, 45 minutes or less to complete their tax returns via e-filing? Conduct a test at the 5% level of significance.

Solution:

n (sample size) = 12
x̄ (sample mean) = 41.5
μ (hypothesised population mean) = 45
s (sample standard deviation) = 9.04
t-value (df = n – 1 = 11, α = 5%, one-tailed) = 1.796
One-tailed (upper-tail) test, because the claim is that the average time taken is 45 minutes or less.

1) Hypotheses:

H0: SARS is correct; it takes 45 minutes or less to complete a tax return via e-filing; μ ≤ 45 minutes.

H1: SARS is not correct; it takes longer than 45 minutes to complete a tax return; μ > 45 minutes.

2) Region of acceptance:

α = 5%, df = 11: t = 1.796

Accept if the value falls at or below 1.796 (one-tailed test); reject if it is greater than 1.796.

3) Calculation: $$t = \frac{41.5 - 45}{9.04/\sqrt{12}} = \frac{-3.5}{2.6096} = -1.341$$

4) Compare value with region of acceptance and conclusion:

-1.341 is less than 1.796, so it falls inside the area of acceptance. Accept the null hypothesis.

5) Conclusion:

The sample gives no evidence that the average time taken to complete a tax return via e-filing exceeds 45 minutes. SARS’ claim that it takes 45 minutes or less, on average, can therefore be supported at the 5% level of significance.
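The e-filing test can be reproduced as shown below. The sketch treats the claim ‘45 minutes or less’ as the null hypothesis and uses the upper-tail t-value of 1.796 listed in the solution (df = 11, 5%, one-tailed). The Python standard library has no t-distribution, so the critical value is typed in from the t-table. The function name `t_test_mean` is hypothetical.

```python
import math

def t_test_mean(x_bar: float, mu0: float, s: float, n: int) -> float:
    """t-statistic for a single mean when sigma is unknown (s replaces sigma)."""
    return (x_bar - mu0) / (s / math.sqrt(n))

# SARS e-filing example: n = 12, x-bar = 41.5, s = 9.04, claimed mean 45
t_calc = t_test_mean(x_bar=41.5, mu0=45, s=9.04, n=12)
t_crit = 1.796   # upper-tail critical value for df = 11 at alpha = 0.05 (t-table)
decision = "Reject H0" if t_calc > t_crit else "Accept H0"
print(f"t = {t_calc:.3f}: {decision}")
```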

5. Hypothesis test for a single population proportion

Follow the example below for the case where a claim is made about the central value of a categorical variable. In this case we refer to this measurement as a sample proportion. You will notice that the same five steps are followed as in the case of hypothesis testing for the mean.

$$z_{calc} = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$$

Example from Wegner (2020): A mobile phone service provider, Cell D Mobile, claims that it
has 15% of the prepaid mobile phone market. A competitor, who commissioned a market
research company to conduct a survey amongst prepaid mobile phone users, challenged this
claim. The market research company randomly sampled 360 prepaid mobile users and found
that 42 users subscribe to Cell D Mobile.

Test, at a 1% level of significance, Cell D Mobile’s claim that they have 15% share of the
prepaid market.

Solution:

n (sample size) = 360


𝒑 (sample proportion) = 42 out of 360 = 0.11667
𝝅 (population proportion) = 0.15
z– value = +/- 2.58
Two-tailed test because the claim that the market share is exactly 15% is being tested.

1) Hypotheses:

H0: π = 0.15

H1: π ≠ 0.15

2) Region of acceptance:
α = 1%: z = ±2.58
Accept if the value falls between -2.58 and 2.58; reject if it is greater than 2.58 or less than -2.58.

3) Calculation: $$z = \frac{0.1167 - 0.15}{\sqrt{\frac{0.15(1 - 0.15)}{360}}} = \frac{-0.0333}{0.0188} = -1.771$$

4) Compare value with region of acceptance:

-1.771 is less than 2.58, but more than -2.58


Accept the Null hypothesis as -1.771 falls within the area of acceptance

5) Conclusion:

At the 1% level of significance, the sample evidence does not contradict Cell D Mobile’s claim, so the claim that it holds 15% of the prepaid market can be accepted.
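The sketch below checks the Cell D Mobile calculation. Note that the standard error uses the hypothesised proportion π = 0.15, not the sample proportion, which matches the formula above. It assumes Python 3.8 or later for `statistics.NormalDist`, and the function name `z_test_proportion` is hypothetical.

```python
import math
from statistics import NormalDist

def z_test_proportion(successes: int, n: int, pi0: float) -> float:
    """z-statistic for a single proportion; the standard error uses pi0, not p."""
    p = successes / n
    standard_error = math.sqrt(pi0 * (1 - pi0) / n)
    return (p - pi0) / standard_error

# Cell D Mobile example: 42 of 360 users, claimed share 15%, alpha = 1%
z_calc = z_test_proportion(successes=42, n=360, pi0=0.15)
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)   # about 2.576 (2.58 in the z-table)
decision = "Reject H0" if abs(z_calc) > z_crit else "Accept H0"
print(f"z = {z_calc:.3f}: {decision}")
```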

G. Study Unit 4 – Objective 3 (Chapter 9)

After studying this section you should be able to perform hypotheses tests for two population
problems when different kinds of data are surveyed.

1. Introduction

In this chapter you make use of exactly the same principles and procedures you used in the previous section, except that you now apply them to comparing the parameters of two populations. You have to know the five steps of hypothesis testing; the main difference in this section is the test statistics that are calculated in order to test the null hypothesis.

These kinds of hypothesis tests are used to make decisions about real marketing problems, such as:

• The level of brand awareness in two different market segments.
• The average time customers spend queuing in two different stores in a chain.
• The sales of a particular brand at two different stores.
• The difference in purchasing habits between male and female customers.

Try to add real-life problems from your own experience in your working environment.

There are different assumptions that you need to be aware of when performing these kinds of hypothesis tests. Work through the given examples and make sure you are able to perform the steps. If you look at all the formulas and calculations, you may get discouraged, but remember that all of the tests follow the same five steps. The greatest challenge for you is to know which assumptions apply to a specific problem and which test statistic to choose in order to perform the hypothesis test. It may help to draw a flow diagram that maps each combination of parameter and assumptions to the relevant test statistic.

2. Hypothesis test for the difference between two population means

$$z_{calc} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

Example Wegner (2020): The marketing manager of PQ Printers wants to be sure that they are offering their customers the best possible service. Since part of their marketing offering is a ‘speedy same-day delivery’, the marketing manager wants to be sure they are using the quickest couriers to deliver to their customers. PQ Printers currently uses Courier A, but wants to compare its delivery times with those of Courier B. PQ Printers assessed the last 60 deliveries made by Courier A and found that the sample mean delivery time was 42 minutes, with a population standard deviation of 14 minutes. Courier B was tested 48 times, and the sample mean delivery time was 38 minutes, with a population standard deviation of 10 minutes. Test, at a 5% level of significance, the claim that there is no difference between the two delivery companies.

Solution:

n1 (sample size A) = 60; n2 (sample size B) = 48
x̄1 (sample mean A) = 42; x̄2 (sample mean B) = 38
σ1 (population standard deviation A) = 14
σ2 (population standard deviation B) = 10
z-value = ±1.96
Two-tailed test because the claim is simply that there is no difference.

1) Hypotheses:
H0: μ1 − μ2 = 0; the mean delivery times of the two couriers are the same

H1: μ1 − μ2 ≠ 0; the mean delivery times of the two couriers are not the same

2) Region of acceptance:

α = 5%: z = ±1.96 (two-sided test)

Accept if the value falls between - 1.96 and + 1.96; reject if greater than 1.96, or less than
-1.96.

3) Calculation: $$z = \frac{(42 - 38) - (0)}{\sqrt{\frac{14 \times 14}{60} + \frac{10 \times 10}{48}}} = \frac{4}{2.313} = 1.729$$

4) Compare value with region of acceptance:

1.73 is less than 1.96, but more than -1.96


Accept the Null hypothesis as 1.73 falls within the area of acceptance (+/- 1.96).

5) Conclusion:

This means that the marketing manager of PQ Printers can safely continue to use Courier A, as there is not enough evidence, at a 5% level of significance, of a difference in mean delivery times between the two couriers.
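For the courier comparison, a short sketch of the two-sample z-statistic is given below. It assumes both population standard deviations are known, as stated in the example. The function name `z_test_two_means` is hypothetical.

```python
import math

def z_test_two_means(x1: float, x2: float, sigma1: float, sigma2: float,
                     n1: int, n2: int, hypothesised_diff: float = 0.0) -> float:
    """z-statistic for the difference between two means with known sigmas."""
    standard_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return ((x1 - x2) - hypothesised_diff) / standard_error

# PQ Printers example: Courier A (n = 60, 42 min, sigma 14), Courier B (n = 48, 38 min, sigma 10)
z_calc = z_test_two_means(x1=42, x2=38, sigma1=14, sigma2=10, n1=60, n2=48)
decision = "Reject H0" if abs(z_calc) > 1.96 else "Accept H0"
print(f"z = {z_calc:.3f}: {decision}")
```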

3. Hypothesis test for the difference between two population proportions

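$$z_{calc} = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\hat{\pi}(1 - \hat{\pi})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where π̂ is the pooled sample proportion: the total number of successes in both samples divided by n1 + n2.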

Example Wegner (2020): After a recent AIDS awareness campaign, the Department of
National Health commissioned a market research company to conduct a survey on its
effectiveness. Their brief was to establish whether the recall rate of teenagers differed from
that of young adults. The market research company interviewed a random sample of 640
teenagers and 420 young adults. It was found that 362 teenagers and 260 young adults were
able to recall the AIDS awareness slogan. Test, at the 5% level of significance, the hypothesis
that there is an equal recall rate between teenagers and young adults.

Solution:

n1 (sample size teenagers) = 640 ; n2 (sample size young adults) = 420


𝒑1 (sample proportion teenagers) = 362 out of 640 = 0.57
𝒑2 (sample proportion young adults) = 260 out of 420 = 0.62
z– value = +/- 1.96
Two-tailed test because the claim is that the recall rates are equal.

1) Hypotheses:
H0: 𝜋1 − 𝜋2 = 0; the recall rates are equal

H1: 𝜋1 − 𝜋2 ≠ 0; the recall rates are not equal

2) Region of acceptance:

α = 5%: z = ±1.96 (two-sided test)

Accept if the value falls within -1.96 and +1.96; reject if it is less than -1.96 or more than 1.96.
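3) Calculation: $$z = \frac{(0.57 - 0.62) - 0}{\sqrt{\left(\frac{362 + 260}{640 + 420}\right)(1 - 0.59)\left(\frac{1}{640} + \frac{1}{420}\right)}} = \frac{-0.05}{\sqrt{0.59 \times 0.41 \times 0.00394}} = \frac{-0.05}{0.0309} = -1.62$$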

4) Compare value with region of acceptance:

-1.62 is greater than - 1.96 and less than 1.96.

Accept the Null hypothesis as -1.62 falls within the area of acceptance.
5) Conclusion:

This means that, at a 5% level of significance, there is no evidence of a difference between the recall rates of teenagers and young adults for the AIDS awareness slogan.
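Here is a sketch of the two-proportion test with a pooled proportion. The function name `z_test_two_proportions` is hypothetical. The sketch keeps the unrounded sample proportions (0.5656 and 0.6190) instead of 0.57 and 0.62, so it gives z ≈ −1.73 rather than the −1.62 obtained by hand. This value still lies within ±1.96, so the conclusion is unchanged.

```python
import math

def z_test_two_proportions(x1: int, n1: int, x2: int, n2: int) -> float:
    """z-statistic for H0: pi1 - pi2 = 0, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # pooled estimate of the common proportion
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

# AIDS awareness example: 362 of 640 teenagers, 260 of 420 young adults, alpha = 5%
z_calc = z_test_two_proportions(x1=362, n1=640, x2=260, n2=420)
decision = "Reject H0" if abs(z_calc) > 1.96 else "Accept H0"
print(f"z = {z_calc:.2f}: {decision}")
```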

H. Study Unit 4 – Objective 4 (Chapter 10)


At the end of this section, you should understand the concept of the chi-square statistic, be
able to perform the relevant tests using the test statistic, and interpret the results.

1. Introduction

One often ends up with a table of ‘counts’, such as the number of males and females who prefer Willards or Simba potato chips. This information would be displayed in a cross-tabulation, in this case a table with two rows and two columns of numbers. The chi-square statistic is used to test the null hypothesis that there is no relationship between the row and column classifications. It does so by comparing the observed (counted) frequencies with the expected frequencies.

2. Test of independence of association

If two variables are independent, then there is no association between them. If they are not
independent and there is some relationship between them, then the next step in the analysis
is to study the nature of the relationship. The chi-square statistic can be used to determine if
there is an association. This section describes the test and the five steps to be followed. Two
assumptions are made about the data, namely, a simple random sample of size n has been
selected from a large population, and the sample size is reasonably large.

3. The chi-squared goodness-of-fit test

This test is used mainly to check whether a data set is normally distributed, but it can be used to test whether sample data fit any hypothesised probability model. The observed frequency distribution is compared to the expected frequency distribution. Again, the steps in the procedure are described with the aid of examples.

Note: The chi-square approximation of the test statistic for the goodness-of-fit test is, strictly
speaking, only applicable if the number of sample observations is large and all the expected
frequencies are at least five.

$$\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$$
where f_o is the observed frequency and f_e is the expected frequency.

Example:

Metrorail Commuter Service is studying the daily commuting patterns of workers into
the central business district of Cape Town. A study conducted seven years ago found
that 40% of commuters used trains, 25% used cars, 20% used taxis and 15% used buses.

As a marketing intern at Metrorail, you conducted a recent survey of 400 randomly selected commuters, which found that 135 commuters used trains, 115 used cars, 96 used taxis and the rest travelled by bus.
Metrorail would like to test, at a 5% level of significance, whether commuting patterns have changed. Use the critical value of 7.815.
Solution:

Mode of transport | fo | fe | (fo – fe)² | (fo – fe)²/fe
Train | 135 | 400 × 40% = 160 | 625 | 3.906
Car | 115 | 400 × 25% = 100 | 225 | 2.25
Taxi | 96 | 400 × 20% = 80 | 256 | 3.20
Bus | 54 | 400 × 15% = 60 | 36 | 0.60
Total | 400 | 400 | | 9.956

1) Hypotheses:
H0: 𝒇𝒐 = 𝒇𝒆; commuting patterns today are the same as they were seven years ago.

H1: 𝒇𝒐 ≠ 𝒇𝒆; commuting patterns are not the same

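2) Region of acceptance:

α = 5%: critical value = 7.815

Accept if the value falls at or below 7.815; reject if it is higher than 7.815.
If the critical value is not given, read it from the chi-square table using the degrees of freedom, where k is the number of categories and m is the number of population parameters estimated from the sample (none here):
df = k – m – 1 = 4 (train, car, taxi, bus) – 0 – 1 = 3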

3) Calculation: χ²calc = Σ(fo – fe)²/fe = 9.956 (the total of the last column of the table)
4) Compare value with region of acceptance:

9.956 is higher than 7.815

Reject the Null hypothesis as 9.956 falls outside the area of acceptance (7.815)

5) Conclusion:

The results of the research show that there has been a change in commuting patterns and
Metrorail should take the new trends into consideration.
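The goodness-of-fit statistic is a simple sum over categories, so the Metrorail table can be reproduced in a few lines. In this minimal sketch the function name `chi_square_gof` is hypothetical, and the critical value of 7.815 is taken as given in the example. If SciPy is available, `scipy.stats.chisquare` computes the same statistic from observed and expected frequencies.

```python
def chi_square_gof(observed, expected_proportions):
    """Chi-square goodness-of-fit statistic: the sum of (fo - fe)^2 / fe."""
    n = sum(observed)
    expected = [n * share for share in expected_proportions]   # fe for each category
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Metrorail example, in the order train, car, taxi, bus
observed = [135, 115, 96, 54]
historical_shares = [0.40, 0.25, 0.20, 0.15]
chi2_calc = chi_square_gof(observed, historical_shares)
critical_value = 7.815   # given: df = 4 - 0 - 1 = 3, alpha = 0.05
decision = "Reject H0" if chi2_calc > critical_value else "Accept H0"
print(f"chi-square = {chi2_calc:.5f}: {decision}")
```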

I. Study Unit 4 – Revision Exercises

1. A large South African general dealer hired a marketing research company to research the effectiveness of introducing a new drill at a discounted price. A representative sample of 120 stores in the chain was chosen, and the stores were randomly split into two equal groups of 60 stores. These stores did not advertise, and displayed their merchandise in similar ways. A new kind of drill was introduced in all 120 stores. Group A introduced the drill at the special low price of R 599, with the price increasing to R 649 after two weeks. Group B introduced the drill at the regular price of R 649. Total sales of the drills were computed for each store for the first two weeks; the results are given below.

This research was conducted at a 95% confidence level.

Group | Average number of discounted sales per week | Average number of standard sales per week | Coefficient of variation
Group A (R 599) | 50 | – | 5.8%
Group B (R 649) | – | 66 | 15.5%
Difference between discounted sales and standard sales: –16

a. Define “population” as it relates to marketing research, and state what the population
of this study would be. (2)

b. Define “sample” as it relates to marketing research, and state what the sample of this
study consists of. (2)

c. What is the variable under study in this example? (1)

d. Calculate the standard deviation for both groups. (4)


e. How confident are the marketing researchers in the accuracy of this research? (1)

f. Based on the level of confidence, what is the critical value that would be used to
calculate the confidence limits? (1)

g. Calculate the confidence limits for the actual mean sales of the new drills under group
A’s discounted price of R 599 (4)

h. Interpret the above answer in terms of the marketing research conducted (2)

i. If you were a store manager for one of the group A stores, what inventory level of the
new drills would you stock? Explain your answer (2)

J. Study Unit 4 – Revision Exercises Solutions

a. Population – the collection of all the observations of a random variable under study. In this study the population would be all stores in the chain selling the new drills. ✓
b. Sample – a representative subset of the population on which observations are made. In this study the sample would be the 120 stores chosen to be in the study. ✓
c. Variable under study – the weekly sales (number of units sold) of the new drills per store, which are compared across the two selling prices
d. Standard deviation (CV × mean):
Group A: 5.8% × 50 = 2.9 ✓
Group B: 15.5% × 66 = 10.23
e. 95% confident
f. Critical value for 95% = 1.96
g. $$\bar{x} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}}$$
$$50 - 1.96\frac{2.9}{\sqrt{60}} \le \mu \le 50 + 1.96\frac{2.9}{\sqrt{60}}$$ ✓
$$49.27 \le \mu \le 50.73$$
Lower limit = 49.27
Upper limit = 50.73
h. The researchers are 95% confident that the mean weekly sales of the new drills per store at the discounted price will fall between 49.27 and 50.73. ✓
i. Between 49 and 51 – this keeps stock safely within the lower and upper limits, so the store is unlikely to run out of stock or to carry too much stock. ✓

Or: above 51 – drills are not perishable items, so it would do no harm to keep more than the expected upper limit. ✓
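Answers d and g can be checked with the sketch below. As in the memo, it recovers each standard deviation from the coefficient of variation (CV = s / mean). It then uses z = 1.96 for Group A's 60 stores, which is reasonable because n = 60 is a large sample. The helper name `sd_from_cv` is hypothetical.

```python
import math

def sd_from_cv(cv: float, mean: float) -> float:
    """Standard deviation recovered from a coefficient of variation (CV = s / mean)."""
    return cv * mean

sd_a = sd_from_cv(0.058, 50)   # Group A (R 599)
sd_b = sd_from_cv(0.155, 66)   # Group B (R 649)

# 95% confidence limits for Group A's mean weekly sales (n = 60 stores, z = 1.96)
margin = 1.96 * sd_a / math.sqrt(60)
lower, upper = 50 - margin, 50 + margin
print(f"s_A = {sd_a:.2f}, s_B = {sd_b:.2f}, {lower:.2f} <= mu <= {upper:.2f}")
```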

K. Study Unit 4 - Progress check

You have come to the end of Study Unit 4.

Time to do a progress check to determine whether you have gone through all the required content and completed all the exercises.

Your progress checklist:

Progress checklist YES / NO?

Did you read through each study unit outcome?

Did you go through all the learning material?

Did you complete all the relevant revision exercises and check your answers against the answers provided?

At this point, you should be able to: (list study unit outcomes again)

Study Unit 5 – Statistical models for forecasting

A. Study Unit 5 – Chapters 12 – 15:

A critical aspect of managing any organisation is planning for the future, and developing
appropriate strategies. Although good judgement and intuition are invaluable when making
decisions, there are several statistical methods that can help predict many future aspects of
a business operation.

Shipham (2012) comments that modern tools such as “key performance indicators” can be used to support the identification of problems and opportunities in the marketing environment. Figures such as product sales, by unit and value, and customer databases with information such as who customers are and what types of products they are interested in, allow an assessment of trends. These trends can pinpoint a need for research and management intervention.

This study unit explains some of the methods. Essentially, these are the issues that you will
be facing in the ‘real world’. Everything that you have learnt so far is applied to the concepts
in these three last chapters.

“Conducting data analysis is like drinking a fine wine. It is important to swirl and sniff the wine, to unpack the complex bouquet and to appreciate the experience. Gulping the wine doesn’t work.”
- Daniel B. Wright

B. Study Unit 5: Module Outcomes

Let’s recap what the relevant module learning outcome is for this study unit:
Use statistical methods to predict future aspects of a business operation.

C. Study Unit 5 – Specific Outcomes

Let’s recap the specific outcomes for this study unit.

At the end of the first section of this study unit, the student should:

Understand the concept of linear regression


Be able to calculate a simple linear regression line, and use the line for prediction
purposes.
Understand the concept of correlation, be able to calculate a correlation coefficient and
explain the link between correlation analysis and regression.

At the end of the second section of the study unit, the student should:

Understand the purpose of index numbers and be able to calculate different indices.
Be able to distinguish between different weighting methods, discuss pitfalls of index
number construction, and revise the base period of a series of index numbers.
Be able to interpret the results and make marketing decisions based on them.

At the end of the final section of this study unit, the student should:

Be able to identify the components of a time series, compute the trend and seasonal
influence in a time series, de-seasonalise a time series, and forecast future values.
Be able to analyse this information to make good marketing decisions.

D. Study Unit 5 – Assessment Criteria

Study Unit Outcomes (The student should be able to…) | Assessment Criteria (How will you know if the student has achieved the learning outcome?) | Relevant Chapter in the Prescribed Text Book
1. Explain the purpose of index numbers in marketing decision-making. | • Explain the purpose of index numbers and be able to calculate different indices. | Chapter 14
2. Develop indices to measure price and quantity changes over time. | • Develop price and quantity indices, which are summary measures of relative price and quantity changes over time in a set of items. | Chapter 14
3. Distinguish between the Laspeyres and Paasche weighting methods. | • Calculate both the Paasche and Laspeyres indices. • Interpret Paasche and Laspeyres indices. | Chapter 14
4. Understand the concept of linear regression, calculate a simple linear regression line, and use the line for prediction purposes. | • State the goal and objectives of a simple regression analysis in marketing decision-making. • Specify the simple regression model. • Explain the least squares criterion in relation to a marketing problem. • Find the equation of the least-squares line. | Chapter 12
5. Understand the concept of correlation, calculate a correlation coefficient and explain the link between correlation analysis and regression. | • Explain the concept of a correlation in marketing decision-making. • Compute the correlation between two variables in a marketing research study. • Explain the link between correlation analysis and regression. | Chapter 12
6. Identify the components of a time series of marketing data. | • Compute the trend and seasonal influence in a time series. • De-seasonalise a time series, and forecast future values. • Analyse this information to make good marketing decisions. | Chapter 15

D. Study Unit 5 - Objective 1 (Chapter 12)



At the end of this section, you should understand the concept of linear regression, be able to
calculate a simple linear regression line, and use the line for prediction purposes. You should
also understand the concept of correlation, be able to calculate a correlation coefficient
(measuring the strength of the linear relationship between two variables), and explain the
link between correlation analysis and regression.

1. Introduction

Regression analysis is a way of looking at the relationship between variables, and is used when two or more variables are involved. For example, income levels often determine the brands a market segment purchases, and age usually does too, but it is difficult to picture what their independent contributions are. This is where linear regression becomes a useful tool. If one can quantify the contribution of income and age (the independent variables) to the brand purchased (the dependent variable) by constructing a linear model, then one can predict the brand purchases of someone whose income level and age are known. However, this section addresses regression analysis using only one independent variable as a predictor of a dependent variable. The strength of the linear association between these two variables is measured by correlation.

Regression analysis is far more complex than presented in this section; if you wish to learn more about it, most statistics textbooks cover it comprehensively.

2. Simple linear regression analysis

As regression analysis aims to produce a linear model, the regression line is the straight line that best fits the data. The sample variance around the regression line is a measure of the spread of the observed y values around the line (refer to Study Unit 2). The mathematical method used to determine the regression line is known as the least squares method: it chooses the line that minimises the sum of the squared vertical distances between the observed y values and the line. Again, we are fortunate to live in a period of technological advancement, and we can rely on statistical software to perform the calculations for us. However, you will have to be able to calculate the coefficients of the regression line using the least squares method in the exam, so it is advisable to have a calculator with a ‘stat’ function for the exam. Make sure you are able to use this function on the calculator. Once the coefficients of the regression line have been calculated, you can use the regression equation to estimate values of the dependent variable from known values of the independent variable.

Be certain that you understand the dangers of extrapolation:

Extrapolation is the process of estimating values of y using values of x that lie outside the range of x values used to construct the estimated regression line. Extrapolation can lead to unreliable or meaningless results.
If the x values used to construct the line lie between 8 and 15, then 8 ≤ x ≤ 15. This means that you may only use values that fall within that range for prediction. If you were to use values outside that range, the results would not be reliable.
Example: estimate the amount of electricity generated based on the tonnes of coal used, where the regression line was constructed from data in the range 8 ≤ x ≤ 15.

y = 900 + 150(x), so if x = 10, then:

y = 900 + 150(10) = 2 400 kilowatt-hours of electricity are produced from 10 tonnes of coal.

But if a value outside the range, say zero, is used, then:

y = 900 + 150(0) = 900 kilowatt-hours of electricity are produced from no coal, which is impossible.
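As an illustration only, the extrapolation rule can be expressed as a small Python guard that refuses to predict outside the range of x values used to build the line. This is a minimal sketch assuming the electricity example’s line y = 900 + 150x and the range 8 ≤ x ≤ 15; the function name `predict_within_range` is hypothetical and is not part of any statistics package.

```python
def predict_within_range(a, b, x, x_min, x_max):
    """Return a + b*x, but only for x inside the range used to fit the line.

    Raises ValueError for x outside [x_min, x_max], because such a
    prediction would be an extrapolation.
    """
    if not x_min <= x <= x_max:
        raise ValueError(f"x = {x} lies outside {x_min} <= x <= {x_max}: extrapolation")
    return a + b * x


# Electricity example: y = 900 + 150x, fitted on 8 <= x <= 15 tonnes of coal
print(predict_within_range(900, 150, 10, 8, 15))  # 2400

try:
    predict_within_range(900, 150, 0, 8, 15)  # no coal: outside the fitted range
except ValueError as err:
    print(err)
```

Raising an error is a deliberately strict design choice for teaching purposes; statistical software will usually compute the extrapolated value without complaint, which is exactly why you need to check the range yourself.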

As described in previous chapters, 95% confidence intervals can be calculated around predictions, but this is not covered in this section.

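$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

$$a = \frac{\sum y - b\sum x}{n}$$

$$y = a + bx$$

where b (sometimes written b1) is the slope of the line and a (sometimes written b0) is the y-intercept.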
Example Wegner (2020): Music Technologies, an electronics retail company in Durban, has recorded the number of flat-screen TVs sold each week and the number of advertisements placed weekly for a period of 12 weeks.

Ads (x)     4   4   3   2   5   2   4   3   5   5   3   4
Sales (y)   26  28  24  18  35  24  36  25  31  37  30  32

For a class project, you are required to calculate the linear regression equation:

Solution:

x    y    xy    x²
4    26   104   16
4    28   112   16
3    24   72    9
2    18   36    4
5    35   175   25
2    24   48    4
4    36   144   16
3    25   75    9
5    31   155   25
5    37   185   25
3    30   90    9
4    32   128   16
∑x = 44   ∑y = 346   ∑xy = 1 324   ∑x² = 174

$$b = \frac{(12)(1\,324) - (44)(346)}{(12)(174) - (44)^2} = \frac{664}{152} = 4.368 \approx 4.37$$

$$a = \frac{346 - 4.368(44)}{12} = 12.817 \approx 12.82$$

$$y = 12.82 + 4.37x$$

Using the results from the project, predict how many flat-screen TVs would be sold if 10 ads were placed:

$x = 10$
$y = 12.82 + 4.37(10)$
$y = 56.52 \approx 57$ TVs would be sold

Note that 10 ads lies outside the range of the observed data (2 ≤ x ≤ 5), so this prediction is an extrapolation and should be treated with caution.

Now the store is able to assess whether the number of ads placed is profitable in terms of the resulting sales.
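The summation formulas for b and a translate directly into code. The following minimal Python sketch (the helper name `least_squares` is our own, not a library function) recomputes the Music Technologies line from the raw data. It keeps the coefficients unrounded, so its prediction for 10 ads is 56.5 rather than the 56.52 obtained from the rounded coefficients.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x,
    using the summation formulas for b and a given in this section."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


ads = [4, 4, 3, 2, 5, 2, 4, 3, 5, 5, 3, 4]
sales = [26, 28, 24, 18, 35, 24, 36, 25, 31, 37, 30, 32]

a, b = least_squares(ads, sales)
print(f"y = {a:.2f} + {b:.2f}x")  # y = 12.82 + 4.37x
print(round(a + b * 10, 2))       # 56.5 (an extrapolation: x = 10 is outside 2..5)
```

In the exam you would use the calculator’s ‘stat’ function instead; the sketch only shows which sums the formulas need.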

3. Correlation analysis

The correlation coefficient measures the strength of the linear association between two quantitative variables (but does not indicate that one causes the other). It is denoted by r, which can take values from −1 to +1. A value of −1 or +1 denotes perfect (100%) correlation (negative and positive, respectively); a value of 0 denotes no linear correlation.

[Figure: Ice cream sales in relation to temperature (x-axis: daily temperature; y-axis: ice cream sales)]
Direct, positive linear relationship. As the independent variable (x), daily temperature, rises, the dependent variable (y), ice cream sales, also rises. r will be close to +1.

[Figure: Hot chocolate sales in relation to temperature (x-axis: daily temperature; y-axis: hot chocolate sales)]
Inverse, negative linear relationship. As the independent variable (x), daily temperature, rises, the dependent variable (y), hot chocolate sales, decreases. r will be close to −1.

[Figure: Calculator sales in relation to temperature (x-axis: daily temperature; y-axis: calculator sales)]
No linear relationship. There is no relationship between the independent variable (x), daily temperature, and the dependent variable (y), calculator sales. r will be close to 0.

From these results we can see that:
• As it gets warmer, people buy more ice cream, so when it is hot, the store should stock more ice cream.
• When it is colder, people buy more hot chocolate, so in winter, ice cream can be discounted and hot chocolate can be sold at a premium.
• Calculator sales are not affected by the temperature. Maybe another variable should be used, like the time of year.

4. Pearson’s correlation coefficient

Pearson’s correlation coefficient measures both the direction (positive or negative) and the strength of the linear relationship between two variables.

Description of the relationship, from r = −1 to r = +1:
• Strong negative (r close to −1)
• Moderate negative
• Weak negative
• No relationship (r = 0)
• Weak positive
• Moderate positive
• Strong positive (r close to +1)

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right] \times \left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

Example: Find the correlation between daily temperature and ice cream sales.

(x) Temp 20 25 24 27 22 28 28 25 19
(y) Sales 50 60 55 69 52 82 78 57 45

Solution:

x     y     x²     xy      y²
20    50    400    1000    2500
25    60    625    1500    3600
24    55    576    1320    3025
27    69    729    1863    4761
22    52    484    1144    2704
28    82    784    2296    6724
28    78    784    2184    6084
25    57    625    1425    3249
19    45    361    855     2025
∑x = 218   ∑y = 548   ∑x² = 5 368   ∑xy = 13 587   ∑y² = 34 672

$$r = \frac{9(13\,587) - (218)(548)}{\sqrt{\left[9(5\,368) - (218)^2\right] \times \left[9(34\,672) - (548)^2\right]}} = \frac{2\,819}{3\,042.0835} = 0.92667 \approx 0.93$$

Interpretation: There is a strong positive linear relationship between temperature and ice cream sales (0.92667 is almost +1). This means that warmer days are associated with more sales. Store owners can use weather forecasts to predict sales and order stock accordingly. This also suggests that ice cream prices could be increased in warmer weather.
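The same kind of sums gives Pearson’s r. A minimal Python sketch (the function name `pearson_r` is hypothetical), applied to the temperature and ice cream data above:

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient, computed with the summation formula above."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


temperature = [20, 25, 24, 27, 22, 28, 28, 25, 19]
ice_cream_sales = [50, 60, 55, 69, 52, 82, 78, 57, 45]

r = pearson_r(temperature, ice_cream_sales)
print(round(r, 5))  # 0.92667
```

Note that the numerator is the same as in the formula for the slope b, which is why r and b always have the same sign.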



SECTION D

E. Study Unit 5 - Objective 2 (Chapter 14)


At the end of this section, you should understand the purpose of index numbers and be able
to calculate different indices. You should be able to distinguish between different weighting
methods, discuss pitfalls of index number construction, and revise the base period of a series
of index numbers. You should be able to interpret the results and make marketing decisions
based on them.

1. Introduction

Price changes have a direct effect on the daily living expenses of every person. Everyone has to make provision for price increases due to inflation. The rising cost of living leads to a demand for higher salaries, which increases production costs, which in turn increases prices (a vicious circle, isn’t it?). Various indices related to price changes can be calculated. Likewise, indices can be calculated for changes in quantities, such as the availability of electricity (which we see through load shedding) or the number of residents in a town; these are known as quantity indices. The aim of calculating the various types of indices is to monitor price and quantity changes over time.

An index number is a summary measure (usually expressed as a percentage) of the change in activity from one time period (usually a base period) to another. An index is a ratio that measures a relative change. Price and quantity indices are summary measures of relative price and quantity changes over time in a set of items; they are also used in statistical analyses. Regression models (previous section), for instance, frequently contain price or quantity indices as independent variables. Price and quantity indices, in these roles, are discussed below.

Note: Both indices and indexes are accepted plurals of index; this guide uses indices, while Wegner (2020) uses indexes.

2. Price indices



A simple price index represents the price change of a single commodity; a composite price index represents the price changes of more than one commodity. Composite price indices can be calculated using weighted aggregates methods or the weighted average of price relatives method (the weighted arithmetic mean of relative prices). After the simple price relative, two weighted aggregates indices, the Laspeyres and Paasche price indices, are described below. It is important to be familiar with all these methods.
$$\text{Price relative} = \frac{p_1}{p_0} \times 100\ (\%)$$

where $p_1$ = more recent price and $p_0$ = older price.

Example: In January 2017, a 330 ml can of SuperCola sold for R9.50; in January 2018, a 330 ml can of SuperCola sold for R12.50.

Solution: $\frac{12.50}{9.50} \times 100 = 131.58\% \approx 132\%$

Interpretation: There was a 32% increase in the price of a 330 ml can of SuperCola over the 12-month period.
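As a quick illustration, the price relative is a single division. A minimal Python sketch (the function name `price_relative` is hypothetical) using the SuperCola prices:

```python
def price_relative(p0, p1):
    """Price relative (%): the more recent price p1 as a percentage of the older price p0."""
    return p1 / p0 * 100


# SuperCola, 330 ml can: R9.50 in January 2017, R12.50 in January 2018
print(round(price_relative(9.50, 12.50), 2))  # 131.58
```

A result above 100 means the price has risen; subtracting 100 gives the percentage change (here about 32%).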
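$$\text{Laspeyres price index} = \frac{\sum (p_1 \times q_0)}{\sum (p_0 \times q_0)} \times 100\ (\%)$$

where $p_1$ = more recent price, $p_0$ = older price and $q_0$ = older quantity.

$$\text{Paasche price index} = \frac{\sum (p_1 \times q_1)}{\sum (p_0 \times q_1)} \times 100\ (\%)$$

where $p_1$ = more recent price, $p_0$ = older price and $q_1$ = more recent quantity.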

Example: As a marketing student, you are interested in how price increases have affected the most basic items in your local grocery store. To get a better understanding of this, you will calculate the Laspeyres and Paasche price indices.

Toiletry item   2017 unit price (p0)   2017 quantity (q0)   2018 unit price (p1)   2018 quantity (q1)
Soap            R1.95                  37                   R2.10                  40
Deodorant       R14.65                 24                   R15.95                 18
Toothpaste      R6.29                  14                   R6.74                  16

Toiletry item   p0 × q0   p1 × q0   p1 × q1   p0 × q1
Soap            72.15     77.70     84.00     78.00
Deodorant       351.60    382.80    287.10    263.70
Toothpaste      88.06     94.36     107.84    100.64
Total           511.81    554.86    478.94    442.34

Solution: Laspeyres price index = $\frac{554.86}{511.81} \times 100 = 108.41\%$

Interpretation: There was an 8% increase in the price of toiletry items between 2017 and 2018.

Solution: Paasche price index = $\frac{478.94}{442.34} \times 100 = 108.27\%$

Interpretation: There was an 8% increase in the price of toiletry items between 2017 and 2018.
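A minimal Python sketch of the two weighted aggregates formulas, using the toiletry data above with 2017 as the older period. The function names are hypothetical; the only difference between the two indices is which quantities (q0 or q1) are used as weights.

```python
def laspeyres_price(p0, p1, q0):
    """Sum(p1 * q0) / Sum(p0 * q0) * 100: older (base period) quantities as weights."""
    return sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0)) * 100


def paasche_price(p0, p1, q1):
    """Sum(p1 * q1) / Sum(p0 * q1) * 100: more recent quantities as weights."""
    return sum(p * q for p, q in zip(p1, q1)) / sum(p * q for p, q in zip(p0, q1)) * 100


# Soap, deodorant, toothpaste (2017 = older period, 2018 = more recent period)
p0 = [1.95, 14.65, 6.29]
q0 = [37, 24, 14]
p1 = [2.10, 15.95, 6.74]
q1 = [40, 18, 16]

print(round(laspeyres_price(p0, p1, q0), 2))  # 108.41
print(round(paasche_price(p0, p1, q1), 2))    # 108.27
```

Because Laspeyres keeps the base-period basket fixed, it only needs quantities for one period; Paasche needs the current quantities every time it is recalculated.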

3. Quantity indices

The concept of quantity indices is very similar to that of price indices, as are the methods used
for their calculation. The formulas given in the formula sheet in the examination are all price
index formulas. If you want to compare quantities rather than prices, you have only to replace
the p’s with q’s, and the q’s with p’s in the price index formulas to obtain the formulas for
quantity index numbers. Remember the subscripts (numbers) remain the same.



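$$\text{Quantity relative} = \frac{q_1}{q_0} \times 100\ (\%)$$

where $q_1$ = more recent quantity and $q_0$ = older quantity.

$$\text{Laspeyres quantity index} = \frac{\sum (q_1 \times p_0)}{\sum (q_0 \times p_0)} \times 100\ (\%)$$

where $q_1$ = more recent quantity, $q_0$ = older quantity and $p_0$ = older price.

$$\text{Paasche quantity index} = \frac{\sum (q_1 \times p_1)}{\sum (q_0 \times p_1)} \times 100\ (\%)$$

where $q_1$ = more recent quantity, $q_0$ = older quantity and $p_1$ = more recent price.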

Example:

Toiletry item   2017 unit price (p0)   2017 quantity (q0)   2018 unit price (p1)   2018 quantity (q1)
Soap            R1.95                  37                   R2.10                  40
Deodorant       R14.65                 24                   R15.95                 18
Toothpaste      R6.29                  14                   R6.74                  16

Regarding soap, what was the quantity change between 2017 and 2018?

Solution: $\frac{40}{37} \times 100 = 108.11\%$

Interpretation: There was an 8% increase in the quantity of soap sold over the 12 months.

Toiletry item   p0 × q0   p1 × q0   p1 × q1   p0 × q1
Soap            72.15     77.70     84.00     78.00
Deodorant       351.60    382.80    287.10    263.70
Toothpaste      88.06     94.36     107.84    100.64
Total           511.81    554.86    478.94    442.34

Solution: Laspeyres quantity index = $\frac{\sum (q_1 \times p_0)}{\sum (q_0 \times p_0)} \times 100 = \frac{442.34}{511.81} \times 100 = 86.43\%$

Interpretation: There was a 14% decrease in the quantity of toiletry items sold between 2017 and 2018. This may be due to the 8% increase in price.

Solution: Paasche quantity index = $\frac{\sum (q_1 \times p_1)}{\sum (q_0 \times p_1)} \times 100 = \frac{478.94}{554.86} \times 100 = 86.32\%$

Interpretation: There was a 14% decrease in the quantity of toiletry items sold between 2017 and 2018. This may be due to the 8% increase in price.
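The rule of swapping p’s and q’s can be checked in code: a single hypothetical helper, `weighted_aggregate_index`, computes Σ(new × weight) / Σ(old × weight) × 100, and the price and quantity versions differ only in what is passed in. A minimal Python sketch, using the toiletry data:

```python
def weighted_aggregate_index(new, old, weights):
    """Sum(new * weight) / Sum(old * weight) * 100."""
    numerator = sum(n * w for n, w in zip(new, weights))
    denominator = sum(o * w for o, w in zip(old, weights))
    return numerator / denominator * 100


# Soap, deodorant, toothpaste: older period (p0, q0) and more recent period (p1, q1)
p0 = [1.95, 14.65, 6.29]
q0 = [37, 24, 14]
p1 = [2.10, 15.95, 6.74]
q1 = [40, 18, 16]

# Price index: prices change, quantities are the weights
laspeyres_price = weighted_aggregate_index(p1, p0, q0)     # Sum(p1*q0) / Sum(p0*q0)

# Quantity indices: swap the roles of p and q (subscripts stay the same)
laspeyres_quantity = weighted_aggregate_index(q1, q0, p0)  # Sum(q1*p0) / Sum(q0*p0)
paasche_quantity = weighted_aggregate_index(q1, q0, p1)    # Sum(q1*p1) / Sum(q0*p1)

print(round(laspeyres_price, 2))     # 108.41
print(round(laspeyres_quantity, 2))  # 86.43
print(round(paasche_quantity, 2))    # 86.32
```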

4. Problems of index number construction

There are four important considerations regarding index numbers:

1) The purpose of the index that is to be calculated - a clear understanding of the scope
of the proposed index is needed to determine the following factors.

2) Which items should be included - when a composite index is being calculated, it is often impossible or unnecessary to include all items. A representative sample of items must be selected.

3) What weighting method should be used – this reflects the relative importance of each
item.

4) Which base year should be selected - the choice of the base year must also be carefully considered. If we compare the price of motor cars in January 2006 with that in January 1998, 1998 is taken to be the base year as it is furthest in the past. However, this choice of base year (furthest in the past) may not always be appropriate, for a number of reasons. It is important to remember that the base year should not be too far in the past, because products change: old products disappear and new products are introduced. Consider the change in the amount spent on telephone calls from 1992 to 2012. In 1992, cell phones were not yet in general use, and spending on phone calls increased hugely when they became available. The base year should also be a period of economic and political stability.

5. Limitations in the interpretation of index numbers

Index numbers are based on samples, so sampling errors may be present.


Any changes that affect purchasing patterns, such as inflation, new technologies or
differing quality, can make comparisons over time unreliable.

6. Applications of index numbers

The base of a series of price indices can be shifted from one time period to another by multiplying each price index by an adjustment factor:

$$\text{Adjustment factor} = \frac{100}{\text{old index value of the new base period}}$$

Price indices can also be used to transform monetary values into real values, relative to the base year.

Example Wegner (2020): Consider the following price index series with 2007 as the base year
(2007 = 100).
Year 2005 2006 2007 2008 2009 2010 2011
Price index 78 87 100 106 125 138 144

As a marketing student you have learnt about consumer price sensitivity and the impact it has
on demand. The financial manager of the company you work for wants to increase prices, and
the marketing manager has asked you for a report on the price index series using 2009 as the
base year, as this is the year that a new competitor entered the market.



Solution: $\text{Adjustment factor} = \frac{100}{125} = 0.80$

Year                  2005     2006     2007     2008     2009     2010     2011
Price index           78       87       100      106      125      138      144
× factor              × 0.80   × 0.80   × 0.80   × 0.80   × 0.80   × 0.80   × 0.80
Revised (2009 = 100)  62.40    69.60    80.00    84.80    100.00   110.40   115.20

Interpretation: The revised price index shows that prices have continued to increase since the new competitor entered the market in 2009. Any increase in the price of the company’s product needs to be consistent with, or below, this rate of increase; otherwise customers may become sensitive to the fact that your prices are increasing faster than the competitor’s, and demand may decrease.
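A minimal Python sketch of base shifting, assuming the Wegner (2020) index series above; `rebase` is a hypothetical helper name. Each index is multiplied by 100 divided by the old index value of the new base year (2009), and the printed values match the revised series in the solution.

```python
def rebase(indices, new_base_value):
    """Shift the base of an index series: multiply by 100 / (old index of the new base)."""
    factor = 100 / new_base_value
    return [round(value * factor, 2) for value in indices]


years = [2005, 2006, 2007, 2008, 2009, 2010, 2011]
old_index = [78, 87, 100, 106, 125, 138, 144]  # 2007 = 100

new_index = rebase(old_index, old_index[years.index(2009)])  # 2009 = 100
for year, value in zip(years, new_index):
    print(year, value)
```

Rebasing changes only the reference point: the ratio between any two years is the same in the old and the revised series.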

F. Study Unit 5 - Objective 3 (Chapter 15)


At the end of this section, you should be able to identify the components of a time series,
compute the trend and seasonal influence in a time series, de-seasonalise a time series, and
forecast future values. You should also be able to analyse this information to make good
marketing decisions.

1. Introduction

Suppose you are the marketing assistant of a B2B IT consulting company, and you are asked
to provide estimates of computer sales for the four quarters of 2018. Your estimates will
affect the number of orders, inventory policies, sales quotas, etc. It is essential that you
provide good estimates so that the management team can plan efficiently and effectively.
Poor planning will result in increased costs, and perhaps even the loss of your (and the other staff members’) end-of-year bonuses. Good estimates can be calculated by reviewing past sales figures and the accompanying trends. Do sales peak in January, or at the beginning of the tax year? Do they fall in the third quarter? Such a review of historical data allows for better predictions of future sales. This historical sales data is known as a time series. Specifically, a time series is a set of observations measured at successive points in time or over successive periods of time.

2. Components of a time series

A time series is a set of observations of a random variable arranged in chronological (time) order. A time series is made up of four separate components:
Trend (T) – the long-term smooth underlying movement, or general pattern, of the time series;
Cyclical variations (C) – medium- to long-term deviations from the trend, reflecting waves of growth and recession;
Seasonal variation (S) – fluctuations that are repeated periodically, usually within a year, like toy sales peaking in December every year; and
Irregular variation (I) – random, unpredictable variations, like load shedding forcing a restaurant to close.

Example: A shoe store supplies you with the following sales data:

Period   Jan 2014   May 2014   Sep 2014   Jan 2015   May 2015   Sep 2015   Jan 2016   May 2016   Sep 2016   Jan 2017
Sales    145        73         93         166        83         106        189        94         15         207

[Figure: Shoe sales trend from 2014 to 2017 (line chart of shoe sales by month, January 2014 to January 2017)]

Interpretation:
Trend – overall increase in sales
Cyclical variations – period of steady growth
Seasonal variation – sales peak in January every year
Irregular variation – abnormal drop in sales in September 2016

3. Decomposition of a time series

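Decomposing a time series means separating it into its components. By de-seasonalising a time series, for example, we effectively remove the seasonal effects, which results in a different set of figures. The multiplicative time series model is a very simple equation that expresses each actual value y as the product of the four components:

$$y = T \times C \times S \times I$$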

Much of the variation is often described by the trend and seasonal components of the time
series and these are the components that will be addressed in more detail.

4. Trend analysis

Trend analysis isolates the long-term movement of a time series by smoothing out its short-term fluctuations; a linear trend can be described by a straight line. Two methods for identifying the trend in a time series are the moving average method and the regression analysis method.

Both methods are explained in this section. A moving average can be regarded as an artificially created time series.

The moving average is obtained by replacing each observed value with the average of that value and its neighbouring observed values. The concept is illustrated in the following table, where a three-year moving average is calculated for a series of values collected during the period 2013 to 2017.

Year   y    Three-year moving average
2013   y1   –
2014   y2   (y1 + y2 + y3)/3
2015   y3   (y2 + y3 + y4)/3
2016   y4   (y3 + y4 + y5)/3
2017   y5   –

The moving average series smooths out short-term fluctuations and can be used to identify the trend.

Example Wegner (2020): The table below shows the number of fire insurance claims received
by an insurance company in each four-month period from 2008 to 2011. You have just been
hired as an intern in the marketing research department, and your first task is to comment on
the claims trend over the past four years.

Year     2008          2009          2010          2011
Period   P1  P2  P3    P1  P2  P3    P1  P2  P3    P1  P2  P3
Claims   7   3   5     9   7   9     12  4   10    13  9   10

Calculate a three-period moving average for the number of insurance claims received:



Solution:

Year   Period   Claims   Three-period moving total (centred)   Three-period moving average
2008   P1       7        –                                     –
       P2       3        7 + 3 + 5 = 15                        15 / 3 = 5
       P3       5        3 + 5 + 9 = 17                        17 / 3 = 5.67
2009   P1       9        5 + 9 + 7 = 21                        21 / 3 = 7
       P2       7        9 + 7 + 9 = 25                        25 / 3 = 8.33
       P3       9        7 + 9 + 12 = 28                       28 / 3 = 9.33
2010   P1       12       9 + 12 + 4 = 25                       25 / 3 = 8.33
       P2       4        12 + 4 + 10 = 26                      26 / 3 = 8.67
       P3       10       4 + 10 + 13 = 27                      27 / 3 = 9
2011   P1       13       10 + 13 + 9 = 32                      32 / 3 = 10.67
       P2       9        13 + 9 + 10 = 32                      32 / 3 = 10.67
       P3       10       –                                     –
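The table above can be reproduced with a minimal Python sketch of a centred moving average for an odd number of periods. `moving_average` is a hypothetical helper; it deliberately rejects even windows, because those need an extra centring step that is not shown here.

```python
def moving_average(values, window):
    """Centred moving averages for an odd window; None where undefined."""
    if window % 2 == 0:
        raise ValueError("this sketch only handles odd windows")
    half = window // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        # Average of the value itself and `half` neighbours on each side
        result[i] = sum(values[i - half:i + half + 1]) / window
    return result


claims = [7, 3, 5, 9, 7, 9, 12, 4, 10, 13, 9, 10]  # 2008 P1 to 2011 P3
averages = moving_average(claims, 3)
print([None if m is None else round(m, 2) for m in averages])
# [None, 5.0, 5.67, 7.0, 8.33, 9.33, 8.33, 8.67, 9.0, 10.67, 10.67, None]
```

Called as `moving_average(turnover, 5)` on the bicycle turnover data in revision exercise 4, the same function gives the five-period moving averages (3 229.2 for year 3 through to 5 139.0 for year 8).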

Comment on the trend: there is an overall increase in the number of claims between 2008 and 2011.

The second method of trend analysis applies the least squares method (see the first section of this study unit on linear regression). In this case, the dependent variable, y, is the actual time series and the independent variable, x, is time. As the “name” of each time period is not a numeric value, each time period is numbered. Two methods can be used to do this, namely the sequential numbering method and the zero-sum method.


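Example:

Sales (y)   Year   Sequential numbering (x)   Zero-sum method (x)
578         2011   1                          −2
593         2012   2                          −1
620         2013   3                          0
647         2014   4                          1
671         2015   5                          2
                                              ∑x = 0

With an odd number of periods, the zero-sum method numbers the middle period 0 and counts outwards in steps of 1, so that ∑x = 0.

*Sequential numbering is the favoured approach.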

Now that there are numerical values for each time period, the data can be used in the standard least squares regression formulas:

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

$$a = \frac{\sum y - b\sum x}{n}$$

$$y = a + bx$$
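The two numbering methods give the same slope and the same forecasts; only the intercept changes, because x is measured from a different starting point. A minimal Python sketch with the sales data above (the `least_squares` helper is our own, not a library function, and is repeated here so the snippet runs on its own):

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


sales = [578, 593, 620, 647, 671]  # 2011 to 2015
sequential = [1, 2, 3, 4, 5]       # sequential numbering
zero_sum = [-2, -1, 0, 1, 2]       # zero-sum numbering (sum of x is 0)

a_seq, b_seq = least_squares(sequential, sales)
a_zero, b_zero = least_squares(zero_sum, sales)

print(round(a_seq, 2), round(b_seq, 2))    # 549.8 24.0
print(round(a_zero, 2), round(b_zero, 2))  # 621.8 24.0

# Trend forecast for 2016: x = 6 (sequential) or x = 3 (zero-sum)
print(round(a_seq + b_seq * 6, 2), round(a_zero + b_zero * 3, 2))  # 693.8 693.8
```

The zero-sum coding makes hand calculation easier, because ∑x = 0 removes terms from both formulas.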
5. Seasonal analysis

The technique described in this section can be applied to data for any time interval (weekly, monthly, quarterly, etc.). Calculating seasonal indices is the most important aspect of the analysis of seasonal fluctuation. Using monthly data, for example, the seasonal index will contain 12 figures, one for each month of the year, each measuring the activity in that month relative to the average activity over a year. For quarterly data, the 4-point moving average is calculated in the same way as the moving averages in the trend analysis above; because 4 is an even number, the moving averages must then also be centred.
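This section does not list the calculation steps, so the following is a minimal sketch of one common approach, the ratio-to-moving-average method, applied to the fire insurance claims data from the trend analysis above. Assumptions: the number of periods per year is odd (three here), so the moving average needs no centring; each seasonal index is the mean of the actual-to-moving-average ratios for that period (some textbooks use the median instead); and the indices are then adjusted to average 100. The function names are hypothetical.

```python
def centred_moving_average(values, window):
    """Moving average centred on each value; None where it cannot be computed.

    Only odd windows are handled (even windows would need a centring step).
    """
    if window % 2 == 0:
        raise ValueError("odd window required in this sketch")
    half = window // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        result[i] = sum(values[i - half:i + half + 1]) / window
    return result


def seasonal_indices(values, periods_per_year):
    """Ratio-to-moving-average seasonal indices (series must start at period 1)."""
    moving_avg = centred_moving_average(values, periods_per_year)
    ratios = [[] for _ in range(periods_per_year)]
    for i, (actual, avg) in enumerate(zip(values, moving_avg)):
        if avg is not None:
            # Actual value as a percentage of the moving average
            ratios[i % periods_per_year].append(actual / avg * 100)
    means = [sum(r) / len(r) for r in ratios]
    # Adjust so that the indices average exactly 100
    adjustment = 100 * periods_per_year / sum(means)
    return [m * adjustment for m in means]


claims = [7, 3, 5, 9, 7, 9, 12, 4, 10, 13, 9, 10]  # 2008 P1 to 2011 P3
indices = seasonal_indices(claims, 3)
print([round(s, 1) for s in indices])  # [132.1, 68.9, 99.0]
```

Read the result as: claims in P1 are typically about 32% above the trend level, and claims in P2 about 31% below it.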



6. Use of time series indicators

There are two applications of trend analysis and seasonal indices:

1) De-seasonalising a time series using seasonal indices, to remove the seasonal influence so that the underlying (trend) movement can be analysed.

2) Constructing forecasts of time series values, using the trend line and incorporating the seasonal influence to predict how the series will continue into the future.
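The two applications above can be sketched as two one-line Python functions. This is a minimal illustration; the function names and the figures in the example (a seasonal index of 125, an actual value of 150, a trend estimate of 200) are hypothetical.

```python
def deseasonalise(actual, seasonal_index):
    """Remove the seasonal influence: actual value / (seasonal index / 100)."""
    return actual / (seasonal_index / 100)


def seasonal_forecast(trend_value, seasonal_index):
    """Add the seasonal influence back to a trend estimate: trend * (seasonal index / 100)."""
    return trend_value * seasonal_index / 100


# Hypothetical figures for illustration only
print(deseasonalise(150, 125))      # 120.0
print(seasonal_forecast(200, 125))  # 250.0
```

A season with an index above 100 is busier than average, so de-seasonalising scales its actual value down, while a forecast for that season scales the trend estimate up.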

G. Study Unit 5 – Revision Exercises

1. As the marketing assistant for a chain of One-stop coffee shops, you have been asked to analyse the pricing of different coffee brands between 2013 and 2017. Consider the table below, which gives the prices of coffee over a 5-year period:
Brand 2013 2014 2015 2016 2017
Ricoffy 10.59 12.99 13.99 16.59 17.99
Nescafe 5.29 6.99 7.59 7.99 9.29
Jacobs 11.99 12.49 12.99 13.49 16.68
Frisco 4.79 4.99 7.99 8.39 8.99

a. Discuss issues that should be considered before choosing the base year. (2)
b. Calculate the price index for each coffee, using 2015 as the base year. (20)
c. Comment on the results. (1)

2. The following data show the age (in years) and the selling price (in R1000) of used
cars with the same engine capacity and make at eight different second-hand car
dealers in Port Elizabeth. You are a market analyst of a well-known car magazine, and
are compiling an article on the following:

Age 1 6 4 2 5 4 1 2
Price 41.2 10.3 24.3 38.7 8.7 26.1 38.7 36.2

a. Name the dependent and independent variables. (2)



b. Calculate the regression equation of y on x. (5)

3. Greyhound suspects that there is a direct link between advertising expenditure and
the number of passengers who choose Greyhound. The expenditure on advertising
and the number of passengers who have ridden with Greyhound over the last 12
months are shown below:

Advertising expenditure (x) (R1 000)    No. of passengers (y) (1 000)
11.2 16.8
13.3 19.0
8.5 12.8
16.9 24.3
11.3 16.4
15.9 21.8
11.0 15.4
15.1 21.7
20.3 25.3
10.5 18.0
12.6 16.6
14.0 19.5

a. Fit a linear regression equation to predict the number of passengers from advertising expenditure. (5)
b. How many passengers can be expected for the following month if the
budgeted advertising expenditure is R 19 000? (2)
c. Comment on the correlation between the advertising expenditure and the
number of passengers if the correlation coefficient is given as 0.96 (2)

4. The turnover figures for a bicycle manufacturer for a 10-year period are shown
below. You have been hired as a marketing consultant to assist the manufacturer
with the following:



Year Turnover (𝑦) (R 100 000)
1 2428
2 2951
3 3533
4 3618
5 3616
6 4264
7 4738
8 4460
9 5318
10 6915

a. Calculate the five-period moving average for this time series. (6)
b. Comment on turnover trend of the company for the 10-year period. (1)

H. Study Unit 5 – Revision Exercises Solutions

1.
a. The base year should not be too far in the past, as products change: some products might be discontinued and others newly introduced. The base year should also be a period of economic and political stability.
b. Using 2015 as the base year, $\text{Price index} = \frac{p_1}{p_0} \times 100$, where $p_0$ is the 2015 price and $p_1$ is the price in the year being compared.

Brand     2013   2014   2015   2016   2017
Ricoffy   76     93     100    119    129
Nescafe   70     92     100    105    122
Jacobs    92     96     100    104    128
Frisco    60     62     100    105    113

c. Ricoffy had the highest increase since 2015: its price increased by 29% between 2015 and 2017.

2. a. Independent variable (x) – Age
Dependent variable (y) – Price

b.
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{(8 \times 536.6) - (25 \times 224.2)}{(8 \times 103) - 25^2} = -6.59$$

$$a = \frac{\sum y - b\sum x}{n} = \frac{224.2 - (-6.59 \times 25)}{8} = 48.62$$

$$y = -6.59x + 48.62$$

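3. a.
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{(12 \times 3\,174.09) - (160.6 \times 227.6)}{(12 \times 2\,266.8) - 160.6^2} = \frac{1\,536.52}{1\,409.24} = 1.09$$

$$a = \frac{\sum y - b\sum x}{n} = \frac{227.6 - (1.09 \times 160.6)}{12} = 4.38$$

$$y = 1.09x + 4.38$$

b.
$$y = (1.09 \times 19) + 4.38 = 25.09$$

Since y is measured in thousands of passengers, about 25 090 passengers can be expected.

c. There is a strong positive linear relationship between the advertising expenditure and the number of passengers (r = 0.96 is close to +1): the more money spent on advertising, the more passengers tend to ride with Greyhound.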
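As a check on the answer to question 3, the following minimal Python sketch computes b, a and r from the raw data with the formulas used in this study unit. Because it keeps b unrounded, its intercept (about 4.37) may differ in the second decimal from a hand calculation that rounds b first; the slope (about 1.09), r (about 0.96) and the prediction for R19 000 (about 25.09 thousand passengers) are unaffected.

```python
import math

# Revision exercise 3: advertising expenditure (R1 000) and passengers (1 000)
advertising = [11.2, 13.3, 8.5, 16.9, 11.3, 15.9, 11.0, 15.1, 20.3, 10.5, 12.6, 14.0]
passengers = [16.8, 19.0, 12.8, 24.3, 16.4, 21.8, 15.4, 21.7, 25.3, 18.0, 16.6, 19.5]

n = len(advertising)
sum_x = sum(advertising)
sum_y = sum(passengers)
sum_xy = sum(x * y for x, y in zip(advertising, passengers))
sum_x2 = sum(x * x for x in advertising)
sum_y2 = sum(y * y for y in passengers)

# Least squares slope and intercept
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Pearson's correlation coefficient
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(round(b, 2), round(a, 2), round(r, 2))  # 1.09 4.37 0.96
print(round(a + b * 19, 2))                   # 25.09 (thousand passengers)
```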



4. a.
Year   Turnover   5-period moving total   5-period moving average
1      2428       –                       –
2      2951       –                       –
3      3533       16146                   3 229.20
4      3618       17982                   3 596.40
5      3616       19769                   3 953.80
6      4264       20696                   4 139.20
7      4738       22396                   4 479.20
8      4460       25695                   5 139.00
9      5318       –                       –
10     6915       –                       –

b. The turnover of the bicycle manufacturer shows an increasing trend over the 10-year period.



I. Study Unit 5 - Progress check

You have come to the end of Study Unit 5.

Time to do a progress check to determine whether you have gone through all the required content and completed all the exercises.



Your progress checklist:

Progress checklist YES / NO?

Did you read through each study unit outcome?

Did you go through all the learning material?

Did you complete all the relevant revision exercises and check your answers against the answers provided?

At this point, can you achieve each of the study unit outcomes listed at the start of this study unit?

Are you ready to tackle the questions relevant to Study Unit 5 in the Exam?



SECTION E – REVISION & EXAM PREPARATION

Final Progress Check

You have now covered every module outcome and the associated learning activities or
exercises relating to this module.
Are you ready for your final assessment?

Let’s recap again:

• Can you explain each core concept?

• Did you complete all the review exercises?

• Did you compare your answers with the answers provided in the study guide?

• Are you more aware of the same basic principles applied around you in your work environment or day-to-day life?

• Did you complete all your assignments before the due date and ensure they reached the IMM Graduate School in time?

• Consult eLearn for more exam tips on how to study and how to prepare yourself for the exams.

Good luck for your upcoming exams!



Reference list

The overall content of this study guide is based on the prescribed textbook of this module.:
Wegner, T., 2020. Applied Business Statistics: Methods and Excel-based Applications. 5th ed.
Cape Town: Juta. (CD-ROM included).

Alphabetical list
http://www.academia.edu/6334673/19_Growth_Hacker_Quotes-
Thoughts_on_the_Future_of_Marketing
[Accessed: 15 August 2014]

Anderson, D.R., Sweeney, D.J., and Williams, T.A., 1991. An Introduction to Management
Science – Quantitative Approaches to Decision Making. 6th ed. St Paul: West Publishing
Company.
http://en.wikipedia.org/wiki/Ronald_Fisher
[Accessed: 15 August 2014]

http://www.math.wpi.edu/Course_Materials/SAS/quotes.html
[Accessed: 15 August 2014]

Shipham, S.O., 2012. Basic Marketing Research Study Guide. IMM Graduate School.

http://stats.stackexchange.com/questions/726/famous-statistician-quotes
[Accessed: 15 August 2014]

http://todayinsci.com/QuotationsCategories/S_Cat/Statistics-Quotations.htm
[Accessed: 15 August 2014]

Wiid, J. and Diggines, C., 2013. Marketing Research. 2nd ed. Cape Town: Juta.

Glossary
Each key term below is followed by its definition and, where applicable, an example based on
this scenario: As the marketing manager of Pretoria Zoo, you conducted a survey of 75 visitors
to the Zoo to identify which animals they most enjoyed seeing. The results will be summarised
into a report for the marketing department and used to make the Zoo more attractive to
visitors.

Population
Definition: The collection of all the observations of a random variable under study and about
which one is trying to draw conclusions in practice.
Example: Population – all visitors to the Pretoria Zoo.

Sampling unit
Definition: The item / individual being measured or counted with respect to the random
variable(s) under study.
Example: Sampling unit – each visitor who is questioned in the survey.

Sample
Definition: A subset of the population on which observations are made or measurements are
taken. A sample is used when a census is too expensive, time-consuming or impossible.
Example: Sample – the 75 visitors who were selected to represent the population.

Sample size
Definition: The number of individuals / items in the sample.
Example: Sample size – 75 respondents.

Random sample
Definition: A sample where each item / individual is chosen by chance and all members of the
population have an equal chance of being chosen.

Data
Definition: Individual observations on an issue.

Variable
Definition: Any characteristic being measured or observed.
Example: Variable being studied – which animals are most popular.

Descriptive statistics
Definition: Describe things by identifying the essential characteristics of a random variable
and producing a profile of its behaviour. Condenses large volumes of data into a few summary
measures.
Example: Descriptive or inferential – descriptive, as the results will be summarised and used
to make decisions.

Inferential statistics
Definition: Answer a specific question by extending the information extracted from a sample
of the actual environment in which the problem arises. Generalises sample findings to the
broader population.

Qualitative data
Definition: Non-numeric responses, e.g. “I feel that …” or “In my opinion …”.

Quantitative data
Definition: Numeric responses, e.g. how many people like seeing the lions.
Example: Quantitative, as the results will be based on frequency counts.

Interval data (level of measurement)
Definition: Associated with quantitative data, scaled with order (ranking) and distance.
Example: Data type – interval, as the data is quantitative and the animals are ranked.

Nominal data (level of measurement)
Definition: Associated with qualitative data, scaled with no order (ranking), as all categories
are seen as equal.

Ordinal data (level of measurement)
Definition: Associated with qualitative data, scaled with order (ranking).
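To make the Sample, Random sample and Descriptive statistics entries more concrete, the short Python sketch that follows draws a simple random sample of 75 visitors from a hypothetical population and condenses their favourite animals into a frequency table. The population of 1 000 visitors, the list of animals and the helper names (simple_random_sample, frequency_table) are assumptions invented purely for illustration; they are not part of the module material and are not real survey results.

```python
import random
from collections import Counter

# HYPOTHETICAL population: 1 000 zoo visitors, each with a favourite animal.
# These values are invented for illustration only; they are not survey results.
ANIMALS = ["lions", "elephants", "giraffes", "penguins", "meerkats"]
population = [(visitor_id, ANIMALS[visitor_id % len(ANIMALS)])
              for visitor_id in range(1, 1001)]


def simple_random_sample(units, n, seed=None):
    """Select n sampling units without replacement; every unit has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(units, n)


def frequency_table(responses):
    """Descriptive summary: frequency count and percentage per category, most common first."""
    counts = Counter(responses)
    total = len(responses)
    return [(category, count, round(100 * count / total, 1))
            for category, count in counts.most_common()]


sample = simple_random_sample(population, 75, seed=2021)  # sample size n = 75
favourites = [animal for _, animal in sample]              # the variable under study

for animal, count, pct in frequency_table(favourites):
    print(f"{animal:<10} {count:>3} {pct:>5.1f}%")
```

Fixing the seed makes the draw repeatable, so the same 75 visitors are selected each time the sketch is run; leave it out to draw a fresh sample. Because rng.sample selects without replacement, no visitor can be questioned twice.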

Copyright 2020

In terms of the Copyright Act 98 of 1978, no part of this study material may be reproduced,
be stored in a retrieval system, be transmitted or used in any form or be published,
redistributed or screened by any means (electronic, mechanical, photocopying, recording or
otherwise) without the written permission of the IMM Graduate School. However,
permission to use any material in this work that was derived from other sources must be
obtained from the original sources.
