
CAC 428 Topic 1_introduction to Data

The document provides an overview of data, emphasizing its role as a foundational element in business analytics. It explains the characteristics of data, including its structured and unstructured forms, and introduces concepts such as datasets, data attributes, and measurement scales. Additionally, it highlights the importance of understanding data types for effective analysis and decision-making.

Uploaded by missieluckie23
Copyright © All Rights Reserved

CAC 428 Big Data and Data Analytics in Accounting
Topic 1
Data
Introduction – what is data?
• Data (datum in singular form) refers to a
collection of facts usually obtained as the result
of experiments, observations, transactions, or
experiences.
• Data may consist of numbers, letters, words,
images, voice recordings, etc.
• Data may also be measurements of a set of
variables (i.e. characteristics) of the subject or
event that we are interested in studying.
Introduction – how to view data
• In simple terms, data are often viewed as the
lowest level of abstraction from which information
and then knowledge are derived.
• At the highest level of abstraction, data can be
classified as structured and unstructured (or
semi-structured).
Introduction – data and business analytics
• Data is the main ingredient for any business
analytics initiative.
• It is the raw material for decision technologies
used to produce information, insight, and
knowledge.
• Without data, none of these technologies could
exist or be popularized.
Introduction – data in the past and now
• In the past, collecting, storing, and managing data
was a big challenge, and data was not seen as
particularly valuable.
• Nowadays data is considered among the most
valuable assets of an organization.
• Today, data has the potential to create invaluable
insight to better understand customers, competitors,
and business processes.
Introduction – Nature of data today
• Data can be small, or it can be very large.
• Data can be structured (nicely organized for computers to process), or
unstructured (e.g., text that is created for humans and hence not readily
understandable/consumable by computers).
• Data can come in smaller batches continuously or can pour in all at once
as a large batch.
• These are some of the characteristics that define the inherent nature of
today’s data, which is referred to as Big Data.
• Although these characteristics make data more challenging to process
and consume, they also make it more valuable, because they enrich the data
beyond its conventional limits and allow new knowledge to be discovered.
Introduction – Datasets
• A data set can be viewed as a collection of data objects.
• Collections or groups of related data are referred to as
datasets.
• Each group or dataset member (datum) shares the
same set of attributes or properties as others in the same
dataset.
• In the simplest case, a data set is a file in which the objects are
records (or rows) and each field (or column) corresponds to
an attribute.
• Other names for an attribute are variable, characteristic,
field, feature, or dimension.
Introduction – Datasets
• Data sets differ in several ways.
• Attributes used to describe data objects can be either
(1) quantitative or (2) qualitative.
• Data sets often have special characteristics, e.g., some
data sets contain time series or objects with explicit
relationships to one another.
• The type of data determines which tools and techniques
can be used to analyze the data.
Introduction – Data Objects
• A data object is also a: record, point, vector, pattern,
event, case, sample, instance, observation, or entity.
• Data objects are described by several attributes that
capture the characteristics of an object, such as the price
of a stock, the fiscal year end, forecast earnings, or the
time at which an event occurred.
Introduction – Example
Student ID   Year of study   Grade point average (GPA)
1034262      Fourth          3.24
1052663      Second          3.51
1082246      First           3.62
• This is a data set that consists of student information.
• Each row corresponds to a student (an object, a record, or an observation), and
• Each column is an attribute (variable) that describes some aspect of a
student, such as grade point average (GPA) or identification number (ID).
Introduction – Using Data Sets for Analytics
• Before using data sets in analytics, one should
understand the data at hand (refer to the case
scenario shared earlier), i.e., “knowing your data.”
• Question – What aspects did the analyst not
know about the data?
• Some of the basic challenges and standard
approaches to knowing your data will be
discussed.
Introduction – Data Attributes
• An attribute (variable) is a property or characteristic of an object
that can vary.
• The variation can be either from one object to another or from
one time to another. For example, eye colour varies from person
to person, while the temperature of an object varies over time.
Passenger ID   Eye colour   Temperature at entry (°C)   Temperature at exit (°C)
XY262          Black        36.4                        36.5
PQ663          Blue         35.8                        35.6
KL246          Hazel        36.1                        36.7
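A short Python sketch of the two kinds of variation, assuming the passenger table is stored as plain tuples (a hypothetical layout): eye colour is compared across passengers, while temperature is compared for the same passenger at two points in time.

```python
# Hypothetical layout: (passenger_id, eye_colour, temp_entry_c, temp_exit_c)
passengers = [
    ("XY262", "Black", 36.4, 36.5),
    ("PQ663", "Blue", 35.8, 35.6),
    ("KL246", "Hazel", 36.1, 36.7),
]

# Eye colour varies from one object (passenger) to another.
eye_colours = {pid: colour for pid, colour, _, _ in passengers}

# Temperature varies over time for the same object: exit minus entry.
# round() hides tiny floating-point noise in the subtraction.
temp_change = {pid: round(exit_c - entry_c, 1)
               for pid, _, entry_c, exit_c in passengers}

print(eye_colours)  # {'XY262': 'Black', 'PQ663': 'Blue', 'KL246': 'Hazel'}
print(temp_change)  # {'XY262': 0.1, 'PQ663': -0.2, 'KL246': 0.6}
```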
Introduction – Data Attributes
• Attributes can be numbers or symbols.
• Eye colour – symbol; temperature – number
• To discuss and more precisely analyze the
characteristics of objects, numbers or symbols are
assigned to them.
• To assign numbers or symbols to attributes in a
well-defined way, we need a measurement scale.
• Attribute type determines the data analysis technique to
be used.
Introduction – Measurement scale
• A measurement scale is a rule (function) that
associates a number or symbol with an attribute of
an object.
• Measurement is the application of the rule
(function) to associate a value with a particular
attribute of a specific object.
• We engage in measurement all the time.
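The distinction between the rule (the scale) and its application (the measurement) can be sketched in Python. Here the scale is a hypothetical mapping from eye colour to an integer code, and measurement is the act of applying that mapping to a particular passenger; the codes themselves are arbitrary assumptions.

```python
# The measurement scale: a rule associating a number with each eye colour.
# The codes are arbitrary; any distinct codes would do for this attribute.
EYE_COLOUR_SCALE = {"Black": 1, "Blue": 2, "Hazel": 3}

def measure_eye_colour(eye_colour: str) -> int:
    """Measurement: apply the rule to one object's attribute value."""
    return EYE_COLOUR_SCALE[eye_colour]

# Applying the rule to the eye colour of specific passengers.
for passenger_id, colour in [("XY262", "Black"), ("KL246", "Hazel")]:
    print(passenger_id, measure_eye_colour(colour))
# XY262 1
# KL246 3
```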
Introduction – Measurement scale
• Examples
- Stepping on a scale to determine our weight,
- Classifying people as male or female, or
- Counting the number of chairs in a room to
assess whether there will be enough seats for all
meeting attendees.
- Multiplying price by units sold to determine sales
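Two of the examples above, the seat count and the sales calculation, are sketched below as small Python functions. The function names and the figures in the usage lines are hypothetical.

```python
def enough_seats(chairs_in_room: int, attendees: int) -> bool:
    """Counting chairs: compare the count with the number of attendees."""
    return chairs_in_room >= attendees

def sales(price_per_unit: float, units_sold: int) -> float:
    """Sales value obtained by multiplying price by units sold."""
    return price_per_unit * units_sold

print(enough_seats(chairs_in_room=12, attendees=15))  # False
print(sales(price_per_unit=250.0, units_sold=40))     # 10000.0
```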
Introduction – Type of Attributes
• Attributes are described using four properties or
operations:
• 1. Distinctness = and ≠
• 2. Order <, ≤, >, and ≥
• 3. Addition + and subtraction –
• 4. Multiplication × and division /

Given these properties, we can define four types of attributes:
nominal, ordinal, interval, and ratio.
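One way to make the four properties concrete is to encode them as a ladder, assuming the usual convention that each attribute type supports its own property plus those of every type listed before it (nominal: distinctness; ordinal: adds order; interval: adds addition/subtraction; ratio: adds multiplication/division). The `AttributeType` enum and `allowed_properties` function are hypothetical names for this sketch, which needs Python 3.9 or later for the `list[str]` annotation.

```python
from enum import Enum

class AttributeType(Enum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Properties in cumulative order; each type gets the first `value` of them.
_PROPERTY_LADDER = [
    "distinctness",             # = and !=
    "order",                    # <, <=, >, >=
    "addition/subtraction",     # + and -
    "multiplication/division",  # * and /
]

def allowed_properties(attr_type: AttributeType) -> list[str]:
    """Return the properties that are meaningful for this attribute type."""
    return _PROPERTY_LADDER[: attr_type.value]

for t in AttributeType:
    print(t.name.lower(), allowed_properties(t))
```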
Introduction – Type of Attributes
• Nominal and ordinal attributes are collectively referred
to as categorical or qualitative attributes.
• Even when nominal and ordinal attributes are represented by
numbers (e.g., integers), they should be treated more like
symbols.
• Interval and ratio attributes are collectively referred to
as quantitative or numeric attributes.
• Quantitative attributes can be integer-valued or
continuous.
Introduction – Type of Attributes
• The types of attributes can also be described in terms of
transformations that do not change the meaning of an attribute.
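The idea can be sketched for one case. Assuming the standard rule that an interval attribute keeps its meaning under an affine change of scale, new = a × old + b with a > 0, converting the passenger entry temperatures from Celsius to Fahrenheit should preserve comparisons of differences but not ratios of raw values. The helper `c_to_f` is a hypothetical name.

```python
import math

def c_to_f(celsius: float) -> float:
    """Affine change of scale (a = 1.8, b = 32): Celsius to Fahrenheit."""
    return 1.8 * celsius + 32

entry_c = [36.4, 35.8, 36.1]  # entry temperatures from the passenger table
entry_f = [c_to_f(c) for c in entry_c]

# Ratio of two differences: preserved by the transformation.
diff_ratio_c = (entry_c[0] - entry_c[1]) / (entry_c[2] - entry_c[1])
diff_ratio_f = (entry_f[0] - entry_f[1]) / (entry_f[2] - entry_f[1])

# Ratio of two raw values: not preserved, so it was never meaningful.
value_ratio_c = entry_c[0] / entry_c[1]
value_ratio_f = entry_f[0] / entry_f[1]

print(math.isclose(diff_ratio_c, diff_ratio_f))    # True
print(math.isclose(value_ratio_c, value_ratio_f))  # False
```

This is why "36.4 °C is about 1.7% warmer than 35.8 °C" is not a meaningful statement about temperature, while "the first gap is twice the second" survives any change of unit.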
Introduction – Type of Attributes
• Attributes can also be described as either discrete or continuous.
• A discrete attribute has a finite or countably infinite set of values. It
is often represented using integer variables.
• Binary attributes are a special case of discrete
attributes and assume only two values, e.g., true/false, yes/no,
male/female, or 0/1.
• A continuous attribute is one whose values are real numbers.
Examples are temperature, height, and weight.
• Typically, nominal and ordinal attributes are binary or discrete, while interval
and ratio attributes are continuous. Counts, such as the number of chairs in a
room, are an exception: they are discrete ratio attributes.
Introduction – Type of Attributes
• Knowing the type of an attribute is important.
• The type of an attribute tells us which properties of the
measured values are consistent with the
underlying properties of the attribute.
• Knowledge about attribute type allows us to avoid
foolish actions, such as computing the average
employee ID.
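A guard against the employee-ID mistake might look like the sketch below: it computes a mean only when the caller declares the attribute as interval or ratio. The function name `safe_mean`, the employee IDs, and the salary figures are hypothetical; the analyst must still supply the correct type label, because the code cannot infer it from the values.

```python
from statistics import mean

# Attribute types for which addition (and therefore an average) is meaningful.
NUMERIC_TYPES = {"interval", "ratio"}

def safe_mean(values, attr_type: str) -> float:
    """Average the values only if the declared attribute type supports it."""
    if attr_type not in NUMERIC_TYPES:
        raise TypeError(f"mean is not meaningful for a {attr_type} attribute")
    return mean(values)

employee_ids = [1034, 2077, 3150]        # nominal: numbers used as labels
salaries = [40000.0, 50000.0, 45000.0]   # ratio attribute

print(safe_mean(salaries, "ratio"))      # 45000.0
try:
    safe_mean(employee_ids, "nominal")
except TypeError as err:
    print(err)  # mean is not meaningful for a nominal attribute
```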
Introduction – Simple Data Taxonomy
Introduction – Data Types and Analytics

• Data comes in many different variable types and
representations.
• Business analytics tools are continuously
improving in their ability to help with the daunting
task of data transformation and data
representation.
• This allows the data requirements of specific
analytic models to be properly met.
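As one illustration of data transformation, the sketch below converts the two qualitative attributes from the earlier tables into numbers a model could consume: year of study gets ordered integer codes because it is ordinal, and eye colour gets one-hot (0/1 indicator) columns because it is nominal and has no order. The helper names and the full list of years are assumptions made for this example.

```python
# Ordinal attribute: the codes must respect First < Second < Third < Fourth.
YEAR_ORDER = ["First", "Second", "Third", "Fourth"]

def encode_year(year: str) -> int:
    """Ordinal encoding: position in the ordered list, starting at 1."""
    return YEAR_ORDER.index(year) + 1

def one_hot(value: str, categories: list[str]) -> list[int]:
    """Nominal encoding: one 0/1 column per category, implying no order."""
    return [1 if value == c else 0 for c in categories]

colours = ["Black", "Blue", "Hazel"]
print(encode_year("Fourth"))     # 4
print(one_hot("Blue", colours))  # [0, 1, 0]
```

Giving eye colour the codes 1, 2, 3 instead would invite a model to treat Hazel as "greater than" Black, which is exactly the kind of meaningless operation on a nominal attribute that the earlier slides warn against.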
