Chapter 1.1 Introduction to Data

Data refers to collected and analyzed information that becomes useful for decision-making once processed. It can be categorized into various types, including quantitative, qualitative, nominal, ordinal, discrete, and continuous data, each serving different purposes in research and analysis. Additionally, data sources can vary from databases and files to external providers, with different levels of interaction and types of data management.

Uploaded by

tsudiksha23cs

What Is Data?

• Data are collected and analyzed; they become information suitable for making decisions
only once they have been analyzed in some fashion.

• Data are used in scientific research, business management (e.g., sales data, revenue,
profits, stock price), finance, governance (e.g., crime rates, unemployment
rates, literacy rates), and in virtually every other form of human organizational
activity (e.g., censuses of the number of homeless people by non-profit
organizations).

• Data are measured, collected, reported, and analyzed, and then used to create
data visualizations such as graphs, tables, or images. As a general concept, data refers
to the fact that some existing information or knowledge is represented or coded in
a form suitable for better usage or processing.

• Raw data ("unprocessed data") is a collection of numbers or characters before it has
been "cleaned" and corrected by researchers. Raw data needs to be corrected to
remove outliers or obvious instrument or data entry errors (e.g., a thermometer
reading from an outdoor Arctic location recording a tropical temperature).
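
As a minimal sketch of the cleaning step described above, the Python snippet below separates
readings that fall outside a plausible range from the rest. The readings, the range limits, and
the split_plausible helper are all hypothetical and chosen only for illustration; real cleaning
rules depend on the instrument and the location.

```python
# Hypothetical raw readings in degrees Celsius from an outdoor Arctic thermometer.
raw_readings = [-31.2, -29.8, 35.0, -30.5, -28.9]

# Assumed plausible range for this location (illustrative values only).
PLAUSIBLE_MIN_C = -70.0
PLAUSIBLE_MAX_C = 25.0


def split_plausible(readings, low, high):
    """Return (kept, rejected): readings inside and outside [low, high]."""
    kept = [r for r in readings if low <= r <= high]
    rejected = [r for r in readings if not (low <= r <= high)]
    return kept, rejected


kept, rejected = split_plausible(raw_readings, PLAUSIBLE_MIN_C, PLAUSIBLE_MAX_C)
print("kept:", kept)
print("rejected:", rejected)
```

Keeping the rejected values, rather than silently discarding them, makes it possible to review
whether a reading was a genuine error or an unusual but real measurement.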

Types of Data
1. Quantitative data
Quantitative data is perhaps the easiest type to explain. It answers key questions such as “how
many”, “how much” and “how often”.

Quantitative data can be expressed as a number or can be quantified. Simply put, it can be
measured by numerical variables.

Quantitative data is readily amenable to statistical manipulation and can be represented by a
wide variety of statistical graphs and charts, such as line graphs, bar graphs, and scatter
plots.

Examples of quantitative data:
• Scores on tests and exams, e.g. 85, 67, 90.
• The weight of a person or a subject.
• Your shoe size.
• The temperature in a room.

There are two general types of quantitative data: discrete data and continuous data. Both are
explained in sections 5 and 6 below.
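
Because quantitative data is numeric, it can be summarized directly with standard statistics.
The sketch below uses Python's built-in statistics module on the three test scores from the
example list; the variable names are illustrative only.

```python
import statistics

# Test scores taken from the example list above.
test_scores = [85, 67, 90]

mean_score = statistics.mean(test_scores)      # arithmetic average
median_score = statistics.median(test_scores)  # middle value when sorted
score_range = max(test_scores) - min(test_scores)

print(f"mean={mean_score:.2f}, median={median_score}, range={score_range}")
```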

2. Qualitative data
Qualitative data can’t be expressed as a number and can’t be measured. Qualitative data
consist of words, pictures, and symbols, not numbers.

Qualitative data is also called categorical data because the information can be sorted by
category, not by number.
Qualitative data can answer questions such as “how this has happened” and “why this has
happened”.

Examples of qualitative data:

• Colors, e.g. the color of the sea.
• Your favorite holiday destination, such as Hawaii or New Zealand.
• Names, such as John or Patricia.
• Ethnicity, such as American Indian or Asian.

3. Nominal data
Nominal data is used just for labeling variables, without any type of quantitative value. The
name ‘nominal’ comes from the Latin word “nomen” which means ‘name’.

Nominal data simply names a thing without placing it in any order. In effect, nominal data
could just be called “labels.”

Examples of nominal data:

• Gender (Women, Men)
• Hair color (Blonde, Brown, Brunette, Red, etc.)
• Marital status (Married, Single, Widowed)
• Ethnicity (Hispanic, Asian)

As you can see from the examples, there is no intrinsic ordering to the variables.

Eye color is a nominal variable having a few categories (Blue, Green, Brown) and there is no
way to order these categories from highest to lowest.

4. Ordinal data
Ordinal data shows where a value sits in an order. This is the crucial difference from nominal
types of data.

Ordinal data is data that is placed into some kind of order by its position on a scale.
Ordinal data may indicate superiority.

However, you cannot do arithmetic with ordinal numbers because they only show
sequence.
Ordinal variables are considered to be “in between” qualitative and quantitative variables.

In other words, ordinal data is qualitative data for which the values are ordered.

Nominal data, by comparison, is qualitative data for which the values cannot be placed in an
order.

We can also assign numbers to ordinal data to show their relative position, but we cannot do
math with those numbers. For example: “first, second, third, etc.”
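
The difference between nominal and ordinal data can be sketched in Python. In this
illustration (all sample values are hypothetical), nominal labels such as eye colors can only
be counted, while ordinal labels such as the economic status levels low, medium, and high can
also be sorted and given a middle value. The rank numbers only record position; adding or
averaging them would not be meaningful.

```python
from collections import Counter

# Nominal: labels with no order. Counting and finding the most common label are meaningful.
eye_colors = ["Blue", "Brown", "Green", "Brown", "Blue", "Brown"]
most_common_color = Counter(eye_colors).most_common(1)[0][0]

# Ordinal: labels with a defined order. The ranks below show relative position only.
STATUS_ORDER = ["low", "medium", "high"]
rank = {label: position for position, label in enumerate(STATUS_ORDER)}

statuses = ["high", "low", "medium", "low", "high"]
sorted_statuses = sorted(statuses, key=lambda label: rank[label])

# With an odd number of values, the median is the middle element of the sorted list.
median_status = sorted_statuses[len(sorted_statuses) // 2]

print(most_common_color)
print(sorted_statuses)
print(median_status)
```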

Examples of ordinal data:

• The first, second and third person in a competition.
• Letter grades: A, B, C, etc.
• A customer's rating of the sales experience on a scale of 1-10.
• Economic status: low, medium and high.

In statistics, marketing research, and data science, many decisions depend on whether the
underlying data is discrete or continuous.

5. Discrete data
Discrete data is a count that involves only integers. The discrete values cannot be subdivided
into parts.

For example, the number of children in a class is discrete data. You can count whole
individuals. You can’t count 1.5 kids.

To put it another way, discrete data can take only certain values. The values cannot be
divided into smaller parts.

It has a limited number of possible values, e.g. the days of the month.

Examples of discrete data:

• The number of students in a class.
• The number of workers in a company.
• The number of home runs in a baseball game.
• The number of test questions you answered correctly.

6. Continuous data
Continuous data is information that could be meaningfully divided into finer levels. It can be
measured on a scale or continuum and can have almost any numeric value.

For example, you can measure your height at very precise scales: meters, centimeters,
millimeters, and so on.

You can record many different kinds of measurement as continuous data: width, temperature,
time, and so on. This is where the key difference from discrete types of data lies.

Continuous variables can take any value between two numbers. For example, between 50
and 72 inches there are infinitely many possible heights, such as 52.04762 inches or
69.948376 inches, limited in practice only by the precision of the measurement.

A good rule for deciding whether data is continuous or discrete is this: if the unit of
measurement can be cut in half and still make sense, the data is continuous.
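
The halving rule can be illustrated with a small Python sketch. The height is one of the
values quoted above; the class size of 31 is a hypothetical count. Halving the height gives
another valid height, while halving the count gives a value that is not a valid count.

```python
height_in_inches = 69.948376  # continuous: a measurement
students_in_class = 31        # discrete: a count (hypothetical value)

half_height = height_in_inches / 2     # still a meaningful height
half_students = students_in_class / 2  # 15.5 is not a meaningful number of students

# A count only makes sense when it is a whole number.
half_students_is_valid_count = half_students.is_integer()

print(half_height, half_students, half_students_is_valid_count)
```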

Examples of continuous data:

• The amount of time required to complete a project.
• The height of children.
• The square footage of a two-bedroom house.
• The speed of cars.

Data Source
A data source, in the context of computer science and computer applications, is the location
from which the data being used originates. In a database management system, the primary data
source is the database, which can be located on a disk or a remote server. The data source for
a computer program can be a file, a data sheet, a spreadsheet, an XML file, or even
hard-coded data within the program.
Data sources can differ according to the application or the field in question. Computer
applications can have multiple data sources defined, depending on their purpose or function.
Applications such as relational database management systems and even websites use
databases as primary data sources. Hardware such as input devices and sensors uses the
environment as the primary data source. A good example is a temperature and pressure
control system for a fluid circulation system, such as those used in factories and oil
refineries, which takes all related data from the environment or whatever it is monitoring,
so the data source here is the environment. Data such as the temperature and pressure of the
fluid is taken by sensors regularly and then stored in a database, which then becomes the
primary data source for another computer application that manipulates and presents this data.
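
The chain described above (environment, then sensors, then a database, then another
application) can be sketched with Python's built-in sqlite3 module. This is only an
illustration under stated assumptions: the readings are made-up values standing in for real
sensor output, the table and column names are hypothetical, and an in-memory database stands
in for a disk or remote server.

```python
import sqlite3

# Made-up sensor readings: (timestamp, temperature in Celsius, pressure in kPa).
sensor_readings = [
    ("2024-01-01T00:00", 80.0, 200.0),
    ("2024-01-01T00:10", 82.0, 205.0),
    ("2024-01-01T00:20", 84.0, 203.0),
]

# Step 1: the monitoring system stores what the sensors measured in a database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (taken_at TEXT, temperature REAL, pressure REAL)"
)
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", sensor_readings)
conn.commit()

# Step 2: a second application treats the database as its primary data source.
avg_temperature, max_pressure = conn.execute(
    "SELECT AVG(temperature), MAX(pressure) FROM readings"
).fetchone()

print(f"average temperature: {avg_temperature}, maximum pressure: {max_pressure}")
conn.close()
```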

The term “data source” is most commonly used in the context of databases, database
management systems, or any system that deals primarily with data. There it is often
identified by a data source name (DSN), which is defined in the application so that it can
find the location of the data. It simply means what the words mean: where data is coming from.
Depending on context, a data source can be:
(1) the physical or digital location where the data in question is stored as a data table
(or other format),
(2) the degree of originality of a data table,
(3) a brand-name data provider,
(4) the data used via a self-service data tool such as Excel, Tableau, or Power BI,
(5) the computer storage type, i.e., File Data Source or Machine Data Source,
(6) a technical database such as Amazon AWS or Microsoft Azure,
(7) a legacy data source with a proper name within an organization,
(8) a data type such as stock, accounting, or economic indicator.

Types of Data Source


1. Data Table Level

The basic interaction with data sources is found at the data table level. A data table is nothing
more than columns and rows. Each row holds an ID and an entry under each column that
describes the row, whereas each column contains the entries of every ID for the specific
attribute that the column describes. In my article on data sets, I explain this with the
following example table:

Item        Color   Weight
Jeep        Green   2.5 tons
Honda       Blue    2 tons
BMW         Gray    2 tons
Ford        Blue    2.5 tons
Chevrolet   Green   2.5 tons
Lincoln     Blue    2 tons
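
One way to picture rows and columns is to hold the example table in Python as a list of
records. This is a minimal sketch, not the only possible representation; the field names
item, color, and weight_tons are assumptions made for the illustration.

```python
# Each dictionary is one row; each key is one column of the example table.
cars = [
    {"item": "Jeep", "color": "Green", "weight_tons": 2.5},
    {"item": "Honda", "color": "Blue", "weight_tons": 2.0},
    {"item": "BMW", "color": "Gray", "weight_tons": 2.0},
    {"item": "Ford", "color": "Blue", "weight_tons": 2.5},
    {"item": "Chevrolet", "color": "Green", "weight_tons": 2.5},
    {"item": "Lincoln", "color": "Blue", "weight_tons": 2.0},
]

# A row: every entry that describes a single ID (here, the item name).
jeep_row = next(row for row in cars if row["item"] == "Jeep")

# A column: the entries of every ID for one attribute.
color_column = [row["color"] for row in cars]

print(jeep_row)
print(color_column)
```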

2. Conceptual Level

When discussing data sources in a professional setting, a common issue is misunderstanding
around original data. Most data we consume and read in headlines is aggregate data, that is,
data that’s been averaged, summed, divided, or otherwise mathematically manipulated.
Original data is data that exists just as it was collected. Each row represents the raw form of
the data as it was collected, like the example table of cars shown above.
However, if I created a smaller table from that original table of cars, holding the count and
average weight for each color, as in the table below, it would be an aggregate data source.

Color   Number   Avg. Weight
Green   2        2.5 tons
Blue    3        2.17 tons
Gray    1        2 tons
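
The aggregate table can be derived from the original car table by grouping on color. The
sketch below shows one way to do this in plain Python; the variable and field names are
illustrative assumptions, and the averages are rounded to two decimal places to match the
table.

```python
from collections import defaultdict

# Original data: one tuple per car (item, color, weight in tons).
cars = [
    ("Jeep", "Green", 2.5),
    ("Honda", "Blue", 2.0),
    ("BMW", "Gray", 2.0),
    ("Ford", "Blue", 2.5),
    ("Chevrolet", "Green", 2.5),
    ("Lincoln", "Blue", 2.0),
]

# Group the weights by color.
weights_by_color = defaultdict(list)
for _item, color, weight_tons in cars:
    weights_by_color[color].append(weight_tons)

# Aggregate data: a count and an average per color.
aggregate = {
    color: {
        "number": len(weights),
        "avg_weight_tons": round(sum(weights) / len(weights), 2),
    }
    for color, weights in weights_by_color.items()
}

for color, summary in aggregate.items():
    print(color, summary["number"], summary["avg_weight_tons"])
```

Note that the aggregate table cannot be turned back into the original one: the individual
cars and their weights are lost once only counts and averages are kept.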

3. Research Level

When we’re looking for data from an external provider such as Google Finance or Data.gov,
“data source” refers to the brands themselves. This is the research level because it occurs
when we’re looking for external data to use in an internal assessment, i.e., research. In my
article on data sets, I outlined the following data sources that can be used in research:
1. Kaggle. Kaggle has a good variety of data sets on machine learning. It requires
registration but is worth it.
2. FiveThirtyEight. FiveThirtyEight is a news and sports site with data sets that
are available on GitHub.
3. BuzzFeed. BuzzFeed is a news and entertainment site that publishes data used
in its articles on GitHub.
4. Reddit. Reddit hosts data sets shared by contributors.

4. Self-Service Application Level

When we’re working with self-service data applications such as Tableau and Power BI, the
data source is the tabular data available via our connection. We can connect to different
servers, tables, and joins, but that is the extent of it.
At the self-service application level, data source can mean data from any brand, and data
that’s original or aggregate, as long as it’s available for connection.

5. Computer Level

When we’re talking about computers and the actual location of data storage, the topic is
slightly different. Computer level scope does not concern tabular data used by analysts, but
instead how a computer stores information.

Computers store data in two ways:

• Machine Data Sources
• File Data Sources

Machine Data Sources are unique to each physical machine. A single desktop can have many
machine data sources, which are stored in its Windows Registry. These sources are not
transferable between machines. Moreover, Machine Data Sources can be further split into
user-defined and system-defined.
File Data Sources are data stored in independent text files. They are not unique to each
computer and can be transferred across devices.

6. Database Level

Perhaps the most common place for data sources is databases. A database is defined not only
by the data it holds but also by the brand of the tool used to create it. Common examples
include Microsoft Azure, Amazon AWS, Dynamics 365, and SAP. Each of these tools works
as a data warehouse or as an enterprise resource planning (ERP) tool.
If you hear “what is the database data source?” at the database level, the correct answer is
the brand name of the software that hosts the data and the data itself.

7. Legacy Level

Legacy data sources are databases whose technical structure is built within a company that
does not specialize in database creation.

Many digital companies have built internal data warehouses to handle transactional data.
Today, databases are most often outsourced (to AWS or Azure for example), but there was a
time when in-house solutions were preferable. As you can imagine, once the data
infrastructure is set, it’s not altogether easy to modify, so these legacy systems still exist in
many places.
You may hear the question “what is the data source?” If the question is at the legacy level,
the correct answer is the name of the legacy system.

8. Data Type Level

Data sources can also be thought of as data types, such as accounting, stock, transactional, or
economic indicators. Usually the data type comes from an external source, and there are few
subcategories to choose from.

For example, NASA Earth Observation Data is concerned with the biosphere, agriculture, and
other Earth-related topics.
