Chapter 1.1 Introduction to Data

Data refers to collected and analyzed information that becomes useful for decision-making once processed. It can be categorized into various types, including quantitative, qualitative, nominal, ordinal, discrete, and continuous data, each serving different purposes in research and analysis. Additionally, data sources can vary from databases and files to external providers, with different levels of interaction and types of data management.

Uploaded by

tsudiksha23cs

What Is Data?

• Data are collected and analyzed; data becomes information suitable for making
decisions only once it has been analyzed in some fashion.

• Data are used in scientific research, business management (e.g., sales data, revenue,
profits, stock prices), finance, governance (e.g., crime rates, unemployment
rates, literacy rates), and in virtually every other form of human organizational
activity (e.g., censuses of the number of homeless people by non-profit
organizations).

• Data are measured, collected, reported, and analyzed, and are used to create
data visualizations such as graphs, tables, or images. As a general concept, data refers
to the fact that some existing information or knowledge is represented or coded in
a form suitable for use or processing.

• Raw data ("unprocessed data") is a collection of numbers or characters before it has
been "cleaned" and corrected by researchers. Raw data needs to be corrected to
remove outliers or obvious instrument or data-entry errors (e.g., a thermometer
at an outdoor Arctic location recording a tropical temperature).
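
The cleaning step can be sketched as a simple range check in Python. This is a minimal illustration, not a standard procedure: the readings and the plausible range of -70 °C to 20 °C are hypothetical assumptions chosen for an outdoor Arctic site.

```python
# Hypothetical raw readings (degrees Celsius) from an outdoor Arctic thermometer.
raw_readings = [-31.5, -29.8, -30.2, 34.0, -28.9, -32.1]

# Assumed plausibility range for this site; values outside it are treated as
# instrument or data-entry errors rather than real temperatures.
PLAUSIBLE_MIN_C = -70.0
PLAUSIBLE_MAX_C = 20.0


def clean(readings, low=PLAUSIBLE_MIN_C, high=PLAUSIBLE_MAX_C):
    """Split raw readings into (kept, rejected) using a simple range check."""
    kept = [r for r in readings if low <= r <= high]
    rejected = [r for r in readings if not (low <= r <= high)]
    return kept, rejected


kept, rejected = clean(raw_readings)
print("kept:", kept)
print("rejected:", rejected)  # [34.0], a tropical value at an Arctic site
```

A real cleaning rule would depend on the instrument and the site; a fixed range is only the simplest possible check.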

Types of Data
1. Quantitative data
Quantitative data is perhaps the easiest type to explain. It answers key questions such as
“how many”, “how much”, and “how often”.

Quantitative data can be expressed as a number or can be quantified. Simply put, it can be
measured by numerical variables.

Quantitative data is easily amenable to statistical manipulation and can be represented by a
wide variety of statistical graphs and charts, such as line graphs, bar graphs, and scatter
plots.
Examples of quantitative data:
• Scores on tests and exams, e.g., 85, 67, 90.
• The weight of a person or a subject.
• Your shoe size.
• The temperature in a room.
There are two general types of quantitative data: discrete data and continuous data. Both
are explained later in this article.
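
Because quantitative data is numeric, standard summary statistics apply directly. A minimal Python sketch using the standard-library `statistics` module on the example test scores (treating those three scores as the whole data set is an assumption for illustration):

```python
import statistics

# The example test scores from the list above.
scores = [85, 67, 90]

mean_score = statistics.mean(scores)      # arithmetic mean
median_score = statistics.median(scores)  # middle value after sorting
score_range = max(scores) - min(scores)   # spread between the extremes

print(round(mean_score, 2))  # 80.67
print(median_score)          # 85
print(score_range)           # 23
```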

2. Qualitative data
Qualitative data can’t be expressed as a number and can’t be measured. Qualitative data
consist of words, pictures, and symbols, not numbers.

Qualitative data is also called categorical data because the information can be sorted by
category, not by number.
Qualitative data can answer questions such as “how this has happened” and “why this has
happened”.

Examples of qualitative data:

• Colors, e.g., the color of the sea
• Your favorite holiday destination, such as Hawaii or New Zealand
• Names, such as John and Patricia
• Ethnicity, such as American Indian or Asian
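
Qualitative data cannot be averaged, but it can still be summarized by counting how often each category appears. A minimal Python sketch, assuming a hypothetical set of survey answers about favorite holiday destinations:

```python
from collections import Counter

# Hypothetical survey answers: "What is your favorite holiday destination?"
answers = ["Hawaii", "New Zealand", "Hawaii", "Iceland", "Hawaii", "New Zealand"]

# Count each category; most_common() lists categories from most to least frequent.
counts = Counter(answers)
print(counts.most_common())
# [('Hawaii', 3), ('New Zealand', 2), ('Iceland', 1)]
```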

3. Nominal data
Nominal data is used just for labeling variables, without any type of quantitative value. The
name ‘nominal’ comes from the Latin word “nomen”, which means ‘name’.

Nominal data simply names things without placing them in any order. In fact, nominal data
could just be called “labels.”

Examples of Nominal Data:

• Gender (Women, Men)
• Hair color (Blonde, Brown, Brunette, Red, etc.)
• Marital status (Married, Single, Widowed)
• Ethnicity (Hispanic, Asian)
As the examples show, there is no intrinsic ordering of the categories.

Eye color is a nominal variable having a few categories (Blue, Green, Brown) and there is no
way to order these categories from highest to lowest.
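
One way to see the "labels only" property in code is a Python `Enum` with no ordering defined: members can be tested for equality but not ranked. A minimal sketch; the class name `EyeColor` is hypothetical and the categories come from the eye-color example above.

```python
from enum import Enum


class EyeColor(Enum):
    """Nominal categories: the values are arbitrary labels, not quantities."""
    BLUE = "blue"
    GREEN = "green"
    BROWN = "brown"


a, b = EyeColor.BLUE, EyeColor.BROWN

# Equality is the only meaningful comparison for nominal data.
print(a == b)  # False

# No ordering is defined, so Python refuses to sort the labels.
try:
    sorted([a, b])
    orderable = True
except TypeError:
    orderable = False

print(orderable)  # False
```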

4. Ordinal data
Ordinal data shows where a value falls in an order. This is the crucial difference from
nominal data.

Ordinal data is placed into some kind of order by its position on a scale.
Ordinal data may indicate superiority.

However, you cannot do arithmetic with ordinal values because they only show
sequence.
Ordinal variables are considered to be “in between” qualitative and quantitative variables.

In other words, ordinal data is qualitative data whose values are ordered.

By comparison, nominal data is qualitative data whose values cannot be placed in an
order.

We can also assign numbers to ordinal data to show relative position, for example “first,
second, third”, but we cannot do math with those numbers.

Examples of Ordinal Data:

• The first, second, and third person in a competition.
• Letter grades: A, B, C, etc.
• A customer's rating of a sales experience on a scale of 1 to 10.
• Economic status: low, medium, and high.
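
Ordinal data, by contrast, has an order but no meaningful arithmetic. A minimal Python sketch, assuming a hypothetical `EconomicStatus` enum whose values 1 to 3 encode rank only; it defines `<` so the categories can be sorted, and leaves `+` undefined so they cannot be added:

```python
from enum import Enum


class EconomicStatus(Enum):
    """Ordinal categories; the values 1-3 encode rank, not amounts."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3

    def __lt__(self, other):
        # Ordering is meaningful for ordinal data, so "<" compares ranks.
        if not isinstance(other, EconomicStatus):
            return NotImplemented
        return self.value < other.value


responses = [EconomicStatus.HIGH, EconomicStatus.LOW, EconomicStatus.MEDIUM]
print([s.name for s in sorted(responses)])  # ['LOW', 'MEDIUM', 'HIGH']

# Arithmetic is deliberately unsupported: "LOW + HIGH" has no meaning.
try:
    EconomicStatus.LOW + EconomicStatus.HIGH
    can_add = True
except TypeError:
    can_add = False

print(can_add)  # False
```
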

In statistics, marketing research, and data science, many decisions depend on whether the
underlying data is discrete or continuous.

5. Discrete data
Discrete data is a count that involves only integers. The discrete values cannot be subdivided
into parts.

For example, the number of children in a class is discrete data. You can count whole
individuals. You can’t count 1.5 kids.

In other words, discrete data can take only certain values; the values cannot be
divided into smaller parts.

It has a limited number of possible values, e.g., the days of the month.

Examples of discrete data:

• The number of students in a class.
• The number of workers in a company.
• The number of home runs in a baseball game.
• The number of test questions you answered correctly.

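
Counting is what produces discrete data. A minimal Python sketch, assuming hypothetical answers and an answer key for a five-question test: the score is a count of matches, so it can only be a whole number.

```python
# Hypothetical answers and answer key for a five-question test.
answers = ["B", "C", "A", "D", "B"]
key = ["B", "A", "A", "D", "C"]

# Counting matches can only produce a whole number (0, 1, 2, ...).
correct = sum(1 for given, expected in zip(answers, key) if given == expected)
print(correct)                   # 3
print(isinstance(correct, int))  # True
```
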
6. Continuous data
Continuous data is information that could be meaningfully divided into finer levels. It can be
measured on a scale or continuum and can have almost any numeric value.

For example, you can measure your height at very precise scales: meters, centimeters,
millimeters, and so on.

You can record continuous data for many different kinds of measurement, such as width,
temperature, and time. This is the key difference from discrete data.

A continuous variable can take any value between two numbers. For example, between 50
and 72 inches there are infinitely many possible heights, such as 52.04762 inches or
69.948376 inches.

A good rule of thumb for deciding whether data is continuous or discrete: if the unit of
measurement can be cut in half and still make sense, the data is continuous.

Examples of continuous data:

• The amount of time required to complete a project.
• The height of children.
• The square footage of a two-bedroom house.
• The speed of cars.

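
The idea that a continuous variable has values between any two values can be illustrated by repeatedly halving an interval. A minimal Python sketch (3.9+) using the 50 to 72 inch height range from the text; the function name `halvings` is hypothetical.

```python
def halvings(low: float, high: float, steps: int) -> list[float]:
    """Repeatedly take the midpoint between `low` and the previous midpoint.

    Every result is still a valid height, illustrating that a continuous
    variable has further values between any two values.
    """
    points = []
    for _ in range(steps):
        high = (low + high) / 2
        points.append(high)
    return points


print(halvings(50.0, 72.0, 5))  # [61.0, 55.5, 52.75, 51.375, 50.6875]

# Contrast with discrete data: half of 3 students is not a valid count.
print(3 / 2)  # 1.5
```
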
Data Source
A data source, in the context of computer science and computer applications, is the location
where the data being used comes from. In a database management system, the primary data
source is the database, which can be located on a disk or a remote server. The data source for
a computer program can be a file, a data sheet, a spreadsheet, an XML file, or even data
hard-coded within the program.
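
To illustrate that a program's data can come from different sources, here is a minimal Python sketch that reads the same two records from hard-coded data, a CSV file, and an XML file. For the sketch to be self-contained, the file contents are assumed and held in strings rather than read from disk.

```python
import csv
import io
import xml.etree.ElementTree as ET

# 1. Hard-coded data inside the program.
hard_coded = [{"item": "Jeep", "color": "Green"}, {"item": "Honda", "color": "Blue"}]

# 2. A CSV file (an in-memory string stands in for a file on disk).
csv_text = "item,color\nJeep,Green\nHonda,Blue\n"
from_csv = list(csv.DictReader(io.StringIO(csv_text)))

# 3. An XML file (again simulated with a string).
xml_text = """<cars>
  <car item="Jeep" color="Green"/>
  <car item="Honda" color="Blue"/>
</cars>"""
from_xml = [dict(car.attrib) for car in ET.fromstring(xml_text).iter("car")]

# The program sees the same records no matter which source supplied them.
print(hard_coded == from_csv == from_xml)  # True
```
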
Data sources can differ according to the application or the field in question. Computer
applications can have multiple data sources defined, depending on their purpose or function.
Applications such as relational database management systems, and even websites, use
databases as their primary data sources. Hardware such as input devices and sensors uses the
environment as its primary data source. A good example is a temperature and pressure
control system for fluid circulation, such as those used in factories and oil
refineries: it takes all related data from the environment, or from whatever it is monitoring,
so the data source here is the environment. Sensors regularly read data such as the temperature
and pressure of the fluid, and the readings are stored in a database, which then becomes the
primary data source for another computer application that manipulates and presents the data.
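
The sensor pipeline described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumptions: the readings, the table name `readings`, and its columns are hypothetical, and an in-memory database stands in for a real one.

```python
import sqlite3

# Stage 1: sensors read the environment (these values are hypothetical).
sensor_readings = [
    ("2024-01-01T00:00", 81.2, 3.1),
    ("2024-01-01T00:05", 82.0, 3.3),
    ("2024-01-01T00:10", 83.5, 3.2),
]

# Stage 2: the readings are stored in a database (in-memory for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (taken_at TEXT, temperature_c REAL, pressure_bar REAL)"
)
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", sensor_readings)
conn.commit()

# Stage 3: a separate reporting application uses the database as its data source.
avg_temp, max_pressure = conn.execute(
    "SELECT AVG(temperature_c), MAX(pressure_bar) FROM readings"
).fetchone()
print(round(avg_temp, 2), max_pressure)  # 82.23 3.3
conn.close()
```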

The term data source is most commonly used in the context of databases and database management
systems, or any system that primarily deals with data. There a data source is identified by a
data source name (DSN), which is defined in the application so that it can find the location of
the data. The term simply means what the words say: where the data is coming from.
A data source can refer to:
(1) the physical or digital location where the data in question is stored as a data table
(or in another format);
(2) the degree of originality of a data table;
(3) a brand-name data provider;
(4) the data used via a self-service data tool such as Excel, Tableau, or Power BI;
(5) the computer storage type, i.e., a File Data Source or a Machine Data Source;
(6) a technical database hosted on a platform such as Amazon AWS or Microsoft Azure;
(7) a legacy data source with a proper name within an organization;
(8) a data type such as stock, accounting, or economic indicator data.

Types of Data Source

1. Data Table Level

The basic interaction with data sources happens at the data table level. A data table is nothing
more than columns and rows. Each row holds an ID plus an entry under each column describing
that row, whereas each column holds, for every ID, the entry for the particular attribute that
column describes. In my article on data sets, I explain this with the following example
table:
Item        Color   Weight
Jeep        Green   2.5 tons
Honda       Blue    2 tons
BMW         Gray    2 tons
Ford        Blue    2.5 tons
Chevrolet   Green   2.5 tons
Lincoln     Blue    2 tons
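
The row/column distinction can be shown with a minimal Python sketch that stores the example table as a list of rows; the helper names `get_row` and `get_column` are hypothetical, and weights are stored as plain numbers of tons.

```python
# The example table: "Item" is the row ID; the other columns describe each item.
columns = ["Item", "Color", "Weight (tons)"]
rows = [
    ["Jeep", "Green", 2.5],
    ["Honda", "Blue", 2.0],
    ["BMW", "Gray", 2.0],
    ["Ford", "Blue", 2.5],
    ["Chevrolet", "Green", 2.5],
    ["Lincoln", "Blue", 2.0],
]


def get_row(item_id):
    """Return one row (every descriptor for a single ID) as a dict."""
    for row in rows:
        if row[0] == item_id:
            return dict(zip(columns, row))
    raise KeyError(item_id)


def get_column(name):
    """Return one column (a single descriptor for every ID)."""
    index = columns.index(name)
    return [row[index] for row in rows]


print(get_row("BMW"))       # {'Item': 'BMW', 'Color': 'Gray', 'Weight (tons)': 2.0}
print(get_column("Color"))  # ['Green', 'Blue', 'Gray', 'Blue', 'Green', 'Blue']
```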

2. Conceptual Level

When discussing data sources in a professional setting, a common issue is misunderstanding
around original data. Most data we consume and read in headlines is aggregate data: data
that has been averaged, summed, divided, or otherwise mathematically manipulated.
Original data is data that exists just as it was collected. Each row represents the raw form of
the data as it was collected, like the example shown above.
However, if I created a smaller table from the original car data above, with the count and
average weight for each color, as in the table below, it would be an aggregate data source.

Color   Number   Avg. Weight
Green   2        2.5 tons
Blue    3        2.17 tons
Gray    1        2 tons
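
Deriving the aggregate table from the original one takes only a grouping step. A minimal Python sketch (the function name `aggregate_by_color` is hypothetical) that reproduces the counts and average weights above, rounding averages to two decimals as the table does:

```python
from collections import defaultdict

# Original data: one row per car, exactly as collected (weights in tons).
cars = [
    ("Jeep", "Green", 2.5),
    ("Honda", "Blue", 2.0),
    ("BMW", "Gray", 2.0),
    ("Ford", "Blue", 2.5),
    ("Chevrolet", "Green", 2.5),
    ("Lincoln", "Blue", 2.0),
]


def aggregate_by_color(rows):
    """Collapse original rows into one entry per color: (count, average weight)."""
    weights_by_color = defaultdict(list)
    for _item, color, weight in rows:
        weights_by_color[color].append(weight)
    return {
        color: (len(ws), round(sum(ws) / len(ws), 2))
        for color, ws in weights_by_color.items()
    }


for color, (count, avg) in aggregate_by_color(cars).items():
    print(f"{color:<6} {count}  {avg} tons")
```

The aggregate keeps only the per-color summaries; the individual cars, and therefore the original rows, cannot be recovered from it.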

3. Research Level

When we’re looking for data from an external provider such as Google Finance or Data.gov,
“data source” refers to the brands themselves. This is the research level, because it occurs
when we’re looking for external data to use in an internal assessment, i.e., research. In my
article on data sets, I outlined the following data sources that can be used in research:
1. Kaggle. Kaggle has a good variety of data sets for machine learning. It requires
registration, but it is worth it.
2. FiveThirtyEight. FiveThirtyEight is a news and sports site whose data sets are
available on GitHub.
3. BuzzFeed. BuzzFeed is a news and entertainment site that publishes the data used
in its articles on GitHub.
4. Reddit. Reddit hosts data sets shared by contributors.

4. Self-Service Application Level

When we’re working with self-service data applications such as Tableau and Power BI, the
data source is the tabular data available via our connection. We can connect to different
servers, tables, and joins, but that is the extent of it.
At the self-service application level, a data source can mean data from any brand, and data
that is original or aggregate, as long as it is available for connection.

5. Computer Level

When we’re talking about computers and the actual location of data storage, the topic is
slightly different. The computer-level scope does not concern the tabular data used by analysts,
but instead how a computer stores information.

Computers define data sources in two ways:

• Machine Data Sources

• File Data Sources

Machine Data Sources are unique to each physical machine. A single desktop can have many
machine data sources, which are stored in its Windows Registry. These sources are not
transferable between machines. Machine Data Sources can be further split into user-defined
and system-defined data sources.
File Data Sources are defined in independent text files. They are not unique to each
computer and can be transferred across devices.
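
Because a File Data Source is an ordinary text file, any program can read it. A minimal Python sketch, assuming the INI-style layout with an `[ODBC]` section that ODBC file DSNs typically use; the driver, server, and database values are hypothetical placeholders, and the code only reads the settings rather than connecting to anything.

```python
import configparser

# Hypothetical contents of a File DSN (.dsn) file; every value is a placeholder.
dsn_text = """
[ODBC]
DRIVER=SQL Server
SERVER=example-server
DATABASE=SalesDB
"""

parser = configparser.ConfigParser()
parser.read_string(dsn_text)

# configparser lower-cases option names by default.
settings = dict(parser["ODBC"])
print(settings)
# {'driver': 'SQL Server', 'server': 'example-server', 'database': 'SalesDB'}
```

Copying such a file to another machine carries the definition with it, which is what makes file data sources transferable; a machine data source lives in the registry and stays behind.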

6. Database Level

Perhaps the most common place for data sources is databases. A database is defined not only
by the data it holds but also by the brand of the tool used to create it. Common examples
include Microsoft Azure, Amazon AWS, Dynamics 365, and SAP. Each of these tools works
as a data warehouse or as an enterprise resource planning (ERP) tool.
If you hear “what is the database data source?”, the correct answer at the database level is
the brand name of the software that hosts the data AND the data itself.

7. Legacy Level

Legacy data sources are databases whose technical structure is built within a company that
does not specialize in database creation.

Many digital companies have built internal data warehouses to handle transactional data.
Today, databases are most often outsourced (to AWS or Azure for example), but there was a
time when in-house solutions were preferable. As you can imagine, once the data
infrastructure is set, it’s not altogether easy to modify, so these legacy systems still exist in
many places.
You may hear the question “what is the data source?” If the question is at the legacy level,
the correct answer is the name of the legacy system.

8. Data Type Level

Data sources can also be thought of as data types, such as accounting, stock, transactional, or
economic indicator data. Usually the data type comes from an external source, and there are
few subcategories to choose from.

For example, NASA Earth Observation Data is concerned with the biosphere, agriculture, and
other Earth-related topics.
