02 DataCategorization
Data Categorization
Ordinal scale
Interval and ratio-scale
Multidimensional data model
[Figure: taxonomy of data categories. Recoverable labels: Symmetric, Asymmetric, Numerically Ordered, Literally Ordered, Continuous]
1. Distinctiveness: = and ≠
2. Order: <, ≤, >, ≥
3. Addition: + and -
4. Multiplication: * and /
Properties 1 and 2: Categorical (Qualitative)
Properties 3 and 4: Numerical (Quantitative)
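The four properties build on one another, so each measurement scale can be described by how many of them it supports. The following is a minimal sketch under that reading; the names `PROPERTIES`, `SCALE_LEVEL`, `supported_properties`, and `is_categorical` are illustrative assumptions, not part of the slides.

```python
# Properties in the order listed above; each scale supports a prefix of this list.
PROPERTIES = ["distinctiveness", "order", "addition", "multiplication"]

# Assumed mapping: number of properties each NOIR scale supports.
SCALE_LEVEL = {
    "nominal": 1,   # = and != only
    "ordinal": 2,   # also <, <=, >, >=
    "interval": 3,  # also + and -
    "ratio": 4,     # also * and /
}


def supported_properties(scale):
    """Return the list of properties that are meaningful on the given scale."""
    return PROPERTIES[:SCALE_LEVEL[scale]]


def is_categorical(scale):
    """Scales with only properties 1 and 2 are categorical (qualitative)."""
    return SCALE_LEVEL[scale] <= 2


print(supported_properties("ordinal"))
print(is_categorical("interval"))
```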
@DSamanta, IIT Kharagpur Data Analytics (CS61061) 9
NOIR summary
✓ Nominal (with distinctiveness property only)
Examples
Gender: letters or numbers are used, e.g., {M, F} or {1, 0}
Country code ??
????
Examples
Switch: {ON, OFF}
Attendance: {True, False}
Entry: {Yes, No}
etc.
Note
A Binary variable is a special case of a nominal variable that
takes only two possible values.
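As a rough illustration, a binary variable can be recognized by counting its distinct values, and its two labels can be recoded as 1 and 0 without gaining any ordering or arithmetic meaning. This is a minimal sketch; the helper `is_binary` is hypothetical, and it inspects observed values only, which is an assumption about how the variable's domain is known.

```python
def is_binary(values):
    """Return True if the observed values take exactly two distinct states."""
    return len(set(values)) == 2


# Switch states from the example above.
switch = ["ON", "OFF", "ON"]

# Recode the two labels as 1/0; the numbers are only names for the states.
encoded = [1 if state == "ON" else 0 for state in switch]

print(is_binary(switch))
print(encoded)
```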
Note
The values assumed by an ordinal variable can be ordered
among themselves as each pair of values can be compared
literally or using relational operators ( < , ≤ , > , ≥ ).
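One way to make this ordering explicit in code is to attach a rank to each label and compare ranks. The sketch below assumes a hypothetical ordinal variable with the labels small, medium, and large; the labels and the names `SIZE_RANK` and `precedes` are illustrative and do not come from the slides.

```python
# Hypothetical ordinal labels with an explicit rank for comparison.
SIZE_RANK = {"small": 0, "medium": 1, "large": 2}


def precedes(a, b):
    """Return True if label a is strictly lower than label b in the ordering."""
    return SIZE_RANK[a] < SIZE_RANK[b]


observed = ["large", "small", "medium", "small"]

# Sorting by rank is meaningful; adding or averaging the ranks is not.
ordered = sorted(observed, key=SIZE_RANK.get)

print(precedes("small", "large"))
print(ordered)
```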
Note
Interval data have well-defined intervals, so the difference between two values is meaningful.
Interval data are measured on a numeric scale (with positive, zero, and negative
values).
Interval data have a zero point as origin. However, the origin does not imply a
true absence of the measured characteristic.
For example, for temperature in Celsius or Fahrenheit, 0° does not mean the absence
of temperature (that is, no heat).
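Because the zero point is arbitrary, a ratio of two interval values changes with the unit, while a difference only rescales. The sketch below illustrates this with two assumed temperatures (20 and 10 degrees Celsius, chosen only for illustration) and the standard Celsius to Fahrenheit conversion.

```python
def celsius_to_fahrenheit(c):
    """Standard conversion: F = C * 9/5 + 32."""
    return c * 9 / 5 + 32


# Two illustrative temperatures in Celsius (assumed values).
a_c, b_c = 20.0, 10.0
a_f, b_f = celsius_to_fahrenheit(a_c), celsius_to_fahrenheit(b_c)

# Ratios disagree between units, so "twice as hot" is not a meaningful claim.
ratio_c = a_c / b_c
ratio_f = a_f / b_f

# Differences agree up to the constant scale factor 9/5.
diff_c = a_c - b_c
diff_f = a_f - b_f

print(ratio_c, ratio_f)
print(diff_c, diff_f)
```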
Note
All ratio data are interval data but the reverse is not true.
On a ratio scale, both differences between data values and ratios
of (non-zero) data pairs are meaningful.
Ratio data may be on a linear or non-linear scale.
Both interval and ratio data can be stored in the same data type
(e.g., integer, float, double, etc.)
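A quick way to see why ratios are meaningful on a ratio scale is that they survive a change of unit, since the zero point is a true zero. The sketch below uses two assumed lengths in metres (illustrative values) and also shows that the storage type alone does not say which scale a value belongs to.

```python
def metres_to_centimetres(m):
    """Unit change on a ratio scale: pure multiplication, zero stays zero."""
    return m * 100


# Two illustrative lengths in metres (assumed values).
a_m, b_m = 4.0, 2.0
a_cm, b_cm = metres_to_centimetres(a_m), metres_to_centimetres(b_m)

# The ratio is the same in both units.
ratio_m = a_m / b_m
ratio_cm = a_cm / b_cm

# A ratio value and an interval value (e.g., a Celsius reading) share a type.
length_m = 4.0
temperature_c = 20.0
same_type = type(length_m) is type(temperature_c)

print(ratio_m, ratio_cm, same_type)
```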
Operation on Ratio data
All arithmetic operations on interval data are
applicable to ratio data.
Example.
Rainfall data of the Meteorological Department
Time (Year, Season, Month, Week, Day, etc.)
Location (Country, Region, State, etc.)
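The rainfall example can be sketched as a small set of facts indexed by a time dimension and a location dimension, which can then be aggregated to a coarser level. The code below is a minimal sketch: the keys (year, month, state), the state names "A" and "B", and all rainfall figures are made-up toy values, and `roll_up` is a hypothetical helper.

```python
from collections import defaultdict

# Toy facts: (year, month, state) -> rainfall in mm. All values are made up.
facts = {
    (2020, "Jun", "A"): 100.0,
    (2020, "Jul", "A"): 150.0,
    (2020, "Jun", "B"): 80.0,
    (2021, "Jun", "A"): 120.0,
}


def roll_up(facts, keep):
    """Sum rainfall over all dimensions except the key positions in `keep`."""
    totals = defaultdict(float)
    for key, rainfall_mm in facts.items():
        reduced_key = tuple(key[i] for i in keep)
        totals[reduced_key] += rainfall_mm
    return dict(totals)


# Aggregate months away: rainfall per (year, state).
by_year_state = roll_up(facts, keep=(0, 2))
print(by_year_state)
```

Keeping more key positions corresponds to a finer view of the same data, and keeping fewer gives a coarser summary.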
2-D view of rainfall data
DRILL DOWN
Data cube segregation
BASE CUBOID
SLICE
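A slice can be read as fixing one dimension of the base cuboid to a single value, which leaves a cube with one dimension fewer. The sketch below assumes a toy base cuboid keyed by (year, month, state); the keys, the state names, the rainfall figures, and the helper `slice_cube` are all illustrative assumptions.

```python
# Toy base cuboid: (year, month, state) -> rainfall in mm. All values are made up.
base_cuboid = {
    (2020, "Jun", "A"): 100.0,
    (2020, "Jul", "A"): 150.0,
    (2020, "Jun", "B"): 80.0,
    (2021, "Jun", "A"): 120.0,
}


def slice_cube(cube, axis, value):
    """Keep cells where dimension `axis` equals `value`, then drop that dimension."""
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }


# Slice on year = 2020: the result is a 2-D view over (month, state).
year_2020 = slice_cube(base_cuboid, axis=0, value=2020)
print(year_2020)
```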
Data representation
How can a document (e.g., text) be represented?