Unit 3

KPRIET AI Fundamentals and Machine Learning Unit 3

Uploaded by

22cs103

Unit-3

Components of learning – Types of machine learning – Data Objects and Attribute Types – Basic Statistical Descriptions of Data – Data Preprocessing: Cleaning – Integration – Feature selection – Feature extraction by Principal Component Analysis – Data Transformation by Normalization – Discretization: Binning and Histogram analysis.

Machine Learning Basics and Preprocessing
What is machine learning?
• Machine learning, as the name suggests, gives machines the ability to learn autonomously from experience, observations, and patterns in a given data set, without being explicitly programmed.
Examples of machine learning
• Facebook: The facial recognition algorithm that prompts you to tag people whenever you upload a photo.
• Alexa, Cortana, and other voice assistants: These use machine learning to identify and service the user's request.
• Tesla automobiles: Tesla's Autopilot feature is another example.
Every machine learning algorithm has three components
• Representation: How knowledge is represented. Examples include decision trees, sets of rules, instances, graphical models, neural networks, support vector machines, and model ensembles.
• Evaluation: How candidate programs (hypotheses) are scored. Examples include accuracy, precision and recall, squared error, likelihood, posterior probability, cost, margin, entropy, and K-L divergence.
• Optimization: How candidate programs are generated, also known as the search process. Examples include combinatorial optimization, convex optimization, and constrained optimization.
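The three components can be illustrated with a very small sketch. It is not taken from the slides: the data, the function names, and the choice of a grid search are all assumptions made for illustration. The representation is a line through the origin, the evaluation is mean squared error, and the optimization is a search over a fixed list of candidate weights.

```python
# Representation: each candidate hypothesis is a line through the origin, y = w * x.
def predict(w, x):
    return w * x

# Evaluation: mean squared error of one candidate on the data.
def mean_squared_error(w, xs, ys):
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Optimization: search the candidates and keep the one with the lowest error.
def fit(xs, ys, candidates):
    return min(candidates, key=lambda w: mean_squared_error(w, xs, ys))

xs = [1.0, 2.0, 3.0, 4.0]                    # made-up inputs
ys = [2.0, 4.0, 6.0, 8.0]                    # made-up targets, exactly y = 2x
candidates = [i / 10 for i in range(0, 41)]  # 0.0, 0.1, ..., 4.0

best_w = fit(xs, ys, candidates)
print(best_w)  # 2.0
```

Real learners replace the grid search with a more efficient optimizer, but the division into representation, evaluation, and optimization stays the same.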
Types of machine learning
Data Objects
• A data object represents an entity and is described by a group of attributes of that entity.
• For example, in a sales database, data objects may represent customers, sales, or purchases. When data objects are stored in a database, they are called data tuples.
Attribute
• An attribute is a property of a data object. For example, the attributes of a customer object can be customer ID, address, etc.
• Types of attributes:
• Qualitative: nominal (N), ordinal (O), binary (B).
• Quantitative: numeric, either discrete or continuous.
1. Quantitative Data Type
• This data type consists of numerical values: anything that is measured by numbers.
• E.g., profit, quantity sold, height, weight, temperature, etc.
A.) Discrete Data Type
• Numeric data that takes discrete values or whole numbers. Expressing such a value in decimal form has no proper meaning. The values can be counted.
• E.g., number of cars you have, number of marbles in a container, students in a class, etc.
B.) Continuous Data Type
• Numerical measures that can take any value within a certain range. Expressing such a value in decimal form is meaningful. The values cannot be counted, only measured, and the number of possible values is infinite.
• E.g., height, weight, time, area, distance, measurement of rainfall, etc.
2. Qualitative Data Type
• These are data types that cannot be expressed as numerical measurements.
• They describe categories or groups and are hence known as the categorical data type.
A. Structured Data
• This type of data is either numbers or words. It can take numerical values, but mathematical operations on those values are not meaningful. This type of data is expressed in tabular format.
• E.g., Sunny=1, Cloudy=2, Windy=3, or binary data such as 0 or 1, good or bad, etc.
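A short sketch shows why arithmetic on category codes is not meaningful. The weather codes come from the example above; the list of observations and the `one_hot` helper are hypothetical and only serve as an illustration.

```python
from collections import Counter

# Codes from the example above: the numbers are labels, not quantities.
weather_codes = {"Sunny": 1, "Cloudy": 2, "Windy": 3}
observations = ["Sunny", "Windy", "Sunny", "Cloudy", "Sunny"]  # made-up sample

codes = [weather_codes[label] for label in observations]

# The average of the codes can be computed, but "1.6" is not a kind of weather.
mean_code = sum(codes) / len(codes)

# Counting is a meaningful operation on categories: find the most frequent label.
most_common_label = Counter(observations).most_common(1)[0][0]

# One common alternative to integer codes is a one-hot vector per category.
def one_hot(label, categories):
    return [1 if label == category else 0 for category in categories]

print(mean_code)                                # 1.6
print(most_common_label)                        # Sunny
print(one_hot("Cloudy", list(weather_codes)))   # [0, 1, 0]
```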
B. Unstructured Data
• This type of data does not have a fixed format and is therefore known as unstructured data. It comprises textual data, sounds, images, videos, etc.
• Besides these, there are also other types, referred to as data type preliminaries or data measures:
• Nominal
• Ordinal
• Interval
• Ratio
I. Nominal Data Type
• This is used to express names or labels that are not ordered or measurable.
• E.g., male or female (gender), race, country, etc.
II. Ordinal Data Type
• This is also a categorical data type, like nominal data, but has some natural ordering associated with it.
• E.g., Likert rating scale, shirt sizes, ranks, grades, etc.
III. Interval Data Type
• This is numeric data that has a proper order and meaningful differences between values, but no absolute zero. Here zero does not mean the complete absence of the quantity; it is just another value on the scale. This is the local scale.
• E.g., temperature measured in degrees Celsius, time, SAT score, credit score, pH, etc. The difference between values is meaningful, but ratios are not, because there is no absolute zero.
IV. Ratio Data Type
• This quantitative data type is the same as the interval data type but has an absolute zero. Here zero means complete absence and the scale starts from zero. This is the global scale.
• E.g., temperature in kelvin, height, weight, etc.
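The difference between interval and ratio scales can be checked with a little arithmetic. The two temperatures below are made-up values; the only fact assumed is the standard conversion K = °C + 273.15.

```python
def celsius_to_kelvin(celsius):
    # Standard conversion between the two temperature scales.
    return celsius + 273.15

t1_c, t2_c = 10.0, 20.0  # made-up temperatures in degrees Celsius

# Differences are meaningful on both scales and agree with each other.
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios are only meaningful on the ratio scale (kelvin).
ratio_c = t2_c / t1_c                                        # 2.0, but 20 °C is not "twice as hot"
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)  # about 1.035

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```

The Celsius ratio depends on where the scale places its zero, which is why it carries no physical meaning.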
Basic Statistical Descriptions of Data
• Data scientist has been called one of the most desirable professions of the 21st century. Machine learning and statistics are the two core skills required to become a data scientist.
• Statistical Analysis
In statistics, data is collected, explored, analyzed, and presented to identify patterns and trends. This is also referred to as quantitative analysis.
1. Descriptive Statistics
• Descriptive Statistics: The purpose of descriptive statistics is to organize data and identify its main characteristics.
• Mean: The central value, commonly known as the arithmetic average.
• Mode: The value that appears most often in a data set.
• Median: The middle value of the ordered data set, which divides it exactly in half.
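As a quick illustration, the three measures of central tendency can be computed with Python's standard `statistics` module. The data values are made up for this sketch.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # made-up sample, already sorted for readability

mean_value = statistics.mean(data)      # sum 30 divided by 6 values
median_value = statistics.median(data)  # average of the two middle values, 3 and 5
mode_value = statistics.mode(data)      # 3 appears twice, more than any other value

print(mean_value, median_value, mode_value)
```

With an even number of values there is no single middle element, so the median is taken as the average of the two middle values.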
2. Variability
• Standard Deviation: A statistic that measures the dispersion of a data set relative to its mean. It is the square root of the variance.
• Range: The difference between the largest and smallest values of a data set.
• Percentile: The value below which a given percentage of the observations in the data set falls.
• Quartile: The values that divide the ordered data points into four equal parts.
• Variance: A measure of the spread of the numbers in a data set: the average squared deviation from the mean.
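The measures of variability can be sketched with the standard `statistics` module. The data is made up, the population (not sample) formulas are used, and the `"inclusive"` quantile method is one of several conventions for percentiles, chosen here as an assumption.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample with mean 5

variance = statistics.pvariance(data)  # average squared deviation from the mean
std_dev = statistics.pstdev(data)      # square root of the variance
value_range = max(data) - min(data)    # largest value minus smallest value

# Quartiles: three cut points that split the ordered data into four parts.
quartiles = statistics.quantiles(data, n=4, method="inclusive")

# 90th percentile: the 90th of the 99 cut points that split the data into 100 parts.
p90 = statistics.quantiles(data, n=100, method="inclusive")[89]

print(variance, std_dev, value_range)
print(quartiles, p90)
```

The second quartile is the median, so `quartiles[1]` matches `statistics.median(data)`.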
3. Correlation
• It is one of the major statistical techniques for measuring the relationship between two variables.
• A correlation coefficient greater than zero indicates a positive relationship.
• A correlation coefficient less than zero indicates a negative relationship.
• A correlation coefficient of zero indicates that there is no linear relationship between the two variables.
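The sign of the coefficient can be seen in a minimal sketch of the Pearson correlation coefficient, assuming that is the coefficient meant here. The function name and the data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    # Pearson correlation: covariance divided by the product of the spreads.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [1, 2, 3, 4, 5]
up = [2, 4, 6, 8, 10]      # rises with x
down = [10, 8, 6, 4, 2]    # falls as x rises
u_shape = [4, 1, 0, 1, 4]  # y = (x - 3)^2: related to x, but not linearly

print(pearson_r(xs, up))       # 1.0
print(pearson_r(xs, down))     # -1.0
print(pearson_r(xs, u_shape))  # 0.0
```

The last case shows that a coefficient of zero rules out only a linear relationship: `u_shape` is fully determined by `xs`.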
4. Probability Distribution
• It specifies the likelihood of all possible events. In simple terms, an event is the result of an experiment, like tossing a coin. Events are of two types: dependent and independent.
• Independent event: the outcome is not affected by earlier events (e.g., tossing a coin).
• Dependent event: the outcome is affected by earlier events (e.g., drawing a queen from a deck of cards without replacing earlier draws).
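The two kinds of events can be compared with exact fractions. This sketch assumes a fair coin and a standard 52-card deck with four queens, drawn without replacement.

```python
from fractions import Fraction

# Independent events: the second coin toss does not depend on the first.
p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads

# Dependent events: after one queen is drawn and not replaced,
# only 3 queens remain among 51 cards.
p_first_queen = Fraction(4, 52)
p_second_queen_given_first = Fraction(3, 51)
p_two_queens = p_first_queen * p_second_queen_given_first

print(p_two_heads)   # 1/4
print(p_two_queens)  # 1/221
```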
5. Regression
• It is a method used to determine the relationship between one or more independent variables and a dependent variable.
• Linear regression: It is used to fit a regression model that explains the relationship between a numeric response variable and one or more predictor variables.
• Logistic regression: It is used to fit a regression model that explains the relationship between a binary response variable and one or more predictor variables.
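A minimal sketch of both ideas follows. It assumes a single predictor, fits the line by ordinary least squares, and uses made-up data. The logistic part only shows the sigmoid link with hypothetical coefficients; it does not fit a model.

```python
import math

def fit_line(xs, ys):
    # Ordinary least squares for one predictor: y = intercept + slope * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

hours = [1, 2, 3, 4, 5]         # made-up predictor values
scores = [52, 54, 56, 58, 60]   # made-up numeric response, exactly 50 + 2 * hours

slope, intercept = fit_line(hours, scores)
print(slope, intercept)  # 2.0 50.0

def sigmoid(z):
    # Logistic regression maps a linear score to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration (not fitted).
b0, b1 = -3.0, 1.0
p_pass = sigmoid(b0 + b1 * 3)  # the linear score is 0, so the probability is 0.5
print(p_pass)  # 0.5
```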
6. Normal Distribution
7. Bias
• In statistical terms, bias arises when a model or sample is not representative of the complete population. It needs to be minimized to get the desired outcome.
• The three most common types of bias are:
• Selection bias: Selecting a group of data for statistical analysis in a way that is not randomized, so that the data is unrepresentative of the whole population.
• Confirmation bias: It occurs when the person performing the statistical analysis has some predefined assumption and favours evidence that supports it.
• Time interval bias: It is caused intentionally by specifying a certain time range to favour a particular outcome.
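Selection bias can be sketched with a toy population. Everything here is hypothetical: the population is simply the numbers 1 to 100, the biased sample keeps only the 20 largest values, and the random seed is arbitrary.

```python
import random
import statistics

population = list(range(1, 101))  # hypothetical population: 1, 2, ..., 100

# Randomized selection: every member has the same chance of being picked.
random.seed(0)  # arbitrary seed so the sketch is repeatable
random_sample = random.sample(population, 20)

# Non-randomized selection: only the 20 largest values are picked.
biased_sample = population[-20:]

population_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
biased_mean = statistics.mean(biased_sample)

print(population_mean)  # 50.5
print(random_mean)      # varies with the seed, but is an unbiased estimate
print(biased_mean)      # 90.5, far above the population mean
```

The biased sample overstates the population mean because the selection rule itself depends on the value being measured.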
