
Unit 3

KPRIET AI Fundamentals and Machine Learning Unit 3

Uploaded by

22cs103

Unit-3

Components of learning – Types of machine learning – Data Objects and Attribute Types – Basic Statistical Descriptions of Data – Data Preprocessing: Cleaning – Integration – Feature selection – Feature extraction by Principal Component Analysis – Data Transformation by Normalization – Discretization: Binning and Histogram analysis.

Machine Learning Basics and Preprocessing
What is machine learning?

• Machine learning, as the name suggests, gives machines the ability to learn autonomously from experience, from observations, and by analysing patterns within a given data set, without being explicitly programmed.
Example of machine learning

• Facebook: Facebook's facial recognition algorithm prompts you to tag people whenever you upload a photo.
• Alexa, Cortana, and other voice assistants: Voice assistants use machine learning to identify and service the user's request.
• Tesla automobiles: Tesla's Autopilot feature is another example.
Every machine learning algorithm has three components
• Representation: This is how knowledge is represented. Examples include decision trees, sets of rules, instances, graphical models, neural networks, support vector machines, model ensembles and others.
• Evaluation: This is the way candidate programs (hypotheses) are evaluated. Examples include accuracy, precision and recall, squared error, likelihood, posterior probability, cost, margin, entropy, K-L divergence and others.
• Optimization: Last but not least, optimization is the way candidate programs are generated, and is known as the search process. Examples include combinatorial optimization, convex optimization, and constrained optimization.
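The three components can be seen together in a very small learner. The sketch below is only an illustration under assumed choices: the representation is a straight line, the evaluation is mean squared error, and the optimization is plain gradient descent. The function names, the learning rate and the data are made up for this example.

```python
# Representation: a straight line y = w * x + b (a hypothetical toy model).
def predict(w, b, x):
    return w * x + b


# Evaluation: mean squared error of the line on the data.
def mean_squared_error(w, b, xs, ys):
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Optimization: gradient descent searches for better values of w and b.
def fit(xs, ys, learning_rate=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [predict(w, b, x) - y for x, y in zip(xs, ys)]
        grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(2 * e for e in errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # made-up data lying on y = 2x + 1
w, b = fit(xs, ys)
print(round(w, 3), round(b, 3), mean_squared_error(w, b, xs, ys))
```

On this data the search should settle close to w = 2 and b = 1, where the error is nearly zero. Swapping any one component (for example a decision tree as the representation) gives a different algorithm.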
Types of machine learning
Data Objects
• A data object represents an entity. A data object is described by a group of attributes of that entity.
• For example, in a sales database the data objects may represent customers, sales, or purchases. When data objects are stored in a database, they are called data tuples.
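As a rough illustration, a data object can be modelled as a record whose fields are its attributes. The `Customer` class and its field values below are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Customer:
    # Each field is one attribute of the customer entity (hypothetical names).
    customer_id: int
    name: str
    address: str


alice = Customer(customer_id=101, name="Alice", address="12 Park Street")

# Stored as a database row, the same object is simply a tuple of values.
row = astuple(alice)
print(row)
```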
Attribute

• An attribute is a data field that represents a characteristic of a data object. For a customer object, the attributes can be customer ID, address, etc.
• Types of attributes:
• Qualitative (Nominal (N), Ordinal (O), Binary (B)).
• Quantitative (Numeric, Discrete, Continuous).
1. Quantitative Data Type

• This data type consists of numerical values: anything that is measured in numbers.
• E.g., profit, quantity sold, height, weight, temperature, etc.
A.) Discrete Data Type

• Numeric data that takes discrete values, typically whole numbers. A value of this type has no proper meaning if it is expressed as a decimal. The values can be counted.
• E.g., number of cars you own, number of marbles in a container, number of students in a class, etc.

B.) Continuous Data Type

• Numerical measures that can take any value within a certain range. A value of this type keeps its meaning when expressed as a decimal. The values cannot be counted, only measured, and there are infinitely many possible values within the range.
• E.g., height, weight, time, area, distance, amount of rainfall, etc.

2. Qualitative Data Type

• These are the data types that cannot be expressed in numbers.
• They describe categories or groups and are hence known as the categorical data type.
A. Structured Data

• This type of data is either numbers or words. It can take numerical values, but mathematical operations cannot be performed on them. This type of data is expressed in tabular format.
• E.g., Sunny = 1, Cloudy = 2, Windy = 3, or binary data such as 0 or 1, good or bad, etc.

B. Unstructured Data
• This type of data does not have a proper format and is therefore known as unstructured data. It comprises textual data, sounds, images, videos, etc.
• Besides these, there are also other types, referred to as data type preliminaries or data measures:
• Nominal
• Ordinal
• Interval
• Ratio
I. Nominal Data Type

• This is used to express names or labels that are neither ordered nor measurable.
• E.g., male or female (gender), race, country, etc.
II. Ordinal Data Type

• This is also a categorical data type like nominal data, but it has some natural ordering associated with it.
• E.g., Likert rating scale, shirt sizes, ranks, grades, etc.
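The practical difference between nominal and ordinal data is which operations make sense. The sketch below uses made-up country and shirt-size values: nominal labels can only be compared for equality and counted, while ordinal labels can also be sorted once their order is written down.

```python
from collections import Counter

# Nominal: only equality and counting are meaningful, so the mode is the
# natural summary (values are made up for illustration).
countries = ["India", "Japan", "India", "Brazil"]
most_common_country = Counter(countries).most_common(1)[0][0]

# Ordinal: the categories have a natural order, stated here explicitly.
SIZE_ORDER = ["S", "M", "L", "XL"]
rank = {size: position for position, size in enumerate(SIZE_ORDER)}

shirts = ["L", "S", "XL", "M", "M"]
sorted_shirts = sorted(shirts, key=lambda size: rank[size])
median_size = sorted_shirts[len(sorted_shirts) // 2]

print(most_common_country, sorted_shirts, median_size)
```

The ranks only record order. The gap between S and M is not claimed to equal the gap between L and XL, which is why averaging the ranks is not meaningful.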
III. Interval Data Type

• This is numeric data that has a proper order and in which the difference between values is meaningful. Zero does not mean the true absence of the quantity: here zero is not a complete absence but simply another value on the scale. This is the local scale.
• E.g., temperature measured in degrees Celsius, time, SAT score, credit score, pH, etc. In each of these cases there is no absolute zero.
IV. Ratio Data Type

• This quantitative data type is the same as the interval data type but has an absolute zero. Here zero means complete absence, and the scale starts from zero. This is the global scale.
• E.g., temperature in kelvin, height, weight, etc.
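A short calculation shows why the absolute zero matters. On an interval scale such as Celsius, differences are meaningful but ratios are not. On a ratio scale such as kelvin, both are. The temperatures below are arbitrary illustrative values.

```python
def celsius_to_kelvin(celsius):
    return celsius + 273.15


c_low, c_high = 10.0, 20.0

# Differences agree on both scales, so subtraction is meaningful for both.
difference_c = c_high - c_low
difference_k = celsius_to_kelvin(c_high) - celsius_to_kelvin(c_low)

# The Celsius ratio is 2.0, yet 20 degrees C is not "twice as hot" as 10.
naive_ratio = c_high / c_low

# The kelvin ratio, measured from absolute zero, is only about 1.035.
kelvin_ratio = celsius_to_kelvin(c_high) / celsius_to_kelvin(c_low)

print(difference_c, difference_k, naive_ratio, kelvin_ratio)
```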
Basic Statistical Descriptions of Data
• Data scientist has been described as the most desirable profession of the 21st century. Machine learning and statistics are the two core skills required to become a data scientist.
• Statistical Analysis
In statistics, data is collected, explored, analyzed, and presented to identify patterns and trends. This is also referred to as quantitative analysis.
1. Descriptive Statistics
• Descriptive Statistics: The purpose of
descriptive statistics is to organize data and
identify the main characteristics of that data.
• Mean: It is the central value which is
commonly known as arithmetic average.
• Mode: It refers to the value that appears most
often in a data set.
• Median: It is the middle value of the ordered data set, dividing it exactly in half. With an even number of values, it is the average of the two middle values.
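These three measures can be computed with Python's standard `statistics` module. The scores below are made-up values used only to show how the measures can differ on the same data.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # made-up data, already in sorted order

mean = statistics.mean(scores)      # (2 + 3 + 3 + 5 + 7 + 10) / 6
median = statistics.median(scores)  # average of the two middle values, 3 and 5
mode = statistics.mode(scores)      # the value that appears most often

print(mean, median, mode)
```

Here the mean is 5, the median is 4 and the mode is 3. The single large value 10 pulls the mean above the median.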
2. Variability
• Standard Deviation: It is a statistic that measures the dispersion of a data set relative to its mean. It is the square root of the variance.
• Range: This is defined as the difference between the largest and smallest values of a dataset.
• Percentile: It is the value below which a given percentage of the observations in the dataset falls.
• Quartile: Quartiles are the values that divide the ordered data points into four equal parts.
• Variance: It is a statistical measure of the spread of the numbers in a data set, computed as the average squared deviation from the mean.
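The measures of variability above can be sketched with the standard `statistics` module. The data is made up, and the population forms (`pvariance`, `pstdev`) are an assumption of this example: use `variance` and `stdev` instead when the data is a sample. `statistics.quantiles` needs Python 3.8 or later, and its default interpolation method is only one of several conventions for quartiles and percentiles.

```python
import statistics

data = [4, 8, 6, 5, 3, 2, 8, 9, 2, 5]  # made-up observations

data_range = max(data) - min(data)

# Population variance: the average squared deviation from the mean.
variance = statistics.pvariance(data)

# Standard deviation: the square root of the variance.
std_dev = statistics.pstdev(data)

# Three cut points that split the ordered data into four quarters.
quartiles = statistics.quantiles(data, n=4)

# 90th percentile: the 90th of the 99 cut points that split data into 100 parts.
percentile_90 = statistics.quantiles(data, n=100)[89]

print(data_range, variance, std_dev, quartiles, percentile_90)
```

For this data the range is 7, the variance is 5.76 and the standard deviation is 2.4.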
3. Correlation
• It is one of the major statistical techniques for measuring the relationship between two variables. The correlation coefficient ranges from -1 to +1.
• A correlation coefficient greater than zero indicates a positive relationship.
• A correlation coefficient less than zero indicates a negative relationship.
• A correlation coefficient of zero indicates that there is no linear relationship between the two variables.
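A minimal sketch of the Pearson correlation coefficient, written out by hand so the formula is visible. The data sets are made up. The last one is included to show that a coefficient of zero rules out only a straight-line relationship.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # y grows with x
falling = [10, 8, 6, 4, 2]  # y shrinks as x grows
curved = [4, 1, 0, 1, 4]    # y = (x - 3)^2: related to x, but not linearly

r_rising = pearson(x, rising)
r_falling = pearson(x, falling)
r_curved = pearson(x, curved)
print(r_rising, r_falling, r_curved)
```

The first two coefficients come out at +1 and -1 (up to floating-point rounding), while the curved data gives 0 even though y is completely determined by x.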
4. Probability Distribution
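• It specifies the likelihood of all possible events. In simple terms, an event is the result of an experiment, such as tossing a coin. Events are of two types: dependent and independent.
• Independent event (tossing a coin): the outcome of one toss does not affect the next.
• Dependent event (drawing a queen from a deck of cards without replacement): each draw changes the probabilities for the next draw.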
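The two kinds of event can be compared with a short calculation. This sketch assumes a fair coin and a standard 52-card deck with four queens, drawn without replacement. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Independent events: the second toss ignores the first,
# so the probabilities simply multiply.
p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads

# Dependent events: after one queen is drawn, only 3 queens remain
# among the 51 cards left, so the second probability changes.
p_first_queen = Fraction(4, 52)
p_second_queen_given_first = Fraction(3, 51)
p_two_queens = p_first_queen * p_second_queen_given_first

print(p_two_heads, p_two_queens)
```

The results are 1/4 for two heads and 1/221 for two queens in a row.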
5. Regression
• It is a method used to determine the relationship between one or more independent variables and a dependent variable.
• Linear regression: It is used to fit a regression model that explains the relationship between a numeric response variable and one or more predictor variables.
• Logistic regression: It is used to fit a regression model that explains the relationship between a binary response variable and one or more predictor variables.
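As a minimal sketch, simple linear regression with one predictor can be fitted with the closed-form least-squares formulas. The data is made up, and the `sigmoid` function at the end shows only how logistic regression turns a linear score into a probability for a binary response, not a full fitting procedure.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sigmoid(score):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


hours = [1, 2, 3, 4, 5]    # predictor (made-up values)
marks = [3, 5, 7, 9, 11]   # numeric response lying on marks = 2 * hours + 1

slope, intercept = fit_line(hours, marks)
predicted_marks = slope * 6 + intercept

print(slope, intercept, predicted_marks, sigmoid(0.0))
```

The fitted line has slope 2 and intercept 1, so 6 hours predicts 13 marks. A score of 0 maps to a probability of 0.5 under the sigmoid.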
6. Normal Distribution
7. BIAS
• In statistical terms, bias arises when a model or sample is not representative of the complete population. It needs to be minimized to get the desired outcome.
• The three most common types of bias are:
• Selection bias: It occurs when the group of data chosen for statistical analysis is not selected at random, so that the data is unrepresentative of the whole population.
• Confirmation bias: It occurs when the person performing the statistical analysis has some predefined assumption and favours results that support it.
• Time interval bias: It is caused by intentionally specifying a certain time range to favour a particular outcome.
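Selection bias can be illustrated with a toy population. Everything here is made up for the sketch: the population is the numbers 1 to 100, the random sample uses a fixed seed, and the biased sample keeps only values above an arbitrary cut-off of 60.

```python
import random
import statistics

population = list(range(1, 101))  # hypothetical population, mean 50.5

# Random selection: every member has the same chance of being chosen.
rng = random.Random(0)
random_sample = rng.sample(population, 20)

# Biased selection: only members above the cut-off are ever considered.
biased_sample = [value for value in population if value > 60][:20]

population_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
biased_mean = statistics.mean(biased_sample)

print(population_mean, random_mean, biased_mean)
```

The biased sample has a mean of 70.5, far from the population mean of 50.5, and collecting more values the same way would not correct it. The random sample's mean varies with the seed but is not pushed in any particular direction.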
