ML UNIT-2 Notes
Unit-2
Decision Tree Learning: Representing concepts as decision trees, Recursive induction of decision trees,
Picking the best splitting attribute: entropy and information gain, Searching for simple trees and
computational complexity, Occam's razor, Overfitting, noisy data, and pruning. Experimental Evaluation of
Learning Algorithms: Measuring the accuracy of learned hypotheses. Comparing learning algorithms: cross-
validation, learning curves, and statistical hypothesis testing.
FIGURE : A decision tree for the concept PlayTennis. An example is classified by sorting it through the tree
to the appropriate leaf node, then returning the classification associated with this leaf (in this case, Yes or
No).
Test Sample: This decision tree classifies Saturday mornings according to whether they are suitable for
playing tennis. For example, the instance
<Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong>
would be sorted down the leftmost branch of this decision tree and would therefore be classified as a
negative instance (i.e., the tree predicts that PlayTennis = no).
In general, decision trees represent a disjunction of conjunctions of constraints on the attribute values of
instances. Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and the tree
itself to a disjunction of these conjunctions.
(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)
i.e., the tree predicts PlayTennis = Yes exactly when this disjunction is satisfied.
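To make the representation concrete, here is a minimal Python sketch (not part of the original notes) that encodes the PlayTennis tree above as nested conditionals; the function name play_tennis is purely illustrative.

def play_tennis(outlook, humidity, wind):
    """Classify a Saturday morning exactly as the decision tree above would."""
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"

# The test sample from above, <Sunny, Hot, High, Strong>, sorts down the leftmost branch:
print(play_tennis("Sunny", "High", "Strong"))   # -> "No"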
Appropriate Problems for Decision Tree Learning:
• Instances are represented by attribute-value pairs
• The target function has discrete output values
• Disjunctive descriptions may be required
• The training data may contain errors
• The training data may contain missing attribute values.
Recursive induction of decision trees (ID3 algorithm)
ENTROPY MEASURES HOMOGENEITY OF EXAMPLES
Let us assume for the moment that we are dealing with Boolean features, so a split partitions D into D1 and D2. Let us also assume we have two classes, and denote by D1+ and D1- the positives and negatives in D1 (and likewise D2+ and D2- for D2).
The question is how to assess the utility of a feature in terms of splitting the examples into positives and negatives.
The best situation is where D1 receives all of the positives and D2 all of the negatives, or vice versa (e.g., D1+ = D+ and D1- = ∅). In that case, the two children of the split are said to be pure.
To measure the impurity of a set containing both positives and negatives, impurity can be defined in terms of the proportion of positives in the set, p = |D+| / (|D+| + |D-|).
Entropy:
Entropy is a common way to measure impurity. In decision tree learning, it measures the randomness or impurity in a data set.
Let S be the sample set of training examples. Then Entropy(S), measuring the impurity of S, is defined as

Entropy(S) = - Σ_{i=1..c} p_i · log2(p_i)

where c is the number of different class labels and p_i is the proportion of examples in S that fall into the i-th class.
To illustrate, suppose S is a collection of 14 examples of some Boolean concept, including 9 positive and 5 negative examples (we adopt the notation [9+, 5-] to summarize such a sample of data). Then the entropy of S relative to this Boolean classification is

Entropy([9+, 5-]) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
Notice that:
– The entropy is 0 if all members of S belong to the same class. For example, if all members are positive (p+ = 1), then p- = 0, and Entropy(S) = -1·log2(1) - 0·log2(0) = 0.
– The entropy is 1 when the collection contains an equal number of positive and negative examples.
– If the collection contains unequal numbers of positive and negative examples, the entropy is between 0 and 1.
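The entropy values above can be checked with a small Python sketch (assuming the two-class case discussed here; the helper name entropy is illustrative):

import math

def entropy(pos, neg):
    """Entropy of a collection with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                       # treat 0 * log2(0) as 0
            p = count / total
            result -= p * math.log2(p)
    return result

print(entropy(9, 5))     # [9+, 5-]      -> approximately 0.940
print(entropy(7, 7))     # equal split   -> 1.0
print(entropy(14, 0))    # pure set      -> 0.0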
Given entropy as a measure of the impurity in a collection of training examples, we can now define a
measure of the effectiveness of an attribute in classifying the training data. The measure we will use, called
information gain, is simply the expected reduction in entropy caused by partitioning the examples according
to this attribute.
The information gain, Gain(S, A), of an attribute A relative to a collection of examples S, is defined as

Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)

where Values(A) is the set of all possible values for attribute A, and Sv is the subset of S for which attribute A has value v (i.e., Sv = {s ∈ S | A(s) = v}).
Information Gain:
The information gain is based on the decrease in entropy after a data set is split on a particular attribute A.
Information gain for an attribute A is the difference between the entropy of S before the split and the weighted average entropy of the subsets after the split.
Information gain thus refers to the decline in entropy after the dataset is split; it is also called entropy reduction. Building a decision tree is all about discovering the attributes that return the highest information gain.
For example, suppose S is a collection of training-example days described by attributes including Wind, which can have the values Weak or Strong. As before, assume S is a collection containing 14 examples, [9+, 5-]. Of these 14 examples, suppose 6 of the positive and 2 of the negative examples have Wind = Weak, and the remainder have Wind = Strong. The information gain due to sorting the original 14 examples by the attribute Wind may then be calculated as

Values(Wind) = {Weak, Strong}
S = [9+, 5-], S_Weak = [6+, 2-], S_Strong = [3+, 3-]

Gain(S, Wind) = Entropy(S) - (|S_Weak|/|S|)·Entropy(S_Weak) - (|S_Strong|/|S|)·Entropy(S_Strong)
             = 0.940 - (8/14)·0.811 - (6/14)·1.00
             = 0.048

Note: S_Weak denotes the subset of S covered by the value Weak of the attribute Wind (and similarly for S_Strong).
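The same Wind calculation, written as a short Python sketch (the helper names entropy and information_gain are illustrative, not from any particular library):

import math

def entropy(pos, neg):
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

def information_gain(parent, children):
    """Gain(S, A) = Entropy(S) - sum over v of (|Sv| / |S|) * Entropy(Sv)."""
    total = sum(parent)
    return entropy(*parent) - sum(
        (p + n) / total * entropy(p, n) for p, n in children)

# S = [9+, 5-]; Wind splits it into S_Weak = [6+, 2-] and S_Strong = [3+, 3-]
print(information_gain((9, 5), [(6, 2), (3, 3)]))    # approximately 0.048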
Information gain is precisely the measure used by ID3 to select the best attribute at each step in growing the
tree. The use of information gain to evaluate the relevance of attributes is summarized in below Figure. In
this figure the information gain of two different attributes, Humidity and Wind, is computed in order to
determine which is the better attribute for classifying the training examples shown in below Table.
Fig:Humidity provides greater information gain than Wind, relative to the target classification. Here, E
stands for entropy and S for the original collection of examples. Given an initial collection S of 9 positive
and 5 negative examples, [9+, 5-], sorting these by their Humidity produces collections of [3+, 4-] (Humidity
= High) and [6+, 1-] (Humidity = Normal). The information gained by this partitioning is .151, compared to
a gain of only .048 for the attribute Wind.
TABLE: Training examples for the target concept PlayTennis.
Day   Outlook    Temperature  Humidity  Wind    PlayTennis
D1    Sunny      Hot          High      Weak    No
D2    Sunny      Hot          High      Strong  No
D3    Overcast   Hot          High      Weak    Yes
D4    Rain       Mild         High      Weak    Yes
D5    Rain       Cool         Normal    Weak    Yes
D6    Rain       Cool         Normal    Strong  No
D7    Overcast   Cool         Normal    Strong  Yes
D8    Sunny      Mild         High      Weak    No
D9    Sunny      Cool         Normal    Weak    Yes
D10   Rain       Mild         Normal    Weak    Yes
D11   Sunny      Mild         Normal    Strong  Yes
D12   Overcast   Mild         High      Strong  Yes
D13   Overcast   Hot          Normal    Weak    Yes
D14   Rain       Mild         High      Strong  No
The information gain values for all four attributes are:
Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029
Outlook, having the highest information gain, is therefore selected as the decision attribute for the root node.
FIGURE:The partially learned decision tree resulting from the first step of ID3. The training examples are
sorted to the corresponding descendant nodes. The Overcast descendant has only positive examples and
therefore becomes a leaf node with classification Yes. The other two nodes will be further expanded, by
selecting the attribute with highest information gain relative to the new subsets of examples.
ALGORITHM:
ID3(Examples, Target_attribute, Attributes)
// Examples are the training examples. Target_attribute is the attribute whose value is to be
// predicted by the tree. Attributes is a list of other attributes that may be tested by the learned
// decision tree. Returns a decision tree that correctly classifies the given Examples.
Create a Root node for the tree
If all Examples are positive, Return the single-node tree Root, with label = +
If all Examples are negative, Return the single-node tree Root, with label = -
If Attributes is empty, Return the single-node tree Root, with label = most common value of Target_attribute in Examples
Otherwise Begin
  o A ← the attribute from Attributes that best* classifies Examples
  o The decision attribute for Root ← A
  o For each possible value, vi, of A:
      Add a new tree branch below Root, corresponding to the test A = vi
      Let Examples_vi be the subset of Examples that have value vi for A
      If Examples_vi is empty
        Then below this new branch add a leaf node with label = most common value of Target_attribute in Examples
        Else below this new branch add the subtree ID3(Examples_vi, Target_attribute, Attributes - {A})
End
Return Root
Note: The best attribute is the one with highest information gain
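The pseudocode can be turned into a compact Python sketch along the following lines. The dict-based tree representation and the helper names (entropy, gain, id3) are illustrative choices, not prescribed by the notes; unlike the pseudocode, this version only branches on attribute values actually seen in the examples, so the empty-subset case never arises.

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain(examples, target, attr):
    """Information gain of splitting `examples` (a list of dicts) on `attr`."""
    labels = [e[target] for e in examples]
    remainder = 0.0
    for v in set(e[attr] for e in examples):
        subset = [e[target] for e in examples if e[attr] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

def id3(examples, target, attributes):
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:                        # all positive or all negative
        return labels[0]
    if not attributes:                               # no attributes left to test
        return Counter(labels).most_common(1)[0][0]  # most common target value
    best = max(attributes, key=lambda a: gain(examples, target, a))
    tree = {best: {}}
    for v in set(e[best] for e in examples):         # only values seen in the data
        subset = [e for e in examples if e[best] == v]
        tree[best][v] = id3(subset, target, [a for a in attributes if a != best])
    return tree

# Example usage, assuming the PlayTennis table is loaded as a list of dicts:
# tree = id3(examples, "PlayTennis", ["Outlook", "Temperature", "Humidity", "Wind"])
# This should reproduce the tree shown in the figure referred to below, with Outlook at the root.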
The final decision tree learned by ID3 from the 14 training examples of PlayTennis is shown in Figure:
HYPOTHESIS SPACE SEARCH IN DECISION TREE LEARNING
• ID3 searches the space of possible decision trees, performing a greedy hill-climbing search guided by information gain.
• It searches a complete hypothesis space: every finite discrete-valued function can be represented by at least one decision tree.
• It maintains only a single current hypothesis (unlike Candidate-Elimination), so it cannot tell us how many other viable hypotheses are consistent with the data.
• It uses all training examples at each step when deciding how to refine the tree, so the results are far less sensitive to errors in individual training examples.
Occam’s razor
– Prefer the simplest hypothesis that fits the data
– Argument in favor
• Fewer short hypotheses than long hypotheses
– Argument opposed
• There are many ways to define small sets of hypotheses
• What’s so special about small sets based on size of hypothesis?
Issues in Decision Tree Learning
• Determine how deeply to grow the decision tree
• Handling continuous attributes
• Choosing an appropriate attribute selection measure
• Handling training data with missing attribute values
• Handling attributes with differing costs
• Improving computational efficiency
Overfitting
Given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there exists some alternative hypothesis h' ∈ H such that h has smaller error than h' over the training examples,
i.e., error_train(h) < error_train(h'),
but h' has a smaller error than h over the entire distribution of instances,
i.e., error_D(h') < error_D(h).
Reduced-error pruning
– Split data into training set, validation set used for pruning, and test set for measuring accuracy over
future unseen examples.
– Do until further pruning is harmful:
1. Evaluate the impact on the validation set of pruning each possible node (together with the subtree below it).
2. Greedily remove the node whose removal most improves validation set accuracy.
– Produces the smallest version of the most accurate subtree (a rough sketch of this procedure appears after this list).
– What if data is limited?
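As a rough illustration only, here is a simplified bottom-up sketch of reduced-error pruning over the dict-based trees produced by the id3() sketch above. It replaces a subtree with a leaf whenever doing so does not hurt accuracy on the validation examples that reach that node; the greedy loop described above (re-evaluating every node after each single removal) would be slightly more involved.

from collections import Counter

def classify(tree, example):
    """Follow branches until a leaf label is reached (None for unseen values)."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(example[attr])
    return tree

def accuracy(tree, examples, target):
    return sum(classify(tree, e) == e[target] for e in examples) / len(examples)

def reduced_error_prune(tree, train, valid, target):
    if not isinstance(tree, dict):
        return tree                                   # already a leaf
    attr = next(iter(tree))
    # Prune the children first, each against the examples reaching its branch.
    for v, sub in list(tree[attr].items()):
        sub_train = [e for e in train if e[attr] == v]
        sub_valid = [e for e in valid if e[attr] == v]
        tree[attr][v] = reduced_error_prune(sub, sub_train, sub_valid, target)
    # Then try replacing this node with the most common training label below it.
    leaf = Counter(e[target] for e in train).most_common(1)[0][0]
    if valid and accuracy(leaf, valid, target) >= accuracy(tree, valid, target):
        return leaf
    return tree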
• Rule post-pruning
– Most frequently used method (e.g., in C4.5)
1. Convert tree to equivalent set of rules
2. Prune each rule independently of others
3. Sort final rules into desired sequence for use
– In C4.5, evaluation of performance is based on the training set itself, using pessimistic estimate :
• Calculate the rule accuracy over the training set
• Calculate the standard deviation in this estimated accuracy, assuming a binomial distribution.
• The lower-bound estimate (accuracy minus a confidence-level multiple of that standard deviation) is taken as the measure of rule performance; a small sketch of this estimate is given after this section's bullets.
• Converting a tree to rules
Advantages of rule post-pruning over reduced-error Pruning
• Allows distinguishing among the different contexts in which a decision node is used.
• Removes the distinction between attributes near the root and those near the leaves.
• Improves readability
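A hedged Python sketch of the pessimistic estimate described above: the rule's accuracy over the training examples it covers, minus a confidence-level multiple of its binomial standard deviation. The factor 1.96 (roughly a 95% two-sided level) and the function name are illustrative assumptions; C4.5's actual error-based estimate differs in its details.

import math

def pessimistic_accuracy(correct, covered, z=1.96):
    """Lower-bound estimate of a rule's true accuracy."""
    acc = correct / covered
    std = math.sqrt(acc * (1 - acc) / covered)
    return acc - z * std

# A rule that covers 20 training examples and classifies 18 of them correctly:
print(pessimistic_accuracy(18, 20))    # 0.90 - 1.96 * 0.067 -> approximately 0.77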
• Incorporating continuous-valued attributes
– Define candidate thresholds midway between adjacent attribute values at which the target classification changes: e.g., Temperature > 54 and Temperature > 85.
– Pick the threshold, c, that produces the greatest information gain (here Temperature > 54); a small sketch follows below.
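A minimal sketch of the threshold-picking step, assuming a small Temperature sample of the kind these notes allude to: candidate thresholds sit midway between adjacent sorted values at which the classification changes, and the one with the highest information gain is kept.

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) for the best class-change boundary."""
    pairs = sorted(zip(values, labels))
    base, best = entropy(labels), None
    for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]):
        if y1 != y2:                                  # class label changes here
            t = (v1 + v2) / 2                         # candidate threshold
            left = [y for v, y in pairs if v <= t]
            right = [y for v, y in pairs if v > t]
            g = (base
                 - len(left) / len(pairs) * entropy(left)
                 - len(right) / len(pairs) * entropy(right))
            if best is None or g > best[1]:
                best = (t, g)
    return best

temperature = [40, 48, 60, 72, 80, 90]                # assumed example values
play_tennis = ["No", "No", "Yes", "Yes", "Yes", "No"]
print(best_threshold(temperature, play_tennis))       # -> threshold 54.0: Temperature > 54 beats > 85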
• Attributes with many values
– Problem
• If an attribute has many distinct values, Gain will tend to select it
• Imagine using Date = Oct_13_2004 as an attribute: it splits the training examples into many tiny, pure subsets, so its information gain is very high even though it generalizes poorly
– One approach: use GainRatio instead (a short sketch follows below)
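A short sketch of GainRatio as it is usually defined (information gain divided by SplitInformation), which penalizes attributes such as Date that split the data into many small subsets; the helper names are illustrative.

import math

def split_information(subset_sizes):
    """SplitInformation(S, A) = -sum over i of (|Si| / |S|) * log2(|Si| / |S|)."""
    total = sum(subset_sizes)
    return -sum((s / total) * math.log2(s / total) for s in subset_sizes if s)

def gain_ratio(gain, subset_sizes):
    return gain / split_information(subset_sizes)

# Wind splits the 14 examples into subsets of sizes 8 and 6; Gain(S, Wind) ~ 0.048
print(gain_ratio(0.048, [8, 6]))       # approximately 0.049
# A Date-like attribute splits them into 14 singletons; even with Gain ~ 0.940,
# its SplitInformation is log2(14) ~ 3.81, so its GainRatio is only about 0.25.
print(gain_ratio(0.940, [1] * 14))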