
Data Mining using Decision Trees

Professor J. F. Baldwin
Decision Trees from Data Base

Ex Num   Size    Colour   Shape    Concept Satisfied

1        med     blue     brick    yes
2        small   red      wedge    no
3        small   red      sphere   yes
4        large   red      wedge    no
5        large   green    pillar   yes
6        large   red      pillar   no
7        large   green    sphere   yes

Choose target: Concept Satisfied
Use all attributes except Ex Num
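For use in later sketches, the table can be written as a small Python data set; the list name `examples` and the dictionary keys are illustrative choices mirroring the slide.

```python
# The seven training examples from the table above.
# Keys mirror the slide's attribute names; "Concept" is the target.
examples = [
    {"Ex": 1, "Size": "med",   "Colour": "blue",  "Shape": "brick",  "Concept": "yes"},
    {"Ex": 2, "Size": "small", "Colour": "red",   "Shape": "wedge",  "Concept": "no"},
    {"Ex": 3, "Size": "small", "Colour": "red",   "Shape": "sphere", "Concept": "yes"},
    {"Ex": 4, "Size": "large", "Colour": "red",   "Shape": "wedge",  "Concept": "no"},
    {"Ex": 5, "Size": "large", "Colour": "green", "Shape": "pillar", "Concept": "yes"},
    {"Ex": 6, "Size": "large", "Colour": "red",   "Shape": "pillar", "Concept": "no"},
    {"Ex": 7, "Size": "large", "Colour": "green", "Shape": "sphere", "Concept": "yes"},
]
```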
CLS - Concept Learning System - Hunt et al.

Tree structure: a parent node holding a mixture of +ve and -ve examples
is split on an attribute V; each value v1, v2, v3 of V gives a child node.
CLS ALGORITHM
1. Initialise the tree T by setting it to consist of one node containing
   all the examples, both +ve and -ve, in the training set.
2. If all the examples in T are +ve, create a YES node and HALT.
3. If all the examples in T are -ve, create a NO node and HALT.
4. Otherwise, select an attribute F with values v1, ..., vn.
   Partition T into subsets T1, ..., Tn according to the values of F.
   Create branches with F as parent and T1, ..., Tn as child nodes.
5. Apply the procedure recursively to each child node.
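A minimal sketch of these steps, assuming the `examples` list from the earlier sketch; it takes the attributes in whatever order it is given and assumes they always suffice to separate the classes (true for this data). The name `cls` and the dict-based tree are illustrative choices.

```python
def cls(examples, attributes):
    """Build a decision tree by the CLS steps above (no attribute ordering)."""
    labels = {e["Concept"] for e in examples}
    if labels == {"yes"}:
        return "YES"                      # step 2: all +ve
    if labels == {"no"}:
        return "NO"                       # step 3: all -ve
    f = attributes[0]                     # step 4: attributes taken in the given order
    rest = attributes[1:]
    tree = {"attribute": f, "branches": {}}
    for value in {e[f] for e in examples}:
        subset = [e for e in examples if e[f] == value]
        tree["branches"][value] = cls(subset, rest)   # step 5: recurse on each child
    return tree

# e.g. tree = cls(examples, ["Size", "Colour", "Shape"])
```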


Data Base Example
Using attribute SIZE

{1, 2, 3, 4, 5, 6, 7} split on SIZE:
  med   -> {1}            YES
  small -> {2, 3}         Expand
  large -> {4, 5, 6, 7}   Expand


Expanding

{1, 2, 3, 4, 5, 6, 7} split on SIZE:
  med   -> {1}  YES
  small -> {2, 3} split on SHAPE:
             wedge  -> {2}  NO
             sphere -> {3}  YES
  large -> {4, 5, 6, 7} split on SHAPE:
             wedge  -> {4}  NO
             pillar -> {5, 6} split on COLOUR:
                         red   -> {6}  NO
                         green -> {5}  YES
             sphere -> {7}  YES
Rules from Tree

IF (SIZE = large AND ((SHAPE = wedge) OR (SHAPE = pillar AND COLOUR = red)))
   OR (SIZE = small AND SHAPE = wedge)
THEN NO

IF (SIZE = large AND ((SHAPE = pillar AND COLOUR = green) OR (SHAPE = sphere)))
   OR (SIZE = small AND SHAPE = sphere)
   OR (SIZE = medium)
THEN YES
Disjunctive Normal Form - DNF

IF (SIZE = medium)
   OR (SIZE = small AND SHAPE = sphere)
   OR (SIZE = large AND SHAPE = sphere)
   OR (SIZE = large AND SHAPE = pillar AND COLOUR = green)
THEN CONCEPT = satisfied
ELSE CONCEPT = not satisfied
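The DNF rule translates directly into a predicate; a minimal sketch (the function name is an illustrative choice, and the value "med" is used for medium, as in the data table):

```python
def concept_satisfied(size, colour, shape):
    """DNF rule read off the CLS tree above."""
    return (size == "med"
            or (size == "small" and shape == "sphere")
            or (size == "large" and shape == "sphere")
            or (size == "large" and shape == "pillar" and colour == "green"))

# e.g. concept_satisfied("large", "green", "pillar") -> True  (example 5)
#      concept_satisfied("large", "red", "pillar")   -> False (example 6)
```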


ID3 - Quinlan

In the CLS algorithm the attributes can be chosen in any order. This can
result in large decision trees if the ordering is not optimal; an optimal
ordering would give the smallest decision tree.

No method is known for determining the optimal ordering, so we use a
heuristic that provides an efficient, near-optimal ordering.

ID3 = CLS + efficient ordering of attributes

Entropy is used to order the attributes.


Entropy

For a random variable V which can take values {v1, v2, ..., vn} with
Pr(vi) = pi for all i, the entropy of V is given by

    S(V) = − Σ_{i=1..n} pi ln(pi)

Entropy of a fair die:
    S = − Σ_{i=1..6} (1/6) ln(1/6) = ln(6) = 1.7917

Entropy of a fair die known to show an even score:
    S = − Σ_{i=1..3} (1/3) ln(1/3) = ln(3) = 1.0986

Information gain = 1.7917 − 1.0986 = 0.6931, the difference between the
two entropies.
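A quick numerical check of the die example (a minimal sketch; `math.log` here is the natural logarithm used on this slide):

```python
import math

def entropy(probs):
    """S = -sum p_i ln(p_i) for a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

print(entropy([1/6] * 6))                          # 1.7917... = ln(6), fair die
print(entropy([1/3] * 3))                          # 1.0986... = ln(3), die known to be even
print(entropy([1/6] * 6) - entropy([1/3] * 3))     # 0.6931... information gain
```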
Attribute Expansion

Start with a probability table over all the attributes and the target,
Pr(A1, ..., Ai, ..., An, T), with rows equally likely unless specified.

To expand attribute Ai with values ai1, ..., aim: for each value aik,
form the table

    Pr(A1, ..., Ai-1, Ai+1, ..., An, T | Ai = aik)

by passing down the probabilities of the rows with Ai = aik and
re-normalising. The rows are equally likely again if they were equally
likely before.
Expected Entropy for an Attribute

For attribute Ai and target T, form the table Pr(Ai, T) by summing out
the other attributes:

    Pr(Ai, T) = Σ_{A1} ... Σ_{Ai-1} Σ_{Ai+1} ... Σ_{An} Pr(A1, ..., An, T)

For each value aij of Ai, pass down the probabilities of the target
values tk corresponding to aij, re-normalise to obtain Pr(T | Ai = aij),
and compute the entropy

    S(aij) = − Σ_k Pr(tk | aij) ln Pr(tk | aij)

With  Pr(Ai) = Σ_T Pr(Ai, T),  the expected entropy for Ai is

    S(Ai) = Σ_j Pr(aij) S(aij)
How to Choose an Attribute and Information Gain

Determine the expected entropy for each attribute, i.e. S(Ai) for all i.
Choose s such that  MIN_j {S(Aj)} = S(As)  and expand attribute As.

By choosing attribute As the information gain is

    S − S(As)

where  S = − Σ_T Pr(T) ln Pr(T)  and  Pr(T) = Σ_{A1} ... Σ_{An} Pr(A1, ..., An, T).

Minimising expected entropy is equivalent to maximising information gain.
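A sketch of this attribute-selection step, assuming the `examples` list from the earlier sketch and equally likely examples; the function names are illustrative. Base-2 logarithms are used to match the numbers in the worked example that follows (the choice of base does not change which attribute is selected).

```python
import math
from collections import Counter

def entropy2(labels):
    """S = -sum p log2(p) over the observed target values."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def expected_entropy(rows, attribute, target="Concept"):
    """S(Ai) = sum_j Pr(a_ij) S(a_ij) with equally likely rows."""
    total = len(rows)
    s = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        s += (len(subset) / total) * entropy2(subset)
    return s

def best_attribute(rows, attributes, target="Concept"):
    """Choose the attribute with minimum expected entropy (maximum information gain)."""
    return min(attributes, key=lambda a: expected_entropy(rows, a, target))

# On the example data this gives gains of about 0.13 (Size), 0.52 (Colour)
# and 0.70 (Shape), so Shape would be chosen first:
#   s0 = entropy2([r["Concept"] for r in examples])          # ~0.99
#   s0 - expected_entropy(examples, "Shape")                 # ~0.70
#   best_attribute(examples, ["Size", "Colour", "Shape"])    # "Shape"
```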
Previous Example

Ex Num   Size    Colour   Shape    Concept Satisfied   Pr

1        med     blue     brick    yes                 1/7
2        small   red      wedge    no                  1/7
3        small   red      sphere   yes                 1/7
4        large   red      wedge    no                  1/7
5        large   green    pillar   yes                 1/7
6        large   red      pillar   no                  1/7
7        large   green    sphere   yes                 1/7

Target table (Concept Satisfied):  yes 4/7,  no 3/7

    S = −[(4/7) log(4/7) + (3/7) log(3/7)] = 0.99   (logs to base 2 here)
Entropy for attribute Size

Size    Concept Satisfied   Pr

med     yes                 1/7
small   no                  1/7
small   yes                 1/7
large   no                  2/7
large   yes                 2/7

Pr(med) = 1/7,  Pr(small) = 2/7,  Pr(large) = 4/7

Conditional tables for the target:
  med:    yes 1            -> S(med) = 0
  small:  no 1/2, yes 1/2  -> S(small) = 1
  large:  no 1/2, yes 1/2  -> S(large) = 1

S(Size) = (2/7)·1 + (1/7)·0 + (4/7)·1 = 6/7 = 0.86
Information gain for Size = 0.99 − 0.86 = 0.13
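The same numbers checked directly, a small sketch using base-2 logarithms consistent with S = 0.99 above:

```python
import math

s_target = -(4/7) * math.log2(4/7) - (3/7) * math.log2(3/7)   # ~0.985 -> 0.99
s_size = (1/7) * 0 + (2/7) * 1 + (4/7) * 1                    # 6/7 ~ 0.857 -> 0.86
print(round(s_target, 2), round(s_size, 2), round(s_target - s_size, 2))
# 0.99 0.86 0.13  -> information gain for Size
```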
First Expansion

Attribute   Information Gain
SIZE        0.13
COLOUR      0.52
SHAPE       0.70   <- maximum, so choose SHAPE

{1, 2, 3, 4, 5, 6, 7} split on SHAPE:
  wedge  -> {2, 4}   NO
  brick  -> {1}      YES
  pillar -> {5, 6}   Expand
  sphere -> {3, 7}   YES


Complete Decision Tree

{1, 2, 3, 4, 5, 6, 7} split on SHAPE:
  wedge  -> {2, 4}  NO
  brick  -> {1}     YES
  sphere -> {3, 7}  YES
  pillar -> {5, 6} split on COLOUR:
              red   -> {6}  NO
              green -> {5}  YES

Rule:
IF (Shape is wedge) OR (Shape is pillar AND Colour is red)
THEN NO
ELSE YES
A new case

Size   Colour   Shape    Concept Satisfied
med    red      pillar   ?

Follow the tree: SHAPE = pillar, then COLOUR = red, so ? = NO
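The same look-up written as the rule from the completed tree (a sketch; the function name is illustrative, and Size is never tested by this tree):

```python
def id3_rule(size, colour, shape):
    """Rule read off the ID3 tree: wedge, or a red pillar, means NO."""
    if shape == "wedge" or (shape == "pillar" and colour == "red"):
        return "no"
    return "yes"

print(id3_rule("med", "red", "pillar"))   # "no" - the new case above
```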
Post Pruning

Consider any node S containing N examples, n of which belong to class C,
where C is the class with the most examples at S (the majority class,
one of {YES, NO}).

Suppose we terminate this node and make it a leaf with classification C.
What is the expected error, E(S), if we use the tree for new cases and
reach this node?

    E(S) = Pr(class of new case ≠ C)


Bayes Updating for Post Pruning

Let p denote the probability of class C for a new case arriving at S.
We do not know p. Let f(p) be a prior probability distribution for p on
[0, 1]. We can update this prior using Bayes' updating with the
information at node S. The information at node S is: n cases of class C in S.

    f(p | n C in S) = Pr(n C in S | p) f(p) / ∫₀¹ Pr(n C in S | p) f(p) dp
Mathematics of Post Pruning

Assume f(p) to be uniform over [0, 1]. Then

    f(p | n C in S) = p^n (1−p)^(N−n) / ∫₀¹ p^n (1−p)^(N−n) dp

The integrals are evaluated using Beta functions:

    ∫₀¹ x^a (1−x)^b dx = a! b! / (a + b + 1)!

for example  ∫₀¹ p^n (1−p)^(N−n+1) dp = n! (N − n + 1)! / (N + 2)!

The expected error is the posterior expectation of (1 − p):

    E(S) = E_{f(p | n C in S)} [1 − p]
         = ∫₀¹ p^n (1−p)^(N−n+1) dp / ∫₀¹ p^n (1−p)^(N−n) dp
         = (N − n + 1) / (N + 2)
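A one-line check of the resulting formula (a sketch; the function name is an illustrative choice):

```python
def expected_error(n, N):
    """E(S) = (N - n + 1) / (N + 2) for a node with N examples, n in the majority class."""
    return (N - n + 1) / (N + 2)

print(expected_error(6, 10))   # 0.4166... - the root node [6, 4] in the later example
print(expected_error(4, 6))    # 0.375    - the node [4, 2]
```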
Post Pruning for Binary Case

For a leaf node Si:    Error(Si) = E(Si)

For any non-leaf node S with children S1, ..., Sm reached with
probabilities

    Pi = (number of examples in Si) / (number of examples in S)

we can calculate

    BackUpError(S) = Σ_i Pi Error(Si)
    Error(S) = MIN { E(S), BackUpError(S) }

Decision: prune at S if BackUpError(S) ≥ E(S), i.e. if Error(S) = E(S).
Pruning means cutting the sub-tree below S and making S a leaf.
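A recursive sketch of this bottom-up pruning pass under the assumptions above; the node representation (dicts with "yes"/"no" counts and an optional "children" list) and the function names are illustrative choices, not from the slides.

```python
def expected_error(yes, no):
    """E(S) = (N - n + 1) / (N + 2), with n the majority count (binary case)."""
    n, N = max(yes, no), yes + no
    return (N - n + 1) / (N + 2)

def prune(node):
    """Return (Error(S), size) for node, cutting sub-trees where BackUpError >= E(S).

    A leaf is a dict {"yes": int, "no": int}; an internal node also has
    "children": [subnodes]. The node is modified in place when pruned.
    """
    e = expected_error(node["yes"], node["no"])
    size = node["yes"] + node["no"]
    if "children" not in node:
        return e, size                   # leaf: Error(Si) = E(Si)
    child_results = [prune(c) for c in node["children"]]
    backup = sum((s / size) * err for err, s in child_results)
    if backup >= e:                      # pruning does not increase expected error
        del node["children"]             # make S a leaf labelled with its majority class
        return e, size
    return backup, size                  # keep the sub-tree; Error(S) = BackUpError(S)
```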
Example of Post Pruning
Before Pruning [x, y]
0.417 a means x YES cases
0.378 [6, 4] and y NO cases
We underline Error(Sk)
0.375 b 0.5 c
0.413 [4, 2] PRUNE 0.383 [2, 2]

[1, 0]
[3, 2] [1, 0] d 0.333
0.4
0.429 0.333 0.444 [1, 2] PRUNE
PRUNE means cut the
[1, 1] [0, 1] sub- tree below this point
0.5 0.333
Result of Pruning

After pruning:

a [6, 4]
  b [4, 2]   (now a leaf)
  c [2, 2]
    [1, 0]
    d [1, 2]   (now a leaf)
Generalisation

For the case in which we have k classes, the generalisation of E(S) is

    E(S) = (N − n + k − 1) / (N + k)

Otherwise the pruning method is the same.
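The same check extended with k as a parameter (a sketch; k = 2 recovers the binary-case formula):

```python
def expected_error_k(n, N, k):
    """E(S) = (N - n + k - 1) / (N + k) for k classes, n in the majority class."""
    return (N - n + k - 1) / (N + k)

print(expected_error_k(4, 6, 2))   # 0.375, same as the binary case
print(expected_error_k(4, 6, 3))   # 0.444..., three classes
```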


Testing

Split the database into a Training Set and a Test Set.

Learn rules using the Training Set and prune. Test the rules on the
Training Set and record the % correct, then test the rules on the Test
Set and record the % correct.

The % accuracy on the test set should be close to that on the training
set; this indicates good generalisation.
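A sketch of that procedure; `build_and_prune` and `classify` stand for whatever learner and rule-application functions are used (illustrative parameters, not defined in the slides), and rows are assumed to look like the `examples` list defined earlier.

```python
import random

def accuracy(tree, rows, classify):
    """Fraction of rows whose predicted Concept matches the recorded one."""
    correct = sum(1 for r in rows if classify(tree, r) == r["Concept"])
    return correct / len(rows)

def train_test_evaluation(rows, build_and_prune, classify, train_fraction=0.7):
    """Split the data, learn and prune on the training part, compare accuracies."""
    rows = rows[:]                     # shuffle a copy, leave the original intact
    random.shuffle(rows)
    split = int(train_fraction * len(rows))
    training, test = rows[:split], rows[split:]
    tree = build_and_prune(training)
    return accuracy(tree, training, classify), accuracy(tree, test, classify)
```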

Over-fitting can occur if noisy data is used or too specific attributes
are used. Pruning will overcome noise to some extent but not completely.
Too specific attributes must be dropped.
