Data Mining Using Decision Trees: Professor J. F. Baldwin
Decision Trees from a Database
[Training-set table: examples (Ex Num) 1-7, each described by the attributes Size, Colour and Shape, with the target concept Satisfied.]
Tree Structure
[Diagram: an attribute node branches on its values v1, v2, v3 to children nodes.]
CLS ALGORITHM
1. Initialise the tree T by setting it to consist of one node containing all the examples, both +ve and -ve, in the training set.
2. If all the examples in T are +ve, create a YES node and HALT.
3. If all the examples in T are -ve, create a NO node and HALT.
4. Otherwise, select an attribute A with values v1, ..., vn, partition T into subsets T1, ..., Tn according to those values, create a branch for each value, and apply the procedure recursively to each subset.
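A minimal sketch of this recursion in Python (not from the original slides): the training set is assumed to be a list of (attribute-dict, label) pairs, and the attribute-selection rule is passed in as a function (the later slides use minimum expected entropy). All names are illustrative.

```python
def cls(examples, attributes, choose_attribute):
    """Build a decision tree from (features, label) pairs, label in {"YES", "NO"}."""
    labels = {label for _, label in examples}
    if labels == {"YES"}:                      # step 2: all examples +ve
        return "YES"
    if labels == {"NO"}:                       # step 3: all examples -ve
        return "NO"
    if not attributes:                         # mixed node, nothing left to split on
        return max(labels, key=lambda l: sum(1 for _, x in examples if x == l))
    # step 4: expand the chosen attribute and recurse on each value's subset
    attr = choose_attribute(examples, attributes)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for value in {feats[attr] for feats, _ in examples}:
        subset = [(f, l) for f, l in examples if f[attr] == value]
        branches[value] = cls(subset, remaining, choose_attribute)
    return {"attribute": attr, "branches": branches}
```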
[Decision tree built from the example set {1, 2, 3, 4, 5, 6, 7} by expanding SIZE first:
SIZE = medium -> {1}: YES
SIZE = small -> {2, 3}: expand SHAPE
    SHAPE = wedge -> {2}: NO
    SHAPE = sphere -> {3}: YES
SIZE = large -> {4, 5, 6, 7}: expand SHAPE
    SHAPE = wedge -> {4}: NO
    SHAPE = sphere -> {7}: YES
    SHAPE = pillar -> {5, 6}: expand COLOUR
        COLOUR = red -> {6}: NO
        COLOUR = green -> {5}: YES]
Rules from Tree
IF (SIZE = large AND ((SHAPE = wedge) OR (SHAPE = pillar AND COLOUR = red)))
OR (SIZE = small AND SHAPE = wedge)
THEN NO

IF (SIZE = large AND ((SHAPE = pillar AND COLOUR = green) OR (SHAPE = sphere)))
OR (SIZE = small AND SHAPE = sphere)
OR (SIZE = medium)
THEN YES
Disjunctive Normal Form - DNF
IF
(SIZE = medium)
OR
(SIZE = small AND SHAPE = sphere)
OR
(SIZE = large AND SHAPE = sphere)
OR
(SIZE = large AND SHAPE = pillar AND COLOUR = green)
THEN CONCEPT = satisfied
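Read as a predicate, the DNF rule is a few lines of Python (a sketch; attribute values are passed as lowercase strings, names illustrative):

```python
def concept_satisfied(size, shape, colour):
    """DNF rule read off the decision tree above."""
    return (size == "medium"
            or (size == "small" and shape == "sphere")
            or (size == "large" and shape == "sphere")
            or (size == "large" and shape == "pillar" and colour == "green"))

print(concept_satisfied("large", "pillar", "green"))   # True  (example 5)
print(concept_satisfied("large", "pillar", "red"))     # False (example 6)
```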
Entropy for a fair die: $S = -\sum_{i=1}^{6} \frac{1}{6}\ln\!\left(\frac{1}{6}\right) = \ln(6) = 1.7917$

Entropy for a fair die known to show an even score: $S = -\sum_{i=1}^{3} \frac{1}{3}\ln\!\left(\frac{1}{3}\right) = \ln(3) = 1.0986$
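Both values can be checked with a couple of lines of Python (natural logarithm, as in the slides):

```python
import math

def entropy(probs):
    """S = -sum_i p_i ln(p_i), skipping zero-probability outcomes."""
    return -sum(p * math.log(p) for p in probs if p > 0)

print(entropy([1/6] * 6))   # 1.791759... = ln(6)
print(entropy([1/3] * 3))   # 1.098612... = ln(3)
```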
Attributes
For an attribute $A_i$ with values $a_{i1}, \ldots, a_{im}$, each value $a_{ik}$ restricts the training set and gives a conditional distribution $\Pr(T \mid A_i = a_{ik})$ over the target $T$, with entropy $S(a_{ik})$.

$\Pr(A_i) = \sum_T \Pr(A_i, T)$

Expected entropy for $A_i$: $S(A_i) = \sum_k \Pr(a_{ik})\, S(a_{ik})$
How to choose an attribute: information gain
Determine the expected entropy for each attribute, i.e. $S(A_i)$ for all $i$.
Choose $s$ such that $\min_j S(A_j) = S(A_s)$.
Expand attribute $A_s$.
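A sketch of both steps in Python, assuming each attribute is supplied as a column of values parallel to the target column (function and variable names are illustrative, not from the slides):

```python
import math
from collections import Counter, defaultdict

def expected_entropy(attribute_values, targets):
    """S(A_i) = sum_k Pr(a_ik) S(a_ik): the target entropy within each
    attribute value, weighted by the probability of that value."""
    groups = defaultdict(list)
    for value, target in zip(attribute_values, targets):
        groups[value].append(target)
    total = len(targets)
    s = 0.0
    for group in groups.values():
        probs = [c / len(group) for c in Counter(group).values()]
        s += (len(group) / total) * -sum(p * math.log(p) for p in probs if p > 0)
    return s

def choose_attribute(columns, targets):
    """Pick A_s with minimum expected entropy (maximum information gain)."""
    return min(columns, key=lambda name: expected_entropy(columns[name], targets))
```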
[Tree diagram: the example set {1, 2, 3, 4, 5, 6, 7} split on SHAPE at the root (values wedge, brick, pillar, sphere), with the pillar branch further split on COLOUR (red, ...); a branch receiving no training examples is labelled ? = NO.]
Post Pruning
Consider any node S containing N examples. Let C be the class with the most examples in S (the majority class), where C is one of {YES, NO}, and let n be the number of cases of C in S. Let p be the (unknown) probability that an example reaching S belongs to C, with prior density f(p); E(S) will denote the expected error at S. The posterior density of p is

$$f(p \mid n\,C\text{ in }S) = \frac{\Pr(n\,C\text{ in }S \mid p)\, f(p)}{\int_0^1 \Pr(n\,C\text{ in }S \mid p)\, f(p)\, dp}$$
Mathematics of Post Pruning
Assume f(p) to be uniform over [0, 1]. Then

$$f(p \mid n\,C\text{ in }S) = \frac{p^{n}(1-p)^{N-n}}{\int_0^1 p^{n}(1-p)^{N-n}\, dp}$$

The integrals are evaluated using Beta functions, e.g.

$$\int_0^1 p^{n}(1-p)^{N-n+1}\, dp = \frac{n!\,(N-n+1)!}{(N+2)!}$$

The expected error at S is $E(S) = E(1-p)$ under $f(p \mid n\,C\text{ in }S)$:

$$E(S) = \frac{\int_0^1 p^{n}(1-p)^{N-n+1}\, dp}{\int_0^1 p^{n}(1-p)^{N-n}\, dp} = \frac{N-n+1}{N+2}$$
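A direct implementation of this estimate (binary case); e.g. a node with class counts [3, 2] gives (5 - 3 + 1)/(5 + 2) ≈ 0.429, a value that reappears in the worked example below:

```python
def static_error(n, N):
    """E(S) = (N - n + 1) / (N + 2): expected error at a node with N examples,
    n of which belong to the majority class (two-class case)."""
    return (N - n + 1) / (N + 2)

print(static_error(3, 5))   # 0.4285...
print(static_error(2, 3))   # 0.4
```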
Post Pruning for Binary Case
For leaf nodes $S_i$: $\text{Error}(S_i) = E(S_i)$.
For an internal node S with child nodes $S_1, \ldots, S_m$:
$P_i$ = (number of examples in $S_i$) / (number of examples in S)
$\text{BackUpError}(S) = \sum_i P_i\, \text{Error}(S_i)$
$\text{Error}(S) = \min\{E(S),\ \text{BackUpError}(S)\}$
Prune the sub-tree below S if $E(S) \le \text{BackUpError}(S)$.
[Worked example: each node is labelled with its class counts and error estimate. The node with counts [1, 2] has E = 0.4; its children [1, 1] and [0, 1] have errors 0.5 and 0.333, so BackUpError = (2/3)(0.5) + (1/3)(0.333) = 0.444. Since 0.4 < 0.444, the sub-tree below this node is PRUNEd. Its parent has counts [3, 2] with E = 0.429, and further children [1, 0] with error 0.333. PRUNE means cut the sub-tree below this point.]
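The whole rule can be sketched as one recursion over the tree; the node representation below (a dict with class counts and optional children) is illustrative, and the example reproduces the numbers above:

```python
def prune(node):
    """Return Error(node) = min(E(node), BackUpError(node)), cutting sub-trees in place."""
    n, N = max(node["counts"]), sum(node["counts"])
    e_static = (N - n + 1) / (N + 2)              # E(S) from the previous slides
    if "children" not in node:
        return e_static                           # leaf: Error(S_i) = E(S_i)
    # weight each child's error by P_i = |S_i| / |S|
    backup = sum(sum(c["counts"]) / N * prune(c) for c in node["children"])
    if e_static <= backup:
        del node["children"]                      # PRUNE: cut the sub-tree below S
        return e_static
    return backup

# node with counts [1, 2] and children [1, 1], [0, 1], as in the example
node = {"counts": [1, 2], "children": [{"counts": [1, 1]}, {"counts": [0, 1]}]}
print(prune(node))         # 0.4 (< BackUpError 0.444..., so the children are cut)
print("children" in node)  # False
```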
Result of Pruning
[Tree after pruning: node a with counts [6, 4] and node c with counts [4, 2]; the remaining nodes carry counts [2, 2], [1, 0] and [1, 2].]
Generalisation
For k classes, the expected error at a node becomes

$$E(S) = \frac{N - n + k - 1}{N + k}$$

which reduces to $(N - n + 1)/(N + 2)$ in the binary case $k = 2$.
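The k-class form in one line; with k = 2 it reproduces the binary estimate (a sketch, names illustrative):

```python
def static_error_k(n, N, k):
    """E(S) = (N - n + k - 1) / (N + k) for N examples, n in the majority
    class, and k possible classes."""
    return (N - n + k - 1) / (N + k)

print(static_error_k(3, 5, 2))   # 0.4285..., same as the binary case
print(static_error_k(3, 5, 3))   # 0.5 with three classes
```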