Handout9 Trees Bagging Boosting
This handout provides an overview of decision trees and the ensemble methods bagging and boosting, with applications in machine learning. It covers decision trees, random forests, AdaBoost, and gradient boosting. A key advantage of ensemble methods is that they can fit complex data while the weaknesses of the individual models cancel out; decision trees are prone to overfitting, and bagging and boosting help address this.
FIE453 – Big Data with Applications to Finance
Trees, Bagging, Boosting
Who? Walt Pohl
From? Norwegian School of Economics, Department of Finance
When? September 26, 2022
Decision Trees

Decision trees are perhaps the easiest supervised learning technique. Pick a variable and split the data into two groups along that variable. Within each group, pick another variable and split the data into two subgroups, and so on. Once you have a bunch of groups, use the average over each group as your prediction. You can picture this as a tree.

Example: Titanic Survival

The example fits a tree to Titanic passenger data to predict who survived. (sibsp, one of the predictors, is the number of siblings or spouses also on the boat.)
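A minimal sketch of such a tree in R with the rpart package follows; the data frame titanic and the column names used here (survived, pclass, sex, age, sibsp) are assumptions for illustration, not part of the handout.

    # A minimal sketch of a classification tree with rpart. The data frame
    # `titanic` and its columns (survived, pclass, sex, age, sibsp) are assumed.
    library(rpart)

    fit <- rpart(survived ~ pclass + sex + age + sibsp,
                 data    = titanic,
                 method  = "class",                     # classification tree
                 control = rpart.control(cp = 0.02))    # penalize tree size to limit overfitting

    print(fit)              # the splits and the group-level predictions
    plot(fit); text(fit)    # draw the tree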
Loss and Regularization

You choose the splits to minimize the loss function. To regularize, you penalize based on the size of the tree. (This is very important: decision trees are very prone to overfitting.) Doing this exactly is hard, so algorithms use heuristics.

Pros and Cons of Decision Trees

Pros: Can in principle fit an arbitrary nonlinear relationship. Very easy to interpret the output.
Cons: An arbitrary nonlinear relationship can require an arbitrarily large tree. Very sensitive to outliers: one observation can change an early split, which makes all of the later splits completely different.

Ensemble Methods

Ensemble methods combine the results from multiple supervised learning techniques into one. Weaknesses in the individual methods can cancel out when they are combined. How the individual techniques are combined depends on the type of problem. For example: regression – average the predictions together; classification – take a majority vote.
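A minimal sketch of these two combination rules, assuming a matrix preds with one row per observation and one column per learner's prediction:

    # Combining an ensemble's predictions: `preds` is an assumed matrix with
    # one row per observation and one column per learner.

    # Regression: average the predictions together.
    combined_reg <- rowMeans(preds)

    # Classification (predictions stored as class labels): majority vote per row.
    combined_class <- apply(preds, 1, function(p) names(which.max(table(p))))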
Types of Ensemble Methods

We consider two main types of ensemble methods: bagging and boosting. Decision trees are frequently used as the raw methods to combine. We refer to each individual method as a “learner”.
Bagging

Bootstrap aggregation (bagging) is a bootstrap-based method. The idea is very simple: start with N different learners; for each learner, choose a random sample from the data and apply the learner; then combine the results.

Random Forests

A refined version of this idea applied to decision trees is called a random forest. The model consists of a series of trees, and each individual tree is fitted as follows: at each split point, randomly throw out some of the variables and some of the data, and choose the split on that subset of the data. At each new split point, rerandomize. Any mistakes you make from randomizing get washed out when you average the trees.
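A minimal sketch of bagging regression trees by hand, followed by the off-the-shelf randomForest call; the data frames df and df_new and the numeric response column y are assumptions for illustration:

    # Bagging by hand: fit one tree per bootstrap sample, then average.
    library(rpart)
    library(randomForest)

    set.seed(1)
    B <- 100
    trees <- vector("list", B)
    for (b in 1:B) {
      boot <- df[sample(nrow(df), replace = TRUE), ]   # bootstrap sample of the data
      trees[[b]] <- rpart(y ~ ., data = boot)          # fit one learner per sample
    }
    # Combine: average the B tree predictions for each new observation
    pred_bag <- rowMeans(sapply(trees, predict, newdata = df_new))

    # A random forest additionally resamples the candidate variables at each split
    rf <- randomForest(y ~ ., data = df, ntree = 500)
    pred_rf <- predict(rf, newdata = df_new)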
Regularizing Random Forests

Random forests are mostly self-regularizing. The fact that you are always working with a subset of the data helps prevent overfitting, and averaging the models cancels out accidental overfits, even if you use big trees.

Boosting

Boosting results from a famous theorem in computer science. A weak learner is a learner that is only slightly better than random guessing. Is there an algorithm to turn a weak learner into a strong learner? Yes. Any technique that does so is known as boosting. Bagging reduces variance (by averaging); boosting reduces bias.
Boosting, cont’d

How does boosting accomplish this trick? Start with a learner. Look at the mistakes that learner makes. Make a new learner that diagnoses the mistakes of the first learner. Look at the mistakes that the first two learners make. Repeat.

AdaBoost

AdaBoost is a simple classification algorithm that turns weak learners into strong learners by iteratively reweighting the sample, so each subsequent learner pays more attention to the cases that the earlier learners found hard to classify. The weak learners can even be “decision stumps” – decision trees with only one split.

AdaBoost Algorithm

Each observation has a weight, as does each learner. The observation weights start out equal. The algorithm works as follows:
1. Fit the m-th learner on the weighted data and compute the predictions it makes.
2. Compute how many mistakes it makes; observations with more weight count for more.
3. Choose the weight of the learner so that a learner which makes few mistakes gets a high weight.
4. Update the observation weights so that the observations it has trouble with count for more.
The final predictor is the weighted sum of the individual learners.
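A minimal from-scratch sketch of these steps (the AdaBoost.M1 variant) with rpart decision stumps; the data frame train, a factor response y with levels -1 and 1, and the new data passed to the prediction function are assumptions for illustration:

    # A from-scratch AdaBoost.M1 sketch with one-split trees as weak learners.
    library(rpart)

    adaboost <- function(train, M = 50) {
      n  <- nrow(train)
      yv <- as.numeric(as.character(train$y))       # labels as -1/+1
      w  <- rep(1 / n, n)                           # observation weights start equal
      stumps <- vector("list", M)
      alpha  <- numeric(M)
      for (m in 1:M) {
        # 1. Fit the m-th learner (a stump) on the weighted data
        stumps[[m]] <- rpart(y ~ ., data = train, weights = w, method = "class",
                             control = rpart.control(maxdepth = 1, cp = 0, minsplit = 2))
        pred <- as.numeric(as.character(predict(stumps[[m]], train, type = "class")))
        miss <- as.numeric(pred != yv)
        # 2. Weighted error rate: heavier observations count for more
        err <- sum(w * miss) / sum(w)
        # 3. Learner weight: few mistakes => high weight
        alpha[m] <- log((1 - err) / err)
        # 4. Upweight the observations this learner got wrong
        w <- w * exp(alpha[m] * miss)
        w <- w / sum(w)
      }
      list(stumps = stumps, alpha = alpha)
    }

    # Final predictor: weighted vote (sign of the weighted sum) of the stumps
    predict_adaboost <- function(fit, newdata) {
      scores <- sapply(seq_along(fit$stumps), function(m) {
        fit$alpha[m] *
          as.numeric(as.character(predict(fit$stumps[[m]], newdata, type = "class")))
      })
      sign(rowSums(scores))
    }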
Why Does AdaBoost Work?

The steps in the algorithm do two things: data points that are hard to classify are given more weight, and learners that do well count for more.
Gradient Boosting

Gradient boosting (or the gradient boosting machine) is the leading general-purpose machine learning algorithm. For example, most winning entries on Kaggle – a machine learning competition site – use gradient boosting. AdaBoost is restricted to classification; gradient boosting is suitable for combining arbitrary learners.
The idea is simple: use the first learner to predict the data; compute the “residuals” from the first learner, and use the second learner to predict the residuals; and so on.
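A minimal sketch of this residual-fitting loop for squared-error loss; the data frames df and df_new are assumptions for illustration, and the learning rate nu is standard in practice even though it is not discussed above:

    # Gradient boosting by hand for squared-error loss: each small tree is fit
    # to the residuals of the model built so far.
    library(rpart)

    M  <- 200                             # number of small trees
    nu <- 0.1                             # learning rate (shrinkage)
    f  <- rep(mean(df$y), nrow(df))       # start from a constant prediction
    trees <- vector("list", M)

    for (m in 1:M) {
      resid <- df$y - f                   # residuals of the current model
      dat   <- df; dat$y <- resid
      trees[[m]] <- rpart(y ~ ., data = dat,
                          control = rpart.control(maxdepth = 1))  # a stump fit to the residuals
      f <- f + nu * predict(trees[[m]], df)    # take a small step toward the new tree
    }

    # Prediction on new data: the constant plus the sum of all (shrunken) tree predictions
    pred <- mean(df$y) + nu * rowSums(sapply(trees, predict, newdata = df_new))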
Output of gradient boosting

You fit a sequence of (small) decision trees: the first tree fits the data, and the second, third, etc. trees each fit the error left by the trees fitted so far. The final model combines all of the trees, as in AdaBoost.
Putting the “gradient” in gradient boosting

The challenge is to define a “residual” for each possible loss function. Gradient boosting uses the gradient of the loss function on the data – this points in the direction of best improvement. (For regression with squared-error loss, this is exactly the residuals returned by R.)
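To see why the two coincide for regression, take the squared-error loss and differentiate with respect to the model's prediction (a brief check of the claim, in the usual notation where f(x_i) is the current prediction for observation i):

    L(y_i, f(x_i)) = (1/2) (y_i - f(x_i))^2
    -dL/df(x_i)    = y_i - f(x_i)

So the negative gradient of the loss at each data point is exactly the ordinary residual, and fitting a tree to the gradient generalizes fitting a tree to the residuals.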
Gradient Boosting Algorithm

Gradient boosting can be viewed as a form of gradient descent. You have a measure of predictive accuracy, called the loss. At each step you compute the gradient of the loss and fit a tree to that gradient. You then take a step in the direction of the tree. The final model is a sum of tree predictions.

Regularization of Boosting

Boosting can overfit. Each individual tree tends to be small (gradient-boosted trees use stumps by default), but at the same time each new tree tries to fit more and more hard-to-match aspects of the data. The main way to regularize is to cut back on the number of trees. This makes regularization easy to implement – just look at what happens when you throw out the later trees.
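A minimal sketch of this with the gbm package, using cross-validation to decide how many of the later trees to throw out; the data frames df and df_new are assumptions for illustration:

    # Gradient boosting with gbm, regularized by choosing the number of trees.
    library(gbm)

    fit <- gbm(y ~ ., data = df,
               distribution = "gaussian",   # squared-error loss
               n.trees = 5000,              # fit many small trees...
               interaction.depth = 1,       # ...each one a stump
               shrinkage = 0.01,
               cv.folds = 5)

    best <- gbm.perf(fit, method = "cv")    # number of trees with the lowest CV error

    # Regularize by throwing out the later trees: predict with only `best` of them
    pred <- predict(fit, newdata = df_new, n.trees = best)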
Boosting Implementations

There are several gradient boosting implementations, which are worth comparing:
gbm – the original implementation.
xgboost – the most high-profile implementation.
lightgbm – by Microsoft.
catboost – specialized to classification problems; originally by Yandex.
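For concreteness, a minimal sketch with xgboost's classic matrix interface (argument names have shifted across package versions, so treat this as indicative); the predictor matrix X, response vector y, and new data X_new are assumptions for illustration:

    # Gradient boosting with xgboost on a numeric predictor matrix.
    library(xgboost)

    fit <- xgboost(data = as.matrix(X), label = y,
                   nrounds = 200,                   # number of boosting rounds (trees)
                   max_depth = 3, eta = 0.1,        # small trees, modest learning rate
                   objective = "reg:squarederror",
                   verbose = 0)

    pred <- predict(fit, as.matrix(X_new))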
Pros and Cons of Bagging and Boosting

Pros: They fit complex data much better.
Cons: While we can understand the individual trees, the whole model is an opaque black box.