Yggdrasil Decision Forests (YDF) is a production-grade collection of algorithms for the training, serving, and interpretation of decision forest models. YDF is open source and is available as a C++ library, as a command-line interface (CLI), in TensorFlow (under the name TensorFlow Decision Forests; TF-DF), in JavaScript (inference only), and in Go (inference only). YDF is supported on Linux, Windows, macOS, Raspberry Pi, and Arduino (experimental).
To learn more about YDF, check the documentation 📕.
For details about YDF design, read our KDD 2023 paper Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library.
- Random Forest, Gradient Boosted Trees, CART, and variations such as DART and extremely randomized trees.
- Classification, regression, ranking and uplifting.
- Model evaluation, e.g. accuracy, AUC, ROC, AUUC, PR-AUC, confidence boundaries, NDCG.
- Model analysis, e.g. PDP, CEP, variable importance, model plotting, structure analysis.
- Native support for numerical, categorical, boolean, categorical-set (e.g. text) features.
- Native support for missing values.
- State-of-the-art tree learning features, e.g. oblique splits, honest trees, Hessian score, global tree optimization.
- Distributed training.
- Automatic hyper-parameter tuning.
- Fast model inference e.g. vpred, quick-scorer extended.
- Cross-compatible API and models: C++, CLI, Go, JavaScript and Python.
See the feature list for more details.
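To make two of the evaluation metrics named above concrete, here is a minimal pure-Python sketch of accuracy and ROC AUC for a binary classifier. It is only an illustration of the definitions, not YDF's implementation; the function names `accuracy` and `roc_auc` are hypothetical, and the pairwise AUC computation is quadratic, so it suits small examples only.

```python
def accuracy(labels, predictions):
    """Fraction of examples whose predicted class equals the true class."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and aligned")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Computed as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example,
    counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For instance, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` compares four positive/negative pairs, three of which are ranked correctly.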
With the CLI you can train, evaluate, and benchmark the speed of a model as follows:
# Download YDF.
wget https://github.com/google/yggdrasil-decision-forests/releases/download/1.0.0/cli_linux.zip
unzip cli_linux.zip
# Create a training configuration
echo 'label:"my_label" learner:"RANDOM_FOREST" ' > config.pbtxt
# List columns in training dataset
infer_dataspec --dataset="csv:train.csv" --output="spec.pbtxt"
# Train model
train --dataset="csv:train.csv" --dataspec="spec.pbtxt" --config="config.pbtxt" --output="my_model"
# Evaluate model
evaluate --dataset="csv:test.csv" --model="my_model" > evaluation.txt
# Benchmark the speed of the model
benchmark_inference --dataset="csv:test.csv" --model="my_model" > benchmark.txt
(based on examples/beginner.sh)
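The commands above assume that `train.csv` and `test.csv` already exist and contain a `my_label` column. If you only want to try the workflow, a small synthetic dataset can be generated with the Python standard library, as in the sketch below. The feature columns (`age`, `income`, `color`), the labeling rule, and the helper name `write_toy_dataset` are hypothetical and chosen only for illustration.

```python
import csv
import random


def write_toy_dataset(path, num_rows, seed=0):
    """Writes a small synthetic classification dataset to a CSV file.

    Only the "my_label" column name comes from the example configuration;
    the feature columns are made up for this sketch.
    """
    rng = random.Random(seed)
    header = ["age", "income", "color", "my_label"]
    rows = []
    for _ in range(num_rows):
        age = rng.randint(18, 80)
        income = round(rng.uniform(10_000, 100_000), 2)
        color = rng.choice(["red", "green", "blue"])
        # Arbitrary rule so that the label can be learned from the features.
        label = "yes" if (age > 40 and income > 50_000) else "no"
        rows.append([age, income, color, label])
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return header, rows


if __name__ == "__main__":
    # Different seeds so that the training and test sets are not identical.
    write_toy_dataset("train.csv", num_rows=1000, seed=1)
    write_toy_dataset("test.csv", num_rows=300, seed=2)
```

Running the script in the directory where the CLI binaries were unzipped produces the two files that the commands read.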
The same model can be trained in C++ as follows:
auto dataset_path = "csv:train.csv";
// List columns in training dataset
DataSpecification spec;
CreateDataSpec(dataset_path, false, {}, &spec);
// Create a training configuration
TrainingConfig train_config;
train_config.set_learner("RANDOM_FOREST");
train_config.set_task(Task::CLASSIFICATION);
train_config.set_label("my_label");
// Train model
std::unique_ptr<AbstractLearner> learner;
GetLearner(train_config, &learner);
auto model = learner->Train(dataset_path, spec);
// Export model
SaveModel("my_model", model.get());
(based on examples/beginner.cc)
The same model can be trained in Python using TensorFlow Decision Forests as follows:
import tensorflow_decision_forests as tfdf
import pandas as pd
# Load dataset in a Pandas dataframe.
train_df = pd.read_csv("project/train.csv")
# Convert dataset into a TensorFlow dataset.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="my_label")
# Train model
model = tfdf.keras.RandomForestModel()
model.fit(train_ds)
# Export model.
model.save("project/model")
(see TensorFlow Decision Forests)
Yggdrasil Decision Forests powers TensorFlow Decision Forests.
The following resources are available:
If you use Yggdrasil Decision Forests in a scientific publication, please cite the following paper: Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library.
BibTeX
@inproceedings{GBBSP23,
author = {Mathieu Guillame{-}Bert and
Sebastian Bruch and
Richard Stotz and
Jan Pfeifer},
title = {Yggdrasil Decision Forests: {A} Fast and Extensible Decision Forests
Library},
booktitle = {Proceedings of the 29th {ACM} {SIGKDD} Conference on Knowledge Discovery
and Data Mining, {KDD} 2023, Long Beach, CA, USA, August 6-10, 2023},
pages = {4068--4077},
year = {2023},
url = {https://doi.org/10.1145/3580305.3599933},
doi = {10.1145/3580305.3599933},
}
Raw
Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library, Guillame-Bert et al., KDD 2023: 4068-4077. doi:10.1145/3580305.3599933
Yggdrasil Decision Forests and TensorFlow Decision Forests are developed by:
- Mathieu Guillame-Bert (gbm AT google DOT com)
- Jan Pfeifer (janpf AT google DOT com)
- Sebastian Bruch (sebastian AT bruch DOT io)
- Richard Stotz (richardstotz AT google DOT com)
- Arvind Srinivasan (arvnd AT google DOT com)
Contributions to TensorFlow Decision Forests and Yggdrasil Decision Forests are welcome. If you want to contribute, check the contribution guidelines.