
A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models.


Yggdrasil Decision Forests (YDF) is a production-grade collection of algorithms for the training, serving, and interpretation of decision forest models. YDF is open-source and is available in C++, command-line interface (CLI), TensorFlow (under the name TensorFlow Decision Forests; TF-DF), JavaScript (inference only), and Go (inference only). YDF is supported on Linux, Windows, macOS, Raspberry Pi, and Arduino (experimental).

To learn more about YDF, check the documentation 📕.

For details about YDF design, read our KDD 2023 paper Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library.

Features

  • Random Forest, Gradient Boosted Trees, CART, and variants such as DART and Extremely Randomized Trees.
  • Classification, regression, ranking, and uplifting tasks.
  • Model evaluation, e.g. accuracy, AUC, ROC, AUUC, PR-AUC, confidence boundaries, NDCG.
  • Model analysis, e.g. partial dependence plots (PDP), conditional expectation plots (CEP), variable importances, model plotting, structure analysis.
  • Native support for numerical, categorical, boolean, and categorical-set (e.g. text) features.
  • Native support for missing values.
  • State-of-the-art tree learning features, e.g. oblique splits, honest trees, Hessian scores, global tree optimization.
  • Distributed training.
  • Automatic hyper-parameter tuning.
  • Fast model inference, e.g. vpred and QuickScorer extended.
  • Cross-compatible APIs and models: C++, CLI, Go, JavaScript, and Python.

See the feature list for more details.

Usage example

With the CLI you can train, evaluate, and benchmark the speed of a model as follows:

# Download YDF.
wget https://github.com/google/yggdrasil-decision-forests/releases/download/1.0.0/cli_linux.zip
unzip cli_linux.zip

# Create a training configuration
echo 'label:"my_label" learner:"RANDOM_FOREST" ' > config.pbtxt

# List the columns of the training dataset
./infer_dataspec --dataset="csv:train.csv" --output="spec.pbtxt"

# Train the model
./train --dataset="csv:train.csv" --dataspec="spec.pbtxt" --config="config.pbtxt" --output="my_model"

# Evaluate the model
./evaluate --dataset="csv:test.csv" --model="my_model" > evaluation.txt

# Benchmark the inference speed of the model
./benchmark_inference --dataset="csv:test.csv" --model="my_model" > benchmark.txt

(based on examples/beginner.sh)

The same model can be trained in C++ as follows:

// Includes and namespace qualifiers are omitted; see examples/beginner.cc.
auto dataset_path = "csv:train.csv";

// List columns in training dataset
DataSpecification spec;
CreateDataSpec(dataset_path, false, {}, &spec);

// Create a training configuration
TrainingConfig train_config;
train_config.set_learner("RANDOM_FOREST");
train_config.set_task(Task::CLASSIFICATION);
train_config.set_label("my_label");

// Train model
std::unique_ptr<AbstractLearner> learner;
GetLearner(train_config, &learner);
auto model = learner->Train(dataset_path, spec);

// Export model
SaveModel("my_model", model.get());

(based on examples/beginner.cc)

The same model can be trained in Python using TensorFlow Decision Forests as follows:

import tensorflow_decision_forests as tfdf
import pandas as pd

# Load dataset in a Pandas dataframe.
train_df = pd.read_csv("project/train.csv")

# Convert dataset into a TensorFlow dataset.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="my_label")

# Train model
model = tfdf.keras.RandomForestModel()
model.fit(train_ds)

# Export model.
model.save("project/model")

(see TensorFlow Decision Forests)

Google I/O Presentation

Yggdrasil Decision Forests powers TensorFlow Decision Forests.

Documentation & Resources

Documentation, tutorials, and example notebooks are available on the YDF documentation site.

Citation

If you use Yggdrasil Decision Forests in a scientific publication, please cite the following paper: Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library.

Bibtex

@inproceedings{GBBSP23,
  author       = {Mathieu Guillame{-}Bert and
                  Sebastian Bruch and
                  Richard Stotz and
                  Jan Pfeifer},
  title        = {Yggdrasil Decision Forests: {A} Fast and Extensible Decision Forests
                  Library},
  booktitle    = {Proceedings of the 29th {ACM} {SIGKDD} Conference on Knowledge Discovery
                  and Data Mining, {KDD} 2023, Long Beach, CA, USA, August 6-10, 2023},
  pages        = {4068--4077},
  year         = {2023},
  url          = {https://doi.org/10.1145/3580305.3599933},
  doi          = {10.1145/3580305.3599933},
}

Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library, Guillame-Bert et al., KDD 2023: 4068-4077. doi:10.1145/3580305.3599933

Credits

Yggdrasil Decision Forests and TensorFlow Decision Forests are developed by:

  • Mathieu Guillame-Bert (gbm AT google DOT com)
  • Jan Pfeifer (janpf AT google DOT com)
  • Sebastian Bruch (sebastian AT bruch DOT io)
  • Richard Stotz (richardstotz AT google DOT com)
  • Arvind Srinivasan (arvnd AT google DOT com)

Contributing

Contributions to TensorFlow Decision Forests and Yggdrasil Decision Forests are welcome. If you want to contribute, check the contribution guidelines.

License

Apache License 2.0
