Overview | Changelog | Installation | Quick Start | Configuration | Benchmark | System Architecture | License | Contact Us
BytevalKit-Emb is a modular embedding model evaluation framework that implements automated model performance assessment through standardized processes. The framework adopts a configuration-driven design and supports multiple task types and model architectures.
- Multi-type Model Support: Supports multiple model backends, including GritLM, SentenceTransformers, and GME, covering both single-modal and multi-modal models
- Automated Evaluation Pipeline: Complete automated pipeline of "dataset loading - model calling - evaluation metrics calculation"
- Extended Evaluation Methods: Supports not only MTEB and MMEB evaluation tasks, but also custom Retrieval, Classification, and Similarity Classification evaluation tasks
- Flexible Configuration System: YAML-based configuration system, easy to customize and extend
- Extensible and Reproducible: New models and evaluation tasks can be added quickly by extending BaseModel and BaseTask (see the sketch below); embeddings and related intermediate results are fully recorded during evaluation, so evaluation results can be reproduced and debugged
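To illustrate the extension pattern only, the sketch below defines stand-in base classes and subclasses them; the actual BaseModel/BaseTask interfaces and method names (here `encode` and `evaluate` are hypothetical) are defined inside the BytevalKit-Emb repository and will differ.

```python
# Hypothetical sketch of the BaseModel/BaseTask extension pattern.
# The stand-in base classes below only mimic the idea; the real abstract
# interfaces and method names are defined inside BytevalKit-Emb.
from abc import ABC, abstractmethod
from typing import List

import numpy as np


class BaseModel(ABC):
    """Stand-in for the framework's model interface."""

    @abstractmethod
    def encode(self, texts: List[str]) -> np.ndarray:
        """Return one embedding vector per input text."""


class BaseTask(ABC):
    """Stand-in for the framework's task interface."""

    @abstractmethod
    def evaluate(self, model: BaseModel) -> dict:
        """Run the task with the given model and return metric values."""


class MyCustomModel(BaseModel):
    """Example custom backend: a trivial random 'encoder' for illustration."""

    def encode(self, texts: List[str]) -> np.ndarray:
        rng = np.random.default_rng(0)
        return rng.standard_normal((len(texts), 8))


class MySimilarityTask(BaseTask):
    """Example custom task: cosine similarity between two fixed sentences."""

    def evaluate(self, model: BaseModel) -> dict:
        emb = model.encode(["a query", "a document"])
        a, b = emb[0], emb[1]
        cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        return {"cosine_similarity": cos}


if __name__ == "__main__":
    print(MySimilarityTask().evaluate(MyCustomModel()))
```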
- 🎉 [2025.06.13]: BytevalKit-Emb v1.0.0 first open source release
- 📚 [2025.06.13]: Documentation and tutorials are now online
Recommended Python version: 3.9 or above.
Clone the repository and install the dependencies:
git clone https://github.com/bytedance/BytevalKit-Emb.git
cd BytevalKit-Emb
pip install -r requirements.txt
For more detailed usage instructions, including how to evaluate models and how to add custom models, datasets, and evaluation metrics, please refer to the Usage Instructions.
Start an evaluation task:
python3 run.py --yaml-path={workspace}/configs/config.yaml
For an example YAML configuration, refer to the Example YAML Configuration:
DEFAULT:  # Task-level configuration
  task_name: eval_task_1  # Evaluation task name
  work_dir: {workspace}/outputs  # Directory for storing evaluation inference results, metric results, etc.

DATASET:  # Dataset-level configuration
  dataset_xxxx:
    type: mteb_classification  # Evaluation task type, options: classification, mteb_classification, retrieval, similarity_classification
    name: IFlyTek  # Evaluation dataset name
    data_dir: {workspace}/demo/datasets/mteb_classification/IFlyTek-classification  # Evaluation dataset path
    data_type: parquet  # Dataset file format
    # For other configuration parameters, refer to the documentation for each evaluation task

MODEL:  # Model-level configuration
  model_paraphrase-multilingual-MiniLM-L12-v2:
    type: sentence_transformer  # Model type, options: sentence_transformer, gritlm
    name: paraphrase-multilingual-MiniLM-L12-v2  # Model name
    path_or_dir: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2  # Local model path or Hugging Face model ID
    model_kwargs:  # Model loading parameters
      revision: "v1.1"
    preprocessors: []  # Pre-inference processors
    worker_num: 20  # Number of concurrent inference workers
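For orientation only: a `sentence_transformer`-type model entry such as the one above points at a checkpoint that can also be loaded directly with the sentence-transformers library. The standalone sketch below is not BytevalKit-Emb's internal loading code, and the `revision` keyword assumes a reasonably recent sentence-transformers version.

```python
# Standalone sketch: load the checkpoint referenced by `path_or_dir` above
# directly with sentence-transformers and embed a few texts.
# This is NOT BytevalKit-Emb code; it only shows what the entry refers to.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    revision="v1.1",  # matches model_kwargs.revision in the YAML above
)

texts = ["今天天气不错", "The weather is nice today"]
embeddings = model.encode(texts, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384) for this MiniLM model
```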
Note: To demonstrate that the framework supports the MTEB and MMEB evaluation methods, we validated it with open-source models on a subset of evaluation datasets from MTEB and MMEB; the datasets and evaluation logic are taken directly from the official MTEB and MMEB evaluation scripts.
The following are framework evaluation results only; models are not listed in any particular order.
Model | IFlyTek-classification | JDReview-classification | MultilingualSentiment-classification | OnlineShopping-classification | TNews-classification | waimai-classification |
---|---|---|---|---|---|---|
xiaobu-embedding | 49.29 | 85.56 | 76.83 | 92.75 | 26.01 | 88.1 |
xiaobu-embedding-v2 | 51.21 | 88.47 | 79.38 | 94.5 | 27.3 | 88.85 |
Conan-embedding-v1 | 51.52 | 90.07 | 78.6 | 95 | 27.5 | 89.7 |
gte-base-zh | 47.67 | 85.83 | 75.28 | 93.8 | 26.72 | 87.85 |
gte-large-zh | 49.83 | 88 | 76.33 | 91.75 | 25.8 | 88.05 |
gte-Qwen2-1.5B-instruct | 39.75 | 80.49 | 67.92 | 87.6 | 25.23 | 84.75 |
bge-large-zh-v1.5 | 48.21 | 85.02 | 74.15 | 92.74 | 26.08 | 86.7 |
Model | CMNLI | Ocnli |
---|---|---|
xiaobu-embedding | 55.3 | 55.93 |
xiaobu-embedding-v2 | 51.44 | 51.27 |
Conan-embedding-v1 | 54.46 | 51.38 |
gte-base-zh | 63.04 | 60.8 |
gte-large-zh | 76.2 | 73.03 |
gte-Qwen2-1.5B-instruct | 53.27 | 53.65 |
bge-large-zh-v1.5 | 67.66 | 62.59 |
Model | CmedqaRetrieval | CovidRetrieval | DuRetrieval | MedicalRetrieval | MMarcoRetrieval | T2Retrieval | VideoRetrieval |
---|---|---|---|---|---|---|---|
xiaobu-embedding | 44.47 | 87.75 | 86.81 | 63.19 | 78.39 | 86.22 | 73.17 |
xiaobu-embedding-v2 | 47.38 | 89.5 | 89.68 | 67.98 | 82.26 | 85.59 | 80.08 |
Conan-embedding-v1 | 47.78 | 91.23 | 88.79 | 67.13 | 82.27 | 83.79 | 80.29 |
gte-base-zh | 44.57 | 75.71 | 84.09 | 65.02 | 77.71 | 83.91 | 74.38 |
gte-large-zh | 43.42 | 88.44 | 85.65 | 62.81 | 77.52 | 82.95 | 73.01 |
bge-large-zh-v1.5 | 41.81 | 73.03 | 88.76 | 57.35 | 78.77 | 84.29 | 70.89 |
Model | ChartQA | DocVQA | ImageNet-1K | ImageNet-A | ImageNet-R | MSCOCO_t2i | ObjectNet | OK-VQA | VisDial |
---|---|---|---|---|---|---|---|---|---|
gme-Qwen2-VL-2B-Instruct | 8.3 | 17.5 | 26.5 | 12.5 | 60.1 | 53.5 | 31.1 | 11.8 | 30.1 |
gme-Qwen2-VL-7B-Instruct | 15.3 | 33.6 | 65.2 | 42.3 | 87.1 | 71.1 | 66.6 | 32.3 | 62.5 |
This project is developed by the BytevalKit team. Development members:
{Zirui Guo, Hanyu Li, Shenwei Huang}, Yaling Mou, Xianxian Ma,
Ming Jiang, Haizhen Liao, Jingwei Sun, Binbin Xing
{*} Equal contribution.
We also thank the Bytedance Douyin Content Team for their support:
Jiefeng Long, Zhihe Wan, Zhenming Sun, Yongchao Liu, Xulei Lou, Shuang Zeng, Xing Lin, Chao Wang,
Fubang Zhao, QingSong Liu, Song Chen, Xiao Liang, Yixing Chen, Mingyu Guo, Bolun Cai,
Yi Lin, Junfeng Yao, Chao Feng, Jiao Ran
We also thank the Product Design and Byteval Platform teams for their support:
Ziyu Shi, Zhao Lin, Yang Li, Jing Yang, Zhen Wang, Guojun Ma
And the AI Platform team:
Huiyu Yu, Lin Dong, Yong Zhang
We welcome contributions of all kinds! Please check our Contributing Guide for details.
If you use BytevalKit-Emb in your research, please consider citing:
@misc{BytevalKit-Emb-2025,
  title={BytevalKit-Emb: Comprehensive Embedding Model Evaluation Framework},
  author={BytevalKit},
  year={2025},
  howpublished={\url{https://github.com/bytedance/BytevalKit-Emb}}
}
BytevalKit-Emb is licensed under the Apache License 2.0.
If you have any questions, feel free to contact us at: [email protected]