
⚡️BytevalKit-Emb: One-Stop Embedding Model Evaluation Tool


English | 中文

Overview

BytevalKit-Emb is a modular embedding model evaluation framework that automates model performance assessment through standardized pipelines. The framework adopts a configuration-driven design and supports multiple task types and model architectures.

Core Features

  • Multi-type Model Support: Supports multiple model backends, including GritLM, SentenceTransformers, and GME, covering both single-modal and multi-modal models
  • Automated Evaluation Pipeline: A fully automated "dataset loading → model inference → metric calculation" pipeline
  • Extended Evaluation Methods: Supports not only MTEB and MMEB evaluation tasks but also custom Retrieval, Classification, and Similarity Classification tasks
  • Flexible Configuration System: YAML-based configuration, easy to customize and extend
  • Extensible and Reproducible: New models and evaluation tasks can be added quickly by subclassing BaseModel and BaseTask (see the sketch after this list); embeddings and related results are fully recorded during evaluation, so results can be reproduced and debugged
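
As a rough illustration of the extension point, the sketch below shows the shape a custom model wrapper might take. The class and method names here are assumptions for illustration; consult BaseModel in the source for the actual interface.

```python
# A minimal sketch of a custom model wrapper. The encode() signature and the
# idea of subclassing BaseModel follow the feature description above; the real
# BytevalKit-Emb interface may differ, so treat these names as placeholders.
from typing import List

import numpy as np
from sentence_transformers import SentenceTransformer


class MyEmbeddingModel:  # in the framework, this would subclass BaseModel
    def __init__(self, path_or_dir: str, **model_kwargs):
        # Load a checkpoint from a local path or a model ID.
        self.model = SentenceTransformer(path_or_dir, **model_kwargs)

    def encode(self, texts: List[str], batch_size: int = 32) -> np.ndarray:
        # Return one embedding vector per input text.
        return self.model.encode(texts, batch_size=batch_size, convert_to_numpy=True)
```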

Changelog

  • 🎉 [2025.06.13]: BytevalKit-Emb v1.0.0 first open-source release
  • 📚 [2025.06.13]: Documentation and tutorials are now online

Installation

Install from Source

Clone the repository and install:

Recommended Python version: 3.9 or above

```bash
git clone https://github.com/bytedance/BytevalKit-Emb.git
cd BytevalKit-Emb
pip install -r requirements.txt
```
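
Optionally (this check is not part of the project's instructions), you can confirm your interpreter meets the recommended minimum version before installing:

```python
# Optional sanity check: the framework recommends Python 3.9 or above.
import sys

assert sys.version_info >= (3, 9), f"Python 3.9+ recommended, found {sys.version}"
```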

Quick Start

For more detailed usage instructions, including how to evaluate models and how to add custom models, datasets, and evaluation metrics, please refer to the Usage Instructions.

Basic Usage

Start an evaluation task ({workspace} is a placeholder for your local checkout path):

```bash
python3 run.py --yaml-path={workspace}/configs/config.yaml
```

For an example YAML configuration, refer to the Example YAML Configuration.

Configuration Parameters

```yaml
DEFAULT:  # Task-level configuration
    task_name: eval_task_1  # Evaluation task name
    work_dir: {workspace}/outputs  # Directory for evaluation inference results, metric results, etc.

DATASET:  # Dataset-level configuration
    dataset_xxxx:
        type: mteb_classification  # Evaluation task type; options: classification, mteb_classification, retrieval, similarity_classification
        name: IFlyTek  # Evaluation dataset name
        data_dir: {workspace}/demo/datasets/mteb_classification/IFlyTek-classification  # Evaluation dataset path
        data_type: parquet  # Dataset file format
        # For other configuration parameters, refer to the documentation for each evaluation task

MODEL:  # Model-level configuration
    model_paraphrase-multilingual-MiniLM-L12-v2:
        type: sentence_transformer  # Model type; options: sentence_transformer, gritlm
        name: paraphrase-multilingual-MiniLM-L12-v2  # Model name
        path_or_dir: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2  # Local path or model ID
        model_kwargs:  # Model loading parameters
            revision: "v1.1"
        preprocessors: []  # Pre-inference processors
        worker_num: 20  # Number of concurrent inference workers
```
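
Before launching run.py, it can help to verify that a config file parses and contains the three top-level sections shown above. The snippet below is an illustrative check using PyYAML, not part of BytevalKit-Emb itself; the framework's own loader may perform stricter validation.

```python
# Illustrative config check using PyYAML; not part of BytevalKit-Emb itself.
import yaml

with open("configs/config.yaml") as f:  # adjust the path to your config
    cfg = yaml.safe_load(f)

# The three top-level sections shown above.
for section in ("DEFAULT", "DATASET", "MODEL"):
    assert section in cfg, f"missing top-level section: {section}"

print("task:", cfg["DEFAULT"]["task_name"])
print("datasets:", list(cfg["DATASET"]))
print("models:", list(cfg["MODEL"]))
```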

Benchmark

Note: To demonstrate that the framework is compatible with the MTEB and MMEB evaluation methods, we validated it with open-source models on a subset of their evaluation datasets; both the datasets and the evaluation logic are taken from the official MTEB and MMEB evaluation scripts.

The results below are framework validation results only; models are listed in no particular order.

MTEB-Classification

| Model | IFlyTek-classification | JDReview-classification | MultilingualSentiment-classification | OnlineShopping-classification | TNews-classification | waimai-classification |
| --- | --- | --- | --- | --- | --- | --- |
| xiaobu-embedding | 49.29 | 85.56 | 76.83 | 92.75 | 26.01 | 88.1 |
| xiaobu-embedding-v2 | 51.21 | 88.47 | 79.38 | 94.5 | 27.3 | 88.85 |
| Conan-embedding-v1 | 51.52 | 90.07 | 78.6 | 95 | 27.5 | 89.7 |
| gte-base-zh | 47.67 | 85.83 | 75.28 | 93.8 | 26.72 | 87.85 |
| gte-large-zh | 49.83 | 88 | 76.33 | 91.75 | 25.8 | 88.05 |
| gte-Qwen2-1.5B-instruct | 39.75 | 80.49 | 67.92 | 87.6 | 25.23 | 84.75 |
| bge-large-zh-v1.5 | 48.21 | 85.02 | 74.15 | 92.74 | 26.08 | 86.7 |

MTEB-Similarity Classification

| Model | CMNLI | Ocnli |
| --- | --- | --- |
| xiaobu-embedding | 55.3 | 55.93 |
| xiaobu-embedding-v2 | 51.44 | 51.27 |
| Conan-embedding-v1 | 54.46 | 51.38 |
| gte-base-zh | 63.04 | 60.8 |
| gte-large-zh | 76.2 | 73.03 |
| gte-Qwen2-1.5B-instruct | 53.27 | 53.65 |
| bge-large-zh-v1.5 | 67.66 | 62.59 |

MTEB-Retrieval (NDCG@10)

| Model | CmedqaRetrieval | CovidRetrieval | DuRetrieval | MedicalRetrieval | MMarcoRetrieval | T2Retrieval | VideoRetrieval |
| --- | --- | --- | --- | --- | --- | --- | --- |
| xiaobu-embedding | 44.47 | 87.75 | 86.81 | 63.19 | 78.39 | 86.22 | 73.17 |
| xiaobu-embedding-v2 | 47.38 | 89.5 | 89.68 | 67.98 | 82.26 | 85.59 | 80.08 |
| Conan-embedding-v1 | 47.78 | 91.23 | 88.79 | 67.13 | 82.27 | 83.79 | 80.29 |
| gte-base-zh | 44.57 | 75.71 | 84.09 | 65.02 | 77.71 | 83.91 | 74.38 |
| gte-large-zh | 43.42 | 88.44 | 85.65 | 62.81 | 77.52 | 82.95 | 73.01 |
| bge-large-zh-v1.5 | 41.81 | 73.03 | 88.76 | 57.35 | 78.77 | 84.29 | 70.89 |

MMEB

| Model | ChartQA | DocVQA | ImageNet-1K | ImageNet-A | ImageNet-R | MSCOCO_t2i | ObjectNet | OK-VQA | VisDial |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gme-Qwen2-VL-2B-Instruct | 8.3 | 17.5 | 26.5 | 12.5 | 60.1 | 53.5 | 31.1 | 11.8 | 30.1 |
| gme-Qwen2-VL-7B-Instruct | 15.3 | 33.6 | 65.2 | 42.3 | 87.1 | 71.1 | 66.6 | 32.3 | 62.5 |

System Architecture

[Architecture Design diagram]

Contributing

This project is developed by the BytevalKit team. Development members:

{Zirui Guo, Hanyu Li, Shenwei Huang}, Yaling Mou, Xianxian Ma, 
Ming Jiang, Haizhen Liao, Jingwei Sun, Binbin Xing

{*} Names in braces contributed equally.

We also thank the Bytedance Douyin Content Team for their support:

Jiefeng Long, Zhihe Wan, Zhenming Sun, Yongchao Liu, Xulei Lou, Shuang Zeng, Xing Lin, Chao Wang, 
Fubang Zhao, QingSong Liu, Song Chen, Xiao Liang, Yixing Chen, Mingyu Guo, Bolun Cai, 
Yi Lin, Junfeng Yao, Chao Feng, Jiao Ran

We also thank the Product Design and Byteval Platform teams for their support:

Ziyu Shi, Zhao Lin, Yang Li, Jing Yang, Zhen Wang, Guojun Ma

And the AI Platform team:

Huiyu Yu, Lin Dong, Yong Zhang

We welcome contributions of all kinds! Please check our Contributing Guide for details.

Citation

If you use BytevalKit-Emb in your research, please consider citing:

@misc{BytevalKit-Emb-2025,
  title={BytevalKit-Emb: Comprehensive Embedding Model Evaluation Framework},
  author={BytevalKit},
  year={2025},
  howpublished={\url{https://github.com/bytedance/BytevalKit-Emb}}
}

License

BytevalKit-Emb is licensed under the Apache License 2.0.

Contact Us

If you have any questions, feel free to contact us at: [email protected]
