AutoGluon.Multimodal 1.0.0

Installation

AutoGluon (GitHub) requires pip > 1.4 (upgrade by pip install -U pip). More installation options. AutoGluon supports Python 3.8 to 3.11. Installation is available for Linux, MacOS, and Windows.

pip install autogluon

Import AutoGluon Multimodal:

from autogluon.multimodal import MultiModalPredictor

Classification & Regression

MultiModalPredictor finetunes foundation models for solving classification and regression problems with image, text, and tabular features. Here, we use a simplified version petfinder_for_tutorial from the PetFinder dataset. MultiModalPredictor automatically analyzes the columns in the input dataframe to detect categorical, numerical, text, and image columns (stored as paths or bytearrays).

import pandas as pd
train_data = pd.read_csv('train.csv', index_col=0)
test_data = pd.read_csv('test.csv', index_col=0)

To train the model, just call '.fit()'. We also support customization (docs).

predictor = MultiModalPredictor(
    problem_type="classification",
    label="AdoptionSpeed",
)
predictor.fit(train_data)

To extract embeddings, or evaluate/predict on the test set:

predictor.extract_embedding(test_data)
predictor.evaluate(test_data)        # Evaluation
predictor.predict(test_data)         # Inference
predictor.predict_proba(test_data)   # Predict probability
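As a sketch of the input the column-type detection described above operates on, the dataframe below mixes all four modalities; the column names and values are illustrative stand-ins (loosely modeled on PetFinder), not taken from the actual dataset:

```python
import pandas as pd

# Illustrative dataframe mixing the four modalities the predictor detects:
# categorical, numerical, free-form text, and images stored as file paths.
df = pd.DataFrame({
    "Type": ["Dog", "Cat", "Dog"],                 # categorical
    "Age": [24, 6, 12],                            # numerical
    "Description": [                               # text
        "Friendly and playful.",
        "Shy but affectionate.",
        "Loves long walks.",
    ],
    "Images": [                                    # image paths
        "imgs/pet0.jpg", "imgs/pet1.jpg", "imgs/pet2.jpg",
    ],
    "AdoptionSpeed": [2, 0, 1],                    # label column
})

# A rough stand-in for one part of modality detection: numeric dtypes
# are treated as numerical features (the real detector also separates
# categorical, text, and image columns).
numeric_cols = df.select_dtypes("number").columns.tolist()
```

No preprocessing of these columns is needed on the user's side; the predictor infers the modality of each column from its dtype and contents.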
Named-Entity Recognition

MultiModalPredictor supports named-entity recognition. We use the MIT movies corpus to demonstrate the usage, which can be downloaded from train.csv and test.csv.

import pandas as pd
predictor = MultiModalPredictor(
    problem_type="ner", label="entity_annotations"
)
# Train model
predictor.fit(pd.read_csv("train.csv"))
# Evaluation
predictor.evaluate(pd.read_csv("test.csv"))
# Inference
text = ("Game of Thrones is an American fantasy "
        "drama TV series created by David Benioff")
pred = predictor.predict({'text_snippet': [text]})

Object Detection

To use MultiModalPredictor for object detection, please first install additional dependencies by

mim install mmcv
pip install mmdet
pip install pycocotools-windows  # only for Windows

MultiModalPredictor supports common object detection data formats such as COCO (recommended) and VOC. Here we use the dataset tiny_motorbike_coco to demonstrate how to use MultiModalPredictor. The predictor natively supports JSON files in the COCO format.

train_path = "./Annotations/trainval_cocoformat.json"
test_path = "./Annotations/test_cocoformat.json"
predictor = MultiModalPredictor(
    problem_type="object_detection",
    sample_data_path=train_path,
)
predictor.fit(train_path)       # Train the detector
predictor.evaluate(test_path)   # Evaluation
predictor.predict(test_path)    # Inference

We can also visualize the detected bounding boxes with their confidence scores.
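For reference, a COCO-format annotation file like the ones the detector consumes is structured as three top-level lists; the sketch below is hand-written for illustration (the file name, image, and category are made up, not taken from tiny_motorbike_coco):

```python
import json

# Minimal COCO-style annotation structure: images, annotations with a
# bbox given as [x, y, width, height], and categories.
coco = {
    "images": [
        {"id": 0, "file_name": "imgs/bike0.jpg",
         "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 1,
         "bbox": [100, 120, 200, 150], "area": 200 * 150,
         "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "motorbike"},
    ],
}

# Serialize in the same JSON form the predictor reads from disk.
coco_json = json.dumps(coco)
```

Pointing sample_data_path and fit() at such a file is all the format-specific work required; VOC-style XML annotations can be converted to this layout.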
Image Segmentation

While the Segment Anything Model (SAM) performs exceptionally well on generic scenes, it encounters challenges when applied to specialized domains like manufacturing, agriculture, etc. MultiModalPredictor mitigates the issue by fine-tuning on domain-specific data. Below is an example on the Leaf Disease Segmentation dataset.

import pandas as pd
train_data = pd.read_csv('train.csv', index_col=0)
test_data = pd.read_csv('test.csv', index_col=0)
image_col, label_col = 'image', 'label'
predictor = MultiModalPredictor(
    problem_type="semantic_segmentation",
    label=label_col,
)
predictor.fit(train_data=train_data)

Note that, without hyperparameter customization, the huge SAM serves as the default model, which requires efficient fine-tuning in many cases. After fine-tuning, evaluate/predict on the test data.

predictor.evaluate(test_data, metrics=["iou"])
pred = predictor.predict(test_data)

We can visualize the image and the predicted mask before and after fine-tuning (docs).

[Figure: image | SAM | SAM+PEFT]

As evident from the results, the predicted mask after finetuning is much closer to the groundtruth. This demonstrates the effectiveness of using MultiModalPredictor to fine-tune SAM for domain-specific applications, enhancing its performance in tasks like leaf disease segmentation.
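The IoU metric passed to evaluate() above measures the overlap between a predicted mask and its ground truth; a plain-NumPy sketch of the binary-mask case (independent of AutoGluon's internal implementation) looks like:

```python
import numpy as np

def binary_iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection-over-union between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Convention: two empty masks agree perfectly.
    return float(intersection / union) if union else 1.0

pred = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 0]])
print(binary_iou(pred, target))  # intersection=2, union=4 -> 0.5
```

IoU ranges from 0 (no overlap) to 1 (identical masks), which is why it is a stricter score than plain pixel accuracy on sparse masks.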
Semantic Matching

MultiModalPredictor implements a flexible twin-tower architecture that can solve text-text, image-image, and text-image matching problems (docs). Here is an example of finetuning the matching model via relevance data, demonstrated on the Flickr30K image-text matching dataset preprocessed in dataframe format: flickr30k.zip.

import pandas as pd
train_data = pd.read_csv("train.csv", index_col=0)
tdata = pd.read_csv("test.csv", index_col=0)

To finetune the model, just specify the "query" and "response" keys when creating the predictor and pick "image_text_similarity" as the problem type.

predictor = MultiModalPredictor(
    query="caption",
    response="image",
    problem_type="image_text_similarity",
)
# Finetuning (optional, zero-shot is also supported)
predictor.fit(train_data, time_limit=180)
# Extract embeddings
e_i = predictor.extract_embedding(tdata["image"])
e_t = predictor.extract_embedding(tdata["caption"])
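Once image and caption embeddings are extracted, retrieval reduces to nearest-neighbor search under cosine similarity. The sketch below uses random stand-in arrays in place of the real extract_embedding outputs (shapes and names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
e_i = rng.normal(size=(5, 64))   # stand-in image embeddings
e_t = rng.normal(size=(3, 64))   # stand-in caption embeddings

# L2-normalize rows so that dot products become cosine similarities.
e_i /= np.linalg.norm(e_i, axis=1, keepdims=True)
e_t /= np.linalg.norm(e_t, axis=1, keepdims=True)

sim = e_t @ e_i.T                 # (num_captions, num_images)
best_image = sim.argmax(axis=1)   # top-1 image index per caption
```

The same ranking works in the other direction (sim.T for image-to-caption retrieval), which is what the twin-tower design buys: each side is embedded once and compared cheaply.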
• MultiModalPredictor also supports model and hyperparameter customization (docs), knowledge distillation (docs), few-shot learning (docs), parameter-efficient finetuning (docs), HPO (docs), and more.
• For deployment, check AutoGluon-Cloud.
• For other use cases, check TabularPredictor and TimeSeriesPredictor.
• Check the latest version of this cheat sheet.
• Any questions? Ask here.
• Like what you see? Consider starring AutoGluon on GitHub and following us on twitter to get notified of the latest updates!