🚀 No Time to Train!

Training-Free Reference-Based Instance Segmentation

State of the art (Papers with Code): _SOTA 1-shot_ | _SOTA 10-shot_ | _SOTA 30-shot_

🚨 Update (22 July 2025): Instructions for custom datasets have been added!

🔔 Update (16 July 2025): Code has been updated with instructions!

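🔍 Custom dataset
0. Prepare a custom dataset ⛵🐦
0.1 If only bbox annotations are available
0.2 Convert COCO annotations to a pickle file
1. Fill memory with references
2. Post-process memory bank
📚 Citation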

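🎯 Highlights

💡 Training-free: no fine-tuning, no prompt engineering, just a reference image.
🖼️ Reference-based: segment new objects using only a few examples.
🔥 SOTA performance: outperforms previous training-free approaches on COCO, PASCAL VOC, and Cross-Domain FSOD.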

📜 Abstract

The performance of image segmentation models has historically been constrained by the high cost of collecting large-scale annotated data. The Segment Anything Model (SAM) alleviates this original problem through a promptable, semantics-agnostic, segmentation paradigm and yet still requires manual visual-prompts or complex domain-dependent prompt-generation rules to process a new image. Towards reducing this new burden, our work investigates the task of object segmentation when provided with, alternatively, only a small set of reference images. Our key insight is to leverage strong semantic priors, as learned by foundation models, to identify corresponding regions between a reference and a target image. We find that correspondences enable automatic generation of instance-level segmentation masks for downstream tasks and instantiate our ideas via a multi-stage, training-free method incorporating (1) memory bank construction; (2) representation aggregation and (3) semantic-aware feature matching. Our experiments show significant improvements on segmentation metrics, leading to state-of-the-art performance on COCO FSOD (36.8% nAP), PASCAL VOC Few-Shot (71.2% nAP50) and outperforming existing training-free approaches on the Cross-Domain FSOD benchmark (22.4% nAP).

[Figure: results on the Cross-Domain FSOD benchmark]

🧠 Architecture

[Figure: overview of the training-free architecture]
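
As a rough illustration of the three stages named in the abstract (memory bank construction, representation aggregation, feature matching), the toy sketch below stores reference feature vectors per class, averages them into one prototype per class, and assigns a target feature to the class whose prototype has the highest cosine similarity. This is a deliberately simplified stand-in, not the authors' implementation: the function names, the mean aggregation, and the 3-dimensional features are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def build_memory_bank(references):
    """Group reference feature vectors by class label (stage 1)."""
    bank = defaultdict(list)
    for label, feature in references:
        bank[label].append(feature)
    return dict(bank)


def aggregate(bank):
    """Average each class's features into a single prototype (stage 2)."""
    prototypes = {}
    for label, features in bank.items():
        dim = len(features[0])
        prototypes[label] = [
            sum(f[i] for f in features) / len(features) for i in range(dim)
        ]
    return prototypes


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match(feature, prototypes):
    """Return (label, score) of the most similar prototype (stage 3)."""
    return max(
        ((label, cosine(feature, proto)) for label, proto in prototypes.items()),
        key=lambda pair: pair[1],
    )


# Toy features standing in for foundation-model embeddings (hypothetical values).
refs = [
    ("boat", [1.0, 0.0, 0.0]),
    ("boat", [0.8, 0.2, 0.0]),
    ("bird", [0.0, 1.0, 0.2]),
]
protos = aggregate(build_memory_bank(refs))
label, score = match([0.9, 0.1, 0.0], protos)
print(label, round(score, 3))
```

In the real pipeline the features would come from a foundation model and the matching would operate on image regions rather than single vectors; the sketch only conveys the overall flow.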

🛠️ Installation instructions

1. Clone the repository

git clone https://github.com/miquel-espinosa/no-time-to-train.git
cd no-time-to-train

2. Create conda environment

We will create a conda environment with the required packages.

conda env create -f environment.yml
conda activate no-time-to-train

3. Install SAM2 and DinoV2

We will install SAM2 and DinoV2 from source.

pip install -e .
cd dinov2
pip install -e .
cd ..

4. Download datasets

Please download the COCO dataset and place it in data/coco.

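5. Download SAM2 and DinoV2 checkpoints

We will download the exact SAM2 checkpoints used in the paper. (Note, however, that SAM2.1 checkpoints are already available and might perform better.)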

mkdir -p checkpoints/dinov2
cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt
cd dinov2
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth
cd ../..
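
Before moving on, it can help to confirm that both checkpoints landed where the commands above put them. The following is a minimal sketch: the helper name `missing_checkpoints` is hypothetical and not part of the repository, and the two paths are simply those produced by the wget commands above.

```python
from pathlib import Path

# Paths produced by the download commands above, relative to the repo root.
EXPECTED_CHECKPOINTS = (
    "checkpoints/sam2_hiera_large.pt",
    "checkpoints/dinov2/dinov2_vitl14_pretrain.pth",
)


def missing_checkpoints(repo_root="."):
    """Return the expected checkpoint paths that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for rel in EXPECTED_CHECKPOINTS:
        path = root / rel
        # An empty file usually means an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    absent = missing_checkpoints()
    print("All checkpoints found." if not absent else f"Missing: {absent}")
```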

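📊 Inference code

⚠️ Disclaimer: this is research code, so expect a bit of chaos!

Reproducing 30-shot SOTA results in Few-shot COCO

Define useful variables and create a folder for results: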

CONFIG=./no_time_to_train/new_exps/coco_fewshot_10shot_Sam2L.yaml
CLASS_SPLIT="few_shot_classes"
RESULTS_DIR=work_dirs/few_shot_results
SHOTS=30
SEED=33
GPUS=4

mkdir -p $RESULTS_DIR
FILENAME=few_shot_${SHOTS}shot_seed${SEED}.pkl

#### 0. Create reference set

python no_time_to_train/dataset/few_shot_sampling.py \
        --n-shot $SHOTS \
        --out-path ${RESULTS_DIR}/${FILENAME} \
        --seed $SEED \
        --dataset $CLASS_SPLIT
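
To make the role of --n-shot and --seed concrete, here is a minimal sketch of seeded per-class sampling. It assumes the reference set is built by drawing a fixed number of annotations per category with a seeded random generator; the actual logic in few_shot_sampling.py may differ, and `sample_references` is a hypothetical helper.

```python
import random
from collections import defaultdict


def sample_references(annotations, n_shot, seed):
    """Pick up to n_shot annotations per category, reproducibly."""
    by_category = defaultdict(list)
    for ann in annotations:
        by_category[ann["category_id"]].append(ann)

    rng = random.Random(seed)  # local generator, so global state is untouched
    sampled = {}
    for cat_id in sorted(by_category):  # fixed order keeps runs reproducible
        pool = by_category[cat_id]
        sampled[cat_id] = rng.sample(pool, min(n_shot, len(pool)))
    return sampled


# Toy COCO-style annotations (ids are made up for illustration).
anns = [{"id": i, "category_id": 9 if i % 2 else 16} for i in range(20)]
first = sample_references(anns, n_shot=3, seed=33)
second = sample_references(anns, n_shot=3, seed=33)
print(first == second)  # same seed, same reference set
```

Fixing the seed is what makes a given few-shot run repeatable: a different seed would generally select different reference annotations.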

#### 1. Fill memory with references

python run_lightening.py test --config $CONFIG \
                              --model.test_mode fill_memory \
                              --out_path ${RESULTS_DIR}/memory.ckpt \
                              --model.init_args.model_cfg.memory_bank_cfg.length $SHOTS \
                              --model.init_args.dataset_cfgs.fill_memory.memory_pkl ${RESULTS_DIR}/${FILENAME} \
                              --model.init_args.dataset_cfgs.fill_memory.memory_length $SHOTS \
                              --model.init_args.dataset_cfgs.fill_memory.class_split $CLASS_SPLIT \
                              --trainer.logger.save_dir ${RESULTS_DIR}/ \
                              --trainer.devices $GPUS

#### 2. Post-process memory bank

python run_lightening.py test --config $CONFIG \
                              --model.test_mode postprocess_memory \
                              --model.init_args.model_cfg.memory_bank_cfg.length $SHOTS \
                              --ckpt_path ${RESULTS_DIR}/memory.ckpt \
                              --out_path ${RESULTS_DIR}/memory_postprocessed.ckpt \
                              --trainer.devices 1

#### 3. Inference on target images

python run_lightening.py test --config $CONFIG  \
                              --ckpt_path ${RESULTS_DIR}/memory_postprocessed.ckpt \
                              --model.init_args.test_mode test \
                              --model.init_args.model_cfg.memory_bank_cfg.length $SHOTS \
                              --model.init_args.model_cfg.dataset_name $CLASS_SPLIT \
                              --model.init_args.dataset_cfgs.test.class_split $CLASS_SPLIT \
                              --trainer.logger.save_dir ${RESULTS_DIR}/ \
                              --trainer.devices $GPUS

If you would like to see inference results online (as they are computed), add the argument:

    --model.init_args.model_cfg.test.online_vis True

To adjust the score threshold score_thr, add the argument (for example, to visualise all instances with a score higher than 0.4):

    --model.init_args.model_cfg.test.vis_thr 0.4
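
The effect of the threshold can be pictured as a simple filter over predicted instances. The sketch below assumes each prediction carries a confidence `score` field, as in COCO-style result files, and that the comparison is strict; the helper name and the sample predictions are hypothetical.

```python
def filter_by_score(predictions, threshold=0.4):
    """Keep only predictions whose score is strictly above the threshold."""
    return [p for p in predictions if p["score"] > threshold]


# Made-up predictions for illustration only.
preds = [
    {"category_id": 9, "score": 0.91},
    {"category_id": 16, "score": 0.40},
    {"category_id": 16, "score": 0.12},
]
kept = filter_by_score(preds, threshold=0.4)
print(len(kept))  # one instance survives the 0.4 threshold
```

Lowering the threshold shows more (and typically noisier) instances; raising it shows fewer.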

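Images will now be saved in results_analysis/few_shot_classes/. The image on the left shows the ground truth; the image on the right shows the instances segmented by our training-free method.

Note that in this example we are using the few_shot_classes split, so we should only expect to see segmented instances of the classes in this split (not all classes in COCO).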

#### Results

After running all images in the validation set, you should obtain:

BBOX RESULTS:
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.368

SEGM RESULTS:
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.342

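🔍 Custom dataset

We provide instructions for running our pipeline on a custom dataset. Annotations are always expected in COCO format.

TLDR: To see directly how to run the full pipeline on custom datasets, see scripts/matching_cdfsod_pipeline.sh together with the example scripts for the CD-FSOD datasets (e.g. scripts/dior_fish.sh).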

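0. Prepare a custom dataset ⛵🐦

Let's imagine we want to detect boats ⛵ and birds 🐦 in a custom dataset. To use our method we will need:

At least 1 annotated reference image for each class (i.e. 1 reference image for boat and 1 reference image for bird)
Multiple target images in which to find instances of our desired classes.

We have prepared a toy script that creates a custom dataset from COCO images for a 1-shot setting.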

mkdir -p data/my_custom_dataset
python scripts/make_custom_dataset.py

This will create a custom dataset with the following folder structure:

data/my_custom_dataset/
    ├── annotations/
    │   ├── custom_references.json
    │   ├── custom_targets.json
    │   └── references_visualisations/
    │       ├── bird_1.jpg
    │       └── boat_1.jpg
    └── images/
        ├── 429819.jpg
        ├── 101435.jpg
        └── (all target and reference images)
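
Since annotations are expected in COCO format, the sketch below shows what a minimal reference annotation file could look like, together with a small consistency check. All ids, image sizes and coordinates are placeholders, the pairing of 429819.jpg with the boat category is an assumption, and `check_coco` is a hypothetical helper rather than part of the repository.

```python
import json

# Minimal COCO-format reference file: one image, one annotation, two categories.
# All ids, sizes and coordinates below are placeholders.
references = {
    "images": [
        {"id": 1, "file_name": "429819.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 9,
            "bbox": [100.0, 120.0, 200.0, 150.0],  # [x, y, width, height]
            "area": 200.0 * 150.0,
            "iscrowd": 0,
            # Polygon as a flat list of x, y pairs; here simply the box corners.
            "segmentation": [
                [100.0, 120.0, 300.0, 120.0, 300.0, 270.0, 100.0, 270.0]
            ],
        },
    ],
    "categories": [
        {"id": 9, "name": "boat"},
        {"id": 16, "name": "bird"},
    ],
}


def check_coco(data):
    """Raise ValueError if top-level keys or cross-references are inconsistent."""
    for key in ("images", "annotations", "categories"):
        if key not in data:
            raise ValueError(f"missing top-level key: {key}")
    image_ids = {img["id"] for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    for ann in data["annotations"]:
        if ann["image_id"] not in image_ids:
            raise ValueError(f"annotation {ann['id']} points to an unknown image")
        if ann["category_id"] not in category_ids:
            raise ValueError(f"annotation {ann['id']} points to an unknown category")


check_coco(references)
print(json.dumps(references["categories"]))
```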

Reference image visualisations (1-shot):

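0.1 If only bbox annotations are available

We also provide a script that generates instance-level segmentation masks using SAM2. This is useful if you only have bounding box annotations for the reference images.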

# Download sam_h checkpoint. Feel free to use more recent checkpoints (note: code might need to be adapted)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth -O checkpoints/sam_vit_h_4b8939.pth
# Run automatic instance segmentation from ground truth bounding boxes.
python no_time_to_train/dataset/sam_bbox_to_segm_batch.py \
    --input_json data/my_custom_dataset/annotations/custom_references.json \
    --image_dir data/my_custom_dataset/images \
    --sam_checkpoint checkpoints/sam_vit_h_4b8939.pth \
    --model_type vit_h \
    --device cuda \
    --batch_size 8 \
    --visualize
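
COCO stores boxes as [x, y, width, height], whereas SAM-style box prompts are usually given as corner coordinates [x_min, y_min, x_max, y_max]. The sketch below shows that conversion. It is an assumption that sam_bbox_to_segm_batch.py performs an equivalent step internally, and `xywh_to_xyxy` is a hypothetical helper.

```python
def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    if w < 0 or h < 0:
        raise ValueError("width and height must be non-negative")
    return [x, y, x + w, y + h]


print(xywh_to_xyxy([100.0, 120.0, 200.0, 150.0]))  # [100.0, 120.0, 300.0, 270.0]
```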

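Reference images with instance-level segmentation masks (generated by SAM2 from the GT bounding boxes, 1-shot):

Visualisations of the generated segmentation masks are saved in data/my_custom_dataset/annotations/custom_references_with_SAM_segm/references_visualisations/.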

0.2 Convert COCO annotations to a pickle file

python no_time_to_train/dataset/coco_to_pkl.py \
    data/my_custom_dataset/annotations/custom_references_with_segm.json \
    data/my_custom_dataset/annotations/custom_references_with_segm.pkl \
    1

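1. Fill memory with references

First, define useful variables and create a folder for results. For labels to be displayed correctly, the class names should be ordered by category id as it appears in the json file. For example, bird has category id 16 and boat has category id 9, so CAT_NAMES=boat,bird.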

DATASET_NAME=my_custom_dataset
DATASET_PATH=data/my_custom_dataset
CAT_NAMES=boat,bird
CATEGORY_NUM=2
SHOT=1
YAML_PATH=no_time_to_train/pl_configs/matching_cdfsod_template.yaml
PATH_TO_SAVE_CKPTS=./tmp_ckpts/my_custom_dataset
mkdir -p $PATH_TO_SAVE_CKPTS
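
The value of CAT_NAMES can be derived from the annotation file rather than written by hand. The sketch below assumes that ascending category id is the expected order, as the boat/bird example suggests; `ordered_cat_names` is a hypothetical helper.

```python
def ordered_cat_names(categories):
    """Join category names with commas, ordered by ascending category id."""
    return ",".join(c["name"] for c in sorted(categories, key=lambda c: c["id"]))


# Categories as they might appear in the annotation file (bird=16, boat=9).
categories = [{"id": 16, "name": "bird"}, {"id": 9, "name": "boat"}]
print(ordered_cat_names(categories))  # boat,bird

# With a real file, load the list first, for example:
# import json
# with open("data/my_custom_dataset/annotations/custom_references_with_segm.json") as f:
#     categories = json.load(f)["categories"]
```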

Run step 1:

python run_lightening.py test --config $YAML_PATH \
    --model.test_mode fill_memory \
    --out_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory.pth \
    --model.init_args.dataset_cfgs.fill_memory.root $DATASET_PATH/images \
    --model.init_args.dataset_cfgs.fill_memory.json_file $DATASET_PATH/annotations/custom_references_with_segm.json \
    --model.init_args.dataset_cfgs.fill_memory.memory_pkl $DATASET_PATH/annotations/custom_references_with_segm.pkl \
    --model.init_args.dataset_cfgs.fill_memory.memory_length $SHOT \
    --model.init_args.dataset_cfgs.fill_memory.cat_names $CAT_NAMES \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --trainer.devices 1
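
In the command above, the backslash in `$DATASET_NAME\_` stops the shell from reading the underscore as part of the variable name, so the output path joins the variables with literal underscores. The equivalent path, built in Python from the same values, is sketched below for clarity; the function name is hypothetical.

```python
def memory_ckpt_path(save_dir, dataset_name, shot, postprocessed=False):
    """Mirror the checkpoint naming used by the shell commands."""
    suffix = "_postprocessed" if postprocessed else ""
    return f"{save_dir}/{dataset_name}_{shot}_refs_memory{suffix}.pth"


print(memory_ckpt_path("./tmp_ckpts/my_custom_dataset", "my_custom_dataset", 1))
# ./tmp_ckpts/my_custom_dataset/my_custom_dataset_1_refs_memory.pth
```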

2. Post-process memory bank

python run_lightening.py test --config $YAML_PATH \
    --model.test_mode postprocess_memory \
    --ckpt_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory.pth \
    --out_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory_postprocessed.pth \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --trainer.devices 1

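3. Inference on target images

If ONLINE_VIS is set to True, prediction results will be saved in results_analysis/my_custom_dataset/ and displayed as they are computed. Note that running with online visualisation is much slower.

Feel free to change the score threshold VIS_THR to see more or fewer segmented instances.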

ONLINE_VIS=True
VIS_THR=0.4
python run_lightening.py test --config $YAML_PATH \
    --model.test_mode test \
    --ckpt_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory_postprocessed.pth \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --model.init_args.model_cfg.test.imgs_path $DATASET_PATH/images \
    --model.init_args.model_cfg.test.online_vis $ONLINE_VIS \
    --model.init_args.model_cfg.test.vis_thr $VIS_THR \
    --model.init_args.dataset_cfgs.test.root $DATASET_PATH/images \
    --model.init_args.dataset_cfgs.test.json_file $DATASET_PATH/annotations/custom_targets.json \
    --model.init_args.dataset_cfgs.test.cat_names $CAT_NAMES \
    --trainer.devices 1

Results

Performance metrics (with the exact same parameters as the commands above) should be:

BBOX RESULTS:
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.478

SEGM RESULTS:
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.458

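Visualisation results are saved in results_analysis/my_custom_dataset/. Note that our method also works on negative images, that is, images that do not contain any instances of the desired classes.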

| Target image with boats ⛵ (left: GT, right: predictions) | Target image with birds 🐦 (left: GT, right: predictions) |
|:----------------------:|:----------------------:|
| 000000459673 | 000000407180 |

| Target image with boats and birds ⛵🐦 (left: GT, right: predictions) | Target image without boats or birds 🚫 (left: GT, right: predictions) |
|:---------------------------------:|:----------------------------------:|
| 000000517410 | 000000460598 |
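
To see how many target images contain no instance of the desired classes, one can count the images in the target annotation file that have no annotations. This is a minimal sketch, assuming the standard COCO layout with `images` and `annotations` lists; `negative_image_ids` is a hypothetical helper and the toy ids are made up.

```python
def negative_image_ids(coco):
    """Return ids of images that have no annotations at all."""
    annotated = {ann["image_id"] for ann in coco["annotations"]}
    return sorted(img["id"] for img in coco["images"] if img["id"] not in annotated)


# Toy target set: image 3 contains neither boats nor birds.
targets = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"image_id": 1, "category_id": 9},
        {"image_id": 2, "category_id": 16},
    ],
}
print(negative_image_ids(targets))  # [3]
```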

📚 Citation

If you use this work, please cite us:

@article{espinosa2025notimetotrain,
  title={No time to train! Training-Free Reference-Based Instance Segmentation},
  author={Miguel Espinosa and Chenhongyi Yang and Linus Ericsson and Steven McDonagh and Elliot J. Crowley},
  journal={arXiv preprint arXiv:2507.02798},
  year={2025},
  primaryclass={cs.CV}
}
