
🚀 No Time to Train!

Training-Free Reference-Based Instance Segmentation

State of the art (Papers with Code): _SOTA 1-shot_ | _SOTA 10-shot_ | _SOTA 30-shot_

🚨 Update (22nd July 2025): Instructions for custom datasets have been added!

🔔 Update (16th July 2025): Code has been updated with instructions!

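🔍 Custom dataset
0. Prepare a custom dataset ⛵🐦
0.1 If only bbox annotations are available
0.2 Convert coco annotations to pickle file
1. Fill memory with references
2. Post-process memory bank
📚 Citation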

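🎯 Highlights

💡 Training-free: no fine-tuning, no prompt engineering, just a reference image.
🖼️ Reference-based: segment new objects using only a few examples.
🔥 SOTA performance: outperforms existing training-free approaches on COCO, PASCAL VOC, and Cross-Domain FSOD.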


📜 Abstract

The performance of image segmentation models has historically been constrained by the high cost of collecting large-scale annotated data. The Segment Anything Model (SAM) alleviates this original problem through a promptable, semantics-agnostic, segmentation paradigm and yet still requires manual visual-prompts or complex domain-dependent prompt-generation rules to process a new image. Towards reducing this new burden, our work investigates the task of object segmentation when provided with, alternatively, only a small set of reference images. Our key insight is to leverage strong semantic priors, as learned by foundation models, to identify corresponding regions between a reference and a target image. We find that correspondences enable automatic generation of instance-level segmentation masks for downstream tasks and instantiate our ideas via a multi-stage, training-free method incorporating (1) memory bank construction; (2) representation aggregation and (3) semantic-aware feature matching. Our experiments show significant improvements on segmentation metrics, leading to state-of-the-art performance on COCO FSOD (36.8% nAP), PASCAL VOC Few-Shot (71.2% nAP50) and outperforming existing training-free approaches on the Cross-Domain FSOD benchmark (22.4% nAP).
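
To make the third stage (semantic-aware feature matching) more concrete, here is a minimal sketch of similarity-based matching. It assumes each class in the memory bank is summarised by a single prototype vector and each candidate region of the target image by one feature vector. The names `cosine` and `match_regions` and the toy vectors are illustrative assumptions, not the authors' implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_regions(prototypes, regions):
    """Assign each region the class whose prototype is most similar.

    prototypes: dict of class name -> prototype vector (hypothetical memory bank)
    regions: list of feature vectors, one per candidate region
    Returns a list of (class_name, similarity) tuples.
    """
    results = []
    for feat in regions:
        best = max(prototypes, key=lambda name: cosine(prototypes[name], feat))
        results.append((best, cosine(prototypes[best], feat)))
    return results

# Toy 3-dimensional features; real features would be much higher-dimensional.
prototypes = {"boat": [1.0, 0.0, 0.0], "bird": [0.0, 1.0, 0.0]}
regions = [[0.9, 0.1, 0.0], [0.0, 2.0, 0.0]]
matches = match_regions(prototypes, regions)
print(matches)
```

The similarity doubles as a confidence score, which is why a score threshold can later be used to decide which instances to keep.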

Figure: Cross-Domain FSOD (CD-FSOD) results.

🧠 Architecture

Figure: overview of the training-free architecture.

🛠️ Installation instructions

1. Clone the repository

git clone https://github.com/miquel-espinosa/no-time-to-train.git
cd no-time-to-train

2. Create conda environment

We will create a conda environment with the required packages.

conda env create -f environment.yml
conda activate no-time-to-train

3. Install SAM2 and DinoV2

We will install SAM2 and DinoV2 from source.

pip install -e .
cd dinov2
pip install -e .
cd ..

4. Download datasets

Please download the COCO dataset and place it in data/coco

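5. Download SAM2 and DinoV2 checkpoints

We will download the SAM2 checkpoints used in the paper. (Note, however, that SAM2.1 checkpoints are already available and might perform better.)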

mkdir -p checkpoints/dinov2
cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt
cd dinov2
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth
cd ../..

📊 Inference code

⚠️ Disclaimer: This is research code, so expect a bit of chaos!

Reproducing 30-shot SOTA results in Few-shot COCO

Define useful variables and create a folder for results:

CONFIG=./no_time_to_train/new_exps/coco_fewshot_10shot_Sam2L.yaml
CLASS_SPLIT="few_shot_classes"
RESULTS_DIR=work_dirs/few_shot_results
SHOTS=30
SEED=33
GPUS=4

mkdir -p $RESULTS_DIR
FILENAME=few_shot_${SHOTS}shot_seed${SEED}.pkl
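
For clarity, the reference-set filename that these variables produce can be reproduced with a small sketch. The helper name `reference_pickle_path` is hypothetical; only the naming pattern comes from the variables above.

```python
def reference_pickle_path(results_dir, shots, seed):
    """Mirror FILENAME=few_shot_${SHOTS}shot_seed${SEED}.pkl inside RESULTS_DIR."""
    return f"{results_dir}/few_shot_{shots}shot_seed{seed}.pkl"

path = reference_pickle_path("work_dirs/few_shot_results", shots=30, seed=33)
print(path)  # work_dirs/few_shot_results/few_shot_30shot_seed33.pkl
```

Changing SHOTS or SEED therefore yields a different pickle file, so reference sets for different settings do not overwrite each other.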

#### 0. Create reference set

python no_time_to_train/dataset/few_shot_sampling.py \
        --n-shot $SHOTS \
        --out-path ${RESULTS_DIR}/${FILENAME} \
        --seed $SEED \
        --dataset $CLASS_SPLIT

#### 1. Fill memory with references

python run_lightening.py test --config $CONFIG \
                              --model.test_mode fill_memory \
                              --out_path ${RESULTS_DIR}/memory.ckpt \
                              --model.init_args.model_cfg.memory_bank_cfg.length $SHOTS \
                              --model.init_args.dataset_cfgs.fill_memory.memory_pkl ${RESULTS_DIR}/${FILENAME} \
                              --model.init_args.dataset_cfgs.fill_memory.memory_length $SHOTS \
                              --model.init_args.dataset_cfgs.fill_memory.class_split $CLASS_SPLIT \
                              --trainer.logger.save_dir ${RESULTS_DIR}/ \
                              --trainer.devices $GPUS

#### 2. Post-process memory bank

python run_lightening.py test --config $CONFIG \
                              --model.test_mode postprocess_memory \
                              --model.init_args.model_cfg.memory_bank_cfg.length $SHOTS \
                              --ckpt_path ${RESULTS_DIR}/memory.ckpt \
                              --out_path ${RESULTS_DIR}/memory_postprocessed.ckpt \
                              --trainer.devices 1

#### 3. Inference on target images

python run_lightening.py test --config $CONFIG  \
                              --ckpt_path ${RESULTS_DIR}/memory_postprocessed.ckpt \
                              --model.init_args.test_mode test \
                              --model.init_args.model_cfg.memory_bank_cfg.length $SHOTS \
                              --model.init_args.model_cfg.dataset_name $CLASS_SPLIT \
                              --model.init_args.dataset_cfgs.test.class_split $CLASS_SPLIT \
                              --trainer.logger.save_dir ${RESULTS_DIR}/ \
                              --trainer.devices $GPUS

If you would like to see inference results online (as they are computed), add the following argument:

    --model.init_args.model_cfg.test.online_vis True

To adjust the score threshold (score_thr), add the following argument (for example, to show only instances with a score higher than 0.4):

    --model.init_args.model_cfg.test.vis_thr 0.4

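Images will now be saved in results_analysis/few_shot_classes/. The image on the left shows the ground truth; the image on the right shows the segmented instances found by our training-free method.

Note that this example uses the few_shot_classes split, so we should only expect to see segmented instances of the classes in this split (not all classes in COCO).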
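
The effect of the visualisation threshold can be illustrated with a short sketch. It assumes each predicted instance carries a confidence score and that only instances scoring strictly above the threshold are drawn; the dictionary layout and the function name `filter_by_score` are assumptions for illustration, not the repository's internal format.

```python
def filter_by_score(instances, vis_thr=0.4):
    """Keep only instances whose score is strictly above vis_thr."""
    return [inst for inst in instances if inst["score"] > vis_thr]

# Hypothetical predictions for one target image.
predictions = [
    {"category": "boat", "score": 0.82},
    {"category": "bird", "score": 0.41},
    {"category": "bird", "score": 0.17},
]

kept = filter_by_score(predictions, vis_thr=0.4)
print([p["category"] for p in kept])  # ['boat', 'bird']
```

Raising the threshold shows fewer, more confident instances; lowering it shows more instances at the cost of more false positives.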

#### Results

After running all images in the validation set, you should obtain:

BBOX RESULTS:
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.368

SEGM RESULTS:
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.342
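
The notation IoU=0.50:0.95 means that AP is averaged over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. As a rough illustration (a simplified sketch, not the COCO evaluation code), the snippet below computes the IoU of one predicted box against one ground-truth box and lists the thresholds at which that prediction would count as a match. The box coordinates are made up.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# The ten thresholds 0.50, 0.55, ..., 0.95.
iou_thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]

pred, gt = (0, 0, 10, 10), (0, 0, 10, 8.2)  # toy boxes
iou = box_iou(pred, gt)
passed = [t for t in iou_thresholds if iou >= t]
print(round(iou, 2), passed)
```

A prediction with an IoU of about 0.82 is a match at the looser thresholds but not at 0.85 and above, which is why this averaged AP is stricter than AP at IoU=0.50 alone.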

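🔍 Custom dataset

We provide instructions for running our pipeline on a custom dataset. Annotations are always expected in COCO format.

TL;DR: To see directly how to run the full pipeline on custom datasets, see scripts/matching_cdfsod_pipeline.sh together with the example scripts for the CD-FSOD datasets (e.g. scripts/dior_fish.sh)

0. Prepare a custom dataset ⛵🐦

Let's imagine we want to detect boats ⛵ and birds 🐦 in a custom dataset. To use our method you will need:

At least 1 annotated reference image for each class (i.e. 1 reference image for boat and 1 reference image for bird)
Multiple target images in which to find instances of the desired classes.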
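
The first requirement can be checked before running the pipeline. The sketch below assumes a standard COCO-format dictionary with `categories` and `annotations` keys; the helper name `check_references`, the file names, and the toy data are hypothetical. In practice the dictionary would be loaded from the references JSON with `json.load`.

```python
from collections import Counter

def check_references(coco, min_shots=1):
    """Return the names of categories with fewer than min_shots annotations."""
    counts = Counter(ann["category_id"] for ann in coco["annotations"])
    return [c["name"] for c in coco["categories"] if counts[c["id"]] < min_shots]

# Toy COCO-format references: one boat annotation and one bird annotation.
references = {
    "images": [{"id": 1, "file_name": "ref_boat.jpg"},
               {"id": 2, "file_name": "ref_bird.jpg"}],
    "categories": [{"id": 9, "name": "boat"}, {"id": 16, "name": "bird"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 9, "bbox": [10, 20, 100, 50]},
        {"id": 2, "image_id": 2, "category_id": 16, "bbox": [5, 5, 40, 30]},
    ],
}

missing = check_references(references, min_shots=1)
print(missing)  # []
```

An empty list means every class has enough reference annotations for the chosen number of shots.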

We have prepared a toy script that creates a custom dataset from coco images, for a 1-shot setting.

mkdir -p data/my_custom_dataset
python scripts/make_custom_dataset.py

This will create a custom dataset with the following folder structure:

data/my_custom_dataset/
    ├── annotations/
    │   ├── custom_references.json
    │   ├── custom_targets.json
    │   └── references_visualisations/
    │       ├── bird_1.jpg
    │       └── boat_1.jpg
    └── images/
        ├── 429819.jpg
        ├── 101435.jpg
        └── (all target and reference images)

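Reference images visualisation (1-shot):

0.1 If only bbox annotations are available

We also provide a script to generate instance-level segmentation masks using SAM2. This is useful if you only have bounding box annotations available for the reference images.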

# Download sam_h checkpoint. Feel free to use more recent checkpoints (note: code might need to be adapted)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth -O checkpoints/sam_vit_h_4b8939.pth
# Run automatic instance segmentation from ground truth bounding boxes.
python no_time_to_train/dataset/sam_bbox_to_segm_batch.py \
    --input_json data/my_custom_dataset/annotations/custom_references.json \
    --image_dir data/my_custom_dataset/images \
    --sam_checkpoint checkpoints/sam_vit_h_4b8939.pth \
    --model_type vit_h \
    --device cuda \
    --batch_size 8 \
    --visualize

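Reference images with instance-level segmentation masks (generated by SAM2 from gt bounding boxes, 1-shot):

Visualisations of the generated segmentation masks are saved in data/my_custom_dataset/annotations/custom_references_with_SAM_segm/references_visualisations/.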
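
One detail worth keeping in mind when prompting a segmentation model with boxes: COCO stores a bounding box as [x, y, width, height], whereas SAM-style box prompts use corner coordinates [x1, y1, x2, y2]. The sketch below shows the conversion. Whether the provided script performs it in exactly this way is an assumption; the helper name `xywh_to_xyxy` is hypothetical.

```python
def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

coco_bbox = [10, 20, 100, 50]  # toy COCO-format box
box_prompt = xywh_to_xyxy(coco_bbox)
print(box_prompt)  # [10, 20, 110, 70]
```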

0.2 Convert coco annotations to pickle file

python no_time_to_train/dataset/coco_to_pkl.py \
    data/my_custom_dataset/annotations/custom_references_with_segm.json \
    data/my_custom_dataset/annotations/custom_references_with_segm.pkl \
    1

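1. Fill memory with references

First, define useful variables and create a folder for results. For labels to be visualised correctly, class names should be ordered by category id as it appears in the json file. For example, bird has category id 16 and boat has category id 9, so CAT_NAMES=boat,bird.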
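
The ordering rule can be automated rather than worked out by hand. The sketch below assumes the standard COCO `categories` list with `id` and `name` fields; the helper name `cat_names_in_id_order` is hypothetical.

```python
def cat_names_in_id_order(categories):
    """Join category names as a comma-separated string, sorted by category id."""
    ordered = sorted(categories, key=lambda c: c["id"])
    return ",".join(c["name"] for c in ordered)

# Categories as they might appear in the references JSON (order not guaranteed).
categories = [{"id": 16, "name": "bird"}, {"id": 9, "name": "boat"}]

cat_names = cat_names_in_id_order(categories)
category_num = len(categories)
print(cat_names)     # boat,bird
print(category_num)  # 2
```

The two printed values correspond to CAT_NAMES and CATEGORY_NUM respectively.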

DATASET_NAME=my_custom_dataset
DATASET_PATH=data/my_custom_dataset
CAT_NAMES=boat,bird
CATEGORY_NUM=2
SHOT=1
YAML_PATH=no_time_to_train/pl_configs/matching_cdfsod_template.yaml
PATH_TO_SAVE_CKPTS=./tmp_ckpts/my_custom_dataset
mkdir -p $PATH_TO_SAVE_CKPTS

Run step 1:

python run_lightening.py test --config $YAML_PATH \
    --model.test_mode fill_memory \
    --out_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory.pth \
    --model.init_args.dataset_cfgs.fill_memory.root $DATASET_PATH/images \
    --model.init_args.dataset_cfgs.fill_memory.json_file $DATASET_PATH/annotations/custom_references_with_segm.json \
    --model.init_args.dataset_cfgs.fill_memory.memory_pkl $DATASET_PATH/annotations/custom_references_with_segm.pkl \
    --model.init_args.dataset_cfgs.fill_memory.memory_length $SHOT \
    --model.init_args.dataset_cfgs.fill_memory.cat_names $CAT_NAMES \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --trainer.devices 1
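
A note on the `\_` in the checkpoint paths: an underscore is a valid character in shell variable names, so `$DATASET_NAME_` would be read as a variable called `DATASET_NAME_`. The backslash makes the underscore literal. The sketch below reproduces the resulting file names; the helper `memory_ckpt_path` is hypothetical and only mirrors the naming pattern used in the commands.

```python
def memory_ckpt_path(ckpt_dir, dataset_name, shot, postprocessed=False):
    """Build the memory checkpoint path used by the fill and post-process steps."""
    suffix = "_postprocessed" if postprocessed else ""
    return f"{ckpt_dir}/{dataset_name}_{shot}_refs_memory{suffix}.pth"

ckpt_dir = "./tmp_ckpts/my_custom_dataset"
raw = memory_ckpt_path(ckpt_dir, "my_custom_dataset", 1)
post = memory_ckpt_path(ckpt_dir, "my_custom_dataset", 1, postprocessed=True)
print(raw)   # ./tmp_ckpts/my_custom_dataset/my_custom_dataset_1_refs_memory.pth
print(post)  # ./tmp_ckpts/my_custom_dataset/my_custom_dataset_1_refs_memory_postprocessed.pth
```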

2. Post-process memory bank

python run_lightening.py test --config $YAML_PATH \
    --model.test_mode postprocess_memory \
    --ckpt_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory.pth \
    --out_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory_postprocessed.pth \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --trainer.devices 1

#### 2.1 Visualise post-processed memory bank

python run_lightening.py test --config $YAML_PATH \
    --model.test_mode vis_memory \
    --ckpt_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory_postprocessed.pth \
    --model.init_args.dataset_cfgs.fill_memory.root $DATASET_PATH/images \
    --model.init_args.dataset_cfgs.fill_memory.json_file $DATASET_PATH/annotations/custom_references_with_segm.json \
    --model.init_args.dataset_cfgs.fill_memory.memory_pkl $DATASET_PATH/annotations/custom_references_with_segm.pkl \
    --model.init_args.dataset_cfgs.fill_memory.memory_length $SHOT \
    --model.init_args.dataset_cfgs.fill_memory.cat_names $CAT_NAMES \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --trainer.devices 1

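PCA and K-means visualisations of the memory bank images are stored in results_analysis/memory_vis/my_custom_dataset.

3. Inference on target images

If ONLINE_VIS is set to True, prediction results are saved in results_analysis/my_custom_dataset/ and displayed as they are computed. Note that running with online visualisation is much slower.

Feel free to change the score threshold VIS_THR to see more or fewer segmented instances.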

ONLINE_VIS=True
VIS_THR=0.4
python run_lightening.py test --config $YAML_PATH \
    --model.test_mode test \
    --ckpt_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory_postprocessed.pth \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --model.init_args.model_cfg.test.imgs_path $DATASET_PATH/images \
    --model.init_args.model_cfg.test.online_vis $ONLINE_VIS \
    --model.init_args.model_cfg.test.vis_thr $VIS_THR \
    --model.init_args.dataset_cfgs.test.root $DATASET_PATH/images \
    --model.init_args.dataset_cfgs.test.json_file $DATASET_PATH/annotations/custom_targets.json \
    --model.init_args.dataset_cfgs.test.cat_names $CAT_NAMES \
    --trainer.devices 1

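Results

Performance metrics (with exactly the same parameters as the commands above) should be:

BBOX RESULTS:
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.478

SEGM RESULTS:
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.458

Visual results are saved in results_analysis/my_custom_dataset/. Note that our method also handles negative images, that is, images that do not contain any instances of the target classes.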


| Target image with boats ⛵ (left GT, right predictions) | Target image with birds 🐦 (left GT, right predictions) |
|:----------------------:|:----------------------:|
| 000000459673 | 000000407180 |

| Target image with boats and birds ⛵🐦 (left GT, right predictions) | Target image without boats or birds 🚫 (left GT, right predictions) |
|:---------------------------------:|:----------------------------------:|
| 000000517410 | 000000460598 |

📚 Citation

If you use this work, please cite us:

@article{espinosa2025notimetotrain,
  title={No time to train! Training-Free Reference-Based Instance Segmentation},
  author={Miguel Espinosa and Chenhongyi Yang and Linus Ericsson and Steven McDonagh and Elliot J. Crowley},
  journal={arXiv preprint arXiv:2507.02798},
  year={2025},
  primaryclass={cs.CV}
}
