
🚀 No time to train!

Training-Free Reference-Based Instance Segmentation

State of the art on Papers with Code: _SOTA 1-shot_ | _SOTA 10-shot_ | _SOTA 30-shot_

🚨 Update (22 July 2025): Instructions for custom datasets have been added!

🔔 Update (16 July 2025): The code has been updated with instructions!

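📋 Table of Contents

🎯 Highlights
📜 Abstract
🧠 Architecture
🛠️ Installation instructions
1. Clone the repository
2. Create the conda environment
3. Install SAM2 and DinoV2
4. Download datasets
5. Download SAM2 and DinoV2 checkpoints
📊 Inference code: Reproduce 30-shot SOTA results on Few-shot COCO
0. Create the reference set
1. Fill the memory with references
2. Post-process the memory bank
3. Inference on target images
Results

🔍 Custom dataset
0. Prepare a custom dataset ⛵🐦
0.1 If you only have bounding-box annotations
0.2 Convert COCO annotations to a pickle file
1. Fill the memory with references
2. Post-process the memory bank
3. Inference on target images
Results
📚 Citation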

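🎯 Highlights

💡 Training-free: no fine-tuning and no prompt engineering, just reference images.
🖼️ Reference-based: segments new objects from only a few examples.
🔥 State-of-the-art performance: outperforms previous training-free approaches on COCO, PASCAL VOC, and Cross-Domain FSOD.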


📜 Abstract

The performance of image segmentation models has historically been constrained by the high cost of collecting large-scale annotated data. The Segment Anything Model (SAM) alleviates this original problem through a promptable, semantics-agnostic, segmentation paradigm and yet still requires manual visual-prompts or complex domain-dependent prompt-generation rules to process a new image. Towards reducing this new burden, our work investigates the task of object segmentation when provided with, alternatively, only a small set of reference images. Our key insight is to leverage strong semantic priors, as learned by foundation models, to identify corresponding regions between a reference and a target image. We find that correspondences enable automatic generation of instance-level segmentation masks for downstream tasks and instantiate our ideas via a multi-stage, training-free method incorporating (1) memory bank construction; (2) representation aggregation and (3) semantic-aware feature matching. Our experiments show significant improvements on segmentation metrics, leading to state-of-the-art performance on COCO FSOD (36.8% nAP), PASCAL VOC Few-Shot (71.2% nAP50) and outperforming existing training-free approaches on the Cross-Domain FSOD benchmark (22.4% nAP).
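To make the final "semantic-aware feature matching" stage more concrete, here is a toy sketch in plain Python of the general idea: each target feature is labelled with the class whose reference prototype it most resembles under cosine similarity, or left unlabelled if nothing is similar enough. The 3-D "features", the class names, and the `min_sim` cut-off are made-up assumptions for illustration; this is not the authors' implementation, which operates on SAM2 and DinoV2 features and is considerably more involved.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match(target_feats, prototypes, min_sim=0.5):
    """Label each target feature with its most similar class prototype, or None."""
    labels = []
    for feat in target_feats:
        best_cls, best_sim = None, min_sim  # anything below min_sim stays unlabelled
        for cls, proto in prototypes.items():
            sim = cosine(feat, proto)
            if sim > best_sim:
                best_cls, best_sim = cls, sim
        labels.append(best_cls)
    return labels

# Hypothetical class prototypes, as if built from reference images.
prototypes = {"boat": [0.95, 0.05, 0.05], "bird": [0.05, 0.95, 0.05]}
targets = [[0.95, 0.05, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(match(targets, prototypes))  # ['boat', 'bird', None]
```

The third target resembles neither prototype, so it is left unlabelled; this is what lets a reference-based method return nothing on images that contain none of the reference classes.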

(Figure: Cross-Domain FSOD results)

🧠 Architecture

(Figure: training-free architecture)

🛠️ Installation instructions

1. Clone the repository

git clone https://github.com/miquel-espinosa/no-time-to-train.git
cd no-time-to-train

2. Create the conda environment

Create a conda environment with all the required packages:

conda env create -f environment.yml
conda activate no-time-to-train

3. Install SAM2 and DinoV2

We will install SAM2 and DinoV2 from source.

pip install -e .
cd dinov2
pip install -e .
cd ..

4. Download datasets

Download the COCO dataset and place it in data/coco.

5. Download SAM2 and DinoV2 checkpoints

We will download the exact SAM2 checkpoint used in the paper. (Note, however, that SAM2.1 checkpoints are already available and may perform better.)

mkdir -p checkpoints/dinov2
cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt
cd dinov2
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth
cd ../..

📊 Inference code

⚠️ Disclaimer: this is research code, so expect a bit of chaos!

Reproducing 30-shot SOTA results on Few-shot COCO

Define useful variables and create a folder for the results:

CONFIG=./no_time_to_train/new_exps/coco_fewshot_10shot_Sam2L.yaml
CLASS_SPLIT="few_shot_classes"
RESULTS_DIR=work_dirs/few_shot_results
SHOTS=30
SEED=33
GPUS=4
mkdir -p $RESULTS_DIR
FILENAME=few_shot_${SHOTS}shot_seed${SEED}.pkl

#### 0. Create the reference set

python no_time_to_train/dataset/few_shot_sampling.py \
        --n-shot $SHOTS \
        --out-path ${RESULTS_DIR}/${FILENAME} \
        --seed $SEED \
        --dataset $CLASS_SPLIT
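Conceptually, step 0 draws a fixed, seed-reproducible set of $SHOTS reference annotations per class. The sketch below shows one simple way to do that on a COCO-style annotation dict. It is an illustrative assumption, not the logic of no_time_to_train/dataset/few_shot_sampling.py, which may, for example, apply extra filtering rules or sample at the image level. The function name `sample_k_shot` is hypothetical.

```python
import random
from collections import defaultdict

def sample_k_shot(coco, k, seed):
    """Pick up to k annotation ids per category, reproducibly for a given seed.

    `coco` is a COCO-style dict with "annotations" (each with "id" and
    "category_id") and "categories" (each with "id").
    Returns {category_id: [annotation ids]}.
    """
    by_cat = defaultdict(list)
    for ann in coco["annotations"]:
        by_cat[ann["category_id"]].append(ann["id"])

    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    sampled = {}
    for cat in sorted(c["id"] for c in coco["categories"]):
        pool = sorted(by_cat.get(cat, []))  # fixed order, so the seed fully determines the result
        sampled[cat] = rng.sample(pool, min(k, len(pool)))
    return sampled

# Toy data: odd annotation ids are boats (9), even ids are birds (16).
toy = {
    "categories": [{"id": 9, "name": "boat"}, {"id": 16, "name": "bird"}],
    "annotations": [{"id": i, "category_id": 9 if i % 2 else 16} for i in range(10)],
}
print(sample_k_shot(toy, k=3, seed=33))
```

Using a dedicated `random.Random(seed)` instance is what makes the same `$SEED` give the same reference set on every run.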

#### 1. Fill the memory with references

python run_lightening.py test --config $CONFIG \
                              --model.test_mode fill_memory \
                              --out_path ${RESULTS_DIR}/memory.ckpt \
                              --model.init_args.model_cfg.memory_bank_cfg.length $SHOTS \
                              --model.init_args.dataset_cfgs.fill_memory.memory_pkl ${RESULTS_DIR}/${FILENAME} \
                              --model.init_args.dataset_cfgs.fill_memory.memory_length $SHOTS \
                              --model.init_args.dataset_cfgs.fill_memory.class_split $CLASS_SPLIT \
                              --trainer.logger.save_dir ${RESULTS_DIR}/ \
                              --trainer.devices $GPUS
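Step 1 stores reference information in a memory bank whose per-class capacity is set by memory_bank_cfg.length ($SHOTS). As a rough mental model only, the hypothetical `FixedMemoryBank` class below keeps one slot per (class, shot) pair and refuses to overfill a class. The repository's real memory bank is a model component holding SAM2/DinoV2 features, and its layout may differ; `category_num` here mirrors the memory_bank_cfg.category_num option used later for custom datasets.

```python
class FixedMemoryBank:
    """Hypothetical fixed-size memory: category_num classes x length shots."""

    def __init__(self, category_num, length):
        self.length = length
        # One row per class, one slot per reference shot; None marks an empty slot.
        self.slots = [[None] * length for _ in range(category_num)]

    def add(self, class_idx, feature):
        """Store a feature in the first free slot of a class; raise if the class is full."""
        row = self.slots[class_idx]
        for i, slot in enumerate(row):
            if slot is None:
                row[i] = feature
                return i
        raise ValueError(f"class {class_idx} already holds {self.length} references")

    def filled(self, class_idx):
        """Number of references stored for a class."""
        return sum(slot is not None for slot in self.slots[class_idx])

bank = FixedMemoryBank(category_num=2, length=1)
bank.add(0, [1.0, 0.0])
print(bank.filled(0), bank.filled(1))  # 1 0
```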

#### 2. Post-process the memory bank

python run_lightening.py test --config $CONFIG \
                              --model.test_mode postprocess_memory \
                              --model.init_args.model_cfg.memory_bank_cfg.length $SHOTS \
                              --ckpt_path ${RESULTS_DIR}/memory.ckpt \
                              --out_path ${RESULTS_DIR}/memory_postprocessed.ckpt \
                              --trainer.devices 1
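The README does not spell out what postprocess_memory does. Purely as an illustration, and assuming it resembles the "representation aggregation" stage named in the abstract, the sketch below collapses each class's stored reference features into a single L2-normalised mean vector, skipping empty slots. The function name `aggregate` and the mean-then-normalise choice are assumptions, not the repository's code.

```python
import math

def aggregate(memory_rows):
    """Turn per-class lists of feature vectors (None = empty slot) into unit-length mean prototypes."""
    prototypes = []
    for row in memory_rows:
        feats = [f for f in row if f is not None]
        if not feats:
            prototypes.append(None)  # class with no stored references
            continue
        dim = len(feats[0])
        mean = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in mean))
        prototypes.append([x / norm for x in mean] if norm else mean)
    return prototypes

rows = [[[3.0, 0.0], [5.0, 0.0]], [[0.0, 2.0], None]]
print(aggregate(rows))  # [[1.0, 0.0], [0.0, 1.0]]
```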

#### 3. Inference on target images

python run_lightening.py test --config $CONFIG  \
                              --ckpt_path ${RESULTS_DIR}/memory_postprocessed.ckpt \
                              --model.init_args.test_mode test \
                              --model.init_args.model_cfg.memory_bank_cfg.length $SHOTS \
                              --model.init_args.model_cfg.dataset_name $CLASS_SPLIT \
                              --model.init_args.dataset_cfgs.test.class_split $CLASS_SPLIT \
                              --trainer.logger.save_dir ${RESULTS_DIR}/ \
                              --trainer.devices $GPUS

To watch the inference results online as they are computed, add the following argument:

    --model.init_args.model_cfg.test.online_vis True

To change the score threshold used for visualisation, add the vis_thr argument (for example, to visualise all instances with a score higher than 0.4):

    --model.init_args.model_cfg.test.vis_thr 0.4

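Images will now be saved in results_analysis/few_shot_classes/. The left image shows the ground truth; the right image shows the segmented instances found by our training-free method.

Note that this example uses the few_shot_classes split, so you will only see segmented instances of the classes in that split (not all COCO classes).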
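The effect of vis_thr is essentially a score cut-off on what gets drawn. A minimal sketch, assuming predictions are COCO-style result dicts with a "score" field (an assumption about the internal format); the text says "higher than", so a strict comparison is used, although the repository's exact comparison is not stated.

```python
def filter_by_score(predictions, vis_thr=0.4):
    """Keep only predictions whose score is strictly higher than vis_thr."""
    return [p for p in predictions if p["score"] > vis_thr]

# Hypothetical predictions for one target image.
preds = [
    {"category_id": 9, "score": 0.91},
    {"category_id": 16, "score": 0.40},
    {"category_id": 16, "score": 0.12},
]
print([p["score"] for p in filter_by_score(preds)])  # [0.91]
```

Lowering the threshold draws more (and noisier) instances; raising it draws fewer.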

#### Results

After running on all images in the validation set, you should obtain:

BBOX RESULTS:
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.368

SEGM RESULTS:
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.342

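🔍 Custom dataset

We provide instructions for running our pipeline on a custom dataset. Annotations must always be in COCO format.

TL;DR: to see directly how to run the full pipeline on a custom dataset, look at scripts/matching_cdfsod_pipeline.sh together with the example scripts for the CD-FSOD datasets (e.g. scripts/dior_fish.sh).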

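0. Prepare a custom dataset ⛵🐦

Suppose we want to detect boats ⛵ and birds 🐦 in a custom dataset. To use our method we need:

At least 1 annotated reference image per class (e.g. 1 image of a boat and 1 image of a bird)
Multiple target images in which to find instances of the desired classes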
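Before running anything, it can help to confirm that the reference annotations really cover every class. A minimal standard-library sketch, assuming a COCO-style JSON with "categories" and "annotations" keys (as in custom_references.json); the helper names are hypothetical.

```python
import json
from collections import Counter

def classes_without_references(coco):
    """Return names of categories that have no annotation in a COCO-style dict."""
    counts = Counter(ann["category_id"] for ann in coco["annotations"])
    return [c["name"] for c in coco["categories"] if counts[c["id"]] == 0]

def check_reference_file(path):
    """Load a COCO JSON file and raise if some class has no reference annotation."""
    with open(path) as f:
        missing = classes_without_references(json.load(f))
    if missing:
        raise ValueError(f"no reference annotation for: {', '.join(missing)}")

# Toy references: a boat is annotated, but no bird is.
refs = {
    "categories": [{"id": 9, "name": "boat"}, {"id": 16, "name": "bird"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 9}],
}
print(classes_without_references(refs))  # ['bird']
```

On a real file this would be called as, e.g., `check_reference_file("data/my_custom_dataset/annotations/custom_references.json")`.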

We have prepared a toy script that builds a custom dataset from COCO images for a 1-shot setting.

mkdir -p data/my_custom_dataset
python scripts/make_custom_dataset.py

This creates a custom dataset with the following folder structure:

data/my_custom_dataset/
    ├── annotations/
    │   ├── custom_references.json
    │   ├── custom_targets.json
    │   └── references_visualisations/
    │       ├── bird_1.jpg
    │       └── boat_1.jpg
    └── images/
        ├── 429819.jpg
        ├── 101435.jpg
        └── (all target and reference images)
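Since all target and reference images live in a single images/ folder, a quick consistency check is to make sure every file_name listed in the two annotation files is actually there. A minimal sketch, assuming standard COCO "images" entries whose file_name is relative to images/; the function names are hypothetical.

```python
import json
import os

def missing_files(coco_dicts, available):
    """Return file names referenced in COCO 'images' entries but absent from `available`."""
    wanted = {img["file_name"] for coco in coco_dicts for img in coco["images"]}
    return sorted(wanted - set(available))

def check_dataset(root):
    """Check both annotation files under `root` against the contents of images/."""
    dicts = []
    for name in ("custom_references.json", "custom_targets.json"):
        with open(os.path.join(root, "annotations", name)) as f:
            dicts.append(json.load(f))
    return missing_files(dicts, os.listdir(os.path.join(root, "images")))

# Toy example: 1.jpg is listed in the annotations but not present on disk.
refs = {"images": [{"id": 429819, "file_name": "429819.jpg"}]}
targets = {"images": [{"id": 101435, "file_name": "101435.jpg"}, {"id": 1, "file_name": "1.jpg"}]}
print(missing_files([refs, targets], ["429819.jpg", "101435.jpg"]))  # ['1.jpg']
```

For the toy dataset, `check_dataset("data/my_custom_dataset")` returning an empty list would indicate that nothing is missing.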

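Reference image visualisations (1-shot):

0.1 If you only have bounding-box annotations

We also provide a script that uses SAM to generate instance-level segmentation masks. This is useful if your reference images only have bounding-box annotations.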

# Download sam_h checkpoint. Feel free to use more recent checkpoints (note: code might need to be adapted)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth -O checkpoints/sam_vit_h_4b8939.pth
# Run automatic instance segmentation from ground-truth bounding boxes.
python no_time_to_train/dataset/sam_bbox_to_segm_batch.py \
    --input_json data/my_custom_dataset/annotations/custom_references.json \
    --image_dir data/my_custom_dataset/images \
    --sam_checkpoint checkpoints/sam_vit_h_4b8939.pth \
    --model_type vit_h \
    --device cuda \
    --batch_size 8 \
    --visualize
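One detail worth knowing when prompting SAM with ground-truth boxes: COCO stores boxes as [x, y, width, height], whereas the box prompt of segment_anything's `SamPredictor.predict` expects [x0, y0, x1, y1] corners (XYXY). Whether sam_bbox_to_segm_batch.py performs this conversion internally is not stated here; the sketch below only illustrates the conversion itself, with optional clipping to the image bounds as an extra assumption.

```python
def coco_to_xyxy(bbox, width=None, height=None):
    """Convert a COCO [x, y, w, h] box to [x0, y0, x1, y1], optionally clipped to the image."""
    x, y, w, h = bbox
    x0, y0, x1, y1 = x, y, x + w, y + h
    if width is not None and height is not None:
        # Keep corners inside the image so the prompt never points outside it.
        x0, x1 = max(0.0, x0), min(float(width), x1)
        y0, y1 = max(0.0, y0), min(float(height), y1)
    return [x0, y0, x1, y1]

print(coco_to_xyxy([10.0, 20.0, 30.0, 40.0]))  # [10.0, 20.0, 40.0, 60.0]
```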

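Reference images (1-shot) with instance-level segmentation masks generated by SAM from the GT bounding boxes:

Visualisations of the generated segmentation masks are saved in data/my_custom_dataset/annotations/custom_references_with_SAM_segm/references_visualisations/.

0.2 Convert COCO annotations to a pickle file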

python no_time_to_train/dataset/coco_to_pkl.py \
    data/my_custom_dataset/annotations/custom_references_with_segm.json \
    data/my_custom_dataset/annotations/custom_references_with_segm.pkl \
    1
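The README does not describe the layout of the generated .pkl file, so rather than guess at it, the sketch below simply loads the file and summarises its top-level structure, which is handy for sanity-checking the conversion. As with any pickle, only load files you created or otherwise trust. The helper names are hypothetical.

```python
import pickle

def summarize(obj, max_items=5):
    """Describe the top level of a loaded object: its type plus a few keys or its length."""
    kind = type(obj).__name__
    if isinstance(obj, dict):
        keys = list(obj)[:max_items]
        return f"{kind} with {len(obj)} keys, e.g. {keys}"
    if isinstance(obj, (list, tuple)):
        return f"{kind} of length {len(obj)}"
    return kind

def summarize_pickle(path):
    """Load a trusted pickle file and summarise it."""
    with open(path, "rb") as f:
        return summarize(pickle.load(f))

print(summarize({9: [1, 2], 16: [3]}))  # dict with 2 keys, e.g. [9, 16]
```

For the toy dataset this would be called as `summarize_pickle("data/my_custom_dataset/annotations/custom_references_with_segm.pkl")`.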

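1. Fill the memory with references

First, define useful variables and create a folder for the results. For labels to be visualised correctly, class names must be listed in the order of their category ids in the JSON file. For example, bird has category id 16 and boat has category id 9, so CAT_NAMES=boat,bird.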

DATASET_NAME=my_custom_dataset
DATASET_PATH=data/my_custom_dataset
CAT_NAMES=boat,bird
CATEGORY_NUM=2
SHOT=1
YAML_PATH=no_time_to_train/pl_configs/matching_cdfsod_template.yaml
PATH_TO_SAVE_CKPTS=./tmp_ckpts/my_custom_dataset
mkdir -p $PATH_TO_SAVE_CKPTS
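Rather than ordering class names by hand, CAT_NAMES and CATEGORY_NUM can be derived from the annotation file. A minimal sketch, assuming a standard COCO "categories" list; the helper names and the shell-assignment output are illustrative and not part of the repository.

```python
import json

def cat_names_by_id(coco):
    """Category names sorted by ascending category id, as CAT_NAMES expects."""
    return [c["name"] for c in sorted(coco["categories"], key=lambda c: c["id"])]

def shell_vars(coco):
    """Format CAT_NAMES and CATEGORY_NUM as shell variable assignments."""
    names = cat_names_by_id(coco)
    return f"CAT_NAMES={','.join(names)}\nCATEGORY_NUM={len(names)}"

def shell_vars_from_file(path):
    """Read a COCO JSON file and return the two assignments."""
    with open(path) as f:
        return shell_vars(json.load(f))

example = {"categories": [{"id": 16, "name": "bird"}, {"id": 9, "name": "boat"}]}
print(shell_vars(example))  # prints two lines: CAT_NAMES=boat,bird and CATEGORY_NUM=2
```

Assuming custom_references_with_segm.json lists only the boat and bird categories, `shell_vars_from_file` on it should reproduce the values set above.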

Run step 1:

python run_lightening.py test --config $YAML_PATH \
    --model.test_mode fill_memory \
    --out_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory.pth \
    --model.init_args.dataset_cfgs.fill_memory.root $DATASET_PATH/images \
    --model.init_args.dataset_cfgs.fill_memory.json_file $DATASET_PATH/annotations/custom_references_with_segm.json \
    --model.init_args.dataset_cfgs.fill_memory.memory_pkl $DATASET_PATH/annotations/custom_references_with_segm.pkl \
    --model.init_args.dataset_cfgs.fill_memory.memory_length $SHOT \
    --model.init_args.dataset_cfgs.fill_memory.cat_names $CAT_NAMES \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --trainer.devices 1

2. Post-process the memory bank

python run_lightening.py test --config $YAML_PATH \
    --model.test_mode postprocess_memory \
    --ckpt_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory.pth \
    --out_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory_postprocessed.pth \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --trainer.devices 1

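3. Inference on target images

If ONLINE_VIS is set to True, prediction results are saved in results_analysis/my_custom_dataset/ and displayed as they are computed. Note that running with online visualisation is much slower.

Feel free to change the score threshold VIS_THR to see more or fewer segmented instances.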

ONLINE_VIS=True
VIS_THR=0.4
python run_lightening.py test --config $YAML_PATH \
    --model.test_mode test \
    --ckpt_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory_postprocessed.pth \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --model.init_args.model_cfg.test.imgs_path $DATASET_PATH/images \
    --model.init_args.model_cfg.test.online_vis $ONLINE_VIS \
    --model.init_args.model_cfg.test.vis_thr $VIS_THR \
    --model.init_args.dataset_cfgs.test.root $DATASET_PATH/images \
    --model.init_args.dataset_cfgs.test.json_file $DATASET_PATH/annotations/custom_targets.json \
    --model.init_args.dataset_cfgs.test.cat_names $CAT_NAMES \
    --trainer.devices 1

Results

The performance metrics (using exactly the same parameters as in the commands above) should be:

BBOX RESULTS:
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.478

SEGM RESULTS:
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.458

Visual results are saved in results_analysis/my_custom_dataset/. Our method also handles negative images, i.e. images that contain no instances of the desired classes.


| Target image with boats ⛵ (left GT, right prediction) | Target image with birds 🐦 (left GT, right prediction) |
|:----------------------:|:----------------------:|
| 000000459673 | 000000407180 |

| Target image with boats and birds ⛵🐦 (left GT, right prediction) | Target image without boats or birds 🚫 (left GT, right prediction) |
|:---------------------------------:|:----------------------------------:|
| 000000517410 | 000000460598 |

📚 Citation

If you use this work, please cite it as follows:

@article{espinosa2025notimetotrain,
  title={No time to train! Training-Free Reference-Based Instance Segmentation},
  author={Miguel Espinosa and Chenhongyi Yang and Linus Ericsson and Steven McDonagh and Elliot J. Crowley},
  journal={arXiv preprint arXiv:2507.02798},
  year={2025},
  primaryclass={cs.CV}
}
