
🚀 No Time to Train!

Training-Free Reference-Based Instance Segmentation

State-of-the-art (Papers with Code): _SOTA 1-shot_ | _SOTA 10-shot_ | _SOTA 30-shot_

🚨 Update (22 July 2025): Instructions for custom datasets have been added!

🔔 Update (16 July 2025): The code has been updated with instructions!

📋 Table of contents

🎯 Highlights
📜 Abstract
🧠 Architecture
🛠️ Installation instructions
1. Clone the repository
2. Create the conda environment
3. Install SAM2 and DinoV2
4. Download datasets
5. Download SAM2 and DinoV2 checkpoints
📊 Inference code: reproduce 30-shot SOTA results in Few-shot COCO
0. Create reference set
1. Fill memory with references
2. Post-process memory bank
3. Inference on target images
Results

🔍 Custom dataset
0. Prepare a custom dataset ⛵🐦
0.1 If only bbox annotations are available
0.2 Convert COCO annotations to a pickle file
1. Fill memory with references
2. Post-process memory bank
📚 Citation

🎯 Highlights

💡 Training-free: no fine-tuning and no prompt engineering, just a reference image
🖼️ Reference-based: segment new objects using only a few examples
🔥 SOTA performance: outperforms previous training-free approaches on COCO, PASCAL VOC and Cross-Domain FSOD


📜 Abstract

The performance of image segmentation models has historically been constrained by the high cost of collecting large-scale annotated data. The Segment Anything Model (SAM) alleviates this original problem through a promptable, semantics-agnostic, segmentation paradigm and yet still requires manual visual-prompts or complex domain-dependent prompt-generation rules to process a new image. Towards reducing this new burden, our work investigates the task of object segmentation when provided with, alternatively, only a small set of reference images. Our key insight is to leverage strong semantic priors, as learned by foundation models, to identify corresponding regions between a reference and a target image. We find that correspondences enable automatic generation of instance-level segmentation masks for downstream tasks and instantiate our ideas via a multi-stage, training-free method incorporating (1) memory bank construction; (2) representation aggregation and (3) semantic-aware feature matching. Our experiments show significant improvements on segmentation metrics, leading to state-of-the-art performance on COCO FSOD (36.8% nAP), PASCAL VOC Few-Shot (71.2% nAP50) and outperforming existing training-free approaches on the Cross-Domain FSOD benchmark (22.4% nAP).
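
To make the three stages named in the abstract more concrete, here is a toy sketch. It is not the authors' implementation: it assumes every reference instance has already been summarised as a single feature vector, averages those vectors into one prototype per class, and labels a target feature by cosine similarity. The function names (`build_memory_bank`, `aggregate`, `match`) and the 3-D features are hypothetical.

```python
import math
from collections import defaultdict


def build_memory_bank(references):
    """Stage 1 (toy): group reference feature vectors by class label."""
    bank = defaultdict(list)
    for label, feature in references:
        bank[label].append(feature)
    return dict(bank)


def aggregate(bank):
    """Stage 2 (toy): average each class's features into one prototype."""
    prototypes = {}
    for label, feats in bank.items():
        dim = len(feats[0])
        prototypes[label] = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
    return prototypes


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match(feature, prototypes):
    """Stage 3 (toy): return (best_label, similarity) for a target feature."""
    scores = {label: cosine(feature, proto) for label, proto in prototypes.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up reference features: two boats and one bird.
refs = [("boat", [1.0, 0.1, 0.0]), ("boat", [0.9, 0.0, 0.1]), ("bird", [0.0, 1.0, 0.2])]
prototypes = aggregate(build_memory_bank(refs))
label, score = match([0.8, 0.2, 0.0], prototypes)
print(label, round(score, 3))
```

The real method operates on dense foundation-model features and produces instance masks; the sketch only shows the reference-to-target matching idea at the level of single vectors.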

[Figure: CD-FSOD results]

🧠 Architecture

[Figure: training-free architecture]

🛠️ Installation instructions

1. Clone the repository

git clone https://github.com/miquel-espinosa/no-time-to-train.git
cd no-time-to-train

2. Create the conda environment

We will create a conda environment with the required packages.

conda env create -f environment.yml
conda activate no-time-to-train

3. Install SAM2 and DinoV2

We will install SAM2 and DinoV2 from source.

pip install -e .
cd dinov2
pip install -e .
cd ..

4. Download datasets

Please download the COCO dataset and place it in data/coco

5. Download SAM2 and DinoV2 checkpoints

We will download the exact SAM2 checkpoints used in the paper. (Note, however, that SAM2.1 checkpoints are already available and might perform better.)

mkdir -p checkpoints/dinov2
cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt
cd dinov2
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth
cd ../..
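
Before moving on, it can be worth confirming that both checkpoints landed where the commands above put them. The following is a small sketch: the two relative paths follow from the `mkdir`, `cd` and `wget` commands, while the helper name `missing_checkpoints` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Relative paths implied by the download commands (run from the repository root).
EXPECTED_CHECKPOINTS = [
    Path("checkpoints/sam2_hiera_large.pt"),
    Path("checkpoints/dinov2/dinov2_vitl14_pretrain.pth"),
]


def missing_checkpoints(root="."):
    """Return the expected checkpoint files that are absent or empty under `root`."""
    root = Path(root)
    missing = []
    for rel in EXPECTED_CHECKPOINTS:
        path = root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel.as_posix())
    return missing


if __name__ == "__main__":
    missing = missing_checkpoints()
    print("All checkpoints found." if not missing else f"Missing: {missing}")
```

An empty file is treated as missing because an interrupted download can leave a zero-byte file behind.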

📊 Inference code

⚠️ Disclaimer: This is research code, so expect a bit of chaos!

Reproducing 30-shot SOTA results in Few-shot COCO

Define useful variables and create a folder for the results:

CONFIG=./no_time_to_train/new_exps/coco_fewshot_10shot_Sam2L.yaml
CLASS_SPLIT="few_shot_classes"
RESULTS_DIR=work_dirs/few_shot_results
SHOTS=30
SEED=33
GPUS=4
mkdir -p $RESULTS_DIR
FILENAME=few_shot_${SHOTS}shot_seed${SEED}.pkl

#### 0. Create reference set

python no_time_to_train/dataset/few_shot_sampling.py \
        --n-shot $SHOTS \
        --out-path ${RESULTS_DIR}/${FILENAME} \
        --seed $SEED \
        --dataset $CLASS_SPLIT
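
Conceptually, this step draws a fixed number of reference annotations per class using a fixed seed, so the same reference set can be recreated later. A minimal sketch of that idea on toy COCO-style annotations follows. It is not the code in `few_shot_sampling.py`, and the function name and output layout are assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_few_shot(annotations, n_shot, seed):
    """Pick up to `n_shot` annotation ids per category, reproducibly."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    by_category = defaultdict(list)
    for ann in annotations:
        by_category[ann["category_id"]].append(ann["id"])
    sampled = {}
    for cat_id in sorted(by_category):  # fixed iteration order keeps results stable
        ids = sorted(by_category[cat_id])
        sampled[cat_id] = sorted(rng.sample(ids, min(n_shot, len(ids))))
    return sampled


# Toy annotations: odd ids belong to category 9, even ids to category 16.
toy = [{"id": i, "category_id": 9 if i % 2 else 16} for i in range(20)]
shots = sample_few_shot(toy, n_shot=3, seed=33)
print(shots)
```

Calling the function again with the same seed returns the same ids, which is the property the `--seed` argument is there to provide.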

#### 1. Fill memory with references

python run_lightening.py test --config $CONFIG \
                              --model.test_mode fill_memory \
                              --out_path ${RESULTS_DIR}/memory.ckpt \
                              --model.init_args.model_cfg.memory_bank_cfg.length $SHOTS \
                              --model.init_args.dataset_cfgs.fill_memory.memory_pkl ${RESULTS_DIR}/${FILENAME} \
                              --model.init_args.dataset_cfgs.fill_memory.memory_length $SHOTS \
                              --model.init_args.dataset_cfgs.fill_memory.class_split $CLASS_SPLIT \
                              --trainer.logger.save_dir ${RESULTS_DIR}/ \
                              --trainer.devices $GPUS

#### 2. Post-process memory bank

python run_lightening.py test --config $CONFIG \
                              --model.test_mode postprocess_memory \
                              --model.init_args.model_cfg.memory_bank_cfg.length $SHOTS \
                              --ckpt_path ${RESULTS_DIR}/memory.ckpt \
                              --out_path ${RESULTS_DIR}/memory_postprocessed.ckpt \
                              --trainer.devices 1

#### 3. Inference on target images

python run_lightening.py test --config $CONFIG  \
                              --ckpt_path ${RESULTS_DIR}/memory_postprocessed.ckpt \
                              --model.init_args.test_mode test \
                              --model.init_args.model_cfg.memory_bank_cfg.length $SHOTS \
                              --model.init_args.model_cfg.dataset_name $CLASS_SPLIT \
                              --model.init_args.dataset_cfgs.test.class_split $CLASS_SPLIT \
                              --trainer.logger.save_dir ${RESULTS_DIR}/ \
                              --trainer.devices $GPUS

If you want to see inference results online (as they are computed), add the argument:

    --model.init_args.model_cfg.test.online_vis True

To adjust the score threshold parameter score_thr, add the following argument (for example, to visualise all instances with a score higher than 0.4):

    --model.init_args.model_cfg.test.vis_thr 0.4

Images will now be saved in results_analysis/few_shot_classes/. The image on the left shows the ground truth; the image on the right shows the segmented instances found by our training-free method.

Note that this example uses the few_shot_classes split, so we should only expect to see segmented instances of the classes in this split (not all classes in COCO).
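
As a rough illustration of what a visualisation score threshold does, the sketch below keeps only predictions scoring above the threshold. The prediction field names and the strict `>` comparison are assumptions for the example and are not taken from the repository.

```python
def filter_by_score(predictions, vis_thr=0.4):
    """Keep predictions whose score is strictly above `vis_thr`."""
    return [p for p in predictions if p["score"] > vis_thr]


# Made-up predictions for illustration.
preds = [
    {"category": "boat", "score": 0.91},
    {"category": "bird", "score": 0.40},
    {"category": "bird", "score": 0.12},
]
kept = filter_by_score(preds, vis_thr=0.4)
print([p["category"] for p in kept])  # prints ['boat']
```

Lowering the threshold shows more, less confident instances; raising it shows fewer.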

#### Results

After running on all images in the validation set, you should obtain:

BBOX RESULTS:
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.368

SEGM RESULTS:
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.342

🔍 Custom dataset

We provide instructions for running our pipeline on a custom dataset. Annotations must always be in COCO format.

TLDR; To see how to run the full pipeline on custom datasets directly, look at scripts/matching_cdfsod_pipeline.sh together with the example scripts for the CD-FSOD datasets (e.g. scripts/dior_fish.sh).

0. Prepare a custom dataset ⛵🐦

Suppose we want to detect boats ⛵ and birds 🐦 in a custom dataset. To use our method you will need:

- At least 1 annotated reference image per class (i.e. 1 reference image for boat and 1 reference image for bird)
- Multiple target images in which to find instances of the desired classes

We provide an example script that creates a custom dataset from COCO images for the 1-shot setting.

mkdir -p data/my_custom_dataset
python scripts/make_custom_dataset.py

This will create a custom dataset with the following folder structure:

data/my_custom_dataset/
    ├── annotations/
    │   ├── custom_references.json
    │   ├── custom_targets.json
    │   └── references_visualisations/
    │       ├── bird_1.jpg
    │       └── boat_1.jpg
    └── images/
        ├── 429819.jpg
        ├── 101435.jpg
        └── (all target and reference images)
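
For orientation, a COCO-style annotation file has three top-level lists: `images`, `annotations` and `categories`. The sketch below builds a toy reference file and runs a few basic consistency checks. The annotation values are made up and `check_coco` is a hypothetical helper, not part of the repository.

```python
REQUIRED_KEYS = ("images", "annotations", "categories")


def check_coco(coco):
    """Return a list of problems found in a COCO-style dict (empty if none)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in coco]
    if problems:
        return problems
    image_ids = {img["id"] for img in coco["images"]}
    cat_ids = {cat["id"] for cat in coco["categories"]}
    for ann in coco["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        if ann["category_id"] not in cat_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        if len(ann.get("bbox", [])) != 4:
            problems.append(f"annotation {ann['id']}: bbox must be [x, y, width, height]")
    return problems


# Toy reference file: one image with one boat annotation (values are illustrative).
references = {
    "images": [{"id": 429819, "file_name": "429819.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 9, "name": "boat"}, {"id": 16, "name": "bird"}],
    "annotations": [
        {"id": 1, "image_id": 429819, "category_id": 9,
         "bbox": [100.0, 120.0, 200.0, 150.0], "area": 30000.0, "iscrowd": 0},
    ],
}
print(check_coco(references))  # prints []
```

Running a check like this on `custom_references.json` and `custom_targets.json` (after loading them with `json.load`) can catch mismatched ids before the pipeline starts.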

[Figure: visualisation of the reference images (1-shot)]

0.1 If only bbox annotations are available

We also provide a script that generates instance-level segmentation masks with SAM2. This is useful when only bounding-box annotations are available for the reference images.

# Download sam_h checkpoint. Feel free to use more recent checkpoints (note: code might need to be adapted)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth -O checkpoints/sam_vit_h_4b8939.pth
# Run automatic instance segmentation from ground truth bounding boxes.
python no_time_to_train/dataset/sam_bbox_to_segm_batch.py \
    --input_json data/my_custom_dataset/annotations/custom_references.json \
    --image_dir data/my_custom_dataset/images \
    --sam_checkpoint checkpoints/sam_vit_h_4b8939.pth \
    --model_type vit_h \
    --device cuda \
    --batch_size 8 \
    --visualize
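
COCO annotations store boxes as [x, y, width, height], whereas box-prompted segmenters such as SAM's `SamPredictor.predict` take corner coordinates (XYXY). The sketch below shows that conversion in isolation. It is an illustration only: the helper names are hypothetical and it is not taken from `sam_bbox_to_segm_batch.py`, whose internals may differ.

```python
def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x0, y0, x1, y1]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def clip_xyxy(box, width, height):
    """Clip an XYXY box so that it lies inside an image of the given size."""
    x0, y0, x1, y1 = box
    return [max(0, x0), max(0, y0), min(width, x1), min(height, y1)]


print(xywh_to_xyxy([100, 120, 200, 150]))  # prints [100, 120, 300, 270]
```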

[Figure: reference images with instance-level segmentation masks (generated by SAM2 from ground-truth bounding boxes, 1-shot)]

Visualisations of the generated segmentation masks are saved in data/my_custom_dataset/annotations/custom_references_with_SAM_segm/references_visualisations/

0.2 Convert COCO annotations to a pickle file

python no_time_to_train/dataset/coco_to_pkl.py \
    data/my_custom_dataset/annotations/custom_references_with_segm.json \
    data/my_custom_dataset/annotations/custom_references_with_segm.pkl \
    1

1. Fill memory with references

First, define the required variables and create a folder for the results. For labels to be displayed correctly, class names must be ordered by the category id that appears in the json file. E.g. bird has category id 16 and boat has category id 9, so CAT_NAMES=boat,bird.

DATASET_NAME=my_custom_dataset
DATASET_PATH=data/my_custom_dataset
CAT_NAMES=boat,bird
CATEGORY_NUM=2
SHOT=1
YAML_PATH=no_time_to_train/pl_configs/matching_cdfsod_template.yaml
PATH_TO_SAVE_CKPTS=./tmp_ckpts/my_custom_dataset
mkdir -p $PATH_TO_SAVE_CKPTS
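
The ordering rule for CAT_NAMES can be sketched as follows, assuming a COCO-style `categories` list like the one in the annotation json. The helper name `cat_names_in_id_order` is hypothetical.

```python
def cat_names_in_id_order(categories, wanted):
    """Return the `wanted` class names as a comma-separated string, sorted by category id."""
    selected = [c for c in categories if c["name"] in wanted]
    selected.sort(key=lambda c: c["id"])
    return ",".join(c["name"] for c in selected)


categories = [{"id": 16, "name": "bird"}, {"id": 9, "name": "boat"}]
print(cat_names_in_id_order(categories, {"bird", "boat"}))  # prints boat,bird
```

The printed string can be pasted directly as the value of CAT_NAMES.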

Run step 1:

python run_lightening.py test --config $YAML_PATH \
    --model.test_mode fill_memory \
    --out_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory.pth \
    --model.init_args.dataset_cfgs.fill_memory.root $DATASET_PATH/images \
    --model.init_args.dataset_cfgs.fill_memory.json_file $DATASET_PATH/annotations/custom_references_with_segm.json \
    --model.init_args.dataset_cfgs.fill_memory.memory_pkl $DATASET_PATH/annotations/custom_references_with_segm.pkl \
    --model.init_args.dataset_cfgs.fill_memory.memory_length $SHOT \
    --model.init_args.dataset_cfgs.fill_memory.cat_names $CAT_NAMES \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --trainer.devices 1

2. Post-process memory bank

python run_lightening.py test --config $YAML_PATH \
    --model.test_mode postprocess_memory \
    --ckpt_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory.pth \
    --out_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory_postprocessed.pth \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --trainer.devices 1

#### 2.1 Visualise the post-processed memory bank

python run_lightening.py test --config $YAML_PATH \
    --model.test_mode vis_memory \
    --ckpt_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory_postprocessed.pth \
    --model.init_args.dataset_cfgs.fill_memory.root $DATASET_PATH/images \
    --model.init_args.dataset_cfgs.fill_memory.json_file $DATASET_PATH/annotations/custom_references_with_segm.json \
    --model.init_args.dataset_cfgs.fill_memory.memory_pkl $DATASET_PATH/annotations/custom_references_with_segm.pkl \
    --model.init_args.dataset_cfgs.fill_memory.memory_length $SHOT \
    --model.init_args.dataset_cfgs.fill_memory.cat_names $CAT_NAMES \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --trainer.devices 1

PCA and K-means visualisations of the memory bank images are stored in results_analysis/memory_vis/my_custom_dataset
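
For readers unfamiliar with K-means, the clustering step behind such visualisations can be illustrated with a toy, pure-Python version. This is a generic sketch on made-up 2-D points, not the repository's visualisation code; initialising the centroids from the first k points is a simplification chosen for determinism.

```python
def kmeans(points, k, iters=10):
    """Plain k-means on tuples; centroids start at the first k points (toy choice)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Two well-separated groups of made-up 2-D "features".
features = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels, centroids = kmeans(features, k=2)
print(labels)  # prints [0, 1, 0, 1, 0, 1]
```

In a visualisation, each cluster label would typically be mapped to a colour so that similar features in a reference image share the same colour.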

3. Inference on target images

If ONLINE_VIS is set to True, prediction results are saved in results_analysis/my_custom_dataset/ and displayed as they are computed. Note that running with online visualisation is much slower.

Feel free to change the score threshold VIS_THR to see more or fewer segmented instances.

ONLINE_VIS=True
VIS_THR=0.4
python run_lightening.py test --config $YAML_PATH \
    --model.test_mode test \
    --ckpt_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory_postprocessed.pth \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --model.init_args.model_cfg.test.imgs_path $DATASET_PATH/images \
    --model.init_args.model_cfg.test.online_vis $ONLINE_VIS \
    --model.init_args.model_cfg.test.vis_thr $VIS_THR \
    --model.init_args.dataset_cfgs.test.root $DATASET_PATH/images \
    --model.init_args.dataset_cfgs.test.json_file $DATASET_PATH/annotations/custom_targets.json \
    --model.init_args.dataset_cfgs.test.cat_names $CAT_NAMES \
    --trainer.devices 1

Results

Performance metrics (with the exact same parameters as the commands above) should be:

BBOX RESULTS:
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.478

SEGM RESULTS:
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.458

Visual results are saved in results_analysis/my_custom_dataset/. Note that our method also works on negative images, that is, images that contain no instances of the desired classes.


| Target image with boats ⛵ (left: GT, right: predictions) | Target image with birds 🐦 (left: GT, right: predictions) |
|:----------------------:|:----------------------:|
| 000000459673 | 000000407180 |

| Target image with boats and birds ⛵🐦 (left: GT, right: predictions) | Target image without boats or birds 🚫 (left: GT, right: predictions) |
|:---------------------------------:|:----------------------------------:|
| 000000517410 | 000000460598 |

📚 Citation

If you use this work, please cite us:

@article{espinosa2025notimetotrain,
  title={No time to train! Training-Free Reference-Based Instance Segmentation},
  author={Miguel Espinosa and Chenhongyi Yang and Linus Ericsson and Steven McDonagh and Elliot J. Crowley},
  journal={arXiv preprint arXiv:2507.02798},
  year={2025},
  primaryclass={cs.CV}
}
