RefineAnything

Multimodal Region-Specific Refinement for Perfect Local Details

RefineAnything targets region-specific image refinement: given an input image and a user-specified region (e.g., scribble mask or bounding box), it restores fine-grained details—text, logos, thin structures—while keeping all non-edited pixels unchanged. It supports both reference-based and reference-free refinement.

Figure: Teaser.

News

2026-04-21 — Environment pinning update. For best results (and to avoid color shifts), please use exactly the versions pinned in requirement.txt: diffusers==0.36.0, transformers==4.55.0, safetensors==0.5.3, peft==0.17.0. See Environment Notice below for a visual comparison.
2026-04-21 — Hugging Face Space environment fixed. The online demo now runs on the correct dependency versions, so refinement results are noticeably better.
2026-04-14 — Community ComfyUI integration by @smthemex: ComfyUI_RefineAnything. Thanks for the great work!
2026-04-14 — Local Gradio demo (app.py) is available for interactive testing.
2026-04-12 — Hugging Face Space demo is live.
2026-04-09 — Checkpoint released on Hugging Face.
2026-04-09 — Release inference scripts.
2026-04-08 — Documentation skeleton added; code release coming this month (inference scripts, environment, and checkpoints will be linked here).
TBD — Checkpoints and training/evaluation resources will be announced once finalized.

---

Highlights

Region-accurate refinement — Explicit region cues (scribbles or boxes) steer edits to the target area.
Reference-based and reference-free — Optional reference image for guided local detail recovery.
Strict background preservation — Edits stay inside the target region; training emphasizes seamless boundaries.
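You can check the background-preservation property on your own outputs by comparing input and output pixels outside the mask. The sketch below is a hypothetical helper, not part of the repository; it assumes both images are already loaded as same-sized uint8 NumPy arrays and that the mask is white (nonzero) where edits are allowed.

```python
import numpy as np


def background_max_diff(inp: np.ndarray, out: np.ndarray, mask: np.ndarray) -> int:
    """Largest absolute pixel change outside the edit region.

    Hypothetical helper (not part of the RefineAnything codebase).
    inp, out: HxW or HxWxC uint8 arrays with identical shapes.
    mask: HxW array; nonzero (white) marks the region that may change.
    """
    if inp.shape != out.shape:
        raise ValueError("input and output must have the same shape")
    keep = mask == 0  # background pixels that should stay untouched
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(inp.astype(np.int16) - out.astype(np.int16))
    if diff.ndim == 3:
        diff = diff.max(axis=2)  # worst channel per pixel
    return int(diff[keep].max()) if keep.any() else 0
```

A result of 0 means every background pixel is bit-identical. Lossy saving (e.g. JPEG) or any resizing will make the value nonzero even if the model itself left the background alone.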

---

Comparisons

Figure: Reference-free qualitative comparisons.

Figure: Reference-based qualitative comparisons.

Installation

```bash
pip install -r requirement.txt
```

Important — pin these versions exactly. RefineAnything is sensitive to small numerical differences in the underlying libraries. Please install exactly the versions below; using newer or older releases can cause visible artifacts such as color shifts in the refined region.

```text
diffusers==0.36.0
transformers==4.55.0
safetensors==0.5.3
peft==0.17.0
```
Environment Notice
We have observed that mismatched versions of diffusers / transformers / safetensors / peft can introduce color shifts in the refined region, even when everything else is identical. The example below uses the prompt "remove the hand":

Figure: Input (masked region = hand), Correct environment, Wrong environment (color shift).

If your output shows a mild color/tone mismatch inside the mask while the rest of the image looks fine, the first thing to check is your package versions.
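A quick way to confirm the pins is to query installed package metadata. The script below is a hypothetical helper (not shipped with the repository) that uses only the Python standard library (`importlib.metadata`, Python 3.8+) and compares what is installed against the four versions pinned in the Installation section.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in requirement.txt, as listed in this README.
PINNED = {
    "diffusers": "0.36.0",
    "transformers": "4.55.0",
    "safetensors": "0.5.3",
    "peft": "0.17.0",
}


def check_pins(pins=PINNED):
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (installed, expected)
    return mismatches


if __name__ == "__main__":
    problems = check_pins()
    if not problems:
        print("All pinned packages match.")
    for name, (installed, expected) in problems.items():
        print(f"{name}: installed={installed}, expected={expected}")
```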
Quick Start
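Only three arguments are required to run RefineAnything (a reference image is optional, and the demos below also pass --output to set where the result is saved):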
| Argument | Description |
|----------|-------------|
| --input | Source image |
| --mask | Binary mask (white = region to refine) |
| --prompt | What to refine |
| --ref | (optional) Reference image for guided refinement |
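If you only have a bounding box, or a soft, anti-aliased brush stroke, you can turn it into a binary mask yourself. The sketch below assumes NumPy; the helper names are hypothetical, and the pure 0/255 convention is our assumption based on "white = region to refine" (we have not checked how strictly `scripts/fast_inference.py` thresholds the mask).

```python
import numpy as np


def box_to_mask(height: int, width: int, box: tuple) -> np.ndarray:
    """Binary mask from a box (x0, y0, x1, y1); x1 and y1 are exclusive.

    White (255) marks the region to refine, black (0) everything else.
    Coordinates outside the image are clamped.
    """
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[max(y0, 0):min(y1, height), max(x0, 0):min(x1, width)] = 255
    return mask


def binarize(gray: np.ndarray, threshold: int = 127) -> np.ndarray:
    """Snap a grayscale scribble (e.g. a soft brush stroke) to pure 0/255."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)
```

To write the mask to disk, for example with Pillow: `Image.fromarray(box_to_mask(h, w, (x0, y0, x1, y1))).save("mask.png")`, then pass that file via `--mask`.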
Demo 1 — Reference-based Logo Refinement
Refine a blurry logo on a pillow using a reference image.
```bash
python scripts/fast_inference.py \
    --input  src/input1.png \
    --mask   src/mask1.png \
    --prompt "Refine the LOGO." \
    --ref    src/ref1.png \
    --output output/demo1.png
```

Figure: Input, Reference, Prompt ("Refine the LOGO."), Output.




Demo 2 — Reference-free Text Refinement
Refine blurry Chinese text (鼎好商城, "Dinghao Mall") on a building sign; no reference image is needed.
```bash
python scripts/fast_inference.py \
    --input  src/input2.png \
    --mask   src/mask2.png \
    --prompt "refine the text '鼎好商城'" \
    --output output/demo2.png
```

Figure: Input, Prompt ("refine the text '鼎好商城'"), Output.
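To run several refinements in a row, you can build the same command from Python. This is a hypothetical wrapper, not part of the repository; it uses only the flags shown in the two demos above and assumes it is run from the repository root.

```python
import shlex
import subprocess
import sys
from typing import List, Optional


def build_refine_cmd(
    input_path: str,
    mask_path: str,
    prompt: str,
    output_path: str,
    ref_path: Optional[str] = None,
) -> List[str]:
    """Assemble a fast_inference.py call; --ref is added only when given."""
    cmd = [
        sys.executable, "scripts/fast_inference.py",
        "--input", input_path,
        "--mask", mask_path,
        "--prompt", prompt,
    ]
    if ref_path is not None:  # reference-based mode
        cmd += ["--ref", ref_path]
    cmd += ["--output", output_path]
    return cmd


def run_refine(**kwargs) -> None:
    """Run one refinement; raises CalledProcessError if the script fails."""
    cmd = build_refine_cmd(**kwargs)
    print(shlex.join(cmd))  # echo the equivalent shell command
    subprocess.run(cmd, check=True)
```

Passing the prompt as a separate list element, rather than through a shell string, avoids quoting problems with prompts that themselves contain quotes, such as the Demo 2 prompt.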




Local Gradio Demo
We also provide a Gradio-based web UI for interactive testing. You can brush regions, upload reference images, and adjust all inference parameters in the browser.
```bash
python app.py
```

Then open http://localhost:7860 in your browser. On first launch, the app automatically downloads the base model (Qwen/Qwen-Image-Edit-2511) and the RefineAnything LoRA from Hugging Face. To use a local copy of the base model instead, set the MODEL_DIR environment variable:
```bash
MODEL_DIR=/path/to/local/Qwen-Image-Edit-2511 python app.py
```
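If the machine running the demo has no internet access, one option is to download the base model ahead of time and point MODEL_DIR at it. The sketch below assumes the `huggingface_hub` package is installed; it fetches only the base model named above (not the RefineAnything LoRA), and `resolve_base_model` is a hypothetical function that mirrors the fallback behavior described above rather than app.py's actual code.

```python
import os
from typing import Mapping, Optional

BASE_MODEL_ID = "Qwen/Qwen-Image-Edit-2511"


def resolve_base_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Use MODEL_DIR if set and non-empty, otherwise the Hugging Face repo id.

    Treating an empty MODEL_DIR as unset is our assumption.
    """
    env = os.environ if env is None else env
    return env.get("MODEL_DIR") or BASE_MODEL_ID


def predownload_base_model(local_dir: str) -> str:
    """Download the base model once; afterwards run MODEL_DIR=<local_dir> python app.py."""
    # Imported lazily so resolve_base_model works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=BASE_MODEL_ID, local_dir=local_dir)
```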

Features of the Gradio demo:
Brush-to-select: paint directly on the source image to define the refinement region.
Optional reference image: upload a second image and optionally brush to crop a specific reference area.
Focus crop: automatically crops and zooms into the edit region for higher detail fidelity, then composites back seamlessly.
Lightning LoRA: one-click toggle for faster inference with fewer steps.
Before / After slider: instantly compare input and output.
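The focus-crop idea can be illustrated as follows: take the mask's bounding box, pad it with some context, refine only that crop, then write back only the masked pixels. This is a rough NumPy sketch under our own assumptions (a fixed pixel margin, a hard paste with no feathering, hypothetical function names); the demo's actual crop size, zoom factor, and blending may differ. The refinement step itself is omitted, and the refined crop must be resized back to the crop's size before `paste_back`.

```python
import numpy as np


def focus_box(mask: np.ndarray, margin: int = 32) -> tuple:
    """Bounding box (x0, y0, x1, y1) of the white region, padded and clamped."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask is empty")
    h, w = mask.shape[:2]
    x0 = max(int(xs.min()) - margin, 0)
    y0 = max(int(ys.min()) - margin, 0)
    x1 = min(int(xs.max()) + 1 + margin, w)  # exclusive bound
    y1 = min(int(ys.max()) + 1 + margin, h)  # exclusive bound
    return x0, y0, x1, y1


def paste_back(image: np.ndarray, refined_crop: np.ndarray,
               mask: np.ndarray, box: tuple) -> np.ndarray:
    """Copy refined pixels into a copy of image, but only where mask is white."""
    x0, y0, x1, y1 = box
    out = image.copy()
    region = out[y0:y1, x0:x1]  # a view, so writes land in `out`
    m = mask[y0:y1, x0:x1] > 0
    region[m] = refined_crop[m]
    return out
```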
---
Citation
If you use this repository, please cite:
```bibtex
@article{zhou2026refineanything,
  title={RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details},
  author={Zhou, Dewei and Li, You and Yang, Zongxin and Yang, Yi},
  journal={arXiv preprint arXiv:2604.06870},
  year={2026}
}
```


Acknowledgements and License
RefineAnything builds on ideas and components from the broader diffusion and multimodal ecosystem (including Qwen2.5-VL, Qwen-Image, and latent diffusion with VAE + MMDiT). Base model weights and API terms are subject to their respective licenses—verify compliance before redistributing checkpoints or derived weights.

Repository code license: TBD (e.g., Apache-2.0 or MIT); set the LICENSE file when you open-source the implementation.
