
ComfyUI-DaVinci-MagiHuman

by mjansrud

Edit: This repo has been archived, as I have not been able to generate good-enough results with the model; I'll stick to LTX2.3 for now. Feel free to keep working on it.

Edit: NOTE! This is still a work in progress; do not expect it to work. I'm going away on Easter holiday and won't have time to look at it until I'm back. Feel free to fork it and continue the work, or wait for Kijai to release his version.

For now, the code automatically downloads the required text encoder and Wan VAE from Hugging Face (this behavior will change later), so expect the first run to take some time.
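The first-run check can be sketched in plain Python. Note that the folder names `text_encoder` and `wan_vae` below are hypothetical placeholders, not the node pack's actual paths, and the download call is shown only as a comment:

```python
# Sketch of a first-run check for the auto-downloaded components.
# Folder names are hypothetical; adjust to the node pack's real layout.
from pathlib import Path

def missing_components(models_root):
    """Return the required component folders absent under models_root."""
    required = ["text_encoder", "wan_vae"]  # hypothetical names
    root = Path(models_root)
    return [name for name in required if not (root / name).is_dir()]

# For each missing component the loader would then run something like
#   huggingface_hub.snapshot_download(repo_id=..., local_dir=root / name)
# which is why the first run takes a while.
```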


ComfyUI custom nodes for daVinci-MagiHuman, a 15B-parameter single-stream transformer for fast audio-video generation. Optimized for consumer GPUs (RTX 5090, 32GB).

Features

Nodes

| Node | Description |
|------|-------------|
| DaVinci Model Loader | Load distill/base/SR model with configurable `blocks_on_gpu` |
| DaVinci TurboVAE Loader | Load the fast decode-only VAE |
| DaVinci Text Encode | Text prompt to embeddings (accepts external T5 encoder) |
| DaVinci Sampler | Denoising loop (8 steps distill / 32 steps base) |
| DaVinci Super Resolution | Upscale 256p latent to 1080p with SR model |
| DaVinci Decode | TurboVAE latent-to-video with output offload |
| DaVinci Video Output | Save to mp4/webm via FFmpeg |

Workflow

```
Model Loader (distill, 8 blocks on GPU)
  → Text Encode
    → Sampler (256p, 8 steps)
      → [optional] SR Model Loader (1080p_sr) → Super Resolution
        → TurboVAE Loader → Decode → Video Output
```
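The graph above can be sketched as ordinary Python; the step names are stand-ins for the real ComfyUI nodes, not an actual API:

```python
# Stand-in sketch of the node graph above; each string represents a node.
def build_pipeline(use_sr=True):
    """Return the ordered list of pipeline steps, with SR optional."""
    steps = [
        "model_loader(distill, blocks_on_gpu=8)",
        "text_encode",
        "sampler(256p, steps=8)",
    ]
    if use_sr:  # the optional 1080p upscaling branch
        steps += ["sr_model_loader(1080p_sr)", "super_resolution"]
    steps += ["turbo_vae_loader", "decode", "video_output"]
    return steps
```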

Requirements

Model Setup

Download model weights from HuggingFace:

```bash
cd ComfyUI/models

# Clone without large files
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/GAIR/daVinci-MagiHuman
cd daVinci-MagiHuman

# Pull only what you need (skip 540p_sr if you only want 1080p)
git lfs pull --include="distill/*,turbo_vae/*"   # ~61GB - base generation
git lfs pull --include="1080p_sr/*"              # ~61GB - 1080p upscaling
```

Expected directory structure:

```
ComfyUI/models/daVinci-MagiHuman/
├── distill/          # 8-step distilled model (~61GB)
├── 1080p_sr/         # Super-resolution model (~61GB)
├── turbo_vae/        # Fast decoder (small)
├── base/             # Full 32-step model (optional, ~30GB)
└── 540p_sr/          # 540p SR (optional, ~61GB)
```
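A rough download-size planner using the approximate figures from the tree above (`turbo_vae` is only described as "small", so it is counted as 0 here):

```python
# Approximate component sizes in GB, taken from the directory tree above.
# turbo_vae is only described as "small", so unknown names count as 0.
SIZES_GB = {"distill": 61, "1080p_sr": 61, "base": 30, "540p_sr": 61}

def planned_download_gb(components):
    """Sum the approximate sizes of the selected components."""
    return sum(SIZES_GB.get(c, 0) for c in components)

# The recommended distill + 1080p_sr pull:
# planned_download_gb(["distill", "1080p_sr"]) -> 122 (GB)
```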

VRAM Guide

| blocks_on_gpu | VRAM Usage | Speed | Recommended For |
|---------------|------------|-------|-----------------|
| 4 | ~3GB + overhead | Slowest | 16GB GPUs |
| 8 | ~6GB + overhead | Good | 24-32GB GPUs |
| 16 | ~12GB + overhead | Fast | 48GB GPUs |
| 40 | ~30GB | Fastest | 80GB+ GPUs |
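The table works out to roughly 0.75 GB of weights per resident block (3/4 = 6/8 = 12/16 = 30/40 = 0.75). A back-of-the-envelope estimator, not an official formula:

```python
# VRAM estimate implied by the table above: each resident transformer
# block costs roughly 0.75 GB of weights, excluding the "+ overhead"
# (activations, KV cache, etc.).
GB_PER_BLOCK = 0.75

def estimate_vram_gb(blocks_on_gpu):
    """Approximate weight VRAM in GB for a given blocks_on_gpu setting."""
    return blocks_on_gpu * GB_PER_BLOCK

# estimate_vram_gb(8) -> 6.0, matching the 24-32GB row.
```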

Text Encoder

daVinci-MagiHuman uses T5Gemma-9B as its text encoder. The DaVinci Text Encode node currently provides:

For production use, connect a T5-XXL or T5Gemma encoder node to the t5_embeds input.
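A minimal shape check illustrating the assumed `t5_embeds` contract — a `[batch, tokens, dim]` embedding tensor. The default `expected_dim=4096` is a placeholder guess, not a verified value for T5Gemma-9B:

```python
# Illustrates the assumed t5_embeds contract: a [batch, tokens, dim] tensor.
# expected_dim=4096 is a placeholder; check your encoder's actual hidden size.
def validate_t5_embeds(shape, expected_dim=4096):
    """Accept only 3-D shapes whose last axis matches the embedding dim."""
    return len(shape) == 3 and shape[-1] == expected_dim
```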

Architecture

The model is a single-stream transformer that jointly generates video and audio:

Credits

License

Apache 2.0
