Python dependencies:
ninja
Pillow
einops
safetensors
timm
tomesd
torch
torchdiffeq
torchsde
decord
datasets
torchvision
opencv-python>=4.9.0.80
diffusers
transformers
tokenizers>=0.20.3
accelerate>=1.1.1
tqdm
easydict
ftfy
dashscope
imageio-ffmpeg
numpy>=1.23.5,<2
scikit-image
omegaconf
SentencePiece
albumentations
imageio[ffmpeg]
imageio[pyav]
tensorboard
beautifulsoup4
librosa
torchaudio
StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation. You can try it in ComfyUI.
In the ./ComfyUI/custom_nodes directory, run the following:
git clone https://github.com/smthemex/ComfyUI_StableAvatar.git
audio-separator is only needed when running inference on songs (to separate the vocals).
pip install audio-separator --no-deps # optional, only needed for vocal separation
pip install -r requirements.txt
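audio-separator is optional, so a script or custom workflow may want to check for it before trying vocal separation. The sketch below does that check. It assumes the PyPI package "audio-separator" installs a top-level module named `audio_separator`. The function name `vocal_separation_available` is hypothetical and is not part of this node pack.

```python
import importlib.util


def vocal_separation_available() -> bool:
    """Return True if the optional audio-separator package can be imported.

    Assumption: the PyPI package "audio-separator" provides the
    top-level module "audio_separator".
    """
    # find_spec returns None (without importing anything) when the module is absent.
    return importlib.util.find_spec("audio_separator") is not None


if __name__ == "__main__":
    if vocal_separation_available():
        print("audio-separator found: vocal separation for songs is available.")
    else:
        print("audio-separator not installed: skip vocal separation or run "
              "'pip install audio-separator --no-deps'.")
```

`find_spec` is used instead of a plain `import` because it only checks whether the module exists. It does not load audio-separator's heavier dependencies.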
3.1 From FrancisRing/StableAvatar, download "Wan2.1_VAE.pth", "diffusion_pytorch_model.safetensors", "config.json", "Kim_Vocal_2.onnx", and either "transformer3d-rec-vec.pt" or "transformer3d-square.pt". There are two base models; choose one.
3.2 Use the standard ComfyUI models: CLIP Vision H (clip_vision_h) and umt5_xxl_fp8_e4m3fn_scaled.safetensors.
3.4 If you also use EchoMimic V3, you only need to download one base model ("transformer3d-rec-vec.pt" or "transformer3d-square.pt") and "Kim_Vocal_2.onnx". The remaining models are loaded automatically from EchoMimic.
3.5 Optional: the LoRA from Kijai (lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors).
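The either/or rules in 3.1 to 3.5 can be written out as a small checklist function. This is a minimal sketch and makes several assumptions. The helper `files_to_download` and its `base`, `use_echomimic` and `with_lora` parameters are hypothetical. The sketch reads 3.4 as replacing only the 3.1 download list, and it leaves out the ComfyUI models from 3.2, which are not downloaded from FrancisRing/StableAvatar.

```python
# Hypothetical helper (not part of ComfyUI_StableAvatar): encodes the
# download rules from sections 3.1, 3.4 and 3.5 as a checklist.

# Two interchangeable base models; pick exactly one (3.1).
BASE_MODELS = {
    "rec-vec": "transformer3d-rec-vec.pt",
    "square": "transformer3d-square.pt",
}

# StableAvatar files from 3.1 that EchoMimic V3 users can skip (3.4).
STABLEAVATAR_ONLY = [
    "Wan2.1_VAE.pth",
    "diffusion_pytorch_model.safetensors",
    "config.json",
]

# Optional Kijai LoRA (3.5).
OPTIONAL_LORA = "lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors"


def files_to_download(base="rec-vec", use_echomimic=False, with_lora=False):
    """Return the file names to fetch for the chosen setup."""
    if base not in BASE_MODELS:
        raise ValueError(f"base must be one of {sorted(BASE_MODELS)}, got {base!r}")
    # Every setup needs one base model plus the vocal-separation model.
    files = [BASE_MODELS[base], "Kim_Vocal_2.onnx"]
    if not use_echomimic:
        # Without EchoMimic V3, the VAE and Wan transformer come from StableAvatar.
        files.extend(STABLEAVATAR_ONLY)
    if with_lora:
        files.append(OPTIONAL_LORA)
    return files


if __name__ == "__main__":
    for name in files_to_download(base="square"):
        print(name)
```

Two examples of the output: `files_to_download("square", use_echomimic=True)` returns just the square base model and `Kim_Vocal_2.onnx`. The default call returns those two files plus the three StableAvatar files from 3.1.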
├── ComfyUI/models/StableAvatar/transformer
| ├── diffusion_pytorch_model.safetensors # Wan2.1-Fun-V1.1-1.3B-InP transformer, 3.13 GB. Other models use the same file name, so make sure you download this one.
| ├── config.json
├── ComfyUI/models/StableAvatar/wav2vec2-base-960h
| ├── all config json files
| ├── model.safetensors
├── ComfyUI/models/clip
| ├── umt5_xxl_fp8_e4m3fn_scaled.safetensors # from ComfyUI
├── ComfyUI/models/clip_vision
| ├── clip_vision_h # CLIP Vision H, 1.26 GB, from ComfyUI
├── ComfyUI/models/diffusion_models/
| ├── transformer3d-rec-vec.pt # FrancisRing/StableAvatar; choose one of these two
| ├── transformer3d-square.pt # FrancisRing/StableAvatar
├── ComfyUI/models/vae
| ├── Wan2.1_VAE.pth
├── ComfyUI/models/StableAvatar/ # used for audio (vocal) separation
| ├── Kim_Vocal_2.onnx
├── ComfyUI/models/loras/
| ├── lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors # optional, from Kijai
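A quick way to catch misplaced files is to compare the ComfyUI folder against the tree above. The sketch below is a hypothetical checker (`missing_models` is not part of the node pack) and makes three assumptions. Paths are given relative to the ComfyUI root. For wav2vec2-base-960h only `model.safetensors` is checked, because the config JSON files are not listed by name. The CLIP Vision model and the optional LoRA are skipped, since the tree does not give an exact file name for the first and the second is optional.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical checker: paths copied from the directory tree,
# relative to the ComfyUI root folder.
REQUIRED = [
    "models/StableAvatar/transformer/diffusion_pytorch_model.safetensors",
    "models/StableAvatar/transformer/config.json",
    "models/StableAvatar/wav2vec2-base-960h/model.safetensors",
    "models/StableAvatar/Kim_Vocal_2.onnx",
    "models/clip/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "models/vae/Wan2.1_VAE.pth",
]

# Only one of the two base models is needed.
BASE_MODEL_CHOICES = [
    "models/diffusion_models/transformer3d-rec-vec.pt",
    "models/diffusion_models/transformer3d-square.pt",
]


def missing_models(comfy_root: str | Path) -> list[str]:
    """Return the expected model files that are not present under comfy_root."""
    root = Path(comfy_root)
    missing = [rel for rel in REQUIRED if not (root / rel).is_file()]
    # The base model check passes if either alternative exists.
    if not any((root / rel).is_file() for rel in BASE_MODEL_CHOICES):
        missing.append("one of: " + ", ".join(BASE_MODEL_CHOICES))
    return missing


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "ComfyUI"
    problems = missing_models(root)
    if problems:
        print("Missing:")
        for item in problems:
            print("  " + item)
    else:
        print("All expected StableAvatar model files were found.")
```

Run it with the ComfyUI folder as the only argument, for example `python check_models.py /path/to/ComfyUI`. The script name is hypothetical.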
@article{tu2025stableavatar,
title={StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation},
author={Tu, Shuyuan and Pan, Yueming and Huang, Yinming and Han, Xintong and Xing, Zhen and Dai, Qi and Luo, Chong and Wu, Zuxuan and Jiang, Yu-Gang},
journal={arXiv preprint arXiv:2508.08248},
year={2025}
}