opencv-python
diffusers==0.33.0
transformers==4.45.1
accelerate
pandas
numpy
einops
tqdm
loguru
imageio
imageio-ffmpeg
safetensors
#gradio==4.42.0
#fastapi==0.115.12
#uvicorn==0.34.2
decord
librosa
scikit-video
ffmpeg
flash-attn
omegaconf
TIPS:
1. Installation
In the ./ComfyUI/custom_nodes directory, run the following:
git clone https://github.com/smthemex/ComfyUI_HunyuanAvatar_Sm.git
2. Requirements
pip install -r requirements.txt
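After installing, it can help to confirm that the two pinned packages resolved to the versions in the requirements list. The following is a minimal sketch, not part of this node: `PINNED` and `check_pins` are hypothetical names, and only the two pins that appear in the requirements (diffusers 0.33.0 and transformers 4.45.1) are checked.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements list; all other entries there are unpinned.
PINNED = {"diffusers": "0.33.0", "transformers": "4.45.1"}


def check_pins(pins, get_version=version):
    """Return {package: (expected, found_or_None)} for every mismatch."""
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Run it with the same Python interpreter that ComfyUI uses; no output means both pins match.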
3. Models
├── ComfyUI/models/HunyuanAvatar/
|   ├── det_align/
|   |   ├── detface.pt
|   ├── llava_llama_image/
|   |   ├── config.json
|   |   ├── ... all JSON files and all safetensors models
|   ├── text_encoder_2/
|   |   ├── config.json
|   |   ├── ... all JSON files and model.safetensors
|   ├── vae/
|   |   ├── config.json
|   |   ├── pytorch_model.pt
|   ├── whisper-tiny/
|   |   ├── config.json
|   |   ├── ... all JSON files and model.safetensors
|   ├── mp_rank_00_model_states_fp8_map.pt  # 104K, download only if using fp8
|   ├── mp_rank_00_model_states_fp8.pt  # 24.9G, download only if using fp8
|   ├── mp_rank_00_model_states.pt
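A quick way to confirm the layout above before launching ComfyUI is to check that each named file exists. This is a rough sketch under stated assumptions: `REQUIRED`, `FP8`, and `missing_files` are hypothetical names, the entries shown as "..." in the tree are not enumerated, and the fp8 weights file is assumed to end in a single `.pt`.

```python
from pathlib import Path

# Files named explicitly in the model tree; "..." entries are not checked.
REQUIRED = [
    "det_align/detface.pt",
    "llava_llama_image/config.json",
    "text_encoder_2/config.json",
    "vae/config.json",
    "vae/pytorch_model.pt",
    "whisper-tiny/config.json",
    "mp_rank_00_model_states.pt",
]

# Only needed when running in fp8. The weights file name is an assumption
# (single ".pt" suffix); adjust it if your download is named differently.
FP8 = [
    "mp_rank_00_model_states_fp8_map.pt",
    "mp_rank_00_model_states_fp8.pt",
]


def missing_files(root, use_fp8=False):
    """Return the relative paths that are not present under root."""
    root = Path(root)
    wanted = REQUIRED + (FP8 if use_fp8 else [])
    return [rel for rel in wanted if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_files("ComfyUI/models/HunyuanAvatar", use_fp8=False):
        print("missing:", rel)
```

Pass `use_fp8=True` to include the two fp8 files in the check; an empty result means every listed file was found.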
4. Example
If you find HunyuanVideo-Avatar useful for your research and applications, please cite using this BibTeX:
@misc{hu2025HunyuanVideo-Avatar,
title={HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters},
author={Yi Chen and Sen Liang and Zixiang Zhou and Ziyao Huang and Yifeng Ma and Junshu Tang and Qin Lin and Yuan Zhou and Qinglin Lu},
year={2025},
eprint={2505.20156},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/pdf/2505.20156},
}
We would like to thank the contributors to the HunyuanVideo, SD3, FLUX, Llama, LLaVA, Xtuner, diffusers and HuggingFace repositories for their open research and exploration.