toml transformers diffusers datasets pillow sentencepiece protobuf peft torch-optimi tensorboard tqdm safetensors bitsandbytes imageio[ffmpeg] av einops accelerate loguru easydict ftfy decord pyloudnorm #deepspeed
InteractAvatar is a novel dual-stream DiT framework that enables talking avatars to perform Grounded Human-Object Interaction (GHOI). Unlike previous methods restricted to simple gestures, our model can perceive the environment from a static reference image and generate complex, text-guided interactions with objects while maintaining high-fidelity lip synchronization.
In the ./ComfyUI/custom_nodes directory, run the following:
git clone https://github.com/smthemex/ComfyUI_InteractAvatar.git
cd ComfyUI_InteractAvatar
pip install -r requirements.txt
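If you want to confirm that the install step picked up every dependency, a minimal sketch like the following can help. It is not part of this repository: `missing_packages` is a hypothetical helper that uses only the standard-library `importlib.metadata` module, and it assumes requirements.txt lists one requirement per line.

```python
import re
from importlib import metadata


def missing_packages(requirements):
    """Return the requirement names that have no installed distribution.

    `requirements` is an iterable of requirement lines, for example the
    lines of requirements.txt. This helper is illustrative only.
    """
    missing = []
    for line in requirements:
        # Drop comments such as "#deepspeed" and skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Keep only the distribution name: strip extras such as "[ffmpeg]"
        # and any version specifier or environment marker.
        name = re.split(r"[\[<>=!~; ]", line, maxsplit=1)[0]
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Run it with the same Python interpreter that ComfyUI uses, for example `missing_packages(open("requirements.txt").read().splitlines())`. An empty list means every listed package is installed.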
Place the model files in the following ComfyUI folders:
-- ComfyUI/models/vae
    |-- wan2.2_vae.safetensors # or the original Wan2.2_VAE.pth
-- ComfyUI/models/clip
    |-- umt5_xxl_fp8_e4m3fn_scaled.safetensors # or the fp16 version
-- ComfyUI/models/diffusion_models
    |-- interact-avatar-long.safetensors # renamed from diffusion_pytorch_model.safetensors (long or normal variant)
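The layout above can be checked with a short script before starting ComfyUI. This is a minimal sketch, not code from this repository: `EXPECTED` and `find_missing` are hypothetical names, and the script only knows the file names listed above. If you use the fp16 text encoder or a differently named checkpoint, adjust `EXPECTED` to match.

```python
from pathlib import Path

# Expected files per subfolder, taken from the layout above.
# Any one of the listed names satisfies the check for that folder.
EXPECTED = {
    "vae": ["wan2.2_vae.safetensors", "Wan2.2_VAE.pth"],
    "clip": ["umt5_xxl_fp8_e4m3fn_scaled.safetensors"],
    "diffusion_models": ["interact-avatar-long.safetensors"],
}


def find_missing(models_dir):
    """Return one message per subfolder that contains none of its expected files."""
    root = Path(models_dir)
    missing = []
    for subfolder, names in EXPECTED.items():
        if not any((root / subfolder / name).is_file() for name in names):
            missing.append(f"{subfolder}: expected one of {names}")
    return missing
```

Calling `find_missing("ComfyUI/models")` from the directory that contains ComfyUI returns an empty list when all three files are in place, and otherwise names the folders that still need a file.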
@article{zhang2026making,
title={Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars},
author={Zhang, Youliang and Zhou, Zhengguang and Yu, Zhentao and Huang, Ziyao and Hu, Teng and Liang, Sen and Zhang, Guozhen and Peng, Ziqiao and Li, Shunkai and Chen, Yi and Zhou, Zixiang and Zhou, Yuan and Lu, Qinglin and Li, Xiu},
journal={arXiv preprint arXiv:2602.01538},
year={2026}
}
We sincerely thank the contributors to the following projects: