accelerate==0.21.0
av==11.0.0
clip @ https://github.com/openai/CLIP/archive/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1.zip#sha256=b5842c25da441d6c581b53a5c60e0c2127ebafe0f746f8e15561a006c6c3be6a
decord==0.6.0
diffusers==0.24.0
einops==0.4.1
gradio==3.41.2
gradio_client==0.5.0
imageio==2.33.0
imageio-ffmpeg==0.4.9
numpy==1.23.5
omegaconf==2.2.3
onnxruntime-gpu==1.16.3
open-clip-torch==2.20.0
opencv-contrib-python==4.8.1.78
opencv-python==4.8.1.78
Pillow==9.5.0
scikit-image==0.21.0
scikit-learn==1.3.2
scipy==1.11.4
torch==2.0.1
torchdiffeq==0.2.3
torchmetrics==1.2.1
torchsde==0.2.5
torchvision==0.15.2
tqdm==4.66.1
transformers==4.30.2
mlflow==2.9.2
xformers==0.0.22
controlnet-aux==0.0.7
ffmpeg-python
soundfile
mediapipe
decord
IPython
scenedetect
You can use EchoMimic, EchoMimic V2, and EchoMimic V3 in ComfyUI.
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
EchoMimic_v2: Towards Striking, Simplified, and Semi-Body Human Animation
EchoMimic_v3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation
# Previous
In the ./ComfyUI/custom_nodes directory, run the following:
git clone https://github.com/smthemex/ComfyUI_EchoMimic.git
pip install -r requirements.txt
If you use the V1 version:
pip install --no-deps facenet-pytorch
If you use the V3 version (not needed for the V3 flash model):
pip install retina-face==0.0.17 # the model must be downloaded from an overseas site; fix pending
pip install mmgp # optional
pip install tensorflow==2.15.0 # newer versions may raise errors (unconfirmed)
pip uninstall ffmpeg
pip install ffmpeg-python
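After installation, it can help to compare a few of the pinned versions with what is actually installed. The sketch below is not part of the plugin. The `PINS` dictionary copies only a handful of entries from requirements.txt, and `check_pins` is a hypothetical helper name. A local build suffix such as `+cu118` is ignored in the comparison, which is an assumption about how strict the check should be.

```python
from importlib import metadata

# A few pins copied from requirements.txt; extend the dictionary as needed.
PINS = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "diffusers": "0.24.0",
    "transformers": "4.30.2",
    "numpy": "1.23.5",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for each mismatch.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            # Drop a local build suffix such as "+cu118" before comparing.
            found = get_version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


# Report every pinned package whose installed version differs.
for name, (expected, found) in check_pins(PINS).items():
    print(f"{name}: expected {expected}, found {found}")
```

Run it with the same Python interpreter that ComfyUI uses, otherwise the reported versions describe a different environment.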
3.1 v3 version
3.1.1 From Wan2.1-Fun-V1.1-1.3B-InP, download Wan2.1_VAE.pth and diffusion_pytorch_model.safetensors (v3 and v3 flash)
3.1.2 Use ComfyUI's clipvison-h and umt5_xxl_fp8_e4m3fn_scaled.safetensors (v3 and v3 flash)
3.1.3 wav2vec2-base-960h (v3 only)
3.1.4 BadToBest/EchoMimicV3 transformer (v3 only)
3.1.5 retinaface.h5: usually downloaded automatically if it is not in the directory (v3 only)
3.1.6 Optional: LoRA from kijai (v3 only)
3.1.7 BadToBest/EchoMimicV3/echomimicv3-flash-pro (v3 flash)
3.1.8 chinese-wav2vec2-base (v3 flash)
├── ComfyUI/models/echo_mimic/transformer
| ├── diffusion_pytorch_model.safetensors # Wan2.1-Fun-V1.1-1.3B-InP transformer, 3.13G; note that several models share this file name. v3 and v3 flash
| ├── config.json
├── ComfyUI/models/echo_mimic/wav2vec2-base-960h # v3 only
| ├── all config json files
| ├── model.safetensors
├── ComfyUI/models/clip
| ├── umt5_xxl_fp8_e4m3fn_scaled.safetensors # v3 and v3 flash
├── ComfyUI/models/clip_vision # v3 and v3 flash
| ├──clipvison-h # 1.26G
├── ComfyUI/models/echo_mimic/
| ├──diffusion_pytorch_model.safetensors # BadToBest/EchoMimicV3 v3 only
├── ComfyUI/models/echo_mimic/echomimicv3-flash-pro/
| ├──diffusion_pytorch_model.safetensors # BadToBest/EchoMimicV3 v3 flash only
├── ComfyUI/models/echo_mimic/chinese-wav2vec2-base/ # v3 flash only
| ├──chinese-wav2vec2-base-fairseq-ckpt.pt
| ├──model.safetensors
| ├──all config
├── ComfyUI/models/vae
| ├── Wan2.1_VAE.pth # v3 and v3 flash
├── ComfyUI/models/echo_mimic/.deepface/weights/ # note the leading dot in .deepface; provided for users who cannot reach the overseas download site # v3 only
| ├──retinaface.h5
├── ComfyUI/models/loras/
| ├──lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors #KJ # v3 only
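The layout above can be checked before the first run. The following is a minimal sketch, not part of the plugin: `missing_v3_files` is a hypothetical helper, and the relative paths are copied from the tree. Entries without a concrete file name (the config files and clipvison-h), the optional LoRA, and retinaface.h5 (which is usually downloaded automatically) are left out on purpose.

```python
from pathlib import Path

# Paths relative to the ComfyUI directory, copied from the tree above.
V3_COMMON = [
    "models/echo_mimic/transformer/diffusion_pytorch_model.safetensors",
    "models/echo_mimic/transformer/config.json",
    "models/clip/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "models/vae/Wan2.1_VAE.pth",
]
V3_ONLY = [
    "models/echo_mimic/wav2vec2-base-960h/model.safetensors",
    "models/echo_mimic/diffusion_pytorch_model.safetensors",
]
V3_FLASH_ONLY = [
    "models/echo_mimic/echomimicv3-flash-pro/diffusion_pytorch_model.safetensors",
    "models/echo_mimic/chinese-wav2vec2-base/chinese-wav2vec2-base-fairseq-ckpt.pt",
    "models/echo_mimic/chinese-wav2vec2-base/model.safetensors",
]


def missing_v3_files(comfy_root, flash=False):
    """List the expected V3 (or V3 flash) files that are not present."""
    root = Path(comfy_root)
    wanted = V3_COMMON + (V3_FLASH_ONLY if flash else V3_ONLY)
    return [rel for rel in wanted if not (root / rel).is_file()]
```

For example, `missing_v3_files("./ComfyUI", flash=True)` returns an empty list once every file needed for V3 flash is in place.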
3.2 V1 & V2 shared models:
If you can connect directly to Hugging Face, the required models are downloaded automatically when you click run; no manual download is needed.
├── ComfyUI/models/echo_mimic
| ├── unet
|     ├── diffusion_pytorch_model.bin
|     ├── config.json
| ├── audio_processor
|     ├── whisper_tiny.pt
├── ComfyUI/models/vae
| ├── diffusion_pytorch_model.safetensors (or renamed to sd-vae-ft-mse.safetensors)
3.3 V1 models:
├── ComfyUI/models/echo_mimic
| ├── denoising_unet.pth
| ├── face_locator.pth
| ├── motion_module.pth
| ├── reference_unet.pth
Audio-driven inference, accelerated (acc) version:
| ├── denoising_unet_acc.pth
| ├── motion_module_acc.pth
├── ComfyUI/models/echo_mimic
| ├── denoising_unet_pose.pth
| ├── face_locator_pose.pth
| ├── motion_module_pose.pth
| ├── reference_unet_pose.pth
Pose-driven inference, accelerated (acc) version:
| ├── denoising_unet_pose_acc.pth
| ├── motion_module_pose_acc.pth
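The V1 checkpoint names listed above follow a suffix convention: `_pose` for the pose-driven variant and `_acc` for the accelerated one. The sketch below only reproduces the file names as listed. `v1_checkpoint_names` is a hypothetical helper, and it makes no claim about which files the node loads together at runtime.

```python
def v1_checkpoint_names(pose=False, acc=False):
    """Return the V1 file names listed for one variant."""
    suffix = "_pose" if pose else ""
    if acc:
        # Only these two modules have accelerated variants in the listing.
        stems = ["denoising_unet", "motion_module"]
        suffix += "_acc"
    else:
        stems = ["denoising_unet", "face_locator", "motion_module", "reference_unet"]
    return [f"{stem}{suffix}.pth" for stem in stems]


# Print the four groups in the order they appear in the listing.
for pose in (False, True):
    for acc in (False, True):
        print(pose, acc, v1_checkpoint_names(pose=pose, acc=acc))
```

All of these files go in ComfyUI/models/echo_mimic.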
3.4 v2 version
V2 uses the models below. They are downloaded automatically on use; you can also add them manually:
Model address: huggingface
├── ComfyUI/models/echo_mimic/v2
| ├── denoising_unet.pth
| ├── motion_module.pth
| ├── pose_encoder.pth
| ├── reference_unet.pth
If you use the accelerated (acc) pose-driven version:
| ├── denoising_unet_acc.pth
| ├── motion_module_acc.pth
YOLOv8m download link
sapiens pose download link
The sapiens pose model can be quantized to fp16; see my sapiens plugin for details (link)
├── ComfyUI/models/echo_mimic
| ├── yolov8m.pt
| ├── sapiens_1b_goliath_best_goliath_AP_639_torchscript.pt2 or sapiens_1b_goliath_best_goliath_AP_639_torchscript_fp16.pt2
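a. Audio-only driven video generation: for infer_mode, choose the regular "audio_drived" model or the accelerated "audio_drived_acc" model;
b. Pose-driven generation: the regular options are pose_normal_sapiens/pose_normal_dwpose (equivalent); the accelerated version is the "pose_acc" model;
-- motion_sync: in pose-driven mode, if it is enabled and video_file contains a video, a pkl file is generated together with a video based on the reference video; the pkl file is saved under input\tensorrt_lite, and ComfyUI must be restarted before it can be used again.
-- motion_sync: if it is disabled and pose_dir is not none, the pkl files in the selected pose_dir directory are read to generate the pose video; if pose_dir is empty, the video is generated from the default assets\test_pose_demo_pose.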
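The motion_sync rules above amount to a three-way choice of pose source. The sketch below restates them as a plain function. `v1_pose_source` and the returned labels are hypothetical and are not the node's actual code. The case where motion_sync is enabled but no video is supplied is not described, so falling through to the pose_dir rules is an assumption.

```python
DEFAULT_POSE = "assets/test_pose_demo_pose"


def v1_pose_source(motion_sync, video_file, pose_dir):
    """Pick the pose source for pose-driven V1 inference.

    Returns an (action, source) pair.
    """
    if motion_sync and video_file:
        # A pkl file is generated from the reference video.
        return ("generate_pkl", video_file)
    if pose_dir and pose_dir != "none":
        # Existing pkl files in the selected directory are read.
        return ("read_pkl", pose_dir)
    # No usable pose_dir: use the bundled default pose.
    return ("default", DEFAULT_POSE)


print(v1_pose_source(True, "reference.mp4", "none"))
print(v1_pose_source(False, None, "my_pose"))
print(v1_pose_source(False, None, "none"))
```

Because a generated pkl directory only appears in the pose_dir list after ComfyUI restarts, the second branch applies to directories created in an earlier session.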
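a. With infer_mode set to audio_drive and pose_dir set to one of the default poses in the list, the default npy pose files are used;
b. With infer_mode set to audio_drive, pose_dir can also be set to an existing npy folder (located under …ComfyUI/input/tensorrt_lite);
c. With infer_mode set to pose_normal_dwpose or pose_normal_sapiens, connect video_images to the video input and confirm that yolov8m.pt and sapiens_1b_goliath_best_goliath_AP_639_torchscript.pt2 are present under …ComfyUI/models/echo_mimic; npy files (reusable next time) and a video are generated from the input video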
a. Generated with the retina-face library
b. If the retina-face call fails, a default female face is used as the mask
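The fallback in item b can be pictured as a small wrapper around the detector. This is a sketch under stated assumptions, not the plugin's code: `detect` stands for any callable that returns a list of face boxes (in practice it would wrap the retina-face library), and `default_box` stands in for the default face mask that ships with the node.

```python
def mask_box(image, detect, default_box):
    """Return the first detected face box, or default_box if detection fails."""
    try:
        boxes = detect(image)
    except Exception:
        # Any detector failure (missing weights, network error) triggers the fallback.
        boxes = []
    return boxes[0] if boxes else default_box


def broken_detector(image):
    # Stand-in for a retina-face call that cannot load its model.
    raise RuntimeError("model download failed")


print(mask_box("portrait.png", broken_detector, (0, 0, 512, 512)))
```

Treating an empty detection result the same as a failed call is also an assumption; the source only mentions the case where the call fails.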
Special options:
6 Citation
EchoMimic
@misc{chen2024echomimic,
title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
author={Zhiyuan Chen and Jiajiong Cao and Zhiquan Chen and Yuming Li and Chenguang Ma},
year={2024},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
EchoMimic-V2
@misc{meng2024echomimic,
title={EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation},
author={Rang Meng and Xingyu Zhang and Yuming Li and Chenguang Ma},
year={2024},
eprint={2411.10061},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
EchoMimic-V3
@misc{meng2025echomimicv3,
title={EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation},
author={Rang Meng and Yan Wang and Weipeng Wu and Ruobing Zheng and Yuming Li and Chenguang Ma},
year={2025},
eprint={2507.03905},
archivePrefix={arXiv}
}
LightX2V
@misc{lightx2v,
author = {LightX2V Contributors},
title = {LightX2V: Light Video Generation Inference Framework},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/ModelTC/lightx2v}},
}
sapiens
@article{khirodkar2024sapiens,
title={Sapiens: Foundation for Human Vision Models},
author={Khirodkar, Rawal and Bagautdinov, Timur and Martinez, Julieta and Zhaoen, Su and James, Austin and Selednik, Peter and Anderson, Stuart and Saito, Shunsuke},
journal={arXiv preprint arXiv:2408.12569},
year={2024}
}