ComfyUI-IF_MemoAvatar

★ 174

Talking avatar video generation · Audio-driven · Emotion transfer

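A ComfyUI node based on MEMO that generates emotionally expressive talking avatar videos from a single portrait and an audio clip, with audio-driven expression transfer and high-quality output.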

💡 Quickly generate emotionally expressive talking avatar videos from a single photo and an audio clip.

🍴 11 Forks · 💻 Python · 🔄 2025-03-09

🔗 Original on GitHub

📦 requirements.txt

git-lfs
diffusers>=0.31.0
audio-separator
albumentations
numba
librosa
modelscope
transformers>=4.46.3
numpy>=1.26.4
PyYAML>=6.0.1
moviepy>=1.0.3
pillow>=10.4.0
librosa==0.10.2
audio-separator==0.24.1
funasr==1.0.27
modelscope
insightface==0.7.3
accelerate==1.1.1
albumentations==1.4.21
black==23.12.1
einops==0.8.0
ffmpeg-python==0.2.0
huggingface-hub==0.26.2
imageio==2.36.0
imageio-ffmpeg==0.5.1
hydra-core==1.3.2
jax==0.4.35
mediapipe==0.10.18
modelscope==1.20.1
omegaconf==2.3.0
onnxruntime>=1.20.1
onnxruntime-gpu>=1.20.1
opencv-python-headless==4.10.0.84
scikit-learn>=1.5.2
scipy>=1.14.1
tqdm>=4.67.1

📄 README

ComfyUI-IF_MemoAvatar

Memory-Guided Diffusion for Expressive Talking Video Generation

ORIGINAL REPO

MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation

Yifan Zhang*,

Project Page | arXiv | Model

This repository contains the example inference script for the MEMO-preview model. The gif demo below is compressed. See our project page for full videos.

Overview

This is a ComfyUI implementation of MEMO (Memory-Guided Diffusion for Expressive Talking Video Generation), which enables the creation of expressive talking avatar videos from a single image and audio input.

Features

Generate expressive talking head videos from a single image

Audio-driven facial animation

Emotional expression transfer

High-quality video output

https://github.com/user-attachments/assets/bfbf896d-a609-4e0f-8ed3-16ec48f8d85a

Installation

Note: xformers is not required, but it is better to have it installed.

Important: make sure your Hugging Face (HF) token is set in your environment variables.
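A quick way to confirm the token is visible before launching ComfyUI is sketched below. The README does not say which variable name the node reads; this sketch assumes the names that the huggingface_hub library looks up (HF_TOKEN, and the older HUGGING_FACE_HUB_TOKEN). The helper name find_hf_token_var is hypothetical.

```python
import os

# Assumption: these are the variable names huggingface_hub reads.
# HUGGING_FACE_HUB_TOKEN is the older, legacy name.
TOKEN_VARS = ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN")


def find_hf_token_var(env=None):
    """Return the name of the first non-empty token variable, or None."""
    env = os.environ if env is None else env
    for name in TOKEN_VARS:
        # Treat empty or whitespace-only values as "not set".
        if env.get(name, "").strip():
            return name
    return None


if __name__ == "__main__":
    found = find_hf_token_var()
    if found:
        print(f"Hugging Face token found in {found}")
    else:
        print("No Hugging Face token set; set HF_TOKEN before starting ComfyUI")
```

Run it with the same Python interpreter and shell session that will start ComfyUI, since environment variables set elsewhere may not be inherited.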

Clone the repo into your ComfyUI custom_nodes folder, then run:

cd ComfyUI-IF_MemoAvatar
pip install -r requirements.txt

I removed xformers from requirements.txt because on Windows it only works with a particular combination of PyTorch versions.

If you are on Linux, you can just run:

pip install xformers

Windows users: check whether xformers is already installed in your environment:

pip show xformers

If no version is shown, install the latest one by following this free guide to setting up a good ComfyUI environment:

Installing Triton and Sage Attention Flash Attention

https://www.youtube.com/watch?v=nSUGEdm2wU4
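The same check that pip show performs can be done from Python, which is handy when ComfyUI uses an embedded interpreter whose pip is not on the PATH. This is a minimal sketch using only the standard library; the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("xformers")
    if version is None:
        print("xformers is not installed (optional, but recommended)")
    else:
        print(f"xformers {version} is installed")
```

Run it with the interpreter that ComfyUI itself uses; a different interpreter may report a different set of installed packages.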

Model Files

The models will automatically download to the following locations in your ComfyUI installation:

models/checkpoints/memo/
├── audio_proj/
├── diffusion_net/
├── image_proj/
├── misc/
│ ├── audio_emotion_classifier/
│ ├── face_analysis/
│ └── vocal_separator/
└── reference_net/
models/wav2vec/
models/vae/sd-vae-ft-mse/
models/emotion2vec/emotion2vec_plus_large/
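After the first run, the layout above can be verified with a short script. This is a sketch under assumptions: the directory list is copied from the tree above, the ComfyUI root is taken to be the current working directory unless passed in, and the helper name missing_model_dirs is hypothetical.

```python
from pathlib import Path

# Directories copied from the tree above, relative to the ComfyUI root.
EXPECTED_DIRS = [
    "models/checkpoints/memo/audio_proj",
    "models/checkpoints/memo/diffusion_net",
    "models/checkpoints/memo/image_proj",
    "models/checkpoints/memo/misc/audio_emotion_classifier",
    "models/checkpoints/memo/misc/face_analysis",
    "models/checkpoints/memo/misc/vocal_separator",
    "models/checkpoints/memo/reference_net",
    "models/wav2vec",
    "models/vae/sd-vae-ft-mse",
    "models/emotion2vec/emotion2vec_plus_large",
]


def missing_model_dirs(comfy_root):
    """Return the expected directories that do not exist under comfy_root."""
    root = Path(comfy_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Assumption: the script is started from the ComfyUI root folder.
    missing = missing_model_dirs(Path.cwd())
    if missing:
        print("Missing model directories:")
        for d in missing:
            print(f"  {d}")
    else:
        print("All expected model directories are present")
```

The script only checks that the folders exist; it does not confirm that the downloads inside them are complete.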

Copy the models from the face_analysis/models folder directly into face_analysis.

This is a temporary workaround until I can confirm a fix. Don't just move the files; duplicate them, because HF will detect the empty folder and download them again every time.

If you don't see a models.json, or loading errors out, create one yourself with this content:

{
  "detection": [
    "scrfd_10g_bnkps"
  ],
  "recognition": [
    "glintr100"
  ],
  "analysis": [
    "genderage",
    "2d106det",
    "1k3d68"
  ]
}

Also create a version.txt containing:

0.7.3
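Both files can be created with a small script instead of by hand. This is a sketch, not part of the node: the README does not state the exact folder these files belong in, so the target directory is passed in explicitly, and the helper name write_face_analysis_metadata is hypothetical. Existing files are left untouched.

```python
import json
from pathlib import Path

# Content copied from the models.json shown above.
MODELS_JSON = {
    "detection": ["scrfd_10g_bnkps"],
    "recognition": ["glintr100"],
    "analysis": ["genderage", "2d106det", "1k3d68"],
}
VERSION = "0.7.3"


def write_face_analysis_metadata(target_dir):
    """Create models.json and version.txt in target_dir if they are missing."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    models_path = target / "models.json"
    if not models_path.exists():
        models_path.write_text(json.dumps(MODELS_JSON, indent=2), encoding="utf-8")

    version_path = target / "version.txt"
    if not version_path.exists():
        version_path.write_text(VERSION, encoding="utf-8")

    return models_path, version_path
```

For example, calling write_face_analysis_metadata("models/checkpoints/memo/misc/face_analysis") from the ComfyUI root would write both files there, assuming that is the folder the loader reads from.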