ComfyUI_wav2lip

★ 159

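Lip sync · Video processing · Face detection · Audio-driven

Performs lip-syncing on videos with the Wav2Lip model: it takes an input video and an audio track and generates a video with aligned lip movements, with support for multiple face detectors and a batch-size setting.

💡 Combines audio and video into an automatically lip-synced video.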

🍴 28 Forks · 💻 Python · 🔄 2024-09-18

🔗 GitHub source

📦 Netdisk download

Copy the link below and download from Quark Netdisk:

https://pan.quark.cn/s/8f9eee5e2cdb

📦 requirements.txt

--index-url https://download.pytorch.org/whl/cu115
torch==1.11.0
torchaudio==0.11.0
torchvision==0.12.0
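
Because these pins are strict, it can help to confirm what is actually installed before starting ComfyUI. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it assumes that CUDA wheels report a local build tag such as `+cu115` after the base version.

```python
from importlib import metadata

# Pins copied from the requirements.txt above.
PINS = {"torch": "1.11.0", "torchaudio": "0.11.0", "torchvision": "0.12.0"}


def base_version(version: str) -> str:
    """Drop a local build tag such as '+cu115' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(pins, installed):
    """Return {package: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        found = installed.get(name)
        if found is None or base_version(found) != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def installed_versions(names):
    """Look up installed versions; packages that are missing map to None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    problems = find_mismatches(PINS, installed_versions(PINS))
    for name, (wanted, found) in problems.items():
        print(f"{name}: expected {wanted}, found {found}")
```

Running the script in the same Python environment that ComfyUI uses prints one line per package that is missing or at a different version, and nothing if all three pins are met.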

📄 README

ComfyUI_wav2lip

Wav2Lip Node for ComfyUI

The Wav2Lip node is a custom node for ComfyUI that allows you to perform lip-syncing on videos using the Wav2Lip model. It takes an input video and an audio file and generates a lip-synced output video.

Features

Lip-syncing of videos using the Wav2Lip model

Support for various face detection models

Input audio file supplied via an upload path

Inputs

images: Input video frames (required)

audio: Input audio file (required)

mode: Processing mode, either “sequential” or “repetitive” (default: “sequential”)

face_detect_batch: Batch size for face detection (default: 8)

Outputs

images: Lip-synced output video frames

audio: Output audio file
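
To make the inputs and outputs above concrete, here is a sketch of how a ComfyUI custom node with this interface could be declared. It is illustrative only and is not the repository's code: the class name, the `"AUDIO"` socket type, the `min`/`max` bounds, and the category string are assumptions, and `process` is a pass-through placeholder rather than a Wav2Lip call.

```python
class Wav2LipSketch:
    """Illustrative declaration that mirrors the documented inputs and outputs."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "images": ("IMAGE",),  # input video frames
                "audio": ("AUDIO",),   # assumed socket type for the audio input
                # A list of strings is rendered as a dropdown in ComfyUI.
                "mode": (["sequential", "repetitive"], {"default": "sequential"}),
                # The min/max bounds are assumptions chosen for this sketch.
                "face_detect_batch": ("INT", {"default": 8, "min": 1, "max": 100}),
            }
        }

    RETURN_TYPES = ("IMAGE", "AUDIO")
    RETURN_NAMES = ("images", "audio")
    FUNCTION = "process"
    CATEGORY = "ComfyUI/Wav2Lip"  # assumed menu category

    def process(self, images, audio, mode, face_detect_batch):
        # Placeholder: a real node would run face detection and Wav2Lip here.
        return (images, audio)


# ComfyUI discovers nodes through this module-level mapping.
NODE_CLASS_MAPPINGS = {"Wav2LipSketch": Wav2LipSketch}
```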

Installation

Clone the repository into the custom_nodes folder:

```
git clone https://github.com/ShmuelRonen/ComfyUI_wav2lip.git
```

Install the required dependencies:

```
pip install -r requirements.txt
```

Model Setup

To use the Wav2Lip node, you need to download the required models separately. Please follow these steps:

wav2lip model:

Download the wav2lip model: -model-

Place the .pth model file in the `custom_nodes\ComfyUI_wav2lip\Wav2Lip\checkpoints` folder

Start or restart ComfyUI.
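
Before restarting, you may want to confirm that the model file landed in the right place. The sketch below lists the `.pth` files in the checkpoints folder named above. The helper names are hypothetical, and `comfy_root` is assumed to be the directory that contains `custom_nodes`.

```python
from pathlib import Path


def checkpoint_dir(comfy_root):
    """Build the checkpoints path described in the Model Setup steps."""
    return Path(comfy_root) / "custom_nodes" / "ComfyUI_wav2lip" / "Wav2Lip" / "checkpoints"


def find_checkpoints(comfy_root):
    """Return the sorted names of .pth files in the checkpoints folder."""
    folder = checkpoint_dir(comfy_root)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.pth"))


if __name__ == "__main__":
    # Assumes the script is run from the ComfyUI root directory.
    found = find_checkpoints(".")
    print(found if found else "No .pth files found in " + str(checkpoint_dir(".")))
```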

Usage

Add the Wav2Lip node to your ComfyUI workflow.

Connect the input video frames and audio file to the corresponding inputs of the Wav2Lip node.

Adjust the node settings according to your requirements:

Set the mode to “sequential” or “repetitive” based on your video processing needs.

Adjust the face_detect_batch size if needed.

Execute the ComfyUI workflow to generate the lip-synced output video.
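
To illustrate what a batch-size setting such as face_detect_batch controls, the sketch below processes frames in chunks and halves the chunk size whenever the detector raises a `RuntimeError` (the exception type PyTorch raises when GPU memory runs out). This is a generic pattern, not a confirmed description of this node's internals; `detect_batch` is a hypothetical callable that stands in for a face detector.

```python
def detect_in_batches(frames, detect_batch, batch_size=8):
    """Run detect_batch over frames in chunks, halving the chunk size on RuntimeError."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    while True:
        results = []
        try:
            for start in range(0, len(frames), batch_size):
                results.extend(detect_batch(frames[start:start + batch_size]))
            return results
        except RuntimeError:
            # Give up only when even a single-frame batch fails.
            if batch_size == 1:
                raise
            batch_size //= 2  # retry the whole pass with smaller batches
```

A larger batch size generally means fewer detector calls but more memory per call, so lowering the value is the usual first step when detection runs out of memory.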

Acknowledgement

Thanks to ganimation_replicate and STIT for sharing their code.

Related Work

CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023)

DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)

3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)

T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)
