ComfyUI_wav2lip

★ 159

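Lip sync · Video processing · Face detection · Audio-driven
Performs lip-syncing on videos with the Wav2Lip model: given an input video and an audio track, it generates a video with aligned lip movements. Supports multiple face detectors and batch settings.
💡 Combines audio and video into an automatically lip-synced video.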
🍴 28 Forks · 💻 Python · 🔄 2024-09-18
📦 Netdisk download
Copy the link and open it in Quark Netdisk to download:
https://pan.quark.cn/s/8f9eee5e2cdb
📦 requirements.txt
--index-url https://download.pytorch.org/whl/cu115
torch==1.11.0
torchaudio==0.11.0
torchvision==0.12.0
wav2lip
📄 README

ComfyUI_wav2lip

Wav2Lip Node for ComfyUI

The Wav2Lip node is a custom node for ComfyUI that allows you to perform lip-syncing on videos using the Wav2Lip model. It takes an input video and an audio file and generates a lip-synced output video.

Features

  • Lip-syncing of videos using the Wav2Lip model
  • Support for various face detection models
  • Audio path upload for input audio file

Inputs

  • images: Input video frames (required)
  • audio: Input audio file (required)
  • mode: Processing mode, either “sequential” or “repetitive” (default: “sequential”)
  • face_detect_batch: Batch size for face detection (default: 8)

Outputs

  • images: Lip-synced output video frames
  • audio: Output audio file
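
The inputs and outputs listed above could be declared roughly as follows in a ComfyUI custom node. This is a minimal sketch, not the repository's actual code: the class name `Wav2LipSketch`, the `process` method, the category string, the `"AUDIO"` socket type, and the `min` bound are all assumptions made for illustration.

```python
# Hypothetical sketch of a ComfyUI node exposing the documented inputs/outputs.
# Names below (Wav2LipSketch, process, category) are illustrative assumptions.

class Wav2LipSketch:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # Input video frames (required)
                "images": ("IMAGE",),
                # Input audio (required); the socket type is an assumption
                "audio": ("AUDIO",),
                # Processing mode, default "sequential"
                "mode": (["sequential", "repetitive"], {"default": "sequential"}),
                # Batch size for face detection, default 8
                "face_detect_batch": ("INT", {"default": 8, "min": 1}),
            }
        }

    RETURN_TYPES = ("IMAGE", "AUDIO")
    RETURN_NAMES = ("images", "audio")
    FUNCTION = "process"
    CATEGORY = "Wav2Lip"

    def process(self, images, audio, mode, face_detect_batch):
        # Placeholder: a real node would run the Wav2Lip model here.
        return (images, audio)


# ComfyUI discovers custom nodes through this mapping.
NODE_CLASS_MAPPINGS = {"Wav2LipSketch": Wav2LipSketch}
```

The sketch only mirrors the documented interface; the placeholder `process` passes its inputs through unchanged.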

Installation

  • Clone the repository into the custom_nodes folder:

    ```
    git clone https://github.com/ShmuelRonen/ComfyUI_wav2lip.git
    ```

  • Install the required dependencies:

    ```
    pip install -r requirements.txt
    ```
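
After installing, it can help to confirm that the pinned PyTorch packages are the ones actually present in the environment. The following is a small sketch using the standard library's `importlib.metadata`. The pins are copied from the requirements.txt listing; the helper name `check_pins` and its report format are assumptions for illustration.

```python
from importlib import metadata

# Version pins copied from requirements.txt.
PINS = {"torch": "1.11.0", "torchaudio": "0.11.0", "torchvision": "0.12.0"}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Wheels from the PyTorch index may carry a local tag such as "+cu115",
        # so compare only the part before "+".
        ok = installed is not None and installed.split("+")[0] == expected
        report[name] = (expected, installed, ok)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_pins(PINS).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, installed {installed} [{status}]")
```

Run it with the same Python interpreter that ComfyUI uses, otherwise it reports on a different environment.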

Model Setup

To use the Wav2Lip node, you need to download the required models separately. Follow these steps:

wav2lip model:

  • Download the wav2lip model: -model-
  • Place the .pth model file in the custom_nodes\ComfyUI_wav2lip\Wav2Lip\checkpoints folder.
  • Start or restart ComfyUI.
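
To verify that the model file landed in the expected place before restarting ComfyUI, a check along these lines may be useful. It is a sketch that assumes the folder layout given in the steps above; the function name `find_checkpoints` and the `comfyui_root` argument are hypothetical.

```python
from pathlib import Path


def find_checkpoints(comfyui_root):
    """List .pth files in the checkpoints folder named in the setup steps."""
    ckpt_dir = (
        Path(comfyui_root)
        / "custom_nodes"
        / "ComfyUI_wav2lip"
        / "Wav2Lip"
        / "checkpoints"
    )
    if not ckpt_dir.is_dir():
        return []  # folder missing: nothing installed yet
    return sorted(p for p in ckpt_dir.iterdir() if p.suffix == ".pth")


if __name__ == "__main__":
    # "." assumes the script is run from the ComfyUI root directory.
    found = find_checkpoints(".")
    print([p.name for p in found] or "No .pth checkpoint found")
```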

Usage

  • Add the Wav2Lip node to your ComfyUI workflow.
  • Connect the input video frames and audio file to the corresponding inputs of the Wav2Lip node.
  • Adjust the node settings according to your requirements:
      • Set the mode to “sequential” or “repetitive” based on your video processing needs.
      • Adjust the face_detect_batch size if needed.
  • Execute the ComfyUI workflow to generate the lip-synced output video.
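
As a rough illustration of what the face_detect_batch setting controls, the sketch below splits a list of frames into batches of a given size. It assumes the detector processes frames in fixed-size chunks, which is the usual meaning of a batch size. The helper `iter_batches` is hypothetical and is not taken from the node's code.

```python
def iter_batches(frames, batch_size=8):
    """Yield consecutive slices of `frames`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(frames), batch_size):
        yield frames[start:start + batch_size]


if __name__ == "__main__":
    frames = list(range(20))  # stand-in for 20 video frames
    for batch in iter_batches(frames, batch_size=8):
        print(len(batch))  # a face detector would process `batch` here
```

Under this assumption, a smaller batch size means fewer frames are held by the detector at once, which generally lowers peak memory use at the cost of more detector calls.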

Acknowledgement

Thanks to ArtemM, Wav2Lip, PIRenderer, GFP-GAN, GPEN, ganimation_replicate, and STIT for sharing their code.

Related Work

  • StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN (ECCV 2022)
  • CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)
  • SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023)
  • DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)
  • 3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)
  • T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)