ComfyUI_Aniportrait


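Video generation · Face reenactment · Audio-driven · VHS-compatible
ComfyUI nodes for AniPortrait-based video generation, supporting self-driven, face-reenactment and audio-driven modes; compatible with the VHS nodes, with frame interpolation to speed up generation.
💡 Generate lip-synced or reenacted videos from a reference image/video combined with audio.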
🍴 11 Forks · 💻 Python · 🔄 2024-09-13
📦 requirements.txt
mediapipe==0.10.11
ffmpeg-python==0.2.0
av==11.0.0
librosa==0.9.2
diffusers==0.26.2
omegaconf==2.3.0
[Screenshots (2024-08-30): pose2video, face reenactment]
📄 README

Updates:

① Implemented frame_interpolation to speed up generation

② Modified the code to support chaining with the VHS nodes. I found that ComfyUI's IMAGE type requires torch float32 tensors, whereas AniPortrait relies heavily on numpy images of uint8 dtype, so I replaced my own image/video upload and generation nodes with the widely used VHS image/video upload and video combine nodes. They are WYSIWYG, interactive, and render the result instantly.
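The dtype mismatch above is the usual friction point when wrapping numpy-based code as ComfyUI nodes. A minimal numpy-only sketch of the round trip, assuming the common ComfyUI IMAGE layout of float32 `[batch, height, width, channels]` with values in [0, 1]; both function names are hypothetical and not part of this repo. In a real node you would wrap the result with `torch.from_numpy(...)` and call `.cpu().numpy()` on the way back.

```python
import numpy as np


def uint8_frames_to_comfy(frames):
    """Stack HxWx3 uint8 frames into a float32 BxHxWx3 batch scaled to [0, 1].

    Hypothetical helper: a real node would wrap the result with torch.from_numpy().
    """
    batch = np.stack([np.asarray(f, dtype=np.uint8) for f in frames], axis=0)
    return batch.astype(np.float32) / 255.0


def comfy_to_uint8_frames(batch):
    """Convert a float32 BxHxWx3 batch in [0, 1] back to a list of uint8 frames."""
    arr = np.asarray(batch, dtype=np.float32)
    # Clip before scaling so small numeric overshoots do not wrap around in uint8.
    arr = np.clip(arr, 0.0, 1.0) * 255.0
    return [frame for frame in np.round(arr).astype(np.uint8)]


frames = [np.full((4, 4, 3), 128, dtype=np.uint8) for _ in range(2)]
batch = uint8_frames_to_comfy(frames)
print(batch.shape, batch.dtype)  # (2, 4, 4, 3) float32
restored = comfy_to_uint8_frames(batch)
```

Rounding (rather than truncating) on the way back keeps a uint8 → float32 → uint8 round trip lossless.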

  • ✅ [2024/04/09] Raw video to pose video with a reference image (a.k.a. self-driven)
  • ✅ [2024/04/09] Audio-driven
  • ✅ [2024/04/09] Face reenactment
  • ✅ [2024/04/22] Implemented the audio2pose model and pretrained weights for audio2video. The face reenactment and audio2video workflows have been modified, and inference of up to 10 seconds is currently supported; you can experiment with the length hyperparameter.
  • You can contact me through Twitter or Weixin: GalaticKing

    Audio-driven, combined with a reference image and reference video

    audio2video workflow

    Raw video to pose video with a reference image

    Face reenactment

    video2video workflow

    This is an unofficial implementation of AniPortrait as a ComfyUI custom node. Because I have a day job, I will update this project when I have time.

    Aniportrait_pose2video.json

    Audio driven

    Face reenactment

    First, from ComfyUI's custom_nodes directory, run

    git clone https://github.com/frankchieng/ComfyUI_Aniportrait.git

    then, inside the cloned ComfyUI_Aniportrait directory, run

    pip install -r requirements.txt
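Because the nodes depend on the pinned versions (the diffusers note at the end of this README is one example), it can help to confirm what actually got installed. The sketch below is a hypothetical helper, not part of this repo; it uses only the standard library's `importlib.metadata` and compares exact `==` pins copied from requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from this repository's requirements.txt.
REQUIREMENTS = """\
mediapipe==0.10.11
ffmpeg-python==0.2.0
av==11.0.0
librosa==0.9.2
diffusers==0.26.2
omegaconf==2.3.0
"""


def parse_pins(text):
    """Return {package: pinned_version} for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins):
    """List (package, pinned, installed) mismatches; installed is None if missing."""
    problems = []
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            problems.append((name, pinned, installed))
    return problems


if __name__ == "__main__":
    for name, pinned, installed in check_pins(parse_pins(REQUIREMENTS)):
        print(f"{name}: expected {pinned}, found {installed or 'not installed'}")
```

Run it with the same Python interpreter ComfyUI uses, otherwise it inspects the wrong environment.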

    Download the pretrained models:

    Stable Diffusion v1.5

    sd-vae-ft-mse

    image_encoder

    wav2vec2-base-960h

    Download the weights:

    denoising_unet.pth

    reference_unet.pth

    pose_guider.pth

    motion_module.pth

    audio2mesh.pt

    audio2pose.pt

    film_net_fp16.pt

    Then arrange everything as follows:

    ./pretrained_model/
    |-- image_encoder
    |   |-- config.json
    |   `-- pytorch_model.bin
    |-- sd-vae-ft-mse
    |   |-- config.json
    |   |-- diffusion_pytorch_model.bin
    |   `-- diffusion_pytorch_model.safetensors
    |-- stable-diffusion-v1-5
    |   |-- feature_extractor
    |   |   `-- preprocessor_config.json
    |   |-- model_index.json
    |   |-- unet
    |   |   |-- config.json
    |   |   `-- diffusion_pytorch_model.bin
    |   `-- v1-inference.yaml
    |-- wav2vec2-base-960h
    |   |-- config.json
    |   |-- feature_extractor_config.json
    |   |-- preprocessor_config.json
    |   |-- pytorch_model.bin
    |   |-- README.md
    |   |-- special_tokens_map.json
    |   |-- tokenizer_config.json
    |   `-- vocab.json
    |-- audio2mesh.pt
    |-- audio2pose.pt
    |-- denoising_unet.pth
    |-- motion_module.pth
    |-- pose_guider.pth
    |-- reference_unet.pth
    |-- film_net_fp16.pt
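A misplaced file usually only shows up as an error once the workflow runs. The hypothetical helper below, not part of this repo, compares a folder against the tree above. It assumes `./pretrained_model/` is resolved relative to where you run it, which may differ from the path the nodes actually use, and it skips wav2vec2's README.md since that file is documentation only.

```python
from pathlib import Path

# Files expected under ./pretrained_model/, taken from the tree above (README.md omitted).
EXPECTED_FILES = [
    "image_encoder/config.json",
    "image_encoder/pytorch_model.bin",
    "sd-vae-ft-mse/config.json",
    "sd-vae-ft-mse/diffusion_pytorch_model.bin",
    "sd-vae-ft-mse/diffusion_pytorch_model.safetensors",
    "stable-diffusion-v1-5/feature_extractor/preprocessor_config.json",
    "stable-diffusion-v1-5/model_index.json",
    "stable-diffusion-v1-5/unet/config.json",
    "stable-diffusion-v1-5/unet/diffusion_pytorch_model.bin",
    "stable-diffusion-v1-5/v1-inference.yaml",
    "wav2vec2-base-960h/config.json",
    "wav2vec2-base-960h/feature_extractor_config.json",
    "wav2vec2-base-960h/preprocessor_config.json",
    "wav2vec2-base-960h/pytorch_model.bin",
    "wav2vec2-base-960h/special_tokens_map.json",
    "wav2vec2-base-960h/tokenizer_config.json",
    "wav2vec2-base-960h/vocab.json",
    "audio2mesh.pt",
    "audio2pose.pt",
    "denoising_unet.pth",
    "motion_module.pth",
    "pose_guider.pth",
    "reference_unet.pth",
    "film_net_fp16.pt",
]


def missing_files(root="./pretrained_model"):
    """Return the expected relative paths that are not regular files under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    print("all weights in place" if not missing else "missing:\n  " + "\n  ".join(missing))
```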

    Tips:

    The intermediate audio file is generated and then deleted automatically. The pose video generated from the raw video (with audio) and the pose2video mp4 file are saved to ComfyUI's output directory.

    The uploaded source mp4 video must be square (e.g. 512×512); otherwise the result will look distorted.
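If your source clip is not square, one option is to preprocess its frames before uploading. The following is a minimal numpy sketch, assuming frames are already decoded as HxWx3 uint8 arrays; center cropping and nearest-neighbour resizing are illustrative choices here, not what the nodes do internally, and a proper resampler (for example in ffmpeg) will look better.

```python
import numpy as np


def center_crop_square(frame):
    """Crop an HxWxC frame to the largest centered square."""
    h, w = frame.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return frame[top:top + side, left:left + side]


def resize_nearest(frame, size=512):
    """Nearest-neighbour resize of a square frame to size x size (numpy only)."""
    side = frame.shape[0]
    idx = (np.arange(size) * side) // size  # source row/column index for each output pixel
    return frame[idx][:, idx]


frame = np.zeros((720, 1280, 3), dtype=np.uint8)
square = resize_nearest(center_crop_square(frame), 512)
print(square.shape)  # (512, 512, 3)
```

Cropping keeps the face undistorted, whereas stretching a 16:9 frame to 1:1 would warp it, which is the kind of "weird" result the tip warns about.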

    I've updated diffusers from 0.24.x to 0.26.2. In the newer version, classes in diffusers/models/embeddings.py were renamed: PositionNet became GLIGENTextBoundingboxProjection and CaptionProjection became PixArtAlphaTextProjection. If you have an older version of diffusers installed, modify the corresponding Python files, such as src/models/transformer_2d.py, accordingly.
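Instead of editing each file by hand, one option is to pick whichever class name the installed diffusers provides at import time. This is a sketch, not what the repo does. `resolve_attr` is a hypothetical helper, and it assumes the renamed classes are drop-in replacements with the same constructor arguments, which you should verify against your diffusers version.

```python
import importlib


def resolve_attr(module_name, candidates):
    """Return the first attribute in `candidates` that exists in `module_name`."""
    module = importlib.import_module(module_name)
    for name in candidates:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"none of {candidates} found in {module_name}")


# Hypothetical usage inside a file such as src/models/transformer_2d.py,
# keeping the old names so the rest of the file stays unchanged:
# PositionNet = resolve_attr("diffusers.models.embeddings",
#                            ["GLIGENTextBoundingboxProjection", "PositionNet"])
# CaptionProjection = resolve_attr("diffusers.models.embeddings",
#                                  ["PixArtAlphaTextProjection", "CaptionProjection"])
```

Listing the new name first means diffusers 0.26.x resolves to the renamed class, while an older install falls back to the original one.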