ComfyUI-FLOAT

★ 264

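ComfyUI extension for audio-driven talking portraits via motion latent flow matching
Unofficial ComfyUI nodes for FLOAT, which uses audio-driven generative motion latent flow matching to turn a static portrait into a high-fidelity animation with synchronized lip movement and facial expressions.
💡 Generate a talking animation with synchronized lip movement and expressions from an audio clip and a static portrait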
🍴 33 Forks · 💻 Python · 🔄 2026-01-02
📦 requirements.txt
opencv-python
pandas
matplotlib
flow-vis
librosa
albumentations
albucore
torchdiffeq
timm>=1.0.9
face_alignment
torchcodec
📄 README

ComfyUI FLOAT

Python 3.10.12: https://www.python.org/downloads/release/python-31012/

Paper (arXiv): https://arxiv.org/abs/2412.01064

License (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en

This project provides a ComfyUI wrapper for FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait.

For a more advanced and maintained version, check out: ComfyUI-FLOAT_Optimized

Last tested: 2 January 2026 (ComfyUI v0.7.0@f2fda02 | Torch 2.9.1 | Python 3.10.12 | RTX4090 | CUDA 13.0 | Debian 12)

⭐ Support

If you like my projects and wish to see updates and new features, please consider supporting me. It helps a lot!

  • ComfyUI-Depth-Anything-Tensorrt: https://github.com/yuvraj108c/ComfyUI-Depth-Anything-Tensorrt
  • ComfyUI-Upscaler-Tensorrt: https://github.com/yuvraj108c/ComfyUI-Upscaler-Tensorrt
  • ComfyUI-Dwpose-Tensorrt: https://github.com/yuvraj108c/ComfyUI-Dwpose-Tensorrt
  • ComfyUI-Rife-Tensorrt: https://github.com/yuvraj108c/ComfyUI-Rife-Tensorrt
  • ComfyUI-Whisper: https://github.com/yuvraj108c/ComfyUI-Whisper
  • ComfyUI_InvSR: https://github.com/yuvraj108c/ComfyUI_InvSR
  • ComfyUI-FLOAT: https://github.com/yuvraj108c/ComfyUI-FLOAT
  • ComfyUI-Thera: https://github.com/yuvraj108c/ComfyUI-Thera
  • ComfyUI-Video-Depth-Anything: https://github.com/yuvraj108c/ComfyUI-Video-Depth-Anything
  • ComfyUI-PiperTTS: https://github.com/yuvraj108c/ComfyUI-PiperTTS
  • Buy Me a Coffee: https://www.buymeacoffee.com/yuvraj108cZ
  • PayPal: https://paypal.me/yuvraj108c


🚀 Installation

cd ./ComfyUI/custom_nodes
git clone https://github.com/yuvraj108c/ComfyUI-FLOAT.git
cd ./ComfyUI-FLOAT
pip install -r requirements.txt
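
After installation, it can help to confirm that the dependencies are visible to the Python environment ComfyUI runs in. The following is a minimal sketch, not part of this project: the `REQUIRED` list is copied from requirements.txt (version pins omitted), and `find_missing` is a hypothetical helper name. It only checks that each distribution is installed, not that its version is compatible.

```python
from importlib import metadata

# Distribution names copied from requirements.txt (version pins omitted).
REQUIRED = [
    "opencv-python", "pandas", "matplotlib", "flow-vis", "librosa",
    "albumentations", "albucore", "torchdiffeq", "timm",
    "face_alignment", "torchcodec",
]


def find_missing(packages):
    """Return the names in `packages` that are not installed in this environment."""
    missing = []
    for name in packages:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Run it with the same interpreter that launches ComfyUI, since a system-wide Python may see a different set of packages.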

☀️ Usage

  • Load the example workflow
  • Upload the driving image and audio, then click Queue
  • Models are downloaded automatically to /ComfyUI/models/float
  • The models are organized as follows:

    |-- float.pth                                       # main model
    |-- wav2vec2-base-960h/                             # audio encoder
    |   |-- config.json
    |   |-- model.safetensors
    |   |-- preprocessor_config.json
    |-- wav2vec-english-speech-emotion-recognition/     # emotion encoder
        |-- config.json
        |-- preprocessor_config.json
        |-- pytorch_model.bin
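
If the automatic download is interrupted, the model folder may be left incomplete. The sketch below checks a folder against the layout listed above. It is an illustration under stated assumptions: `missing_model_files` is a hypothetical helper, not a function of this project, and the file list is taken only from this README.

```python
from pathlib import Path

# Relative paths taken from the model layout in this README.
EXPECTED_FILES = [
    "float.pth",
    "wav2vec2-base-960h/config.json",
    "wav2vec2-base-960h/model.safetensors",
    "wav2vec2-base-960h/preprocessor_config.json",
    "wav2vec-english-speech-emotion-recognition/config.json",
    "wav2vec-english-speech-emotion-recognition/preprocessor_config.json",
    "wav2vec-english-speech-emotion-recognition/pytorch_model.bin",
]


def missing_model_files(model_dir):
    """Return the expected relative paths that are not regular files under `model_dir`."""
    root = Path(model_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Adjust the path to wherever your ComfyUI installation lives.
    for rel in missing_model_files("ComfyUI/models/float"):
        print("missing:", rel)
```

An empty result means every file named in the layout is present; it does not verify that the files are complete or uncorrupted.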

🛠️ Parameters

  • ref_image: Reference image containing a face (must have batch size 1)
  • ref_audio: Reference audio. For long audio (e.g. 3+ minutes), make sure you have enough RAM/VRAM
  • a_cfg_scale: Audio classifier-free guidance scale (default: 2)
  • r_cfg_scale: Reference classifier-free guidance scale (default: 1)
  • emotion: none, angry, disgust, fear, happy, neutral, sad, surprise (default: none)
  • e_cfg_scale: Intensity of the emotion (default: 1). For a more emotionally intense video, try larger values from 5 to 10
  • crop: Enable only if the face is not centered in the reference image
  • fps: Frame rate of the output video (default: 25)
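
The parameters and defaults above can be summarized in code. This is a sketch only: `FloatParams` and `estimate_frame_count` are hypothetical names, not the node's actual API, and the `crop` default of `False` is an assumption based on the "enable only if" wording. The frame estimate assumes one output frame per 1/fps seconds of audio, which gives a rough idea of why long audio needs more memory.

```python
import math
from dataclasses import dataclass

# Emotion labels as listed in the parameter description.
EMOTIONS = ("none", "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise")


@dataclass
class FloatParams:
    """Hypothetical container mirroring the documented parameters (not the node's real API)."""
    a_cfg_scale: float = 2.0   # audio classifier-free guidance scale
    r_cfg_scale: float = 1.0   # reference classifier-free guidance scale
    emotion: str = "none"
    e_cfg_scale: float = 1.0   # emotion intensity; 5 to 10 for stronger emotion
    crop: bool = False         # assumed default; enable for off-center faces
    fps: float = 25.0

    def __post_init__(self):
        if self.emotion not in EMOTIONS:
            raise ValueError(f"emotion must be one of {EMOTIONS}, got {self.emotion!r}")
        if self.fps <= 0:
            raise ValueError("fps must be positive")


def estimate_frame_count(audio_seconds, fps=25.0):
    """Rough frame count, assuming one output frame per 1/fps seconds of audio."""
    return math.ceil(audio_seconds * fps)


if __name__ == "__main__":
    params = FloatParams(emotion="happy", e_cfg_scale=5.0)
    # A 3-minute clip at the default 25 fps:
    print(estimate_frame_count(180, params.fps))  # 4500
```

Under these assumptions a 3-minute clip corresponds to 4500 frames, which is why the note on RAM/VRAM for long audio matters.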
Citation

    @article{ki2024float,
      title={FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait},
      author={Ki, Taekyung and Min, Dongchan and Chae, Gyeongsu},
      journal={arXiv preprint arXiv:2412.01064},
      year={2024}
    }

Acknowledgments

Thanks to simplepod.ai for providing GPU servers

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)