ComfyUI-Geeky-LatentSyncWrapper

★ 10

性能优化内存管理兼容性GPU加速

基于LatentSync 1.5的ComfyUI优化分支，显著提速、智能VRAM管理与自动清理，避免OOM并能与其它实现共存且路径隔离。

💡 在ComfyUI里替换或增强LatentSync以提速并避免OOM。

🍴 1 Forks💻 Python🔄 2025-09-13

🔗 GitHub 原文

📦

网盘下载

复制链接后前往夸克网盘下载

https://pan.quark.cn/s/e58c8376a81b

📦 requirements.txt

diffusers
mediapipe>=0.10.8
transformers
huggingface-hub
omegaconf
einops
opencv-python
face-alignment
decord
ffmpeg-python>=0.2.0
safetensors
soundfile

📄 README

(Out Dated) ComfyUI-Geeky-LatentSyncWrapper 1.5 (Mediapipe isn’t compatible with some changes to ComfyUI, will need to downgrade python version of portable comfyUI and remove xformers).

Unofficial optimized and enhanced fork of LatentSync 1.5 implementation for ComfyUI on Windows and WSL 2.0.

Works with ComfyUI Portable version 3.49 https://github.com/comfyanonymous/ComfyUI/releases/tag/v0.3.49

This node provides advanced lip-sync capabilities in ComfyUI using ByteDance’s LatentSync 1.5 model with significantly improved performance, memory efficiency, and stability. This fork focuses on speed, reliability, and conflict-free coexistence with other LatentSync implementations.

Why This Fork? Performance & Stability

🚀 Much Faster Performance: This implementation is significantly faster than other versions and eliminates OOM (Out of Memory) errors that plague other implementations.

🧠 Better Memory Management: Intelligent VRAM usage with user-selectable settings (high/medium/low) and automatic cleanup prevents memory issues.

🔒 Conflict-Free: Can be installed alongside other LatentSync implementations without interference – uses isolated paths and unique node names.

⚡ LatentSync 1.5 vs 1.6: We use LatentSync 1.5 instead of 1.6 because:

More Stable: 1.5 has proven stability and reliability in production use

Better Performance: 1.5 runs faster and uses less VRAM than 1.6

No Manual Downloads: 1.5 models download automatically, unlike 1.6’s private repository requirements

Fewer Dependencies: Simpler, more reliable dependency chain

What’s New in This Optimized Fork?

Performance Enhancements

Advanced Memory Management: Intelligent VRAM allocation with user-selectable modes

Faster Processing: Optimized batch processing and GPU utilization

No OOM Errors: Comprehensive memory cleanup and management

Mixed Precision Support: Automatic FP16 optimization when beneficial

User Experience Improvements

Single Image Support: Process individual images with LatentSync

Batch Image Processing: Process multiple images efficiently

Smart Temp Management: Isolated temporary directories prevent conflicts

Better Error Handling: Robust error recovery and informative messages

Compatibility Features

Conflict-Free Installation: Can coexist with ShmuelRonen’s implementation

Unique Node Names: “Geeky” prefixed nodes prevent naming conflicts

Isolated Model Storage: Uses geeky_checkpoints/ directory

Automatic Path Management: Handles compatibility transparently

Original LatentSync 1.5 Features

Temporal Layer Improvements: Corrected implementation provides significantly improved temporal consistency compared to version 1.0

Better Chinese Language Support: Performance on Chinese videos is substantially improved through additional training data

Reduced VRAM Requirements: Optimized to run on 20GB VRAM (RTX 3090 compatible) through various optimizations:

Gradient checkpointing in U-Net, VAE, SyncNet and VideoMAE

Native PyTorch FlashAttention-2 implementation (no xFormers dependency)

More efficient CUDA cache management

Focused training of temporal and audio cross-attention layers only

Code Optimizations:

Removed dependencies on xFormers and Triton

Upgraded to diffusers 0.32.2

Compatibility with Other LatentSync Nodes

This repository can be installed alongside ShmuelRonen’s ComfyUI-LatentSyncWrapper without conflicts:

✅ Different node names: Geeky nodes use “Geeky” prefix (“Geeky LatentSync 1.5 (Optimized)”)

✅ Separate checkpoints: Uses geeky_checkpoints/ directory

✅ Independent models: Downloads to isolated paths

✅ No shared resources: Completely separate from other LatentSync implementations

✅ Isolated temp directories: Prevents interference with other nodes

Both repositories can coexist and users can choose which nodes to use based on their performance needs.

Prerequisites

Before installing this node, you must install the following in order:

ComfyUI installed and working

FFmpeg installed on your system:

Windows: Download from here and add to system PATH

Installation

Only proceed with installation after confirming all prerequisites are installed and working.

Clone this repository into your ComfyUI custom_nodes directory:

cd ComfyUI/custom_nodes
git clone https://github.com/GeekyGhost/ComfyUI-Geeky-LatentSyncWrapper.git
cd ComfyUI-Geeky-LatentSyncWrapper
pip install -r requirements.txt

Required Dependencies

diffusers>=0.32.2
transformers
huggingface-hub
omegaconf
einops
opencv-python
mediapipe
face-alignment
decord
ffmpeg-python
safetensors
soundfile

Note on Model Downloads

On first use, the node will automatically download required model files from HuggingFace:

LatentSync 1.5 UNet model (~5GB)

Whisper model for audio processing (~1.6GB)

All models download automatically – no manual intervention required

Models are stored in isolated geeky_checkpoints/ directory

Checkpoint Directory Structure

After successful installation and model download, your checkpoint directory structure will look like this:

./geeky_checkpoints/
|-- .cache/
|-- auxiliary/
|-- whisper/
|   `-- tiny.pt
|-- config.json
|-- latentsync_unet.pt  (~5GB)
|-- stable_syncnet.pt   (~1.6GB)

Make sure all these files are present for proper functionality. The main model files are:

latentsync_unet.pt: The primary LatentSync 1.5 model

stable_syncnet.pt: The SyncNet model for lip-sync supervision

whisper/tiny.pt: The Whisper model for audio processing

Usage

For Videos:

Select an input video file with a video loader

Load an audio file using ComfyUI audio loader

(Optional) Set a seed value for reproducible results

(Optional) Adjust the lips_expression parameter to control lip movement intensity

(Optional) Modify the inference_steps parameter to balance quality and speed

(Optional) Choose VRAM usage setting based on your GPU

Connect to the Geeky LatentSync 1.5 (Optimized) node

Run the workflow

For Single Images:

Load a single image using ComfyUI’s image loader

Load an audio file using ComfyUI audio loader

Connect to the Geeky LatentSync 1.5 (Optimized) node

Adjust parameters as needed

Run the workflow

For Batch Images:

Load multiple images using ComfyUI’s batch image loader or image list to batch node

Load an audio file using ComfyUI audio loader

Connect to the Geeky LatentSync 1.5 (Optimized) node

Adjust parameters as needed

Run the workflow

The processed video or images will be saved in ComfyUI’s output directory.

Node Parameters:

images: Input image(s) – supports single images, video frames, or batch processing

audio: Audio input from ComfyUI audio loader

seed: Random seed for reproducible results (default: 1247)

lips_expression: Controls the expressiveness of lip movements (default: 1.5)

Higher values (2.0-3.0): More pronounced lip movements, better for expressive speech

Lower values (1.0-1.5): Subtler lip movements, better for calm speech

This parameter affects the model’s guidance scale, balancing between natural movement and lip sync accuracy

inference_steps: Number of denoising steps during inference (default: 20)

Higher values (30-50): Better quality results but slower processing

Lower values (10-15): Faster processing but potentially lower quality

The default of 20 usually provides a good balance between quality and speed

vram_usage: NEW – Choose memory usage profile (default: medium)

High: Maximum performance, uses 95% VRAM, enables all optimizations

Medium: Balanced performance, uses 85% VRAM, good for most users

Low: Conservative usage, uses 75% VRAM, for systems with limited memory

Available Nodes:

Geeky LatentSync 1.5 (Optimized): Main lip-sync processing node

Geeky Video Length Adjuster (Fast): Utility node for video/audio length matching

Tips for Better Results:

Performance: Start with “medium” VRAM usage and increase to “high” if you have sufficient GPU memory

Quality: For speeches or presentations, try increasing lips_expression to 2.0-2.5

Efficiency: For quick previews, use “low” VRAM setting with 10-15 inference steps

Stability: This implementation handles single images automatically by duplicating frames to match audio length

Memory: The optimized memory management prevents OOM errors even with long audio clips

Performance Comparison

| Feature | This Fork (Geeky) | Original Implementation |

|———|——————-|————————|

| OOM Errors | ❌ None | ✅ Frequent |

| Processing Speed | 🚀 Much Faster | 🐌 Slower |

| Memory Usage | 🧠 Optimized | 💾 High |

| VRAM Settings | ✅ 3 Modes | ❌ Fixed |

| Conflict-Free | ✅ Yes | ❌ No |

| Auto Downloads | ✅ Yes | ⚠️ Manual (1.6) |

Known Limitations

Works best with clear, frontal face images/videos

Currently does not support anime/cartoon faces

Video should be at 25 FPS (will be automatically converted)

Face should be visible throughout the image/video

Single images are automatically extended to match audio duration

Troubleshooting

Common Issues:

“Geeky model checkpoints already exist”: This is normal – models are cached for faster startup

Memory errors: Try lowering VRAM usage setting from high → medium → low

Slow performance: Ensure you’re using a CUDA-compatible GPU and try “high” VRAM setting

Node not appearing: Restart ComfyUI after installation and refresh your browser

Credits

This optimized fork is based on:

LatentSync 1.5 by ByteDance Research

ComfyUI-LatentSyncWrapper by ShmuelRonen

ComfyUI

Special thanks to the original developers for their groundbreaking work. This fork focuses on performance optimization, memory efficiency, and user experience improvements.

License

This project is licensed under the Apache License 2.0 – see the LICENSE file for details.