ComfyUI-FreeVC_wrapper

★ 69

Voice conversion · Voice cloning · Audio preprocessing · GPU acceleration

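Integrates FreeVC into ComfyUI for high-quality voice style conversion and mimicry, with support for noise reduction, automatic resampling, stereo audio, and CUDA acceleration.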

💡 Converts source audio to a target speaker's style while preserving audio quality.

🍴 2 Forks · 💻 Python · 🔄 2025-04-03

🔗 Original on GitHub: https://github.com/ShmuelRonen/ComfyUI-FreeVC_wrapper

📦 Cloud drive download

Copy the link below and download the files from Quark Netdisk:

https://pan.quark.cn/s/8f9eee5e2cdb

📦 requirements.txt

altair
httpx==0.24.1
numpy
scipy
torch
transformers
librosa
webrtcvad==2.0.10

📄 README

ComfyUI-FreeVC_wrapper

Support My Work

If you find this project helpful, consider buying me a coffee:

Buy Me a Coffee: https://buymeacoffee.com/shmuelronen

A FreeVC-based voice conversion extension node that brings high-quality voice conversion into the ComfyUI framework.

Features

Support for multiple FreeVC models:

Standard models (16kHz): FreeVC, FreeVC-s

High-quality model (24kHz): FreeVC (24kHz)

Enhanced voice mimicry capabilities

Advanced audio pre- and post-processing options

Stereo and mono audio support

Automatic audio resampling

Integrated with ComfyUI’s audio processing pipeline

GPU acceleration support (CUDA)
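Automatic resampling and stereo/mono support boil down to bringing every input to the sample rate and channel layout the selected model expects. The sketch below illustrates that idea with NumPy only; it is not the wrapper's own code. It assumes audio arrives as a float array shaped (channels, samples), downmixes by averaging channels, and uses plain linear interpolation (the extension more likely uses a band-limited resampler such as librosa's). The 16 kHz / 24 kHz targets come from the model list above; `MODEL_RATES` and `to_model_input` are hypothetical names.

```python
import numpy as np

# Target sample rates taken from the feature list above (hypothetical mapping).
MODEL_RATES = {"FreeVC": 16000, "FreeVC-s": 16000, "FreeVC (24kHz)": 24000}


def to_model_input(audio: np.ndarray, sr: int, model: str) -> np.ndarray:
    """Downmix (channels, samples) audio to mono and resample it for `model`.

    Sketch only: linear interpolation has no anti-aliasing filter, so a real
    pipeline should prefer a band-limited resampler.
    """
    if audio.ndim == 2:
        # Average the channels to obtain a mono signal.
        audio = audio.mean(axis=0)
    target_sr = MODEL_RATES[model]
    if sr == target_sr:
        return audio.astype(np.float32)
    n_out = int(round(audio.shape[0] / sr * target_sr))
    # Sample times (in seconds) of the input and output signals.
    t_in = np.arange(audio.shape[0]) / sr
    t_out = np.arange(n_out) / target_sr
    return np.interp(t_out, t_in, audio).astype(np.float32)


# 2 s of 44.1 kHz stereo noise becomes 32,000 mono samples at 16 kHz.
stereo = np.random.default_rng(0).standard_normal((2, 88200))
mono_16k = to_model_input(stereo, 44100, "FreeVC")
print(mono_16k.shape)  # (32000,)
```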

Installation

Clone the extension into your ComfyUI custom_nodes directory:

cd ComfyUI/custom_nodes
git clone https://github.com/ShmuelRonen/ComfyUI-FreeVC_wrapper.git
cd ComfyUI-FreeVC_wrapper

Install required Python packages:

pip install librosa transformers numpy torch noisereduce
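To confirm this step worked, you can check that each package is visible to the Python interpreter that runs ComfyUI. The sketch below uses only the standard library; the `missing_packages` helper is hypothetical, and the module list simply mirrors the pip command above (requirements.txt additionally lists altair, httpx, scipy, and webrtcvad, which you could append).

```python
import importlib.util

# Import names for the packages in the pip command above. Note that
# noisereduce appears in that command but not in requirements.txt.
REQUIRED = ["librosa", "transformers", "numpy", "torch", "noisereduce"]


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```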

Download required checkpoints:

a. Voice Conversion Models:

All model checkpoint files (3 models) are available in a single Google Drive folder:

Download All Model Checkpoints (Google Drive)

After downloading, extract the file and place the checkpoints folder in the freevc directory:

ComfyUI-FreeVC_wrapper/freevc/

b. Speaker Encoder:

Download the speaker encoder checkpoint from Hugging Face and place it in the custom_nodes/ComfyUI-FreeVC_wrapper/freevc/speaker_encoder/ckpt directory:

| Component | Filename | Required For |
|-----------|----------|--------------|
| Speaker Encoder | pretrained_bak_5805000.pt | FreeVC, FreeVC (24kHz), D-FreeVC, and D-FreeVC (24kHz) models |

Direct download link:

pretrained_bak_5805000.pt

Your final directory structure should look like this:

ComfyUI-FreeVC_wrapper/
└── freevc/
    ├── checkpoints/
    │   ├── freevc.pth         # Standard 16kHz model
    │   ├── freevc-s.pth       # Source-filtering based model
    │   └── freevc-24.pth      # High-quality 24kHz model
    └── speaker_encoder/
        └── ckpt/
            └── pretrained_bak_5805000.pt  # Speaker encoder checkpoint
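A small script can check this layout before you start ComfyUI, which also helps with the "file not found" problems described under troubleshooting. The sketch below is hypothetical (it is not shipped with the extension): the expected paths are copied from the tree above, and the install location in the `__main__` block is an assumption you would adjust. Besides reporting missing files, it flags files whose names differ only in letter case.

```python
from pathlib import Path

# Expected files relative to the ComfyUI-FreeVC_wrapper directory, copied
# from the directory tree above.
EXPECTED_FILES = [
    "freevc/checkpoints/freevc.pth",
    "freevc/checkpoints/freevc-s.pth",
    "freevc/checkpoints/freevc-24.pth",
    "freevc/speaker_encoder/ckpt/pretrained_bak_5805000.pt",
]


def check_layout(root) -> dict:
    """Map each expected-but-absent file to a short description of the problem."""
    root = Path(root)
    problems = {}
    for rel in EXPECTED_FILES:
        path = root / rel
        if path.is_file():
            continue
        # A name that differs only in letter case is a common cause of
        # "file not found" errors on case-sensitive file systems.
        near = []
        if path.parent.is_dir():
            near = [p.name for p in path.parent.iterdir()
                    if p.name.lower() == path.name.lower()]
        problems[rel] = f"name case differs: found {near[0]}" if near else "missing"
    return problems


if __name__ == "__main__":
    # Hypothetical install location; point this at your own copy.
    issues = check_layout("ComfyUI/custom_nodes/ComfyUI-FreeVC_wrapper")
    for rel, problem in issues.items():
        print(f"{rel}: {problem}")
    if not issues:
        print("All checkpoint files are in place.")
```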

Usage

In ComfyUI, locate the “FreeVC Voice Converter v2 🎤” node under the “audio/voice conversion” category

Connect your inputs:

Source audio: The audio you want to convert

Reference audio: The target voice style

(Optional) Secondary reference: Additional reference for more robust voice matching

Select model type: Choose between standard and diffusion-enhanced models

Configure the conversion parameters:

Source processing: Noise reduction, source neutralization, clarity enhancement

Conversion settings: Temperature, diffusion parameters (for diffusion models)

Post-processing: Voice matching strength, presence boost, normalization

Connect the output to your desired audio output node
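Post-processing includes a normalization option. As an illustration of what peak normalization does, the sketch below scales a waveform so its loudest sample lands at a target level in dBFS. The -1 dBFS default and the `peak_normalize` name are assumptions for illustration; the node's own normalization may use a different target or a loudness-based method.

```python
import numpy as np


def peak_normalize(audio: np.ndarray, target_dbfs: float = -1.0) -> np.ndarray:
    """Scale `audio` so that its absolute peak equals `target_dbfs`.

    Silent input (all zeros) is returned unchanged to avoid dividing by zero.
    """
    peak = np.max(np.abs(audio))
    if peak == 0:
        return audio
    target_linear = 10.0 ** (target_dbfs / 20.0)  # dBFS -> linear amplitude
    return audio * (target_linear / peak)


quiet = np.array([0.0, 0.1, -0.25, 0.05])
loud = peak_normalize(quiet, target_dbfs=0.0)
print(np.max(np.abs(loud)))  # 1.0
```

Normalizing to slightly below 0 dBFS leaves a little headroom so later processing (for example a presence boost) is less likely to clip.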

Model Selection Guide

FreeVC: Good for general-purpose voice conversion at 16kHz

FreeVC-s: Better preservation of source speech content, recommended for maintaining clarity

FreeVC (24kHz): Higher quality output with better audio fidelity

Tips for Better Voice Conversion

Use longer reference samples: 5-10 seconds of clean speech works best

Try multiple reference samples: Use the secondary reference input for more robust voice profiles

Adjust voice mimicry settings:

Increase voice_match_strength (0.6-0.8) for stronger character matching

Use neutralize_source (0.3-0.5) to reduce source voice influence

Add presence_boost (0.3-0.5) for more “in the room” sound
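The tips above reduce to a few numeric rules of thumb, which can be checked before a run. The sketch below is a hypothetical helper, not part of the node: it warns when a reference clip is shorter than 5 seconds and when a setting falls outside the suggested range. Those ranges are suggestions for specific goals (stronger matching, less source influence, more presence), so a value outside them is not necessarily wrong.

```python
# Suggested ranges copied from the tips above. The dictionary and the helper
# functions are hypothetical conveniences, not part of the node's interface.
SUGGESTED = {
    "voice_match_strength": (0.6, 0.8),
    "neutralize_source": (0.3, 0.5),
    "presence_boost": (0.3, 0.5),
}
MIN_REFERENCE_SECONDS = 5.0  # lower end of the 5-10 s recommendation


def reference_seconds(num_samples: int, sample_rate: int) -> float:
    """Length of a reference clip in seconds."""
    return num_samples / sample_rate


def review_settings(ref_samples: int, sample_rate: int, settings: dict) -> list:
    """Return warnings for a short reference clip or out-of-range settings."""
    warnings = []
    seconds = reference_seconds(ref_samples, sample_rate)
    if seconds < MIN_REFERENCE_SECONDS:
        warnings.append(f"reference is {seconds:.1f} s; aim for 5-10 s of clean speech")
    for name, value in settings.items():
        if name not in SUGGESTED:
            continue  # the tips give no guidance for this parameter
        low, high = SUGGESTED[name]
        if not low <= value <= high:
            warnings.append(f"{name}={value} is outside the suggested range {low}-{high}")
    return warnings


# A 3 s reference at 16 kHz and a weak voice_match_strength both get flagged.
for warning in review_settings(3 * 16000, 16000, {"voice_match_strength": 0.4}):
    print(warning)
```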

Known Issues and Troubleshooting

File Not Found Errors:

Ensure all checkpoint files are in the correct directory

Verify file names match exactly (case-sensitive)

CUDA Out of Memory:

Try processing shorter audio clips

Use CPU if GPU memory is insufficient

Lower diffusion steps for diffusion-based models
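One way to follow the "shorter audio clips" advice without trimming files by hand is to split long audio into fixed-length chunks, convert each chunk separately, and join the results. The sketch below covers only the splitting and joining with NumPy; `convert` stands in for whatever per-chunk conversion you run and is hypothetical, as is the 10-second default. Hard cuts without overlap can leave audible seams at chunk boundaries, so a short crossfade may be needed in practice.

```python
import numpy as np


def split_into_chunks(audio: np.ndarray, sample_rate: int, chunk_seconds: float = 10.0):
    """Split mono audio into consecutive chunks of at most `chunk_seconds`."""
    size = int(chunk_seconds * sample_rate)
    return [audio[start:start + size] for start in range(0, len(audio), size)]


def convert_in_chunks(audio, sample_rate, convert, chunk_seconds=10.0):
    """Apply `convert` (a hypothetical per-chunk function) and concatenate."""
    chunks = split_into_chunks(audio, sample_rate, chunk_seconds)
    return np.concatenate([convert(chunk) for chunk in chunks])


# 25 s at 16 kHz splits into chunks of 10 s, 10 s and 5 s.
signal = np.zeros(25 * 16000, dtype=np.float32)
print([len(c) for c in split_into_chunks(signal, 16000)])  # [160000, 160000, 80000]
```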

Audio Quality Issues:

Try different models; each has strengths for different source/target voices

For diffusion models, lower the noise coefficient if there’s static

Increase clarity_enhancement for better intelligibility

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Original FreeVC implementation by OlaWod

ComfyUI framework by comfyanonymous

Citation

If you use this in your research, please cite:

@article{li2022freevc,
  title={FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion},
  author={Li, Jingyi and Tu, Weiping and Xiao, Li},
  journal={arXiv preprint arXiv:2210.15418},
  year={2022}
}