ComfyUI-FreeVC_wrapper

★ 69

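Voice conversion, voice cloning, audio preprocessing, GPU acceleration
Integrates FreeVC into ComfyUI for high-quality voice style conversion and mimicry, with support for noise reduction, automatic resampling, stereo audio, and CUDA acceleration.
💡 Converts source audio into the target speaker's style while preserving audio quality.
🍴 2 Forks · 💻 Python · 🔄 2025-04-03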
📦 Netdisk download
Copy the link and open it in Quark Netdisk to download:
https://pan.quark.cn/s/8f9eee5e2cdb
📦 requirements.txt
altair
httpx==0.24.1
numpy
scipy
torch
transformers
librosa
webrtcvad==2.0.10
📄 README

ComfyUI-FreeVC_wrapper

Support My Work

If you find this project helpful, consider buying me a coffee:

https://buymeacoffee.com/shmuelronen

A voice conversion extension node for ComfyUI based on FreeVC, enabling high-quality voice conversion capabilities within the ComfyUI framework.

Features

  • Support for multiple FreeVC models:
      • Standard models (16kHz): FreeVC, FreeVC-s
      • High-quality model (24kHz): FreeVC (24kHz)
  • Enhanced voice mimicry capabilities
  • Advanced audio pre- and post-processing options
  • Stereo and mono audio support
  • Automatic audio resampling
  • Integrated with ComfyUI’s audio processing pipeline
  • GPU acceleration support (CUDA)
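
Two of the features above, stereo/mono support and automatic resampling, amount to bringing the input to the sample rate the selected model expects (16 kHz or 24 kHz). The sketch below illustrates the idea in plain Python under stated assumptions: `downmix_to_mono` and `resample_linear` are hypothetical helpers, not functions from this extension; the downmix assumes the model consumes a single channel; and the linear interpolation applies no anti-aliasing filter, so it is a teaching aid rather than a substitute for a proper resampler such as the one in librosa.

```python
from typing import List, Sequence


def downmix_to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length channels sample by sample (hypothetical helper)."""
    if not channels:
        raise ValueError("need at least one channel")
    n = len(channels[0])
    if any(len(ch) != n for ch in channels):
        raise ValueError("all channels must have the same length")
    return [sum(ch[i] for ch in channels) / len(channels) for i in range(n)]


def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample with linear interpolation (no anti-aliasing filter; illustration only)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate:
        return list(samples)
    out_len = round(len(samples) * dst_rate / src_rate)
    if len(samples) < 2:
        # Nothing to interpolate between: repeat the single sample (or return empty).
        return [samples[0]] * out_len if samples else []
    step = src_rate / dst_rate       # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for j in range(out_len):
        pos = min(j * step, last)    # clamp so we never read past the end
        i = min(int(pos), last - 1)  # index of the left neighbour
        frac = pos - i               # weight of the right neighbour, 0.0 to 1.0
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out
```

For instance, `resample_linear(downmix_to_mono([left, right]), 44100, 16000)` would turn a 44.1 kHz stereo clip into a 16 kHz mono signal for the standard models. Linear interpolation keeps the code short; a band-limited resampler avoids the aliasing it can introduce when downsampling.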

Installation

  • Install the extension into ComfyUI’s custom_nodes directory:

    cd ComfyUI/custom_nodes
    git clone https://github.com/ShmuelRonen/ComfyUI-FreeVC_wrapper.git
    cd ComfyUI-FreeVC_wrapper

  • Install required Python packages:

    pip install librosa transformers numpy torch noisereduce

  • Download required checkpoints:

    a. Voice Conversion Models:

    All model checkpoint files (3 models) are available in a single Google Drive folder:

    Download All Model Checkpoints (Google Drive)

    After downloading, extract the file and place the checkpoints folder in the freevc directory:

    ComfyUI-FreeVC_wrapper/freevc/

    b. Speaker Encoder:

    Download the speaker encoder checkpoint from HuggingFace and place it in the custom_nodes/ComfyUI-FreeVC_wrapper/freevc/speaker_encoder/ckpt directory:

    | Component | Filename | Required For |
    |-----------|----------|--------------|
    | Speaker Encoder | pretrained_bak_5805000.pt | FreeVC, FreeVC (24kHz), D-FreeVC, and D-FreeVC (24kHz) models |

    Direct download link: pretrained_bak_5805000.pt

  • Your final directory structure should look like this:

    ComfyUI-FreeVC_wrapper/
    └── freevc/
        ├── checkpoints/
        │   ├── freevc.pth         # Standard 16kHz model
        │   ├── freevc-s.pth       # Source-filtering based model
        │   └── freevc-24.pth      # High-quality 24kHz model
        └── speaker_encoder/
            └── ckpt/
                └── pretrained_bak_5805000.pt  # Speaker encoder checkpoint
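
Missing packages and misplaced checkpoints (the cause of the File Not Found errors under troubleshooting) are easy to check before launching ComfyUI. The pre-flight script below is a sketch, not part of the extension: the package list mirrors the pip command above, the paths mirror the directory tree, and the assumption that FreeVC-s needs only its own checkpoint is inferred from the speaker encoder table, which does not list it. The D-FreeVC models named in that table are not covered, because no files are given for them.

```python
import importlib.util
from pathlib import Path
from typing import Dict, List, Sequence

# Import names for the packages in the pip command above.
REQUIRED_PACKAGES = ("librosa", "transformers", "numpy", "torch", "noisereduce")

# Paths relative to the ComfyUI-FreeVC_wrapper folder, copied from the tree above.
SPEAKER_ENCODER = Path("freevc/speaker_encoder/ckpt/pretrained_bak_5805000.pt")
MODEL_FILES: Dict[str, List[Path]] = {
    # Per the table above, FreeVC and FreeVC (24kHz) also need the speaker encoder.
    "FreeVC": [Path("freevc/checkpoints/freevc.pth"), SPEAKER_ENCODER],
    "FreeVC-s": [Path("freevc/checkpoints/freevc-s.pth")],
    "FreeVC (24kHz)": [Path("freevc/checkpoints/freevc-24.pth"), SPEAKER_ENCODER],
}


def missing_packages(names: Sequence[str] = REQUIRED_PACKAGES) -> List[str]:
    """Return the top-level packages that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_files(root: Path) -> Dict[str, List[str]]:
    """Map each model name to the required files that are absent under `root`."""
    report: Dict[str, List[str]] = {}
    for model, files in MODEL_FILES.items():
        absent = [f.as_posix() for f in files if not (root / f).is_file()]
        if absent:
            report[model] = absent
    return report


if __name__ == "__main__":
    # Adjust this path to wherever the extension was cloned.
    root = Path("ComfyUI/custom_nodes/ComfyUI-FreeVC_wrapper")
    print("Missing packages:", ", ".join(missing_packages()) or "none")
    problems = missing_files(root)
    for model, absent in problems.items():
        print(f"{model}: missing {', '.join(absent)}")
    if not problems:
        print("All checkpoint files found.")
```

Point `root` at your clone before running it; a model whose files are all present simply does not appear in the report.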

Usage

  • In ComfyUI, locate the “FreeVC Voice Converter v2 🎤” node under the “audio/voice conversion” category
  • Connect your inputs:
      • Source audio: The audio you want to convert
      • Reference audio: The target voice style
      • (Optional) Secondary reference: Additional reference for more robust voice matching
  • Select model type: Choose between standard and diffusion-enhanced models
  • Configure the conversion parameters:
      • Source processing: Noise reduction, source neutralization, clarity enhancement
      • Conversion settings: Temperature, diffusion parameters (for diffusion models)
      • Post-processing: Voice matching strength, presence boost, normalization
  • Connect the output to your desired audio output node

Model Selection Guide

  • FreeVC: Good for general-purpose voice conversion at 16kHz
  • FreeVC-s: Better preservation of source speech content, recommended for maintaining clarity
  • FreeVC (24kHz): Higher-quality output with better audio fidelity
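
The guide above, together with the sample rates from the feature list and the file names from the directory tree, fits in a small lookup table. In the sketch below, the goal labels (`general`, `clarity`, `fidelity`) and the helpers `choose_model` and `resampled_length` are hypothetical; the node exposes its own model selector, and this only restates the guide in code.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ModelSpec:
    checkpoint: str   # file name under freevc/checkpoints/
    sample_rate: int  # sample rate the model works at, in Hz


# Values from the feature list and directory tree in this README.
MODELS: Dict[str, ModelSpec] = {
    "FreeVC": ModelSpec("freevc.pth", 16000),
    "FreeVC-s": ModelSpec("freevc-s.pth", 16000),
    "FreeVC (24kHz)": ModelSpec("freevc-24.pth", 24000),
}

# Hypothetical goal labels that restate the selection guide.
GOALS: Dict[str, str] = {
    "general": "FreeVC",
    "clarity": "FreeVC-s",
    "fidelity": "FreeVC (24kHz)",
}


def choose_model(goal: str) -> Tuple[str, ModelSpec]:
    """Return the model name and spec the guide suggests for `goal`."""
    if goal not in GOALS:
        raise ValueError(f"unknown goal {goal!r}; expected one of {sorted(GOALS)}")
    name = GOALS[goal]
    return name, MODELS[name]


def resampled_length(model: str, num_samples: int, input_rate: int) -> int:
    """Number of samples after resampling a clip at `input_rate` to the model's rate."""
    return round(num_samples * MODELS[model].sample_rate / input_rate)
```

With this table, `choose_model("clarity")` returns `("FreeVC-s", ModelSpec(checkpoint="freevc-s.pth", sample_rate=16000))`, and a 10-second clip at 44.1 kHz (441,000 samples) becomes 160,000 samples for the 16 kHz models versus 240,000 for FreeVC (24kHz).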

Tips for Better Voice Conversion

  • Use longer reference samples: 5-10 seconds of clean speech works best
  • Try multiple reference samples: Use the secondary reference input for more robust voice profiles
  • Adjust voice mimicry settings:
      • Increase voice_match_strength (0.6-0.8) for stronger character matching
      • Use neutralize_source (0.3-0.5) to reduce source voice influence
      • Add presence_boost (0.3-0.5) for more “in the room” sound
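
The tips above reduce to a reference length and three parameter ranges. As a rough illustration, a hypothetical `review_settings` helper could flag inputs that stray from those suggestions before a run. The parameter names come from this README; the helper and its constants are assumptions, and the ranges are treated as starting points, not as limits the node is known to enforce.

```python
from typing import Dict, List, Mapping, Tuple

# Starting ranges from the tips above (not hard limits enforced by the node).
SUGGESTED_RANGES: Dict[str, Tuple[float, float]] = {
    "voice_match_strength": (0.6, 0.8),
    "neutralize_source": (0.3, 0.5),
    "presence_boost": (0.3, 0.5),
}
MIN_REFERENCE_SECONDS = 5.0  # lower end of the suggested 5-10 s of clean speech


def review_settings(settings: Mapping[str, float],
                    reference_samples: int, sample_rate: int) -> List[str]:
    """Return human-readable notes wherever the inputs stray from the tips."""
    notes = []
    seconds = reference_samples / sample_rate
    if seconds < MIN_REFERENCE_SECONDS:
        notes.append(f"reference is {seconds:.1f} s; 5-10 s of clean speech works best")
    for name, value in settings.items():
        if name in SUGGESTED_RANGES:  # parameters without a suggested range are ignored
            low, high = SUGGESTED_RANGES[name]
            if not low <= value <= high:
                notes.append(f"{name}={value} is outside the suggested {low}-{high}")
    return notes
```

For example, a 3-second reference at 16 kHz with `presence_boost=0.9` produces two notes (short reference, out-of-range boost), while a 6-second reference with `voice_match_strength=0.7` produces none.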

Known Issues and Troubleshooting

  • File Not Found Errors:
      • Ensure all checkpoint files are in the correct directory
      • Verify file names match exactly (case-sensitive)
  • CUDA Out of Memory:
      • Try processing shorter audio clips
      • Use CPU if GPU memory is insufficient
      • Lower diffusion steps for diffusion-based models
  • Audio Quality Issues:
      • Try different models; each has strengths for different source/target voices
      • For diffusion models, lower the noise coefficient if there’s static
      • Increase clarity_enhancement for better intelligibility
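
For the CUDA out-of-memory case, "process shorter clips" can be done by converting a long recording piece by piece. The helper pair below is a hypothetical sketch, not part of the extension: it splits a waveform into fixed-length chunks that could each be sent through the converter, then joins the results. Fixed cuts can land mid-word and leave audible seams; splitting at pauses (for example with webrtcvad, which appears in requirements.txt) would likely sound better, though whether the node uses it that way is not stated here.

```python
from typing import List, Sequence


def split_into_chunks(samples: Sequence[float], sample_rate: int,
                      chunk_seconds: float = 10.0) -> List[Sequence[float]]:
    """Cut a waveform into consecutive chunks of at most `chunk_seconds` each."""
    size = int(chunk_seconds * sample_rate)
    if size <= 0:
        raise ValueError("chunk_seconds * sample_rate must be at least one sample")
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def join_chunks(chunks: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate converted chunks back into one waveform, in order."""
    return [x for chunk in chunks for x in chunk]
```

In `join_chunks([convert(c) for c in split_into_chunks(wave, 16000)])`, `convert` stands in for whatever runs the actual conversion; it is a placeholder name, not an API of this extension.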

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License – see the LICENSE file for details.

Acknowledgments

  • Original FreeVC implementation by OlaWod
  • ComfyUI framework by comfyanonymous

Citation

If you use this in your research, please cite:

    @article{li2022freevc,
      title={FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion},
      author={Li, Jingyi and Tu, Weiping and Xiao, Li},
      journal={arXiv preprint arXiv:2210.15418},
      year={2022}
    }