modelscope
huggingface_hub
transformers>=4.40.0
librosa>=0.10.0
soundfile>=0.12.0
numpy>=1.24.0
tqdm
onnxruntime>=1.18.0
requests
HyperPyYAML>=1.2.0
ruamel.yaml<0.18
conformer>=0.3.2
omegaconf>=2.3.0
inflect>=7.0.0
hydra-core>=1.3.0
pydantic>=2.0.0
x-transformers>=2.0.0
openai-whisper
diffusers>=0.29.0
wetext>=0.1.0
pyarrow>=14.0.0
pyworld>=0.3.0

Advanced text-to-speech nodes for ComfyUI powered by the CosyVoice3 model family. Features zero-shot voice cloning, cross-lingual synthesis, and voice conversion.
[CosyVoice](https://github.com/FunAudioLLM/CosyVoice)
[Patreon](https://www.patreon.com/Machinedelusions)
| Node | Description |
|------|-------------|
| Model Loader | Downloads and caches CosyVoice models |
| Zero-Shot Clone | Clone voices from reference audio |
| Cross-Lingual | Generate speech in different languages |
| Voice Conversion | Convert source audio to target voice |
| Dialog | Multi-speaker dialog synthesis with up to 4 voices |
| Audio Crop | Trim audio to specific time ranges |
| Node | Description |
|------|-------------|
| Instruct2 | Clone a voice with instruct text |
| Save Speaker | Save voice preset |
| Speaker Clone | Voice clone with voice preset |
| Speaker Instruct2 | Voice clone with voice preset and instruct text |
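The Audio Crop node listed above trims audio to a time range. As a rough illustration of the idea only (not the node's actual implementation), a time range in seconds can be converted to sample indices and used to slice the audio. The function name `crop_samples` and its parameters are hypothetical.

```python
def crop_samples(samples, sample_rate, start_s, end_s):
    """Return the part of `samples` between start_s and end_s (in seconds).

    Hypothetical helper for illustration; `samples` is any sliceable
    sequence of mono samples (for example a list or a 1-D array).
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if start_s < 0 or end_s < start_s:
        raise ValueError("need 0 <= start_s <= end_s")
    # Convert seconds to sample indices and clamp the end to the clip length.
    start = int(round(start_s * sample_rate))
    end = min(int(round(end_s * sample_rate)), len(samples))
    return samples[start:end]


# One second of silent mono audio at 16 kHz, cropped to 0.25 s .. 0.75 s.
audio = [0.0] * 16000
clip = crop_samples(audio, 16000, 0.25, 0.75)
print(len(clip))  # 8000 samples, i.e. half a second
```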
Search for “FL CosyVoice3” and install.
cd ComfyUI/custom_nodes
git clone https://github.com/filliptm/ComfyUI_FL-CosyVoice3.git
cd ComfyUI_FL-CosyVoice3
pip install -r requirements.txt
| Model | Size | Notes |
|-------|------|-------|
| Fun-CosyVoice3-0.5B | ~2GB | Recommended |
| CosyVoice2-0.5B | ~2GB | Alternative |
| CosyVoice-300M | ~1.2GB | Lightweight |
Models download automatically on first use to ComfyUI/models/cosyvoice/.
Note: CosyVoice-300M does not work well; do not use it.
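To check whether a model has already been downloaded, something like the following sketch may help. It assumes each model is stored in a subfolder named after the model under ComfyUI/models/cosyvoice/; that layout is an assumption, not something this README confirms, and the helper names are hypothetical.

```python
from pathlib import Path

# Model names taken from the table above.
MODEL_NAMES = ("Fun-CosyVoice3-0.5B", "CosyVoice2-0.5B", "CosyVoice-300M")


def cosyvoice_model_dir(comfy_root, model_name):
    """Build the expected folder for a model.

    Assumption: one subfolder per model under models/cosyvoice/.
    """
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model: {model_name}")
    return Path(comfy_root) / "models" / "cosyvoice" / model_name


def is_downloaded(comfy_root, model_name):
    """Return True if the expected folder exists and is not empty."""
    model_dir = cosyvoice_model_dir(comfy_root, model_name)
    return model_dir.is_dir() and any(model_dir.iterdir())


print(cosyvoice_model_dir("ComfyUI", "Fun-CosyVoice3-0.5B"))
```

A non-empty folder does not prove the download finished, so treat the result as a hint rather than a guarantee.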
The node is named Instruct2 because CosyVoice’s source code has an instruct1 function that works only with the CosyVoice1 model. Instruct2 is for the CosyVoice2 and CosyVoice3 models.
A reference voice clip of 3 to 10 seconds works best; do not use one longer than 30 seconds.
If the reference text is empty, the node tries to transcribe the reference audio and uses the transcript as the reference text.
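The duration guidance above can be checked before loading a clip into the workflow. The sketch below uses Python's standard `wave` module, so it only reads uncompressed PCM WAV files; the function names and the returned labels are hypothetical.

```python
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds (frames / frame rate)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_reference_duration(seconds):
    """Classify a clip length against the 3-10 s / 30 s guidance.

    The README gives no lower limit, so clips under 3 s are only
    reported as outside the ideal range, not rejected.
    """
    if seconds > 30:
        return "too_long"
    if 3 <= seconds <= 10:
        return "ideal"
    return "acceptable"


print(check_reference_duration(6.5))   # ideal
print(check_reference_duration(45.0))  # too_long
```

For other formats such as MP3 or FLAC, a library from the requirements list such as soundfile or librosa would be needed instead of `wave`.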
Speaker presets are saved to ComfyUI/models/cosyvoice/speaker.
Note: a voice preset saved with a CosyVoice3 model cannot be used with a CosyVoice2 model, and vice versa.
Note 2: CosyVoice’s official speaker preset file spk2info.pt, from the CosyVoice-300M-SFT model, is not supported.
If you really want to use those speaker presets from spk2info.pt, you can find those 8 voices at:
https://fun-audio-llm.github.io/#CosyVoice-basic
You can then download those audio samples and save them as speaker presets with the Save Speaker node.
Using a speaker preset is exactly the same as using that speaker’s reference audio for voice cloning: same process, same result.
Speaker Instruct2 loads all speaker presets saved with the Save Speaker node into a list, so you can pick one for TTS with instruct text.
Note: the instruct text cannot be empty.
Apache 2.0