numpy<2.0.0
numba
torch>=2.1.0
torchaudio
tqdm
vector_quantize_pytorch
transformers>=4.41.1
vocos
IPython
pybase16384
pynini==2.1.5; sys_platform == 'linux'
WeTextProcessing; sys_platform == 'linux'
nemo_text_processing; sys_platform == 'linux'
av
pydub

A ComfyUI integration for ChatTTS, enabling high-quality, controllable text-to-speech generation directly in your ComfyUI workflows.
This simple workflow demonstrates basic text-to-speech conversion:
Clone the repository into your ComfyUI `custom_nodes` directory:
```
git clone https://github.com/neverbiasu/ComfyUI-ChatTTS
```
Then install the dependencies:
```
cd ComfyUI-ChatTTS
pip install -r requirements.txt
```
ChatTTS models are downloaded automatically on first use. Alternatively, you can place them manually in:
`ComfyUI/models/chattts/`
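If you place the models manually, a quick check that the directory exists and is not empty can save a surprise download. The following is a minimal sketch, not part of this project: the helper names `chattts_model_dir` and `has_local_models` are hypothetical, and the sketch only tests whether the folder contains anything at all, since the expected model file names are not listed here.

```python
from pathlib import Path


def chattts_model_dir(comfyui_root):
    """Build the manual model path: <ComfyUI root>/models/chattts."""
    return Path(comfyui_root) / "models" / "chattts"


def has_local_models(comfyui_root):
    """Return True if the model directory exists and contains at least one entry.

    Assumption: any entry counts. This does not verify that the files are
    valid or complete ChatTTS model files.
    """
    model_dir = chattts_model_dir(comfyui_root)
    return model_dir.is_dir() and any(model_dir.iterdir())


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at your own installation root.
    root = "ComfyUI"
    if has_local_models(root):
        print(f"Found local files in {chattts_model_dir(root)}")
    else:
        print("No local models found; they should be downloaded on first use.")
```

An empty or missing directory is reported the same way, because in both cases the automatic download is what you would expect to happen.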
The first time you run the ChatTTSLoader node, it will:
ChatTTS supports various special tags that can be inserted into your text to control the speech generation. These tags allow you to customize the speech output without changing the model parameters.
| Tag | Range | Description |
| --- | --- | --- |
| `[speed_n]` | 1-9 | Controls speech speed (higher numbers = faster) |
| `[oral_n]` | 0-9 | Controls oral expressiveness style |
| `[laugh_n]` | 0-2 | Controls laughter intensity |
| `[break_n]` | 0-7 | Controls pause duration (higher numbers = longer pause) |
| `[uv_break]` | - | Inserts a brief pause/break at the word level |
| `[lbreak]` | - | Inserts a longer pause/break (similar to line break) |
| `[laugh]` | - | Inserts laughter at the specified position |
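Because an out-of-range level is easy to type by accident, it can help to build tags programmatically. The sketch below is illustrative only: `control_tag`, `TAG_RANGES`, and `PLAIN_TAGS` are hypothetical names, not part of ChatTTS or this node pack. The ranges are copied from the table, and the sketch only produces tag strings; where a given tag should sit in the text, and how the nodes interpret it, is not covered.

```python
# Allowed levels per tag, inclusive on both ends (copied from the table above).
TAG_RANGES = {
    "speed": (1, 9),
    "oral": (0, 9),
    "laugh": (0, 2),
    "break": (0, 7),
}

# Tags that are written without a numeric level.
PLAIN_TAGS = {"uv_break", "lbreak", "laugh"}


def control_tag(name, level=None):
    """Return a tag string such as "[speed_5]" or "[uv_break]".

    Raises ValueError for unknown tags or levels outside the documented range.
    """
    if level is None:
        if name not in PLAIN_TAGS:
            raise ValueError(f"{name!r} cannot be used without a level")
        return f"[{name}]"
    if name not in TAG_RANGES:
        raise ValueError(f"{name!r} does not take a level")
    low, high = TAG_RANGES[name]
    if not low <= level <= high:
        raise ValueError(f"{name} level must be in {low}-{high}, got {level}")
    return f"[{name}_{level}]"


if __name__ == "__main__":
    text = (
        f"{control_tag('speed', 5)} Hello there {control_tag('uv_break')} "
        f"and welcome {control_tag('laugh')}"
    )
    print(text)
```

Keeping the ranges in one dictionary means a change to the table needs only a one-line update in the helper.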
This project is licensed under the MIT License – see the LICENSE file for details.