ComfyUI_Fill-ChatterBox

★ 225

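Voice cloning · Text-to-speech · Multilingual · Emotional expression

Integrates zero-shot voice cloning with multilingual TTS, with support for multiple models, emotion tags, and voice conversion, making it easy to generate or convert speech quickly.

💡 Clone a voice from a few seconds of reference audio and generate multilingual TTS.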

🍴 39 Forks · 💻 Python · 🔄 2026-01-24

📦 requirements.txt

numpy
resampy
librosa
s3tokenizer
transformers
diffusers
omegaconf
conformer
safetensors
soundfile
# Optional watermarking (may have Python 3.12+ compatibility issues)
# resemble-perth

📄 README

FL ChatterBox

High-quality text-to-speech nodes for ComfyUI powered by ResembleAI’s Chatterbox models. Features voice cloning, multilingual synthesis, paralinguistic expressions, and voice conversion.

[Chatterbox](https://github.com/resemble-ai/chatterbox)

[Patreon](https://www.patreon.com/Machinedelusions)

Features

Zero-Shot Voice Cloning – Clone any voice from a few seconds of reference audio

3 TTS Models – Standard, Turbo (faster), and Multilingual variants

23 Languages – Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, Turkish

Paralinguistic Tags – Express emotions with tags like [laugh], [sigh], [gasp], [chuckle] (Turbo model)

Voice Conversion – Transform one voice to sound like another

Dialog Synthesis – Multi-speaker conversations with up to 4 voices

Model Caching – Keep models loaded between runs for faster iteration

Nodes

| Node | Description |
|------|-------------|
| FL Chatterbox TTS | Standard high-quality text-to-speech with voice cloning |
| FL Chatterbox Turbo TTS | Faster GPT2-based TTS with paralinguistic tag support |
| FL Chatterbox Multilingual TTS | 23-language TTS with voice cloning |
| FL Chatterbox VC | Voice conversion – transform source audio to target voice |
| FL Chatterbox Dialog TTS | Multi-speaker dialog synthesis with up to 4 voices |
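The Dialog TTS node combines up to 4 voices in one conversation. As a rough illustration of how a multi-speaker script could be split into per-speaker turns before synthesis, the sketch below assumes a hypothetical `SPEAKER A:` line prefix. The node's actual script format is not described in this README, so `parse_dialog` and the line format are illustrative only.

```python
import re

MAX_SPEAKERS = 4  # from the README: dialog synthesis supports up to 4 voices

# Hypothetical line format: "SPEAKER A: some text". The real node may differ.
LINE_RE = re.compile(r"^\s*SPEAKER\s+([A-Z])\s*:\s*(.+?)\s*$")


def parse_dialog(script: str) -> list[tuple[str, str]]:
    """Return (speaker, text) turns in order, rejecting more than 4 speakers."""
    turns = []
    speakers = []
    for line in script.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or unlabeled lines
        speaker, text = match.group(1), match.group(2)
        if speaker not in speakers:
            speakers.append(speaker)
            if len(speakers) > MAX_SPEAKERS:
                raise ValueError(f"more than {MAX_SPEAKERS} speakers in script")
        turns.append((speaker, text))
    return turns


script = """
SPEAKER A: Did you hear the new voice model?
SPEAKER B: I did. It sounds great.
SPEAKER A: Agreed.
"""
turns = parse_dialog(script)
```

Each turn could then be routed to the reference audio assigned to that speaker.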

Installation

ComfyUI Manager

Search for “FL ChatterBox” and install.

Manual

cd ComfyUI/custom_nodes

git clone https://github.com/filliptm/ComfyUI_Fill-ChatterBox.git

cd ComfyUI_Fill-ChatterBox

pip install -r requirements.txt

Optional: Watermarking Support

pip install resemble-perth

Note: The resemble-perth package may have compatibility issues with Python 3.12+. Nodes will function without watermarking if import fails.
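The fallback described in the note is a common optional-dependency pattern. The sketch below shows one way it could look, not the nodes' actual code. It assumes `perth` is the import name of the `resemble-perth` package; `optional_import` and `finalize_audio` are hypothetical helper names.

```python
import importlib


def optional_import(module_name: str):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Assumption: "perth" is the import name of the resemble-perth package.
perth = optional_import("perth")
WATERMARKING_AVAILABLE = perth is not None


def finalize_audio(samples, apply_watermark=None):
    """Apply a watermark callable when one is provided, else pass audio through."""
    if apply_watermark is None:
        return samples
    return apply_watermark(samples)
```

With this shape, generation code calls `finalize_audio` unconditionally and only passes a watermarking callable when `WATERMARKING_AVAILABLE` is true.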

Quick Start

Add the FL Chatterbox TTS node (or the Turbo or Multilingual variant) to your workflow

Enter your text in the text field

Optionally connect reference audio for voice cloning

Set keep_model_loaded = True for faster subsequent runs

Generate!

Turbo Model with Expressions

Hello there! [laugh] Isn't this amazing? [sigh] I just love text to speech.

Supported tags: [laugh], [sigh], [gasp], [chuckle], [cough], [sniff], [groan], [shush], [clear throat]
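A typo in a tag is easy to miss, so it can help to check a prompt before sending it to the Turbo node. The sketch below scans for bracketed tags and reports any that are not in the list above. `find_unsupported_tags` is a hypothetical helper, and comparing tags case-insensitively is an assumption rather than documented node behavior.

```python
import re

# Tags listed in the README for the Turbo model.
SUPPORTED_TAGS = {
    "laugh", "sigh", "gasp", "chuckle", "cough",
    "sniff", "groan", "shush", "clear throat",
}

TAG_RE = re.compile(r"\[([^\[\]]+)\]")


def find_unsupported_tags(text: str) -> list[str]:
    """Return bracketed tags in `text` that are not in SUPPORTED_TAGS."""
    found = TAG_RE.findall(text)
    return [tag for tag in found if tag.strip().lower() not in SUPPORTED_TAGS]


prompt = "Hello there! [laugh] Isn't this amazing? [sigh] I just love text to speech."
unsupported = find_unsupported_tags(prompt)
```

An empty result means every tag in the prompt is on the supported list.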

Models

Models download automatically on first use to ComfyUI/models/chatterbox/.

Parameters

TTS Parameters

| Parameter | Range | Description |
|-----------|-------|-------------|
| exaggeration | 0.25-2.0 | Emotion intensity |
| cfg_weight | 0.2-1.0 | Pace/classifier-free guidance |
| temperature | 0.05-5.0 | Randomness in generation |
| seed | 0-4.29B | Reproducible generation |
| keep_model_loaded | bool | Cache model between runs |
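When these values are set from a script instead of the node UI, it can be useful to clamp them to the documented ranges first. The following is a small sketch under that assumption; `TTS_RANGES` and `clamp_tts_params` are hypothetical names, and the ranges are copied from the table above.

```python
# Ranges copied from the TTS parameter table above.
TTS_RANGES = {
    "exaggeration": (0.25, 2.0),
    "cfg_weight": (0.2, 1.0),
    "temperature": (0.05, 5.0),
}


def clamp_tts_params(params: dict) -> dict:
    """Return a copy of `params` with known keys clamped into their ranges."""
    clamped = dict(params)
    for name, (low, high) in TTS_RANGES.items():
        if name in clamped:
            clamped[name] = min(max(clamped[name], low), high)
    return clamped


settings = clamp_tts_params(
    {"exaggeration": 3.0, "cfg_weight": 0.1, "temperature": 0.8, "seed": 42}
)
```

Keys without a listed float range, such as `seed`, pass through unchanged.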

Turbo Parameters

| Parameter | Range | Description |
|-----------|-------|-------------|
| temperature | 0.05-2.0 | Randomness in generation |
| top_k | 1-5000 | Top-k sampling |
| top_p | 0.1-1.0 | Nucleus sampling threshold |
| repetition_penalty | 1.0-3.0 | Token repetition penalty |
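For readers unfamiliar with `top_k` and `top_p`, the sketch below illustrates what these sampling parameters generally mean on a toy distribution. It is not the Turbo node's implementation; `filter_top_k_top_p` is a hypothetical function, and applying top-k before top-p is a common convention assumed here.

```python
def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix whose
    cumulative probability reaches top_p, and renormalize.

    `probs` maps token -> probability and is assumed to sum to 1.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    ranked = ranked[:top_k]

    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
filtered = filter_top_k_top_p(probs, top_k=3, top_p=0.7)
```

In this example top-k first drops `d`, then top-p keeps only `a` and `b` because together they already cover at least 70% of the probability mass. Lower values of either parameter make generation more conservative.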

Limitations

Maximum audio length: ~40 seconds per generation

Reference audio: Minimum 5-6 seconds recommended

Turbo paralinguistic tags: English only
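One way to work within the ~40 second cap is to split long text into sentence groups and generate each group separately. The sketch below assumes a rough speaking rate of 15 characters per second, which is a guess and not a figure from this README; `chunk_text` is a hypothetical helper.

```python
import re

MAX_SECONDS = 40.0       # README: roughly 40 seconds per generation
CHARS_PER_SECOND = 15.0  # assumption: rough speaking rate, tune for your voice


def chunk_text(text: str, max_seconds: float = MAX_SECONDS) -> list[str]:
    """Group sentences into chunks whose estimated duration stays under the cap."""
    max_chars = int(max_seconds * CHARS_PER_SECOND)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


long_text = "This is a sentence. " * 60
chunks = chunk_text(long_text)
```

A single sentence longer than the estimated limit is kept as one oversized chunk, since the sketch never splits inside a sentence.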

Requirements

Python 3.10+

8GB RAM minimum (16GB+ recommended)

NVIDIA GPU with 8GB+ VRAM recommended

CPU and Mac MPS supported
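Given the supported backends listed above, a typical device-selection order is CUDA, then MPS, then CPU. The sketch below shows that order using PyTorch's availability checks. `pick_device` is a hypothetical helper, and the nodes may choose a device differently.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch available; fall back to the CPU label
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
```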

License

MIT License – See Chatterbox repo for model licenses.

Changelog

2025-12-28

Added Turbo TTS node (faster, GPT2-based with paralinguistic tags)

Added Multilingual TTS node (23 languages)

Improved model caching using module-level globals

Centralized model downloads to ComfyUI/models/chatterbox/

2025-07-24

Added Dialog TTS node for multi-speaker conversations (up to 4 speakers)

Extended all nodes with seed parameters for reproducible generation

Isolated audio track outputs per speaker

2025-06-24

Added seed parameter for reproducible generation

Made Perth watermarking optional for Python 3.12+ compatibility

2025-05-31

Added persistent model loading and loading bar

Added Mac MPS support

Switched to native inference code (removed the chatterbox-tts library dependency)