# ChatterboxTTS Dependencies for ComfyUI
# Install with: pip install -r requirements.txt
s3tokenizer>=0.1.7
resemble-perth
librosa
omegaconf
accelerate
transformers==4.46.3


An unofficial ComfyUI custom node integration that provides high-quality Text-to-Speech and Voice Conversion nodes using Resemble AI’s ChatterboxTTS, with support for unlimited text length.
NEW: Audio capture node
🎤 ChatterBox TTS – Generate speech from text with optional voice cloning
🔄 ChatterBox VC – Convert voice from one speaker to another
🎙️ ChatterBox Voice Capture – Record voice input with smart silence detection
⚡ Fast & Quality – Production-grade TTS that, according to Resemble AI, outperforms ElevenLabs
🎭 Emotion Control – Unique exaggeration parameter for expressive speech
📝 Enhanced Chunking – Intelligent text splitting for long content with multiple combination methods
📦 Self-Contained – Bundled ChatterBox code for a hassle-free installation
Note: There are multiple ChatterBox extensions available. This implementation focuses on simplicity, ComfyUI standards, and enhanced text processing capabilities.
Installation: clone the repository into ComfyUI's custom_nodes folder:
cd ComfyUI/custom_nodes
git clone https://github.com/ShmuelRonen/ComfyUI_ChatterBox.git
Expected folder structure for bundled approach:
ComfyUI_ChatterBox_Voice/
├── __init__.py
├── nodes.py
├── nodes_audio_recorder.py
├── chatterbox/
├── web/
├── models/          # ← Models bundled here (optional)
│   └── chatterbox/
│       ├── conds.pt
│       ├── s3gen.pt
│       ├── t3_cfg.pt
│       ├── tokenizer.json
│       └── ve.pt
└── README.md
Install the Python dependencies from the node folder:
pip install -r requirements.txt
Download the ChatterboxTTS models and place them in:
ComfyUI/models/TTS/chatterbox/
Required files:
conds.pt (105 KB)
s3gen.pt (~1 GB)
t3_cfg.pt (~1 GB)
tokenizer.json (25 KB)
ve.pt (5.5 MB)
Download from: https://huggingface.co/ResembleAI/chatterbox/tree/main
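If you prefer to script the download, here is a minimal sketch built on `hf_hub_download` from the `huggingface_hub` package (normally installed as a dependency of transformers). It assumes the five files sit at the root of the `ResembleAI/chatterbox` repository. The helper name `download_chatterbox_models` and its injectable `fetch` argument are hypothetical and not part of this node.

```python
from pathlib import Path
from typing import Callable, Optional

# File list taken from the "Required files" list in this README.
REQUIRED_FILES = ["conds.pt", "s3gen.pt", "t3_cfg.pt", "tokenizer.json", "ve.pt"]
# Assumption: all files live at the root of this Hugging Face repo.
REPO_ID = "ResembleAI/chatterbox"


def download_chatterbox_models(
    dest_dir: str,
    fetch: Optional[Callable[..., str]] = None,
) -> list[str]:
    """Hypothetical helper: download missing model files into dest_dir.

    Returns the names of the files that were fetched. `fetch` defaults to
    huggingface_hub.hf_hub_download and is injectable so the logic can be
    exercised without network access.
    """
    if fetch is None:
        from huggingface_hub import hf_hub_download
        fetch = hf_hub_download

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name in REQUIRED_FILES:
        if (dest / name).is_file():
            continue  # already present: skip re-downloading large files
        fetch(repo_id=REPO_ID, filename=name, local_dir=str(dest))
        fetched.append(name)
    return fetched
```

Typical use would be `download_chatterbox_models("ComfyUI/models/TTS/chatterbox")`. Skipping files that already exist makes it safe to re-run after an interrupted download of the ~1 GB checkpoints.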
Optional, for the 🎙️ Voice Capture node (microphone recording):
pip install sounddevice
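This README does not spell out the node's exact detection rule. As a rough sketch, "smart silence detection" can be modelled as follows: measure the RMS level of each incoming block and stop once it has stayed below `silence_threshold` for `silence_duration` seconds after speech has started. The RMS measure, the seconds unit, the "wait for speech first" rule, and the names `SilenceDetector` and `record_until_silence` are assumptions. The defaults (0.01 and 2.0) match the General Recording settings listed later in this README.

```python
import numpy as np


class SilenceDetector:
    """Hypothetical detector: stop after a run of quiet audio.

    silence_threshold: RMS level (float samples in [-1, 1]) treated as silence.
    silence_duration: assumed to be seconds of continuous silence before stopping.
    """

    def __init__(self, sample_rate: int, silence_threshold: float = 0.01,
                 silence_duration: float = 2.0):
        self.sample_rate = sample_rate
        self.silence_threshold = silence_threshold
        self.silence_duration = silence_duration
        self.heard_speech = False
        self.silent_samples = 0

    def feed(self, block: np.ndarray) -> bool:
        """Process one block of samples; return True when recording should stop."""
        rms = float(np.sqrt(np.mean(np.square(block, dtype=np.float64))))
        if rms >= self.silence_threshold:
            self.heard_speech = True
            self.silent_samples = 0
        elif self.heard_speech:
            # Assumption: silence only counts after speech has started,
            # so a quiet lead-in does not end the recording immediately.
            self.silent_samples += len(block)
        return self.silent_samples >= self.silence_duration * self.sample_rate


def record_until_silence(sample_rate=44100, block_size=1024, max_seconds=30.0,
                         **detector_kwargs) -> np.ndarray:
    """Record mono audio from the default microphone until silence is detected."""
    import sounddevice as sd

    detector = SilenceDetector(sample_rate, **detector_kwargs)
    blocks = []
    max_blocks = int(max_seconds * sample_rate / block_size)
    with sd.InputStream(samplerate=sample_rate, channels=1, dtype="float32") as stream:
        for _ in range(max_blocks):
            block, _overflowed = stream.read(block_size)  # shape (frames, channels)
            mono = block[:, 0].copy()
            blocks.append(mono)
            if detector.feed(mono):
                break
    return np.concatenate(blocks) if blocks else np.zeros(0, dtype=np.float32)
```

Raising `silence_threshold` makes background noise count as silence, and lengthening `silence_duration` tolerates longer pauses. That is why the noisy-environment tips raise both values.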
Long text support with smart processing:
auto – Smart selection based on text length
concatenate – Simple joining
silence_padding – Add configurable silence between chunks
crossfade – Smooth audio blending
Chunking Controls (all optional):
enable_chunking – Enable/disable smart chunking (default: True)
max_chars_per_chunk – Chunk size limit (default: 400)
chunk_combination_method – How to join audio (default: auto)
silence_between_chunks_ms – Silence duration (default: 100ms)
Auto-selection logic:
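Only the three explicit combination methods are sketched here. The `auto` selection rules are not spelled out in this section, so they are left out. The sketch assumes each chunk is a 1-D NumPy waveform at a shared sample rate. `silence_ms` mirrors `silence_between_chunks_ms` (default 100). The linear fade and the 50 ms `crossfade_ms` default are hypothetical choices, not the node's documented values.

```python
import numpy as np


def combine_chunks(chunks, sample_rate, method="concatenate",
                   silence_ms=100, crossfade_ms=50):
    """Join per-chunk mono waveforms (1-D float arrays) into one waveform."""
    if not chunks:
        return np.zeros(0, dtype=np.float32)
    if method == "concatenate":
        # Simple joining: chunks back to back.
        return np.concatenate(chunks)
    if method == "silence_padding":
        # Insert a block of zeros between consecutive chunks.
        gap = np.zeros(int(sample_rate * silence_ms / 1000), dtype=chunks[0].dtype)
        pieces = []
        for i, chunk in enumerate(chunks):
            if i:
                pieces.append(gap)
            pieces.append(chunk)
        return np.concatenate(pieces)
    if method == "crossfade":
        out = chunks[0]
        for chunk in chunks[1:]:
            # Assumption: linear fade; overlap capped by the shorter signal.
            n = min(int(sample_rate * crossfade_ms / 1000), len(out), len(chunk))
            if n == 0:
                out = np.concatenate([out, chunk])
                continue
            fade_out = np.linspace(1.0, 0.0, n, dtype=out.dtype)
            blended = out[-n:] * fade_out + chunk[:n] * (1.0 - fade_out)
            out = np.concatenate([out[:-n], blended, chunk[n:]])
        return out
    raise ValueError(f"unknown combination method: {method!r}")
```

Crossfading overlaps the chunks, so the result is shorter than plain concatenation by the fade length at each joint. Silence padding makes it longer by the gap length.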
Priority-based model detection:
Console output shows source:
📦 Using BUNDLED ChatterBox (self-contained)
📦 Loading from bundled models: ./models/chatterbox
✅ ChatterboxTTS model loaded from bundled!
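Priority-based detection can be sketched as checking candidate folders in order and using the first one that holds all five required files. The order here (bundled `models/chatterbox` first, then `ComfyUI/models/TTS/chatterbox`) is inferred from the console output above and is an assumption. The function names are also hypothetical.

```python
from pathlib import Path
from typing import Iterable, Optional

# File list from the "Required files" list in this README.
REQUIRED_FILES = ("conds.pt", "s3gen.pt", "t3_cfg.pt", "tokenizer.json", "ve.pt")


def find_model_dir(candidates: Iterable[tuple[str, Path]]) -> Optional[tuple[str, Path]]:
    """Return the first (label, directory) that contains every required file."""
    for label, directory in candidates:
        if all((directory / name).is_file() for name in REQUIRED_FILES):
            return label, directory
    return None


def default_candidates(node_dir: Path, comfy_models_dir: Path):
    # Assumed priority order: bundled models first, then ComfyUI's models folder.
    return [
        ("bundled", node_dir / "models" / "chatterbox"),
        ("comfyui", comfy_models_dir / "TTS" / "chatterbox"),
    ]
```

For example, `find_model_dir(default_candidates(Path(__file__).parent, Path("ComfyUI/models")))` would return the bundled folder when it is complete. Requiring every file avoids selecting a half-populated folder.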
Long Text with Smart Chunking:
Text Input (2000+ chars) → ChatterBox TTS (chunking enabled) → PreviewAudio
Voice Cloning with Recording:
🎤 Voice Capture → ChatterBox TTS (reference_audio) → PreviewAudio
Voice Conversion Pipeline:
🎤 Voice Capture (source) → ChatterBox VC ← 🎤 Voice Capture (target)
Complete Advanced Pipeline:
Long Text Input → ChatterBox TTS (with voice reference) → PreviewAudio
                                    ↘ ChatterBox VC ← 🎤 Target Voice Recording
For Long Articles/Books:
max_chars_per_chunk=600, chunk_combination_method=silence_padding, silence_between_chunks_ms=200
For Natural Speech:
max_chars_per_chunk=400, chunk_combination_method=auto (default – works well)
For Fast Processing:
max_chars_per_chunk=800, chunk_combination_method=concatenate
For Smooth Audio:
max_chars_per_chunk=300, chunk_combination_method=crossfade
General Recording:
silence_threshold=0.01, silence_duration=2.0 (default settings)
Noisy Environment:
Higher silence_threshold (~0.05) to ignore background noise
Longer silence_duration (~3.0) to avoid cutting off speech
Quiet Environment:
Lower silence_threshold (~0.005) for more sensitive detection
Shorter silence_duration (~1.0) to stop recording sooner
General Use:
exaggeration=0.5, cfg_weight=0.5 (default settings work well)
Expressive Speech:
Lower cfg_weight (~0.3) + higher exaggeration (~0.7)
Unlike many TTS systems:
Sentence Boundary Detection:
Sentence endings (. ! ?) followed by proper spacing
Long Sentence Handling:
Examples:
Input: "This is a very long article about artificial intelligence and machine learning. It contains multiple sentences and complex punctuation, including lists, quotes, and technical terms. The enhanced chunking system will split this intelligently."
Output: 3 well-formed chunks with natural boundaries
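Here is a simplified version of the sentence-boundary rule. It splits after `.`, `!` or `?` when followed by whitespace, then greedily packs whole sentences into chunks of at most `max_chars_per_chunk` characters. The word-level fallback for over-long sentences and the function names are assumptions, not the node's code. With `max_chars_per_chunk=110`, the example input above splits into three chunks, one per sentence.

```python
import re

# Split after ., ! or ? when followed by whitespace ("proper spacing").
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_chunks(text: str, max_chars_per_chunk: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars_per_chunk."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text.strip()) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        if len(sentence) <= max_chars_per_chunk:
            pieces = [sentence]
        else:
            pieces = _split_long(sentence, max_chars_per_chunk)
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars_per_chunk:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def _split_long(sentence: str, limit: int) -> list[str]:
    """Hypothetical fallback: pack words into pieces of at most `limit` chars."""
    pieces, current = [], ""
    for word in sentence.split():
        while len(word) > limit:  # a single oversized "word": hard-cut it
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces
```

A naive punctuation rule like this also splits after abbreviations such as "Dr.", so a production splitter usually needs an exception list.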
MIT License – Same as ChatterboxTTS
Note: The original ChatterBox model includes Resemble AI’s Perth watermarking system for responsible AI usage. This ComfyUI integration includes the Perth dependency but has watermarking disabled by default to ensure maximum compatibility. Users can re-enable watermarking by modifying the code if needed, while maintaining the full quality and capabilities of the underlying TTS model.