ComfyUI-FireRedTTS

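41 stars | 3 forks | Python | updated 2025-09-16

Tags: multi-speaker TTS, real-time streaming, emotional prosody, zero-shot voice cloning

Integrates FireRedTTS-2 into ComfyUI, supporting multi-speaker, real-time, emotionally expressive TTS, streaming generation, and stable long-text output, with automatic model download and device adaptation.

Use cases: real-time speech synthesis for chatbots, podcasts, or long-form character voice-over.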
requirements.txt

    # Core PyTorch dependencies
    torch>=2.0.0
    torchaudio>=2.0.0
    # FireRedTTS2 model dependencies
    torchtune>=0.1.0
    transformers>=4.30.0
    huggingface_hub>=0.16.0
    # Audio processing and utilities
    einops>=0.6.0
    tqdm>=4.64.0
    # Additional dependencies for model functionality
    numpy>=1.21.0

README

A ComfyUI integration for FireRedTTS‑2, a real-time multi-speaker TTS system enabling high-quality, emotionally expressive dialogue and monologue synthesis. Leveraging a streaming architecture and context-aware prosody modeling, it supports natural speaker turns and stable long-form generation, ideal for interactive chat and podcast applications.

Features

  • Dialogue Generation: Multi-speaker conversation audio generation
  • Monologue Generation: Single-speaker narrative audio generation
  • Voice Cloning: Zero-shot voice cloning functionality
  • Multi-language Support: Chinese, English, Japanese, Korean, French, German, Russian
  • Automatic Model Download: Models download automatically on first use
  • Device Adaptive: Automatically selects optimal device (CUDA/MPS/CPU)
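
The device fallback in the last feature can be sketched as follows, assuming the usual preference order of CUDA, then MPS, then CPU. The names `pick_device` and `detect_device` are illustrative and not taken from this repository; the actual node may select devices differently.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name given what the machine offers."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable (torch is imported lazily)."""
    import torch

    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)
```

Keeping the decision in a pure function makes the fallback order easy to check without a GPU.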

Installation

Method 1: ComfyUI Manager (Recommended)

  • Open ComfyUI Manager
  • Search for “ComfyUI-FireRedTTS”
  • Click Install

Method 2: Manual Installation

  • Clone this repository into your ComfyUI custom nodes directory:

    cd ComfyUI/custom_nodes
    git clone https://github.com/1038lab/ComfyUI-FireRedTTS.git

  • Install dependencies:

    cd ComfyUI-FireRedTTS
    pip install -r requirements.txt

  • Restart ComfyUI

Model Download

On first use, the FireRedTTS2 model is downloaded automatically from Hugging Face:

  • Model source: FireRedTeam/FireRedTTS2
  • Storage location: ComfyUI\models\TTS\FireRedTTS2
  • Download size: ~2GB

A progress bar is shown during the download. Once complete, the model is cached for future use.
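
If the automatic download is not convenient (for example on an offline machine), the model can be fetched ahead of time. The following is a minimal sketch, not the node's own download code: `model_dir` and `ensure_model` are hypothetical helpers, and treating a non-empty folder as "already downloaded" is an assumption. Only `snapshot_download` from `huggingface_hub` is a real library call.

```python
from pathlib import Path

REPO_ID = "FireRedTeam/FireRedTTS2"  # model source named in this README


def model_dir(comfy_root: str) -> Path:
    """Build the storage location listed above: models/TTS/FireRedTTS2."""
    return Path(comfy_root) / "models" / "TTS" / "FireRedTTS2"


def ensure_model(comfy_root: str) -> Path:
    """Download the model unless the target folder already has content."""
    target = model_dir(comfy_root)
    if target.is_dir() and any(target.iterdir()):
        return target  # assumption: a non-empty folder means a finished download
    from huggingface_hub import snapshot_download  # lazy import

    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target
```

Calling `ensure_model("ComfyUI")` from the directory that contains your ComfyUI folder would place the files where the node is documented to look for them.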

Nodes

FireRedTTS2 Dialogue Node

Generates multi-speaker dialogue audio.

Inputs:

  • text_list (STRING): Dialogue text with speaker tags ([S1], [S2])
  • temperature (FLOAT): Sampling temperature; controls generation randomness (0.1-2.0, default: 0.9)
  • topk (INT): Number of top candidates to sample from (1-100, default: 30)
  • S1 (AUDIO, optional): Reference audio for Speaker 1
  • S1_text (STRING, optional): Reference text for Speaker 1
  • S2 (AUDIO, optional): Reference audio for Speaker 2
  • S2_text (STRING, optional): Reference text for Speaker 2

Outputs:

  • audio (AUDIO): Generated dialogue audio
  • sample_rate (INT): Audio sample rate (24000 Hz)

FireRedTTS2 Monologue Node

Generates single-speaker monologue audio.

Inputs:

  • text (STRING): Input text content
  • temperature (FLOAT): Sampling temperature (0.1-2.0, default: 0.75)
  • topk (INT): Number of top candidates to sample from (1-100, default: 20)
  • prompt_wav (STRING, optional): Reference audio file path
  • prompt_text (STRING, optional): Reference text content

Outputs:

  • audio (AUDIO): Generated monologue audio
  • sample_rate (INT): Audio sample rate (24000 Hz)
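
For readers who want to see how such inputs map onto ComfyUI's custom-node convention, here is a hedged sketch of a declaration for the Dialogue node. The class name, the `step` and `multiline` options, the `CATEGORY` value, and the `generate` method name are assumptions for illustration; only the input names, ranges, and defaults come from the lists above.

```python
class FireRedTTS2DialogueSketch:
    """Hypothetical node declaration; not the repository's actual class."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "text_list": ("STRING", {"multiline": True, "default": ""}),
                "temperature": (
                    "FLOAT",
                    {"default": 0.9, "min": 0.1, "max": 2.0, "step": 0.05},
                ),
                "topk": ("INT", {"default": 30, "min": 1, "max": 100}),
            },
            "optional": {
                "S1": ("AUDIO",),
                "S1_text": ("STRING", {"default": ""}),
                "S2": ("AUDIO",),
                "S2_text": ("STRING", {"default": ""}),
            },
        }

    RETURN_TYPES = ("AUDIO", "INT")
    RETURN_NAMES = ("audio", "sample_rate")
    FUNCTION = "generate"      # assumed method name
    CATEGORY = "audio/tts"     # assumed menu category

    def generate(self, text_list, temperature, topk,
                 S1=None, S1_text="", S2=None, S2_text=""):
        # Synthesis itself is out of scope for this sketch.
        raise NotImplementedError
```

Optional inputs default to `None` or an empty string, so the node can run without any reference audio connected.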

Usage

Speaker Tag Format

Use square brackets to mark different speakers in dialogue text:

    [S1]Hello, what a nice day![S2]Yes, perfect for a walk.[S1]Shall we go to the park?[S2]Great idea!

Supported speaker tags:

  • [S1]: Speaker 1
  • [S2]: Speaker 2
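
To check tagged text before sending it to the node, the tag format can be parsed with a short regular expression. This is an illustrative helper (`split_turns` is not part of the repository), and it accepts any `[S<number>]` tag; restricting to the supported `[S1]` and `[S2]` is left to the caller.

```python
import re
from typing import List, Tuple

TAG = re.compile(r"\[(S\d+)\]")


def split_turns(text: str) -> List[Tuple[str, str]]:
    """Split '[S1]Hi[S2]Hello' into [('S1', 'Hi'), ('S2', 'Hello')]."""
    parts = TAG.split(text)
    # parts is [text before first tag, tag, utterance, tag, utterance, ...]
    if parts[0].strip():
        raise ValueError("text must start with a speaker tag such as [S1]")
    turns = []
    for speaker, utterance in zip(parts[1::2], parts[2::2]):
        turns.append((speaker, utterance.strip()))
    return turns
```

A turn list with a single speaker, or with an empty utterance, is a quick sign that a tag was mistyped.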

Voice Cloning Setup

For voice cloning, provide both audio and text for each speaker:

Speaker 1 (S1):

  • Connect reference audio to the S1 input
  • Enter the reference text in the S1_text field

Speaker 2 (S2):

  • Connect reference audio to the S2 input
  • Enter the reference text in the S2_text field
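
The "both audio and text" rule can be expressed as a small check. This is a sketch under the assumption that an unconnected audio input arrives as `None` and an unused text field as an empty string; `check_reference` is a hypothetical helper, not a function from this repository.

```python
from typing import Any, Optional


def check_reference(speaker: str, audio: Optional[Any], text: str) -> bool:
    """Return True if cloning is configured for this speaker, False if unused.

    Raises ValueError when only one of audio/text is supplied.
    """
    has_audio = audio is not None
    has_text = bool(text and text.strip())
    if has_audio != has_text:
        missing = "reference text" if has_audio else "reference audio"
        raise ValueError(f"{speaker}: {missing} is missing; provide both or neither")
    return has_audio
```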

Examples

Basic Dialogue Generation

  • Add a “FireRedTTS2 Dialogue” node
  • Enter the dialogue in text_list:

    [S1]Welcome to our podcast![S2]Today we’ll discuss AI development.[S1]That’s a fascinating topic indeed.

  • Adjust the temperature and topk parameters
  • Connect the audio output to a preview or save node

Voice Cloning Dialogue

  • Prepare reference audio files for each speaker
  • Connect the Speaker 1 reference audio to the S1 input
  • Enter the Speaker 1 reference text in S1_text:

    This is a voice sample for speaker one

  • Connect the Speaker 2 reference audio to the S2 input
  • Enter the Speaker 2 reference text in S2_text:

    This is a voice sample for speaker two

Monologue Generation

  • Add a “FireRedTTS2 Monologue” node
  • Enter the long text content in the text field
  • Optionally provide prompt_wav and prompt_text for voice cloning
  • Adjust the parameters and generate audio

Parameter Guide

Temperature

  • Low (0.1-0.5): More stable, consistent speech
  • Medium (0.6-1.0): Balanced stability and naturalness
  • High (1.1-2.0): More variation and expressiveness, may be unstable

TopK

  • Low (1-20): Conservative sampling, more stable speech
  • Medium (21-50): Balanced choice
  • High (51-100): More diverse sampling, increased variation
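
To see why these two settings trade stability for variation, here is a generic sketch of top-k filtering combined with temperature scaling over a list of scores. It illustrates the standard technique only; whether FireRedTTS2 applies the two steps in exactly this way is an assumption, and `sampling_probs` is an illustrative name.

```python
import math
from typing import List


def sampling_probs(logits: List[float], temperature: float, topk: int) -> List[float]:
    """Probabilities after top-k filtering and temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    k = max(1, min(topk, len(logits)))
    # Indices of the k largest logits; everything else gets probability 0.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = {i: logits[i] / temperature for i in keep}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {i: math.exp(v - peak) for i, v in scaled.items()}
    total = sum(weights.values())
    return [weights.get(i, 0.0) / total for i in range(len(logits))]
```

A lower temperature concentrates probability on the highest-scoring candidate, and a smaller topk removes low-scoring candidates entirely, which is why both push the output toward more stable speech.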

Troubleshooting

Common Issues

Q: Model download fails

A: Check your network connection and access to Hugging Face. Try a proxy or a mirror site.

Q: CUDA out of memory

A:

  • Reduce input text length
  • Lower batch size
  • Use CPU mode by setting device="cpu" in code

Q: Poor audio quality

A:

  • Check that the input text format is correct
  • Adjust the temperature parameter (recommended 0.7-1.0)
  • Ensure the reference audio quality is good (if using voice cloning)

Q: Speaker tags not working

A:

  • Ensure correct tag format: [S1], [S2], etc.
  • Check for extra spaces around tags
  • Confirm the text contains the corresponding speaker tags

Q: Node loading fails

A:

  • Check that dependencies are properly installed
  • Verify ComfyUI version compatibility
  • Check the console for error messages
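
When a node fails to load, a quick way to check dependencies is to compare installed versions against the minimums in requirements.txt. The script below is a sketch rather than part of the repository: `MINIMUMS` copies the pins from requirements.txt, the version parsing is deliberately simplified (leading numeric components only), and `find_problems` is an illustrative name.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List, Tuple

MINIMUMS: Dict[str, str] = {
    "torch": "2.0.0",
    "torchaudio": "2.0.0",
    "torchtune": "0.1.0",
    "transformers": "4.30.0",
    "huggingface_hub": "0.16.0",
    "einops": "0.6.0",
    "tqdm": "4.64.0",
    "numpy": "1.21.0",
}


def as_tuple(ver: str) -> Tuple[int, ...]:
    """Keep leading numeric parts, padded to three: '2.1.0+cu121' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    nums = [int(p) for p in match.group().split(".")] if match else []
    return tuple(nums + [0] * (3 - len(nums)))


def find_problems(lookup: Callable[[str], str] = version) -> List[str]:
    """Return one message per missing or outdated package."""
    problems = []
    for name, minimum in MINIMUMS.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (need >={minimum})")
            continue
        if as_tuple(installed) < as_tuple(minimum):
            problems.append(f"{name}: {installed} is older than {minimum}")
    return problems
```

Run `print("\n".join(find_problems()) or "all dependencies satisfied")` with the same Python interpreter that ComfyUI uses, since a separate environment may report different packages.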

Performance Optimization

Memory Optimization:

  • Long texts are automatically split for processing
  • Model instances are cached and reused
  • Recommended length for a single text: under 500 characters

Speed Optimization:

  • The first use requires a model download; subsequent uses are faster
  • GPU acceleration significantly improves generation speed
  • Batch processing multiple short texts is more efficient than processing a single long text
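
To keep each input within the recommended length, long text can be pre-split at sentence boundaries. The following is a sketch of one possible approach, not the node's internal splitter: `split_text` is an illustrative name, the punctuation set is an assumption, and sentences are rejoined with a single space.

```python
import re
from typing import List

# Split after CJK sentence punctuation, or after ASCII punctuation plus whitespace.
SENTENCE_END = re.compile(r"(?<=[。!?])|(?<=[.!?])\s+")


def split_text(text: str, limit: int = 500) -> List[str]:
    """Pack whole sentences into chunks of at most `limit` characters."""
    sentences = [s.strip() for s in SENTENCE_END.split(text) if s and s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be fed to the Monologue node separately, which also matches the note above about short texts being more efficient than one long text.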

System Requirements

Minimum:

  • Python 3.8+
  • 4GB RAM
  • 2GB storage space (for models)

Recommended:

  • Python 3.9+
  • 8GB+ RAM
  • NVIDIA GPU (4GB+ VRAM)
  • SSD storage

Support

If you encounter issues, please check that:

  • Dependencies are fully installed
  • Models downloaded correctly
  • Input format meets requirements
  • System resources are sufficient

For more technical details, refer to the project source code and the official FireRedTTS2 documentation.