ComfyUI-FireRedTTS

★ 41

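Multi-speaker TTS · Real-time streaming · Emotional prosody · Zero-shot voice cloning

Integrates FireRedTTS-2 into ComfyUI, with real-time, emotionally expressive multi-speaker TTS, streaming generation, and stable long-text output, plus automatic model download and device adaptation.

💡 Real-time speech synthesis for chatbots, podcasts, or long-form character voice-over.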

🍴 3 Forks · 💻 Python · 🔄 2025-09-16

🔗 Original on GitHub

📦 Netdisk download

Copy the link below and open it in Quark Netdisk to download:

https://pan.quark.cn/s/8f9eee5e2cdb

📦 requirements.txt

# Core PyTorch dependencies
torch>=2.0.0
torchaudio>=2.0.0
# FireRedTTS2 model dependencies
torchtune>=0.1.0
transformers>=4.30.0
huggingface_hub>=0.16.0
# Audio processing and utilities
einops>=0.6.0
tqdm>=4.64.0
# Additional dependencies for model functionality
numpy>=1.21.0

📄 README

ComfyUI-FireRedTTS

A ComfyUI integration for FireRedTTS‑2, a real-time multi-speaker TTS system enabling high-quality, emotionally expressive dialogue and monologue synthesis. Leveraging a streaming architecture and context-aware prosody modeling, it supports natural speaker turns and stable long-form generation, ideal for interactive chat and podcast applications.

Features

Dialogue Generation: Multi-speaker conversation audio generation

Monologue Generation: Single-speaker narrative audio generation

Voice Cloning: Zero-shot voice cloning functionality

Multi-language Support: Chinese, English, Japanese, Korean, French, German, Russian

Automatic Model Download: Models download automatically on first use

Device Adaptive: Automatically selects optimal device (CUDA/MPS/CPU)
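The device selection order listed above (CUDA, then MPS, then CPU) can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `pick_device` and `detect_device` are hypothetical, and the fallback to CPU when PyTorch is missing is an assumption made so the snippet runs anywhere.

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Return the preferred device name given what is available."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without PyTorch, report CPU.
        return "cpu"
    cuda_ok = torch.cuda.is_available()
    # Older PyTorch builds have no MPS backend, so guard the lookup.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(cuda_ok, mps_ok)


if __name__ == "__main__":
    print(detect_device())
```

Keeping the priority rule in a pure function makes the ordering easy to check without a GPU.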

Installation

Method 1: ComfyUI Manager (Recommended)

Open ComfyUI Manager

Search for “ComfyUI-FireRedTTS” in ComfyUI Manager

Click Install

Method 2: Manual Installation

Clone this repository to your ComfyUI custom nodes directory:

cd ComfyUI/custom_nodes

git clone https://github.com/1038lab/ComfyUI-FireRedTTS.git

Install dependencies:

cd ComfyUI-FireRedTTS

pip install -r requirements.txt

Restart ComfyUI

Model Download

On first use, the system will automatically download the FireRedTTS2 model from Hugging Face:

Model source: FireRedTeam/FireRedTTS2

Storage location: ComfyUI\models\TTS\FireRedTTS2

Download size: ~2GB

A progress bar will show during download. Once complete, the model is cached for future use.
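For a manual pre-download, the behavior described above can be approximated with `huggingface_hub.snapshot_download`. This is a sketch under assumptions: the helper names `model_dir` and `ensure_model` are hypothetical, and the "directory exists and is not empty" cache check is a simplification that may differ from what the node actually does.

```python
from pathlib import Path

REPO_ID = "FireRedTeam/FireRedTTS2"  # model source named in this README


def model_dir(comfy_root: str) -> Path:
    """Build the storage location given in this README."""
    return Path(comfy_root) / "models" / "TTS" / "FireRedTTS2"


def ensure_model(comfy_root: str) -> Path:
    """Download the model only if the target folder is missing or empty."""
    target = model_dir(comfy_root)
    # Crude cache check (assumption): a partial download would also pass it.
    if target.is_dir() and any(target.iterdir()):
        return target
    from huggingface_hub import snapshot_download  # imported lazily

    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target
```

Calling `ensure_model("ComfyUI")` from the directory that contains your ComfyUI folder would place the files where the node expects them, assuming the node looks only at that path.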

Nodes

FireRedTTS2 Dialogue Node

Generates multi-speaker dialogue audio.

Inputs:

text_list (STRING): Dialogue text with speaker tags ([S1], [S2])

temperature (FLOAT): Controls generation randomness (0.1-2.0, default: 0.9)

topk (INT): Controls sampling range (1-100, default: 30)

S1 (AUDIO, optional): Reference audio for Speaker 1

S1_text (STRING, optional): Reference text for Speaker 1

S2 (AUDIO, optional): Reference audio for Speaker 2

S2_text (STRING, optional): Reference text for Speaker 2

Outputs:

audio (AUDIO): Generated dialogue audio

sample_rate (INT): Audio sample rate (24000Hz)
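For readers unfamiliar with how ComfyUI nodes declare their sockets, the inputs and outputs above map onto the usual `INPUT_TYPES` / `RETURN_TYPES` convention roughly as shown below. This is a hypothetical skeleton, not the project's actual class: the class name, the `step` value, the `CATEGORY` string, and the multiline settings are assumptions.

```python
class DialogueNodeSketch:
    """Hypothetical skeleton mirroring the documented inputs and outputs."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "text_list": ("STRING", {"multiline": True, "default": ""}),
                # Ranges and defaults are taken from the input list above;
                # the step size is an assumption.
                "temperature": ("FLOAT", {"default": 0.9, "min": 0.1, "max": 2.0, "step": 0.05}),
                "topk": ("INT", {"default": 30, "min": 1, "max": 100}),
            },
            "optional": {
                "S1": ("AUDIO",),
                "S1_text": ("STRING", {"multiline": True, "default": ""}),
                "S2": ("AUDIO",),
                "S2_text": ("STRING", {"multiline": True, "default": ""}),
            },
        }

    RETURN_TYPES = ("AUDIO", "INT")
    RETURN_NAMES = ("audio", "sample_rate")
    FUNCTION = "generate"
    CATEGORY = "audio/TTS"  # assumed category name

    def generate(self, text_list, temperature, topk,
                 S1=None, S1_text="", S2=None, S2_text=""):
        # The model call is intentionally omitted in this sketch.
        raise NotImplementedError("model call omitted in this sketch")
```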

FireRedTTS2 Monologue Node

Generates single-speaker monologue audio.

Inputs:

text (STRING): Input text content

temperature (FLOAT): Temperature parameter (0.1-2.0, default: 0.75)

topk (INT): TopK parameter (1-100, default: 20)

prompt_wav (STRING, optional): Reference audio file path

prompt_text (STRING, optional): Reference text content

Outputs:

audio (AUDIO): Generated monologue audio

sample_rate (INT): Audio sample rate (24000Hz)

Usage

Speaker Tag Format

Use square brackets to mark different speakers in dialogue text:

[S1]Hello, what a nice day![S2]Yes, perfect for a walk.[S1]Shall we go to the park?[S2]Great idea!

Supported speaker tags:

[S1] – Speaker 1

[S2] – Speaker 2
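To check a script before sending it to the node, the tag format above can be split into speaker turns with a small regular expression. This is an illustrative helper, not part of the node; the name `split_turns` is hypothetical, and it only recognizes the two tags listed above.

```python
import re

# Matches the two supported tags and captures the speaker id.
TAG = re.compile(r"\[(S[12])\]")


def split_turns(text: str):
    """Return a list of (speaker, utterance) pairs in order of appearance."""
    parts = TAG.split(text)
    # parts[0] is any text before the first tag; after that the list
    # alternates between speaker ids and the text that follows them.
    turns = []
    for speaker, utterance in zip(parts[1::2], parts[2::2]):
        utterance = utterance.strip()
        if utterance:
            turns.append((speaker, utterance))
    return turns


if __name__ == "__main__":
    demo = "[S1]Hello, what a nice day![S2]Yes, perfect for a walk."
    for speaker, line in split_turns(demo):
        print(speaker, line)
```

An empty result for a non-empty script is a quick sign that the tags are malformed.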

Voice Cloning Setup

For voice cloning, provide both audio and text for each speaker:

Speaker 1 (S1):

Connect reference audio to S1 input

Enter reference text in S1_text field

Speaker 2 (S2):

Connect reference audio to S2 input

Enter reference text in S2_text field
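The "both audio and text" rule above can be expressed as a small pre-flight check. This is a sketch only: `check_reference` is a hypothetical helper, and whether the node itself raises an error or silently ignores a half-filled reference is not stated in this README.

```python
def check_reference(speaker: str, audio, text) -> bool:
    """Return True if cloning is fully configured, False if not configured.

    Raises ValueError when only one of the two reference inputs is given.
    """
    has_audio = audio is not None
    has_text = bool(text and text.strip())
    if has_audio != has_text:
        missing = "text" if has_audio else "audio"
        raise ValueError(
            f"{speaker}: reference {missing} is missing; provide both or neither"
        )
    return has_audio
```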

Examples

Basic Dialogue Generation

Add “FireRedTTS2 Dialogue” node

Input in text_list:

```
[S1]Welcome to our podcast![S2]Today we'll discuss AI development.[S1]That's a fascinating topic indeed.
```

Adjust temperature and topk parameters

Connect audio output to preview or save node

Voice Cloning Dialogue

Prepare reference audio files for each speaker

Connect Speaker 1 reference audio to S1 input

Enter Speaker 1 reference text in S1_text:

```
This is a voice sample for speaker one
```

Connect Speaker 2 reference audio to S2 input

Enter Speaker 2 reference text in S2_text:

```
This is a voice sample for speaker two
```

Monologue Generation

Add “FireRedTTS2 Monologue” node

Input long text content in text field

Optionally provide prompt_wav and prompt_text for voice cloning

Adjust parameters and generate audio

Parameter Guide

Temperature

Low (0.1-0.5): More stable, consistent speech

Medium (0.6-1.0): Balanced stability and naturalness

High (1.1-2.0): More variation and expressiveness, may be unstable

TopK

Low (1-20): Conservative sampling, more stable speech

Medium (21-50): Balanced choice

High (51-100): More diverse sampling, increased variation
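To make the two parameters concrete, here is a generic temperature plus top-k sampler over a list of scores. It illustrates the standard technique only and is not FireRedTTS2's implementation; the function name `sample_top_k` and the toy scores are assumptions.

```python
import math
import random


def sample_top_k(logits, temperature=0.9, topk=30, rng=None):
    """Pick one index from `logits` using temperature and top-k sampling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    # Keep only the k highest-scoring candidates.
    k = max(1, min(topk, len(logits)))
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Dividing by the temperature flattens (high T) or sharpens (low T)
    # the distribution before the softmax.
    scaled = [logits[i] / temperature for i in ranked]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(ranked, weights=weights, k=1)[0]


if __name__ == "__main__":
    scores = [0.1, 2.0, -1.0, 1.5]
    print(sample_top_k(scores, temperature=0.75, topk=2))
```

With `topk=1` the sampler always returns the highest-scoring candidate, which is why low TopK values give more stable speech; raising either parameter lets lower-scoring candidates through more often.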

Troubleshooting

Common Issues

Q: Model download fails

A: Check your network connection and Hugging Face access. Try using a proxy or a mirror site.

Q: CUDA out of memory

Reduce input text length

Lower batch size

Use CPU mode by setting device="cpu" in code

Q: Poor audio quality

Check that the input text format is correct

Adjust temperature parameter (recommended 0.7-1.0)

Ensure reference audio quality is good (if using voice cloning)

Q: Speaker tags not working

Ensure correct tag format: [S1], [S2], etc.

Check for extra spaces around tags

Confirm text contains corresponding speaker tags

Q: Node loading fails

Check that dependencies are properly installed

Verify ComfyUI version compatibility

Check console for error messages
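A quick way to check the first item is to ask Python which of the packages from requirements.txt are installed in the environment ComfyUI runs in. The helper name `missing_packages` is hypothetical, and this sketch only checks presence, not the minimum versions.

```python
from importlib import metadata

# Package names copied from requirements.txt.
REQUIRED = [
    "torch", "torchaudio", "torchtune", "transformers",
    "huggingface_hub", "einops", "tqdm", "numpy",
]


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that is not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_packages()
    print("Missing:", ", ".join(absent) if absent else "none")
```

Run it with the same Python interpreter that launches ComfyUI; a portable or virtual-environment install may not share packages with the system Python.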

Performance Optimization

Memory Optimization:

Long texts are automatically split for processing

Model instances are cached and reused

Recommended single text length: under 500 characters
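If you prefer to split long text yourself, a sentence-boundary chunker that respects the 500-character recommendation could look like this. It is a sketch under assumptions: the node's own splitting rule is not documented here, and `chunk_text` is a hypothetical helper.

```python
import re

# A sentence is a run of non-terminator characters followed by optional
# terminators (Western or CJK) and trailing whitespace.
SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*\s*")


def chunk_text(text: str, max_chars: int = 500):
    """Group whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole, so such a
    chunk can exceed the limit.
    """
    chunks, current = [], ""
    for sentence in SENTENCE.findall(text):
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += sentence
    if current.strip():
        chunks.append(current.strip())
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("One. Two. Three.", max_chars=10):
        print(repr(piece))
```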

Speed Optimization:

The first use requires a model download; subsequent uses are faster

GPU acceleration significantly improves generation speed

Batch processing multiple short texts is more efficient than processing a single long text

System Requirements

Minimum:

Python 3.8+

4GB RAM

2GB storage space (for models)

Recommended:

Python 3.9+

8GB+ RAM

NVIDIA GPU (4GB+ VRAM)

SSD storage

Support

If you encounter issues, please check:

Dependencies are fully installed

Models downloaded correctly

Input format meets requirements

System resources are sufficient

For more technical details, refer to the project source code and FireRedTTS2 official documentation.