# Core PyTorch dependencies
torch>=2.0.0
torchaudio>=2.0.0

# FireRedTTS2 model dependencies
torchtune>=0.1.0
transformers>=4.30.0
huggingface_hub>=0.16.0

# Audio processing and utilities
einops>=0.6.0
tqdm>=4.64.0

# Additional dependencies for model functionality
numpy>=1.21.0

A ComfyUI integration for FireRedTTS‑2, a real-time multi-speaker TTS system enabling high-quality, emotionally expressive dialogue and monologue synthesis. Leveraging a streaming architecture and context-aware prosody modeling, it supports natural speaker turns and stable long-form generation, ideal for interactive chat and podcast applications.
cd ComfyUI/custom_nodes
git clone https://github.com/1038lab/ComfyUI-FireRedTTS.git
cd ComfyUI-FireRedTTS
pip install -r requirements.txt
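
After installation it can help to confirm that the installed packages meet the minimum versions from the requirements list. The following is a minimal sketch, not part of the project: the helper names `version_tuple` and `check_requirements` are hypothetical, and the version comparison is deliberately simple (it only looks at the leading numeric components, so pre-release ordering is not handled the way a full version parser would handle it).

```python
import re
from importlib import metadata

# Minimum versions copied from the requirements list in this README.
REQUIREMENTS = {
    "torch": "2.0.0",
    "torchaudio": "2.0.0",
    "torchtune": "0.1.0",
    "transformers": "4.30.0",
    "huggingface_hub": "0.16.0",
    "einops": "0.6.0",
    "tqdm": "4.64.0",
    "numpy": "1.21.0",
}


def version_tuple(version):
    """Turn '2.1.0+cu121' into (2, 1, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_requirements(requirements=REQUIREMENTS):
    """Return a dict mapping each package name to a short status string."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = f"{installed} ok" if ok else f"{installed} < {minimum}"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

Run it with the same Python interpreter that ComfyUI uses; otherwise the report describes a different environment.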
On first use, the system automatically downloads the FireRedTTS2 model from Hugging Face and stores it in:
ComfyUI\models\TTS\FireRedTTS2

A progress bar is shown during the download. Once complete, the model is cached for future use.
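
If the automatic download fails, the model can be fetched by hand into the same folder. The sketch below is an illustration under stated assumptions, not the node's own download code: `model_dir` and `ensure_model` are hypothetical helpers, and the Hugging Face repository id is not given in this README, so it is left as a required argument. `snapshot_download` is the standard `huggingface_hub` call for fetching a whole repository.

```python
from pathlib import Path


def model_dir(comfyui_root):
    """Folder where this README says the model is stored."""
    return Path(comfyui_root) / "models" / "TTS" / "FireRedTTS2"


def ensure_model(comfyui_root, repo_id):
    """Download the model only if the target folder is missing or empty.

    repo_id is an assumption supplied by the caller; this README does not
    state which Hugging Face repository the node downloads from.
    """
    target = model_dir(comfyui_root)
    if target.is_dir() and any(target.iterdir()):
        return target  # already cached, nothing to do

    # Imported here so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Calling `ensure_model("ComfyUI", repo_id=...)` with the correct repository id should leave the files where the node expects them, so the next run skips the download.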
Generates multi-speaker dialogue audio.

Inputs:
- text_list (STRING): Dialogue text with speaker tags ([S1], [S2])
- temperature (FLOAT): Controls generation randomness (0.1-2.0, default: 0.9)
- topk (INT): Controls sampling range (1-100, default: 30)
- S1 (AUDIO, optional): Reference audio for Speaker 1
- S1_text (STRING, optional): Reference text for Speaker 1
- S2 (AUDIO, optional): Reference audio for Speaker 2
- S2_text (STRING, optional): Reference text for Speaker 2

Outputs:
- audio (AUDIO): Generated dialogue audio
- sample_rate (INT): Audio sample rate (24000 Hz)

Generates single-speaker monologue audio.

Inputs:
- text (STRING): Input text content
- temperature (FLOAT): Temperature parameter (0.1-2.0, default: 0.75)
- topk (INT): TopK parameter (1-100, default: 20)
- prompt_wav (STRING, optional): Reference audio file path
- prompt_text (STRING, optional): Reference text content

Outputs:
- audio (AUDIO): Generated monologue audio
- sample_rate (INT): Audio sample rate (24000 Hz)

Use square brackets to mark different speakers in dialogue text:
[S1]Hello, what a nice day![S2]Yes, perfect for a walk.[S1]Shall we go to the park?[S2]Great idea!
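
To make the tag format concrete, here is a small sketch that splits tagged dialogue text into (speaker, text) turns. It is an illustration only and not the parser the node uses: `split_dialogue` is a hypothetical helper, it recognizes only the `[S1]` and `[S2]` tags, and it ignores any text that appears before the first tag.

```python
import re

# A speaker tag such as [S1], followed by everything up to the next tag or the end.
TAG_PATTERN = re.compile(r"\[(S[12])\](.*?)(?=\[S[12]\]|$)", re.DOTALL)


def split_dialogue(text_list):
    """Split '[S1]Hi[S2]Hello' into [('S1', 'Hi'), ('S2', 'Hello')]."""
    turns = []
    for speaker, text in TAG_PATTERN.findall(text_list):
        text = text.strip()
        if text:  # skip tags that carry no text
            turns.append((speaker, text))
    return turns


if __name__ == "__main__":
    sample = "[S1]Hello, what a nice day![S2]Yes, perfect for a walk."
    for speaker, text in split_dialogue(sample):
        print(speaker, "->", text)
```

A check like this is a quick way to spot a mistyped tag (for example `[s1]` or `(S1)`) before running a long generation, because the mistyped turn is folded into the previous speaker's text or dropped.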
Supported speaker tags:
- [S1] – Speaker 1
- [S2] – Speaker 2

For voice cloning, provide both audio and text for each speaker:

Speaker 1 (S1):
- S1 input
- S1_text field

Speaker 2 (S2):
- S2 input
- S2_text field

text_list:
```
[S1]Welcome to our podcast![S2]Today we’ll discuss AI development.[S1]That’s a fascinating topic indeed.
```
temperature and topk parameters

S1 input

S1_text:
```
This is a voice sample for speaker one
```

S2 input

S2_text:
```
This is a voice sample for speaker two
```
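
For readers unsure how the temperature and topk inputs interact, the following generic sketch shows temperature-scaled top-k sampling over a list of scores. It illustrates the general technique only; it is not taken from FireRedTTS2, and the function name `sample_top_k` and its signature are hypothetical. The defaults mirror the dialogue node's documented defaults (0.9 and 30).

```python
import math
import random


def sample_top_k(logits, temperature=0.9, topk=30, rng=random):
    """Pick one index from logits using temperature-scaled top-k sampling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Keep only the indices of the topk highest-scoring candidates.
    k = max(1, min(topk, len(logits)))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

    # Lower temperature sharpens the distribution, higher temperature flattens it.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]

    return rng.choices(top, weights=weights, k=1)[0]


if __name__ == "__main__":
    scores = [0.1, 2.0, -1.0, 1.5]
    print(sample_top_k(scores, temperature=0.9, topk=2))
```

In this scheme a smaller topk or a lower temperature makes the output more predictable, while larger values allow more variation at the cost of stability.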
text field

prompt_wav and prompt_text for voice cloning

Q: Model download fails
A: Check your network connection and Hugging Face access. Try using a proxy or a mirror site.
Q: CUDA out of memory
A:
device="cpu" in code
Q: Poor audio quality
A:
Q: Speaker tags not working
A:
[S1], [S2], etc.
Q: Node loading fails
A:
Memory Optimization:
Speed Optimization:
Minimum:
Recommended:
If you encounter issues, please check:
For more technical details, refer to the project source code and FireRedTTS2 official documentation.