ComfyUI-Orpheus-TTS

ComfyUI-Orpheus-TTS
★ 10

语音合成情感表达多声线音频特效
为ComfyUI提供高质量的Orpheus TTS语音合成,支持情感表达、多声线与长文本分段,并内置变调、变速、混响等专业音效。
💡 生成自然、有情感的朗读和配音,支持长文本与音效调节。
🍴 4 Forks💻 Python🔄 2025-05-03
📦
网盘下载
复制链接后前往夸克网盘下载
https://pan.quark.cn/s/8f9eee5e2cdb
📦 requirements.txt
torch
numpy
soundfile
transformers
huggingface_hub
nltk>=3.8.0
snac==0.1.0
librosa>=0.10.2
image
📄 README

ComfyUI-Orpheus-TTS

This project adds high-quality Text-to-Speech capabilities to ComfyUI using the Orpheus TTS model. Create natural-sounding voices with emotional expressions, multilingual support, and audio effects.

Features

  • 🎙️ High-quality, natural-sounding speech synthesis
  • 🎭 Support for emotional expressions and paralinguistic elements
  • 👥 Multiple voice options (tara, leah, jess, leo, dan, mia, zac, zoe, etc.)
  • 📝 Long text handling with automatic chunking for consistent output
  • 🎛️ Professional audio effects:
  • Pitch shifting (-12 to +12 semitones)
  • Speed adjustment (0.5x to 2.0x speed)
  • Volume control with anti-clipping protection
  • Audio normalization option
  • Reverb with adjustable room size and amount
  • Echo with configurable delay and decay
  • 🌐 Optional support for private Hugging Face models
  • 💻 Cross-platform: Works on Windows, Linux/WSL, and macOS
  • Installation

    1. Install the Extension

    Clone this repository into your ComfyUI’s custom_nodes directory:

    cd ComfyUI/custom_nodes
    git clone https://github.com/ShmuelRonen/ComfyUI-Orpheus-TTS.git

    2. Install Required Python Dependencies

    pip install torch numpy soundfile transformers huggingface_hub nltk snac

    For WSL 2, you may need to install directly from the GitHub repository:

    pip install git+https://github.com/hubertsiuzdak/snac.git

    3. Install SoX (Required for Audio Effects)

    Windows

  • Download SoX for Windows from the official SourceForge page
  • Download the .exe installer (e.g., sox-14.4.2-win32.exe)
  • Run the installer:
  • Follow the installation prompts
  • Important: Note the installation directory (default is usually C:\Program Files (x86)\sox-14-4-2\)
  • No need to add to PATH – the extension uses the direct path to SoX
  • WSL 2 (Ubuntu)

    sudo apt-get update
    sudo apt-get install sox

    macOS

    brew install sox

    4. Restart ComfyUI

    After installing all required components, restart ComfyUI to load the extension.

    Configuration

    Hugging Face Authentication (Optional)

    To access private models on Hugging Face, create a file named hf_config.json in the extension directory and insert your HF Token KEY:

    “`json

    {

    “token”: “YOUR_HUGGING_FACE_TOKEN_HERE”

    }

    “`

  • Save the file and restart ComfyUI
  • Your token will be used to authenticate with Hugging Face when downloading models. This is only required if you’re using private models or if you need higher rate limits.

    Nodes

    Orpheus TTS Model Loader

    Loads the required models for Orpheus TTS.

    Inputs:

  • snac_model_path (optional): Path to SNAC model (default: “hubertsiuzdak/snac_24khz”)
  • orpheus_model_path (optional): Path to Orpheus model (default: “canopylabs/orpheus-3b-0.1-ft”)
  • Outputs:

  • model: Model reference to be passed to the generate node
  • Orpheus TTS Generate

    Generates speech from text input.

    Inputs:

  • model: Model reference from the loader node
  • text: The text to convert to speech
  • voice: Voice style to use (tara, leah, jess, leo, dan, mia, zac, zoe, etc.)
  • language (optional): Language for multilingual output (en, fr, es, etc.)
  • max_chunk_size (optional): Maximum chunk size for long text processing
  • Outputs:

  • audio: Audio data to be passed to preview or effects nodes
  • Orpheus Audio Effects

    Applies high-quality audio processing to the generated speech.

    Inputs:

  • audio: Audio data from the generate node
  • pitch_shift: Semitone adjustment (-12 to +12)
  • speed_factor: Playback speed modifier (0.5x to 2.0x)
  • sox_path (optional): Custom path to SoX executable
  • gain_db (optional): Volume adjustment in decibels
  • use_limiter (optional): Enable/disable limiter for positive gain
  • normalize_audio (optional): Enable/disable audio normalization
  • add_reverb (optional): Enable/disable reverb effect
  • reverb_amount (optional): Reverb intensity
  • reverb_room_scale (optional): Size of virtual space
  • add_echo (optional): Enable/disable echo effect
  • echo_delay (optional): Time between echo repetitions
  • echo_decay (optional): How quickly echo fades
  • Outputs:

  • audio: Processed audio data
  • Looking at the README section you provided, I’ll expand it to include information about the different element position options, including the new pipe feature:

    Paralinguistic Elements

    You can add expressive elements to the speech by inserting these tags:

  • – Natural laughter
  • – Light, subtle laughter
  • – Exhaling with emotion
  • – Clearing throat
  • – Subtle nasal sound
  • – Low, grumbling sound
  • – Tired exhale
  • – Sudden intake of breath
  • Element Position Options

    The Element Position dropdown provides different ways to add these paralinguistic elements to your text:

  • None – No automatic element insertion. You can manually type the element tags in your text where desired.
  • “`

    I can’t believe it! That’s amazing!

    “`

  • Append – Automatically adds the selected element at the end of your text.
  • “`

    Input: “That’s amazing!”

    Output: “That’s amazing!

    “`

  • Prepend – Automatically adds the selected element at the beginning of your text.
  • “`

    Input: “I need to get back to work.”

    Output: “ I need to get back to work.”

    “`

  • Pipe – Replace pipe characters (|) in your text with the selected element. This gives you precise control over element placement.
  • “`

    Input: “I can’t believe it! | That’s the funniest thing | I’ve heard all day.”

    Element: laugh

    Output: “I can’t believe it! That’s the funniest thing I’ve heard all day.”

    “`

    Examples:

    Manual placement (Element Position: None):

    I can't believe it! <laugh> That's the funniest thing I've heard all day.
    <sigh> But now I need to get back to work.

    Using pipe placeholders (Element Position: Pipe):

    Input: "Did you hear that? | It's hilarious! | I can't stop laughing!"
    Element: laugh
    Result: "Did you hear that? <laugh> It's hilarious! <laugh> I can't stop laughing!"

    Multiple elements in one text:

    <gasp> What was that? <pause> Did you hear something? <sigh> Maybe I'm just tired.

    Audio Effect Tips

    Volume Control

  • Gain Control: Use gain_db to increase or decrease volume without distortion
  • Positive values (0 to +20 dB): Increase volume with automatic clipping prevention
  • Negative values (-20 to 0 dB): Decrease volume
  • For best results with multiple effects, set gain last in your workflow
  • Normalization: Enable normalize_audio to automatically balance levels
  • Great for ensuring consistent volume across different voice samples
  • Applied before other effects for best results
  • Reverb

    Reverb adds a sense of space to your audio. Here are some suggested settings:

  • Small Room: reverb_amount = 20, reverb_room_scale = 25
  • Medium Room: reverb_amount = 40, reverb_room_scale = 50
  • Large Hall: reverb_amount = 70, reverb_room_scale = 80
  • Cathedral: reverb_amount = 90, reverb_room_scale = 95
  • Echo

    Echo creates repeating sound reflections. Good settings to try:

  • Subtle Echo: echo_delay = 0.3, echo_decay = 0.3
  • Moderate Echo: echo_delay = 0.5, echo_decay = 0.5
  • Canyon Echo: echo_delay = 1.0, echo_decay = 0.7
  • Effect Combinations

  • Phone Call: pitch_shift = 0, speed_factor = 1.0, add_reverb = True, reverb_amount = 10, reverb_room_scale = 10
  • Radio Announcer: pitch_shift = -2, speed_factor = 0.9, add_reverb = True, reverb_amount = 20, gain_db = 3
  • Stadium Announcement: pitch_shift = 0, speed_factor = 1.0, add_reverb = True, reverb_amount = 60, add_echo = True, echo_delay = 0.8
  • Child Voice: pitch_shift = 4, speed_factor = 1.1, gain_db = 2
  • Deep Voice: pitch_shift = -4, speed_factor = 0.9, gain_db = -2
  • Usage Examples

    Basic Text-to-Speech

  • Add “Orpheus TTS Model Loader”
  • Add “Orpheus TTS Generate”
  • Connect the model loader’s output to the generate node’s input
  • Enter your text and select voice options
  • Connect to “Preview Audio” node to hear the result
  • Advanced: TTS with Audio Effects

  • Add “Orpheus TTS Model Loader”
  • Add “Orpheus TTS Generate”
  • Add “Orpheus Audio Effects”
  • Connect in sequence: Model Loader → Generate → Audio Effects → Preview Audio
  • Adjust pitch shift and speed factor sliders
  • Cross-Platform Compatibility

    This extension has been tested and works on:

  • Windows 10/11
  • Linux (including WSL 2 on Windows)
  • macOS
  • Different environments may require specific setup steps:

    Windows Notes

  • SoX is automatically located in standard installation directories
  • If installed elsewhere, provide the full path in the effects node
  • WSL 2 Notes

  • Use pip install git+https://github.com/hubertsiuzdak/snac.git to ensure compatibility
  • SoX is automatically located through the system PATH
  • macOS Notes

  • Install SoX via Homebrew for best compatibility
  • SoX Troubleshooting

    Windows

    If you encounter issues with SoX:

  • Verify the SoX path in the “Orpheus Audio Effects” node:
  • Default: C:\Program Files (x86)\sox-14-4-2\sox.exe
  • If your installation is in a different location, provide the full path to sox.exe
  • Check if SoX is installed correctly:
  • Open Command Prompt
  • Run "C:\Program Files (x86)\sox-14-4-2\sox.exe" --version
  • If you get an error, reinstall SoX
  • WSL 2 (Ubuntu)

  • Verify SoX installation:
  • “`bash

    sox –version

    “`

  • If SoX is not found, install it:
  • “`bash

    sudo apt-get update

    sudo apt-get install sox

    “`

    Model Details

    This extension uses the following models:

  • Orpheus TTS: A powerful text-to-speech model developed by Canopy AI
  • SNAC Codec: A high-quality neural audio codec for voice synthesis
  • License

    This project uses models with their own licenses:

  • Orpheus Model: Canopy AI’s Orpheus-TTS
  • SNAC Model: Hubert Siuzdak’s model
  • Please consult these licenses for usage terms and restrictions.

    Credits

  • Original Orpheus TTS implementation by Canopy AI
  • SoX audio processing library: SoX – Sound eXchange
  • ComfyUI: ComfyUI