ComfyUI-Geeky-Kokoro-TTS

ComfyUI-Geeky-Kokoro-TTS
★ 49

文本转语音语音变声音色混合GPU加速
在 ComfyUI 中集成 Kokoro TTS 与高级语音变换,支持 54+ 音色、语音混合、引导变声与音高校正
💡 在 ComfyUI 流程中生成并实时调整自然语音
🍴 10 Forks💻 Python🔄 2026-03-22
📦
网盘下载
复制链接后前往夸克网盘下载
https://pan.quark.cn/s/e58c8376a81b
📦 requirements.txt
#
Geeky
Kokoro
TTS
Requirements
-
2025
Edition
#
Python
3.12
and
3.13
compatible
#
Updated
for
Kokoro
v0.19+
and
ComfyUI
v3.49+
#
Core
TTS
dependencies
kokoro>=0.9.4
soundfile>=0.12.1
torch>=2.0.0
tqdm>=4.64.0
einops>=0.6.0
#
Numeric
computing
(Python
3.12/3.13
compatible)
numpy>=1.24.0,<2.0.0
#
Audio
processing
dependencies
(for
Voice
Mod
features)
librosa>=0.10.0
scipy>=1.9.0,<2.0.0
resampy>=0.4.3
#
Language-specific
dependencies
pyopenjtalk>=0.3.0
#
For
Japanese
language
support
ordered-set>=4.1.0
#
For
Chinese
language
support
#
Optional
but
recommended
#
espeak-ng
(system
package,
install
via:
apt-get
install
espeak-ng
or
brew
install
espeak-ng)
📄 README

🔊 Geeky Kokoro TTS for ComfyUI – 2025 Edition (Does not work with python 3.13)

The most comprehensive Kokoro TTS implementation for ComfyUI with ALL 54+ voices across 9 languages, voice blending, and advanced voice modification effects.

[](https://www.python.org/downloads/)

[](https://www.python.org/downloads/)

[](https://github.com/comfyanonymous/ComfyUI)

[](LICENSE)

🌟 What’s New in 2025 Edition

Ignore the Advanced Voice Mod node for now, it’s an experimental thing currently.

Special Note: Advanced Voice Mod node currently under construction. Will not function as intended at the moment. Japanese voices do not work at the moment either, will require custom wheel.

Complete Rewrite from Ground Up

  • 🌍 54+ Voices: Complete support for all Kokoro-82M voices across 9 languages
  • 🎯 9 Languages: US English, UK English, Japanese (not working yet, needs a custom wheel build), Mandarin Chinese, Spanish, French, Hindi, Italian, Brazilian Portuguese
  • 🔀 Advanced Voice Blending: Mix any two voices with adjustable blend ratios
  • **🐍 Python 3.12: Fully tested and optimized for the latest Python versions
  • 📦 Modern Architecture: Completely rewritten following 2025 ComfyUI best practices
  • ⚡ Improved Performance: Better memory management and processing speed
  • 🛡️ Enhanced Reliability: Robust error handling and fallback mechanisms
  • Key Features

  • ✅ ALL 54+ Kokoro-82M voices (nothing left out!)
  • ✅ Voice blending with linear interpolation
  • NEW: Guided Voice Morphing – Use any audio file to guide voice transformation
  • NEW: Autotune-style Pitch Correction – Match pitch to reference audio
  • NEW: Advanced Spectral Morphing – Match tone, timbre, and character
  • NEW: 18 Voice Profiles – Professional presets for instant transformations
  • ✅ Advanced voice modification effects (pitch, formant, reverb, etc.)
  • ✅ Intelligent text chunking that preserves sentence order
  • ✅ GPU acceleration with automatic CPU fallback
  • ✅ Multi-language support with proper phoneme handling
  • ✅ Professional audio processing pipeline with Dynamic Time Warping
  • ✅ ComfyUI v3.49+ compatibility
  • 📋 Table of Contents

  • Installation
  • Complete Voice List
  • Usage Guide
  • Voice Blending
  • Guided Voice Morphing (NEW!)
  • Voice Modification Effects
  • Technical Details
  • Troubleshooting
  • Credits
  • 🔧 Installation

    Prerequisites

  • ComfyUI v3.49+ (or compatible version)
  • **Python 3.9, 3.10, 3.11, 3.12, (3.13+ not supported)
  • PyTorch 2.0+ (usually included with ComfyUI)
  • 4GB+ RAM (8GB recommended for longer texts)
  • Optional: CUDA-capable GPU for faster processing
  • Method 1: ComfyUI Manager (Recommended)

  • Open ComfyUI and navigate to “Manager”
  • Click “Install Custom Nodes”
  • Search for “Geeky Kokoro TTS”
  • Click “Install” and restart ComfyUI
  • Done! Nodes will appear in the “audio” category
  • Method 2: Manual Installation (Git Clone)

    # Navigate to your ComfyUI custom nodes directory
    
    cd ComfyUI/custom_nodes
    
    
    
    # Clone this repository
    
    git clone https://github.com/GeekyGhost/ComfyUI-Geeky-Kokoro-TTS.git
    
    
    
    # Navigate into the directory
    
    cd ComfyUI-Geeky-Kokoro-TTS
    
    
    
    # Install Python dependencies
    
    pip install -r requirements.txt
    
    
    
    # Optional: Run installation verification script
    
    python install.py
    

    Method 3: ComfyUI Portable (Windows)

    REM Navigate to custom nodes directory
    
    cd ComfyUI_windows_portable\ComfyUI\custom_nodes
    
    
    
    REM Clone repository
    
    git clone https://github.com/GeekyGhost/ComfyUI-Geeky-Kokoro-TTS.git
    
    
    
    REM Navigate into directory
    
    cd ComfyUI-Geeky-Kokoro-TTS
    
    
    
    REM Install with portable Python
    
    ..\..\..\python_embeded\python.exe -m pip install -r requirements.txt
    

    System Dependencies (Optional but Recommended)

    For best phoneme processing, install espeak-ng:

    Ubuntu/Debian:

    sudo apt-get update
    
    sudo apt-get install espeak-ng
    

    macOS:

    brew install espeak-ng
    

    Windows:

    Download and install from: https://github.com/espeak-ng/espeak-ng/releases

    🎭 Complete Voice List (54+ Voices)

    🇺🇸 US English Voices (20 voices)

    Female Voices (11)

    | Voice Name | Code | Character | Best For |

    |————|——|———–|———-|

    | Heart ❤️ | af_heart | Warm, friendly, natural | Narration, audiobooks, general purpose |

    | Bella 🔥 | af_bella | Energetic, dynamic, engaging | Marketing, announcements, enthusiastic content |

    | Nicole 🎧 | af_nicole | Clear, professional, articulate | Training videos, tutorials, instructional content |

    | Aoede 🎵 | af_aoede | Musical, expressive, artistic | Creative content, storytelling, entertainment |

    | Kore | af_kore | Balanced, versatile | General purpose, business content |

    | Sarah | af_sarah | Neutral, calm, reliable | Documentation, formal content, reports |

    | Nova ⭐ | af_nova | Bright, modern, upbeat | Social media, vlogs, casual content |

    | Sky ☁️ | af_sky | Soft, gentle, soothing | Meditation, relaxation, ASMR |

    | Alloy | af_alloy | Professional, authoritative | Corporate, presentations, business |

    | Jessica | af_jessica | Friendly, approachable | Customer service, help content, guides |

    | River 🌊 | af_river | Flowing, natural, smooth | Long-form narration, podcasts |

    Male Voices (9)

    | Voice Name | Code | Character | Best For |

    |————|——|———–|———-|

    | Michael | am_michael | Deep, authoritative, commanding | Documentary, serious content, news |

    | Fenrir 🐺 | am_fenrir | Strong, bold, powerful | Action content, gaming, intense narration |

    | Puck 🎭 | am_puck | Playful, character-driven, versatile | Entertainment, comedy, character voices |

    | Echo 🔊 | am_echo | Clear, resonant, memorable | Announcements, radio-style content |

    | Eric | am_eric | Reliable, professional | Business, training, educational content |

    | Liam | am_liam | Modern, relatable, friendly | Casual content, social media, vlogs |

    | Onyx 💎 | am_onyx | Rich, deep, elegant | Premium content, luxury brands, sophistication |

    | Adam | am_adam | Classic, versatile, dependable | General purpose, all-around use |

    | Santa 🎅 | am_santa | Warm, jolly, festive | Holiday content, cheerful narration |

    🇬🇧 UK English Voices (8 voices)

    Female Voices (4)

    | Voice Name | Code | Character | Best For |

    |————|——|———–|———-|

    | Emma | bf_emma | Refined, elegant, sophisticated | Formal content, literature, high-end narration |

    | Isabella | bf_isabella | Professional, articulate | Business, corporate, presentations |

    | Alice 📚 | bf_alice | Clear, storytelling, engaging | Children’s content, education, books |

    | Lily 🌸 | bf_lily | Gentle, pleasant, approachable | General content, tutorials, friendly narration |

    Male Voices (4)

    | Voice Name | Code | Character | Best For |

    |————|——|———–|———-|

    | George | bm_george | Authoritative, professional, commanding | Business, education, serious content |

    | Fable 📖 | bm_fable | Narrative, expressive, storytelling | Audiobooks, tales, creative content |

    | Lewis | bm_lewis | Reliable, clear, articulate | Training, documentation, instructional content |

    | Daniel | bm_daniel | Modern, professional, versatile | General purpose, business, presentations |

    🇯🇵 Japanese Voices (5 voices) – Not Working, needs custom wheel

    | Voice Name | Code | Gender | Character | Best For |

    |————|——|——–|———–|———-|

    | Hina ひな | jf_hina | Female | Gentle, youthful, sweet | Anime, casual content, friendly narration |

    | Yuki 雪 | jf_yuki | Female | Cool, elegant, refined | Formal content, professional narration |

    | Sakura 桜 | jf_sakura | Female | Warm, traditional, pleasant | Cultural content, storytelling |

    | Sora 空 | jf_sora | Female | Bright, energetic, cheerful | Entertainment, upbeat content |

    | Kaito 海斗 | jm_kaito | Male | Strong, confident, clear | News, serious content, professional narration |

    🇨🇳 Mandarin Chinese Voices (8 voices)

    Female Voices (4)

    | Voice Name | Code | Character | Best For |

    |————|——|———–|———-|

    | Xiaoxiao 小小 | zf_xiaoxiao | Gentle, friendly, approachable | General purpose, casual content |

    | Yunxi 云希 | zf_yunxi | Professional, clear, articulate | Business, news, formal content |

    | Xiaoyi 小艺 | zf_xiaoyi | Energetic, youthful, lively | Entertainment, social media |

    | Xiaoxuan 小萱 | zf_xiaoxuan | Warm, expressive, engaging | Storytelling, narration |

    Male Voices (4)

    | Voice Name | Code | Character | Best For |

    |————|——|———–|———-|

    | Yunyang 云扬 | zm_yunyang | Strong, authoritative, commanding | News, serious content, professional |

    | Yunfeng 云枫 | zm_yunfeng | Calm, mature, reliable | Documentation, education |

    | Yunhao 云昊 | zm_yunhao | Clear, professional, articulate | Business, presentations |

    | Yunxia 云霞 | zm_yunxia | Versatile, balanced | General purpose content |

    🇪🇸 Spanish Voices (3 voices)

    | Voice Name | Code | Gender | Character | Best For |

    |————|——|——–|———–|———-|

    | Sofia | ef_sofia | Female | Warm, friendly, engaging | General content, narration, education |

    | Diego | em_diego | Male | Confident, clear, professional | Business, formal content, news |

    | Carlos | em_carlos | Male | Friendly, approachable, versatile | Casual content, tutorials |

    🇫🇷 French Voice (1 voice)

    | Voice Name | Code | Gender | Character | Best For |

    |————|——|——–|———–|———-|

    | Céline | ff_celine | Female | Elegant, refined, sophisticated | All French content, narration, professional |

    🇮🇳 Hindi Voices (4 voices)

    | Voice Name | Code | Gender | Character | Best For |

    |————|——|——–|———–|———-|

    | Priya | hf_priya | Female | Friendly, warm, approachable | General content, education |

    | Anjali | hf_anjali | Female | Professional, clear, articulate | Business, formal content |

    | Arjun | hm_arjun | Male | Strong, confident, authoritative | News, serious content |

    | Raj | hm_raj | Male | Friendly, versatile, engaging | General purpose, casual content |

    🇮🇹 Italian Voices (2 voices)

    | Voice Name | Code | Gender | Character | Best For |

    |————|——|——–|———–|———-|

    | Giulia | if_giulia | Female | Expressive, warm, engaging | Narration, storytelling, general content |

    | Marco | im_marco | Male | Confident, professional, clear | Business, formal content, presentations |

    🇧🇷 Brazilian Portuguese Voices (3 voices)

    | Voice Name | Code | Gender | Character | Best For |

    |————|——|——–|———–|———-|

    | Lúcia | pf_lucia | Female | Warm, friendly, natural | General content, education, narration |

    | João | pm_joao | Male | Professional, clear, reliable | Business, news, formal content |

    | Pedro | pm_pedro | Male | Friendly, approachable, versatile | Casual content, tutorials, general purpose |

    🚀 Usage Guide

    Basic Text-to-Speech

  • Add the Node: In ComfyUI, add “🔊 Geeky Kokoro TTS (2025)” node to your workflow
  • Enter Text: Type or paste your text in the multiline text field
  • Select Voice: Choose from 54+ voices in the dropdown
  • Adjust Speed: Set speed from 0.5x (slower) to 2.0x (faster)
  • GPU Option: Enable “use_gpu” if you have a CUDA-capable GPU
  • Generate: Connect to audio output or preview node
  • Voice Blending (Creating Unique Voices)

    Voice blending allows you to create unique vocal characteristics by mixing two voices:

  • Enable Blending: Check the “enable_blending” checkbox
  • Select Second Voice: Choose a second voice from the dropdown
  • Adjust Blend Ratio:
  • 1.0 = 100% primary voice (no blending)
  • 0.7 = 70% primary, 30% secondary (subtle blend)
  • 0.5 = 50/50 mix (balanced blend)
  • 0.3 = 30% primary, 70% secondary (secondary dominant)
  • 0.0 = 100% secondary voice
  • Blending Tips:

  • Mix voices from the same language for best results
  • Blend male + female voices for androgynous effects
  • Try Heart + Bella at 0.6 for energetic yet warm narration
  • Try Michael + Adam at 0.5 for rich, authoritative voice
  • Experiment with ratios to find your perfect voice!
  • 🎵 Guided Voice Morphing (NEW!)

    The game-changing feature that makes voices sing, match, and transform!

    The Advanced Voice node now supports guided voice morphing – using a secondary audio file (like a song or reference voice) to guide the transformation of your TTS output. Perfect for:

  • Making TTS voices “sing” along to music
  • Matching tone and style of reference speakers
  • Creating autotune-style effects
  • Professional voice-over matching
  • How to Use Guided Morphing

  • Connect Guide Audio:
  • Load your guide audio (song, reference voice, etc.)
  • Connect it to the guide_audio input on the Advanced Voice node
  • Enable Morphing:
  • Check the enable_guided_morph checkbox
  • Adjust Morph Parameters (0.0 to 1.0):
  • Pitch Morph: Match pitch contour to guide audio (autotune effect)
  • Formant Morph: Match vocal character and tone
  • Spectral Morph: Match overall timbre and frequency balance
  • Amplitude Morph: Match dynamics and volume envelope
  • Morphing Parameters Explained

    Pitch Morph (0.0 – 1.0)

  • 0.0: No pitch change (original TTS pitch)
  • 0.3-0.5: Subtle pitch guidance (natural autotune)
  • 0.7-0.9: Strong pitch matching (follows melody closely)
  • 1.0: Complete pitch matching (perfect autotune)
  • Use Cases:

  • Music: 0.7-1.0 to make voice follow melody
  • Speech matching: 0.3-0.5 for natural intonation
  • Character voice: 0.0 (use manual pitch shift instead)
  • Formant Morph (0.0 – 1.0)

  • Matches the vocal tract characteristics
  • Affects perceived age, gender, and character
  • 0.0: Original voice character
  • 0.5: Blend of both voices
  • 1.0: Fully matched character
  • Use Cases:

  • Voice cloning: 0.6-0.9
  • Gender transformation: 0.5-0.7
  • Age adjustment: 0.4-0.6
  • Spectral Morph (0.0 – 1.0)

  • Matches overall frequency spectrum and timbre
  • Affects “brightness”, “warmth”, and tonal quality
  • Most subtle but powerful for natural matching
  • Use Cases:

  • Microphone matching: 0.5-0.7
  • Tone matching: 0.6-0.8
  • Style transfer: 0.4-0.6
  • Amplitude Morph (0.0 – 1.0)

  • Matches volume dynamics and expression
  • Follows the energy and intensity patterns
  • Great for emotional expression
  • Use Cases:

  • Dynamic speech: 0.5-0.7
  • Singing expression: 0.6-0.8
  • Whisper/shout: 0.4-0.6
  • Guided Morphing Examples

    Example 1: Make TTS Voice Sing

    Setup:
    
    1. Generate TTS with lyrics text
    
    2. Load instrumental or vocal track as guide_audio
    
    3. Enable guided morph
    
    4. Set: pitch_morph=0.8, formant_morph=0.3, spectral_morph=0.4
    
    
    
    Result: Voice follows melody while maintaining TTS character
    

    Example 2: Clone Speaking Style

    Setup:
    
    1. Generate TTS with script
    
    2. Load reference speaker audio as guide_audio
    
    3. Enable guided morph
    
    4. Set: pitch_morph=0.4, formant_morph=0.7, spectral_morph=0.6
    
    
    
    Result: TTS matches speaking style and voice character
    

    Example 3: Autotune Effect

    Setup:
    
    1. Generate TTS with any text
    
    2. Load musical scale or melody as guide_audio
    
    3. Enable guided morph
    
    4. Set: pitch_morph=1.0, formant_morph=0.0, spectral_morph=0.2
    
    
    
    Result: Perfect pitch-corrected robotic singing effect
    

    Advanced Voice Effects

    Connect the TTS output to “🎛️ Geeky Kokoro Advanced Voice (2025)” node for effects:

    Preset Profiles (18 Total):

    Original Profiles:

  • Cinematic: Deep, movie-trailer style (-3 semitones, reverb, compression)
  • Monster: Growling creature voice (-6 semitones, formant shift, distortion)
  • Robot: Mechanical, synthesized voice (band-pass filter, modulation)
  • Child: Young character voice (+3 semitones, formant shift)
  • Darth Vader: Deep, breathing villain voice (-4 semitones, echo, modulation)
  • Singer: Optimized for vocal content (compression, EQ, reverb)
  • NEW Profiles:

  • Alien: Otherworldly voice (-8 semitones, extreme formant shift, modulation)
  • Deep Voice: Professional bass voice (-5 semitones, bass boost)
  • Chipmunk: High-pitched cartoon voice (+6 semitones, formant shift up)
  • Telephone: Classic phone quality (300-3400Hz bandpass, compression)
  • Radio: Broadcast radio sound (100-5000Hz, compression, EQ)
  • Cathedral: Large reverberant space (heavy reverb, echo)
  • Cave: Echo chamber effect (reverb, echo with feedback)
  • Metallic: Robotic metallic sound (ring modulation, bandpass)
  • Whisper: Quiet breathy voice (noise, reduced bass)
  • Shout: Loud emphasized voice (compression, distortion, mid boost)
  • Custom: Full manual control of all parameters
  • Manual Controls:

  • Pitch Shift: ±12 semitones (0.1 step precision)
  • Formant Shift: Vocal tract size adjustment (-5 to +5)
  • Time Stretch: Speed without pitch change (0.5x to 2.0x)
  • Reverb: Room ambiance with room size control
  • Echo: Discrete repeats with adjustable feedback
  • Distortion: Harmonic saturation (0.0 to 1.0)
  • Compression: Dynamic range control
  • 3-Band EQ: Bass, Mid, Treble (-1.0 to +1.0)
  • Brightness: High-frequency emphasis (-1.0 to +1.0)
  • Warmth: Low-frequency emphasis (-1.0 to +1.0)
  • Effect Blend: Mix with original audio (0.0 to 1.0)
  • Output Volume: -60dB to +60dB
  • ⚙️ Technical Details

    Model Information

  • Model: Kokoro-82M v0.19
  • Parameters: 82 million
  • Architecture: Decoder-only based on StyleTTS 2 + ISTFTNet
  • Sample Rate: 24kHz
  • License: Apache 2.0
  • Repository: hexgrad/Kokoro-82M
  • Performance Benchmarks

    Processing Speed (Python 3.12, CUDA GPU):

  • Short text (< 200 chars): ~2-3 seconds
  • Medium text (200-800 chars): ~5-10 seconds
  • Long text (800+ chars): ~15-30 seconds
  • Voice blending: +20% processing time
  • Voice effects: +5-15% processing time
  • Guided morphing: +30-50% processing time (feature extraction + morphing)
  • Memory Usage:

  • Base model: ~2GB VRAM/RAM
  • With GPU acceleration: ~3GB VRAM
  • Voice effects processing: +500MB
  • Voice blending: +200MB temporary
  • Guided morphing: +800MB-1.5GB (feature extraction + DTW alignment)
  • Guided Morphing Technology

    Feature Extraction:

  • Pitch Tracking: PYIN algorithm with autocorrelation fallback
  • Formant Analysis: LPC (Linear Predictive Coding) with Levinson-Durbin recursion
  • Spectral Envelope: Cepstral smoothing with liftering
  • Amplitude Envelope: RMS energy tracking
  • MFCC: 13-coefficient mel-frequency cepstral analysis
  • Morphing Algorithms:

  • Dynamic Time Warping (DTW): Aligns feature sequences between source and guide
  • Phase Vocoder: Time-varying pitch shifting with STFT
  • Spectral Transfer: Magnitude envelope morphing with phase preservation
  • Volume Matching: RMS-based amplitude envelope transfer
  • Supported Guide Audio:

  • Any sample rate (auto-resampling to 24kHz)
  • Mono or stereo (auto-converted to mono)
  • WAV, MP3, FLAC, OGG formats
  • Recommended: 16kHz+ sample rate for best results
  • Text Processing

  • Intelligent Chunking: Automatically splits long texts while preserving sentence order
  • Chunk Size: 350 characters (configurable)
  • Gap Insertion: 150ms natural pauses between chunks
  • Paragraph Awareness: Respects paragraph breaks and structure
  • Punctuation Handling: Proper sentence boundary detection
  • Supported Languages & Codes

  • a – American English
  • b – British English
  • j – Japanese
  • z – Mandarin Chinese
  • e – Spanish
  • f – French
  • h – Hindi
  • i – Italian
  • p – Brazilian Portuguese
  • 🔧 Troubleshooting

    Common Issues

    “Kokoro import error”

    Solution:

    pip install --upgrade kokoro>=0.9.4
    

    Voice not loading

    Solution:

  • Restart ComfyUI completely
  • Check console for specific error messages
  • Ensure all dependencies are installed
  • Try reinstalling: pip install --force-reinstall kokoro
  • GPU out of memory

    Solutions:

  • Disable “use_gpu” option
  • Reduce text length
  • Close other GPU-intensive applications
  • Use CPU mode for very long texts
  • Audio sounds distorted

    Solutions:

  • Reduce “output_volume” in Voice Mod node
  • Lower “effect_blend” ratio (start at 0.3-0.5)
  • Reduce distortion and compression amounts
  • Check that input audio isn’t already clipping
  • Python version issues

    Solution:

    python --version  # Check your version
    
    # Must be 3.9, 3.10, 3.11, 3.12, or 3.13
    
    pip install --upgrade pip
    
    pip install -r requirements.txt --force-reinstall
    

    espeak-ng not found

    Solution:

  • Ubuntu/Debian: sudo apt-get install espeak-ng
  • macOS: brew install espeak-ng
  • Windows: Download from espeak-ng GitHub releases
  • Performance Tips

  • For long texts: Enable GPU acceleration
  • For short texts: CPU mode is often faster
  • Memory management: Process texts in batches if needed
  • Effect intensity: Start low (30-50%) and increase gradually
  • Voice blending: Keep both voices in the same language family
  • 🤝 Contributing

    Contributions are welcome! Areas where help is appreciated:

  • Additional voice profile presets
  • Performance optimizations
  • Bug reports and fixes
  • Documentation improvements
  • Testing on different platforms
  • 📄 License & Credits

    This Project

  • License: MIT License
  • Author: GeekyGhost
  • Repository: https://github.com/GeekyGhost/ComfyUI-Geeky-Kokoro-TTS
  • Kokoro TTS Model

  • License: Apache 2.0
  • Author: hexgrad
  • Model: https://huggingface.co/hexgrad/Kokoro-82M
  • Dependencies

  • librosa: Audio processing (ISC License)
  • scipy: Scientific computing (BSD License)
  • PyTorch: Deep learning framework (BSD License)
  • soundfile: Audio I/O (BSD License)
  • Special Thanks

  • hexgrad for the incredible Kokoro-82M model
  • ComfyUI Team for the amazing framework
  • Community testers and contributors
  • Audio processing library developers
  • 📚 Research & Resources

    Useful Links

  • Kokoro Model Page: https://huggingface.co/hexgrad/Kokoro-82M
  • ComfyUI Documentation: https://docs.comfy.org
  • Issue Tracker: https://github.com/GeekyGhost/ComfyUI-Geeky-Kokoro-TTS/issues
  • Discussions: https://github.com/GeekyGhost/ComfyUI-Geeky-Kokoro-TTS/discussions
  • Research Papers & References

  • StyleTTS 2 architecture
  • ISTFTNet vocoder
  • Phase vocoder techniques
  • Voice morphing and blending

  • 🌟 Quick Start Examples

    Example 1: Basic Narration

    Node: Geeky Kokoro TTS (2025)
    
    Text: "Welcome to my tutorial on advanced AI techniques."
    
    Voice: 🇺🇸 🚺 Nicole 🎧
    
    Speed: 1.0
    
    GPU: true
    

    Example 2: Character Voice with Effects

    Node 1: Geeky Kokoro TTS (2025)
    
    Voice: 🇺🇸 🚹 Puck 🎭
    
    Text: "The villain laughed menacingly."
    
    
    
    Node 2: Geeky Kokoro Advanced Voice
    
    Profile: Monster
    
    Intensity: 0.7
    

    Example 3: Blended Voice for Unique Sound

    Node: Geeky Kokoro TTS (2025)
    
    Voice: 🇺🇸 🚺 Heart ❤️
    
    Enable Blending: true
    
    Second Voice: 🇺🇸 🚺 Bella 🔥
    
    Blend Ratio: 0.6
    
    Text: "This creates a warm yet energetic voice perfect for marketing."
    


    Made with ❤️ for the ComfyUI community

    Enjoy natural, high-quality text-to-speech with 54+ voices and unlimited creative possibilities! 🎉