ComfyUI-Gemini_TTS

★ 21

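Text-to-speech · Gemini TTS · Multi-voice · Real-time pricing and fallback
Integrates Google Gemini TTS into ComfyUI, with 30+ voices, free and paid tiers, smart fallback, and real-time cost estimates, making it easy to generate high-quality speech.
💡 Quickly generate high-quality, multi-voice speech in ComfyUI workflows.
🍴 3 Forks · 💻 Python · 🔄 2025-05-23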
📦 Netdisk download
Copy the link, then download from Quark netdisk:
https://pan.quark.cn/s/8f9eee5e2cdb
📦 requirements.txt

    # Core dependencies for Gemini TTS ComfyUI Node
    google-generativeai>=0.8.0
    requests>=2.25.0
    # Optional: For enhanced audio processing (if needed)
    # sounddevice>=0.4.0

📄 README

🎙️ ComfyUI-Gemini_TTS

A ComfyUI custom node that brings Google’s Gemini TTS capabilities directly to your workflow. Generate high-quality speech with 30+ voices on either the free or the paid tier.

✨ Features

  • 30+ Premium Voices: Male and female voices with unique characteristics
  • Dual Tier Support: Free tier with generous limits + Paid tier for production use
  • Smart Fallback: Automatic model switching when quotas are reached
  • Voice Characteristics: Detailed voice info with personality descriptions
  • Flexible Configuration: Environment variables, node parameters, or config file
  • Robust Error Handling: Clear error messages and automatic retry logic
  • Real-time Pricing: Cost estimates for paid tier usage

🚀 Quick Start

1. Installation

  • Clone or download this repository to your ComfyUI custom nodes folder:

    ```bash
    cd ComfyUI/custom_nodes/
    git clone https://github.com/ShmuelRonen/ComfyUI-Gemini_TTS.git
    ```

  • Install dependencies (the clone above creates the ComfyUI-Gemini_TTS folder):

    ```bash
    cd ComfyUI-Gemini_TTS
    pip install google-generativeai requests torch torchaudio numpy
    ```

  • Restart ComfyUI. The node will appear as “🎙️ Gemini Text-to-Speech”.

2. Get Your API Key

Free Tier (Recommended to Start)

  • Go to Google AI Studio
  • Sign in with your Google account
  • Click “Get API Key” → “Create API Key”
  • Select “Create API key in new project”
  • Copy your API key (starts with AIza...)

Paid Tier (For Production)

See the Paid Tier Setup section below.

3. Configure the Node

Option A: Environment Variable (Recommended)

    export GEMINI_API_KEY="your_api_key_here"

Option B: Direct Input

  • Enter your API key directly in the node’s api_key field
  • The node will save it automatically for future use
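
The two options above can be combined in a small lookup. The following is a minimal sketch, not the node's actual code: the function name `resolve_api_key` is hypothetical, and the precedence (node field first, then `GEMINI_API_KEY`) is an assumption, since the README does not say which source wins.

```python
import os


def resolve_api_key(node_value="", environ=None):
    """Return the API key from the node field if set, else from GEMINI_API_KEY.

    The precedence order is an assumption made for this sketch.
    """
    environ = os.environ if environ is None else environ

    # Option B: a key typed directly into the node's api_key field.
    candidate = (node_value or "").strip()
    if candidate:
        return candidate

    # Option A: the GEMINI_API_KEY environment variable.
    candidate = environ.get("GEMINI_API_KEY", "").strip()
    if candidate:
        return candidate

    raise ValueError(
        "No API key found: set GEMINI_API_KEY or fill in the node's api_key field"
    )
```

Passing `environ` explicitly makes the lookup easy to exercise without touching the real process environment.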

🎭 Available Voices

Female Voices (14 total)

  • Aoede – Breezy and natural
  • Kore – Firm and confident
  • Leda – Youthful and energetic
  • Zephyr – Bright and cheerful
  • Autonoe – Bright and optimistic
  • Callirhoe – Easy-going and relaxed
  • Despina – Smooth and flowing
  • Erinome – Clear and precise
  • Gacrux – Mature and experienced
  • Laomedeia – Upbeat and lively
  • Pulcherrima – Forward and expressive
  • Sulafat – Warm and welcoming
  • Vindemiatrix – Gentle and kind
  • Achernar – Soft and gentle

Male Voices (16 total)

  • Puck – Upbeat and energetic (default)
  • Charon – Informative and clear
  • Fenrir – Excitable and dynamic
  • Orus – Firm and decisive
  • Achird – Friendly and approachable
  • Algenib – Gravelly texture
  • Algieba – Smooth and pleasant
  • Alnilam – Firm and strong
  • Enceladus – Breathy and soft
  • Iapetus – Clear and articulate
  • Rasalgethi – Informative and professional
  • Sadachbia – Lively and animated
  • Sadaltager – Knowledgeable and authoritative
  • Schedar – Even and balanced
  • Umbriel – Easy-going and calm
  • Zubenelgenubi – Casual and conversational

⚙️ Node Parameters

Required Parameters

  • prompt: Text to convert to speech (supports “Say:” prefix)
  • tts_model: Choose between:
      • gemini-2.5-pro-preview-tts (Higher quality, slower)
      • gemini-2.5-flash-preview-tts (Faster, good quality)
  • voice: Select from 30+ available voices
  • temperature: Control creativity (0.0-2.0, default: 1.0)
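
To show how the required parameters could map onto a Gemini TTS call, here is a hedged sketch that talks to the public REST endpoint with `requests`. The helper names (`build_tts_request`, `extract_audio`, `pcm_to_wav_bytes`, `synthesize`) are hypothetical and the node's real implementation may differ. The request and response field names and the 24 kHz, 16-bit mono PCM output follow Google's published speech-generation examples as best understood here, so check them against the current API reference before relying on them.

```python
import base64
import io
import wave

API_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"


def build_tts_request(prompt, voice="Puck", temperature=1.0):
    """Build the JSON body for a single-speaker TTS request."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "temperature": temperature,
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}}
            },
        },
    }


def extract_audio(response_json):
    """Decode the base64 PCM payload from the first candidate."""
    part = response_json["candidates"][0]["content"]["parts"][0]
    return base64.b64decode(part["inlineData"]["data"])


def pcm_to_wav_bytes(pcm, sample_rate=24000):
    """Wrap raw 16-bit mono PCM in a WAV container (sample rate is assumed)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


def synthesize(api_key, prompt, model="gemini-2.5-flash-preview-tts",
               voice="Puck", temperature=1.0):
    """Send the request and return WAV bytes. Requires network access."""
    import requests  # imported here so the helpers above work without it

    response = requests.post(
        API_URL.format(model=model),
        headers={"x-goog-api-key": api_key},
        json=build_tts_request(prompt, voice, temperature),
        timeout=120,
    )
    response.raise_for_status()
    return pcm_to_wav_bytes(extract_audio(response.json()))
```

Keeping the payload builder separate from the network call lets the request shape be inspected or logged without spending quota.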

Optional Parameters

  • api_key: Enter API key directly (auto-saved)
  • auto_fallback_to_flash: Auto-switch to Flash if Pro is rate-limited
  • retry_delay: Wait time between retries (10-120 seconds)
  • use_paid_tier: Enable paid billing for higher quotas
  • billing_project_id: Google Cloud project ID for billing
  • aggressive_retry: More retry attempts for better reliability
  • show_voice_info: Display voice characteristics in output
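
The interplay of `auto_fallback_to_flash`, `retry_delay`, and `aggressive_retry` can be pictured with a short sketch. This is an illustration under assumptions, not the node's code: `RateLimitError` and `synthesize_with_fallback` are hypothetical names, the attempt counts (2 normally, 5 with `aggressive_retry`) are invented for the example, and `synthesize` stands for any callable that takes a model name and a prompt.

```python
import time

PRO = "gemini-2.5-pro-preview-tts"
FLASH = "gemini-2.5-flash-preview-tts"


class RateLimitError(Exception):
    """Hypothetical stand-in for a quota or HTTP 429 error."""


def synthesize_with_fallback(synthesize, prompt, model=PRO,
                             auto_fallback_to_flash=True, retry_delay=10,
                             aggressive_retry=False, sleep=time.sleep):
    """Try the chosen model with retries, then optionally fall back to Flash.

    Returns (model_used, audio). Attempt counts are assumptions.
    """
    attempts = 5 if aggressive_retry else 2
    models = [model]
    if auto_fallback_to_flash and model == PRO:
        models.append(FLASH)

    last_error = None
    for candidate in models:
        for attempt in range(attempts):
            try:
                return candidate, synthesize(candidate, prompt)
            except RateLimitError as err:
                last_error = err
                # Wait before retrying the same model, but not before switching.
                if attempt < attempts - 1:
                    sleep(retry_delay)
    raise last_error
```

Injecting `sleep` keeps the waiting behaviour testable; in real use the default `time.sleep` applies the configured delay.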

💰 Paid Tier Setup

Why Upgrade to Paid Tier?

| Feature | Free Tier | Paid Tier |
|---------|-----------|-----------|
| Quota Limits | Low (good for testing) | High (production ready) |
| Rate Limits | Restrictive | Generous |
| Priority Access | Standard | Premium |
| Cost | Free | ~$0.001-0.02 per request |

Step-by-Step Paid Setup

1. Create Google Cloud Project

  • Go to Google Cloud Console
  • Click “New Project” or select existing project
  • Enter project name (e.g., “my-gemini-tts”)
  • Note your Project ID (not the name – this is important!)

2. Enable Billing

  • In Google Cloud Console, go to Billing
  • Click “Link a billing account” or “Enable billing”
  • Add a payment method (credit card required)
  • Verify billing is active on your project

3. Enable the Gemini API

  • Go to APIs & Services > Library
  • Search for “Generative Language API”
  • Click “Enable” on the Generative Language API
  • Wait for activation (usually instant)

4. Create API Key

  • Go to APIs & Services > Credentials
  • Click “Create Credentials” > “API Key”
  • Copy your new API key
  • Optional: Restrict the key to “Generative Language API” for security

5. Configure the Node

Set these parameters in the node:

  • use_paid_tier: True
  • billing_project_id: Your Project ID from step 1
  • api_key: Your API key from step 4

💵 Pricing Information

Gemini 2.5 Pro TTS:

  • Input: $1.00 per 1M tokens
  • Output: $20.00 per 1M tokens
  • ~$0.01-0.02 per typical request

Gemini 2.5 Flash TTS:

  • Input: $0.50 per 1M tokens
  • Output: $10.00 per 1M tokens
  • ~$0.005-0.01 per typical request

*Typical 20-word sentence costs less than $0.02*
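
The per-request figures follow directly from the per-token rates. As a rough sketch of the arithmetic (the function name `estimate_cost` and the token counts in the example are illustrative assumptions, and how the node itself counts tokens is not documented here):

```python
# Rates in USD per 1M tokens, copied from the pricing list above.
PRICES_PER_MILLION = {
    "gemini-2.5-pro-preview-tts": {"input": 1.00, "output": 20.00},
    "gemini-2.5-flash-preview-tts": {"input": 0.50, "output": 10.00},
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for one request."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Example with made-up token counts: 100 input tokens, 500 output tokens on Pro.
cost = estimate_cost("gemini-2.5-pro-preview-tts", 100, 500)
print(f"${cost:.4f}")  # prints "$0.0101"
```

Output tokens dominate the estimate because their rate is 20 times the input rate on both models.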

🔧 Troubleshooting

Common Issues

“API key not valid” Error

  • Solution: Verify your API key starts with AIza and is ~39 characters
  • Check: API key hasn’t expired or been deleted
  • Verify: You’re using the correct key from Google AI Studio or Cloud Console
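
A quick local check can catch obviously malformed keys before any request is sent. This is only a sketch of the heuristic described above: the function name is hypothetical, and the accepted length window of 35 to 45 characters is an assumption standing in for "~39 characters". Passing the check does not mean the key is valid or active.

```python
def looks_like_gemini_key(key):
    """Heuristic format check: 'AIza' prefix, plausible length, no whitespace."""
    key = key.strip()
    has_prefix = key.startswith("AIza")
    plausible_length = 35 <= len(key) <= 45  # assumed tolerance around ~39
    no_whitespace = not any(ch.isspace() for ch in key)
    return has_prefix and plausible_length and no_whitespace
```

Only a real API call can confirm that a key has not expired or been deleted.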

“Rate limit exceeded” Error

  • Free Tier: Wait 60 seconds or try Flash model
  • Solution: Enable paid tier for higher quotas
  • Temporary: Use auto_fallback_to_flash = True

“Billing project not found” Error

  • Check: Use Project ID, not project name
  • Verify: Project exists and billing is enabled
  • Confirm: API key belongs to the same project

“Permission denied” Error

  • Verify: Generative Language API is enabled
  • Check: API key has proper permissions
  • Ensure: Billing is active if using paid tier

Configuration Files

The node creates a config.json file to save your settings:

    {
        "GEMINI_API_KEY": "your_key_here",
        "use_paid_tier": true,
        "billing_project_id": "your-project-id"
    }
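
Reading and updating a file of this shape takes only a few lines. The sketch below assumes the three keys shown above and uses hypothetical helper names (`load_settings`, `save_settings`); where the node actually stores config.json and how it merges values are not specified in this README.

```python
import json
from pathlib import Path

# Defaults used when config.json is missing or lacks a key (assumed values).
DEFAULTS = {
    "GEMINI_API_KEY": "",
    "use_paid_tier": False,
    "billing_project_id": "",
}


def load_settings(path):
    """Return the defaults overlaid with whatever config.json contains."""
    settings = dict(DEFAULTS)
    path = Path(path)
    if path.exists():
        settings.update(json.loads(path.read_text(encoding="utf-8")))
    return settings


def save_settings(path, **changes):
    """Merge changes into the existing settings and write them back."""
    settings = load_settings(path)
    settings.update(changes)
    Path(path).write_text(json.dumps(settings, indent=4), encoding="utf-8")
    return settings
```

Because the file holds the API key in plain text, it should stay out of version control.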

Debug Information

Enable debugging by checking console output:

  • Green ✅: Successful operations
  • Yellow ⚠️: Warnings and fallbacks
  • Red ❌: Errors requiring attention

📝 Usage Examples

Basic Text-to-Speech

    Prompt: "Hello, welcome to our presentation today."
    Model: gemini-2.5-flash-preview-tts
    Voice: [F] Zephyr
    Temperature: 1.0

Expressive Reading

    Prompt: "Say: Once upon a time, in a land far, far away..."
    Model: gemini-2.5-pro-preview-tts  
    Voice: [M] Charon
    Temperature: 1.5
    Show Voice Info: True

Production Setup

    Use Paid Tier: True
    Billing Project ID: my-production-project-123
    Aggressive Retry: True
    Model: gemini-2.5-pro-preview-tts

🛡️ Security Best Practices

  • Protect Your API Key: Never commit API keys to version control
  • Use Environment Variables: Set GEMINI_API_KEY in your environment
  • Restrict API Keys: Limit to specific APIs in Google Cloud Console
  • Monitor Usage: Check Google Cloud billing dashboard regularly
  • Project Isolation: Use separate projects for development vs production

🔄 Updates and Compatibility

  • ComfyUI: Compatible with latest versions
  • Python: Requires Python 3.8+
  • Dependencies: Auto-updated through pip
  • Voice Library: Automatically synced with Google’s latest voices

📞 Support

Common Solutions

  • Restart ComfyUI after installation or configuration changes
  • Check Console Output for detailed error messages
  • Verify API Key Format (should start with AIza)
  • Confirm Project Settings in Google Cloud Console

Getting Help

  • Check the troubleshooting section above
  • Review console output for specific error messages
  • Verify your Google Cloud project configuration
  • Ensure billing is properly enabled for paid tier

📜 License

This project is provided as-is for educational and commercial use. Google Gemini API usage is subject to Google’s terms of service and pricing.

🎉 Ready to generate amazing speech with Gemini TTS!

*Last updated: May 2025*