Pillow==10.4.0 requests==2.32.5 openai==1.44.0 blend-modes==2.1.0 huggingface_hub>=0.34.0 color_matcher==0.5.0 chardet==5.2.0 google-generativeai==0.7.2 anthropic transformers>=4.40.0 decord>=0.6.0 scipy>=1.10.0 tqdm>=4.67.1 huggingface_hub[hf_xet]


Advanced Prompt Generation & Multi-Model AI Integration for ComfyUI
A comprehensive suite of nodes for ComfyUI featuring multi-provider LLM support (OpenAI, Gemini, Claude, Grok, Groq, QwenVL), local model inference (Phi, MiniCPM, Ollama), professional image effects, and advanced prompt generation tools.
Search for “comfyui_dagthomas” in ComfyUI Manager and click Install.
cd ComfyUI/custom_nodes
git clone https://github.com/dagthomas/comfyui_dagthomas
cd comfyui_dagthomas
pip install -r requirements.txt
Set your API keys as environment variables:
# OpenAI GPT
set OPENAI_API_KEY=sk-your-key-here
# Google Gemini
set GEMINI_API_KEY=your-key-here
# Anthropic Claude
set ANTHROPIC_API_KEY=your-key-here
# or
set CLAUDE_API_KEY=your-key-here
# xAI Grok
set XAI_API_KEY=your-key-here
# or
set GROK_API_KEY=your-key-here
# Groq
set GROQ_API_KEY=your-key-here
Display Name: APNext Universal Generator
A model-agnostic prompt generator that automatically detects available API keys and supports all major LLM providers.
| Input | Description |
|——-|————-|
| input_text | Base text to enhance |
| model | Select provider:model or “auto-detect” |
| generation_mode | Creative, Balanced, Focused, or Custom |
| seed | Seed for reproducible variations |
| style_preference | Cinematic, Photorealistic, Artistic, etc. |
| detail_level | Brief to Very Detailed output |
Supported Models:
gpt:gpt-4o, gpt:gpt-4o-mini, gpt:gpt-4-turbogemini:gemini-2.5-flash, gemini:gemini-2.5-proclaude:claude-sonnet-4.5, claude:claude-3-5-sonnetgrok:grok-beta, grok:grok-2-visiongroq:llama-3.3-70b-versatileReturns: (generated_prompt, model_used, seed_used)
Display Name: APNext Universal Vision Cloner
Analyze images with any supported vision model to generate detailed descriptions or clone image styles.
| Input | Description |
|——-|————-|
| images | One or more images to analyze |
| model | Vision model to use (auto-detect available) |
| fade_percentage | Blend percentage for multiple images |
| analysis_mode | Detailed Analysis, Style Cloning, Scene Description, Creative Interpretation |
| output_format | Text Only, JSON Structure, or Formatted Prompt |
Returns: (formatted_output, raw_response, faded_image, model_used)
Display Name: APNext Gemini Prompt Enhancer
Enhances prompts with cinematic terminology and LLM refinement for video/image generation.
| Input | Description |
|——-|————-|
| base_prompt | Original prompt to enhance |
| enhancement_mode | Random Mix, Cinematic/Lighting/Camera/Motion/Style Focus, Full Enhancement, or LLM Only |
| use_llm | Enable Gemini LLM enhancement |
| intensity | Enhancement intensity (0.1-2.0) |
| Optional dropdowns | visual_style, lighting_type, camera_angle, shot_size, lens_type, color_tone, etc. |
Returns: (enhanced_prompt, random_enhanced, llm_enhanced)
Display Name: APNext Gemini Custom Vision
Analyze multiple images with custom prompts. Supports dynamic prompt templates with variable substitution.
| Input | Description |
|——-|————-|
| images | Input images |
| custom_prompt | Custom analysis prompt |
| dynamic_prompt | Enable ##TAG##, ##SEX##, ##PRONOUNS##, ##WORDS## substitution |
| fade_percentage | Blend multiple images together |
Returns: (output, clip_l, faded_image)
Display Name: APNext Gemini Text Only
Pure text generation with Gemini models. Supports dynamic prompt templates.
Returns: (output, clip_l)
Display Name: APNext Gemini Next Scene
Generate cinematic transitions for visual narratives. Creates the “next scene” based on a previous prompt and current frame.
| Input | Description |
|——-|————-|
| image | Current frame image |
| original_prompt | Previous scene description |
| focus_on | Camera Movement, Framing Evolution, Environmental Reveals, Atmospheric Shifts |
| transition_intensity | Subtle, Moderate, or Dramatic |
Returns: (next_scene_prompt, short_description)
Display Name: APNext GPT Mini Generator
Efficient text generation using GPT-4o-mini.
| Input | Description |
|——-|————-|
| input_text | Text to enhance |
| happy_talk | Enthusiastic vs professional tone |
| compress | Enable output compression |
| poster | Movie poster style formatting |
Display Name: APNext GPT Vision Cloner
Clone image styles using GPT-4o vision capabilities with custom prompts.
Display Name: APNext GPT Custom Vision
Full custom vision analysis with GPT-4o.
Display Name: APNext Claude Text Generator
Text generation with Claude models (Claude 3.5 Sonnet, Claude Sonnet 4.5).
| Input | Description |
|——-|————-|
| input_text | Text to process |
| claude_model | Model selection |
| happy_talk, compress, poster | Output style controls |
| variation_instruction | Custom instruction for creative variations |
Display Name: APNext Claude Vision Analyzer
Image analysis with Claude’s multimodal capabilities.
Display Name: APNext Grok Text Generator
Text generation using xAI’s Grok models.
Display Name: APNext Grok Vision Analyzer
Image analysis with Grok vision models.
Display Name: APNext Groq Text Generator
Lightning-fast text generation using Groq’s optimized infrastructure with Llama and Mixtral models.
| Input | Description |
|——-|————-|
| groq_model | llama-3.3-70b-versatile, llama-3.1-8b-instant, etc. |
| Other standard LLM inputs |
Display Name: APNext Groq Vision Analyzer
Fast image analysis with Groq vision models.
Display Name: APNext QwenVL Vision Analyzer
Local vision analysis using Qwen-VL models. Downloads models automatically.
| Input | Description |
|——-|————-|
| images | Input images |
| qwen_model | Qwen3-VL-4B-Instruct, etc. |
| max_tokens | Maximum response length |
| keep_model_loaded | Cache model in memory |
Display Name: APNext QwenVL Vision Cloner
Clone image styles locally without API calls.
Display Name: APNext QwenVL Video Analyzer
Analyze video content frame-by-frame.
Display Name: APNext QwenVL Next Scene
Generate cinematic scene transitions locally using QwenVL models. Takes a previous scene description and 1-5 frame images, then creates natural camera movements, framing evolution, and atmospheric shifts. Multiple frames help the model understand motion/progression.
| Input | Description |
|——-|————-|
| images | 1-5 frame images (batch) |
| original_prompt | Previous scene description |
| qwen_model | QwenVL model to use |
| prompt_file | Custom prompt template file |
| custom_prompt | Override with inline prompt (optional) |
| max_frames | Max frames to use from batch (1-5) |
| focus_on | Camera Movement, Framing Evolution, Environmental Reveals, Atmospheric Shifts |
| transition_intensity | Subtle, Moderate, or Dramatic |
| keep_model_loaded | Cache model in memory |
Returns: (next_scene_prompt, short_description)
Custom Prompts: Create your own prompt templates in data/custom_prompts/. Use ##ORIGINAL_PROMPT## as placeholder for the previous scene description. Included templates:
next_scene.txt – Default detailed cinematography promptqwen_next_scene_simple.txt – Simplified versionqwen_next_scene_video.txt – Optimized for AI video generationDisplay Name: APNext QwenVL Frame Prep
Utility node to prepare multiple images for QwenVL Next Scene. Accepts up to 5 individual images or a batch, scales them to max dimensions, and outputs a batched tensor.
| Input | Description |
|——-|————-|
| max_width | Maximum width (default 1024) |
| max_height | Maximum height (default 1024) |
| image_1 – image_5 | Individual image inputs |
| image_batch | Pre-batched images (optional) |
Returns: (images, frame_count)
Display Name: APNext QwenVL Z-Image Vision
Analyzes images and outputs in Z-Image TurnBuilder chat format with <|im_start|>/<|im_end|> tokens.
Display Name: APNext OllamaNode
Local LLM inference using Ollama. Supports any model installed in your Ollama instance.
| Input | Description |
|——-|————-|
| input_text | Text to process |
| model_name | Any Ollama model (llama3, mistral, etc.) |
| happy_talk, compress | Output controls |
Display Name: APNext OllamaVision
Local vision analysis with Ollama multimodal models (llava, bakllava, etc.).
Display Name: APNext MiniCPM Image
Image understanding with MiniCPM-V 4.5 (OpenBMB). Supports thinking mode for complex reasoning.
| Input | Description |
|——-|————-|
| images | Input images |
| question | Question about the image |
| enable_thinking | Deep reasoning mode |
| precision | bfloat16 or float16 |
| unload_after_inference | Free memory after use |
Display Name: APNext MiniCPM Video
Video understanding and analysis.
Display Name: APNext Phi Model Loader
Load Microsoft Phi-3.5-vision-instruct model.
| Input | Description |
|——-|————-|
| model_version | Phi-3.5-vision-instruct |
| image_crops | 4 or 16 crops for detail |
| attention_mechanism | flash_attention_2, sdpa, or eager |
Display Name: APNext Phi Model Inference
Run inference with loaded Phi model.
Professional image effects using optimized tensor operations.
Creates a bloom/glow effect on bright areas.
| Input | Description |
|——-|————-|
| intensity | Bloom strength (0-5) |
| threshold | Brightness threshold (0-1) |
| blur_radius | Glow spread (1-50) |
| blend_mode | additive, screen, or overlay |
Professional color grading with LUT support or manual controls.
| Input | Description |
|——-|————-|
| method | manual or lut_file |
| lut_file | .cube, .3dl, or image LUT |
| exposure | -3 to +3 stops |
| contrast, saturation | Standard adjustments |
| highlights, shadows | Tone controls |
| temperature, tint | White balance |
Supported LUT Formats: .cube (Adobe/Blackmagic), .3dl (Autodesk/Flame), Image LUTs (.png, .jpg)
Intelligent image sharpening.
Add film grain and noise effects.
Add texture and roughness.
Film cross-processing color effects.
Separate color toning for highlights and shadows.
HDR-style tone mapping.
Digital glitch and databending effects.
Classic film halation (light bleeding) effect.
Display Name: APNext Latent Generator
Generate latent tensors with intelligent dimension calculation.
| Input | Description |
|——-|————-|
| width, height | Base dimensions (0 = auto-calculate) |
| megapixel_scale | Target megapixels (0.1-2.0) |
| aspect_ratio | 1:1, 3:2, 4:3, 16:9, 21:9 |
| is_portrait | Portrait orientation |
Returns: (LATENT, width, height)
Display Name: APNext PGSD3LatentGenerator
Optimized latent generation for Stable Diffusion 3 pipelines.
Display Name: Auto Prompter
Generate random prompts from extensive category databases.
| Input | Description |
|——-|————-|
| subject | Main subject (can include LoRA triggers) |
| custom | Prefix text for styling |
| artform | Photography, digital art, etc. |
| Various category selections | Random or specific choices |
Display Name: APNext Node
Advanced prompt building with category-based enhancements.
The system includes numerous nodes that can be chained together to create complex workflows:
Supports 24 main categories with subcategories:
Display Name: APNext String Merger
Combine multiple strings with separators.
Display Name: APNext Flexible String Merger
Advanced string combining with custom formatting.
Display Name: APNext Sentence Mixer
Shuffle and mix sentences from multiple inputs for creative variations.
Display Name: APNext Custom Prompts
Load prompt templates from the data/custom_prompts/ directory.
Included templates:
promptcreator.txt – Full creative prompt generationimage_analyze.txt – Image analysis promptsgemini_video.txt – Video generation promptscloner.txt – Style cloning promptsDisplay Name: APNext Local random prompt
Load random prompts from local text files.
Display Name: APNext Random Integer Generator
Generate random integers with min/max range.
Create your own categories for APNextNode:
data/next/ (e.g., data/next/mycategory/)["item1", "item2", "item3"]
{
"preprompt": "with",
"separator": " and ",
"endprompt": "visual effects",
"items": ["motion blur", "lens flare", "particle effects"],
"attributes": {
"motion blur": ["dynamic", "cinematic"],
"lens flare": ["bright", "atmospheric"]
}
}
Create your own prompt templates for use with the Custom Prompt Loader node.
Place .txt files in: data/custom_prompts/
Templates are plain text files containing instructions for LLM nodes. They support dynamic variable substitution:
| Variable | Description |
|———-|————-|
| ##TAG## | Replaced with the tag input (e.g., “ohwx man”) |
| ##SEX## | Replaced with the sex input (e.g., “male”, “female”) |
| ##PRONOUNS## | Replaced with pronouns (e.g., “him, his”) |
| ##WORDS## | Replaced with target word count |
Create a file data/custom_prompts/my_style.txt:
As a professional art critic, describe the provided image in detail.
Focus on creating a cohesive scene as if describing a movie still.
If the subject is ##TAG##, use ##PRONOUNS## pronouns appropriately.
The subject is ##SEX##.
Include:
- Main subject description with clothing, accessories, position
- Setting and environment details
- Lighting type, direction, and atmosphere
- Color palette and emotional tone
- Camera angle and composition
Output approximately ##WORDS## words.
Do not use JSON format. Provide a single cohesive paragraph.
| Template | Purpose |
|———-|———|
| promptcreator.txt | Detailed image analysis (~150 words) |
| promptcreator_small.txt | Concise image analysis |
| image_analyze.txt | General image description |
| cloner.txt | Style cloning prompts |
| gemini_video.txt | Video generation prompts |
| gemini_ohwx.txt | LoRA trigger-aware prompts |
| t5xxl.txt | T5-XXL optimized prompts |
| ltxv.txt | LTX Video model prompts |
| next_scene.txt | Cinematic scene transitions |
Customize available models by editing JSON configuration files in the data/ folder.
| File | Provider | Description |
|——|———-|————-|
| gemini_models.json | Google Gemini | Gemini model list |
| gpt_models.json | OpenAI | GPT model list |
| claude_models.json | Anthropic | Claude model list |
| grok_models.json | xAI | Grok model list |
| groq_models.json | Groq | Groq model list (text + vision) |
| qwenvl_models.json | QwenVL | Local Qwen vision models |
QwenVL nodes support loading additional models from private configuration files. This allows you to add custom or uncensored models without modifying the main configuration.
How to add private models:
data/ with a name matching private_*qwenvl*.jsonprivate_qwenvl_models.json, private_uncensored.qwenvl_models.jsonqwenvl_models.json:{
"models": [
"huihui-ai/Huihui-Qwen3-VL-4B-Instruct-abliterated",
"huihui-ai/Huihui-Qwen3-VL-8B-Instruct-abliterated",
"another-namespace/custom-model"
]
}
Notes:
qwenvl_models.jsonnamespace/model-name)ComfyUI/models/LLM/Qwen-VL/ on first useMost model files use a simple array format:
{
"models": [
"model-name-1",
"model-name-2",
"model-name-3"
]
}
Edit data/gemini_models.json:
{
"models": [
"gemini-2.5-pro",
"gemini-2.5-flash",
"gemini-flash-latest",
"gemini-flash-lite-latest",
"gemini-2.5-flash-lite",
"gemini-exp-1206"
]
}
Edit data/claude_models.json:
{
"models": [
"claude-sonnet-4.5",
"claude-sonnet-4",
"claude-sonnet-3.7",
"claude-opus-4.1",
"claude-opus-4",
"claude-haiku-3.5",
"claude-haiku-3"
]
}
Groq supports separate text and vision model lists:
{
"text_models": [
"llama-3.3-70b-versatile",
"llama-3.1-8b-instant",
"groq/compound",
"qwen/qwen3-32b"
],
"vision_models": [
"meta-llama/llama-4-scout-17b-16e-instruct",
"meta-llama/llama-4-maverick-17b-128e-instruct"
],
"note": "Edit this file to add/remove models"
}
Example workflows are available in the examples/ directory:
examples/flux/apnext/examples/flux/florence2/examples/flux/gpt-4o_vision/examples/flux/ollama_local_llm/examples/minicpm/Pillow>=10.4.0
requests>=2.32.5
openai>=1.44.0
blend-modes>=2.1.0
huggingface_hub>=0.34.0
color_matcher>=0.5.0
chardet>=5.2.0
google-generativeai>=0.7.2
anthropic
transformers>=4.40.0
decord>=0.6.0
scipy>=1.10.0
tqdm>=4.67.1
| Provider | Text | Vision | Video | Local |
|———-|——|——–|——-|——-|
| OpenAI GPT | ✅ | ✅ | ❌ | ❌ |
| Google Gemini | ✅ | ✅ | ✅ | ❌ |
| Anthropic Claude | ✅ | ✅ | ❌ | ❌ |
| xAI Grok | ✅ | ✅ | ❌ | ❌ |
| Groq | ✅ | ✅ | ❌ | ❌ |
| QwenVL | ✅ | ✅ | ✅ | ✅ |
| Ollama | ✅ | ✅ | ❌ | ✅ |
| MiniCPM | ✅ | ✅ | ✅ | ✅ |
| Phi-3.5 | ✅ | ✅ | ❌ | ✅ |
MIT License
Built for the ComfyUI community. Special thanks to all contributors and users providing feedback.