ComfyUI-JoyCaption

ComfyUI-JoyCaption
★ 247

图像字幕批量处理GGUF支持内存优化
基于LLaVA的ComfyUI自定义节点,提供高效风格化图像自动描述,支持GGUF量化模型、批量处理与内存优化
💡 在ComfyUI中批量生成风格化图像字幕并自动管理模型
🍴 27 Forks💻 Python🔄 2025-12-24
📦
网盘下载
复制链接后前往夸克网盘下载
https://pan.quark.cn/s/8f9eee5e2cdb
📦 requirements.txt
torch>=2.0.0
transformers>=4.36.0
huggingface_hub>=0.19.0
accelerate
bitsandbytes>=0.42.0
compressed-tensors>=0.6.0
Pillow>=10.0.0
Joycaption_node
Joycaption GGUF
v1 2 0
Joycaption_node
📄 README

ComfyUI-JoyCaption

Joy Caption is a ComfyUI custom node powered by the LLaVA model for efficient, stylized image captioning. Caption Tools nodes handle batch image processing and automatic separation of caption text.

News & Updates

  • 2025/10/27: Added Custom Models v2.0.2 ( update.md )
  • 2025/10/20: Update and bug fix ComfyUI-JoyCaption to v2.0.1 ( update.md )
  • 2025/08/22: Update ComfyUI-JoyCaption to v2.0.0 ( update.md )
  • Added GGUF model support for better performance and compatibility
  • Fixed critical parameter handling issues (top_k, image format)
  • Improved console output (no more base64 spam)
  • Enhanced error handling and stability
  • Added JoyCaption GGUF and JoyCaption GGUF (Advanced) nodes
  • Enhanced GGUF Support: Comprehensive support for 12 quantization levels (Q2_K to F16)
  • llama_cpp_install folder: Complete installation guides and automated scripts for llama-cpp-python
  • Simplified Installation: One-click llama-cpp-python installation with automatic CUDA support
  • Cross-Platform Support: Windows, macOS, and Linux installation guides
  • Performance Improvements: Optimized model loading and memory management
  • 2025/06/15: Update ComfyUI-JoyCaption to v1.2.0 ( update.md )
  • Enhanced CUDA performance optimization
  • Improved memory management strategies
  • Added Global Cache mode for faster processing
  • 2025/06/07: Update ComfyUI-JoyCaption to v1.1.1 ( update.md )
  • 2025/06/05: Update ComfyUI-JoyCaption to v1.1.0 ( update.md )
  • Add Caption tools
  • [](https://github.com/1038lab/ComfyUI-JoyCaption/blob/main/example_workflows/batch_image_text_output.json)

    Features

  • Simple and user-friendly interface
  • Multiple caption types support
  • Flexible length control
  • Memory optimization options
  • GGUF model support – Efficient quantized models for better performance and lower memory usage
  • Automatic model download – Models will be downloaded automatically and properly renamed on first use
  • Support for multiple caption types
  • Support for advanced customization options
  • Multiple precision options (fp32, bf16, fp16, 8-bit, 4-bit)
  • Enhanced memory management with Global Cache mode
  • Clean console output (no debug spam)
  • Robust error handling and parameter validation
  • llama_cpp_install folder – Complete installation guides and automated scripts for llama-cpp-python
  • Installation

  • Clone this repository to your ComfyUI/custom_nodes directory:
  • cd ComfyUI/custom_nodes
    
    git clone https://github.com/1038lab/ComfyUI-JoyCaption.git
    

  • Install the required dependencies:
  • cd ComfyUI/custom_nodes/ComfyUI-JoyCaption
    
    pip install -r requirements.txt
    

  • For GGUF Models (Recommended): Install llama-cpp-python with CUDA support:
  • # Option 1: Automated installation (Recommended)
    
    python llama_cpp_install/llama_cpp_install.py
    
    
    
    # Option 2: Manual installation
    
    pip install -r requirements_gguf.txt
    

    Installation Guides Available:

  • English: llama_cpp_install/llama_cpp_install.md
  • 中文: llama_cpp_install/llama_cpp_install_zh.md
  • Download Models

    The models will be automatically downloaded and renamed on first use, or you can manually download them:

    Standard Models (HuggingFace Format)

    | Model | Link | Memory Usage |

    | —– | —- | ———— |

    | JoyCaption Beta One | Download | ~8-16GB |

    | JoyCaption Alpha Two | Download | ~8-16GB |

    GGUF Models (Quantized, Recommended)

    | Model | Size | Memory Usage | Quality | Recommended For | Link |

    | —– | —- | ———— | ——- | ————— | —- |

    | JoyCaption Beta One (Q2_K) | 3.18GB | ~4GB | Good | Low VRAM (6GB+) | Download |

    | JoyCaption Beta One (Q3_K_S) | 3.66GB | ~5GB | Good+ | Budget systems (8GB+) | Download |

    | JoyCaption Beta One (Q3_K_M) | 4.02GB | ~5GB | Better | Balanced performance | Download |

    | JoyCaption Beta One (Q3_K_L) | 4.32GB | ~6GB | Better+ | Good quality/size ratio | Download |

    | JoyCaption Beta One (IQ4_XS) | 4.48GB | ~6GB | Very Good | Recommended (8GB+) | Download |

    | JoyCaption Beta One (Q4_K_S) | 4.69GB | ~6GB | Very Good | Quality focused | Download |

    | JoyCaption Beta One (Q4_K_M) | 4.92GB | ~7GB | Very Good+ | Balanced choice | Download |

    | JoyCaption Beta One (Q5_K_S) | 5.60GB | ~7GB | Excellent | High quality (10GB+) | Download |

    | JoyCaption Beta One (Q5_K_M) | 5.73GB | ~8GB | Excellent+ | Premium quality | Download |

    | JoyCaption Beta One (Q6_K) | 6.60GB | ~8GB | Near Original | Maximum quality (12GB+) | Download |

    | JoyCaption Beta One (Q8_0) | 8.54GB | ~10GB | Original- | Full precision alternative | Download |

    | JoyCaption Beta One (F16) | 16.1GB | ~18GB | Original | Full precision (24GB+) | Download |

    Alternative GGUF Sources

    | Model | Size | Source | Link |

    | —– | —- | —— | —- |

    | JoyCaption Beta One (Q4_K) | 4.92GB | concedo | Download |

    | JoyCaption Beta One (Q8_0) | 8.54GB | concedo | Download |

    | JoyCaption Beta One (F16) | 16.1GB | concedo | Download |

    Note: GGUF models also require the vision projection model:

  • llama-joycaption-beta-one-llava-mmproj-model-f16.gguf
  • After downloading, place the model files in your ComfyUI/models/LLM/GGUF directory.

    Basic Usage

    Standard Nodes (HuggingFace Models)

    Basic Node

  • Add the “JoyCaption” node from the 🧪AILab/📝JoyCaption category
  • Connect an image source to the node
  • Select the model file (defaults to llama-joycaption-beta-one-hf-llava)
  • Adjust the parameters as needed
  • Run the workflow
  • Advanced Node

  • Add the “JoyCaption (Advanced)” node from the 🧪AILab/📝JoyCaption category
  • Connect an image source to the node
  • Select the caption type
  • Adjust the parameters as needed
  • Run the workflow
  • GGUF Nodes (Recommended for Better Performance)

    GGUF Basic Node

  • Add the “JoyCaption GGUF” node from the 🧪AILab/📝JoyCaption-GGUF category
  • Connect an image source to the node
  • Select the GGUF model (e.g., “JoyCaption Beta One (IQ4_XS)”)
  • Choose processing mode (GPU/CPU)
  • Select caption style and length
  • Run the workflow
  • GGUF Advanced Node

  • Add the “JoyCaption GGUF (Advanced)” node from the 🧪AILab/📝JoyCaption-GGUF category
  • Connect an image source to the node
  • Configure all generation parameters (temperature, top_p, top_k, etc.)
  • Set custom prompts if needed
  • Run the workflow
  • Caption Tools

    Image Batch Path 🖼️

    This node allows you to load multiple images from a directory for batch processing:

    | Parameter | Description |

    | ——— | ———– |

    | image_dir | Directory path containing images |

    | batch_size | Number of images to load (0 = all images) |

    | start_from | Start from Nth image (1 = first image) |

    | sort_method | Image loading order: sequential/reverse/random |

    Caption Saver 📝

    This node saves generated captions to text files:

    | Parameter | Description |

    | ——— | ———– |

    | string | Caption text to save |

    | image_path | Path to the image being captioned |

    | image | (Optional) Image to save alongside the caption |

    | custom_output_path | (Optional) Custom output directory path |

    | custom_file_name | (Optional) Custom filename (without extension) |

    | overwrite | If true, will overwrite existing files; if false, will add a number to make the filename unique |

    Parameters

    Standard Nodes

    Basic Node

    | Parameter | Description | Default | Range |

    | ——— | ———– | ——- | —– |

    | Model | The JoyCaption model to use | llama-joycaption-beta-one-hf-llava | – |

    | Memory Control | Memory optimization settings | Default | Default (fp32), Balanced (8-bit), Maximum Savings (4-bit) |

    | Caption Type | Caption style selection | Descriptive | Descriptive, Descriptive (Casual), Straightforward, Tags, Technical, Artistic |

    | Caption Length | Output length control | medium | any, very short, short, medium, long, very long |

    GGUF Nodes

    GGUF Basic Node

    | Parameter | Description | Default | Range |

    | ——— | ———– | ——- | —– |

    | Model | GGUF model to use | JoyCaption Beta One (IQ4_XS) | Q2_K, IQ4_XS, Q5_K_M, Q6_K |

    | Processing Mode | Hardware acceleration | GPU | GPU, CPU |

    | Caption Type | Caption style selection | Descriptive | Descriptive, Descriptive (Casual), Straightforward, Tags, Technical, Artistic |

    | Caption Length | Output length control | any | any, very short, short, medium, long, very long |

    | Memory Management | Model memory handling | Keep in Memory | Keep in Memory, Clear After Run, Global Cache |

    GGUF Advanced Node

    | Parameter | Description | Default | Range |

    | ——— | ———– | ——- | —– |

    | Model | GGUF model to use | JoyCaption Beta One (IQ4_XS) | Q2_K, IQ4_XS, Q5_K_M, Q6_K |

    | Processing Mode | Hardware acceleration | GPU | GPU, CPU |

    | Max New Tokens | Maximum tokens to generate | 512 | 1-2048 |

    | Temperature | Generation randomness | 0.6 | 0.0-2.0 |

    | Top-p | Nucleus sampling | 0.9 | 0.0-1.0 |

    | Top-k | Top-k sampling | 0 | 0-100 |

    | Custom Prompt | Custom prompt template | “” | Any text |

    | Memory Management | Model memory handling | Keep in Memory | Keep in Memory, Clear After Run, Global Cache |

    Quantization Options

    Standard Models (HuggingFace)

    | Mode | Precision | Memory Usage | Speed | Quality | Recommended GPU |

    |——|———–|————–|——-|———|—————-|

    | Default | fp32 | ~16GB | 1x | Best | 24GB+ |

    | Default | bf16 | ~8GB | 1.5x | Excellent | 16GB+ |

    | Default | fp16 | ~8GB | 2x | Very Good | 16GB+ |

    | Balanced | 8-bit | ~4GB | 2.5x | Good | 12GB+ |

    | Maximum Savings | 4-bit | ~2GB | 3x | Acceptable | 8GB+ |

    GGUF Models (Recommended)

    | Quantization | Model Size | Memory Usage | Speed | Quality | Recommended GPU |

    |————-|————|————–|——-|———|—————-|

    | Q2_K | 3.18GB | ~4GB | 4x | Good | 6GB+ |

    | Q3_K_S | 3.66GB | ~5GB | 3.5x | Good+ | 8GB+ |

    | Q3_K_M | 4.02GB | ~5GB | 3.5x | Better | 8GB+ |

    | Q3_K_L | 4.32GB | ~6GB | 3x | Better+ | 8GB+ |

    | IQ4_XS | 4.48GB | ~6GB | 3x | Very Good | 8GB+ |

    | Q4_K_S | 4.69GB | ~6GB | 3x | Very Good | 8GB+ |

    | Q4_K_M | 4.92GB | ~7GB | 2.5x | Very Good+ | 10GB+ |

    | Q5_K_S | 5.60GB | ~7GB | 2.5x | Excellent | 10GB+ |

    | Q5_K_M | 5.73GB | ~8GB | 2.5x | Excellent+ | 12GB+ |

    | Q6_K | 6.60GB | ~8GB | 2x | Near Original | 12GB+ |

    | Q8_0 | 8.54GB | ~10GB | 1.8x | Original- | 16GB+ |

    | F16 | 16.1GB | ~18GB | 1.5x | Original | 24GB+ |

    Note: GGUF models provide better performance and lower memory usage compared to standard models. IQ4_XS offers the best balance of quality and efficiency.

    Advanced Node

    | Parameter | Description | Default | Range |

    | ——— | ———– | ——- | —– |

    | Extra Options | Additional feature options | [] | Multiple options |

    | Person Name | Name for person descriptions | “” | Any text |

    | Max New Tokens | Maximum tokens to generate | 512 | 1-2048 |

    | Temperature | Generation temperature | 0.6 | 0.0-2.0 |

    | Top-p | Sampling parameter | 0.9 | 0.0-1.0 |

    | Top-k | Top-k sampling | 0 | 0-100 |

    | Precision | Use fp32 for best quality, bf16 for balanced performance, fp16 for maximum speed. 8-bit and 4-bit quantization provide significant memory savings with minimal quality impact |

    Memory Management Options

    | Mode | Description | Recommended Use |

    |——|————-|—————–|

    | Global Cache | Keeps model in memory for fastest processing | High VRAM GPUs (16GB+) |

    | Keep in Memory | Maintains model in memory until workflow ends | Medium VRAM GPUs (8GB+) |

    | Clear After Run | Clears model after each generation | Low VRAM GPUs (<8GB) |

    Setting Tips

    | Setting | Recommendation |

    | ——- | ————– |

    | Model Choice | Use GGUF models for better performance and lower memory usage. IQ4_XS offers the best balance of quality and efficiency |

    | Memory Mode | Based on GPU VRAM: 24GB+ use Global Cache, 12GB+ use Keep in Memory, 8GB+ use Clear After Run |

    | Processing Mode | Use GPU mode for faster processing, CPU mode for systems with limited VRAM |

    | Input Resolution | The model works best with images of 512×512 or higher resolution |

    | Memory Usage | If you encounter memory issues, try GGUF Q2_K model or use Clear After Run mode |

    | Performance | For batch processing, consider using Global Cache mode with GGUF models if you have sufficient VRAM |

    | Temperature | Lower values (0.6-0.7) for more stable results, higher values (0.8-1.0) for more diverse results |

    | Top-k Parameter | Set to 0 to disable, or use values like 40-50 for more focused generation |

    About Model

    This implementation uses LLaVA-based image captioning models, optimized to provide fast and accurate image descriptions.

    Model features:

  • Free and open weights
  • Uncensored content coverage
  • Broad style diversity (digital art, photoreal, anime, etc.)
  • Fast processing with bfloat16 precision
  • High-quality caption generation
  • Memory-efficient operation
  • Consistent results across various image types
  • Support for multiple caption styles
  • Optimized for training diffusion models
  • The models are trained on diverse datasets, ensuring:

  • Balanced representation across different image types
  • High accuracy in various scenarios
  • Robust performance with complex scenes
  • Support for both SFW and NSFW content
  • Equal coverage of different styles and concepts
  • Roadmap

    ✅ Completed (v1.3.0)

  • ✅ GGUF model support for better performance
  • ✅ Support for 12 GGUF quantization formats (Q2_K to F16)
  • ✅ Enhanced error handling and parameter validation
  • ✅ Clean console output (no debug spam)
  • ✅ Improved memory management options
  • ✅ Batch processing support (Caption Tools)
  • ✅ Performance optimizations for CPU processing (GGUF CPU mode)
  • ✅ Enhanced batch processing features with memory management
  • 🔄 Future Plans

  • More caption style options and custom prompts
  • Advanced fine-tuning support for custom models
  • Real-time processing optimizations
  • Multi-language caption support
  • Integration with more vision-language models
  • Credits

  • Original JoyCaption Model: fancyfeast – Creator of the base JoyCaption models
  • GGUF Quantized Models: mradermacher – Provider of comprehensive GGUF quantized models (Q2_K to F16)
  • Alternative GGUF Models: concedo – Provider of alternative GGUF models with vision projection
  • ComfyUI Integration: 1038lab – Developer of this ComfyUI custom node
  • LLaVA Framework: Microsoft Research – Base vision-language model architecture
  • llama-cpp-python: abetlen – Python bindings for llama.cpp
  • License

    This repository’s code is released under the GPL-3.0 License.