ComfyUI-NvidiaCaptioner

★ 1

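Image captioning · Batch processing · NVIDIA models · Built-in cache

Generates rich, detailed text descriptions for images using NVIDIA vision models, with support for batch processing, multiple styles, rate limiting, and built-in caching.

💡 Generate multi-style, high-quality text descriptions for large sets of images, for indexing or copywriting.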

🍴 1 Fork · 💻 Python · 🔄 2025-12-10

🔗 Original on GitHub


📄 README

NVIDIA Captioner for ComfyUI

A powerful ComfyUI node for generating rich, detailed captions for images using NVIDIA’s vision models. This node allows batch processing of images with customizable prompts and supports various captioning styles.


Features

🖼️ Batch process multiple images with a single click

🎨 Multiple captioning styles (detailed, concise, product-focused, etc.)

⚡ Optimized for performance with rate limiting

🔄 Built-in caching to avoid reprocessing the same images

📊 Progress tracking for batch operations

🔍 Case-insensitive image filtering

🎭 Support for custom system prompts
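
The caching and case-insensitive filtering features can be pictured with a small sketch. This is not the node's actual code: the extension set, the `list_images` and `CaptionCache` names, and the JSON cache layout are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path

# Assumed set of extensions; the node's real list may differ.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def list_images(directory):
    """Return image files in `directory`, matching extensions case-insensitively."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


class CaptionCache:
    """Hypothetical JSON-backed cache keyed by a hash of the image bytes."""

    def __init__(self, cache_file):
        self.cache_file = Path(cache_file)
        if self.cache_file.exists():
            self.entries = json.loads(self.cache_file.read_text(encoding="utf-8"))
        else:
            self.entries = {}

    @staticmethod
    def key_for(image_path):
        # Hashing the content means a renamed file still hits the cache.
        return hashlib.sha256(Path(image_path).read_bytes()).hexdigest()

    def get(self, image_path):
        """Return the cached caption, or None if the image has not been seen."""
        return self.entries.get(self.key_for(image_path))

    def put(self, image_path, caption):
        """Store the caption and persist the whole cache to disk."""
        self.entries[self.key_for(image_path)] = caption
        self.cache_file.write_text(json.dumps(self.entries, indent=2), encoding="utf-8")
```

Keying on file content rather than file name is one possible design choice; a cache keyed on path and modification time would be cheaper to compute but would miss renamed files.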

Installation

Navigate to your ComfyUI custom_nodes directory:

```bash
cd ComfyUI/custom_nodes
```

Clone this repository:

```bash
git clone https://github.com/theshubzworld/ComfyUI-NvidiaCaptioner.git
```

Install the required dependencies:

```bash
pip install -r ComfyUI-NvidiaCaptioner/requirements.txt
```

Restart ComfyUI.

Usage

Add the NVIDIA Captioner node to your workflow from the NVIDIA/Vision category.

Configure the node settings:

Image Directory: Folder containing images to process

API Key: Your NVIDIA API key

Model: Select the vision model to use

Prompt Style: Choose from various captioning styles

Use Cache: Toggle to skip already processed images

Connect the output to any text node or save the captions to a file.
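
Saving captions to a file could look roughly like the sketch below, which writes each caption to a `.txt` file next to its image. The helper names `sidecar_path` and `save_caption` are hypothetical, and the `skip_existing` flag only mirrors the idea behind the `skip_existing_txt` input; the node's own file handling may differ.

```python
from pathlib import Path


def sidecar_path(image_path):
    """Return the .txt path that sits next to the image (photo.jpg -> photo.txt)."""
    return Path(image_path).with_suffix(".txt")


def save_caption(image_path, caption, skip_existing=False):
    """Write the caption beside the image; return True if a file was written."""
    target = sidecar_path(image_path)
    if skip_existing and target.exists():
        return False  # keep the caption that is already on disk
    target.write_text(caption.strip() + "\n", encoding="utf-8")
    return True
```

Sidecar files with the same base name as the image are a common layout for captioned image sets, which is why the sketch uses it.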

Node Configuration

Inputs

image_directory: Path to directory containing images to process

api_key: Your NVIDIA API key

model: The vision model to use (default: "nvidia/vision")

system_prompt_preset: Predefined prompt styles

custom_system_prompt: Custom prompt (overrides preset if provided)

prompt: Instruction for the model

use_cache: Skip already processed images (default: True)

skip_existing_txt: Skip images with existing .txt files (default: False)

max_tokens: Maximum tokens in response (default: 300)

temperature: Sampling temperature (default: 0.2)

top_p: Nucleus sampling parameter (default: 0.7)

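frequency_penalty: Penalize tokens in proportion to how often they have already appeared (default: 0.0)

presence_penalty: Penalize tokens that have already appeared at least once, which nudges the model toward new wording (default: 0.0)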

max_retries: Maximum retry attempts (default: 3)

retry_delay: Delay between retries in seconds (default: 2.0)
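
To make the interaction between `max_retries` and `retry_delay` concrete, here is a minimal retry loop. It is a sketch, not the node's implementation: `call_with_retries` and `request_fn` are hypothetical names, and it assumes `max_retries` counts retries after the first attempt.

```python
import time


def call_with_retries(request_fn, max_retries=3, retry_delay=2.0, sleep=time.sleep):
    """Call request_fn(); on failure wait retry_delay seconds and try again.

    Assumption: max_retries counts retries after the first attempt, so up to
    max_retries + 1 calls are made. The node may count attempts differently.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < max_retries:
                sleep(retry_delay)  # fixed delay, no backoff
    raise RuntimeError(
        f"request failed after {max_retries + 1} attempts"
    ) from last_error
```

With the defaults listed above, a request that keeps failing would be tried four times with a two-second pause between attempts under this reading of the parameters.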

Outputs

all_captions: Concatenated captions for all processed images

last_caption: Caption for the most recently processed image
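
The relationship between the two outputs can be sketched as follows. The `build_outputs` name, the `filename: caption` line format, and the blank-line separator are assumptions for illustration; the README does not specify how the node joins captions.

```python
def build_outputs(captions_by_file, separator="\n\n"):
    """Return (all_captions, last_caption) from an ordered {filename: caption} mapping."""
    if not captions_by_file:
        return "", ""
    # dicts preserve insertion order, so the last value is the most recent caption.
    blocks = [f"{name}: {caption}" for name, caption in captions_by_file.items()]
    last_caption = list(captions_by_file.values())[-1]
    return separator.join(blocks), last_caption
```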

Example Workflow

Load a batch of images using the Load Batch node

Connect to the NVIDIA Captioner node

Configure the captioning settings

Save or use the generated captions in your workflow

License

This project is licensed under the MIT License – see the LICENSE file for details.

Support

For issues and feature requests, please open an issue.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

ComfyUI for the amazing node-based UI

NVIDIA for their powerful vision models