A ComfyUI custom node that generates rich, detailed captions for images using NVIDIA's vision models. It processes images in batches with customizable prompts and supports several captioning styles through prompt presets.
To install, navigate to your ComfyUI `custom_nodes` directory:

```bash
cd ComfyUI/custom_nodes
```

Clone this repository:

```bash
git clone https://github.com/theshubzworld/ComfyUI-NvidiaCaptioner.git
```

Install the dependencies:

```bash
pip install -r ComfyUI-NvidiaCaptioner/requirements.txt
```
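Before pointing the node's `image_directory` input at a folder, it can help to see which images still lack a caption file, since `skip_existing_txt` skips images that already have a `.txt` file. The shell function below is a minimal sketch and is not part of this node: the name `list_uncaptioned`, the set of image extensions it checks, and the assumption that a caption for `photo.png` is stored as `photo.txt` in the same directory are all hypothetical.

```bash
# Hypothetical helper (not part of ComfyUI-NvidiaCaptioner): print the names of
# images in a directory that have no sidecar caption file.
# Assumption: a caption for photo.png would be stored as photo.txt next to it.
list_uncaptioned() {
  local dir="$1" img restore
  # Save the caller's glob settings so they can be restored afterwards.
  restore="$(shopt -p nullglob nocaseglob || true)"
  # nullglob: patterns with no matches expand to nothing.
  # nocaseglob: match extensions case-insensitively (e.g. .JPG).
  shopt -s nullglob nocaseglob
  for img in "$dir"/*.png "$dir"/*.jpg "$dir"/*.jpeg "$dir"/*.webp; do
    # Strip the extension and check for <stem>.txt beside the image.
    if [[ ! -f "${img%.*}.txt" ]]; then
      printf '%s\n' "${img##*/}"
    fi
  done
  # Put nullglob/nocaseglob back to whatever the caller had.
  eval "$restore"
}
```

After sourcing the function, run `list_uncaptioned /path/to/images`; it prints one file name per line and leaves the caller's `nullglob` and `nocaseglob` settings unchanged.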
After installation, the node appears in ComfyUI under the NVIDIA/Vision category.

Inputs:

- `image_directory`: Path to the directory containing the images to process
- `api_key`: Your NVIDIA API key
- `model`: The vision model to use (default: `"nvidia/vision"`)
- `system_prompt_preset`: Predefined prompt style
- `custom_system_prompt`: Custom system prompt (overrides the preset if provided)
- `prompt`: Instruction for the model
- `use_cache`: Skip images that have already been processed (default: `True`)
- `skip_existing_txt`: Skip images that already have a `.txt` file (default: `False`)
- `max_tokens`: Maximum number of tokens in each response (default: `300`)
- `temperature`: Sampling temperature; lower values give more deterministic output (default: `0.2`)
- `top_p`: Nucleus sampling threshold (default: `0.7`)
- `frequency_penalty`: Penalizes tokens in proportion to how often they have already appeared (default: `0.0`)
- `presence_penalty`: Penalizes tokens that have already appeared at all, encouraging new content (default: `0.0`)
- `max_retries`: Maximum number of retry attempts for a failed request (default: `3`)
- `retry_delay`: Delay between retries, in seconds (default: `2.0`)

Outputs:

- `all_captions`: Concatenated captions for all processed images
- `last_caption`: Caption for the most recently processed image

This project is licensed under the MIT License; see the LICENSE file for details.
For issues and feature requests, please open an issue.
Contributions are welcome! Please feel free to submit a Pull Request.