ComfyUI-Image-Captioner

★ 29

图像字幕本地VLM批量处理自然语言提示

在本地利用多种视觉语言模型（VLM）为图像生成可定制的文字描述，支持自然语言指令、批处理与标签化，数据不外传。

💡 为图片批量生成可定制的自然语言描述，便于整理与检索。

🍴 6 Forks💻 Python🔄 2025-05-12

📦

网盘下载

复制链接后前往夸克网盘下载

https://pan.quark.cn/s/e58c8376a81b

📦 requirements.txt

http
torch
dashscope
torchvision

📄 README

ComfyUI ImageCaptioner

A ComfyUI extension for generating captions for your images. Runs on your own system, no external services used, no filter.

Uses various VLMs with APIs to generate captions for images. You can give instructions or ask questions in natural language.

Try asking for:

captions or long descriptions

whether a person or object is in the image, and how many

lists of keywords or tags

a description of the opposite of the image

git clone https://github.com/neverbiasu/ComfyUI-ImageCaptioner into your custom_nodes folder

e.g. custom_nodes\ComfyUI-ImageCaptioner

Open a console/Command Prompt/Terminal etc

Change to the custom_nodes/ComfyUI-ImageCaptioner folder you just created

e.g. cd C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-ImageCaptioner or wherever you have it installed

Run pip install -r requirements.txt

Add the node via image -> ImageCaptioner

Supports tagging and outputting multiple batched inputs.

image: The image you want to make captions.

api: The API of dashscope.

use_prompt: The prompt to drive the VLMs.

U need to get the API of dashscope from the document