ComfyUI-Image-Captioner

ComfyUI-Image-Captioner
★ 29

图像字幕本地VLM批量处理自然语言提示
在本地利用多种视觉语言模型(VLM)为图像生成可定制的文字描述,支持自然语言指令、批处理与标签化,数据不外传。
💡 为图片批量生成可定制的自然语言描述,便于整理与检索。
🍴 6 Forks💻 Python🔄 2025-05-12
📦
网盘下载
复制链接后前往夸克网盘下载
https://pan.quark.cn/s/e58c8376a81b
📦 requirements.txt
http
torch
dashscope
torchvision
workflow
📄 README

ComfyUI ImageCaptioner

A ComfyUI extension for generating captions for your images. Runs on your own system, no external services used, no filter.

Uses various VLMs with APIs to generate captions for images. You can give instructions or ask questions in natural language.

Try asking for:

  • captions or long descriptions
  • whether a person or object is in the image, and how many
  • lists of keywords or tags
  • a description of the opposite of the image
  • Installation

  • git clone https://github.com/neverbiasu/ComfyUI-ImageCaptioner into your custom_nodes folder
  • e.g. custom_nodes\ComfyUI-ImageCaptioner
  • Open a console/Command Prompt/Terminal etc
  • Change to the custom_nodes/ComfyUI-ImageCaptioner folder you just created
  • e.g. cd C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-ImageCaptioner or wherever you have it installed
  • Run pip install -r requirements.txt
  • Usage

    Add the node via image -> ImageCaptioner

    Supports tagging and outputting multiple batched inputs.

  • image: The image you want to make captions.
  • api: The API of dashscope.
  • use_prompt: The prompt to drive the VLMs.
  • Requirements

    U need to get the API of dashscope from the document

    See also

  • ComfyUI-WD14-Tagger
  • ComfyUI-LLaVA-Captioner
  • IELTSDuck