ComfyUI_VLM_nodes

★ 567

视觉语言模型图像描述自动提示生成关键词提取

为ComfyUI提供视觉-语言模型与LLM自定义节点，自动生成图像描述、提示与关键词，提升提示创意与一致性

💡 自动为图像生成描述与创意提示，优化ComfyUI提示

🍴 59 Forks💻 Python🔄 2026-01-11

🔗 GitHub 原文

📦

网盘下载

复制链接后前往夸克网盘下载

https://pan.quark.cn/s/6862a2001521

📦 requirements.txt

accelerate>=1.0
bitsandbytes
cffi
decord
diffusers
>=0.31.0
diskcache
einops>=0.7.0
gitpython
huggingface-hub>=0.26.2
matplotlib
moviepy
numpy>=1.26.4,<2.0.0
openai>=0.27.8
opencv-python
optimum>=1.17.0
pillow>=9.4.0
py-cpuinfo>=3.3.0
python-dateutil>=2.7.0
pytz
qwen-vl-utils
safetensors>=0.4.1
scikit-build
six
soundfile
symusic
torch>=2.0.1
torchvision>=0.15.2
transformers>=4.46

📄 README

👁️ VLM Nodes

🔽Examples below •

📙 Visit my other repo to learn more about Vision Language Models •

🔍 Compare VLM outputs side-by-side with DualView

Usage

For Windows and Linux

cd custom_nodes
git clone https://github.com/gokayfem/ComfyUI_VLM_nodes.git

Acknowledgements

JAGS

EnragedAntelope

If you get errors related to llama-cpp-python or if it is not using GPU.

I recommend installing it with the right arguments provided in this link llama-cpp-python

Tools

| Tool | Description |

|——|————-|

| DualView | Free side-by-side comparison tool for VLM outputs, images, videos, and AI prompts |

VLM Nodes

Utilizes “llama-cpp-python“ for integration of LLaVa models. You can load and use any VLM with LLaVa models in GGUF format with this nodes.

You need to download the model similar to “ggml-model-q4_k.gguf` and it's clip projector similar to `mmproj-model-f16.gguf“ from this repositories (in the files and versions).