ComfyUI-QwenVL

ComfyUI-QwenVL
★ 710

多模态AI图像与视频理解文本生成自动模型下载
集成Qwen-VL系列(含Qwen2.5-VL与Qwen3-VL),在ComfyUI中提供图像/视频理解与文本生成的多模态节点,支持预设提示、进阶控制、自动下载与量化。
💡 在ComfyUI中用Qwen-VL对图像或视频做多模态分析并生成文本描述。
🍴 93 Forks💻 Python🔄 2026-02-10
📦
网盘下载
复制链接后前往夸克网盘下载
https://pan.quark.cn/s/e98a62d17551
📦 requirements.txt
#
GGUF
backend
requirements
(vision-capable
llama-cpp-python).
#
Follow
the
platform-specific
install
guide:
#
docs/LLAMA_CPP_PYTHON_VISION_INSTALL.md
#
#
Linux
(CUDA)
example:
#
pip
install
--upgrade
--force-reinstall
--no-cache-dir
\
#
llama-cpp-python==<version>
--extra-index-url
<vision-wheel-index>
#
#
Windows
example:
#
1)
Install
a
vision-capable
wheel
from
the
guide.
#
2)
Then
verify
with:
#
python
-c
"import
llama_cpp;
print(llama_cpp.__version__)"
#
#
Placeholder
only;
do
not
rely
on
this
line
alone:
#
llama-cpp-python
QwenVL_V1.1.0
600346260_122188475918461193_3763807942053883496_n
📄 README

QwenVL for ComfyUI

The ComfyUI-QwenVL custom node integrates the powerful Qwen-VL series of vision-language models (LVLMs) from Alibaba Cloud, including the latest Qwen3-VL and Qwen2.5-VL, plus GGUF backends and text-only Qwen3 support. This advanced node enables seamless multimodal AI capabilities within your ComfyUI workflows, allowing for efficient text generation, image understanding, and video analysis.

📰 News & Updates

  • 2026/02/08: v2.1.1 Fixed compatibility for Transformers 4.x and 5.x [Update]
  • 2026/02/05: v2.1.0 Added SageAttention support with per-GPU architecture optimization, improved FP8 model handling, and automatic attention mode selection. [Update]
  • SageAttention Support: New attention mode with per-GPU optimized kernels (SM80, SM89, SM90, SM120)
  • Improved FP8 Handling: Better support for pre-quantized FP8 models with automatic SDPA fallback
  • Smart Attention Selection: Auto mode now tries Sage → Flash → SDPA for optimal performance
  • Progress Bar: Added ComfyUI progress bar for model loading and generation stages
  • Better Memory Management: Improved cache clearing when changing attention modes or quantization
  • 2025/12/22: v2.0.0 Added GGUF supported nodes and Prompt Enhancer nodes. [Update]
  • [!IMPORTANT]

    Install llama-cpp-python before running GGUF nodes instruction

  • 2025/11/10: v1.1.0 Runtime overhaul with attention-mode selector, flash-attn auto detection, smarter caching, and quantization/torch.compile controls in both nodes. [Update]
  • 2025/10/31: v1.0.4 Custom Models Supported [Update]
  • 2025/10/22: v1.0.3 Models list updated [Update]
  • 2025/10/17: v1.0.0 Initial Release
  • Support for Qwen3-VL and Qwen2.5-VL series models.
  • Automatic model downloading from Hugging Face.
  • On-the-fly quantization (4-bit, 8-bit, FP16).
  • Preset and Custom Prompt system for flexible and easy use.
  • Includes both a standard and an advanced node for users of all levels.
  • Hardware-aware safeguards for FP8 model compatibility.
  • Image and Video (frame sequence) input support.
  • “Keep Model Loaded” option for improved performance on sequential runs.
  • Seed parameter for reproducible generation.
  • [](https://github.com/1038lab/ComfyUI-QwenVL/blob/main/example_workflows/QWenVL.json)

    ✨ Features

  • Standard & Advanced Nodes: Includes a simple QwenVL node for quick use and a QwenVL (Advanced) node with fine-grained control over generation.
  • Prompt Enhancers: Dedicated text-only prompt enhancers for both HF and GGUF backends.
  • Preset & Custom Prompts: Choose from a list of convenient preset prompts or write your own for full control.
  • Multi-Model Support: Easily switch between various official Qwen-VL models.
  • Automatic Model Download: Models are downloaded automatically on first use.
  • Smart Quantization: Balance VRAM and performance with 4-bit, 8-bit, and FP16 options.
  • Hardware-Aware: Automatically detects GPU capabilities and prevents errors with incompatible models (e.g., FP8).
  • Reproducible Generation: Use the seed parameter to get consistent outputs.
  • Memory Management: “Keep Model Loaded” option to retain the model in VRAM for faster processing.
  • Image & Video Support: Accepts both single images and video frame sequences as input.
  • Robust Error Handling: Provides clear error messages for hardware or memory issues.
  • Clean Console Output: Minimal and informative console logs during operation.
  • SageAttention Support: GPU-optimized attention mechanism with per-architecture kernels (Ampere, Ada, Hopper, Blackwell).
  • Progress Bar: Visual feedback during model loading and generation stages.
  • Intelligent Cache Management: Automatically clears VRAM when changing attention modes or quantization settings.
  • 🚀 Installation

  • Clone this repository to your ComfyUI/custom\_nodes directory:
  • “`

    cd ComfyUI/custom\_nodes

    git clone https://github.com/1038lab/ComfyUI-QwenVL.git

    “`

  • Install the required dependencies:
  • “`

    cd ComfyUI/custom\_nodes/ComfyUI-QwenVL

    pip install \-r requirements.txt

    “`

  • Restart ComfyUI.
  • Optional: SageAttention Support

    For optimal performance on supported GPUs, install SageAttention:

    pip install sageattention

    🧭 Node Overview

    Transformers (HF) Nodes

  • QwenVL: Quick vision-language inference (image/video + preset/custom prompts).
  • QwenVL (Advanced): Full control over sampling, device, and performance settings.
  • QwenVL Prompt Enhancer: Text-only prompt enhancement (supports both Qwen3 text models and QwenVL models in text mode).
  • GGUF (llama.cpp) Nodes

  • QwenVL (GGUF): GGUF vision-language inference.
  • QwenVL (GGUF Advanced): Extended GGUF controls (context, GPU layers, etc.).
  • QwenVL Prompt Enhancer (GGUF): GGUF text-only prompt enhancement.
  • 🧩 GGUF Nodes (llama.cpp backend)

    This repo includes GGUF nodes powered by llama-cpp-python (separate from the Transformers-based nodes).

  • Nodes: QwenVL (GGUF), QwenVL (GGUF Advanced), QwenVL Prompt Enhancer (GGUF)
  • Model folder (default): ComfyUI/models/llm/GGUF/ (configurable via gguf_models.json)
  • Vision requirement: install a vision-capable llama-cpp-python wheel that provides Qwen3VLChatHandler / Qwen25VLChatHandler
  • See docs/LLAMA_CPP_PYTHON_VISION_INSTALL.md

    🗂️ Config Files

  • HF models: hf_models.json
  • hf_vl_models: vision-language models (used by QwenVL nodes).
  • hf_text_models: text-only models (used by Prompt Enhancer).
  • GGUF models: gguf_models.json
  • System prompts: AILab_System_Prompts.json (includes both VL prompts and prompt-enhancer styles).
  • 📥 Download Models

    The models will be automatically downloaded on first use. If you prefer to download them manually, place them in the ComfyUI/models/LLM/Qwen-VL/ directory.

    HF Vision Models (Qwen-VL)

    | Model | Link |

    | :—- | :—- |

    | Qwen3-VL-2B-Instruct | Download |

    | Qwen3-VL-2B-Thinking | Download |

    | Qwen3-VL-2B-Instruct-FP8 | Download |

    | Qwen3-VL-2B-Thinking-FP8 | Download |

    | Qwen3-VL-4B-Instruct | Download |

    | Qwen3-VL-4B-Thinking | Download |

    | Qwen3-VL-4B-Instruct-FP8 | Download |

    | Qwen3-VL-4B-Thinking-FP8 | Download |

    | Qwen3-VL-8B-Instruct | Download |

    | Qwen3-VL-8B-Thinking | Download |

    | Qwen3-VL-8B-Instruct-FP8 | Download |

    | Qwen3-VL-8B-Thinking-FP8 | Download |

    | Qwen3-VL-32B-Instruct | Download |

    | Qwen3-VL-32B-Thinking | Download |

    | Qwen3-VL-32B-Instruct-FP8 | Download |

    | Qwen3-VL-32B-Thinking-FP8 | Download |

    | Qwen2.5-VL-3B-Instruct | Download |

    | Qwen2.5-VL-7B-Instruct | Download |

    HF Text Models (Qwen3)

    | Model | Link |

    | :—- | :—- |

    | Qwen3-0.6B | Download |

    | Qwen3-4B-Instruct-2507 | Download |

    | qwen3-4b-Z-Image-Engineer | Download |

    GGUF Models (Manual Download)

    | Group | Model | Repo | Alt Repo | Model Files | MMProj |

    | :– | :– | :– | :– | :– | :– |

    | Qwen text (GGUF) | Qwen3-4B-GGUF | Qwen/Qwen3-4B-GGUF | | Qwen3-4B-Q4_K_M.gguf, Qwen3-4B-Q5_0.gguf, Qwen3-4B-Q5_K_M.gguf, Qwen3-4B-Q6_K.gguf, Qwen3-4B-Q8_0.gguf | |

    | Qwen-VL (GGUF) | Qwen3-VL-4B-Instruct-GGUF | Qwen/Qwen3-VL-4B-Instruct-GGUF | | Qwen3VL-4B-Instruct-F16.gguf, Qwen3VL-4B-Instruct-Q4_K_M.gguf, Qwen3VL-4B-Instruct-Q8_0.gguf | mmproj-Qwen3VL-4B-Instruct-F16.gguf |

    | Qwen-VL (GGUF) | Qwen3-VL-8B-Instruct-GGUF | Qwen/Qwen3-VL-8B-Instruct-GGUF | | Qwen3VL-8B-Instruct-F16.gguf, Qwen3VL-8B-Instruct-Q4_K_M.gguf, Qwen3VL-8B-Instruct-Q8_0.gguf | mmproj-Qwen3VL-8B-Instruct-F16.gguf |

    | Qwen-VL (GGUF) | Qwen3-VL-4B-Thinking-GGUF | Qwen/Qwen3-VL-4B-Thinking-GGUF | | Qwen3VL-4B-Thinking-F16.gguf, Qwen3VL-4B-Thinking-Q4_K_M.gguf, Qwen3VL-4B-Thinking-Q8_0.gguf | mmproj-Qwen3VL-4B-Thinking-F16.gguf |

    | Qwen-VL (GGUF) | Qwen3-VL-8B-Thinking-GGUF | Qwen/Qwen3-VL-8B-Thinking-GGUF | | Qwen3VL-8B-Thinking-F16.gguf, Qwen3VL-8B-Thinking-Q4_K_M.gguf, Qwen3VL-8B-Thinking-Q8_0.gguf | mmproj-Qwen3VL-8B-Thinking-F16.gguf |

    📖 Usage

    Basic Usage

  • Add the “QwenVL” node from the 🧪AILab/QwenVL category.
  • Select the model\_name you wish to use.
  • Connect an image or video (image sequence) source to the node.
  • Write your prompt using the preset or custom field.
  • Run the workflow.
  • Advanced Usage

    For more control, use the “QwenVL (Advanced)” node. This gives you access to detailed generation parameters like temperature, top\_p, beam search, and device selection.

    ⚙️ Parameters

    | Parameter | Description | Default | Range | Node(s) |

    | :—- | :—- | :—- | :—- | :—- |

    | model\_name | The Qwen-VL model to use. | Qwen3-VL-4B-Instruct | \- | Standard & Advanced |

    | quantization | On-the-fly quantization. Ignored for pre-quantized models (e.g., FP8). | 8-bit (Balanced) | 4-bit, 8-bit, None | Standard & Advanced |

    | attention\_mode | Attention mechanism: auto (Sage→Flash→SDPA), sage, flash\_attention\_2, sdpa | auto | auto, sage, flash\_attention\_2, sdpa | Standard & Advanced |

    | preset\_prompt | A selection of pre-defined prompts for common tasks. | “Describe this…” | Any text | Standard & Advanced |

    | custom\_prompt | Overrides the preset prompt if provided. | | Any text | Standard & Advanced |

    | max\_tokens | Maximum number of new tokens to generate. | 1024 | 64-2048 | Standard & Advanced |

    | keep\_model\_loaded | Keep the model in VRAM for faster subsequent runs. | True | True/False | Standard & Advanced |

    | seed | A seed for reproducible results. | 1 | 1 \- 2^64-1 | Standard & Advanced |

    | temperature | Controls randomness. Higher values \= more creative. (Used when num\_beams is 1). | 0.6 | 0.1-1.0 | Advanced Only |

    | top\_p | Nucleus sampling threshold. (Used when num\_beams is 1). | 0.9 | 0.0-1.0 | Advanced Only |

    | num\_beams | Number of beams for beam search. \> 1 disables temperature/top\_p sampling. | 1 | 1-10 | Advanced Only |

    | repetition\_penalty | Discourages repeating tokens. | 1.2 | 0.0-2.0 | Advanced Only |

    | frame\_count | Number of frames to sample from the video input. | 16 | 1-64 | Advanced Only |

    | device | Override automatic device selection. | auto | auto, cuda, cpu | Advanced Only |

    | use\_torch\_compile | Enable torch.compile optimization for faster inference. | False | True/False | Advanced Only |

    💡 Quantization Options

    | Mode | Precision | Memory Usage | Speed | Quality | Recommended For |

    | :—- | :—- | :—- | :—- | :—- | :—- |

    | None (FP16) | 16-bit Float | High | Fastest | Best | High VRAM GPUs (16GB+) |

    | 8-bit (Balanced) | 8-bit Integer | Medium | Fast | Very Good | Balanced performance (8GB+) |

    | 4-bit (VRAM-friendly) | 4-bit Integer | Low | Slower\* | Good | Low VRAM GPUs (<8GB) |

    \* Note on 4-bit Speed: 4-bit quantization significantly reduces VRAM usage but may result in slower performance on some systems due to the computational overhead of real-time dequantization.

    🎯 Attention Mode Guide

    | Mode | Description | Best For |

    | :—- | :—- | :—- |

    | auto | Automatically selects best available: Sage → Flash → SDPA | Most users (recommended) |

    | sage | SageAttention with GPU-optimized kernels | Speed on modern GPUs (RTX 40 series, Hopper, Blackwell) |

    | flash\_attention\_2 | Flash Attention 2 | Speed when Sage unavailable |

    | sdpa | PyTorch SDPA (default) | Compatibility, FP8/BitsAndBytes models |

    Note: FP8 models and BitsAndBytes quantization automatically use SDPA regardless of selection.

    🤔 Setting Tips

    | Setting | Recommendation |

    | :—- | :—- |

    | Model Choice | For most users, Qwen3-VL-4B-Instruct is a great starting point. If you have a 40-series GPU, try the \-FP8 version for better performance. |

    | Memory Mode | Keep keep\_model\_loaded enabled (True) for the best performance if you plan to run the node multiple times. Disable it only if you are running out of VRAM for other nodes. |

    | Quantization | Start with the default 8-bit. If you have plenty of VRAM (>16GB), switch to None (FP16) for the best speed and quality. If you are low on VRAM, use 4-bit. |

    | Attention Mode | Use “auto” for best performance. SageAttention provides fastest inference on supported GPUs. |

    | Performance | The first time a model is loaded with a specific quantization, it may be slow. Subsequent runs (with keep\_model\_loaded enabled) will be much faster. |

    🧠 About Model

    This node utilizes the Qwen-VL series of models, developed by the Qwen Team at Alibaba Cloud. These are powerful, open-source large vision-language models (LVLMs) designed to understand and process both visual and textual information, making them ideal for tasks like detailed image and video description.

    🗺️ Roadmap

    ✅ Completed (v2.1.0)

  • ✅ SageAttention support with per-GPU architecture optimization
  • ✅ Improved FP8 model handling with automatic SDPA fallback
  • ✅ Smart attention selection (auto: Sage → Flash → SDPA)
  • ✅ Progress bar for model loading and generation
  • ✅ Better memory management and cache clearing
  • ✅ Completed (v2.0.0)

  • ✅ GGUF model support via llama.cpp backend
  • ✅ Prompt Enhancer nodes for text-only optimization
  • ✅ Completed (v1.0.0)

  • ✅ Support for Qwen3-VL and Qwen2.5-VL models.
  • ✅ Automatic model downloading and management.
  • ✅ On-the-fly 4-bit, 8-bit, and FP16 quantization.
  • ✅ Hardware compatibility checks for FP8 models.
  • ✅ Image and Video (frame sequence) input support.
  • 🙏 Credits

  • Qwen Team: Alibaba Cloud \- For developing and open-sourcing the powerful Qwen-VL models.
  • ComfyUI: comfyanonymous \- For the incredible and extensible ComfyUI platform.
  • llama-cpp-python: JamePeng/llama-cpp-python \- GGUF backend with vision support used by the GGUF nodes.
  • SageAttention: SageAttention \- Efficient attention implementation with GPU-optimized kernels.
  • ComfyUI Integration: 1038lab \- Developer of this custom node.
  • 📜 License

    This repository’s code is released under the GPL-3.0 License.