# GGUF backend requirements (vision-capable llama-cpp-python).
# Follow the platform-specific install guide:
#   docs/LLAMA_CPP_PYTHON_VISION_INSTALL.md
#
# Linux (CUDA) example:
#   pip install --upgrade --force-reinstall --no-cache-dir \
#     llama-cpp-python==<version> --extra-index-url <vision-wheel-index>
#
# Windows example:
#   1) Install a vision-capable wheel from the guide.
#   2) Then verify with:
#      python -c "import llama_cpp; print(llama_cpp.__version__)"
#
# Placeholder only; do not rely on this line alone:
# llama-cpp-python


The ComfyUI-QwenVL custom node integrates the powerful Qwen-VL series of vision-language models (LVLMs) from Alibaba Cloud, including the latest Qwen3-VL and Qwen2.5-VL, plus GGUF backends and text-only Qwen3 support. This advanced node enables seamless multimodal AI capabilities within your ComfyUI workflows, allowing for efficient text generation, image understanding, and video analysis.
> [!IMPORTANT]
> Install llama-cpp-python before running the GGUF nodes. See docs/LLAMA_CPP_PYTHON_VISION_INSTALL.md for instructions.
[Example workflow: QWenVL.json](https://github.com/1038lab/ComfyUI-QwenVL/blob/main/example_workflows/QWenVL.json)
```
cd ComfyUI/custom_nodes
git clone https://github.com/1038lab/ComfyUI-QwenVL.git
```
```
cd ComfyUI/custom_nodes/ComfyUI-QwenVL
pip install -r requirements.txt
```
For optimal performance on supported GPUs, install SageAttention:

`pip install sageattention`

This repo includes GGUF nodes powered by llama-cpp-python (separate from the Transformers-based nodes).

- Nodes: `QwenVL (GGUF)`, `QwenVL (GGUF Advanced)`, `QwenVL Prompt Enhancer (GGUF)`
- Model folder: `ComfyUI/models/llm/GGUF/` (configurable via `gguf_models.json`)
- Vision requirement: a `llama-cpp-python` wheel that provides `Qwen3VLChatHandler` / `Qwen25VLChatHandler`
- See `docs/LLAMA_CPP_PYTHON_VISION_INSTALL.md`
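
To check whether an installed wheel meets the vision requirement above, a small probe can look for the two handler classes. This is a minimal sketch: the module path `llama_cpp.llama_chat_format` is an assumption about where the wheel keeps its chat handlers, and `find_vision_handlers` is a hypothetical helper, not part of this repo.

```python
import importlib

# Handler class names taken from the requirement stated above.
HANDLER_NAMES = ("Qwen3VLChatHandler", "Qwen25VLChatHandler")


def find_vision_handlers(module_name="llama_cpp.llama_chat_format"):
    """Report which vision chat handlers the given module exposes.

    The default module path is an assumption; adjust it if your wheel
    places the handlers elsewhere.
    """
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError, RuntimeError):
        # Package missing, or its native library failed to load.
        return {name: False for name in HANDLER_NAMES}
    return {name: hasattr(module, name) for name in HANDLER_NAMES}


if __name__ == "__main__":
    for name, present in find_vision_handlers().items():
        print(f"{name}: {'available' if present else 'missing'}")
```

If both handlers are reported missing, the installed wheel is probably not vision-capable and the install guide applies.
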

- `hf_models.json`
  - `hf_vl_models`: vision-language models (used by QwenVL nodes).
  - `hf_text_models`: text-only models (used by Prompt Enhancer).
- `gguf_models.json`
- `AILab_System_Prompts.json` (includes both VL prompts and prompt-enhancer styles).

The models will be automatically downloaded on first use. If you prefer to download them manually, place them in the `ComfyUI/models/LLM/Qwen-VL/` directory.
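
For manual downloads, a quick check that a model folder is in place can avoid an unexpected re-download. The sketch below assumes each model sits in a subfolder named after the model and that a non-empty folder counts as present; both are illustrative assumptions, and `manual_model_dir` and `is_model_present` are hypothetical helpers.

```python
from pathlib import Path


def manual_model_dir(comfyui_root, model_name):
    """Build the manual-download location for a model.

    Assumption: one subfolder per model, named after the model.
    """
    return Path(comfyui_root) / "models" / "LLM" / "Qwen-VL" / model_name


def is_model_present(comfyui_root, model_name):
    """Treat a model as present if its folder exists and is not empty."""
    model_dir = manual_model_dir(comfyui_root, model_name)
    return model_dir.is_dir() and any(model_dir.iterdir())


if __name__ == "__main__":
    print(manual_model_dir("ComfyUI", "Qwen3-VL-4B-Instruct"))
    print(is_model_present("ComfyUI", "Qwen3-VL-4B-Instruct"))
```
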

| Model | Link |
| :---- | :---- |
| Qwen3-VL-2B-Instruct | Download |
| Qwen3-VL-2B-Thinking | Download |
| Qwen3-VL-2B-Instruct-FP8 | Download |
| Qwen3-VL-2B-Thinking-FP8 | Download |
| Qwen3-VL-4B-Instruct | Download |
| Qwen3-VL-4B-Thinking | Download |
| Qwen3-VL-4B-Instruct-FP8 | Download |
| Qwen3-VL-4B-Thinking-FP8 | Download |
| Qwen3-VL-8B-Instruct | Download |
| Qwen3-VL-8B-Thinking | Download |
| Qwen3-VL-8B-Instruct-FP8 | Download |
| Qwen3-VL-8B-Thinking-FP8 | Download |
| Qwen3-VL-32B-Instruct | Download |
| Qwen3-VL-32B-Thinking | Download |
| Qwen3-VL-32B-Instruct-FP8 | Download |
| Qwen3-VL-32B-Thinking-FP8 | Download |
| Qwen2.5-VL-3B-Instruct | Download |
| Qwen2.5-VL-7B-Instruct | Download |

| Model | Link |
| :---- | :---- |
| Qwen3-0.6B | Download |
| Qwen3-4B-Instruct-2507 | Download |
| qwen3-4b-Z-Image-Engineer | Download |

| Group | Model | Repo | Alt Repo | Model Files | MMProj |
| :-- | :-- | :-- | :-- | :-- | :-- |
| Qwen text (GGUF) | Qwen3-4B-GGUF | Qwen/Qwen3-4B-GGUF | | Qwen3-4B-Q4_K_M.gguf, Qwen3-4B-Q5_0.gguf, Qwen3-4B-Q5_K_M.gguf, Qwen3-4B-Q6_K.gguf, Qwen3-4B-Q8_0.gguf | |
| Qwen-VL (GGUF) | Qwen3-VL-4B-Instruct-GGUF | Qwen/Qwen3-VL-4B-Instruct-GGUF | | Qwen3VL-4B-Instruct-F16.gguf, Qwen3VL-4B-Instruct-Q4_K_M.gguf, Qwen3VL-4B-Instruct-Q8_0.gguf | mmproj-Qwen3VL-4B-Instruct-F16.gguf |
| Qwen-VL (GGUF) | Qwen3-VL-8B-Instruct-GGUF | Qwen/Qwen3-VL-8B-Instruct-GGUF | | Qwen3VL-8B-Instruct-F16.gguf, Qwen3VL-8B-Instruct-Q4_K_M.gguf, Qwen3VL-8B-Instruct-Q8_0.gguf | mmproj-Qwen3VL-8B-Instruct-F16.gguf |
| Qwen-VL (GGUF) | Qwen3-VL-4B-Thinking-GGUF | Qwen/Qwen3-VL-4B-Thinking-GGUF | | Qwen3VL-4B-Thinking-F16.gguf, Qwen3VL-4B-Thinking-Q4_K_M.gguf, Qwen3VL-4B-Thinking-Q8_0.gguf | mmproj-Qwen3VL-4B-Thinking-F16.gguf |
| Qwen-VL (GGUF) | Qwen3-VL-8B-Thinking-GGUF | Qwen/Qwen3-VL-8B-Thinking-GGUF | | Qwen3VL-8B-Thinking-F16.gguf, Qwen3VL-8B-Thinking-Q4_K_M.gguf, Qwen3VL-8B-Thinking-Q8_0.gguf | mmproj-Qwen3VL-8B-Thinking-F16.gguf |
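
Each Qwen-VL GGUF entry in the table pairs a quantized model file with an F16 `mmproj` file. The sketch below reproduces the file-name pattern visible in those rows; the pattern is inferred from the table only, and `gguf_file_names` is a hypothetical helper, so check the repo listing before relying on a generated name.

```python
def gguf_file_names(size, variant, quant):
    """Return (model_file, mmproj_file) following the table's naming pattern.

    size:    e.g. "4B" or "8B"
    variant: e.g. "Instruct" or "Thinking"
    quant:   e.g. "Q4_K_M", "Q8_0", or "F16"
    """
    base = f"Qwen3VL-{size}-{variant}"
    # The table lists a single F16 mmproj file per model, whatever the quant.
    return f"{base}-{quant}.gguf", f"mmproj-{base}-F16.gguf"


if __name__ == "__main__":
    model_file, mmproj_file = gguf_file_names("4B", "Instruct", "Q4_K_M")
    print(model_file)
    print(mmproj_file)
```
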
For more control, use the “QwenVL (Advanced)” node. This gives you access to detailed generation parameters like temperature, top\_p, beam search, and device selection.

| Parameter | Description | Default | Range | Node(s) |
| :---- | :---- | :---- | :---- | :---- |
| model\_name | The Qwen-VL model to use. | Qwen3-VL-4B-Instruct | \- | Standard & Advanced |
| quantization | On-the-fly quantization. Ignored for pre-quantized models (e.g., FP8). | 8-bit (Balanced) | 4-bit, 8-bit, None | Standard & Advanced |
| attention\_mode | Attention mechanism: auto (Sage→Flash→SDPA), sage, flash\_attention\_2, sdpa | auto | auto, sage, flash\_attention\_2, sdpa | Standard & Advanced |
| preset\_prompt | A selection of pre-defined prompts for common tasks. | “Describe this…” | Any text | Standard & Advanced |
| custom\_prompt | Overrides the preset prompt if provided. | | Any text | Standard & Advanced |
| max\_tokens | Maximum number of new tokens to generate. | 1024 | 64-2048 | Standard & Advanced |
| keep\_model\_loaded | Keep the model in VRAM for faster subsequent runs. | True | True/False | Standard & Advanced |
| seed | A seed for reproducible results. | 1 | 1 \- 2^64-1 | Standard & Advanced |
| temperature | Controls randomness. Higher values \= more creative. (Used when num\_beams is 1). | 0.6 | 0.1-1.0 | Advanced Only |
| top\_p | Nucleus sampling threshold. (Used when num\_beams is 1). | 0.9 | 0.0-1.0 | Advanced Only |
| num\_beams | Number of beams for beam search. \> 1 disables temperature/top\_p sampling. | 1 | 1-10 | Advanced Only |
| repetition\_penalty | Discourages repeating tokens. | 1.2 | 0.0-2.0 | Advanced Only |
| frame\_count | Number of frames to sample from the video input. | 16 | 1-64 | Advanced Only |
| device | Override automatic device selection. | auto | auto, cuda, cpu | Advanced Only |
| use\_torch\_compile | Enable torch.compile optimization for faster inference. | False | True/False | Advanced Only |
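
The interaction between `num_beams`, `temperature`, and `top_p` described in the table can be sketched as a small function that builds keyword arguments in the style of Hugging Face `generate()`. This is an illustration under assumptions, not the node's actual code: `build_generation_kwargs` is a hypothetical helper, and the mapping of `max_tokens` to `max_new_tokens` is assumed.

```python
def build_generation_kwargs(max_tokens=1024, temperature=0.6, top_p=0.9,
                            num_beams=1, repetition_penalty=1.2):
    """Build generate()-style kwargs from the node parameters.

    Defaults mirror the table above. Sampling settings are included only
    when num_beams is 1; beam search (num_beams > 1) disables them.
    """
    if num_beams < 1:
        raise ValueError("num_beams must be at least 1")
    kwargs = {
        "max_new_tokens": max_tokens,
        "num_beams": num_beams,
        "repetition_penalty": repetition_penalty,
    }
    if num_beams > 1:
        kwargs["do_sample"] = False
    else:
        kwargs.update(do_sample=True, temperature=temperature, top_p=top_p)
    return kwargs


if __name__ == "__main__":
    print(build_generation_kwargs())
    print(build_generation_kwargs(num_beams=4))
```
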

| Mode | Precision | Memory Usage | Speed | Quality | Recommended For |
| :---- | :---- | :---- | :---- | :---- | :---- |
| None (FP16) | 16-bit Float | High | Fastest | Best | High VRAM GPUs (16GB+) |
| 8-bit (Balanced) | 8-bit Integer | Medium | Fast | Very Good | Balanced performance (8GB+) |
| 4-bit (VRAM-friendly) | 4-bit Integer | Low | Slower\* | Good | Low VRAM GPUs (<8GB) |
\* Note on 4-bit Speed: 4-bit quantization significantly reduces VRAM usage but may result in slower performance on some systems due to the computational overhead of real-time dequantization.
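
The VRAM guidance in the table can be expressed as a simple lookup. The sketch below uses the table's thresholds (16GB+ and 8GB+) as hard cut-offs, which is a simplification; `recommend_quantization` is a hypothetical helper and real headroom depends on the model size and other loaded nodes.

```python
def recommend_quantization(vram_gb):
    """Map available VRAM (in GB) to a quantization mode from the table."""
    if vram_gb >= 16:
        return "None (FP16)"
    if vram_gb >= 8:
        return "8-bit (Balanced)"
    return "4-bit (VRAM-friendly)"


if __name__ == "__main__":
    for gb in (6, 12, 24):
        print(gb, "GB ->", recommend_quantization(gb))
```
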

| Mode | Description | Best For |
| :---- | :---- | :---- |
| auto | Automatically selects best available: Sage → Flash → SDPA | Most users (recommended) |
| sage | SageAttention with GPU-optimized kernels | Speed on modern GPUs (RTX 40 series, Hopper, Blackwell) |
| flash\_attention\_2 | Flash Attention 2 | Speed when Sage unavailable |
| sdpa | PyTorch's built-in scaled dot-product attention (SDPA) | Compatibility, FP8/BitsAndBytes models |
Note: FP8 models and BitsAndBytes quantization automatically use SDPA regardless of selection.
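
The `auto` fallback order (Sage → Flash → SDPA) and the forced-SDPA rule for FP8 and BitsAndBytes models can be sketched as follows. This is a hedged illustration, not the node's implementation: detecting backends by importable module names (`sageattention`, `flash_attn`) is an assumption, as is falling back to `sdpa` when an explicitly requested backend is unavailable.

```python
import importlib.util

# Preference order used by "auto", as listed in the table above.
AUTO_ORDER = ("sage", "flash_attention_2", "sdpa")


def detect_available_backends():
    """Guess which backends are usable from the installed packages."""
    available = {"sdpa"}  # assumed always present with PyTorch
    if importlib.util.find_spec("sageattention") is not None:
        available.add("sage")
    if importlib.util.find_spec("flash_attn") is not None:
        available.add("flash_attention_2")
    return available


def resolve_attention_mode(requested="auto", available=None, forces_sdpa=False):
    """Pick the attention mode to use.

    forces_sdpa: True for FP8 models or BitsAndBytes quantization, which
    use SDPA regardless of the selection.
    """
    if available is None:
        available = detect_available_backends()
    if forces_sdpa:
        return "sdpa"
    if requested == "auto":
        for mode in AUTO_ORDER:
            if mode in available:
                return mode
        return "sdpa"
    # Assumption: an unavailable explicit choice falls back to SDPA.
    return requested if requested in available else "sdpa"


if __name__ == "__main__":
    print(resolve_attention_mode("auto"))
```
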

| Setting | Recommendation |
| :---- | :---- |
| Model Choice | For most users, Qwen3-VL-4B-Instruct is a great starting point. If you have a 40-series GPU, try the \-FP8 version for better performance. |
| Memory Mode | Keep keep\_model\_loaded enabled (True) for the best performance if you plan to run the node multiple times. Disable it only if you are running out of VRAM for other nodes. |
| Quantization | Start with the default 8-bit. If you have plenty of VRAM (>16GB), switch to None (FP16) for the best speed and quality. If you are low on VRAM, use 4-bit. |
| Attention Mode | Use “auto” for best performance. SageAttention provides the fastest inference on supported GPUs. |
| Performance | The first time a model is loaded with a specific quantization, it may be slow. Subsequent runs (with keep\_model\_loaded enabled) will be much faster. |
This node utilizes the Qwen-VL series of models, developed by the Qwen Team at Alibaba Cloud. These are powerful, open-source large vision-language models (LVLMs) designed to understand and process both visual and textual information, making them ideal for tasks like detailed image and video description.
This repository’s code is released under the GPL-3.0 License.