requirements.txt: transformers>=4.56.0, accelerate, pillow, torchvision, opencv-python-headless, huggingface_hub, psutil. Optional for the GGUF nodes: llama-cpp-python (installed separately with its own wheel index; see the GGUF install command below).

ComfyUI custom nodes for the Tencent Youtu-VL vision-language model.
Youtu-VL is a lightweight 4B-parameter VLM with broad vision-centric capabilities, including visual grounding, segmentation, depth estimation, and pose estimation.
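Weight memory scales with bits per weight, which is why quantized GGUF files need far less memory. The sketch below gives a rough estimate for a model of about 4B parameters using the formula parameters × bits per weight ÷ 8.

The bits-per-weight figures are assumptions. F16 is exact. Q8_0 follows its block layout. The Q4_K_M figure is only an approximate average, because that format mixes quantization types. The estimate also ignores the KV cache, activations, the exact vision-encoder size, and runtime overhead.

```python
# Rough weight-only memory estimate for a model at different precisions.
# Ignores KV cache, activations, and runtime overhead.
# Bits-per-weight values are approximations (assumed, not taken from this repo).
APPROX_BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,     # 32 int8 weights + one fp16 scale per block of 32
    "Q4_K_M": 4.85,  # approximate average; the real mix varies by tensor
}


def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Return approximate weight storage in decimal gigabytes (1e9 bytes)."""
    bits = APPROX_BITS_PER_WEIGHT[fmt]
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    for fmt in APPROX_BITS_PER_WEIGHT:
        print(f"{fmt:7s} ~{weight_memory_gb(4e9, fmt):.2f} GB")
```

For 4e9 parameters this gives about 8 GB at F16, about 4.25 GB at Q8_0, and roughly 2.4 GB at Q4_K_M. Actual file sizes and VRAM use will differ somewhat.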
Two inference backends are available: transformers (high precision) or llama.cpp (high speed, low VRAM).
Install via ComfyUI Manager: search for ComfyUI Youtu-VL in the Manager (Publisher: 1038lab) and click Install.
For a manual install, clone this repository into your custom_nodes folder:
cd ComfyUI/custom_nodes/
git clone https://github.com/1038lab/ComfyUI-Youtu-VL.git
cd ComfyUI-Youtu-VL
pip install -r requirements.txt
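After pip install -r requirements.txt, you can confirm the environment by checking that each listed package can be found and that transformers meets the 4.56.0 minimum. This is a minimal sketch that uses only the standard library.

The mapping from package names to import names is an assumption based on each package's usual import name (for example, pillow → PIL and opencv-python-headless → cv2). It is not provided by this repo. Run the check with the same Python interpreter that launches ComfyUI.

```python
import importlib.metadata
import importlib.util
import re

# requirements.txt entry -> import name (assumed from each package's usual import name).
REQUIRED = {
    "transformers": "transformers",
    "accelerate": "accelerate",
    "pillow": "PIL",
    "torchvision": "torchvision",
    "opencv-python-headless": "cv2",
    "huggingface_hub": "huggingface_hub",
    "psutil": "psutil",
}
MIN_TRANSFORMERS = (4, 56, 0)


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric release, e.g. '4.56.1' -> (4, 56, 1)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def missing_modules() -> list:
    """Return requirements.txt entries whose module cannot be located."""
    return [dist for dist, mod in REQUIRED.items() if importlib.util.find_spec(mod) is None]


def transformers_ok() -> bool:
    """True if transformers is installed at version 4.56.0 or newer."""
    try:
        installed = importlib.metadata.version("transformers")
    except importlib.metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= MIN_TRANSFORMERS


if __name__ == "__main__":
    print("Missing:", missing_modules() or "none")
    print("transformers>=4.56.0:", transformers_ok())
```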
To use the GGUF nodes for faster inference:
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
*(Replace cu121 with the tag for your CUDA version, e.g., cu118, or use metal on macOS.)*
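The wheel index path ends in a tag derived from the CUDA version. A small helper like the hypothetical wheel_tag below (not part of this repo) turns a version string such as "12.1" into "cu121", or returns "metal" for macOS. It then builds the matching install command. A prebuilt wheel is not guaranteed to exist for every tag, so check the index before relying on it.

```python
from typing import Optional

BASE_URL = "https://abetlen.github.io/llama-cpp-python/whl"


def wheel_tag(cuda_version: Optional[str], macos: bool = False) -> str:
    """Map a CUDA version like '12.1' to 'cu121'; return 'metal' on macOS."""
    if macos:
        return "metal"
    if not cuda_version:
        raise ValueError("CUDA version required (e.g. '12.1') unless macos=True")
    parts = cuda_version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected 'major.minor', got {cuda_version!r}")
    # Only major and minor matter for the tag: '12.1.105' -> 'cu121'.
    return f"cu{int(parts[0])}{int(parts[1])}"


def pip_command(cuda_version: Optional[str], macos: bool = False) -> str:
    """Build the llama-cpp-python install command for the given platform."""
    return (
        "pip install llama-cpp-python --extra-index-url "
        f"{BASE_URL}/{wheel_tag(cuda_version, macos)}"
    )


if __name__ == "__main__":
    print(pip_command("12.1"))
```

If PyTorch is installed, torch.version.cuda reports the CUDA version PyTorch was built with (for example "12.1", or None on CPU-only builds). That is a reasonable value to pass in.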
Transformers backend: *Best for precision and research.*
… models/LLM/Youtu-VL).

GGUF (llama.cpp) backend: *Best for speed and daily usage.*
… .gguf files (Q4_K_M, Q8_0, F16).

The nodes include smart presets (in config.json) to automate common tasks:
| Preset Mode | Description |
| :--- | :--- |
| 📝 Detailed Description | Writes a full paragraph describing lighting, composition, and subjects. |
| 🔍 Analyze Elements | Lists key objects and layout details. |
| 🏷️ Generate Tags | Creates comma-separated tags (Danbooru style). |
| 📄 OCR Text | Reads and outputs visible text. |
| 🎨 Art Style | Identifies medium, artist style, and technique. |
| ❓ Visual QA | Answers custom questions such as “What color is the…?” |
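Conceptually, each preset maps a menu label to the prompt sent to the model, and Visual QA substitutes your own question. The sketch below assumes a hypothetical config.json layout: a presets object that maps each label to its prompt text, with a {question} placeholder.

The actual file in this repo may be structured differently. The prompt texts and the build_prompt helper are illustrative only.

```python
import json

# Hypothetical config.json layout; the repo's real schema may differ.
EXAMPLE_CONFIG = """
{
  "presets": {
    "📝 Detailed Description": "Describe this image in one detailed paragraph covering lighting, composition, and subjects.",
    "🏷️ Generate Tags": "List comma-separated Danbooru-style tags for this image.",
    "❓ Visual QA": "{question}"
  }
}
"""


def build_prompt(config: dict, preset: str, question: str = "") -> str:
    """Resolve a preset label to a prompt, filling '{question}' when present."""
    template = config["presets"][preset]
    if "{question}" in template:
        if not question.strip():
            raise ValueError(f"preset {preset!r} needs a custom question")
        return template.replace("{question}", question.strip())
    return template


config = json.loads(EXAMPLE_CONFIG)
print(build_prompt(config, "❓ Visual QA", "What color is the car?"))  # prints: What color is the car?
```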
… llama-cpp-python is installed successfully.
… Beta/ folder.
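The GGUF nodes need llama-cpp-python, and a minimal standard-library check can confirm that it is actually importable. Run it with the interpreter that launches ComfyUI. A package installed into a different Python environment will not be visible to the nodes.

```python
import importlib.metadata
import importlib.util


def llama_cpp_status() -> str:
    """Report whether the optional llama-cpp-python package is importable."""
    if importlib.util.find_spec("llama_cpp") is None:
        return "llama-cpp-python not found: install it into ComfyUI's Python environment"
    try:
        version = importlib.metadata.version("llama-cpp-python")
    except importlib.metadata.PackageNotFoundError:
        version = "unknown version"
    return f"llama-cpp-python found ({version})"


if __name__ == "__main__":
    print(llama_cpp_status())
```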