ComfyUI-Ovis-U1

★ 4

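Multimodal model wrapper · Device and dtype selection · ComfyUI custom nodes
ComfyUI nodes that wrap the Ovis-U1 multimodal model, providing three workflows (understanding, generation, and editing) with device and dtype selection to simplify model deployment and invocation.
💡 Quickly call Ovis-U1 from within ComfyUI to run multimodal understanding, generation, and editing workflows.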
🍴 1 Fork · 💻 Python · 🔄 2025-12-29
📦 Netdisk download
Copy the link below, then open it in Quark Netdisk to download:
https://pan.quark.cn/s/e58c8376a81b
📦 requirements.txt
pyarrow~=18.0.0
accelerate~=1.1.0
pydantic~=2.8.2
markdown2[all]
scikit-learn~=1.2.2
requests
httpx
uvicorn
fastapi~=0.112.4
timm~=1.0.11
tiktoken
transformers_stream_generator~=0.0.4
pandas
deepspeed~=0.15.4
pysubs2~=1.7.2
trl~=0.12.1
moviepy~=1.0.3
diffusers~=0.31.0
[Workflow images: Text to Image workflow · Image to Text workflow · Image Edit workflow]
📄 README

ComfyUI-Ovis-U1

This repository adds ComfyUI custom nodes that wrap the Ovis-U1 multimodal model, exposing three primary workflows inside the ComfyUI editor.

Features

  • Unified model support (understanding, generation, editing) through simple ComfyUI nodes
  • Options for device and dtype selection when loading the model (BF16 / FP16 / FP32)
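The device and dtype options can be pictured as a small lookup from the three precision labels to torch dtype names. The following is a minimal sketch, not the node's actual code: the function names and the CPU fallback policy are assumptions made for illustration.

```python
# Maps the three precision labels from the feature list to torch dtype names.
DTYPE_NAMES = {"BF16": "bfloat16", "FP16": "float16", "FP32": "float32"}


def resolve_dtype_name(choice: str, device: str) -> str:
    """Return the torch dtype attribute name for a precision label and device."""
    key = choice.strip().upper()
    if key not in DTYPE_NAMES:
        raise ValueError(
            f"unknown dtype {choice!r}; expected one of {sorted(DTYPE_NAMES)}"
        )
    # Illustrative policy (an assumption, not documented node behavior):
    # FP16 on CPU is poorly supported for many ops, so fall back to FP32.
    if device == "cpu" and key == "FP16":
        return "float32"
    return DTYPE_NAMES[key]


def to_torch_dtype(name: str):
    """Convert a dtype name such as 'bfloat16' into the torch.dtype object."""
    import torch  # lazy import so the mapping works without torch installed

    return getattr(torch, name)
```

With PyTorch installed, `to_torch_dtype(resolve_dtype_name("BF16", "cuda"))` yields `torch.bfloat16`, which could then be passed to a model loader.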
Model download

The nodes use the AIDC-AI/Ovis-U1-3B model from Hugging Face. You can either:

  • Place the model under models/ovis/AIDC-AI/Ovis-U1-3B manually (preferred for offline use), or
  • Use the Ovis-U1 Model Loader node with automatic download enabled.

Sharded weights and safetensors index files are supported. For large models, ensure sufficient disk space and GPU memory.
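The two options above (manual placement or automatic download) can be sketched as a "use the local copy if present, otherwise download" helper. This is a hedged illustration rather than the loader node's implementation: the helper names, the `comfy_root` parameter, and the presence check (a `config.json` plus at least one `.safetensors` file) are assumptions. The download step uses `snapshot_download` from the `huggingface_hub` package.

```python
from pathlib import Path

REPO_ID = "AIDC-AI/Ovis-U1-3B"


def local_model_dir(comfy_root: str) -> Path:
    """Expected model folder, assuming models/ is relative to the ComfyUI root."""
    return Path(comfy_root) / "models" / "ovis" / REPO_ID


def is_model_present(model_dir: Path) -> bool:
    """Heuristic check (assumption): a config plus at least one weight shard."""
    has_config = (model_dir / "config.json").is_file()
    has_weights = any(model_dir.glob("*.safetensors"))
    return has_config and has_weights


def ensure_model(comfy_root: str, auto_download: bool = False) -> Path:
    """Return the model folder, downloading it only when explicitly allowed."""
    model_dir = local_model_dir(comfy_root)
    if is_model_present(model_dir):
        return model_dir
    if not auto_download:
        raise FileNotFoundError(f"model not found under {model_dir}")
    # Lazy import: huggingface_hub is only needed for the download path.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=REPO_ID, local_dir=str(model_dir))
    return model_dir
```

Keeping `auto_download` off by default mirrors the offline-first preference stated above: nothing touches the network unless the caller opts in.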

Installation

Clone this repository into your ComfyUI custom_nodes directory and install dependencies:

    cd ComfyUI/custom_nodes
    git clone https://github.com/neverbiasu/ComfyUI-Ovis-U1.git
    cd ComfyUI-Ovis-U1
    pip install -r requirements.txt
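After installation, it can be useful to confirm that the installed packages satisfy the `~=` pins in requirements.txt. The sketch below is not part of the repository. It uses only the standard library and a simplified reading of the compatible-release rule that handles plain numeric versions, so treat its verdicts as a quick sanity check.

```python
import re
from importlib import metadata

# Matches lines such as "timm~=1.0.11", "markdown2[all]" or "requests".
REQ_LINE = re.compile(r"^([A-Za-z0-9_.\-]+)(?:\[[^\]]*\])?(?:~=([0-9.]+))?$")


def parse_requirement(line: str):
    """Return (name, pinned_version_or_None) for one requirements line."""
    match = REQ_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unsupported requirement line: {line!r}")
    return match.group(1), match.group(2)


def _numeric(version: str):
    """Leading numeric components, e.g. '2.1.0+cu121' -> [2, 1, 0]."""
    head = re.match(r"\d+(?:\.\d+)*", version)
    return [int(p) for p in head.group(0).split(".")] if head else []


def is_compatible(installed: str, pinned: str) -> bool:
    """Simplified '~=' check: same prefix as the pin, and not older than it."""
    want, have = _numeric(pinned), _numeric(installed)
    have += [0] * (len(want) - len(have))  # pad short versions with zeros
    prefix = want[:-1]
    return have[: len(prefix)] == prefix and have >= want


def check_requirements(lines):
    """Map each package name to 'ok', 'missing', or a mismatch message."""
    report = {}
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        name, pinned = parse_requirement(line)
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = pinned is None or is_compatible(installed, pinned)
        report[name] = "ok" if ok else f"installed {installed}, wanted ~={pinned}"
    return report
```

Calling `check_requirements(open("requirements.txt").read().splitlines())` from the repository folder returns a dictionary that highlights missing or out-of-range packages.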

Workflows

Text-to-Image Generation

Generate high-quality images from natural language prompts. This workflow shows the minimal node chain to go from prompt to final image.
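To make "minimal node chain" concrete, here is a rough sketch of such a chain in ComfyUI's API-format graph, where each node has a `class_type` and `inputs`, and a link is written as `[source_node_id, output_index]`. The two Ovis `class_type` strings and all of their input names are hypothetical placeholders, since the README does not list the real identifiers. `SaveImage` is ComfyUI's built-in save node.

```python
import json


def build_text_to_image_graph(prompt: str) -> dict:
    """Loader -> text-to-image -> save, expressed as an API-format graph."""
    return {
        "1": {
            "class_type": "OvisU1ModelLoader",  # HYPOTHETICAL node class name
            "inputs": {"device": "cuda", "dtype": "BF16"},  # hypothetical inputs
        },
        "2": {
            "class_type": "OvisU1TextToImage",  # HYPOTHETICAL node class name
            "inputs": {"model": ["1", 0], "prompt": prompt},  # hypothetical inputs
        },
        "3": {
            "class_type": "SaveImage",  # built-in ComfyUI node
            "inputs": {"images": ["2", 0], "filename_prefix": "ovis_u1"},
        },
    }


def dangling_links(graph: dict) -> list:
    """Return (node_id, input_name) pairs whose link targets a missing node."""
    bad = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            is_link = isinstance(value, list) and len(value) == 2
            if is_link and value[0] not in graph:
                bad.append((node_id, name))
    return bad


graph = build_text_to_image_graph("a red fox in fresh snow")
payload = json.dumps({"prompt": graph})  # body shape for ComfyUI's /prompt endpoint
```

Checking for dangling links before submitting a graph catches the most common hand-editing mistake, a link that points at a node id that no longer exists.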

Image Understanding (Image → Text)

Perform captioning, visual question answering, and scene understanding. This workflow is suitable for extracting structured descriptions from images.

Image Editing Workflow

Edit images by following text instructions. The node implements the official three-step conditional flow (unconditional, image-only, final conditioned) to produce high-quality edits.
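One common way to combine three such predictions is two-scale classifier-free guidance, as used in InstructPix2Pix-style editing. The sketch below illustrates that general technique on plain lists of floats. Whether the node combines its three passes in exactly this way is an assumption, and the parameter names `image_scale` and `text_scale` are hypothetical.

```python
from typing import List, Sequence


def combine_guidance(
    uncond: Sequence[float],
    image_only: Sequence[float],
    full: Sequence[float],
    image_scale: float,
    text_scale: float,
) -> List[float]:
    """Blend three model predictions with separate image and text scales.

    uncond      prediction with neither image nor text conditioning
    image_only  prediction conditioned on the input image only
    full        prediction conditioned on both image and instruction
    """
    if not (len(uncond) == len(image_only) == len(full)):
        raise ValueError("all three predictions must have the same length")
    return [
        # Start from the unconditional output, push toward the image,
        # then push further in the direction the instruction adds.
        u + image_scale * (i - u) + text_scale * (f - i)
        for u, i, f in zip(uncond, image_only, full)
    ]
```

With both scales set to 1 the formula collapses to the fully conditioned prediction. Raising `text_scale` strengthens adherence to the instruction, while raising `image_scale` keeps the result closer to the source image.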

License

This project is licensed under the Apache 2.0 License. Please refer to the official license terms for the use of the Ovis-U1 model.