ComfyUI-OmniGen2

★ 122

多模态图像生成图像编辑文本引导

在ComfyUI中集成的OmniGen2节点，结合3B VLM与4B扩散模型，提供高效的多模态图像生成与基于文本+图像的编辑能力。

💡 使用文本与参考图像共同生成或编辑高质量图像。

🍴 11 Forks💻 Python🔄 2025-06-26

🔗 GitHub 原文

📦

网盘下载

复制链接后前往夸克网盘下载

https://pan.quark.cn/s/a9fb3a59e10c

📦 requirements.txt

torch>=2.6.0
torchvision>=0.21.0
timm
einops
accelerate
transformers>=4.51.3
diffusers
opencv-python-headless
scipy
wandb
matplotlib
Pillow
tqdm
omegaconf
python-dotenv
ninja
ipykernel
numpy

📄 README

ComfyUI-OmniGen2

ComfyUI-OmniGen2 is now available in ComfyUI, OmniGen2 is a powerful and efficient unified multimodal model. Its architecture is composed of two key components: a 3B Vision-Language Model (VLM) and a 4B diffusion model.

Installation

Make sure you have ComfyUI installed

Clone this repository into your ComfyUI’s custom_nodes directory:

cd ComfyUI/custom_nodes
git clone https://github.com/Yuan-ManX/ComfyUI-OmniGen2.git

Environment Setup

✅ Recommended Setup

# 1. Environment
cd ComfyUI-OmniGen2

# 2. (Optional) Create a clean Python environment
conda create -n omnigen2 python=3.11
conda activate omnigen2

# 3. Install dependencies
# 3.1 Install PyTorch (choose correct CUDA version)
pip install torch==2.6.0 torchvision --extra-index-url https://download.pytorch.org/whl/cu124

# 3.2 Install other required packages
pip install -r requirements.txt

# Note: Version 2.7.4.post1 is specified for compatibility with CUDA 12.4.
# Feel free to use a newer version if you use CUDA 12.6 or they fixed this compatibility issue.
pip install flash-attn==2.7.4.post1 --no-build-isolation

🌏 For users in Mainland China

# Install PyTorch from a domestic mirror
pip install torch==2.6.0 torchvision --index-url https://mirror.sjtu.edu.cn/pytorch-wheels/cu124

# Install other dependencies from Tsinghua mirror
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

# Note: Version 2.7.4.post1 is specified for compatibility with CUDA 12.4.
# Feel free to use a newer version if you use CUDA 12.6 or they fixed this compatibility issue.
pip install flash-attn==2.7.4.post1 --no-build-isolation -i https://pypi.tuna.tsinghua.edu.cn/simple

Model

Download Pretrained Models

OmniGen2, a multimodal generation model, model weights can be accessed in huggingface and modelscope.

💡 Usage Tips

To achieve optimal results with OmniGen2, you can adjust the following key hyperparameters based on your specific use case.

text_guidance_scale: Controls how strictly the output adheres to the text prompt (Classifier-Free Guidance).

image_guidance_scale: This controls how much the final image should resemble the input reference image.

The Trade-off: A higher value makes the output more faithful to the reference image’s structure and style, but it might ignore parts of your text prompt. A lower value (~1.5) gives the text prompt more influence.

Tip: For image editing task, we recommend to set it between 1.2 and 2.0; for in-context generateion task, a higher image_guidance_scale will maintian more details in input images, and we recommend to set it between 2.5 and 3.0.

max_pixels: Automatically resizes images when their total pixel count (width × height) exceeds this limit, while maintaining its aspect ratio. This helps manage performance and memory usage.

Tip: Default value is 1024*1024. You can reduce this value if you encounter memory issues.

max_input_image_side_length: Maximum side length for input images.

negative_prompt: Tell the model what you don’t want to see in the image.

Example: blurry, low quality, text, watermark

Tip: For the best results, try experimenting with different negative prompts. If you’re not sure, just use the default negative prompt.

enable_model_cpu_offload: Reduces VRAM usage by nearly 50% with a negligible impact on speed.

This is achieved by offloading the model weights to CPU RAM when they are not in use.

See: Model Offloading

enable_sequential_cpu_offload: Minimizes VRAM usage to less than 3GB, but at the cost of significantly slower performance.

This works by offloading the model in submodules and loading them onto the GPU sequentially as needed.

See: CPU Offloading

Some suggestions for improving generation quality:

Use high-resolution and high-quality images. Images that are too small or blurry will also result in low-quality output. We recommend ensuring that the input image size is greater than 512 whenever possible.

Provide detailed instructions. For in-context generation tasks, specify which elements from which image the model should use.

Try to use English as much as possible, as OmniGen2 currently performs better in English than in Chinese.