Comfyui-QwenEditUtils

★ 768

图像编辑Qwen兼容多参考图潜在空间编码

为Qwen图像编辑提供一组ComfyUI实用节点，支持最多3张参考图、VAE与VL独立缩放、单图输出与潜在空间编码，便于接入Qwen模型和自定义模板。

💡 在ComfyUI中用Qwen模型基于参考图进行精细图像编辑。

🍴 26 Forks💻 Python🔄 2026-01-24

🔗 GitHub 原文

📦

网盘下载

复制链接后前往夸克网盘下载

https://pan.quark.cn/s/e58c8376a81b

📄 README

Comfyui-QwenEditUtils

A collection of utility nodes for Qwen-based image editing in ComfyUI.

Note: This repository has been succeeded by ComfyUI-EditUtils, which provides enhanced capabilities with support for multiple AI models including Qwen and Flux2Klein. We recommend using the new repository for future projects.

Example

You can find complete ComfyUI workflow examples in the following files:

[qwen edit custom.json](qwen edit custom.json) – demonstrates advanced usage with full configuration options

[qwen edit custom mask.json](qwen edit custom mask.json) – demonstrates mask usage with full configuration options

Update Log

v2.0.5

Added mask option to configs nodes and custom node for improved image editing.

v2.0.3

Added QwenEditAdaptiveLongestEdge for easier to get the best longest edge of the image.

v2.0.0

Added TextEncodeQwenImageEditPlusCustom_lrzjason for highly customized image editing.

Added QwenEditConfigPreparer, QwenEditConfigJsonParser for creating image configurations.

Added QwenEditOutputExtractor for extracting outputs from the custom node.

Added QwenEditListExtractor for extracting items from lists.

Added CropWithPadInfo for cropping images with pad information.

Node

TextEncodeQwenImageEditPlus lrzjason

This node provides text encoding functionality with reference image support for Qwen-based image editing workflows. It allows you to encode prompts while incorporating up to 5 reference images for more controlled image generation.

Inputs

clip: The CLIP model to use for encoding

prompt: The text prompt to encode

vae (optional): The VAE model for image encoding

image1 (optional): First reference image for image editing

image2 (optional): Second reference image for image editing

image3 (optional): Third reference image for image editing

image4 (optional): Fourth reference image for image editing

image5 (optional): Fifth reference image for image editing

enable_resize (optional): Enable automatic resizing of the reference image for VAE encoding

enable_vl_resize (optional): Enable automatic resizing of the reference image for VL encoding

llama_template (optional): Custom Llama template for image description and editing instructions

Outputs

CONDITIONING: The encoded conditioning tensor

image1: The processed first reference image

image2: The processed second reference image

image3: The processed third reference image

image4: The processed fourth reference image

image5: The processed fifth reference image

LATENT: The encoded latent representation of the first reference image

Behavior

Encodes text prompts using CLIP with optional reference image guidance

Supports up to 5 reference images for complex editing tasks

Automatically resizes reference images to optimal dimensions for both VAE and VL encoding

Integrates with VAE models to encode reference images into latent space

Supports custom Llama templates for more precise image editing instructions

Processes images separately for VAE encoding (1024×1024) and VL encoding (384×384)

Returns individual processed images for more flexible workflow connections

TextEncodeQwenImageEditPlusAdvance lrzjason

This advanced node provides enhanced text encoding functionality with reference image support for Qwen-based image editing workflows. It offers more precise control over image resizing and supports flexible image input configurations with separate processing for VAE and VL encoding.

Inputs

clip: The CLIP model to use for encoding

prompt: The text prompt to encode

vae (optional): The VAE model for image encoding

vl_resize_image1 (optional): First reference image for VL encoding with resizing

vl_resize_image2 (optional): Second reference image for VL encoding with resizing

vl_resize_image3 (optional): Third reference image for VL encoding with resizing

not_resize_image1 (optional): First reference image without VL resizing

not_resize_image2 (optional): Second reference image without VL resizing

not_resize_image3 (optional): Third reference image without VL resizing

target_size (optional): Target size for VAE encoding (options: 1024, 1344, 1536, 2048, 768, 512)

target_vl_size (optional): Target size for VL encoding (default: 384)

upscale_method (optional): Method for upscaling images (options: “lanczos”, “bicubic”, “area”)

crop (optional): Cropping method (options: “center”, “disabled”)

instruction (optional): Custom instruction for image editing

Outputs

CONDITIONING: The encoded conditioning tensor

LATENT: The encoded latent representation of the first reference image

target_image1: The processed first target reference image

target_image2: The processed second target reference image

target_image3: The processed third target reference image

vl_resized_image1: The first VL-resized reference image

vl_resized_image2: The second VL-resized reference image

vl_resized_image3: The third VL-resized reference image

Behavior

Provides advanced text encoding with separate control over VAE and VL image processing

Supports 3 reference images with different resize behaviors

Offers multiple target size options for more flexible image processing

Maintains separate image outputs for VAE-encoded and VL-resized images

Provides enhanced upscale and crop controls for optimal image processing

Integrates with custom instructions for tailored image editing

TextEncodeQwenImageEditPlusPro lrzjason

This professional node provides the most flexible text encoding functionality with reference image support for Qwen-based image editing workflows. It offers fine-grained control over which images are VL-resized and includes a main image designation for focused conditioning.

Inputs

clip: The CLIP model to use for encoding

prompt: The text prompt to encode

vae (optional): The VAE model for image encoding

image1 (optional): First reference image for image editing

image2 (optional): Second reference image for image editing

image3 (optional): Third reference image for image editing

image4 (optional): Fourth reference image for image editing

image5 (optional): Fifth reference image for image editing

vl_resize_indexs (optional): Comma-separated indexes of images to apply VL resizing (default: “0,1,2”)

main_image_index (optional): Index of the main reference image for focused conditioning (default: 0)

target_size (optional): Target size for VAE encoding (options: 1024, 1344, 1536, 2048, 768, 512)

target_vl_size (optional): Target size for VL encoding (options: 384, 392)

upscale_method (optional): Method for upscaling images (options: “lanczos”, “bicubic”, “area”)

crop_method (optional): Cropping method (options: “pad”, “center”, “disabled”)

instruction (optional): Custom instruction for image editing

Outputs

CONDITIONING: The encoded conditioning tensor with all reference latents

LATENT: The encoded latent representation of the main reference image

image1: The processed first reference image

image2: The processed second reference image

image3: The processed third reference image

image4: The processed fourth reference image

image5: The processed fifth reference image

CONDITIONING: The encoded conditioning tensor with only the main reference latent

ANY: The pad information dictionary containing scaling and padding details

Behavior

Provides professional-level text encoding with maximum flexibility for image processing

Supports up to 5 reference images with configurable VL-resizing per image

Allows designation of a main reference image for focused conditioning

Offers separate conditioning outputs for full reference and main reference conditioning

Provides multiple target size options for more flexible image processing

Includes pad information for potential image cropping and scaling operations

Provides enhanced upscale and crop controls, including padding for image preservation

TextEncodeQwenImageEditPlusCustom_lrzjason

This node provides maximum flexibility for image editing workflows by allowing custom configurations for each reference image. Each image can have its own specific settings for both reference (VAE) and vision-language (VL) processing, enabling highly customized image editing scenarios.

Inputs

clip: The CLIP model to use for encoding

vae: The VAE model for image encoding

prompt: The text prompt to encode

configs: A list of configuration dictionaries for each image, containing:

image: The reference image for image editing

to_ref: Whether to include image in reference processing (VAE encoding)

ref_main_image: Whether this image is the main reference image for focused conditioning

ref_longest_edge: Target longest edge size for reference processing (default: 1024)

ref_crop: Cropping method for reference processing (options: “pad”, “center”, “disabled”)

ref_upscale: Upscaling method for reference processing (options: “lanczos”, “bicubic”, “area”)

to_vl: Whether to include image in vision-language processing

vl_resize: Whether to resize image for VL processing (default: True)

vl_target_size: Target size for VL processing (default: 384)

vl_crop: Cropping method for VL processing (options: “center”, “disabled”)

vl_upscale: Upscaling method for VL processing (options: “bicubic”, “area”, “lanczos”)

mask: Optional mask for the image to define region of interest for editing

return_full_refs_cond (optional): Whether to return conditioning with all reference images or only with the main reference (default: True)

instruction (optional): Custom instruction for image editing

Outputs

CONDITIONING: The encoded conditioning tensor (with all or only main reference based on return_full_refs_cond)

LATENT: The encoded latent representation of the main reference image

custom_output: A dictionary containing:

pad_info: Padding information dictionary containing scaling and padding details

full_refs_cond: Conditioning with all reference latents

main_ref_cond: Conditioning with only the main reference latent

main_image: The processed main reference image

vae_images: List of all processed VAE images

ref_latents: List of all reference latents

vl_images: List of all processed VL images

full_prompt: The complete prompt with image descriptions

llama_template: The applied system prompt template

mask: The processed mask used for region of interest

Behavior

Provides maximum flexibility by allowing per-image configurations for both reference and VL processing

Supports multiple reference images with different processing requirements simultaneously

Allows fine-grained control over scaling, cropping, and resizing for each image

Returns comprehensive output dictionary with all intermediate results

Integrates with custom instructions for tailored image editing

Provides both full reference and main reference conditioning outputs

QwenEditConfigPreparer

This helper node creates configuration objects for use with the TextEncodeQwenImageEditPlusCustom_lrzjason node. It allows you to define custom processing parameters for individual images.

Inputs

image: The reference image to configure

configs (optional): An existing list of configuration objects to append to

to_ref (optional): Whether to include image in reference processing (default: True)

ref_main_image (optional): Whether this image is the main reference image (default: True)

ref_longest_edge (optional): Target longest edge size for reference processing (default: 1024)

ref_crop (optional): Cropping method for reference processing (options: “pad”, “center”, “disabled”, default: “center”)

ref_upscale (optional): Upscaling method for reference processing (options: “lanczos”, “bicubic”, “area”, default: “lanczos”)

to_vl (optional): Whether to include image in vision-language processing (default: True)

vl_resize (optional): Whether to resize image for VL processing (default: True)

vl_target_size (optional): Target size for VL processing (default: 384)

vl_crop (optional): Cropping method for VL processing (options: “center”, “disabled”, default: “center”)

vl_upscale (optional): Upscaling method for VL processing (options: “bicubic”, “area”, “lanczos”, default: “bicubic”)

mask (optional): Optional mask for the image to define region of interest for editing

Outputs

configs: The updated list of configuration objects

config: The configuration object for the current image

Behavior

Creates configuration objects that define how each image should be processed

Allows appending to existing configuration lists

Provides default values for all configuration parameters

Output config list can be connected directly to TextEncodeQwenImageEditPlusCustom_lrzjason

QwenEditConfigJsonParser

This helper node creates configuration objects from JSON strings for use with the TextEncodeQwenImageEditPlusCustom_lrzjason node. It provides an alternative method to define configuration parameters.

Inputs

image: The reference image to configure

configs (optional): An existing list of configuration objects to append to

config_json (optional): JSON string containing configuration parameters

mask (optional): Optional mask for the image to define region of interest for editing

Outputs

configs: The updated list of configuration objects

config: The configuration object for the current image

Behavior

Creates configuration objects from JSON strings

Allows appending to existing configuration lists

Provides a default JSON configuration template

Output config list can be connected directly to TextEncodeQwenImageEditPlusCustom_lrzjason

QwenEditOutputExtractor

This helper node extracts specific outputs from the custom_output dictionary produced by the TextEncodeQwenImageEditPlusCustom_lrzjason node.

Inputs

custom_output: The custom output dictionary from the custom node

Outputs

pad_info: Padding information dictionary

full_refs_cond: Conditioning with all reference latents

main_ref_cond: Conditioning with only the main reference latent

main_image: The main reference image

vae_images: List of all processed VAE images

ref_latents: List of all reference latents

vl_images: List of all processed VL images

full_prompt: The complete prompt with image descriptions

llama_template: The applied system prompt template

no_refs_cond: Conditioning without any reference latents

mask: The processed mask used for region of interest

Behavior

Extracts individual components from the complex output dictionary

Provides access to all intermediate results from the custom node

Enables modular processing of different output components

QwenEditListExtractor

This helper node extracts a specific item from a list based on its index position.

Inputs

items: The input list

index: The index of the item to extract (default: 0)

Outputs

item: The extracted item from the list

Behavior

Extracts a single item from a list based on index

Useful for extracting specific images from the vae_images list or other collections

Supports any list items regardless of type

QwenEditAdaptiveLongestEdge

This utility node calculates an appropriate longest edge size for an image, ensuring it doesn’t exceed a specified maximum size. This is particularly useful when you want to automatically determine the best longest edge for image processing based on image dimensions, especially when working with large images that need to be downscaled for processing.

Inputs

image: The input image to analyze

max_size: The maximum allowed size for the longest edge (default: 2048, range: 512-4096)

Outputs

longest_edge: The calculated longest edge size for the image that respects the max_size constraint

Behavior

Calculates the longest edge of the input image

If the longest edge exceeds max_size, it calculates a reduced size that fits within the constraint

Returns the appropriate longest edge size that can be used in other nodes like TextEncodeQwenImageEditPlusCustom_lrzjason for dynamic image processing

Useful for workflows where you want to prevent oversized images from causing memory or processing issues

CropWithPadInfo

This utility node crops an image using pad information generated by other nodes, allowing precise cropping back to the original content area after padding operations.

Inputs

image: The image to crop

pad_info: The pad information dictionary containing x, y, width, height and scale values

Outputs

cropped_image: The cropped image with original content dimensions

scale_by: The scale factor used in the original processing

Behavior

Uses pad information to crop images to their original content area

Removes padding that was added during processing

Returns the scale factor for potential additional operations

Key Features

Multi-Image Support: Incorporate up to 3 reference images into your text-to-image generation workflow

Dual Resize Options: Separate resizing controls for VAE encoding (1024px) and VL encoding (384px)

Individual Image Outputs: Each processed reference image is provided as a separate output for flexible connections

Latent Space Integration: Encode reference images into latent space for efficient processing

Qwen Model Compatibility: Specifically designed for Qwen-based image editing models

Customizable Templates: Use custom Llama templates for tailored image editing instructions

Installation

Clone or download this repository into your ComfyUI’s custom_nodes directory.

Restart ComfyUI.

The node will be available in the “advanced/conditioning” category.

Contact

Twitter: @Lrzjason

Email: lrzjason@gmail.com

QQ Group: 866612947

Wechatid: fkdeai

Civitai: xiaozhijason

Sponsors me for more open source projects:

Buy me a coffee:

WeChat: