comfyui_Niutonian_GLM_4_6V


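Tags: model inference · VRAM optimization · quantization support · automatic memory management
The comfyui_Niutonian_GLM_4_6V nodes run GLM-4.6V model inference. Their core value is VRAM optimization and quantization support, with automatic memory management and OOM error recovery.
💡 Run low-VRAM GLM-4.6V inference in ComfyUI with quantization enabled.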
🍴 1 fork · 💻 Python · 🔄 Updated 2026-01-05
📦 Netdisk download
Copy the link below and open it in Quark netdisk to download:
https://pan.quark.cn/s/51137d50651f
📦 requirements.txt
transformers>=5.0.0rc0
torch>=2.0.0
huggingface_hub>=0.23.0
Pillow
numpy
accelerate
bitsandbytes>=0.41.0
scipy
📄 README

Niutonian GLM-4.6V ComfyUI Nodes (Transformer Version)

This is the Hugging Face transformers-based implementation of the Niutonian GLM-4.6V nodes for ComfyUI, with extensive memory optimizations to prevent CUDA out-of-memory errors.

Version: v0.1

Features

  • Memory Optimized: Multiple strategies to reduce VRAM usage
  • Quantization Support: 4-bit and 8-bit quantization via bitsandbytes
  • Automatic Memory Management: CUDA cache clearing and efficient tensor handling
  • Error Recovery: Graceful handling of OOM errors with helpful messages
  • Niutonian Branding: Professional custom node package with consistent naming

Nodes

1. Niutonian GLM46VLoader

Loads the GLM-4.6V-Flash model with memory optimizations.

Inputs:

  • device: auto/cuda/cpu (default: auto)
  • torch_dtype: auto/bfloat16/float16/float32 (default: bfloat16)
  • low_cpu_mem_usage: Enable low CPU memory usage (default: True)
  • load_in_8bit: Enable 8-bit quantization (default: False)
  • load_in_4bit: Enable 4-bit quantization (default: True)

Outputs:

  • GLM_MODEL: Model and processor for other nodes
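
The loader exposes both load_in_4bit and load_in_8bit, and bitsandbytes-style configs treat these two modes as mutually exclusive. The sketch below shows one way the inputs could be resolved before loading. The helper names `resolve_device` and `resolve_quantization` are hypothetical, and the rule that 4-bit wins when both flags are set is an assumption, not confirmed behavior of this node.

```python
from typing import Any, Dict, Optional


def resolve_device(device: str, cuda_available: bool) -> str:
    """Map the loader's `device` input to a concrete device name."""
    if device not in ("auto", "cuda", "cpu"):
        raise ValueError(f"unsupported device: {device!r}")
    if device == "auto":
        # Prefer the GPU when one is usable, otherwise fall back to CPU.
        return "cuda" if cuda_available else "cpu"
    return device


def resolve_quantization(
    load_in_4bit: bool,
    load_in_8bit: bool,
    compute_dtype: str = "bfloat16",
) -> Optional[Dict[str, Any]]:
    """Return keyword arguments for a quantization config, or None.

    Assumption: if both flags are set, 4-bit takes precedence.
    """
    if load_in_4bit:
        return {
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": compute_dtype,  # dtype name, not a torch dtype
        }
    if load_in_8bit:
        return {"load_in_8bit": True}
    return None  # no quantization requested
```

In a real loader, `cuda_available` would typically come from `torch.cuda.is_available()`, and the returned dictionary could be used to build a `BitsAndBytesConfig` from transformers. The dtype name may need converting to a torch dtype first.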

2. Niutonian GLM46VDescriber

Describes images using the GLM-4.6V vision model.

Inputs:

  • glm_model: Model from Niutonian GLM46VLoader
  • image: Input image tensor
  • user_prompt: Description prompt (default: “Describe this image in detail.”)
  • max_tokens: Maximum output tokens (default: 1024)
  • temperature: Sampling temperature (default: 0.7)

Outputs:

  • output_text: Clean description text
  • raw_output: Raw model output with thinking tags
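
The difference between output_text and raw_output can be illustrated with a small post-processing step. This is a minimal sketch that assumes the model wraps its reasoning in `<think>...</think>` tags. The tag name and the `clean_output` helper are assumptions for illustration, not the node's confirmed implementation.

```python
import re

# Assumption: reasoning is delimited by <think> ... </think>.
# The alternation with `$` also removes an unterminated block, which can
# happen when generation stops at max_tokens in the middle of the reasoning.
_THINK_BLOCK = re.compile(r"<think>.*?(?:</think>|$)", flags=re.DOTALL)


def clean_output(raw_output: str) -> str:
    """Strip reasoning blocks and surrounding whitespace from model output."""
    return _THINK_BLOCK.sub("", raw_output).strip()
```

If the cleaned text comes back empty, the reasoning block was probably cut off, and raising max_tokens may help.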

3. Niutonian GLM46VAgenticSampler

Advanced KSampler that uses GLM-4.6V to verify generated images.

Inputs:

  • Standard KSampler inputs (model, seed, steps, cfg, etc.)
  • glm_model: GLM model for verification
  • vae: VAE for decoding latents
  • verification_prompt: Prompt for image verification
  • max_retries: Maximum retry attempts (default: 3)

Outputs:

  • latent: Final latent representation
  • verified_image: Decoded image
  • is_match: Boolean indicating if image matches prompt
  • summary: Analysis summary
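
The sample, decode, verify, and retry cycle implied by these inputs and outputs can be sketched as a plain loop. Everything here is hypothetical: the `sample`, `decode`, and `verify` callables stand in for the KSampler, the VAE, and the GLM check. The sketch also assumes one initial attempt plus up to max_retries retries, each with an incremented seed, which the README does not specify.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class VerificationResult:
    latent: Any
    verified_image: Any
    is_match: bool
    summary: str


def sample_until_verified(
    sample: Callable[[int], Any],               # seed -> latent
    decode: Callable[[Any], Any],               # latent -> image
    verify: Callable[[Any], Tuple[bool, str]],  # image -> (is_match, summary)
    seed: int,
    max_retries: int = 3,
) -> VerificationResult:
    """Generate, verify, and retry with a new seed until the image matches."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    result = None
    for attempt in range(max_retries + 1):
        latent = sample(seed + attempt)  # assumption: retries bump the seed
        image = decode(latent)
        is_match, summary = verify(image)
        result = VerificationResult(latent, image, is_match, summary)
        if is_match:
            break
    # If no attempt matched, the last attempt is returned with is_match=False.
    return result
```

Returning the last attempt rather than raising keeps the workflow running, and downstream nodes can branch on is_match.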

4. Niutonian GLM46VPromptGenerator

Intelligent prompt generator using GLM-4.6V vision model.

Inputs:

  • glm_model: Model from Niutonian GLM46VLoader
  • mode: Generation mode (create_from_image, refine_prompt, creative_variations, style_transfer)
  • base_prompt: Base prompt text for refinement modes
  • style: Target artistic style (photorealistic, artistic, cinematic, anime, etc.)
  • detail_level: Level of detail (basic, detailed, very_detailed, ultra_detailed)
  • creativity: Creativity factor (0.0-1.0)
  • max_tokens: Maximum output tokens
  • reference_image: Optional reference image
  • negative_elements: Elements to avoid in prompts

Outputs:

  • positive_prompt: Generated positive prompt
  • negative_prompt: Generated negative prompt
  • analysis: Analysis of prompt choices

Memory Optimization Strategies

1. Quantization (Recommended)

Enable 4-bit or 8-bit quantization to significantly reduce the VRAM taken by the model weights (reductions are relative to 16-bit weights):

  • 4-bit: ~75% memory reduction, minimal quality loss
  • 8-bit: ~50% memory reduction, negligible quality loss
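
The percentages above follow from the number of bits stored per weight. The sketch below estimates weight memory for a roughly 9B-parameter model. It counts weights only and ignores activations, the KV cache, and quantization metadata, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


def reduction_vs_16bit(bits_per_weight: int) -> float:
    """Fraction of weight memory saved relative to 16-bit storage."""
    return 1 - bits_per_weight / 16


if __name__ == "__main__":
    for bits in (16, 8, 4):
        gib = weight_memory_gib(9e9, bits)
        saved = reduction_vs_16bit(bits)
        print(f"{bits:>2}-bit: {gib:.1f} GiB ({saved:.0%} saved)")
    # 16-bit: 16.8 GiB (0% saved)
    #  8-bit: 8.4 GiB (50% saved)
    #  4-bit: 4.2 GiB (75% saved)
```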

2. Device Mapping

  • Uses device_map="sequential" for efficient GPU memory allocation
  • Automatically reserves 15% of VRAM for other operations
  • Falls back to CPU if GPU memory is insufficient
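
The 15% reservation can be expressed as a per-device memory budget. The following is a minimal sketch, assuming the budget is handed to the model loader as a `max_memory` mapping alongside device_map="sequential". The `build_max_memory` helper is hypothetical, and how the node actually wires this up is not stated in the README.

```python
from typing import Dict


def build_max_memory(
    total_vram_bytes: int,
    reserve_fraction: float = 0.15,
    gpu_index: int = 0,
) -> Dict[int, int]:
    """Return a {device_index: usable_bytes} budget with headroom reserved."""
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    reserved = int(total_vram_bytes * reserve_fraction)
    return {gpu_index: total_vram_bytes - reserved}
```

With PyTorch installed, the total could be read from `torch.cuda.get_device_properties(0).total_memory`. For a 24 GiB card, the default leaves about 20.4 GiB for the model.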

3. Memory Management

  • Automatic CUDA cache clearing before/after operations
  • Efficient tensor movement and cleanup
  • Gradient checkpointing enabled when available
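
Cache clearing around an operation, combined with a readable out-of-memory message, might look like the sketch below. It is an illustration under assumptions rather than the package's code. The `run_with_oom_recovery` wrapper and its hint text are hypothetical, and PyTorch is imported lazily so the helper still runs where torch is absent.

```python
import gc
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")

OOM_HINT = (
    "Out of memory. Try enabling 4-bit quantization, reducing max_tokens, "
    "or closing other GPU applications."
)


def free_cuda_cache() -> None:
    """Collect garbage and release cached CUDA blocks if torch is usable."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def default_oom_types() -> Tuple[Type[BaseException], ...]:
    """Exception types treated as out-of-memory errors."""
    try:
        import torch
        return (torch.cuda.OutOfMemoryError,)
    except (ImportError, AttributeError):
        return (MemoryError,)


def run_with_oom_recovery(
    fn: Callable[[], T],
    oom_types: Optional[Tuple[Type[BaseException], ...]] = None,
) -> T:
    """Run fn with cache clearing before and after, and a helpful OOM error."""
    oom_types = oom_types or default_oom_types()
    free_cuda_cache()
    try:
        return fn()
    except oom_types as exc:
        raise RuntimeError(OOM_HINT) from exc
    finally:
        free_cuda_cache()  # runs on success and on failure
```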

Installation

  • Clone this repository to your ComfyUI custom_nodes directory:

    cd /path/to/ComfyUI/custom_nodes
    git clone https://github.com/Niutonian/comfyui_Niutonian_GLM_4_6V.git

  • Install dependencies:

    cd comfyui_Niutonian_GLM_4_6V
    pip install -r requirements.txt

  • Restart ComfyUI to load the new nodes
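
After installing, it can help to confirm which dependency versions actually ended up in the environment ComfyUI uses. The sketch below uses only the standard library and lists the packages named in requirements.txt. It reports versions without comparing them against the minimums, and the `installed_versions` helper is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Package names as listed in requirements.txt.
REQUIRED = (
    "transformers", "torch", "huggingface_hub", "Pillow",
    "numpy", "accelerate", "bitsandbytes", "scipy",
)


def installed_versions(packages: Iterable[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:<16} {ver or 'NOT INSTALLED'}")
```

Run it with the same Python interpreter that launches ComfyUI, since a system-wide Python may report a different set of packages.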

Usage Tips

For Low VRAM Systems (8-12GB)

  • Enable 4-bit quantization in Niutonian GLM46VLoader
  • Use torch_dtype="float16"
  • Set low_cpu_mem_usage=True

For Medium VRAM Systems (16-24GB)

  • Try 8-bit quantization first
  • Fall back to 4-bit if needed
  • Use torch_dtype="float16"

For Medium-High VRAM Systems (24-32GB)

  • Try 8-bit quantization first
  • Fall back to 4-bit if needed
  • Use torch_dtype="bfloat16"

For High VRAM Systems (32GB+)

  • Can run without quantization
  • Use torch_dtype="bfloat16" for best performance
  • Consider torch_dtype="float16" if bfloat16 causes issues
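
The VRAM tiers above can be condensed into a small lookup. This sketch is one reading of the tips, with two assumptions the README leaves open: cards below 16GB (including the 12-16GB gap) use the low-VRAM settings, and a boundary value such as 24GB falls into the higher tier. The `recommend_settings` helper is hypothetical.

```python
from typing import Any, Dict


def recommend_settings(vram_gb: float) -> Dict[str, Any]:
    """Suggest loader settings for a given amount of VRAM (in GB)."""
    if vram_gb >= 32:
        # High VRAM: quantization is optional.
        return {"quantization": None, "fallback": None, "torch_dtype": "bfloat16"}
    if vram_gb >= 24:
        return {"quantization": "8bit", "fallback": "4bit", "torch_dtype": "bfloat16"}
    if vram_gb >= 16:
        return {"quantization": "8bit", "fallback": "4bit", "torch_dtype": "float16"}
    # Low VRAM: most aggressive savings.
    return {
        "quantization": "4bit",
        "fallback": None,
        "torch_dtype": "float16",
        "low_cpu_mem_usage": True,
    }
```

For example, `recommend_settings(12)` selects 4-bit quantization with float16, matching the low-VRAM tips.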

Troubleshooting

CUDA Out of Memory

  • Enable 4-bit quantization
  • Reduce max_tokens in Niutonian GLM46VDescriber
  • Close other GPU applications
  • Restart ComfyUI to clear memory

Model Loading Fails

  • Check internet connection (model downloads from HuggingFace)
  • Ensure sufficient disk space (~9GB for model)
  • Try CPU device if GPU fails
  • Check transformers version (>=5.0.0rc0 required)
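
Slow Performance

  • Ensure CUDA is available and working
  • Use quantization (4-bit/8-bit) if the unquantized model does not fit in VRAM and spills to CPU; quantization mainly saves memory and may not speed up a model that already fits
  • Reduce image resolution if possible
  • Check GPU utilization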

Grey or Missing Images

If an image is generated but appears grey or does not display properly:

  • Reduce the image size to 1024×1024 or lower, then try again
  • This often resolves display issues with large images
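
Picking a reduced size that keeps the aspect ratio can be done with a little arithmetic. The sketch below scales the longer side down to at most 1024 pixels. Snapping both sides down to a multiple of 8 is an assumption based on common latent-diffusion VAEs, not something this README states, and `fit_within` is a hypothetical helper.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 1024, multiple: int = 8) -> Tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Never upscale: images already within the limit keep their scale.
    scale = min(1.0, max_side / max(width, height))

    def snap(value: int) -> int:
        # Round down to the nearest multiple, but never below one multiple.
        return max(multiple, int(value * scale) // multiple * multiple)

    return snap(width), snap(height)
```

For a 2048×1536 image this yields 1024×768, which stays within the suggested limit.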

Testing

Run the memory test script to validate your setup:

    python test_memory.py

This will test different quantization configurations and report memory usage.

Requirements

  • Python 3.8+
  • PyTorch 2.0+
  • transformers 5.0.0rc0+
  • bitsandbytes 0.41.0+ (for quantization)
  • CUDA-capable GPU (recommended)
  • 8GB+ VRAM (with quantization) or 24GB+ VRAM (without quantization)

Model Information

  • Model: zai-org/GLM-4.6V-Flash
  • Size: ~9B parameters
  • Context: 128K tokens
  • Vision: Supports image + text input
  • License: Check model repository for licensing terms

About Niutonian

This package is part of the Niutonian suite of AI tools, providing professional-grade implementations of cutting-edge AI models for creative workflows.

Version: v0.1

Release Date: January 5, 2026

Repository: Niutonian/comfyui_Niutonian_GLM_4_6V

Version History

See CHANGELOG.md for detailed version history and changes.