ComfyUI-MiniCPM

★ 148

图像描述视觉语言模型批量与视频处理支持MiniCPM-V4/V4.5

在ComfyUI中集成MiniCPM视觉语言模型，支持MiniCPM-V-4.5(Transformers)与V-4.0(GGUF)，用于高质量图像/视频字幕、批量和多类型分析。

💡 为图片或视频批量生成高质量、多类型字幕与内容分析

🍴 14 Forks💻 Python🔄 2025-08-28

🔗 GitHub 原文

📦

网盘下载

复制链接后前往夸克网盘下载

https://pan.quark.cn/s/8f9eee5e2cdb

📦 requirements.txt

#
Core
dependencies
for
MiniCPM
transformers
functionality
torch>=2.0.0
transformers>=4.35.0
torchvision>=0.15.0
Pillow>=9.0.0
huggingface-hub>=0.16.0
hf_xet>=0.16.0
#
Optional:
GGUF
functionality
(llama-cpp-python)
#
Uncomment
the
line
below
if
you
want
GGUF
support
#
llama-cpp-python>=0.2.0
#
Additional
utilities
numpy>=1.21.0
accelerate>=0.20.0

📄 README

ComfyUI-MiniCPM

A custom ComfyUI node for MiniCPM vision-language models, supporting v4, v4.5, and v4 GGUF formats, enabling high-quality image captioning and visual analysis.

🎉 Now supports MiniCPM-V-4.5! The latest model with enhanced capabilities.

News & Updates

2025/08/28: Update ComfyUI-MIniCPM to v1.1.1 ( update.md )

2025/08/27: Update ComfyUI-MIniCPM to v1.1.0 ( update.md )

[](example_workflows/MiniCPM_v4VSv45.json)

Added support for MiniCPM-V-4.5 models (Transformers)

Features

MiniCPM-V-4 GGUF

[](example_workflows/MiniCPM-V-4-GGUF.json)

MiniCPM-V-4 Batch Images

[](example_workflows/MiniCPM-V-4_batchImages.json)

MiniCPM-V-4 video

[](example_workflows/MiniCPM-V-4_video.json)

Supports MiniCPM-V-4.5 (Transformers) and MiniCPM-V-4.0 (GGUF) models

Latest MiniCPM-V-4.5 with enhanced capabilities via Transformers

Multiple caption types to suit different use cases (Describe, Caption, Analyze, etc.)

Memory management options to balance VRAM usage and speed

Auto-downloads model files on first use for easy setup

Customizable parameters: max tokens, temperature, top-p/k sampling, repetition penalty

Advanced node with full parameter control

Legacy node for backward compatibility

Comprehensive GGUF quantization options for V4.0 models

Installation

Clone the repo into your ComfyUI custom nodes folder:

cd ComfyUI/custom_nodes

git clone https://github.com/1038lab/comfyui-minicpm.git

Install required dependencies:

cd ComfyUI/custom_nodes/comfyui-minicpm

ComfyUI\python_embeded\python pip install -r requirements.txt

ComfyUI\python_embeded\python llama_cpp_install.py

[!note]

llama-cpp-python CUDA Installation for ComfyUI Portable

– llama_cpp_install.md

Supported Models

Transformers Models

| Model | Description |

| ——————– | ———————————————- |

| MiniCPM-V-4.5 | 🌟 Latest V4.5 version with enhanced capabilities |

| MiniCPM-V-4.5-int4 | 🌟 V4.5 4-bit quantized version, smaller memory footprint |

| MiniCPM-V-4 | V4.0 full precision version, higher quality |

| MiniCPM-V-4-int4 | V4.0 4-bit quantized version, smaller memory footprint |

https://huggingface.co/openbmb/MiniCPM-V-4_5

https://huggingface.co/openbmb/MiniCPM-V-4_5-int4

https://huggingface.co/openbmb/MiniCPM-V-4

https://huggingface.co/openbmb/MiniCPM-V-4-int4

GGUF Models

Note: MiniCPM-V-4.5 GGUF models are temporarily unavailable due to llama-cpp-python compatibility issues. Please use MiniCPM-V-4.5 Transformers models or MiniCPM-V-4.0 GGUF models.

MiniCPM-V-4.0 (Fully Supported)

| Model | Size | Description |

| ——————– | ——— | ————————————- |

| MiniCPM-V-4 (Q4_K_M) | ~2.19GB | Recommended balance of quality/size |

| MiniCPM-V-4 (Q4_0) | ~2.08GB | Standard 4-bit quantization |

| MiniCPM-V-4 (Q4_1) | ~2.29GB | 4-bit quantization improved |

| MiniCPM-V-4 (Q4_K_S) | ~2.09GB | 4-bit K-quants small |

| MiniCPM-V-4 (Q5_0) | ~2.51GB | 5-bit quantization |

| MiniCPM-V-4 (Q5_1) | ~2.72GB | 5-bit quantization improved |

| MiniCPM-V-4 (Q5_K_M) | ~2.56GB | 5-bit K-quants medium |

| MiniCPM-V-4 (Q5_K_S) | ~2.51GB | 5-bit K-quants small |

| MiniCPM-V-4 (Q6_K) | ~2.96GB | Very high quality |

| MiniCPM-V-4 (Q8_0) | ~3.83GB | Highest quality quantized |

https://huggingface.co/openbmb/MiniCPM-V-4-gguf

The models will be automatically downloaded on first run.

Manual download and placement into models/LLM (transformers) or models/LLM/GGUF (GGUF) is also supported.

Available Nodes

1. MiniCPM-4-V-Transformers

Basic transformers-based node with essential parameters

Supports image and video input

Memory management options

Preset prompt types

2. MiniCPM-4-V-Transformers Advanced

Full-featured transformers-based node

All parameters customizable

System prompt support

Advanced video processing options

3. MiniCPM-4-V-GGUF

GGUF-based node with essential parameters

Optimized for performance

4. MiniCPM-4-V-GGUF Advanced

Full-featured GGUF-based node

All parameters customizable

5. MiniCPM (Legacy)

Original node for backward compatibility

Basic functionality

Usage

Add the MiniCPM node from the 🧪AILab category in ComfyUI.

Connect an image or video input node to the MiniCPM node.

Select the model variant (default is MiniCPM-V-4-int4 for transformers).

Choose caption type and adjust parameters as needed.

Execute your workflow to generate captions or analysis.

Configuration Defaults

{

  "context_window": 4096,

  "gpu_layers": -1,

  "cpu_threads": 4,

  "default_max_tokens": 1024,

  "default_temperature": 0.7,

  "default_top_p": 0.9,

  "default_top_k": 100,

  "default_repetition_penalty": 1.10,

  "default_system_prompt": "You are MiniCPM-V, a helpful, concise and knowledgeable vision-language assistant. Answer directly and stay on task."

}

Caption Types

Describe: Describe this image in detail.

Caption: Write a concise caption for this image.

Analyze: Analyze the main elements and scene in this image.

Identify: What objects and subjects do you see in this image?

Explain: Explain what’s happening in this image.

List: List the main objects visible in this image.

Scene: Describe the scene and setting of this image.

Details: What are the key details in this image?

Summarize: Summarize the key content of this image in 1-2 sentences.

Emotion: Describe the emotions or mood conveyed by this image.

Style: Describe the artistic or visual style of this image.

Location: Where might this image be taken? Analyze the setting or location.

Question: What question could be asked based on this image?

Creative: Describe this image as if writing the beginning of a short story.

Memory Management Options

Keep in Memory: Model stays loaded for faster subsequent runs

Clear After Run: Model is unloaded after each run to save memory

Global Cache: Model is cached globally and shared between nodes

Tips

VRAM Requirements

4-6GB VRAM: Use MiniCPM-V-4-int4 or GGUF Q4 models

8GB VRAM: Use MiniCPM-V-4.5-int4 (recommended)

12GB+ VRAM: Can use full MiniCPM-V-4.5

CUDA OOM Error: Try int4 quantized models or CPU mode

General Tips

🌟 Try MiniCPM-V-4.5 Transformers first – enhanced capabilities over V4.0

For best balance: use MiniCPM-V-4 (Q4_K_M) GGUF model

For highest quality: use MiniCPM-V-4.5 Transformers

For low VRAM: use MiniCPM-V-4.5-int4 or MiniCPM-V-4 (Q4_0) GGUF

Adjust temperature (0.6–0.8) for balancing creativity and coherence.

Use top-p (0.9) and top-k (80) sampling for natural output diversity.

Lower max tokens or precision (bf16/fp16) for faster generation on less powerful GPUs.

Memory modes help optimize VRAM usage: default, balanced, max savings.

Transformers models offer better quality but use more memory.

GGUF models are more memory-efficient but may have slightly lower quality.

License

GPL-3.0 License