comfyui-ollama-model-manager

★ 1

模型管理按需加载/卸载内存优化聊天生成

comfyui-ollama-model-manager 在 ComfyUI 中按需管理 Ollama 模型，自动加载/卸载并动态刷新模型列表，优化内存并支持聊天生成与日志记录。

💡 在ComfyUI流程中按需加载/卸载Ollama模型以节省内存并进行对话生成。

🍴 1 Forks💻 Python🔄 2025-11-05

🔗 GitHub 原文

📦

网盘下载

复制链接后前往夸克网盘下载

https://pan.quark.cn/s/af9fbf81e746

📄 README

ComfyUI Ollama Model Manager

Custom nodes for managing Ollama models in ComfyUI workflows. Load and unload models on-demand to optimize memory usage in constrained environments.

Features

🔄 Auto-Fetch Models – Models load automatically when you connect nodes (no workflow execution needed!)

💬 Chat Completion – Full text generation with conversation history

🔄 Dynamic Dropdowns – Model list updates instantly via ComfyUI API

🎯 Type-Safe Connections – Client config passed between nodes

⬇️ Load/Unload Models – Control memory usage efficiently

📋 Beautiful Logging – Colored console output with JSON file logs

💾 Model Caching – Per-endpoint caching for better performance

✨ No CORS Issues – Backend API proxy eliminates browser restrictions

Installation

Recommended: ComfyUI-Manager

Install via ComfyUI-Manager

Search for “Ollama Manager”

Click Install

Manual Installation

cd ComfyUI/custom_nodes
git clone https://github.com/darth-veitcher/comfyui-ollama-model-manager
cd comfyui-ollama-model-manager

# Install dependencies (auto-detects uv or uses pip)
python install.py

# OR manually with uv (recommended)
uv pip install httpx loguru rich

# OR manually with pip
pip install httpx loguru rich

For portable ComfyUI installations:

# Windows Portable
ComfyUI\python_embeded\python.exe install.py

# Or manually
ComfyUI\python_embeded\python.exe -m pip install httpx loguru rich

🎯 Quick Start Guide

Step 1: Add Ollama Client

Add an Ollama Client node to your workflow

Set endpoint to your Ollama server URL

Default: http://localhost:11434

Or use your remote server URL

Step 2: Add Model Selector

Add an Ollama Model Selector node

Connect the client output from Ollama Client to the client input

✨ Models auto-fetch immediately! – No need to execute the workflow

Select your desired model from the dropdown

Step 3: Load the Model

Add an Ollama Load Model node

Connect client from Model Selector

The model dropdown auto-populates with available models

Set keep_alive (default -1 keeps it loaded)

Execute the workflow to load the model

Step 4: Generate Text with Chat

Add an Ollama Chat Completion node

Connect client from Model Selector (model auto-populates)

Enter your prompt in the prompt field

(Optional) Add a system_prompt to control behavior

Execute to generate a response!

Example:

prompt: “Write a haiku about programming”

system_prompt: “You are a helpful assistant”

response: Returns the generated text

history: Returns the conversation (for multi-turn chat)

Step 5: Multi-Turn Conversations (Optional)

For conversations with memory:

Connect the history output from one Chat Completion node

To the history input of the next Chat Completion node

Each response remembers the previous messages

Step 6: Unload When Done (Optional)

Add an Ollama Unload Model node

Connect it after your processing

This frees up memory

Nodes Reference

Core Nodes

| Node | Description |

|——|————-|

| Ollama Client | Creates a reusable Ollama connection config |

| Ollama Model Selector | Select model with auto-fetch on connection |

| Ollama Load Model | Loads a model into Ollama’s memory |

| Ollama Chat Completion | Generate text with conversation history |

| Ollama Unload Model | Unloads a model to free memory |

Debug/Utility Nodes

| Node | Description |

|——|————-|

| Ollama Debug: History | Formats conversation history as readable text for inspection |

| Ollama Debug: History Length | Returns the number of messages in conversation history |

Option Nodes (Composable Parameters)

|——|———–|————|———|————-|

| Extra Body | extra_body | JSON | {} | Advanced parameters (num_ctx, num_gpu, etc.) |

Advanced Usage

The architecture provides a clean, composable workflow:

[Ollama Client] → [Model Selector] → [Load Model] → [Chat Completion] → [Unload Model]
       ↓               ↓                   ↓                ↓
  (endpoint)     (pick model,        (load with)      (generate text,
                  auto-refresh)       keep_alive)      track history)

Key Benefits:

Reusable Client: Create one client, connect to multiple nodes

Auto-refresh: Model Selector can refresh the list automatically

Type Safety: Client connection passed between nodes

Cleaner Workflows: Less redundant endpoint configuration

Dynamic Dropdowns: Model list automatically populates after refresh

Conversation Memory: History passed between chat nodes for multi-turn conversations

Example Workflow: Simple Chat

1. Ollama Client (endpoint: http://localhost:11434)
       ↓
2. Model Selector (model: "llama3.2", refresh: true)
       ↓
3. Load Model (keep_alive: "-1")
       ↓
4. Chat Completion (prompt: "Hello!")
       ↓
5. Unload Model

Example Workflow: Multi-Turn Conversation

1. [Client] → [Selector] → [Load] → [Chat 1: "My name is Alice"]
                                          ↓ (history)
                                    [Chat 2: "What's my name?"]
                                          ↓ (history)
                                    [Chat 3: "Tell me a joke"]
       ↓
2. Unload Model

Example Workflow: Chat with Options

[Client] → [Selector] → [Load Model]
                           ↓
       ┌───────────────────┴────────────────────┐
       ↓                   ↓                     ↓
[Temperature=0.7]    [Seed=42]          [MaxTokens=200]
       └───────────────────┬────────────────────┘
                           ↓ (merged options)
                   [Chat Completion]
                           ↓
                    "Deterministic response"

Example Workflow: Advanced Parameters

[Temperature=0.8] → [TopK=50] → [RepeatPenalty=1.2] → [ExtraBody]
                                                           ↓
                                                    {"num_ctx": 4096}
                                                           ↓
                                                    [Chat Completion]

This pattern optimizes memory by unloading models when not needed, while maintaining full conversation context and precise control over generation parameters.

Configuration

Ollama Endpoint

Default: http://localhost:11434

Override by specifying a different endpoint in the “Refresh Model List” or “Load/Unload” nodes.

Keep Alive

Control how long models stay in memory:

-1 (default): Keep loaded indefinitely

5m: Keep for 5 minutes

1h: Keep for 1 hour

0: Unload immediately

Chat Parameters

The Ollama Chat Completion node supports:

Required:

client – Ollama client connection

model – Model name (auto-populated from selector)

prompt – User message/question

Optional:

system_prompt – Instructions to guide model behavior

history – Previous conversation (for multi-turn chat)

options – Generation parameters (temperature, seed, etc.)

format – Output format: “none” (default, text) or “json” (structured JSON)

image – Image input for vision models

Outputs:

response – Generated text

history – Updated conversation (connect to next chat node)

Caching & Performance:

The chat node intelligently caches results to avoid unnecessary LLM calls:

With Seed: When you provide a seed via the OllamaOptionSeed node, identical inputs will be cached (like standard ComfyUI nodes). This prevents wasteful re-execution when re-running the same workflow.

Without Seed: When no seed is provided, the node will always re-execute to generate fresh, non-deterministic responses.

Example: Deterministic workflow with caching

[Seed=42] → [Chat Completion] → Output
              ↓
        (Cached on re-run!)

This matches ComfyUI’s standard behavior and significantly reduces API costs when iterating on workflows.

JSON Mode (Phase 3)

The format parameter enables structured output for workflows that need parseable data:

Example: Extract structured data

[Chat Completion]
├── format: "json"
├── prompt: "Extract person data: 'Alice is 30 years old'"
└── system_prompt: "Return JSON with keys: name, age"

Output: {"name": "Alice", "age": 30}

When to use JSON mode:

Data extraction workflows

Structured output for downstream processing

API integrations requiring JSON

ComfyUI workflows that parse the response

Note: Set format to “json” to enable. The model will ensure valid JSON output.

Debug Utilities (Phase 3)

Ollama Debug: History – Inspect conversation memory

[Chat History] → [Debug: History]
                      ↓
           Formatted Text Output:
           === Conversation History (3 messages) ===

           [1] SYSTEM:
               You are helpful

           [2] USER:
               Hello

           [3] ASSISTANT:
               Hi there!

Ollama Debug: History Length – Count messages

[Chat History] → [History Length] → Output: 5 (messages)

Use cases:

Debugging conversation flow

Monitoring context length

Workflow conditional logic based on message count

Understanding what the model “remembers”

Logging

Logs are written to:

Console: Colored output with timestamps

File: logs/ollama_manager.json (14-day retention, compressed)

Example log output:

08:36:30 | INFO     | refresh-abc123 | 🔄 Refreshing model list from http://localhost:11434
08:36:30 | INFO     | refresh-abc123 | ✅ Model list refreshed: 3 models available
08:36:31 | INFO     | load-def456    | ⬇️  Loading model 'llava:latest' (keep_alive=-1)
08:36:32 | INFO     | load-def456    | ✅ Model 'llava:latest' loaded successfully

Requirements

Python ≥3.12

httpx ≥0.28.1

loguru ≥0.7.3

rich ≥14.2.0

Ollama running locally or remotely

Development

Project Structure

comfyui-ollama-model-manager/
├── __init__.py              # ComfyUI entry point
├── install.py               # Dependency installer (uv/pip auto-detect)
├── pyproject.toml           # Package metadata & dependencies
├── src/
│   └── comfyui_ollama_model_manager/
│       ├── __init__.py      # Package init
│       ├── nodes.py         # Model management nodes
│       ├── chat.py          # Chat completion node
│       ├── types.py         # Custom type definitions
│       ├── ollama_client.py # API client (fetch, load, unload, chat)
│       ├── api.py           # ComfyUI API routes
│       ├── state.py         # Model cache
│       ├── log_config.py    # Logging setup
│       └── async_utils.py   # Async utilities
├── tests/                   # Pytest test suite (52 tests)
└── web/
    └── ollama_widgets.js    # Auto-fetch UI logic

Running Tests

# With uv (recommended)
uv run pytest

# Or with pip
pip install pytest pytest-asyncio
pytest

Troubleshooting

Nodes don’t appear in ComfyUI

Check that dependencies are installed: pip list | grep -E "httpx|loguru|rich"

Restart ComfyUI completely

Check ComfyUI console for error messages

Verify Ollama is running: curl http://localhost:11434/api/tags

Import errors

If you see ModuleNotFoundError, install dependencies manually:

pip install httpx loguru rich

Permission errors (Windows)

Close ComfyUI and run:

ComfyUI\python_embeded\python.exe -m pip install --upgrade httpx loguru rich

License

[Add your license here]

Credits

Built for ComfyUI

Uses Ollama API