Custom nodes for managing Ollama models in ComfyUI workflows. Load and unload models on-demand to optimize memory usage in constrained environments.
cd ComfyUI/custom_nodes
git clone https://github.com/darth-veitcher/comfyui-ollama-model-manager
cd comfyui-ollama-model-manager
# Install dependencies (auto-detects uv or uses pip)
python install.py
# OR manually with uv (recommended)
uv pip install httpx loguru rich
# OR manually with pip
pip install httpx loguru rich
For portable ComfyUI installations:
# Windows Portable
ComfyUI\python_embeded\python.exe install.py
# Or manually
ComfyUI\python_embeded\python.exe -m pip install httpx loguru rich
endpoint to your Ollama server URLhttp://localhost:11434client output from Ollama Client to the client inputclient from Model Selectorkeep_alive (default -1 keeps it loaded)client from Model Selector (model auto-populates)prompt fieldsystem_prompt to control behaviorExample:
For conversations with memory:
history output from one Chat Completion nodehistory input of the next Chat Completion node| Node | Description |
|——|————-|
| Ollama Client | Creates a reusable Ollama connection config |
| Ollama Model Selector | Select model with auto-fetch on connection |
| Ollama Load Model | Loads a model into Ollama’s memory |
| Ollama Chat Completion | Generate text with conversation history |
| Ollama Unload Model | Unloads a model to free memory |
| Node | Description |
|——|————-|
| Ollama Debug: History | Formats conversation history as readable text for inspection |
| Ollama Debug: History Length | Returns the number of messages in conversation history |
| Node | Parameter | Range/Type | Default | Description |
|——|———–|————|———|————-|
| Temperature | temperature | 0.0-2.0 | 0.8 | Controls randomness (0=deterministic, 2=very random) |
| Seed | seed | INT | 42 | Random seed for reproducible generation |
| Max Tokens | max_tokens | 1-4096 | 128 | Maximum tokens to generate |
| Top P | top_p | 0.0-1.0 | 0.9 | Nucleus sampling threshold |
| Top K | top_k | 1-100 | 40 | Top-k sampling (Ollama-specific) |
| Repeat Penalty | repeat_penalty | 0.0-2.0 | 1.1 | Penalty for repetition (Ollama-specific) |
| Extra Body | extra_body | JSON | {} | Advanced parameters (num_ctx, num_gpu, etc.) |
The architecture provides a clean, composable workflow:
[Ollama Client] → [Model Selector] → [Load Model] → [Chat Completion] → [Unload Model]
↓ ↓ ↓ ↓
(endpoint) (pick model, (load with) (generate text,
auto-refresh) keep_alive) track history)
Key Benefits:
Example Workflow: Simple Chat
1. Ollama Client (endpoint: http://localhost:11434)
↓
2. Model Selector (model: "llama3.2", refresh: true)
↓
3. Load Model (keep_alive: "-1")
↓
4. Chat Completion (prompt: "Hello!")
↓
5. Unload Model
Example Workflow: Multi-Turn Conversation
1. [Client] → [Selector] → [Load] → [Chat 1: "My name is Alice"]
↓ (history)
[Chat 2: "What's my name?"]
↓ (history)
[Chat 3: "Tell me a joke"]
↓
2. Unload Model
Example Workflow: Chat with Options
[Client] → [Selector] → [Load Model]
↓
┌───────────────────┴────────────────────┐
↓ ↓ ↓
[Temperature=0.7] [Seed=42] [MaxTokens=200]
└───────────────────┬────────────────────┘
↓ (merged options)
[Chat Completion]
↓
"Deterministic response"
Example Workflow: Advanced Parameters
[Temperature=0.8] → [TopK=50] → [RepeatPenalty=1.2] → [ExtraBody]
↓
{"num_ctx": 4096}
↓
[Chat Completion]
This pattern optimizes memory by unloading models when not needed, while maintaining full conversation context and precise control over generation parameters.
Default: http://localhost:11434
Override by specifying a different endpoint in the “Refresh Model List” or “Load/Unload” nodes.
Control how long models stay in memory:
-1 (default): Keep loaded indefinitely5m: Keep for 5 minutes1h: Keep for 1 hour0: Unload immediatelyThe Ollama Chat Completion node supports:
Required:
client – Ollama client connectionmodel – Model name (auto-populated from selector)prompt – User message/questionOptional:
system_prompt – Instructions to guide model behaviorhistory – Previous conversation (for multi-turn chat)options – Generation parameters (temperature, seed, etc.)format – Output format: “none” (default, text) or “json” (structured JSON)image – Image input for vision modelsOutputs:
response – Generated texthistory – Updated conversation (connect to next chat node)Caching & Performance:
The chat node intelligently caches results to avoid unnecessary LLM calls:
OllamaOptionSeed node, identical inputs will be cached (like standard ComfyUI nodes). This prevents wasteful re-execution when re-running the same workflow.Example: Deterministic workflow with caching
[Seed=42] → [Chat Completion] → Output
↓
(Cached on re-run!)
This matches ComfyUI’s standard behavior and significantly reduces API costs when iterating on workflows.
The format parameter enables structured output for workflows that need parseable data:
Example: Extract structured data
[Chat Completion]
├── format: "json"
├── prompt: "Extract person data: 'Alice is 30 years old'"
└── system_prompt: "Return JSON with keys: name, age"
Output: {"name": "Alice", "age": 30}
When to use JSON mode:
Note: Set format to “json” to enable. The model will ensure valid JSON output.
Ollama Debug: History – Inspect conversation memory
[Chat History] → [Debug: History]
↓
Formatted Text Output:
=== Conversation History (3 messages) ===
[1] SYSTEM:
You are helpful
[2] USER:
Hello
[3] ASSISTANT:
Hi there!
Ollama Debug: History Length – Count messages
[Chat History] → [History Length] → Output: 5 (messages)
Use cases:
Logs are written to:
logs/ollama_manager.json (14-day retention, compressed)Example log output:
08:36:30 | INFO | refresh-abc123 | 🔄 Refreshing model list from http://localhost:11434
08:36:30 | INFO | refresh-abc123 | ✅ Model list refreshed: 3 models available
08:36:31 | INFO | load-def456 | ⬇️ Loading model 'llava:latest' (keep_alive=-1)
08:36:32 | INFO | load-def456 | ✅ Model 'llava:latest' loaded successfully
comfyui-ollama-model-manager/
├── __init__.py # ComfyUI entry point
├── install.py # Dependency installer (uv/pip auto-detect)
├── pyproject.toml # Package metadata & dependencies
├── src/
│ └── comfyui_ollama_model_manager/
│ ├── __init__.py # Package init
│ ├── nodes.py # Model management nodes
│ ├── chat.py # Chat completion node
│ ├── types.py # Custom type definitions
│ ├── ollama_client.py # API client (fetch, load, unload, chat)
│ ├── api.py # ComfyUI API routes
│ ├── state.py # Model cache
│ ├── log_config.py # Logging setup
│ └── async_utils.py # Async utilities
├── tests/ # Pytest test suite (52 tests)
└── web/
└── ollama_widgets.js # Auto-fetch UI logic
# With uv (recommended)
uv run pytest
# Or with pip
pip install pytest pytest-asyncio
pytest
pip list | grep -E "httpx|loguru|rich"curl http://localhost:11434/api/tagsIf you see ModuleNotFoundError, install dependencies manually:
pip install httpx loguru rich
Close ComfyUI and run:
ComfyUI\python_embeded\python.exe -m pip install --upgrade httpx loguru rich
[Add your license here]