ComfyStereo

★ 41

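Stereo rendering · Disparity map generation · ComfyUI plugin · Performance optimization

Provides two stereo image generation nodes for ComfyUI, Stereo Image Node (based on the Automatic1111 depth script) and LazyStereo, for quickly generating left/right views and disparity maps.

💡 Generate left/right views and disparity maps in ComfyUI for VR or 3D compositing.

🍴 4 Forks · 💻 Python · 🔄 2026-02-26

🔗 Original on GitHub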


📦 requirements.txt

# Core dependencies for basic stereo image generation
torch
numpy
Pillow
opencv-python
numba
scipy
psutil
moderngl

# Native VR Viewer Dependencies (PyOpenXR)
# For native VR viewing with auto-launch to headset
pyopenxr>=1.0.0
PyOpenGL>=3.1.0
PyOpenGL_accelerate>=3.1.0
glfw>=2.0.0
opencv-python>=4.0.0  # For video playback support
pygame>=2.0.0  # For audio playback
# Note: ffmpeg must be installed separately for audio extraction from videos
# Windows: Download from https://ffmpeg.org/download.html
# Linux: sudo apt install ffmpeg
# Mac: brew install ffmpeg

# StereoDiffusion Dependencies
# For AI-powered stereo generation using diffusion models
diffusers>=0.21.0
transformers
accelerate
einops
tqdm
scikit-image
# Note: torch, numpy, Pillow, and opencv-python are already included in base dependencies above
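
Since ffmpeg cannot be installed through pip, it may be worth confirming that it is on the PATH before relying on audio extraction. The following is a minimal sketch using only the Python standard library; the helper name `find_ffmpeg` is hypothetical and is not part of ComfyStereo.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found: audio extraction from videos will not be available")
    else:
        print(f"ffmpeg found at {path}")
```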

📄 README

ComfyStereo – Stereoscopic 3D Toolkit for ComfyUI

A stereoscopic 3D toolkit for ComfyUI that combines three solutions into one unified package:

Stereo Image Generation – Depth-based stereo conversion with GPU acceleration

Native VR Viewing – PyOpenXR viewer for direct VR headset viewing

StereoDiffusion – AI-powered stereo generation using diffusion models

Example Workflow

This example demonstrates the full stereo conversion pipeline using a depth map and video input.

Example Workflow Graph

Download workflow:

Video2Stereo.json – Import directly into ComfyUI

Final Stereo Output

This is the final stereoscopic video generated by ComfyStereo.

Direct link:

stereovideo.mp4

Original Input Video

This is the original monoscopic input video before stereo processing.

Direct link:

example-video.mp4

Depth Map Used

This is the depth map generated and used in the workflow.

Direct link:

depthmap_video.webm

Features Overview

Core Stereo Generation

GPU-Accelerated Processing – 5-20x faster depth processing with CUDA

Advanced Fill Techniques – Multiple interpolation methods (Polylines, Naive, Hybrid Edge, GPU Warp)

Edge-Aware Depth Blurring – Reduces artifacts at high divergence settings

Multiple Output Formats – Side-by-Side, Top-Bottom, Red-Cyan Anaglyph

Batch Video Processing – Memory-efficient video frame processing
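
To illustrate the output formats listed above, here is a rough sketch of how a left and a right view could be combined into a single frame. It is not ComfyStereo's actual code: the function name `compose_stereo` is hypothetical, images are assumed to be H x W x 3 RGB NumPy arrays, and "top-bottom" is assumed to place the left view on top.

```python
import numpy as np


def compose_stereo(left: np.ndarray, right: np.ndarray, mode: str) -> np.ndarray:
    """Combine two H x W x 3 RGB views into one output frame."""
    if left.shape != right.shape:
        raise ValueError("left and right views must have the same shape")
    if mode == "left-right":
        return np.concatenate([left, right], axis=1)  # side by side, left view first
    if mode == "right-left":
        return np.concatenate([right, left], axis=1)
    if mode == "top-bottom":
        return np.concatenate([left, right], axis=0)  # assumed: left view on top
    if mode == "bottom-top":
        return np.concatenate([right, left], axis=0)
    if mode == "red-cyan-anaglyph":
        out = right.copy()          # green and blue channels come from the right view
        out[..., 0] = left[..., 0]  # red channel comes from the left view
        return out
    raise ValueError(f"unknown mode: {mode}")
```

The anaglyph branch follows the usual red-cyan convention, where the red filter sits over the left eye.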

Native VR Viewing

Auto-Launch to Headset – Direct VR viewing without browser

Multiple Stereo Formats – Side-by-Side, Over-Under, Mono

Projection Options – Flat, Curved, 180° Dome, 360° Sphere

Image & Video Support – View both stereo images and videos

All VR Headsets – Quest, Vive, Index, WMR, and more

StereoDiffusion AI

AI-Powered Generation – Uses diffusion models for stereo creation

DDIM Inversion – Null-text optimization for high-quality reconstruction

Bilateral Neighbor Attention – Stereo-consistent diffusion

ComfyUI Native Models – Works with MODEL/CLIP/VAE inputs

Diffusers Support – Also works with HuggingFace model IDs

Installation

Method 1: ComfyUI Manager (Recommended)

Open ComfyUI Manager

Search for “ComfyStereo”

Click Install

Restart ComfyUI

Method 2: Manual Installation

Clone the repository:

cd ComfyUI/custom_nodes/
git clone https://github.com/Dobidop/ComfyStereo.git
cd ComfyStereo

Install dependencies (choose one):

# Base only (stereo generation + DeoVR)
pip install -r requirements.txt

Restart ComfyUI

Available Nodes

Stereo Image Generation Nodes

1. Stereo Image Node

The main node for depth-based stereo conversion.

Inputs:

image (IMAGE) – Source image

depth_map (IMAGE) – Depth map (grayscale)

divergence (FLOAT) – Stereo effect strength (0.05-15.0, default: 3.5)

separation (FLOAT) – Additional horizontal shift (-5.0 to 5.0)

stereo_balance (FLOAT) – Effect distribution between eyes (-0.95 to 0.95)

convergence_point (FLOAT) – Depth level at screen plane (0.0-1.0, default: 0.5)

modes – Output format: left-right, right-left, top-bottom, bottom-top, red-cyan-anaglyph

fill_technique – Infill method (see Infill Methods)

depth_map_blur (BOOLEAN) – Enable edge-aware depth blurring

depth_blur_edge_threshold (FLOAT) – Gradient sharpness cutoff (0.1-15.0)

batch_size (INT) – Frames per memory cleanup cycle

Outputs:

stereoscope (IMAGE) – Final stereo image

blurred_depthmap_left (IMAGE) – Processed left depth map

blurred_depthmap_right (IMAGE) – Processed right depth map

no_fill_imperfect_mask (MASK) – Unfilled region mask
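
The batch_size input is described as the number of frames per memory cleanup cycle. One way such a cycle could look is sketched below; this is an illustration under assumptions, not the node's implementation. The names `process_in_batches`, `convert`, and `cleanup` are hypothetical.

```python
import gc
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_in_batches(
    frames: Iterable[T],
    convert: Callable[[T], R],
    batch_size: int,
    cleanup: Callable[[], object] = gc.collect,
) -> Iterator[R]:
    """Yield converted frames, calling cleanup() after every batch_size frames."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for index, frame in enumerate(frames, start=1):
        yield convert(frame)
        if index % batch_size == 0:
            cleanup()  # runs once the consumer asks for the next frame
```

On a CUDA system, a caller could pass a cleanup function that also calls `torch.cuda.empty_cache()`; a smaller batch_size trades some speed for a lower peak memory footprint.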

Native VR Viewer Nodes

2. Native Stereo Image Viewer

Auto-launches images directly into the VR headset.

Inputs:

image (IMAGE) – Stereo image

stereo_format – Side-by-Side, Over-Under, Mono

projection_type – Flat Screen, Curved Screen, 180° Dome, 360° Sphere

screen_size (FLOAT) – Virtual screen size (1.0-10.0)

screen_distance (FLOAT) – Distance from viewer (1.0-10.0)

swap_eyes (BOOLEAN) – Swap left/right

auto_launch (BOOLEAN) – Launch into headset

background_color – Black, Dark Gray, Gray, White

Outputs:

passthrough (IMAGE) – Original image
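
To get a feel for how screen_size and screen_distance interact, the sketch below computes the horizontal angle a flat screen would fill in the viewer's field of view. It assumes that screen_size is the screen width and that both values use the same unit; the README does not state the units, and the function name is hypothetical.

```python
import math


def horizontal_fov_degrees(screen_size: float, screen_distance: float) -> float:
    """Angle subtended by a flat screen of the given width, viewed head-on from its center axis."""
    if screen_size <= 0 or screen_distance <= 0:
        raise ValueError("screen_size and screen_distance must be positive")
    return math.degrees(2.0 * math.atan(screen_size / (2.0 * screen_distance)))
```

Under these assumptions, doubling screen_distance while keeping screen_size fixed shrinks the apparent screen, and scaling both by the same factor leaves the angle unchanged.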

3. Native Stereo Video Viewer

Play stereo videos in VR with keyboard controls.

Inputs:

video_path (STRING) – Path to stereo video

stereo_format – Side-by-Side, Over-Under, Mono

projection_type – Flat Screen, Curved Screen, 180° Dome, 360° Sphere

screen_size (FLOAT) – Virtual screen size

screen_distance (FLOAT) – Distance from viewer

swap_eyes (BOOLEAN)

loop_video (BOOLEAN)

auto_launch (BOOLEAN)

background_color

4. Native VR Status

Check PyOpenXR installation and VR runtime availability.

Outputs:

status_message (STRING) – Diagnostic information

is_available (BOOLEAN) – VR readiness
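
A check in the same spirit can be run outside ComfyUI. The sketch below only tests whether the PyOpenXR Python module is importable and returns a message plus a flag, mirroring the two outputs above. It assumes the pyopenxr package is imported under the module name `xr`, it does not probe the VR runtime or the headset, and `native_vr_status` is a hypothetical helper rather than the node's code.

```python
import importlib.util
from typing import Tuple


def native_vr_status(module_name: str = "xr") -> Tuple[str, bool]:
    """Report whether the given top-level module (PyOpenXR by default) is installed."""
    if importlib.util.find_spec(module_name) is None:
        return (f"Module '{module_name}' not found; try: pip install pyopenxr", False)
    return (f"Module '{module_name}' is installed", True)


if __name__ == "__main__":
    message, available = native_vr_status()
    print(message, "| available:", available)
```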

StereoDiffusion AI Nodes

5. StereoDiffusion Node

AI-powered stereo generation using diffusion models.

Inputs:

image (IMAGE) – Source image

depth_map (IMAGE) – Depth map

scale_factor (FLOAT) – Disparity strength (1.0-20.0, default: 9.0)

direction – “uni” (unidirectional) or “bi” (bidirectional) attention

deblur (BOOLEAN) – Add noise to unfilled regions

num_ddim_steps (INT) – DDIM steps (10-100, default: 50)

null_text_optimization (BOOLEAN) – Enable for better quality (slower)

guidance_scale (FLOAT) – CFG scale (1.0-20.0, default: 7.5)

model_id (STRING) – HuggingFace model ID (fallback if MODEL/CLIP/VAE not provided)

Outputs:

stereo_pair (IMAGE) – Side-by-side stereo image

left_image (IMAGE) – Left eye view

right_image (IMAGE) – Right eye view

Supports: SD1.x and SD2.x models (SDXL/FLUX planned)

Key Parameters Explained

Divergence

Controls the strength of the 3D effect. Higher values = more depth perception.

Low (1-3): Subtle depth

Medium (3-7): Balanced effect

High (7-15): Extreme pop-out

Convergence Point

Controls which depth level sits at the screen plane (zero parallax).

0.0 = Nearest depth at screen → Content recedes behind screen

0.5 = Mid-depth at screen → Balanced (default)

1.0 = Furthest depth at screen → Content pops toward viewer

Use cases:

Pop-out mode (1.0): Product displays, comics

Window mode (0.0): Subtle depth, natural recession

Portrait mode (0.6-0.7): Face at screen, background recedes

Landscape mode (0.3-0.4): Foreground pops, horizon recedes
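
The interaction between divergence and convergence point can be sketched as a signed per-pixel parallax. This is an illustrative model, not the node's implementation. It assumes that depth is normalized so that 0.0 is the nearest point and 1.0 the farthest (matching the convergence values above), and that divergence is a percentage of image width. The function name `parallax_pixels` is hypothetical.

```python
import numpy as np


def parallax_pixels(depth: np.ndarray, width: int, divergence: float = 3.5,
                    convergence_point: float = 0.5) -> np.ndarray:
    """Signed horizontal parallax per pixel.

    depth: normalized distance, 0.0 = nearest, 1.0 = farthest (assumed convention).
    Positive result = behind the screen plane, negative = in front, zero = on it.
    """
    depth = np.clip(np.asarray(depth, dtype=np.float64), 0.0, 1.0)
    max_shift = divergence / 100.0 * width  # divergence read as percent of image width
    return max_shift * (depth - convergence_point)
```

With convergence_point at 0.0 every value is zero or positive, so all content sits at or behind the screen; at 1.0 every value is zero or negative, so all content sits at or in front of it.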

Stereo Balance

Distributes divergence between eyes.

0.0 = Even distribution

Positive/negative = Shift effect toward one eye

Separation

Additional horizontal shift percentage (independent of depth).
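
One plausible way to read stereo balance and separation together is as a per-eye split of the total effect. The sketch below is an assumption for illustration, not a statement of how ComfyStereo computes it: the helper name, the choice that positive balance favors the left eye, and the sign conventions are all hypothetical.

```python
from typing import Tuple

EyeSetting = Tuple[float, float]  # (divergence, constant offset), both in percent of width


def per_eye_settings(divergence: float, stereo_balance: float = 0.0,
                     separation: float = 0.0) -> Tuple[EyeSetting, EyeSetting]:
    """Return ((left_divergence, left_offset), (right_divergence, right_offset))."""
    if not -1.0 < stereo_balance < 1.0:
        raise ValueError("stereo_balance must lie strictly between -1 and 1")
    share = (stereo_balance + 1.0) / 2.0               # fraction of the effect given to the left eye
    left = (divergence * share, -separation)           # left view shifts one way
    right = (-divergence * (1.0 - share), separation)  # right view shifts the other way
    return left, right
```

With stereo_balance at 0.0 each eye receives half of the divergence; the separation offset is applied equally and oppositely to both eyes regardless of depth.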

GPU Acceleration

Depth processing is automatically GPU-accelerated when CUDA is available:

5-20x faster blur operations

Automatic fallback to CPU if GPU unavailable

Zero configuration – works out of the box
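
The fallback behavior described above follows a common PyTorch pattern, sketched here. The helper name `pick_device` is hypothetical; `torch.cuda.is_available()` is the standard PyTorch call for this check.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable CUDA device, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed at all
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Depth processing would run on:", pick_device())
```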

Native VR Setup

Requirements

Install PyOpenXR dependencies:

pip install -r requirements.txt

Install a VR runtime:

SteamVR (recommended) – Supports most headsets

Oculus Runtime – For Meta Quest headsets

Windows Mixed Reality – Built into Windows 10/11

Connect your VR headset

Supported Headsets

Meta Quest (1, 2, 3, Pro)

HTC Vive / Vive Pro

Valve Index

Windows Mixed Reality headsets

Any OpenXR-compatible device

StereoDiffusion Setup

Requirements

CUDA-capable GPU with 8GB+ VRAM (16GB recommended)

Python 3.8+

PyTorch 2.0+

First Run

Downloads the Stable Diffusion model (release version SD1.5, ~4GB) if it is not already cached

Null-text optimization takes ~2-3 minutes on a modern GPU

The model is cached, so subsequent runs start faster

Performance Tips

Lower num_ddim_steps to 30 for faster processing

Disable null_text_optimization for roughly 3x faster runs (lower quality)

Use a guidance_scale of 3-5 to reduce the “burned” look

Troubleshooting StereoDiffusion

Out of Memory: Reduce num_ddim_steps, close other apps

Black Output: Check that the depth map is a valid grayscale image

Poor Quality: Enable null_text_optimization, increase num_ddim_steps
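
For the black-output case, a quick sanity check on the depth map can rule out the most common problems. The sketch below is a hypothetical helper, not part of ComfyStereo. It assumes a single frame given as an H x W or H x W x C NumPy array (channels last).

```python
import numpy as np
from typing import List


def depth_map_problems(depth: np.ndarray) -> List[str]:
    """Return human-readable problems found in a depth map (empty list if none)."""
    problems = []
    arr = np.asarray(depth, dtype=np.float64)
    if arr.ndim not in (2, 3):
        return [f"expected a 2D or 3D array, got {arr.ndim}D"]
    if not np.all(np.isfinite(arr)):
        problems.append("contains NaN or infinite values")
    if arr.ndim == 3 and arr.shape[-1] >= 3:
        rgb = arr[..., :3]
        if not (np.allclose(rgb[..., 0], rgb[..., 1]) and np.allclose(rgb[..., 0], rgb[..., 2])):
            problems.append("color channels differ, so the map is not grayscale")
    if arr.size and np.nanmax(arr) == np.nanmin(arr):
        problems.append("all values are identical, so there is no depth variation")
    return problems
```

An empty result does not guarantee a good depth map; it only means none of these specific issues were detected.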

License

MIT License – see LICENSE file for details.

Note: This project includes code from multiple sources:

StereoDiffusion components are based on StereoDiffusion (MIT License)

Diffusion utilities derived from prompt-to-prompt (Apache 2.0)

See NOTICE file for full attribution

Credits

Created by Dobidop

Acknowledgments

StereoDiffusion

@inproceedings{wang2024stereodiffusion,
  title={StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models},
  author={Wang, Lezhong and Frisvad, Jeppe Revall and Jensen, Mark Bo and Bigdeli, Siavash Arjomand},
  booktitle={CVPR},
  year={2024}
}

Prompt-to-Prompt

@article{hertz2022prompt,
  title={Prompt-to-prompt image editing with cross attention control},
  author={Hertz, Amir and Mokady, Ron and Tenenbaum, Jay and Aberman, Kfir and Pritch, Yael and Cohen-Or, Daniel},
  year={2022}
}

Contributing

Contributions welcome! Please submit issues or pull requests.

Support

Issues: GitHub Issues