# Core dependencies for basic stereo image generation
torch
numpy
Pillow
opencv-python
numba
scipy
psutil
moderngl

# Native VR Viewer Dependencies (PyOpenXR)
# For native VR viewing with auto-launch to headset
pyopenxr>=1.0.0
PyOpenGL>=3.1.0
PyOpenGL_accelerate>=3.1.0
glfw>=2.0.0
opencv-python>=4.0.0  # For video playback support
pygame>=2.0.0  # For audio playback

# Note: ffmpeg must be installed separately for audio extraction from videos
# Windows: Download from https://ffmpeg.org/download.html
# Linux: sudo apt install ffmpeg
# Mac: brew install ffmpeg

# StereoDiffusion Dependencies
# For AI-powered stereo generation using diffusion models
diffusers>=0.21.0
transformers
accelerate
einops
tqdm
scikit-image

# Note: torch, numpy, Pillow, and opencv-python are already included in base dependencies above




A stereoscopic 3D toolkit for ComfyUI that combines three solutions into one unified package:
This example demonstrates the full stereo conversion pipeline using a depth map and video input.
Download workflow:
Video2Stereo.json — Import directly into ComfyUI
This is the final stereoscopic video generated by ComfyStereo.
This is the original monoscopic input video before stereo processing.
This is the depth map generated and used in the workflow.
cd ComfyUI/custom_nodes/
git clone https://github.com/Dobidop/ComfyStereo.git
cd ComfyStereo
# Base only (stereo generation + DeoVR)
pip install -r requirements.txt
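After installing, a short script can report which dependency groups are actually importable. This is a minimal sketch rather than part of ComfyStereo: the group names and the `missing_modules` helper are hypothetical, and the module names are the import names these packages normally expose (for example `xr` for pyopenxr and `cv2` for opencv-python), which differ from the pip names.

```python
import importlib.util
import shutil

# Import names differ from the pip package names in requirements.txt:
# Pillow -> PIL, opencv-python -> cv2, pyopenxr -> xr, PyOpenGL -> OpenGL,
# PyOpenGL_accelerate -> OpenGL_accelerate, scikit-image -> skimage.
DEPENDENCY_GROUPS = {
    "base": ["torch", "numpy", "PIL", "cv2", "numba", "scipy", "psutil", "moderngl"],
    "native VR viewer": ["xr", "OpenGL", "OpenGL_accelerate", "glfw", "cv2", "pygame"],
    "StereoDiffusion": ["diffusers", "transformers", "accelerate", "einops", "tqdm", "skimage"],
}


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def report():
    """Print one status line per dependency group, plus the ffmpeg check."""
    for group, modules in DEPENDENCY_GROUPS.items():
        missing = missing_modules(modules)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{group}: {status}")
    # ffmpeg is an external program, not a Python package, so look on PATH.
    print("ffmpeg:", "ok" if shutil.which("ffmpeg") else "not found on PATH")


if __name__ == "__main__":
    report()
```

Run it with the same Python interpreter that ComfyUI uses, otherwise the result describes a different environment.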
The main node for depth-based stereo conversion.
Inputs:
- image (IMAGE) – Source image
- depth_map (IMAGE) – Depth map (grayscale)
- divergence (FLOAT) – Stereo effect strength (0.05-15.0, default: 3.5)
- separation (FLOAT) – Additional horizontal shift (-5.0 to 5.0)
- stereo_balance (FLOAT) – Effect distribution between eyes (-0.95 to 0.95)
- convergence_point (FLOAT) – Depth level at screen plane (0.0-1.0, default: 0.5)
- modes – Output format: left-right, right-left, top-bottom, bottom-top, red-cyan-anaglyph
- fill_technique – Infill method (see Infill Methods)
- depth_map_blur (BOOLEAN) – Enable edge-aware depth blurring
- depth_blur_edge_threshold (FLOAT) – Gradient sharpness cutoff (0.1-15.0)
- batch_size (INT) – Frames per memory cleanup cycle

Outputs:
- stereoscope (IMAGE) – Final stereo image
- blurred_depthmap_left (IMAGE) – Processed left depth map
- blurred_depthmap_right (IMAGE) – Processed right depth map
- no_fill_imperfect_mask (MASK) – Unfilled region mask

Auto-launches images directly into a VR headset.
Inputs:
- image (IMAGE) – Stereo image
- stereo_format – Side-by-Side, Over-Under, Mono
- projection_type – Flat Screen, Curved Screen, 180° Dome, 360° Sphere
- screen_size (FLOAT) – Virtual screen size (1.0-10.0)
- screen_distance (FLOAT) – Distance from viewer (1.0-10.0)
- swap_eyes (BOOLEAN) – Swap left/right
- auto_launch (BOOLEAN) – Launch into headset
- background_color – Black, Dark Gray, Gray, White

Outputs:
- passthrough (IMAGE) – Original image

Play stereo videos in VR with keyboard controls.
Inputs:
- video_path (STRING) – Path to stereo video
- stereo_format – Side-by-Side, Over-Under, Mono
- projection_type – Flat Screen, Curved Screen, 180° Dome, 360° Sphere
- screen_size (FLOAT) – Virtual screen size
- screen_distance (FLOAT) – Distance from viewer
- swap_eyes (BOOLEAN)
- loop_video (BOOLEAN)
- auto_launch (BOOLEAN)
- background_color

Check PyOpenXR installation and VR runtime availability.
Outputs:
- status_message (STRING) – Diagnostic information
- is_available (BOOLEAN) – VR readiness

AI-powered stereo generation using diffusion models.
Inputs:
- image (IMAGE) – Source image
- depth_map (IMAGE) – Depth map
- scale_factor (FLOAT) – Disparity strength (1.0-20.0, default: 9.0)
- direction – “uni” (unidirectional) or “bi” (bidirectional) attention
- deblur (BOOLEAN) – Add noise to unfilled regions
- num_ddim_steps (INT) – DDIM steps (10-100, default: 50)
- null_text_optimization (BOOLEAN) – Enable for better quality (slower)
- guidance_scale (FLOAT) – CFG scale (1.0-20.0, default: 7.5)
- model_id (STRING) – HuggingFace model ID (fallback if MODEL/CLIP/VAE not provided)

Outputs:
- stereo_pair (IMAGE) – Side-by-side stereo image
- left_image (IMAGE) – Left eye view
- right_image (IMAGE) – Right eye view

Supports: SD1.x and SD2.x models (SDXL/FLUX planned)
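When driving the StereoDiffusion node from a script rather than the ComfyUI interface, it can help to keep its settings in one place and hold them inside the documented ranges. The sketch below is illustrative only: `StereoDiffusionSettings` is a hypothetical container, not a class from ComfyStereo, and the defaults for `deblur` and `null_text_optimization` are assumptions because they are not documented above.

```python
from dataclasses import dataclass


def _clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass
class StereoDiffusionSettings:
    """Hypothetical settings holder; field names mirror the node inputs."""
    scale_factor: float = 9.0             # documented range 1.0-20.0
    direction: str = "uni"                # "uni" or "bi"
    deblur: bool = False                  # default is an assumption
    num_ddim_steps: int = 50              # documented range 10-100
    null_text_optimization: bool = False  # default is an assumption
    guidance_scale: float = 7.5           # documented range 1.0-20.0

    def __post_init__(self):
        if self.direction not in ("uni", "bi"):
            raise ValueError(f"direction must be 'uni' or 'bi', got {self.direction!r}")
        # Clamp numeric values to the ranges listed for the node inputs.
        self.scale_factor = _clamp(float(self.scale_factor), 1.0, 20.0)
        self.num_ddim_steps = _clamp(int(self.num_ddim_steps), 10, 100)
        self.guidance_scale = _clamp(float(self.guidance_scale), 1.0, 20.0)


settings = StereoDiffusionSettings(num_ddim_steps=30, guidance_scale=4.0)
print(settings)
```

Clamping silently is a design choice made for this sketch; raising an error on out-of-range values would be just as reasonable.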
The divergence parameter controls the strength of the 3D effect. Higher values produce a stronger sense of depth.
The convergence_point parameter controls which depth appears at the screen plane (zero parallax).
- 0.0 = Nearest depth at screen → Content recedes behind screen
- 0.5 = Mid-depth at screen → Balanced (default)
- 1.0 = Furthest depth at screen → Content pops toward viewer

Use cases:
The stereo_balance parameter distributes divergence between the eyes.
- 0.0 = Even distribution

The separation parameter adds a horizontal shift, given as a percentage, that is independent of depth.
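To make the interaction of divergence, convergence_point, stereo_balance, and separation concrete, here is a minimal sketch of how per-eye pixel shifts could be derived from them. It is not taken from the ComfyStereo source. It assumes that divergence and separation are percentages of image width, that the input is a normalized distance where 0.0 is nearest and 1.0 is furthest, and that a positive stereo_balance gives the left eye the larger share. The function name `eye_shifts` is hypothetical.

```python
def eye_shifts(distance, width, divergence=3.5, convergence_point=0.5,
               stereo_balance=0.0, separation=0.0):
    """Return (left_shift, right_shift) in pixels for a normalized distance.

    distance: 0.0 = nearest, 1.0 = furthest (assumed convention).
    Works on plain floats or NumPy arrays, since it only uses arithmetic.
    """
    # Assumption: divergence and separation are percentages of image width.
    divergence_px = divergence / 100.0 * width
    separation_px = separation / 100.0 * width

    # Signed parallax: zero at the convergence point, positive for content
    # behind the screen, negative for content in front of it.
    parallax = (distance - convergence_point) * divergence_px

    # Assumption: stereo_balance = 0.0 splits the parallax evenly, and
    # positive values give the left eye a larger share.
    left_share = (1.0 + stereo_balance) / 2.0
    right_share = 1.0 - left_share

    # Separation is applied regardless of depth, half to each eye.
    left_shift = -parallax * left_share - separation_px / 2.0
    right_shift = parallax * right_share + separation_px / 2.0
    return left_shift, right_shift


# Parallax (right minus left) for a mid-distance pixel at three settings.
for c in (0.0, 0.5, 1.0):
    left, right = eye_shifts(distance=0.5, width=1920, convergence_point=c)
    print(f"convergence_point={c}: parallax={right - left:+.1f}px")
```

With convergence_point at 0.0 every parallax value is zero or positive, so all content sits at or behind the screen. At 1.0 every value is zero or negative, so content comes forward, which matches the behaviour described for the two extremes.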
Depth processing is automatically GPU-accelerated when CUDA is available:
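A quick way to see which path is likely to be taken on a given machine is to ask PyTorch whether it can see a CUDA device. This is a sketch under the assumption that the project relies on PyTorch's CUDA detection; the README does not say which library performs the check, and `pick_device` is a hypothetical helper.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable CUDA device, else "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate with, so report CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print("Depth processing would run on:", pick_device())
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build is probably CPU-only.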
pip install -r requirements.txt
- Reduce num_ddim_steps to 30 for faster processing
- Disable null_text_optimization for 3x speed (lower quality)
- Set guidance_scale to 3-5 to reduce the “burned” look
- Reduce num_ddim_steps, close other apps
- Enable null_text_optimization, increase num_ddim_steps

MIT License – see LICENSE file for details.
Note: This project includes code from multiple sources:
Created by Dobidop
@inproceedings{wang2024stereodiffusion,
title={StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models},
author={Wang, Lezhong and Frisvad, Jeppe Revall and Jensen, Mark Bo and Bigdeli, Siavash Arjomand},
booktitle={CVPR},
year={2024}
}
@article{hertz2022prompt,
title={Prompt-to-prompt image editing with cross attention control},
author={Hertz, Amir and Mokady, Ron and Tenenbaum, Jay and Aberman, Kfir and Pritch, Yael and Cohen-Or, Daniel},
year={2022}
}
Contributions welcome! Please submit issues or pull requests.