Automatically enables TensorFloat-32 (TF32) acceleration for NVIDIA RTX 30/40/50 series GPUs in ComfyUI.
Note: to use torch.compile, you have to disable cudaMallocAsync.
cd ComfyUI/custom_nodes
git clone https://github.com/marduk191/ComfyUI-TF32-Enabler.git
# Or download and extract the zip file
When ComfyUI starts, you should see:
============================================================
🚀 ComfyUI TF32 Acceleration Enabled
============================================================
Matmul TF32: True
cuDNN TF32: True
CUDA Allocator: expandable_segments:True
GPU: NVIDIA GeForce RTX 5090
Compute Capability: 12.0
✅ torch.compile CUDA allocator fix applied
============================================================
Run the included test script to verify torch.compile works:
cd ComfyUI/custom_nodes/ComfyUI-TF32-Enabler
python test_torch_compile.py
This custom node enables:
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

TF32 uses a 10-bit mantissa (vs FP32’s 23-bit) while keeping the same 8-bit exponent range, so it trades some precision for faster Tensor Core math without shrinking the representable range.
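To make the mantissa difference concrete, here is a minimal pure-Python sketch that approximates TF32 by zeroing the low 13 mantissa bits of a float32 value. The helper name `to_tf32` is hypothetical, and plain truncation is an assumption made for simplicity; the rounding that the hardware actually applies may differ.

```python
import struct


def to_tf32(x: float) -> float:
    """Approximate TF32 precision by truncating a float32 to 10 mantissa bits.

    Assumption: simple truncation (not the hardware's rounding mode).
    """
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep sign (1 bit) + exponent (8 bits) + top 10 mantissa bits,
    # and clear the remaining 13 low mantissa bits.
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for value in (1.0, 1.0 + 2**-10, 1.0 + 2**-11, 3.14159):
        approx = to_tf32(value)
        print(f"{value!r} -> {approx!r} (abs diff {abs(value - approx):.3e})")
```

The exponent bits are untouched, so the representable range stays the same as FP32; only the step between neighbouring values grows.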
The expandable segments allocator configuration resolves memory allocation issues when using torch.compile with CUDA operations.
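The settings listed above could be applied at startup roughly as follows. This is a minimal sketch, not the node's actual source: the function name `enable_tf32` is hypothetical, and it assumes the allocator setting has to be in the environment before PyTorch initializes CUDA.

```python
import os


def enable_tf32():
    """Apply the TF32 flags and allocator setting; return the resulting state.

    Returns None if PyTorch is not installed (hypothetical helper).
    """
    # Set the allocator config first, without overriding a value the user
    # has already exported in their shell.
    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

    try:
        import torch
    except ImportError:
        return None  # nothing to configure without PyTorch

    # Allow TF32 for matrix multiplications and for cuDNN convolutions.
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

    return {
        "matmul_tf32": torch.backends.cuda.matmul.allow_tf32,
        "cudnn_tf32": torch.backends.cudnn.allow_tf32,
        "alloc_conf": os.environ["PYTORCH_CUDA_ALLOC_CONF"],
    }


if __name__ == "__main__":
    print(enable_tf32())
```

Using `setdefault` is a design choice in this sketch: it leaves any allocator configuration you have already set untouched.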
Typical speedups on RTX 5090:
Works with all ComfyUI workflows and custom nodes. No conflicts expected.
MIT License – See LICENSE file for details
Issues and pull requests welcome!
Note: If your GPU doesn’t support TF32 (older than RTX 30 series), this node will safely do nothing and won’t cause errors.
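The fallback described in this note can be sketched as a compute capability check. TF32 Tensor Core support starts with NVIDIA's Ampere architecture (compute capability 8.0; the RTX 30 series reports 8.6). The helper names `supports_tf32` and `detect_tf32_support` are hypothetical, and this is not the node's actual code.

```python
from typing import Optional, Tuple


def supports_tf32(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability is Ampere or newer."""
    major, _minor = capability
    return major >= 8


def detect_tf32_support() -> Optional[bool]:
    """Probe the current GPU; None means PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return False  # no CUDA device, so there is nothing to enable
    return supports_tf32(torch.cuda.get_device_capability())


if __name__ == "__main__":
    print("RTX 20 series (7.5):", supports_tf32((7, 5)))
    print("RTX 30 series (8.6):", supports_tf32((8, 6)))
    print("This machine:", detect_tf32_support())
```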