Note
The Japanese README is available here.
ComfyUI Flux Accelerator is a custom node for ComfyUI that accelerates Flux.1 image generation simply by adding the node to your workflow.
ComfyUI Flux Accelerator accelerates the generation of images by:
TAEF1 is a fast and efficient AutoEncoder that can encode and decode pixels in a very short time, in exchange for a little bit of quality.
ComfyUI Flux Accelerator utilizes torchao and torch.compile() to optimize the model and make it faster.
ComfyUI Flux Accelerator offers an option to skip redundant DiT blocks, which directly affects the speed of the generation.
You can choose which blocks to skip in the node (the default is `3, 12` of the MMDiT blocks).
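The block-skipping idea can be illustrated with a minimal sketch in plain Python. This is not the node's actual implementation: `parse_skip_indices` and `run_blocks` are hypothetical helpers, the blocks are toy callables standing in for real MMDiT layers, and reading `3, 12` as zero-based block indices is an assumption.

```python
from typing import Callable, List, Set


def parse_skip_indices(spec: str) -> Set[int]:
    """Parse a comma-separated string such as "3, 12" into a set of indices."""
    return {int(part) for part in spec.split(",") if part.strip()}


def run_blocks(blocks: List[Callable[[int], int]], x: int, skip: Set[int]) -> int:
    """Apply each block in order, leaving out the blocks whose index is in `skip`."""
    for index, block in enumerate(blocks):
        if index in skip:
            continue  # a skipped block costs no compute; its input passes through unchanged
        x = block(x)
    return x


# Toy stand-ins for transformer blocks: each one adds 1 to its input.
# The count of 19 is illustrative only.
blocks = [lambda v: v + 1 for _ in range(19)]
skip = parse_skip_indices("3, 12")
result = run_blocks(blocks, 0, skip)  # 17 blocks actually run
```

Every skipped block saves one forward pass through that layer, which is why the number of skipped blocks directly affects generation speed, and also why skipping too many degrades quality.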
ComfyUI Flux Accelerator can generate images up to _37.25%_ faster than the default settings.
Here are some examples (tested on RTX 4090):
Clone this repository into the custom_nodes folder of ComfyUI:
```bash
git clone https://github.com/discus0434/comfyui-flux-accelerator.git
mv comfyui-flux-accelerator custom_nodes/
```
```bash
## Copied and modified https://github.com/facebookresearch/xformers/blob/main/README.md
# cuda 11.8 version
pip3 install -U torch torchvision torchao triton xformers --index-url https://download.pytorch.org/whl/cu118
# cuda 12.1 version
pip3 install -U torch torchvision torchao triton xformers --index-url https://download.pytorch.org/whl/cu121
# cuda 12.4 version
pip3 install -U torch torchvision torchao triton xformers --index-url https://download.pytorch.org/whl/cu124
```
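The three install commands differ only in the CUDA tag at the end of the index URL. As a rough sketch, the following Python helper builds the matching command for a given CUDA version. `build_install_command` and the two constants are hypothetical names introduced for illustration; only the three CUDA versions listed above are covered.

```python
from typing import List

# CUDA tags taken from the install commands above; other versions are not covered.
SUPPORTED_CUDA_TAGS = {"11.8": "cu118", "12.1": "cu121", "12.4": "cu124"}
PACKAGES = ["torch", "torchvision", "torchao", "triton", "xformers"]


def build_install_command(cuda_version: str) -> List[str]:
    """Return the pip command (as an argument list) for a supported CUDA version."""
    if cuda_version not in SUPPORTED_CUDA_TAGS:
        raise ValueError(f"No install command listed for CUDA {cuda_version}")
    tag = SUPPORTED_CUDA_TAGS[cuda_version]
    index_url = f"https://download.pytorch.org/whl/{tag}"
    return ["pip3", "install", "-U", *PACKAGES, "--index-url", index_url]


# Print the command for CUDA 12.1 as a single shell line.
print(" ".join(build_install_command("12.1")))
```

Returning an argument list rather than a string makes the result safe to pass to `subprocess.run` without shell quoting.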
```bash
cd custom_nodes/comfyui-flux-accelerator
chmod +x scripts/download_taef1.sh
./scripts/download_taef1.sh
```
_Launch command may vary depending on your environment._
a. If you have an H100, L40, or newer GPU
```bash
python main.py --fast --highvram --disable-cuda-malloc
```
b. If you have an RTX 4090
```bash
python main.py --fast --highvram
```
c. Otherwise
```bash
python main.py
```
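The three cases above amount to a lookup on the GPU model. A minimal sketch of that decision in Python, assuming you already have the GPU name as a string; `launch_args` is a hypothetical helper, and matching on substrings of the name is an assumption made for illustration. It cannot detect the "or newer" case, so newer GPUs need to be added by hand.

```python
from typing import List


def launch_args(gpu_name: str) -> List[str]:
    """Pick the ComfyUI launch command suggested above for a given GPU name."""
    name = gpu_name.upper()
    base = ["python", "main.py"]
    # Case a: H100 or L40 (newer GPUs are not detected by this simple match).
    if "H100" in name or "L40" in name:
        return base + ["--fast", "--highvram", "--disable-cuda-malloc"]
    # Case b: RTX 4090.
    if "RTX 4090" in name:
        return base + ["--fast", "--highvram"]
    # Case c: everything else launches with no extra flags.
    return base


print(" ".join(launch_args("NVIDIA GeForce RTX 4090")))
```

Because the launch command may vary with your environment, treat the result as a starting point rather than a fixed rule.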
Load the workflow from the workflow folder.
_You can load the workflow by clicking the Load button in ComfyUI._
Just use the FluxAccelerator node in the workflow, and you’re good to go!
_If your GPU has less than 24GB of VRAM, you may encounter frequent out-of-memory errors when changing parameters. If that happens, simply run the workflow again and it should work._
ComfyUI Flux Accelerator has the following limitations:
ComfyUI Flux Accelerator sacrifices _a little bit_ of quality for speed by using TAEF1 and skipping redundant DiT layers. If you need high-quality images, you may want to use the default settings.
ComfyUI Flux Accelerator may take _30-60 seconds_ to compile the model for the first time. This is because it uses torch.compile() to optimize the model.
ComfyUI Flux Accelerator is currently compatible only with Linux.
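If you script your setup, the Linux-only limitation can be checked up front. A small sketch, where `is_supported_platform` is a hypothetical helper and not part of this project:

```python
import sys


def is_supported_platform(platform: str = sys.platform) -> bool:
    """Return True when the platform string identifies Linux."""
    return platform.startswith("linux")


if not is_supported_platform():
    print("ComfyUI Flux Accelerator currently supports Linux only.")
```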
ComfyUI Flux Accelerator is licensed under the MIT License. See LICENSE for more information.