ComfyUI-Allegro

★ 5

Text-to-video · Short video generation · Allegro integration · Model download and management

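Integrates Allegro into ComfyUI to generate relatively high-quality short videos from text prompts. Models can be downloaded and managed automatically or manually, which makes it easy to try out and deploy locally.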

💡 Quickly generate high-quality short videos from text prompts in ComfyUI, with automatic model download

🍴 3 Forks · 💻 Python · 🔄 2025-05-13


📦 Netdisk download

Copy the link below and download from Quark Netdisk:

https://pan.quark.cn/s/9d76119b2ef2

📦 requirements.txt

accelerate==0.33.0
diffusers==0.28.0
numpy==1.24.4
torch==2.4.1
tqdm==4.66.2
transformers==4.40.1
xformers==0.0.28.post1
einops==0.7.0
decord==0.6.0
sentencepiece==0.1.99
imageio
imageio-ffmpeg
ftfy
bs4

📄 README

ComfyUI-Allegro

ComfyUI support for rhymes-ai/Allegro, which generates short videos of relatively high quality from text prompts, especially compared with the other open-source solutions currently available.

News 🔥

[25/1/14] Added support for Image-to-Video models (called Text-Image-to-Video in Allegro's terminology)

Installation

_Assuming that you are under your ComfyUI root directory_

git clone https://github.com/bombax-xiaoice/ComfyUI-Allegro custom_nodes/ComfyUI-Allegro

pip install -r custom_nodes/ComfyUI-Allegro/requirements.txt

_You can download the model files from Hugging Face or a mirror site beforehand, or simply let the first run of (Down)Load Allegro Model or (Down)Load Allegro TextImage2Video Model download them_

git lfs clone https://huggingface.co/rhymes-ai/Allegro custom_nodes/ComfyUI-Allegro/models

git lfs clone https://huggingface.co/rhymes-ai/Allegro-TI2V custom_nodes/ComfyUI-Allegro/ti2v_models

_Alternatively, if local disk space or download time is a concern, download only the transformer from Allegro-TI2V and share the other folders with Allegro_

mkdir -p custom_nodes/ComfyUI-Allegro/ti2v_models/transformer/

wget https://huggingface.co/rhymes-ai/Allegro-TI2V/resolve/main/transformer/config.json -O custom_nodes/ComfyUI-Allegro/ti2v_models/transformer/config.json

wget https://huggingface.co/rhymes-ai/Allegro-TI2V/resolve/main/transformer/diffusion_pytorch_model.safetensors -O custom_nodes/ComfyUI-Allegro/ti2v_models/transformer/diffusion_pytorch_model.safetensors

ln -s ../models/vae custom_nodes/ComfyUI-Allegro/ti2v_models/vae

ln -s ../models/text_encoder custom_nodes/ComfyUI-Allegro/ti2v_models/text_encoder

ln -s ../models/tokenizer custom_nodes/ComfyUI-Allegro/ti2v_models/tokenizer

ln -s ../models/scheduler custom_nodes/ComfyUI-Allegro/ti2v_models/scheduler

_A relative symlink target is resolved from the folder that contains the link, so the targets above start with ../models rather than the path from the ComfyUI root_
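
After the partial download, it can help to confirm that the shared folders resolve before starting ComfyUI. The following is a minimal sketch, not part of ComfyUI-Allegro: the helper name missing_ti2v_entries is hypothetical, and the expected entries are taken only from the commands above (two transformer files plus the four shared folders).

```python
from pathlib import Path

# Sub-folders that the steps above share between Allegro and Allegro-TI2V.
SHARED = ("vae", "text_encoder", "tokenizer", "scheduler")
# Files that the steps above download for the TI2V transformer.
TRANSFORMER_FILES = ("config.json", "diffusion_pytorch_model.safetensors")


def missing_ti2v_entries(ti2v_dir):
    """Return the relative paths under ti2v_dir that are absent or dangling.

    Hypothetical helper for illustration; the node pack does not ship it.
    """
    root = Path(ti2v_dir)
    expected = [Path("transformer") / name for name in TRANSFORMER_FILES]
    expected += [Path(name) for name in SHARED]
    # Path.exists() follows symlinks, so a dangling link is reported as missing.
    return [p.as_posix() for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    problems = missing_ti2v_entries("custom_nodes/ComfyUI-Allegro/ti2v_models")
    print("missing or dangling:", problems or "none")
```

Run it from the ComfyUI root directory. An entry such as vae in the output usually means the symlink points at a path that does not exist.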

Example Workflow

Drag the following image into ComfyUI, or click Load and select custom_nodes/ComfyUI-Allegro/allegro-comfy-example.json

Results from running in ComfyUI

https://github.com/user-attachments/assets/75f90597-7e33-4076-b00f-7ed5d88ea22b

Example TextImage2Video Workflow

Drag the following image into ComfyUI, or click Load and select custom_nodes/ComfyUI-Allegro/allegro-ti2v-comfy-example.json

Tips

Only frames=88, width=1280, height=720 has been verified to work. With 24 frames the result looks like random mosaics, and with width=560 noisy bars show up along both the left and right edges.

In (Down)Load Allegro Model, provide only the model path and leave the other fields blank, unless you want to use an alternative text encoder or VAE model not provided by https://huggingface.co/rhymes-ai/Allegro

If you leave the negative prompt blank in Allegro Text Encoder, a default one is used. A static template is also applied to the positive prompt.

You can leave Allegro Sampler's input latents unconnected, in which case frames/width/height are used to initialize the latents randomly. Otherwise, the batch size of the input latents must be 1/4 of the desired frame count.
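
The 1/4 rule for input latents is easy to get wrong by hand, so a small helper can do the arithmetic. This is only a sketch: latent_batch_size is a hypothetical name, and rejecting frame counts that are not a multiple of 4 is an assumption made here rather than documented behavior of Allegro Sampler.

```python
def latent_batch_size(frames, temporal_factor=4):
    """Latent batch size for a desired frame count (1/4 of the frames).

    Hypothetical helper. Rejecting non-multiples of temporal_factor is an
    assumption of this sketch, not a documented rule of the sampler.
    """
    if frames <= 0 or frames % temporal_factor != 0:
        raise ValueError(f"frames must be a positive multiple of {temporal_factor}")
    return frames // temporal_factor


print(latent_batch_size(88))  # the verified 88-frame setting gives 22
```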

Verified to work on a single NVIDIA RTX 3070 card with 8 GB of graphics memory, where __low_vram_mode__ is turned on to load the 32 layers of UNet transformer blocks one by one into GPU VRAM, the VAE decoder is also loaded separately, and the text encoder falls back to the CPU.

If you have enough graphics memory, you can start ComfyUI with --highvram, which loads the entire pipeline onto the GPU directly and avoids unnecessary transfers between CPU and GPU.

It is recommended to choose a preview method (inside ComfyUI Manager), so that you can see the intermediate result of each step during the long run.

In TextImage2Video mode, ref_images, a required input to Allegro TextImage2Video Encoder, can be one reference image (starting frame), two reference images (starting and ending frames) or multiple reference images (frame interpolation). Then pass both ref_latents and ref_masks to Allegro TextImage2Video Sampler.

You can use WAS-Suite's Image Batch to put the reference images together.

The optional indices parameter further customizes the image-to-frame mapping, e.g. 0,10,-1 maps the first image to frame 0, the second image to frame 10, and the third image to the last frame.
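
The indices example can be read as the following mapping. This sketch is for illustration only: map_images_to_frames is a hypothetical function, and treating every negative value as counting from the end (Python-style) is an assumption generalized from the -1 case; the node's real parsing may differ.

```python
def map_images_to_frames(indices, num_images, num_frames):
    """Parse an indices string such as "0,10,-1" into {image_no: frame_no}.

    Hypothetical helper; not the node's actual implementation.
    """
    values = [int(tok) for tok in indices.split(",") if tok.strip()]
    if len(values) != num_images:
        raise ValueError("need exactly one index per reference image")
    mapping = {}
    for image_no, idx in enumerate(values):
        # Assumption: negative values count from the end, so -1 is the last frame.
        frame_no = idx + num_frames if idx < 0 else idx
        if not 0 <= frame_no < num_frames:
            raise ValueError(f"index {idx} is outside 0..{num_frames - 1}")
        mapping[image_no] = frame_no
    return mapping


print(map_images_to_frames("0,10,-1", num_images=3, num_frames=88))
# {0: 0, 1: 10, 2: 87}
```

With 88 frames, the third reference image lands on frame 87, the last one.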

Regarding the batch parameter in the Encoder or Decoder, a higher value may increase speed at the risk of a GPU out-of-memory (OOM) error.
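
The speed versus memory trade-off behind a batch parameter can be pictured with a generic chunking sketch. It assumes that the work is split into consecutive chunks of at most batch items, which is a common pattern but is not confirmed to be how these nodes work; chunk_ranges is a hypothetical name.

```python
def chunk_ranges(total, batch):
    """Split range(total) into consecutive (start, stop) chunks of at most batch items."""
    if batch < 1:
        raise ValueError("batch must be at least 1")
    return [(start, min(start + batch, total)) for start in range(0, total, batch)]


# 22 items, e.g. the latent batch size for 88 frames.
for batch in (1, 4, 8):
    chunks = chunk_ranges(22, batch)
    largest = max(stop - start for start, stop in chunks)
    print(f"batch={batch}: {len(chunks)} passes, largest chunk holds {largest} items")
```

Fewer passes generally means less overhead, while a larger chunk has to fit in GPU memory all at once, so it is safer to raise the value gradually.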