torch>=2.0.0
torchaudio>=2.0.0
omegaconf>=2.3.0
transformers>=4.37.0
einops>=0.8.0
einops-exts>=0.0.4
diffusers>=0.27.0
huggingface-hub>=0.25.0
librosa>=0.11.0
soundfile>=0.12.0
numpy>=1.24.0
safetensors>=0.4.0
filelock>=3.13.0
openunmix>=1.3.0
alias-free-torch>=0.0.6
descript-audio-codec>=1.0.0
julius>=0.2.7
lameenc>=1.8.1
k-diffusion>=0.1.1
vector-quantize-pytorch>=1.14.0
x-transformers>=2.0.0
packaging>=21.0

AI-powered song generation nodes for ComfyUI based on Tencent’s SongGeneration (LeVo) model. Generate complete songs with vocals and instrumentals from lyrics.
| Node | Description |
|------|-------------|
| Model Loader | Load SongGeneration model with memory options |
| Lyrics Formatter | Build properly formatted lyrics from sections |
| Description Builder | Create style descriptions from components |
| Generate | Main generation with text conditioning |
| Style Transfer | Generate using reference audio for style |
| Auto Style | Generate with preset style prompts |
ComfyUI Manager (Recommended)
Search for “FL Song Gen” in ComfyUI Manager and install.
Manual Installation
cd ComfyUI/custom_nodes
git clone https://github.com/filliptm/ComfyUI_FL-SongGen.git
cd ComfyUI_FL-SongGen
pip install -r requirements.txt
Models download automatically on first use to ComfyUI/models/songgen/.
| Model | Max Duration | VRAM | Languages |
|-------|:------------:|:----:|-----------|
| songgeneration_base | 2m 30s | 10-16 GB | Chinese |
| songgeneration_base_new | 2m 30s | 10-16 GB | Chinese, English |
| songgeneration_base_full | 4m 30s | 12-18 GB | Chinese, English |
| songgeneration_large | 4m 30s | 22-28 GB | Chinese, English |
Note: VRAM ranges show low-memory mode versus normal mode. Enable low_mem in the Model Loader to reduce VRAM usage.
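As a rough planning aid, the model table above can be encoded as data and queried. The sketch below is illustrative only: `ModelSpec`, `MODELS`, and `pick_model` are hypothetical names that are not part of the node pack, and it assumes the lower bound of each VRAM range is the low_mem requirement and the upper bound is the normal-mode requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    max_seconds: int       # maximum song duration from the table
    vram_low_mem_gb: int   # assumed: lower bound of the VRAM range (low_mem mode)
    vram_normal_gb: int    # assumed: upper bound of the VRAM range (normal mode)
    languages: tuple


# Values copied from the model table; ordered from smallest to largest model.
MODELS = (
    ModelSpec("songgeneration_base", 150, 10, 16, ("Chinese",)),
    ModelSpec("songgeneration_base_new", 150, 10, 16, ("Chinese", "English")),
    ModelSpec("songgeneration_base_full", 270, 12, 18, ("Chinese", "English")),
    ModelSpec("songgeneration_large", 270, 22, 28, ("Chinese", "English")),
)


def pick_model(vram_gb, language, seconds):
    """Return (model_name, low_mem) for the largest model that fits, or None."""
    # Walk from the largest model down so the largest model that fits wins.
    for spec in reversed(MODELS):
        if language not in spec.languages or seconds > spec.max_seconds:
            continue
        if vram_gb >= spec.vram_normal_gb:
            return spec.name, False
        if vram_gb >= spec.vram_low_mem_gb:
            return spec.name, True
    return None
```

For example, `pick_model(24, "English", 200)` selects `songgeneration_large` with low_mem enabled. The helper prefers a larger model in low_mem mode over a smaller one in normal mode; reverse that preference if generation speed matters more to you than model size.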
Getting the best results requires understanding how to format lyrics and descriptions properly.
Lyrics Format
Lyrics use section tags separated by " ; " (space, semicolon, space), with phrases within a section separated by periods:
[intro-short] ; [verse] First line. Second line. Third line ; [chorus] Chorus line one. Chorus line two ; [outro-short]
Instrumental sections (no lyrics):
| Tag | Duration | Description |
|-----|:--------:|-------------|
| [intro-short] | ~0-10s | Short instrumental intro |
| [intro-medium] | ~10-20s | Medium instrumental intro |
| [inst-short] | ~0-10s | Short instrumental break |
| [inst-medium] | ~10-20s | Medium instrumental break |
| [outro-short] | ~0-10s | Short instrumental outro |
| [outro-medium] | ~10-20s | Medium instrumental outro |
Lyrical sections (lyrics required):
| Tag | Description |
|-----|-------------|
| [verse] | Verse – typically tells the story |
| [chorus] | Chorus – the catchy, repeated hook |
| [bridge] | Bridge – contrasting part before final chorus |
Separate sections with " ; " (with spaces).
Complete Song Example
[intro-short] ; [verse] These faded memories of us. I can't erase the tears you cried before. Unchained this heart to find its way. My peace won't beg you to stay ; [chorus] Like a fool begs for supper. I find myself waiting for her. Only to find the broken pieces of my heart. That was needed for my soul to love again ; [inst-short] ; [verse] Silhouettes where you once stood. Life's rhythm changed its beat for good. Numb to whispers we once knew. My path won't circle back to you ; [chorus] Like a fool begs for supper. I find myself waiting for her. Only to find the broken pieces of my heart. That was needed for my soul to love again ; [outro-short]
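The formatting rules above can be applied programmatically. The following is a minimal sketch, not the Lyrics Formatter node's actual implementation: `format_lyrics` is a hypothetical helper that joins sections with " ; ", joins phrases with periods, and rejects lyrics on instrumental tags or empty lyrical sections.

```python
# Tag names taken from the two tables in this section.
INSTRUMENTAL_TAGS = {
    "intro-short", "intro-medium",
    "inst-short", "inst-medium",
    "outro-short", "outro-medium",
}
LYRICAL_TAGS = {"verse", "chorus", "bridge"}


def format_lyrics(sections):
    """Join (tag, lines) pairs into one lyrics string.

    `lines` is a list of phrases; use None or [] for instrumental sections.
    """
    parts = []
    for tag, lines in sections:
        # Trim whitespace and trailing periods, then drop empty phrases.
        cleaned = [ln.strip().rstrip(".").strip() for ln in (lines or [])]
        cleaned = [ln for ln in cleaned if ln]
        if tag in INSTRUMENTAL_TAGS:
            if cleaned:
                raise ValueError(f"[{tag}] is instrumental and takes no lyrics")
            parts.append(f"[{tag}]")
        elif tag in LYRICAL_TAGS:
            if not cleaned:
                raise ValueError(f"[{tag}] requires at least one line of lyrics")
            parts.append(f"[{tag}] " + ". ".join(cleaned))
        else:
            raise ValueError(f"unknown section tag: {tag!r}")
    # Sections are separated by space-semicolon-space.
    return " ; ".join(parts)
```

Calling it with an intro, a three-line verse, a two-line chorus, and an outro reproduces the short example at the top of this section. The sketch does not escape semicolons inside a phrase, so keep them out of the lyrics themselves.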
Style Descriptions
"gender, timbre, genre, emotion, instruments, the bpm is X"
All dimensions are optional and can be combined in any order.
| Dimension | Options |
|-----------|---------|
| Gender | male, female |
| Timbre | dark, bright, warm, soft, rock |
| Genre | pop, rock, jazz, hip hop, R&B, folk, electronic, blues, country, classical, soul, reggae, k-pop |
| Emotion | sad, happy, emotional, angry, uplifting, romantic, melancholic, intense |
| Instruments | See list below |
| BPM | the bpm is 120 (use this exact phrase format) |
Instrument combinations:
- piano and drums
- guitar and drums
- synthesizer and piano
- acoustic guitar and piano
- piano and strings
- guitar and synthesizer
- piano and saxophone
- electric guitar and drums
- synthesizer and drums
- acoustic guitar and drums

Example descriptions:
female, warm, pop, emotional, piano and drums, the bpm is 120
male, dark, hip hop, sad, synthesizer and drums
female, bright, jazz, romantic, piano and saxophone, the bpm is 90
male, rock, intense, electric guitar and drums, the bpm is 140
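A description string in this format is easy to assemble in code. The sketch below is a hypothetical helper (`build_description` is not the Description Builder node's API) that skips omitted dimensions and always emits the tempo in the exact "the bpm is X" form.

```python
def build_description(gender=None, timbre=None, genre=None, emotion=None,
                      instruments=None, bpm=None):
    """Assemble a comma-separated style description; every field is optional."""
    fields = (gender, timbre, genre, emotion, instruments)
    # Keep only the dimensions that were actually provided.
    parts = [f.strip() for f in fields if f and f.strip()]
    if bpm is not None:
        bpm = int(bpm)
        if bpm <= 0:
            raise ValueError("bpm must be a positive integer")
        # The tempo must use this exact phrase.
        parts.append(f"the bpm is {bpm}")
    return ", ".join(parts)
```

For instance, `build_description("female", "warm", "pop", "emotional", "piano and drums", 120)` yields the first example description listed above. The helper does not validate values against the options table, since the source does not say whether values outside it are rejected.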
Auto Style Presets
When using Auto Style mode, select from these presets:
| Preset | Description |
|--------|-------------|
| Pop | Modern pop music |
| R&B | Rhythm and blues |
| Dance | Electronic dance music |
| Jazz | Jazz style |
| Folk | Folk/acoustic |
| Rock | Rock music |
| Chinese Style | Modern Chinese pop |
| Chinese Tradition | Traditional Chinese music |
| Chinese Opera | Chinese opera style |
| Metal | Heavy metal |
| Reggae | Reggae style |
| Auto | Let the model choose |
Style Transfer (Reference Audio)
Use a 10-second reference audio clip to guide the musical style.
You can optionally provide a text description alongside the reference audio to further guide the generation.
Note: If the description conflicts with the reference audio style, results may be unpredictable. Use complementary descriptions for best results.
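If your reference recording is longer than 10 seconds, you may want to pick the window yourself before passing it in. The sketch below is a hypothetical helper, not part of the node pack, and the source does not say whether the Style Transfer node trims longer clips on its own. It only computes sample indices, so it works with any array type.

```python
def reference_window(num_samples, sample_rate, seconds=10.0, offset_seconds=0.0):
    """Return (start, end) sample indices for a clip of at most `seconds`."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if seconds <= 0 or offset_seconds < 0:
        raise ValueError("seconds must be positive and offset_seconds non-negative")
    # Clamp both ends so the window never runs past the end of the audio.
    start = min(int(round(offset_seconds * sample_rate)), num_samples)
    end = min(start + int(round(seconds * sample_rate)), num_samples)
    return start, end
```

With a waveform array whose last axis is time, the clip is then `waveform[..., start:end]`. Choosing an offset that lands on a representative passage (for example, a chorus rather than a quiet intro) is a judgment call left to the user.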
Tips for Better Results
- Specify the tempo with the exact phrase "the bpm is X".
- The base_new model is recommended for English lyrics.
- Enable low_mem mode if you’re running low on VRAM.

| Requirement | Specification |
|-------------|---------------|
| Python | 3.10+ |
| CUDA | 11.8+ (for GPU acceleration) |
| RAM | 16 GB minimum (32 GB+ recommended) |
| VRAM | 10-28 GB (depends on model) |
Note: CPU-only mode is supported but very slow. Mac MPS may have limited support.
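To see how a machine compares with the requirements table, a short script can report the relevant facts. This is a sketch under stated assumptions: `check_environment` is a hypothetical helper, it reports only the first CUDA device, and it treats PyTorch as optional so that it still runs before dependencies are installed.

```python
import sys


def check_environment(min_python=(3, 10)):
    """Collect interpreter facts and, if PyTorch is installed, GPU facts."""
    report = {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": False,
        "cuda_available": False,
        "cuda_version": None,
        "vram_gb": None,
        "mps_available": False,
    }
    try:
        import torch
    except ImportError:
        # Without PyTorch only the Python version can be checked.
        return report

    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    report["cuda_version"] = torch.version.cuda  # None on builds without CUDA
    if report["cuda_available"]:
        # Total memory of device 0, converted from bytes to GiB.
        props = torch.cuda.get_device_properties(0)
        report["vram_gb"] = round(props.total_memory / 1024**3, 1)
    mps = getattr(torch.backends, "mps", None)
    report["mps_available"] = bool(mps is not None and mps.is_available())
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Compare the reported `vram_gb` against the VRAM column of the model table to decide which model to load and whether to enable low_mem.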
License: Apache 2.0
Based on SongGeneration (LeVo) by Tencent AI Lab.