ComfyUI_FL-SongGen

★ 59

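Song generation, vocal synthesis, style transfer, ComfyUI nodes

ComfyUI nodes based on Tencent's SongGeneration (LeVo) model that generate complete songs (vocals plus accompaniment) from lyrics, with support for style transfer, presets, automatic model downloads, and output of up to 4 minutes 30 seconds.

💡 Generates complete songs with vocals and accompaniment from sectioned lyrics and a style reference.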

🍴 13 Forks 💻 Python 🔄 2026-01-24

🔗 Original on GitHub

📦 Cloud drive download

Copy the link below and download from Quark Netdisk:

https://pan.quark.cn/s/e00a65475347

📦 requirements.txt

torch>=2.0.0
torchaudio>=2.0.0
omegaconf>=2.3.0
transformers>=4.37.0
einops>=0.8.0
einops-exts>=0.0.4
diffusers>=0.27.0
huggingface-hub>=0.25.0
librosa>=0.11.0
soundfile>=0.12.0
numpy>=1.24.0
safetensors>=0.4.0
filelock>=3.13.0
openunmix>=1.3.0
alias-free-torch>=0.0.6
descript-audio-codec>=1.0.0
julius>=0.2.7
lameenc>=1.8.1
k-diffusion>=0.1.1
vector-quantize-pytorch>=1.14.0
x-transformers>=2.0.0
packaging>=21.0

📄 README

FL Song Gen

AI-powered song generation nodes for ComfyUI based on Tencent’s SongGeneration (LeVo) model. Generate complete songs with vocals and instrumentals from lyrics.

Features

Full Song Generation – Complete songs with vocals and instrumentals

Dual-Track Output – Separate vocal and background music tracks

Lyrics-to-Song – Structured lyrics with sections (verse, chorus, bridge, intro, outro)

Style Transfer – Use reference audio to guide the musical style

Text Descriptions – Control gender, timbre, genre, emotion, and BPM

Auto Style Presets – Quick generation with Pop, Rock, Jazz, and more

Long-Form Generation – Up to 4 minutes 30 seconds per song

Automatic Downloads – Models download automatically on first use

Nodes

| Node | Description |
|------|-------------|
| Model Loader | Load SongGeneration model with memory options |
| Lyrics Formatter | Build properly formatted lyrics from sections |
| Description Builder | Create style descriptions from components |
| Generate | Main generation with text conditioning |
| Style Transfer | Generate using reference audio for style |
| Auto Style | Generate with preset style prompts |

Installation

ComfyUI Manager (Recommended)

Search for “FL Song Gen” in ComfyUI Manager and install.

Manual Installation

cd ComfyUI/custom_nodes
git clone https://github.com/filliptm/ComfyUI_FL-SongGen.git
cd ComfyUI_FL-SongGen
pip install -r requirements.txt
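
After installing, it can help to confirm that the packages listed in requirements.txt are actually present in the Python environment ComfyUI runs in. The following is a rough, standard-library-only sketch, not part of the node pack: the helper names (`parse_requirement`, `version_tuple`, `is_older`, `check_requirements`) are hypothetical, it only understands `name>=version` lines like the ones in this repository's requirements.txt, and its version comparison is a simplification rather than full PEP 440 handling.

```python
from importlib import metadata


def parse_requirement(line):
    """Split a 'name>=version' line into (name, minimum_version_string)."""
    name, _, minimum = line.strip().partition(">=")
    return name.strip(), minimum.strip()


def version_tuple(version):
    """Turn '2.1.0+cu118' into (2, 1, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_older(installed, minimum):
    """True if the installed version sorts below the minimum (numeric parts only)."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def check_requirements(lines):
    """Return a list of (package, problem) for missing or too-old packages."""
    problems = []
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blanks and comments
        name, minimum = parse_requirement(line)
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append((name, "not installed"))
            continue
        if minimum and is_older(installed, minimum):
            problems.append((name, f"{installed} < {minimum}"))
    return problems
```

To use it, pass the lines of the file, for example `check_requirements(open("requirements.txt").read().splitlines())`; an empty list means nothing was flagged.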

Models

On first use, models download automatically to ComfyUI/models/songgen/.

| Model | Max Duration | VRAM | Languages |
|-------|:------------:|:----:|-----------|
| songgeneration_base | 2m 30s | 10-16 GB | Chinese |
| songgeneration_base_new | 2m 30s | 10-16 GB | Chinese, English |
| songgeneration_base_full | 4m 30s | 12-18 GB | Chinese, English |
| songgeneration_large | 4m 30s | 22-28 GB | Chinese, English |

Note: Each VRAM range runs from low-memory mode (lower figure) to normal mode (upper figure). Enable low_mem in the Model Loader to reduce VRAM usage.
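
As a rough aid for picking a variant, the table can be turned into a small lookup. This is an illustrative sketch only: `MODELS` and `candidate_models` are hypothetical names, the numbers are copied from the table above, and it assumes the lower VRAM figure is what low_mem mode needs and the upper figure is what normal mode needs, which is how the note reads.

```python
# (max seconds, low_mem VRAM in GB, normal VRAM in GB, supported languages),
# copied from the model table; treat the VRAM figures as approximate.
MODELS = {
    "songgeneration_base": (150, 10, 16, {"Chinese"}),
    "songgeneration_base_new": (150, 10, 16, {"Chinese", "English"}),
    "songgeneration_base_full": (270, 12, 18, {"Chinese", "English"}),
    "songgeneration_large": (270, 22, 28, {"Chinese", "English"}),
}


def candidate_models(vram_gb, seconds, language):
    """Return [(model_name, needs_low_mem)] for variants that fit the constraints."""
    results = []
    for name, (max_seconds, low_gb, normal_gb, languages) in MODELS.items():
        if seconds > max_seconds or language not in languages:
            continue
        if vram_gb >= normal_gb:
            results.append((name, False))  # fits in normal mode
        elif vram_gb >= low_gb:
            results.append((name, True))   # fits only with low_mem enabled
    return results
```

For example, `candidate_models(12, 200, "English")` leaves only `songgeneration_base_full` with low_mem enabled, since the 2m 30s variants are too short and the large variant needs more memory.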

Quick Start

Add the FL Song Gen Model Loader node and select a model variant

Add the FL Song Gen Lyrics Formatter node to build your lyrics

Add the FL Song Gen Description Builder node to describe the style (optional)

Connect them to the FL Song Gen Generate node

Connect the outputs to audio save/preview nodes

Prompting Guide

Getting the best results requires understanding how to format lyrics and descriptions properly.

Lyrics Format

Basic Structure

Lyrics consist of tagged sections separated by ` ; ` (space-semicolon-space), with the phrases inside a section separated by periods (`.`):

[intro-short] ; [verse] First line. Second line. Third line ; [chorus] Chorus line one. Chorus line two ; [outro-short]

Structure Labels

Instrumental sections (no lyrics):

| Tag | Duration | Description |
|-----|:--------:|-------------|
| [intro-short] | ~0-10s | Short instrumental intro |
| [intro-medium] | ~10-20s | Medium instrumental intro |
| [inst-short] | ~0-10s | Short instrumental break |
| [inst-medium] | ~10-20s | Medium instrumental break |
| [outro-short] | ~0-10s | Short instrumental outro |
| [outro-medium] | ~10-20s | Medium instrumental outro |

Lyrical sections (lyrics required):

| Tag | Description |
|-----|-------------|
| [verse] | Verse – typically tells the story |
| [chorus] | Chorus – the catchy, repeated hook |
| [bridge] | Bridge – contrasting part before final chorus |

Formatting Rules

Sections are separated by ` ; ` (a semicolon with a space on each side)

Phrases within a section are separated by periods (`.`)

Each period represents a phrase/line break

Do NOT add lyrics to instrumental tags
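
The rules above can be sketched as a small helper that assembles the lyrics string from a list of sections. This is not the Lyrics Formatter node's code, only an illustration: `format_lyrics` and the two tag sets are hypothetical names, with the tags taken from the Structure Labels tables.

```python
INSTRUMENTAL_TAGS = {
    "intro-short", "intro-medium",
    "inst-short", "inst-medium",
    "outro-short", "outro-medium",
}
LYRICAL_TAGS = {"verse", "chorus", "bridge"}


def format_lyrics(sections):
    """Build a lyrics string from [(tag, [phrase, ...]), ...].

    Sections are joined with ' ; ' and phrases with '. ', following the
    formatting rules; instrumental tags must not carry any phrases.
    """
    parts = []
    for tag, phrases in sections:
        # Drop surrounding whitespace and trailing periods, then empty phrases.
        cleaned = [p.strip().rstrip(".").strip() for p in phrases]
        cleaned = [p for p in cleaned if p]
        if tag in INSTRUMENTAL_TAGS:
            if cleaned:
                raise ValueError(f"[{tag}] is instrumental and takes no lyrics")
            parts.append(f"[{tag}]")
        elif tag in LYRICAL_TAGS:
            if not cleaned:
                raise ValueError(f"[{tag}] requires at least one phrase")
            parts.append(f"[{tag}] " + ". ".join(cleaned))
        else:
            raise ValueError(f"unknown section tag: {tag}")
    return " ; ".join(parts)
```

Calling it with an intro, a three-phrase verse, a two-phrase chorus, and an outro reproduces the Basic Structure example. Phrases that themselves contain a period or a semicolon are not handled and would need escaping or rewording first.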

Complete Song Example

[intro-short] ; [verse] These faded memories of us. I can't erase the tears you cried before. Unchained this heart to find its way. My peace won't beg you to stay ; [chorus] Like a fool begs for supper. I find myself waiting for her. Only to find the broken pieces of my heart. That was needed for my soul to love again ; [inst-short] ; [verse] Silhouettes where you once stood. Life's rhythm changed its beat for good. Numb to whispers we once knew. My path won't circle back to you ; [chorus] Like a fool begs for supper. I find myself waiting for her. Only to find the broken pieces of my heart. That was needed for my soul to love again ; [outro-short]
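
A string like the one above can also be checked before generation. The sketch below parses it back into sections, enforces the same rules, and adds up the approximate instrumental time using the durations from the Structure Labels table. The names `parse_lyrics` and `instrumental_seconds` are hypothetical, and the durations are only the rough ranges quoted in that table.

```python
import re

# Approximate (min, max) seconds per instrumental tag, from the Structure Labels table.
INSTRUMENTAL_SECONDS = {
    "intro-short": (0, 10), "intro-medium": (10, 20),
    "inst-short": (0, 10), "inst-medium": (10, 20),
    "outro-short": (0, 10), "outro-medium": (10, 20),
}
LYRICAL_TAGS = {"verse", "chorus", "bridge"}
SECTION_RE = re.compile(r"^\[([a-z-]+)\]\s*(.*)$", re.DOTALL)


def parse_lyrics(text):
    """Split a formatted lyrics string into [(tag, [phrases])], validating tags."""
    sections = []
    for chunk in text.split(" ; "):
        match = SECTION_RE.match(chunk.strip())
        if match is None:
            raise ValueError(f"section does not start with a [tag]: {chunk!r}")
        tag, body = match.groups()
        phrases = [p.strip() for p in body.split(".") if p.strip()]
        if tag in INSTRUMENTAL_SECONDS:
            if phrases:
                raise ValueError(f"[{tag}] is instrumental and takes no lyrics")
        elif tag in LYRICAL_TAGS:
            if not phrases:
                raise ValueError(f"[{tag}] requires lyrics")
        else:
            raise ValueError(f"unknown section tag: {tag}")
        sections.append((tag, phrases))
    return sections


def instrumental_seconds(sections):
    """Return the (min, max) total seconds contributed by instrumental tags."""
    spans = [INSTRUMENTAL_SECONDS[tag] for tag, _ in sections if tag in INSTRUMENTAL_SECONDS]
    return sum(low for low, _ in spans), sum(high for _, high in spans)
```

For the complete song example, this yields seven sections and an instrumental total of roughly 0 to 30 seconds; the rest of the running time depends on how the model sings the verses and choruses.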

Style Descriptions

Format

"gender, timbre, genre, emotion, instruments, the bpm is X"

All dimensions are optional and can be combined in any order.

Available Options

| Dimension | Options |
|-----------|---------|
| Gender | male, female |
| Timbre | dark, bright, warm, soft, rock |
| Genre | pop, rock, jazz, hip hop, R&B, folk, electronic, blues, country, classical, soul, reggae, k-pop |
| Emotion | sad, happy, emotional, angry, uplifting, romantic, melancholic, intense |
| Instruments | See list below |
| BPM | the bpm is 120 (use this exact phrase format) |

Common Instrument Combinations

piano and drums

guitar and drums

synthesizer and piano

acoustic guitar and piano

piano and strings

guitar and synthesizer

piano and saxophone

electric guitar and drums

synthesizer and drums

acoustic guitar and drums

Example Descriptions

female, warm, pop, emotional, piano and drums, the bpm is 120

male, dark, hip hop, sad, synthesizer and drums

female, bright, jazz, romantic, piano and saxophone, the bpm is 90

male, rock, intense, electric guitar and drums, the bpm is 140
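
Descriptions like these can be assembled programmatically from the option table. The following is a minimal sketch rather than the Description Builder node's implementation: `build_description` and the option sets are hypothetical names, the allowed values are copied from the Available Options table, and matching is case-sensitive (so `R&B` must be written exactly that way).

```python
GENDERS = {"male", "female"}
TIMBRES = {"dark", "bright", "warm", "soft", "rock"}
GENRES = {
    "pop", "rock", "jazz", "hip hop", "R&B", "folk", "electronic",
    "blues", "country", "classical", "soul", "reggae", "k-pop",
}
EMOTIONS = {
    "sad", "happy", "emotional", "angry",
    "uplifting", "romantic", "melancholic", "intense",
}


def build_description(gender=None, timbre=None, genre=None, emotion=None,
                      instruments=None, bpm=None):
    """Join the chosen attributes with commas; every argument is optional."""
    parts = []
    checks = (
        ("gender", gender, GENDERS),
        ("timbre", timbre, TIMBRES),
        ("genre", genre, GENRES),
        ("emotion", emotion, EMOTIONS),
    )
    for label, value, allowed in checks:
        if value is None:
            continue
        if value not in allowed:
            raise ValueError(f"unsupported {label}: {value!r}")
        parts.append(value)
    if instruments:
        parts.append(instruments)  # free text, e.g. "piano and drums"
    if bpm is not None:
        parts.append(f"the bpm is {int(bpm)}")  # exact phrase required by the guide
    return ", ".join(parts)
```

Rejecting values outside the table follows the tip to stick to predefined tags; loosen the check if you want to experiment with other descriptors.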

Auto Style Presets

When using Auto Style mode, select from these presets:

| Preset | Description |
|--------|-------------|
| Pop | Modern pop music |
| R&B | Rhythm and blues |
| Dance | Electronic dance music |
| Jazz | Jazz style |
| Folk | Folk/acoustic |
| Rock | Rock music |
| Chinese Style | Modern Chinese pop |
| Chinese Tradition | Traditional Chinese music |
| Chinese Opera | Chinese opera style |
| Metal | Heavy metal |
| Reggae | Reggae style |
| Auto | Let the model choose |

Style Transfer (Reference Audio)

Use a 10-second reference audio clip to guide the musical style:

Only the first 10 seconds of the audio are used

The chorus of a reference song works best

The reference influences genre, instrumentation, rhythm, and voice characteristics
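
Because only the first 10 seconds are used and the chorus tends to work best, it is worth trimming the reference so the chorus starts at the very beginning of the clip. The helper below is a hypothetical sketch (`reference_window` is not part of the node pack) that computes which sample range to keep, given where the chorus starts.

```python
def reference_window(total_samples, sample_rate, start_seconds=0.0, length_seconds=10.0):
    """Return (start, end) sample indices of the excerpt to keep.

    The window begins at start_seconds and lasts length_seconds, clipped
    to the end of the audio if the source is too short.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    start = int(round(start_seconds * sample_rate))
    if not 0 <= start < total_samples:
        raise ValueError("start_seconds lies outside the audio")
    end = min(start + int(round(length_seconds * sample_rate)), total_samples)
    return start, end
```

With the indices in hand, slice whatever array or tensor holds the audio along its time axis (for example `samples[..., start:end]`) and pass the trimmed clip to the Style Transfer node.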

Combining with Descriptions

You can optionally provide a text description alongside reference audio to further guide the generation. This can be useful to:

Specify voice gender when the reference audio is ambiguous

Add specific emotions or timbres

Set a specific BPM

Note: If the description conflicts with the reference audio style, results may be unpredictable. Use complementary descriptions for best results.

Tips for Better Results

Lyrics Tips

Keep phrases natural and singable

Use repetition strategically, especially in the chorus

Match syllable counts roughly between verses

Use emotionally evocative language

Description Tips

Use commas to separate attributes

Stick to predefined tags for best results

Avoid stacking too many descriptors, especially conflicting ones

BPM must use the exact format: the bpm is X

General Tips

Start with shorter songs to test your prompts

The base_new model is recommended for English lyrics

Enable low_mem mode if you’re running low on VRAM

Instrumental sections help create natural song flow

Requirements

| Requirement | Specification |
|-------------|---------------|
| Python | 3.10+ |
| CUDA | 11.8+ (for GPU acceleration) |
| RAM | 16 GB minimum (32 GB+ recommended) |
| VRAM | 10-28 GB (depends on model) |

Note: CPU-only mode is supported but very slow. Support for Mac MPS may be limited.

License

Apache 2.0

Credits

Based on SongGeneration (LeVo) by Tencent AI Lab.